Operators
All Polars conjunction, comparison, and binary operators listed in the Polars documentation are supported, and they are evaluated row-by-row.
Even if you are in an aggregation context like .select or .agg, OpenDP enforces that inputs to binary operators are row-by-row. This ensures that the left and right arguments of binary operators have meaningful row alignment.
These operators are particularly useful for building filtering predicates and grouping columns.
```python
import polars as pl
import opendp.prelude as dp

dp.enable_features("contrib")

# Fetch and unpack the data.
# (The next line uses Jupyter/IPython shell-escape syntax, so run it in a notebook.)
![ -e ../sample_FR_LFS.csv ] || ( curl 'https://github.com/opendp/dp-test-datasets/blob/main/data/sample_FR_LFS.csv.zip?raw=true' --location --output sample_FR_LFS.csv.zip; unzip sample_FR_LFS.csv.zip -d ../ )

context = dp.Context.compositor(
    # Many columns contain mixtures of strings and numbers and cannot be parsed as floats,
    # so we'll set `ignore_errors` to true to avoid conversion errors.
    data=pl.scan_csv("../sample_FR_LFS.csv", ignore_errors=True),
    # Each individual may contribute at most 36 rows.
    privacy_unit=dp.unit_of(contributions=36),
    privacy_loss=dp.loss_of(epsilon=1.0, delta=1e-7),
    # Spend the whole privacy-loss budget on a single query.
    split_evenly_over=1,
    # Over the whole dataset (empty grouping key), assume at most 60_000_000 * 36 rows.
    margins={(): dp.polars.Margin(max_partition_length=60_000_000 * 36)},
)

query = (
    context.query()
    .filter((pl.col.HWUSUAL > 0) & (pl.col.HWUSUAL != 99))  # using the .gt, .and_ and .ne operators
    .with_columns(OVER_40=pl.col.AGE > 40)  # a boolean grouping column built with .gt
    .group_by("SEX", "OVER_40")
    .agg(dp.len())
)
query.release().collect().sort("SEX", "OVER_40")
```
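```
shape: (4, 3)
┌─────┬─────────┬───────┐
│ SEX ┆ OVER_40 ┆ len   │
│ --- ┆ ---     ┆ ---   │
│ i64 ┆ bool    ┆ u32   │
╞═════╪═════════╪═══════╡
│ 1   ┆ false   ┆ 18045 │
│ 1   ┆ true    ┆ 22883 │
│ 2   ┆ false   ┆ 15838 │
│ 2   ┆ true    ┆ 21500 │
└─────┴─────────┴───────┘
```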