Operators

[Polars Documentation]

All of the conjunction, comparison, and binary operators listed in the linked Polars documentation are supported, and each is treated as a row-by-row operation.

Even in an aggregation context like .select or .agg, OpenDP requires that both inputs to a binary operator are row-by-row expressions. This ensures that the left and right arguments of the operator have a meaningful row alignment.

These operators are particularly useful for building filtering predicates and grouping columns.

[1]:
import polars as pl
import opendp.prelude as dp
dp.enable_features("contrib")
# Fetch and unpack the data.
![ -e ../sample_FR_LFS.csv ] || ( curl 'https://github.com/opendp/dp-test-datasets/blob/main/data/sample_FR_LFS.csv.zip?raw=true' --location --output sample_FR_LFS.csv.zip; unzip sample_FR_LFS.csv.zip -d ../ )

context = dp.Context.compositor(
    # Many columns contain mixtures of strings and numbers and cannot be parsed as floats,
    # so we'll set `ignore_errors` to true to avoid conversion errors.
    data=pl.scan_csv("../sample_FR_LFS.csv", ignore_errors=True),
    privacy_unit=dp.unit_of(contributions=36),
    privacy_loss=dp.loss_of(epsilon=1.0, delta=1e-7),
    split_evenly_over=1,
    margins={(): dp.polars.Margin(max_partition_length=60_000_000 * 36)}
)

query = (
    context.query()
    .filter((pl.col.HWUSUAL > 0) & (pl.col.HWUSUAL != 99))  # using the .gt, .and_ and .ne operators
    .with_columns(OVER_40=pl.col.AGE > 40)
    .group_by("SEX", "OVER_40")
    .agg(dp.len())
)
query.release().collect().sort("SEX", "OVER_40")
[1]:
shape: (4, 3)
SEX  OVER_40  len
i64  bool     u32
1    false    18045
1    true     22883
2    false    15838
2    true     21500