opendp.polars module
The polars module provides supporting utilities for making DP releases with the Polars library.
- class opendp.polars.DPExpr(expr)
- gaussian(scale=None)
Add Gaussian noise to the expression.
If scale is None, it is filled by global_scale in opendp.measurements.make_private_lazyframe().
- Parameters:
scale (float | None) – Noise scale parameter for the Gaussian distribution. scale == standard_deviation.
- laplace(scale=None)
Add Laplace noise to the expression.
If scale is None, it is filled by global_scale in opendp.measurements.make_private_lazyframe().
- Parameters:
scale (float | None) – Noise scale parameter for the Laplace distribution. scale == standard_deviation / sqrt(2).
- mean(bounds, scale=None)
Compute the differentially private mean.
The amount of noise to be added to the sum is determined by the scale. If scale is None, it is filled by global_scale in opendp.measurements.make_private_lazyframe().
- Parameters:
bounds (Tuple[float, float]) – The bounds of the input data.
scale (float | None) – Noise scale parameter for the Laplace distribution. scale == standard_deviation / sqrt(2).
- median(candidates, scale=None)
Compute a differentially private median.
The scale calibrates the level of entropy when selecting a candidate.
- Parameters:
candidates (list[float]) – Potential quantiles to select from.
scale (float | None) – How much noise to add to the scores of the candidates.
- noise(scale=None, distribution=None)
Add noise to the expression.
If scale is None, it is filled by global_scale in opendp.measurements.make_private_lazyframe().
If distribution is None, then the noise distribution will be chosen for you:
Pure-DP: Laplace noise, where scale == standard_deviation / sqrt(2)
zCDP: Gaussian noise, where scale == standard_deviation
- Parameters:
scale (float | None) – Scale parameter for the noise distribution.
distribution (Literal['Laplace'] | Literal['Gaussian'] | None) – Either Laplace, Gaussian or None.
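The relationships between scale and standard deviation stated for the Laplace and Gaussian distributions can be checked empirically. The following is a minimal sketch using only the Python standard library; it does not call OpenDP, and the helper names (sample_laplace, sample_gaussian, empirical_std) are hypothetical and exist only for this illustration.

```python
import math
import random

def sample_laplace(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def sample_gaussian(scale, rng):
    """Draw one Gaussian sample whose standard deviation equals scale."""
    return rng.gauss(0.0, scale)

def empirical_std(samples):
    """Population standard deviation of a list of floats."""
    n = len(samples)
    mean = math.fsum(samples) / n
    return math.sqrt(math.fsum((x - mean) ** 2 for x in samples) / n)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
scale = 2.0
n = 100_000

laplace_std = empirical_std([sample_laplace(scale, rng) for _ in range(n)])
gaussian_std = empirical_std([sample_gaussian(scale, rng) for _ in range(n)])

# Laplace: standard_deviation is close to scale * sqrt(2)
print(laplace_std, scale * math.sqrt(2))
# Gaussian: standard_deviation is close to scale
print(gaussian_std, scale)
```

For the same scale, Laplace noise therefore has a standard deviation larger by a factor of sqrt(2) than Gaussian noise.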
- quantile(alpha, candidates, scale=None)
Compute a differentially private quantile.
The scale calibrates the level of entropy when selecting a candidate.
- Parameters:
alpha (float) – A value in $[0, 1]$. Choose 0.5 for the median.
candidates (list[float]) – Potential quantiles to select from.
scale (float | None) – How much noise to add to the scores of the candidates.
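To make the roles of alpha, candidates and scale concrete, here is a minimal sketch of selecting a quantile from a candidate list by scoring each candidate and adding noise to the scores. The scoring rule (distance between a candidate's rank and the target rank alpha * n) and the use of Gumbel noise are assumptions chosen for illustration; the sketch is not OpenDP's implementation, and the function names are hypothetical.

```python
import math
import random

def quantile_scores(data, alpha, candidates):
    """Score each candidate by the distance of its rank from the target rank.

    Lower scores are better. This scoring rule is an illustrative assumption.
    """
    n = len(data)
    return [abs(sum(x < c for x in data) - alpha * n) for c in candidates]

def select_candidate(data, alpha, candidates, scale, rng):
    """Add Gumbel noise (multiplied by scale) to each score and return
    the candidate with the smallest noisy score."""
    noisy = []
    for score in quantile_scores(data, alpha, candidates):
        e = max(rng.expovariate(1.0), 1e-300)  # guard against log(0)
        gumbel = -math.log(e)                  # standard Gumbel sample
        noisy.append(score - scale * gumbel)
    best = min(range(len(candidates)), key=noisy.__getitem__)
    return candidates[best]

data = list(range(1, 101))
candidates = [0.0, 25.0, 50.0, 75.0, 100.0]
rng = random.Random(0)

# With scale=0 no noise is added, so the best-scoring candidate is returned.
exact = select_candidate(data, 0.5, candidates, scale=0.0, rng=rng)

# With a large scale the selection becomes close to uniform over candidates.
noisy_picks = [
    select_candidate(data, 0.5, candidates, scale=50.0, rng=rng)
    for _ in range(200)
]
print(exact, sorted(set(noisy_picks)))
```

A larger scale spreads the selection across more candidates (higher entropy), while a smaller scale concentrates it on the candidates whose rank is nearest the target.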
- sum(bounds, scale=None)
Compute the differentially private sum.
If scale is None, it is filled by global_scale in opendp.measurements.make_private_lazyframe().
- Parameters:
bounds (Tuple[float, float]) – The bounds of the input data.
scale (float | None) – Noise scale parameter for the Laplace distribution. scale == standard_deviation / sqrt(2).
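The bounds and scale parameters of sum (and of mean, which adds noise to a sum) interact as follows: the bounds limit how much one record can change the sum, and the scale sets how much Laplace noise hides that change. The sketch below illustrates this with the standard library only. It is not OpenDP's implementation; the helper names, the epsilon value, the assumption of one contributed record per individual, and the assumption that the record count is public are all hypothetical.

```python
import random

def clamp(x, bounds):
    """Restrict x to the closed interval given by bounds."""
    lower, upper = bounds
    return min(max(x, lower), upper)

def sample_laplace(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def sum_sensitivity(bounds, contributions=1):
    """Largest change in the clamped sum when one individual adds or
    removes up to `contributions` records."""
    lower, upper = bounds
    return contributions * max(abs(lower), abs(upper))

def noisy_sum(data, bounds, scale, rng):
    """Clamp each value to bounds, sum, then add Laplace noise."""
    clamped_total = sum(clamp(x, bounds) for x in data)
    return clamped_total + sample_laplace(scale, rng)

data = [3.0, 12.0, -4.0, 7.5]
bounds = (0.0, 10.0)
epsilon = 1.0  # hypothetical privacy parameter for this sketch

# Laplace mechanism calibration: scale = sensitivity / epsilon
scale = sum_sensitivity(bounds) / epsilon

rng = random.Random(0)
release = noisy_sum(data, bounds, scale, rng)

# Assuming the number of records is public, a mean estimate divides by it.
mean_estimate = release / len(data)
print(release, mean_estimate)
```

Tighter bounds reduce the sensitivity and therefore the noise needed for a given privacy level, at the cost of clamping more of the data.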
- class opendp.polars.LazyFrameQuery
LazyFrameQuery mimics a Polars LazyFrame, but makes a few additions and changes as documented below.
- class opendp.polars.Margin(public_info: Literal['keys'] | Literal['lengths'] | None = None, max_partition_length: int | None = None, max_num_partitions: int | None = None, max_partition_contributions: int | None = None, max_influenced_partitions: int | None = None)
- Parameters:
public_info (Literal['keys'] | Literal['lengths'] | None)
max_partition_length (int | None)
max_num_partitions (int | None)
max_partition_contributions (int | None)
max_influenced_partitions (int | None)
- max_influenced_partitions: int | None = None
The greatest number of partitions any one individual can contribute to.
- max_num_partitions: int | None = None
An upper bound on the number of distinct partitions.
- max_partition_contributions: int | None = None
The greatest number of records an individual may contribute to any one partition.
This can significantly reduce the sensitivity of grouped queries under zero-Concentrated DP.
- max_partition_length: int | None = None
An upper bound on the number of records in any one partition.
If you don’t know how many records are in the data, you can specify a very loose upper bound.
This is used to resolve issues raised in [CSVW22 Widespread Underestimation of Sensitivity…](https://arxiv.org/pdf/2207.10635.pdf)
- public_info: Literal['keys'] | Literal['lengths'] | None = None
Identifies properties of grouped data that are considered public information.
“keys” designates that keys are not protected.
“lengths” designates that both keys and partition lengths are not protected.
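The remark that max_partition_contributions can reduce sensitivity under zero-Concentrated DP can be illustrated with a small calculation. The sketch below bounds the L2 change in a vector of per-partition counts when one individual is added or removed. It is a worked example of the underlying arithmetic, not OpenDP's internal computation; the function name and the max_contributions argument (the total number of records one individual may contribute) are assumptions made for this illustration.

```python
import math

def l2_count_sensitivity(max_contributions,
                         max_influenced_partitions=None,
                         max_partition_contributions=None):
    """Upper-bound the L2 norm of one individual's per-partition counts.

    An individual contributes at most k records in total, to at most p
    partitions, with at most m records in any one partition.
    """
    k = max_contributions
    # Without further information, p and m can each be as large as k.
    p = k if max_influenced_partitions is None else min(max_influenced_partitions, k)
    m = k if max_partition_contributions is None else min(max_partition_contributions, k)
    return min(
        k,                   # L2 norm never exceeds the L1 norm
        math.sqrt(p) * m,    # at most p entries, each at most m
        math.sqrt(k * m),    # sum of squares <= m * (sum of entries)
    )

# No margin descriptors: the bound equals the total contribution.
print(l2_count_sensitivity(10))
# At most one record per partition: the bound drops to sqrt(10).
print(l2_count_sensitivity(10, max_partition_contributions=1))
```

Because Gaussian noise is calibrated to L2 sensitivity, a smaller bound here translates directly into less noise per partition for the same privacy parameter.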
- class opendp.polars.OnceFrame(queryable)
- lazy()
Extracts a LazyFrame from a OnceFrame, circumventing protections against multiple evaluations.
Each collection consumes the entire allocated privacy budget. To remain DP at the advertised privacy level, only collect the LazyFrame once.
Features:
honest-but-curious - LazyFrames can be collected an unlimited number of times.