opendp.polars module

The polars module provides supporting utilities for making DP releases with the Polars library.

class opendp.polars.DPExpr(expr)
gaussian(scale=None)

Add Gaussian noise to the expression.

If scale is None, it is filled by global_scale in opendp.measurements.make_private_lazyframe().

Parameters:

scale (float | None) – Noise scale parameter for the Gaussian distribution. scale == standard_deviation.
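A minimal sketch of what `scale == standard_deviation` means for Gaussian noise, assuming Python's stdlib `random.gauss` as the sampler. It is for illustration only: this is not how OpenDP samples noise, and plain floating-point samplers like this one are not suitable for real DP releases. The helper `add_gaussian_noise` is hypothetical.

```python
import math
import random


def add_gaussian_noise(value: float, scale: float, rng: random.Random) -> float:
    """Hypothetical helper: return value plus N(0, scale**2) noise.

    For the Gaussian distribution, scale is the standard deviation.
    Uses the stdlib sampler, which is NOT a DP-safe sampler.
    """
    return value + rng.gauss(0.0, scale)


rng = random.Random(0)
scale = 3.0
noisy = [add_gaussian_noise(0.0, scale, rng) for _ in range(50_000)]

# Empirical standard deviation of the added noise; it should be close to scale.
mean = math.fsum(noisy) / len(noisy)
empirical_std = math.sqrt(math.fsum((x - mean) ** 2 for x in noisy) / len(noisy))
print(f"scale={scale}, empirical std={empirical_std:.3f}")
```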

laplace(scale=None)

Add Laplace noise to the expression.

If scale is None, it is filled by global_scale in opendp.measurements.make_private_lazyframe().

Parameters:

scale (float | None) – Noise scale parameter for the Laplace distribution. scale == standard_deviation / sqrt(2).
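The relationship `scale == standard_deviation / sqrt(2)` can be checked empirically. This sketch assumes a stdlib sampler: the difference of two independent exponential draws with rate `1 / scale` is Laplace-distributed with that scale. As above, it is illustrative and not DP-safe, and `add_laplace_noise` is a hypothetical name.

```python
import math
import random


def add_laplace_noise(value: float, scale: float, rng: random.Random) -> float:
    """Hypothetical helper: return value plus Laplace(0, scale) noise.

    The difference of two i.i.d. Exponential(rate=1/scale) draws is
    Laplace(0, scale). Not a DP-safe floating-point sampler.
    """
    return value + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


rng = random.Random(0)
scale = 2.0
noisy = [add_laplace_noise(0.0, scale, rng) for _ in range(50_000)]

mean = math.fsum(noisy) / len(noisy)
empirical_std = math.sqrt(math.fsum((x - mean) ** 2 for x in noisy) / len(noisy))

# For Laplace noise, standard_deviation == scale * sqrt(2).
print(f"expected std={scale * math.sqrt(2):.3f}, empirical std={empirical_std:.3f}")
```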

mean(bounds, scale=None)

Compute the differentially private mean.

The scale determines the amount of noise added to the sum. If scale is None, it is filled by global_scale in opendp.measurements.make_private_lazyframe().

Parameters:
  • bounds (Tuple[float, float]) – The bounds of the input data.

  • scale (float | None) – Noise scale parameter for the Laplace distribution. scale == standard_deviation / sqrt(2).
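A rough sketch of a bounded mean under one stated assumption: the record count is treated as known exactly (for example, because partition lengths are public), so only the clamped sum receives Laplace noise. How the library actually obtains the denominator is not shown here, and `dp_mean_sketch` is a hypothetical helper, not OpenDP's implementation.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_mean_sketch(values, bounds, scale, rng):
    """Hypothetical helper: clamp, noise the sum, divide by the count.

    ASSUMES len(values) is public, so the count itself is not noised.
    """
    lower, upper = bounds
    # Clamping bounds each record's influence on the sum.
    clamped = [min(max(v, lower), upper) for v in values]
    noisy_sum = sum(clamped) + laplace_noise(scale, rng)
    return noisy_sum / len(clamped)


rng = random.Random(0)
ages = [23, 35, 41, 58, 62, 97, 104]  # 97 and 104 are clamped down to 90
print(dp_mean_sketch(ages, bounds=(18, 90), scale=5.0, rng=rng))
```

Values outside `bounds` are clamped rather than dropped, so loose bounds add noise while tight bounds add bias.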

median(candidates, scale=None)

Compute a differentially private median.

The scale calibrates how much randomness is introduced when selecting a candidate.

Parameters:
  • candidates (list[float]) – Potential quantiles to select from.

  • scale (float | None) – How much noise to add to the score of each candidate.
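The parameter description speaks of adding noise to candidate scores. When that noise is Gumbel-distributed, keeping the best noisy score is equivalent to sampling from the exponential mechanism, which this sketch does directly. The rank-imbalance score and the sampling scheme are assumptions chosen for illustration, not necessarily what OpenDP uses; `median_score` and `dp_median_sketch` are hypothetical.

```python
import math
import random


def median_score(values, candidate):
    """Lower is better: imbalance between records below and above candidate."""
    below = sum(1 for v in values if v < candidate)
    above = sum(1 for v in values if v > candidate)
    return abs(below - above)


def dp_median_sketch(values, candidates, scale, rng):
    """Hypothetical helper: pick a candidate with probability proportional to
    exp(-score / scale). Larger scale means a flatter, more random choice."""
    scores = [median_score(values, c) for c in candidates]
    best = min(scores)
    # Shift by the best score so the top candidate has weight 1.0 and the
    # weights cannot all underflow to zero.
    weights = [math.exp(-(s - best) / scale) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


rng = random.Random(0)
data = list(range(1, 101))
candidates = [10.0, 25.0, 50.0, 75.0, 90.0]
print(dp_median_sketch(data, candidates, scale=1e-6, rng=rng))  # → 50.0
print(dp_median_sketch(data, candidates, scale=50.0, rng=rng))  # varies with the seed
```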

noise(scale=None, distribution=None)

Add noise to the expression.

If scale is None, it is filled by global_scale in opendp.measurements.make_private_lazyframe(). If distribution is None, the noise distribution is chosen based on the privacy measure:

  • Pure-DP: Laplace noise, where scale == standard_deviation / sqrt(2)

  • zCDP: Gaussian noise, where scale == standard_deviation

Parameters:
  • scale (float | None) – Scale parameter for the noise distribution.

  • distribution (Literal['Laplace'] | Literal['Gaussian'] | None) – Either "Laplace", "Gaussian", or None.
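The documented defaults can be summarized as a small lookup. The function names and the `privacy_model` strings are hypothetical; the sketch encodes only the two defaults and the scale-to-standard-deviation relationships stated above. It makes one consequence explicit: the same `scale` yields noise about 1.41 times wider under the Laplace default than under the Gaussian default.

```python
import math


def default_noise(privacy_model: str) -> str:
    """Hypothetical mapping mirroring the documented defaults."""
    if privacy_model == "pure-DP":
        return "Laplace"
    if privacy_model == "zCDP":
        return "Gaussian"
    raise ValueError(f"no documented default for {privacy_model!r}")


def noise_std(distribution: str, scale: float) -> float:
    """Standard deviation of the noise for a given scale parameter."""
    if distribution == "Laplace":
        return scale * math.sqrt(2)  # scale == standard_deviation / sqrt(2)
    if distribution == "Gaussian":
        return scale  # scale == standard_deviation
    raise ValueError(f"unknown distribution {distribution!r}")


for model in ("pure-DP", "zCDP"):
    dist = default_noise(model)
    print(model, dist, round(noise_std(dist, 1.0), 4))
```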

quantile(alpha, candidates, scale=None)

Compute a differentially private quantile.

The scale calibrates how much randomness is introduced when selecting a candidate.

Parameters:
  • alpha (float) – A value in $[0, 1]$. Choose 0.5 for the median.

  • candidates (list[float]) – Potential quantiles to select from.

  • scale (float | None) – How much noise to add to the score of each candidate.
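One common way to score candidates for an arbitrary `alpha` is sketched below; the scoring rule is an assumption, and OpenDP's exact score may differ. The score is zero when a fraction `alpha` of the records lies below the candidate and a fraction `1 - alpha` lies above it, and `alpha = 0.5` reduces to a median-style score. Feeding these scores into a randomized selection calibrated by `scale` yields a DP quantile. `quantile_score` is a hypothetical helper.

```python
def quantile_score(values, candidate, alpha):
    """Lower is better: zero when a fraction alpha of the records lies below
    the candidate and a fraction 1 - alpha lies above it."""
    below = sum(1 for v in values if v < candidate)
    above = sum(1 for v in values if v > candidate)
    return abs((1 - alpha) * below - alpha * above)


data = list(range(1, 101))
candidates = [10.0, 25.0, 50.0, 75.0, 90.0]
for alpha in (0.25, 0.5, 0.75):
    # Noise-free best candidate, shown only to illustrate the score.
    best = min(candidates, key=lambda c: quantile_score(data, c, alpha))
    print(alpha, best)
```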

sum(bounds, scale=None)

Compute the differentially private sum.

If scale is None, it is filled by global_scale in opendp.measurements.make_private_lazyframe().

Parameters:
  • bounds (Tuple[float, float]) – The bounds of the input data.

  • scale (float | None) – Noise scale parameter for the Laplace distribution. scale == standard_deviation / sqrt(2).
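To see how `bounds` and `scale` together determine the privacy guarantee, here is a worked sketch of the textbook Laplace mechanism. It assumes the add/remove-one-record neighboring model and idealized real arithmetic, so it ignores the floating-point issues that `max_partition_length` is documented to help address. The helper names are hypothetical.

```python
def clamped_sum_sensitivity(bounds, max_contributions=1):
    """L1 sensitivity of a clamped sum when one individual may add or remove
    up to max_contributions records (idealized real arithmetic)."""
    lower, upper = bounds
    return max_contributions * max(abs(lower), abs(upper))


def laplace_epsilon(sensitivity, scale):
    """Pure-DP guarantee of the Laplace mechanism: epsilon = sensitivity / scale."""
    return sensitivity / scale


sens = clamped_sum_sensitivity((0.0, 100.0), max_contributions=1)
print(sens, laplace_epsilon(sens, scale=50.0))  # 100.0 2.0
```

Halving the width of `bounds` halves the sensitivity, so the same epsilon can be reached with half the scale.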

class opendp.polars.LazyFrameQuery

LazyFrameQuery mimics a Polars LazyFrame, but makes a few additions and changes as documented below.

release()

Release the query. The query must be part of a context.

Return type:

OnceFrame

resolve()

Resolve the query into a measurement.

Return type:

Measurement

class opendp.polars.Margin(public_info: Literal['keys'] | Literal['lengths'] | None = None, max_partition_length: int | None = None, max_num_partitions: int | None = None, max_partition_contributions: int | None = None, max_influenced_partitions: int | None = None)
Parameters:
  • public_info (Literal['keys'] | Literal['lengths'] | None)

  • max_partition_length (int | None)

  • max_num_partitions (int | None)

  • max_partition_contributions (int | None)

  • max_influenced_partitions (int | None)

max_influenced_partitions: int | None = None

The greatest number of partitions any one individual can contribute to.

max_num_partitions: int | None = None

An upper bound on the number of distinct partitions.

max_partition_contributions: int | None = None

The greatest number of records an individual may contribute to any one partition.

This can significantly reduce the sensitivity of grouped queries under zero-Concentrated DP.
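A worked sketch of why this bound helps under zCDP, assuming a grouped count released with Gaussian noise. If one individual touches at most k partitions (`max_influenced_partitions`) with at most m records in each (`max_partition_contributions`), the L2 sensitivity of the vector of per-partition counts is at most sqrt(k) * m. Knowing only a bound c on an individual's total records gives a worst case of c (all records in one partition). The Gaussian mechanism with standard deviation `scale` satisfies rho-zCDP with rho = sensitivity**2 / (2 * scale**2). The helper names are hypothetical, and this is the textbook bound rather than a description of OpenDP's internal accounting.

```python
import math


def grouped_count_l2_sensitivity(max_influenced_partitions, max_partition_contributions):
    """L2 sensitivity of per-partition counts: sqrt(k) * m."""
    return math.sqrt(max_influenced_partitions) * max_partition_contributions


def gaussian_rho(l2_sensitivity, scale):
    """zCDP cost of the Gaussian mechanism with standard deviation scale."""
    return l2_sensitivity ** 2 / (2 * scale ** 2)


total_contributions = 10  # each individual contributes at most 10 records overall
scale = 10.0

# Without a per-partition bound, all 10 records may land in one partition.
loose = gaussian_rho(total_contributions, scale)
# With max_partition_contributions=1, those records span up to 10 partitions.
tight = gaussian_rho(grouped_count_l2_sensitivity(10, 1), scale)
print(loose, tight)
```

In this example the per-partition bound cuts the zCDP cost by a factor of 10 at the same noise scale.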

max_partition_length: int | None = None

An upper bound on the number of records in any one partition.

If you don’t know how many records are in the data, you can specify a very loose upper bound.

This is used to resolve issues raised in [CSVW22 Widespread Underestimation of Sensitivity…](https://arxiv.org/pdf/2207.10635.pdf)

public_info: Literal['keys'] | Literal['lengths'] | None = None

Identifies properties of grouped data that are considered public information.

  • “keys” designates that keys are not protected

  • “lengths” designates that both keys and partition lengths are not protected

class opendp.polars.OnceFrame(queryable)
collect()

Collects a DataFrame from a OnceFrame, exhausting the OnceFrame.
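The "exhausting" behavior can be pictured with a tiny wrapper that refuses a second collection. This is a generic sketch of the pattern only. The class `OnceOnly` and its error message are hypothetical, and the sketch does not reflect how OnceFrame wraps its queryable.

```python
class OnceOnly:
    """Hypothetical wrapper: the wrapped result can be collected exactly once."""

    def __init__(self, compute):
        self._compute = compute
        self._used = False

    def collect(self):
        if self._used:
            raise RuntimeError("this frame has already been collected")
        self._used = True
        return self._compute()


frame = OnceOnly(lambda: {"count": 42})
print(frame.collect())
try:
    frame.collect()
except RuntimeError as err:
    print("second collect refused:", err)
```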

lazy()

Extracts a LazyFrame from a OnceFrame, circumventing protections against multiple evaluations.

Each collection consumes the entire allocated privacy budget. To remain DP at the advertised privacy level, only collect the LazyFrame once.

Required features:

  • honest-but-curious - required because LazyFrames can be collected an unlimited number of times.
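Why collecting more than once breaks the advertised guarantee: under basic sequential composition, privacy losses add up, so k collections that each spend the full budget can cost up to k times that budget. A minimal arithmetic sketch; `total_privacy_loss` is a hypothetical helper, and the bound is the standard worst-case composition result rather than anything specific to OpenDP's accounting.

```python
def total_privacy_loss(per_collection_loss: float, num_collections: int) -> float:
    """Basic sequential composition (worst-case upper bound): losses add up.

    Applies to epsilon under pure DP and to rho under zCDP.
    """
    return per_collection_loss * num_collections


budget = 1.0  # privacy loss allocated to the query
for n in (1, 2, 5):
    print(n, total_privacy_loss(budget, n))
```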