opendp.measurements module

The measurements module provides functions that apply calibrated noise to data to ensure differential privacy. For more context, see measurements in the User Guide.

For convenience, all the functions of this module are also available from opendp.prelude. We suggest importing under the conventional name dp:

>>> import opendp.prelude as dp

The methods of this module will then be accessible at dp.m.

opendp.measurements.debias_randomized_response_bitvec(answers, f)

Convert a vector of randomized-response bit-vector answers into a frequency estimate.

Required features: contrib

debias_randomized_response_bitvec in Rust documentation.

Parameters:
  • answers – A vector of BitVectors with consistent size

  • f (float) – The per bit flipping probability used to encode answers

Computes the sum of the answers into a $k$-length vector $Y$ and returns

$$Y\frac{Y-\frac{f}{2}}{1-f}$$

Raises:
  • TypeError – if an argument’s type differs from the expected type

  • UnknownTypeException – if a type argument fails to parse

  • OpenDPException – packaged error from the core OpenDP library
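The debiasing step can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: it assumes each bit was kept with probability 1 - f and otherwise replaced by a fair coin flip, and it works on raw counts, so the correction term is scaled by the number of answers n. Dividing the result by n gives the fractional form shown above. The helper name debias_bit_counts is hypothetical.

```python
def debias_bit_counts(answers, f):
    """Estimate true per-position counts from randomized bit vectors.

    Assumption of this sketch: each bit was kept with probability 1 - f
    and otherwise replaced by a fair coin flip.
    """
    if not 0.0 <= f < 1.0:
        raise ValueError("f must be in [0, 1) for the estimate to exist")
    n = len(answers)
    k = len(answers[0])
    if any(len(a) != k for a in answers):
        raise ValueError("all answers must have the same length")
    # Y[i] is the number of answers with bit i set.
    sums = [sum(a[i] for a in answers) for i in range(k)]
    # E[Y[i]] = (1 - f) * true_count[i] + n * f / 2, solved for true_count[i].
    return [(y - n * f / 2) / (1 - f) for y in sums]


print(debias_bit_counts([[1, 0], [1, 1], [0, 0], [1, 0]], f=0.5))  # [4.0, 0.0]
```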

opendp.measurements.make_alp_queryable(input_domain, input_metric, scale, total_limit, value_limit=None, size_factor=50, alpha=4)

Measurement to release a queryable containing a DP projection of bounded sparse data.

The size of the projection is O(total_limit * size_factor * scale / alpha). The evaluation time of post-processing is O(β * scale / alpha), where β is value_limit.

size_factor is an optional multiplier (defaults to 50) for setting the size of the projection. There is a memory/utility trade-off. The value should be sufficiently large to limit hash collisions.

Required features: contrib

make_alp_queryable in Rust documentation.

Citations:

Supporting Elements:

  • Input Domain: MapDomain<AtomDomain<K>, AtomDomain<CI>>

  • Output Type: Queryable<K, f64>

  • Input Metric: L1Distance<CI>

  • Output Measure: MaxDivergence

Parameters:
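  • input_domain (Domain) – Domain of the input. See Input Domain under Supporting Elements.

  • input_metric (Metric) – Metric on the input domain. See Input Metric under Supporting Elements.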

  • scale (float) – Privacy loss parameter. This is equal to epsilon/sensitivity.

  • total_limit – Either the true value or an upper bound estimate of the sum of all values in the input.

  • value_limit – Upper bound on individual values (referred to as β). Entries above β are clamped.

  • size_factor – Optional multiplier (default of 50) for setting the size of the projection.

  • alpha – Optional parameter (default of 4) for scaling and determining p in randomized response step.

Return type:

Measurement

Raises:
  • TypeError – if an argument’s type differs from the expected type

  • UnknownTypeException – if a type argument fails to parse

  • OpenDPException – packaged error from the core OpenDP library

opendp.measurements.make_gaussian(input_domain, input_metric, scale, k=None, MO='ZeroConcentratedDivergence')

Make a Measurement that adds noise from the Gaussian(scale) distribution to the input.

Valid inputs for input_domain and input_metric are:

input_domain                     input type    input_metric
atom_domain(T)                   T             absolute_distance(QI)
vector_domain(atom_domain(T))    Vec<T>        l2_distance(QI)

Required features: contrib

make_gaussian in Rust documentation.

Supporting Elements:

  • Input Domain: D

  • Output Type: D::Carrier

  • Input Metric: D::InputMetric

  • Output Measure: MO

Parameters:
  • input_domain (Domain) – Domain of the data type to be privatized.

  • input_metric (Metric) – Metric of the data type to be privatized.

  • scale (float) – Noise scale parameter for the Gaussian distribution. scale == standard_deviation.

  • k – The noise granularity in terms of 2^k.

  • MO (Type Argument) – Output Measure. The only valid measure is ZeroConcentratedDivergence.

Return type:

Measurement

Raises:
  • TypeError – if an argument’s type differs from the expected type

  • UnknownTypeException – if a type argument fails to parse

  • OpenDPException – packaged error from the core OpenDP library
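To make the role of scale concrete, here is a minimal sketch of an idealized, continuous Gaussian mechanism together with the textbook zero-concentrated DP bound rho = (d_in / scale)^2 / 2. The function names are hypothetical, random.gauss is not a secure sampler, and the library's own privacy map may differ slightly because it accounts for finite-precision sampling.

```python
import random


def gaussian_release(value, scale, rng=random):
    # Illustration only: random.gauss is NOT suitable for real privacy use.
    return value + rng.gauss(0.0, scale)


def zcdp_rho(d_in, scale):
    # Textbook bound for a Gaussian mechanism with sensitivity d_in and
    # standard deviation scale: rho = (d_in / scale)^2 / 2.
    return (d_in / scale) ** 2 / 2


print(zcdp_rho(d_in=1.0, scale=1.0))  # 0.5
print(zcdp_rho(d_in=1.0, scale=2.0))  # 0.125
```

Doubling scale cuts rho by a factor of four, which is the privacy-utility trade-off the scale parameter controls.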

Example:

>>> dp.enable_features('contrib')
>>> input_space = dp.atom_domain(T=float), dp.absolute_distance(T=float)
>>> gaussian = dp.m.make_gaussian(*input_space, scale=1.0)
>>> print('100?', gaussian(100.0))
100? ...

Or, more readably, define the space and then chain:

>>> gaussian = input_space >> dp.m.then_gaussian(scale=1.0)
>>> print('100?', gaussian(100.0))
100? ...

opendp.measurements.make_geometric(input_domain, input_metric, scale, bounds=None)

Equivalent to make_laplace but restricted to an integer support. Can specify bounds to run the algorithm in near constant-time.

Required features: contrib

make_geometric in Rust documentation.

Citations:

Supporting Elements:

  • Input Domain: D

  • Output Type: D::Carrier

  • Input Metric: D::InputMetric

  • Output Measure: MaxDivergence

Parameters:
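  • input_domain (Domain) – Domain of the data type to be privatized.

  • input_metric (Metric) – Metric of the data type to be privatized.

  • scale (float) – Noise scale parameter, as in make_laplace.

  • bounds – Optional. Specify bounds to run the algorithm in near constant-time.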

Return type:

Measurement

Raises:
  • TypeError – if an argument’s type differs from the expected type

  • UnknownTypeException – if a type argument fails to parse

  • OpenDPException – packaged error from the core OpenDP library
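A rough sketch of integer-valued Laplace-like noise follows. It draws a two-sided geometric variate as the difference of two geometric variates, so P(k) is proportional to exp(-|k| / scale). The function names are hypothetical, the sampler is not secure or constant-time, and clamping the output to bounds is an assumption of this sketch rather than a statement about the library's behavior.

```python
import math
import random


def two_sided_geometric(scale, rng=random):
    """Integer noise with P(k) proportional to exp(-|k| / scale)."""
    log_alpha = -1.0 / scale  # ln(alpha), where alpha = exp(-1 / scale)

    def geometric():
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is defined
        return math.floor(math.log(u) / log_alpha)

    # The difference of two iid geometric draws is two-sided geometric.
    return geometric() - geometric()


def geometric_release(value, scale, bounds=None, rng=random):
    noisy = value + two_sided_geometric(scale, rng)
    if bounds is not None:
        lower, upper = bounds
        # Assumption of this sketch: results outside the bounds are clamped.
        noisy = min(max(noisy, lower), upper)
    return noisy


print(geometric_release(100, scale=1.0, bounds=(0, 200)))
```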

Example:

>>> dp.enable_features("contrib")
>>> input_space = dp.atom_domain(T=int), dp.absolute_distance(T=int)
>>> geometric = dp.m.make_geometric(*input_space, scale=1.0)
>>> print('100?', geometric(100))
100? ...

Or, more readably, define the space and then chain:

>>> geometric = input_space >> dp.m.then_geometric(scale=1.0)
>>> print('100?', geometric(100))
100? ...

opendp.measurements.make_laplace(input_domain, input_metric, scale, k=None)

Make a Measurement that adds noise from the Laplace(scale) distribution to the input.

Valid inputs for input_domain and input_metric are:

input_domain                     input type    input_metric
atom_domain(T) (default)         T             absolute_distance(T)
vector_domain(atom_domain(T))    Vec<T>        l1_distance(T)

Internally, all sampling is done using the discrete Laplace distribution.

Required features: contrib

make_laplace in Rust documentation.

Citations:

Supporting Elements:

  • Input Domain: D

  • Output Type: D::Carrier

  • Input Metric: D::InputMetric

  • Output Measure: MaxDivergence

Parameters:
  • input_domain (Domain) – Domain of the data type to be privatized.

  • input_metric (Metric) – Metric of the data type to be privatized.

  • scale (float) – Noise scale parameter for the Laplace distribution. scale == standard_deviation / sqrt(2).

  • k – The noise granularity in terms of 2^k, only valid for domains over floats.

Return type:

Measurement

Raises:
  • TypeError – if an argument’s type differs from the expected type

  • UnknownTypeException – if a type argument fails to parse

  • OpenDPException – packaged error from the core OpenDP library
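The relation scale == standard_deviation / sqrt(2) and the usual pure-DP bound epsilon = d_in / scale can be checked with a small sketch. It models an idealized continuous Laplace mechanism; the helper names are hypothetical, the sampler is not secure, and the library's map for floats may include small corrections tied to the granularity k.

```python
import math
import random


def laplace_noise(scale, rng=random):
    # The difference of two iid exponential draws with mean `scale`
    # follows a Laplace(0, scale) distribution. Illustration only.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def laplace_std(scale):
    # Standard deviation of Laplace(0, scale).
    return scale * math.sqrt(2)


def laplace_epsilon(d_in, scale):
    # Textbook bound for the Laplace mechanism under L1 sensitivity d_in.
    return d_in / scale


print(laplace_std(1.0))           # sqrt(2)
print(laplace_epsilon(1.0, 2.0))  # 0.5
```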

Example:

>>> import opendp.prelude as dp
>>> dp.enable_features("contrib")
>>> input_space = dp.atom_domain(T=float), dp.absolute_distance(T=float)
>>> laplace = dp.m.make_laplace(*input_space, scale=1.0)
>>> print('100?', laplace(100.0))
100? ...

Or, more readably, define the space and then chain:

>>> laplace = input_space >> dp.m.then_laplace(scale=1.0)
>>> print('100?', laplace(100.0))
100? ...

opendp.measurements.make_laplace_threshold(input_domain, input_metric, scale, threshold, k=-1074)

Make a Measurement that uses propose-test-release to privatize a hashmap of counts.

This function takes a noise granularity in terms of 2^k. Larger granularities are more computationally efficient, but have a looser privacy map. If k is not set, k defaults to the smallest granularity.

Required features: contrib, floating-point

make_laplace_threshold in Rust documentation.

Supporting Elements:

  • Input Domain: MapDomain<AtomDomain<TK>, AtomDomain<TV>>

  • Output Type: HashMap<TK, TV>

  • Input Metric: L1Distance<TV>

  • Output Measure: Approximate<MaxDivergence>

Parameters:
  • input_domain (Domain) – Domain of the input.

  • input_metric (Metric) – Metric for the input domain.

  • scale (float) – Noise scale parameter for the Laplace distribution. scale == standard_deviation / sqrt(2).

  • threshold – Exclude counts that are less than this minimum value.

  • k (int) – The noise granularity in terms of 2^k.

Return type:

Measurement

Raises:
  • TypeError – if an argument’s type differs from the expected type

  • UnknownTypeException – if a type argument fails to parse

  • OpenDPException – packaged error from the core OpenDP library
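The thresholding idea can be sketched as follows: add Laplace noise to every count, then release only the keys whose noisy count reaches the threshold. This is a simplified illustration with a hypothetical function name and an insecure sampler. It also hints at why the output measure is approximate: a key that exists in only one of two neighboring datasets can still slip past the threshold with small probability.

```python
import random


def laplace_threshold_release(counts, scale, threshold, rng=random):
    released = {}
    for key, count in counts.items():
        # Laplace(0, scale) noise as a difference of two exponential draws.
        noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
        noisy = count + noise
        # Keys whose noisy count falls below the threshold are dropped.
        if noisy >= threshold:
            released[key] = noisy
    return released


print(laplace_threshold_release({"a": 50.0, "b": 1.0}, scale=1.0, threshold=10.0))
```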

opendp.measurements.make_private_expr(input_domain, input_metric, output_measure, expr, global_scale=None)

Create a differentially private measurement from an Expr.

Required features: contrib, honest-but-curious

make_private_expr in Rust documentation.

Why honest-but-curious?:

The privacy guarantee covers at most one evaluation of the released expression.

Supporting Elements:

  • Input Domain: WildExprDomain

  • Output Type: ExprPlan

  • Input Metric: MI

  • Output Measure: MO

Parameters:
  • input_domain (Domain) – The domain of the input data.

  • input_metric (Metric) – How to measure distances between neighboring input data sets.

  • output_measure (Measure) – How to measure privacy loss.

  • expr – The Expr to be privatized.

  • global_scale – A tunable parameter that affects the privacy-utility tradeoff.

Return type:

Measurement

Raises:
  • TypeError – if an argument’s type differs from the expected type

  • UnknownTypeException – if a type argument fails to parse

  • OpenDPException – packaged error from the core OpenDP library

opendp.measurements.make_private_lazyframe(input_domain, input_metric, output_measure, lazyframe, global_scale=None, threshold=None)

Create a differentially private measurement from a LazyFrame.

Any data inside the LazyFrame is ignored, but it is still recommended to start with an empty DataFrame and build up the computation using the LazyFrame API.

Required features: contrib

make_private_lazyframe in Rust documentation.

Supporting Elements:

  • Input Domain: LazyFrameDomain

  • Output Type: OnceFrame

  • Input Metric: MI

  • Output Measure: MO

Parameters:
  • input_domain (Domain) – The domain of the input data.

  • input_metric (Metric) – How to measure distances between neighboring input data sets.

  • output_measure (Measure) – How to measure privacy loss.

  • lazyframe – A description of the computations to be run, in the form of a LazyFrame.

  • global_scale – Optional. A tunable parameter that affects the privacy-utility tradeoff.

  • threshold – Optional. Minimum number of rows in each released partition.

Return type:

Measurement

Raises:
  • TypeError – if an argument’s type differs from the expected type

  • UnknownTypeException – if a type argument fails to parse

  • OpenDPException – packaged error from the core OpenDP library

Example:

>>> dp.enable_features("contrib")
>>> import polars as pl

We’ll imagine an elementary school is taking a pet census. The private census data will have two columns:

>>> lf_domain = dp.lazyframe_domain([
...     dp.series_domain("grade", dp.atom_domain(T=dp.i32)),
...     dp.series_domain("pet_count", dp.atom_domain(T=dp.i32))])

We also need to specify the column we’ll be grouping by.

>>> lf_domain_with_margin = dp.with_margin(
...     lf_domain,
...     by=["grade"],
...     public_info="keys",
...     max_partition_length=50)

With that in place, we can plan the Polars computation, using the dp plugin.

>>> plan = (
...     pl.LazyFrame(schema={'grade': pl.Int32, 'pet_count': pl.Int32})
...     .group_by("grade")
...     .agg(pl.col("pet_count").dp.sum((0, 10), scale=1.0)))

We now have all the pieces to make our measurement function using make_private_lazyframe:

>>> dp_sum_pets_by_grade = dp.m.make_private_lazyframe(
...     input_domain=lf_domain_with_margin,
...     input_metric=dp.symmetric_distance(),
...     output_measure=dp.max_divergence(),
...     lazyframe=plan,
...     global_scale=1.0)

It’s only at this point that we need to introduce the private data.

>>> df = pl.from_records(
...     [
...         [0, 0], # No kindergarteners with pets.
...         [0, 0],
...         [0, 0],
...         [1, 1], # Each first grader has 1 pet.
...         [1, 1],
...         [1, 1],
...         [2, 1], # One second grader has chickens!
...         [2, 1],
...         [2, 9]
...     ],
...     schema=['grade', 'pet_count'], orient="row")
>>> lf = pl.LazyFrame(df)
>>> results = dp_sum_pets_by_grade(lf).collect()
>>> print(results.sort("grade")) 
shape: (3, 2)
┌───────┬───────────┐
│ grade ┆ pet_count │
│ ---   ┆ ---       │
│ i64   ┆ i64       │
╞═══════╪═══════════╡
│ 0     ┆ ...       │
│ 1     ┆ ...       │
│ 2     ┆ ...       │
└───────┴───────────┘

opendp.measurements.make_randomized_response(categories, prob, T=None)

Make a Measurement that implements randomized response on a categorical value.

Required features: contrib

make_randomized_response in Rust documentation.

Supporting Elements:

  • Input Domain: AtomDomain<T>

  • Output Type: T

  • Input Metric: DiscreteDistance

  • Output Measure: MaxDivergence

Proof Definition:

(Proof Document)

Parameters:
  • categories – Set of valid outcomes

  • prob (float) – Probability of returning the correct answer. Must be in [1/num_categories, 1)

  • T (Type Argument) – Data type of a category.

Return type:

Measurement

Raises:
  • TypeError – if an argument’s type differs from the expected type

  • UnknownTypeException – if a type argument fails to parse

  • OpenDPException – packaged error from the core OpenDP library
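As a rough sketch of the mechanism: report the true category with probability prob, otherwise report one of the other categories uniformly at random. Under that assumption the worst-case likelihood ratio is prob / ((1 - prob) / (num_categories - 1)), which gives the epsilon computed below. Function names are hypothetical and the sampler is not secure.

```python
import math
import random


def randomized_response(value, categories, prob, rng=random):
    # Truthful with probability `prob`.
    if rng.random() < prob:
        return value
    # Otherwise pick uniformly among the remaining categories
    # (an assumption of this sketch).
    others = [c for c in categories if c != value]
    return rng.choice(others)


def rr_epsilon(num_categories, prob):
    # ln of the worst-case ratio between truthful and untruthful reports.
    return math.log(prob / (1 - prob) * (num_categories - 1))


print(randomized_response("a", ["a", "b", "c"], prob=0.99))
print(rr_epsilon(num_categories=2, prob=0.75))  # ln(3)
```

At the lower end of the allowed range, prob = 1/num_categories, every category is equally likely and epsilon is zero.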

Example:

>>> dp.enable_features("contrib")
>>> random_string = dp.m.make_randomized_response(['a', 'b', 'c'], 0.99)
>>> print('a?', random_string('a'))
a? ...

opendp.measurements.make_randomized_response_bitvec(input_domain, input_metric, f, constant_time=False)

Make a Measurement that implements randomized response on a bit vector.

This primitive can be useful for implementing RAPPOR.

Required features: contrib

make_randomized_response_bitvec in Rust documentation.

Citations:

Supporting Elements:

  • Input Domain: BitVectorDomain

  • Output Type: BitVector

  • Input Metric: DiscreteDistance

  • Output Measure: MaxDivergence

Proof Definition:

(Proof Document)

Parameters:
  • input_domain (Domain) – BitVectorDomain with max_weight

  • input_metric (Metric) – DiscreteDistance

  • f (float) – Per-bit flipping probability. Must be in $(0, 1]$.

  • constant_time (bool) – Whether to run the Bernoulli samplers in constant time. This is likely to be extremely slow.

Return type:

Measurement

Raises:
  • TypeError – if an argument’s type differs from the expected type

  • UnknownTypeException – if a type argument fails to parse

  • OpenDPException – packaged error from the core OpenDP library
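The per-bit randomization and the privacy-loss formula quoted in this entry's example, 2 * m * ln((2 - f) / f), can be sketched in plain Python. The sketch assumes the RAPPOR-style rule in which each bit is replaced by a fair coin flip with probability f; the function names are hypothetical and the sampler is not secure.

```python
import math
import random


def randomize_bits(bits, f, rng=random):
    # With probability f, replace the bit with a fair coin flip;
    # otherwise keep it unchanged.
    return [rng.randrange(2) if rng.random() < f else b for b in bits]


def bitvec_epsilon(max_weight, f):
    # Two vectors of weight at most m differ in at most 2 * m positions,
    # and each position contributes ln((2 - f) / f).
    return 2 * max_weight * math.log((2 - f) / f)


print(randomize_bits([0, 0, 1, 1], f=0.95))
print(bitvec_epsilon(max_weight=4, f=0.95))  # about 0.8007
```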

Example:

>>> import numpy as np
>>> import opendp.prelude as dp
>>> dp.enable_features("contrib")
>>> # Create the randomized response mechanism
>>> m_rr = dp.m.make_randomized_response_bitvec(
...     dp.bitvector_domain(max_weight=4), dp.discrete_distance(), f=0.95
... )
>>> # compute privacy loss
>>> m_rr.map(1)
0.8006676684558611
>>> # formula is 2 * m * ln((2 - f) / f)
>>> # where m = 4 (the weight) and f = .95 (the flipping probability)
>>> # prepare a dataset to release, by encoding a bit vector as a numpy byte array
>>> data = np.packbits(
...     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
... )
>>> assert np.array_equal(data, np.array([0, 8, 12], dtype=np.uint8))
>>> # roundtrip: numpy -> bytes -> mech -> bytes -> numpy
>>> release = np.frombuffer(m_rr(data.tobytes()), dtype=np.uint8)
>>> # compare the two bit vectors:
>>> [int(bit) for bit in np.unpackbits(data)]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
>>> [int(bit) for bit in np.unpackbits(release)]
[...]

opendp.measurements.make_randomized_response_bool(prob, constant_time=False)

Make a Measurement that implements randomized response on a boolean value.

Required features: contrib

make_randomized_response_bool in Rust documentation.

Supporting Elements:

  • Input Domain: AtomDomain<bool>

  • Output Type: bool

  • Input Metric: DiscreteDistance

  • Output Measure: MaxDivergence

Proof Definition:

(Proof Document)

Parameters:
  • prob (float) – Probability of returning the correct answer. Must be in [0.5, 1)

  • constant_time (bool) – Set to true to enable constant time. Slower.

Return type:

Measurement

Raises:
  • TypeError – if an argument’s type differs from the expected type

  • UnknownTypeException – if a type argument fails to parse

  • OpenDPException – packaged error from the core OpenDP library
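Because each response is truthful only with probability prob, the observed share of True responses is a biased estimate of the true share. A standard correction, sketched here with a hypothetical helper that is not part of this module, inverts E[observed] = (2 * prob - 1) * true + (1 - prob).

```python
def debias_bool_mean(observed_mean, prob):
    """Estimate the true share of True values from randomized responses."""
    if not 0.5 < prob <= 1.0:
        # At prob == 0.5 the responses carry no information.
        raise ValueError("prob must be in (0.5, 1] to debias")
    return (observed_mean + prob - 1) / (2 * prob - 1)


print(debias_bool_mean(0.5, prob=0.75))   # 0.5
print(debias_bool_mean(0.75, prob=0.75))  # 1.0
```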

Example:

>>> dp.enable_features("contrib")
>>> random_bool = dp.m.make_randomized_response_bool(0.99)
>>> print('True?', random_bool(True))
True? ...

opendp.measurements.make_report_noisy_max_gumbel(input_domain, input_metric, scale, optimize)

Make a Measurement that takes a vector of scores and privately selects the index of the highest score.

Required features: contrib

make_report_noisy_max_gumbel in Rust documentation.

Supporting Elements:

  • Input Domain: VectorDomain<AtomDomain<TIA>>

  • Output Type: usize

  • Input Metric: LInfDistance<TIA>

  • Output Measure: MaxDivergence

Proof Definition:

(Proof Document)

Parameters:
  • input_domain (Domain) – Domain of the input vector. Must be a non-nullable VectorDomain.

  • input_metric (Metric) – Metric on the input domain. Must be LInfDistance.

  • scale (float) – Higher scales are more private.

  • optimize (str) – Indicate whether to privately return the “max” or “min”.

Return type:

Measurement

Raises:
  • TypeError – if an argument’s type differs from the expected type

  • UnknownTypeException – if a type argument fails to parse

  • OpenDPException – packaged error from the core OpenDP library
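The selection step can be illustrated with the Gumbel-max trick: add independent Gumbel noise to each score and return the index of the largest noisy score, negating the scores first when optimize is "min". This is a sketch with hypothetical function names and an insecure floating-point sampler; it does not reproduce the library's exact arithmetic.

```python
import math
import random


def gumbel_noise(scale, rng=random):
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    # Inverse-CDF sample from Gumbel(0, scale).
    return -scale * math.log(-math.log(u))


def report_noisy_max_gumbel(scores, scale, optimize="max", rng=random):
    if optimize not in ("max", "min"):
        raise ValueError('optimize must be "max" or "min"')
    sign = 1.0 if optimize == "max" else -1.0
    noisy = [sign * s + gumbel_noise(scale, rng) for s in scores]
    # Index of the largest noisy (possibly negated) score.
    return max(range(len(noisy)), key=noisy.__getitem__)


print(report_noisy_max_gumbel([1, 2, 3, 2, 1], scale=1.0))
```

With a very small scale the noise is negligible and the true arg-max is returned; larger scales make other indices increasingly likely.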

Example:

>>> dp.enable_features("contrib")
>>> input_space = dp.vector_domain(dp.atom_domain(T=int)), dp.linf_distance(T=int)
>>> select_index = dp.m.make_report_noisy_max_gumbel(*input_space, scale=1.0, optimize='max')
>>> print('2?', select_index([1, 2, 3, 2, 1]))
2? ...

Or, more readably, define the space and then chain:

>>> select_index = input_space >> dp.m.then_report_noisy_max_gumbel(scale=1.0, optimize='max')
>>> print('2?', select_index([1, 2, 3, 2, 1]))
2? ...

opendp.measurements.make_user_measurement(input_domain, input_metric, output_measure, function, privacy_map, TO='ExtrinsicObject')

Construct a Measurement from user-defined callbacks.

Required features: contrib, honest-but-curious

Why honest-but-curious?:

This constructor only returns a valid measurement if the following holds for every pair of elements \(x, x'\) in input_domain and every pair (d_in, d_out), where d_in has the associated type for input_metric and d_out has the associated type for output_measure: if \(x, x'\) are d_in-close under input_metric, privacy_map(d_in) does not raise an exception, and privacy_map(d_in) <= d_out, then function(x) and function(x') are d_out-close under output_measure.

In addition, function must not have side-effects, and privacy_map must be a pure function.

Supporting Elements:

  • Input Domain: AnyDomain

  • Output Type: AnyObject

  • Input Metric: AnyMetric

  • Output Measure: AnyMeasure

Parameters:
  • input_domain (Domain) – A domain describing the set of valid inputs for the function.

  • input_metric (Metric) – The metric from which distances between adjacent inputs are measured.

  • output_measure (Measure) – The measure from which distances between adjacent output distributions are measured.

  • function – A function mapping data from input_domain to a release of type TO.

  • privacy_map – A function mapping distances from input_metric to output_measure.

  • TO (Type Argument) – The data type of outputs from the function.

Return type:

Measurement

Raises:
  • TypeError – if an argument’s type differs from the expected type

  • UnknownTypeException – if a type argument fails to parse

  • OpenDPException – packaged error from the core OpenDP library

Example:

>>> dp.enable_features("contrib")
>>> def const_function(_arg):
...     return 42
>>> def privacy_map(_d_in):
...     return 0.
>>> space = dp.atom_domain(T=int), dp.absolute_distance(int)
>>> user_measurement = dp.m.make_user_measurement(
...     *space,
...     output_measure=dp.max_divergence(),
...     function=const_function,
...     privacy_map=privacy_map
... )
>>> print('42?', user_measurement(0))
42? 42

opendp.measurements.then_alp_queryable(scale, total_limit, value_limit=None, size_factor=50, alpha=4)

Partial constructor of make_alp_queryable.

See also

Delays application of input_domain and input_metric in opendp.measurements.make_alp_queryable()

Parameters:
  • scale (float) – Privacy loss parameter. This is equal to epsilon/sensitivity.

  • total_limit – Either the true value or an upper bound estimate of the sum of all values in the input.

  • value_limit – Upper bound on individual values (referred to as β). Entries above β are clamped.

  • size_factor – Optional multiplier (default of 50) for setting the size of the projection.

  • alpha – Optional parameter (default of 4) for scaling and determining p in randomized response step.
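The idea of delaying input_domain and input_metric can be pictured with a toy model. Everything in this sketch is hypothetical (the PartialConstructor class, make_noise, then_noise) and it is not the library's implementation; it only shows how a then_* function can capture the remaining arguments and complete construction when a (domain, metric) tuple is chained with >>.

```python
class PartialConstructor:
    """Hypothetical stand-in: holds a function awaiting (domain, metric)."""

    def __init__(self, finish):
        self.finish = finish

    def __rrshift__(self, space):
        # Called for `space >> partial`, since a tuple defines no `>>` itself.
        input_domain, input_metric = space
        return self.finish(input_domain, input_metric)


def make_noise(input_domain, input_metric, scale):
    # Toy "measurement": just records its arguments.
    return {"domain": input_domain, "metric": input_metric, "scale": scale}


def then_noise(scale):
    # Capture `scale` now; the domain and metric arrive later.
    return PartialConstructor(lambda d, m: make_noise(d, m, scale))


space = ("atom_domain", "absolute_distance")
print(space >> then_noise(scale=1.0))
```

Chaining this way yields the same object as calling the make_* form directly with the unpacked space.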

opendp.measurements.then_gaussian(scale, k=None, MO='ZeroConcentratedDivergence')

Partial constructor of make_gaussian.

See also

Delays application of input_domain and input_metric in opendp.measurements.make_gaussian()

Parameters:
  • scale (float) – Noise scale parameter for the Gaussian distribution. scale == standard_deviation.

  • k – The noise granularity in terms of 2^k.

  • MO (Type Argument) – Output Measure. The only valid measure is ZeroConcentratedDivergence.

Example:

>>> dp.enable_features('contrib')
>>> input_space = dp.atom_domain(T=float), dp.absolute_distance(T=float)
>>> gaussian = dp.m.make_gaussian(*input_space, scale=1.0)
>>> print('100?', gaussian(100.0))
100? ...

Or, more readably, define the space and then chain:

>>> gaussian = input_space >> dp.m.then_gaussian(scale=1.0)
>>> print('100?', gaussian(100.0))
100? ...

opendp.measurements.then_geometric(scale, bounds=None)

Partial constructor of make_geometric.

See also

Delays application of input_domain and input_metric in opendp.measurements.make_geometric()

Parameters:
  • scale (float) –

  • bounds

Example:

>>> dp.enable_features("contrib")
>>> input_space = dp.atom_domain(T=int), dp.absolute_distance(T=int)
>>> geometric = dp.m.make_geometric(*input_space, scale=1.0)
>>> print('100?', geometric(100))
100? ...

Or, more readably, define the space and then chain:

>>> geometric = input_space >> dp.m.then_geometric(scale=1.0)
>>> print('100?', geometric(100))
100? ...

opendp.measurements.then_laplace(scale, k=None)

Partial constructor of make_laplace.

See also

Delays application of input_domain and input_metric in opendp.measurements.make_laplace()

Parameters:
  • scale (float) – Noise scale parameter for the Laplace distribution. scale == standard_deviation / sqrt(2).

  • k – The noise granularity in terms of 2^k, only valid for domains over floats.

Example:

>>> import opendp.prelude as dp
>>> dp.enable_features("contrib")
>>> input_space = dp.atom_domain(T=float), dp.absolute_distance(T=float)
>>> laplace = dp.m.make_laplace(*input_space, scale=1.0)
>>> print('100?', laplace(100.0))
100? ...

Or, more readably, define the space and then chain:

>>> laplace = input_space >> dp.m.then_laplace(scale=1.0)
>>> print('100?', laplace(100.0))
100? ...

opendp.measurements.then_laplace_threshold(scale, threshold, k=-1074)

Partial constructor of make_laplace_threshold.

See also

Delays application of input_domain and input_metric in opendp.measurements.make_laplace_threshold()

Parameters:
  • scale (float) – Noise scale parameter for the Laplace distribution. scale == standard_deviation / sqrt(2).

  • threshold – Exclude counts that are less than this minimum value.

  • k (int) – The noise granularity in terms of 2^k.

opendp.measurements.then_private_expr(output_measure, expr, global_scale=None)

Partial constructor of make_private_expr.

See also

Delays application of input_domain and input_metric in opendp.measurements.make_private_expr()

Parameters:
  • output_measure (Measure) – How to measure privacy loss.

  • expr – The Expr to be privatized.

  • global_scale – A tunable parameter that affects the privacy-utility tradeoff.

opendp.measurements.then_private_lazyframe(output_measure, lazyframe, global_scale=None, threshold=None)

Partial constructor of make_private_lazyframe.

See also

Delays application of input_domain and input_metric in opendp.measurements.make_private_lazyframe()

Parameters:
  • output_measure (Measure) – How to measure privacy loss.

  • lazyframe – A description of the computations to be run, in the form of a LazyFrame.

  • global_scale – Optional. A tunable parameter that affects the privacy-utility tradeoff.

  • threshold – Optional. Minimum number of rows in each released partition.

Example:

>>> dp.enable_features("contrib")
>>> import polars as pl

We’ll imagine an elementary school is taking a pet census. The private census data will have two columns:

>>> lf_domain = dp.lazyframe_domain([
...     dp.series_domain("grade", dp.atom_domain(T=dp.i32)),
...     dp.series_domain("pet_count", dp.atom_domain(T=dp.i32))])

We also need to specify the column we’ll be grouping by.

>>> lf_domain_with_margin = dp.with_margin(
...     lf_domain,
...     by=["grade"],
...     public_info="keys",
...     max_partition_length=50)

With that in place, we can plan the Polars computation, using the dp plugin.

>>> plan = (
...     pl.LazyFrame(schema={'grade': pl.Int32, 'pet_count': pl.Int32})
...     .group_by("grade")
...     .agg(pl.col("pet_count").dp.sum((0, 10), scale=1.0)))

We now have all the pieces to make our measurement function using make_private_lazyframe:

>>> dp_sum_pets_by_grade = dp.m.make_private_lazyframe(
...     input_domain=lf_domain_with_margin,
...     input_metric=dp.symmetric_distance(),
...     output_measure=dp.max_divergence(),
...     lazyframe=plan,
...     global_scale=1.0)

It’s only at this point that we need to introduce the private data.

>>> df = pl.from_records(
...     [
...         [0, 0], # No kindergarteners with pets.
...         [0, 0],
...         [0, 0],
...         [1, 1], # Each first grader has 1 pet.
...         [1, 1],
...         [1, 1],
...         [2, 1], # One second grader has chickens!
...         [2, 1],
...         [2, 9]
...     ],
...     schema=['grade', 'pet_count'], orient="row")
>>> lf = pl.LazyFrame(df)
>>> results = dp_sum_pets_by_grade(lf).collect()
>>> print(results.sort("grade")) 
shape: (3, 2)
┌───────┬───────────┐
│ grade ┆ pet_count │
│ ---   ┆ ---       │
│ i64   ┆ i64       │
╞═══════╪═══════════╡
│ 0     ┆ ...       │
│ 1     ┆ ...       │
│ 2     ┆ ...       │
└───────┴───────────┘

opendp.measurements.then_randomized_response_bitvec(f, constant_time=False)

Partial constructor of make_randomized_response_bitvec.

See also

Delays application of input_domain and input_metric in opendp.measurements.make_randomized_response_bitvec()

Parameters:
  • f (float) – Per-bit flipping probability. Must be in $(0, 1]$.

  • constant_time (bool) – Whether to run the Bernoulli samplers in constant time. This is likely to be extremely slow.

Example:

>>> import numpy as np
>>> import opendp.prelude as dp
>>> dp.enable_features("contrib")
>>> # Create the randomized response mechanism
>>> m_rr = dp.m.make_randomized_response_bitvec(
...     dp.bitvector_domain(max_weight=4), dp.discrete_distance(), f=0.95
... )
>>> # compute privacy loss
>>> m_rr.map(1)
0.8006676684558611
>>> # formula is 2 * m * ln((2 - f) / f)
>>> # where m = 4 (the weight) and f = .95 (the flipping probability)
>>> # prepare a dataset to release, by encoding a bit vector as a numpy byte array
>>> data = np.packbits(
...     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
... )
>>> assert np.array_equal(data, np.array([0, 8, 12], dtype=np.uint8))
>>> # roundtrip: numpy -> bytes -> mech -> bytes -> numpy
>>> release = np.frombuffer(m_rr(data.tobytes()), dtype=np.uint8)
>>> # compare the two bit vectors:
>>> [int(bit) for bit in np.unpackbits(data)]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
>>> [int(bit) for bit in np.unpackbits(release)]
[...]

opendp.measurements.then_report_noisy_max_gumbel(scale, optimize)

Partial constructor of make_report_noisy_max_gumbel.

See also

Delays application of input_domain and input_metric in opendp.measurements.make_report_noisy_max_gumbel()

Parameters:
  • scale (float) – Higher scales are more private.

  • optimize (str) – Indicate whether to privately return the “max” or “min”.

Example:

>>> dp.enable_features("contrib")
>>> input_space = dp.vector_domain(dp.atom_domain(T=int)), dp.linf_distance(T=int)
>>> select_index = dp.m.make_report_noisy_max_gumbel(*input_space, scale=1.0, optimize='max')
>>> print('2?', select_index([1, 2, 3, 2, 1]))
2? ...

Or, more readably, define the space and then chain:

>>> select_index = input_space >> dp.m.then_report_noisy_max_gumbel(scale=1.0, optimize='max')
>>> print('2?', select_index([1, 2, 3, 2, 1]))
2? ...

opendp.measurements.then_user_measurement(output_measure, function, privacy_map, TO='ExtrinsicObject')

Partial constructor of make_user_measurement.

See also

Delays application of input_domain and input_metric in opendp.measurements.make_user_measurement()

Parameters:
  • output_measure (Measure) – The measure from which distances between adjacent output distributions are measured.

  • function – A function mapping data from input_domain to a release of type TO.

  • privacy_map – A function mapping distances from input_metric to output_measure.

  • TO (Type Argument) – The data type of outputs from the function.

Example:

>>> dp.enable_features("contrib")
>>> def const_function(_arg):
...     return 42
>>> def privacy_map(_d_in):
...     return 0.
>>> space = dp.atom_domain(T=int), dp.absolute_distance(int)
>>> user_measurement = dp.m.make_user_measurement(
...     *space,
...     output_measure=dp.max_divergence(),
...     function=const_function,
...     privacy_map=privacy_map
... )
>>> print('42?', user_measurement(0))
42? 42