opendp.measurements module
The measurements module provides functions that apply calibrated noise to data to ensure differential privacy.
For more context, see measurements in the User Guide.
For convenience, all the functions of this module are also available from opendp.prelude.
We suggest importing under the conventional name dp:
>>> import opendp.prelude as dp
The methods of this module will then be accessible at dp.m.
- opendp.measurements.debias_randomized_response_bitvec(answers, f)
Convert a vector of randomized response bitvec responses to a frequency estimate.
Computes the sum of the answers into a \(k\)-length vector \(Y\) and returns \(Y\frac{Y-\frac{f}{2}}{1-f}\)
Required features: contrib
debias_randomized_response_bitvec in Rust documentation.
- Parameters:
answers – A vector of BitVectors with consistent size
f (float) – The per-bit flipping probability used to encode answers
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
- opendp.measurements.make_alp_queryable(input_domain, input_metric, scale, total_limit, value_limit=None, size_factor=50, alpha=4)
Measurement to release a queryable containing a DP projection of bounded sparse data.
The size of the projection is O(total * size_factor * scale / alpha). The evaluation time of post-processing is O(beta * scale / alpha).
size_factor is an optional multiplier (defaults to 50) for setting the size of the projection. There is a memory/utility trade-off. The value should be sufficiently large to limit hash collisions.
Required features: contrib
make_alp_queryable in Rust documentation.
Citations:
ALP21 Differentially Private Sparse Vectors with Low Error, Optimal Space, and Fast Access, Algorithm 4
Supporting Elements:
Input Domain: MapDomain<AtomDomain<K>, AtomDomain<CI>>
Output Type: Queryable<K, f64>
Input Metric: L01InfDistance<AbsoluteDistance<CI>>
Output Measure: MaxDivergence
- Parameters:
input_domain (Domain) – Domain of input data
input_metric (Metric) – Metric on input domain
scale (float) – Privacy loss parameter. This is equal to epsilon/sensitivity.
total_limit – Either the true value or an upper bound estimate of the sum of all values in the input.
value_limit – Upper bound on individual values (referred to as β). Entries above β are clamped.
size_factor – Optional multiplier (default of 50) for setting the size of the projection.
alpha – Optional parameter (default of 4) for scaling and determining p in randomized response step.
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
- Return type: Measurement
- opendp.measurements.make_canonical_noise(input_domain, input_metric, d_in, d_out)
Make a Measurement that adds noise from a canonical noise distribution. The implementation is tailored towards approximate-DP, resulting in noise sampled from the Tulap distribution.
Required features: contrib
make_canonical_noise in Rust documentation.
Citations:
Supporting Elements:
Input Domain: AtomDomain<f64>
Output Type: f64
Input Metric: AbsoluteDistance<f64>
Output Measure: Approximate<MaxDivergence>
Proof Definition:
- Parameters:
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
- Return type: Measurement
- opendp.measurements.make_gaussian(input_domain, input_metric, scale, k=None, MO='ZeroConcentratedDivergence')
Make a Measurement that adds noise from the Gaussian(scale) distribution to the input.
Valid inputs for input_domain and input_metric are:
input_domain                    input type    input_metric
atom_domain(T)                  T             absolute_distance(QI)
vector_domain(atom_domain(T))   Vec<T>        l2_distance(QI)
Required features: contrib
make_gaussian in Rust documentation.
Supporting Elements:
Input Domain: DI
Output Type: DI::Carrier
Input Metric: MI
Output Measure: MO
Proof Definition:
- Parameters:
input_domain (Domain) – Domain of the data type to be privatized.
input_metric (Metric) – Metric of the data type to be privatized.
scale (float) – Noise scale parameter for the Gaussian distribution. scale == standard_deviation.
k – The noise granularity in terms of 2^k.
MO (Type Argument) – Output Measure. The only valid measure is ZeroConcentratedDivergence.
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
- Return type: Measurement
- Example:
>>> dp.enable_features('contrib')
>>> input_space = dp.atom_domain(T=float, nan=False), dp.absolute_distance(T=float)
>>> gaussian = dp.m.make_gaussian(*input_space, scale=1.0)
>>> print('100?', gaussian(100.0))
100? ...
Or, more readably, define the space and then chain:
>>> gaussian = input_space >> dp.m.then_gaussian(scale=1.0)
>>> print('100?', gaussian(100.0))
100? ...
- opendp.measurements.make_gaussian_threshold(input_domain, input_metric, scale, threshold, k=None, MO='Approximate<ZeroConcentratedDivergence>')
Make a Measurement that uses propose-test-release to privatize a hashmap of counts.
This function takes a noise granularity in terms of 2^k. Larger granularities are more computationally efficient, but have a looser privacy map. If k is not set, k defaults to the smallest granularity.
Required features: contrib
make_gaussian_threshold in Rust documentation.
Supporting Elements:
Input Domain: DI
Output Type: DI::Carrier
Input Metric: MI
Output Measure: MO
Proof Definition:
- Parameters:
input_domain (Domain) – Domain of the input.
input_metric (Metric) – Metric for the input domain.
scale (float) – Noise scale parameter for the Gaussian distribution. scale == standard_deviation.
threshold – Exclude pairs with values whose distance from zero exceeds this value.
k – The noise granularity in terms of 2^k.
MO (Type Argument) – Output Measure.
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
- Return type: Measurement
- opendp.measurements.make_geometric(input_domain, input_metric, scale, bounds=None, MO='MaxDivergence')
Equivalent to make_laplace but restricted to an integer support. Can specify bounds to run the algorithm in near constant-time.
Required features: contrib
make_geometric in Rust documentation.
Citations:
Supporting Elements:
Input Domain: DI
Output Type: DI::Carrier
Input Metric: MI
Output Measure: MO
Proof Definition:
- Parameters:
input_domain (Domain) – Domain of the data type to be privatized.
input_metric (Metric) – Metric of the data type to be privatized.
scale (float) – Noise scale parameter for the distribution. scale == standard_deviation / sqrt(2).
bounds – Set bounds on the count to make the algorithm run in constant-time.
MO (Type Argument) – Measure used to quantify privacy loss. The only valid value is MaxDivergence.
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
- Return type: Measurement
- Example:
>>> dp.enable_features("contrib")
>>> input_space = dp.atom_domain(T=int), dp.absolute_distance(T=int)
>>> geometric = dp.m.make_geometric(*input_space, scale=1.0)
>>> print('100?', geometric(100))
100? ...
Or, more readably, define the space and then chain:
>>> geometric = input_space >> dp.m.then_geometric(scale=1.0)
>>> print('100?', geometric(100))
100? ...
- opendp.measurements.make_laplace(input_domain, input_metric, scale, k=None, MO='MaxDivergence')
Make a Measurement that adds noise from the Laplace(scale) distribution to the input.
Valid inputs for input_domain and input_metric are:
input_domain                    input type    input_metric
atom_domain(T) (default)        T             absolute_distance(T)
vector_domain(atom_domain(T))   Vec<T>        l1_distance(T)
Internally, all sampling is done using the discrete Laplace distribution.
Required features: contrib
make_laplace in Rust documentation.
Citations:
Supporting Elements:
Input Domain: DI
Output Type: DI::Carrier
Input Metric: MI
Output Measure: MO
Proof Definition:
- Parameters:
input_domain (Domain) – Domain of the data type to be privatized.
input_metric (Metric) – Metric of the data type to be privatized.
scale (float) – Noise scale parameter for the Laplace distribution. scale == standard_deviation / sqrt(2).
k – The noise granularity in terms of 2^k, only valid for domains over floats.
MO (Type Argument) – Measure used to quantify privacy loss. The only valid value is MaxDivergence.
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
- Return type: Measurement
- Example:
>>> import opendp.prelude as dp
>>> dp.enable_features("contrib")
>>> input_space = dp.atom_domain(T=float, nan=False), dp.absolute_distance(T=float)
>>> laplace = dp.m.make_laplace(*input_space, scale=1.0)
>>> print('100?', laplace(100.0))
100? ...
Or, more readably, define the space and then chain:
>>> laplace = input_space >> dp.m.then_laplace(scale=1.0)
>>> print('100?', laplace(100.0))
100? ...
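The relationship scale == standard_deviation / sqrt(2) stated for make_laplace can be checked numerically. The following is a minimal sketch in plain Python, not OpenDP's sampler (which uses the discrete Laplace distribution internally); the helper names laplace_scale_from_std and sample_laplace are hypothetical and exist only for this illustration.

```python
import math
import random
import statistics

def laplace_scale_from_std(standard_deviation):
    """Convert a target standard deviation into a Laplace scale parameter."""
    return standard_deviation / math.sqrt(2)

def sample_laplace(scale, rng):
    """Draw one continuous Laplace(0, scale) sample.

    The difference of two independent exponentials with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

rng = random.Random(0)
scale = laplace_scale_from_std(2.0)  # target standard deviation of 2
samples = [sample_laplace(scale, rng) for _ in range(200_000)]
empirical_std = statistics.pstdev(samples)  # should be close to 2
```

In practice this means that to target a given standard deviation with make_laplace, the value passed as scale is that standard deviation divided by sqrt(2).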
- opendp.measurements.make_laplace_threshold(input_domain, input_metric, scale, threshold, k=None, MO='Approximate<MaxDivergence>')
Make a Measurement that uses propose-test-release to privatize a hashmap of counts.
This function takes a noise granularity in terms of 2^k. Larger granularities are more computationally efficient, but have a looser privacy map. If k is not set, k defaults to the smallest granularity.
Required features: contrib
make_laplace_threshold in Rust documentation.
Supporting Elements:
Input Domain: DI
Output Type: DI::Carrier
Input Metric: MI
Output Measure: MO
Proof Definition:
- Parameters:
input_domain (Domain) – Domain of the input.
input_metric (Metric) – Metric for the input domain.
scale (float) – Noise scale parameter for the Laplace distribution. scale == standard_deviation / sqrt(2).
threshold – Exclude counts that are less than this minimum value.
k – The noise granularity in terms of 2^k.
MO (Type Argument) – Output Measure.
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
- Return type: Measurement
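The thresholding step and the 2^k granularity described for make_laplace_threshold can be pictured in plain Python. This is only a sketch under simplifying assumptions: it uses continuous Laplace noise, rounds the noisy count to the nearest multiple of 2^k, and makes no privacy claim. The names round_to_granularity and noisy_threshold are hypothetical, not OpenDP functions.

```python
import random

def round_to_granularity(value, k):
    """Round value to the nearest multiple of 2^k."""
    step = 2.0 ** k
    return round(value / step) * step

def noisy_threshold(counts, scale, threshold, k, rng):
    """Add Laplace noise to each count and keep only noisy counts >= threshold."""
    released = {}
    for key, count in counts.items():
        # Difference of two exponentials with mean `scale` is Laplace(0, scale).
        noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
        noisy = round_to_granularity(count + noise, k)
        if noisy >= threshold:
            released[key] = noisy
    return released

rng = random.Random(1)
counts = {"a": 100, "b": 2, "c": 57}
# A tiny scale keeps the illustration effectively deterministic.
release = noisy_threshold(counts, scale=1e-6, threshold=10, k=-2, rng=rng)
```

With a realistic scale, keys whose true counts sit near the threshold may or may not appear in the release; keys far below it are dropped with high probability.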
- opendp.measurements.make_noise(input_domain, input_metric, output_measure, scale, k=None)
Make a Measurement that adds noise from the appropriate distribution to the input.
Valid inputs for input_domain and input_metric are:
input_domain                    input type    input_metric
atom_domain(T)                  T             absolute_distance(QI)
vector_domain(atom_domain(T))   Vec<T>        l2_distance(QI)
Required features: contrib
make_noise in Rust documentation.
Supporting Elements:
Input Domain: DI
Output Type: DI::Carrier
Input Metric: MI
Output Measure: MO
- Parameters:
input_domain (Domain) – Domain of the data type to be privatized.
input_metric (Metric) – Metric of the data type to be privatized.
output_measure (Measure) – Privacy measure. Either MaxDivergence or ZeroConcentratedDivergence.
scale (float) – Noise scale parameter.
k – The noise granularity in terms of 2^k.
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
- Return type: Measurement
- opendp.measurements.make_noise_threshold(input_domain, input_metric, output_measure, scale, threshold, k=None)
Make a Measurement that uses propose-test-release to privatize a hashmap of counts.
This function takes a noise granularity in terms of 2^k. Larger granularities are more computationally efficient, but have a looser privacy map. If k is not set, k defaults to the smallest granularity.
Required features: contrib
make_noise_threshold in Rust documentation.
Supporting Elements:
Input Domain: DI
Output Type: DI::Carrier
Input Metric: MI
Output Measure: Approximate<MO>
- Parameters:
input_domain (Domain) – Domain of the input.
input_metric (Metric) – Metric for the input domain.
output_measure (Measure) – Privacy measure. Either MaxDivergence or ZeroConcentratedDivergence.
scale (float) – Noise scale parameter for the Laplace distribution. scale == standard_deviation / sqrt(2).
threshold – Exclude counts that are less than this minimum value.
k – The noise granularity in terms of 2^k.
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
- Return type: Measurement
- opendp.measurements.make_noisy_max(input_domain, input_metric, output_measure, scale, negate=False)
Make a Measurement that takes a vector of scores and privately selects the index of the highest score.
Required features: contrib
make_noisy_max in Rust documentation.
Supporting Elements:
Input Domain: VectorDomain<AtomDomain<TIA>>
Output Type: usize
Input Metric: LInfDistance<TIA>
Output Measure: MO
- Parameters:
input_domain (Domain) – Domain of the input vector. Must be a non-nullable VectorDomain.
input_metric (Metric) – Metric on the input domain. Must be LInfDistance.
output_measure (Measure) – One of MaxDivergence or ZeroConcentratedDivergence.
scale (float) – Scale for the noise distribution.
negate (bool) – Set to true to select the index of the lowest score instead.
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
- Return type: Measurement
- Example:
>>> dp.enable_features("contrib")
>>> input_space = dp.vector_domain(dp.atom_domain(T=int)), dp.linf_distance(T=int)
>>> select_index = dp.m.make_noisy_max(*input_space, dp.max_divergence(), scale=1.0)
>>> print('2?', select_index([1, 2, 3, 2, 1]))
2? ...
Or, more readably, define the space and then chain:
>>> select_index = input_space >> dp.m.then_noisy_max(dp.max_divergence(), scale=1.0)
>>> print('2?', select_index([1, 2, 3, 2, 1]))
2? ...
- opendp.measurements.make_noisy_top_k(input_domain, input_metric, output_measure, k, scale, negate=False)
Make a Measurement that takes a vector of scores and privately selects the indices of the k highest scores.
Required features: contrib
make_noisy_top_k in Rust documentation.
Supporting Elements:
Input Domain: VectorDomain<AtomDomain<TIA>>
Output Type: Vec<usize>
Input Metric: LInfDistance<TIA>
Output Measure: MO
Proof Definition:
- Parameters:
input_domain (Domain) – Domain of the input vector. Must be a non-nullable VectorDomain.
input_metric (Metric) – Metric on the input domain. Must be LInfDistance.
output_measure (Measure) – One of MaxDivergence or ZeroConcentratedDivergence.
k (int) – Number of indices to select.
scale (float) – Scale for the noise distribution.
negate (bool) – Set to true to return the bottom k instead.
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
- Return type: Measurement
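The idea behind noisy top-k selection can be sketched in plain Python: perturb every score, then report the indices of the k largest noisy scores. The sketch below assumes Gumbel noise for illustration; the noise distribution OpenDP actually uses depends on output_measure and is not reproduced here. The names sample_gumbel and noisy_top_k_sketch are hypothetical, and the sketch carries no privacy guarantee.

```python
import math
import random

def sample_gumbel(scale, rng):
    """Draw one Gumbel(0, scale) sample by inverse transform sampling."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -scale * math.log(-math.log(u))

def noisy_top_k_sketch(scores, k, scale, negate=False, rng=None):
    """Return the indices of the k largest noisy scores (smallest if negate)."""
    rng = rng or random.Random()
    sign = -1.0 if negate else 1.0
    noisy = [sign * score + sample_gumbel(scale, rng) for score in scores]
    order = sorted(range(len(scores)), key=lambda i: noisy[i], reverse=True)
    return order[:k]

rng = random.Random(0)
scores = [1, 2, 3, 2, 1]
# A tiny scale makes the selection effectively deterministic.
top_1 = noisy_top_k_sketch(scores, k=1, scale=1e-9, rng=rng)
bottom_2 = noisy_top_k_sketch(scores, k=2, scale=1e-9, negate=True, rng=rng)
```

With k=1 this reduces to selecting a single index, as make_noisy_max does; larger scales make indices of lower-scoring candidates increasingly likely to be reported.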
- opendp.measurements.make_private_expr(input_domain, input_metric, output_measure, expr, global_scale=None)
Create a differentially private measurement from an Expr.
Required features: contrib, honest-but-curious
make_private_expr in Rust documentation.
Why honest-but-curious?:
The privacy guarantee covers at most one evaluation of the released expression.
Supporting Elements:
Input Domain: WildExprDomain
Output Type: ExprPlan
Input Metric: MI
Output Measure: MO
- Parameters:
input_domain (Domain) – The domain of the input data.
input_metric (Metric) – How to measure distances between neighboring input data sets.
output_measure (Measure) – How to measure privacy loss.
expr – The Expr to be privatized.
global_scale – A tune-able parameter that affects the privacy-utility tradeoff.
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
- Return type: Measurement
- opendp.measurements.make_private_lazyframe(input_domain, input_metric, output_measure, lazyframe, global_scale=None, threshold=None)
Create a differentially private measurement from a LazyFrame.
Any data inside the LazyFrame is ignored, but it is still recommended to start with an empty DataFrame and build up the computation using the LazyFrame API.
Required features: contrib
make_private_lazyframe in Rust documentation.
Supporting Elements:
Input Domain: LazyFrameDomain
Output Type: OnceFrame
Input Metric: MI
Output Measure: MO
- Parameters:
input_domain (Domain) – The domain of the input data.
input_metric (Metric) – How to measure distances between neighboring input data sets.
output_measure (Measure) – How to measure privacy loss.
lazyframe – A description of the computations to be run, in the form of a LazyFrame.
global_scale – Optional. A tune-able parameter that affects the privacy-utility tradeoff.
threshold – Optional. Minimum number of rows in each released group.
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
- Return type: Measurement
- Example:
>>> dp.enable_features("contrib")
>>> import polars as pl
We’ll imagine an elementary school is taking a pet census. The private census data will have two columns:
>>> lf_domain = dp.lazyframe_domain([
...     dp.series_domain("grade", dp.atom_domain(T=dp.i32)),
...     dp.series_domain("pet_count", dp.atom_domain(T=dp.i32))])
We also need to specify the column we’ll be grouping by.
>>> lf_domain_with_margin = dp.with_margin(
...     lf_domain,
...     dp.polars.Margin(
...         by=[pl.col("grade")],
...         invariant="keys",
...         max_length=50))
With that in place, we can plan the Polars computation, using the dp plugin.
>>> plan = (
...     pl.LazyFrame(schema={'grade': pl.Int32, 'pet_count': pl.Int32})
...     .group_by("grade")
...     .agg(pl.col("pet_count").dp.sum((0, 10), scale=1.0)))
We now have all the pieces to make our measurement function using make_private_lazyframe:
>>> dp_sum_pets_by_grade = dp.m.make_private_lazyframe(
...     input_domain=lf_domain_with_margin,
...     input_metric=dp.symmetric_distance(),
...     output_measure=dp.max_divergence(),
...     lazyframe=plan,
...     global_scale=1.0)
It’s only at this point that we need to introduce the private data.
>>> df = pl.from_records(
...     [
...         [0, 0], # No kindergarteners with pets.
...         [0, 0],
...         [0, 0],
...         [1, 1], # Each first grader has 1 pet.
...         [1, 1],
...         [1, 1],
...         [2, 1], # One second grader has chickens!
...         [2, 1],
...         [2, 9]
...     ],
...     schema=['grade', 'pet_count'], orient="row")
>>> lf = pl.LazyFrame(df)
>>> results = dp_sum_pets_by_grade(lf).collect()
>>> print(results.sort("grade"))
shape: (3, 2)
┌───────┬───────────┐
│ grade ┆ pet_count │
│ ---   ┆ ---       │
│ i64   ┆ i64       │
╞═══════╪═══════════╡
│ 0     ┆ ...       │
│ 1     ┆ ...       │
│ 2     ┆ ...       │
└───────┴───────────┘
- opendp.measurements.make_private_quantile(input_domain, input_metric, output_measure, candidates, alpha, scale)
Make a Measurement that computes a quantile of a dataset.
Required features: contrib
make_private_quantile in Rust documentation.
Supporting Elements:
Input Domain: VectorDomain<AtomDomain<T>>
Output Type: T
Input Metric: MI
Output Measure: MO
- Parameters:
input_domain (Domain) – Uses a tighter sensitivity when the size of vectors in the input domain is known.
input_metric (Metric) – Either SymmetricDistance or InsertDeleteDistance.
output_measure (Measure) – Either MaxDivergence or ZeroConcentratedDivergence.
candidates – Potential quantiles to score.
alpha (float) – A value in \([0, 1]\). Choose 0.5 for the median.
scale (float) – The scale of the noise added.
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
- Return type: Measurement
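To make the roles of candidates, alpha and scale more concrete, here is a rough plain-Python sketch of candidate-based quantile selection. The scoring rule (distance between a candidate's rank and the target rank alpha * n) and the use of Gumbel noise are assumptions chosen for illustration; they are not taken from OpenDP's implementation, and the function names are hypothetical.

```python
import math
import random

def score_candidates(data, candidates, alpha):
    """Score each candidate by how far its rank is from alpha * n (lower is better)."""
    n = len(data)
    return [abs(sum(x < c for x in data) - alpha * n) for c in candidates]

def sample_gumbel(scale, rng):
    """Draw one Gumbel(0, scale) sample by inverse transform sampling."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -scale * math.log(-math.log(u))

def private_quantile_sketch(data, candidates, alpha, scale, rng):
    """Pick the candidate whose negated score plus noise is largest."""
    scores = score_candidates(data, candidates, alpha)
    noisy = [-score + sample_gumbel(scale, rng) for score in scores]
    best = max(range(len(candidates)), key=lambda i: noisy[i])
    return candidates[best]

rng = random.Random(0)
data = list(range(1, 10))  # 1, 2, ..., 9
candidates = [0, 2.5, 5, 7.5, 10]
# A tiny scale makes the selection effectively deterministic.
median_estimate = private_quantile_sketch(data, candidates, alpha=0.5, scale=1e-9, rng=rng)
```

The released value is always one of the candidates, so the spacing of the candidate grid bounds how precise the estimate can be.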
- opendp.measurements.make_randomized_response(categories, prob, T=None)
Make a Measurement that implements randomized response on a categorical value.
Required features: contrib
make_randomized_response in Rust documentation.
Supporting Elements:
Input Domain: AtomDomain<T>
Output Type: T
Input Metric: DiscreteDistance
Output Measure: MaxDivergence
Proof Definition:
- Parameters:
categories – Set of valid outcomes
prob (float) – Probability of returning the correct answer. Must be in [1/num_categories, 1]
T (Type Argument) – Data type of a category.
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
- Return type: Measurement
- Example:
>>> dp.enable_features("contrib")
>>> random_string = dp.m.make_randomized_response(['a', 'b', 'c'], 0.99)
>>> print('a?', random_string('a'))
a? ...
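The constraint that prob lies in [1/num_categories, 1] can be motivated with a small plain-Python model of categorical randomized response. The sketch assumes that a wrong answer is drawn uniformly from the other categories, which gives a privacy loss of ln(prob / ((1 - prob) / (t - 1))) for t categories; both function names are hypothetical and this is not OpenDP's code.

```python
import math
import random

def randomized_response(value, categories, prob, rng):
    """Return value with probability prob, else a uniformly chosen other category."""
    if rng.random() < prob:
        return value
    others = [c for c in categories if c != value]
    return rng.choice(others)

def randomized_response_epsilon(num_categories, prob):
    """Log-ratio between the truthful and any single untruthful output probability."""
    return math.log(prob / (1 - prob) * (num_categories - 1))

epsilon_high = randomized_response_epsilon(3, 0.99)      # ln(198)
epsilon_uniform = randomized_response_epsilon(3, 1 / 3)  # approximately 0
```

At prob = 1/num_categories every output is equally likely and the privacy loss is zero; any smaller prob would make the truthful answer the least likely one.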
- opendp.measurements.make_randomized_response_bitvec(input_domain, input_metric, f, constant_time=False)
Make a Measurement that implements randomized response on a bit vector.
This primitive can be useful for implementing RAPPOR.
Required features: contrib
make_randomized_response_bitvec in Rust documentation.
Citations:
Supporting Elements:
Input Domain: BitVectorDomain
Output Type: BitVector
Input Metric: DiscreteDistance
Output Measure: MaxDivergence
Proof Definition:
- Parameters:
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
- Return type: Measurement
- Example:
>>> import numpy as np
>>> import opendp.prelude as dp
>>> dp.enable_features("contrib")
>>> # Create the randomized response mechanism
>>> m_rr = dp.m.make_randomized_response_bitvec(
...     dp.bitvector_domain(max_weight=4), dp.discrete_distance(), f=0.95
... )
>>> # compute privacy loss
>>> m_rr.map(1)
0.8006676684558611
>>> # formula is 2 * m * ln((2 - f) / f)
>>> # where m = 4 (the weight) and f = .95 (the flipping probability)
>>> # prepare a dataset to release, by encoding a bit vector as a numpy byte array
>>> data = np.packbits(
...     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
... )
>>> assert np.array_equal(data, np.array([0, 8, 12], dtype=np.uint8))
>>> # roundtrip: numpy -> bytes -> mech -> bytes -> numpy
>>> release = np.frombuffer(m_rr(data.tobytes()), dtype=np.uint8)
>>> # compare the two bit vectors:
>>> [int(bit) for bit in np.unpackbits(data)]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
>>> [int(bit) for bit in np.unpackbits(release)]
[...]
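The privacy-loss formula quoted in the example comments, 2 * m * ln((2 - f) / f), can be evaluated directly. The snippet below is a plain-Python check of that arithmetic and does not call OpenDP; rr_bitvec_epsilon is a hypothetical helper name.

```python
import math

def rr_bitvec_epsilon(max_weight, f):
    """Evaluate 2 * m * ln((2 - f) / f) for weight m and flip probability f."""
    return 2 * max_weight * math.log((2 - f) / f)

# Same parameters as the example: m = 4, f = 0.95.
epsilon = rr_bitvec_epsilon(max_weight=4, f=0.95)
```

The result agrees with the value printed by m_rr.map(1) in the example up to floating-point rounding. Smaller f (less flipping) or larger max_weight both increase the privacy loss.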
- opendp.measurements.make_randomized_response_bool(prob, constant_time=False)
Make a Measurement that implements randomized response on a boolean value.
Required features: contrib
make_randomized_response_bool in Rust documentation.
Supporting Elements:
Input Domain: AtomDomain<bool>
Output Type: bool
Input Metric: DiscreteDistance
Output Measure: MaxDivergence
Proof Definition:
- Parameters:
prob (float) – Probability of returning the correct answer. Must be in [0.5, 1]
constant_time (bool) – Set to true to enable constant time. Slower.
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
- Return type: Measurement
- Example:
>>> dp.enable_features("contrib")
>>> random_bool = dp.m.make_randomized_response_bool(0.99)
>>> print('True?', random_bool(True))
True? ...
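Because each boolean response is truthful only with probability prob, the raw proportion of True answers is biased. A common post-processing step, sketched here in plain Python with hypothetical function names, inverts the expected value E[noisy] = (1 - prob) + (2 * prob - 1) * true_mean. This is an illustration of the idea, not an OpenDP function.

```python
import random

def randomized_response_bool(value, prob, rng):
    """Return value with probability prob, otherwise its negation."""
    return value if rng.random() < prob else not value

def debias_mean(noisy_mean, prob):
    """Invert E[noisy] = (1 - prob) + (2 * prob - 1) * true_mean."""
    return (noisy_mean - (1 - prob)) / (2 * prob - 1)

rng = random.Random(42)
prob = 0.75
data = [True] * 30_000 + [False] * 70_000  # true proportion is 0.3
noisy = [randomized_response_bool(value, prob, rng) for value in data]
noisy_mean = sum(noisy) / len(noisy)
estimate = debias_mean(noisy_mean, prob)   # should be close to 0.3
```

The correction is undefined at prob = 0.5, where responses carry no information about the input.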
- opendp.measurements.make_report_noisy_max_gumbel(input_domain, input_metric, scale, optimize='max')
Make a Measurement that takes a vector of scores and privately selects the index of the highest score.
Required features: contrib
make_report_noisy_max_gumbel in Rust documentation.
Supporting Elements:
Input Domain: VectorDomain<AtomDomain<TIA>>
Output Type: usize
Input Metric: LInfDistance<TIA>
Output Measure: MaxDivergence
- Parameters:
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
- Return type: Measurement
Deprecated since version 0.14.0: use make_noisy_max instead
- opendp.measurements.make_user_measurement(input_domain, input_metric, output_measure, function, privacy_map, TO='ExtrinsicObject')
Construct a Measurement from user-defined callbacks.
Required features: contrib, honest-but-curious
Why honest-but-curious?:
This constructor only returns a valid measurement if for every pair of elements \(x, x'\) in input_domain, and for every pair (d_in, d_out), where d_in has the associated type for input_metric and d_out has the associated type for output_measure, if \(x, x'\) are d_in-close under input_metric, privacy_map(d_in) does not raise an exception, and privacy_map(d_in) <= d_out, then function(x), function(x') are d_out-close under output_measure.
In addition, function must not have side-effects, and privacy_map must be a pure function.
Supporting Elements:
Input Domain: AnyDomain
Output Type: AnyObject
Input Metric: AnyMetric
Output Measure: AnyMeasure
- Parameters:
input_domain (Domain) – A domain describing the set of valid inputs for the function.
input_metric (Metric) – The metric from which distances between adjacent inputs are measured.
output_measure (Measure) – The measure from which distances between adjacent output distributions are measured.
function – A function mapping data from input_domain to a release of type TO.
privacy_map – A function mapping distances from input_metric to output_measure.
TO (Type Argument) – The data type of outputs from the function.
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
- Return type: Measurement
- Example:
>>> dp.enable_features("contrib")
>>> def const_function(_arg):
...     return 42
>>> def privacy_map(_d_in):
...     return 0.
>>> space = dp.atom_domain(T=int), dp.absolute_distance(int)
>>> user_measurement = dp.m.make_user_measurement(
...     *space,
...     output_measure=dp.max_divergence(),
...     function=const_function,
...     privacy_map=privacy_map
... )
>>> print('42?', user_measurement(0))
42? 42
- opendp.measurements.then_alp_queryable(scale, total_limit, value_limit=None, size_factor=50, alpha=4)
Partial constructor of make_alp_queryable.
See also
Delays application of input_domain and input_metric in opendp.measurements.make_alp_queryable()
- Parameters:
scale (float) – Privacy loss parameter. This is equal to epsilon/sensitivity.
total_limit – Either the true value or an upper bound estimate of the sum of all values in the input.
value_limit – Upper bound on individual values (referred to as β). Entries above β are clamped.
size_factor – Optional multiplier (default of 50) for setting the size of the projection.
alpha – Optional parameter (default of 4) for scaling and determining p in randomized response step.
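The then_* functions in this module all follow the same pattern: they capture every argument except input_domain and input_metric, and supply those two later, typically through the >> operator. The plain-Python sketch below illustrates that pattern with toy stand-ins; PartialConstructor, make_toy and then_toy are hypothetical names and do not reflect how OpenDP implements its partial constructors.

```python
class PartialConstructor:
    """Hypothetical stand-in for the object a then_* function returns."""

    def __init__(self, constructor):
        self.constructor = constructor

    def __call__(self, input_domain, input_metric):
        return self.constructor(input_domain, input_metric)

    def __rrshift__(self, input_space):
        # Invoked for `(input_domain, input_metric) >> partial`,
        # because tuples do not implement `>>` themselves.
        input_domain, input_metric = input_space
        return self(input_domain, input_metric)

def make_toy(input_domain, input_metric, scale):
    """Toy 'constructor' that just records its arguments."""
    return {"domain": input_domain, "metric": input_metric, "scale": scale}

def then_toy(scale):
    """Toy partial constructor: fixes scale, delays domain and metric."""
    return PartialConstructor(
        lambda input_domain, input_metric: make_toy(input_domain, input_metric, scale)
    )

input_space = ("toy_domain", "toy_metric")
chained = input_space >> then_toy(scale=1.0)
direct = make_toy(*input_space, scale=1.0)
```

Both routes build the same object, which is why the examples in this module show the make_* call and the chained then_* call side by side.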
- opendp.measurements.then_canonical_noise(d_in, d_out)
Partial constructor of make_canonical_noise.
See also
Delays application of input_domain and input_metric in opendp.measurements.make_canonical_noise()
- Parameters:
d_in (float) – Sensitivity
d_out (tuple[Any, Any]) – Privacy parameters (ε, δ)
- opendp.measurements.then_gaussian(scale, k=None, MO='ZeroConcentratedDivergence')
Partial constructor of make_gaussian.
See also
Delays application of input_domain and input_metric in opendp.measurements.make_gaussian()
- Parameters:
scale (float) – Noise scale parameter for the Gaussian distribution. scale == standard_deviation.
k – The noise granularity in terms of 2^k.
MO (Type Argument) – Output Measure. The only valid measure is ZeroConcentratedDivergence.
- Example:
>>> dp.enable_features('contrib')
>>> input_space = dp.atom_domain(T=float, nan=False), dp.absolute_distance(T=float)
>>> gaussian = dp.m.make_gaussian(*input_space, scale=1.0)
>>> print('100?', gaussian(100.0))
100? ...
Or, more readably, define the space and then chain:
>>> gaussian = input_space >> dp.m.then_gaussian(scale=1.0)
>>> print('100?', gaussian(100.0))
100? ...
- opendp.measurements.then_gaussian_threshold(scale, threshold, k=None, MO='Approximate<ZeroConcentratedDivergence>')
Partial constructor of make_gaussian_threshold.
See also
Delays application of input_domain and input_metric in opendp.measurements.make_gaussian_threshold()
- Parameters:
scale (float) – Noise scale parameter for the Gaussian distribution. scale == standard_deviation.
threshold – Exclude pairs with values whose distance from zero exceeds this value.
k – The noise granularity in terms of 2^k.
MO (Type Argument) – Output Measure.
- opendp.measurements.then_geometric(scale, bounds=None, MO='MaxDivergence')
Partial constructor of make_geometric.
See also
Delays application of input_domain and input_metric in opendp.measurements.make_geometric()
- Parameters:
scale (float) – Noise scale parameter for the distribution. scale == standard_deviation / sqrt(2).
bounds – Set bounds on the count to make the algorithm run in constant-time.
MO (Type Argument) – Measure used to quantify privacy loss. The only valid value is MaxDivergence.
- Example:
>>> dp.enable_features("contrib")
>>> input_space = dp.atom_domain(T=int), dp.absolute_distance(T=int)
>>> geometric = dp.m.make_geometric(*input_space, scale=1.0)
>>> print('100?', geometric(100))
100? ...
Or, more readably, define the space and then chain:
>>> geometric = input_space >> dp.m.then_geometric(scale=1.0)
>>> print('100?', geometric(100))
100? ...
- opendp.measurements.then_laplace(scale, k=None, MO='MaxDivergence')
Partial constructor of make_laplace.
See also
Delays application of input_domain and input_metric in opendp.measurements.make_laplace()
- Parameters:
scale (float) – Noise scale parameter for the Laplace distribution. scale == standard_deviation / sqrt(2).
k – The noise granularity in terms of 2^k, only valid for domains over floats.
MO (Type Argument) – Measure used to quantify privacy loss. The only valid value is MaxDivergence.
- Example:
>>> import opendp.prelude as dp
>>> dp.enable_features("contrib")
>>> input_space = dp.atom_domain(T=float, nan=False), dp.absolute_distance(T=float)
>>> laplace = dp.m.make_laplace(*input_space, scale=1.0)
>>> print('100?', laplace(100.0))
100? ...
Or, more readably, define the space and then chain:
>>> laplace = input_space >> dp.m.then_laplace(scale=1.0)
>>> print('100?', laplace(100.0))
100? ...
- opendp.measurements.then_laplace_threshold(scale, threshold, k=None, MO='Approximate<MaxDivergence>')
Partial constructor of make_laplace_threshold.
See also
Delays application of input_domain and input_metric in opendp.measurements.make_laplace_threshold()
- Parameters:
scale (float) – Noise scale parameter for the Laplace distribution. scale == standard_deviation / sqrt(2).
threshold – Exclude counts that are less than this minimum value.
k – The noise granularity in terms of 2^k.
MO (Type Argument) – Output Measure.
- opendp.measurements.then_noise(output_measure, scale, k=None)
Partial constructor of make_noise.
See also
Delays application of input_domain and input_metric in opendp.measurements.make_noise()
- Parameters:
output_measure (Measure) – Privacy measure. Either MaxDivergence or ZeroConcentratedDivergence.
scale (float) – Noise scale parameter.
k – The noise granularity in terms of 2^k.
- opendp.measurements.then_noise_threshold(output_measure, scale, threshold, k=None)
Partial constructor of make_noise_threshold.
See also
Delays application of input_domain and input_metric in opendp.measurements.make_noise_threshold()
- Parameters:
output_measure (Measure) – Privacy measure. Either MaxDivergence or ZeroConcentratedDivergence.
scale (float) – Noise scale parameter for the Laplace distribution. scale == standard_deviation / sqrt(2).
threshold – Exclude counts that are less than this minimum value.
k – The noise granularity in terms of 2^k.
- opendp.measurements.then_noisy_max(output_measure, scale, negate=False)
partial constructor of make_noisy_max
See also
Delays application of input_domain and input_metric in opendp.measurements.make_noisy_max()
- Parameters:
output_measure (Measure) – One of MaxDivergence or ZeroConcentratedDivergence.
scale (float) – Scale for the noise distribution.
negate (bool) – Set to true to return the min instead of the max.
- Example:
>>> dp.enable_features("contrib")
>>> input_space = dp.vector_domain(dp.atom_domain(T=int)), dp.linf_distance(T=int)
>>> select_index = dp.m.make_noisy_max(*input_space, dp.max_divergence(), scale=1.0)
>>> print('2?', select_index([1, 2, 3, 2, 1]))
2? ...
Or, more readably, define the space and then chain:
>>> select_index = input_space >> dp.m.then_noisy_max(dp.max_divergence(), scale=1.0)
>>> print('2?', select_index([1, 2, 3, 2, 1]))
2? ...
- opendp.measurements.then_noisy_top_k(output_measure, k, scale, negate=False)
partial constructor of make_noisy_top_k
See also
Delays application of input_domain and input_metric in opendp.measurements.make_noisy_top_k()
- Parameters:
output_measure (Measure) – One of MaxDivergence or ZeroConcentratedDivergence.
k (int) – Number of indices to select.
scale (float) – Scale for the noise distribution.
negate (bool) – Set to true to return the bottom k.
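The interaction between k, scale, and negate can be sketched in plain Python as follows. This is an illustration only: the function noisy_top_k below is hypothetical, it assumes Gumbel noise regardless of the output measure (the noise distribution OpenDP actually uses may differ), and its naive floating-point sampling offers no privacy guarantee.

```python
import math
import random


def noisy_top_k(scores, k, scale, negate=False, rng=None):
    """Return indices of the k largest noisy scores (k smallest if negate is true).

    Hypothetical sketch with Gumbel noise; not the OpenDP implementation.
    """
    rng = rng or random.Random()
    sign = -1.0 if negate else 1.0
    noisy = []
    for index, score in enumerate(scores):
        u = rng.random()
        while u == 0.0:  # avoid log(0)
            u = rng.random()
        gumbel = -scale * math.log(-math.log(u))
        # Negating the scores turns "largest" into "smallest".
        noisy.append((sign * score + gumbel, index))
    noisy.sort(reverse=True)
    return [index for _, index in noisy[:k]]


scores = [1, 5, 3, 9, 2]
rng = random.Random(0)
# With a tiny scale the noise is negligible, so the exact ranking is recovered.
top = noisy_top_k(scores, k=2, scale=1e-6, rng=rng)
bottom = noisy_top_k(scores, k=2, scale=1e-6, negate=True, rng=rng)
print(top, bottom)
```

Larger values of scale make the selected indices noisier, trading accuracy for privacy.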
- opendp.measurements.then_private_expr(output_measure, expr, global_scale=None)
partial constructor of make_private_expr
See also
Delays application of input_domain and input_metric in opendp.measurements.make_private_expr()
- Parameters:
output_measure (Measure) – How to measure privacy loss.
expr – The Expr to be privatized.
global_scale – A tunable parameter that affects the privacy-utility tradeoff.
- opendp.measurements.then_private_lazyframe(output_measure, lazyframe, global_scale=None, threshold=None)
partial constructor of make_private_lazyframe
See also
Delays application of input_domain and input_metric in opendp.measurements.make_private_lazyframe()
- Parameters:
output_measure (Measure) – How to measure privacy loss.
lazyframe – A description of the computations to be run, in the form of a LazyFrame.
global_scale – Optional. A tunable parameter that affects the privacy-utility tradeoff.
threshold – Optional. Minimum number of rows in each released group.
- Example:
>>> dp.enable_features("contrib")
>>> import polars as pl
We’ll imagine an elementary school is taking a pet census. The private census data will have two columns:
>>> lf_domain = dp.lazyframe_domain([
...     dp.series_domain("grade", dp.atom_domain(T=dp.i32)),
...     dp.series_domain("pet_count", dp.atom_domain(T=dp.i32))])
We also need to specify the column we’ll be grouping by.
>>> lf_domain_with_margin = dp.with_margin(
...     lf_domain,
...     dp.polars.Margin(
...         by=[pl.col("grade")],
...         invariant="keys",
...         max_length=50))
With that in place, we can plan the Polars computation, using the dp plugin.
>>> plan = (
...     pl.LazyFrame(schema={'grade': pl.Int32, 'pet_count': pl.Int32})
...     .group_by("grade")
...     .agg(pl.col("pet_count").dp.sum((0, 10), scale=1.0)))
We now have all the pieces to make our measurement function using make_private_lazyframe:
>>> dp_sum_pets_by_grade = dp.m.make_private_lazyframe(
...     input_domain=lf_domain_with_margin,
...     input_metric=dp.symmetric_distance(),
...     output_measure=dp.max_divergence(),
...     lazyframe=plan,
...     global_scale=1.0)
It’s only at this point that we need to introduce the private data.
>>> df = pl.from_records(
...     [
...         [0, 0], # No kindergarteners with pets.
...         [0, 0],
...         [0, 0],
...         [1, 1], # Each first grader has 1 pet.
...         [1, 1],
...         [1, 1],
...         [2, 1], # One second grader has chickens!
...         [2, 1],
...         [2, 9]
...     ],
...     schema=['grade', 'pet_count'], orient="row")
>>> lf = pl.LazyFrame(df)
>>> results = dp_sum_pets_by_grade(lf).collect()
>>> print(results.sort("grade"))
shape: (3, 2)
┌───────┬───────────┐
│ grade ┆ pet_count │
│ ---   ┆ ---       │
│ i64   ┆ i64       │
╞═══════╪═══════════╡
│ 0     ┆ ...       │
│ 1     ┆ ...       │
│ 2     ┆ ...       │
└───────┴───────────┘
- opendp.measurements.then_private_quantile(output_measure, candidates, alpha, scale)
partial constructor of make_private_quantile
See also
Delays application of input_domain and input_metric in opendp.measurements.make_private_quantile()
- Parameters:
output_measure (Measure) – Either MaxDivergence or ZeroConcentratedDivergence.
candidates – Potential quantiles to score.
alpha (float) – A value in \([0, 1]\). Choose 0.5 for the median.
scale (float) – The scale of the noise added.
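To illustrate how candidates and alpha interact, the sketch below scores each candidate by how far it is from splitting the data in the proportion alpha. The scoring rule shown is an assumption chosen for illustration, the function name quantile_scores is hypothetical, and the selection here is noise-free; a private release would instead choose among the scores with noise calibrated by scale.

```python
def quantile_scores(data, candidates, alpha):
    """Score each candidate; lower scores mean closer to the alpha-quantile.

    Assumed scoring rule (illustration only):
        |(1 - alpha) * #(x < c) - alpha * #(x > c)|
    """
    scores = []
    for c in candidates:
        below = sum(1 for x in data if x < c)
        above = sum(1 for x in data if x > c)
        scores.append(abs((1 - alpha) * below - alpha * above))
    return scores


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
candidates = [2, 5, 8]
scores = quantile_scores(data, candidates, alpha=0.5)

# Non-private selection: simply take the candidate with the lowest score.
best = candidates[scores.index(min(scores))]
print(scores, best)
```

With alpha=0.5, the candidate 5 has equally many points on either side, so it scores zero and is selected as the median.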
- opendp.measurements.then_randomized_response_bitvec(f, constant_time=False)
partial constructor of make_randomized_response_bitvec
See also
Delays application of input_domain and input_metric in opendp.measurements.make_randomized_response_bitvec()
- Parameters:
f (float) – Per-bit flipping probability. Must be in \((0, 1]\).
constant_time (bool) – Whether to run the Bernoulli samplers in constant time. This is likely to be extremely slow.
- Example:
>>> import numpy as np
>>> import opendp.prelude as dp
>>> dp.enable_features("contrib")
>>> # Create the randomized response mechanism
>>> m_rr = dp.m.make_randomized_response_bitvec(
...     dp.bitvector_domain(max_weight=4), dp.discrete_distance(), f=0.95
... )
>>> # compute privacy loss
>>> m_rr.map(1)
0.8006676684558611
>>> # formula is 2 * m * ln((2 - f) / f)
>>> # where m = 4 (the weight) and f = .95 (the flipping probability)
>>> # prepare a dataset to release, by encoding a bit vector as a numpy byte array
>>> data = np.packbits(
...     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
... )
>>> assert np.array_equal(data, np.array([0, 8, 12], dtype=np.uint8))
>>> # roundtrip: numpy -> bytes -> mech -> bytes -> numpy
>>> release = np.frombuffer(m_rr(data.tobytes()), dtype=np.uint8)
>>> # compare the two bit vectors:
>>> [int(bit) for bit in np.unpackbits(data)]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
>>> [int(bit) for bit in np.unpackbits(release)]
[...]
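The privacy-loss formula quoted in the example comments, 2 * m * ln((2 - f) / f), can be evaluated directly with the standard library. The helper name rr_bitvec_epsilon is hypothetical; the sketch only reproduces the arithmetic and does not call OpenDP.

```python
import math


def rr_bitvec_epsilon(max_weight, f):
    """Evaluate 2 * m * ln((2 - f) / f) for weight m and flipping probability f."""
    return 2 * max_weight * math.log((2 - f) / f)


# Same parameters as the example: m = 4, f = 0.95.
epsilon = rr_bitvec_epsilon(max_weight=4, f=0.95)
print(epsilon)

# At f = 1 the ratio (2 - f) / f equals 1, so the formula gives a loss of zero.
print(rr_bitvec_epsilon(max_weight=4, f=1.0))
```

The first value agrees with the m_rr.map(1) result shown in the example, up to floating-point rounding. Smaller values of f flip fewer bits and so give a larger privacy loss.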
- opendp.measurements.then_report_noisy_max_gumbel(scale, optimize='max')
partial constructor of make_report_noisy_max_gumbel
See also
Delays application of input_domain and input_metric in opendp.measurements.make_report_noisy_max_gumbel()
- Parameters:
scale (float) – Scale for the noise distribution.
optimize (str) – Set to "min" to report noisy min.
- opendp.measurements.then_user_measurement(output_measure, function, privacy_map, TO='ExtrinsicObject')
partial constructor of make_user_measurement
See also
Delays application of input_domain and input_metric in opendp.measurements.make_user_measurement()
- Parameters:
output_measure (Measure) – The measure used to quantify distances between adjacent output distributions.
function – A function mapping data from input_domain to a release of type TO.
privacy_map – A function mapping distances from input_metric to output_measure.
TO (Type Argument) – The data type of outputs from the function.
- Example:
>>> dp.enable_features("contrib")
>>> def const_function(_arg):
...     return 42
>>> def privacy_map(_d_in):
...     return 0.
>>> space = dp.atom_domain(T=int), dp.absolute_distance(int)
>>> user_measurement = dp.m.make_user_measurement(
...     *space,
...     output_measure=dp.max_divergence(),
...     function=const_function,
...     privacy_map=privacy_map
... )
>>> print('42?', user_measurement(0))
42? 42