opendp.measurements module
The measurements module provides functions that apply calibrated noise to data to ensure differential privacy.
For more context, see measurements in the User Guide.
For convenience, all the functions of this module are also available from opendp.prelude. We suggest importing under the conventional name dp:
>>> import opendp.prelude as dp
The methods of this module will then be accessible at dp.m.
- opendp.measurements.debias_randomized_response_bitvec(answers, f)
Convert a vector of randomized response bitvec responses to a frequency estimate.
Required features: contrib
debias_randomized_response_bitvec in Rust documentation.
- Parameters:
answers – A vector of BitVectors with consistent size.
f (float) – The per-bit flipping probability used to encode answers.
Computes the sum of the answers into a $k$-length vector $Y$ and returns
$$Y\frac{Y-\frac{f}{2}}{1-f}$$
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
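As a rough illustration of the debiasing idea, the sketch below assumes a RAPPOR-style encoding in which each bit is replaced by a fair coin flip with probability f and kept otherwise. Under that assumption a reported bit b has expectation b(1 - f) + f/2, so a column sum over n responses can be inverted as (Y - n f/2) / (1 - f). The helper name debias_column_sums and the explicit n term are assumptions of this sketch; they are not taken from the OpenDP implementation, whose exact normalization may differ.

```python
def debias_column_sums(column_sums, n, f):
    """Invert the expected bias of per-bit randomized response (illustrative only).

    column_sums: per-position counts of 1-bits across n noisy responses.
    f: per-bit flipping probability, assumed here to lie in (0, 1).
    """
    if not 0.0 < f < 1.0:
        raise ValueError("this sketch requires 0 < f < 1")
    # E[Y] = (1 - f) * true_count + n * f / 2, solved for true_count.
    return [(y - n * f / 2) / (1 - f) for y in column_sums]


estimates = debias_column_sums([60, 35], n=100, f=0.5)
```

With n = 100 and f = 0.5, a quarter of all responses are expected to contribute a spurious 1 to each position, which is why 25 is subtracted before rescaling.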
- opendp.measurements.make_alp_queryable(input_domain, input_metric, scale, total_limit, value_limit=None, size_factor=50, alpha=4)
Measurement to release a queryable containing a DP projection of bounded sparse data.
The size of the projection is O(total * size_factor * scale / alpha). The evaluation time of post-processing is O(beta * scale / alpha), where beta is value_limit.
size_factor is an optional multiplier (defaults to 50) for setting the size of the projection. There is a memory/utility trade-off. The value should be sufficiently large to limit hash collisions.
Required features: contrib
make_alp_queryable in Rust documentation.
Citations:
ALP21 Differentially Private Sparse Vectors with Low Error, Optimal Space, and Fast Access (Algorithm 4)
Supporting Elements:
Input Domain: MapDomain<AtomDomain<K>, AtomDomain<CI>>
Output Type: Queryable<K, f64>
Input Metric: L1Distance<CI>
Output Measure: MaxDivergence
- Parameters:
input_domain (Domain) –
input_metric (Metric) –
scale (float) – Privacy loss parameter. This is equal to epsilon/sensitivity.
total_limit – Either the true value or an upper bound estimate of the sum of all values in the input.
value_limit – Upper bound on individual values (referred to as β). Entries above β are clamped.
size_factor – Optional multiplier (default of 50) for setting the size of the projection.
alpha – Optional parameter (default of 4) for scaling and determining p in randomized response step.
- Return type:
Measurement
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
- opendp.measurements.make_gaussian(input_domain, input_metric, scale, k=None, MO='ZeroConcentratedDivergence')
Make a Measurement that adds noise from the Gaussian(scale) distribution to the input.
Valid inputs for input_domain and input_metric are:
| input_domain | input type | input_metric |
| --- | --- | --- |
| atom_domain(T) | T | absolute_distance(QI) |
| vector_domain(atom_domain(T)) | Vec<T> | l2_distance(QI) |
Required features: contrib
make_gaussian in Rust documentation.
Supporting Elements:
Input Domain: D
Output Type: D::Carrier
Input Metric: D::InputMetric
Output Measure: MO
- Parameters:
input_domain (Domain) – Domain of the data type to be privatized.
input_metric (Metric) – Metric of the data type to be privatized.
scale (float) – Noise scale parameter for the Gaussian distribution. scale == standard_deviation.
k – The noise granularity in terms of 2^k.
MO (Type Argument) – Output Measure. The only valid measure is ZeroConcentratedDivergence.
- Return type:
Measurement
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
- Example:
>>> dp.enable_features('contrib')
>>> input_space = dp.atom_domain(T=float), dp.absolute_distance(T=float)
>>> gaussian = dp.m.make_gaussian(*input_space, scale=1.0)
>>> print('100?', gaussian(100.0))
100? ...
Or, more readably, define the space and then chain:
>>> gaussian = input_space >> dp.m.then_gaussian(scale=1.0)
>>> print('100?', gaussian(100.0))
100? ...
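For intuition about how scale relates to the ZeroConcentratedDivergence measure, the textbook result for an idealized continuous Gaussian mechanism is rho = (d_in / scale)^2 / 2. The sketch below evaluates that formula; gaussian_rho is a hypothetical helper, and the measurement's own privacy map (for example gaussian.map(d_in)) remains the authoritative value, since the library may account for discretization.

```python
def gaussian_rho(d_in, scale):
    """zCDP parameter rho of an idealized Gaussian mechanism (textbook formula)."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return (d_in / scale) ** 2 / 2


rho_unit = gaussian_rho(d_in=1.0, scale=1.0)  # 0.5
rho_wide = gaussian_rho(d_in=1.0, scale=2.0)  # 0.125
```

Doubling the scale divides rho by four, which is the usual quadratic trade-off between noise and privacy loss under zCDP.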
- opendp.measurements.make_geometric(input_domain, input_metric, scale, bounds=None)
Equivalent to make_laplace but restricted to an integer support. Can specify bounds to run the algorithm in near constant-time.
Required features: contrib
make_geometric in Rust documentation.
Supporting Elements:
Input Domain: D
Output Type: D::Carrier
Input Metric: D::InputMetric
Output Measure: MaxDivergence
- Parameters:
input_domain (Domain) –
input_metric (Metric) –
scale (float) –
bounds –
- Return type:
Measurement
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
- Example:
>>> dp.enable_features("contrib")
>>> input_space = dp.atom_domain(T=int), dp.absolute_distance(T=int)
>>> geometric = dp.m.make_geometric(*input_space, scale=1.0)
>>> print('100?', geometric(100))
100? ...
Or, more readably, define the space and then chain:
>>> geometric = input_space >> dp.m.then_geometric(scale=1.0)
>>> print('100?', geometric(100))
100? ...
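To make "Laplace restricted to an integer support" concrete, the sketch below samples two-sided geometric (discrete Laplace) noise as the difference of two geometric variables. It is a plain-Python illustration only: the function name is hypothetical, it uses floating-point arithmetic, and it makes no attempt at the constant-time behavior that bounds is meant to enable.

```python
import math
import random


def sample_discrete_laplace(scale, rng):
    """Draw an integer with P(X = k) proportional to exp(-|k| / scale).

    Illustrative only: uses floating-point arithmetic and is not constant-time.
    """
    log_p = -1.0 / scale  # ln(p), where p = exp(-1 / scale)

    def geometric():
        # Inverse-transform sample with P(G = k) = (1 - p) * p**k, k >= 0.
        u = 1.0 - rng.random()  # uniform on (0, 1], keeps the log finite
        return int(math.log(u) / log_p)

    # The difference of two i.i.d. geometric variables is discrete Laplace.
    return geometric() - geometric()


rng = random.Random(0)
noisy = [100 + sample_discrete_laplace(1.0, rng) for _ in range(5)]
```

Because the noise is integer-valued, adding it to an integer input keeps the release inside the integer domain.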
- opendp.measurements.make_laplace(input_domain, input_metric, scale, k=None)
Make a Measurement that adds noise from the Laplace(scale) distribution to the input.
Valid inputs for input_domain and input_metric are:
| input_domain | input type | input_metric |
| --- | --- | --- |
| atom_domain(T) (default) | T | absolute_distance(T) |
| vector_domain(atom_domain(T)) | Vec<T> | l1_distance(T) |
Internally, all sampling is done using the discrete Laplace distribution.
Required features: contrib
make_laplace in Rust documentation.
Supporting Elements:
Input Domain: D
Output Type: D::Carrier
Input Metric: D::InputMetric
Output Measure: MaxDivergence
- Parameters:
input_domain (Domain) – Domain of the data type to be privatized.
input_metric (Metric) – Metric of the data type to be privatized.
scale (float) – Noise scale parameter for the Laplace distribution. scale == standard_deviation / sqrt(2).
k – The noise granularity in terms of 2^k, only valid for domains over floats.
- Return type:
Measurement
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
- Example:
>>> import opendp.prelude as dp
>>> dp.enable_features("contrib")
>>> input_space = dp.atom_domain(T=float), dp.absolute_distance(T=float)
>>> laplace = dp.m.make_laplace(*input_space, scale=1.0)
>>> print('100?', laplace(100.0))
100? ...
Or, more readably, define the space and then chain:
>>> laplace = input_space >> dp.m.then_laplace(scale=1.0)
>>> print('100?', laplace(100.0))
100? ...
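The relation scale == standard_deviation / sqrt(2) and the textbook privacy loss of an idealized Laplace mechanism (epsilon = d_in / scale) can be worked through with two small helpers. Both function names are hypothetical, and the formula ignores any adjustment the library makes for the noise granularity k; the measurement's own map remains authoritative.

```python
import math


def laplace_scale_from_std(standard_deviation):
    """Convert a target standard deviation to a Laplace scale (scale = std / sqrt(2))."""
    return standard_deviation / math.sqrt(2)


def laplace_epsilon(d_in, scale):
    """Pure-DP epsilon of an idealized Laplace mechanism (textbook formula)."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return d_in / scale


scale = laplace_scale_from_std(math.sqrt(2))    # 1.0
epsilon = laplace_epsilon(d_in=1.0, scale=2.0)  # 0.5
```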
- opendp.measurements.make_laplace_threshold(input_domain, input_metric, scale, threshold, k=-1074)
Make a Measurement that uses propose-test-release to privatize a hashmap of counts.
This function takes a noise granularity in terms of 2^k. Larger granularities are more computationally efficient, but have a looser privacy map. If k is not set, k defaults to the smallest granularity.
Required features: contrib, floating-point
make_laplace_threshold in Rust documentation.
Supporting Elements:
Input Domain: MapDomain<AtomDomain<TK>, AtomDomain<TV>>
Output Type: HashMap<TK, TV>
Input Metric: L1Distance<TV>
Output Measure: Approximate<MaxDivergence>
- Parameters:
input_domain (Domain) – Domain of the input.
input_metric (Metric) – Metric for the input domain.
scale (float) – Noise scale parameter for the Laplace distribution. scale == standard_deviation / sqrt(2).
threshold – Exclude counts that are less than this minimum value.
k (int) – The noise granularity in terms of 2^k.
- Return type:
Measurement
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
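The thresholding idea described by the scale and threshold parameters can be sketched in plain Python: add Laplace noise to every count, then release only the entries whose noisy count reaches the threshold. This is an illustration under simplifying assumptions, not the library's algorithm: noisy_threshold_counts is a hypothetical name, the sampler uses ordinary floating-point arithmetic, and the granularity parameter k is omitted.

```python
import random


def noisy_threshold_counts(counts, scale, threshold, rng):
    """Add Laplace noise to each count and keep entries at or above threshold.

    Illustrative only: floating-point sampling like this is not a secure
    implementation and omits the granularity parameter k.
    """
    released = {}
    for key, count in counts.items():
        # The difference of two i.i.d. exponentials is Laplace-distributed.
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        noisy_count = count + noise
        if noisy_count >= threshold:
            released[key] = noisy_count
    return released


rng = random.Random(0)
released = noisy_threshold_counts({"a": 1000, "b": 0}, scale=1.0, threshold=500, rng=rng)
```

Keys with small counts are suppressed entirely, which is why the output measure is an approximate one: the presence or absence of a key can itself leak information with small probability.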
- opendp.measurements.make_private_expr(input_domain, input_metric, output_measure, expr, global_scale=None)
Create a differentially private measurement from an Expr.
Required features: contrib, honest-but-curious
make_private_expr in Rust documentation.
Why honest-but-curious?:
The privacy guarantee covers at most one evaluation of the released expression.
Supporting Elements:
Input Domain: WildExprDomain
Output Type: ExprPlan
Input Metric: MI
Output Measure: MO
- Parameters:
input_domain (Domain) – The domain of the input data.
input_metric (Metric) – How to measure distances between neighboring input data sets.
output_measure (Measure) – How to measure privacy loss.
expr – The Expr to be privatized.
global_scale – A tunable parameter that affects the privacy-utility tradeoff.
- Return type:
Measurement
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
- opendp.measurements.make_private_lazyframe(input_domain, input_metric, output_measure, lazyframe, global_scale=None, threshold=None)
Create a differentially private measurement from a LazyFrame.
Any data inside the LazyFrame is ignored, but it is still recommended to start with an empty DataFrame and build up the computation using the LazyFrame API.
Required features: contrib
make_private_lazyframe in Rust documentation.
Supporting Elements:
Input Domain: LazyFrameDomain
Output Type: OnceFrame
Input Metric: MI
Output Measure: MO
- Parameters:
input_domain (Domain) – The domain of the input data.
input_metric (Metric) – How to measure distances between neighboring input data sets.
output_measure (Measure) – How to measure privacy loss.
lazyframe – A description of the computations to be run, in the form of a LazyFrame.
global_scale – Optional. A tunable parameter that affects the privacy-utility tradeoff.
threshold – Optional. Minimum number of rows in each released partition.
- Return type:
Measurement
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
- Example:
>>> dp.enable_features("contrib")
>>> import polars as pl
We’ll imagine an elementary school is taking a pet census. The private census data will have two columns:
>>> lf_domain = dp.lazyframe_domain([
...     dp.series_domain("grade", dp.atom_domain(T=dp.i32)),
...     dp.series_domain("pet_count", dp.atom_domain(T=dp.i32))])
We also need to specify the column we’ll be grouping by.
>>> lf_domain_with_margin = dp.with_margin(
...     lf_domain,
...     by=["grade"],
...     public_info="keys",
...     max_partition_length=50)
With that in place, we can plan the Polars computation, using the dp plugin.
>>> plan = (
...     pl.LazyFrame(schema={'grade': pl.Int32, 'pet_count': pl.Int32})
...     .group_by("grade")
...     .agg(pl.col("pet_count").dp.sum((0, 10), scale=1.0)))
We now have all the pieces to make our measurement function using make_private_lazyframe:
>>> dp_sum_pets_by_grade = dp.m.make_private_lazyframe(
...     input_domain=lf_domain_with_margin,
...     input_metric=dp.symmetric_distance(),
...     output_measure=dp.max_divergence(),
...     lazyframe=plan,
...     global_scale=1.0)
It’s only at this point that we need to introduce the private data.
>>> df = pl.from_records(
...     [
...         [0, 0], # No kindergarteners with pets.
...         [0, 0],
...         [0, 0],
...         [1, 1], # Each first grader has 1 pet.
...         [1, 1],
...         [1, 1],
...         [2, 1], # One second grader has chickens!
...         [2, 1],
...         [2, 9]
...     ],
...     schema=['grade', 'pet_count'], orient="row")
>>> lf = pl.LazyFrame(df)
>>> results = dp_sum_pets_by_grade(lf).collect()
>>> print(results.sort("grade"))
shape: (3, 2)
┌───────┬───────────┐
│ grade ┆ pet_count │
│ ---   ┆ ---       │
│ i64   ┆ i64       │
╞═══════╪═══════════╡
│ 0     ┆ ...       │
│ 1     ┆ ...       │
│ 2     ┆ ...       │
└───────┴───────────┘
- opendp.measurements.make_randomized_response(categories, prob, T=None)
Make a Measurement that implements randomized response on a categorical value.
Required features: contrib
make_randomized_response in Rust documentation.
Supporting Elements:
Input Domain: AtomDomain<T>
Output Type: T
Input Metric: DiscreteDistance
Output Measure: MaxDivergence
- Parameters:
categories – Set of valid outcomes
prob (float) – Probability of returning the correct answer. Must be in [1/num_categories, 1).
T (Type Argument) – Data type of a category.
- Return type:
Measurement
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
- Example:
>>> dp.enable_features("contrib")
>>> random_string = dp.m.make_randomized_response(['a', 'b', 'c'], 0.99)
>>> print('a?', random_string('a'))
a? ...
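The constraint on prob can be understood through the privacy loss of categorical randomized response. The sketch below assumes the common model in which the true category is reported with probability prob and each other category is otherwise equally likely; under that assumption epsilon = ln(prob * (t - 1) / (1 - prob)) for t categories. The helper name is hypothetical, and the value returned by the measurement's own map is authoritative.

```python
import math


def randomized_response_epsilon(prob, num_categories):
    """Epsilon when the truth is reported with probability `prob` and any other
    category is otherwise chosen uniformly (assumed model for this sketch)."""
    if not 1.0 / num_categories <= prob < 1.0:
        raise ValueError("prob must be in [1/num_categories, 1)")
    # Worst-case likelihood ratio: prob vs. (1 - prob) / (num_categories - 1).
    return math.log(prob * (num_categories - 1) / (1.0 - prob))


eps_three = randomized_response_epsilon(0.99, 3)   # ln(198)
eps_binary = randomized_response_epsilon(0.75, 2)  # ln(3)
```

At the lower bound prob = 1/num_categories the output is uniform regardless of the input, so the ratio is 1 and epsilon is 0; as prob approaches 1 the loss grows without bound, which is why 1 itself is excluded.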
- opendp.measurements.make_randomized_response_bitvec(input_domain, input_metric, f, constant_time=False)
Make a Measurement that implements randomized response on a bit vector.
This primitive can be useful for implementing RAPPOR.
Required features: contrib
make_randomized_response_bitvec in Rust documentation.
Supporting Elements:
Input Domain: BitVectorDomain
Output Type: BitVector
Input Metric: DiscreteDistance
Output Measure: MaxDivergence
- Parameters:
input_domain (Domain) –
input_metric (Metric) –
f (float) – Per-bit flipping probability. Must be in $(0, 1]$.
constant_time (bool) – Whether to run the Bernoulli samplers in constant time; this is likely to be extremely slow.
- Return type:
Measurement
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
- Example:
>>> import numpy as np
>>> import opendp.prelude as dp
>>> dp.enable_features("contrib")
>>> # Create the randomized response mechanism
>>> m_rr = dp.m.make_randomized_response_bitvec(
...     dp.bitvector_domain(max_weight=4), dp.discrete_distance(), f=0.95
... )
>>> # compute privacy loss
>>> m_rr.map(1)
0.8006676684558611
>>> # formula is 2 * m * ln((2 - f) / f)
>>> # where m = 4 (the weight) and f = .95 (the flipping probability)
>>> # prepare a dataset to release, by encoding a bit vector as a numpy byte array
>>> data = np.packbits(
...     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
... )
>>> assert np.array_equal(data, np.array([0, 8, 12], dtype=np.uint8))
>>> # roundtrip: numpy -> bytes -> mech -> bytes -> numpy
>>> release = np.frombuffer(m_rr(data.tobytes()), dtype=np.uint8)
>>> # compare the two bit vectors:
>>> [int(bit) for bit in np.unpackbits(data)]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
>>> [int(bit) for bit in np.unpackbits(release)]
[...]
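The privacy-loss formula quoted in the example comments, 2 * m * ln((2 - f) / f), can be checked by hand with the standard library. The helper name bitvec_rr_epsilon is hypothetical; the sketch only evaluates the formula as stated and should agree with the m_rr.map(1) value up to floating-point rounding.

```python
import math


def bitvec_rr_epsilon(max_weight, f):
    """Evaluate 2 * m * ln((2 - f) / f) for weight m and flip probability f."""
    if not 0.0 < f <= 1.0:
        raise ValueError("f must be in (0, 1]")
    return 2 * max_weight * math.log((2.0 - f) / f)


epsilon = bitvec_rr_epsilon(max_weight=4, f=0.95)  # about 0.8007
```

At f = 1 every bit is replaced by a fair coin flip, the ratio inside the logarithm is 1, and the loss is 0.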
- opendp.measurements.make_randomized_response_bool(prob, constant_time=False)
Make a Measurement that implements randomized response on a boolean value.
Required features: contrib
make_randomized_response_bool in Rust documentation.
Supporting Elements:
Input Domain: AtomDomain<bool>
Output Type: bool
Input Metric: DiscreteDistance
Output Measure: MaxDivergence
- Parameters:
prob (float) – Probability of returning the correct answer. Must be in [0.5, 1).
constant_time (bool) – Set to true to run in constant time, at the cost of speed.
- Return type:
Measurement
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
- Example:
>>> dp.enable_features("contrib")
>>> random_bool = dp.m.make_randomized_response_bool(0.99)
>>> print('True?', random_bool(True))
True? ...
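A common follow-up to boolean randomized response is estimating the true proportion from many noisy reports. The simulation below is a plain-Python sketch that does not call OpenDP: both function names are hypothetical, the sampler is not constant-time, and the estimator assumes prob is strictly greater than 0.5 so the correction is well defined.

```python
import random


def randomized_response_bool(value, prob, rng):
    """Report the true value with probability `prob`, otherwise its negation."""
    return value if rng.random() < prob else not value


def estimate_true_fraction(reports, prob):
    """Unbiased estimate of the fraction of True inputs from noisy reports."""
    if not 0.5 < prob < 1.0:
        raise ValueError("this estimator requires 0.5 < prob < 1")
    observed = sum(reports) / len(reports)
    # E[observed] = prob * q + (1 - prob) * (1 - q), solved for q.
    return (observed - (1.0 - prob)) / (2.0 * prob - 1.0)


rng = random.Random(0)
truth = [True] * 3000 + [False] * 7000
reports = [randomized_response_bool(v, 0.75, rng) for v in truth]
estimate = estimate_true_fraction(reports, 0.75)  # close to 0.3
```

The closer prob is to 0.5, the stronger the privacy but the larger the variance of the estimate, since the denominator 2 * prob - 1 shrinks toward zero.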
- opendp.measurements.make_report_noisy_max_gumbel(input_domain, input_metric, scale, optimize)
Make a Measurement that takes a vector of scores and privately selects the index of the highest score.
Required features: contrib
make_report_noisy_max_gumbel in Rust documentation.
Supporting Elements:
Input Domain: VectorDomain<AtomDomain<TIA>>
Output Type: usize
Input Metric: LInfDistance<TIA>
Output Measure: MaxDivergence
- Parameters:
input_domain (Domain) –
input_metric (Metric) –
scale (float) – Higher scales are more private.
optimize (str) – Indicate whether to privately return the “max” or “min”.
- Return type:
Measurement
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
- Example:
>>> dp.enable_features("contrib")
>>> input_space = dp.vector_domain(dp.atom_domain(T=int)), dp.linf_distance(T=int)
>>> select_index = dp.m.make_report_noisy_max_gumbel(*input_space, scale=1.0, optimize='max')
>>> print('2?', select_index([1, 2, 3, 2, 1]))
2? ...
Or, more readably, define the space and then chain:
>>> select_index = input_space >> dp.m.then_report_noisy_max_gumbel(scale=1.0, optimize='max')
>>> print('2?', select_index([1, 2, 3, 2, 1]))
2? ...
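The selection step can be pictured as the Gumbel-max trick: add independent Gumbel noise to every score and return the index of the largest noisy score, negating the scores first when the minimum is wanted. The sketch below is a floating-point illustration with hypothetical function names; it is not the library's sampler and carries no privacy guarantee of its own.

```python
import math
import random


def sample_gumbel(scale, rng):
    """Draw from Gumbel(0, scale) via -scale * ln(E) with E ~ Exponential(1)."""
    e = 0.0
    while e == 0.0:  # avoid log(0) in the (very unlikely) case E == 0
        e = rng.expovariate(1.0)
    return -scale * math.log(e)


def report_noisy_max_gumbel(scores, scale, optimize, rng):
    """Return the index of the best score after adding Gumbel noise."""
    if optimize not in ("max", "min"):
        raise ValueError("optimize must be 'max' or 'min'")
    sign = 1.0 if optimize == "max" else -1.0
    noisy = [sign * s + sample_gumbel(scale, rng) for s in scores]
    return max(range(len(noisy)), key=noisy.__getitem__)


rng = random.Random(0)
best = report_noisy_max_gumbel([0, 0, 1000, 0], scale=1.0, optimize="max", rng=rng)
worst = report_noisy_max_gumbel([1000, 1000, 0, 1000], scale=1.0, optimize="min", rng=rng)
```

With scores this far apart the noise almost never changes the winner; with close scores, as in the documented example, any index can be returned, which is why the example output is elided.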
- opendp.measurements.make_user_measurement(input_domain, input_metric, output_measure, function, privacy_map, TO='ExtrinsicObject')
Construct a Measurement from user-defined callbacks.
Required features: contrib, honest-but-curious
Why honest-but-curious?:
This constructor only returns a valid measurement if for every pair of elements $x, x'$ in input_domain, and for every pair (d_in, d_out), where d_in has the associated type for input_metric and d_out has the associated type for output_measure, if $x, x'$ are d_in-close under input_metric, privacy_map(d_in) does not raise an exception, and privacy_map(d_in) <= d_out, then function(x), function(x') are d_out-close under output_measure.
In addition, function must not have side-effects, and privacy_map must be a pure function.
Supporting Elements:
Input Domain: AnyDomain
Output Type: AnyObject
Input Metric: AnyMetric
Output Measure: AnyMeasure
- Parameters:
input_domain (Domain) – A domain describing the set of valid inputs for the function.
input_metric (Metric) – The metric from which distances between adjacent inputs are measured.
output_measure (Measure) – The measure from which distances between adjacent output distributions are measured.
function – A function mapping data from input_domain to a release of type TO.
privacy_map – A function mapping distances from input_metric to output_measure.
TO (Type Argument) – The data type of outputs from the function.
- Return type:
Measurement
- Raises:
TypeError – if an argument’s type differs from the expected type
UnknownTypeException – if a type argument fails to parse
OpenDPException – packaged error from the core OpenDP library
- Example:
>>> dp.enable_features("contrib", "honest-but-curious")
>>> def const_function(_arg):
...     return 42
>>> def privacy_map(_d_in):
...     return 0.
>>> space = dp.atom_domain(T=int), dp.absolute_distance(int)
>>> user_measurement = dp.m.make_user_measurement(
...     *space,
...     output_measure=dp.max_divergence(),
...     function=const_function,
...     privacy_map=privacy_map
... )
>>> print('42?', user_measurement(0))
42? 42
- opendp.measurements.then_alp_queryable(scale, total_limit, value_limit=None, size_factor=50, alpha=4)
Partial constructor of make_alp_queryable.
See also
Delays application of input_domain and input_metric in opendp.measurements.make_alp_queryable().
- Parameters:
scale (float) – Privacy loss parameter. This is equal to epsilon/sensitivity.
total_limit – Either the true value or an upper bound estimate of the sum of all values in the input.
value_limit – Upper bound on individual values (referred to as β). Entries above β are clamped.
size_factor – Optional multiplier (default of 50) for setting the size of the projection.
alpha – Optional parameter (default of 4) for scaling and determining p in randomized response step.
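The idea of a partial constructor that "delays application of input_domain and input_metric" can be sketched with a toy class that completes construction when a (domain, metric) pair is shifted into it. Everything here is hypothetical and exists only to show the pattern: DelayedConstructor, make_toy and then_toy are not OpenDP names, and the real then_* functions return the library's own objects.

```python
class DelayedConstructor:
    """Hypothetical stand-in that finishes construction once a space is known."""

    def __init__(self, finish):
        self._finish = finish

    def __rrshift__(self, space):
        # Invoked for `space >> delayed` because tuples do not define `>>`.
        input_domain, input_metric = space
        return self._finish(input_domain, input_metric)


def make_toy(input_domain, input_metric, scale):
    """Toy constructor that records its arguments instead of building a Measurement."""
    return {"input_domain": input_domain, "input_metric": input_metric, "scale": scale}


def then_toy(scale):
    """Delay input_domain and input_metric; bind everything else now."""
    return DelayedConstructor(lambda domain, metric: make_toy(domain, metric, scale))


space = ("toy_domain", "toy_metric")
chained = space >> then_toy(scale=1.0)
direct = make_toy(*space, scale=1.0)
```

Both spellings produce the same result; the chained form simply lets the space be written once and reused across several constructors.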
- opendp.measurements.then_gaussian(scale, k=None, MO='ZeroConcentratedDivergence')
Partial constructor of make_gaussian.
See also
Delays application of input_domain and input_metric in opendp.measurements.make_gaussian().
- Parameters:
scale (float) – Noise scale parameter for the Gaussian distribution. scale == standard_deviation.
k – The noise granularity in terms of 2^k.
MO (Type Argument) – Output Measure. The only valid measure is ZeroConcentratedDivergence.
- Example:
>>> dp.enable_features('contrib')
>>> input_space = dp.atom_domain(T=float), dp.absolute_distance(T=float)
>>> gaussian = dp.m.make_gaussian(*input_space, scale=1.0)
>>> print('100?', gaussian(100.0))
100? ...
Or, more readably, define the space and then chain:
>>> gaussian = input_space >> dp.m.then_gaussian(scale=1.0)
>>> print('100?', gaussian(100.0))
100? ...
- opendp.measurements.then_geometric(scale, bounds=None)
Partial constructor of make_geometric.
See also
Delays application of input_domain and input_metric in opendp.measurements.make_geometric().
- Parameters:
scale (float) –
bounds –
- Example:
>>> dp.enable_features("contrib")
>>> input_space = dp.atom_domain(T=int), dp.absolute_distance(T=int)
>>> geometric = dp.m.make_geometric(*input_space, scale=1.0)
>>> print('100?', geometric(100))
100? ...
Or, more readably, define the space and then chain:
>>> geometric = input_space >> dp.m.then_geometric(scale=1.0)
>>> print('100?', geometric(100))
100? ...
- opendp.measurements.then_laplace(scale, k=None)
Partial constructor of make_laplace.
See also
Delays application of input_domain and input_metric in opendp.measurements.make_laplace().
- Parameters:
scale (float) – Noise scale parameter for the Laplace distribution. scale == standard_deviation / sqrt(2).
k – The noise granularity in terms of 2^k, only valid for domains over floats.
- Example:
>>> import opendp.prelude as dp
>>> dp.enable_features("contrib")
>>> input_space = dp.atom_domain(T=float), dp.absolute_distance(T=float)
>>> laplace = dp.m.make_laplace(*input_space, scale=1.0)
>>> print('100?', laplace(100.0))
100? ...
Or, more readably, define the space and then chain:
>>> laplace = input_space >> dp.m.then_laplace(scale=1.0)
>>> print('100?', laplace(100.0))
100? ...
- opendp.measurements.then_laplace_threshold(scale, threshold, k=-1074)
Partial constructor of make_laplace_threshold.
See also
Delays application of input_domain and input_metric in opendp.measurements.make_laplace_threshold().
- Parameters:
scale (float) – Noise scale parameter for the Laplace distribution. scale == standard_deviation / sqrt(2).
threshold – Exclude counts that are less than this minimum value.
k (int) – The noise granularity in terms of 2^k.
- opendp.measurements.then_private_expr(output_measure, expr, global_scale=None)
Partial constructor of make_private_expr.
See also
Delays application of input_domain and input_metric in opendp.measurements.make_private_expr().
- Parameters:
output_measure (Measure) – How to measure privacy loss.
expr – The Expr to be privatized.
global_scale – A tunable parameter that affects the privacy-utility tradeoff.
- opendp.measurements.then_private_lazyframe(output_measure, lazyframe, global_scale=None, threshold=None)
Partial constructor of make_private_lazyframe.
See also
Delays application of input_domain and input_metric in opendp.measurements.make_private_lazyframe().
- Parameters:
output_measure (Measure) – How to measure privacy loss.
lazyframe – A description of the computations to be run, in the form of a LazyFrame.
global_scale – Optional. A tunable parameter that affects the privacy-utility tradeoff.
threshold – Optional. Minimum number of rows in each released partition.
- Example:
>>> dp.enable_features("contrib")
>>> import polars as pl
We’ll imagine an elementary school is taking a pet census. The private census data will have two columns:
>>> lf_domain = dp.lazyframe_domain([
...     dp.series_domain("grade", dp.atom_domain(T=dp.i32)),
...     dp.series_domain("pet_count", dp.atom_domain(T=dp.i32))])
We also need to specify the column we’ll be grouping by.
>>> lf_domain_with_margin = dp.with_margin(
...     lf_domain,
...     by=["grade"],
...     public_info="keys",
...     max_partition_length=50)
With that in place, we can plan the Polars computation, using the dp plugin.
>>> plan = (
...     pl.LazyFrame(schema={'grade': pl.Int32, 'pet_count': pl.Int32})
...     .group_by("grade")
...     .agg(pl.col("pet_count").dp.sum((0, 10), scale=1.0)))
We now have all the pieces to make our measurement function using make_private_lazyframe:
>>> dp_sum_pets_by_grade = dp.m.make_private_lazyframe(
...     input_domain=lf_domain_with_margin,
...     input_metric=dp.symmetric_distance(),
...     output_measure=dp.max_divergence(),
...     lazyframe=plan,
...     global_scale=1.0)
It’s only at this point that we need to introduce the private data.
>>> df = pl.from_records(
...     [
...         [0, 0], # No kindergarteners with pets.
...         [0, 0],
...         [0, 0],
...         [1, 1], # Each first grader has 1 pet.
...         [1, 1],
...         [1, 1],
...         [2, 1], # One second grader has chickens!
...         [2, 1],
...         [2, 9]
...     ],
...     schema=['grade', 'pet_count'], orient="row")
>>> lf = pl.LazyFrame(df)
>>> results = dp_sum_pets_by_grade(lf).collect()
>>> print(results.sort("grade"))
shape: (3, 2)
┌───────┬───────────┐
│ grade ┆ pet_count │
│ ---   ┆ ---       │
│ i64   ┆ i64       │
╞═══════╪═══════════╡
│ 0     ┆ ...       │
│ 1     ┆ ...       │
│ 2     ┆ ...       │
└───────┴───────────┘
- opendp.measurements.then_randomized_response_bitvec(f, constant_time=False)
Partial constructor of make_randomized_response_bitvec.
See also
Delays application of input_domain and input_metric in opendp.measurements.make_randomized_response_bitvec().
- Parameters:
f (float) – Per-bit flipping probability. Must be in $(0, 1]$.
constant_time (bool) – Whether to run the Bernoulli samplers in constant time; this is likely to be extremely slow.
- Example:
>>> import numpy as np
>>> import opendp.prelude as dp
>>> dp.enable_features("contrib")
>>> # Create the randomized response mechanism
>>> m_rr = dp.m.make_randomized_response_bitvec(
...     dp.bitvector_domain(max_weight=4), dp.discrete_distance(), f=0.95
... )
>>> # compute privacy loss
>>> m_rr.map(1)
0.8006676684558611
>>> # formula is 2 * m * ln((2 - f) / f)
>>> # where m = 4 (the weight) and f = .95 (the flipping probability)
>>> # prepare a dataset to release, by encoding a bit vector as a numpy byte array
>>> data = np.packbits(
...     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
... )
>>> assert np.array_equal(data, np.array([0, 8, 12], dtype=np.uint8))
>>> # roundtrip: numpy -> bytes -> mech -> bytes -> numpy
>>> release = np.frombuffer(m_rr(data.tobytes()), dtype=np.uint8)
>>> # compare the two bit vectors:
>>> [int(bit) for bit in np.unpackbits(data)]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
>>> [int(bit) for bit in np.unpackbits(release)]
[...]
- opendp.measurements.then_report_noisy_max_gumbel(scale, optimize)
Partial constructor of make_report_noisy_max_gumbel.
See also
Delays application of input_domain and input_metric in opendp.measurements.make_report_noisy_max_gumbel().
- Parameters:
scale (float) – Higher scales are more private.
optimize (str) – Indicate whether to privately return the “max” or “min”
- Example:
>>> dp.enable_features("contrib")
>>> input_space = dp.vector_domain(dp.atom_domain(T=int)), dp.linf_distance(T=int)
>>> select_index = dp.m.make_report_noisy_max_gumbel(*input_space, scale=1.0, optimize='max')
>>> print('2?', select_index([1, 2, 3, 2, 1]))
2? ...
Or, more readably, define the space and then chain:
>>> select_index = input_space >> dp.m.then_report_noisy_max_gumbel(scale=1.0, optimize='max')
>>> print('2?', select_index([1, 2, 3, 2, 1]))
2? ...
- opendp.measurements.then_user_measurement(output_measure, function, privacy_map, TO='ExtrinsicObject')
Partial constructor of make_user_measurement.
See also
Delays application of input_domain and input_metric in opendp.measurements.make_user_measurement().
- Parameters:
output_measure (Measure) – The measure from which distances between adjacent output distributions are measured.
function – A function mapping data from input_domain to a release of type TO.
privacy_map – A function mapping distances from input_metric to output_measure.
TO (Type Argument) – The data type of outputs from the function.
- Example:
>>> dp.enable_features("contrib", "honest-but-curious")
>>> def const_function(_arg):
...     return 42
>>> def privacy_map(_d_in):
...     return 0.
>>> space = dp.atom_domain(T=int), dp.absolute_distance(int)
>>> user_measurement = dp.m.make_user_measurement(
...     *space,
...     output_measure=dp.max_divergence(),
...     function=const_function,
...     privacy_map=privacy_map
... )
>>> print('42?', user_measurement(0))
42? 42