opendp.mod module
- class opendp.mod.Measurement
Bases:
LP_AnyMeasurement
A differentially private unit of computation. A measurement contains a function and a privacy relation. The function computes a differentially private release from its input. The privacy relation maps from an input metric to an output measure.
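To make the pairing of a function with a privacy relation concrete, here is a minimal pure-Python sketch. `ToyMeasurement` and `toy_laplace` are hypothetical names, not part of OpenDP, and the relation assumes the standard pure-DP Laplace bound epsilon = d_in / scale. The noise is sampled with ordinary floating-point arithmetic, which is not safe for real releases; the core library's samplers exist to avoid exactly that.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyMeasurement:
    """Hypothetical stand-in pairing a function with a privacy relation."""
    function: Callable[[float], float]
    privacy_relation: Callable[[float, float], bool]

    def __call__(self, arg: float) -> float:
        return self.function(arg)

    def check(self, d_in: float, d_out: float) -> bool:
        return self.privacy_relation(d_in, d_out)


def toy_laplace(scale: float) -> ToyMeasurement:
    """Hypothetical Laplace-style measurement over absolute distance."""
    def release(x: float) -> float:
        # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
        # NOT a secure sampler: for illustration only.
        return x + random.expovariate(1 / scale) - random.expovariate(1 / scale)

    def relation(d_in: float, d_out: float) -> bool:
        # Assumed pure-DP bound: epsilon consumed = d_in / scale.
        return d_in / scale <= d_out

    return ToyMeasurement(release, relation)


meas = toy_laplace(scale=2.0)
print(meas(100.0))          # random, near 100
print(meas.check(1, 0.5))   # True: 1 / 2.0 <= 0.5
print(meas.check(1, 0.25))  # False
```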
- Example:
>>> import opendp.prelude as dp
>>> dp.enable_features("contrib")
...
>>> # create an instance of Measurement using a constructor from the meas module
>>> base_dl: dp.Measurement = dp.m.make_base_discrete_laplace(
...     dp.atom_domain(T=int), dp.absolute_distance(T=int),
...     scale=2.)
...
>>> # invoke the measurement (invoke and __call__ are equivalent); outputs are random
>>> base_dl.invoke(100)  # -> 101
>>> base_dl(100)  # -> 99
...
>>> # check the measurement's relation at
>>> # (1, 0.5): (AbsoluteDistance<i32>, MaxDivergence)
>>> assert base_dl.check(1, 0.5)
...
>>> # chain with a transformation from the trans module
>>> chained = (
...     (dp.vector_domain(dp.atom_domain(T=int)), dp.symmetric_distance()) >>
...     dp.t.then_count() >>
...     base_dl
... )
...
>>> # the resulting measurement has the same features
>>> chained([1, 2, 3])  # -> 4
>>> # check the chained measurement's relation at
>>> # (1, 0.5): (SymmetricDistance, MaxDivergence)
>>> assert chained.check(1, 0.5)
- check(d_in, d_out, *, debug=False)
Check if the measurement is (d_in, d_out)-close. If true, this implies that if the distance between inputs is at most d_in, then the privacy usage is at most d_out. See also Transformation.check(), a similar check for transformations.
- Parameters:
d_in – Distance in terms of the input metric.
d_out – Distance in terms of the output measure.
debug – Enable to raise Exceptions to help identify why the privacy relation failed.
- Returns:
True if the relation passes, meaning a release is differentially private at (d_in, d_out). False if the relation failed.
- Return type:
bool
- property input_carrier_type
Retrieve the carrier type of the input domain. Any member of the input domain is a member of the carrier type.
- Returns:
carrier type
- property input_distance_type
Retrieve the distance type of the input metric. This may be any integral type for dataset metrics, or any numeric type for sensitivity metrics.
- Returns:
distance type
- invoke(arg)
Create a differentially-private release with arg.
If self is (d_in, d_out)-close, then each invocation of this function is a d_out-DP release.
- Parameters:
arg – Input to the measurement.
- Returns:
differentially-private release
- Raises:
OpenDPException – packaged error from the core OpenDP library
- property output_distance_type
Retrieve the distance type of the output measure. This is the type that the budget is expressed in.
- Returns:
distance type
- exception opendp.mod.OpenDPException(variant, message=None, raw_traceback=None)
Bases:
Exception
General exception for errors originating from the underlying OpenDP library. The variant attribute holds the name of the error variant and can be matched on. Error variants may change in library updates.
See the Rust ErrorVariant for the values that variant may take on.
- Parameters:
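variant (str) – the error variant reported by the core library
message (str) – an optional error message
raw_traceback (str) – an optional raw traceback from the core library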
- class opendp.mod.Transformation
Bases:
LP_AnyTransformation
A non-differentially private unit of computation. A transformation contains a function and a stability relation. The function maps from an input domain to an output domain. The stability relation maps from an input metric to an output metric.
- Example:
>>> import opendp.prelude as dp
>>> dp.enable_features("contrib")
...
>>> # create an instance of Transformation using a constructor from the trans module
>>> input_space = (dp.vector_domain(dp.atom_domain(T=int)), dp.symmetric_distance())
>>> count: dp.Transformation = input_space >> dp.t.then_count()
...
>>> # invoke the transformation (invoke and __call__ are equivalent)
>>> count.invoke([1, 2, 3])  # -> 3
>>> count([1, 2, 3])  # -> 3
...
>>> # check the transformation's relation at
>>> # (1, 1): (SymmetricDistance, AbsoluteDistance<i32>)
>>> assert count.check(1, 1)
...
>>> # chain with more transformations from the trans module
>>> chained = (
...     dp.t.make_split_lines() >>
...     dp.t.then_cast_default(TOA=int) >>
...     count
... )
...
>>> # the resulting transformation has the same features
>>> chained("1\n2\n3")  # -> 3
>>> assert chained.check(1, 1)  # all three chained transformations are 1-stable
- check(d_in, d_out, *, debug=False)
Check if the transformation is (d_in, d_out)-close. If true, this implies that if the distance between inputs is at most d_in, then the distance between outputs is at most d_out. See also Measurement.check(), a similar check for measurements.
- Parameters:
d_in – Distance in terms of the input metric.
d_out – Distance in terms of the output metric.
debug – Enable to raise Exceptions to help identify why the stability relation failed.
- Returns:
True if the relation passes. False if the relation failed.
- Return type:
bool
- Raises:
OpenDPException – packaged error from the core OpenDP library
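Chained relations pass when every step's output distance fits the next step's input. One way to picture this, assuming each stability relation can be summarized as a map from d_in to the smallest passing d_out (the check interface itself only exposes the boolean relation), is the pure-Python sketch below. All function names are hypothetical; the 1-stable steps mirror the split/cast/count chain in the example above.

```python
from functools import reduce
from typing import Callable

# Hypothetical: summarize a stability relation as a map from d_in
# to the smallest d_out that passes.
StabilityMap = Callable[[int], int]


def one_stable(d_in: int) -> int:
    # A 1-stable step never increases the distance between neighboring inputs.
    return d_in


def count_map(d_in: int) -> int:
    # Adding or removing one record changes a count by at most one.
    return d_in


def chain_maps(*maps: StabilityMap) -> StabilityMap:
    # Push d_in through each step in order; each output feeds the next step.
    return lambda d_in: reduce(lambda d, m: m(d), maps, d_in)


def check(stability_map: StabilityMap, d_in: int, d_out: int) -> bool:
    # The relation passes when the smallest admissible d_out fits the requested bound.
    return stability_map(d_in) <= d_out


chained = chain_maps(one_stable, one_stable, count_map)  # split, cast, count
print(check(chained, 1, 1))  # True
print(check(chained, 2, 1))  # False
```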
- property input_carrier_type
Retrieve the carrier type of the input domain. Any member of the input domain is a member of the carrier type.
- Returns:
carrier type
- property input_distance_type
Retrieve the distance type of the input metric. This may be any integral type for dataset metrics, or any numeric type for sensitivity metrics.
- Returns:
distance type
- invoke(arg)
Execute a non-differentially-private query with arg.
- Parameters:
arg – Input to the transformation.
- Returns:
non-differentially-private answer
- Raises:
OpenDPException – packaged error from the core OpenDP library
- property output_distance_type
Retrieve the distance type of the output metric. This may be any integral type for dataset metrics, or any numeric type for sensitivity metrics.
- Returns:
distance type
- opendp.mod.binary_search(predicate, bounds=None, T=None, return_sign=False)
Find the closest passing value to the decision boundary of predicate within float or integer bounds.
If bounds are not passed, conducts an exponential search.
- Parameters:
predicate (Callable[[float | int], bool]) – a monotonic unary function from a number to a boolean
bounds (Tuple[float, float] | Tuple[int, int] | None) – a 2-tuple of the lower and upper bounds to the input of predicate
T – type of argument to predicate, one of {float, int}
return_sign – if True, also return the direction away from the decision boundary
- Returns:
the discovered parameter within the bounds
- Raises:
TypeError – if the type is not inferrable (pass T) or the type is invalid
ValueError – if the predicate function is constant, bounds cannot be inferred, or decision boundary is not within bounds.
- Example:
>>> from opendp.mod import binary_search
>>> # Float binary search
>>> assert binary_search(lambda x: x >= 5.) == 5.
>>> assert binary_search(lambda x: x <= 5.) == 5.
>>> # Integer binary search
>>> assert binary_search(lambda x: x > 5, T=int) == 6
>>> assert binary_search(lambda x: x < 5, T=int) == 4
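The integer examples above can be reproduced with plain bisection over explicit bounds. This is a minimal sketch of the technique, not OpenDP's implementation; `bisect_int` is a hypothetical name, and it assumes predicate is monotonic, as the parameter description requires.

```python
from typing import Callable, Tuple


def bisect_int(predicate: Callable[[int], bool], bounds: Tuple[int, int]) -> int:
    """Hypothetical sketch: return the passing value closest to the decision boundary."""
    lower, upper = bounds
    lower_passes, upper_passes = predicate(lower), predicate(upper)
    if lower_passes == upper_passes:
        raise ValueError("predicate is constant over bounds, or the boundary lies outside them")
    # Invariant: predicate(lower) == lower_passes and predicate(upper) == upper_passes.
    while upper - lower > 1:
        mid = lower + (upper - lower) // 2
        if predicate(mid) == lower_passes:
            lower = mid
        else:
            upper = mid
    # lower and upper are now adjacent; return whichever one passes.
    return lower if lower_passes else upper


print(bisect_int(lambda x: x > 5, (0, 100)))  # 6
print(bisect_int(lambda x: x < 5, (0, 100)))  # 4
```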
Find the epsilon usage of the gaussian(scale=1.) mechanism applied to a DP mean. Assume neighboring datasets differ by up to three additions/removals, and fix delta to 1e-8.
>>> # build a DP mean over 1000 records bounded in [0, 100], with gaussian noise of scale 1.
>>> input_space = dp.vector_domain(dp.atom_domain(bounds=(0., 100.)), 1000), dp.symmetric_distance()
>>> dp_mean = dp.c.make_fix_delta(dp.c.make_zCDP_to_approxDP(
...     input_space >> dp.t.then_mean() >> dp.m.then_gaussian(1.)),
...     1e-8
... )
...
>>> dp.binary_search(
...     lambda d_out: dp_mean.check(3, (d_out, 1e-8)),
...     bounds = (0., 1.))
0.5235561269546629
Find the L2 distance sensitivity of a histogram when neighboring datasets differ by up to 3 additions/removals.
>>> histogram = dp.t.make_count_by_categories(
...     dp.vector_domain(dp.atom_domain(T=str)), dp.symmetric_distance(),
...     categories=["a"], MO=dp.L2Distance[int])
...
>>> dp.binary_search(
...     lambda d_out: histogram.check(3, d_out),
...     bounds = (0, 100))
3
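Why 3? A brute-force sketch finds the largest L2 change in histogram counts when each added record increments exactly one bin. It assumes two bins (for example, "a" plus a catch-all bin) and only same-sign changes; the maximum is the same for any number of bins, because concentrating all three changes in one bin gives sqrt(3^2) = 3, while spreading them out gives less. `max_l2_change` is a hypothetical helper.

```python
import itertools
import math


def max_l2_change(d_in: int, num_bins: int) -> float:
    """Hypothetical helper: worst-case L2 change in histogram counts when
    d_in records are added and each record increments exactly one bin."""
    worst = 0.0
    # Try every way of assigning the d_in records to bins.
    for assignment in itertools.product(range(num_bins), repeat=d_in):
        deltas = [0] * num_bins
        for bin_index in assignment:
            deltas[bin_index] += 1
        worst = max(worst, math.sqrt(sum(d * d for d in deltas)))
    return worst


print(max_l2_change(3, num_bins=2))  # 3.0
```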
- opendp.mod.binary_search_chain(make_chain, d_in, d_out, bounds=None, T=None)
Useful to find the Transformation or Measurement parameterized with the ideal constructor argument.
Optimizes a parameterized chain make_chain within float or integer bounds, subject to the chained relation being (d_in, d_out)-close.
See binary_search_param to retrieve the discovered parameter instead of the complete computation chain.
- Parameters:
make_chain (Callable[[float | int], Transformation | Measurement]) – a unary function that maps from a number to a Transformation or Measurement
d_in – desired input distance of the computation chain
d_out – desired output distance of the computation chain
bounds (Tuple[float, float] | Tuple[int, int] | None) – a 2-tuple of the lower and upper bounds to the input of make_chain
T – type of argument to make_chain, one of {float, int}
- Returns:
a chain parameterized at the nearest passing value to the decision point of the relation
- Return type:
Union[Transformation, Measurement]
- Raises:
TypeError – if the type is not inferrable (pass T) or the type is invalid
ValueError – if the predicate function is constant, bounds cannot be inferred, or decision boundary is not within bounds.
- Examples:
Find a base_laplace measurement with the smallest noise scale that is still (d_in, d_out)-close.
>>> from typing import List
>>> import opendp.prelude as dp
>>> dp.enable_features("floating-point", "contrib")
...
>>> # The majority of the chain only needs to be defined once.
>>> pre = (
...     dp.space_of(List[float]) >>
...     dp.t.then_clamp(bounds=(0., 1.)) >>
...     dp.t.then_resize(size=10, constant=0.) >>
...     dp.t.then_mean()
... )
...
>>> # Find a value in `bounds` that produces a (`d_in`, `d_out`)-chain nearest the decision boundary.
>>> # The lambda function returns the complete computation chain when given a single numeric parameter.
>>> chain = dp.binary_search_chain(
...     lambda s: pre >> dp.m.then_base_laplace(scale=s),
...     d_in=1, d_out=1.)
...
>>> # The resulting computation chain is always (`d_in`, `d_out`)-close, but we can still double-check:
>>> assert chain.check(1, 1.)
Build a (2 neighboring, 1. epsilon)-close sized bounded sum with discrete_laplace(100.) noise. It should have the widest possible admissible clamping bounds (-b, b).
>>> def make_sum(b):
...     space = dp.vector_domain(dp.atom_domain((-b, b)), 10_000), dp.symmetric_distance()
...     return space >> dp.t.then_sum() >> dp.m.then_laplace(100.)
...
>>> # `meas` is a Measurement with the widest possible clamping bounds.
>>> meas = dp.binary_search_chain(make_sum, d_in=2, d_out=1., bounds=(0, 10_000))
...
>>> # If you want the discovered clamping bound, use `binary_search_param` instead.
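Conceptually, binary_search_chain searches for a parameter at which make_chain(param).check(d_in, d_out) passes and then rebuilds the chain at that parameter. The sketch below illustrates this idea with a hypothetical `ToyLaplace` stand-in (assumed relation epsilon = d_in / scale) and a fixed-iteration float bisection. It is not the library's implementation; `search_param` and `search_chain` are hypothetical names.

```python
from typing import Callable, Tuple


class ToyLaplace:
    """Hypothetical stand-in for a Laplace measurement; not an OpenDP class."""

    def __init__(self, scale: float):
        self.scale = scale

    def check(self, d_in: float, d_out: float) -> bool:
        # Assumed pure-DP relation: epsilon = d_in / scale.
        return d_in / self.scale <= d_out


def search_param(make_chain: Callable[[float], ToyLaplace], d_in: float, d_out: float,
                 bounds: Tuple[float, float], iterations: int = 100) -> float:
    lower, upper = bounds

    def passes(param: float) -> bool:
        return make_chain(param).check(d_in, d_out)

    lower_passes = passes(lower)
    if lower_passes == passes(upper):
        raise ValueError("relation does not change within bounds")
    for _ in range(iterations):
        mid = (lower + upper) / 2
        if passes(mid) == lower_passes:
            lower = mid
        else:
            upper = mid
    # Return the endpoint that passes, so the rebuilt chain is always close.
    return lower if lower_passes else upper


def search_chain(make_chain, d_in, d_out, bounds):
    # Rebuild the chain at the discovered parameter.
    return make_chain(search_param(make_chain, d_in, d_out, bounds))


chain = search_chain(ToyLaplace, d_in=1.0, d_out=1.0, bounds=(0.001, 100.0))
print(chain.check(1.0, 1.0))  # True
print(round(chain.scale, 6))  # 1.0
```

Returning the passing endpoint, rather than the midpoint, is what makes the rebuilt chain (d_in, d_out)-close by construction.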
- opendp.mod.binary_search_param(make_chain, d_in, d_out, bounds=None, T=None)
Useful to solve for the ideal constructor argument.
Optimizes a parameterized chain make_chain within float or integer bounds, subject to the chained relation being (d_in, d_out)-close.
- Parameters:
make_chain (Callable[[float | int], Transformation | Measurement]) – a unary function that maps from a number to a Transformation or Measurement
d_in – desired input distance of the computation chain
d_out – desired output distance of the computation chain
bounds (Tuple[float, float] | Tuple[int, int] | None) – a 2-tuple of the lower and upper bounds to the input of make_chain
T – type of argument to make_chain, one of {float, int}
- Returns:
the nearest passing value to the decision point of the relation
- Return type:
float | int
- Raises:
TypeError – if the type is not inferrable (pass T) or the type is invalid
ValueError – if the predicate function is constant, bounds cannot be inferred, or decision boundary is not within bounds.
- Example:
>>> import opendp.prelude as dp
...
>>> # Find a value in `bounds` that produces a (`d_in`, `d_out`)-chain nearest the decision boundary.
>>> # The first argument is any function that returns your complete computation chain
>>> # when passed a single numeric parameter.
...
>>> def make_fixed_laplace(scale):
...     # fixes the input domain and metric, but parameterizes the noise scale
...     return dp.m.make_base_laplace(dp.atom_domain(T=float), dp.absolute_distance(T=float), scale)
...
>>> scale = dp.binary_search_param(make_fixed_laplace, d_in=0.1, d_out=1.)
>>> assert scale == 0.1
>>> # Constructing the same chain with the discovered parameter will always be (0.1, 1.)-close.
>>> assert make_fixed_laplace(scale).check(0.1, 1.)
A policy research organization wants to know the smallest sample size necessary to release an “accurate” epsilon=1 DP mean income. Determine the smallest dataset size such that, with 95% confidence, the DP release differs from the clipped dataset’s mean by no more than 1000. Assume that neighboring datasets have a symmetric distance of at most 2. Also assume clipping bounds of [0, 500,000].
>>> # we first work out the necessary noise scale to satisfy the above constraints.
>>> necessary_scale = dp.accuracy_to_laplacian_scale(accuracy=1000., alpha=.05)
...
>>> # we then write a function that makes a computation chain with a given data size
>>> def make_mean(data_size):
...     return (
...         (dp.vector_domain(dp.atom_domain(bounds=(0., 500_000.)), data_size), dp.symmetric_distance()) >>
...         dp.t.then_mean() >>
...         dp.m.then_base_laplace(necessary_scale)
...     )
...
>>> # solve for the smallest dataset size that admits a (2 neighboring, 1. epsilon)-close measurement
>>> dp.binary_search_param(
...     make_mean,
...     d_in=2, d_out=1.,
...     bounds=(1, 1000000))
1498
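The answer 1498 can be checked by hand. The sketch below assumes that accuracy_to_laplacian_scale computes accuracy / ln(1 / alpha), from the Laplace tail bound P(|noise| >= t) = exp(-t / scale), and that a sized, bounded mean under symmetric distance has sensitivity (d_in // 2) * (upper - lower) / n. It ignores the conservative floating-point rounding the library performs; both function names are hypothetical.

```python
import math
from typing import Tuple


def laplace_scale_for_accuracy(accuracy: float, alpha: float) -> float:
    # Laplace tail: P(|noise| >= t) = exp(-t / scale). Setting this to alpha
    # at t = accuracy gives scale = accuracy / ln(1 / alpha).
    return accuracy / math.log(1 / alpha)


def min_dataset_size(bounds: Tuple[float, float], accuracy: float, alpha: float,
                     epsilon: float, d_in: int = 2) -> int:
    lower, upper = bounds
    scale = laplace_scale_for_accuracy(accuracy, alpha)
    # Assumed sensitivity of a sized, bounded mean: (d_in // 2) * (upper - lower) / n,
    # since sized neighbors differ by replacing records.
    # Requiring sensitivity / scale <= epsilon and solving for n:
    return math.ceil((d_in // 2) * (upper - lower) / (scale * epsilon))


print(round(laplace_scale_for_accuracy(1000.0, 0.05), 1))     # 333.8
print(min_dataset_size((0.0, 500_000.0), 1000.0, 0.05, 1.0))  # 1498
```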
- opendp.mod.exponential_bounds_search(predicate, T)
Determine bounds for a binary search via an exponential search, in large bands of [2^((k - 1)^2), 2^(k^2)] for k in [0, 8). If predicate raises an exception, the search attempts to recover once by searching the bands on the non-raising side of the exception boundary.
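The band edges grow very quickly: 2^(k^2) runs 1, 2, 16, 512, 65536, 2^25, 2^36, 2^49 for k = 0 through 7. Below is a minimal sketch of scanning these bands on the positive side. It is hypothetical (`find_band` is not an OpenDP function), starts at k = 1 so that every band has lower < upper, and does not model the negative side, the int/float distinction, or the exception-recovery step.

```python
from typing import Callable, Optional, Tuple


def find_band(predicate: Callable[[int], bool], max_k: int = 8) -> Optional[Tuple[int, int]]:
    """Hypothetical sketch: return the first band [2^((k-1)^2), 2^(k^2)]
    on which a monotonic predicate changes value."""
    for k in range(1, max_k):
        lower, upper = 2 ** ((k - 1) ** 2), 2 ** (k ** 2)
        if predicate(lower) != predicate(upper):
            return lower, upper
    return None  # no change found in the scanned bands


print(find_band(lambda x: x >= 1000))     # (512, 65536)
print(find_band(lambda x: x >= 2 ** 60))  # None
```

The returned pair can then serve as the bounds for an ordinary bisection.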
- Parameters:
predicate (Callable[[float | int], bool]) – a monotonic unary function from a number to a boolean
T (type | None) – type of argument to predicate, one of {float, int}
- Returns:
a tuple of float or int bounds that the decision boundary lies within
- Raises:
TypeError – if the type is not inferrable (pass T)
ValueError – if the predicate function is constant
- Return type:
Tuple[float, float] | Tuple[int, int] | None