opendp.mod module
The mod module provides the classes which implement the OpenDP Programming Framework, as well as utilities for enabling features and finding parameter values. The classes here correspond to other top-level modules: for example, instances of opendp.mod.Domain are either inputs or outputs for functions in opendp.domains.
- class opendp.mod.Domain
Bases: LP_AnyDomain
See the Domain section in the Programming Framework docs for more context.
Functions for creating domains are in opendp.domains.
- carrier_type
- descriptor
- type
- class opendp.mod.Function
Bases: LP_AnyFunction
See the Function section in the Programming Framework docs for more context.
- class opendp.mod.Measure
Bases: LP_AnyMeasure
See the Measure section in the Programming Framework docs for more context.
Measures should be created with the functions in opendp.measures, or with opendp.context for a higher-level interface:
>>> import opendp.prelude as dp
>>> measure, distance = dp.loss_of(epsilon=1.0)
>>> measure, distance
(MaxDivergence(f64), 1.0)
- distance_type
- type
- class opendp.mod.Measurement
Bases: LP_AnyMeasurement
A differentially private unit of computation. A measurement contains a function and a privacy relation. The function computes a differentially private release. The privacy relation maps a bound on the distance between inputs (in the input metric) to a bound on the privacy loss (in the output measure).
See the Measurement section in the Programming Framework docs for more context.
Functions for creating measurements are in opendp.measurements.
- Example:
>>> import opendp.prelude as dp
>>> dp.enable_features("contrib")
>>> # create an instance of Measurement using a constructor from the meas module
>>> laplace = dp.m.make_laplace(
...     dp.atom_domain(T=int), dp.absolute_distance(T=int),
...     scale=2.)
>>> laplace
Measurement(
    input_domain = AtomDomain(T=i32),
    input_metric = AbsoluteDistance(i32),
    output_measure = MaxDivergence(f64))
>>> # invoke the measurement (invoke and __call__ are equivalent)
>>> print('explicit: ', laplace.invoke(100))  # -> 101
explicit: ...
>>> print('concise: ', laplace(100))  # -> 99
concise: ...
>>> # check the measurement's relation at
>>> # (1, 0.5): (AbsoluteDistance<i32>, MaxDivergence)
>>> assert laplace.check(1, 0.5)
>>> # chain with a transformation from the trans module
>>> chained = (
...     (dp.vector_domain(dp.atom_domain(T=int)), dp.symmetric_distance()) >>
...     dp.t.then_count() >>
...     laplace
... )
>>> # the resulting measurement has the same features
>>> print('dp count: ', chained([1, 2, 3]))  # -> 4
dp count: ...
>>> # check the chained measurement's relation at
>>> # (1, 0.5): (SymmetricDistance, MaxDivergence)
>>> assert chained.check(1, 0.5)
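To see why the chained measurement passes check(1, 0.5), the relations can be composed by hand. The pure-Python sketch below makes no OpenDP calls. It assumes count is 1-stable and that Laplace noise at scale s spends epsilon = d_in / s; the names count_map, laplace_map and chained_check are hypothetical. OpenDP's actual relations also account for floating-point rounding, so treat this as an illustration only.

```python
# Hypothetical sketch: relations written as "maps" from an input distance to the
# smallest output distance they guarantee. Assumes count is 1-stable and that the
# Laplace relation is epsilon = d_in / scale (no floating-point rounding slack).

def count_map(d_in: int) -> int:
    # adding or removing one record changes the count by at most one
    return d_in


def laplace_map(d_in: float, scale: float) -> float:
    # epsilon spent by Laplace noise at the given sensitivity
    return d_in / scale


def chained_check(d_in: int, d_out: float, scale: float = 2.0) -> bool:
    # chaining composes the maps; the chain passes if the result is within d_out
    return laplace_map(count_map(d_in), scale) <= d_out


print(chained_check(1, 0.5))  # True: 1 / 2.0 == 0.5
print(chained_check(1, 0.4))  # False: 0.5 > 0.4
```

The same composition explains why a tighter budget (d_out=0.4) fails at d_in=1 with scale 2.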
- check(d_in, d_out, *, debug=False)
Check if the measurement is (d_in, d_out)-close. If true, this implies that if the distance between inputs is at most d_in, then the privacy usage is at most d_out. See also Transformation.check(), a similar check for transformations.
- Parameters:
d_in – Distance in terms of the input metric.
d_out – Distance in terms of the output measure.
debug – Enable to raise Exceptions to help identify why the privacy relation failed.
- Returns:
True if a release is differentially private at (d_in, d_out), False otherwise.
- Return type:
bool
- function
- input_carrier_type
Retrieve the carrier type of the input domain. Any member of the input domain is a member of the carrier type.
- Returns:
carrier type
- input_distance_type
Retrieve the distance type of the input metric. This may be any integral type for dataset metrics, or any numeric type for sensitivity metrics.
- Returns:
distance type
- input_domain
- input_metric
- input_space
- invoke(arg)
Create a differentially-private release with arg.
If self is (d_in, d_out)-close, then each invocation of this function is a d_out-DP release.
- Parameters:
arg – Input to the measurement.
- Returns:
differentially-private release
- Raises:
OpenDPException – packaged error from the core OpenDP library
- output_distance_type
Retrieve the distance type of the output measure. This is the type that the budget is expressed in.
- Returns:
distance type
- output_measure
- class opendp.mod.Metric
Bases: LP_AnyMetric
See the Metric section in the Programming Framework docs for more context.
Functions for creating metrics are in opendp.metrics.
- distance_type
- type
- exception opendp.mod.OpenDPException(variant, message=None, raw_traceback=None)
Bases: Exception
General exception for errors originating from the underlying OpenDP library. The variant attribute identifies the kind of error and can be matched on; see the Rust ErrorVariant enum for the values it may take. Error variants may change in library updates.
Run dp.enable_features('rust-stack-trace') to see wrapped Rust stack traces.
- Parameters:
variant (str) –
message (Optional[str]) –
raw_traceback (str | None) –
- raw_traceback: str | None
- class opendp.mod.Transformation
Bases: LP_AnyTransformation
A non-differentially private unit of computation. A transformation contains a function and a stability relation. The function maps from an input domain to an output domain. The stability relation maps from an input metric to an output metric.
See the Transformation section in the Programming Framework docs for more context.
Functions for creating transformations are in opendp.transformations.
- Example:
>>> import opendp.prelude as dp
>>> dp.enable_features("contrib")
>>> # create an instance of Transformation using a constructor from the trans module
>>> input_space = (dp.vector_domain(dp.atom_domain(T=int)), dp.symmetric_distance())
>>> count = input_space >> dp.t.then_count()
>>> count
Transformation(
    input_domain = VectorDomain(AtomDomain(T=i32)),
    output_domain = AtomDomain(T=i32),
    input_metric = SymmetricDistance(),
    output_metric = AbsoluteDistance(i32))
>>> count.input_space
(VectorDomain(AtomDomain(T=i32)), SymmetricDistance())
>>> # invoke the transformation (invoke and __call__ are equivalent)
>>> count.invoke([1, 2, 3])
3
>>> count([1, 2, 3])
3
>>> # check the transformation's relation at
>>> # (1, 1): (SymmetricDistance, AbsoluteDistance<i32>)
>>> assert count.check(1, 1)
>>> # chain with more transformations from the trans module
>>> chained = (
...     dp.t.make_split_lines() >>
...     dp.t.then_cast_default(TOA=int) >>
...     count
... )
>>> # the resulting transformation has the same features
>>> chained("1\n2\n3")
3
>>> assert chained.check(1, 1)  # all chained transformations are 1-stable
- check(d_in, d_out, *, debug=False)
Check if the transformation is (d_in, d_out)-close. If true, this implies that if the distance between inputs is at most d_in, then the distance between outputs is at most d_out. See also Measurement.check(), a similar check for measurements.
- Parameters:
d_in – Distance in terms of the input metric.
d_out – Distance in terms of the output metric.
debug – Enable to raise Exceptions to help identify why the stability relation failed.
- Returns:
True if the relation passes. False if the relation failed.
- Return type:
bool
- Raises:
OpenDPException – packaged error from the core OpenDP library
- function
- input_carrier_type
Retrieve the carrier type of the input domain. Any member of the input domain is a member of the carrier type.
- Returns:
carrier type
- input_distance_type
Retrieve the distance type of the input metric. This may be any integral type for dataset metrics, or any numeric type for sensitivity metrics.
- Returns:
distance type
- input_domain
- input_metric
- input_space
- invoke(arg)
Execute a non-differentially-private query with arg.
- Parameters:
arg – Input to the transformation.
- Returns:
non-differentially-private answer
- Raises:
OpenDPException – packaged error from the core OpenDP library
- output_distance_type
Retrieve the distance type of the output metric. This may be any integral type for dataset metrics, or any numeric type for sensitivity metrics.
- Returns:
distance type
- output_domain
- output_metric
- output_space
- opendp.mod.binary_search(predicate: Callable[[float], bool], bounds: tuple[float, float] | None = None, T: Type[float] | None = None, return_sign: Literal[False] = False) -> float
- opendp.mod.binary_search(predicate: Callable[[float], bool], bounds: tuple[float, float] | None = None, T: Type[float] | None = None, *, return_sign: Literal[True]) -> tuple[float, int]
- opendp.mod.binary_search(predicate: Callable[[float], bool], bounds: tuple[float, float] | None, T: Type[float] | None, return_sign: Literal[True]) -> tuple[float, int]
Find the closest passing value to the decision boundary of predicate.
If bounds are not passed, conducts an exponential search.
- Parameters:
predicate – a monotonic unary function from a number to a boolean
bounds – a 2-tuple of the lower and upper bounds to the input of predicate
T – type of argument to predicate, one of {float, int}
return_sign – if True, also return the direction away from the decision boundary
- Returns:
the discovered parameter within the bounds, or a (parameter, sign) tuple if return_sign is True
- Raises:
TypeError – if the type is not inferrable (pass T) or the type is invalid
ValueError – if the predicate function is constant, bounds cannot be inferred, or decision boundary is not within bounds.
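Once bounds are known, the search itself is a bisection. The sketch below is a hypothetical helper (bisect_sketch), not OpenDP's source: it keeps one endpoint passing and the other failing, stops when integer endpoints are adjacent or a float midpoint rounds onto an endpoint, and reports the sign as +1 when passing values lie above the boundary. That sign convention is an assumption.

```python
from typing import Callable


def bisect_sketch(predicate: Callable[[float], bool], lower, upper, T=float):
    """Hypothetical sketch: return (closest passing value, sign) for a monotonic
    predicate whose decision boundary lies within [lower, upper]."""
    lower_passes, upper_passes = predicate(lower), predicate(upper)
    if lower_passes == upper_passes:
        raise ValueError("decision boundary is not within bounds")
    while True:
        if T is int:
            # integers: stop once the endpoints are adjacent
            if upper - lower <= 1:
                break
            mid = (lower + upper) // 2
        else:
            # floats: stop once the midpoint rounds onto an endpoint
            mid = lower + (upper - lower) / 2
            if mid == lower or mid == upper:
                break
        # invariant: predicate(upper) == upper_passes and predicate(lower) == lower_passes
        if predicate(mid) == upper_passes:
            upper = mid
        else:
            lower = mid
    # return the passing endpoint, plus which side of the boundary passes
    return (upper, 1) if upper_passes else (lower, -1)


print(bisect_sketch(lambda x: x >= 5., 0., 10.))     # (5.0, 1)
print(bisect_sketch(lambda x: x > 5, 0, 10, T=int))  # (6, 1)
print(bisect_sketch(lambda x: x < 5, 0, 10, T=int))  # (4, -1)
```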
- Example:
>>> import opendp.prelude as dp
>>> dp.binary_search(lambda x: x >= 5.)
5.0
>>> dp.binary_search(lambda x: x <= 5.)
5.0
>>> dp.binary_search(lambda x: x > 5, T=int)
6
>>> dp.binary_search(lambda x: x < 5, T=int)
4
Find the epsilon usage of the Gaussian mechanism (scale=1.) applied to a DP mean. Assume neighboring datasets differ by up to three additions/removals, and fix delta to 1e-8.
>>> dp.enable_features("contrib")
>>> # build a Gaussian DP mean, converted to (epsilon, delta)-DP with delta fixed at 1e-8
>>> input_space = dp.vector_domain(dp.atom_domain(bounds=(0., 100.)), 1000), dp.symmetric_distance()
>>> dp_mean = dp.c.make_fix_delta(dp.c.make_zCDP_to_approxDP(
...     input_space >> dp.t.then_mean() >> dp.m.then_gaussian(1.)),
...     1e-8
... )
...
>>> dp.binary_search(
...     lambda d_out: dp_mean.check(3, (d_out, 1e-8)),
...     bounds = (0., 1.))
0.5235561269546629
Find the L2 distance sensitivity of a histogram when neighboring datasets differ by up to 3 additions/removals.
>>> histogram = dp.t.make_count_by_categories(
...     dp.vector_domain(dp.atom_domain(T=str)), dp.symmetric_distance(),
...     categories=["a"], MO=dp.L2Distance[int])
...
>>> dp.binary_search(
...     lambda d_out: histogram.check(3, d_out),
...     bounds = (0, 100))
3
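The result of 3 can be checked by brute force. The sketch below assumes each addition or removal changes exactly one bin count by one, and that the histogram has one bin for "a" plus a catch-all bin for other values; max_l2_change is a hypothetical helper. Under those assumptions the worst case puts all three changes in one bin, so the L2 sensitivity equals d_in rather than something smaller like sqrt(d_in).

```python
import math
from itertools import combinations_with_replacement


def max_l2_change(d_in: int, num_bins: int) -> float:
    """Hypothetical: worst-case L2 change when each of d_in additions/removals
    moves exactly one bin count by one."""
    worst = 0.0
    # each combination assigns every unit of distance to some bin
    for bins in combinations_with_replacement(range(num_bins), d_in):
        changes = [bins.count(b) for b in range(num_bins)]
        worst = max(worst, math.sqrt(sum(c * c for c in changes)))
    return worst


# assumed bins: "a" plus a catch-all bin for values outside the categories
sensitivity = max_l2_change(3, num_bins=2)
print(sensitivity)             # 3.0
print(math.ceil(sensitivity))  # 3, the smallest integer d_out that passes
```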
- opendp.mod.binary_search_chain(make_chain, d_in, d_out, bounds=None, T=None)
Find the highest-utility (d_in, d_out)-close Transformation or Measurement.
Searches for the numeric parameter to make_chain that results in a computation that most tightly satisfies d_out when datasets differ by at most d_in, then returns the Transformation or Measurement corresponding to said parameter.
See binary_search_param to retrieve the discovered parameter instead of the complete computation chain.
- Parameters:
make_chain (Callable[[float], M]) – a function that takes a number and returns a Transformation or Measurement
d_in (Any) – how far apart input datasets can be
d_out (Any) – how far apart output datasets or distributions can be
bounds (tuple[float, float] | None) – a 2-tuple of the lower and upper bounds on the input of make_chain
T – type of argument to make_chain, one of {float, int}
- Returns:
a chain parameterized at the nearest passing value to the decision point of the relation
- Return type:
Union[Transformation, Measurement]
- Raises:
TypeError – if the type is not inferrable (pass T) or the type is invalid
ValueError – if the predicate function is constant, bounds cannot be inferred, or decision boundary is not within bounds.
- Examples:
Find a laplace measurement with the smallest noise scale that is still (d_in, d_out)-close.
>>> import opendp.prelude as dp
>>> dp.enable_features("floating-point", "contrib")
...
>>> # The majority of the chain only needs to be defined once.
>>> pre = (
...     dp.space_of(list[float]) >>
...     dp.t.then_clamp(bounds=(0., 1.)) >>
...     dp.t.then_resize(size=10, constant=0.) >>
...     dp.t.then_mean()
... )
...
>>> # Find a value in `bounds` that produces a (`d_in`, `d_out`)-chain nearest the decision boundary.
>>> # The lambda function returns the complete computation chain when given a single numeric parameter.
>>> chain = dp.binary_search_chain(
...     lambda s: pre >> dp.m.then_laplace(scale=s),
...     d_in=1, d_out=1.)
...
>>> # The resulting computation chain is always (`d_in`, `d_out`)-close, but we can still double-check:
>>> assert chain.check(1, 1.)
Build a (2 neighboring, 1. epsilon)-close sized bounded sum with discrete_laplace(100.) noise. It should have the widest possible admissible clamping bounds (-b, b).
>>> def make_sum(b):
...     space = dp.vector_domain(dp.atom_domain((-b, b)), 10_000), dp.symmetric_distance()
...     return space >> dp.t.then_sum() >> dp.m.then_laplace(100.)
...
>>> # `meas` is a Measurement with the widest possible clamping bounds.
>>> meas = dp.binary_search_chain(make_sum, d_in=2, d_out=1., bounds=(0, 10_000))
...
>>> # If you want the discovered clamping bound, use `binary_search_param` instead.
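A back-of-the-envelope version of this search is sketched below. It assumes the sum's sensitivity at symmetric distance d_in is d_in * b (one replaced record, which is distance 2, moves the sum by at most 2b) and that discrete Laplace noise at scale 100 spends epsilon = sensitivity / 100. The helper sum_epsilon is hypothetical and skips OpenDP's overflow and rounding checks; under these assumptions the widest admissible bound is b = 50.

```python
def sum_epsilon(b: int, d_in: int = 2, scale: float = 100.0) -> float:
    # Hypothetical relation: sensitivity = d_in * b, epsilon = sensitivity / scale
    return d_in * b / scale


# integer bisection for the widest b in [0, 10_000] whose epsilon stays within 1.0
lower, upper = 0, 10_000  # lower passes, upper fails
while upper - lower > 1:
    mid = (lower + upper) // 2
    if sum_epsilon(mid) <= 1.0:
        lower = mid  # mid passes, so the answer is at least mid
    else:
        upper = mid  # mid fails, so the answer is below mid
print(lower)  # 50
```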
- opendp.mod.binary_search_param(make_chain, d_in, d_out, bounds=None, T=None)
Solve for the ideal constructor argument to make_chain.
Optimizes a parameterized chain make_chain within float or integer bounds, subject to the chained relation being (d_in, d_out)-close.
- Parameters:
make_chain (Callable[[float], Transformation | Measurement]) – a function that takes a number and returns a Transformation or Measurement
d_in (Any) – how far apart input datasets can be
d_out (Any) – how far apart output datasets or distributions can be
bounds (tuple[float, float] | None) – a 2-tuple of the lower and upper bounds on the input of make_chain
T – type of argument to make_chain, one of {float, int}
- Returns:
the nearest passing value to the decision point of the relation
- Raises:
TypeError – if the type is not inferrable (pass T) or the type is invalid
ValueError – if the predicate function is constant, bounds cannot be inferred, or decision boundary is not within bounds.
- Return type:
float
- Example:
>>> import opendp.prelude as dp
>>> dp.enable_features("contrib")
...
>>> # Find a value in `bounds` that produces a (`d_in`, `d_out`)-chain nearest the decision boundary.
>>> # The first argument is any function that returns your complete computation chain
>>> # when passed a single numeric parameter.
...
>>> def make_fixed_laplace(scale):
...     # fixes the input domain and metric, but parameterizes the noise scale
...     return dp.m.make_laplace(dp.atom_domain(T=float), dp.absolute_distance(T=float), scale)
...
>>> scale = dp.binary_search_param(make_fixed_laplace, d_in=0.1, d_out=1.)
>>> assert scale == 0.1
>>> # Constructing the same chain with the discovered parameter will always be (0.1, 1.)-close.
>>> assert make_fixed_laplace(scale).check(0.1, 1.)
A policy research organization wants to know the smallest sample size necessary to release an “accurate” epsilon=1 DP mean income. Determine the smallest dataset size such that, with 95% confidence, the DP release differs from the clipped dataset’s mean by no more than 1000. Assume that neighboring datasets have a symmetric distance at most 2. Also assume a clipping bound of 500,000.
>>> # we first work out the necessary noise scale to satisfy the above constraints.
>>> necessary_scale = dp.accuracy_to_laplacian_scale(accuracy=1000., alpha=.05)
...
>>> # we then write a function that makes a computation chain with a given data size
>>> def make_mean(data_size):
...     return (
...         (dp.vector_domain(dp.atom_domain(bounds=(0., 500_000.)), data_size), dp.symmetric_distance()) >>
...         dp.t.then_mean() >>
...         dp.m.then_laplace(necessary_scale)
...     )
...
>>> # solve for the smallest dataset size that admits a (2 neighboring, 1. epsilon)-close measurement
>>> dp.binary_search_param(
...     make_mean,
...     d_in=2, d_out=1.,
...     bounds=(1, 1000000))
1498
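The answer of 1498 can be reproduced by hand. The sketch below assumes accuracy_to_laplacian_scale uses the Laplace tail bound P(|noise| > accuracy) = exp(-accuracy / scale) = alpha, and that the mean of n records clamped to [0, 500,000] has sensitivity 500,000 / n at symmetric distance 2 (one replaced record). Both are stated assumptions rather than OpenDP's implementation, and the helper epsilon is hypothetical.

```python
import math

accuracy, alpha = 1000.0, 0.05
# assumed tail bound: exp(-accuracy / scale) = alpha  =>  scale = accuracy / ln(1 / alpha)
scale = accuracy / math.log(1 / alpha)


def epsilon(n: int) -> float:
    # assumed mean sensitivity at symmetric distance 2, divided by the Laplace scale
    return (500_000 / n) / scale


# smallest n with epsilon(n) <= 1, i.e. n >= 500_000 / scale
n = math.ceil(500_000 / scale)
print(round(scale, 2))  # 333.81
print(n)                # 1498
```

Because the required n grows with ln(1 / alpha) and with the clipping bound, tightening either constraint raises the minimum sample size proportionally.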
- opendp.mod.exponential_bounds_search(predicate, T)
Determine bounds for a binary search via an exponential search, in large bands of [2^((k - 1)^2), 2^(k^2)] for k in [0, 8). If predicate raises an exception, it attempts to recover once by searching bands on the non-raising side of the exception boundary.
- Parameters:
predicate (Callable[[float], bool]) – a monotonic unary function from a number to a boolean
T (Type[float] | None) – type of argument to predicate, one of {float, int}
- Returns:
a tuple of float or int bounds that the decision boundary lies within
- Raises:
TypeError – if the type is not inferrable (pass T)
ValueError – if the predicate function is constant
- Return type:
tuple[float, float] | None
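A simplified sketch of the band search follows. It is hypothetical (exponential_bounds_sketch): it only searches non-negative arguments from 0 upward, uses band edges 0 and 2^(k^2) for k in [0, 8), and omits the exception-recovery step described above. Its output is meant to be passed as bounds to a bisection.

```python
def exponential_bounds_sketch(predicate, T=float):
    """Hypothetical sketch: return (lower, upper) bracketing the decision boundary
    of a monotonic predicate, searching only non-negative arguments."""
    # band edges: 0, then 2^(k^2) for k in [0, 8) -> 0, 1, 2, 16, 512, 65536, 2^25, 2^36, 2^49
    edges = [T(0)] + [T(2 ** (k ** 2)) for k in range(8)]
    at_zero = predicate(edges[0])
    for lower, upper in zip(edges, edges[1:]):
        # the first band whose upper edge flips the predicate contains the boundary
        if predicate(upper) != at_zero:
            return lower, upper
    raise ValueError("predicate is constant on the searched bands")


print(exponential_bounds_sketch(lambda x: x >= 100.))     # (16.0, 512.0)
print(exponential_bounds_sketch(lambda x: x > 5, T=int))  # (2, 16)
```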