
opendp.mod module#

The mod module provides the classes which implement the OpenDP Programming Framework, as well as utilities for enabling features and finding parameter values.

The classes here correspond to other top-level modules: For example, instances of opendp.mod.Domain are either inputs or outputs for functions in opendp.domains.

class opendp.mod.AtomDomain[source]#

Bases: Domain

The domain of all values of a given atomic type.

Create an instance of this domain with opendp.domains.atom_domain().

If bounds are set, then the domain is restricted to values within the bounds. If nan is set, then NaN values are included in the domain.

bounds#: Bounds of the domain, if they exist

nan#

Whether the domain includes NaN values

Only relevant when the carrier type is a floating point type. All other types will always return False.
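
The membership rules described above can be sketched in plain Python. This is a minimal illustration, not the OpenDP implementation: ToyAtomDomain is a hypothetical stand-in, and treating the bounds as inclusive is an assumption.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToyAtomDomain:
    """Hypothetical stand-in for an atom domain over floats (not the OpenDP class)."""

    bounds: Optional[Tuple[float, float]] = None  # assumed inclusive (lower, upper)
    nan: bool = False  # whether NaN counts as a member

    def member(self, val: float) -> bool:
        # NaN is a member only if the domain explicitly allows it.
        if math.isnan(val):
            return self.nan
        # Without bounds, every non-NaN float is a member.
        if self.bounds is None:
            return True
        lower, upper = self.bounds
        return lower <= val <= upper


with_nan = ToyAtomDomain(nan=True)
clamped = ToyAtomDomain(bounds=(0.0, 1.0))

print(with_nan.member(float("nan")))  # True
print(clamped.member(0.5))            # True
print(clamped.member(2.0))            # False
print(clamped.member(float("nan")))   # False
```

In the real library, the equivalent check is Domain.member(val) on a domain built with opendp.domains.atom_domain().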

class opendp.mod.Domain[source]#

Bases: LP_AnyDomain

See the Domain section in the Programming Framework docs for more context.

Functions for creating domains are in opendp.domains.

carrier_type#: Carrier type of domain

member(val)[source]#

Check if val is a member of the domain.

Parameters:: val – a value to be checked for membership in self
Return type:: bool

type#: Type of domain

class opendp.mod.ExtrinsicDomain[source]#

Bases: Domain

A user-defined domain.

descriptor#: Descriptor of domain. Used to retrieve the descriptor associated with domains defined in Python

class opendp.mod.Function[source]#

Bases: LP_AnyFunction

See the Function section in the Programming Framework docs for more context.

class opendp.mod.LazyFrameDomain[source]#

Bases: Domain

LazyFrameDomain describes the domain of all polars LazyFrames.

Create an instance of this domain with opendp.domains.lazyframe_domain().

columns#: List of column names in the frame

get_margin(by)[source]#

Get the margin descriptor of the frame when grouped by the given columns

Parameters:: by (Sequence[Any]) –

get_series_domain(name)[source]#

Retrieve the series domain with the given name

Parameters:: name (str) –
Return type:: SeriesDomain

class opendp.mod.Measure[source]#

Bases: LP_AnyMeasure

See the Measure section in the Programming Framework docs for more context.

Create measures with the functions in opendp.measures, or use opendp.context for a higher-level interface:

>>> import opendp.prelude as dp
>>> measure, distance = dp.loss_of(epsilon=1.0)
>>> measure, distance
(MaxDivergence, 1.0)

distance_type#: Distance type of measure

type#: Type of measure

class opendp.mod.Measurement[source]#

Bases: LP_AnyMeasurement

A differentially private unit of computation. A measurement contains a function and a privacy relation. The function produces a differentially private release. The privacy relation maps from an input metric to an output measure.

See the Measurement section in the Programming Framework docs for more context.

Functions for creating measurements are in opendp.measurements.

Example:

>>> import opendp.prelude as dp
>>> dp.enable_features("contrib")

>>> # create an instance of Measurement using a constructor from the meas module
>>> laplace = dp.m.make_laplace(
...     dp.atom_domain(T=int), dp.absolute_distance(T=int),
...     scale=2.)
>>> laplace
Measurement(
    input_domain   = AtomDomain(T=i32),
    input_metric   = AbsoluteDistance(i32),
    output_measure = MaxDivergence)

>>> # invoke the measurement (invoke and __call__ are equivalent)
>>> print('explicit: ', laplace.invoke(100))  # -> 101   
explicit: ...
>>> print('concise: ', laplace(100))  # -> 99            
concise: ...
>>> # check the measurement's relation at
>>> #     (1, 0.5): (AbsoluteDistance<i32>, MaxDivergence)
>>> assert laplace.check(1, 0.5)

>>> # chain with a transformation from the trans module
>>> chained = (
...     (dp.vector_domain(dp.atom_domain(T=int)), dp.symmetric_distance()) >>
...     dp.t.then_count() >>
...     laplace
... )

>>> # the resulting measurement has the same features
>>> print('dp count: ', chained([1, 2, 3]))  # -> 4     
dp count: ...

>>> # check the chained measurement's relation at
>>> #     (1, 0.5): (SymmetricDistance, MaxDivergence)
>>> assert chained.check(1, 0.5)

check(d_in, d_out, *, debug=False)[source]#

Check if the measurement is (d_in, d_out)-close. If true, then whenever the distance between inputs is at most d_in, the privacy usage is at most d_out. See also Transformation.check(), a similar check for transformations.

Parameters:

d_in – Distance in terms of the input metric.
d_out – Distance in terms of the output measure.
debug – Enable to raise Exceptions to help identify why the privacy relation failed.

Returns:

True if a release is differentially private at (d_in, d_out). False otherwise.

Return type:

bool

function#: Function of measurement

input_carrier_type#

Retrieve the carrier type of the input domain. Any member of the input domain is a member of the carrier type.

Returns:: carrier type

input_distance_type#

Retrieve the distance type of the input metric. This may be any integral type for dataset metrics, or any numeric type for sensitivity metrics.

Returns:: distance type

input_domain#: Input domain of measurement

input_metric#: Input metric of measurement

input_space#: Input space of measurement

invoke(arg)[source]#

Create a differentially-private release with arg.

If self is (d_in, d_out)-close, then each invocation of this function is a d_out-DP release.

Parameters:: arg – Input to the measurement.
Returns:: differentially-private release
Raises:: OpenDPException – packaged error from the core OpenDP library

map(d_in)[source]#

Map an input distance d_in to an output distance.

Parameters:: d_in – Input distance
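
The relationship between map and check can be sketched for the Laplace example above. This is an idealized real-number sketch, assuming the privacy map of a Laplace mechanism is epsilon = d_in / scale; the function names are hypothetical, and the library itself may round conservatively.

```python
def toy_laplace_map(d_in: float, scale: float) -> float:
    """Idealized privacy map: the smallest epsilon spent when inputs differ by d_in."""
    return d_in / scale


def toy_check(d_in: float, d_out: float, scale: float) -> bool:
    """A relation check passes when the mapped privacy usage fits within d_out."""
    return toy_laplace_map(d_in, scale) <= d_out


# Mirrors the `laplace.check(1, 0.5)` doctest above, where scale=2.
print(toy_laplace_map(1, scale=2.0))   # 0.5
print(toy_check(1, 0.5, scale=2.0))    # True
print(toy_check(1, 0.4, scale=2.0))    # False
```

In other words, check(d_in, d_out) asks whether d_out is at least as large as the tightest bound that map(d_in) would report.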

output_distance_type#

Retrieve the distance type of the output measure. This is the type that the budget is expressed in.

Returns:: distance type

output_measure#: Output measure of measurement

class opendp.mod.Metric[source]#

Bases: LP_AnyMetric

See the Metric section in the Programming Framework docs for more context.

Functions for creating metrics are in opendp.metrics.

distance_type#: Distance type of metric

type#: Type of metric

exception opendp.mod.OpenDPException(variant, message=None, raw_traceback=None)[source]#

Bases: Exception

General exception for errors originating from the underlying OpenDP library. The variant attribute corresponds to one of the following variants and can be matched on. Error variants may change in library updates.

See Rust ErrorVariant for values variant may take on.

Run dp.enable_features('rust-stack-trace') to see wrapped Rust stack traces.

Parameters:

variant (str) –
message (Optional[str]) –
raw_traceback (str | None) –

raw_traceback: str | None#

class opendp.mod.OptionDomain[source]#

Bases: Domain

A domain whose members are either members of the element_domain, or None.

Create an instance of this domain with opendp.domains.option_domain().

The element domain is the domain of non-null values.

element_domain#: Domain of non-null values

class opendp.mod.PrivacyProfile(curve)[source]#

Given a profile function provided by the user, gives the epsilon corresponding to a given delta, and vice versa.

new_privacy_profile should be used to create new instances.

delta(epsilon)[source]#

Returns the delta that corresponds to this epsilon.

Parameters:: epsilon – Allowance for a multiplicative difference, or max divergence, in the distributions of releases on adjacent datasets

epsilon(delta)[source]#

Returns the epsilon that corresponds to this delta.

Parameters:: delta – Allowance for an additive difference between the distributions of releases on adjacent datasets
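
The two directions of a privacy profile can be sketched as follows. This is a hypothetical illustration, not the library's code: ToyProfile is an invented class, the curve is assumed to map epsilon to delta and to be non-increasing, and the example curve exp(-epsilon) is a toy chosen only because its inverse is easy to verify.

```python
import math
from typing import Callable


class ToyProfile:
    """Hypothetical profile wrapper around a user-supplied epsilon -> delta curve."""

    def __init__(self, curve: Callable[[float], float]):
        self.curve = curve  # assumed non-increasing in epsilon

    def delta(self, epsilon: float) -> float:
        # Forward direction: evaluate the curve directly.
        return self.curve(epsilon)

    def epsilon(self, delta: float, upper: float = 100.0, iters: int = 200) -> float:
        # Inverse direction: bisect for the smallest epsilon whose delta fits.
        if self.curve(upper) > delta:
            raise ValueError("delta is not attainable for any epsilon <= upper")
        lo, hi = 0.0, upper
        for _ in range(iters):
            mid = (lo + hi) / 2
            if self.curve(mid) <= delta:
                hi = mid  # mid is feasible, tighten from above
            else:
                lo = mid  # mid is infeasible, tighten from below
        return hi


profile = ToyProfile(lambda eps: math.exp(-eps))  # toy curve, not a real mechanism
eps = profile.epsilon(1e-3)
print(eps)                 # close to ln(1000)
print(profile.delta(eps))  # close to 1e-3
```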

class opendp.mod.Queryable(value, query_type)[source]#

Queryables are used for interactive mechanisms like sequential composition.

Queryables can be created with make_sequential_composition or new_queryable.

class opendp.mod.SeriesDomain[source]#

Bases: Domain

SeriesDomain describes the domain of all polars Series.

Create an instance of this domain with opendp.domains.series_domain().

element_domain#: Domain of non-null elements in the series

name#: Name of series in the domain

nullable#: Whether series in the domain may include null values

class opendp.mod.Transformation[source]#

Bases: LP_AnyTransformation

A non-differentially private unit of computation. A transformation contains a function and a stability relation. The function maps from an input domain to an output domain. The stability relation maps from an input metric to an output metric.

See the Transformation section in the Programming Framework docs for more context.

Functions for creating transformations are in opendp.transformations.

Example:

>>> import opendp.prelude as dp
>>> dp.enable_features("contrib")

>>> # create an instance of Transformation using a constructor from the trans module
>>> input_space = (dp.vector_domain(dp.atom_domain(T=int)), dp.symmetric_distance())
>>> count = input_space >> dp.t.then_count()
>>> count
Transformation(
    input_domain   = VectorDomain(AtomDomain(T=i32)),
    output_domain  = AtomDomain(T=i32),
    input_metric   = SymmetricDistance(),
    output_metric  = AbsoluteDistance(i32))

>>> count.input_space
(VectorDomain(AtomDomain(T=i32)), SymmetricDistance())

>>> # invoke the transformation (invoke and __call__ are equivalent)
>>> count.invoke([1, 2, 3])
3
>>> count([1, 2, 3])
3
>>> # check the transformation's relation at
>>> #     (1, 1): (SymmetricDistance, AbsoluteDistance<i32>)
>>> assert count.check(1, 1)

>>> # chain with more transformations from the trans module
>>> chained = (
...     dp.t.make_split_lines() >>
...     dp.t.then_cast_default(TOA=int) >>
...     count
... )

>>> # the resulting transformation has the same features
>>> chained("1\n2\n3")
3
>>> assert chained.check(1, 1)  # both chained transformations were 1-stable

check(d_in, d_out, *, debug=False)[source]#

Check if the transformation is (d_in, d_out)-close. If true, then whenever the distance between inputs is at most d_in, the distance between outputs is at most d_out. See also Measurement.check(), a similar check for measurements.

Parameters:

d_in – Distance in terms of the input metric.
d_out – Distance in terms of the output metric.
debug – Enable to raise Exceptions to help identify why the stability relation failed.

Returns:

True if the relation passes. False if the relation failed.

Return type:

bool

Raises:

OpenDPException – packaged error from the core OpenDP library

function#: Function of transformation

input_carrier_type#

Retrieve the carrier type of the input domain. Any member of the input domain is a member of the carrier type.

Returns:: carrier type

input_distance_type#

Retrieve the distance type of the input metric. This may be any integral type for dataset metrics, or any numeric type for sensitivity metrics.

Returns:: distance type

input_domain#: Input domain of transformation

input_metric#: Input metric of transformation

input_space#: Input space of transformation

invoke(arg)[source]#

Execute a non-differentially-private query with arg.

Parameters:: arg – Input to the transformation.
Returns:: non-differentially-private answer
Raises:: OpenDPException – packaged error from the core OpenDP library

map(d_in)[source]#

Map an input distance d_in to an output distance.

Parameters:: d_in – Input distance. An upper bound on how far apart neighboring datasets can be with respect to the input metric
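
How chaining composes both the function and the stability map can be sketched in plain Python. ToyTransformation is a hypothetical class for illustration only: it ignores domains and metrics, and its cast step raises on bad input instead of substituting a default.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ToyTransformation:
    """Hypothetical sketch: a function paired with a stability map."""

    function: Callable[[Any], Any]
    stability_map: Callable[[float], float]

    def __call__(self, arg: Any) -> Any:
        return self.function(arg)

    def map(self, d_in: float) -> float:
        return self.stability_map(d_in)

    def check(self, d_in: float, d_out: float) -> bool:
        return self.map(d_in) <= d_out

    def __rshift__(self, other: "ToyTransformation") -> "ToyTransformation":
        # Chaining composes the functions and, separately, the stability maps.
        return ToyTransformation(
            function=lambda arg: other.function(self.function(arg)),
            stability_map=lambda d_in: other.stability_map(self.stability_map(d_in)),
        )


# Three 1-stable steps, loosely mirroring the chained doctest above.
split_lines = ToyTransformation(lambda s: s.split("\n"), lambda d: d)
cast_int = ToyTransformation(lambda xs: [int(x) for x in xs], lambda d: d)
count = ToyTransformation(len, lambda d: d)

chained = split_lines >> cast_int >> count
print(chained("1\n2\n3"))   # 3
print(chained.map(1))       # 1
print(chained.check(1, 1))  # True
```

Because every step here is 1-stable, the composed map is also the identity, which is why the chained check at (1, 1) passes.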

output_distance_type#

Retrieve the distance type of the output metric. This may be any integral type for dataset metrics, or any numeric type for sensitivity metrics.

Returns:: distance type

output_domain#: Output domain of transformation

output_metric#: Output metric of transformation

output_space#: Output space of transformation

exception opendp.mod.UnknownTypeException[source]#

Bases: Exception

class opendp.mod.VectorDomain[source]#

Bases: Domain

VectorDomain describes the domain of all vectors whose elements are members of a given domain.

Create an instance of this domain with opendp.domains.vector_domain().

element_domain#: Domain of elements in the vector

size#: Size of vectors in the domain, if it is fixed

opendp.mod.assert_features(*features)[source]#

Check whether the given features are enabled. See Feature Listing for details.

Parameters:: features (str) –
Return type:: None

opendp.mod.binary_search(predicate: Callable[[float], bool], bounds: tuple[float, float] | None = None, T: Type[float] | None = None, return_sign: Literal[False] = False) → float[source]#

opendp.mod.binary_search(predicate: Callable[[float], bool], bounds: tuple[float, float] | None = None, T: Type[float] | None = None, *, return_sign: Literal[True]) → tuple[float, int]

opendp.mod.binary_search(predicate: Callable[[float], bool], bounds: tuple[float, float] | None, T: Type[float] | None, return_sign: Literal[True]) → tuple[float, int]

Find the closest passing value to the decision boundary of predicate.

If bounds are not passed, conducts an exponential search.

Parameters:

predicate – a monotonic unary function from a number to a boolean
bounds – a 2-tuple of the lower and upper bounds to the input of predicate
T – type of argument to predicate, one of {float, int}
return_sign – if True, also return the direction away from the decision boundary

Returns:

the discovered parameter within the bounds

Raises:

TypeError – if the type is not inferrable (pass T) or the type is invalid
ValueError – if the predicate function is constant, bounds cannot be inferred, or decision boundary is not within bounds.

Example:

>>> import opendp.prelude as dp
>>> dp.binary_search(lambda x: x >= 5.)
5.0
>>> dp.binary_search(lambda x: x <= 5.)
5.0
>>>
>>> dp.binary_search(lambda x: x > 5, T=int)
6
>>> dp.binary_search(lambda x: x < 5, T=int)
4

Find epsilon usage of the gaussian(scale=1.) mechanism applied on a dp mean. Assume neighboring datasets differ by up to three additions/removals, and fix delta to 1e-8.

>>> # build a dp mean with gaussian noise, with delta fixed to 1e-8
>>> input_space = dp.vector_domain(dp.atom_domain(bounds=(0., 100.)), 1000), dp.symmetric_distance()
>>> dp_mean = dp.c.make_fix_delta(dp.c.make_zCDP_to_approxDP(
...     input_space >> dp.t.then_mean() >> dp.m.then_gaussian(1.)), 
...     1e-8
... )
...
>>> dp.binary_search(
...     lambda d_out: dp_mean.check(3, (d_out, 1e-8)), 
...     bounds = (0., 1.))
0.5235561269546629

Find the L2 distance sensitivity of a histogram when neighboring datasets differ by up to 3 additions/removals.

>>> histogram = dp.t.make_count_by_categories(
...     dp.vector_domain(dp.atom_domain(T=str)), dp.symmetric_distance(),
...     categories=["a"], MO=dp.L2Distance[int])
...
>>> dp.binary_search(
...     lambda d_out: histogram.check(3, d_out), 
...     bounds = (0, 100))
3
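
The core idea behind the search can be sketched as a plain bisection over floats. This is a simplified illustration, not the library's implementation: toy_binary_search is a hypothetical name, it requires explicit bounds, and it does not handle integers or the return_sign option.

```python
from typing import Callable, Tuple


def toy_binary_search(
    predicate: Callable[[float], bool],
    bounds: Tuple[float, float],
    iters: int = 100,
) -> float:
    """Return the passing value closest to the decision boundary of a monotonic predicate."""
    lower, upper = bounds
    lower_passes, upper_passes = predicate(lower), predicate(upper)
    if lower_passes == upper_passes:
        raise ValueError("the decision boundary is not within the bounds")

    # Track one endpoint where the predicate holds and one where it fails.
    passing, failing = (lower, upper) if lower_passes else (upper, lower)
    for _ in range(iters):
        mid = (passing + failing) / 2
        if predicate(mid):
            passing = mid
        else:
            failing = mid
    return passing


print(toy_binary_search(lambda x: x >= 5.0, (0.0, 10.0)))  # 5.0
print(toy_binary_search(lambda x: x <= 5.0, (0.0, 10.0)))  # 5.0

# Smallest Laplace scale with idealized epsilon d_in / scale <= d_out.
scale = toy_binary_search(lambda s: 0.1 / s <= 1.0, (0.01, 10.0))
print(scale)
```

The predicate may pass on either side of the boundary, so the sketch first works out which endpoint passes and then always returns a value from the passing side.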

opendp.mod.binary_search_chain(make_chain, d_in, d_out, bounds=None, T=None)[source]#

Find the highest-utility (d_in, d_out)-close Transformation or Measurement.

Searches for the numeric parameter to make_chain that results in a computation that most tightly satisfies d_out when datasets differ by at most d_in, then returns the Transformation or Measurement corresponding to said parameter.

See binary_search_param to retrieve the discovered parameter instead of the complete computation chain.

Parameters:

make_chain (Callable[[float], M]) – a function that takes a number and returns a Transformation or Measurement
d_in (Any) – how far apart input datasets can be
d_out (Any) – how far apart output datasets or distributions can be
bounds (tuple[float, float] | None) – a 2-tuple of the lower and upper bounds on the input of make_chain
T – type of argument to make_chain, one of {float, int}

Returns:

a chain parameterized at the nearest passing value to the decision point of the relation

Return type:

Union[Transformation, Measurement]

Raises:

TypeError – if the type is not inferrable (pass T) or the type is invalid
ValueError – if the predicate function is constant, bounds cannot be inferred, or decision boundary is not within bounds.

Example:

Find a laplace measurement with the smallest noise scale that is still (d_in, d_out)-close.

>>> import opendp.prelude as dp
>>> dp.enable_features("floating-point", "contrib")
...
>>> # The majority of the chain only needs to be defined once.
>>> pre = (
...     dp.space_of(list[float]) >>
...     dp.t.then_impute_constant(0.0) >>
...     dp.t.then_clamp(bounds=(0., 1.)) >>
...     dp.t.then_resize(size=10, constant=0.) >>
...     dp.t.then_mean()
... )
...
>>> # Find a value in `bounds` that produces a (`d_in`, `d_out`)-chain nearest the decision boundary.
>>> # The lambda function returns the complete computation chain when given a single numeric parameter.
>>> chain = dp.binary_search_chain(
...     lambda s: pre >> dp.m.then_laplace(scale=s), 
...     d_in=1, d_out=1.)
...
>>> # The resulting computation chain is always (`d_in`, `d_out`)-close, but we can still double-check:
>>> assert chain.check(1, 1.)

Build a (2 neighboring, 1. epsilon)-close sized bounded sum with discrete_laplace(100.) noise. It should have the widest possible admissible clamping bounds (-b, b).

>>> def make_sum(b):
...     space = dp.vector_domain(dp.atom_domain((-b, b)), 10_000), dp.symmetric_distance()
...     return space >> dp.t.then_sum() >> dp.m.then_laplace(100.)
...
>>> # `meas` is a Measurement with the widest possible clamping bounds.
>>> meas = dp.binary_search_chain(make_sum, d_in=2, d_out=1., bounds=(0, 10_000))
...
>>> # If you want the discovered clamping bound, use `binary_search_param` instead.

opendp.mod.binary_search_param(make_chain, d_in, d_out, bounds=None, T=None)[source]#

Solve for the ideal constructor argument to make_chain.

Optimizes a parameterized chain make_chain within float or integer bounds, subject to the chained relation being (d_in, d_out)-close.

Parameters:

make_chain (Callable[[float], Transformation | Measurement]) – a function that takes a number and returns a Transformation or Measurement
d_in (Any) – how far apart input datasets can be
d_out (Any) – how far apart output datasets or distributions can be
bounds (tuple[float, float] | None) – a 2-tuple of the lower and upper bounds on the input of make_chain
T – type of argument to make_chain, one of {float, int}

Returns:

the nearest passing value to the decision point of the relation

Raises:

TypeError – if the type is not inferrable (pass T) or the type is invalid
ValueError – if the predicate function is constant, bounds cannot be inferred, or decision boundary is not within bounds.

Return type:

float

Example:

>>> import opendp.prelude as dp
...
>>> # Find a value in `bounds` that produces a (`d_in`, `d_out`)-chain nearest the decision boundary.
>>> # The first argument is any function that returns your complete computation chain
>>> #     when passed a single numeric parameter.
...
>>> def make_fixed_laplace(scale):
...     # fixes the input domain and metric, but parameterizes the noise scale
...     return dp.m.make_laplace(dp.atom_domain(T=float, nan=False), dp.absolute_distance(T=float), scale)
...
>>> scale = dp.binary_search_param(make_fixed_laplace, d_in=0.1, d_out=1.)
>>> assert scale == 0.1
>>> # Constructing the same chain with the discovered parameter will always be (0.1, 1.)-close.
>>> assert make_fixed_laplace(scale).check(0.1, 1.)

A policy research organization wants to know the smallest sample size necessary to release an “accurate” epsilon=1 DP mean income. Determine the smallest dataset size such that, with 95% confidence, the DP release differs from the clipped dataset’s mean by no more than 1000. Assume that neighboring datasets have a symmetric distance of at most 2. Also assume a clipping bound of 500,000.

>>> # we first work out the necessary noise scale to satisfy the above constraints.
>>> necessary_scale = dp.accuracy_to_laplacian_scale(accuracy=1000., alpha=.05)
...
>>> # we then write a function that makes a computation chain with a given data size
>>> def make_mean(data_size):
...    return (
...        (dp.vector_domain(dp.atom_domain(bounds=(0., 500_000.)), data_size), dp.symmetric_distance()) >>
...        dp.t.then_mean() >> 
...        dp.m.then_laplace(necessary_scale)
...    )
...
>>> # solve for the smallest dataset size that admits a (2 neighboring, 1. epsilon)-close measurement
>>> dp.binary_search_param(
...     make_mean, 
...     d_in=2, d_out=1.,
...     bounds=(1, 1000000))
1498
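
The result above can be cross-checked with idealized arithmetic. This sketch rests on two assumptions that are not stated in this page: that the Laplace accuracy bound is exp(-accuracy / scale) = alpha, and that a known-size bounded mean has sensitivity (d_in // 2) * (upper - lower) / n. The library additionally rounds conservatively, so treat this only as a sanity check.

```python
import math

accuracy, alpha = 1000.0, 0.05
lower, upper = 0.0, 500_000.0
d_in, d_out = 2, 1.0

# Laplace tail bound: P(|noise| >= accuracy) = exp(-accuracy / scale) = alpha.
scale = accuracy / math.log(1 / alpha)


def toy_epsilon(n: int) -> float:
    # A symmetric distance of 2 corresponds to one changed record when the size is fixed.
    sensitivity = (d_in // 2) * (upper - lower) / n
    return sensitivity / scale


# Smallest n such that toy_epsilon(n) <= d_out.
n = math.ceil((d_in // 2) * (upper - lower) / (scale * d_out))
print(n)  # 1498
print(toy_epsilon(n) <= d_out < toy_epsilon(n - 1))  # True
```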

opendp.mod.deserialize(dp_json)[source]#

opendp.mod.disable_features(*features)[source]#

Disallow the use of optional features. See Feature Listing for details.

Parameters:: features (str) –
Return type:: None

opendp.mod.enable_features(*features)[source]#

Allow the use of optional features. See Feature Listing for details.

Parameters:: features (str) –
Return type:: None

opendp.mod.exponential_bounds_search(predicate, T)[source]#

Determine bounds for a binary search via an exponential search, in large bands of [2^((k - 1)^2), 2^(k^2)] for k in [0, 8). Will attempt to recover once if predicate throws an exception, by searching bands on the ok side of the exception boundary.

Parameters:

predicate (Callable[[float], bool]) – a monotonic unary function from a number to a boolean
T (Type[float] | None) – type of argument to predicate, one of {float, int}

Returns:

a tuple of float or int bounds that the decision boundary lies within

Raises:

TypeError – if the type is not inferrable (pass T)
ValueError – if the predicate function is constant

Return type:

tuple[float, float] | None
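
The banded search can be sketched as follows. This is a simplified, hypothetical version for float predicates: it places band edges at 0 and at plus or minus 2^(k^2) for k in [0, 8), and it omits the exception-recovery step described above.

```python
from typing import Callable, Tuple


def toy_exponential_bounds(predicate: Callable[[float], bool]) -> Tuple[float, float]:
    """Find a band (lower, upper) that contains the decision boundary of a monotonic predicate."""
    edges = [2.0 ** (k ** 2) for k in range(8)]  # 1, 2, 16, 512, ..., 2^49
    at_zero = predicate(0.0)

    # Walk outward from zero, first in the positive and then in the negative direction.
    for sign in (1.0, -1.0):
        previous = 0.0
        for edge in edges:
            current = sign * edge
            if predicate(current) != at_zero:
                # The predicate flipped somewhere between the last two edges.
                return (min(previous, current), max(previous, current))
            previous = current
    raise ValueError("the predicate appears to be constant over the searched bands")


print(toy_exponential_bounds(lambda x: x >= 5.0))     # (2.0, 16.0)
print(toy_exponential_bounds(lambda x: x <= -100.0))  # (-512.0, -16.0)
```

The bands grow quadratically in the exponent, so a handful of predicate evaluations covers a very wide range before the binary search takes over.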

opendp.mod.serialize(dp_obj)[source]#
