Core Structures
OpenDP is focused on creating computations with specific privacy characteristics.
These computations are modeled with two core structures in OpenDP: opendp.mod.Transformation and opendp.mod.Measurement.
These structures appear in all OpenDP programs, regardless of the underlying algorithm or definition of privacy.
By modeling computations in this abstract way, we’re able to combine them in flexible arrangements and reason about the resulting programs.
A unifying perspective on OpenDP is that it is a system for relating:
- an upper bound on the distance between neighboring function inputs, to
- an upper bound on the distance between the respective function outputs (or output distributions)
OpenDP naturally captures the definition of privacy via relations, because the definition of privacy is an upper bound on distance between probability distributions.
Each measurement or transformation is a self-contained structure with a relation, function, and supporting proof.
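The relation described above can be written out formally. The notation below is an assumption introduced for illustration and is not taken from the OpenDP source: X is the input domain, d_X the input metric, d_Y the output metric of a transformation T, and D the output measure of a measurement M.

```latex
% Transformation T: a stability relation between two metrics
\forall x, x' \in \mathcal{X}: \quad
  d_X(x, x') \le d_{in} \implies d_Y\big(T(x), T(x')\big) \le d_{out}

% Measurement M: a privacy relation between a metric and a measure
\forall x, x' \in \mathcal{X}: \quad
  d_X(x, x') \le d_{in} \implies D\big(M(x), M(x')\big) \le d_{out}
```

For pure differential privacy, D is the max-divergence between the two output distributions and d_out plays the role of epsilon.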
Measurement
A Measurement is a randomized mapping from datasets to outputs of an arbitrary type.
Say we have an arbitrary instance of a Measurement, called meas, and a code snippet meas.check(d_in, d_out).
If the code snippet evaluates to True, then meas is d_out-DP on d_in-close inputs, or equivalently “(d_in, d_out)-close”.
The code snippet simply checks the privacy relation that comes bundled inside meas.
The distances d_in and d_out are expressed in the units of the input metric and output measure, respectively.
Depending on the context, d_in could be a bound on the distance between neighboring datasets or a global sensitivity, and d_out may be epsilon, (epsilon, delta), or some other measure of privacy.
More information on distances is available here.
Each invocation of the measurement’s function (via meas.invoke(data) or meas(data)) is a differentially private release.
The privacy expenditure of each release is bounded by any d_out for which the relation passes.
The tightest such d_out can be found directly via meas.map(d_in).
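To make the roles of invoke, map, and check concrete, here is a minimal pure-Python sketch. It does not use the OpenDP library: ToyMeasurement and make_toy_laplace are hypothetical names, the input is assumed to be a single number whose sensitivity is d_in, and the noise is sampled with ordinary floating-point arithmetic, which is not suitable for real releases.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyMeasurement:
    """Hypothetical stand-in for a measurement; not the OpenDP class."""
    function: Callable[[float], float]     # randomized release
    privacy_map: Callable[[float], float]  # d_in -> tightest d_out

    def invoke(self, arg: float) -> float:
        return self.function(arg)

    # Calling the object is shorthand for invoke.
    __call__ = invoke

    def map(self, d_in: float) -> float:
        return self.privacy_map(d_in)

    def check(self, d_in: float, d_out: float) -> bool:
        # The relation passes for any d_out at least as large as the mapped bound.
        return self.map(d_in) <= d_out


def make_toy_laplace(scale: float) -> ToyMeasurement:
    """Add Laplace(0, scale) noise to a number; epsilon = d_in / scale."""
    def function(x: float) -> float:
        # The difference of two exponentials with mean `scale` is Laplace(0, scale).
        noise = random.expovariate(1 / scale) - random.expovariate(1 / scale)
        return x + noise

    return ToyMeasurement(function, lambda d_in: d_in / scale)


meas = make_toy_laplace(scale=2.0)
release = meas(10.0)      # one noisy release
epsilon = meas.map(1.0)   # tightest d_out for d_in = 1.0

assert meas.check(1.0, epsilon)    # the tightest bound passes
assert meas.check(1.0, 1.0)        # any looser bound also passes
assert not meas.check(1.0, 0.25)   # a tighter claim fails
```

In this sketch, check is derived from map: once the tightest bound is known, every larger d_out satisfies the relation as well.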
A measurement structure contains the following internal fields:
- input_domain: A domain that describes the set of all possible input values for the function.
- output_domain: A domain that describes the set of all possible output values of the function.
- function: A function that computes a differentially-private release on private data.
- input_metric: A metric used to compute distance between two members of the input domain.
- output_measure: A measure used to quantify the distance between two distributions in the output domain.
- privacy_map: A map that encapsulates the privacy characteristics of the function.
The framework quantifies output distance bounds on measurements with a measure, instead of a metric, because measurements emit samples from a probability distribution, and measures can be used to quantify differences between probability distributions. This is the primary differentiating factor between measurements and transformations.
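As an illustration of a measure comparing distributions rather than individual values, the sketch below numerically estimates the max-divergence between two Laplace output distributions. The choice of Laplace noise, the max-divergence as the measure, and the function names are all assumptions made for this example; the grid search only approximates the supremum over all outputs.

```python
import math


def laplace_log_pdf(y: float, loc: float, scale: float) -> float:
    """Log-density of a Laplace(loc, scale) distribution at y."""
    return -abs(y - loc) / scale - math.log(2 * scale)


def max_divergence_on_grid(loc_a, loc_b, scale, grid) -> float:
    """Largest absolute log-ratio of the two densities over the grid points."""
    return max(
        abs(laplace_log_pdf(y, loc_a, scale) - laplace_log_pdf(y, loc_b, scale))
        for y in grid
    )


# Two neighboring inputs whose exact answers differ by 1.0 (the sensitivity).
grid = [i / 10 for i in range(-200, 201)]
scale = 2.0
eps = max_divergence_on_grid(0.0, 1.0, scale, grid)

# For Laplace noise the divergence equals sensitivity / scale.
assert abs(eps - 1.0 / scale) < 1e-9
```

The two samples drawn from these distributions could differ by any amount, which is why a metric on output values would not capture the privacy guarantee.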
Transformation
A Transformation is a (deterministic) mapping from datasets to datasets.
Transformations are used to preprocess and aggregate data before chaining with a measurement.
Similarly to meas above, say we have an arbitrary instance of a Transformation, called trans, and a code snippet trans.check(d_in, d_out).
If the code snippet evaluates to True, then trans is d_out-close on d_in-close inputs, or equivalently “(d_in, d_out)-close”.
The code snippet simply checks the stability relation that comes bundled inside trans.
In this context, the relation captures the stability of a transformation.
The distances d_in and d_out are expressed in the units of the input metric and output metric, respectively.
Depending on the context, d_in and d_out could each be a bound on the distance between neighboring datasets or a global sensitivity.
More information on distances is available here.
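The stability relation can be sketched in pure Python with a clamped sum. This is not the OpenDP implementation: ToyTransformation, make_toy_clamped_sum, and symmetric_distance are hypothetical names, the input metric is assumed to be the symmetric distance (the number of records added or removed), and the output metric is assumed to be the absolute difference.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


def symmetric_distance(a: List[float], b: List[float]) -> int:
    """Size of the multiset symmetric difference between two datasets."""
    ca, cb = Counter(a), Counter(b)
    return sum(((ca - cb) + (cb - ca)).values())


@dataclass
class ToyTransformation:
    """Hypothetical stand-in for a transformation; not the OpenDP class."""
    function: Callable[[List[float]], float]  # deterministic mapping
    stability_map: Callable[[int], float]     # d_in -> tightest d_out

    def invoke(self, data: List[float]) -> float:
        return self.function(data)

    def map(self, d_in: int) -> float:
        return self.stability_map(d_in)

    def check(self, d_in: int, d_out: float) -> bool:
        return self.map(d_in) <= d_out


def make_toy_clamped_sum(lower: float, upper: float) -> ToyTransformation:
    """Clamp each record to [lower, upper], then sum."""
    def function(data: List[float]) -> float:
        return sum(min(max(x, lower), upper) for x in data)

    # Adding or removing one record changes the sum by at most this much.
    largest = max(abs(lower), abs(upper))
    return ToyTransformation(function, lambda d_in: d_in * largest)


trans = make_toy_clamped_sum(0.0, 10.0)
x = [1.0, 4.0, 12.0]
x_prime = [1.0, 4.0]  # neighboring dataset: one record removed

d_in = symmetric_distance(x, x_prime)
d_out = trans.map(d_in)
observed = abs(trans.invoke(x) - trans.invoke(x_prime))

assert observed <= d_out          # the outputs are d_out-close
assert trans.check(d_in, d_out)
assert not trans.check(d_in, 5.0)
```

Here d_in is a bound on the distance between neighboring datasets, while d_out is a global sensitivity of the sum, matching the two interpretations mentioned above.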
Invoking the function (via trans.invoke(data) or trans(data)) transforms the data, but the output is not differentially private.
Transformations need to be chained with a measurement before they can be used to create a differentially-private release.
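A rough sketch of what chaining means: the functions compose, and so do the maps, so the stability bound of the transformation becomes the d_in of the measurement. The Toy tuple and the chain helper are hypothetical and are not OpenDP's chaining API; the clamped sum and Laplace noise are assumed examples, and the floating-point noise is for illustration only.

```python
import random
from typing import Callable, NamedTuple


class Toy(NamedTuple):
    """Hypothetical pairing of a function with its stability or privacy map."""
    function: Callable
    map: Callable[[float], float]


def chain(meas: Toy, trans: Toy) -> Toy:
    """Compose a transformation with a measurement; the result acts as a measurement."""
    return Toy(
        function=lambda data: meas.function(trans.function(data)),
        map=lambda d_in: meas.map(trans.map(d_in)),
    )


# Transformation: sum of records clamped to [0, 10]; sensitivity is 10 per record.
trans = Toy(
    function=lambda data: sum(min(max(x, 0.0), 10.0) for x in data),
    map=lambda d_in: d_in * 10.0,
)

# Measurement: Laplace noise with scale 20; epsilon = sensitivity / scale.
scale = 20.0
meas = Toy(
    function=lambda v: v + random.expovariate(1 / scale) - random.expovariate(1 / scale),
    map=lambda d_in: d_in / scale,
)

dp_sum = chain(meas, trans)
release = dp_sum.function([1.0, 4.0, 12.0])  # noisy clamped sum
epsilon = dp_sum.map(1)                      # privacy cost when one record changes

assert epsilon == 0.5
```

The exact sum returned by trans alone carries no privacy guarantee; only the chained result has a privacy map.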
A transformation structure contains the following internal fields:
- input_domain: A domain that describes the set of all possible input values for the function.
- output_domain: A domain that describes the set of all possible output values of the function.
- function: A function that transforms data.
- input_metric: A metric used to compute distance between two members of the input domain.
- output_metric: A metric used to measure distance between two members of the output domain.
- stability_map: A map that encapsulates the stability characteristics of the function.