This page was generated from docs/source/user/getting-started.ipynb.

Getting Started#

This notebook is an introduction to programming in the OpenDP Library.

Before we get started, the notebook A Framework to Understand DP provides useful background. It explains

  • basic terminology like sensitivity and epsilon

  • the OpenDP Programming Framework

  • how we define transformations, measurements, stability and privacy


Any constructors that have not completed the proof-writing and vetting process may still be accessed if you opt in to “contrib”. Please contact us if you are interested in proof-writing. Thank you!

[146]:
from opendp.mod import enable_features
enable_features("contrib")

The Laplace Mechanism#

The Laplace mechanism is a ubiquitous algorithm in the DP community that is used to privatize an aggregate, like a sum or mean.

An instance of the Laplace mechanism is captured by a measurement containing the following six elements:

  1. We first define the function \(f(\cdot)\), which applies the Laplace mechanism to some argument \(x\). This function simply samples from the Laplace distribution centered at \(x\), with a fixed noise scale.

\[f(x) = Laplace(\mu=x, b=scale)\]
  2. Importantly, \(f(\cdot)\) is well-defined only for finite float inputs. This set of permitted inputs is described by the input domain (denoted AllDomain<f64>).

  3. The set of possible outputs is described by the output domain (also AllDomain<f64>).

  4. The Laplace mechanism has a privacy guarantee in terms of epsilon. This guarantee is represented by a privacy map, a function that computes the privacy loss \(\epsilon\) for any choice of sensitivity \(\Delta\).

\[map(\Delta) = \Delta / scale \le \epsilon\]
  5. This map only holds if the same aggregation applied to any two neighboring datasets may differ by no more than some quantity \(\Delta\) under the absolute distance input metric (AbsoluteDistance<f64>).

  6. We similarly describe units on the output (\(\epsilon\)) via the output measure (MaxDivergence<f64>).
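To make the function and the privacy map concrete, here is a minimal plain-Python sketch. It is an illustration under stated assumptions, not the library's implementation: the names `laplace_function` and `laplace_privacy_map` are hypothetical, and the naive sampler below makes no attempt to guard against floating-point vulnerabilities, so it should not be used for real releases.

```python
import math
import random


def laplace_function(x: float, scale: float, rng: random.Random) -> float:
    """Sample Laplace(mu=x, b=scale) as x plus a randomly signed exponential draw."""
    if not math.isfinite(x):
        # mirrors the input domain: only finite floats are permitted
        raise ValueError("input must be a finite float")
    magnitude = rng.expovariate(1.0 / scale)  # exponential with mean `scale`
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return x + sign * magnitude


def laplace_privacy_map(d_in: float, scale: float) -> float:
    """Idealized privacy map: epsilon = sensitivity / scale."""
    return d_in / scale


rng = random.Random(0)  # seeded only so the sketch is repeatable
print("noisy value:", laplace_function(0.0, scale=5.0, rng=rng))
print("epsilon:", laplace_privacy_map(10.0, scale=5.0))
```

The domains, metric and measure appear here only as a runtime check and as docstrings; in the library they are explicit parts of the measurement.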

The OpenDP Library consists of constructor functions that can be called with simple arguments and always return valid measurements. The make_base_laplace constructor function returns the equivalent of the Laplace measurement described above.

[147]:
from opendp.measurements import make_base_laplace

# call the constructor to produce the measurement `base_lap`
base_lap = make_base_laplace(scale=5.)

We now invoke the measurement on some aggregate 0., to sample \(Laplace(\mu=0., b=5.)\) noise:

[148]:
aggregate = 0.
print("noisy aggregate:", base_lap(aggregate))
noisy aggregate: -9.028542355622688

We must know the sensitivity of aggregate to determine epsilon. In this case, we know base_lap has an absolute distance input metric, so the sensitivity should represent the greatest possible absolute distance between aggregates on adjacent datasets.

[149]:
absolute_distance = 10.
print("epsilon:", base_lap.map(d_in=absolute_distance))
epsilon: 2.0

This tells us that when the sensitivity is 10, and the noise scale is 5, the epsilon consumption of a release is 2.
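The same relationship can be read in reverse: given a sensitivity and a target epsilon, we can solve \(\epsilon = \Delta / scale\) for the noise scale. The helper below is a hypothetical plain-Python sketch of that arithmetic, not a library function, and it ignores any floating-point adjustments the library may apply.

```python
def scale_for_epsilon(sensitivity: float, epsilon: float) -> float:
    """Solve epsilon = sensitivity / scale for the noise scale (idealized arithmetic)."""
    if sensitivity < 0:
        raise ValueError("sensitivity must be non-negative")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


# a sensitivity of 10 and a target epsilon of 2 recover the scale used above
print("scale:", scale_for_epsilon(sensitivity=10.0, epsilon=2.0))
# halving epsilon (a stronger guarantee) doubles the required noise scale
print("scale:", scale_for_epsilon(sensitivity=10.0, epsilon=1.0))
```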

Transformation Example: Sum#

We package computations with bounded stability into transformations.

A transformation that computes the sum of a vector dataset contains a very similar set of six elements:

  1. We first define the function \(f(\cdot)\), which computes the sum of some argument \(x\).

\[f(x) = \sum x_i\]
  2. \(f(\cdot)\) is well-defined only for vector inputs of a specific type. Each element must be bounded between some lower bound L and upper bound U. Thus the input domain is of type VectorDomain<BoundedDomain<f64>> and contains L and U.

  3. The output domain consists of any single finite f64 scalar: AllDomain<f64>.

  4. The sum transformation has a stability guarantee in terms of sensitivity. This guarantee is represented by a stability map, which is a function that computes the stability \(d_{out}\) for any choice of dataset distance \(d_{in}\). In this case \(d_{out}\) is in terms of the sensitivity.

\[map(d_{in}) = d_{in} \cdot max(|L|, U) \le d_{out}\]
  5. This map only holds if neighboring datasets differ by no more than some quantity \(d_{in}\) under the symmetric distance input metric (SymmetricDistance).

  6. The sensitivity is computed with respect to the absolute distance. This gives units to the output (\(d_{out}\)) via the output metric (AbsoluteDistance<f64>).
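As with the Laplace mechanism, the function and the stability map can be sketched in a few lines of plain Python. The names `bounded_sum_function` and `bounded_sum_stability_map` are hypothetical, and the map implements only the idealized formula above, so it will not reproduce any extra floating-point slack that the library adds.

```python
from typing import Sequence, Tuple


def bounded_sum_function(data: Sequence[float], bounds: Tuple[float, float]) -> float:
    """Exact sum of data that is already a member of the bounded input domain."""
    lower, upper = bounds
    if any(not (lower <= x <= upper) for x in data):
        # also rejects NaN, since comparisons with NaN are always False
        raise ValueError("data is not a member of the input domain")
    return float(sum(data))


def bounded_sum_stability_map(d_in: int, bounds: Tuple[float, float]) -> float:
    """Idealized stability map: d_out = d_in * max(|L|, U)."""
    lower, upper = bounds
    return d_in * max(abs(lower), upper)


bounds = (0.0, 5.0)
print("sum:", bounded_sum_function([1.3, 3.8, 0.0, 5.0], bounds))
print("sensitivity:", bounded_sum_stability_map(2, bounds))
```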

make_bounded_sum constructs the equivalent of the sum transformation described above. Note that because the bounds are floats, the resulting transformation is calibrated to work for floating-point numbers. You will need to be careful and intentional about the types you use.

[150]:
from opendp.transformations import make_bounded_sum

# call the constructor to produce the transformation `bounded_sum`
bounded_sum = make_bounded_sum(bounds=(0., 5.))

According to the documentation, this transformation expects a vector of data with non-null elements bounded between 0. and 5.. We now invoke the transformation on some mock dataset that satisfies this constraint. Remember that since this component is a transformation, and not a measurement, the resulting output is not differentially private.

[151]:
# under the condition that the input data is a member of the input domain...
bounded_mock_dataset = [1.3, 3.8, 0., 5.]
# ...the exact sum is:
bounded_sum(bounded_mock_dataset)
[151]:
10.1

It can help to understand a simple example of how a stability map works, but going forward you don’t need to understand why the maps give the numbers they give in order to use the library.

The stability argument for this transformation’s advertised sensitivity goes roughly as follows:

If the input data consists of numbers bounded between 0. and 5.,
then the addition or removal of any one row can influence the sum by at most \(max(|0.|, 5.)\).
In addition, if one individual may contribute up to k rows,
then the sensitivity should further be multiplied by k.

In practice, the calculated sensitivity may be larger under certain conditions to account for finite data types.

[152]:
# under the condition that one individual may contribute up to 2 records to `bounded_mock_dataset`...
max_contributions = 2
# ...then the sensitivity, expressed in terms of the absolute distance, is:
bounded_sum.map(d_in=max_contributions)
[152]:
10.00000004656613

As we would expect, the sensitivity is roughly 2 * max(|0.|, 5.).
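The stability argument can also be spot-checked by brute force. The sketch below, in plain Python with hypothetical variable names, removes every combination of up to k rows from the mock dataset and records the largest change in the sum. It only covers removals from this one dataset, so it illustrates the bound rather than proving it.

```python
from itertools import combinations

bounds = (0.0, 5.0)
data = [1.3, 3.8, 0.0, 5.0]
k = 2  # maximum number of rows one individual may contribute
per_row_bound = max(abs(bounds[0]), bounds[1])

worst_change = 0.0
for r in range(1, k + 1):
    for removed in combinations(range(len(data)), r):
        # neighboring dataset: the same data with `r` rows removed
        neighbor = [x for i, x in enumerate(data) if i not in removed]
        worst_change = max(worst_change, abs(sum(data) - sum(neighbor)))

print("largest observed change:", worst_change)
print("bound k * max(|L|, U):", k * per_row_bound)
```

The observed change stays below the bound because no two rows in this dataset both sit at the upper bound.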

Transformation Example: Clamp#

The sum transformation has an input domain of vectors with bounded elements. We now construct a transformation that clamps/clips each element to a given set of bounds.

Instead of listing the components of a clamp transformation as above, going forward you can check the **Supporting Elements** section of the relevant API documentation entry:

[153]:
from opendp.transformations import make_clamp
help(make_clamp)
Help on function make_clamp in module opendp.transformations:

make_clamp(bounds: Tuple[Any, Any], TA: Union[ForwardRef('RuntimeType'), _GenericAlias, str, Type[Union[List, Tuple, int, float, str, bool]], tuple] = None) -> opendp.mod.Transformation
    Make a Transformation that clamps numeric data in `Vec<TA>` to `bounds`.

    If datum is less than lower, let datum be lower.
    If datum is greater than upper, let datum be upper.

    [make_clamp in Rust documentation.](https://docs.rs/opendp/latest/opendp/transformations/fn.make_clamp.html)

    **Supporting Elements:**

    * Input Domain:   `VectorDomain<AllDomain<TA>>`
    * Output Domain:  `VectorDomain<BoundedDomain<TA>>`
    * Input Metric:   `SymmetricDistance`
    * Output Metric:  `SymmetricDistance`

    :param bounds: Tuple of inclusive lower and upper bounds.
    :type bounds: Tuple[Any, Any]
    :param TA: Atomic Type
    :type TA: :py:ref:`RuntimeTypeDescriptor`
    :rtype: Transformation
    :raises TypeError: if an argument's type differs from the expected type
    :raises UnknownTypeError: if a type argument fails to parse
    :raises OpenDPException: packaged error from the core OpenDP library

Documentation for specific types may be found behind the following links:

[154]:
# call the constructor to produce the transformation `clamp`
clamp = make_clamp(bounds=(0., 5.))

# `clamp` expects vectors of non-null, unbounded elements
mock_dataset = [1.3, 7.8, -2.5, 7.0]
# `clamp` emits data that is suitable for `bounded_sum`
clamp(mock_dataset)
[154]:
[1.3, 5.0, 0.0, 5.0]

According to the API documentation, the input and output metrics are both SymmetricDistance. Therefore, the stability map accepts a dataset distance describing the maximum number of contributions an individual may make, and emits the same.

The stability argument for the clamp transformation is very simple:

If an individual may influence at most k records in a dataset,
then after clamping each element,
an individual may still influence at most k records in a dataset.

[155]:
# dataset distance in... dataset distance out
clamp.map(max_contributions)
[155]:
2
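The clamp stability argument can be checked on a small example in plain Python. The helpers `clamp_rows` and `symmetric_distance` are hypothetical stand-ins written for this sketch; the distance is computed as the size of the multiset symmetric difference, which is an assumption about how to measure it by hand.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def clamp_rows(data: Sequence[float], bounds: Tuple[float, float]) -> List[float]:
    """Clamp each element into the inclusive interval [lower, upper]."""
    lower, upper = bounds
    return [min(max(x, lower), upper) for x in data]


def symmetric_distance(a: Sequence[float], b: Sequence[float]) -> int:
    """Number of rows that must be added or removed to turn `a` into `b`."""
    count_a, count_b = Counter(a), Counter(b)
    return sum(((count_a - count_b) + (count_b - count_a)).values())


bounds = (0.0, 5.0)
data = [1.3, 7.8, -2.5, 7.0]
neighbor = [1.3, 7.8, -2.5, 9.0]  # one row replaced: one removal plus one addition

d_before = symmetric_distance(data, neighbor)
d_after = symmetric_distance(clamp_rows(data, bounds), clamp_rows(neighbor, bounds))
print("distance before clamping:", d_before)
print("distance after clamping:", d_after)
```

Here the distance shrinks because both differing rows clamp to the same value; in general clamping can only keep the distance the same or reduce it, which is why the map emits its input unchanged.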

Chaining#

The OpenDP library supports chaining a transformation with a transformation to produce a compound transformation, or a transformation with a measurement to produce a compound measurement.

When any two compatible computations are chained, all six components of each primitive are used to construct the new primitive.

A measurement produced from chaining a transformation with a measurement contains the same set of six elements as in previous examples:

  1. A function \(f(\cdot)\). When you chain, the output domain of the transformation must match the input domain of the measurement.

\[f(x) = measurement(transformation(x))\]
  2. The input domain from the transformation.

  3. The output domain from the measurement.

  4. A privacy map \(map(\cdot)\). When you chain, the output metric of the transformation must match the input metric of the measurement.

\[map(d_{in}) = measurement.map(transformation.map(d_{in}))\]
  5. The input metric from the transformation.

  6. The output measure from the measurement.

A similar logic is used when chaining a transformation with a transformation.
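The two composition rules above can be sketched with a small stand-in class. Everything here is hypothetical and written in plain Python for illustration: `Stage` is not a library type, it carries only a function and a map, and it skips the domain and metric compatibility checks that chaining requires. The noise stage uses a naive sampler that is not suitable for real releases.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Stage:
    """Hypothetical stand-in for a transformation or measurement."""
    function: Callable[[Any], Any]
    map: Callable[[Any], Any]

    def __rshift__(self, other: "Stage") -> "Stage":
        # compose the functions and the maps in the same order
        return Stage(
            function=lambda x: other.function(self.function(x)),
            map=lambda d_in: other.map(self.map(d_in)),
        )

    def __call__(self, x: Any) -> Any:
        return self.function(x)


lower, upper, scale = 0.0, 5.0, 5.0
rng = random.Random(0)  # seeded only so the sketch is repeatable


def add_laplace_noise(x: float) -> float:
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return x + sign * rng.expovariate(1.0 / scale)


clamp = Stage(lambda data: [min(max(x, lower), upper) for x in data], lambda d_in: d_in)
bounded_sum = Stage(lambda data: sum(data), lambda d_in: d_in * max(abs(lower), upper))
base_lap = Stage(add_laplace_noise, lambda d_in: d_in / scale)

dp_sum = clamp >> bounded_sum >> base_lap
print("exact clamped sum:", (clamp >> bounded_sum)([1.3, 7.8, -2.5, 7.0]))
print("noisy sum:", dp_sum([1.3, 7.8, -2.5, 7.0]))
print("epsilon:", dp_sum.map(2))
```

The chained map sends a dataset distance of 2 through the clamp map (still 2), the sum map (a sensitivity of 10) and the Laplace map (10 / 5), following the nesting in the formula above.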

We know that the

  • output domain of bounded_sum matches the input domain of base_lap, and the

  • output metric of bounded_sum matches the input metric of base_lap.

The same holds for clamp and bounded_sum. Therefore, we can chain all of these primitives to form a new compound measurement:

[156]:
dp_sum = clamp >> bounded_sum >> base_lap

# compute the DP sum of a dataset of bounded elements
print("DP sum:", dp_sum(mock_dataset))

# evaluate the privacy loss of the dp_sum, when an individual can contribute at most 2 records
print("epsilon:", dp_sum.map(d_in=max_contributions))
DP sum: 13.705564590232404
epsilon: 2.000000009313226

Retrospective#

Now that you have a more thorough understanding of what’s going on, we can breeze through an entire release:

[157]:
from opendp.transformations import *
from opendp.measurements import *

# establish public info
max_contributions = 2
bounds = (0., 5.)

# construct the measurement
dp_sum = make_clamp(bounds) >> make_bounded_sum(bounds) >> make_base_laplace(5.)

# evaluate the privacy expenditure and make a DP release
mock_dataset = [0.7, -0.3, 1., -1.]
print("epsilon:", dp_sum.map(max_contributions))
print("DP sum release:", dp_sum(mock_dataset))
epsilon: 2.000000009313226
DP sum release: 5.988716324791998

The next major sections of the documentation each cover a module where you can find constructors:

  • opendp.transformations

  • opendp.measurements

  • opendp.combinators