Additive Noise Mechanisms#
This notebook documents the variations on additive noise mechanisms in OpenDP:
Distribution: Laplace vs. Gaussian
Support: float vs. integer
Domain: scalar vs. vector
Bit-depth
Any constructors that have not completed the proof-writing and vetting process may still be accessed if you opt-in to “contrib”. Please contact us if you are interested in proof-writing. Thank you!
[1]:
import opendp.prelude as dp
dp.enable_features("contrib")
Distribution: Laplace vs. Gaussian#
The Laplace mechanism is a ubiquitous algorithm in the DP ecosystem that is used to privatize an aggregate, like a sum or mean.
An instance of the Laplace mechanism is captured by a measurement containing the following five elements:
Elements of a Laplace Measurement
We first define the function \(f(\cdot)\), that applies the Laplace mechanism to some argument \(x\). This function simply samples from the Laplace distribution centered at \(x\), with a fixed noise scale.
Importantly, \(f(\cdot)\) is well-defined only for finite float inputs. This set of permitted inputs is described by the input domain (denoted AtomDomain<f64>).
The Laplace mechanism has a privacy guarantee in terms of epsilon. This guarantee is represented by a privacy map, a function that computes the privacy loss \(\epsilon\) for any choice of sensitivity \(\Delta\).
This map only promises that the privacy loss will be at most \(\epsilon\) if inputs from any two neighboring datasets may differ by no more than some quantity \(\Delta\) under the absolute distance input metric (AbsoluteDistance<f64>).
We similarly describe units on the output (\(\epsilon\)) via the output measure (MaxDivergence).
The make_laplace constructor function returns the equivalent of the Laplace measurement described above.
[2]:
# call the constructor to produce the measurement `base_lap`
input_space = dp.atom_domain(T=float), dp.absolute_distance(T=float)
base_lap = dp.m.make_laplace(*input_space, scale=2.)
# invoke the measurement on some aggregate x, to sample Laplace(x, 2.) noise
aggregated = 0.
print("noisy aggregate:", base_lap(aggregated))
# we must know the sensitivity of `aggregated` to determine epsilon
sensitivity = 1.
print("epsilon:", base_lap.map(d_in=sensitivity))
noisy aggregate: 0.8418214435677124
epsilon: 0.5
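The function and the privacy map of such a measurement can be mimicked in a few lines of plain Python. The sketch below is only an illustration: `ToyMeasurement` and `make_toy_laplace` are hypothetical names that are not part of OpenDP, and the sampler uses ordinary floating-point arithmetic, which is not safe for real releases. It assumes the privacy map is \(\epsilon = \Delta / \text{scale}\), which agrees with the output above (1 / 2 = 0.5).

```python
import math
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyMeasurement:
    """Hypothetical stand-in for a measurement (not an OpenDP class)."""
    function: Callable[[float], float]     # maps an aggregate x to a noisy release
    privacy_map: Callable[[float], float]  # maps a sensitivity to epsilon


def make_toy_laplace(scale: float) -> ToyMeasurement:
    if not (math.isfinite(scale) and scale > 0):
        raise ValueError("scale must be a positive, finite float")

    def function(x: float) -> float:
        # the input domain only admits finite floats
        if not math.isfinite(x):
            raise ValueError("input must be a finite float")
        # the difference of two iid exponentials with mean `scale`
        # is distributed as Laplace(0, scale)
        noise = random.expovariate(1 / scale) - random.expovariate(1 / scale)
        return x + noise

    def privacy_map(d_in: float) -> float:
        # assumed map: epsilon = sensitivity / scale
        return d_in / scale

    return ToyMeasurement(function, privacy_map)


toy_lap = make_toy_laplace(scale=2.0)
print("noisy aggregate:", toy_lap.function(0.0))
print("epsilon:", toy_lap.privacy_map(1.0))
```

Keeping the function and the map side by side makes it clear that the noise scale is the only parameter linking the sampled release to the privacy guarantee.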
The analogous constructor for Gaussian noise is make_gaussian:
[3]:
# call the constructor to produce the measurement `gauss`
input_space = dp.atom_domain(T=float), dp.absolute_distance(T=float)
gauss = dp.m.make_gaussian(*input_space, scale=2.)
# invoke the measurement on some aggregate x, to sample Gaussian(x, 2.) noise
aggregated = 0.
print("noisy aggregate:", gauss(aggregated))
# we must know the sensitivity of `aggregated` to determine rho
sensitivity = 1.
print("rho:", gauss.map(d_in=sensitivity))
noisy aggregate: 2.322023365796082
rho: 0.125
Notice that base_lap measures privacy with epsilon (in the MaxDivergence measure), and gauss measures privacy with rho (in the ZeroConcentratedDivergence measure).
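Both reported privacy parameters are consistent with the standard closed forms: \(\epsilon = \Delta / \text{scale}\) for Laplace noise and \(\rho = (\Delta / \text{scale})^2 / 2\) for Gaussian noise. The following sketch evaluates them with two hypothetical helper functions (not OpenDP APIs). The library's own maps may round conservatively, so the last digits can differ for other inputs.

```python
def laplace_epsilon(sensitivity: float, scale: float) -> float:
    """Assumed pure-DP map for Laplace noise: epsilon = sensitivity / scale."""
    return sensitivity / scale


def gaussian_rho(sensitivity: float, scale: float) -> float:
    """Assumed zCDP map for Gaussian noise: rho = (sensitivity / scale)^2 / 2."""
    return (sensitivity / scale) ** 2 / 2


# same sensitivity and scale as the two cells above
print("epsilon:", laplace_epsilon(1.0, 2.0))
print("rho:", gaussian_rho(1.0, 2.0))
```

The two numbers are not directly comparable, because epsilon and rho are units of different privacy measures.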
Support: Float vs. Integer#
There are also discrete analogues of the continuous Laplace and Gaussian measurements. The continuous measurements accept and emit floats, while the discrete measurements accept and emit integers. Measurements with distributions supported on the integers expect integer sensitivities by default.
make_laplace on a discrete support is equivalent to the geometric mechanism:
[4]:
# call the constructor to produce the measurement `base_discrete_lap`
input_space = dp.atom_domain(T=int), dp.absolute_distance(T=int)
base_discrete_lap = dp.m.make_laplace(*input_space, scale=1.)
# invoke the measurement on some integer aggregate x, to sample DiscreteLaplace(x, 1.) noise
aggregated = 0
print("noisy aggregate:", base_discrete_lap(aggregated))
# in this case, the sensitivity is integral:
sensitivity = 1
print("epsilon:", base_discrete_lap.map(d_in=sensitivity))
noisy aggregate: 1
epsilon: 1.0
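One way to see the connection to the geometric mechanism is that the difference of two independent geometric random variables follows a discrete Laplace distribution. The sketch below illustrates this with the standard library only. The names `sample_geometric` and `sample_discrete_laplace` are hypothetical, and the floating-point logarithm makes this an approximation for intuition, not the sampler used by the library.

```python
import math
import random


def sample_geometric(scale: float) -> int:
    """Count failures before the first success, where each trial
    fails with probability exp(-1 / scale)."""
    u = 1.0 - random.random()  # uniform on (0, 1], avoids log(0)
    return math.floor(-scale * math.log(u))


def sample_discrete_laplace(shift: int, scale: float) -> int:
    # difference of two iid geometric draws: P(noise = x) is proportional
    # to exp(-|x| / scale)
    return shift + sample_geometric(scale) - sample_geometric(scale)


random.seed(0)
samples = [sample_discrete_laplace(0, scale=1.0) for _ in range(20_000)]
print("first draws:", samples[:5])
print("empirical mean:", sum(samples) / len(samples))
print("share of zeros:", samples.count(0) / len(samples))
```

For a scale of 1, the share of zeros should land near \((1 - e^{-1}) / (1 + e^{-1}) \approx 0.46\), the probability mass the discrete Laplace distribution places at its center.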
make_gaussian on a discrete support is the analogous measurement for Gaussian noise:
[5]:
# call the constructor to produce the measurement `base_discrete_gauss`
input_space = dp.atom_domain(T=int), dp.absolute_distance(T=int)
base_discrete_gauss = dp.m.make_gaussian(*input_space, scale=1.)
# invoke the measurement on some aggregate x, to sample DiscreteGaussian(x, 1.) noise
aggregated = 0
print("noisy aggregate:", base_discrete_gauss(aggregated))
# we must know the sensitivity of `aggregated` to determine rho
sensitivity = 1
print("rho:", base_discrete_gauss.map(d_in=sensitivity))
noisy aggregate: -1
rho: 0.5
The continuous mechanisms use these discrete samplers internally. More information on this can be found at the end of this notebook.
Domain: Scalar vs. Vector#
Measurements covered thus far have accepted scalar inputs and emitted scalar outputs, and sensitivities have been expressed in terms of the absolute distance.
The noise addition mechanisms can similarly operate over metric spaces consisting of vectors, and where the distance between any two vectors is computed via the L1 or L2 distance.
[6]:
# call again, but this time indicate that the measurement should operate over a vector domain
input_space = dp.vector_domain(dp.atom_domain(T=float)), dp.l1_distance(T=float)
base_lap_vec = dp.m.make_laplace(*input_space, scale=1.)
aggregated = 1.
# If we try to pass the wrong data type into our vector laplace measurement,
# the error shows that our float argument should be a vector of floats.
try:
    print("noisy aggregate:", base_lap_vec(aggregated))
except TypeError as e:
    # The error messages will often point to a discussion page with more info.
    print(e)
Expected type is Vec<f64> but input data is not a list.
[7]:
# actually pass a vector-valued input, as expected
aggregated = [0., 2., 2.]
print("noisy aggregate:", base_lap_vec(aggregated))
noisy aggregate: [-1.249922018362259, 1.5557446543831857, -1.620935690993615]
The resulting measurement expects sensitivity in terms of the appropriate Lp-distance: the vector Laplace measurement expects sensitivity in terms of an "l1_distance(T=f64)", while the vector Gaussian measurement expects a sensitivity in terms of an "l2_distance(T=f64)".
[8]:
sensitivity = 1.
print("epsilon:", base_lap_vec.map(d_in=sensitivity))
epsilon: 1.0
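To make the two metrics concrete, the following sketch computes both distances between a pair of hypothetical neighboring aggregates. The helpers `l1_dist` and `l2_dist` are written for illustration only and are not OpenDP functions.

```python
import math


def l1_dist(u, v):
    """Sum of absolute coordinate-wise differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def l2_dist(u, v):
    """Euclidean distance between the two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# two hypothetical aggregates computed on neighboring datasets
u = [0.0, 2.0, 2.0]
v = [0.5, 2.0, 1.5]

print("L1 distance:", l1_dist(u, v))
print("L2 distance:", l2_dist(u, v))
```

The L2 distance never exceeds the L1 distance, so the same pair of vectors can correspond to a smaller sensitivity under the metric used by the Gaussian mechanism.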
The documentation for each constructor also reflects the relationship between D and the resulting input metric in a table:
[9]:
help(dp.m.make_laplace)
Help on function make_laplace in module opendp.measurements:
make_laplace(input_domain: opendp.mod.Domain, input_metric: opendp.mod.Metric, scale, k=None, QO: Union[ForwardRef('RuntimeType'), str, Type[Union[List[Any], Tuple[Any, Any], int, float, str, bool]], Tuple[ForwardRef('RuntimeTypeDescriptor'), ...], _GenericAlias, types.GenericAlias] = 'float') -> opendp.mod.Measurement
Make a Measurement that adds noise from the Laplace(`scale`) distribution to the input.
Valid inputs for `input_domain` and `input_metric` are:
| `input_domain` | input type | `input_metric` |
| ------------------------------- | ------------ | ---------------------- |
| `atom_domain(T)` (default) | `T` | `absolute_distance(T)` |
| `vector_domain(atom_domain(T))` | `Vec<T>` | `l1_distance(T)` |
Internally, all sampling is done using the discrete Laplace distribution.
[make_laplace in Rust documentation.](https://docs.rs/opendp/latest/opendp/measurements/fn.make_laplace.html)
**Citations:**
* [GRS12 Universally Utility-Maximizing Privacy Mechanisms](https://theory.stanford.edu/~tim/papers/priv.pdf)
* [CKS20 The Discrete Gaussian for Differential Privacy](https://arxiv.org/pdf/2004.00010.pdf#subsection.5.2)
**Supporting Elements:**
* Input Domain: `D`
* Output Type: `D::Carrier`
* Input Metric: `D::InputMetric`
* Output Measure: `MaxDivergence`
:param input_domain: Domain of the data type to be privatized.
:type input_domain: Domain
:param input_metric: Metric of the data type to be privatized.
:type input_metric: Metric
:param scale: Noise scale parameter for the Laplace distribution. `scale` == standard_deviation / sqrt(2).
:param k: The noise granularity in terms of 2^k, only valid for domains over floats.
:param QO: Data type of the output distance and scale. `f32` or `f64`.
:type QO: :py:ref:`RuntimeTypeDescriptor`
:rtype: Measurement
:raises TypeError: if an argument's type differs from the expected type
:raises UnknownTypeException: if a type argument fails to parse
:raises OpenDPException: packaged error from the core OpenDP library
The discrete Gaussian mechanism allows for the type of the input sensitivity to be a float. This is because there is often a square root in the sensitivity calculations for vector-valued queries.
[10]:
# call again, but this time indicate that the measurement should operate over a vector domain
input_space = dp.vector_domain(dp.atom_domain(T=int)), dp.l2_distance(T=float)
base_gauss_vec = dp.m.make_gaussian(*input_space, scale=1.)
base_gauss_vec.map(d_in=1.414)
[10]:
0.999698
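The square root typically arises as follows: if each of n integer coordinates can change by at most c between neighboring datasets, the L2 sensitivity is at most \(c \sqrt{n}\), which is generally not an integer. The sketch below assumes the map \(\rho = (\Delta / \text{scale})^2 / 2\), which is consistent with the output above. The helper names are hypothetical.

```python
import math


def l2_sensitivity(num_coords: int, per_coord_change: int) -> float:
    """Upper bound on the L2 distance when every coordinate
    may change by at most `per_coord_change`."""
    return per_coord_change * math.sqrt(num_coords)


def gaussian_rho(d_in: float, scale: float) -> float:
    """Assumed zCDP map for Gaussian noise."""
    return (d_in / scale) ** 2 / 2


d_in = l2_sensitivity(num_coords=2, per_coord_change=1)
print("sensitivity:", d_in)  # sqrt(2), about 1.414
print("rho at sqrt(2):", gaussian_rho(d_in, scale=1.0))
print("rho at 1.414:", gaussian_rho(1.414, scale=1.0))
```

Passing the truncated value 1.414 slightly understates a true sensitivity of \(\sqrt{2}\), so in practice the sensitivity should be rounded up, not down.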
Bit depth#
All floating-point data types default to 64-bit double-precision (denoted "f64"), and all integral data types default to 32-bit (denoted "i32"). The atomic data type expected by the function and privacy units can be further configured to operate over specific bit-depths by explicitly specifying "f32" instead of "float", or "i64" instead of "int".
More information on acceptable data types can be found in the Utilities > Typing section of the User Guide.
Desideratum: Floating-Point Granularity#
The “continuous” Laplace and Gaussian measurements convert their float arguments to a rational representation, and then add integer noise to the numerator via the respective discrete distribution. In the OpenDP Library’s default configuration, this rational representation of a float is exact. Therefore the privacy analysis is as tight as if you were to sample truly continuous noise and then postprocess by rounding to the nearest float.
For most use cases, the sampling algorithm is sufficiently fast when the rational representation is exact, that is, when noise is sampled with a granularity of \(2^{-1074}\), the distance between adjacent subnormal 64-bit floats. However, the granularity can be coarsened to \(2^k\), for some choice of k, for a faster runtime. Adjusting this parameter comes with a small penalty to the sensitivity (to account for rounding to the nearest rational at that granularity), and subsequently, to the privacy parameters.
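The rounding step can be illustrated with exact rational arithmetic from the standard library. This is a minimal sketch, assuming rounding to the nearest multiple of \(2^k\); the function name `round_to_grid` is hypothetical, and details such as tie-breaking may differ from the library.

```python
from fractions import Fraction


def round_to_grid(x: float, k: int) -> Fraction:
    """Round x to the nearest multiple of 2^k using exact rationals."""
    gap = Fraction(2) ** k
    # Fraction(x) is the exact value of the float, with no rounding error
    return round(Fraction(x) / gap) * gap


x = 0.3
for k in [1, 0, -1, -4]:
    rounded = round_to_grid(x, k)
    error = abs(Fraction(x) - rounded)
    print(f"k = {k:>2}: rounded to {float(rounded)}, error {float(error):.4f}")

# every finite 64-bit float is a multiple of 2^-1074, so nothing is lost there
print(round_to_grid(x, -1074) == Fraction(x))
```

The rounding error is at most half the gap, which is why a coarser grid requires a larger adjustment to the sensitivity.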
The following plot shows the resulting distribution for some large choices of k:
[12]:
import numpy as np
import matplotlib.pyplot as plt
num_samples = 10_000
space = dp.vector_domain(dp.atom_domain(T=float), num_samples), dp.l1_distance(T=float)
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(10, 3))
fig.subplots_adjust(top=0.8)
for axis, k in zip(axes, [1, 0, -1]):
    base_lap_vec = dp.m.make_laplace(*space, scale=1., k=k)
    support, counts = np.unique(base_lap_vec([0.] * num_samples), return_counts=True)
    axis.bar(support, counts / num_samples)
    axis.set_xticks([-2, 0, 2])
    axis.set_title(f"k = {k}; gap = {2**k}")
axis.set_xticks([-2, 0, 2])
axis.set_title(f"k = {k}; gap = {2**k}")
axes[0].set_ylabel("density")
fig.suptitle('make_laplace for different choices of k', y=0.95);
The distribution becomes increasingly smooth as k approaches the default value (-1074).
The privacy map applies this penalty even when the sensitivity is zero. The following table uses this behavior to isolate the increase in epsilon for some choices of k:
[13]:
k = [-1074, -1073, -100, -1, 0, 1]
space = dp.atom_domain(T=float), dp.absolute_distance(T=float)
ε_penalty = [dp.m.make_laplace(*space, scale=1., k=k_i).map(d_in=0.) for k_i in k]
detail = ["no penalty", "~min float", "~2^-100", "~2^-1", "~2^0", "~2^1"]
import pandas as pd
pd.DataFrame({"k": k, "ε penalty": ε_penalty, "detail": detail}).set_index("k")
[13]:
k | ε penalty | detail |
---|---|---|
-1074 | 0.000000e+00 | no penalty |
-1073 | 4.940656e-324 | ~min float |
-100 | 7.888609e-31 | ~2^-100 |
-1 | 5.000000e-01 | ~2^-1 |
0 | 1.000000e+00 | ~2^0 |
1 | 2.000000e+00 | ~2^1 |
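The penalties in this table are consistent with a simple rule: the sensitivity is increased by \(2^k - 2^{-1074}\) before being divided by the scale. This rule is inferred from the numbers shown here and is an assumption, not a statement about the library's source code. The function name `laplace_epsilon_with_granularity` is hypothetical.

```python
import math

K_MIN = -1074  # exponent of the smallest positive subnormal 64-bit float


def laplace_epsilon_with_granularity(d_in: float, scale: float, k: int) -> float:
    # assumed relaxation of the sensitivity due to rounding onto the 2^k grid
    relaxation = math.ldexp(1.0, k) - math.ldexp(1.0, K_MIN)
    return (d_in + relaxation) / scale


for k in [-1074, -1073, -100, -1, 0, 1]:
    penalty = laplace_epsilon_with_granularity(d_in=0.0, scale=1.0, k=k)
    print(f"k = {k:>5}: epsilon penalty = {penalty:e}")
```

Under this rule the penalty vanishes at the default granularity and grows roughly in proportion to the gap \(2^k\), so coarse grids are only worthwhile when the noise scale is much larger than the gap.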