Amplification
If your dataset is a simple random sample from a larger population,
you can make the privacy relation more permissive by wrapping your measurement with a privacy amplification combinator: opendp.combinators.make_population_amplification().
Note
The amplifier requires a looser trust model, as the population size can be set arbitrarily.
>>> import opendp.prelude as dp
>>> dp.enable_features("honest-but-curious")
In order to demonstrate this API, we’ll first create a measurement with a sized input domain. The resulting measurement expects the size of the input dataset to be 10.
>>> atom_domain = dp.atom_domain(bounds=(0.0, 10.0), nan=False)
>>> input_space = (
... dp.vector_domain(atom_domain, size=10),
... dp.symmetric_distance(),
... )
>>> meas = (
... input_space
... >> dp.t.then_mean()
... >> dp.m.then_laplace(scale=0.5)
... )
>>> print(
... "standard mean:", meas([1.0] * 10)
... ) # e.g. 1.03; the output is random
standard mean: ...
We can now use the amplification combinator to construct an amplified measurement. The function on the amplified measurement is identical to that of the standard measurement; only the privacy relation changes.
>>> amplified = dp.c.make_population_amplification(
... meas, population_size=100
... )
>>> print(
... "amplified mean:", amplified([1.0] * 10)
... ) # e.g. 0.97; the output is random
amplified mean: ...
The privacy relation on the amplified measurement takes into account that the input dataset of size 10 is a simple random sample of individuals from a theoretical larger dataset of 100 rows that captures the entire population.
>>> # Where we once had a privacy utilization of ~2 epsilon...
>>> assert meas.check(2, 2.0 + 1e-6)
>>> # ...we now have a privacy utilization of ~.4941 epsilon.
>>> assert amplified.check(2, 0.4941)
The efficacy of this combinator improves as the population size grows relative to the sample size.
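The amplified figure above can be approximated by hand. The sketch below assumes the standard amplification-by-subsampling bound for pure differential privacy, eps' = ln(1 + (n / N) * (exp(eps) - 1)), where n is the sample size and N is the population size. The helper name amplified_epsilon is hypothetical and is not part of OpenDP, and the library's own privacy map may round more conservatively than this floating-point calculation.

```python
import math


def amplified_epsilon(epsilon, sample_size, population_size):
    """Hypothetical helper (not an OpenDP function).

    Assumes the standard subsampling bound for pure DP:
    eps' = ln(1 + (n / N) * (exp(eps) - 1)).
    """
    if not 0 < sample_size <= population_size:
        raise ValueError("need 0 < sample_size <= population_size")
    sampling_rate = sample_size / population_size
    # expm1/log1p keep precision when epsilon or the sampling rate is small
    return math.log1p(sampling_rate * math.expm1(epsilon))


# Values from the example above: epsilon = 2, n = 10, N = 100.
print(amplified_epsilon(2.0, 10, 100))  # roughly 0.494

# Holding epsilon and n fixed, a larger population gives a smaller epsilon.
for population_size in (10, 100, 1_000, 10_000):
    print(population_size, amplified_epsilon(2.0, 10, population_size))
```

Under this assumed bound, a population equal to the sample (N = 10) yields no amplification, and each tenfold increase in N shrinks the effective epsilon further.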