Parameter Search

The OpenDP library provides two kinds of search algorithms to help find free parameters. The binary search functions are the primary tools, and an exponential search plays a supporting role.

Binary Search

There are many parameters in a typical DP measurement:

d_in: input distance (often how many records differ when you perturb one individual)
d_out: output distance (often the privacy budget)
noise scale: along with any other parameters passed to the constructors

To evaluate a privacy relation, you must fix all of these parameters. The relation simply returns a boolean indicating if it passed. If the relation passes for a given d_out, it will also pass for any value greater than d_out. This behavior makes it possible to solve for any one parameter using a binary search because the relation itself acts as your predicate function.
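
The idea of using the relation as a predicate can be sketched in plain Python. This is a minimal illustration, not OpenDP's implementation: `gaussian_relation` and `smallest_passing` are hypothetical names, and the relation assumes a Gaussian mechanism whose privacy loss is `(d_in / scale)^2 / 2`.

```python
def gaussian_relation(d_in: float, d_out: float, scale: float) -> bool:
    """Hypothetical stand-in for a privacy relation: True when d_out is a large enough budget."""
    # Assumed privacy loss for this toy: (d_in / scale)^2 / 2
    return (d_in / scale) ** 2 / 2 <= d_out


def smallest_passing(predicate, lo: float, hi: float, iterations: int = 100) -> float:
    """Bisect for the smallest value at which a monotone predicate passes.

    Assumes predicate(lo) is False and predicate(hi) is True.
    """
    assert not predicate(lo) and predicate(hi), "bounds must bracket the decision boundary"
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if predicate(mid):
            hi = mid  # mid passes, so the boundary is at or below mid
        else:
            lo = mid  # mid fails, so the boundary is above mid
    return hi


# Fix d_in and scale, then solve for the one remaining free parameter, d_out.
d_out = smallest_passing(lambda d: gaussian_relation(1.0, d, scale=1.0), lo=0.0, hi=10.0)
print(d_out)  # approximately 0.5
```

Because any budget above the boundary also passes, the predicate flips exactly once inside the bounds, which is the property bisection needs.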

OpenDP comes with some utility functions to make these binary searches easier to conduct:

binary_search_chain(): Pass it a function that makes a measurement or transformation from one numeric argument, as well as d_in and d_out. Returns the tightest chain.
binary_search_param(): Same as binary_search_chain, but returns the discovered parameter.
binary_search(): Pass a predicate function and bounds. Returns the discovered parameter. Useful when you just want to solve for d_in or d_out.
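
One way to picture how these three utilities relate is as layers over a single predicate search. The sketch below is an assumption about the layering, written with hypothetical names (`ToyGaussian`, `toy_binary_search`, `toy_search_param`, `toy_search_chain`) rather than the library's own code. It also assumes the search detects which side of the decision boundary passes by evaluating the predicate at the bounds.

```python
class ToyGaussian:
    """Hypothetical stand-in for a measurement; not an OpenDP type."""

    def __init__(self, scale: float):
        self.scale = scale

    def check(self, d_in: float, d_out: float) -> bool:
        # Assumed relation for this toy: (d_in / scale)^2 / 2 <= d_out
        return (d_in / self.scale) ** 2 / 2 <= d_out


def toy_binary_search(predicate, bounds, iterations=100):
    """Return the passing value closest to the decision boundary inside bounds."""
    lo, hi = bounds
    if predicate(lo) == predicate(hi):
        raise ValueError("the predicate must change value between the bounds")
    passing_is_high = predicate(hi)  # which side of the boundary passes?
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if predicate(mid) == passing_is_high:
            hi = mid  # mid behaves like the upper bound
        else:
            lo = mid  # mid behaves like the lower bound
    return hi if passing_is_high else lo


def toy_search_param(make_chain, d_in, d_out, bounds):
    """Solve for the constructor argument: the predicate is the chain's own check."""
    return toy_binary_search(lambda p: make_chain(p).check(d_in, d_out), bounds)


def toy_search_chain(make_chain, d_in, d_out, bounds):
    """Same search, but return the chain built from the discovered parameter."""
    return make_chain(toy_search_param(make_chain, d_in, d_out, bounds))


scale = toy_search_param(ToyGaussian, d_in=1.0, d_out=1.0, bounds=(0.1, 100.0))
chain = toy_search_chain(ToyGaussian, d_in=1.0, d_out=1.0, bounds=(0.1, 100.0))
print(scale, chain.check(1.0, 1.0))
```

Detecting the passing side matters because some parameters pass when large (a noise scale) while others pass when small (a clipping bound).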

Together, these utilities let you solve for whichever parameter you have left free, as the following examples show.


>>> import opendp.prelude as dp
>>> dp.enable_features("contrib", "idealized-numerics")

If you have a bound on d_in and a budget d_out, you can solve for the smallest noise scale that is still differentially private.

This is useful when you want to determine how accurate you can make a query with a given budget.

>>> input_space = dp.atom_domain(
...     T=float, nan=False
... ), dp.absolute_distance(T=float)
>>> dp.binary_search_param(
...     lambda s: dp.m.make_gaussian(*input_space, scale=s),
...     d_in=1.0,
...     d_out=1.0,
... )
0.7071067811865476

If you have a bound on d_in and a noise scale, you can solve for the tightest budget d_out that is still differentially private.

This is useful when you want to find the smallest budget that will satisfy a target accuracy.
```
>>> # in this case, a search is unnecessary. We can just use the map:
>>> dp.m.make_gaussian(*input_space, scale=1.0).map(d_in=1.0)
0.5
```
If you have a noise scale and a budget d_out, you can solve for the largest bound on d_in that is still differentially private.

This is useful when you want to determine an upper bound on how many records can be collected from an individual before needing to truncate.
```
>>> # finds the largest permissible d_in, a sensitivity
>>> dp.binary_search(
...     lambda d_in: dp.m.make_gaussian(
...         *input_space, scale=1.0
...     ).check(d_in=d_in, d_out=1.0)
... )
1.414213562373095
```
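
The three results above can be cross-checked by hand. This sketch assumes the Gaussian measurement's map is `d_out = (d_in / scale)^2 / 2`, which agrees with the value of 0.5 returned by `.map(d_in=1.0)` at `scale=1.0`. The helper `gaussian_map` is a hypothetical name, not part of OpenDP.

```python
import math


def gaussian_map(d_in: float, scale: float) -> float:
    """Assumed privacy map for the Gaussian examples: (d_in / scale)^2 / 2."""
    return (d_in / scale) ** 2 / 2


d_in, d_out, scale = 1.0, 1.0, 1.0

# Rearranging d_out = (d_in / scale)^2 / 2 for each unknown in turn:
smallest_scale = d_in / math.sqrt(2 * d_out)  # approximately 0.7071
tightest_d_out = gaussian_map(d_in, scale)    # 0.5
largest_d_in = scale * math.sqrt(2 * d_out)   # approximately 1.4142

print(smallest_scale, tightest_d_out, largest_d_in)
```

When a closed form like this is available, the search is a convenience. Its value shows up once the chain has several stages and no simple formula exists.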

If you have d_in, d_out, and noise scale derived from a target accuracy, you can solve for the smallest dataset size n that is still differentially private.

This is useful when you want to determine the necessary sample size when collecting data.

>>> # finds the smallest n
>>> dp.binary_search_param(
...     lambda n: dp.t.make_mean(
...         dp.vector_domain(dp.atom_domain((0.0, 10.0)), n),
...         dp.symmetric_distance(),
...     )
...     >> dp.m.then_gaussian(scale=1.0),
...     d_in=2,
...     d_out=1.0,
... )
8
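
The same answer can be reproduced with an integer bisection over a hand-written predicate. This is a sketch under stated assumptions: the mean of `n` records bounded in `[0, 10]` is taken to have sensitivity `(d_in // 2) * (upper - lower) / n`, and the Gaussian relation is taken to be `(sensitivity / scale)^2 / 2 <= d_out`. Both function names are hypothetical.

```python
def mean_then_gaussian_passes(n, d_in=2, d_out=1.0, bounds=(0.0, 10.0), scale=1.0):
    """Assumed relation for a bounded mean over n records followed by Gaussian noise."""
    lower, upper = bounds
    # With a known dataset size, d_in=2 corresponds to changing one record.
    sensitivity = (d_in // 2) * (upper - lower) / n
    return (sensitivity / scale) ** 2 / 2 <= d_out


def smallest_passing_int(predicate, lo, hi):
    """Smallest integer in (lo, hi] that passes; assumes lo fails and hi passes."""
    assert not predicate(lo) and predicate(hi), "bounds must bracket the decision boundary"
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if predicate(mid):
            hi = mid  # mid passes, so the answer is at or below mid
        else:
            lo = mid  # mid fails, so the answer is above mid
    return hi


n = smallest_passing_int(mean_then_gaussian_passes, lo=1, hi=1_000_000)
print(n)  # 8
```

Searching over integers stops when the bounds are adjacent, so the result is exact rather than approximate.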

If you have d_in, d_out, and noise scale derived from a target accuracy, you can solve for the greatest clipping range that is still differentially private.

This is useful when you want to minimize the likelihood of introducing bias.

>>> # finds the largest clipping bounds
>>> dp.binary_search_param(
...     lambda c: dp.t.make_sum(
...         dp.vector_domain(dp.atom_domain(bounds=(-c, c))),
...         dp.symmetric_distance(),
...     )
...     >> dp.m.then_gaussian(scale=1.0),
...     d_in=2,
...     d_out=1.0,
... )
0.353553389770093

The API documentation for these functions has more specific usage examples.

Exponential Search

An exponential search starts at an origin location in the search space and finds the first step where a predicate function changes value. Generally speaking, each step the algorithm takes is exponentially larger than the previous one. If bounds are not passed to the binary search algorithm, an exponential search is run to find the bounds for the binary search. This is generally less likely to overflow than setting large binary search bounds yourself, because the exponential search begins with small-magnitude queries and only grows them as needed.
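
The doubling strategy can be sketched as follows. This is a simplified illustration and not the library's `exponential_bounds_search()`: the name `toy_exponential_bounds` is hypothetical, the sketch only searches upward from the origin, and it applies none of the library's heuristics.

```python
def toy_exponential_bounds(predicate, origin=0.0, max_steps=64):
    """Return (lo, hi) around the first change in predicate, doubling the step each time."""
    start = predicate(origin)  # value of the predicate at the origin
    previous = origin
    step = 1.0
    for _ in range(max_steps):
        candidate = origin + step
        if predicate(candidate) != start:
            # The predicate changed somewhere between the last two queries.
            return previous, candidate
        previous = candidate
        step *= 2  # each step is twice as large as the one before
    raise ValueError("no change in the predicate found; supply bounds manually")


# A d_in search against an assumed Gaussian relation with scale=1.0 and d_out=1.0.
print(toy_exponential_bounds(lambda d_in: (d_in / 1.0) ** 2 / 2 <= 1.0))  # (1.0, 2.0)

# A boundary far from the origin is still bracketed in a handful of queries.
print(toy_exponential_bounds(lambda x: x >= 1000))  # (512.0, 1024.0)
```

The returned pair can then serve as the bounds for a binary search, and every query stays within a factor of two of the boundary's magnitude.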

exponential_bounds_search() uses a number of heuristics that tend to work well on most problems. If the heuristics fail on your problem, pass your own bounds into the binary search utilities.
