OpenDP is based on a conceptual model that defines the characteristics of privacy-preserving operations and provides a way for components to be assembled into programs with desired behavior. This model, known as the OpenDP Programming Framework, is described in the paper A Programming Framework for OpenDP. The framework is designed with a clear and verifiable means of capturing the sensitive aspects of an algorithm, while remaining highly flexible and extensible. OpenDP (the software library) is intended to be a faithful implementation of that approach. Because OpenDP is based on a well-defined model, users can create applications with rigorous privacy properties.
The OpenDP Programming Framework consists of a set of high-level conceptual elements. We’ll cover the highlights here, which should be enough for you to get acquainted with OpenDP programming. If you’re interested in more of the details and motivations behind the framework, you’re encouraged to read the paper. There is also an illustrative notebook A Framework to Understand DP.
In this section, we’ve used lower case when writing the names of OpenDP concepts. Later, when we talk about programming elements, we’ll use the capitalized form to refer to the concrete data types that implement these concepts. (The concept names link to their corresponding type descriptions.)
Measurements are randomized mappings from a dataset to an arbitrary output value. They are a controlled means of introducing randomness (e.g. noise) into a computation in order to protect privacy. An example of a measurement is one which applies Laplace noise to a value.
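To make the idea concrete, here is a minimal sketch of a measurement in plain Python. It does not use the OpenDP API: `laplace_measurement` and its parameters are names invented for this illustration, and the sampler is not hardened against floating-point attacks, so it should not be used for real releases.

```python
import random

def laplace_measurement(value, scale, rng=None):
    """Return `value` plus Laplace(0, scale) noise (illustrative only)."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    rng = rng or random.Random()
    # The difference of two independent exponential draws with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return value + noise

# The same input gives a different output on each call: the mapping is randomized.
release = laplace_measurement(42.0, scale=1.0)
```

A larger `scale` means more noise and, for a fixed query, a smaller privacy loss.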
Transformations are deterministic mappings from a dataset to another dataset. They are used to summarize or transform values in some way. An example of a transformation is one which calculates the mean of a set of values.
Domains are sets which identify the possible values that some object can take. They are used to constrain the input or output of measurements and transformations. Examples of domains are the integers between 1 and 10, or vectors of length 5 containing floating point numbers.
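One way to picture a domain is as a membership test. The sketch below, which is plain Python rather than the OpenDP API, encodes the two example domains above as predicates; the function names and default parameters are assumptions made for illustration.

```python
def in_bounded_int_domain(x, lower=1, upper=10):
    """Membership test for the integers between `lower` and `upper`, inclusive."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(x, int) and not isinstance(x, bool) and lower <= x <= upper

def in_sized_float_vector_domain(v, size=5):
    """Membership test for vectors (lists) of exactly `size` floats."""
    return (
        isinstance(v, list)
        and len(v) == size
        and all(isinstance(x, float) for x in v)
    )

in_bounded_int_domain(7)                               # inside the domain
in_bounded_int_domain(11)                              # outside the domain
in_sized_float_vector_domain([0.5, 1.0, 2.0, 3.5, 4.0])  # inside the domain
```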
Metrics capture the distance between two neighboring datasets. An example metric is “symmetric distance” (the number of elements that must be added or removed to turn one dataset into the other).
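As an illustration, symmetric distance can be computed as the size of the multiset symmetric difference of two datasets. The helper below is a plain-Python sketch with a name chosen for this example; it is not an OpenDP function.

```python
from collections import Counter

def symmetric_distance(a, b):
    """Size of the multiset symmetric difference between datasets a and b."""
    count_a, count_b = Counter(a), Counter(b)
    # Counter subtraction keeps only positive counts, so the two differences
    # count the elements present in one dataset but not in the other.
    return sum((count_a - count_b).values()) + sum((count_b - count_a).values())

symmetric_distance([1, 2, 3], [1, 2])     # → 1 (one element removed)
symmetric_distance([1, 2, 3], [1, 2, 4])  # → 2 (one removed, one added)
```

Note that replacing one element with another counts as a distance of 2 under this metric, because it is one removal plus one addition.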
Privacy relations and stability relations are boolean functions which characterize the notion of “closeness” of operation inputs and outputs. They are the glue that binds everything together.
A stability relation is a statement about a transformation. Like a privacy relation, it’s a boolean function of two values: an input distance (in a specific metric) and an output distance (in a specific metric, possibly different from the input metric). A stability relation lets you make assertions about the behavior of a transformation when that transformation is evaluated on any pair of neighboring datasets. If the stability relation is true, it is a guarantee that any pair of transformation inputs within the input distance will always produce transformation outputs within the output distance.
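A small worked example may help. The sketch below defines a hypothetical clamped-sum transformation and a matching stability relation in plain Python (neither is taken from the OpenDP API). It assumes the input distance is a symmetric distance and the output distance is an absolute difference between sums.

```python
def clamped_sum(data, lower, upper):
    """Transformation: clamp each value to [lower, upper], then sum."""
    return sum(min(max(x, lower), upper) for x in data)

def clamped_sum_stability_relation(d_in, d_out, lower, upper):
    """True if inputs within symmetric distance d_in yield sums within d_out."""
    # Each added or removed row moves the sum by at most max(|lower|, |upper|).
    return d_out >= d_in * max(abs(lower), abs(upper))

# Two datasets at symmetric distance 1 (one row removed).
x = [3, 8, 15]
x_prime = [3, 8]
gap = abs(clamped_sum(x, 0, 10) - clamped_sum(x_prime, 0, 10))  # → 10

clamped_sum_stability_relation(d_in=1, d_out=10, lower=0, upper=10)  # → True
clamped_sum_stability_relation(d_in=1, d_out=5, lower=0, upper=10)   # → False
```

The relation holds for `d_out=10` because no pair of inputs at distance 1 can move the clamped sum by more than 10; it fails for `d_out=5` because the pair above already differs by 10.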
Relations capture the notion of closeness in a very general way, allowing the extension of OpenDP to different definitions of privacy.
As you can see, these elements are interdependent and support each other. The interaction of these elements is what gives the OpenDP Programming Framework its flexibility and expressiveness. These topics are covered at a more granular level in the following sections of this guide.
You don’t need to know all the details of the Programming Framework to write OpenDP applications, but it helps to understand some of the key points:
OpenDP calculations are built by assembling a measurement from a number of constituent transformations and measurements, typically through chaining or composition.
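Chaining can be pictured as ordinary function composition. The following plain-Python sketch uses invented helper names (`chain`, `clamped_sum`, `make_laplace`) rather than the OpenDP API, and its noise sampler is for illustration only.

```python
import random

def chain(measurement, transformation):
    """Return a new measurement: apply `transformation`, then `measurement`."""
    def chained(data):
        return measurement(transformation(data))
    return chained

def clamped_sum(data, lower=0.0, upper=10.0):
    """Transformation: clamp each value to [lower, upper], then sum."""
    return sum(min(max(x, lower), upper) for x in data)

def make_laplace(scale, rng=None):
    """Build a measurement that adds Laplace(0, scale) noise to a number."""
    rng = rng or random.Random()
    def measurement(value):
        # Difference of two exponential draws with mean `scale` is Laplace noise.
        return value + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return measurement

# A deterministic transformation followed by a randomized measurement
# is itself a measurement.
noisy_sum = chain(make_laplace(scale=2.0), clamped_sum)
release = noisy_sum([3.0, 8.0, 15.0])
```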
Measurements don’t have a static privacy loss specified when constructing the measurement. Instead, measurements are typically constructed by specifying the scale of noise, and the loss is bounded by the resulting privacy relation. This requires some extra work compared to specifying the loss directly, but OpenDP provides some utilities to make this easier on the programmer, and the benefit is greatly increased flexibility of the framework as a whole.
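The sketch below illustrates the idea of deriving a noise scale from a privacy relation instead of specifying the loss directly. It is a plain-Python approximation under stated assumptions, not OpenDP's own utilities: the relation is the standard one for Laplace noise (epsilon is at least the input distance divided by the scale, with the input distance measured as an absolute difference between query answers), and `search_scale` is a hypothetical bisection helper.

```python
def laplace_privacy_relation(scale, d_in, d_out):
    """True if Laplace noise at `scale` gives epsilon <= d_out for inputs within d_in."""
    return d_out >= d_in / scale

def search_scale(relation, d_in, d_out, lower=1e-6, upper=1e6, steps=100):
    """Bisect for a small scale at which `relation(scale, d_in, d_out)` holds.

    Assumes the relation is False below some threshold scale and True above it.
    """
    if not relation(upper, d_in, d_out):
        raise ValueError("no scale in the search range satisfies the relation")
    for _ in range(steps):
        mid = (lower + upper) / 2
        if relation(mid, d_in, d_out):
            upper = mid  # mid is private enough; try less noise
        else:
            lower = mid  # mid adds too little noise
    return upper

# Smallest scale (approximately) that keeps epsilon at or below 1.0
# when query answers can differ by up to 10.0.
scale = search_scale(laplace_privacy_relation, d_in=10.0, d_out=1.0)
```

Because the search only ever returns a scale at which the relation was observed to hold, the result errs on the side of slightly more noise rather than less.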
OpenDP is a work in progress, and it’s important to note that it doesn’t yet implement all the details of the Programming Framework.
An important aspect of the Programming Framework is the flexible way that it models interactive measurements. These are measurements where the operation isn’t a static function, but instead captures a series of queries and responses, where the sequence is possibly determined dynamically. This is a very flexible model of computation, and can be used to capture notions such as adaptive composition.
OpenDP doesn’t yet implement interactive measurements, and is limited to plain (non-interactive) measurements. We know this is important functionality and are prototyping an implementation, but it will take some time before it’s ready for use.
Row transforms are a way of applying a user-defined function to each of the elements of a dataset. This concept can be used to construct transformations for operations that aren’t provided “out of the box” by OpenDP. Unfortunately, supporting row transforms has some privacy limitations around pure functions and also requires some tricky technical work, so these aren’t yet implemented in OpenDP.
Applying the Concepts
This is just a glance at the abstract concepts in the OpenDP Programming Framework. The following sections of this guide describe the actual software components in OpenDP implementing these concepts, and how they can be used in your programs.