\n", " Elements of a Laplace Measurement\n", "\n", "1. We first define the **function** $f(\\cdot)$, which applies the Laplace mechanism to some argument $x$. This function samples from the Laplace distribution centered at $x$, with a fixed noise scale.\n", "\n", "$$f(x) = \\mathrm{Laplace}(\\mu=x, b=scale)$$\n", "\n", "2. Importantly, $f(\\cdot)$ is well-defined only for finite float inputs. This set of permitted inputs is described by the **input domain** (denoted AtomDomain).\n", "\n", "3. The Laplace mechanism has a privacy guarantee in terms of epsilon. \n", "This guarantee is represented by a **privacy map**, a function that computes the privacy loss $\\epsilon$ for any choice of sensitivity $\\Delta$.\n", "\n", "$$map(\\Delta) = \\Delta / scale \\le \\epsilon$$\n", "\n", "4. This map promises that the privacy loss will be at most $\\epsilon$ only if inputs from any two neighboring datasets differ by no more than some quantity $\\Delta$ under the absolute distance **input metric** (AbsoluteDistance).\n", "\n", "5. We similarly describe units on the output ($\\epsilon$) via the **output measure** (MaxDivergence).\n", "
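\n", "As a quick check of the privacy map in item 3, consider a worked instance. The values $scale = 2$ and $\\Delta = 1$ are assumptions chosen purely for illustration:\n", "\n", "$$map(1) = 1 / 2 = 0.5$$\n", "\n", "Under these assumed values the privacy loss is at most $\\epsilon = 0.5$. Doubling the scale to $4$ would halve this bound to $0.25$, at the cost of noisier releases.\n", "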
\n", "\n", "\n", "The make_base_laplace constructor function returns the equivalent of the Laplace measurement described above." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "noisy aggregate: 1.7764984858902648\n", "epsilon: 0.5\n" ] } ], "source": [ "from opendp.measurements import make_base_laplace\n", "from opendp.domains import atom_domain\n", "from opendp.metrics import absolute_distance\n", "\n", "# call the constructor to produce the measurement base_lap\n", "input_space = atom_domain(T=float), absolute_distance(T=float)\n", "base_lap = make_base_laplace(*input_space, scale=2.)\n", "\n", "# invoke the measurement on some aggregate x, to sample Laplace(x, 2.) noise\n", "aggregated = 0.\n", "print(\"noisy aggregate:\", base_lap(aggregated))\n", "\n", "# we must know the sensitivity of aggregated to determine epsilon\n", "sensitivity = 1.\n", "print(\"epsilon:\", base_lap.map(d_in=sensitivity))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "The analogous constructor for Gaussian noise is make_gaussian:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "noisy aggregate: 0.2214567907024509\n", "rho: 0.125\n" ] } ], "source": [ "from opendp.measurements import make_gaussian\n", "\n", "# call the constructor to produce the measurement gauss\n", "input_space = atom_domain(T=float), absolute_distance(T=float)\n", "gauss = make_gaussian(*input_space, scale=2.)\n", "\n", "# invoke the measurement on some aggregate x, to sample Gaussian(x, 2.)
noise\n", "aggregated = 0.\n", "print(\"noisy aggregate:\", gauss(aggregated))\n", "\n", "# we must know the sensitivity of aggregated to determine rho\n", "sensitivity = 1.\n", "print(\"rho:\", gauss.map(d_in=sensitivity))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Notice that base_lap measures privacy with epsilon (in the MaxDivergence measure), and gauss measures privacy with rho (in the ZeroConcentratedDivergence measure).\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Support: Float vs. Integer\n", "\n", "There are also discrete analogues of the continuous Laplace and Gaussian measurements.\n", "The continuous measurements accept and emit floats, while the discrete measurements accept and emit integers.\n", "Measurements with distributions supported on the integers expect integer sensitivities by default.\n", "\n", "make_base_discrete_laplace is equivalent to the geometric mechanism:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "noisy aggregate: 0\n", "epsilon: 1.0\n" ] } ], "source": [ "from opendp.measurements import make_base_discrete_laplace\n", "\n", "# call the constructor to produce the measurement base_discrete_lap\n", "input_space = atom_domain(T=int), absolute_distance(T=int)\n", "base_discrete_lap = make_base_discrete_laplace(*input_space, scale=1.)\n", "\n", "# invoke the measurement on some integer aggregate x, to sample DiscreteLaplace(x, 1.)
noise\n", "aggregated = 0\n", "print(\"noisy aggregate:\", base_discrete_lap(aggregated))\n", "\n", "# in this case, the sensitivity is integral:\n", "sensitivity = 1\n", "print(\"epsilon:\", base_discrete_lap.map(d_in=sensitivity))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "make_base_discrete_gaussian is the analogous measurement for Gaussian noise:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "noisy aggregate: 1\n", "rho: 0.5\n" ] } ], "source": [ "from opendp.measurements import make_base_discrete_gaussian\n", "\n", "# call the constructor to produce the measurement base_discrete_gauss\n", "input_space = atom_domain(T=int), absolute_distance(T=int)\n", "base_discrete_gauss = make_base_discrete_gaussian(*input_space, scale=1.)\n", "\n", "# invoke the measurement on some integer aggregate x, to sample DiscreteGaussian(x, 1.) noise\n", "aggregated = 0\n", "print(\"noisy aggregate:\", base_discrete_gauss(aggregated))\n", "\n", "# we must know the sensitivity of aggregated to determine rho\n", "sensitivity = 1\n", "print(\"rho:\", base_discrete_gauss.map(d_in=sensitivity))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "The continuous mechanisms use these discrete samplers internally.\n", "More information on this can be found at the end of this notebook." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Domain: Scalar vs. Vector" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Measurements covered thus far have accepted scalar inputs and emitted scalar outputs, \n", "and sensitivities have been expressed in terms of the absolute distance.\n", "\n", "The noise addition mechanisms can similarly operate over metric spaces consisting of vectors, \n", "where the distance between any two vectors is computed via the L1 or L2 distance."
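, "\n", "\n", "For reference, the two vector distances can be written out as follows. The symbols $u$, $v$ and the example vectors are notation introduced here for illustration only:\n", "\n", "$$d_1(u, v) = \\sum_i |u_i - v_i|, \\qquad d_2(u, v) = \\sqrt{\\sum_i (u_i - v_i)^2}$$\n", "\n", "For instance, with the assumed vectors $u = [0, 2, 2]$ and $v = [1, 2, 1]$, the difference is $[-1, 0, 1]$, so $d_1(u, v) = 2$ and $d_2(u, v) = \\sqrt{2} \\approx 1.414$."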
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "inferred type is f64, expected Vec<f64>. See https://github.com/opendp/opendp/discussions/298\n" ] } ], "source": [ "# call again, but this time indicate that the measurement should operate over a vector domain\n", "\n", "from opendp.domains import vector_domain, atom_domain\n", "from opendp.metrics import l1_distance\n", "input_space = vector_domain(atom_domain(T=float)), l1_distance(T=float)\n", "base_lap_vec = make_base_laplace(*input_space, scale=1.)\n", "\n", "aggregated = 1.\n", "# If we try to pass the wrong data type into our vector laplace measurement, \n", "# the error shows that our float argument should be a vector of floats.\n", "try:\n", "    print(\"noisy aggregate:\", base_lap_vec(aggregated))\n", "except TypeError as e:\n", "    # The error messages will often point to a discussion page with more info.\n", "    print(e)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "noisy aggregate: [-2.600872570181145, 0.056345706343156464, 0.5331502281803662]\n" ] } ], "source": [ "# actually pass a vector-valued input, as expected\n", "aggregated = [0., 2., 2.]\n", "print(\"noisy aggregate:\", base_lap_vec(aggregated))\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "The resulting measurement expects sensitivity in terms of the appropriate Lp-distance: the vector Laplace measurement expects sensitivity in terms of an \"l1_distance(T=f64)\", while the vector Gaussian measurement expects a sensitivity in terms of an \"l2_distance(T=f64)\"." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "epsilon: 1.0\n" ] } ], "source": [ "sensitivity = 1.\n", "print(\"epsilon:\", base_lap_vec.map(d_in=sensitivity))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "The documentation for each constructor also reflects the relationship between D and the resulting input metric in a table:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on function make_base_laplace in module opendp.measurements:\n", "\n", "make_base_laplace(input_domain, input_metric, scale, k: int = -1074) -> opendp.mod.Measurement\n", " Make a Measurement that adds noise from the Laplace(scale) distribution to a scalar value.\n", " \n", " Valid inputs for input_domain and input_metric are:\n", " \n", " | input_domain | input type | input_metric |\n", " | ------------------------------- | ------------ | ---------------------- |\n", " | atom_domain(T) (default) | T | absolute_distance(T) |\n", " | vector_domain(atom_domain(T)) | Vec<T> | l1_distance(T) |\n", " \n", " This function takes a noise granularity in terms of 2^k.\n", " Larger granularities are more computationally efficient, but have a looser privacy map.\n", " If k is not set, k defaults to the smallest granularity.\n", " \n", " [make_base_laplace in Rust documentation.](https://docs.rs/opendp/latest/opendp/measurements/fn.make_base_laplace.html)\n", " \n", " **Supporting Elements:**\n", " \n", " * Input Domain: D\n", " * Output Type: D::Carrier\n", " * Input Metric: D::InputMetric\n", " * Output Measure: MaxDivergence\n", " \n", " :param input_domain: Domain of the data type to be privatized.\n", " :param input_metric: Metric of the data type to be privatized.\n", " :param scale: Noise scale parameter for the laplace distribution.
scale == standard_deviation / sqrt(2).\n", " :param k: The noise granularity in terms of 2^k.\n", " :type k: int\n", " :rtype: Measurement\n", " :raises TypeError: if an argument's type differs from the expected type\n", " :raises UnknownTypeException: if a type argument fails to parse\n", " :raises OpenDPException: packaged error from the core OpenDP library\n", "\n" ] } ], "source": [ "help(make_base_laplace)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "The discrete Gaussian mechanism allows the input sensitivity to be a float.\n", "This is because there is often a square root in the sensitivity calculations for vector-valued queries." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.999698" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from opendp.metrics import l2_distance\n", "\n", "# call again, but this time indicate that the measurement should operate over a vector domain\n", "input_space = vector_domain(atom_domain(T=int)), l2_distance(T=float)\n", "base_gauss_vec = make_base_discrete_gaussian(*input_space, scale=1.)\n", "\n", "# compute rho for a float-valued L2 sensitivity of approximately sqrt(2)\n", "base_gauss_vec.map(d_in=1.414)\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Bit depth\n", "\n", "By default, floating-point data types are 64-bit double-precision (denoted \"f64\"), and integral data types are 32-bit (denoted \"i32\").\n", "The atomic data type expected by the function and privacy units can be further configured to operate over specific bit-depths by explicitly specifying \"f32\" instead of \"float\", or \"i64\" instead of \"int\".
" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# explicitly specify that the...\n", "# * computation should be handled with 32-bit integers, and the\n", "# * privacy analysis be conducted with 64-bit floats\n", "base_discrete_lap_i32 = make_base_discrete_laplace(\n", " atom_domain(T=\"i32\"), absolute_distance(T=\"i32\"),\n", " scale=1., QO=\"f64\"\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "More information on acceptable data types can be found in the _Utilities > Typing_ section of the User Guide." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Desideratum: Floating-Point Granularity\n", "\n", "The \"continuous\" Laplace and Gaussian measurements convert their float arguments to a rational representation, and then add integer noise to the numerator via the respective discrete distribution. \n", "In the OpenDP Library's default configuration, this rational representation of a float is exact.\n", "Therefore the privacy analysis is as tight as if you were to sample truly continuous noise and then postprocess by rounding to the nearest float. \n", "\n", "For most use-cases the sampling algorithm is sufficiently fast when the rational representation is exact.\n", "That is, when noise is sampled with a granularity of $2^{-1074}$, the same granularity as the distance between subnormal 64-bit floats.\n", "However, the granularity can be adjusted to $2^k$, for some choice of k, for a faster runtime.\n", "Adjusting this parameter comes with a small penalty to the sensitivity (to account for rounding to the nearest rational), and subsequently, to the privacy parameters.\n", "\n", "The following plot shows the resulting distribution for some large choices of k:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "