{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Thresholded Noise Mechanisms\n",
    "\n",
    "Thresholded noise mechanisms are used to privately release a hashmap, where the keys are unknown and values are numbers,\n",
    "for example, a histogram of counts or a set of frequencies.\n",
    "\n",
    "The mechanism only releases key-value pairs that are \"stable\".\n",
    "When the value is large enough in magnitude to represent contributions from many different individuals,\n",
    "then the key is not specific to any one individual and can be released privately.\n",
    "The intuition is that the key is present among all neighboring datasets.\n",
    "\n",
    "We'll look at the various thresholded noise mechanisms in OpenDP:\n",
    "\n",
    "* Distribution: Laplace vs. Gaussian\n",
    "* Support: float vs. integer\n",
    "* Threshold: positive vs. negative\n",
    "* Bit-depth"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "----\n",
    "Any functions that have not completed the proof-writing and vetting process may still be accessed if you opt-in to \"contrib\".\n",
    "Please contact us if you are interested in proof-writing. Thank you!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-06-04T18:18:44.436968Z",
     "iopub.status.busy": "2025-06-04T18:18:44.436653Z",
     "iopub.status.idle": "2025-06-04T18:18:45.118593Z",
     "shell.execute_reply": "2025-06-04T18:18:45.118279Z"
    }
   },
   "outputs": [],
   "source": [
    "import opendp.prelude as dp\n",
    "dp.enable_features(\"contrib\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Distribution: Laplace vs. Gaussian\n",
    "\n",
    "The Laplace mechanism is used to privatize an aggregate, like a sum or mean.\n",
    "\n",
    "An instance of the Laplace threshold mechanism is captured by a _measurement_ containing the following five elements:\n",
    "\n",
    "<details>\n",
    "  <summary>Elements of a Laplace Threshold Measurement</summary>\n",
    "\n",
    "1. We first define the **function** $f(\\cdot)$, that applies the Laplace mechanism to the values of $x$, \n",
    "and then discards pairs whose value is below the threshold.\n",
    "\n",
    "```python\n",
    "    def f(x: dict[Any, float]) -> dict[Any, float]:\n",
    "        x = {k: Laplace(mu=v, b=scale) for k, v in x.items()}\n",
    "        return {k: v for k, v in x.items() if v >= threshold}\n",
    "```\n",
    "\n",
    "2. Importantly, $f(\\cdot)$ is only well-defined for any dictionary with finite float values. This set of permitted inputs is described by the **input domain** (denoted `MapDomain<AtomDomain<TK>, AtomDomain<f64>>`).\n",
    "\n",
    "3. The Laplace threshold mechanism has a privacy guarantee in terms of $\\epsilon$ and $\\delta$. \n",
    "This guarantee is represented by a **privacy map**, a function that computes the greatest privacy loss $(\\epsilon, \\delta)$ for any choice of sensitivity $\\Delta_0, \\Delta_1, \\Delta_\\infty$. The privacy map is roughly implemented as follows:\n",
    "\n",
    "```python\n",
    "    def map(d_in):\n",
    "        l0, l1, li = d_in\n",
    "        epsilon = l1 / scale\n",
    "\n",
    "        # probability of sampling a noise value greater than: threshold - li\n",
    "        delta_single = tail(scale, threshold - li)\n",
    "        delta = 1 - (1 - delta_single)**l0\n",
    "        return epsilon, delta\n",
    "```\n",
    "\n",
    "\n",
    "\n",
    "4. This map only promises that the privacy loss will be at most $\\epsilon$ if inputs from any two neighboring datasets may differ by no more than some quantity $\\Delta_0, \\Delta_1, \\Delta_\\infty$ under the absolute distance **input metric** (`L01InfDistance<AbsoluteDistance<f64>>`).\n",
    "\n",
    "5. We similarly describe units on the output ($(\\epsilon, \\delta)$) via the **output measure** (`Approximate<MaxDivergence>`).\n",
    "</details>\n",
    "\n",
    "\n",
    "The `make_laplace_threshold` constructor function returns the equivalent of the Laplace threshold measurement described above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-06-04T18:18:45.120152Z",
     "iopub.status.busy": "2025-06-04T18:18:45.120012Z",
     "iopub.status.idle": "2025-06-04T18:18:45.125531Z",
     "shell.execute_reply": "2025-06-04T18:18:45.125291Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "noisy aggregate: {'c': 40.17307713885866}\n"
     ]
    }
   ],
   "source": [
    "m_lap = dp.m.make_laplace_threshold(\n",
    "    dp.map_domain(dp.atom_domain(T=str), dp.atom_domain(T=float, nan=False)),\n",
    "    dp.l01inf_distance(dp.absolute_distance(T=float)),\n",
    "    scale=1.,\n",
    "    threshold=20.0\n",
    ")\n",
    "\n",
    "# invoke the measurement on some aggregate hashmap, to sample Laplace(x, 1.) noise\n",
    "aggregated = {\n",
    "    \"a\": 0.0,\n",
    "    \"b\": 20.0,\n",
    "    \"c\": 40.0,\n",
    "}\n",
    "print(\"noisy aggregate:\", m_lap(aggregated))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As expected, pairs with small values (like `\"a\": 0.0`) had too few people contribute to be included in the release."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-06-04T18:18:45.143204Z",
     "iopub.status.busy": "2025-06-04T18:18:45.143074Z",
     "iopub.status.idle": "2025-06-04T18:18:45.145758Z",
     "shell.execute_reply": "2025-06-04T18:18:45.145560Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(ε, δ): (1.0, 2.801398224505647e-09)\n"
     ]
    }
   ],
   "source": [
    "# we must know the sensitivity of `aggregated` to determine privacy params\n",
    "#  3 kinds: Δ_0, Δ_1, Δ_∞\n",
    "sensitivity = 1, 1.0, 1.0\n",
    "lap_eps_del = m_lap.map(d_in=sensitivity)\n",
    "print(\"(ε, δ):\", lap_eps_del)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`d_in` carries three different kinds of sensitivity.\n",
    "\n",
    "* $\\Delta_0$: how many values an individual may influence\n",
    "* $\\Delta_1$: the total influence an individual may have over all values\n",
    "* $\\Delta_\\infty$: the influence an individual may have on any one value\n",
    "\n",
    "The analogous constructor for gaussian noise is `make_gaussian_threshold`: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-06-04T18:18:45.146922Z",
     "iopub.status.busy": "2025-06-04T18:18:45.146825Z",
     "iopub.status.idle": "2025-06-04T18:18:45.151053Z",
     "shell.execute_reply": "2025-06-04T18:18:45.150856Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "noisy aggregate: {'c': 40.93198267967212}\n",
      "(ρ, δ): (0.5, 1.1102230246251565e-16)\n"
     ]
    }
   ],
   "source": [
    "m_gauss = dp.m.make_gaussian_threshold(\n",
    "    dp.map_domain(dp.atom_domain(T=str), dp.atom_domain(T=float, nan=False)),\n",
    "    # NOTE: L1 is changed to L2 in the input metric\n",
    "    dp.l02inf_distance(dp.absolute_distance(T=float)),\n",
    "    scale=1.,\n",
    "    threshold=20.0\n",
    ")\n",
    "\n",
    "# invoke the measurement on some aggregate hashmap, to sample Gaussian(x, 1.) noise\n",
    "print(\"noisy aggregate:\", m_gauss(aggregated))\n",
    "\n",
    "# we must know the sensitivity of `aggregated` to determine privacy params\n",
    "#  3 kinds: Δ_0, Δ_1, Δ_∞\n",
    "sensitivity = 1, 1.0, 1.0\n",
    "print(\"(ρ, δ):\", m_gauss.map(d_in=sensitivity))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this case, $\\Delta_1$ in `d_in` is replaced with $\\Delta_2$.\n",
    "\n",
    "* $\\Delta_0$: how many values an individual may influence\n",
    "* $\\Delta_2$: the euclidean influence an individual may have over all values\n",
    "* $\\Delta_\\infty$: the influence an individual may have on any one value\n",
    "\n",
    "`m_lap` measures privacy with $\\epsilon$ and $\\delta$ (in the `Approximate<MaxDivergence>` measure), and `m_gauss` measures privacy with $\\rho$ and $\\delta$ (in the `Approximate<ZeroConcentratedDivergence>` measure).\n",
    "\n",
    "Notice how much smaller $\\delta$ is this time (`2.8e-9` vs `1.1e-16`).\n",
    "This is because the laplace distribution is a \"fat-tailed\" distribution, \n",
    "meaning more of the mass of the distribution is in the tails.\n",
    "The tails of the gaussian distribution decay much more quickly, \n",
    "resulting in a much smaller $\\delta$.\n",
    "\n",
    "For comparison, let's convert the privacy guarantee from approx-zCDP to compare with the thresholded laplace mechanism:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-06-04T18:18:45.152103Z",
     "iopub.status.busy": "2025-06-04T18:18:45.152020Z",
     "iopub.status.idle": "2025-06-04T18:18:45.167868Z",
     "shell.execute_reply": "2025-06-04T18:18:45.167656Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(ε, δ): (6.3035767282855915, 2.801398224505647e-09)\n"
     ]
    }
   ],
   "source": [
    "# convert ρ to an ε(δ_2) privacy profile, where total privacy loss is (ε(δ_2), δ_1 + δ_2)\n",
    "m_gauss_profile = dp.c.make_zCDP_to_approxDP(m_gauss)\n",
    "# fix overall δ to that used by the laplace threshold, for comparison\n",
    "m_gauss_approx = dp.c.make_fix_delta(m_gauss_profile, delta=lap_eps_del[1])\n",
    "\n",
    "print(\"(ε, δ):\", m_gauss_approx.map(sensitivity))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this setting, at the same level of $\\delta$ as the thresholded laplace mechanism, \n",
    "the privacy loss of the thresholded gaussian mechanism is over four times larger.\n",
    "On the other hand, the thresholded gaussian mechanism will perform much better than the thresholded laplace mechanism\n",
    "when $\\Delta_\\infty$ is small and $\\Delta_0$ is large.\n",
    "This arises when an individual has a small influence over a large number of partitions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-06-04T18:18:45.168997Z",
     "iopub.status.busy": "2025-06-04T18:18:45.168927Z",
     "iopub.status.idle": "2025-06-04T18:18:45.182877Z",
     "shell.execute_reply": "2025-06-04T18:18:45.182662Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "laplace  (ε, δ): (0.1, 1.0316078580263621e-07)\n",
      "gaussian (ε, δ): (0.049969691134438526, 2.801398224505647e-09)\n"
     ]
    }
   ],
   "source": [
    "sensitivity_spread = 100, 10.0, 0.001\n",
    "print(\"laplace  (ε, δ):\", m_lap.map(d_in=sensitivity_spread))\n",
    "print(\"gaussian (ε, δ):\", m_gauss_approx.map(d_in=sensitivity_spread))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this alternative world where individuals may have a small influence on many partitions, \n",
    "the thresholded gaussian mechanism dominates in utility over the thresholded laplace mechanism.\n",
    "\n",
    "Notice that there is some redundancy in the sensitivity. \n",
    "Above, when an individual may only influence 100 partitions by at most 0.001, \n",
    "then a user's total influence ($\\Delta_1$) could only be 0.1!\n",
    "Instead of using 10, OpenDP infers $\\Delta_1$ is $100 \\cdot 0.001 = 0.1$, and $\\Delta_2$ is $\\sqrt{100} \\cdot 0.001 = .01$."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Support: Float vs. Integer\n",
    "\n",
    "There are also discrete analogues of the continuous Laplace and Gaussian threshold measurements.\n",
    "The continuous measurements accept and emit floats, while the discrete measurements accept and emit integers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-06-04T18:18:45.184083Z",
     "iopub.status.busy": "2025-06-04T18:18:45.184013Z",
     "iopub.status.idle": "2025-06-04T18:18:45.187518Z",
     "shell.execute_reply": "2025-06-04T18:18:45.187309Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "noisy aggregate: {'b': 10, 'c': 22}\n",
      "(ε, δ): (1.0, 3.319000812207484e-05)\n"
     ]
    }
   ],
   "source": [
    "# call the constructor to produce the measurement `m_dlap`\n",
    "input_space = dp.map_domain(dp.atom_domain(T=str), dp.atom_domain(T=int)), dp.absolute_distance(T=int)\n",
    "m_dlap = dp.m.make_laplace_threshold(\n",
    "    dp.map_domain(dp.atom_domain(T=str), dp.atom_domain(T=int)), \n",
    "    dp.l01inf_distance(dp.absolute_distance(T=int)), \n",
    "    scale=1.0,\n",
    "    threshold=10,\n",
    ")\n",
    "\n",
    "# invoke the measurement on some integer aggregate hashmap, to sample DiscreteLaplace(x, 1.) noise\n",
    "aggregated = {\n",
    "    \"a\": 0,\n",
    "    \"b\": 10,\n",
    "    \"c\": 20,\n",
    "}\n",
    "print(\"noisy aggregate:\", m_dlap(aggregated))\n",
    "\n",
    "# in this case, the sensitivity is integral:\n",
    "sensitivity = 1, 1, 1\n",
    "print(\"(ε, δ):\", m_dlap.map(d_in=sensitivity))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`make_gaussian_threshold` on a discrete support is the analogous measurement for Gaussian noise:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-06-04T18:18:45.188671Z",
     "iopub.status.busy": "2025-06-04T18:18:45.188583Z",
     "iopub.status.idle": "2025-06-04T18:18:45.191820Z",
     "shell.execute_reply": "2025-06-04T18:18:45.191633Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "noisy aggregate: {'c': 20, 'b': 10}\n",
      "(ρ, δ): (0.5, 1.1102230246251565e-16)\n"
     ]
    }
   ],
   "source": [
    "# call the constructor to produce the measurement `m_dgauss`\n",
    "input_space = dp.map_domain(dp.atom_domain(T=str), dp.atom_domain(T=int)), dp.absolute_distance(T=int)\n",
    "m_dgauss = dp.m.make_gaussian_threshold(\n",
    "    dp.map_domain(dp.atom_domain(T=str), dp.atom_domain(T=int)), \n",
    "    dp.l02inf_distance(dp.absolute_distance(T=float)), \n",
    "    scale=1.0,\n",
    "    threshold=10,\n",
    ")\n",
    "\n",
    "# invoke the measurement on some integer aggregate hashmap, to sample DiscreteGaussian(x, 1.) noise\n",
    "aggregated = {\n",
    "    \"a\": 0,\n",
    "    \"b\": 10,\n",
    "    \"c\": 20,\n",
    "}\n",
    "print(\"noisy aggregate:\", m_dgauss(aggregated))\n",
    "\n",
    "# in this case, the sensitivity is integral:\n",
    "sensitivity = 1, 1, 1\n",
    "print(\"(ρ, δ):\", m_dgauss.map(d_in=sensitivity))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The continuous mechanisms use these discrete samplers internally."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Threshold: Positive vs. Negative"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When the threshold is negative, pairs with noisy values greater than the threshold are discarded."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-06-04T18:18:45.193034Z",
     "iopub.status.busy": "2025-06-04T18:18:45.192964Z",
     "iopub.status.idle": "2025-06-04T18:18:45.196421Z",
     "shell.execute_reply": "2025-06-04T18:18:45.196201Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "noisy aggregate: {'c': -20, 'b': -11}\n",
      "(ε, δ): (1.0, 3.319000812207484e-05)\n"
     ]
    }
   ],
   "source": [
    "# call the constructor to produce the measurement `m_dlap`\n",
    "input_space = dp.map_domain(dp.atom_domain(T=str), dp.atom_domain(T=int)), dp.absolute_distance(T=int)\n",
    "m_dlap = dp.m.make_laplace_threshold(\n",
    "    dp.map_domain(dp.atom_domain(T=str), dp.atom_domain(T=int)), \n",
    "    dp.l01inf_distance(dp.absolute_distance(T=int)), \n",
    "    scale=1.0,\n",
    "    threshold=-10,\n",
    ")\n",
    "\n",
    "# invoke the measurement on some integer aggregate hashmap, to sample DiscreteLaplace(x, 1.) noise\n",
    "aggregated = {\n",
    "    \"a\": 0,\n",
    "    \"b\": -10,\n",
    "    \"c\": -20,\n",
    "}\n",
    "print(\"noisy aggregate:\", m_dlap(aggregated))\n",
    "\n",
    "# in this case, the sensitivity is integral:\n",
    "sensitivity = 1, 1, 1\n",
    "print(\"(ε, δ):\", m_dlap.map(d_in=sensitivity))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Bit depth\n",
    "\n",
    "By default, all floating-point data types default to 64-bit double-precision (denoted `\"f64\"`), and all integral data types default to 32-bit (denoted `\"i32\"`).\n",
    "The atomic data type expected by the function and privacy units can be further configured to operate over specific bit-depths by explicitly specifying `\"f32\"` instead of `\"float\"`, or `\"i64\"` instead of `\"int\"`. "
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "More information on acceptable data types can be found in the [Typing section of the User Guide](../utilities/typing.rst)."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Desideratum: Floating-Point Granularity\n",
    "\n",
    "<details>\n",
    "  <summary>Excerpt from Additive Noise Mechanism notebook</summary>\n",
    "The \"continuous\" Laplace and Gaussian measurements convert their float values to a rational representation, and then add integer noise to the numerator via the respective discrete distribution. \n",
    "In the OpenDP Library's default configuration, this rational representation of a float is exact.\n",
    "Therefore the privacy analysis is as tight as if you were to sample truly continuous noise and then postprocess by rounding to the nearest float. \n",
    "\n",
    "For most use-cases the sampling algorithm is sufficiently fast when the rational representation is exact.\n",
    "That is, when noise is sampled with a granularity of $2^{-1074}$, the same granularity as the distance between subnormal 64-bit floats.\n",
    "However, the granularity can be adjusted to $2^k$, for some choice of k, for a faster runtime.\n",
    "Adjusting this parameter comes with a small penalty to the sensitivity (to account for rounding to the nearest rational), and subsequently, to the privacy parameters.\n",
    "</details>\n",
    "\n",
    "* In the case of additive noise mechanisms, the sensitivity from rounding increases as a function of the vector length.\n",
    "* In the case of thresholded noise mechanisms, the sensitivity from rounding increases as a function of $\\Delta_0$,\n",
    "as only $\\Delta_0$ different values can round in different directions."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}