{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Differentially Private PCA\n", "\n", "This notebook demonstrates how to make a differentially private (DP) release of a PCA model." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "Constructors that have not yet completed the proof-writing and vetting process can still be accessed if you opt in to \"contrib\".\n", "Please contact us if you are interested in proof-writing. Thank you!" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from opendp.mod import enable_features\n", "enable_features(\"contrib\", \"floating-point\", \"honest-but-curious\")" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "def sample_microdata(*, num_columns=None, num_rows=None, cov=None):\n", "    # Use the given covariance matrix, or sample a random one if none is provided.\n", "    if cov is None:\n", "        cov = sample_covariance(num_columns)\n", "    # Draw rows from a zero-mean multivariate normal with covariance `cov`.\n", "    microdata = np.random.multivariate_normal(\n", "        np.zeros(cov.shape[0]), cov, size=num_rows or 100_000\n", "    )\n", "    # Center each column so that its empirical mean is exactly zero.\n", "    microdata -= microdata.mean(axis=0)\n", "    return microdata\n", "\n", "def sample_covariance(num_features):\n", "    # A.T @ A is symmetric positive semi-definite, so it is a valid covariance matrix.\n", "    A = np.random.uniform(0, num_features, size=(num_features, num_features))\n", "    return A.T @ A" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this notebook, we work with an example dataset drawn from a multivariate normal distribution with a randomly generated covariance matrix." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "num_columns = 4\n", "num_rows = 10_000\n", "example_dataset = sample_microdata(num_columns=num_columns, num_rows=num_rows)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The OpenDP Library makes it straightforward to release a DP PCA model, because it provides an API similar to scikit-learn's:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import opendp.prelude as dp\n", "\n", "model = dp.sklearn.PCA(\n", "    epsilon=1.,              # privacy budget for the release\n", "    row_norm=1.,             # bound on the norm of each row\n", "    n_samples=num_rows,      # number of rows in the dataset\n", "    n_features=num_columns,  # number of columns in the dataset\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The private release happens when you fit the model to the data." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
PCA(epsilon=1.0, n_components=4, n_features=4, n_samples=10000, row_norm=1.0)
PCA(epsilon=1.0, n_components=2, n_features=4, n_samples=10000, row_norm=1.0)