
opendp.extras.sklearn.decomposition package#

Module contents#

This module requires extra installs: pip install 'opendp[scikit-learn]'

For convenience, all the members of this module are also available from opendp.prelude. We suggest importing under the conventional name dp:

>>> import opendp.prelude as dp

The members of this module will then be accessible at dp.sklearn.decomposition.
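
For example, the estimator documented below can then be referenced through either path; a quick sketch:

>>> PCA = dp.sklearn.decomposition.PCA  # same class as opendp.extras.sklearn.decomposition.PCA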

See also our tutorial on differentially private PCA.

class opendp.extras.sklearn.decomposition.PCA(*, epsilon, row_norm, n_samples, n_features, n_components=None, n_changes=1, whiten=False)[source]#

DP wrapper for sklearn’s PCA. This implementation is based on Differentially Private Covariance Estimation by Kareem Amin, et al.

Trying to create an instance without sklearn installed will raise an ImportError.

See the tutorial on differentially private PCA for details.

Parameters:
  • whiten (bool) – Mirrors the corresponding sklearn parameter: when True (False by default), the components_ vectors are multiplied by the square root of n_samples and then divided by the singular values to ensure uncorrelated outputs with unit component-wise variances.

  • epsilon (float) –

  • row_norm (float) –

  • n_samples (int) –

  • n_features (int) –

  • n_components (int | float | str | None) –

  • n_changes (int) –
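
A minimal construction sketch; the budget and metadata values below are illustrative assumptions, not recommendations, and the required feature flags may vary by OpenDP version:

>>> import opendp.prelude as dp
>>> dp.enable_features("contrib")  # opt in to non-verified components; typically needed before fitting
>>> model = dp.sklearn.decomposition.PCA(
...     epsilon=1.0,       # total privacy budget for the release (assumed value)
...     row_norm=1.0,      # each row is clamped to this L2 norm bound
...     n_samples=1_000,   # number of rows, assumed known/public
...     n_features=4,
...     n_components=2,    # release only the top two components
... )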

fit(X, y=None)[source]#

Fit the model with X.

Parameters:
  • X – Training data of shape (n_samples, n_features), where n_samples is the number of samples and n_features is the number of features.

  • y – Ignored
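
For illustration, fitting the estimator constructed above on synthetic data (assuming NumPy is available; following sklearn conventions, fitted attributes such as components_ are then populated):

>>> import numpy as np
>>> X = np.random.default_rng(0).normal(size=(1_000, 4))  # matches n_samples and n_features above
>>> _ = model.fit(X)  # y is ignored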

measurement()[source]#

Return a measurement that releases a fitted model.

Return type:

Measurement
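
A sketch of using the measurement directly rather than calling fit, assuming model and X from the sketches above; invoking the returned Measurement on the data performs the private release:

>>> meas = model.measurement()
>>> fitted = meas(X)  # releases a fitted PCA model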

n_features#

Number of features

class opendp.extras.sklearn.decomposition.PCAEpsilons(eigvals, eigvecs, mean)[source]#

Bases: NamedTuple

Tuple used to describe the ε-expenditure per changed record in the input data

Parameters:
  • eigvals (float) –

  • eigvecs (Sequence[float]) –

  • mean (float | None) –

eigvals: float#

ε-expenditure to estimate the eigenvalues

eigvecs: Sequence[float]#

ε-expenditure to estimate the eigenvectors

mean: float | None#

ε-expenditure to estimate the mean.

A portion of the budget is used to estimate the mean because the OpenDP PCA algorithm releases an eigendecomposition of the sum of squares and cross-products (SSCP) matrix, not of the covariance matrix. If the data is centered beforehand (either by the user, from prior knowledge of the mean, or by privately estimating the mean and then centering), then the PCA corresponds to the covariance matrix, as expected, because the SSCP matrix of centered data is equivalent to a scaled covariance matrix.

If the data is not centered (or the mean is poorly estimated), then the first eigenvector will be dominated by the true mean.
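
A sketch of an explicit per-record budget split; the values are illustrative, and eigvecs is assumed to hold one ε-expenditure per eigenvector estimate:

>>> budget = dp.sklearn.decomposition.PCAEpsilons(
...     eigvals=0.1,
...     eigvecs=[0.3, 0.3, 0.2],  # assumed: one ε-expenditure per eigenvector estimate
...     mean=0.1,
... )

Such a tuple can be passed as the unit_epsilon argument of make_private_pca (below) in place of a single float, to control the split explicitly.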

opendp.extras.sklearn.decomposition.make_private_pca(input_domain, input_metric, unit_epsilon, norm=None, num_components=None)[source]#

Construct a Measurement that returns the data mean, the singular values, and the right singular vectors.

Parameters:
  • input_domain (Domain) – instance of array2_domain(size=_, num_columns=_)

  • input_metric (Metric) – instance of symmetric_distance()

  • unit_epsilon (float | PCAEpsilons) – ε-expenditure per changed record in the input data

  • norm (float | None) – clamp each row to this norm bound

  • num_components – optional; the number of eigenvectors to release. Defaults to num_columns from input_domain.

Returns:

a Measurement that computes a tuple (mean, S, Vt), where S holds the singular values and Vt the right singular vectors

Return type:

Measurement
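
A hedged end-to-end sketch of this constructor. The input domain is assumed here to be built with the array2_domain constructor referenced above, exposed as dp.numpy.array2_domain (additional arguments such as the element type may be required in practice), and the feature flags may differ across OpenDP versions:

>>> import numpy as np
>>> import opendp.prelude as dp
>>> dp.enable_features("contrib")  # further flags may be required depending on version
>>> space = (
...     dp.numpy.array2_domain(size=1_000, num_columns=4),  # assumed constructor path
...     dp.symmetric_distance(),
... )
>>> meas = dp.sklearn.decomposition.make_private_pca(
...     *space, unit_epsilon=1.0, norm=1.0, num_components=2
... )
>>> X = np.random.default_rng(0).normal(size=(1_000, 4))
>>> mean, S, Vt = meas(X)  # mean, singular values, right singular vectors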