
opendp.extras.sklearn.decomposition package

Module contents

This module requires extra installs: pip install 'opendp[scikit-learn]'

For convenience, all the functions of this module are also available from opendp.prelude. We suggest importing under the conventional name dp:

>>> import opendp.prelude as dp

The methods of this module will then be accessible at dp.sklearn.decomposition.

See also our tutorial on differentially private PCA.

class opendp.extras.sklearn.decomposition.PCA(*, epsilon, row_norm, n_samples, n_features, n_components=None, n_changes=1, whiten=False)

DP wrapper for sklearn’s PCA.

Trying to create an instance without sklearn installed will raise an ImportError.

See the tutorial on differentially private PCA for details.
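Since construction raises an ImportError when sklearn is missing, a caller may prefer to check for the package up front. The following is a minimal sketch using only the standard library. The helper name `sklearn_available` is hypothetical and not part of OpenDP.

```python
import importlib.util


def sklearn_available() -> bool:
    """Return True if the `sklearn` package can be found (hypothetical helper)."""
    # find_spec returns None for a missing top-level package instead of raising.
    return importlib.util.find_spec("sklearn") is not None


if not sklearn_available():
    # Install hint taken from the module documentation.
    print("Install the extra first: pip install 'opendp[scikit-learn]'")
```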

Parameters:
  • whiten (bool) – Mirrors the corresponding sklearn parameter: When True (False by default) the components_ vectors are multiplied by the square root of n_samples and then divided by the singular values to ensure uncorrelated outputs with unit component-wise variances.

  • epsilon (float) –

  • row_norm (float) –

  • n_samples (int) –

  • n_features (int) –

  • n_components (int | float | str | None) –

  • n_changes (int) –
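To make the effect of whiten concrete, here is a non-private NumPy sketch of the rescaling it describes. It is an illustration of the general idea, not OpenDP's or sklearn's implementation. The toy data and variable names are assumptions, and the sketch scales by the square root of n_samples - 1 (rather than n_samples) so that the sample covariance computed by `np.cov` comes out as the identity.

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated toy data (assumed for illustration): 200 samples, 3 features.
mixing = np.array([[2.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.0, 0.3, 0.2]])
X = rng.normal(size=(200, 3)) @ mixing

n_samples = X.shape[0]
X_centered = X - X.mean(axis=0)
_, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Plain projection: component variances equal S**2 / (n_samples - 1).
projected = X_centered @ Vt.T

# Whitened projection: rescale each component by sqrt(n_samples - 1) / S.
whitened = projected * (np.sqrt(n_samples - 1) / S)

# Sample covariance of the whitened outputs: uncorrelated, unit variance.
cov = np.cov(whitened, rowvar=False)
```

Without the rescaling, the projected components are still uncorrelated, but their variances differ.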

fit(X, y=None)

Fit the model with X.

Parameters:
  • X – Training data of shape (n_samples, n_features), where n_samples is the number of samples and n_features is the number of features.

  • y – Ignored

measurement()

Return a measurement that releases a fitted model.

Return type:

Measurement

n_features

Number of features

class opendp.extras.sklearn.decomposition.PCAEpsilons(eigvals, eigvecs, mean)

Bases: NamedTuple

Tuple used to describe the ε-expenditure per changed record in the input data

Parameters:
  • eigvals (float) –

  • eigvecs (Sequence[float]) –

  • mean (float | None) –

eigvals: float

Alias for field number 0

eigvecs: Sequence[float]

Alias for field number 1

mean: float | None

Alias for field number 2
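Because PCAEpsilons is a NamedTuple, its fields can be read by name or by position. The sketch below uses a local stand-in class with the same three fields so that it runs without OpenDP installed. The class name `PCAEpsilonsSketch` and the budget values are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class PCAEpsilonsSketch(NamedTuple):
    """Local stand-in with the same fields as PCAEpsilons (not the OpenDP class)."""

    eigvals: float
    eigvecs: Sequence[float]
    mean: Optional[float]


# Illustrative budget values only.
budget = PCAEpsilonsSketch(eigvals=0.2, eigvecs=[0.3, 0.3], mean=0.2)

# Fields are reachable by name or by position ("field number 0", and so on).
first_by_name = budget.eigvals
first_by_index = budget[0]

# Tuple unpacking follows the declared field order.
eigvals, eigvecs, mean = budget
```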

opendp.extras.sklearn.decomposition.make_private_pca(input_domain, input_metric, unit_epsilon, norm=None, num_components=None)

Construct a Measurement that returns the data mean, singular values and right singular vectors.

Parameters:
  • input_domain (Domain) – instance of array2_domain(size=_, num_columns=_)

  • input_metric (Metric) – instance of symmetric_distance()

  • unit_epsilon (float | PCAEpsilons) – ε-expenditure per changed record in the input data

  • norm (float | None) – clamp each row to this norm bound

  • num_components – optional number of eigenvectors to release. Defaults to num_columns from input_domain.

Returns:

a Measurement that computes a tuple of (mean, S, Vt)

Return type:

Measurement
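The norm parameter clamps each row to a norm bound. As a rough illustration of what such clamping means, the NumPy sketch below rescales any row whose norm exceeds the bound. It assumes the L2 norm and is not OpenDP's implementation. The helper name `clamp_row_norms` is hypothetical.

```python
import numpy as np


def clamp_row_norms(X: np.ndarray, norm: float) -> np.ndarray:
    """Scale down any row whose L2 norm exceeds `norm` (hypothetical helper)."""
    row_norms = np.linalg.norm(X, axis=1, keepdims=True)
    # Guard against division by zero; all-zero rows need no scaling.
    safe_norms = np.maximum(row_norms, np.finfo(float).tiny)
    scale = np.minimum(1.0, norm / safe_norms)
    return X * scale


X = np.array([[3.0, 4.0], [0.3, 0.4], [0.0, 0.0]])
clamped = clamp_row_norms(X, norm=1.0)
```

Rows already inside the bound pass through unchanged. Only the first row, whose norm is 5, is rescaled onto the bound.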