opendp.extras.sklearn.decomposition package#
Module contents#
This module requires extra installs: pip install 'opendp[scikit-learn]'
For convenience, all the members of this module are also available from opendp.prelude. We suggest importing under the conventional name dp:
>>> import opendp.prelude as dp
The members of this module will then be accessible at dp.sklearn.decomposition.
See also our tutorial on differentially private PCA.
- class opendp.extras.sklearn.decomposition.PCA(*, epsilon, row_norm, n_samples, n_features, n_components=None, n_changes=1, whiten=False)[source]#
DP wrapper for sklearn’s PCA. This implementation is based on Differentially Private Covariance Estimation by Kareem Amin, et al.
Trying to create an instance without sklearn installed will raise an ImportError. See the tutorial on differentially private PCA for details.
- Parameters:
epsilon (float) –
row_norm (float) –
n_samples (int) –
n_features (int) –
n_components (int | float | str | None) –
n_changes (int) –
whiten (bool) – Mirrors the corresponding sklearn parameter: when True (False by default), the components_ vectors are multiplied by the square root of n_samples and then divided by the singular values to ensure uncorrelated outputs with unit component-wise variances.
- fit(X, y=None)[source]#
Fit the model with X.
- Parameters:
X – Training data, where n_samples is the number of samples and n_features is the number of features.
y – Ignored.
- n_features#
Number of features
- class opendp.extras.sklearn.decomposition.PCAEpsilons(eigvals, eigvecs, mean)[source]#
Bases: NamedTuple
Tuple used to describe the ε-expenditure per changed record in the input data.
- Parameters:
eigvals (float) –
eigvecs (Sequence[float]) –
mean (float | None) –
- eigvals: float#
ε-expenditure to estimate the eigenvalues
- eigvecs: Sequence[float]#
ε-expenditure to estimate the eigenvectors
- mean: float | None#
ε-expenditure to estimate the mean.
A portion of the budget is used to estimate the mean because the OpenDP PCA algorithm releases an eigendecomposition of the sum of squares and cross-products (SSCP) matrix, not of the covariance matrix. If the data is centered beforehand (either by the user or by privately estimating the mean and then centering), then the PCA corresponds to the covariance matrix as expected, because the SSCP matrix of centered data is a scaled covariance matrix.
If the data is not centered (or the mean is poorly estimated), then the first eigenvector will be dominated by the true mean.
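The SSCP/covariance relationship above can be checked numerically: for mean-centered data, the SSCP matrix equals the sample covariance matrix scaled by n − 1. This is a quick NumPy check of that identity, not part of the OpenDP API:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, scale=2.0, size=(500, 3))

# Center the data, then form the sum of squares and cross-products matrix.
Xc = X - X.mean(axis=0)
sscp = Xc.T @ Xc

# The SSCP of centered data is (n - 1) times the sample covariance
# (np.cov defaults to ddof=1, i.e. the n - 1 normalization).
cov_scaled = (X.shape[0] - 1) * np.cov(X, rowvar=False)
matches = np.allclose(sscp, cov_scaled)
```

With uncentered data the identity fails, which is why the first eigenvector of the raw SSCP matrix is dominated by the mean.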
- opendp.extras.sklearn.decomposition.make_private_pca(input_domain, input_metric, unit_epsilon, norm=None, num_components=None)[source]#
Construct a Measurement that returns the data mean, singular values, and right singular vectors.
- Parameters:
input_domain (Domain) – instance of array2_domain(size=_, num_columns=_)
input_metric (Metric) – instance of symmetric_distance()
unit_epsilon (float | PCAEpsilons) – ε-expenditure per changed record in the input data
norm (float | None) – clamp each row to this norm bound
num_components – optional; the number of eigenvectors to release. Defaults to num_columns from input_domain
- Returns:
a Measurement that computes a tuple of (mean, S, Vt)
- Return type:
Measurement