opendp.extras.sklearn.decomposition package
Module contents
This module requires extra installs: pip install 'opendp[scikit-learn]'
For convenience, all the functions of this module are also available from opendp.prelude.
We suggest importing under the conventional name dp:
>>> import opendp.prelude as dp
The methods of this module will then be accessible at dp.sklearn.decomposition.
See also our tutorial on differentially private PCA.
- class opendp.extras.sklearn.decomposition.PCA(*, epsilon, row_norm, n_samples, n_features, n_components=None, n_changes=1, whiten=False)
DP wrapper for sklearn’s PCA.
Trying to create an instance without sklearn installed will raise an ImportError.
See the tutorial on differentially private PCA for details.
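Since constructing the wrapper without sklearn raises an ImportError, a caller may prefer to check for the optional dependency up front. The following is a minimal sketch using only the standard library; `module_available` is a hypothetical helper and is not part of OpenDP.

```python
from importlib.util import find_spec


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return find_spec(name) is not None


# Hypothetical guard: only build the DP PCA wrapper when sklearn is present.
if module_available("sklearn"):
    message = "sklearn found: the PCA wrapper can be constructed"
else:
    message = "sklearn missing: run pip install 'opendp[scikit-learn]'"
print(message)
```

Checking with `find_spec` avoids importing the package just to learn whether it exists; wrapping the constructor call in `try`/`except ImportError` is an equally reasonable choice.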
- Parameters:
whiten (bool) – Mirrors the corresponding sklearn parameter: when True (False by default), the components_ vectors are multiplied by the square root of n_samples and then divided by the singular values to ensure uncorrelated outputs with unit component-wise variances.
epsilon (float) –
row_norm (float) –
n_samples (int) –
n_features (int) –
n_components (int | float | str | None) –
n_changes (int) –
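The scaling described for whiten can be illustrated with a small pure-Python sketch. It only mirrors the wording above (multiply by the square root of n_samples, then divide by the singular values); `whiten_components` is a hypothetical helper, not the wrapper's or sklearn's actual implementation.

```python
import math


def whiten_components(components, singular_values, n_samples):
    """Scale each component row by sqrt(n_samples) / its singular value."""
    scale = math.sqrt(n_samples)
    return [
        [scale * value / s for value in row]
        for row, s in zip(components, singular_values)
    ]


# Two orthonormal components with singular values 4 and 2, from 16 samples.
components = [[1.0, 0.0], [0.0, 1.0]]
singular_values = [4.0, 2.0]
whitened = whiten_components(components, singular_values, n_samples=16)
print(whitened)  # [[1.0, 0.0], [0.0, 2.0]]
```

Components with smaller singular values are scaled up more, which is what equalizes the variance of the projected outputs.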
- fit(X, y=None)
Fit the model with X.
- Parameters:
X – Training data, where n_samples is the number of samples and n_features is the number of features.
y – Ignored
- n_features
Number of features
- class opendp.extras.sklearn.decomposition.PCAEpsilons(eigvals, eigvecs, mean)
Bases: NamedTuple
Tuple used to describe the ε-expenditure per changed record in the input data
- Parameters:
eigvals (float) –
eigvecs (Sequence[float]) –
mean (float | None) –
- eigvals: float
Alias for field number 0
- eigvecs: Sequence[float]
Alias for field number 1
- mean: float | None
Alias for field number 2
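To make the field layout concrete, here is a sketch of a stand-in tuple with the same field names, order, and types as listed above. `PCAEpsilonsSketch` and `total_epsilon` are hypothetical names. The sketch assumes that the parts add up under basic sequential composition and that a mean of None spends nothing; neither assumption is stated in this reference.

```python
from typing import NamedTuple, Optional, Sequence


class PCAEpsilonsSketch(NamedTuple):
    """Hypothetical stand-in with the same field order as PCAEpsilons."""
    eigvals: float
    eigvecs: Sequence[float]
    mean: Optional[float]


def total_epsilon(eps: PCAEpsilonsSketch) -> float:
    # Assumption: basic sequential composition, so the parts simply add up.
    mean = eps.mean if eps.mean is not None else 0.0
    return eps.eigvals + sum(eps.eigvecs) + mean


eps = PCAEpsilonsSketch(eigvals=0.25, eigvecs=[0.25, 0.25], mean=0.25)
# Fields are reachable by name or by position (0, 1, 2).
print(eps[0], eps[1], eps[2])
print(total_epsilon(eps))  # 1.0
```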
- opendp.extras.sklearn.decomposition.make_private_pca(input_domain, input_metric, unit_epsilon, norm=None, num_components=None)
Construct a Measurement that returns the data mean, singular values and right singular vectors.
- Parameters:
input_domain (Domain) – instance of array2_domain(size=_, num_columns=_)
input_metric (Metric) – instance of symmetric_distance()
unit_epsilon (float | PCAEpsilons) – ε-expenditure per changed record in the input data
norm (float | None) – clamp each row to this norm bound
num_components – optional; number of eigenvectors to release. Defaults to num_columns from input_domain
- Returns:
a Measurement that computes a tuple of (mean, S, Vt)
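The norm parameter is described as clamping each row to a norm bound. A minimal pure-Python sketch of that idea follows, assuming an L2 norm and rescaling of rows that exceed the bound; `clamp_row_norm` is a hypothetical helper and may differ from how the library performs the clamp.

```python
import math


def clamp_row_norm(row, norm):
    """Rescale a row so that its L2 norm is at most `norm` (assumed L2)."""
    length = math.sqrt(sum(value * value for value in row))
    if length <= norm:
        return list(row)  # already within the bound: left unchanged
    factor = norm / length
    return [value * factor for value in row]


inside = clamp_row_norm([0.3, 0.4], norm=1.0)   # norm 0.5, unchanged
outside = clamp_row_norm([3.0, 4.0], norm=1.0)  # norm 5.0, scaled down
print(inside, outside)
```

Bounding each row's norm limits how much any single record can influence the released statistics, which is what lets the noise be calibrated.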
- Return type: