String

[Polars Documentation]

In the string module, OpenDP currently only supports parsing to temporal data types.

[1]:
import polars as pl
import opendp.prelude as dp
dp.enable_features("contrib")

context = dp.Context.compositor(
    # Many columns contain mixtures of strings and numbers and cannot be parsed as floats,
    # so we'll set `ignore_errors` to true to avoid conversion errors.
    data=pl.scan_csv(dp.examples.get_france_lfs_path(), ignore_errors=True),
    privacy_unit=dp.unit_of(contributions=36),
    privacy_loss=dp.loss_of(epsilon=1.0, delta=1e-7),
    split_evenly_over=2,
)

Strptime, To Date, To Datetime, To Time

Dates, datetimes, and times can be parsed from strings via .str.strptime and its variants .str.to_date, .str.to_datetime, and .str.to_time.

[2]:
query = (
    context.query()
    .with_columns(pl.col.YEAR.cast(str).str.to_date(format=r"%Y"))
    .group_by("YEAR")
    .agg(dp.len())
)
query.release().collect().sort("YEAR")
[2]:
shape: (9, 2)
YEAR        len
date        u32
2005-01-01  342193
2006-01-01  339683
2007-01-01  350429
2008-01-01  348574
2009-01-01  416966
2010-01-01  500385
2011-01-01  517166
2012-01-01  515460
2013-01-01  480615
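
The format in the query above carries only a year, yet the released YEAR column holds full dates that fall on 1 January. The sketch below illustrates the same behavior with Python's standard-library datetime.strptime rather than Polars itself. It assumes that both parsers treat %Y the same way, which matches the output shown above; parse_year is a hypothetical helper introduced only for this illustration.

```python
from datetime import date, datetime


def parse_year(text: str) -> date:
    """Parse a four-digit year; the missing month and day default to 1."""
    return datetime.strptime(text, "%Y").date()


# Year strings like those produced by casting the YEAR column to str.
parsed = [parse_year(year) for year in ["2005", "2006", "2013"]]
print([d.isoformat() for d in parsed])
```

Each parsed value is the first day of its year, which is why grouping by the parsed column still yields one group per year.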

Polars can infer the datetime format automatically from the data, but the inferred format then depends on the data. Adding or removing a single individual's records can change the inferred format or make inference fail, which makes the computation unstable. For this reason, OpenDP requires the format argument.
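
To make the instability concrete, here is a minimal sketch of a toy format-inference routine written with the Python standard library. It is not how Polars infers formats; CANDIDATE_FORMATS and infer_format are hypothetical names, and the two candidate formats are chosen only to show that one additional record can change the result.

```python
from datetime import datetime

# Hypothetical candidates for this sketch; real inference considers many more.
CANDIDATE_FORMATS = ["%d/%m/%Y", "%m/%d/%Y"]


def infer_format(values):
    """Return the first candidate format that parses every value, else None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            for value in values:
                datetime.strptime(value, fmt)
        except ValueError:
            continue  # this format fails on at least one value
        return fmt
    return None


base = ["01/02/2020", "03/04/2020"]   # ambiguous: both formats parse these
neighbor = base + ["01/13/2020"]      # one extra record rules out day-first

format_without = infer_format(base)      # "%d/%m/%Y"
format_with = infer_format(neighbor)     # "%m/%d/%Y"

# The shared record "01/02/2020" is read as 1 February or 2 January
# depending on whether the extra record is present.
shared_without = datetime.strptime(base[0], format_without).date()
shared_with = datetime.strptime(base[0], format_with).date()
```

In this sketch a single added record changes how every other record is parsed, so the effect of one individual on the output is not bounded. Fixing the format up front removes that dependence.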

OpenDP also does not allow parsing strings into nanosecond datetimes, because the underlying implementation raises errors for certain inputs. Whether such an error occurs depends on the data, so the error itself would leak information and is not private.

[3]:
query = (
    context.query()
    .with_columns(pl.col.YEAR.cast(str).str.to_datetime(format=r"%Y", time_unit="ns"))
    .group_by("YEAR")
    .agg(dp.len())
)
try:
    query.release()
    assert False, "unreachable!"
except dp.OpenDPException as err:
    assert "Nanoseconds are not currently supported" in str(err)
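
The source does not say which inputs trigger errors at nanosecond precision. One plausible cause, offered here as an assumption rather than a description of OpenDP or Polars internals, is range: a signed 64-bit count of nanoseconds since the Unix epoch covers only a few centuries. The sketch below works out that range with the standard library; fits_in_int64_ns is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# timedelta has microsecond resolution, so convert nanoseconds to microseconds.
latest = EPOCH + timedelta(microseconds=INT64_MAX // 1000)
earliest = EPOCH - timedelta(microseconds=2**63 // 1000)


def fits_in_int64_ns(moment: datetime) -> bool:
    """Check whether a timestamp is representable as int64 nanoseconds."""
    delta = moment - EPOCH
    whole_seconds = delta.days * 86_400 + delta.seconds
    nanoseconds = whole_seconds * 10**9 + delta.microseconds * 1_000
    return INT64_MIN <= nanoseconds <= INT64_MAX


in_range = fits_in_int64_ns(datetime(2013, 1, 1, tzinfo=timezone.utc))
out_of_range = fits_in_int64_ns(datetime(2300, 1, 1, tzinfo=timezone.utc))
```

Under this assumption, whether parsing succeeds would depend on whether any single record holds a date outside roughly 1677 to 2262, which is a data-dependent outcome.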

Parsed data can then be manipulated with temporal expressions.