{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# String\n", "\n", "[[Polars Documentation](https://docs.pola.rs/api/python/stable/reference/expressions/string.html)]\n", "\n", "In the string namespace (`.str`), OpenDP currently supports only parsing strings into temporal data types." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "import polars as pl\n", "import opendp.prelude as dp\n", "dp.enable_features(\"contrib\")\n", "# Fetch and unpack the data if it is not already present.\n", "![ -e ../sample_FR_LFS.csv ] || ( curl 'https://github.com/opendp/dp-test-datasets/blob/main/data/sample_FR_LFS.csv.zip?raw=true' --location --output sample_FR_LFS.csv.zip; unzip sample_FR_LFS.csv.zip -d ../ )\n", "\n", "context = dp.Context.compositor(\n", " # Many columns contain mixtures of strings and numbers and cannot be parsed as floats,\n", " # so we set `ignore_errors=True` to avoid conversion errors.\n", " data=pl.scan_csv(\"../sample_FR_LFS.csv\", ignore_errors=True),\n", " privacy_unit=dp.unit_of(contributions=36),\n", " privacy_loss=dp.loss_of(epsilon=1.0, delta=1e-7),\n", " split_evenly_over=2,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Strptime, To Date, To Datetime, To Time\n", "\n", "Dates, datetimes, and times can be parsed from strings via `.str.strptime` and its variants `.str.to_date`, `.str.to_datetime`, and `.str.to_time`." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
YEAR | len |
---|---|
date | u32 |
2004-01-01 | 16510 |
2005-01-01 | 16448 |
2006-01-01 | 16108 |
2007-01-01 | 16802 |
2008-01-01 | 16757 |
2009-01-01 | 19846 |
2010-01-01 | 24061 |
2011-01-01 | 24842 |
2012-01-01 | 24834 |
2013-01-01 | 23316 |