{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Temporal\n", "\n", "[[Polars Documentation](https://docs.pola.rs/api/python/stable/reference/expressions/temporal.html)]\n", "\n", "OpenDP supports some manipulation of dates and times, \n", "which can be useful in predicates and grouping functions." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "import polars as pl\n", "\n", "import opendp.prelude as dp\n", "dp.enable_features(\"contrib\")\n", "# Fetch and unpack the data. \n", "![ -e ../sample_FR_LFS.csv ] || ( curl 'https://github.com/opendp/dp-test-datasets/blob/main/data/sample_FR_LFS.csv.zip?raw=true' --location --output sample_FR_LFS.csv.zip; unzip sample_FR_LFS.csv.zip -d ../ )\n", "\n", "lf_dates = (\n", " pl.scan_csv(\"../sample_FR_LFS.csv\", ignore_errors=True)\n", " # prepare the data with some expressions that are not yet supported in OpenDP\n", " .select(DATE=pl.concat_str(\"REFYEAR\", pl.col.QUARTER * 3, pl.lit(\"01\"), separator=\"-\"))\n", ")\n", "\n", "context = dp.Context.compositor(\n", " data=lf_dates,\n", " privacy_unit=dp.unit_of(contributions=36),\n", " privacy_loss=dp.loss_of(epsilon=1.0, delta=1e-7),\n", " split_evenly_over=1,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Date/Time Components\n", "\n", "- Date expressions (can be applied to `pl.Date` and `pl.Datetime` dtypes)\n", " - `.dt.year`\n", " - `.dt.iso_year`\n", " - `.dt.quarter`\n", " - `.dt.month`\n", " - `.dt.week`\n", " - `.dt.weekday`\n", " - `.dt.day`\n", " - `.dt.ordinal_day`\n", "- Time expressions (can be applied to `pl.Time` and `pl.Datetime` dtypes)\n", " - `.dt.hour`\n", " - `.dt.minute`\n", " - `.dt.second`\n", " - `.dt.millisecond`\n", " - `.dt.microsecond`\n", " - `.dt.nanosecond`\n", "\n", "An example of their use can be seen below, where a string column is parsed into dates,\n", "and then year and month components are retrieved from the dates." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "shape: (40, 3)
YEARMONTHlen
i32i8u32
200434209
200464179
200494123
2004124061
200534136
2012126424
201336300
201365905
201395618
2013125702
" ], "text/plain": [ "shape: (40, 3)\n", "┌──────┬───────┬──────┐\n", "│ YEAR ┆ MONTH ┆ len │\n", "│ --- ┆ --- ┆ --- │\n", "│ i32 ┆ i8 ┆ u32 │\n", "╞══════╪═══════╪══════╡\n", "│ 2004 ┆ 3 ┆ 4209 │\n", "│ 2004 ┆ 6 ┆ 4179 │\n", "│ 2004 ┆ 9 ┆ 4123 │\n", "│ 2004 ┆ 12 ┆ 4061 │\n", "│ 2005 ┆ 3 ┆ 4136 │\n", "│ … ┆ … ┆ … │\n", "│ 2012 ┆ 12 ┆ 6424 │\n", "│ 2013 ┆ 3 ┆ 6300 │\n", "│ 2013 ┆ 6 ┆ 5905 │\n", "│ 2013 ┆ 9 ┆ 5618 │\n", "│ 2013 ┆ 12 ┆ 5702 │\n", "└──────┴───────┴──────┘" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "query = (\n", " context.query()\n", " .with_columns(pl.col.DATE.str.to_date(format=r\"%Y-%m-%d\"))\n", " .with_columns(YEAR=pl.col.DATE.dt.year(), MONTH=pl.col.DATE.dt.month())\n", " .group_by(\"YEAR\", \"MONTH\")\n", " .agg(dp.len())\n", ")\n", "query.release().collect().sort(\"YEAR\", \"MONTH\")" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.0" } }, "nbformat": 4, "nbformat_minor": 2 }