{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# OpenDP Rust Initiation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook is an introduction to the Rust internals of the OpenDP framework.\n", "I'm assuming you have already read about the programming framework in the user guide, and you have some familiarity with the library interfaces.\n", "\n", "If you have not worked with Rust before, [a great place to get started is the Rust book](https://doc.rust-lang.org/stable/book/ch01-00-getting-started.html).\n", "This notebook will also reference sections of the Rust book that surface commonly in the OpenDP library." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# we'll use this Python snip to demonstrate concepts later...\n", "from opendp.mod import enable_features\n", "enable_features(\"contrib\")\n", "\n", "from opendp.transformations import make_cast_default\n", "default_cast_trans = make_cast_default(str, int)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Transformation Structure\n", "The following snip is the definition of a Transformation, from [rust/src/core/mod.rs](https://github.com/opendp/opendp/blob/main/rust/src/core/mod.rs).\n", "Transformations are structs ([see chapter 5](https://doc.rust-lang.org/stable/book/ch05-00-structs.html)).\n", "\n", "```rust\n", "struct Transformation {\n", " pub input_domain: DI,\n", " pub output_domain: DO,\n", " pub function: Function,\n", " pub input_metric: MI,\n", " pub output_metric: MO,\n", " pub stability_relation: StabilityRelation,\n", "}\n", "```\n", "\n", "This struct has four generics ([see chapter 10.1](https://doc.rust-lang.org/stable/book/ch10-00-generics.html)): \n", "\n", "```\n", "- DI for input domain\n", "- DO for output domain\n", "- MI for input metric\n", "- MO for output metric\n", "```\n", "\n", "A generic is a type that has not yet been explicitly determined. \n", "These generics let us build transformations out of many different types.\n", "\n", "Notice that each of these generics are marked as either `Domain` or `Metric`. \n", "These are called \"trait bounds\" ([see chapter 10.2](https://doc.rust-lang.org/stable/book/ch10-02-traits.html#trait-bound-syntax)).\n", "\n", "`Domain` and `Metric` are both traits.\n", "Trait bounds constrain the set of possible types that a generic may take on.\n", "In this case, `DI: Domain` indicates that `DI` may be any type that has the `Domain` trait implemented for it.\n", "There is a reasonably small set of types that satisfy these trait bounds.\n", "\n", "We now take a closer look at each of the struct members.\n", "```rust\n", " ...\n", " pub input_domain: DI,\n", " pub output_domain: DO,\n", " ...\n", "```\n", "The input and output domains strictly define the set of permissible input and output values.\n", "Examples of metrics are `AllDomain`, `VectorDomain`, `MapDomain` and `DataFrameDomain`. \n", "When you attempt to chain any two transformations, the output domain of the first transformation must match the input domain of the second transformation ([via the PartialEq trait](https://doc.rust-lang.org/std/cmp/trait.PartialEq.html)).\n", "The resulting chained transformation contains the input domain of the first transformation, the output domain of the second transformation, as well as the functional composition of the two functions.\n", "\n", "```rust\n", " ...\n", " pub function: Function,\n", " ...\n", "```\n", "We wrap the closure in a `Function` struct that is generic over the input domain and output domain.\n", "The definition of this struct [is in the same file](https://github.com/opendp/opendp/blob/main/rust/src/core/mod.rs).\n", "\n", "\n", "When we invoke the following transformation:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "[0, 0, 2, 456]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "default_cast_trans([\"null\", \"1.\", \"2\", \"456\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. the Python data structure is translated into a low-level C representation and then into a Rust representation\n", "2. the Rust `function` is evaluated on a Rust `Vec`\n", "3. the result is shipped back out to familiar Python data structures\n", "\n", "\n", "We also have input and output metrics.\n", "```rust\n", " ...\n", " pub input_metric: MI,\n", " pub output_metric: MO,\n", " ...\n", "```\n", "Examples of metrics are `HammingDistance`, `SymmetricDistance`, `AbsoluteDistance` and `L1Distance`. \n", "They behave in the same way that the input and output domains do when chaining.\n", "Finally, the stability map. \n", "\n", "```rust\n", " ...\n", " pub stability_map: StabilityMap,\n", " ...\n", "```\n", "It is a function that takes in an input distance, in the respective metric space, and returns the smallest acceptable output distance in terms of the output metric.\n", "The definition of this struct [is also in the same file](https://github.com/opendp/opendp/blob/main/rust/src/core/mod.rs).\n", "\n", "Invoking this function triggers a similar process as the function did:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "default_cast_trans.map(d_in=3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When any two compatible transformations are chained, the resulting transformation contains a functional composition of the relations.\n", "\n", "Ultimately, all pieces are used to construct the new transformation:\n", "\n", "| input | chaining | output |\n", "|---:|:---:|:---|\n", "| input_domain_1 | output_domain_1 == input_domain_2 | output_domain_2 |\n", "| function_1 |composed with| function_2 |\n", "| input_metric_1 | output_metric_1 == input_metric_2 | output_metric_2 |\n", "| stability_relation_1 | composed with | stability_relation_2 |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you've seen above, when we want to create a transformation, we use \"constructor\" functions. These are, by convention, prefixed with `make_`.\n", "\n", "### Example Transformation Constructor\n", "An example implementation of the casting transformation constructor is provided. \n", "I'll break it down into three parts.\n", "\n", "```rust\n", "// 1.\n", "pub fn make_cast_default()\n", " -> Fallible<\n", " Transformation<\n", " VectorDomain>, \n", " VectorDomain>, \n", " SymmetricDistance, \n", " SymmetricDistance>>\n", "\n", " // 2.\n", " where TIA: 'static + Clone + CheckNull, \n", " TOA: 'static + RoundCast + Default + CheckNull {\n", "\n", " // 3.\n", " Ok(Transformation::new(\n", " VectorDomain::new(AllDomain::new()),\n", " VectorDomain::new(AllDomain::new()),\n", " Function::new(move |arg: &Vec|\n", " arg.iter().map(|v| TOA::round_cast(v.clone()).unwrap_or_default()).collect()),\n", " SymmetricDistance::new(),\n", " SymmetricDistance::new(),\n", " StabilityRelation::new_from_constant(1)))\n", "}\n", "```\n", "\n", "The first part is the function signature:\n", "```rust\n", "pub fn make_cast_default()\n", " -> Fallible<\n", " Transformation<\n", " VectorDomain>, \n", " VectorDomain>, \n", " SymmetricDistance, \n", " SymmetricDistance>>\n", " ...\n", "```\n", "Most of the signature consists of types. \n", "Rust is strictly typed, so the code needs to be very explicit about what the type of the constructor function's inputs and outputs are. \n", "\n", "This is a generic function with two type arguments `TIA` and `TOA`, standing for \"atomic input type\" and \"atomic output type\".\n", "This function doesn't take any concrete arguments `()`.\n", "\n", "The constructor returns a fallible transformation.\n", "The last two lines specify the types of the input/output domains/metrics, that is, what `DI`, `DO`, `MI` and `MO` (from the definition of a Transformation) are.\n", "\n", "The second part is the where clause:\n", "```rust\n", " ...\n", " where TIA: 'static + Clone + CheckNull, \n", " TOA: 'static + RoundCast + Default + CheckNull {\n", " ...\n", "```\n", "A where clause is another, equivalent way of listing trait bounds on generics.\n", "You can interpret this as, \"the compiler will enforce that `TIA` must be some type that has the `Clone` and `CheckNull` traits. \n", "In other words, while I don't specify what `TIA` must be up-front, I can bound what type it may be to types that are cloneable and have some concept of null-checking.\n", "`TOA`, in particular, has a `RoundCast` trait, which can be used to cast from type `TIA` to `TOA`. \n", "For now, please feel free to ignore the `'static` trait bounds.\n", "\n", "The final part is the function body, which creates and implicitly returns a Transformation struct.\n", "```rust\n", " ...\n", " Ok(Transformation::new(\n", " VectorDomain::new(AllDomain::new()),\n", " VectorDomain::new(AllDomain::new()),\n", " Function::new(move |arg: &Vec|\n", " arg.iter().map(|v| TOA::round_cast(v.clone()).unwrap_or_default()).collect()),\n", " SymmetricDistance::new(),\n", " SymmetricDistance::new(),\n", " StabilityRelation::new_from_constant(1)))\n", "}\n", "```\n", "Each argument corresponds to a struct member.\n", "To make the `Function`, we use a useful shorthand to create an anonymous closure (a function) ([see chapter 13.1](https://doc.rust-lang.org/stable/book/ch13-01-closures.html)).\n", "For example, `|a, b| a + b`. takes two arguments, `a` and `b`. The function body is `a + b`.\n", "\n", "This closure casts the data by iterating over each record `v`, casting, and replacing nulls with the default value for the type ([see chapter 13.2](https://doc.rust-lang.org/stable/book/ch13-02-iterators.html)).\n", "\n", "We also take advantage of a convenient constructor for building `c`-stable relations.\n", "Since the cast function is row-by-row, it is 1-stable." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Measurement Structure\n", "\n", "Measurements are very similar to Transformations, with two key differences.\n", "\n", "```rust\n", "pub struct Measurement {\n", " pub input_domain: DI,\n", " pub output_domain: DO,\n", " pub function: Function,\n", " pub input_metric: MI,\n", " pub output_measure: MO,\n", " pub privacy_map: PrivacyMap,\n", "}\n", "```\n", "\n", "First, the `output_metric` is replaced with an `output_measure`, as distances in the output space are measured in terms of divergences between probability distributions.\n", "\n", "Second, the name of the map has changed from a stability map to a privacy map. \n", "This is because the relation between distances now carries meaning with respect to privacy." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Developer Loop\n", "\n", "When writing code:\n", "\n", "1. Make a change to the Rust source.\n", "1. Use `cargo check --features untrusted` to do a quick check for compiler errors. A properly configured development environment will automatically run this command for you and highlight your code.\n", "1. Read the compiler errors and iterate. Rust errors usually provide helpful explanations.\n", "\n", "When testing code in Rust, a properly configured development environment will mark up `#[test]` annotations with a button to execute the test. \n", "\n", "When testing code in Python, run `cargo build --features untrusted,bindings-python` to update the binary.\n", "You'll need to restart the Python interpreter or kernel for changes to appear.\n", "All folders named ``out`` are .gitignored, so they're a great place to throw scratch work that you don't want to commit.\n", "\n", "If you are writing a new function, you'll need to write FFI bindings (`./ffi.rs`) and decorate the function with the ``bootstrap`` macro before you can access the function from Python.\n", "Please don't hesitate to ask for help!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Next Steps\n", "\n", "1. If you are adding a new file, please place your code inside a ``mod.rs`` file in a new folder.\n", " This is to give room to place the proof file adjacent to the implementation.\n", "1. Please accompany your sources with a testing module at the end of the file.\n", " Test modules are also a great way to play with your constructor before the FFI bindings are available.\n", "1. Please format your code nicely (rustfmt), add documentation, and comment meaningfully!\n", "\n", "The other constructor functions in the library are great to use as a reference.\n", "It's likely you have more questions — this short guide could never possibly be complete.\n", "If you'd like to get more involved in OpenDP development, don't hesitate to send a message and we'll help get you bootstrapped!" ] } ], "metadata": { "interpreter": { "hash": "3220da548452ac41acb293d0d6efded0f046fab635503eb911c05f743e930f34" }, "kernelspec": { "display_name": "Python 3.8.13 ('psi')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" } }, "nbformat": 4, "nbformat_minor": 2 }