Writing User-Defined Functions in Rust
One of the most straightforward ways to add functionality to MariaDB or MySQL server instances is by creating a user-defined function (UDF). These are compiled functions loaded from dynamic binaries that can be much more performant and flexible than what can be written in SQL, providing the same capabilities as builtin functions.
These UDFs are typically written in C or C++, but a library is now available that makes it easy to write them in Rust. This blog discusses some of the reasoning for writing this library, followed by a very basic usage example that doesn’t require any experience with the language.
Why Rust?
Extensions to MariaDB can be written in anything that can produce a compiled dynamic library, which is typically C or C++ (the languages of the server’s source itself). There is nothing wrong with this current approach, but being able to write them in Rust has some advantages:
- The compiler always guarantees protection from the most common relevant CWEs (specifically overread/overwrite, use after free, null dereference, and race conditions)
- Good use of type safety can enforce code correctness
- Smart pointers prevent memory leaks (similar to C++, but Rust’s implementation is somewhat more straightforward)
- Incredible toolchain;
cargo
is Rust’s default build system. Out of the box, you get:- Compiling (
cargo check
/cargo build
) - Linting (
cargo clippy
) - Testing (
cargo test
, aliascargo t
) - Documentation (
cargo doc
, usuallycargo doc --document-private-items --open
for libraries) - Dependency management (configured in
Cargo.toml
)
- Compiling (
Databases are applications where performance bottlenecks can easily bog down the web services they drive. They must also take extreme care to avoid security issues, since easy to miss things like buffer overreads can mean compromising sensitive data. Rust is particularly well suited to this niche: it provides performance similar to or better than C and C++, but guarantees against those languages’ most common security pitfalls.
If you aren’t familiar with the language, you might question how C interfaces and lower-level programming are possible with the guarantees. The answer is fairly straightforward; anything that might introduce unsoundness (pointer operations, potentially thread-unsafe things, inline assembly, C FFI) are possible, but only within an unsafe {...}
block. This means that these isolated few lines of code can be easily identified and thoroughly verified; any safe API that builds upon these blocks is then guaranteed to be sound as long as the unsafe
sections are. Since the udf
library handles all these unsafe
operations for you, it is possible to write almost any UDF entirely in safe code!
Example UDF walkthrough
In this section, we will implement an extremely simple user-defined function and cover its writing, building, and usage aspects. To follow along, you will need a Rust compiler with version ≥1.65 (because of dependence on GATs); run rustup update
if needed to get to the latest version. If you don’t yet have Rust, get it from https://rustup.rs/. Get the rust-analyzer language server if you are using an IDE (not required, but nice to have).
Workspace setup
The first step is to create a new project; Cargo makes this pretty easy;
# Create a new Rust project called test-udf
$ cargo new --lib test-udf
$ cd test-udf
# Simply validate that the code compiles, same as `cargo c`
$ cargo check
# Run the unit test, same as `cargo t`
$ cargo test
The above creates a directory called test-udf
with Cargo.toml
and src/lib.rs
, then verifies everything is working correctly (the cargo new
command makes a super simple example function and test). We need to updateCargo.toml
to tell Cargo to produce the correct kind of output (a dynamic library) and to use udf
as a dependency:
[package]
name = "test-udf"
version = "0.1.0"
edition = "2021"
publish = false # prevent accidentally publishing to crates.io
# Add this section to specify we want to create a C dynamic library
[lib]
crate-type = ["cdylib"]
[dependencies]
udf = "0.5" # our dependency on the `udf` crate
You can delete everything in lib.rs
and our setup is complete.
UDF architecture
Let’s write a very simple UDF that performs a running total of integers.
A UDF typically needs to make three symbols available to the server:
- An
init
call that validates argument type and performs memory allocation - A
process
call run once per row that produces a result - A
deinit
call that frees any memory from the setup
Common interfaces like this in Rust are grouped into trait
s. The BasicUdf
trait is of interest here and provides interfaces for init
and process
(deinit
is handled automatically). This trait should be implemented on a structure representing data to be shared among calls to process
, which is called once per row. In this case, the data is just our current total.
struct RunningTotal(i64);
I am using an “tuple struct” syntax here which means you can access fields with numbers (some_struct.0
, some_struct.1
) rather than by names (some_struct.field
). This is just a convenience as we only have one field, but you are more than welcome to use a standard struct (they’re identical behind the scenes):
struct RunningTotal {
total: i64
}
In either case, our struct name will get converted to snake case to give the SQL function name (so Runningtotal
becomes running_total(...)
). This will be the basis of our implementation.
Basic structure
We now need to do three things:
- Import needed types and functions.
udf
has aprelude
module with the most commonly needed imports, so we can just import everything there; - Implement the
BasicUdf
trait for our struct; and - Add the
#[register]
macro to create the correct symbols.
The minimum compiling code looks like this
use udf::prelude::*;
struct RunningTotal(i64);
#[register]
impl BasicUdf for RunningTotal {
type Returns<'a> = i64;
fn init(cfg: &UdfCfg<Init>, args: &ArgList<Init>) -> Result<Self, String> {
todo!()
}
fn process<'a>(
&'a mut self,
cfg: &UdfCfg<Process>,
args: &ArgList<Process>,
error: Option<NonZeroU8>,
) -> Result<Self::Returns<'a>, ProcessError> {
todo!()
}
}
(hint: if you just type impl BasicUdf for RunningTotal {}
then open the quick fix inside the brackets (ctrl
+.
on VSCode); it will offer to autofill the function signatures for you.)
Woah function signatures! The docs on BasicUdf
go into detail about what everything here does, but let’s break it down simply:
type Returns<'a> = i64;
This is just where we specify the return type of our UDF. See the docs for more information about possible return types, but we just use i64
to represent a non-null integer. (Ignore the <'a>
– that is relevant only when returning references)
fn init(cfg: &UdfCfg<Init>, args: &ArgList<Init>) -> Result<Self, String> {
todo!()
}
This is our initialization function, which takes a configuration object cfg
and a list of arguments args
. Init
on these types is just a marker to indicate where they’re being used (it’s tied to what methods are available).
This returns a Result
, a builtin enum
to represent a value that has different types for Ok
and Success
(Rust enums are “tagged unions” or “sum types”). So, from our function signature, we can tell that the type of a successful function call will be Self
(i.e. RunningTotal
, which gets saved for later use) and an error will be a String
(which gets displayed to the user). Makes sense, right?
todo!()
is a builtin macro that just matches whatever type signature is needed to compile. If we actually tried to run it would fail, but we can at least verify that there are no errors in our code structure:
$ cargo c
warning: `test-udf` (lib) generated 5 warnings (run `cargo fix --lib -p test-udf` to apply 5 suggestions)
Finished dev [unoptimized + debuginfo] target(s) in 0.07s
There are warnings about unused arguments, but the basic structure is all set!
UDF implementation: init
Now that we have our basic structure, let’s take a look at how to get some results.
The main goal of our init
function is to validate arguments. Let’s look at the implementation then break it down:
fn init(_cfg: &UdfCfg<Init>, args: &ArgList<Init>) -> Result<Self, String> {
if args.len() != 1 {
return Err(format!("expected 1 argument; got {}", args.len()));
}
// Coerce everything to an integer
args.get(0).unwrap().set_type_coercion(SqlType::Int);
Ok(Self(0))
}
The first part here checks out argument count:
if args.len() != 1 {
return Err(format!("expected 1 argument; got {}", args.len()));
}
The number of arguments should be one. If not, it creates a formatted error message string and returns it (Err(something)
is how to construct a Result
enum error variant).
The second logical block:
args.get(0).unwrap().set_type_coercion(SqlType::Int);
Uses .get(0)
to attempt to get the first argument. This returns an Option<SqlArg>
which is another builtin enum type like Result
.
Option<T>
has two possible variants: Some(T)
to represent an existing value of type T, and None
to represent nothing. unwrap()
is used to get the inner value out of a Some()
value, or panic if there is None
, so we use it to get the argument at index 0 (first argument).
Note that panicking is a very bad idea in UDFs since it can cause the server to crash. We have already verified that there is a single argument here though, so unwrapping will not fail and is thus okay to use here.
Once we have our first argument representation, we are able to call set_type_coercion
. This instructs the server to either coerce the first argument to an integer or cancel the function call with an error.
Ok(Self(0))
The last line simply creates a Self
instance with 0 as its inner value, and wraps it in Ok
to indicate success. In Rust, the last line of any block is that block’s value if it doesn’t end in a semicolon, so this statement is our return value. That’s all that is needed in the init
phase.
UDF implementation: process
The process
function also has a fairly simple body:
fn process<'a>(
&'a mut self,
_cfg: &UdfCfg<Process>,
args: &ArgList<Process>,
_error: Option<NonZeroU8>,
) -> Result<i64, ProcessError> {
// Get the value as an integer and add it to our total
self.0 += args.get(0).unwrap().value().as_int().unwrap_or(0);
// The result is just our running total
Ok(self.0)
}
The first line contains most of the logic, and uses combinators to keep things
terse. It does the following:
args.get(0).unwrap()
: this gets the first argument, as discussed above. We are again okay to unwrap here because we validated our arguments (init
andprocess
get the same number of arguments when they are called, and we performed validation in init).value()
gets the argument’s value, which is aSqlResult
. This is an enum with variants forString
,Real
,Int
, andDecimal
..as_int()
is a convenience function that returns anOption
. If the value is a non-null integer, it will returnSome(i64)
; any other possibilities will returnNone
. Because we set type coercion ininit
, we can reasonably expect that all values will be a potentially null integer.unwrap_or(0)
acts the same asunwrap()
when the value isSome
, but substitutes the specified value of 0 when the value isNone
. This means that any null values will use 0 so they don’t affect our running sum.
We add this to our struct’s inner value, self.0
, which represents our current running total, and return it with Ok(self.0)
. With that, our process function is complete!
Unit testing
The udf
crate provides functionality to thoroughly test UDF implementations without even loading them into SQL. If you aren’t familiar, it is worth looking at the basics of Rust’s unit testing for a brief outline, but basically any function marked #[test]
will get run as one.
We need to update our udf
dependency to use the mock
feature so we can access that feature-gated module. Change the dependency line to look like the following:
udf = { version = "0.5", features = ["mock"] }
And add the following below our test UDF:
#[cfg(test)]
mod tests {
// Use our parent module
use super::*;
use udf::mock::*;
#[test]
fn test_basic() {
// Create a mock `UdfCfg` and mock `UdfArgs`
let mut cfg = MockUdfCfg::new();
// Create an array of mock row arguments. We will "run" our function with each
// one and verify the result
let mut row_args = [
// Each entry here acts as a row. Format for the macro is
// `([type] value, "attribute name", nullable)`
mock_args![(Int None, "", false)],
mock_args![(10, "", false)],
mock_args![(Int None, "", false)],
mock_args![(-20, "", false)],
];
// This is the expected output after each of the above calls
let outputs = [0i64, 10, 10, -10];
// Run the `init` function on our mock data
let mut rt = RunningTotal::init(cfg.as_init(), row_args[0].as_init()).unwrap();
// Iterate through our list of arguments
for (arglist, outval) in row_args.iter_mut().zip(outputs.iter()) {
// Run the process function and verify the result
let res = rt.process(cfg.as_process(), arglist.as_process(), None);
assert_eq!(res, Ok(*outval));
}
}
}
Let’s check the result:
running 1 test
test tests::test_basic ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Excellent!
See this blog post’s repository for some more example unit tests. With good unit testing, it is possible to be highly confident that a UDF performs as expected without even loading it into our server.
Loading the function
The final test for our UDF is to actually run it in a server. Building a .so
or .dll
file that can be loaded into MariaDB is easy – cargo build --release
produces the output, which will be located in target/release
. If you have Docker however, testing in an isolated environment is even easier:
FROM rust:1.66 AS build
WORKDIR /build
COPY . .
# Use Docker buildkit to cache our build directory for quicker recompilations
RUN --mount=type=cache,target=/usr/local/cargo/registry \
--mount=type=cache,target=/build/target \
cargo build --release \
&& mkdir /output \
&& cp target/release/libMY_CRATE_NAME.so /output
FROM mariadb:10.10
COPY --from=build /output/* /usr/lib/mysql/plugin/
This docker image uses cache, which is a feature of Docker buildkit. Make sure you are using a newer version of Docker, or have the environment variable set correctly to enable it. Or, remove the cache indicators.
Be sure to update file name at MY_CRATE_NAME
(libtest_udf.so
if you followed the earlier suggestion). The following runs and builds our image:
# Build the image
$ docker build . --tag mdb-blog-udf
# Run the image and name it mariadb_blog_udf for convenience
$ docker run --rm -d -e MARIADB_ROOT_PASSWORD=example --name mariadb_blog_udf mdb-blog-udf
# Enter the SQL console
$ docker exec -it mariadb_blog_udf mysql -pexample
Let’s load our function and test it, first with too many arguments and then with the correct argument count:
MariaDB [(none)]> CREATE FUNCTION running_total RETURNS integer SONAME 'libtest_udf.so';
Query OK, 0 rows affected (0.003 sec)
MariaDB [(none)]> select running_total(1, 2, 3);
ERROR 1123 (HY000): Can't initialize function 'running_total'; xpected 1 argument; got 3
MariaDB [(none)]> select running_total(10);
+-------------------+
| running_total(10) |
+-------------------+
| 10 |
+-------------------+
1 row in set (0.000 sec)
So far so good! Now a slightly harder test
MariaDB [(none)]> create database db; use db; create table t1 (val int);
Query OK, 1 row affected (0.000 sec)
Database changed
Query OK, 0 rows affected (0.023 sec)
MariaDB [db]> insert into t1(val) values (1),(2),(3),(NULL),(-100),(50),(123456789);
Query OK, 7 rows affected (0.002 sec)
Records: 7 Duplicates: 0 Warnings: 0
MariaDB [db]> select val, running_total(val) from t1;
+-----------+--------------------+
| val | running_total(val) |
+-----------+--------------------+
| 1 | 1 |
| 2 | 3 |
| 3 | 6 |
| NULL | 6 |
| -100 | -94 |
| 50 | -44 |
| 123456789 | 123456745 |
+-----------+--------------------+
7 rows in set (0.000 sec)
Perfect!
Wrapup
We have successfully written a simple UDF that validates its arguments and stores some data among calls, all with just a few lines of code. We also performed both unit and (non-automated) integration testing, which are easy steps to make sure our program works as expected.
As mentioned, the code for this example is in this blog post’s repository. For anyone looking to explore a little further, the code could easily be turned into an aggregate UDF by implementing the AggregateUdf
trait.
Helpful links:
udf
library repository page and documentation- MariaDB UDF usage documentation
- Simple suite of UDFs written in Rust
Try the example or try your own function; you are welcome to join us on Zulip to discuss the results. Happy programming!