MariaDB is part of Google Summer of Code 2023

We are excited to announce that this year MariaDB has once again been accepted as a Google Summer of Code organization. With this blog post I want to showcase the projects we’re taking on and wish good luck to our mentees for the summer!

At MariaDB we strongly believe in growing Open Source and we encourage new developers to contribute. Google Summer of Code allows us to have dedicated contributors focus on a project for a few months, knowing the costs are covered. We at MariaDB can then just focus on the core aspects – writing code and growing our community. This is the main reason we have proudly been mentoring for more than 10 years.

Some of the contributors we mentor end up sticking around the ecosystem, and some even end up working for us full time. (cough, the author became part of the Foundation that way 😉 )

We received many good proposals this year, and Google granted us 6 slots to allocate mentors to. Although the selection process was difficult, here are the projects that MariaDB will be mentoring this year:

ColumnStore is the star

With 5 projects accepted for ColumnStore, it’s clear that this will be the highlight this year.

Parquet support in cpimport

Bin Ruan will be improving data import methods into ColumnStore. Inserting data into columnar storage engines is tricky, particularly if one wants to do many “row” inserts. That’s why ColumnStore has a dedicated tool (cpimport) for importing bulk data. Extending it to handle more data formats, starting with Parquet, will allow ColumnStore to be used with more applications and data sources.

Fuzzing pipeline for ColumnStore

Testing lies at the core of ensuring quality software, and one can never have enough of it. That’s why, this year, Lajat Manekar will be implementing a fuzzing pipeline for ColumnStore. We already have comprehensive testing in place for MariaDB Server and its more traditional storage engines. ColumnStore, however, is an entirely different problem, given its distributed architecture. That is why we are looking forward to getting proper advanced testing into ColumnStore. We are also hoping the lessons learned here can be carried over to the CI running on buildbot.mariadb.org.

SIMD for SQL expressions and functions in ColumnStore

To achieve top performance, software needs to take advantage of hardware features. Generic code can only push performance so far. This is where Single Instruction, Multiple Data (SIMD) instructions, dedicated parallel instructions within the CPU, can bring huge benefits. The data in ColumnStore is already stored in a “vector” format, so it seems only natural to apply SIMD optimizations where possible, which is what Mu He will be working on this year. We are looking forward to seeing the performance gains we can achieve once SIMD instructions are put to use.

Optimize GROUP BY in MariaDB ColumnStore

And on the topic of performance: GROUP BY is a complex operation that has multiple possible execution plans. It’s usually a tradeoff between space and time. One plan might use more RAM (or disk storage) while another might just scan more rows, taking longer to execute but being cheap on memory. One can sort keys for faster lookup, or use hash tables. Aggregation methods often end up having to spill intermediate results to disk. Any one of these execution steps can have an impact on the final query performance.
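To make the two strategies concrete, here is a minimal Python sketch of hash-based and sort-based aggregation for a SUM ... GROUP BY query. It is only an illustration of the general idea, not ColumnStore's implementation; the function names and the sample rows are made up for this example.

```python
from itertools import groupby
from operator import itemgetter


def hash_group_sum(rows):
    """Hash aggregation: a single pass over the input.

    Memory use grows with the number of distinct keys, because every
    group keeps a running total in the hash table.
    """
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return totals


def sort_group_sum(rows):
    """Sort aggregation: order the rows by key, then sum each run of equal keys.

    Once the input is sorted, only one group needs to be held at a time.
    This toy version sorts in memory; a real engine could use an external
    sort and keep only the current group resident.
    """
    ordered = sorted(rows, key=itemgetter(0))
    return {
        key: sum(value for _, value in run)
        for key, run in groupby(ordered, key=itemgetter(0))
    }


# Hypothetical (key, value) rows standing in for a table scan.
rows = [("b", 2), ("a", 1), ("b", 5), ("c", 3), ("a", 4)]
```

Both functions return the same totals for the sample rows. What differs is where the cost lands: the hash variant pays in memory per distinct key, the sort variant pays in the time spent ordering the input.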

Theresa Hradilak will be working on improving GROUP BY performance in ColumnStore. The focus areas will be experimenting with different hash tables as well as introducing async I/O via liburing.

Experiment with JIT optimizations for SQL expressions in ColumnStore

The final project coming up for ColumnStore revolves around experimenting with modern JIT-capable compiler frameworks such as LLVM or MIR. The goal of this project, worked on by Xie Qijun, is to identify whether there are any performance benefits to introducing JIT-compiled byte-code for SQL expressions. Expressions such as t1.a + FLOOR(t1.b) incur a penalty due to all the type checking done during evaluation. We believe that eliminating that penalty will have a positive effect on performance.
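The idea can be sketched in a few lines of Python. The first function interprets an expression tree for every row, re-checking node kinds and value types each time. The second translates the tree into specialized code once and reuses it. This is a toy analogy that uses Python's built-in compile() rather than LLVM or MIR, and the tree layout and all names are assumptions made for illustration.

```python
import math

# Hypothetical expression tree for: t1.a + FLOOR(t1.b)
EXPR = ("add", ("col", "a"), ("floor", ("col", "b")))


def interpret(node, row):
    """Walk the tree for every row, dispatching on node kind and checking types."""
    kind = node[0]
    if kind == "col":
        value = row[node[1]]
        if not isinstance(value, (int, float)):
            raise TypeError(f"column {node[1]} is not numeric")
        return value
    if kind == "floor":
        return math.floor(interpret(node[1], row))
    if kind == "add":
        return interpret(node[1], row) + interpret(node[2], row)
    raise ValueError(f"unknown node kind: {kind}")


def to_source(node):
    """Translate the tree into a Python expression string, once per query."""
    kind = node[0]
    if kind == "col":
        return f"row[{node[1]!r}]"
    if kind == "floor":
        return f"math.floor({to_source(node[1])})"
    if kind == "add":
        return f"({to_source(node[1])} + {to_source(node[2])})"
    raise ValueError(f"unknown node kind: {kind}")


def compile_expr(node):
    """Build a specialized function for the expression.

    Assumes column types were validated up front, so the generated code
    carries no per-row dispatch or type checks.
    """
    code = compile(f"lambda row: {to_source(node)}", "<expr>", "eval")
    return eval(code, {"math": math})


# Hypothetical rows; a real engine would operate on column vectors instead.
rows = [{"a": 1, "b": 2.7}, {"a": 10, "b": -0.5}]
evaluate = compile_expr(EXPR)
```

Calling evaluate(row) gives the same result as interpret(EXPR, row), but the tree walk and the checks happen once at compile time instead of once per row. Whether that pays off in ColumnStore is exactly what the project sets out to measure.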

MyRocks (aka RocksDB) is getting some overdue love

Finally, stepping outside ColumnStore land, we have one more storage engine project, this time on MyRocks. MyRocks was first introduced in MariaDB 10.2. What makes this storage engine special is that it relies on RocksDB, a key-value store developed by Facebook. MariaDB has not kept up with the recent changes in RocksDB’s API, and right now we are not able to simply bump the RocksDB version. The goal of this project, which will be worked on by Junqi Xie, is to create the necessary wrappers and changes to get MyRocks to compile with the latest version of RocksDB.

With that said, we wish our mentees and mentors a productive summer!