DuckDB Storage Engine for MariaDB. When the Sea Lion Learns to Quack.
An early look at the DuckDB storage engine for MariaDB — columnar, vectorized analytics that live right next to your transactional tables.
The problem
MariaDB’s InnoDB is excellent at what it was built for: transactions. Row-by-row inserts, updates, point lookups, strong consistency. But the moment you ask it to scan tens of millions of rows for a multi-way join with a few aggregations, a row store has to work hard.
The usual answer is to stand up a separate analytical system, then build ETL pipelines to copy data into it. Now you maintain two systems and you live with the lag between them.
What if the analytical engine just lived inside the same server — same SQL, same client, same data directory?
What we built
The DuckDB storage engine for MariaDB does exactly that. Create a table with ENGINE=DuckDB and its data is stored in DuckDB‘s native columnar format; analytical queries against it run through DuckDB’s vectorized, multi-core engine — in-process, inside the MariaDB server.
No separate cluster. No ETL. No new protocol. The familiar mariadb client, one SQL dialect, InnoDB and DuckDB tables side by side in the same database.
A note on MariaDB ColumnStore
MariaDB already ships ColumnStore, a mature columnar engine for large-scale analytics — typically deployed as a scale-out, multi-node data warehouse. The DuckDB engine targets a different point on the spectrum: a single-node, in-process, zero-ops analytical engine that sits next to your InnoDB tables for lightweight embedded analytics and HTAP. The two are complementary, not competing — reach for ColumnStore when you need a distributed warehouse, and the DuckDB engine when you want in-process analytics with no extra moving parts.
A 30-second taste
Create a DuckDB table just like any other:
CREATE TABLE sales (
id BIGINT PRIMARY KEY,
product VARCHAR(64),
amount DECIMAL(12,2),
sold_at TIMESTAMP
) ENGINE=DuckDB;
Query it with ordinary SQL — the whole SELECT is pushed down to DuckDB:
SELECT product, SUM(amount) AS revenue
FROM sales
GROUP BY product
ORDER BY revenue DESC
LIMIT 10;
And the part that makes it genuinely useful — cross-engine joins. A single query can combine a DuckDB analytical table with an InnoDB operational table. Here it is end to end — two tables in two different engines, joined in one statement:
-- analytical data in DuckDB, operational data in InnoDB
CREATE TABLE analytics.orders (id BIGINT PRIMARY KEY, product_id INT, amount DECIMAL(12,2)) ENGINE=DuckDB;
CREATE TABLE inventory.products (id INT PRIMARY KEY, name VARCHAR(64)) ENGINE=InnoDB;
INSERT INTO inventory.products VALUES (1, 'Wireless Keyboard'), (2, 'Mechanical Switch');
INSERT INTO analytics.orders VALUES (4217, 1, 1299.00), (4218, 2, 42.00);
-- one query, both engines
SELECT d.id, d.amount, i.name
FROM analytics.orders d -- ENGINE=DuckDB
JOIN inventory.products i -- ENGINE=InnoDB
ON d.product_id = i.id
WHERE d.amount > 1000;
DuckDB handles the join, aggregation, and sorting; the InnoDB rows are produced on demand through the MariaDB optimizer. No copying, no ETL.
Performance
We benchmarked the engine on TPC-H SF10 (~11 GB, 86.6M rows, 60M in lineitem). Two things matter for an analytical engine: how fast you can load data, and how fast you can query it.
Hardware / environment: Intel Core i7-13700H (14 cores / 20 threads), 64 GB RAM, NVMe SSD. MariaDB 11.4.13 with embedded DuckDB v1.5.2, duckdb_memory_limit=8 GiB, threads=20. Warm runs, stable to ~±15%.
Query latency — all 22 TPC-H queries
The full suite runs warm in ~4.3 seconds:
| Query | Time (s) | Query | Time (s) | Query | Time (s) |
|---|---|---|---|---|---|
| q01 | 0.253 | q09 | 0.337 | q17 | 0.116 |
| q02 | 0.076 | q10 | 0.253 | q18 | 0.318 |
| q03 | 0.134 | q11 | 0.049 | q19 | 0.207 |
| q04 | 0.135 | q12 | 0.152 | q20 | 0.166 |
| q05 | 0.142 | q13 | 0.606 | q21 | 0.486 |
| q06 | 0.070 | q14 | 0.128 | q22 | 0.113 |
| q07 | 0.133 | q15 | 0.101 | Total | 4.30 |
| q08 | 0.145 | q16 | 0.178 |
Each query also pays a small fixed cost (~40 ms) for client connect and pushdown setup — noticeable on the cheapest queries, negligible on the heavy analytical ones where DuckDB’s vectorized execution dominates.
Bulk loading
| Path | Total load |
|---|---|
In-engine COPY (run_in_duckdb, 8 GB mem) | 33 s |
LOAD DATA LOCAL INFILE | ~400 s (6.7 min) |
Loading external CSVs through DuckDB’s native parallel reader in-process — via the run_in_duckdb() function — ingests all 86.6M rows in ~33 s. The standard LOAD DATA LOCAL INFILE path is ~12× slower because every row is routed through MariaDB’s SQL/handler layer, serializing what COPY does in parallel. For bulk loads, prefer in-engine COPY via run_in_duckdb and raise duckdb_memory_limit for large tables.
How it works (briefly)
DuckDB’s speed rests on three pillars: columnar storage (read only the columns you need), vectorized execution (process data in cache-friendly batches), and parallelism (use every core).
Inside MariaDB, the engine plugs into two places:
- Query pushdown — the entire
SELECT(itsWHERE,JOIN,GROUP BY,ORDER BY) is handed to DuckDB through MariaDB’sselect_handlerinterface, so DuckDB’s optimizer plans the whole query. - Cross-engine scan — when a query mixes DuckDB and non-DuckDB tables, the non-DuckDB tables are streamed into DuckDB on demand through the full MariaDB execution pipeline, so index access and predicate evaluation still happen on the MariaDB side.
That’s the short version. A deeper architecture post is coming.
Why a plugin
The DuckDB engine ships as a MariaDB plugin — and that’s a deliberate, important choice. MariaDB’s pluggable architecture lets a component like this have independent maturity from the server itself.
- No impact on server stability — because a storage engine plugin is loaded on demand and isolated behind well-defined interfaces, it can be added to a GA release without putting the stability of the core server at risk. If you don’t load a plugin, nothing changes for you.
- Maturity on its own schedule — the plugin can iterate, stabilize, and reach production-readiness at its own pace, decoupled from the server’s release cadence. Early adopters can opt in today; everyone else is unaffected.
- Clean opt-in — install the plugin when you want the capability, leave it out when you don’t. The same mechanism that has long delivered engines like InnoDB, Spider, and ColumnStore now delivers DuckDB.
This is exactly the model that lets MariaDB ship ambitious new capabilities safely: powerful when you want it, invisible when you don’t.
When to use it
Good fit
- HTAP — InnoDB for transactions, DuckDB for analytics, in one database.
- Ad-hoc analytical queries — big joins, aggregations, window functions — without exporting data.
- Eliminating ETL — the analytical engine runs in-process.
Current limitations (it’s early days)
DECIMALprecision is capped at 38 digits (DuckDB’s limit).- DuckDB-engine tables require a
PRIMARY KEYand reject non-UTF8 charsets. - Some MariaDB functions aren’t pushdown-compatible yet (e.g.
GROUP_CONCAT(),DATE_FORMAT()) and fall back to MariaDB execution. - Stricter
GROUP BYthan MariaDB’s defaultsql_modeallows. - No
XAtransactions; cross-engine external scans are currently single-threaded. - Collations are approximated for non-UTF8 charsets.
Roadmap
This is just the opening move. Where we’re headed next:
- Analytical GIS — expose DuckDB’s Spatial extension so geospatial analytics run inside MariaDB.
- Faster HTAP over InnoDB-only data — run queries against purely InnoDB tables through DuckDB’s vectorized execution, accelerating analytics even when no DuckDB table is involved.
- Partial query pushdown — implement a derived handler to push parts of complex queries down to DuckDB, so mixed workloads can offload their analytical sub-queries.
- Data lake access — reach external data lake protocols and formats through DuckDB’s extension ecosystem.
Try it
Get prebuilt packages
The quickest path: MariaDB’s community CI publishes ready-built packages at ci.mariadb.org/68929/. Pick the directory matching your platform and architecture — there’s one per OS/arch combination, so you have to choose the appropriate path. For Ubuntu 24.04 (amd64) that’s:
https://ci.mariadb.org/68929/amd64-ubuntu-2404-deb-autobake/
Use that directory as a package repository (or just download the .deb files from it) to install without compiling. Swap in the matching folder for other targets — e.g. amd64-rhel-9-rpm-autobake/, aarch64-ubuntu-2404-deb-autobake/, and so on.
Build from source
Alternatively, the engine ships in-tree under storage/duckdb/. Clone the branch matching your target MariaDB version and build:
git clone --recurse-submodules -b 11.4 https://github.com/MariaDB/server.git mariadb-server
cd mariadb-server
./storage/duckdb/build.sh -D # install build dependencies (root)
./storage/duckdb/build.sh # build + install
Build a package yourself
Prefer to roll your own distributable package? After installing the build dependencies (build.sh -D), build.sh -p produces an RPM or DEB for your platform. See the Build packages section of the engine README for details:
./storage/duckdb/build.sh -p # build RPM (Rocky/Fedora/Amazon Linux) or DEB (Debian/Ubuntu)
Enable the engine
The DuckDB engine must be loaded at server startup — this step is obligatory. Drop a duckdb.cnf file into your server’s config-include directory:
- Debian / Ubuntu:
/etc/mysql/mariadb.conf.d/duckdb.cnf - RHEL / Fedora / openSUSE: /etc/my.cnf.d/duckdb.cnf
[mysqld]
plugin-maturity=alpha
plugin-load-add=ha_duckdb.so
duckdb-memory-limit=8G # default is only 1G — worth raising before testing
plugin-load-add loads the engine and plugin-maturity=alpha is required because the plugin is currently at alpha maturity. Note the default duckdb-memory-limit is just 1 GB — for any serious testing it’s worth tweaking upward first, as it’s the main bottleneck for large loads and queries. Restart the server and you’re ready to create ENGINE=DuckDB tables.
This is a preview — we’d love testers, bug reports, and feedback.
Licensing
The engine is licensed under the GPL v2 (the same license as MariaDB Server), while DuckDB itself is distributed under the permissive MIT License.
Credits
This work draws on the DuckDB project, whose clean, embeddable, MIT-licensed engine makes integrations like this feasible, and on Alibaba’s AliSQL, whose December 2025 release integrating DuckDB into a MySQL-compatible server was a valuable reference and proof point.
More to come — including a deeper dive on the architecture and a fuller benchmark study.