DuckDB Storage Engine for MariaDB. When the Sea Lion Learns to Quack.

An early look at the DuckDB storage engine for MariaDB — columnar, vectorized analytics that live right next to your transactional tables.

The problem

MariaDB’s InnoDB is excellent at what it was built for: transactions. Row-by-row inserts, updates, point lookups, strong consistency. But the moment you ask it to scan tens of millions of rows for a multi-way join with a few aggregations, a row store has to work hard.

The usual answer is to stand up a separate analytical system, then build ETL pipelines to copy data into it. Now you maintain two systems and you live with the lag between them.

What if the analytical engine just lived inside the same server — same SQL, same client, same data directory?

What we built

The DuckDB storage engine for MariaDB does exactly that. Create a table with ENGINE=DuckDB and its data is stored in DuckDB‘s native columnar format; analytical queries against it run through DuckDB’s vectorized, multi-core engine — in-process, inside the MariaDB server.

No separate cluster. No ETL. No new protocol. The familiar mariadb client, one SQL dialect, InnoDB and DuckDB tables side by side in the same database.

A note on MariaDB ColumnStore

MariaDB already ships ColumnStore, a mature columnar engine for large-scale analytics — typically deployed as a scale-out, multi-node data warehouse. The DuckDB engine targets a different point on the spectrum: a single-node, in-process, zero-ops analytical engine that sits next to your InnoDB tables for lightweight embedded analytics and HTAP. The two are complementary, not competing — reach for ColumnStore when you need a distributed warehouse, and the DuckDB engine when you want in-process analytics with no extra moving parts.

A 30-second taste

Create a DuckDB table just like any other:

CREATE TABLE sales (
  id        BIGINT PRIMARY KEY,
  product   VARCHAR(64),
  amount    DECIMAL(12,2),
  sold_at   TIMESTAMP
) ENGINE=DuckDB;

Query it with ordinary SQL — the whole SELECT is pushed down to DuckDB:

SELECT product, SUM(amount) AS revenue
  FROM sales
 GROUP BY product
 ORDER BY revenue DESC
 LIMIT 10;

And the part that makes it genuinely useful — cross-engine joins. A single query can combine a DuckDB analytical table with an InnoDB operational table. Here it is end to end — two tables in two different engines, joined in one statement:

-- analytical data in DuckDB, operational data in InnoDB
CREATE TABLE analytics.orders   (id BIGINT PRIMARY KEY, product_id INT, amount DECIMAL(12,2)) ENGINE=DuckDB;
CREATE TABLE inventory.products (id INT PRIMARY KEY, name VARCHAR(64))                          ENGINE=InnoDB;

INSERT INTO inventory.products VALUES (1, 'Wireless Keyboard'), (2, 'Mechanical Switch');
INSERT INTO analytics.orders   VALUES (4217, 1, 1299.00), (4218, 2, 42.00);

-- one query, both engines
SELECT d.id, d.amount, i.name
  FROM analytics.orders   d        -- ENGINE=DuckDB
  JOIN inventory.products i        -- ENGINE=InnoDB
    ON d.product_id = i.id
 WHERE d.amount > 1000;

DuckDB handles the join, aggregation, and sorting; the InnoDB rows are produced on demand through the MariaDB optimizer. No copying, no ETL.

Performance

We benchmarked the engine on TPC-H SF10 (~11 GB, 86.6M rows, 60M in lineitem). Two things matter for an analytical engine: how fast you can load data, and how fast you can query it.

Hardware / environment: Intel Core i7-13700H (14 cores / 20 threads), 64 GB RAM, NVMe SSD. MariaDB 11.4.13 with embedded DuckDB v1.5.2, duckdb_memory_limit=8 GiB, threads=20. Warm runs, stable to ~±15%.

Query latency — all 22 TPC-H queries

The full suite runs warm in ~4.3 seconds:

Query	Time (s)	Query	Time (s)	Query	Time (s)
q01	0.253	q09	0.337	q17	0.116
q02	0.076	q10	0.253	q18	0.318
q03	0.134	q11	0.049	q19	0.207
q04	0.135	q12	0.152	q20	0.166
q05	0.142	q13	0.606	q21	0.486
q06	0.070	q14	0.128	q22	0.113
q07	0.133	q15	0.101	Total	4.30
q08	0.145	q16	0.178

Each query also pays a small fixed cost (~40 ms) for client connect and pushdown setup — noticeable on the cheapest queries, negligible on the heavy analytical ones where DuckDB’s vectorized execution dominates.

Bulk loading

Path	Total load
In-engine `COPY` (`run_in_duckdb`, 8 GB mem)	33 s
`LOAD DATA LOCAL INFILE`	~400 s (6.7 min)

Loading external CSVs through DuckDB’s native parallel reader in-process — via the run_in_duckdb() function — ingests all 86.6M rows in ~33 s. The standard LOAD DATA LOCAL INFILE path is ~12× slower because every row is routed through MariaDB’s SQL/handler layer, serializing what COPY does in parallel. For bulk loads, prefer in-engine COPY via run_in_duckdb and raise duckdb_memory_limit for large tables.

How it works (briefly)

DuckDB’s speed rests on three pillars: columnar storage (read only the columns you need), vectorized execution (process data in cache-friendly batches), and parallelism (use every core).

Inside MariaDB, the engine plugs into two places:

Query pushdown — the entire SELECT (its WHERE, JOIN, GROUP BY, ORDER BY) is handed to DuckDB through MariaDB’s select_handler interface, so DuckDB’s optimizer plans the whole query.
Cross-engine scan — when a query mixes DuckDB and non-DuckDB tables, the non-DuckDB tables are streamed into DuckDB on demand through the full MariaDB execution pipeline, so index access and predicate evaluation still happen on the MariaDB side.

That’s the short version. A deeper architecture post is coming.

Why a plugin

The DuckDB engine ships as a MariaDB plugin — and that’s a deliberate, important choice. MariaDB’s pluggable architecture lets a component like this have independent maturity from the server itself.

No impact on server stability — because a storage engine plugin is loaded on demand and isolated behind well-defined interfaces, it can be added to a GA release without putting the stability of the core server at risk. If you don’t load a plugin, nothing changes for you.
Maturity on its own schedule — the plugin can iterate, stabilize, and reach production-readiness at its own pace, decoupled from the server’s release cadence. Early adopters can opt in today; everyone else is unaffected.
Clean opt-in — install the plugin when you want the capability, leave it out when you don’t. The same mechanism that has long delivered engines like InnoDB, Spider, and ColumnStore now delivers DuckDB.

This is exactly the model that lets MariaDB ship ambitious new capabilities safely: powerful when you want it, invisible when you don’t.

When to use it

Good fit

HTAP — InnoDB for transactions, DuckDB for analytics, in one database.
Ad-hoc analytical queries — big joins, aggregations, window functions — without exporting data.
Eliminating ETL — the analytical engine runs in-process.

Current limitations (it’s early days)

DECIMAL precision is capped at 38 digits (DuckDB’s limit).
DuckDB-engine tables require a PRIMARY KEY and reject non-UTF8 charsets.
Some MariaDB functions aren’t pushdown-compatible yet (e.g. GROUP_CONCAT(), DATE_FORMAT()) and fall back to MariaDB execution.
Stricter GROUP BY than MariaDB’s default sql_mode allows.
No XA transactions; cross-engine external scans are currently single-threaded.
Collations are approximated for non-UTF8 charsets.

Roadmap

This is just the opening move. Where we’re headed next:

Analytical GIS — expose DuckDB’s Spatial extension so geospatial analytics run inside MariaDB.
Faster HTAP over InnoDB-only data — run queries against purely InnoDB tables through DuckDB’s vectorized execution, accelerating analytics even when no DuckDB table is involved.
Partial query pushdown — implement a derived handler to push parts of complex queries down to DuckDB, so mixed workloads can offload their analytical sub-queries.
Data lake access — reach external data lake protocols and formats through DuckDB’s extension ecosystem.

Try it

Get prebuilt packages

The quickest path: MariaDB’s community CI publishes ready-built packages here. Pick the directory matching your platform and architecture — there’s one per OS/arch combination, so you have to choose the appropriate path. For Ubuntu 24.04 (amd64) that’s:

https://ci.mariadb.org/70621/amd64-ubuntu-2404-deb-autobake/

Use that directory as a package repository (or just download the .deb files from it) to install without compiling. Swap in the matching folder for other targets — e.g. amd64-rhel-9-rpm-autobake/, aarch64-ubuntu-2404-deb-autobake/, and so on. Note that these packages are not signed and suitable only for testing purposes.

Build from source

Alternatively, the engine ships in-tree under storage/duckdb/. Clone the branch matching your target MariaDB version and build:

git clone --recurse-submodules -b 11.4 https://github.com/MariaDB/server.git mariadb-server
cd mariadb-server

./storage/duckdb/build.sh -D   # install build dependencies (root)
./storage/duckdb/build.sh      # build + install

Build a package yourself

Prefer to roll your own distributable package? After installing the build dependencies (build.sh -D), build.sh -p produces an RPM or DEB for your platform. See the Build packages section of the engine README for details:

./storage/duckdb/build.sh -p   # build RPM (Rocky/Fedora/Amazon Linux) or DEB (Debian/Ubuntu)

Enable the engine

The DuckDB engine must be loaded at server startup — this step is obligatory. Drop a duckdb.cnf file into your server’s config-include directory:

Debian / Ubuntu: /etc/mysql/mariadb.conf.d/duckdb.cnf
RHEL / Fedora / openSUSE: /etc/my.cnf.d/duckdb.cnf

[mysqld]
plugin-maturity=alpha
plugin-load-add=ha_duckdb.so
duckdb-memory-limit=8G        # default is only 1G — worth raising before testing

plugin-load-add loads the engine and plugin-maturity=alpha is required because the plugin is currently at alpha maturity. Note the default duckdb-memory-limit is just 1 GB — for any serious testing it’s worth tweaking upward first, as it’s the main bottleneck for large loads and queries. Restart the server and you’re ready to create ENGINE=DuckDB tables.

This is a preview — we’d love testers, bug reports, and feedback.

Licensing

The engine is licensed under the GPL v2 (the same license as MariaDB Server), while DuckDB itself is distributed under the permissive MIT License.

Credits

This work draws on the DuckDB project, whose clean, embeddable, MIT-licensed engine makes integrations like this feasible, and on Alibaba’s AliSQL, whose December 2025 release integrating DuckDB into a MySQL-compatible server was a valuable reference and proof point.

More to come — including a deeper dive on the architecture and a fuller benchmark study.

Join the Conversation

2 Comments

Thomas says:

2026-06-10 at 01:11

Great!!
Will it be possible to read and write parquet files?
Will it be possible to use DuckDB specific SQL like FULL OUTER JOIN?

1. Roman Nozdrin says:
  
  2026-06-10 at 11:03
  
  The current version allows this only via plugin function `run_in_duckdb`, e.g.
  “`
  SELECT run_in_duckdb(‘SELECT * FROM read_parquet(”/data/foo.parquet”)’);
  “`
  However this function is a workaround. The future is to deliver parser that enables duckdb specific syntax, e.g. parquet files ops or FULL OUTER JOIN.