MariaDB 11.0 – new optimizer, new major version series

With MariaDB 10.0.0 having been released over ten years ago (12 Nov 2012), you may ask yourself when there will be MariaDB 11.0.0. If so, I can answer you: Today.

You can now download MariaDB Server 11.0 Alpha preview from our dedicated download page and check out the release notes.

Time has passed …

Of course, we have a better reason for going with a new first number in a release other than ten years having passed. Significant new features. Significant incompatibilities with earlier versions. Or some other objective reason.

… and we have reached an important milestone!

We are excited that MariaDB Server has reached a new milestone. The majority of the changes in this release are a direct result of you, our users, reporting performance issues in the form of bad query plans to us. Our community has been deeply involved in guiding the development of this release.

A new cost model for the optimizer

The flagship feature of MariaDB 11.0 is the new optimizer cost model, which will be able to more accurately predict the actual cost of each query execution plan. But as we cannot be sure that it is better in all scenarios, we increase the major version number in order to send the signal: don’t be surprised if some stray query will be slower.

Our three-part message

Our key message is threefold:

The community helped us identify optimizer problems.
We have listened and now have a solution that brings MariaDB’s optimizer in line with 2022 hardware (“hard disks” aren’t what they used to be).
We want the community to take this change into use and give us even more feedback. The more the better! You can reach out via a bug report in jira.mariadb.org, Zulip, or maria-discuss & maria-developers mailing-lists.

It’s the complicated queries that are improved

Most queries won’t be affected – in particular the simple ones, with at most one join. Out of 100 queries, perhaps 10 may be executed using a different plan, meaning in a different order, based on the actual contents in the table, the number of rows, the available indexes. Out of those 10 affected ones, we expect 9 or 10 to be faster.

More in later blogs

The cost model changes will be described in detail in a follow-up blog post, as can be expected from a big improvement with well over 12 months of work. The work was mostly done by our founder, Michael “Monty” Widenius, with extensive support by Sergey Petrunia and Vicențiu Ciorbaru.

Incompatibilities between 10.11 and 11.0

There are also a number of incompatibilities between 10.11 and 11.0, which we will list and blog about in due course. We wanted to implement these changes for a long time, but could not introduce them without changing a major version number.

Revisiting criteria for renumbering

As the purported “objective” criteria of the significance of new features and incompatibilities are hardly black and white, we will revisit our criteria for renumbering the first version. Suffice it to say that we do expect you to see MariaDB 12.0.0 earlier than in 10 years from now, 2032. Insights and comments welcome!

Above all: Good news – more performance!

All in all, we think there are plenty of reasons to download MariaDB 11.0 and check whether your queries run faster. This is the case in particular for those of you who have been using FORCE INDEX as a way to improve execution speed by giving the optimizer hints as to which plan to use.

Experience “the numbers going to eleven”

P.S. For friends of British pop culture and/or the fictional heavy metal band Spinal Tap: Looking forward to your experiences from seeing MariaDB Server’s AMP going to 11.

2 replies on “MariaDB 11.0 – new optimizer, new major version series”

Matthew Hastie says:

2022-12-29 at 22:53

Delightful.

I’ve been deeply involved in MariaDB implementation, having co-founded team who delivered the fork that ServiceNow uses in production today.

My work gave me excellent study of the implementation history with innobase lens, from 10.1 though today.

I have such compassion for your cause. Make it superb!

I hope the horror of complexity like we’ve seen past – the 10.4 immaturity due to debasement of type system for push down select, and instant alter table complexity, the AHI drop = O(N^whtevs), the coarse mutexes, the uncompiled where clause AST cliffing branch prediction, b-tree access through mmap/fsync straw, especially with user space NVMe multipage atomic write in plain sight. The list is getting shorter every day!

The compile of the massive codebase and indeed time to sound heap (MTR soundness on 500G/80T white hot heap?) semantics for any massive heap, one that maxes rack DDR is still a huge challenge.

Also noteworthy, the directives of producing the world’s finest relational heap and desire for general purpose cloud business weaken both product and business; the plugin space should be brutally culled for junksQL collected throughout the zeitgeist of C21.

Failure to do this may have SQLite derivatives eat your lunch.

The other opportunity in 2023 is the lack of integrated solution from database to array of Samsung Xilinx FPGA co-located Storage, and other, to effect rendering of page content for PCIe bus transfer to memory.

100% selectivity across PCIe bus is possible should WHERE clauses be compiled near storage, indeed at a time when we have uring i/o.

This might reflect a next generation metaphor or the atomic write Fusion I/O era that gave Maria competitive advantage in 2012.

I observe that FPGA narratives also occur surrounding network card, for CPU offload of projection from memory, and CPU/FPGA itself (RISC-V b-tree extensions) to accelerate parallel join under single thread, or provide indexed based addressing by data-path engineering of query plan, for example.

Offload of b-tree normalization to FPGA algorithm. All kinds.

The other opportunity to take is that of using storage co-located FPGA to render b-tree content in column store layout, to give SIMD execution or what some might say “analytical query capability” to the transactional store.

This possibly eliminates the insanity of backends, replication, necessitated by HTAP ideology, and plugins that exist merely to present pages to an SIMD unit.

In the case of perceived need for column layout to present b-tree pages to SIMD unit, both the architectural problems experienced inherent in the plug-in architecture, that of each store working set dividing main memory to always defeat performance of just innobase alone is gone, a singleton store may better control relative importance of columnular projections of b-tree span by FPGA close to storage.

Computational joins must also extend to machine learnt symbols and dot product orderings – all AI/ML for every organization has must integrate with database south of the join layer, if there is desire to chatgpt or recognize image relevance into relational query, without creating client side joins.

I trust ServiceNow and your other largest community customers have absolute efficiency in backups and restores of massive heaps in 2023, and that trust issues surrounding this sensitive issue are put to rest.

Your reputation is at stake at the highest levels of industry where this is not class leading and flawless.

I must say though, congratulations this is a great achievement, having deep experience and horror in the depths of sql_select.c. Godspeed MariaDB 2023!
Michael "Monty" Widenius says:

2023-01-03 at 12:39

Hi!

Matthew Hastie, thanks for a lot of interesting thoughts and ideas.

A few comments to selected sections:

100% selectivity across PCIe bus is possible should WHERE clauses be compiled near storage, indeed at a time when we have uring i/o.

I see some issues with trying to this generally:

In most real time scenarios, most of the active data is cached by the database, so using uring will not help that much.

At the block-read level, there is very little information of what is read and how to interpret that.

Applying WHERE to block without context and when data for one record can often be split over multiple blocks is very hard to do.

What makes things more complex is that our b-tree’s are not mem-comparable (as we want to support index-only reads). For example MyRocks can support hardware accelerated parts for sorting and SST file compaction as it stores data in mem-comparable form. However this means that one cannot use index only reads with MyRocks.

The one place where this could be useful is when doing a total scan of a table. However a better choice could be to apply MIN/MAX indexes, Bloom filters or other block filtering mechanics to not have to read/check blocks at all that does not contain any relevant data.

Parallel query is on the topic for 2023. We have too big MariaDB/MySQL users who has done an implementation of this already in MySQL/MariaDB and I have good hopes that either of them will donate their code to MariaDB. If not, I will look at implementing this myself for MariaDB 11.1 or 11.2!

offloading b-tree to FPGA”

The problem here is that b-tree’s code is not completely general and isolated. As the user can have any combinations of types and character sets in the index, most of the b-tree code is calling sub functions to compare things. The b-tree code also depends on the state of the transactions, for example what is visible for the transaction, what can be removed etc. A big part of the performance of the b-tree codes goes to locking the b-tree, copying blocks (to allow parallel access etc). I don’t know how much an FPGA could help here. In case of b-trees that only stores one type (double?) I assume it would be relatively easy to do an FPGA solution. However I don’t know how to do this efficiently in the general case.

The other opportunity to take is that of using storage co-located FPGA to render b-tree content in column store layout

This may be doable for a clustered indexes. However it would not give that much performance improvement as one would still have to read all blocks, including data one does not need.

The biggest advantage of column store layouts is that you get much more rows per block-read and you get a big advantage of compression. For example, MariaDB column store reads compressed blocks in size of megabytes at a time. An FPGA working on normal b-tree’s cannot match that!

sql_select.cc

I hope that you have seen all the good work we have done there to improve things in 11.0! At least, I hope some of the ‘horrors’ you have experienced in the past are gone now!

Regards,
Monty