Adding a New Data Type to MariaDB with Type_handler – Part 2

Now that we have discovered the Type_handler framework and learned how to build MariaDB Server from source, it’s time to code our first data type!

We will create a MariaDB plugin that registers a new MONEY type and instantiates a custom field object.

Our component won’t be exciting yet, but the goal is to understand how to use the framework and how to test it.

We want to prove that:

  • the plugin loads,
  • the server sees the type handler,
  • a MONEY column can create a Field_money object.
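To make the three checks concrete, here is a toy model of the relationship they test: a plugin registers a handler under a type name, the server looks that handler up, and the handler acts as a factory for a per-column field object. Every class here (`TypeRegistry`, `TypeHandlerMoney`, `FieldMoney`) is hypothetical and only mirrors the roles described above; MariaDB's real Type_handler plugin API is written in C++ and looks different.

```python
# Toy model only: these classes are hypothetical stand-ins, NOT MariaDB's
# C++ Type_handler API. They mirror the three checks listed above.


class Field:
    """Stand-in for a per-column field object."""
    type_name = "UNKNOWN"


class FieldMoney(Field):
    """Hypothetical analogue of the post's Field_money."""
    type_name = "MONEY"


class TypeHandlerMoney:
    """Hypothetical analogue of a MONEY type handler: a factory for fields."""
    name = "MONEY"

    def make_field(self) -> Field:
        return FieldMoney()


class TypeRegistry:
    """Stand-in for the server-side lookup of type handlers by name."""

    def __init__(self) -> None:
        self._handlers: dict[str, TypeHandlerMoney] = {}

    def register(self, handler: TypeHandlerMoney) -> None:
        # Check 1 ("the plugin loads"): the plugin announces its handler.
        self._handlers[handler.name.upper()] = handler

    def find(self, type_name: str):
        # Check 2 ("the server sees the type handler"); SQL type names
        # are case-insensitive, so the lookup normalizes the case.
        return self._handlers.get(type_name.upper())


registry = TypeRegistry()
registry.register(TypeHandlerMoney())

# Check 3: a MONEY column asks its handler for a field object.
handler = registry.find("money")
field = handler.make_field() if handler else None
print(type(field).__name__, field.type_name)  # FieldMoney MONEY
```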

Everything else comes later.

Adding a New Data Type to MariaDB with Type_handler – Part 1

This is the first part of the series about how to add a new data type to MariaDB using the Type_handler framework. A preliminary article has already been published to start the series; it covers how to set up your development environment and compile MariaDB Server: Adding a New Data Type to MariaDB with Type_handler – Part 0.

Understand Type_handler Before Writing Code

When you add a new type to MariaDB, you are not only adding a new SQL keyword. Historically, that kind of work required invasive changes across the parser, optimizer, protocol, replication, and type-conversion mechanisms.

Adding a New Data Type to MariaDB with Type_handler – Part 0

Welcome to this new series about extending MariaDB. This series covers adding a new data type using the Type_handler framework.

The goal of the entire series is to create a new plugin data type, MONEY, that stores and displays amounts with their currency.

Something like:

MariaDB [test]> select * from t1;
+----+-------------+
| id | amount      |
+----+-------------+
|  1 | $2,000.00   |
|  2 | $100,000.56 |
+----+-------------+
2 rows in set (0.002 sec)
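The display side of this goal can be sketched independently of MariaDB. Assuming, purely for illustration, that an amount is kept as an integer number of cents (the series has not settled on a storage format at this point) and that the currency is a single symbol, a formatter that reproduces the two rows above could look like this. `format_money` is a hypothetical helper, not part of any MariaDB API.

```python
def format_money(cents: int, symbol: str = "$") -> str:
    """Format an amount stored as integer cents, e.g. 200000 -> "$2,000.00".

    Assumption for illustration: amounts are integer cents, one currency symbol.
    """
    sign = "-" if cents < 0 else ""
    units, frac = divmod(abs(cents), 100)  # whole units and remaining cents
    # "," adds thousands separators; "02d" pads the cents to two digits.
    return f"{sign}{symbol}{units:,}.{frac:02d}"


for amount in (200000, 10000056):
    print(format_money(amount))
# $2,000.00
# $100,000.56
```

Keeping cents as an integer avoids binary floating-point rounding (as a float, 0.1 + 0.2 is not exactly 0.3), which matters when the values are money.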

Of course, the ultimate goal is to teach how to add data types to MariaDB, and we look forward to seeing how creative our community developers are!

Database Trends: What is changing in the database world (besides AI)

Earlier this month, I had a half-hour chat with Kellyn Gorman, a Database and AI Advocate and Engineer at Redgate. The UK software company is known for database DevOps and database management tools for most databases, and since 2024 as the owner of the DB-Engines popularity ranking of database management systems.

The chat was an intellectual pleasure, to say the least. Kellyn is outstandingly well informed on databases, with a background that started in Oracle and spans most databases as a DBA and industry analyst, and has by now been using MariaDB for about fifteen years, almost since its inception.

MariaDB Vector: How it works. Part IV

This is the last post in the “MariaDB Vector: How it works” series. The first three covered storage, the in-memory representation, and the HNSW modifications, all of which were done in MariaDB 11.8. This post covers a new feature in MariaDB 12.3: optimized distance calculation.

As I mentioned earlier, distance calculation is the most time-consuming part of vector search, taking 80–90% of the total search time. It is also linear in the number of dimensions: computing the distance between vectors of 1536 dimensions takes twice as long as for vectors of 768 dimensions.
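For reference, a plain, unoptimized squared Euclidean distance makes the linear cost visible: one subtract, multiply, and add per dimension. This is a generic baseline sketch, not MariaDB 12.3's optimized routine; the function name `squared_l2` and its iteration counter are illustrative additions.

```python
import random


def squared_l2(a: list[float], b: list[float]) -> tuple[float, int]:
    """Baseline squared Euclidean distance.

    Returns the distance and the number of loop iterations, which is exactly
    one per dimension, so the cost grows linearly with len(a).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    total = 0.0
    steps = 0
    for x, y in zip(a, b):
        d = x - y
        total += d * d  # one subtract, one multiply, one add per dimension
        steps += 1
    return total, steps


rng = random.Random(42)
for dim in (768, 1536):
    a = [rng.random() for _ in range(dim)]
    b = [rng.random() for _ in range(dim)]
    dist, steps = squared_l2(a, b)
    print(f"{dim} dimensions -> {steps} iterations")
```

Any exact distance has to read every dimension once, which is where the linear dependence on the vector length comes from.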

MariaDB Vector: How it works. Part III

In the previous parts of this series we’ve seen how MariaDB stores vector indexes in a table and how to implement HNSW for good performance. But MariaDB does not implement plain HNSW: its vector search algorithm is called mHNSW, a modified HNSW. Let’s see exactly how it was modified.

Not so greedy!

HNSW, like many, if not most, graph-based vector search algorithms, is greedy. Think of it this way: when it needs to find just one nearest vector (ef=1), it walks the graph, always choosing the node that takes it closest to the target at that particular step.
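The greedy step can be sketched on a tiny, hypothetical one-dimensional graph. This is a plain greedy walk for ef=1 under assumed names (`greedy_search`, the toy `graph` and `vectors`), not MariaDB's mHNSW code; it shows how pure greed can stop at a local minimum when the true nearest node is only reachable through a worse one.

```python
def greedy_search(graph, vectors, entry, target, dist):
    """Greedy graph walk with ef=1.

    Starting at `entry`, move to the neighbor closest to `target`;
    stop as soon as no neighbor is closer than the current node.
    """
    current = entry
    current_d = dist(vectors[current], target)
    while True:
        best, best_d = current, current_d
        for n in graph[current]:
            d = dist(vectors[n], target)
            if d < best_d:
                best, best_d = n, d
        if best == current:  # no neighbor improves: local minimum reached
            return current
        current, current_d = best, best_d


# Hypothetical toy data: node id -> 1-D "vector", plus adjacency lists.
vectors = {0: 0.0, 1: 4.0, 2: -3.0, 3: 5.2}
graph = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}


def dist(a, b):
    return abs(a - b)


target = 5.0
found = greedy_search(graph, vectors, 0, target, dist)
exact = min(vectors, key=lambda n: dist(vectors[n], target))
print(found, exact)  # 1 3
```

Node 3 is the true nearest neighbor, but the only path to it goes through node 2, which is farther from the target than the starting node, so a strictly greedy walk never takes that path.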

MariaDB Vector: How it works. Part II

In the first post of this series, I described how the vector index is stored in a table and how it achieves full transactional behavior and ACID properties compatible with the storage engine of the table the user created. But while the table provides persistent storage for the index, it is the in-memory part that gives it its performance. This is how it works.

Distance calculations

This is the most performance-sensitive part of HNSW. According to various estimates, distance calculations account for 80–90% of search time, and the time they take grows linearly with the vector length.

MariaDB Vector: How it works

You might have seen that MariaDB Vector is fast, and it is getting faster. But why? How does it achieve that? Why is it said to use the mHNSW (modified HNSW) algorithm, and what did it modify in the conventional HNSW that all other databases use? Let’s take it apart and analyze it piece by piece.

Introduction to HNSW

This post is not a full description of HNSW; there are many good HNSW descriptions online, better than what I could have written. I will only show the basic concepts behind HNSW that are crucial for the rest of the post.
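One standard building block of HNSW, as defined in the original HNSW paper by Malkov and Yashunin, is how each new node picks its top layer: an exponentially decaying random level, so that each layer holds roughly 1/M of the nodes of the layer below. The sketch assumes that textbook formula; it is not taken from MariaDB's code, and `assign_level` is a hypothetical name.

```python
import math
import random


def assign_level(u: float, m: int) -> int:
    """Textbook HNSW level: floor(-ln(u) * mL) with mL = 1 / ln(M).

    `u` is a uniform random number in (0, 1]; `m` is the HNSW M parameter.
    """
    if not 0.0 < u <= 1.0:
        raise ValueError("u must be in (0, 1]")
    m_l = 1.0 / math.log(m)
    return math.floor(-math.log(u) * m_l)


rng = random.Random(7)
M = 16
# 1.0 - rng.random() maps [0, 1) to (0, 1], which avoids log(0).
levels = [assign_level(1.0 - rng.random(), M) for _ in range(100_000)]
share_above_0 = sum(1 for lv in levels if lv >= 1) / len(levels)
print(f"share on layer 1 or higher: {share_above_0:.4f} (expected about {1 / M:.4f})")
```

With mL = 1/ln(M), the probability of reaching level k or higher is M^-k, which is what keeps the upper layers sparse.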