Generative AI and MariaDB Server


“Generative AI is a can of worms that has to be opened.” That was the laconic comment from a senior industry influencer when I shared MariaDB Foundation’s plans for gradually turning MariaDB Server into a platform for AI solutions. The statement combines opportunity with inevitability, and complexity with the need for stepwise refinement.

Late to the game?

Are we late to the game? I believe not. I believe the timing is right. Open Source isn’t a pioneer when it comes to basic research, or even early product development. Linux came when operating systems were a long-established concept and Unix had emerged as a standard. MySQL came when RDBMSes were long established and SQL had emerged as a standard.

No, perhaps even early!

By those standards, we may even be early. But user needs are starting to be better understood, and Open Source implementations for storing vectors in databases are emerging. Postgres has pgvector, which is unsurprising given the research and academic roots of Postgres, which predate not just MariaDB but also MySQL.

Vector-based indexes

The first step in the iterative process of enabling MariaDB Server for AI development and delivery will be implementing support for vectors: storing, indexing, and searching them. This is a good technical fit with MariaDB Server, given the pluggable storage engine architecture that we inherited from MySQL. This is also a good cultural fit with MariaDB Foundation, given our value of openness, which encourages ecosystem cooperation and contributions.

Unique openings specific to MariaDB Server

In combination, the technical and cultural fit gives us serendipitous openings. While no technological choices have been made yet, let me highlight the opportunity of building upon the MyRocks storage engine, which Andrew Hutchings is in the process of updating for MariaDB Server with the help of a Google Summer of Code participant. Vector use cases involve loading lots of infrequently updated data, where the high compression of MyRocks could shine.

We have our first technical action items

But let me not portray the MariaDB Server generative AI initiative as more mature than it is. We are in the very early stages, both technically and governance-wise.

Technically, we’ve created the first three Jira items.

We have established a steering committee 

Governance-wise, last week’s MariaDB Foundation Board meeting 4/2023 decided to establish a Steering Committee for our Gen AI initiative, inviting all interested sponsors and contributors to join.

To quote from the meeting minutes:

Putting resources into a Gen AI initiative is important for MariaDB Foundation. It’s a significant area of interest for the ecosystem, and important for adoption. Collaborating around such a resource-intensive undertaking is a great example of multi-vendor cooperation, and in line with MariaDB’s value of openness. From an external perspective, this makes MariaDB Server a better fit for a Gen AI platform than MySQL Server, which is single-vendor and does not encourage extensive contributions.

Our reasoning is based on interacting with most of our sponsors since our last Board meeting 3/2023 (Wed 6 Sep 2023). Already at that meeting, Amazon suggested that MariaDB Server provide infrastructure for AI through a vector storage engine, similar to pgvector.

We have a core technical team

Continuing the quote from the meeting minutes:

With Sergei Golubchik representing MariaDB plc and Vicentiu Ciorbaru representing MariaDB Foundation, a core technical team has now been established, with the initial goal of creating an engine that can store and index vectors and search them based on distance/similarity functions (e.g. Euclidean distance, over time expanded to inverted inner product and cosine distances).
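To make the three distance/similarity functions named in the minutes concrete, here is a minimal Python sketch of each on plain lists of floats. It is illustrative only: the function names are hypothetical and say nothing about how MariaDB Server will implement them. We assume "inverted inner product" means the negated dot product (the convention pgvector uses), so that for all three functions a smaller value means "more similar".

```python
import math
from typing import Sequence


def _check_dims(a: Sequence[float], b: Sequence[float]) -> None:
    # All three measures are only defined for vectors of equal dimension.
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line (L2) distance between two vectors."""
    _check_dims(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Negated dot product: a larger similarity yields a smaller 'distance'."""
    _check_dims(a, b)
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 minus cosine similarity: 0 for same direction, 1 for orthogonal."""
    _check_dims(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))     # prints 5.0
    print(negative_inner_product([1.0, 2.0], [3.0, 4.0]))  # prints -11.0
    print(cosine_distance([1.0, 0.0], [0.0, 1.0]))         # prints 1.0
```

The choice between them depends on the embeddings: cosine distance ignores vector length and compares only direction, while Euclidean distance and the inner product are both sensitive to magnitude.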

Initial plans have reached the level of a suggested syntax

VEC_DISTANCE(v1, v2)
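As a rough illustration of what a search built on the suggested function could compute, here is a brute-force Python sketch of a k-nearest-neighbor query, modeled on a hypothetical SQL shape like "ORDER BY VEC_DISTANCE(v, query) LIMIT k". Both that query shape and the assumption that VEC_DISTANCE returns Euclidean distance are ours, not part of the plans quoted above; `vec_distance` and `nearest` are hypothetical names. A linear scan like this is exactly what a vector index is meant to avoid on large tables.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def vec_distance(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Hypothetical stand-in for VEC_DISTANCE(v1, v2), assumed Euclidean."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(v1, v2)))


def nearest(
    rows: Dict[int, Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[int, float]]:
    """Brute-force equivalent of ORDER BY VEC_DISTANCE(v, query) LIMIT k.

    Every row is scored, so the cost grows linearly with the table size.
    """
    # Pair each distance with its row id; ties are broken by the smaller id.
    scored = ((vec_distance(v, query), row_id) for row_id, v in rows.items())
    # nsmallest keeps only k candidates in a heap instead of sorting all rows.
    best = heapq.nsmallest(k, scored)
    return [(row_id, dist) for dist, row_id in best]


if __name__ == "__main__":
    table = {1: [0.0, 0.0], 2: [3.0, 4.0], 3: [10.0, 10.0]}
    result = nearest(table, [3.0, 3.0], k=2)
    print([row_id for row_id, _ in result])  # prints [2, 1]
```

An index would return the same kind of ranked result while inspecting only a fraction of the rows, typically trading exactness for speed in approximate nearest-neighbor schemes.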

The contributor response is overwhelming

What gives me the sense that the timing is right is the overwhelmingly positive response we have been getting from our sponsors and contributors. Let me share one more quote from the minutes:

Interest in participation has, besides Amazon, already been expressed by Acronis, Alibaba, Automattic, Constructor, IBM, Intel, and Microsoft. With the growing industry interest and the potential of using MariaDB Server as a platform for generative AI, it would be wise to create an advisory body to guide technical progress. MariaDB Foundation invites members to join MariaDB plc CTO Jonah Harris and MariaDB Foundation CEO Kaj Arnö on this team, setting goals and reporting back to the MariaDB Foundation board.

The vision: a de-facto standard

This means that, once the can of worms is properly opened, we can pave the way for a de-facto standard platform for deploying AI solutions, built upon MariaDB Server, with a dead-easy migration path for those currently using MySQL Server as well.

We’re open for contributions

As a result of the board meeting, the Steering Committee has been set up and is open for board members to join. There will be a Zoom presentation and discussion on this item on Thu 14 Dec 2023, 16:00-17:00 EET. This meeting is recommended for all board members interested in joining the Steering Committee. Hey, this is an open source initiative: if you wish to join, contact us at foundation@mariadb.org!