Generative AI and MariaDB Server
“Generative AI is a can of worms that has to be opened”. That was the laconic comment from a senior industry influencer, when I shared MariaDB Foundation’s plans for successively making MariaDB Server a platform for AI solutions. The statement combines the opportunity with the inevitability, the complexity with the need for stepwise refinement.
Late to the game?
Are we late to the game? I believe not. I believe this is the right timing. Open Source isn’t a pioneer when it comes to basic research or even early product development. Linux came when operating systems were a long-since established concept, and Unix had emerged as a standard. MySQL came when RDBMSes were long-since established, and SQL had emerged as a standard.
No, perhaps even early!
By those standards, we may even be early. But user needs are starting to be better understood, and Open Source implementations for storing vectors in databases are emerging. Postgres has pgvector, which is unsurprising, given the research and academic roots of Postgres, which predate not just MariaDB but also MySQL.
Vector based indexes
The first step in the iterative process of enabling MariaDB Server for AI development and delivery will be implementing support for vectors: storing them, indexing them, and searching them. This is a good technical fit with MariaDB Server, given the storage engine architecture that we inherited from MySQL. This is also a good cultural fit with MariaDB Foundation, given our openness value that encourages ecosystem cooperation and contributions.
Unique openings specific to MariaDB Server
In combination, the technical and cultural matches give us serendipitous openings. While technological choices have not yet been made, let me underline the opportunity of building upon the MyRocks storage engine, in the process of being updated for MariaDB Server by Andrew Hutchings with the help of a Google Summer of Code participant. The use case of vectors involves injecting lots of infrequently updated data, where high compression could shine.
We have a few first technical action items
But let me not portray the MariaDB Server generative AI initiative as more mature than it is. We are in the very early stages, both technically and governance-wise.
Technically, we’ve created the first three Jira items:
- MDEV-32885 VEC_DISTANCE() function
- MDEV-32886 VEC_FromText() and VEC_AsText() functions
- MDEV-32887 k-ANN indexes for vectors
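To make the indexing item concrete, here is a minimal Python sketch of the exact, brute-force nearest-neighbour search that a k-ANN index is meant to approximate at much lower cost. It is not MariaDB code: the function names, the `(row_id, vector)` row layout and the choice of Euclidean distance are all assumptions made for illustration.

```python
import heapq
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(query, rows, k):
    """Return the ids of the k rows closest to `query`, nearest first.

    `rows` is an iterable of (row_id, vector) pairs (a hypothetical layout).
    Every row is scanned, so the cost grows linearly with the table size;
    an approximate index trades a little accuracy to avoid that full scan.
    """
    scored = ((euclidean_distance(query, vec), row_id) for row_id, vec in rows)
    return [row_id for _, row_id in heapq.nsmallest(k, scored)]


# Hypothetical sample data: three rows with two-dimensional vectors.
rows = [(1, [0.0, 0.0]), (2, [1.0, 1.0]), (3, [5.0, 5.0])]
nearest_ids = k_nearest([0.9, 0.9], rows, k=2)
```

With the sample data above, row 2 is the closest match and row 1 the second closest.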
We have established a steering committee
Governance-wise, last week’s MariaDB Foundation Board meeting 4/2023 decided to establish a Steering Committee for our Gen AI initiative, inviting all interested sponsors and contributors to join.
To quote from the meeting minutes:
Putting resources in a Gen AI initiative is important for MariaDB Foundation. It’s a significant area of interest for the ecosystem, and important for adoption. Collaborating around such a resource-intense undertaking is a great example of multi-vendor cooperation, and in line with MariaDB’s value of openness. From an external perspective, this makes MariaDB Server a better fit for a Gen AI platform than MySQL Server, which is single-vendor and not encouraging extensive contributions.
Our reasoning is based on interacting with most of our sponsors since our last Board meeting 3/2023 (Wed 6 Sep 2023). Already then, Amazon suggested MariaDB Server to provide infrastructure for AI through a vector storage engine, similar to pgvector.
We have a core technical team
Continuing the quote from the meeting minutes:
With Sergei Golubchik representing MariaDB plc and Vicentiu Ciorbaru representing MariaDB Foundation, a core technical team has now been established, with the initial goal to create an engine that can store and index vectors and search them, based on distance/similarity functions (eg. Euclidean distance and over time expanded to inverted inner product and cosine distances).
Initial plans have reached the level of a suggested syntax
VEC_DISTANCE(v1, v2)
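For readers unfamiliar with the distance measures named in the minutes, the following Python sketch shows what each one computes on plain lists of numbers. It does not describe how VEC_DISTANCE() will be implemented: the function names are hypothetical, and reading "inverted inner product" as the negated dot product is an assumption.

```python
import math


def euclidean_distance(v1, v2):
    """Square root of the summed squared differences between components."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))


def negated_inner_product(v1, v2):
    """Dot product with the sign flipped, so that smaller means more similar.

    Assumption: this is one common reading of "inverted inner product".
    """
    return -sum(a * b for a, b in zip(v1, v2))


def cosine_distance(v1, v2):
    """One minus the cosine of the angle between the two vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0 or norm2 == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm1 * norm2)


d_euclid = euclidean_distance([0.0, 0.0], [3.0, 4.0])    # 5.0
d_inner = negated_inner_product([1.0, 2.0], [3.0, 4.0])  # -11.0
d_cosine = cosine_distance([1.0, 0.0], [0.0, 1.0])       # 1.0 (orthogonal)
```

Euclidean distance depends on vector length, while cosine distance compares direction only: [1, 0] and [2, 0] are 1.0 apart by the first measure and 0.0 apart by the second.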
The contributor response is overwhelming
What gives me the sense that the timing is right is the overwhelmingly positive response we have been getting from our sponsors and contributors. Let me offer one more quote from the minutes:
Interest in participation has, besides Amazon, already been expressed by Acronis, Alibaba, Automattic, Constructor, IBM, Intel, and Microsoft. With the growing industry interest and the potential of using MariaDB Server as a platform for generative AI, it would be wise to create an advisory body to guide technical progress. MariaDB Foundation invites members to join MariaDB plc CTO Jonah Harris and MariaDB Foundation CEO Kaj Arnö on this team, setting goals and reporting back to the MariaDB Foundation board.
The vision: a de-facto standard
This means that, once the can of worms is properly open, we can pave the way for the creation of a de-facto standard platform for deploying AI solutions, building upon MariaDB Server, with a dead-easy migration path also for those currently using MySQL Server.
We’re open for contributions
As a result of the board meeting, the Steering Committee is set up and open for board members to join. There will be a Zoom presentation and discussion on this item, on Thu 14 Dec 2023 at 16:00-17:00 EET. This meeting is recommended for all board members interested in joining the steering committee. Hey, this is an open source initiative: If you wish to join, contact us at foundation@mariadb.org!
super exciting, MariaDB. Very timely.
Do you know that the CONNECT engine has a vector table type?
At my university the database course is taught with MariaDB, and until now it is oriented towards the programming career. Since I am one of the first students of the Artificial Intelligence degree and I’m going to take it next semester, this is great news and just in time.
congratulations, great news for us, veterans using mariadb