ChatGPT, OpenAI and MariaDB Foundation

When everybody and their grandmother are talking about ChatGPT, you know something is happening – something with significance outside the usual IT bubble. As the first in a series of blog entries, let me reflect upon what AI means for users of MariaDB Server – or, at least, what implications we at the MariaDB Foundation can see at this point in time.

The AI revolution is inevitable …

Pundits say that lawyers or programmers won’t be replaced by AI – but they will be replaced by lawyers or programmers who use AI. I would agree.

So we want to jump in on the excitement, and ensure the readers of mariadb.org blogs belong to the latter category.

… but it’s also exciting!

Because excitement is the overall feeling we get from the recent improvements in AI! After playing around with various LLMs in a multitude of areas of expertise, I feel like the little boy who discovered programming in the late 1970s. Wow, you can do this! Ooh, the possibilities are endless! Of course, there have been many similar bursts of excitement since then: Oh, there are high-level languages more expressive than simple BASIC and Z80 Assembler. Ah, Focus 4GL! Ooh, Delphi. Mmm, Python. Oh, GUIs. Wow, Open Source. Yay, Internet. Hey, Stack Overflow. Just to mention a few of my past excitements.

AI has a bad track record

Until now, AI has been a source of disappointment for me personally. Level 5 was an AI-ish tool provided in the early 1990s by the same company that made the Focus 4GL; later on I dabbled in KBMS by AICorp. The results were underwhelming, and I learned to steer clear of AI, with its track record of overpromising and underdelivering.

So what are the implications?

With all the press coverage, there is hardly any need to explain why OpenAI and ChatGPT excite me now. So let me instead elaborate on the consequences. Or rather, let me ask the question: How should MariaDB Foundation react to the new world of impressive and productive AI?

Three ways for us to ride the AI wave

We see three ways for us to ride the AI wave:

1. Increase our own productivity

2. Increase the productivity of MariaDB Server users

3. Indirectly improve the quality of AI responses about MariaDB Server 

1. Increasing our own productivity

In all likelihood, our own productivity is not a high concern of yours. But it is a concern of our sponsors, who expect us to deliver on our mission of Openness, Adoption, and Continuity – with the limited resources we have. So if we become better at communicating (more high-quality blog entries! better presentations at events! improved documentation!) without added resources, it benefits the end user of MariaDB Server.

Also, and very significantly: we should eat our own dog food. We can hardly help our user base use ChatGPT or Bard or LaMDA if we don’t use them ourselves.

2. Increasing the productivity of MariaDB Server users

If we can help you use AI to become more productive as a developer of applications using MariaDB Server, or as a DBA, then we’re doing our job properly.

There are plenty of blog entries out there that show how to get access to OpenAI (e.g. through https://nat.dev/), so we’re more inclined to show how best to use AI to create the right CREATE statements. How do you use AI to come up with the right initial database schema? To get it into third normal form, or to denormalise it for reporting purposes? To index it?

Or, for that matter, can AI help you improve upon your SELECT statements? Choose the right replication topology? Find the right connectors? Make sense of a strange error message? 

In all of the above cases, we can do the experimenting for you. Your time is limited, so why don’t we spend the time experimenting – and sharing the results with you?

3. Indirectly improving the quality of AI responses about MariaDB Server 

If I were the designer of a Large Language Model, I would certainly pay more attention to Wikipedia and Wikidata than to general blog entries. For a product like MariaDB Server, I would pay a lot of attention to the MariaDB Knowledge Base at https://mariadb.com/kb/ – as the canonical representation of knowledge about MariaDB Server. And based on our experiments so far, this indeed seems to be the case.

True, it’s hard to tie GPT output to individual pieces of knowledge out there. Nonetheless, we’re learning more as we experiment – which in turn means that we get an idea of where the LLMs get their information. Sometimes the source is under the direct control of MariaDB Foundation, and then we can experiment with ways of tuning it to become more accessible to AI models.

If AI answers contain misinformation about MariaDB Server, we have a problem. A problem on behalf of our users. How can you properly use our software if you get wrong advice?

To act on the misinformation, we can do a number of things. We can point out the misinformation to you, but that will only help those of you who read our blogs. We would ideally want to fix the AI output ASAP. Retrain the models. This is speculative, but we may in individual cases be able to trace a likely source of misinformation to some external website, which we can then approach with a suggestion to fix the bug.

Is that boiling the ocean? A drop of water on a hot stone? We will at least make an attempt at fixing some apparent misconceptions, because misinformation on highly frequented websites is a problem even without AI and with grandpa’s old Google queries.

MariaDB Foundation has a resident AI guru

We have a resident AI guru at MariaDB Foundation, Vlad Bogolin. My arbitrary definition of such a guru is one who has spent time using or at least experimenting with AI during the past few years (so my own adventures in the 1990s don’t count). To qualify as such a guru, it’s not enough to have played around with a Machine Learning library in Python. Experiments with natural language prompts and image creation are also required.

Vlad qualifies. His mindset is thus more attuned to using LLMs than that of those of us who (like me) started experimenting only after ChatGPT was announced. And this shows in his approach to nearly any problem.

ChatGPT suggests our blog titles

A few weeks ago, Vlad and I chatted about how to use ChatGPT. Can we write better blog entries, faster? Yes, we can, said Vlad. But, ehh, can they be published? Sure they can, if reviewed properly, said Vlad. Well, then, please draft a few experimental blog entries, I asked him, and he said he would.

But he took it a step further, and asked ChatGPT for ideas on what to blog about. He came up with five titles:

  1. Getting Started with MariaDB: A Comprehensive Guide for Beginners
  2. Understanding MariaDB Architecture: A Deep Dive
  3. Mastering MariaDB Indexes for Optimal Performance
  4. MariaDB Security Best Practices: Safeguard Your Data
  5. MariaDB Performance Tuning: Tips and Tricks

Not bad, even if that is a selection of the best ones.

Our plan: Highlight misinformation, one area at a time

Now, our plan is to take those blog titles and ask ChatGPT to write the entries. We will then highlight the misinformation. This is in line with all three of our ways to ride the AI wave. First, we produce more content for you, in a shorter time. Second and more importantly, we help you become more productive. And third, we learn about the nature of AI’s misconceptions, and about how we might fix them.

So, stay tuned for what’s coming. I hope you get, and share, the same kind of little-boy (or little-girl) excitement from this AI revolution that I did!