Adaptive Query Optimizer for MariaDB Vector - Innovation Winner of MariaDB Python Hackathon 2025

We recently announced the winners of the MariaDB Python Hackathon. We sat down with the Innovation track first place winners to learn more about the team and their submission.

Aakanksha Singh and Mihir Phalke developed an Adaptive Query Optimizer for MariaDB Vector, addressing performance challenges in vector similarity search operations. They were interviewed by Robert Silén, Community Advocate, and Kaj Arnö, Executive Chairman of MariaDB Foundation. You can watch the recorded interview on YouTube or read it below.

Introducing Mihir and Aakanksha

Mihir: My name is Mihir Phalke and alongside me is Aakanksha Singh. We’re both third-year computer engineering students based out of Mumbai. Together we have been working on various hackathons. We were interested in starting our journey in open source and that’s why we are here.

We encountered MariaDB through our coursework where we have MySQL in our curriculum. So that’s how we were confident about taking part in this hackathon.

Did you know about vectors before the hackathon?

Mihir: Since we have experience from various hackathons and from building ML-related applications, we were aware of what vectors are. From there we researched in more depth how they work internally and built our project.

How did you arrive at this particular submission idea?

Aakanksha: We were looking at what problems we could address when it came to the innovation track. When it comes to building AI models and RAG systems, MariaDB has already implemented the vector data type as well as HNSW indexing.

Our solution – the cost-based optimizer – is an automated layer built on top of the HNSW indexing that helps developers find the approach with the least latency. We wanted to build on what was already there and innovate, so we went through a lot of the problem statements and issues.

Kaj: I think it’s quite innovative to sort of compare how you do the index search with the vector and then doing other types of searches. That’s sort of a fundamental question that somebody who’s exposed to vectors for the first time has to ask themselves: when do I use a vector search versus when do I use a standard SQL where clause?

What’s the benefit of your solution for developers?

Aakanksha: The core idea of this project was to make the developer’s job or work really easy. They can focus on building the code and the actual implementation, not really having to worry about which method to go with – whether they should be applying a vector-first approach or an SQL-first approach.

Kaj: MariaDB itself has sort of this dual where clause where you combine a classic where clause which is just SQL-based with this fluffy vector-based thing. So as the developer, you don’t have to do all of the thinking about “well now I do a pure SQL one and now I do a vector-based one.”
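To make the "dual where clause" concrete, here is a minimal Python sketch of a query that combines a classic SQL filter with a vector distance ordering. It is not taken from the team's project. The table name, column names, and helper names are hypothetical; `VEC_DISTANCE_EUCLIDEAN` and `VEC_FromText` are MariaDB Vector functions, and the `?` placeholder assumes a driver that uses qmark-style parameters.

```python
from typing import Sequence


def to_vector_text(vector: Sequence[float]) -> str:
    """Format a Python sequence as the JSON-array text that VEC_FromText accepts."""
    return "[" + ",".join(str(float(x)) for x in vector) + "]"


def hybrid_query(table: str, vector_column: str, filter_sql: str, k: int) -> str:
    """Build a query with both a classic WHERE filter and a vector ordering.

    Hypothetical helper: `table`, `vector_column`, and `filter_sql` are
    interpolated directly, so they must come from trusted code, never from
    user input. The query vector itself is passed as a bound parameter.
    """
    return (
        f"SELECT id, VEC_DISTANCE_EUCLIDEAN({vector_column}, VEC_FromText(?)) AS distance "
        f"FROM {table} "
        f"WHERE {filter_sql} "
        f"ORDER BY distance LIMIT {int(k)}"
    )


# Example with a hypothetical `products` table that has an `embedding` column.
sql = hybrid_query("products", "embedding", "category = 'shoes'", 5)
params = (to_vector_text([0.1, 0.2, 0.3]),)
```

With a database connection, the pair would be passed to the driver as `cursor.execute(sql, params)`. Whether the server filters first or searches the vector index first for such a query is exactly the choice the optimizer is meant to take off the developer's hands.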

How can developers use your contribution?

Mihir: Basically our vision with this product was to publish this as a Python library which can be installed by users, and then they can import this particular library and start building upon this.

What are the future enhancements you’re planning?

Mihir: Currently we have a threshold value that determines whether we go with the SQL-first approach or the vector-first approach. But we want to train an ML model on historical data to determine what the optimal threshold should be.

Adding a machine learning layer to that can be done as a future enhancement, and we are trying to do that as well from our end. But we are also open for other people to work on this and submit PRs.
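The threshold rule Mihir describes could be sketched roughly as follows. The interview does not say what quantity the threshold is applied to; this sketch assumes it is the estimated fraction of rows that pass the SQL filter. The function name, the 0.1 default, and the direction of the rule are illustrative assumptions, not the team's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class StrategyChoice:
    strategy: str        # "sql_first" or "vector_first"
    selectivity: float   # fraction of rows expected to pass the SQL filter


def choose_strategy(matching_rows: int, total_rows: int,
                    threshold: float = 0.1) -> StrategyChoice:
    """Pick an execution order from an estimated filter selectivity.

    Assumed heuristic: when the filter keeps only a small fraction of rows,
    apply it first and rank the few survivors by distance; otherwise search
    the vector index first and filter the candidates afterwards.
    """
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    selectivity = matching_rows / total_rows
    strategy = "sql_first" if selectivity <= threshold else "vector_first"
    return StrategyChoice(strategy, selectivity)


# A filter keeping 50 of 10,000 rows is very selective.
selective = choose_strategy(50, 10_000)
# A filter keeping half the table is not.
broad = choose_strategy(5_000, 10_000)
```

A learned model, as proposed above, would replace the fixed `threshold` default with a value predicted from historical query timings.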

Kaj: So you are actively encouraging contributions and you will devote some energy into focusing on them and commenting.

Mihir: Yes, absolutely.

Robert: By the way, it is interesting that there is a MariaDB Jira ticket describing the issue that your submission solves: MDEV-33412 “cost-based optimizer choice for k-NN indexes”.

How was your experience participating in the hackathon?

Mihir: First of all, the experience was amazing for us. The constant support from the MariaDB team and the AMA (Ask Me Anything) sessions which had been conducted – all of that was very helpful for us. Plus the documentation of MariaDB was very clear, so we had a great time exploring that and there were very few hurdles in understanding what MariaDB actually is.

But apart from that, while we were building, we did run into lots of bugs in our own code, but we kept on going and this project is now a reality. So we’re happy about that.

What are your future plans?

Mihir: Honestly speaking, what we want to do first is enhance this project – as I said, by adding the machine learning layer. That’s something which we want to explore so that we get exposure to both databases and machine learning. That’s the kind of intersection we’d love to contribute to.

Talking about the future in terms of our career, we’ll always be willing to learn. In today’s world everything is changing so rapidly, so that ability to learn and keep adapting is essential and that’s what we want to focus on. But this was an amazing start to the open source journey and we look forward to contributing to it.

Robert: That’s the nice thing with contributing to open source – that you can get others to use your solutions and contribute to a larger ecosystem. So great to hear that you enjoyed that.

Kaj: As we said in the Ask Me Anything sessions during the hackathon, we are super happy to shine some extra light on your project. We will link to it of course from this blog entry, but there will also be a set of pointers to all of the contributions that are relevant and easy to use, and yours is obviously a top one amongst those. So we hope to indirectly bring much more attention to your contribution.