Apache Airflow integration for MariaDB - winner of MariaDB BangPypers Hackathon 2025

We recently announced the winners of the MariaDB Hackathon at the BangPypers meetup in Bengaluru. We sat down with the first-place winners of the Integration track to learn more about the team and their submission.

Pratush Maheshwari and Jyothi Muthuraj developed an Apache Airflow integration for MariaDB, addressing a significant gap in the data engineering ecosystem. They were interviewed by Robert Silén, Community Advocate, and Kaj Arnö, Executive Chairman of MariaDB Foundation. You can watch the recorded interview on YouTube or read the summary below.

But first, a short introduction to the topic of data orchestration.

Introducing Data Orchestration Frameworks (for non-Data Engineers)

Before we move to the interview, it’s worth introducing the concept of data orchestration frameworks — especially for readers who are not data engineers.

Put simply, these frameworks help you organise and run complex chains of data tasks — a bit like turning a handful of individual scripts into a well-rehearsed orchestra. Each instrument (or task) plays in the right order, with the right timing, so that the whole data pipeline performs smoothly and predictably. You could think of it as “scripting made smarter” — scripts adapted for today’s large-scale, cloud-based data environments.

Some of the most widely used data orchestration frameworks include Apache Airflow, Dagster, Prefect, and Luigi. Each helps engineers define, schedule, and monitor workflows so that data can move and transform reliably across increasingly complex systems.
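To make the idea concrete, here is a toy sketch in plain Python of what an orchestrator does at its core: it takes a set of tasks plus their dependencies and runs each task only after the tasks it depends on have finished. This is not Airflow's API, and the task names and the run_pipeline helper are made up for illustration. Real frameworks add scheduling, retries, and monitoring on top of this.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run every task once, after all the tasks it depends on.

    tasks: maps a task name to a function that receives the results so far.
    dependencies: maps a task name to the set of task names it waits for.
    """
    results = {}
    order = []
    # static_order() yields task names so that dependencies always come first.
    for name in TopologicalSorter(dependencies).static_order():
        results[name] = tasks[name](results)
        order.append(name)
    return order, results


# Hypothetical three-step pipeline: extract, transform, load.
tasks = {
    "extract": lambda done: [3, 1, 2],
    "transform": lambda done: sorted(done["extract"]),
    "load": lambda done: f"loaded {len(done['transform'])} rows",
}
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

order, results = run_pipeline(tasks, dependencies)
print(order)
print(results["load"])
```

In a framework such as Airflow, the same shape appears as a DAG (directed acyclic graph) of tasks, with the framework deciding when and where each task runs.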

Who are you and what’s your background?

Pratush: Hello everyone. I’m a data engineer with about six years of experience across pharmaceutical and retail domains, working extensively with data systems and databases.

Jyothi: Hey, hello everyone. I’m from Bangalore with over five years in IT, currently working as a senior data engineer in retail. I’m passionate about building robust data pipelines.

Do you have any previous hackathon experience?

Pratush: I participated in some college hackathons, but recently Jyothi and I started actively seeking them out on platforms like HackerEarth. When we discovered the MariaDB hackathon and saw it was focused on data engineering, it felt like a perfect match.

Jyothi: I’d mainly done internal hackathons at Target, where I work. For the past few months we’ve been exploring external hackathons as a way to learn new technologies and contribute to the community.

How did you find our hackathon and what caught your interest?

Jyothi: We found it through social media and developer communities. As data engineers who work with databases daily, the name “MariaDB” immediately caught our attention. What sealed it was the focus—most hackathons nowadays center on AI or general data topics, but this one was specifically about database technology, which is our expertise.

What was your first experience with MariaDB?

Pratush: I learnt about MySQL when I started learning SQL. MySQL was my go-to tool – it was very easy to get started with. I’ve been using various database connection tools in my work, and I have seen MariaDB listed too. I really started exploring MariaDB a few months ago while working with embeddings and RAG systems. That’s when I discovered MariaDB’s vector storage capabilities and realized it was more than just a MySQL fork.

Jyothi: I learned about MariaDB through team discussions. After someone mentioned MariaDB, I researched how it compared to MySQL and discovered its unique features and performance improvements.

How did you end up choosing to integrate MariaDB with Apache Airflow?

Pratush: Since we use Airflow daily, I knew MySQL connectors existed. Out of curiosity, I compared the MariaDB and MySQL Python libraries and was surprised: SELECT statements were two to three times faster with MariaDB, and operations like LOAD or JSON updates showed similar improvements. That’s when we realized we should create a dedicated, optimized MariaDB connector.

Jyothi: When we were researching this, I too was surprised to discover that MariaDB wasn’t available as a dedicated connector in Airflow. Airflow is used everywhere in the industry – it’s an essential tool for data engineering. But MariaDB, despite having so many advanced features compared to MySQL, didn’t have its own connector. That’s why we decided to participate in the Integration track of the MariaDB hackathon, so that data engineers could easily use these powerful features.

The first time we ran the comparison, MariaDB was three times faster, whether in bulk ingestion, SELECT operations, or JSON operations. It was performing far better than MySQL. We felt that everyone should definitely be using this MariaDB connector once we make it available.

What makes MariaDB special for your use case?

Pratush: When we were going through all the properties and features that MariaDB offers, the ColumnStore capability really stood out. I think more or less everyone uses analytical databases these days, and when you talk about real-time analytical databases, you need data to be available almost instantly. That’s where features like ColumnStore import, which is very, very fast, come in.

Jyothi: People can use ColumnStore import in their code, but we wanted to abstract all that complexity – all the coding mechanisms – and just let users focus on their actual work. They can plug in their credentials, use the connector, and that’s it.

Additionally, having everything in one database – structured data, vectors for AI applications, indexes – makes MariaDB a great choice. You don’t have to think about multiple database solutions; MariaDB provides a comprehensive platform.

What’s the future roadmap for the Airflow connector?

Pratush: The path to full integration isn’t simple, and we’re taking it step by step. Right now, the Airflow team is quite busy with their new release, but we’ve contacted them about our connector and our plans.

We’re breaking it down into manageable steps:

Immediate next step: Making the connector available on PyPI so anyone can easily install it with pip. Even now, our connector is available from our GitHub – people can download it and use it right away.
Full Airflow integration: Working along with the Airflow community to integrate it completely into Airflow so anyone can use it without having to look elsewhere. We want it to be as seamless as possible.

The connector is already functional and can be used directly. We’re working to make it even more accessible and integrated into the broader ecosystem.

Kaj: That is music to our ears. We are happy to support you in that process!

What was it like participating in the hackathon?

Pratush: First of all, I would like to thank you guys. The AMA (Ask Me Anything) sessions were really helpful. Just reading about what the hackathon is about versus actually attending the sessions and getting direct guidance made a huge difference. The way you explained things about MariaDB and what the competition was looking for – discussing aspects like code elegance, the impact it makes, and other evaluation criteria – helped us frame our idea and approach much better.

Taking part in the hackathon during October was special too. We had festivals in India, I had my birthday in October, and I was very happy to be working on this project. And definitely very happy to be on the winning side! It was a great experience and I’m definitely looking forward to more events from MariaDB. I’ll be more than happy to take part and contribute wherever I can.

Jyothi: Yeah, just to add – it really gives an individual a lot of confidence to contribute to open source projects. MariaDB is open source, and now we’re contributing something to it. Imagining someone seeing the MariaDB connector in their Airflow UI and using it brings a lot of happiness. That happiness really makes a lot of difference. That’s the most interesting part, to be frank: contributing to open source and making a difference.

What did you learn about MariaDB through this process?

Pratush: We learned a lot about MariaDB during this hackathon. As I mentioned earlier, while learning about AI and RAG systems, I discovered that MariaDB has features like vector storage. But through this deeper dive, we discovered so much more, such as ColumnStore for analytics.