Comments on Kostja’s motivations for hacking MySQL

Recently Kostja posted two insightful blog posts: one with his thoughts on the currently fragmented MySQL landscape, and one on the quality of a piece of code contributed by a “community member”, which is a MySQL euphemism for a person not employed by MySQL. (Does that mean the full-time MySQL developers are themselves not members of their own community?)

I wanted to comment on both posts, but found that Kostja only allows logged-in LiveJournal users to comment, and I am not one. Since the posts were interesting enough, I suppose they deserve a response in a new blog post like this one instead.

From “RDBMS software is difficult” (slightly reordered)

The main reason it is harder to do changes with MySQL is a larger legacy, including political and managerial, but you get into exact same situation in any project after your first release. I said that all things considered, the current MySQL trunk is perhaps as good starting point for rethinking as the current Drizzle. […] I would not want to actually diminish importance of Drizzle (initially, I was fond of it and rather wanted to join; the reason I didn’t, I’ve just spelled out). I’d love to be proven wrong, but I don’t see it becoming such a universal piece of software that I personally would like to be contributing to.

Recent years there’s been a serious fragmentation of technical thought in MySQL ecosystem. Drizzle, MariaDB, Percona are excellent for community, but are not at all good for our ability to make MySQL a universal database platform. I mean, ability to make MySQL a database platform comparable to what Linux/Unix is nowadays to operating systems. Truth be said, I am not at all sure that my current employer, Oracle, is a good host to seek this holy grail either. Perhaps we’ll never get there, not with this project.

Kostja, you are not alone in such thoughts. I think it makes sense to separate Drizzle from the other forks when looking at the MySQL ecosystem. In my opinion, when Drizzle got started, all the good reasons for a new fork existed: stagnating development in the original project, patches not flowing into the main trunk, and no response to new technological needs (the cloud). At the same time, Drizzle’s approach is simply not useful in the short term. It has now been two years since Drizzle got started. They will go into beta this summer, and even their first release does not aim to address the entire MySQL space.

This means that even in the best-case scenario for Drizzle, it simply wasn’t realistic in the short term for all MySQL developers to join it. MySQL has a large install base of servers currently in production. You cannot turn your back on that; on the contrary, your best bet for a universal database is of course always the one that already has so many users. Even so, I think there were good reasons for a small group of developers to start the Drizzle fork. The cost is that they are essentially absent from MySQL/MariaDB development, apart from the friendly support we still give each other.

So regarding your thoughts on this, I just wanted to say that this is exactly the reason I work on MariaDB and not Drizzle. If Drizzle one day “gets there”, I will not hesitate to redirect my energy when the time is right, but for the time being, these are the reasons I work on MariaDB.

As for all the other forks, which remain more or less compatible with the original MySQL code base, the situation is different. It is mostly a result of how MySQL was organized: on the outside, even if we want to, we cannot participate in some of the MySQL infrastructure, such as Pushbuild; we cannot call our packages “MySQL” for trademark reasons (Percona did so at first, but no longer does); and MySQL will not incorporate our code (when released as GPL), so we end up diverging.

So when looking at the big picture, it is a bit messy at the moment. At the same time, it is nice to see how the people in the MySQL community, developers or otherwise, are all very committed to continuing to work together despite the current obstacles. For instance, I myself am not convinced that Oracle is the perfect steward to take MySQL forward, but I don’t think MySQL AB was anywhere near perfect either! As long as Oracle pays you a salary and you can develop GPL code, it is up to the community as a whole to make sure there is a future for that code. Oracle is welcome to contribute – and they do – but the future of our open source database must not depend on what one company is doing.

Then in “How on earth is it possible to accept this” Kostja laments the low quality of a contributed patch:

Should a semi-working, semi-documented code be accepted, expecting that there will be more patches?

The answer is obviously “No”. A semi-working patch should be reviewed and feedback given, so the original developer can continue to perfect it.

Unfortunately, this was not happening in MySQL. In MariaDB 5.2 we have now pulled in quite a few patches created in the MySQL community over the years. We had to spend significant effort to bring them up to acceptable quality. This is not how an open source project is supposed to work. (If you send a low-quality patch to Linux, it’s not like Linus will hold your hand and fix it for you.) But since this kind of workflow had not been in place before, and many patches were several years old, we have considered our work on them a “bootstrapping effort”. It didn’t feel right to go back to someone who contributed something 3 years ago and ask them to fix a few things now. Even so, we don’t intend to continue this way: we want to turn MariaDB into a project where it is up to the contributor to work through as many iterations of a patch as needed, and then we commit it.

And by the way, it is not like being employed by MySQL/Sun/Oracle magically makes you a flawless coder either. In recent merges we have started to reject some patches coming from MySQL, because they don’t pass our review or, more often, because they get caught by our automated QA (buildbot). For instance, by including some engines that MySQL doesn’t, we get broader test coverage than MySQL and sometimes catch errors that pass the MySQL process.

And to close the loop, I suspect MariaDB developers (in particular those employed by Monty Program) are not perfect either! One of the patches we spent some effort improving before committing it was the Segmented Key Cache. But now we are getting feedback from the original developer that the finalized patch gives less of a performance boost than his original patch did. Maybe we broke something while reviewing? This is still being investigated as I write.

I just wanted to respond to this because there is a perception among many in MySQL (and I speak perhaps more of some managers than of people like Kostja) that a non-employee simply couldn’t produce something useful and that this community thing is just a distraction. I hope MariaDB 5.2 proves that there have been useful contributions, and I hope the future will prove there can be many more. And no matter who you are employed by, you will produce bugs every now and then.
