Due to a catalogue of issues, our previous quarterly update for developer metrics was not published. This time, however, we have made quite a few changes. In this post, we will summarise 2022 and what has happened in the first couple of months of 2023. All the data for this blog post can be found in CSV format in the release section of the MariaDB Metrics repository, along with everything you need to generate the metrics yourself.
For the main commit metrics, we use a tool called “GitDM”, or Git Data Miner, which was developed for the Linux kernel git trees to group commits by people and organisations. We have modified it in several ways so that it can provide the reports we require and work better with our git trees. We have also written scripts around it to automate the reports. All of this is open source and available here. In addition, we have pull request statistics, which are mined from GitHub’s API.
The most notable change has come in the form of categorisations. Previously, we produced these by re-parsing the metrics with a different configuration that used categories instead of organisation data. The two configurations were quickly drifting out of sync and becoming challenging to maintain. Therefore, we have modified GitDM so that it takes a map of organisations to categories. It now outputs an extra column in the “organisations” CSV file showing the category each organisation is in, and it can produce a categories-only CSV which is more accurate than our previous data.
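As a rough illustration of the idea (this is not GitDM's actual code or configuration format), the sketch below assumes a hypothetical organisation-to-category map and derives both the extra category column and the categories-only totals from the same per-organisation counts, so the two outputs cannot drift apart. The function names, the map and the fallback category are assumptions made for this example; the counts are the 2022 figures from the commits table later in this post, plus one clearly made-up entry.

```python
import csv
import io
from collections import defaultdict

# Hypothetical organisation -> category map. The pairings come from the
# commits table in this post; the real GitDM configuration is not shown here.
ORG_CATEGORIES = {
    "MariaDB Plc.": "MDBB",
    "MariaDB Foundation": "MDBF",
    "Codership": "Provider",
    "IBM": "Sponsor",
    "Amazon": "Other",
}

DEFAULT_CATEGORY = "Other"  # assumption: unmapped organisations fall back here


def add_categories(org_commits):
    """Return (organisation, category, commits) rows, one per organisation."""
    return [
        (org, ORG_CATEGORIES.get(org, DEFAULT_CATEGORY), commits)
        for org, commits in org_commits.items()
    ]


def category_totals(rows):
    """Sum commits per category from the same rows used for the org report."""
    totals = defaultdict(int)
    for _org, category, commits in rows:
        totals[category] += commits
    return dict(totals)


def to_csv(header, rows):
    """Render a header and rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# 2022 commit counts from the table in this post, plus one made-up entry.
commits_2022 = {
    "MariaDB Plc.": 1816,
    "MariaDB Foundation": 179,
    "Codership": 32,
    "IBM": 1,
    "Amazon": 54,
    "Example Contributor": 3,  # hypothetical, not real data
}

rows = add_categories(commits_2022)
print(to_csv(["organisation", "category", "commits"], rows))
print(to_csv(["category", "commits"], sorted(category_totals(rows).items())))
```

Because the category totals are computed from the already-categorised rows rather than from a second configuration, a change to the map shows up in both CSV outputs at once.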
We have also added several more projects to be tracked around the MariaDB Server. These are:
- mariadb_kernel – The MariaDB kernel for Jupyter notebooks. This repository may be renamed in future to avoid confusion with an operating system kernel
- mariadb-docker – A Docker configuration for MariaDB Server
- mariadb-connector-c – The client library for MariaDB
On top of all this, we have made a number of fixes and cleanups to the configuration files we use to determine which hacker is attached to which organisation.
Finally, the most obvious change is that MariaDB Corporation has been renamed MariaDB Plc. as part of their IPO. We will use their stock ticker (MRDB) when abbreviating. We have not made this change in the GitDM configuration yet, but it will happen before the next data snapshot.
Before I delve into the wider stats, I want to shine a light on the top organisations that contributed to MariaDB Server in 2022. These are:
MariaDB Plc. is at the top, as is to be expected: they pay the largest number of full-time developers for the MariaDB Server. There are others that are interesting here, though. It has been a bumper year for contributions from Amazon: this is more than double their contribution count for 2021 and, judging by the pull request pool and merged commits right now, 2023 will be even bigger. After this is Google Summer of Code. As I mentioned in a blog post earlier this week, we are huge supporters of the Google Summer of Code project, and it is great to see contributions come in from these new contributors.
Now, let us dig into the data. The September 2022 report was a little error-prone due to scripting and configuration issues, but we have now refined the process and this report should be a lot more accurate. Now that we are in 2023, the report covers 2020-2023 (to the beginning of March).
First up is the number of commits per organisation.
| Category | Organisation | 2020 | 2021 | 2022 | 2023 | Total |
|---|---|---|---|---|---|---|
| MDBB | MariaDB Plc. | 2460 / 81.43% | 2123 / 83.39% | 1816 / 82.51% | 342 / 74.35% | 6741 / 81.93% |
| MDBF | MariaDB Foundation | 297 / 9.83% | 192 / 7.54% | 179 / 8.13% | 80 / 17.39% | 748 / 9.09% |
| Provider | Codership | 71 / 2.35% | 56 / 2.2% | 32 / 1.45% | 14 / 3.04% | 173 / 2.1% |
| | CONNECT | 65 / 2.15% | 55 / 2.16% | 1 / 0.05% | – | 121 / 1.47% |
| | Tempesta | 2 / 0.07% | 3 / 0.18% | – | – | 5 / 0.06% |
| Sponsor | IBM | 63 / 2.09% | 5 / 0.2% | 1 / 0.05% | – | 69 / 0.84% |
| GSoC | GSoC | 1 / 0.03% | 8 / 0.31% | 34 / 1.54% | – | 43 / 0.52% |
| Distro | All Distros | 20 / 0.66% | 20 / 0.79% | 17 / 0.77% | 2 / 0.43% | 59 / 0.72% |
| Other | Amazon | 2 / 0.07% | 24 / 0.94% | 54 / 2.45% | 9 / 1.96% | 89 / 1.08% |
| | Others | 40 / 1.32% | 60 / 2.36% | 67 / 3.04% | 13 / 2.83% | 180 / 2.19% |
- Red Hat is in a grey area after the IBM acquisition. For this matrix, we have put them under “Distro”, separate from IBM and “Sponsor”.
- “CONNECT” signifies the connect engine contributions which were by a single author.
- There are more entities in the “Sponsor” and “Provider” categories, but for simplification, these have been put into “Other” for this table along with independent contributors.
- Commits don’t always tell the full story: a single commit could be anywhere between one line of code and thousands.
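Each cell in the table above pairs a commit count with a percentage. A minimal sketch of how such a cell can be reproduced is shown below, assuming the percentage is that organisation's share of all commits in the same column, rounded to two decimal places. The `format_cell` helper is a name invented for this example and is not part of GitDM; the counts are the 2022 column of the table, with “–” treated as zero.

```python
# 2022 commit counts per row of the table above; "–" cells are treated as 0.
commits_2022 = {
    "MariaDB Plc.": 1816,
    "MariaDB Foundation": 179,
    "Codership": 32,
    "CONNECT": 1,
    "Tempesta": 0,
    "IBM": 1,
    "GSoC": 34,
    "All Distros": 17,
    "Amazon": 54,
    "Others": 67,
}


def format_cell(commits, total):
    """Format one cell as 'commits / share%', with the share rounded to 2 places."""
    share = round(commits / total * 100, 2)
    return f"{commits} / {share}%"


# The column total is the sum of every row for that year.
total_2022 = sum(commits_2022.values())

for org, commits in commits_2022.items():
    print(f"{org}: {format_cell(commits, total_2022)}")
```

Summing the column first and dividing each row by that total is what makes the shares in a column add up to roughly 100%, give or take rounding.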
Next up, we have the number of contributions to other projects in 2022. This is in “commits / hackers” format:
| Organisation | libmarias3 | ColumnStore Engine | Connector/C | MariaDB Docker |
|---|---|---|---|---|
| MariaDB Plc. | 2 / 2 | 513 / 17 | 113 / 9 | 15 / 1 |
| MariaDB Foundation | – | – | – | 100 / 2 |
| GSoC | – | 11 / 2 | – | – |
| Amazon | – | – | 1 / 1 | – |
| Other | – | 6 / 2 | 5 / 4 | 4 / 3 |
- The MariaDB Jupyter Kernel repository saw no commits in 2022, so it has been omitted. There has been more work on it in 2023, particularly by independent developers.
Finally, we have the MariaDB Server pull request metrics for the last couple of quarters. This shows the number of newly opened PRs for each week, the number closed without being merged, and the number merged. The final two columns show the all-time running total number of PRs and the number that were still open at the end of that week.
| Week Ending | New PRs | Closed PRs | Merged PRs | Total PRs | Still Open PRs |
|---|---|---|---|---|---|
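To make the column definitions concrete, here is a small sketch of how weekly figures like these could be tallied from a list of pull requests. It is an illustration only, not the script we use: the `PullRequest` record is a hypothetical, heavily reduced stand-in for what GitHub's API returns, the choice of Sunday as the week's end is an assumption, and the three sample pull requests are made up.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class PullRequest:
    """Hypothetical minimal record; real GitHub API responses carry far more."""
    opened: date
    closed: Optional[date] = None  # set when the PR was closed or merged
    merged: bool = False


def week_ending(day: date) -> date:
    """Return the Sunday that ends the week containing `day` (an assumption)."""
    return day + timedelta(days=6 - day.weekday())


def weekly_report(prs, first_week_end: date, weeks: int):
    """Yield (week_end, new, closed, merged, total, still_open) for each week."""
    for i in range(weeks):
        end = first_week_end + timedelta(weeks=i)
        start = end - timedelta(days=6)
        # Opened during this week.
        new = sum(start <= pr.opened <= end for pr in prs)
        # Closed during this week without being merged.
        closed = sum(
            pr.closed is not None and not pr.merged and start <= pr.closed <= end
            for pr in prs
        )
        # Merged during this week.
        merged = sum(
            pr.closed is not None and pr.merged and start <= pr.closed <= end
            for pr in prs
        )
        # Running total of every PR opened up to the end of this week.
        total = sum(pr.opened <= end for pr in prs)
        # Opened by the end of this week and not yet closed or merged.
        still_open = sum(
            pr.opened <= end and (pr.closed is None or pr.closed > end)
            for pr in prs
        )
        yield end, new, closed, merged, total, still_open


# Made-up sample data, for illustration only.
sample_prs = [
    PullRequest(date(2023, 1, 2), date(2023, 1, 4), merged=True),
    PullRequest(date(2023, 1, 3), date(2023, 1, 10), merged=False),
    PullRequest(date(2023, 1, 9)),
]

for row in weekly_report(sample_prs, week_ending(date(2023, 1, 2)), weeks=2):
    print(row)
```

Note that “still open” is a snapshot taken at the end of each week, whereas the new, closed and merged counts only cover activity within that week.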
A couple of conclusions from the pull request data:
- We have been bombarded with pull requests and whilst doing our best, we are falling behind.
- At the end of 2022 everyone was away, and at the beginning of February 2023 many were at FOSDEM, which significantly reduced the number of possible reviews.
Since the end of the reporting period, new PRs have been trending rapidly upward. This is likely due to an influx of GSoC candidates getting up to speed with the codebase, as well as a large influx from entities such as Amazon.
One of the Foundation’s core focuses for 2023 is going to be trying to bring the number of open pull requests down even more.
There are some special people we should thank based on the stats and the pull requests currently open:
- Amazon’s RDS team for a significant number of contributions and for making my job reviewing the code very busy.
- Weijun Huang, an independent contributor who used to be a GSoC student and is by far the largest independent contributor of 2023 so far.
- Alexander Barkov from MariaDB Plc., who touched more lines of code in 2022 than anyone else!