MariaDB Contribution Statistics, March 2023

Due to a catalogue of issues, our previous quarterly update for developer metrics was not published. This time, however, we have made quite a few changes. In this post, we will summarise 2022 and what has happened in the first couple of months of 2023. All the data for this blog post can be found in CSV format in the release section of the MariaDB Metrics repository, along with everything you need to generate the metrics yourself.

Changes to metrics gathering

For the main commit metrics, we use a tool called “GitDM” (Git Data Miner), which was developed for the Linux kernel git trees to group commits by people and organisations. We have modified it in several ways so that it can provide the reports we require and work better with our git trees. We have also written scripts around it to automate the reports. All of this is open source and available here. In addition, we have pull request statistics, which are mined from GitHub’s API.

The most notable change has come in the form of categorisations. Previously we did this by re-parsing the metrics with a different configuration that used categories instead of organisation data. This was very quickly getting out of sync and becoming challenging to maintain. Therefore, we have modified GitDM’s output so that it has a map of organisations into categories. It now outputs an extra column in the “organisations” CSV file to show the category each organisation is in, and it can produce a categories-only CSV which is more accurate than our previous data.
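To make the idea concrete, here is a minimal sketch of an organisation-to-category map applied to organisation rows, giving both the extra column and a categories-only summary. It is not GitDM's actual code or configuration format: the function names, the CSV field names and the default of "Other" for unmapped organisations are assumptions for illustration. The commit counts are a few of the 2022 figures from the commits table later in this post.

```python
import csv
import io
from collections import Counter

# Hypothetical organisation -> category map. The category names follow the
# tables in this post; the real GitDM configuration syntax is not shown here.
ORG_CATEGORIES = {
    "MariaDB Foundation": "MDBF",
    "Codership": "Provider",
    "CONNECT": "Provider",
    "IBM": "Sponsor",
}


def add_category_column(org_rows, categories, default="Other"):
    """Return copies of the organisation rows with a 'category' field added.

    Organisations missing from the map fall back to `default` (an assumption).
    """
    return [
        {**row, "category": categories.get(row["organisation"], default)}
        for row in org_rows
    ]


def commits_by_category(categorised_rows):
    """Sum commits per category to build the categories-only view."""
    totals = Counter()
    for row in categorised_rows:
        totals[row["category"]] += int(row["commits"])
    return dict(totals)


def to_csv(rows):
    """Serialise a list of dict rows to CSV text, header first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# A few 2022 commit counts from the commits table in this post.
ORG_COMMITS_2022 = [
    {"organisation": "MariaDB Foundation", "commits": 179},
    {"organisation": "Codership", "commits": 32},
    {"organisation": "CONNECT", "commits": 1},
    {"organisation": "IBM", "commits": 1},
    {"organisation": "Amazon", "commits": 54},
]

categorised = add_category_column(ORG_COMMITS_2022, ORG_CATEGORIES)
print(to_csv(categorised))
print(commits_by_category(categorised))
```

Keeping the map in one place means the per-organisation and per-category views are derived from the same rows, so they cannot drift apart the way two separately maintained configurations can.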

We have also added several more projects to be tracked around the MariaDB Server. These are:

  • mariadb_kernel – The MariaDB kernel for Jupyter notebooks; this repository may be renamed in future to avoid confusion with an operating system kernel
  • mariadb-docker – A Docker configuration for MariaDB Server
  • mariadb-connector-c – The client library for MariaDB

On top of all this, we have made a number of fixes and cleanups to the configuration files we use to determine which hacker is attached to which organisation.
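As a rough illustration of what such a configuration does, the sketch below attributes commits to organisations by the author's email domain and counts both commits and distinct hackers. This is a simplified model, not GitDM's real file format or matching logic; the domains, organisation names and function names are all made up for the example.

```python
from collections import defaultdict

# Hypothetical email-domain -> organisation map (illustrative names only).
DOMAIN_TO_ORG = {
    "example-corp.com": "Example Corp",
    "example-foundation.org": "Example Foundation",
}


def organisation_for(email, domain_map, default="Unknown"):
    """Look up the organisation for an author email by its domain."""
    domain = email.rsplit("@", 1)[-1].lower()
    return domain_map.get(domain, default)


def summarise(commit_author_emails, domain_map):
    """Return {organisation: (hackers, commits)} for one email per commit."""
    commits = defaultdict(int)
    hackers = defaultdict(set)
    for email in commit_author_emails:
        org = organisation_for(email, domain_map)
        commits[org] += 1
        hackers[org].add(email.lower())
    return {org: (len(hackers[org]), commits[org]) for org in commits}


# One entry per commit; the same author can appear many times.
authors = [
    "alice@example-corp.com",
    "alice@example-corp.com",
    "bob@example-corp.com",
    "carol@example-foundation.org",
    "dave@somewhere-else.net",
]
print(summarise(authors, DOMAIN_TO_ORG))
```

A domain-only lookup like this cannot handle people who change employer or commit from personal addresses, which is one reason the real mappings need regular fixes and cleanups.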

Finally, the most obvious change is that MariaDB Corporation has been renamed MariaDB Plc. as part of their IPO. We will use their stock ticker (MRDB) when abbreviating. We have not made this change in the GitDM configuration yet, but it will happen before the next data snapshot.

Top Organisations of 2022

Before I delve into the wider stats, I want to shine a light on the top organisations that contributed to MariaDB Server in 2022. These are:

| Name | Hackers | Commits |
| --- | --- | --- |
| MariaDB Plc. | 36 | 1816 |
| MariaDB Foundation | 8 | 179 |
| Amazon | 15 | 54 |
| GSoC | 3 | 34 |
| Codership | 5 | 32 |

Top 5 organisations of 2022 ordered by commit

MariaDB Plc. is at the top, as is to be expected: they pay the largest number of full-time developers for the MariaDB Server. But there are others that are interesting here. It has been a bumper year for contributions from Amazon. This is more than double their contribution count for 2021 and, judging by the pull request pool and merged commits right now, 2023 will be even bigger. After this is Google Summer of Code. As I mentioned in a blog post earlier this week, we are huge supporters of the Google Summer of Code project, and it is great to see contributions come in from these new contributors.

Deeper Dive

Now, let us dig into the data. The September 2022 report was a little error-prone due to scripting and configuration issues, but we have now refined the process and this report should be a lot more accurate. Now that we are in 2023, the report covers 2020 to 2023 (up to the beginning of March).

First up is the number of commits per organisation.

| Category | Entity | 2020 | 2021 | 2022 | 2023 | Total |
| --- | --- | --- | --- | --- | --- | --- |
| MDBB | MariaDB Plc. | 2460 / 81.43% | 2123 / 83.39% | 1816 / 82.51% | 342 / 74.35% | 6741 / 81.93% |
| MDBF | MariaDB Foundation | 297 / 9.83% | 192 / 7.54% | 179 / 8.13% | 80 / 17.39% | 748 / 9.09% |
| Provider | Codership | 71 / 2.35% | 56 / 2.2% | 32 / 1.45% | 14 / 3.04% | 173 / 2.1% |
| | CONNECT | 65 / 2.15% | 55 / 2.16% | 1 / 0.05% | | 121 / 1.47% |
| | Tempesta | 2 / 0.07% | 3 / 0.18% | | | 5 / 0.06% |
| Sponsor | IBM | 63 / 2.09% | 5 / 0.2% | 1 / 0.05% | | 69 / 0.84% |
| GSoC | GSoC | 1 / 0.03% | 8 / 0.31% | 34 / 1.54% | | 43 / 0.52% |
| Distro | All Distros | 20 / 0.66% | 20 / 0.79% | 17 / 0.77% | 2 / 0.43% | 59 / 0.72% |
| Other | Amazon | 2 / 0.07% | 24 / 0.94% | 54 / 2.45% | 9 / 1.96% | 89 / 1.08% |
| | Others | 40 / 1.32% | 60 / 2.36% | 67 / 3.04% | 13 / 2.83% | 180 / 2.19% |
| Total | | 3021 | 2546 | 2201 | 460 | 8228 |

Commits by organisation

Notes:

  1. Red Hat is in a grey area after the IBM acquisition. For this matrix, we have put them under “Distro”, separate from IBM and “Sponsor”.
  2. “CONNECT” signifies the CONNECT engine contributions, which were by a single author.
  3. There are more entities in the “Sponsor” and “Provider” categories, but for simplification, these have been put into “Other” for this table along with independent contributors.
  4. Commits don’t always tell the full story: a commit could be anything from one line of code to thousands.
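For anyone reproducing the table, each cell is the commit count followed by that entity's share of the column total. The sketch below recomputes a few cells from the figures above. The helper names are my own, and rounding to two decimal places with trailing zeros dropped is an assumption inferred from how the published cells look.

```python
def share(commits, total):
    """Percentage of `total` represented by `commits`, to two decimals."""
    return round(100 * commits / total, 2)


def cell(commits, total):
    """Format a table cell as 'commits / percent%'."""
    return f"{commits} / {share(commits, total)}%"


# Column totals and a few counts taken from the commits table in this post.
TOTAL_2021 = 2546
TOTAL_2022 = 2201

print(cell(1816, TOTAL_2022))  # MariaDB Plc., 2022
print(cell(179, TOTAL_2022))   # MariaDB Foundation, 2022
print(cell(54, TOTAL_2022))    # Amazon, 2022
print(cell(56, TOTAL_2021))    # Codership, 2021
```

Because the 2023 column only covers the year up to the beginning of March, its percentages are based on a much smaller total and will move more than the full-year columns.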

Next up, we have the number of contributions to other projects in 2022. This is in “commits / hackers” format:

libmarias3 | ColumnStore Engine | Connector/C | MariaDB Docker
MariaDB Plc. | 2 / 2 | 513 / 17 | 113 / 9 | 15 / 1
MariaDB Foundation | 100 / 2
GSoC | 11 / 2
Amazon | 1 / 1
Other | 6 / 2 | 5 / 4 | 4 / 3
Commits / Hackers for 2022 in other MariaDB projects
  1. The MariaDB Jupyter Kernel repository saw no commits in 2022, so it has been omitted. There has been more work on it in 2023, particularly by independent developers.

Pull Requests

Finally, we have the MariaDB Server pull request metrics for the last couple of quarters. This shows the number of newly opened PRs for the week, the number closed without being merged, and the number merged. The final two columns show the all-time running total number of PRs and the number that were still open at the end of that week.

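| Week Ending | New PRs | Closed PRs | Merged PRs | Total PRs | Still Open PRs |
| --- | --- | --- | --- | --- | --- |
| 2022-09-04 | 11 | 1 | 6 | 2243 | 118 |
| 2022-09-11 | 6 | 3 | 5 | 2249 | 116 |
| 2022-09-18 | 9 | 3 | 4 | 2258 | 118 |
| 2022-09-25 | 10 | 7 | 6 | 2268 | 115 |
| 2022-10-02 | 2 | 1 | 5 | 2270 | 111 |
| 2022-10-09 | 2 | 1 | 1 | 2272 | 111 |
| 2022-10-16 | 5 | 1 | 4 | 2277 | 111 |
| 2022-10-23 | 11 | 3 | 13 | 2288 | 106 |
| 2022-10-30 | 17 | 4 | 17 | 2305 | 102 |
| 2022-11-06 | 6 | 0 | 1 | 2311 | 107 |
| 2022-11-13 | 6 | 4 | 5 | 2317 | 104 |
| 2022-11-20 | 8 | 1 | 1 | 2325 | 110 |
| 2022-11-27 | 11 | 3 | 2 | 2336 | 116 |
| 2022-12-04 | 10 | 7 | 8 | 2346 | 111 |
| 2022-12-11 | 10 | 4 | 4 | 2357 | 113 |
| 2022-12-18 | 11 | 2 | 6 | 2368 | 116 |
| 2022-12-25 | 16 | 5 | 7 | 2384 | 120 |
| 2023-01-01 | 3 | 0 | 0 | 2387 | 123 |
| 2023-01-08 | 24 | 9 | 15 | 2411 | 123 |
| 2023-01-15 | 13 | 5 | 6 | 2424 | 125 |
| 2023-01-22 | 12 | 12 | 6 | 2436 | 119 |
| 2023-01-29 | 17 | 3 | 14 | 2453 | 119 |
| 2023-02-05 | 17 | 0 | 3 | 2470 | 133 |
| 2023-02-12 | 14 | 3 | 12 | 2484 | 132 |
| 2023-02-19 | 5 | 1 | 10 | 2489 | 126 |
| 2023-02-26 | 7 | 2 | 3 | 2496 | 128 |
| 2023-03-05 | 18 | 5 | 4 | 2514 | 137 |

Pull request counts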
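The columns are related by simple arithmetic, which makes the backlog trend easy to check. The sketch below assumes that the open count moves by new minus closed minus merged each week and that the running total grows by the number of new PRs. That relationship is my reading of the column descriptions rather than something the reporting scripts are documented to guarantee, and the `Week` class and function names are illustrative. The sample rows are four consecutive weeks from the table above.

```python
from dataclasses import dataclass


@dataclass
class Week:
    ending: str
    new: int
    closed: int
    merged: int
    total: int
    still_open: int


# Four consecutive weeks from the pull request table in this post.
WEEKS = [
    Week("2022-09-11", 6, 3, 5, 2249, 116),
    Week("2022-09-18", 9, 3, 4, 2258, 118),
    Week("2022-09-25", 10, 7, 6, 2268, 115),
    Week("2022-10-02", 2, 1, 5, 2270, 111),
]


def net_change(week):
    """Weekly change in the backlog: PRs opened minus PRs resolved."""
    return week.new - week.closed - week.merged


def inconsistent_weeks(weeks):
    """Return week endings whose running figures do not follow from the
    previous week's figures plus this week's counts."""
    mismatches = []
    for prev, cur in zip(weeks, weeks[1:]):
        open_ok = cur.still_open == prev.still_open + net_change(cur)
        total_ok = cur.total == prev.total + cur.new
        if not (open_ok and total_ok):
            mismatches.append(cur.ending)
    return mismatches


for week in WEEKS:
    print(week.ending, net_change(week))
print(inconsistent_weeks(WEEKS))
```

A positive net change means more PRs arrived than were closed or merged that week, so a run of positive weeks is what a growing review backlog looks like in this data.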

A couple of conclusions from this:

  1. We have been bombarded with pull requests and whilst doing our best, we are falling behind.
  2. At the end of 2022 everyone was away, and at the beginning of February 2023 many were at FOSDEM, which significantly reduced the number of possible reviews.

Since the end of the reporting period, new PRs have been trending rapidly upward. This is likely due to an influx of GSoC candidates getting up to speed with the codebase, as well as a large number of contributions from entities such as Amazon.

One of the Foundation’s core focuses for 2023 is going to be trying to bring the number of open pull requests down.

Thanks

There are some special people we should thank based on the stats and the pull requests currently open:

  • Amazon’s RDS team for a significant number of contributions and for making my job reviewing the code very busy.
  • Weijun Huang, an independent contributor and former GSoC student, who is by far the largest independent contributor of 2023 so far.
  • Alexander Barkov from MariaDB Plc., who touched more lines of code in 2022 than anyone else!

Published by Andrew Hutchings

Chief Contributions Officer for the MariaDB Foundation