MariaDB Contribution Statistics, October 2023

We are in October, which means it has been four months since the last metrics report. It is, therefore, time for another quarterly metrics report (plus a bit more). The extra month allowed for an announcement that is a prerequisite for this post, and it also means we are now more or less aligned to real quarters. The major changes come in the second half of this post, where we have lots of additional data for pull requests. With that, let’s get started.

Update Process

Before we get into the statistics themselves, I wanted to delve a little bit into the process I use to update the statistics. Everything used is in the public metrics repository.

The first thing I do is run the generate_3+year.sh script to generate the provisional CSV output. I then analyse these CSVs to find out where the configuration needs updating. The CSV output will contain email addresses for people it does not recognise, and I look up (to the best of my ability) who these people are. If they already contribute, but this is a new email address, such as a GitHub auto-generated one, the address is added to the “aliases” list. If they are a new contributor, I try to figure out their affiliation and add them to the “employers” list accordingly (this list should be renamed at some point).
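The lookup step described above can be sketched roughly as follows. This is only an illustration of the idea: the dictionaries, function names and addresses are hypothetical and do not reflect the actual configuration format used in the metrics repository.

```python
# Hypothetical data: not the real configuration format or real addresses.
ALIASES = {"12345+jdoe@users.noreply.github.com": "jdoe@example.org"}
EMPLOYERS = {"jdoe@example.org": "Example Corp"}


def resolve(email):
    """Return (canonical_email, employer) for a commit author address."""
    lowered = email.lower()
    # Map a known alternative address onto the contributor's main address.
    canonical = ALIASES.get(lowered, lowered)
    # None means the address is not recognised yet.
    employer = EMPLOYERS.get(canonical)
    return canonical, employer


def unrecognised(emails):
    """List the addresses that still need a manual lookup."""
    return sorted({e for e in emails if resolve(e)[1] is None})
```

Anything returned by `unrecognised` would then be investigated by hand and added to one of the two lists before the script is run again.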

I’ll also do a few dives into the CSVs to try and find errors and tweak the configurations accordingly. There will be things I miss and people I cannot identify correctly. But this is as accurate as I can make it. The configuration is public, so you are welcome to contribute any fixes for things I have missed.

Finally I’ll re-run the script so that I can generate this report.

The pull request statistics are easier: they just require a GitHub token and can then be left to run. They take some time because of GitHub’s API rate limiting, but I have optimised things to speed this up this time around. This was needed because we are making significantly more API calls with the newer metrics gathered.
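As a rough illustration of what working within the rate limit involves, the sketch below decides how long to pause based on the `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers that GitHub's REST API returns. The function name and the way headers are passed in are assumptions made for this example; it is not taken from the metrics scripts.

```python
import time


def seconds_to_wait(headers, now=None):
    """Return how long to sleep before the next API request.

    `headers` is a plain dict of response headers (assumed to use the
    exact key spelling below). `now` is the current Unix time and can be
    injected for testing.
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    reset_at = int(headers.get("X-RateLimit-Reset", 0))  # Unix epoch seconds
    if remaining > 0:
        return 0.0  # quota left, no need to wait
    # Quota exhausted: wait until the reset time (never a negative duration).
    return max(0.0, float(reset_at - now))
```

A caller would pass the result to `time.sleep()` before issuing the next request.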

GSoC Season

For those who aren’t aware of it already, we are at the end of the main development period for Google Summer of Code. This is a program through which Google pays contributors to work on a project over the summer (Northern Hemisphere summer, at least). Traditionally, this has just been for university students, but this year it was open to everyone.

As a result, some great contributions have started to land in the trees. Not least among them is a large pull request to bring RocksDB up to date in MariaDB (merged into 11.3). In total, across the projects monitored, we had 13,636 lines of code added and 4,164 lines of code removed by GSoC developers this year.

Unfortunately, due to a regression that could not be fixed before 11.3 reached preview release, the RocksDB update had to be rolled back. But we are confident that we can bring it back in the future.

Amazon Sponsorship

As you may have seen in our recent news, Amazon has become a sponsor of MariaDB Foundation. As such, they have been moved to the “Sponsor” category in the data. This change applies to all years of data in this snapshot release because categories, for now, do not have date ranges, whereas the affiliation of individual contributors is tracked with date ranges.
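The difference between the two kinds of lookup can be sketched as below. The data structures, names and dates are hypothetical and chosen only to illustrate the point; the real configuration may be organised quite differently.

```python
from datetime import date

# Hypothetical data for illustration only.
# Categories have no date range, so a change applies to every year of data.
CATEGORY = {"Example Cloud": "Sponsor"}

# Affiliations carry a start and end date per contributor.
AFFILIATIONS = {
    "dev@example.org": [
        (date(2019, 1, 1), date(2021, 12, 31), "Example Cloud"),
        (date(2022, 1, 1), date.max, "Independent"),
    ],
}


def employer_on(email, day):
    """Return the employer a contributor was affiliated with on a given day."""
    for start, end, employer in AFFILIATIONS.get(email, []):
        if start <= day <= end:
            return employer
    return None


def category_on(email, day):
    """Return the category a commit made on `day` is counted under."""
    return CATEGORY.get(employer_on(email, day), "Other")
```

In this sketch a commit from 2020 is counted under “Sponsor” even if the employer only became a sponsor later, because the category lookup ignores the date.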

Project Tracking

As with last time, I’ll provide a summary of MariaDB Plc / Foundation contributions and external contributions for each project. These are:

  • MariaDB Server – the server itself
  • libmarias3 – an open source library to talk to Amazon S3 and related block storage services. Maintained by MariaDB Plc. and used for Aria’s S3 storage and MariaDB ColumnStore
  • MariaDB ColumnStore – a columnar based, clustered storage engine for MariaDB Server. Maintained by MariaDB Plc.
  • MariaDB Docker – the official Docker image files for MariaDB Server. Maintained by the MariaDB Foundation
  • MariaDB Jupyter Kernel – a Jupyter Notebook plugin for MariaDB Server. Maintained by the MariaDB Foundation
  • MariaDB Connector/C – the C client library for MariaDB Server. Maintained by MariaDB Plc.

Compared to last time there has been a ~50% increase in MariaDB Server and MariaDB ColumnStore Engine commits. Development on the other projects has been a little slower.

Project                     Hackers MariaDB  Commits MariaDB  Hackers Others  Commits Others
MariaDB Server              37               1389             51              160
libmarias3                  3                4                1               1
MariaDB ColumnStore Engine  16               231              7               11
MariaDB Docker              2                70               2               8
MariaDB Jupyter Kernel      2                12               2               4
MariaDB Connector/C888
Hackers & commits from MariaDB Plc. + MariaDB Foundation and everyone else

Note that because I did not notice the multiple email addresses used by one developer, the MariaDB ColumnStore Engine stats counted one person twice last time. The gitdm configuration has been adjusted to correct this for this blog post.

New Contributors

As mentioned earlier, we have had some new GSoC contributors this time around. But we have also had Ruoyu Zhong from the Homebrew project contribute some fixes so that MariaDB will continue to compile well with Homebrew. We thank you for this work.

This also means that Homebrew has been added to the “Distros” category for the statistics.

Data Comparison

Let’s compare MariaDB Server’s contribution stats from four months ago to today.

Category  Entity              June 2023 Contributors  June 2023 Commits  October 2023 Contributors  October 2023 Commits
MRDB      MariaDB Plc.        25                      706                30                         1213
MRDF      MariaDB Foundation  7                       126                7                          176
Provider  Codership           6                       36                 6                          48
Sponsor   Amazon              11                      33                 14                         43
GSoC      GSoC                1                       1                  1                          2
Distro    All Distros         3                       3                  5                          7
Other     Others              17                      65                 32                         109
TOTAL                         70                      970                88                         1549
MariaDB Server contribution metrics comparing stats from June 2023 to October 2023

Notes:

  1. Red Hat is in a grey area after the IBM acquisition. For this matrix, we have put them under “Distro”, separate from IBM and “Sponsor”.
  2. There are more entities in the “Sponsor” and “Provider” categories, but for simplification, these have been put into “Other” for this table along with independent contributors.
  3. Commits don’t always tell the full story: a commit could be anything from one line of code to thousands.

Pull Requests

This section is where things get interesting this time around. As mentioned in a previous blog post, we now generate statistics around the time to first meaningful response, which means we now have the following statistics:

  • New PRs: The number of PRs that have been opened that week.
  • Draft PRs: Of the newly opened PRs that week, how many are currently drafts.
  • Closed PRs: The number of PRs that have been closed that week (not merged).
  • Merged PRs: The number of PRs that have been merged that week.
  • Total PRs: The total number of PRs we have had up to the end of that week.
  • Still Open PRs: The total number of PRs still open (including draft) at the end of that week.
  • Days to First Response: The average number of days to the first meaningful response, for PRs opened that week that have received one.
  • New PRs Responded: The number of PRs opened that week that have had a meaningful response.
  • PRs Self Merged No Review: The number of PRs opened that week which have been merged by the author with no review from anyone else in the MariaDB team.
  • PRs Self Closed No Review: The number of PRs opened that week which have had no meaningful response and have been closed by the author.
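To make the “Days to First Response” figure concrete, here is a minimal sketch of how such an average could be computed for one week's PRs. The record layout and sample timestamps are hypothetical, and the sketch assumes the time of the first meaningful response has already been identified; it is not the code from the metrics repository.

```python
from datetime import datetime


def days_to_first_response(prs):
    """Average days between opening and first meaningful response.

    Each PR is a dict with 'opened' and 'first_response' datetimes;
    'first_response' is None when nobody has responded yet. PRs without
    a response are left out of the average. Returns None if no PR in
    the list has a response.
    """
    waits = [
        (pr["first_response"] - pr["opened"]).total_seconds() / 86400
        for pr in prs
        if pr["first_response"] is not None
    ]
    return round(sum(waits) / len(waits), 1) if waits else None


# Hypothetical week: two PRs answered after one and two days, one unanswered.
SAMPLE_WEEK = [
    {"opened": datetime(2023, 7, 3, 9, 0), "first_response": datetime(2023, 7, 4, 9, 0)},
    {"opened": datetime(2023, 7, 4, 9, 0), "first_response": datetime(2023, 7, 6, 9, 0)},
    {"opened": datetime(2023, 7, 5, 9, 0), "first_response": None},
]
```

With this sample, the two answered PRs would count towards “New PRs Responded” and the average covers only those two.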

I’m going to continue on from the June 2023 report, but the snapshot CSV file has the data for the entire year to date. I’m also going to split the PR count and response data into separate tables, otherwise the column count becomes too large.

Week Ending  New PRs  Draft PRs  Closed PRs  Merged PRs  Total PRs  Still Open PRs
2023-06-18   7        1          3           0           2661       155
2023-06-25   4        0          0           4           2665       155
2023-07-02   5        0          4           0           2670       156
2023-07-09   6        0          1           5           2676       156
2023-07-16   8        0          1           8           2684       155
2023-07-23   14       0          4           6           2698       159
2023-07-30   7        0          8           7           2705       151
2023-08-06   3        0          2           3           2708       149
2023-08-13   7        0          4           1           2715       151
2023-08-20   5        0          3           2           2720       151
2023-08-27   7        0          1           0           2727       157
2023-09-03   5        0          1           4           2732       157
2023-09-10   10       1          1           4           2743       162
2023-09-17   7        1          13          5           2750       151
2023-09-24   9        0          2           3           2759       155
2023-10-01   5        0          6           1           2764       153
Pull request counts
Week Ending  Days to First Response  New PRs Responded  New PRs Not Responded  PRs Self Merged No Review  PRs Self Closed No Review
2023-06-18   1.5                     4                  2                      0                          0
2023-06-25   12.8                    4                  0                      0                          0
2023-07-02   0.3                     3                  1                      1                          0
2023-07-09   17.4                    5                  1                      0                          0
2023-07-16   19.2                    5                  2                      1                          0
2023-07-23   26.8                    5                  5                      2                          2
2023-07-30   4.3                     3                  3                      1                          0
2023-08-06   9.7                     3                  0                      0                          0
2023-08-13   11.6                    5                  1                      0                          1
2023-08-20   11.5                    4                  1                      0                          0
2023-08-27   11.3                    7                  0                      0                          0
2023-09-03   0.0                     4                  0                      1                          0
2023-09-10   1.5                     4                  3                      1                          1
2023-09-17   0.0                     2                  4                      0                          0
2023-09-24   2.6                     5                  3                      0                          1
2023-10-01   1.0                     1                  4                      0                          0
Pull request responses

The first table, which I used to show with every report, shows that we need to improve some things. The second one adds a whole new dimension to this. Towards the end of July I started an effort to get the oldest pull requests moving along again, so that we could bring them to a conclusion. This gave us a nice bump in closes and merges, but more work needs to be done there.

As for the responses data, there is some interesting information here. The biggest red flag is the “PRs Self Merged No Review” column, which is where the author self-merged their code without a recorded review. A review may have happened outside of GitHub, but nothing was recorded there. Processes need to be put in place so that this does not need to happen.
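The weekly figures in the pull request counts table can also be sanity-checked with simple arithmetic. The sketch below assumes, from the column definitions, that each week's “Still Open PRs” should equal the previous week's value plus new PRs, minus closed and merged PRs. The rows are copied from the first six weeks of the table; the function name is made up for this example.

```python
# (week ending, new, closed, merged, still open), taken from the counts table.
ROWS = [
    ("2023-06-18", 7, 3, 0, 155),
    ("2023-06-25", 4, 0, 4, 155),
    ("2023-07-02", 5, 4, 0, 156),
    ("2023-07-09", 6, 1, 5, 156),
    ("2023-07-16", 8, 1, 8, 155),
    ("2023-07-23", 14, 4, 6, 159),
]


def inconsistent_weeks(rows):
    """Return the weeks whose open count does not follow from the week before."""
    bad = []
    for prev, cur in zip(rows, rows[1:]):
        week, new, closed, merged, still_open = cur
        # Assumed relation: open = previous open + new - closed - merged.
        if prev[4] + new - closed - merged != still_open:
            bad.append(week)
    return bad
```

For the rows listed here the check finds no inconsistent weeks.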

Last Update Reporting

Another addition to the metrics reporting codebase is reporting on pull requests that require attention. This is a tool called report.py, and one example of its output, below, lists the five pull requests that have gone the longest without any action. These will be a priority for me to get moving again.

Pull Request  Days Since Last Update
1624          1065
1651          1012
1756          922
1689          795
1872          706
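A report like this boils down to sorting open pull requests by how long ago they were last touched. The following is a minimal sketch of that idea only; it is not the code of report.py, and the function name, PR numbers and dates are hypothetical.

```python
from datetime import date


def stalest(last_updated, today, limit=5):
    """Return (pr_number, days_since_update) for the longest-idle PRs.

    `last_updated` maps a PR number to the date of its last activity.
    """
    ages = [(number, (today - updated).days) for number, updated in last_updated.items()]
    # Longest idle first, then keep only the top `limit` entries.
    return sorted(ages, key=lambda item: item[1], reverse=True)[:limit]


# Hypothetical PR numbers and dates, for illustration only.
SAMPLE = {
    101: date(2023, 1, 1),
    102: date(2023, 9, 1),
    103: date(2022, 10, 1),
}
```

Calling `stalest(SAMPLE, date(2023, 10, 1), limit=2)` would return the two PRs that have waited longest, oldest first.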

Next Time

If there are any additions you would like to see, please let us know. Otherwise, I’ll be back in December with even more metrics!

Published by Andrew Hutchings

Chief Contributions Officer for the MariaDB Foundation