MariaDB Contribution Statistics, October 2023

We are in October, which means it has been 4 months since the last metrics report. It is, therefore, time for another quarterly metrics report (plus a bit more). The extra month was to allow for an announcement which is a prerequisite for this post, and it also means we are now more or less aligned to real quarters. The major changes this time are in the second half of this post, where we have lots of additional data for pull requests. With that, let’s get started.

Update Process

Before we get into the statistics themselves, I wanted to delve a little bit into the process I use to update the statistics. Everything used is in the public metrics repository.

The first thing I do is run the generate_3+year.sh script to generate the provisional CSV output. I then analyse these CSVs to find out where the configuration needs updating. The CSV output will contain email addresses for people it does not recognise. I will look up (to the best of my ability) who these people are. If they already contribute, but this is a new email address, such as a GitHub auto-generated one, they are added to the “aliases” list. If they are a new contributor, I try to figure out their affiliation and add them to the “employers” list accordingly (this list should be renamed at some point).
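To make the alias and employer step a little more concrete, here is a minimal Python sketch of the idea. It assumes a simplified, hypothetical in-memory configuration (the `ALIASES` and `EMPLOYERS` dictionaries and all the example addresses are made up for illustration). It is not gitdm's actual file format or the code in the metrics repository.

```python
from collections import Counter

# Hypothetical alias list: extra address -> contributor's canonical address.
ALIASES = {
    "12345+jdoe@users.noreply.github.com": "jdoe@example.org",  # GitHub auto-generated address
}
# Hypothetical employer list: canonical address -> affiliation.
EMPLOYERS = {
    "jdoe@example.org": "Example Corp",
}


def canonical(email: str) -> str:
    """Resolve an email address to its canonical form via the alias list."""
    email = email.strip().lower()
    return ALIASES.get(email, email)


def summarise(commit_emails: list[str]) -> tuple[Counter, set[str]]:
    """Count commits per canonical contributor and collect unrecognised addresses."""
    commits = Counter(canonical(e) for e in commit_emails)
    unknown = {addr for addr in commits if addr not in EMPLOYERS}
    return commits, unknown


emails = [
    "jdoe@example.org",
    "12345+jdoe@users.noreply.github.com",
    "newcomer@example.net",
]
commits, unknown = summarise(emails)
print(len(commits), sorted(unknown))  # 2 ['newcomer@example.net']
```

Without the alias entry, the GitHub auto-generated address would be counted as a second contributor, and any address that ends up in `unknown` is one that needs to be looked up by hand.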

I’ll also do a few dives into the CSVs to try and find errors and tweak the configurations accordingly. There will be things I miss and people I cannot identify correctly. But this is as accurate as I can make it. The configuration is public, so you are welcome to contribute any fixes for things I have missed.

Finally I’ll re-run the script so that I can generate this report.

The pull request statistics are easier: they just require a GitHub token and can be left to run. They take some time due to GitHub’s API rate limiting, but I have optimised things to speed them up this time around. This was needed because the newer metrics require significantly more API calls.
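A common way to live within GitHub’s rate limits is to check the rate-limit headers on each response and pause only when the quota runs out. The sketch below shows that pattern, assuming GitHub’s documented `X-RateLimit-Remaining` and `X-RateLimit-Reset` headers (the reset value being a UTC epoch time in seconds). The `seconds_to_wait` helper and its one-second margin are hypothetical, and this is not necessarily how the metrics scripts were optimised.

```python
from collections.abc import Mapping


def seconds_to_wait(headers: Mapping[str, str], now: float, margin: float = 1.0) -> float:
    """Return how many seconds to pause before the next GitHub API call.

    Hypothetical helper: it only inspects the rate-limit headers and does not
    make any requests itself.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0  # quota left in this window: no need to wait
    # Quota exhausted: X-RateLimit-Reset is the UTC epoch second when it refills.
    reset_at = float(headers.get("X-RateLimit-Reset", str(now)))
    return max(0.0, reset_at - now) + margin


print(seconds_to_wait({"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1700000060"}, now=1700000000))  # 61.0
```

In a fetch loop you would pass the response headers and `time.time()`, then call `time.sleep()` with the result before the next request. Keeping the function free of I/O makes it easy to test.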

GSoC Season

For those who aren’t aware of it already, we are at the end of the main development period for Google Summer of Code. This is a program in which Google pays contributors to work on a project over the summer (Northern Hemisphere summer, at least). Traditionally, this has just been for university students, but this year it was open to everyone.

As a result, some great contributions have started to land in the trees, not least a large pull request to bring RocksDB up to date in MariaDB (merged into 11.3). In total, across the projects monitored, GSoC developers added 13,636 lines of code and removed 4,164 lines this year.

Unfortunately, due to a regression that could not be fixed before 11.3 reached preview release, the RocksDB update had to be rolled back. But we are confident that we can bring it back in the future.

Amazon Sponsorship

As you may have seen in our recent news, Amazon has become a sponsor of MariaDB Foundation, so they have been moved to the “Sponsor” category in the data. This change applies to all years of data in this data snapshot release, because categories do not, for now, have date ranges, whereas the affiliation of individual contributors is tracked with date ranges.
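The difference between date-ranged affiliations and undated categories can be illustrated with a small sketch. The data layout, helper names and addresses here are hypothetical: each contributor has a chronological list of (employer, end date) entries, while the employer-to-category map has no dates, so re-categorising Amazon as “Sponsor” also re-labels its older contributions.

```python
from datetime import date
from typing import Optional

# Hypothetical affiliations: chronological (employer, end_date) entries per
# contributor; an end_date of None means "until now".
AFFILIATIONS: dict[str, list[tuple[str, Optional[date]]]] = {
    "dev@example.org": [("Independent", date(2021, 12, 31)), ("Amazon", None)],
    "early@example.org": [("Amazon", None)],
}
# Hypothetical category map: no dates, so a change applies to every year.
CATEGORIES = {"Amazon": "Sponsor", "Independent": "Other"}


def affiliation_on(email: str, when: date) -> str:
    """Return the employer a contributor was affiliated with on a given date."""
    for employer, end in AFFILIATIONS.get(email, []):
        if end is None or when <= end:
            return employer
    return "Unknown"


def category_on(email: str, when: date) -> str:
    """Map the date-specific employer to its (undated) category."""
    return CATEGORIES.get(affiliation_on(email, when), "Other")


print(category_on("dev@example.org", date(2021, 6, 1)))    # Other
print(category_on("dev@example.org", date(2023, 6, 1)))    # Sponsor
print(category_on("early@example.org", date(2019, 5, 1)))  # Sponsor
```

The affiliation lookup respects dates, but whatever category the employer has today is applied to all of its history, which is why the Amazon change reaches back through every year of data.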

Project Tracking

As with the last report, I’ll provide a summary of MariaDB Plc / Foundation contributions and external contributions for each project. These are:

MariaDB Server – the server itself
libmarias3 – an open source library to talk to Amazon S3 and compatible object storage services. Maintained by MariaDB Plc. and used for Aria’s S3 storage and MariaDB ColumnStore.
MariaDB ColumnStore – a column-based, clustered storage engine for MariaDB Server. Maintained by MariaDB Plc.
MariaDB Docker – the official Docker image files for MariaDB Server. Maintained by the MariaDB Foundation.
MariaDB Jupyter Kernel – a Jupyter Notebook plugin for MariaDB Server. Maintained by the MariaDB Foundation.
MariaDB Connector/C – the C client library for MariaDB Server. Maintained by MariaDB Plc.

Compared to last time, there has been a ~50% increase in MariaDB Server and MariaDB ColumnStore Engine commits. Development on the other projects has been a little slower.

Project	Hackers MariaDB	Commits MariaDB	Hackers Others	Commits Others
MariaDB Server	37	1389	51	160
libmarias3	3	4	1	1
MariaDB ColumnStore Engine	16	231	7	11
MariaDB Docker	2	70	2	8
MariaDB Jupyter Kernel	2	12	2	4
MariaDB Connector/C	8	88	–	–

Hackers & commits from MariaDB Plc. + MariaDB Foundation and everyone else

Note that last time, the MariaDB ColumnStore Engine stats counted one person twice, because I did not notice that one developer used multiple email addresses. The gitdm configuration has been adjusted to account for this in this blog post.

New Contributors

As mentioned earlier, we have had some new GSoC contributors this time around. But we also had Ruoyu Zhong from the Homebrew project contribute some fixes so that MariaDB continues to compile well with Homebrew. We thank you for this work.

This also means that Homebrew has been added to the “Distros” category for the statistics.

Data Comparison

Let’s compare MariaDB Server’s contribution stats from four months ago to today.

Category	Entity	June 2023 Contributors	June 2023 Commits	October 2023 Contributors	October 2023 Commits
MRDB	MariaDB Plc.	25	706	30	1213
MRDF	MariaDB Foundation	7	126	7	176
Provider	Codership	6	36	6	48
Sponsor	Amazon	11	33	14	43
GSoC	GSoC	1	1	1	2
Distro	All Distros	3	3	5	7
Other	Others	17	65	32	109
	TOTAL	70	970	88	1549

MariaDB Server contribution metrics comparing stats from June 2023 to October 2023

Notes:

Red Hat is in a grey area after the IBM acquisition. For this matrix, we have put them under “Distro”, separate from IBM and “Sponsor”.
There are more entities in the “Sponsor” and “Provider” categories, but for simplification, these have been put into “Other” for this table along with independent contributors.
Commits don’t always tell the full story: a commit could be anything from one line of code to thousands.

Pull Requests

This section is where things get interesting this time around. As mentioned in a previous blog post, we now generate statistics around the time to first meaningful response, so we now have the following statistics:

New PRs: The number of PRs that have been opened that week.
Draft PRs: Of the newly opened PRs that week, how many are currently drafts.
Closed PRs: The number of PRs that have been closed that week (not merged).
Merged PRs: The number of PRs that have been merged that week.
Total PRs: The total number of PRs we have had up to the end of that week.
Still Open PRs: The total number of PRs still open (including draft) at the end of that week.
Days to First Response: For the PRs opened that week that have received a meaningful response, the average number of days until that first response.
New PRs Responded: The number of PRs opened that week that have received a meaningful response.
New PRs Not Responded: The number of PRs opened that week that have not yet received a meaningful response.
PRs Self Merged No Review: The number of PRs opened that week which have been merged by the author with no review from anyone else in the MariaDB team.
PRs Self Closed No Review: The number of PRs opened that week which have had no meaningful response and have been closed by the author.
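As a rough illustration of how the per-week response columns relate to each other, here is a sketch that computes the average days to first response plus the responded and not-responded counts for one week’s new PRs. The `PR` record and `response_stats` helper are hypothetical, and what counts as a “meaningful” response (for example, the first comment or review from someone other than the author) is an assumption left to whoever fills in `first_response`.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Optional


@dataclass
class PR:
    opened: datetime
    # Time of the first "meaningful" response (definition assumed, see above);
    # None means the PR has not had such a response yet.
    first_response: Optional[datetime]


def response_stats(prs_opened_this_week: list[PR]) -> tuple[Optional[float], int, int]:
    """Return (average days to first response, responded count, not-responded count)."""
    responded = [p for p in prs_opened_this_week if p.first_response is not None]
    days = [(p.first_response - p.opened).total_seconds() / 86400 for p in responded]
    avg = round(mean(days), 1) if days else None  # only responded PRs count toward the average
    return avg, len(responded), len(prs_opened_this_week) - len(responded)


week = [
    PR(datetime(2023, 9, 11, 9, 0), datetime(2023, 9, 12, 21, 0)),  # 1.5 days
    PR(datetime(2023, 9, 12, 9, 0), datetime(2023, 9, 12, 21, 0)),  # 0.5 days
    PR(datetime(2023, 9, 13, 9, 0), None),                          # no response yet
]
print(response_stats(week))  # (1.0, 2, 1)
```

Averaging only over responded PRs keeps unanswered PRs from dragging the figure towards zero, but it also means a week with few responses can show a flattering average, which is why the responded and not-responded counts sit alongside it.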

I’m going to continue on from the June 2023 report, but the snapshot CSV file has the data for the entire year to date. I’m also going to split the PR count and response data into separate tables, otherwise the column count becomes too large.

Week Ending	New PRs	Draft PRs	Closed PRs	Merged PRs	Total PRs	Still Open PRs
2023-06-18	7	1	3	0	2661	155
2023-06-25	4	0	0	4	2665	155
2023-07-02	5	0	4	0	2670	156
2023-07-09	6	0	1	5	2676	156
2023-07-16	8	0	1	8	2684	155
2023-07-23	14	0	4	6	2698	159
2023-07-30	7	0	8	7	2705	151
2023-08-06	3	0	2	3	2708	149
2023-08-13	7	0	4	1	2715	151
2023-08-20	5	0	3	2	2720	151
2023-08-27	7	0	1	0	2727	157
2023-09-03	5	0	1	4	2732	157
2023-09-10	10	1	1	4	2743	162
2023-09-17	7	1	13	5	2750	151
2023-09-24	9	0	2	3	2759	155
2023-10-01	5	0	6	1	2764	153

Pull request counts
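The count columns are linked: each week’s “Still Open PRs” should equal the previous week’s figure plus the new PRs, minus the ones closed and merged. The short check below, using the rows from the table above, confirms that this holds for every week in this period. The `check_open_counts` helper is just an illustration, not part of the metrics tooling.

```python
# (week ending, new, closed, merged, still open), copied from the table above
ROWS = [
    ("2023-06-18", 7, 3, 0, 155),
    ("2023-06-25", 4, 0, 4, 155),
    ("2023-07-02", 5, 4, 0, 156),
    ("2023-07-09", 6, 1, 5, 156),
    ("2023-07-16", 8, 1, 8, 155),
    ("2023-07-23", 14, 4, 6, 159),
    ("2023-07-30", 7, 8, 7, 151),
    ("2023-08-06", 3, 2, 3, 149),
    ("2023-08-13", 7, 4, 1, 151),
    ("2023-08-20", 5, 3, 2, 151),
    ("2023-08-27", 7, 1, 0, 157),
    ("2023-09-03", 5, 1, 4, 157),
    ("2023-09-10", 10, 1, 4, 162),
    ("2023-09-17", 7, 13, 5, 151),
    ("2023-09-24", 9, 2, 3, 155),
    ("2023-10-01", 5, 6, 1, 153),
]


def check_open_counts(rows):
    """Return the weeks where previous open + new - closed - merged != reported open."""
    mismatches = []
    for prev, cur in zip(rows, rows[1:]):
        week, new, closed, merged, still_open = cur
        if prev[4] + new - closed - merged != still_open:
            mismatches.append(week)
    return mismatches


print(check_open_counts(ROWS))  # []
```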

Week Ending	Days to First Response	New PRs Responded	New PRs Not Responded	PRs Self Merged No Review	PRs Self Closed No Review
2023-06-18	1.5	4	2	0	0
2023-06-25	12.8	4	0	0	0
2023-07-02	0.3	3	1	1	0
2023-07-09	17.4	5	1	0	0
2023-07-16	19.2	5	2	1	0
2023-07-23	26.8	5	5	2	2
2023-07-30	4.3	3	3	1	0
2023-08-06	9.7	3	0	0	0
2023-08-13	11.6	5	1	0	1
2023-08-20	11.5	4	1	0	0
2023-08-27	11.3	7	0	0	0
2023-09-03	0.0	4	0	1	0
2023-09-10	1.5	4	3	1	1
2023-09-17	0.0	2	4	0	0
2023-09-24	2.6	5	3	0	1
2023-10-01	1.0	1	4	0	0

Pull request responses

The first table, which has featured in every report, shows that there are things we need to improve. The second one adds a whole new dimension to this. Towards the end of July I started an effort to get the oldest pull requests moving again, so that we could bring them to a conclusion. This gave us a nice bump in closes and merges (8 closed and 7 merged in the week ending 2023-07-30), but more work needs to be done there.

As for the responses data, there is some interesting information here. The biggest red flag is “PRs Self Merged No Review”: this is where the author merged their own code without a recorded review. A review may have happened outside of GitHub, but nothing was recorded there. Processes need to be put in place so that this does not need to happen.
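The “PRs Self Merged No Review” check boils down to a simple rule, sketched below under some assumptions: the `MergedPR` record, its `reviewers` field (logins of everyone who left a review on GitHub) and the `team` set are hypothetical stand-ins for whatever data the real script pulls from the GitHub API. A review that happened outside GitHub is invisible to a check like this, which is why such PRs can only be flagged, not proven unreviewed.

```python
from dataclasses import dataclass, field


@dataclass
class MergedPR:
    author: str
    merged_by: str
    # Hypothetical: logins of everyone who left a review on GitHub.
    reviewers: list[str] = field(default_factory=list)


def self_merged_no_review(pr: MergedPR, team: set[str]) -> bool:
    """True if the author merged their own PR and no other team member reviewed it."""
    team_reviews = {r for r in pr.reviewers if r != pr.author and r in team}
    return pr.merged_by == pr.author and not team_reviews


team = {"alice", "bob"}  # hypothetical team members
print(self_merged_no_review(MergedPR("alice", "alice"), team))           # True
print(self_merged_no_review(MergedPR("alice", "alice", ["bob"]), team))  # False
```

Reviews from people outside the team set are ignored here, matching the “no review from anyone else in the MariaDB team” definition; whether the real report treats external reviews the same way is an assumption.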

Last Update Reporting

Another addition to the metrics reporting codebase is reporting on pull requests that require attention. This is a tool called report.py, and one example of its output, below, lists the five pull requests that have gone the longest without any action. These will be a priority for me to get moving again.