We are in October, which means it has been 4 months since the last metrics report. It is, therefore, time for another quarterly metrics report (plus a bit more). The extra month was to allow for an announcement which is a prerequisite for this post, and it also means we are more or less aligned to real quarters. The major changes come in the second half of this post, where we have lots of additional data for pull requests. With that, let’s get started.
Before we get into the statistics themselves, I wanted to delve a little bit into the process I use to update the statistics. Everything used is in the public metrics repository.
The first thing I do is run the `generate_3+year.sh` script to generate the provisional CSV output. I then analyse these CSVs to find out where the configuration needs updating. The CSV output will contain email addresses for people it does not recognise, and I look up (to the best of my ability) who these people are. If they are an existing contributor using a new email address, such as a GitHub auto-generated one, the address is added to the “aliases” list. If they are a new contributor, I try to figure out their affiliation and add them to the “employers” list accordingly (this list should be renamed at some point).
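The alias step above can be sketched roughly as follows. This is a hypothetical illustration of the idea, not the actual configuration format or tooling from the metrics repository; the email addresses and names are made up.

```python
# Hypothetical sketch of the alias-resolution step: every known email address
# maps to a canonical contributor, and anything unmapped needs investigating.
ALIASES = {
    "jane@example.com": "Jane Developer",
    # GitHub auto-generated address for the same person:
    "12345+jane@users.noreply.github.com": "Jane Developer",
}

def find_unrecognised(emails):
    """Return the addresses that need a new aliases or employers entry."""
    return [e for e in emails if e not in ALIASES]

seen = ["jane@example.com", "new.person@example.org"]
print(find_unrecognised(seen))  # ["new.person@example.org"]
```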
I’ll also do a few dives into the CSVs to try to find errors and tweak the configuration accordingly. There will be things I miss and people I cannot identify correctly, but this is as accurate as I can make it. The configuration is public, so you are welcome to contribute any fixes for things I have missed.
Finally I’ll re-run the script so that I can generate this report.
The pull request statistics are easier: they just require a GitHub token and can be left to run. They take some time due to GitHub’s API rate limiting, but I have optimised things to speed them up this time around. This was needed because we are making significantly more API calls for the newer metrics gathered.
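To give a sense of what working around the rate limit involves: GitHub’s REST API reports the remaining quota in the `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers, so a script can compute how long to pause before the next call. The helper below is a minimal sketch of that calculation, not the metrics repository’s actual code.

```python
import time

def seconds_until_reset(headers, now=None):
    """How long to sleep when GitHub reports the rate limit is exhausted.

    GitHub's REST API returns X-RateLimit-Remaining and X-RateLimit-Reset
    (a Unix timestamp) on every response; when remaining hits 0, waiting
    until the reset time (plus a small buffer) lets the run continue.
    """
    if int(headers.get("X-RateLimit-Remaining", "1")) > 0:
        return 0  # quota still available, no need to wait
    now = time.time() if now is None else now
    reset = int(headers.get("X-RateLimit-Reset", now))
    return max(0, reset - now) + 1  # small buffer past the reset moment
```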
For those who aren’t aware of it already, we are at the end of the main development period for Google Summer of Code. This is a program through which Google pays contributors to work on a project over the summer (Northern Hemisphere summer, at least). Traditionally, this has just been for university students, but this year it was open to everyone.
This has meant some great contributions have started to land in the trees, not least a large pull request to bring RocksDB up to date in MariaDB (merged into 11.3). In total, across the projects monitored, GSoC developers added 13,636 lines of code and removed 4,164 lines of code this year.
Unfortunately, due to a regression that could not be fixed before 11.3 reached preview release, the RocksDB update had to be rolled back. But we are confident that we can bring it back in the future.
As you may have seen in our recent news, Amazon has become a sponsor of MariaDB Foundation, so they have been moved to the “Sponsor” category in the data. This change applies to all years of data in this data snapshot release because categories, for now, do not have date ranges, whereas the affiliation of individual contributors is tracked with date ranges.
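The date-ranged affiliation tracking mentioned above can be illustrated with a small sketch. This is hypothetical code with made-up contributors and dates, just to show the idea: a contributor’s employer is looked up for the date of each commit, so a change of employer does not rewrite their earlier history.

```python
from datetime import date

# Hypothetical date-ranged affiliations: each entry says a contributor
# worked for an employer from `start` up to (but not including) `end`.
AFFILIATIONS = {
    "jane@example.com": [
        (date(2015, 1, 1), date(2020, 6, 1), "Company A"),
        (date(2020, 6, 1), date.max, "Amazon"),
    ],
}

def affiliation_on(email, when):
    """Return the employer a contributor worked for on a given date."""
    for start, end, employer in AFFILIATIONS.get(email, []):
        if start <= when < end:
            return employer
    return "Unknown"
```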
As with last month, I’ll provide a summary of MariaDB Plc / Foundation contributions and external contributions for each project. These are:
- MariaDB Server – the server itself
- libmarias3 – an open source library to talk to Amazon S3 and related block storage services. Maintained by MariaDB Plc. and used for Aria’s S3 storage and MariaDB ColumnStore
- MariaDB ColumnStore – a columnar based, clustered storage engine for MariaDB Server. Maintained by MariaDB Plc.
- MariaDB Docker – the official Docker image files for MariaDB Server. Maintained by the MariaDB Foundation
- MariaDB Jupyter Kernel – a Jupyter Notebook plugin for MariaDB Server. Maintained by the MariaDB Foundation
- MariaDB Connector/C – the C client library for MariaDB Server. Maintained by MariaDB Plc.
Compared to last time, there has been a ~50% increase in MariaDB Server and MariaDB ColumnStore Engine commits. Development on the other projects has been a little slower.
| Project | Hackers MariaDB | Commits MariaDB | Hackers Others | Commits Others |
|---------|-----------------|-----------------|----------------|----------------|
| MariaDB ColumnStore Engine | 16 | 231 | 7 | 11 |
| MariaDB Jupyter Kernel | 2 | 12 | 2 | 4 |
Note that because I did not notice the multiple email addresses used by one developer, the MariaDB ColumnStore Engine stats counted one person twice last time. The gitdm configuration has been adjusted to account for this in this blog post.
As mentioned earlier, we have had some new GSoC contributors this time around. We also had Ruoyu Zhong from the Homebrew project contribute some fixes so that MariaDB will continue to compile well with Homebrew. We thank you for this work.
This also means that Homebrew has been added to the “Distros” category for the statistics.
Let’s compare MariaDB Server’s contributions stats from three months ago to today.
- Red Hat is in a grey area after the IBM acquisition. For this matrix, we have put them under “Distro”, separate from IBM and “Sponsor”.
- There are more entities in the “Sponsor” and “Provider” categories, but for simplification, these have been put into “Other” for this table along with independent contributors.
- Commits don’t always tell the full story: a commit could be anywhere from one line of code to thousands.
This section is where things get interesting this time around. As mentioned in a previous blog post, we now generate statistics around the time to first meaningful response, which means we now have the following statistics:
- New PRs: The number of PRs that have been opened that week.
- Draft PRs: Of the newly opened PRs that week, how many are currently drafts.
- Closed PRs: The number of PRs that have been closed that week (not merged).
- Merged PRs: The number of PRs that have been merged that week.
- Total PRs: The total number of PRs we have had up to the end of that week.
- Still Open PRs: The total number of PRs still open (including draft) at the end of that week.
- Days to First Response: The average number of days to first meaningful response, among the PRs opened that week that have received one.
- New PRs Responded: Of the PRs opened that week, the number that have received a meaningful response.
- PRs Self Merge No Review: The number of PRs opened that week which have been merged by the author with no review from anyone else in the MariaDB team.
- PRs Self Closed No Review: The number of PRs opened that week which received no meaningful response and were closed by the author.
I’m going to continue on from the June 2023 report, but the snapshot CSV file has the data for the entire year to date. I’m also going to split the PR count and response data into separate tables, otherwise the column count becomes too large.
| Week Ending | New PRs | Draft PRs | Closed PRs | Merged PRs | Total PRs | Still Open PRs |
|-------------|---------|-----------|------------|------------|-----------|----------------|
| Week Ending | Days to First Response | New PRs Responded | New PRs Not Responded | PRs Self Merged No Review | PRs Self Closed No Review |
|-------------|------------------------|-------------------|-----------------------|---------------------------|---------------------------|
The first table, which I used to show with every report, shows that we need to improve some things. The second one adds a whole new dimension to this. Towards the end of July I started an effort to get the oldest pull requests moving along again, so that we could bring them to a conclusion. This gave us a nice bump in closes and merges, but more work needs to be done there.
As for the response data, there is some interesting information here. The biggest red flag is “PRs Self Merged No Review”: this is where the author merged their own code without a recorded review. A review may have happened outside of GitHub, but nothing was recorded there. Processes need to be put in place so that this does not need to happen.
Another addition to the metrics reporting codebase is reports on pull requests that require attention. This is a tool called `report.py`, and one example output below is the five pull requests that have gone the longest without any action. These will be a priority for me to get moving again.
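The core of such a report can be sketched in a few lines. This is a hypothetical illustration of the idea rather than the actual `report.py` implementation; the PR records, field names, and numbers are made up.

```python
from datetime import date

def longest_inactive(prs, n=5):
    """Return the n open PRs with the longest time since any activity."""
    open_prs = [pr for pr in prs if pr["state"] == "open"]
    # Oldest last-activity date first, i.e. longest inactive first.
    return sorted(open_prs, key=lambda pr: pr["last_activity"])[:n]

prs = [
    {"number": 101, "state": "open", "last_activity": date(2022, 3, 1)},
    {"number": 205, "state": "closed", "last_activity": date(2021, 1, 1)},
    {"number": 310, "state": "open", "last_activity": date(2023, 8, 20)},
]
print([pr["number"] for pr in longest_inactive(prs)])  # [101, 310]
```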
If there are any additions you would like to see, please let us know. Otherwise, I’ll be back in December with even more metrics!