Publishing of Contribution Statistics

Attribution: Alpha Stock Images - http://alphastockimages.com/

At its last meeting, the MariaDB Foundation Board proposed regularly publishing contribution statistics. This post is an update on our progress and also serves as the first report.

Whilst there has been some progress on this project since the board meeting, the vast majority of it has happened since mid-August, and improvements are continuing at a rapid pace. Below, I break down the request and our progress on each point.

Monthly Statistics Publishing

Request: “MariaDB Foundation starts to, on a monthly cadence, publish contributor statistics on a) lines of code and b) commits.”

The statistics are currently published together with a snapshot of the “metrics” git repository taken at reporting time, so that the methodology and configuration used can be scrutinised. They include details of both lines of code and commits.

Request: “The reports are given additional attention on a quarterly cadence, through blog entries and board meetings.”

This will happen, starting with this post as the first report.

Request: “The report looks at code contributions from the previous three years.”

The reports currently cover the past three full years plus the current year to date, so this report covers 2019 to 2022.

Categories

The next point concerns how contributors are categorised.

Request: “Each individual developer in the statistics is categorised into one of three categories, based on where the developer gets his or her salary: a) Financial sponsor, b) Non-sponsor, c) Foundation employee.”

We have gone with a multi-level scheme that is a little more complex than this. First, each contributor is linked with an organisation if a relevant one has been found; otherwise they are marked as an “Individual”. For now, we have broken organisations down into the following categories:

  • MariaDB Corporation
  • MariaDB Foundation
  • Sponsor
  • GSoC (Google Summer of Code)
  • Provider
  • Distro
  • Other

The two MariaDB entities account for by far the largest number of contributions, so they are separated out and can be included or excluded as needed. Sponsors are the entities that sponsor the MariaDB Foundation. Distro covers contributions from the various Linux and BSD distributions. Provider is typically an organisation that uses MariaDB Server to provide a service. Finally, “Other” covers independent contributors and anyone who does not fit into the other categories.

This breakdown can easily be merged into the three categories requested, but we feel the finer detail provides more useful statistics. That said, we are open to suggestions for changes to the categorisation, and the data can easily be regenerated.
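Merging the finer categories into the board's three is a simple relabel-and-sum. The Python sketch below applies it to the 2019 to 2022 commit totals from the “Commits by organisation” table further down. The mapping itself is hypothetical: the post does not say which board category MariaDB Corporation, providers, GSoC, distributions or “Other” would fall into, so those placements are assumptions for illustration only.

```python
from collections import Counter

# Commit totals for 2019-2022 per category, from the "Commits by organisation"
# table (Provider and Sponsor organisations summed into their category).
COMMIT_TOTALS = {
    "MDBC": 12319,
    "MDBF": 1276,
    "Provider": 302 + 175 + 79 + 30 + 24 + 22,
    "Sponsor": 103 + 17,
    "GSoC": 93,
    "Distro": 68,
    "Other": 253,
}

# HYPOTHETICAL mapping onto the board's three categories. Placing MariaDB
# Corporation under "Financial sponsor" is an illustrative assumption.
TO_BOARD_CATEGORY = {
    "MDBF": "Foundation employee",
    "Sponsor": "Financial sponsor",
    "MDBC": "Financial sponsor",
}


def collapse(totals: dict[str, int], mapping: dict[str, str],
             default: str = "Non-sponsor") -> dict[str, int]:
    """Sum counts from fine-grained categories into coarser ones.

    Categories absent from `mapping` are counted under `default`.
    """
    merged: Counter = Counter()
    for category, count in totals.items():
        merged[mapping.get(category, default)] += count
    return dict(merged)


if __name__ == "__main__":
    for name, count in sorted(collapse(COMMIT_TOTALS, TO_BOARD_CATEGORY).items()):
        print(f"{name}: {count}")
```

Run as a script, this prints Financial sponsor: 12439, Foundation employee: 1276 and Non-sponsor: 1046, which still add up to the 14761 commits in the table. Changing the mapping regenerates the three-way split without touching the underlying data.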

Transparency

Request: “The stats are run by a transparent script, to be published on Github, and is up for public scrutiny.”

This we have done: the script is available in the “metrics” repository on GitHub, and we invited public scrutiny of it in a previous blog post.

Request: “The stats also make a distinction between core MariaDB Server and plugins.”

We do not have this yet; we intend to implement it under ticket MDBF-466. The tool used to generate the statistics, “gitdm”, does not currently support this level of breakdown, so we will need to modify it.

Code Review Metrics

The final point was one raised by Eric:

Discussion: Eric noted the importance of also monitoring reviews (as reviewers are a scarce resource).

We have a script to monitor this too. Although it does not track individual reviewers at this time, it does track the open/closed state of pull requests on a week-by-week basis.

First Report

With that, here is the first report. The CSV data can be obtained from a GitHub release, along with a snapshot of the code and configuration used to generate it. At this point, the category information may still need some ironing out before it fully fits the requirements.

We have tried to strike a difficult balance between giving contributors fair visibility and keeping the matrices below from becoming excessively long. We welcome questions and comments about how the data is presented.

As a simple illustration, the following table shows the lines added by each category, along with each category's percentage of that year's total. Lines added do not tell the whole story, though, so we dive deeper further on.

| Category | 2019 | 2020 | 2021 | 2022 |
|---|---|---|---|---|
| MDBC | 539484 / 82.39% | 785029 / 91.56% | 792562 / 90.76% | 1317885 / 95.27% |
| MDBF | 26553 / 4.06% | 17395 / 2.03% | 28008 / 3.21% | 37780 / 2.73% |
| Provider | 55237 / 8.44% | 38483 / 4.49% | 29419 / 3.37% | 10327 / 0.75% |
| Sponsor | 1439 / 0.22% | 15174 / 1.77% | 359 / 0.04% | 1 / 0% |
| GSoC | 4824 / 0.74% | 55 / 0.01% | 11735 / 1.34% | 5188 / 0.38% |
| Distro | 666 / 0.1% | 261 / 0.03% | 122 / 0.01% | 93 / 0.01% |
| Other | 26570 / 4.06% | 1035 / 0.12% | 11008 / 1.26% | 11989 / 0.87% |
| TOTAL | 654773 | 857432 | 873213 | 1383263 |

Lines added by entity category
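Each percentage in the table is a category's lines added divided by that year's TOTAL, times 100. As a minimal Python sketch (not the script that produced the table), the following recomputes the 2021 column; the dictionary keys simply mirror the table's row labels.

```python
# Lines added per category in 2021, from the "Lines added by entity category" table.
LINES_ADDED_2021 = {
    "MDBC": 792562,
    "MDBF": 28008,
    "Provider": 29419,
    "Sponsor": 359,
    "GSoC": 11735,
    "Distro": 122,
    "Other": 11008,
}


def percentage_shares(counts: dict[str, int], places: int = 2) -> dict[str, float]:
    """Return each category's share of the column total as a rounded percentage."""
    total = sum(counts.values())
    return {name: round(100 * n / total, places) for name, n in counts.items()}


if __name__ == "__main__":
    print(sum(LINES_ADDED_2021.values()))  # the column total
    for name, pct in percentage_shares(LINES_ADDED_2021).items():
        print(f"{name}: {pct}%")
```

The same function applies to any other year's column, or to the commit counts in the next table.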

The first matrix shows the number of commits by each organisation and its percentage of that year's total commits, with a Totals column covering all the years combined.

| Category | Entity | 2019 | 2020 | 2021 | 2022 | Totals |
|---|---|---|---|---|---|---|
| MDBC | MariaDB Corporation | 2954 / 84.54% | 3570 / 84.5% | 3217 / 82.91% | 2578 / 81.53% | 12319 / 83.46% |
| MDBF | MariaDB Foundation | 256 / 7.33% | 356 / 8.43% | 307 / 7.91% | 357 / 11.29% | 1276 / 8.64% |
| Provider | Codership | 80 / 2.29% | 94 / 2.22% | 75 / 1.93% | 53 / 1.68% | 302 / 2.05% |
| Provider | CONNECT | 54 / 1.55% | 65 / 1.54% | 55 / 1.42% | 1 / 0.03% | 175 / 1.19% |
| Provider | Amazon | - | 2 / 0.05% | 27 / 0.7% | 50 / 1.58% | 79 / 0.54% |
| Provider | Oracle Corporation | 14 / 0.4% | 11 / 0.26% | 4 / 0.10% | 1 / 0.03% | 30 / 0.2% |
| Provider | Tempesta | 17 / 0.49% | 4 / 0.09% | 3 / 0.08% | - | 24 / 0.16% |
| Provider | Huawei | 1 / 0.03% | 5 / 0.12% | 14 / 0.36% | 2 / 0.06% | 22 / 0.15% |
| Sponsor | IBM | 35 / 1% | 63 / 1.49% | 4 / 0.1% | 1 / 0.03% | 103 / 0.7% |
| Sponsor | ServiceNow | 12 / 0.34% | 5 / 0.12% | - | - | 17 / 0.12% |
| GSoC | GSoC | 3 / 0.09% | 1 / 0.02% | 62 / 1.6% | 27 / 0.85% | 93 / 0.63% |
| Distro | All Distros | 15 / 0.43% | 22 / 0.52% | 20 / 0.52% | 11 / 0.35% | 68 / 0.46% |
| Other | Others | 53 / 1.52% | 27 / 0.64% | 92 / 2.37% | 81 / 2.56% | 253 / 1.71% |
| TOTAL | | 3494 | 4225 | 3880 | 3162 | 14761 |

Commits by organisation

The second matrix shows the number of lines added and deleted by each entity. A deleted line may have been replaced by an added line, or it may have been removed outright. Conversely, a commit can add lines without deleting any. Because the two counts are independent, an entity can delete more lines than it adds, as MariaDB Corporation did in 2020.
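To make the independence of the two counts concrete, here is a small Python sketch that totals added and deleted lines from output in the format of `git log --numstat`, where each changed file is reported as `<added><TAB><deleted><TAB><path>` and binary files show `-` for both counts. This is not necessarily how gitdm computes its figures, and the sample input (hash, file names and counts) is made up.

```python
def sum_numstat(numstat_output: str) -> tuple[int, int]:
    """Total the added and deleted line counts in `git log --numstat` output.

    File lines have the form "<added>\t<deleted>\t<path>". Binary files report
    "-" for both counts and are skipped, as are commit header and blank lines.
    """
    added = deleted = 0
    for line in numstat_output.splitlines():
        fields = line.split("\t")
        if len(fields) != 3 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue  # header line, blank line, or binary file
        added += int(fields[0])
        deleted += int(fields[1])
    return added, deleted


# Made-up sample: one commit (header line as printed by --format=%H)
# touching three hypothetical files.
SAMPLE = "\n".join([
    "0123456789abcdef0123456789abcdef01234567",
    "10\t2\tsrc/example.c",
    "0\t40\tsrc/obsolete.c",
    "-\t-\tdocs/diagram.png",
])

if __name__ == "__main__":
    print(sum_numstat(SAMPLE))
```

In the sample, the second file only loses lines, so the commit deletes 42 lines while adding 10; summed over a year, the same effect produces rows with more lines removed than added.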

| Category | Entity | 2019 | 2020 | 2021 | 2022 |
|---|---|---|---|---|---|
| MDBC | MariaDB Corporation | 539484 / 535998 | 785029 / 2906979 | 792562 / 297859 | 1317885 / 499848 |
| MDBF | MariaDB Foundation | 26553 / 24541 | 17395 / 13482 | 28008 / 11637 | 37780 / 15588 |
| Provider | Codership | 39457 / 10938 | 15062 / 2622 | 16857 / 6308 | 7545 / 1286 |
| Provider | CONNECT | 10524 / 8247 | 22163 / 9432 | 9342 / 2498 | 7 / 5 |
| Provider | Amazon | - | 19 / 2 | 985 / 338 | 2601 / 4533 |
| Provider | Oracle Corporation | 690 / 112 | 162 / 423 | 1392 / 27 | 144 / 55 |
| Provider | Tempesta | 3991 / 837 | 822 / 247 | 437 / 212 | - |
| Provider | Huawei | 575 / 35 | 255 / 320 | 406 / 218 | 30 / 21 |
| Sponsor | IBM | 814 / 404 | 15108 / 8431 | 359 / 27 | 1 / 0 |
| Sponsor | ServiceNow | 625 / 512 | 66 / 71 | - | - |
| GSoC | GSoC | 4824 / 268 | 55 / 72 | 11735 / 3784 | 5188 / 1968 |
| Distro | All Distros | 666 / 661 | 261 / 116 | 122 / 30 | 93 / 96 |
| Other | Others | 26570 / 3206 | 1035 / 375 | 11008 / 4245 | 11989 / 2098 |
| TOTAL | | 654773 / 585759 | 857432 / 2942572 | 873213 / 327183 | 1383263 / 525498 |

Lines of code added / removed

Notes:
1. Red Hat is in a grey area after the IBM acquisition. For this matrix we have put them under “Distro”.
2. Tempesta is under MariaDB Corporation in the raw data, but has been moved to “Provider” here.
3. Any organisation in any section with fewer than 10 contributions across the date range has been merged into “Others”, along with individual contributors. All Linux and BSD distributions have been merged into “All Distros” regardless of their number of contributions. This keeps the matrices compact.
4. “CONNECT” denotes contributions to the CONNECT storage engine, all of which were made by a single author.
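The compaction rule in note 3 can be expressed as a short Python sketch. It assumes per-organisation contribution counts are already summed over the whole date range and that developers without an organisation are keyed as “Individual”; the threshold constant and function name are illustrative, and the organisation names and counts in the usage example are placeholders rather than real data.

```python
THRESHOLD = 10  # organisations with fewer contributions go into "Others"


def compact(per_org: dict[str, int], distros: set[str]) -> dict[str, int]:
    """Apply the compaction rule from note 3 to per-organisation totals.

    `per_org` maps an organisation name (or "Individual") to its contribution
    count over the whole date range; `distros` names the Linux/BSD
    distributions, which are always merged regardless of size.
    """
    out: dict[str, int] = {}
    for org, count in per_org.items():
        if org in distros:
            key = "All Distros"
        elif org == "Individual" or count < THRESHOLD:
            key = "Others"
        else:
            key = org
        out[key] = out.get(key, 0) + count
    return out


if __name__ == "__main__":
    # Placeholder names and counts, not real data.
    example = {"Org A": 25, "Org B": 4, "Individual": 40, "Distro X": 7, "Distro Y": 2}
    print(compact(example, distros={"Distro X", "Distro Y"}))
```

Checking distributions before the threshold matters: a large distribution still lands in “All Distros” rather than getting its own row.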

Finally, here is the status of pull requests since the beginning of August. For each week, the table shows the number of newly opened PRs, the number closed without being merged, and the number merged. The final two columns show the all-time running total of PRs and the number still open at the end of that week.

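| Week Ending | New PRs | Closed PRs | Merged PRs | Total PRs | Still Open PRs |
|---|---|---|---|---|---|
| 2022-08-07 | 12 | 3 | 5 | 2219 | 113 |
| 2022-08-14 | 8 | 5 | 2 | 2227 | 114 |
| 2022-08-21 | 3 | 0 | 2 | 2230 | 115 |
| 2022-08-28 | 4 | 2 | 4 | 2234 | 113 |
| 2022-09-04 | 11 | 1 | 6 | 2245 | 117 |

Pull request counts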
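The last two columns follow from the first three: each week the running total grows by the new PRs, and the open count grows by the new PRs and shrinks by those closed or merged. The Python sketch below reproduces the table under that assumption; it ignores reopened PRs, which the table does not show. The starting values of 2207 total and 109 open are back-calculated from the first row rather than stated in the post, and this is not the Foundation's monitoring script.

```python
from collections.abc import Iterable, Iterator
from dataclasses import dataclass


@dataclass
class Week:
    ending: str
    new: int
    closed: int  # closed without being merged
    merged: int


# Weekly counts from the "Pull request counts" table.
WEEKS = [
    Week("2022-08-07", 12, 3, 5),
    Week("2022-08-14", 8, 5, 2),
    Week("2022-08-21", 3, 0, 2),
    Week("2022-08-28", 4, 2, 4),
    Week("2022-09-04", 11, 1, 6),
]


def running_counts(weeks: Iterable[Week], total: int,
                   still_open: int) -> Iterator[tuple[str, int, int]]:
    """Yield (week ending, all-time total PRs, PRs still open) per week.

    `total` and `still_open` are the values before the first week.
    Assumes no PRs are reopened, so each PR leaves the open set once.
    """
    for w in weeks:
        total += w.new
        still_open += w.new - w.closed - w.merged
        yield w.ending, total, still_open


if __name__ == "__main__":
    # Starting values back-calculated from the first row:
    # 2219 - 12 = 2207 total, and 113 - 12 + 3 + 5 = 109 open.
    for row in running_counts(WEEKS, total=2207, still_open=109):
        print(*row)
```

Each printed row matches the Total PRs and Still Open PRs columns above, which is a quick consistency check on the weekly figures.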


Published by Andrew Hutchings

Chief Contributions Officer for the MariaDB Foundation