On Contributions, Pride and Cockiness
At MariaDB Foundation, we are proud of MariaDB Server getting plenty of contributions. But we don’t want to get cocky, so here is an update about where we stand, and what we want to make happen.
First, we have shown our contribution pride in several places. On 15 February 2019, I tweeted:
On code contributions, #MariaDB beats #MySQL 1009 to 247: We have over a thousand (1009) closed pull requests on github (and 179 open), MySQL has 247 closed (1 open). https://t.co/32NIuMMTvc pic.twitter.com/ZZcRBdk939
— Kaj Arnö (@kajarno) February 15, 2019
In our Annual Report 2018, we spent several pages on pull requests and patches, showing code contribution statistics. We ranked our top 30 individual committers, both for 2018 and all-time (2009-2018). We are grateful for the contributions, and we want to keep them coming.
At the same time, we are far from perfect. Some of the contributions come from Tempesta, which is a subcontractor to MariaDB Corporation and hence only semi-external. Moreover, we have a sizable backlog of contributions, meaning there are contributors who have submitted pull requests on GitHub, but whose contributions are not yet merged into the code base. They are waiting, and we are frustrating them. This is disappointing for all: MariaDB Foundation leaves opportunities to improve MariaDB Server on the table, the contributors don’t see their hard work go live and become demotivated, and the users of MariaDB don’t get to use the new functionality. Seems like a no-brainer for us to improve the process.
Which is what we are setting out to do. After some internal meetings, we touched base with our most prolific contributor, Daniel Black of IBM – most prolific as measured by the absolute number of pull requests. The goal was to improve our contribution handling, or in slightly more technical lingo, our pull request handling. We want to get rid of the backlog, in a reasonable time, if possible without adding further resources to the equation. We stated the problem as follows:
End goal
- Reduce backlog of open pull requests
- Motivate contributors to make more contributions
Method of achieving goal
- Clearly set expectations of contributors
- Document, refine, improve the pull request handling process
Specifically, we wanted to test a few hypotheses:
1. Waiting is bad, but seeing one’s merged contribution being revoked is worse. Daniel confirmed this. We will thus not change our policy of being very restrictive with merges: we would rather err on the side of carefulness and merge only in a sustainable way.
2. Context switches are bad; interactivity could speed up the process. Meaningful contributions are non-trivial, and it takes time to dive into them for the reviewer. The reviewer may have detailed questions, which again may take time for the contributor to answer. Our idea is to introduce a semi-mandatory initial contact between the reviewer and the contributor, reasonably soon after the submission, where the reviewer gives the first feedback to the contributor, asking for clarifications, and setting expectations on the duration of the merge process. Merges don’t happen continuously but at particular sprints, and we had better set the contributor expectations properly, because the road can be long from submitting a patch to seeing it released live.
Daniel agreed with the reasoning. Our plan is to verify with all reviewers that they are ready to do this initial review, and whether it can usually be done within a fairly short time (barring conferences and yearly holidays). We also plan for the review to happen over Zulip, in an IRC-like text interaction. If the contributor and reviewer so desire, they can hold the interaction over video, using whatever tool suits them both.
On top of testing our assumptions, the meeting with Daniel and a follow-up discussion with Rasmus Johansson, VP of Engineering for the MariaDB Corporation Server Team, gave two other key conclusions:
3. Having a written trace on GitHub of all interactions is mandatory, but not enough. Having interactions only on GitHub causes the context switches mentioned above, and on top of that, the comments by the reviewers are not always easy to interpret. Are we asking the contributor to change something or not? Is it a suggestion or a requirement? What is the next step, and by whom? Writing is a difficult task! Some of the issues may be solved by picking chat over email as a means of communication, but then the communication must be summed up on GitHub, and not every interaction will happen over chat.
4. The MariaDB Foundation Administrator should do lifecycle management of the Pull Request. This means alerting the reviewer to the pull request not just by official means (Jira), but also interactively over chat (Zulip). The reviewer may mostly be reviewing the work of Corporation colleagues, and might need nudges on how external contributions differ. If the review doesn’t proceed as can reasonably be expected, the administrator can also nudge the reviewer along, so the contributor isn’t left in the dark for an undefined period of time.
As a consequence, we are planning to make some guidelines for our reviewers, on top of the already fairly detailed guidelines we are rolling out for the administrators, who are responsible for the initial contact with the contributor. This initial step involves checking that the contribution is made using a license enabling us to take the contribution, entering things into Jira, finding the adequate release to merge into, and allocating the right reviewer. That should usually happen within the next work day after the contribution, but doesn’t yet set any expectations on behalf of the reviewer.
To conclude, I am happy to point out that we are already making some progress. During 2018, 535 pull requests were opened and only 392 were closed. The backlog thus grew by 143 contributions. During 2019, we have reversed the trend, closing 266 pull requests compared to 238 opened. The backlog has thus shrunk by 28. A start.
All in all, of 1302 pull requests, we have closed 89%, or 1154 PRs. We are working on the remaining 148.
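For readers who want to check the arithmetic, here is a minimal sketch in Python that recomputes the backlog figures from the numbers quoted above. The variable and function names are made up for illustration; this is not a script we actually run against GitHub.

```python
# Pull requests opened and closed per year, as quoted in this post.
pr_counts = {
    2018: {"opened": 535, "closed": 392},
    2019: {"opened": 238, "closed": 266},
}


def backlog_change(opened: int, closed: int) -> int:
    """Positive means the backlog grew, negative means it shrank."""
    return opened - closed


def closed_share(total: int, closed: int) -> float:
    """Percentage of all pull requests that have been closed."""
    return 100 * closed / total


# Backlog change per year (hypothetical helper names, figures from the post).
changes = {year: backlog_change(**counts) for year, counts in pr_counts.items()}

# All-time totals quoted in the post.
total_prs, total_closed = 1302, 1154
still_open = total_prs - total_closed
share = round(closed_share(total_prs, total_closed))

print(changes)      # backlog change per year
print(still_open)   # pull requests still open
print(share)        # percentage closed, rounded
```

The sign convention is the only design choice here: a positive change means more pull requests came in than were handled in that year.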
As for “closed”, it means “handled”, as in either merged or declined. We could be more specific, with categories like rejected, out-of-tree merge, and resubmitted. But we still have more elementary homework to do: we will now focus primarily on reducing the backlog and improving expectation setting, before going into other goals.
If you have input for us on how to improve, talk to us! This month, Vicentiu Ciorbaru and Robert Bindar will be at Percona Live in Austin in the US, and I will be at Open Shift in Split in Croatia. And you can also approach us over Zulip or email us using our email addresses at firstname@mariadb.org. Both Zulip and personal email also work as escalation paths, should you find that we’re too slow or otherwise don’t act according to the expectations we set.
There are also two time slots for new contributors, as stated here: from 8:00 to 10:00 UTC on Mondays, and from 10:00 to 12:00 UTC on Thursdays. They take place on IRC and in the Zulip stream “general”, topic “New Contributors”.
Starting point resources for new contributors are here.
You can find more information in the “CONTRIBUTING.md” file in the server repository.