To ensure that new or changed code does not break anything, an extensive test suite is run to catch regressions during MariaDB Server development. Developers are expected to run the test suite locally and, after pushing the code to the remote repository, also check that the more extensive tests run on Travis CI, and in particular on Buildbot, do not find any regressions either. However, sometimes developers are sloppy, make mistakes, or skip checking the test results in a hurry to push their change to the main branches, and from that point on the test suite gives errors for everybody else. Ideally there should be an automatic mechanism that keeps code changes that cause failures out of the main branches.
In September 2015 GitHub announced a feature called “Protected branches”. As MariaDB development is done on GitHub, we have been wanting to use this feature. For reasons explained later in this article we cannot switch it on quite yet, but we are close to being able to do so. Here is a summary of how the Protected Branches feature could work for us, and what still needs to be done before we can enable it.
Unlike most git repositories, the MariaDB Server git repository does not have a master branch. Instead there is a branch for each major version. Currently, the major versions still maintained are 5.5 to 10.2. New development is done on 10.3, which will become the next major release. These branches are the ones that should be protected. All bug fix and feature branches are created from these major release branches, so the release branches must always pass the test suite. Then, if a developer introduces an error in new code, the developer can tell that their own change broke something when the test suite stops passing (turns from green to red).
In practice, if a developer has modified something on a branch that is protected and tries to push that to GitHub, it will look like this:
    $ git push
    remote: error: GH006: Protected branch update failed for refs/heads/10.3.
    remote: error: Required status check "continuous-integration/travis-ci" is expected.
    To github.com:mariadb/server.git
     ! [remote rejected] 10.3 -> 10.3 (protected branch hook declined)
    error: failed to push some refs to 'git@github.com:mariadb/server.git'
Force pushes are disabled as well when branch protection is active:
    $ git push -f
    remote: error: GH006: Protected branch update failed for refs/heads/10.3.
    remote: error: Cannot force-push to a protected branch
    To github.com:mariadb/server.git
     ! [remote rejected] 10.3 -> 10.3 (protected branch hook declined)
    error: failed to push some refs to 'git@github.com:mariadb/server.git'
GitHub will automatically prevent developers from pushing changes to a protected branch unless the required test suite runs have passed. For MariaDB this currently means that the Travis CI run must have completed and passed without errors.
On the github.com web page this is visible as well: PRs cannot be merged before Travis CI has confirmed that the test suite passes.
If the test suite has passed, the PR reviewer can merge as usual, either with the button in the web UI or from the command line. GitHub accepts a push to a protected branch if the push only sends commits that GitHub already knows about and that have simply been merged (fast-forward or not) into the protected branch.
In practice this means that all developers must do all development on their own branches, push those branches to GitHub, wait until the tests pass, and only then run git merge and git push from their own local repositories. GitHub recognizes the commit ids and knows they have already been flagged as tested and passed.
As all developers in the MariaDB Server project already do all development on separate branches and only merge to the main branches as the final step, using protected branches does not require anybody to change their basic workflow. Also, nobody is required to use the web UI of GitHub. All of the commit id and test status detection happens in the GitHub backend. The gatekeeping works equally well on git push, which prints out in text whether the push was accepted or not, so the command line experience is as good as one could wish.
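The gatekeeping rule described above can be summarized with a small toy model. This is only a sketch of the rule as this article describes it, not GitHub's actual implementation: the names `Push`, `check_push` and `merged_heads`, the shape of the `statuses` mapping and the reason strings are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass

# Name of the required status check, as shown in the rejection message above.
REQUIRED_CHECK = "continuous-integration/travis-ci"


@dataclass(frozen=True)
class Push:
    """A push to a protected branch, reduced to what this toy gatekeeper inspects."""
    merged_heads: tuple  # commit ids of the feature-branch heads being merged
    force: bool = False


def check_push(push, statuses, required=REQUIRED_CHECK):
    """Return (accepted, reason) for a push to a protected branch.

    `statuses` maps commit id -> {check name: state}. It stands in for what
    the hosting service already knows about previously pushed commits.
    """
    if push.force:
        return False, "force pushes are not allowed"
    for commit_id in push.merged_heads:
        state = statuses.get(commit_id, {}).get(required)
        if state != "success":
            return False, f"commit {commit_id} has no passing '{required}' status"
    return True, "ok"


if __name__ == "__main__":
    # Hypothetical commit ids: one tested and passed, one never pushed before.
    known = {"a1b2c3": {REQUIRED_CHECK: "success"}}
    print(check_push(Push(("a1b2c3",)), known))
    print(check_push(Push(("ffffff",)), known))
    print(check_push(Push(("a1b2c3",), force=True), known))
```

The point of the model is that the decision depends only on commit ids and their recorded test status, which is why the same check can be applied to a merge done in the web UI and to a plain git push.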
In addition to our main build and test system, Buildbot, the project source repository contains a .travis.yml file, and Travis CI (travis-ci.org) is activated for the repository. Travis CI testing has some limits, most notably the 50-minute maximum duration, so it is not as complete as the Buildbot tests. From a gatekeeping point of view this is not a problem. In fact it is good that the Travis CI tests are slightly smaller in scope, as that makes them less picky. They don’t include every single test that exists, and they run only in Ubuntu 14.04 build environments and only on the amd64 architecture.
If a code change causes a failure that can be seen on Travis CI, it surely must not be accepted into the mainline branches. At some later time, we can start to raise the bar higher, requiring every developer to account for more environments and architectures and more extensive tests. But for the first version of a gatekeeper, Travis CI is perfect.
Travis CI also has the benefit that anybody outside the core developer group can easily replicate the tests and test their own code on Travis CI before submitting it to MariaDB for review. Travis CI is free to use for any MariaDB/server fork on GitHub.
There are just two problems remaining. First of all, the test suite needs to pass to begin with. We cannot activate a gatekeeper if the codebase is already broken. Developers would be forced to focus 100% on fixing test suite errors, which many consider uninteresting work, especially when it was somebody else before them who broke the test suite. Luckily, this is a one-time effort, and the automatic gatekeeper will make sure that such a situation does not repeat.
Secondly, Travis CI itself has random failures that are a bit too common. Travis CI differentiates between failed builds, where the test script itself reported a problem, and errored builds, where Travis CI was unable to run the whole test suite, for example due to a network error or an unresponsive test runner. Right now the latter kind happens a bit too frequently, and hopefully improvements to the Travis CI platform will fix that.
If a single job fails every now and then, it does not matter much. The developer will not waste too much time looking at the error message and pressing Restart on that specific job.
But if false positives are too frequent, then real development gets stalled as developers just bang their heads against test runner systems.
We need to work a bit more on the MariaDB test suite itself and the .travis.yml definition file to get a system that works reliably and has a rate of false positives below 1%.
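To get a feel for what a 1% target implies, here is a rough back-of-the-envelope sketch. It assumes that a build consists of several jobs that fail spuriously independently of each other and with the same probability. Both that independence and the job count of 10 used below are assumptions made for illustration, not measurements from the MariaDB Travis CI setup.

```python
def build_false_positive_rate(job_rate, jobs):
    """Probability that at least one of `jobs` independent jobs fails spuriously."""
    return 1.0 - (1.0 - job_rate) ** jobs


def max_job_rate(build_rate, jobs):
    """Per-job spurious failure rate at which the whole build hits `build_rate`."""
    return 1.0 - (1.0 - build_rate) ** (1.0 / jobs)


if __name__ == "__main__":
    jobs = 10  # hypothetical number of jobs in one build
    # Build-level rate when every job is spuriously red 1% of the time.
    print(f"{build_false_positive_rate(0.01, jobs):.3f}")
    # Per-job rate needed to keep the whole build at 1%.
    print(f"{max_job_rate(0.01, jobs):.5f}")
```

Under these assumptions, ten jobs that each fail spuriously 1% of the time give a build that is falsely red roughly 9.6% of the time, and keeping the whole build below 1% requires each job to stay at about 0.1%. The more jobs a build has, the stricter the per-job requirement becomes.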
This is also a good time for people interested in MariaDB development to read up on the Travis CI documentation. It is a great service used by multiple repositories on github.com/MariaDB, and many companies we know use the Travis CI business version to continuously test their private repositories. Hopefully we can soon take the next step in our GitHub and Travis CI integration and activate protected branches.