How Git and Mercurial can hurt your code review
For all of their efforts in helping teams deliver more commits, Git and Mercurial have also introduced one significant problem: the slowing down of peer reviews. Marcin Kuzminski explains how “pragmatic groupings” can help.
Five years ago, the telecommunications company I worked at started to move from Subversion to distributed version control systems (DVCS) like Git and Mercurial. The benefits were huge, but like anything new and exciting, there were problems.
Git and Mercurial are both wonderful at letting software developers commit frequently, from anywhere, even remotely. The ease of commits allowed our team to go from 5 commits a day to 100 commits a day. Then, we noticed a problem. Reading through the history of commits for peer review of code became too much to handle. In the short term, our peer review of code decreased in efficiency.
We eventually developed a technique that I call “pragmatic grouping.” The technique can be used with either Git or Mercurial and requires no special technology. Just download Git or Mercurial from the Internet and you can get started. Before I explain the techniques of pragmatic grouping, I need to provide some background information on why I moved from Subversion.
Subversion limitations for new school developers
When our development team used Subversion, everyone had to be connected to the server to commit. This wasn’t scaling for our team as we became more distributed. We had a lot of people working on the same project and had constant problems with merges and deployment. Our team members were wasting huge amounts of time fixing merge problems.
We were moving to a modern development model where people commit as often as possible. Each developer records each small step. With Subversion you have to constantly wait for commits. If someone else merges, you can have a conflict and then you waste an hour just sorting through the conflicting merges. This can happen all the time.
As the development process changed, everyone knew that the tools had to change. My new school colleagues and I were changing our development workflow faster than Subversion could adapt. We were caught up in a type of “merge hell,” as we tried to commit as often as possible, sometimes every 15 minutes. You always needed a connection to the office and working from home was a nightmare due to constant communication lags.
Don’t depend on the server
When you don’t depend on a server connection, you can often work faster. This also means that you don’t have to constantly look for places where you can get Internet access (or a VPN) just to be able to commit. With both Git and Mercurial, everyone has a backup of everything (files, history), not just the server.
With Git and Mercurial, anyone can become the server. You can commit very frequently if you need to without breaking the code of other developers. Commits are local. You don’t step on the toes of other developers while committing. You don’t break the builds or environments of other developers just by committing. People without “commit access” can commit, because committing in a DVCS does not imply uploading code.
This lowers the barrier for contributions. As an integrator, you can decide to pull their changes or not. Contributors can team up and handle their own merging, meaning less work for integrators in the end. Contributors can have their own branches without affecting anyone else’s, while still being able to share them if necessary. This new programming style can reinforce natural communication since a DVCS makes communication essential. In Subversion, what you have instead are commit races, which force communication, but by obstructing your work.
Exploding number of commits
Other people in my organization were committing once per day or even once every few days. Often, these were developers working almost by themselves on a project. You can see by this difference that there is a great range of workflows in collaborative team development. My opinion is that the world is moving toward more collaboration and more frequent commits with better packaging or representation of a set of commits.
The frequency of commits and the ability of the version control system to handle frequent commits greatly improve collaboration. For my team of developers building Web and cloud-based applications, Subversion was driving us nuts. We were encountering so many problems adapting Subversion to our new development process that we assigned one of the developers on our team to manage Subversion. We always needed to wait for him to fix problems, sometimes leaving us unable to work. With our development style so dependent on frequent commits, this Subversion downtime caused huge frustration.
Today, we’re committing ten to twenty times a day per person. Sometimes a person will commit sixty times. In a small team of five developers, we’ll have a hundred commits per day.
Taming the commit history
Our best practice is to commit as often as possible to clear out the working copy. Each developer marks out the steps of how they get from point A to point B. We then review all the individual steps together as one change. I use the term “pragmatic history” to describe this workflow, and I call each grouped step a “pragmatic.”
For example, if a person opens a pull request with twenty commits that all revolve around a single idea, then we squash all twenty commits into one pragmatic. Going from twenty commits to one pragmatic is rare. A more typical scenario is to divide the twenty commits into three pragmatics. A typical group of pragmatics:
Don’t enforce a single way to represent the workflow. You should give developers on your team the freedom to do what they think is best. This is especially important in large enterprises where it’s common to have mixed teams and each team may have a different way of doing things.
For example, if you fix a typo in a doc and then fix another typo, the workflow policy should simply say, “please just squash these changes into one pragmatic.” We tell our developers to take a few “pragmatic” steps. These are the major steps they took to achieve the functionality needed. Here’s another example group of pragmatics:
- Created the function
- Extended the function with additional parameters
- Wrote tests for the function
In this way, you can commit frequently, but during the code review, ask that people refer to pragmatics. For example, take twenty commits and change them into a nice history of five clear, pragmatic steps. You should avoid steps like “I fixed a typo” or “I renamed a function,” which are not helpful to the code review process.
This pragmatic history is a workflow that I use and developed after many years of using version control systems extensively. Both Mercurial and Git support history edits. When people on my team approve the code with either Mercurial or Git, they make a note to squash down the commits into a few steps. Then the developer rearranges their code into a smaller group of pragmatics and I click a button and accept the change.
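In Git, one common way to do this kind of history edit is an interactive rebase (`git rebase -i`), whose todo list uses `pick` to keep a commit and `squash` to fold a commit into the one before it. Mercurial's `hg histedit` offers a similar `pick` and `fold` pair. The following is only a minimal sketch of the idea, not our team's actual tooling: the `build_rebase_todo` helper, the commit hashes, and the group labels are all hypothetical, and it assumes the commits can be reordered by group without conflicts.

```python
from typing import Dict, Iterable, List, Tuple

# (abbreviated hash, subject line, pragmatic label); all values are hypothetical.
Commit = Tuple[str, str, str]


def build_rebase_todo(commits: Iterable[Commit]) -> List[str]:
    """Return `git rebase -i` todo lines that fold each group into one commit.

    `commits` must be ordered oldest first, as in the rebase todo list.
    Groups are emitted in order of first appearance.
    """
    groups: Dict[str, List[Commit]] = {}
    for commit in commits:
        groups.setdefault(commit[2], []).append(commit)

    todo: List[str] = []
    for members in groups.values():
        for index, (sha, subject, _label) in enumerate(members):
            # The first commit of a group is kept; later ones are folded into it.
            action = "pick" if index == 0 else "squash"
            todo.append(f"{action} {sha} {subject}")
    return todo


if __name__ == "__main__":
    history = [
        ("a1b2c3d", "Create the function", "create"),
        ("b2c3d4e", "Fix typo in docstring", "create"),
        ("c3d4e5f", "Add extra parameters", "extend"),
        ("d4e5f6a", "Rename a parameter", "extend"),
        ("e5f6a7b", "Write tests", "tests"),
    ]
    print("\n".join(build_rebase_todo(history)))
```

The five commits in the usage example collapse into three pragmatics, one per label. In practice the developer would paste or type the equivalent lines into the editor that the interactive rebase opens, then write one clear message per pragmatic.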
This technique allows you to do things like bisect. For example, if you’re trying to retroactively find where a bug was introduced, you have clear steps to look at. Each of the steps covers an area of working functionality. You can then easily go to each step, dig down and identify the smaller step that caused the program to break.
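Bisecting is essentially a binary search over the history, which is what `git bisect` and `hg bisect` automate. As a rough sketch of why a short list of working steps helps, the hypothetical `first_bad_step` function below searches a linear, oldest-first list of pragmatics. It assumes the first step is known to be good, the last is known to be bad, and that once the bug appears it stays in every later step. The step names and the `is_good` check are made up for illustration.

```python
from typing import Callable, Sequence


def first_bad_step(steps: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Binary-search a linear, oldest-first history for the first bad step.

    Assumes steps[0] is good, steps[-1] is bad, and that every step after
    the first bad one is bad as well.
    """
    if len(steps) < 2:
        raise ValueError("need at least one good and one bad step")

    low, high = 0, len(steps) - 1  # low: known good, high: known bad
    while high - low > 1:
        middle = (low + high) // 2
        if is_good(steps[middle]):
            low = middle
        else:
            high = middle
    return steps[high]


if __name__ == "__main__":
    pragmatics = ["baseline", "create function", "extend function", "write tests", "update docs"]
    broken = {"extend function", "write tests", "update docs"}
    # Stand-in for checking out a step and running the test suite.
    print(first_bad_step(pragmatics, lambda step: step not in broken))
```

Each check halves the remaining range, so the number of steps you have to build and test grows only with the logarithm of the history length. With pragmatics, every step you land on is also a coherent, working unit rather than a half-finished typo fix.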
This also makes reading the commit history very efficient, which is needed for the new style of distributed development that teams are moving to.
If I read “fixed a typo” fifty times, it will get boring and I may lose motivation to dig down into the code. Developers may ask, “why do I even need to read the commit history?” Grouping the commits into pragmatics makes the history easier to read, makes work more fun for each developer, and makes your team much more efficient.
When I first started to look at the new tools like Mercurial and Git, I quickly saw that they solved the painful points of Subversion. That was pretty exciting because each member could commit as often as they wanted to. If they were online or offline, it didn’t matter. The merges and the whole history looked much better with Mercurial and Git. Both Mercurial and Git were generally faster and made us more productive.
My story of the pragmatic workflow illustrates what we did to handle the history of a large number of commits. When we were on Subversion and committing once every day or every few days, this problem didn’t occur. As we started to work more collaboratively, the number of commits increased and we ran into problems with Subversion. Moving off Subversion created a new set of challenges. One of these challenges was how to deal with the large number of commits. A change in how we managed the workflow resolved these problems.