Better Together Than Alone: Pull Requests
In this JAX Magazine article, Tim Berglund is our guide in getting to grips with one of GitHub’s core features.
The romping success of both Git and GitHub is impossible to ignore, and Git practices are rapidly becoming a ubiquitous staple of a developer’s working day. To those who still believe forking is a bad word, GitHubber Tim Berglund explains the beauty of Pull Requests within both open source and enterprise circles.
GitHub’s mission is to make it easier to work together than alone. Throughout the company’s history, they have worked toward this goal by providing an easy way to host Git repositories online and surrounding those repositories with a growing set of collaborative mechanisms that work in the browser and through Git itself.
Pull Requests may be the most important of these innovations. They have enabled increased open-source contributions, provided new ways for enterprise teams to work together, and offered a full-featured code review mechanism—all at the cost of a few Git commands and a simple web user interface. Let’s take a look at how pull requests work and how to use them in open-source and enterprise environments.
An open source use case
Suppose you are using the open-source Ratpack framework for a lightweight web application you want to build using the Groovy language. For simple apps, this just means you clone the template and code away, but you’ve encountered a missing feature in the framework that’s really getting in your way. (Full disclosure: the author is also the maintainer of Ratpack, and is aware of several missing features in the framework on which he would happily accept pull requests!)
To get your new feature into the framework, you need to download the code, make the changes, test them locally, and then persuade the maintainer to accept them. In the past, this meant submitting a patch to a mailing list, or worse, fighting your way into the inner circle of the project’s committers. Both of those options worked in the past, but they contain just enough friction to dissuade those who are marginally less motivated to contribute back to the project. Pull requests help capture that margin of productive committers who want to submit their contributions, and enable more highly motivated committers to contribute with less wasted time.
The pull request process starts with you making a copy of the project to which you want to contribute. You could simply clone the project to your local disk, and you’d be free to make changes to that clone as you saw fit, but you wouldn’t be able to submit them back to the project. Remember, you aren’t a committer to the project, and you might not have the goal of becoming one. Instead, to get a copy of the project, you have to go to the source and make the project your own. You have to copy it on GitHub and own that copy. You have to fork the project.
Prior to GitHub, forking was a bad word. For an open-source project to fork, it meant that factions had developed within the team writing the code, and one faction was splitting off from the other and taking the codebase in a separate and incompatible direction. On GitHub, forking simply means that you create a copy of the project under your username, maintaining a connection to the original. It means you’ve got a place to do independent work on the repo, with the promise that you’ll easily be able to submit your changes back later on.
Figure 1: The Fork button in the GitHub web UI
In the upper-right corner of the main repo, there’s a button labeled “Fork.” Click on this button as shown in Figure 1, and you’ll be treated to a brief animation while GitHub does some work in the background. A few seconds later, you’ll be redirected to what looks like the same repo you just left, except that the URL has changed, as shown in Figure 2. This copy of the repo belongs to you!
Figure 2: A newly forked repo, belonging to githubstudent instead of tlberglund.
To do any serious work in your fork, you’ll have to clone it to your development machine. Following the username we’re using in our example, you’d want to get to a working directory on your machine and type the following:
$ git clone https://github.com/githubstudent/Ratpack.git
You can then make changes to that clone, commit them, and push them back to GitHub. You own the fork, so you have the right to push commits to it whenever you’d like.
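The clone, commit, and push cycle described above can be rehearsed offline. The following is a minimal sketch rather than the article's own procedure: local bare repositories stand in for the GitHub-hosted main repo and for your fork, and the file names, commit messages, and identity are made up for illustration.

```shell
set -eu
# Throwaway identity and sandbox, so nothing here touches real config or repos.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
sandbox=$(mktemp -d)
cd "$sandbox"

# Stand-in for the main repo on GitHub (hypothetical contents).
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "Ratpack" > seed/README
git -C seed add README
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed main.git

# Stand-in for clicking Fork: a hosted copy that you own.
git clone -q --bare main.git fork.git

# Clone your fork; against GitHub this would be the https URL of the fork.
git clone -q fork.git work
cd work

# Make a change, commit it, and push it back to your fork.
echo "new feature" > feature.txt
git add feature.txt
git commit -q -m "Add missing feature"
git push -q origin master
```

After the push, the fork holds the new commit while the main repo is untouched, which is exactly the situation a pull request is meant to resolve.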
Once you’ve forked, you have complete control over your own copy of the original project. You can use your fork in your own local builds, push your changes to GitHub, and generally carry on with a private copy of the project in whatever way you see fit. Eventually, though, you’d probably like to get those changes incorporated back into the original project. The easiest way to do this is with a pull request.
A pull request is like a message you send from your fork to the original repository. It has a title, a message body, and a list of commits you want incorporated in the original repo. It’s a way of telling the person or organization who owns the repo that you’ve got work you would like them to merge into their version of the project.
To be precise, the pull request “message” doesn’t really contain a list of commits. In reality, it specifies the branch from which you want the pull-requested commits to come, plus the branch into which you want them merged. As shown in the pull request page screenshot in Figure 3, the source branch is on the right-hand side of the screen, and the destination is on the left. All of the commits that are in the source branch but not yet in the destination branch are a part of the pull request. As we’ll see later on, we can even push new commits to this branch after opening the pull request, and those new commits participate in the request as well.
Figure 3: The form used to submit a new pull request
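To preview locally which commits a pull request would carry, one option is Git's two-dot range, which lists the commits reachable from the source branch but not from the destination. This is a sketch in a single throwaway repository; the branch name new-feature and the file contents are hypothetical.

```shell
set -eu
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
sandbox=$(mktemp -d)
cd "$sandbox"

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master
echo base > file.txt
git add file.txt
git commit -q -m "Base commit"

# Source branch (hypothetical name) with two commits the destination lacks.
git checkout -q -b new-feature
echo one >> file.txt
git commit -q -a -m "First feature commit"
echo two >> file.txt
git commit -q -a -m "Second feature commit"

# Commits in new-feature that are not yet in master.
git log --oneline master..new-feature
pr_commits=$(git rev-list --count master..new-feature)
echo "commits in the request: $pr_commits"
```

Here the range contains the two feature commits and leaves out the base commit that both branches share.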
Communicating around pull requests
Once the contributor clicks on the Send Pull Request button, he or she is redirected to the page showing the pull request detail. Since the destination of the pull request is the tlberglund/Ratpack repo, the PR’s page lives under that repo, at a URL that ends in the pull request’s id. This is the home base for the PR: where the owner of the repo can accept or reject it, others can view its status, and we can collaborate around the proposed code change. That collaboration takes place through three channels of communication: comments on the PR, comments on the PR’s commits, and comments on lines of code in those commits.
If you look at the bottom of the pull request page, you’ll find a comment box like the one in Figure 4. Anyone with read access to the repository can enter a comment here about the PR. Generally, the owner of the repo uses this thread to discuss the proffered changes with the person who submitted the pull request. If there’s something about the submission that doesn’t look right to the owner, he or she can mention it here, and the author of the PR will get notified about the comment. It’s a great way to talk about a submission that the owner doesn’t want to accept, but also doesn’t want to reject outright. Significant collaboration can take place in this part of the page.
Figure 4: The discussion thread associated with a pull request
Since a pull request lists all of the commits that were in the pull-requested branch (but not yet in the destination branch), you can also access those commits directly through the web interface. Each commit’s hash is a link to the commit’s detail page. After clicking on that link, as shown in Figure 5, you will again see a comment box at the bottom of the commit detail page. Here you are able to engage in a discussion of a particular commit, as distinct from the entire PR in which the commit participates. If the bulk of the commits in a PR looked good to the repo owner, but he or she wanted to object to a particular commit that contained only whitespace changes, this might be the right place to do it.
Figure 5: Linking to a commit detail page from a pull request
Finally, and perhaps most powerfully, we can focus our online discussion on the diffs introduced by the pull request, and comment on individual lines of code in that diff (as shown in Figure 6). When the conversation must delve down into very low-level details, there is simply no substitute for looking directly at code. To see the aggregate diffs introduced by the pull request, click on the Files Changed tab near the top of the pull request detail page. This view provides a web-based method for conducting this discussion among multiple participants, regardless of where they are located geographically or whether they can all participate in the discussion at the same time.
Figure 6: Commenting on an individual line of code
If you receive negative, but constructive, feedback on a pull request, you’re likely to want to address it to make the PR acceptable to the repo owner rather than just abandon the effort. There’s nothing else you have to do to the PR itself to submit this additional work; simply continue making changes in the branch on which you originally submitted the PR, push those changes to GitHub, and they show up automatically. The pull request itself is a request to submit an entire branch on a given repo, so new commits in that branch on that repo automatically participate, even if they didn’t exist when you first sent the request.
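The way follow-up commits join an open request can be sketched offline. In this illustration, local bare repositories stand in for the main repo and the fork, the remote name mainrepo is chosen only for the example, and the count of commits that the fork's branch has beyond the main repo's branch serves as a stand-in for what the pull request page would list.

```shell
set -eu
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
sandbox=$(mktemp -d)
cd "$sandbox"

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo base > seed/file.txt
git -C seed add file.txt
git -C seed commit -q -m "Base commit"
git clone -q --bare seed main.git       # stand-in for the main repo
git clone -q --bare main.git fork.git   # stand-in for your fork
git clone -q fork.git work
cd work
git remote add mainrepo "$sandbox/main.git"
git fetch -q mainrepo

# State of the branch when the pull request is opened.
echo "first attempt" >> file.txt
git commit -q -a -m "Add feature"
git push -q origin master
before=$(git rev-list --count mainrepo/master..origin/master)

# Address review feedback on the same branch and push again.
echo "fix from review" >> file.txt
git commit -q -a -m "Address review feedback"
git push -q origin master
after=$(git rev-list --count mainrepo/master..origin/master)

echo "commits ahead of the main repo: $before, then $after"
```

Nothing about the request itself is edited; the branch simply grows, and the set of commits it offers grows with it.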
Merging Pull Requests
So far we’ve been playing the role of the unknown open-source contributor on the Internet, hoping to have his or her commits accepted by the Ratpack project. Assuming we’ve done great work and have been persuasive in the pull request discussion, the owner of the main repo will be ready to click the merge button. Let’s put that person’s shoes on now, and take a look at the different ways to merge PRs and the various scenarios that surround them.
From the Web
The simplest way to merge a pull request is on the web. If you look at the bottom of the pull request detail page as shown in Figure 7, you might see a bright green bar with a big button in it that says “Merge This Pull Request Automatically.” If you see this bar, clicking the button will cause the commits submitted in the pull request to be merged into the target branch. Note that this will actually introduce a new merge commit into the hosted Git repository on GitHub—a commit you’ll have to pull down to your clone later on. (This article assumes you already know how to push and pull from upstream Git repositories, but just in case, we’ll see how to do this in a little while.)
Figure 7: GitHub’s built-in support for merging a pull request
In some cases, of course, you can’t automatically merge the PR. You’ll know this because the green bar will instead be gray, and will contain text telling you that you can’t merge automatically. Behind the scenes, GitHub has already attempted the merge, and knows that a conflict will result if it proceeds. Since there is no way to resolve that conflict through the web site, you’re going to have to do the merge from the command line.
From the command line
Merging a pull request is ultimately the same as merging any other kind of branch. It differs only in what branch is being merged: most merges are done on local feature branches, but the branch to be merged in the case of a PR comes from another repository entirely. We are forced to do this merge “manually” in the case of a merge conflict, but we might decide to handle unconflicted merges this way as well. Fetching the commits to a local repository gives us the freedom to experiment with the merge in isolation before sharing it with the world through the GitHub repo.
If you already know how to branch and merge in Git, there is really only one new step in this process, and even this step involves a command, fetch, which you’re probably already familiar with. git fetch connects to an upstream repo, determines what objects that repo has that the local repo does not, downloads these objects to the local object database, and optionally updates any remote branch names associated with that upstream repo. Importantly, fetch does not create any new commits on any local branches of the repo; that is, it doesn’t merge any of the downloaded content. It merely saves it to the local object database and updates named pointers to the newly-downloaded commits.
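The claim that fetch leaves local branches alone is easy to check in a sandbox. This sketch assumes a local bare repository as the upstream and a second clone, named other here, that plays the part of someone else pushing work; all names are illustrative.

```shell
set -eu
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
sandbox=$(mktemp -d)
cd "$sandbox"

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo base > seed/file.txt
git -C seed add file.txt
git -C seed commit -q -m "Base commit"
git clone -q --bare seed upstream.git
git clone -q upstream.git local

# Someone else pushes a new commit upstream (simulated with a second clone).
git clone -q upstream.git other
echo change >> other/file.txt
git -C other commit -q -a -m "Upstream change"
git -C other push -q origin master

cd local
before=$(git rev-parse master)
git fetch -q origin
after=$(git rev-parse master)
tracking=$(git rev-parse origin/master)

echo "local master before and after fetch: $before $after"
echo "remote branch pointer origin/master: $tracking"
```

The local master stays where it was, while origin/master moves to the newly downloaded commit; bringing that commit into master takes a separate merge.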
To fetch the branch from which the pull request was sent, you’ll need a repo URL and a branch name. You can find all the information you need on the pull request detail page. See Figure 8 to see where the merge instructions are kept. Clicking the information icon brings up detailed pull request merge instructions, which will always work if you follow them to the letter. However, a lighter-weight procedure is easy to construct from the information we have.
Figure 8: Click on the information icon to get pull request merge instructions
Using the URL and branch name shown in the dialog box, we can simply copy them both to the clipboard as shown in Figure 9, and we will be ready to do the fetch.
Figure 9: The URL and branch from which to fetch to resolve the merge conflict
The most common use of fetch relies on a named remote repository, typically called origin. In this case, however, we’ll be fetching directly from the URL we just constructed, specifying the branch containing the commits we want to merge (in this case, master).
$ git fetch https://github.com/githubstudent/Ratpack.git master
Since this may be a one-time fetch from this repository (we don’t know if this fork will be submitting more PRs or not), we didn’t bother to create a remote. That remote would have served as a handy label for recalling the URL of the requesting repo more easily. Because there is no remote, Git has no remote branch names to update, and instead issues us a temporary pointer to the fetched commits, called FETCH_HEAD. Keep in mind that this label is temporary; the very next fetch will overwrite it, and we would have to repeat the fetch to get it back.
Now we’re ready to perform our merge. Since the pull request was targeted to the master branch of our repo, check out the master branch and type the following:
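A fetch by URL and the resulting FETCH_HEAD can be inspected before anything is merged. The sketch below makes the usual sandbox assumptions: a local path stands in for the fork's https URL, and the contributor and maintainer directory names are invented for the example.

```shell
set -eu
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
sandbox=$(mktemp -d)
cd "$sandbox"

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo base > seed/file.txt
git -C seed add file.txt
git -C seed commit -q -m "Base commit"
git clone -q --bare seed main.git       # stand-in for the main repo
git clone -q --bare main.git fork.git   # stand-in for the contributor's fork

# The contributor's work lands on the fork.
git clone -q fork.git contributor
echo feature >> contributor/file.txt
git -C contributor commit -q -a -m "Add feature"
git -C contributor push -q origin master

# The maintainer fetches straight from the fork's location, without a remote.
git clone -q main.git maintainer
cd maintainer
git fetch -q "$sandbox/fork.git" master

remotes=$(git remote)                    # no new remote was created
fetched=$(git rev-parse FETCH_HEAD)      # tip of the fetched branch
incoming=$(git rev-list --count master..FETCH_HEAD)

echo "remotes: $remotes"
git log --oneline master..FETCH_HEAD
```

Listing master..FETCH_HEAD is one way to review what the merge would bring in while the temporary pointer is still valid.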
$ git merge FETCH_HEAD
If we are merging the PR locally because of a merge conflict (and not merely because we like to be more cautious with our merges), then that conflict will arise at this point, and we’ll be able to resolve it through the normal merge conflict resolution procedure.
Once the merge is complete, push your work to the upstream repo with a git push from the master branch. This will send the pull-requested commits and the merge commit you just created to GitHub. If you go back to the pull request detail page, you’ll notice that it automatically shows that the PR has been closed. Your work is done.
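The whole command-line merge, including a conflict, might look like the following when rehearsed offline. This is a sketch under stated assumptions: local bare repositories replace GitHub, both sides deliberately edit the same line to force a conflict, and the resolution shown (keeping both lines) is just one arbitrary choice.

```shell
set -eu
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
sandbox=$(mktemp -d)
cd "$sandbox"

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo base > seed/file.txt
git -C seed add file.txt
git -C seed commit -q -m "Base commit"
git clone -q --bare seed main.git
git clone -q --bare main.git fork.git

# Contributor rewrites the line in the fork.
git clone -q fork.git contributor
echo "contributor version" > contributor/file.txt
git -C contributor commit -q -a -m "Contributor change"
git -C contributor push -q origin master

# Maintainer rewrites the same line in the main repo.
git clone -q main.git maintainer
cd maintainer
echo "maintainer version" > file.txt
git commit -q -a -m "Maintainer change"
git push -q origin master

# Fetch the pull-requested branch and attempt the merge.
git fetch -q "$sandbox/fork.git" master
if git merge --no-edit FETCH_HEAD; then
  echo "merged cleanly"
else
  echo "conflict: resolving by hand"
  # Hypothetical resolution: keep both versions of the line.
  printf 'maintainer version\ncontributor version\n' > file.txt
  git add file.txt
  git commit -q --no-edit
fi

# Publish the merge commit and the contributor's commits to the main repo.
git push -q origin master
```

The --no-edit flag only keeps Git from opening an editor for the merge message, which is convenient in a script; at an interactive prompt you may prefer to review the message.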
An Enterprise Use Case
So the basic pull request workflow is clear enough. However, what if your project is not open source, but instead proprietary code belonging to your employer? Pull Requests have proved to be an incredibly effective mechanism for simplifying committer lists while inviting increased contributions from various open-source communities, but the concerns of open source development are far from your mind if you are an enterprise user. How, then, do pull requests benefit you? The answer is that the potential benefits are just as large as in the open-source case. Let’s talk through a scenario.
Suppose you work on an internal product with a team of 5-10 other developers. Each of you has push and pull rights on your repository, and you routinely push your work to master and various feature branches as you see fit. It’s not obvious how pull requests could help you manage this workflow. (As it turns out, the GitHub flow uses pull requests even in scenarios like this, but this is beyond the scope of this article.)
However, you also consume other components developed internally in your organization. Suppose you need to use the corporate localization API, a module from the company-wide security team, and some build infrastructure components from the enterprise delivery automation team. Normally your build system obtains these modules and incorporates them into your build, and you receive updates to the components as they are published by their respective authors. But as the consumer of an API, sometimes you have to change that API for reasons of your own, and push your own changes upstream to the source of the component. Sometimes the consumer legitimately becomes the producer.
Forking and pull requesting provide an ideal solution to this common enterprise problem. If you have to make changes to the enterprise localization API to implement a new feature, begin by forking the enterprise localization API repo, and reconfiguring your build to use your forked version, not the “official” one. Once you’ve made your changes to the API and tested them thoroughly—perhaps even deploying them to a localized production environment, depending on your release management policies—you are ready to submit your work back to the enterprise localization repo. The ideal mechanism for doing this is through a pull request, using exactly the same procedures outlined in the open-source use case. You are the developer of a proprietary application that consumes a proprietary component, but the workflow through which you modify that component looks exactly the same as the open-source workflow described above.
Keeping your fork up to date
Regardless of whether you’re an enterprise user or an open-source contributor, at some point you’ll submit a pull request and it will be accepted by the upstream repo. Congratulations! Your code, signed with your name and email address, is now an immutable part of that repo. If you plan to continue your own contributions on your fork to be submitted by future pull requests, you’re going to have to keep your fork up to date with the main repo. This also involves Git commands you probably already know, but may not have used in precisely this formula.
To begin with, you should create a remote to point to the main repo. You already have a remote called origin, which points to your fork on GitHub. You’ll need to add a second remote to point to the main repo, like this:
$ git remote add mainrepo https://github.com/tlberglund/Ratpack.git
With that remote established, you need only pull from it to keep your repo up to date. From your own master branch, type the following:
$ git pull mainrepo master
This will keep your master branch up to date with the main repo. As with any merge, there is the potential for this command to generate merge conflicts, which you should be prepared to resolve and commit.
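Since git pull is a fetch followed by a merge, the update can also be written as two explicit steps, which is how this offline sketch does it. The assumptions are the same as before: local bare repositories stand in for the main repo and the fork, and a clone named other simulates new work landing in the main repo. The final push to origin is an extra step not spelled out in the text above; it brings the hosted fork up to date as well.

```shell
set -eu
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
sandbox=$(mktemp -d)
cd "$sandbox"

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo base > seed/file.txt
git -C seed add file.txt
git -C seed commit -q -m "Base commit"
git clone -q --bare seed main.git       # stand-in for the main repo
git clone -q --bare main.git fork.git   # stand-in for your fork
git clone -q fork.git work

# The main repo moves ahead (simulated with another clone).
git clone -q main.git other
echo upstream >> other/file.txt
git -C other commit -q -a -m "Upstream change"
git -C other push -q origin master

cd work
# Second remote pointing at the main repo; origin still points at the fork.
git remote add mainrepo "$sandbox/main.git"

# Equivalent of "git pull mainrepo master", in two steps.
git fetch -q mainrepo
git merge -q --no-edit mainrepo/master

# Optional: update the hosted fork too.
git push -q origin master
```

Splitting the pull this way leaves room to inspect mainrepo/master before merging, at the cost of one more command.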
Effective use of pull requests requires just a little familiarity with some important GitHub UI features, along with basic Git network and merge commands. They are not hard to learn, and they manage to implement the sweet spot between managing contributor rights and encouraging broad-based community contributions in many open-source and enterprise contexts. They are not the only collaborative feature on GitHub, and they will not be the last, but they remain one of GitHub’s most important innovations to date, helping the site to deliver on the promise of making it better to work together on code than alone.
Author Bio: Tim is a full-stack generalist and passionate teacher who loves working with people as much as he loves to code. He is a GitHubber whose mission is to make it easy for everybody in the world to use Git. He speaks internationally and on the No Fluff Just Stuff tour in the United States, and loves to speak on Git, Cassandra, and other topics. He is co-president of the Denver Open Source User Group, co-presenter of the best-selling O’Reilly Git Master Class, co-author of Building and Testing with Gradle, a member of the O’Reilly Expert Network and a member of the GigaOM Pro Analyst Network. He occasionally blogs at timberglund.com. He lives in Littleton, CO, USA with the wife of his youth and their three children.
This article previously appeared in JAX Magazine: Pulling Together.