Git 101

Better Together Than Alone: Pull Requests

Tim Berglund

In this JAX Magazine article, Tim Berglund is our guide in getting to grips with one of GitHub’s core features.

The romping success of both Git and GitHub is
impossible to ignore, and Git practices are rapidly becoming a
ubiquitous staple of a developer’s working day. To those who still
believe forking is a bad word, GitHubber Tim Berglund explains the
beauty of Pull Requests within both open source and enterprise
circles.

GitHub’s mission is to make it easier to work
together than alone. Throughout the company’s history, they have
worked toward this goal by providing an easy way to host Git
repositories online and surrounding those repositories with a
growing set of collaborative mechanisms that work in the browser
and through Git itself.

Pull Requests may be the most important of these
innovations. They have enabled increased open-source contributions,
provided new ways for enterprise teams to work together, and
offered a full-featured code review mechanism—all at the cost of a
few Git commands and a simple web user interface. Let’s take a look
at how pull requests work and how to use them in open-source and
enterprise environments.

An open source use case

Suppose you are using the open-source Ratpack framework for a
lightweight web application you want to build using the Groovy
language. For simple apps, this just means you clone the template
and code away, but you’ve encountered a missing feature in the
framework that’s really getting in your way. (Full disclosure: the
author is also the maintainer of Ratpack, and is aware of several
missing features in the framework on which he would happily accept
pull requests!)

To get your new feature into the framework, you
need to download the code, make the changes, test them locally, and
then persuade the maintainer to accept them. In the past, this
meant submitting a patch to a mailing list, or worse, fighting your
way into the inner circle of the project’s committers. Both of
those options worked in the past, but they contain just enough
friction to dissuade those who are marginally less motivated to
contribute back to the project. Pull requests help capture that
margin of productive committers who want to submit their
contributions, and enable more highly motivated committers to
contribute with less wasted time.

Forking

The pull request process
starts with you making a copy of the project to which you want to
contribute. You could simply clone the project to your local disk,
and you’d be free to make changes to that clone as you saw fit, but
you wouldn’t be able to submit them back to the project. Remember,
you aren’t a committer to the project, and you might not have the
goal of becoming one. Instead, to get a copy of the project, you
have to go to the source and make the project your own. You have to
copy it on GitHub and own that copy. You have to
fork the
project.

Prior to GitHub, forking was a bad word. For an
open-source project to fork, it meant that factions had developed
within the team writing the code, and one faction was splitting off
from the other and taking the codebase in a separate and
incompatible direction. On GitHub, forking simply means that you
create a copy of the project under your username, maintaining a
connection to the original. It means you’ve got a place to do
independent work on the repo, with the promise that you’ll easily
be able to submit your changes back later on.

Figure 1: The Fork button in the GitHub web UI

In the upper-right corner of the main repo, there’s a button labeled
“Fork.” Click on this button as shown in Figure 1, and you’ll be
treated to a brief animation while GitHub does some work in the
background. A few seconds later, you’ll be redirected to what looks
like the same repo you just left, except this time you’ll notice
that the URL has changed, as in Figure 2. This copy of the repo
belongs to you!

 

Figure 2: A newly forked repo, belonging to githubstudent instead
of tlberglund.

 

To do any serious work in your fork, you’ll have
to clone it to your development machine. Following the username
we’re using in our example, you’d want to get to a working
directory on your machine and type the following:

 

$ git clone https://github.com/githubstudent/Ratpack.git

 

You can then make changes to that clone, commit
them, and push them back to GitHub. You own the clone, so you have
the right to push commits to it whenever you’d like.
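
The clone, commit, and push cycle can be sketched end to end as
follows. This is a minimal sketch rather than a transcript of a real
session: so that it runs offline, a local bare repository named
fork.git stands in for the fork on GitHub, and the file name, author
identity, and commit messages are made up for illustration.

```shell
set -eu
# Offline stand-in: a local bare repository plays the part of the fork
# on GitHub. With a real fork you would clone
# https://github.com/githubstudent/Ratpack.git instead.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Git Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Seed the stand-in fork with one commit on master.
git init --quiet seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "Ratpack" > seed/README
git -C seed add README
git -C seed commit --quiet -m "Initial commit"
git clone --quiet --bare seed fork.git

# The cycle itself: clone the fork, change something, commit, push.
git clone --quiet fork.git Ratpack
cd Ratpack
echo "the missing feature" > feature.txt   # hypothetical change
git add feature.txt
git commit --quiet -m "Add the missing feature"
git push --quiet origin master

# The fork now holds the new commit on top of the initial one.
git --git-dir="$work/fork.git" log --oneline master
```

Against a real fork, only the clone URL changes; the add, commit,
and push steps are the same.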

Pull Requesting

Once you’ve forked, you have complete control
over your own copy of the original project. You can use your fork
in your own local builds, push your changes to GitHub, and
generally carry on with a private copy of the project in whatever
way you see fit. Eventually, though, you’d probably like to get
those changes incorporated back into the original project. The
easiest way to do this is with a pull request.

A pull request is like a message you send from
your fork to the original repository. It has a title, a message
body, and a list of commits you want incorporated in the original
repo. It’s a way of telling the person or organization who owns the
repo that you’ve got work you would like them to merge into their
version of the project.

To be precise, the pull request “message” doesn’t really contain a
list of commits. In reality, it specifies the branch from which you
want the pull-requested commits to come, plus the branch into which
you want them merged. As shown in the pull request page screenshot
in Figure 3, the source branch is on the right-hand side of the
screen, and the destination is on the left. All of the commits that
are in the source branch but not yet in the destination branch are
part of the pull request. As we’ll see later on, we can even push
new commits to this branch after opening the pull request, and those
new commits participate in the request as well.
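
One way to picture “a source branch plus a destination branch” is
with Git’s two-dot range syntax, which lists the commits reachable
from one branch but not from another. The sketch below is only an
approximation of what the pull request page shows, not a description
of how GitHub computes it. It assumes two local directories, main
and fork, standing in for the original repo and the fork; the file
names and commit messages are hypothetical.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Git Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Stand-in for the original repository.
git init --quiet main
git -C main symbolic-ref HEAD refs/heads/master
echo "Ratpack" > main/README
git -C main add README
git -C main commit --quiet -m "Initial commit"

# Stand-in for the fork, with two commits the original does not have.
git clone --quiet main fork
echo "one" > fork/one.txt
git -C fork add one.txt
git -C fork commit --quiet -m "First proposed change"
echo "two" > fork/two.txt
git -C fork add two.txt
git -C fork commit --quiet -m "Second proposed change"

# From the destination's point of view: which commits are on the
# source branch but not yet on the destination branch?
cd main
git fetch --quiet ../fork master
git log --oneline master..FETCH_HEAD
pr_commit_count=$(git rev-list --count master..FETCH_HEAD)
echo "commits the request would carry: $pr_commit_count"
```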

 

Figure 3: The form used to submit a new pull request

Communicating around pull requests

Once the contributor clicks on the Send Pull Request button, he or
she is redirected to the page showing the pull request detail. Since
the destination of the pull request is the tlberglund/Ratpack repo,
the PR’s page lives under that repo, at a URL identified by the pull
request’s id. This is the home base for the PR: where the owner of
the repo can accept or reject it, others can view its status, and
we can collaborate around the proposed code change. That
collaboration takes place through three channels of communication:
comments on the PR, comments on the PR’s commits, and comments on
lines of code in those commits.

If you look at the bottom of the pull request
page, you’ll find a comment box like the one in Figure 4.
Anyone with read access to the repository can enter a comment here
about the PR. Generally, the owner of the repo uses this thread to
discuss the proffered changes with the person who submitted the
pull request. If there’s something about the submission that
doesn’t look right to the owner, he or she can mention it here, and
the author of the PR will get notified about the comment. It’s a
great way to talk about a submission that the owner doesn’t want to
accept, but also doesn’t want to reject outright. Significant
collaboration can take place in this part of the page.

 

Figure 4: The discussion thread associated with a pull request

 

Since a pull request lists all of the commits
that were in the pull-requested branch (but not yet in the
destination branch), you can also access those commits directly
through the web interface. Each commit’s hash is a link to the
commit’s detail page. If you click on that link, as shown in
Figure 5, you will again see a comment box at the
bottom of the commit detail page. Here you are able to engage in a
discussion of a particular commit, as distinct from the entire PR
in which the commit participates. If the bulk of the commits in a
PR looked good to the repo owner, but s/he wanted to object to a
particular commit that contained only whitespace changes, this
might be the right place to do it.

 

Figure 5: Linking to a commit detail page from a pull request

 

Finally, and perhaps most powerfully, we can
focus our online discussion on the diffs introduced by the pull
request, and comment on individual lines of code in that diff (as
shown in Figure 6). When the conversation must
delve down into very low-level details, there is simply no
substitute for looking directly at code. To see the aggregate diffs
introduced by the pull request, click on the Files Changed tab near
the top of the pull request detail page. This view provides a
web-based method for conducting this
discussion among multiple participants, regardless of where they
are located geographically or whether they can all participate in
the discussion at the same time.

 

Figure 6: Commenting on an individual line of code

 

If you receive negative, but constructive,
feedback on a pull request, you’re likely to want to address it to
make the PR acceptable to the repo owner rather than just abandon
the effort. There’s nothing else you have to do to the PR itself to
submit this additional work; simply continue making changes in the
branch on which you originally submitted the PR, push those changes
to GitHub, and they show up automatically. The pull request itself
is a request to submit an entire branch on a
given repo, so new commits in that branch on that repo
automatically participate, even if they didn’t exist when you first
sent the request.
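
Because the request follows the branch, responding to review
feedback is just another commit and push. The following sketch
illustrates the idea offline under stated assumptions: main and
fork.git are local stand-ins for the original repo and the fork on
GitHub, and pr_commits is a hypothetical helper that counts the
commits on the fork’s master that main’s master lacks, roughly what
the pull request page would list.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Git Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Stand-ins: the original repo, the hosted fork, and a working clone.
git init --quiet main
git -C main symbolic-ref HEAD refs/heads/master
echo "Ratpack" > main/README
git -C main add README
git -C main commit --quiet -m "Initial commit"
git clone --quiet --bare main fork.git
git clone --quiet fork.git student

# Hypothetical helper: how many commits are on the fork's master
# but not yet on the original's master?
pr_commits() {
  git -C "$work/main" fetch --quiet "$work/fork.git" master
  git -C "$work/main" rev-list --count master..FETCH_HEAD
}

cd student
echo "feature" > feature.txt
git add feature.txt
git commit --quiet -m "Add feature"
git push --quiet origin master
before=$(pr_commits)

# Review feedback arrives; fix it on the same branch and push again.
echo "fix from review" >> feature.txt
git commit --quiet -am "Address review feedback"
git push --quiet origin master
after=$(pr_commits)

echo "commits in the request: $before, then $after"
```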

  

Merging Pull Requests

So far we’ve been playing the role of the
unknown open-source contributor on the Internet, hoping to have his
or her commits accepted by the Ratpack project. Assuming we’ve done
great work and have been persuasive in the pull request discussion,
the owner of the main repo will be ready to click the merge button.
Let’s put that person’s shoes on now, and take a look at the
different ways to merge PRs and the various scenarios that surround
them.

From the Web

The simplest way to merge a pull request is on
the web. If you look at the bottom of the pull request detail page
as shown in Figure 7, you might see a bright green
bar with a big button in it that says “Merge This Pull Request
Automatically.” If you see this bar, clicking the button will cause
the commits submitted in the pull request to be merged into the
target branch. Note that this will actually introduce a new merge
commit into the hosted Git repository on GitHub—a commit you’ll
have to pull down to your clone later on. (This article assumes you
already know how to push and pull from upstream Git repositories,
but just in case, we’ll see how to do this in a little while.)

 

Figure 7: GitHub’s built-in support for merging a pull request

 

In some cases, of course, you can’t
automatically merge the PR. You’ll know this, because the green bar
will instead be gray, and will contain text telling you that you
can’t merge automatically. Behind the scenes, GitHub has already
attempted the merge, and knows that a conflict will result if it
proceeds. Since there is no way to resolve that conflict through
the web site, you’re going to have to do the merge from the command
line.

From the command line

Merging a pull request is ultimately the same as
merging any other kind of branch. It differs only in what branch is
being merged: most merges are done on local feature branches, but
the branch to be merged in the case of a PR comes from another
repository entirely. We are forced to do this merge “manually” in
the case of a merge conflict, but we might decide to handle
unconflicted merges this way as well. Fetching the commits to a
local repository gives us the freedom to experiment with the merge
in isolation before sharing it with the world through the GitHub
repo.

If you already know how to branch and merge in
Git, there is really only one new step in this process, and even
this step involves a command, fetch, which you’re
probably already familiar with. git fetch connects to an upstream
repo, determines what
objects that repo has that the local repo does not, downloads these
objects to the local object database, and optionally updates any
remote branch names associated with that upstream repo.
Importantly, fetch does not create any new commits on any local
branches of the repo; that is, it doesn’t merge any of the
downloaded content. It merely saves it to the local object database
and updates named pointers to the newly-downloaded commits.
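
The claim that fetch leaves local branches alone is easy to check
for yourself. Here is a small offline sketch, assuming a local
directory named upstream standing in for the remote repository; the
file names and commit messages are hypothetical.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Git Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Stand-in for the upstream repo, plus a clone of it.
git init --quiet upstream
git -C upstream symbolic-ref HEAD refs/heads/master
echo "Ratpack" > upstream/README
git -C upstream add README
git -C upstream commit --quiet -m "Initial commit"
git clone --quiet upstream local

# Upstream moves ahead after the clone was made.
echo "later work" > upstream/later.txt
git -C upstream add later.txt
git -C upstream commit --quiet -m "Later upstream work"

cd local
before=$(git rev-parse master)
git fetch --quiet origin
after=$(git rev-parse master)
remote_tip=$(git rev-parse origin/master)

# The local branch has not moved; only the remote branch name has.
echo "local master:  $before -> $after"
echo "origin/master: $remote_tip"
```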

To fetch the branch from which the pull request
was sent, you’ll need a repo URL and a branch name. You can find
all the information you need on the pull request detail page.
Figure 8 shows where the merge
instructions are kept. Clicking the information icon brings up
detailed pull request merge instructions, which will always work if
you follow them to the letter. However, a lighter-weight procedure
is easy to construct from the information we have.

 

Figure 8: Click on the information icon to get pull request merge
instructions

 

Using the URL and branch name shown in the
dialog box, we can simply copy them both to the clipboard as shown
in Figure 9, and we will be ready to do the
fetch.

 

Figure 9: The URL and branch from which to fetch to resolve the merge
conflict

 

The most common use
of fetch relies on a named remote
repository, typically called origin. In this
case, however, we’ll be fetching directly from the URL we just
constructed, specifying the branch containing the commits we want
to merge (in this case, master).

 

$ git fetch https://github.com/githubstudent/Ratpack.git master

 

Since this may be a one-time fetch from this repository (we don’t
know if this clone will be submitting more PRs or not), we didn’t
bother to create a remote, which would have served as a handy label
for recalling the URL of the requesting repo. As a result, Git has
no remote branch names to update, and instead issues us a temporary
pointer to the fetched commits, called FETCH_HEAD. Keep in mind that
this label is temporary; the very next fetch will overwrite it, and
we would have to repeat the fetch to get it back.
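
If you expect to run other fetches before you finish reviewing, one
option is to give the fetched commits a local branch name of their
own. This is a sketch of that idea, not a step the procedure
requires. It assumes local directories main, fork, and other as
stand-ins for the original repo and two contributors’ repos, and
pr-under-review is a hypothetical branch name.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Repo Owner" GIT_AUTHOR_EMAIL="owner@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

git init --quiet main
git -C main symbolic-ref HEAD refs/heads/master
echo "Ratpack" > main/README
git -C main add README
git -C main commit --quiet -m "Initial commit"

# Two independent clones, each with its own new commit.
git clone --quiet main fork
echo "proposed" > fork/proposed.txt
git -C fork add proposed.txt
git -C fork commit --quiet -m "Proposed change"
git clone --quiet main other
echo "unrelated" > other/unrelated.txt
git -C other add unrelated.txt
git -C other commit --quiet -m "Unrelated work"

cd main
git fetch --quiet ../fork master
pr_tip=$(git rev-parse FETCH_HEAD)

# Pin the fetched commits under a name that later fetches won't touch.
git branch pr-under-review FETCH_HEAD

# A second fetch overwrites FETCH_HEAD...
git fetch --quiet ../other master
echo "FETCH_HEAD now:  $(git rev-parse FETCH_HEAD)"
# ...but the branch still points at the pull-requested commits.
echo "pr-under-review: $(git rev-parse pr-under-review)"
```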

Now we’re ready to perform our merge. Since the
pull request was targeted to the master branch of our repo,
check out the master branch and type the following:

 

$ git merge FETCH_HEAD

 

If we are merging the PR locally because of a
merge conflict (and not merely because we like to be more cautious
with our merges), then that conflict will arise at this point, and
we’ll be able to resolve it through the normal merge conflict
resolution procedure.
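
For readers who have not resolved a conflict before, the normal
procedure looks roughly like this. The sketch manufactures a
conflict offline so that it can be run as-is: main and fork are
local stand-ins for the two repositories, greeting.txt is a
hypothetical file that both sides changed, and the “resolution” is
simply a hand-written replacement line.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Repo Owner" GIT_AUTHOR_EMAIL="owner@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

git init --quiet main
git -C main symbolic-ref HEAD refs/heads/master
echo "hello" > main/greeting.txt
git -C main add greeting.txt
git -C main commit --quiet -m "Initial commit"

# Both sides change the same line, which guarantees a conflict.
git clone --quiet main fork
echo "hello from the fork" > fork/greeting.txt
git -C fork commit --quiet -am "Greet from the fork"
echo "hello from main" > main/greeting.txt
git -C main commit --quiet -am "Greet from main"

cd main
git fetch --quiet ../fork master
# The merge exits non-zero when it stops on a conflict.
git merge --no-edit FETCH_HEAD || echo "merge stopped on a conflict"

# List the files Git could not merge on its own.
conflicted=$(git diff --name-only --diff-filter=U)
echo "conflicted: $conflicted"

# Resolve by editing the file, then stage it and conclude the merge.
echo "hello from main and the fork" > greeting.txt
git add greeting.txt
git commit --quiet --no-edit
git log --oneline --graph
```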

Once the merge is complete, push your work to
the upstream repo with a git push from
the master branch. This will send the
pull-requested commits and the merge commit you just created to
GitHub. If you go back to the pull request detail page, you’ll
notice that it automatically shows that the PR has been closed.
Your work is done.

An Enterprise Use Case

So the basic pull request workflow is clear
enough. However, what if your project is not open source, but
instead proprietary code belonging to your employer? Pull Requests
have proved to be an incredibly effective mechanism for simplifying
committer lists while inviting increased contributions from various
open-source communities, but the concerns of open source
development are far from your mind if you are an enterprise user.
How, then, do pull requests benefit you? The answer is that the
potential benefits are just as large as in the open-source case.
Let’s talk through a scenario.

Suppose you work on an internal product with a
team of 5-10 other developers. Each of you has push and pull rights
on your repository, and you routinely push your work to master and
various feature branches as you see fit. It’s not obvious how pull
requests could help you manage this workflow. (As it turns out, the
GitHub flow uses pull requests even in scenarios like this, but this
is beyond the scope of this article.)

However, you also consume other components
developed internally in your organization. Suppose you need to use
the corporate localization API, a module from the company-wide
security team, and some build infrastructure components from the
enterprise delivery automation team. Normally your build system
obtains these modules and incorporates them into your build, and
you receive updates to the components as they are published by
their respective authors. But as the consumer of an API, sometimes
you have to change that API for reasons of your own, and push your
own changes upstream to the source of the component. Sometimes the
consumer legitimately becomes the producer.

Forking and pull requesting provide an ideal
solution to this common enterprise problem. If you have to make
changes to the enterprise localization API to implement a new
feature, begin by forking the enterprise localization API repo, and
reconfiguring your build to use your forked version, not the
“official” one. Once you’ve made your changes to the API and tested
them thoroughly—perhaps even deploying them to a localized
production environment, depending on your release management
policies—you are ready to submit your work back to the enterprise
localization repo. The ideal mechanism for doing this is through a
pull request, using exactly the same procedures outlined in the
open-source use case. You are the developer of a proprietary
application that consumes a proprietary component, but the workflow
through which you modify that component looks exactly the same as
the open-source workflow described above.

Keeping your fork up to date

Regardless of whether you’re an enterprise user
or an open-source contributor, at some point you’ll submit a pull
request and it will be accepted by the upstream repo.
Congratulations! Your code, signed with your name and email
address, is now an immutable part of that repo. If you plan to
continue your own contributions on your fork to be submitted by
future pull requests, you’re going to have to keep your fork up to
date with the main repo. This also involves Git commands you
probably already know, but may not have used in precisely this
formula.

To begin with, you should create a remote to
point to the main repo. You already have a remote
called origin, which points to your fork on
GitHub. You’ll need to add a second remote to point to the main
repo, like this:

 

$ git remote add mainrepo https://github.com/tlberglund/Ratpack.git

 

With that remote established, you need only pull
from it to keep your repo up to date. From your
own master branch, type the
following:

 

$ git pull mainrepo master

 

This will keep
your master branch up to date with the
main repo. As with any merge, there is the potential for this
command to generate merge conflicts, which you should be prepared
to resolve and commit.
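
The pull above is shorthand for a fetch followed by a merge. Spelled
out as two steps, and followed by a push so that the fork on GitHub
catches up as well, the update might look like the sketch below. It
runs offline under stated assumptions: mainrepo.git and fork.git are
local bare repositories standing in for the main repo and the fork,
and the maintainer directory exists only to simulate new upstream
work.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Git Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Stand-ins for the main repo and the fork on GitHub.
git init --quiet seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "Ratpack" > seed/README
git -C seed add README
git -C seed commit --quiet -m "Initial commit"
git clone --quiet --bare seed mainrepo.git
git clone --quiet --bare mainrepo.git fork.git
git clone --quiet fork.git student

# Simulate new work landing in the main repo.
git clone --quiet mainrepo.git maintainer
echo "upstream work" > maintainer/upstream.txt
git -C maintainer add upstream.txt
git -C maintainer commit --quiet -m "New upstream work"
git -C maintainer push --quiet origin master

# Bring the fork up to date: fetch, merge, then push to the fork.
cd student
git remote add mainrepo "$work/mainrepo.git"
git fetch --quiet mainrepo
git merge --quiet --no-edit mainrepo/master
git push --quiet origin master
git log --oneline master
```

Splitting the pull into fetch and merge gives you a chance to
inspect mainrepo/master before anything lands on your own branch.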

Concluding thoughts

Effective use of pull requests requires just a
little familiarity with some important GitHub UI features, along
with basic Git network and merge commands. They are not hard to
learn, and they manage to implement the sweet spot between managing
contributor rights and encouraging broad-based community
contributions in many open-source and enterprise contexts. They are
not the only collaborative feature on GitHub, and they will not be
the last, but they remain one of GitHub’s most important
innovations to date, helping the site to deliver on the promise of
making it better to work together on code than alone.

Author Bio: Tim is a
full-stack generalist and passionate teacher who loves working with
people as much as he loves to code. He is a GitHubber whose mission
is to make it easy for everybody in the world to use Git. He is a
speaker internationally and on the No Fluff Just Stuff tour in the
United States, who loves to speak on Git, Cassandra, and other
topics. He is co-president of the Denver Open Source User Group,
co-presenter of the best-selling O’Reilly Git Master Class,
co-author of Building and Testing with Gradle, a member of the
O’Reilly Expert Network and a member of the GigaOM Pro Analyst
Network. He occasionally blogs at timberglund.com. He lives in
Littleton, CO, USA with the wife of his youth and their three
children.

This article previously appeared in JAX
Magazine: Pulling Together.

Author
Tim is a full-stack generalist and passionate teacher who loves coding, presenting, and working with people. He is founder and principal software developer at the August Technology Group, a technology consulting firm focused on the JVM. He is a speaker internationally and on the No Fluff Just Stuff tour in the United States, and is co-president of the Denver Open Source User Group. He has recently been exploring non-relational data stores, continuous deployment, and how software architecture should resemble an ant colony. He lives in Littleton with the wife of his youth and their three children.