Travel through time with Git

Change the Past

BenStraub
timetravel

Of all the things git can do, perhaps one of the most feared is the ability to rewrite history – Ben Straub explains how.

What sets Git apart from other version control systems is the amount of control it provides to its users. If you’re used to Subversion or Mercurial, where history is permanent and immutable, you’ll probably have two reactions upon discovering the rebase command: horror that such a thing even exists, and confusion as to why anyone would want it in the first place.

If you’ve spent much time working on software, you’ve probably been in at least one of these situations:

While working on a feature, you find that you’ve written a utility that’s useful in other parts of the system, but it’s tangled up with your feature.

You forgot to delete a file, and a two-line bug fix got spread across 3 commits.

You had to make a commit that only undoes some changes introduced in an older commit.

You started working on a bugfix, which later turned into a deploy-it-yesterday hotfix.

Rebase can help you solve these problems, or avoid them entirely. It’s easiest to think about this by seeing it in action. Let’s look at a couple of examples.

[ Diagram key: Green blocks are commits, and their arrows indicate “parenthood”, so they point backward in time. The gray blocks are branch refs, and their arrows indicate the commit they refer to. Blue blocks are remote refs. ]

A trivial example

Here’s a situation where the master branch (which is shared with the whole team) and your own personal experiment branch (which only exists on your machine) have diverged somewhat (figure 1).

Let’s suppose that, for whatever reason, you don’t want a merge commit. Rebase allows you to bring in the changes from the experiment branch while keeping the history linear. Here’s how to make that happen:

$ git checkout experiment

$ git rebase master

Figure 2 shows what the history looks like afterwards:

It helps to think of commits as a collection of changes rather than a series of fixed versions. The c and d commits have been re-made as c’ and d’, such that their changes are applied on top of f.

A slightly less trivial example

Let’s say you were working on a bug fix, but it turns out to be more urgent than you first thought. Figure 3 shows what your history looks like right now.

You started your fix on the development branch, where all new features work goes. Meanwhile, the deployed branch has had a couple of small fixes applied to it. Now you discover that your fix corrects a critical issue, and you want to deploy it to production, but without all the half-finished features in b and c. Rebase to the rescue!

$ git rebase –onto deployed development fix

The syntax is “–onto <new-base> <old-base> <end>”. In this case, we want a set of changes (starting after development and ending with fix) to be replayed on top of deployed. Figure 4 shows what it looks like afterward.

Conflicts

At its heart, rebase is a merge operation, which inevitably means merge conflicts. The good news is that rebase actually makes it easier to deal with them!

When you do a standard git merge, all of the conflicts are thrown at you at once. If your branches have been separated for a while, this can be a daunting task; dozens of conflicts, scattered through your codebase; only some of which are related to each other. It’s the kind of situation that makes developers run screaming from their keyboards.

Rebase, on the other hand, applies commits one at a time. If any of them conflict, you get to review them as they’re applied, and correct the problems. If it gets to be too much, you can always change your mind: git rebase –abort resets your repository back to how it was before it started.

Interactivity

You could also think of rebase as a series of git cherry-pick operations, like running a script. In fact, there’s even a way to edit this script before it runs. Typing git rebase –interactive (or the shorter -i) drops the rebase script into your text editor (see Listing 1 for an example).

Listing 1

pick 6ad9071 Work on feature ABC

pick 5742a11 Fix bug #24

pick fd68d8d Work on feature DEF

pick a4806b0 Fix makefile



# Rebase 83b50a7..a4806b0 onto 83b50a7

#

# Commands:

# p, pick = use commit

# r, reword = use commit, but edit the commit message

# e, edit = use commit, but stop for amending

# s, squash = use commit, but meld into previous commit

# f, fixup = like "squash", but discard this commit's log message

# x, exec = run command (the rest of the line) using shell

#

# These lines can be re-ordered; they are executed from top to bottom.

#

# If you remove a line here THAT COMMIT WILL BE LOST.

#

# However, if you remove everything, the rebase will be aborted.

#

# Note that empty commits are commented out

End

The top few lines are the rebase script, which you have complete control over. If you decide you don’t want to rebase after all, just delete all the content. The script runs as soon as you close your editor.

This is where rebase’s power really shines through. You can remove commits that you don’t want, introduce new ones, or squash commits together where it makes more sense to have just one. You can change commit messages, or even the contents of the files committed.

This is more than just aesthetics; it becomes even more useful when combined with other git features. Many projects have the policy that all commits should pass the unit tests, which allows git bisect to be very useful for finding when problems were introduced. It also combines well with cherry-pick – extracting a feature into just one commit allows it to be portable to another branch.

A non-trivial example

Suppose you’re working on a feature using a branch, and it’s a ways from being finished. But you notice that part of what you’ve done is actually a completely separate fix, that the rest of the team needs right now (figure 5).

You can use rebase to merge the “accidental” fix into master, while keeping the in-progress feature separate. Let’s suppose that commits e and f contain this fix. First, we create a new branch to hold the fix:

$ git checkout -b fix

Next, we use an interactive rebase to keep only the commits related to the fix, and change it so they apply on top of master:

$ git rebase -i master

# the rebase script:

pick 7a8b9c0 Commit D

pick 1f2e3d4 Commit E

Figure 6 shows what we have so far.

Now we can merge the fix back into master, and rebase our feature on top of it:

$ git checkout master

$ git merge fix

$ git checkout feature

$ git rebase -i master

# the rebase script:

pick abcd123 Commit F

pick 7890def Commit G

We use an interactive rebase to remove the old e and f commits. Figure 7 shows the end result.

It’s really not that scary

The biggest worry people have when they learn about this feature is that they’ll screw up. Relax. It’s going to be okay.

Looking at the diagrams above, you may notice that the original commits aren’t gone; they’re just harder to see. Nothing is ever truly lost in a Git repository. History is built out of commit objects (which are immutable because of SHA–1 hashing) and refs or branches (which change all the time). Almost every invocation of rebase will move a branch around, but the underlying commits are still in the repository.

Try this in any repository you’ve been working in:

$ git reflog my_feature

You’ll see a listing of every commit that branch has pointed to. There’s an entry in the log for every commit you’ve ever made on your machine. It’ll look something like this:

8672898 my_feature@{0}: rebase finished: refs/heads/my_feature onto 72637a5

7b47989 my_feature@{1}: commit: Fixed #7294

c6cb71b my_feature@{2}: clone: from http://url.to/origin/repo

The lines are in reverse date order; the newest entry is at the top. The first line shows the most recent change: A rebase. Undoing a rebase is usually as simple as this:

$ git reset --hard my_feature@{1}

Also keep in mind that all of these operations only affect the repository on your machine. None of the changes rebase is making are shared with anybody else until you decide to share them.

Publicity

When it comes time to put your work out there for other people to see, one simple guideline will save you from worlds of pain: Don’t change history that has left your machine. If you’ve added 10 commits since you pushed, keep your rebasing to only those 10 commits. Git helps you with this; if your push would overwrite history on the origin, it warns you:

$ git push

To https://url.to/origin/repo

! [rejected] master -> master (non-fast-forward)

error: failed to push some refs to 'url.to/origin/repo'

A simple git push -f will get around this, but the warning should be enough to keep people from making mistakes. This is emblematic of git’s approach to most sensitive subjects: Recommend against something risky, but allow it, because it might come in handy someday. Git never says “never”.

Still, there are some situations where you do want to change history that exists elsewhere. One example is the removal of sensitive information from a repository. Let’s walk through an example – start with figure 8.

Let’s say that the c commit contains a server password, and simply adding a new commit that deletes that line won’t do, you want it to have never existed in the repository. So you do an interactive rebase, and remove the offending commit (figure 9).

You make sure the new history is pushed to origin:

$ git push -f origin master

And you send an email to your team, explaining what happened, and what they should do to adjust.

The short version: Tell everyone to do a git pull –rebase. But keep reading to find out what its doing.

The long version: Here’s what’s going on behind the scenes to make that happen.

Your teammate Jill has some work that was based off of the old master (figure 10).

Figure 11 shows what it looks like after a git fetch.

So she has to rebase her work onto the new head of the history:

$ git rebase --onto origin/master master~ master

Figure 12 shows what it looks like now.

Now she’s ready to keep working, or push her work to the origin.

Your teammate Bill has it easier; his master doesn’t have any extra commits on it. Figure 13 shows what his repository looks like after the fetch.

So all he has to do is change his master branch to point to origin/master, and he’ll get the result shows in figure 14.

$ git rebase origin/master

Fin

By now it should be clear that this isn’t just another feature. It changes the way you relate to version control.

VCS tools used to be write-only, like a typewriter – once the ink hit the paper, the only options you had were to keep typing or start over. The ability to rebase is like getting a text editor. Now you can fix typos, check your spelling, and even restructure your entire document before it becomes permanent. Now your history is editable, and you have the power to make it better.

Rebase isn’t without its downsides, though. It can help make the log graph more linear and understandable, but it also erases the messiness, the record of what actually happened while the software was being developed. History, as they say, is written by the victors.

No matter which side of this debate you come down on, the good news here is that you are the one in charge of your history. Do you want periodic merges in your log, or do you prefer it to be linear? Do you want an accurate history of what happened, or an idealized legend of correct actions? It’s entirely up to you.

What do I do now?

Make yourself a cup of tea. Oh, you mean with rebase? Here are some places to start:

The Git documentation is excellent: http://git-scm.com/docs/git-rebase.html

The Git Book has a great section on rebasing as well: http://git-scm.com/book/en/Git-Branching-Rebasing

Ben Straub has always been a programmer, and loves making things work. He hacks on things at GitHub.

Image by Bob Owen

Author
Comments
comments powered by Disqus