Stefan Holm Olsen

To rewrite Git history: add and edit files back in time

When I prepared the code for my previous blog post (an efficient and open sourced inbound connector for InRiver iPMC), I had not yet decided on an open source license for the project. My priority was to get the project to a state of first release and then select an appropriate license.

In the end I chose the very permissive MIT license. I then had to add the license file to my Git repository, as well as a copyright header to all source code files. The easiest way to do that would be to add the license file to the repository, and the copyright notice to all source files, in a single new commit (on top of all the previous commits).

Call me meticulous, but I really wanted that license file to be in the very first commit. I also wanted the copyright notices to be in all source files when they were added. This meant that I had to rewrite the history (as this is called in Git terms).

In this blog post, I reveal the Git commands I refined and ended up using myself.

Scripts to rewrite the history

Warning! Rewriting history and force pushing a Git repository can lead to loss of data if not done carefully. Before rewriting anything, you should always make sure that other contributors have pushed all their changes to the remote repository, and that you have fetched all remote changes to your computer. Even then, you should always back up the repository and/or test the process and the commands on a copy of the repository.
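As a minimal sketch of that precaution, the function below fetches from all remotes and then makes a mirror clone (all branches, tags and other refs) as a backup. The function name and both paths are hypothetical and only serve as an illustration.

```shell
#!/bin/sh
# Sketch only: backup_repository is a hypothetical helper name.
# $1 = path to the working repository, $2 = path for the backup copy.
backup_repository() {
    repo_dir=$1
    backup_dir=$2
    # Make sure every remote change is available locally first.
    git -C "$repo_dir" fetch --all || return 1
    # A mirror clone is a bare copy that includes every ref.
    git clone --mirror "$repo_dir" "$backup_dir"
}
```

It could be called as `backup_repository . ../my-repo-backup.git` before running any rewrite.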

Following the nice warning about data loss, this is what my script does:

  1. At the very first commit, the filter-branch command copies and adds the LICENSE and README files from a folder outside the repository. It also updates the Git index so those files appear in all later commits.
  2. Then in each commit, the filter-branch command goes through all C# files (.cs) and adds my copyright notice (also from a file outside the repository) to the beginning of those files.
  3. Another command cleans up rewrite backups made by the first command.
  4. At the end, the script pushes the repository with force (overwriting the remote repository completely).
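The four steps above can be sketched roughly as follows. This is a simplified variant under stated assumptions, not a verbatim listing: it uses a single --tree-filter pass that touches every commit instead of treating the first commit separately, and the asset folder layout (LICENSE, README.md and header.txt), the ASSETS_DIR variable and the function names are all hypothetical.

```shell
#!/bin/sh
# Sketch only: function names, ASSETS_DIR and the asset file names are
# hypothetical. Run from the top level of a clean working tree.

# Steps 1 and 2: $1 = folder outside the repository holding the assets.
rewrite_history() {
    assets_dir=$(cd "$1" && pwd) || return 1   # the filter needs an absolute path
    export ASSETS_DIR="$assets_dir"
    # The tree filter runs once per commit inside a temporary checkout;
    # files it adds or changes are picked up automatically.
    git filter-branch --force --tree-filter '
        [ -e LICENSE ] || cp "$ASSETS_DIR/LICENSE" LICENSE
        [ -e README.md ] || cp "$ASSETS_DIR/README.md" README.md
        first=$(head -n 1 "$ASSETS_DIR/header.txt")
        find . -type f -name "*.cs" | while IFS= read -r f; do
            # Skip files that already start with the header.
            if [ "$(head -n 1 "$f")" != "$first" ]; then
                cat "$ASSETS_DIR/header.txt" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
            fi
        done
    ' -- --all
}

# Step 3: delete the refs/original/ backups that filter-branch leaves
# behind, then drop the old objects.
cleanup_backups() {
    git for-each-ref --format='%(refname)' refs/original/ |
    while IFS= read -r ref; do
        git update-ref -d "$ref"
    done
    git reflog expire --expire=now --all
    git gc --quiet --prune=now
}

# Step 4: overwrite all branches and tags on the remote ($1, e.g. origin).
force_push() {
    git push --force --all "$1" && git push --force --tags "$1"
}
```

With those definitions, a full run could look like `rewrite_history ../license-assets && cleanup_backups && force_push origin`. Source files that begin with a byte-order mark would need extra handling, since the mark would otherwise end up after the inserted header.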

These are the commands I used:

Using those commands, the LICENSE and README files appeared in my repository as if they had been there from the first commit. Likewise, a copyright notice was added to all .cs files, starting from the commits where each file was first added to the repository.

With minor changes, those commands can also be used for most other use cases. If you can achieve what you want by running a command and making a new commit, the same change can probably also be made back in time with a filter-branch command.

It could be, for instance, removing a file that should never have been committed (e.g. a big database backup, or a NuGet or node_modules folder) or replacing a password that was committed by accident. However, for those exact examples I would personally use a much faster alternative called BFG Repo-Cleaner. In one of my older blog posts, I used that tool for removing folders and files from the history of a repository.
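For comparison, removing a file or folder from every commit with filter-branch could look roughly like the sketch below. It uses an --index-filter, which only edits the index and is therefore quicker than a tree filter. The function name and the REMOVE_PATH variable are hypothetical.

```shell
#!/bin/sh
# Sketch only: remove_from_history and REMOVE_PATH are hypothetical names.
# $1 = repository-relative path of the file or folder to remove.
remove_from_history() {
    export REMOVE_PATH="$1"
    # --ignore-unmatch keeps the filter from failing in commits that do
    # not contain the path; --prune-empty drops commits left empty.
    git filter-branch --force --prune-empty --index-filter \
        'git rm -r --cached --ignore-unmatch --quiet -- "$REMOVE_PATH"' \
        -- --all
}
```

A call such as `remove_from_history backups/database.bak` would rewrite all branches and tags. The old commits stay reachable through refs/original/ until those backup refs are deleted.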

Use at your own risk

I do not make any warranty about the completeness, reliability and accuracy of this information. Any action that you take upon the information in this blog post is strictly at your own risk, and I will not be liable for any losses and damages in connection with the use of this blog post.