Completely remove a file from a git repository with git forget-blob

Usage
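
The original usage snippet is not reproduced here. Going by the invocation that appears in the comments further down (git forget-blob followed by a file name), a call probably looks like the sketch below. The wrapper function forget_file and the file name big-sdk.tar.gz are hypothetical and only there for illustration.

```shell
# Hypothetical wrapper: checks that the script is installed before calling it.
forget_file() {
    if ! command -v git-forget-blob >/dev/null 2>&1; then
        echo "git-forget-blob is not on your PATH" >&2
        return 127
    fi
    # git runs any executable named git-<name> found on PATH as "git <name>".
    git forget-blob "$1"
}

# Example call, run from the root of the repository you want to clean:
#   forget_file big-sdk.tar.gz
```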

Installation

Get the script from GitHub and make it executable.

To do it in one step, paste the following on your terminal

If you don’t like installing to /usr/local/bin using sudo, just copy git-forget-blob wherever you like. It will work as long as the file has execute permissions and lives in a directory listed in your $PATH.
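
As a rough sketch of the sudo-free route, assuming you have already downloaded the script somewhere: the helper name install_forget_blob and the default target ~/.local/bin are my own choices, not part of the tool.

```shell
# Hypothetical helper: copy a downloaded git-forget-blob script into a
# user-owned directory and mark it executable.
#   $1  path to the downloaded script
#   $2  target directory (assumed default: ~/.local/bin)
install_forget_blob() {
    src=$1
    dest=${2:-"$HOME/.local/bin"}
    mkdir -p "$dest" &&
    cp "$src" "$dest/git-forget-blob" &&
    chmod +x "$dest/git-forget-blob"
}

# Example call:
#   install_forget_blob ./git-forget-blob.sh
```

The target directory still has to be listed in $PATH, otherwise git will not find the subcommand.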

Details

Be it by mistake or by a change of mind, sooner or later we all deal with the problem of making a git repository forget about a file.

We soon realize that git rm will not suffice, as git remembers that the file once existed in our history and therefore keeps a reference to it.
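
One quick way to see this for yourself is to ask git whether any commit on any ref still touches the path. The helper name still_in_history is hypothetical; the git log options are standard.

```shell
# Succeeds if at least one commit reachable from any ref touched the path,
# even when the file no longer exists in the working tree.
still_in_history() {
    [ -n "$(git log --all --format=%h -- "$1")" ]
}

# Example call:
#   still_in_history secret.txt && echo "git still remembers it"
```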

To make things worse, rebasing does not easily solve it either, because any remaining reference to the blob prevents git’s garbage collector from reclaiming the space. This includes remote references and reflog references.

Typically, we run into this problem whenever there is some chunky binary blob that our repository needs to hold, even worse if we have to update it from time to time. This can result in our repository quickly growing in size.
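
If you are not sure which blobs are the chunky ones, something along these lines can list them. This is a sketch built from standard git plumbing, not part of git-forget-blob; the function name largest_blobs is made up.

```shell
# Print the N largest blobs reachable from any ref as "<size in bytes> <path>".
#   $1  how many entries to show (default 10)
largest_blobs() {
    git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk '/^blob / { sub(/^blob /, ""); print }' |
    sort -rn |
    head -n "${1:-10}"
}

# Example call:
#   largest_blobs 5
```

The sizes are uncompressed object sizes, so they only give a rough idea of how much each file costs on disk.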

Enter git-forget-blob

In a nutshell, this script

  • uses git filter-branch to apply git rm to every commit
  • then removes all possible references, including remotes, tags and the reflog
  • next, deletes unreferenced packs, and
  • finally, forces aggressive garbage collection with git gc --prune.
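
The steps above can be sketched roughly as follows. This is my own approximation and not the actual script: the function name forget_blob_sketch is hypothetical, the exact flags are assumptions, and the pack cleanup is simply left to git gc instead of being done by hand.

```shell
# Rough approximation of the steps listed above. Destructive: rewrites
# history and deletes tags and remote-tracking refs. Try it on a copy first.
#   $1  path of the file to forget (must not contain a single quote)
forget_blob_sketch() {
    file=$1

    # 1. Remove the file from the index of every commit on every ref.
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --index-filter \
        "git rm --cached --ignore-unmatch -q -- '$file'" -- --all || return 1

    # 2. Drop refs that may still point at the old history: the backups
    #    written by filter-branch, remote-tracking refs and tags.
    git for-each-ref --format='%(refname)' refs/original refs/remotes refs/tags |
    while read -r ref; do
        git update-ref -d "$ref"
    done
    # ...and expire every reflog entry.
    git reflog expire --expire=now --all

    # 3 and 4. Repack and prune everything that is now unreachable.
    git gc --prune=now --aggressive
}

# Example call:
#   forget_blob_sketch path/to/big.bin
```

Note that the remote itself still holds the old objects until the rewritten history is force pushed.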

Things to keep in mind:

  • This rewrites history, so forced pushes, merges, conflicts and similar niceties will follow.
  • For the same reason, tags will be lost and commit hashes will change.

EDIT: you can use git forget-blob inside git rebasetags so that you don’t lose your tags.

Remember to keep a backup copy of the repo before trying this, and use with care.
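
One possible way to take that safety copy is a mirror clone, which carries over all branches and tags. The helper name backup_repo and both paths are placeholders.

```shell
# Hypothetical helper: make a bare mirror clone of a repository.
#   $1  path to the repository
#   $2  where to put the backup (must not exist yet)
backup_repo() {
    git clone --quiet --mirror "$1" "$2"
}

# Example call:
#   backup_repo ~/src/myrepo ~/backups/myrepo.git
```

A mirror clone has no working tree, so uncommitted changes are not included. Copying the whole directory is the simpler option if you need those too.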

Author: nachoparker

Humbly sharing things that I find useful

5 Comments on “Completely remove a file from a git repository with git forget-blob”

  1. Awesome tool!

    Is there any way to use it with a commit hash? My problem is that I have an old commit that added a folder with hundreds of files that add up to ~1GB (it’s an SDK).
    We have since removed those folders from the repo; however, they persist somewhere in the history and the .git folder and make our repo huge.

    Using git-forget-blob would be tedious to run against every file from that commit (if I could even find such a list). Do you have any recommendations?

    1. I guess you could do (assuming a6aa2117 is the bad commit)

      git diff --name-only a6aa2117 a6aa2117~ | while read -r l; do git forget-blob "$l"; done

      This should forget all files that were modified in that commit. Make sure you did not modify any other file that you do not want to forget on that same commit.

      If you are in doubt, do just the first part first and double check

      git diff --name-only a6aa2117 a6aa2117~

      It will probably take a loong time
