
Completely remove a file from a git repository with git forget-blob

Usage
git forget-blob file_to_forget

Installation

Get the script from GitHub and make it executable.

To do it in one step, paste the following into your terminal:

sudo wget https://raw.githubusercontent.com/nachoparker/git-forget-blob/master/git-forget-blob.sh -O /usr/local/bin/git-forget-blob
sudo chmod +x /usr/local/bin/git-forget-blob

If you don’t like installing to /usr/local/bin using sudo, just copy git-forget-blob wherever you like. It will work as long as the file sits in a directory listed in $PATH and has execute permissions.
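A sudo-free install could look roughly like the sketch below. It assumes ~/.local/bin as the default target, which is only a common convention, and install_forget_blob is a hypothetical helper name used for illustration; it is not part of the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of git-forget-blob): copy an already
# downloaded script into a user-owned directory and make it executable.
# The default target ~/.local/bin is an assumption; any directory works.
install_forget_blob() {
  local src="$1"                           # path to the downloaded script
  local dest_dir="${2:-$HOME/.local/bin}"  # where to install it
  mkdir -p "$dest_dir" || return 1
  cp "$src" "$dest_dir/git-forget-blob" || return 1
  chmod +x "$dest_dir/git-forget-blob" || return 1
  # git only finds "git forget-blob" if the directory is in $PATH
  case ":$PATH:" in
    *":$dest_dir:"*) ;;
    *) echo "note: add $dest_dir to your PATH" >&2 ;;
  esac
}

# Usage:
#   wget https://raw.githubusercontent.com/nachoparker/git-forget-blob/master/git-forget-blob.sh -O /tmp/git-forget-blob.sh
#   install_forget_blob /tmp/git-forget-blob.sh
```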

Details

Be it by mistake or by a change of mind, sooner or later we all deal with the problem of making a git repository forget about a file.

We soon realize that git rm will not suffice, as git remembers that the file once existed in our history, and thus will keep a reference to it.

To make things worse, rebasing alone does not help either, because any remaining reference to the blob prevents git's garbage collector from reclaiming the space. This includes remote references and reflog entries.
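A quick way to see this for yourself is to ask git whether any commit reachable from any ref still touches the path. The sketch below is only an illustration: path_in_history is a hypothetical helper name, and it assumes you run it from inside the repository.

```shell
# Hypothetical helper: succeed if any commit reachable from any ref
# (branches, tags, remotes) still touches the given path.
path_in_history() {
  local path="$1"
  # --all walks every ref; the output is empty only if no commit touched the path
  [ -n "$(git log --all --oneline -- "$path")" ]
}

# Usage, after a plain "git rm big.bin && git commit":
#   path_in_history big.bin && echo "big.bin is still in the history"
```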

Typically, we run into this problem whenever there is some chunky binary blob that our repository needs to hold, even worse if we have to update it from time to time. This can result in our repository quickly growing in size.
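If you are not sure which files are responsible for the growth, a listing of the largest blobs in the history can help. The following is a minimal sketch built from standard git plumbing commands; largest_blobs is a hypothetical helper name, and it assumes you run it from inside the repository.

```shell
# Hypothetical helper: list the largest blobs reachable from any ref,
# biggest first. Output columns: size in bytes, object id, path.
largest_blobs() {
  local count="${1:-10}"   # how many entries to show (default 10)
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
    awk '$1 == "blob"' |   # keep blobs only, drop commits and trees
    cut -d' ' -f2- |       # strip the leading object type
    sort -rn |             # sort by size, descending
    head -n "$count"
}

# Usage:
#   largest_blobs 5
```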

Enter git-forget-blob

# Completely remove a file from a git repository history
#
# Copyleft 2017 by Ignacio Nunez Hernanz <nacho _a_t_ ownyourbits _d_o_t_ com>
# GPL licensed (see end of file) * Use at your own risk!
#
# Usage:
#   git-forget-blob file_to_forget
#
# Notes:
#   It rewrites history, therefore will change commit references
function git-forget-blob()
{
  git repack -A
  ls .git/objects/pack/*.idx &>/dev/null || {
    echo "there is nothing to be forgotten in this repo" && return;
  }
  local BLOBS=( $( git verify-pack -v .git/objects/pack/*.idx | grep blob | \
                awk '{ print $1 }' ) )
  for ref in "${BLOBS[@]}"; do
    local FILE="$( git rev-list --objects --all | grep "$ref" | awk '{ print $2 }' )"
    [[ "$FILE" == "$1" ]] && break
    unset FILE
  done
  [[ "$FILE" == "" ]] && { echo "$1 not found in repo history" && return; }

  git tag | xargs git tag -d
  git filter-branch --index-filter "git rm --cached --ignore-unmatch $FILE"
  rm -rf .git/refs/original/ .git/refs/remotes/ .git/*_HEAD .git/logs/
  git for-each-ref --format="%(refname)" refs/original/ | \
    xargs -n1 --no-run-if-empty git update-ref -d
  git reflog expire --expire-unreachable=now --all
  git repack -A -d
  git prune
}
# License
#
# This script is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This script is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this script; if not, write to the
# Free Software Foundation, Inc., 59 Temple Place, Suite 330,
# Boston, MA  02111-1307  USA

In a nutshell, this

  • uses git filter-branch to apply git rm --cached to every single commit
  • then, it removes all possible references, including remotes, tags and the reflog
  • next, it repacks the repository with git repack -A -d, which deletes the now redundant packs, and
  • finally, it prunes the unreachable objects with git prune.
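To check that the clean-up really reclaimed space, you can compare the size of the object store before and after the run. The sketch below relies on the "size" and "size-pack" fields printed by git count-objects -v, which are reported in KiB; repo_objects_kib is a hypothetical helper name.

```shell
# Hypothetical helper: print the disk space used by loose objects plus
# packs, in KiB, as reported by "git count-objects -v".
repo_objects_kib() {
  git count-objects -v |
    awk '$1 == "size:" || $1 == "size-pack:" { total += $2 } END { print total + 0 }'
}

# Usage:
#   before="$(repo_objects_kib)"
#   git forget-blob file_to_forget
#   after="$(repo_objects_kib)"
#   echo "object store: ${before} KiB -> ${after} KiB"
```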

Things to keep in mind:

  • This rewrites history, so forced pushes, merges, conflicts and such niceties will happen.
  • For the same reason, tags will be lost and commit hashes will change.

Remember to make a separate copy of the repo before trying this, and use the script with care.
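One possible way to take such a copy is a mirror clone, which keeps every branch and tag, including the tags the script deletes. This is only a sketch: backup_repo is a hypothetical helper name and the paths in the usage line are placeholders.

```shell
# Hypothetical helper: make a full backup of a repository, including all
# branches and tags, before its history is rewritten.
backup_repo() {
  local repo="$1"     # path to the repository to protect
  local backup="$2"   # destination for the bare mirror clone
  git clone --quiet --mirror "$repo" "$backup"
}

# Usage:
#   backup_repo ~/src/myrepo ~/backups/myrepo.git
```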

Author: nachoparker

Humbly sharing things that I find useful [ github dockerhub ]

9 Comments on “Completely remove a file from a git repository with git forget-blob”

  1. Awesome tool!

    Is there any way to use it with a commit hash? My problem is that I have an old commit that added a folder with hundreds of files that add up to ~1GB (it’s an SDK).
    We have since removed those folders from the repo; however, they persist somewhere in the history and the .git folder, and make our repo huge.

    Using git-forget-blob would be tedious to run against every file from that commit (if I could even find such a list). Do you have any recommendations?

    1. I guess you could do (assuming a6aa2117 is the bad commit):

      git diff --name-only a6aa2117 a6aa2117~ | while read -r l; do git forget-blob "$l"; done

      This should forget all files that were modified in that commit. Make sure that commit did not also modify any other file that you do not want to forget.

      If you are in doubt, run just the first part on its own and double check:

      git diff --name-only a6aa2117 a6aa2117~

      It will probably take a long time.

  2. Hello,
    I have used the script (many times, in fact) and all seems to be OK, but from the second run on I get a message like:
    “file ‘filetodelete’ not found in repository”
    but there is a GitHub URL where the file is still present.

    What can be the issue?
    thanks,
    Paolo
