# Removing large files and sensitive files from Git history

# Method 1: BFG (recommended)

# Steps
  1. Download the BFG jar: https://repo1.maven.org/maven2/com/madgag/bfg/1.13.0/bfg-1.13.0.jar

  2. Clone a fresh mirror of the repository

    git clone --mirror ssh://git@git.mofar.top:222/path/to/x-repo.git
    
  3. Find the large files

    cd x-repo.git
    # List the 13 largest files
    git rev-list --objects --all | grep "$(git verify-pack -v objects/pack/*.idx | sort -k 3 -n | tail -13 | awk '{print $1}')"
    
  4. Delete the files

    cd ..
    # Delete a file by name; repeat the command for each additional file
    java -jar bfg-1.13.0.jar --delete-files FILE_TO_DELETE --no-blob-protection x-repo.git
    
  5. Reclaim disk space

    cd x-repo.git
    git reflog expire --expire=now --all && git gc --prune=now --aggressive
    
  6. Unprotect the protected branches on the remote.

    settings->Protected Branches

  7. Push the cleaned local repository to the remote

    # Pushes every branch (a mirror clone pushes all refs)
    git push
    
  8. Restore branch protection

    settings->Protected Branches

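    Notes:

    1. Prefer SSH for cloning and pushing; over HTTP(S) the transfer may fail with a "file too large" error.

    2. --delete-files matches on file name only and cannot take a path, so a file name that is not unique causes problems.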
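
The lookup in step 3 reads the pack index files, so it only sees packed objects. As an alternative sketch, the same list can be built from `git cat-file --batch-check`, which reports object sizes directly. The helper name `largest_blobs` is hypothetical, and the default of 13 simply mirrors the step above.

```shell
#!/usr/bin/env bash
# Sketch: print the N largest blobs reachable from any ref, largest last.
# Output columns: size-in-bytes, object id, path.
# largest_blobs is a hypothetical helper name, not part of Git or BFG.
largest_blobs() {
    local count="${1:-13}"
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
        awk '$1 == "blob"' |
        cut -d ' ' -f 2- |
        sort -n |
        tail -n "$count"
}
```

Run `largest_blobs 13` inside `x-repo.git`. The sizes shown are uncompressed object sizes, so they can differ from what the blobs occupy on disk.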

# Pros:
  • Fast and simple
# Cons:
  • When a file name is not unique, every file with that name is deleted.
  • Requires JDK 8 or later.
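
Because BFG matches on file name rather than path, it may help to check beforehand how many distinct paths share the name. The following is a minimal sketch of such a check, assuming it is run inside the mirror clone; `paths_named` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: list every distinct path touched in history whose file name equals the argument.
# paths_named is a hypothetical helper name, not part of Git or BFG.
paths_named() {
    local name="$1"
    git -c core.quotePath=false log --all --format= --name-only |
        awk -F/ -v name="$name" '$NF == name' |
        sort -u
}
```

If `paths_named config.yml` prints more than one path, `--delete-files config.yml` would remove all of them, and a path-aware approach such as Method 2 may be the safer choice.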
# Appendix:
bfg 1.13.0
Usage: bfg [options] [<repo>]

  -b, --strip-blobs-bigger-than <size>
                           strip blobs bigger than X (eg '128K', '1M', etc)
  -B, --strip-biggest-blobs NUM
                           strip the top NUM biggest blobs
  -bi, --strip-blobs-with-ids <blob-ids-file>
                           strip blobs with the specified Git object ids
  -D, --delete-files <glob>
                           delete files with the specified names (eg '*.class', '*.{txt,log}' - matches on file name, not path within repo)
  --delete-folders <glob>  delete folders with the specified names (eg '.svn', '*-tmp' - matches on folder name, not path within repo)
  --convert-to-git-lfs <value>
                           extract files with the specified names (eg '*.zip' or '*.mp4') into Git LFS
  -rt, --replace-text <expressions-file>
                           filter content of files, replacing matched text. Match expressions should be listed in the file, one expression per line - by default, each expression is treated as a literal, but 'regex:' & 'glob:' prefixes are supported, with '==>' to specify a replacement string other than the default of '***REMOVED***'.
  -fi, --filter-content-including <glob>
                           do file-content filtering on files that match the specified expression (eg '*.{txt,properties}')
  -fe, --filter-content-excluding <glob>
                           don't do file-content filtering on files that match the specified expression (eg '*.{xml,pdf}')
  -fs, --filter-content-size-threshold <size>
                           only do file-content filtering on files smaller than <size> (default is 1048576 bytes)
  -p, --protect-blobs-from <refs>
                           protect blobs that appear in the most recent versions of the specified refs (default is 'HEAD')
  --no-blob-protection     allow the BFG to modify even your *latest* commit. Not recommended: you should have already ensured your latest commit is clean.
  --private                treat this repo-rewrite as removing private data (for example: omit old commit ids from commit messages)
  --massive-non-file-objects-sized-up-to <size>
                           increase memory usage to handle over-size Commits, Tags, and Trees that are up to X in size (eg '10M')
  <repo>                   file path for Git repository to clean
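
For sensitive strings rather than whole files, the `--replace-text` option listed above takes an expressions file. The sketch below writes one using the syntax described in the usage text (a literal per line, or a `regex:` prefix with `==>` for a custom replacement). The secrets shown are made-up placeholders, and `write_expressions` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: write an expressions file for BFG's --replace-text option.
# Both entries are invented placeholders; write_expressions is a hypothetical helper name.
write_expressions() {
    local target="${1:-expressions.txt}"
    # Line 1: a literal, replaced with the default ***REMOVED***
    # Line 2: a regex with an explicit replacement after ==>
    cat > "$target" <<'EOF'
hunter2
regex:api_key=\w+==>api_key=REDACTED
EOF
}

# Usage, from the directory that holds the jar and the mirror clone:
#   write_expressions expressions.txt
#   java -jar bfg-1.13.0.jar --replace-text expressions.txt x-repo.git
```

As with `--delete-files`, the latest commit is protected by default, so the secret should already be gone from the current tip before running this.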

# References:

https://rtyley.github.io/bfg-repo-cleaner/

# Method 2: git filter-branch (traditional)

  1. Clone a fresh mirror of the repository

    git clone --mirror ssh://git@git.mofar.top:222/path/to/x-repo.git
    
  2. Find the large files

    cd x-repo.git
    # List the 13 largest files
    git rev-list --objects --all | grep "$(git verify-pack -v objects/pack/*.idx | sort -k 3 -n | tail -13 | awk '{print $1}')"
    
  3. Delete the files

    git filter-branch --force --index-filter 'git rm -rf --cached --ignore-unmatch FILE_TO_DELETE' --prune-empty --tag-name-filter cat -- --all
    
  4. Push to the remote

    git push
    

    Notes:

    1. Prefer SSH for cloning and pushing; over HTTP(S) the transfer may fail with a "file too large" error.
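
The steps for this method do not include the space-reclaiming step from Method 1. `git filter-branch` also keeps the pre-rewrite refs under `refs/original/`, which keep the old objects reachable. A minimal cleanup sketch, assuming it is run inside the rewritten repository, could look like this; `cleanup_after_filter_branch` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: drop filter-branch backup refs, then reclaim space.
# cleanup_after_filter_branch is a hypothetical helper name, not a Git command.
cleanup_after_filter_branch() {
    # Delete the backup refs that filter-branch stores under refs/original/
    git for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do
            git update-ref -d "$ref"
        done
    # Same commands as the space-reclaiming step of Method 1
    git reflog expire --expire=now --all
    git gc --prune=now --aggressive
}
```

Running it before the push also avoids sending the `refs/original/` backups to the remote from a mirror clone.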
# Pros:
  • Powerful; uses Git's built-in tooling
# Cons:
  • Very slow