GIT-FILTER-BRANCH(1)              Git Manual              GIT-FILTER-BRANCH(1)

NAME

       git-filter-branch - Rewrite branches

SYNOPSIS

       git filter-branch [--setup <command>] [--subdirectory-filter <directory>]
               [--env-filter <command>] [--tree-filter <command>]
               [--index-filter <command>] [--parent-filter <command>]
               [--msg-filter <command>] [--commit-filter <command>]
               [--tag-name-filter <command>] [--prune-empty]
               [--original <namespace>] [-d <directory>] [-f | --force]
               [--state-branch <branch>] [--] [<rev-list options>...]

DESCRIPTION

       Lets you rewrite Git revision history by rewriting the branches
       mentioned in the <rev-list options>, applying custom filters on each
       revision. Those filters can modify each tree (e.g. removing a file or
       running a perl rewrite on all files) or information about each commit.
       Otherwise, all information (including original commit times or merge
       information) will be preserved.

       The command will only rewrite the positive refs mentioned in the
       command line (e.g. if you pass a..b, only b will be rewritten). If you
       specify no filters, the commits will be recommitted without any
       changes, which would normally have no effect. Nevertheless, such a run
       is permitted, because it may be useful in the future to compensate for
       some Git bugs.

       NOTE: This command honors the .git/info/grafts file and refs in the
       refs/replace/ namespace. If you have any grafts or replacement refs
       defined, running this command will make them permanent.

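       The NOTE above can be observed in a throwaway repository. The
       following is a minimal sketch, not a recipe from this manual: it
       grafts one made-up root commit behind another with git replace
       --graft, runs git filter-branch with no filters at all, and then
       counts commits with replacement refs ignored. The branch name
       new-history and all commit messages are hypothetical;
       FILTER_BRANCH_SQUELCH_WARNING only matters for Git releases newer
       than the one this page documents.

```shell
#!/bin/sh
set -e
# Assumption: newer Git releases print a warning (and pause) unless this is set.
export FILTER_BRANCH_SQUELCH_WARNING=1
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# "old root": the history we want to paste behind the current one.
git commit -q --allow-empty -m "old root"
old=$(git rev-parse HEAD)

# An unrelated history on a new, parentless branch (hypothetical name).
git checkout -q --orphan new-history
git commit -q --allow-empty -m "new root"
git commit -q --allow-empty -m "new tip"
new_root=$(git rev-parse HEAD~1)

# Graft "old root" behind "new root" using only a replacement ref ...
git replace --graft "$new_root" "$old"

# ... then rewrite without any filter, which makes the graft real history.
git filter-branch HEAD

# Ignore refs/replace/ so that only real parent links are followed.
count=$(git --no-replace-objects rev-list --count HEAD)
echo "commits reachable without replacement refs: $count"
```

       Before the rewrite the same count would be 2, because the graft
       existed only as a replacement ref; afterwards all 3 commits are
       linked by real parent pointers.
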
       WARNING! The rewritten history will have different object names for all
       the objects and will not converge with the original branch. You will
       not be able to easily push and distribute the rewritten branch on top
       of the original branch. Please do not use this command if you do not
       know the full implications, and avoid using it anyway if a single
       simple commit would suffice to fix your problem. (See the "RECOVERING
       FROM UPSTREAM REBASE" section in git-rebase(1) for further information
       about rewriting published history.)

       Always verify that the rewritten version is correct: the original refs,
       if different from the rewritten ones, will be stored in the namespace
       refs/original/.

       Note that since this operation is very I/O expensive, it might be a
       good idea to redirect the temporary directory off-disk with the -d
       option, e.g. on tmpfs. Reportedly the speedup is very noticeable.

   Filters
       The filters are applied in the order listed below. The <command>
       argument is always evaluated in the shell context using the eval
       command (with the notable exception of the commit filter, for technical
       reasons). Prior to that, the $GIT_COMMIT environment variable will be
       set to contain the id of the commit being rewritten. Also,
       GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL, GIT_AUTHOR_DATE, GIT_COMMITTER_NAME,
       GIT_COMMITTER_EMAIL, and GIT_COMMITTER_DATE are taken from the current
       commit and exported to the environment, in order to affect the author
       and committer identities of the replacement commit created by
       git-commit-tree(1) after the filters have run.

       If any evaluation of <command> returns a non-zero exit status, the
       whole operation will be aborted.

       A map function is available that takes an "original sha1 id" argument
       and outputs a "rewritten sha1 id" if the commit has already been
       rewritten, and the "original sha1 id" otherwise; the map function can
       return several ids on separate lines if your commit filter emitted
       multiple commits.
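
       As a minimal sketch of the variables described above, the following
       records each commit's original id in its rewritten message from a
       --msg-filter. It runs in a throwaway repository; the trailer name
       Original-commit is an arbitrary choice, and
       FILTER_BRANCH_SQUELCH_WARNING only matters for Git releases newer
       than the one this page documents.

```shell
#!/bin/sh
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # assumption: silences newer Git's warning
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"
orig_tip=$(git rev-parse HEAD)

# The filter is single-quoted so that $GIT_COMMIT is expanded by
# filter-branch for each commit, not once by this shell.
git filter-branch --msg-filter '
    cat
    echo
    echo "Original-commit: $GIT_COMMIT"
' HEAD

git log -1 --format=%B
```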

OPTIONS

       --setup <command>
           This is not a real filter executed for each commit but a one-time
           setup just before the loop. Therefore no commit-specific variables
           are defined yet. Functions or variables defined here can be used or
           modified in the following filter steps except the commit filter,
           for technical reasons.

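           A minimal sketch of --setup in a throwaway repository: a helper
           function is defined once and then called from --env-filter to
           fix a misconfigured e-mail address. The function name fix_email
           and both addresses are hypothetical; FILTER_BRANCH_SQUELCH_WARNING
           only matters for Git releases newer than the one this page
           documents. As stated above, such a function would not be visible
           in a --commit-filter.

```shell
#!/bin/sh
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # assumption: silences newer Git's warning
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git config user.name "Example User"
git config user.email "root@localhost"   # the "wrong" identity to be fixed
git commit -q --allow-empty -m "commit with a bad address"

# --setup runs once before the loop; fix_email (hypothetical) stays
# defined for the per-commit --env-filter that follows.
git filter-branch --setup '
    fix_email () {
        case "$1" in
        root@localhost) echo john@example.com ;;
        *) echo "$1" ;;
        esac
    }
' --env-filter '
    GIT_AUTHOR_EMAIL=$(fix_email "$GIT_AUTHOR_EMAIL")
    GIT_COMMITTER_EMAIL=$(fix_email "$GIT_COMMITTER_EMAIL")
    export GIT_AUTHOR_EMAIL GIT_COMMITTER_EMAIL
' HEAD

git log -1 --format='%ae %ce'
```
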
       --subdirectory-filter <directory>
           Only look at the history which touches the given subdirectory. The
           result will contain that directory (and only that) as its project
           root. Implies the behavior described in the section called “Remap
           to ancestor”.

       --env-filter <command>
           This filter may be used if you only need to modify the environment
           in which the commit will be performed. Specifically, you might want
           to rewrite the author/committer name/email/time environment
           variables (see git-commit-tree(1) for details).

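           For instance, a common use (shown here as a sketch in a throwaway
           repository, not as part of this manual) is to make every commit
           date equal to its author date. The back-dated author date below
           is made up; FILTER_BRANCH_SQUELCH_WARNING only matters for Git
           releases newer than the one this page documents.

```shell
#!/bin/sh
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # assumption: silences newer Git's warning
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
# Give the commit an author date that differs from its commit date.
GIT_AUTHOR_DATE="2005-04-07T22:13:13Z" git commit -q --allow-empty -m "backdated"

# The variables are already exported by filter-branch; re-exporting is harmless.
git filter-branch --env-filter '
    GIT_COMMITTER_DATE=$GIT_AUTHOR_DATE
    export GIT_COMMITTER_DATE
' HEAD

git log -1 --format='author %at, committer %ct'
```
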
       --tree-filter <command>
           This is the filter for rewriting the tree and its contents. The
           argument is evaluated in shell with the working directory set to
           the root of the checked out tree. The new tree is then used as-is
           (new files are auto-added, disappeared files are auto-removed -
           neither .gitignore files nor any other ignore rules HAVE ANY
           EFFECT!).

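           The sketch below, in a throwaway repository, illustrates both the
           auto-adding behavior and the -d option: the tree filter deletes
           build.log and writes generated.log, and the new file is committed
           even though .gitignore lists *.log. All file names are made up,
           and the scratch directory comes from mktemp; whether it lives on
           tmpfs depends on your system. FILTER_BRANCH_SQUELCH_WARNING only
           matters for Git releases newer than the one this page documents.

```shell
#!/bin/sh
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # assumption: silences newer Git's warning
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "int main(void) { return 0; }" > main.c
echo "noise" > build.log
echo "*.log" > .gitignore
git add -f main.c build.log .gitignore   # -f because build.log is ignored
git commit -q -m "initial"

# -d must name a directory that does not exist yet (unless -f is given).
scratch="$(mktemp -d)/rewrite"

git filter-branch -d "$scratch" --tree-filter '
    rm -f build.log
    echo generated > generated.log
' HEAD

git ls-tree --name-only HEAD
```
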
       --index-filter <command>
           This is the filter for rewriting the index. It is similar to the
           tree filter but does not check out the tree, which makes it much
           faster. Frequently used with git rm --cached --ignore-unmatch ...,
           see EXAMPLES below. For hairy cases, see git-update-index(1).

       --parent-filter <command>
           This is the filter for rewriting the commit’s parent list. It will
           receive the parent string on stdin and shall output the new parent
           string on stdout. The parent string is in the format described in
           git-commit-tree(1): empty for the initial commit, "-p parent" for a
           normal commit and "-p parent1 -p parent2 -p parent3 ..." for a
           merge commit.

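           Because a parent filter only maps stdin to stdout, it can be tried
           on sample parent strings before being pointed at a repository.
           The sketch below uses an illustrative filter (not one from this
           manual) that keeps only the first "-p <id>" pair, turning merges
           into single-parent commits; the ids are made up. It tolerates
           leading blanks on the assumption that the string filter-branch
           passes may carry them.

```shell
#!/bin/sh
# Keep only the first "-p <id>" pair; an empty string stays empty.
first_parent_filter='sed "s/^ *\(-p [^ ]*\).*/\1/"'

# Feed one parent string to the filter on stdin, as filter-branch does.
try_parent_filter () {
    printf '%s\n' "$1" | eval "$first_parent_filter"
}

try_parent_filter ''                          # initial commit: stays empty
try_parent_filter '-p 1111111'                # normal commit: unchanged
try_parent_filter '-p 1111111 -p 2222222'     # merge: only "-p 1111111" is left

# Once satisfied, the same string can be handed to filter-branch:
#   git filter-branch --parent-filter "$first_parent_filter" HEAD
```
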
       --msg-filter <command>
           This is the filter for rewriting the commit messages. The argument
           is evaluated in the shell with the original commit message on
           standard input; its standard output is used as the new commit
           message.

       --commit-filter <command>
           This is the filter for performing the commit. If this filter is
           specified, it will be called instead of the git commit-tree
           command, with arguments of the form "<TREE_ID> [(-p
           <PARENT_COMMIT_ID>)...]" and the log message on stdin. The commit
           id is expected on stdout.

           As a special extension, the commit filter may emit multiple commit
           ids; in that case, the rewritten children of the original commit
           will have all of them as parents.

           You can use the map convenience function in this filter, and other
           convenience functions, too. For example, calling skip_commit "$@"
           will leave out the current commit (but not its changes! If you want
           that, use git rebase instead).

           You can also use git_commit_non_empty_tree "$@" instead of git
           commit-tree "$@" if you don’t wish to keep commits that have a
           single parent and make no change to the tree.

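           A minimal sketch of git_commit_non_empty_tree in a throwaway
           repository: an index filter removes a hypothetical secret.txt
           from every commit, so the commit that only added that file no
           longer changes the tree and is dropped. File names and messages
           are made up; FILTER_BRANCH_SQUELCH_WARNING only matters for Git
           releases newer than the one this page documents.

```shell
#!/bin/sh
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # assumption: silences newer Git's warning
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "hello" > README
git add README && git commit -q -m "add README"
echo "hunter2" > secret.txt
git add secret.txt && git commit -q -m "add secret"
echo "more" >> README
git commit -q -am "extend README"

# After secret.txt is removed, "add secret" has the same tree as its
# parent, so git_commit_non_empty_tree maps it to that parent instead.
git filter-branch \
    --index-filter 'git rm -q --cached --ignore-unmatch secret.txt' \
    --commit-filter 'git_commit_non_empty_tree "$@"' HEAD

git log --format=%s
```
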
       --tag-name-filter <command>
           This is the filter for rewriting tag names. When passed, it will be
           called for every tag ref that points to a rewritten object (or to a
           tag object which points to a rewritten object). The original tag
           name is passed via standard input, and the new tag name is expected
           on standard output.

           The original tags are not deleted, but can be overwritten; use
           "--tag-name-filter cat" to simply update the tags. In this case, be
           very careful and make sure you have the old tags backed up in case
           the conversion has run afoul.

           Nearly proper rewriting of tag objects is supported. If the tag has
           a message attached, a new tag object will be created with the same
           message, author, and timestamp. If the tag has a signature
           attached, the signature will be stripped. It is by definition
           impossible to preserve signatures. The reason this is "nearly"
           proper is that ideally, if the tag did not change (points to the
           same object, has the same name, etc.), it should retain any
           signature. That is not the case: signatures will always be removed,
           buyer beware. There is also no support for changing the author or
           timestamp (or the tag message for that matter). Tags which point to
           other tags will be rewritten to point to the underlying commit.

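           A sketch in a throwaway repository showing that the original tags
           survive: the author name of the only commit is changed, and the
           tag name filter writes the moved tag under a new, prefixed name
           while v1.0 keeps pointing at the old commit. The tag name and the
           "rewritten-" prefix are made up; FILTER_BRANCH_SQUELCH_WARNING
           only matters for Git releases newer than the one this page
           documents.

```shell
#!/bin/sh
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # assumption: silences newer Git's warning
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "release"
git tag v1.0                             # lightweight tag on the commit to rewrite
old=$(git rev-parse v1.0)

# The tag name arrives on stdin ("v1.0"); the new name goes to stdout.
git filter-branch \
    --env-filter 'GIT_AUTHOR_NAME="Renamed Author"; export GIT_AUTHOR_NAME' \
    --tag-name-filter 'sed "s/^/rewritten-/"' HEAD

git for-each-ref --format='%(refname:short) %(objectname)' refs/tags
```
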
       --prune-empty
           Some filters will generate empty commits that leave the tree
           untouched. This option instructs git-filter-branch to remove such
           commits if they have exactly one or zero non-pruned parents; merge
           commits will therefore remain intact. This option cannot be used
           together with --commit-filter, though the same effect can be
           achieved by using the provided git_commit_non_empty_tree function
           in a commit filter.

       --original <namespace>
           Use this option to set the namespace where the original commits
           will be stored. The default value is refs/original.

       -d <directory>
           Use this option to set the path to the temporary directory used for
           rewriting. When applying a tree filter, the command needs to
           temporarily check out the tree to some directory, which may consume
           considerable space in case of large projects. By default it does
           this in the .git-rewrite/ directory but you can override that
           choice by this parameter.

       -f, --force
           git filter-branch refuses to start with an existing temporary
           directory or when there are already refs starting with
           refs/original/, unless forced.

       --state-branch <branch>
           This option will cause the mapping from old to new objects to be
           loaded from the named branch upon startup and saved as a new commit
           to that branch upon exit, enabling incremental rewriting of large
           trees. If <branch> does not exist it will be created.

       <rev-list options>...
           Arguments for git rev-list. All positive refs included by these
           options are rewritten. You may also specify options such as --all,
           but you must use -- to separate them from the git filter-branch
           options. Implies the behavior described in the section called
           “Remap to ancestor”.

   Remap to ancestor
       By using git-rev-list(1) arguments, e.g., path limiters, you can limit
       the set of revisions which get rewritten. However, positive refs on the
       command line are distinguished: we don’t let them be excluded by such
       limiters. For this purpose, they are instead rewritten to point at the
       nearest ancestor that was not excluded.

EXIT STATUS

       On success, the exit status is 0. If the filter can’t find any commits
       to rewrite, the exit status is 2. On any other error, the exit status
       may be any other non-zero value.
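
       A sketch of a wrapper that treats status 2 as "nothing to do" rather
       than as a failure, run in a throwaway repository. The range
       HEAD..HEAD is chosen only because it selects no commits;
       FILTER_BRANCH_SQUELCH_WARNING only matters for Git releases newer
       than the one this page documents.

```shell
#!/bin/sh
# No "set -e" here: the non-zero status is inspected below.
export FILTER_BRANCH_SQUELCH_WARNING=1   # assumption: silences newer Git's warning
cd "$(mktemp -d)" || exit 1              # throwaway repository
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "only commit"

# HEAD..HEAD contains no commits, so there is nothing to rewrite.
git filter-branch --msg-filter cat HEAD..HEAD
status=$?

case $status in
0) echo "history rewritten" ;;
2) echo "nothing to rewrite" ;;
*) echo "git filter-branch failed with status $status" >&2 ;;
esac
```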

EXAMPLES

       Suppose you want to remove a file (containing confidential information
       or copyright violation) from all commits:

           git filter-branch --tree-filter 'rm filename' HEAD

       However, if the file is absent from the tree of some commit, a simple
       rm filename will fail for that tree and commit. Thus you may instead
       want to use rm -f filename as the script.

       Using --index-filter with git rm yields a significantly faster version.
       Like with using rm filename, git rm --cached filename will fail if the
       file is absent from the tree of a commit. If you want to "completely
       forget" a file, it does not matter when it entered history, so we also
       add --ignore-unmatch:

           git filter-branch --index-filter 'git rm --cached --ignore-unmatch filename' HEAD

       Now, you will get the rewritten history saved in HEAD.

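       To check the result, it helps to query only branches and tags,
       because the backup under refs/original/ still contains the file and
       --all would include it. A sketch in a throwaway repository (file
       contents and commit messages are made up;
       FILTER_BRANCH_SQUELCH_WARNING only matters for Git releases newer
       than the one this page documents):

```shell
#!/bin/sh
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # assumption: silences newer Git's warning
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "hunter2" > filename
echo "hello" > README
git add filename README
git commit -q -m "initial"
echo "more" >> README
git commit -q -am "update README"
branch_ref=$(git symbolic-ref HEAD)      # e.g. refs/heads/master

git filter-branch --index-filter 'git rm --cached --ignore-unmatch filename' HEAD

# Rewritten branches and tags: should no longer mention the file.
remaining=$(git rev-list --branches --tags -- filename)
# Backup kept by filter-branch: still contains the original commit.
backup=$(git rev-list "refs/original/$branch_ref" -- filename)
echo "rewritten: ${remaining:-none}; backup: ${backup:-none}"
```
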
       To rewrite the repository to look as if foodir/ had been its project
       root, and discard all other history:

           git filter-branch --subdirectory-filter foodir -- --all

       Thus you can, e.g., turn a library subdirectory into a repository of
       its own. Note the -- that separates filter-branch options from revision
       options, and the --all to rewrite all branches and tags.

       To set a commit (which typically is at the tip of another history) to
       be the parent of the current initial commit, in order to paste the
       other history behind the current history:

           git filter-branch --parent-filter 'sed "s/^\$/-p <graft-id>/"' HEAD

       (if the parent string is empty - which happens when we are dealing with
       the initial commit - add <graft-id> as a parent). Note that this
       assumes history with a single root (that is, no merge without common
       ancestors happened). If this is not the case, use:

           git filter-branch --parent-filter \
                   'test $GIT_COMMIT = <commit-id> && echo "-p <graft-id>" || cat' HEAD

       or even simpler:

           git replace --graft <commit-id> <graft-id>
           git filter-branch <graft-id>..HEAD

       To remove commits authored by "Darl McBribe" from the history:

           git filter-branch --commit-filter '
                   if [ "$GIT_AUTHOR_NAME" = "Darl McBribe" ];
                   then
                           skip_commit "$@";
                   else
                           git commit-tree "$@";
                   fi' HEAD

       The function skip_commit is defined as follows:

           skip_commit()
           {
                   shift;
                   while [ -n "$1" ];
                   do
                           shift;
                           map "$1";
                           shift;
                   done;
           }

       The shift magic first throws away the tree id and then the -p
       parameters. Note that this handles merges properly! In case Darl
       committed a merge between P1 and P2, it will be propagated properly and
       all children of the merge will become merge commits with P1,P2 as their
       parents instead of the merge commit.

       NOTE: the changes introduced by the commits, and which are not reverted
       by subsequent commits, will still be in the rewritten branch. If you
       want to throw out changes together with the commits, you should use the
       interactive mode of git rebase.

       You can rewrite the commit log messages using --msg-filter. For
       example, the git-svn-id lines in a repository created by git svn can be
       removed this way:

           git filter-branch --msg-filter '
                   sed -e "/^git-svn-id:/d"
           '

       If you need to add Acked-by lines to, say, the last 10 commits (none of
       which is a merge), use this command:

           git filter-branch --msg-filter '
                   cat &&
                   echo "Acked-by: Bugs Bunny <bunny@bugzilla.org>"
           ' HEAD~10..HEAD

       The --env-filter option can be used to modify committer and/or author
       identity. For example, if you found out that your commits have the
       wrong identity due to a misconfigured user.email, you can make a
       correction, before publishing the project, like this:

           git filter-branch --env-filter '
                   if test "$GIT_AUTHOR_EMAIL" = "root@localhost"
                   then
                           GIT_AUTHOR_EMAIL=john@example.com
                   fi
                   if test "$GIT_COMMITTER_EMAIL" = "root@localhost"
                   then
                           GIT_COMMITTER_EMAIL=john@example.com
                   fi
           ' -- --all

       To restrict rewriting to only part of the history, specify a revision
       range in addition to the new branch name. The new branch name will
       point to the top-most revision that a git rev-list of this range will
       print.

       Consider this history:

                D--E--F--G--H
               /     /
           A--B-----C

       To rewrite only commits D,E,F,G,H, but leave A, B and C alone, use:

           git filter-branch ... C..H

       To rewrite commits E,F,G,H, use one of these:

           git filter-branch ... C..H --not D
           git filter-branch ... D..H --not C

       To move the whole tree into a subdirectory, or remove it from there:

           git filter-branch --index-filter \
                   'git ls-files -s | sed "s-\t\"*-&newsubdir/-" |
                           GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
                                   git update-index --index-info &&
                    mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD

CHECKLIST FOR SHRINKING A REPOSITORY

       git-filter-branch can be used to get rid of a subset of files, usually
       with some combination of --index-filter and --subdirectory-filter.
       People expect the resulting repository to be smaller than the original,
       but you need a few more steps to actually make it smaller, because Git
       tries hard not to lose your objects until you tell it to. First make
       sure that:

       ·   You really removed all variants of a filename, if a blob was moved
           over its lifetime.  git log --name-only --follow --all -- filename
           can help you find renames.

       ·   You really filtered all refs: use --tag-name-filter cat -- --all
           when calling git-filter-branch.

       Then there are two ways to get a smaller repository. The safer way is
       to clone, which keeps your original intact.

       ·   Clone it with git clone file:///path/to/repo. The clone will not
           have the removed objects. See git-clone(1). (Note that cloning with
           a plain path just hardlinks everything!)

       If you really don’t want to clone it, for whatever reasons, check the
       following points instead (in this order). This is a very destructive
       approach, so make a backup or go back to cloning it. You have been
       warned.

       ·   Remove the original refs backed up by git-filter-branch: say git
           for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git
           update-ref -d.

       ·   Expire all reflogs with git reflog expire --expire=now --all.

       ·   Garbage collect all unreferenced objects with git gc --prune=now
           (or if your git-gc is not new enough to support arguments to
           --prune, use git repack -ad; git prune instead).
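
       The in-place steps above can be chained as in the following sketch,
       which runs them in a throwaway repository and then checks that the
       removed blob is really gone. It is destructive by design; the file
       name blob.bin is made up, and FILTER_BRANCH_SQUELCH_WARNING only
       matters for Git releases newer than the one this page documents.

```shell
#!/bin/sh
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # assumption: silences newer Git's warning
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
printf 'big data\n' > blob.bin
git add blob.bin && git commit -q -m "add blob"
echo "hello" > README
git add README && git commit -q -m "add README"
blob=$(git rev-parse HEAD:blob.bin)      # object id we expect to disappear

# Filter all refs, as the checklist recommends.
git filter-branch --index-filter 'git rm -q --cached --ignore-unmatch blob.bin' \
    --tag-name-filter cat -- --all

# 1. Remove the backup refs, 2. expire reflogs, 3. prune unreachable objects.
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc -q --prune=now

if git cat-file -e "$blob" 2>/dev/null; then gone=no; else gone=yes; fi
echo "blob removed: $gone"
```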

NOTES

       git-filter-branch allows you to make complex shell-scripted rewrites of
       your Git history, but you probably don’t need this flexibility if
       you’re simply removing unwanted data like large files or passwords. For
       those operations you may want to consider The BFG Repo-Cleaner[1], a
       JVM-based alternative to git-filter-branch, typically at least 10-50x
       faster for those use-cases, and with quite different characteristics:

       ·   Any particular version of a file is cleaned exactly once. The BFG,
           unlike git-filter-branch, does not give you the opportunity to
           handle a file differently based on where or when it was committed
           within your history. This constraint gives the core performance
           benefit of The BFG, and is well-suited to the task of cleansing bad
           data - you don’t care where the bad data is, you just want it gone.

       ·   By default The BFG takes full advantage of multi-core machines,
           cleansing commit file-trees in parallel. git-filter-branch cleans
           commits sequentially (i.e. in a single-threaded manner), though it
           is possible to write filters that include their own parallelism, in
           the scripts executed against each commit.

       ·   The command options[2] are much more restrictive than
           git-filter-branch, and dedicated just to the tasks of removing
           unwanted data, e.g. --strip-blobs-bigger-than 1M.

GIT

       Part of the git(1) suite

NOTES

        1. The BFG Repo-Cleaner
           http://rtyley.github.io/bfg-repo-cleaner/

        2. command options
           http://rtyley.github.io/bfg-repo-cleaner/#examples

Git 2.21.0                        02/24/2019              GIT-FILTER-BRANCH(1)