GIT-FILTER-BRANCH(1)              Git Manual              GIT-FILTER-BRANCH(1)

NAME
       git-filter-branch - Rewrite branches

SYNOPSIS
       git filter-branch [--setup <command>] [--subdirectory-filter <directory>]
               [--env-filter <command>] [--tree-filter <command>]
               [--index-filter <command>] [--parent-filter <command>]
               [--msg-filter <command>] [--commit-filter <command>]
               [--tag-name-filter <command>] [--prune-empty]
               [--original <namespace>] [-d <directory>] [-f | --force]
               [--state-branch <branch>] [--] [<rev-list options>...]

DESCRIPTION
       Lets you rewrite Git revision history by rewriting the branches
       mentioned in the <rev-list options>, applying custom filters on each
       revision. Those filters can modify each tree (e.g. removing a file or
       running a perl rewrite on all files) or information about each commit.
       Otherwise, all information (including original commit times or merge
       information) will be preserved.

       The command will only rewrite the positive refs mentioned in the
       command line (e.g. if you pass a..b, only b will be rewritten). If you
       specify no filters, the commits will be recommitted without any
       changes, which would normally have no effect. Such usage is
       nevertheless permitted, because it may be useful in the future for
       working around Git bugs.

       NOTE: This command honors .git/info/grafts file and refs in the
       refs/replace/ namespace. If you have any grafts or replacement refs
       defined, running this command will make them permanent.

       WARNING! The rewritten history will have different object names for all
       the objects and will not converge with the original branch. You will
       not be able to easily push and distribute the rewritten branch on top
       of the original branch. Please do not use this command if you do not
       know the full implications, and avoid using it anyway, if a simple
       single commit would suffice to fix your problem. (See the "RECOVERING
       FROM UPSTREAM REBASE" section in git-rebase(1) for further information
       about rewriting published history.)

       Always verify that the rewritten version is correct: The original refs,
       if different from the rewritten ones, will be stored in the namespace
       refs/original/.

       Note that since this operation is very I/O expensive, it might be a
       good idea to redirect the temporary directory off-disk with the -d
       option, e.g. on tmpfs. Reportedly the speedup is very noticeable.

   Filters
       The filters are applied in the order listed below. The <command>
       argument is always evaluated in the shell context using the eval
       command (with the notable exception of the commit filter, for technical
       reasons). Prior to that, the $GIT_COMMIT environment variable will be
       set to contain the id of the commit being rewritten. Also,
       GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL, GIT_AUTHOR_DATE, GIT_COMMITTER_NAME,
       GIT_COMMITTER_EMAIL, and GIT_COMMITTER_DATE are taken from the current
       commit and exported to the environment, in order to affect the author
       and committer identities of the replacement commit created by
       git-commit-tree(1) after the filters have run.

       If any evaluation of <command> returns a non-zero exit status, the
       whole operation will be aborted.

       A map function is available that takes an "original sha1 id" argument
       and outputs a "rewritten sha1 id" if the commit has been already
       rewritten, and "original sha1 id" otherwise; the map function can
       return several ids on separate lines if your commit filter emitted
       multiple commits.

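
       The evaluation rule described above (eval with $GIT_COMMIT set, abort
       on a non-zero status) can be tried without a repository. The following
       is only a rough sketch: run_filter, the commit ids and the filter
       string are hypothetical stand-ins, not the actual git-filter-branch
       code.

```shell
# Hypothetical stand-in for the per-commit evaluation; not the real code.
# run_filter <commit-id> <command>: evaluate <command> with GIT_COMMIT set.
run_filter() {
    GIT_COMMIT=$1
    export GIT_COMMIT
    eval "$2"
}

# Made-up commit ids, for illustration only.
processed=""
for id in aaa111 bbb222 ccc333
do
    # This sample filter returns a non-zero status on the second commit.
    if run_filter "$id" 'test "$GIT_COMMIT" != bbb222'
    then
        processed="$processed $id"
    else
        echo "filter failed on $id, aborting" >&2
        break
    fi
done
echo "processed:$processed"
```

       Only the first id is processed in this sketch; the failure on the
       second one stops the loop, so the third is never reached.
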
OPTIONS
       --setup <command>
           This is not a real filter executed for each commit but a one time
           setup just before the loop. Therefore no commit-specific variables
           are defined yet. Functions or variables defined here can be used or
           modified in the following filter steps except the commit filter,
           for technical reasons.

       --subdirectory-filter <directory>
           Only look at the history which touches the given subdirectory. The
           result will contain that directory (and only that) as its project
           root. Implies the section called “Remap to ancestor”.

       --env-filter <command>
           This filter may be used if you only need to modify the environment
           in which the commit will be performed. Specifically, you might want
           to rewrite the author/committer name/email/time environment
           variables (see git-commit-tree(1) for details).

       --tree-filter <command>
           This is the filter for rewriting the tree and its contents. The
           argument is evaluated in shell with the working directory set to
           the root of the checked out tree. The new tree is then used as-is
           (new files are auto-added, disappeared files are auto-removed -
           neither .gitignore files nor any other ignore rules HAVE ANY
           EFFECT!).

       --index-filter <command>
           This is the filter for rewriting the index. It is similar to the
           tree filter but does not check out the tree, which makes it much
           faster. Frequently used with git rm --cached --ignore-unmatch ...,
           see EXAMPLES below. For hairy cases, see git-update-index(1).

       --parent-filter <command>
           This is the filter for rewriting the commit’s parent list. It will
           receive the parent string on stdin and shall output the new parent
           string on stdout. The parent string is in the format described in
           git-commit-tree(1): empty for the initial commit, "-p parent" for a
           normal commit and "-p parent1 -p parent2 -p parent3 ..." for a
           merge commit.

       --msg-filter <command>
           This is the filter for rewriting the commit messages. The argument
           is evaluated in the shell with the original commit message on
           standard input; its standard output is used as the new commit
           message.

       --commit-filter <command>
           This is the filter for performing the commit. If this filter is
           specified, it will be called instead of the git commit-tree
           command, with arguments of the form "<TREE_ID> [(-p
           <PARENT_COMMIT_ID>)...]" and the log message on stdin. The commit
           id is expected on stdout.

           As a special extension, the commit filter may emit multiple commit
           ids; in that case, the rewritten children of the original commit
           will have all of them as parents.

           You can use the map convenience function in this filter, and other
           convenience functions, too. For example, calling skip_commit "$@"
           will leave out the current commit (but not its changes! If you want
           that, use git rebase instead).

           You can also use git_commit_non_empty_tree "$@" instead of git
           commit-tree "$@" if you don’t wish to keep commits that have a
           single parent and make no change to the tree.

       --tag-name-filter <command>
           This is the filter for rewriting tag names. When passed, it will be
           called for every tag ref that points to a rewritten object (or to a
           tag object which points to a rewritten object). The original tag
           name is passed via standard input, and the new tag name is expected
           on standard output.

           The original tags are not deleted, but can be overwritten; use
           "--tag-name-filter cat" to simply update the tags. In this case, be
           very careful and make sure you have the old tags backed up in case
           the conversion has run afoul.

           Nearly proper rewriting of tag objects is supported. If the tag has
           a message attached, a new tag object will be created with the same
           message, author, and timestamp. If the tag has a signature
           attached, the signature will be stripped. It is by definition
           impossible to preserve signatures. The reason this is "nearly"
           proper is that, ideally, a tag that did not change (points to the
           same object, has the same name, etc.) should retain any signature.
           That is not the case: signatures will always be removed, so buyer
           beware. There is also no support for changing the author or
           timestamp (or the tag message for that matter). Tags which point to
           other tags will be rewritten to point to the underlying commit.

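
       The stdin/stdout contract of the tag name filter can be tried on a
       sample name. This is only a sketch: apply_tag_filter is a hypothetical
       helper that roughly mimics how the old name is fed to the filter, and
       the sed rename is an invented example, not something the manual
       prescribes.

```shell
# Hypothetical helper: pass the old tag name on stdin and evaluate the
# filter string, as a tag-name filter would see it.
apply_tag_filter() {
    echo "$1" | eval "$2"
}

# "cat" keeps the name, so an existing tag would simply be updated.
same=$(apply_tag_filter v1.0 'cat')

# An invented filter that prefixes the name, leaving the old tag in place.
renamed=$(apply_tag_filter v1.0 'sed "s/^/rewritten-/"')

echo "$same $renamed"
```
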
       --prune-empty
           Some filters will generate empty commits that leave the tree
           untouched. This option instructs git-filter-branch to remove such
           commits if they have exactly one or zero non-pruned parents; merge
           commits will therefore remain intact. This option cannot be used
           together with --commit-filter, though the same effect can be
           achieved by using the provided git_commit_non_empty_tree function
           in a commit filter.

       --original <namespace>
           Use this option to set the namespace where the original commits
           will be stored. The default value is refs/original.

       -d <directory>
           Use this option to set the path to the temporary directory used for
           rewriting. When applying a tree filter, the command needs to
           temporarily check out the tree to some directory, which may consume
           considerable space in case of large projects. By default it does
           this in the .git-rewrite/ directory but you can override that
           choice by this parameter.

       -f, --force
           git filter-branch refuses to start with an existing temporary
           directory or when there are already refs starting with
           refs/original/, unless forced.

       --state-branch <branch>
           This option will cause the mapping from old to new objects to be
           loaded from the named branch upon startup and saved as a new commit
           to that branch upon exit, enabling incremental rewriting of large
           trees. If <branch> does not exist it will be created.

       <rev-list options>...
           Arguments for git rev-list. All positive refs included by these
           options are rewritten. You may also specify options such as --all,
           but you must use -- to separate them from the git filter-branch
           options. Implies the section called “Remap to ancestor”.

   Remap to ancestor
       By using git-rev-list(1) arguments, e.g., path limiters, you can limit
       the set of revisions which get rewritten. However, positive refs on the
       command line are distinguished: we don’t let them be excluded by such
       limiters. For this purpose, they are instead rewritten to point at the
       nearest ancestor that was not excluded.

EXIT STATUS
       On success, the exit status is 0. If the filter can’t find any commits
       to rewrite, the exit status is 2. On any other error, the exit status
       may be any other non-zero value.

EXAMPLES
       Suppose you want to remove a file (containing confidential information
       or a copyright violation) from all commits:

           git filter-branch --tree-filter 'rm filename' HEAD

       However, if the file is absent from the tree of some commit, a simple
       rm filename will fail for that tree and commit. Thus you may instead
       want to use rm -f filename as the script.

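
       The difference between the two scripts can be seen outside of Git. In
       this sketch a temporary directory stands in for a checked-out tree
       that lacks the file; the variable names are illustrative only.

```shell
# Empty temporary directory standing in for a tree without the file.
tree=$(mktemp -d)

# Plain rm reports failure for a missing file; used as a tree filter, that
# non-zero status would abort the rewrite.
if rm "$tree/filename" 2>/dev/null
then plain=ok
else plain=failed
fi

# rm -f treats a missing file as success.
if rm -f "$tree/filename"
then forced=ok
else forced=failed
fi

echo "rm: $plain, rm -f: $forced"
rm -rf "$tree"
```
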
       Using --index-filter with git rm yields a significantly faster version.
       Like with using rm filename, git rm --cached filename will fail if the
       file is absent from the tree of a commit. If you want to "completely
       forget" a file, it does not matter when it entered history, so we also
       add --ignore-unmatch:

           git filter-branch --index-filter 'git rm --cached --ignore-unmatch filename' HEAD

       Now, you will get the rewritten history saved in HEAD.

       To rewrite the repository to look as if foodir/ had been its project
       root, and discard all other history:

           git filter-branch --subdirectory-filter foodir -- --all

       Thus you can, e.g., turn a library subdirectory into a repository of
       its own. Note the -- that separates filter-branch options from revision
       options, and the --all to rewrite all branches and tags.

       To set a commit (which typically is at the tip of another history) to
       be the parent of the current initial commit, in order to paste the
       other history behind the current history:

           git filter-branch --parent-filter 'sed "s/^\$/-p <graft-id>/"' HEAD

       If the parent string is empty (which happens when we are dealing with
       the initial commit), this adds <graft-id> as a parent. Note that this
       assumes history with a single root (that is, no merge without common
       ancestors happened). If this is not the case, use:

           git filter-branch --parent-filter \
                   'test $GIT_COMMIT = <commit-id> && echo "-p <graft-id>" || cat' HEAD

       or even simpler:

           git replace --graft <commit-id> <graft-id>
           git filter-branch <graft-id>..HEAD

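
       Both parent filters above can be tried on sample parent strings
       without touching a repository. This is a sketch only: abc123, c0ffee,
       def456 and 0ldpar are made-up ids substituted for <graft-id>,
       <commit-id> and existing commits, and echo mimics the parent string
       arriving on stdin.

```shell
# First form: add a parent only where the parent string is empty.
filter='sed "s/^\$/-p abc123/"'

# The parent string of a root commit arrives as a single empty line.
root=$(echo "" | eval "$filter")
# A commit that already has a parent passes through unchanged.
normal=$(echo "-p def456" | eval "$filter")

# Second form: graft onto one specific commit, selected by GIT_COMMIT.
filter2='test $GIT_COMMIT = c0ffee && echo "-p abc123" || cat'

GIT_COMMIT=c0ffee
picked=$(echo "" | eval "$filter2")
GIT_COMMIT=def456
other=$(echo "-p 0ldpar" | eval "$filter2")

printf '%s\n' "$root" "$normal" "$picked" "$other"
```
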
       To remove commits authored by "Darl McBribe" from the history:

           git filter-branch --commit-filter '
                   if [ "$GIT_AUTHOR_NAME" = "Darl McBribe" ];
                   then
                           skip_commit "$@";
                   else
                           git commit-tree "$@";
                   fi' HEAD

       The function skip_commit is defined as follows:

           skip_commit()
           {
                   shift;
                   while [ -n "$1" ];
                   do
                           shift;
                           map "$1";
                           shift;
                   done;
           }

       The shift magic first throws away the tree id and then the -p
       parameters. Note that this handles merges properly! In case Darl
       committed a merge between P1 and P2, it will be propagated properly and
       all children of the merge will become merge commits with P1,P2 as their
       parents instead of the merge commit.

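
       The argument handling of skip_commit can be traced with a stub in
       place of map. In this sketch the stub and its old-to-new table are
       invented (P1 is assumed to have been rewritten to P1-new, P2 to be
       unchanged); the real map function is provided by git-filter-branch.

```shell
# Stub for the map helper with a made-up mapping table.
map() {
    case "$1" in
        P1) echo "P1-new" ;;
        *)  echo "$1" ;;
    esac
}

# skip_commit as defined in the text above.
skip_commit()
{
    shift;
    while [ -n "$1" ];
    do
        shift;
        map "$1";
        shift;
    done;
}

# Arguments in the form a commit filter receives for a merge of P1 and P2:
# the tree id first, then one "-p <parent>" pair per parent.
out=$(skip_commit TREE -p P1 -p P2)
echo "$out"
```

       The function prints one mapped parent id per line and creates no
       commit, which is how the children of the skipped merge end up with
       both parents.
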
       NOTE that the changes introduced by the commits, and not reverted by
       subsequent commits, will still be in the rewritten branch. If you want
       to throw out changes together with the commits, you should use the
       interactive mode of git rebase.

       You can rewrite the commit log messages using --msg-filter. For
       example, git-svn-id strings in a repository created by git svn can be
       removed this way:

           git filter-branch --msg-filter '
                   sed -e "/^git-svn-id:/d"
           '

       If you need to add Acked-by lines to, say, the last 10 commits (none of
       which is a merge), use this command:

           git filter-branch --msg-filter '
                   cat &&
                   echo "Acked-by: Bugs Bunny <bunny@bugzilla.org>"
           ' HEAD~10..HEAD

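
       Because a message filter only reads stdin and writes stdout, both
       filters above can be checked against a sample message first. The
       message text and the svn URL below are invented for illustration.

```shell
# Invented commit message with a trailing git-svn-id line.
msg='Fix parser

git-svn-id: https://svn.example.invalid/trunk@42 0000'

# First filter: delete the git-svn-id line.
strip='sed -e "/^git-svn-id:/d"'
stripped=$(printf '%s\n' "$msg" | eval "$strip")

# Second filter: copy the message, then append an Acked-by line.
ack='cat && echo "Acked-by: Bugs Bunny <bunny@bugzilla.org>"'
acked=$(printf 'Fix parser\n' | eval "$ack")

printf '%s\n---\n%s\n' "$stripped" "$acked"
```
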
       The --env-filter option can be used to modify committer and/or author
       identity. For example, if you found out that your commits have the
       wrong identity due to a misconfigured user.email, you can make a
       correction, before publishing the project, like this:

           git filter-branch --env-filter '
                   if test "$GIT_AUTHOR_EMAIL" = "root@localhost"
                   then
                           GIT_AUTHOR_EMAIL=john@example.com
                   fi
                   if test "$GIT_COMMITTER_EMAIL" = "root@localhost"
                   then
                           GIT_COMMITTER_EMAIL=john@example.com
                   fi
           ' -- --all

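
       The effect of this environment filter on a single commit can be
       previewed by setting the variables by hand and evaluating the same
       script. A sketch only: the starting identities are made up, and eval
       here merely imitates what git-filter-branch does before committing.

```shell
# The filter script from the example above.
env_filter='
if test "$GIT_AUTHOR_EMAIL" = "root@localhost"
then
    GIT_AUTHOR_EMAIL=john@example.com
fi
if test "$GIT_COMMITTER_EMAIL" = "root@localhost"
then
    GIT_COMMITTER_EMAIL=john@example.com
fi
'

# Made-up identities of one commit: wrong author, correct committer.
GIT_AUTHOR_EMAIL=root@localhost
GIT_COMMITTER_EMAIL=jane@example.com

eval "$env_filter"
echo "$GIT_AUTHOR_EMAIL $GIT_COMMITTER_EMAIL"
```

       Only the address that matched root@localhost is replaced; the other
       one is left as it was.
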
       To restrict rewriting to only part of the history, specify a revision
       range in addition to the new branch name. The new branch name will
       point to the top-most revision that a git rev-list of this range will
       print.

       Consider this history:

                D--E--F--G--H
               /     /
           A--B-----C

       To rewrite only commits D,E,F,G,H, but leave A, B and C alone, use:

           git filter-branch ... C..H

       To rewrite commits E,F,G,H, use one of these:

           git filter-branch ... C..H --not D
           git filter-branch ... D..H --not C

       To move the whole tree into a subdirectory, or remove it from there:

           git filter-branch --index-filter \
                   'git ls-files -s | sed "s-\t\"*-&newsubdir/-" |
                           GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
                                   git update-index --index-info &&
                    mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD

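
       The sed step of this index filter can be inspected on its own. The
       sketch below feeds it sample lines in the git ls-files -s format
       (mode, blob id, stage, a tab, then the path). It uses a portable
       variant of the expression, with a literal tab taken from printf in
       place of \t, since not every sed understands \t; the rewrite helper,
       the paths and the blob id are placeholders.

```shell
# Portable variant of the sed expression: a literal tab replaces \t.
tab=$(printf '\t')
rewrite() {
    sed "s-$tab\"*-&newsubdir/-"
}

# Placeholder blob id used for both sample entries.
blob=e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# An ordinary path, and a path that ls-files prints inside double quotes.
plain=$(printf '100644 %s 0\t%s\n' "$blob" 'README' | rewrite)
quoted=$(printf '100644 %s 0\t%s\n' "$blob" '"caf\303\251.txt"' | rewrite)

printf '%s\n' "$plain" "$quoted"
```

       The optional quote in the pattern is what places newsubdir/ inside the
       opening double quote of a quoted path.
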
CHECKLIST FOR SHRINKING A REPOSITORY
       git-filter-branch can be used to get rid of a subset of files, usually
       with some combination of --index-filter and --subdirectory-filter.
       People expect the resulting repository to be smaller than the original,
       but you need a few more steps to actually make it smaller, because Git
       tries hard not to lose your objects until you tell it to. First make
       sure that:

       ·   You really removed all variants of a filename, if a blob was moved
           over its lifetime. git log --name-only --follow --all -- filename
           can help you find renames.

       ·   You really filtered all refs: use --tag-name-filter cat -- --all
           when calling git-filter-branch.

       Then there are two ways to get a smaller repository. The safer way is
       to clone, which keeps your original intact.

       ·   Clone it with git clone file:///path/to/repo. The clone will not
           have the removed objects. See git-clone(1). (Note that cloning with
           a plain path just hardlinks everything!)

       If you really don’t want to clone it, for whatever reasons, check the
       following points instead (in this order). This is a very destructive
       approach, so make a backup or go back to cloning it. You have been
       warned.

       ·   Remove the original refs backed up by git-filter-branch: say git
           for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git
           update-ref -d.

       ·   Expire all reflogs with git reflog expire --expire=now --all.

       ·   Garbage collect all unreferenced objects with git gc --prune=now
           (or if your git-gc is not new enough to support arguments to
           --prune, use git repack -ad; git prune instead).

NOTES
       git-filter-branch allows you to make complex shell-scripted rewrites of
       your Git history, but you probably don’t need this flexibility if
       you’re simply removing unwanted data like large files or passwords. For
       those operations you may want to consider The BFG Repo-Cleaner[1], a
       JVM-based alternative to git-filter-branch, typically at least 10-50x
       faster for those use-cases, and with quite different characteristics:

       ·   Any particular version of a file is cleaned exactly once. The BFG,
           unlike git-filter-branch, does not give you the opportunity to
           handle a file differently based on where or when it was committed
           within your history. This constraint gives the core performance
           benefit of The BFG, and is well-suited to the task of cleansing bad
           data - you don’t care where the bad data is, you just want it gone.

       ·   By default The BFG takes full advantage of multi-core machines,
           cleansing commit file-trees in parallel. git-filter-branch cleans
           commits sequentially (i.e. in a single-threaded manner), though it
           is possible to write filters that include their own parallelism, in
           the scripts executed against each commit.

       ·   The command options[2] are much more restrictive than those of
           git-filter-branch, and dedicated just to the tasks of removing
           unwanted data, e.g. --strip-blobs-bigger-than 1M.

GIT
       Part of the git(1) suite

NOTES
        1. The BFG Repo-Cleaner
           http://rtyley.github.io/bfg-repo-cleaner/

        2. command options
           http://rtyley.github.io/bfg-repo-cleaner/#examples

Git 2.21.0                        02/24/2019              GIT-FILTER-BRANCH(1)