GIT-FILTER-REPO(1) Git Manual GIT-FILTER-REPO(1)

NAME
git-filter-repo - Rewrite repository history

SYNOPSIS
git filter-repo --analyze
git filter-repo [<path_filtering_options>] [<content_filtering_options>]
[<ref_renaming_options>] [<commit_message_filtering_options>]
[<name_or_email_filtering_options>] [<parent_rewriting_options>]
[<generic_callback_options>] [<miscellaneous_options>]

DESCRIPTION
17 Rapidly rewrite entire repository history using user-specified filters.
18 This is a destructive operation which should not be used lightly; it
19 writes new commits, trees, tags, and blobs corresponding to (but
20 filtered from) the original objects in the repository, then deletes the
21 original history and leaves only the new. See the section called
22 “DISCUSSION” for more details on the ramifications of using this tool.
23 Several different types of history rewrites are possible; examples
24 include (but are not limited to):
25
26 · stripping large files (or large directories or large extensions)
27
28 · stripping unwanted files by path
29
30 · extracting wanted paths and their history (stripping everything
31 else)
32
33 · restructuring the file layout (such as moving all files into a
34 subdirectory in preparation for merging with another repo, making a
35 subdirectory become the new toplevel directory, or merging two
36 directories with independent filenames into one directory)
37
38 · renaming tags (also often in preparation for merging with another
39 repo)
40
41 · replacing or removing sensitive text such as passwords
42
43 · making mailmap rewriting of user names or emails permanent
44
45 · making grafts or replacement refs permanent
46
47 · rewriting commit messages
48
49 Additionally, several concerns are handled automatically (many of these
50 can be overridden, but they are all on by default):
51
52 · rewriting (possibly abbreviated) hashes in commit messages to refer
53 to the new post-rewrite commit hashes
54
55 · pruning commits which become empty due to the above filters (also
56 handles edge cases like pruning of merge commits which become
57 degenerate and empty)
58
59 · creating replace-refs (see git-replace(1)) for old commit hashes,
60 which if pushed and fetched will allow users to continue to refer
61 to new commits using (unabbreviated) old commit IDs
62
63 · stripping of original history to avoid mixing old and new history
64
65 · repacking the repository post-rewrite to shrink the repo for the
66 user
67
68 Also, it’s worth noting that there is an important safety mechanism:
69
70 · abort if run from a repo that is not a fresh clone (to prevent
71 accidental data loss from rewriting local history that doesn’t
72 exist anywhere else)
73
74 For those who know that there is large unwanted stuff in their history
75 and want help finding it, this command also
76
77 · provides an option to analyze a repository and generate reports
78 that can be useful in determining what to filter (or in determining
79 whether a separate filtering command was successful).
80
81 See also the section called “VERSATILITY”, the section called
82 “DISCUSSION”, the section called “EXAMPLES”, and the section called
83 “INTERNALS”.

OPTIONS
86 Analysis Options
87 --analyze
88 Analyze repository history and create a report that may be useful
89 in determining what to filter in a subsequent run (or in
90 determining if a previous filtering command did what you wanted).
91 Will not modify your repo.
92
93 Filtering based on paths (see also --filename-callback)
94 --invert-paths
95 Invert the selection of files from the specified
96 --path-{match,glob,regex} options below, i.e. only select files
97 matching none of those options.
98
99 --path-match <dir_or_file>, --path <dir_or_file>
100 Exact paths (files or directories) to include in filtered history.
101 Multiple --path options can be specified to get a union of paths.
102
103 --path-glob <glob>
104 Glob of paths to include in filtered history. Multiple --path-glob
105 options can be specified to get a union of paths.
106
107 --path-regex <regex>
108 Regex of paths to include in filtered history. Multiple
109 --path-regex options can be specified to get a union of paths.
110
111 --use-base-name
112 Match on file base name instead of full path from the top of the
113 repo. Incompatible with --path-rename.
114
115 Renaming based on paths (see also --filename-callback)
116 --path-rename <old_name:new_name>, --path-rename-match
117 <old_name:new_name>
118 Path to rename; if filename or directory matches <old_name> rename
119 to <new_name>. Multiple --path-rename options can be specified.
120
121 Path shortcuts
122 --paths-from-file <filename>
Specify several path filtering and renaming directives, one per
line. Lines with ==> in them specify path renames, and lines can
begin with literal: (the default), glob:, or regex: to specify
different matching styles.
127
128 --subdirectory-filter <directory>
129 Only look at history that touches the given subdirectory and treat
130 that directory as the project root. Equivalent to using --path
131 <directory>/ --path-rename <directory>/:
132
133 --to-subdirectory-filter <directory>
134 Treat the project root as instead being under <directory>.
135 Equivalent to using --path-rename :<directory>/
136
137 Content editing filters (see also --blob-callback)
138 --replace-text <expressions_file>
139 A file with expressions that, if found, will be replaced. By
140 default, each expression is treated as literal text, but regex: and
141 glob: prefixes are supported. You can end the line with ==> and
142 some replacement text to choose a replacement choice other than the
143 default of ***REMOVED***.
144
--strip-blobs-bigger-than <size>
Strip blobs (files) bigger than the specified size (e.g. 5M, 2G).

--strip-blobs-with-ids <blob_id_filename>
Read git object ids from each line of the given file, and strip all
of them from history.
151
152 Renaming of refs (see also --refname-callback)
153 --tag-rename <old:new>
154 Rename tags starting with <old> to start with <new>. For example,
155 --tag-rename foo:bar will rename tag foo-1.2.3 to bar-1.2.3; either
156 <old> or <new> can be empty.
157
158 Filtering of commit messages (see also --message-callback)
159 --preserve-commit-hashes
160 By default, since commits are rewritten and thus gain new hashes,
161 references to old commit hashes in commit messages are replaced
162 with new commit hashes (abbreviated to the same length as the old
163 reference). Use this flag to turn off updating commit hashes in
164 commit messages.
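
The default behavior described above can be pictured with a small Python sketch. It is an illustration only, not the tool's implementation: the hash pattern, the rewrite_hashes helper, and the sample mapping are all assumptions made for this example.

```python
import re

# Candidate commit references: 7 to 40 lowercase hex digits (an assumption
# made for this sketch; the tool's own pattern may differ).
HASH_RE = re.compile(rb"\b[0-9a-f]{7,40}\b")


def rewrite_hashes(message, old_to_new):
    """Replace old commit hashes in message, keeping each reference's length."""
    def translate(match):
        old = match.group(0)
        candidates = [new for full, new in old_to_new.items()
                      if full.startswith(old)]
        if len(candidates) != 1:
            return old  # unknown or ambiguous reference: leave untouched
        return candidates[0][:len(old)]
    return HASH_RE.sub(translate, message)


# Hypothetical mapping from one original hash to its rewritten hash.
mapping = {
    b"1234567890abcdef1234567890abcdef12345678":
        b"fedcba0987654321fedcba0987654321fedcba09",
}
print(rewrite_hashes(b"Revert 1234567890 (broke the build)", mapping))
```

The ten-character reference in the message is replaced by the first ten characters of the new hash; references that are not in the mapping are left alone.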
165
166 --preserve-commit-encoding
167 Do not reencode commit messages into UTF-8. By default, if the
168 commit object specifies an encoding for the commit message, the
169 message is re-encoded into UTF-8.
170
171 Filtering of names & emails (see also --name-callback and --email-callback)
172 --mailmap <filename>
173 Use specified mailmap file (see git-shortlog(1) for details on the
174 format) when rewriting author, committer, and tagger names and
175 emails. If the specified file is part of git history, historical
176 versions of the file will be ignored; only the current contents are
177 consulted.
178
179 --use-mailmap
180 Same as: --mailmap .mailmap
181
182 Parent rewriting
183 --replace-refs {delete-no-add, delete-and-add, update-no-add,
184 update-or-add, update-and-add}
Replace refs (see git-replace(1)) are used to rewrite parents
(unless turned off by the usual git mechanism); this flag specifies
what to do with those refs afterward. Replace refs can either be
deleted or updated to point at new commit hashes. Also, new replace
refs can be added for each commit rewrite. With update-or-add, new
replace refs are only added for commit rewrites that aren’t used to
update an existing replace ref. The default is update-and-add if
$GIT_DIR/filter-repo/already_ran does not exist, and update-or-add
otherwise.
194
195 --prune-empty {always, auto, never}
196 Whether to prune empty commits. auto (the default) means only
197 prune commits which become empty (not commits which were empty in
198 the original repo, unless their parent was pruned). When the parent
199 of a commit is pruned, the first non-pruned ancestor becomes the
200 new parent.
201
202 --prune-degenerate {always, auto, never}
203 Since merge commits are needed for history topology, they are
204 typically exempt from pruning. However, they can become degenerate
205 with the pruning of other commits (having fewer than two parents,
206 having one commit serve as both parents, or having one parent as
207 the ancestor of the other.) If such merge commits have no file
208 changes, they can be pruned. The default (auto) is to only prune
209 empty merge commits which become degenerate (not which started as
210 such).
211
212 Generic callback code snippets
213 --filename-callback <function_body>
214 Python code body for processing filenames; see the section called
215 “CALLBACKS”.
216
217 --message-callback <function_body>
218 Python code body for processing messages (both commit messages and
219 tag messages); see the section called “CALLBACKS”.
220
221 --name-callback <function_body>
222 Python code body for processing names of people; see the section
223 called “CALLBACKS”.
224
225 --email-callback <function_body>
Python code body for processing email addresses; see the section
called “CALLBACKS”.
228
229 --refname-callback <function_body>
230 Python code body for processing refnames; see the section called
231 “CALLBACKS”.
232
233 --blob-callback <function_body>
234 Python code body for processing blob objects; see the section
235 called “CALLBACKS”.
236
237 --commit-callback <function_body>
238 Python code body for processing commit objects; see the section
239 called “CALLBACKS”.
240
241 --tag-callback <function_body>
242 Python code body for processing tag objects; see the section called
243 “CALLBACKS”.
244
245 --reset-callback <function_body>
246 Python code body for processing reset objects; see the section
247 called “CALLBACKS”.
248
249 Location to filter from/to
250 Note
251 Specifying alternate source or target locations implies --partial
252 except that the normal default for --replace-refs is used. However,
253 unlike normal uses of --partial, this doesn’t risk mixing old and
254 new history since the old and new histories are in different
255 repositories.
256
257 --source <source>
258 Git repository to read from
259
260 --target <target>
261 Git repository to overwrite with filtered history
262
263 Miscellaneous options
264 --help, -h
265 Show a help message and exit.
266
267 --force, -f
268 Rewrite history even if the current repo does not look like a fresh
269 clone.
270
271 --partial
Do a partial history rewrite, resulting in a mixture of old and
new history. This implies a default of update-no-add for
--replace-refs, disables rewriting refs/remotes/origin/* to
refs/heads/*, disables removing of the origin remote, disables
removing unexported refs, disables expiring the reflog, and
disables the automatic post-filter gc. It also modifies the
--tag-rename and --refname-callback options so that, instead of
replacing old refs with new refnames, they create new refs and
keep the old ones around. Use with caution.
281
282 --refs <refs+>
283 Limit history rewriting to the specified refs. Implies --partial.
284 In addition to the normal caveats of --partial (mixing old and new
285 history, no automatic remapping of refs/remotes/origin/* to
286 refs/heads/*, etc.), this also may cause problems for pruning of
287 degenerate empty merge commits when negative revisions are
288 specified.
289
290 --dry-run
291 Do not change the repository. Run git fast-export and filter its
292 output, and save both the original and the filtered version for
293 comparison. This also disables rewriting commit messages due to not
294 knowing new commit IDs and disables filtering of some empty commits
295 due to inability to query the fast-import backend.
296
297 --debug
298 Print additional information about operations being performed and
299 commands being run. (If used together with --dry-run, shows extra
300 information about what would be run).
301
302 --stdin
303 Instead of running git fast-export and filtering its output, filter
304 the fast-export stream from stdin. The stdin must be in the
305 expected input format (e.g. it needs to include original-oid
306 directives).
307
308 --quiet
309 Pass --quiet to other git commands called.

VERSATILITY
312 filter-repo has a hierarchy of capabilities on the spectrum from easy
313 to use convenience flags that perform pre-defined types of filtering,
314 to choices that provide lots of flexibility in controlling how
315 filtering occurs. This spectrum includes the following:
316
317 · Convenience flags making common types of history rewriting simple
318 (e.g. --path, --strip-blobs-bigger-than, --replace-text, --mailmap)
319
320 · Options which are shorthand for others or which provide greater
321 control than others (e.g. --subdirectory-filter could just be
322 written using both a path selection (--path) and a path rename
323 (--path-rename) filter; --paths-from-file can handle all other
324 --path* options and more such as regex renaming of paths)
325
326 · Generic python callbacks for handling a certain type of data (the
327 filename, message, name, email, and refname callbacks)
328
329 · Generic python callbacks for handling fundamental git objects,
330 allowing greater control over the combination of data types the
331 object holds (the commit, tag, blob, and reset callbacks)
332
333 · The ability to import filter-repo as a module in a python program
334 and use its classes and functions for even greater control and
335 flexibility while still leveraging lots of basic capabilities. One
336 can even use this to write new tools with a completely different
337 interface.
338
339 For more information about callbacks, see the section called
340 “CALLBACKS”. For examples on writing python programs that import
341 filter-repo as a module to create new history rewriting tools, look at
342 the contrib/filter-repo-demos/ directory. That directory includes,
343 among other examples, a reimplementation of git-filter-branch which is
344 faster than git-filter-branch, and a reimplementation of BFG Repo
345 Cleaner with several bug fixes and new features.

DISCUSSION
348 Using filter-repo is relatively simple, but rewriting history is part
349 of a larger discussion in terms of collaboration. When you rewrite
350 history, the old and new histories are no longer compatible; if you
351 push this history somewhere for others to view, it will look as though
you’ve done a rebase of all branches and tags. Before proceeding, make
sure you are familiar with this section and with the "RECOVERING FROM
UPSTREAM REBASE" section of git-rebase(1) (in particular, "The hard
case").
356
357 Steps to use git-filter-repo as part of the bigger picture of doing a
358 history rewrite are roughly as follows:
359
360 1. Create a clone of your repository (if you created special refs
361 outside of refs/heads/ or refs/tags/, make sure to fetch those
362 too). Note that --bare and --mirror clones are supported too, if
363 you prefer.
364
365 2. (Optional) Run git filter-repo --analyze. This will create a
366 directory of reports mentioning renames that have occurred in your
367 repo and also listing sizes of objects aggregated by
368 path/directory/extension/blob-id; this information may be useful in
369 choosing how to filter your repo. It can also be useful to re-run
370 --analyze after filtering to verify the changes look correct.
371
372 3. Run filter-repo with your desired filtering options. Many examples
373 are given below. For more complex cases, note that doing the
374 filtering in multiple steps (by running multiple filter-repo
375 invocations in a sequence) is supported. If anything goes wrong
376 here, simply delete your clone and restart.
377
378 4. Push your new repository to its new home (note that
379 refs/remotes/origin/* will have been moved to refs/heads/* as the
380 first part of filter-repo, so you can just deal with normal
381 branches instead of remote tracking branches). While you can force
382 push this to the same URL you cloned from, there are good reasons
383 to consider pushing to a different location instead:
384
· People who cloned from the original repo will have old history.
When they fetch the new history you force pushed, unless they
run git reset --hard @{u} on their branches or rebase their
local work, git will treat their old commits as hundreds or
thousands of local commits whose messages closely match those
upstream (but which still include the files you wanted excised
from history), and will let them merge the two histories,
resulting in what looks like two copies of each commit. If they
then push this history back up, everyone ends up with two copies
of each commit and the bad files have returned. You’re more
likely to succeed in forcing people to get rid of the old
history if they have to clone a new URL.
397
398 · Rewriting history will rewrite tags; those who have already
399 downloaded tags will not get the updated tags by default (see
400 the "On Re-tagging" section of git-tag(1)). Every user trying
401 to use an existing clone will have to forcibly delete all tags
402 and re-fetch them; it may be easier for them to just re-clone,
403 which they are more likely to do with a new clone URL.
404
405 · Rewriting history may delete some refs (e.g. branches that only
406 had files that you wanted excised from history); unless you run
407 git push with the --mirror or --prune options, those refs will
408 continue to exist on the server. If folks then merge these
409 branches into others, then people have started mixing old and
410 new history. If users had already cloned these branches,
411 removing them from the server isn’t enough; you need all users
412 to delete any local branches based on these refs and run fetch
413 with the --prune option as well. Simply re-cloning from a new
414 URL is easier.
415
416 · The server may not allow you to force push over some refs. For
417 example, code review systems may have special ref namespaces
418 (e.g. refs/changes/, refs/pull/, refs/merge-requests/) that
419 they have locked down.
420
421 5. (Optional) Some additional considerations
422
· filter-repo by default creates replace refs (see
git-replace(1)) for each rewritten commit ID, allowing you to
use old (unabbreviated) commit hashes to refer to the newly
rewritten commits. If you want to use these replace refs, push
them to the relevant clone URL and tell users to adjust their
fetch refspec (e.g. git config --add remote.origin.fetch
+refs/replace/*:refs/replace/*). Sadly, some existing git
servers (e.g. Gerrit, GitHub) do not yet understand replace
refs, and thus one can’t use old commit hashes within their UI;
this may change in the future. But replace refs at least help
users locally within the git CLI.
434
· If you have a central repo, you may want to prevent people from
pushing old commit IDs, in order to avoid mixing old and new
history. Every repository manager does this differently: some
provide specialized commands (e.g.
https://gerrit-review.googlesource.com/Documentation/cmd-ban-commit.html),
while others require you to write hooks.

EXAMPLES
443 Path based filtering
444 To only keep the README.md file plus the directories guides and
445 tools/releases/:
446
447 git filter-repo --path README.md --path guides/ --path tools/releases
448
449
450 Directory names can be given with or without a trailing slash, and all
451 filenames are relative to the toplevel of the repo. To keep all files
452 except these paths, just add --invert-paths:
453
454 git filter-repo --path README.md --path guides/ --path tools/releases --invert-paths
455
456
457 If you want to have both an inclusion filter and an exclusion filter,
458 just run filter-repo multiple times. For example, to keep the src/main
459 subdirectory but exclude files under src/main named data, run:
460
461 git filter-repo --path src/main/
462 git filter-repo --path-glob 'src/*/data' --invert-paths
463
464
465 Note that the asterisk (*) will match across multiple directories, so
466 the second command would remove e.g. src/main/org/whatever/data. Also,
467 the second command by itself would also remove e.g.
468 src/not-main/foo/data, but since src/not-main/ was removed by the first
469 command, that’s not an issue. Also, the use of quotes around the
470 asterisk is sometimes important to avoid glob expansion by the shell.
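
The cross-directory behavior of the asterisk can be checked with Python's fnmatch module, which this manual links to for glob syntax. This is a sketch that assumes --path-glob follows the same matching rules; the sample paths are made up for illustration.

```python
import fnmatch

pattern = "src/*/data"

# Hypothetical paths, chosen to mirror the cases discussed above.
paths = [
    "src/main/data",
    "src/main/org/whatever/data",
    "src/not-main/foo/data",
    "src/main/data.txt",
]

# fnmatch's "*" matches any run of characters, including "/".
matched = [p for p in paths if fnmatch.fnmatchcase(p, pattern)]
print(matched)
```

The first three paths match, including the ones where the asterisk spans several directory levels; src/main/data.txt does not.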
471
472 You can also select paths by regular expression (see
473 https://docs.python.org/3/library/re.html#regular-expression-syntax).
474 For example, to only include files from the repo whose name is in the
475 format YYYY-MM-DD.txt and is found at least two subdirectories deep:
476
    git filter-repo --path-regex '^.*/.*/[0-9]{4}-[0-9]{2}-[0-9]{2}\.txt$'
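
A regular expression like this can be tried out in Python before running a rewrite. The sketch below uses a variant of the pattern with the dot before txt escaped, so that it only matches a literal period; is_dated_file and the sample paths are hypothetical names used for illustration.

```python
import re

# Variant of the pattern above with "\." so only a literal dot is accepted.
DATED = re.compile(r"^.*/.*/[0-9]{4}-[0-9]{2}-[0-9]{2}\.txt$")


def is_dated_file(path):
    """Return True if path is a YYYY-MM-DD.txt file at least two directories deep."""
    return DATED.search(path) is not None


for path in ["notes/2019/2019-03-07.txt",   # two directories deep
             "notes/2019-03-07.txt",        # only one directory deep
             "2019-03-07.txt"]:             # toplevel
    print(path, is_dated_file(path))
```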
478
479
480 If you want two directories to be renamed (and maybe merged if both are
481 renamed to the same location), use --path-rename; for example, to
482 rename both cmds/ and src/scripts/ to tools/:
483
484 git filter-repo --path-rename cmds:tools --path-rename src/scripts/:tools/
485
486
487 As with --path, directories can be specified with or without a trailing
488 slash for --path-rename.
489
490 If you do a --path-rename to something that was already in use, it will
491 be silently overwritten. However, if you try to rename multiple files
492 to the same location (e.g. src/scripts/run_release.sh and
493 cmds/run_release.sh both existed and had different content with the
494 renames above), then you will be given an error. If you have such a
495 case, you may want to add another rename command to move one of the
496 paths somewhere else where it won’t collide:
497
498 git filter-repo --path-rename cmds/run_release.sh:tools/do_release.sh \
499 --path-rename cmds/:tools/ \
500 --path-rename src/scripts/:tools/
501
502
503 Also, --path-rename brings up ordering issues; all path arguments are
504 applied in order. Thus, a command like
505
506 git filter-repo --path-rename sources/:src/main/ --path src/main/
507
508
509 would make sense but reversing the two arguments would not (src/main/
510 is created by the rename so reversing the two would give you an empty
511 repo). Also, note that the rename of cmds/run_release.sh a couple
512 examples ago was done before the other renames.
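
The ordering issue can be illustrated with a simplified Python model. This is not filter-repo's code: apply_path_args and its tuple format are invented for this sketch, and only prefix matching on directories with a trailing slash is modeled.

```python
def apply_path_args(filename, args):
    """Apply ordered path arguments to one filename.

    args is a list of ("rename", old_prefix, new_prefix) and
    ("path", prefix) tuples; this tuple format is invented for the sketch.
    Returns the resulting filename, or None if the file is filtered out.
    """
    wanted = False
    has_filter = any(arg[0] == "path" for arg in args)
    for arg in args:
        if arg[0] == "rename":
            _, old, new = arg
            if filename.startswith(old):
                filename = new + filename[len(old):]
        elif filename.startswith(arg[1]):
            # A --path argument is tested against the name as it is *now*.
            wanted = True
    return filename if (wanted or not has_filter) else None


rename_first = [("rename", b"sources/", b"src/main/"), ("path", b"src/main/")]
path_first = [("path", b"src/main/"), ("rename", b"sources/", b"src/main/")]

print(apply_path_args(b"sources/a.c", rename_first))
print(apply_path_args(b"sources/a.c", path_first))
```

With the rename first, the file is already under src/main/ when the path filter is checked, so it is kept. With the filter first, nothing matches yet and the file is dropped.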
513
514 If you prefer to filter based solely on basename, use the
515 --use-base-name flag (though this is incompatible with --path-rename).
516 For example, to only include README.md and Makefile files from any
517 directory:
518
519 git filter-repo --use-base-name --path README.md --path Makefile
520
521
522 If you wanted to delete all .DS_Store files in any directory, you could
523 either use:
524
525 git filter-repo --invert-paths --path '.DS_Store' --use-base-name
526
527
528 or
529
530 git filter-repo --invert-paths --path-glob '*/.DS_Store' --path '.DS_Store'
531
532
(The --path-glob isn’t sufficient by itself, as it might miss a
toplevel .DS_Store file. Further, while something like --path-glob
'*.DS_Store' would work around that problem, it would also grab files
named foo.DS_Store or bar/baz.DS_Store.)
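
The difference between the two globs can be seen with Python's fnmatch module. As before, this is a sketch that assumes --path-glob matches the way fnmatch does; the matching helper and the path list are illustrative only.

```python
import fnmatch

# Hypothetical file list covering the cases mentioned above.
paths = [".DS_Store", "docs/.DS_Store", "foo.DS_Store", "bar/baz.DS_Store"]


def matching(pattern):
    """Return the sample paths that the given glob matches."""
    return [p for p in paths if fnmatch.fnmatchcase(p, pattern)]


print(matching("*/.DS_Store"))  # misses the toplevel .DS_Store
print(matching("*.DS_Store"))   # also grabs foo.DS_Store and bar/baz.DS_Store
```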
537
538 If you have a long list of files, directories, globs, or regular
539 expressions to filter on, you can stick them in a file and use
540 --paths-from-file; for example, with a file named stuff-i-want.txt with
541 contents of
542
543 README.md
544 guides/
545 tools/releases
546 glob:*.py
regex:^.*/.*/[0-9]{4}-[0-9]{2}-[0-9]{2}\.txt$
548 tools/==>scripts/
549 regex:(.*)/([^/]*)/([^/]*)\.text$==>\2/\1/\3.txt
550
551
552 then you could run
553
554 git filter-repo --paths-from-file stuff-i-want.txt
555
556
to get a repo containing only the toplevel README.md file, the guides/
and tools/releases/ directories, all Python files, and files whose name
is of the form YYYY-MM-DD.txt at least two subdirectories deep. It would
also rename tools/ to scripts/ and rename files like
foo/bar/baz/bleh.text to baz/foo/bar/bleh.txt. Note the special line
prefixes of glob: and regex: and the special string ==> denoting renames.
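
The regex rename on the last line of that file can be reproduced with Python's re module, whose syntax this manual references. The rename helper below is a hypothetical name used only for this sketch.

```python
import re


def rename(path):
    """Apply the regex rename from the example file to one path."""
    return re.sub(r"(.*)/([^/]*)/([^/]*)\.text$", r"\2/\1/\3.txt", path)


print(rename("foo/bar/baz/bleh.text"))
print(rename("a/b.text"))  # fewer than two directories: no match, unchanged
```

The first group greedily captures foo/bar, the second captures baz, and the third captures bleh, which is why the result is baz/foo/bar/bleh.txt.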
563
564 Finally, see also the --filename-callback from the section called
565 “CALLBACKS”.
566
567 Content based filtering
568 If you want to filter out all files bigger than a certain size, you can
569 use --strip-blobs-bigger-than with some size (K, M, and G suffixes are
570 recognized), e.g.:
571
572 git filter-repo --strip-blobs-bigger-than 10M
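
To estimate which blobs a given threshold would affect, the size argument can be converted to bytes. The sketch below assumes the suffixes are binary multiples (1024-based); that choice, and the parse_size name, are assumptions of this example rather than something stated above.

```python
def parse_size(text):
    """Convert a size such as '10M' to bytes, assuming 1024-based suffixes."""
    multipliers = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    suffix = text[-1]
    if suffix in multipliers:
        return int(text[:-1]) * multipliers[suffix]
    return int(text)  # no suffix: plain byte count


print(parse_size("10M"))
```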
573
574
575 If you want to strip out all files with specified git object ids
576 (hashes), list the hashes in a file and run
577
578 git filter-repo --strip-blobs-with-ids FILE_WITH_GIT_BLOB_IDS
579
580
581 If you want to modify file contents, you can do so based on a list of
582 expressions in a file, one per line. For example, with a file named
583 expressions.txt containing
584
585 p455w0rd
586 foo==>bar
587 glob:*666*==>
588 regex:\bdriver\b==>pilot
literal:MM/DD/YYYY==>YYYY-MM-DD
590 regex:([0-9]{2})/([0-9]{2})/([0-9]{4})==>\3-\1-\2
591
592
593 then running
594
595 git filter-repo --replace-text expressions.txt
596
597
will go through and replace p455w0rd with ***REMOVED***, foo with bar,
any line containing 666 with a blank line, and the word driver with
pilot (but not if it has letters before or after; e.g. drivers will be
unmodified). It will also replace the exact text MM/DD/YYYY with
YYYY-MM-DD, and date strings of the form MM/DD/YYYY with ones of the
form YYYY-MM-DD. In the expressions file, there are a few things to note:
604
605 · Every line has a replacement, given by whatever is on the right of
606 ==>. If ==> does not appear on the line, the default replacement is
607 ***REMOVED***.
608
609 · Lines can start with literal:, glob:, or regex: to specify whether
610 to do literal string matches, globs (see
611 https://docs.python.org/3/library/fnmatch.html), or regular
612 expressions (see
613 https://docs.python.org/3/library/re.html#regular-expression-syntax).
614 If none of these are specified, literal: is assumed.
615
· Globs and regexes are applied to each line of the file; it is not
possible with --replace-text to match a multi-line string.
618
619 · If multiple matches are found on a line, all are replaced.
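
The rules in the list above can be modeled with a short Python sketch. It is a simplified illustration, not the tool's parser: parse_expression and apply_expression are hypothetical helpers, the sketch works on str rather than the bytestrings filter-repo uses, and the glob case simply replaces every whole line that matches.

```python
import fnmatch
import re

DEFAULT_REPLACEMENT = "***REMOVED***"


def parse_expression(line):
    """Split one expressions-file line into (kind, pattern, replacement)."""
    pattern, sep, replacement = line.partition("==>")
    if not sep:
        replacement = DEFAULT_REPLACEMENT  # no ==> on the line
    for kind in ("literal", "glob", "regex"):
        prefix = kind + ":"
        if pattern.startswith(prefix):
            return kind, pattern[len(prefix):], replacement
    return "literal", pattern, replacement  # literal: is assumed


def apply_expression(text, kind, pattern, replacement):
    """Apply one parsed expression to text (simplified model)."""
    if kind == "literal":
        return text.replace(pattern, replacement)
    if kind == "regex":
        return re.sub(pattern, replacement, text)
    # glob: replace each whole line that matches the pattern
    return "\n".join(replacement if fnmatch.fnmatchcase(line, pattern) else line
                     for line in text.split("\n"))


rule = parse_expression(r"regex:([0-9]{2})/([0-9]{2})/([0-9]{4})==>\3-\1-\2")
print(apply_expression("released 12/31/2019", *rule))
```

A line that lacks the full ==> separator is treated as one literal pattern with the default replacement, which is worth remembering when an expression does not seem to take effect.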
620
621 See also the --blob-callback from the section called “CALLBACKS”.
622
623 Refname based filtering
624 To rename tags, use --tag-rename, e.g.:
625
626 git filter-repo --tag-rename foo:bar
627
628
629 This will rename any tags starting with foo to now start with bar.
630 Either side of the colon could be blank, e.g.
631
632 git filter-repo --tag-rename '':'my-module-'
633
634
635 For more general refname modification, see --refname-callback from the
636 section called “CALLBACKS”.
637
638 User and email based filtering
639 To modify username and emails of commits, you can create a mailmap file
640 in the format accepted by git-shortlog(1). For example, if you have a
641 file named my-mailmap you can run
642
643 git filter-repo --mailmap my-mailmap
644
645
646 and if the current contents of that file are as follows (if the
647 specified mailmap file is version controlled, historical versions of
648 the file are ignored):
649
650 Name For User <email@addre.ss>
651 <new@ema.il> <old1@ema.il>
652 New Name And <new@ema.il> <old2@ema.il>
653 New Name And <new@ema.il> Old Name And <old3@ema.il>
654
655
then user names and/or emails are updated according to the specified
mapping.
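
How the four entry forms above map an identity can be sketched in Python. This is a simplified model under stated assumptions: parse_mailmap and map_identity are hypothetical helpers, the first matching entry wins, comparisons are exact, and str is used instead of bytestrings. Git's own mailmap handling has additional rules that are not modeled here.

```python
import re

# "[New Name] <email1> [[Old Name] <email2>]"
ENTRY = re.compile(
    r"^(?P<new_name>[^<]*?)\s*<(?P<email1>[^>]*)>"
    r"(?:\s*(?P<old_name>[^<]*?)\s*<(?P<email2>[^>]*)>)?\s*$"
)


def parse_mailmap(text):
    """Parse mailmap text into a list of entry dictionaries."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = ENTRY.match(line)
        if match:
            entries.append(match.groupdict())
    return entries


def map_identity(name, email, entries):
    """Return the (name, email) pair after applying the first matching entry."""
    for entry in entries:
        if entry["email2"] is None:
            # "Proper Name <email>": fix the name for a known email.
            if email == entry["email1"]:
                return entry["new_name"] or name, email
        elif email == entry["email2"] and entry["old_name"] in ("", name):
            # Match on the old email and, if one is given, the old name.
            return entry["new_name"] or name, entry["email1"]
    return name, email


MAILMAP = """\
Name For User <email@addre.ss>
<new@ema.il> <old1@ema.il>
New Name And <new@ema.il> <old2@ema.il>
New Name And <new@ema.il> Old Name And <old3@ema.il>
"""
entries = parse_mailmap(MAILMAP)
print(map_identity("Somebody", "old2@ema.il", entries))
```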
658
659 See also the --name-callback and --email-callback from the section
660 called “CALLBACKS”.
661
662 Parent rewriting
663 To replace $commit_A with $commit_B (e.g. make all commits which had
664 $commit_A as a parent instead have $commit_B for that parent), and
665 rewrite history to make it permanent:
666
667 git replace $commit_A $commit_B
668 git filter-repo --force
669
670
671 To create a new commit with the same contents as $commit_A except with
672 different parent(s) and then replace $commit_A with the new commit, and
673 rewrite history to make it permanent:
674
675 git replace --graft $commit_A $new_parent_or_parents
676 git filter-repo --force
677
678
679 The reason to specify --force is two-fold: filter-repo will error out
680 if no arguments are specified, and the new graft commit would otherwise
681 trigger the not-a-fresh-clone check.
682
683 Partial history rewrites
684 To rewrite the history on just one branch (which may cause it to no
685 longer share any common history with other branches), use --refs. For
686 example, to remove a file named extraneous.txt from the master branch:
687
688 git filter-repo --invert-paths --path extraneous.txt --refs master
689
690
691 To rewrite just some recent commits:
692
693 git filter-repo --invert-paths --path extraneous.txt --refs master~3..master
694

CALLBACKS
697 For flexibility, filter-repo allows you to specify functions on the
698 command line to further filter all changes. Please note that there are
699 some API compatibility caveats associated with these callbacks that you
700 should be aware of before using them; see the "API BACKWARD
701 COMPATIBILITY CAVEAT" comment near the top of git-filter-repo source
702 code.
703
704 All callback functions are of the same general format. For a command
705 line argument like
706
707 --foo-callback 'BODY'
708
709
710 the following code will be compiled and called:
711
    def foo_callback(foo):
        BODY
714
715
716 Thus, you just need to make sure your BODY modifies and returns foo
717 appropriately. One important thing to note for all callbacks is that
718 filter-repo uses bytestrings (see
719 https://docs.python.org/3/library/stdtypes.html#bytes) everywhere
720 instead of strings.
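
The wrapping of BODY into a function can be imitated in plain Python, which is handy for testing a callback body before running it against a repository. This is a sketch only: make_callback is a hypothetical helper, and exposing re and os to the body is an assumption based on the examples in this section using both without importing them.

```python
import os
import re
import textwrap


def make_callback(argname, body):
    """Build <argname>_callback(<argname>) from a function body string."""
    source = (f"def {argname}_callback({argname}):\n"
              + textwrap.indent(body, "    "))
    # Modules the body may refer to (an assumption of this sketch).
    namespace = {"os": os, "re": re}
    exec(source, namespace)
    return namespace[f"{argname}_callback"]


name_callback = make_callback(
    "name", 'return name.replace(b"Wiliam", b"William")')
print(name_callback(b"Wiliam Shakespeare"))
```

Note that the argument and the return value are bytestrings, as they are for every callback.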
721
722 There are four callbacks that allow you to operate directly on raw
723 objects that contain data that’s easy to write in fast-import(1)
724 format:
725
726 --blob-callback
727 --commit-callback
728 --tag-callback
729 --reset-callback
730
731
732 We’ll come back to these later because it is often the case that the
733 other callbacks are more convenient. The other callbacks operate on a
734 small piece of the raw objects or operate on pieces across multiple
735 types of raw object (e.g. author names and committer names and tagger
736 names across commits and tags, or refnames across commits, tags, and
737 resets, or messages across commits and tags). The convenience callbacks
738 are:
739
740 --filename-callback
741 --message-callback
742 --name-callback
743 --email-callback
744 --refname-callback
745
746
747 in each you are expected to simply return a new value based on the one
748 passed in. For example,
749
750 git-filter-repo --name-callback 'return name.replace(b"Wiliam", b"William")'
751
752
753 would result in the following function being called:
754
    def name_callback(name):
        return name.replace(b"Wiliam", b"William")
757
758
759 The email callback is quite similar:
760
761 git-filter-repo --email-callback 'return email.replace(b".cm", b".com")'
762
763
764 The refname callback is also similar, but note that the refname passed
765 in and returned are expected to be fully qualified (e.g.
766 b"refs/heads/master" instead of just b"master" and b"refs/tags/v1.0.7"
767 instead of b"1.0.7"):
768
    git-filter-repo --refname-callback '
      # Change e.g. refs/heads/master to refs/heads/prefix-master
      rdir,rpath = os.path.split(refname)
      return rdir + b"/prefix-" + rpath'
773
774
775 The message callback is quite similar to the previous three callbacks,
776 though it operates on a bytestring that is likely more than one line:
777
    git-filter-repo --message-callback '
      if b"Signed-off-by:" not in message:
        message += b"\nSigned-off-by: Me My <self@and.eye>"
      return re.sub(b"[Ee]-?[Mm][Aa][Ii][Ll]", b"email", message)'
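Since a message callback sees the whole multi-line message, it is worth checking that it behaves sensibly when applied to its own output. The following sketch runs the body above as a plain function on a made-up commit message; nothing here is filter-repo API.

```python
import re

def message_callback(message):
    # Same body as the command above.
    if b"Signed-off-by:" not in message:
        message += b"\nSigned-off-by: Me My <self@and.eye>"
    return re.sub(b"[Ee]-?[Mm][Aa][Ii][Ll]", b"email", message)

once = message_callback(b"Fix E-Mail validation\n")
twice = message_callback(once)
```

The trailer is appended only when it is missing, so applying the function a second time leaves the message unchanged.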
782
783
784 The filename callback is slightly more interesting. Returning None
785 means the file should be removed from all commits, returning the
786 filename unmodified marks the file to be kept, and returning a
787 different name means the file should be renamed. An example:
788
    git-filter-repo --filename-callback '
      if b"/src/" in filename:
        # Remove all files with a directory named "src" in their path
        # (except when "src" appears at the toplevel).
        return None
      elif filename.startswith(b"tools/"):
        # Rename tools/ -> scripts/misc/
        return b"scripts/misc/" + filename[6:]
      else:
        # Keep the filename and do not rename it
        return filename
      '
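The three possible outcomes (remove, rename, keep) are easy to check by running the same logic on a few sample paths. This is a sketch with made-up filenames; it uses len(b"tools/") in place of the literal 6, which is equivalent.

```python
def filename_callback(filename):
    if b"/src/" in filename:
        # A "src" directory below the toplevel: drop the file.
        return None
    elif filename.startswith(b"tools/"):
        # Rename tools/ -> scripts/misc/
        return b"scripts/misc/" + filename[len(b"tools/"):]
    else:
        # Keep the filename unchanged.
        return filename

results = {path: filename_callback(path) for path in (
    b"lib/src/parser.c", b"src/main.c", b"tools/lint.py", b"README")}
```

Note that b"src/main.c" is kept, because the test looks for b"/src/" with a leading slash and so does not match a toplevel src directory.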
801
802
803 In contrast, the blob, reset, tag, and commit callbacks are not
804 expected to return a value, but are instead expected to modify the
805 object passed in. Major fields for these objects are (subject to API
806 backward compatibility caveats mentioned previously):
807
808 · Blob: original_id (original hash) and data
809
810 · Reset: ref (name of reference) and from_ref (hash or integer mark)
811
812 · Tag: ref, from_ref, original_id, tagger_name, tagger_email,
813 tagger_date, message
814
· Commit: branch, original_id, author_name, author_email,
author_date, committer_name, committer_email, committer_date,
message, file_changes (list of FileChange objects, each containing
a type, filename, mode, and blob_id), parents (list of hashes or
integer marks)
820
821 An example of each:
822
    git filter-repo --blob-callback '
      if len(blob.data) > 25:
        # Mark this blob for removal from all commits
        blob.skip()
      else:
        blob.data = blob.data.replace(b"Hello", b"Goodbye")
      '
830
831
832
    git filter-repo --reset-callback 'reset.ref = reset.ref.replace(b"master", b"dev")'
834
835
836
    git filter-repo --tag-callback '
      if tag.tagger_name == b"Jim Williams":
        # Omit this tag
        tag.skip()
      else:
        tag.message = tag.message + b"\n\nTag of %s by %s on %s" % (tag.ref, tag.tagger_email, tag.tagger_date)'
843
844
845
    git filter-repo --commit-callback '
      # Remove executable files with three 6s in their name (including
      # from leading directories).
      # Also, undo deletion of sources/foo/bar.txt (change types are
      # either b"D" (deletion) or b"M" (add or modify); renames are
      # handled by deleting the old file and adding a new one)
      commit.file_changes = [
             change for change in commit.file_changes
             if not (change.mode == b"100755" and
                     change.filename.count(b"6") == 3) and
                not (change.type == b"D" and
                     change.filename == b"sources/foo/bar.txt")]
      # Mark all .sh files as executable; modes in git are always one of
      # 100644 (normal file), 100755 (executable), 120000 (symlink), or
      # 160000 (submodule)
      for change in commit.file_changes:
        if change.filename.endswith(b".sh"):
          change.mode = b"100755"
      '
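The commit callback above can be dry-run against stand-in objects before touching a real repository. In this sketch, FakeChange and FakeCommit are hypothetical classes that carry only the fields the callback reads; they are not filter-repo's own Commit or FileChange types, and the sample paths are made up.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FakeChange:
    """Stand-in for a FileChange; only the fields used below."""
    type: bytes
    filename: bytes
    mode: bytes = b"100644"

@dataclass
class FakeCommit:
    """Stand-in for a Commit; only file_changes is modeled."""
    file_changes: List[FakeChange]

def commit_callback(commit):
    # Same logic as the command above.
    commit.file_changes = [
        change for change in commit.file_changes
        if not (change.mode == b"100755" and
                change.filename.count(b"6") == 3) and
        not (change.type == b"D" and
             change.filename == b"sources/foo/bar.txt")]
    for change in commit.file_changes:
        if change.filename.endswith(b".sh"):
            change.mode = b"100755"

commit = FakeCommit([
    FakeChange(b"M", b"bin/666", b"100755"),   # executable, three 6s
    FakeChange(b"D", b"sources/foo/bar.txt"),  # deletion to be undone
    FakeChange(b"M", b"run.sh"),               # should become executable
    FakeChange(b"M", b"docs/666.txt"),         # three 6s but not executable
])
commit_callback(commit)
kept = [(c.filename, c.mode) for c in commit.file_changes]
```

Only run.sh (now with mode 100755) and docs/666.txt remain in kept, since the filter removes a change only when both halves of a condition hold.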
865
866
868 You probably don’t need to read this section unless you are just very
869 curious or you are trying to do a very complex history rewrite.
870
871 How filter-repo works
872 Roughly, filter-repo works by running
873
    git fast-export <options> | filter | git fast-import <options>
875
876
877 where filter-repo not only launches the whole pipeline but also serves
878 as the filter in the middle. However, filter-repo does a few additional
879 things on top in order to make it into a well-rounded filtering tool. A
880 sequence that more accurately reflects what filter-repo runs is:
881
882 1. Verify we’re in a fresh clone
883
884 2. git fetch -u . refs/remotes/origin/*:refs/heads/*
885
886 3. git remote rm origin
887
888 4. git fast-export --show-original-ids --reference-excluded-parents
889 --fake-missing-tagger --signed-tags=strip
890 --tag-of-filtered-object=rewrite --use-done-feature --no-data
891 --reencode=yes --mark-tags --all | filter | git -c
892 core.ignorecase=false fast-import --force --quiet
893
894 5. git update-ref --no-deref --stdin, fed with a list of refs to nuke,
895 and a list of replace refs to delete, create, or update.
896
897 6. git reset --hard
898
899 7. git reflog expire --expire=now --all
900
901 8. git gc --prune=now
902
903 Some notes or exceptions on each of the above:
904
1. If we’re not in a fresh clone, users will not be able to recover if
they used the wrong command or ran in the wrong repo. (Though
--force overrides this check, and it’s also off if you’ve already
run filter-repo once in this repo.)
909
2. Technically, we actually use a git update-ref command fed with a
lot of input due to the fact that users can use --force when local
branches might not match remote branches. But this fetch command
captures the intent rather succinctly.
914
915 3. We don’t want users accidentally pushing back to the original repo,
916 as discussed in the section called “DISCUSSION”. It also reminds
917 users that since history has been rewritten, this repo is no longer
918 compatible with the original. Finally, another minor benefit is
919 this allows users to push with the --mirror option to their new
920 home without accidentally sending remote tracking branches.
921
4. Some of these flags are always used but others are actually
conditional. For example, filter-repo’s --replace-text and
--blob-callback options need to work on blobs so --no-data cannot
be passed to fast-export. But when we don’t need to work on blobs,
passing --no-data speeds things up. Other flags (e.g. --dry-run and
--debug) may also change the structure of the pipeline.
928
929 5. We use this step to write replace refs for accessing the newly
930 written commit hashes using their previous names. Also, if refs
931 were renamed by various steps, we need to delete the old refnames
932 in order to avoid mixing old and new history.
933
934 6. Users also have old versions of files in their working tree and
935 index; we want those cleaned up to match the rewritten history as
936 well. Note that this step is skipped in bare repos.
937
938 7. Reflogs will hold on to old history, so we need to expire them.
939
940 8. We need to gc to avoid mixing new and old history. Also, it shrinks
941 the repository for users, so they don’t have to do extra work.
942 (Odds are that they’ve only rewritten trees and commits and maybe a
943 few blobs, so --aggressive isn’t needed and would be too slow.)
944
945 Information about these steps is printed out when --debug is passed to
946 filter-repo. When doing a --partial history rewrite, steps 2, 3, 7, and
947 8 are unconditionally skipped, step 5 is skipped if --replace-refs is
948 update-no-add, and just the nuke-unused-refs portion of step 5 is
949 skipped if --replace-refs is something else.
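The skipping rules in the preceding paragraph can be summarized in a few lines. This is a sketch of the rules as stated here, not of filter-repo's actual control flow: plan_steps is a hypothetical function, and it ignores the other conditions mentioned in the notes (such as --force for step 1 or bare repos for step 6).

```python
def plan_steps(partial, replace_refs):
    """Return {step number: status} following the rules described above."""
    plan = {step: "run" for step in range(1, 9)}
    if partial:
        # Steps 2, 3, 7, and 8 are unconditionally skipped.
        for step in (2, 3, 7, 8):
            plan[step] = "skip"
        if replace_refs == "update-no-add":
            plan[5] = "skip"
        else:
            # Only the nuke-unused-refs portion of step 5 is skipped.
            plan[5] = "replace refs only"
    return plan

full = plan_steps(partial=False, replace_refs="update-no-add")
partial_plan = plan_steps(partial=True, replace_refs="update-no-add")
# "other" is a placeholder for any value other than update-no-add.
partial_other = plan_steps(partial=True, replace_refs="other")
still_run = [step for step, status in partial_plan.items() if status == "run"]
```

With --partial and update-no-add, only steps 1, 4, and 6 remain in still_run.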
950
951 Limitations
952 Inherited limitations
953 Since git filter-repo calls fast-export and fast-import to do a lot
954 of the heavy lifting, it inherits limitations from those systems:
955
956 · extended commit headers, if any, are stripped
957
· commits get rewritten, meaning they will have new hashes;
therefore, signatures on commits and tags cannot continue to
work and instead are just removed (thus signed tags become
annotated tags)
962
· tags of commits are supported. Prior to git-2.24.0, tags of
blobs and tags of tags are not supported (fast-export would die
on such tags). Tags of trees are not supported in any git
version (since fast-export ignores tags of trees with a warning
and fast-import provides no way to import them).
968
969 · annotated and signed tags outside of the refs/tags/ namespace
970 are not supported (their location will be mangled in weird
971 ways)
972
973 · fast-import will die on various forms of invalid input, such as
974 a timezone with more than four digits
975
976 · fast-export cannot reencode commit messages into UTF-8 if the
977 commit message is not valid in its specified encoding (in such
978 cases, it’ll leave the commit message and the encoding header
979 alone).
980
981 · commits without an author will be given one matching the
982 committer
983
984 · tags without a tagger will be given a fake tagger
985
986 · references that include commit cycles in their history (which
987 can be created with git-replace(1)) will not be flagged to the
988 user as an error but will be silently deleted by fast-export as
989 though the branch or tag contained no interesting files
990
991 There are also some limitations due to the design of these systems:
992
993 · Trying to insert additional files into the stream can be
994 tricky; since fast-export only lists file changes in a merge
995 relative to its first parent, if you insert additional files
996 into a commit that is in the second (or third or fourth) parent
997 history of a merge, then you also need to add it to the merge
998 manually. (Similarly, if you change which parent is the first
999 parent in a merge commit, you need to manually update the list
1000 of file changes to be relative to the new first parent.)
1001
1002 · fast-export and fast-import work with exact file contents, not
1003 patches. (e.g. "Whatever the current contents of this file,
1004 update them to now have these contents") Because of this,
1005 removing the changes made in a single commit or inserting
1006 additional changes to a file in some commit and expecting them
1007 to propagate forward is not something that can be done with
1008 these tools. Use git-rebase(1) for that.
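To see why the choice of first parent matters for a merge's file changes, consider the following toy model. It is only an illustration: trees are plain dictionaries mapping paths to made-up blob ids, and changes_relative_to is a hypothetical helper, not something fast-export or filter-repo provides.

```python
def changes_relative_to(parent_tree, merge_tree):
    """List (type, path) changes that turn parent_tree into merge_tree."""
    changes = []
    for path in sorted(set(parent_tree) | set(merge_tree)):
        if path not in merge_tree:
            changes.append((b"D", path))
        elif parent_tree.get(path) != merge_tree[path]:
            changes.append((b"M", path))
    return changes

# Trees are modeled as {path: blob id}; the ids are placeholders.
first_parent = {b"a.txt": b"blob1", b"b.txt": b"blob2"}
second_parent = {b"a.txt": b"blob1", b"c.txt": b"blob3"}
merge = {b"a.txt": b"blob1", b"b.txt": b"blob2", b"c.txt": b"blob3"}

vs_first = changes_relative_to(first_parent, merge)
vs_second = changes_relative_to(second_parent, merge)
```

Relative to the first parent the merge only adds c.txt, but relative to the second parent it only adds b.txt. Swapping the parents without recomputing the list, or inserting a file on the second-parent side without also adding it to the merge, leaves the merge's file changes describing the wrong tree.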
1009
1010 Intrinsic limitations
1011 Some types of filtering have limitations that would affect any tool
1012 attempting to perform them; the most any tool can do is attempt to
1013 notify the user when it detects an issue:
1014
1015 · When rewriting commit hashes in commit messages, there are a
1016 variety of cases when the hash will not be updated (whenever
1017 this happens, a note is written to
1018 .git/filter-repo/suboptimal-issues):
1019
1020 · if a commit hash does not correspond to a commit in the old
1021 repo
1022
1023 · if a commit hash corresponds to a commit that gets pruned
1024
1025 · if an abbreviated hash is not unique
1026
1027 · Pruning of empty commits can cause a merge commit to lose an
1028 entire ancestry line and become a non-merge. If the merge
1029 commit had no changes then it can be pruned too, but if it
1030 still has changes it needs to be kept. This might cause minor
1031 confusion since the commit will likely have a commit message
1032 that makes it sound like a merge commit even though it’s not.
1033 (Whenever a merge commit becomes a non-merge commit, a note is
1034 written to .git/filter-repo/suboptimal-issues)
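The abbreviated-hash cases listed above can be illustrated with a small lookup. This is a sketch under simplifying assumptions: resolve_abbrev is a hypothetical helper, the hashes are made up, and the case of a commit that gets pruned is not modeled.

```python
def resolve_abbrev(abbrev, known_hashes):
    """Return the unique full hash starting with abbrev, else None."""
    matches = [h for h in known_hashes if h.startswith(abbrev)]
    return matches[0] if len(matches) == 1 else None

# Two made-up 40-character commit ids sharing the prefix 1a2b.
old_commits = [b"1a2b3c4d" + b"0" * 32, b"1a2b9999" + b"0" * 32]

unique = resolve_abbrev(b"1a2b3", old_commits)
ambiguous = resolve_abbrev(b"1a2b", old_commits)
missing = resolve_abbrev(b"ffff", old_commits)
```

Only the first lookup yields a hash that could be rewritten; the ambiguous and missing abbreviations correspond to cases in which the text in the commit message would be left alone.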
1035
1036 Issues specific to filter-repo
· Multiple repositories in the wild have been observed which use
a bogus timezone (+051800); a web search will turn up some
reports. The intended timezone wasn’t clear or wasn’t always the
same, so filter-repo replaces it with a different bogus timezone
that fast-import will accept (+0261).
1042
1043 · --path-rename can result in pathname collisions; to avoid
1044 excessive memory requirements of tracking which files are in
1045 all commits or looking up what files exist with either every
1046 commit or every usage of --path-rename, we just tell the user
1047 that they might clobber other changes if they aren’t careful.
1048 We can check if the clobbering comes from another --path-rename
1049 without much overhead. (Perhaps in the future it’s worth adding
1050 a slow mode to --path-rename that will do the more exhaustive
1051 checks?)
1052
1053 · There is no mechanism for directly controlling which flags are
1054 passed to fast-export (or fast-import); only pre-defined flags
1055 can be turned on or off as a side-effect of other options.
1056 Direct control would make little sense because some options
1057 like --full-tree would require additional code in filter-repo
1058 (to parse new directives), and others such as -M or -C would
1059 break assumptions used in other places of filter-repo.
1060
1061 · Partial-repo filtering, while supported, runs counter to
1062 filter-repo’s "avoid mixing old and new history" design. This
1063 support has required improvements to core git as well (e.g. it
1064 depends upon the --reference-excluded-parents option to
1065 fast-export that was added specifically for this usage within
1066 filter-repo). The --partial and --refs options will continue to
be supported since there are people with use cases for them;
1068 however, I am concerned that this inconsistency about mixing
1069 old and new history seems likely to lead to user mistakes. For
1070 now, I just hope that long explanations of caveats in the
1071 documentation of these options suffice to curtail any such
1072 problems.
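For the --path-rename collisions mentioned in the list above, a user who knows the set of paths in a given commit can check for clobbering before running the rewrite. The following is a sketch of such a check, not something filter-repo does: find_collisions and the rename rule are hypothetical, and the paths are made up.

```python
def find_collisions(paths, rename):
    """Map each new path to the old paths landing on it, if more than one."""
    targets = {}
    for old in paths:
        targets.setdefault(rename(old), []).append(old)
    return {new: olds for new, olds in targets.items() if len(olds) > 1}

def rename(path):
    # Hypothetical rule: move everything under tools/ into scripts/.
    if path.startswith(b"tools/"):
        return b"scripts/" + path[len(b"tools/"):]
    return path

collisions = find_collisions(
    [b"tools/build.sh", b"scripts/build.sh", b"README"], rename)
```

Here tools/build.sh and scripts/build.sh both end up at scripts/build.sh, so one would clobber the other. Running such a check for every commit is the kind of exhaustive work that the text above describes as too expensive to do by default.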
1073
1074 Comments on reversibility
Some people are interested in reversibility of a rewrite; e.g.
1076 rewrite history, possibly add some commits, then unrewrite and get
1077 the original history back plus a few new "unrewritten" commits.
1078 Obviously this is impossible if your rewrite involves throwing away
1079 information (e.g. filtering out files or replacing several
1080 different strings with ***REMOVED***), but may be possible with
1081 some rewrites. filter-repo is likely to be a poor fit for this type
1082 of workflow for a few reasons:
1083
1084 · most of the limitations inherited from fast-export and
1085 fast-import are of a type that cause reversibility issues
1086
1087 · grafts and replace refs, if present, are used in the rewrite
1088 and made permanent
1089
1090 · rewriting of commit hashes will probably be reversible, but it
1091 is possible for rewritten abbreviated hashes to not be unique
1092 even if the original abbreviated hashes were.
1093
· filter-repo defaults to several forms of irreversible rewriting
that you may need to turn off (e.g. the last two bullet points
above or reencoding commit messages into UTF-8); it’s possible
that additional forms of irreversible rewrites will be added in
the future.
1099
1100 · I assume that people use filter-repo for one-shot conversions,
1101 not ongoing data transfers. I explicitly reserve the right to
1102 change any API in filter-repo based on this presumption (and a
1103 comment to this effect is found in multiple places in the code
1104 and examples). You have been warned.
1105
1107 git-rebase(1), git-filter-branch(1)
1108
1110 Part of the git(1) suite
1111
1112
1113
1114Git 2.25.0.dirty 01/13/2020 GIT-FILTER-REPO(1)