GIT-FILTER-REPO(1)                Git Manual                GIT-FILTER-REPO(1)

NAME

       git-filter-repo - Rewrite repository history

SYNOPSIS

       git filter-repo --analyze
       git filter-repo [<path_filtering_options>] [<content_filtering_options>]
               [<ref_renaming_options>] [<commit_message_filtering_options>]
               [<name_or_email_filtering_options>] [<parent_rewriting_options>]
               [<generic_callback_options>] [<miscellaneous_options>]

DESCRIPTION

       Rapidly rewrite entire repository history using user-specified filters.
       This is a destructive operation which should not be used lightly; it
       writes new commits, trees, tags, and blobs corresponding to (but
       filtered from) the original objects in the repository, then deletes the
       original history and leaves only the new. See the section called
       “DISCUSSION” for more details on the ramifications of using this tool.
       Several different types of history rewrites are possible; examples
       include (but are not limited to):

       ·   stripping large files (or large directories or large extensions)

       ·   stripping unwanted files by path

       ·   extracting wanted paths and their history (stripping everything
           else)

       ·   restructuring the file layout (such as moving all files into a
           subdirectory in preparation for merging with another repo, making a
           subdirectory become the new toplevel directory, or merging two
           directories with independent filenames into one directory)

       ·   renaming tags (also often in preparation for merging with another
           repo)

       ·   replacing or removing sensitive text such as passwords

       ·   making mailmap rewriting of user names or emails permanent

       ·   making grafts or replacement refs permanent

       ·   rewriting commit messages

       Additionally, several concerns are handled automatically (many of these
       can be overridden, but they are all on by default):

       ·   rewriting (possibly abbreviated) hashes in commit messages to refer
           to the new post-rewrite commit hashes

       ·   pruning commits which become empty due to the above filters (also
           handles edge cases like pruning of merge commits which become
           degenerate and empty)

       ·   creating replace-refs (see git-replace(1)) for old commit hashes,
           which if pushed and fetched will allow users to continue to refer
           to new commits using (unabbreviated) old commit IDs

       ·   stripping of original history to avoid mixing old and new history

       ·   repacking the repository post-rewrite to shrink the repo for the
           user

       Also, it’s worth noting that there is an important safety mechanism:

       ·   abort if run from a repo that is not a fresh clone (to prevent
           accidental data loss from rewriting local history that doesn’t
           exist anywhere else)

       For those who know that there is large unwanted stuff in their history
       and want help finding it, this command also

       ·   provides an option to analyze a repository and generate reports
           that can be useful in determining what to filter (or in determining
           whether a separate filtering command was successful).

       See also the section called “VERSATILITY”, the section called
       “DISCUSSION”, the section called “EXAMPLES”, and the section called
       “INTERNALS”.

OPTIONS

   Analysis Options
       --analyze
           Analyze repository history and create a report that may be useful
           in determining what to filter in a subsequent run (or in
           determining if a previous filtering command did what you wanted).
           Will not modify your repo.

   Filtering based on paths (see also --filename-callback)
       --invert-paths
           Invert the selection of files from the specified
           --path-{match,glob,regex} options below, i.e. only select files
           matching none of those options.

       --path-match <dir_or_file>, --path <dir_or_file>
           Exact paths (files or directories) to include in filtered history.
           Multiple --path options can be specified to get a union of paths.

       --path-glob <glob>
           Glob of paths to include in filtered history. Multiple --path-glob
           options can be specified to get a union of paths.

       --path-regex <regex>
           Regex of paths to include in filtered history. Multiple
           --path-regex options can be specified to get a union of paths.

       --use-base-name
           Match on file base name instead of full path from the top of the
           repo. Incompatible with --path-rename.

   Renaming based on paths (see also --filename-callback)
       --path-rename <old_name:new_name>, --path-rename-match
       <old_name:new_name>
           Path to rename; if filename or directory matches <old_name> rename
           to <new_name>. Multiple --path-rename options can be specified.

   Path shortcuts
       --paths-from-file <filename>
           Specify several path filtering and renaming directives, one per
           line. Lines with ==> in them specify path renames, and lines can
           begin with literal: (the default), glob:, or regex: to specify
           different matching styles.

       --subdirectory-filter <directory>
           Only look at history that touches the given subdirectory and treat
           that directory as the project root. Equivalent to using --path
           <directory>/ --path-rename <directory>/:

       --to-subdirectory-filter <directory>
           Treat the project root as instead being under <directory>.
           Equivalent to using --path-rename :<directory>/

   Content editing filters (see also --blob-callback)
       --replace-text <expressions_file>
           A file with expressions that, if found, will be replaced. By
           default, each expression is treated as literal text, but regex: and
           glob: prefixes are supported. You can end the line with ==> and
           some replacement text to choose a replacement other than the
           default of ***REMOVED***.

       --strip-blobs-bigger-than <size>
           Strip blobs (files) bigger than the specified size (e.g. 5M, 2G,
           etc.).

       --strip-blobs-with-ids <blob_id_filename>
           Read git object ids from each line of the given file, and strip all
           of them from history.

   Renaming of refs (see also --refname-callback)
       --tag-rename <old:new>
           Rename tags starting with <old> to start with <new>. For example,
           --tag-rename foo:bar will rename tag foo-1.2.3 to bar-1.2.3; either
           <old> or <new> can be empty.

   Filtering of commit messages (see also --message-callback)
       --preserve-commit-hashes
           By default, since commits are rewritten and thus gain new hashes,
           references to old commit hashes in commit messages are replaced
           with new commit hashes (abbreviated to the same length as the old
           reference). Use this flag to turn off updating commit hashes in
           commit messages.

       --preserve-commit-encoding
           Do not reencode commit messages into UTF-8. By default, if the
           commit object specifies an encoding for the commit message, the
           message is re-encoded into UTF-8.

   Filtering of names & emails (see also --name-callback and --email-callback)
       --mailmap <filename>
           Use specified mailmap file (see git-shortlog(1) for details on the
           format) when rewriting author, committer, and tagger names and
           emails. If the specified file is part of git history, historical
           versions of the file will be ignored; only the current contents are
           consulted.

       --use-mailmap
           Same as: --mailmap .mailmap

   Parent rewriting
       --replace-refs {delete-no-add, delete-and-add, update-no-add,
       update-or-add, update-and-add}
           Replace refs (see git-replace(1)) are used to rewrite parents
           (unless turned off by the usual git mechanism); this flag specifies
           what to do with those refs afterward. Replace refs can either be
           deleted or updated to point at new commit hashes. Also, new replace
           refs can be added for each commit rewrite. With update-or-add, new
           replace refs are only added for commit rewrites that aren’t used to
           update an existing replace ref. The default is update-and-add if
           $GIT_DIR/filter-repo/already_ran does not exist; update-or-add
           otherwise.

       --prune-empty {always, auto, never}
           Whether to prune empty commits. auto (the default) means only
           prune commits which become empty (not commits which were empty in
           the original repo, unless their parent was pruned). When the parent
           of a commit is pruned, the first non-pruned ancestor becomes the
           new parent.

       --prune-degenerate {always, auto, never}
           Since merge commits are needed for history topology, they are
           typically exempt from pruning. However, they can become degenerate
           with the pruning of other commits (having fewer than two parents,
           having one commit serve as both parents, or having one parent as
           the ancestor of the other). If such merge commits have no file
           changes, they can be pruned. The default (auto) is to only prune
           empty merge commits which become degenerate (not which started as
           such).

   Generic callback code snippets
       --filename-callback <function_body>
           Python code body for processing filenames; see the section called
           “CALLBACKS”.

       --message-callback <function_body>
           Python code body for processing messages (both commit messages and
           tag messages); see the section called “CALLBACKS”.

       --name-callback <function_body>
           Python code body for processing names of people; see the section
           called “CALLBACKS”.

       --email-callback <function_body>
           Python code body for processing email addresses; see the section
           called “CALLBACKS”.

       --refname-callback <function_body>
           Python code body for processing refnames; see the section called
           “CALLBACKS”.

       --blob-callback <function_body>
           Python code body for processing blob objects; see the section
           called “CALLBACKS”.

       --commit-callback <function_body>
           Python code body for processing commit objects; see the section
           called “CALLBACKS”.

       --tag-callback <function_body>
           Python code body for processing tag objects; see the section called
           “CALLBACKS”.

       --reset-callback <function_body>
           Python code body for processing reset objects; see the section
           called “CALLBACKS”.

   Location to filter from/to
           Note
           Specifying alternate source or target locations implies --partial
           except that the normal default for --replace-refs is used. However,
           unlike normal uses of --partial, this doesn’t risk mixing old and
           new history since the old and new histories are in different
           repositories.

       --source <source>
           Git repository to read from

       --target <target>
           Git repository to overwrite with filtered history

   Miscellaneous options
       --help, -h
           Show a help message and exit.

       --force, -f
           Rewrite history even if the current repo does not look like a fresh
           clone.

       --partial
           Do a partial history rewrite, resulting in a mixture of old and
           new history. This implies a default of update-no-add for
           --replace-refs, disables rewriting refs/remotes/origin/* to
           refs/heads/*, disables removing of the origin remote, disables
           removing unexported refs, disables expiring the reflog, and
           disables the automatic post-filter gc. Also, this modifies
           --tag-rename and --refname-callback options such that instead of
           replacing old refs with new refnames, it will instead create new
           refs and keep the old ones around. Use with caution.

       --refs <refs+>
           Limit history rewriting to the specified refs. Implies --partial.
           In addition to the normal caveats of --partial (mixing old and new
           history, no automatic remapping of refs/remotes/origin/* to
           refs/heads/*, etc.), this also may cause problems for pruning of
           degenerate empty merge commits when negative revisions are
           specified.

       --dry-run
           Do not change the repository. Run git fast-export and filter its
           output, and save both the original and the filtered version for
           comparison. This also disables rewriting commit messages due to not
           knowing new commit IDs and disables filtering of some empty commits
           due to inability to query the fast-import backend.

       --debug
           Print additional information about operations being performed and
           commands being run. (If used together with --dry-run, shows extra
           information about what would be run).

       --stdin
           Instead of running git fast-export and filtering its output, filter
           the fast-export stream from stdin. The stdin must be in the
           expected input format (e.g. it needs to include original-oid
           directives).

       --quiet
           Pass --quiet to other git commands called.

VERSATILITY

       filter-repo has a hierarchy of capabilities on the spectrum from easy
       to use convenience flags that perform pre-defined types of filtering,
       to choices that provide lots of flexibility in controlling how
       filtering occurs. This spectrum includes the following:

       ·   Convenience flags making common types of history rewriting simple
           (e.g. --path, --strip-blobs-bigger-than, --replace-text, --mailmap)

       ·   Options which are shorthand for others or which provide greater
           control than others (e.g. --subdirectory-filter could just be
           written using both a path selection (--path) and a path rename
           (--path-rename) filter; --paths-from-file can handle all other
           --path* options and more such as regex renaming of paths)

       ·   Generic python callbacks for handling a certain type of data (the
           filename, message, name, email, and refname callbacks)

       ·   Generic python callbacks for handling fundamental git objects,
           allowing greater control over the combination of data types the
           object holds (the commit, tag, blob, and reset callbacks)

       ·   The ability to import filter-repo as a module in a python program
           and use its classes and functions for even greater control and
           flexibility while still leveraging lots of basic capabilities. One
           can even use this to write new tools with a completely different
           interface.

       For more information about callbacks, see the section called
       “CALLBACKS”. For examples on writing python programs that import
       filter-repo as a module to create new history rewriting tools, look at
       the contrib/filter-repo-demos/ directory. That directory includes,
       among other examples, a reimplementation of git-filter-branch which is
       faster than git-filter-branch, and a reimplementation of BFG Repo
       Cleaner with several bug fixes and new features.

DISCUSSION

       Using filter-repo is relatively simple, but rewriting history is part
       of a larger discussion in terms of collaboration. When you rewrite
       history, the old and new histories are no longer compatible; if you
       push this history somewhere for others to view, it will look as though
       you’ve done a rebase of all branches and tags. Make sure you are
       familiar with the "RECOVERING FROM UPSTREAM REBASE" section of
       git-rebase(1) (and in particular, "The hard case") before proceeding,
       in addition to this section.

       Steps to use git-filter-repo as part of the bigger picture of doing a
       history rewrite are roughly as follows:

        1. Create a clone of your repository (if you created special refs
           outside of refs/heads/ or refs/tags/, make sure to fetch those
           too). Note that --bare and --mirror clones are supported too, if
           you prefer.

        2. (Optional) Run git filter-repo --analyze. This will create a
           directory of reports mentioning renames that have occurred in your
           repo and also listing sizes of objects aggregated by
           path/directory/extension/blob-id; this information may be useful in
           choosing how to filter your repo. It can also be useful to re-run
           --analyze after filtering to verify the changes look correct.

        3. Run filter-repo with your desired filtering options. Many examples
           are given below. For more complex cases, note that doing the
           filtering in multiple steps (by running multiple filter-repo
           invocations in a sequence) is supported. If anything goes wrong
           here, simply delete your clone and restart.

        4. Push your new repository to its new home (note that
           refs/remotes/origin/* will have been moved to refs/heads/* as the
           first part of filter-repo, so you can just deal with normal
           branches instead of remote tracking branches). While you can force
           push this to the same URL you cloned from, there are good reasons
           to consider pushing to a different location instead:

           ·   People who cloned from the original repo will have old history.
               When they fetch the new history you force pushed up, unless
               they do a git reset --hard @{u} on their branches or rebase
               their local work, git will think they have hundreds or
               thousands of commits with commit messages very similar to
               those that exist upstream (but which include files you wanted
               excised from history), and allow the user to merge the two
               histories, resulting in what looks like two copies of each
               commit. If they then push this history back up, then everyone
               now has history with two copies of each commit and the bad
               files have returned. You’re more likely to succeed in forcing
               people to get rid of the old history if they have to clone a
               new URL.

           ·   Rewriting history will rewrite tags; those who have already
               downloaded tags will not get the updated tags by default (see
               the "On Re-tagging" section of git-tag(1)). Every user trying
               to use an existing clone will have to forcibly delete all tags
               and re-fetch them; it may be easier for them to just re-clone,
               which they are more likely to do with a new clone URL.

           ·   Rewriting history may delete some refs (e.g. branches that only
               had files that you wanted excised from history); unless you run
               git push with the --mirror or --prune options, those refs will
               continue to exist on the server. If folks then merge these
               branches into others, then people have started mixing old and
               new history. If users had already cloned these branches,
               removing them from the server isn’t enough; you need all users
               to delete any local branches based on these refs and run fetch
               with the --prune option as well. Simply re-cloning from a new
               URL is easier.

           ·   The server may not allow you to force push over some refs. For
               example, code review systems may have special ref namespaces
               (e.g. refs/changes/, refs/pull/, refs/merge-requests/) that
               they have locked down.

        5. (Optional) Some additional considerations

           ·   filter-repo by default creates replace refs (see
               git-replace(1)) for each rewritten commit ID, allowing you to
               use old (unabbreviated) commit hashes to refer to the newly
               rewritten commits. If you want to use these replace refs, push
               them to the relevant clone URL and tell users to adjust their
               fetch refspec (e.g. git config --add remote.origin.fetch
               +refs/replace/*:refs/replace/*). Sadly, some existing git
               servers (e.g. Gerrit, GitHub) do not yet understand replace
               refs, and thus one can’t use old commit hashes within their UI;
               this may change in the future. But replace refs at least help
               users locally within the git CLI.

           ·   If you have a central repo, you may want to prevent people from
               pushing old commit IDs, in order to avoid mixing old and new
               history. Every repository manager does this differently; some
               provide specialized commands (e.g.
               https://gerrit-review.googlesource.com/Documentation/cmd-ban-commit.html),
               others require you to write hooks.

EXAMPLES

   Path based filtering
       To only keep the README.md file plus the directories guides and
       tools/releases/:

           git filter-repo --path README.md --path guides/ --path tools/releases

       Directory names can be given with or without a trailing slash, and all
       filenames are relative to the toplevel of the repo. To keep all files
       except these paths, just add --invert-paths:

           git filter-repo --path README.md --path guides/ --path tools/releases --invert-paths

       If you want to have both an inclusion filter and an exclusion filter,
       just run filter-repo multiple times. For example, to keep the src/main
       subdirectory but exclude files under src/main named data, run:

           git filter-repo --path src/main/
           git filter-repo --path-glob 'src/*/data' --invert-paths

       Note that the asterisk (*) will match across multiple directories, so
       the second command would remove e.g. src/main/org/whatever/data. Also,
       the second command by itself would also remove e.g.
       src/not-main/foo/data, but since src/not-main/ was removed by the first
       command, that’s not an issue. Also, the use of quotes around the
       asterisk is sometimes important to avoid glob expansion by the shell.

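       The cross-directory behavior of the asterisk can be checked with a
       short Python sketch. It assumes that --path-glob follows the same
       semantics as Python's fnmatch module, in which * also matches slash
       characters; the sample paths are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical sample paths, not taken from any real repository.
paths = [
    "src/main/data",
    "src/main/org/whatever/data",
    "src/not-main/foo/data",
    "src/main/data.txt",
]

# In an fnmatch-style glob, '*' matches any run of characters, including '/'.
removed = [p for p in paths if fnmatchcase(p, "src/*/data")]
print(removed)
```

       Under that assumption the first three paths are selected, including
       the one outside src/main/, while src/main/data.txt is left alone
       because the glob has to match the whole path.
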
       You can also select paths by regular expression (see
       https://docs.python.org/3/library/re.html#regular-expression-syntax).
       For example, to only include files from the repo whose name is in the
       format YYYY-MM-DD.txt and is found at least two subdirectories deep:

           git filter-repo --path-regex '^.*/.*/[0-9]{4}-[0-9]{2}-[0-9]{2}\.txt$'

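       Before running a destructive rewrite, it can help to try a regex
       against a few sample paths. The sketch below does that with Python's
       re module; the candidate paths are made up, the dot before txt is
       escaped so that it only matches a literal period, and the use of
       search() is an assumption about how the pattern is applied (with the
       ^ and $ anchors present, match() gives the same result).

```python
import re

# Pattern from the example, with the dot before "txt" escaped.
pattern = re.compile(r"^.*/.*/[0-9]{4}-[0-9]{2}-[0-9]{2}\.txt$")

# Hypothetical candidate paths.
candidates = [
    "2020-01-31.txt",               # toplevel: no subdirectory at all
    "logs/2020-01-31.txt",          # only one subdirectory deep
    "logs/app/2020-01-31.txt",      # two deep: selected
    "logs/app/2020-01-31.txt.bak",  # wrong suffix
    "a/b/c/1999-12-31.txt",         # deeper than two: selected
]

kept = [p for p in candidates if pattern.search(p)]
print(kept)
```
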
       If you want two directories to be renamed (and maybe merged if both are
       renamed to the same location), use --path-rename; for example, to
       rename both cmds/ and src/scripts/ to tools/:

           git filter-repo --path-rename cmds:tools --path-rename src/scripts/:tools/

       As with --path, directories can be specified with or without a trailing
       slash for --path-rename.

       If you do a --path-rename to something that was already in use, it will
       be silently overwritten. However, if you try to rename multiple files
       to the same location (e.g. src/scripts/run_release.sh and
       cmds/run_release.sh both existed and had different content with the
       renames above), then you will be given an error. If you have such a
       case, you may want to add another rename command to move one of the
       paths somewhere else where it won’t collide:

           git filter-repo --path-rename cmds/run_release.sh:tools/do_release.sh \
                           --path-rename cmds/:tools/ \
                           --path-rename src/scripts/:tools/

       Also, --path-rename brings up ordering issues; all path arguments are
       applied in order. Thus, a command like

           git filter-repo --path-rename sources/:src/main/ --path src/main/

       would make sense but reversing the two arguments would not (src/main/
       is created by the rename so reversing the two would give you an empty
       repo). Also, note that the rename of cmds/run_release.sh a couple of
       examples ago was done before the other renames.

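       The ordering rule can be illustrated with a deliberately simplified
       model. The helper apply_directives below is hypothetical and is not
       filter-repo's implementation: it only handles literal directory
       prefixes and walks the directives in the order given, which is the
       behavior the paragraph above describes.

```python
def apply_directives(path, directives):
    """Apply ("filter", prefix) and ("rename", old, new) directives in order.

    Returns the (possibly renamed) path, or None when filters were given
    but none of them selected the path.
    """
    has_filter = any(d[0] == "filter" for d in directives)
    wanted = False
    for directive in directives:
        if directive[0] == "filter":
            # A filter sees the path as rewritten by any earlier renames.
            if path.startswith(directive[1]):
                wanted = True
        else:
            _, old, new = directive
            if path.startswith(old):
                path = new + path[len(old):]
    return path if (wanted or not has_filter) else None


rename = ("rename", "sources/", "src/main/")
keep = ("filter", "src/main/")

print(apply_directives("sources/app.c", [rename, keep]))  # src/main/app.c
print(apply_directives("sources/app.c", [keep, rename]))  # None
```

       In this model the second ordering drops the file because the filter
       runs while the path still starts with sources/.
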
       If you prefer to filter based solely on basename, use the
       --use-base-name flag (though this is incompatible with --path-rename).
       For example, to only include README.md and Makefile files from any
       directory:

           git filter-repo --use-base-name --path README.md --path Makefile

       If you wanted to delete all .DS_Store files in any directory, you could
       either use:

           git filter-repo --invert-paths --path '.DS_Store' --use-base-name

       or

           git filter-repo --invert-paths --path-glob '*/.DS_Store' --path '.DS_Store'

       (the --path-glob isn’t sufficient by itself as it might miss a toplevel
       .DS_Store file; further, while something like --path-glob '*.DS_Store'
       would work around that problem, it would also grab files named
       foo.DS_Store or bar/baz.DS_Store)

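       The difference between the two globs mentioned in the parenthetical
       can be seen in a small sketch, again assuming fnmatch-style matching.
       The helper name matching and the sample paths are illustrative only.

```python
from fnmatch import fnmatchcase

# Hypothetical paths covering the cases discussed above.
paths = [".DS_Store", "docs/.DS_Store", "foo.DS_Store", "bar/baz.DS_Store"]


def matching(pattern):
    """Return the sample paths selected by an fnmatch-style glob."""
    return [p for p in paths if fnmatchcase(p, pattern)]


print(matching("*/.DS_Store"))  # misses the toplevel .DS_Store
print(matching("*.DS_Store"))   # also grabs foo.DS_Store and bar/baz.DS_Store
```
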
       If you have a long list of files, directories, globs, or regular
       expressions to filter on, you can stick them in a file and use
       --paths-from-file; for example, with a file named stuff-i-want.txt with
       contents of

           README.md
           guides/
           tools/releases
           glob:*.py
           regex:^.*/.*/[0-9]{4}-[0-9]{2}-[0-9]{2}\.txt$
           tools/==>scripts/
           regex:(.*)/([^/]*)/([^/]*)\.text$==>\2/\1/\3.txt

       then you could run

           git filter-repo --paths-from-file stuff-i-want.txt

       to get a repo containing only the toplevel README.md file, the guides/
       and tools/releases/ directories, all python files, files whose name was
       of the form YYYY-MM-DD.txt at least two subdirectories deep, and would
       rename tools/ to scripts/ and rename files like foo/bar/baz/bleh.text
       to baz/foo/bar/bleh.txt. Note the special line prefixes of glob: and
       regex: and the special string ==> denoting renames.

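       As a rough illustration of the file format, the sketch below splits a
       directive line into its matching style, pattern, and optional rename
       target, and then applies the regex rename from the example with
       Python's re.sub. The parse_line helper is hypothetical and much
       simpler than whatever parsing filter-repo actually performs.

```python
import re


def parse_line(line):
    """Split a directive into (style, pattern, rename_target_or_None)."""
    source, arrow, target = line.partition("==>")
    style = "literal"  # the documented default
    for prefix in ("literal:", "glob:", "regex:"):
        if source.startswith(prefix):
            style, source = prefix[:-1], source[len(prefix):]
            break
    return style, source, (target if arrow else None)


print(parse_line("tools/==>scripts/"))
print(parse_line("glob:*.py"))

# Apply the regex rename from the example file to the path named in the text.
style, pattern, target = parse_line(
    r"regex:(.*)/([^/]*)/([^/]*)\.text$==>\2/\1/\3.txt"
)
renamed = re.sub(pattern, target, "foo/bar/baz/bleh.text")
print(renamed)  # baz/foo/bar/bleh.txt
```
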
       Finally, see also the --filename-callback from the section called
       “CALLBACKS”.

   Content based filtering
       If you want to filter out all files bigger than a certain size, you can
       use --strip-blobs-bigger-than with some size (K, M, and G suffixes are
       recognized), e.g.:

           git filter-repo --strip-blobs-bigger-than 10M

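       To make the size suffixes concrete, here is a minimal sketch of how
       such a value could be converted to a byte count. It assumes that K,
       M, and G are binary multiples (1024, 1024^2, 1024^3); the text above
       does not state which multiples are used, so treat both that choice
       and the parse_size helper as assumptions.

```python
def parse_size(text):
    """Convert a size such as '10M' to bytes (assumes binary multiples)."""
    multipliers = {"K": 1024, "M": 1024**2, "G": 1024**3}
    suffix = text[-1].upper()
    if suffix in multipliers:
        return int(text[:-1]) * multipliers[suffix]
    return int(text)  # no suffix: plain byte count


threshold = parse_size("10M")
print(threshold)  # 10485760 under the binary-multiple assumption
```
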
       If you want to strip out all files with specified git object ids
       (hashes), list the hashes in a file and run

           git filter-repo --strip-blobs-with-ids FILE_WITH_GIT_BLOB_IDS

       If you want to modify file contents, you can do so based on a list of
       expressions in a file, one per line. For example, with a file named
       expressions.txt containing

           p455w0rd
           foo==>bar
           glob:*666*==>
           regex:\bdriver\b==>pilot
           literal:MM/DD/YYYY==>YYYY-MM-DD
           regex:([0-9]{2})/([0-9]{2})/([0-9]{4})==>\3-\1-\2

       then running

           git filter-repo --replace-text expressions.txt

       will go through and replace p455w0rd with ***REMOVED***, foo with bar,
       any line containing 666 with a blank line, the word driver with pilot
       (but not if it has letters before or after; e.g. drivers will be
       unmodified), replace the exact text MM/DD/YYYY with YYYY-MM-DD and
       replace date strings of the form MM/DD/YYYY with ones of the form
       YYYY-MM-DD. In the expressions file, there are a few things to note:

       ·   Every line has a replacement, given by whatever is on the right of
           ==>. If ==> does not appear on the line, the default replacement is
           ***REMOVED***.

       ·   Lines can start with literal:, glob:, or regex: to specify whether
           to do literal string matches, globs (see
           https://docs.python.org/3/library/fnmatch.html), or regular
           expressions (see
           https://docs.python.org/3/library/re.html#regular-expression-syntax).
           If none of these are specified, literal: is assumed.

       ·   globs and regexes are applied to each line of the file; it is not
           possible with --replace-text to match a multi-line string.

       ·   If multiple matches are found on a line, all are replaced.

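       The rules in the list above can be approximated in a few lines of
       Python. This is only a sketch under simplifying assumptions: it
       handles the literal: and regex: styles but not glob:, and it works on
       a str rather than on raw blob contents. The names compile_expression
       and replace_text are hypothetical.

```python
import re

DEFAULT_REPLACEMENT = "***REMOVED***"


def compile_expression(line):
    """Turn one expressions-file line into (compiled_regex, replacement)."""
    source, arrow, replacement = line.partition("==>")
    if not arrow:
        replacement = DEFAULT_REPLACEMENT
    if source.startswith("regex:"):
        # Regex replacements may use backreferences such as \1.
        return re.compile(source[len("regex:"):]), replacement
    if source.startswith("literal:"):
        source = source[len("literal:"):]
    # Literal text: escape the pattern and insert the replacement verbatim.
    return re.compile(re.escape(source)), (lambda _m, r=replacement: r)


def replace_text(text, lines):
    """Apply every expression in order; all matches are replaced."""
    for regex, replacement in map(compile_expression, lines):
        text = regex.sub(replacement, text)
    return text


expressions = [
    "p455w0rd",
    "foo==>bar",
    r"regex:\bdriver\b==>pilot",
    r"regex:([0-9]{2})/([0-9]{2})/([0-9]{4})==>\3-\1-\2",
]
result = replace_text("p455w0rd foo driver drivers 12/31/2019", expressions)
print(result)  # ***REMOVED*** bar pilot drivers 2019-12-31
```
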
       See also the --blob-callback from the section called “CALLBACKS”.

   Refname based filtering
       To rename tags, use --tag-rename, e.g.:

           git filter-repo --tag-rename foo:bar

       This will rename any tags starting with foo to now start with bar.
       Either side of the colon could be blank, e.g.

           git filter-repo --tag-rename '':'my-module-'

       For more general refname modification, see --refname-callback from the
       section called “CALLBACKS”.

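       The prefix semantics of --tag-rename can be mimicked as follows. The
       rename_tag helper is a hypothetical stand-in written for this
       illustration; it reflects only what the text above says (tags
       starting with <old> are changed to start with <new>, and either side
       may be empty).

```python
def rename_tag(tag, old, new):
    """Replace a leading `old` prefix with `new`; leave other tags alone."""
    if tag.startswith(old):
        return new + tag[len(old):]
    return tag


print(rename_tag("foo-1.2.3", "foo", "bar"))  # bar-1.2.3
print(rename_tag("v1.0", "", "my-module-"))   # my-module-v1.0
print(rename_tag("v1.0", "foo", "bar"))       # v1.0 (prefix does not match)
```

       An empty <old> matches every tag, which is why the second call simply
       prepends the new prefix.
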
638   User and email based filtering
639       To modify username and emails of commits, you can create a mailmap file
640       in the format accepted by git-shortlog(1). For example, if you have a
641       file named my-mailmap you can run
642
643           git filter-repo --mailmap my-mailmap
644
645
646       and if the current contents of that file are as follows (if the
647       specified mailmap file is version controlled, historical versions of
648       the file are ignored):
649
650           Name For User <email@addre.ss>
651           <new@ema.il> <old1@ema.il>
652           New Name And <new@ema.il> <old2@ema.il>
653           New Name And <new@ema.il> Old Name And <old3@ema.il>
654
655
       then filter-repo updates the user names and/or emails in history
       according to the specified mapping.
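
       How the four line forms in the sample file map an old identity to a
       new one can be sketched as follows. This is a simplified
       illustration, not the parser used by git or filter-repo:
       parse_mailmap and apply_mailmap are hypothetical names, the first
       matching line wins, and only emails are compared
       case-insensitively.

```python
import re

# [new name] <email> [[old name] <old email>]
MAILMAP_LINE = re.compile(
    rb"^\s*([^<#]*?)\s*<([^>]*)>\s*(?:([^<]*?)\s*<([^>]*)>)?\s*$")


def parse_mailmap(text):
    """Return a list of (old_name, old_email, new_name, new_email) rules."""
    rules = []
    for line in text.splitlines():
        match = MAILMAP_LINE.match(line)
        if not match:
            continue  # blank line, comment, or unrecognized form
        new_name, email1, old_name, email2 = match.groups()
        if email2 is None:
            # "Name <email>": only the name changes, keyed by commit email.
            rules.append((None, email1, new_name or None, None))
        else:
            rules.append((old_name or None, email2, new_name or None, email1))
    return rules


def apply_mailmap(name, email, rules):
    """Map one (name, email) pair; None in a rule means 'leave unchanged'."""
    for old_name, old_email, new_name, new_email in rules:
        if email.lower() == old_email.lower() and old_name in (None, name):
            return (new_name or name, new_email or email)
    return name, email
```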
658
659       See also the --name-callback and --email-callback from the section
660       called “CALLBACKS”.
661
662   Parent rewriting
663       To replace $commit_A with $commit_B (e.g. make all commits which had
664       $commit_A as a parent instead have $commit_B for that parent), and
665       rewrite history to make it permanent:
666
667           git replace $commit_A $commit_B
668           git filter-repo --force
669
670
671       To create a new commit with the same contents as $commit_A except with
672       different parent(s) and then replace $commit_A with the new commit, and
673       rewrite history to make it permanent:
674
675           git replace --graft $commit_A $new_parent_or_parents
676           git filter-repo --force
677
678
679       The reason to specify --force is two-fold: filter-repo will error out
680       if no arguments are specified, and the new graft commit would otherwise
681       trigger the not-a-fresh-clone check.
682
683   Partial history rewrites
684       To rewrite the history on just one branch (which may cause it to no
685       longer share any common history with other branches), use --refs. For
686       example, to remove a file named extraneous.txt from the master branch:
687
688           git filter-repo --invert-paths --path extraneous.txt --refs master
689
690
691       To rewrite just some recent commits:
692
693           git filter-repo --invert-paths --path extraneous.txt --refs master~3..master
694
695

CALLBACKS

697       For flexibility, filter-repo allows you to specify functions on the
698       command line to further filter all changes. Please note that there are
699       some API compatibility caveats associated with these callbacks that you
700       should be aware of before using them; see the "API BACKWARD
701       COMPATIBILITY CAVEAT" comment near the top of git-filter-repo source
702       code.
703
704       All callback functions are of the same general format. For a command
705       line argument like
706
707           --foo-callback 'BODY'
708
709
710       the following code will be compiled and called:
711
712           def foo_callback(foo):
713             BODY
714
715
716       Thus, you just need to make sure your BODY modifies and returns foo
717       appropriately. One important thing to note for all callbacks is that
718       filter-repo uses bytestrings (see
719       https://docs.python.org/3/library/stdtypes.html#bytes) everywhere
720       instead of strings.
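
       One way such a BODY string could be turned into a callable is
       sketched below. It is a guess at the general mechanism, not a copy
       of filter-repo's code; make_callback is a hypothetical name. The
       sketch puts re and os in the function's namespace because the
       examples in this section use both modules without importing them.

```python
import os
import re


def make_callback(kind, body):
    """Wrap BODY in 'def <kind>_callback(<kind>):' and return the function."""
    source = "def {0}_callback({0}):\n".format(kind)
    # Indent every line of BODY so it becomes the function body.
    source += "\n".join("  " + line for line in body.splitlines())
    namespace = {"re": re, "os": os}  # assumption: modules offered to BODY
    exec(source, namespace)
    return namespace[kind + "_callback"]


# Bytestrings in, bytestrings out.
fix_name = make_callback("name", 'return name.replace(b"Wiliam", b"William")')
```

       Trying a BODY this way on a few sample bytestrings is a cheap check
       before running it against real history.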
721
722       There are four callbacks that allow you to operate directly on raw
723       objects that contain data that’s easy to write in fast-import(1)
724       format:
725
726           --blob-callback
727           --commit-callback
728           --tag-callback
729           --reset-callback
730
731
732       We’ll come back to these later because it is often the case that the
733       other callbacks are more convenient. The other callbacks operate on a
734       small piece of the raw objects or operate on pieces across multiple
735       types of raw object (e.g. author names and committer names and tagger
736       names across commits and tags, or refnames across commits, tags, and
737       resets, or messages across commits and tags). The convenience callbacks
738       are:
739
740           --filename-callback
741           --message-callback
742           --name-callback
743           --email-callback
744           --refname-callback
745
746
747       in each you are expected to simply return a new value based on the one
748       passed in. For example,
749
750           git-filter-repo --name-callback 'return name.replace(b"Wiliam", b"William")'
751
752
753       would result in the following function being called:
754
755           def name_callback(name):
756             return name.replace(b"Wiliam", b"William")
757
758
759       The email callback is quite similar:
760
761           git-filter-repo --email-callback 'return email.replace(b".cm", b".com")'
762
763
764       The refname callback is also similar, but note that the refname passed
765       in and returned are expected to be fully qualified (e.g.
766       b"refs/heads/master" instead of just b"master" and b"refs/tags/v1.0.7"
767       instead of b"1.0.7"):
768
           git-filter-repo --refname-callback '
             # Change e.g. refs/heads/master to refs/heads/prefix-master
             rdir,rpath = os.path.split(refname)
             return rdir + b"/prefix-" + rpath'
773
774
775       The message callback is quite similar to the previous three callbacks,
776       though it operates on a bytestring that is likely more than one line:
777
           git-filter-repo --message-callback '
             if b"Signed-off-by:" not in message:
               message += b"\nSigned-off-by: Me My <self@and.eye>"
             return re.sub(b"[Ee]-?[Mm][Aa][Ii][Ll]", b"email", message)'
782
783
784       The filename callback is slightly more interesting. Returning None
785       means the file should be removed from all commits, returning the
786       filename unmodified marks the file to be kept, and returning a
787       different name means the file should be renamed. An example:
788
           git-filter-repo --filename-callback '
             if b"/src/" in filename:
               # Remove all files with a directory named "src" in their path
               # (except when "src" appears at the toplevel).
               return None
             elif filename.startswith(b"tools/"):
               # Rename tools/ -> scripts/misc/
               return b"scripts/misc/" + filename[6:]
             else:
               # Keep the filename and do not rename it
               return filename
             '
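
       Since a callback body is ordinary Python, its logic can be tried out
       as a standalone function before any history is rewritten. The sketch
       below wraps the body from the example above by hand; the sample
       paths are made up for illustration.

```python
def filename_callback(filename):
    if b"/src/" in filename:
        # Removed: a "src" directory below the toplevel
        return None
    elif filename.startswith(b"tools/"):
        # Renamed: tools/ -> scripts/misc/
        return b"scripts/misc/" + filename[6:]
    else:
        # Kept unchanged
        return filename


# Hypothetical sample paths mapped to the callback's decision
decisions = {path: filename_callback(path)
             for path in (b"lib/src/a.c", b"src/a.c", b"tools/x.sh")}
```

       Note that src/a.c is kept: the test looks for b"/src/", which does
       not match a toplevel src directory.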
801
802
803       In contrast, the blob, reset, tag, and commit callbacks are not
804       expected to return a value, but are instead expected to modify the
805       object passed in. Major fields for these objects are (subject to API
806       backward compatibility caveats mentioned previously):
807
808       ·   Blob: original_id (original hash) and data
809
810       ·   Reset: ref (name of reference) and from_ref (hash or integer mark)
811
812       ·   Tag: ref, from_ref, original_id, tagger_name, tagger_email,
813           tagger_date, message
814
       ·   Commit: branch, original_id, author_name, author_email,
           author_date, committer_name, committer_email, committer_date,
           message, file_changes (list of FileChange objects, each containing
           a type, filename, mode, and blob_id), parents (list of hashes or
           integer marks)
820
821       An example of each:
822
           git filter-repo --blob-callback '
             if len(blob.data) > 25:
               # Mark this blob for removal from all commits
               blob.skip()
             else:
               blob.data = blob.data.replace(b"Hello", b"Goodbye")
             '
830
831
832
833           git filter-repo --reset-callback 'reset.ref = reset.ref.replace(b"master", b"dev")'
834
835
836
           git filter-repo --tag-callback '
             if tag.tagger_name == b"Jim Williams":
               # Omit this tag
               tag.skip()
             else:
               tag.message = tag.message + b"\n\nTag of %s by %s on %s" % (tag.ref, tag.tagger_email, tag.tagger_date)'
843
844
845
           git filter-repo --commit-callback '
             # Remove executable files with three 6s in their name (including
             # from leading directories).
             # Also, undo deletion of sources/foo/bar.txt (change types are
             # either b"D" (deletion) or b"M" (add or modify); renames are
             # handled by deleting the old file and adding a new one)
             commit.file_changes = [
                    change for change in commit.file_changes
                    if not (change.mode == b"100755" and
                            change.filename.count(b"6") == 3) and
                       not (change.type == b"D" and
                            change.filename == b"sources/foo/bar.txt")]
             # Mark all .sh files as executable; modes in git are always one of
             # 100644 (normal file), 100755 (executable), 120000 (symlink), or
             # 160000 (submodule)
             for change in commit.file_changes:
               if change.filename.endswith(b".sh"):
                 change.mode = b"100755"
             '
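
       The commit callback above can be exercised without a repository by
       feeding it stand-in objects. In the sketch below, fake_change and
       the SimpleNamespace commit are hypothetical stand-ins that carry
       only the fields named earlier (type, filename, mode, blob_id); they
       are not filter-repo's real Commit or FileChange classes.

```python
from types import SimpleNamespace


def commit_callback(commit):
    # Same logic as the --commit-callback body shown above
    commit.file_changes = [
        change for change in commit.file_changes
        if not (change.mode == b"100755" and
                change.filename.count(b"6") == 3) and
           not (change.type == b"D" and
                change.filename == b"sources/foo/bar.txt")]
    for change in commit.file_changes:
        if change.filename.endswith(b".sh"):
            change.mode = b"100755"


def fake_change(type_, filename, mode):
    """Stand-in for a FileChange object (illustration only)."""
    return SimpleNamespace(type=type_, filename=filename, mode=mode,
                           blob_id=None)


commit = SimpleNamespace(file_changes=[
    fake_change(b"M", b"bin/666run", b"100755"),       # dropped: executable, three 6s
    fake_change(b"D", b"sources/foo/bar.txt", None),   # dropped: deletion undone
    fake_change(b"M", b"build.sh", b"100644"),         # kept, made executable
])
commit_callback(commit)
remaining = [(c.filename, c.mode) for c in commit.file_changes]
```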
865
866

INTERNALS

868       You probably don’t need to read this section unless you are just very
869       curious or you are trying to do a very complex history rewrite.
870
871   How filter-repo works
872       Roughly, filter-repo works by running
873
874           git fast-export <options> | filter | git fast-import <options>
875
876
877       where filter-repo not only launches the whole pipeline but also serves
878       as the filter in the middle. However, filter-repo does a few additional
879       things on top in order to make it into a well-rounded filtering tool. A
880       sequence that more accurately reflects what filter-repo runs is:
881
882        1. Verify we’re in a fresh clone
883
884        2. git fetch -u . refs/remotes/origin/*:refs/heads/*
885
886        3. git remote rm origin
887
888        4. git fast-export --show-original-ids --reference-excluded-parents
889           --fake-missing-tagger --signed-tags=strip
890           --tag-of-filtered-object=rewrite --use-done-feature --no-data
891           --reencode=yes --mark-tags --all | filter | git -c
892           core.ignorecase=false fast-import --force --quiet
893
894        5. git update-ref --no-deref --stdin, fed with a list of refs to nuke,
895           and a list of replace refs to delete, create, or update.
896
897        6. git reset --hard
898
899        7. git reflog expire --expire=now --all
900
901        8. git gc --prune=now
902
903       Some notes or exceptions on each of the above:
904
        1. If we’re not in a fresh clone, users will not be able to recover if
           they used the wrong command or ran in the wrong repo. (Though
           --force overrides this check, and it’s also off if you’ve already
           run filter-repo once in this repo.)
909
910        2. Technically, we actually use a git update-ref command fed with a
911           lot of input due to the fact that users can use --force when local
912           branches might not match remote branches. But this fetch command
913           catches the intent rather succinctly.
914
915        3. We don’t want users accidentally pushing back to the original repo,
916           as discussed in the section called “DISCUSSION”. It also reminds
917           users that since history has been rewritten, this repo is no longer
918           compatible with the original. Finally, another minor benefit is
919           this allows users to push with the --mirror option to their new
920           home without accidentally sending remote tracking branches.
921
922        4. Some of these flags are always used but others are actually
923           conditional. For example, filter-repo’s --replace-text and
924           --blob-callback options need to work on blobs so --no-data cannot
925           be passed to fast-export. But when we don’t need to work on blobs,
           passing --no-data speeds things up. Also, other flags may change
           the structure of the pipeline as well (e.g. --dry-run and --debug).
928
929        5. We use this step to write replace refs for accessing the newly
930           written commit hashes using their previous names. Also, if refs
931           were renamed by various steps, we need to delete the old refnames
932           in order to avoid mixing old and new history.
933
934        6. Users also have old versions of files in their working tree and
935           index; we want those cleaned up to match the rewritten history as
936           well. Note that this step is skipped in bare repos.
937
938        7. Reflogs will hold on to old history, so we need to expire them.
939
940        8. We need to gc to avoid mixing new and old history. Also, it shrinks
941           the repository for users, so they don’t have to do extra work.
942           (Odds are that they’ve only rewritten trees and commits and maybe a
943           few blobs, so --aggressive isn’t needed and would be too slow.)
944
945       Information about these steps is printed out when --debug is passed to
946       filter-repo. When doing a --partial history rewrite, steps 2, 3, 7, and
947       8 are unconditionally skipped, step 5 is skipped if --replace-refs is
948       update-no-add, and just the nuke-unused-refs portion of step 5 is
949       skipped if --replace-refs is something else.
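
       The core of step 4 can be sketched with Python's subprocess module.
       This is a rough illustration, not filter-repo's own code:
       fast_export_cmd and run_pipeline are hypothetical names, the flags
       are copied from the step list above, and the line-by-line filter is
       only adequate for passing the stream through unchanged (a real
       filter has to parse the fast-export stream, including binary data
       blocks). Calling run_pipeline rewrites the current repository, so
       try it only in a throwaway clone.

```python
import subprocess

FAST_IMPORT_CMD = ["git", "-c", "core.ignorecase=false",
                   "fast-import", "--force", "--quiet"]


def fast_export_cmd(need_blobs):
    """Build the fast-export command; --no-data is dropped when blobs are needed."""
    cmd = ["git", "fast-export", "--show-original-ids",
           "--reference-excluded-parents", "--fake-missing-tagger",
           "--signed-tags=strip", "--tag-of-filtered-object=rewrite",
           "--use-done-feature", "--reencode=yes", "--mark-tags", "--all"]
    if not need_blobs:
        cmd.insert(-1, "--no-data")  # keep --all as the last argument
    return cmd


def run_pipeline(stream_filter, need_blobs=False):
    """Run fast-export | stream_filter | fast-import in the current repo."""
    exporter = subprocess.Popen(fast_export_cmd(need_blobs),
                                stdout=subprocess.PIPE)
    importer = subprocess.Popen(FAST_IMPORT_CMD, stdin=subprocess.PIPE)
    for line in exporter.stdout:
        importer.stdin.write(stream_filter(line))
    importer.stdin.close()
    return exporter.wait(), importer.wait()
```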
950
951   Limitations
952       Inherited limitations
953           Since git filter-repo calls fast-export and fast-import to do a lot
954           of the heavy lifting, it inherits limitations from those systems:
955
956           ·   extended commit headers, if any, are stripped
957
958           ·   commits get rewritten meaning they will have new hashes;
959               therefore, signatures on commits and tags cannot continue to
960               work and instead are just removed (thus signed tags become
961               annotated tags)
962
           ·   Tags of commits are supported. Prior to git-2.24.0, tags of
               blobs and tags of tags were not supported (fast-export would
               die on such tags). Tags of trees are not supported in any git
               version (since fast-export ignores tags of trees with a warning
               and fast-import provides no way to import them).
968
969           ·   annotated and signed tags outside of the refs/tags/ namespace
970               are not supported (their location will be mangled in weird
971               ways)
972
973           ·   fast-import will die on various forms of invalid input, such as
974               a timezone with more than four digits
975
976           ·   fast-export cannot reencode commit messages into UTF-8 if the
977               commit message is not valid in its specified encoding (in such
978               cases, it’ll leave the commit message and the encoding header
979               alone).
980
981           ·   commits without an author will be given one matching the
982               committer
983
984           ·   tags without a tagger will be given a fake tagger
985
986           ·   references that include commit cycles in their history (which
987               can be created with git-replace(1)) will not be flagged to the
988               user as an error but will be silently deleted by fast-export as
989               though the branch or tag contained no interesting files
990
991           There are also some limitations due to the design of these systems:
992
993           ·   Trying to insert additional files into the stream can be
994               tricky; since fast-export only lists file changes in a merge
995               relative to its first parent, if you insert additional files
996               into a commit that is in the second (or third or fourth) parent
997               history of a merge, then you also need to add it to the merge
998               manually. (Similarly, if you change which parent is the first
999               parent in a merge commit, you need to manually update the list
1000               of file changes to be relative to the new first parent.)
1001
1002           ·   fast-export and fast-import work with exact file contents, not
1003               patches. (e.g. "Whatever the current contents of this file,
1004               update them to now have these contents") Because of this,
1005               removing the changes made in a single commit or inserting
1006               additional changes to a file in some commit and expecting them
1007               to propagate forward is not something that can be done with
1008               these tools. Use git-rebase(1) for that.
1009
1010       Intrinsic limitations
1011           Some types of filtering have limitations that would affect any tool
1012           attempting to perform them; the most any tool can do is attempt to
1013           notify the user when it detects an issue:
1014
           ·   When rewriting commit hashes in commit messages, there are a
               variety of cases when the hash will not be updated (whenever
               this happens, a note is written to
               .git/filter-repo/suboptimal-issues):

               ·   if a commit hash does not correspond to a commit in the old
                   repo

               ·   if a commit hash corresponds to a commit that gets pruned

               ·   if an abbreviated hash is not unique
1026
1027           ·   Pruning of empty commits can cause a merge commit to lose an
1028               entire ancestry line and become a non-merge. If the merge
1029               commit had no changes then it can be pruned too, but if it
1030               still has changes it needs to be kept. This might cause minor
1031               confusion since the commit will likely have a commit message
1032               that makes it sound like a merge commit even though it’s not.
1033               (Whenever a merge commit becomes a non-merge commit, a note is
1034               written to .git/filter-repo/suboptimal-issues)
1035
1036       Issues specific to filter-repo
           ·   Multiple repositories in the wild have been observed which use
               a bogus timezone (+051800); a web search will turn up some
               reports. The intended timezone wasn’t clear or wasn’t always
               the same, so filter-repo replaces it with a different bogus
               timezone that fast-import will accept (+0261).
1042
1043           ·   --path-rename can result in pathname collisions; to avoid
1044               excessive memory requirements of tracking which files are in
1045               all commits or looking up what files exist with either every
1046               commit or every usage of --path-rename, we just tell the user
1047               that they might clobber other changes if they aren’t careful.
1048               We can check if the clobbering comes from another --path-rename
1049               without much overhead. (Perhaps in the future it’s worth adding
1050               a slow mode to --path-rename that will do the more exhaustive
1051               checks?)
1052
1053           ·   There is no mechanism for directly controlling which flags are
1054               passed to fast-export (or fast-import); only pre-defined flags
1055               can be turned on or off as a side-effect of other options.
1056               Direct control would make little sense because some options
1057               like --full-tree would require additional code in filter-repo
1058               (to parse new directives), and others such as -M or -C would
1059               break assumptions used in other places of filter-repo.
1060
1061           ·   Partial-repo filtering, while supported, runs counter to
1062               filter-repo’s "avoid mixing old and new history" design. This
1063               support has required improvements to core git as well (e.g. it
1064               depends upon the --reference-excluded-parents option to
1065               fast-export that was added specifically for this usage within
1066               filter-repo). The --partial and --refs options will continue to
               be supported since there are people with use cases for them;
1068               however, I am concerned that this inconsistency about mixing
1069               old and new history seems likely to lead to user mistakes. For
1070               now, I just hope that long explanations of caveats in the
1071               documentation of these options suffice to curtail any such
1072               problems.
1073
1074       Comments on reversibility
           Some people are interested in reversibility of a rewrite; e.g.
1076           rewrite history, possibly add some commits, then unrewrite and get
1077           the original history back plus a few new "unrewritten" commits.
1078           Obviously this is impossible if your rewrite involves throwing away
1079           information (e.g. filtering out files or replacing several
1080           different strings with ***REMOVED***), but may be possible with
1081           some rewrites. filter-repo is likely to be a poor fit for this type
1082           of workflow for a few reasons:
1083
1084           ·   most of the limitations inherited from fast-export and
1085               fast-import are of a type that cause reversibility issues
1086
1087           ·   grafts and replace refs, if present, are used in the rewrite
1088               and made permanent
1089
1090           ·   rewriting of commit hashes will probably be reversible, but it
1091               is possible for rewritten abbreviated hashes to not be unique
1092               even if the original abbreviated hashes were.
1093
           ·   filter-repo defaults to several forms of irreversible rewriting
               that you may need to turn off (e.g. the last two bullet points
               above or reencoding commit messages into UTF-8); it’s possible
               that additional forms of irreversible rewrites will be added in
               the future.
1099
1100           ·   I assume that people use filter-repo for one-shot conversions,
1101               not ongoing data transfers. I explicitly reserve the right to
1102               change any API in filter-repo based on this presumption (and a
1103               comment to this effect is found in multiple places in the code
1104               and examples). You have been warned.
1105

SEE ALSO

1107       git-rebase(1), git-filter-branch(1)
1108

GIT

1110       Part of the git(1) suite
1111
1112
1113
Git 2.25.0.dirty                  01/13/2020                GIT-FILTER-REPO(1)