GIT-FILTER-REPO(1)                Git Manual                GIT-FILTER-REPO(1)

NAME

       git-filter-repo - Rewrite repository history

SYNOPSIS

       git filter-repo --analyze
       git filter-repo [<path_filtering_options>] [<content_filtering_options>]
               [<ref_renaming_options>] [<commit_message_filtering_options>]
               [<name_or_email_filtering_options>] [<parent_rewriting_options>]
               [<generic_callback_options>] [<miscellaneous_options>]

DESCRIPTION

       Rapidly rewrite entire repository history using user-specified filters.
       This is a destructive operation which should not be used lightly; it
       writes new commits, trees, tags, and blobs corresponding to (but
       filtered from) the original objects in the repository, then deletes the
       original history and leaves only the new. See the section called
       “DISCUSSION” for more details on the ramifications of using this tool.
       Several different types of history rewrites are possible; examples
       include (but are not limited to):

       •   stripping large files (or large directories or large extensions)

       •   stripping unwanted files by path

       •   extracting wanted paths and their history (stripping everything
           else)

       •   restructuring the file layout (such as moving all files into a
           subdirectory in preparation for merging with another repo, making a
           subdirectory become the new toplevel directory, or merging two
           directories with independent filenames into one directory)

       •   renaming tags (also often in preparation for merging with another
           repo)

       •   replacing or removing sensitive text such as passwords

       •   making mailmap rewriting of user names or emails permanent

       •   making grafts or replacement refs permanent

       •   rewriting commit messages

       Additionally, several concerns are handled automatically (many of these
       can be overridden, but they are all on by default):

       •   rewriting (possibly abbreviated) hashes in commit messages to refer
           to the new post-rewrite commit hashes

       •   pruning commits which become empty due to the above filters (also
           handles edge cases like pruning of merge commits which become
           degenerate and empty)

       •   creating replace-refs (see git-replace(1)) for old commit hashes,
           which if manually pushed and fetched will allow users to continue
           to refer to new commits using (unabbreviated) old commit IDs

       •   stripping of original history to avoid mixing old and new history

       •   repacking the repository post-rewrite to shrink the repo for the
           user

       Also, it’s worth noting that there is an important safety mechanism:

       •   abort if run from a repo that is not a fresh clone (to prevent
           accidental data loss from rewriting local history that doesn’t
           exist anywhere else). See the section called “FRESH CLONE SAFETY
           CHECK AND --FORCE”.

       For those who know that there is large unwanted stuff in their history
       and want help finding it, this command also

       •   provides an option to analyze a repository and generate reports
           that can be useful in determining what to filter (or in determining
           whether a separate filtering command was successful).

       See also the section called “VERSATILITY”, the section called
       “DISCUSSION”, the section called “EXAMPLES”, and the section called
       “INTERNALS”.

OPTIONS

   Analysis Options
       --analyze
           Analyze repository history and create a report that may be useful
           in determining what to filter in a subsequent run (or in
           determining if a previous filtering command did what you wanted).
           Will not modify your repo.

   Filtering based on paths (see also --filename-callback)
       These options specify the paths to select. Note that much like git
       itself, renames are NOT followed so you may need to specify multiple
       paths, e.g. --path olddir/ --path newdir/

       --invert-paths
           Invert the selection of files from the specified
           --path-{match,glob,regex} options below, i.e. only select files
           matching none of those options.

       --path-match <dir_or_file>, --path <dir_or_file>
           Exact paths (files or directories) to include in filtered history.
           Multiple --path options can be specified to get a union of paths.

       --path-glob <glob>
           Glob of paths to include in filtered history. Multiple --path-glob
           options can be specified to get a union of paths.

       --path-regex <regex>
           Regex of paths to include in filtered history. Multiple
           --path-regex options can be specified to get a union of paths.

       --use-base-name
           Match on file base name instead of full path from the top of the
           repo. Incompatible with --path-rename, and incompatible with
           matching against directory names.

   Renaming based on paths (see also --filename-callback)
       Note: if you combine path filtering with path renaming, be aware that a
       rename directive does not select paths, it only says how to rename
       paths that are selected with the filters.

       --path-rename <old_name:new_name>, --path-rename-match
       <old_name:new_name>
           Path to rename; if filename or directory matches <old_name> rename
           to <new_name>. Multiple --path-rename options can be specified.

   Path shortcuts
       --paths-from-file <filename>
           Specify several path filtering and renaming directives, one per
           line. Lines with ==> in them specify path renames, and lines can
           begin with literal: (the default), glob:, or regex: to specify
           different matching styles. Blank lines and lines starting with a #
           are ignored (if you have a filename that you want to filter on that
           starts with literal:, #, glob:, or regex:, then prefix the line
           with literal:).
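
           The line format described above can be illustrated with a small
           classifier. This is only a sketch of the rules as stated here, not
           filter-repo's own parser; the function name parse_paths_line and
           the returned dictionary layout are hypothetical.

```python
def parse_paths_line(line):
    """Classify one line of a --paths-from-file file (illustrative only)."""
    line = line.rstrip("\n")
    # Blank lines and lines starting with '#' are ignored.
    if not line or line.startswith("#"):
        return None
    # A '==>' separator marks the line as a rename directive.
    pattern, sep, new_name = line.partition("==>")
    style = "literal"  # the default matching style
    for prefix in ("literal:", "glob:", "regex:"):
        if pattern.startswith(prefix):
            style = prefix[:-1]
            pattern = pattern[len(prefix):]
            break
    return {
        "style": style,
        "pattern": pattern,
        "rename_to": new_name if sep else None,
    }
```

           For instance, a line such as literal:#notes.txt is classified as a
           literal match on the file #notes.txt rather than being skipped as
           a comment, which is the escape hatch the description mentions.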

       --subdirectory-filter <directory>
           Only look at history that touches the given subdirectory and treat
           that directory as the project root. Equivalent to using --path
           <directory>/ --path-rename <directory>/:

       --to-subdirectory-filter <directory>
           Treat the project root as instead being under <directory>.
           Equivalent to using --path-rename :<directory>/

   Content editing filters (see also --blob-callback)
       --replace-text <expressions_file>
           A file with expressions that, if found, will be replaced. By
           default, each expression is treated as literal text, but regex: and
           glob: prefixes are supported. You can end the line with ==> and
           some replacement text to choose a replacement choice other than the
           default of ***REMOVED***.
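
           As a rough illustration of how such an expressions file behaves,
           the sketch below applies literal and regex: lines to a string. It
           is an approximation under stated assumptions: it works on str
           rather than raw blob bytes, it leaves out the glob: prefix, and
           the helper names parse_expression and apply_expressions are
           hypothetical, not part of filter-repo.

```python
import re

DEFAULT_REPLACEMENT = "***REMOVED***"

def parse_expression(line):
    """Turn one expressions-file line into (compiled_regex, replacement)."""
    text, sep, replacement = line.rstrip("\n").partition("==>")
    if not sep:
        # No '==>' on the line: fall back to the default replacement.
        replacement = DEFAULT_REPLACEMENT
    if text.startswith("regex:"):
        pattern = text[len("regex:"):]
    else:
        # Default: treat the expression as literal text.
        pattern = re.escape(text)
    return re.compile(pattern), replacement

def apply_expressions(content, lines):
    """Apply every expression, in order, to the given content."""
    for line in lines:
        regex, replacement = parse_expression(line)
        content = regex.sub(replacement, content)
    return content
```

           With the two expressions hunter2 and regex:token=\w+==>token=<redacted>,
           the text "password=hunter2 token=abc123" would come out with the
           password replaced by the default marker and the token value
           replaced by the custom text.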

       --strip-blobs-bigger-than <size>
           Strip blobs (files) bigger than specified size (e.g. 5M, 2G, etc.).

       --strip-blobs-with-ids <blob_id_filename>
           Read git object ids from each line of the given file, and strip all
           of them from history.

   Renaming of refs (see also --refname-callback)
       --tag-rename <old:new>
           Rename tags starting with <old> to start with <new>. For example,
           --tag-rename foo:bar will rename tag foo-1.2.3 to bar-1.2.3; either
           <old> or <new> can be empty.
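
           The prefix rule can be sketched in a few lines. This is a minimal
           model of the behavior described above, operating on short tag
           names; the function name rename_tag is hypothetical.

```python
def rename_tag(tag, spec):
    """Apply an <old:new> prefix rename to a short tag name."""
    old, new = spec.split(":", 1)
    # Only tags that start with <old> are renamed; an empty <old>
    # matches every tag, and an empty <new> simply strips the prefix.
    if tag.startswith(old):
        return new + tag[len(old):]
    return tag
```

           Under this model, rename_tag("foo-1.2.3", "foo:bar") yields
           bar-1.2.3, and a spec with an empty <old>, such as ":release-",
           prepends release- to every tag.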

   Filtering of commit messages (see also --message-callback)
       --replace-message <expressions_file>
           A file with expressions that, if found in commit or tag messages,
           will be replaced. This file uses the same syntax as --replace-text.

       --preserve-commit-hashes
           By default, since commits are rewritten and thus gain new hashes,
           references to old commit hashes in commit messages are replaced
           with new commit hashes (abbreviated to the same length as the old
           reference). Use this flag to turn off updating commit hashes in
           commit messages.

       --preserve-commit-encoding
           Do not reencode commit messages into UTF-8. By default, if the
           commit object specifies an encoding for the commit message, the
           message is re-encoded into UTF-8.

   Filtering of names & emails (see also --name-callback and --email-callback)
       --mailmap <filename>
           Use specified mailmap file (see git-shortlog(1) for details on the
           format) when rewriting author, committer, and tagger names and
           emails. If the specified file is part of git history, historical
           versions of the file will be ignored; only the current contents are
           consulted.

       --use-mailmap
           Same as: --mailmap .mailmap

   Parent rewriting
       --replace-refs {delete-no-add, delete-and-add, update-no-add,
       update-or-add, update-and-add}
           Replace refs (see git-replace(1)) are used to rewrite parents
           (unless turned off by the usual git mechanism); this flag specifies
           what to do with those refs afterward. Replace refs can either be
           deleted or updated to point at new commit hashes. Also, new replace
           refs can be added for each commit rewrite. With update-or-add, new
           replace refs are only added for commit rewrites that aren’t used to
           update an existing replace ref. The default is update-and-add if
           $GIT_DIR/filter-repo/already_ran does not exist; update-or-add
           otherwise.

       --prune-empty {always, auto, never}
           Whether to prune empty commits. auto (the default) means only
           prune commits which become empty (not commits which were empty in
           the original repo, unless their parent was pruned). When the parent
           of a commit is pruned, the first non-pruned ancestor becomes the
           new parent.
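
           The reparenting rule in the last sentence can be modeled on a
           linear history as follows. This is a simplified sketch that
           ignores merge commits; the function name reparent and the
           dictionary representation of history are hypothetical and do not
           reflect how filter-repo stores commits internally.

```python
def reparent(parents, pruned):
    """Compute new parents after pruning, for a linear history.

    parents maps each commit to its single parent (None for the root);
    pruned is the set of commits that were removed.
    """
    new_parents = {}
    for commit, parent in parents.items():
        if commit in pruned:
            continue
        # Walk back until we reach an ancestor that was not pruned.
        while parent is not None and parent in pruned:
            parent = parents[parent]
        new_parents[commit] = parent
    return new_parents
```

           For a history A <- B <- C <- D in which B and C are pruned, this
           leaves D with A as its parent.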

       --prune-degenerate {always, auto, never}
           Since merge commits are needed for history topology, they are
           typically exempt from pruning. However, they can become degenerate
           with the pruning of other commits (having fewer than two parents,
           having one commit serve as both parents, or having one parent as
           the ancestor of the other.) If such merge commits have no file
           changes, they can be pruned. The default (auto) is to only prune
           empty merge commits which become degenerate (not which started as
           such).

       --no-ff
           Even if the first parent is or becomes an ancestor of another
           parent, do not prune it. This modifies how --prune-degenerate
           behaves, and may be useful in projects that always use merge
           --no-ff.

   Generic callback code snippets
       --filename-callback <function_body>
           Python code body for processing filenames; see the section called
           “CALLBACKS”.

       --message-callback <function_body>
           Python code body for processing messages (both commit messages and
           tag messages); see the section called “CALLBACKS”.

       --name-callback <function_body>
           Python code body for processing names of people; see the section
           called “CALLBACKS”.

       --email-callback <function_body>
           Python code body for processing email addresses; see the section
           called “CALLBACKS”.

       --refname-callback <function_body>
           Python code body for processing refnames; see the section called
           “CALLBACKS”.

       --blob-callback <function_body>
           Python code body for processing blob objects; see the section
           called “CALLBACKS”.

       --commit-callback <function_body>
           Python code body for processing commit objects; see the section
           called “CALLBACKS”.

       --tag-callback <function_body>
           Python code body for processing tag objects; see the section called
           “CALLBACKS”.

       --reset-callback <function_body>
           Python code body for processing reset objects; see the section
           called “CALLBACKS”.
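
           To give a feel for what a function body looks like, the sketch
           below writes a filename callback as an ordinary Python function.
           It rests on assumptions that the “CALLBACKS” section is the
           authority on: that the body receives the filename as a byte
           string, that returning None drops the file, and that returning a
           different value renames it. The paths used are made up for
           illustration.

```python
def filename_callback(filename):
    # Everything inside this function is what would be passed as the
    # <function_body> argument (assumed calling convention).
    if filename.endswith(b".tmp"):
        # Assumption: returning None removes the file from history.
        return None
    if filename.startswith(b"tools/"):
        # Assumption: returning a new name renames the file.
        return b"scripts/" + filename[len(b"tools/"):]
    # Returning the name unchanged keeps the file where it is.
    return filename
```

           Writing the body as a standalone function first makes it easy to
           try on sample paths before handing it to filter-repo.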

   Location to filter from/to
           Note
           Specifying alternate source or target locations implies --partial
           except that the normal default for --replace-refs is used. However,
           unlike normal uses of --partial, this doesn’t risk mixing old and
           new history since the old and new histories are in different
           repositories.

       --source <source>
           Git repository to read from.

       --target <target>
           Git repository to overwrite with filtered history.

   Miscellaneous options
       --help, -h
           Show a help message and exit.

       --force, -f
           Ignore fresh clone checks and rewrite history (an irreversible
           operation, especially since it by default ends with an immediate
           pruning of reflogs and old objects). See the section called “FRESH
           CLONE SAFETY CHECK AND --FORCE”. Note that when cloning repos on a
           local filesystem, it is better to pass --no-local to git clone than
           passing --force to git-filter-repo.

       --partial
           Do a partial history rewrite, resulting in a mixture of old and
           new history. This implies a default of update-no-add for
           --replace-refs, disables rewriting refs/remotes/origin/* to
           refs/heads/*, disables removing of the origin remote, disables
           removing unexported refs, disables expiring the reflog, and
           disables the automatic post-filter gc. Also, this modifies
           --tag-rename and --refname-callback options such that instead of
           replacing old refs with new refnames, it will instead create new
           refs and keep the old ones around. Use with caution.

       --refs <refs+>
           Limit history rewriting to the specified refs. Implies --partial.
           In addition to the normal caveats of --partial (mixing old and new
           history, no automatic remapping of refs/remotes/origin/* to
           refs/heads/*, etc.), this also may cause problems for pruning of
           degenerate empty merge commits when negative revisions are
           specified.

       --dry-run
           Do not change the repository. Run git fast-export and filter its
           output, and save both the original and the filtered version for
           comparison. This also disables rewriting commit messages due to not
           knowing new commit IDs and disables filtering of some empty commits
           due to inability to query the fast-import backend.

       --debug
           Print additional information about operations being performed and
           commands being run. (If used together with --dry-run, shows extra
           information about what would be run).

       --stdin
           Instead of running git fast-export and filtering its output, filter
           the fast-export stream from stdin. The stdin must be in the
           expected input format (e.g. it needs to include original-oid
           directives).

       --quiet
           Pass --quiet to other git commands called.

OUTPUT

       Every time filter-repo is run, files are created in the
       .git/filter-repo/ directory. These files are overwritten
       unconditionally on every run.

   Commit map
       The .git/filter-repo/commit-map file contains a mapping of how all
       commits were (or were not) changed.

       •   A header is the first line with the text "old" and "new"

       •   Commit mappings are in no particular order

       •   All commits in range of the rewrite will be listed, even commits
           that are unchanged (e.g. because the commit pre-dated when the
           large file(s) were introduced to the repo).

       •   An all-zeros hash, or null SHA, represents a nonexistent object.
           When in the "new" column, this means the commit was removed
           entirely.
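
       A commit map with this layout is straightforward to post-process. The
       sketch below assumes the two columns are separated by whitespace and
       that the text has already been read from .git/filter-repo/commit-map;
       the function names parse_commit_map and removed_commits are
       hypothetical helpers, not part of filter-repo.

```python
def parse_commit_map(text):
    """Return a dict mapping old commit hashes to new commit hashes."""
    mapping = {}
    # The first line is the "old"/"new" header, so skip it.
    for line in text.splitlines()[1:]:
        if not line.strip():
            continue
        old, new = line.split()
        mapping[old] = new
    return mapping

def removed_commits(mapping):
    """List old hashes whose new hash is all zeros (commit was removed)."""
    return sorted(old for old, new in mapping.items() if set(new) == {"0"})
```

       Because the mappings come in no particular order, the result is keyed
       by the old hash; commits that were left untouched simply map to
       themselves.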

   Reference map
       The .git/filter-repo/ref-map file contains a mapping of which local
       references were changed.

       •   A header is the first line with the text "old" and "new"

       •   Reference mappings are in no particular order

       •   An all-zeros hash, or null SHA, represents a nonexistent object.
           When in the "new" column, this means the ref was removed entirely.

FRESH CLONE SAFETY CHECK AND --FORCE

       Since filter-repo does irreversible rewriting of history, it is
       important to avoid making changes to a repo for which the user doesn’t
       have a good backup. The primary defense mechanism is to simply educate
       users and rely on them to be good stewards of their data; thus there
       are several warnings in the documentation about how filter-repo
       rewrites history.

       However, as a service to users, we would like to provide an additional
       safety check beyond the documentation. There isn’t a good way to check
       if the user has a good backup, but we can ask a related question that
       is an imperfect but quite reasonable proxy: "Is this repository a fresh
       clone?" Unfortunately, that is also a question we can’t get a perfect
       answer to; git provides no way to answer that question. However, there
       are approximately a dozen things that I found that seem to always be
       true of brand new clones (assuming they are either clones of remote
       repositories or are made with the --no-local flag), and I check for all
       of those.

       These checks can have both false positives and false negatives. Someone
       might have a perfectly good backup of their repo without it actually
       being a fresh clone, but there’s no way for filter-repo to know that.
       Conversely, someone could look at all things that filter-repo checks
       for in its safety checks and then just tweak their non-backed-up
       repository to satisfy those conditions (though it would take a fair
       amount of effort, and it’s astronomically unlikely that a repo that
       isn’t a fresh clone randomly happens to match all the criteria). In
       practice, the safety checks filter-repo uses seem to be really good at
       avoiding people accidentally running filter-repo on a repository that
       they shouldn’t be running it on. It even caught me once when I did mean
       to run filter-repo but was in a different directory than I thought I
       was.

       In short, it’s perfectly fine to use --force to override the safety
       checks as long as you’re okay with filter-repo irreversibly rewriting
       the contents of the current repository. It is a really bad idea to get
       in the habit of always specifying --force; if you do, one day you will
       run one of your commands in the wrong directory like I did, and you
       won’t have the safety check anymore to bail you out. Also, it is
       definitely NOT okay to recommend --force on forums, Q&A sites, or in
       emails to other users without first carefully explaining that --force
       means putting your repositories’ data at risk. I am especially bothered
       by people who suggest the flag when it clearly is NOT needed; they are
       needlessly putting other people’s data at risk.

VERSATILITY

       filter-repo has a hierarchy of capabilities on the spectrum from easy
       to use convenience flags that perform pre-defined types of filtering,
       to choices that provide lots of flexibility in controlling how
       filtering occurs. This spectrum includes the following:

       •   Convenience flags making common types of history rewriting simple
           (e.g. --path, --strip-blobs-bigger-than, --replace-text, --mailmap)

       •   Options which are shorthand for others or which provide greater
           control than others (e.g. --subdirectory-filter could just be
           written using both a path selection (--path) and a path rename
           (--path-rename) filter; --paths-from-file can handle all other
           --path* options and more such as regex renaming of paths)

       •   Generic python callbacks for handling a certain type of data (the
           filename, message, name, email, and refname callbacks)

       •   Generic python callbacks for handling fundamental git objects,
           allowing greater control over the combination of data types the
           object holds (the commit, tag, blob, and reset callbacks)

       •   The ability to import filter-repo as a module in a python program
           and use its classes and functions for even greater control and
           flexibility while still leveraging lots of basic capabilities. One
           can even use this to write new tools with a completely different
           interface.

       For more information about callbacks, see the section called
       “CALLBACKS”. For examples on writing python programs that import
       filter-repo as a module to create new history rewriting tools, look at
       the contrib/filter-repo-demos/ directory. That directory includes,
       among other examples, a reimplementation of git-filter-branch which is
       faster than git-filter-branch, and a reimplementation of BFG Repo
       Cleaner with several bug fixes and new features.

DISCUSSION

       Using filter-repo is relatively simple, but rewriting history is part
       of a larger discussion in terms of collaboration. When you rewrite
       history, the old and new histories are no longer compatible; if you
       push this history somewhere for others to view, it will look as though
       you’ve done a rebase of all branches and tags. Make sure you are
       familiar with the "RECOVERING FROM UPSTREAM REBASE" section of
       git-rebase(1) (and in particular, "The hard case") before proceeding,
       in addition to this section.

       Steps to use git-filter-repo as part of the bigger picture of doing a
       history rewrite are roughly as follows:

        1. Create a clone of your repository (if you created special refs
           outside of refs/heads/ or refs/tags/, make sure to fetch those
           too). You may pass --bare or --mirror to git clone, if you prefer.
           You should pass --no-local if the repository you are cloning from
           is on the local filesystem. Avoid other flags; some might confuse
           the fresh clone check, and others could cause parts of the data to
           be missing that are needed for the rewrite.

        2. (Optional) Run git filter-repo --analyze. This will create a
           directory of reports mentioning renames that have occurred in your
           repo and also listing sizes of objects aggregated by
           path/directory/extension/blob-id; this information may be useful in
           choosing how to filter your repo. It can also be useful to re-run
           --analyze after filtering to verify the changes look correct.

        3. Run filter-repo with your desired filtering options. Many examples
           are given below. For more complex cases, note that doing the
           filtering in multiple steps (by running multiple filter-repo
           invocations in a sequence) is supported. If anything goes wrong
           here, simply delete your clone and restart.

        4. Push your new repository to its new home (note that
           refs/remotes/origin/* will have been moved to refs/heads/* as the
           first part of filter-repo, so you can just deal with normal
           branches instead of remote tracking branches). While you can force
           push this to the same URL you cloned from, there are good reasons
           to consider pushing to a different location instead:

           •   People who cloned from the original repo will have old history.
               When they fetch the new history you force pushed up, unless
               they do a git reset --hard @{u} on their branches or rebase
               their local work, git will think they have hundreds or
               thousands of commits with commit messages very similar to
               those that exist upstream (but which include files you wanted
               excised from history), and allow the user to merge the two
               histories, resulting in what looks like two copies of each
               commit. If they then push this history back up, then everyone
               now has history with two copies of each commit and the bad
               files have returned. You’re more likely to succeed in forcing
               people to get rid of the old history if they have to clone a
               new URL.

           •   Rewriting history will rewrite tags; those who have already
               downloaded tags will not get the updated tags by default (see
               the "On Re-tagging" section of git-tag(1)). Every user trying
               to use an existing clone will have to forcibly delete all tags
               and re-fetch them; it may be easier for them to just re-clone,
               which they are more likely to do with a new clone URL.

           •   Rewriting history may delete some refs (e.g. branches that only
               had files that you wanted excised from history); unless you run
               git push with the --mirror or --prune options, those refs will
               continue to exist on the server. If folks then merge these
               branches into others, then people have started mixing old and
               new history. If users had already cloned these branches,
               removing them from the server isn’t enough; you need all users
               to delete any local branches based on these refs and run fetch
               with the --prune option as well. Simply re-cloning from a new
               URL is easier.

           •   The server may not allow you to force push over some refs. For
               example, code review systems may have special ref namespaces
               (e.g. refs/changes/, refs/pull/, refs/merge-requests/) that
               they have locked down.

        5. If you still want to push your rewritten history back to the
           original URL despite my warnings above, you’ll have to manage it
           very carefully:

           •   git-filter-repo deletes the "origin" remote to help avoid
               people accidentally repushing to the same repository, so you’ll
               need to remind git what origin’s URL was. You’ll have to look
               up the command for that.

           •   You’ll need to carefully synchronize with everyone who has
               cloned the repository, and will also need to carefully
               synchronize with everything (e.g. CI systems) that has cloned
               it. Every single clone will either need to be thrown away and
               re-cloned, or need to take all the steps outlined in item 4 as
               well as follow the necessary steps from the "RECOVERING FROM
               UPSTREAM REBASE" section of git-rebase(1). If you miss fixing
               any clones, you’ll risk mixing old and new history and end up
               with an even worse mess to clean up.

           •   Finally, you’ll need to consult any documentation from your
               hosting provider about how to remove any server-side references
               to the old commits (example: GitLab’s excellent docs on
               reducing repository size[1], or just the warning box that
               references "GitHub support" from GitHub’s otherwise dangerously
               out-of-date docs on removing sensitive data[2]).

        6. (Optional) Some additional considerations

           •   filter-repo by default creates replace refs (see
               git-replace(1)) for each rewritten commit ID, allowing you to
               use old (unabbreviated) commit hashes in the git command line
               to refer to the newly rewritten commits. If you want to use
               these replace refs, manually push them to the relevant clone
               URL and tell users to manually fetch them (e.g. by adjusting
               their fetch refspec, git config --add remote.origin.fetch
               +refs/replace/*:refs/replace/*). Sadly, replace refs are not
               yet widely understood; projects like jgit and libgit2 do not
               support them and existing repository managers (e.g. Gerrit,
               GitHub, GitLab) do not yet understand replace refs. Thus one
               can’t use old commit hashes within the UI of these other
               systems. This may change in the future, but replace refs at
               least help users locally within the git command line interface.
               Also, be aware that commit-graphs are excessively cautious
               around replace refs and just turn off entirely if any are
               present, so after enough time has passed that old commit IDs
               become less relevant, users may want to locally delete the
               replace refs to regain the speedups from commit-graphs.

           •   If you have a central repo, you may want to prevent people from
               pushing old commit IDs, in order to avoid mixing old and new
               history. Every repository manager does this differently: some
               provide specialized commands (e.g.
               https://gerrit-review.googlesource.com/Documentation/cmd-ban-commit.html),
               while others require you to write hooks.
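
       Steps 1 through 3 above can be written down as a command plan. The
       sketch below only builds the argument lists and runs nothing; the
       function name plan_rewrite and its parameters are hypothetical, and it
       assumes git filter-repo is installed so that it can be invoked through
       git -C <dir>.

```python
def plan_rewrite(source, clone_dir, filter_args, source_is_local=False):
    """Build the commands for a fresh clone, an analysis run, and a rewrite."""
    clone = ["git", "clone"]
    if source_is_local:
        # Step 1: local sources should be cloned with --no-local.
        clone.append("--no-local")
    clone += [source, clone_dir]
    return [
        clone,
        # Step 2 (optional): generate reports to help decide what to filter.
        ["git", "-C", clone_dir, "filter-repo", "--analyze"],
        # Step 3: the actual rewrite with the chosen filtering options.
        ["git", "-C", clone_dir, "filter-repo", *filter_args],
    ]
```

       Each returned list can be passed to subprocess.run(cmd, check=True) in
       order. Keeping the plan separate from execution makes it possible to
       review the exact commands first, which suits an operation that cannot
       be undone.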

EXAMPLES

584   Path based filtering
585       To only keep the README.md file plus the directories guides and
586       tools/releases/:
587
588           git filter-repo --path README.md --path guides/ --path tools/releases
589
590
591       Directory names can be given with or without a trailing slash, and all
592       filenames are relative to the toplevel of the repo. To keep all files
593       except these paths, just add --invert-paths:
594
595           git filter-repo --path README.md --path guides/ --path tools/releases --invert-paths
596
597
598       If you want to have both an inclusion filter and an exclusion filter,
599       just run filter-repo multiple times. For example, to keep the src/main
600       subdirectory but exclude files under src/main named data, run:
601
602           git filter-repo --path src/main/
603           git filter-repo --path-glob 'src/*/data' --invert-paths
604
605
606       Note that the asterisk (*) will match across multiple directories, so
607       the second command would remove e.g. src/main/org/whatever/data. Also,
608       the second command by itself would also remove e.g.
609       src/not-main/foo/data, but since src/not-main/ was removed by the first
610       command, that’s not an issue. Also, the use of quotes around the
611       asterisk is sometimes important to avoid glob expansion by the shell.
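
       The cross-directory behavior of the asterisk can be checked with a
       minimal Python sketch. It assumes path globs follow the rules of
       Python's fnmatch module; the paths below are made up for
       illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical paths, chosen to mirror the discussion above.
paths = [
    "src/main/data",
    "src/main/org/whatever/data",
    "src/not-main/foo/data",
    "src/main/data.txt",
]

# In an fnmatch-style glob, "*" is not stopped by "/" separators, so a
# single asterisk can span several directory levels.
removed = [p for p in paths if fnmatchcase(p, "src/*/data")]
print(removed)
```

       Only src/main/data.txt survives the glob here, because the pattern
       must match the whole path.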
612
613       You can also select paths by regular expression (see
614       https://docs.python.org/3/library/re.html#regular-expression-syntax).
615       For example, to only include files from the repo whose name is in the
616       format YYYY-MM-DD.txt and is found at least two subdirectories deep:
617
618           git filter-repo --path-regex '^.*/.*/[0-9]{4}-[0-9]{2}-[0-9]{2}\.txt$'
619
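       Before rewriting history, a regular expression like this can be
       tried against sample paths. The sketch below assumes the pattern is
       searched against the full path as a bytestring and escapes the dot
       before txt so that it only matches a literal period; the candidate
       paths are hypothetical.

```python
import re

# Same shape as the pattern above, with the "." before "txt" escaped.
pattern = re.compile(rb"^.*/.*/[0-9]{4}-[0-9]{2}-[0-9]{2}\.txt$")

candidates = [
    b"logs/2019/2019-07-04.txt",   # two directories deep
    b"logs/2019-07-04.txt",        # only one directory deep
    b"a/b/c/2019-07-04.txt",       # deeper than two is fine as well
    b"a/b/2019-07-04-txt",         # no literal ".txt" suffix
]

kept = [path for path in candidates if pattern.search(path)]
print(kept)
```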
620
621       If you want two directories to be renamed (and maybe merged if both are
622       renamed to the same location), use --path-rename; for example, to
623       rename both cmds/ and src/scripts/ to tools/:
624
625           git filter-repo --path-rename cmds:tools --path-rename src/scripts/:tools/
626
627
628       As with --path, directories can be specified with or without a trailing
629       slash for --path-rename.
630
631       If you do a --path-rename to something that was already in use, it will
632       be silently overwritten. However, if you try to rename multiple files
633       to the same location (e.g. src/scripts/run_release.sh and
634       cmds/run_release.sh both existed and had different content with the
635       renames above), then you will be given an error. If you have such a
636       case, you may want to add another rename command to move one of the
637       paths somewhere else where it won’t collide:
638
639           git filter-repo --path-rename cmds/run_release.sh:tools/do_release.sh \
640                           --path-rename cmds/:tools/ \
641                           --path-rename src/scripts/:tools/
642
643
644       Also, --path-rename brings up ordering issues; all path arguments are
645       applied in order. Thus, a command like
646
647           git filter-repo --path-rename sources/:src/main/ --path src/main/
648
649
650       would make sense but reversing the two arguments would not (src/main/
651       is created by the rename so reversing the two would give you an empty
652       repo). Also, note that the rename of cmds/run_release.sh a couple
653       examples ago was done before the other renames.
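
       The ordering effect can be illustrated with a toy model. This is
       not filter-repo's implementation; apply_rules and its rule tuples
       are hypothetical names that only mimic "arguments are applied in
       order" for prefix-based renames and filters.

```python
def apply_rules(path, rules):
    """Apply ("rename", old, new) and ("filter", prefix) rules in order.

    Returns the final path, or None if filters were given and none of
    them selected the path at the point where it was evaluated.
    """
    wanted = False
    has_filter = any(rule[0] == "filter" for rule in rules)
    for rule in rules:
        if rule[0] == "rename":
            _, old, new = rule
            if path.startswith(old):
                path = new + path[len(old):]
        else:
            _, prefix = rule
            if path.startswith(prefix):
                wanted = True
    return path if (wanted or not has_filter) else None


rename_first = [("rename", b"sources/", b"src/main/"), ("filter", b"src/main/")]
filter_first = list(reversed(rename_first))

print(apply_rules(b"sources/app.c", rename_first))  # kept under its new name
print(apply_rules(b"sources/app.c", filter_first))  # dropped: filter ran too early

# The more specific rename has to come before the directory-wide ones.
release_rules = [
    ("rename", b"cmds/run_release.sh", b"tools/do_release.sh"),
    ("rename", b"cmds/", b"tools/"),
    ("rename", b"src/scripts/", b"tools/"),
]
```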
654
655       Note that path renaming does not do path filtering, thus the following
656       command
657
658           git filter-repo --path src/main/ --path-rename tools/:scripts/
659
660
661       would not result in the tools or scripts directories being present,
662       because the single filter selected only src/main/. It’s likely that you
663       would instead want to run:
664
665           git filter-repo --path src/main/ --path tools/ --path-rename tools/:scripts/
666
667
668       If you prefer to filter based solely on basename, use the
669       --use-base-name flag (though this is incompatible with --path-rename).
670       For example, to only include README.md and Makefile files from any
671       directory:
672
673           git filter-repo --use-base-name --path README.md --path Makefile
674
675
676       If you wanted to delete all .DS_Store files in any directory, you could
677       either use:
678
679           git filter-repo --invert-paths --path '.DS_Store' --use-base-name
680
681
682       or
683
684           git filter-repo --invert-paths --path-glob '*/.DS_Store' --path '.DS_Store'
685
686
687       (The --path-glob isn’t sufficient by itself, as it might miss a toplevel
688       .DS_Store file. Further, while something like --path-glob '*.DS_Store'
689       would work around that problem, it would also grab files named
690       foo.DS_Store or bar/baz.DS_Store.)
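
       The difference between the two globs can be verified with a short
       Python sketch, assuming fnmatch-style matching; the file names are
       the ones mentioned above plus a hypothetical docs/.DS_Store.

```python
from fnmatch import fnmatchcase

names = [".DS_Store", "docs/.DS_Store", "foo.DS_Store", "bar/baz.DS_Store"]

# Map each glob to the names it selects.
matches = {
    glob: [name for name in names if fnmatchcase(name, glob)]
    for glob in ("*/.DS_Store", "*.DS_Store")
}

for glob, selected in matches.items():
    print(glob, "->", selected)
```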
691
692       Finally, see also the --filename-callback from the section called
693       “CALLBACKS”.
694
695   Filtering based on many paths
696       If you have a long list of files, directories, globs, or regular
697       expressions to filter on, you can stick them in a file and use
698       --paths-from-file; for example, with a file named stuff-i-want.txt with
699       contents of
700
701           # Blank lines and comment lines are ignored.
702           # Examples similar to --path:
703           README.md
704           guides/
705           tools/releases
706
707           # An example that is like --path-glob:
708           glob:*.py
709
710           # An example that is like --path-regex:
711           regex:^.*/.*/[0-9]{4}-[0-9]{2}-[0-9]{2}\.txt$
712
713           # An example of renaming a path
714           tools/==>scripts/
715
716           # An example of using a regex to rename a path
717           regex:(.*)/([^/]*)/([^/]*)\.text$==>\2/\1/\3.txt
718
719
720       then you could run
721
722           git filter-repo --paths-from-file stuff-i-want.txt
723
724
725       to get a repo containing only the toplevel README.md file, the guides/
726       and tools/releases/ directories, all python files, files whose name was
727       of the form YYYY-MM-DD.txt at least two subdirectories deep, and would
728       rename tools/ to scripts/ and rename files like foo/bar/baz.text to
729       bar/foo/baz.txt. Note the special line prefixes of glob: and regex: and
730       the special string ==> denoting renames.
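
       To make the line format concrete, here is a rough Python sketch of
       how such lines could be split into a match type, a pattern, and an
       optional rename target. parse_line is a hypothetical helper, not
       part of filter-repo, and it works on str rather than the
       bytestrings filter-repo uses internally.

```python
import re

def parse_line(line):
    """Return (match_type, pattern, rename_target_or_None), or None for
    blank lines and comment lines."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return None
    match_type = "path"
    for prefix in ("glob", "regex"):
        if line.startswith(prefix + ":"):
            match_type, line = prefix, line[len(prefix) + 1:]
            break
    pattern, arrow, target = line.partition("==>")
    return (match_type, pattern, target if arrow else None)


# The regex rename from the example file, applied to a sample path.
_, pattern, target = parse_line(r"regex:(.*)/([^/]*)/([^/]*)\.text$==>\2/\1/\3.txt")
renamed = re.sub(pattern, target, "foo/bar/baz.text")
print(renamed)
```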
731
732       Sometimes you have a way of easily generating all the files you want.
733       For example, if you know that none of the currently tracked files have
734       any newlines or special characters in them (see core.quotePath from git
735       config --help) so that git ls-files would print all files literally one
736       per line, and you knew that you wanted to keep only the files that are
737       currently tracked (thus deleting from all commits in history any files
738       that only appear on other branches or that only appear in older
739       commits), then you could use a pair of commands such as
740
741           git ls-files >../paths-i-want.txt
742           git filter-repo --paths-from-file ../paths-i-want.txt
743
744
745       Similarly, you could use --paths-from-file to delete many files. For
746       example, you could run git filter-repo --analyze to get reports, look
747       in one such as .git/filter-repo/analysis/path-deleted-sizes.txt and
748       copy all the filenames into a file such as
749       /tmp/files-i-dont-want-anymore.txt and then run
750
751           git filter-repo --invert-paths --paths-from-file /tmp/files-i-dont-want-anymore.txt
752
753
754       to delete them all.
755
756   Directory based shortcuts
757       Let’s say you had a directory structure like the following:
758
759           module/
760              foo.c
761              bar.c
762           otherDir/
763              blah.config
764              stuff.txt
765           zebra.jpg
766
767       If you wanted just the module/ directory and you wanted it to become
768       the new root so that your new directory structure looked like
769
770           foo.c
771           bar.c
772
773       then you could run:
774
775           git filter-repo --subdirectory-filter module/
776
777
778       If you wanted all the files from the original repo, but wanted to move
779       everything under a subdirectory named my-module/, so that your new
780       directory structure looked like
781
782           my-module/
783              module/
784                 foo.c
785                 bar.c
786              otherDir/
787                 blah.config
788                 stuff.txt
789              zebra.jpg
790
791       then you would instead run run
792
793           git filter-repo --to-subdirectory-filter my-module/
794
795
796   Content based filtering
797       If you want to filter out all files bigger than a certain size, you can
798       use --strip-blobs-bigger-than with some size (K, M, and G suffixes are
799       recognized), e.g.:
800
801           git filter-repo --strip-blobs-bigger-than 10M
802
803
804       If you want to strip out all files with specified git object ids
805       (hashes), list the hashes in a file and run
806
807           git filter-repo --strip-blobs-with-ids FILE_WITH_GIT_BLOB_IDS
808
809
810       If you want to modify file contents, you can do so based on a list of
811       expressions in a file, one per line. For example, with a file named
812       expressions.txt containing
813
814           p455w0rd
815           foo==>bar
816           glob:*666*==>
817           regex:\bdriver\b==>pilot
818           literal:MM/DD/YYYY==>YYYY-MM-DD
819           regex:([0-9]{2})/([0-9]{2})/([0-9]{4})==>\3-\1-\2
820
821
822       then running
823
824           git filter-repo --replace-text expressions.txt
825
826
827       will go through and replace p455w0rd with ***REMOVED***, foo with bar,
828       any line containing 666 with a blank line, the word driver with pilot
829       (but not if it has letters before or after; e.g. drivers will be
830       unmodified), replace the exact text MM/DD/YYYY with YYYY-MM-DD and
831       replace date strings of the form MM/DD/YYYY with ones of the form
832       YYYY-MM-DD. In the expressions file, there are a few things to note:
833
834       •   Every line has a replacement, given by whatever is on the right of
835           ==>. If ==> does not appear on the line, the default replacement is
836           ***REMOVED***.
837
838       •   Lines can start with literal:, glob:, or regex: to specify whether
839           to do literal string matches, globs (see
840           https://docs.python.org/3/library/fnmatch.html), or regular
841           expressions (see
842           https://docs.python.org/3/library/re.html#regular-expression-syntax).
843           If none of these are specified, literal: is assumed.
844
845       •   If multiple matches are found, all are replaced.
846
847       •   globs and regexes are applied to the entire file, but without any
848           special flags turned on. Some folks may be interested in adding
849           (?m) to the regex to turn on MULTILINE mode, so that ^ and $ match
850           the beginning and ends of lines rather than the beginning and end
851           of file. See https://docs.python.org/3/library/re.html for details.
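
       The literal and regex rules above can be approximated in plain
       Python to preview their effect on sample data. This is only a
       sketch under the assumption that rules are applied one after
       another; the glob rule is left out, and scrub and the sample bytes
       are hypothetical.

```python
import re

# (kind, pattern, replacement) triples mirroring expressions.txt above.
rules = [
    ("literal", b"p455w0rd", b"***REMOVED***"),
    ("literal", b"foo", b"bar"),
    ("regex", rb"\bdriver\b", b"pilot"),
    ("literal", b"MM/DD/YYYY", b"YYYY-MM-DD"),
    ("regex", rb"([0-9]{2})/([0-9]{2})/([0-9]{4})", rb"\3-\1-\2"),
]

def scrub(data):
    """Apply every rule in order, replacing all matches of each."""
    for kind, pattern, replacement in rules:
        if kind == "literal":
            data = data.replace(pattern, replacement)
        else:
            data = re.sub(pattern, replacement, data)
    return data

print(scrub(b"pw=p455w0rd foo drivers driver 12/25/2020"))

# Without (?m), ^ only matches at the start of the whole file.
text = b"TODO a\nTODO b"
whole_file = re.sub(rb"^TODO", b"DONE", text)
per_line = re.sub(rb"(?m)^TODO", b"DONE", text)
```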
852
853       See also the --blob-callback from the section called “CALLBACKS”.
854
855   Updating commit/tag messages
856       If you want to modify commit or tag messages, you can do so with the
857       same syntax as --replace-text, explained above. For example, with a
858       file named expressions.txt containing
859
860           foo==>bar
861
862
863       then running
864
865           git filter-repo --replace-message expressions.txt
866
867
868       will replace foo in commit or tag messages with bar.
869
870       See also the --message-callback from the section called “CALLBACKS”.
871
872   Refname based filtering
873       To rename tags, use --tag-rename, e.g.:
874
875           git filter-repo --tag-rename foo:bar
876
877
878       This will rename any tags starting with foo to now start with bar.
879       Either side of the colon could be blank, e.g.
880
881           git filter-repo --tag-rename '':'my-module-'
882
883
884       For more general refname modification, see --refname-callback from the
885       section called “CALLBACKS”.
886
887   User and email based filtering
888       To modify username and emails of commits, you can create a mailmap file
889       in the format accepted by git-shortlog(1). For example, if you have a
890       file named my-mailmap you can run
891
892           git filter-repo --mailmap my-mailmap
893
894
895       and if the current contents of that file are as follows (if the
896       specified mailmap file is version controlled, historical versions of
897       the file are ignored):
898
899           Name For User <email@addre.ss>
900           <new@ema.il> <old1@ema.il>
901           New Name And <new@ema.il> <old2@ema.il>
902           New Name And <new@ema.il> Old Name And <old3@ema.il>
903
904
905       then we can update username and/or emails based on the specified
906       mapping.
907
908       See also the --name-callback and --email-callback from the section
909       called “CALLBACKS”.
910
911   Parent rewriting
912       To replace $commit_A with $commit_B (e.g. make all commits which had
913       $commit_A as a parent instead have $commit_B for that parent), and
914       rewrite history to make it permanent:
915
916           git replace $commit_A $commit_B
917           git filter-repo --force
918
919
920       To create a new commit with the same contents as $commit_A except with
921       different parent(s) and then replace $commit_A with the new commit, and
922       rewrite history to make it permanent:
923
924           git replace --graft $commit_A $new_parent_or_parents
925           git filter-repo --force
926
927
928       The reason to specify --force is two-fold: filter-repo will error out
929       if no arguments are specified, and the new graft commit would otherwise
930       trigger the not-a-fresh-clone check.
931
932   Partial history rewrites
933       To rewrite the history on just one branch (which may cause it to no
934       longer share any common history with other branches), use --refs. For
935       example, to remove a file named extraneous.txt from the master branch:
936
937           git filter-repo --invert-paths --path extraneous.txt --refs master
938
939
940       To rewrite just some recent commits:
941
942           git filter-repo --invert-paths --path extraneous.txt --refs master~3..master
943
944

CALLBACKS

946       For flexibility, filter-repo allows you to specify functions on the
947       command line to further filter all changes. Please note that there are
948       some API compatibility caveats associated with these callbacks that you
949       should be aware of before using them; see the "API BACKWARD
950       COMPATIBILITY CAVEAT" comment near the top of git-filter-repo source
951       code.
952
953       All callback functions are of the same general format. For a command
954       line argument like
955
956           --foo-callback 'BODY'
957
958
959       the following code will be compiled and called:
960
961           def foo_callback(foo):
962             BODY
963
964
965       Thus, you just need to make sure your BODY modifies and returns foo
966       appropriately. One important thing to note for all callbacks is that
967       filter-repo uses bytestrings (see
968       https://docs.python.org/3/library/stdtypes.html#bytes) everywhere
969       instead of strings.
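
       As a rough illustration of this mechanism (not filter-repo's actual
       code), a BODY string can be wrapped in a function definition and
       compiled with exec. make_callback is a hypothetical helper, and the
       choice to expose only the re module to the body is an assumption
       of this sketch.

```python
import re
import textwrap

def make_callback(argname, body):
    """Wrap BODY in 'def callback(<argname>):' and return the function."""
    indented = textwrap.indent(textwrap.dedent(body), "  ")
    source = "def callback({}):\n{}".format(argname, indented)
    namespace = {"re": re}  # modules made visible to the body
    exec(source, namespace)
    return namespace["callback"]


name_callback = make_callback("name", 'return name.replace(b"Wiliam", b"William")')
print(name_callback(b"Wiliam Shakespeare"))
```

       Note that the body receives and returns bytes; passing a str such
       as "Wiliam" would raise a TypeError in the replace call.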
970
971       There are four callbacks that allow you to operate directly on raw
972       objects that contain data that’s easy to write in fast-import(1)
973       format:
974
975           --blob-callback
976           --commit-callback
977           --tag-callback
978           --reset-callback
979
980
981       We’ll come back to these later because it is often the case that the
982       other callbacks are more convenient. The other callbacks operate on a
983       small piece of the raw objects or operate on pieces across multiple
984       types of raw object (e.g. author names and committer names and tagger
985       names across commits and tags, or refnames across commits, tags, and
986       resets, or messages across commits and tags). The convenience callbacks
987       are:
988
989           --filename-callback
990           --message-callback
991           --name-callback
992           --email-callback
993           --refname-callback
994
995
996       in each you are expected to simply return a new value based on the one
997       passed in. For example,
998
999           git-filter-repo --name-callback 'return name.replace(b"Wiliam", b"William")'
1000
1001
1002       would result in the following function being called:
1003
1004           def name_callback(name):
1005             return name.replace(b"Wiliam", b"William")
1006
1007
1008       The email callback is quite similar:
1009
1010           git-filter-repo --email-callback 'return email.replace(b".cm", b".com")'
1011
1012
1013       The refname callback is also similar, but note that the refname passed
1014       in and returned are expected to be fully qualified (e.g.
1015       b"refs/heads/master" instead of just b"master" and b"refs/tags/v1.0.7"
1016       instead of b"1.0.7"):
1017
1018           git-filter-repo --refname-callback '
1019             # Change e.g. refs/heads/master to refs/heads/prefix-master
1020             rdir,rpath = os.path.split(refname)
1021             return rdir + b"/prefix-" + rpath'
1022
1023
1024       The message callback is quite similar to the previous three callbacks,
1025       though it operates on a bytestring that is likely more than one line:
1026
1027           git-filter-repo --message-callback '
1028             if b"Signed-off-by:" not in message:
1029               message += b"\nSigned-off-by: Me My <self@and.eye>"
1030             return re.sub(b"[Ee]-?[Mm][Aa][Ii][Ll]", b"email", message)'
1031
1032
1033       The filename callback is slightly more interesting. Returning None
1034       means the file should be removed from all commits, returning the
1035       filename unmodified marks the file to be kept, and returning a
1036       different name means the file should be renamed. An example:
1037
1038           git-filter-repo --filename-callback '
1039             if b"/src/" in filename:
1040               # Remove all files with a directory named "src" in their path
1041               # (except when "src" appears at the toplevel).
1042               return None
1043             elif filename.startswith(b"tools/"):
1044               # Rename tools/ -> scripts/misc/
1045               return b"scripts/misc/" + filename[6:]
1046             else:
1047               # Keep the filename and do not rename it
1048               return filename
1049             '
1050
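       Since each callback body is ordinary Python, it can be pasted into
       a normal function and exercised on sample bytestrings before it is
       used in a rewrite. The sketch below does that for the refname,
       message, and filename bodies shown above; the sample inputs are
       made up.

```python
import os
import re

def refname_callback(refname):
    # Change e.g. refs/heads/master to refs/heads/prefix-master
    rdir, rpath = os.path.split(refname)
    return rdir + b"/prefix-" + rpath

def message_callback(message):
    if b"Signed-off-by:" not in message:
        message += b"\nSigned-off-by: Me My <self@and.eye>"
    return re.sub(b"[Ee]-?[Mm][Aa][Ii][Ll]", b"email", message)

def filename_callback(filename):
    if b"/src/" in filename:
        return None                      # remove the file
    elif filename.startswith(b"tools/"):
        return b"scripts/misc/" + filename[6:]
    else:
        return filename                  # keep, do not rename

print(refname_callback(b"refs/heads/master"))
print(message_callback(b"Fix E-mail parsing"))
for sample in (b"lib/src/x.c", b"src/x.c", b"tools/run.sh"):
    print(sample, "->", filename_callback(sample))
```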
1051
1052       In contrast, the blob, reset, tag, and commit callbacks are not
1053       expected to return a value, but are instead expected to modify the
1054       object passed in. Major fields for these objects are (subject to API
1055       backward compatibility caveats mentioned previously):
1056
1057       •   Blob: original_id (original hash) and data
1058
1059       •   Reset: ref (name of reference) and from_ref (hash or integer mark)
1060
1061       •   Tag: ref, from_ref, original_id, tagger_name, tagger_email,
1062           tagger_date, message
1063
1064       •   Commit: branch, original_id, author_name, author_email,
1065           author_date, committer_name, committer_email, committer_date,
1066           message, file_changes (list of FileChange objects, each containing
1067           a type, filename, mode, and blob_id), parents (list of hashes or
1068           integer marks)
1069
1070       An example of each:
1071
1072           git filter-repo --blob-callback '
1073             if len(blob.data) > 25:
1074               # Mark this blob for removal from all commits
1075               blob.skip()
1076             else:
1077               blob.data = blob.data.replace(b"Hello", b"Goodbye")
1078             '
1079
1080
1081
1082           git filter-repo --reset-callback 'reset.ref = reset.ref.replace(b"master", b"dev")'
1083
1084
1085
1086           git filter-repo --tag-callback '
1087             if tag.tagger_name == b"Jim Williams":
1088               # Omit this tag
1089               tag.skip()
1090             else:
1091               tag.message = tag.message + b"\n\nTag of %s by %s on %s" % (tag.ref, tag.tagger_email, tag.tagger_date)'
1092
1093
1094
1095           git filter-repo --commit-callback '
1096             # Remove executable files with three 6s in their name (including
1097             # from leading directories).
1098             # Also, undo deletion of sources/foo/bar.txt (change types are
1099             # either b"D" (deletion) or b"M" (add or modify); renames are
1100             # handled by deleting the old file and adding a new one)
1101             commit.file_changes = [
1102                    change for change in commit.file_changes
1103                    if not (change.mode == b"100755" and
1104                            change.filename.count(b"6") == 3) and
1105                       not (change.type == b"D" and
1106                            change.filename == b"sources/foo/bar.txt")]
1107             # Mark all .sh files as executable; modes in git are always one of
1108             # 100644 (normal file), 100755 (executable), 120000 (symlink), or
1109             # 160000 (submodule)
1110             for change in commit.file_changes:
1111               if change.filename.endswith(b".sh"):
1112                 change.mode = b"100755"
1113             '
1114
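       The commit callback logic can also be tried outside of a rewrite
       by running it against stand-in objects. The FileChange and Commit
       classes below are hypothetical minimal substitutes that carry only
       the fields used here; they are not filter-repo's real classes.

```python
from dataclasses import dataclass, field

@dataclass
class FileChange:            # hypothetical stand-in
    type: bytes              # b"M" (add or modify) or b"D" (deletion)
    filename: bytes
    mode: bytes = b"100644"

@dataclass
class Commit:                # hypothetical stand-in
    file_changes: list = field(default_factory=list)

def commit_callback(commit):
    # Same body as the --commit-callback example above.
    commit.file_changes = [
        change for change in commit.file_changes
        if not (change.mode == b"100755" and
                change.filename.count(b"6") == 3) and
           not (change.type == b"D" and
                change.filename == b"sources/foo/bar.txt")]
    for change in commit.file_changes:
        if change.filename.endswith(b".sh"):
            change.mode = b"100755"

commit = Commit([
    FileChange(b"M", b"bin/666", b"100755"),   # executable with three 6s
    FileChange(b"D", b"sources/foo/bar.txt"),  # deletion to be undone
    FileChange(b"M", b"run.sh"),               # becomes executable
    FileChange(b"M", b"README.md"),            # left alone
])
commit_callback(commit)
remaining = [(c.filename, c.mode) for c in commit.file_changes]
print(remaining)
```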
1115

INTERNALS

1117       You probably don’t need to read this section unless you are just very
1118       curious or you are trying to do a very complex history rewrite.
1119
1120   How filter-repo works
1121       Roughly, filter-repo works by running
1122
1123           git fast-export <options> | filter | git fast-import <options>
1124
1125
1126       where filter-repo not only launches the whole pipeline but also serves
1127       as the filter in the middle. However, filter-repo does a few additional
1128       things on top in order to make it into a well-rounded filtering tool. A
1129       sequence that more accurately reflects what filter-repo runs is:
1130
1131        1. Verify we’re in a fresh clone
1132
1133        2. git fetch -u . refs/remotes/origin/*:refs/heads/*
1134
1135        3. git remote rm origin
1136
1137        4. git fast-export --show-original-ids --reference-excluded-parents
1138           --fake-missing-tagger --signed-tags=strip
1139           --tag-of-filtered-object=rewrite --use-done-feature --no-data
1140           --reencode=yes --mark-tags --all | filter | git -c
1141           core.ignorecase=false fast-import --date-format=raw-permissive
1142           --force --quiet
1143
1144        5. git update-ref --no-deref --stdin, fed with a list of refs to nuke,
1145           and a list of replace refs to delete, create, or update.
1146
1147        6. git reset --hard
1148
1149        7. git reflog expire --expire=now --all
1150
1151        8. git gc --prune=now
1152
1153       Some notes or exceptions on each of the above:
1154
1155        1. If we’re not in a fresh clone, users will not be able to recover if
1156           they used the wrong command or ran in the wrong repo. (Though
1157           --force overrides this check, and it’s also off if you’ve already
1158           run filter-repo once in this repo.)
1159
1160        2. Technically, we actually use a git update-ref command fed with a
1161           lot of input due to the fact that users can use --force when local
1162           branches might not match remote branches. But this fetch command
1163           catches the intent rather succinctly.
1164
1165        3. We don’t want users accidentally pushing back to the original repo,
1166           as discussed in the section called “DISCUSSION”. It also reminds
1167           users that since history has been rewritten, this repo is no longer
1168           compatible with the original. Finally, another minor benefit is
1169           this allows users to push with the --mirror option to their new
1170           home without accidentally sending remote tracking branches.
1171
1172        4. Some of these flags are always used but others are actually
1173           conditional. For example, filter-repo’s --replace-text and
1174           --blob-callback options need to work on blobs so --no-data cannot
1175           be passed to fast-export. But when we don’t need to work on blobs,
1176           passing --no-data speeds things up. Also, other flags may change
1177           the structure of the pipeline as well (e.g. --dry-run and --debug).
1178
1179        5. We use this step to write replace refs for accessing the newly
1180           written commit hashes using their previous names. Also, if refs
1181           were renamed by various steps, we need to delete the old refnames
1182           in order to avoid mixing old and new history.
1183
1184        6. Users also have old versions of files in their working tree and
1185           index; we want those cleaned up to match the rewritten history as
1186           well. Note that this step is skipped in bare repos.
1187
1188        7. Reflogs will hold on to old history, so we need to expire them.
1189
1190        8. We need to gc to avoid mixing new and old history. Also, it shrinks
1191           the repository for users, so they don’t have to do extra work.
1192           (Odds are that they’ve only rewritten trees and commits and maybe a
1193           few blobs, so --aggressive isn’t needed and would be too slow.)
1194
1195       Information about these steps is printed out when --debug is passed to
1196       filter-repo. When doing a --partial history rewrite, steps 2, 3, 7, and
1197       8 are unconditionally skipped, step 5 is skipped if --replace-refs is
1198       update-no-add, and just the nuke-unused-refs portion of step 5 is
1199       skipped if --replace-refs is something else.
1200
1201   Limitations
1202       Inherited limitations
1203           Since git filter-repo calls fast-export and fast-import to do a lot
1204           of the heavy lifting, it inherits limitations from those systems:
1205
1206           •   extended commit headers, if any, are stripped
1207
1208           •   commits get rewritten meaning they will have new hashes;
1209               therefore, signatures on commits and tags cannot continue to
1210               work and instead are just removed (thus signed tags become
1211               annotated tags)
1212
1213           •   tags of commits are supported. Prior to git-2.24.0, tags of
1214               blobs and tags of tags are not supported (fast-export would die
1215               on such tags). tags of trees are not supported in any git
1216               version (since fast-export ignores tags of trees with a warning
1217               and fast-import provides no way to import them).
1218
1219           •   annotated and signed tags outside of the refs/tags/ namespace
1220               are not supported (their location will be mangled in weird
1221               ways)
1222
1223           •   fast-import will die on various forms of invalid input, such as
1224               a timezone with more than four digits
1225
1226           •   fast-export cannot reencode commit messages into UTF-8 if the
1227               commit message is not valid in its specified encoding (in such
1228               cases, it’ll leave the commit message and the encoding header
1229               alone).
1230
1231           •   commits without an author will be given one matching the
1232               committer
1233
1234           •   tags without a tagger will be given a fake tagger
1235
1236           •   references that include commit cycles in their history (which
1237               can be created with git-replace(1)) will not be flagged to the
1238               user as an error but will be silently deleted by fast-export as
1239               though the branch or tag contained no interesting files
1240
1241           There are also some limitations due to the design of these systems:
1242
1243           •   Trying to insert additional files into the stream can be
1244               tricky; since fast-export only lists file changes in a merge
1245               relative to its first parent, if you insert additional files
1246               into a commit that is in the second (or third or fourth) parent
1247               history of a merge, then you also need to add it to the merge
1248               manually. (Similarly, if you change which parent is the first
1249               parent in a merge commit, you need to manually update the list
1250               of file changes to be relative to the new first parent.)
1251
1252           •   fast-export and fast-import work with exact file contents, not
1253               patches. (e.g. "Whatever the current contents of this file,
1254               update them to now have these contents") Because of this,
1255               removing the changes made in a single commit or inserting
1256               additional changes to a file in some commit and expecting them
1257               to propagate forward is not something that can be done with
1258               these tools. Use git-rebase(1) for that.
1259
1260       Intrinsic limitations
1261           Some types of filtering have limitations that would affect any tool
1262           attempting to perform them; the most any tool can do is attempt to
1263           notify the user when it detects an issue:
1264
1265           •   When rewriting commit hashes in commit messages, there are a
1266               variety of cases when the hash will not be updated (whenever
1267               this happens, a note is written to
1268               .git/filter-repo/suboptimal-issues):
1269
1270               •   if a commit hash does not correspond to a commit in the old
1271                   repo
1272
1273               •   if a commit hash corresponds to a commit that gets pruned
1274
1275               •   if an abbreviated hash is not unique
1276
1277           •   Pruning of empty commits can cause a merge commit to lose an
1278               entire ancestry line and become a non-merge. If the merge
1279               commit had no changes then it can be pruned too, but if it
1280               still has changes it needs to be kept. This might cause minor
1281               confusion since the commit will likely have a commit message
1282               that makes it sound like a merge commit even though it’s not.
1283               (Whenever a merge commit becomes a non-merge commit, a note is
1284               written to .git/filter-repo/suboptimal-issues)
1285
1286       Issues specific to filter-repo
1287           •   Multiple repositories in the wild have been observed which use
1288               a bogus timezone (+051800); google will find you some reports.
1289               The intended timezone wasn’t clear or wasn’t always the same.
1290               filter-repo replaces it with a different bogus timezone that
1291               fast-import will accept (+0261).
1292
1293           •   --path-rename can result in pathname collisions; to avoid
1294               excessive memory requirements of tracking which files are in
1295               all commits or looking up what files exist with either every
1296               commit or every usage of --path-rename, we just tell the user
1297               that they might clobber other changes if they aren’t careful.
1298               We can check if the clobbering comes from another --path-rename
1299               without much overhead. (Perhaps in the future it’s worth adding
1300               a slow mode to --path-rename that will do the more exhaustive
1301               checks?)
1302
1303           •   There is no mechanism for directly controlling which flags are
1304               passed to fast-export (or fast-import); only pre-defined flags
1305               can be turned on or off as a side-effect of other options.
1306               Direct control would make little sense because some options
1307               like --full-tree would require additional code in filter-repo
1308               (to parse new directives), and others such as -M or -C would
1309               break assumptions used in other places of filter-repo.
1310
1311           •   Partial-repo filtering, while supported, runs counter to
1312               filter-repo’s "avoid mixing old and new history" design. This
1313               support has required improvements to core git as well (e.g. it
1314               depends upon the --reference-excluded-parents option to
1315               fast-export that was added specifically for this usage within
1316               filter-repo). The --partial and --refs options will continue to
1317               be supported since there are people with use cases for them;
1318               however, I am concerned that this inconsistency about mixing
1319               old and new history seems likely to lead to user mistakes. For
1320               now, I just hope that long explanations of caveats in the
1321               documentation of these options suffice to curtail any such
1322               problems.
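
To make the --path-rename collision issue above concrete, here is a minimal sketch of the kind of exhaustive per-commit check that filter-repo deliberately avoids. It is not filter-repo's code: the helper names are hypothetical, and it assumes, as a simplification, that each rename is an (old_prefix, new_prefix) pair and that only the first matching prefix is applied to a path.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def apply_renames(path: str, renames: List[Tuple[str, str]]) -> str:
    """Apply the first matching (old_prefix, new_prefix) rename to a path."""
    for old, new in renames:
        if path.startswith(old):
            return new + path[len(old):]
    return path


def find_collisions(paths: Iterable[str],
                    renames: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return each renamed path that two or more original paths map onto."""
    targets: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        targets[apply_renames(path, renames)].append(path)
    return {new: olds for new, olds in targets.items() if len(olds) > 1}


# Hypothetical file list for a single commit.
commit_paths = ["src/a.c", "lib/a.c", "README"]
collisions = find_collisions(commit_paths, [("src/", "lib/")])
```

Here `src/a.c` is renamed onto the existing `lib/a.c`, so one of the two files would clobber the other. Running such a check needs the full file list of every commit, which is the memory and lookup cost the text above describes.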
1323
1324       Comments on reversibility
1325           Some people are interested in reversibility of a rewrite; e.g.
1326           rewrite history, possibly add some commits, then unrewrite and get
1327           the original history back plus a few new "unrewritten" commits.
1328           Obviously this is impossible if your rewrite involves throwing away
1329           information (e.g. filtering out files or replacing several
1330           different strings with ***REMOVED***), but may be possible with
1331           some rewrites. filter-repo is likely to be a poor fit for this type
1332           of workflow for a few reasons:
1333
1334           •   most of the limitations inherited from fast-export and
1335               fast-import are of a type that cause reversibility issues
1336
1337           •   grafts and replace refs, if present, are used in the rewrite
1338               and made permanent
1339
1340           •   rewriting of commit hashes will probably be reversible, but it
1341               is possible for rewritten abbreviated hashes to not be unique
1342               even if the original abbreviated hashes were.
1343
1344           •   filter-repo defaults to several forms of irreversible rewriting
1345               that you may need to turn off (e.g. the last two bullet points
1346               above or reencoding commit messages into UTF-8); it’s possible
1347               that additional forms of irreversible rewrites will be added in
1348               the future.
1349
1350           •   I assume that people use filter-repo for one-shot conversions,
1351               not ongoing data transfers. I explicitly reserve the right to
1352               change any API in filter-repo based on this presumption (and a
1353               comment to this effect is found in multiple places in the code
1354               and examples). You have been warned.
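
The abbreviated-hash point in the list above can be illustrated with a small sketch. The hash strings below are made up and shortened for readability, and `abbreviations_unique` is a hypothetical helper, not part of filter-repo.

```python
from typing import Iterable


def abbreviations_unique(hashes: Iterable[str], length: int) -> bool:
    """Return True if every hash is still distinct when cut to `length` chars."""
    prefixes = [h[:length] for h in hashes]
    return len(prefixes) == len(set(prefixes))


# Made-up example values: distinct full hashes before and after a rewrite.
original = ["1a2b3c4d5e", "9f8e7d6c5b"]
rewritten = ["4c0ffee123", "4c0ffee999"]

unique_before = abbreviations_unique(original, 7)
unique_after = abbreviations_unique(rewritten, 7)
```

In this example the original hashes are unambiguous at seven characters, while the rewritten ones share the prefix `4c0ffee`, so a seven-character abbreviation found in a commit message can no longer be mapped back to a single commit.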
1355

SEE ALSO

1357       git-rebase(1), git-filter-branch(1)
1358

GIT

1360       Part of the git(1) suite
1361

NOTES

1363        1. GitLab’s excellent docs on reducing repository size
1364           https://docs.gitlab.com/ee/user/project/repository/reducing_the_repo_size_using_git.html
1365
1366        2. GitHub’s otherwise dangerously out-of-date docs on removing
1367           sensitive data
1368           https://docs.github.com/en/github/authenticating-to-github/removing-sensitive-data-from-a-repository
1369
1370
1371
1372Git 2.34.0.dirty                  11/15/2021                GIT-FILTER-REPO(1)