GIT-FILTER-REPO(1) Git Manual GIT-FILTER-REPO(1)

NAME
git-filter-repo - Rewrite repository history

SYNOPSIS
git filter-repo --analyze
git filter-repo [<path_filtering_options>] [<content_filtering_options>]
[<ref_renaming_options>] [<commit_message_filtering_options>]
[<name_or_email_filtering_options>] [<parent_rewriting_options>]
[<generic_callback_options>] [<miscellaneous_options>]

DESCRIPTION
Rapidly rewrite entire repository history using user-specified filters.
This is a destructive operation which should not be used lightly; it
writes new commits, trees, tags, and blobs corresponding to (but
filtered from) the original objects in the repository, then deletes the
original history and leaves only the new. See the section called
“DISCUSSION” for more details on the ramifications of using this tool.
Several different types of history rewrites are possible; examples
include (but are not limited to):
25
26 • stripping large files (or large directories or large extensions)
27
28 • stripping unwanted files by path
29
30 • extracting wanted paths and their history (stripping everything
31 else)
32
33 • restructuring the file layout (such as moving all files into a
34 subdirectory in preparation for merging with another repo, making a
35 subdirectory become the new toplevel directory, or merging two
36 directories with independent filenames into one directory)
37
38 • renaming tags (also often in preparation for merging with another
39 repo)
40
41 • replacing or removing sensitive text such as passwords
42
43 • making mailmap rewriting of user names or emails permanent
44
45 • making grafts or replacement refs permanent
46
47 • rewriting commit messages
48
49 Additionally, several concerns are handled automatically (many of these
50 can be overridden, but they are all on by default):
51
52 • rewriting (possibly abbreviated) hashes in commit messages to refer
53 to the new post-rewrite commit hashes
54
55 • pruning commits which become empty due to the above filters (also
56 handles edge cases like pruning of merge commits which become
57 degenerate and empty)
58
59 • creating replace-refs (see git-replace(1)) for old commit hashes,
60 which if manually pushed and fetched will allow users to continue
61 to refer to new commits using (unabbreviated) old commit IDs
62
63 • stripping of original history to avoid mixing old and new history
64
65 • repacking the repository post-rewrite to shrink the repo for the
66 user
67
68 Also, it’s worth noting that there is an important safety mechanism:
69
70 • abort if run from a repo that is not a fresh clone (to prevent
71 accidental data loss from rewriting local history that doesn’t
72 exist anywhere else). See the section called “FRESH CLONE SAFETY
73 CHECK AND --FORCE”.
74
75 For those who know that there is large unwanted stuff in their history
76 and want help finding it, this command also
77
78 • provides an option to analyze a repository and generate reports
79 that can be useful in determining what to filter (or in determining
80 whether a separate filtering command was successful).
81
82 See also the section called “VERSATILITY”, the section called
83 “DISCUSSION”, the section called “EXAMPLES”, and the section called
84 “INTERNALS”.
85
OPTIONS
Analysis Options
88 --analyze
89 Analyze repository history and create a report that may be useful
90 in determining what to filter in a subsequent run (or in
91 determining if a previous filtering command did what you wanted).
92 Will not modify your repo.
93
94 Filtering based on paths (see also --filename-callback)
95 These options specify the paths to select. Note that much like git
96 itself, renames are NOT followed so you may need to specify multiple
97 paths, e.g. --path olddir/ --path newdir/
98
99 --invert-paths
100 Invert the selection of files from the specified
101 --path-{match,glob,regex} options below, i.e. only select files
102 matching none of those options.
103
104 --path-match <dir_or_file>, --path <dir_or_file>
105 Exact paths (files or directories) to include in filtered history.
106 Multiple --path options can be specified to get a union of paths.
107
108 --path-glob <glob>
109 Glob of paths to include in filtered history. Multiple --path-glob
110 options can be specified to get a union of paths.
111
112 --path-regex <regex>
113 Regex of paths to include in filtered history. Multiple
114 --path-regex options can be specified to get a union of paths.
115
116 --use-base-name
117 Match on file base name instead of full path from the top of the
118 repo. Incompatible with --path-rename, and incompatible with
119 matching against directory names.
120
121 Renaming based on paths (see also --filename-callback)
122 Note: if you combine path filtering with path renaming, be aware that a
123 rename directive does not select paths, it only says how to rename
124 paths that are selected with the filters.
125
126 --path-rename <old_name:new_name>, --path-rename-match
127 <old_name:new_name>
128 Path to rename; if filename or directory matches <old_name> rename
129 to <new_name>. Multiple --path-rename options can be specified.
130
131 Path shortcuts
132 --paths-from-file <filename>
133 Specify several path filtering and renaming directives, one per
134 line. Lines with ==> in them specify path renames, and lines can
135 begin with literal: (the default), glob:, or regex: to specify
136 different matching styles. Blank lines and lines starting with a #
137 are ignored (if you have a filename that you want to filter on that
138 starts with literal:, #, glob:, or regex:, then prefix the line
139 with literal:).
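As a rough illustration of the line format described above, the following Python sketch classifies the lines of such a file. The function name parse_paths_line and the tuple it returns are hypothetical; this is not filter-repo's own parser and only mirrors the rules stated in the text.

```python
def parse_paths_line(line):
    """Classify one line of a --paths-from-file style file (illustrative only).

    Returns None for ignored lines, else (style, pattern, replacement),
    where replacement is None unless the line contains '==>'.
    """
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None  # blank lines and comment lines are ignored
    style = "literal"  # the default matching style
    for prefix in ("literal:", "glob:", "regex:"):
        if line.startswith(prefix):
            # Strip the prefix and remember which matching style it named.
            style, line = prefix[:-1], line[len(prefix):]
            break
    pattern, sep, replacement = line.partition("==>")
    return (style, pattern, replacement if sep else None)
```

A line such as literal:#odd is therefore treated as the literal path #odd rather than as a comment.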
140
141 --subdirectory-filter <directory>
142 Only look at history that touches the given subdirectory and treat
143 that directory as the project root. Equivalent to using --path
144 <directory>/ --path-rename <directory>/:
145
146 --to-subdirectory-filter <directory>
147 Treat the project root as instead being under <directory>.
148 Equivalent to using --path-rename :<directory>/
149
150 Content editing filters (see also --blob-callback)
--replace-text <expressions_file>
A file with expressions that, if found, will be replaced. By
default, each expression is treated as literal text, but regex: and
glob: prefixes are supported. You can end the line with ==> and
some replacement text to choose a replacement other than the
default of ***REMOVED***.
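To make the expressions-file semantics more concrete, here is a minimal Python sketch of how such lines could be applied to a piece of text. The function apply_expressions is hypothetical and only approximates the behavior described above; it handles literal and regex: expressions and leaves out glob: for brevity.

```python
import re

DEFAULT_REPLACEMENT = "***REMOVED***"

def apply_expressions(text, expression_lines):
    """Apply literal and regex: expressions to text (glob: not handled here)."""
    for line in expression_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # An optional '==>' separates the expression from its replacement.
        expr, sep, replacement = line.partition("==>")
        if not sep:
            replacement = DEFAULT_REPLACEMENT
        if expr.startswith("regex:"):
            text = re.sub(expr[len("regex:"):], replacement, text)
        else:
            text = text.replace(expr, replacement)
    return text
```

For instance, an expressions file containing the single line hunter2 would turn pw=hunter2 into pw=***REMOVED***.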
157
158 --strip-blobs-bigger-than <size>
159 Strip blobs (files) bigger than specified size (e.g. 5M, 2G, etc)
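The size argument can be pictured with a small Python sketch. The helper parse_size is hypothetical, and the use of binary multiples (1K = 1024 bytes) is an assumption, since the text above only gives 5M and 2G as examples.

```python
def parse_size(size):
    """Convert a size such as '5M' or '2G' to a byte count (illustrative)."""
    # Assumption: suffixes denote binary multiples.
    multipliers = {"K": 1024, "M": 1024**2, "G": 1024**3}
    suffix = size[-1].upper()
    if suffix in multipliers:
        return int(size[:-1]) * multipliers[suffix]
    return int(size)  # no suffix: a plain byte count
```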
160
161 --strip-blobs-with-ids <blob_id_filename>
162 Read git object ids from each line of the given file, and strip all
163 of them from history
164
165 Renaming of refs (see also --refname-callback)
166 --tag-rename <old:new>
167 Rename tags starting with <old> to start with <new>. For example,
168 --tag-rename foo:bar will rename tag foo-1.2.3 to bar-1.2.3; either
169 <old> or <new> can be empty.
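The prefix replacement performed by --tag-rename can be sketched in Python as follows. The function rename_tag is a hypothetical helper written for illustration, not part of filter-repo's interface.

```python
def rename_tag(refname, old, new):
    """Rename refs/tags/<old>* to refs/tags/<new>*; leave other refs alone."""
    prefix = "refs/tags/" + old
    if refname.startswith(prefix):
        return "refs/tags/" + new + refname[len(prefix):]
    return refname
```

With an empty <old>, every tag gains the <new> prefix; with an empty <new>, the <old> prefix is simply removed.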
170
171 Filtering of commit messages (see also --message-callback)
172 --replace-message <expressions_file>
173 A file with expressions that, if found in commit or tag messages,
174 will be replaced. This file uses the same syntax as --replace-text.
175
176 --preserve-commit-hashes
177 By default, since commits are rewritten and thus gain new hashes,
178 references to old commit hashes in commit messages are replaced
179 with new commit hashes (abbreviated to the same length as the old
180 reference). Use this flag to turn off updating commit hashes in
181 commit messages.
182
183 --preserve-commit-encoding
184 Do not reencode commit messages into UTF-8. By default, if the
185 commit object specifies an encoding for the commit message, the
186 message is re-encoded into UTF-8.
187
188 Filtering of names & emails (see also --name-callback and --email-callback)
189 --mailmap <filename>
190 Use specified mailmap file (see git-shortlog(1) for details on the
191 format) when rewriting author, committer, and tagger names and
192 emails. If the specified file is part of git history, historical
193 versions of the file will be ignored; only the current contents are
194 consulted.
195
196 --use-mailmap
197 Same as: --mailmap .mailmap
198
199 Parent rewriting
200 --replace-refs {delete-no-add, delete-and-add, update-no-add,
201 update-or-add, update-and-add}
Replace refs (see git-replace(1)) are used to rewrite parents
(unless turned off by the usual git mechanism); this flag specifies
what to do with those refs afterward. Replace refs can either be
deleted or updated to point at new commit hashes. Also, new replace
refs can be added for each commit rewrite. With update-or-add, new
replace refs are only added for commit rewrites that aren’t used to
update an existing replace ref. The default is update-and-add if
$GIT_DIR/filter-repo/already_ran does not exist; update-or-add
otherwise.
211
212 --prune-empty {always, auto, never}
213 Whether to prune empty commits. auto (the default) means only
214 prune commits which become empty (not commits which were empty in
215 the original repo, unless their parent was pruned). When the parent
216 of a commit is pruned, the first non-pruned ancestor becomes the
217 new parent.
218
219 --prune-degenerate {always, auto, never}
220 Since merge commits are needed for history topology, they are
221 typically exempt from pruning. However, they can become degenerate
222 with the pruning of other commits (having fewer than two parents,
223 having one commit serve as both parents, or having one parent as
224 the ancestor of the other.) If such merge commits have no file
225 changes, they can be pruned. The default (auto) is to only prune
226 empty merge commits which become degenerate (not which started as
227 such).
228
229 --no-ff
Even if the first parent is or becomes an ancestor of another
parent, do not prune it. This modifies how --prune-degenerate
behaves, and may be useful in projects that always use merge
--no-ff.
234
235 Generic callback code snippets
236 --filename-callback <function_body>
237 Python code body for processing filenames; see the section called
238 “CALLBACKS”.
239
240 --message-callback <function_body>
241 Python code body for processing messages (both commit messages and
242 tag messages); see the section called “CALLBACKS”.
243
244 --name-callback <function_body>
245 Python code body for processing names of people; see the section
246 called “CALLBACKS”.
247
248 --email-callback <function_body>
Python code body for processing email addresses; see the section
called “CALLBACKS”.
251
252 --refname-callback <function_body>
253 Python code body for processing refnames; see the section called
254 “CALLBACKS”.
255
256 --blob-callback <function_body>
257 Python code body for processing blob objects; see the section
258 called “CALLBACKS”.
259
260 --commit-callback <function_body>
261 Python code body for processing commit objects; see the section
262 called “CALLBACKS”.
263
264 --tag-callback <function_body>
265 Python code body for processing tag objects; see the section called
266 “CALLBACKS”.
267
268 --reset-callback <function_body>
269 Python code body for processing reset objects; see the section
270 called “CALLBACKS”.
271
272 Location to filter from/to
273 Note
274 Specifying alternate source or target locations implies --partial
275 except that the normal default for --replace-refs is used. However,
276 unlike normal uses of --partial, this doesn’t risk mixing old and
277 new history since the old and new histories are in different
278 repositories.
279
280 --source <source>
281 Git repository to read from
282
283 --target <target>
284 Git repository to overwrite with filtered history
285
286 Miscellaneous options
287 --help, -h
288 Show a help message and exit.
289
290 --force, -f
291 Ignore fresh clone checks and rewrite history (an irreversible
292 operation, especially since it by default ends with an immediate
293 pruning of reflogs and old objects). See the section called “FRESH
294 CLONE SAFETY CHECK AND --FORCE”. Note that when cloning repos on a
295 local filesystem, it is better to pass --no-local to git clone than
296 passing --force to git-filter-repo.
297
298 --partial
Do a partial history rewrite, resulting in a mixture of old and
new history. This implies a default of update-no-add for
301 --replace-refs, disables rewriting refs/remotes/origin/* to
302 refs/heads/*, disables removing of the origin remote, disables
303 removing unexported refs, disables expiring the reflog, and
304 disables the automatic post-filter gc. Also, this modifies
305 --tag-rename and --refname-callback options such that instead of
306 replacing old refs with new refnames, it will instead create new
307 refs and keep the old ones around. Use with caution.
308
309 --refs <refs+>
310 Limit history rewriting to the specified refs. Implies --partial.
311 In addition to the normal caveats of --partial (mixing old and new
312 history, no automatic remapping of refs/remotes/origin/* to
313 refs/heads/*, etc.), this also may cause problems for pruning of
314 degenerate empty merge commits when negative revisions are
315 specified.
316
317 --dry-run
318 Do not change the repository. Run git fast-export and filter its
319 output, and save both the original and the filtered version for
320 comparison. This also disables rewriting commit messages due to not
321 knowing new commit IDs and disables filtering of some empty commits
322 due to inability to query the fast-import backend.
323
324 --debug
325 Print additional information about operations being performed and
326 commands being run. (If used together with --dry-run, shows extra
327 information about what would be run).
328
329 --stdin
330 Instead of running git fast-export and filtering its output, filter
331 the fast-export stream from stdin. The stdin must be in the
332 expected input format (e.g. it needs to include original-oid
333 directives).
334
335 --quiet
336 Pass --quiet to other git commands called.
337
OUTPUT
Every time filter-repo is run, files are created in the
.git/filter-repo/ directory. These files are overwritten
unconditionally on every run.
342
343 Commit map
344 The .git/filter-repo/commit-map file contains a mapping of how all
345 commits were (or were not) changed.
346
347 • A header is the first line with the text "old" and "new"
348
349 • Commit mappings are in no particular order
350
351 • All commits in range of the rewrite will be listed, even commits
352 that are unchanged (e.g. because the commit pre-dated when the
353 large file(s) were introduced to the repo).
354
355 • An all-zeros hash, or null SHA, represents a non-existent object.
356 When in the "new" column, this means the commit was removed
357 entirely.
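The commit map lends itself to simple scripting. The Python sketch below reads lines in the format described above and lists the commits that were removed entirely. The function names are hypothetical, and the 40-character null hash assumes SHA-1 object names.

```python
NULL_SHA = "0" * 40  # assumes SHA-1 object names

def read_commit_map(lines):
    """Return {old_hash: new_hash} from commit-map lines, skipping the header."""
    mapping = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2 or fields == ["old", "new"]:
            continue  # header or malformed line
        mapping[fields[0]] = fields[1]
    return mapping

def removed_commits(mapping):
    """Old hashes whose "new" column is the all-zeros hash."""
    return sorted(old for old, new in mapping.items() if new == NULL_SHA)
```

In practice the lines would come from opening .git/filter-repo/commit-map; since mappings are in no particular order, the sketch sorts its result.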
358
359 Reference map
360 The .git/filter-repo/ref-map file contains a mapping of which local
361 references were changed.
362
363 • A header is the first line with the text "old", "new" and "ref"
364
365 • Reference mappings are in no particular order
366
367 • An all-zeros hash, or null SHA, represents a non-existent object.
368 When in the "new" column, this means the ref was removed entirely.
369
FRESH CLONE SAFETY CHECK AND --FORCE
Since filter-repo does irreversible rewriting of history, it is
important to avoid making changes to a repo for which the user doesn’t
have a good backup. The primary defense mechanism is to simply educate
users and rely on them to be good stewards of their data; thus there
are several warnings in the documentation about how filter-repo
rewrites history.
377
378 However, as a service to users, we would like to provide an additional
379 safety check beyond the documentation. There isn’t a good way to check
380 if the user has a good backup, but we can ask a related question that
381 is an imperfect but quite reasonable proxy: "Is this repository a fresh
382 clone?" Unfortunately, that is also a question we can’t get a perfect
383 answer to; git provides no way to answer that question. However, there
384 are approximately a dozen things that I found that seem to always be
385 true of brand new clones (assuming they are either clones of remote
386 repositories or are made with the --no-local flag), and I check for all
387 of those.
388
389 These checks can have both false positives and false negatives. Someone
390 might have a perfectly good backup of their repo without it actually
391 being a fresh clone — but there’s no way for filter-repo to know that.
392 Conversely, someone could look at all things that filter-repo checks
393 for in its safety checks and then just tweak their non-backed-up
394 repository to satisfy those conditions (though it would take a fair
395 amount of effort, and it’s astronomically unlikely that a repo that
396 isn’t a fresh clone randomly happens to match all the criteria). In
397 practice, the safety checks filter-repo uses seem to be really good at
398 avoiding people accidentally running filter-repo on a repository that
399 they shouldn’t be running it on. It even caught me once when I did mean
400 to run filter-repo but was in a different directory than I thought I
401 was.
402
In short, it’s perfectly fine to use --force to override the safety
checks as long as you’re okay with filter-repo irreversibly rewriting
the contents of the current repository. It is a really bad idea to get
in the habit of always specifying --force; if you do, one day you will
run one of your commands in the wrong directory like I did, and you
won’t have the safety check anymore to bail you out. Also, it is
definitely NOT okay to recommend --force on forums, Q&A sites, or in
emails to other users without first carefully explaining that --force
means putting your repositories’ data at risk. I am especially bothered
by people who suggest the flag when it clearly is NOT needed; they are
needlessly putting other people’s data at risk.
414
VERSATILITY
filter-repo has a hierarchy of capabilities on the spectrum from
easy-to-use convenience flags that perform pre-defined types of
filtering, to choices that provide lots of flexibility in controlling
how filtering occurs. This spectrum includes the following:
420
421 • Convenience flags making common types of history rewriting simple
422 (e.g. --path, --strip-blobs-bigger-than, --replace-text, --mailmap)
423
424 • Options which are shorthand for others or which provide greater
425 control than others (e.g. --subdirectory-filter could just be
426 written using both a path selection (--path) and a path rename
427 (--path-rename) filter; --paths-from-file can handle all other
428 --path* options and more such as regex renaming of paths)
429
430 • Generic python callbacks for handling a certain type of data (the
431 filename, message, name, email, and refname callbacks)
432
433 • Generic python callbacks for handling fundamental git objects,
434 allowing greater control over the combination of data types the
435 object holds (the commit, tag, blob, and reset callbacks)
436
437 • The ability to import filter-repo as a module in a python program
438 and use its classes and functions for even greater control and
439 flexibility while still leveraging lots of basic capabilities. One
440 can even use this to write new tools with a completely different
441 interface.
442
443 For more information about callbacks, see the section called
444 “CALLBACKS”. For examples on writing python programs that import
445 filter-repo as a module to create new history rewriting tools, look at
446 the contrib/filter-repo-demos/ directory. That directory includes,
447 among other examples, a reimplementation of git-filter-branch which is
448 faster than git-filter-branch, and a reimplementation of BFG Repo
449 Cleaner with several bug fixes and new features.
450
DISCUSSION
Using filter-repo is relatively simple, but rewriting history is part
of a larger discussion in terms of collaboration. When you rewrite
history, the old and new histories are no longer compatible; if you
push this history somewhere for others to view, it will look as though
you’ve done a rebase of all branches and tags. Make sure you are
familiar with the "RECOVERING FROM UPSTREAM REBASE" section of
git-rebase(1) (and in particular, "The hard case") before proceeding,
in addition to this section.
460
461 Steps to use git-filter-repo as part of the bigger picture of doing a
462 history rewrite are roughly as follows:
463
464 1. Create a clone of your repository (if you created special refs
465 outside of refs/heads/ or refs/tags/, make sure to fetch those
466 too). You may pass --bare or --mirror to git clone, if you prefer.
467 You should pass --no-local if the repository you are cloning from
468 is on the local filesystem. Avoid other flags; some might confuse
469 the fresh clone check, and others could cause parts of the data to
470 be missing that are needed for the rewrite.
471
472 2. (Optional) Run git filter-repo --analyze. This will create a
473 directory of reports mentioning renames that have occurred in your
474 repo and also listing sizes of objects aggregated by
475 path/directory/extension/blob-id; this information may be useful in
476 choosing how to filter your repo. It can also be useful to re-run
477 --analyze after filtering to verify the changes look correct.
478
479 3. Run filter-repo with your desired filtering options. Many examples
480 are given below. For more complex cases, note that doing the
481 filtering in multiple steps (by running multiple filter-repo
482 invocations in a sequence) is supported. If anything goes wrong
483 here, simply delete your clone and restart.
484
485 4. Push your new repository to its new home (note that
486 refs/remotes/origin/* will have been moved to refs/heads/* as the
487 first part of filter-repo, so you can just deal with normal
488 branches instead of remote tracking branches). While you can force
489 push this to the same URL you cloned from, there are good reasons
490 to consider pushing to a different location instead:
491
492 • People who cloned from the original repo will have old history.
493 When they fetch the new history you force pushed up, unless
494 they do a git reset --hard @{u} on their branches or rebase
495 their local work, git will think they have hundreds or
496 thousands of commits with very similar commit messages as what
497 exist upstream (but which include files you wanted excised from
498 history), and allow the user to merge the two histories,
499 resulting in what looks like two copies of each commit. If they
500 then push this history back up, then everyone now has history
501 with two copies of each commit and the bad files have returned.
502 You’re more likely to succeed in forcing people to get rid of
503 the old history if they have to clone a new URL.
504
505 • Rewriting history will rewrite tags; those who have already
506 downloaded tags will not get the updated tags by default (see
507 the "On Re-tagging" section of git-tag(1)). Every user trying
508 to use an existing clone will have to forcibly delete all tags
509 and re-fetch them; it may be easier for them to just re-clone,
510 which they are more likely to do with a new clone URL.
511
512 • Rewriting history may delete some refs (e.g. branches that only
513 had files that you wanted excised from history); unless you run
514 git push with the --mirror or --prune options, those refs will
515 continue to exist on the server. If folks then merge these
516 branches into others, then people have started mixing old and
517 new history. If users had already cloned these branches,
518 removing them from the server isn’t enough; you need all users
519 to delete any local branches based on these refs and run fetch
520 with the --prune option as well. Simply re-cloning from a new
521 URL is easier.
522
523 • The server may not allow you to force push over some refs. For
524 example, code review systems may have special ref namespaces
525 (e.g. refs/changes/, refs/pull/, refs/merge-requests/) that
526 they have locked down.
527
5. If you still want to push your rewritten history back to the
original URL despite my warnings above, you’ll have to manage it
very carefully:

• git-filter-repo deletes the "origin" remote to help avoid
people accidentally repushing to the same repository, so you’ll
need to remind git what origin’s URL was. You’ll have to look
up the command for that.
536
537 • You’ll need to carefully synchronize with everyone who has
538 cloned the repository, and will also need to carefully
539 synchronize with everything (e.g. CI systems) that has cloned
540 it. Every single clone will either need to be thrown away and
541 re-cloned, or need to take all the steps outlined in item 4 as
542 well as follow the necessary steps from "RECOVERING FROM
543 UPSTREAM REBASE" section of git-rebase(1). If you miss fixing
544 any clones, you’ll risk mixing old and new history and end up
545 with an even worse mess to clean up.
546
547 • Finally, you’ll need to consult any documentation from your
548 hosting provider about how to remove any server-side references
549 to the old commits (example: GitLab’s excellent docs on
550 reducing repository size[1], or just the warning box that
551 references "GitHub support" from GitHub’s otherwise dangerously
552 out-of-date docs on removing sensitive data[2]).
553
554 6. (Optional) Some additional considerations
555
• filter-repo by default creates replace refs (see
git-replace(1)) for each rewritten commit ID, allowing you to use
old (unabbreviated) commit hashes in the git command line to
559 refer to the newly rewritten commits. If you want to use these
560 replace refs, manually push them to the relevant clone URL and
561 tell users to manually fetch them (e.g. by adjusting their
562 fetch refspec, git config --add remote.origin.fetch
563 +refs/replace/*:refs/replace/*). Sadly, replace refs are not
564 yet widely understood; projects like jgit and libgit2 do not
565 support them and existing repository managers (e.g. Gerrit,
566 GitHub, GitLab) do not yet understand replace refs. Thus one
567 can’t use old commit hashes within the UI of these other
568 systems. This may change in the future, but replace refs at
569 least help users locally within the git command line interface.
570 Also, be aware that commit-graphs are excessively cautious
571 around replace refs and just turn off entirely if any are
572 present, so after enough time has passed that old commit IDs
573 become less relevant, users may want to locally delete the
574 replace refs to regain the speedups from commit-graphs.
575
• If you have a central repo, you may want to prevent people from
pushing old commit IDs, in order to avoid mixing old and new
history. Every repository manager does this differently: some
provide specialized commands (e.g.
https://gerrit-review.googlesource.com/Documentation/cmd-ban-commit.html),
while others require you to write hooks.
582
EXAMPLES
Path based filtering
585 To only keep the README.md file plus the directories guides and
586 tools/releases/:
587
    git filter-repo --path README.md --path guides/ --path tools/releases
589
590
591 Directory names can be given with or without a trailing slash, and all
592 filenames are relative to the toplevel of the repo. To keep all files
593 except these paths, just add --invert-paths:
594
    git filter-repo --path README.md --path guides/ --path tools/releases --invert-paths
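
The selection logic of the two commands above can be approximated with a short Python sketch. The function path_selected is hypothetical and simplified: it treats each --path argument as either an exact file name or a directory prefix, with or without a trailing slash, and flips the result for --invert-paths.

```python
def path_selected(path, wanted, invert=False):
    """True if path is kept, given --path style arguments (illustrative)."""
    matched = any(
        # Exact match, or anything underneath the named directory.
        path == w.rstrip("/") or path.startswith(w.rstrip("/") + "/")
        for w in wanted
    )
    return matched != invert
```

Note that a directory argument such as tools/releases does not select a sibling like tools/releases-old/ in this sketch, because the match is made on whole path components.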
596
597
598 If you want to have both an inclusion filter and an exclusion filter,
599 just run filter-repo multiple times. For example, to keep the src/main
600 subdirectory but exclude files under src/main named data, run:
601
    git filter-repo --path src/main/
    git filter-repo --path-glob 'src/*/data' --invert-paths
604
605
606 Note that the asterisk (*) will match across multiple directories, so
607 the second command would remove e.g. src/main/org/whatever/data. Also,
608 the second command by itself would also remove e.g.
609 src/not-main/foo/data, but since src/not-main/ was removed by the first
610 command, that’s not an issue. Also, the use of quotes around the
611 asterisk is sometimes important to avoid glob expansion by the shell.
612
613 You can also select paths by regular expression (see
614 https://docs.python.org/3/library/re.html#regular-expression-syntax).
615 For example, to only include files from the repo whose name is in the
616 format YYYY-MM-DD.txt and is found at least two subdirectories deep:
617
    git filter-repo --path-regex '^.*/.*/[0-9]{4}-[0-9]{2}-[0-9]{2}\.txt$'
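
Since --path-regex takes Python regular expressions, a pattern can be tried out in Python before it is used for a rewrite. The sketch below does so with a hypothetical helper, is_dated_file. The pattern it uses escapes the dot before txt, which makes it slightly stricter than an unescaped dot (an unescaped dot matches any character).

```python
import re

# Two leading "<anything>/" components, then YYYY-MM-DD.txt as the file name.
pattern = re.compile(r'^.*/.*/[0-9]{4}-[0-9]{2}-[0-9]{2}\.txt$')

def is_dated_file(path):
    """True if path would be selected by the regex above."""
    return pattern.match(path) is not None
```

A path such as logs/2019/2019-03-04.txt is selected, while logs/2019-03-04.txt is not, because it sits only one directory deep.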
619
620
621 If you want two directories to be renamed (and maybe merged if both are
622 renamed to the same location), use --path-rename; for example, to
623 rename both cmds/ and src/scripts/ to tools/:
624
    git filter-repo --path-rename cmds:tools --path-rename src/scripts/:tools/
626
627
628 As with --path, directories can be specified with or without a trailing
629 slash for --path-rename.
630
631 If you do a --path-rename to something that was already in use, it will
632 be silently overwritten. However, if you try to rename multiple files
633 to the same location (e.g. src/scripts/run_release.sh and
634 cmds/run_release.sh both existed and had different content with the
635 renames above), then you will be given an error. If you have such a
636 case, you may want to add another rename command to move one of the
637 paths somewhere else where it won’t collide:
638
    git filter-repo --path-rename cmds/run_release.sh:tools/do_release.sh \
                    --path-rename cmds/:tools/ \
                    --path-rename src/scripts/:tools/
642
643
644 Also, --path-rename brings up ordering issues; all path arguments are
645 applied in order. Thus, a command like
646
    git filter-repo --path-rename sources/:src/main/ --path src/main/
648
649
would make sense but reversing the two arguments would not (src/main/
is created by the rename so reversing the two would give you an empty
repo). Also, note that the rename of cmds/run_release.sh a couple of
examples ago was done before the other renames.
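
The effect of argument order can be illustrated with a simplified Python model. The function rewrite_path and its directive tuples are hypothetical, and the model uses plain prefix matching only. It walks the directives in the order given, so a --path filter sees a file's name as it stands at that point in the sequence.

```python
def rewrite_path(path, directives):
    """Return the new name for path, or None if it is filtered out."""
    has_filter = any(kind == "path" for kind, *_ in directives)
    wanted = not has_filter  # with no --path at all, everything is kept
    for kind, *args in directives:
        if kind == "rename":
            old, new = args
            if path.startswith(old):
                path = new + path[len(old):]
        elif kind == "path" and path.startswith(args[0]):
            wanted = True
    return path if wanted else None

rename_then_filter = [("rename", "sources/", "src/main/"), ("path", "src/main/")]
filter_then_rename = [("path", "src/main/"), ("rename", "sources/", "src/main/")]
```

With rename_then_filter, sources/app.py becomes src/main/app.py and is kept. With filter_then_rename, the filter is evaluated while the file is still named sources/app.py, so the file is dropped.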
654
655 Note that path renaming does not do path filtering, thus the following
656 command
657
    git filter-repo --path src/main/ --path-rename tools/:scripts/
659
660
661 would not result in the tools or scripts directories being present,
662 because the single filter selected only src/main/. It’s likely that you
663 would instead want to run:
664
    git filter-repo --path src/main/ --path tools/ --path-rename tools/:scripts/
666
667
668 If you prefer to filter based solely on basename, use the
669 --use-base-name flag (though this is incompatible with --path-rename).
670 For example, to only include README.md and Makefile files from any
671 directory:
672
    git filter-repo --use-base-name --path README.md --path Makefile
674
675
676 If you wanted to delete all .DS_Store files in any directory, you could
677 either use:
678
    git filter-repo --invert-paths --path '.DS_Store' --use-base-name
680
681
682 or
683
    git filter-repo --invert-paths --path-glob '*/.DS_Store' --path '.DS_Store'
685
686
(the --path-glob isn’t sufficient by itself as it might miss a toplevel
.DS_Store file; furthermore, while something like --path-glob
'*.DS_Store' would work around that problem, it would also grab files
named foo.DS_Store or bar/baz.DS_Store)
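
The difference between these globs can be checked with Python's fnmatch module, under the assumption that glob matching here behaves like fnmatch, where * also matches across directory separators. The sample path list and the helper matching are made up for illustration.

```python
from fnmatch import fnmatchcase
import os.path

paths = [".DS_Store", "docs/.DS_Store", "foo.DS_Store", "bar/baz.DS_Store"]

def matching(glob):
    """Paths from the sample list that match the given glob."""
    return [p for p in paths if fnmatchcase(p, glob)]

nested_only = matching("*/.DS_Store")   # misses the toplevel .DS_Store
too_broad = matching("*.DS_Store")      # also grabs foo.DS_Store and friends
by_basename = [p for p in paths if os.path.basename(p) == ".DS_Store"]
```

Only the basename comparison selects exactly the two real .DS_Store files, which is what --use-base-name achieves.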
691
692 Finally, see also the --filename-callback from the section called
693 “CALLBACKS”.
694
695 Filtering based on many paths
696 If you have a long list of files, directories, globs, or regular
697 expressions to filter on, you can stick them in a file and use
698 --paths-from-file; for example, with a file named stuff-i-want.txt with
699 contents of
700
    # Blank lines and comment lines are ignored.
    # Examples similar to --path:
    README.md
    guides/
    tools/releases

    # An example that is like --path-glob:
    glob:*.py

    # An example that is like --path-regex:
    regex:^.*/.*/[0-9]{4}-[0-9]{2}-[0-9]{2}\.txt$

    # An example of renaming a path
    tools/==>scripts/

    # An example of using a regex to rename a path
    regex:(.*)/([^/]*)/([^/]*)\.text$==>\2/\1/\3.txt
718
719
720 then you could run
721
722 git filter-repo --paths-from-file stuff-i-want.txt
723
724
to get a repo containing only the toplevel README.md file, the guides/
and tools/releases/ directories, all Python files, and files whose names
are of the form YYYY-MM-DD.txt at least two subdirectories deep; it would
also rename tools/ to scripts/ and rename files like foo/bar/baz.text to
bar/foo/baz.txt. Note the special line prefixes glob: and regex:, and
the special string ==> denoting renames.
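The line syntax can be summarised with a small Python sketch. The function parse_paths_line is a hypothetical helper written for this example; filter-repo's own parser may differ in details. The last lines apply the regex rename from the sample file with re.sub to show the effect of the \1, \2, and \3 backreferences.

```python
import re


def parse_paths_line(line):
    """Classify one line of a paths file.

    Returns None for blank and comment lines, otherwise a tuple
    (kind, match, rename_to) where kind is "path", "glob", or "regex"
    and rename_to is None when the line contains no ==>.
    Hypothetical helper; not filter-repo's actual parser.
    """
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    kind = "path"
    for prefix in ("glob:", "regex:"):
        if line.startswith(prefix):
            kind, line = prefix[:-1], line[len(prefix):]
            break
    match, sep, rename_to = line.partition("==>")
    return (kind, match, rename_to if sep else None)


entries = [parse_paths_line(text) for text in [
    "# a comment", "", "guides/", "glob:*.py", "tools/==>scripts/",
    r"regex:(.*)/([^/]*)/([^/]*)\.text$==>\2/\1/\3.txt",
]]

# Apply the regex rename to one sample path.
kind, pattern, replacement = entries[-1]
renamed = re.sub(pattern, replacement, "foo/bar/baz.text")
```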
731
Sometimes you have a way of easily generating all the files you want.
For example, if you know that none of the currently tracked filenames
contain newlines or special characters (see core.quotePath from git
config --help), so that git ls-files prints every file literally, one
per line, and you know that you want to keep only the files that are
currently tracked (thus deleting from all commits in history any files
that only appear on other branches or that only appear in older
commits), then you could use a pair of commands such as
740
741 git ls-files >../paths-i-want.txt
742 git filter-repo --paths-from-file ../paths-i-want.txt
743
744
745 Similarly, you could use --paths-from-file to delete many files. For
746 example, you could run git filter-repo --analyze to get reports, look
747 in one such as .git/filter-repo/analysis/path-deleted-sizes.txt and
748 copy all the filenames into a file such as
749 /tmp/files-i-dont-want-anymore.txt and then run
750
751 git filter-repo --invert-paths --paths-from-file /tmp/files-i-dont-want-anymore.txt
752
753
754 to delete them all.
755
756 Directory based shortcuts
757 Let’s say you had a directory structure like the following:
758
module/
   foo.c
   bar.c
otherDir/
   blah.config
   stuff.txt
zebra.jpg
766
767 If you wanted just the module/ directory and you wanted it to become
768 the new root so that your new directory structure looked like
769
770 foo.c
771 bar.c
772
773 then you could run:
774
775 git filter-repo --subdirectory-filter module/
776
777
778 If you wanted all the files from the original repo, but wanted to move
779 everything under a subdirectory named my-module/, so that your new
780 directory structure looked like
781
my-module/
   module/
      foo.c
      bar.c
   otherDir/
      blah.config
      stuff.txt
   zebra.jpg
790
then you would instead run
792
793 git filter-repo --to-subdirectory-filter my-module/
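The effect of the two shortcuts on pathnames can be sketched in Python. Both functions are illustrative stand-ins written for this example (they are not filter-repo code), and they operate on a flat list of paths taken from the directory structure above.

```python
def subdirectory_filter(paths, subdir):
    """Keep only paths under subdir and make subdir the new root."""
    return [p[len(subdir):] for p in paths if p.startswith(subdir)]


def to_subdirectory_filter(paths, subdir):
    """Move every path underneath subdir."""
    return [subdir + p for p in paths]


tree = ["module/foo.c", "module/bar.c", "otherDir/blah.config",
        "otherDir/stuff.txt", "zebra.jpg"]

new_root = subdirectory_filter(tree, "module/")
nested = to_subdirectory_filter(tree, "my-module/")
```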
794
795
796 Content based filtering
797 If you want to filter out all files bigger than a certain size, you can
798 use --strip-blobs-bigger-than with some size (K, M, and G suffixes are
799 recognized), e.g.:
800
801 git filter-repo --strip-blobs-bigger-than 10M
802
803
804 If you want to strip out all files with specified git object ids
805 (hashes), list the hashes in a file and run
806
807 git filter-repo --strip-blobs-with-ids FILE_WITH_GIT_BLOB_IDS
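If you have the unwanted contents at hand but not their ids, a blob id can be computed without git. The sketch below assumes a repository using the SHA-1 object format, where a blob id is the SHA-1 of a "blob <size>" header, a NUL byte, and the raw contents; the function name git_blob_id is made up for the example.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the id git assigns to a blob with these contents (SHA-1 repos)."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


empty_id = git_blob_id(b"")
hello_id = git_blob_id(b"hello\n")
```

The same values are printed by git hash-object for files with those contents.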
808
809
810 If you want to modify file contents, you can do so based on a list of
811 expressions in a file, one per line. For example, with a file named
812 expressions.txt containing
813
814 p455w0rd
815 foo==>bar
816 glob:*666*==>
817 regex:\bdriver\b==>pilot
818 literal:MM/DD/YYYY==>YYYY-MM-DD
819 regex:([0-9]{2})/([0-9]{2})/([0-9]{4})==>\3-\1-\2
820
821
822 then running
823
824 git filter-repo --replace-text expressions.txt
825
826
will go through and replace p455w0rd with ***REMOVED***, foo with bar,
any line containing 666 with a blank line, the word driver with pilot
(but not if it has letters before or after; e.g. drivers will be
unmodified), the exact text MM/DD/YYYY with YYYY-MM-DD, and date
strings of the form MM/DD/YYYY with ones of the form YYYY-MM-DD. In
the expressions file, there are a few things to note:
833
834 • Every line has a replacement, given by whatever is on the right of
835 ==>. If ==> does not appear on the line, the default replacement is
836 ***REMOVED***.
837
838 • Lines can start with literal:, glob:, or regex: to specify whether
839 to do literal string matches, globs (see
840 https://docs.python.org/3/library/fnmatch.html), or regular
841 expressions (see
842 https://docs.python.org/3/library/re.html#regular-expression-syntax).
843 If none of these are specified, literal: is assumed.
844
845 • If multiple matches are found, all are replaced.
846
• globs and regexes are applied to the entire file, but without any
special flags turned on. Some folks may be interested in adding
(?m) to the regex to turn on MULTILINE mode, so that ^ and $ match
the beginning and end of each line rather than the beginning and end
of the file. See https://docs.python.org/3/library/re.html for details.
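The rules above can be tried out with a short Python sketch. The helpers parse_expression and apply_expression are hypothetical and only approximate the behavior described here; glob: handling is deliberately left out. The last two calls show the difference that (?m) makes.

```python
import re


def parse_expression(line):
    """Split one expressions-file line into (kind, pattern, replacement)."""
    pattern, sep, replacement = line.partition(b"==>")
    if not sep:
        replacement = b"***REMOVED***"   # default when ==> is absent
    kind = b"literal"                     # default when no prefix is given
    for prefix in (b"literal:", b"glob:", b"regex:"):
        if pattern.startswith(prefix):
            kind, pattern = prefix[:-1], pattern[len(prefix):]
            break
    return kind, pattern, replacement


def apply_expression(data, line):
    """Apply a single literal or regex expression to file contents."""
    kind, pattern, replacement = parse_expression(line)
    if kind == b"literal":
        return data.replace(pattern, replacement)
    if kind == b"regex":
        return re.sub(pattern, replacement, data)
    raise NotImplementedError("glob: handling is left out of this sketch")


removed = apply_expression(b"pw=p455w0rd", b"p455w0rd")
word = apply_expression(b"driver drivers", rb"regex:\bdriver\b==>pilot")
date = apply_expression(
    b"due 12/25/2020",
    rb"regex:([0-9]{2})/([0-9]{2})/([0-9]{4})==>\3-\1-\2")

# Without (?m), ^ and $ only anchor at the start and end of the file.
whole_file = apply_expression(b"foo\nfoo\n", rb"regex:^foo$==>bar")
per_line = apply_expression(b"foo\nfoo\n", rb"regex:(?m)^foo$==>bar")
```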
852
853 See also the --blob-callback from the section called “CALLBACKS”.
854
855 Updating commit/tag messages
856 If you want to modify commit or tag messages, you can do so with the
857 same syntax as --replace-text, explained above. For example, with a
858 file named expressions.txt containing
859
860 foo==>bar
861
862
863 then running
864
865 git filter-repo --replace-message expressions.txt
866
867
868 will replace foo in commit or tag messages with bar.
869
870 See also the --message-callback from the section called “CALLBACKS”.
871
872 Refname based filtering
873 To rename tags, use --tag-rename, e.g.:
874
875 git filter-repo --tag-rename foo:bar
876
877
878 This will rename any tags starting with foo to now start with bar.
879 Either side of the colon could be blank, e.g.
880
881 git filter-repo --tag-rename '':'my-module-'
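The prefix semantics can be sketched as a Python function over fully qualified refnames. The helper rename_tag is invented for this illustration and is not taken from filter-repo.

```python
def rename_tag(refname, old_prefix, new_prefix):
    """Rename refs/tags/<old_prefix>... to refs/tags/<new_prefix>...

    Hypothetical helper; refs outside refs/tags/ are returned unchanged.
    """
    tags = b"refs/tags/"
    if refname.startswith(tags + old_prefix):
        return tags + new_prefix + refname[len(tags + old_prefix):]
    return refname


renamed = rename_tag(b"refs/tags/foo-1.0", b"foo", b"bar")
prefixed = rename_tag(b"refs/tags/v1.0", b"", b"my-module-")
branch = rename_tag(b"refs/heads/foo", b"foo", b"bar")
```

With an empty left-hand side every tag matches, so the second call simply prepends my-module- to the tag name.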
882
883
884 For more general refname modification, see --refname-callback from the
885 section called “CALLBACKS”.
886
887 User and email based filtering
To modify the user names and emails of commits, you can create a mailmap
file in the format accepted by git-shortlog(1). For example, if you have
a file named my-mailmap you can run
891
892 git filter-repo --mailmap my-mailmap
893
894
895 and if the current contents of that file are as follows (if the
896 specified mailmap file is version controlled, historical versions of
897 the file are ignored):
898
899 Name For User <email@addre.ss>
900 <new@ema.il> <old1@ema.il>
901 New Name And <new@ema.il> <old2@ema.il>
902 New Name And <new@ema.il> Old Name And <old3@ema.il>
903
904
then filter-repo will update user names and/or emails based on the
specified mapping.
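What each of the four line forms does can be sketched in Python. This is a simplified model written for this example: the first matching line wins and comparisons are exact, whereas git's real mailmap lookup has its own precedence and case rules, and filter-repo itself works on bytestrings rather than str.

```python
import re

# Matches an optional name followed by "<email>".
_ENTRY = re.compile(r"\s*([^<]*?)\s*<([^>]*)>")


def parse_mailmap_line(line):
    """Return (new_name, new_email, old_name, old_email); None if absent."""
    parts = _ENTRY.findall(line)
    first_name, first_email = parts[0]
    if len(parts) == 1:
        # "Proper Name <commit@email>": fix the name for that email.
        return (first_name or None, None, None, first_email)
    old_name, old_email = parts[1]
    return (first_name or None, first_email, old_name or None, old_email)


def apply_mailmap(entries, name, email):
    """Return the (name, email) to record; first matching entry wins."""
    for new_name, new_email, old_name, old_email in entries:
        if email == old_email and old_name in (None, name):
            return (new_name or name, new_email or email)
    return (name, email)


entries = [parse_mailmap_line(line) for line in [
    "Name For User <email@addre.ss>",
    "<new@ema.il> <old1@ema.il>",
    "New Name And <new@ema.il> <old2@ema.il>",
    "New Name And <new@ema.il> Old Name And <old3@ema.il>",
]]
```

The last form only applies when both the old name and the old email match, so a different name recorded with old3@ema.il is left alone in this model.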
907
908 See also the --name-callback and --email-callback from the section
909 called “CALLBACKS”.
910
911 Parent rewriting
To replace $commit_A with $commit_B (i.e. make all commits which had
$commit_A as a parent instead have $commit_B for that parent), and
rewrite history to make it permanent:
915
916 git replace $commit_A $commit_B
917 git filter-repo --force
918
919
920 To create a new commit with the same contents as $commit_A except with
921 different parent(s) and then replace $commit_A with the new commit, and
922 rewrite history to make it permanent:
923
924 git replace --graft $commit_A $new_parent_or_parents
925 git filter-repo --force
926
927
928 The reason to specify --force is two-fold: filter-repo will error out
929 if no arguments are specified, and the new graft commit would otherwise
930 trigger the not-a-fresh-clone check.
931
932 Partial history rewrites
933 To rewrite the history on just one branch (which may cause it to no
934 longer share any common history with other branches), use --refs. For
935 example, to remove a file named extraneous.txt from the master branch:
936
937 git filter-repo --invert-paths --path extraneous.txt --refs master
938
939
940 To rewrite just some recent commits:
941
942 git filter-repo --invert-paths --path extraneous.txt --refs master~3..master
943
944
CALLBACKS
946 For flexibility, filter-repo allows you to specify functions on the
947 command line to further filter all changes. Please note that there are
948 some API compatibility caveats associated with these callbacks that you
949 should be aware of before using them; see the "API BACKWARD
950 COMPATIBILITY CAVEAT" comment near the top of git-filter-repo source
951 code.
952
953 All callback functions are of the same general format. For a command
954 line argument like
955
956 --foo-callback 'BODY'
957
958
959 the following code will be compiled and called:
960
def foo_callback(foo):
  BODY
963
964
965 Thus, you just need to make sure your BODY modifies and returns foo
966 appropriately. One important thing to note for all callbacks is that
967 filter-repo uses bytestrings (see
968 https://docs.python.org/3/library/stdtypes.html#bytes) everywhere
969 instead of strings.
970
971 There are four callbacks that allow you to operate directly on raw
972 objects that contain data that’s easy to write in git-fast-import(1)
973 format:
974
975 --blob-callback
976 --commit-callback
977 --tag-callback
978 --reset-callback
979
980
981 We’ll come back to these later because it is often the case that the
982 other callbacks are more convenient. The other callbacks operate on a
983 small piece of the raw objects or operate on pieces across multiple
984 types of raw object (e.g. author names and committer names and tagger
985 names across commits and tags, or refnames across commits, tags, and
986 resets, or messages across commits and tags). The convenience callbacks
987 are:
988
989 --filename-callback
990 --message-callback
991 --name-callback
992 --email-callback
993 --refname-callback
994
995
996 in each you are expected to simply return a new value based on the one
997 passed in. For example,
998
999 git-filter-repo --name-callback 'return name.replace(b"Wiliam", b"William")'
1000
1001
1002 would result in the following function being called:
1003
def name_callback(name):
  return name.replace(b"Wiliam", b"William")
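The wrapping step can be imitated in plain Python to test a BODY before running a rewrite. The helper make_callback below is an illustrative stand-in written for this example; filter-repo's own wrapper may differ (for instance in which modules are made available to the body).

```python
import textwrap


def make_callback(kind, body):
    """Build <kind>_callback(<kind>) from a BODY string.

    Illustrative stand-in; not filter-repo's actual code.
    """
    source = "def {0}_callback({0}):\n{1}\n".format(
        kind, textwrap.indent(body, "  "))
    namespace = {}
    exec(source, namespace)          # compile the generated function
    return namespace[kind + "_callback"]


name_callback = make_callback(
    "name", 'return name.replace(b"Wiliam", b"William")')
fixed = name_callback(b"Wiliam Shakespeare")
```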
1006
1007
1008 The email callback is quite similar:
1009
1010 git-filter-repo --email-callback 'return email.replace(b".cm", b".com")'
1011
1012
The refname callback is also similar, but note that the refname passed
in and returned are expected to be fully qualified (e.g.
b"refs/heads/master" instead of just b"master" and b"refs/tags/v1.0.7"
instead of b"v1.0.7"):
1017
git-filter-repo --refname-callback '
  # Change e.g. refs/heads/master to refs/heads/prefix-master
  rdir,rpath = os.path.split(refname)
  return rdir + b"/prefix-" + rpath'
1022
1023
1024 The message callback is quite similar to the previous three callbacks,
1025 though it operates on a bytestring that is likely more than one line:
1026
git-filter-repo --message-callback '
  if b"Signed-off-by:" not in message:
    message += b"\nSigned-off-by: Me My <self@and.eye>"
  return re.sub(b"[Ee]-?[Mm][Aa][Ii][Ll]", b"email", message)'
1031
1032
1033 The filename callback is slightly more interesting. Returning None
1034 means the file should be removed from all commits, returning the
1035 filename unmodified marks the file to be kept, and returning a
1036 different name means the file should be renamed. An example:
1037
git-filter-repo --filename-callback '
  if b"/src/" in filename:
    # Remove all files with a directory named "src" in their path
    # (except when "src" appears at the toplevel).
    return None
  elif filename.startswith(b"tools/"):
    # Rename tools/ -> scripts/misc/
    return b"scripts/misc/" + filename[6:]
  else:
    # Keep the filename and do not rename it
    return filename
  '
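Because the body is ordinary Python, it can be exercised on sample paths before touching a repository. The sketch below wraps the same body in a standalone function; the sample paths are made up.

```python
def filename_callback(filename):
    if b"/src/" in filename:
        # Remove files with a non-toplevel "src" directory in their path.
        return None
    elif filename.startswith(b"tools/"):
        # Rename tools/ -> scripts/misc/ (6 is len(b"tools/")).
        return b"scripts/misc/" + filename[6:]
    else:
        # Keep the filename and do not rename it.
        return filename


removed = filename_callback(b"lib/src/a.c")
kept = filename_callback(b"src/a.c")
renamed = filename_callback(b"tools/build.sh")
```

Note that src/a.c is kept: the test looks for /src/ with a leading slash, which a toplevel src directory does not have.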
1050
1051
1052 In contrast, the blob, reset, tag, and commit callbacks are not
1053 expected to return a value, but are instead expected to modify the
1054 object passed in. Major fields for these objects are (subject to API
1055 backward compatibility caveats mentioned previously):
1056
1057 • Blob: original_id (original hash) and data
1058
1059 • Reset: ref (name of reference) and from_ref (hash or integer mark)
1060
1061 • Tag: ref, from_ref, original_id, tagger_name, tagger_email,
1062 tagger_date, message
1063
1064 • Commit: branch, original_id, author_name, author_email,
1065 author_date, committer_name, committer_email, committer_date,
1066 message, file_changes (list of FileChange objects, each containing
1067 a type, filename, mode, and blob_id), parents (list of hashes or
1068 integer marks)
1069
1070 An example of each:
1071
git filter-repo --blob-callback '
  if len(blob.data) > 25:
    # Mark this blob for removal from all commits
    blob.skip()
  else:
    blob.data = blob.data.replace(b"Hello", b"Goodbye")
  '
1079
1080
1081
1082 git filter-repo --reset-callback 'reset.ref = reset.ref.replace(b"master", b"dev")'
1083
1084
1085
git filter-repo --tag-callback '
  if tag.tagger_name == b"Jim Williams":
    # Omit this tag
    tag.skip()
  else:
    tag.message = tag.message + b"\n\nTag of %s by %s on %s" % (tag.ref, tag.tagger_email, tag.tagger_date)'
1092
1093
1094
git filter-repo --commit-callback '
  # Remove executable files with three 6s in their name (including
  # from leading directories).
  # Also, undo deletion of sources/foo/bar.txt (change types are
  # either b"D" (deletion) or b"M" (add or modify); renames are
  # handled by deleting the old file and adding a new one)
  commit.file_changes = [
         change for change in commit.file_changes
         if not (change.mode == b"100755" and
                 change.filename.count(b"6") == 3) and
            not (change.type == b"D" and
                 change.filename == b"sources/foo/bar.txt")]
  # Mark all .sh files as executable; modes in git are always one of
  # 100644 (normal file), 100755 (executable), 120000 (symlink), or
  # 160000 (submodule)
  for change in commit.file_changes:
    if change.filename.endswith(b".sh"):
      change.mode = b"100755"
  '
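The commit callback body can be dry-run against stand-in objects. FakeFileChange and FakeCommit below are hypothetical classes written for this sketch; they only mimic the type, filename, mode, and file_changes fields listed above and are not filter-repo's real classes.

```python
from dataclasses import dataclass, field


@dataclass
class FakeFileChange:          # hypothetical stand-in for a FileChange
    type: bytes
    filename: bytes
    mode: bytes = b"100644"


@dataclass
class FakeCommit:              # hypothetical stand-in for a Commit
    file_changes: list = field(default_factory=list)


def commit_callback(commit):
    # Drop changes to executable files with three 6s in their path, and
    # drop the deletion of sources/foo/bar.txt.
    commit.file_changes = [
        change for change in commit.file_changes
        if not (change.mode == b"100755" and
                change.filename.count(b"6") == 3) and
        not (change.type == b"D" and
             change.filename == b"sources/foo/bar.txt")]
    # Mark all remaining .sh files as executable.
    for change in commit.file_changes:
        if change.filename.endswith(b".sh"):
            change.mode = b"100755"


commit = FakeCommit([
    FakeFileChange(b"M", b"bin/666", b"100755"),
    FakeFileChange(b"D", b"sources/foo/bar.txt"),
    FakeFileChange(b"M", b"run.sh"),
])
commit_callback(commit)
remaining = [(c.filename, c.mode) for c in commit.file_changes]
```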
1114
1115
INTERNALS
1117 You probably don’t need to read this section unless you are just very
1118 curious or you are trying to do a very complex history rewrite.
1119
1120 How filter-repo works
1121 Roughly, filter-repo works by running
1122
1123 git fast-export <options> | filter | git fast-import <options>
1124
1125
1126 where filter-repo not only launches the whole pipeline but also serves
1127 as the filter in the middle. However, filter-repo does a few additional
1128 things on top in order to make it into a well-rounded filtering tool. A
1129 sequence that more accurately reflects what filter-repo runs is:
1130
1131 1. Verify we’re in a fresh clone
1132
1133 2. git fetch -u . refs/remotes/origin/*:refs/heads/*
1134
1135 3. git remote rm origin
1136
1137 4. git fast-export --show-original-ids --reference-excluded-parents
1138 --fake-missing-tagger --signed-tags=strip
1139 --tag-of-filtered-object=rewrite --use-done-feature --no-data
1140 --reencode=yes --mark-tags --all | filter | git -c
1141 core.ignorecase=false fast-import --date-format=raw-permissive
1142 --force --quiet
1143
1144 5. git update-ref --no-deref --stdin, fed with a list of refs to nuke,
1145 and a list of replace refs to delete, create, or update.
1146
1147 6. git reset --hard
1148
1149 7. git reflog expire --expire=now --all
1150
1151 8. git gc --prune=now
1152
1153 Some notes or exceptions on each of the above:
1154
1. If we’re not in a fresh clone, users will not be able to recover if
they used the wrong command or ran in the wrong repo. (Though
--force overrides this check, and it’s also off if you’ve already
run filter-repo once in this repo.)
1159
1160 2. Technically, we actually use a git update-ref command fed with a
1161 lot of input due to the fact that users can use --force when local
1162 branches might not match remote branches. But this fetch command
1163 catches the intent rather succinctly.
1164
1165 3. We don’t want users accidentally pushing back to the original repo,
1166 as discussed in the section called “DISCUSSION”. It also reminds
1167 users that since history has been rewritten, this repo is no longer
1168 compatible with the original. Finally, another minor benefit is
1169 this allows users to push with the --mirror option to their new
1170 home without accidentally sending remote tracking branches.
1171
4. Some of these flags are always used but others are actually
conditional. For example, filter-repo’s --replace-text and
--blob-callback options need to work on blobs so --no-data cannot
be passed to fast-export. But when we don’t need to work on blobs,
passing --no-data speeds things up. Other flags may change the
structure of the pipeline as well (e.g. --dry-run and --debug).
1178
1179 5. We use this step to write replace refs for accessing the newly
1180 written commit hashes using their previous names. Also, if refs
1181 were renamed by various steps, we need to delete the old refnames
1182 in order to avoid mixing old and new history.
1183
1184 6. Users also have old versions of files in their working tree and
1185 index; we want those cleaned up to match the rewritten history as
1186 well. Note that this step is skipped in bare repos.
1187
1188 7. Reflogs will hold on to old history, so we need to expire them.
1189
1190 8. We need to gc to avoid mixing new and old history. Also, it shrinks
1191 the repository for users, so they don’t have to do extra work.
1192 (Odds are that they’ve only rewritten trees and commits and maybe a
1193 few blobs, so --aggressive isn’t needed and would be too slow.)
1194
1195 Information about these steps is printed out when --debug is passed to
1196 filter-repo. When doing a --partial history rewrite, steps 2, 3, 7, and
1197 8 are unconditionally skipped, step 5 is skipped if --replace-refs is
1198 update-no-add, and just the nuke-unused-refs portion of step 5 is
1199 skipped if --replace-refs is something else.
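The conditional use of --no-data from note 4 can be sketched as a function that assembles the two command lines of step 4. This is an illustration only: pipeline_commands is a made-up helper, it uses just the flags listed above, and it ignores every other condition that may alter the real pipeline.

```python
def pipeline_commands(need_blob_contents):
    """Return (fast_export_argv, fast_import_argv) for step 4.

    Hypothetical helper; only the --no-data condition is modelled.
    """
    fast_export = [
        "git", "fast-export", "--show-original-ids",
        "--reference-excluded-parents", "--fake-missing-tagger",
        "--signed-tags=strip", "--tag-of-filtered-object=rewrite",
        "--use-done-feature", "--reencode=yes", "--mark-tags", "--all",
    ]
    if not need_blob_contents:
        # Skipping blob data is faster when no filter inspects contents.
        fast_export.insert(-1, "--no-data")
    fast_import = [
        "git", "-c", "core.ignorecase=false", "fast-import",
        "--date-format=raw-permissive", "--force", "--quiet",
    ]
    return fast_export, fast_import


paths_only_export, _ = pipeline_commands(need_blob_contents=False)
replace_text_export, fast_import = pipeline_commands(need_blob_contents=True)
```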
1200
1201 Limitations
1202 Inherited limitations
1203 Since git filter-repo calls fast-export and fast-import to do a lot
1204 of the heavy lifting, it inherits limitations from those systems:
1205
1206 • extended commit headers, if any, are stripped
1207
• commits get rewritten, meaning they will have new hashes;
therefore, signatures on commits and tags cannot continue to
work and instead are just removed (thus signed tags become
annotated tags)
1212
1213 • tags of commits are supported. Prior to git-2.24.0, tags of
1214 blobs and tags of tags are not supported (fast-export would die
1215 on such tags). tags of trees are not supported in any git
1216 version (since fast-export ignores tags of trees with a warning
1217 and fast-import provides no way to import them).
1218
1219 • annotated and signed tags outside of the refs/tags/ namespace
1220 are not supported (their location will be mangled in weird
1221 ways)
1222
1223 • fast-import will die on various forms of invalid input, such as
1224 a timezone with more than four digits
1225
1226 • fast-export cannot reencode commit messages into UTF-8 if the
1227 commit message is not valid in its specified encoding (in such
1228 cases, it’ll leave the commit message and the encoding header
1229 alone).
1230
1231 • commits without an author will be given one matching the
1232 committer
1233
1234 • tags without a tagger will be given a fake tagger
1235
1236 • references that include commit cycles in their history (which
1237 can be created with git-replace(1)) will not be flagged to the
1238 user as an error but will be silently deleted by fast-export as
1239 though the branch or tag contained no interesting files
1240
1241 There are also some limitations due to the design of these systems:
1242
1243 • Trying to insert additional files into the stream can be
1244 tricky; since fast-export only lists file changes in a merge
1245 relative to its first parent, if you insert additional files
1246 into a commit that is in the second (or third or fourth) parent
1247 history of a merge, then you also need to add it to the merge
1248 manually. (Similarly, if you change which parent is the first
1249 parent in a merge commit, you need to manually update the list
1250 of file changes to be relative to the new first parent.)
1251
1252 • fast-export and fast-import work with exact file contents, not
1253 patches. (e.g. "Whatever the current contents of this file,
1254 update them to now have these contents") Because of this,
1255 removing the changes made in a single commit or inserting
1256 additional changes to a file in some commit and expecting them
1257 to propagate forward is not something that can be done with
1258 these tools. Use git-rebase(1) for that.
1259
1260 Intrinsic limitations
1261 Some types of filtering have limitations that would affect any tool
1262 attempting to perform them; the most any tool can do is attempt to
1263 notify the user when it detects an issue:
1264
1265 • When rewriting commit hashes in commit messages, there are a
1266 variety of cases when the hash will not be updated (whenever
1267 this happens, a note is written to
1268 .git/filter-repo/suboptimal-issues):
1269
1270 • if a commit hash does not correspond to a commit in the old
1271 repo
1272
1273 • if a commit hash corresponds to a commit that gets pruned
1274
1275 • if an abbreviated hash is not unique
1276
1277 • Pruning of empty commits can cause a merge commit to lose an
1278 entire ancestry line and become a non-merge. If the merge
1279 commit had no changes then it can be pruned too, but if it
1280 still has changes it needs to be kept. This might cause minor
1281 confusion since the commit will likely have a commit message
1282 that makes it sound like a merge commit even though it’s not.
1283 (Whenever a merge commit becomes a non-merge commit, a note is
1284 written to .git/filter-repo/suboptimal-issues)
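The three cases in which a hash in a commit message is left alone can be sketched as a lookup function. translate_hash and the shortened fake hashes are invented for this example, and returning the new hash abbreviated to the same length as the old one is an assumption of the sketch rather than something stated above.

```python
def translate_hash(abbrev, old_to_new):
    """Map an (abbreviated) old commit hash to its new hash, or None.

    old_to_new maps full old hashes to full new hashes; a value of None
    marks a commit that was pruned. Hypothetical helper for illustration.
    """
    candidates = [old for old in old_to_new if old.startswith(abbrev)]
    if len(candidates) != 1:
        return None              # unknown hash or ambiguous abbreviation
    new = old_to_new[candidates[0]]
    if new is None:
        return None              # the commit was pruned
    return new[:len(abbrev)]     # assumption: keep the original length


# Shortened fake hashes, for readability only.
mapping = {
    "abc123de": "f00dfeed",
    "abc999aa": "0badcafe",
    "abd456bb": None,
}

rewritten = translate_hash("abc1", mapping)
ambiguous = translate_hash("abc", mapping)
pruned = translate_hash("abd4", mapping)
unknown = translate_hash("ffff", mapping)
```

In each case where the function returns None, the text in the commit message would be left as it was.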
1285
1286 Issues specific to filter-repo
• Multiple repositories in the wild have been observed which use
a bogus timezone (+051800); a web search will find you some reports.
The intended timezone wasn’t clear or wasn’t always the same, so
filter-repo replaces it with a different bogus timezone that
fast-import will accept (+0261).
1292
1293 • --path-rename can result in pathname collisions; to avoid
1294 excessive memory requirements of tracking which files are in
1295 all commits or looking up what files exist with either every
1296 commit or every usage of --path-rename, we just tell the user
1297 that they might clobber other changes if they aren’t careful.
1298 We can check if the clobbering comes from another --path-rename
1299 without much overhead. (Perhaps in the future it’s worth adding
1300 a slow mode to --path-rename that will do the more exhaustive
1301 checks?)
1302
1303 • There is no mechanism for directly controlling which flags are
1304 passed to fast-export (or fast-import); only pre-defined flags
1305 can be turned on or off as a side-effect of other options.
1306 Direct control would make little sense because some options
1307 like --full-tree would require additional code in filter-repo
1308 (to parse new directives), and others such as -M or -C would
1309 break assumptions used in other places of filter-repo.
1310
1311 • Partial-repo filtering, while supported, runs counter to
1312 filter-repo’s "avoid mixing old and new history" design. This
1313 support has required improvements to core git as well (e.g. it
1314 depends upon the --reference-excluded-parents option to
1315 fast-export that was added specifically for this usage within
filter-repo). The --partial and --refs options will continue to
be supported since there are people with use cases for them;
1318 however, I am concerned that this inconsistency about mixing
1319 old and new history seems likely to lead to user mistakes. For
1320 now, I just hope that long explanations of caveats in the
1321 documentation of these options suffice to curtail any such
1322 problems.
1323
1324 Comments on reversibility
1325 Some people are interested in reversibility of a rewrite; e.g.
1326 rewrite history, possibly add some commits, then unrewrite and get
1327 the original history back plus a few new "unrewritten" commits.
1328 Obviously this is impossible if your rewrite involves throwing away
1329 information (e.g. filtering out files or replacing several
1330 different strings with ***REMOVED***), but may be possible with
1331 some rewrites. filter-repo is likely to be a poor fit for this type
1332 of workflow for a few reasons:
1333
1334 • most of the limitations inherited from fast-export and
1335 fast-import are of a type that cause reversibility issues
1336
1337 • grafts and replace refs, if present, are used in the rewrite
1338 and made permanent
1339
1340 • rewriting of commit hashes will probably be reversible, but it
1341 is possible for rewritten abbreviated hashes to not be unique
1342 even if the original abbreviated hashes were.
1343
1344 • filter-repo defaults to several forms of irreversible rewriting
1345 that you may need to turn off (e.g. the last two bullet points
1346 above or reencoding commit messages into UTF-8); it’s possible
1347 that additional forms of irreversible rewrites will be added in
1348 the future.
1349
1350 • I assume that people use filter-repo for one-shot conversions,
1351 not ongoing data transfers. I explicitly reserve the right to
1352 change any API in filter-repo based on this presumption (and a
1353 comment to this effect is found in multiple places in the code
1354 and examples). You have been warned.
1355
1357 git-rebase(1), git-filter-branch(1)
1358
1360 Part of the git(1) suite
1361
1363 1. GitLab’s excellent docs on reducing repository size
1364 https://docs.gitlab.com/ee/user/project/repository/reducing_the_repo_size_using_git.html
1365
1366 2. GitHub’s otherwise dangerously out-of-date docs on removing
1367 sensitive data
1368 https://docs.github.com/en/github/authenticating-to-github/removing-sensitive-data-from-a-repository
1369
1370
1371
1372Git 2.38.0.dirty 10/10/2022 GIT-FILTER-REPO(1)