GITDIFFCORE(7)                    Git Manual                    GITDIFFCORE(7)

NAME

       gitdiffcore - Tweaking diff output

SYNOPSIS

       git diff *

DESCRIPTION

       The diff commands git diff-index, git diff-files, and git diff-tree can
       be told to manipulate the differences they find in unconventional ways
       before showing diff output. These manipulations are collectively called
       "diffcore transformations". This short note describes what they are and
       how to use them to produce diff output that is easier to understand
       than the conventional kind.

THE CHAIN OF OPERATION

       The git diff-* family works by first comparing two sets of files:

       •   git diff-index compares contents of a "tree" object and the working
           directory (when the --cached flag is not used) or a "tree" object
           and the index file (when the --cached flag is used);

       •   git diff-files compares contents of the index file and the working
           directory;

       •   git diff-tree compares contents of two "tree" objects.

       In all of these cases, the commands themselves first optionally limit
       the two sets of files by any pathspecs given on their command lines,
       and compare corresponding paths in the two resulting sets of files.

       The pathspecs are used to limit the world diff operates in. They remove
       the filepairs outside the specified sets of pathnames. E.g. if the
       input set of filepairs included:

           :100644 100644 bcd1234... 0123456... M junkfile

       but the command invocation was git diff-files myfile, then the junkfile
       entry would be removed from the list because only "myfile" is under
       consideration.

       The result of the comparison is passed from these commands to what is
       internally called "diffcore", in a format similar to what is output
       when the -p option is not used. E.g.

           in-place edit  :100644 100644 bcd1234... 0123456... M file0
           create         :000000 100644 0000000... 1234567... A file4
           delete         :100644 000000 1234567... 0000000... D file5
           unmerged       :000000 000000 0000000... 0000000... U file6

       The diffcore mechanism is fed a list of such comparison results (each
       of which is called a "filepair", although at this point each of them
       talks about a single file), and transforms such a list into another
       list. There are currently six such transformations:

       •   diffcore-break

       •   diffcore-rename

       •   diffcore-merge-broken

       •   diffcore-pickaxe

       •   diffcore-order

       •   diffcore-rotate

       These are applied in sequence. The set of filepairs that the git diff-*
       commands find is used as the input to diffcore-break, and the output
       from diffcore-break is used as the input to the next transformation.
       The final result is then passed to the output routine, which generates
       either diff-raw format (see the Output format sections of the manual
       for the git diff-* commands) or diff-patch format.
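
       As an illustration only, the filepair lines shown above can be modelled
       in a few lines of Python. This is a sketch, not how Git represents
       filepairs internally: the names FilePair, parse_raw_line and limit_to
       are hypothetical, fields are assumed to be separated by whitespace
       (so paths containing spaces are not handled), and a pathspec is
       simplified to an exact pathname.

```python
from typing import NamedTuple, Tuple


class FilePair(NamedTuple):
    """One comparison result, modelled on the raw lines shown in the text."""
    old_mode: str
    new_mode: str
    old_id: str
    new_id: str
    status: str
    paths: Tuple[str, ...]


def parse_raw_line(line: str) -> FilePair:
    # Assumption: whitespace-separated fields, as in the examples above.
    fields = line.strip().lstrip(":").split()
    old_mode, new_mode, old_id, new_id, status = fields[:5]
    return FilePair(old_mode, new_mode, old_id, new_id, status, tuple(fields[5:]))


def limit_to(filepairs, names):
    # Simplified pathspec: keep a filepair only if one of its paths is named.
    return [fp for fp in filepairs if any(p in names for p in fp.paths)]


pairs = [parse_raw_line(":100644 100644 bcd1234... 0123456... M junkfile")]
remaining = limit_to(pairs, {"myfile"})
```

       With only "myfile" under consideration, the junkfile entry is dropped
       and remaining is an empty list.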

DIFFCORE-BREAK: FOR SPLITTING UP COMPLETE REWRITES

       The first transformation in the chain is diffcore-break, and is
       controlled by the -B option to the git diff-* commands. This is used to
       detect a filepair that represents a "complete rewrite" and break such a
       filepair into two filepairs that represent delete and create. E.g. if
       the input contained this filepair:

           :100644 100644 bcd1234... 0123456... M file0

       and if it detects that the file "file0" is completely rewritten, it
       changes it to:

           :100644 000000 bcd1234... 0000000... D file0
           :000000 100644 0000000... 0123456... A file0

       For the purpose of breaking a filepair, diffcore-break examines the
       extent of changes between the contents of the files before and after
       modification (i.e. the contents that have "bcd1234..." and "0123456..."
       as their SHA-1 content ID, in the above example). The amount of
       deletion of original contents and the amount of insertion of new
       material are added together, and if the sum exceeds the "break score",
       the filepair is broken into two. The break score defaults to 50% of the
       size of the smaller of the original and the result (i.e. if the edit
       shrinks the file, the size of the result is used; if the edit lengthens
       the file, the size of the original is used), and can be customized by
       giving a number after the "-B" option (e.g. "-B75" to tell it to use
       75%).
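
       The break decision described above can be sketched as follows. This is
       a minimal illustration of the rule as stated in this note, not Git's
       implementation: should_break and its parameters are hypothetical
       names, and all sizes are assumed to be measured in the same unit
       (bytes here).

```python
def should_break(src_size, dst_size, deleted, inserted, break_percent=50):
    """Return True if the filepair would be broken into delete + create.

    src_size / dst_size: size of the original and of the result.
    deleted / inserted: amount of original material removed and of new
    material added, in the same unit as the sizes (an assumption).
    """
    base = min(src_size, dst_size)           # the smaller of the two sizes
    threshold = base * break_percent / 100   # the "break score"
    return deleted + inserted > threshold


# 30 units deleted and 30 inserted in a 100-unit file: 60 > 50, so break.
broken_by_default = should_break(100, 100, 30, 30)
# With "-B75" the threshold becomes 75, so the same edit is not broken.
broken_at_75 = should_break(100, 100, 30, 30, break_percent=75)
```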

DIFFCORE-RENAME: FOR DETECTING RENAMES AND COPIES

       This transformation is used to detect renames and copies, and is
       controlled by the -M option (to detect renames) and the -C option (to
       detect copies as well) to the git diff-* commands. If the input
       contained these filepairs:

           :100644 000000 0123456... 0000000... D fileX
           :000000 100644 0000000... 0123456... A file0

       and the contents of the deleted file fileX are similar enough to the
       contents of the created file file0, then rename detection merges these
       filepairs and creates:

           :100644 100644 0123456... 0123456... R100 fileX file0

       When the "-C" option is used, the original contents of modified files
       and deleted files (and also unmodified files, if the
       "--find-copies-harder" option is used) are considered as candidates
       for the source file in a rename/copy operation. If the input were like
       these filepairs, which talk about a modified file fileY and a newly
       created file file0:

           :100644 100644 0123456... 1234567... M fileY
           :000000 100644 0000000... bcd3456... A file0

       the original contents of fileY and the resulting contents of file0 are
       compared, and if they are similar enough, they are changed to:

           :100644 100644 0123456... 1234567... M fileY
           :100644 100644 0123456... bcd3456... C100 fileY file0

       In both rename and copy detection, the same "extent of changes"
       algorithm used in diffcore-break is used to determine if two files are
       "similar enough". The similarity score can be changed from the default
       of 50% by giving a number after the "-M" or "-C" option (e.g. "-M8" to
       tell it to use 8/10 = 80%).

       Note that when rename detection is on but both copy and break detection
       are off, rename detection adds a preliminary step that first checks if
       files are moved across directories while keeping their filename the
       same. If there is a file added to a directory whose contents are
       sufficiently similar to a file with the same name that got deleted from
       a different directory, it will mark them as renames and exclude them
       from the later quadratic step (the one that pairwise compares all
       unmatched files to find the "best" matches, determined by the highest
       content similarity). So, for example, if a deleted docs/ext.txt and an
       added docs/config/ext.txt are similar enough, they will be marked as a
       rename and prevent an added docs/ext.md that may be even more similar
       to the deleted docs/ext.txt from being considered as the rename
       destination in the later step. For this reason, the preliminary "match
       same filename" step uses a slightly higher threshold to mark a file
       pair as a rename and stop considering other candidates for better
       matches. At most one comparison is done per file in this preliminary
       pass; so if there are several remaining ext.txt files throughout the
       directory hierarchy after exact rename detection, this preliminary
       step will be skipped for those files.

       Note. When the "-C" option is used with the --find-copies-harder
       option, git diff-* commands feed unmodified filepairs to the diffcore
       mechanism as well as modified ones. This lets the copy detector
       consider unmodified files as copy source candidates at the expense of
       making it slower. Without --find-copies-harder, git diff-* commands
       can detect copies only if the file that was copied happened to have
       been modified in the same changeset.
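
       The way a number after "-M" or "-C" is read (e.g. "-M8" meaning 8/10 =
       80%) can be sketched as below. The function name parse_similarity is
       hypothetical. The handling of a trailing "%" follows the description
       of -M in git-diff(1) rather than this note, and the sketch covers only
       plain digit strings; treat it as an approximation of the real parser.

```python
def parse_similarity(arg: str) -> float:
    """Turn the text after -M / -C into a fraction between 0 and 1.

    Without a percent sign the digits are read as a fraction with the
    decimal point in front of them: "8" -> 0.8, "05" -> 0.05.
    With a percent sign they are read as a percentage: "80%" -> 0.8.
    An empty argument falls back to the default of 50%.
    """
    if not arg:
        return 0.5
    if arg.endswith("%"):
        return min(int(arg[:-1]) / 100, 1.0)
    return float("0." + arg)


def similar_enough(similarity: float, arg: str = "") -> bool:
    # A pair qualifies when its similarity reaches the requested score.
    return similarity >= parse_similarity(arg)
```

       For instance, a pair whose contents are 75% similar passes the default
       threshold but not "-M8".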

DIFFCORE-MERGE-BROKEN: FOR PUTTING COMPLETE REWRITES BACK TOGETHER

       This transformation is used to merge filepairs broken by
       diffcore-break, and not transformed into rename/copy by
       diffcore-rename, back into a single modification. This always runs when
       diffcore-break is used.

       For the purpose of merging broken filepairs back, it uses a different
       "extent of changes" computation from the ones used by diffcore-break
       and diffcore-rename. It counts only the deletion from the original, and
       does not count insertion. If you removed only 10 lines from a 100-line
       document, even if you added 910 new lines to make a new 1000-line
       document, you did not do a complete rewrite. diffcore-break breaks such
       a case in order to help diffcore-rename consider such filepairs as
       candidates for rename/copy detection, but if filepairs broken that way
       were not matched with other filepairs to create a rename/copy, then
       this transformation merges them back into the original "modification".

       The "extent of changes" parameter can be tweaked from the default 80%
       (that is, unless more than 80% of the original material is deleted, the
       broken pairs are merged back into a single modification) by giving a
       second number to the -B option, like these:

       •   -B50/60 (give 50% "break score" to diffcore-break, use 60% for
           diffcore-merge-broken).

       •   -B/60 (the same as above, since diffcore-break defaults to 50%).

       Note that an earlier implementation left a broken pair as separate
       creation and deletion patches. This was an unnecessary hack, and the
       latest implementation always merges all the broken pairs back into
       modifications, but the resulting patch output is formatted differently
       for easier review in case of such a complete rewrite: it shows the
       entire contents of the old version prefixed with -, followed by the
       entire contents of the new version prefixed with +.
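
       The two numbers accepted by -B and the merge-back rule can be sketched
       as follows. The names parse_break_option and merged_back are
       hypothetical, only plain integer percentages such as those in the
       examples above are handled, and sizes are assumed to be counted in
       lines to match the 100-line example.

```python
def parse_break_option(arg: str):
    """Split the text after -B into (break percent, merge percent).

    "50/60" -> (50, 60); "/60" -> (50, 60); "75" -> (75, 80); "" -> (50, 80).
    """
    break_part, _, merge_part = arg.partition("/")
    break_percent = int(break_part) if break_part else 50   # default 50%
    merge_percent = int(merge_part) if merge_part else 80   # default 80%
    return break_percent, merge_percent


def merged_back(src_size: int, deleted: int, merge_percent: int = 80) -> bool:
    """True unless more than merge_percent of the original was deleted.

    Only deletion from the original counts; insertion is ignored.
    """
    return deleted * 100 <= src_size * merge_percent


# Removing 10 lines of a 100-line document is not a complete rewrite,
# no matter how many lines were added, so the pair is merged back.
kept_as_modification = merged_back(src_size=100, deleted=10)
```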

DIFFCORE-PICKAXE: FOR DETECTING ADDITION/DELETION OF SPECIFIED STRING

       This transformation limits the set of filepairs to those that change
       specified strings between the preimage and the postimage in a certain
       way. The -S<block of text> and -G<regular expression> options are used
       to specify different ways these strings are sought.

       "-S<block of text>" detects filepairs whose preimage and postimage have
       a different number of occurrences of the specified block of text. By
       definition, it will not detect in-file moves. Also, when a changeset
       moves a file wholesale without affecting the interesting string,
       diffcore-rename kicks in as usual, and -S omits the filepair (since the
       number of occurrences of that string didn’t change in that
       rename-detected filepair). When used with --pickaxe-regex, -S treats
       the <block of text> as an extended POSIX regular expression to match,
       instead of a literal string.

       "-G<regular expression>" (mnemonic: grep) detects filepairs whose
       textual diff has an added or a deleted line that matches the given
       regular expression. This means that it will detect in-file (or what
       rename-detection considers the same file) moves, which is noise. The
       implementation runs diff twice and greps, and this can be quite
       expensive. To speed things up, binary files without textconv filters
       are ignored.

       When -S or -G is used without --pickaxe-all, only filepairs that match
       the respective criterion are kept in the output. When --pickaxe-all
       is used, if even one filepair matches the respective criterion in a
       changeset, the entire changeset is kept. This behavior is designed to
       make reviewing changes in the context of the whole changeset easier.
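
       The difference between the two criteria can be illustrated with a
       small sketch. The functions pickaxe_s and pickaxe_g are hypothetical
       names, Python's difflib stands in for Git's own diff machinery, and
       Python's re syntax stands in for POSIX regular expressions, so the
       results only approximate what Git would report.

```python
import difflib
import re


def pickaxe_s(pre: str, post: str, needle: str) -> bool:
    # -S: the number of occurrences differs between preimage and postimage.
    return pre.count(needle) != post.count(needle)


def pickaxe_g(pre: str, post: str, pattern: str) -> bool:
    # -G: some added or deleted line of the textual diff matches the regex.
    rx = re.compile(pattern)
    diff = difflib.ndiff(pre.splitlines(), post.splitlines())
    return any(
        line[:2] in ("+ ", "- ") and rx.search(line[2:]) is not None
        for line in diff
    )


# An in-file move: the call is relocated but its count is unchanged.
pre = "foo()\nx\ny\n"
post = "x\ny\nfoo()\n"
s_hit = pickaxe_s(pre, post, "foo")   # same count before and after
g_hit = pickaxe_g(pre, post, r"foo")  # the moved line appears as -/+ lines
```

       Here the -S style check does not flag the filepair, while the -G style
       check does, which is the in-file move noise mentioned above.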

DIFFCORE-ORDER: FOR SORTING THE OUTPUT BASED ON FILENAMES

       This is used to reorder the filepairs according to the user’s (or
       project’s) taste, and is controlled by the -O option to the git diff-*
       commands.

       This takes a text file, each of whose lines is a shell glob pattern.
       Filepairs that match a glob pattern on an earlier line in the file are
       output before ones that match a later line, and filepairs that do not
       match any glob pattern are output last.

       As an example, a typical orderfile for core Git would probably look
       like this:

           README
           Makefile
           Documentation
           *.h
           *.c
           t
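
       The ordering rule can be sketched with Python's fnmatch module. This
       is an illustration under assumptions: order_rank and apply_order are
       hypothetical names, fnmatch globbing is not identical to Git's
       pattern matching, and the sketch also tries each leading directory of
       a path so that lines such as "Documentation" and "t" catch the files
       beneath them, which this note does not spell out.

```python
from fnmatch import fnmatchcase


def order_rank(path: str, patterns) -> int:
    """Index of the first orderfile line matching path, or len(patterns)."""
    for rank, pattern in enumerate(patterns):
        candidate = path
        while candidate:
            if fnmatchcase(candidate, pattern):
                return rank
            if "/" not in candidate:
                break
            # Assumption: retry with the trailing path component removed.
            candidate = candidate.rsplit("/", 1)[0]
    return len(patterns)  # unmatched paths are output last


def apply_order(paths, patterns):
    # sorted() is stable, so paths with the same rank keep their input order.
    return sorted(paths, key=lambda p: order_rank(p, patterns))


orderfile = ["README", "Makefile", "Documentation", "*.h", "*.c", "t"]
paths = ["t/t0000-basic.sh", "diff.c", "Documentation/git.txt",
         "diff.h", "README", "COPYING"]
ordered = apply_order(paths, orderfile)
```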

DIFFCORE-ROTATE: FOR CHANGING AT WHICH PATH OUTPUT STARTS

       This transformation takes one pathname, and rotates the set of
       filepairs so that the filepair for the given pathname comes first,
       optionally discarding the paths that come before it. This is used to
       implement the --skip-to and the --rotate-to options. It is an error
       when the specified pathname is not in the set of filepairs, but it is
       not useful to error out when used with the "git log" family of
       commands, because it is unreasonable to expect that a given path would
       be modified by each and every commit shown by the "git log" command.
       For this reason, when used with "git log", the filepair that sorts the
       same as, or the first one that sorts after, the given pathname is where
       the output starts.

       Use of this transformation combined with diffcore-order will produce
       unexpected results, as the input to this transformation is likely not
       sorted when diffcore-order is in effect.
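
       The rotation can be sketched as below, assuming the input paths are
       already sorted. The name rotate_to and its keyword arguments are
       hypothetical. Python string comparison stands in for Git's path
       ordering, and returning the list unchanged when no path sorts at or
       after the target in the lenient mode is an assumption of this sketch.

```python
def rotate_to(paths, target, skip=False, strict=True):
    """Rotate sorted paths so that target comes first.

    skip=True   discards the paths before target (like --skip-to).
    strict=True raises ValueError when target is absent; strict=False
    mimics the "git log" behaviour of starting at the first path that
    sorts the same as or after target.
    """
    if strict:
        start = paths.index(target)  # raises ValueError if missing
    else:
        start = next((i for i, p in enumerate(paths) if p >= target), None)
        if start is None:
            return list(paths)  # assumption: nothing to rotate to
    tail = paths[start:]
    return tail if skip else tail + paths[:start]


paths = ["a.c", "b.c", "c.c", "d.c"]
rotated = rotate_to(paths, "c.c")
skipped = rotate_to(paths, "c.c", skip=True)
lenient = rotate_to(paths, "bb", strict=False)
```

       In this example, rotated and lenient both start at c.c and wrap around
       to a.c and b.c, while skipped contains only c.c and d.c.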

SEE ALSO

       git-diff(1), git-diff-files(1), git-diff-index(1), git-diff-tree(1),
       git-format-patch(1), git-log(1), gitglossary(7), The Git User’s
       Manual[1]

GIT

       Part of the git(1) suite

NOTES

        1. The Git User’s Manual
           file:///usr/share/doc/git/user-manual.html

Git 2.31.1                        2021-03-26                    GITDIFFCORE(7)