GITDIFFCORE(7) Git Manual GITDIFFCORE(7)

NAME
gitdiffcore - Tweaking diff output

SYNOPSIS
git diff *

DESCRIPTION
The diff commands git diff-index, git diff-files, and git diff-tree can
be told to manipulate differences they find in unconventional ways
before showing diff output. The manipulation is collectively called
"diffcore transformation". This short note describes what they are and
how to use them to produce diff output that is easier to understand
than the conventional kind.

HOW DIFF WORKS
The git diff-* family works by first comparing two sets of files:

• git diff-index compares the contents of a "tree" object and the
  working directory (when the --cached flag is not used) or a "tree"
  object and the index file (when the --cached flag is used);

• git diff-files compares the contents of the index file and the
  working directory;

• git diff-tree compares the contents of two "tree" objects.

In all of these cases, the commands themselves first optionally limit
the two sets of files by any pathspecs given on their command lines,
and then compare corresponding paths in the two resulting sets of files.

The pathspecs are used to limit the world diff operates in. They remove
the filepairs outside the specified sets of pathnames. For example, if
the input set of filepairs included:

    :100644 100644 bcd1234... 0123456... M junkfile

but the command invocation was git diff-files myfile, then the junkfile
entry would be removed from the list because only "myfile" is under
consideration.
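
A minimal Python sketch of this limiting step, working on paths only. It
treats each pathspec as a literal path or a leading directory; real
pathspecs also support globs and "magic" signatures, which the sketch
ignores. The function name filter_by_pathspec is hypothetical.

```python
from typing import List


def filter_by_pathspec(paths: List[str], pathspecs: List[str]) -> List[str]:
    """Keep only paths covered by at least one pathspec.

    Simplification (assumption): a pathspec covers a path if it equals the
    path or names one of its leading directories. No pathspecs means no
    limiting at all.
    """
    if not pathspecs:
        return list(paths)

    def covered(path: str) -> bool:
        return any(
            path == spec or path.startswith(spec.rstrip("/") + "/")
            for spec in pathspecs
        )

    return [p for p in paths if covered(p)]


# The example from the text: junkfile is dropped, only myfile survives.
print(filter_by_pathspec(["junkfile", "myfile"], ["myfile"]))  # ['myfile']
```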

The result of comparison is passed from these commands to what is
internally called "diffcore", in a format similar to what is output
when the -p option is not used. E.g.

    in-place edit  :100644 100644 bcd1234... 0123456... M file0
    create         :000000 100644 0000000... 1234567... A file4
    delete         :100644 000000 1234567... 0000000... D file5
    unmerged       :000000 000000 0000000... 0000000... U file6
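
To make the fields concrete, here is a minimal Python sketch that splits
one of the raw lines above into a filepair record. It assumes the
simplified layout shown on this page: no NUL-terminated -z output, no
quoted paths or paths containing spaces, and an optional similarity
score glued to the status letter (as in R100 or C100 later on). The
FilePair class and parse_raw function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FilePair:
    src_mode: str
    dst_mode: str
    src_sha: str
    dst_sha: str
    status: str            # single letter: M, A, D, U, R, C, ...
    score: Optional[int]   # e.g. 100 for "R100"; None when absent
    src_path: str
    dst_path: str          # equals src_path unless renamed or copied


def parse_raw(line: str) -> FilePair:
    """Parse a simplified raw line such as
    ':100644 100644 bcd1234... 0123456... M file0'."""
    if not line.startswith(":"):
        raise ValueError("raw lines start with ':'")
    fields = line[1:].split()
    src_mode, dst_mode, src_sha, dst_sha, status_field = fields[:5]
    paths = fields[5:]
    status = status_field[0]
    score = int(status_field[1:]) if len(status_field) > 1 else None
    src_path = paths[0]
    dst_path = paths[1] if len(paths) > 1 else paths[0]
    return FilePair(src_mode, dst_mode, src_sha, dst_sha,
                    status, score, src_path, dst_path)


pair = parse_raw(":000000 100644 0000000... 1234567... A file4")
print(pair.status, pair.dst_path)  # A file4
```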

The diffcore mechanism is fed a list of such comparison results (each
of which is called a "filepair", although at this point each of them
talks about a single file), and transforms such a list into another
list. There are currently 6 such transformations:

• diffcore-break

• diffcore-rename

• diffcore-merge-broken

• diffcore-pickaxe

• diffcore-order

• diffcore-rotate

These are applied in sequence. The set of filepairs that git diff-*
commands find is used as the input to diffcore-break, and the output
from diffcore-break is used as the input to the next transformation.
The final result is then passed to the output routine, which generates
either diff-raw format (see the Output format sections of the manual
for the git diff-* commands) or diff-patch format.
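
The chaining can be pictured as ordinary function composition: each
stage takes a list of filepairs and returns a new list. This Python
sketch models only that plumbing. The stage functions are hypothetical
placeholders rather than Git's implementations, and which stages
actually do any work depends on the options and configuration in effect.

```python
from functools import reduce
from typing import Callable, List

FilePairs = List[str]                    # stand-in for filepair records
Stage = Callable[[FilePairs], FilePairs]


def run_chain(pairs: FilePairs, stages: List[Stage]) -> FilePairs:
    """Feed the output of each stage into the next one, in order."""
    return reduce(lambda acc, stage: stage(acc), stages, pairs)


# Hypothetical toy stages, loosely standing in for pickaxe and order.
def keep_c_files(pairs: FilePairs) -> FilePairs:
    return [p for p in pairs if p.endswith(".c")]


def sort_by_name(pairs: FilePairs) -> FilePairs:
    return sorted(pairs)


print(run_chain(["z.c", "README", "a.c"], [keep_c_files, sort_by_name]))
# ['a.c', 'z.c']
```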

DIFFCORE-BREAK: FOR SPLITTING UP COMPLETE REWRITES
The first transformation in the chain is diffcore-break, and it is
controlled by the -B option to the git diff-* commands. It is used to
detect a filepair that represents a "complete rewrite" and break such a
filepair into two filepairs that represent a delete and a create. For
example, if the input contained this filepair:

    :100644 100644 bcd1234... 0123456... M file0

and it detects that the file "file0" is completely rewritten, it
changes it to:

    :100644 000000 bcd1234... 0000000... D file0
    :000000 100644 0000000... 0123456... A file0

For the purpose of breaking a filepair, diffcore-break examines the
extent of changes between the contents of the files before and after
modification (i.e. the contents that have "bcd1234..." and "0123456..."
as their SHA-1 content IDs, in the above example). The amount of
deletion of original contents and insertion of new material are added
together, and if the sum exceeds the "break score", the filepair is
broken into two. The break score defaults to 50% of the size of the
smaller of the original and the result (i.e. if the edit shrinks the
file, the size of the result is used; if the edit lengthens the file,
the size of the original is used), and can be customized by giving a
number after the "-B" option (e.g. "-B75" to tell it to use 75%).
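
A minimal Python sketch of the break decision as this paragraph words
it, assuming the deleted and inserted amounts and both sizes are already
known (how Git measures them internally is not modeled). It follows this
page's description rather than Git's source; the names should_break and
break_pair are hypothetical, and filepairs are plain dicts for brevity.

```python
def should_break(deleted: int, inserted: int,
                 src_size: int, dst_size: int,
                 break_score: float = 0.50) -> bool:
    """Return True if a modification should be split into delete + create.

    Deletion and insertion are added together and compared with
    break_score times the smaller of the original and result sizes.
    """
    extent = deleted + inserted
    threshold = break_score * min(src_size, dst_size)
    return extent > threshold


def break_pair(pair: dict) -> list:
    """Turn one 'M' filepair into a 'D' and an 'A' filepair for one path."""
    deletion = {"status": "D", "path": pair["path"],
                "src": pair["src"], "dst": None}
    creation = {"status": "A", "path": pair["path"],
                "src": None, "dst": pair["dst"]}
    return [deletion, creation]


# 90 of 100 units deleted, 80 inserted, giving a 90-unit result:
# extent 170 > 0.5 * 90 = 45, so the pair is broken.
print(should_break(deleted=90, inserted=80, src_size=100, dst_size=90))  # True
print([p["status"] for p in break_pair(
    {"status": "M", "path": "file0", "src": "bcd1234", "dst": "0123456"})])
# ['D', 'A']
```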

DIFFCORE-RENAME: FOR DETECTING RENAMES AND COPIES
This transformation is used to detect renames and copies, and is
controlled by the -M option (to detect renames) and the -C option (to
detect copies as well) to the git diff-* commands. If the input
contained these filepairs:

    :100644 000000 0123456... 0000000... D fileX
    :000000 100644 0000000... 0123456... A file0

and the contents of the deleted file fileX are similar enough to the
contents of the created file file0, then rename detection merges these
filepairs and creates:

    :100644 100644 0123456... 0123456... R100 fileX file0

When the "-C" option is used, the original contents of modified files
and deleted files (and also unmodified files, if the
"--find-copies-harder" option is used) are considered as candidates for
the source files in a rename/copy operation. If the input contained
these filepairs, which talk about a modified file fileY and a newly
created file file0:

    :100644 100644 0123456... 1234567... M fileY
    :000000 100644 0000000... bcd3456... A file0

the original contents of fileY and the resulting contents of file0 are
compared, and if they are similar enough, they are changed to:

    :100644 100644 0123456... 1234567... M fileY
    :100644 100644 0123456... bcd3456... C100 fileY file0

In both rename and copy detection, the same "extent of changes"
algorithm used in diffcore-break is used to determine if two files are
"similar enough", and it can be customized to use a similarity score
different from the default of 50% by giving a number after the "-M" or
"-C" option (e.g. "-M8" to tell it to use 8/10 = 80%).
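
The number is easiest to read as digits placed after a decimal point,
which is how git-diff(1) describes -M: -M5 means 0.5 (the same as
-M50%), and a trailing % makes the number a plain percentage. This
Python sketch applies that reading; it reproduces both "-M8" = 80% above
and "-B75" = 75% from the diffcore-break section. The name parse_score
is hypothetical, and forms such as numbers containing a decimal point
are not handled.

```python
import re


def parse_score(arg: str) -> float:
    """Convert the number after -M, -C, or -B into a fraction.

    Examples: "8" -> 0.8, "75" -> 0.75, "05" -> 0.05, "50%" -> 0.5.
    """
    match = re.fullmatch(r"([0-9]+)(%?)", arg)
    if match is None:
        raise ValueError(f"unsupported score: {arg!r}")
    digits, percent = match.groups()
    if percent:
        # A trailing % makes it an ordinary percentage.
        return int(digits) / 100
    # Without %, the digits follow an implied decimal point: "8" is 0.8.
    return int(digits) / 10 ** len(digits)


for arg in ["8", "75", "05", "50%"]:
    print(arg, parse_score(arg))
```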

Note that when rename detection is on but both copy and break detection
are off, rename detection adds a preliminary step that first checks
whether files were moved across directories while keeping their
filename the same. If a file was added to a directory and its contents
are sufficiently similar to a file with the same name that was deleted
from a different directory, the two are marked as a rename and excluded
from the later quadratic step (the one that pairwise compares all
unmatched files to find the "best" matches, determined by the highest
content similarity). So, for example, if a deleted docs/ext.txt and an
added docs/config/ext.txt are similar enough, they will be marked as a
rename, which prevents an added docs/ext.md that may be even more
similar to the deleted docs/ext.txt from being considered as the rename
destination in the later step. For this reason, the preliminary "match
same filename" step uses a slightly higher threshold to mark a file
pair as a rename and stop considering other candidates for better
matches. At most one comparison is done per file in this preliminary
pass; so if there are several remaining ext.txt files throughout the
directory hierarchy after exact rename detection, this preliminary step
may be skipped for those files.
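
A simplified Python sketch of this preliminary pass. The similarity
function and threshold are supplied by the caller; the rule that a
basename must be unique among both the deleted and the added files (so
each file takes part in at most one comparison), the function names, and
the 0.8 threshold in the example are assumptions for illustration, not
Git's exact heuristic.

```python
import posixpath
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def basename_renames(deleted: List[str], added: List[str],
                     similarity: Callable[[str, str], float],
                     threshold: float) -> List[Tuple[str, str]]:
    """Pair deleted and added files that share a unique basename.

    Pairs returned here would be excluded from the later quadratic step.
    """
    deleted_by_name: Dict[str, List[str]] = defaultdict(list)
    added_by_name: Dict[str, List[str]] = defaultdict(list)
    for path in deleted:
        deleted_by_name[posixpath.basename(path)].append(path)
    for path in added:
        added_by_name[posixpath.basename(path)].append(path)

    renames = []
    for name, sources in deleted_by_name.items():
        targets = added_by_name.get(name, [])
        if len(sources) != 1 or len(targets) != 1:
            continue  # ambiguous or no candidate: leave for the later step
        if similarity(sources[0], targets[0]) >= threshold:
            renames.append((sources[0], targets[0]))
    return renames


# Hypothetical similarity values for the docs/ext.txt example above.
SCORES = {("docs/ext.txt", "docs/config/ext.txt"): 0.85}


def example_similarity(src: str, dst: str) -> float:
    return SCORES.get((src, dst), 0.0)


print(basename_renames(["docs/ext.txt"],
                       ["docs/config/ext.txt", "docs/ext.md"],
                       example_similarity, threshold=0.8))
# [('docs/ext.txt', 'docs/config/ext.txt')]
```

Note that docs/ext.md is never compared here, which is exactly why the
text above calls for a stricter threshold in this pass.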

Note: when the "-C" option is used with the --find-copies-harder
option, git diff-* commands feed unmodified filepairs to the diffcore
mechanism as well as modified ones. This lets the copy detector consider
unmodified files as copy source candidates at the expense of making it
slower. Without --find-copies-harder, git diff-* commands can detect
copies only if the file that was copied happened to have been modified
in the same changeset.
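
The rule for which files may serve as sources can be written down
directly. In this Python sketch, filepairs are (status, path) tuples,
and "=" is a hypothetical marker for an unmodified filepair, which is
only present under --find-copies-harder. Treating deleted files as the
only sources for plain rename detection follows from the rename example
above; the function name copy_source_candidates is hypothetical.

```python
from typing import List, Tuple

# (status, path); "=" is a hypothetical marker for an unmodified file.
Pair = Tuple[str, str]


def copy_source_candidates(pairs: List[Pair], detect_copies: bool,
                           find_copies_harder: bool) -> List[str]:
    """Return the paths whose original contents may be rename/copy sources."""
    allowed = {"D"}                    # renames: deleted files
    if detect_copies:                  # -C: modified files as well
        allowed.add("M")
        if find_copies_harder:         # --find-copies-harder: unmodified too
            allowed.add("=")
    return [path for status, path in pairs if status in allowed]


pairs = [("M", "fileY"), ("A", "file0"), ("D", "old.c"), ("=", "lib.c")]
print(copy_source_candidates(pairs, detect_copies=True,
                             find_copies_harder=False))
# ['fileY', 'old.c']
```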

DIFFCORE-MERGE-BROKEN: FOR PUTTING COMPLETE REWRITES BACK TOGETHER
This transformation is used to merge filepairs broken by
diffcore-break, and not transformed into rename/copy by
diffcore-rename, back into a single modification. This always runs when
diffcore-break is used.

For the purpose of merging broken filepairs back, it uses a different
"extent of changes" computation from the ones used by diffcore-break
and diffcore-rename. It counts only the deletion from the original, and
does not count insertion. If you removed only 10 lines from a 100-line
document, even if you added 910 new lines to make a new 1000-line
document, you did not do a complete rewrite. diffcore-break breaks such
a case in order to help diffcore-rename consider such filepairs as
candidates for rename/copy detection, but if filepairs broken that way
were not matched with other filepairs to create a rename/copy, then this
transformation merges them back into the original "modification".
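
The deletion-only measure fits in a few lines of Python. This sketch
assumes the amount deleted and the original size are already known and
uses the 80% default stated in the following paragraph; the function
name should_merge_back is hypothetical.

```python
def should_merge_back(deleted: int, src_size: int,
                      merge_score: float = 0.80) -> bool:
    """Return True if a broken pair should become one modification again.

    Only deletion from the original counts; insertions are ignored. The
    pair stays broken only if more than merge_score of the original
    material was deleted. src_size must be positive.
    """
    if src_size <= 0:
        raise ValueError("original size must be positive")
    return deleted / src_size <= merge_score


# The example above: 10 of 100 lines removed (10%); the 910 added lines
# do not count, so the broken pair is merged back.
print(should_merge_back(deleted=10, src_size=100))  # True
```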

The "extent of changes" parameter can be tweaked from the default 80%
(that is, unless more than 80% of the original material is deleted, the
broken pairs are merged back into a single modification) by giving a
second number to the -B option, like these:

• -B50/60 (give 50% "break score" to diffcore-break, use 60% for
  diffcore-merge-broken).

• -B/60 (the same as above, since diffcore-break defaults to 50%).
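
A Python sketch of how the two numbers in these -B forms could be split
and defaulted. It assumes the digits-after-a-decimal-point reading of
scores that git-diff(1) documents for -M (so "50" means 0.50) and does
not handle a % suffix; the name parse_break_option is hypothetical.

```python
import re
from typing import Tuple

DEFAULT_BREAK = 0.50   # diffcore-break default, per this page
DEFAULT_MERGE = 0.80   # diffcore-merge-broken default, per this page


def _fraction(digits: str) -> float:
    # "50" -> 0.50, "6" -> 0.6: the digits follow an implied decimal point.
    return int(digits) / 10 ** len(digits)


def parse_break_option(option: str) -> Tuple[float, float]:
    """Split '-B', '-B75', '-B50/60', or '-B/60' into (break, merge)."""
    match = re.fullmatch(r"-B([0-9]*)(?:/([0-9]+))?", option)
    if match is None:
        raise ValueError(f"unsupported option: {option!r}")
    break_digits, merge_digits = match.groups()
    break_score = _fraction(break_digits) if break_digits else DEFAULT_BREAK
    merge_score = _fraction(merge_digits) if merge_digits else DEFAULT_MERGE
    return break_score, merge_score


for opt in ["-B", "-B75", "-B50/60", "-B/60"]:
    print(opt, parse_break_option(opt))
```

As the list above says, -B50/60 and -B/60 come out identical because the
break score falls back to its 50% default.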

Note that earlier implementations left a broken pair as separate
creation and deletion patches. This was an unnecessary hack, and the
latest implementation always merges all the broken pairs back into
modifications, but the resulting patch output is formatted differently
for easier review in the case of such a complete rewrite: it shows the
entire contents of the old version prefixed with -, followed by the
entire contents of the new version prefixed with +.

DIFFCORE-PICKAXE: FOR DETECTING ADDITION/DELETION OF SPECIFIED STRING
This transformation limits the set of filepairs to those that change
specified strings between the preimage and the postimage in a certain
way. The -S<block of text> and -G<regular expression> options are used
to specify different ways these strings are sought.

"-S<block of text>" detects filepairs whose preimage and postimage have
a different number of occurrences of the specified block of text. By
definition, it will not detect in-file moves. Also, when a changeset
moves a file wholesale without affecting the interesting string,
diffcore-rename kicks in as usual, and -S omits the filepair (since the
number of occurrences of that string didn’t change in that
rename-detected filepair). When used with --pickaxe-regex, -S treats
the <block of text> as an extended POSIX regular expression to match,
instead of a literal string.
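
The -S test reduces to comparing two counts. In this Python sketch the
preimage and postimage are plain strings, and the regex variant uses
Python's re module, whose syntax differs from POSIX extended regular
expressions in places. The function name pickaxe_s is hypothetical.

```python
import re


def pickaxe_s(preimage: str, postimage: str, needle: str,
              regex: bool = False) -> bool:
    """Keep the filepair if the number of occurrences of needle changed."""
    if regex:
        pattern = re.compile(needle)
        before = len(pattern.findall(preimage))
        after = len(pattern.findall(postimage))
    else:
        before = preimage.count(needle)
        after = postimage.count(needle)
    return before != after


old = "a\nfoo()\nb\n"
print(pickaxe_s(old, "a\nb\nfoo()\n", "foo()"))  # False: in-file move only
print(pickaxe_s(old, "a\nb\n", "foo()"))         # True: the call was removed
```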

"-G<regular expression>" (mnemonic: grep) detects filepairs whose
textual diff has an added or a deleted line that matches the given
regular expression. This means that it will detect in-file (or what
rename detection considers the same file) moves, which is noise. The
implementation runs diff twice and greps, and this can be quite
expensive. To speed things up, binary files without textconv filters
are ignored.

When -S or -G is used without --pickaxe-all, only filepairs that match
their respective criterion are kept in the output. When --pickaxe-all
is used, if even one filepair in a changeset matches its respective
criterion, the entire changeset is kept. This behavior is designed to
make reviewing changes in the context of the whole changeset easier.
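
The difference between the two modes fits in one function. In this
Python sketch, matches is any per-filepair predicate (for instance a -S
or -G test); the example predicate only looks at file names for brevity,
and the names apply_pickaxe and is_c_file are hypothetical.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def apply_pickaxe(pairs: List[T], matches: Callable[[T], bool],
                  pickaxe_all: bool = False) -> List[T]:
    """Filter one changeset's filepairs with a pickaxe predicate."""
    if pickaxe_all:
        # Keep the whole changeset if any single filepair matches.
        return list(pairs) if any(matches(p) for p in pairs) else []
    return [p for p in pairs if matches(p)]


def is_c_file(path: str) -> bool:
    return path.endswith(".c")


changeset = ["diff.c", "diff.h", "README"]
print(apply_pickaxe(changeset, is_c_file))                    # ['diff.c']
print(apply_pickaxe(changeset, is_c_file, pickaxe_all=True))
# ['diff.c', 'diff.h', 'README']
```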

DIFFCORE-ORDER: FOR SORTING THE OUTPUT BASED ON FILENAMES
This is used to reorder the filepairs according to the user’s (or
project’s) taste, and is controlled by the -O option to the git diff-*
commands.

This takes a text file, each of whose lines is a shell glob pattern.
Filepairs that match a glob pattern on an earlier line in the file are
output before ones that match a later line, and filepairs that do not
match any glob pattern are output last.

As an example, a typical orderfile for core Git would probably look
like this:

    README
    Makefile
    Documentation
    *.h
    *.c
    t
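
A Python sketch of the ordering that uses fnmatch.fnmatchcase as a
stand-in for Git's own wildcard matcher. So that entries such as
Documentation and t in the example can cover files inside those
directories, the sketch also tries each leading directory of a path
against every pattern; treat that detail, and the names rank and
order_pairs, as assumptions. Python's sort is stable, so filepairs with
the same rank keep their original relative order.

```python
from fnmatch import fnmatchcase
from typing import List


def rank(path: str, patterns: List[str]) -> int:
    """Index of the first pattern matching path or a leading directory.

    Paths that match no pattern get len(patterns), so they sort last.
    """
    for index, pattern in enumerate(patterns):
        candidate = path
        while candidate:
            if fnmatchcase(candidate, pattern):
                return index
            if "/" not in candidate:
                break
            candidate = candidate.rsplit("/", 1)[0]  # drop last component
    return len(patterns)


def order_pairs(paths: List[str], orderfile_text: str) -> List[str]:
    patterns = [line.strip() for line in orderfile_text.splitlines()
                if line.strip()]
    return sorted(paths, key=lambda p: rank(p, patterns))


ORDERFILE = "README\nMakefile\nDocumentation\n*.h\n*.c\nt\n"
paths = ["t/t0000-basic.sh", "diff.c", "diff.h", "Documentation/git.txt",
         "README", ".gitignore"]
print(order_pairs(paths, ORDERFILE))
# ['README', 'Documentation/git.txt', 'diff.h', 'diff.c',
#  't/t0000-basic.sh', '.gitignore']
```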

DIFFCORE-ROTATE: FOR CHANGING AT WHICH PATH OUTPUT STARTS
This transformation takes one pathname, and rotates the set of
filepairs so that the filepair for the given pathname comes first,
optionally discarding the paths that come before it. This is used to
implement the --skip-to and the --rotate-to options. It is an error
when the specified pathname is not in the set of filepairs, but it is
not useful to error out when used with the "git log" family of
commands, because it is unreasonable to expect that a given path would
be modified by each and every commit shown by the "git log" command.
For this reason, when used with "git log", the filepair that sorts the
same as, or the first one that sorts after, the given pathname is where
the output starts.
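
A Python sketch of the two options, assuming the filepairs are already
in their normal sorted order and comparing paths as plain strings (Git's
own path ordering may differ in details). The lenient flag models the
"git log" behavior described above; the function name rotate_to and its
parameters are hypothetical.

```python
from typing import List


def rotate_to(paths: List[str], target: str, skip: bool = False,
              lenient: bool = False) -> List[str]:
    """Start output at target: skip=True drops earlier paths (--skip-to),
    otherwise they are moved to the end (--rotate-to).

    paths must already be sorted. With lenient=True (the "git log" case),
    a missing target starts output at the first path sorting after it.
    """
    if target in paths:
        start = paths.index(target)
    elif lenient:
        # First path that sorts the same as, or after, the target.
        start = next((i for i, p in enumerate(paths) if p >= target),
                     len(paths))
    else:
        raise ValueError(f"no such path in the diff: {target}")
    if skip:
        return paths[start:]
    return paths[start:] + paths[:start]


files = ["Makefile", "diff.c", "diff.h", "t/t4000.sh"]
print(rotate_to(files, "diff.h"))
# ['diff.h', 't/t4000.sh', 'Makefile', 'diff.c']
print(rotate_to(files, "diff.h", skip=True))  # ['diff.h', 't/t4000.sh']
```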

Use of this transformation combined with diffcore-order will produce
unexpected results, as the input to this transformation is likely not
sorted when diffcore-order is in effect.

SEE ALSO
git-diff(1), git-diff-files(1), git-diff-index(1), git-diff-tree(1),
git-format-patch(1), git-log(1), gitglossary(7), The Git User’s
Manual[1]

GIT
Part of the git(1) suite

NOTES
 1. The Git User’s Manual
    file:///usr/share/doc/git/user-manual.html

Git 2.39.1 2023-01-13 GITDIFFCORE(7)