DIFF(1P)                POSIX Programmer's Manual                DIFF(1P)

PROLOG
This manual page is part of the POSIX Programmer's Manual. The Linux
implementation of this interface may differ (consult the corresponding
Linux manual page for details of Linux behavior), or the interface may
not be implemented on Linux.

NAME
diff - compare two files

SYNOPSIS
diff [-c|-e|-f|-u|-C n|-U n] [-br] file1 file2

DESCRIPTION
The diff utility shall compare the contents of file1 and file2 and
write to standard output a list of changes necessary to convert file1
into file2. This list should be minimal. No output shall be produced
if the files are identical.

OPTIONS
The diff utility shall conform to the Base Definitions volume of
POSIX.1-2008, Section 12.2, Utility Syntax Guidelines.

The following options shall be supported:

-b    Cause any amount of white space at the end of a line to be
      treated as a single <newline> (that is, the white-space
      characters preceding the <newline> are ignored) and other
      strings of white-space characters, not including <newline>
      characters, to compare equal.

-c    Produce output in a form that provides three lines of copied
      context.

-C n  Produce output in a form that provides n lines of copied
      context (where n shall be interpreted as a positive decimal
      integer).

-e    Produce output in a form suitable as input for the ed utility,
      which can then be used to convert file1 into file2.

-f    Produce output in an alternative form, similar in format to
      -e, but not intended to be suitable as input for the ed
      utility, and in the opposite order.

-r    Apply diff recursively to files and directories of the same
      name when file1 and file2 are both directories.

      The diff utility shall detect infinite loops; that is,
      entering a previously visited directory that is an ancestor of
      the last file encountered. When it detects an infinite loop,
      diff shall write a diagnostic message to standard error and
      shall either recover its position in the hierarchy or
      terminate.

-u    Produce output in a form that provides three lines of unified
      context.

-U n  Produce output in a form that provides n lines of unified
      context (where n shall be interpreted as a non-negative
      decimal integer).

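The white-space rule of the -b option can be illustrated with a minimal
Python sketch. It is not how any particular diff is implemented; the
function names are hypothetical, and "white space" is approximated by
the regular-expression class \s, which is an assumption (the standard
ties the definition to the locale).

```python
import re

# Runs of white space that do not include <newline>.
# Assumption: the regex class \s stands in for "white-space characters".
_BLANK_RUN = re.compile(r"[^\S\n]+")


def normalize_b(line):
    """Return a comparison key for one line under -b semantics."""
    has_newline = line.endswith("\n")
    body = line[:-1] if has_newline else line
    body = body.rstrip()               # white space before <newline> is ignored
    body = _BLANK_RUN.sub(" ", body)   # any other run compares equal to any other
    return body + ("\n" if has_newline else "")


def equal_under_b(a, b):
    """True if two lines would compare equal with -b in this sketch."""
    return normalize_b(a) == normalize_b(b)
```

Note that a run of white space still differs from no white space at
all: "a b" and "ab" do not compare equal in this sketch, and leading
white space is collapsed rather than removed.
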
OPERANDS
The following operands shall be supported:

file1, file2
      A pathname of a file to be compared. If either the file1 or
      file2 operand is '-', the standard input shall be used in its
      place.

If both file1 and file2 are directories, diff shall not compare block
special files, character special files, or FIFO special files to any
files and shall not compare regular files to directories. Further
details are as specified in Diff Directory Comparison Format. The
behavior of diff on other file types is implementation-defined when
found in directories.

If only one of file1 and file2 is a directory, diff shall be applied to
the non-directory file and the file contained in the directory file
with a filename that is the same as the last component of the
non-directory file.

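The rule for a single directory operand can be sketched in Python as
follows. The function name resolve_operands is hypothetical, and the
sketch deliberately ignores the '-' operand and the case where both
operands are directories.

```python
import os.path


def resolve_operands(file1, file2):
    """Return the pair of pathnames that would actually be compared.

    If exactly one operand is a directory, the other operand's last
    pathname component is looked up inside that directory.
    """
    dir1, dir2 = os.path.isdir(file1), os.path.isdir(file2)
    if dir1 and not dir2:
        return os.path.join(file1, os.path.basename(file2)), file2
    if dir2 and not dir1:
        return file1, os.path.join(file2, os.path.basename(file1))
    return file1, file2  # two files, or two directories: unchanged
```

For example, with an existing directory backup, the operands
src/notes.txt and backup would resolve to src/notes.txt and
backup/notes.txt.
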
STDIN
The standard input shall be used only if one of the file1 or file2
operands references standard input. See the INPUT FILES section.

INPUT FILES
The input files may be of any type.

ENVIRONMENT VARIABLES
The following environment variables shall affect the execution of diff:

LANG      Provide a default value for the internationalization
          variables that are unset or null. (See the Base Definitions
          volume of POSIX.1-2008, Section 8.2, Internationalization
          Variables for the precedence of internationalization
          variables used to determine the values of locale categories.)

LC_ALL    If set to a non-empty string value, override the values of
          all the other internationalization variables.

LC_CTYPE  Determine the locale for the interpretation of sequences of
          bytes of text data as characters (for example, single-byte as
          opposed to multi-byte characters in arguments and input
          files).

LC_MESSAGES
          Determine the locale that should be used to affect the format
          and contents of diagnostic messages written to standard error
          and informative messages written to standard output.

LC_TIME   Determine the locale for affecting the format of file
          timestamps written with the -C and -c options.

NLSPATH   Determine the location of message catalogs for the processing
          of LC_MESSAGES.

TZ        Determine the timezone used for calculating file timestamps
          written with a context format. If TZ is unset or null, an
          unspecified default timezone shall be used.

ASYNCHRONOUS EVENTS
Default.

STDOUT
Diff Directory Comparison Format
If both file1 and file2 are directories, the following output formats
shall be used.

In the POSIX locale, each file that is present in only one directory
shall be reported using the following format:

    "Only in %s: %s\n", <directory pathname>, <filename>

In the POSIX locale, subdirectories that are common to the two
directories may be reported with the following format:

    "Common subdirectories: %s and %s\n", <directory1 pathname>,
        <directory2 pathname>

For each file common to the two directories, if the two files are not
to be compared: if the two files have the same device ID and file
serial number, or are both block special files that refer to the same
device, or are both character special files that refer to the same
device, in the POSIX locale the output format is unspecified.
Otherwise, in the POSIX locale an unspecified format shall be used that
contains the pathnames of the two files.

For each file common to the two directories, if the files are compared
and are identical, no output shall be written. If the two files differ,
the following format is written:

    "diff %s %s %s\n", <diff_options>, <filename1>, <filename2>

where <diff_options> are the options as specified on the command line.

All directory pathnames listed in this section shall be relative to the
original command line arguments. All other names of files listed in
this section shall be filenames (pathname components).

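As an illustration of the "Only in" format, the following Python sketch
produces the report lines for two directories whose entry names are
already known. The function name only_in_lines is hypothetical, and the
sorted order is an assumption, since no directory search order is
specified for diff.

```python
def only_in_lines(dir1, names1, dir2, names2):
    """Return "Only in" lines for names present in just one directory.

    names1 and names2 are the sets of filenames (pathname components)
    found in dir1 and dir2.
    """
    names1, names2 = set(names1), set(names2)
    lines = []
    # Assumption: report in sorted order; the standard leaves the order open.
    for name in sorted(names1 ^ names2):
        directory = dir1 if name in names1 else dir2
        lines.append("Only in %s: %s\n" % (directory, name))
    return lines
```

Called as only_in_lines("dir1/x", {"date.out"}, "dir2/x", {"date.out",
"y"}), the sketch returns the single line "Only in dir2/x: y\n".
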
Diff Binary Output Format
In the POSIX locale, if one or both of the files being compared are not
text files, it is implementation-defined whether diff uses the binary
file output format or the other formats as specified below. The binary
file output format shall contain the pathnames of two files being
compared and the string "differ".

If both files being compared are text files, depending on the options
specified, one of the following formats shall be used to write the
differences.

Diff Default Output Format
The default (without -e, -f, -c, -C, -u, or -U options) diff utility
output shall contain lines of these forms:

    "%da%d\n", <num1>, <num2>

    "%da%d,%d\n", <num1>, <num2>, <num3>

    "%dd%d\n", <num1>, <num2>

    "%d,%dd%d\n", <num1>, <num2>, <num3>

    "%dc%d\n", <num1>, <num2>

    "%d,%dc%d\n", <num1>, <num2>, <num3>

    "%dc%d,%d\n", <num1>, <num2>, <num3>

    "%d,%dc%d,%d\n", <num1>, <num2>, <num3>, <num4>

These lines resemble ed subcommands to convert file1 into file2. The
line numbers before the action letters shall pertain to file1; those
after shall pertain to file2. Thus, by exchanging a for d and reading
the line in reverse order, one can also determine how to convert file2
into file1. As in ed, identical pairs (where num1 = num2) are
abbreviated as a single number.

Following each of these lines, diff shall write to standard output all
lines affected in the first file using the format:

    "< %s", <line>

and all lines affected in the second file using the format:

    "> %s", <line>

If there are lines affected in both file1 and file2 (as with the c
subcommand), the changes are separated with a line consisting of three
<hyphen> characters:

    "---\n"

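The eight line forms of the default format reduce to one rule: print
each side as a single number when its first and last line coincide, and
as a comma-separated pair otherwise. A minimal Python sketch, with
hypothetical function names, also shows the reversal described above
(exchange a for d and swap the sides):

```python
def _span(first, last):
    """Format a line range, abbreviating identical pairs to one number."""
    return "%d" % first if first == last else "%d,%d" % (first, last)


def normal_command(action, left, right):
    """Build a default-format change line.

    left and right are (first, last) line-number pairs for file1 and file2.
    """
    if action not in ("a", "c", "d"):
        raise ValueError("action must be 'a', 'c' or 'd'")
    if action == "a" and left[0] != left[1]:
        raise ValueError("'a' takes a single file1 line number")
    if action == "d" and right[0] != right[1]:
        raise ValueError("'d' takes a single file2 line number")
    return "%s%s%s\n" % (_span(*left), action, _span(*right))


def reverse_command(action, left, right):
    """Build the line that converts file2 back into file1."""
    swapped = {"a": "d", "d": "a", "c": "c"}[action]
    return normal_command(swapped, right, left)
```

For instance, normal_command("a", (3, 3), (4, 6)) gives "3a4,6\n", and
reverse_command with the same arguments gives "4,6d3\n".
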
Diff -e Output Format
With the -e option, a script shall be produced that shall, when
provided as input to ed, along with an appended w (write) command,
convert file1 into file2. Only the a (append), c (change), d (delete),
i (insert), and s (substitute) commands of ed shall be used in this
script. Text lines, except those consisting of the single character
<period> ('.'), shall be output as they appear in the file.

Diff -f Output Format
With the -f option, an alternative format of script shall be produced.
It is similar to that produced by -e, with the following differences:

1. It is expressed in reverse sequence; the output of -e orders
   changes from the end of the file to the beginning; the -f from
   beginning to end.

2. The command form <lines> <command-letter> used by -e is reversed.
   For example, 10c with -e would be c10 with -f.

3. The form used for ranges of line numbers is <space>-separated,
   rather than <comma>-separated.

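The three differences can be sketched as a small Python transformation
from an -e style script to the -f form. This is only an illustration:
the function names are hypothetical, only the a, c, and d commands are
handled, and the script is assumed to be already split into
(command, text lines) pairs.

```python
import re

# An -e command: one or two line numbers followed by a command letter.
_ED_COMMAND = re.compile(r"(\d+)(?:,(\d+))?([acd])")


def ed_to_forward(command):
    """Rewrite one -e command in -f form, e.g. "3,5d" -> "d3 5"."""
    match = _ED_COMMAND.fullmatch(command)
    if match is None:
        raise ValueError("unsupported command: %r" % command)
    first, last, letter = match.groups()
    lines = first if last is None else "%s %s" % (first, last)
    return letter + lines           # letter first, <space>-separated range


def forward_script(ed_hunks):
    """Convert [(command, text_lines), ...] in -e order to -f order."""
    out = []
    for command, text in reversed(ed_hunks):   # -f runs beginning to end
        out.append(ed_to_forward(command))
        if command[-1] in "ac":                # a and c carry text lines
            out.extend(text)
            out.append(".")
    return out
```

With the input [("7,8d", []), ("2c", ["new"])], the sketch returns
["c2", "new", ".", "d7 8"].
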
Diff -c or -C Output Format
With the -c or -C option, the output format shall consist of affected
lines along with surrounding lines of context. The affected lines shall
show which ones need to be deleted or changed in file1, and those added
from file2. With the -c option, three lines of context, if available,
shall be written before and after the affected lines. With the -C
option, the user can specify how many lines of context are written.
The exact format follows.

The name and last modification time of each file shall be output in the
following format:

    "*** %s %s\n", file1, <file1 timestamp>
    "--- %s %s\n", file2, <file2 timestamp>

Each <file> field shall be the pathname of the corresponding file being
compared. The pathname written for standard input is unspecified.

In the POSIX locale, each <timestamp> field shall be equivalent to the
output from the following command:

    date "+%a %b %e %T %Y"

without the trailing <newline>, executed at the time of last
modification of the corresponding file (or the current time, if the
file is standard input).

Then, the following output formats shall be applied for every set of
changes.

First, a line shall be written in the following format:

    "***************\n"

Next, the range of lines in file1 shall be written in the following
format if the range contains two or more lines:

    "*** %d,%d ****\n", <beginning line number>, <ending line number>

and the following format otherwise:

    "*** %d ****\n", <ending line number>

The ending line number of an empty range shall be the number of the
preceding line, or 0 if the range is at the start of the file.

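The range line can be sketched in Python as below. The function name
and the (first, count) representation of a range are assumptions made
for illustration; the '-' marker anticipates the file2 form given later
in this section, which has the same shape with <hyphen> characters.

```python
def context_range_line(marker, first, count):
    """Build the range line of a context-format hunk.

    marker is "*" for file1 or "-" for file2; first is the 1-based
    number of the first line in the range (or of the line where an
    empty range would begin); count is the number of lines in it.
    """
    if marker not in ("*", "-"):
        raise ValueError("marker must be '*' or '-'")
    last = first + count - 1      # for an empty range: the preceding line, or 0
    if count >= 2:
        body = "%d,%d" % (first, last)
    else:
        body = "%d" % last        # one line or empty: ending line number only
    return "%s %s %s\n" % (marker * 3, body, marker * 4)
```

For example, context_range_line("*", 1, 5) gives "*** 1,5 ****\n", and
an empty range at the start of the file, context_range_line("*", 1, 0),
gives "*** 0 ****\n".
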
Next, the affected lines along with lines of context (unaffected lines)
shall be written. Unaffected lines shall be written in the following
format:

    "  %s", <unaffected_line>

Deleted lines shall be written as:

    "- %s", <deleted_line>

Changed lines shall be written as:

    "! %s", <changed_line>

Next, the range of lines in file2 shall be written in the following
format if the range contains two or more lines:

    "--- %d,%d ----\n", <beginning line number>, <ending line number>

and the following format otherwise:

    "--- %d ----\n", <ending line number>

Then, lines of context and changed lines shall be written as described
in the previous formats. Lines added from file2 shall be written in the
following format:

    "+ %s", <added_line>

Diff -u or -U Output Format
The -u or -U options behave like the -c or -C options, except that the
context lines are not repeated; instead, the context, deleted, and
added lines are shown together, interleaved. The exact format follows.

The name and last modification time of each file shall be output in the
following format:

    "--- %s\t%s%s %s\n", file1, <file1 timestamp>, <file1 frac>, <file1 zone>
    "+++ %s\t%s%s %s\n", file2, <file2 timestamp>, <file2 frac>, <file2 zone>

Each <file> field shall be the pathname of the corresponding file being
compared, or the single character '-' if standard input is being
compared. However, if the pathname contains a <tab> or a <newline>, or
if it does not consist entirely of characters taken from the portable
character set, the behavior is implementation-defined.

Each <timestamp> field shall be equivalent to the output from the
following command:

    date '+%Y-%m-%d %H:%M:%S'

without the trailing <newline>, executed at the time of last
modification of the corresponding file (or the current time, if the
file is standard input).

Each <frac> field shall be either empty, or a decimal point followed by
at least one decimal digit, indicating the fractional-seconds part (if
any) of the file timestamp. The number of fractional digits shall be at
least the number needed to represent the file's timestamp without loss
of information.

Each <zone> field shall be of the form "shhmm", where "shh" is a signed
two-digit decimal number in the range -24 through +25, and "mm" is an
unsigned two-digit decimal number in the range 00 through 59. It
represents the timezone of the timestamp as the number of hours (hh)
and minutes (mm) east (+) or west (-) of UTC for the timestamp. If the
hours and minutes are both zero, the sign shall be '+'. However, if
the timezone is not an integral number of minutes away from UTC, the
<zone> field is implementation-defined.

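The <frac> and <zone> fields can be sketched in Python as follows. Both
function names are hypothetical. The sketch assumes the timestamp's
sub-second part is available in nanoseconds and that the UTC offset is
a whole number of minutes; always writing nine digits is one choice
that satisfies the "without loss of information" requirement, not the
only one.

```python
def frac_field(nanoseconds):
    """Return the <frac> field: empty, or "." followed by nine digits."""
    if nanoseconds == 0:
        return ""
    return ".%09d" % nanoseconds


def zone_field(offset_minutes):
    """Return the <zone> field ("shhmm") for an offset east of UTC.

    offset_minutes is negative for zones west of UTC. A zero offset
    yields "+0000", as the sign must be '+' in that case.
    """
    sign = "-" if offset_minutes < 0 else "+"
    hours, minutes = divmod(abs(offset_minutes), 60)
    return "%s%02d%02d" % (sign, hours, minutes)
```

An offset of eight hours west of UTC, zone_field(-480), gives "-0800";
an offset of five and a half hours east, zone_field(330), gives "+0530".
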
Then, the following output formats shall be applied for every set of
changes.

First, the range of lines in each file shall be written in the
following format:

    "@@ -%s +%s @@", <file1 range>, <file2 range>

Each <range> field shall be of the form:

    "%1d", <beginning line number>

if the range contains exactly one line, and:

    "%1d,%1d", <beginning line number>, <number of lines>

otherwise. If a range is empty, its beginning line number shall be the
number of the line just before the range, or 0 if the empty range
starts the file.

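A minimal Python sketch of the <range> field and the "@@" line follows.
The function names and the (first, count) representation are
assumptions for illustration; first is the 1-based number of the first
line in the range, or of the line where an empty range would begin.

```python
def unified_range(first, count):
    """Format one <range> field of a unified-format hunk."""
    if count == 1:
        return "%d" % first              # exactly one line: number only
    if count == 0:
        first -= 1                       # line just before the range, or 0
    return "%d,%d" % (first, count)


def hunk_header(first1, count1, first2, count2):
    """Format the "@@" line; the format string given above has no newline."""
    return "@@ -%s +%s @@" % (unified_range(first1, count1),
                              unified_range(first2, count2))
```

For example, hunk_header(1, 7, 1, 6) gives "@@ -1,7 +1,6 @@", and a
hunk that adds three lines to an empty file, hunk_header(1, 0, 1, 3),
gives "@@ -0,0 +1,3 @@".
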
Next, the affected lines along with lines of context shall be written.
Each non-empty unaffected line shall be written in the following
format:

    " %s", <unaffected_line>

where the contents of the unaffected line shall be taken from file1.
It is implementation-defined whether an empty unaffected line is
written as an empty line or a line containing a single <space>
character. This line also represents the same line of file2, even
though file2's line may contain different contents due to the -b
option. Deleted lines shall be written as:

    "-%s", <deleted_line>

Added lines shall be written as:

    "+%s", <added_line>

The order of lines written shall be the same as that of the
corresponding file. A deleted line shall never be written immediately
after an added line.

If -U n is specified, the output shall contain no more than n
consecutive unaffected lines; and if the output contains an affected
line and this line is adjacent to up to n consecutive unaffected lines
in the corresponding file, the output shall contain these unaffected
lines. -u shall act like -U3.

STDERR
The standard error shall be used only for diagnostic messages.

OUTPUT FILES
None.

EXTENDED DESCRIPTION
None.

EXIT STATUS
The following exit values shall be returned:

 0    No differences were found.

 1    Differences were found.

>1    An error occurred.

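Because exit status 1 means "differences were found" rather than
failure, a caller has to treat it separately from values greater than
1. The following Python sketch shows one way to do that; the function
names are hypothetical, and files_differ assumes that a diff utility is
available on the search path.

```python
import subprocess


def interpret_exit_status(status):
    """Map a diff exit status to True (differ) or False (identical)."""
    if status == 0:
        return False                 # no differences were found
    if status == 1:
        return True                  # differences were found
    # >1 is an error; anything else (such as a signal) is treated alike.
    raise RuntimeError("diff failed with status %d" % status)


def files_differ(path1, path2):
    """Run diff on two pathnames and report whether they differ."""
    result = subprocess.run(
        ["diff", path1, path2],
        stdout=subprocess.DEVNULL,   # only the exit status is needed here
        stderr=subprocess.DEVNULL,
    )
    return interpret_exit_status(result.returncode)
```

Checking only for a non-zero status, as is common for other utilities,
would misreport ordinary differences as errors.
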
CONSEQUENCES OF ERRORS
Default.

The following sections are informative.

APPLICATION USAGE
If lines at the end of a file are changed and other lines are added,
diff output may show this as a delete and add, as a change, or as a
change and add; diff is not expected to know which happened and users
should not care about the difference in output as long as it clearly
shows the differences between the files.

EXAMPLES
If dir1 is a directory containing a directory named x, dir2 is a
directory containing a directory named x, dir1/x and dir2/x both
contain files named date.out, and dir2/x contains a file named y, the
command:

    diff -r dir1 dir2

could produce output similar to:

    Common subdirectories: dir1/x and dir2/x
    Only in dir2/x: y
    diff -r dir1/x/date.out dir2/x/date.out
    1c1
    < Mon Jul 2 13:12:16 PDT 1990
    ---
    > Tue Jun 19 21:41:39 PDT 1990

RATIONALE
The -h option was omitted because it was insufficiently specified and
does not add to applications portability.

Historical implementations employ algorithms that do not always produce
a minimum list of differences; the current language about making every
effort is the best this volume of POSIX.1-2008 can do, as there is no
metric that could be employed to judge the quality of implementations
against any and all file contents. The statement "This list should be
minimal" clearly implies that implementations are not expected to
provide the following output when comparing two 100-line files that
differ in only one character on a single line:

    1,100c1,100
    all 100 lines from file1 preceded with "< "
    ---
    all 100 lines from file2 preceded with "> "

The "Only in" messages required when the -r option is specified are
not used by most historical implementations if the -e option is also
specified. It is required here because it provides useful information
that must be provided to update a target directory hierarchy to match a
source hierarchy. The "Common subdirectories" messages are written by
System V and 4.3 BSD when the -r option is specified. They are allowed
here but are not required because they are reporting on something that
is the same, not reporting a difference, and are not needed to update a
target hierarchy.

The -c option, which writes output in a format using lines of context,
has been included. The format is useful for a variety of reasons, among
them being much improved readability and the ability to understand
difference changes when the target file has line numbers that differ
from another similar, but slightly different, copy. The patch utility
is most valuable when working with difference listings using a context
format. The BSD version of -c takes an optional argument specifying the
amount of context. Rather than overloading -c and breaking the Utility
Syntax Guidelines for diff, the standard developers decided to add a
separate option for specifying a context diff with a specified amount
of context (-C). Also, the format for context diffs was extended
slightly in 4.3 BSD to allow multiple changes that are within context
lines from each other to be merged together. The output format contains
an additional four <asterisk> characters after the range of affected
lines in the first filename. This was to provide a flag for old
programs (like old versions of patch) that only understand the old
context format. The version of context described here does not require
that multiple changes within context lines be merged, but it does not
prohibit it either. The extension is upwards-compatible, so any vendors
that wish to retain the old version of diff can do so by adding the
extra four <asterisk> characters (that is, utilities that currently use
diff and understand the new merged format will also understand the old
unmerged format, but not vice versa).

The -u and -U options of GNU diff have been included. Their output
format, designed by Wayne Davison, takes up less space than -c and -C
format, and in many cases is easier to read. The format's timestamps do
not vary by locale, so LC_TIME does not affect it. The format's line
numbers are rendered with the %1d format, not %d, because the file
format notation rules would allow extra <blank> characters to appear
around the numbers.

The substitute command was added as an additional format for the -e
option. This was added to provide implementations with a way to fix the
classic "dot alone on a line" bug present in many versions of diff.
Since many implementations have fixed this bug, the standard developers
decided not to standardize broken behavior, but rather to provide the
necessary tool for fixing the bug. One way to fix this bug is to output
two periods whenever a lone period is needed, then terminate the append
command with a period, and then use the substitute command to convert
the two periods into one period.

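The workaround described in the last paragraph can be sketched in
Python for the append case. This is one possible rendering of the idea,
not the output of any particular implementation: the function name is
hypothetical, text lines are given without their trailing <newline>,
and the substitute command s/.// (which deletes the first character of
the current line) is one of several commands that would turn ".." into
".".

```python
def append_script(after_line, text_lines):
    """Return ed commands that append text_lines after line after_line.

    A text line consisting of a lone period is written as "..", the
    append is terminated, and a substitute command restores the single
    period before appending resumes.
    """
    script = []
    command = "%da" % after_line
    pending = []
    for line in text_lines:
        if line == ".":
            script.append(command)
            script.extend(pending)
            script.extend(["..", ".", "s/.//"])  # "..", end input, fix line
            pending = []
            command = "a"          # resume after the current (fixed) line
        else:
            pending.append(line)
    if pending:
        script.append(command)
        script.extend(pending)
        script.append(".")         # normal terminator of the append
    return script
```

For the text lines foo, a lone period, and bar appended after line 3,
the sketch returns 3a, foo, "..", ".", s/.//, a, bar, ".".
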
The BSD-derived -r option was added to provide a mechanism for using
diff to compare two file system trees. This behavior is useful, is
standard practice on all BSD-derived systems, and is not easily
reproducible with the find utility.

The requirement that diff not compare files in some circumstances, even
though they have the same name, is based on the actual output of
historical implementations. The specified behavior precludes the
problems arising from running into FIFOs and other files that would
cause diff to hang waiting for input with no indication to the user
that diff was hung. An earlier version of this standard specified the
output format more precisely, but in practice this requirement was
widely ignored and the benefit of standardization seemed small, so it
is now unspecified. In most common usage, diff -r should indicate
differences in the file hierarchies, not the difference of contents of
devices pointed to by the hierarchies.

Many early implementations of diff require seekable files. Since the
System Interfaces volume of POSIX.1-2008 supports named pipes, the
standard developers decided that such a restriction was unreasonable.
Note also that the allowed filename - almost always refers to a pipe.

No directory search order is specified for diff. The historical
ordering is, in fact, not optimal, in that it prints out all of the
differences at the current level, including the statements about all
common subdirectories before recursing into those subdirectories.

The message:

    "diff %s %s %s\n", <diff_options>, <filename1>, <filename2>

does not vary by locale because it is the representation of a command,
not an English sentence.

FUTURE DIRECTIONS
None.

SEE ALSO
cmp, comm, ed, find

The Base Definitions volume of POSIX.1-2008, Chapter 8, Environment
Variables, Section 12.2, Utility Syntax Guidelines

COPYRIGHT
Portions of this text are reprinted and reproduced in electronic form
from IEEE Std 1003.1, 2013 Edition, Standard for Information Technology
-- Portable Operating System Interface (POSIX), The Open Group Base
Specifications Issue 7, Copyright (C) 2013 by the Institute of
Electrical and Electronics Engineers, Inc and The Open Group. (This is
POSIX.1-2008 with the 2013 Technical Corrigendum 1 applied.) In the
event of any discrepancy between this version and the original IEEE and
The Open Group Standard, the original IEEE and The Open Group Standard
is the referee document. The original Standard can be obtained online
at http://www.unix.org/online.html .

Any typographical or formatting errors that appear in this page are
most likely to have been introduced during the conversion of the source
files to man page format. To report such errors, see
https://www.kernel.org/doc/man-pages/reporting_bugs.html .

IEEE/The Open Group                  2013                        DIFF(1P)