DIFF(1P)                  POSIX Programmer's Manual                 DIFF(1P)

PROLOG
       This manual page is part of the POSIX Programmer's Manual. The Linux
       implementation of this interface may differ (consult the corresponding
       Linux manual page for details of Linux behavior), or the interface may
       not be implemented on Linux.

NAME
       diff — compare two files

SYNOPSIS
       diff [-c|-e|-f|-u|-C n|-U n] [-br] file1 file2

DESCRIPTION
       The diff utility shall compare the contents of file1 and file2 and
       write to standard output a list of changes necessary to convert file1
       into file2. This list should be minimal. No output shall be produced
       if the files are identical.

OPTIONS
       The diff utility shall conform to the Base Definitions volume of
       POSIX.1-2017, Section 12.2, Utility Syntax Guidelines.

       The following options shall be supported:

       -b        Cause any amount of white space at the end of a line to be
                 treated as a single <newline> (that is, the white-space
                 characters preceding the <newline> are ignored) and other
                 strings of white-space characters, not including <newline>
                 characters, to compare equal.
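       The -b rule can be read as a normalization applied to each line before
       comparison. The following Python sketch is illustrative only: the
       helper names are hypothetical, and the set of white-space characters
       (space, tab, vertical tab, form feed, carriage return) is an
       assumption, since the real set depends on the locale.

```python
import re

# Assumed white-space set; the actual set is locale-dependent.
_WS_RUN = re.compile(r"[ \t\v\f\r]+")

def normalize_b(line: str) -> str:
    """Return a key; two lines compare equal under -b when their keys match."""
    body = line.rstrip("\n")           # set the <newline> aside
    body = body.rstrip(" \t\v\f\r")    # white space before the <newline> is ignored
    return _WS_RUN.sub(" ", body)      # every other run of white space is equivalent

def equal_under_b(a: str, b: str) -> bool:
    return normalize_b(a) == normalize_b(b)
```

       White space is collapsed, not removed, so "ab" and "a b" still differ
       under this sketch, as do " a" and "a".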

       -c        Produce output in a form that provides three lines of copied
                 context.

       -C n      Produce output in a form that provides n lines of copied
                 context (where n shall be interpreted as a positive decimal
                 integer).

       -e        Produce output in a form suitable as input for the ed
                 utility, which can then be used to convert file1 into file2.

       -f        Produce output in an alternative form, similar in format to
                 -e, but not intended to be suitable as input for the ed
                 utility, and in the opposite order.

       -r        Apply diff recursively to files and directories of the same
                 name when file1 and file2 are both directories.

                 The diff utility shall detect infinite loops; that is,
                 entering a previously visited directory that is an ancestor
                 of the last file encountered. When it detects an infinite
                 loop, diff shall write a diagnostic message to standard
                 error and shall either recover its position in the hierarchy
                 or terminate.

       -u        Produce output in a form that provides three lines of unified
                 context.

       -U n      Produce output in a form that provides n lines of unified
                 context (where n shall be interpreted as a non-negative
                 decimal integer).

OPERANDS
       The following operands shall be supported:

       file1, file2
                 A pathname of a file to be compared. If either the file1 or
                 file2 operand is '-', the standard input shall be used in its
                 place.

       If both file1 and file2 are directories, diff shall not compare block
       special files, character special files, or FIFO special files to any
       files and shall not compare regular files to directories. Further
       details are as specified in Diff Directory Comparison Format. The
       behavior of diff on other file types is implementation-defined when
       found in directories.

       If only one of file1 and file2 is a directory, diff shall be applied to
       the non-directory file and the file contained in the directory file
       with a filename that is the same as the last component of the
       non-directory file.

STDIN
       The standard input shall be used only if one of the file1 or file2
       operands references standard input. See the INPUT FILES section.

INPUT FILES
       The input files may be of any type.

ENVIRONMENT VARIABLES
       The following environment variables shall affect the execution of diff:

       LANG      Provide a default value for the internationalization
                 variables that are unset or null. (See the Base Definitions
                 volume of POSIX.1-2017, Section 8.2, Internationalization
                 Variables for the precedence of internationalization
                 variables used to determine the values of locale
                 categories.)

       LC_ALL    If set to a non-empty string value, override the values of
                 all the other internationalization variables.

       LC_CTYPE  Determine the locale for the interpretation of sequences of
                 bytes of text data as characters (for example, single-byte as
                 opposed to multi-byte characters in arguments and input
                 files).

       LC_MESSAGES
                 Determine the locale that should be used to affect the format
                 and contents of diagnostic messages written to standard error
                 and informative messages written to standard output.

       LC_TIME   Determine the locale for affecting the format of file
                 timestamps written with the -C and -c options.

       NLSPATH   Determine the location of message catalogs for the processing
                 of LC_MESSAGES.

       TZ        Determine the timezone used for calculating file timestamps
                 written with a context format. If TZ is unset or null, an
                 unspecified default timezone shall be used.

ASYNCHRONOUS EVENTS
       Default.

STDOUT
   Diff Directory Comparison Format
       If both file1 and file2 are directories, the following output formats
       shall be used.

       In the POSIX locale, each file that is present in only one directory
       shall be reported using the following format:

           "Only in %s: %s\n", <directory pathname>, <filename>

       In the POSIX locale, subdirectories that are common to the two
       directories may be reported with the following format:

           "Common subdirectories: %s and %s\n", <directory1 pathname>,
               <directory2 pathname>

       For each file common to the two directories, if the two files are not
       to be compared: if the two files have the same device ID and file
       serial number, or are both block special files that refer to the same
       device, or are both character special files that refer to the same
       device, in the POSIX locale the output format is unspecified.
       Otherwise, in the POSIX locale an unspecified format shall be used that
       contains the pathnames of the two files.

       For each file common to the two directories, if the files are compared
       and are identical, no output shall be written. If the two files differ,
       the following format is written:

           "diff %s %s %s\n", <diff_options>, <filename1>, <filename2>

       where <diff_options> are the options as specified on the command line.

       All directory pathnames listed in this section shall be relative to the
       original command line arguments. All other names of files listed in
       this section shall be filenames (pathname components).

   Diff Binary Output Format
       In the POSIX locale, if one or both of the files being compared are not
       text files, it is implementation-defined whether diff uses the binary
       file output format or the other formats as specified below. The binary
       file output format shall contain the pathnames of two files being
       compared and the string "differ".

       If both files being compared are text files, depending on the options
       specified, one of the following formats shall be used to write the
       differences.

   Diff Default Output Format
       The default (without -e, -f, -c, -C, -u, or -U options) diff utility
       output shall contain lines of these forms:

           "%da%d\n", <num1>, <num2>

           "%da%d,%d\n", <num1>, <num2>, <num3>

           "%dd%d\n", <num1>, <num2>

           "%d,%dd%d\n", <num1>, <num2>, <num3>

           "%dc%d\n", <num1>, <num2>

           "%d,%dc%d\n", <num1>, <num2>, <num3>

           "%dc%d,%d\n", <num1>, <num2>, <num3>

           "%d,%dc%d,%d\n", <num1>, <num2>, <num3>, <num4>

       These lines resemble ed subcommands to convert file1 into file2. The
       line numbers before the action letters shall pertain to file1; those
       after shall pertain to file2. Thus, by exchanging a for d and reading
       the line in reverse order, one can also determine how to convert file2
       into file1. As in ed, identical pairs (where num1 = num2) are
       abbreviated as a single number.
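       The command-line forms above, and the "exchange a for d and read in
       reverse" rule, can be sketched in Python as follows. This is a minimal
       illustration, not part of the standard: the function names are
       hypothetical, and the pattern is slightly more permissive than the
       eight forms listed (for example, it would accept a range before a).

```python
import re

# Matches default-format command lines such as "3,5c7" or "8a12,15".
_CMD = re.compile(r"^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$")

def parse_command(line):
    """Return (first1, last1, action, first2, last2) with abbreviated ranges expanded."""
    m = _CMD.match(line.strip())
    if m is None:
        raise ValueError(f"not a default-format command line: {line!r}")
    f1, l1, action, f2, l2 = m.groups()
    return int(f1), int(l1 or f1), action, int(f2), int(l2 or f2)

def fmt_range(first, last):
    """Abbreviate an identical pair as a single number, as ed does."""
    return str(first) if first == last else f"{first},{last}"

def reverse_command(line):
    """Rewrite a file1-to-file2 command as the matching file2-to-file1 command."""
    f1, l1, action, f2, l2 = parse_command(line)
    swapped = {"a": "d", "d": "a", "c": "c"}[action]
    return f"{fmt_range(f2, l2)}{swapped}{fmt_range(f1, l1)}"
```

       For instance, reverse_command("5a6,8") yields "6,8d5": appending lines
       6 through 8 of file2 after line 5 of file1 is undone by deleting those
       lines from file2.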

       Following each of these lines, diff shall write to standard output all
       lines affected in the first file using the format:

           "< %s", <line>

       and all lines affected in the second file using the format:

           "> %s", <line>

       If there are lines affected in both file1 and file2 (as with the c
       subcommand), the changes are separated with a line consisting of three
       <hyphen-minus> characters:

           "---\n"

   Diff -e Output Format
       With the -e option, a script shall be produced that shall, when
       provided as input to ed, along with an appended w (write) command,
       convert file1 into file2. Only the a (append), c (change), d (delete),
       i (insert), and s (substitute) commands of ed shall be used in this
       script. Text lines, except those consisting of the single character
       <period> ('.'), shall be output as they appear in the file.

   Diff -f Output Format
       With the -f option, an alternative format of script shall be produced.
       It is similar to that produced by -e, with the following differences:

        1. It is expressed in reverse sequence; the output of -e orders
           changes from the end of the file to the beginning; the -f from
           beginning to end.

        2. The command form <lines> <command-letter> used by -e is reversed.
           For example, 10c with -e would be c10 with -f.

        3. The form used for ranges of line numbers is <space>-separated,
           rather than <comma>-separated.
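       Differences 2 and 3 in the list above amount to a small rewrite of each
       command line. The Python sketch below is a hypothetical illustration
       covering only the a, c, and d command lines; it ignores the text lines
       that follow a command and the reordering described in item 1.

```python
import re

# A -e command line: one line number or a comma-separated range, then a letter.
_ED_CMD = re.compile(r"^(\d+)(?:,(\d+))?([acd])$")

def ed_to_f_command(cmd):
    """Convert a -e style command line (e.g. "3,5d") to the -f form ("d3 5")."""
    m = _ED_CMD.match(cmd)
    if m is None:
        raise ValueError(f"unsupported -e command line: {cmd!r}")
    first, last, letter = m.groups()
    lines = first if last is None else f"{first} {last}"  # <space>-separated range
    return f"{letter}{lines}"                             # letter comes first
```

       With this helper, "10c" becomes "c10", matching the example given in
       item 2.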

   Diff -c or -C Output Format
       With the -c or -C option, the output format shall consist of affected
       lines along with surrounding lines of context. The affected lines shall
       show which ones need to be deleted or changed in file1, and those added
       from file2. With the -c option, three lines of context, if available,
       shall be written before and after the affected lines. With the -C
       option, the user can specify how many lines of context are written.
       The exact format follows.

       The name and last modification time of each file shall be output in the
       following format:

           "*** %s %s\n", file1, <file1 timestamp>
           "--- %s %s\n", file2, <file2 timestamp>

       Each <file> field shall be the pathname of the corresponding file being
       compared. The pathname written for standard input is unspecified.

       In the POSIX locale, each <timestamp> field shall be equivalent to the
       output from the following command:

           date "+%a %b %e %T %Y"

       without the trailing <newline>, executed at the time of last
       modification of the corresponding file (or the current time, if the
       file is standard input).

       Then, the following output formats shall be applied for every set of
       changes.

       First, a line shall be written in the following format:

           "***************\n"

       Next, the range of lines in file1 shall be written in the following
       format if the range contains two or more lines:

           "*** %d,%d ****\n", <beginning line number>, <ending line number>

       and the following format otherwise:

           "*** %d ****\n", <ending line number>

       The ending line number of an empty range shall be the number of the
       preceding line, or 0 if the range is at the start of the file.
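       The range-line rules above (two forms, plus the empty-range case) can
       be sketched in Python. The function name and its parameters are
       hypothetical; "first" is assumed to be the number of the first line in
       the range, or, for an empty range, the line at which the range would
       start. The "-" marker covers the corresponding file2 range line.

```python
def context_range_line(first, count, marker="*"):
    """Format a context-diff range line for `count` lines starting at `first`."""
    # For an empty range this is the preceding line, or 0 at the start of the file.
    last = first + count - 1
    if count >= 2:
        body = f"{first},{last}"
    else:
        body = f"{last}"          # one line, or an empty range
    return f"{marker * 3} {body} {marker * 4}\n"
```

       For example, a three-line range starting at line 4 gives
       "*** 4,6 ****", and an empty range at the start of the file gives
       "*** 0 ****".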

       Next, the affected lines along with lines of context (unaffected lines)
       shall be written. Unaffected lines shall be written in the following
       format:

           "  %s", <unaffected_line>

       Deleted lines shall be written as:

           "- %s", <deleted_line>

       Changed lines shall be written as:

           "! %s", <changed_line>

       Next, the range of lines in file2 shall be written in the following
       format if the range contains two or more lines:

           "--- %d,%d ----\n", <beginning line number>, <ending line number>

       and the following format otherwise:

           "--- %d ----\n", <ending line number>

       Then, lines of context and changed lines shall be written as described
       in the previous formats. Lines added from file2 shall be written in the
       following format:

           "+ %s", <added_line>

   Diff -u or -U Output Format
       The -u or -U options behave like the -c or -C options, except that the
       context lines are not repeated; instead, the context, deleted, and
       added lines are shown together, interleaved. The exact format follows.

       The name and last modification time of each file shall be output in the
       following format:

           "--- %s\t%s%s %s\n", file1, <file1 timestamp>, <file1 frac>, <file1 zone>
           "+++ %s\t%s%s %s\n", file2, <file2 timestamp>, <file2 frac>, <file2 zone>

       Each <file> field shall be the pathname of the corresponding file being
       compared, or the single character '-' if standard input is being
       compared. However, if the pathname contains a <tab> or a <newline>, or
       if it does not consist entirely of characters taken from the portable
       character set, the behavior is implementation-defined.

       Each <timestamp> field shall be equivalent to the output from the
       following command:

           date '+%Y-%m-%d %H:%M:%S'

       without the trailing <newline>, executed at the time of last
       modification of the corresponding file (or the current time, if the
       file is standard input).

       Each <frac> field shall be either empty, or a decimal point followed by
       at least one decimal digit, indicating the fractional-seconds part (if
       any) of the file timestamp. The number of fractional digits shall be at
       least the number needed to represent the file's timestamp without loss
       of information.

       Each <zone> field shall be of the form "shhmm", where "shh" is a signed
       two-digit decimal number in the range -24 through +25, and "mm" is an
       unsigned two-digit decimal number in the range 00 through 59. It
       represents the timezone of the timestamp as the number of hours (hh)
       and minutes (mm) east (+) or west (-) of UTC for the timestamp. If the
       hours and minutes are both zero, the sign shall be '+'. However, if
       the timezone is not an integral number of minutes away from UTC, the
       <zone> field is implementation-defined.
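       Putting the <timestamp>, <frac>, and <zone> rules together, a header
       line could be assembled as in the following Python sketch. The function
       and parameter names are hypothetical, nine fractional digits are an
       assumption (nanosecond timestamps), and Python's timezone type only
       accepts offsets strictly within 24 hours of UTC, which is narrower
       than the range the text above allows.

```python
from datetime import datetime, timedelta, timezone

def unified_header(prefix, path, seconds, nanoseconds, utc_offset_minutes):
    """Build a "--- ..." or "+++ ..." header line of the unified format."""
    tz = timezone(timedelta(minutes=utc_offset_minutes))
    stamp = datetime.fromtimestamp(seconds, tz).strftime("%Y-%m-%d %H:%M:%S")
    frac = f".{nanoseconds:09d}"                      # assumed nanosecond resolution
    sign = "-" if utc_offset_minutes < 0 else "+"     # zero offset takes '+'
    hh, mm = divmod(abs(utc_offset_minutes), 60)
    zone = f"{sign}{hh:02d}{mm:02d}"
    return f"{prefix} {path}\t{stamp}{frac} {zone}\n"
```

       Passing the modification time split into whole seconds and
       nanoseconds avoids the rounding that a floating-point timestamp
       would introduce.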

       Then, the following output formats shall be applied for every set of
       changes.

       First, the range of lines in each file shall be written in the
       following format:

           "@@ -%s +%s @@", <file1 range>, <file2 range>

       Each <range> field shall be of the form:

           "%1d", <beginning line number>

       or:

           "%1d,1", <beginning line number>

       if the range contains exactly one line, and:

           "%1d,%1d", <beginning line number>, <number of lines>

       otherwise. If a range is empty, its beginning line number shall be the
       number of the line just before the range, or 0 if the empty range
       starts the file.
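       The <range> rules can be sketched in Python as shown below. The names
       are hypothetical; "first" is assumed to be the number of the first
       line in the range or, for an empty range, the line at which the range
       would start. The sketch picks the short "%1d" form for one-line
       ranges, although "%1d,1" is equally permitted.

```python
def unified_range(first, count):
    """Format one <range> field for `count` lines starting at `first`."""
    if count == 1:
        return f"{first}"             # "%1d,1" would also be valid
    if count == 0:
        return f"{first - 1},0"       # line just before the range, 0 at file start
    return f"{first},{count}"

def hunk_header(first1, count1, first2, count2):
    """Format the "@@ -<file1 range> +<file2 range> @@" line."""
    return f"@@ -{unified_range(first1, count1)} +{unified_range(first2, count2)} @@"
```

       For example, two lines added at the top of an otherwise empty file1
       give "@@ -0,0 +1,2 @@".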

       Next, the affected lines along with lines of context shall be written.
       Each non-empty unaffected line shall be written in the following
       format:

           " %s", <unaffected_line>

       where the contents of the unaffected line shall be taken from file1.
       It is implementation-defined whether an empty unaffected line is
       written as an empty line or a line containing a single <space>
       character. This line also represents the same line of file2, even
       though file2's line may contain different contents due to the -b.
       Deleted lines shall be written as:

           "-%s", <deleted_line>

       Added lines shall be written as:

           "+%s", <added_line>

       The order of lines written shall be the same as that of the
       corresponding file. A deleted line shall never be written immediately
       after an added line.

       If -U n is specified, the output shall contain no more than 2n
       consecutive unaffected lines; and if the output contains an affected
       line and this line is adjacent to up to n consecutive unaffected lines
       in the corresponding file, the output shall contain these unaffected
       lines. -u shall act like -U3.
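       One grouping strategy consistent with the -U n rule is to merge
       affected lines into the same set of changes whenever at most 2n
       unaffected lines separate them, and to pad each set with up to n lines
       of context. The Python sketch below illustrates this under simplifying
       assumptions: it considers line numbers in a single file only, and the
       function names are hypothetical. Other groupings may also satisfy the
       rule.

```python
def group_changes(changed_lines, n):
    """Group affected line numbers so no group spans more than 2n unaffected lines in a row."""
    groups = []
    for line in sorted(changed_lines):
        # Number of unaffected lines between this line and the previous affected one.
        if groups and line - groups[-1][-1] - 1 <= 2 * n:
            groups[-1].append(line)
        else:
            groups.append([line])
    return groups

def hunk_span(group, n, total_lines):
    """Return (first, last) line numbers written for a group, including context."""
    return max(1, group[0] - n), min(total_lines, group[-1] + n)
```

       With n = 3, affected lines 5 and 10 share one set of changes (four
       unaffected lines lie between them), while line 30 starts a new one.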

STDERR
       The standard error shall be used only for diagnostic messages.

OUTPUT FILES
       None.

EXTENDED DESCRIPTION
       None.

EXIT STATUS
       The following exit values shall be returned:

        0    No differences were found.

        1    Differences were found.

       >1    An error occurred.
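       Because an exit status of 1 reports differences and not a failure,
       callers need to treat the three cases separately. A minimal Python
       sketch follows; the function name and the returned labels are
       hypothetical.

```python
def classify_diff_status(status):
    """Map a diff exit status to a label following the EXIT STATUS table."""
    if status == 0:
        return "identical"      # no differences were found
    if status == 1:
        return "different"      # differences were found; not an error
    if status > 1:
        return "error"
    raise ValueError(f"not a valid exit status: {status}")
```

       When diff is run through Python's subprocess.run, passing check=True
       would raise CalledProcessError for status 1 as well, so inspecting
       returncode directly is usually the better fit.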

CONSEQUENCES OF ERRORS
       Default.

       The following sections are informative.

APPLICATION USAGE
       If lines at the end of a file are changed and other lines are added,
       diff output may show this as a delete and add, as a change, or as a
       change and add; diff is not expected to know which happened and users
       should not care about the difference in output as long as it clearly
       shows the differences between the files.

EXAMPLES
       If dir1 is a directory containing a directory named x, dir2 is a
       directory containing a directory named x, dir1/x and dir2/x both
       contain files named date.out, and dir2/x contains a file named y, the
       command:

           diff -r dir1 dir2

       could produce output similar to:

           Common subdirectories: dir1/x and dir2/x
           Only in dir2/x: y
           diff -r dir1/x/date.out dir2/x/date.out
           1c1
           < Mon Jul  2 13:12:16 PDT 1990
           ---
           > Tue Jun 19 21:41:39 PDT 1990

RATIONALE
       The -h option was omitted because it was insufficiently specified and
       does not add to applications portability.

       Historical implementations employ algorithms that do not always produce
       a minimum list of differences; the current language about making every
       effort is the best this volume of POSIX.1-2017 can do, as there is no
       metric that could be employed to judge the quality of implementations
       against any and all file contents. The statement "This list should be
       minimal" clearly implies that implementations are not expected to
       provide the following output when comparing two 100-line files that
       differ in only one character on a single line:

           1,100c1,100
           all 100 lines from file1 preceded with "< "
           ---
           all 100 lines from file2 preceded with "> "

       The "Only in" messages required when the -r option is specified are
       not used by most historical implementations if the -e option is also
       specified. It is required here because it provides useful information
       that must be provided to update a target directory hierarchy to match a
       source hierarchy. The "Common subdirectories" messages are written by
       System V and 4.3 BSD when the -r option is specified. They are allowed
       here but are not required because they are reporting on something that
       is the same, not reporting a difference, and are not needed to update a
       target hierarchy.

       The -c option, which writes output in a format using lines of context,
       has been included. The format is useful for a variety of reasons, among
       them being much improved readability and the ability to understand
       difference changes when the target file has line numbers that differ
       from another similar, but slightly different, copy. The patch utility
       is most valuable when working with difference listings using a context
       format. The BSD version of -c takes an optional argument specifying the
       amount of context. Rather than overloading -c and breaking the Utility
       Syntax Guidelines for diff, the standard developers decided to add a
       separate option for specifying a context diff with a specified amount
       of context (-C). Also, the format for context diffs was extended
       slightly in 4.3 BSD to allow multiple changes that are within context
       lines from each other to be merged together. The output format contains
       an additional four <asterisk> characters after the range of affected
       lines in the first filename. This was to provide a flag for old
       programs (like old versions of patch) that only understand the old
       context format. The version of context described here does not require
       that multiple changes within context lines be merged, but it does not
       prohibit it either. The extension is upwards-compatible, so any vendors
       that wish to retain the old version of diff can do so by adding the
       extra four <asterisk> characters (that is, utilities that currently use
       diff and understand the new merged format will also understand the old
       unmerged format, but not vice versa).

       The -u and -U options of GNU diff have been included. Their output
       format, designed by Wayne Davison, takes up less space than -c and -C
       format, and in many cases is easier to read. The format's timestamps do
       not vary by locale, so LC_TIME does not affect it. The format's line
       numbers are rendered with the %1d format, not %d, because the file
       format notation rules would allow extra <blank> characters to appear
       around the numbers.

       The substitute command was added as an additional format for the -e
       option. This was added to provide implementations with a way to fix the
       classic "dot alone on a line" bug present in many versions of diff.
       Since many implementations have fixed this bug, the standard developers
       decided not to standardize broken behavior, but rather to provide the
       necessary tool for fixing the bug. One way to fix this bug is to output
       two periods whenever a lone period is needed, then terminate the append
       command with a period, and then use the substitute command to convert
       the two periods into one period.
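       The fix described in the preceding paragraph can be sketched as a
       generator of ed append commands. The Python below is one possible
       reading, not a description of any particular implementation: the
       function name is hypothetical, and it assumes that "s/.//" (delete the
       first character of the current line) is an acceptable way to turn the
       two periods into one, and that a fresh a command resumes the
       interrupted append.

```python
def ed_append_script(after_line, lines):
    """Emit ed commands that append `lines` after `after_line`, lone periods included."""
    out = [f"{after_line}a"]
    appending = True
    for text in lines:
        if not appending:
            out.append("a")                   # resume appending after the current line
            appending = True
        if text == ".":
            out += ["..", ".", "s/.//"]       # write "..", end input, strip one period
            appending = False
        else:
            out.append(text)
    if appending:
        out.append(".")                       # terminate the final append
    return "\n".join(out) + "\n"
```

       After the input is terminated, the current line in ed is the last
       line entered, so the substitution applies to the ".." line and the
       following a command continues from that position.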

       The BSD-derived -r option was added to provide a mechanism for using
       diff to compare two file system trees. This behavior is useful, is
       standard practice on all BSD-derived systems, and is not easily
       reproducible with the find utility.

       The requirement that diff not compare files in some circumstances, even
       though they have the same name, is based on the actual output of
       historical implementations. The specified behavior precludes the
       problems arising from running into FIFOs and other files that would
       cause diff to hang waiting for input with no indication to the user
       that diff was hung. An earlier version of this standard specified the
       output format more precisely, but in practice this requirement was
       widely ignored and the benefit of standardization seemed small, so it
       is now unspecified. In most common usage, diff -r should indicate
       differences in the file hierarchies, not the difference of contents of
       devices pointed to by the hierarchies.

       Many early implementations of diff require seekable files. Since the
       System Interfaces volume of POSIX.1-2017 supports named pipes, the
       standard developers decided that such a restriction was unreasonable.
       Note also that the allowed filename - almost always refers to a pipe.

       No directory search order is specified for diff. The historical
       ordering is, in fact, not optimal, in that it prints out all of the
       differences at the current level, including the statements about all
       common subdirectories before recursing into those subdirectories.

       The message:

           "diff %s %s %s\n", <diff_options>, <filename1>, <filename2>

       does not vary by locale because it is the representation of a command,
       not an English sentence.

FUTURE DIRECTIONS
       None.

SEE ALSO
       cmp, comm, ed, find

       The Base Definitions volume of POSIX.1-2017, Chapter 8, Environment
       Variables, Section 12.2, Utility Syntax Guidelines

COPYRIGHT
       Portions of this text are reprinted and reproduced in electronic form
       from IEEE Std 1003.1-2017, Standard for Information Technology --
       Portable Operating System Interface (POSIX), The Open Group Base
       Specifications Issue 7, 2018 Edition, Copyright (C) 2018 by the
       Institute of Electrical and Electronics Engineers, Inc and The Open
       Group. In the event of any discrepancy between this version and the
       original IEEE and The Open Group Standard, the original IEEE and The
       Open Group Standard is the referee document. The original Standard can
       be obtained online at http://www.opengroup.org/unix/online.html .

       Any typographical or formatting errors that appear in this page are
       most likely to have been introduced during the conversion of the source
       files to man page format. To report such errors, see
       https://www.kernel.org/doc/man-pages/reporting_bugs.html .

IEEE/The Open Group                  2017                          DIFF(1P)