COMM(1P)                  POSIX Programmer's Manual                 COMM(1P)

PROLOG
       This manual page is part of the POSIX Programmer's Manual. The Linux
       implementation of this interface may differ (consult the corresponding
       Linux manual page for details of Linux behavior), or the interface may
       not be implemented on Linux.

NAME
       comm — select or reject lines common to two files

SYNOPSIS
       comm [-123] file1 file2

DESCRIPTION
       The comm utility shall read file1 and file2, which should be ordered in
       the current collating sequence, and produce three text columns as
       output: lines only in file1, lines only in file2, and lines in both
       files.
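       As a minimal illustration, the shell sketch below builds two small
       files that are already sorted and runs comm with no options. The
       scratch directory and the names file1 and file2 are hypothetical, and
       LC_ALL=POSIX is set so that the byte order of the sample data is the
       collating sequence comm expects.

```shell
# Hypothetical scratch directory and file names, used only for this example.
dir=${TMPDIR:-/tmp}/comm_cols.$$
mkdir "$dir"
printf '%s\n' apple banana cherry > "$dir/file1"   # already in byte order
printf '%s\n' banana cherry date > "$dir/file2"
# Column 1: only in file1; column 2 (one <tab>): only in file2;
# column 3 (two <tab>s): in both files.
cols=$(LC_ALL=POSIX comm "$dir/file1" "$dir/file2")
printf '%s\n' "$cols"
rm -r "$dir"
```

       The output is apple, then banana and cherry each preceded by two <tab>
       characters, then date preceded by a single <tab>.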

       If the lines in both files are not ordered according to the collating
       sequence of the current locale, the results are unspecified.

       If the collating sequence of the current locale does not have a total
       ordering of all characters (see the Base Definitions volume of
       POSIX.1-2017, Section 7.3.2, LC_COLLATE) and any lines from the input
       files collate equally but are not identical, comm should treat them as
       different lines but may treat them as being the same. If it treats them
       as different, comm should expect them to be ordered according to a
       further byte-by-byte comparison using the collating sequence for the
       POSIX locale and if they are not ordered in this way, the output of
       comm can identify such lines as being both unique to file1 and unique
       to file2 instead of being in both files.

OPTIONS
       The comm utility shall conform to the Base Definitions volume of
       POSIX.1-2017, Section 12.2, Utility Syntax Guidelines.

       The following options shall be supported:

       -1        Suppress the output column of lines unique to file1.

       -2        Suppress the output column of lines unique to file2.

       -3        Suppress the output column of lines duplicated in file1 and
                 file2.
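       The options are usually combined to keep a single column. The sketch
       below (hypothetical scratch files, POSIX locale assumed for both the
       data and comm) shows the three common combinations and the case where
       every column is suppressed.

```shell
# Hypothetical scratch files; LC_ALL=POSIX keeps the sample data and comm's
# expected collating sequence identical (byte order).
dir=${TMPDIR:-/tmp}/comm_opts.$$
mkdir "$dir"
printf '%s\n' apple banana cherry > "$dir/file1"
printf '%s\n' banana cherry date > "$dir/file2"
both=$(LC_ALL=POSIX comm -12 "$dir/file1" "$dir/file2")     # column 3 only
only1=$(LC_ALL=POSIX comm -23 "$dir/file1" "$dir/file2")    # column 1 only
only2=$(LC_ALL=POSIX comm -13 "$dir/file1" "$dir/file2")    # column 2 only
nothing=$(LC_ALL=POSIX comm -123 "$dir/file1" "$dir/file2") # no columns
rm -r "$dir"
printf '%s\n' "$both" "$only1" "$only2"
```

       Here -12 yields banana and cherry, -23 yields apple, -13 yields date,
       and -123 writes nothing. Because only one column survives in each
       case, its <lead> string is null and no <tab>s are written.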

OPERANDS
       The following operands shall be supported:

       file1     A pathname of the first file to be compared. If file1 is '-',
                 the standard input shall be used.

       file2     A pathname of the second file to be compared. If file2 is
                 '-', the standard input shall be used.

       If both file1 and file2 refer to standard input or to the same FIFO
       special, block special, or character special file, the results are
       undefined.
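       A typical use of the '-' operand is to sort data in a pipeline and
       compare it against a file without writing a temporary copy. In this
       sketch the reference list and its location are hypothetical; the
       pipeline supplies file1, and -23 keeps the candidates that are not in
       the list.

```shell
# Hypothetical reference list; the pipeline supplies file1 on standard input.
dir=${TMPDIR:-/tmp}/comm_stdin.$$
mkdir "$dir"
printf '%s\n' banana cherry date > "$dir/known"
# Sort the candidates in the same locale comm uses, then compare;
# '-' makes comm read file1 from the pipe.
unknown=$(printf '%s\n' fig banana apple |
    LC_ALL=POSIX sort |
    LC_ALL=POSIX comm -23 - "$dir/known")
rm -r "$dir"
printf '%s\n' "$unknown"
```

       The result is apple and fig, the candidates absent from the list.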

STDIN
       The standard input shall be used only if one of the file1 or file2
       operands refers to standard input. See the INPUT FILES section.

INPUT FILES
       The input files shall be text files.

ENVIRONMENT VARIABLES
       The following environment variables shall affect the execution of comm:

       LANG      Provide a default value for the internationalization
                 variables that are unset or null. (See the Base Definitions
                 volume of POSIX.1-2017, Section 8.2, Internationalization
                 Variables for the precedence of internationalization
                 variables used to determine the values of locale
                 categories.)

       LC_ALL    If set to a non-empty string value, override the values of
                 all the other internationalization variables.

       LC_COLLATE
                 Determine the locale for the collating sequence comm expects
                 to have been used when the input files were sorted.

       LC_CTYPE  Determine the locale for the interpretation of sequences of
                 bytes of text data as characters (for example, single-byte
                 as opposed to multi-byte characters in arguments and input
                 files).

       LC_MESSAGES
                 Determine the locale that should be used to affect the
                 format and contents of diagnostic messages written to
                 standard error.

       NLSPATH   Determine the location of message catalogs for the
                 processing of LC_MESSAGES.

ASYNCHRONOUS EVENTS
       Default.

STDOUT
       The comm utility shall produce output depending on the options
       selected. If the -1, -2, and -3 options are all selected, comm shall
       write nothing to standard output.

       If the -1 option is not selected, lines contained only in file1 shall
       be written using the format:

              "%s\n", <line in file1>

       If the -2 option is not selected, lines contained only in file2 are
       written using the format:

              "%s%s\n", <lead>, <line in file2>

       where the string <lead> is as follows:

       <tab>     The -1 option is not selected.

       null string
                 The -1 option is selected.

       If the -3 option is not selected, lines contained in both files shall
       be written using the format:

              "%s%s\n", <lead>, <line in both>

       where the string <lead> is as follows:

       <tab><tab>
                 Neither the -1 nor the -2 option is selected.

       <tab>     Exactly one of the -1 and -2 options is selected.

       null string
                 Both the -1 and -2 options are selected.

       If the input files were ordered according to the collating sequence of
       the current locale, the lines written shall be in the collating
       sequence of the current locale. If the input files contained any lines
       that collated equally but were not identical and within each file those
       lines were ordered according to a further byte-by-byte comparison using
       the collating sequence for the POSIX locale, and comm treated them as
       different lines, then lines written that collate equally but are not
       identical should be ordered according to a further byte-by-byte
       comparison using the collating sequence for the POSIX locale.
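       The output rules above amount to one merge pass over two sorted
       inputs. The following is a minimal sketch, not a replacement for comm:
       a hypothetical shell function, comm_sketch, that reimplements the merge
       and the <lead> strings in awk. It assumes both files are sorted in byte
       order (the POSIX locale), that lines match only when identical, and
       that neither operand is '-'. Each of its first three arguments is 1 to
       suppress the corresponding column and 0 to keep it.

```shell
# comm_sketch S1 S2 S3 FILE1 FILE2: hypothetical helper, not a standard
# utility. Sn=1 suppresses column n, mirroring comm's -1, -2 and -3.
# Assumes byte-ordered input, identical-only matching and no '-' operands.
comm_sketch() {
    LC_ALL=POSIX awk -v s1="$1" -v s2="$2" -v s3="$3" -v f2="$5" '
    BEGIN {
        s1 += 0; s2 += 0; s3 += 0                    # numeric 0 or 1
        lead2 = s1 ? "" : "\t"                       # <lead> for column 2
        lead3 = (s1 ? "" : "\t") (s2 ? "" : "\t")    # <lead> for column 3
        more2 = (getline b < f2) > 0                 # first line of file2
    }
    {
        a = $0 ""                 # force string (not numeric) comparison
        # Write the file2 lines that sort before the current file1 line.
        while (more2 && (b "") < a) {
            if (!s2) printf "%s%s\n", lead2, b
            more2 = (getline b < f2) > 0
        }
        if (more2 && (b "") == a) {          # line present in both files
            if (!s3) printf "%s%s\n", lead3, a
            more2 = (getline b < f2) > 0
        } else if (!s1) {                    # line only in file1
            printf "%s\n", a
        }
    }
    END {
        # Whatever remains of file2 is unique to file2.
        while (more2) {
            if (!s2) printf "%s%s\n", lead2, b
            more2 = (getline b < f2) > 0
        }
    }' "$4"
}

# Usage: compare the sketch with comm on hypothetical sample files.
dir=${TMPDIR:-/tmp}/comm_sketch.$$
mkdir "$dir"
printf '%s\n' 10 9 apple banana | LC_ALL=POSIX sort > "$dir/a"
printf '%s\n' 9 banana date | LC_ALL=POSIX sort > "$dir/b"
mine=$(comm_sketch 0 0 0 "$dir/a" "$dir/b")
real=$(LC_ALL=POSIX comm "$dir/a" "$dir/b")
mine13=$(comm_sketch 1 0 1 "$dir/a" "$dir/b")
real13=$(LC_ALL=POSIX comm -13 "$dir/a" "$dir/b")
rm -r "$dir"
printf '%s\n' "$mine"
```

       The sample deliberately contains 10 and 9: awk would otherwise compare
       numeric-looking lines as numbers, which disagrees with the byte order
       produced by sort in the POSIX locale, so the sketch concatenates an
       empty string to force string comparison.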

STDERR
       The standard error shall be used only for diagnostic messages.

OUTPUT FILES
       None.

EXTENDED DESCRIPTION
       None.

EXIT STATUS
       The following exit values shall be returned:

        0        All input files were successfully output as specified.

       >0        An error occurred.
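       Since only the distinction between zero and a value greater than zero
       is specified, a script should test for success or failure rather than
       for a particular error value. In this sketch the scratch file is
       hypothetical and the second comparison names a file that does not
       exist.

```shell
# Hypothetical scratch file; "$dir/missing" deliberately does not exist.
dir=${TMPDIR:-/tmp}/comm_status.$$
mkdir "$dir"
printf '%s\n' apple banana > "$dir/file1"
ok=0
LC_ALL=POSIX comm "$dir/file1" "$dir/file1" > /dev/null || ok=$?
failed=0
LC_ALL=POSIX comm "$dir/file1" "$dir/missing" > /dev/null 2>&1 || failed=$?
rm -r "$dir"
if [ "$failed" -gt 0 ]; then
    echo "comm reported an error (exit status $failed)"
fi
```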

CONSEQUENCES OF ERRORS
       Default.

       The following sections are informative.

APPLICATION USAGE
       If the input files are not properly presorted, the output of comm might
       not be useful.

       When using comm to process pathnames, it is recommended that LC_ALL, or
       at least LC_CTYPE and LC_COLLATE, are set to POSIX or C in the
       environment, since pathnames can contain byte sequences that do not
       form valid characters in some locales, in which case the utility's
       behavior would be undefined. In the POSIX locale each byte is a valid
       single-byte character, and therefore this problem is avoided.
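       For instance, the sketch below compares the file sets of two directory
       trees with both sort and comm running in the C locale. The directory
       layout is hypothetical, and the approach assumes that no pathname
       contains a <newline>, because comm compares whole lines.

```shell
# Hypothetical directory trees x and y, built only for this example.
dir=${TMPDIR:-/tmp}/comm_paths.$$
mkdir -p "$dir/x" "$dir/y"
touch "$dir/x/common" "$dir/x/only_x" "$dir/y/common" "$dir/y/only_y"
# List relative pathnames and sort them in byte order.
(cd "$dir/x" && find . -type f) | LC_ALL=C sort > "$dir/x.list"
(cd "$dir/y" && find . -type f) | LC_ALL=C sort > "$dir/y.list"
# -3 drops pathnames present in both trees; pathnames only in y get a <tab>.
diffs=$(LC_ALL=C comm -3 "$dir/x.list" "$dir/y.list")
rm -r "$dir"
printf '%s\n' "$diffs"
```

       The output is ./only_x, followed by ./only_y preceded by one <tab>.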

       If the collating sequence of the current locale does not have a total
       ordering of all characters, this can affect the behavior of comm in the
       following ways:

        *  If comm treats lines as being the same only if they are identical,
           some lines can be misleadingly identified as being both unique to
           file1 and unique to file2.

        *  If comm treats lines as being the same if they collate equally and
           a line from file1 collates equally with a line from file2 but is
           not identical to it, one of the lines is misleadingly identified as
           being in both files and the other is not written to the output at
           all.

       Such problems can be avoided by forcing the use of the POSIX locale;
       for example, the following identifies lines in both file1 and file2:

              LC_ALL=POSIX sort file1 > file1.posix
              LC_ALL=POSIX sort file2 > file2.posix
              LC_ALL=POSIX comm -12 file1.posix file2.posix | sort

       The final sort re-sorts the output of comm according to the collating
       sequence of the original locale. Doing this might be difficult if more
       than one column is output and leading <blank>s cannot be ignored.

EXAMPLES
       If a file named xcu contains a sorted list of the utilities in this
       volume of POSIX.1-2017, a file named xpg3 contains a sorted list of the
       utilities specified in the X/Open Portability Guide, Issue 3, and a
       file named svid89 contains a sorted list of the utilities in the System
       V Interface Definition Third Edition:

              comm -23 xcu xpg3 | comm -23 - svid89

       would print a list of utilities in this volume of POSIX.1-2017 not
       specified by either of the other documents,

              comm -12 xcu xpg3 | comm -12 - svid89

       would print a list of utilities specified by all three documents, and:

              comm -12 xpg3 svid89 | comm -23 - xcu

       would print a list of utilities specified by both XPG3 and the SVID,
       but not specified in this volume of POSIX.1-2017.
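       The same pipelines can be tried on toy data. In the sketch below the
       three lists are hypothetical stand-ins (fruit names, not real utility
       lists): list_a plays the role of xcu, list_b of xpg3, and list_c of
       svid89.

```shell
# Hypothetical toy lists standing in for xcu, xpg3 and svid89.
dir=${TMPDIR:-/tmp}/comm_sets.$$
mkdir "$dir"
printf '%s\n' apple banana cherry date > "$dir/list_a"
printf '%s\n' banana cherry fig > "$dir/list_b"
printf '%s\n' cherry date fig > "$dir/list_c"
# In list_a but in neither list_b nor list_c.
a_only=$(LC_ALL=POSIX comm -23 "$dir/list_a" "$dir/list_b" |
    LC_ALL=POSIX comm -23 - "$dir/list_c")
# In all three lists.
in_all=$(LC_ALL=POSIX comm -12 "$dir/list_a" "$dir/list_b" |
    LC_ALL=POSIX comm -12 - "$dir/list_c")
# In list_b and list_c but not in list_a.
bc_not_a=$(LC_ALL=POSIX comm -12 "$dir/list_b" "$dir/list_c" |
    LC_ALL=POSIX comm -23 - "$dir/list_a")
rm -r "$dir"
printf '%s\n' "$a_only" "$in_all" "$bc_not_a"
```

       The three results are apple, cherry and fig. Each stage writes a
       single column with a null <lead>, so its output is itself a sorted list
       that the next comm can read through the '-' operand.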

RATIONALE
       None.

FUTURE DIRECTIONS
       A future version of this standard may require that if any lines from
       the input files collate equally but are not identical, then comm treats
       them as different lines and expects them to be ordered according to a
       further byte-by-byte comparison using the collating sequence for the
       POSIX locale.

       A future version of this standard may require that if the input files
       contained any lines that collated equally but were not identical and
       within each file those lines were ordered according to a further
       byte-by-byte comparison using the collating sequence for the POSIX
       locale, then lines written that collate equally but are not identical
       are ordered according to a further byte-by-byte comparison using the
       collating sequence for the POSIX locale.

SEE ALSO
       cmp, diff, sort, uniq

       The Base Definitions volume of POSIX.1-2017, Section 7.3.2,
       LC_COLLATE, Chapter 8, Environment Variables, Section 12.2, Utility
       Syntax Guidelines

COPYRIGHT
       Portions of this text are reprinted and reproduced in electronic form
       from IEEE Std 1003.1-2017, Standard for Information Technology --
       Portable Operating System Interface (POSIX), The Open Group Base
       Specifications Issue 7, 2018 Edition, Copyright (C) 2018 by the
       Institute of Electrical and Electronics Engineers, Inc and The Open
       Group. In the event of any discrepancy between this version and the
       original IEEE and The Open Group Standard, the original IEEE and The
       Open Group Standard is the referee document. The original Standard can
       be obtained online at http://www.opengroup.org/unix/online.html .

       Any typographical or formatting errors that appear in this page are
       most likely to have been introduced during the conversion of the source
       files to man page format. To report such errors, see
       https://www.kernel.org/doc/man-pages/reporting_bugs.html .

IEEE/The Open Group                  2017                            COMM(1P)