SORT(1P)                  POSIX Programmer's Manual                 SORT(1P)

PROLOG
This manual page is part of the POSIX Programmer's Manual. The Linux
implementation of this interface may differ (consult the corresponding
Linux manual page for details of Linux behavior), or the interface may
not be implemented on Linux.

NAME
sort - sort, merge, or sequence check text files

SYNOPSIS
sort [-m] [-o output] [-bdfinru] [-t char] [-k keydef]... [file...]

sort [-c|-C] [-bdfinru] [-t char] [-k keydef] [file]

DESCRIPTION
The sort utility shall perform one of the following functions:

 1. Sort lines of all the named files together and write the result to
    the specified output.

 2. Merge lines of all the named (presorted) files together and write
    the result to the specified output.

 3. Check that a single input file is correctly presorted.

Comparisons shall be based on one or more sort keys extracted from each
line of input (or, if no sort keys are specified, the entire line up
to, but not including, the terminating <newline>), and shall be
performed using the collating sequence of the current locale.
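
The three functions can be tried on a pair of small files. The following is a minimal sketch, not part of the standard: the scratch directory, the file names a, b, and a.sorted, and the use of mktemp -d (widely available, but not itself required by POSIX) are assumptions made for illustration, and LC_ALL=C is set only to make the ordering predictable.

```shell
# Scratch files for illustration only; the names are hypothetical.
tmp=$(mktemp -d)
printf 'pear\napple\n'    > "$tmp/a"      # not sorted
printf 'banana\ncherry\n' > "$tmp/b"      # already sorted

# 1. Sort the lines of both files together.
sorted=$(LC_ALL=C sort "$tmp/a" "$tmp/b")

# 2. Merge files that are already sorted (a is sorted first).
LC_ALL=C sort -o "$tmp/a.sorted" "$tmp/a"
merged=$(LC_ALL=C sort -m "$tmp/a.sorted" "$tmp/b")

# 3. Check a single file; an exit status of 1 reports disorder.
check_status=0
LC_ALL=C sort -c "$tmp/a" 2>/dev/null || check_status=$?

printf '%s\n' "$sorted"
rm -rf "$tmp"
```

Sorting and merging give the same result here because the merge inputs were sorted beforehand; merging unsorted input is not expected to produce sorted output.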

OPTIONS
The sort utility shall conform to the Base Definitions volume of
POSIX.1-2008, Section 12.2, Utility Syntax Guidelines, except for
Guideline 9, and the -k keydef option should follow the -b, -d, -f, -i,
-n, and -r options. In addition, '+' may be recognized as an option
delimiter as well as '-'.

The following options shall be supported:

-c        Check that the single input file is ordered as specified by
          the arguments and the collating sequence of the current
          locale. Output shall not be sent to standard output. The exit
          code shall indicate whether or not disorder was detected or
          an error occurred. If disorder (or, with -u, a duplicate key)
          is detected, a warning message shall be sent to standard
          error indicating where the disorder or duplicate key was
          found.

-C        Same as -c, except that a warning message shall not be sent
          to standard error if disorder or, with -u, a duplicate key is
          detected.

-m        Merge only; the input file shall be assumed to be already
          sorted.

-o output Specify the name of an output file to be used instead of the
          standard output. This file can be the same as one of the
          input files.

-u        Unique: suppress all but one in each set of lines having
          equal keys. If used with the -c option, check that there are
          no lines with duplicate keys, in addition to checking that
          the input file is sorted.
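
The options above can be exercised as in the following sketch. It assumes an implementation that supports -C and prints a diagnostic for -c as described here; the scratch file created with mktemp and all variable names are hypothetical, and LC_ALL=C is used only for predictable ordering.

```shell
# -c reports disorder on standard error and through the exit status.
c_status=0
c_msg=$(printf 'b\na\n' | LC_ALL=C sort -c 2>&1) || c_status=$?

# -C reports it through the exit status only.
quiet_status=0
quiet_msg=$(printf 'b\na\n' | LC_ALL=C sort -C 2>&1) || quiet_status=$?

# -u keeps one line per distinct key. Which line of each set survives is
# not specified, so only the keys are inspected here.
keys=$(printf 'a 1\nb 3\na 2\n' | LC_ALL=C sort -u -k 1,1 | cut -d ' ' -f 1)

# -c together with -u also treats a repeated key as a failure.
dup_status=0
printf 'a 1\na 2\n' | LC_ALL=C sort -c -u -k 1,1 2>/dev/null || dup_status=$?

# -o may name one of the input files, which sorts that file in place.
f=$(mktemp)                       # hypothetical scratch file
printf '3\n1\n2\n' > "$f"
LC_ALL=C sort -o "$f" "$f"
in_place=$(cat "$f")
rm -f "$f"
```

Redirecting with "sort file > file" is not equivalent to -o, because the shell truncates the file before sort reads it.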

The following options shall override the default ordering rules. When
ordering options appear independent of any key field specifications,
the requested field ordering rules shall be applied globally to all
sort keys. When attached to a specific key (see -k), the specified
ordering options shall override all global ordering options for that
key.

-d        Specify that only <blank> characters and alphanumeric
          characters, according to the current setting of LC_CTYPE,
          shall be significant in comparisons. The behavior is
          undefined for a sort key to which -i or -n also applies.

-f        Consider all lowercase characters that have uppercase
          equivalents, according to the current setting of LC_CTYPE, to
          be the uppercase equivalent for the purposes of comparison.

-i        Ignore all characters that are non-printable, according to
          the current setting of LC_CTYPE. The behavior is undefined
          for a sort key for which -n also applies.

-n        Restrict the sort key to an initial numeric string,
          consisting of optional <blank> characters, optional
          minus-sign, and zero or more digits with an optional radix
          character and thousands separators (as defined in the current
          locale), which shall be sorted by arithmetic value. An empty
          digit string shall be treated as zero. Leading zeros and
          signs on zeros shall not affect ordering.

-r        Reverse the sense of comparisons.
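
The effect of the ordering options is easiest to see side by side. The sketch below is illustrative only; the sample data and variable names are made up, and LC_ALL=C is assumed so that the default comparison is a plain byte comparison.

```shell
input=$(printf '10\n9\n-2\n')

# Default: lines are compared as text, character by character.
as_text=$(printf '%s\n' "$input" | LC_ALL=C sort)

# -n: the leading numeric string is compared by arithmetic value.
as_numbers=$(printf '%s\n' "$input" | LC_ALL=C sort -n)

# -r: the sense of the comparison is reversed.
descending=$(printf '%s\n' "$input" | LC_ALL=C sort -n -r)

# -f: lowercase letters are treated as their uppercase equivalents, so
# "A" and "a" have equal keys; lines that tie are then ordered by the
# whole-line comparison described under -k.
folded=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort -f)

printf '%s\n' "$as_numbers"
```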

The treatment of field separators can be altered using the options:

-b        Ignore leading <blank> characters when determining the
          starting and ending positions of a restricted sort key. If
          the -b option is specified before the first -k option, it
          shall be applied to all -k options. Otherwise, the -b option
          can be attached independently to each -k field_start or
          field_end option-argument (see below).

-t char   Use char as the field separator character; char shall not be
          considered to be part of a field (although it can be included
          in a sort key). Each occurrence of char shall be significant
          (for example, <char><char> delimits an empty field). If -t is
          not specified, <blank> characters shall be used as default
          field separators; each maximal non-empty sequence of <blank>
          characters that follows a non-<blank> shall be a field
          separator.
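
A short sketch of both options follows. The input lines and variable names are invented for illustration, and LC_ALL=C is assumed so that a <space> collates before any letter.

```shell
# With -t, every ':' counts: "a::2" has an empty second field and "2" as
# its third field, so it sorts after "b:1:1" on the third field.
by_third=$(printf 'a::2\nb:1:1\n' | LC_ALL=C sort -t : -k 3,3n)

# Without -b, the blanks in front of field 2 are part of the key, so
# "   b" sorts before " a".
without_b=$(printf 'x   b\ny a\n' | LC_ALL=C sort -k 2)

# With -b given before -k, leading blanks are skipped and the keys
# become "b" and "a".
with_b=$(printf 'x   b\ny a\n' | LC_ALL=C sort -b -k 2)

printf '%s\n' "$with_b"
```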

Sort keys can be specified using the options:

-k keydef The keydef argument is a restricted sort key field
          definition. The format of this definition is:

              field_start[type][,field_end[type]]

          where field_start and field_end define a key field restricted
          to a portion of the line (see the EXTENDED DESCRIPTION
          section), and type is a modifier from the list of characters
          'b', 'd', 'f', 'i', 'n', 'r'. The 'b' modifier shall behave
          like the -b option, but shall apply only to the field_start
          or field_end to which it is attached. The other modifiers
          shall behave like the corresponding options, but shall apply
          only to the key field to which they are attached; they shall
          have this effect if specified with field_start, field_end, or
          both. If any modifier is attached to a field_start or to a
          field_end, no option shall apply to either. Implementations
          shall support at least nine occurrences of the -k option,
          which shall be significant in command line order. If no -k
          option is specified, a default sort key of the entire line
          shall be used.

          When there are multiple key fields, later keys shall be
          compared only after all earlier keys compare equal. Except
          when the -u option is specified, lines that otherwise compare
          equal shall be ordered as if none of the options -d, -f, -i,
          -n, or -k were present (but with -r still in effect, if it
          was specified) and with all bytes in the lines significant to
          the comparison. The order in which lines that still compare
          equal are written is unspecified.
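
Key precedence and the final whole-line comparison can be illustrated as follows. This is a sketch with made-up data and variable names; LC_ALL=C is assumed for predictable byte ordering.

```shell
# Two keys: field 1 as text, then field 2 numerically in reverse.
# The second key is consulted only for lines whose first key is equal.
by_name_then_score=$(printf 'b 2\na 9\na 10\n' |
    LC_ALL=C sort -k 1,1 -k 2,2nr)

# "1" and "01" are equal as numbers, so the lines tie on the key and are
# then ordered by comparing all bytes of the line ('0' precedes '1').
tie_break=$(printf 'a 1\na 01\n' | LC_ALL=C sort -k 2,2n)

printf '%s\n' "$by_name_then_score"
```

Writing each key as N,N keeps it from running on to the end of the line, which is usually what is intended when several keys are combined.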

OPERANDS
The following operand shall be supported:

file      A pathname of a file to be sorted, merged, or checked. If no
          file operands are specified, or if a file operand is '-', the
          standard input shall be used.

STDIN
The standard input shall be used only if no file operands are
specified, or if a file operand is '-'. See the INPUT FILES section.

INPUT FILES
The input files shall be text files, except that the sort utility shall
add a <newline> to the end of a file ending with an incomplete last
line.
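
Both points, the '-' operand and the handling of an incomplete last line, can be seen in a small sketch. The scratch file created with mktemp and the variable names are assumptions for illustration.

```shell
f=$(mktemp)                       # hypothetical scratch file
printf 'b\n' > "$f"

# '-' stands for standard input and can be combined with named files.
combined=$(printf 'c\na\n' | LC_ALL=C sort - "$f")

# The input "b\na" lacks a final <newline>; sort supplies it, so the
# output consists of two complete lines.
line_count=$(printf 'b\na' | LC_ALL=C sort | wc -l | tr -d ' ')

rm -f "$f"
printf '%s\n' "$combined"
```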

ENVIRONMENT VARIABLES
The following environment variables shall affect the execution of sort:

LANG      Provide a default value for the internationalization
          variables that are unset or null. (See the Base Definitions
          volume of POSIX.1-2008, Section 8.2, Internationalization
          Variables for the precedence of internationalization
          variables used to determine the values of locale categories.)

LC_ALL    If set to a non-empty string value, override the values of
          all the other internationalization variables.

LC_COLLATE
          Determine the locale for ordering rules.

LC_CTYPE  Determine the locale for the interpretation of sequences of
          bytes of text data as characters (for example, single-byte as
          opposed to multi-byte characters in arguments and input
          files) and the behavior of character classification for the
          -b, -d, -f, -i, and -n options.

LC_MESSAGES
          Determine the locale that should be used to affect the format
          and contents of diagnostic messages written to standard
          error.

LC_NUMERIC
          Determine the locale for the definition of the radix
          character and thousands separator for the -n option.

NLSPATH   Determine the location of message catalogs for the processing
          of LC_MESSAGES.
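
Because the result depends on the locale, scripts that need a reproducible order often pin it. The sketch below assumes only the POSIX (C) locale; the locale name en_US.UTF-8 is an example that may not be installed, and it is deliberately overridden by LC_ALL.

```shell
# In the C locale, collation follows byte values, so uppercase letters
# sort before lowercase ones.
c_order=$(printf 'a\nB\n' | LC_ALL=C sort)

# LC_ALL takes precedence over LC_COLLATE, so this is still C ordering
# whether or not the named locale exists on the system.
overridden=$(printf 'a\nB\n' | LC_COLLATE=en_US.UTF-8 LC_ALL=C sort)

printf '%s\n' "$c_order"
```

In other locales the relative order of "a" and "B" may differ, which is why no expected output is shown for them.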

ASYNCHRONOUS EVENTS
Default.

STDOUT
Unless the -o or -c options are in effect, the standard output shall
contain the sorted input.

STDERR
The standard error shall be used for diagnostic messages. When -c is
specified, if disorder is detected (or if -u is also specified and a
duplicate key is detected), a message shall be written to the standard
error which identifies the input line at which disorder (or a duplicate
key) was detected. A warning message about correcting an incomplete
last line of an input file may be generated, but need not affect the
final exit status.

OUTPUT FILES
If the -o option is in effect, the sorted input shall be written to the
file output.

EXTENDED DESCRIPTION
The notation:

    -k field_start[type][,field_end[type]]

shall define a key field that begins at field_start and ends at
field_end inclusive, unless field_start falls beyond the end of the
line or after field_end, in which case the key field is empty. A
missing field_end shall mean the last character of the line.

A field comprises a maximal sequence of non-separating characters and,
in the absence of option -t, any preceding field separator.

The field_start portion of the keydef option-argument shall have the
form:

    field_number[.first_character]

Fields and characters within fields shall be numbered starting with 1.
The field_number and first_character pieces, interpreted as positive
decimal integers, shall specify the first character to be used as part
of a sort key. If .first_character is omitted, it shall refer to the
first character of the field.

The field_end portion of the keydef option-argument shall have the
form:

    field_number[.last_character]

The field_number shall be as described above for field_start. The
last_character piece, interpreted as a non-negative decimal integer,
shall specify the last character to be used as part of the sort key. If
last_character evaluates to zero or .last_character is omitted, it
shall refer to the last character of the field specified by
field_number.

If the -b option or b type modifier is in effect, characters within a
field shall be counted from the first non-<blank> in the field. (This
shall apply separately to first_character and last_character.)
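
Character positions within a field can be illustrated with a few short inputs. This is a sketch under the assumption of the C locale; the data and variable names are invented.

```shell
# Key = characters 2 through 3 of field 2: "zz" and "aa".
by_chars=$(printf 'x:1zz\ny:9aa\n' | LC_ALL=C sort -t : -k 2.2,2.3)

# Key = character 1 of field 2 only: "1" and "9".
by_first=$(printf 'x:1zz\ny:9aa\n' | LC_ALL=C sort -t : -k 2.1,2.1)

# Without -t, field 2 begins with its separator. Character 2 is a
# <space> in "  zb" and "y" in " ya".
plain=$(printf 'p  zb\nq ya\n' | LC_ALL=C sort -k 2.2,2.2)

# With the b modifier, counting starts at the first non-<blank>, so
# character 2 is "b" and "a" respectively.
with_b=$(printf 'p  zb\nq ya\n' | LC_ALL=C sort -k 2.2b,2.2b)

printf '%s\n' "$by_chars"
```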

EXIT STATUS
The following exit values shall be returned:

 0    All input files were output successfully, or -c was specified and
      the input file was correctly sorted.

 1    Under the -c option, the file was not ordered as specified, or if
      the -c and -u options were both specified, two input lines were
      found with equal keys.

>1    An error occurred.
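
A script can tell the three outcomes apart by testing the exit status. The helper below, check_sorted, is a hypothetical function written for this illustration; the scratch files come from mktemp, which is assumed to be available.

```shell
# Classify the outcome of "sort -c" for one file (hypothetical helper).
check_sorted() {
    status=0
    LC_ALL=C sort -c "$1" 2>/dev/null || status=$?
    case $status in
        0) echo sorted ;;
        1) echo disorder ;;
        *) echo error ;;       # any status greater than 1
    esac
}

good=$(mktemp)
bad=$(mktemp)
printf 'a\nb\n' > "$good"
printf 'b\na\n' > "$bad"

r_good=$(check_sorted "$good")
r_bad=$(check_sorted "$bad")
r_missing=$(check_sorted "$good.does-not-exist")   # unreadable input

rm -f "$good" "$bad"
printf '%s %s %s\n' "$r_good" "$r_bad" "$r_missing"
```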

CONSEQUENCES OF ERRORS
Default.

The following sections are informative.

APPLICATION USAGE
The default value for -t, <blank>, has different properties from, for
example, -t"<space>". If a line contains:

    <space><space>foo

the following treatment would occur with default separation as opposed
to specifically selecting a <space>:

    ┌──────┬───────────────────┬──────────────┐
    │Field │      Default      │ -t "<space>" │
    ├──────┼───────────────────┼──────────────┤
    │  1   │ <space><space>foo │ empty        │
    │  2   │ empty             │ empty        │
    │  3   │ empty             │ foo          │
    └──────┴───────────────────┴──────────────┘

The leading field separator itself is included in a field when -t is
not used. For example, this command returns an exit status of zero,
meaning the input was already sorted:

    sort -c -k 2 <<eof
    y<tab>b
    x<space>a
    eof

(assuming that a <tab> precedes the <space> in the current collating
sequence). The field separator is not included in a field when it is
explicitly set via -t. This is historical practice and allows usage
such as:

    sort -t "|" -k 2n <<eof
    Atlanta|425022|Georgia
    Birmingham|284413|Alabama
    Columbia|100385|South Carolina
    eof

where the second field can be correctly sorted numerically without
regard to the non-numeric field separator.
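
The two commands above can be run with literal characters as in the following sketch. It assumes the C locale, in which <tab> does precede <space>; the variable names are illustrative.

```shell
# Field 2 carries its leading separator, so the keys are "<tab>b" and
# "<space>a", which are already in order.
with_key=0
printf 'y\tb\nx a\n' | LC_ALL=C sort -c -k 2 2>/dev/null || with_key=$?

# Compared as whole lines, the same input is out of order ("y" > "x").
whole_line=0
printf 'y\tb\nx a\n' | LC_ALL=C sort -c 2>/dev/null || whole_line=$?

# With -t "|" the separator is not part of field 2, so -k 2n sees the
# digits directly.
by_population=$(LC_ALL=C sort -t '|' -k 2n <<'eof'
Atlanta|425022|Georgia
Birmingham|284413|Alabama
Columbia|100385|South Carolina
eof
)
printf '%s\n' "$by_population"
```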

The wording in the OPTIONS section clarifies that the -b, -d, -f, -i,
-n, and -r options have to come before the first sort key specified if
they are intended to apply to all specified keys. The way it is
described in this volume of POSIX.1-2008 matches historical practice,
not historical documentation. The results are unspecified if these
options are specified after a -k option.

The -f option might not work as expected in locales where there is not
a one-to-one mapping between an uppercase and a lowercase letter.

EXAMPLES
 1. The following command sorts the contents of infile with the second
    field as the sort key:

        sort -k 2,2 infile

 2. The following command sorts, in reverse order, the contents of
    infile1 and infile2, placing the output in outfile and using the
    second character of the second field as the sort key (assuming that
    the first character of the second field is the field separator):

        sort -r -o outfile -k 2.2,2.2 infile1 infile2

 3. The following command sorts the contents of infile1 and infile2
    using the second non-<blank> of the second field as the sort key:

        sort -k 2.2b,2.2b infile1 infile2

 4. The following command prints the System V password file (user
    database) sorted by the numeric user ID (the third
    <colon>-separated field):

        sort -t : -k 3,3n /etc/passwd

 5. The following command prints the lines of the already sorted file
    infile, suppressing all but one occurrence of lines having the same
    third field:

        sort -um -k 3.1,3.0 infile
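
Examples 4 and 5 can be tried without touching real files by feeding sample data on standard input. The records below are invented for illustration and only resemble password-file entries.

```shell
# Example 4 on sample records: order by the numeric third field.
sample='root:x:0:0:root:/root:/bin/sh
bob:x:1001:1001::/home/bob:/bin/sh
daemon:x:2:2:daemon:/:/bin/false'
by_uid=$(printf '%s\n' "$sample" |
    LC_ALL=C sort -t : -k 3,3n | cut -d : -f 1)

# Example 5 on input already sorted by its third field: one line is
# kept for each distinct third field, so two lines remain.
unique_count=$(printf 'a b c\nd e c\nf g h\n' |
    LC_ALL=C sort -um -k 3.1,3.0 | wc -l | tr -d ' ')

printf '%s\n' "$by_uid"
```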

RATIONALE
Examples in some historical documentation state that options -um with
one input file keep the first in each set of lines with equal keys.
This behavior was deemed to be an implementation artifact and was not
standardized.

The -z option was omitted; it is not standard practice on most systems
and is inconsistent with using sort to sort several files individually
and then merge them together. The text concerning -z in historical
documentation appeared to require implementations to determine the
proper buffer length during the sort phase of operation, but not during
the merge.

The -y option was omitted because of non-portability. The -M option,
present in System V, was omitted because of non-portability in
international usage.

An undocumented -T option exists in some implementations. It is used to
specify a directory for intermediate files. Implementations are
encouraged to support the use of the TMPDIR environment variable
instead of adding an option to support this functionality.

The -k option was added to satisfy two objections. First, the
zero-based counting used by sort is not consistent with other utility
conventions. Second, it did not meet syntax guideline requirements.

Historical documentation indicates that "setting -n implies -b". The
description of -n already states that optional leading <blank>s are
tolerated in doing the comparison. If -b is enabled, rather than
implied, by -n, this has unusual side-effects. When a character offset
is used in a column of numbers (for example, to sort modulo 100), that
offset is measured relative to the most significant digit, not to the
column. Based upon a recommendation from the author of the original
sort utility, the -b implication has been omitted from this volume of
POSIX.1-2008, and an application wishing to achieve the previously
mentioned side-effects has to code the -b flag explicitly.

Earlier versions of this standard allowed the -o option to appear after
operands. Historical practice allowed all options to be interspersed
with operands. This version of the standard allows implementations to
accept options after operands but conforming applications should not
use this form.

Earlier versions of this standard also allowed the -number and +number
options. These options are no longer specified by POSIX.1-2008 but may
be present in some implementations.

Historical implementations produced a message on standard error when -c
was specified and disorder was detected, and when -c and -u were
specified and a duplicate key was detected. An earlier version of this
standard contained wording that did not make it clear that this message
was allowed and some implementations removed this message to be sure
that they conformed to the standard's requirements. Confronted with
this difference in behavior, interactive users that wanted to be sure
that they got visual feedback instead of just exit code 1 could have
used a command like:

    sort -c file || echo disorder

whether or not the sort utility provided a message in this case. But,
it was not easy for a user to find where the disorder or duplicate key
occurred on implementations that do not produce a message, especially
when some parts of the input line were not part of the key and when one
or more of the -b, -d, -f, -i, -n, or -r options or keydef type
modifiers were in use. POSIX.1-2008 requires a message to be produced
in this case. POSIX.1-2008 also contains the -C option giving users the
ability to choose either behavior.

When a disorder or duplicate is found when the -c option is specified,
some implementations print a message containing the first line that is
out of order or contains a duplicate key; others print a message
specifying the line number of the offending line. This standard allows
either type of message.

FUTURE DIRECTIONS
None.

SEE ALSO
comm, join, uniq

The Base Definitions volume of POSIX.1-2008, Chapter 8, Environment
Variables, Section 12.2, Utility Syntax Guidelines

The System Interfaces volume of POSIX.1-2008, toupper()

COPYRIGHT
Portions of this text are reprinted and reproduced in electronic form
from IEEE Std 1003.1, 2013 Edition, Standard for Information Technology
-- Portable Operating System Interface (POSIX), The Open Group Base
Specifications Issue 7, Copyright (C) 2013 by the Institute of
Electrical and Electronics Engineers, Inc and The Open Group. (This is
POSIX.1-2008 with the 2013 Technical Corrigendum 1 applied.) In the
event of any discrepancy between this version and the original IEEE and
The Open Group Standard, the original IEEE and The Open Group Standard
is the referee document. The original Standard can be obtained online
at http://www.unix.org/online.html .

Any typographical or formatting errors that appear in this page are
most likely to have been introduced during the conversion of the source
files to man page format. To report such errors, see
https://www.kernel.org/doc/man-pages/reporting_bugs.html .

IEEE/The Open Group                  2013                          SORT(1P)