SORT(1P)                POSIX Programmer's Manual               SORT(1P)

PROLOG
This manual page is part of the POSIX Programmer's Manual. The Linux
implementation of this interface may differ (consult the corresponding
Linux manual page for details of Linux behavior), or the interface may
not be implemented on Linux.

NAME
sort - sort, merge, or sequence check text files

SYNOPSIS
sort [-m] [-o output] [-bdfinru] [-t char] [-k keydef]... [file...]

sort [-c|-C] [-bdfinru] [-t char] [-k keydef] [file]

DESCRIPTION
The sort utility shall perform one of the following functions:

1. Sort lines of all the named files together and write the result to
   the specified output.

2. Merge lines of all the named (presorted) files together and write
   the result to the specified output.

3. Check that a single input file is correctly presorted.

Comparisons shall be based on one or more sort keys extracted from each
line of input (or, if no sort keys are specified, the entire line up
to, but not including, the terminating <newline>), and shall be
performed using the collating sequence of the current locale. If this
collating sequence does not have a total ordering of all characters
(see the Base Definitions volume of POSIX.1‐2017, Section 7.3.2,
LC_COLLATE), any lines of input that collate equally should be further
compared byte-by-byte using the collating sequence for the POSIX
locale.

OPTIONS
The sort utility shall conform to the Base Definitions volume of
POSIX.1‐2017, Section 12.2, Utility Syntax Guidelines, except for
Guideline 9, and the -k keydef option should follow the -b, -d, -f, -i,
-n, and -r options. In addition, '+' may be recognized as an option
delimiter as well as '-'.

The following options shall be supported:

-c        Check that the single input file is ordered as specified by
          the arguments and the collating sequence of the current
          locale. Output shall not be sent to standard output. The exit
          code shall indicate whether or not disorder was detected or
          an error occurred. If disorder (or, with -u, a duplicate key)
          is detected, a warning message shall be sent to standard
          error indicating where the disorder or duplicate key was
          found.

-C        Same as -c, except that a warning message shall not be sent
          to standard error if disorder or, with -u, a duplicate key is
          detected.

-m        Merge only; the input file shall be assumed to be already
          sorted.

-o output Specify the name of an output file to be used instead of the
          standard output. This file can be the same as one of the
          input files.

-u        Unique: suppress all but one in each set of lines having
          equal keys. If used with the -c option, check that there are
          no lines with duplicate keys, in addition to checking that
          the input file is sorted.
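
The options above can be tried on small scratch files. The following is
a minimal sketch, not part of the standard text: the directory, file
names, and contents are invented for illustration, mktemp is assumed to
be available (it is widely provided but not specified by POSIX.1‐2017),
and LC_ALL=C is set so the ordering does not depend on the caller's
locale.

```shell
# Scratch files; names and contents are hypothetical.
tmp=$(mktemp -d)
printf 'pear\napple\npear\n' > "$tmp/a.txt"
printf 'banana\nfig\n' > "$tmp/b.txt"

# -o may name an input file: sort a.txt in place, keeping one line per key (-u).
LC_ALL=C sort -u -o "$tmp/a.txt" "$tmp/a.txt"
unique=$(cat "$tmp/a.txt")

# -m merges inputs that are already sorted.
merged=$(LC_ALL=C sort -m "$tmp/a.txt" "$tmp/b.txt")

# -C checks the order silently; the exit status carries the answer.
if LC_ALL=C sort -C "$tmp/a.txt"; then order=sorted; else order=unsorted; fi

printf '%s\n' "$merged" "$order"
rm -rf "$tmp"
```

Writing the result back over an input with -o avoids the truncation
that a shell redirection such as "sort file > file" would cause.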

The following options shall override the default ordering rules. When
ordering options appear independent of any key field specifications,
the requested field ordering rules shall be applied globally to all
sort keys. When attached to a specific key (see -k), the specified
ordering options shall override all global ordering options for that
key.

-d        Specify that only <blank> characters and alphanumeric
          characters, according to the current setting of LC_CTYPE,
          shall be significant in comparisons. The behavior is
          undefined for a sort key to which -i or -n also applies.

-f        Consider all lowercase characters that have uppercase
          equivalents, according to the current setting of LC_CTYPE, to
          be the uppercase equivalent for the purposes of comparison.

-i        Ignore all characters that are non-printable, according to
          the current setting of LC_CTYPE. The behavior is undefined
          for a sort key for which -n also applies.

-n        Restrict the sort key to an initial numeric string,
          consisting of optional <blank> characters, optional
          <hyphen-minus> character, and zero or more digits with an
          optional radix character and thousands separators (as defined
          in the current locale), which shall be sorted by arithmetic
          value. An empty digit string shall be treated as zero.
          Leading zeros and signs on zeros shall not affect ordering.

-r        Reverse the sense of comparisons.
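
The effect of the ordering options can be compared on a few lines of
input. This is an illustrative sketch with made-up data; LC_ALL=C is an
assumption chosen so that the plain-text order is the byte order. In
the -f case, lines whose keys compare equal ("A" and "a") are then
ordered by the byte-wise last-resort comparison described under -k.

```shell
input=$(printf '%s\n' 10 9 -3)

# Default: compared as text, so "-3" < "10" < "9" in the C locale.
text_order=$(printf '%s\n' "$input" | LC_ALL=C sort)

# -n: compared by arithmetic value.
num_order=$(printf '%s\n' "$input" | LC_ALL=C sort -n)

# -n with -r: arithmetic value, reversed.
rev_order=$(printf '%s\n' "$input" | LC_ALL=C sort -nr)

# -f: lowercase letters are compared as their uppercase equivalents.
fold_order=$(printf '%s\n' b A a B | LC_ALL=C sort -f)

printf '%s\n' "$text_order" "$num_order" "$rev_order" "$fold_order"
```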

The treatment of field separators can be altered using the options:

-b        Ignore leading <blank> characters when determining the
          starting and ending positions of a restricted sort key. If
          the -b option is specified before the first -k option, it
          shall be applied to all -k options. Otherwise, the -b option
          can be attached independently to each -k field_start or
          field_end option-argument (see below).

-t char   Use char as the field separator character; char shall not be
          considered to be part of a field (although it can be included
          in a sort key). Each occurrence of char shall be significant
          (for example, <char><char> delimits an empty field). If -t is
          not specified, <blank> characters shall be used as default
          field separators; each maximal non-empty sequence of <blank>
          characters that follows a non-<blank> shall be a field
          separator.
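
The difference between an explicit separator and the default one can be
seen on two short inputs. This is a sketch with invented data, run
under LC_ALL=C as an assumption to keep the result locale-independent.

```shell
# With -t, every ':' counts: "a::3" has an empty field 2, and "3" is field 3.
by_third=$(printf 'a::3\nb:x:1\n' | LC_ALL=C sort -t : -k 3,3n)

# Without -t, a run of blanks is one separator, so "2" and "1" are both in field 2.
by_second=$(printf 'x   2\ny 1\n' | LC_ALL=C sort -k 2,2n)

printf '%s\n' "$by_third" "$by_second"
```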

Sort keys can be specified using the options:

-k keydef The keydef argument is a restricted sort key field
          definition. The format of this definition is:

              field_start[type][,field_end[type]]

          where field_start and field_end define a key field restricted
          to a portion of the line (see the EXTENDED DESCRIPTION
          section), and type is one or more modifiers from the list of
          characters 'b', 'd', 'f', 'i', 'n', 'r'. The 'b' modifier
          shall behave like the -b option, but shall apply only to the
          field_start or field_end to which it is attached. The other
          modifiers shall behave like the corresponding options, but
          shall apply only to the key field to which they are attached;
          they shall have this effect if specified with field_start,
          field_end, or both. If any modifier is attached to a
          field_start or to a field_end, no option shall apply to
          either. Implementations shall support at least nine
          occurrences of the -k option, which shall be significant in
          command line order. If no -k option is specified, a default
          sort key of the entire line shall be used.

          When there are multiple key fields, later keys shall be
          compared only after all earlier keys compare equal. Except
          when the -u option is specified, lines that otherwise compare
          equal shall be ordered as if none of the options -d, -f, -i,
          -n, or -k were present (but with -r still in effect, if it
          was specified) and with all bytes in the lines significant to
          the comparison. The order in which lines that still compare
          equal are written is unspecified.
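
Multiple keys with per-key modifiers might be combined as in the sketch
below. The records (name, department, salary) are hypothetical, and
LC_ALL=C is an assumption made for a predictable text order.

```shell
# Hypothetical records: name, department, salary.
staff='alice eng 120
bob ops 90
carol eng 150
dave ops 95'

# Primary key: field 2 as text. Secondary key: field 3, numeric (n) and reversed (r).
ranked=$(printf '%s\n' "$staff" | LC_ALL=C sort -k 2,2 -k 3,3nr)
printf '%s\n' "$ranked"
```

The second key is consulted only for lines whose department compares
equal, so each department lists its highest salary first.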

OPERANDS
The following operand shall be supported:

file      A pathname of a file to be sorted, merged, or checked. If no
          file operands are specified, or if a file operand is '-', the
          standard input shall be used. If sort encounters an error
          when opening or reading a file operand, it may exit without
          writing any output to standard output or processing later
          operands.

STDIN
The standard input shall be used only if no file operands are
specified, or if a file operand is '-'. See the INPUT FILES section.

INPUT FILES
The input files shall be text files, except that the sort utility shall
add a <newline> to the end of a file ending with an incomplete last
line.
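
The handling of an incomplete last line can be observed with a short
sketch. The input is invented; the trailing "x" is only a marker,
needed because command substitution strips trailing <newline>
characters.

```shell
# The input lacks a final <newline>; sort supplies one, so "b" is followed
# by a <newline> before the marker "x" appears.
out=$(printf 'b\na' | LC_ALL=C sort; printf x)
printf '%s\n' "$out"
```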

ENVIRONMENT VARIABLES
The following environment variables shall affect the execution of sort:

LANG      Provide a default value for the internationalization
          variables that are unset or null. (See the Base Definitions
          volume of POSIX.1‐2017, Section 8.2, Internationalization
          Variables for the precedence of internationalization
          variables used to determine the values of locale categories.)

LC_ALL    If set to a non-empty string value, override the values of
          all the other internationalization variables.

LC_COLLATE
          Determine the locale for ordering rules.

LC_CTYPE  Determine the locale for the interpretation of sequences of
          bytes of text data as characters (for example, single-byte as
          opposed to multi-byte characters in arguments and input
          files) and the behavior of character classification for the
          -b, -d, -f, -i, and -n options.

LC_MESSAGES
          Determine the locale that should be used to affect the format
          and contents of diagnostic messages written to standard
          error.

LC_NUMERIC
          Determine the locale for the definition of the radix
          character and thousands separator for the -n option.

NLSPATH   Determine the location of message catalogs for the processing
          of LC_MESSAGES.

ASYNCHRONOUS EVENTS
Default.

STDOUT
Unless the -o or -c options are in effect, the standard output shall
contain the sorted input.

STDERR
The standard error shall be used for diagnostic messages. When -c is
specified, if disorder is detected (or if -u is also specified and a
duplicate key is detected), a message shall be written to the standard
error which identifies the input line at which disorder (or a duplicate
key) was detected. A warning message about correcting an incomplete
last line of an input file may be generated, but need not affect the
final exit status.

OUTPUT FILES
If the -o option is in effect, the sorted input shall be written to the
file output.

EXTENDED DESCRIPTION
The notation:

    -k field_start[type][,field_end[type]]

shall define a key field that begins at field_start and ends at
field_end inclusive, unless field_start falls beyond the end of the
line or after field_end, in which case the key field is empty. A
missing field_end shall mean the last character of the line.

A field comprises a maximal sequence of non-separating characters and,
in the absence of option -t, any preceding field separator.

The field_start portion of the keydef option-argument shall have the
form:

    field_number[.first_character]

Fields and characters within fields shall be numbered starting with 1.
The field_number and first_character pieces, interpreted as positive
decimal integers, shall specify the first character to be used as part
of a sort key. If .first_character is omitted, it shall refer to the
first character of the field.

The field_end portion of the keydef option-argument shall have the
form:

    field_number[.last_character]

The field_number shall be as described above for field_start. The
last_character piece, interpreted as a non-negative decimal integer,
shall specify the last character to be used as part of the sort key. If
last_character evaluates to zero or .last_character is omitted, it
shall refer to the last character of the field specified by
field_number.

If the -b option or b type modifier is in effect, characters within a
field shall be counted from the first non-<blank> in the field. (This
shall apply separately to first_character and last_character.)
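
The character positions described above can be exercised as follows.
This is a sketch only: both data sets are invented, and LC_ALL=C is an
assumption made so that keys are compared byte by byte.

```shell
# Hypothetical ':'-separated records; field 2 is a letter followed by two digits.
rows='x:a20:p
y:b10:q
z:c15:r'

# Key = characters 2 through 3 of field 2, that is, the digits only.
by_digits=$(printf '%s\n' "$rows" | LC_ALL=C sort -t : -k 2.2,2.3)

# A last_character of 0 means "through the last character of field 2".
to_field_end=$(printf '%s\n' "$rows" | LC_ALL=C sort -t : -k 2.2,2.0)

# Without -t, field 2 keeps its leading blanks; with the b modifier,
# characters are counted from the first non-blank, so the key is the
# second non-blank character of field 2 ("2", "9", "1").
cols='x   b2
y a9
z    c1'
second_nonblank=$(printf '%s\n' "$cols" | LC_ALL=C sort -k 2.2b,2.2b)

printf '%s\n' "$by_digits" "$to_field_end" "$second_nonblank"
```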

EXIT STATUS
The following exit values shall be returned:

0    All input files were output successfully, or -c was specified and
     the input file was correctly sorted.

1    Under the -c option, the file was not ordered as specified, or if
     the -c and -u options were both specified, two input lines were
     found with equal keys.

>1   An error occurred.
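
A script can branch on these values. The sketch below uses -C so that
nothing is written to standard error for the disorder cases; the
pathname /nonexistent/input is a placeholder assumed not to exist, and
the exact value returned for an error is implementation-defined beyond
being greater than 1.

```shell
rc_sorted=0
printf 'a\nb\n' | LC_ALL=C sort -C || rc_sorted=$?

rc_disorder=0
printf 'b\na\n' | LC_ALL=C sort -C || rc_disorder=$?

# With -u, two lines with equal keys also count as a failed check.
rc_duplicate=0
printf 'a\na\n' | LC_ALL=C sort -C -u || rc_duplicate=$?

# An unreadable operand is an error, reported with a status greater than 1.
rc_error=0
LC_ALL=C sort /nonexistent/input 2>/dev/null || rc_error=$?

printf '%s\n' "$rc_sorted" "$rc_disorder" "$rc_duplicate" "$rc_error"
```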

CONSEQUENCES OF ERRORS
The default requirements shall apply, except that if sort encounters an
error when opening or reading a file operand, it may exit without
writing any output to standard output or processing later operands.

The following sections are informative.

APPLICATION USAGE
The default value for -t, <blank>, has different properties from, for
example, -t"<space>". If a line contains:

    <space><space>foo

the following treatment would occur with default separation as opposed
to specifically selecting a <space>:

    ┌──────┬───────────────────┬──────────────┐
    │Field │      Default      │ -t "<space>" │
    ├──────┼───────────────────┼──────────────┤
    │  1   │ <space><space>foo │ empty        │
    │  2   │ empty             │ empty        │
    │  3   │ empty             │ foo          │
    └──────┴───────────────────┴──────────────┘

The leading field separator itself is included in a field when -t is
not used. For example, this command returns an exit status of zero,
meaning the input was already sorted:

    sort -c -k 2 <<eof
    y<tab>b
    x<space>a
    eof

(assuming that a <tab> precedes the <space> in the current collating
sequence). The field separator is not included in a field when it is
explicitly set via -t. This is historical practice and allows usage
such as:

    sort -t "|" -k 2n <<eof
    Atlanta|425022|Georgia
    Birmingham|284413|Alabama
    Columbia|100385|South Carolina
    eof

where the second field can be correctly sorted numerically without
regard to the non-numeric field separator.
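
Both commands above can be run with literal characters in place of the
<tab> and <space> notation. This is a sketch under the assumption of
LC_ALL=C, where <tab> does precede <space>; the use of the b modifier
for contrast and of cut to trim the output are additions made for
illustration.

```shell
# Field 2 keeps its leading separator: "<tab>b" sorts before "<space>a".
rc_default=0
printf 'y\tb\nx a\n' | LC_ALL=C sort -c -k 2 2>/dev/null || rc_default=$?

# With the b modifier the keys are "b" and "a", so disorder is reported.
rc_skip_blanks=0
printf 'y\tb\nx a\n' | LC_ALL=C sort -c -k 2b 2>/dev/null || rc_skip_blanks=$?

# With -t, the separator is not part of the key; field 2 sorts numerically.
cities=$(printf '%s\n' 'Atlanta|425022|Georgia' 'Birmingham|284413|Alabama' \
    'Columbia|100385|South Carolina' |
    LC_ALL=C sort -t '|' -k 2n | cut -d '|' -f 1)

printf '%s\n' "$rc_default" "$rc_skip_blanks" "$cities"
```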

The wording in the OPTIONS section clarifies that the -b, -d, -f, -i,
-n, and -r options have to come before the first sort key specified if
they are intended to apply to all specified keys. The way it is
described in this volume of POSIX.1‐2017 matches historical practice,
not historical documentation. The results are unspecified if these
options are specified after a -k option.

The -f option might not work as expected in locales where there is not
a one-to-one mapping between an uppercase and a lowercase letter.

When using sort to process pathnames, it is recommended that LC_ALL, or
at least LC_CTYPE and LC_COLLATE, are set to POSIX or C in the
environment, since pathnames can contain byte sequences that do not
form valid characters in some locales, in which case the utility's
behavior would be undefined. In the POSIX locale each byte is a valid
single-byte character, and therefore this problem is avoided.
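
Setting the locale for a single invocation might look like the sketch
below. The pathnames are hypothetical; in practice the input would
typically come from a command such as find.

```shell
# Under LC_ALL=C the pathnames are compared byte by byte, so uppercase
# letters sort before lowercase ones and no byte sequence is invalid.
paths=$(printf '%s\n' ./b.txt ./A.txt ./a.txt | LC_ALL=C sort)
printf '%s\n' "$paths"
```

Prefixing the assignment to the command changes the locale for that
sort invocation only and leaves the rest of the script unaffected.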

If the collating sequence of the current locale does not have a total
ordering of all characters, this can affect the behavior of sort in the
following ways:

* As sort -u suppresses lines with duplicate keys, it suppresses
  lines that collate equally but are not identical.

* The output of sort (without -u) can contain identical lines that
  are not adjacent, if it does not implement the recommended further
  byte-by-byte comparison of lines that collate equally. This affects
  the use of sort with comm and uniq; see the APPLICATION USAGE for
  those utilities.

EXAMPLES
1. The following command sorts the contents of infile with the second
   field as the sort key:

       sort -k 2,2 infile

2. The following command sorts, in reverse order, the contents of
   infile1 and infile2, placing the output in outfile and using the
   second character of the second field as the sort key (assuming that
   the first character of the second field is the field separator):

       sort -r -o outfile -k 2.2,2.2 infile1 infile2

3. The following command sorts the contents of infile1 and infile2
   using the second non-<blank> of the second field as the sort key:

       sort -k 2.2b,2.2b infile1 infile2

4. The following command prints the System V password file (user
   database) sorted by the numeric user ID (the third <colon>-separated
   field):

       sort -t : -k 3,3n /etc/passwd

5. The following command prints the lines of the already sorted file
   infile, suppressing all but one occurrence of lines having the same
   third field:

       sort -um -k 3.1,3.0 infile
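
Example 4 can be tried without reading the real password file. The
sketch below uses a made-up passwd-style sample and adds cut to show
only the name and user ID; both are assumptions made for illustration,
as is LC_ALL=C.

```shell
# A made-up passwd-style sample, used instead of the real /etc/passwd.
sample='root:x:0:0:root:/root:/bin/sh
daemon:x:10:1::/:/bin/false
alice:x:9:100::/home/alice:/bin/sh'

# Numeric sort on the third ':'-separated field (the user ID).
by_uid=$(printf '%s\n' "$sample" | LC_ALL=C sort -t : -k 3,3n | cut -d : -f 1,3)

# For contrast, a text comparison of the same field puts "10" before "9".
by_uid_text=$(printf '%s\n' "$sample" | LC_ALL=C sort -t : -k 3,3 | cut -d : -f 1,3)

printf '%s\n' "$by_uid" "$by_uid_text"
```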

RATIONALE
Examples in some historical documentation state that options -um with
one input file keep the first in each set of lines with equal keys.
This behavior was deemed to be an implementation artifact and was not
standardized.

The -z option was omitted; it is not standard practice on most systems
and is inconsistent with using sort to sort several files individually
and then merge them together. The text concerning -z in historical
documentation appeared to require implementations to determine the
proper buffer length during the sort phase of operation, but not during
the merge.

The -y option was omitted because of non-portability. The -M option,
present in System V, was omitted because of non-portability in
international usage.

An undocumented -T option exists in some implementations. It is used to
specify a directory for intermediate files. Implementations are
encouraged to support the use of the TMPDIR environment variable
instead of adding an option to support this functionality.

The -k option was added to satisfy two objections. First, the
zero-based counting used by sort is not consistent with other utility
conventions. Second, it did not meet syntax guideline requirements.

Historical documentation indicates that "setting -n implies -b". The
description of -n already states that optional leading <blank>s are
tolerated in doing the comparison. If -b is enabled, rather than
implied, by -n, this has unusual side-effects. When a character offset
is used in a column of numbers (for example, to sort modulo 100), that
offset is measured relative to the most significant digit, not to the
column. Based upon a recommendation from the author of the original
sort utility, the -b implication has been omitted from this volume of
POSIX.1‐2017, and an application wishing to achieve the previously
mentioned side-effects has to code the -b flag explicitly.

Earlier versions of this standard allowed the -o option to appear after
operands. Historical practice allowed all options to be interspersed
with operands. This version of the standard allows implementations to
accept options after operands but conforming applications should not
use this form.

Earlier versions of this standard also allowed the -number and +number
options. These options are no longer specified by POSIX.1‐2008 but may
be present in some implementations.

Historical implementations produced a message on standard error when -c
was specified and disorder was detected, and when -c and -u were
specified and a duplicate key was detected. An earlier version of this
standard contained wording that did not make it clear that this message
was allowed and some implementations removed this message to be sure
that they conformed to the standard's requirements. Confronted with
this difference in behavior, interactive users that wanted to be sure
that they got visual feedback instead of just exit code 1 could have
used a command like:

    sort -c file || echo disorder

whether or not the sort utility provided a message in this case. But,
it was not easy for a user to find where the disorder or duplicate key
occurred on implementations that do not produce a message, especially
when some parts of the input line were not part of the key and when one
or more of the -b, -d, -f, -i, -n, or -r options or keydef type
modifiers were in use. POSIX.1‐2008 requires a message to be produced
in this case. POSIX.1‐2008 also contains the -C option giving users the
ability to choose either behavior.
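
The two behaviors can be compared by capturing standard error. This is
a sketch with invented input; the wording of the -c message is
implementation-specific, so the code only checks whether a message is
present.

```shell
# -c reports the disorder on standard error; capture that stream only.
msg_c=$(printf 'b\na\n' | LC_ALL=C sort -c 2>&1 >/dev/null) || :

# -C performs the same check but writes no message.
msg_C=$(printf 'b\na\n' | LC_ALL=C sort -C 2>&1 >/dev/null) || :

if [ -n "$msg_c" ]; then printf 'with -c: %s\n' "$msg_c"; fi
if [ -z "$msg_C" ]; then printf 'with -C: no message\n'; fi
```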

When a disorder or duplicate is found when the -c option is specified,
some implementations print a message containing the first line that is
out of order or contains a duplicate key; others print a message
specifying the line number of the offending line. This standard allows
either type of message.

Implementations are encouraged to perform the recommended further
byte-by-byte comparison of lines that collate equally, even though this
may affect efficiency. The impact on efficiency can be mitigated by
only performing the additional comparison if the current locale's
collating sequence does not have a total ordering of all characters (if
the implementation provides a way to query this) or by only performing
the additional comparison if the locale name associated with the
LC_COLLATE category has an '@' modifier in the name (since locales
without an '@' modifier should have a total ordering of all characters;
see the Base Definitions volume of POSIX.1‐2017, Section 7.3.2,
LC_COLLATE). Note that if the implementation provides a stable sort
option as an extension (usually -s), the additional comparison should
not be performed when this option has been specified.

FUTURE DIRECTIONS
A future version of this standard may require that if the collating
sequence of the current locale does not have a total ordering of all
characters, any lines of input that collate equally when comparing them
as whole lines are further compared byte-by-byte using the collating
sequence for the POSIX locale.

SEE ALSO
comm, join, uniq

The Base Definitions volume of POSIX.1‐2017, Section 7.3.2, LC_COLLATE,
Chapter 8, Environment Variables, Section 12.2, Utility Syntax
Guidelines

The System Interfaces volume of POSIX.1‐2017, toupper()

COPYRIGHT
Portions of this text are reprinted and reproduced in electronic form
from IEEE Std 1003.1-2017, Standard for Information Technology --
Portable Operating System Interface (POSIX), The Open Group Base
Specifications Issue 7, 2018 Edition, Copyright (C) 2018 by the
Institute of Electrical and Electronics Engineers, Inc and The Open
Group. In the event of any discrepancy between this version and the
original IEEE and The Open Group Standard, the original IEEE and The
Open Group Standard is the referee document. The original Standard can
be obtained online at http://www.opengroup.org/unix/online.html .

Any typographical or formatting errors that appear in this page are
most likely to have been introduced during the conversion of the source
files to man page format. To report such errors, see
https://www.kernel.org/doc/man-pages/reporting_bugs.html .

IEEE/The Open Group                  2017                      SORT(1P)