SORT(1P)                 POSIX Programmer's Manual                SORT(1P)

PROLOG
This manual page is part of the POSIX Programmer's Manual. The Linux
implementation of this interface may differ (consult the corresponding
Linux manual page for details of Linux behavior), or the interface may
not be implemented on Linux.

NAME
sort - sort, merge, or sequence check text files

SYNOPSIS
    sort [-m][-o output][-bdfinru][-t char][-k keydef]... [file...]

    sort -c [-bdfinru][-t char][-k keydef][file]

DESCRIPTION
The sort utility shall perform one of the following functions:

 1. Sort lines of all the named files together and write the result to
    the specified output.

 2. Merge lines of all the named (presorted) files together and write
    the result to the specified output.

 3. Check that a single input file is correctly presorted.

Comparisons shall be based on one or more sort keys extracted from each
line of input (or, if no sort keys are specified, the entire line up
to, but not including, the terminating <newline>), and shall be
performed using the collating sequence of the current locale.
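
The three functions can be sketched as follows. This is an illustration
only: the file names a.txt and b.txt are hypothetical, mktemp -d is a
widely available but non-POSIX convenience, and LC_ALL=C is set so that
the collating sequence is predictable.

```shell
# Sketch: the three modes of sort, using throwaway files.
# a.txt and b.txt are made-up names; mktemp -d is assumed to exist.
tmp=$(mktemp -d)
printf 'pear\napple\n' > "$tmp/a.txt"
printf 'fig\nbanana\n' > "$tmp/b.txt"

# 1. Sort the lines of both files together.
sorted=$(LC_ALL=C sort "$tmp/a.txt" "$tmp/b.txt")

# 2. Merge two files that are each already sorted.
LC_ALL=C sort -o "$tmp/a.sorted" "$tmp/a.txt"
LC_ALL=C sort -o "$tmp/b.sorted" "$tmp/b.txt"
merged=$(LC_ALL=C sort -m "$tmp/a.sorted" "$tmp/b.sorted")

# 3. Check a single file; only the exit status reports the result.
if LC_ALL=C sort -c "$tmp/a.sorted" 2>/dev/null; then
    check=sorted
else
    check=unsorted
fi

printf '%s\n' "$sorted"
rm -rf "$tmp"
```

Sorting everything at once and merging presorted pieces should produce
the same lines in the same order; merging merely skips the sorting work.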

OPTIONS
The sort utility shall conform to the Base Definitions volume of
IEEE Std 1003.1-2001, Section 12.2, Utility Syntax Guidelines, and the
-k keydef option should follow the -b, -d, -f, -i, -n, and -r options.

The following options shall be supported:

-c      Check that the single input file is ordered as specified by the
        arguments and the collating sequence of the current locale. No
        output shall be produced; only the exit code shall be affected.

-m      Merge only; the input file shall be assumed to be already
        sorted.

-o output
        Specify the name of an output file to be used instead of the
        standard output. This file can be the same as one of the input
        files.

-u      Unique: suppress all but one in each set of lines having equal
        keys. If used with the -c option, check that there are no lines
        with duplicate keys, in addition to checking that the input file
        is sorted.
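
A minimal sketch of -u, alone and combined with -c, using made-up input
on standard input. LC_ALL=C is an assumption made here only to pin the
collating sequence.

```shell
# -u keeps one line from each set of lines whose keys compare equal.
unique=$(printf 'b\na\nb\na\n' | LC_ALL=C sort -u)

# -c -u additionally rejects sorted input that contains duplicate keys.
if printf 'a\na\nb\n' | LC_ALL=C sort -c -u 2>/dev/null; then
    dup_check=ok
else
    dup_check=failed
fi

printf '%s\n%s\n' "$unique" "$dup_check"
```

The second input is in order, so plain -c would accept it; it is the
added -u that makes the check fail.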

The following options shall override the default ordering rules. When
ordering options appear independent of any key field specifications,
the requested field ordering rules shall be applied globally to all
sort keys. When attached to a specific key (see -k), the specified
ordering options shall override all global ordering options for that
key.

-d      Specify that only <blank>s and alphanumeric characters,
        according to the current setting of LC_CTYPE, shall be
        significant in comparisons. The behavior is undefined for a
        sort key to which -i or -n also applies.

-f      Consider all lowercase characters that have uppercase
        equivalents, according to the current setting of LC_CTYPE, to
        be the uppercase equivalent for the purposes of comparison.

-i      Ignore all characters that are non-printable, according to the
        current setting of LC_CTYPE.

-n      Restrict the sort key to an initial numeric string, consisting
        of optional <blank>s, optional minus sign, and zero or more
        digits with an optional radix character and thousands
        separators (as defined in the current locale), which shall be
        sorted by arithmetic value. An empty digit string shall be
        treated as zero. Leading zeros and signs on zeros shall not
        affect ordering.

-r      Reverse the sense of comparisons.
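
The effect of the ordering options can be seen on small inputs. The
following is a sketch with invented data, run under LC_ALL=C so that
the default ordering is plain byte order.

```shell
# Default ordering compares characters, so "100" sorts before "9".
plain=$(printf '10\n9\n100\n' | LC_ALL=C sort)

# -n compares by arithmetic value; -r reverses the comparison.
numeric=$(printf '10\n9\n100\n' | LC_ALL=C sort -n)
reversed=$(printf '10\n9\n100\n' | LC_ALL=C sort -n -r)

# -f treats lowercase letters as their uppercase equivalents, so "a"
# and "A" have equal keys and are then ordered by the full-line
# comparison that applies to lines whose keys compare equal.
folded=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort -f)

printf '%s\n' "$plain" "$numeric" "$reversed" "$folded"
```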

The treatment of field separators can be altered using the options:

-b      Ignore leading <blank>s when determining the starting and ending
        positions of a restricted sort key. If the -b option is
        specified before the first -k option, it shall be applied to
        all -k options. Otherwise, the -b option can be attached
        independently to each -k field_start or field_end
        option-argument (see below).

-t char
        Use char as the field separator character; char shall not be
        considered to be part of a field (although it can be included in
        a sort key). Each occurrence of char shall be significant (for
        example, <char><char> delimits an empty field). If -t is not
        specified, <blank>s shall be used as default field separators;
        each maximal non-empty sequence of <blank>s that follows a
        non-<blank> shall be a field separator.
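
As an illustration of "each occurrence of char shall be significant",
the sketch below sorts invented colon-separated records on their second
field. The first record has two adjacent colons and therefore an empty
second field, which sorts before any non-empty key. LC_ALL=C is assumed
for a predictable collating sequence.

```shell
# "a::1" has an empty second field because both colons count.
by_field2=$(printf 'a::1\nb:x:0\nc:m:2\n' | LC_ALL=C sort -t : -k 2,2)
printf '%s\n' "$by_field2"
```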

Sort keys can be specified using the options:

-k keydef
        The keydef argument is a restricted sort key field definition.
        The format of this definition is:

            field_start[type][,field_end[type]]

where field_start and field_end define a key field restricted to a
portion of the line (see the EXTENDED DESCRIPTION section), and type is
a modifier from the list of characters 'b', 'd', 'f', 'i', 'n', 'r'.
The 'b' modifier shall behave like the -b option, but shall apply only
to the field_start or field_end to which it is attached. The other
modifiers shall behave like the corresponding options, but shall apply
only to the key field to which they are attached; they shall have this
effect if specified with field_start, field_end, or both. If any
modifier is attached to a field_start or to a field_end, no option
shall apply to either. Implementations shall support at least nine
occurrences of the -k option, which shall be significant in command
line order. If no -k option is specified, a default sort key of the
entire line shall be used.

When there are multiple key fields, later keys shall be compared only
after all earlier keys compare equal. Except when the -u option is
specified, lines that otherwise compare equal shall be ordered as if
none of the options -d, -f, -i, -n, or -k were present (but with -r
still in effect, if it was specified) and with all bytes in the lines
significant to the comparison. The order in which lines that still
compare equal are written is unspecified.
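
A short sketch of multiple keys with per-key modifiers. The records
(department:name:amount) are invented for illustration. The first key
orders by department; the second key is consulted only among lines with
the same department and, because of the attached n and r modifiers,
compares the third field numerically in reverse.

```shell
multi=$(printf 'ops:ann:300\ndev:bob:1200\ndev:cy:900\nops:dee:1000\n' |
    LC_ALL=C sort -t : -k 1,1 -k 3,3nr)
printf '%s\n' "$multi"
```

Because the modifiers are attached to the second key only, the first
key keeps the default (here, byte-order) comparison.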

OPERANDS
The following operand shall be supported:

file    A pathname of a file to be sorted, merged, or checked. If no
        file operands are specified, or if a file operand is '-', the
        standard input shall be used.

STDIN
The standard input shall be used only if no file operands are
specified, or if a file operand is '-'. See the INPUT FILES section.

INPUT FILES
The input files shall be text files, except that the sort utility shall
add a <newline> to the end of a file ending with an incomplete last
line.

ENVIRONMENT VARIABLES
The following environment variables shall affect the execution of sort:

LANG    Provide a default value for the internationalization variables
        that are unset or null. (See the Base Definitions volume of
        IEEE Std 1003.1-2001, Section 8.2, Internationalization
        Variables for the precedence of internationalization variables
        used to determine the values of locale categories.)

LC_ALL  If set to a non-empty string value, override the values of all
        the other internationalization variables.

LC_COLLATE
        Determine the locale for ordering rules.

LC_CTYPE
        Determine the locale for the interpretation of sequences of
        bytes of text data as characters (for example, single-byte as
        opposed to multi-byte characters in arguments and input files)
        and the behavior of character classification for the -b, -d, -f,
        -i, and -n options.

LC_MESSAGES
        Determine the locale that should be used to affect the format
        and contents of diagnostic messages written to standard error.

LC_NUMERIC
        Determine the locale for the definition of the radix character
        and thousands separator for the -n option.

NLSPATH
        Determine the location of message catalogs for the processing of
        LC_MESSAGES.
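
The sketch below shows the locale variables at work on invented input.
In the C (POSIX) locale, uppercase letters collate before lowercase
ones. The locale name en_US.UTF-8 is an assumption and need not be
installed: LC_ALL, when set, overrides LC_COLLATE, so the second command
is expected to give the same C ordering as the first.

```shell
# C locale: ordering follows byte values, so "A" and "B" precede "a".
c_order=$(printf 'b\nB\na\nA\n' | LC_ALL=C sort)

# LC_ALL takes precedence over LC_COLLATE (en_US.UTF-8 is hypothetical).
override=$(printf 'b\nB\na\nA\n' | LC_COLLATE=en_US.UTF-8 LC_ALL=C sort)

printf '%s\n' "$c_order"
```

With LC_ALL unset, the result would depend on LC_COLLATE (or LANG) and
on the locales available on the system, so scripts that need a stable
order commonly set LC_ALL explicitly.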

ASYNCHRONOUS EVENTS
Default.

STDOUT
Unless the -o or -c options are in effect, the standard output shall
contain the sorted input.

STDERR
The standard error shall be used for diagnostic messages. A warning
message about correcting an incomplete last line of an input file may
be generated, but need not affect the final exit status.

OUTPUT FILES
If the -o option is in effect, the sorted input shall be written to the
file output.

EXTENDED DESCRIPTION
The notation:

    -k field_start[type][,field_end[type]]

shall define a key field that begins at field_start and ends at
field_end inclusive, unless field_start falls beyond the end of the
line or after field_end, in which case the key field is empty. A
missing field_end shall mean the last character of the line.

A field comprises a maximal sequence of non-separating characters and,
in the absence of option -t, any preceding field separator.

The field_start portion of the keydef option-argument shall have the
form:

    field_number[.first_character]

Fields and characters within fields shall be numbered starting with 1.
The field_number and first_character pieces, interpreted as positive
decimal integers, shall specify the first character to be used as part
of a sort key. If .first_character is omitted, it shall refer to the
first character of the field.

The field_end portion of the keydef option-argument shall have the
form:

    field_number[.last_character]

The field_number shall be as described above for field_start. The
last_character piece, interpreted as a non-negative decimal integer,
shall specify the last character to be used as part of the sort key. If
last_character evaluates to zero or .last_character is omitted, it
shall refer to the last character of the field specified by
field_number.

If the -b option or b type modifier is in effect, characters within a
field shall be counted from the first non-<blank> in the field. (This
shall apply separately to first_character and last_character.)
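
The character positions described above can be tried out on small
inputs. The data below is invented and LC_ALL=C is assumed. The first
command uses characters 2 through 3 of field 1 as the key. The other
two use the first character of field 2, with and without the b
modifier; without -t, field 2 includes the <blank>s that precede it, so
the unmodified key is a <space> on both lines and the order falls back
to a comparison of the whole lines.

```shell
# Key is characters 2..3 of the first field: "20", "10", "15".
by_chars=$(printf 'x20\ny10\nz15\n' | LC_ALL=C sort -k 1.2,1.3)

# With b, counting starts at the first non-<blank>: keys are "z" and "y".
with_b=$(printf 'a   z\nb y\n' | LC_ALL=C sort -k 2.1b,2.1b)

# Without b, the key is the leading <space> of field 2 on both lines.
without_b=$(printf 'a   z\nb y\n' | LC_ALL=C sort -k 2.1,2.1)

printf '%s\n' "$by_chars" "$with_b" "$without_b"
```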

EXIT STATUS
The following exit values shall be returned:

0       All input files were output successfully, or -c was specified
        and the input file was correctly sorted.

1       Under the -c option, the file was not ordered as specified, or
        if the -c and -u options were both specified, two input lines
        were found with equal keys.

>1      An error occurred.
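
A script can branch on these values. The following sketch captures the
exit status for sorted input, unsorted input, and an unreadable file.
The pathname /nonexistent/sort-demo-input is hypothetical and is assumed
not to exist; LC_ALL=C is assumed for a predictable collating sequence.

```shell
# Status 0: the input is already in order.
status_sorted=0
printf 'a\nb\n' | LC_ALL=C sort -c 2>/dev/null || status_sorted=$?

# Status 1: -c found the input out of order.
status_unsorted=0
printf 'b\na\n' | LC_ALL=C sort -c 2>/dev/null || status_unsorted=$?

# Status greater than 1: an error, here a file that cannot be opened.
status_error=0
LC_ALL=C sort /nonexistent/sort-demo-input >/dev/null 2>&1 || status_error=$?

printf '%s %s %s\n' "$status_sorted" "$status_unsorted" "$status_error"
```

Only "greater than 1" is guaranteed for errors; the exact value is left
to the implementation.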

CONSEQUENCES OF ERRORS
Default.

The following sections are informative.

APPLICATION USAGE
The default value for -t, <blank>, has different properties from, for
example, -t "<space>". If a line contains:

    <space><space>foo

the following treatment would occur with default separation as opposed
to specifically selecting a <space>:

    Field   Default               -t "<space>"
    1       <space><space>foo     empty
    2       empty                 empty
    3       empty                 foo

The leading field separator itself is included in a field when -t is
not used. For example, this command returns an exit status of zero,
meaning the input was already sorted:

    sort -c -k 2 <<eof
    y<tab>b
    x<space>a
    eof

(assuming that a <tab> precedes the <space> in the current collating
sequence). The field separator is not included in a field when it is
explicitly set via -t. This is historical practice and allows usage
such as:

    sort -t "|" -k 2n <<eof
    Atlanta|425022|Georgia
    Birmingham|284413|Alabama
    Columbia|100385|South Carolina
    eof

where the second field can be correctly sorted numerically without
regard to the non-numeric field separator.
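
The two commands above can be reproduced with literal characters, as in
the sketch below. It assumes LC_ALL=C, where <tab> does collate before
<space>. The added -b variant is not part of the original example; it
is included only to show that, once the leading separator is skipped,
the keys become "b" and "a" and the check fails.

```shell
# Keys are "<tab>b" and "<space>a"; <tab> sorts first in the C locale.
if printf 'y\tb\nx a\n' | LC_ALL=C sort -c -k 2 2>/dev/null; then
    default_check=sorted
else
    default_check=unsorted
fi

# With -b the separator is skipped, so the keys are "b" and "a".
if printf 'y\tb\nx a\n' | LC_ALL=C sort -c -b -k 2 2>/dev/null; then
    blank_check=sorted
else
    blank_check=unsorted
fi

# With -t the separator is not part of the field, so -k 2n sees digits.
cities=$(printf '%s\n' 'Atlanta|425022|Georgia' \
    'Birmingham|284413|Alabama' \
    'Columbia|100385|South Carolina' | LC_ALL=C sort -t '|' -k 2n)

printf '%s\n' "$default_check" "$blank_check" "$cities"
```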

The wording in the OPTIONS section clarifies that the -b, -d, -f, -i,
-n, and -r options have to come before the first sort key specified if
they are intended to apply to all specified keys. The way it is
described in this volume of IEEE Std 1003.1-2001 matches historical
practice, not historical documentation. The results are unspecified if
these options are specified after a -k option.

The -f option might not work as expected in locales where there is not
a one-to-one mapping between an uppercase and a lowercase letter.

EXAMPLES
 1. The following command sorts the contents of infile with the second
    field as the sort key:

        sort -k 2,2 infile

 2. The following command sorts, in reverse order, the contents of
    infile1 and infile2, placing the output in outfile and using the
    second character of the second field as the sort key (assuming that
    the first character of the second field is the field separator):

        sort -r -o outfile -k 2.2,2.2 infile1 infile2

 3. The following command sorts the contents of infile1 and infile2
    using the second non-<blank> of the second field as the sort key:

        sort -k 2.2b,2.2b infile1 infile2

 4. The following command prints the System V password file (user
    database) sorted by the numeric user ID (the third colon-separated
    field):

        sort -t : -k 3,3n /etc/passwd

 5. The following command prints the lines of the already sorted file
    infile, suppressing all but one occurrence of lines having the same
    third field:

        sort -um -k 3.1,3.0 infile

RATIONALE
Examples in some historical documentation state that options -um with
one input file keep the first in each set of lines with equal keys.
This behavior was deemed to be an implementation artifact and was not
standardized.

The -z option was omitted; it is not standard practice on most systems
and is inconsistent with using sort to sort several files individually
and then merge them together. The text concerning -z in historical
documentation appeared to require implementations to determine the
proper buffer length during the sort phase of operation, but not during
the merge.

The -y option was omitted because of non-portability. The -M option,
present in System V, was omitted because of non-portability in
international usage.

An undocumented -T option exists in some implementations. It is used to
specify a directory for intermediate files. Implementations are
encouraged to support the use of the TMPDIR environment variable
instead of adding an option to support this functionality.

The -k option was added to satisfy two objections. First, the
zero-based counting used by sort is not consistent with other utility
conventions. Second, it did not meet syntax guideline requirements.

Historical documentation indicates that "setting -n implies -b". The
description of -n already states that optional leading <blank>s are
tolerated in doing the comparison. If -b is enabled, rather than
implied, by -n, this has unusual side effects. When a character offset
is used in a column of numbers (for example, to sort modulo 100), that
offset is measured relative to the most significant digit, not to the
column. Based upon a recommendation from the author of the original
sort utility, the -b implication has been omitted from this volume of
IEEE Std 1003.1-2001, and an application wishing to achieve the
previously mentioned side effects has to code the -b flag explicitly.

FUTURE DIRECTIONS
None.

SEE ALSO
comm, join, uniq, the System Interfaces volume of IEEE Std 1003.1-2001,
toupper()

COPYRIGHT
Portions of this text are reprinted and reproduced in electronic form
from IEEE Std 1003.1, 2003 Edition, Standard for Information Technology
-- Portable Operating System Interface (POSIX), The Open Group Base
Specifications Issue 6, Copyright (C) 2001-2003 by the Institute of
Electrical and Electronics Engineers, Inc and The Open Group. In the
event of any discrepancy between this version and the original IEEE and
The Open Group Standard, the original IEEE and The Open Group Standard
is the referee document. The original Standard can be obtained online
at http://www.opengroup.org/unix/online.html .

IEEE/The Open Group                  2003                         SORT(1P)