SORT(P)                    POSIX Programmer's Manual                   SORT(P)

NAME
       sort - sort, merge, or sequence check text files

SYNOPSIS
       sort [-m][-o output][-bdfinru][-t char][-k keydef]... [file...]

       sort -c [-bdfinru][-t char][-k keydef][file]

DESCRIPTION
       The sort utility shall perform one of the following functions:

        1. Sort lines of all the named files together and write the result
           to the specified output.

        2. Merge lines of all the named (presorted) files together and write
           the result to the specified output.

        3. Check that a single input file is correctly presorted.

       Comparisons shall be based on one or more sort keys extracted from
       each line of input (or, if no sort keys are specified, the entire
       line up to, but not including, the terminating <newline>), and shall
       be performed using the collating sequence of the current locale.

OPTIONS
       The sort utility shall conform to the Base Definitions volume of
       IEEE Std 1003.1-2001, Section 12.2, Utility Syntax Guidelines, and the
       -k keydef option should follow the -b, -d, -f, -i, -n, and -r options.

       The following options shall be supported:

       -c     Check that the single input file is ordered as specified by
              the arguments and the collating sequence of the current
              locale. No output shall be produced; only the exit code shall
              be affected.

       -m     Merge only; the input file shall be assumed to be already
              sorted.

       -o output
              Specify the name of an output file to be used instead of the
              standard output. This file can be the same as one of the
              input files.

       -u     Unique: suppress all but one in each set of lines having
              equal keys. If used with the -c option, check that there are
              no lines with duplicate keys, in addition to checking that the
              input file is sorted.
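
       The three modes, together with -o and -u, can be tried on two small
       files. This is a minimal sketch. It assumes a POSIX shell and the C
       locale (LC_ALL=C), so that collation is plain byte order. The scratch
       directory and the file names fruit and more are hypothetical.

```sh
#!/bin/sh
# Force byte-order collation so results do not depend on the caller's locale.
LC_ALL=C
export LC_ALL

# Hypothetical scratch directory, removed when the script exits.
tmp="${TMPDIR:-/tmp}/sort-demo.$$"
mkdir "$tmp" || exit 2
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' pear apple pear > "$tmp/fruit"
printf '%s\n' banana cherry   > "$tmp/more"    # already in sorted order

sort "$tmp/fruit"                    # apple, pear, pear (one per line)
sort -u "$tmp/fruit"                 # apple, pear: one line per equal key
sort -o "$tmp/fruit" "$tmp/fruit"    # -o may name one of the input files
sort -m "$tmp/fruit" "$tmp/more"     # apple, banana, cherry, pear, pear
sort -c "$tmp/fruit" && echo "fruit is now sorted"
```

       The merge works only because the preceding sort -o left fruit in
       sorted order; -m does not check that assumption.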

       The following options shall override the default ordering rules.
       When ordering options appear independent of any key field
       specifications, the requested field ordering rules shall be applied
       globally to all sort keys. When attached to a specific key (see -k),
       the specified ordering options shall override all global ordering
       options for that key.

       -d     Specify that only <blank>s and alphanumeric characters,
              according to the current setting of LC_CTYPE, shall be
              significant in comparisons. The behavior is undefined for a
              sort key to which -i or -n also applies.

       -f     Consider all lowercase characters that have uppercase
              equivalents, according to the current setting of LC_CTYPE, to
              be the uppercase equivalent for the purposes of comparison.

       -i     Ignore all characters that are non-printable, according to
              the current setting of LC_CTYPE.

       -n     Restrict the sort key to an initial numeric string, consisting
              of optional <blank>s, an optional minus sign, and zero or more
              digits with an optional radix character and thousands
              separators (as defined in the current locale), which shall be
              sorted by arithmetic value. An empty digit string shall be
              treated as zero. Leading zeros and signs on zeros shall not
              affect ordering.

       -r     Reverse the sense of comparisons.
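
       The effect of -n, -r, -f, and -d can be compared directly on small
       inputs. This is a minimal sketch. It assumes a POSIX shell and the C
       locale, in which collation is byte order, the radix character is '.',
       and there is no thousands separator.

```sh
#!/bin/sh
LC_ALL=C
export LC_ALL

# Whole-line byte order: "-" sorts before digits, and "10" < "100" < "9".
printf '%s\n' 10 9 -3 100 | sort        # -3, 10, 100, 9
# -n compares the initial numeric strings by arithmetic value.
printf '%s\n' 10 9 -3 100 | sort -n     # -3, 9, 10, 100
# -r reverses the sense of the (numeric) comparison.
printf '%s\n' 10 9 -3 100 | sort -nr    # 100, 10, 9, -3
# -f folds case; "A" and "a" then tie and fall back to byte order.
printf '%s\n' b A a B | sort -f         # A, a, B, b
# -d ignores "-", so "a-c" is compared as "ac" and follows "ab".
printf '%s\n' a-c ab | sort -d          # ab, a-c
```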

       The treatment of field separators can be altered using the options:

       -b     Ignore leading <blank>s when determining the starting and
              ending positions of a restricted sort key. If the -b option is
              specified before the first -k option, it shall be applied to
              all -k options. Otherwise, the -b option can be attached
              independently to each -k field_start or field_end
              option-argument (see below).

       -t char
              Use char as the field separator character; char shall not be
              considered to be part of a field (although it can be included
              in a sort key). Each occurrence of char shall be significant
              (for example, <char><char> delimits an empty field). If -t is
              not specified, <blank>s shall be used as default field
              separators; each maximal non-empty sequence of <blank>s that
              follows a non-<blank> shall be a field separator.
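
       The difference between an explicit separator and the default
       <blank> separation, and the role of -b, shows up on short inputs.
       This is a minimal sketch. It assumes a POSIX shell and the C locale.

```sh
#!/bin/sh
LC_ALL=C
export LC_ALL

# With -t, each ":" counts, so "y::z" has an empty second field,
# and the empty key sorts first.
printf '%s\n' x:3:z y::z w:1:z | sort -t : -k 2,2     # y::z, w:1:z, x:3:z

# Without -t, field 2 keeps its leading <blank>s: "  c" < " b" bytewise.
printf '%s\n' 'a  c' 'a b' | sort -k 2,2              # "a  c", "a b"
# -b skips those <blank>s, so the keys become "c" and "b".
printf '%s\n' 'a  c' 'a b' | sort -b -k 2,2           # "a b", "a  c"
```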

       Sort keys can be specified using the options:

       -k keydef
              The keydef argument is a restricted sort key field definition.
              The format of this definition is:

                  field_start[type][,field_end[type]]

              where field_start and field_end define a key field restricted
              to a portion of the line (see the EXTENDED DESCRIPTION
              section), and type is a modifier from the list of characters
              'b', 'd', 'f', 'i', 'n', 'r'. The 'b' modifier shall behave
              like the -b option, but shall apply only to the field_start or
              field_end to which it is attached. The other modifiers shall
              behave like the corresponding options, but shall apply only to
              the key field to which they are attached; they shall have this
              effect if specified with field_start, field_end, or both. If
              any modifier is attached to a field_start or to a field_end, no
              option shall apply to either. Implementations shall support at
              least nine occurrences of the -k option, which shall be
              significant in command line order. If no -k option is
              specified, a default sort key of the entire line shall be used.

              When there are multiple key fields, later keys shall be
              compared only after all earlier keys compare equal. Except when
              the -u option is specified, lines that otherwise compare equal
              shall be ordered as if none of the options -d, -f, -i, -n, or
              -k were present (but with -r still in effect, if it was
              specified) and with all bytes in the lines significant to the
              comparison. The order in which lines that still compare equal
              are written is unspecified.
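
       Multiple keys and the final whole-line comparison can be seen on lines
       with two fields. This is a minimal sketch. It assumes a POSIX shell and
       the C locale. The names and numbers are made-up sample data.

```sh
#!/bin/sh
LC_ALL=C
export LC_ALL

# Primary key: field 2 as a number; secondary key: field 1 in reverse.
# The two 20s tie on the first key, so the second key orders them.
printf '%s\n' 'ann 20' 'bob 10' 'cat 20' | sort -k 2,2n -k 1,1r
# -> bob 10, cat 20, ann 20

# Lines whose keys compare equal fall back to a byte comparison of the
# whole lines.
printf '%s\n' 'b 1' 'a 1' | sort -k 2,2
# -> a 1, b 1
```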

OPERANDS
       The following operand shall be supported:

       file   A pathname of a file to be sorted, merged, or checked. If no
              file operands are specified, or if a file operand is '-', the
              standard input shall be used.

STDIN
       The standard input shall be used only if no file operands are
       specified, or if a file operand is '-'. See the INPUT FILES section.

INPUT FILES
       The input files shall be text files, except that the sort utility
       shall add a <newline> to the end of a file ending with an incomplete
       last line.
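
       The added <newline> can be observed by counting lines of output. This
       is a minimal sketch. It assumes a POSIX shell. wc -l counts <newline>
       characters, and some implementations pad its result with leading
       spaces.

```sh
#!/bin/sh
LC_ALL=C
export LC_ALL

# The input "b<newline>a" has an incomplete last line.
printf 'b\na' | wc -l           # 1: only one <newline> in the input
printf 'b\na' | sort            # a, then b, each a complete line
printf 'b\na' | sort | wc -l    # 2: sort supplied the missing <newline>
```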

ENVIRONMENT VARIABLES
       The following environment variables shall affect the execution of
       sort:

       LANG   Provide a default value for the internationalization variables
              that are unset or null. (See the Base Definitions volume of
              IEEE Std 1003.1-2001, Section 8.2, Internationalization
              Variables for the precedence of internationalization variables
              used to determine the values of locale categories.)

       LC_ALL If set to a non-empty string value, override the values of all
              the other internationalization variables.

       LC_COLLATE
              Determine the locale for ordering rules.

       LC_CTYPE
              Determine the locale for the interpretation of sequences of
              bytes of text data as characters (for example, single-byte as
              opposed to multi-byte characters in arguments and input files)
              and the behavior of character classification for the -b, -d,
              -f, -i, and -n options.

       LC_MESSAGES
              Determine the locale that should be used to affect the format
              and contents of diagnostic messages written to standard error.

       LC_NUMERIC
              Determine the locale for the definition of the radix character
              and thousands separator for the -n option.

       NLSPATH
              Determine the location of message catalogs for the processing
              of LC_MESSAGES.
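
       The precedence of LC_ALL, and the role of the locale's radix
       character for -n, can be checked without relying on any particular
       locale being installed. This is a minimal sketch. It assumes a POSIX
       shell. The locale name en_US.UTF-8 is only an example; it is ignored
       here because LC_ALL is set.

```sh
#!/bin/sh
# LC_ALL=C overrides LC_COLLATE, so this is plain byte order
# (uppercase before lowercase) whatever LC_COLLATE names.
printf '%s\n' b B a A | LC_ALL=C LC_COLLATE=en_US.UTF-8 sort    # A, B, a, b

# In the C locale the radix character is ".", so -n reads 2.25 < 2.5.
printf '%s\n' 2.5 2.25 | LC_ALL=C sort -n                        # 2.25, 2.5
```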

ASYNCHRONOUS EVENTS
       Default.

STDOUT
       Unless the -o or -c options are in effect, the standard output shall
       contain the sorted input.

STDERR
       The standard error shall be used for diagnostic messages. A warning
       message about correcting an incomplete last line of an input file may
       be generated, but need not affect the final exit status.

OUTPUT FILES
       If the -o option is in effect, the sorted input shall be written to
       the file output.

EXTENDED DESCRIPTION
       The notation:

           -k field_start[type][,field_end[type]]

       shall define a key field that begins at field_start and ends at
       field_end inclusive, unless field_start falls beyond the end of the
       line or after field_end, in which case the key field is empty. A
       missing field_end shall mean the last character of the line.

       A field comprises a maximal sequence of non-separating characters
       and, in the absence of option -t, any preceding field separator.

       The field_start portion of the keydef option-argument shall have the
       form:

           field_number[.first_character]

       Fields and characters within fields shall be numbered starting with 1.
       The field_number and first_character pieces, interpreted as positive
       decimal integers, shall specify the first character to be used as
       part of a sort key. If .first_character is omitted, it shall refer to
       the first character of the field.

       The field_end portion of the keydef option-argument shall have the
       form:

           field_number[.last_character]

       The field_number shall be as described above for field_start. The
       last_character piece, interpreted as a non-negative decimal integer,
       shall specify the last character to be used as part of the sort key.
       If last_character evaluates to zero or .last_character is omitted, it
       shall refer to the last character of the field specified by
       field_number.

       If the -b option or b type modifier is in effect, characters within a
       field shall be counted from the first non-<blank> in the field. (This
       shall apply separately to first_character and last_character.)
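
       Character positions within fields, and the way the b modifier changes
       how they are counted, can be seen on short inputs. This is a minimal
       sketch. It assumes a POSIX shell and the C locale.

```sh
#!/bin/sh
LC_ALL=C
export LC_ALL

# -k 1.2,1.3 uses characters 2 through 3 of field 1: "cz", "az", "bz".
printf '%s\n' xcz yaz zbz | sort -k 1.2,1.3          # yaz, zbz, xcz

# Without b, field 2 of "a   yb" is "   yb", whose character 2 is a <space>;
# field 2 of "a xa" is " xa", whose character 2 is "x".
printf '%s\n' 'a   yb' 'a xa' | sort -k 2.2,2.2      # "a   yb", "a xa"
# With b, counting starts at the first non-<blank>, so the keys
# become "b" (from "yb") and "a" (from "xa").
printf '%s\n' 'a   yb' 'a xa' | sort -k 2.2b,2.2b    # "a xa", "a   yb"
```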
247
249 The following exit values shall be returned:
250
251 0 All input files were output successfully, or -c was specified
252 and the input file was correctly sorted.
253
254 1 Under the -c option, the file was not ordered as specified, or
255 if the -c and -u options were both specified, two input lines
256 were found with equal keys.
257
258 >1 An error occurred.
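
       The exit values can be observed directly. This is a minimal sketch. It
       assumes a POSIX shell and the C locale. The path /nonexistent/file is
       hypothetical and is assumed not to exist. Standard error is discarded
       because implementations may write diagnostics there.

```sh
#!/bin/sh
LC_ALL=C
export LC_ALL

printf '%s\n' a b c | sort -c
echo "sorted input: $?"                # 0
printf '%s\n' b a | sort -c 2>/dev/null
echo "unsorted input: $?"              # 1
printf '%s\n' a a | sort -c -u 2>/dev/null
echo "duplicate keys with -u: $?"      # 1
sort /nonexistent/file 2>/dev/null
echo "unreadable file: $?"             # greater than 1
```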

CONSEQUENCES OF ERRORS
       Default.

       The following sections are informative.

APPLICATION USAGE
       The default value for -t, <blank>, has different properties from, for
       example, -t "<space>". If a line contains:

           <space><space>foo

       the following treatment would occur with default separation as
       opposed to specifically selecting a <space>:

           Field   Default              -t "<space>"
           1       <space><space>foo    empty
           2       empty                empty
           3       empty                foo

       The leading field separator itself is included in a field when -t is
       not used. For example, this command returns an exit status of zero,
       meaning the input was already sorted:

           sort -c -k 2 <<eof
           y<tab>b
           x<space>a
           eof

       (assuming that a <tab> precedes the <space> in the current collating
       sequence). The field separator is not included in a field when it is
       explicitly set via -t. This is historical practice and allows usage
       such as:

           sort -t "|" -k 2n <<eof
           Atlanta|425022|Georgia
           Birmingham|284413|Alabama
           Columbia|100385|South Carolina
           eof

       where the second field can be correctly sorted numerically without
       regard to the non-numeric field separator.
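
       Both here-document examples can also be run with printf supplying the
       <tab> and <space> characters literally. This is a minimal sketch. It
       assumes a POSIX shell and the C locale, in which <tab> (octal 011)
       precedes <space> (octal 040).

```sh
#!/bin/sh
LC_ALL=C
export LC_ALL

# Default separation: the keys are "<tab>b" and "<space>a", which are
# already in order, so -c exits with status zero.
printf 'y\tb\nx a\n' | sort -c -k 2
echo "exit status: $?"                 # 0

# With -t, the key starts at the digits, so -n sees the numbers directly.
printf '%s\n' 'Atlanta|425022|Georgia' 'Birmingham|284413|Alabama' \
    'Columbia|100385|South Carolina' | sort -t '|' -k 2n
# -> Columbia|100385|..., Birmingham|284413|..., Atlanta|425022|...
```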

       The wording in the OPTIONS section clarifies that the -b, -d, -f, -i,
       -n, and -r options have to come before the first sort key specified if
       they are intended to apply to all specified keys. The way it is
       described in this volume of IEEE Std 1003.1-2001 matches historical
       practice, not historical documentation. The results are unspecified if
       these options are specified after a -k option.

       The -f option might not work as expected in locales where there is not
       a one-to-one mapping between an uppercase and a lowercase letter.

EXAMPLES
        1. The following command sorts the contents of infile with the second
           field as the sort key:

               sort -k 2,2 infile

        2. The following command sorts, in reverse order, the contents of
           infile1 and infile2, placing the output in outfile and using the
           second character of the second field as the sort key (assuming that
           the first character of the second field is the field separator):

               sort -r -o outfile -k 2.2,2.2 infile1 infile2

        3. The following command sorts the contents of infile1 and infile2
           using the second non-<blank> of the second field as the sort key:

               sort -k 2.2b,2.2b infile1 infile2

        4. The following command prints the System V password file (user
           database) sorted by the numeric user ID (the third colon-separated
           field):

               sort -t : -k 3,3n /etc/passwd

        5. The following command prints the lines of the already sorted file
           infile, suppressing all but one occurrence of lines having the same
           third field:

               sort -um -k 3.1,3.0 infile
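
       Examples 4 and 5 can be tried without touching /etc/passwd by feeding
       sample lines on standard input. This is a minimal sketch. It assumes a
       POSIX shell and the C locale. The account entries and the three-field
       lines are made up. Which of several lines with equal keys -u keeps is
       unspecified, so only the number of lines written by the second
       command is meaningful.

```sh
#!/bin/sh
LC_ALL=C
export LC_ALL

# Example 4 on made-up entries in the colon-separated passwd format:
# the third field (user ID) is compared numerically.
printf '%s\n' 'root:x:0:0::/root:/bin/sh' \
    'bob:x:1001:1001::/home/bob:/bin/sh' \
    'daemon:x:2:2::/:/bin/false' | sort -t : -k 3,3n
# -> root (0), daemon (2), bob (1001)

# Example 5 on already-sorted input: the two lines whose third field is
# " 1" collapse to one, so two lines are written in total.
printf '%s\n' 'a x 1' 'b y 1' 'c z 2' | sort -um -k 3.1,3.0
```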

RATIONALE
       Examples in some historical documentation state that options -um with
       one input file keep the first in each set of lines with equal keys.
       This behavior was deemed to be an implementation artifact and was not
       standardized.

       The -z option was omitted; it is not standard practice on most systems
       and is inconsistent with using sort to sort several files individually
       and then merge them together. The text concerning -z in historical
       documentation appeared to require implementations to determine the
       proper buffer length during the sort phase of operation, but not
       during the merge.

       The -y option was omitted because of non-portability. The -M option,
       present in System V, was omitted because of non-portability in
       international usage.

       An undocumented -T option exists in some implementations. It is used
       to specify a directory for intermediate files. Implementations are
       encouraged to support the use of the TMPDIR environment variable
       instead of adding an option to support this functionality.

       The -k option was added to satisfy two objections. First, the
       zero-based counting used by the historical +pos1 -pos2 sort key
       notation is not consistent with other utility conventions. Second,
       that notation did not meet syntax guideline requirements.

       Historical documentation indicates that "setting -n implies -b". The
       description of -n already states that optional leading <blank>s are
       tolerated in doing the comparison. If -n actually enabled -b, rather
       than merely tolerating leading <blank>s, this would have unusual side
       effects: when a character offset is used in a column of numbers (for
       example, to sort modulo 100), that offset would be measured relative
       to the most significant digit, not to the column. Based upon a
       recommendation from the author of the original sort utility, the -b
       implication has been omitted from this volume of IEEE Std 1003.1-2001,
       and an application wishing to achieve the previously mentioned side
       effects has to code the -b flag explicitly.
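
       The side effect described in the preceding paragraph can be
       reproduced on a right-aligned column of numbers. This is a minimal
       sketch. It assumes a POSIX shell and the C locale. Characters 3 and 4
       of a four-character column are its last two digits (the value modulo
       100) only when counting starts at the column rather than at the first
       digit.

```sh
#!/bin/sh
LC_ALL=C
export LC_ALL

# Without b, positions count from the start of the column, so the keys
# are "12", "05", and "01": an ordering modulo 100.
printf '%s\n' '  12' ' 305' '1201' | sort -k 1.3,1.4n
# -> "1201", " 305", "  12"

# With b, positions count from the most significant digit instead, so the
# keys become "" (treated as zero), "5", and "01".
printf '%s\n' '  12' ' 305' '1201' | sort -k 1.3b,1.4bn
# -> "  12", "1201", " 305"
```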

FUTURE DIRECTIONS
       None.

SEE ALSO
       comm, join, uniq, the System Interfaces volume of
       IEEE Std 1003.1-2001, toupper()

COPYRIGHT
       Portions of this text are reprinted and reproduced in electronic form
       from IEEE Std 1003.1, 2003 Edition, Standard for Information
       Technology -- Portable Operating System Interface (POSIX), The Open
       Group Base Specifications Issue 6, Copyright (C) 2001-2003 by the
       Institute of Electrical and Electronics Engineers, Inc and The Open
       Group. In the event of any discrepancy between this version and the
       original IEEE and The Open Group Standard, the original IEEE and The
       Open Group Standard is the referee document. The original Standard
       can be obtained online at http://www.opengroup.org/unix/online.html.

IEEE/The Open Group                  2003                              SORT(P)