SORT(P)                    POSIX Programmer's Manual                   SORT(P)

NAME

       sort - sort, merge, or sequence check text files

SYNOPSIS

       sort [-m][-o output][-bdfinru][-t char][-k keydef]... [file...]

       sort -c [-bdfinru][-t char][-k keydef][file]

DESCRIPTION

       The sort utility shall perform one of the following functions:

        1. Sort lines of all the named files together and write the result to
           the specified output.

        2. Merge lines of all the named (presorted) files together and write
           the result to the specified output.

        3. Check that a single input file is correctly presorted.

       Comparisons shall be based on one or more sort keys extracted from each
       line of input (or, if no sort keys are specified, the entire line up
       to, but not including, the terminating <newline>), and shall be
       performed using the collating sequence of the current locale.
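
       As a rough illustration of the three functions listed above, the
       following sketch sorts, merges, and checks two small files. The file
       names a.txt and b.txt, the scratch directory, and the use of mktemp -d
       (widely available, but not required by this volume) are assumptions
       made for the example. LC_ALL=C is set only to make the ordering
       repeatable.

```shell
# Illustrative sketch; a.txt and b.txt are hypothetical file names.
LC_ALL=C; export LC_ALL            # fix the collating sequence
dir=$(mktemp -d)                   # assumption: mktemp -d is available
printf 'pear\napple\n'  > "$dir/a.txt"
printf 'fig\nbanana\n'  > "$dir/b.txt"

# 1. Sort: lines of all named files are sorted together.
sorted=$(sort "$dir/a.txt" "$dir/b.txt")

# 2. Merge: each input must already be sorted, so sort each file in place first.
sort -o "$dir/a.txt" "$dir/a.txt"
sort -o "$dir/b.txt" "$dir/b.txt"
merged=$(sort -m "$dir/a.txt" "$dir/b.txt")

# 3. Check: nothing is written to standard output; only the exit status matters.
sort -c "$dir/a.txt"; check_status=$?

printf '%s\n' "$sorted"
rm -rf "$dir"
```

       Sorting each file separately and then merging gives the same result
       here as sorting both files together.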

OPTIONS

       The sort utility shall conform to the Base Definitions volume of
       IEEE Std 1003.1-2001, Section 12.2, Utility Syntax Guidelines, and the
       -k keydef option should follow the -b, -d, -f, -i, -n, and -r options.

       The following options shall be supported:

       -c     Check that the single input file is ordered as specified by the
              arguments and the collating sequence of the current locale. No
              output shall be produced; only the exit code shall be affected.

       -m     Merge only; the input file shall be assumed to be already
              sorted.

       -o output
              Specify the name of an output file to be used instead of the
              standard output. This file can be the same as one of the input
              files.

       -u     Unique: suppress all but one in each set of lines having equal
              keys. If used with the -c option, check that there are no lines
              with duplicate keys, in addition to checking that the input file
              is sorted.

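       The interaction of -u with -c described above can be sketched as
       follows. The input data and variable names are invented for the
       example, and diagnostics are discarded because their wording is
       implementation-specific.

```shell
LC_ALL=C; export LC_ALL

# -u keeps one line from each set of lines whose keys compare equal.
unique=$(printf 'b\na\nb\na\n' | sort -u)
printf '%s\n' "$unique"

# With -c, -u also rejects duplicate keys: this input is in order,
# but "a" appears twice, so the check fails.
printf 'a\na\nb\n' | sort -c -u 2>/dev/null; dup_status=$?

# The same check succeeds when every key is distinct.
printf 'a\nb\n' | sort -c -u; ok_status=$?
```
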
       The following options shall override the default ordering rules. When
       ordering options appear independent of any key field specifications,
       the requested field ordering rules shall be applied globally to all
       sort keys. When attached to a specific key (see -k), the specified
       ordering options shall override all global ordering options for that
       key.

       -d     Specify that only <blank>s and alphanumeric characters,
              according to the current setting of LC_CTYPE, shall be
              significant in comparisons. The behavior is undefined for a sort
              key to which -i or -n also applies.

       -f     Consider all lowercase characters that have uppercase
              equivalents, according to the current setting of LC_CTYPE, to be
              the uppercase equivalent for the purposes of comparison.

       -i     Ignore all characters that are non-printable, according to the
              current setting of LC_CTYPE.

       -n     Restrict the sort key to an initial numeric string, consisting
              of optional <blank>s, optional minus sign, and zero or more
              digits with an optional radix character and thousands separators
              (as defined in the current locale), which shall be sorted by
              arithmetic value. An empty digit string shall be treated as
              zero. Leading zeros and signs on zeros shall not affect
              ordering.

       -r     Reverse the sense of comparisons.

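       A small sketch of how the ordering options above change the result
       for the same input. The sample data is invented, and the C locale is
       selected so that the default comparison is a plain byte comparison.

```shell
LC_ALL=C; export LC_ALL
numbers=$(printf '10\n9\n-3\n007\n')

# Default: compared as character strings, so "10" sorts before "9".
lexical=$(printf '%s\n' "$numbers" | sort)

# -n: compared by arithmetic value; leading zeros do not matter.
numeric=$(printf '%s\n' "$numbers" | sort -n)

# -n with -r: the same comparison with its sense reversed.
numeric_reversed=$(printf '%s\n' "$numbers" | sort -nr)

# -f: lowercase letters are compared as their uppercase equivalents.
folded=$(printf 'b\nA\na\nB\n' | sort -f)
```
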
       The treatment of field separators can be altered using the options:

       -b     Ignore leading <blank>s when determining the starting and ending
              positions of a restricted sort key. If the -b option is
              specified before the first -k option, it shall be applied to all
              -k options. Otherwise, the -b option can be attached
              independently to each -k field_start or field_end
              option-argument (see below).

       -t char
              Use char as the field separator character; char shall not be
              considered to be part of a field (although it can be included in
              a sort key). Each occurrence of char shall be significant (for
              example, <char><char> delimits an empty field). If -t is not
              specified, <blank>s shall be used as default field separators;
              each maximal non-empty sequence of <blank>s that follows a
              non-<blank> shall be a field separator.

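       The rule that every occurrence of the -t character is significant can
       be illustrated with a minimal sketch. The colon-separated records are
       invented for the example.

```shell
LC_ALL=C; export LC_ALL

# With -t :, "x::2" has three fields: "x", an empty field, and "2".
# "y:1:" also has three fields: "y", "1", and an empty third field.
# Sorting on field 3 therefore places the empty key before "2".
by_third=$(printf 'x::2\ny:1:\n' | sort -t : -k 3,3)
printf '%s\n' "$by_third"
```
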
       Sort keys can be specified using the options:

       -k keydef
              The keydef argument is a restricted sort key field definition.
              The format of this definition is:

              field_start[type][,field_end[type]]

       where field_start and field_end define a key field restricted to a
       portion of the line (see the EXTENDED DESCRIPTION section), and type is
       a modifier from the list of characters 'b', 'd', 'f', 'i', 'n', 'r'.
       The 'b' modifier shall behave like the -b option, but shall apply
       only to the field_start or field_end to which it is attached. The
       other modifiers shall behave like the corresponding options, but shall
       apply only to the key field to which they are attached; they shall have
       this effect if specified with field_start, field_end, or both. If any
       modifier is attached to a field_start or to a field_end, no option
       shall apply to either. Implementations shall support at least nine
       occurrences of the -k option, which shall be significant in command
       line order. If no -k option is specified, a default sort key of the
       entire line shall be used.

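       The following sketch shows how a modifier attached to a key replaces
       the global ordering options for that key. The input records are
       invented for the example.

```shell
LC_ALL=C; export LC_ALL
input=$(printf 'a:10\nb:9\nc:100\n')

# No modifier on the key: the global -r applies, and the key is
# compared as a string in reverse order ("9" > "100" > "10").
global_r=$(printf '%s\n' "$input" | sort -r -t : -k 2,2)

# The n modifier is attached to the key, so the global -r no longer
# applies to it: the result is ascending numeric order.
key_n=$(printf '%s\n' "$input" | sort -r -t : -k 2,2n)

# To reverse a key that carries modifiers, attach r to the key itself.
key_nr=$(printf '%s\n' "$input" | sort -t : -k 2,2nr)
```
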
       When there are multiple key fields, later keys shall be compared only
       after all earlier keys compare equal. Except when the -u option is
       specified, lines that otherwise compare equal shall be ordered as if
       none of the options -d, -f, -i, -n, or -k were present (but with -r
       still in effect, if it was specified) and with all bytes in the lines
       significant to the comparison. The order in which lines that still
       compare equal are written is unspecified.
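
       A minimal sketch of the last-resort comparison described above, using
       two invented lines whose first field is identical:

```shell
LC_ALL=C; export LC_ALL

# Both lines have the key "k", so the whole lines are compared
# byte by byte and "k 1" is written before "k 2".
tie_broken=$(printf 'k 2\nk 1\n' | sort -k 1,1)

# With -u the lines count as duplicates and only one is written;
# which one is kept is not specified, so only the count is recorded.
unique_count=$(printf 'k 2\nk 1\n' | sort -u -k 1,1 | wc -l)
```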

OPERANDS

       The following operand shall be supported:

       file   A pathname of a file to be sorted, merged, or checked. If no
              file operands are specified, or if a file operand is '-', the
              standard input shall be used.
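
       The '-' operand can be combined with named files, as in this sketch.
       The file name f.txt and the use of mktemp -d are assumptions made for
       the example.

```shell
LC_ALL=C; export LC_ALL
dir=$(mktemp -d)                   # assumption: mktemp -d is available
printf 'b\nd\n' > "$dir/f.txt"     # f.txt is a hypothetical file name

# Lines from standard input ('-') and from f.txt are sorted together.
combined=$(printf 'c\na\n' | sort - "$dir/f.txt")
printf '%s\n' "$combined"
rm -rf "$dir"
```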

STDIN

       The standard input shall be used only if no file operands are
       specified, or if a file operand is '-'. See the INPUT FILES section.

INPUT FILES

       The input files shall be text files, except that the sort utility shall
       add a <newline> to the end of a file ending with an incomplete last
       line.

ENVIRONMENT VARIABLES

       The following environment variables shall affect the execution of sort:

       LANG   Provide a default value for the internationalization variables
              that are unset or null. (See the Base Definitions volume of
              IEEE Std 1003.1-2001, Section 8.2, Internationalization
              Variables for the precedence of internationalization variables
              used to determine the values of locale categories.)

       LC_ALL If set to a non-empty string value, override the values of all
              the other internationalization variables.

       LC_COLLATE
              Determine the locale for ordering rules.

       LC_CTYPE
              Determine the locale for the interpretation of sequences of
              bytes of text data as characters (for example, single-byte as
              opposed to multi-byte characters in arguments and input files)
              and the behavior of character classification for the -b, -d, -f,
              -i, and -n options.

       LC_MESSAGES
              Determine the locale that should be used to affect the format
              and contents of diagnostic messages written to standard error.

       LC_NUMERIC
              Determine the locale for the definition of the radix character
              and thousands separator for the -n option.

       NLSPATH
              Determine the location of message catalogs for the processing of
              LC_MESSAGES.
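
       Because the result depends on the locale, scripts that need a
       repeatable order often select the C (POSIX) locale explicitly. A
       minimal sketch with invented input:

```shell
# In the C locale, comparison follows byte values, so all uppercase
# letters precede all lowercase letters. Other locales may order
# these lines differently; no such locale is assumed to be installed.
c_order=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort)
printf '%s\n' "$c_order"
```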

ASYNCHRONOUS EVENTS

       Default.

STDOUT

       Unless the -o or -c options are in effect, the standard output shall
       contain the sorted input.

STDERR

       The standard error shall be used for diagnostic messages. A warning
       message about correcting an incomplete last line of an input file may
       be generated, but need not affect the final exit status.

OUTPUT FILES

       If the -o option is in effect, the sorted input shall be written to the
       file output.

EXTENDED DESCRIPTION

       The notation:

              -k field_start[type][,field_end[type]]

       shall define a key field that begins at field_start and ends at
       field_end inclusive, unless field_start falls beyond the end of the
       line or after field_end, in which case the key field is empty. A
       missing field_end shall mean the last character of the line.

       A field comprises a maximal sequence of non-separating characters and,
       in the absence of option -t, any preceding field separator.

       The field_start portion of the keydef option-argument shall have the
       form:

              field_number[.first_character]

       Fields and characters within fields shall be numbered starting with 1.
       The field_number and first_character pieces, interpreted as positive
       decimal integers, shall specify the first character to be used as part
       of a sort key. If .first_character is omitted, it shall refer to the
       first character of the field.

       The field_end portion of the keydef option-argument shall have the
       form:

              field_number[.last_character]

       The field_number shall be as described above for field_start. The
       last_character piece, interpreted as a non-negative decimal integer,
       shall specify the last character to be used as part of the sort key. If
       last_character evaluates to zero or .last_character is omitted, it
       shall refer to the last character of the field specified by
       field_number.

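       The character positions described above can be tried out with a short
       sketch. The three-character lines are invented so that each position
       gives a different order.

```shell
LC_ALL=C; export LC_ALL
input=$(printf 'xb9\nya1\nzc5\n')

# Key is exactly the second character of field 1: b, a, c.
by_second_char=$(printf '%s\n' "$input" | sort -k 1.2,1.2)

# field_end is missing, so the key runs from the third character
# to the end of the line: 9, 1, 5.
by_third_onward=$(printf '%s\n' "$input" | sort -k 1.3)

# A last_character of 0 means the last character of the field,
# so 3.1,3.0 selects the whole third field.
whole_third=$(printf 'a b z\nc d y\n' | sort -k 3.1,3.0)
```
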
       If the -b option or b type modifier is in effect, characters within a
       field shall be counted from the first non-<blank> in the field. (This
       shall apply separately to first_character and last_character.)
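
       The effect of the b modifier is easiest to see when fields are
       preceded by different amounts of white space. In this sketch the
       input lines are invented; the first has two spaces before its second
       field and the second has one.

```shell
LC_ALL=C; export LC_ALL
input=$(printf 'a  y\nb x\n')

# Without b, the leading <blank>s are part of field 2, so the keys are
# "  y" and " x"; the extra space makes "  y" sort first.
without_b=$(printf '%s\n' "$input" | sort -k 2,2)

# With b on field_start, the key starts at the first non-<blank>,
# so the keys are "y" and "x".
with_b=$(printf '%s\n' "$input" | sort -k 2b,2)
```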

EXIT STATUS

       The following exit values shall be returned:

        0     All input files were output successfully, or -c was specified
              and the input file was correctly sorted.

        1     Under the -c option, the file was not ordered as specified, or
              if the -c and -u options were both specified, two input lines
              were found with equal keys.

       >1     An error occurred.
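
       A script can distinguish the three cases by inspecting the exit
       status, as in this sketch. The missing file name is hypothetical, and
       diagnostics are discarded because their text is
       implementation-specific.

```shell
LC_ALL=C; export LC_ALL
dir=$(mktemp -d)                   # assumption: mktemp -d is available

# 0: the input is already in order.
printf 'a\nb\n' | sort -c; sorted_status=$?

# 1: the input is readable but not in order.
printf 'b\na\n' | sort -c 2>/dev/null; unsorted_status=$?

# >1: an error occurred (here, the named file does not exist).
sort "$dir/missing.txt" >/dev/null 2>&1; error_status=$?

rm -rf "$dir"
```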

CONSEQUENCES OF ERRORS

       Default.

       The following sections are informative.

APPLICATION USAGE

       The default value for -t, <blank>, has different properties from, for
       example, -t "<space>". If a line contains:

              <space><space>foo

       the following treatment would occur with default separation as opposed
       to specifically selecting a <space>:

              Field   Default             -t "<space>"
              1       <space><space>foo   empty
              2       empty               empty
              3       empty               foo

       The leading field separator itself is included in a field when -t is
       not used. For example, this command returns an exit status of zero,
       meaning the input was already sorted:

              sort -c -k 2 <<eof
              y<tab>b
              x<space>a
              eof

       (assuming that a <tab> precedes the <space> in the current collating
       sequence). The field separator is not included in a field when it is
       explicitly set via -t. This is historical practice and allows usage
       such as:

              sort -t "|" -k 2n <<eof
              Atlanta|425022|Georgia
              Birmingham|284413|Alabama
              Columbia|100385|South Carolina
              eof

       where the second field can be correctly sorted numerically without
       regard to the non-numeric field separator.

       The wording in the OPTIONS section clarifies that the -b, -d, -f, -i,
       -n, and -r options have to come before the first sort key specified if
       they are intended to apply to all specified keys. The way it is
       described in this volume of IEEE Std 1003.1-2001 matches historical
       practice, not historical documentation. The results are unspecified if
       these options are specified after a -k option.

       The -f option might not work as expected in locales where there is not
       a one-to-one mapping between an uppercase and a lowercase letter.

EXAMPLES

        1. The following command sorts the contents of infile with the second
           field as the sort key:

           sort -k 2,2 infile

        2. The following command sorts, in reverse order, the contents of
           infile1 and infile2, placing the output in outfile and using the
           second character of the second field as the sort key (assuming that
           the first character of the second field is the field separator):

           sort -r -o outfile -k 2.2,2.2 infile1 infile2

        3. The following command sorts the contents of infile1 and infile2
           using the second non-<blank> of the second field as the sort key:

           sort -k 2.2b,2.2b infile1 infile2

        4. The following command prints the System V password file (user
           database) sorted by the numeric user ID (the third colon-separated
           field):

           sort -t : -k 3,3n /etc/passwd

        5. The following command prints the lines of the already sorted file
           infile, suppressing all but one occurrence of lines having the same
           third field:

           sort -um -k 3.1,3.0 infile
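
       Example 5 can be tried on a small file, as in the sketch below. The
       contents of infile and the use of mktemp -d are assumptions made for
       the illustration. Which of the two lines sharing a third field is
       kept is not specified, so only the number of output lines is
       recorded.

```shell
LC_ALL=C; export LC_ALL
dir=$(mktemp -d)                   # assumption: mktemp -d is available

# infile is already in order on its third field; two lines share the value 1.
printf 'a b 1\nc d 1\ne f 2\n' > "$dir/infile"

deduped=$(sort -um -k 3.1,3.0 "$dir/infile")
line_count=$(printf '%s\n' "$deduped" | wc -l)
last_line=$(printf '%s\n' "$deduped" | tail -n 1)
rm -rf "$dir"
```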

RATIONALE

       Examples in some historical documentation state that options -um with
       one input file keep the first in each set of lines with equal keys.
       This behavior was deemed to be an implementation artifact and was not
       standardized.

       The -z option was omitted; it is not standard practice on most systems
       and is inconsistent with using sort to sort several files individually
       and then merge them together. The text concerning -z in historical
       documentation appeared to require implementations to determine the
       proper buffer length during the sort phase of operation, but not during
       the merge.

       The -y option was omitted because of non-portability. The -M option,
       present in System V, was omitted because of non-portability in
       international usage.

       An undocumented -T option exists in some implementations. It is used to
       specify a directory for intermediate files. Implementations are
       encouraged to support the use of the TMPDIR environment variable
       instead of adding an option to support this functionality.

       The -k option was added to satisfy two objections. First, the
       zero-based counting used by sort is not consistent with other utility
       conventions. Second, it did not meet syntax guideline requirements.

       Historical documentation indicates that "setting -n implies -b". The
       description of -n already states that optional leading <blank>s are
       tolerated in doing the comparison. If -b is enabled, rather than
       implied, by -n, this has unusual side effects. When a character offset
       is used in a column of numbers (for example, to sort modulo 100), that
       offset is measured relative to the most significant digit, not to the
       column. Based upon a recommendation from the author of the original
       sort utility, the -b implication has been omitted from this volume of
       IEEE Std 1003.1-2001, and an application wishing to achieve the
       previously mentioned side effects has to code the -b flag explicitly.

FUTURE DIRECTIONS

       None.

SEE ALSO

       comm, join, uniq, the System Interfaces volume of
       IEEE Std 1003.1-2001, toupper()

COPYRIGHT

       Portions of this text are reprinted and reproduced in electronic form
       from IEEE Std 1003.1, 2003 Edition, Standard for Information Technology
       -- Portable Operating System Interface (POSIX), The Open Group Base
       Specifications Issue 6, Copyright (C) 2001-2003 by the Institute of
       Electrical and Electronics Engineers, Inc and The Open Group. In the
       event of any discrepancy between this version and the original IEEE and
       The Open Group Standard, the original IEEE and The Open Group Standard
       is the referee document. The original Standard can be obtained online
       at http://www.opengroup.org/unix/online.html .

IEEE/The Open Group                  2003                              SORT(P)