SORT(1P)                   POSIX Programmer's Manual                  SORT(1P)

PROLOG

       This manual page is part of the POSIX Programmer's Manual. The Linux
       implementation of this interface may differ (consult the corresponding
       Linux manual page for details of Linux behavior), or the interface may
       not be implemented on Linux.

NAME

       sort - sort, merge, or sequence check text files

SYNOPSIS

       sort [-m] [-o output] [-bdfinru] [-t char] [-k keydef]... [file...]

       sort [-c|-C] [-bdfinru] [-t char] [-k keydef] [file]

DESCRIPTION

       The sort utility shall perform one of the following functions:

        1. Sort lines of all the named files together and write the result to
           the specified output.

        2. Merge lines of all the named (presorted) files together and write
           the result to the specified output.

        3. Check that a single input file is correctly presorted.

       Comparisons shall be based on one or more sort keys extracted from each
       line of input (or, if no sort keys are specified, the entire line up
       to, but not including, the terminating <newline>), and shall be
       performed using the collating sequence of the current locale. If this
       collating sequence does not have a total ordering of all characters
       (see the Base Definitions volume of POSIX.1‐2017, Section 7.3.2,
       LC_COLLATE), any lines of input that collate equally should be further
       compared byte-by-byte using the collating sequence for the POSIX locale.
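
       The three functions can be exercised from a POSIX shell as in the
       following minimal sketch. It assumes a conforming sort and sh; the
       scratch directory name is hypothetical, and LC_ALL=C is exported so
       that collation is by byte value and the results do not depend on the
       caller's locale.

```shell
#!/bin/sh
# Byte-order collation, independent of the caller's locale.
LC_ALL=C
export LC_ALL

# Hypothetical scratch directory, used only for this illustration.
tmp=${TMPDIR:-/tmp}/sort-demo.$$
mkdir "$tmp" || exit 2
printf 'apple\npear\n' > "$tmp/a"    # already sorted
printf 'banana\nfig\n' > "$tmp/b"    # already sorted
printf 'pear\napple\n' > "$tmp/c"    # not sorted

# 1. Sort the lines of all the named files together.
sorted=$(sort "$tmp/c" "$tmp/b")

# 2. Merge files that are each already sorted.
merged=$(sort -m "$tmp/a" "$tmp/b")

# 3. Check a single file: exit status 0 if sorted, 1 if not.
#    "|| var=$?" records the status without stopping a "set -e" script.
in_order=0
sort -c "$tmp/a" || in_order=$?
out_of_order=0
sort -c "$tmp/c" 2>/dev/null || out_of_order=$?

rm -r "$tmp"
printf '%s\n' "$sorted" "$merged" "in_order=$in_order out_of_order=$out_of_order"
```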

OPTIONS

       The sort utility shall conform to the Base Definitions volume of
       POSIX.1‐2017, Section 12.2, Utility Syntax Guidelines, except for
       Guideline 9, and the -k keydef option should follow the -b, -d, -f, -i,
       -n, and -r options. In addition, '+' may be recognized as an option
       delimiter as well as '-'.

       The following options shall be supported:

       -c        Check that the single input file is ordered as specified by
                 the arguments and the collating sequence of the current
                 locale. Output shall not be sent to standard output. The exit
                 code shall indicate whether disorder was detected or an error
                 occurred. If disorder (or, with -u, a duplicate key) is
                 detected, a warning message shall be sent to standard error
                 indicating where the disorder or duplicate key was found.

       -C        Same as -c, except that a warning message shall not be sent
                 to standard error if disorder or, with -u, a duplicate key is
                 detected.

       -m        Merge only; the input files shall be assumed to be already
                 sorted.

       -o output Specify the name of an output file to be used instead of the
                 standard output. This file can be the same as one of the
                 input files.

       -u        Unique: suppress all but one in each set of lines having
                 equal keys. If used with the -c option, check that there are
                 no lines with duplicate keys, in addition to checking that
                 the input file is sorted.
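
       A minimal sketch of the -c, -C, and -u options, assuming a POSIX shell
       and LC_ALL=C. Only exit statuses are examined; the wording of the -c
       warning message is not specified, so it is discarded.

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

# Each status defaults to 0 and is overwritten only when sort fails, which
# also keeps the script running under "set -e".
c_status=0
printf 'b\na\n' | sort -c 2>/dev/null || c_status=$?          # disorder: 1

C_status=0
C_msg=$(printf 'b\na\n' | sort -C 2>&1) || C_status=$?        # 1, no message

sorted_status=0
printf 'a\na\nb\n' | sort -c 2>/dev/null || sorted_status=$?  # in order: 0

dup_status=0
printf 'a\na\nb\n' | sort -c -u 2>/dev/null || dup_status=$?  # duplicate: 1

# Without -c, -u keeps one line from each set of lines with equal keys.
unique=$(printf 'b\na\nb\n' | sort -u)

printf '%s\n' "$c_status $C_status $sorted_status $dup_status" "$unique"
```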

       The following options shall override the default ordering rules. When
       ordering options appear independent of any key field specifications,
       the requested field ordering rules shall be applied globally to all
       sort keys. When attached to a specific key (see -k), the specified
       ordering options shall override all global ordering options for that
       key.

       -d        Specify that only <blank> characters and alphanumeric
                 characters, according to the current setting of LC_CTYPE,
                 shall be significant in comparisons. The behavior is
                 undefined for a sort key to which -i or -n also applies.

       -f        Consider all lowercase characters that have uppercase
                 equivalents, according to the current setting of LC_CTYPE,
                 to be the uppercase equivalent for the purposes of
                 comparison.

       -i        Ignore all characters that are non-printable, according to
                 the current setting of LC_CTYPE. The behavior is undefined
                 for a sort key to which -n also applies.

       -n        Restrict the sort key to an initial numeric string,
                 consisting of optional <blank> characters, an optional
                 <hyphen-minus> character, and zero or more digits with an
                 optional radix character and thousands separators (as
                 defined in the current locale), which shall be sorted by
                 arithmetic value. An empty digit string shall be treated as
                 zero. Leading zeros and signs on zeros shall not affect
                 ordering.

       -r        Reverse the sense of comparisons.
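
       The ordering options can be compared on small inputs, as in the sketch
       below. It assumes a POSIX shell and LC_ALL=C, in which '-' (0x2D)
       precedes the digits and uppercase letters precede lowercase ones; in
       other locales the default and -f results may differ.

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

# Default: character comparison, so "10" sorts before "9" ('1' < '9').
lexical=$(printf '10\n9\n-2\n' | sort)

# -n: the initial numeric string is compared by arithmetic value.
numeric=$(printf '10\n9\n-2\n' | sort -n)

# -r: reverse the sense of the (numeric) comparison.
reversed=$(printf '10\n9\n-2\n' | sort -n -r)

# -f: compare lowercase letters as their uppercase equivalents. "b" and "B"
# then compare equal and fall back to the all-bytes comparison, in which
# "B" (0x42) precedes "b" (0x62).
folded=$(printf 'b\nB\na\n' | sort -f)

# -d: only blanks and alphanumerics count, so "a-c" is compared as "ac".
dict=$(printf 'a-c\nab\n' | sort -d)

printf '%s\n' "$lexical" -- "$numeric" -- "$reversed" -- "$folded" -- "$dict"
```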

       The treatment of field separators can be altered using the options:

       -b        Ignore leading <blank> characters when determining the
                 starting and ending positions of a restricted sort key. If
                 the -b option is specified before the first -k option, it
                 shall be applied to all -k options. Otherwise, the -b option
                 can be attached independently to each -k field_start or
                 field_end option-argument (see below).

       -t char   Use char as the field separator character; char shall not be
                 considered to be part of a field (although it can be included
                 in a sort key). Each occurrence of char shall be significant
                 (for example, <char><char> delimits an empty field). If -t is
                 not specified, <blank> characters shall be used as default
                 field separators; each maximal non-empty sequence of <blank>
                 characters that follows a non-<blank> shall be a field
                 separator.
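
       A sketch contrasting an explicit -t separator with the default <blank>
       separation, assuming a POSIX shell and LC_ALL=C (in which <space>
       sorts before every letter). The input lines are made up.

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

# With -t ':', "::" delimits an empty field: field 2 of "x::b" is empty and
# sorts before field 2 of "y:a:c", which is "a".
by_t=$(printf 'y:a:c\nx::b\n' | sort -t : -k 2,2)

# Without -t, field 2 includes its preceding blanks: the keys are " a" and
# "  z", and the second blank of "  z" sorts before "a".
no_b=$(printf 'x a\ny  z\n' | sort -k 2,2)

# b on field_start skips those blanks, so the keys become "a" and "z".
with_b=$(printf 'x a\ny  z\n' | sort -k 2b,2)

printf '%s\n' "$by_t" -- "$no_b" -- "$with_b"
```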

       Sort keys can be specified using the options:

       -k keydef The keydef argument is a restricted sort key field
                 definition. The format of this definition is:

                     field_start[type][,field_end[type]]

                 where field_start and field_end define a key field restricted
                 to a portion of the line (see the EXTENDED DESCRIPTION
                 section), and type is one or more modifiers from the list of
                 characters 'b', 'd', 'f', 'i', 'n', 'r'. The 'b' modifier
                 shall behave like the -b option, but shall apply only to the
                 field_start or field_end to which it is attached. The other
                 modifiers shall behave like the corresponding options, but
                 shall apply only to the key field to which they are attached;
                 they shall have this effect if specified with field_start,
                 field_end, or both. If any modifier is attached to a
                 field_start or to a field_end, no option shall apply to
                 either. Implementations shall support at least nine
                 occurrences of the -k option, which shall be significant in
                 command line order. If no -k option is specified, a default
                 sort key of the entire line shall be used.

                 When there are multiple key fields, later keys shall be
                 compared only after all earlier keys compare equal. Except
                 when the -u option is specified, lines that otherwise compare
                 equal shall be ordered as if none of the options -d, -f, -i,
                 -n, or -k were present (but with -r still in effect, if it
                 was specified) and with all bytes in the lines significant to
                 the comparison. The order in which lines that still compare
                 equal are written is unspecified.
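
       The following sketch, assuming a POSIX shell and LC_ALL=C, shows key
       precedence, the override of global options by modifiers attached to a
       key, and the final all-bytes comparison of lines whose keys compare
       equal. The input lines are made up for the illustration.

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

data='bob 10
ann 9
cat 10'

# Primary key: field 2, numeric and reversed (modifiers attached to the key).
# Secondary key: field 1, used only when the primary keys compare equal.
ranked=$(printf '%s\n' "$data" | sort -k 2,2nr -k 1,1)

# A global -r applies only to keys without their own modifiers: key 2 keeps
# plain -n (ascending), while key 1 inherits the reversal.
mixed=$(printf '%s\n' "$data" | sort -r -k 2,2n -k 1,1)

# When all keys compare equal, whole lines are compared byte by byte:
# "B" (0x42) precedes "a" (0x61). A global -r also reverses this step.
tie=$(printf 'a 1\nB 1\n' | sort -k 2,2)
tie_r=$(printf 'a 1\nB 1\n' | sort -r -k 2,2)

printf '%s\n' "$ranked" -- "$mixed" -- "$tie" -- "$tie_r"
```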

OPERANDS

       The following operand shall be supported:

       file      A pathname of a file to be sorted, merged, or checked. If no
                 file operands are specified, or if a file operand is '-', the
                 standard input shall be used. If sort encounters an error
                 when opening or reading a file operand, it may exit without
                 writing any output to standard output or processing later
                 operands.
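
       A minimal sketch of the '-' operand, assuming a POSIX shell; the
       scratch file name is hypothetical. Lines from the pipe and from the
       file are sorted together.

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

# Hypothetical scratch file, used only for this illustration.
f=${TMPDIR:-/tmp}/sort-operand.$$
printf 'b\n' > "$f"

# "-" stands for standard input, so the piped lines and the file's line
# are sorted together.
both=$(printf 'a\nc\n' | sort "$f" -)
rm -f "$f"

printf '%s\n' "$both"
```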

STDIN

       The standard input shall be used only if no file operands are
       specified, or if a file operand is '-'. See the INPUT FILES section.

INPUT FILES

       The input files shall be text files, except that the sort utility shall
       add a <newline> to the end of a file ending with an incomplete last
       line.
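
       The added <newline> can be observed by counting <newline> characters
       with wc -l before and after sorting. This is a sketch assuming a POSIX
       shell and LC_ALL=C; whether a warning is also written to standard
       error is up to the implementation, as described under STDERR.

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

# The input "b<newline>a" ends with an incomplete line (no final <newline>).
# tr removes any padding that wc adds around the count.
newlines_in=$(printf 'b\na' | wc -l | tr -d ' ')          # 1 <newline>
newlines_out=$(printf 'b\na' | sort | wc -l | tr -d ' ')  # 2: one was added
lines=$(printf 'b\na' | sort)

printf '%s\n' "$newlines_in $newlines_out" "$lines"
```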

ENVIRONMENT VARIABLES

       The following environment variables shall affect the execution of sort:

       LANG      Provide a default value for the internationalization
                 variables that are unset or null. (See the Base Definitions
                 volume of POSIX.1‐2017, Section 8.2, Internationalization
                 Variables, for the precedence of internationalization
                 variables used to determine the values of locale
                 categories.)

       LC_ALL    If set to a non-empty string value, override the values of
                 all the other internationalization variables.

       LC_COLLATE
                 Determine the locale for ordering rules.

       LC_CTYPE  Determine the locale for the interpretation of sequences of
                 bytes of text data as characters (for example, single-byte as
                 opposed to multi-byte characters in arguments and input
                 files) and the behavior of character classification for the
                 -b, -d, -f, -i, and -n options.

       LC_MESSAGES
                 Determine the locale that should be used to affect the format
                 and contents of diagnostic messages written to standard
                 error.

       LC_NUMERIC
                 Determine the locale for the definition of the radix
                 character and thousands separator for the -n option.

       NLSPATH   Determine the location of message catalogs for the
                 processing of LC_MESSAGES.

ASYNCHRONOUS EVENTS

       Default.

STDOUT

       Unless the -o or -c options are in effect, the standard output shall
       contain the sorted input.

STDERR

       The standard error shall be used for diagnostic messages. When -c is
       specified, if disorder is detected (or if -u is also specified and a
       duplicate key is detected), a message shall be written to the standard
       error which identifies the input line at which disorder (or a duplicate
       key) was detected. A warning message about correcting an incomplete
       last line of an input file may be generated, but need not affect the
       final exit status.

OUTPUT FILES

       If the -o option is in effect, the sorted input shall be written to the
       file output.
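
       Because the output file can be one of the input files, -o can sort a
       file in place. A minimal sketch, assuming a POSIX shell and a
       hypothetical scratch file name:

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

# Hypothetical scratch file, used only for this illustration.
f=${TMPDIR:-/tmp}/sort-inplace.$$
printf 'c\na\nb\n' > "$f"

# The output file may be the same as an input file. By contrast,
# "sort file > file" lets the shell truncate the file before sort reads it.
sort -o "$f" "$f"
result=$(cat "$f")
rm -f "$f"

printf '%s\n' "$result"
```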

EXTENDED DESCRIPTION

       The notation:

           -k field_start[type][,field_end[type]]

       shall define a key field that begins at field_start and ends at
       field_end inclusive, unless field_start falls beyond the end of the
       line or after field_end, in which case the key field is empty. A
       missing field_end shall mean the last character of the line.

       A field comprises a maximal sequence of non-separating characters and,
       in the absence of option -t, any preceding field separator.

       The field_start portion of the keydef option-argument shall have the
       form:

           field_number[.first_character]

       Fields and characters within fields shall be numbered starting with 1.
       The field_number and first_character pieces, interpreted as positive
       decimal integers, shall specify the first character to be used as part
       of a sort key. If .first_character is omitted, it shall refer to the
       first character of the field.

       The field_end portion of the keydef option-argument shall have the
       form:

           field_number[.last_character]

       The field_number shall be as described above for field_start. The
       last_character piece, interpreted as a non-negative decimal integer,
       shall specify the last character to be used as part of the sort key. If
       last_character evaluates to zero or .last_character is omitted, it
       shall refer to the last character of the field specified by
       field_number.

       If the -b option or b type modifier is in effect, characters within a
       field shall be counted from the first non-<blank> in the field. (This
       shall apply separately to first_character and last_character.)
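
       The following sketch, assuming a POSIX shell and LC_ALL=C, shows how
       field_start and field_end character positions are counted with -t, with
       default separators, and with the b modifier. The input lines are made
       up so that each key differs.

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

# With -t, separators are not part of fields, so positions are easy to
# count: field 2 is "xb" or "ya", and 2.2,2.2 selects its second character.
by_char=$(printf '1:xb\n2:ya\n' | sort -t : -k 2.2,2.2)

# Without -t, field 2 of "p  zb" is "  zb" (preceding blanks included), so
# 2.2,2.2 selects a blank, while for "q ya" (field " ya") it selects "y".
pos_plain=$(printf 'p  zb\nq ya\n' | sort -k 2.2,2.2)

# With b on both ends, counting starts at the first non-<blank> of the
# field, so the keys are "b" (from "zb") and "a" (from "ya").
pos_b=$(printf 'p  zb\nq ya\n' | sort -k 2.2b,2.2b)

printf '%s\n' "$by_char" -- "$pos_plain" -- "$pos_b"
```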

EXIT STATUS

       The following exit values shall be returned:

        0    All input files were output successfully, or -c was specified and
             the input file was correctly sorted.

        1    Under the -c option, the file was not ordered as specified, or if
             the -c and -u options were both specified, two input lines were
             found with equal keys.

       >1    An error occurred.
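
       A script can map the exit status of sort -c to these three outcomes,
       as in the following sketch. The function name check_sorted and the
       scratch file name are hypothetical; the sketch assumes a POSIX shell.

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

# check_sorted is a hypothetical helper: it prints "sorted" for status 0,
# "disorder" for status 1, and "error" for any status greater than 1.
check_sorted() {
    status=0
    sort -c "$1" 2>/dev/null || status=$?
    case $status in
        0) echo sorted ;;
        1) echo disorder ;;
        *) echo error ;;
    esac
}

# Hypothetical scratch file; "$f.missing" is assumed not to exist.
f=${TMPDIR:-/tmp}/sort-status.$$
printf 'b\na\n' > "$f"
r1=$(check_sorted "$f")           # disorder
r2=$(check_sorted "$f.missing")   # error: the file cannot be opened
rm -f "$f"

printf '%s\n' "$r1" "$r2"
```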

CONSEQUENCES OF ERRORS

       The default requirements shall apply, except that if sort encounters an
       error when opening or reading a file operand, it may exit without
       writing any output to standard output or processing later operands.

       The following sections are informative.

APPLICATION USAGE

       The default value for -t, <blank>, has different properties from, for
       example, -t"<space>". If a line contains:

           <space><space>foo

       the following treatment would occur with default separation as opposed
       to specifically selecting a <space>:

                     ┌──────┬───────────────────┬──────────────┐
                     │Field │      Default      │ -t "<space>" │
                     ├──────┼───────────────────┼──────────────┤
                     │  1   │ <space><space>foo │ empty        │
                     │  2   │ empty             │ empty        │
                     │  3   │ empty             │ foo          │
                     └──────┴───────────────────┴──────────────┘

       The leading field separator itself is included in a field when -t is
       not used. For example, this command returns an exit status of zero,
       meaning the input was already sorted:

           sort -c -k 2 <<eof
           y<tab>b
           x<space>a
           eof

       (assuming that a <tab> precedes the <space> in the current collating
       sequence). The field separator is not included in a field when it is
       explicitly set via -t. This is historical practice and allows usage
       such as:

           sort -t "|" -k 2n <<eof
           Atlanta|425022|Georgia
           Birmingham|284413|Alabama
           Columbia|100385|South Carolina
           eof

       where the second field can be correctly sorted numerically without
       regard to the non-numeric field separator.
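
       The two commands above can be run as follows, assuming a POSIX shell
       and LC_ALL=C (in which <tab>, 0x09, precedes <space>, 0x20). printf is
       used instead of here-documents so that the <tab> character is explicit,
       and the key 2b is added as a contrast.

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

# Field 2 includes its leading separator when -t is not given, so the keys
# are "<tab>b" and "<space>a" and the input is already in order.
tab_status=0
printf 'y\tb\nx a\n' | sort -c -k 2 2>/dev/null || tab_status=$?

# Attaching b to the key skips those blanks; the keys become "b" and "a",
# so the same input is now out of order.
tab_b_status=0
printf 'y\tb\nx a\n' | sort -c -k 2b 2>/dev/null || tab_b_status=$?

# With -t '|' the separator is not part of field 2, so -k 2n sees digits.
cities=$(printf '%s\n' 'Atlanta|425022|Georgia' 'Birmingham|284413|Alabama' \
    'Columbia|100385|South Carolina' | sort -t '|' -k 2n)

printf '%s\n' "$tab_status $tab_b_status" "$cities"
```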

       The wording in the OPTIONS section clarifies that the -b, -d, -f, -i,
       -n, and -r options have to come before the first sort key specified if
       they are intended to apply to all specified keys. The way it is
       described in this volume of POSIX.1‐2017 matches historical practice,
       not historical documentation. The results are unspecified if these
       options are specified after a -k option.

       The -f option might not work as expected in locales where there is not
       a one-to-one mapping between an uppercase and a lowercase letter.

       When using sort to process pathnames, it is recommended that LC_ALL, or
       at least LC_CTYPE and LC_COLLATE, be set to POSIX or C in the
       environment, since pathnames can contain byte sequences that do not
       form valid characters in some locales, in which case the utility's
       behavior would be undefined. In the POSIX locale each byte is a valid
       single-byte character, and therefore this problem is avoided.
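
       A minimal sketch of the recommended setting, assuming a POSIX shell.
       The pathnames are made up, and one contains the byte 0xFF, which is
       not a valid character in UTF-8 locales.

```shell
#!/bin/sh
# In the POSIX locale every byte is a valid single-byte character and lines
# are ordered by byte value: uppercase ASCII (0x41-0x5A) precedes lowercase
# (0x61-0x7A), and the byte 0xFF sorts after both.
paths=$(printf 'b.txt\n\377.txt\nB.txt\na.txt\n' | LC_ALL=C sort)

printf '%s\n' "$paths"
```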

       If the collating sequence of the current locale does not have a total
       ordering of all characters, this can affect the behavior of sort in the
       following ways:

        *  As sort -u suppresses lines with duplicate keys, it suppresses
           lines that collate equally but are not identical.

        *  The output of sort (without -u) can contain identical lines that
           are not adjacent, if it does not implement the recommended further
           byte-by-byte comparison of lines that collate equally. This affects
           the use of sort with comm and uniq; see the APPLICATION USAGE for
           those utilities.

EXAMPLES

        1. The following command sorts the contents of infile with the second
           field as the sort key:

               sort -k 2,2 infile

        2. The following command sorts, in reverse order, the contents of
           infile1 and infile2, placing the output in outfile and using the
           second character of the second field as the sort key (assuming that
           the first character of the second field is the field separator):

               sort -r -o outfile -k 2.2,2.2 infile1 infile2

        3. The following command sorts the contents of infile1 and infile2
           using the second non-<blank> of the second field as the sort key:

               sort -k 2.2b,2.2b infile1 infile2

        4. The following command prints the System V password file (user
           database) sorted by the numeric user ID (the third
           <colon>-separated field):

               sort -t : -k 3,3n /etc/passwd

        5. The following command prints the lines of the already sorted file
           infile, suppressing all but one occurrence of lines having the same
           third field:

               sort -um -k 3.1,3.0 infile
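
       Examples 4 and 5 can be tried on made-up data, as in the following
       sketch. It assumes a POSIX shell and LC_ALL=C; the account lines are
       invented rather than read from /etc/passwd.

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

# Example 4 on invented passwd-style lines. The numeric key puts UID 20
# before UID 1000.
by_uid=$(printf '%s\n' 'alice:x:1000:1000::/home/alice:/bin/sh' \
    'root:x:0:0::/root:/bin/sh' 'bob:x:20:20::/home/bob:/bin/sh' |
    sort -t : -k 3,3n)
names=$(printf '%s\n' "$by_uid" | cut -d : -f 1 | tr '\n' ' ')

# Example 5 on invented input that is already sorted on field 3. Which line
# of an equal-key set is kept is not standardized, so only count the lines.
kept=$(printf '%s\n' 'a x 1' 'b y 1' 'c z 2' |
    sort -um -k 3.1,3.0 | wc -l | tr -d ' ')

printf '%s\n' "$names" "$kept"
```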

RATIONALE

       Examples in some historical documentation state that options -um with
       one input file keep the first in each set of lines with equal keys.
       This behavior was deemed to be an implementation artifact and was not
       standardized.

       The -z option was omitted; it is not standard practice on most systems
       and is inconsistent with using sort to sort several files individually
       and then merge them together. The text concerning -z in historical
       documentation appeared to require implementations to determine the
       proper buffer length during the sort phase of operation, but not during
       the merge.

       The -y option was omitted because of non-portability. The -M option,
       present in System V, was omitted because of non-portability in
       international usage.

       An undocumented -T option exists in some implementations. It is used to
       specify a directory for intermediate files. Implementations are
       encouraged to support the use of the TMPDIR environment variable
       instead of adding an option to support this functionality.

       The -k option was added to satisfy two objections. First, the
       zero-based counting used by the historical +position notation of sort
       is not consistent with other utility conventions. Second, that notation
       did not meet syntax guideline requirements.

       Historical documentation indicates that "setting -n implies -b". The
       description of -n already states that optional leading <blank>s are
       tolerated in doing the comparison. If -b is enabled, rather than
       implied, by -n, this has unusual side-effects. When a character offset
       is used in a column of numbers (for example, to sort modulo 100), that
       offset is measured relative to the most significant digit, not to the
       column. Based upon a recommendation from the author of the original
       sort utility, the -b implication has been omitted from this volume of
       POSIX.1‐2017, and an application wishing to achieve the previously
       mentioned side-effects has to code the -b flag explicitly.
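
       The column-relative behavior can be seen with a right-aligned column
       of numbers. This sketch assumes a POSIX shell and LC_ALL=C and sorts
       the made-up numbers modulo 100 by selecting their last two columns.

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

# A right-aligned, five-character column of numbers. Field 1 includes its
# leading blanks, so without b, characters 4 and 5 of field 1 are the last
# two columns, and the numeric keys are 23, 99, and 01 (each number
# modulo 100).
mod100=$(printf '%s\n' '  123' ' 4599' '10001' | sort -k 1.4,1.5n)

# Writing the key as 1.4b,1.5bn would count from the most significant
# digit instead, which is the side-effect described in the text.
printf '%s\n' "$mod100"
```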

       Earlier versions of this standard allowed the -o option to appear after
       operands. Historical practice allowed all options to be interspersed
       with operands. This version of the standard allows implementations to
       accept options after operands, but conforming applications should not
       use this form.

       Earlier versions of this standard also allowed the -number and +number
       options. These options are no longer specified by POSIX.1‐2008 but may
       be present in some implementations.

       Historical implementations produced a message on standard error when -c
       was specified and disorder was detected, and when -c and -u were
       specified and a duplicate key was detected. An earlier version of this
       standard contained wording that did not make it clear that this message
       was allowed, and some implementations removed this message to be sure
       that they conformed to the standard's requirements. Confronted with
       this difference in behavior, interactive users who wanted to be sure
       that they got visual feedback instead of just exit code 1 could have
       used a command like:

           sort -c file || echo disorder

       whether or not the sort utility provided a message in this case. But
       it was not easy for a user to find where the disorder or duplicate key
       occurred on implementations that do not produce a message, especially
       when some parts of the input line were not part of the key and when
       one or more of the -b, -d, -f, -i, -n, or -r options or keydef type
       modifiers were in use. POSIX.1‐2008 requires a message to be produced
       in this case. POSIX.1‐2008 also contains the -C option, giving users
       the ability to choose either behavior.

       When a disorder or duplicate is found when the -c option is specified,
       some implementations print a message containing the first line that is
       out of order or contains a duplicate key; others print a message
       specifying the line number of the offending line. This standard allows
       either type of message.

       Implementations are encouraged to perform the recommended further
       byte-by-byte comparison of lines that collate equally, even though this
       may affect efficiency. The impact on efficiency can be mitigated by only
       performing the additional comparison if the current locale's collating
       sequence does not have a total ordering of all characters (if the
       implementation provides a way to query this) or by only performing the
       additional comparison if the locale name associated with the LC_COLLATE
       category has an '@' modifier in the name (since locales without an '@'
       modifier should have a total ordering of all characters; see the Base
       Definitions volume of POSIX.1‐2017, Section 7.3.2, LC_COLLATE). Note
       that if the implementation provides a stable sort option as an
       extension (usually -s), the additional comparison should not be
       performed when this option has been specified.

FUTURE DIRECTIONS

       A future version of this standard may require that if the collating
       sequence of the current locale does not have a total ordering of all
       characters, any lines of input that collate equally when comparing them
       as whole lines are further compared byte-by-byte using the collating
       sequence for the POSIX locale.

SEE ALSO

       comm, join, uniq

       The Base Definitions volume of POSIX.1‐2017, Section 7.3.2, LC_COLLATE,
       Chapter 8, Environment Variables, Section 12.2, Utility Syntax
       Guidelines

       The System Interfaces volume of POSIX.1‐2017, toupper()

COPYRIGHT

       Portions of this text are reprinted and reproduced in electronic form
       from IEEE Std 1003.1-2017, Standard for Information Technology --
       Portable Operating System Interface (POSIX), The Open Group Base
       Specifications Issue 7, 2018 Edition, Copyright (C) 2018 by the
       Institute of Electrical and Electronics Engineers, Inc and The Open
       Group. In the event of any discrepancy between this version and the
       original IEEE and The Open Group Standard, the original IEEE and The
       Open Group Standard is the referee document. The original Standard can
       be obtained online at http://www.opengroup.org/unix/online.html .

       Any typographical or formatting errors that appear in this page are
       most likely to have been introduced during the conversion of the source
       files to man page format. To report such errors, see
       https://www.kernel.org/doc/man-pages/reporting_bugs.html .

IEEE/The Open Group                  2017                             SORT(1P)