MSORT(1)                         User Commands                        MSORT(1)

NAME

       msort - sort records in complex ways

SYNOPSIS

       msort <options> [<input file>]

DESCRIPTION

       msort is a program for sorting text files in sophisticated ways. It
       was developed initially for alphabetizing dictionaries of languages in
       which the ordering may be quite different from that of English, but it
       has many other uses.

       msort allows you to sort blocks of text delimited in a number of ways,
       rather than just lines, and to specify particular fields of a record as
       sort keys, either by their position, counted from either end, or by
       matching regular expressions against their tags.

       msort is capable of sorting on multiple keys, so that when two records
       tie on one key, the tie may be broken on another. Any or all keys may
       be optional. How absent optional keys are ordered with respect to
       present keys may be set separately for each key.
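
       The tie-breaking behaviour described above can be sketched in Python.
       This is an illustration only, not msort's own code: the record layout
       (dictionaries with a required "surname" and an optional "year") and
       the names compare_optional and compare_records are hypothetical.

```python
from functools import cmp_to_key

def compare_optional(a, b, absent_vs_present):
    """Compare two key values; None marks an absent optional key.

    absent_vs_present is -1, 0 or 1: how an absent key compares to a
    present one (an assumption modelled on the <, =, > choices).
    """
    if a is None and b is None:
        return 0
    if a is None:
        return absent_vs_present
    if b is None:
        return -absent_vs_present
    return (a > b) - (a < b)

def compare_records(r1, r2):
    # First key: surname (always present in this example).
    result = compare_optional(r1["surname"], r2["surname"], -1)
    if result != 0:
        return result
    # Tie on the first key: break it on the optional year,
    # with absent years ordered after present ones.
    return compare_optional(r1.get("year"), r2.get("year"), 1)

records = [
    {"surname": "Poser", "year": 2010},
    {"surname": "Adams"},
    {"surname": "Poser"},
    {"surname": "Poser", "year": 1999},
]
ordered = sorted(records, key=cmp_to_key(compare_records))
```

       Here the three "Poser" records tie on the first key, so the second
       key decides their order, and the record with no year comes last.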

       msort allows you to specify arbitrary sort orders and to define a
       virtually unlimited number of multigraphs of effectively unlimited
       length. The sort order and multigraphs are defined separately for each
       key. If your system has locale support, you can also use locale
       collation rules instead of specifying your own sort order.
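
       To show what a custom sort order with multigraphs means, here is a
       small Python sketch. It is not msort's implementation and does not use
       msort's sort order file format. The ordering (a traditional Spanish
       alphabet in which "ch" follows "c" and "ll" follows "l") and the names
       ORDER, tokenize and sort_key are assumptions chosen for the example.

```python
# Collation units in the desired order; "ch" and "ll" are multigraphs.
ORDER = (list("abc") + ["ch"] + list("defghijkl") + ["ll"]
         + list("mn") + ["ñ"] + list("opqrstuvwxyz"))
RANK = {unit: rank for rank, unit in enumerate(ORDER)}
# Try longer multigraphs first so the longest match wins.
MULTIGRAPHS = sorted((u for u in ORDER if len(u) > 1), key=len, reverse=True)

def tokenize(word):
    """Split a word into collation units, preferring multigraphs."""
    units = []
    i = 0
    while i < len(word):
        for graph in MULTIGRAPHS:
            if word.startswith(graph, i):
                units.append(graph)
                i += len(graph)
                break
        else:
            units.append(word[i])
            i += 1
    return units

def sort_key(word):
    # Units missing from ORDER sort after all known units.
    return [RANK.get(unit, len(ORDER)) for unit in tokenize(word)]

words = ["dama", "chico", "cuna", "llama", "luz", "cama"]
custom = sorted(words, key=sort_key)
plain = sorted(words)
```

       With the custom order, "chico" follows "cuna" and "llama" follows
       "luz", whereas a plain code point sort places them the other way
       round.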

       msort provides twelve types of key comparison: lexicographic, numeric,
       numeric string, hybrid, by string length, by angle, by date, by domain
       name, by time, by ISO 8601 date/time stamp, by month name, and random.
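
       Why the comparison type matters can be seen from a short Python
       sketch. It uses Python's built-in sorted() as a stand-in and covers
       only three of the types listed; it does not reproduce msort's exact
       comparison rules.

```python
values = ["10", "9", "2", "100"]

# Lexicographic: compare character by character.
lexicographic = sorted(values)
# Numeric: compare by numeric value.
numeric = sorted(values, key=int)
# By string length: compare by number of characters (ties keep input order).
by_length = sorted(values, key=len)
```

       The same four keys come out in three different orders, which is why
       the comparison type is chosen separately for each key.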

       Which month names are recognized depends on the -s flag. If the -s
       flag is used on the same key and its argument is the name of a file,
       the month names are read from the file, which should be in the same
       format as a sort order definition file. If the -s flag is used and its
       argument is a locale name, the month names recognized are the month
       names and abbreviations associated with the specified locale. If the
       -s flag is not used, the month names recognized are the month names
       and abbreviations associated with the current locale. If your system
       does not have locale support and you do not use the -s flag to read
       the month names from a file, the month names recognized are the
       English month names and abbreviations.

       msort can reverse the characters in a key, allowing it to be used to
       generate reverse dictionaries.
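
       A reverse dictionary orders words by their endings. The idea of
       sorting on a reversed key can be sketched in Python as follows; the
       word list is invented for the example and the code is not taken from
       msort.

```python
words = ["walking", "nation", "talking", "station", "ring"]

# Sort on the reversed characters of each word; the output keeps the
# original spelling, only the comparison key is reversed.
reverse_sorted = sorted(words, key=lambda word: word[::-1])
```

       Words sharing a suffix, such as "-king" or "-ation", end up next to
       each other.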

       A choice of sorting algorithms is provided.

       msort fully supports Unicode. The text to be sorted, and all
       specifications, should be in UTF-8 Unicode. (If you have plain ASCII
       text, this is not a problem, as plain ASCII text is also valid UTF-8.)
       Full Unicode case-folding is available, in Turkic and non-Turkic
       variants. Unicode normalization is performed before sorting.
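
       The reason for normalizing before sorting is that the same text can
       be encoded in more than one way. The Python sketch below illustrates
       this with the standard unicodedata module; it is an illustration of
       the principle, not of msort's internals, and the helper name nfc_key
       is hypothetical.

```python
import unicodedata

composed = "caf\u00e9"      # "é" as a single code point
decomposed = "cafe\u0301"   # "e" followed by a combining acute accent

def nfc_key(text):
    """Return the text in normalization form C for use as a sort key."""
    return unicodedata.normalize("NFC", text)

words = [decomposed, "cafz", composed]

# Without normalization the two spellings of the same word are separated.
raw = sorted(words)
# With normalization they compare equal and stay together.
normalized = sorted(words, key=nfc_key)
```

       Without normalization "cafz" lands between the two encodings of the
       same word; with it, both encodings sort as one.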

       For usage information, execute msort with no arguments.

       Full information about msort is to be found in the reference manual,
       which is distributed as a PDF (Portable Document Format) file. If a
       copy is not available locally, you can download it from msort's home
       page:
       http://billposer.org/Software/msort.html

OPTIONS

   Informational options
       -h,--help
              Print usage message

       -v,--version
              Print version message

       -D,--defaults
              List defaults

       -F,--general-options
              List general command line options

       -G,--gnu-equivalences
              List equivalents for GNU sort command line options.

       -H,--informational-options
              List informational command line options

       -K,--key-specific-options
              List key-specific command line options

       -L,--limits
              List limits

       -N,--number-systems
              List the supported number systems.

   General options
       -b,--block
              A record is terminated by two or more newlines

       -l,--line
              A record consists of a single line

       -r,--record-separator <separator>
              A record is terminated by the separator character

       -O,--fixed-size-record <bytes>
              A record consists of the specified number of bytes.

       -d,--field-separators <character>+
              Fields are delimited by the named character(s)

       -w,--whole
              Sort on the entire text of the record

       -a,--algorithm <algorithm>
              Use the specified sort algorithm. The choices are:
              I(nsertionSort), M(ergeSort), Q(uickSort), and S(hellSort).
              Note that InsertionSort and MergeSort are stable, while
              QuickSort and ShellSort are unstable. The default is QuickSort.

       -M,--initial-maximum-records <records>
              Set initial maximum number of records

       -m,--line-end-carriage-return
              End-of-line in the input data is marked by Carriage Return
              (0x0D), as on the classic Mac OS, rather than by Line Feed
              (0x0A), as on Unix systems.

       -I,--invert-globally
              Invert sense of comparisons globally

       -B,--BMP
              The input contains no characters outside the Basic Multilingual
              Plane (that is, none with values greater than 0xFFFF).

       -Z,--skip-first-record
              Copy the first record in the input to the output without sorting
              it. This is useful for sorting files with a header.

       -p,--reserve-private-use-area
              Do not make internal use of the Private Use areas. By default,
              multigraphs are assigned internally to codepoints in the
              Supplementary Private Use areas if full Unicode is in use, or to
              codepoints in the Private Use area if input is restricted to the
              Basic Multilingual Plane by means of the -B option. If your
              input makes use of the Private Use areas, this option prevents
              interference with your input. In this case, multigraphs will be
              assigned to the Low and High Surrogate areas (0xD800-0xDFFF).
              Note that this limits the number of multigraphs to 2,048.

       -P,--random-seed <seed>
              Set the seed for the random number generator. If not set here,
              it is set to a value determined by the time. The seed used is
              reported in the log. This option allows runs to be replicated.

       -Q,--check-only
              Check whether the input is already sorted. Do not generate any
              output. Exit status is 0 if input is already sorted, 11 if not
              sorted.

       -1,--in <input file name>

       -2,--out <output file name>
              If the output file is the same as the input file, the input file
              will be overwritten. The input file will not be overwritten if
              the run is unsuccessful.

       -j,--suppress-log
              Suppress output to the log. If this flag is given before there
              is any output to the log from a command line flag, nothing will
              be written to the log and the log file will not be created. If a
              command line flag generates a log message before this flag is
              processed, the log file will be created but no log messages will
              be written to it once this flag is processed. To guarantee that
              no attempt will be made to open a log file, give this flag
              first.

       -q,--quiet
              Be quiet - do not chat while working

       -u,--unicode-normalization <mode>
              Select Unicode normalization mode. The choices of mode are: c
              for normalization form C (NFC), d for normalization form D
              (NFD), C for normalization form KC (NFKC), D for normalization
              form KD (NFKD), and n for no normalization. The default is NFC.
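
       The record model selected by -b (a record is terminated by two or
       more newlines) can be sketched in Python as follows. This is a
       minimal illustration of block records, assuming a plain code point
       comparison of whole records; the function name split_blocks and the
       sample text are hypothetical, and the code is not msort's own.

```python
import re

def split_blocks(text):
    """Split text into records terminated by two or more newlines."""
    blocks = re.split(r"\n{2,}", text.strip("\n"))
    return [block for block in blocks if block]

text = "pear\n  a fruit\n\napple\n  another fruit\n\n\nfig\n  small\n"

records = split_blocks(text)
# Sort whole multi-line records and write them back out as blocks.
output = "\n\n".join(sorted(records)) + "\n"
```

       Each multi-line entry moves as a unit, which is what distinguishes
       block records from the single-line records selected by -l.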

   Key specific options
       -e,--character-range <m,n>
              Sort on characters m through n. Positive indices start from one.
              Negative indices indicate position with respect to the end of
              the record. For example, the range 3,-2 consists of the third
              character through the next-to-last character.

       -n,--position <POS>(,<POS>)
              Sort on the specified POS or contiguous range of POSs, where a
              POS is of the form <field number>(.<character number>). Both
              counts begin at one. Field numbers, but not character numbers,
              may be negative, in which case they are counted from the right.
              Thus, 1.2 is the second character of the first field; -2.1 is
              the first character of the next-to-last field.

       -t,--tag <tag regexp>
              Sort on the field with the specified tag

       -o,--optional <comparison>
              The key is optional. The argument (<, =, or >) specifies how a
              record in which the key is absent compares to one in which it
              is present.

       -C,--fold-case
              Fold case

       -z,--fold-case-turkic
              Fold case with additional Turkic conversions.

       -c,--comparison-type <comparison type>
              a(ngle), l(exicographic), i(so8601 date/time), t(ime), D(omain
              name/email address), d(ate), m(onth name), n(umeric), N(umeric
              string), s(ize), h(ybrid), r(andom)

       -y,--number-system <number system>
              Specifies the number system expected for this key. This affects
              only numeric and numeric string keys. There are two special
              values. If the number system is "all", records may contain any
              number system that msort can interpret, and different records
              may contain different number systems. If the number system is
              "any", records may contain any number system that msort can
              interpret, but all records must make use of the same number
              system; msort sets the number system on the basis of the first
              record.

       -f,--date-format <date format>
              Permutation of ymd with separators, e.g. y-m-d for international
              date format, m/d/y for American date format, or a permutation of
              yd with separators, e.g. y-d, for day-of-year dates. All three
              components may be numbers in any available number system. The
              month field may also be a month name, determined by the same
              devices as independent month name fields.

       -W,--sort-order-file-separators <file name>
              Read from the named file the list of characters to be treated
              as separators in the sort order definition file.

       -S,--substitutions <file name>
              Read substitutions from named file

       -s,--sort-order <file name>|<locale name>|"locale"
              If the argument is a file name, it is taken to be a sort order
              file and the sort order for the key is read from the file. If
              the argument is a locale name, the collation rules for that
              locale are used. If the argument is "locale", the collation
              rules for the current locale are used.

       -T,--transformations <(d)(e)(s)>
              Apply the specified transformations. d specifies that
              diacritics are to be stripped. Separately encoded combining
              diacritics are removed. Characters with diacritics represented
              by single codepoints are replaced with the corresponding ASCII
              character without the diacritics, if there is one. e specifies
              that enclosed characters, that is, characters within circles or
              parentheses, are to be replaced with the corresponding plain
              ASCII character if there is one. s specifies that characters in
              special styles are to be replaced with the corresponding plain
              ASCII character if there is one. Stylistic equivalents include:
              small capitals (e.g. U+1D04), script forms (e.g. U+212C), black
              letter forms (e.g. U+212D), Arabic presentation forms (e.g.
              U+FE81), Hebrew presentation forms (e.g. U+FB1D), fullwidth
              forms (e.g. U+FF01), halfwidth forms (e.g. U+FF7B), and the
              mathematical alphanumeric symbols (e.g. U+1D400).

       -x,--exclusion-file <file name>
              Read exclusions from named file

       -X,--exclude-characters <exclusions>
              Exclude specified characters

       -i,--invert-locally
              Invert sense of comparisons for this key

       -R,--reverse-key
              Reverse characters of key

       -A,--first-character-only
              Ignore all but the first character of the field, after
              substitutions, exclusions, etc.

       Note: long options may not be available on your system.
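
       The index convention of -e (positive indices count from one,
       negative indices count from the end of the record) can be worked
       through in Python. The function character_range below is a
       hypothetical helper written for this illustration; it is one reading
       of the rule stated under -e, not code from msort.

```python
def character_range(record, m, n):
    """Return characters m through n of record, inclusive.

    Positive indices start from one; negative indices count from the
    end, so -1 is the last character and -2 the next-to-last.
    """
    length = len(record)
    start = m - 1 if m > 0 else length + m
    end = n if n > 0 else length + n + 1
    # Clamp so an out-of-range start does not wrap around.
    return record[max(start, 0):end]

# The range 3,-2: third character through next-to-last character.
key = character_range("abcdefg", 3, -2)

records = ["xxbanana!", "xxcherry!", "xxapples!"]
ordered = sorted(records, key=lambda r: character_range(r, 3, -2))
```

       Only the selected characters take part in the comparison; the
       records themselves are output unchanged.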

SEE ALSO

       sort(1), uninum(3)

AUTHOR

       Bill Poser (billposer@alum.mit.edu)

LICENSE

       GNU General Public License (http://www.gnu.org/licenses/gpl.html),
       version 3.

msort                            January 2010                         MSORT(1)