Sort::Naturally(3)    User Contributed Perl Documentation   Sort::Naturally(3)

NAME

       Sort::Naturally -- sort lexically, but sort numeral parts numerically

SYNOPSIS

         use Sort::Naturally;
         @them = nsort(qw(
          foo12a foo12z foo13a foo 14 9x foo12 fooa foolio Foolio Foo12a
         ));
         print join(' ', @them), "\n";

       Prints:

         9x 14 foo fooa foolio Foolio foo12 foo12a Foo12a foo12z foo13a

       (Or "foo12a" + "Foo12a" and "foolio" + "Foolio" might be switched,
       depending on your locale.)

DESCRIPTION

       This module exports two functions, "nsort" and "ncmp"; they are used in
       implementing my idea of a "natural sorting" algorithm.  Under natural
       sorting, numeric substrings are compared numerically, and other word-
       characters are compared lexically.

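       As a rough illustration of that idea, here is a deliberately simplified
       comparator.  It is a sketch only, not this module's implementation: the
       name "simple_natural_cmp" is made up, and it ignores locale, \W
       handling, and the special cases that the real functions apply.

```perl
use strict;
use warnings;

# Hypothetical, simplified comparator; NOT Sort::Naturally's implementation.
sub simple_natural_cmp {
    my ($left, $right) = @_;
    my @l = lc($left)  =~ /(\d+|\D+)/g;   # alternating digit / non-digit runs
    my @r = lc($right) =~ /(\d+|\D+)/g;
    while (@l and @r) {
        my ($x, $y) = (shift @l, shift @r);
        my $both_numeric = $x =~ /^\d/ && $y =~ /^\d/;
        # Digit runs compare as numbers; anything else falls back to cmp.
        my $result = $both_numeric ? $x <=> $y : $x cmp $y;
        return $result if $result;
    }
    return @l <=> @r;   # the shorter chunk list sorts first
}

my @sorted = sort { simple_natural_cmp($a, $b) } qw(file100 file20 File3);
print "@sorted\n";   # File3 file20 file100
```
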
       This is the way I define natural sorting:

       •   Non-numeric word-character substrings are sorted lexically, case-
           insensitively: "Foo" comes between "fish" and "fowl".

       •   Numeric substrings are sorted numerically: "100" comes after "20",
           not before.

       •   \W substrings (neither word-characters nor digits) are ignored.

       •   Our use of \w, \d, \D, and \W is locale-sensitive:  Sort::Naturally
           uses a "use locale" statement.

       •   When comparing two strings, where a numeric substring in one place
           is not up against a numeric substring in another, the non-numeric
           always comes first.  This is fudged by pretending that the lack of
           a number substring has the value -1, like so:

             foo       =>  "foo",  -1
             foobar    =>  "foo",  -1,  "bar"
             foo13     =>  "foo",  13,
             foo13xyz  =>  "foo",  13,  "xyz"

           That's so that "foo" will come before "foobar", which will come
           before "foo13".

       •   The start of a string is exceptional: leading \W (non-word,
           non-digit) components are ignored, and numbers come before
           letters.

       •   I define "numeric substring" just as sequences matching m/\d+/ --
           scientific notation, commas, decimals, etc., are not seen.  If your
           data has thousands separators in numbers ("20,000 Leagues Under The
           Sea" or "20.000 lieues sous les mers"), consider stripping them
           before feeding them to "nsort" or "ncmp".

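       The separator stripping suggested in the last item could look something
       like the following.  This is a minimal sketch: the helper name
       "strip_thousands_separators" is hypothetical, and it assumes the data
       contains no decimals with exactly three fractional digits (such as
       "1.234"), which it would mangle as well.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Sort::Naturally: drop a "," or "." that
# sits between a digit and a group of exactly three digits.
sub strip_thousands_separators {
    my ($text) = @_;
    $text =~ s/(?<=\d)[.,](?=\d{3}(?!\d))//g;
    return $text;
}

for my $title ('20,000 Leagues Under The Sea',
               '20.000 lieues sous les mers',
               '1,234,567 and 3.14') {
    print strip_thousands_separators($title), "\n";
}
# Prints:
#   20000 Leagues Under The Sea
#   20000 lieues sous les mers
#   1234567 and 3.14
```

       The cleaned strings can then serve as sort keys, for instance in a
       Schwartzian transform, so that the original titles are what comes back.
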
   The nsort function
       This function takes a list of strings, and returns a copy of the list,
       sorted.

       This is what most people will want to use:

         @stuff = nsort(...list...);

       When nsort needs to compare non-numeric substrings, it uses Perl's "cmp"
       operator in scope of a "use locale".  And when nsort needs to lowercase
       things, it uses Perl's "lc" function in scope of a "use locale".  If
       you want nsort to use other functions instead, you can specify them in
       an arrayref as the first argument to nsort:

         @stuff = nsort( [
                           \&string_comparator,   # optional
                           \&lowercaser_function  # optional
                         ],
                         ...list...
                       );

       If you want to specify a string comparator but no lowercaser, then the
       options list is "[\&comparator, '']" or "[\&comparator]".  If you want
       to specify no string comparator but a lowercaser, then the options list
       is "['', \&lowercaser]".

       Any comparator you specify is called as "$comparator->($left, $right)",
       and, like a normal Perl "cmp" replacement, must return -1, 0, or 1
       depending on whether the left argument is stringwise less than, equal
       to, or greater than the right argument.

       Any lowercaser function you specify is called as "$lowercased =
       $lowercaser->($original)".  The routine must not modify its $_[0].

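       For a concrete picture of the options list, here is a small sketch with
       two illustrative callbacks.  The callback names and the sample words
       are made up for this example; no output is shown because the resulting
       order depends on the callbacks and on your locale.

```perl
use strict;
use warnings;
use Sort::Naturally;

# Illustrative comparator: follows the "cmp" contract (returns -1, 0, or 1),
# but with the arguments swapped, so text runs compare in reverse.
my $reverse_comparator = sub {
    my ($left, $right) = @_;
    return $right cmp $left;
};

# Illustrative lowercaser: returns a new string and leaves $_[0] untouched.
my $ascii_lowercaser = sub {
    my ($original) = @_;
    (my $lowered = $original) =~ tr/A-Z/a-z/;
    return $lowered;
};

my @words = qw(pear Apple fig10 fig9);

# Comparator only, lowercaser only, then both.
my @custom_compare = nsort([ $reverse_comparator ], @words);
my @custom_lower   = nsort([ '', $ascii_lowercaser ], @words);
my @custom_both    = nsort([ $reverse_comparator, $ascii_lowercaser ], @words);

print "@custom_compare\n@custom_lower\n@custom_both\n";
```
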
   The ncmp function
       Often, when sorting non-string values like this:

          @objects_sorted = sort { $a->tag cmp $b->tag } @objects;

       ...or even in a Schwartzian transform, like this:

          @strings =
            map { $_->[0] }
            sort { $a->[1] cmp $b->[1] }
            map { [$_, make_a_sort_key_from($_)] }
            @_
          ;

       ...you might want something that replaces not "sort", but "cmp".
       That's what Sort::Naturally's "ncmp" function is for.  Call it with the
       syntax "ncmp($left,$right)" instead of "$left cmp $right", but
       otherwise it's a fine replacement:

          @objects_sorted = sort { ncmp($a->tag,$b->tag) } @objects;

          @strings =
            map { $_->[0] }
            sort { ncmp($a->[1], $b->[1]) }
            map { [$_, make_a_sort_key_from($_)] }
            @_
          ;

       Just as "nsort" can take a different string comparator and/or
       lowercaser, "ncmp" can too, if you pass an arrayref as the first
       argument:

         ncmp( [
                 \&string_comparator,   # optional
                 \&lowercaser_function  # optional
               ],
               $left, $right
             )

       You might get string comparators from Sort::ArbBiLex.
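
       To make the "ncmp" usage concrete, here is a short sketch that sorts
       hash-based records by one field.  The records and the "name" and
       "pages" fields are invented for the example.

```perl
use strict;
use warnings;
use Sort::Naturally;   # exports nsort and ncmp

# Made-up sample records; only the "name" field takes part in the comparison.
my @chapters = (
    { name => 'chapter10', pages => 12 },
    { name => 'chapter2',  pages => 30 },
    { name => 'chapter1',  pages => 25 },
);

my @in_order = sort { ncmp($a->{name}, $b->{name}) } @chapters;

print join(' ', map { $_->{name} } @in_order), "\n";
# chapter1 chapter2 chapter10
```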

NOTES

       •   This module is not a substitute for Sort::Versions!  If you just
           need proper version sorting, use that!

       •   If you need something that works sort of like this module's
           functions, but not quite the same, consider scouting thru this
           module's source code, and adapting what you see.  Besides the
           functions that actually compile in this module, after the POD,
           there are several alternate attempts of mine at natural sorting
           routines, which are not compiled as part of the module, but which
           you might find useful.  They should all be working implementations
           of slightly different algorithms (all of them based on Martin
           Pool's "nsort") which I eventually discarded in favor of my
           algorithm.  If you are having to naturally-sort very large data
           sets, and sorting is getting ridiculously slow, you might consider
           trying one of those discarded functions -- I have a feeling they
           might be faster on large data sets.  Benchmark them on your data
           and see.  (Unless you need the speed, don't bother.  Hint:
           substitute "sort" for "nsort" in your code, and unless your program
           speeds up drastically, it's not the sorting that's slowing things
           down.)  But if it is "nsort" that's slowing things down, consider
           just:

                 if(@set >= SOME_VERY_BIG_NUMBER) {
                   no locale; # vroom vroom
                   @sorted = sort(@set);  # feh, good enough
                 } elsif(@set >= SOME_BIG_NUMBER) {
                   use locale;
                   @sorted = sort(@set);  # feh, good enough
                 } else {
                   # but keep it pretty for normal cases
                   @sorted = nsort(@set);
                 }

       •   If you do adapt the routines in this module, email me; I'd just be
           interested in hearing about it.

       •   Thanks to the EFNet #perl people for encouraging this module,
           especially magister and a-mused.

COPYRIGHT AND DISCLAIMER

       Copyright 2001, Sean M. Burke "sburke@cpan.org", all rights reserved.
       This program is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

       This program is distributed in the hope that it will be useful, but
       without any warranty; without even the implied warranty of
       merchantability or fitness for a particular purpose.

AUTHOR

       Sean M. Burke "sburke@cpan.org"

perl v5.34.0                      2022-01-21                Sort::Naturally(3)