1Sort::Maker(3)        User Contributed Perl Documentation       Sort::Maker(3)
2
3
4

NAME

6       Sort::Maker - A simple way to make efficient sort subs
7

SYNOPSIS

9               use Sort::Maker ;
10
11               my $sorter = make_sorter( ... ) ;
12

DESCRIPTION

14       This module has two main goals: to make it easy to create correct sort
15       functions, and to make it simple to select the optimum sorting
16       algorithm for the number of items to be sorted. Sort::Maker generates
17       complete sort subroutines in one of four styles, plain, orcish
18       manouver, Schwartzian Transform and the Guttman-Rosler Transform. You
19       can also get the source for a sort sub you create via the sorter_source
20       call.
21

"make_sorter"

23       The sub "make_sorter" is exported by Sort::Maker. It makes a sort sub
24       and returns a reference to it. You describe how you want it to sort its
25       input by passing general options and key descriptions to "make_sorter".
26
27   Arguments to "make_sorter"
28       There are two types of arguments, boolean and value. Boolean arguments
29       can be set just with the option name and can optionally be followed by
30       '1'. You can easily set multiple boolean general arguments with qw().
31       Value arguments must have a following value.  Arguments can appear in
32       any order but the key descriptions (see below) must appear in their
33       sort order. The code examples below show various ways to set the
34       various arguments.
35
36       Arguments fall into four categories: selecting the style of the sort,
37       key descriptions, setting defaults for key description attributes, and
38       setting general flags and values. The following sections will describe
39       the categories and their associated arguments.
40
41   Sort Style
42       The style of the sort to be made is selected by setting one of the
43       following Boolean arguments. Only one may be set otherwise an error is
44       reported (see below for error handling). Also see below for detailed
45       descriptions of the supported sort styles.
46
47               plain
48               orcish
49               ST
50               GRT
51
52               # Make a plain sorter
53               my $plain_sorter = make_sorter( qw( plain ) ... ) ;
54
55               # Make an orcish manouevre sorter
56               my $orcish_sorter = make_sorter( orcish => 1 ... ) ;
57
58               # Make a Schwartzian Transform sorter
59               my $st_sorter = make_sorter( 'ST', 1, ... ) ;
60
61               # Make a GRT sort
62               my $GRT = make_sorter( 'GRT', ... ) ;
63
64   Key Attribute Defaults
65       The following arguments set defaults for the all of the keys'
66       attributes.  These default values can be overridden in any individual
67       key.  Only one of the attributes in each of the groups below can be set
68       as defaults or for any given key. If more than one attribute in each
69       group is set, then "make_sorter" will return an error.  The attribute
70       that is the default for each group is marked.  See below for details on
71       key attributes.
72
73               ascending       (default)
74               descending
75
76               case            (default)
77               no_case
78
79               signed
80               unsigned
81               signed_float    (default)
82               unsigned_float
83
84               fixed
85               varying
86
87   General Options
88       These arguments set general options that apply to how the generated
89       sorter interacts with the outside world.
90
91       "name"
92
93       This is a value option which exports the generated sort sub to that
94       name. The call to "make_sorter" must be run to install the named named
95       function before it is called. You should still check the result of
96       "make_sorter" to see if an error occurred (it returns undef).
97
98               my $sorter = make_sorter( name => 'sort_func', ... ) ;
99               die "make_sorter: $@" unless $sorter ;
100
101               ...
102
103               @sorted = sort_func @unsorted ;
104
105       "ref_in/ref_out"
106
107       This boolean arguments specifies that the input to and output from the
108       sort sub will be array references. "ref_in" makes the sorter only take
109       as input a single array reference (which contains the unsorted
110       records). "ref_out" makes the sorter only return a single array
111       reference (which contains the sorted records). You can set both of
112       these options in a sorter.
113
114       Note: This does not affect key extraction code which still gets each
115       record in $_. It only modifies the I/O of the generated sorter.
116
117               # input records are in an array reference
118               my $sorter = make_sorter( qw( ref_in ), ... ) ;
119               @sorted_array = $sorter->( \@unsorted_input ) ;
120
121               # sorted output records are in an array reference
122               my $sorter = make_sorter( ref_out => 1, ... ) ;
123               $sorted_array_ref = $sorter->( @unsorted_input ) ;
124
125               # input and output records are in array references
126               my $sorter = make_sorter( qw( ref_in ref_out ), ... ) ;
127               $sorted_array_ref = $sorter->( \@unsorted_input ) ;
128
129       "string_data"
130
131       This boolean argument specifies that the input records will be plain
132       text strings with no null (0x0) bytes in them.  It is only valid for
133       use with the GRT and it is ignored for the other sort styles. It tells
134       the GRT that it can put the record directly into the string cache and
135       it will be separated from the packed keys with a null byte (hence that
136       restriction). This is an optimization that can run slightly faster than
137       the normal index sorting done with the GRT. Run this to see the
138       benchmark results.
139
140               perl t/string_data.t -bench
141
142       "init_code"
143
144       This value argument is code that will be put into the beginning of the
145       generated sorter subroutine. It is meant to be used to declare lexical
146       variables that the extraction code can use. Normally different
147       extraction code have no way to share common code. By declaring lexicals
148       with the "init_code" option, some key extraction code can save data
149       there for use by another key. This is useful if you have two (or more)
150       keys that share a complex piece of code such as accessing a deep value
151       in a record tree.
152
153       For example, suppose the input record is an array of arrays of hashes
154       of strings and the string has 2 keys that need to be grabbed by a
155       regex. The string is a string key, a ':' and a number key. So the
156       common part of the key extraction is:
157
158               $_->[0][0]{a}
159
160       And the make_sorter call is:
161
162               my $sorter = make_sorter(
163                       'ST',
164                       init_code => 'my( $str, $num ) ;',
165                       string => 'do{( $str, $num ) =
166                               $_->[0][0]{a} =~ /^(\w+):(\d+)$/; $str}',
167                       number => '$num'
168               ) ;
169
170       In the above code both keys are extracted in the first key extraction
171       code and the number key is saved in $num. The second key extraction
172       code just uses that saved value.
173
174       Note that "init_code" is only useful in the ST and GRT sort styles as
175       they process all the keys of a record at one time and can use variables
176       declared in "init_code" to transfer data to later keys. The plain and
177       orcish sorts may not process a later key at the same time as an earlier
178       key (that only happens when the earlier key is compared to an equal
179       key). Also for "init_code" to be a win, the data set must be large
180       enough and the work to extract the keys must be hard enough for the
181       savings to be noticed. The test init_code.t shows some examples and you
182       can see the speedup when you run:
183
184               perl t/init_code.t -bench
185
186   Key Description Arguments
187       Sorting data requires that records be compared in some way so they can
188       be put into a proper sequence. The parts of the records that actually
189       get compared are called its keys. In the simplest case the entire
190       record is the key, as when you sort a list of numbers or file names.
191       But in many cases the keys are embedded in the full record and they
192       need to be extracted before they can be used in comparisons.
193       Sort::Maker uses key descriptions that extract the key from the record,
194       and optional other attributes that will help optimize the sorting
195       operation. This section will explain how to pass key description
196       arguments to the make_sorter subroutine and what the various attributes
197       mean and how to best use them.
198
199       The generated sorter will sort the records according to the order of
200       the key arguments. The first key is used to compare a pair of records
201       and if they are deemed equal, then the next key is examined. This
202       happens until the records are given an ordering or you run out of keys
203       and the records are deemed equal in sort order.  Key descriptions can
204       be mixed with the other arguments which can appear in any order and
205       anywhere in the argument list, but the keys themselves must be in the
206       desired order.
207
208       A key argument is either 'string' or 'number' followed by optional
209       attributes. The key type sets the way that the key is compared (e.g.
210       using 'cmp' or '<=>').  All key attributes can be set from the default
211       values in the global arguments or set in each individual key
212       description.
213
214       There are 4 ways to provide attributes to a key:
215
216       No attributes
217
218       A key argument which is either at the end of the argument list or is
219       followed by a valid keyword token has no explict attributes. This key
220       will use the default attributes.  In both of these examples, a default
221       attribute was set and used by the key description which is just a
222       single key argument.
223
224               # sort the record as a single number in descending order
225               my $sorter = make_sorter( qw( plain number descending ) ) ;
226
227               # sort the record as a case sensitive string
228               my $sorter = make_sorter( qw( plain case string ) ) ;
229
230               # sort the record as a single number in ascending order
231               my $sorter = make_sorter( qw( ST number ) ) ;
232
233       Only Code as a Value
234
235       A key argument which is followed by a scalar value which is not a valid
236       keyword token, will use that scalar value as its key extraction code.
237       See below for more on key extraction code.
238
239               # sort by the first (optionally signed) number matched
240               my $sorter = make_sorter( qw( plain number /([+-]?\d+)/ ) ) ;
241
242               # string sort by the 3rd field in the input records (array refs)
243               my $sorter = make_sorter( 'ST', string => '$_->[2]' ) ;
244
245       An Array Reference
246
247       A key argument which is followed by an array reference will parse that
248       array for its description attributes. As with the general boolean
249       arguments, any boolean attribute can be optionally followed by a '1'.
250       Value attributes must be followed by their value.
251
252               # another way to specify the same sort as above
253               # sort by the first (optionally signed) number matched
254
255               my $sorter = make_sorter(
256                       qw( plain ),
257                       number => [
258                               code => '/(\d+)/',
259                               'descending',
260                       ],
261               ) ;
262
263               # same sort but for the GRT which uses the 'unsigned'
264               # attribute to optimize the sort.
265
266               my $sorter = make_sorter(
267                       qw( GRT ),
268                       number => [
269                               qw( descending unsigned ),
270                               code => '/(\d+)/',
271                       ],
272               ) ;
273
274       A Hash Reference
275
276       A key argument which is followed a hash reference will use that hash as
277       its description attributes. Any boolean attribute in the hash must have
278       a value of '1'.  Value attributes must be followed by their value.
279
280               # another way to specify the same sort as above
281               # sort by the first (optionally signed) number matched
282
283               my $sorter = make_sorter(
284                       qw( plain ),
285                       number => {
286                               code => '/(\d+)/',
287                               descending => 1,
288                       },
289               ) ;
290
291               # a multi-key sort. the first key is a descending unsigned
292               # integer and the second is a string padded to 10 characters
293
294               my $sorter = make_sorter(
295                       qw( GRT ),
296                       number => {
297                               code => '/(\d+)/',
298                               descending => 1,
299                               unsigned => 1,
300                       },
301                       string => {
302                               code => '/FOO<(\w+)>/',
303                               fixed => 10,
304                       },
305               ) ;
306
307   Key Description Attributes
308       What follows are the attributes for key descriptions. Most use the
309       default values passed in the arguments to "make_sorter".
310
311       "code"
312
313       This value attribute is the code that will be used to extract a key
314       from the input record. It can be a string of Perl code, a qr// regular
315       expression (Regexp reference) or an anonymous sub (CODE reference) that
316       operates on $_ and extracts a value.  The code will be wrapped in a
317       do{} block and called in a list context so that regular expressions can
318       just use () to grab a key value. The code defaults to $_ which means
319       the entire record is used for this key. You can't set the default for
320       code (unlike all the other key attributes). See the section on
321       Extraction Code for more.
322
323               # make an ST sort of the first number grabbed in descending order
324
325               my $sorter = make_sorter(
326                       qw( ST ),
327                       number => {
328                               code    => '/(\d+)/',
329                               descending => 1,
330                       },
331               ) ;
332
333       "ascending/descending"
334
335       These two Boolean attributes control the sorting order for this key. If
336       a key is marked as "ascending" (which is the initial default for all
337       keys), then lower keys will sort before higher keys. "descending" sorts
338       have the higher keys sort before the lower keys. It is illegal to have
339       both set in the defaults or in any key.
340
341               # sort by descending order of the first grabbed number
342               # and then sort in ascending order the first grabbed <word>
343
344               my $sorter = make_sorter(
345                       qw( ST descending ),
346                       number => {
347                               code    => '/(\d+)/',
348                       },
349                       string => {
350                               code    => '/<(\w+)>/',
351                               ascending => 1,
352                       },
353               ) ;
354
355               # this will return undef and store an error in $@.
356               # you can't have both 'ascending' and 'descending' as defaults
357
358               my $sorter = make_sorter(
359                       qw( ST ascending descending ),
360                       number => {
361                               code    => '/(\d+)/',
362                               descending => 1,
363                       },
364               ) ;
365
366               # this will return undef and store an error in $@.
367               # you can't have both 'ascending' and 'descending' in a key
368
369               my $sorter = make_sorter(
370                       qw( ST )
371                       number => {
372                               code    => '/(\d+)/',
373                               descending => 1,
374                               ascending => 1,
375                       },
376               ) ;
377
378       "case/no_case"
379
380       These two Boolean attributes control how 'string' keys handle case
381       sensitivity. If a key is marked as "case" (which is the initial default
382       for all keys), then keys will treat upper and lower case letters as
383       different.  If the key is marked as "no_case" then they are treated as
384       equal.  It is illegal to have both set in the defaults or in any key.
385       Internally this uses the uc() function so you can use locale settings
386       to affect string sorts.
387
388               # sort by the first grabbed word with no case
389               # and then sort the grabbed <word> with case
390
391               my $sorter = make_sorter(
392                       qw( ST no_case ),
393                       string => {
394                               code    => '/(\w+)/',
395                       },
396                       string => {
397                               code    => '/<(\w+)>/',
398                               case => 1,
399                       },
400               ) ;
401
402               # this will return undef and store an error in $@.
403               # you can't have both 'case' and 'no_case' as defaults
404
405               my $sorter = make_sorter(
406                       qw( ST no_case case ),
407                       string => {
408                               code    => '/(\w+)/',
409                       },
410               ) ;
411
412               # this will return undef and store an error in $@.
413               # you can't have both 'case' and 'no_case' in a key
414
415               my $sorter = make_sorter(
416                       qw( ST )
417                       string => {
418                               code    => '/(\w+)/',
419                               no_case => 1,
420                               case    => 1,
421                       },
422               ) ;
423
424       "closure"
425
426       This Boolean attribute causes this key to use call its CODE reference
427       to extract its value. This is useful if you need to access a lexical
428       variable during the key extraction. A typical use would be if you have
429       a sorting order stored in a lexical and need to access that from the
430       extraction code. If you didn't set the "closure" attribute for this
431       key, the generated source (see Key Extraction) would not be able to see
432       that lexical which will trigger a Perl compiling error in make_sorter.
433
434               my @months = qw(
435                       January February March April May June
436                       July August September October November December ) ;
437               my @month_jumble = qw(
438                       February June October March January April
439                       July November August December May September ) ;
440
441               my %month_to_num ;
442               @month_to_num{ @months } = 1 .. @months ;
443
444       # this will fail to generate a sorter if 'closure' is removed # as
445       %month_to_num will not be in scope to the eval inside sort_maker.
446
447               my $sorter = make_sorter(
448                       'closure',
449                       number => sub { $month_to_num{$_} },
450               ) ;
451
452               my @sorted = $sorter->( @month_jumble ) ;
453
454       "signed/unsigned/signed_float/unsigned_float" (GRT only)
455
456       These Boolean attributes are only used by the GRT sort style. They are
457       meant to describe the type of a number key so that the GRT can best
458       process and cache the key's value. It is illegal to have more than one
459       of them set in the defaults or in any key. See the section on GRT
460       sorting for more.
461
462       The "signed" and "unsigned" attributes mark this number key as an
463       integer. The GRT does the least amount of work processing an unsigned
464       integer and only slightly more work for a signed integer. It is worth
465       using these attributes if a sort key is restricted to integers.
466
467       The "signed_float" (which is the normal default for all keys) and
468       "unsigned_float" attributes mark this number key as a float. The GRT
469       does the less work processing an unsigned float then a signed float.
470       It is worth using the "unsigned_float" attribute if a sort key is
471       restricted to non-negative values. The "signed_float" attribute is
472       supported to allow overriding defaults and to make it easier to auto-
473       generate sorts.
474
475       "fixed/varying" (GRT only)
476
477       These attributes are only used by the GRT sort style. They are used to
478       describe the type of a string key so that the GRT can properly process
479       and cache the key's value. It is illegal to have more than one of them
480       set in the defaults or in any key. See the section on GRT sorting for
481       more.
482
483       "fixed" is a value attribute that marks this string key as always being
484       this length. The extracted value will either be padded with null (0x0)
485       bytes or truncated to the specified length (the value of "fixed"). The
486       data in this key may have embedded null bytes (0x0) and may be sorted
487       in descending order.
488
489       "varying" is a Boolean attribute marks this string key as being of
490       varying lengths. The GRT sorter will do a scan of all of this key's
491       values to find the maximum string length and then it pads all the
492       extracted values to that length. The data in this key may have embedded
493       null bytes (0x0) and may be sorted in descending order.
494
495   Key Extraction Code
496       Each input record must have its sort keys extracted from the data.
497       This is the purpose of the 'code' attribute in key descriptions.  The
498       code has to operate on a record which is in $_ and it must return the
499       key value. The code is executed in a list context so you can use grabs
500       in m// to return the key. Note that only the first grab will be used
501       but you shouldn't have more than one anyway. See the examples below.
502
503       Code can be either a string, a qr// object (Regexp reference) or an
504       anonymous sub (CODE reference).
505
506       If qr// is used, the actual generated code will be m($qr) which works
507       because qr// will interpolate to its string representation. The
508       advantage of qr// over a string is that the qr// will be syntax checked
509       at compile time while the string only later when the generated sorter
510       is compiled by an eval.
511
512       If a CODE reference is found, it is used to extract the key in the
513       generated sorter. As with qr//, the advantage is that the extraction
514       code is syntax checked at compile time and not runtime. Also the
515       deparsed code is wrapped in a "do{}" block so you may use complex code
516       to extract the key. In the default case a CODE reference will be
517       deparsed by the B::Deparse module into Perl source. If the key has the
518       "closure" attribute set, the code will be called to extract the key.
519
520       The following will generate sorters with exact same behavior:
521
522               $sorter = make_sorter( 'ST', string => '/(\w+)/' ) ;
523               $sorter = make_sorter( 'ST', string => qr/(\w+)/ ) ;
524               $sorter = make_sorter( 'ST', string => sub { /(\w+)/ } ) ;
525               $sorter = make_sorter( 'ST', 'closure', string => sub { /(\w+)/ } ) ;
526
527       Extraction code for a key can be set in one of three ways.
528
529       No explicit code
530
531       If you don't pass any extraction code to a key, it will default to $_
532       which is the entire record. This is useful in certain cases such as in
533       simple sorts where you are sorting the entire record.
534
535               # sort numerically and in reverse order
536               my $sorter = make_sorter( qw( plain number descending ) ;
537
538               # sort with case folding
539               my $sorter = make_sorter( qw( plain no_case string ) ;
540
541               # sort by file time stamp and then by name
542               my $sorter = make_sorter( 'ST', number => '-M', 'string' ) ;
543
544       Code is the only key attribute
545
546       In many cases you don't need to specify any specific key attributes
547       (the normal or globally set defaults are fine) but you need extraction
548       code. If the argument that follows a key type ( 'string' or 'number' )
549       is not a valid keyword, it will be assumed to be the extraction code
550       for that key.
551
552               # grab the first number string as the key
553               my $sorter = make_sorter( qw( plain number /(\d+)/ ) ) ;
554
555               # no_case string sort on the 3rd-5th chars of the 2nd array element
556               my $sorter = make_sorter(
557                       plain   => 1,
558                       no_case => 1,
559                       string  => 'substr( $_->[1], 2, 3)'
560               ) ;
561
562       Key needs specific attributes
563
564       When the key needs to have its own specific attributes other than its
565       code, you need to pass them in an ARRAY or HASH reference. This is
566       mostly needed when there are multiple keys and the defaults are not
567       correct for all the keys.
568
569               # string sort by the first 3 elements of the array record with
570               # different case requirements
571
572               my $sorter = make_sorter(
573                       ST      => 1,
574                       string  => {
575                               code    => '$_->[0]',
576                               no_case => 1,
577                       },
578                       string  => '$_->[1]',
579                       string  => {
580                               code    => '$_->[2]',
581                               no_case => 1,
582                       },
583               ) ;
584
585               # GRT sort with multiple unsigned integers and padded strings
586               # note that some keys use a hash ref and some an array ref
587               # the record is marked with key\d: sections
588               my $sorter = make_sorter(
589                       GRT     => 1,
590                       descending => 1,
591                       number  => {
592                               code    => 'key1:(\d+)',
593                               unsigned => 1,
594                       },
595                       number  => [
596                               code    => 'key2:([+-]?\d+)',
597                               qw( signed ascending ),
598                       ],
599                       string  => [
600                               code    => 'key3:(\w{10})',
601                               fixed => 1,
602                               ascending => 1,
603                       ],
604                       # pad the extracted keys to 8 chars
605                       string  => {
606                               code    => 'key4:([A-Z]+)',
607                               pad => 8,
608                       },
609               ) ;
610

Key Caching

612       A good question to ask is "What speed advantages do you get from this
613       module when all the sorts generated use Perl's internal sort function?"
614       The sort function has a O( N * log N ) growth function which means that
615       the amount of work done increases by that formula as N (the number of
616       input records) increases. In a plain sort this means the the key
617       extraction code is executed N * log N times when you only have N
618       records. That can be a very large waste of cpu time. So the other three
619       sort styles speed up the overall sort by only executing the extraction
620       code N times by caching the extracted keys. How they cache the keys is
621       their primary difference. To compare or study the actual code generated
622       for the different sort styles, you can run make_sorter and just change
623       the style. Then call sorter_source (not exported by default) and pass
624       it the sort code reference returned by make_sorter. It will return the
625       generated sort source.
626
627   "plain"
628       Plain sorting doesn't do any key caching. It is fine for short input
629       lists (see the Benchmark section) and also as a way to see how much CPU
630       is saved when using one of the other styles.
631
632   "orcish"
633       The Orcish maneuvre (created by Joseph Hall) caches the extracted keys
634       in a hash. It does this with code like this:
635
636               $cache{$a} ||= CODE($a) ;
637
638       CODE is the extract code and it operates on a record in $a. If we have
639       never seen this record before then the cache entry will be undef and
640       the ||= operator will assign the extracted key to that hash slot. The
641       next time this record is seen in a comparison, the saved extracted key
642       will be found in the hash and used. The name orcish comes from OR-
643       cache.
644
645   "ST"
646       The ST (Schwartzian Transform and popularized by Randal Schwartz) uses
647       an anonymous array to store the record and its extracted keys. It first
648       executes a map that creates an anonymous array:
649
650               map [ $_, CODE1( $_ ), CODE2( $_ ) ], @input
651
652       The CODE's extract the set of keys from the record but only once per
653       record so it is O(N). Now the sort function can just do the comparisons
654       and it returns a list of sorted anonymous arrays.
655
656               sort {
657                       $a->[1] cmp $b->[1]
658                               ||
659                       $a->[2] cmp $b->[2]
660               }
661
662       Finally, we need to get back the original records which are in the
663       first slot of the anonymous array:
664
665               map $_->[0]
666
667       This is why the ST is known as a map/sort/map technique.
668
669   "GRT"
670       The Guttman-Rosler Transform (popularized by Uri Guttman and Larry
671       Rosler) uses a string to cache the extracted keys as well as either the
672       record or its index. It is also a map/sort/map technique but because
673       its cache is a string, it can be sorted without any Perl level callback
674       (the {} block passed to sort). This is a signifigant win since that
675       callback is running O( N log N). But this speedup comes at a cost of
676       complexity. You can't just join the keys into a string and properly
677       sort them. Each key may need to be processed so that it will correctly
678       sort in order and it doesn't interfere with other keys. That is why the
679       GRT has several key attributes to enable it to properly and efficiently
680       pack the sort keys into a single string. The following lists the GRT
681       key attributes, when you need them and what key processing is done for
682       each.  Note that you can always enable the GRT specific attributes as
683       they are just ignored by the other sort styles.
684
685       The GRT gains its speed by using a single byte string to cache all of
686       the extracted keys from a given input record. Packing keys into a
687       string such that it will lexically sort the correct way requires some
688       deep mojo and data munging. But that is why this module was written -
689       to hide all that from the coder. Below are descriptions of how the
690       various key types are packed and how to best use the GRT specific key
691       attributes.  Note: you can only use one of the GRT number or string
692       attributes for any key. Setting more than one in either the defaults or
693       in any given key is an error (a key's attribute can override a default
694       choice).
695
696       "unsigned"
697
698       The 'unsigned' Boolean attribute tells the GRT that this number key is
699       a non-negative integer. This allows the GRT to just pack it into 4
700       bytes using the N format (network order - big endian). An integer
701       packed this way will have its most significant bytes compared before
702       its least signifigant bytes. This involves the least amount of key
703       munging and so it is the most efficient way to sort numbers in the GRT.
704
705       If you want this key to sort in descending order, then the key value is
706       negated and normalized (see the 'signed' attribute) so there is no
707       advantage to using 'unsigned'.
708
709       "signed"
710
711       The 'signed' Boolean attribute tells the GRT that this number key is an
712       integer. This allows the GRT to just pack it into 4 bytes using the N
713       format (network order - big endian).  The key value must first be
714       normalized which will convert it to an unsigned integer but with the
715       same ordering as a signed integer. This is simply done by inverting the
716       sign (highest order) bit of the integer. As mentioned above, when
717       sorting this key in descending order, the GRT just negates the key
718       value.
719
720       NOTE: In the GRT the signed and unsigned integer attributes only work
721       on perl built with 32 bit integers. This is due to using the N format
722       of pack which is specified to be 32 bits. A future version may support
723       64 bit integers (anyone want to help?).
724
725       "unsigned_float"
726
727       The 'unsigned_float' Boolean attribute tells the GRT that this number
728       key is a non-negative floating point number. This allows the GRT to
729       pack it into 8 bytes using the 'd' format. A float packed this way will
730       have its most significant bytes compared before its least signifigant
731       bytes.
732
733       "signed_float"
734
735       The "signed_float" Boolean attribute (which is the default for all
736       number keys when using the GRT) tells the GRT that this number key is a
737       floating point number. This allows the GRT to pack it into 8 bytes
738       using the 'd' format. A float packed this way will have its most
739       significant bytes compared before its least signifigant bytes. When
740       processed this key will be normalized to an unsigned float similar to
741       to the "signed" to "unsigned" conversion mentioned above.
742
743       NOTE: The GRT only works with floats that are in the IEEE format for
744       doubles. This includes most modern architectures including x86, sparc,
745       powerpc, mips, etc. If the cpu doesn't have IEEE floats you can either
746       use the integer attributes or select another sort style (all the others
747       have no restriction on float formats).
748
749       simple string.
750
751       If a string key is being sorted in ascending order with the GRT and it
752       doesn't have one of the GRT string attributes, it will be packed
753       without any munging and a null (0x0) byte will be appended to it. This
754       byte enables a shorter string to sort before longer ones that start
755       with the shorter string.
756
757       NOTE: You cannot sort strings in descending order in the GRT unless the
758       key has either the 'fixed' or 'varying' attributes set. Also, if a
759       string is being sorted in ascending order but has any null (0x0) bytes
760       in it, the key must have one of those attributes set.
761
762       "fixed"
763
764       This value attribute tells the GRT to pack this key value as a fixed
765       length string. The extracted value will either be padded with null
766       (0x0) bytes or truncated to the specified length (the value of the
767       "fixed" attribute). This means it can be packed into the cache string
768       with no padding and no trailing null byte is needed. The key can
769       contain any data including null (0x0) bytes. Data munging happens only
770       if the key's sort order is descending. Then the key value is xor'ed
771       with a same length string of 0xff bytes. This toggles each bit which
772       allows for a lexical comparison but in the reverse order. This same bit
773       inversion is used for descending varying strings.
774
775       "varying"
776
777       This Boolean attribute tells the GRT that this key value is a varying
778       length string and has no predetermined padding length. A prescan is
779       done to determine the maximum string length for this key and that is
780       used as the padding length. The rest is the same as with the 'fixed'
781       attribute.
782
783   "sorter_source"
784       This sub (which can be exported) returns the source of a generated sort
785       sub or the source of the last one that had an error. To get the source
786       of an existing sort sub, pass it a reference to that sub (i.e.  the
787       reference returned from make_sorter). To get the source for a failed
788       call to make_sorter, don't pass in any arguments.
789
790               my $sorter = make_sorter( ... ) ;
791               print sorter_source( $sorter ) ;
792
793               make_sorter( name => 'my_sorter', ... ) ;
794               print sorter_source( \&my_sorter ) ;
795
796               my $sorter = make_sorter( ... )
797                       or die "make_sorter error: $@\n", sorter_source();
798
799       If all you want is the generated source you can just do:
800
801               print sorter_source make_sorter( ... ) ;
802
803   Error Handling
804       When "make_sorter" detects an error (either bad arguments or when the
805       generated sorter won't compile), it returns undef and set $@ to an
806       error message. The error message will include the generated source and
807       compiler and warning errors if the sorter didn't compile correctly.
808       The test t/errors.t covers all the possible error messages.  You can
809       also retrieve the generated source after a compiling error by calling
810       "sorter_source".
811

TESTS

813       "Sort::Maker" uses a table of test configurations that can both run
814       tests and benchmarks. Each test script is mostly a table that generates
815       multiple versions of the sorters, generate sample data and compares the
816       sorter results with a sort that is known to be good. If you run the
817       scripts directly and with a -bench argument, then they generate the
818       same sorter subs and benchmark them. This design ensures that
819       benchmarks are running on correctly generated code and it makes it very
820       easy to add more test and benchmark variations. The code that does all
821       the work is in t/common.pl. Here is a typical test table entry:
822
823               {
824                       skip    => 0,
825                       source  => 0,
826                       name    => 'init_code',
827                       gen     => sub { rand_choice( @string_keys ) . ':' .
828                                        rand_choice( @number_keys ) },
829                       gold    => sub {
830                                ($a =~ /^(\w+)/)[0] cmp ($b =~ /^(\w+)/)[0]
831                                               ||
832                                ($a =~ /(\d+$)/)[0] <=> ($b =~ /(\d+$)/)[0]
833                       },
834                       args    => [
835                               init_code => 'my( $str, $num ) ;',
836                               string => 'do{( $str, $num ) = /^(\w+):(\d+)$/; $str}',
837                               number => '$num',
838                       ],
839               },
840
841       "skip" is a boolean that causes this test/benchmark to be skipped.
842       Setting "source" causes the sorter's source to be printed out.  "gen"
843       is a sub that generates a single input record. There are support subs
844       in t/common.pl that will generate random data. Some tests have a "data"
845       field which is fixed data for a test (instead of the generated data).
846       The <gold> field is a comparision subroutine usable by the sort
847       function. It is used to sort the test data into a golden result which
848       is used to compare against all the generated sorters.  "args" is an
849       anonymous array of arguments for a sorter or a hash ref with multiple
850       named/args pairs. See t/io.t for an example of that.
851

BENCHMARKS

EXPORT

854       This module always exports the "make_sorter" sub.  It can also
855       optionally export "sorter_source".
856

BUGS

858       Sort::Maker GRT currently works only with 32 bit integers due to pack N
859       format being exactly 32 bits. If someone with a 64 bit Perl wants to
860       work on using the Q format or the ! suffix and dealing with endian
861       issues, I will be glad to help and support it. It would be best if
862       there was a network (big endian) pack format for quads/longlongs but
863       that can be done similarly to how floats are packed now.
864

AUTHOR

866       Uri Guttman, <uri@stemsystems.com>
867

ACKNOWLEDGEMENTS

869       I would like to thank the inimitable Damian Conway for his help in the
870       API design, the POD, and for being a good Perl friend.
871
872       And thanks to Boston.pm for the idea of allowing qr// for key
873       extraction code.
874
875
876
877perl v5.28.1                      2006-12-29                    Sort::Maker(3)
Impressum