PERLFAQ4(1)            Perl Programmers Reference Guide            PERLFAQ4(1)

NAME

       perlfaq4 - Data Manipulation

DESCRIPTION

       This section of the FAQ answers questions related to manipulating
       numbers, dates, strings, arrays, hashes, and miscellaneous data issues.

Data: Numbers

   Why am I getting long decimals (eg, 19.9499999999999) instead of the
       numbers I should be getting (eg, 19.95)?
       For the long explanation, see David Goldberg's "What Every Computer
       Scientist Should Know About Floating-Point Arithmetic"
       (http://docs.sun.com/source/806-3568/ncg_goldberg.html).

       Internally, your computer represents floating-point numbers in binary.
       Digital (as in powers of two) computers cannot store all numbers
       exactly.  Some real numbers lose precision in the process.  This is a
       problem with how computers store numbers and affects all computer
       languages, not just Perl.

       perlnumber shows the gory details of number representations and
       conversions.

       To limit the number of decimal places in your numbers, you can use the
       "printf" or "sprintf" function.  See "Floating Point Arithmetic" in
       perlop for more details.

               printf "%.2f", 10/3;

               my $number = sprintf "%.2f", 10/3;
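
       Because the stored value is only an approximation, comparing
       floating-point numbers with "==" can surprise you.  The following is
       a minimal sketch of two common workarounds: comparing the rounded
       strings, or comparing within a tolerance.  The variable names and the
       tolerance of 1e-9 are illustrative choices, not anything Perl
       prescribes.

```perl
use strict;
use warnings;

my $sum = 0.1 + 0.2;    # usually not exactly 0.3 in binary floating point

# Compare the values after rounding both to two decimal places.
my $same_when_rounded = sprintf("%.2f", $sum) eq sprintf("%.2f", 0.3);

# Or treat the values as equal when they differ by less than a tolerance
# (1e-9 is an arbitrary choice for this example).
my $same_within_tolerance = abs($sum - 0.3) < 1e-9;

printf "%.17f\n", $sum;    # shows the digits hiding behind "0.3"
print "equal when rounded\n"     if $same_when_rounded;
print "equal within tolerance\n" if $same_within_tolerance;
```

       Which approach fits depends on whether you care about the displayed
       value or about the size of the error.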

   Why is int() broken?
       Your "int()" is most probably working just fine.  It's the numbers that
       aren't quite what you think.

       First, see the answer to "Why am I getting long decimals (eg,
       19.9499999999999) instead of the numbers I should be getting (eg,
       19.95)?".

       For example, this

               print int(0.6/0.2-2), "\n";

       will on most computers print 0, not 1, because even such simple numbers
       as 0.6 and 0.2 cannot be represented exactly by floating-point numbers.
       What you think of above as 'three' is really more like
       2.9999999999999995559.
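
       If what you want is the nearest integer rather than truncation, round
       the value instead of calling "int()" on it.  A small sketch using
       "sprintf" (the variable names are only for illustration):

```perl
use strict;
use warnings;

my $value = 0.6 / 0.2 - 2;    # mathematically 1, typically stored as slightly less

my $truncated = int($value);               # drops the fractional part
my $nearest   = sprintf("%.0f", $value);   # rounds to the nearest integer

print "int:     $truncated\n";
print "sprintf: $nearest\n";
```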

   Why isn't my octal data interpreted correctly?
       (contributed by brian d foy)

       You're probably trying to convert a string to a number, which Perl only
       converts as a decimal number. When Perl converts a string to a number,
       it ignores leading spaces and zeroes, then assumes the rest of the
       digits are in base 10:

               my $string = '0644';

               print $string + 0;  # prints 644

               print $string + 44; # prints 688, certainly not octal!

       This problem usually involves one of the Perl built-ins that has the
       same name as a Unix command that uses octal numbers as arguments on the
       command line. In this example, "chmod" on the command line knows that
       its first argument is octal because that's what it does:

               %prompt> chmod 644 file

       If you want to use the same literal digits (644) in Perl, you have to
       tell Perl to treat them as octal numbers either by prefixing the digits
       with a 0 or using "oct":

               chmod(     0644, $file);   # right, has leading zero
               chmod( oct(644), $file );  # also correct

       The problem comes in when you take your numbers from something that
       Perl thinks is a string, such as a command line argument in @ARGV:

               chmod( $ARGV[0],      $file);   # wrong, even if "0644"

               chmod( oct($ARGV[0]), $file );  # correct, treat string as octal

       You can always check the value you're using by printing it in octal
       notation to ensure it matches what you think it should be. Print it in
       octal and decimal format:

               printf "0%o %d", $number, $number;
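
       When the mode comes from user input, it can help to validate the
       string before handing it to "oct".  The sketch below assumes a
       hypothetical helper named "parse_mode" that accepts three or four
       octal digits with an optional leading zero; adjust the pattern to
       whatever your program should accept.

```perl
use strict;
use warnings;

# parse_mode() is a hypothetical helper: it accepts strings such as
# "644" or "0644" and returns the numeric mode, or undef otherwise.
sub parse_mode {
    my ($text) = @_;
    return undef unless defined $text && $text =~ /\A0?[0-7]{3,4}\z/;
    return oct $text;
}

my $mode = parse_mode("0644");
printf "0%o %d\n", $mode, $mode;    # octal and decimal views of one value
```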

   Does Perl have a round() function?  What about ceil() and floor()?  Trig
       functions?
       Remember that "int()" merely truncates toward 0.  For rounding to a
       certain number of digits, "sprintf()" or "printf()" is usually the
       easiest route.

               printf("%.3f", 3.1415926535);   # prints 3.142

       The "POSIX" module (part of the standard Perl distribution) implements
       "ceil()", "floor()", and a number of other mathematical and
       trigonometric functions.

               use POSIX;
               $ceil   = ceil(3.5);   # 4
               $floor  = floor(3.5);  # 3

       In 5.000 to 5.003 perls, trigonometry was done in the "Math::Complex"
       module.  With 5.004, the "Math::Trig" module (part of the standard Perl
       distribution) implements the trigonometric functions. Internally it
       uses the "Math::Complex" module and some functions can break out from
       the real axis into the complex plane, for example the inverse sine of
       2.

       Rounding in financial applications can have serious implications, and
       the rounding method used should be specified precisely.  In these
       cases, it probably pays not to trust whichever system rounding is being
       used by Perl, but to instead implement the rounding function you need
       yourself.

       To see why, notice how you'll still have an issue on half-way-point
       alternation:

               for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}

               0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
               0.8 0.8 0.9 0.9 1.0 1.0

       Don't blame Perl.  It's the same as in C.  IEEE says we have to do
       this. Perl numbers whose absolute values are integers under 2**31 (on
       32 bit machines) will work pretty much like mathematical integers.
       Other numbers are not guaranteed.
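
       As one possible shape for a rounding function of your own, here is a
       sketch of "round half up" to the nearest integer built on "floor()"
       from "POSIX".  The helper name "round_half_up" is hypothetical, and
       half-up is only one policy; your application may call for a
       different one.  Values that are not exactly representable can still
       land on the unexpected side of a half, which is why the example
       works in whole cents.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# round_half_up() is a hypothetical helper: exact halves always go
# toward positive infinity.
sub round_half_up {
    my ($number) = @_;
    return floor($number + 0.5);
}

# Money is easier to reason about as a whole number of cents.
my $cents = round_half_up(19.95 * 100);

print "$cents cents\n";
print round_half_up(2.5), " ", round_half_up(-2.5), "\n";
```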

   How do I convert between numeric representations/bases/radixes?
       As always with Perl there is more than one way to do it.  Below are a
       few examples of approaches to making common conversions between number
       representations.  This is intended to be representational rather than
       exhaustive.

       Some of the examples later in perlfaq4 use the "Bit::Vector" module
       from CPAN. The reason you might choose "Bit::Vector" over the perl
       built in functions is that it works with numbers of ANY size, that it
       is optimized for speed on some operations, and for at least some
       programmers the notation might be familiar.

       How do I convert hexadecimal into decimal
           Using perl's built in conversion of "0x" notation:

                   $dec = 0xDEADBEEF;

           Using the "hex" function:

                   $dec = hex("DEADBEEF");

           Using "pack":

                   $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));

           Using the CPAN module "Bit::Vector":

                   use Bit::Vector;
                   $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
                   $dec = $vec->to_Dec();

       How do I convert from decimal to hexadecimal
           Using "sprintf":

                   $hex = sprintf("%X", 3735928559); # upper case A-F
                   $hex = sprintf("%x", 3735928559); # lower case a-f

           Using "unpack":

                   $hex = unpack("H*", pack("N", 3735928559));

           Using "Bit::Vector":

                   use Bit::Vector;
                   $vec = Bit::Vector->new_Dec(32, -559038737);
                   $hex = $vec->to_Hex();

           And "Bit::Vector" supports odd bit counts:

                   use Bit::Vector;
                   $vec = Bit::Vector->new_Dec(33, 3735928559);
                   $vec->Resize(32); # suppress leading 0 if unwanted
                   $hex = $vec->to_Hex();

       How do I convert from octal to decimal
           Using Perl's built in conversion of numbers with leading zeros:

                   $dec = 033653337357; # note the leading 0!

           Using the "oct" function:

                   $dec = oct("33653337357");

           Using "Bit::Vector":

                   use Bit::Vector;
                   $vec = Bit::Vector->new(32);
                   $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
                   $dec = $vec->to_Dec();

       How do I convert from decimal to octal
           Using "sprintf":

                   $oct = sprintf("%o", 3735928559);

           Using "Bit::Vector":

                   use Bit::Vector;
                   $vec = Bit::Vector->new_Dec(32, -559038737);
                   $oct = reverse join('', $vec->Chunk_List_Read(3));

       How do I convert from binary to decimal
           Perl 5.6 lets you write binary numbers directly with the "0b"
           notation:

                   $number = 0b10110110;

           Using "oct":

                   my $input = "10110110";
                   $decimal = oct( "0b$input" );

           Using "pack" and "ord":

                   $decimal = ord(pack('B8', '10110110'));

           Using "pack" and "unpack" for larger strings:

                   $int = unpack("N", pack("B32",
                   substr("0" x 32 . "11110101011011011111011101111", -32)));
                   $dec = sprintf("%d", $int);

                   # substr() is used to left pad a 32 character string with zeros.

           Using "Bit::Vector":

                   use Bit::Vector;
                   $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
                   $dec = $vec->to_Dec();

       How do I convert from decimal to binary
           Using "sprintf" (perl 5.6+):

                   $bin = sprintf("%b", 3735928559);

           Using "unpack":

                   $bin = unpack("B*", pack("N", 3735928559));

           Using "Bit::Vector":

                   use Bit::Vector;
                   $vec = Bit::Vector->new_Dec(32, -559038737);
                   $bin = $vec->to_Bin();

           The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
           are left as an exercise to the inclined reader.
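
       One way to approach those remaining transformations is to chain the
       built-ins shown above: read the string into a number with "hex" or
       "oct", then write it back out with "sprintf".  A short sketch, with
       variable names chosen only for this example:

```perl
use strict;
use warnings;

my $hex = "DEADBEEF";

my $oct_from_hex = sprintf("%o", hex($hex));                 # hex -> oct
my $bin_from_hex = sprintf("%b", hex($hex));                 # hex -> bin
my $hex_from_bin = sprintf("%X", oct("0b$bin_from_hex"));    # bin -> hex
my $hex_from_oct = sprintf("%X", oct($oct_from_hex));        # oct -> hex

print "$oct_from_hex\n$bin_from_hex\n$hex_from_bin\n$hex_from_oct\n";
```

       This only works for values that fit in a native integer; for larger
       numbers a module such as "Bit::Vector" is the safer route.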

   Why doesn't & work the way I want it to?
       The behavior of binary arithmetic operators depends on whether they're
       used on numbers or strings.  The operators treat a string as a series
       of bits and work with that (the string "3" is the bit pattern
       00110011).  The operators work with the binary form of a number (the
       number 3 is treated as the bit pattern 00000011).

       So, saying "11 & 3" performs the "and" operation on numbers (yielding
       3).  Saying "11" & "3" performs the "and" operation on strings
       (yielding "1").

       Most problems with "&" and "|" arise because the programmer thinks they
       have a number but really it's a string.  The rest arise because the
       programmer says:

               if ("\020\020" & "\101\101") {
                       # ...
                       }

       but a string consisting of two null bytes (the result of "\020\020" &
       "\101\101") is not a false value in Perl.  You need:

               if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
                       # ...
                       }
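
       The difference is easy to see side by side.  In this sketch the
       operands start out as strings, as they would if read from a file;
       adding 0 is one common way to get numeric operands.  The variable
       names are illustrative.

```perl
use strict;
use warnings;

my ($left, $right) = ("11", "3");    # strings, e.g. read from a file

# Both operands are strings, so "&" works character by character.
my $string_and = $left & $right;

# Adding 0 yields numbers, so "&" works on their integer values.
my $numeric_and = ($left + 0) & ($right + 0);

# A result made only of null bytes is still a true value, so look for
# any non-null byte instead of testing the string directly.
my $mask     = "\020\020" & "\101\101";
my $is_true  = $mask ? 1 : 0;
my $has_bits = $mask =~ /[^\000]/ ? 1 : 0;

print "string: $string_and, numeric: $numeric_and\n";
print "true: $is_true, has bits: $has_bits\n";
```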

   How do I multiply matrices?
       Use the "Math::Matrix" or "Math::MatrixReal" modules (available from
       CPAN) or the "PDL" extension (also available from CPAN).

   How do I perform an operation on a series of integers?
       To call a function on each element in an array, and collect the
       results, use:

               @results = map { my_func($_) } @array;

       For example:

               @triple = map { 3 * $_ } @single;

       To call a function on each element of an array, but ignore the results:

               foreach $iterator (@array) {
                       some_func($iterator);
                       }

       To call a function on each integer in a (small) range, you can use:

               @results = map { some_func($_) } (5 .. 25);

       but you should be aware that the ".." operator creates a list of all
       integers in the range.  This can take a lot of memory for large ranges.
       Instead use:

               @results = ();
               for ($i=5; $i < 500_005; $i++) {
                       push(@results, some_func($i));
                       }

       This situation has been fixed in Perl 5.005. Use of ".." in a "for"
       loop will iterate over the range, without creating the entire range.

               for my $i (5 .. 500_005) {
                       push(@results, some_func($i));
                       }

       will not create a list of 500,000 integers.

   How can I output Roman numerals?
       Get the Roman module from
       <http://www.cpan.org/modules/by-module/Roman>.
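
       If installing a module is not an option, the conversion is short
       enough to write by hand.  The sketch below is not the CPAN module's
       interface; "to_roman" is a hypothetical helper that handles whole
       numbers from 1 to 3999 using the usual subtractive notation.

```perl
use strict;
use warnings;

# to_roman() is a hypothetical helper, not the CPAN module's interface.
# It returns undef for anything outside 1..3999.
sub to_roman {
    my ($number) = @_;
    return undef
        unless defined $number
        && $number =~ /\A\d+\z/
        && $number >= 1
        && $number <= 3999;

    # Values in descending order, including the subtractive pairs.
    my @table = (
        [ 1000, 'M' ], [ 900, 'CM' ], [ 500, 'D' ], [ 400, 'CD' ],
        [ 100,  'C' ], [ 90,  'XC' ], [ 50,  'L' ], [ 40,  'XL' ],
        [ 10,   'X' ], [ 9,   'IX' ], [ 5,   'V' ], [ 4,   'IV' ],
        [ 1,    'I' ],
    );

    my $roman = '';
    for my $pair (@table) {
        my ($value, $symbol) = @$pair;
        while ($number >= $value) {
            $roman  .= $symbol;
            $number -= $value;
        }
    }
    return $roman;
}

print to_roman(1987), "\n";
```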

   Why aren't my random numbers random?
       If you're using a version of Perl before 5.004, you must call "srand"
       once at the start of your program to seed the random number generator.

                BEGIN { srand() if $] < 5.004 }

       5.004 and later automatically call "srand" at the beginning.  Don't
       call "srand" more than once--you make your numbers less random, rather
       than more.

       Computers are good at being predictable and bad at being random
       (despite appearances caused by bugs in your programs :-).  The
       random article in the "Far More Than You Ever Wanted To Know"
       collection in <http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz>, courtesy
       of Tom Phoenix, talks more about this.  John von Neumann said, "Anyone
       who attempts to generate random numbers by deterministic means is, of
       course, living in a state of sin."

       If you want numbers that are more random than "rand" with "srand"
       provides, you should also check out the "Math::TrulyRandom" module from
       CPAN.  It uses the imperfections in your system's timer to generate
       random numbers, but this takes quite a while.  If you want a better
       pseudorandom generator than comes with your operating system, look at
       "Numerical Recipes in C" at <http://www.nr.com/>.

   How do I get a random number between X and Y?
       To get a random number between two values, you can use the "rand()"
       built-in to get a random number between 0 and 1. From there, you shift
       that into the range that you want.

       "rand($x)" returns a number such that "0 <= rand($x) < $x". Thus what
       you want to have perl figure out is a random number in the range from 0
       to the difference between your X and Y.

       That is, to get a number between 10 and 15, inclusive, you want a
       random number between 0 and 5 that you can then add to 10.

               my $number = 10 + int rand( 15-10+1 ); # ( 10,11,12,13,14, or 15 )

       Hence you derive the following simple function to abstract that. It
       selects a random integer between the two given integers (inclusive).
       For example: "random_int_between(50,120)".

               sub random_int_between {
                       my($min, $max) = @_;
                       # Assumes that the two arguments are integers themselves!
                       return $min if $min == $max;
                       ($min, $max) = ($max, $min)  if  $min > $max;
                       return $min + int rand(1 + $max - $min);
                       }

Data: Dates

   How do I find the day or week of the year?
       The "localtime" function returns the day of the year, counting from 0
       for January 1.  Without an argument "localtime" uses the current time.

               $day_of_year = (localtime)[7];

       The "POSIX" module can also format a date as the day of the year or
       week of the year.

               use POSIX qw/strftime/;
               my $day_of_year  = strftime "%j", localtime;
               my $week_of_year = strftime "%W", localtime;

       To get the day or week of the year for any date, use "POSIX"'s "mktime"
       to get a time in epoch seconds for the argument to "localtime".

               use POSIX qw/mktime strftime/;
               my $week_of_year = strftime "%W",
                       localtime( mktime( 0, 0, 0, 18, 11, 87 ) );
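
       The two approaches number the days differently, which is an easy
       source of off-by-one errors.  This sketch computes both values for
       December 18, 1987; noon is used only to stay clear of any
       midnight daylight-saving edge, and the variable names are
       illustrative.

```perl
use strict;
use warnings;
use POSIX qw(mktime strftime);

# December 18, 1987 at noon: the month is 0-based and the year is
# counted from 1900, as with localtime.
my $epoch = mktime(0, 0, 12, 18, 11, 87);

my $yday        = (localtime $epoch)[7];               # counts from 0
my $day_of_year = strftime("%j", localtime $epoch);    # counts from 001

print "localtime: $yday, strftime: $day_of_year\n";
```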
406
407       The "Date::Calc" module provides two functions to calculate these.
408
409               use Date::Calc;
410               my $day_of_year  = Day_of_Year(  1987, 12, 18 );
411               my $week_of_year = Week_of_Year( 1987, 12, 18 );
412
413   How do I find the current century or millennium?
414       Use the following simple functions:
415
416               sub get_century    {
417                       return int((((localtime(shift || time))[5] + 1999))/100);
418                       }
419
420               sub get_millennium {
421                       return 1+int((((localtime(shift || time))[5] + 1899))/1000);
422                       }
423
424       On some systems, the "POSIX" module's "strftime()" function has been
425       extended in a non-standard way to use a %C format, which they sometimes
426       claim is the "century". It isn't, because on most such systems, this is
427       only the first two digits of the four-digit year, and thus cannot be
428       used to reliably determine the current century or millennium.
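
       The same arithmetic can be written against a four-digit year, which
       makes the convention easier to see: the year 2000 closes the 20th
       century and 2001 opens the 21st.  The helper names "century_of" and
       "millennium_of" are hypothetical.

```perl
use strict;
use warnings;

# century_of() and millennium_of() are hypothetical helpers that take a
# four-digit year instead of an epoch time.
sub century_of {
    my ($year) = @_;
    return int( ($year + 99) / 100 );
}

sub millennium_of {
    my ($year) = @_;
    return int( ($year + 999) / 1000 );
}

printf "%d: century %d, millennium %d\n",
    $_, century_of($_), millennium_of($_) for 2000, 2001;
```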

   How can I compare two dates and find the difference?
       (contributed by brian d foy)

       You could just store all your dates as a number and then subtract.
       Life isn't always that simple though. If you want to work with
       formatted dates, the "Date::Manip", "Date::Calc", or "DateTime" modules
       can help you.
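
       For the simple "store as a number and subtract" case, the standard
       "Time::Local" module is enough.  The sketch below assumes dates
       written as "YYYY-MM-DD" and a hypothetical helper named
       "days_between"; it works in UTC so that daylight saving changes
       cannot skew the day count.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# days_between() is a hypothetical helper taking "YYYY-MM-DD" strings.
# It converts each date to UTC midnight and subtracts.
sub days_between {
    my ($from, $to) = @_;
    my @epochs;
    for my $date ($from, $to) {
        my ($year, $month, $day) = $date =~ /\A(\d{4})-(\d\d)-(\d\d)\z/
            or die "Unrecognized date '$date'";
        push @epochs, timegm(0, 0, 0, $day, $month - 1, $year);
    }
    return ($epochs[1] - $epochs[0]) / 86_400;
}

print days_between("1987-12-18", "1988-01-01"), "\n";
```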

   How can I take a string and turn it into epoch seconds?
       If it's a regular enough string that it always has the same format, you
       can split it up and pass the parts to "timelocal" in the standard
       "Time::Local" module.  Otherwise, you should look into the "Date::Calc"
       and "Date::Manip" modules from CPAN.
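
       As a sketch of the split-and-pass approach, assume the string always
       looks like "YYYY-MM-DD HH:MM:SS" (that format is an assumption for
       this example).  "timelocal" interprets the fields in the local time
       zone; its sibling "timegm" interprets them as UTC.

```perl
use strict;
use warnings;
use Time::Local qw(timegm timelocal);

my $stamp = "2000-01-01 00:00:00";

my ($year, $month, $day, $hour, $min, $sec) =
    $stamp =~ /\A(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)\z/
    or die "Unrecognized timestamp '$stamp'";

# Both functions expect a 0-based month.
my $epoch_local = timelocal($sec, $min, $hour, $day, $month - 1, $year);
my $epoch_utc   = timegm(   $sec, $min, $hour, $day, $month - 1, $year);

print "local: $epoch_local\n";
print "UTC:   $epoch_utc\n";
```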

   How can I find the Julian Day?
       (contributed by brian d foy and Dave Cross)

       You can use the "Time::JulianDay" module available on CPAN.  Ensure
       that you really want to find a Julian day, though, as many people have
       different ideas about Julian days.  See
       http://www.hermetic.ch/cal_stud/jdn.htm for instance.

       You can also try the "DateTime" module, which can convert a date/time
       to a Julian Day.

               $ perl -MDateTime -le'print DateTime->today->jd'
               2453401.5

       Or the modified Julian Day

               $ perl -MDateTime -le'print DateTime->today->mjd'
               53401

       Or even the day of the year (which is what some people think of as a
       Julian day)

               $ perl -MDateTime -le'print DateTime->today->doy'
               31

   How do I find yesterday's date?
       (contributed by brian d foy)

       Use one of the Date modules. The "DateTime" module makes it simple, and
       gives you the same time of day, only the day before.

               use DateTime;

               my $yesterday = DateTime->now->subtract( days => 1 );

               print "Yesterday was $yesterday\n";

       You can also use the "Date::Calc" module using its "Today_and_Now"
       function.

               use Date::Calc qw( Today_and_Now Add_Delta_DHMS );

               my @date_time = Add_Delta_DHMS( Today_and_Now(), -1, 0, 0, 0 );

               print "@date_time\n";

       Most people try to use the time rather than the calendar to figure out
       dates, but that assumes that days are twenty-four hours each.  For most
       people, there are two days a year when they aren't: the switch to and
       from summer time throws this off. Let the modules do the work.

       If you absolutely must do it yourself (or can't use one of the
       modules), here's a solution using "Time::Local", which comes with Perl:

               # contributed by Gunnar Hjalmarsson
                use Time::Local;
                my $today = timelocal 0, 0, 12, ( localtime )[3..5];
                my ($d, $m, $y) = ( localtime $today-86400 )[3..5];
                printf "Yesterday: %d-%02d-%02d\n", $y+1900, $m+1, $d;

       In this case, you measure the day starting at noon, and subtract 24
       hours. Even if the length of the calendar day is 23 or 25 hours, you'll
       still end up on the previous calendar day, although not at noon. Since
       you don't care about the time, the one hour difference doesn't matter
       and you end up with the previous date.

   Does Perl have a Year 2000 or 2038 problem? Is Perl Y2K compliant?
       (contributed by brian d foy)

       Perl itself never had a Y2K problem, although that never stopped people
       from creating Y2K problems on their own. See the documentation for
       "localtime" for its proper use.

       Starting with Perl 5.11, "localtime" and "gmtime" can handle dates past
       03:14:08 January 19, 2038, when a 32-bit based time would overflow. You
       still might get a warning on a 32-bit "perl":

               % perl5.11.2 -E 'say scalar localtime( 0x9FFF_FFFFFFFF )'
               Integer overflow in hexadecimal number at -e line 1.
               Wed Nov  1 19:42:39 5576711

       On a 64-bit "perl", you can get even larger dates for those really long
       running projects:

               % perl5.11.2 -E 'say scalar gmtime( 0x9FFF_FFFFFFFF )'
               Thu Nov  2 00:42:39 5576711

       You're still out of luck if you need to keep track of decaying
       protons though.

Data: Strings

   How do I validate input?
       (contributed by brian d foy)

       There are many ways to ensure that values are what you expect or want
       to accept. Besides the specific examples that we cover in the perlfaq,
       you can also look at the modules with "Assert" and "Validate" in their
       names, along with other modules such as "Regexp::Common".

       Some modules have validation for particular types of input, such as
       "Business::ISBN", "Business::CreditCard", "Email::Valid", and
       "Data::Validate::IP".

   How do I unescape a string?
       It depends just what you mean by "escape".  URL escapes are dealt with
       in perlfaq9.  Shell escapes with the backslash ("\") character are
       removed with

               s/\\(.)/$1/g;

       This won't expand "\n" or "\t" or any other special escapes.
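
       If you do want a few special escapes expanded, one approach is to
       look each escaped character up in a table and fall back to the
       character itself.  The table contents and the helper name "unescape"
       below are assumptions for this sketch; extend the table to match
       whatever escapes your input uses.

```perl
use strict;
use warnings;

# Escapes this sketch knows about; anything else loses its backslash.
my %special = ( n => "\n", t => "\t", r => "\r", 0 => "\0" );

# unescape() is a hypothetical helper.
sub unescape {
    my ($text) = @_;
    $text =~ s/\\(.)/exists $special{$1} ? $special{$1} : $1/gse;
    return $text;
}

print unescape('tab:\there, dollar:\$5'), "\n";
```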

   How do I remove consecutive pairs of characters?
       (contributed by brian d foy)

       You can use the substitution operator to find pairs of characters (or
       runs of characters) and replace them with a single instance. In this
       substitution, we find a character in "(.)". The memory parentheses
       store the matched character in the back-reference "\1" and we use that
       to require that the same thing immediately follow it. We replace that
       part of the string with the character in $1.

               s/(.)\1/$1/g;

       We can also use the transliteration operator, "tr///". In this example,
       the search list side of our "tr///" contains nothing, but the "c"
       option complements that so it contains everything. The replacement list
       also contains nothing, so the transliteration is almost a no-op since
       it won't do any replacements (or more exactly, replace the character
       with itself). However, the "s" option squashes duplicated and
       consecutive characters in the string so a character does not show up
       next to itself:

               my $str = 'Haarlem';   # in the Netherlands
               $str =~ tr///cs;       # Now Harlem, like in New York
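
       The two techniques differ once a character repeats more than twice:
       the substitution collapses each pair once, while "tr///cs" squashes
       the whole run.  A small comparison, with a made-up sample string:

```perl
use strict;
use warnings;

my $text = "baaad";

(my $pairs    = $text) =~ s/(.)\1/$1/g;     # each pair collapses once
(my $runs     = $text) =~ s/(.)\1+/$1/g;    # a whole run collapses
(my $squashed = $text) =~ tr///cs;          # squashes runs as well

print "$pairs $runs $squashed\n";
```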

   How do I expand function calls in a string?
       (contributed by brian d foy)

       This is documented in perlref, and although it's not the easiest thing
       to read, it does work. In each of these examples, we call the function
       inside the braces used to dereference a reference. If we have more than
       one return value, we can construct and dereference an anonymous array.
       In this case, we call the function in list context.

               print "The time values are @{ [localtime] }.\n";

       If we want to call the function in scalar context, we have to do a bit
       more work. We can really have any code we like inside the braces, so we
       simply have to end with the scalar reference, although how you do that
       is up to you, and you can use code inside the braces. Note that the use
       of parens creates a list context, so we need "scalar" to force the
       scalar context on the function:

               print "The time is ${\(scalar localtime)}.\n";

               print "The time is ${ my $x = localtime; \$x }.\n";

       If your function already returns a reference, you don't need to create
       the reference yourself.

               sub timestamp { my $t = localtime; \$t }

               print "The time is ${ timestamp() }.\n";

       The "Interpolation" module can also do a lot of magic for you. You can
       specify a variable name, in this case "E", to set up a tied hash that
       does the interpolation for you. It has several other methods to do this
       as well.

               use Interpolation E => 'eval';
               print "The time values are $E{localtime()}.\n";

       In most cases, it is probably easier to simply use string
       concatenation, which also forces scalar context.

               print "The time is " . localtime() . ".\n";

   How do I find matching/nesting anything?
       This isn't something that can be done in one regular expression, no
       matter how complicated.  To find something between two single
       characters, a pattern like "/x([^x]*)x/" will get the intervening bits
       in $1. For multiple ones, then something more like "/alpha(.*?)omega/"
       would be needed. But none of these deals with nested patterns.  For
       balanced expressions using "(", "{", "[" or "<" as delimiters, use the
       CPAN module Regexp::Common, or see "(??{ code })" in perlre.  For other
       cases, you'll have to write a parser.

       If you are serious about writing a parser, there are a number of
       modules or oddities that will make your life a lot easier.  There are
       the CPAN modules "Parse::RecDescent", "Parse::Yapp", and
       "Text::Balanced"; and the "byacc" program. Starting from perl 5.8,
       "Text::Balanced" is part of the standard distribution.

       One simple destructive, inside-out approach that you might try is to
       pull out the smallest nesting parts one at a time:

               while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
                       # do something with $1
                       }

       A more complicated and sneaky approach is to make Perl's regular
       expression engine do it for you.  This is courtesy Dean Inada, and
       rather has the nature of an Obfuscated Perl Contest entry, but it
       really does work:

               # $_ contains the string to parse
               # BEGIN and END are the opening and closing markers for the
               # nested text.

               @( = ('(','');
               @) = (')','');
               ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
               @$ = (eval{/$re/},$@!~/unmatched/i);
               print join("\n",@$[0..$#$]) if( $$[-1] );
649
650               # $_ contains the string to parse
651               # BEGIN and END are the opening and closing markers for the
652               # nested text.
653
654               @( = ('(','');
655               @) = (')','');
656               ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
657               @$ = (eval{/$re/},$@!~/unmatched/i);
658               print join("\n",@$[0..$#$]) if( $$[-1] );
659
660   How do I reverse a string?
661       Use "reverse()" in scalar context, as documented in "reverse" in
662       perlfunc.
663
664               $reversed = reverse $string;
665
666   How do I expand tabs in a string?
667       You can do it yourself:
668
669               1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
670
671       Or you can just use the "Text::Tabs" module (part of the standard Perl
672       distribution).
673
674               use Text::Tabs;
675               @expanded_lines = expand(@lines_with_tabs);
676
677   How do I reformat a paragraph?
678       Use "Text::Wrap" (part of the standard Perl distribution):
679
680               use Text::Wrap;
681               print wrap("\t", '  ', @paragraphs);
682
683       The paragraphs you give to "Text::Wrap" should not contain embedded
684       newlines.  "Text::Wrap" doesn't justify the lines (flush-right).
685
686       Or use the CPAN module "Text::Autoformat".  Formatting files can be
687       easily done by making a shell alias, like so:
688
689               alias fmt="perl -i -MText::Autoformat -n0777 \
690                       -e 'print autoformat $_, {all=>1}' $*"
691
692       See the documentation for "Text::Autoformat" to appreciate its many
693       capabilities.
694
695   How can I access or change N characters of a string?
696       You can access the first characters of a string with substr().  To get
697       the first character, for example, start at position 0 and grab the
698       string of length 1.
699
700               $string = "Just another Perl Hacker";
701               $first_char = substr( $string, 0, 1 );  #  'J'
702
703       To change part of a string, you can use the optional fourth argument
704       which is the replacement string.
705
706               substr( $string, 13, 4, "Perl 5.8.0" );
707
708       You can also use substr() as an lvalue.
709
710               substr( $string, 13, 4 ) =  "Perl 5.8.0";
711
712   How do I change the Nth occurrence of something?
713       You have to keep track of N yourself.  For example, let's say you want
714       to change the fifth occurrence of "whoever" or "whomever" into
715       "whosoever" or "whomsoever", case insensitively.  These all assume that
716       $_ contains the string to be altered.
717
718               $count = 0;
719               s{((whom?)ever)}{
720               ++$count == 5       # is it the 5th?
721                   ? "${2}soever"  # yes, swap
722                   : $1            # renege and leave it there
723                       }ige;
724
725       In the more general case, you can use the "/g" modifier in a "while"
726       loop, keeping count of matches.
727
728               $WANT = 3;
729               $count = 0;
730               $_ = "One fish two fish red fish blue fish";
731               while (/(\w+)\s+fish\b/gi) {
732                       if (++$count == $WANT) {
733                               print "The third fish is a $1 one.\n";
734                               }
735                       }
736
737       That prints out: "The third fish is a red one."  You can also use a
738       repetition count and repeated pattern like this:
739
740               /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
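
       The counting approach can be wrapped in a small subroutine. This is
       only a sketch: "replace_nth" is a hypothetical name, and it assumes
       the replacement is a plain string rather than an expression.

```perl
use strict;
use warnings;

# Hypothetical helper: replace only the $n-th match of $pattern in $text.
# Every other match is put back unchanged via $1.
sub replace_nth {
    my ( $text, $pattern, $n, $replacement ) = @_;
    my $count = 0;
    $text =~ s/($pattern)/ ++$count == $n ? $replacement : $1 /ge;
    return $text;
}

print replace_nth( "One fish two fish red fish blue fish",
                   qr/fish/, 3, "herring" ), "\n";
```

       If there are fewer than $n matches, the string comes back unchanged.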
741
742   How can I count the number of occurrences of a substring within a string?
       There are a number of ways, with varying efficiency.  If you want a
       count of a certain single character (X) within a string, you can use
       the "tr///" operator like so:
746
747               $string = "ThisXlineXhasXsomeXx'sXinXit";
748               $count = ($string =~ tr/X//);
749               print "There are $count X characters in the string";
750
751       This is fine if you are just looking for a single character.  However,
752       if you are trying to count multiple character substrings within a
753       larger string, "tr///" won't work.  What you can do is wrap a while()
754       loop around a global pattern match.  For example, let's count negative
755       integers:
756
               $string = "-9 55 48 -2 23 -76 4 14 -44";
               $count = 0;   # start from zero, not the earlier tr/// count
               while ($string =~ /-\d+/g) { $count++ }
               print "There are $count negative numbers in the string";
760
761       Another version uses a global match in list context, then assigns the
762       result to a scalar, producing a count of the number of matches.
763
764               $count = () = $string =~ /-\d+/g;
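
       For a literal multi-character substring, the same list-assignment
       idiom works once the substring is quoted with "\Q...\E". The helper
       name "count_substr" below is an assumption for illustration, and it
       counts non-overlapping occurrences only.

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping occurrences of a literal
# substring. \Q...\E keeps characters such as "." from acting as regex
# metacharacters.
sub count_substr {
    my ( $string, $substring ) = @_;
    my $count = () = $string =~ /\Q$substring\E/g;
    return $count;
}

print count_substr( "banana", "an" ), "\n";
print count_substr( "a.b.c",  "."  ), "\n";
```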
765
766   How do I capitalize all the words on one line?
767       (contributed by brian d foy)
768
769       Damian Conway's Text::Autoformat handles all of the thinking for you.
770
771               use Text::Autoformat;
772               my $x = "Dr. Strangelove or: How I Learned to Stop ".
773                 "Worrying and Love the Bomb";
774
775               print $x, "\n";
776               for my $style (qw( sentence title highlight )) {
777                       print autoformat($x, { case => $style }), "\n";
778                       }
779
780       How do you want to capitalize those words?
781
782               FRED AND BARNEY'S LODGE        # all uppercase
783               Fred And Barney's Lodge        # title case
784               Fred and Barney's Lodge        # highlight case
785
786       It's not as easy a problem as it looks. How many words do you think are
787       in there? Wait for it... wait for it.... If you answered 5 you're
788       right. Perl words are groups of "\w+", but that's not what you want to
789       capitalize. How is Perl supposed to know not to capitalize that "s"
790       after the apostrophe? You could try a regular expression:
791
792               $string =~ s/ (
793                                        (^\w)    #at the beginning of the line
794                                          |      # or
795                                        (\s\w)   #preceded by whitespace
796                                          )
797                                       /\U$1/xg;
798
799               $string =~ s/([\w']+)/\u\L$1/g;
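
       To see what these two substitutions do, here is a quick check on
       sample input (the variable names are only illustrative). Both
       capitalize every word, including "and":

```perl
use strict;
use warnings;

# First form: uppercase a word character at the start of the string or
# after whitespace. The "s" after the apostrophe is left alone.
my $first = "fred and barney's lodge";
$first =~ s/ ( (^\w) | (\s\w) ) /\U$1/xg;

# Second form: lowercase each run of word characters and apostrophes,
# then uppercase its first character.
my $second = "FRED AND BARNEY'S LODGE";
$second =~ s/([\w']+)/\u\L$1/g;

print "$first\n$second\n";
```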
800
801       Now, what if you don't want to capitalize that "and"? Just use
802       Text::Autoformat and get on with the next problem. :)
803
804   How can I split a [character] delimited string except when inside
805       [character]?
806       Several modules can handle this sort of parsing--"Text::Balanced",
807       "Text::CSV", "Text::CSV_XS", and "Text::ParseWords", among others.
808
809       Take the example case of trying to split a string that is comma-
810       separated into its different fields. You can't use "split(/,/)" because
811       you shouldn't split if the comma is inside quotes.  For example, take a
812       data line like this:
813
814               SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
815
816       Due to the restriction of the quotes, this is a fairly complex problem.
817       Thankfully, we have Jeffrey Friedl, author of Mastering Regular
818       Expressions, to handle these for us.  He suggests (assuming your string
819       is contained in $text):
820
821                @new = ();
822                push(@new, $+) while $text =~ m{
823                        "([^\"\\]*(?:\\.[^\"\\]*)*)",?  # groups the phrase inside the quotes
824                       | ([^,]+),?
825                       | ,
826                       }gx;
827                push(@new, undef) if substr($text,-1,1) eq ',';
828
       If you want to represent quotation marks inside a quotation-mark-
       delimited field, escape them with backslashes (eg, "like \"this\"").
831
832       Alternatively, the "Text::ParseWords" module (part of the standard Perl
833       distribution) lets you say:
834
835               use Text::ParseWords;
836               @new = quotewords(",", 0, $text);
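
       Applied to the sample data line above, a minimal sketch looks like
       this. Keep in mind that "quotewords" also treats single quotes and
       backslashes specially, so it assumes the unquoted fields contain
       neither.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';

# The second argument (0) asks quotewords() to drop the quote characters.
my @new = quotewords( ",", 0, $text );

printf "%d fields\n", scalar @new;
print "Third field: $new[2]\n";
```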
837
838   How do I strip blank space from the beginning/end of a string?
839       (contributed by brian d foy)
840
841       A substitution can do this for you. For a single line, you want to
842       replace all the leading or trailing whitespace with nothing. You can do
843       that with a pair of substitutions.
844
845               s/^\s+//;
846               s/\s+$//;
847
848       You can also write that as a single substitution, although it turns out
849       the combined statement is slower than the separate ones. That might not
850       matter to you, though.
851
852               s/^\s+|\s+$//g;
853
       In this regular expression, the alternation matches either at the
       beginning or the end of the string since the alternation has a lower
       precedence than the anchors. With the "/g" flag, the substitution
       makes all possible matches, so it gets both. Remember, the trailing
       newline matches the "\s+", and the "$" anchor can match at the
       physical end of the string, so the newline disappears too. Just add the
       newline to the output, which has the added benefit of preserving
       "blank" (consisting entirely of whitespace) lines which the "^\s+"
       would remove all by itself.
863
864               while( <> )
865                       {
866                       s/^\s+|\s+$//g;
867                       print "$_\n";
868                       }
869
870       For a multi-line string, you can apply the regular expression to each
871       logical line in the string by adding the "/m" flag (for "multi-line").
872       With the "/m" flag, the "$" matches before an embedded newline, so it
873       doesn't remove it. It still removes the newline at the end of the
874       string.
875
876               $string =~ s/^\s+|\s+$//gm;
877
       Remember that lines consisting entirely of whitespace will disappear,
       since the first part of the alternation can match the entire string and
       replace it with nothing. If you need to keep embedded blank lines, you
       have to do a little more work. Instead of matching any whitespace (since
       that includes a newline), just match the other whitespace.
883
884               $string =~ s/^[\t\f ]+|[\t\f ]+$//mg;
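
       If you trim strings often, you might wrap the pair of substitutions
       in a subroutine. The name "trim" is an assumption, not a Perl
       built-in; the second half of this sketch shows the multi-line form
       keeping an embedded blank line.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the string without leading or
# trailing whitespace, leaving the caller's variable untouched.
sub trim {
    my ($string) = @_;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

my $clean = trim("  hello world \n");

# Multi-line variant that keeps embedded blank lines and newlines.
my $block = "  a  \n\n  b  \n";
$block =~ s/^[\t\f ]+|[\t\f ]+$//mg;

print "[$clean]\n";
```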
885
886   How do I pad a string with blanks or pad a number with zeroes?
887       In the following examples, $pad_len is the length to which you wish to
888       pad the string, $text or $num contains the string to be padded, and
889       $pad_char contains the padding character. You can use a single
890       character string constant instead of the $pad_char variable if you know
891       what it is in advance. And in the same way you can use an integer in
892       place of $pad_len if you know the pad length in advance.
893
894       The simplest method uses the "sprintf" function. It can pad on the left
895       or right with blanks and on the left with zeroes and it will not
896       truncate the result. The "pack" function can only pad strings on the
897       right with blanks and it will truncate the result to a maximum length
898       of $pad_len.
899
900               # Left padding a string with blanks (no truncation):
901               $padded = sprintf("%${pad_len}s", $text);
902               $padded = sprintf("%*s", $pad_len, $text);  # same thing
903
904               # Right padding a string with blanks (no truncation):
905               $padded = sprintf("%-${pad_len}s", $text);
906               $padded = sprintf("%-*s", $pad_len, $text); # same thing
907
908               # Left padding a number with 0 (no truncation):
909               $padded = sprintf("%0${pad_len}d", $num);
910               $padded = sprintf("%0*d", $pad_len, $num); # same thing
911
912               # Right padding a string with blanks using pack (will truncate):
913               $padded = pack("A$pad_len",$text);
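
       The difference in truncation shows up when the text is longer than
       the pad length. A short sketch with made-up sample values:

```perl
use strict;
use warnings;

my $pad_len = 5;
my $long    = "abcdefgh";

my $kept      = sprintf( "%-${pad_len}s", $long );  # sprintf never truncates
my $truncated = pack( "A$pad_len", $long );         # pack cuts to $pad_len
my $zeroes    = sprintf( "%0*d", $pad_len, 42 );    # zero-padded number

print "$kept|$truncated|$zeroes\n";
```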
914
915       If you need to pad with a character other than blank or zero you can
916       use one of the following methods.  They all generate a pad string with
917       the "x" operator and combine that with $text. These methods do not
918       truncate $text.
919
920       Left and right padding with any character, creating a new string:
921
922               $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
923               $padded = $text . $pad_char x ( $pad_len - length( $text ) );
924
925       Left and right padding with any character, modifying $text directly:
926
927               substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
928               $text .= $pad_char x ( $pad_len - length( $text ) );
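
       When $text is already longer than $pad_len, the repeat count goes
       negative; the result is still the unpadded text, but newer perls may
       warn about it. The following sketch (with the hypothetical names
       "pad_left" and "pad_right") clamps the count at zero to stay quiet:

```perl
use strict;
use warnings;

# Hypothetical helpers; the clamp to zero avoids a negative repeat count
# when $text is already longer than $pad_len.
sub pad_left {
    my ( $text, $pad_len, $pad_char ) = @_;
    my $missing = $pad_len - length $text;
    $missing = 0 if $missing < 0;
    return ( $pad_char x $missing ) . $text;
}

sub pad_right {
    my ( $text, $pad_len, $pad_char ) = @_;
    my $missing = $pad_len - length $text;
    $missing = 0 if $missing < 0;
    return $text . ( $pad_char x $missing );
}

print pad_left(  "42", 6, "*" ), "\n";
print pad_right( "42", 6, "." ), "\n";
```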
929
930   How do I extract selected columns from a string?
931       (contributed by brian d foy)
932
933       If you know the columns that contain the data, you can use "substr" to
934       extract a single column.
935
936               my $column = substr( $line, $start_column, $length );
937
938       You can use "split" if the columns are separated by whitespace or some
939       other delimiter, as long as whitespace or the delimiter cannot appear
940       as part of the data.
941
942               my $line    = ' fred barney   betty   ';
943               my @columns = split /\s+/, $line;
944                       # ( '', 'fred', 'barney', 'betty' );
945
946               my $line    = 'fred||barney||betty';
947               my @columns = split /\|/, $line;
948                       # ( 'fred', '', 'barney', '', 'betty' );
949
950       If you want to work with comma-separated values, don't do this since
951       that format is a bit more complicated. Use one of the modules that
952       handle that format, such as "Text::CSV", "Text::CSV_XS", or
953       "Text::CSV_PP".
954
955       If you want to break apart an entire line of fixed columns, you can use
956       "unpack" with the A (ASCII) format. By using a number after the format
957       specifier, you can denote the column width. See the "pack" and "unpack"
958       entries in perlfunc for more details.
959
               my @fields = unpack( "A8 A8 A8 A16 A4", $line );
961
962       Note that spaces in the format argument to "unpack" do not denote
963       literal spaces. If you have space separated data, you may want "split"
964       instead.
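
       As a self-contained sketch, the following builds a sample
       fixed-width record (the field widths and values are invented for
       the example) and takes it apart again. The template is the first
       argument to "unpack" and the data is the second.

```perl
use strict;
use warnings;

# Build a sample record with three fixed-width columns (8, 8 and 4 wide).
my $line = sprintf "%-8s%-8s%-4s", "fred", "barney", "42";

# The template comes first; "A" strips the trailing padding spaces.
my @fields = unpack "A8 A8 A4", $line;

print join( "|", @fields ), "\n";
```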
965
966   How do I find the soundex value of a string?
967       (contributed by brian d foy)
968
       You can use the "Text::Soundex" module. If you want to do fuzzy or
       close matching, you might also try the "String::Approx",
       "Text::Metaphone", and "Text::DoubleMetaphone" modules.
972
973   How can I expand variables in text strings?
974       (contributed by brian d foy)
975
976       If you can avoid it, don't, or if you can use a templating system, such
977       as "Text::Template" or "Template" Toolkit, do that instead. You might
978       even be able to get the job done with "sprintf" or "printf":
979
980               my $string = sprintf 'Say hello to %s and %s', $foo, $bar;
981
982       However, for the one-off simple case where I don't want to pull out a
983       full templating system, I'll use a string that has two Perl scalar
       variables in it. In this example, I want to expand $foo and $bar to
       their variables' values:
986
987               my $foo = 'Fred';
988               my $bar = 'Barney';
989               $string = 'Say hello to $foo and $bar';
990
991       One way I can do this involves the substitution operator and a double
992       "/e" flag.  The first "/e" evaluates $1 on the replacement side and
993       turns it into $foo. The second /e starts with $foo and replaces it with
994       its value. $foo, then, turns into 'Fred', and that's finally what's
995       left in the string:
996
997               $string =~ s/(\$\w+)/$1/eeg; # 'Say hello to Fred and Barney'
998
999       The "/e" will also silently ignore violations of strict, replacing
1000       undefined variable names with the empty string. Since I'm using the
1001       "/e" flag (twice even!), I have all of the same security problems I
1002       have with "eval" in its string form. If there's something odd in $foo,
1003       perhaps something like "@{[ system "rm -rf /" ]}", then I could get
1004       myself in trouble.
1005
1006       To get around the security problem, I could also pull the values from a
1007       hash instead of evaluating variable names. Using a single "/e", I can
1008       check the hash to ensure the value exists, and if it doesn't, I can
1009       replace the missing value with a marker, in this case "???" to signal
1010       that I missed something:
1011
1012               my $string = 'This has $foo and $bar';
1013
1014               my %Replacements = (
1015                       foo  => 'Fred',
1016                       );
1017
1018               # $string =~ s/\$(\w+)/$Replacements{$1}/g;
1019               $string =~ s/\$(\w+)/
1020                       exists $Replacements{$1} ? $Replacements{$1} : '???'
1021                       /eg;
1022
1023               print $string;
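
       The hash lookup can be packaged as a small subroutine. This is a
       sketch only; "expand_vars" is a hypothetical name, and it takes the
       replacement values as a hash reference.

```perl
use strict;
use warnings;

# Hypothetical helper: fill $name placeholders from a hash reference,
# marking unknown names with "???". Nothing from the template is
# evaluated as Perl code beyond the hash lookup.
sub expand_vars {
    my ( $template, $values ) = @_;
    $template =~ s/\$(\w+)/
        exists $values->{$1} ? $values->{$1} : '???'
        /eg;
    return $template;
}

my $result = expand_vars( 'This has $foo and $bar', { foo => 'Fred' } );
print "$result\n";
```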
1024
1025   What's wrong with always quoting "$vars"?
1026       The problem is that those double-quotes force stringification--coercing
1027       numbers and references into strings--even when you don't want them to
1028       be strings.  Think of it this way: double-quote expansion is used to
1029       produce new strings.  If you already have a string, why do you need
1030       more?
1031
1032       If you get used to writing odd things like these:
1033
1034               print "$var";           # BAD
1035               $new = "$old";          # BAD
1036               somefunc("$var");       # BAD
1037
1038       You'll be in trouble.  Those should (in 99.8% of the cases) be the
1039       simpler and more direct:
1040
1041               print $var;
1042               $new = $old;
1043               somefunc($var);
1044
1045       Otherwise, besides slowing you down, you're going to break code when
1046       the thing in the scalar is actually neither a string nor a number, but
1047       a reference:
1048
1049               func(\@array);
1050               sub func {
1051                       my $aref = shift;
1052                       my $oref = "$aref";  # WRONG
1053                       }
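
       A quick way to see the damage is to ask "ref" about the original
       and the quoted copy. The variable names here are illustrative:

```perl
use strict;
use warnings;

my @array = ( 1, 2, 3 );
my $aref  = \@array;
my $copy  = "$aref";    # stringified: now just text like ARRAY(0x...)

print "original is a ", ref($aref), " reference\n";
print "copy is ", ( ref($copy) ? "a reference" : "a plain string" ), "\n";
```

       The copy can no longer be dereferenced to reach the array.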
1054
1055       You can also get into subtle problems on those few operations in Perl
1056       that actually do care about the difference between a string and a
1057       number, such as the magical "++" autoincrement operator or the
1058       syscall() function.
1059
1060       Stringification also destroys arrays.
1061
1062               @lines = `command`;
1063               print "@lines";     # WRONG - extra blanks
1064               print @lines;       # right
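
       The extra blanks come from the list separator variable, which is a
       single space by default. A small sketch with stand-in data instead
       of real command output:

```perl
use strict;
use warnings;

my @lines = ( "first\n", "second\n" );

my $quoted = "@lines";           # elements joined with a space
my $plain  = join '', @lines;    # what a bare print of the array emits

print $quoted eq $plain ? "same\n" : "different\n";
```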
1065
1066   Why don't my <<HERE documents work?
1067       Check for these three things:
1068
1069       There must be no space after the << part.
1070       There (probably) should be a semicolon at the end.
1071       You can't (easily) have any space in front of the tag.
1072
1073       If you want to indent the text in the here document, you can do this:
1074
1075           # all in one
1076           ($VAR = <<HERE_TARGET) =~ s/^\s+//gm;
1077               your text
1078               goes here
1079           HERE_TARGET
1080
1081       But the HERE_TARGET must still be flush against the margin.  If you
1082       want that indented also, you'll have to quote in the indentation.
1083
1084           ($quote = <<'    FINIS') =~ s/^\s+//gm;
1085                   ...we will have peace, when you and all your works have
1086                   perished--and the works of your dark master to whom you
1087                   would deliver us. You are a liar, Saruman, and a corrupter
1088                   of men's hearts.  --Theoden in /usr/src/perl/taint.c
1089               FINIS
1090           $quote =~ s/\s+--/\n--/;
1091
1092       A nice general-purpose fixer-upper function for indented here documents
1093       follows.  It expects to be called with a here document as its argument.
1094       It looks to see whether each line begins with a common substring, and
1095       if so, strips that substring off.  Otherwise, it takes the amount of
1096       leading whitespace found on the first line and removes that much off
1097       each subsequent line.
1098
1099           sub fix {
1100               local $_ = shift;
1101               my ($white, $leader);  # common whitespace and common leading string
1102               if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
1103                   ($white, $leader) = ($2, quotemeta($1));
1104               } else {
1105                   ($white, $leader) = (/^(\s+)/, '');
1106               }
1107               s/^\s*?$leader(?:$white)?//gm;
1108               return $_;
1109           }
1110
1111       This works with leading special strings, dynamically determined:
1112
1113               $remember_the_main = fix<<'    MAIN_INTERPRETER_LOOP';
1114               @@@ int
1115               @@@ runops() {
1116               @@@     SAVEI32(runlevel);
1117               @@@     runlevel++;
1118               @@@     while ( op = (*op->op_ppaddr)() );
1119               @@@     TAINT_NOT;
1120               @@@     return 0;
1121               @@@ }
1122               MAIN_INTERPRETER_LOOP
1123
1124       Or with a fixed amount of leading whitespace, with remaining
1125       indentation correctly preserved:
1126
1127               $poem = fix<<EVER_ON_AND_ON;
1128              Now far ahead the Road has gone,
1129                 And I must follow, if I can,
1130              Pursuing it with eager feet,
1131                 Until it joins some larger way
1132              Where many paths and errands meet.
1133                 And whither then? I cannot say.
1134                       --Bilbo in /usr/src/perl/pp_ctl.c
1135               EVER_ON_AND_ON
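
       If you can assume Perl 5.26 or later, the indented here document
       syntax "<<~" removes the terminator's indentation from every line
       for you, so no helper is needed. A minimal sketch (the subroutine
       name "poem" is only for the example):

```perl
use strict;
use warnings;
use 5.026;    # indented here documents need Perl 5.26 or later

sub poem {
    # Both the text and the terminator may be indented; the terminator's
    # leading whitespace is stripped from each line of the body.
    my $text = <<~EOT;
        Pursuing it with eager feet,
           Until it joins some larger way
        EOT
    return $text;
}

print poem();
```

       Indentation beyond that of the terminator is preserved.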
1136

Data: Arrays

1138   What is the difference between a list and an array?
1139       (contributed by brian d foy)
1140
1141       A list is a fixed collection of scalars. An array is a variable that
1142       holds a variable collection of scalars. An array can supply its
1143       collection for list operations, so list operations also work on arrays:
1144
1145               # slices
               ( 'dog', 'cat', 'bird' )[1,2];
               @animals[1,2];
1148
1149               # iteration
1150               foreach ( qw( dog cat bird ) ) { ... }
1151               foreach ( @animals ) { ... }
1152
1153               my @three = grep { length == 3 } qw( dog cat bird );
1154               my @three = grep { length == 3 } @animals;
1155
1156               # supply an argument list
1157               wash_animals( qw( dog cat bird ) );
1158               wash_animals( @animals );
1159
       Array operations, which change the scalars, rearrange them, or add or
       remove some scalars, only work on arrays. These can't work on a
       list, which is fixed. Array operations include "shift", "unshift",
       "push", "pop", and "splice".
1164
1165       An array can also change its length:
1166
1167               $#animals = 1;  # truncate to two elements
1168               $#animals = 10000; # pre-extend to 10,001 elements
1169
1170       You can change an array element, but you can't change a list element:
1171
1172               $animals[0] = 'Rottweiler';
1173               qw( dog cat bird )[0] = 'Rottweiler'; # syntax error!
1174
1175               foreach ( @animals ) {
1176                       s/^d/fr/;  # works fine
1177                       }
1178
1179               foreach ( qw( dog cat bird ) ) {
1180                       s/^d/fr/;  # Error! Modification of read only value!
1181                       }
1182
       However, if the list element is itself a variable, it appears that you
       can change a list element. In that case, the list element is the
       variable, not the data. You're not changing the list element, but
       something the list element refers to. The list element itself doesn't
       change: it's still the same variable.
1188
1189       You also have to be careful about context. You can assign an array to a
1190       scalar to get the number of elements in the array. This only works for
1191       arrays, though:
1192
1193               my $count = @animals;  # only works with arrays
1194
1195       If you try to do the same thing with what you think is a list, you get
1196       a quite different result. Although it looks like you have a list on the
1197       righthand side, Perl actually sees a bunch of scalars separated by a
1198       comma:
1199
1200               my $scalar = ( 'dog', 'cat', 'bird' );  # $scalar gets bird
1201
       Since you're assigning to a scalar, the righthand side is in scalar
       context. The comma operator (yes, it's an operator!) in scalar context
       evaluates its lefthand side, throws away the result, and evaluates its
       righthand side and returns the result. In effect, that list-lookalike
       assigns to $scalar its rightmost value. Many people mess this up
       because they choose a list-lookalike whose last element is also the
       count they expect:
1209
1210               my $scalar = ( 1, 2, 3 );  # $scalar gets 3, accidentally
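
       If you really do want the number of items in a list, or its last
       item, you can say so explicitly. A short sketch using the same
       sample animals:

```perl
use strict;
use warnings;

my @animals = qw( dog cat bird );

my $count = @animals;                        # array in scalar context
my $last  = ( 'dog', 'cat', 'bird' )[-1];    # list slice picks the last item
my $n     = () = ( 'dog', 'cat', 'bird' );   # list assignment counts items

print "$count $last $n\n";
```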
1211
1212   What is the difference between $array[1] and @array[1]?
1213       (contributed by brian d foy)
1214
1215       The difference is the sigil, that special character in front of the
1216       array name. The "$" sigil means "exactly one item", while the "@" sigil
1217       means "zero or more items". The "$" gets you a single scalar, while the
1218       "@" gets you a list.
1219
1220       The confusion arises because people incorrectly assume that the sigil
1221       denotes the variable type.
1222
1223       The $array[1] is a single-element access to the array. It's going to
1224       return the item in index 1 (or undef if there is no item there).  If
1225       you intend to get exactly one element from the array, this is the form
1226       you should use.
1227
1228       The @array[1] is an array slice, although it has only one index.  You
1229       can pull out multiple elements simultaneously by specifying additional
1230       indices as a list, like @array[1,4,3,0].
1231
1232       Using a slice on the lefthand side of the assignment supplies list
1233       context to the righthand side. This can lead to unexpected results.
1234       For instance, if you want to read a single line from a filehandle,
1235       assigning to a scalar value is fine:
1236
1237               $array[1] = <STDIN>;
1238
1239       However, in list context, the line input operator returns all of the
1240       lines as a list. The first line goes into @array[1] and the rest of the
1241       lines mysteriously disappear:
1242
1243               @array[1] = <STDIN>;  # most likely not what you want
1244
1245       Either the "use warnings" pragma or the -w flag will warn you when you
1246       use an array slice with a single index.
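
       You can watch the context change without reading from a filehandle.
       In this sketch, "context" is a hypothetical probe built on
       "wantarray", and the warning is switched off only to keep the
       demonstration quiet:

```perl
use strict;
use warnings;

# Hypothetical probe that reports the context it was called in.
sub context { return wantarray ? "list" : "scalar" }

my @array;
$array[0] = context();          # element assignment: scalar context
{
    no warnings 'syntax';       # silence the single-index slice warning
    @array[1] = context();      # slice assignment: list context
}

print "@array\n";
```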
1247
1248   How can I remove duplicate elements from a list or array?
1249       (contributed by brian d foy)
1250
1251       Use a hash. When you think the words "unique" or "duplicated", think
1252       "hash keys".
1253
1254       If you don't care about the order of the elements, you could just
1255       create the hash then extract the keys. It's not important how you
1256       create that hash: just that you use "keys" to get the unique elements.
1257
1258               my %hash   = map { $_, 1 } @array;
1259               # or a hash slice: @hash{ @array } = ();
1260               # or a foreach: $hash{$_} = 1 foreach ( @array );
1261
1262               my @unique = keys %hash;
1263
1264       If you want to use a module, try the "uniq" function from
1265       "List::MoreUtils". In list context it returns the unique elements,
1266       preserving their order in the list. In scalar context, it returns the
1267       number of unique elements.
1268
1269               use List::MoreUtils qw(uniq);
1270
1271               my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
1272               my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7
1273
       You can also go through each element and skip the ones you've seen
       before. Use a hash to keep track. The first time the loop sees an
       element, that element has no key in %seen. The condition on the "next"
       statement creates the key and tests its old value, which is "undef",
       so the loop continues to the "push"; the postincrement leaves the
       value for that key at 1. The next time the loop sees that same element,
       its key exists in the hash and the value for that key is true (since
       it's not 0 or "undef"), so the "next" skips that iteration and the
       loop goes to the next element.
1282
1283               my @unique = ();
1284               my %seen   = ();
1285
1286               foreach my $elem ( @array )
1287                       {
1288                       next if $seen{ $elem }++;
1289                       push @unique, $elem;
1290                       }
1291
1292       You can write this more briefly using a grep, which does the same
1293       thing.
1294
1295               my %seen = ();
1296               my @unique = grep { ! $seen{ $_ }++ } @array;
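
       Wrapping the grep in a subroutine keeps %seen private to each call.
       The name "unique_in_order" is an assumption for this sketch; it
       keeps the first occurrence of each element and preserves the
       original order.

```perl
use strict;
use warnings;

# Hypothetical helper: return the distinct elements of the argument list
# in the order they first appear.
sub unique_in_order {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @unique = unique_in_order( qw( b a b c a ) );
print "@unique\n";
```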
1297
1298   How can I tell whether a certain element is contained in a list or array?
1299       (portions of this answer contributed by Anno Siegel and brian d foy)
1300
1301       Hearing the word "in" is an indication that you probably should have
1302       used a hash, not a list or array, to store your data.  Hashes are
1303       designed to answer this question quickly and efficiently.  Arrays
1304       aren't.
1305
       That being said, there are several ways to approach this.  In Perl 5.10
       and later, you can use the smart match operator to check that an item
       is contained in an array or a hash (note that smart match was marked
       experimental in Perl 5.18 and deprecated in Perl 5.38):
1309
1310               use 5.010;
1311
1312               if( $item ~~ @array )
1313                       {
1314                       say "The array contains $item"
1315                       }
1316
1317               if( $item ~~ %hash )
1318                       {
1319                       say "The hash contains $item"
1320                       }
1321
1322       With earlier versions of Perl, you have to do a bit more work. If you
1323       are going to make this query many times over arbitrary string values,
1324       the fastest way is probably to invert the original array and maintain a
1325       hash whose keys are the first array's values:
1326
1327               @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
1328               %is_blue = ();
1329               for (@blues) { $is_blue{$_} = 1 }
1330
1331       Now you can check whether $is_blue{$some_color}.  It might have been a
1332       good idea to keep the blues all in a hash in the first place.
1333
1334       If the values are all small integers, you could use a simple indexed
1335       array.  This kind of an array will take up less space:
1336
1337               @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
1338               @is_tiny_prime = ();
1339               for (@primes) { $is_tiny_prime[$_] = 1 }
               # or simply  @is_tiny_prime[@primes] = (1) x @primes;
1341
1342       Now you check whether $is_tiny_prime[$some_number].
1343
1344       If the values in question are integers instead of strings, you can save
1345       quite a lot of space by using bit strings instead:
1346
1347               @articles = ( 1..10, 150..2000, 2017 );
1348               undef $read;
1349               for (@articles) { vec($read,$_,1) = 1 }
1350
1351       Now check whether "vec($read,$n,1)" is true for some $n.
1352
1353       These methods guarantee fast individual tests but require a re-
1354       organization of the original list or array.  They only pay off if you
1355       have to test multiple values against the same array.
1356
1357       If you are testing only once, the standard module "List::Util" exports
1358       the function "first" for this purpose.  It works by stopping once it
1359       finds the element. It's written in C for speed, and its Perl equivalent
1360       looks like this subroutine:
1361
1362               sub first (&@) {
1363                       my $code = shift;
1364                       foreach (@_) {
1365                               return $_ if &{$code}();
1366                       }
1367                       undef;
1368               }
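
       One thing to watch when you use "first" as a membership test: it
       returns the matching element itself, and that element may be false.
       A sketch with sample numbers, assuming the list holds no undefined
       values:

```perl
use strict;
use warnings;
use List::Util qw(first);

my @numbers = ( 3, 0, 7 );

# first() returns the matching element itself, which may be false (0 here),
# so test the result with defined() rather than for truth.
my $zero    = first { $_ == 0 } @numbers;
my $missing = first { $_ == 42 } @numbers;

print defined $zero    ? "0 is present\n"  : "0 is absent\n";
print defined $missing ? "42 is present\n" : "42 is absent\n";
```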
1369
1370       If speed is of little concern, the common idiom uses grep in scalar
1371       context (which returns the number of items that passed its condition)
1372       to traverse the entire list. This does have the benefit of telling you
1373       how many matches it found, though.
1374
1375               my $is_there = grep $_ eq $whatever, @array;
1376
1377       If you want to actually extract the matching elements, simply use grep
1378       in list context.
1379
1380               my @matches = grep $_ eq $whatever, @array;
1381
1382   How do I compute the difference of two arrays?  How do I compute the
1383       intersection of two arrays?
1384       Use a hash.  Here's code to do both and more.  It assumes that each
1385       element is unique in a given array:
1386
1387               @union = @intersection = @difference = ();
1388               %count = ();
1389               foreach $element (@array1, @array2) { $count{$element}++ }
1390               foreach $element (keys %count) {
1391                       push @union, $element;
1392                       push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
1393                       }
1394
1395       Note that this is the symmetric difference, that is, all elements in
1396       either A or in B but not in both.  Think of it as an xor operation.
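
       If you want the one-sided difference instead (elements of the first
       array that are not in the second), a lookup hash and a grep will
       do. The sample arrays here are invented for the sketch:

```perl
use strict;
use warnings;

my @array1 = qw( a b c d );
my @array2 = qw( c d e );

# Build a lookup of the second array, then keep the elements of the
# first array that are not in it.
my %in_second     = map { $_ => 1 } @array2;
my @only_in_first = grep { !$in_second{$_} } @array1;

print "@only_in_first\n";
```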
1397
1398   How do I test whether two arrays or hashes are equal?
1399       With Perl 5.10 and later, the smart match operator can give you the
1400       answer with the least amount of work:
1401
1402               use 5.010;
1403
1404               if( @array1 ~~ @array2 )
1405                       {
1406                       say "The arrays are the same";
1407                       }
1408
1409               if( %hash1 ~~ %hash2 ) # doesn't check values!
1410                       {
1411                       say "The hash keys are the same";
1412                       }
1413
1414       The following code works for single-level arrays.  It uses a stringwise
1415       comparison, and does not distinguish defined versus undefined empty
1416       strings.  Modify if you have other needs.
1417
1418               $are_equal = compare_arrays(\@frogs, \@toads);
1419
1420               sub compare_arrays {
1421                       my ($first, $second) = @_;
1422                       no warnings;  # silence spurious -w undef complaints
1423                       return 0 unless @$first == @$second;
1424                       for (my $i = 0; $i < @$first; $i++) {
1425                               return 0 if $first->[$i] ne $second->[$i];
1426                               }
1427                       return 1;
1428                       }
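
       If you do need to tell an undefined element from an empty string, one
       possible modification is sketched here.  The name
       "compare_arrays_strict()" is made up for this example; the comparison
       is still stringwise and still only suits single-level arrays.

```perl
use strict;
use warnings;

# Hypothetical variant of compare_arrays(): undef and '' count as different.
sub compare_arrays_strict {
    my ($first, $second) = @_;
    return 0 unless @$first == @$second;

    for my $i (0 .. $#$first) {
        my ($x, $y) = ($first->[$i], $second->[$i]);

        # exactly one of the two is undef: the arrays differ
        return 0 if ( defined($x) xor defined($y) );

        next unless defined $x;       # both undef: treat as equal
        return 0 if $x ne $y;         # both defined: compare as strings
    }
    return 1;
}

print compare_arrays_strict([ undef, 'a' ], [ '', 'a' ])
    ? "same\n" : "different\n";
print compare_arrays_strict([ undef, 'a' ], [ undef, 'a' ])
    ? "same\n" : "different\n";
```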
1429
1430       For multilevel structures, you may wish to use an approach more like
1431       this one.  It uses the CPAN module "FreezeThaw":
1432
1433               use FreezeThaw qw(cmpStr);
1434               @a = @b = ( "this", "that", [ "more", "stuff" ] );
1435
1436               printf "a and b contain %s arrays\n",
1437                       cmpStr(\@a, \@b) == 0
1438                       ? "the same"
1439                       : "different";
1440
1441       This approach also works for comparing hashes.  Here we'll demonstrate
1442       two different answers:
1443
1444               use FreezeThaw qw(cmpStr cmpStrHard);
1445
1446               %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
1447               $a{EXTRA} = \%b;
1448               $b{EXTRA} = \%a;
1449
1450               printf "a and b contain %s hashes\n",
1451               cmpStr(\%a, \%b) == 0 ? "the same" : "different";
1452
1453               printf "a and b contain %s hashes\n",
1454               cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
1455
       The first reports that both hashes contain the same data, while the
       second reports that they do not.  Which you prefer is left as an
       exercise to the reader.
1459
1460   How do I find the first array element for which a condition is true?
1461       To find the first array element which satisfies a condition, you can
1462       use the "first()" function in the "List::Util" module, which comes with
1463       Perl 5.8. This example finds the first element that contains "Perl".
1464
1465               use List::Util qw(first);
1466
1467               my $element = first { /Perl/ } @array;
1468
1469       If you cannot use "List::Util", you can make your own loop to do the
1470       same thing.  Once you find the element, you stop the loop with last.
1471
1472               my $found;
1473               foreach ( @array ) {
1474                       if( /Perl/ ) { $found = $_; last }
1475                       }
1476
1477       If you want the array index, you can iterate through the indices and
1478       check the array element at each index until you find one that satisfies
1479       the condition.
1480
1481               my( $found, $index ) = ( undef, -1 );
               for( my $i = 0; $i < @array; $i++ ) {
1483                       if( $array[$i] =~ /Perl/ ) {
1484                               $found = $array[$i];
1485                               $index = $i;
1486                               last;
1487                               }
1488                       }
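
       If "List::Util" is available, the index search can also be written
       with "first()" by handing it the list of indices instead of the
       elements.  This is just a sketch with made-up sample data; note that
       index 0 is a perfectly good answer but is false, so test the result
       with "defined".

```perl
use strict;
use warnings;
use List::Util qw(first);

my @array = ('awk', 'sed', 'Perl 5', 'Perl 6');   # sample data

# first() returns undef when no index satisfies the condition
my $index = first { $array[$_] =~ /Perl/ } 0 .. $#array;

print defined $index
    ? "Found '$array[$index]' at index $index\n"
    : "No match\n";
```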
1489
1490   How do I handle linked lists?
       In general, you usually don't need a linked list in Perl, since with
       regular arrays, you can push and pop or shift and unshift at either
       end, or you can use splice to add and/or remove an arbitrary number of
       elements at arbitrary points.  Both pop and shift are O(1) operations
       on Perl's dynamic arrays.  In the absence of shifts and pops, push in
       general needs to reallocate only on the order of log(N) times, and
       unshift will need to copy pointers each time.
1498
1499       If you really, really wanted, you could use structures as described in
1500       perldsc or perltoot and do just what the algorithm book tells you to
1501       do.  For example, imagine a list node like this:
1502
1503               $node = {
1504                       VALUE => 42,
1505                       LINK  => undef,
1506                       };
1507
1508       You could walk the list this way:
1509
1510               print "List: ";
1511               for ($node = $head;  $node; $node = $node->{LINK}) {
1512                       print $node->{VALUE}, " ";
1513                       }
1514               print "\n";
1515
1516       You could add to the list this way:
1517
1518               my ($head, $tail);
1519               $tail = append($head, 1);       # grow a new head
1520               for $value ( 2 .. 10 ) {
1521                       $tail = append($tail, $value);
1522                       }
1523
1524               sub append {
1525                       my($list, $value) = @_;
1526                       my $node = { VALUE => $value };
1527                       if ($list) {
1528                               $node->{LINK} = $list->{LINK};
1529                               $list->{LINK} = $node;
1530                               }
1531                       else {
1532                               $_[0] = $node;      # replace caller's version
1533                               }
1534                       return $node;
1535                       }
1536
       But again, Perl's built-in arrays are virtually always good enough.
1538
1539   How do I handle circular lists?
1540       (contributed by brian d foy)
1541
1542       If you want to cycle through an array endlessly, you can increment the
1543       index modulo the number of elements in the array:
1544
1545               my @array = qw( a b c );
1546               my $i = 0;
1547
1548               while( 1 ) {
1549                       print $array[ $i++ % @array ], "\n";
1550                       last if $i > 20;
1551                       }
1552
1553       You can also use "Tie::Cycle" to use a scalar that always has the next
1554       element of the circular array:
1555
1556               use Tie::Cycle;
1557
1558               tie my $cycle, 'Tie::Cycle', [ qw( FFFFFF 000000 FFFF00 ) ];
1559
1560               print $cycle; # FFFFFF
1561               print $cycle; # 000000
1562               print $cycle; # FFFF00
1563
       The "Array::Iterator::Circular" module creates an iterator object for
       circular arrays:
1566
1567               use Array::Iterator::Circular;
1568
1569               my $color_iterator = Array::Iterator::Circular->new(
1570                       qw(red green blue orange)
1571                       );
1572
1573               foreach ( 1 .. 20 ) {
1574                       print $color_iterator->next, "\n";
1575                       }
1576
1577   How do I shuffle an array randomly?
       If you have either Perl 5.8.0 or later, or Scalar-List-Utils 1.03 or
       later, installed, you can say:
1580
1581               use List::Util 'shuffle';
1582
1583               @shuffled = shuffle(@list);
1584
1585       If not, you can use a Fisher-Yates shuffle.
1586
1587               sub fisher_yates_shuffle {
1588                       my $deck = shift;  # $deck is a reference to an array
1589                       return unless @$deck; # must not be empty!
1590
1591                       my $i = @$deck;
1592                       while (--$i) {
1593                               my $j = int rand ($i+1);
1594                               @$deck[$i,$j] = @$deck[$j,$i];
1595                               }
1596               }
1597
1598               # shuffle my mpeg collection
1599               #
1600               my @mpeg = <audio/*/*.mp3>;
1601               fisher_yates_shuffle( \@mpeg );    # randomize @mpeg in place
1602               print @mpeg;
1603
1604       Note that the above implementation shuffles an array in place, unlike
1605       the "List::Util::shuffle()" which takes a list and returns a new
1606       shuffled list.
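
       If you like the in-place algorithm but want the calling convention of
       "List::Util::shuffle()", you can wrap it.  In this sketch
       "shuffled_copy()" is a hypothetical helper, and the shuffle routine is
       repeated only so that the example runs on its own.

```perl
use strict;
use warnings;

# Same algorithm as fisher_yates_shuffle() above: shuffles @$deck in place.
sub fisher_yates_shuffle {
    my $deck = shift;
    return unless @$deck;
    my $i = @$deck;
    while (--$i) {
        my $j = int rand($i + 1);
        @$deck[$i, $j] = @$deck[$j, $i];
    }
}

# Hypothetical wrapper: shuffle a copy and leave the caller's list alone.
sub shuffled_copy {
    my @copy = @_;
    fisher_yates_shuffle(\@copy);
    return @copy;
}

my @list     = 1 .. 10;
my @shuffled = shuffled_copy(@list);
print "original: @list\n";
print "shuffled: @shuffled\n";
```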
1607
       You've probably seen shuffling algorithms that work using splice,
       randomly picking an element to move from the old list to the new one:
1610
1611               srand;
1612               @new = ();
1613               @old = 1 .. 10;  # just a demo
1614               while (@old) {
1615                       push(@new, splice(@old, rand @old, 1));
1616                       }
1617
1618       This is bad because splice is already O(N), and since you do it N
1619       times, you just invented a quadratic algorithm; that is, O(N**2).  This
1620       does not scale, although Perl is so efficient that you probably won't
1621       notice this until you have rather largish arrays.
1622
1623   How do I process/modify each element of an array?
1624       Use "for"/"foreach":
1625
1626               for (@lines) {
1627                       s/foo/bar/;     # change that word
1628                       tr/XZ/ZX/;      # swap those letters
1629                       }
1630
1631       Here's another; let's compute spherical volumes:
1632
1633               for (@volumes = @radii) {   # @volumes has changed parts
1634                       $_ **= 3;
1635                       $_ *= (4/3) * 3.14159;  # this will be constant folded
1636                       }
1637
1638       which can also be done with "map()" which is made to transform one list
1639       into another:
1640
1641               @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
1642
1643       If you want to do the same thing to modify the values of the hash, you
1644       can use the "values" function.  As of Perl 5.6 the values are not
1645       copied, so if you modify $orbit (in this case), you modify the value.
1646
1647               for $orbit ( values %orbits ) {
1648                       ($orbit **= 3) *= (4/3) * 3.14159;
1649                       }
1650
1651       Prior to perl 5.6 "values" returned copies of the values, so older perl
1652       code often contains constructions such as @orbits{keys %orbits} instead
1653       of "values %orbits" where the hash is to be modified.
1654
1655   How do I select a random element from an array?
1656       Use the "rand()" function (see "rand" in perlfunc):
1657
1658               $index   = rand @array;
1659               $element = $array[$index];
1660
1661       Or, simply:
1662
1663               my $element = $array[ rand @array ];
1664
1665   How do I permute N elements of a list?
1666       Use the "List::Permutor" module on CPAN. If the list is actually an
1667       array, try the "Algorithm::Permute" module (also on CPAN). It's written
1668       in XS code and is very efficient:
1669
1670               use Algorithm::Permute;
1671
1672               my @array = 'a'..'d';
1673               my $p_iterator = Algorithm::Permute->new ( \@array );
1674
1675               while (my @perm = $p_iterator->next) {
                       print "next permutation: (@perm)\n";
1677                       }
1678
1679       For even faster execution, you could do:
1680
1681               use Algorithm::Permute;
1682
1683               my @array = 'a'..'d';
1684
1685               Algorithm::Permute::permute {
1686                       print "next permutation: (@array)\n";
1687                       } @array;
1688
       Here's a little program that generates all permutations of all the
       words on each line of input. The algorithm embodied in the "permute()"
       function is discussed in Volume 4 of Knuth's The Art of Computer
       Programming and will work on any list:
1693
1694               #!/usr/bin/perl -n
1695               # Fischer-Krause ordered permutation generator
1696
1697               sub permute (&@) {
1698                       my $code = shift;
1699                       my @idx = 0..$#_;
1700                       while ( $code->(@_[@idx]) ) {
1701                               my $p = $#idx;
1702                               --$p while $idx[$p-1] > $idx[$p];
1703                               my $q = $p or return;
1704                               push @idx, reverse splice @idx, $p;
1705                               ++$q while $idx[$p-1] > $idx[$q];
1706                               @idx[$p-1,$q]=@idx[$q,$p-1];
1707                       }
1708               }
1709
1710               permute { print "@_\n" } split;
1711
1712       The "Algorithm::Loops" module also provides the "NextPermute" and
1713       "NextPermuteNum" functions which efficiently find all unique
1714       permutations of an array, even if it contains duplicate values,
1715       modifying it in-place: if its elements are in reverse-sorted order then
1716       the array is reversed, making it sorted, and it returns false;
1717       otherwise the next permutation is returned.
1718
1719       "NextPermute" uses string order and "NextPermuteNum" numeric order, so
1720       you can enumerate all the permutations of 0..9 like this:
1721
1722               use Algorithm::Loops qw(NextPermuteNum);
1723
               my @list = 0..9;
               do { print "@list\n" } while NextPermuteNum @list;
1726
1727   How do I sort an array by (anything)?
1728       Supply a comparison function to sort() (described in "sort" in
1729       perlfunc):
1730
1731               @list = sort { $a <=> $b } @list;
1732
1733       The default sort function is cmp, string comparison, which would sort
1734       "(1, 2, 10)" into "(1, 10, 2)".  "<=>", used above, is the numerical
1735       comparison operator.
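
       A short demonstration of the difference, using the same three numbers.
       The variable names are arbitrary; swapping $a and $b in the block
       reverses the order.

```perl
use strict;
use warnings;

my @numbers = (1, 2, 10);

my @as_strings = sort @numbers;                 # default: string comparison
my @as_numbers = sort { $a <=> $b } @numbers;   # numeric, ascending
my @descending = sort { $b <=> $a } @numbers;   # numeric, descending

print "string order:     @as_strings\n";   # 1 10 2
print "numeric order:    @as_numbers\n";   # 1 2 10
print "descending order: @descending\n";   # 10 2 1
```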
1736
1737       If you have a complicated function needed to pull out the part you want
1738       to sort on, then don't do it inside the sort function.  Pull it out
1739       first, because the sort BLOCK can be called many times for the same
1740       element.  Here's an example of how to pull out the first word after the
1741       first number on each item, and then sort those words case-
1742       insensitively.
1743
1744               @idx = ();
1745               for (@data) {
1746                       ($item) = /\d+\s*(\S+)/;
1747                       push @idx, uc($item);
1748                   }
1749               @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
1750
1751       which could also be written this way, using a trick that's come to be
1752       known as the Schwartzian Transform:
1753
1754               @sorted = map  { $_->[0] }
1755                       sort { $a->[1] cmp $b->[1] }
1756                       map  { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
1757
1758       If you need to sort on several fields, the following paradigm is
1759       useful.
1760
1761               @sorted = sort {
1762                       field1($a) <=> field1($b) ||
1763                       field2($a) cmp field2($b) ||
1764                       field3($a) cmp field3($b)
1765                       } @data;
1766
1767       This can be conveniently combined with precalculation of keys as given
1768       above.
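
       As a sketch of that combination, here is a Schwartzian Transform that
       precomputes two keys per item and then sorts on both.  The record
       format ("name age") is invented for the example; it sorts by age
       numerically and breaks ties by name, ignoring case.

```perl
use strict;
use warnings;

my @data = ('carol 30', 'Bob 25', 'alice 30', 'dave 25');   # sample records

my @sorted =
    map  { $_->[0] }                       # 3. recover the original item
    sort { $a->[1] <=> $b->[1]             # 2. age, numerically
             or
           $a->[2] cmp $b->[2] }           #    then name, case-insensitively
    map  { my ($name, $age) = split ' ';   # 1. extract each key only once
           [ $_, $age, lc $name ] }
    @data;

print "$_\n" for @sorted;
```

       Each key is extracted once per item, no matter how many times the sort
       block compares that item.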
1769
1770       See the sort article in the "Far More Than You Ever Wanted To Know"
1771       collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz for more
1772       about this approach.
1773
1774       See also the question later in perlfaq4 on sorting hashes.
1775
1776   How do I manipulate arrays of bits?
1777       Use "pack()" and "unpack()", or else "vec()" and the bitwise
1778       operations.
1779
1780       For example, you don't have to store individual bits in an array (which
1781       would mean that you're wasting a lot of space). To convert an array of
1782       bits to a string, use "vec()" to set the right bits. This sets $vec to
1783       have bit N set only if $ints[N] was set:
1784
1785               @ints = (...); # array of bits, e.g. ( 1, 0, 0, 1, 1, 0 ... )
1786               $vec = '';
1787               foreach( 0 .. $#ints ) {
1788                       vec($vec,$_,1) = 1 if $ints[$_];
1789                       }
1790
1791       The string $vec only takes up as many bits as it needs. For instance,
1792       if you had 16 entries in @ints, $vec only needs two bytes to store them
1793       (not counting the scalar variable overhead).
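
       You can check the storage claim yourself.  This sketch uses a made-up
       16-entry @ints whose last bit is set (the string only grows as far as
       the highest bit you assign), and uses "unpack" with the "b*" template
       to show the bits in the same order that "vec()" numbers them.

```perl
use strict;
use warnings;

# 16 sample bits; bits 0, 3, 4 and 15 are set
my @ints = (1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1);

my $vec = '';
foreach ( 0 .. $#ints ) {
    vec($vec, $_, 1) = 1 if $ints[$_];
}

print "bytes used: ", length($vec), "\n";        # 2
print "bits:       ", unpack("b*", $vec), "\n";  # 1001100000000001
```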
1794
1795       Here's how, given a vector in $vec, you can get those bits into your
1796       @ints array:
1797
1798               sub bitvec_to_list {
1799                       my $vec = shift;
1800                       my @ints;
1801                       # Find null-byte density then select best algorithm
1802                       if ($vec =~ tr/\0// / length $vec > 0.95) {
1803                               use integer;
1804                               my $i;
1805
1806                               # This method is faster with mostly null-bytes
1807                               while($vec =~ /[^\0]/g ) {
1808                                       $i = -9 + 8 * pos $vec;
1809                                       push @ints, $i if vec($vec, ++$i, 1);
1810                                       push @ints, $i if vec($vec, ++$i, 1);
1811                                       push @ints, $i if vec($vec, ++$i, 1);
1812                                       push @ints, $i if vec($vec, ++$i, 1);
1813                                       push @ints, $i if vec($vec, ++$i, 1);
1814                                       push @ints, $i if vec($vec, ++$i, 1);
1815                                       push @ints, $i if vec($vec, ++$i, 1);
1816                                       push @ints, $i if vec($vec, ++$i, 1);
1817                                       }
1818                               }
1819                       else {
1820                               # This method is a fast general algorithm
1821                               use integer;
1822                               my $bits = unpack "b*", $vec;
1823                               push @ints, 0 if $bits =~ s/^(\d)// && $1;
1824                               push @ints, pos $bits while($bits =~ /1/g);
1825                               }
1826
1827                       return \@ints;
1828                       }
1829
1830       This method gets faster the more sparse the bit vector is.  (Courtesy
1831       of Tim Bunce and Winfried Koenig.)
1832
1833       You can make the while loop a lot shorter with this suggestion from
1834       Benjamin Goldberg:
1835
1836               while($vec =~ /[^\0]+/g ) {
1837                       push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
1838                       }
1839
1840       Or use the CPAN module "Bit::Vector":
1841
1842               $vector = Bit::Vector->new($num_of_bits);
1843               $vector->Index_List_Store(@ints);
1844               @ints = $vector->Index_List_Read();
1845
       "Bit::Vector" provides efficient methods for bit vectors, sets of small
       integers and "big int" math.
1848
1849       Here's a more extensive illustration using vec():
1850
1851               # vec demo
1852               $vector = "\xff\x0f\xef\xfe";
1853               print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
1854               unpack("N", $vector), "\n";
1855               $is_set = vec($vector, 23, 1);
1856               print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
1857               pvec($vector);
1858
1859               set_vec(1,1,1);
1860               set_vec(3,1,1);
1861               set_vec(23,1,1);
1862
1863               set_vec(3,1,3);
1864               set_vec(3,2,3);
1865               set_vec(3,4,3);
1866               set_vec(3,4,7);
1867               set_vec(3,8,3);
1868               set_vec(3,8,7);
1869
1870               set_vec(0,32,17);
1871               set_vec(1,32,17);
1872
1873               sub set_vec {
1874                       my ($offset, $width, $value) = @_;
1875                       my $vector = '';
1876                       vec($vector, $offset, $width) = $value;
1877                       print "offset=$offset width=$width value=$value\n";
1878                       pvec($vector);
1879                       }
1880
1881               sub pvec {
1882                       my $vector = shift;
1883                       my $bits = unpack("b*", $vector);
1884                       my $i = 0;
1885                       my $BASE = 8;
1886
1887                       print "vector length in bytes: ", length($vector), "\n";
1888                       @bytes = unpack("A8" x length($vector), $bits);
1889                       print "bits are: @bytes\n\n";
1890                       }
1891
1892   Why does defined() return true on empty arrays and hashes?
       The short story is that you should probably only use defined on scalars
       or functions, not on aggregates (arrays and hashes).  Since Perl 5.22,
       calling defined on an array or hash is a fatal error.  See "defined" in
       perlfunc for more detail.
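
       What people usually want to know is whether the aggregate is empty,
       and for that you can simply use the array or hash in a boolean
       context.  A minimal sketch with throwaway variables:

```perl
use strict;
use warnings;

my @array = ();
my %hash  = ();

print "array is empty\n" unless @array;
print "hash is empty\n"  unless %hash;

# An element whose value is undef still counts as an element
push @array, undef;
print "array now has ", scalar(@array), " element(s)\n";

$hash{key} = undef;
print "hash is no longer empty\n" if %hash;
```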
1896

Data: Hashes (Associative Arrays)

1898   How do I process an entire hash?
1899       (contributed by brian d foy)
1900
       There are a couple of ways that you can process an entire hash. You can
       get a list of keys, then go through each key, or grab one key-value
       pair at a time.
1904
1905       To go through all of the keys, use the "keys" function. This extracts
1906       all of the keys of the hash and gives them back to you as a list. You
1907       can then get the value through the particular key you're processing:
1908
1909               foreach my $key ( keys %hash ) {
                       my $value = $hash{$key};
1911                       ...
1912                       }
1913
1914       Once you have the list of keys, you can process that list before you
1915       process the hash elements. For instance, you can sort the keys so you
1916       can process them in lexical order:
1917
1918               foreach my $key ( sort keys %hash ) {
                       my $value = $hash{$key};
1920                       ...
1921                       }
1922
1923       Or, you might want to only process some of the items. If you only want
1924       to deal with the keys that start with "text:", you can select just
1925       those using "grep":
1926
1927               foreach my $key ( grep /^text:/, keys %hash ) {
                       my $value = $hash{$key};
1929                       ...
1930                       }
1931
1932       If the hash is very large, you might not want to create a long list of
1933       keys. To save some memory, you can grab one key-value pair at a time
1934       using "each()", which returns a pair you haven't seen yet:
1935
1936               while( my( $key, $value ) = each( %hash ) ) {
1937                       ...
1938                       }
1939
1940       The "each" operator returns the pairs in apparently random order, so if
1941       ordering matters to you, you'll have to stick with the "keys" method.
1942
1943       The "each()" operator can be a bit tricky though. You can't add or
1944       delete keys of the hash while you're using it without possibly skipping
1945       or re-processing some pairs after Perl internally rehashes all of the
1946       elements. Additionally, a hash has only one iterator, so if you use
1947       "keys", "values", or "each" on the same hash, you can reset the
1948       iterator and mess up your processing. See the "each" entry in perlfunc
1949       for more details.
1950
1951   How do I merge two hashes?
1952       (contributed by brian d foy)
1953
1954       Before you decide to merge two hashes, you have to decide what to do if
1955       both hashes contain keys that are the same and if you want to leave the
1956       original hashes as they were.
1957
1958       If you want to preserve the original hashes, copy one hash (%hash1) to
       a new hash (%new_hash), then add the keys from the other hash (%hash2)
1960       to the new hash. Checking that the key already exists in %new_hash
1961       gives you a chance to decide what to do with the duplicates:
1962
1963               my %new_hash = %hash1; # make a copy; leave %hash1 alone
1964
1965               foreach my $key2 ( keys %hash2 )
1966                       {
1967                       if( exists $new_hash{$key2} )
1968                               {
1969                               warn "Key [$key2] is in both hashes!";
1970                               # handle the duplicate (perhaps only warning)
1971                               ...
1972                               next;
1973                               }
1974                       else
1975                               {
1976                               $new_hash{$key2} = $hash2{$key2};
1977                               }
1978                       }
1979
1980       If you don't want to create a new hash, you can still use this looping
1981       technique; just change the %new_hash to %hash1.
1982
1983               foreach my $key2 ( keys %hash2 )
1984                       {
1985                       if( exists $hash1{$key2} )
1986                               {
1987                               warn "Key [$key2] is in both hashes!";
1988                               # handle the duplicate (perhaps only warning)
1989                               ...
1990                               next;
1991                               }
1992                       else
1993                               {
1994                               $hash1{$key2} = $hash2{$key2};
1995                               }
1996                       }
1997
1998       If you don't care that one hash overwrites keys and values from the
1999       other, you could just use a hash slice to add one hash to another. In
2000       this case, values from %hash2 replace values from %hash1 when they have
2001       keys in common:
2002
2003               @hash1{ keys %hash2 } = values %hash2;
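
       If you want the same overwrite-on-conflict behavior but also want to
       leave both originals alone, you can build a new hash from the two
       flattened lists; later pairs win.  The hash names and contents below
       are only sample data.

```perl
use strict;
use warnings;

my %hash1 = ( a => 1, b => 2 );
my %hash2 = ( b => 20, c => 30 );

# Pairs from %hash2 come last, so they replace pairs from %hash1
my %merged = ( %hash1, %hash2 );

print "$_ => $merged{$_}\n" for sort keys %merged;
```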
2004
2005   What happens if I add or remove keys from a hash while iterating over it?
2006       (contributed by brian d foy)
2007
2008       The easy answer is "Don't do that!"
2009
2010       If you iterate through the hash with each(), you can delete the key
2011       most recently returned without worrying about it.  If you delete or add
2012       other keys, the iterator may skip or double up on them since perl may
2013       rearrange the hash table.  See the entry for "each()" in perlfunc.
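
       Here is a sketch of the one safe case: deleting only the pair that
       "each()" has just handed back.  The sample hash is invented; the loop
       removes every entry whose value is false.

```perl
use strict;
use warnings;

my %hash = ( apple => 1, banana => 0, cherry => 3, date => 0 );

while ( my( $key, $value ) = each %hash ) {
    # Only the most recently returned key is deleted, which is safe
    delete $hash{$key} unless $value;
}

print join( ", ", sort keys %hash ), "\n";
```

       If you need to add keys, or delete keys other than the current one,
       collect the changes in a separate list first and apply them after the
       loop has finished.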
2014
2015   How do I look up a hash element by value?
2016       Create a reverse hash:
2017
2018               %by_value = reverse %by_key;
2019               $key = $by_value{$value};
2020
2021       That's not particularly efficient.  It would be more space-efficient to
2022       use:
2023
2024               while (($key, $value) = each %by_key) {
2025                       $by_value{$value} = $key;
2026                   }
2027
2028       If your hash could have repeated values, the methods above will only
2029       find one of the associated keys.   This may or may not worry you.  If
2030       it does worry you, you can always reverse the hash into a hash of
2031       arrays instead:
2032
2033               while (($key, $value) = each %by_key) {
2034                        push @{$key_list_by_value{$value}}, $key;
2035                       }
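
       A complete run of the hash-of-arrays approach, with invented sample
       data, might look like this.  The keys for each value come back in no
       particular order, so the sketch sorts them before printing.

```perl
use strict;
use warnings;

my %by_key = ( apple => 'red', cherry => 'red', banana => 'yellow' );

my %key_list_by_value;
while ( my( $key, $value ) = each %by_key ) {
    push @{ $key_list_by_value{$value} }, $key;
}

foreach my $value ( sort keys %key_list_by_value ) {
    my @keys = sort @{ $key_list_by_value{$value} };
    print "$value: @keys\n";
}
```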
2036
2037   How can I know how many entries are in a hash?
2038       (contributed by brian d foy)
2039
2040       This is very similar to "How do I process an entire hash?", also in
2041       perlfaq4, but a bit simpler in the common cases.
2042
       You can use the "keys()" built-in function in scalar context to find
       out how many entries you have in a hash:
2045
2046               my $key_count = keys %hash; # must be scalar context!
2047
2048       If you want to find out how many entries have a defined value, that's a
2049       bit different. You have to check each value. A "grep" is handy:
2050
2051               my $defined_value_count = grep { defined } values %hash;
2052
2053       You can use that same structure to count the entries any way that you
2054       like. If you want the count of the keys with vowels in them, you just
2055       test for that instead:
2056
2057               my $vowel_count = grep { /[aeiou]/ } keys %hash;
2058
2059       The "grep" in scalar context returns the count. If you want the list of
2060       matching items, just use it in list context instead:
2061
2062               my @defined_values = grep { defined } values %hash;
2063
2064       The "keys()" function also resets the iterator, which means that you
2065       may see strange results if you use this between uses of other hash
2066       operators such as "each()".
2067
2068   How do I sort a hash (optionally by value instead of key)?
2069       (contributed by brian d foy)
2070
2071       To sort a hash, start with the keys. In this example, we give the list
2072       of keys to the sort function which then compares them ASCIIbetically
2073       (which might be affected by your locale settings). The output list has
2074       the keys in ASCIIbetical order. Once we have the keys, we can go
2075       through them to create a report which lists the keys in ASCIIbetical
2076       order.
2077
2078               my @keys = sort { $a cmp $b } keys %hash;
2079
2080               foreach my $key ( @keys )
2081                       {
2082                       printf "%-20s %6d\n", $key, $hash{$key};
2083                       }
2084
2085       We could get more fancy in the "sort()" block though. Instead of
2086       comparing the keys, we can compute a value with them and use that value
2087       as the comparison.
2088
2089       For instance, to make our report order case-insensitive, we use the
2090       "\L" sequence in a double-quoted string to make everything lowercase.
2091       The "sort()" block then compares the lowercased values to determine in
2092       which order to put the keys.
2093
2094               my @keys = sort { "\L$a" cmp "\L$b" } keys %hash;
2095
2096       Note: if the computation is expensive or the hash has many elements,
2097       you may want to look at the Schwartzian Transform to cache the
2098       computation results.
2099
2100       If we want to sort by the hash value instead, we use the hash key to
2101       look it up. We still get out a list of keys, but this time they are
2102       ordered by their value.
2103
2104               my @keys = sort { $hash{$a} <=> $hash{$b} } keys %hash;
2105
2106       From there we can get more complex. If the hash values are the same, we
2107       can provide a secondary sort on the hash key.
2108
2109               my @keys = sort {
2110                       $hash{$a} <=> $hash{$b}
2111                               or
2112                       "\L$a" cmp "\L$b"
2113                       } keys %hash;
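
       To see the secondary sort at work, try it on a small hash where two
       keys share a value.  The data here is made up; swap $a and $b in the
       first comparison if you want the largest values first.

```perl
use strict;
use warnings;

my %hash = ( pear => 3, Apple => 3, fig => 1, banana => 2 );

my @keys = sort {
    $hash{$a} <=> $hash{$b}     # primary: by value, ascending
        or
    "\L$a" cmp "\L$b"           # secondary: by key, ignoring case
    } keys %hash;

printf "%-10s %3d\n", $_, $hash{$_} for @keys;
```

       The keys come out as fig, banana, Apple, pear: the two entries with
       the value 3 are ordered by their lowercased names.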
2114
2115   How can I always keep my hash sorted?
2116       You can look into using the "DB_File" module and "tie()" using the
2117       $DB_BTREE hash bindings as documented in "In Memory Databases" in
2118       DB_File. The "Tie::IxHash" module from CPAN might also be instructive.
       Although this does keep your hash sorted, you might not like the
       slowdown you suffer from the tie interface. Are you sure you need to do
       this? :)
2122
2123   What's the difference between "delete" and "undef" with hashes?
2124       Hashes contain pairs of scalars: the first is the key, the second is
2125       the value.  The key will be coerced to a string, although the value can
2126       be any kind of scalar: string, number, or reference.  If a key $key is
2127       present in %hash, "exists($hash{$key})" will return true.  The value
2128       for a given key can be "undef", in which case $hash{$key} will be
2129       "undef" while "exists $hash{$key}" will return true.  This corresponds
2130       to ($key, "undef") being in the hash.
2131
2132       Pictures help...  Here's the %hash table:
2133
2134                 keys  values
2135               +------+------+
2136               |  a   |  3   |
2137               |  x   |  7   |
2138               |  d   |  0   |
2139               |  e   |  2   |
2140               +------+------+
2141
2142       And these conditions hold
2143
2144               $hash{'a'}                       is true
2145               $hash{'d'}                       is false
2146               defined $hash{'d'}               is true
2147               defined $hash{'a'}               is true
2148               exists $hash{'a'}                is true (Perl 5 only)
2149               grep ($_ eq 'a', keys %hash)     is true
2150
2151       If you now say
2152
2153               undef $hash{'a'}
2154
2155       your table now reads:
2156
2157                 keys  values
2158               +------+------+
2159               |  a   | undef|
2160               |  x   |  7   |
2161               |  d   |  0   |
2162               |  e   |  2   |
2163               +------+------+
2164
2165       and these conditions now hold; changes in caps:
2166
2167               $hash{'a'}                       is FALSE
2168               $hash{'d'}                       is false
2169               defined $hash{'d'}               is true
2170               defined $hash{'a'}               is FALSE
2171               exists $hash{'a'}                is true (Perl 5 only)
2172               grep ($_ eq 'a', keys %hash)     is true
2173
2174       Notice the last two: you have an undef value, but a defined key!
2175
2176       Now, consider this:
2177
2178               delete $hash{'a'}
2179
2180       your table now reads:
2181
2182                 keys  values
2183               +------+------+
2184               |  x   |  7   |
2185               |  d   |  0   |
2186               |  e   |  2   |
2187               +------+------+
2188
2189       and these conditions now hold; changes in caps:
2190
2191               $hash{'a'}                       is false
2192               $hash{'d'}                       is false
2193               defined $hash{'d'}               is true
2194               defined $hash{'a'}               is false
2195               exists $hash{'a'}                is FALSE (Perl 5 only)
2196               grep ($_ eq 'a', keys %hash)     is FALSE
2197
2198       See, the whole entry is gone!
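
       The three tables above can be reproduced with a short script. This
       is a minimal sketch; the helper name "state_of" is made up for
       illustration and is not part of Perl.

```perl
use strict;
use warnings;

my %hash = (a => 3, x => 7, d => 0, e => 2);

# Report one key as "exists/defined/true" flags (hypothetical helper).
sub state_of {
    my ($h, $key) = @_;
    return join ',',
        (exists  $h->{$key} ? 'exists'  : 'missing'),
        (defined $h->{$key} ? 'defined' : 'undef'),
        ($h->{$key}         ? 'true'    : 'false');
}

my $before = state_of(\%hash, 'a');

undef $hash{'a'};                 # the value becomes undef, the key stays
my $after_undef = state_of(\%hash, 'a');

delete $hash{'a'};                # the key and its value are both removed
my $after_delete = state_of(\%hash, 'a');

print "$_\n" for $before, $after_undef, $after_delete;
```

       Reading $h->{$key} as above does not create the key, so the checks
       themselves leave the hash unchanged.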
2199
2200   Why don't my tied hashes make the defined/exists distinction?
2201       This depends on the tied hash's implementation of EXISTS().  For
2202       example, there isn't the concept of undef with hashes that are tied to
2203       DBM* files. It also means that exists() and defined() do the same thing
2204       with a DBM* file, and what they end up doing is not what they do with
2205       ordinary hashes.
2206
2207   How do I reset an each() operation part-way through?
2208       (contributed by brian d foy)
2209
2210       You can use the "keys" or "values" functions to reset "each". To simply
2211       reset the iterator used by "each" without doing anything else, use one
2212       of them in void context:
2213
2214               keys %hash; # resets iterator, nothing else.
2215               values %hash; # resets iterator, nothing else.
2216
2217       See the documentation for "each" in perlfunc.
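
       As a rough illustration of why the reset matters, the sketch below
       abandons an "each" loop after one pair and then starts over. The
       hash contents and variable names are made up for the example.

```perl
use strict;
use warnings;

my %hash = (one => 1, two => 2, three => 3);

# Consume one pair, leaving the iterator part-way through the hash.
my ($first_key) = each %hash;

# Without this reset, the loop below would see only the remaining pairs.
keys %hash;

my $count = 0;
while (my ($key, $value) = each %hash) {
    $count++;
}
print "visited $count pairs after the reset\n";
```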
2218
2219   How can I get the unique keys from two hashes?
2220       First you extract the keys from the hashes into lists, then solve the
2221       "removing duplicates" problem described above.  For example:
2222
2223               %seen = ();
2224               for $element (keys(%foo), keys(%bar)) {
2225                       $seen{$element}++;
2226                       }
2227               @uniq = keys %seen;
2228
2229       Or more succinctly:
2230
2231               @uniq = keys %{{%foo,%bar}};
2232
2233       Or if you really want to save space:
2234
2235               %seen = ();
2236               while (defined ($key = each %foo)) {
2237                       $seen{$key}++;
2238               }
2239               while (defined ($key = each %bar)) {
2240                       $seen{$key}++;
2241               }
2242               @uniq = keys %seen;
2243
2244   How can I store a multidimensional array in a DBM file?
       Either stringify the structure yourself (no fun), or else get the MLDBM
       module (which uses Data::Dumper) from CPAN and layer it on top of
       either DB_File or GDBM_File.
2248
2249   How can I make my hash remember the order I put elements into it?
       Use the "Tie::IxHash" module from CPAN.
2251
2252               use Tie::IxHash;
2253
2254               tie my %myhash, 'Tie::IxHash';
2255
2256               for (my $i=0; $i<20; $i++) {
2257                       $myhash{$i} = 2*$i;
2258                       }
2259
2260               my @keys = keys %myhash;
2261               # @keys = (0,1,2,3,...)
2262
2263   Why does passing a subroutine an undefined element in a hash create it?
2264       (contributed by brian d foy)
2265
2266       Are you using a really old version of Perl?
2267
2268       Normally, accessing a hash key's value for a nonexistent key will not
2269       create the key.
2270
2271               my %hash  = ();
2272               my $value = $hash{ 'foo' };
2273               print "This won't print\n" if exists $hash{ 'foo' };
2274
2275       Passing $hash{ 'foo' } to a subroutine used to be a special case,
2276       though.  Since you could assign directly to $_[0], Perl had to be ready
2277       to make that assignment so it created the hash key ahead of time:
2278
               my_sub( $hash{ 'foo' } );
               print "This will print before 5.004\n" if exists $hash{ 'foo' };

               sub my_sub {
                       # $_[0] = 'bar'; # create hash key in case you do this
                       1;
                       }
2286
2287       Since Perl 5.004, however, this situation is a special case and Perl
2288       creates the hash key only when you make the assignment:
2289
               my_sub( $hash{ 'foo' } );
               print "This will print, even after 5.004\n" if exists $hash{ 'foo' };

               sub my_sub {
                       $_[0] = 'bar';
                       }
2296
2297       However, if you want the old behavior (and think carefully about that
2298       because it's a weird side effect), you can pass a hash slice instead.
2299       Perl 5.004 didn't make this a special case:
2300
2301               my_sub( @hash{ qw/foo/ } );
2302
2303   How can I make the Perl equivalent of a C structure/C++ class/hash or array
2304       of hashes or arrays?
2305       Usually a hash ref, perhaps like this:
2306
2307               $record = {
2308                       NAME   => "Jason",
2309                       EMPNO  => 132,
2310                       TITLE  => "deputy peon",
2311                       AGE    => 23,
2312                       SALARY => 37_000,
2313                       PALS   => [ "Norbert", "Rhys", "Phineas"],
2314               };
2315
       References are documented in perlref and perlreftut.
       Examples of complex data structures are given in perldsc and perllol.
       Examples of structures and object-oriented classes are in perltoot.
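
       Once the record exists, its fields are reached through the arrow
       operator. The following sketch reuses the record above; the second
       staff entry is invented purely to show an array of such records.

```perl
use strict;
use warnings;

my $record = {
    NAME   => "Jason",
    EMPNO  => 132,
    TITLE  => "deputy peon",
    AGE    => 23,
    SALARY => 37_000,
    PALS   => [ "Norbert", "Rhys", "Phineas" ],
};

# Scalar fields are read and updated through the arrow operator.
$record->{AGE}++;
my $name = $record->{NAME};

# The nested array is reached by dereferencing the PALS entry.
my $first_pal = $record->{PALS}[0];
my $pal_count = @{ $record->{PALS} };

# An "array of structures" is an array of such hash refs.
my @staff = ($record, { NAME => "Rhys", EMPNO => 133, PALS => [] });
my @names = map { $_->{NAME} } @staff;

print "$name is $record->{AGE}, has $pal_count pals, first is $first_pal\n";
print "staff: @names\n";
```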
2319
2320   How can I use a reference as a hash key?
2321       (contributed by brian d foy and Ben Morrow)
2322
2323       Hash keys are strings, so you can't really use a reference as the key.
2324       When you try to do that, perl turns the reference into its stringified
2325       form (for instance, "HASH(0xDEADBEEF)"). From there you can't get back
2326       the reference from the stringified form, at least without doing some
2327       extra work on your own.
2328
       Remember that the entry in the hash will still be there even if the
       referenced variable goes out of scope, and that it is entirely
       possible for Perl to subsequently allocate a different variable at the
       same address. This means a new variable might accidentally be
       associated with the value stored for an old one.
2334
       If you have Perl 5.10 or later, and you just want to store a value
       against the reference for lookup later, you can use the core
       Hash::Util::FieldHash module. This will also handle renaming the keys
       if you use multiple threads (which causes all variables to be
       reallocated at new addresses, changing their stringification), and
       garbage-collecting the entries when the referenced variable goes out of
       scope.
2342
2343       If you actually need to be able to get a real reference back from each
2344       hash entry, you can use the Tie::RefHash module, which does the
2345       required work for you.
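
       Both modules can be tried side by side. This is a small sketch
       assuming Perl 5.10 or later; the hash names %note_for and
       %colour_of and their contents are invented for the example.

```perl
use strict;
use warnings;
use Hash::Util::FieldHash qw(fieldhash);
use Tie::RefHash;

# A field hash keys on the reference's identity and drops the entry
# when the referenced variable is destroyed.
fieldhash my %note_for;
my $during;
{
    my $object = { name => 'temporary' };
    $note_for{$object} = 'seen once';
    $during = keys %note_for;       # entry count while $object is alive
}
my $afterwards = keys %note_for;    # entry count once $object is gone

# Tie::RefHash hands back real references from keys().
tie my %colour_of, 'Tie::RefHash';
my $list = [ 1, 2, 3 ];
$colour_of{$list} = 'blue';
my ($key) = keys %colour_of;

print "field hash entries: $during, then $afterwards\n";
print "key is a ", ref($key), " with ", scalar(@$key), " elements\n";
```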
2346

Data: Misc

2348   How do I handle binary data correctly?
2349       Perl is binary clean, so it can handle binary data just fine.  On
2350       Windows or DOS, however, you have to use "binmode" for binary files to
2351       avoid conversions for line endings. In general, you should use
2352       "binmode" any time you want to work with binary data.
2353
2354       Also see "binmode" in perlfunc or perlopentut.
2355
2356       If you're concerned about 8-bit textual data then see perllocale.  If
2357       you want to deal with multibyte characters, however, there are some
2358       gotchas.  See the section on Regular Expressions.
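
       A minimal sketch of the "binmode" advice follows. It writes a few
       bytes that a text-mode layer could alter and reads them back; the
       temporary file and variable names are assumptions for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Bytes that text mode on Windows or DOS could change (CR, LF, Ctrl-Z).
my $bytes = join '', map { chr } 0x00, 0x0D, 0x0A, 0x1A, 0xFF;

my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;                       # no line-ending translation on write
print {$out} $bytes;
close $out or die "close failed: $!";

open my $in, '<', $path or die "cannot open $path: $!";
binmode $in;                        # and none on read
my $read_back = do { local $/; <$in> };
close $in;

printf "wrote %d bytes, read %d bytes\n", length($bytes), length($read_back);
```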
2359
2360   How do I determine whether a scalar is a number/whole/integer/float?
2361       Assuming that you don't care about IEEE notations like "NaN" or
2362       "Infinity", you probably just want to use a regular expression.
2363
2364               if (/\D/)            { print "has nondigits\n" }
2365               if (/^\d+$/)         { print "is a whole number\n" }
2366               if (/^-?\d+$/)       { print "is an integer\n" }
2367               if (/^[+-]?\d+$/)    { print "is a +/- integer\n" }
2368               if (/^-?\d+\.?\d*$/) { print "is a real number\n" }
2369               if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number\n" }
2370               if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
2371                               { print "a C float\n" }
2372
2373       There are also some commonly used modules for the task.  Scalar::Util
2374       (distributed with 5.8) provides access to perl's internal function
2375       "looks_like_number" for determining whether a variable looks like a
2376       number.  Data::Types exports functions that validate data types using
2377       both the above and other regular expressions. Thirdly, there is
2378       "Regexp::Common" which has regular expressions to match various types
2379       of numbers. Those three modules are available from the CPAN.
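
       For instance, "looks_like_number" can be applied to a handful of
       strings like this. The candidate strings are arbitrary samples
       chosen for the sketch.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

my @candidates = ('42', '-3.14', '1e5', '.5', 'abc', '12abc', '');

# Map each candidate string to a 1/0 verdict.
my %verdict = map { $_ => (looks_like_number($_) ? 1 : 0) } @candidates;

for my $string (@candidates) {
    printf "%-8s %s\n", "'$string'",
        $verdict{$string} ? 'number' : 'not a number';
}
```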
2380
2381       If you're on a POSIX system, Perl supports the "POSIX::strtod"
2382       function.  Its semantics are somewhat cumbersome, so here's a "getnum"
2383       wrapper function for more convenient access.  This function takes a
2384       string and returns the number it found, or "undef" for input that isn't
2385       a C float.  The "is_numeric" function is a front end to "getnum" if you
2386       just want to say, "Is this a float?"
2387
               use POSIX qw(strtod);

               sub getnum {
                       my $str = shift;
                       $str =~ s/^\s+//;       # trim leading whitespace
                       $str =~ s/\s+$//;       # trim trailing whitespace
                       $! = 0;
                       # strtod returns the number and the count of unparsed characters
                       my($num, $unparsed) = strtod($str);
                       if (($str eq '') || ($unparsed != 0) || $!) {
                               return undef;
                       }
                       else {
                               return $num;
                       }
               }
2402
2403               sub is_numeric { defined getnum($_[0]) }
2404
       Or you could check out the String::Scanf module on the CPAN instead.
       The "POSIX" module (part of the standard Perl distribution) provides
       the "strtod" and "strtol" functions for converting strings to doubles
       and longs, respectively.
2409
2410   How do I keep persistent data across program calls?
       For some specific applications, you can use one of the DBM modules.
       See AnyDBM_File.  More generally, you should consult the "FreezeThaw"
       or "Storable" modules from CPAN.  Starting from Perl 5.8, "Storable" is
       part of the standard distribution.  Here's one example using
       "Storable"'s "store" and "retrieve" functions:
2416
2417               use Storable;
2418               store(\%hash, "filename");
2419
2420               # later on...
2421               $href = retrieve("filename");        # by ref
2422               %hash = %{ retrieve("filename") };   # direct to hash
2423
2424   How do I print out or copy a recursive data structure?
       The "Data::Dumper" module on CPAN (or the 5.005 release of Perl) is
       great for printing out data structures.  The "Storable" module on CPAN
       (or the 5.8 release of Perl) provides a function called "dclone" that
       recursively copies its argument.
2429
2430               use Storable qw(dclone);
2431               $r2 = dclone($r1);
2432
       Here $r1 can be a reference to any kind of data structure you'd like.
       It will be deeply copied.  Because "dclone" takes and returns
       references, you'd have to add extra punctuation if you had a hash of
       arrays that you wanted to copy.
2437
2438               %newhash = %{ dclone(\%oldhash) };
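
       To see both modules handle a structure that really is recursive,
       here is a small sketch with a hash that refers back to itself. The
       structure and its field names are invented for the example.

```perl
use strict;
use warnings;
use Data::Dumper;
use Storable qw(dclone);

# A structure that refers back to itself.
my $r1 = { name => 'root', children => [ 1, 2, 3 ] };
$r1->{self} = $r1;

# Data::Dumper marks the cycle instead of recursing forever.
print Dumper($r1);

# dclone copies the whole graph, including the cycle.
my $r2 = dclone($r1);
push @{ $r2->{children} }, 4;

print "original has ", scalar(@{ $r1->{children} }), " children\n";
print "copy has ", scalar(@{ $r2->{children} }), " children\n";
```

       The copy's "self" entry points at the copy, not at the original,
       so changes to one structure do not leak into the other.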
2439
2440   How do I define methods for every class/object?
2441       (contributed by Ben Morrow)
2442
2443       You can use the "UNIVERSAL" class (see UNIVERSAL). However, please be
2444       very careful to consider the consequences of doing this: adding methods
2445       to every object is very likely to have unintended consequences. If
       possible, it would be better to have all your objects inherit from some
2447       common base class, or to use an object system like Moose that supports
2448       roles.
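
       For illustration only, a method placed in "UNIVERSAL" looks like
       the sketch below. The method name "describe" and the class
       "Animal" are hypothetical, and the caution above still applies.

```perl
use strict;
use warnings;

# Defining a sub in the UNIVERSAL package makes it callable on every
# object and class name. "describe" is a made-up method name.
sub UNIVERSAL::describe {
    my $self = shift;
    return 'instance of ' . (ref($self) || $self);
}

package Animal;    # hypothetical class that defines no describe() itself
sub new { return bless {}, shift }

package main;

my $from_object = Animal->new->describe;
my $from_class  = Animal->describe;
print "$from_object\n$from_class\n";
```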
2449
2450   How do I verify a credit card checksum?
2451       Get the "Business::CreditCard" module from CPAN.
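
       If you only want to see what such a checksum involves, card
       numbers are commonly validated with the Luhn algorithm. The sketch
       below is a stand-alone illustration of that algorithm, not the
       code from "Business::CreditCard"; the function name "luhn_ok" is
       made up.

```perl
use strict;
use warnings;

# Return 1 if the digits in $number pass the Luhn check, else 0.
sub luhn_ok {
    my ($number) = @_;
    (my $digits = $number) =~ s/[\s-]//g;    # allow spaces and dashes
    return 0 unless $digits =~ /^\d+$/;

    my $sum = 0;
    my @reversed = reverse split //, $digits;
    for my $i (0 .. $#reversed) {
        my $d = $reversed[$i];
        if ($i % 2) {             # every second digit from the right
            $d *= 2;
            $d -= 9 if $d > 9;
        }
        $sum += $d;
    }
    return $sum % 10 == 0 ? 1 : 0;
}

print luhn_ok('4111 1111 1111 1111') ? "valid\n" : "invalid\n";
print luhn_ok('4111 1111 1111 1112') ? "valid\n" : "invalid\n";
```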
2452
2453   How do I pack arrays of doubles or floats for XS code?
2454       The arrays.h/arrays.c code in the "PGPLOT" module on CPAN does just
2455       this.  If you're doing a lot of float or double processing, consider
2456       using the "PDL" module from CPAN instead--it makes number-crunching
2457       easy.
2458
2459       See <http://search.cpan.org/dist/PGPLOT> for the code.
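
       On the Perl side, one common way to hand XS code a contiguous
       buffer of doubles is the built-in "pack". This is a sketch of that
       general idea, not the PGPLOT code itself; the sample values are
       arbitrary.

```perl
use strict;
use warnings;

my @values = (1.5, -2.25, 3.0);

# "d*" packs every value as a native double; "f*" would use native floats.
my $packed = pack 'd*', @values;

# XS code could treat the string's buffer as a C array of doubles.
# Unpacking here only confirms the round trip.
my @round_trip = unpack 'd*', $packed;

printf "%d values packed into %d bytes\n", scalar(@values), length($packed);
print "@round_trip\n";
```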
2460
AUTHOR AND COPYRIGHT

2462       Copyright (c) 1997-2010 Tom Christiansen, Nathan Torkington, and other
2463       authors as noted. All rights reserved.
2464
2465       This documentation is free; you can redistribute it and/or modify it
2466       under the same terms as Perl itself.
2467
2468       Irrespective of its distribution, all code examples in this file are
2469       hereby placed into the public domain.  You are permitted and encouraged
2470       to use this code in your own programs for fun or for profit as you see
2471       fit.  A simple comment in the code giving credit would be courteous but
2472       is not required.
2473
2474
2475
2476perl v5.12.4                      2011-06-07                       PERLFAQ4(1)