PERLFAQ4(1)            Perl Programmers Reference Guide            PERLFAQ4(1)

NAME

       perlfaq4 - Data Manipulation

DESCRIPTION

       This section of the FAQ answers questions related to manipulating
       numbers, dates, strings, arrays, hashes, and miscellaneous data issues.

Data: Numbers

13   Why am I getting long decimals (eg, 19.9499999999999) instead of the
14       numbers I should be getting (eg, 19.95)?
15       For the long explanation, see David Goldberg's "What Every Computer
16       Scientist Should Know About Floating-Point Arithmetic"
17       (<http://web.cse.msu.edu/~cse320/Documents/FloatingPoint.pdf>).
18
19       Internally, your computer represents floating-point numbers in binary.
20       Digital (as in powers of two) computers cannot store all numbers
21       exactly. Some real numbers lose precision in the process. This is a
22       problem with how computers store numbers and affects all computer
23       languages, not just Perl.
24
25       perlnumber shows the gory details of number representations and
26       conversions.
27
28       To limit the number of decimal places in your numbers, you can use the
29       "printf" or "sprintf" function. See "Floating-point Arithmetic" in
30       perlop for more details.
31
32           printf "%.2f", 10/3;
33
34           my $number = sprintf "%.2f", 10/3;
35
36   Why is int() broken?
37       Your "int()" is most probably working just fine. It's the numbers that
38       aren't quite what you think.
39
40       First, see the answer to "Why am I getting long decimals (eg,
41       19.9499999999999) instead of the numbers I should be getting (eg,
42       19.95)?".
43
44       For example, this
45
46           print int(0.6/0.2-2), "\n";
47
       will on most computers print 0, not 1, because even such simple numbers
       as 0.6 and 0.2 cannot be represented exactly by floating-point numbers.
       What you think of above as 'three' is really more like
       2.9999999999999995559.
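
       As a minimal sketch of one way around this, printing the value with
       extra digits shows what "int()" actually receives, and rounding to the
       nearest integer with "sprintf" (instead of truncating) gives the
       expected answer. The variable names are illustrative only.

```perl
use strict;
use warnings;

# The quotient is slightly below 3 in binary floating point,
# so the difference is slightly below 1.
my $value = 0.6 / 0.2 - 2;

# Show more digits than print() normally displays.
printf "%.20f\n", $value;

# int() truncates toward zero; sprintf "%.0f" rounds to nearest.
my $truncated = int($value);
my $rounded   = sprintf "%.0f", $value;

print "truncated: $truncated, rounded: $rounded\n";
```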
52
53   Why isn't my octal data interpreted correctly?
54       (contributed by brian d foy)
55
56       You're probably trying to convert a string to a number, which Perl only
57       converts as a decimal number. When Perl converts a string to a number,
58       it ignores leading spaces and zeroes, then assumes the rest of the
59       digits are in base 10:
60
61           my $string = '0644';
62
63           print $string + 0;  # prints 644
64
65           print $string + 44; # prints 688, certainly not octal!
66
       This problem usually involves one of the Perl built-ins that has the
       same name as a Unix command that uses octal numbers as arguments on
       the command line. In this example, "chmod" on the command line knows
       that its first argument is octal because that's what it does:
71
72           %prompt> chmod 644 file
73
74       If you want to use the same literal digits (644) in Perl, you have to
75       tell Perl to treat them as octal numbers either by prefixing the digits
76       with a 0 or using "oct":
77
78           chmod(     0644, $filename );  # right, has leading zero
79           chmod( oct(644), $filename );  # also correct
80
81       The problem comes in when you take your numbers from something that
82       Perl thinks is a string, such as a command line argument in @ARGV:
83
84           chmod( $ARGV[0],      $filename );  # wrong, even if "0644"
85
86           chmod( oct($ARGV[0]), $filename );  # correct, treat string as octal
87
       You can always check the value you're using by printing it in octal
       notation to ensure it matches what you think it should be. Print it in
       octal and decimal format:
91
92           printf "0%o %d", $number, $number;
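
       The difference between the two conversions can be seen side by side.
       This is only a sketch; $mode_string stands in for a value that arrives
       as a string, such as an element of @ARGV.

```perl
use strict;
use warnings;

my $mode_string = '0644';    # e.g. taken from @ARGV

my $as_decimal = $mode_string + 0;     # string-to-number conversion: 644
my $as_octal   = oct($mode_string);    # interpreted as octal: 420

# Print both values in octal and decimal to see the difference.
printf "0%o %d\n", $as_decimal, $as_decimal;    # 01204 644
printf "0%o %d\n", $as_octal,   $as_octal;      # 0644 420
```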
93
94   Does Perl have a round() function? What about ceil() and floor()? Trig
95       functions?
96       Remember that "int()" merely truncates toward 0. For rounding to a
97       certain number of digits, "sprintf()" or "printf()" is usually the
98       easiest route.
99
100           printf("%.3f", 3.1415926535);   # prints 3.142
101
102       The POSIX module (part of the standard Perl distribution) implements
103       "ceil()", "floor()", and a number of other mathematical and
104       trigonometric functions.
105
106           use POSIX;
107           my $ceil   = ceil(3.5);   # 4
108           my $floor  = floor(3.5);  # 3
109
110       In 5.000 to 5.003 perls, trigonometry was done in the Math::Complex
111       module. With 5.004, the Math::Trig module (part of the standard Perl
112       distribution) implements the trigonometric functions. Internally it
113       uses the Math::Complex module and some functions can break out from the
114       real axis into the complex plane, for example the inverse sine of 2.
115
116       Rounding in financial applications can have serious implications, and
117       the rounding method used should be specified precisely. In these cases,
118       it probably pays not to trust whichever system of rounding is being
119       used by Perl, but instead to implement the rounding function you need
120       yourself.
121
122       To see why, notice how you'll still have an issue on half-way-point
123       alternation:
124
125           for (my $i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}
126
127           0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
128           0.8 0.8 0.9 0.9 1.0 1.0
129
130       Don't blame Perl. It's the same as in C. IEEE says we have to do this.
131       Perl numbers whose absolute values are integers under 2**31 (on 32-bit
132       machines) will work pretty much like mathematical integers.  Other
133       numbers are not guaranteed.
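
       One way to implement your own rounding, sketched here under stated
       assumptions, is to keep amounts as integer cents and do the rounding
       with integer arithmetic only. The helper name divide_cents_half_up and
       the half-up policy are illustrative choices, not something Perl
       prescribes; the sketch assumes non-negative integer cents and a
       positive integer divisor.

```perl
use strict;
use warnings;

# Split an amount of integer cents by an integer divisor, rounding halves up.
# Assumes non-negative integer cents and a positive integer divisor.
sub divide_cents_half_up {
    my ($cents, $divisor) = @_;
    use integer;    # integer division for the rest of this sub
    return (2 * $cents + $divisor) / (2 * $divisor);
}

printf "%d\n", divide_cents_half_up(1000, 3);    # 333
printf "%d\n", divide_cents_half_up(5, 2);       # 3
```

       Because no fractional value is ever stored, the half-way cases are
       decided by the rule you wrote rather than by the binary representation.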
134
135   How do I convert between numeric representations/bases/radixes?
       As always with Perl there is more than one way to do it. Below are a
       few examples of approaches to making common conversions between number
       representations. This is intended to be representative rather than
       exhaustive.
140
141       Some of the examples later in perlfaq4 use the Bit::Vector module from
142       CPAN. The reason you might choose Bit::Vector over the perl built-in
143       functions is that it works with numbers of ANY size, that it is
144       optimized for speed on some operations, and for at least some
145       programmers the notation might be familiar.
146
147       How do I convert hexadecimal into decimal
148           Using perl's built in conversion of "0x" notation:
149
150               my $dec = 0xDEADBEEF;
151
152           Using the "hex" function:
153
154               my $dec = hex("DEADBEEF");
155
156           Using "pack":
157
158               my $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));
159
160           Using the CPAN module "Bit::Vector":
161
162               use Bit::Vector;
163               my $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
164               my $dec = $vec->to_Dec();
165
166       How do I convert from decimal to hexadecimal
167           Using "sprintf":
168
169               my $hex = sprintf("%X", 3735928559); # upper case A-F
170               my $hex = sprintf("%x", 3735928559); # lower case a-f
171
172           Using "unpack":
173
174               my $hex = unpack("H*", pack("N", 3735928559));
175
176           Using Bit::Vector:
177
178               use Bit::Vector;
179               my $vec = Bit::Vector->new_Dec(32, -559038737);
180               my $hex = $vec->to_Hex();
181
182           And Bit::Vector supports odd bit counts:
183
184               use Bit::Vector;
185               my $vec = Bit::Vector->new_Dec(33, 3735928559);
186               $vec->Resize(32); # suppress leading 0 if unwanted
187               my $hex = $vec->to_Hex();
188
189       How do I convert from octal to decimal
190           Using Perl's built in conversion of numbers with leading zeros:
191
192               my $dec = 033653337357; # note the leading 0!
193
194           Using the "oct" function:
195
196               my $dec = oct("33653337357");
197
198           Using Bit::Vector:
199
200               use Bit::Vector;
201               my $vec = Bit::Vector->new(32);
202               $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
203               my $dec = $vec->to_Dec();
204
205       How do I convert from decimal to octal
206           Using "sprintf":
207
208               my $oct = sprintf("%o", 3735928559);
209
210           Using Bit::Vector:
211
212               use Bit::Vector;
213               my $vec = Bit::Vector->new_Dec(32, -559038737);
214               my $oct = reverse join('', $vec->Chunk_List_Read(3));
215
216       How do I convert from binary to decimal
           Perl 5.6 and later let you write binary numbers directly with the
           "0b" notation:
219
220               my $number = 0b10110110;
221
222           Using "oct":
223
224               my $input = "10110110";
225               my $decimal = oct( "0b$input" );
226
227           Using "pack" and "ord":
228
229               my $decimal = ord(pack('B8', '10110110'));
230
231           Using "pack" and "unpack" for larger strings:
232
233               my $int = unpack("N", pack("B32",
234               substr("0" x 32 . "11110101011011011111011101111", -32)));
235               my $dec = sprintf("%d", $int);
236
237               # substr() is used to left-pad a 32-character string with zeros.
238
239           Using Bit::Vector:
240
               use Bit::Vector;
               my $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
               my $dec = $vec->to_Dec();
243
244       How do I convert from decimal to binary
245           Using "sprintf" (perl 5.6+):
246
247               my $bin = sprintf("%b", 3735928559);
248
249           Using "unpack":
250
251               my $bin = unpack("B*", pack("N", 3735928559));
252
253           Using Bit::Vector:
254
255               use Bit::Vector;
256               my $vec = Bit::Vector->new_Dec(32, -559038737);
257               my $bin = $vec->to_Bin();
258
259           The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
260           are left as an exercise to the inclined reader.
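
           One possible approach to those remaining transformations, sketched
           with built-ins only, is to parse the input with "hex" or "oct" and
           format the result with "sprintf". The variable names are
           illustrative.

```perl
use strict;
use warnings;

# Parse with hex() or oct(), then format with sprintf().
my $hex_to_oct = sprintf "%o", hex("DEADBEEF");      # 33653337357
my $bin_to_hex = sprintf "%x", oct("0b10110110");    # b6
my $oct_to_bin = sprintf "%b", oct("0755");          # 111101101

print "$hex_to_oct $bin_to_hex $oct_to_bin\n";
```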
261
262   Why doesn't & work the way I want it to?
263       The behavior of binary arithmetic operators depends on whether they're
264       used on numbers or strings. The operators treat a string as a series of
265       bits and work with that (the string "3" is the bit pattern 00110011).
266       The operators work with the binary form of a number (the number 3 is
267       treated as the bit pattern 00000011).
268
269       So, saying "11 & 3" performs the "and" operation on numbers (yielding
270       3). Saying "11" & "3" performs the "and" operation on strings (yielding
271       "1").
272
273       Most problems with "&" and "|" arise because the programmer thinks they
274       have a number but really it's a string or vice versa. To avoid this,
275       stringify the arguments explicitly (using "" or "qq()") or convert them
276       to numbers explicitly (using "0+$arg"). The rest arise because the
277       programmer says:
278
279           if ("\020\020" & "\101\101") {
280               # ...
281           }
282
283       but a string consisting of two null bytes (the result of "\020\020" &
284       "\101\101") is not a false value in Perl. You need:
285
286           if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
287               # ...
288           }
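
       The difference between the numeric and string forms can be checked
       directly. This sketch assumes the "bitwise" feature is not enabled
       (with that feature, "&" always treats its operands as numbers); the
       variable names are illustrative.

```perl
use strict;
use warnings;

my $numeric = 11 & 3;                    # bitwise AND on numbers: 3
my $string  = "11" & "3";                # bitwise AND on strings: "1"
my $forced  = (0 + "11") & (0 + "3");    # operands forced to numbers: 3

print "$numeric $string $forced\n";

# A string of null bytes is still a true value, so test its content.
my $mask = "\020\020" & "\101\101";
print "all bits clear\n" if $mask !~ /[^\000]/;
```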
289
290   How do I multiply matrices?
291       Use the Math::Matrix or Math::MatrixReal modules (available from CPAN)
292       or the PDL extension (also available from CPAN).
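
       For small matrices where installing a module is not an option, the
       product can also be computed by hand. The following is only a sketch
       that does not use any of the modules above; it assumes each matrix is
       a reference to an array of equally long rows, and matrix_multiply is a
       hypothetical helper name.

```perl
use strict;
use warnings;

# Multiply two matrices stored as references to arrays of rows.
sub matrix_multiply {
    my ($left, $right) = @_;
    my $inner = @{ $left->[0] };    # columns of $left
    die "dimension mismatch\n" unless $inner == @$right;

    my @product;
    for my $i (0 .. $#$left) {
        for my $j (0 .. $#{ $right->[0] }) {
            my $sum = 0;
            $sum += $left->[$i][$_] * $right->[$_][$j] for 0 .. $inner - 1;
            $product[$i][$j] = $sum;
        }
    }
    return \@product;
}

my $product = matrix_multiply( [ [1, 2], [3, 4] ], [ [5, 6], [7, 8] ] );
print "@$_\n" for @$product;
```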
293
294   How do I perform an operation on a series of integers?
295       To call a function on each element in an array, and collect the
296       results, use:
297
298           my @results = map { my_func($_) } @array;
299
300       For example:
301
302           my @triple = map { 3 * $_ } @single;
303
304       To call a function on each element of an array, but ignore the results:
305
306           foreach my $iterator (@array) {
307               some_func($iterator);
308           }
309
310       To call a function on each integer in a (small) range, you can use:
311
312           my @results = map { some_func($_) } (5 .. 25);
313
314       but you should be aware that in this form, the ".." operator creates a
315       list of all integers in the range, which can take a lot of memory for
316       large ranges. However, the problem does not occur when using ".."
317       within a "for" loop, because in that case the range operator is
318       optimized to iterate over the range, without creating the entire list.
319       So
320
321           my @results = ();
322           for my $i (5 .. 500_005) {
323               push(@results, some_func($i));
324           }
325
326       or even
327
328          push(@results, some_func($_)) for 5 .. 500_005;
329
330       will not create an intermediate list of 500,000 integers.
331
332   How can I output Roman numerals?
       Get the Roman module from CPAN
       (<http://www.cpan.org/modules/by-module/Roman>).
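
       To illustrate the conversion itself, here is a hand-rolled sketch that
       does not use the Roman module. The helper to_roman and its 1 to 3999
       range are assumptions made for this example only.

```perl
use strict;
use warnings;

# Value/symbol pairs in descending order, including subtractive forms.
my @ROMAN = (
    [ 1000, 'M' ], [ 900, 'CM' ], [ 500, 'D' ], [ 400, 'CD' ],
    [  100, 'C' ], [  90, 'XC' ], [  50, 'L' ], [  40, 'XL' ],
    [   10, 'X' ], [   9, 'IX' ], [   5, 'V' ], [   4, 'IV' ],
    [    1, 'I' ],
);

# Hypothetical helper: convert an integer from 1 to 3999 to Roman numerals.
sub to_roman {
    my ($number) = @_;
    return unless $number =~ /\A[0-9]+\z/ && $number >= 1 && $number <= 3999;

    my $roman = '';
    for my $pair (@ROMAN) {
        my ($value, $symbol) = @$pair;
        while ($number >= $value) {
            $roman  .= $symbol;
            $number -= $value;
        }
    }
    return $roman;
}

print to_roman(1994), "\n";    # MCMXCIV
```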
335
336   Why aren't my random numbers random?
337       If you're using a version of Perl before 5.004, you must call "srand"
338       once at the start of your program to seed the random number generator.
339
340            BEGIN { srand() if $] < 5.004 }
341
342       5.004 and later automatically call "srand" at the beginning. Don't call
343       "srand" more than once--you make your numbers less random, rather than
344       more.
345
346       Computers are good at being predictable and bad at being random
347       (despite appearances caused by bugs in your programs :-). The random
348       article in the "Far More Than You Ever Wanted To Know" collection in
349       <http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz>, courtesy of Tom
350       Phoenix, talks more about this. John von Neumann said, "Anyone who
351       attempts to generate random numbers by deterministic means is, of
352       course, living in a state of sin."
353
       Perl relies on the underlying system for the implementation of "rand"
       and "srand"; on some systems, the generated numbers are not random
       enough (especially on Windows; see
       <http://www.perlmonks.org/?node_id=803632>). Several CPAN modules in
       the "Math" namespace implement better pseudorandom generators; see for
       example Math::Random::MT ("Mersenne Twister", fast), or
       Math::TrulyRandom (uses the imperfections in the system's timer to
       generate random numbers, which is rather slow). More algorithms for
       random numbers are described in "Numerical Recipes in C" at
       <http://www.nr.com/>.
364
365   How do I get a random number between X and Y?
366       To get a random number between two values, you can use the "rand()"
367       built-in to get a random number between 0 and 1. From there, you shift
368       that into the range that you want.
369
370       "rand($x)" returns a number such that "0 <= rand($x) < $x". Thus what
371       you want to have perl figure out is a random number in the range from 0
372       to the difference between your X and Y.
373
374       That is, to get a number between 10 and 15, inclusive, you want a
375       random number between 0 and 5 that you can then add to 10.
376
377           my $number = 10 + int rand( 15-10+1 ); # ( 10,11,12,13,14, or 15 )
378
       Hence you derive the following simple function to abstract that. It
       selects a random integer between the two given integers (inclusive).
       For example: "random_int_between(50,120)".
382
383           sub random_int_between {
384               my($min, $max) = @_;
385               # Assumes that the two arguments are integers themselves!
386               return $min if $min == $max;
387               ($min, $max) = ($max, $min)  if  $min > $max;
388               return $min + int rand(1 + $max - $min);
389           }
390

Data: Dates

392   How do I find the day or week of the year?
393       The day of the year is in the list returned by the "localtime"
394       function. Without an argument "localtime" uses the current time.
395
396           my $day_of_year = (localtime)[7];
397
398       The POSIX module can also format a date as the day of the year or week
399       of the year.
400
401           use POSIX qw/strftime/;
402           my $day_of_year  = strftime "%j", localtime;
403           my $week_of_year = strftime "%W", localtime;
404
       To get the day or week of the year for any date, use POSIX's "mktime"
       to get a time in epoch seconds for the argument to "localtime".
407
408           use POSIX qw/mktime strftime/;
409           my $week_of_year = strftime "%W",
410               localtime( mktime( 0, 0, 0, 18, 11, 87 ) );
411
412       You can also use Time::Piece, which comes with Perl and provides a
413       "localtime" that returns an object:
414
415           use Time::Piece;
416           my $day_of_year  = localtime->yday;
417           my $week_of_year = localtime->week;
418
419       The Date::Calc module provides two functions to calculate these, too:
420
           use Date::Calc qw( Day_of_Year Week_of_Year );
           my $day_of_year  = Day_of_Year(  1987, 12, 18 );
           my $week_of_year = Week_of_Year( 1987, 12, 18 );
424
425   How do I find the current century or millennium?
426       Use the following simple functions:
427
428           sub get_century    {
429               return int((((localtime(shift || time))[5] + 1999))/100);
430           }
431
432           sub get_millennium {
433               return 1+int((((localtime(shift || time))[5] + 1899))/1000);
434           }
435
436       On some systems, the POSIX module's "strftime()" function has been
437       extended in a non-standard way to use a %C format, which they sometimes
438       claim is the "century". It isn't, because on most such systems, this is
439       only the first two digits of the four-digit year, and thus cannot be
440       used to determine reliably the current century or millennium.
441
442   How can I compare two dates and find the difference?
443       (contributed by brian d foy)
444
445       You could just store all your dates as a number and then subtract.
446       Life isn't always that simple though.
447
448       The Time::Piece module, which comes with Perl, replaces localtime with
449       a version that returns an object. It also overloads the comparison
450       operators so you can compare them directly:
451
452           use Time::Piece;
453           my $date1 = localtime( $some_time );
454           my $date2 = localtime( $some_other_time );
455
456           if( $date1 < $date2 ) {
457               print "The date was in the past\n";
458           }
459
460       You can also get differences with a subtraction, which returns a
461       Time::Seconds object:
462
           my $diff = $date1 - $date2;
           print "The difference is ", $diff->days, " days\n";
465
466       If you want to work with formatted dates, the Date::Manip, Date::Calc,
467       or DateTime modules can help you.
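
       If the formatted dates always follow one known layout, Time::Piece
       alone may be enough. The sketch below assumes dates in "YYYY-MM-DD"
       form and uses the "strptime" class method, which parses them as UTC;
       the dates and variable names are made up for the example.

```perl
use strict;
use warnings;
use Time::Piece;

# Parse two dates in a known, fixed format (treated as UTC by strptime).
my $start = Time::Piece->strptime( "2011-02-14", "%Y-%m-%d" );
my $end   = Time::Piece->strptime( "2011-03-01", "%Y-%m-%d" );

# Subtraction returns a Time::Seconds object.
my $diff = $end - $start;
print "The difference is ", $diff->days, " days\n";
```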
468
469   How can I take a string and turn it into epoch seconds?
470       If it's a regular enough string that it always has the same format, you
471       can split it up and pass the parts to "timelocal" in the standard
472       Time::Local module. Otherwise, you should look into the Date::Calc,
473       Date::Parse, and Date::Manip modules from CPAN.
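
       For the fixed-format case, a minimal sketch might look like the
       following. It assumes strings of the form "YYYY-MM-DD HH:MM:SS" in the
       local time zone; epoch_from_string is a hypothetical helper name.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Hypothetical helper: turn "YYYY-MM-DD HH:MM:SS" (local time) into epoch
# seconds, or return nothing if the string does not have that shape.
sub epoch_from_string {
    my ($string) = @_;
    my ($year, $mon, $mday, $hour, $min, $sec) =
        $string =~ /\A(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)\z/
        or return;

    # timelocal() expects a zero-based month, like localtime() returns.
    return timelocal( $sec, $min, $hour, $mday, $mon - 1, $year );
}

my $epoch = epoch_from_string("2011-02-14 12:34:56");
print "$epoch\n";
```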
474
475   How can I find the Julian Day?
476       (contributed by brian d foy and Dave Cross)
477
478       You can use the Time::Piece module, part of the Standard Library, which
479       can convert a date/time to a Julian Day:
480
481           $ perl -MTime::Piece -le 'print localtime->julian_day'
482           2455607.7959375
483
484       Or the modified Julian Day:
485
486           $ perl -MTime::Piece -le 'print localtime->mjd'
487           55607.2961226851
488
489       Or even the day of the year (which is what some people think of as a
490       Julian day):
491
492           $ perl -MTime::Piece -le 'print localtime->yday'
493           45
494
495       You can also do the same things with the DateTime module:
496
497           $ perl -MDateTime -le'print DateTime->today->jd'
498           2453401.5
499           $ perl -MDateTime -le'print DateTime->today->mjd'
500           53401
501           $ perl -MDateTime -le'print DateTime->today->doy'
502           31
503
       You can also use the Time::JulianDay module available on CPAN. Ensure
       that you really want to find a Julian day, though, as many people have
       different ideas about Julian days (see
       <http://www.hermetic.ch/cal_stud/jdn.htm> for instance):
508
509           $  perl -MTime::JulianDay -le 'print local_julian_day( time )'
510           55608
511
512   How do I find yesterday's date?
513       (contributed by brian d foy)
514
       To do it correctly, you can use one of the "Date" modules since they
       work with calendars instead of times. The DateTime module makes it
       simple, and gives you the same time of day, only the day before,
       despite daylight saving time changes:
519
520           use DateTime;
521
522           my $yesterday = DateTime->now->subtract( days => 1 );
523
524           print "Yesterday was $yesterday\n";
525
526       You can also use the Date::Calc module using its "Today_and_Now"
527       function.
528
529           use Date::Calc qw( Today_and_Now Add_Delta_DHMS );
530
531           my @date_time = Add_Delta_DHMS( Today_and_Now(), -1, 0, 0, 0 );
532
533           print "@date_time\n";
534
535       Most people try to use the time rather than the calendar to figure out
536       dates, but that assumes that days are twenty-four hours each. For most
537       people, there are two days a year when they aren't: the switch to and
538       from summer time throws this off. For example, the rest of the
539       suggestions will be wrong sometimes:
540
541       Starting with Perl 5.10, Time::Piece and Time::Seconds are part of the
542       standard distribution, so you might think that you could do something
543       like this:
544
545           use Time::Piece;
546           use Time::Seconds;
547
548           my $yesterday = localtime() - ONE_DAY; # WRONG
549           print "Yesterday was $yesterday\n";
550
551       The Time::Piece module exports a new "localtime" that returns an
552       object, and Time::Seconds exports the "ONE_DAY" constant that is a set
553       number of seconds. This means that it always gives the time 24 hours
554       ago, which is not always yesterday. This can cause problems around the
555       end of daylight saving time when there's one day that is 25 hours long.
556
557       You have the same problem with Time::Local, which will give the wrong
558       answer for those same special cases:
559
560           # contributed by Gunnar Hjalmarsson
561            use Time::Local;
562            my $today = timelocal 0, 0, 12, ( localtime )[3..5];
563            my ($d, $m, $y) = ( localtime $today-86400 )[3..5]; # WRONG
564            printf "Yesterday: %d-%02d-%02d\n", $y+1900, $m+1, $d;
565
566   Does Perl have a Year 2000 or 2038 problem? Is Perl Y2K compliant?
567       (contributed by brian d foy)
568
569       Perl itself never had a Y2K problem, although that never stopped people
570       from creating Y2K problems on their own. See the documentation for
571       "localtime" for its proper use.
572
573       Starting with Perl 5.12, "localtime" and "gmtime" can handle dates past
574       03:14:08 January 19, 2038, when a 32-bit based time would overflow. You
575       still might get a warning on a 32-bit "perl":
576
577           % perl5.12 -E 'say scalar localtime( 0x9FFF_FFFFFFFF )'
578           Integer overflow in hexadecimal number at -e line 1.
579           Wed Nov  1 19:42:39 5576711
580
581       On a 64-bit "perl", you can get even larger dates for those really long
582       running projects:
583
584           % perl5.12 -E 'say scalar gmtime( 0x9FFF_FFFFFFFF )'
585           Thu Nov  2 00:42:39 5576711
586
587       You're still out of luck if you need to keep track of decaying protons
588       though.
589

Data: Strings

591   How do I validate input?
592       (contributed by brian d foy)
593
594       There are many ways to ensure that values are what you expect or want
595       to accept. Besides the specific examples that we cover in the perlfaq,
596       you can also look at the modules with "Assert" and "Validate" in their
597       names, along with other modules such as Regexp::Common.
598
599       Some modules have validation for particular types of input, such as
600       Business::ISBN, Business::CreditCard, Email::Valid, and
601       Data::Validate::IP.
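
       For simple cases a pattern match and a range check may be all that is
       needed. The following sketch is one possible approach, not a general
       recipe; is_integer_in_range is a hypothetical helper, and
       "looks_like_number" comes from the core Scalar::Util module.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: accept only whole numbers within a given range.
sub is_integer_in_range {
    my ($value, $min, $max) = @_;
    return 0 unless defined $value && $value =~ /\A[-+]?[0-9]+\z/;
    return $value >= $min && $value <= $max ? 1 : 0;
}

print is_integer_in_range( "42",  1, 100 ) ? "ok\n" : "rejected\n";
print is_integer_in_range( "4.2", 1, 100 ) ? "ok\n" : "rejected\n";

# looks_like_number() accepts anything Perl itself would treat as numeric.
print looks_like_number("1e3") ? "numeric\n" : "not numeric\n";
```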
602
603   How do I unescape a string?
604       It depends just what you mean by "escape". URL escapes are dealt with
605       in perlfaq9. Shell escapes with the backslash ("\") character are
606       removed with
607
608           s/\\(.)/$1/g;
609
610       This won't expand "\n" or "\t" or any other special escapes.
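
       If you do want some of the special escapes expanded as well, one
       option is to look each escaped character up in a table. This is a
       sketch only; the %expansion table and the unescape helper are
       illustrative names, and the table covers just "\n" and "\t".

```perl
use strict;
use warnings;

# Escapes this sketch chooses to expand; any other escaped character
# simply loses its backslash.
my %expansion = ( n => "\n", t => "\t" );

sub unescape {
    my ($text) = @_;
    $text =~ s/\\(.)/exists $expansion{$1} ? $expansion{$1} : $1/ge;
    return $text;
}

print unescape('name:\tPerl\n');
```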
611
612   How do I remove consecutive pairs of characters?
613       (contributed by brian d foy)
614
615       You can use the substitution operator to find pairs of characters (or
616       runs of characters) and replace them with a single instance. In this
617       substitution, we find a character in "(.)". The memory parentheses
618       store the matched character in the back-reference "\g1" and we use that
619       to require that the same thing immediately follow it. We replace that
620       part of the string with the character in $1.
621
622           s/(.)\g1/$1/g;
623
624       We can also use the transliteration operator, "tr///". In this example,
625       the search list side of our "tr///" contains nothing, but the "c"
626       option complements that so it contains everything. The replacement list
627       also contains nothing, so the transliteration is almost a no-op since
628       it won't do any replacements (or more exactly, replace the character
629       with itself). However, the "s" option squashes duplicated and
630       consecutive characters in the string so a character does not show up
631       next to itself
632
633           my $str = 'Haarlem';   # in the Netherlands
634           $str =~ tr///cs;       # Now Harlem, like in New York
635
636   How do I expand function calls in a string?
637       (contributed by brian d foy)
638
639       This is documented in perlref, and although it's not the easiest thing
640       to read, it does work. In each of these examples, we call the function
641       inside the braces used to dereference a reference. If we have more than
642       one return value, we can construct and dereference an anonymous array.
643       In this case, we call the function in list context.
644
645           print "The time values are @{ [localtime] }.\n";
646
647       If we want to call the function in scalar context, we have to do a bit
648       more work. We can really have any code we like inside the braces, so we
649       simply have to end with the scalar reference, although how you do that
650       is up to you, and you can use code inside the braces. Note that the use
651       of parens creates a list context, so we need "scalar" to force the
652       scalar context on the function:
653
654           print "The time is ${\(scalar localtime)}.\n"
655
656           print "The time is ${ my $x = localtime; \$x }.\n";
657
658       If your function already returns a reference, you don't need to create
659       the reference yourself.
660
661           sub timestamp { my $t = localtime; \$t }
662
663           print "The time is ${ timestamp() }.\n";
664
665       The "Interpolation" module can also do a lot of magic for you. You can
666       specify a variable name, in this case "E", to set up a tied hash that
667       does the interpolation for you. It has several other methods to do this
668       as well.
669
670           use Interpolation E => 'eval';
671           print "The time values are $E{localtime()}.\n";
672
673       In most cases, it is probably easier to simply use string
674       concatenation, which also forces scalar context.
675
676           print "The time is " . localtime() . ".\n";
677
678   How do I find matching/nesting anything?
679       To find something between two single characters, a pattern like
680       "/x([^x]*)x/" will get the intervening bits in $1. For multiple ones,
681       then something more like "/alpha(.*?)omega/" would be needed. For
682       nested patterns and/or balanced expressions, see the so-called (?PARNO)
683       construct (available since perl 5.10).  The CPAN module Regexp::Common
684       can help to build such regular expressions (see in particular
685       Regexp::Common::balanced and Regexp::Common::delimited).
686
687       More complex cases will require to write a parser, probably using a
688       parsing module from CPAN, like Regexp::Grammars, Parse::RecDescent,
689       Parse::Yapp, Text::Balanced, or Marpa::XS.
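
       As an illustration of the (?PARNO) construct, the sketch below matches
       one balanced, possibly nested, pair of parentheses. It requires perl
       5.10 or later; the pattern and sample text are made up for the
       example.

```perl
use strict;
use warnings;

# Group 1 matches a parenthesized chunk; (?1) recurses into group 1
# to allow nested parentheses.
my $balanced = qr/( \( (?: [^()]++ | (?1) )* \) )/x;

my $text = "call( outer( inner ), last ) and more";
if ( $text =~ $balanced ) {
    print "Found: $1\n";
}
```

       Note that (?1) refers to a group by number, so embedding this pattern
       in a larger one that has earlier capture groups would need the number
       adjusted.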
690
691   How do I reverse a string?
692       Use "reverse()" in scalar context, as documented in "reverse" in
693       perlfunc.
694
695           my $reversed = reverse $string;
696
697   How do I expand tabs in a string?
698       You can do it yourself:
699
700           1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
701
702       Or you can just use the Text::Tabs module (part of the standard Perl
703       distribution).
704
705           use Text::Tabs;
706           my @expanded_lines = expand(@lines_with_tabs);
707
708   How do I reformat a paragraph?
709       Use Text::Wrap (part of the standard Perl distribution):
710
711           use Text::Wrap;
712           print wrap("\t", '  ', @paragraphs);
713
714       The paragraphs you give to Text::Wrap should not contain embedded
715       newlines. Text::Wrap doesn't justify the lines (flush-right).
716
717       Or use the CPAN module Text::Autoformat. Formatting files can be easily
718       done by making a shell alias, like so:
719
720           alias fmt="perl -i -MText::Autoformat -n0777 \
721               -e 'print autoformat $_, {all=>1}' $*"
722
723       See the documentation for Text::Autoformat to appreciate its many
724       capabilities.
725
726   How can I access or change N characters of a string?
727       You can access the first characters of a string with substr().  To get
728       the first character, for example, start at position 0 and grab the
729       string of length 1.
730
731           my $string = "Just another Perl Hacker";
732           my $first_char = substr( $string, 0, 1 );  #  'J'
733
734       To change part of a string, you can use the optional fourth argument
735       which is the replacement string.
736
737           substr( $string, 13, 4, "Perl 5.8.0" );
738
739       You can also use substr() as an lvalue.
740
741           substr( $string, 13, 4 ) =  "Perl 5.8.0";
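
       A short sketch of what these calls leave behind: the four-argument
       form returns the text it replaced, and a negative offset counts from
       the end of the string. The variable names are illustrative.

```perl
use strict;
use warnings;

my $string = "Just another Perl Hacker";

# Four-argument substr() replaces in place and returns what was there before.
my $old = substr( $string, 13, 4, "Perl 5.8.0" );

print "$old\n";       # Perl
print "$string\n";    # Just another Perl 5.8.0 Hacker

# A negative offset counts from the end of the string.
my $last_word = substr( $string, -6 );    # Hacker
print "$last_word\n";
```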
742
743   How do I change the Nth occurrence of something?
744       You have to keep track of N yourself. For example, let's say you want
745       to change the fifth occurrence of "whoever" or "whomever" into
746       "whosoever" or "whomsoever", case insensitively. These all assume that
747       $_ contains the string to be altered.
748
749           $count = 0;
750           s{((whom?)ever)}{
751           ++$count == 5       # is it the 5th?
752               ? "${2}soever"  # yes, swap
753               : $1            # renege and leave it there
754               }ige;
755
756       In the more general case, you can use the "/g" modifier in a "while"
757       loop, keeping count of matches.
758
759           $WANT = 3;
760           $count = 0;
761           $_ = "One fish two fish red fish blue fish";
762           while (/(\w+)\s+fish\b/gi) {
763               if (++$count == $WANT) {
764                   print "The third fish is a $1 one.\n";
765               }
766           }
767
768       That prints out: "The third fish is a red one."  You can also use a
769       repetition count and repeated pattern like this:
770
771           /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
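
       The counting idea can be wrapped in a small subroutine. This is only a
       sketch: "replace_nth" and its argument order are hypothetical names
       chosen for illustration, not part of Perl or of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: replace only the $n-th match of $pattern in $text.
sub replace_nth {
    my ( $text, $pattern, $n, $replacement ) = @_;
    my $count = 0;

    # Every match is visited; all but the $n-th are put back unchanged.
    $text =~ s{($pattern)}{ ++$count == $n ? $replacement : $1 }ge;
    return $text;
}

print replace_nth( "One fish two fish red fish blue fish", qr/fish/, 3, "herring" ), "\n";
```

       Because the substitution works on a copy of the argument, the caller's
       string is left untouched.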
772
773   How can I count the number of occurrences of a substring within a string?
774       There are a number of ways, with varying efficiency. If you want a
775       count of a certain single character (X) within a string, you can use
776       the "tr///" function like so:
777
778           my $string = "ThisXlineXhasXsomeXx'sXinXit";
779           my $count = ($string =~ tr/X//);
780           print "There are $count X characters in the string";
781
782       This is fine if you are just looking for a single character. However,
783       if you are trying to count multiple character substrings within a
784       larger string, "tr///" won't work. What you can do is wrap a while()
785       loop around a global pattern match. For example, let's count negative
786       integers:
787
788           my $string = "-9 55 48 -2 23 -76 4 14 -44";
789           my $count = 0;
790           while ($string =~ /-\d+/g) { $count++ }
791           print "There are $count negative numbers in the string";
792
793       Another version uses a global match in list context, then assigns the
794       result to a scalar, producing a count of the number of matches.
795
796           my $count = () = $string =~ /-\d+/g;
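
       If the substring is literal text rather than a pattern, a loop around
       "index" is another option, and it avoids having to escape regex
       metacharacters. The following is a minimal sketch; "count_substr" is a
       hypothetical name, and it counts non-overlapping occurrences only.

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping occurrences of a literal
# substring, using index() instead of a regular expression.
sub count_substr {
    my ( $haystack, $needle ) = @_;
    return 0 if $needle eq '';    # an empty needle would never advance

    my ( $count, $pos ) = ( 0, 0 );
    while ( ( $pos = index( $haystack, $needle, $pos ) ) != -1 ) {
        $count++;
        $pos += length $needle;   # continue after this occurrence
    }
    return $count;
}

print count_substr( "a.b.c", "." ), "\n";
```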
797
798   How do I capitalize all the words on one line?
799       (contributed by brian d foy)
800
801       Damian Conway's Text::Autoformat handles all of the thinking for you.
802
803           use Text::Autoformat;
804           my $x = "Dr. Strangelove or: How I Learned to Stop ".
805             "Worrying and Love the Bomb";
806
807           print $x, "\n";
808           for my $style (qw( sentence title highlight )) {
809               print autoformat($x, { case => $style }), "\n";
810           }
811
812       How do you want to capitalize those words?
813
814           FRED AND BARNEY'S LODGE        # all uppercase
815           Fred And Barney's Lodge        # title case
816           Fred and Barney's Lodge        # highlight case
817
818       It's not as easy a problem as it looks. How many words do you think are
819       in there? Wait for it... wait for it.... If you answered 5 you're
820       right. Perl words are groups of "\w+", but that's not what you want to
821       capitalize. How is Perl supposed to know not to capitalize that "s"
822       after the apostrophe? You could try a regular expression:
823
824           $string =~ s/ (
825                        (^\w)    #at the beginning of the line
826                          |      # or
827                        (\s\w)   #preceded by whitespace
828                          )
829                       /\U$1/xg;
830
831           $string =~ s/([\w']+)/\u\L$1/g;
832
833       Now, what if you don't want to capitalize that "and"? Just use
834       Text::Autoformat and get on with the next problem. :)
835
836   How can I split a [character]-delimited string except when inside
837       [character]?
838       Several modules can handle this sort of parsing--Text::Balanced,
839       Text::CSV, Text::CSV_XS, and Text::ParseWords, among others.
840
841       Take the example case of trying to split a string that is comma-
842       separated into its different fields. You can't use "split(/,/)" because
843       you shouldn't split if the comma is inside quotes. For example, take a
844       data line like this:
845
846           SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
847
848       Due to the restriction of the quotes, this is a fairly complex problem.
849       Thankfully, we have Jeffrey Friedl, author of Mastering Regular
850       Expressions, to handle these for us. He suggests (assuming your string
851       is contained in $text):
852
853            my @new = ();
854            push(@new, $+) while $text =~ m{
855                "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
856               | ([^,]+),?
857               | ,
858            }gx;
859            push(@new, undef) if substr($text,-1,1) eq ',';
860
       If you want to represent quotation marks inside a quotation-mark-
       delimited field, escape them with backslashes (eg, "like \"this\"").
863
864       Alternatively, the Text::ParseWords module (part of the standard Perl
865       distribution) lets you say:
866
867           use Text::ParseWords;
868           @new = quotewords(",", 0, $text);
869
870       For parsing or generating CSV, though, using Text::CSV rather than
871       implementing it yourself is highly recommended; you'll save yourself
872       odd bugs popping up later by just using code which has already been
873       tried and tested in production for years.
874
875   How do I strip blank space from the beginning/end of a string?
876       (contributed by brian d foy)
877
878       A substitution can do this for you. For a single line, you want to
879       replace all the leading or trailing whitespace with nothing. You can do
880       that with a pair of substitutions:
881
882           s/^\s+//;
883           s/\s+$//;
884
885       You can also write that as a single substitution, although it turns out
886       the combined statement is slower than the separate ones. That might not
887       matter to you, though:
888
889           s/^\s+|\s+$//g;
890
       In this regular expression, the alternation matches either at the
       beginning or the end of the string, since the alternation has a lower
       precedence than the anchors. With the "/g" flag, the substitution
       makes all possible matches, so it gets both. Remember, the trailing
       newline matches the "\s+", and the "$" anchor can match at the
       absolute end of the string, so the newline disappears too. Just add the
       newline to the output, which has the added benefit of preserving
       "blank" (consisting entirely of whitespace) lines which the "^\s+"
       would remove all by itself:
900
901           while( <> ) {
902               s/^\s+|\s+$//g;
903               print "$_\n";
904           }
905
906       For a multi-line string, you can apply the regular expression to each
907       logical line in the string by adding the "/m" flag (for "multi-line").
908       With the "/m" flag, the "$" matches before an embedded newline, so it
909       doesn't remove it. This pattern still removes the newline at the end of
910       the string:
911
912           $string =~ s/^\s+|\s+$//gm;
913
914       Remember that lines consisting entirely of whitespace will disappear,
915       since the first part of the alternation can match the entire string and
916       replace it with nothing. If you need to keep embedded blank lines, you
917       have to do a little more work. Instead of matching any whitespace
918       (since that includes a newline), just match the other whitespace:
919
920           $string =~ s/^[\t\f ]+|[\t\f ]+$//mg;
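
       If you trim in many places, the pair of substitutions can live in a
       subroutine that returns a trimmed copy. This is a sketch only; the
       name "trim" is an assumption, since Perl versions of this era have no
       built-in function by that name.

```perl
use strict;
use warnings;

# Hypothetical helper: return a trimmed copy and leave the argument alone.
sub trim {
    my ($text) = @_;
    $text =~ s/^\s+//;    # leading whitespace
    $text =~ s/\s+$//;    # trailing whitespace, including a final newline
    return $text;
}

my @lines = ( "  fred  \n", "\tbarney\n", "   \n" );
print '[', trim($_), "]\n" for @lines;
```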
921
922   How do I pad a string with blanks or pad a number with zeroes?
923       In the following examples, $pad_len is the length to which you wish to
924       pad the string, $text or $num contains the string to be padded, and
925       $pad_char contains the padding character. You can use a single
926       character string constant instead of the $pad_char variable if you know
927       what it is in advance. And in the same way you can use an integer in
928       place of $pad_len if you know the pad length in advance.
929
930       The simplest method uses the "sprintf" function. It can pad on the left
931       or right with blanks and on the left with zeroes and it will not
932       truncate the result. The "pack" function can only pad strings on the
933       right with blanks and it will truncate the result to a maximum length
934       of $pad_len.
935
936           # Left padding a string with blanks (no truncation):
937           my $padded = sprintf("%${pad_len}s", $text);
938           my $padded = sprintf("%*s", $pad_len, $text);  # same thing
939
940           # Right padding a string with blanks (no truncation):
941           my $padded = sprintf("%-${pad_len}s", $text);
942           my $padded = sprintf("%-*s", $pad_len, $text); # same thing
943
944           # Left padding a number with 0 (no truncation):
945           my $padded = sprintf("%0${pad_len}d", $num);
946           my $padded = sprintf("%0*d", $pad_len, $num); # same thing
947
948           # Right padding a string with blanks using pack (will truncate):
949           my $padded = pack("A$pad_len",$text);
950
951       If you need to pad with a character other than blank or zero you can
952       use one of the following methods. They all generate a pad string with
953       the "x" operator and combine that with $text. These methods do not
954       truncate $text.
955
956       Left and right padding with any character, creating a new string:
957
958           my $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
959           my $padded = $text . $pad_char x ( $pad_len - length( $text ) );
960
961       Left and right padding with any character, modifying $text directly:
962
963           substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
964           $text .= $pad_char x ( $pad_len - length( $text ) );
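
       The difference in truncation behavior is easiest to see when the text
       is longer than the pad length. A short sketch, with sample values
       chosen purely for illustration:

```perl
use strict;
use warnings;

my $text    = "perlfaq";
my $pad_len = 4;          # deliberately shorter than the text

# sprintf never truncates, so the whole string survives.
my $via_sprintf = sprintf( "%-${pad_len}s", $text );

# pack with the A format cuts the string down to $pad_len characters.
my $via_pack = pack( "A$pad_len", $text );

# Left padding to width 10 with an arbitrary character.
my $starred = '*' x ( 10 - length($text) ) . $text;

print "$via_sprintf | $via_pack | $starred\n";
```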
965
966   How do I extract selected columns from a string?
967       (contributed by brian d foy)
968
969       If you know the columns that contain the data, you can use "substr" to
970       extract a single column.
971
972           my $column = substr( $line, $start_column, $length );
973
974       You can use "split" if the columns are separated by whitespace or some
975       other delimiter, as long as whitespace or the delimiter cannot appear
976       as part of the data.
977
978           my $line    = ' fred barney   betty   ';
979           my @columns = split /\s+/, $line;
980               # ( '', 'fred', 'barney', 'betty' );
981
982           my $line    = 'fred||barney||betty';
983           my @columns = split /\|/, $line;
984               # ( 'fred', '', 'barney', '', 'betty' );
985
986       If you want to work with comma-separated values, don't do this since
987       that format is a bit more complicated. Use one of the modules that
988       handle that format, such as Text::CSV, Text::CSV_XS, or Text::CSV_PP.
989
990       If you want to break apart an entire line of fixed columns, you can use
991       "unpack" with the A (ASCII) format. By using a number after the format
992       specifier, you can denote the column width. See the "pack" and "unpack"
993       entries in perlfunc for more details.
994
           my @fields = unpack( "A8 A8 A8 A16 A4", $line );
996
997       Note that spaces in the format argument to "unpack" do not denote
998       literal spaces. If you have space separated data, you may want "split"
999       instead.
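
       As a sketch of the fixed-column case, the following builds a sample
       record with "sprintf" and takes it apart again. The column widths and
       field names are made up for illustration. Note that the template comes
       first, and that the A format strips trailing spaces from each field.

```perl
use strict;
use warnings;

# Build a fixed-width record: two 8-character columns and one of 4.
my $line = sprintf "%-8s%-8s%-4s", "fred", "barney", "42";

# Template first, then the data.
my ( $first, $second, $age ) = unpack( "A8 A8 A4", $line );

print "$first/$second/$age\n";
```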
1000
1001   How do I find the soundex value of a string?
1002       (contributed by brian d foy)
1003
       You can use the "Text::Soundex" module. If you want to do fuzzy or
       close matching, you might also try the String::Approx,
       Text::Metaphone, and Text::DoubleMetaphone modules.
1007
1008   How can I expand variables in text strings?
1009       (contributed by brian d foy)
1010
1011       If you can avoid it, don't, or if you can use a templating system, such
1012       as Text::Template or Template Toolkit, do that instead. You might even
1013       be able to get the job done with "sprintf" or "printf":
1014
1015           my $string = sprintf 'Say hello to %s and %s', $foo, $bar;
1016
1017       However, for the one-off simple case where I don't want to pull out a
1018       full templating system, I'll use a string that has two Perl scalar
       variables in it. In this example, I want to expand $foo and $bar to
       their values:
1021
1022           my $foo = 'Fred';
1023           my $bar = 'Barney';
1024           $string = 'Say hello to $foo and $bar';
1025
1026       One way I can do this involves the substitution operator and a double
1027       "/e" flag. The first "/e" evaluates $1 on the replacement side and
1028       turns it into $foo. The second /e starts with $foo and replaces it with
1029       its value. $foo, then, turns into 'Fred', and that's finally what's
1030       left in the string:
1031
1032           $string =~ s/(\$\w+)/$1/eeg; # 'Say hello to Fred and Barney'
1033
1034       The "/e" will also silently ignore violations of strict, replacing
1035       undefined variable names with the empty string. Since I'm using the
1036       "/e" flag (twice even!), I have all of the same security problems I
1037       have with "eval" in its string form. If there's something odd in $foo,
1038       perhaps something like "@{[ system "rm -rf /" ]}", then I could get
1039       myself in trouble.
1040
1041       To get around the security problem, I could also pull the values from a
1042       hash instead of evaluating variable names. Using a single "/e", I can
1043       check the hash to ensure the value exists, and if it doesn't, I can
1044       replace the missing value with a marker, in this case "???" to signal
1045       that I missed something:
1046
1047           my $string = 'This has $foo and $bar';
1048
1049           my %Replacements = (
1050               foo  => 'Fred',
1051               );
1052
1053           # $string =~ s/\$(\w+)/$Replacements{$1}/g;
1054           $string =~ s/\$(\w+)/
1055               exists $Replacements{$1} ? $Replacements{$1} : '???'
1056               /eg;
1057
1058           print $string;
1059
1060   What's wrong with always quoting "$vars"?
1061       The problem is that those double-quotes force stringification--coercing
1062       numbers and references into strings--even when you don't want them to
1063       be strings. Think of it this way: double-quote expansion is used to
1064       produce new strings. If you already have a string, why do you need
1065       more?
1066
1067       If you get used to writing odd things like these:
1068
1069           print "$var";       # BAD
1070           my $new = "$old";       # BAD
1071           somefunc("$var");    # BAD
1072
1073       You'll be in trouble. Those should (in 99.8% of the cases) be the
1074       simpler and more direct:
1075
1076           print $var;
1077           my $new = $old;
1078           somefunc($var);
1079
1080       Otherwise, besides slowing you down, you're going to break code when
1081       the thing in the scalar is actually neither a string nor a number, but
1082       a reference:
1083
1084           func(\@array);
1085           sub func {
1086               my $aref = shift;
1087               my $oref = "$aref";  # WRONG
1088           }
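
       A quick way to see the damage is to ask "ref" about both values. In
       this sketch (the variable names are illustrative), the quoted copy is
       only a string that describes the reference, so it can no longer be
       dereferenced:

```perl
use strict;
use warnings;

my @array = ( 1, 2, 3 );
my $aref  = \@array;
my $copy  = "$aref";    # stringified: looks like "ARRAY(0x...)"

print ref($aref) ? "reference\n" : "plain string\n";
print ref($copy) ? "reference\n" : "plain string\n";
```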
1089
1090       You can also get into subtle problems on those few operations in Perl
1091       that actually do care about the difference between a string and a
1092       number, such as the magical "++" autoincrement operator or the
1093       syscall() function.
1094
1095       Stringification also destroys arrays.
1096
1097           my @lines = `command`;
1098           print "@lines";     # WRONG - extra blanks
1099           print @lines;       # right
1100
1101   Why don't my <<HERE documents work?
       Here documents are found in perlop. Check for these four things:

       There must be no space after the << part.
       There (probably) should be a semicolon at the end of the opening token.
       You can't (easily) have any space in front of the tag.
       There needs to be at least a line separator after the end token.
1108
1109       If you want to indent the text in the here document, you can do this:
1110
1111           # all in one
1112           (my $VAR = <<HERE_TARGET) =~ s/^\s+//gm;
1113               your text
1114               goes here
1115           HERE_TARGET
1116
1117       But the HERE_TARGET must still be flush against the margin.  If you
1118       want that indented also, you'll have to quote in the indentation.
1119
1120           (my $quote = <<'    FINIS') =~ s/^\s+//gm;
1121                   ...we will have peace, when you and all your works have
1122                   perished--and the works of your dark master to whom you
1123                   would deliver us. You are a liar, Saruman, and a corrupter
1124                   of men's hearts. --Theoden in /usr/src/perl/taint.c
1125               FINIS
1126           $quote =~ s/\s+--/\n--/;
1127
1128       A nice general-purpose fixer-upper function for indented here documents
1129       follows. It expects to be called with a here document as its argument.
1130       It looks to see whether each line begins with a common substring, and
1131       if so, strips that substring off. Otherwise, it takes the amount of
1132       leading whitespace found on the first line and removes that much off
1133       each subsequent line.
1134
1135           sub fix {
1136               local $_ = shift;
1137               my ($white, $leader);  # common whitespace and common leading string
1138               if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\g1\g2?.*\n)+$/) {
1139                   ($white, $leader) = ($2, quotemeta($1));
1140               } else {
1141                   ($white, $leader) = (/^(\s+)/, '');
1142               }
1143               s/^\s*?$leader(?:$white)?//gm;
1144               return $_;
1145           }
1146
1147       This works with leading special strings, dynamically determined:
1148
1149           my $remember_the_main = fix<<'    MAIN_INTERPRETER_LOOP';
1150           @@@ int
1151           @@@ runops() {
1152           @@@     SAVEI32(runlevel);
1153           @@@     runlevel++;
1154           @@@     while ( op = (*op->op_ppaddr)() );
1155           @@@     TAINT_NOT;
1156           @@@     return 0;
1157           @@@ }
1158           MAIN_INTERPRETER_LOOP
1159
1160       Or with a fixed amount of leading whitespace, with remaining
1161       indentation correctly preserved:
1162
1163           my $poem = fix<<EVER_ON_AND_ON;
1164              Now far ahead the Road has gone,
1165             And I must follow, if I can,
1166              Pursuing it with eager feet,
1167             Until it joins some larger way
1168              Where many paths and errands meet.
1169             And whither then? I cannot say.
1170               --Bilbo in /usr/src/perl/pp_ctl.c
1171           EVER_ON_AND_ON
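
       If you can rely on Perl 5.26 or later, the indented here document
       syntax ("<<~", described in perlop) removes the terminator's
       indentation from every line, so no helper function is needed. A
       minimal sketch, assuming such a Perl is available; the "poem"
       subroutine is only there to give the text something to be indented
       inside:

```perl
use strict;
use warnings;
use 5.026;    # the <<~ syntax needs Perl 5.26 or later

sub poem {
    # The terminator is indented by eight spaces, so eight spaces are
    # removed from each line; any extra indentation is preserved.
    return <<~EVER_ON_AND_ON;
        Now far ahead the Road has gone,
          And I must follow, if I can,
        EVER_ON_AND_ON
}

print poem();
```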
1172

Data: Arrays

1174   What is the difference between a list and an array?
1175       (contributed by brian d foy)
1176
1177       A list is a fixed collection of scalars. An array is a variable that
1178       holds a variable collection of scalars. An array can supply its
1179       collection for list operations, so list operations also work on arrays:
1180
1181           # slices
1182           ( 'dog', 'cat', 'bird' )[2,3];
1183           @animals[2,3];
1184
1185           # iteration
1186           foreach ( qw( dog cat bird ) ) { ... }
1187           foreach ( @animals ) { ... }
1188
1189           my @three = grep { length == 3 } qw( dog cat bird );
1190           my @three = grep { length == 3 } @animals;
1191
1192           # supply an argument list
1193           wash_animals( qw( dog cat bird ) );
1194           wash_animals( @animals );
1195
       Array operations, which change the scalars, rearrange them, or add or
       remove some scalars, only work on arrays. These can't work on a
       list, which is fixed. Array operations include "shift", "unshift",
       "push", "pop", and "splice".
1200
1201       An array can also change its length:
1202
1203           $#animals = 1;  # truncate to two elements
1204           $#animals = 10000; # pre-extend to 10,001 elements
1205
1206       You can change an array element, but you can't change a list element:
1207
1208           $animals[0] = 'Rottweiler';
1209           qw( dog cat bird )[0] = 'Rottweiler'; # syntax error!
1210
1211           foreach ( @animals ) {
1212               s/^d/fr/;  # works fine
1213           }
1214
1215           foreach ( qw( dog cat bird ) ) {
1216               s/^d/fr/;  # Error! Modification of read only value!
1217           }
1218
       If the list element is itself a variable, it can appear that you are
       changing a list element. The list element is the variable, though, not
       the data. You're not changing the list element, but something the
       list element refers to. The list element itself doesn't change: it's
       still the same variable.
1224
1225       You also have to be careful about context. You can assign an array to a
1226       scalar to get the number of elements in the array. This only works for
1227       arrays, though:
1228
1229           my $count = @animals;  # only works with arrays
1230
1231       If you try to do the same thing with what you think is a list, you get
1232       a quite different result. Although it looks like you have a list on the
1233       righthand side, Perl actually sees a bunch of scalars separated by a
1234       comma:
1235
1236           my $scalar = ( 'dog', 'cat', 'bird' );  # $scalar gets bird
1237
       Since you're assigning to a scalar, the righthand side is in scalar
       context. The comma operator (yes, it's an operator!) in scalar context
       evaluates its lefthand side, throws away the result, and evaluates its
       righthand side and returns the result. In effect, that list-lookalike
       assigns to $scalar its rightmost value. Many people mess this up
       because they choose a list-lookalike whose last element is also the
       count they expect:
1245
1246           my $scalar = ( 1, 2, 3 );  # $scalar gets 3, accidentally
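
       If what you actually want is the number of items in a list, one
       approach is to assign the list to an empty list first, the same idiom
       used for counting pattern matches. A short sketch with illustrative
       variable names:

```perl
use strict;
use warnings;

my @animals = qw( dog cat bird );

# An array in scalar context gives its length.
my $count_from_array = @animals;

# A list assignment in scalar context gives the number of items on its
# righthand side, so this counts the list.
my $count_from_list = () = qw( dog cat bird );

# Parentheses on the left give list context; $first gets the first item.
my ($first) = qw( dog cat bird );

print "$count_from_array $count_from_list $first\n";
```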
1247
1248   What is the difference between $array[1] and @array[1]?
1249       (contributed by brian d foy)
1250
1251       The difference is the sigil, that special character in front of the
1252       array name. The "$" sigil means "exactly one item", while the "@" sigil
1253       means "zero or more items". The "$" gets you a single scalar, while the
1254       "@" gets you a list.
1255
1256       The confusion arises because people incorrectly assume that the sigil
1257       denotes the variable type.
1258
1259       The $array[1] is a single-element access to the array. It's going to
1260       return the item in index 1 (or undef if there is no item there).  If
1261       you intend to get exactly one element from the array, this is the form
1262       you should use.
1263
1264       The @array[1] is an array slice, although it has only one index.  You
1265       can pull out multiple elements simultaneously by specifying additional
1266       indices as a list, like @array[1,4,3,0].
1267
1268       Using a slice on the lefthand side of the assignment supplies list
1269       context to the righthand side. This can lead to unexpected results.
1270       For instance, if you want to read a single line from a filehandle,
1271       assigning to a scalar value is fine:
1272
1273           $array[1] = <STDIN>;
1274
1275       However, in list context, the line input operator returns all of the
1276       lines as a list. The first line goes into @array[1] and the rest of the
1277       lines mysteriously disappear:
1278
1279           @array[1] = <STDIN>;  # most likely not what you want
1280
1281       Either the "use warnings" pragma or the -w flag will warn you when you
1282       use an array slice with a single index.
1283
1284   How can I remove duplicate elements from a list or array?
1285       (contributed by brian d foy)
1286
1287       Use a hash. When you think the words "unique" or "duplicated", think
1288       "hash keys".
1289
1290       If you don't care about the order of the elements, you could just
1291       create the hash then extract the keys. It's not important how you
1292       create that hash: just that you use "keys" to get the unique elements.
1293
1294           my %hash   = map { $_, 1 } @array;
1295           # or a hash slice: @hash{ @array } = ();
1296           # or a foreach: $hash{$_} = 1 foreach ( @array );
1297
1298           my @unique = keys %hash;
1299
1300       If you want to use a module, try the "uniq" function from
1301       List::MoreUtils. In list context it returns the unique elements,
1302       preserving their order in the list. In scalar context, it returns the
1303       number of unique elements.
1304
1305           use List::MoreUtils qw(uniq);
1306
1307           my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
1308           my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7
1309
       You can also go through each element and skip the ones you've seen
       before. Use a hash to keep track. The first time the loop sees an
       element, that element has no key in %seen. The "next" statement creates
       the key and immediately uses its value, which is "undef", so the loop
       continues to the "push" and increments the value for that key. The next
       time the loop sees that same element, its key exists in the hash and
       the value for that key is true (since it's not 0 or "undef"), so the
       "next" skips that iteration and the loop goes to the next element.
1318
1319           my @unique = ();
1320           my %seen   = ();
1321
1322           foreach my $elem ( @array ) {
1323               next if $seen{ $elem }++;
1324               push @unique, $elem;
1325           }
1326
1327       You can write this more briefly using a grep, which does the same
1328       thing.
1329
1330           my %seen = ();
1331           my @unique = grep { ! $seen{ $_ }++ } @array;
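
       The same "seen" hash works when two elements should count as
       duplicates under some normalization, such as ignoring case. The
       following is a sketch under that assumption; "uniq_by" is a
       hypothetical helper name, and the first spelling encountered is the
       one that is kept.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first element for each distinct key,
# where the key is computed by the supplied code reference.
sub uniq_by {
    my ( $key_for, @items ) = @_;
    my %seen;
    return grep { !$seen{ $key_for->($_) }++ } @items;
}

my @names = uniq_by( sub { lc $_[0] }, qw( Fred fred Barney FRED barney Betty ) );
print "@names\n";
```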
1332
1333   How can I tell whether a certain element is contained in a list or array?
1334       (portions of this answer contributed by Anno Siegel and brian d foy)
1335
1336       Hearing the word "in" is an indication that you probably should have
1337       used a hash, not a list or array, to store your data. Hashes are
1338       designed to answer this question quickly and efficiently. Arrays
1339       aren't.
1340
1341       That being said, there are several ways to approach this. In Perl 5.10
1342       and later, you can use the smart match operator to check that an item
1343       is contained in an array or a hash:
1344
1345           use 5.010;
1346
1347           if( $item ~~ @array ) {
1348               say "The array contains $item"
1349           }
1350
1351           if( $item ~~ %hash ) {
1352               say "The hash contains $item"
1353           }
1354
1355       With earlier versions of Perl, you have to do a bit more work. If you
1356       are going to make this query many times over arbitrary string values,
1357       the fastest way is probably to invert the original array and maintain a
1358       hash whose keys are the first array's values:
1359
1360           my @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
1361           my %is_blue = ();
1362           for (@blues) { $is_blue{$_} = 1 }
1363
1364       Now you can check whether $is_blue{$some_color}. It might have been a
1365       good idea to keep the blues all in a hash in the first place.
1366
1367       If the values are all small integers, you could use a simple indexed
1368       array. This kind of an array will take up less space:
1369
1370           my @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
1371           my @is_tiny_prime = ();
1372           for (@primes) { $is_tiny_prime[$_] = 1 }
           # or simply  @is_tiny_prime[@primes] = (1) x @primes;
1374
1375       Now you check whether $is_tiny_prime[$some_number].
1376
1377       If the values in question are integers instead of strings, you can save
1378       quite a lot of space by using bit strings instead:
1379
1380           my @articles = ( 1..10, 150..2000, 2017 );
           my $read = '';
1382           for (@articles) { vec($read,$_,1) = 1 }
1383
1384       Now check whether "vec($read,$n,1)" is true for some $n.
1385
1386       These methods guarantee fast individual tests but require a re-
1387       organization of the original list or array. They only pay off if you
1388       have to test multiple values against the same array.
1389
1390       If you are testing only once, the standard module List::Util exports
1391       the function "first" for this purpose. It works by stopping once it
1392       finds the element. It's written in C for speed, and its Perl equivalent
1393       looks like this subroutine:
1394
1395           sub first (&@) {
1396               my $code = shift;
1397               foreach (@_) {
1398                   return $_ if &{$code}();
1399               }
1400               undef;
1401           }
1402
1403       If speed is of little concern, the common idiom uses grep in scalar
1404       context (which returns the number of items that passed its condition)
1405       to traverse the entire list. This does have the benefit of telling you
1406       how many matches it found, though.
1407
1408           my $is_there = grep $_ eq $whatever, @array;
1409
1410       If you want to actually extract the matching elements, simply use grep
1411       in list context.
1412
1413           my @matches = grep $_ eq $whatever, @array;
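
       For a one-off membership test that stops at the first hit and returns
       a simple true or false, recent versions of List::Util also export
       "any". This sketch assumes List::Util 1.33 or later. It may be a
       useful alternative to the smart match operator, which has been marked
       experimental since Perl 5.18.

```perl
use strict;
use warnings;
use List::Util qw(any);

my @blues = qw( azure cerulean teal turquoise lapis-lazuli );

# any() stops scanning as soon as the block returns true.
my $has_teal = any { $_ eq 'teal' } @blues;
my $has_red  = any { $_ eq 'red' } @blues;

print "teal is ", ( $has_teal ? "" : "not " ), "a blue\n";
print "red is ",  ( $has_red  ? "" : "not " ), "a blue\n";
```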
1414
1415   How do I compute the difference of two arrays? How do I compute the
1416       intersection of two arrays?
1417       Use a hash. Here's code to do both and more. It assumes that each
1418       element is unique in a given array:
1419
1420           my (@union, @intersection, @difference);
1421           my %count = ();
1422           foreach my $element (@array1, @array2) { $count{$element}++ }
1423           foreach my $element (keys %count) {
1424               push @union, $element;
1425               push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
1426           }
1427
1428       Note that this is the symmetric difference, that is, all elements in
1429       either A or in B but not in both. Think of it as an xor operation.
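
       If you want the one-sided difference instead (elements of the first
       array that are absent from the second), a lookup hash built from the
       second array is enough. A minimal sketch; "only_in_first" is a
       hypothetical name, and the order of the first array is preserved:

```perl
use strict;
use warnings;

# Hypothetical helper: elements of @$first that do not occur in @$second.
sub only_in_first {
    my ( $first, $second ) = @_;

    # A hash slice creates one key per element of the second array.
    my %in_second;
    @in_second{ @$second } = ();

    return grep { !exists $in_second{$_} } @$first;
}

my @difference = only_in_first( [ 1, 2, 3, 4 ], [ 2, 4, 6 ] );
print "@difference\n";
```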
1430
1431   How do I test whether two arrays or hashes are equal?
1432       With Perl 5.10 and later, the smart match operator can give you the
1433       answer with the least amount of work:
1434
1435           use 5.010;
1436
1437           if( @array1 ~~ @array2 ) {
1438               say "The arrays are the same";
1439           }
1440
           if( %hash1 ~~ %hash2 ) { # doesn't check values!
1442               say "The hash keys are the same";
1443           }
1444
1445       The following code works for single-level arrays. It uses a stringwise
1446       comparison, and does not distinguish defined versus undefined empty
1447       strings. Modify if you have other needs.
1448
1449           $are_equal = compare_arrays(\@frogs, \@toads);
1450
1451           sub compare_arrays {
1452               my ($first, $second) = @_;
1453               no warnings;  # silence spurious -w undef complaints
1454               return 0 unless @$first == @$second;
1455               for (my $i = 0; $i < @$first; $i++) {
1456                   return 0 if $first->[$i] ne $second->[$i];
1457               }
1458               return 1;
1459           }
1460
1461       For multilevel structures, you may wish to use an approach more like
1462       this one. It uses the CPAN module FreezeThaw:
1463
1464           use FreezeThaw qw(cmpStr);
1465           my @a = my @b = ( "this", "that", [ "more", "stuff" ] );
1466
1467           printf "a and b contain %s arrays\n",
1468               cmpStr(\@a, \@b) == 0
1469               ? "the same"
1470               : "different";
1471
1472       This approach also works for comparing hashes. Here we'll demonstrate
1473       two different answers:
1474
1475           use FreezeThaw qw(cmpStr cmpStrHard);
1476
1477           my %a = my %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
1478           $a{EXTRA} = \%b;
1479           $b{EXTRA} = \%a;
1480
1481           printf "a and b contain %s hashes\n",
1482           cmpStr(\%a, \%b) == 0 ? "the same" : "different";
1483
1484           printf "a and b contain %s hashes\n",
1485           cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
1486
       The first reports that both hashes contain the same data, while the
       second reports that they do not. Which you prefer is left as an
       exercise to the reader.
1490
1491   How do I find the first array element for which a condition is true?
1492       To find the first array element which satisfies a condition, you can
1493       use the "first()" function in the List::Util module, which comes with
1494       Perl 5.8. This example finds the first element that contains "Perl".
1495
1496           use List::Util qw(first);
1497
1498           my $element = first { /Perl/ } @array;
1499
1500       If you cannot use List::Util, you can make your own loop to do the same
1501       thing. Once you find the element, you stop the loop with last.
1502
1503           my $found;
1504           foreach ( @array ) {
1505               if( /Perl/ ) { $found = $_; last }
1506           }
1507
1508       If you want the array index, use the "firstidx()" function from
1509       "List::MoreUtils":
1510
1511           use List::MoreUtils qw(firstidx);
1512           my $index = firstidx { /Perl/ } @array;
1513
1514       Or write it yourself, iterating through the indices and checking the
1515       array element at each index until you find one that satisfies the
1516       condition:
1517
1518           my( $found, $index ) = ( undef, -1 );
           for( my $i = 0; $i < @array; $i++ ) {
1520               if( $array[$i] =~ /Perl/ ) {
1521                   $found = $array[$i];
1522                   $index = $i;
1523                   last;
1524               }
1525           }
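
       A third option, sketched here on the assumption that only the core
       List::Util module is available, is to run "first()" over the indices
       rather than over the elements. The sample data is invented:

```perl
use strict;
use warnings;
use List::Util qw(first);

my @array = ( 'awk', 'sed', 'Perl 5', 'Perl 4' );

# first() returns undef when no index satisfies the condition,
# so test the result with defined() because index 0 is false.
my $index = first { $array[$_] =~ /Perl/ } 0 .. $#array;

print defined $index ? "found at index $index\n" : "not found\n";
```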
1526
1527   How do I handle linked lists?
1528       (contributed by brian d foy)
1529
1530       Perl's arrays do not have a fixed size, so you don't need linked lists
1531       if you just want to add or remove items. You can use array operations
1532       such as "push", "pop", "shift", "unshift", or "splice" to do that.
1533
       Sometimes, however, linked lists can be useful in situations where you
       want to "shard" an array so you have many small arrays instead of a
       single big array. You can keep arrays longer than Perl's largest array
       index, lock smaller arrays separately in threaded programs, reallocate
       less memory, or quickly insert elements in the middle of the chain.
1540
       Steve Lembark goes through the details in his YAPC::NA 2009 talk "Perly
       Linked Lists" (<http://www.slideshare.net/lembark/perly-linked-lists>),
       although you can just use his LinkedList::Single module.
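
       If you only want to see the basic idea, a singly linked list can be
       built from plain hash references. This is a bare-bones sketch, not how
       LinkedList::Single is implemented; the "value" and "next" keys are
       names chosen for this example:

```perl
use strict;
use warnings;

# Each node is a hash reference holding a value and a reference
# to the next node (undef marks the end of the list).
my $head;
for my $value ( reverse qw(a b c) ) {
    $head = { value => $value, next => $head };
}

# Insert 'x' after the first node without moving any other element.
$head->{next} = { value => 'x', next => $head->{next} };

# Walk the chain from the head until the next pointer runs out.
my @seen;
for ( my $node = $head; $node; $node = $node->{next} ) {
    push @seen, $node->{value};
}

print "@seen\n";
```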
1545
1546   How do I handle circular lists?
1547       (contributed by brian d foy)
1548
1549       If you want to cycle through an array endlessly, you can increment the
1550       index modulo the number of elements in the array:
1551
1552           my @array = qw( a b c );
1553           my $i = 0;
1554
1555           while( 1 ) {
1556               print $array[ $i++ % @array ], "\n";
1557               last if $i > 20;
1558           }
1559
1560       You can also use Tie::Cycle to use a scalar that always has the next
1561       element of the circular array:
1562
1563           use Tie::Cycle;
1564
1565           tie my $cycle, 'Tie::Cycle', [ qw( FFFFFF 000000 FFFF00 ) ];
1566
1567           print $cycle; # FFFFFF
1568           print $cycle; # 000000
1569           print $cycle; # FFFF00
1570
       The Array::Iterator::Circular module creates an iterator object for
       circular arrays:
1573
1574           use Array::Iterator::Circular;
1575
1576           my $color_iterator = Array::Iterator::Circular->new(
1577               qw(red green blue orange)
1578               );
1579
1580           foreach ( 1 .. 20 ) {
1581               print $color_iterator->next, "\n";
1582           }
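
       If neither module is installed, a closure can play the same role. The
       following sketch wraps the modulo technique from the first example in
       a subroutine; "make_cycle" is a name made up for this illustration:

```perl
use strict;
use warnings;

sub make_cycle {
    my @items = @_;
    die "make_cycle needs at least one item\n" unless @items;

    my $i = 0;
    return sub {
        my $item = $items[$i];
        $i = ( $i + 1 ) % @items;    # wrap around at the end
        return $item;
    };
}

my $next_color = make_cycle( qw(red green blue) );
my @first_five = map { $next_color->() } 1 .. 5;

print "@first_five\n";
```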
1583
1584   How do I shuffle an array randomly?
       If you have either Perl 5.8.0 or later, or Scalar-List-Utils 1.03 or
       later, installed, you can say:
1587
1588           use List::Util 'shuffle';
1589
1590           @shuffled = shuffle(@list);
1591
1592       If not, you can use a Fisher-Yates shuffle.
1593
1594           sub fisher_yates_shuffle {
1595               my $deck = shift;  # $deck is a reference to an array
1596               return unless @$deck; # must not be empty!
1597
1598               my $i = @$deck;
1599               while (--$i) {
1600                   my $j = int rand ($i+1);
1601                   @$deck[$i,$j] = @$deck[$j,$i];
1602               }
1603           }
1604
1605           # shuffle my mpeg collection
1606           #
1607           my @mpeg = <audio/*/*.mp3>;
1608           fisher_yates_shuffle( \@mpeg );    # randomize @mpeg in place
1609           print @mpeg;
1610
1611       Note that the above implementation shuffles an array in place, unlike
1612       the "List::Util::shuffle()" which takes a list and returns a new
1613       shuffled list.
1614
       You've probably seen shuffling algorithms that work using splice,
       randomly picking an element to remove from the old list and append to
       the new one:
1617
1618           srand;
1619           @new = ();
1620           @old = 1 .. 10;  # just a demo
1621           while (@old) {
1622               push(@new, splice(@old, rand @old, 1));
1623           }
1624
1625       This is bad because splice is already O(N), and since you do it N
1626       times, you just invented a quadratic algorithm; that is, O(N**2).  This
1627       does not scale, although Perl is so efficient that you probably won't
1628       notice this until you have rather largish arrays.
1629
1630   How do I process/modify each element of an array?
1631       Use "for"/"foreach":
1632
1633           for (@lines) {
1634               s/foo/bar/;    # change that word
1635               tr/XZ/ZX/;    # swap those letters
1636           }
1637
1638       Here's another; let's compute spherical volumes:
1639
1640           my @volumes = @radii;
1641           for (@volumes) {   # @volumes has changed parts
1642               $_ **= 3;
1643               $_ *= (4/3) * 3.14159;  # this will be constant folded
1644           }
1645
1646       which can also be done with "map()" which is made to transform one list
1647       into another:
1648
1649           my @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
1650
1651       If you want to do the same thing to modify the values of the hash, you
1652       can use the "values" function. As of Perl 5.6 the values are not
1653       copied, so if you modify $orbit (in this case), you modify the value.
1654
1655           for my $orbit ( values %orbits ) {
1656               ($orbit **= 3) *= (4/3) * 3.14159;
1657           }
1658
1659       Prior to perl 5.6 "values" returned copies of the values, so older perl
1660       code often contains constructions such as @orbits{keys %orbits} instead
1661       of "values %orbits" where the hash is to be modified.
1662
1663   How do I select a random element from an array?
1664       Use the "rand()" function (see "rand" in perlfunc):
1665
1666           my $index   = rand @array;
1667           my $element = $array[$index];
1668
1669       Or, simply:
1670
1671           my $element = $array[ rand @array ];
1672
1673   How do I permute N elements of a list?
1674       Use the List::Permutor module on CPAN. If the list is actually an
1675       array, try the Algorithm::Permute module (also on CPAN). It's written
1676       in XS code and is very efficient:
1677
1678           use Algorithm::Permute;
1679
1680           my @array = 'a'..'d';
1681           my $p_iterator = Algorithm::Permute->new ( \@array );
1682
1683           while (my @perm = $p_iterator->next) {
1684              print "next permutation: (@perm)\n";
1685           }
1686
1687       For even faster execution, you could do:
1688
1689           use Algorithm::Permute;
1690
1691           my @array = 'a'..'d';
1692
1693           Algorithm::Permute::permute {
1694               print "next permutation: (@array)\n";
1695           } @array;
1696
       Here's a little program that generates all permutations of all the
       words on each line of input. The algorithm embodied in the "permute()"
       function is discussed in Volume 4 of Knuth's The Art of Computer
       Programming and will work on any list:
1701
1702           #!/usr/bin/perl -n
1703           # Fischer-Krause ordered permutation generator
1704
1705           sub permute (&@) {
1706               my $code = shift;
1707               my @idx = 0..$#_;
1708               while ( $code->(@_[@idx]) ) {
1709                   my $p = $#idx;
1710                   --$p while $idx[$p-1] > $idx[$p];
1711                   my $q = $p or return;
1712                   push @idx, reverse splice @idx, $p;
1713                   ++$q while $idx[$p-1] > $idx[$q];
1714                   @idx[$p-1,$q]=@idx[$q,$p-1];
1715               }
1716           }
1717
1718           permute { print "@_\n" } split;
1719
1720       The Algorithm::Loops module also provides the "NextPermute" and
1721       "NextPermuteNum" functions which efficiently find all unique
1722       permutations of an array, even if it contains duplicate values,
1723       modifying it in-place: if its elements are in reverse-sorted order then
1724       the array is reversed, making it sorted, and it returns false;
1725       otherwise the next permutation is returned.
1726
1727       "NextPermute" uses string order and "NextPermuteNum" numeric order, so
1728       you can enumerate all the permutations of 0..9 like this:
1729
1730           use Algorithm::Loops qw(NextPermuteNum);
1731
1732           my @list= 0..9;
1733           do { print "@list\n" } while NextPermuteNum @list;
1734
1735   How do I sort an array by (anything)?
1736       Supply a comparison function to sort() (described in "sort" in
1737       perlfunc):
1738
1739           @list = sort { $a <=> $b } @list;
1740
1741       The default sort function is cmp, string comparison, which would sort
1742       "(1, 2, 10)" into "(1, 10, 2)". "<=>", used above, is the numerical
1743       comparison operator.
1744
1745       If you have a complicated function needed to pull out the part you want
1746       to sort on, then don't do it inside the sort function. Pull it out
1747       first, because the sort BLOCK can be called many times for the same
1748       element. Here's an example of how to pull out the first word after the
1749       first number on each item, and then sort those words case-
1750       insensitively.
1751
1752           my @idx;
1753           for (@data) {
1754               my $item;
1755               ($item) = /\d+\s*(\S+)/;
1756               push @idx, uc($item);
1757           }
1758           my @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
1759
1760       which could also be written this way, using a trick that's come to be
1761       known as the Schwartzian Transform:
1762
1763           my @sorted = map  { $_->[0] }
1764               sort { $a->[1] cmp $b->[1] }
1765               map  { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
1766
1767       If you need to sort on several fields, the following paradigm is
1768       useful.
1769
1770           my @sorted = sort {
1771               field1($a) <=> field1($b) ||
1772               field2($a) cmp field2($b) ||
1773               field3($a) cmp field3($b)
1774           } @data;
1775
1776       This can be conveniently combined with precalculation of keys as given
1777       above.
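
       As a rough sketch of that combination, the following precalculates two
       keys per item and then sorts on both, by age in descending order and
       then by name. The sample records and their "name age" layout are
       assumptions made for the example:

```perl
use strict;
use warnings;

my @data = ( 'carol 30', 'alice 25', 'bob 30', 'dave 25' );

my @sorted =
    map  { $_->[0] }                                    # recover the original item
    sort { $b->[2] <=> $a->[2] or $a->[1] cmp $b->[1] } # age descending, then name
    map  { my ($name, $age) = split ' '; [ $_, $name, $age ] }
    @data;

print "$_\n" for @sorted;
```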
1778
1779       See the sort article in the "Far More Than You Ever Wanted To Know"
1780       collection in <http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz> for more
1781       about this approach.
1782
1783       See also the question later in perlfaq4 on sorting hashes.
1784
1785   How do I manipulate arrays of bits?
1786       Use "pack()" and "unpack()", or else "vec()" and the bitwise
1787       operations.
1788
1789       For example, you don't have to store individual bits in an array (which
1790       would mean that you're wasting a lot of space). To convert an array of
1791       bits to a string, use "vec()" to set the right bits. This sets $vec to
1792       have bit N set only if $ints[N] was set:
1793
1794           my @ints = (...); # array of bits, e.g. ( 1, 0, 0, 1, 1, 0 ... )
1795           my $vec = '';
1796           foreach( 0 .. $#ints ) {
1797               vec($vec,$_,1) = 1 if $ints[$_];
1798           }
1799
1800       The string $vec only takes up as many bits as it needs. For instance,
1801       if you had 16 entries in @ints, $vec only needs two bytes to store them
1802       (not counting the scalar variable overhead).
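
       A small round trip makes the layout easier to see. In this sketch the
       ten sample bits are invented; the highest set bit is number 9, so the
       string grows to two bytes, and "unpack" with the "b*" template shows
       the bits in the same ascending order that "vec()" uses:

```perl
use strict;
use warnings;

my @ints = ( 1, 0, 0, 1, 1, 0, 0, 0, 0, 1 );

# Pack the bits into a string, one bit per array element.
my $vec = '';
for my $n ( 0 .. $#ints ) {
    vec( $vec, $n, 1 ) = 1 if $ints[$n];
}

my $bytes = length $vec;            # bytes actually used
my $bits  = unpack "b*", $vec;      # bit string, lowest bit first

# Recover the positions of the set bits.
my @set = grep { vec( $vec, $_, 1 ) } 0 .. 8 * length($vec) - 1;

print "bytes: $bytes\n";
print "bits:  $bits\n";
print "set:   @set\n";
```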
1803
1804       Here's how, given a vector in $vec, you can get those bits into your
1805       @ints array:
1806
1807           sub bitvec_to_list {
1808               my $vec = shift;
1809               my @ints;
1810               # Find null-byte density then select best algorithm
1811               if ($vec =~ tr/\0// / length $vec > 0.95) {
1812                   use integer;
1813                   my $i;
1814
1815                   # This method is faster with mostly null-bytes
1816                   while($vec =~ /[^\0]/g ) {
1817                       $i = -9 + 8 * pos $vec;
1818                       push @ints, $i if vec($vec, ++$i, 1);
1819                       push @ints, $i if vec($vec, ++$i, 1);
1820                       push @ints, $i if vec($vec, ++$i, 1);
1821                       push @ints, $i if vec($vec, ++$i, 1);
1822                       push @ints, $i if vec($vec, ++$i, 1);
1823                       push @ints, $i if vec($vec, ++$i, 1);
1824                       push @ints, $i if vec($vec, ++$i, 1);
1825                       push @ints, $i if vec($vec, ++$i, 1);
1826                   }
1827               }
1828               else {
1829                   # This method is a fast general algorithm
1830                   use integer;
1831                   my $bits = unpack "b*", $vec;
1832                   push @ints, 0 if $bits =~ s/^(\d)// && $1;
1833                   push @ints, pos $bits while($bits =~ /1/g);
1834               }
1835
1836               return \@ints;
1837           }
1838
1839       This method gets faster the more sparse the bit vector is.  (Courtesy
1840       of Tim Bunce and Winfried Koenig.)
1841
1842       You can make the while loop a lot shorter with this suggestion from
1843       Benjamin Goldberg:
1844
1845           while($vec =~ /[^\0]+/g ) {
1846               push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
1847           }
1848
1849       Or use the CPAN module Bit::Vector:
1850
           use Bit::Vector;

           my $vector = Bit::Vector->new($num_of_bits);
           $vector->Index_List_Store(@ints);         # set the bits at these indices
           my @ints = $vector->Index_List_Read();    # indices of the set bits
1854
       Bit::Vector provides efficient methods for bit vectors, sets of small
       integers, and "big int" math.
1857
1858       Here's a more extensive illustration using vec():
1859
1860           # vec demo
1861           my $vector = "\xff\x0f\xef\xfe";
1862           print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
1863           unpack("N", $vector), "\n";
1864           my $is_set = vec($vector, 23, 1);
1865           print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
1866           pvec($vector);
1867
1868           set_vec(1,1,1);
1869           set_vec(3,1,1);
1870           set_vec(23,1,1);
1871
1872           set_vec(3,1,3);
1873           set_vec(3,2,3);
1874           set_vec(3,4,3);
1875           set_vec(3,4,7);
1876           set_vec(3,8,3);
1877           set_vec(3,8,7);
1878
1879           set_vec(0,32,17);
1880           set_vec(1,32,17);
1881
1882           sub set_vec {
1883               my ($offset, $width, $value) = @_;
1884               my $vector = '';
1885               vec($vector, $offset, $width) = $value;
1886               print "offset=$offset width=$width value=$value\n";
1887               pvec($vector);
1888           }
1889
1890           sub pvec {
1891               my $vector = shift;
1892               my $bits = unpack("b*", $vector);
1893               my $i = 0;
1894               my $BASE = 8;
1895
1896               print "vector length in bytes: ", length($vector), "\n";
               my @bytes = unpack("A8" x length($vector), $bits);
1898               print "bits are: @bytes\n\n";
1899           }
1900
1901   Why does defined() return true on empty arrays and hashes?
1902       The short story is that you should probably only use defined on scalars
1903       or functions, not on aggregates (arrays and hashes). See "defined" in
1904       perlfunc in the 5.004 release or later of Perl for more detail.
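
       To find out whether an aggregate has any elements, test it in boolean
       context instead. This is a short sketch with made-up variable names.
       Note that Perl 5.22 and later reject "defined @array" and "defined
       %hash" as errors, so the boolean test is the portable choice:

```perl
use strict;
use warnings;

my @array = ();
my %hash  = ();

# An array or hash in boolean context is true only if it has elements.
my $array_state = @array ? 'has elements' : 'is empty';
my $hash_state  = %hash  ? 'has pairs'    : 'is empty';

print "array $array_state, hash $hash_state\n";

push @array, 'something';
$hash{key} = 'value';

my $array_now = @array ? 'has elements' : 'is empty';
my $hash_now  = %hash  ? 'has pairs'    : 'is empty';

print "array $array_now, hash $hash_now\n";
```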
1905

Data: Hashes (Associative Arrays)

1907   How do I process an entire hash?
1908       (contributed by brian d foy)
1909
       There are a couple of ways that you can process an entire hash. You can
       get a list of keys, then go through each key, or grab one key-value
       pair at a time.
1913
1914       To go through all of the keys, use the "keys" function. This extracts
1915       all of the keys of the hash and gives them back to you as a list. You
1916       can then get the value through the particular key you're processing:
1917
1918           foreach my $key ( keys %hash ) {
               my $value = $hash{$key};
1920               ...
1921           }
1922
1923       Once you have the list of keys, you can process that list before you
1924       process the hash elements. For instance, you can sort the keys so you
1925       can process them in lexical order:
1926
1927           foreach my $key ( sort keys %hash ) {
               my $value = $hash{$key};
1929               ...
1930           }
1931
1932       Or, you might want to only process some of the items. If you only want
1933       to deal with the keys that start with "text:", you can select just
1934       those using "grep":
1935
1936           foreach my $key ( grep /^text:/, keys %hash ) {
               my $value = $hash{$key};
1938               ...
1939           }
1940
1941       If the hash is very large, you might not want to create a long list of
1942       keys. To save some memory, you can grab one key-value pair at a time
1943       using "each()", which returns a pair you haven't seen yet:
1944
1945           while( my( $key, $value ) = each( %hash ) ) {
1946               ...
1947           }
1948
1949       The "each" operator returns the pairs in apparently random order, so if
1950       ordering matters to you, you'll have to stick with the "keys" method.
1951
1952       The "each()" operator can be a bit tricky though. You can't add or
1953       delete keys of the hash while you're using it without possibly skipping
1954       or re-processing some pairs after Perl internally rehashes all of the
1955       elements. Additionally, a hash has only one iterator, so if you mix
1956       "keys", "values", or "each" on the same hash, you risk resetting the
1957       iterator and messing up your processing. See the "each" entry in
1958       perlfunc for more details.
1959
1960   How do I merge two hashes?
1961       (contributed by brian d foy)
1962
1963       Before you decide to merge two hashes, you have to decide what to do if
1964       both hashes contain keys that are the same and if you want to leave the
1965       original hashes as they were.
1966
       If you want to preserve the original hashes, copy one hash (%hash1) to
       a new hash (%new_hash), then add the keys from the other hash (%hash2)
       to the new hash. Checking that the key already exists in %new_hash
       gives you a chance to decide what to do with the duplicates:
1971
1972           my %new_hash = %hash1; # make a copy; leave %hash1 alone
1973
1974           foreach my $key2 ( keys %hash2 ) {
1975               if( exists $new_hash{$key2} ) {
1976                   warn "Key [$key2] is in both hashes!";
1977                   # handle the duplicate (perhaps only warning)
1978                   ...
1979                   next;
1980               }
1981               else {
1982                   $new_hash{$key2} = $hash2{$key2};
1983               }
1984           }
1985
1986       If you don't want to create a new hash, you can still use this looping
1987       technique; just change the %new_hash to %hash1.
1988
1989           foreach my $key2 ( keys %hash2 ) {
1990               if( exists $hash1{$key2} ) {
1991                   warn "Key [$key2] is in both hashes!";
1992                   # handle the duplicate (perhaps only warning)
1993                   ...
1994                   next;
1995               }
1996               else {
1997                   $hash1{$key2} = $hash2{$key2};
1998               }
           }
2000
2001       If you don't care that one hash overwrites keys and values from the
2002       other, you could just use a hash slice to add one hash to another. In
2003       this case, values from %hash2 replace values from %hash1 when they have
2004       keys in common:
2005
2006           @hash1{ keys %hash2 } = values %hash2;
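
       A related idiom, shown here as a sketch with invented sample data,
       builds a new hash by flattening both hashes into one list. Later pairs
       win, so values from %hash2 again replace those from %hash1, and
       neither original is modified:

```perl
use strict;
use warnings;

my %hash1 = ( a => 1,  b => 2 );
my %hash2 = ( b => 20, c => 30 );

# Both hashes flatten into one key/value list; for the duplicate
# key 'b' the pair that comes last (from %hash2) is the one kept.
my %merged = ( %hash1, %hash2 );

for my $key ( sort keys %merged ) {
    print "$key => $merged{$key}\n";
}
```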
2007
2008   What happens if I add or remove keys from a hash while iterating over it?
2009       (contributed by brian d foy)
2010
2011       The easy answer is "Don't do that!"
2012
2013       If you iterate through the hash with each(), you can delete the key
2014       most recently returned without worrying about it. If you delete or add
2015       other keys, the iterator may skip or double up on them since perl may
2016       rearrange the hash table. See the entry for "each()" in perlfunc.
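
       One way to stay out of trouble, sketched below with invented data, is
       to loop over the list returned by "keys" instead. That list is built
       before the loop starts, so deleting entries inside the loop does not
       disturb the iteration:

```perl
use strict;
use warnings;

# Map each number from 1 to 10 to its square.
my %hash = map { $_ => $_ * $_ } 1 .. 10;

# keys() returns a complete list up front, so it is safe to
# delete from %hash while walking that list.
for my $key ( keys %hash ) {
    delete $hash{$key} if $hash{$key} % 2;    # drop odd squares
}

my @remaining = sort { $a <=> $b } keys %hash;
print "@remaining\n";
```

       The price is the memory for the temporary key list, which is the
       trade-off against "each()" for very large hashes.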
2017
2018   How do I look up a hash element by value?
2019       Create a reverse hash:
2020
2021           my %by_value = reverse %by_key;
2022           my $key = $by_value{$value};
2023
2024       That's not particularly efficient. It would be more space-efficient to
2025       use:
2026
2027           while (my ($key, $value) = each %by_key) {
2028               $by_value{$value} = $key;
2029           }
2030
2031       If your hash could have repeated values, the methods above will only
2032       find one of the associated keys.  This may or may not worry you. If it
2033       does worry you, you can always reverse the hash into a hash of arrays
2034       instead:
2035
2036           while (my ($key, $value) = each %by_key) {
2037                push @{$key_list_by_value{$value}}, $key;
2038           }
2039
2040   How can I know how many entries are in a hash?
2041       (contributed by brian d foy)
2042
2043       This is very similar to "How do I process an entire hash?", also in
2044       perlfaq4, but a bit simpler in the common cases.
2045
       You can use the "keys()" built-in function in scalar context to find
       out how many entries you have in a hash:
2048
2049           my $key_count = keys %hash; # must be scalar context!
2050
2051       If you want to find out how many entries have a defined value, that's a
2052       bit different. You have to check each value. A "grep" is handy:
2053
2054           my $defined_value_count = grep { defined } values %hash;
2055
2056       You can use that same structure to count the entries any way that you
2057       like. If you want the count of the keys with vowels in them, you just
2058       test for that instead:
2059
2060           my $vowel_count = grep { /[aeiou]/ } keys %hash;
2061
2062       The "grep" in scalar context returns the count. If you want the list of
2063       matching items, just use it in list context instead:
2064
2065           my @defined_values = grep { defined } values %hash;
2066
2067       The "keys()" function also resets the iterator, which means that you
2068       may see strange results if you use this between uses of other hash
2069       operators such as "each()".
2070
2071   How do I sort a hash (optionally by value instead of key)?
2072       (contributed by brian d foy)
2073
2074       To sort a hash, start with the keys. In this example, we give the list
2075       of keys to the sort function which then compares them ASCIIbetically
2076       (which might be affected by your locale settings). The output list has
2077       the keys in ASCIIbetical order. Once we have the keys, we can go
2078       through them to create a report which lists the keys in ASCIIbetical
2079       order.
2080
2081           my @keys = sort { $a cmp $b } keys %hash;
2082
2083           foreach my $key ( @keys ) {
2084               printf "%-20s %6d\n", $key, $hash{$key};
2085           }
2086
2087       We could get more fancy in the "sort()" block though. Instead of
2088       comparing the keys, we can compute a value with them and use that value
2089       as the comparison.
2090
2091       For instance, to make our report order case-insensitive, we use "lc" to
2092       lowercase the keys before comparing them:
2093
2094           my @keys = sort { lc $a cmp lc $b } keys %hash;
2095
2096       Note: if the computation is expensive or the hash has many elements,
2097       you may want to look at the Schwartzian Transform to cache the
2098       computation results.
2099
2100       If we want to sort by the hash value instead, we use the hash key to
2101       look it up. We still get out a list of keys, but this time they are
2102       ordered by their value.
2103
2104           my @keys = sort { $hash{$a} <=> $hash{$b} } keys %hash;
2105
2106       From there we can get more complex. If the hash values are the same, we
2107       can provide a secondary sort on the hash key.
2108
2109           my @keys = sort {
2110               $hash{$a} <=> $hash{$b}
2111                   or
2112               "\L$a" cmp "\L$b"
2113           } keys %hash;
2114
2115   How can I always keep my hash sorted?
2116       You can look into using the "DB_File" module and "tie()" using the
2117       $DB_BTREE hash bindings as documented in "In Memory Databases" in
2118       DB_File. The Tie::IxHash module from CPAN might also be instructive.
2119       Although this does keep your hash sorted, you might not like the
2120       slowdown you suffer from the tie interface. Are you sure you need to do
2121       this? :)
2122
2123   What's the difference between "delete" and "undef" with hashes?
2124       Hashes contain pairs of scalars: the first is the key, the second is
2125       the value. The key will be coerced to a string, although the value can
2126       be any kind of scalar: string, number, or reference. If a key $key is
2127       present in %hash, "exists($hash{$key})" will return true. The value for
2128       a given key can be "undef", in which case $hash{$key} will be "undef"
2129       while "exists $hash{$key}" will return true. This corresponds to ($key,
2130       "undef") being in the hash.
2131
2132       Pictures help... Here's the %hash table:
2133
2134             keys  values
2135           +------+------+
2136           |  a   |  3   |
2137           |  x   |  7   |
2138           |  d   |  0   |
2139           |  e   |  2   |
2140           +------+------+
2141
2142       And these conditions hold
2143
2144           $hash{'a'}                       is true
2145           $hash{'d'}                       is false
2146           defined $hash{'d'}               is true
2147           defined $hash{'a'}               is true
2148           exists $hash{'a'}                is true (Perl 5 only)
2149           grep ($_ eq 'a', keys %hash)     is true
2150
2151       If you now say
2152
2153           undef $hash{'a'}
2154
2155       your table now reads:
2156
2157             keys  values
2158           +------+------+
2159           |  a   | undef|
2160           |  x   |  7   |
2161           |  d   |  0   |
2162           |  e   |  2   |
2163           +------+------+
2164
2165       and these conditions now hold; changes in caps:
2166
2167           $hash{'a'}                       is FALSE
2168           $hash{'d'}                       is false
2169           defined $hash{'d'}               is true
2170           defined $hash{'a'}               is FALSE
2171           exists $hash{'a'}                is true (Perl 5 only)
2172           grep ($_ eq 'a', keys %hash)     is true
2173
2174       Notice the last two: you have an undef value, but a defined key!
2175
2176       Now, consider this:
2177
2178           delete $hash{'a'}
2179
2180       your table now reads:
2181
2182             keys  values
2183           +------+------+
2184           |  x   |  7   |
2185           |  d   |  0   |
2186           |  e   |  2   |
2187           +------+------+
2188
2189       and these conditions now hold; changes in caps:
2190
2191           $hash{'a'}                       is false
2192           $hash{'d'}                       is false
2193           defined $hash{'d'}               is true
2194           defined $hash{'a'}               is false
2195           exists $hash{'a'}                is FALSE (Perl 5 only)
2196           grep ($_ eq 'a', keys %hash)     is FALSE
2197
2198       See, the whole entry is gone!
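
       You can check the tables above for yourself. This sketch uses the same
       sample hash and records what "exists" and "defined" report after each
       step; the variable names are invented for the example:

```perl
use strict;
use warnings;

my %hash = ( a => 3, x => 7, d => 0, e => 2 );

# undef clears the value but leaves the key in the hash.
undef $hash{'a'};
my $exists_after_undef  = exists  $hash{'a'} ? 1 : 0;
my $defined_after_undef = defined $hash{'a'} ? 1 : 0;
my $count_after_undef   = keys %hash;

# delete removes the key and its value.
delete $hash{'a'};
my $exists_after_delete = exists $hash{'a'} ? 1 : 0;
my $count_after_delete  = keys %hash;

print "after undef:  exists=$exists_after_undef ",
      "defined=$defined_after_undef keys=$count_after_undef\n";
print "after delete: exists=$exists_after_delete ",
      "keys=$count_after_delete\n";
```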
2199
2200   Why don't my tied hashes make the defined/exists distinction?
2201       This depends on the tied hash's implementation of EXISTS().  For
2202       example, there isn't the concept of undef with hashes that are tied to
2203       DBM* files. It also means that exists() and defined() do the same thing
2204       with a DBM* file, and what they end up doing is not what they do with
2205       ordinary hashes.
2206
2207   How do I reset an each() operation part-way through?
2208       (contributed by brian d foy)
2209
2210       You can use the "keys" or "values" functions to reset "each". To simply
2211       reset the iterator used by "each" without doing anything else, use one
2212       of them in void context:
2213
2214           keys %hash; # resets iterator, nothing else.
2215           values %hash; # resets iterator, nothing else.
2216
2217       See the documentation for "each" in perlfunc.
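
       The effect is easy to observe. In this sketch (the sample hash is
       invented) two calls to "each" advance the iterator, "keys" in void
       context resets it, and the next call to "each" starts over with the
       same first key:

```perl
use strict;
use warnings;

my %hash = ( a => 1, b => 2, c => 3 );

# Take the key from the first two pairs; this advances the iterator.
my ($first_key)  = each %hash;
my ($second_key) = each %hash;

keys %hash;    # void context: resets the iterator, nothing else

# Iteration starts over, so this is the first key again.
my ($again) = each %hash;

print "first=$first_key second=$second_key after reset=$again\n";

keys %hash;    # leave the iterator reset for later code
```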
2218
2219   How can I get the unique keys from two hashes?
2220       First you extract the keys from the hashes into lists, then solve the
2221       "removing duplicates" problem described above. For example:
2222
2223           my %seen = ();
2224           for my $element (keys(%foo), keys(%bar)) {
2225               $seen{$element}++;
2226           }
2227           my @uniq = keys %seen;
2228
2229       Or more succinctly:
2230
2231           my @uniq = keys %{{%foo,%bar}};
2232
2233       Or if you really want to save space:
2234
           my %seen = ();
           while (defined (my $key = each %foo)) {
               $seen{$key}++;
           }
           while (defined (my $key = each %bar)) {
               $seen{$key}++;
           }
           my @uniq = keys %seen;

   How can I store a multidimensional array in a DBM file?
       Either stringify the structure yourself (no fun), or else get the MLDBM
       module (which uses Data::Dumper) from CPAN and layer it on top of
       either DB_File or GDBM_File. You might also try DBM::Deep, but it can
       be a bit slow.
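
       For the "stringify it yourself" route, here is a rough sketch. The
       choice of SDBM_File and Storable's "nfreeze"/"thaw" is an assumption
       made for the example (both ship with Perl); it is not how MLDBM is
       implemented, and SDBM limits the size of each entry, so it only suits
       small structures. The file name is hypothetical.

```perl
use strict;
use warnings;
use Fcntl;                        # O_RDWR, O_CREAT
use SDBM_File;                    # a DBM module bundled with Perl
use Storable qw(nfreeze thaw);
use File::Temp qw(tempdir);
use File::Spec;

my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'matrix');   # hypothetical file name

tie my %db, 'SDBM_File', $file, O_RDWR | O_CREAT, 0666
    or die "Cannot tie $file: $!";

# A DBM value must be a flat string, so serialize the nested structure.
my @matrix = ([1, 2], [3, 4]);
$db{matrix} = nfreeze(\@matrix);

# Reading it back: thaw the string into a fresh nested structure.
my $copy = thaw($db{matrix});
print "row 1, column 0: $copy->[1][0]\n";

untie %db;
```

       Note that changing $copy does not change the file; you have to freeze
       and store the whole structure again after every modification.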

   How can I make my hash remember the order I put elements into it?
       Use the Tie::IxHash module from CPAN.

           use Tie::IxHash;

           tie my %myhash, 'Tie::IxHash';

           for (my $i=0; $i<20; $i++) {
               $myhash{$i} = 2*$i;
           }

           my @keys = keys %myhash;
           # @keys = (0,1,2,3,...)
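
       If installing a module is not an option, a different and cruder
       technique is to record the order yourself in a parallel array. This
       is only a sketch with invented data, and unlike Tie::IxHash it does
       nothing to keep the array in step when you delete keys:

```perl
use strict;
use warnings;

my %value;      # the data itself
my @order;      # keys in the order they were first stored

for my $name (qw(pear apple fig apple)) {
    push @order, $name unless exists $value{$name};
    $value{$name} = length $name;
}

# Walk the hash in insertion order rather than in keys() order.
my @pairs = map { "$_=$value{$_}" } @order;
print "@pairs\n";
```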

   Why does passing a subroutine an undefined element in a hash create it?
       (contributed by brian d foy)

       Are you using a really old version of Perl?

       Normally, accessing a hash key's value for a nonexistent key will not
       create the key.

           my %hash  = ();
           my $value = $hash{ 'foo' };
           print "This won't print\n" if exists $hash{ 'foo' };

       Passing $hash{ 'foo' } to a subroutine used to be a special case,
       though.  Since you could assign directly to $_[0], Perl had to be ready
       to make that assignment so it created the hash key ahead of time:

           my_sub( $hash{ 'foo' } );
           print "This will print before 5.004\n" if exists $hash{ 'foo' };

           sub my_sub {
               # $_[0] = 'bar'; # create hash key in case you do this
               1;
           }

       Since Perl 5.004, however, this situation is a special case and Perl
       creates the hash key only when you make the assignment:

           my_sub( $hash{ 'foo' } );
           print "This will print, even after 5.004\n" if exists $hash{ 'foo' };

           sub my_sub {
               $_[0] = 'bar';
           }

       However, if you want the old behavior (and think carefully about that
       because it's a weird side effect), you can pass a hash slice instead.
       Perl 5.004 didn't make this a special case:

           my_sub( @hash{ qw/foo/ } );
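
       To compare the two calls side by side, the following sketch passes a
       single element and then a one-element slice to a subroutine that
       never touches its arguments ("does_nothing" is a hypothetical name),
       and records whether each key came into existence:

```perl
use strict;
use warnings;

sub does_nothing { return }

my %hash;

does_nothing( $hash{foo} );           # a single element
my $after_element = exists $hash{foo} ? 1 : 0;

does_nothing( @hash{ qw/bar/ } );     # a one-element hash slice
my $after_slice = exists $hash{bar} ? 1 : 0;

print "element created: $after_element, slice created: $after_slice\n";
```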

   How can I make the Perl equivalent of a C structure/C++ class/hash or array
       of hashes or arrays?
       Usually a hash ref, perhaps like this:

           my $record = {
               NAME   => "Jason",
               EMPNO  => 132,
               TITLE  => "deputy peon",
               AGE    => 23,
               SALARY => 37_000,
               PALS   => [ "Norbert", "Rhys", "Phineas"],
           };

       References are documented in perlref and perlreftut.  Examples of
       complex data structures are given in perldsc and perllol. Examples of
       structures and object-oriented classes are in perltoot.
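
       Once you have such a record, its fields are reached with the arrow
       operator. A short sketch, using a trimmed copy of the record above:

```perl
use strict;
use warnings;

my $record = {
    NAME   => "Jason",
    EMPNO  => 132,
    PALS   => [ "Norbert", "Rhys", "Phineas" ],
};

my $name      = $record->{NAME};              # a scalar field
my $first_pal = $record->{PALS}[0];           # element of a nested array
my $pal_count = scalar @{ $record->{PALS} };  # size of the nested array

$record->{AGE} = 23;                          # fields can be added later

print "$name has $pal_count pals; the first is $first_pal\n";
```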

   How can I use a reference as a hash key?
       (contributed by brian d foy and Ben Morrow)

       Hash keys are strings, so you can't really use a reference as the key.
       When you try to do that, perl turns the reference into its stringified
       form (for instance, "HASH(0xDEADBEEF)"). From there you can't get back
       the reference from the stringified form, at least without doing some
       extra work on your own.
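
       The stringification is easy to observe. In this sketch an array
       reference is used as a key, and the key that comes back out of the
       hash is an ordinary string:

```perl
use strict;
use warnings;

my $aref = [ 1, 2, 3 ];
my %hash = ( $aref => 'some value' );

my ($key) = keys %hash;

# The key looks like a reference when printed, but it is a plain string.
my $is_ref = ref $key ? 1 : 0;
my $same   = $key eq "$aref" ? 1 : 0;

print "key: $key, is a reference: $is_ref\n";
```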

       Remember that the entry in the hash will still be there even if the
       referenced variable goes out of scope, and that it is entirely
       possible for Perl to subsequently allocate a different variable at the
       same address. This means a new variable might accidentally be
       associated with the value for an old one.

       If you have Perl 5.10 or later, and you just want to store a value
       against the reference for lookup later, you can use the core
       Hash::Util::FieldHash module. This will also handle renaming the keys
       if you use multiple threads (which causes all variables to be
       reallocated at new addresses, changing their stringification), and
       garbage-collecting the entries when the referenced variable goes out of
       scope.
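
       A minimal sketch of the garbage-collecting behavior, assuming Perl
       5.10 or later (the stored string is made up for the example):

```perl
use strict;
use warnings;
use Hash::Util::FieldHash qw(fieldhash);

fieldhash my %info;

my $count_inside;
{
    my $object = {};                  # any reference will do
    $info{$object} = 'notes about this object';
    $count_inside = keys %info;
}
# $object has gone out of scope here, so its entry is removed as well.
my $count_after = keys %info;

print "entries inside the block: $count_inside, afterwards: $count_after\n";
```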

       If you actually need to be able to get a real reference back from each
       hash entry, you can use the Tie::RefHash module, which does the
       required work for you.
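
       For instance, in this short sketch the key returned by "keys" can
       still be dereferenced:

```perl
use strict;
use warnings;
use Tie::RefHash;

tie my %hash, 'Tie::RefHash';

my $aref = [ 1, 2, 3 ];
$hash{$aref} = 'some value';

my ($key) = keys %hash;
my $kind  = ref $key;          # still a real reference
my $last  = $key->[-1];        # so it can be dereferenced

print "key is a $kind reference whose last element is $last\n";
```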

   How can I check if a key exists in a multilevel hash?
       (contributed by brian d foy)

       The trick to this problem is avoiding accidental autovivification. If
       you want to check three keys deep, you might naively try this:

           my %hash;
           if( exists $hash{key1}{key2}{key3} ) {
               ...;
           }

       Even though you started with a completely empty hash, after that call
       to "exists" you've created the structure you needed to check for
       "key3":

           %hash = (
                     'key1' => {
                                 'key2' => {}
                               }
                   );
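
       You can confirm the side effect with a few lines. This sketch runs
       the three-level test on an empty hash and then checks which levels
       are present afterwards:

```perl
use strict;
use warnings;

my %hash;
my $found = exists $hash{key1}{key2}{key3} ? 1 : 0;

# The test itself created the two outer levels.
my $has_key1 = exists $hash{key1}       ? 1 : 0;
my $has_key2 = exists $hash{key1}{key2} ? 1 : 0;

print "found key3: $found\n";
print "key1 now exists: $has_key1, key2 now exists: $has_key2\n";
```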

       That's autovivification. You can get around this in a few ways. The
       easiest way is to just turn it off. The lexical "autovivification"
       pragma is available on CPAN. Now you don't add to the hash:

           {
               no autovivification;
               my %hash;
               if( exists $hash{key1}{key2}{key3} ) {
                   ...;
               }
           }

       The Data::Diver module on CPAN can do it for you too. Its "Dive"
       subroutine can tell you not only if the keys exist but also get the
       value:

           use Data::Diver qw(Dive);

           my @exists = Dive( \%hash, qw(key1 key2 key3) );
           if(  ! @exists  ) {
               ...; # keys do not exist
           }
           elsif(  ! defined $exists[0]  ) {
               ...; # keys exist but value is undef
           }

       You can easily do this yourself too by checking each level of the hash
       before you move on to the next level. This is essentially what
       Data::Diver does for you:

           if( check_hash( \%hash, qw(key1 key2 key3) ) ) {
               ...;
           }

           sub check_hash {
               my( $hash, @keys ) = @_;

               return unless @keys;

               foreach my $key ( @keys ) {
                   return unless eval { exists $hash->{$key} };
                   $hash = $hash->{$key};
               }

               return 1;
           }

   How can I prevent addition of unwanted keys into a hash?
       Since version 5.8.0, hashes can be restricted to a fixed number of
       given keys. Methods for creating and dealing with restricted hashes are
       exported by the Hash::Util module.
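
       As a brief sketch, "lock_keys" and "unlock_keys" are two of the
       functions Hash::Util exports; the hash contents and the misspelled
       key below are invented for the example:

```perl
use strict;
use warnings;
use Hash::Util qw(lock_keys unlock_keys);

my %config = ( host => 'localhost', port => 8080 );
lock_keys(%config);

$config{port} = 9090;                 # existing key: allowed

# A misspelled key is rejected with a fatal error, so trap it.
my $added = eval { $config{prot} = 9091; 1 } ? 1 : 0;
my $error = $@;

unlock_keys(%config);                 # back to an ordinary hash
$config{timeout} = 30;

print "added misspelled key: $added\n";
print "error: $error";
```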

Data: Misc

   How do I handle binary data correctly?
       Perl is binary-clean, so it can handle binary data just fine.  On
       Windows or DOS, however, you have to use "binmode" for binary files to
       avoid conversions for line endings. In general, you should use
       "binmode" any time you want to work with binary data.

       Also see "binmode" in perlfunc or perlopentut.
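
       A small round-trip sketch: it writes a few bytes, including carriage
       return and line feed, to a temporary file and reads them back, with
       "binmode" set on both handles. The byte values are arbitrary.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Bytes that a text-mode layer on Windows or DOS could alter (CR, LF).
my $bytes = pack 'C*', 0, 13, 10, 26, 255;

my ($out, $path) = tempfile();
binmode $out;
print {$out} $bytes;
close $out or die "Cannot close $path: $!";

open my $in, '<', $path or die "Cannot open $path: $!";
binmode $in;
my $read = do { local $/; <$in> };    # slurp the whole file
close $in;
unlink $path;

printf "wrote %d bytes, read %d bytes\n", length($bytes), length($read);
```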

       If you're concerned about 8-bit textual data then see perllocale.  If
       you want to deal with multibyte characters, however, there are some
       gotchas. See the section on Regular Expressions.

   How do I determine whether a scalar is a number/whole/integer/float?
       Assuming that you don't care about IEEE notations like "NaN" or
       "Infinity", you probably just want to use a regular expression:

           use 5.010;

           given( $number ) {
               when( /\D/ )
                   { say "\thas nondigits"; continue }
               when( /^\d+\z/ )
                   { say "\tis a whole number"; continue }
               when( /^-?\d+\z/ )
                   { say "\tis an integer"; continue }
               when( /^[+-]?\d+\z/ )
                   { say "\tis a +/- integer"; continue }
               when( /^-?(?:\d+\.?|\.\d)\d*\z/ )
                   { say "\tis a real number"; continue }
               when( /^[+-]?(?=\.?\d)\d*\.?\d*(?:e[+-]?\d+)?\z/i)
                   { say "\tis a C float" }
           }

       There are also some commonly used modules for the task.  Scalar::Util
       (distributed with 5.8) provides access to perl's internal function
       "looks_like_number" for determining whether a variable looks like a
       number. Data::Types exports functions that validate data types using
       both the above and other regular expressions. Thirdly, there is
       Regexp::Common which has regular expressions to match various types of
       numbers. Those three modules are available from the CPAN.
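
       Of these, "looks_like_number" needs the least setup. A quick sketch
       that runs it over a handful of sample strings (chosen only for
       illustration):

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

my %verdict;
for my $candidate ('42', '-3.5e2', '.5', 'abc', '42abc', '') {
    $verdict{$candidate} = looks_like_number($candidate) ? 'yes' : 'no';
    printf "%-10s %s\n", "'$candidate'", $verdict{$candidate};
}
```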

       If you're on a POSIX system, Perl supports the "POSIX::strtod" function
       for converting strings to doubles (and also "POSIX::strtol" for longs).
       Its semantics are somewhat cumbersome, so here's a "getnum" wrapper
       function for more convenient access. This function takes a string and
       returns the number it found, or "undef" for input that isn't a C float.
       The "is_numeric" function is a front end to "getnum" if you just want
       to say, "Is this a float?"

           sub getnum {
               use POSIX qw(strtod);
               my $str = shift;
               $str =~ s/^\s+//;
               $str =~ s/\s+$//;
               $! = 0;
               my($num, $unparsed) = strtod($str);
               if (($str eq '') || ($unparsed != 0) || $!) {
                   return undef;
               }
               else {
                   return $num;
               }
           }

           sub is_numeric { defined getnum($_[0]) }

       Or you could check out the String::Scanf module on the CPAN instead.

   How do I keep persistent data across program calls?
       For some specific applications, you can use one of the DBM modules.
       See AnyDBM_File. More generically, you should consult the FreezeThaw or
       Storable modules from CPAN. Starting from Perl 5.8, Storable is part of
       the standard distribution. Here's one example using Storable's "store"
       and "retrieve" functions:

           use Storable;
           store(\%hash, "filename");

           # later on...
           $href = retrieve("filename");        # by ref
           %hash = %{ retrieve("filename") };   # direct to hash

   How do I print out or copy a recursive data structure?
       The Data::Dumper module on CPAN (or the 5.005 release of Perl) is great
       for printing out data structures. The Storable module on CPAN (or the
       5.8 release of Perl) provides a function called "dclone" that
       recursively copies its argument.

           use Storable qw(dclone);
           $r2 = dclone($r1);

       Here $r1 can be a reference to any kind of data structure you'd like.
       It will be deeply copied. Because "dclone" takes and returns
       references, you'd have to add extra punctuation if you had a hash of
       arrays that you wanted to copy.

           %newhash = %{ dclone(\%oldhash) };
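
       To see what "deeply copied" buys you, this sketch (with invented
       data) compares a plain hash assignment, which shares the nested
       arrays, with a "dclone" copy, which does not:

```perl
use strict;
use warnings;
use Storable qw(dclone);

my %oldhash = ( fruits => [ 'apple', 'pear' ] );

my %shallow = %oldhash;                  # copies only the top level
my %deep    = %{ dclone(\%oldhash) };    # copies the nested arrays too

push @{ $oldhash{fruits} }, 'fig';       # change the original

my $shallow_count = @{ $shallow{fruits} };
my $deep_count    = @{ $deep{fruits} };

print "shallow copy sees $shallow_count fruits, deep copy sees $deep_count\n";
```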

   How do I define methods for every class/object?
       (contributed by Ben Morrow)

       You can use the "UNIVERSAL" class (see UNIVERSAL). However, please be
       very careful to consider the consequences of doing this: adding methods
       to every object is very likely to have unintended consequences. If
       possible, it would be better to have all your objects inherit from some
       common base class, or to use an object system like Moose that supports
       roles.
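
       A minimal sketch of the common-base-class approach, in plain Perl
       without Moose. The class and method names (My::Base, My::Widget,
       "describe") are invented for the example:

```perl
use strict;
use warnings;

# My::Base and My::Widget are invented names for this sketch.
package My::Base;
sub new      { my $class = shift; return bless {}, $class }
sub describe { my $self  = shift; return 'I am a ' . ref $self }

# Every class that should share the method inherits from the base class.
package My::Widget;
our @ISA = ('My::Base');

package main;

my $text = My::Widget->new->describe;
print "$text\n";
```

       Only classes that opt in by inheriting from My::Base gain the
       method, so unrelated objects are left alone.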

   How do I verify a credit card checksum?
       Get the Business::CreditCard module from CPAN.
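
       For a sense of what such a check involves: card numbers are commonly
       validated with the Luhn checksum. The sketch below is a stand-alone
       illustration ("luhn_valid" is a hypothetical name), not the module's
       own code, and passing the check says nothing about whether a card is
       real or in good standing.

```perl
use strict;
use warnings;

# Return 1 if the digits pass the Luhn checksum, 0 otherwise.
sub luhn_valid {
    my ($number) = @_;

    (my $digits = $number) =~ s/[\s-]//g;   # allow spaces and dashes
    return 0 unless $digits =~ /^\d+\z/;

    my $sum    = 0;
    my $double = 0;                         # double every second digit
    for my $digit (reverse split //, $digits) {
        my $value = $digit;
        if ($double) {
            $value *= 2;
            $value -= 9 if $value > 9;      # same as adding the two digits
        }
        $sum   += $value;
        $double = !$double;
    }
    return $sum % 10 == 0 ? 1 : 0;
}

# A widely published test number, and the same number with one digit changed.
my $good = luhn_valid('4111 1111 1111 1111');
my $bad  = luhn_valid('4111 1111 1111 1112');
print "good: $good, bad: $bad\n";
```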

   How do I pack arrays of doubles or floats for XS code?
       The arrays.h/arrays.c code in the PGPLOT module on CPAN does just this.
       If you're doing a lot of float or double processing, consider using the
       PDL module from CPAN instead--it makes number-crunching easy.

       See <http://search.cpan.org/dist/PGPLOT> for the code.
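
       On the Perl side, one way to build such a buffer is the built-in
       "pack" with the "d" template. This sketch is not taken from PGPLOT;
       it only shows the packing step, on the assumption that your XS code
       will read the resulting string as a contiguous array of native
       doubles:

```perl
use strict;
use warnings;

my @values = ( 1.5, -2.25, 3.0 );

# 'd*' packs every value as a double in the machine's native format.
my $packed = pack 'd*', @values;

my $bytes_per_double = length( pack 'd', 0 );
my $count            = length($packed) / $bytes_per_double;

# unpack reverses the operation, which is handy for checking the result.
my @roundtrip = unpack 'd*', $packed;

print "$count doubles packed: @roundtrip\n";
```

       Use the "f" template instead of "d" for single-precision floats.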

AUTHOR AND COPYRIGHT

       Copyright (c) 1997-2010 Tom Christiansen, Nathan Torkington, and other
       authors as noted. All rights reserved.

       This documentation is free; you can redistribute it and/or modify it
       under the same terms as Perl itself.

       Irrespective of its distribution, all code examples in this file are
       hereby placed into the public domain. You are permitted and encouraged
       to use this code in your own programs for fun or for profit as you see
       fit. A simple comment in the code giving credit would be courteous but
       is not required.

perl v5.16.3                      2013-03-04                       PERLFAQ4(1)