PERLFAQ4(1)            Perl Programmers Reference Guide            PERLFAQ4(1)

NAME

       perlfaq4 - Data Manipulation

DESCRIPTION

       This section of the FAQ answers questions related to manipulating
       numbers, dates, strings, arrays, hashes, and miscellaneous data issues.

Data: Numbers

   Why am I getting long decimals (eg, 19.9499999999999) instead of the
       numbers I should be getting (eg, 19.95)?
       Internally, your computer represents floating-point numbers in binary.
       Digital (as in powers of two) computers cannot store all numbers
       exactly.  A decimal fraction such as 0.1 has no finite binary
       representation, so it loses precision when it is stored.  This is a
       consequence of how computers store numbers and affects all computer
       languages, not just Perl.

       perlnumber shows the gory details of number representations and
       conversions.

       To limit the number of decimal places in your numbers, you can use the
       printf or sprintf function.  See "Floating-point Arithmetic" in perlop
       for more details.

               printf "%.2f", 10/3;                # prints 3.33

               my $number = sprintf "%.2f", 10/3;  # $number is "3.33"

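       The effect is easy to reproduce.  The sketch below assumes a perl
       built with ordinary double-precision floating-point numbers, which is
       the usual default.  It compares 0.1 + 0.2 with 0.3, prints the stored
       sum with enough digits to expose the error, and then rounds the sum
       with sprintf.

```perl
               use strict;
               use warnings;

               my $sum = 0.1 + 0.2;

               # 0.1, 0.2 and 0.3 have no exact binary representation, so the
               # stored sum is not the same value as the stored literal 0.3.
               print $sum == 0.3 ? "equal\n" : "not equal\n";   # not equal

               # Perl's default stringification shows about 15 significant
               # digits, which hides the error...
               print "$sum\n";                                  # 0.3

               # ...but 17 significant digits reveal the stored value.
               printf "%.17g\n", $sum;                          # 0.30000000000000004

               # Rounding to the precision you care about is the usual fix.
               my $rounded = sprintf "%.2f", $sum;
               print "$rounded\n";                              # 0.30
```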
   Why is int() broken?
       Your "int()" is most probably working just fine.  It's the numbers that
       aren't quite what you think.

       First, see the answer to "Why am I getting long decimals (eg,
       19.9499999999999) instead of the numbers I should be getting (eg,
       19.95)?".

       For example, this

               print int(0.6/0.2-2), "\n";

       will on most computers print 0, not 1, because even such simple
       numbers as 0.6 and 0.2 cannot be represented exactly by floating-point
       numbers.  What you think of above as 'three' is really more like
       2.9999999999999995559.

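       Printing the intermediate value makes the problem visible.  This
       sketch assumes double-precision floating point.  It shows that the
       result is slightly below 1, that int() therefore truncates it to 0,
       and that rounding with sprintf gives the expected 1.

```perl
               use strict;
               use warnings;

               my $x = 0.6 / 0.2 - 2;    # mathematically exactly 1

               printf "%.20f\n", $x;     # the stored value is just under 1
               print int($x), "\n";      # 0, because int() truncates toward zero
               printf "%.0f\n", $x;      # 1, because %.0f rounds to nearest
```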
   Why isn't my octal data interpreted correctly?
       (contributed by brian d foy)

       You're probably trying to convert a string to a number, which Perl
       only converts as a decimal number.  When Perl converts a string to a
       number, it ignores leading spaces and zeroes, then assumes the rest of
       the digits are in base 10:

               my $string = '0644';

               print $string + 0;  # prints 644

               print $string + 44; # prints 688, certainly not octal!

       This problem usually involves one of the Perl built-ins that has the
       same name as a Unix command that uses octal numbers as arguments on the
       command line.  In this example, "chmod" on the command line knows that
       its first argument is octal because that's what it does:

               %prompt> chmod 644 file

       If you want to use the same literal digits (644) in Perl, you have to
       tell Perl to treat them as octal numbers either by prefixing the digits
       with a 0 or by using "oct":

               chmod( 0644, $file );      # right, has leading zero
               chmod( oct(644), $file );  # also correct

       The problem comes in when you take your numbers from something that
       Perl thinks is a string, such as a command line argument in @ARGV:

               chmod( $ARGV[0], $file );       # wrong, even if "0644"

               chmod( oct($ARGV[0]), $file );  # correct, treat string as octal

       You can always check the value you're using by printing it in octal
       notation to ensure it matches what you think it should be.  Print it
       in octal and decimal format:

               printf "0%o %d", $number, $number;

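       The following sketch puts the cases side by side.  It converts the
       string '0644' numerically and then with oct().  It also passes oct()
       the "0x" and "0b" prefixes, which oct() understands as well.  Finally,
       it uses printf with %o to check the result.

```perl
               use strict;
               use warnings;

               my $string = '0644';

               # Numeric conversion ignores the leading zero: decimal 644.
               my $as_decimal = $string + 0;

               # oct() reads the digits as octal: 0644 octal is 420 decimal.
               my $as_octal = oct $string;

               # oct() also accepts hexadecimal and binary prefixes.
               my $from_hex = oct '0x1A4';
               my $from_bin = oct '0b110100100';

               printf "%d %d %d %d\n", $as_decimal, $as_octal, $from_hex, $from_bin;
               # prints: 644 420 420 420

               # Print the value back in octal to check it.
               printf "0%o\n", $as_octal;    # prints 0644
```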
   Does Perl have a round() function?  What about ceil() and floor()?  Trig
       functions?
       Remember that "int()" merely truncates toward 0.  For rounding to a
       certain number of digits, "sprintf()" or "printf()" is usually the
       easiest route.

               printf("%.3f", 3.1415926535);   # prints 3.142

       The "POSIX" module (part of the standard Perl distribution) implements
       "ceil()", "floor()", and a number of other mathematical and
       trigonometric functions.

               use POSIX qw(ceil floor);
               my $ceil  = ceil(3.5);    # 4
               my $floor = floor(3.5);   # 3

       In 5.000 to 5.003 perls, trigonometry was done in the "Math::Complex"
       module.  With 5.004, the "Math::Trig" module (part of the standard Perl
       distribution) implements the trigonometric functions.  Internally it
       uses the "Math::Complex" module, and some functions can break out from
       the real axis into the complex plane, for example the inverse sine of
       2.

       Rounding in financial applications can have serious implications, and
       the rounding method used should be specified precisely.  In these
       cases, it probably pays not to trust whichever system rounding is being
       used by Perl, but to instead implement the rounding function you need
       yourself.

       To see why, notice how you'll still have an issue on half-way-point
       alternation:

               for (my $i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ", $i }

               0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
               0.8 0.8 0.9 0.9 1.0 1.0

       Don't blame Perl.  It's the same as in C.  IEEE says we have to do
       this.  Perl numbers whose absolute values are integers under 2**31 (on
       32-bit machines) will work pretty much like mathematical integers.
       Other numbers are not guaranteed.

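       The sketch below first contrasts int() with POSIX's floor() and ceil()
       on a negative number.  It then shows one possible hand-written
       rounding function.  The name round_half_away is hypothetical, and its
       rule (round half away from zero) is only one of several you might
       specify.  Because it still works on binary floating-point values, it
       does not remove the representation problems described above.

```perl
               use strict;
               use warnings;
               use POSIX qw(floor ceil);

               # int() truncates toward zero; floor() and ceil() round toward
               # negative and positive infinity respectively.
               printf "%d %d %d\n", int(-3.5), floor(-3.5), ceil(-3.5);   # -3 -4 -3

               # Hypothetical helper: round half away from zero to $places
               # decimal places.  Inputs that are not exactly representable
               # in binary may not fall on the side of the half-way point
               # that their decimal spelling suggests.
               sub round_half_away {
                       my ($x, $places) = @_;
                       $places = 0 unless defined $places;
                       my $factor = 10 ** $places;
                       my $sign   = $x < 0 ? -1 : 1;
                       return $sign * int(abs($x) * $factor + 0.5) / $factor;
               }

               print round_half_away(2.5), "\n";          # 3
               print round_half_away(-2.5), "\n";         # -3
               print round_half_away(3.14159, 2), "\n";   # 3.14
```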
   How do I convert between numeric representations/bases/radixes?
       As always with Perl there is more than one way to do it.  Below are a
       few examples of approaches to making common conversions between number
       representations.  This is intended to be representational rather than
       exhaustive.

       Some of the examples later in perlfaq4 use the "Bit::Vector" module
       from CPAN.  The reason you might choose "Bit::Vector" over Perl's
       built-in functions is that it works with numbers of ANY size, that it
       is optimized for speed on some operations, and for at least some
       programmers the notation might be familiar.

       How do I convert hexadecimal into decimal
           Using Perl's built-in conversion of "0x" notation:

                   $dec = 0xDEADBEEF;

           Using the "hex" function:

                   $dec = hex("DEADBEEF");

           Using "pack":

                   $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));

           Using the CPAN module "Bit::Vector":

                   use Bit::Vector;
                   $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
                   $dec = $vec->to_Dec();

       How do I convert from decimal to hexadecimal
           Using "sprintf":

                   $hex = sprintf("%X", 3735928559); # upper case A-F
                   $hex = sprintf("%x", 3735928559); # lower case a-f

           Using "unpack":

                   $hex = unpack("H*", pack("N", 3735928559));

           Using "Bit::Vector":

                   use Bit::Vector;
                   # -559038737 is 0xDEADBEEF read as a signed 32-bit value
                   $vec = Bit::Vector->new_Dec(32, -559038737);
                   $hex = $vec->to_Hex();

           And "Bit::Vector" supports odd bit counts:

                   use Bit::Vector;
                   $vec = Bit::Vector->new_Dec(33, 3735928559);
                   $vec->Resize(32); # suppress leading 0 if unwanted
                   $hex = $vec->to_Hex();

       How do I convert from octal to decimal
           Using Perl's built-in conversion of numbers with leading zeros:

                   $dec = 033653337357; # note the leading 0!

           Using the "oct" function:

                   $dec = oct("33653337357");

           Using "Bit::Vector":

                   use Bit::Vector;
                   $vec = Bit::Vector->new(32);
                   $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
                   $dec = $vec->to_Dec();

       How do I convert from decimal to octal
           Using "sprintf":

                   $oct = sprintf("%o", 3735928559);

           Using "Bit::Vector":

                   use Bit::Vector;
                   $vec = Bit::Vector->new_Dec(32, -559038737);
                   $oct = reverse join('', $vec->Chunk_List_Read(3));

       How do I convert from binary to decimal
           Perl 5.6 lets you write binary numbers directly with the "0b"
           notation:

                   $number = 0b10110110;

           Using "oct":

                   my $input = "10110110";
                   $decimal = oct( "0b$input" );

           Using "pack" and "ord":

                   $decimal = ord(pack('B8', '10110110'));

           Using "pack" and "unpack" for larger strings:

                   $int = unpack("N", pack("B32",
                           substr("0" x 32 . "11110101011011011111011101111", -32)));
                   $dec = sprintf("%d", $int);

                   # substr() is used to left-pad a 32-character string with zeros.

           Using "Bit::Vector":

                   use Bit::Vector;
                   $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
                   $dec = $vec->to_Dec();

       How do I convert from decimal to binary
           Using "sprintf" (perl 5.6+):

                   $bin = sprintf("%b", 3735928559);

           Using "unpack":

                   $bin = unpack("B*", pack("N", 3735928559));

           Using "Bit::Vector":

                   use Bit::Vector;
                   $vec = Bit::Vector->new_Dec(32, -559038737);
                   $bin = $vec->to_Bin();

           The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
           are left as an exercise to the inclined reader.

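           If you want a head start on that exercise, the core built-ins
           chain together.  Parse the input with "hex" or "oct", then format
           the result with "sprintf".  The helper names below are
           hypothetical.  The sketch assumes the values fit in Perl's native
           integers, which are 64 bits on most current builds.

```perl
                   use strict;
                   use warnings;

                   # Hypothetical helpers: parse with hex()/oct(), format with sprintf().
                   sub hex_to_oct { my ($hex) = @_; return sprintf '%o', hex $hex }
                   sub hex_to_bin { my ($hex) = @_; return sprintf '%b', hex $hex }
                   sub oct_to_hex { my ($oct) = @_; return sprintf '%X', oct $oct }
                   sub bin_to_hex { my ($bin) = @_; return sprintf '%X', oct "0b$bin" }

                   print hex_to_oct('DEADBEEF'), "\n";      # 33653337357
                   print oct_to_hex('33653337357'), "\n";   # DEADBEEF
                   print bin_to_hex('11011110101011011011111011101111'), "\n";   # DEADBEEF
```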
   Why doesn't & work the way I want it to?
       The behavior of the bitwise operators depends on whether they're used
       on numbers or strings.  On strings, the operators work on the bits of
       each character in turn (the string "3" is the bit pattern 00110011).
       On numbers, they work with the binary form of the number (the number 3
       is treated as the bit pattern 00000011).

       So, saying "11 & 3" performs the "and" operation on numbers (yielding
       3).  Saying "11" & "3" performs the "and" operation on strings
       (yielding "1").

       Most problems with "&" and "|" arise because the programmer thinks they
       have a number but really it's a string.  The rest arise because the
       programmer says:

               if ("\020\020" & "\101\101") {
                       # ...
               }

       but a string consisting of two null bytes (the result of "\020\020" &
       "\101\101") is not a false value in Perl.  You need:

               if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
                       # ...
               }

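       Values read from input are strings, so "&" performs a string
       operation on them unless you convert them first.  This sketch computes
       both results from the same two strings.  Adding 0 is one common way to
       force numeric context.

```perl
               use strict;
               use warnings;

               # Values read from input or @ARGV are strings.
               my ($x, $y) = ('11', '3');

               my $string_and  = $x & $y;                # character by character: '1'
               my $numeric_and = ($x + 0) & ($y + 0);    # numbers: 1011 & 0011 = 3

               print "string:  '$string_and'\n";
               print "numeric: $numeric_and\n";
```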
   How do I multiply matrices?
       Use the Math::Matrix or Math::MatrixReal modules (available from CPAN)
       or the PDL extension (also available from CPAN).

   How do I perform an operation on a series of integers?
       To call a function on each element in an array, and collect the
       results, use:

               @results = map { my_func($_) } @array;

       For example:

               @triple = map { 3 * $_ } @single;

       To call a function on each element of an array, but ignore the results:

               foreach $iterator (@array) {
                       some_func($iterator);
               }

       To call a function on each integer in a (small) range, you can use:

               @results = map { some_func($_) } (5 .. 25);

       but you should be aware that the ".." operator creates a list of all
       integers in the range.  This can take a lot of memory for large
       ranges.  Instead use:

               @results = ();
               for ($i=5; $i < 500_005; $i++) {
                       push(@results, some_func($i));
               }

       This situation has been fixed in Perl 5.005.  Use of ".." in a "for"
       loop will iterate over the range, without creating the entire range.

               for my $i (5 .. 500_005) {
                       push(@results, some_func($i));
               }

       will not create a list of 500,001 integers.

   How can I output Roman numerals?
       Get the "Roman" module from http://www.cpan.org/modules/by-module/Roman .

   Why aren't my random numbers random?
       If you're using a version of Perl before 5.004, you must call "srand"
       once at the start of your program to seed the random number generator.

               BEGIN { srand() if $] < 5.004 }

       5.004 and later automatically call "srand" at the beginning.  Don't
       call "srand" more than once: you make your numbers less random, rather
       than more.

       Computers are good at being predictable and bad at being random
       (despite appearances caused by bugs in your programs :-).  The
       "random" article in the "Far More Than You Ever Wanted To Know"
       collection at http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz , courtesy
       of Tom Phoenix, talks more about this.  John von Neumann said, "Anyone
       who attempts to generate random numbers by deterministic means is, of
       course, living in a state of sin."

       If you want numbers that are more random than what "rand" with "srand"
       provides, you should also check out the "Math::TrulyRandom" module from
       CPAN.  It uses the imperfections in your system's timer to generate
       random numbers, but this takes quite a while.  If you want a better
       pseudorandom generator than comes with your operating system, look at
       "Numerical Recipes in C" at http://www.nr.com/ .

   How do I get a random number between X and Y?
       To get a random number between two values, you can use the "rand()"
       built-in to get a random number between 0 and 1.  From there, you
       shift that into the range that you want.

       "rand($x)" returns a number such that "0 <= rand($x) < $x".  Thus what
       you want to have perl figure out is a random number in the range from
       0 to the difference between your X and Y.

       That is, to get a number between 10 and 15, inclusive, you want a
       random number between 0 and 5 that you can then add to 10.

               my $number = 10 + int rand( 15-10+1 ); # ( 10,11,12,13,14, or 15 )

       Hence you derive the following simple function to abstract that.  It
       selects a random integer between the two given integers (inclusive).
       For example: "random_int_between(50,120)".

               sub random_int_between {
                       my($min, $max) = @_;
                       # Assumes that the two arguments are integers themselves!
                       return $min if $min == $max;
                       ($min, $max) = ($max, $min)  if  $min > $max;
                       return $min + int rand(1 + $max - $min);
               }

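       If you want a floating-point number rather than an integer, drop the
       int() and the +1.  This sketch uses a hypothetical helper,
       random_float_between.  It returns a number in the half-open range
       from $min up to, but not including, $max.

```perl
               use strict;
               use warnings;

               # Hypothetical helper: a floating-point number with
               # $min <= $n < $max, made by scaling rand() to the width of
               # the range and shifting it up by $min.
               sub random_float_between {
                       my ($min, $max) = @_;
                       ($min, $max) = ($max, $min) if $min > $max;
                       return $min + rand($max - $min);
               }

               printf "%.3f\n", random_float_between(1.5, 2.5);
```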

Data: Dates

   How do I find the day or week of the year?
       The localtime function returns the day of the year.  Without an
       argument localtime uses the current time.  This field counts from 0,
       so January 1 is day 0.

               my $day_of_year = (localtime)[7];

       The "POSIX" module can also format a date as the day of the year or
       week of the year.  Its %j format counts days from 001 instead of 0.

               use POSIX qw/strftime/;
               my $day_of_year  = strftime "%j", localtime;
               my $week_of_year = strftime "%W", localtime;

       To get the day of year or week of year for any date, use "POSIX"'s
       "mktime" to get a time in epoch seconds for the argument to localtime.

               use POSIX qw/mktime strftime/;
               my $week_of_year = strftime "%W",
                       localtime( mktime( 0, 0, 0, 18, 11, 87 ) );  # December 18, 1987

       The "Date::Calc" module provides two functions to calculate these.
       "Date::Calc" exports nothing by default, so import them by name:

               use Date::Calc qw( Day_of_Year Week_of_Year );
               my $day_of_year  = Day_of_Year(  1987, 12, 18 );
               my $week_of_year = Week_of_Year( 1987, 12, 18 );

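       Without Date::Calc, the core "Time::Local" module can turn an
       arbitrary date into epoch seconds for localtime.  This is a sketch
       with a hypothetical helper, day_of_year.  It returns a 1-based day
       number, matching strftime's %j rather than localtime's 0-based field.
       It uses noon so that a daylight-saving shift cannot push the time
       into a different day.

```perl
               use strict;
               use warnings;
               use Time::Local qw(timelocal);

               # Hypothetical helper: 1-based day of the year for a calendar date.
               # Noon is used so that a daylight-saving change cannot move the
               # time into a neighboring day.
               sub day_of_year {
                       my ($year, $month, $day) = @_;    # $month is 1..12
                       my $epoch = timelocal(0, 0, 12, $day, $month - 1, $year);
                       my $yday  = (localtime $epoch)[7];  # counts from 0
                       return $yday + 1;
               }

               print day_of_year(1987, 12, 18), "\n";    # 352
```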
   How do I find the current century or millennium?
       Use the following simple functions:

               sub get_century {
                       return int((((localtime(shift || time))[5] + 1999))/100);
               }

               sub get_millennium {
                       return 1+int((((localtime(shift || time))[5] + 1899))/1000);
               }

       On some systems, the "POSIX" module's "strftime()" function has been
       extended in a non-standard way to use a %C format, which they sometimes
       claim is the "century".  It isn't, because on most such systems, this
       is only the first two digits of the four-digit year, and thus cannot be
       used to reliably determine the current century or millennium.

   How can I compare two dates and find the difference?
       (contributed by brian d foy)

       You could just store all your dates as a number and then subtract.
       Life isn't always that simple though.  If you want to work with
       formatted dates, the "Date::Manip", "Date::Calc", or "DateTime" modules
       can help you.

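       For the simple case of whole days between two calendar dates, the
       core "Time::Local" module is enough.  This sketch uses a hypothetical
       helper, days_between.  It converts both dates with timegm(), so each
       date is treated as midnight UTC.  In UTC every day is exactly 86400
       seconds, which keeps daylight-saving time out of the calculation.

```perl
               use strict;
               use warnings;
               use Time::Local qw(timegm);

               # Hypothetical helper: whole days from the first date to the second.
               sub days_between {
                       my ($y1, $m1, $d1, $y2, $m2, $d2) = @_;    # months are 1..12
                       my $t1 = timegm(0, 0, 0, $d1, $m1 - 1, $y1);
                       my $t2 = timegm(0, 0, 0, $d2, $m2 - 1, $y2);
                       my $days = ($t2 - $t1) / 86400;
                       return $days;
               }

               print days_between(2000, 1, 1, 2000, 3, 1), "\n";   # 60 (2000 is a leap year)
```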
   How can I take a string and turn it into epoch seconds?
       If it's a regular enough string that it always has the same format, you
       can split it up and pass the parts to "timelocal" in the standard
       "Time::Local" module.  Otherwise, you should look into the "Date::Calc"
       and "Date::Manip" modules from CPAN.

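       As a minimal sketch, here is a hypothetical parser, to_epoch, for one
       fixed format, "YYYY-MM-DD HH:MM:SS".  It assumes the string is in UTC
       and therefore calls timegm(), which comes from the same module.  If
       your strings are in local time, use timelocal() instead.

```perl
               use strict;
               use warnings;
               use Time::Local qw(timegm);

               # Hypothetical parser for "YYYY-MM-DD HH:MM:SS", treated as UTC.
               sub to_epoch {
                       my ($string) = @_;
                       my ($y, $mon, $d, $h, $min, $s) =
                               $string =~ /\A(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)\z/
                               or die "Unrecognized date format: $string\n";
                       # timegm() wants a 0-based month.
                       return timegm($s, $min, $h, $d, $mon - 1, $y);
               }

               print to_epoch("2001-11-13 01:00:00"), "\n";   # 1005613200
```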
   How can I find the Julian Day?
       (contributed by brian d foy and Dave Cross)

       You can use the "Time::JulianDay" module available on CPAN.  Ensure
       that you really want to find a Julian day, though, as many people have
       different ideas about Julian days.  See
       http://www.hermetic.ch/cal_stud/jdn.htm for instance.

       You can also try the "DateTime" module, which can convert a date/time
       to a Julian Day.  (The output shown is for January 31, 2005.)

               $ perl -MDateTime -le'print DateTime->today->jd'
               2453401.5

       Or the modified Julian Day:

               $ perl -MDateTime -le'print DateTime->today->mjd'
               53401

       Or even the day of the year (which is what some people think of as a
       Julian day):

               $ perl -MDateTime -le'print DateTime->today->doy'
               31

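       Without extra modules, the arithmetic is straightforward.  The Unix
       epoch (1970-01-01 00:00:00 UTC) is Julian Day 2440587.5.  The modified
       Julian Day is defined as JD - 2400000.5.  The sketch below uses those
       two facts with hypothetical helper names.  It reproduces the DateTime
       output above for January 31, 2005.

```perl
               use strict;
               use warnings;
               use Time::Local qw(timegm);

               # Hypothetical helpers taking epoch seconds (UTC).
               sub julian_day          { my ($t) = @_; return $t / 86400 + 2440587.5 }
               sub modified_julian_day { my ($t) = @_; return $t / 86400 + 40587 }

               my $epoch = timegm(0, 0, 0, 31, 0, 2005);   # 2005-01-31 00:00 UTC
               print julian_day($epoch), "\n";             # 2453401.5
               print modified_julian_day($epoch), "\n";    # 53401
```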
   How do I find yesterday's date?
       (contributed by brian d foy)

       Use one of the Date modules.  The "DateTime" module makes it simple,
       and gives you the same time of day, only the day before.

               use DateTime;

               my $yesterday = DateTime->now->subtract( days => 1 );

               print "Yesterday was $yesterday\n";

       You can also use the "Date::Calc" module using its "Today_and_Now"
       function.

               use Date::Calc qw( Today_and_Now Add_Delta_DHMS );

               my @date_time = Add_Delta_DHMS( Today_and_Now(), -1, 0, 0, 0 );

               print "@date_time\n";

       Most people try to use the time rather than the calendar to figure out
       dates, but that assumes that days are twenty-four hours each.  For most
       people, there are two days a year when they aren't: the switch to and
       from summer time throws this off.  Let the modules do the work.

       If you absolutely must do it yourself (or can't use one of the
       modules), here's a solution using "Time::Local", which comes with Perl:

               # contributed by Gunnar Hjalmarsson
               use Time::Local;
               my $today = timelocal 0, 0, 12, ( localtime )[3..5];
               my ($d, $m, $y) = ( localtime $today-86400 )[3..5];
               printf "Yesterday: %d-%02d-%02d\n", $y+1900, $m+1, $d;

       In this case, you measure the day starting at noon, and subtract 24
       hours.  Even if the length of the calendar day is 23 or 25 hours,
       you'll still end up on the previous calendar day, although not at
       noon.  Since you don't care about the time, the one-hour difference
       doesn't matter and you end up with the previous date.

   Does Perl have a Year 2000 problem? Is Perl Y2K compliant?
       Short answer: No, Perl does not have a Year 2000 problem.  Yes, Perl is
       Y2K compliant (whatever that means).  The programmers you've hired to
       use it, however, probably are not.

       Long answer: The question reflects a misunderstanding of the issue.
       Perl is just as Y2K compliant as your pencil: no more, and no less.
       Can you use your pencil to write a non-Y2K-compliant memo?  Of course
       you can.  Is that the pencil's fault?  Of course it isn't.

       The date and time functions supplied with Perl (gmtime and localtime)
       supply adequate information to determine the year well beyond 2000
       (2038 is when trouble strikes for 32-bit machines).  The year returned
       by these functions when used in a list context is the year minus 1900.
       For years between 1910 and 1999 this happens to be a 2-digit decimal
       number.  To avoid the year 2000 problem simply do not treat the year as
       a 2-digit number.  It isn't.

       When gmtime() and localtime() are used in scalar context they return a
       timestamp string that contains a fully-expanded year.  For example,
       "$timestamp = gmtime(1005613200)" sets $timestamp to "Tue Nov 13
       01:00:00 2001".  There's no year 2000 problem here.

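       Both contexts appear side by side in the sketch below, which uses the
       same timestamp as the example above.  In list context you add 1900 to
       the year field.  In scalar context the year is already complete.

```perl
               use strict;
               use warnings;

               my $t = 1005613200;

               # List context: field 5 is the number of years since 1900.
               my $year = (gmtime $t)[5] + 1900;    # 2001, not 101 or "01"

               # Scalar context: a string that already contains the full year.
               my $stamp = gmtime $t;                # Tue Nov 13 01:00:00 2001

               print "$year\n$stamp\n";
```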
       That doesn't mean that Perl can't be used to create non-Y2K compliant
       programs.  It can.  But so can your pencil.  It's the fault of the
       user, not the language.  At the risk of inflaming the NRA: "Perl
       doesn't break Y2K, people do."  See http://www.perl.org/about/y2k.html
       for a longer exposition.


Data: Strings

   How do I validate input?
       (contributed by brian d foy)

       There are many ways to ensure that values are what you expect or want
       to accept.  Besides the specific examples that we cover in the perlfaq,
       you can also look at the modules with "Assert" and "Validate" in their
       names, along with other modules such as "Regexp::Common".

       Some modules have validation for particular types of input, such as
       "Business::ISBN", "Business::CreditCard", "Email::Valid", and
       "Data::Validate::IP".

   How do I unescape a string?
       It depends on just what you mean by "escape".  URL escapes are dealt
       with in perlfaq9.  Shell escapes with the backslash ("\") character are
       removed with

               s/\\(.)/$1/g;

       This won't expand "\n" or "\t" or any other special escapes.

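       The following sketch applies that substitution to two sample strings.
       The first shows escaped spaces, backslashes, and dollar signs losing
       their backslashes.  The second shows the limitation just mentioned:
       a "\t" becomes a plain "t", not a tab.

```perl
               use strict;
               use warnings;

               my $escaped = 'a\ b\\\\c\$d';      # the characters a\ b\\c\$d
               (my $plain = $escaped) =~ s/\\(.)/$1/g;
               print "$plain\n";                  # a b\c$d

               # \t is not turned into a tab; the backslash is simply dropped.
               (my $tab = 'x\ty') =~ s/\\(.)/$1/g;
               print "$tab\n";                    # xty
```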
   How do I remove consecutive pairs of characters?
       (contributed by brian d foy)

       You can use the substitution operator to find pairs of characters (or
       runs of characters) and replace them with a single instance.  In this
       substitution, we find a character in "(.)".  The memory parentheses
       store the matched character in the back-reference "\1" and we use that
       to require that the same thing immediately follow it.  We replace that
       part of the string with the character in $1.

               s/(.)\1/$1/g;

       We can also use the transliteration operator, "tr///".  In this
       example, the search list side of our "tr///" contains nothing, but the
       "c" option complements that so it contains everything.  The replacement
       list also contains nothing, so the transliteration is almost a no-op
       since it won't do any replacements (or more exactly, it replaces each
       character with itself).  However, the "s" option squashes duplicated
       and consecutive characters in the string so a character does not show
       up next to itself.

               my $str = 'Haarlem';   # in the Netherlands
               $str =~ tr///cs;       # Now Harlem, like in New York

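       The two approaches differ when a character repeats more than twice.
       s/(.)\1/$1/g halves each pair, so a run of three becomes two.
       Allowing the back-reference to repeat (\1+) squeezes the whole run,
       which matches what tr///cs does.  This sketch compares all three on
       one word:

```perl
               use strict;
               use warnings;

               my $word = 'Haaarlem';    # three a's in a row

               (my $pairs = $word) =~ s/(.)\1/$1/g;     # one pair collapsed: Haarlem
               (my $runs  = $word) =~ s/(.)\1+/$1/g;    # whole run squeezed: Harlem
               (my $tr    = $word) =~ tr///cs;          # same as the run version: Harlem

               print "$pairs $runs $tr\n";
```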
   How do I expand function calls in a string?
       (contributed by brian d foy)

       This is documented in perlref, and although it's not the easiest thing
       to read, it does work.  In each of these examples, we call the function
       inside the braces used to dereference a reference.  If we have more
       than one return value, we can construct and dereference an anonymous
       array.  In this case, we call the function in list context.

               print "The time values are @{ [localtime] }.\n";

       If we want to call the function in scalar context, we have to do a bit
       more work.  We can put any code we like inside the braces, as long as
       the last expression evaluates to a scalar reference; how you build
       that reference is up to you.  Note that the use of parens creates a
       list context, so we need "scalar" to force the scalar context on the
       function:

               print "The time is ${\(scalar localtime)}.\n";

               print "The time is ${ my $x = localtime; \$x }.\n";

       If your function already returns a reference, you don't need to create
       the reference yourself.

               sub timestamp { my $t = localtime; \$t }

               print "The time is ${ timestamp() }.\n";

       The "Interpolation" module can also do a lot of magic for you.  You can
       specify a variable name, in this case "E", to set up a tied hash that
       does the interpolation for you.  It has several other methods to do
       this as well.

               use Interpolation E => 'eval';
               print "The time values are $E{localtime()}.\n";

       In most cases, it is probably easier to simply use string
       concatenation, which also forces scalar context.

               print "The time is " . localtime() . ".\n";

   How do I find matching/nesting anything?
       This isn't something that can be done in one regular expression, no
       matter how complicated.  To find something between two single
       characters, a pattern like "/x([^x]*)x/" will get the intervening bits
       in $1.  For multiple-character delimiters, something more like
       "/alpha(.*?)omega/" would be needed.  But none of these deals with
       nested patterns.  For balanced expressions using "(", "{", "[" or "<"
       as delimiters, use the CPAN module Regexp::Common, or see
       "(??{ code })" in perlre.  For other cases, you'll have to write a
       parser.

       If you are serious about writing a parser, there are a number of
       modules or oddities that will make your life a lot easier.  There are
       the CPAN modules "Parse::RecDescent", "Parse::Yapp", and
       "Text::Balanced"; and the "byacc" program.  Starting with Perl 5.8,
       "Text::Balanced" is part of the standard distribution.

       One simple, destructive, inside-out approach that you might try is to
       pull out the smallest nesting parts one at a time:

               while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
                       # do something with $1
               }

       A more complicated and sneaky approach is to make Perl's regular
       expression engine do it for you.  This is courtesy of Dean Inada, and
       rather has the nature of an Obfuscated Perl Contest entry, but it
       really does work:

               # $_ contains the string to parse
               # BEGIN and END are the opening and closing markers for the
               # nested text.

               @( = ('(','');
               @) = (')','');
               ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
               @$ = (eval{/$re/},$@!~/unmatched/i);
               print join("\n",@$[0..$#$]) if( $$[-1] );

   How do I reverse a string?
       Use "reverse()" in scalar context, as documented in "reverse" in
       perlfunc.

               $reversed = reverse $string;

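       Context matters here.  In list context, for example as one of print's
       arguments, reverse() reverses the order of its arguments rather than
       the characters of a string.  This sketch shows both behaviors and
       uses "scalar" to force the one you usually want.

```perl
               use strict;
               use warnings;

               my $string   = 'stressed';
               my $reversed = reverse $string;        # scalar assignment: 'desserts'

               print reverse($string), "\n";          # list context: prints 'stressed'
               print scalar reverse($string), "\n";   # forced scalar context: 'desserts'
```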
   How do I expand tabs in a string?
       You can do it yourself:

               1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

       Or you can just use the "Text::Tabs" module (part of the standard Perl
       distribution).

               use Text::Tabs;
               @expanded_lines = expand(@lines_with_tabs);

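       As a quick check that the two approaches agree, this sketch expands
       the same line both ways, with tab stops every 8 columns.  The first
       tab follows one character, so it becomes 7 spaces.  The second
       follows 10 characters, so it becomes 6 spaces, reaching column 16.

```perl
               use strict;
               use warnings;
               use Text::Tabs;    # exports expand(); tab stops default to 8 columns

               my $line = "a\tbc\td";

               # The hand-rolled substitution, applied to a copy.
               my $manual = $line;
               1 while $manual =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

               # The module version; expand() takes and returns a list of lines.
               my ($module) = expand($line);

               print $manual eq $module ? "same\n" : "different\n";   # same
```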
   How do I reformat a paragraph?
       Use "Text::Wrap" (part of the standard Perl distribution):

               use Text::Wrap;
               print wrap("\t", '  ', @paragraphs);

       The paragraphs you give to "Text::Wrap" should not contain embedded
       newlines.  "Text::Wrap" doesn't justify the lines (flush-right).

       Or use the CPAN module "Text::Autoformat".  Formatting files can be
       easily done by making a shell alias, like so:

               alias fmt="perl -i -MText::Autoformat -n0777 \
                       -e 'print autoformat $_, {all=>1}' $*"

       See the documentation for "Text::Autoformat" to appreciate its many
       capabilities.

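       The line width for "Text::Wrap" is controlled by $columns, which
       defaults to 76.  This sketch narrows it so the wrapping is visible.
       The first line gets no indent and later lines get two spaces.  The
       sample text is only an illustration.

```perl
               use strict;
               use warnings;
               use Text::Wrap qw(wrap $columns);

               $columns = 20;    # target line width

               my $text = 'The paragraphs you give to Text::Wrap should not '
                        . 'contain embedded newlines.';

               # First line indented by nothing, later lines by two spaces.
               my $wrapped = wrap('', '  ', $text);
               print "$wrapped\n";
```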
   How can I access or change N characters of a string?
       You can access the first N characters of a string with substr().  To
       get the first character, for example, start at position 0 and grab the
       string of length 1.

               $string = "Just another Perl Hacker";
               $first_char = substr( $string, 0, 1 );  #  'J'

       To change part of a string, you can use the optional fourth argument,
       which is the replacement string.

               substr( $string, 13, 4, "Perl 5.8.0" );  # replaces 'Perl'

       You can also use substr() as an lvalue.

               substr( $string, 13, 4 ) =  "Perl 5.8.0";

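       Two more details of substr() are often handy.  A negative offset
       counts back from the end of the string.  The four-argument form also
       returns the characters it replaced.  This sketch uses the same sample
       string:

```perl
               use strict;
               use warnings;

               my $string = "Just another Perl Hacker";

               my $last_word = substr($string, -6);              # 'Hacker'
               my $old = substr($string, 13, 4, "Perl 5.8.0");   # returns 'Perl'

               print "$last_word\n$old\n$string\n";              # ... Just another Perl 5.8.0 Hacker
```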
712   How do I change the Nth occurrence of something?
713       You have to keep track of N yourself.  For example, let's say you want
714       to change the fifth occurrence of "whoever" or "whomever" into
715       "whosoever" or "whomsoever", case insensitively.  These all assume that
716       $_ contains the string to be altered.
717
718               $count = 0;
719               s{((whom?)ever)}{
720               ++$count == 5       # is it the 5th?
721                   ? "${2}soever"  # yes, swap
722                   : $1            # renege and leave it there
723                       }ige;
724
725       In the more general case, you can use the "/g" modifier in a "while"
726       loop, keeping count of matches.
727
728               $WANT = 3;
729               $count = 0;
730               $_ = "One fish two fish red fish blue fish";
731               while (/(\w+)\s+fish\b/gi) {
732                       if (++$count == $WANT) {
733                               print "The third fish is a $1 one.\n";
734                               }
735                       }
736
737       That prints out: "The third fish is a red one."  You can also use a
738       repetition count and repeated pattern like this:
739
740               /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
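       The counter trick works for any pattern. The helper below is a sketch, and replace_nth is a hypothetical name, not a standard function. It wraps the whole pattern in one capture group so that occurrences it skips can be put back unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: replace only the $n-th (1-based) match of $pattern
# in $string with $replacement, leaving every other match untouched.
sub replace_nth {
    my ( $string, $pattern, $n, $replacement ) = @_;
    my $count = 0;
    # The outer parens make $1 the whole match, even if $pattern has its own groups.
    $string =~ s{($pattern)}{ ++$count == $n ? $replacement : $1 }ge;
    return $string;
}

my $text   = "One fish two fish red fish blue fish";
my $result = replace_nth( $text, qr/fish/, 3, 'whale' );
print "$result\n";    # One fish two fish red whale blue fish
```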
741
742   How can I count the number of occurrences of a substring within a string?
743       There are a number of ways, with varying efficiency.  If you want a
744       count of a certain single character (X) within a string, you can use
745       the "tr///" operator like so:
746
747               $string = "ThisXlineXhasXsomeXx'sXinXit";
748               $count = ($string =~ tr/X//);
749               print "There are $count X characters in the string";
750
751       This is fine if you are just looking for a single character.  However,
752       if you are trying to count multiple character substrings within a
753       larger string, "tr///" won't work.  What you can do is wrap a while()
754       loop around a global pattern match.  For example, let's count negative
755       integers:
756
757               $string = "-9 55 48 -2 23 -76 4 14 -44";
758               while ($string =~ /-\d+/g) { $count++ }
759               print "There are $count negative numbers in the string";
760
761       Another version uses a global match in list context, then assigns the
762       result to a scalar, producing a count of the number of matches.
763
764               $count = () = $string =~ /-\d+/g;
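       If the substring is a literal string rather than a pattern, quote it with \Q...\E so that characters such as "." are not treated as regex metacharacters. The sketch below uses a hypothetical helper named count_substr built on the same "() =" counting idiom. Like that idiom, it counts only non-overlapping matches.

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping occurrences of a literal substring.
sub count_substr {
    my ( $string, $substr ) = @_;
    my $count = () = $string =~ /\Q$substr\E/g;   # list context, then count
    return $count;
}

my $fish = count_substr( "One fish two fish red fish blue fish", "fish" );  # 4
my $dots = count_substr( "a.b.c..d", ".." );  # 1 (an unquoted /../ would match 4 times)
print "$fish $dots\n";
```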
765
766   Does Perl have a Year 2038 problem?
767       No, all of Perl's built-in date and time functions and modules will
768       work to about 2 billion years before and after 1970.
769
770       Many systems cannot count time past the year 2038.  Older versions of
771       Perl (before 5.12) were dependent on the system to do date calculation
772       and thus shared their 2038 bug.
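       To check your own perl, ask gmtime() for the first second past the 32-bit limit. This sketch assumes a perl recent enough to do its own 64-bit time handling. The largest value a signed 32-bit time_t can hold is 2038-01-19 03:14:07 UTC, so 2**31 seconds should come out one second later.

```perl
use strict;
use warnings;

# 2**31 is one second past the largest signed 32-bit time value.
my @t    = gmtime( 2**31 );
my $year = $t[5] + 1900;    # gmtime counts years from 1900
my $date = sprintf "%04d-%02d-%02d %02d:%02d:%02d UTC",
    $year, $t[4] + 1, $t[3], $t[2], $t[1], $t[0];    # month is 0-based

print "$date\n";    # 2038-01-19 03:14:08 UTC
```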
773
774   How do I capitalize all the words on one line?
775       (contributed by brian d foy)
776
777       Damian Conway's Text::Autoformat handles all of the thinking for you.
778
779               use Text::Autoformat;
780               my $x = "Dr. Strangelove or: How I Learned to Stop ".
781                 "Worrying and Love the Bomb";
782
783               print $x, "\n";
784               for my $style (qw( sentence title highlight )) {
785                       print autoformat($x, { case => $style }), "\n";
786                       }
787
788       How do you want to capitalize those words?
789
790               FRED AND BARNEY'S LODGE        # all uppercase
791               Fred And Barney's Lodge        # title case
792               Fred and Barney's Lodge        # highlight case
793
794       It's not as easy a problem as it looks. How many words do you think are
795       in there? Wait for it... wait for it.... If you answered 5 you're
796       right. Perl words are groups of "\w+", but that's not what you want to
797       capitalize. How is Perl supposed to know not to capitalize that "s"
798       after the apostrophe? You could try a regular expression:
799
800               $string =~ s/ (
801                                        (^\w)    #at the beginning of the line
802                                          |      # or
803                                        (\s\w)   #preceded by whitespace
804                                          )
805                                       /\U$1/xg;
806
807               $string =~ s/([\w']+)/\u\L$1/g;
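       Comparing the two approaches side by side shows what the apostrophe does. This sketch runs a plain "\w+" version and the "[\w']+" version above on the example title. Neither version tries to leave small words like "and" alone.

```perl
use strict;
use warnings;

my $title = "FRED AND BARNEY'S LODGE";

# Naive: every \w+ run is a word, so the "S" after the apostrophe
# is treated as a separate word and stays capitalized.
( my $naive = $title ) =~ s/(\w+)/\u\L$1/g;      # "Fred And Barney'S Lodge"

# Counting the apostrophe as part of a word fixes that case.
( my $better = $title ) =~ s/([\w']+)/\u\L$1/g;  # "Fred And Barney's Lodge"

print "$naive\n$better\n";
```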
808
809       Now, what if you don't want to capitalize that "and"? Just use
810       Text::Autoformat and get on with the next problem. :)
811
812   How can I split a [character] delimited string except when inside
813       [character]?
814       Several modules can handle this sort of parsing--"Text::Balanced",
815       "Text::CSV", "Text::CSV_XS", and "Text::ParseWords", among others.
816
817       Take the example case of trying to split a string that is comma-
818       separated into its different fields. You can't use "split(/,/)" because
819       you shouldn't split if the comma is inside quotes.  For example, take a
820       data line like this:
821
822               SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
823
824       Due to the restriction of the quotes, this is a fairly complex problem.
825       Thankfully, we have Jeffrey Friedl, author of Mastering Regular
826       Expressions, to handle these for us.  He suggests (assuming your string
827       is contained in $text):
828
829                @new = ();
830                push(@new, $+) while $text =~ m{
831                        "([^\"\\]*(?:\\.[^\"\\]*)*)",?  # groups the phrase inside the quotes
832                       | ([^,]+),?
833                       | ,
834                       }gx;
835                push(@new, undef) if substr($text,-1,1) eq ',';
836
837       If you want to represent quotation marks inside a quotation-mark-
838       delimited field, escape them with backslashes (eg, "like \"this\"").
839
840       Alternatively, the "Text::ParseWords" module (part of the standard Perl
841       distribution) lets you say:
842
843               use Text::ParseWords;
844               @new = quotewords(",", 0, $text);
845
846   How do I strip blank space from the beginning/end of a string?
847       (contributed by brian d foy)
848
849       A substitution can do this for you. For a single line, you want to
850       replace all the leading or trailing whitespace with nothing. You can do
851       that with a pair of substitutions.
852
853               s/^\s+//;
854               s/\s+$//;
855
856       You can also write that as a single substitution, although it turns out
857       the combined statement is slower than the separate ones. That might not
858       matter to you, though.
859
860               s/^\s+|\s+$//g;
861
862       In this regular expression, the alternation matches either at the
863       beginning or the end of the string because alternation has the lowest
864       precedence, so each anchor applies only to its own branch. With the
865       "/g" flag, the substitution makes all possible matches, so it gets
866       both. Remember, the trailing newline matches the "\s+", and the "$"
867       anchor can match at the physical end of the string, so the newline
868       disappears too. Just add the newline back on output, which has the
869       added benefit of preserving "blank" (consisting entirely of whitespace)
870       lines, which the "^\s+" would remove all by itself.
871
872               while( <> )
873                       {
874                       s/^\s+|\s+$//g;
875                       print "$_\n";
876                       }
877
878       For a multi-line string, you can apply the regular expression to each
879       logical line in the string by adding the "/m" flag (for "multi-line").
880       With the "/m" flag, the "$" matches before an embedded newline, so it
881       doesn't remove it. It still removes the newline at the end of the
882       string.
883
884               $string =~ s/^\s+|\s+$//gm;
885
886       Remember that lines consisting entirely of whitespace will disappear,
887       since the first part of the alternation can match the entire string and
888       replace it with nothing. If you need to keep embedded blank lines, you have
889       to do a little more work. Instead of matching any whitespace (since
890       that includes a newline), just match the other whitespace.
891
892               $string =~ s/^[\t\f ]+|[\t\f ]+$//mg;
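       Here are both substitutions as one runnable script: the single-line trim as a small helper, and the multi-line variant that keeps blank lines. The name trim is an assumption of this sketch; Perl has no built-in by that name.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of $s without leading or trailing
# whitespace (a trailing newline counts as whitespace).
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

my $one = trim("   hello world \n");    # 'hello world'

# Multi-line: trim only tabs, form feeds, and spaces on each logical
# line, so newlines (and therefore blank lines) survive.
my $text = "  first  \n\n  third  \n";
( my $kept = $text ) =~ s/^[\t\f ]+|[\t\f ]+$//mg;    # "first\n\nthird\n"
```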
893
894   How do I pad a string with blanks or pad a number with zeroes?
895       In the following examples, $pad_len is the length to which you wish to
896       pad the string, $text or $num contains the string to be padded, and
897       $pad_char contains the padding character. You can use a single
898       character string constant instead of the $pad_char variable if you know
899       what it is in advance. And in the same way you can use an integer in
900       place of $pad_len if you know the pad length in advance.
901
902       The simplest method uses the "sprintf" function. It can pad on the left
903       or right with blanks, or on the left with zeroes, and it will not
904       truncate the result. The "pack" function can only pad strings on the
905       right with blanks and it will truncate the result to a maximum length
906       of $pad_len.
907
908               # Left padding a string with blanks (no truncation):
909               $padded = sprintf("%${pad_len}s", $text);
910               $padded = sprintf("%*s", $pad_len, $text);  # same thing
911
912               # Right padding a string with blanks (no truncation):
913               $padded = sprintf("%-${pad_len}s", $text);
914               $padded = sprintf("%-*s", $pad_len, $text); # same thing
915
916               # Left padding a number with 0 (no truncation):
917               $padded = sprintf("%0${pad_len}d", $num);
918               $padded = sprintf("%0*d", $pad_len, $num); # same thing
919
920               # Right padding a string with blanks using pack (will truncate):
921               $padded = pack("A$pad_len",$text);
922
923       If you need to pad with a character other than blank or zero you can
924       use one of the following methods.  They all generate a pad string with
925       the "x" operator and combine that with $text. These methods do not
926       truncate $text.
927
928       Left and right padding with any character, creating a new string:
929
930               $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
931               $padded = $text . $pad_char x ( $pad_len - length( $text ) );
932
933       Left and right padding with any character, modifying $text directly:
934
935               substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
936               $text .= $pad_char x ( $pad_len - length( $text ) );
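       These are the padding recipes above with concrete values, so you can see what each one produces. The values of $text, $num, $pad_len, and $pad_char are made up for this example.

```perl
use strict;
use warnings;

my ( $text, $num, $pad_len, $pad_char ) = ( 'abc', 42, 6, '*' );

my $left   = sprintf '%*s',  $pad_len, $text;    # '   abc'
my $right  = sprintf '%-*s', $pad_len, $text;    # 'abc   '
my $zeroes = sprintf '%0*d', $pad_len, $num;     # '000042'
my $packed = pack 'A4', 'abcdef';                # 'abcd'  (pack truncates)

# Any pad character, built with the x operator (x binds tighter than .).
my $stars = $pad_char x ( $pad_len - length($text) ) . $text;    # '***abc'

print "[$left] [$right] [$zeroes] [$packed] [$stars]\n";
```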
937
938   How do I extract selected columns from a string?
939       (contributed by brian d foy)
940
941       If you know which columns contain the data, you can use
942       "substr" to extract a single column.
943
944               my $column = substr( $line, $start_column, $length );
945
946       You can use "split" if the columns are separated by whitespace or some
947       other delimiter, as long as whitespace or the delimiter cannot appear
948       as part of the data.
949
950               my $line    = ' fred barney   betty   ';
951               my @columns = split /\s+/, $line;
952                       # ( '', 'fred', 'barney', 'betty' );
953
954               my $line    = 'fred||barney||betty';
955               my @columns = split /\|/, $line;
956                       # ( 'fred', '', 'barney', '', 'betty' );
957
958       If you want to work with comma-separated values, don't do this since
959       that format is a bit more complicated. Use one of the modules that
960       handle that format, such as "Text::CSV", "Text::CSV_XS", or
961       "Text::CSV_PP".
962
963       If you want to break apart an entire line of fixed columns, you can use
964       "unpack" with the A (ASCII) format. By using a number after the format
965       specifier, you can denote the column width. See the "pack" and "unpack"
966       entries in perlfunc for more details.
967
968               my @fields = unpack( "A8 A8 A8 A16 A4", $line );
969
970       Note that spaces in the format argument to "unpack" do not denote
971       literal spaces. If you have space separated data, you may want "split"
972       instead.
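       The following sketch uses a made-up record layout: an 8-column name, a 10-column city, and a 4-column code. It builds the record with sprintf and takes it apart again. The template comes first, and "A" fields have their trailing spaces removed.

```perl
use strict;
use warnings;

# A hypothetical fixed-width record: name (8), city (10), code (4).
my $line = sprintf '%-8s%-10s%-4s', 'fred', 'Bedrock', '42';

# Template first, then the data; "A" strips the trailing padding.
my ( $name, $city, $code ) = unpack 'A8 A10 A4', $line;

print "[$name] [$city] [$code]\n";    # [fred] [Bedrock] [42]
```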
973
974   How do I find the soundex value of a string?
975       (contributed by brian d foy)
976
977       You can use the "Text::Soundex" module. If you want to do fuzzy or close
978       matching, you might also try the "String::Approx",
979       "Text::Metaphone", and "Text::DoubleMetaphone" modules.
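       For a sense of what a soundex value is, here is a hand-rolled sketch of the classic American Soundex rules: keep the first letter, map the remaining consonants to digits, collapse repeated codes, and pad the result to four characters. This is not the code from "Text::Soundex", and that module may handle edge cases such as non-letters or empty strings differently, so use the module for real work. The name soundex below is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: classic American Soundex code for one word.
sub soundex {
    my $word = uc shift;
    $word =~ s/[^A-Z]//g;                  # letters only
    return '' unless length $word;

    my %code;
    @code{ split //, 'BFPV' }     = (1) x 4;
    @code{ split //, 'CGJKQSXZ' } = (2) x 8;
    @code{ 'D', 'T' }             = (3) x 2;
    $code{L}                      = 4;
    @code{ 'M', 'N' }             = (5) x 2;
    $code{R}                      = 6;

    my $first  = substr( $word, 0, 1 );
    my $result = $first;
    my $last   = $code{$first} // '';      # the first letter's code counts as seen

    for my $ch ( split //, substr( $word, 1 ) ) {
        next if $ch eq 'H' || $ch eq 'W';  # H and W do not separate equal codes
        my $digit = $code{$ch} // '';      # vowels and Y have no code
        $result .= $digit if $digit ne '' && $digit ne $last;
        $last = $digit;                    # a vowel resets $last
        last if length($result) == 4;
    }
    return substr( $result . '000', 0, 4 );    # pad short codes with zeros
}

print join( ' ', map { soundex($_) } qw(Robert Rupert Ashcraft Tymczak Pfister) ), "\n";
```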
980
981   How can I expand variables in text strings?
982       (contributed by brian d foy)
983
984       If you can avoid it, don't, or if you can use a templating system, such
985       as "Text::Template" or "Template" Toolkit, do that instead. You might
986       even be able to get the job done with "sprintf" or "printf":
987
988               my $string = sprintf 'Say hello to %s and %s', $foo, $bar;
989
990       However, for the one-off simple case where I don't want to pull out a
991       full templating system, I'll use a string that has two Perl scalar
992       variables in it. In this example, I want to expand $foo and $bar to
993       their values:
994
995               my $foo = 'Fred';
996               my $bar = 'Barney';
997               $string = 'Say hello to $foo and $bar';
998
999       One way I can do this involves the substitution operator and a double
1000       "/e" flag.  The first "/e" evaluates $1 on the replacement side and
1001       turns it into $foo. The second /e starts with $foo and replaces it with
1002       its value. $foo, then, turns into 'Fred', and that's finally what's
1003       left in the string:
1004
1005               $string =~ s/(\$\w+)/$1/eeg; # 'Say hello to Fred and Barney'
1006
1007       The "/e" will also silently ignore violations of strict, replacing
1008       undefined variable names with the empty string. Since I'm using the
1009       "/e" flag (twice even!), I have all of the same security problems I
1010       have with "eval" in its string form. If there's something odd in $foo,
1011       perhaps something like "@{[ system "rm -rf /" ]}", then I could get
1012       myself in trouble.
1013
1014       To get around the security problem, I could also pull the values from a
1015       hash instead of evaluating variable names. Using a single "/e", I can
1016       check the hash to ensure the value exists, and if it doesn't, I can
1017       replace the missing value with a marker, in this case "???" to signal
1018       that I missed something:
1019
1020               my $string = 'This has $foo and $bar';
1021
1022               my %Replacements = (
1023                       foo  => 'Fred',
1024                       );
1025
1026               # $string =~ s/\$(\w+)/$Replacements{$1}/g;
1027               $string =~ s/\$(\w+)/
1028                       exists $Replacements{$1} ? $Replacements{$1} : '???'
1029                       /eg;
1030
1031               print $string;
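       The hash-based approach fits neatly into a helper. In this sketch, expand is a hypothetical name. The replacement uses a single "/e", so values are inserted as plain text and never evaluated. The second call shows this: the string that looks like code is copied in unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each $name in $template with the value
# from %$values, or '???' if that key is missing. Values are inserted
# as plain text; nothing in them is run as Perl code.
sub expand {
    my ( $template, $values ) = @_;
    ( my $out = $template ) =~
        s/\$(\w+)/ exists $values->{$1} ? $values->{$1} : '???' /eg;
    return $out;
}

my $result = expand( 'This has $foo and $bar', { foo => 'Fred' } );
# 'This has Fred and ???'

my $safe = expand( 'Run $cmd', { cmd => '@{[ die "boom" ]}' } );
# 'Run @{[ die "boom" ]}' (inserted literally, never executed)

print "$result\n$safe\n";
```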
1032
1033   What's wrong with always quoting "$vars"?
1034       The problem is that those double-quotes force stringification--coercing
1035       numbers and references into strings--even when you don't want them to
1036       be strings.  Think of it this way: double-quote expansion is used to
1037       produce new strings.  If you already have a string, why do you need
1038       more?
1039
1040       If you get used to writing odd things like these:
1041
1042               print "$var";           # BAD
1043               $new = "$old";          # BAD
1044               somefunc("$var");       # BAD
1045
1046       You'll be in trouble.  Those should (in 99.8% of the cases) be the
1047       simpler and more direct:
1048
1049               print $var;
1050               $new = $old;
1051               somefunc($var);
1052
1053       Otherwise, besides slowing you down, you're going to break code when
1054       the thing in the scalar is actually neither a string nor a number, but
1055       a reference:
1056
1057               func(\@array);
1058               sub func {
1059                       my $aref = shift;
1060                       my $oref = "$aref";  # WRONG
1061                       }
1062
1063       You can also get into subtle problems on those few operations in Perl
1064       that actually do care about the difference between a string and a
1065       number, such as the magical "++" autoincrement operator or the
1066       syscall() function.
1067
1068       Interpolation also mangles arrays: "@lines" joins the elements with spaces.
1069
1070               @lines = `command`;
1071               print "@lines";     # WRONG - extra blanks
1072               print @lines;       # right
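       Both problems are easy to see directly. This short script shows that a quoted reference is no longer a reference. It also shows that interpolating an array joins its elements with $" (a space by default), which puts a stray blank in front of every line after the first.

```perl
use strict;
use warnings;

my @array = ( 1, 2, 3 );
my $aref  = \@array;

# Quoting turns the reference into a plain string such as "ARRAY(0x...)".
my $quoted        = "$aref";
my $is_ref_before = ref $aref;      # 'ARRAY'
my $is_ref_after  = ref $quoted;    # '' because it is only a string now

# Interpolating an array joins its elements with $" (a space by default).
my @lines  = ( "one\n", "two\n" );
my $joined = "@lines";              # "one\n two\n"

print $joined;
```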
1073
1074   Why don't my <<HERE documents work?
1075       Check for these three things:
1076
1077       There must be no space after the << part.
1078       There (probably) should be a semicolon at the end.
1079       You can't (easily) have any space in front of the tag.
1080
1081       If you want to indent the text in the here document, you can do this:
1082
1083           # all in one
1084           ($VAR = <<HERE_TARGET) =~ s/^\s+//gm;
1085               your text
1086               goes here
1087           HERE_TARGET
1088
1089       But the HERE_TARGET must still be flush against the margin.  If you
1090       want that indented also, you'll have to quote in the indentation.
1091
1092           ($quote = <<'    FINIS') =~ s/^\s+//gm;
1093                   ...we will have peace, when you and all your works have
1094                   perished--and the works of your dark master to whom you
1095                   would deliver us. You are a liar, Saruman, and a corrupter
1096                   of men's hearts.  --Theoden in /usr/src/perl/taint.c
1097               FINIS
1098           $quote =~ s/\s+--/\n--/;
1099
1100       A nice general-purpose fixer-upper function for indented here documents
1101       follows.  It expects to be called with a here document as its argument.
1102       It looks to see whether each line begins with a common substring, and
1103       if so, strips that substring off.  Otherwise, it takes the amount of
1104       leading whitespace found on the first line and removes that much off
1105       each subsequent line.
1106
1107           sub fix {
1108               local $_ = shift;
1109               my ($white, $leader);  # common whitespace and common leading string
1110               if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
1111                   ($white, $leader) = ($2, quotemeta($1));
1112               } else {
1113                   ($white, $leader) = (/^(\s+)/, '');
1114               }
1115               s/^\s*?$leader(?:$white)?//gm;
1116               return $_;
1117           }
1118
1119       This works with leading special strings, dynamically determined:
1120
1121               $remember_the_main = fix<<'    MAIN_INTERPRETER_LOOP';
1122               @@@ int
1123               @@@ runops() {
1124               @@@     SAVEI32(runlevel);
1125               @@@     runlevel++;
1126               @@@     while ( op = (*op->op_ppaddr)() );
1127               @@@     TAINT_NOT;
1128               @@@     return 0;
1129               @@@ }
1130               MAIN_INTERPRETER_LOOP
1131
1132       Or with a fixed amount of leading whitespace, with remaining
1133       indentation correctly preserved:
1134
1135               $poem = fix<<EVER_ON_AND_ON;
1136              Now far ahead the Road has gone,
1137                 And I must follow, if I can,
1138              Pursuing it with eager feet,
1139                 Until it joins some larger way
1140              Where many paths and errands meet.
1141                 And whither then? I cannot say.
1142                       --Bilbo in /usr/src/perl/pp_ctl.c
1143               EVER_ON_AND_ON
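       On Perl 5.26 and later, the "<<~" form of the here document handles the indentation for you. It removes the whitespace that precedes the closing delimiter from every line of the body, so both the body and the terminator can be indented. Here is a minimal sketch, assuming a perl of at least that version:

```perl
use strict;
use warnings;
use v5.26;    # the <<~ form needs Perl 5.26 or later

# The closing EOT is indented four spaces, so four spaces are removed
# from the start of every body line; extra indentation is kept.
my $text = <<~EOT;
    your text
      goes here
    EOT

print $text;
```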
1144

Data: Arrays

1146   What is the difference between a list and an array?
1147       An array has a changeable length.  A list does not.  An array is
1148       something you can push or pop, while a list is a set of values.  Some
1149       people make the distinction that a list is a value while an array is a
1150       variable. Subroutines are passed and return lists, you put things into
1151       list context, you initialize arrays with lists, and you "foreach()"
1152       across a list.  "@" variables are arrays, anonymous arrays are arrays,
1153       arrays in scalar context behave like the number of elements in them,
1154       subroutines access their arguments through the array @_, and
1155       "push"/"pop"/"shift" only work on arrays.
1156
1157       As a side note, there's no such thing as a list in scalar context.
1158       When you say
1159
1160               $scalar = (2, 5, 7, 9);
1161
1162       you're using the comma operator in scalar context, so it uses the
1163       scalar comma operator.  There never was a list there at all! This
1164       causes the last value to be returned: 9.
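       The following script shows three cases side by side: an array in scalar context, the scalar comma operator, and the "() =" trick, which evaluates the values in list context and then counts them. The inner block turns off the "Useless use of a constant" warnings that the comma expression would otherwise trigger.

```perl
use strict;
use warnings;

my @array = ( 2, 5, 7, 9 );

my $count = @array;        # 4: an array in scalar context gives its size

my $last;
{
    no warnings 'void';    # the discarded 2, 5, and 7 would otherwise warn
    $last = ( 2, 5, 7, 9 );    # 9: the scalar comma operator keeps the last value
}

my $n = () = ( 2, 5, 7, 9 );   # 4: a list assignment in scalar context counts

print "$count $last $n\n";
```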
1165
1166   What is the difference between $array[1] and @array[1]?
1167       The former is a scalar value; the latter an array slice, making it a
1168       list with one (scalar) value.  You should use $ when you want a scalar
1169       value (most of the time) and @ when you want a list with one scalar
1170       value in it (very, very rarely; nearly never, in fact).
1171
1172       Sometimes it doesn't make a difference, but sometimes it does.  For
1173       example, compare:
1174
1175               $good[0] = `some program that outputs several lines`;
1176
1177       with
1178
1179               @bad[0]  = `same program that outputs several lines`;
1180
1181       The "use warnings" pragma and the -w flag will warn you about these
1182       matters.
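       The difference matters whenever the right-hand side cares about context, as backticks do. Backticks need an external command, so this sketch uses a hypothetical fake_command() that imitates them with wantarray. In scalar context it returns one string; in list context it returns a list of lines.

```perl
use strict;
use warnings;

# Hypothetical stand-in for `command`: all output as one string in
# scalar context, a list of lines in list context.
sub fake_command {
    my @lines = ( "first\n", "second\n" );
    return wantarray ? @lines : join '', @lines;
}

my ( @good, @bad );
$good[0] = fake_command();      # scalar context: "first\nsecond\n"
{
    no warnings 'syntax';       # "Scalar value @bad[0] better written as $bad[0]"
    @bad[0] = fake_command();   # list context: only "first\n" is kept
}

print "good: $good[0]bad:  $bad[0]";
```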
1183
1184   How can I remove duplicate elements from a list or array?
1185       (contributed by brian d foy)
1186
1187       Use a hash. When you think the words "unique" or "duplicated", think
1188       "hash keys".
1189
1190       If you don't care about the order of the elements, you could just
1191       create the hash then extract the keys. It's not important how you
1192       create that hash: just that you use "keys" to get the unique elements.
1193
1194               my %hash   = map { $_, 1 } @array;
1195               # or a hash slice: @hash{ @array } = ();
1196               # or a foreach: $hash{$_} = 1 foreach ( @array );
1197
1198               my @unique = keys %hash;
1199
1200       If you want to use a module, try the "uniq" function from
1201       "List::MoreUtils". In list context it returns the unique elements,
1202       preserving their order in the list. In scalar context, it returns the
1203       number of unique elements.
1204
1205               use List::MoreUtils qw(uniq);
1206
1207               my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
1208               my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7
1209
1210       You can also go through each element and skip the ones you've seen
1211       before. Use a hash to keep track. The first time the loop sees an
1212       element, that element has no key in %seen. The post-increment in
1213       "$seen{ $elem }++" returns the key's old value, which is "undef", so
1214       "next" doesn't skip anything and the loop continues to the "push"; the
1215       increment has already set the value for that key to 1. The next time
1216       the loop sees that same element, the old value is true (since it's not
1217       0 or "undef"), so "next" skips that iteration and the loop goes on.
1218
1219               my @unique = ();
1220               my %seen   = ();
1221
1222               foreach my $elem ( @array )
1223                       {
1224                       next if $seen{ $elem }++;
1225                       push @unique, $elem;
1226                       }
1227
1228       You can write this more briefly using a grep, which does the same
1229       thing.
1230
1231               my %seen = ();
1232               my @unique = grep { ! $seen{ $_ }++ } @array;
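       If you want the order-preserving "grep" version as a reusable helper, here is a minimal sketch (uniq_ordered is a hypothetical name). Because it relies on hash keys, elements are compared as strings. That means 1 and "1.0" count as different elements, and an undef element is treated like the empty string (with a warning).

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first occurrence of each element, in order.
sub uniq_ordered {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @unique = uniq_ordered( 3, 1, 3, 2, 1 );    # (3, 1, 2)
print "@unique\n";
```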
1233
1234   How can I tell whether a certain element is contained in a list or array?
1235       (portions of this answer contributed by Anno Siegel and brian d foy)
1236
1237       Hearing the word "in" is an indication that you probably should have
1238       used a hash, not a list or array, to store your data.  Hashes are
1239       designed to answer this question quickly and efficiently.  Arrays
1240       aren't.
1241
1242       That being said, there are several ways to approach this.  In Perl 5.10
1243       and later, you can use the smart match operator to check that an item
1244       is contained in an array or a hash (although smart match has been
           experimental since Perl 5.18 and is deprecated as of Perl 5.38):
1245
1246               use 5.010;
1247
1248               if( $item ~~ @array )
1249                       {
1250                       say "The array contains $item"
1251                       }
1252
1253               if( $item ~~ %hash )
1254                       {
1255                       say "The hash contains $item"
1256                       }
1257
1258       With earlier versions of Perl, you have to do a bit more work. If you
1259       are going to make this query many times over arbitrary string values,
1260       the fastest way is probably to invert the original array and maintain a
1261       hash whose keys are the first array's values:
1262
1263               @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
1264               %is_blue = ();
1265               for (@blues) { $is_blue{$_} = 1 }
1266
1267       Now you can check whether $is_blue{$some_color}.  It might have been a
1268       good idea to keep the blues all in a hash in the first place.
1269
1270       If the values are all small integers, you could use a simple indexed
1271       array.  This kind of an array will take up less space:
1272
1273               @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
1274               @is_tiny_prime = ();
1275               for (@primes) { $is_tiny_prime[$_] = 1 }
1276               # or simply  @is_tiny_prime[@primes] = (1) x @primes;
1277
1278       Now you check whether $is_tiny_prime[$some_number].
1279
1280       If the values in question are integers instead of strings, you can save
1281       quite a lot of space by using bit strings instead:
1282
1283               @articles = ( 1..10, 150..2000, 2017 );
1284               undef $read;
1285               for (@articles) { vec($read,$_,1) = 1 }
1286
1287       Now check whether "vec($read,$n,1)" is true for some $n.
1288
1289       These methods guarantee fast individual tests but require a re-
1290       organization of the original list or array.  They only pay off if you
1291       have to test multiple values against the same array.
1292
1293       If you are testing only once, the standard module "List::Util" exports
1294       the function "first" for this purpose.  It works by stopping once it
1295       finds the element. It's written in C for speed, and its Perl equivalent
1296       looks like this subroutine:
1297
1298               sub first (&@) {
1299                       my $code = shift;
1300                       foreach (@_) {
1301                               return $_ if &{$code}();
1302                       }
1303                       undef;
1304               }
1305
1306       If speed is of little concern, the common idiom uses grep in scalar
1307       context (which returns the number of items that passed its condition)
1308       to traverse the entire list. This does have the benefit of telling you
1309       how many matches it found, though.
1310
1311               my $is_there = grep $_ eq $whatever, @array;
1312
1313       If you want to actually extract the matching elements, simply use grep
1314       in list context.
1315
1316               my @matches = grep $_ eq $whatever, @array;
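       Here are the hash lookup and "first" together, using the @blues data from above. One caveat with "first": it returns the matching element itself. Test the result with defined() rather than for truth; otherwise a match on a false value such as 0 looks like no match at all.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @blues   = qw(azure cerulean teal turquoise lapis-lazuli);
my %is_blue = map { $_ => 1 } @blues;    # build once, look up many times

my $teal = $is_blue{teal} ? 'yes' : 'no';    # 'yes'
my $red  = $is_blue{red}  ? 'yes' : 'no';    # 'no'

# first() stops at the first match and returns that element (or undef).
my @numbers  = ( 3, 0, 5 );
my $zero     = first { $_ == 0 } @numbers;   # 0, which is false
my $has_zero = defined $zero ? 1 : 0;        # 1: test with defined()

print "$teal $red $has_zero\n";
```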
1317
1318   How do I compute the difference of two arrays?  How do I compute the
1319       intersection of two arrays?
1320       Use a hash.  Here's code to do both and more.  It assumes that each
1321       element is unique in a given array:
1322
1323               @union = @intersection = @difference = ();
1324               %count = ();
1325               foreach $element (@array1, @array2) { $count{$element}++ }
1326               foreach $element (keys %count) {
1327                       push @union, $element;
1328                       push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
1329                       }
1330
1331       Note that this is the symmetric difference, that is, all elements in
1332       either A or in B but not in both.  Think of it as an xor operation.
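       If you need the one-directional difference instead (elements of @array1 that are not in @array2), a hash built from the second array alone is enough. Unlike the "keys %count" version above, this sketch also keeps the elements in @array1's order. The array contents are made up for the example.

```perl
use strict;
use warnings;

my @array1 = qw(apple banana cherry date);
my @array2 = qw(cherry date elderberry);

# Membership hash for the second array only.
my %in_array2 = map { $_ => 1 } @array2;

my @only_in_1    = grep { !$in_array2{$_} } @array1;    # (apple, banana)
my @intersection = grep {  $in_array2{$_} } @array1;    # (cherry, date)

print "@only_in_1 | @intersection\n";
```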
1333
1334   How do I test whether two arrays or hashes are equal?
1335       With Perl 5.10 and later, the smart match operator can give you the
1336       answer with the least amount of work (smart match is experimental as
           of Perl 5.18 and deprecated in 5.38, so treat this as legacy code):
1337
1338               use 5.010;
1339
1340               if( @array1 ~~ @array2 )
1341                       {
1342                       say "The arrays are the same";
1343                       }
1344
1345               if( %hash1 ~~ %hash2 ) # doesn't check values!
1346                       {
1347                       say "The hash keys are the same";
1348                       }
1349
1350       The following code works for single-level arrays.  It uses a stringwise
1351       comparison, and does not distinguish defined versus undefined empty
1352       strings.  Modify if you have other needs.
1353
1354               $are_equal = compare_arrays(\@frogs, \@toads);
1355
1356               sub compare_arrays {
1357                       my ($first, $second) = @_;
1358                       no warnings;  # silence spurious -w undef complaints
1359                       return 0 unless @$first == @$second;
1360                       for (my $i = 0; $i < @$first; $i++) {
1361                               return 0 if $first->[$i] ne $second->[$i];
1362                               }
1363                       return 1;
1364                       }
1365
1366       For multilevel structures, you may wish to use an approach more like
1367       this one.  It uses the CPAN module "FreezeThaw":
1368
1369               use FreezeThaw qw(cmpStr);
1370               @a = @b = ( "this", "that", [ "more", "stuff" ] );
1371
1372               printf "a and b contain %s arrays\n",
1373                       cmpStr(\@a, \@b) == 0
1374                       ? "the same"
1375                       : "different";
1376
1377       This approach also works for comparing hashes.  Here we'll demonstrate
1378       two different answers:
1379
1380               use FreezeThaw qw(cmpStr cmpStrHard);
1381
1382               %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
1383               $a{EXTRA} = \%b;
1384               $b{EXTRA} = \%a;
1385
1386               printf "a and b contain %s hashes\n",
1387               cmpStr(\%a, \%b) == 0 ? "the same" : "different";
1388
1389               printf "a and b contain %s hashes\n",
1390               cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
1391
1392       The first reports that both hashes contain the same data,
1393       while the second reports that they do not.  Which you prefer is left as
1394       an exercise to the reader.
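       To compare two single-level hashes, values included, without a CPAN module, here is a minimal sketch. The name hashes_equal is hypothetical. Like compare_arrays above, it compares values stringwise. Unlike compare_arrays, it treats undef and the empty string as different.

```perl
use strict;
use warnings;

# Hypothetical helper: true if two single-level hashes have the same keys
# and stringwise-equal values; undef is not equal to ''.
sub hashes_equal {
    my ( $x, $y ) = @_;
    return 0 unless scalar( keys %$x ) == scalar( keys %$y );
    for my $key ( keys %$x ) {
        return 0 unless exists $y->{$key};
        my ( $v1, $v2 ) = ( $x->{$key}, $y->{$key} );
        return 0 if defined($v1) != defined($v2);    # one undef, one not
        return 0 if defined($v1) && $v1 ne $v2;
    }
    return 1;
}

print hashes_equal( { a => 1, b => 2 }, { b => 2, a => 1 } ) ? "same\n" : "different\n";
```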
1395
1396   How do I find the first array element for which a condition is true?
1397       To find the first array element which satisfies a condition, you can
1398       use the "first()" function in the "List::Util" module, which comes with
1399       Perl 5.8. This example finds the first element that contains "Perl".
1400
1401               use List::Util qw(first);
1402
1403               my $element = first { /Perl/ } @array;
1404
1405       If you cannot use "List::Util", you can make your own loop to do the
1406       same thing.  Once you find the element, you stop the loop with last.
1407
1408               my $found;
1409               foreach ( @array ) {
1410                       if( /Perl/ ) { $found = $_; last }
1411                       }
1412
1413       If you want the array index, you can iterate through the indices and
1414       check the array element at each index until you find one that satisfies
1415       the condition.
1416
1417               my( $found, $index ) = ( undef, -1 );
1418               for( my $i = 0; $i < @array; $i++ ) {
1419                       if( $array[$i] =~ /Perl/ ) {
1420                               $found = $array[$i];
1421                               $index = $i;
1422                               last;
1423                               }
1424                       }
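
       If "List::Util" is available, you can also get the index without a
       C-style loop by running "first()" over the indices instead of the
       elements.  This is a minimal sketch; the contents of @array are made
       up for the example:

```perl
               use strict;
               use warnings;
               use List::Util qw(first);

               my @array = ( 'awk', 'sed', 'perl', 'python' );

               # first() returns the first index for which the block is
               # true, or undef if no element matches
               my $index = first { $array[$_] =~ /perl/ } 0 .. $#array;

               if ( defined $index ) {
                       print "found '$array[$index]' at index $index\n";
                       }
```

       Test the result with "defined" rather than for truth, because a match
       at index 0 returns 0.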
1425
1426   How do I handle linked lists?
       In general, you don't need a linked list in Perl, since with regular
       arrays you can push and pop or shift and unshift at either end, or you
       can use splice to add and/or remove an arbitrary number of elements at
       arbitrary points.  Both pop and shift are O(1) operations on Perl's
       dynamic arrays.  In the absence of shifts and pops, push in general
       only needs to reallocate on the order of log(N) times for N pushes,
       and unshift will need to copy pointers each time.
1434
       If you really, really want to, you could use structures as described in
1436       perldsc or perltoot and do just what the algorithm book tells you to
1437       do.  For example, imagine a list node like this:
1438
1439               $node = {
1440                       VALUE => 42,
1441                       LINK  => undef,
1442                       };
1443
1444       You could walk the list this way:
1445
1446               print "List: ";
1447               for ($node = $head;  $node; $node = $node->{LINK}) {
1448                       print $node->{VALUE}, " ";
1449                       }
1450               print "\n";
1451
1452       You could add to the list this way:
1453
1454               my ($head, $tail);
1455               $tail = append($head, 1);       # grow a new head
               for my $value ( 2 .. 10 ) {
1457                       $tail = append($tail, $value);
1458                       }
1459
1460               sub append {
1461                       my($list, $value) = @_;
1462                       my $node = { VALUE => $value };
1463                       if ($list) {
1464                               $node->{LINK} = $list->{LINK};
1465                               $list->{LINK} = $node;
1466                               }
1467                       else {
1468                               $_[0] = $node;      # replace caller's version
1469                               }
1470                       return $node;
1471                       }
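
       The following self-contained sketch puts those pieces together. It
       builds a ten-element list with the append() above and then walks it.
       It collects the values into @values, a name chosen for this example,
       instead of printing them as it goes:

```perl
               use strict;
               use warnings;

               # insert a new node after $list; if $list is undef, make the
               # new node the head by assigning through the alias $_[0]
               sub append {
                       my ( $list, $value ) = @_;
                       my $node = { VALUE => $value };
                       if ($list) {
                               $node->{LINK} = $list->{LINK};
                               $list->{LINK} = $node;
                               }
                       else {
                               $_[0] = $node;      # replace caller's version
                               }
                       return $node;
                       }

               my $head;
               my $tail = append( $head, 1 );    # also sets $head
               for my $value ( 2 .. 10 ) {
                       $tail = append( $tail, $value );
                       }

               # walk the list from the head, following the LINK references
               my @values;
               for ( my $node = $head; $node; $node = $node->{LINK} ) {
                       push @values, $node->{VALUE};
                       }
               print "List: @values\n";
```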
1472
       But again, Perl's built-in arrays are virtually always good enough.
1474
1475   How do I handle circular lists?
1476       (contributed by brian d foy)
1477
       If you want to cycle through an array endlessly, you can increment the
1479       index modulo the number of elements in the array:
1480
1481               my @array = qw( a b c );
1482               my $i = 0;
1483
1484               while( 1 ) {
1485                       print $array[ $i++ % @array ], "\n";
1486                       last if $i > 20;
1487                       }
1488
1489       You can also use "Tie::Cycle" to use a scalar that always has the next
1490       element of the circular array:
1491
1492               use Tie::Cycle;
1493
1494               tie my $cycle, 'Tie::Cycle', [ qw( FFFFFF 000000 FFFF00 ) ];
1495
1496               print $cycle; # FFFFFF
1497               print $cycle; # 000000
1498               print $cycle; # FFFF00
1499
       The "Array::Iterator::Circular" module creates an iterator object for
       circular arrays:
1502
1503               use Array::Iterator::Circular;
1504
1505               my $color_iterator = Array::Iterator::Circular->new(
1506                       qw(red green blue orange)
1507                       );
1508
1509               foreach ( 1 .. 20 ) {
1510                       print $color_iterator->next, "\n";
1511                       }
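
       If you would rather not install a module, a closure can play the same
       role.  This is a minimal sketch.  make_cycle() is a hypothetical helper
       written for this example and is not part of either module above:

```perl
               use strict;
               use warnings;

               # return a code reference that hands out the elements of
               # @items one at a time, wrapping around to the start
               # (@items must not be empty)
               sub make_cycle {
                       my @items = @_;
                       my $i     = 0;
                       return sub {
                               my $item = $items[$i];
                               $i = ( $i + 1 ) % @items;
                               return $item;
                               };
                       }

               my $next_color = make_cycle(qw(red green blue));
               my @seen = map { $next_color->() } 1 .. 5;
               print "@seen\n";
```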
1512
1513   How do I shuffle an array randomly?
       If you have Perl 5.8.0 or later installed, or Scalar-List-Utils 1.03
       or later, you can say:
1516
1517               use List::Util 'shuffle';
1518
1519               @shuffled = shuffle(@list);
1520
1521       If not, you can use a Fisher-Yates shuffle.
1522
1523               sub fisher_yates_shuffle {
1524                       my $deck = shift;  # $deck is a reference to an array
1525                       return unless @$deck; # must not be empty!
1526
1527                       my $i = @$deck;
1528                       while (--$i) {
1529                               my $j = int rand ($i+1);
1530                               @$deck[$i,$j] = @$deck[$j,$i];
1531                               }
1532               }
1533
1534               # shuffle my mpeg collection
1535               #
1536               my @mpeg = <audio/*/*.mp3>;
1537               fisher_yates_shuffle( \@mpeg );    # randomize @mpeg in place
               print "$_\n" for @mpeg;
1539
1540       Note that the above implementation shuffles an array in place, unlike
1541       the "List::Util::shuffle()" which takes a list and returns a new
1542       shuffled list.
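
       The following sketch shows that difference side by side.  It uses only
       the core "List::Util" module and a copy of the fisher_yates_shuffle()
       above, and the sample data is arbitrary:

```perl
               use strict;
               use warnings;
               use List::Util qw(shuffle);

               sub fisher_yates_shuffle {
                       my $deck = shift;    # reference to the array to shuffle
                       return unless @$deck;
                       my $i = @$deck;
                       while ( --$i ) {
                               my $j = int rand( $i + 1 );
                               @$deck[ $i, $j ] = @$deck[ $j, $i ];
                               }
                       }

               my @list     = ( 1 .. 10 );
               my @shuffled = shuffle(@list);    # @list itself is untouched

               my @deck = ( 1 .. 10 );
               fisher_yates_shuffle( \@deck );   # @deck is rearranged in place

               print "shuffle():  @shuffled\n";
               print "in place:   @deck\n";
               print "original:   @list\n";
```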
1543
1544       You've probably seen shuffling algorithms that work using splice,
       randomly picking another element to swap the current element with:
1546
1547               srand;
1548               @new = ();
1549               @old = 1 .. 10;  # just a demo
1550               while (@old) {
1551                       push(@new, splice(@old, rand @old, 1));
1552                       }
1553
1554       This is bad because splice is already O(N), and since you do it N
1555       times, you just invented a quadratic algorithm; that is, O(N**2).  This
1556       does not scale, although Perl is so efficient that you probably won't
1557       notice this until you have rather largish arrays.
1558
1559   How do I process/modify each element of an array?
1560       Use "for"/"foreach":
1561
1562               for (@lines) {
1563                       s/foo/bar/;     # change that word
1564                       tr/XZ/ZX/;      # swap those letters
1565                       }
1566
1567       Here's another; let's compute spherical volumes:
1568
               for (@volumes = @radii) {   # changes @volumes, not @radii
1570                       $_ **= 3;
1571                       $_ *= (4/3) * 3.14159;  # this will be constant folded
1572                       }
1573
       which can also be done with "map()", which is designed to transform
       one list into another:
1576
1577               @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
1578
       If you want to do the same thing to modify the values of a hash, you
       can use the "values" function.  As of Perl 5.6 the values are not
       copied, so if you modify $orbit (in this case), you modify the value
       in the hash.

               for my $orbit ( values %orbits ) {
1584                       ($orbit **= 3) *= (4/3) * 3.14159;
1585                       }
1586
1587       Prior to perl 5.6 "values" returned copies of the values, so older perl
1588       code often contains constructions such as @orbits{keys %orbits} instead
1589       of "values %orbits" where the hash is to be modified.
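
       The following self-contained check shows that aliasing at work.  The
       %orbits and %doubled data are invented for the example:

```perl
               use strict;
               use warnings;

               my %orbits = ( mercury => 1, venus => 2 );

               # $orbit is an alias to each value, so this changes %orbits
               for my $orbit ( values %orbits ) {
                       ( $orbit **= 3 ) *= ( 4 / 3 ) * 3.14159;
                       }

               # the same idea with the implicit $_ alias
               my %doubled = ( a => 1, b => 2 );
               $_ *= 2 for values %doubled;

               printf "%s => %.3f\n", $_, $orbits{$_} for sort keys %orbits;
```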
1590
1591   How do I select a random element from an array?
1592       Use the "rand()" function (see "rand" in perlfunc):
1593
1594               $index   = rand @array;
1595               $element = $array[$index];
1596
1597       Or, simply:
1598
1599               my $element = $array[ rand @array ];
1600
1601   How do I permute N elements of a list?
1602       Use the "List::Permutor" module on CPAN. If the list is actually an
1603       array, try the "Algorithm::Permute" module (also on CPAN). It's written
1604       in XS code and is very efficient:
1605
1606               use Algorithm::Permute;
1607
1608               my @array = 'a'..'d';
1609               my $p_iterator = Algorithm::Permute->new ( \@array );
1610
1611               while (my @perm = $p_iterator->next) {
                       print "next permutation: (@perm)\n";
1613                       }
1614
1615       For even faster execution, you could do:
1616
1617               use Algorithm::Permute;
1618
1619               my @array = 'a'..'d';
1620
1621               Algorithm::Permute::permute {
1622                       print "next permutation: (@array)\n";
1623                       } @array;
1624
       Here's a little program that generates all permutations of all the
       words on each line of input.  The algorithm embodied in the "permute()"
       function is discussed in Volume 4A of Knuth's The Art of Computer
       Programming and will work on any list:
1629
1630               #!/usr/bin/perl -n
1631               # Fischer-Krause ordered permutation generator
1632
1633               sub permute (&@) {
1634                       my $code = shift;
1635                       my @idx = 0..$#_;
1636                       while ( $code->(@_[@idx]) ) {
1637                               my $p = $#idx;
1638                               --$p while $idx[$p-1] > $idx[$p];
1639                               my $q = $p or return;
1640                               push @idx, reverse splice @idx, $p;
1641                               ++$q while $idx[$p-1] > $idx[$q];
1642                               @idx[$p-1,$q]=@idx[$q,$p-1];
1643                       }
1644               }
1645
1646               permute { print "@_\n" } split;
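
       The -n switch wraps that program in a loop over standard input, which
       makes it awkward to try on its own.  Here is the same permute() in a
       self-contained script that collects the permutations of a fixed list
       instead.  The sub is declared before it is called, so its (&@)
       prototype lets you pass a bare block:

```perl
               use strict;
               use warnings;

               # Fischer-Krause ordered permutation generator (as above)
               sub permute (&@) {
                       my $code = shift;
                       my @idx = 0..$#_;
                       while ( $code->(@_[@idx]) ) {
                               my $p = $#idx;
                               --$p while $idx[$p-1] > $idx[$p];
                               my $q = $p or return;
                               push @idx, reverse splice @idx, $p;
                               ++$q while $idx[$p-1] > $idx[$q];
                               @idx[$p-1,$q]=@idx[$q,$p-1];
                               }
                       }

               # push returns the new element count, which is always true,
               # so permute() keeps going until it runs out of permutations
               my @perms;
               permute { push @perms, "@_" } qw(a b c);
               print "$_\n" for @perms;
```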
1647
       The "Algorithm::Loops" module also provides the "NextPermute" and
       "NextPermuteNum" functions, which efficiently find all unique
       permutations of an array, even if it contains duplicate values.  They
       modify the array in place: if its elements are in reverse-sorted
       order, the array is reversed, making it sorted, and the function
       returns false; otherwise the array is rearranged into the next
       permutation and the function returns true.
1654
1655       "NextPermute" uses string order and "NextPermuteNum" numeric order, so
1656       you can enumerate all the permutations of 0..9 like this:
1657
1658               use Algorithm::Loops qw(NextPermuteNum);
1659
               my @list = 0..9;
               do { print "@list\n" } while NextPermuteNum @list;
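
       Here is a rough pure-Perl sketch of a function with that in-place
       behavior for numeric order, to show what the contract means.  It is
       not the module's implementation: next_permute_num() is a hypothetical
       name, and it takes an array reference rather than an array.  It uses
       the standard "next lexicographic permutation" algorithm, which skips
       duplicate arrangements on its own:

```perl
               use strict;
               use warnings;

               # hypothetical helper: rearrange @$aref into the next
               # permutation in numeric order and return true, or, if it
               # is already in descending order, sort it and return false
               sub next_permute_num {
                       my $aref = shift;
                       my $i = $#$aref - 1;
                       # rightmost position smaller than its successor
                       --$i while $i >= 0 && $aref->[$i] >= $aref->[$i+1];
                       if ( $i < 0 ) {
                               @$aref = reverse @$aref;   # descending -> ascending
                               return 0;
                               }
                       # rightmost element larger than $aref->[$i]
                       my $j = $#$aref;
                       --$j while $aref->[$j] <= $aref->[$i];
                       @$aref[ $i, $j ] = @$aref[ $j, $i ];
                       # the tail after $i is descending; make it ascending
                       @$aref[ $i+1 .. $#$aref ] = reverse @$aref[ $i+1 .. $#$aref ];
                       return 1;
                       }

               my @list = ( 1, 1, 2 );
               my @seen;
               do { push @seen, "@list" } while next_permute_num( \@list );
               print "$_\n" for @seen;
```

       With the duplicate 1 in the input, only three distinct arrangements
       are produced, and @list ends up back in sorted order.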
1662
1663   How do I sort an array by (anything)?
1664       Supply a comparison function to sort() (described in "sort" in
1665       perlfunc):
1666
1667               @list = sort { $a <=> $b } @list;
1668
1669       The default sort function is cmp, string comparison, which would sort
1670       "(1, 2, 10)" into "(1, 10, 2)".  "<=>", used above, is the numerical
1671       comparison operator.
1672
1673       If you have a complicated function needed to pull out the part you want
1674       to sort on, then don't do it inside the sort function.  Pull it out
1675       first, because the sort BLOCK can be called many times for the same
1676       element.  Here's an example of how to pull out the first word after the
1677       first number on each item, and then sort those words case-
1678       insensitively.
1679
1680               @idx = ();
1681               for (@data) {
1682                       ($item) = /\d+\s*(\S+)/;
1683                       push @idx, uc($item);
                       }
1685               @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
1686
1687       which could also be written this way, using a trick that's come to be
1688       known as the Schwartzian Transform:
1689
1690               @sorted = map  { $_->[0] }
1691                       sort { $a->[1] cmp $b->[1] }
1692                       map  { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
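
       To see the transform at work, here it is applied to a few invented
       records, each a number followed by a word.  The inner "map" caches
       the uppercased word next to the original string, "sort" compares only
       the cached keys, and the outer "map" strips the keys off again:

```perl
               use strict;
               use warnings;

               my @data = ( '3 pears', '1 Apple', '2 banana', '10 cherry' );

               # build [ original, key ] pairs, sort on the key, keep the original
               my @sorted = map  { $_->[0] }
                       sort { $a->[1] cmp $b->[1] }
                       map  { [ $_, uc( (/\d+\s*(\S+)/)[0] ) ] } @data;

               print "$_\n" for @sorted;
```

       The keys are compared as strings, so "10 cherry" lands between
       "2 banana" and "3 pears".  The word decides the order, not the
       number.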
1693
1694       If you need to sort on several fields, the following paradigm is
1695       useful.
1696
1697               @sorted = sort {
1698                       field1($a) <=> field1($b) ||
1699                       field2($a) cmp field2($b) ||
1700                       field3($a) cmp field3($b)
1701                       } @data;
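
       In that paradigm, field1() and the others stand for whatever code
       extracts each key.  The following made-up example stores records as
       hash references.  It sorts them by age numerically and then by name,
       case-insensitively, as a tie-breaker.  Because "<=>" returns 0 for
       equal ages, the "||" falls through to the next comparison:

```perl
               use strict;
               use warnings;

               my @people = (
                       { name => 'Carol', age => 30 },
                       { name => 'alice', age => 25 },
                       { name => 'Bob',   age => 30 },
                       );

               my @sorted = sort {
                       $a->{age} <=> $b->{age}
                               ||
                       lc( $a->{name} ) cmp lc( $b->{name} )
                       } @people;

               print "$_->{age} $_->{name}\n" for @sorted;
```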
1702
1703       This can be conveniently combined with precalculation of keys as given
1704       above.
1705
1706       See the sort article in the "Far More Than You Ever Wanted To Know"
1707       collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz for more
1708       about this approach.
1709
1710       See also the question later in perlfaq4 on sorting hashes.
1711
1712   How do I manipulate arrays of bits?
1713       Use "pack()" and "unpack()", or else "vec()" and the bitwise
1714       operations.
1715
1716       For example, you don't have to store individual bits in an array (which
1717       would mean that you're wasting a lot of space). To convert an array of
1718       bits to a string, use "vec()" to set the right bits. This sets $vec to
1719       have bit N set only if $ints[N] was set:
1720
               @ints = ( 1, 0, 0, 1, 1, 0 ); # array of bits, as 0/1 flags
1722               $vec = '';
1723               foreach( 0 .. $#ints ) {
1724                       vec($vec,$_,1) = 1 if $ints[$_];
1725                       }
1726
1727       The string $vec only takes up as many bits as it needs. For instance,
1728       if you had 16 entries in @ints, $vec only needs two bytes to store them
1729       (not counting the scalar variable overhead).
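
       Here is a runnable version of that loop with sixteen concrete flags,
       made up for the example.  It prints how many bytes $vec uses and reads
       each flag back out with "vec()".  The last flag is set on purpose,
       because "vec()" only grows the string as far as the highest bit set:

```perl
               use strict;
               use warnings;

               # sixteen 0/1 flags; bit 15 is set, so two bytes are needed
               my @ints = ( 1, 0, 0, 1, 1, 0, 0, 0,  0, 1, 0, 0, 0, 0, 0, 1 );

               # set bit N of $vec only when $ints[N] is true
               my $vec = '';
               foreach ( 0 .. $#ints ) {
                       vec( $vec, $_, 1 ) = 1 if $ints[$_];
                       }

               print "bytes used: ", length($vec), "\n";

               # read the flags back out, one bit at a time
               my @back = map { vec( $vec, $_, 1 ) } 0 .. $#ints;
               print "@back\n";
```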
1730
       Here's how, given a vector in $vec, you can get the positions of its
       set bits (each N for which bit N is set) into your @ints array:
1733
1734               sub bitvec_to_list {
1735                       my $vec = shift;
1736                       my @ints;
1737                       # Find null-byte density then select best algorithm
1738                       if ($vec =~ tr/\0// / length $vec > 0.95) {
1739                               use integer;
1740                               my $i;
1741
1742                               # This method is faster with mostly null-bytes
1743                               while($vec =~ /[^\0]/g ) {
1744                                       $i = -9 + 8 * pos $vec;
1745                                       push @ints, $i if vec($vec, ++$i, 1);
1746                                       push @ints, $i if vec($vec, ++$i, 1);
1747                                       push @ints, $i if vec($vec, ++$i, 1);
1748                                       push @ints, $i if vec($vec, ++$i, 1);
1749                                       push @ints, $i if vec($vec, ++$i, 1);
1750                                       push @ints, $i if vec($vec, ++$i, 1);
1751                                       push @ints, $i if vec($vec, ++$i, 1);
1752                                       push @ints, $i if vec($vec, ++$i, 1);
1753                                       }
1754                               }
1755                       else {
1756                               # This method is a fast general algorithm
1757                               use integer;
1758                               my $bits = unpack "b*", $vec;
1759                               push @ints, 0 if $bits =~ s/^(\d)// && $1;
1760                               push @ints, pos $bits while($bits =~ /1/g);
1761                               }
1762
1763                       return \@ints;
1764                       }
1765
1766       This method gets faster the more sparse the bit vector is.  (Courtesy
1767       of Tim Bunce and Winfried Koenig.)
1768
1769       You can make the while loop a lot shorter with this suggestion from
1770       Benjamin Goldberg:
1771
1772               while($vec =~ /[^\0]+/g ) {
1773                       push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
1774                       }
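
       Wrapped in a complete function, that loop returns the positions of the
       set bits.  In this sketch, set_bit_positions() is a made-up name.  The
       range also stops at the last bit of each run ("$+[0] * 8 - 1") instead
       of one bit past it, so it never probes a byte outside the run:

```perl
               use strict;
               use warnings;

               # positions of the set bits in $vec, scanning only the runs
               # of non-null bytes
               sub set_bit_positions {
                       my $vec = shift;
                       my @ints;
                       while ( $vec =~ /[^\0]+/g ) {
                               # $-[0] and $+[0] are the byte offsets of this run
                               push @ints, grep { vec( $vec, $_, 1 ) }
                                       $-[0] * 8 .. $+[0] * 8 - 1;
                               }
                       return \@ints;
                       }

               my $vec = '';
               vec( $vec, $_, 1 ) = 1 for 0, 3, 4, 9, 40;

               my $positions = set_bit_positions($vec);
               print "@$positions\n";
```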
1775
1776       Or use the CPAN module "Bit::Vector":
1777
1778               $vector = Bit::Vector->new($num_of_bits);
1779               $vector->Index_List_Store(@ints);
1780               @ints = $vector->Index_List_Read();
1781
       "Bit::Vector" provides efficient methods for bit vectors, sets of small
       integers and "big int" math.
1784
1785       Here's a more extensive illustration using vec():
1786
1787               # vec demo
1788               $vector = "\xff\x0f\xef\xfe";
1789               print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
1790               unpack("N", $vector), "\n";
1791               $is_set = vec($vector, 23, 1);
1792               print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
1793               pvec($vector);
1794
1795               set_vec(1,1,1);
1796               set_vec(3,1,1);
1797               set_vec(23,1,1);
1798
1799               set_vec(3,1,3);
1800               set_vec(3,2,3);
1801               set_vec(3,4,3);
1802               set_vec(3,4,7);
1803               set_vec(3,8,3);
1804               set_vec(3,8,7);
1805
1806               set_vec(0,32,17);
1807               set_vec(1,32,17);
1808
1809               sub set_vec {
1810                       my ($offset, $width, $value) = @_;
1811                       my $vector = '';
1812                       vec($vector, $offset, $width) = $value;
1813                       print "offset=$offset width=$width value=$value\n";
1814                       pvec($vector);
1815                       }
1816
               sub pvec {
                       my $vector = shift;
                       my $bits = unpack("b*", $vector);

                       print "vector length in bytes: ", length($vector), "\n";
                       # split the bit string into one 8-character group per byte
                       my @bytes = unpack("A8" x length($vector), $bits);
                       print "bits are: @bytes\n\n";
                       }
1827
1828   Why does defined() return true on empty arrays and hashes?
1829       The short story is that you should probably only use defined on scalars
1830       or functions, not on aggregates (arrays and hashes).  See "defined" in
1831       perlfunc in the 5.004 release or later of Perl for more detail.
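
       To find out whether an array or hash has any elements at all, test it
       directly in boolean context instead.  An aggregate is true when it is
       non-empty.  Since Perl 5.22, "defined(@array)" and "defined(%hash)"
       are fatal errors rather than merely discouraged.  Here is a small
       sketch with made-up variables:

```perl
               use strict;
               use warnings;

               my @empty_array;
               my %empty_hash;
               my @array = ( 0 );            # one element, even though it is false
               my %hash  = ( key => undef ); # one key, even though its value is undef

               # an array or hash is true in boolean context when it has elements
               print "\@empty_array is empty\n" unless @empty_array;
               print "%empty_hash is empty\n"   unless %empty_hash;
               print "\@array has ", scalar(@array), " element(s)\n" if @array;
               print "%hash has ", scalar( keys %hash ), " key(s)\n" if %hash;
```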
1832

Data: Hashes (Associative Arrays)

1834   How do I process an entire hash?
1835       (contributed by brian d foy)
1836
       There are a couple of ways that you can process an entire hash.  You
       can get a list of keys, then go through each key, or grab one
       key-value pair at a time.
1840
1841       To go through all of the keys, use the "keys" function. This extracts
1842       all of the keys of the hash and gives them back to you as a list. You
1843       can then get the value through the particular key you're processing:
1844
1845               foreach my $key ( keys %hash ) {
                       my $value = $hash{$key};
1847                       ...
1848                       }
1849
1850       Once you have the list of keys, you can process that list before you
1851       process the hash elements. For instance, you can sort the keys so you
1852       can process them in lexical order:
1853
1854               foreach my $key ( sort keys %hash ) {
                       my $value = $hash{$key};
1856                       ...
1857                       }
1858
1859       Or, you might want to only process some of the items. If you only want
1860       to deal with the keys that start with "text:", you can select just
1861       those using "grep":
1862
1863               foreach my $key ( grep /^text:/, keys %hash ) {
                       my $value = $hash{$key};
1865                       ...
1866                       }
1867
1868       If the hash is very large, you might not want to create a long list of
1869       keys. To save some memory, you can grab one key-value pair at a time
1870       using "each()", which returns a pair you haven't seen yet:
1871
1872               while( my( $key, $value ) = each( %hash ) ) {
1873                       ...
1874                       }
1875
1876       The "each" operator returns the pairs in apparently random order, so if
1877       ordering matters to you, you'll have to stick with the "keys" method.
1878
1879       The "each()" operator can be a bit tricky though. You can't add or
1880       delete keys of the hash while you're using it without possibly skipping
1881       or re-processing some pairs after Perl internally rehashes all of the
1882       elements. Additionally, a hash has only one iterator, so if you use
1883       "keys", "values", or "each" on the same hash, you can reset the
1884       iterator and mess up your processing. See the "each" entry in perlfunc
1885       for more details.
1886
1887   How do I merge two hashes?
1888       (contributed by brian d foy)
1889
1890       Before you decide to merge two hashes, you have to decide what to do if
1891       both hashes contain keys that are the same and if you want to leave the
1892       original hashes as they were.
1893
       If you want to preserve the original hashes, copy one hash (%hash1) to
       a new hash (%new_hash), then add the keys from the other hash (%hash2)
       to the new hash.  Checking whether the key already exists in %new_hash
       gives you a chance to decide what to do with the duplicates:
1898
1899               my %new_hash = %hash1; # make a copy; leave %hash1 alone
1900
1901               foreach my $key2 ( keys %hash2 )
1902                       {
1903                       if( exists $new_hash{$key2} )
1904                               {
1905                               warn "Key [$key2] is in both hashes!";
1906                               # handle the duplicate (perhaps only warning)
1907                               ...
1908                               next;
1909                               }
1910                       else
1911                               {
1912                               $new_hash{$key2} = $hash2{$key2};
1913                               }
1914                       }
1915
1916       If you don't want to create a new hash, you can still use this looping
1917       technique; just change the %new_hash to %hash1.
1918
1919               foreach my $key2 ( keys %hash2 )
1920                       {
1921                       if( exists $hash1{$key2} )
1922                               {
1923                               warn "Key [$key2] is in both hashes!";
1924                               # handle the duplicate (perhaps only warning)
1925                               ...
1926                               next;
1927                               }
1928                       else
1929                               {
1930                               $hash1{$key2} = $hash2{$key2};
1931                               }
1932                       }
1933
1934       If you don't care that one hash overwrites keys and values from the
1935       other, you could just use a hash slice to add one hash to another. In
1936       this case, values from %hash2 replace values from %hash1 when they have
1937       keys in common:
1938
1939               @hash1{ keys %hash2 } = values %hash2;
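
       You may want that same "last one wins" behavior while also leaving
       %hash1 alone.  In that case you can build a new hash from both lists
       of pairs.  When the same key appears twice, the later pair wins.  The
       sample hashes are invented for this sketch:

```perl
               use strict;
               use warnings;

               my %hash1 = ( color => 'red',  size  => 'small' );
               my %hash2 = ( color => 'blue', shape => 'round' );

               # pairs from %hash2 come last, so its values win on shared keys
               my %new_hash = ( %hash1, %hash2 );

               # the hash-slice form from above, applied to a copy of %hash1
               my %in_place = %hash1;
               @in_place{ keys %hash2 } = values %hash2;

               print "$_ => $new_hash{$_}\n" for sort keys %new_hash;
```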
1940
1941   What happens if I add or remove keys from a hash while iterating over it?
1942       (contributed by brian d foy)
1943
1944       The easy answer is "Don't do that!"
1945
1946       If you iterate through the hash with each(), you can delete the key
1947       most recently returned without worrying about it.  If you delete or add
1948       other keys, the iterator may skip or double up on them since perl may
1949       rearrange the hash table.  See the entry for "each()" in perlfunc.
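
       For example, this sketch uses made-up data.  It removes every entry
       whose value is zero while walking the hash with "each", and it only
       ever deletes the key that "each" just returned:

```perl
               use strict;
               use warnings;

               my %stock = ( apples => 0, pears => 4, plums => 0, figs => 7 );

               # deleting the pair each() just returned is safe
               while ( my ( $fruit, $count ) = each %stock ) {
                       delete $stock{$fruit} if $count == 0;
                       }

               print join( ", ", sort keys %stock ), "\n";
```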
1950
1951   How do I look up a hash element by value?
1952       Create a reverse hash:
1953
1954               %by_value = reverse %by_key;
1955               $key = $by_value{$value};
1956
1957       That's not particularly efficient.  It would be more space-efficient to
1958       use:
1959
1960               while (($key, $value) = each %by_key) {
1961                       $by_value{$value} = $key;
                       }
1963
1964       If your hash could have repeated values, the methods above will only
       find one of the associated keys.  This may or may not worry you.  If
1966       it does worry you, you can always reverse the hash into a hash of
1967       arrays instead:
1968
1969               while (($key, $value) = each %by_key) {
                       push @{$key_list_by_value{$value}}, $key;
1971                       }
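
       Here is a self-contained version of that loop with sample data; the
       fruit and color names are invented.  Because "each" returns pairs in
       no particular order, the sketch sorts the keys before printing them:

```perl
               use strict;
               use warnings;

               my %by_key = ( apple => 'red', cherry => 'red', lime => 'green' );

               # map each value to the list of keys that have it
               my %key_list_by_value;
               while ( my ( $key, $value ) = each %by_key ) {
                       push @{ $key_list_by_value{$value} }, $key;
                       }

               for my $value ( sort keys %key_list_by_value ) {
                       my @keys = sort @{ $key_list_by_value{$value} };
                       print "$value: @keys\n";
                       }
```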
1972
1973   How can I know how many entries are in a hash?
1974       (contributed by brian d foy)
1975
1976       This is very similar to "How do I process an entire hash?", also in
1977       perlfaq4, but a bit simpler in the common cases.
1978
       You can use the "keys()" built-in function in scalar context to find
       out how many entries you have in a hash:
1981
1982               my $key_count = keys %hash; # must be scalar context!
1983
1984       If you want to find out how many entries have a defined value, that's a
1985       bit different. You have to check each value. A "grep" is handy:
1986
1987               my $defined_value_count = grep { defined } values %hash;
1988
1989       You can use that same structure to count the entries any way that you
1990       like. If you want the count of the keys with vowels in them, you just
1991       test for that instead:
1992
1993               my $vowel_count = grep { /[aeiou]/ } keys %hash;
1994
1995       The "grep" in scalar context returns the count. If you want the list of
1996       matching items, just use it in list context instead:
1997
1998               my @defined_values = grep { defined } values %hash;
1999
2000       The "keys()" function also resets the iterator, which means that you
2001       may see strange results if you use this between uses of other hash
2002       operators such as "each()".
2003
2004   How do I sort a hash (optionally by value instead of key)?
2005       (contributed by brian d foy)
2006
2007       To sort a hash, start with the keys. In this example, we give the list
2008       of keys to the sort function which then compares them ASCIIbetically
2009       (which might be affected by your locale settings). The output list has
2010       the keys in ASCIIbetical order. Once we have the keys, we can go
2011       through them to create a report which lists the keys in ASCIIbetical
2012       order.
2013
2014               my @keys = sort { $a cmp $b } keys %hash;
2015
2016               foreach my $key ( @keys )
2017                       {
2018                       printf "%-20s %6d\n", $key, $hash{$key};
2019                       }
2020
2021       We could get more fancy in the "sort()" block though. Instead of
2022       comparing the keys, we can compute a value with them and use that value
2023       as the comparison.
2024
2025       For instance, to make our report order case-insensitive, we use the
2026       "\L" sequence in a double-quoted string to make everything lowercase.
2027       The "sort()" block then compares the lowercased values to determine in
2028       which order to put the keys.
2029
2030               my @keys = sort { "\L$a" cmp "\L$b" } keys %hash;
2031
2032       Note: if the computation is expensive or the hash has many elements,
2033       you may want to look at the Schwartzian Transform to cache the
2034       computation results.
2035
2036       If we want to sort by the hash value instead, we use the hash key to
2037       look it up. We still get out a list of keys, but this time they are
2038       ordered by their value.
2039
2040               my @keys = sort { $hash{$a} <=> $hash{$b} } keys %hash;
2041
2042       From there we can get more complex. If the hash values are the same, we
2043       can provide a secondary sort on the hash key.
2044
2045               my @keys = sort {
2046                       $hash{$a} <=> $hash{$b}
2047                               or
2048                       "\L$a" cmp "\L$b"
2049                       } keys %hash;
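
       Here is that two-level sort applied to a small, invented %hash.  It is
       printed with the same report format used earlier:

```perl
               use strict;
               use warnings;

               my %hash = ( pear => 3, Apple => 5, banana => 3, cherry => 1 );

               # ascending by value; equal values fall back to a
               # case-insensitive comparison of the keys
               my @keys = sort {
                       $hash{$a} <=> $hash{$b}
                               or
                       "\L$a" cmp "\L$b"
                       } keys %hash;

               printf "%-20s %6d\n", $_, $hash{$_} for @keys;
```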
2050
2051   How can I always keep my hash sorted?
2052       You can look into using the "DB_File" module and "tie()" using the
2053       $DB_BTREE hash bindings as documented in "In Memory Databases" in
2054       DB_File. The "Tie::IxHash" module from CPAN might also be instructive.
       Although this does keep your hash sorted, you might not like the
       slowdown you suffer from the tie interface.  Are you sure you need to do
2057       this? :)
2058
2059   What's the difference between "delete" and "undef" with hashes?
2060       Hashes contain pairs of scalars: the first is the key, the second is
2061       the value.  The key will be coerced to a string, although the value can
2062       be any kind of scalar: string, number, or reference.  If a key $key is
2063       present in %hash, "exists($hash{$key})" will return true.  The value
2064       for a given key can be "undef", in which case $hash{$key} will be
2065       "undef" while "exists $hash{$key}" will return true.  This corresponds
2066       to ($key, "undef") being in the hash.
2067
2068       Pictures help...  here's the %hash table:
2069
2070                 keys  values
2071               +------+------+
2072               |  a   |  3   |
2073               |  x   |  7   |
2074               |  d   |  0   |
2075               |  e   |  2   |
2076               +------+------+
2077
2078       And these conditions hold
2079
2080               $hash{'a'}                       is true
2081               $hash{'d'}                       is false
2082               defined $hash{'d'}               is true
2083               defined $hash{'a'}               is true
2084               exists $hash{'a'}                is true (Perl 5 only)
2085               grep ($_ eq 'a', keys %hash)     is true
2086
2087       If you now say
2088
2089               undef $hash{'a'}
2090
2091       your table now reads:
2092
2093                 keys  values
2094               +------+------+
2095               |  a   | undef|
2096               |  x   |  7   |
2097               |  d   |  0   |
2098               |  e   |  2   |
2099               +------+------+
2100
2101       and these conditions now hold; changes in caps:
2102
2103               $hash{'a'}                       is FALSE
2104               $hash{'d'}                       is false
2105               defined $hash{'d'}               is true
2106               defined $hash{'a'}               is FALSE
2107               exists $hash{'a'}                is true (Perl 5 only)
2108               grep ($_ eq 'a', keys %hash)     is true
2109
2110       Notice the last two: you have an undef value, but a defined key!
2111
2112       Now, consider this:
2113
2114               delete $hash{'a'}
2115
2116       your table now reads:
2117
2118                 keys  values
2119               +------+------+
2120               |  x   |  7   |
2121               |  d   |  0   |
2122               |  e   |  2   |
2123               +------+------+
2124
2125       and these conditions now hold; changes in caps:
2126
2127               $hash{'a'}                       is false
2128               $hash{'d'}                       is false
2129               defined $hash{'d'}               is true
2130               defined $hash{'a'}               is false
2131               exists $hash{'a'}                is FALSE (Perl 5 only)
2132               grep ($_ eq 'a', keys %hash)     is FALSE
2133
2134       See, the whole entry is gone!
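
       You can watch the same transitions from a script.  This sketch rebuilds
       the %hash from the tables above.  It then records what "exists",
       "defined" and "keys" report after each step:

```perl
               use strict;
               use warnings;

               my %hash = ( a => 3, x => 7, d => 0, e => 2 );

               undef $hash{'a'};    # keep the key, discard the value
               my $after_undef = sprintf "exists=%d defined=%d keys=%d",
                       exists $hash{'a'} ? 1 : 0,
                       defined $hash{'a'} ? 1 : 0,
                       scalar keys %hash;

               delete $hash{'a'};   # remove the key and its value
               my $after_delete = sprintf "exists=%d defined=%d keys=%d",
                       exists $hash{'a'} ? 1 : 0,
                       defined $hash{'a'} ? 1 : 0,
                       scalar keys %hash;

               print "after undef:  $after_undef\n";
               print "after delete: $after_delete\n";
```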
2135
2136   Why don't my tied hashes make the defined/exists distinction?
2137       This depends on the tied hash's implementation of EXISTS().  For
2138       example, there isn't the concept of undef with hashes that are tied to
2139       DBM* files. It also means that exists() and defined() do the same thing
2140       with a DBM* file, and what they end up doing is not what they do with
2141       ordinary hashes.

   How do I reset an each() operation part-way through?
       (contributed by brian d foy)

       You can use the "keys" or "values" functions to reset "each". To simply
       reset the iterator used by "each" without doing anything else, use one
       of them in void context:

               keys %hash;   # resets iterator, nothing else.
               values %hash; # resets iterator, nothing else.

       See the documentation for "each" in perlfunc.
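
       A minimal sketch of abandoning an "each" traversal early and then
       rewinding it; the hash contents are only illustrative.  Because the hash
       is not modified in between, the rewound iterator starts again at the
       same key.

```perl
               use strict;
               use warnings;

               my %hash = (apple => 1, banana => 2, cherry => 3);

               # Take one pair from the iterator, then stop early.
               my ($first_key) = each %hash;

               # A further each() would continue where we left off;
               # calling keys in void context rewinds the iterator instead.
               keys %hash;

               my ($again) = each %hash;
               print "restarted at the same key\n" if $again eq $first_key;
```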

   How can I get the unique keys from two hashes?
       First you extract the keys from the hashes into lists, then solve the
       "removing duplicates" problem described above.  For example:

               my %seen;
               for my $element (keys %foo, keys %bar) {
                   $seen{$element}++;
               }
               my @uniq = keys %seen;

       Or more succinctly:

               my @uniq = keys %{ +{ %foo, %bar } };

       Or if you really want to save space:

               my %seen;
               while (defined(my $key = each %foo)) {
                   $seen{$key}++;
               }
               while (defined(my $key = each %bar)) {
                   $seen{$key}++;
               }
               my @uniq = keys %seen;
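
       If you need this for more than two hashes, the same %seen approach can
       be wrapped in a function.  This is a sketch; "union_keys" is a
       hypothetical name, and it sorts the result only to make the output
       predictable.

```perl
               use strict;
               use warnings;

               # Hypothetical helper: the union of the keys of any number of
               # hashes, passed by reference, returned in sorted order.
               sub union_keys {
                   my %seen;
                   $seen{$_}++ for map { keys %{$_} } @_;
                   return sort keys %seen;
               }

               my %foo = (a => 1, b => 2);
               my %bar = (b => 3, c => 4);
               print join(' ', union_keys(\%foo, \%bar)), "\n";   # a b c
```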

   How can I store a multidimensional array in a DBM file?
       Either stringify the structure yourself (no fun), or else get the MLDBM
       module (which uses Data::Dumper) from CPAN and layer it on top of
       either DB_File or GDBM_File.
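
       If you go the do-it-yourself route, a minimal sketch looks like the
       following.  It assumes Storable's "freeze" and "thaw" as the
       serializer and the core SDBM_File module as the DBM; this is roughly
       the bookkeeping MLDBM automates for you.  Keep in mind that SDBM limits
       each key/value pair to about 1 KB, so larger structures need DB_File
       or GDBM_File.

```perl
               use strict;
               use warnings;
               use Fcntl;                       # O_RDWR, O_CREAT
               use SDBM_File;                   # core DBM implementation
               use Storable qw(freeze thaw);    # core since 5.8
               use File::Temp qw(tempdir);

               # A DBM value is a flat string, so serialize each nested
               # structure with freeze() on the way in and thaw() on the way out.
               my $dir = tempdir(CLEANUP => 1);
               tie my %db, 'SDBM_File', "$dir/matrix", O_RDWR|O_CREAT, 0666
                   or die "Cannot tie $dir/matrix: $!";

               my @matrix = ([1, 2, 3], [4, 5, 6]);
               $db{matrix} = freeze(\@matrix);

               my $copy = thaw($db{matrix});
               print "row 1, col 2 is $copy->[1][2]\n";    # row 1, col 2 is 6

               untie %db;
```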

   How can I make my hash remember the order I put elements into it?
       Use the "Tie::IxHash" module from CPAN.

               use Tie::IxHash;

               tie my %myhash, 'Tie::IxHash';

               for my $i (0 .. 19) {
                   $myhash{$i} = 2 * $i;
               }

               my @keys = keys %myhash;
               # @keys = (0,1,2,3,...)
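
       If you can't install Tie::IxHash, one alternative (a swapped-in
       technique, not the module's API) is to keep the values in an ordinary
       hash and record insertion order in a separate array.  The helper names
       below are hypothetical, and this sketch does not handle deleting keys.

```perl
               use strict;
               use warnings;

               # Values live in %value; @order remembers first-insertion order.
               my (%value, @order);

               # Hypothetical helper: store a pair, remembering new keys' order.
               sub ordered_set {
                   my ($key, $val) = @_;
                   push @order, $key unless exists $value{$key};
                   $value{$key} = $val;
               }

               # Hypothetical helper: keys in the order they were first stored.
               sub ordered_keys { return @order }

               ordered_set($_, 2 * $_) for 0 .. 4;
               ordered_set(2, 'two');          # updating keeps the old position
               print join(',', ordered_keys()), "\n";   # 0,1,2,3,4
```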

   Why does passing a subroutine an undefined element in a hash create it?
       (contributed by brian d foy)

       Are you using a really old version of Perl?

       Normally, accessing a hash key's value for a nonexistent key will not
       create the key.

               my %hash  = ();
               my $value = $hash{ 'foo' };
               print "This won't print\n" if exists $hash{ 'foo' };

       Passing $hash{ 'foo' } to a subroutine used to be a special case,
       though.  Since you could assign directly to $_[0], Perl had to be ready
       to make that assignment, so it created the hash key ahead of time:

               my_sub( $hash{ 'foo' } );
               print "This will print before 5.004\n" if exists $hash{ 'foo' };

               sub my_sub {
                   # $_[0] = 'bar'; # create hash key in case you do this
                   1;
               }

       Since Perl 5.004, however, this situation is a special case and Perl
       creates the hash key only when you make the assignment:

               my_sub( $hash{ 'foo' } );
               print "This will print, even after 5.004\n" if exists $hash{ 'foo' };

               sub my_sub {
                   $_[0] = 'bar';
               }

       However, if you want the old behavior (and think carefully about that
       because it's a weird side effect), you can pass a hash slice instead.
       Perl 5.004 didn't make this a special case:

               my_sub( @hash{ qw/foo/ } );
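
       The following sketch checks the modern behavior with any Perl from
       5.004 on; "look" and "assign" are hypothetical subroutine names.

```perl
               use strict;
               use warnings;

               sub look   { my $v = $_[0]; return }   # only reads its argument
               sub assign { $_[0] = 'bar'; return }    # writes through the alias

               my %hash;
               look( $hash{'foo'} );
               print "after look:   ", (exists $hash{'foo'} ? "exists" : "missing"), "\n";

               assign( $hash{'foo'} );
               print "after assign: ", (exists $hash{'foo'} ? "exists" : "missing"), "\n";
```

       Only the call that assigns to $_[0] creates the key.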

   How can I make the Perl equivalent of a C structure/C++ class/hash or array
       of hashes or arrays?
       Usually a hash ref, perhaps like this:

               my $record = {
                   NAME   => "Jason",
                   EMPNO  => 132,
                   TITLE  => "deputy peon",
                   AGE    => 23,
                   SALARY => 37_000,
                   PALS   => [ "Norbert", "Rhys", "Phineas" ],
               };

       References are documented in perlref and perlreftut.  Examples of
       complex data structures are given in perldsc and perllol.  Examples of
       structures and object-oriented classes are in perltoot.
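
       Once you have such a record, you reach its fields with the arrow
       operator, and an "array of structures" is just an array of these
       references.  In this sketch, the second record is made-up data for
       illustration.

```perl
               use strict;
               use warnings;

               my $record = {
                   NAME   => "Jason",
                   EMPNO  => 132,
                   TITLE  => "deputy peon",
                   AGE    => 23,
                   SALARY => 37_000,
                   PALS   => [ "Norbert", "Rhys", "Phineas" ],
               };

               # Field access uses the arrow; array fields need dereferencing.
               print "$record->{NAME} is $record->{AGE}\n";
               print "First pal: $record->{PALS}[0]\n";
               print "Pals: @{ $record->{PALS} }\n";

               # An array of structures is an array of such references.
               my @staff  = ($record, { NAME => "Rhys", AGE => 31, PALS => [] });
               my @by_age = sort { $a->{AGE} <=> $b->{AGE} } @staff;
               print "Youngest: $by_age[0]{NAME}\n";    # Youngest: Jason
```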

   How can I use a reference as a hash key?
       (contributed by brian d foy and Ben Morrow)

       Hash keys are strings, so you can't really use a reference as the key.
       When you try to do that, perl turns the reference into its stringified
       form (for instance, "HASH(0xDEADBEEF)"). From there you can't get back
       the reference from the stringified form, at least without doing some
       extra work on your own.

       Remember that the entry in the hash will still be there even if the
       referenced variable goes out of scope, and that it is entirely
       possible for Perl to subsequently allocate a different variable at the
       same address. This means a new variable might accidentally be
       associated with the value for an old one.

       If you have Perl 5.10 or later, and you just want to store a value
       against the reference for lookup later, you can use the core
       Hash::Util::FieldHash module. This will also handle renaming the keys
       if you use multiple threads (which causes all variables to be
       reallocated at new addresses, changing their stringification), and
       garbage-collecting the entries when the referenced variable goes out of
       scope.
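
       A minimal sketch of Hash::Util::FieldHash, assuming Perl 5.10 or later.
       The stored keys are numeric ids, not the references themselves, and the
       entry for $temp disappears when $temp goes out of scope.

```perl
               use strict;
               use warnings;
               use Hash::Util::FieldHash qw(fieldhash);   # core since 5.10

               fieldhash my %label;   # reference keys are stored by id

               my $obj = {};
               $label{$obj} = 'kept';
               print "lookup: $label{$obj}\n";

               {
                   my $temp = [];
                   $label{$temp} = 'temporary';
               }   # $temp is freed here, and its entry goes with it

               print scalar(keys %label), " entry left\n";
```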

       If you actually need to be able to get a real reference back from each
       hash entry, you can use the Tie::RefHash module, which does the
       required work for you.
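
       This sketch contrasts an ordinary hash, whose key comes back as a plain
       string, with one tied to the core Tie::RefHash module, whose key comes
       back as the original reference.

```perl
               use strict;
               use warnings;
               use Tie::RefHash;    # core module

               my $aref = [1, 2, 3];

               my %plain;
               $plain{$aref} = 'x';
               my ($k1) = keys %plain;
               print "plain hash key is a ", (ref $k1 ? 'reference' : 'string'), "\n";

               tie my %refhash, 'Tie::RefHash';
               $refhash{$aref} = 'x';
               my ($k2) = keys %refhash;
               print "Tie::RefHash key is a ", (ref $k2 ? 'reference' : 'string'), "\n";
               print "and it still works: @$k2\n";     # and it still works: 1 2 3
```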


Data: Misc

   How do I handle binary data correctly?
       Perl is binary clean, so it can handle binary data just fine.  On
       Windows or DOS, however, you have to use "binmode" for binary files to
       avoid conversions for line endings. In general, you should use
       "binmode" any time you want to work with binary data.

       Also see "binmode" in perlfunc or perlopentut.

       If you're concerned about 8-bit textual data, see perllocale.  If
       you want to deal with multibyte characters, however, there are some
       gotchas.  See the section on Regular Expressions.
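
       As a quick check, this sketch writes a few bytes (including a CR LF
       pair that a text-mode layer would translate on Windows) through a
       "binmode"d handle and reads them back unchanged.  The temporary file
       comes from the core File::Temp module.

```perl
               use strict;
               use warnings;
               use File::Temp qw(tempfile);

               # NUL, a CR LF pair, a lone LF and a high byte.
               my $bytes = "\x00\x0D\x0A\x0A\xFF";

               my ($fh, $path) = tempfile(UNLINK => 1);
               binmode $fh;                    # no line-ending translation
               print {$fh} $bytes;
               close $fh or die "close: $!";

               open my $in, '<', $path or die "open $path: $!";
               binmode $in;
               my $got = do { local $/; <$in> };   # slurp the whole file
               close $in;

               printf "%d bytes read back, identical: %s\n",
                   length $got, ($got eq $bytes ? 'yes' : 'no');
```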

   How do I determine whether a scalar is a number/whole/integer/float?
       Assuming that you don't care about IEEE notations like "NaN" or
       "Infinity", you probably just want to use a regular expression.

               if (/\D/)            { print "has nondigits\n" }
               if (/^\d+$/)         { print "is a whole number\n" }
               if (/^-?\d+$/)       { print "is an integer\n" }
               if (/^[+-]?\d+$/)    { print "is a +/- integer\n" }
               if (/^-?\d+\.?\d*$/) { print "is a real number\n" }
               if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number\n" }
               if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
                               { print "a C float\n" }

       These patterns test $_, and "$" also matches just before a trailing
       newline, so "chomp" your input first (or anchor with "\z").

       There are also some commonly used modules for the task.  Scalar::Util
       (distributed with 5.8) provides access to Perl's internal function
       "looks_like_number" for determining whether a variable looks like a
       number.  Data::Types exports functions that validate data types using
       both the above and other regular expressions.  Thirdly, there is
       "Regexp::Common", which has regular expressions to match various types
       of numbers.  Those three modules are available from the CPAN.
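
       Here is a sketch combining two of the patterns above with
       "looks_like_number" from Scalar::Util.  The "number_kind" function is
       hypothetical; its patterns are adapted from the list above (allowing a
       leading "+" and anchoring with "\z" so a trailing newline doesn't
       match), and anything they reject falls back to "looks_like_number".

```perl
               use strict;
               use warnings;
               use Scalar::Util qw(looks_like_number);

               # Hypothetical classifier built from the FAQ's regexes.
               sub number_kind {
                   my ($s) = @_;
                   return 'integer' if $s =~ /^[+-]?\d+\z/;
                   return 'decimal' if $s =~ /^[+-]?(?:\d+(?:\.\d*)?|\.\d+)\z/;
                   return looks_like_number($s) ? 'other numeric' : 'not a number';
               }

               printf "%-6s %s\n", $_, number_kind($_) for qw(42 -7 3.14 .5 1e5 abc);
```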

       If you're on a POSIX system, Perl supports the "POSIX::strtod"
       function.  Its semantics are somewhat cumbersome, so here's a "getnum"
       wrapper function for more convenient access.  This function takes a
       string and returns the number it found, or "undef" for input that isn't
       a C float.  The "is_numeric" function is a front end to "getnum" if you
       just want to say, "Is this a float?"

               use POSIX qw(strtod);

               sub getnum {
                   my $str = shift;
                   $str =~ s/^\s+//;
                   $str =~ s/\s+$//;
                   $! = 0;
                   my ($num, $unparsed) = strtod($str);
                   if (($str eq '') || ($unparsed != 0) || $!) {
                       return undef;
                   }
                   else {
                       return $num;
                   }
               }

               sub is_numeric { defined getnum($_[0]) }

       Or you could check out the String::Scanf module on the CPAN instead.
       The "POSIX" module (part of the standard Perl distribution) provides
       the "strtod" and "strtol" functions for converting strings to doubles
       and longs, respectively.
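
       For whole numbers, "POSIX::strtol" works like "strtod" but also takes
       a base.  In list context it returns the number and the count of
       characters it could not parse, as this short sketch shows.

```perl
               use strict;
               use warnings;
               use POSIX qw(strtol);

               # List context: (number, count of unparsed trailing characters)
               my ($hex, $rest_hex) = strtol('ff', 16);     # (255, 0)
               my ($dec, $rest_dec) = strtol('12abc', 10);  # (12, 3)

               print "ff base 16 = $hex, unparsed $rest_hex\n";
               print "12abc base 10 = $dec, unparsed $rest_dec\n";
```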

   How do I keep persistent data across program calls?
       For some specific applications, you can use one of the DBM modules.
       See AnyDBM_File.  More generically, you should consult the "FreezeThaw"
       or "Storable" modules from CPAN.  Starting from Perl 5.8, "Storable" is
       part of the standard distribution.  Here's one example using
       "Storable"'s "store" and "retrieve" functions:

               use Storable;
               store(\%hash, "filename");

               # later on, perhaps in another run...
               my $href = retrieve("filename");        # by ref
               my %hash = %{ retrieve("filename") };   # direct to hash
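
       A complete round trip might look like the sketch below.  It uses
       "nstore", which writes in network byte order so the file can also be
       read on a machine with a different architecture; the file name and
       hash contents are made up for illustration.

```perl
               use strict;
               use warnings;
               use Storable qw(nstore retrieve);
               use File::Temp qw(tempdir);

               my $dir  = tempdir(CLEANUP => 1);
               my $file = "$dir/state.sto";     # hypothetical file name

               my %hash = (runs => 3, seen => [qw(alpha beta)]);
               nstore(\%hash, $file) or die "Cannot store to $file";

               # ...a later run of the program...
               my $saved = retrieve($file) or die "Cannot retrieve $file";
               $saved->{runs}++;
               print "runs: $saved->{runs}, seen: @{ $saved->{seen} }\n";
```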

   How do I print out or copy a recursive data structure?
       The "Data::Dumper" module on CPAN (or the 5.005 release of Perl) is
       great for printing out data structures.  The "Storable" module on CPAN
       (or the 5.8 release of Perl) provides a function called "dclone" that
       recursively copies its argument.

               use Storable qw(dclone);
               my $r2 = dclone($r1);

       Here $r1 can be a reference to any kind of data structure you'd like,
       and it will be deeply copied.  Because "dclone" takes and returns
       references, you have to add extra punctuation to copy a hash (of
       arrays, say) directly:

               my %newhash = %{ dclone(\%oldhash) };
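
       Because the question mentions recursive structures, this sketch builds
       a hash that refers to itself, copies it with "dclone", and prints it
       with Data::Dumper.  Both modules cope with the cycle: the copy's
       self-link points at the copy, and Dumper shows the loop as $VAR1.

```perl
               use strict;
               use warnings;
               use Storable qw(dclone);
               use Data::Dumper;

               # A structure that refers to itself.
               my $orig = { name => 'root', kids => [ 'a', 'b' ] };
               $orig->{self} = $orig;

               my $copy = dclone($orig);
               push @{ $copy->{kids} }, 'c';     # does not touch $orig

               print "original kids: @{ $orig->{kids} }\n";   # a b
               print "copied kids:   @{ $copy->{kids} }\n";   # a b c
               print "copy's self-link points at the copy\n"
                   if $copy->{self} == $copy;

               local $Data::Dumper::Sortkeys = 1;
               print Dumper($orig);
```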

   How do I define methods for every class/object?
       (contributed by Ben Morrow)

       You can use the "UNIVERSAL" class (see UNIVERSAL). However, please be
       very careful to consider the consequences of doing this: adding methods
       to every object is very likely to have unintended consequences. If
       possible, it would be better to have all your objects inherit from some
       common base class, or to use an object system like Moose that supports
       roles.
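
       A sketch of the common-base-class approach; the package names are
       hypothetical.  Methods defined once in My::Base are available to every
       class that inherits from it, without touching UNIVERSAL.

```perl
               use strict;
               use warnings;

               # Hypothetical common base class holding the shared methods.
               package My::Base;
               sub new      { my ($class, %args) = @_; return bless { %args }, $class }
               sub describe { my $self = shift; return ref($self) . " object" }

               package My::Dog;
               our @ISA = ('My::Base');
               sub speak { 'Woof' }

               package My::Cat;
               our @ISA = ('My::Base');
               sub speak { 'Meow' }

               package main;

               for my $pet (My::Dog->new, My::Cat->new) {
                   print $pet->describe, ": ", $pet->speak, "\n";
               }
```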

   How do I verify a credit card checksum?
       Get the "Business::CreditCard" module from CPAN.
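
       If you're curious what the checksum itself involves, card numbers use
       the Luhn (mod 10) check digit.  The "luhn_ok" function below is a
       hypothetical sketch of that check alone; it says nothing about whether
       the card is real or which issuer it belongs to, and the number shown
       is a commonly published test number.

```perl
               use strict;
               use warnings;

               # Hypothetical luhn_ok(): returns 1 if the digits pass the
               # Luhn check, 0 otherwise.
               sub luhn_ok {
                   my ($number) = @_;
                   $number =~ tr/ -//d;                  # allow spaces and dashes
                   return 0 unless $number =~ /^\d+\z/;
                   my ($sum, $double) = (0, 0);
                   for my $d (reverse split //, $number) {
                       my $v = $d;
                       if ($double) {                    # every second digit from the right
                           $v *= 2;
                           $v -= 9 if $v > 9;            # same as adding its two digits
                       }
                       $sum += $v;
                       $double = !$double;
                   }
                   return $sum % 10 == 0 ? 1 : 0;
               }

               print luhn_ok('4111 1111 1111 1111') ? "valid\n" : "invalid\n";
```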

   How do I pack arrays of doubles or floats for XS code?
       The arrays.h/arrays.c code in the "PGPLOT" module on CPAN does just
       this.  If you're doing a lot of float or double processing, consider
       using the "PDL" module from CPAN instead; it makes number-crunching
       easy.

       See <http://search.cpan.org/dist/PGPLOT> for the code.
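
       For simple cases you may not need any extra code: "pack" with the "d"
       or "f" template produces a string whose buffer holds native C doubles
       or floats back to back, which XS code can then treat as an array.
       This sketch only shows the Perl side.

```perl
               use strict;
               use warnings;

               my @values = (1.5, -2.25, 1e3);

               # "d*" packs native doubles, "f*" native floats, contiguously.
               my $doubles = pack 'd*', @values;
               my $floats  = pack 'f*', @values;

               printf "%d doubles take %d bytes; %d floats take %d bytes\n",
                   scalar @values, length $doubles, scalar @values, length $floats;

               my @back = unpack 'd*', $doubles;
               print "round trip: @back\n";    # round trip: 1.5 -2.25 1000
```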


REVISION

       Revision: $Revision$

       Date: $Date$

       See perlfaq for source control details and availability.


AUTHOR AND COPYRIGHT

       Copyright (c) 1997-2009 Tom Christiansen, Nathan Torkington, and other
       authors as noted. All rights reserved.

       This documentation is free; you can redistribute it and/or modify it
       under the same terms as Perl itself.

       Irrespective of its distribution, all code examples in this file are
       hereby placed into the public domain.  You are permitted and encouraged
       to use this code in your own programs for fun or for profit as you see
       fit.  A simple comment in the code giving credit would be courteous but
       is not required.


perl v5.10.1                      2009-08-15                       PERLFAQ4(1)