perlfaq4(3pm)

perlfaq4(3)           User Contributed Perl Documentation          perlfaq4(3)

NAME

       perlfaq4 - Data Manipulation

VERSION

       version 5.20210520

DESCRIPTION

       This section of the FAQ answers questions related to manipulating
       numbers, dates, strings, arrays, hashes, and miscellaneous data issues.

Data: Numbers

   Why am I getting long decimals (eg, 19.9499999999999) instead of the
       numbers I should be getting (eg, 19.95)?
       For the long explanation, see David Goldberg's "What Every Computer
       Scientist Should Know About Floating-Point Arithmetic"
       (<http://web.cse.msu.edu/~cse320/Documents/FloatingPoint.pdf>).

       Internally, your computer represents floating-point numbers in binary.
       Digital (as in powers of two) computers cannot store all numbers
       exactly. Some real numbers lose precision in the process. This is a
       problem with how computers store numbers and affects all computer
       languages, not just Perl.

       perlnumber shows the gory details of number representations and
       conversions.

       To limit the number of decimal places in your numbers, you can use the
       "printf" or "sprintf" function. See "Floating-point Arithmetic" in
       perlop for more details.

           printf "%.2f", 10/3;

           my $number = sprintf "%.2f", 10/3;

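
       Formatting only changes what you print. If you need to compare two
       floating-point values, one common approach is to test whether they
       are close enough rather than exactly equal. The sketch below assumes
       a helper named "nearly_equal" and a default tolerance of 1e-9; both
       are illustrative choices, not anything Perl provides.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $x and $y differ by less than $epsilon.
sub nearly_equal {
    my ($x, $y, $epsilon) = @_;
    $epsilon = 1e-9 unless defined $epsilon;    # assumed default tolerance
    return abs($x - $y) < $epsilon;
}

my $sum = 0.1 + 0.2;

print "exactly equal\n" if $sum == 0.3;               # usually not printed
print "close enough\n"  if nearly_equal($sum, 0.3);   # printed
```

       Pick the tolerance to suit the magnitude of your own data.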
   Why is int() broken?
       Your "int()" is most probably working just fine. It's the numbers that
       aren't quite what you think.

       First, see the answer to "Why am I getting long decimals (eg,
       19.9499999999999) instead of the numbers I should be getting (eg,
       19.95)?".

       For example, this

           print int(0.6/0.2-2), "\n";

       will on most computers print 0, not 1, because even such simple numbers
       as 0.6 and 0.2 cannot be represented exactly by floating-point numbers.
       What you think of in the above as 'three' is really more like
       2.9999999999999995559.

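
       If what you actually want is the nearest integer, round before you
       truncate. A minimal sketch, assuming that "sprintf" with "%.0f" is an
       acceptable rounding rule for your data:

```perl
use strict;
use warnings;

my $quotient = 0.6 / 0.2 - 2;                  # slightly below 1 on most platforms

my $truncated = int($quotient);                # often 0
my $rounded   = sprintf("%.0f", $quotient);    # "1": rounded to nearest first

printf "%.20g\n", $quotient;                   # shows the stored approximation
print "$truncated $rounded\n";
```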
   Why isn't my octal data interpreted correctly?
       (contributed by brian d foy)

       You're probably trying to convert a string to a number, which Perl only
       converts as a decimal number. When Perl converts a string to a number,
       it ignores leading spaces and zeroes, then assumes the rest of the
       digits are in base 10:

           my $string = '0644';

           print $string + 0;  # prints 644

           print $string + 44; # prints 688, certainly not octal!

       This problem usually involves one of the Perl built-ins that has the
       same name as a Unix command that uses octal numbers as arguments on the
       command line. In this example, "chmod" on the command line knows that
       its first argument is octal because that's what it does:

           %prompt> chmod 644 file

       If you want to use the same literal digits (644) in Perl, you have to
       tell Perl to treat them as octal numbers either by prefixing the digits
       with a 0 or using "oct":

           chmod(     0644, $filename );  # right, has leading zero
           chmod( oct(644), $filename );  # also correct

       The problem comes in when you take your numbers from something that
       Perl thinks is a string, such as a command line argument in @ARGV:

           chmod( $ARGV[0],      $filename );  # wrong, even if "0644"

           chmod( oct($ARGV[0]), $filename );  # correct, treat string as octal

       You can always check the value you're using by printing it in octal
       notation to ensure it matches what you think it should be. Print it in
       both octal and decimal format:

           printf "0%o %d", $number, $number;

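
       Putting those pieces together, here is a short sketch of the check.
       The variable $arg stands in for a value such as $ARGV[0]; it is an
       assumption made for the example.

```perl
use strict;
use warnings;

# Hypothetical input, as it might arrive in $ARGV[0].
my $arg  = '0644';
my $mode = oct($arg);                 # 420 decimal, which is 0644 octal

printf "0%o %d\n", $mode, $mode;      # prints "0644 420"

# oct() also understands the 0x and 0b prefixes.
print oct('0x1F'), " ", oct('0b101'), "\n";    # prints "31 5"
```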
   Does Perl have a round() function? What about ceil() and floor()? Trig
       functions?
       Remember that "int()" merely truncates toward 0. For rounding to a
       certain number of digits, "sprintf()" or "printf()" is usually the
       easiest route.

           printf("%.3f", 3.1415926535);   # prints 3.142

       The POSIX module (part of the standard Perl distribution) implements
       "ceil()", "floor()", and a number of other mathematical and
       trigonometric functions.

           use POSIX;
           my $ceil   = ceil(3.5);   # 4
           my $floor  = floor(3.5);  # 3

       In 5.000 to 5.003 perls, trigonometry was done in the Math::Complex
       module. With 5.004, the Math::Trig module (part of the standard Perl
       distribution) implements the trigonometric functions. Internally it
       uses the Math::Complex module and some functions can break out from the
       real axis into the complex plane, for example the inverse sine of 2.

       Rounding in financial applications can have serious implications, and
       the rounding method used should be specified precisely. In these cases,
       it probably pays not to trust whichever system of rounding is being
       used by Perl, but instead to implement the rounding function you need
       yourself.

       To see why, notice how you'll still have an issue on half-way-point
       alternation:

           for (my $i = -5; $i <= 5; $i += 0.5) { printf "%.0f ",$i }

           -5 -4 -4 -4 -3 -2 -2 -2 -1 -0 0 0 1 2 2 2 3 4 4 4 5

       Don't blame Perl. It's the same as in C. IEEE says we have to do this.
       Perl numbers whose absolute values are integers under 2**31 (on 32-bit
       machines) will work pretty much like mathematical integers.  Other
       numbers are not guaranteed.

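
       As one illustration of writing the rule yourself, the sketch below
       rounds exact halves away from zero using the POSIX "floor" and "ceil"
       functions. The name "round_half_away" is made up for this example, and
       it assumes the inputs are exactly representable; for money you are
       usually better off working in integer cents.

```perl
use strict;
use warnings;
use POSIX qw(floor ceil);

# Hypothetical helper: round to the nearest integer, with exact halves
# going away from zero (2.5 -> 3, -2.5 -> -3).
sub round_half_away {
    my ($x) = @_;
    return $x >= 0 ? floor($x + 0.5) : ceil($x - 0.5);
}

print join(" ", map { round_half_away($_) } -2.5, -1.5, -0.5, 0.5, 1.5, 2.5), "\n";
# prints "-3 -2 -1 1 2 3"
```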
   How do I convert between numeric representations/bases/radixes?
       As always with Perl there is more than one way to do it. Below are a
       few examples of approaches to making common conversions between number
       representations. This is intended to be representative rather than
       exhaustive.

       Some of the examples later in perlfaq4 use the Bit::Vector module from
       CPAN. The reason you might choose Bit::Vector over the perl built-in
       functions is that it works with numbers of ANY size, that it is
       optimized for speed on some operations, and for at least some
       programmers the notation might be familiar.

       How do I convert hexadecimal into decimal
           Using perl's built in conversion of "0x" notation:

               my $dec = 0xDEADBEEF;

           Using the "hex" function:

               my $dec = hex("DEADBEEF");

           Using "pack":

               my $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));

           Using the CPAN module "Bit::Vector":

               use Bit::Vector;
               my $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
               my $dec = $vec->to_Dec();

       How do I convert from decimal to hexadecimal
           Using "sprintf":

               my $hex = sprintf("%X", 3735928559); # upper case A-F
               my $hex = sprintf("%x", 3735928559); # lower case a-f

           Using "unpack":

               my $hex = unpack("H*", pack("N", 3735928559));

           Using Bit::Vector:

               use Bit::Vector;
               my $vec = Bit::Vector->new_Dec(32, -559038737);
               my $hex = $vec->to_Hex();

           And Bit::Vector supports odd bit counts:

               use Bit::Vector;
               my $vec = Bit::Vector->new_Dec(33, 3735928559);
               $vec->Resize(32); # suppress leading 0 if unwanted
               my $hex = $vec->to_Hex();

       How do I convert from octal to decimal
           Using Perl's built in conversion of numbers with leading zeros:

               my $dec = 033653337357; # note the leading 0!

           Using the "oct" function:

               my $dec = oct("33653337357");

           Using Bit::Vector:

               use Bit::Vector;
               my $vec = Bit::Vector->new(32);
               $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
               my $dec = $vec->to_Dec();

       How do I convert from decimal to octal
           Using "sprintf":

               my $oct = sprintf("%o", 3735928559);

           Using Bit::Vector:

               use Bit::Vector;
               my $vec = Bit::Vector->new_Dec(32, -559038737);
               my $oct = reverse join('', $vec->Chunk_List_Read(3));

       How do I convert from binary to decimal
           Perl 5.6 lets you write binary numbers directly with the "0b"
           notation:

               my $number = 0b10110110;

           Using "oct":

               my $input = "10110110";
               my $decimal = oct( "0b$input" );

           Using "pack" and "ord":

               my $decimal = ord(pack('B8', '10110110'));

           Using "pack" and "unpack" for larger strings:

               my $int = unpack("N", pack("B32",
                   substr("0" x 32 . "11110101011011011111011101111", -32)));
               my $dec = sprintf("%d", $int);

               # substr() is used to left-pad a 32-character string with zeros.

           Using Bit::Vector:

               use Bit::Vector;
               my $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
               my $dec = $vec->to_Dec();

       How do I convert from decimal to binary
           Using "sprintf" (perl 5.6+):

               my $bin = sprintf("%b", 3735928559);

           Using "unpack":

               my $bin = unpack("B*", pack("N", 3735928559));

           Using Bit::Vector:

               use Bit::Vector;
               my $vec = Bit::Vector->new_Dec(32, -559038737);
               my $bin = $vec->to_Bin();

           The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
           are left as an exercise to the inclined reader.

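
       For the less inclined reader, one way to approach those remaining
       cases is to parse the input with "hex" or "oct" and format the result
       with "sprintf". This is only a sketch using the built-ins shown above;
       the variable names are illustrative.

```perl
use strict;
use warnings;

my $oct_from_hex = sprintf("%o", hex("DEADBEEF"));      # "33653337357"
my $hex_from_bin = sprintf("%X", oct("0b10110110"));    # "B6"
my $bin_from_oct = sprintf("%b", oct("755"));           # "111101101"

print "$oct_from_hex $hex_from_bin $bin_from_oct\n";
```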
   Why doesn't & work the way I want it to?
       The behavior of the bitwise operators depends on whether they're
       used on numbers or strings. The operators treat a string as a series of
       bits and work with that (the string "3" is the bit pattern 00110011).
       The operators work with the binary form of a number (the number 3 is
       treated as the bit pattern 00000011).

       So, saying "11 & 3" performs the "and" operation on numbers (yielding
       3). Saying "11" & "3" performs the "and" operation on strings (yielding
       "1").

       Most problems with "&" and "|" arise because the programmer thinks they
       have a number but really it's a string or vice versa. To avoid this,
       stringify the arguments explicitly (using "" or "qq()") or convert them
       to numbers explicitly (using "0+$arg"). The rest arise because the
       programmer says:

           if ("\020\020" & "\101\101") {
               # ...
           }

       but a string consisting of two null bytes (the result of "\020\020" &
       "\101\101") is not a false value in Perl. You need:

           if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
               # ...
           }

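
       The difference is easy to see side by side. This sketch assumes the
       "bitwise" feature is not enabled (it changes how "&" treats strings),
       and the variable names are only for illustration.

```perl
use strict;
use warnings;

my $numeric = 11 & 3;                    # 3:   0b1011 & 0b0011
my $string  = "11" & "3";                # "1": the strings are ANDed byte by byte
my $forced  = (0 + "11") & (0 + "3");    # 3:   operands made numeric first

print "$numeric $string $forced\n";      # prints "3 1 3"
```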
   How do I multiply matrices?
       Use the Math::Matrix or Math::MatrixReal modules (available from CPAN)
       or the PDL extension (also available from CPAN).

   How do I perform an operation on a series of integers?
       To call a function on each element in an array, and collect the
       results, use:

           my @results = map { my_func($_) } @array;

       For example:

           my @triple = map { 3 * $_ } @single;

       To call a function on each element of an array, but ignore the results:

           foreach my $iterator (@array) {
               some_func($iterator);
           }

       To call a function on each integer in a (small) range, you can use:

           my @results = map { some_func($_) } (5 .. 25);

       but you should be aware that in this form, the ".." operator creates a
       list of all integers in the range, which can take a lot of memory for
       large ranges. However, the problem does not occur when using ".."
       within a "for" loop, because in that case the range operator is
       optimized to iterate over the range, without creating the entire list.
       So

           my @results = ();
           for my $i (5 .. 500_005) {
               push(@results, some_func($i));
           }

       or even

           push(@results, some_func($_)) for 5 .. 500_005;

       will not create an intermediate list of 500,000 integers.

   How can I output Roman numerals?
       Get the <http://www.cpan.org/modules/by-module/Roman> module.

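
       If you cannot install a module, the conversion is short enough to
       write by hand. The following is a sketch, not the Roman module's
       code: "to_roman" is a made-up name, and it assumes an integer from 1
       to 3999.

```perl
use strict;
use warnings;

# Hypothetical helper; handles integers from 1 to 3999 only.
sub to_roman {
    my ($n) = @_;
    my @table = (
        [1000, 'M'], [900, 'CM'], [500, 'D'], [400, 'CD'],
        [100,  'C'], [90,  'XC'], [50,  'L'], [40,  'XL'],
        [10,   'X'], [9,   'IX'], [5,   'V'], [4,   'IV'],
        [1,    'I'],
    );
    my $roman = '';
    for my $pair (@table) {
        my ($value, $symbol) = @$pair;
        # Take the largest symbol that still fits, as often as it fits.
        while ( $n >= $value ) {
            $roman .= $symbol;
            $n     -= $value;
        }
    }
    return $roman;
}

print to_roman(1987), "\n";    # prints "MCMLXXXVII"
```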
   Why aren't my random numbers random?
       If you're using a version of Perl before 5.004, you must call "srand"
       once at the start of your program to seed the random number generator.

            BEGIN { srand() if $] < 5.004 }

       5.004 and later automatically call "srand" at the beginning. Don't call
       "srand" more than once--you make your numbers less random, rather than
       more.

       Computers are good at being predictable and bad at being random
       (despite appearances caused by bugs in your programs :-). The random
       article in the "Far More Than You Ever Wanted To Know" collection in
       <http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz>, courtesy of Tom
       Phoenix, talks more about this. John von Neumann said, "Anyone who
       attempts to generate random numbers by deterministic means is, of
       course, living in a state of sin."

       Perl relies on the underlying system for the implementation of "rand"
       and "srand"; on some systems, the generated numbers are not random
       enough (especially on Windows: see
       <http://www.perlmonks.org/?node_id=803632>).  Several CPAN modules in
       the "Math" namespace implement better pseudorandom generators; see for
       example Math::Random::MT ("Mersenne Twister", fast), or
       Math::TrulyRandom (uses the imperfections in the system's timer to
       generate random numbers, which is rather slow).  More algorithms for
       random numbers are described in "Numerical Recipes in C" at
       <http://www.nr.com/>

   How do I get a random number between X and Y?
       To get a random number between two values, you can use the "rand()"
       built-in to get a random number between 0 and 1. From there, you shift
       that into the range that you want.

       "rand($x)" returns a number such that "0 <= rand($x) < $x". Thus what
       you want to have perl figure out is a random number in the range from 0
       to the difference between your X and Y.

       That is, to get a number between 10 and 15, inclusive, you want a
       random number between 0 and 5 that you can then add to 10.

           my $number = 10 + int rand( 15-10+1 ); # ( 10,11,12,13,14, or 15 )

       Hence you derive the following simple function to abstract that. It
       selects a random integer between the two given integers (inclusive).
       For example: "random_int_between(50,120)".

           sub random_int_between {
               my($min, $max) = @_;
               # Assumes that the two arguments are integers themselves!
               return $min if $min == $max;
               ($min, $max) = ($max, $min)  if  $min > $max;
               return $min + int rand(1 + $max - $min);
           }


Data: Dates

   How do I find the day or week of the year?
       The day of the year is in the list returned by the "localtime"
       function. Without an argument "localtime" uses the current time.

           my $day_of_year = (localtime)[7];

       The POSIX module can also format a date as the day of the year or week
       of the year.

           use POSIX qw/strftime/;
           my $day_of_year  = strftime "%j", localtime;
           my $week_of_year = strftime "%W", localtime;

       To get the day or week of the year for any date, use POSIX's "mktime"
       to get a time in epoch seconds for the argument to "localtime".

           use POSIX qw/mktime strftime/;
           my $week_of_year = strftime "%W",
               localtime( mktime( 0, 0, 0, 18, 11, 87 ) );

       You can also use Time::Piece, which comes with Perl and provides a
       "localtime" that returns an object:

           use Time::Piece;
           my $day_of_year  = localtime->yday;
           my $week_of_year = localtime->week;

       The Date::Calc module provides two functions to calculate these, too.
       It exports nothing by default, so name the functions you want:

           use Date::Calc qw( Day_of_Year Week_of_Year );
           my $day_of_year  = Day_of_Year(  1987, 12, 18 );
           my $week_of_year = Week_of_Year( 1987, 12, 18 );

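
       Be aware that these methods do not all count from the same starting
       point: element 7 of "localtime" counts from 0, while the "%j" format
       counts from 1. A small sketch for 18 December 1987 (noon is chosen
       here only to stay clear of any midnight edge cases):

```perl
use strict;
use warnings;
use POSIX qw(mktime strftime);

# 18 December 1987 at noon, local time (month is 0-based, year is minus 1900).
my @date = localtime( mktime( 0, 0, 12, 18, 11, 87 ) );

my $yday = $date[7];                    # 351: counts from 0
my $j    = strftime( "%j", @date );     # "352": counts from 1

print "$yday $j\n";
```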
   How do I find the current century or millennium?
       Use the following simple functions:

           sub get_century    {
               return int((((localtime(shift || time))[5] + 1999))/100);
           }

           sub get_millennium {
               return 1+int((((localtime(shift || time))[5] + 1899))/1000);
           }

       On some systems, the POSIX module's "strftime()" function has been
       extended in a non-standard way to use a %C format, which they sometimes
       claim is the "century". It isn't, because on most such systems, this is
       only the first two digits of the four-digit year, and thus cannot be
       used to determine reliably the current century or millennium.

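
       These functions count centuries and millennia from year 1, so the year
       2000 still belongs to the 20th century and the 2nd millennium. The
       sketch below repeats the two functions and checks them against two
       epoch values; the mid-year timestamps are picked for illustration so
       that the local time zone cannot change the year.

```perl
use strict;
use warnings;

sub get_century {
    return int((((localtime(shift || time))[5] + 1999))/100);
}

sub get_millennium {
    return 1+int((((localtime(shift || time))[5] + 1899))/1000);
}

# 200 days after 1 January (UTC) of 2000 and of 2001.
my $mid_2000 = 946_684_800 + 200 * 86_400;
my $mid_2001 = 978_307_200 + 200 * 86_400;

print get_century($mid_2000), " ", get_millennium($mid_2000), "\n";   # prints "20 2"
print get_century($mid_2001), " ", get_millennium($mid_2001), "\n";   # prints "21 3"
```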
   How can I compare two dates and find the difference?
       (contributed by brian d foy)

       You could just store all your dates as a number and then subtract.
       Life isn't always that simple though.

       The Time::Piece module, which comes with Perl, replaces localtime with
       a version that returns an object. It also overloads the comparison
       operators so you can compare them directly:

           use Time::Piece;
           my $date1 = localtime( $some_time );
           my $date2 = localtime( $some_other_time );

           if( $date1 < $date2 ) {
               print "The date was in the past\n";
           }

       You can also get differences with a subtraction, which returns a
       Time::Seconds object:

           my $date_diff = $date1 - $date2;
           print "The difference is ", $date_diff->days, " days\n";

       If you want to work with formatted dates, the Date::Manip, Date::Calc,
       or DateTime modules can help you.

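
       If your dates start out as strings, Time::Piece can parse them with
       its "strptime" class method before you compare or subtract them. A
       short sketch, with two made-up dates in an assumed "YYYY-MM-DD"
       format:

```perl
use strict;
use warnings;
use Time::Piece;

my $format = "%Y-%m-%d";
my $date1  = Time::Piece->strptime( "2021-05-20", $format );
my $date2  = Time::Piece->strptime( "2021-06-19", $format );

my $diff = $date2 - $date1;              # a Time::Seconds object

print "date1 is earlier\n" if $date1 < $date2;
print $diff->days, " days\n";            # prints "30 days"
```

       "strptime" treats the parsed time as UTC, so daylight saving time
       does not affect this difference.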
   How can I take a string and turn it into epoch seconds?
       If it's a regular enough string that it always has the same format, you
       can split it up and pass the parts to "timelocal" in the standard
       Time::Local module. Otherwise, you should look into the Date::Calc,
       Date::Parse, and Date::Manip modules from CPAN.

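
       For the regular case, something like the following may be all you
       need. It is a sketch that assumes the input always looks like
       "YYYY-MM-DD hh:mm:ss" in local time; the sample value and variable
       names are made up.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Hypothetical fixed-format input: "YYYY-MM-DD hh:mm:ss".
my $stamp = "1987-12-18 10:30:00";

my ($year, $mon, $mday, $hour, $min, $sec) =
    $stamp =~ /\A(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)\z/
    or die "Unrecognized date format: $stamp";

# timelocal() wants a 0-based month.
my $epoch = timelocal( $sec, $min, $hour, $mday, $mon - 1, $year );

print scalar localtime($epoch), "\n";
```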
   How can I find the Julian Day?
       (contributed by brian d foy and Dave Cross)

       You can use the Time::Piece module, part of the Standard Library, which
       can convert a date/time to a Julian Day:

           $ perl -MTime::Piece -le 'print localtime->julian_day'
           2455607.7959375

       Or the modified Julian Day:

           $ perl -MTime::Piece -le 'print localtime->mjd'
           55607.2961226851

       Or even the day of the year (which is what some people think of as a
       Julian day):

           $ perl -MTime::Piece -le 'print localtime->yday'
           45

       You can also do the same things with the DateTime module:

           $ perl -MDateTime -le'print DateTime->today->jd'
           2453401.5
           $ perl -MDateTime -le'print DateTime->today->mjd'
           53401
           $ perl -MDateTime -le'print DateTime->today->doy'
           31

       You can use the Time::JulianDay module available on CPAN. Ensure that
       you really want to find a Julian day, though, as many people have
       different ideas about Julian days (see
       <http://www.hermetic.ch/cal_stud/jdn.htm> for instance):

           $  perl -MTime::JulianDay -le 'print local_julian_day( time )'
           55608

   How do I find yesterday's date?
       (contributed by brian d foy)

       To do it correctly, you can use one of the "Date" modules since they
       work with calendars instead of times. The DateTime module makes it
       simple, and gives you the same time of day, only the day before, despite
       daylight saving time changes:

           use DateTime;

           my $yesterday = DateTime->now->subtract( days => 1 );

           print "Yesterday was $yesterday\n";

       You can also use the Date::Calc module using its "Today_and_Now"
       function.

           use Date::Calc qw( Today_and_Now Add_Delta_DHMS );

           my @date_time = Add_Delta_DHMS( Today_and_Now(), -1, 0, 0, 0 );

           print "@date_time\n";

       Most people try to use the time rather than the calendar to figure out
       dates, but that assumes that days are twenty-four hours each. For most
       people, there are two days a year when they aren't: the switch to and
       from summer time throws this off. For example, the rest of the
       suggestions will be wrong sometimes:

       Starting with Perl 5.10, Time::Piece and Time::Seconds are part of the
       standard distribution, so you might think that you could do something
       like this:

           use Time::Piece;
           use Time::Seconds;

           my $yesterday = localtime() - ONE_DAY; # WRONG
           print "Yesterday was $yesterday\n";

       The Time::Piece module exports a new "localtime" that returns an
       object, and Time::Seconds exports the "ONE_DAY" constant that is a set
       number of seconds. This means that it always gives the time 24 hours
       ago, which is not always yesterday. This can cause problems around the
       end of daylight saving time when there's one day that is 25 hours long.

       You have the same problem with Time::Local, which will give the wrong
       answer for those same special cases:

           # contributed by Gunnar Hjalmarsson
           use Time::Local;
           my $today = timelocal 0, 0, 12, ( localtime )[3..5];
           my ($d, $m, $y) = ( localtime $today-86400 )[3..5]; # WRONG
           printf "Yesterday: %d-%02d-%02d\n", $y+1900, $m+1, $d;

   Does Perl have a Year 2000 or 2038 problem? Is Perl Y2K compliant?
       (contributed by brian d foy)

       Perl itself never had a Y2K problem, although that never stopped people
       from creating Y2K problems on their own. See the documentation for
       "localtime" for its proper use.

       Starting with Perl 5.12, "localtime" and "gmtime" can handle dates past
       03:14:08 January 19, 2038, when a 32-bit based time would overflow. You
       still might get a warning on a 32-bit "perl":

           % perl5.12 -E 'say scalar localtime( 0x9FFF_FFFFFFFF )'
           Integer overflow in hexadecimal number at -e line 1.
           Wed Nov  1 19:42:39 5576711

       On a 64-bit "perl", you can get even larger dates for those really long
       running projects:

           % perl5.12 -E 'say scalar gmtime( 0x9FFF_FFFFFFFF )'
           Thu Nov  2 00:42:39 5576711

       You're still out of luck if you need to keep track of decaying protons
       though.


Data: Strings

   How do I validate input?
       (contributed by brian d foy)

       There are many ways to ensure that values are what you expect or want
       to accept. Besides the specific examples that we cover in the perlfaq,
       you can also look at the modules with "Assert" and "Validate" in their
       names, along with other modules such as Regexp::Common.

       Some modules have validation for particular types of input, such as
       Business::ISBN, Business::CreditCard, Email::Valid, and
       Data::Validate::IP.

   How do I unescape a string?
       It depends on just what you mean by "escape". URL escapes are dealt
       with in perlfaq9. Shell escapes with the backslash ("\") character are
       removed with

           s/\\(.)/$1/g;

       This won't expand "\n" or "\t" or any other special escapes.

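
       To make that last point concrete, here is a small sketch that applies
       the substitution to two sample strings (both made up for the
       example). The backslash before "n" is simply dropped; no newline
       appears.

```perl
use strict;
use warnings;

my $escaped = q{foo\ bar\$baz};          # backslashes are literal inside q{}
(my $plain = $escaped) =~ s/\\(.)/$1/g;
print "$plain\n";                        # prints "foo bar$baz"

my $with_n = q{one\ntwo};                # a backslash followed by "n", not a newline
(my $stripped = $with_n) =~ s/\\(.)/$1/g;
print "$stripped\n";                     # prints "onentwo"
```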
   How do I remove consecutive pairs of characters?
       (contributed by brian d foy)

       You can use the substitution operator to find pairs of characters (or
       runs of characters) and replace them with a single instance. In this
       substitution, we find a character in "(.)". The memory parentheses
       store the matched character in the back-reference "\g1" and we use that
       to require that the same thing immediately follow it. We replace that
       part of the string with the character in $1.

           s/(.)\g1/$1/g;

       We can also use the transliteration operator, "tr///". In this example,
       the search list side of our "tr///" contains nothing, but the "c"
       option complements that so it contains everything. The replacement list
       also contains nothing, so the transliteration is almost a no-op since
       it won't do any replacements (or more exactly, replace the character
       with itself). However, the "s" option squashes duplicated and
       consecutive characters in the string so a character does not show up
       next to itself

           my $str = 'Haarlem';   # in the Netherlands
           $str =~ tr///cs;       # Now Harlem, like in New York

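
       The two approaches agree on pairs but differ on longer runs: the
       substitution collapses each matched pair, while "tr///cs" collapses
       the whole run. A brief sketch with a made-up test string:

```perl
use strict;
use warnings;

my $pairs = 'aaa Haarlem';
my $runs  = $pairs;

$pairs =~ s/(.)\g1/$1/g;    # each matched pair becomes one character
$runs  =~ tr///cs;          # each run of a repeated character becomes one

print "$pairs\n";           # prints "aa Harlem"
print "$runs\n";            # prints "a Harlem"
```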
   How do I expand function calls in a string?
       (contributed by brian d foy)

       This is documented in perlref, and although it's not the easiest thing
       to read, it does work. In each of these examples, we call the function
       inside the braces used to dereference a reference. If we have more than
       one return value, we can construct and dereference an anonymous array.
       In this case, we call the function in list context.

           print "The time values are @{ [localtime] }.\n";

       If we want to call the function in scalar context, we have to do a bit
       more work. We can really have any code we like inside the braces, so we
       simply have to end with the scalar reference, although how you do that
       is up to you, and you can use code inside the braces. Note that the use
       of parens creates a list context, so we need "scalar" to force the
       scalar context on the function:

           print "The time is ${\(scalar localtime)}.\n";

           print "The time is ${ my $x = localtime; \$x }.\n";

       If your function already returns a reference, you don't need to create
       the reference yourself.

           sub timestamp { my $t = localtime; \$t }

           print "The time is ${ timestamp() }.\n";

       The "Interpolation" module can also do a lot of magic for you. You can
       specify a variable name, in this case "E", to set up a tied hash that
       does the interpolation for you. It has several other methods to do this
       as well.

           use Interpolation E => 'eval';
           print "The time values are $E{localtime()}.\n";

       In most cases, it is probably easier to simply use string
       concatenation, which also forces scalar context.

           print "The time is " . localtime() . ".\n";

   How do I find matching/nesting anything?
       To find something between two single characters, a pattern like
       "/x([^x]*)x/" will get the intervening bits in $1. For multiple ones,
       then something more like "/alpha(.*?)omega/" would be needed. For
       nested patterns and/or balanced expressions, see the so-called (?PARNO)
       construct (available since perl 5.10).  The CPAN module Regexp::Common
       can help to build such regular expressions (see in particular
       Regexp::Common::balanced and Regexp::Common::delimited).

       More complex cases will require writing a parser, probably using a
       parsing module from CPAN, like Regexp::Grammars, Parse::RecDescent,
       Parse::Yapp, Text::Balanced, or Marpa::R2.

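
       As an illustration of the (?PARNO) construct, the sketch below matches
       a balanced run of parentheses by letting group 1 recurse into itself
       with "(?1)". The pattern and the sample text are assumptions made for
       this example, and they only deal with plain "(" and ")".

```perl
use strict;
use warnings;

# Group 1 matches "(", then any mix of non-parenthesis runs and nested
# groups (via the (?1) recursion), then ")".
my $balanced = qr/(\((?:[^()]++|(?1))*\))/;

my $text = "call(alpha(beta)gamma) tail(";

my ($found) = $text =~ $balanced;
print "$found\n" if defined $found;    # prints "(alpha(beta)gamma)"
```

       The unmatched "(" at the end of the sample is skipped because nothing
       closes it.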
   How do I reverse a string?
       Use "reverse()" in scalar context, as documented in "reverse" in
       perlfunc.

           my $reversed = reverse $string;

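
       The context matters more than you might expect. In list context,
       "reverse" reverses the order of its arguments instead, so a single
       string comes back unchanged. A short sketch with a made-up string:

```perl
use strict;
use warnings;

my $string = "stressed";

my $reversed = reverse $string;          # scalar context: "desserts"
my @list     = reverse $string;          # list context: ("stressed")

print scalar reverse($string), "\n";     # prints "desserts"
print reverse($string), "\n";            # prints "stressed": print supplies list context
```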
   How do I expand tabs in a string?
       You can do it yourself:

           1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

       Or you can just use the Text::Tabs module (part of the standard Perl
       distribution).

           use Text::Tabs;
           my @expanded_lines = expand(@lines_with_tabs);

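
       Both approaches assume tab stops every 8 columns (Text::Tabs keeps
       this in $tabstop, which defaults to 8). The sketch below runs them on
       the same made-up line to show that they agree:

```perl
use strict;
use warnings;
use Text::Tabs;

my $line = "a\tbc\td";

# Text::Tabs pads each tab out to the next multiple of $tabstop.
my ($expanded) = expand($line);

# The hand-rolled substitution, applied to a copy of the same line.
my $manual = $line;
1 while $manual =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

print "same result\n" if $manual eq $expanded;
print length($expanded), "\n";    # prints 17
```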
   How do I reformat a paragraph?
       Use Text::Wrap (part of the standard Perl distribution):

           use Text::Wrap;
           print wrap("\t", '  ', @paragraphs);

       The paragraphs you give to Text::Wrap should not contain embedded
       newlines. Text::Wrap doesn't justify the lines (flush-right).

       Or use the CPAN module Text::Autoformat. Formatting files can be easily
       done by making a shell alias, like so:

           alias fmt="perl -i -MText::Autoformat -n0777 \
               -e 'print autoformat $_, {all=>1}' $*"

       See the documentation for Text::Autoformat to appreciate its many
       capabilities.

   How can I access or change N characters of a string?
       You can access the first characters of a string with substr().  To get
       the first character, for example, start at position 0 and grab the
       string of length 1.

           my $string = "Just another Perl Hacker";
           my $first_char = substr( $string, 0, 1 );  #  'J'

       To change part of a string, you can use the optional fourth argument
       which is the replacement string.

           substr( $string, 13, 4, "Perl 5.8.0" );

       You can also use substr() as an lvalue.

           substr( $string, 13, 4 ) =  "Perl 5.8.0";

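
       Two further details are worth seeing in action: a negative offset
       counts from the end of the string, and the four-argument form returns
       the text it replaced. A brief sketch (variable names are
       illustrative):

```perl
use strict;
use warnings;

my $string = "Just another Perl Hacker";

my $word = substr( $string, 13, 4 );     # "Perl"
my $last = substr( $string, -6 );        # "Hacker": negative offsets count from the end

# The four-argument form returns what it replaced.
my $old = substr( $string, 13, 4, "Perl 5.8.0" );

print "$old\n";       # prints "Perl"
print "$string\n";    # prints "Just another Perl 5.8.0 Hacker"
```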
744   How do I change the Nth occurrence of something?
745       You have to keep track of N yourself. For example, let's say you want
746       to change the fifth occurrence of "whoever" or "whomever" into
747       "whosoever" or "whomsoever", case insensitively. These all assume that
748       $_ contains the string to be altered.
749
750           $count = 0;
751           s{((whom?)ever)}{
752           ++$count == 5       # is it the 5th?
753               ? "${2}soever"  # yes, swap
754               : $1            # renege and leave it there
755               }ige;
756
757       In the more general case, you can use the "/g" modifier in a "while"
758       loop, keeping count of matches.
759
760           $WANT = 3;
761           $count = 0;
762           $_ = "One fish two fish red fish blue fish";
763           while (/(\w+)\s+fish\b/gi) {
764               if (++$count == $WANT) {
765                   print "The third fish is a $1 one.\n";
766               }
767           }
768
769       That prints out: "The third fish is a red one."  You can also use a
770       repetition count and repeated pattern like this:
771
772           /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
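
          If you do this often, the counting substitution can be wrapped in a
          small function. This is only a sketch: replace_nth is a hypothetical
          name, and it assumes the replacement is a plain string rather than
          something built from the match.

```perl
use strict;
use warnings;

# Hypothetical helper: replace only the $n-th match of $pattern in $text.
sub replace_nth {
    my ( $text, $pattern, $n, $replacement ) = @_;
    my $count = 0;

    # The outer parentheses make $1 the whole match, so every match other
    # than the $n-th is put back unchanged.
    $text =~ s{($pattern)}{ ++$count == $n ? $replacement : $1 }ge;
    return $text;
}

print replace_nth( "One fish two fish red fish blue fish", qr/fish/, 3, "herring" ), "\n";
# One fish two fish red herring blue fish
```

          If there are fewer than N matches, the string comes back unchanged.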
773
774   How can I count the number of occurrences of a substring within a string?
775       There are a number of ways, with varying efficiency. If you want a
776       count of a certain single character (X) within a string, you can use
777       the "tr///" function like so:
778
779           my $string = "ThisXlineXhasXsomeXx'sXinXit";
780           my $count = ($string =~ tr/X//);
781           print "There are $count X characters in the string";
782
783       This is fine if you are just looking for a single character. However,
784       if you are trying to count multiple character substrings within a
785       larger string, "tr///" won't work. What you can do is wrap a while()
786       loop around a global pattern match. For example, let's count negative
787       integers:
788
789           my $string = "-9 55 48 -2 23 -76 4 14 -44";
790           my $count = 0;
791           while ($string =~ /-\d+/g) { $count++ }
792           print "There are $count negative numbers in the string";
793
794       Another version uses a global match in list context, then assigns the
795       result to a scalar, producing a count of the number of matches.
796
797           my $count = () = $string =~ /-\d+/g;
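
          When the thing you are counting is a literal substring rather than a
          pattern, index() in a loop also does the job. The following is a rough
          sketch; count_substr is a hypothetical name, and it counts
          non-overlapping occurrences only.

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping occurrences of a literal
# substring without involving the regex engine.
sub count_substr {
    my ( $haystack, $needle ) = @_;
    return 0 if $needle eq '';    # an empty needle would match forever

    my ( $count, $pos ) = ( 0, 0 );
    while ( ( $pos = index( $haystack, $needle, $pos ) ) != -1 ) {
        $count++;
        $pos += length $needle;   # resume the search after this occurrence
    }
    return $count;
}

print count_substr( "-9 55 48 -2 23 -76 4 14 -44", "-" ), "\n";    # 4
print count_substr( "aaaa", "aa" ), "\n";                          # 2
```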
798
799   How do I capitalize all the words on one line?
800       (contributed by brian d foy)
801
802       Damian Conway's Text::Autoformat handles all of the thinking for you.
803
804           use Text::Autoformat;
805           my $x = "Dr. Strangelove or: How I Learned to Stop ".
806             "Worrying and Love the Bomb";
807
808           print $x, "\n";
809           for my $style (qw( sentence title highlight )) {
810               print autoformat($x, { case => $style }), "\n";
811           }
812
813       How do you want to capitalize those words?
814
815           FRED AND BARNEY'S LODGE        # all uppercase
816           Fred And Barney's Lodge        # title case
817           Fred and Barney's Lodge        # highlight case
818
819       It's not as easy a problem as it looks. How many words do you think are
820       in there? Wait for it... wait for it.... If you answered 5 you're
821       right. Perl words are groups of "\w+", but that's not what you want to
822       capitalize. How is Perl supposed to know not to capitalize that "s"
823       after the apostrophe? You could try a regular expression:
824
825           $string =~ s/ (
826                        (^\w)    #at the beginning of the line
827                          |      # or
828                        (\s\w)   #preceded by whitespace
829                          )
830                       /\U$1/xg;
831
832           $string =~ s/([\w']+)/\u\L$1/g;
833
834       Now, what if you don't want to capitalize that "and"? Just use
835       Text::Autoformat and get on with the next problem. :)
836
837   How can I split a [character]-delimited string except when inside
838       [character]?
839       Several modules can handle this sort of parsing--Text::Balanced,
840       Text::CSV, Text::CSV_XS, and Text::ParseWords, among others.
841
842       Take the example case of trying to split a string that is comma-
843       separated into its different fields. You can't use "split(/,/)" because
844       you shouldn't split if the comma is inside quotes. For example, take a
845       data line like this:
846
847           SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
848
849       Due to the restriction of the quotes, this is a fairly complex problem.
850       Thankfully, we have Jeffrey Friedl, author of Mastering Regular
851       Expressions, to handle these for us. He suggests (assuming your string
852       is contained in $text):
853
854            my @new = ();
855            push(@new, $+) while $text =~ m{
856                "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
857               | ([^,]+),?
858               | ,
859            }gx;
860            push(@new, undef) if substr($text,-1,1) eq ',';
861
862       If you want to represent quotation marks inside a quotation-mark-
863       delimited field, escape them with backslashes (eg, "like \"this\"").
864
865       Alternatively, the Text::ParseWords module (part of the standard Perl
866       distribution) lets you say:
867
868           use Text::ParseWords;
869           @new = quotewords(",", 0, $text);
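
          Here is that call applied to the sample data line from above, as a
          small self-contained sketch. The wrapper name parse_record is made up
          for this example. Note that quotewords also treats single quotes as
          quoting characters, so data containing apostrophes needs more care.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Hypothetical wrapper: split one comma-separated record into fields,
# dropping the surrounding quotes (second argument 0 means "don't keep").
sub parse_record {
    my ($text) = @_;
    return quotewords( ',', 0, $text );
}

my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';
my @new  = parse_record($text);

# Print each field with its index so embedded commas are easy to spot.
for my $i ( 0 .. $#new ) {
    print "$i: $new[$i]\n";
}
```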
870
871       For parsing or generating CSV, though, using Text::CSV rather than
872       implementing it yourself is highly recommended; you'll save yourself
873       odd bugs popping up later by just using code which has already been
874       tried and tested in production for years.
875
876   How do I strip blank space from the beginning/end of a string?
877       (contributed by brian d foy)
878
879       A substitution can do this for you. For a single line, you want to
880       replace all the leading or trailing whitespace with nothing. You can do
881       that with a pair of substitutions:
882
883           s/^\s+//;
884           s/\s+$//;
885
886       You can also write that as a single substitution, although it turns out
887       the combined statement is slower than the separate ones. That might not
888       matter to you, though:
889
890           s/^\s+|\s+$//g;
891
892       In this regular expression, the alternation matches either at the
893       beginning or the end of the string since alternation has a lower
894       precedence than the anchors. With the "/g" flag, the substitution
895       makes all possible matches, so it gets both. Remember, the trailing
896       newline matches the "\s+", and the "$" anchor can match at the
897       absolute end of the string, so the newline disappears too. Just add the
898       newline to the output, which has the added benefit of preserving
899       "blank" (consisting entirely of whitespace) lines which the "^\s+"
900       would remove all by itself:
901
902           while( <> ) {
903               s/^\s+|\s+$//g;
904               print "$_\n";
905           }
906
907       For a multi-line string, you can apply the regular expression to each
908       logical line in the string by adding the "/m" flag (for "multi-line").
909       With the "/m" flag, the "$" matches before an embedded newline, so it
910       doesn't remove it. This pattern still removes the newline at the end of
911       the string:
912
913           $string =~ s/^\s+|\s+$//gm;
914
915       Remember that lines consisting entirely of whitespace will disappear,
916       since the first part of the alternation can match the entire string and
917       replace it with nothing. If you need to keep embedded blank lines, you
918       have to do a little more work. Instead of matching any whitespace
919       (since that includes a newline), just match the other whitespace:
920
921           $string =~ s/^[\t\f ]+|[\t\f ]+$//mg;
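
          The two approaches can be packaged as small functions. This is a
          sketch only; the names trim and trim_lines are made up for the
          example, and both work on a copy of their argument.

```perl
use strict;
use warnings;

# Hypothetical helper: strip leading and trailing whitespace (including a
# trailing newline) from a single string.
sub trim {
    my ($string) = @_;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

# Hypothetical helper: strip blanks from both ends of every line in a
# multi-line string, keeping the newlines and any empty lines.
sub trim_lines {
    my ($string) = @_;
    $string =~ s/^[\t\f ]+|[\t\f ]+$//mg;
    return $string;
}

print "[", trim("  fred and barney \n"), "]\n";    # [fred and barney]
print trim_lines("  a \n\n  b  \n");               # "a", an empty line, "b"
```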
922
923   How do I pad a string with blanks or pad a number with zeroes?
924       In the following examples, $pad_len is the length to which you wish to
925       pad the string, $text or $num contains the string to be padded, and
926       $pad_char contains the padding character. You can use a single
927       character string constant instead of the $pad_char variable if you know
928       what it is in advance. And in the same way you can use an integer in
929       place of $pad_len if you know the pad length in advance.
930
931       The simplest method uses the "sprintf" function. It can pad on the left
932       or right with blanks and on the left with zeroes and it will not
933       truncate the result. The "pack" function can only pad strings on the
934       right with blanks and it will truncate the result to a maximum length
935       of $pad_len.
936
937           # Left padding a string with blanks (no truncation):
938           my $padded = sprintf("%${pad_len}s", $text);
939           my $padded = sprintf("%*s", $pad_len, $text);  # same thing
940
941           # Right padding a string with blanks (no truncation):
942           my $padded = sprintf("%-${pad_len}s", $text);
943           my $padded = sprintf("%-*s", $pad_len, $text); # same thing
944
945           # Left padding a number with 0 (no truncation):
946           my $padded = sprintf("%0${pad_len}d", $num);
947           my $padded = sprintf("%0*d", $pad_len, $num); # same thing
948
949           # Right padding a string with blanks using pack (will truncate):
950           my $padded = pack("A$pad_len",$text);
951
952       If you need to pad with a character other than blank or zero you can
953       use one of the following methods. They all generate a pad string with
954       the "x" operator and combine that with $text. These methods do not
955       truncate $text.
956
957       Left and right padding with any character, creating a new string:
958
959           my $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
960           my $padded = $text . $pad_char x ( $pad_len - length( $text ) );
961
962       Left and right padding with any character, modifying $text directly:
963
964           substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
965           $text .= $pad_char x ( $pad_len - length( $text ) );
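
          The same idea can be wrapped in functions that guard against text
          that is already longer than the target width. This is a sketch under
          that assumption; pad_left and pad_right are hypothetical names. The
          guard matters because a negative repeat count gives an empty string,
          and recent versions of perl may also warn about it.

```perl
use strict;
use warnings;

# Hypothetical helper: pad on the left with any character, never truncating.
sub pad_left {
    my ( $text, $pad_len, $pad_char ) = @_;
    my $missing = $pad_len - length $text;
    return $text if $missing <= 0;    # already wide enough
    return ( $pad_char x $missing ) . $text;
}

# Hypothetical helper: pad on the right with any character, never truncating.
sub pad_right {
    my ( $text, $pad_len, $pad_char ) = @_;
    my $missing = $pad_len - length $text;
    return $text if $missing <= 0;
    return $text . ( $pad_char x $missing );
}

print pad_left( "42", 6, "*" ), "\n";     # ****42
print pad_right( "42", 6, "." ), "\n";    # 42....
```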
966
967   How do I extract selected columns from a string?
968       (contributed by brian d foy)
969
970       If you know the columns that contain the data, you can use "substr" to
971       extract a single column.
972
973           my $column = substr( $line, $start_column, $length );
974
975       You can use "split" if the columns are separated by whitespace or some
976       other delimiter, as long as whitespace or the delimiter cannot appear
977       as part of the data.
978
979           my $line    = ' fred barney   betty   ';
980           my @columns = split /\s+/, $line;
981               # ( '', 'fred', 'barney', 'betty' );
982
983           my $line    = 'fred||barney||betty';
984           my @columns = split /\|/, $line;
985               # ( 'fred', '', 'barney', '', 'betty' );
986
987       If you want to work with comma-separated values, don't do this since
988       that format is a bit more complicated. Use one of the modules that
989       handle that format, such as Text::CSV, Text::CSV_XS, or Text::CSV_PP.
990
991       If you want to break apart an entire line of fixed columns, you can use
992       "unpack" with the A (ASCII) format. By using a number after the format
993       specifier, you can denote the column width. See the "pack" and "unpack"
994       entries in perlfunc for more details.
995
996           my @fields = unpack( "A8 A8 A8 A16 A4", $line );
997
998       Note that spaces in the format argument to "unpack" do not denote
999       literal spaces. If you have space separated data, you may want "split"
1000       instead.
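
           As a minimal sketch of fixed-width extraction, the example below
           builds a record with sprintf and takes it apart again with unpack.
           The column widths (8, 8, and 4) and the name split_fixed are
           assumptions made up for the illustration. unpack takes the template
           first and the data second, and the A format strips trailing spaces.

```perl
use strict;
use warnings;

# Hypothetical helper: break a line into three fixed-width columns.
sub split_fixed {
    my ($line) = @_;
    return unpack( "A8 A8 A4", $line );
}

# Build a sample record whose columns are padded to the same widths.
my $line   = sprintf( "%-8s%-8s%-4s", "fred", "barney", "42" );
my @fields = split_fixed($line);

print join( "|", @fields ), "\n";    # fred|barney|42
```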
1001
1002   How do I find the soundex value of a string?
1003       (contributed by brian d foy)
1004
1005       You can use the "Text::Soundex" module. If you want to do fuzzy or
1006       close matching, you might also try the String::Approx,
1007       Text::Metaphone, and Text::DoubleMetaphone modules.
1008
1009   How can I expand variables in text strings?
1010       (contributed by brian d foy)
1011
1012       If you can avoid it, don't, or if you can use a templating system, such
1013       as Text::Template or Template Toolkit, do that instead. You might even
1014       be able to get the job done with "sprintf" or "printf":
1015
1016           my $string = sprintf 'Say hello to %s and %s', $foo, $bar;
1017
1018       However, for the one-off simple case where I don't want to pull out a
1019       full templating system, I'll use a string that has two Perl scalar
1020       variables in it. In this example, I want to expand $foo and $bar to
1021       their variables' values:
1022
1023           my $foo = 'Fred';
1024           my $bar = 'Barney';
1025           $string = 'Say hello to $foo and $bar';
1026
1027       One way I can do this involves the substitution operator and a double
1028       "/e" flag. The first "/e" evaluates $1 on the replacement side and
1029       turns it into $foo. The second /e starts with $foo and replaces it with
1030       its value. $foo, then, turns into 'Fred', and that's finally what's
1031       left in the string:
1032
1033           $string =~ s/(\$\w+)/$1/eeg; # 'Say hello to Fred and Barney'
1034
1035       The "/e" will also silently ignore violations of strict, replacing
1036       undefined variable names with the empty string. Since I'm using the
1037       "/e" flag (twice even!), I have all of the same security problems I
1038       have with "eval" in its string form. If there's something odd in $foo,
1039       perhaps something like "@{[ system "rm -rf /" ]}", then I could get
1040       myself in trouble.
1041
1042       To get around the security problem, I could also pull the values from a
1043       hash instead of evaluating variable names. Using a single "/e", I can
1044       check the hash to ensure the value exists, and if it doesn't, I can
1045       replace the missing value with a marker, in this case "???" to signal
1046       that I missed something:
1047
1048           my $string = 'This has $foo and $bar';
1049
1050           my %Replacements = (
1051               foo  => 'Fred',
1052               );
1053
1054           # $string =~ s/\$(\w+)/$Replacements{$1}/g;
1055           $string =~ s/\$(\w+)/
1056               exists $Replacements{$1} ? $Replacements{$1} : '???'
1057               /eg;
1058
1059           print $string;
1060
1061   Does Perl have anything like Ruby's #{} or Python's f string?
1062       Unlike the others, Perl allows you to embed a variable naked in a
1063       double quoted string, e.g. "variable $variable". When there isn't
1064       whitespace or other non-word characters following the variable name,
1065       you can add braces (e.g. "foo ${foo}bar") to ensure correct parsing.
1066
1067       An array can also be embedded directly in a string, and will be
1068       expanded by default with spaces between the elements. The default
1069       LIST_SEPARATOR can be changed by assigning a different string to the
1070       special variable $", such as "local $" = ', ';".
1071
1072       Perl also supports references within a string providing the equivalent
1073       of the features in the other two languages.
1074
1075       "${\ ... }" embedded within a string will work for most simple
1076       statements such as an object->method call. More complex code can be
1077       wrapped in a do block "${\ do{...} }".
1078
1079       When you want a list to be expanded per $", use "@{[ ... ]}".
1080
1081           use Time::Piece;
1082           use Time::Seconds;
1083           my $scalar = 'STRING';
1084           my @array = ( 'zorro', 'a', 1, 'B', 3 );
1085
1086           # Print the current date and time, then tomorrow's weekday
1087           my $t = Time::Piece->new;
1088           say "Now is: ${\ $t->cdate() }";
1089           say "Tomorrow: ${\ do{ my $T=Time::Piece->new + ONE_DAY ; $T->fullday }}";
1090
1091           # some variables in strings
1092           say "This is some scalar I have $scalar, this is an array @array.";
1093           say "You can also write it like this ${scalar} @{array}.";
1094
1095           # Change the $LIST_SEPARATOR
1096           local $" = ':';
1097           say "Set \$\" to delimit with ':' and sort the Array @{[ sort @array ]}";
1098
1099       You may also want to look at the module Quote::Code, and templating
1100       tools such as Template::Toolkit and Mojo::Template.
1101
1102       See also: "How can I expand variables in text strings?" and "How do I
1103       expand function calls in a string?" in this FAQ.
1104
1105   What's wrong with always quoting "$vars"?
1106       The problem is that those double-quotes force stringification--coercing
1107       numbers and references into strings--even when you don't want them to
1108       be strings. Think of it this way: double-quote expansion is used to
1109       produce new strings. If you already have a string, why do you need
1110       more?
1111
1112       If you get used to writing odd things like these:
1113
1114           print "$var";       # BAD
1115           my $new = "$old";       # BAD
1116           somefunc("$var");    # BAD
1117
1118       You'll be in trouble. Those should (in 99.8% of the cases) be the
1119       simpler and more direct:
1120
1121           print $var;
1122           my $new = $old;
1123           somefunc($var);
1124
1125       Otherwise, besides slowing you down, you're going to break code when
1126       the thing in the scalar is actually neither a string nor a number, but
1127       a reference:
1128
1129           func(\@array);
1130           sub func {
1131               my $aref = shift;
1132               my $oref = "$aref";  # WRONG
1133           }
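
           To see what goes wrong, here is a small sketch. The helper name
           kind_of is made up for the example. Once a reference has been
           quoted it is only a piece of text, and it can no longer be
           dereferenced.

```perl
use strict;
use warnings;

# Hypothetical helper: report what kind of value we were handed.
sub kind_of {
    my ($value) = @_;
    return ref($value) || 'plain string';
}

my @array = ( 1, 2, 3 );
my $aref  = \@array;
my $oref  = "$aref";    # now just text that looks like ARRAY(0x...)

print kind_of($aref), "\n";    # ARRAY
print kind_of($oref), "\n";    # plain string

# Under strict refs, dereferencing the stringified copy dies.
my $ok = eval { my @copy = @$oref; 1 };
print "dereference failed\n" unless $ok;
```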
1134
1135       You can also get into subtle problems on those few operations in Perl
1136       that actually do care about the difference between a string and a
1137       number, such as the magical "++" autoincrement operator or the
1138       syscall() function.
1139
1140       Stringification also destroys arrays.
1141
1142           my @lines = `command`;
1143           print "@lines";     # WRONG - extra blanks
1144           print @lines;       # right
1145
1146   Why don't my <<HERE documents work?
1147       Here documents are described in perlop. Check for these four things:
1148
1149       There must be no space after the << part.
1150       There (probably) should be a semicolon at the end of the opening token.
1151       You can't (easily) have any space in front of the tag.
1152       There needs to be at least a line separator after the end token.
1153
1154       If you want to indent the text in the here document, you can do this:
1155
1156           # all in one
1157           (my $VAR = <<HERE_TARGET) =~ s/^\s+//gm;
1158               your text
1159               goes here
1160           HERE_TARGET
1161
1162       But the HERE_TARGET must still be flush against the margin.  If you
1163       want that indented also, you'll have to quote in the indentation.
1164
1165           (my $quote = <<'    FINIS') =~ s/^\s+//gm;
1166                   ...we will have peace, when you and all your works have
1167                   perished--and the works of your dark master to whom you
1168                   would deliver us. You are a liar, Saruman, and a corrupter
1169                   of men's hearts. --Theoden in /usr/src/perl/taint.c
1170               FINIS
1171           $quote =~ s/\s+--/\n--/;
1172
1173       A nice general-purpose fixer-upper function for indented here documents
1174       follows. It expects to be called with a here document as its argument.
1175       It looks to see whether each line begins with a common substring, and
1176       if so, strips that substring off. Otherwise, it takes the amount of
1177       leading whitespace found on the first line and removes that much off
1178       each subsequent line.
1179
1180           sub fix {
1181               local $_ = shift;
1182               my ($white, $leader);  # common whitespace and common leading string
1183               if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\g1\g2?.*\n)+$/) {
1184                   ($white, $leader) = ($2, quotemeta($1));
1185               } else {
1186                   ($white, $leader) = (/^(\s+)/, '');
1187               }
1188               s/^\s*?$leader(?:$white)?//gm;
1189               return $_;
1190           }
1191
1192       This works with leading special strings, dynamically determined:
1193
1194           my $remember_the_main = fix<<'    MAIN_INTERPRETER_LOOP';
1195           @@@ int
1196           @@@ runops() {
1197           @@@     SAVEI32(runlevel);
1198           @@@     runlevel++;
1199           @@@     while ( op = (*op->op_ppaddr)() );
1200           @@@     TAINT_NOT;
1201           @@@     return 0;
1202           @@@ }
1203           MAIN_INTERPRETER_LOOP
1204
1205       Or with a fixed amount of leading whitespace, with remaining
1206       indentation correctly preserved:
1207
1208           my $poem = fix<<EVER_ON_AND_ON;
1209              Now far ahead the Road has gone,
1210             And I must follow, if I can,
1211              Pursuing it with eager feet,
1212             Until it joins some larger way
1213              Where many paths and errands meet.
1214             And whither then? I cannot say.
1215               --Bilbo in /usr/src/perl/pp_ctl.c
1216           EVER_ON_AND_ON
1217
1218       Beginning with Perl version 5.26, a much simpler and cleaner way to
1219       write indented here documents has been added to the language: the tilde
1220       (~) modifier. See "Indented Here-docs" in perlop for details.
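
           A minimal sketch of the tilde form follows. It needs Perl 5.26 or
           later; the function name usage_text and the text itself are made up
           for the example. The indentation in front of the terminator is what
           gets removed from each line, so any extra indentation inside the
           text survives.

```perl
use strict;
use warnings;

# Hypothetical example function returning an indented here document.
sub usage_text {
    # The ~ modifier strips the terminator's indentation (eight spaces
    # here) from every line of the here document.
    my $text = <<~'END_USAGE';
        Usage: frob [options] FILE
          -v    be verbose
        END_USAGE
    return $text;
}

print usage_text();
```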
1221

Data: Arrays

1223   What is the difference between a list and an array?
1224       (contributed by brian d foy)
1225
1226       A list is a fixed collection of scalars. An array is a variable that
1227       holds a variable collection of scalars. An array can supply its
1228       collection for list operations, so list operations also work on arrays:
1229
1230           # slices
1231           ( 'dog', 'cat', 'bird' )[2,3];
1232           @animals[2,3];
1233
1234           # iteration
1235           foreach ( qw( dog cat bird ) ) { ... }
1236           foreach ( @animals ) { ... }
1237
1238           my @three = grep { length == 3 } qw( dog cat bird );
1239           my @three = grep { length == 3 } @animals;
1240
1241           # supply an argument list
1242           wash_animals( qw( dog cat bird ) );
1243           wash_animals( @animals );
1244
1245       Array operations, which change the scalars, rearrange them, or add or
1246       subtract some scalars, only work on arrays. These can't work on a list,
1247       which is fixed. Array operations include "shift", "unshift", "push",
1248       "pop", and "splice".
1249
1250       An array can also change its length:
1251
1252           $#animals = 1;  # truncate to two elements
1253           $#animals = 10000; # pre-extend to 10,001 elements
1254
1255       You can change an array element, but you can't change a list element:
1256
1257           $animals[0] = 'Rottweiler';
1258           qw( dog cat bird )[0] = 'Rottweiler'; # syntax error!
1259
1260           foreach ( @animals ) {
1261               s/^d/fr/;  # works fine
1262           }
1263
1264           foreach ( qw( dog cat bird ) ) {
1265               s/^d/fr/;  # Error! Modification of read only value!
1266           }
1267
1268       If the list element is itself a variable, though, it appears that you
1269       can change a list element. However, the list element is the variable,
1270       not the data. You're not changing the list element, but something the
1271       list element refers to. The list element itself doesn't change: it's
1272       still the same variable.
1273
1274       You also have to be careful about context. You can assign an array to a
1275       scalar to get the number of elements in the array. This only works for
1276       arrays, though:
1277
1278           my $count = @animals;  # only works with arrays
1279
1280       If you try to do the same thing with what you think is a list, you get
1281       a quite different result. Although it looks like you have a list on the
1282       righthand side, Perl actually sees a bunch of scalars separated by a
1283       comma:
1284
1285           my $scalar = ( 'dog', 'cat', 'bird' );  # $scalar gets bird
1286
1287       Since you're assigning to a scalar, the righthand side is in scalar
1288       context. The comma operator (yes, it's an operator!) in scalar context
1289       evaluates its lefthand side, throws away the result, and evaluates its
1290       righthand side and returns the result. In effect, that list-lookalike
1291       assigns to $scalar its rightmost value. Many people mess this up
1292       because they choose a list-lookalike whose last element is also the
1293       count they expect:
1294
1295           my $scalar = ( 1, 2, 3 );  # $scalar gets 3, accidentally
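
           The three cases can be compared side by side. This is a sketch; the
           function names are invented for the example. The last function uses
           the "() =" idiom, which assigns to an empty list and then evaluates
           that list assignment in scalar context to count the elements.

```perl
use strict;
use warnings;

# An array in scalar context gives its number of elements.
sub count_of_array {
    my @animals = @_;
    my $count   = @animals;
    return $count;
}

# The comma operator in scalar context returns its rightmost operand.
sub last_of_commas {
    no warnings 'void';    # perl rightly complains about the discarded items
    my $scalar = ( 'dog', 'cat', 'bird' );
    return $scalar;
}

# Assigning to an empty list first forces list context, then counts.
sub count_of_list {
    my $count = () = ( 'dog', 'cat', 'bird' );
    return $count;
}

print count_of_array(qw( dog cat bird )), "\n";    # 3
print last_of_commas(), "\n";                      # bird
print count_of_list(), "\n";                       # 3
```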
1296
1297   What is the difference between $array[1] and @array[1]?
1298       (contributed by brian d foy)
1299
1300       The difference is the sigil, that special character in front of the
1301       array name. The "$" sigil means "exactly one item", while the "@" sigil
1302       means "zero or more items". The "$" gets you a single scalar, while the
1303       "@" gets you a list.
1304
1305       The confusion arises because people incorrectly assume that the sigil
1306       denotes the variable type.
1307
1308       The $array[1] is a single-element access to the array. It's going to
1309       return the item in index 1 (or undef if there is no item there).  If
1310       you intend to get exactly one element from the array, this is the form
1311       you should use.
1312
1313       The @array[1] is an array slice, although it has only one index.  You
1314       can pull out multiple elements simultaneously by specifying additional
1315       indices as a list, like @array[1,4,3,0].
1316
1317       Using a slice on the lefthand side of the assignment supplies list
1318       context to the righthand side. This can lead to unexpected results.
1319       For instance, if you want to read a single line from a filehandle,
1320       assigning to a scalar value is fine:
1321
1322           $array[1] = <STDIN>;
1323
1324       However, in list context, the line input operator returns all of the
1325       lines as a list. The first line goes into @array[1] and the rest of the
1326       lines mysteriously disappear:
1327
1328           @array[1] = <STDIN>;  # most likely not what you want
1329
1330       Either the "use warnings" pragma or the -w flag will warn you when you
1331       use an array slice with a single index.
1332
1333   How can I remove duplicate elements from a list or array?
1334       (contributed by brian d foy)
1335
1336       Use a hash. When you think the words "unique" or "duplicated", think
1337       "hash keys".
1338
1339       If you don't care about the order of the elements, you could just
1340       create the hash then extract the keys. It's not important how you
1341       create that hash: just that you use "keys" to get the unique elements.
1342
1343           my %hash   = map { $_, 1 } @array;
1344           # or a hash slice: @hash{ @array } = ();
1345           # or a foreach: $hash{$_} = 1 foreach ( @array );
1346
1347           my @unique = keys %hash;
1348
1349       If you want to use a module, try the "uniq" function from
1350       List::MoreUtils. In list context it returns the unique elements,
1351       preserving their order in the list. In scalar context, it returns the
1352       number of unique elements.
1353
1354           use List::MoreUtils qw(uniq);
1355
1356           my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
1357           my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7
1358
1359       You can also go through each element and skip the ones you've seen
1360       before. Use a hash to keep track. The first time the loop sees an
1361       element, that element has no key in %seen. The "next" statement creates
1362       the key and immediately uses its value, which is "undef", so the loop
1363       continues to the "push" and increments the value for that key. The next
1364       time the loop sees that same element, its key exists in the hash and
1365       the value for that key is true (since it's not 0 or "undef"), so the
1366       next skips that iteration and the loop goes to the next element.
1367
1368           my @unique = ();
1369           my %seen   = ();
1370
1371           foreach my $elem ( @array ) {
1372               next if $seen{ $elem }++;
1373               push @unique, $elem;
1374           }
1375
1376       You can write this more briefly using a grep, which does the same
1377       thing.
1378
1379           my %seen = ();
1380           my @unique = grep { ! $seen{ $_ }++ } @array;
1381
1382   How can I tell whether a certain element is contained in a list or array?
1383       (portions of this answer contributed by Anno Siegel and brian d foy)
1384
1385       Hearing the word "in" is an indication that you probably should have
1386       used a hash, not a list or array, to store your data. Hashes are
1387       designed to answer this question quickly and efficiently. Arrays
1388       aren't.
1389
1390       That being said, there are several ways to approach this. If you are
1391       going to make this query many times over arbitrary string values, the
1392       fastest way is probably to invert the original array and maintain a
1393       hash whose keys are the first array's values:
1394
1395           my @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
1396           my %is_blue = ();
1397           for (@blues) { $is_blue{$_} = 1 }
1398
1399       Now you can check whether $is_blue{$some_color}. It might have been a
1400       good idea to keep the blues all in a hash in the first place.
1401
1402       If the values are all small integers, you could use a simple indexed
1403       array. This kind of an array will take up less space:
1404
1405           my @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
1406           my @is_tiny_prime = ();
1407           for (@primes) { $is_tiny_prime[$_] = 1 }
1408           # or simply  @is_tiny_prime[@primes] = (1) x @primes;
1409
1410       Now you check whether $is_tiny_prime[$some_number].
1411
1412       If the values in question are integers instead of strings, you can save
1413       quite a lot of space by using bit strings instead:
1414
1415           my @articles = ( 1..10, 150..2000, 2017 );
1416           my $read = '';
1417           for (@articles) { vec($read,$_,1) = 1 }
1418
1419       Now check whether "vec($read,$n,1)" is true for some $n.
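
           Here is the bit string approach as a runnable sketch. The function
           names make_bitset and in_bitset are hypothetical. It also shows how
           little space the bit string needs: one bit per possible article
           number up to the largest one stored.

```perl
use strict;
use warnings;

# Hypothetical helper: build a bit string with one bit set per number.
sub make_bitset {
    my @numbers = @_;
    my $bits    = '';
    vec( $bits, $_, 1 ) = 1 for @numbers;
    return $bits;
}

# Hypothetical helper: test one number; bits past the end read as 0.
sub in_bitset {
    my ( $bits, $n ) = @_;
    return vec( $bits, $n, 1 );
}

my $read = make_bitset( 1 .. 10, 150 .. 2000, 2017 );

print in_bitset( $read, 150 ) ? "read" : "unread", "\n";    # read
print in_bitset( $read, 11 )  ? "read" : "unread", "\n";    # unread
printf "%d bytes\n", length $read;                          # 253 bytes
```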
1420
1421       These methods guarantee fast individual tests but require a
1422       reorganization of the original list or array. They only pay off if you
1423       have to test multiple values against the same array.
1424
1425       If you are testing only once, the standard module List::Util exports
1426       the function "any" for this purpose. It works by stopping once it finds
1427       the element. It's written in C for speed, and its Perl equivalent looks
1428       like this subroutine:
1429
1430           sub any (&@) {
1431               my $code = shift;
1432               foreach (@_) {
1433                   return 1 if $code->();
1434               }
1435               return 0;
1436           }
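
           In practice you would import the real function rather than write
           your own. A minimal sketch, assuming a List::Util recent enough to
           provide "any" (version 1.33 or later); the wrapper name is_blue is
           made up for the example:

```perl
use strict;
use warnings;
use List::Util qw(any);

my @blues = qw/azure cerulean teal turquoise lapis-lazuli/;

# Hypothetical wrapper: true if $color is one of the known blues.
sub is_blue {
    my ($color) = @_;

    # any() stops at the first element for which the block is true.
    my $found = any { $_ eq $color } @blues;
    return $found ? 1 : 0;
}

print is_blue("teal")    ? "yes" : "no", "\n";    # yes
print is_blue("crimson") ? "yes" : "no", "\n";    # no
```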
1437
1438       If speed is of little concern, the common idiom uses grep in scalar
1439       context (which returns the number of items that passed its condition)
1440       to traverse the entire list. This does have the benefit of telling you
1441       how many matches it found, though.
1442
1443           my $is_there = grep $_ eq $whatever, @array;
1444
1445       If you want to actually extract the matching elements, simply use grep
1446       in list context.
1447
1448           my @matches = grep $_ eq $whatever, @array;
1449
1450   How do I compute the difference of two arrays? How do I compute the
1451       intersection of two arrays?
1452       Use a hash. Here's code to do both and more. It assumes that each
1453       element is unique in a given array:
1454
1455           my (@union, @intersection, @difference);
1456           my %count = ();
1457           foreach my $element (@array1, @array2) { $count{$element}++ }
1458           foreach my $element (keys %count) {
1459               push @union, $element;
1460               push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
1461           }
1462
1463       Note that this is the symmetric difference, that is, all elements in
1464       either A or in B but not in both. Think of it as an xor operation.
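
       If you want a one-directional difference instead (the elements of
       one array that are missing from the other), a lookup hash built from
       the second array is enough. This is only a sketch; the array
       contents and the names %in_array2 and @only_in_1 are made up for
       illustration:

```perl
use strict;
use warnings;

my @array1 = qw(a b c d);
my @array2 = qw(c d e f);

# Build a lookup table of everything in @array2
my %in_array2 = map { $_ => 1 } @array2;

# Keep only the elements of @array1 that are absent from @array2
my @only_in_1 = grep { !$in_array2{$_} } @array1;

print "@only_in_1\n";    # prints "a b"
```

       Unlike the counting approach, this keeps the original order of
       @array1 and does not require the elements of each array to be unique.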
1465
1466   How do I test whether two arrays or hashes are equal?
1467       The following code works for single-level arrays. It uses a stringwise
1468       comparison, and does not distinguish defined versus undefined empty
1469       strings. Modify if you have other needs.
1470
1471           $are_equal = compare_arrays(\@frogs, \@toads);
1472
1473           sub compare_arrays {
1474               my ($first, $second) = @_;
1475               no warnings;  # silence spurious -w undef complaints
1476               return 0 unless @$first == @$second;
1477               for (my $i = 0; $i < @$first; $i++) {
1478                   return 0 if $first->[$i] ne $second->[$i];
1479               }
1480               return 1;
1481           }
1482
1483       For multilevel structures, you may wish to use an approach more like
1484       this one. It uses the CPAN module FreezeThaw:
1485
1486           use FreezeThaw qw(cmpStr);
1487           my @a = my @b = ( "this", "that", [ "more", "stuff" ] );
1488
1489           printf "a and b contain %s arrays\n",
1490               cmpStr(\@a, \@b) == 0
1491               ? "the same"
1492               : "different";
1493
1494       This approach also works for comparing hashes. Here we'll demonstrate
1495       two different answers:
1496
1497           use FreezeThaw qw(cmpStr cmpStrHard);
1498
1499           my %a = my %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
1500           $a{EXTRA} = \%b;
1501           $b{EXTRA} = \%a;
1502
1503           printf "a and b contain %s hashes\n",
1504           cmpStr(\%a, \%b) == 0 ? "the same" : "different";
1505
1506           printf "a and b contain %s hashes\n",
1507           cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
1508
       The first reports that both of the hashes contain the same data,
       while the second reports that they do not. Which you prefer is left as
       an exercise to the reader.
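
       For single-level hashes you may not need a module at all. The
       following is a minimal sketch in the spirit of compare_arrays; the
       name compare_hashes is hypothetical, it compares values as strings,
       and it treats an undefined value as equal only to another undefined
       value:

```perl
use strict;
use warnings;

# Hypothetical helper: true if two hash references hold the same keys
# with stringwise-equal values (one level deep only).
sub compare_hashes {
    my ($first, $second) = @_;
    return 0 unless scalar(keys %$first) == scalar(keys %$second);
    for my $key (keys %$first) {
        return 0 unless exists $second->{$key};
        my ($x, $y) = ($first->{$key}, $second->{$key});
        # one value defined and the other not: different
        return 0 if (defined($x) xor defined($y));
        return 0 if defined($x) && $x ne $y;
    }
    return 1;
}

my %frogs = ( green => 1, brown => 2 );
my %toads = ( brown => 2, green => 1 );

print compare_hashes(\%frogs, \%toads) ? "same\n" : "different\n";
```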
1512
1513   How do I find the first array element for which a condition is true?
1514       To find the first array element which satisfies a condition, you can
1515       use the "first()" function in the List::Util module, which comes with
1516       Perl 5.8. This example finds the first element that contains "Perl".
1517
1518           use List::Util qw(first);
1519
1520           my $element = first { /Perl/ } @array;
1521
1522       If you cannot use List::Util, you can make your own loop to do the same
1523       thing. Once you find the element, you stop the loop with last.
1524
1525           my $found;
1526           foreach ( @array ) {
1527               if( /Perl/ ) { $found = $_; last }
1528           }
1529
1530       If you want the array index, use the "firstidx()" function from
1531       "List::MoreUtils":
1532
1533           use List::MoreUtils qw(firstidx);
1534           my $index = firstidx { /Perl/ } @array;
1535
1536       Or write it yourself, iterating through the indices and checking the
1537       array element at each index until you find one that satisfies the
1538       condition:
1539
           my( $found, $index ) = ( undef, -1 );
           for( my $i = 0; $i < @array; $i++ ) {
               if( $array[$i] =~ /Perl/ ) {
                   $found = $array[$i];
                   $index = $i;
                   last;
               }
           }
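
       A middle ground, if you have List::Util but not List::MoreUtils, is
       to hand "first()" the list of indices rather than the elements. This
       is a sketch with made-up sample data:

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical sample data
my @array = ( 'Python', 'Ruby', 'Perl 5', 'Perl 6' );

# Search the index list rather than the elements themselves
my $index = first { $array[$_] =~ /Perl/ } 0 .. $#array;

# first() returns undef when nothing matches; index 0 is false but defined
print defined($index) ? "index $index\n" : "no match\n";
```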
1548
1549   How do I handle linked lists?
1550       (contributed by brian d foy)
1551
1552       Perl's arrays do not have a fixed size, so you don't need linked lists
1553       if you just want to add or remove items. You can use array operations
1554       such as "push", "pop", "shift", "unshift", or "splice" to do that.
1555
1556       Sometimes, however, linked lists can be useful in situations where you
1557       want to "shard" an array so you have many small arrays instead of a
1558       single big array. You can keep arrays longer than Perl's largest array
1559       index, lock smaller arrays separately in threaded programs, reallocate
1560       less memory, or quickly insert elements in the middle of the chain.
1561
1562       Steve Lembark goes through the details in his YAPC::NA 2009 talk "Perly
1563       Linked Lists" ( <http://www.slideshare.net/lembark/perly-linked-lists>
1564       ), although you can just use his LinkedList::Single module.
1565
1566   How do I handle circular lists?
1567       (contributed by brian d foy)
1568
1569       If you want to cycle through an array endlessly, you can increment the
1570       index modulo the number of elements in the array:
1571
1572           my @array = qw( a b c );
1573           my $i = 0;
1574
1575           while( 1 ) {
1576               print $array[ $i++ % @array ], "\n";
1577               last if $i > 20;
1578           }
1579
1580       You can also use Tie::Cycle to use a scalar that always has the next
1581       element of the circular array:
1582
1583           use Tie::Cycle;
1584
1585           tie my $cycle, 'Tie::Cycle', [ qw( FFFFFF 000000 FFFF00 ) ];
1586
1587           print $cycle; # FFFFFF
1588           print $cycle; # 000000
1589           print $cycle; # FFFF00
1590
       The Array::Iterator::Circular module creates an iterator object for
       circular arrays:
1593
1594           use Array::Iterator::Circular;
1595
1596           my $color_iterator = Array::Iterator::Circular->new(
1597               qw(red green blue orange)
1598               );
1599
1600           foreach ( 1 .. 20 ) {
1601               print $color_iterator->next, "\n";
1602           }
1603
1604   How do I shuffle an array randomly?
       If you have Perl 5.8.0 or later installed, or Scalar-List-Utils 1.03
       or later, you can say:
1607
1608           use List::Util 'shuffle';
1609
1610           @shuffled = shuffle(@list);
1611
1612       If not, you can use a Fisher-Yates shuffle.
1613
1614           sub fisher_yates_shuffle {
1615               my $deck = shift;  # $deck is a reference to an array
1616               return unless @$deck; # must not be empty!
1617
1618               my $i = @$deck;
1619               while (--$i) {
1620                   my $j = int rand ($i+1);
1621                   @$deck[$i,$j] = @$deck[$j,$i];
1622               }
1623           }
1624
1625           # shuffle my mpeg collection
1626           #
1627           my @mpeg = <audio/*/*.mp3>;
1628           fisher_yates_shuffle( \@mpeg );    # randomize @mpeg in place
1629           print @mpeg;
1630
       Note that the above implementation shuffles an array in place, unlike
       "List::Util::shuffle()", which takes a list and returns a new
       shuffled list.
1634
       You've probably seen shuffling algorithms that work using splice,
       randomly picking another element to swap the current element with:
1637
1638           srand;
1639           @new = ();
1640           @old = 1 .. 10;  # just a demo
1641           while (@old) {
1642               push(@new, splice(@old, rand @old, 1));
1643           }
1644
1645       This is bad because splice is already O(N), and since you do it N
1646       times, you just invented a quadratic algorithm; that is, O(N**2).  This
1647       does not scale, although Perl is so efficient that you probably won't
1648       notice this until you have rather largish arrays.
1649
1650   How do I process/modify each element of an array?
1651       Use "for"/"foreach":
1652
1653           for (@lines) {
1654               s/foo/bar/;    # change that word
1655               tr/XZ/ZX/;    # swap those letters
1656           }
1657
1658       Here's another; let's compute spherical volumes:
1659
1660           my @volumes = @radii;
1661           for (@volumes) {   # @volumes has changed parts
1662               $_ **= 3;
1663               $_ *= (4/3) * 3.14159;  # this will be constant folded
1664           }
1665
1666       which can also be done with "map()" which is made to transform one list
1667       into another:
1668
1669           my @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
1670
1671       If you want to do the same thing to modify the values of the hash, you
1672       can use the "values" function. As of Perl 5.6 the values are not
1673       copied, so if you modify $orbit (in this case), you modify the value.
1674
1675           for my $orbit ( values %orbits ) {
1676               ($orbit **= 3) *= (4/3) * 3.14159;
1677           }
1678
1679       Prior to perl 5.6 "values" returned copies of the values, so older perl
1680       code often contains constructions such as @orbits{keys %orbits} instead
1681       of "values %orbits" where the hash is to be modified.
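
       The aliasing is easy to confirm on a small hash. The sketch below
       uses made-up keys and values:

```perl
use strict;
use warnings;

# Hypothetical sample data
my %orbits = ( inner => 1, outer => 2 );

for my $orbit ( values %orbits ) {
    $orbit *= 10;    # $orbit is an alias, so this changes the hash value
}

print join( ', ', map { $_ . '=' . $orbits{$_} } sort keys %orbits ), "\n";
# prints "inner=10, outer=20"
```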
1682
1683   How do I select a random element from an array?
1684       Use the "rand()" function (see "rand" in perlfunc):
1685
1686           my $index   = rand @array;
1687           my $element = $array[$index];
1688
1689       Or, simply:
1690
1691           my $element = $array[ rand @array ];
1692
1693   How do I permute N elements of a list?
1694       Use the List::Permutor module on CPAN. If the list is actually an
1695       array, try the Algorithm::Permute module (also on CPAN). It's written
1696       in XS code and is very efficient:
1697
1698           use Algorithm::Permute;
1699
1700           my @array = 'a'..'d';
1701           my $p_iterator = Algorithm::Permute->new ( \@array );
1702
1703           while (my @perm = $p_iterator->next) {
1704              print "next permutation: (@perm)\n";
1705           }
1706
1707       For even faster execution, you could do:
1708
1709           use Algorithm::Permute;
1710
1711           my @array = 'a'..'d';
1712
1713           Algorithm::Permute::permute {
1714               print "next permutation: (@array)\n";
1715           } @array;
1716
       Here's a little program that generates all permutations of all the
       words on each line of input. The algorithm embodied in the "permute()"
       function is discussed in Volume 4 of Knuth's The Art of Computer
       Programming and will work on any list:
1721
1722           #!/usr/bin/perl -n
1723           # Fischer-Krause ordered permutation generator
1724
1725           sub permute (&@) {
1726               my $code = shift;
1727               my @idx = 0..$#_;
1728               while ( $code->(@_[@idx]) ) {
1729                   my $p = $#idx;
1730                   --$p while $idx[$p-1] > $idx[$p];
1731                   my $q = $p or return;
1732                   push @idx, reverse splice @idx, $p;
1733                   ++$q while $idx[$p-1] > $idx[$q];
1734                   @idx[$p-1,$q]=@idx[$q,$p-1];
1735               }
1736           }
1737
1738           permute { print "@_\n" } split;
1739
1740       The Algorithm::Loops module also provides the "NextPermute" and
1741       "NextPermuteNum" functions which efficiently find all unique
1742       permutations of an array, even if it contains duplicate values,
1743       modifying it in-place: if its elements are in reverse-sorted order then
1744       the array is reversed, making it sorted, and it returns false;
1745       otherwise the next permutation is returned.
1746
1747       "NextPermute" uses string order and "NextPermuteNum" numeric order, so
1748       you can enumerate all the permutations of 0..9 like this:
1749
1750           use Algorithm::Loops qw(NextPermuteNum);
1751
1752           my @list= 0..9;
1753           do { print "@list\n" } while NextPermuteNum @list;
1754
1755   How do I sort an array by (anything)?
1756       Supply a comparison function to sort() (described in "sort" in
1757       perlfunc):
1758
1759           @list = sort { $a <=> $b } @list;
1760
1761       The default sort function is cmp, string comparison, which would sort
1762       "(1, 2, 10)" into "(1, 10, 2)". "<=>", used above, is the numerical
1763       comparison operator.
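
       The difference is easy to see side by side. A short sketch with a
       made-up list:

```perl
use strict;
use warnings;

my @list = ( 10, 2, 1 );

my @as_strings = sort @list;                  # default: string comparison
my @as_numbers = sort { $a <=> $b } @list;    # numeric, ascending
my @descending = sort { $b <=> $a } @list;    # numeric, descending

print "@as_strings\n";    # prints "1 10 2"
print "@as_numbers\n";    # prints "1 2 10"
print "@descending\n";    # prints "10 2 1"
```

       Swapping $a and $b in the comparison reverses the order.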
1764
1765       If you have a complicated function needed to pull out the part you want
1766       to sort on, then don't do it inside the sort function. Pull it out
1767       first, because the sort BLOCK can be called many times for the same
1768       element. Here's an example of how to pull out the first word after the
1769       first number on each item, and then sort those words case-
1770       insensitively.
1771
1772           my @idx;
1773           for (@data) {
1774               my $item;
1775               ($item) = /\d+\s*(\S+)/;
1776               push @idx, uc($item);
1777           }
1778           my @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
1779
1780       which could also be written this way, using a trick that's come to be
1781       known as the Schwartzian Transform:
1782
1783           my @sorted = map  { $_->[0] }
1784               sort { $a->[1] cmp $b->[1] }
1785               map  { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
1786
1787       If you need to sort on several fields, the following paradigm is
1788       useful.
1789
1790           my @sorted = sort {
1791               field1($a) <=> field1($b) ||
1792               field2($a) cmp field2($b) ||
1793               field3($a) cmp field3($b)
1794           } @data;
1795
1796       This can be conveniently combined with precalculation of keys as given
1797       above.
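
       As a concrete instance of the multi-field paradigm, the sketch below
       sorts some made-up records (hash references with hypothetical "name"
       and "age" fields) by age numerically and then by name:

```perl
use strict;
use warnings;

# Hypothetical records
my @data = (
    { name => 'carol', age => 30 },
    { name => 'alice', age => 30 },
    { name => 'bob',   age => 25 },
);

# The second comparison is only consulted when the ages are equal,
# because <=> returns 0 (false) for a tie.
my @sorted = sort {
    $a->{age}  <=> $b->{age}
        ||
    $a->{name} cmp $b->{name}
} @data;

print join( ' ', map { $_->{name} } @sorted ), "\n";    # prints "bob alice carol"
```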
1798
1799       See the sort article in the "Far More Than You Ever Wanted To Know"
1800       collection in <http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz> for more
1801       about this approach.
1802
1803       See also the question later in perlfaq4 on sorting hashes.
1804
1805   How do I manipulate arrays of bits?
1806       Use "pack()" and "unpack()", or else "vec()" and the bitwise
1807       operations.
1808
1809       For example, you don't have to store individual bits in an array (which
1810       would mean that you're wasting a lot of space). To convert an array of
1811       bits to a string, use "vec()" to set the right bits. This sets $vec to
1812       have bit N set only if $ints[N] was set:
1813
1814           my @ints = (...); # array of bits, e.g. ( 1, 0, 0, 1, 1, 0 ... )
1815           my $vec = '';
1816           foreach( 0 .. $#ints ) {
1817               vec($vec,$_,1) = 1 if $ints[$_];
1818           }
1819
1820       The string $vec only takes up as many bits as it needs. For instance,
1821       if you had 16 entries in @ints, $vec only needs two bytes to store them
1822       (not counting the scalar variable overhead).
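
       You can check the size claim yourself. This sketch packs 16
       made-up bits and reports the resulting string length; "unpack" with
       the "b*" template shows the bits in ascending order:

```perl
use strict;
use warnings;

# 16 hypothetical sample bits; the last one is set so the vector
# really does extend to bit 15
my @ints = ( 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1 );

my $vec = '';
for my $n ( 0 .. $#ints ) {
    vec( $vec, $n, 1 ) = 1 if $ints[$n];
}

printf "%d bits stored in %d bytes\n", scalar(@ints), length($vec);
print unpack( "b*", $vec ), "\n";    # prints "1001100000000001"
```

       Note that the string only grows as far as the highest bit you set,
       so trailing zero bits may make it shorter than you expect.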
1823
1824       Here's how, given a vector in $vec, you can get those bits into your
1825       @ints array:
1826
1827           sub bitvec_to_list {
1828               my $vec = shift;
1829               my @ints;
1830               # Find null-byte density then select best algorithm
1831               if ($vec =~ tr/\0// / length $vec > 0.95) {
1832                   use integer;
1833                   my $i;
1834
1835                   # This method is faster with mostly null-bytes
1836                   while($vec =~ /[^\0]/g ) {
1837                       $i = -9 + 8 * pos $vec;
1838                       push @ints, $i if vec($vec, ++$i, 1);
1839                       push @ints, $i if vec($vec, ++$i, 1);
1840                       push @ints, $i if vec($vec, ++$i, 1);
1841                       push @ints, $i if vec($vec, ++$i, 1);
1842                       push @ints, $i if vec($vec, ++$i, 1);
1843                       push @ints, $i if vec($vec, ++$i, 1);
1844                       push @ints, $i if vec($vec, ++$i, 1);
1845                       push @ints, $i if vec($vec, ++$i, 1);
1846                   }
1847               }
1848               else {
1849                   # This method is a fast general algorithm
1850                   use integer;
1851                   my $bits = unpack "b*", $vec;
1852                   push @ints, 0 if $bits =~ s/^(\d)// && $1;
1853                   push @ints, pos $bits while($bits =~ /1/g);
1854               }
1855
1856               return \@ints;
1857           }
1858
1859       This method gets faster the more sparse the bit vector is.  (Courtesy
1860       of Tim Bunce and Winfried Koenig.)
1861
1862       You can make the while loop a lot shorter with this suggestion from
1863       Benjamin Goldberg:
1864
1865           while($vec =~ /[^\0]+/g ) {
1866               push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
1867           }
1868
1869       Or use the CPAN module Bit::Vector:
1870
           use Bit::Vector;

           my $vector = Bit::Vector->new($num_of_bits);
           $vector->Index_List_Store(@ints);
           my @ints = $vector->Index_List_Read();
1874
       Bit::Vector provides efficient methods for bit vectors, sets of small
       integers, and "big int" math.
1877
1878       Here's a more extensive illustration using vec():
1879
1880           # vec demo
1881           my $vector = "\xff\x0f\xef\xfe";
1882           print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
1883           unpack("N", $vector), "\n";
1884           my $is_set = vec($vector, 23, 1);
1885           print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
1886           pvec($vector);
1887
1888           set_vec(1,1,1);
1889           set_vec(3,1,1);
1890           set_vec(23,1,1);
1891
1892           set_vec(3,1,3);
1893           set_vec(3,2,3);
1894           set_vec(3,4,3);
1895           set_vec(3,4,7);
1896           set_vec(3,8,3);
1897           set_vec(3,8,7);
1898
1899           set_vec(0,32,17);
1900           set_vec(1,32,17);
1901
1902           sub set_vec {
1903               my ($offset, $width, $value) = @_;
1904               my $vector = '';
1905               vec($vector, $offset, $width) = $value;
1906               print "offset=$offset width=$width value=$value\n";
1907               pvec($vector);
1908           }
1909
           sub pvec {
               my $vector = shift;
               my $bits = unpack("b*", $vector);

               print "vector length in bytes: ", length($vector), "\n";
               my @bytes = unpack("A8" x length($vector), $bits);
               print "bits are: @bytes\n\n";
           }
1920
1921   Why does defined() return true on empty arrays and hashes?
1922       The short story is that you should probably only use defined on scalars
1923       or functions, not on aggregates (arrays and hashes). See "defined" in
1924       perlfunc in the 5.004 release or later of Perl for more detail.
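
       If what you really want to know is whether an array or hash holds
       anything, test it directly in boolean context instead. A minimal
       sketch with made-up data:

```perl
use strict;
use warnings;

my @array = ();
my %hash  = ();

# In boolean context an array or hash is true only if it has elements
print "array is empty\n" unless @array;
print "hash is empty\n"  unless %hash;

push @array, 'x';
$hash{key} = undef;    # even an undef value counts as an entry

print "array has ", scalar(@array),     " element(s)\n" if @array;
print "hash has ",  scalar(keys %hash), " key(s)\n"     if %hash;
```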
1925

Data: Hashes (Associative Arrays)

1927   How do I process an entire hash?
1928       (contributed by brian d foy)
1929
       There are a couple of ways that you can process an entire hash. You can
       get a list of keys, then go through each key, or grab one key-value
       pair at a time.
1933
1934       To go through all of the keys, use the "keys" function. This extracts
1935       all of the keys of the hash and gives them back to you as a list. You
1936       can then get the value through the particular key you're processing:
1937
           foreach my $key ( keys %hash ) {
               my $value = $hash{$key};
               ...
           }
1942
1943       Once you have the list of keys, you can process that list before you
1944       process the hash elements. For instance, you can sort the keys so you
1945       can process them in lexical order:
1946
           foreach my $key ( sort keys %hash ) {
               my $value = $hash{$key};
               ...
           }
1951
1952       Or, you might want to only process some of the items. If you only want
1953       to deal with the keys that start with "text:", you can select just
1954       those using "grep":
1955
           foreach my $key ( grep /^text:/, keys %hash ) {
               my $value = $hash{$key};
               ...
           }
1960
1961       If the hash is very large, you might not want to create a long list of
1962       keys. To save some memory, you can grab one key-value pair at a time
1963       using "each()", which returns a pair you haven't seen yet:
1964
1965           while( my( $key, $value ) = each( %hash ) ) {
1966               ...
1967           }
1968
1969       The "each" operator returns the pairs in apparently random order, so if
1970       ordering matters to you, you'll have to stick with the "keys" method.
1971
1972       The "each()" operator can be a bit tricky though. You can't add or
1973       delete keys of the hash while you're using it without possibly skipping
1974       or re-processing some pairs after Perl internally rehashes all of the
1975       elements. Additionally, a hash has only one iterator, so if you mix
1976       "keys", "values", or "each" on the same hash, you risk resetting the
1977       iterator and messing up your processing. See the "each" entry in
1978       perlfunc for more details.
1979
1980   How do I merge two hashes?
1981       (contributed by brian d foy)
1982
1983       Before you decide to merge two hashes, you have to decide what to do if
1984       both hashes contain keys that are the same and if you want to leave the
1985       original hashes as they were.
1986
       If you want to preserve the original hashes, copy one hash (%hash1) to
       a new hash (%new_hash), then add the keys from the other hash (%hash2)
       to the new hash. Checking that the key already exists in %new_hash
       gives you a chance to decide what to do with the duplicates:
1991
1992           my %new_hash = %hash1; # make a copy; leave %hash1 alone
1993
1994           foreach my $key2 ( keys %hash2 ) {
1995               if( exists $new_hash{$key2} ) {
1996                   warn "Key [$key2] is in both hashes!";
1997                   # handle the duplicate (perhaps only warning)
1998                   ...
1999                   next;
2000               }
2001               else {
2002                   $new_hash{$key2} = $hash2{$key2};
2003               }
2004           }
2005
2006       If you don't want to create a new hash, you can still use this looping
2007       technique; just change the %new_hash to %hash1.
2008
           foreach my $key2 ( keys %hash2 ) {
               if( exists $hash1{$key2} ) {
                   warn "Key [$key2] is in both hashes!";
                   # handle the duplicate (perhaps only warning)
                   ...
                   next;
               }
               else {
                   $hash1{$key2} = $hash2{$key2};
               }
           }
2020
2021       If you don't care that one hash overwrites keys and values from the
2022       other, you could just use a hash slice to add one hash to another. In
2023       this case, values from %hash2 replace values from %hash1 when they have
2024       keys in common:
2025
2026           @hash1{ keys %hash2 } = values %hash2;
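
       When you want the same "last one wins" behavior but also want to
       leave both originals alone, you can build a third hash from the two
       flattened lists. This is a sketch with made-up data; %merged is a
       hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical sample data
my %hash1 = ( a => 1,  b => 2 );
my %hash2 = ( b => 20, c => 30 );

# Later pairs win, so %hash2's value for 'b' replaces %hash1's
my %merged = ( %hash1, %hash2 );

print join( ', ', map { $_ . '=' . $merged{$_} } sort keys %merged ), "\n";
# prints "a=1, b=20, c=30"
```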
2027
2028   What happens if I add or remove keys from a hash while iterating over it?
2029       (contributed by brian d foy)
2030
2031       The easy answer is "Don't do that!"
2032
2033       If you iterate through the hash with each(), you can delete the key
2034       most recently returned without worrying about it. If you delete or add
2035       other keys, the iterator may skip or double up on them since perl may
2036       rearrange the hash table. See the entry for "each()" in perlfunc.
2037
2038   How do I look up a hash element by value?
2039       Create a reverse hash:
2040
2041           my %by_value = reverse %by_key;
2042           my $key = $by_value{$value};
2043
       That's not particularly efficient, because "reverse" has to build a
       temporary list of every key and value. It would be more space-efficient
       to use:

           my %by_value;
           while (my ($key, $value) = each %by_key) {
               $by_value{$value} = $key;
           }
2050
2051       If your hash could have repeated values, the methods above will only
2052       find one of the associated keys.  This may or may not worry you. If it
2053       does worry you, you can always reverse the hash into a hash of arrays
2054       instead:
2055
2056           while (my ($key, $value) = each %by_key) {
2057                push @{$key_list_by_value{$value}}, $key;
2058           }
2059
2060   How can I know how many entries are in a hash?
2061       (contributed by brian d foy)
2062
2063       This is very similar to "How do I process an entire hash?", also in
2064       perlfaq4, but a bit simpler in the common cases.
2065
       You can use the "keys()" built-in function in scalar context to find
       out how many entries you have in a hash:
2068
2069           my $key_count = keys %hash; # must be scalar context!
2070
2071       If you want to find out how many entries have a defined value, that's a
2072       bit different. You have to check each value. A "grep" is handy:
2073
2074           my $defined_value_count = grep { defined } values %hash;
2075
2076       You can use that same structure to count the entries any way that you
2077       like. If you want the count of the keys with vowels in them, you just
2078       test for that instead:
2079
2080           my $vowel_count = grep { /[aeiou]/ } keys %hash;
2081
2082       The "grep" in scalar context returns the count. If you want the list of
2083       matching items, just use it in list context instead:
2084
2085           my @defined_values = grep { defined } values %hash;
2086
2087       The "keys()" function also resets the iterator, which means that you
2088       may see strange results if you use this between uses of other hash
2089       operators such as "each()".
2090
2091   How do I sort a hash (optionally by value instead of key)?
2092       (contributed by brian d foy)
2093
2094       To sort a hash, start with the keys. In this example, we give the list
2095       of keys to the sort function which then compares them ASCIIbetically
2096       (which might be affected by your locale settings). The output list has
2097       the keys in ASCIIbetical order. Once we have the keys, we can go
2098       through them to create a report which lists the keys in ASCIIbetical
2099       order.
2100
2101           my @keys = sort { $a cmp $b } keys %hash;
2102
2103           foreach my $key ( @keys ) {
2104               printf "%-20s %6d\n", $key, $hash{$key};
2105           }
2106
2107       We could get more fancy in the "sort()" block though. Instead of
2108       comparing the keys, we can compute a value with them and use that value
2109       as the comparison.
2110
2111       For instance, to make our report order case-insensitive, we use "lc" to
2112       lowercase the keys before comparing them:
2113
2114           my @keys = sort { lc $a cmp lc $b } keys %hash;
2115
2116       Note: if the computation is expensive or the hash has many elements,
2117       you may want to look at the Schwartzian Transform to cache the
2118       computation results.
2119
2120       If we want to sort by the hash value instead, we use the hash key to
2121       look it up. We still get out a list of keys, but this time they are
2122       ordered by their value.
2123
2124           my @keys = sort { $hash{$a} <=> $hash{$b} } keys %hash;
2125
2126       From there we can get more complex. If the hash values are the same, we
2127       can provide a secondary sort on the hash key.
2128
2129           my @keys = sort {
2130               $hash{$a} <=> $hash{$b}
2131                   or
2132               "\L$a" cmp "\L$b"
2133           } keys %hash;
2134
2135   How can I always keep my hash sorted?
2136       You can look into using the "DB_File" module and "tie()" using the
2137       $DB_BTREE hash bindings as documented in "In Memory Databases" in
2138       DB_File. The Tie::IxHash module from CPAN might also be instructive.
2139       Although this does keep your hash sorted, you might not like the
2140       slowdown you suffer from the tie interface. Are you sure you need to do
2141       this? :)
2142
2143   What's the difference between "delete" and "undef" with hashes?
2144       Hashes contain pairs of scalars: the first is the key, the second is
2145       the value. The key will be coerced to a string, although the value can
2146       be any kind of scalar: string, number, or reference. If a key $key is
2147       present in %hash, "exists($hash{$key})" will return true. The value for
2148       a given key can be "undef", in which case $hash{$key} will be "undef"
2149       while "exists $hash{$key}" will return true. This corresponds to ($key,
2150       "undef") being in the hash.
2151
2152       Pictures help... Here's the %hash table:
2153
2154             keys  values
2155           +------+------+
2156           |  a   |  3   |
2157           |  x   |  7   |
2158           |  d   |  0   |
2159           |  e   |  2   |
2160           +------+------+
2161
2162       And these conditions hold
2163
2164           $hash{'a'}                       is true
2165           $hash{'d'}                       is false
2166           defined $hash{'d'}               is true
2167           defined $hash{'a'}               is true
2168           exists $hash{'a'}                is true (Perl 5 only)
2169           grep ($_ eq 'a', keys %hash)     is true
2170
2171       If you now say
2172
2173           undef $hash{'a'}
2174
2175       your table now reads:
2176
2177             keys  values
2178           +------+------+
2179           |  a   | undef|
2180           |  x   |  7   |
2181           |  d   |  0   |
2182           |  e   |  2   |
2183           +------+------+
2184
2185       and these conditions now hold; changes in caps:
2186
2187           $hash{'a'}                       is FALSE
2188           $hash{'d'}                       is false
2189           defined $hash{'d'}               is true
2190           defined $hash{'a'}               is FALSE
2191           exists $hash{'a'}                is true (Perl 5 only)
2192           grep ($_ eq 'a', keys %hash)     is true
2193
2194       Notice the last two: you have an undef value, but a defined key!
2195
2196       Now, consider this:
2197
2198           delete $hash{'a'}
2199
2200       your table now reads:
2201
2202             keys  values
2203           +------+------+
2204           |  x   |  7   |
2205           |  d   |  0   |
2206           |  e   |  2   |
2207           +------+------+
2208
2209       and these conditions now hold; changes in caps:
2210
2211           $hash{'a'}                       is false
2212           $hash{'d'}                       is false
2213           defined $hash{'d'}               is true
2214           defined $hash{'a'}               is false
2215           exists $hash{'a'}                is FALSE (Perl 5 only)
2216           grep ($_ eq 'a', keys %hash)     is FALSE
2217
2218       See, the whole entry is gone!
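
       The same sequence can be run as code. This sketch uses the table
       above and prints 1 or 0 for each test:

```perl
use strict;
use warnings;

my %hash = ( a => 3, x => 7, d => 0, e => 2 );

undef $hash{'a'};     # the key stays, the value becomes undef
print "after undef:  exists=", ( exists $hash{'a'}  ? 1 : 0 ),
      " defined=",             ( defined $hash{'a'} ? 1 : 0 ), "\n";
# prints "after undef:  exists=1 defined=0"

delete $hash{'a'};    # the whole entry goes away
print "after delete: exists=", ( exists $hash{'a'}  ? 1 : 0 ),
      " defined=",             ( defined $hash{'a'} ? 1 : 0 ), "\n";
# prints "after delete: exists=0 defined=0"
```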
2219
2220   Why don't my tied hashes make the defined/exists distinction?
2221       This depends on the tied hash's implementation of EXISTS().  For
2222       example, there isn't the concept of undef with hashes that are tied to
2223       DBM* files. It also means that exists() and defined() do the same thing
2224       with a DBM* file, and what they end up doing is not what they do with
2225       ordinary hashes.
2226
2227   How do I reset an each() operation part-way through?
2228       (contributed by brian d foy)
2229
2230       You can use the "keys" or "values" functions to reset "each". To simply
2231       reset the iterator used by "each" without doing anything else, use one
2232       of them in void context:
2233
2234           keys %hash; # resets iterator, nothing else.
2235           values %hash; # resets iterator, nothing else.
2236
2237       See the documentation for "each" in perlfunc.
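
       The reset matters most after leaving an "each" loop early. A minimal
       sketch, with a made-up three-key hash, of what the void-context
       "keys" call buys you:

```perl
use strict;
use warnings;

my %hash = ( a => 1, b => 2, c => 3 );

# Leave the loop after the first pair; the iterator stays parked there.
while ( my ( $key, $value ) = each %hash ) {
    last;
}

keys %hash;    # void context: resets the iterator, nothing else

# A fresh pass now starts from the beginning and sees every key.
my $count = 0;
while ( defined( my $key = each %hash ) ) {
    $count++;
}

print "visited $count keys\n";
```

       Without the reset, the second loop would pick up where the first one
       stopped and miss a key.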
2238
2239   How can I get the unique keys from two hashes?
2240       First you extract the keys from the hashes into lists, then solve the
2241       "removing duplicates" problem described above. For example:
2242
2243           my %seen = ();
2244           for my $element (keys(%foo), keys(%bar)) {
2245               $seen{$element}++;
2246           }
2247           my @uniq = keys %seen;
2248
2249       Or more succinctly:
2250
2251           my @uniq = keys %{{%foo,%bar}};
2252
2253       Or if you really want to save space:
2254
2255           my %seen = ();
2256           while (defined (my $key = each %foo)) {
2257               $seen{$key}++;
2258           }
2259           while (defined (my $key = each %bar)) {
2260               $seen{$key}++;
2261           }
2262           my @uniq = keys %seen;
2263
2264   How can I store a multidimensional array in a DBM file?
2265       Either stringify the structure yourself (no fun), or else get the MLDBM
2266       (which uses Data::Dumper) module from CPAN and layer it on top of
2267       either DB_File or GDBM_File. You might also try DBM::Deep, but it can
2268       be a bit slow.
2269
2270   How can I make my hash remember the order I put elements into it?
2271       Use the Tie::IxHash module from CPAN.
2272
2273           use Tie::IxHash;
2274
2275           tie my %myhash, 'Tie::IxHash';
2276
2277           for (my $i=0; $i<20; $i++) {
2278               $myhash{$i} = 2*$i;
2279           }
2280
2281           my @keys = keys %myhash;
2282           # @keys = (0,1,2,3,...)
2283
2284   Why does passing a subroutine an undefined element in a hash create it?
2285       (contributed by brian d foy)
2286
2287       Are you using a really old version of Perl?
2288
2289       Normally, accessing a hash key's value for a nonexistent key will not
2290       create the key.
2291
2292           my %hash  = ();
2293           my $value = $hash{ 'foo' };
2294           print "This won't print\n" if exists $hash{ 'foo' };
2295
2296       Passing $hash{ 'foo' } to a subroutine used to be a special case,
2297       though.  Since you could assign directly to $_[0], Perl had to be ready
2298       to make that assignment so it created the hash key ahead of time:
2299
2300           my_sub( $hash{ 'foo' } );
2301           print "This will print before 5.004\n" if exists $hash{ 'foo' };
2302
2303           sub my_sub {
2304               # $_[0] = 'bar'; # create hash key in case you do this
2305               1;
2306           }
2307
2308       Since Perl 5.004, however, this situation is a special case and Perl
2309       creates the hash key only when you make the assignment:
2310
2311           my_sub( $hash{ 'foo' } );
2312           print "This will print, even after 5.004\n" if exists $hash{ 'foo' };
2313
2314           sub my_sub {
2315               $_[0] = 'bar';
2316           }
2317
2318       However, if you want the old behavior (and think carefully about that
2319       because it's a weird side effect), you can pass a hash slice instead.
2320       Perl 5.004 didn't make this a special case:
2321
2322           my_sub( @hash{ qw/foo/ } );
2323
2324   How can I make the Perl equivalent of a C structure/C++ class/hash or array
2325       of hashes or arrays?
2326       Usually a hash ref, perhaps like this:
2327
2328           $record = {
2329               NAME   => "Jason",
2330               EMPNO  => 132,
2331               TITLE  => "deputy peon",
2332               AGE    => 23,
2333               SALARY => 37_000,
2334               PALS   => [ "Norbert", "Rhys", "Phineas"],
2335           };
2336
2337       References are documented in perlref and perlreftut.  Examples of
2338       complex data structures are given in perldsc and perllol. Examples of
2339       structures and object-oriented classes are in perlootut.
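
       Getting data back out of such a record uses the arrow operator. The
       sketch below trims the record to a few fields and adds a second,
       made-up employee to show an array of records:

```perl
use strict;
use warnings;

my $record = {
    NAME  => "Jason",
    EMPNO => 132,
    PALS  => [ "Norbert", "Rhys", "Phineas" ],
};

my $name      = $record->{NAME};         # a scalar field
my $first_pal = $record->{PALS}[0];      # an element of the nested array
my $pal_count = @{ $record->{PALS} };    # nested array in scalar context

# An array of records; the second entry is invented for illustration.
my @staff = ( $record, { NAME => "Ada", EMPNO => 7, PALS => [] } );
my @names = map { $_->{NAME} } @staff;

print "$name has $pal_count pals, the first is $first_pal\n";
print "staff: @names\n";
```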
2340
2341   How can I use a reference as a hash key?
2342       (contributed by brian d foy and Ben Morrow)
2343
2344       Hash keys are strings, so you can't really use a reference as the key.
2345       When you try to do that, perl turns the reference into its stringified
2346       form (for instance, "HASH(0xDEADBEEF)"). From there you can't get back
2347       the reference from the stringified form, at least without doing some
2348       extra work on your own.
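
       You can watch the stringification happen. A small sketch (the
       variable names are arbitrary):

```perl
use strict;
use warnings;

my %hash;
my $ref = [ 1, 2, 3 ];
$hash{$ref} = 'some value';

my ($key) = keys %hash;

my $key_is_ref  = ref($key)     ? 1 : 0;    # 0: the key is a plain string
my $same_string = $key eq "$ref" ? 1 : 0;   # 1: it is the stringified reference
my $looks_like  = $key =~ /^ARRAY\(0x[0-9a-f]+\)\z/i ? 1 : 0;

print "key is '$key'\n";
```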
2349
2350       Remember that the entry in the hash will still be there even if the
2351       referenced variable goes out of scope, and that it is entirely
2352       possible for Perl to subsequently allocate a different variable at the
2353       same address. This will mean a new variable might accidentally be
2354       associated with the value for an old one.
2355
2356       If you have Perl 5.10 or later, and you just want to store a value
2357       against the reference for lookup later, you can use the core
2358       Hash::Util::FieldHash module. This will also handle renaming the keys
2359       if you use multiple threads (which causes all variables to be
2360       reallocated at new addresses, changing their stringification), and
2361       garbage-collecting the entries when the referenced variable goes out of
2362       scope.
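
       A minimal sketch of the garbage-collecting behavior, assuming Perl
       5.10 or later. Note that the module is spelled Hash::Util::FieldHash;
       the hash name %notes and the sample object are made up here:

```perl
use strict;
use warnings;
use Hash::Util::FieldHash qw(fieldhash);

fieldhash my %notes;    # turn %notes into a field hash

my ( $inside, $outside );
{
    my $object = { name => 'temporary' };
    $notes{$object} = 'seen';
    $inside = keys %notes;    # entry is present while $object is alive
}
$outside = keys %notes;       # entry is removed once $object is freed

print "entries inside the block: $inside, afterwards: $outside\n";
```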
2363
2364       If you actually need to be able to get a real reference back from each
2365       hash entry, you can use the Tie::RefHash module, which does the
2366       required work for you.
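
       For example, with a tied hash the keys come back as usable
       references. This is a sketch with arbitrary data:

```perl
use strict;
use warnings;
use Tie::RefHash;

tie my %hash, 'Tie::RefHash';

my $ref = [ 10, 20, 30 ];
$hash{$ref} = 'some value';

my ($key)  = keys %hash;
my $type   = ref $key;     # the key is still an array reference
my $second = $key->[1];    # so it can be dereferenced

print "key is a $type reference, second element is $second\n";
```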
2367
2368   How can I check if a key exists in a multilevel hash?
2369       (contributed by brian d foy)
2370
2371       The trick to this problem is avoiding accidental autovivification. If
2372       you want to check three keys deep, you might naïvely try this:
2373
2374           my %hash;
2375           if( exists $hash{key1}{key2}{key3} ) {
2376               ...;
2377           }
2378
2379       Even though you started with a completely empty hash, after that call
2380       to "exists" you've created the structure you needed to check for
2381       "key3":
2382
2383           %hash = (
2384                     'key1' => {
2385                                 'key2' => {}
2386                               }
2387                   );
2388
2389       That's autovivification. You can get around this in a few ways. The
2390       easiest way is to just turn it off. The lexical "autovivification"
2391       pragma is available on CPAN. Now you don't add to the hash:
2392
2393           {
2394               no autovivification;
2395               my %hash;
2396               if( exists $hash{key1}{key2}{key3} ) {
2397                   ...;
2398               }
2399           }
2400
2401       The Data::Diver module on CPAN can do it for you too. Its "Dive"
2402       subroutine can tell you not only if the keys exist but also get the
2403       value:
2404
2405           use Data::Diver qw(Dive);
2406
2407           my @exists = Dive( \%hash, qw(key1 key2 key3) );
2408           if(  ! @exists  ) {
2409               ...; # keys do not exist
2410           }
2411           elsif(  ! defined $exists[0]  ) {
2412               ...; # keys exist but value is undef
2413           }
2414
2415       You can easily do this yourself too by checking each level of the hash
2416       before you move onto the next level. This is essentially what
2417       Data::Diver does for you:
2418
2419           if( check_hash( \%hash, qw(key1 key2 key3) ) ) {
2420               ...;
2421           }
2422
2423           sub check_hash {
2424              my( $hash, @keys ) = @_;
2425
2426              return unless @keys;
2427
2428              foreach my $key ( @keys ) {
2429                  return unless eval { exists $hash->{$key} };
2430                  $hash = $hash->{$key};
2431              }
2432
2433              return 1;
2434           }
2435
2436   How can I prevent addition of unwanted keys into a hash?
2437       Since version 5.8.0, hashes can be restricted to a fixed set of
2438       given keys. Methods for creating and dealing with restricted hashes are
2439       exported by the Hash::Util module.
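
       As a sketch, "lock_keys" and "unlock_keys" from Hash::Util cover the
       common case. The %config hash and its keys are invented for the
       example:

```perl
use strict;
use warnings;
use Hash::Util qw(lock_keys unlock_keys);

my %config = ( host => 'localhost', port => 8080 );
lock_keys(%config);

$config{port} = 9090;    # values of existing keys can still change

# A mistyped key is now a fatal error instead of a silent new entry.
my $ok    = eval { $config{prot} = 1; 1 };
my $error = $@;

unlock_keys(%config);
$config{timeout} = 30;   # new keys are allowed again after unlocking

print "rejected: $error" unless $ok;
```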
2440

Data: Misc

2442   How do I handle binary data correctly?
2443       Perl is binary-clean, so it can handle binary data just fine.  On
2444       Windows or DOS, however, you have to use "binmode" for binary files to
2445       avoid conversions for line endings. In general, you should use
2446       "binmode" any time you want to work with binary data.
2447
2448       Also see "binmode" in perlfunc or perlopentut.
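
       A short round-trip sketch: write some bytes that text-mode
       translation could mangle, read them back, and compare. The temporary
       file and the byte string are arbitrary choices for the example:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Bytes that line-ending translation or text-mode EOF handling could alter.
my $bytes = "\x00\x0D\x0A\xFF\x1A";

my ( $out, $path ) = tempfile( UNLINK => 1 );
binmode $out;                      # no translation on output
print {$out} $bytes;
close $out or die "close failed: $!";

open my $in, '<', $path or die "cannot open $path: $!";
binmode $in;                       # no translation on input
my $read = do { local $/; <$in> }; # slurp the whole file
close $in;

print $read eq $bytes ? "round trip ok\n" : "data changed\n";
```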
2449
2450       If you're concerned about 8-bit textual data then see perllocale.  If
2451       you want to deal with multibyte characters, however, there are some
2452       gotchas. See the section on Regular Expressions.
2453
2454   How do I determine whether a scalar is a number/whole/integer/float?
2455       Assuming that you don't care about IEEE notations like "NaN" or
2456       "Infinity", you probably just want to use a regular expression (see
2457       also perlretut and perlre):
2458
2459           use 5.010;
2460
2461           if ( /\D/ )
2462               { say "\thas nondigits"; }
2463           if ( /^\d+\z/ )
2464               { say "\tis a whole number"; }
2465           if ( /^-?\d+\z/ )
2466               { say "\tis an integer"; }
2467           if ( /^[+-]?\d+\z/ )
2468               { say "\tis a +/- integer"; }
2469           if ( /^-?(?:\d+\.?|\.\d)\d*\z/ )
2470               { say "\tis a real number"; }
2471           if ( /^[+-]?(?=\.?\d)\d*\.?\d*(?:e[+-]?\d+)?\z/i )
2472               { say "\tis a C float" }
2473
2474       There are also some commonly used modules for the task.  Scalar::Util
2475       (distributed with 5.8) provides access to perl's internal function
2476       "looks_like_number" for determining whether a variable looks like a
2477       number. Data::Types exports functions that validate data types using
2478       both the above and other regular expressions. Thirdly, there is
2479       Regexp::Common which has regular expressions to match various types of
2480       numbers. Those three modules are available from the CPAN.
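
       A quick sketch of "looks_like_number" on a few sample strings (the
       samples are arbitrary):

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

my %verdict;
for my $candidate ( '42', '-3.14', '1e5', 'abc', '', '12abc' ) {
    $verdict{$candidate} = looks_like_number($candidate) ? 1 : 0;
}

for my $candidate ( sort keys %verdict ) {
    printf "%-8s %s\n", "'$candidate'",
        $verdict{$candidate} ? 'looks like a number' : 'does not';
}
```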
2481
2482       If you're on a POSIX system, Perl supports the "POSIX::strtod" function
2483       for converting strings to doubles (and also "POSIX::strtol" for longs).
2484       Its semantics are somewhat cumbersome, so here's a "getnum" wrapper
2485       function for more convenient access. This function takes a string and
2486       returns the number it found, or "undef" for input that isn't a C float.
2487       The "is_numeric" function is a front end to "getnum" if you just want
2488       to say, "Is this a float?"
2489
2490           sub getnum {
2491               use POSIX qw(strtod);
2492               my $str = shift;
2493               $str =~ s/^\s+//;
2494               $str =~ s/\s+$//;
2495               $! = 0;
2496               my($num, $unparsed) = strtod($str);
2497               if (($str eq '') || ($unparsed != 0) || $!) {
2498                   return undef;
2499               }
2500               else {
2501                   return $num;
2502               }
2503           }
2504
2505           sub is_numeric { defined getnum($_[0]) }
2506
2507       Or you could check out the String::Scanf module on the CPAN instead.
2508
2509   How do I keep persistent data across program calls?
2510       For some specific applications, you can use one of the DBM modules.
2511       See AnyDBM_File. More generally, you should consult the FreezeThaw or
2512       Storable modules from CPAN. Starting from Perl 5.8, Storable is part of
2513       the standard distribution. Here's one example using Storable's "store"
2514       and "retrieve" functions:
2515
2516           use Storable;
2517           store(\%hash, "filename");
2518
2519           # later on...
2520           $href = retrieve("filename");        # by ref
2521           %hash = %{ retrieve("filename") };   # direct to hash
2522
2523   How do I print out or copy a recursive data structure?
2524       The Data::Dumper module on CPAN (or the 5.005 release of Perl) is great
2525       for printing out data structures. The Storable module on CPAN (or the
2526       5.8 release of Perl) provides a function called "dclone" that
2527       recursively copies its argument.
2528
2529           use Storable qw(dclone);
2530           $r2 = dclone($r1);
2531
2532       Where $r1 can be a reference to any kind of data structure you'd like.
2533       It will be deeply copied. Because "dclone" takes and returns
2534       references, you'd have to add extra punctuation if you had a hash of
2535       arrays that you wanted to copy.
2536
2537           %newhash = %{ dclone(\%oldhash) };
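
       To see why the deep copy matters, compare it with a plain hash
       assignment, which copies only the top level. A sketch with made-up
       data:

```perl
use strict;
use warnings;
use Storable qw(dclone);

my %oldhash = ( fruits => [ 'apple', 'pear' ] );

my %shallow = %oldhash;                   # copies only the top level
my %deep    = %{ dclone( \%oldhash ) };   # copies the nested array too

push @{ $oldhash{fruits} }, 'plum';       # change the original afterwards

my $shallow_count = @{ $shallow{fruits} };   # shares the original's array
my $deep_count    = @{ $deep{fruits} };      # has its own array

print "shallow copy sees $shallow_count fruits, deep copy sees $deep_count\n";
```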
2538
2539   How do I define methods for every class/object?
2540       (contributed by Ben Morrow)
2541
2542       You can use the "UNIVERSAL" class (see UNIVERSAL). However, please be
2543       very careful to consider the consequences of doing this: adding methods
2544       to every object is very likely to have unintended consequences. If
2545       possible, it would be better to have all your objects inherit from some
2546       common base class, or to use an object system like Moose that supports
2547       roles.
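
       A sketch of the base-class alternative. The package names My::Base,
       My::Dog, and My::Cat and the "describe" method are hypothetical:

```perl
use strict;
use warnings;

package My::Base;    # hypothetical common base class
sub new      { my ( $class, %args ) = @_; return bless {%args}, $class }
sub describe { my ($self) = @_; return ref($self) . " object" }

package My::Dog;
our @ISA = ('My::Base');

package My::Cat;
our @ISA = ('My::Base');

package main;

# Both classes get describe() without touching UNIVERSAL.
my @descriptions = map { $_->new->describe } qw(My::Dog My::Cat);
print "$_\n" for @descriptions;
```

       Only classes that opt in by inheriting from the base class gain the
       method, so unrelated objects are left alone.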
2548
2549   How do I verify a credit card checksum?
2550       Get the Business::CreditCard module from CPAN.
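
       If you only want to see what such a checksum involves, card numbers
       are commonly validated with the Luhn algorithm. The sketch below is a
       hand-rolled version of that algorithm, not the module's code, and
       "luhn_ok" is a name invented here. The number used is a widely
       published test number, not a real card.

```perl
use strict;
use warnings;

# luhn_ok is a hypothetical helper, not part of Business::CreditCard.
sub luhn_ok {
    my ($number) = @_;
    $number =~ tr/ -//d;                     # drop spaces and dashes
    return 0 unless $number =~ /^\d+\z/;     # digits only from here on

    my $sum    = 0;
    my @digits = reverse split //, $number;  # work from the rightmost digit
    for my $i ( 0 .. $#digits ) {
        my $digit = $digits[$i];
        if ( $i % 2 ) {                      # every second digit from the right
            $digit *= 2;
            $digit -= 9 if $digit > 9;       # same as adding the two digits
        }
        $sum += $digit;
    }
    return $sum % 10 == 0 ? 1 : 0;
}

print luhn_ok('4111 1111 1111 1111') ? "valid\n" : "invalid\n";
```

       The module does more than this, so prefer it for real work.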
2551
2552   How do I pack arrays of doubles or floats for XS code?
2553       The arrays.h/arrays.c code in the PGPLOT module on CPAN does just this.
2554       If you're doing a lot of float or double processing, consider using the
2555       PDL module from CPAN instead--it makes number-crunching easy.
2556
2557       See <https://metacpan.org/release/PGPLOT> for the code.
2558

AUTHOR AND COPYRIGHT

2560       Copyright (c) 1997-2010 Tom Christiansen, Nathan Torkington, and other
2561       authors as noted. All rights reserved.
2562
2563       This documentation is free; you can redistribute it and/or modify it
2564       under the same terms as Perl itself.
2565
2566       Irrespective of its distribution, all code examples in this file are
2567       hereby placed into the public domain. You are permitted and encouraged
2568       to use this code in your own programs for fun or for profit as you see
2569       fit. A simple comment in the code giving credit would be courteous but
2570       is not required.
2571
2572
2573
2574perl v5.36.0                      2022-07-22                       perlfaq4(3)