perlfaq4(3) User Contributed Perl Documentation perlfaq4(3)

perlfaq4 - Data Manipulation

version 5.20190126
10
12 This section of the FAQ answers questions related to manipulating
13 numbers, dates, strings, arrays, hashes, and miscellaneous data issues.
14
16 Why am I getting long decimals (eg, 19.9499999999999) instead of the
17 numbers I should be getting (eg, 19.95)?
18 For the long explanation, see David Goldberg's "What Every Computer
19 Scientist Should Know About Floating-Point Arithmetic"
20 (<http://web.cse.msu.edu/~cse320/Documents/FloatingPoint.pdf>).
21
22 Internally, your computer represents floating-point numbers in binary.
23 Digital (as in powers of two) computers cannot store all numbers
24 exactly. Some real numbers lose precision in the process. This is a
25 problem with how computers store numbers and affects all computer
26 languages, not just Perl.
27
28 perlnumber shows the gory details of number representations and
29 conversions.
30
31 To limit the number of decimal places in your numbers, you can use the
32 "printf" or "sprintf" function. See "Floating-point Arithmetic" in
33 perlop for more details.
34
35 printf "%.2f", 10/3;
36
37 my $number = sprintf "%.2f", 10/3;
38
39 Why is int() broken?
40 Your "int()" is most probably working just fine. It's the numbers that
41 aren't quite what you think.
42
43 First, see the answer to "Why am I getting long decimals (eg,
44 19.9499999999999) instead of the numbers I should be getting (eg,
45 19.95)?".
46
47 For example, this
48
49 print int(0.6/0.2-2), "\n";
50
will on most computers print 0, not 1, because even such simple numbers
as 0.6 and 0.2 cannot be represented exactly by floating-point numbers.
What you think of in the above as 'three' is really more like
2.9999999999999995559.
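
One way to see this for yourself, and to work around it when you expect a
whole number, is to print the value with extra digits and round before
truncating. This is only a sketch: the exact digits depend on your
platform's floating-point format, and rounding with "sprintf" is an
assumption about what your program wants, not a general fix.

```perl
use strict;
use warnings;

my $value = 0.6 / 0.2 - 2;

# Show more digits than print() normally does.
my $digits = sprintf "%.17g", $value;

# int() truncates toward zero; "%.0f" rounds to the nearest integer.
my $truncated = int $value;
my $rounded   = sprintf "%.0f", $value;

print "value:     $digits\n";
print "truncated: $truncated\n";
print "rounded:   $rounded\n";
```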
55
56 Why isn't my octal data interpreted correctly?
57 (contributed by brian d foy)
58
59 You're probably trying to convert a string to a number, which Perl only
60 converts as a decimal number. When Perl converts a string to a number,
61 it ignores leading spaces and zeroes, then assumes the rest of the
62 digits are in base 10:
63
64 my $string = '0644';
65
66 print $string + 0; # prints 644
67
68 print $string + 44; # prints 688, certainly not octal!
69
This problem usually involves one of the Perl built-ins that has the
same name as a Unix command that uses octal numbers as arguments on the
command line. In this example, "chmod" on the command line knows that
its first argument is octal because that's what it does:
74
75 %prompt> chmod 644 file
76
77 If you want to use the same literal digits (644) in Perl, you have to
78 tell Perl to treat them as octal numbers either by prefixing the digits
79 with a 0 or using "oct":
80
81 chmod( 0644, $filename ); # right, has leading zero
82 chmod( oct(644), $filename ); # also correct
83
84 The problem comes in when you take your numbers from something that
85 Perl thinks is a string, such as a command line argument in @ARGV:
86
87 chmod( $ARGV[0], $filename ); # wrong, even if "0644"
88
89 chmod( oct($ARGV[0]), $filename ); # correct, treat string as octal
90
91 You can always check the value you're using by printing it in octal
92 notation to ensure it matches what you think it should be. Print it in
93 octal and decimal format:
94
95 printf "0%o %d", $number, $number;
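
If the mode comes from user input, it may be worth checking it before
calling "oct". The helper below is a sketch; "mode_from_string" is a
hypothetical name, and the rule that a mode is three or four octal digits
is an assumption you may want to loosen.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a mode string such as "644" or "0644"
# into the number that chmod() expects.
sub mode_from_string {
    my ($string) = @_;
    die "Not an octal mode: $string\n"
        unless $string =~ /\A[0-7]{3,4}\z/;
    return oct $string;
}

my $mode = mode_from_string('0644');
printf "0%o %d\n", $mode, $mode;    # 0644 420
```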
96
97 Does Perl have a round() function? What about ceil() and floor()? Trig
98 functions?
99 Remember that "int()" merely truncates toward 0. For rounding to a
100 certain number of digits, "sprintf()" or "printf()" is usually the
101 easiest route.
102
103 printf("%.3f", 3.1415926535); # prints 3.142
104
105 The POSIX module (part of the standard Perl distribution) implements
106 "ceil()", "floor()", and a number of other mathematical and
107 trigonometric functions.
108
109 use POSIX;
110 my $ceil = ceil(3.5); # 4
111 my $floor = floor(3.5); # 3
112
113 In 5.000 to 5.003 perls, trigonometry was done in the Math::Complex
114 module. With 5.004, the Math::Trig module (part of the standard Perl
115 distribution) implements the trigonometric functions. Internally it
116 uses the Math::Complex module and some functions can break out from the
117 real axis into the complex plane, for example the inverse sine of 2.
118
119 Rounding in financial applications can have serious implications, and
120 the rounding method used should be specified precisely. In these cases,
121 it probably pays not to trust whichever system of rounding is being
122 used by Perl, but instead to implement the rounding function you need
123 yourself.
124
125 To see why, notice how you'll still have an issue on half-way-point
126 alternation:
127
128 for (my $i = -5; $i <= 5; $i += 0.5) { printf "%.0f ",$i }
129
130 -5 -4 -4 -4 -3 -2 -2 -2 -1 -0 0 0 1 2 2 2 3 4 4 4 5
131
132 Don't blame Perl. It's the same as in C. IEEE says we have to do this.
133 Perl numbers whose absolute values are integers under 2**31 (on 32-bit
134 machines) will work pretty much like mathematical integers. Other
135 numbers are not guaranteed.
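
If you do decide to write your own rounding function, one possible sketch
is below. It rounds exact halves away from zero; "round_half_away" is a
hypothetical name, and the rule itself is only one of several you might be
required to use. It does not cure representation error: a value that is
not exactly a half in binary is rounded as stored, so for money you may
prefer to work in integer cents.

```perl
use strict;
use warnings;
use POSIX qw(floor ceil);

# Hypothetical helper: round to the nearest integer, with exact halves
# going away from zero (2.5 -> 3, -2.5 -> -3).
sub round_half_away {
    my ($n) = @_;
    return $n >= 0 ? floor($n + 0.5) : ceil($n - 0.5);
}

# Prints: -3 -2 -1 1 2 3
print join(' ', map { round_half_away($_) } -2.5, -1.5, -0.5, 0.5, 1.5, 2.5), "\n";
```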
136
137 How do I convert between numeric representations/bases/radixes?
138 As always with Perl there is more than one way to do it. Below are a
139 few examples of approaches to making common conversions between number
140 representations. This is intended to be representational rather than
141 exhaustive.
142
143 Some of the examples later in perlfaq4 use the Bit::Vector module from
144 CPAN. The reason you might choose Bit::Vector over the perl built-in
145 functions is that it works with numbers of ANY size, that it is
146 optimized for speed on some operations, and for at least some
147 programmers the notation might be familiar.
148
149 How do I convert hexadecimal into decimal
150 Using perl's built in conversion of "0x" notation:
151
152 my $dec = 0xDEADBEEF;
153
154 Using the "hex" function:
155
156 my $dec = hex("DEADBEEF");
157
158 Using "pack":
159
160 my $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));
161
162 Using the CPAN module "Bit::Vector":
163
164 use Bit::Vector;
165 my $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
166 my $dec = $vec->to_Dec();
167
168 How do I convert from decimal to hexadecimal
169 Using "sprintf":
170
171 my $hex = sprintf("%X", 3735928559); # upper case A-F
172 my $hex = sprintf("%x", 3735928559); # lower case a-f
173
174 Using "unpack":
175
176 my $hex = unpack("H*", pack("N", 3735928559));
177
178 Using Bit::Vector:
179
180 use Bit::Vector;
181 my $vec = Bit::Vector->new_Dec(32, -559038737);
182 my $hex = $vec->to_Hex();
183
184 And Bit::Vector supports odd bit counts:
185
186 use Bit::Vector;
187 my $vec = Bit::Vector->new_Dec(33, 3735928559);
188 $vec->Resize(32); # suppress leading 0 if unwanted
189 my $hex = $vec->to_Hex();
190
191 How do I convert from octal to decimal
192 Using Perl's built in conversion of numbers with leading zeros:
193
194 my $dec = 033653337357; # note the leading 0!
195
196 Using the "oct" function:
197
198 my $dec = oct("33653337357");
199
200 Using Bit::Vector:
201
202 use Bit::Vector;
203 my $vec = Bit::Vector->new(32);
204 $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
205 my $dec = $vec->to_Dec();
206
207 How do I convert from decimal to octal
208 Using "sprintf":
209
210 my $oct = sprintf("%o", 3735928559);
211
212 Using Bit::Vector:
213
214 use Bit::Vector;
215 my $vec = Bit::Vector->new_Dec(32, -559038737);
216 my $oct = reverse join('', $vec->Chunk_List_Read(3));
217
218 How do I convert from binary to decimal
219 Perl 5.6 lets you write binary numbers directly with the "0b"
220 notation:
221
222 my $number = 0b10110110;
223
224 Using "oct":
225
226 my $input = "10110110";
227 my $decimal = oct( "0b$input" );
228
229 Using "pack" and "ord":
230
231 my $decimal = ord(pack('B8', '10110110'));
232
233 Using "pack" and "unpack" for larger strings:
234
235 my $int = unpack("N", pack("B32",
236 substr("0" x 32 . "11110101011011011111011101111", -32)));
237 my $dec = sprintf("%d", $int);
238
239 # substr() is used to left-pad a 32-character string with zeros.
240
241 Using Bit::Vector:
242
use Bit::Vector;
my $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
my $dec = $vec->to_Dec();
245
246 How do I convert from decimal to binary
247 Using "sprintf" (perl 5.6+):
248
249 my $bin = sprintf("%b", 3735928559);
250
251 Using "unpack":
252
253 my $bin = unpack("B*", pack("N", 3735928559));
254
255 Using Bit::Vector:
256
257 use Bit::Vector;
258 my $vec = Bit::Vector->new_Dec(32, -559038737);
259 my $bin = $vec->to_Bin();
260
261 The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
262 are left as an exercise to the inclined reader.
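
For readers who would rather not do the exercise, the built-ins shown
above can be chained: parse with "hex" or "oct", then format with
"sprintf". A minimal sketch, using only core functions (the variable
names are just for illustration):

```perl
use strict;
use warnings;

# hex -> oct: parse with hex(), format with "%o"
my $oct = sprintf "%o", hex "DEADBEEF";

# bin -> hex: parse with oct() and a "0b" prefix, format with "%X"
my $hex = sprintf "%X", oct "0b10110110";

# oct -> bin: parse with oct(), format with "%b"
my $bin = sprintf "%b", oct "266";

print "$oct $hex $bin\n";    # 33653337357 B6 10110110
```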
263
264 Why doesn't & work the way I want it to?
The behavior of the bitwise operators (such as "&" and "|") depends on
whether they're used on numbers or strings. Given strings, the operators
treat each string as a series of bits and work with that (the string "3"
is the bit pattern 00110011). Given numbers, the operators work with the
binary form of the number (the number 3 is treated as the bit pattern
00000011).
270
271 So, saying "11 & 3" performs the "and" operation on numbers (yielding
272 3). Saying "11" & "3" performs the "and" operation on strings (yielding
273 "1").
274
275 Most problems with "&" and "|" arise because the programmer thinks they
276 have a number but really it's a string or vice versa. To avoid this,
277 stringify the arguments explicitly (using "" or "qq()") or convert them
278 to numbers explicitly (using "0+$arg"). The rest arise because the
279 programmer says:
280
281 if ("\020\020" & "\101\101") {
282 # ...
283 }
284
285 but a string consisting of two null bytes (the result of "\020\020" &
286 "\101\101") is not a false value in Perl. You need:
287
288 if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
289 # ...
290 }
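
The difference is easy to demonstrate. The sketch below assumes the
classic behavior described here (that is, the "bitwise" feature is not
enabled), and that the values arrived as strings, for example from input.
Note that the string form is computed first, because using a string in
numeric context can make Perl treat it as a number afterwards.

```perl
use strict;
use warnings;

my ($x, $y) = ("11", "3");    # strings, e.g. read from input

my $string_and  = $x & $y;                # AND of the characters: "1"
my $numeric_and = (0 + $x) & (0 + $y);    # AND of the numbers: 3

print "string: $string_and, numeric: $numeric_and\n";

# A string of null bytes is still a true value, so look at its bytes.
my $bits    = "\020\020" & "\101\101";
my $any_set = $bits =~ /[^\000]/ ? 1 : 0;    # 0: no bit is set
```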
291
292 How do I multiply matrices?
293 Use the Math::Matrix or Math::MatrixReal modules (available from CPAN)
294 or the PDL extension (also available from CPAN).
295
296 How do I perform an operation on a series of integers?
297 To call a function on each element in an array, and collect the
298 results, use:
299
300 my @results = map { my_func($_) } @array;
301
302 For example:
303
304 my @triple = map { 3 * $_ } @single;
305
306 To call a function on each element of an array, but ignore the results:
307
308 foreach my $iterator (@array) {
309 some_func($iterator);
310 }
311
312 To call a function on each integer in a (small) range, you can use:
313
314 my @results = map { some_func($_) } (5 .. 25);
315
316 but you should be aware that in this form, the ".." operator creates a
317 list of all integers in the range, which can take a lot of memory for
318 large ranges. However, the problem does not occur when using ".."
319 within a "for" loop, because in that case the range operator is
320 optimized to iterate over the range, without creating the entire list.
321 So
322
323 my @results = ();
324 for my $i (5 .. 500_005) {
325 push(@results, some_func($i));
326 }
327
328 or even
329
330 push(@results, some_func($_)) for 5 .. 500_005;
331
332 will not create an intermediate list of 500,000 integers.
333
334 How can I output Roman numerals?
335 Get the <http://www.cpan.org/modules/by-module/Roman> module.
336
337 Why aren't my random numbers random?
338 If you're using a version of Perl before 5.004, you must call "srand"
339 once at the start of your program to seed the random number generator.
340
341 BEGIN { srand() if $] < 5.004 }
342
343 5.004 and later automatically call "srand" at the beginning. Don't call
344 "srand" more than once--you make your numbers less random, rather than
345 more.
346
347 Computers are good at being predictable and bad at being random
348 (despite appearances caused by bugs in your programs :-). The random
349 article in the "Far More Than You Ever Wanted To Know" collection in
350 <http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz>, courtesy of Tom
351 Phoenix, talks more about this. John von Neumann said, "Anyone who
352 attempts to generate random numbers by deterministic means is, of
353 course, living in a state of sin."
354
355 Perl relies on the underlying system for the implementation of "rand"
356 and "srand"; on some systems, the generated numbers are not random
357 enough (especially on Windows : see
358 <http://www.perlmonks.org/?node_id=803632>). Several CPAN modules in
359 the "Math" namespace implement better pseudorandom generators; see for
360 example Math::Random::MT ("Mersenne Twister", fast), or
361 Math::TrulyRandom (uses the imperfections in the system's timer to
362 generate random numbers, which is rather slow). More algorithms for
363 random numbers are described in "Numerical Recipes in C" at
364 <http://www.nr.com/>
365
366 How do I get a random number between X and Y?
367 To get a random number between two values, you can use the "rand()"
368 built-in to get a random number between 0 and 1. From there, you shift
369 that into the range that you want.
370
371 "rand($x)" returns a number such that "0 <= rand($x) < $x". Thus what
372 you want to have perl figure out is a random number in the range from 0
373 to the difference between your X and Y.
374
375 That is, to get a number between 10 and 15, inclusive, you want a
376 random number between 0 and 5 that you can then add to 10.
377
378 my $number = 10 + int rand( 15-10+1 ); # ( 10,11,12,13,14, or 15 )
379
380 Hence you derive the following simple function to abstract that. It
381 selects a random integer between the two given integers (inclusive).
382 For example: "random_int_between(50,120)".
383
384 sub random_int_between {
385 my($min, $max) = @_;
386 # Assumes that the two arguments are integers themselves!
387 return $min if $min == $max;
388 ($min, $max) = ($max, $min) if $min > $max;
389 return $min + int rand(1 + $max - $min);
390 }
391
393 How do I find the day or week of the year?
394 The day of the year is in the list returned by the "localtime"
395 function. Without an argument "localtime" uses the current time.
396
397 my $day_of_year = (localtime)[7];
398
399 The POSIX module can also format a date as the day of the year or week
400 of the year.
401
402 use POSIX qw/strftime/;
403 my $day_of_year = strftime "%j", localtime;
404 my $week_of_year = strftime "%W", localtime;
405
To get the day or week of the year for any date, use POSIX's "mktime" to
get a time in epoch seconds for the argument to "localtime".
408
409 use POSIX qw/mktime strftime/;
410 my $week_of_year = strftime "%W",
411 localtime( mktime( 0, 0, 0, 18, 11, 87 ) );
412
413 You can also use Time::Piece, which comes with Perl and provides a
414 "localtime" that returns an object:
415
416 use Time::Piece;
417 my $day_of_year = localtime->yday;
418 my $week_of_year = localtime->week;
419
420 The Date::Calc module provides two functions to calculate these, too:
421
use Date::Calc qw( Day_of_Year Week_of_Year );
423 my $day_of_year = Day_of_Year( 1987, 12, 18 );
424 my $week_of_year = Week_of_Year( 1987, 12, 18 );
425
426 How do I find the current century or millennium?
427 Use the following simple functions:
428
429 sub get_century {
430 return int((((localtime(shift || time))[5] + 1999))/100);
431 }
432
433 sub get_millennium {
434 return 1+int((((localtime(shift || time))[5] + 1899))/1000);
435 }
436
437 On some systems, the POSIX module's "strftime()" function has been
438 extended in a non-standard way to use a %C format, which they sometimes
439 claim is the "century". It isn't, because on most such systems, this is
440 only the first two digits of the four-digit year, and thus cannot be
441 used to determine reliably the current century or millennium.
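
The arithmetic in those functions is easier to check with explicit years.
The helpers below are hypothetical variants that take a four-digit year
rather than an epoch time; they use the same convention, in which the
21st century and 3rd millennium begin in 2001.

```perl
use strict;
use warnings;

# Hypothetical helpers that take a four-digit year instead of an epoch time.
sub century_of_year    { my ($year) = @_; return int( ($year + 99)  / 100 ) }
sub millennium_of_year { my ($year) = @_; return int( ($year + 999) / 1000 ) }

for my $year (1900, 2000, 2001, 2019) {
    printf "%d: century %d, millennium %d\n",
        $year, century_of_year($year), millennium_of_year($year);
}
```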
442
443 How can I compare two dates and find the difference?
444 (contributed by brian d foy)
445
446 You could just store all your dates as a number and then subtract.
447 Life isn't always that simple though.
448
449 The Time::Piece module, which comes with Perl, replaces localtime with
450 a version that returns an object. It also overloads the comparison
451 operators so you can compare them directly:
452
453 use Time::Piece;
454 my $date1 = localtime( $some_time );
455 my $date2 = localtime( $some_other_time );
456
457 if( $date1 < $date2 ) {
458 print "The date was in the past\n";
459 }
460
461 You can also get differences with a subtraction, which returns a
462 Time::Seconds object:
463
464 my $date_diff = $date1 - $date2;
465 print "The difference is ", $date_diff->days, " days\n";
466
467 If you want to work with formatted dates, the Date::Manip, Date::Calc,
468 or DateTime modules can help you.
469
470 How can I take a string and turn it into epoch seconds?
471 If it's a regular enough string that it always has the same format, you
472 can split it up and pass the parts to "timelocal" in the standard
473 Time::Local module. Otherwise, you should look into the Date::Calc,
474 Date::Parse, and Date::Manip modules from CPAN.
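
For the regular case, a sketch might look like this. The helper name
"epoch_from_string" and the "YYYY-MM-DD HH:MM:SS" layout are assumptions
chosen for illustration; adjust the pattern to your own format.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Hypothetical helper for strings shaped like "2019-01-26 12:34:56".
sub epoch_from_string {
    my ($string) = @_;
    my ($year, $mon, $mday, $hour, $min, $sec) =
        $string =~ /\A(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)\z/
        or die "Unrecognized date format: $string\n";
    # timelocal() wants a zero-based month.
    return timelocal($sec, $min, $hour, $mday, $mon - 1, $year);
}

my $epoch = epoch_from_string('2019-01-26 12:34:56');
print scalar localtime($epoch), "\n";
```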
475
476 How can I find the Julian Day?
477 (contributed by brian d foy and Dave Cross)
478
479 You can use the Time::Piece module, part of the Standard Library, which
480 can convert a date/time to a Julian Day:
481
482 $ perl -MTime::Piece -le 'print localtime->julian_day'
483 2455607.7959375
484
485 Or the modified Julian Day:
486
487 $ perl -MTime::Piece -le 'print localtime->mjd'
488 55607.2961226851
489
490 Or even the day of the year (which is what some people think of as a
491 Julian day):
492
493 $ perl -MTime::Piece -le 'print localtime->yday'
494 45
495
496 You can also do the same things with the DateTime module:
497
498 $ perl -MDateTime -le'print DateTime->today->jd'
499 2453401.5
500 $ perl -MDateTime -le'print DateTime->today->mjd'
501 53401
502 $ perl -MDateTime -le'print DateTime->today->doy'
503 31
504
505 You can use the Time::JulianDay module available on CPAN. Ensure that
506 you really want to find a Julian day, though, as many people have
507 different ideas about Julian days (see
508 <http://www.hermetic.ch/cal_stud/jdn.htm> for instance):
509
510 $ perl -MTime::JulianDay -le 'print local_julian_day( time )'
511 55608
512
513 How do I find yesterday's date?
514 (contributed by brian d foy)
515
To do it correctly, you can use one of the "Date" modules since they
work with calendars instead of times. The DateTime module makes it
simple, and gives you the same time of day, only the day before, despite
daylight saving time changes:
520
521 use DateTime;
522
523 my $yesterday = DateTime->now->subtract( days => 1 );
524
525 print "Yesterday was $yesterday\n";
526
527 You can also use the Date::Calc module using its "Today_and_Now"
528 function.
529
530 use Date::Calc qw( Today_and_Now Add_Delta_DHMS );
531
532 my @date_time = Add_Delta_DHMS( Today_and_Now(), -1, 0, 0, 0 );
533
534 print "@date_time\n";
535
536 Most people try to use the time rather than the calendar to figure out
537 dates, but that assumes that days are twenty-four hours each. For most
538 people, there are two days a year when they aren't: the switch to and
539 from summer time throws this off. For example, the rest of the
540 suggestions will be wrong sometimes:
541
542 Starting with Perl 5.10, Time::Piece and Time::Seconds are part of the
543 standard distribution, so you might think that you could do something
544 like this:
545
546 use Time::Piece;
547 use Time::Seconds;
548
549 my $yesterday = localtime() - ONE_DAY; # WRONG
550 print "Yesterday was $yesterday\n";
551
552 The Time::Piece module exports a new "localtime" that returns an
553 object, and Time::Seconds exports the "ONE_DAY" constant that is a set
554 number of seconds. This means that it always gives the time 24 hours
555 ago, which is not always yesterday. This can cause problems around the
556 end of daylight saving time when there's one day that is 25 hours long.
557
558 You have the same problem with Time::Local, which will give the wrong
559 answer for those same special cases:
560
561 # contributed by Gunnar Hjalmarsson
562 use Time::Local;
563 my $today = timelocal 0, 0, 12, ( localtime )[3..5];
564 my ($d, $m, $y) = ( localtime $today-86400 )[3..5]; # WRONG
565 printf "Yesterday: %d-%02d-%02d\n", $y+1900, $m+1, $d;
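
If you cannot install a "Date" module, one possible core-only sketch is
to do the arithmetic on the calendar fields and let POSIX's "mktime"
normalize them, instead of subtracting seconds. The helper "day_before"
is hypothetical, and this relies on your platform's "mktime" normalizing
an out-of-range day of the month, which the C standard describes but
which you should verify on your system.

```perl
use strict;
use warnings;
use POSIX qw(mktime);

# Hypothetical helper: return (year, month, day) for the day before
# the given calendar date, using core modules only.
sub day_before {
    my ($year, $month, $day) = @_;
    # Noon on day-1; mktime() normalizes an out-of-range day such as 0.
    my $epoch = mktime(0, 0, 12, $day - 1, $month - 1, $year - 1900);
    my ($d, $m, $y) = (localtime $epoch)[3 .. 5];
    return ($y + 1900, $m + 1, $d);
}

printf "%d-%02d-%02d\n", day_before(2019, 3, 1);    # 2019-02-28
```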
566
567 Does Perl have a Year 2000 or 2038 problem? Is Perl Y2K compliant?
568 (contributed by brian d foy)
569
570 Perl itself never had a Y2K problem, although that never stopped people
571 from creating Y2K problems on their own. See the documentation for
572 "localtime" for its proper use.
573
574 Starting with Perl 5.12, "localtime" and "gmtime" can handle dates past
575 03:14:08 January 19, 2038, when a 32-bit based time would overflow. You
576 still might get a warning on a 32-bit "perl":
577
578 % perl5.12 -E 'say scalar localtime( 0x9FFF_FFFFFFFF )'
579 Integer overflow in hexadecimal number at -e line 1.
580 Wed Nov 1 19:42:39 5576711
581
582 On a 64-bit "perl", you can get even larger dates for those really long
583 running projects:
584
585 % perl5.12 -E 'say scalar gmtime( 0x9FFF_FFFFFFFF )'
586 Thu Nov 2 00:42:39 5576711
587
588 You're still out of luck if you need to keep track of decaying protons
589 though.
590
592 How do I validate input?
593 (contributed by brian d foy)
594
595 There are many ways to ensure that values are what you expect or want
596 to accept. Besides the specific examples that we cover in the perlfaq,
597 you can also look at the modules with "Assert" and "Validate" in their
598 names, along with other modules such as Regexp::Common.
599
600 Some modules have validation for particular types of input, such as
601 Business::ISBN, Business::CreditCard, Email::Valid, and
602 Data::Validate::IP.
603
604 How do I unescape a string?
605 It depends just what you mean by "escape". URL escapes are dealt with
606 in perlfaq9. Shell escapes with the backslash ("\") character are
607 removed with
608
609 s/\\(.)/$1/g;
610
611 This won't expand "\n" or "\t" or any other special escapes.
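
If you do want some of the special escapes expanded, one approach is a
lookup table consulted from the replacement side. This is a sketch: the
helper name "unescape" and the table contents are assumptions, so list
whichever escapes your input format actually defines.

```perl
use strict;
use warnings;

# Assumed escape table; extend it with whatever your input format defines.
my %escape = ( n => "\n", t => "\t", r => "\r", 0 => "\0" );

sub unescape {
    my ($string) = @_;
    # Known escapes are translated; any other escaped character
    # (including a backslash) is kept as itself.
    $string =~ s/\\(.)/exists $escape{$1} ? $escape{$1} : $1/ges;
    return $string;
}

my $plain = unescape('tab\there, slash\\\\ and \"quote\"');
print "$plain\n";
```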
612
613 How do I remove consecutive pairs of characters?
614 (contributed by brian d foy)
615
616 You can use the substitution operator to find pairs of characters (or
617 runs of characters) and replace them with a single instance. In this
618 substitution, we find a character in "(.)". The memory parentheses
619 store the matched character in the back-reference "\g1" and we use that
620 to require that the same thing immediately follow it. We replace that
621 part of the string with the character in $1.
622
623 s/(.)\g1/$1/g;
624
625 We can also use the transliteration operator, "tr///". In this example,
626 the search list side of our "tr///" contains nothing, but the "c"
627 option complements that so it contains everything. The replacement list
628 also contains nothing, so the transliteration is almost a no-op since
629 it won't do any replacements (or more exactly, replace the character
630 with itself). However, the "s" option squashes duplicated and
631 consecutive characters in the string so a character does not show up
632 next to itself
633
634 my $str = 'Haarlem'; # in the Netherlands
635 $str =~ tr///cs; # Now Harlem, like in New York
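
The two approaches agree on pairs but differ on longer runs: the
substitution collapses each pair once, while "tr///cs" squeezes a whole
run down to one character. A small sketch (the helper names are
hypothetical) makes the difference visible:

```perl
use strict;
use warnings;

# Hypothetical wrappers around the two techniques shown above.
sub collapse_pairs { my ($s) = @_; $s =~ s/(.)\g1/$1/g; return $s }
sub squeeze_runs   { my ($s) = @_; $s =~ tr///cs;      return $s }

for my $word (qw( Haarlem bookkeeper aaa )) {
    printf "%-10s %-8s %s\n", $word, collapse_pairs($word), squeeze_runs($word);
}
```

For "aaa", the substitution leaves "aa" and the transliteration leaves
"a".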
636
637 How do I expand function calls in a string?
638 (contributed by brian d foy)
639
640 This is documented in perlref, and although it's not the easiest thing
641 to read, it does work. In each of these examples, we call the function
642 inside the braces used to dereference a reference. If we have more than
643 one return value, we can construct and dereference an anonymous array.
644 In this case, we call the function in list context.
645
646 print "The time values are @{ [localtime] }.\n";
647
648 If we want to call the function in scalar context, we have to do a bit
649 more work. We can really have any code we like inside the braces, so we
650 simply have to end with the scalar reference, although how you do that
651 is up to you, and you can use code inside the braces. Note that the use
652 of parens creates a list context, so we need "scalar" to force the
653 scalar context on the function:
654
print "The time is ${\(scalar localtime)}.\n";
656
657 print "The time is ${ my $x = localtime; \$x }.\n";
658
659 If your function already returns a reference, you don't need to create
660 the reference yourself.
661
662 sub timestamp { my $t = localtime; \$t }
663
664 print "The time is ${ timestamp() }.\n";
665
666 The "Interpolation" module can also do a lot of magic for you. You can
667 specify a variable name, in this case "E", to set up a tied hash that
668 does the interpolation for you. It has several other methods to do this
669 as well.
670
671 use Interpolation E => 'eval';
672 print "The time values are $E{localtime()}.\n";
673
674 In most cases, it is probably easier to simply use string
675 concatenation, which also forces scalar context.
676
677 print "The time is " . localtime() . ".\n";
678
679 How do I find matching/nesting anything?
680 To find something between two single characters, a pattern like
681 "/x([^x]*)x/" will get the intervening bits in $1. For multiple ones,
682 then something more like "/alpha(.*?)omega/" would be needed. For
683 nested patterns and/or balanced expressions, see the so-called (?PARNO)
684 construct (available since perl 5.10). The CPAN module Regexp::Common
685 can help to build such regular expressions (see in particular
686 Regexp::Common::balanced and Regexp::Common::delimited).
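
As a sketch of the (?PARNO) construct, the pattern below matches one
balanced set of parentheses by recursing into its own first capture
group. It requires perl 5.10 or later, and it deliberately ignores
complications such as parentheses inside quoted strings.

```perl
use strict;
use warnings;

# Match one balanced (...) group; (?1) recurses into capture group 1.
my $balanced = qr/
    (               # group 1: a parenthesized chunk
        \(
        (?: [^()]++ # a run of other characters, without backtracking
          | (?1)    # or a nested balanced group
        )*
        \)
    )
/x;

my $text = 'print f(a, g(b), (c)) if $ok;';
my ($found) = $text =~ $balanced;
print "$found\n";    # (a, g(b), (c))
```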
687
More complex cases will require writing a parser, probably using a
parsing module from CPAN, like Regexp::Grammars, Parse::RecDescent,
Parse::Yapp, Text::Balanced, or Marpa::R2.
691
692 How do I reverse a string?
693 Use "reverse()" in scalar context, as documented in "reverse" in
694 perlfunc.
695
696 my $reversed = reverse $string;
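
Context matters here. In list context, "reverse" reverses the order of
its arguments rather than the characters of a string, which is a common
surprise with "print". A short sketch of the difference:

```perl
use strict;
use warnings;

my $string = "hello";

my $reversed = reverse $string;           # scalar assignment: "olleh"
my @list     = reverse $string;           # list context: ("hello")
my $forced   = scalar reverse $string;    # explicit scalar context

# print() supplies list context, so ask for scalar context explicitly.
print scalar reverse($string), "\n";
```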
697
698 How do I expand tabs in a string?
699 You can do it yourself:
700
701 1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
702
703 Or you can just use the Text::Tabs module (part of the standard Perl
704 distribution).
705
706 use Text::Tabs;
707 my @expanded_lines = expand(@lines_with_tabs);
708
709 How do I reformat a paragraph?
710 Use Text::Wrap (part of the standard Perl distribution):
711
712 use Text::Wrap;
713 print wrap("\t", ' ', @paragraphs);
714
715 The paragraphs you give to Text::Wrap should not contain embedded
716 newlines. Text::Wrap doesn't justify the lines (flush-right).
717
718 Or use the CPAN module Text::Autoformat. Formatting files can be easily
719 done by making a shell alias, like so:
720
721 alias fmt="perl -i -MText::Autoformat -n0777 \
722 -e 'print autoformat $_, {all=>1}' $*"
723
724 See the documentation for Text::Autoformat to appreciate its many
725 capabilities.
726
727 How can I access or change N characters of a string?
728 You can access the first characters of a string with substr(). To get
729 the first character, for example, start at position 0 and grab the
730 string of length 1.
731
732 my $string = "Just another Perl Hacker";
733 my $first_char = substr( $string, 0, 1 ); # 'J'
734
735 To change part of a string, you can use the optional fourth argument
736 which is the replacement string.
737
738 substr( $string, 13, 4, "Perl 5.8.0" );
739
740 You can also use substr() as an lvalue.
741
742 substr( $string, 13, 4 ) = "Perl 5.8.0";
743
744 How do I change the Nth occurrence of something?
745 You have to keep track of N yourself. For example, let's say you want
746 to change the fifth occurrence of "whoever" or "whomever" into
747 "whosoever" or "whomsoever", case insensitively. These all assume that
748 $_ contains the string to be altered.
749
750 $count = 0;
751 s{((whom?)ever)}{
752 ++$count == 5 # is it the 5th?
753 ? "${2}soever" # yes, swap
754 : $1 # renege and leave it there
755 }ige;
756
757 In the more general case, you can use the "/g" modifier in a "while"
758 loop, keeping count of matches.
759
760 $WANT = 3;
761 $count = 0;
762 $_ = "One fish two fish red fish blue fish";
763 while (/(\w+)\s+fish\b/gi) {
764 if (++$count == $WANT) {
765 print "The third fish is a $1 one.\n";
766 }
767 }
768
769 That prints out: "The third fish is a red one." You can also use a
770 repetition count and repeated pattern like this:
771
772 /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
773
774 How can I count the number of occurrences of a substring within a string?
775 There are a number of ways, with varying efficiency. If you want a
776 count of a certain single character (X) within a string, you can use
777 the "tr///" function like so:
778
779 my $string = "ThisXlineXhasXsomeXx'sXinXit";
780 my $count = ($string =~ tr/X//);
781 print "There are $count X characters in the string";
782
783 This is fine if you are just looking for a single character. However,
784 if you are trying to count multiple character substrings within a
785 larger string, "tr///" won't work. What you can do is wrap a while()
786 loop around a global pattern match. For example, let's count negative
787 integers:
788
789 my $string = "-9 55 48 -2 23 -76 4 14 -44";
790 my $count = 0;
791 while ($string =~ /-\d+/g) { $count++ }
792 print "There are $count negative numbers in the string";
793
794 Another version uses a global match in list context, then assigns the
795 result to a scalar, producing a count of the number of matches.
796
797 my $count = () = $string =~ /-\d+/g;
798
799 How do I capitalize all the words on one line?
800 (contributed by brian d foy)
801
802 Damian Conway's Text::Autoformat handles all of the thinking for you.
803
804 use Text::Autoformat;
805 my $x = "Dr. Strangelove or: How I Learned to Stop ".
806 "Worrying and Love the Bomb";
807
808 print $x, "\n";
809 for my $style (qw( sentence title highlight )) {
810 print autoformat($x, { case => $style }), "\n";
811 }
812
813 How do you want to capitalize those words?
814
815 FRED AND BARNEY'S LODGE # all uppercase
816 Fred And Barney's Lodge # title case
817 Fred and Barney's Lodge # highlight case
818
819 It's not as easy a problem as it looks. How many words do you think are
820 in there? Wait for it... wait for it.... If you answered 5 you're
821 right. Perl words are groups of "\w+", but that's not what you want to
822 capitalize. How is Perl supposed to know not to capitalize that "s"
823 after the apostrophe? You could try a regular expression:
824
825 $string =~ s/ (
826 (^\w) #at the beginning of the line
827 | # or
828 (\s\w) #preceded by whitespace
829 )
830 /\U$1/xg;
831
832 $string =~ s/([\w']+)/\u\L$1/g;
833
834 Now, what if you don't want to capitalize that "and"? Just use
835 Text::Autoformat and get on with the next problem. :)
836
837 How can I split a [character]-delimited string except when inside
838 [character]?
839 Several modules can handle this sort of parsing--Text::Balanced,
840 Text::CSV, Text::CSV_XS, and Text::ParseWords, among others.
841
842 Take the example case of trying to split a string that is comma-
843 separated into its different fields. You can't use "split(/,/)" because
844 you shouldn't split if the comma is inside quotes. For example, take a
845 data line like this:
846
847 SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
848
849 Due to the restriction of the quotes, this is a fairly complex problem.
850 Thankfully, we have Jeffrey Friedl, author of Mastering Regular
851 Expressions, to handle these for us. He suggests (assuming your string
852 is contained in $text):
853
854 my @new = ();
855 push(@new, $+) while $text =~ m{
856 "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
857 | ([^,]+),?
858 | ,
859 }gx;
860 push(@new, undef) if substr($text,-1,1) eq ',';
861
If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (eg,
"like \"this\"").
864
865 Alternatively, the Text::ParseWords module (part of the standard Perl
866 distribution) lets you say:
867
868 use Text::ParseWords;
869 @new = quotewords(",", 0, $text);
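
Applied to the sample data line above, a sketch looks like this (the
variable names are only for illustration):

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';

# keep = 0 drops the quotes and keeps quoted commas inside their fields.
my @fields = quotewords(',', 0, $text);

printf "%d fields; third is <%s>\n", scalar @fields, $fields[2];
```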
870
871 For parsing or generating CSV, though, using Text::CSV rather than
872 implementing it yourself is highly recommended; you'll save yourself
873 odd bugs popping up later by just using code which has already been
874 tried and tested in production for years.
875
876 How do I strip blank space from the beginning/end of a string?
877 (contributed by brian d foy)
878
879 A substitution can do this for you. For a single line, you want to
880 replace all the leading or trailing whitespace with nothing. You can do
881 that with a pair of substitutions:
882
883 s/^\s+//;
884 s/\s+$//;
885
886 You can also write that as a single substitution, although it turns out
887 the combined statement is slower than the separate ones. That might not
888 matter to you, though:
889
890 s/^\s+|\s+$//g;
891
892 In this regular expression, the alternation matches either at the
beginning or the end of the string since the alternation has a lower
precedence than the anchors. With the "/g" flag, the substitution
895 makes all possible matches, so it gets both. Remember, the trailing
896 newline matches the "\s+", and the "$" anchor can match to the
897 absolute end of the string, so the newline disappears too. Just add the
898 newline to the output, which has the added benefit of preserving
899 "blank" (consisting entirely of whitespace) lines which the "^\s+"
900 would remove all by itself:
901
902 while( <> ) {
903 s/^\s+|\s+$//g;
904 print "$_\n";
905 }
906
907 For a multi-line string, you can apply the regular expression to each
908 logical line in the string by adding the "/m" flag (for "multi-line").
909 With the "/m" flag, the "$" matches before an embedded newline, so it
910 doesn't remove it. This pattern still removes the newline at the end of
911 the string:
912
913 $string =~ s/^\s+|\s+$//gm;
914
915 Remember that lines consisting entirely of whitespace will disappear,
916 since the first part of the alternation can match the entire string and
917 replace it with nothing. If you need to keep embedded blank lines, you
918 have to do a little more work. Instead of matching any whitespace
919 (since that includes a newline), just match the other whitespace:
920
921 $string =~ s/^[\t\f ]+|[\t\f ]+$//mg;
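
If you do this a lot, you might wrap the pair of substitutions in a
small subroutine. This is only a sketch: trim is a hypothetical helper
name, not a Perl built-in, and it works on a copy of its argument.

```perl
use strict;
use warnings;

# trim() is a hypothetical helper name, not a Perl built-in.
sub trim {
    my ($string) = @_;
    $string =~ s/^\s+//;    # strip leading whitespace
    $string =~ s/\s+$//;    # strip trailing whitespace, including a final newline
    return $string;
}

my $trimmed = trim("  \t fred and barney \n");
print "[$trimmed]\n";    # [fred and barney]
```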
922
923 How do I pad a string with blanks or pad a number with zeroes?
924 In the following examples, $pad_len is the length to which you wish to
925 pad the string, $text or $num contains the string to be padded, and
926 $pad_char contains the padding character. You can use a single
927 character string constant instead of the $pad_char variable if you know
928 what it is in advance. And in the same way you can use an integer in
929 place of $pad_len if you know the pad length in advance.
930
931 The simplest method uses the "sprintf" function. It can pad on the left
932 or right with blanks and on the left with zeroes and it will not
933 truncate the result. The "pack" function can only pad strings on the
934 right with blanks and it will truncate the result to a maximum length
935 of $pad_len.
936
937 # Left padding a string with blanks (no truncation):
938 my $padded = sprintf("%${pad_len}s", $text);
939 my $padded = sprintf("%*s", $pad_len, $text); # same thing
940
941 # Right padding a string with blanks (no truncation):
942 my $padded = sprintf("%-${pad_len}s", $text);
943 my $padded = sprintf("%-*s", $pad_len, $text); # same thing
944
945 # Left padding a number with 0 (no truncation):
946 my $padded = sprintf("%0${pad_len}d", $num);
947 my $padded = sprintf("%0*d", $pad_len, $num); # same thing
948
949 # Right padding a string with blanks using pack (will truncate):
950 my $padded = pack("A$pad_len",$text);
951
952 If you need to pad with a character other than blank or zero you can
953 use one of the following methods. They all generate a pad string with
954 the "x" operator and combine that with $text. These methods do not
955 truncate $text.
956
957 Left and right padding with any character, creating a new string:
958
959 my $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
960 my $padded = $text . $pad_char x ( $pad_len - length( $text ) );
961
962 Left and right padding with any character, modifying $text directly:
963
964 substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
965 $text .= $pad_char x ( $pad_len - length( $text ) );
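
To see the difference between padding and truncation, here is a small
sketch with made-up values. The pad_left helper is hypothetical; it
clamps the repeat count at zero so a string that is already long enough
comes back unchanged.

```perl
use strict;
use warnings;

# pad_left() is a hypothetical helper built on the "x" operator.
sub pad_left {
    my ( $text, $pad_len, $pad_char ) = @_;
    my $missing = $pad_len - length($text);
    $missing = 0 if $missing < 0;    # already long enough: add nothing
    return $pad_char x $missing . $text;
}

printf "%*s|\n",  8, 'perl';    # "    perl|"
printf "%-*s|\n", 8, 'perl';    # "perl    |"
printf "%0*d|\n", 5, 42;        # "00042|"
printf "%*s|\n",  2, 'perl';    # "perl|" (sprintf and printf do not truncate)

print pack( 'A2', 'perl' ), "|\n";         # "pe|" (pack truncates)
print pad_left( 'perl', 8, '.' ), "|\n";   # "....perl|"
```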
966
967 How do I extract selected columns from a string?
968 (contributed by brian d foy)
969
970 If you know the columns that contain the data, you can use "substr" to
971 extract a single column.
972
973 my $column = substr( $line, $start_column, $length );
974
975 You can use "split" if the columns are separated by whitespace or some
976 other delimiter, as long as whitespace or the delimiter cannot appear
977 as part of the data.
978
979 my $line = ' fred barney betty ';
980 my @columns = split /\s+/, $line;
981 # ( '', 'fred', 'barney', 'betty' );
982
983 my $line = 'fred||barney||betty';
984 my @columns = split /\|/, $line;
985 # ( 'fred', '', 'barney', '', 'betty' );
986
987 If you want to work with comma-separated values, don't do this since
988 that format is a bit more complicated. Use one of the modules that
989 handle that format, such as Text::CSV, Text::CSV_XS, or Text::CSV_PP.
990
991 If you want to break apart an entire line of fixed columns, you can use
992 "unpack" with the A (ASCII) format. By using a number after the format
993 specifier, you can denote the column width. See the "pack" and "unpack"
994 entries in perlfunc for more details.
995
my @fields = unpack( "A8 A8 A8 A16 A4", $line );
997
998 Note that spaces in the format argument to "unpack" do not denote
999 literal spaces. If you have space separated data, you may want "split"
1000 instead.
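
Here is a minimal sketch of the fixed-column case. The three-column
layout and the parse_fixed name are assumptions made up for the example.
Note that "unpack" takes the template first and the data second.

```perl
use strict;
use warnings;

# parse_fixed() and its three-column layout are assumptions for this sketch.
sub parse_fixed {
    my ($line) = @_;
    # template first, then the data; "A" strips trailing spaces from each field
    return unpack( 'A8 A8 A4', $line );
}

# build a fixed-width record: two 8-character columns and one 4-character column
my $line   = sprintf( '%-8s%-8s%-4s', 'fred', 'barney', '1234' );
my @fields = parse_fixed($line);

print join( '|', @fields ), "\n";    # fred|barney|1234
```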
1001
1002 How do I find the soundex value of a string?
1003 (contributed by brian d foy)
1004
You can use the "Text::Soundex" module. If you want to do fuzzy or
close matching, you might also try the String::Approx,
Text::Metaphone, and Text::DoubleMetaphone modules.
1008
1009 How can I expand variables in text strings?
1010 (contributed by brian d foy)
1011
1012 If you can avoid it, don't, or if you can use a templating system, such
1013 as Text::Template or Template Toolkit, do that instead. You might even
1014 be able to get the job done with "sprintf" or "printf":
1015
1016 my $string = sprintf 'Say hello to %s and %s', $foo, $bar;
1017
1018 However, for the one-off simple case where I don't want to pull out a
1019 full templating system, I'll use a string that has two Perl scalar
1020 variables in it. In this example, I want to expand $foo and $bar to
their variables' values:
1022
1023 my $foo = 'Fred';
1024 my $bar = 'Barney';
1025 $string = 'Say hello to $foo and $bar';
1026
1027 One way I can do this involves the substitution operator and a double
1028 "/e" flag. The first "/e" evaluates $1 on the replacement side and
1029 turns it into $foo. The second /e starts with $foo and replaces it with
1030 its value. $foo, then, turns into 'Fred', and that's finally what's
1031 left in the string:
1032
1033 $string =~ s/(\$\w+)/$1/eeg; # 'Say hello to Fred and Barney'
1034
1035 The "/e" will also silently ignore violations of strict, replacing
1036 undefined variable names with the empty string. Since I'm using the
1037 "/e" flag (twice even!), I have all of the same security problems I
1038 have with "eval" in its string form. If there's something odd in $foo,
1039 perhaps something like "@{[ system "rm -rf /" ]}", then I could get
1040 myself in trouble.
1041
1042 To get around the security problem, I could also pull the values from a
1043 hash instead of evaluating variable names. Using a single "/e", I can
1044 check the hash to ensure the value exists, and if it doesn't, I can
1045 replace the missing value with a marker, in this case "???" to signal
1046 that I missed something:
1047
1048 my $string = 'This has $foo and $bar';
1049
1050 my %Replacements = (
1051 foo => 'Fred',
1052 );
1053
1054 # $string =~ s/\$(\w+)/$Replacements{$1}/g;
1055 $string =~ s/\$(\w+)/
1056 exists $Replacements{$1} ? $Replacements{$1} : '???'
1057 /eg;
1058
1059 print $string;
1060
1061 What's wrong with always quoting "$vars"?
1062 The problem is that those double-quotes force stringification--coercing
1063 numbers and references into strings--even when you don't want them to
1064 be strings. Think of it this way: double-quote expansion is used to
1065 produce new strings. If you already have a string, why do you need
1066 more?
1067
1068 If you get used to writing odd things like these:
1069
1070 print "$var"; # BAD
1071 my $new = "$old"; # BAD
1072 somefunc("$var"); # BAD
1073
1074 You'll be in trouble. Those should (in 99.8% of the cases) be the
1075 simpler and more direct:
1076
1077 print $var;
1078 my $new = $old;
1079 somefunc($var);
1080
1081 Otherwise, besides slowing you down, you're going to break code when
1082 the thing in the scalar is actually neither a string nor a number, but
1083 a reference:
1084
1085 func(\@array);
1086 sub func {
1087 my $aref = shift;
1088 my $oref = "$aref"; # WRONG
1089 }
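
A short sketch of what goes wrong: once a reference has been through
double quotes it is only a string that looks like ARRAY(0x...), and you
can't get the data back from it. The first_element function is a
hypothetical name used for illustration.

```perl
use strict;
use warnings;

# first_element() is a hypothetical function used only for this sketch.
sub first_element {
    my $aref = shift;
    return ref($aref) eq 'ARRAY' ? $aref->[0] : undef;
}

my @array = ( 'fred', 'barney' );
my $aref  = \@array;
my $copy  = "$aref";    # a plain string now, no longer a reference

my $from_ref  = first_element($aref);
my $from_copy = first_element($copy);

print "$from_ref\n";                                             # fred
print defined $from_copy ? "still a ref\n" : "just a string\n";  # just a string
```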
1090
1091 You can also get into subtle problems on those few operations in Perl
1092 that actually do care about the difference between a string and a
1093 number, such as the magical "++" autoincrement operator or the
1094 syscall() function.
1095
1096 Stringification also destroys arrays.
1097
1098 my @lines = `command`;
1099 print "@lines"; # WRONG - extra blanks
1100 print @lines; # right
1101
1102 Why don't my <<HERE documents work?
Here documents are found in perlop. Check for these four things:

There must be no space after the << part.
There (probably) should be a semicolon at the end of the opening token.
You can't (easily) have any space in front of the tag.
There needs to be at least a line separator after the end token.
1109
1110 If you want to indent the text in the here document, you can do this:
1111
1112 # all in one
1113 (my $VAR = <<HERE_TARGET) =~ s/^\s+//gm;
1114 your text
1115 goes here
1116 HERE_TARGET
1117
1118 But the HERE_TARGET must still be flush against the margin. If you
1119 want that indented also, you'll have to quote in the indentation.
1120
1121 (my $quote = <<' FINIS') =~ s/^\s+//gm;
1122 ...we will have peace, when you and all your works have
1123 perished--and the works of your dark master to whom you
1124 would deliver us. You are a liar, Saruman, and a corrupter
1125 of men's hearts. --Theoden in /usr/src/perl/taint.c
1126 FINIS
1127 $quote =~ s/\s+--/\n--/;
1128
1129 A nice general-purpose fixer-upper function for indented here documents
1130 follows. It expects to be called with a here document as its argument.
1131 It looks to see whether each line begins with a common substring, and
1132 if so, strips that substring off. Otherwise, it takes the amount of
1133 leading whitespace found on the first line and removes that much off
1134 each subsequent line.
1135
1136 sub fix {
1137 local $_ = shift;
1138 my ($white, $leader); # common whitespace and common leading string
1139 if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\g1\g2?.*\n)+$/) {
1140 ($white, $leader) = ($2, quotemeta($1));
1141 } else {
1142 ($white, $leader) = (/^(\s+)/, '');
1143 }
1144 s/^\s*?$leader(?:$white)?//gm;
1145 return $_;
1146 }
1147
1148 This works with leading special strings, dynamically determined:
1149
1150 my $remember_the_main = fix<<' MAIN_INTERPRETER_LOOP';
1151 @@@ int
1152 @@@ runops() {
1153 @@@ SAVEI32(runlevel);
1154 @@@ runlevel++;
1155 @@@ while ( op = (*op->op_ppaddr)() );
1156 @@@ TAINT_NOT;
1157 @@@ return 0;
1158 @@@ }
1159 MAIN_INTERPRETER_LOOP
1160
1161 Or with a fixed amount of leading whitespace, with remaining
1162 indentation correctly preserved:
1163
1164 my $poem = fix<<EVER_ON_AND_ON;
1165 Now far ahead the Road has gone,
1166 And I must follow, if I can,
1167 Pursuing it with eager feet,
1168 Until it joins some larger way
1169 Where many paths and errands meet.
1170 And whither then? I cannot say.
1171 --Bilbo in /usr/src/perl/pp_ctl.c
1172 EVER_ON_AND_ON
1173
1174 Beginning with Perl version 5.26, a much simpler and cleaner way to
1175 write indented here documents has been added to the language: the tilde
1176 (~) modifier. See "Indented Here-docs" in perlop for details.
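
A minimal sketch of the tilde modifier, assuming Perl 5.26 or later. The
indented_text name is hypothetical; it is only there so the here
document sits inside indented code.

```perl
use v5.26;    # the ~ modifier needs Perl 5.26 or later

# indented_text() is a hypothetical wrapper for this sketch.
sub indented_text {
    my $text = <<~HERE_TARGET;
        your text
          goes here
        HERE_TARGET
    return $text;
}

print indented_text();
```

As I understand perlop, the whitespace in front of the closing
HERE_TARGET is what gets stripped from every line, so the first line
comes out flush left and the second keeps its extra two spaces.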
1177
1179 What is the difference between a list and an array?
1180 (contributed by brian d foy)
1181
1182 A list is a fixed collection of scalars. An array is a variable that
1183 holds a variable collection of scalars. An array can supply its
1184 collection for list operations, so list operations also work on arrays:
1185
1186 # slices
1187 ( 'dog', 'cat', 'bird' )[2,3];
1188 @animals[2,3];
1189
1190 # iteration
1191 foreach ( qw( dog cat bird ) ) { ... }
1192 foreach ( @animals ) { ... }
1193
1194 my @three = grep { length == 3 } qw( dog cat bird );
1195 my @three = grep { length == 3 } @animals;
1196
1197 # supply an argument list
1198 wash_animals( qw( dog cat bird ) );
1199 wash_animals( @animals );
1200
1201 Array operations, which change the scalars, rearrange them, or add or
1202 subtract some scalars, only work on arrays. These can't work on a list,
1203 which is fixed. Array operations include "shift", "unshift", "push",
1204 "pop", and "splice".
1205
1206 An array can also change its length:
1207
1208 $#animals = 1; # truncate to two elements
1209 $#animals = 10000; # pre-extend to 10,001 elements
1210
1211 You can change an array element, but you can't change a list element:
1212
1213 $animals[0] = 'Rottweiler';
1214 qw( dog cat bird )[0] = 'Rottweiler'; # syntax error!
1215
1216 foreach ( @animals ) {
1217 s/^d/fr/; # works fine
1218 }
1219
1220 foreach ( qw( dog cat bird ) ) {
1221 s/^d/fr/; # Error! Modification of read only value!
1222 }
1223
If the list element is itself a variable, it can appear that you
can change a list element. In that case, though, the list element is
the variable, not the data. You're not changing the list element, but
something the list element refers to. The list element itself doesn't
change: it's still the same variable.
1229
1230 You also have to be careful about context. You can assign an array to a
1231 scalar to get the number of elements in the array. This only works for
1232 arrays, though:
1233
1234 my $count = @animals; # only works with arrays
1235
1236 If you try to do the same thing with what you think is a list, you get
1237 a quite different result. Although it looks like you have a list on the
1238 righthand side, Perl actually sees a bunch of scalars separated by a
1239 comma:
1240
1241 my $scalar = ( 'dog', 'cat', 'bird' ); # $scalar gets bird
1242
1243 Since you're assigning to a scalar, the righthand side is in scalar
1244 context. The comma operator (yes, it's an operator!) in scalar context
1245 evaluates its lefthand side, throws away the result, and evaluates it's
1246 righthand side and returns the result. In effect, that list-lookalike
1247 assigns to $scalar it's rightmost value. Many people mess this up
1248 because they choose a list-lookalike whose last element is also the
1249 count they expect:
1250
1251 my $scalar = ( 1, 2, 3 ); # $scalar gets 3, accidentally
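
If you really do want the number of items in a list-lookalike, one
commonly used idiom is to assign it to an empty list first, since a list
assignment in scalar context returns the number of items on its
righthand side. A small sketch, where count_of is a hypothetical helper:

```perl
use strict;
use warnings;

# count_of() is a hypothetical helper; its arguments are flattened into @_.
sub count_of { return scalar @_ }

my @animals = qw( dog cat bird );

my $count   = @animals;                         # array in scalar context: 3
my $n       = () = ( 'dog', 'cat', 'bird' );    # list assignment in scalar context: 3
my ($first) = ( 'dog', 'cat', 'bird' );         # list assignment: 'dog'
my $args    = count_of( qw( dog cat bird ) );   # 3

print "$count $n $first $args\n";    # 3 3 dog 3
```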
1252
1253 What is the difference between $array[1] and @array[1]?
1254 (contributed by brian d foy)
1255
1256 The difference is the sigil, that special character in front of the
1257 array name. The "$" sigil means "exactly one item", while the "@" sigil
1258 means "zero or more items". The "$" gets you a single scalar, while the
1259 "@" gets you a list.
1260
1261 The confusion arises because people incorrectly assume that the sigil
1262 denotes the variable type.
1263
1264 The $array[1] is a single-element access to the array. It's going to
return the item at index 1 (or undef if there is no item there). If
1266 you intend to get exactly one element from the array, this is the form
1267 you should use.
1268
1269 The @array[1] is an array slice, although it has only one index. You
1270 can pull out multiple elements simultaneously by specifying additional
1271 indices as a list, like @array[1,4,3,0].
1272
1273 Using a slice on the lefthand side of the assignment supplies list
1274 context to the righthand side. This can lead to unexpected results.
1275 For instance, if you want to read a single line from a filehandle,
1276 assigning to a scalar value is fine:
1277
1278 $array[1] = <STDIN>;
1279
1280 However, in list context, the line input operator returns all of the
1281 lines as a list. The first line goes into @array[1] and the rest of the
1282 lines mysteriously disappear:
1283
1284 @array[1] = <STDIN>; # most likely not what you want
1285
1286 Either the "use warnings" pragma or the -w flag will warn you when you
1287 use an array slice with a single index.
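
You can watch the context change with a subroutine that reports how it
was called. This is a sketch: context is a hypothetical sub, and the
"no warnings 'syntax'" line assumes that the single-index slice warning
lives in the 'syntax' category.

```perl
use strict;
use warnings;

# context() is a hypothetical sub that reports how it was called.
sub context { return wantarray ? 'list' : 'scalar' }

my @array;
$array[0] = context();        # element assignment: scalar context
{
    no warnings 'syntax';     # quiet the single-index slice warning for the demo
    @array[1] = context();    # slice assignment: list context
}

print "@array\n";    # scalar list
```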
1288
1289 How can I remove duplicate elements from a list or array?
1290 (contributed by brian d foy)
1291
1292 Use a hash. When you think the words "unique" or "duplicated", think
1293 "hash keys".
1294
1295 If you don't care about the order of the elements, you could just
1296 create the hash then extract the keys. It's not important how you
1297 create that hash: just that you use "keys" to get the unique elements.
1298
1299 my %hash = map { $_, 1 } @array;
1300 # or a hash slice: @hash{ @array } = ();
1301 # or a foreach: $hash{$_} = 1 foreach ( @array );
1302
1303 my @unique = keys %hash;
1304
1305 If you want to use a module, try the "uniq" function from
1306 List::MoreUtils. In list context it returns the unique elements,
1307 preserving their order in the list. In scalar context, it returns the
1308 number of unique elements.
1309
1310 use List::MoreUtils qw(uniq);
1311
1312 my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
1313 my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7
1314
1315 You can also go through each element and skip the ones you've seen
1316 before. Use a hash to keep track. The first time the loop sees an
element, that element has no key in %seen. The "next" statement creates
1318 the key and immediately uses its value, which is "undef", so the loop
1319 continues to the "push" and increments the value for that key. The next
1320 time the loop sees that same element, its key exists in the hash and
1321 the value for that key is true (since it's not 0 or "undef"), so the
1322 next skips that iteration and the loop goes to the next element.
1323
1324 my @unique = ();
1325 my %seen = ();
1326
1327 foreach my $elem ( @array ) {
1328 next if $seen{ $elem }++;
1329 push @unique, $elem;
1330 }
1331
1332 You can write this more briefly using a grep, which does the same
1333 thing.
1334
1335 my %seen = ();
1336 my @unique = grep { ! $seen{ $_ }++ } @array;
1337
1338 How can I tell whether a certain element is contained in a list or array?
1339 (portions of this answer contributed by Anno Siegel and brian d foy)
1340
1341 Hearing the word "in" is an indication that you probably should have
1342 used a hash, not a list or array, to store your data. Hashes are
1343 designed to answer this question quickly and efficiently. Arrays
1344 aren't.
1345
That being said, there are several ways to approach this. In Perl 5.10
and later, you can use the smart match operator (marked experimental
since Perl 5.18, where it warns unless you disable that warning) to
check that an item is contained in an array or a hash:
1349
1350 use 5.010;
1351
1352 if( $item ~~ @array ) {
1353 say "The array contains $item"
1354 }
1355
1356 if( $item ~~ %hash ) {
1357 say "The hash contains $item"
1358 }
1359
1360 With earlier versions of Perl, you have to do a bit more work. If you
1361 are going to make this query many times over arbitrary string values,
1362 the fastest way is probably to invert the original array and maintain a
1363 hash whose keys are the first array's values:
1364
1365 my @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
1366 my %is_blue = ();
1367 for (@blues) { $is_blue{$_} = 1 }
1368
1369 Now you can check whether $is_blue{$some_color}. It might have been a
1370 good idea to keep the blues all in a hash in the first place.
1371
1372 If the values are all small integers, you could use a simple indexed
1373 array. This kind of an array will take up less space:
1374
1375 my @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
1376 my @is_tiny_prime = ();
1377 for (@primes) { $is_tiny_prime[$_] = 1 }
# or simply @is_tiny_prime[@primes] = (1) x @primes;
1379
1380 Now you check whether $is_tiny_prime[$some_number].
1381
1382 If the values in question are integers instead of strings, you can save
1383 quite a lot of space by using bit strings instead:
1384
1385 my @articles = ( 1..10, 150..2000, 2017 );
my $read = '';
1387 for (@articles) { vec($read,$_,1) = 1 }
1388
1389 Now check whether "vec($read,$n,1)" is true for some $n.
1390
1391 These methods guarantee fast individual tests but require a re-
1392 organization of the original list or array. They only pay off if you
1393 have to test multiple values against the same array.
1394
1395 If you are testing only once, the standard module List::Util exports
1396 the function "first" for this purpose. It works by stopping once it
1397 finds the element. It's written in C for speed, and its Perl equivalent
1398 looks like this subroutine:
1399
1400 sub first (&@) {
1401 my $code = shift;
1402 foreach (@_) {
1403 return $_ if &{$code}();
1404 }
1405 undef;
1406 }
1407
1408 If speed is of little concern, the common idiom uses grep in scalar
1409 context (which returns the number of items that passed its condition)
1410 to traverse the entire list. This does have the benefit of telling you
1411 how many matches it found, though.
1412
1413 my $is_there = grep $_ eq $whatever, @array;
1414
1415 If you want to actually extract the matching elements, simply use grep
1416 in list context.
1417
1418 my @matches = grep $_ eq $whatever, @array;
1419
1420 How do I compute the difference of two arrays? How do I compute the
1421 intersection of two arrays?
1422 Use a hash. Here's code to do both and more. It assumes that each
1423 element is unique in a given array:
1424
1425 my (@union, @intersection, @difference);
1426 my %count = ();
1427 foreach my $element (@array1, @array2) { $count{$element}++ }
1428 foreach my $element (keys %count) {
1429 push @union, $element;
1430 push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
1431 }
1432
1433 Note that this is the symmetric difference, that is, all elements in
1434 either A or in B but not in both. Think of it as an xor operation.
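
Here is the same counting technique on two small made-up arrays. The
set_ops wrapper is hypothetical, it still assumes each element appears
at most once per array, and the keys are sorted only so the output is
predictable.

```perl
use strict;
use warnings;

# set_ops() is a hypothetical wrapper around the counting loop.
# It assumes each element appears at most once within each array.
sub set_ops {
    my ( $array1, $array2 ) = @_;
    my ( @union, @intersection, @difference );
    my %count;
    $count{$_}++ for @$array1, @$array2;
    for my $element ( sort keys %count ) {    # sorted only for stable output
        push @union, $element;
        push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
    }
    return ( \@union, \@intersection, \@difference );
}

my ( $union, $intersection, $difference )
    = set_ops( [ qw( a b c ) ], [ qw( b c d ) ] );

print "union: @$union\n";                  # union: a b c d
print "intersection: @$intersection\n";    # intersection: b c
print "difference: @$difference\n";        # difference: a d
```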
1435
1436 How do I test whether two arrays or hashes are equal?
1437 With Perl 5.10 and later, the smart match operator can give you the
1438 answer with the least amount of work:
1439
1440 use 5.010;
1441
1442 if( @array1 ~~ @array2 ) {
1443 say "The arrays are the same";
1444 }
1445
if( %hash1 ~~ %hash2 ) { # doesn't check values!
1447 say "The hash keys are the same";
1448 }
1449
1450 The following code works for single-level arrays. It uses a stringwise
1451 comparison, and does not distinguish defined versus undefined empty
1452 strings. Modify if you have other needs.
1453
1454 $are_equal = compare_arrays(\@frogs, \@toads);
1455
1456 sub compare_arrays {
1457 my ($first, $second) = @_;
1458 no warnings; # silence spurious -w undef complaints
1459 return 0 unless @$first == @$second;
1460 for (my $i = 0; $i < @$first; $i++) {
1461 return 0 if $first->[$i] ne $second->[$i];
1462 }
1463 return 1;
1464 }
1465
1466 For multilevel structures, you may wish to use an approach more like
1467 this one. It uses the CPAN module FreezeThaw:
1468
1469 use FreezeThaw qw(cmpStr);
1470 my @a = my @b = ( "this", "that", [ "more", "stuff" ] );
1471
1472 printf "a and b contain %s arrays\n",
1473 cmpStr(\@a, \@b) == 0
1474 ? "the same"
1475 : "different";
1476
1477 This approach also works for comparing hashes. Here we'll demonstrate
1478 two different answers:
1479
1480 use FreezeThaw qw(cmpStr cmpStrHard);
1481
1482 my %a = my %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
1483 $a{EXTRA} = \%b;
1484 $b{EXTRA} = \%a;
1485
1486 printf "a and b contain %s hashes\n",
1487 cmpStr(\%a, \%b) == 0 ? "the same" : "different";
1488
1489 printf "a and b contain %s hashes\n",
1490 cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
1491
The first reports that both hashes contain the same data,
1493 while the second reports that they do not. Which you prefer is left as
1494 an exercise to the reader.
1495
1496 How do I find the first array element for which a condition is true?
1497 To find the first array element which satisfies a condition, you can
1498 use the "first()" function in the List::Util module, which comes with
1499 Perl 5.8. This example finds the first element that contains "Perl".
1500
1501 use List::Util qw(first);
1502
1503 my $element = first { /Perl/ } @array;
1504
1505 If you cannot use List::Util, you can make your own loop to do the same
1506 thing. Once you find the element, you stop the loop with last.
1507
1508 my $found;
1509 foreach ( @array ) {
1510 if( /Perl/ ) { $found = $_; last }
1511 }
1512
1513 If you want the array index, use the "firstidx()" function from
1514 "List::MoreUtils":
1515
1516 use List::MoreUtils qw(firstidx);
1517 my $index = firstidx { /Perl/ } @array;
1518
1519 Or write it yourself, iterating through the indices and checking the
1520 array element at each index until you find one that satisfies the
1521 condition:
1522
1523 my( $found, $index ) = ( undef, -1 );
for( my $i = 0; $i < @array; $i++ ) {
1525 if( $array[$i] =~ /Perl/ ) {
1526 $found = $array[$i];
1527 $index = $i;
1528 last;
1529 }
1530 }
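
If you would rather stay with core modules, one possible sketch is to
run List::Util's "first" over the indices instead of the elements. The
first_index_matching helper is a hypothetical name, not part of
List::Util, and returning -1 for "not found" is a choice made here.

```perl
use strict;
use warnings;
use List::Util qw(first);

# first_index_matching() is a hypothetical helper, not part of List::Util.
sub first_index_matching {
    my ( $pattern, @list ) = @_;
    # search the indices rather than the elements
    my $index = first { $list[$_] =~ $pattern } 0 .. $#list;
    return defined $index ? $index : -1;    # -1 when nothing matches
}

my @array = ( 'awk', 'Perl 5', 'sed', 'Perl 6' );
my $index = first_index_matching( qr/Perl/, @array );

print "$index\n";    # 1
```

Index 0 is a perfectly good answer but a false value, which is why the
sketch checks "defined" instead of truth.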
1531
1532 How do I handle linked lists?
1533 (contributed by brian d foy)
1534
1535 Perl's arrays do not have a fixed size, so you don't need linked lists
1536 if you just want to add or remove items. You can use array operations
1537 such as "push", "pop", "shift", "unshift", or "splice" to do that.
1538
1539 Sometimes, however, linked lists can be useful in situations where you
1540 want to "shard" an array so you have many small arrays instead of a
1541 single big array. You can keep arrays longer than Perl's largest array
1542 index, lock smaller arrays separately in threaded programs, reallocate
1543 less memory, or quickly insert elements in the middle of the chain.
1544
1545 Steve Lembark goes through the details in his YAPC::NA 2009 talk "Perly
1546 Linked Lists" ( <http://www.slideshare.net/lembark/perly-linked-lists>
1547 ), although you can just use his LinkedList::Single module.
1548
1549 How do I handle circular lists?
1550 (contributed by brian d foy)
1551
1552 If you want to cycle through an array endlessly, you can increment the
1553 index modulo the number of elements in the array:
1554
1555 my @array = qw( a b c );
1556 my $i = 0;
1557
1558 while( 1 ) {
1559 print $array[ $i++ % @array ], "\n";
1560 last if $i > 20;
1561 }
1562
1563 You can also use Tie::Cycle to use a scalar that always has the next
1564 element of the circular array:
1565
1566 use Tie::Cycle;
1567
1568 tie my $cycle, 'Tie::Cycle', [ qw( FFFFFF 000000 FFFF00 ) ];
1569
1570 print $cycle; # FFFFFF
1571 print $cycle; # 000000
1572 print $cycle; # FFFF00
1573
The Array::Iterator::Circular module creates an iterator object for circular
1575 arrays:
1576
1577 use Array::Iterator::Circular;
1578
1579 my $color_iterator = Array::Iterator::Circular->new(
1580 qw(red green blue orange)
1581 );
1582
1583 foreach ( 1 .. 20 ) {
1584 print $color_iterator->next, "\n";
1585 }
1586
1587 How do I shuffle an array randomly?
If you have Perl 5.8.0 or later installed, or Scalar-List-Utils 1.03
or later, you can say:
1590
1591 use List::Util 'shuffle';
1592
1593 @shuffled = shuffle(@list);
1594
1595 If not, you can use a Fisher-Yates shuffle.
1596
1597 sub fisher_yates_shuffle {
1598 my $deck = shift; # $deck is a reference to an array
1599 return unless @$deck; # must not be empty!
1600
1601 my $i = @$deck;
1602 while (--$i) {
1603 my $j = int rand ($i+1);
1604 @$deck[$i,$j] = @$deck[$j,$i];
1605 }
1606 }
1607
1608 # shuffle my mpeg collection
1609 #
1610 my @mpeg = <audio/*/*.mp3>;
1611 fisher_yates_shuffle( \@mpeg ); # randomize @mpeg in place
1612 print @mpeg;
1613
1614 Note that the above implementation shuffles an array in place, unlike
1615 the "List::Util::shuffle()" which takes a list and returns a new
1616 shuffled list.
1617
You've probably seen shuffling algorithms that work using splice,
randomly picking another element to swap the current element with:
1620
1621 srand;
1622 @new = ();
1623 @old = 1 .. 10; # just a demo
1624 while (@old) {
1625 push(@new, splice(@old, rand @old, 1));
1626 }
1627
1628 This is bad because splice is already O(N), and since you do it N
1629 times, you just invented a quadratic algorithm; that is, O(N**2). This
1630 does not scale, although Perl is so efficient that you probably won't
1631 notice this until you have rather largish arrays.
1632
1633 How do I process/modify each element of an array?
1634 Use "for"/"foreach":
1635
1636 for (@lines) {
1637 s/foo/bar/; # change that word
1638 tr/XZ/ZX/; # swap those letters
1639 }
1640
1641 Here's another; let's compute spherical volumes:
1642
1643 my @volumes = @radii;
1644 for (@volumes) { # @volumes has changed parts
1645 $_ **= 3;
1646 $_ *= (4/3) * 3.14159; # this will be constant folded
1647 }
1648
1649 which can also be done with "map()" which is made to transform one list
1650 into another:
1651
1652 my @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
1653
1654 If you want to do the same thing to modify the values of the hash, you
1655 can use the "values" function. As of Perl 5.6 the values are not
1656 copied, so if you modify $orbit (in this case), you modify the value.
1657
1658 for my $orbit ( values %orbits ) {
1659 ($orbit **= 3) *= (4/3) * 3.14159;
1660 }
1661
1662 Prior to perl 5.6 "values" returned copies of the values, so older perl
1663 code often contains constructions such as @orbits{keys %orbits} instead
1664 of "values %orbits" where the hash is to be modified.
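
A quick way to convince yourself that "values" hands back aliases is to
modify them and look at the hash afterwards. In this sketch the
cube_values name and the hash contents are made up.

```perl
use strict;
use warnings;

# cube_values() is a hypothetical helper; it changes the hash in place.
sub cube_values {
    my ($hash) = @_;
    for my $value ( values %$hash ) {
        $value **= 3;    # $value is an alias, so this changes the hash itself
    }
    return $hash;
}

my %orbits = ( inner => 1, outer => 2 );
cube_values( \%orbits );

print "$orbits{inner} $orbits{outer}\n";    # 1 8
```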
1665
1666 How do I select a random element from an array?
1667 Use the "rand()" function (see "rand" in perlfunc):
1668
1669 my $index = rand @array;
1670 my $element = $array[$index];
1671
1672 Or, simply:
1673
1674 my $element = $array[ rand @array ];
1675
1676 How do I permute N elements of a list?
1677 Use the List::Permutor module on CPAN. If the list is actually an
1678 array, try the Algorithm::Permute module (also on CPAN). It's written
1679 in XS code and is very efficient:
1680
1681 use Algorithm::Permute;
1682
1683 my @array = 'a'..'d';
1684 my $p_iterator = Algorithm::Permute->new ( \@array );
1685
1686 while (my @perm = $p_iterator->next) {
1687 print "next permutation: (@perm)\n";
1688 }
1689
1690 For even faster execution, you could do:
1691
1692 use Algorithm::Permute;
1693
1694 my @array = 'a'..'d';
1695
1696 Algorithm::Permute::permute {
1697 print "next permutation: (@array)\n";
1698 } @array;
1699
1700 Here's a little program that generates all permutations of all the
1701 words on each line of input. The algorithm embodied in the "permute()"
function is discussed in Volume 4A of Knuth's The
1703 Art of Computer Programming and will work on any list:
1704
1705 #!/usr/bin/perl -n
1706 # Fischer-Krause ordered permutation generator
1707
1708 sub permute (&@) {
1709 my $code = shift;
1710 my @idx = 0..$#_;
1711 while ( $code->(@_[@idx]) ) {
1712 my $p = $#idx;
1713 --$p while $idx[$p-1] > $idx[$p];
1714 my $q = $p or return;
1715 push @idx, reverse splice @idx, $p;
1716 ++$q while $idx[$p-1] > $idx[$q];
1717 @idx[$p-1,$q]=@idx[$q,$p-1];
1718 }
1719 }
1720
1721 permute { print "@_\n" } split;
1722
1723 The Algorithm::Loops module also provides the "NextPermute" and
1724 "NextPermuteNum" functions which efficiently find all unique
1725 permutations of an array, even if it contains duplicate values,
1726 modifying it in-place: if its elements are in reverse-sorted order then
1727 the array is reversed, making it sorted, and it returns false;
1728 otherwise the next permutation is returned.
1729
1730 "NextPermute" uses string order and "NextPermuteNum" numeric order, so
1731 you can enumerate all the permutations of 0..9 like this:
1732
1733 use Algorithm::Loops qw(NextPermuteNum);
1734
1735 my @list= 0..9;
1736 do { print "@list\n" } while NextPermuteNum @list;
1737
1738 How do I sort an array by (anything)?
1739 Supply a comparison function to sort() (described in "sort" in
1740 perlfunc):
1741
1742 @list = sort { $a <=> $b } @list;
1743
1744 The default sort function is cmp, string comparison, which would sort
1745 "(1, 2, 10)" into "(1, 10, 2)". "<=>", used above, is the numerical
1746 comparison operator.
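
To see the difference for yourself, here's a small sketch (the sample list is only an illustration) that sorts the same numbers both ways:

```perl
use strict;
use warnings;

my @list = (10, 2, 1);

my @as_strings = sort @list;                  # default: string comparison
my @as_numbers = sort { $a <=> $b } @list;    # numeric comparison

print "@as_strings\n";    # 1 10 2
print "@as_numbers\n";    # 1 2 10
```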
1747
1748 If you have a complicated function needed to pull out the part you want
1749 to sort on, then don't do it inside the sort function. Pull it out
1750 first, because the sort BLOCK can be called many times for the same
1751 element. Here's an example of how to pull out the first word after the
1752 first number on each item, and then sort those words case-
1753 insensitively.
1754
1755 my @idx;
1756 for (@data) {
1757 my $item;
1758 ($item) = /\d+\s*(\S+)/;
1759 push @idx, uc($item);
1760 }
1761 my @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
1762
1763 which could also be written this way, using a trick that's come to be
1764 known as the Schwartzian Transform:
1765
1766 my @sorted = map { $_->[0] }
1767 sort { $a->[1] cmp $b->[1] }
1768 map { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
1769
1770 If you need to sort on several fields, the following paradigm is
1771 useful.
1772
1773 my @sorted = sort {
1774 field1($a) <=> field1($b) ||
1775 field2($a) cmp field2($b) ||
1776 field3($a) cmp field3($b)
1777 } @data;
1778
1779 This can be conveniently combined with precalculation of keys as given
1780 above.
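
As a sketch of that combination, assume each record is a string of the hypothetical form "lastname number firstname". The keys are extracted once per record, and the sort block then compares only the precomputed fields:

```perl
use strict;
use warnings;

# Hypothetical records: "lastname number firstname"
my @data = ('smith 42 bob', 'jones 7 al', 'smith 7 cy', 'jones 7 Abe');

my @sorted =
    map  { $_->[0] }                         # 3. recover the original record
    sort {                                   # 2. compare precomputed keys
        $a->[2] <=> $b->[2] ||               #    number, numerically
        $a->[1] cmp $b->[1] ||               #    then last name
        $a->[3] cmp $b->[3]                  #    then first name
    }
    map  {                                   # 1. compute each key once
        my ($last, $num, $first) = split ' ';
        [ $_, lc $last, $num, lc $first ]
    } @data;

print "$_\n" for @sorted;
```

The expensive part (here just a split and two lc calls) runs once per element rather than once per comparison.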
1781
1782 See the sort article in the "Far More Than You Ever Wanted To Know"
1783 collection in <http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz> for more
1784 about this approach.
1785
1786 See also the question later in perlfaq4 on sorting hashes.
1787
1788 How do I manipulate arrays of bits?
1789 Use "pack()" and "unpack()", or else "vec()" and the bitwise
1790 operations.
1791
1792 For example, you don't have to store individual bits in an array (which
1793 would mean that you're wasting a lot of space). To convert an array of
1794 bits to a string, use "vec()" to set the right bits. This sets $vec to
1795 have bit N set only if $ints[N] was set:
1796
1797 my @ints = (...); # array of bits, e.g. ( 1, 0, 0, 1, 1, 0 ... )
1798 my $vec = '';
1799 foreach( 0 .. $#ints ) {
1800 vec($vec,$_,1) = 1 if $ints[$_];
1801 }
1802
1803 The string $vec only takes up as many bits as it needs. For instance,
1804 if you had 16 entries in @ints, $vec only needs two bytes to store them
1805 (not counting the scalar variable overhead).
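
You can check the storage claim with a short sketch. The 16-entry @ints below is sample data; note that the string only grows as far as the highest bit you actually set, so the last entry is a 1 here to force the second byte:

```perl
use strict;
use warnings;

my @ints = (1,0,0,1,1,0,0,0, 0,0,0,0,0,0,0,1);    # 16 sample bits

my $vec = '';
for my $n (0 .. $#ints) {
    vec($vec, $n, 1) = 1 if $ints[$n];
}

print "bytes used: ", length($vec), "\n";       # 2
print "bits:       ", unpack("b*", $vec), "\n"; # 1001100000000001
```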
1806
1807 Here's how, given a vector in $vec, you can get those bits into your
1808 @ints array:
1809
1810 sub bitvec_to_list {
1811 my $vec = shift;
1812 my @ints;
1813 # Find null-byte density then select best algorithm
1814 if ($vec =~ tr/\0// / length $vec > 0.95) {
1815 use integer;
1816 my $i;
1817
1818 # This method is faster with mostly null-bytes
1819 while($vec =~ /[^\0]/g ) {
1820 $i = -9 + 8 * pos $vec;
1821 push @ints, $i if vec($vec, ++$i, 1);
1822 push @ints, $i if vec($vec, ++$i, 1);
1823 push @ints, $i if vec($vec, ++$i, 1);
1824 push @ints, $i if vec($vec, ++$i, 1);
1825 push @ints, $i if vec($vec, ++$i, 1);
1826 push @ints, $i if vec($vec, ++$i, 1);
1827 push @ints, $i if vec($vec, ++$i, 1);
1828 push @ints, $i if vec($vec, ++$i, 1);
1829 }
1830 }
1831 else {
1832 # This method is a fast general algorithm
1833 use integer;
1834 my $bits = unpack "b*", $vec;
1835 push @ints, 0 if $bits =~ s/^(\d)// && $1;
1836 push @ints, pos $bits while($bits =~ /1/g);
1837 }
1838
1839 return \@ints;
1840 }
1841
1842 This method gets faster the more sparse the bit vector is. (Courtesy
1843 of Tim Bunce and Winfried Koenig.)
1844
1845 You can make the while loop a lot shorter with this suggestion from
1846 Benjamin Goldberg:
1847
1848 while($vec =~ /[^\0]+/g ) {
1849 push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
1850 }
1851
1852 Or use the CPAN module Bit::Vector:
1853
1854 my $vector = Bit::Vector->new($num_of_bits);
1855 $vector->Index_List_Store(@ints);
1856 my @ints = $vector->Index_List_Read();
1857
1858 Bit::Vector provides efficient methods for bit vector, sets of small
1859 integers and "big int" math.
1860
1861 Here's a more extensive illustration using vec():
1862
1863 # vec demo
1864 my $vector = "\xff\x0f\xef\xfe";
1865 print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
1866 unpack("N", $vector), "\n";
1867 my $is_set = vec($vector, 23, 1);
1868 print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
1869 pvec($vector);
1870
1871 set_vec(1,1,1);
1872 set_vec(3,1,1);
1873 set_vec(23,1,1);
1874
1875 set_vec(3,1,3);
1876 set_vec(3,2,3);
1877 set_vec(3,4,3);
1878 set_vec(3,4,7);
1879 set_vec(3,8,3);
1880 set_vec(3,8,7);
1881
1882 set_vec(0,32,17);
1883 set_vec(1,32,17);
1884
1885 sub set_vec {
1886 my ($offset, $width, $value) = @_;
1887 my $vector = '';
1888 vec($vector, $offset, $width) = $value;
1889 print "offset=$offset width=$width value=$value\n";
1890 pvec($vector);
1891 }
1892
    sub pvec {
        my $vector = shift;
        my $bits = unpack("b*", $vector);

        print "vector length in bytes: ", length($vector), "\n";
        # split the bit string into one 8-character group per byte
        my @bytes = unpack("A8" x length($vector), $bits);
        print "bits are: @bytes\n\n";
    }
1903
1904 Why does defined() return true on empty arrays and hashes?
1905 The short story is that you should probably only use defined on scalars
1906 or functions, not on aggregates (arrays and hashes). See "defined" in
1907 perlfunc in the 5.004 release or later of Perl for more detail.
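
What you usually want to know is whether the aggregate is empty, and for that you test it in boolean (scalar) context instead. Recent perls go further and refuse to compile "defined(@array)" at all. A minimal sketch with throwaway variables:

```perl
use strict;
use warnings;

my @array = ();
my %hash  = ();

print "array is empty\n" unless @array;
print "hash is empty\n"  unless %hash;

push @array, 'x';
$hash{key} = undef;    # the key exists even though its value is undef

print "array has ", scalar(@array), " element(s)\n";
print "hash has ", scalar(keys %hash), " key(s)\n";
```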
1908
1910 How do I process an entire hash?
1911 (contributed by brian d foy)
1912
There are a couple of ways that you can process an entire hash. You can
get a list of keys, then go through each key, or grab one key-value
pair at a time.
1916
1917 To go through all of the keys, use the "keys" function. This extracts
1918 all of the keys of the hash and gives them back to you as a list. You
1919 can then get the value through the particular key you're processing:
1920
    foreach my $key ( keys %hash ) {
        my $value = $hash{$key};
        ...
    }
1925
1926 Once you have the list of keys, you can process that list before you
1927 process the hash elements. For instance, you can sort the keys so you
1928 can process them in lexical order:
1929
    foreach my $key ( sort keys %hash ) {
        my $value = $hash{$key};
        ...
    }
1934
1935 Or, you might want to only process some of the items. If you only want
1936 to deal with the keys that start with "text:", you can select just
1937 those using "grep":
1938
    foreach my $key ( grep /^text:/, keys %hash ) {
        my $value = $hash{$key};
        ...
    }
1943
1944 If the hash is very large, you might not want to create a long list of
1945 keys. To save some memory, you can grab one key-value pair at a time
1946 using "each()", which returns a pair you haven't seen yet:
1947
1948 while( my( $key, $value ) = each( %hash ) ) {
1949 ...
1950 }
1951
1952 The "each" operator returns the pairs in apparently random order, so if
1953 ordering matters to you, you'll have to stick with the "keys" method.
1954
1955 The "each()" operator can be a bit tricky though. You can't add or
1956 delete keys of the hash while you're using it without possibly skipping
1957 or re-processing some pairs after Perl internally rehashes all of the
1958 elements. Additionally, a hash has only one iterator, so if you mix
1959 "keys", "values", or "each" on the same hash, you risk resetting the
1960 iterator and messing up your processing. See the "each" entry in
1961 perlfunc for more details.
1962
1963 How do I merge two hashes?
1964 (contributed by brian d foy)
1965
1966 Before you decide to merge two hashes, you have to decide what to do if
1967 both hashes contain keys that are the same and if you want to leave the
1968 original hashes as they were.
1969
If you want to preserve the original hashes, copy one hash (%hash1) to
a new hash (%new_hash), then add the keys from the other hash (%hash2)
to the new hash. Checking that the key already exists in %new_hash
gives you a chance to decide what to do with the duplicates:
1974
1975 my %new_hash = %hash1; # make a copy; leave %hash1 alone
1976
1977 foreach my $key2 ( keys %hash2 ) {
1978 if( exists $new_hash{$key2} ) {
1979 warn "Key [$key2] is in both hashes!";
1980 # handle the duplicate (perhaps only warning)
1981 ...
1982 next;
1983 }
1984 else {
1985 $new_hash{$key2} = $hash2{$key2};
1986 }
1987 }
1988
1989 If you don't want to create a new hash, you can still use this looping
1990 technique; just change the %new_hash to %hash1.
1991
1992 foreach my $key2 ( keys %hash2 ) {
1993 if( exists $hash1{$key2} ) {
1994 warn "Key [$key2] is in both hashes!";
1995 # handle the duplicate (perhaps only warning)
1996 ...
1997 next;
1998 }
1999 else {
2000 $hash1{$key2} = $hash2{$key2};
2001 }
2002 }
2003
2004 If you don't care that one hash overwrites keys and values from the
2005 other, you could just use a hash slice to add one hash to another. In
2006 this case, values from %hash2 replace values from %hash1 when they have
2007 keys in common:
2008
2009 @hash1{ keys %hash2 } = values %hash2;
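
If you want the same "last one wins" behavior but would rather leave both originals alone, a common idiom is to flatten both hashes into one list and assign it to a third hash. A sketch with sample data (%merged is just an illustrative name):

```perl
use strict;
use warnings;

my %hash1 = (a => 1,  b => 2);
my %hash2 = (b => 20, c => 30);

# Later pairs overwrite earlier ones, so %hash2 wins on shared keys.
my %merged = (%hash1, %hash2);

print "$_ => $merged{$_}\n" for sort keys %merged;
```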
2010
2011 What happens if I add or remove keys from a hash while iterating over it?
2012 (contributed by brian d foy)
2013
2014 The easy answer is "Don't do that!"
2015
2016 If you iterate through the hash with each(), you can delete the key
2017 most recently returned without worrying about it. If you delete or add
2018 other keys, the iterator may skip or double up on them since perl may
2019 rearrange the hash table. See the entry for "each()" in perlfunc.
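
Here's a small sketch of the one safe case, deleting only the pair that each() has just handed you. The hash contents are made up for the example:

```perl
use strict;
use warnings;

my %hash = map { $_ => $_ } 1 .. 10;

while ( my ($key, $value) = each %hash ) {
    # Deleting the key most recently returned by each() is safe.
    delete $hash{$key} if $value % 2;
}

my @left = sort { $a <=> $b } keys %hash;
print "@left\n";    # 2 4 6 8 10
```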
2020
2021 How do I look up a hash element by value?
2022 Create a reverse hash:
2023
2024 my %by_value = reverse %by_key;
2025 my $key = $by_value{$value};
2026
2027 That's not particularly efficient. It would be more space-efficient to
2028 use:
2029
2030 while (my ($key, $value) = each %by_key) {
2031 $by_value{$value} = $key;
2032 }
2033
2034 If your hash could have repeated values, the methods above will only
2035 find one of the associated keys. This may or may not worry you. If it
2036 does worry you, you can always reverse the hash into a hash of arrays
2037 instead:
2038
2039 while (my ($key, $value) = each %by_key) {
2040 push @{$key_list_by_value{$value}}, $key;
2041 }
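
With some sample data in %by_key, the hash-of-arrays version looks like this in practice. The sorts are there only to make the printed order predictable:

```perl
use strict;
use warnings;

my %by_key = (apple => 'red', cherry => 'red', banana => 'yellow');

my %key_list_by_value;
while ( my ($key, $value) = each %by_key ) {
    push @{ $key_list_by_value{$value} }, $key;
}

for my $value ( sort keys %key_list_by_value ) {
    my @keys = sort @{ $key_list_by_value{$value} };
    print "$value: @keys\n";
}
```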
2042
2043 How can I know how many entries are in a hash?
2044 (contributed by brian d foy)
2045
2046 This is very similar to "How do I process an entire hash?", also in
2047 perlfaq4, but a bit simpler in the common cases.
2048
You can use the "keys()" built-in function in scalar context to find
out how many entries you have in a hash:
2051
2052 my $key_count = keys %hash; # must be scalar context!
2053
2054 If you want to find out how many entries have a defined value, that's a
2055 bit different. You have to check each value. A "grep" is handy:
2056
2057 my $defined_value_count = grep { defined } values %hash;
2058
2059 You can use that same structure to count the entries any way that you
2060 like. If you want the count of the keys with vowels in them, you just
2061 test for that instead:
2062
2063 my $vowel_count = grep { /[aeiou]/ } keys %hash;
2064
2065 The "grep" in scalar context returns the count. If you want the list of
2066 matching items, just use it in list context instead:
2067
2068 my @defined_values = grep { defined } values %hash;
2069
2070 The "keys()" function also resets the iterator, which means that you
2071 may see strange results if you use this between uses of other hash
2072 operators such as "each()".
2073
2074 How do I sort a hash (optionally by value instead of key)?
2075 (contributed by brian d foy)
2076
2077 To sort a hash, start with the keys. In this example, we give the list
2078 of keys to the sort function which then compares them ASCIIbetically
2079 (which might be affected by your locale settings). The output list has
2080 the keys in ASCIIbetical order. Once we have the keys, we can go
2081 through them to create a report which lists the keys in ASCIIbetical
2082 order.
2083
2084 my @keys = sort { $a cmp $b } keys %hash;
2085
2086 foreach my $key ( @keys ) {
2087 printf "%-20s %6d\n", $key, $hash{$key};
2088 }
2089
2090 We could get more fancy in the "sort()" block though. Instead of
2091 comparing the keys, we can compute a value with them and use that value
2092 as the comparison.
2093
2094 For instance, to make our report order case-insensitive, we use "lc" to
2095 lowercase the keys before comparing them:
2096
2097 my @keys = sort { lc $a cmp lc $b } keys %hash;
2098
2099 Note: if the computation is expensive or the hash has many elements,
2100 you may want to look at the Schwartzian Transform to cache the
2101 computation results.
2102
2103 If we want to sort by the hash value instead, we use the hash key to
2104 look it up. We still get out a list of keys, but this time they are
2105 ordered by their value.
2106
2107 my @keys = sort { $hash{$a} <=> $hash{$b} } keys %hash;
2108
2109 From there we can get more complex. If the hash values are the same, we
2110 can provide a secondary sort on the hash key.
2111
2112 my @keys = sort {
2113 $hash{$a} <=> $hash{$b}
2114 or
2115 "\L$a" cmp "\L$b"
2116 } keys %hash;
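
Run against a small sample hash, the two-level sort orders the keys by value first and breaks ties case-insensitively by key:

```perl
use strict;
use warnings;

my %hash = (apple => 3, Banana => 1, cherry => 3, date => 1);

my @keys = sort {
    $hash{$a} <=> $hash{$b}    # primary: numeric value
        or
    "\L$a" cmp "\L$b"          # secondary: lowercased key
} keys %hash;

print "@keys\n";    # Banana date apple cherry
```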
2117
2118 How can I always keep my hash sorted?
2119 You can look into using the "DB_File" module and "tie()" using the
2120 $DB_BTREE hash bindings as documented in "In Memory Databases" in
2121 DB_File. The Tie::IxHash module from CPAN might also be instructive.
2122 Although this does keep your hash sorted, you might not like the
2123 slowdown you suffer from the tie interface. Are you sure you need to do
2124 this? :)
2125
2126 What's the difference between "delete" and "undef" with hashes?
2127 Hashes contain pairs of scalars: the first is the key, the second is
2128 the value. The key will be coerced to a string, although the value can
2129 be any kind of scalar: string, number, or reference. If a key $key is
2130 present in %hash, "exists($hash{$key})" will return true. The value for
2131 a given key can be "undef", in which case $hash{$key} will be "undef"
2132 while "exists $hash{$key}" will return true. This corresponds to ($key,
2133 "undef") being in the hash.
2134
2135 Pictures help... Here's the %hash table:
2136
2137 keys values
2138 +------+------+
2139 | a | 3 |
2140 | x | 7 |
2141 | d | 0 |
2142 | e | 2 |
2143 +------+------+
2144
2145 And these conditions hold
2146
2147 $hash{'a'} is true
2148 $hash{'d'} is false
2149 defined $hash{'d'} is true
2150 defined $hash{'a'} is true
2151 exists $hash{'a'} is true (Perl 5 only)
2152 grep ($_ eq 'a', keys %hash) is true
2153
2154 If you now say
2155
2156 undef $hash{'a'}
2157
2158 your table now reads:
2159
2160 keys values
2161 +------+------+
2162 | a | undef|
2163 | x | 7 |
2164 | d | 0 |
2165 | e | 2 |
2166 +------+------+
2167
2168 and these conditions now hold; changes in caps:
2169
2170 $hash{'a'} is FALSE
2171 $hash{'d'} is false
2172 defined $hash{'d'} is true
2173 defined $hash{'a'} is FALSE
2174 exists $hash{'a'} is true (Perl 5 only)
2175 grep ($_ eq 'a', keys %hash) is true
2176
2177 Notice the last two: you have an undef value, but a defined key!
2178
2179 Now, consider this:
2180
2181 delete $hash{'a'}
2182
2183 your table now reads:
2184
2185 keys values
2186 +------+------+
2187 | x | 7 |
2188 | d | 0 |
2189 | e | 2 |
2190 +------+------+
2191
2192 and these conditions now hold; changes in caps:
2193
2194 $hash{'a'} is false
2195 $hash{'d'} is false
2196 defined $hash{'d'} is true
2197 defined $hash{'a'} is false
2198 exists $hash{'a'} is FALSE (Perl 5 only)
2199 grep ($_ eq 'a', keys %hash) is FALSE
2200
2201 See, the whole entry is gone!
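
The tables above translate directly into a script you can run. This sketch uses the same sample hash and records what exists() and defined() say after each step:

```perl
use strict;
use warnings;

my %hash = (a => 3, x => 7, d => 0, e => 2);

undef $hash{a};    # the value goes away, the key stays
my $exists_after_undef  = exists  $hash{a} ? 1 : 0;
my $defined_after_undef = defined $hash{a} ? 1 : 0;

delete $hash{a};   # the whole entry goes away
my $exists_after_delete = exists $hash{a} ? 1 : 0;

print "after undef:  exists=$exists_after_undef ",
      "defined=$defined_after_undef\n";
print "after delete: exists=$exists_after_delete\n";
```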
2202
2203 Why don't my tied hashes make the defined/exists distinction?
2204 This depends on the tied hash's implementation of EXISTS(). For
2205 example, there isn't the concept of undef with hashes that are tied to
2206 DBM* files. It also means that exists() and defined() do the same thing
2207 with a DBM* file, and what they end up doing is not what they do with
2208 ordinary hashes.
2209
2210 How do I reset an each() operation part-way through?
2211 (contributed by brian d foy)
2212
2213 You can use the "keys" or "values" functions to reset "each". To simply
2214 reset the iterator used by "each" without doing anything else, use one
2215 of them in void context:
2216
2217 keys %hash; # resets iterator, nothing else.
2218 values %hash; # resets iterator, nothing else.
2219
2220 See the documentation for "each" in perlfunc.
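
To watch the reset happen, here's a sketch that advances the iterator by one pair, resets it, and then confirms that a fresh each() loop still visits every entry. The three-key hash is sample data:

```perl
use strict;
use warnings;

my %hash = (a => 1, b => 2, c => 3);

my ($first_key) = each %hash;    # iterator is now one pair in

keys %hash;                      # reset it; nothing else happens

my $count = 0;
while ( my ($key, $value) = each %hash ) {
    $count++;
}
print "visited $count pairs\n";  # all three, not two
```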
2221
2222 How can I get the unique keys from two hashes?
2223 First you extract the keys from the hashes into lists, then solve the
2224 "removing duplicates" problem described above. For example:
2225
2226 my %seen = ();
2227 for my $element (keys(%foo), keys(%bar)) {
2228 $seen{$element}++;
2229 }
2230 my @uniq = keys %seen;
2231
2232 Or more succinctly:
2233
2234 my @uniq = keys %{{%foo,%bar}};
2235
2236 Or if you really want to save space:
2237
    my %seen = ();
    while (defined (my $key = each %foo)) {
        $seen{$key}++;
    }
    while (defined (my $key = each %bar)) {
        $seen{$key}++;
    }
    my @uniq = keys %seen;
2246
2247 How can I store a multidimensional array in a DBM file?
2248 Either stringify the structure yourself (no fun), or else get the MLDBM
2249 (which uses Data::Dumper) module from CPAN and layer it on top of
2250 either DB_File or GDBM_File. You might also try DBM::Deep, but it can
2251 be a bit slow.
2252
2253 How can I make my hash remember the order I put elements into it?
Use the Tie::IxHash module from CPAN.
2255
2256 use Tie::IxHash;
2257
2258 tie my %myhash, 'Tie::IxHash';
2259
2260 for (my $i=0; $i<20; $i++) {
2261 $myhash{$i} = 2*$i;
2262 }
2263
2264 my @keys = keys %myhash;
2265 # @keys = (0,1,2,3,...)
2266
2267 Why does passing a subroutine an undefined element in a hash create it?
2268 (contributed by brian d foy)
2269
2270 Are you using a really old version of Perl?
2271
2272 Normally, accessing a hash key's value for a nonexistent key will not
2273 create the key.
2274
2275 my %hash = ();
2276 my $value = $hash{ 'foo' };
2277 print "This won't print\n" if exists $hash{ 'foo' };
2278
2279 Passing $hash{ 'foo' } to a subroutine used to be a special case,
2280 though. Since you could assign directly to $_[0], Perl had to be ready
2281 to make that assignment so it created the hash key ahead of time:
2282
2283 my_sub( $hash{ 'foo' } );
2284 print "This will print before 5.004\n" if exists $hash{ 'foo' };
2285
2286 sub my_sub {
2287 # $_[0] = 'bar'; # create hash key in case you do this
2288 1;
2289 }
2290
2291 Since Perl 5.004, however, this situation is a special case and Perl
2292 creates the hash key only when you make the assignment:
2293
2294 my_sub( $hash{ 'foo' } );
2295 print "This will print, even after 5.004\n" if exists $hash{ 'foo' };
2296
2297 sub my_sub {
2298 $_[0] = 'bar';
2299 }
2300
2301 However, if you want the old behavior (and think carefully about that
2302 because it's a weird side effect), you can pass a hash slice instead.
2303 Perl 5.004 didn't make this a special case:
2304
2305 my_sub( @hash{ qw/foo/ } );
2306
2307 How can I make the Perl equivalent of a C structure/C++ class/hash or array
2308 of hashes or arrays?
2309 Usually a hash ref, perhaps like this:
2310
2311 $record = {
2312 NAME => "Jason",
2313 EMPNO => 132,
2314 TITLE => "deputy peon",
2315 AGE => 23,
2316 SALARY => 37_000,
2317 PALS => [ "Norbert", "Rhys", "Phineas"],
2318 };
2319
2320 References are documented in perlref and perlreftut. Examples of
2321 complex data structures are given in perldsc and perllol. Examples of
2322 structures and object-oriented classes are in perlootut.
2323
2324 How can I use a reference as a hash key?
2325 (contributed by brian d foy and Ben Morrow)
2326
2327 Hash keys are strings, so you can't really use a reference as the key.
2328 When you try to do that, perl turns the reference into its stringified
2329 form (for instance, "HASH(0xDEADBEEF)"). From there you can't get back
2330 the reference from the stringified form, at least without doing some
2331 extra work on your own.
2332
2333 Remember that the entry in the hash will still be there even if the
2334 referenced variable goes out of scope, and that it is entirely
2335 possible for Perl to subsequently allocate a different variable at the
same address. This will mean a new variable might accidentally be
associated with the value for an old one.
2338
2339 If you have Perl 5.10 or later, and you just want to store a value
2340 against the reference for lookup later, you can use the core
Hash::Util::FieldHash module. This will also handle renaming the keys
2342 if you use multiple threads (which causes all variables to be
2343 reallocated at new addresses, changing their stringification), and
2344 garbage-collecting the entries when the referenced variable goes out of
2345 scope.
2346
2347 If you actually need to be able to get a real reference back from each
2348 hash entry, you can use the Tie::RefHash module, which does the
2349 required work for you.
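
Here's a rough sketch of both behaviors using only core modules: a plain hash that keeps nothing but the string form of the reference, and a field hash whose entry disappears along with the object. The class name My::Thing and the variable names are made up for the example:

```perl
use strict;
use warnings;
use Hash::Util::FieldHash qw(fieldhash);

# Plain hash: the key is only the reference's string form.
my $ref   = [ 1, 2, 3 ];
my %plain = ( $ref => 'some value' );
my ($key) = keys %plain;
print "plain key: $key (still a ref? ",
      ( ref($key) ? 'yes' : 'no' ), ")\n";

# Field hash: the entry is tied to the lifetime of the object.
fieldhash my %note_for;
{
    my $obj = bless {}, 'My::Thing';    # hypothetical class name
    $note_for{$obj} = 'attached note';
    print "inside scope: ", scalar(keys %note_for), " entry\n";
}
print "after scope:  ", scalar(keys %note_for), " entries\n";
```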
2350
2351 How can I check if a key exists in a multilevel hash?
2352 (contributed by brian d foy)
2353
2354 The trick to this problem is avoiding accidental autovivification. If
2355 you want to check three keys deep, you might naïvely try this:
2356
2357 my %hash;
2358 if( exists $hash{key1}{key2}{key3} ) {
2359 ...;
2360 }
2361
2362 Even though you started with a completely empty hash, after that call
2363 to "exists" you've created the structure you needed to check for
2364 "key3":
2365
2366 %hash = (
2367 'key1' => {
2368 'key2' => {}
2369 }
2370 );
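
You can confirm the side effect with a few lines of core Perl. In this sketch the exists() test fails, as it should, yet the two intermediate levels are there afterward:

```perl
use strict;
use warnings;

my %hash;
my $found = exists $hash{key1}{key2}{key3};

print "key3 found: ", ( $found ? "yes" : "no" ), "\n";
print "key1 now exists\n" if exists $hash{key1};
print "key2 now exists\n" if exists $hash{key1}{key2};
```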
2371
2372 That's autovivification. You can get around this in a few ways. The
2373 easiest way is to just turn it off. The lexical "autovivification"
2374 pragma is available on CPAN. Now you don't add to the hash:
2375
2376 {
2377 no autovivification;
2378 my %hash;
2379 if( exists $hash{key1}{key2}{key3} ) {
2380 ...;
2381 }
2382 }
2383
2384 The Data::Diver module on CPAN can do it for you too. Its "Dive"
2385 subroutine can tell you not only if the keys exist but also get the
2386 value:
2387
2388 use Data::Diver qw(Dive);
2389
2390 my @exists = Dive( \%hash, qw(key1 key2 key3) );
2391 if( ! @exists ) {
2392 ...; # keys do not exist
2393 }
2394 elsif( ! defined $exists[0] ) {
2395 ...; # keys exist but value is undef
2396 }
2397
2398 You can easily do this yourself too by checking each level of the hash
2399 before you move onto the next level. This is essentially what
2400 Data::Diver does for you:
2401
2402 if( check_hash( \%hash, qw(key1 key2 key3) ) ) {
2403 ...;
2404 }
2405
2406 sub check_hash {
2407 my( $hash, @keys ) = @_;
2408
2409 return unless @keys;
2410
2411 foreach my $key ( @keys ) {
2412 return unless eval { exists $hash->{$key} };
2413 $hash = $hash->{$key};
2414 }
2415
2416 return 1;
2417 }
2418
2419 How can I prevent addition of unwanted keys into a hash?
2420 Since version 5.8.0, hashes can be restricted to a fixed number of
2421 given keys. Methods for creating and dealing with restricted hashes are
2422 exported by the Hash::Util module.
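
As a short sketch of the idea, "lock_keys" from Hash::Util freezes the current key set. Values can still change, but touching a key outside the set is a fatal error that you can trap with eval. The %config hash and the misspelled key are invented for the example:

```perl
use strict;
use warnings;
use Hash::Util qw(lock_keys);

my %config = (host => 'localhost', port => 8080);
lock_keys(%config);

$config{port} = 9090;    # fine: the key is in the allowed set

# A typo in the key name now dies instead of quietly adding a key.
my $ok    = eval { $config{prot} = 1; 1 };
my $error = $@;

print "assignment ", ( $ok ? "succeeded" : "failed: $error" );
```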
2423
2425 How do I handle binary data correctly?
2426 Perl is binary-clean, so it can handle binary data just fine. On
2427 Windows or DOS, however, you have to use "binmode" for binary files to
2428 avoid conversions for line endings. In general, you should use
2429 "binmode" any time you want to work with binary data.
2430
2431 Also see "binmode" in perlfunc or perlopentut.
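
A minimal round-trip sketch, assuming a scratch file from the core File::Temp module: the sample bytes include a carriage return, a line feed, and a Ctrl-Z, which are the ones a text-mode layer is most likely to disturb.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $data = "\x00\x0d\x0a\x1a\xff";    # sample binary payload

my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;                          # no line-ending translation
print {$out} $data;
close $out or die "close failed: $!";

open my $in, '<', $path or die "cannot open $path: $!";
binmode $in;
my $bytes = do { local $/; <$in> };    # slurp the whole file
close $in;

print "round trip ", ( $bytes eq $data ? "intact" : "altered" ), "\n";
```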
2432
2433 If you're concerned about 8-bit textual data then see perllocale. If
2434 you want to deal with multibyte characters, however, there are some
2435 gotchas. See the section on Regular Expressions.
2436
2437 How do I determine whether a scalar is a number/whole/integer/float?
2438 Assuming that you don't care about IEEE notations like "NaN" or
2439 "Infinity", you probably just want to use a regular expression (see
2440 also perlretut and perlre):
2441
2442 use 5.010;
2443
2444 if ( /\D/ )
2445 { say "\thas nondigits"; }
2446 if ( /^\d+\z/ )
2447 { say "\tis a whole number"; }
2448 if ( /^-?\d+\z/ )
2449 { say "\tis an integer"; }
2450 if ( /^[+-]?\d+\z/ )
2451 { say "\tis a +/- integer"; }
2452 if ( /^-?(?:\d+\.?|\.\d)\d*\z/ )
2453 { say "\tis a real number"; }
2454 if ( /^[+-]?(?=\.?\d)\d*\.?\d*(?:e[+-]?\d+)?\z/i )
2455 { say "\tis a C float" }
2456
2457 There are also some commonly used modules for the task. Scalar::Util
2458 (distributed with 5.8) provides access to perl's internal function
2459 "looks_like_number" for determining whether a variable looks like a
2460 number. Data::Types exports functions that validate data types using
2461 both the above and other regular expressions. Thirdly, there is
2462 Regexp::Common which has regular expressions to match various types of
2463 numbers. Those three modules are available from the CPAN.
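
For instance, here's a quick sketch of "looks_like_number" applied to a handful of sample strings. Note that it follows Perl's own string-to-number rules, so a hex literal written as a string does not count:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

my %verdict;
for my $candidate ('42', '-3.14', '1e5', 'abc', '', '0x1F') {
    $verdict{$candidate} = looks_like_number($candidate) ? 1 : 0;
    printf "%-8s %s\n", "'$candidate'",
        $verdict{$candidate} ? 'number' : 'not a number';
}
```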
2464
2465 If you're on a POSIX system, Perl supports the "POSIX::strtod" function
2466 for converting strings to doubles (and also "POSIX::strtol" for longs).
2467 Its semantics are somewhat cumbersome, so here's a "getnum" wrapper
2468 function for more convenient access. This function takes a string and
2469 returns the number it found, or "undef" for input that isn't a C float.
2470 The "is_numeric" function is a front end to "getnum" if you just want
2471 to say, "Is this a float?"
2472
2473 sub getnum {
2474 use POSIX qw(strtod);
2475 my $str = shift;
2476 $str =~ s/^\s+//;
2477 $str =~ s/\s+$//;
2478 $! = 0;
2479 my($num, $unparsed) = strtod($str);
2480 if (($str eq '') || ($unparsed != 0) || $!) {
2481 return undef;
2482 }
2483 else {
2484 return $num;
2485 }
2486 }
2487
2488 sub is_numeric { defined getnum($_[0]) }
2489
2490 Or you could check out the String::Scanf module on the CPAN instead.
2491
2492 How do I keep persistent data across program calls?
2493 For some specific applications, you can use one of the DBM modules.
2494 See AnyDBM_File. More generically, you should consult the FreezeThaw or
2495 Storable modules from CPAN. Starting from Perl 5.8, Storable is part of
2496 the standard distribution. Here's one example using Storable's "store"
2497 and "retrieve" functions:
2498
2499 use Storable;
2500 store(\%hash, "filename");
2501
2502 # later on...
2503 $href = retrieve("filename"); # by ref
2504 %hash = %{ retrieve("filename") }; # direct to hash
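
A fuller sketch of the same round trip, writing to a scratch directory from the core File::Temp module so that it can run anywhere. The file name and hash contents are placeholders:

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/data.sto";            # placeholder file name

my %hash = (name => 'Jason', pals => [ 'Norbert', 'Rhys' ]);
store(\%hash, $file);                  # dies on I/O errors

# later on, possibly in another program...
my $href = retrieve($file);
print "$href->{name} has ", scalar @{ $href->{pals} }, " pals\n";
```

Nested structures survive the trip, which is the main advantage over a plain DBM file.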
2505
2506 How do I print out or copy a recursive data structure?
The Data::Dumper module on CPAN (or the 5.005 release of Perl) is great
for printing out data structures. The Storable module on CPAN (or the
5.8 release of Perl) provides a function called "dclone" that
recursively copies its argument.
2511
2512 use Storable qw(dclone);
2513 $r2 = dclone($r1);
2514
2515 Where $r1 can be a reference to any kind of data structure you'd like.
2516 It will be deeply copied. Because "dclone" takes and returns
2517 references, you'd have to add extra punctuation if you had a hash of
2518 arrays that you wanted to copy.
2519
2520 %newhash = %{ dclone(\%oldhash) };
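
To convince yourself the copy really is deep, change something nested in the copy and check that the original is untouched. A sketch with a sample hash of arrays:

```perl
use strict;
use warnings;
use Storable qw(dclone);

my %oldhash = (fruits => [ 'apple' ], nuts => [ 'pecan' ]);
my %newhash = %{ dclone(\%oldhash) };

push @{ $newhash{fruits} }, 'pear';    # touches only the copy

print "old: @{ $oldhash{fruits} }\n";  # apple
print "new: @{ $newhash{fruits} }\n";  # apple pear
```

A plain "%newhash = %oldhash" would copy only the top level, so both hashes would share the same inner arrays.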
2521
2522 How do I define methods for every class/object?
2523 (contributed by Ben Morrow)
2524
2525 You can use the "UNIVERSAL" class (see UNIVERSAL). However, please be
2526 very careful to consider the consequences of doing this: adding methods
2527 to every object is very likely to have unintended consequences. If
possible, it would be better to have all your objects inherit from some
2529 common base class, or to use an object system like Moose that supports
2530 roles.
2531
2532 How do I verify a credit card checksum?
2533 Get the Business::CreditCard module from CPAN.
2534
2535 How do I pack arrays of doubles or floats for XS code?
2536 The arrays.h/arrays.c code in the PGPLOT module on CPAN does just this.
2537 If you're doing a lot of float or double processing, consider using the
2538 PDL module from CPAN instead--it makes number-crunching easy.
2539
2540 See <https://metacpan.org/release/PGPLOT> for the code.
2541
2543 Copyright (c) 1997-2010 Tom Christiansen, Nathan Torkington, and other
2544 authors as noted. All rights reserved.
2545
2546 This documentation is free; you can redistribute it and/or modify it
2547 under the same terms as Perl itself.
2548
2549 Irrespective of its distribution, all code examples in this file are
2550 hereby placed into the public domain. You are permitted and encouraged
2551 to use this code in your own programs for fun or for profit as you see
2552 fit. A simple comment in the code giving credit would be courteous but
2553 is not required.
2554
2555
2556
2557perl v5.28.1 2019-01-26 perlfaq4(3)