PERLFAQ4(1) Perl Programmers Reference Guide PERLFAQ4(1)
2
3
4
6 perlfaq4 - Data Manipulation
7
9 This section of the FAQ answers questions related to manipulating
10 numbers, dates, strings, arrays, hashes, and miscellaneous data issues.
11
13 Why am I getting long decimals (eg, 19.9499999999999) instead of the
14 numbers I should be getting (eg, 19.95)?
15 For the long explanation, see David Goldberg's "What Every Computer
16 Scientist Should Know About Floating-Point Arithmetic"
17 (<http://web.cse.msu.edu/~cse320/Documents/FloatingPoint.pdf>).
18
19 Internally, your computer represents floating-point numbers in binary.
20 Digital (as in powers of two) computers cannot store all numbers
21 exactly. Some real numbers lose precision in the process. This is a
22 problem with how computers store numbers and affects all computer
23 languages, not just Perl.
24
25 perlnumber shows the gory details of number representations and
26 conversions.
27
28 To limit the number of decimal places in your numbers, you can use the
29 "printf" or "sprintf" function. See "Floating-point Arithmetic" in
30 perlop for more details.
31
32 printf "%.2f", 10/3;
33
34 my $number = sprintf "%.2f", 10/3;
35
36 Why is int() broken?
37 Your "int()" is most probably working just fine. It's the numbers that
38 aren't quite what you think.
39
40 First, see the answer to "Why am I getting long decimals (eg,
41 19.9499999999999) instead of the numbers I should be getting (eg,
42 19.95)?".
43
44 For example, this
45
46 print int(0.6/0.2-2), "\n";
47
will on most computers print 0, not 1, because even such simple numbers
as 0.6 and 0.2 cannot be represented exactly by floating-point numbers.
What you think of above as 'three' is really more like
2.9999999999999995559.
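
If you need the nearest integer rather than the truncated one, one option is to round with "sprintf" instead of calling "int()" directly. This is a minimal sketch that assumes ordinary IEEE 754 double-precision numbers; the variable names are only for illustration.

```perl
use strict;
use warnings;

my $quotient = 0.6 / 0.2;    # mathematically 3, stored slightly below 3

# Show more digits than print() gives by default.
printf "%.17g\n", $quotient;

# int() truncates toward zero, so the tiny error decides the result.
my $truncated = int( $quotient - 2 );

# sprintf rounds to the nearest integer instead of truncating.
my $rounded = sprintf "%.0f", $quotient - 2;

print "truncated: $truncated, rounded: $rounded\n";
```

Whether rounding is appropriate depends on your data: it hides errors far smaller than one half, but it also changes the result for values that really are fractional.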
52
53 Why isn't my octal data interpreted correctly?
54 (contributed by brian d foy)
55
56 You're probably trying to convert a string to a number, which Perl only
57 converts as a decimal number. When Perl converts a string to a number,
58 it ignores leading spaces and zeroes, then assumes the rest of the
59 digits are in base 10:
60
61 my $string = '0644';
62
63 print $string + 0; # prints 644
64
65 print $string + 44; # prints 688, certainly not octal!
66
67 This problem usually involves one of the Perl built-ins that has the
same name as a Unix command that uses octal numbers as arguments on the
69 command line. In this example, "chmod" on the command line knows that
70 its first argument is octal because that's what it does:
71
72 %prompt> chmod 644 file
73
74 If you want to use the same literal digits (644) in Perl, you have to
75 tell Perl to treat them as octal numbers either by prefixing the digits
76 with a 0 or using "oct":
77
78 chmod( 0644, $filename ); # right, has leading zero
79 chmod( oct(644), $filename ); # also correct
80
81 The problem comes in when you take your numbers from something that
82 Perl thinks is a string, such as a command line argument in @ARGV:
83
84 chmod( $ARGV[0], $filename ); # wrong, even if "0644"
85
86 chmod( oct($ARGV[0]), $filename ); # correct, treat string as octal
87
88 You can always check the value you're using by printing it in octal
89 notation to ensure it matches what you think it should be. Print it in
90 octal and decimal format:
91
92 printf "0%o %d", $number, $number;
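
Putting the pieces together, here is a small sketch of a helper that validates a mode string before handing it to "oct". The name "mode_from_string" and the rule of accepting one to four octal digits are assumptions made for this example, not anything Perl provides.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a user-supplied mode string such as "644" or
# "0644" into the number chmod() expects. Returns nothing (undef in scalar
# context) for anything that is not one to four octal digits.
sub mode_from_string {
    my ($text) = @_;
    return unless defined $text && $text =~ /\A[0-7]{1,4}\z/;
    return oct($text);
}

my $mode = mode_from_string('0644');
printf "0%o %d\n", $mode, $mode;    # prints "0644 420"
```

Checking the digits first matters because "oct" also accepts strings such as "0x1ff" and "0b101", which you probably do not want as file modes.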
93
94 Does Perl have a round() function? What about ceil() and floor()? Trig
95 functions?
96 Remember that "int()" merely truncates toward 0. For rounding to a
97 certain number of digits, "sprintf()" or "printf()" is usually the
98 easiest route.
99
100 printf("%.3f", 3.1415926535); # prints 3.142
101
102 The POSIX module (part of the standard Perl distribution) implements
103 "ceil()", "floor()", and a number of other mathematical and
104 trigonometric functions.
105
106 use POSIX;
107 my $ceil = ceil(3.5); # 4
108 my $floor = floor(3.5); # 3
109
110 In 5.000 to 5.003 perls, trigonometry was done in the Math::Complex
111 module. With 5.004, the Math::Trig module (part of the standard Perl
112 distribution) implements the trigonometric functions. Internally it
113 uses the Math::Complex module and some functions can break out from the
114 real axis into the complex plane, for example the inverse sine of 2.
115
116 Rounding in financial applications can have serious implications, and
117 the rounding method used should be specified precisely. In these cases,
118 it probably pays not to trust whichever system of rounding is being
119 used by Perl, but instead to implement the rounding function you need
120 yourself.
121
122 To see why, notice how you'll still have an issue on half-way-point
123 alternation:
124
125 for (my $i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}
126
127 0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
128 0.8 0.8 0.9 0.9 1.0 1.0
129
130 Don't blame Perl. It's the same as in C. IEEE says we have to do this.
131 Perl numbers whose absolute values are integers under 2**31 (on 32-bit
132 machines) will work pretty much like mathematical integers. Other
133 numbers are not guaranteed.
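
If you do write your own rounding, one approach is to keep amounts as integers (cents, say) so that ties can be detected exactly. The following is only a sketch of one possible policy, round-half-to-even; the function name and the restriction to non-negative integers are assumptions for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: divide two non-negative integers and round the
# quotient to the nearest integer, sending exact ties to the even neighbor.
sub div_round_half_even {
    my ($num, $den) = @_;
    my $remainder = $num % $den;
    my $quotient  = ( $num - $remainder ) / $den;    # exact integer division
    my $twice     = 2 * $remainder;
    $quotient++ if $twice > $den || ( $twice == $den && $quotient % 2 );
    return $quotient;
}

# 1005 tenths of a cent -> 100 cents (the tie goes to the even value),
# 1015 tenths of a cent -> 102 cents.
print div_round_half_even( 1005, 10 ), " ", div_round_half_even( 1015, 10 ), "\n";
```

Because every intermediate value is an integer, the half-way test is an exact comparison rather than a floating-point one.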
134
135 How do I convert between numeric representations/bases/radixes?
136 As always with Perl there is more than one way to do it. Below are a
137 few examples of approaches to making common conversions between number
representations. These are intended to be representative rather than
exhaustive.
140
141 Some of the examples later in perlfaq4 use the Bit::Vector module from
142 CPAN. The reason you might choose Bit::Vector over the perl built-in
143 functions is that it works with numbers of ANY size, that it is
144 optimized for speed on some operations, and for at least some
145 programmers the notation might be familiar.
146
147 How do I convert hexadecimal into decimal
148 Using perl's built in conversion of "0x" notation:
149
150 my $dec = 0xDEADBEEF;
151
152 Using the "hex" function:
153
154 my $dec = hex("DEADBEEF");
155
156 Using "pack":
157
158 my $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));
159
160 Using the CPAN module "Bit::Vector":
161
162 use Bit::Vector;
163 my $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
164 my $dec = $vec->to_Dec();
165
166 How do I convert from decimal to hexadecimal
167 Using "sprintf":
168
169 my $hex = sprintf("%X", 3735928559); # upper case A-F
170 my $hex = sprintf("%x", 3735928559); # lower case a-f
171
172 Using "unpack":
173
174 my $hex = unpack("H*", pack("N", 3735928559));
175
176 Using Bit::Vector:
177
178 use Bit::Vector;
179 my $vec = Bit::Vector->new_Dec(32, -559038737);
180 my $hex = $vec->to_Hex();
181
182 And Bit::Vector supports odd bit counts:
183
184 use Bit::Vector;
185 my $vec = Bit::Vector->new_Dec(33, 3735928559);
186 $vec->Resize(32); # suppress leading 0 if unwanted
187 my $hex = $vec->to_Hex();
188
189 How do I convert from octal to decimal
190 Using Perl's built in conversion of numbers with leading zeros:
191
192 my $dec = 033653337357; # note the leading 0!
193
194 Using the "oct" function:
195
196 my $dec = oct("33653337357");
197
198 Using Bit::Vector:
199
200 use Bit::Vector;
201 my $vec = Bit::Vector->new(32);
202 $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
203 my $dec = $vec->to_Dec();
204
205 How do I convert from decimal to octal
206 Using "sprintf":
207
208 my $oct = sprintf("%o", 3735928559);
209
210 Using Bit::Vector:
211
212 use Bit::Vector;
213 my $vec = Bit::Vector->new_Dec(32, -559038737);
214 my $oct = reverse join('', $vec->Chunk_List_Read(3));
215
216 How do I convert from binary to decimal
217 Perl 5.6 lets you write binary numbers directly with the "0b"
218 notation:
219
220 my $number = 0b10110110;
221
222 Using "oct":
223
224 my $input = "10110110";
225 my $decimal = oct( "0b$input" );
226
227 Using "pack" and "ord":
228
229 my $decimal = ord(pack('B8', '10110110'));
230
231 Using "pack" and "unpack" for larger strings:
232
233 my $int = unpack("N", pack("B32",
234 substr("0" x 32 . "11110101011011011111011101111", -32)));
235 my $dec = sprintf("%d", $int);
236
237 # substr() is used to left-pad a 32-character string with zeros.
238
239 Using Bit::Vector:
240
use Bit::Vector;
my $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
my $dec = $vec->to_Dec();
243
244 How do I convert from decimal to binary
245 Using "sprintf" (perl 5.6+):
246
247 my $bin = sprintf("%b", 3735928559);
248
249 Using "unpack":
250
251 my $bin = unpack("B*", pack("N", 3735928559));
252
253 Using Bit::Vector:
254
255 use Bit::Vector;
256 my $vec = Bit::Vector->new_Dec(32, -559038737);
257 my $bin = $vec->to_Bin();
258
259 The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
260 are left as an exercise to the inclined reader.
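
For readers who would rather not do the exercise, the built-ins compose in the obvious way: parse with "hex" or "oct", then format with "sprintf". A short sketch, with variable names chosen only for this example:

```perl
use strict;
use warnings;

# Parse with hex() or oct(), then format with sprintf().
my $oct_from_hex = sprintf "%o", hex("DEADBEEF");      # "33653337357"
my $hex_from_bin = sprintf "%X", oct("0b10110110");    # "B6"
my $bin_from_oct = sprintf "%b", oct("0266");          # "10110110"

print "$oct_from_hex $hex_from_bin $bin_from_oct\n";
```

Every conversion passes through an ordinary Perl number, so it is limited to values that fit in your perl's native integers.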
261
262 Why doesn't & work the way I want it to?
The behavior of the bitwise operators depends on whether they're
used on numbers or strings. Given strings, the operators treat each
string as a series of bits and work with that (the string "3" is the
bit pattern 00110011). Given numbers, the operators work with the
binary form of the number (the number 3 is treated as the bit pattern
00000011).
268
269 So, saying "11 & 3" performs the "and" operation on numbers (yielding
270 3). Saying "11" & "3" performs the "and" operation on strings (yielding
271 "1").
272
273 Most problems with "&" and "|" arise because the programmer thinks they
274 have a number but really it's a string or vice versa. To avoid this,
275 stringify the arguments explicitly (using "" or "qq()") or convert them
276 to numbers explicitly (using "0+$arg"). The rest arise because the
277 programmer says:
278
279 if ("\020\020" & "\101\101") {
280 # ...
281 }
282
283 but a string consisting of two null bytes (the result of "\020\020" &
284 "\101\101") is not a false value in Perl. You need:
285
286 if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
287 # ...
288 }
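
The following sketch shows both conversions side by side. It assumes the traditional behavior of "&", that is, without the "bitwise" feature that newer perls can enable; the variable names are made up for the example.

```perl
use strict;
use warnings;

my $from_input = "11";    # e.g. read from a file or @ARGV

my $numeric = ( 0 + $from_input ) & 3;    # numeric "and": 3
my $string  = "$from_input" & "3";        # string "and": "1"

# To branch on "any bit set" in a string result, look for a non-null byte.
my $masked      = "\020\020" & "\101\101";         # two null bytes
my $any_bit_set = $masked =~ /[^\000]/ ? 1 : 0;    # 0 here

print "$numeric $string $any_bit_set\n";           # prints "3 1 0"
```

The explicit "0 +" and the explicit quotes are what select the operation, so you do not have to remember where each value came from.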
289
290 How do I multiply matrices?
291 Use the Math::Matrix or Math::MatrixReal modules (available from CPAN)
292 or the PDL extension (also available from CPAN).
293
294 How do I perform an operation on a series of integers?
295 To call a function on each element in an array, and collect the
296 results, use:
297
298 my @results = map { my_func($_) } @array;
299
300 For example:
301
302 my @triple = map { 3 * $_ } @single;
303
304 To call a function on each element of an array, but ignore the results:
305
306 foreach my $iterator (@array) {
307 some_func($iterator);
308 }
309
310 To call a function on each integer in a (small) range, you can use:
311
312 my @results = map { some_func($_) } (5 .. 25);
313
314 but you should be aware that in this form, the ".." operator creates a
315 list of all integers in the range, which can take a lot of memory for
316 large ranges. However, the problem does not occur when using ".."
317 within a "for" loop, because in that case the range operator is
318 optimized to iterate over the range, without creating the entire list.
319 So
320
321 my @results = ();
322 for my $i (5 .. 500_005) {
323 push(@results, some_func($i));
324 }
325
326 or even
327
328 push(@results, some_func($_)) for 5 .. 500_005;
329
330 will not create an intermediate list of 500,000 integers.
331
332 How can I output Roman numerals?
Get the Roman module from CPAN:
<http://www.cpan.org/modules/by-module/Roman>.
335
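If you cannot install a module, the conversion is short enough to write by hand. This is a stand-alone sketch, not the Roman module's interface; the name "to_roman" and the 1 to 3999 range limit are assumptions for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-alone helper, not the Roman module's interface.
sub to_roman {
    my ($n) = @_;
    return unless $n =~ /\A\d+\z/ && $n >= 1 && $n <= 3999;
    my @table = (
        [ 1000, 'M' ], [ 900, 'CM' ], [ 500, 'D' ], [ 400, 'CD' ],
        [ 100,  'C' ], [ 90,  'XC' ], [ 50,  'L' ], [ 40,  'XL' ],
        [ 10,   'X' ], [ 9,   'IX' ], [ 5,   'V' ], [ 4,   'IV' ],
        [ 1,    'I' ],
    );
    my $roman = '';
    for my $pair (@table) {
        my ( $value, $symbol ) = @$pair;
        # Emit the largest symbol that still fits, as often as it fits.
        while ( $n >= $value ) {
            $roman .= $symbol;
            $n -= $value;
        }
    }
    return $roman;
}

print to_roman(1987), "\n";    # MCMLXXXVII
```
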
336 Why aren't my random numbers random?
337 If you're using a version of Perl before 5.004, you must call "srand"
338 once at the start of your program to seed the random number generator.
339
340 BEGIN { srand() if $] < 5.004 }
341
342 5.004 and later automatically call "srand" at the beginning. Don't call
343 "srand" more than once--you make your numbers less random, rather than
344 more.
345
346 Computers are good at being predictable and bad at being random
347 (despite appearances caused by bugs in your programs :-). The random
348 article in the "Far More Than You Ever Wanted To Know" collection in
349 <http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz>, courtesy of Tom
350 Phoenix, talks more about this. John von Neumann said, "Anyone who
351 attempts to generate random numbers by deterministic means is, of
352 course, living in a state of sin."
353
354 Perl relies on the underlying system for the implementation of "rand"
355 and "srand"; on some systems, the generated numbers are not random
enough (especially on Windows; see
357 <http://www.perlmonks.org/?node_id=803632>). Several CPAN modules in
358 the "Math" namespace implement better pseudorandom generators; see for
359 example Math::Random::MT ("Mersenne Twister", fast), or
360 Math::TrulyRandom (uses the imperfections in the system's timer to
361 generate random numbers, which is rather slow). More algorithms for
362 random numbers are described in "Numerical Recipes in C" at
363 <http://www.nr.com/>
364
365 How do I get a random number between X and Y?
366 To get a random number between two values, you can use the "rand()"
367 built-in to get a random number between 0 and 1. From there, you shift
368 that into the range that you want.
369
370 "rand($x)" returns a number such that "0 <= rand($x) < $x". Thus what
371 you want to have perl figure out is a random number in the range from 0
372 to the difference between your X and Y.
373
374 That is, to get a number between 10 and 15, inclusive, you want a
375 random number between 0 and 5 that you can then add to 10.
376
377 my $number = 10 + int rand( 15-10+1 ); # ( 10,11,12,13,14, or 15 )
378
379 Hence you derive the following simple function to abstract that. It
selects a random integer between the two given integers (inclusive).
For example: "random_int_between(50,120)".
382
383 sub random_int_between {
384 my($min, $max) = @_;
385 # Assumes that the two arguments are integers themselves!
386 return $min if $min == $max;
387 ($min, $max) = ($max, $min) if $min > $max;
388 return $min + int rand(1 + $max - $min);
389 }
390
392 How do I find the day or week of the year?
393 The day of the year is in the list returned by the "localtime"
394 function. Without an argument "localtime" uses the current time.
395
396 my $day_of_year = (localtime)[7];
397
398 The POSIX module can also format a date as the day of the year or week
399 of the year.
400
401 use POSIX qw/strftime/;
402 my $day_of_year = strftime "%j", localtime;
403 my $week_of_year = strftime "%W", localtime;
404
To get the day or week of the year for any date, use POSIX's "mktime"
to get a time in epoch seconds for the argument to "localtime".
407
408 use POSIX qw/mktime strftime/;
409 my $week_of_year = strftime "%W",
410 localtime( mktime( 0, 0, 0, 18, 11, 87 ) );
411
412 You can also use Time::Piece, which comes with Perl and provides a
413 "localtime" that returns an object:
414
415 use Time::Piece;
416 my $day_of_year = localtime->yday;
417 my $week_of_year = localtime->week;
418
419 The Date::Calc module provides two functions to calculate these, too:
420
use Date::Calc qw( Day_of_Year Week_of_Year );
422 my $day_of_year = Day_of_Year( 1987, 12, 18 );
423 my $week_of_year = Week_of_Year( 1987, 12, 18 );
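
Be aware that these approaches do not all count from the same starting point: the eighth element of "localtime" numbers the days from 0, while the "%j" format numbers them from 1. A small sketch using only the POSIX module, with the date picked to match the examples above:

```perl
use strict;
use warnings;
use POSIX qw(mktime strftime);

# Noon on 18 December 1987; the month is 0-based and the year counts from 1900.
my $epoch = mktime( 0, 0, 12, 18, 11, 87 );

my $yday_zero_based = ( localtime $epoch )[7];             # 351
my $yday_one_based  = strftime "%j", localtime $epoch;     # "352"

print "$yday_zero_based $yday_one_based\n";
```

Add 1 to the "localtime" value if you want the first of January to be day 1.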
424
425 How do I find the current century or millennium?
426 Use the following simple functions:
427
428 sub get_century {
429 return int((((localtime(shift || time))[5] + 1999))/100);
430 }
431
432 sub get_millennium {
433 return 1+int((((localtime(shift || time))[5] + 1899))/1000);
434 }
435
436 On some systems, the POSIX module's "strftime()" function has been
437 extended in a non-standard way to use a %C format, which they sometimes
438 claim is the "century". It isn't, because on most such systems, this is
439 only the first two digits of the four-digit year, and thus cannot be
440 used to determine reliably the current century or millennium.
441
442 How can I compare two dates and find the difference?
443 (contributed by brian d foy)
444
445 You could just store all your dates as a number and then subtract.
446 Life isn't always that simple though.
447
448 The Time::Piece module, which comes with Perl, replaces localtime with
449 a version that returns an object. It also overloads the comparison
450 operators so you can compare them directly:
451
452 use Time::Piece;
453 my $date1 = localtime( $some_time );
454 my $date2 = localtime( $some_other_time );
455
456 if( $date1 < $date2 ) {
457 print "The date was in the past\n";
458 }
459
460 You can also get differences with a subtraction, which returns a
461 Time::Seconds object:
462
my $date_diff = $date1 - $date2;
print "The difference is ", $date_diff->days, " days\n";
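
As a complete example, here is a sketch that parses two date strings with Time::Piece's "strptime" and subtracts them. The dates and the "%Y-%m-%d" format are made up for the illustration.

```perl
use strict;
use warnings;
use Time::Piece;

# strptime() treats the strings as UTC, so no daylight saving shifts apply.
my $start = Time::Piece->strptime( "2011-01-01", "%Y-%m-%d" );
my $end   = Time::Piece->strptime( "2011-02-14", "%Y-%m-%d" );

my $date_diff = $end - $start;    # a Time::Seconds object
print "The difference is ", $date_diff->days, " days\n";    # 44 days
```
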
465
466 If you want to work with formatted dates, the Date::Manip, Date::Calc,
467 or DateTime modules can help you.
468
469 How can I take a string and turn it into epoch seconds?
470 If it's a regular enough string that it always has the same format, you
471 can split it up and pass the parts to "timelocal" in the standard
472 Time::Local module. Otherwise, you should look into the Date::Calc,
473 Date::Parse, and Date::Manip modules from CPAN.
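
For the fixed-format case, a sketch might look like this. The "YYYY-MM-DD HH:MM:SS" layout, the helper name "parse_timestamp", and its optional UTC flag are all assumptions for the example; adapt the pattern to whatever your strings really look like.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal timegm);

# Hypothetical helper: parse "YYYY-MM-DD HH:MM:SS" into epoch seconds.
# Pass a true second argument to read the string as UTC instead of local time.
sub parse_timestamp {
    my ( $text, $utc ) = @_;
    my ( $y, $mon, $d, $h, $min, $s ) =
        $text =~ /\A(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)\z/
        or return;

    # Time::Local wants the month as 0..11.
    return $utc
        ? timegm( $s, $min, $h, $d, $mon - 1, $y )
        : timelocal( $s, $min, $h, $d, $mon - 1, $y );
}

print parse_timestamp( "2000-01-01 00:00:00", 1 ), "\n";    # 946684800
```

Note that "timelocal" and "timegm" die on out-of-range values such as a month of 13, so wrap the call in "eval" if the input is untrusted.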
474
475 How can I find the Julian Day?
476 (contributed by brian d foy and Dave Cross)
477
478 You can use the Time::Piece module, part of the Standard Library, which
479 can convert a date/time to a Julian Day:
480
481 $ perl -MTime::Piece -le 'print localtime->julian_day'
482 2455607.7959375
483
484 Or the modified Julian Day:
485
486 $ perl -MTime::Piece -le 'print localtime->mjd'
487 55607.2961226851
488
489 Or even the day of the year (which is what some people think of as a
490 Julian day):
491
492 $ perl -MTime::Piece -le 'print localtime->yday'
493 45
494
495 You can also do the same things with the DateTime module:
496
497 $ perl -MDateTime -le'print DateTime->today->jd'
498 2453401.5
499 $ perl -MDateTime -le'print DateTime->today->mjd'
500 53401
501 $ perl -MDateTime -le'print DateTime->today->doy'
502 31
503
504 You can use the Time::JulianDay module available on CPAN. Ensure that
505 you really want to find a Julian day, though, as many people have
506 different ideas about Julian days (see
507 <http://www.hermetic.ch/cal_stud/jdn.htm> for instance):
508
509 $ perl -MTime::JulianDay -le 'print local_julian_day( time )'
510 55608
511
512 How do I find yesterday's date?
513 (contributed by brian d foy)
514
515 To do it correctly, you can use one of the "Date" modules since they
516 work with calendars instead of times. The DateTime module makes it
simple, and gives you the same time of day, only the day before, despite
518 daylight saving time changes:
519
520 use DateTime;
521
522 my $yesterday = DateTime->now->subtract( days => 1 );
523
524 print "Yesterday was $yesterday\n";
525
526 You can also use the Date::Calc module using its "Today_and_Now"
527 function.
528
529 use Date::Calc qw( Today_and_Now Add_Delta_DHMS );
530
531 my @date_time = Add_Delta_DHMS( Today_and_Now(), -1, 0, 0, 0 );
532
533 print "@date_time\n";
534
535 Most people try to use the time rather than the calendar to figure out
536 dates, but that assumes that days are twenty-four hours each. For most
537 people, there are two days a year when they aren't: the switch to and
538 from summer time throws this off. For example, the rest of the
539 suggestions will be wrong sometimes:
540
541 Starting with Perl 5.10, Time::Piece and Time::Seconds are part of the
542 standard distribution, so you might think that you could do something
543 like this:
544
545 use Time::Piece;
546 use Time::Seconds;
547
548 my $yesterday = localtime() - ONE_DAY; # WRONG
549 print "Yesterday was $yesterday\n";
550
551 The Time::Piece module exports a new "localtime" that returns an
552 object, and Time::Seconds exports the "ONE_DAY" constant that is a set
553 number of seconds. This means that it always gives the time 24 hours
554 ago, which is not always yesterday. This can cause problems around the
555 end of daylight saving time when there's one day that is 25 hours long.
556
557 You have the same problem with Time::Local, which will give the wrong
558 answer for those same special cases:
559
560 # contributed by Gunnar Hjalmarsson
561 use Time::Local;
562 my $today = timelocal 0, 0, 12, ( localtime )[3..5];
563 my ($d, $m, $y) = ( localtime $today-86400 )[3..5]; # WRONG
564 printf "Yesterday: %d-%02d-%02d\n", $y+1900, $m+1, $d;
565
566 Does Perl have a Year 2000 or 2038 problem? Is Perl Y2K compliant?
567 (contributed by brian d foy)
568
569 Perl itself never had a Y2K problem, although that never stopped people
570 from creating Y2K problems on their own. See the documentation for
571 "localtime" for its proper use.
572
573 Starting with Perl 5.12, "localtime" and "gmtime" can handle dates past
574 03:14:08 January 19, 2038, when a 32-bit based time would overflow. You
575 still might get a warning on a 32-bit "perl":
576
577 % perl5.12 -E 'say scalar localtime( 0x9FFF_FFFFFFFF )'
578 Integer overflow in hexadecimal number at -e line 1.
579 Wed Nov 1 19:42:39 5576711
580
581 On a 64-bit "perl", you can get even larger dates for those really long
582 running projects:
583
584 % perl5.12 -E 'say scalar gmtime( 0x9FFF_FFFFFFFF )'
585 Thu Nov 2 00:42:39 5576711
586
587 You're still out of luck if you need to keep track of decaying protons
588 though.
589
591 How do I validate input?
592 (contributed by brian d foy)
593
594 There are many ways to ensure that values are what you expect or want
595 to accept. Besides the specific examples that we cover in the perlfaq,
596 you can also look at the modules with "Assert" and "Validate" in their
597 names, along with other modules such as Regexp::Common.
598
599 Some modules have validation for particular types of input, such as
600 Business::ISBN, Business::CreditCard, Email::Valid, and
601 Data::Validate::IP.
602
603 How do I unescape a string?
604 It depends just what you mean by "escape". URL escapes are dealt with
605 in perlfaq9. Shell escapes with the backslash ("\") character are
606 removed with
607
608 s/\\(.)/$1/g;
609
610 This won't expand "\n" or "\t" or any other special escapes.
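
If you do want a few special escapes expanded, one possibility is a lookup table consulted from the replacement side. This is only a sketch; the table contents and the name "unescape" are assumptions, and you would extend the table to suit your input.

```perl
use strict;
use warnings;

# Assumed table of escapes to expand; extend it to suit your input.
my %expansion = ( n => "\n", t => "\t", r => "\r" );

sub unescape {
    my ($text) = @_;
    # /s lets "." match a newline, so a backslash-newline pair also works.
    $text =~ s/\\(.)/exists $expansion{$1} ? $expansion{$1} : $1/gse;
    return $text;
}

print unescape('tab\there, plain\q, backslash\\\\'), "\n";
```

Any escaped character that is not in the table simply loses its backslash, as in the one-line substitution above.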
611
612 How do I remove consecutive pairs of characters?
613 (contributed by brian d foy)
614
615 You can use the substitution operator to find pairs of characters (or
616 runs of characters) and replace them with a single instance. In this
617 substitution, we find a character in "(.)". The memory parentheses
618 store the matched character in the back-reference "\g1" and we use that
619 to require that the same thing immediately follow it. We replace that
620 part of the string with the character in $1.
621
622 s/(.)\g1/$1/g;
623
624 We can also use the transliteration operator, "tr///". In this example,
625 the search list side of our "tr///" contains nothing, but the "c"
626 option complements that so it contains everything. The replacement list
627 also contains nothing, so the transliteration is almost a no-op since
628 it won't do any replacements (or more exactly, replace the character
629 with itself). However, the "s" option squashes duplicated and
630 consecutive characters in the string so a character does not show up
631 next to itself
632
633 my $str = 'Haarlem'; # in the Netherlands
634 $str =~ tr///cs; # Now Harlem, like in New York
635
636 How do I expand function calls in a string?
637 (contributed by brian d foy)
638
639 This is documented in perlref, and although it's not the easiest thing
640 to read, it does work. In each of these examples, we call the function
641 inside the braces used to dereference a reference. If we have more than
642 one return value, we can construct and dereference an anonymous array.
643 In this case, we call the function in list context.
644
645 print "The time values are @{ [localtime] }.\n";
646
647 If we want to call the function in scalar context, we have to do a bit
648 more work. We can really have any code we like inside the braces, so we
649 simply have to end with the scalar reference, although how you do that
650 is up to you, and you can use code inside the braces. Note that the use
651 of parens creates a list context, so we need "scalar" to force the
652 scalar context on the function:
653
print "The time is ${\(scalar localtime)}.\n";
655
656 print "The time is ${ my $x = localtime; \$x }.\n";
657
658 If your function already returns a reference, you don't need to create
659 the reference yourself.
660
661 sub timestamp { my $t = localtime; \$t }
662
663 print "The time is ${ timestamp() }.\n";
664
665 The "Interpolation" module can also do a lot of magic for you. You can
666 specify a variable name, in this case "E", to set up a tied hash that
667 does the interpolation for you. It has several other methods to do this
668 as well.
669
670 use Interpolation E => 'eval';
671 print "The time values are $E{localtime()}.\n";
672
673 In most cases, it is probably easier to simply use string
674 concatenation, which also forces scalar context.
675
676 print "The time is " . localtime() . ".\n";
677
678 How do I find matching/nesting anything?
679 To find something between two single characters, a pattern like
680 "/x([^x]*)x/" will get the intervening bits in $1. For multiple ones,
681 then something more like "/alpha(.*?)omega/" would be needed. For
682 nested patterns and/or balanced expressions, see the so-called (?PARNO)
683 construct (available since perl 5.10). The CPAN module Regexp::Common
684 can help to build such regular expressions (see in particular
685 Regexp::Common::balanced and Regexp::Common::delimited).
686
More complex cases will require writing a parser, probably using a
688 parsing module from CPAN, like Regexp::Grammars, Parse::RecDescent,
689 Parse::Yapp, Text::Balanced, or Marpa::XS.
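
As an illustration of the (?PARNO) construct, here is a sketch that extracts the first balanced parenthesized group from a string. It assumes perl 5.10 or later and that the parentheses in the text are in fact balanced; the sample text is made up.

```perl
use strict;
use warnings;

my $text = 'call(alpha, nested(beta, (gamma)), delta) and more';

# Group 1 matches "(", then any mix of non-parenthesis runs or a
# recursive match of group 1 itself via (?1), then ")".
my ($balanced) = $text =~ / ( \( (?: [^()]++ | (?1) )* \) ) /x;

print "$balanced\n";    # (alpha, nested(beta, (gamma)), delta)
```

The (?1) refers to the first capture group of the whole pattern, so take care with the group numbering if you embed this in a larger regular expression.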
690
691 How do I reverse a string?
692 Use "reverse()" in scalar context, as documented in "reverse" in
693 perlfunc.
694
695 my $reversed = reverse $string;
696
697 How do I expand tabs in a string?
698 You can do it yourself:
699
700 1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
701
702 Or you can just use the Text::Tabs module (part of the standard Perl
703 distribution).
704
705 use Text::Tabs;
706 my @expanded_lines = expand(@lines_with_tabs);
707
708 How do I reformat a paragraph?
709 Use Text::Wrap (part of the standard Perl distribution):
710
711 use Text::Wrap;
712 print wrap("\t", ' ', @paragraphs);
713
714 The paragraphs you give to Text::Wrap should not contain embedded
715 newlines. Text::Wrap doesn't justify the lines (flush-right).
716
717 Or use the CPAN module Text::Autoformat. Formatting files can be easily
718 done by making a shell alias, like so:
719
720 alias fmt="perl -i -MText::Autoformat -n0777 \
721 -e 'print autoformat $_, {all=>1}' $*"
722
723 See the documentation for Text::Autoformat to appreciate its many
724 capabilities.
725
726 How can I access or change N characters of a string?
727 You can access the first characters of a string with substr(). To get
728 the first character, for example, start at position 0 and grab the
729 string of length 1.
730
731 my $string = "Just another Perl Hacker";
732 my $first_char = substr( $string, 0, 1 ); # 'J'
733
734 To change part of a string, you can use the optional fourth argument
735 which is the replacement string.
736
737 substr( $string, 13, 4, "Perl 5.8.0" );
738
739 You can also use substr() as an lvalue.
740
741 substr( $string, 13, 4 ) = "Perl 5.8.0";
742
743 How do I change the Nth occurrence of something?
744 You have to keep track of N yourself. For example, let's say you want
745 to change the fifth occurrence of "whoever" or "whomever" into
746 "whosoever" or "whomsoever", case insensitively. These all assume that
747 $_ contains the string to be altered.
748
749 $count = 0;
750 s{((whom?)ever)}{
751 ++$count == 5 # is it the 5th?
752 ? "${2}soever" # yes, swap
753 : $1 # renege and leave it there
754 }ige;
755
756 In the more general case, you can use the "/g" modifier in a "while"
757 loop, keeping count of matches.
758
759 $WANT = 3;
760 $count = 0;
761 $_ = "One fish two fish red fish blue fish";
762 while (/(\w+)\s+fish\b/gi) {
763 if (++$count == $WANT) {
764 print "The third fish is a $1 one.\n";
765 }
766 }
767
768 That prints out: "The third fish is a red one." You can also use a
769 repetition count and repeated pattern like this:
770
771 /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
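
The counting substitution generalizes easily. Here is a sketch of a helper that replaces only the Nth match of a pattern; the name "replace_nth" and its argument order are assumptions for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: replace only the $n-th match of $pattern in $text.
sub replace_nth {
    my ( $text, $pattern, $n, $replacement ) = @_;
    my $count = 0;
    # Every match is visited; all but the $n-th are put back unchanged.
    $text =~ s/($pattern)/ ++$count == $n ? $replacement : $1 /ge;
    return $text;
}

print replace_nth( "One fish two fish red fish blue fish",
    qr/fish/, 3, "herring" ), "\n";
# One fish two fish red herring blue fish
```
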
772
773 How can I count the number of occurrences of a substring within a string?
774 There are a number of ways, with varying efficiency. If you want a
775 count of a certain single character (X) within a string, you can use
the "tr///" operator like so:
777
778 my $string = "ThisXlineXhasXsomeXx'sXinXit";
779 my $count = ($string =~ tr/X//);
780 print "There are $count X characters in the string";
781
782 This is fine if you are just looking for a single character. However,
783 if you are trying to count multiple character substrings within a
784 larger string, "tr///" won't work. What you can do is wrap a while()
785 loop around a global pattern match. For example, let's count negative
786 integers:
787
788 my $string = "-9 55 48 -2 23 -76 4 14 -44";
789 my $count = 0;
790 while ($string =~ /-\d+/g) { $count++ }
791 print "There are $count negative numbers in the string";
792
793 Another version uses a global match in list context, then assigns the
794 result to a scalar, producing a count of the number of matches.
795
796 my $count = () = $string =~ /-\d+/g;
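
Both of these count non-overlapping matches. If overlapping occurrences should count too, one option is to put the pattern inside a lookahead so that each match consumes nothing. A sketch with a made-up sample string:

```perl
use strict;
use warnings;

my $string = "aaaa";

my $separate    = () = $string =~ /aa/g;        # 2: "aa" at offsets 0 and 2
my $overlapping = () = $string =~ /(?=aa)/g;    # 3: "aa" at offsets 0, 1, and 2

print "$separate $overlapping\n";
```
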
797
798 How do I capitalize all the words on one line?
799 (contributed by brian d foy)
800
801 Damian Conway's Text::Autoformat handles all of the thinking for you.
802
803 use Text::Autoformat;
804 my $x = "Dr. Strangelove or: How I Learned to Stop ".
805 "Worrying and Love the Bomb";
806
807 print $x, "\n";
808 for my $style (qw( sentence title highlight )) {
809 print autoformat($x, { case => $style }), "\n";
810 }
811
812 How do you want to capitalize those words?
813
814 FRED AND BARNEY'S LODGE # all uppercase
815 Fred And Barney's Lodge # title case
816 Fred and Barney's Lodge # highlight case
817
818 It's not as easy a problem as it looks. How many words do you think are
819 in there? Wait for it... wait for it.... If you answered 5 you're
820 right. Perl words are groups of "\w+", but that's not what you want to
821 capitalize. How is Perl supposed to know not to capitalize that "s"
822 after the apostrophe? You could try a regular expression:
823
824 $string =~ s/ (
825 (^\w) #at the beginning of the line
826 | # or
827 (\s\w) #preceded by whitespace
828 )
829 /\U$1/xg;
830
831 $string =~ s/([\w']+)/\u\L$1/g;
832
833 Now, what if you don't want to capitalize that "and"? Just use
834 Text::Autoformat and get on with the next problem. :)
835
836 How can I split a [character]-delimited string except when inside
837 [character]?
838 Several modules can handle this sort of parsing--Text::Balanced,
839 Text::CSV, Text::CSV_XS, and Text::ParseWords, among others.
840
841 Take the example case of trying to split a string that is comma-
842 separated into its different fields. You can't use "split(/,/)" because
843 you shouldn't split if the comma is inside quotes. For example, take a
844 data line like this:
845
846 SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
847
848 Due to the restriction of the quotes, this is a fairly complex problem.
849 Thankfully, we have Jeffrey Friedl, author of Mastering Regular
850 Expressions, to handle these for us. He suggests (assuming your string
851 is contained in $text):
852
853 my @new = ();
854 push(@new, $+) while $text =~ m{
855 "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
856 | ([^,]+),?
857 | ,
858 }gx;
859 push(@new, undef) if substr($text,-1,1) eq ',';
860
If you want to represent quotation marks inside a quotation-mark-delimited
field, escape them with backslashes (eg, "like \"this\"").
863
864 Alternatively, the Text::ParseWords module (part of the standard Perl
865 distribution) lets you say:
866
867 use Text::ParseWords;
868 @new = quotewords(",", 0, $text);
869
870 For parsing or generating CSV, though, using Text::CSV rather than
871 implementing it yourself is highly recommended; you'll save yourself
872 odd bugs popping up later by just using code which has already been
873 tried and tested in production for years.
874
875 How do I strip blank space from the beginning/end of a string?
876 (contributed by brian d foy)
877
878 A substitution can do this for you. For a single line, you want to
879 replace all the leading or trailing whitespace with nothing. You can do
880 that with a pair of substitutions:
881
882 s/^\s+//;
883 s/\s+$//;
884
885 You can also write that as a single substitution, although it turns out
886 the combined statement is slower than the separate ones. That might not
887 matter to you, though:
888
889 s/^\s+|\s+$//g;
890
891 In this regular expression, the alternation matches either at the
892 beginning or the end of the string since the alternation has a lower
893 precedence than the anchors. With the "/g" flag, the substitution
894 makes all possible matches, so it gets both. Remember, the trailing
895 newline matches the "\s+", and the "$" anchor can match to the
896 absolute end of the string, so the newline disappears too. Just add the
897 newline to the output, which has the added benefit of preserving
898 "blank" (consisting entirely of whitespace) lines which the "^\s+"
899 would remove all by itself:
900
901 while( <> ) {
902 s/^\s+|\s+$//g;
903 print "$_\n";
904 }
905
906 For a multi-line string, you can apply the regular expression to each
907 logical line in the string by adding the "/m" flag (for "multi-line").
908 With the "/m" flag, the "$" matches before an embedded newline, so it
909 doesn't remove it. This pattern still removes the newline at the end of
910 the string:
911
912 $string =~ s/^\s+|\s+$//gm;
913
914 Remember that lines consisting entirely of whitespace will disappear,
915 since the first part of the alternation can match the entire string and
916 replace it with nothing. If you need to keep embedded blank lines, you
917 have to do a little more work. Instead of matching any whitespace
918 (since that includes a newline), just match the other whitespace:
919
920 $string =~ s/^[\t\f ]+|[\t\f ]+$//mg;
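
If you trim strings often, you might wrap the pair of substitutions in a small subroutine. This is only a sketch: "trim" is a hypothetical helper name, and the sample strings are made up. The second part shows the pattern that keeps embedded blank lines.

```perl
use strict;
use warnings;

# trim() is a hypothetical helper name used for illustration.
sub trim {
    my ($string) = @_;
    $string =~ s/^\s+//;    # leading whitespace
    $string =~ s/\s+$//;    # trailing whitespace, including a final newline
    return $string;
}

my $trimmed = trim("  \t Hello, world \n");
print "[$trimmed]\n";    # prints "[Hello, world]"

# Trim every line of a multi-line string but keep the blank line.
my $block = "  one  \n\n  two  \n";
( my $kept = $block ) =~ s/^[\t\f ]+|[\t\f ]+$//mg;
# $kept is now "one\n\ntwo\n"
```

Because the subroutine works on a copy of its argument, the caller's original string is left untouched.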
921
922 How do I pad a string with blanks or pad a number with zeroes?
923 In the following examples, $pad_len is the length to which you wish to
924 pad the string, $text or $num contains the string to be padded, and
925 $pad_char contains the padding character. You can use a single
926 character string constant instead of the $pad_char variable if you know
927 what it is in advance. And in the same way you can use an integer in
928 place of $pad_len if you know the pad length in advance.
929
930 The simplest method uses the "sprintf" function. It can pad on the left
931 or right with blanks and on the left with zeroes and it will not
932 truncate the result. The "pack" function can only pad strings on the
933 right with blanks and it will truncate the result to a maximum length
934 of $pad_len.
935
936 # Left padding a string with blanks (no truncation):
937 my $padded = sprintf("%${pad_len}s", $text);
938 my $padded = sprintf("%*s", $pad_len, $text); # same thing
939
940 # Right padding a string with blanks (no truncation):
941 my $padded = sprintf("%-${pad_len}s", $text);
942 my $padded = sprintf("%-*s", $pad_len, $text); # same thing
943
944 # Left padding a number with 0 (no truncation):
945 my $padded = sprintf("%0${pad_len}d", $num);
946 my $padded = sprintf("%0*d", $pad_len, $num); # same thing
947
948 # Right padding a string with blanks using pack (will truncate):
949 my $padded = pack("A$pad_len",$text);
950
951 If you need to pad with a character other than blank or zero you can
952 use one of the following methods. They all generate a pad string with
953 the "x" operator and combine that with $text. These methods do not
954 truncate $text.
955
956 Left and right padding with any character, creating a new string:
957
958 my $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
959 my $padded = $text . $pad_char x ( $pad_len - length( $text ) );
960
961 Left and right padding with any character, modifying $text directly:
962
963 substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
964 $text .= $pad_char x ( $pad_len - length( $text ) );
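
To see the difference between the non-truncating and truncating forms, here is a short sketch with made-up values ($pad_len of 6 and the text 'Perl'):

```perl
use strict;
use warnings;

my $pad_len = 6;
my $text    = 'Perl';

my $left  = sprintf( "%*s",  $pad_len, $text );    # '  Perl'
my $right = sprintf( "%-*s", $pad_len, $text );    # 'Perl  '
my $zero  = sprintf( "%0*d", $pad_len, 42 );       # '000042'

# sprintf never truncates; pack with the "A" format does.
my $kept = sprintf( "%*s", 3, $text );             # 'Perl'
my $cut  = pack( "A3", $text );                    # 'Per'

# Padding with an arbitrary character using the x operator.
my $pad_char = '.';
my $dotted   = $pad_char x ( $pad_len - length($text) ) . $text;    # '..Perl'

print "[$_]\n" for $left, $right, $zero, $kept, $cut, $dotted;
```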
965
966 How do I extract selected columns from a string?
967 (contributed by brian d foy)
968
969 If you know the columns that contain the data, you can use "substr" to
970 extract a single column.
971
972 my $column = substr( $line, $start_column, $length );
973
974 You can use "split" if the columns are separated by whitespace or some
975 other delimiter, as long as whitespace or the delimiter cannot appear
976 as part of the data.
977
978 my $line = ' fred barney betty ';
979 my @columns = split /\s+/, $line;
980 # ( '', 'fred', 'barney', 'betty' );
981
982 my $line = 'fred||barney||betty';
983 my @columns = split /\|/, $line;
984 # ( 'fred', '', 'barney', '', 'betty' );
985
986 If you want to work with comma-separated values, don't do this since
987 that format is a bit more complicated. Use one of the modules that
988 handle that format, such as Text::CSV, Text::CSV_XS, or Text::CSV_PP.
989
990 If you want to break apart an entire line of fixed columns, you can use
991 "unpack" with the A (ASCII) format. By using a number after the format
992 specifier, you can denote the column width. See the "pack" and "unpack"
993 entries in perlfunc for more details.
994
995 my @fields = unpack( "A8 A8 A8 A16 A4", $line );
996
997 Note that spaces in the format argument to "unpack" do not denote
998 literal spaces. If you have space separated data, you may want "split"
999 instead.
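
Here is a small, self-contained sketch of the "unpack" approach. The three 8-character columns are an assumed layout, built with "sprintf" so the widths are explicit:

```perl
use strict;
use warnings;

# Build a line of three fixed-width, 8-character columns (made-up data).
my $line = sprintf "%-8s%-8s%-8s", qw( fred barney betty );

# The template comes first, then the string to take apart.
# The "A" format strips the trailing spaces from each field.
my @fields = unpack( "A8 A8 A8", $line );

print join( '|', @fields ), "\n";    # prints "fred|barney|betty"
```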
1000
1001 How do I find the soundex value of a string?
1002 (contributed by brian d foy)
1003
1004 You can use the "Text::Soundex" module. If you want to do fuzzy or
1005 close matching, you might also try the String::Approx,
1006 Text::Metaphone, and Text::DoubleMetaphone modules.
1007
1008 How can I expand variables in text strings?
1009 (contributed by brian d foy)
1010
1011 If you can avoid it, don't, or if you can use a templating system, such
1012 as Text::Template or Template Toolkit, do that instead. You might even
1013 be able to get the job done with "sprintf" or "printf":
1014
1015 my $string = sprintf 'Say hello to %s and %s', $foo, $bar;
1016
1017 However, for the one-off simple case where I don't want to pull out a
1018 full templating system, I'll use a string that has two Perl scalar
1019 variables in it. In this example, I want to expand $foo and $bar to
1020 their variables' values:
1021
1022 my $foo = 'Fred';
1023 my $bar = 'Barney';
1024 $string = 'Say hello to $foo and $bar';
1025
1026 One way I can do this involves the substitution operator and a double
1027 "/e" flag. The first "/e" evaluates $1 on the replacement side and
1028 turns it into $foo. The second /e starts with $foo and replaces it with
1029 its value. $foo, then, turns into 'Fred', and that's finally what's
1030 left in the string:
1031
1032 $string =~ s/(\$\w+)/$1/eeg; # 'Say hello to Fred and Barney'
1033
1034 The "/e" will also silently ignore violations of strict, replacing
1035 undefined variable names with the empty string. Since I'm using the
1036 "/e" flag (twice even!), I have all of the same security problems I
1037 have with "eval" in its string form. If there's something odd in $foo,
1038 perhaps something like "@{[ system "rm -rf /" ]}", then I could get
1039 myself in trouble.
1040
1041 To get around the security problem, I could also pull the values from a
1042 hash instead of evaluating variable names. Using a single "/e", I can
1043 check the hash to ensure the value exists, and if it doesn't, I can
1044 replace the missing value with a marker, in this case "???" to signal
1045 that I missed something:
1046
1047 my $string = 'This has $foo and $bar';
1048
1049 my %Replacements = (
1050 foo => 'Fred',
1051 );
1052
1053 # $string =~ s/\$(\w+)/$Replacements{$1}/g;
1054 $string =~ s/\$(\w+)/
1055 exists $Replacements{$1} ? $Replacements{$1} : '???'
1056 /eg;
1057
1058 print $string;
1059
1060 What's wrong with always quoting "$vars"?
1061 The problem is that those double-quotes force stringification--coercing
1062 numbers and references into strings--even when you don't want them to
1063 be strings. Think of it this way: double-quote expansion is used to
1064 produce new strings. If you already have a string, why do you need
1065 more?
1066
1067 If you get used to writing odd things like these:
1068
1069 print "$var"; # BAD
1070 my $new = "$old"; # BAD
1071 somefunc("$var"); # BAD
1072
1073 You'll be in trouble. Those should (in 99.8% of the cases) be the
1074 simpler and more direct:
1075
1076 print $var;
1077 my $new = $old;
1078 somefunc($var);
1079
1080 Otherwise, besides slowing you down, you're going to break code when
1081 the thing in the scalar is actually neither a string nor a number, but
1082 a reference:
1083
1084 func(\@array);
1085 sub func {
1086 my $aref = shift;
1087 my $oref = "$aref"; # WRONG
1088 }
1089
1090 You can also get into subtle problems on those few operations in Perl
1091 that actually do care about the difference between a string and a
1092 number, such as the magical "++" autoincrement operator or the
1093 syscall() function.
1094
1095 Stringification also destroys arrays.
1096
1097 my @lines = `command`;
1098 print "@lines"; # WRONG - extra blanks
1099 print @lines; # right
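
A short sketch makes both problems visible. The variable names are only for illustration:

```perl
use strict;
use warnings;

my @array = ( 1, 2, 3 );
my $aref  = \@array;
my $copy  = "$aref";    # now just a string such as "ARRAY(0x...)"

print ref($aref) ? "still a reference\n" : "plain string\n";
print ref($copy) ? "still a reference\n" : "plain string\n";

# Interpolating an array joins its elements with $" (a space by default).
my @lines  = ( "one\n", "two\n" );
my $joined = "@lines";           # "one\n two\n", note the extra blank
my $plain  = join '', @lines;    # "one\ntwo\n"
```

Once the reference has been stringified, there is no way to get the original array back from $copy.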
1100
1101 Why don't my <<HERE documents work?
1102 Here documents are described in perlop. Check for these four things:
1103
1104 There must be no space after the << part.
1105 There (probably) should be a semicolon at the end of the opening token.
1106 You can't (easily) have any space in front of the tag.
1107 There needs to be at least a line separator after the end token.
1108
1109 If you want to indent the text in the here document, you can do this:
1110
1111 # all in one
1112 (my $VAR = <<HERE_TARGET) =~ s/^\s+//gm;
1113 your text
1114 goes here
1115 HERE_TARGET
1116
1117 But the HERE_TARGET must still be flush against the margin. If you
1118 want that indented also, you'll have to quote in the indentation.
1119
1120 (my $quote = <<' FINIS') =~ s/^\s+//gm;
1121 ...we will have peace, when you and all your works have
1122 perished--and the works of your dark master to whom you
1123 would deliver us. You are a liar, Saruman, and a corrupter
1124 of men's hearts. --Theoden in /usr/src/perl/taint.c
1125 FINIS
1126 $quote =~ s/\s+--/\n--/;
1127
1128 A nice general-purpose fixer-upper function for indented here documents
1129 follows. It expects to be called with a here document as its argument.
1130 It looks to see whether each line begins with a common substring, and
1131 if so, strips that substring off. Otherwise, it takes the amount of
1132 leading whitespace found on the first line and removes that much off
1133 each subsequent line.
1134
1135 sub fix {
1136 local $_ = shift;
1137 my ($white, $leader); # common whitespace and common leading string
1138 if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\g1\g2?.*\n)+$/) {
1139 ($white, $leader) = ($2, quotemeta($1));
1140 } else {
1141 ($white, $leader) = (/^(\s+)/, '');
1142 }
1143 s/^\s*?$leader(?:$white)?//gm;
1144 return $_;
1145 }
1146
1147 This works with leading special strings, dynamically determined:
1148
1149 my $remember_the_main = fix<<' MAIN_INTERPRETER_LOOP';
1150 @@@ int
1151 @@@ runops() {
1152 @@@ SAVEI32(runlevel);
1153 @@@ runlevel++;
1154 @@@ while ( op = (*op->op_ppaddr)() );
1155 @@@ TAINT_NOT;
1156 @@@ return 0;
1157 @@@ }
1158 MAIN_INTERPRETER_LOOP
1159
1160 Or with a fixed amount of leading whitespace, with remaining
1161 indentation correctly preserved:
1162
1163 my $poem = fix<<EVER_ON_AND_ON;
1164 Now far ahead the Road has gone,
1165 And I must follow, if I can,
1166 Pursuing it with eager feet,
1167 Until it joins some larger way
1168 Where many paths and errands meet.
1169 And whither then? I cannot say.
1170 --Bilbo in /usr/src/perl/pp_ctl.c
1171 EVER_ON_AND_ON
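
On Perl 5.26 and later, the indented here-document syntax "<<~" removes the common leading whitespace for you and lets the terminator be indented too. A minimal sketch, assuming such a Perl is available ("poem" is a made-up subroutine name):

```perl
use v5.26;    # indented here-documents need Perl 5.26 or later
use warnings;

sub poem {
    my $text = <<~EOT;
        Now far ahead the Road has gone,
          And I must follow, if I can,
        EOT
    return $text;
}

print poem();
```

The indentation shared by all lines is stripped, while the extra two spaces on the second line are preserved.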
1172
1174 What is the difference between a list and an array?
1175 (contributed by brian d foy)
1176
1177 A list is a fixed collection of scalars. An array is a variable that
1178 holds a variable collection of scalars. An array can supply its
1179 collection for list operations, so list operations also work on arrays:
1180
1181 # slices
1182 ( 'dog', 'cat', 'bird' )[2,3];
1183 @animals[2,3];
1184
1185 # iteration
1186 foreach ( qw( dog cat bird ) ) { ... }
1187 foreach ( @animals ) { ... }
1188
1189 my @three = grep { length == 3 } qw( dog cat bird );
1190 my @three = grep { length == 3 } @animals;
1191
1192 # supply an argument list
1193 wash_animals( qw( dog cat bird ) );
1194 wash_animals( @animals );
1195
1196 Array operations, which change the scalars, rearrange them, or add or
1197 remove some scalars, only work on arrays. These can't work on a
1198 list, which is fixed. Array operations include "shift", "unshift",
1199 "push", "pop", and "splice".
1200
1201 An array can also change its length:
1202
1203 $#animals = 1; # truncate to two elements
1204 $#animals = 10000; # pre-extend to 10,001 elements
1205
1206 You can change an array element, but you can't change a list element:
1207
1208 $animals[0] = 'Rottweiler';
1209 qw( dog cat bird )[0] = 'Rottweiler'; # syntax error!
1210
1211 foreach ( @animals ) {
1212 s/^d/fr/; # works fine
1213 }
1214
1215 foreach ( qw( dog cat bird ) ) {
1216 s/^d/fr/; # Error! Modification of read only value!
1217 }
1218
1219 However, if the list element is itself a variable, it appears that you
1220 can change a list element. In that case, though, the list element is the variable,
1221 not the data. You're not changing the list element, but something the
1222 list element refers to. The list element itself doesn't change: it's
1223 still the same variable.
1224
1225 You also have to be careful about context. You can assign an array to a
1226 scalar to get the number of elements in the array. This only works for
1227 arrays, though:
1228
1229 my $count = @animals; # only works with arrays
1230
1231 If you try to do the same thing with what you think is a list, you get
1232 a quite different result. Although it looks like you have a list on the
1233 righthand side, Perl actually sees a bunch of scalars separated by a
1234 comma:
1235
1236 my $scalar = ( 'dog', 'cat', 'bird' ); # $scalar gets bird
1237
1238 Since you're assigning to a scalar, the righthand side is in scalar
1239 context. The comma operator (yes, it's an operator!) in scalar context
1240 evaluates its lefthand side, throws away the result, and evaluates its
1241 righthand side and returns the result. In effect, that list-lookalike
1242 assigns to $scalar its rightmost value. Many people mess this up
1243 because they choose a list-lookalike whose last element is also the
1244 count they expect:
1245
1246 my $scalar = ( 1, 2, 3 ); # $scalar gets 3, accidentally
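
The following sketch puts the three cases side by side. The last line uses a list assignment in scalar context, which returns the number of elements on its righthand side, and is one way to count the items of a list-lookalike:

```perl
use strict;
use warnings;
no warnings 'void';    # the comma-operator line below is deliberately odd

my @animals = qw( dog cat bird );

my $count = @animals;                         # 3: array in scalar context
my $last  = ( 'dog', 'cat', 'bird' );         # 'bird': comma operator
my $items = () = ( 'dog', 'cat', 'bird' );    # 3: list assignment in scalar context

print "$count $last $items\n";    # prints "3 bird 3"
```

Without the "no warnings 'void'" line, Perl warns about the discarded constants in the comma-operator case, which is a useful hint that something is off.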
1247
1248 What is the difference between $array[1] and @array[1]?
1249 (contributed by brian d foy)
1250
1251 The difference is the sigil, that special character in front of the
1252 array name. The "$" sigil means "exactly one item", while the "@" sigil
1253 means "zero or more items". The "$" gets you a single scalar, while the
1254 "@" gets you a list.
1255
1256 The confusion arises because people incorrectly assume that the sigil
1257 denotes the variable type.
1258
1259 The $array[1] is a single-element access to the array. It's going to
1260 return the item in index 1 (or undef if there is no item there). If
1261 you intend to get exactly one element from the array, this is the form
1262 you should use.
1263
1264 The @array[1] is an array slice, although it has only one index. You
1265 can pull out multiple elements simultaneously by specifying additional
1266 indices as a list, like @array[1,4,3,0].
1267
1268 Using a slice on the lefthand side of the assignment supplies list
1269 context to the righthand side. This can lead to unexpected results.
1270 For instance, if you want to read a single line from a filehandle,
1271 assigning to a scalar value is fine:
1272
1273 $array[1] = <STDIN>;
1274
1275 However, in list context, the line input operator returns all of the
1276 lines as a list. The first line goes into @array[1] and the rest of the
1277 lines mysteriously disappear:
1278
1279 @array[1] = <STDIN>; # most likely not what you want
1280
1281 Either the "use warnings" pragma or the -w flag will warn you when you
1282 use an array slice with a single index.
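
You can watch the context change without reading from a filehandle. In this sketch, "context" is a hypothetical helper that uses "wantarray" to report how it was called:

```perl
use strict;
use warnings;

# context() is a hypothetical helper that reports how it was called.
sub context { return wantarray ? ( 'list', 'extra', 'values' ) : 'scalar' }

my @array = ( 'a', 'b', 'c' );

$array[1] = context();    # element assignment: scalar context
my $from_element = $array[1];

{
    no warnings 'syntax';     # silence the one-element slice warning
    @array[1] = context();    # slice assignment: list context
}
my $from_slice = $array[1];

print "$from_element $from_slice\n";    # prints "scalar list"
```

In the slice assignment, only the first returned value lands in the array; 'extra' and 'values' are silently dropped, just like the remaining input lines in the example above.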
1283
1284 How can I remove duplicate elements from a list or array?
1285 (contributed by brian d foy)
1286
1287 Use a hash. When you think the words "unique" or "duplicated", think
1288 "hash keys".
1289
1290 If you don't care about the order of the elements, you could just
1291 create the hash then extract the keys. It's not important how you
1292 create that hash: just that you use "keys" to get the unique elements.
1293
1294 my %hash = map { $_, 1 } @array;
1295 # or a hash slice: @hash{ @array } = ();
1296 # or a foreach: $hash{$_} = 1 foreach ( @array );
1297
1298 my @unique = keys %hash;
1299
1300 If you want to use a module, try the "uniq" function from
1301 List::MoreUtils. In list context it returns the unique elements,
1302 preserving their order in the list. In scalar context, it returns the
1303 number of unique elements.
1304
1305 use List::MoreUtils qw(uniq);
1306
1307 my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
1308 my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7
1309
1310 You can also go through each element and skip the ones you've seen
1311 before. Use a hash to keep track. The first time the loop sees an
1312 element, that element has no key in %seen. The "next" statement creates
1313 the key and immediately uses its value, which is "undef", so the loop
1314 continues to the "push" and increments the value for that key. The next
1315 time the loop sees that same element, its key exists in the hash and
1316 the value for that key is true (since it's not 0 or "undef"), so the
1317 next skips that iteration and the loop goes to the next element.
1318
1319 my @unique = ();
1320 my %seen = ();
1321
1322 foreach my $elem ( @array ) {
1323 next if $seen{ $elem }++;
1324 push @unique, $elem;
1325 }
1326
1327 You can write this more briefly using a grep, which does the same
1328 thing.
1329
1330 my %seen = ();
1331 my @unique = grep { ! $seen{ $_ }++ } @array;
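
Unlike the "keys" approach, the "grep" version keeps the elements in the order of their first appearance. A small sketch with made-up data:

```perl
use strict;
use warnings;

my @array = qw( pear apple pear fig apple );

my %seen;
my @unique = grep { !$seen{$_}++ } @array;

print "@unique\n";    # prints "pear apple fig"

# As a side effect, %seen now holds how often each element occurred.
print "$_ seen $seen{$_} time(s)\n" for @unique;
```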
1332
1333 How can I tell whether a certain element is contained in a list or array?
1334 (portions of this answer contributed by Anno Siegel and brian d foy)
1335
1336 Hearing the word "in" is an indication that you probably should have
1337 used a hash, not a list or array, to store your data. Hashes are
1338 designed to answer this question quickly and efficiently. Arrays
1339 aren't.
1340
1341 That being said, there are several ways to approach this. In Perl 5.10
1342 and later, you can use the smart match operator to check that an item
1343 is contained in an array or a hash:
1344
1345 use 5.010;
1346
1347 if( $item ~~ @array ) {
1348 say "The array contains $item"
1349 }
1350
1351 if( $item ~~ %hash ) {
1352 say "The hash contains $item"
1353 }
1354
1355 With earlier versions of Perl, you have to do a bit more work. If you
1356 are going to make this query many times over arbitrary string values,
1357 the fastest way is probably to invert the original array and maintain a
1358 hash whose keys are the first array's values:
1359
1360 my @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
1361 my %is_blue = ();
1362 for (@blues) { $is_blue{$_} = 1 }
1363
1364 Now you can check whether $is_blue{$some_color}. It might have been a
1365 good idea to keep the blues all in a hash in the first place.
1366
1367 If the values are all small integers, you could use a simple indexed
1368 array. This kind of an array will take up less space:
1369
1370 my @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
1371 my @is_tiny_prime = ();
1372 for (@primes) { $is_tiny_prime[$_] = 1 }
1373 # or simply @is_tiny_prime[@primes] = (1) x @primes;
1374
1375 Now you check whether $is_tiny_prime[$some_number].
1376
1377 If the values in question are integers instead of strings, you can save
1378 quite a lot of space by using bit strings instead:
1379
1380 my @articles = ( 1..10, 150..2000, 2017 );
1381 my $read = '';
1382 for (@articles) { vec($read,$_,1) = 1 }
1383
1384 Now check whether "vec($read,$n,1)" is true for some $n.
1385
1386 These methods guarantee fast individual tests but require a re-
1387 organization of the original list or array. They only pay off if you
1388 have to test multiple values against the same array.
1389
1390 If you are testing only once, the standard module List::Util exports
1391 the function "first" for this purpose. It works by stopping once it
1392 finds the element. It's written in C for speed, and its Perl equivalent
1393 looks like this subroutine:
1394
1395 sub first (&@) {
1396 my $code = shift;
1397 foreach (@_) {
1398 return $_ if &{$code}();
1399 }
1400 undef;
1401 }
1402
1403 If speed is of little concern, the common idiom uses grep in scalar
1404 context (which returns the number of items that passed its condition)
1405 to traverse the entire list. This does have the benefit of telling you
1406 how many matches it found, though.
1407
1408 my $is_there = grep $_ eq $whatever, @array;
1409
1410 If you want to actually extract the matching elements, simply use grep
1411 in list context.
1412
1413 my @matches = grep $_ eq $whatever, @array;
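
One thing to watch with "first" is that it returns the matching element itself, which might be a false value. The sketch below shows that case and, as an alternative, the "any" function, which this example assumes is available (List::Util 1.33 or later):

```perl
use strict;
use warnings;
use List::Util qw(first any);    # any() needs List::Util 1.33 or later

my @numbers = ( 3, 0, 7 );

# first() returns the element, here 0, so test the result with
# defined() rather than for truth.
my $zero = first { $_ == 0 } @numbers;
print "found zero\n" if defined $zero;

# any() gives a plain true/false answer and also stops at the first match.
print "has a seven\n" if any { $_ == 7 } @numbers;
print "no nine\n" unless any { $_ == 9 } @numbers;
```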
1414
1415 How do I compute the difference of two arrays? How do I compute the
1416 intersection of two arrays?
1417 Use a hash. Here's code to do both and more. It assumes that each
1418 element is unique in a given array:
1419
1420 my (@union, @intersection, @difference);
1421 my %count = ();
1422 foreach my $element (@array1, @array2) { $count{$element}++ }
1423 foreach my $element (keys %count) {
1424 push @union, $element;
1425 push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
1426 }
1427
1428 Note that this is the symmetric difference, that is, all elements in
1429 either A or in B but not in both. Think of it as an xor operation.
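
If you want the one-sided difference instead (the elements of the first array that are not in the second), a lookup hash built from the second array is enough. A sketch with made-up arrays:

```perl
use strict;
use warnings;

my @array1 = qw( a b c d );
my @array2 = qw( c d e );

# Index the second array, then keep the elements of the first
# that it lacks.
my %in_array2      = map { $_ => 1 } @array2;
my @only_in_array1 = grep { !$in_array2{$_} } @array1;

print "@only_in_array1\n";    # prints "a b"
```

This version also keeps the order of @array1 and does not require the elements to be unique.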
1430
1431 How do I test whether two arrays or hashes are equal?
1432 With Perl 5.10 and later, the smart match operator can give you the
1433 answer with the least amount of work:
1434
1435 use 5.010;
1436
1437 if( @array1 ~~ @array2 ) {
1438 say "The arrays are the same";
1439 }
1440
1441 if( %hash1 ~~ %hash2 ) { # doesn't check values!
1442 say "The hash keys are the same";
1443 }
1444
1445 The following code works for single-level arrays. It uses a stringwise
1446 comparison, and does not distinguish defined versus undefined empty
1447 strings. Modify if you have other needs.
1448
1449 $are_equal = compare_arrays(\@frogs, \@toads);
1450
1451 sub compare_arrays {
1452 my ($first, $second) = @_;
1453 no warnings; # silence spurious -w undef complaints
1454 return 0 unless @$first == @$second;
1455 for (my $i = 0; $i < @$first; $i++) {
1456 return 0 if $first->[$i] ne $second->[$i];
1457 }
1458 return 1;
1459 }
1460
1461 For multilevel structures, you may wish to use an approach more like
1462 this one. It uses the CPAN module FreezeThaw:
1463
1464 use FreezeThaw qw(cmpStr);
1465 my @a = my @b = ( "this", "that", [ "more", "stuff" ] );
1466
1467 printf "a and b contain %s arrays\n",
1468 cmpStr(\@a, \@b) == 0
1469 ? "the same"
1470 : "different";
1471
1472 This approach also works for comparing hashes. Here we'll demonstrate
1473 two different answers:
1474
1475 use FreezeThaw qw(cmpStr cmpStrHard);
1476
1477 my %a = my %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
1478 $a{EXTRA} = \%b;
1479 $b{EXTRA} = \%a;
1480
1481 printf "a and b contain %s hashes\n",
1482 cmpStr(\%a, \%b) == 0 ? "the same" : "different";
1483
1484 printf "a and b contain %s hashes\n",
1485 cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
1486
1487 The first reports that both of the hashes contain the same data,
1488 while the second reports that they do not. Which you prefer is left as
1489 an exercise to the reader.
1490
1491 How do I find the first array element for which a condition is true?
1492 To find the first array element which satisfies a condition, you can
1493 use the "first()" function in the List::Util module, which comes with
1494 Perl 5.8. This example finds the first element that contains "Perl".
1495
1496 use List::Util qw(first);
1497
1498 my $element = first { /Perl/ } @array;
1499
1500 If you cannot use List::Util, you can make your own loop to do the same
1501 thing. Once you find the element, you stop the loop with last.
1502
1503 my $found;
1504 foreach ( @array ) {
1505 if( /Perl/ ) { $found = $_; last }
1506 }
1507
1508 If you want the array index, use the "firstidx()" function from
1509 "List::MoreUtils":
1510
1511 use List::MoreUtils qw(firstidx);
1512 my $index = firstidx { /Perl/ } @array;
1513
1514 Or write it yourself, iterating through the indices and checking the
1515 array element at each index until you find one that satisfies the
1516 condition:
1517
1518 my( $found, $index ) = ( undef, -1 );
1519 for( my $i = 0; $i < @array; $i++ ) {
1520 if( $array[$i] =~ /Perl/ ) {
1521 $found = $array[$i];
1522 $index = $i;
1523 last;
1524 }
1525 }
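
If you have List::Util but not List::MoreUtils, one possible shortcut is to run "first" over the indices rather than the elements. A sketch with made-up data:

```perl
use strict;
use warnings;
use List::Util qw(first);

my @array = ( 'awk', 'sed', 'Perl 5', 'Perl 4' );

# Search the indices instead of the elements.
my $index = first { $array[$_] =~ /Perl/ } 0 .. $#array;

print defined $index ? "index $index: $array[$index]\n" : "not found\n";
# prints "index 2: Perl 5"
```

Since a valid index can be 0, check the result with "defined" instead of testing it for truth.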
1526
1527 How do I handle linked lists?
1528 (contributed by brian d foy)
1529
1530 Perl's arrays do not have a fixed size, so you don't need linked lists
1531 if you just want to add or remove items. You can use array operations
1532 such as "push", "pop", "shift", "unshift", or "splice" to do that.
1533
1534 Sometimes, however, linked lists can be useful in situations where you
1535 want to "shard" an array so you have many small arrays instead of
1536 a single big array. You can keep arrays longer than Perl's largest
1537 array index, lock smaller arrays separately in threaded programs,
1538 reallocate less memory, or quickly insert elements in the middle of the
1539 chain.
1540
1541 Steve Lembark goes through the details in his YAPC::NA 2009 talk "Perly
1542 Linked Lists"
1543 (<http://www.slideshare.net/lembark/perly-linked-lists>), although you
1544 can just use his LinkedList::Single module.
1545
1546 How do I handle circular lists?
1547 (contributed by brian d foy)
1548
1549 If you want to cycle through an array endlessly, you can increment the
1550 index modulo the number of elements in the array:
1551
1552 my @array = qw( a b c );
1553 my $i = 0;
1554
1555 while( 1 ) {
1556 print $array[ $i++ % @array ], "\n";
1557 last if $i > 20;
1558 }
1559
1560 You can also use Tie::Cycle to use a scalar that always has the next
1561 element of the circular array:
1562
1563 use Tie::Cycle;
1564
1565 tie my $cycle, 'Tie::Cycle', [ qw( FFFFFF 000000 FFFF00 ) ];
1566
1567 print $cycle; # FFFFFF
1568 print $cycle; # 000000
1569 print $cycle; # FFFF00
1570
1571 The Array::Iterator::Circular module creates an iterator object for circular
1572 arrays:
1573
1574 use Array::Iterator::Circular;
1575
1576 my $color_iterator = Array::Iterator::Circular->new(
1577 qw(red green blue orange)
1578 );
1579
1580 foreach ( 1 .. 20 ) {
1581 print $color_iterator->next, "\n";
1582 }
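
If you would rather not install a module, a closure can do the same job. This is only a sketch: "make_cycle" is a hypothetical helper name, and it assumes a non-empty list.

```perl
use strict;
use warnings;

# make_cycle() is a hypothetical helper name. It assumes at least
# one item, since the modulus below fails for an empty list.
sub make_cycle {
    my @items = @_;
    my $i     = 0;
    return sub {
        my $item = $items[$i];
        $i = ( $i + 1 ) % @items;    # wrap around at the end
        return $item;
    };
}

my $next_color = make_cycle(qw( red green blue ));
my @seen = map { $next_color->() } 1 .. 5;

print "@seen\n";    # prints "red green blue red green"
```

Each call to "make_cycle" returns an independent iterator with its own position.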
1583
1584 How do I shuffle an array randomly?
1585 If you either have Perl 5.8.0 or later installed, or if you have
1586 Scalar-List-Utils 1.03 or later installed, you can say:
1587
1588 use List::Util 'shuffle';
1589
1590 @shuffled = shuffle(@list);
1591
1592 If not, you can use a Fisher-Yates shuffle.
1593
1594 sub fisher_yates_shuffle {
1595 my $deck = shift; # $deck is a reference to an array
1596 return unless @$deck; # must not be empty!
1597
1598 my $i = @$deck;
1599 while (--$i) {
1600 my $j = int rand ($i+1);
1601 @$deck[$i,$j] = @$deck[$j,$i];
1602 }
1603 }
1604
1605 # shuffle my mpeg collection
1606 #
1607 my @mpeg = <audio/*/*.mp3>;
1608 fisher_yates_shuffle( \@mpeg ); # randomize @mpeg in place
1609 print @mpeg;
1610
1611 Note that the above implementation shuffles an array in place, unlike
1612 the "List::Util::shuffle()" which takes a list and returns a new
1613 shuffled list.
1614
1615 You've probably seen shuffling algorithms that work using splice,
1616 randomly picking another element to swap the current element with:
1617
1618 srand;
1619 @new = ();
1620 @old = 1 .. 10; # just a demo
1621 while (@old) {
1622 push(@new, splice(@old, rand @old, 1));
1623 }
1624
1625 This is bad because splice is already O(N), and since you do it N
1626 times, you just invented a quadratic algorithm; that is, O(N**2). This
1627 does not scale, although Perl is so efficient that you probably won't
1628 notice this until you have rather largish arrays.
1629
1630 How do I process/modify each element of an array?
1631 Use "for"/"foreach":
1632
1633 for (@lines) {
1634 s/foo/bar/; # change that word
1635 tr/XZ/ZX/; # swap those letters
1636 }
1637
1638 Here's another; let's compute spherical volumes:
1639
1640 my @volumes = @radii;
1641 for (@volumes) { # @volumes has changed parts
1642 $_ **= 3;
1643 $_ *= (4/3) * 3.14159; # this will be constant folded
1644 }
1645
1646 which can also be done with "map()" which is made to transform one list
1647 into another:
1648
1649 my @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
1650
1651 If you want to do the same thing to modify the values of the hash, you
1652 can use the "values" function. As of Perl 5.6 the values are not
1653 copied, so if you modify $orbit (in this case), you modify the value.
1654
1655 for my $orbit ( values %orbits ) {
1656 ($orbit **= 3) *= (4/3) * 3.14159;
1657 }
1658
1659 Prior to perl 5.6 "values" returned copies of the values, so older perl
1660 code often contains constructions such as @orbits{keys %orbits} instead
1661 of "values %orbits" where the hash is to be modified.
1662
1663 How do I select a random element from an array?
1664 Use the "rand()" function (see "rand" in perlfunc):
1665
1666 my $index = rand @array;
1667 my $element = $array[$index];
1668
1669 Or, simply:
1670
1671 my $element = $array[ rand @array ];
1672
1673 How do I permute N elements of a list?
1674 Use the List::Permutor module on CPAN. If the list is actually an
1675 array, try the Algorithm::Permute module (also on CPAN). It's written
1676 in XS code and is very efficient:
1677
1678 use Algorithm::Permute;
1679
1680 my @array = 'a'..'d';
1681 my $p_iterator = Algorithm::Permute->new ( \@array );
1682
1683 while (my @perm = $p_iterator->next) {
1684 print "next permutation: (@perm)\n";
1685 }
1686
1687 For even faster execution, you could do:
1688
1689 use Algorithm::Permute;
1690
1691 my @array = 'a'..'d';
1692
1693 Algorithm::Permute::permute {
1694 print "next permutation: (@array)\n";
1695 } @array;
1696
1697 Here's a little program that generates all permutations of all the
1698 words on each line of input. The algorithm embodied in the "permute()"
1699 function is discussed in Volume 4 (still unpublished) of Knuth's The
1700 Art of Computer Programming and will work on any list:
1701
1702 #!/usr/bin/perl -n
1703 # Fischer-Krause ordered permutation generator
1704
1705 sub permute (&@) {
1706 my $code = shift;
1707 my @idx = 0..$#_;
1708 while ( $code->(@_[@idx]) ) {
1709 my $p = $#idx;
1710 --$p while $idx[$p-1] > $idx[$p];
1711 my $q = $p or return;
1712 push @idx, reverse splice @idx, $p;
1713 ++$q while $idx[$p-1] > $idx[$q];
1714 @idx[$p-1,$q]=@idx[$q,$p-1];
1715 }
1716 }
1717
1718 permute { print "@_\n" } split;
1719
1720 The Algorithm::Loops module also provides the "NextPermute" and
1721 "NextPermuteNum" functions which efficiently find all unique
1722 permutations of an array, even if it contains duplicate values,
1723 modifying it in-place: if its elements are in reverse-sorted order then
1724 the array is reversed, making it sorted, and it returns false;
1725 otherwise the next permutation is returned.
1726
1727 "NextPermute" uses string order and "NextPermuteNum" numeric order, so
1728 you can enumerate all the permutations of 0..9 like this:
1729
1730 use Algorithm::Loops qw(NextPermuteNum);
1731
1732 my @list= 0..9;
1733 do { print "@list\n" } while NextPermuteNum @list;
1734
1735 How do I sort an array by (anything)?
1736 Supply a comparison function to sort() (described in "sort" in
1737 perlfunc):
1738
1739 @list = sort { $a <=> $b } @list;
1740
1741 The default sort function is cmp, string comparison, which would sort
1742 "(1, 2, 10)" into "(1, 10, 2)". "<=>", used above, is the numerical
1743 comparison operator.
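
To see the difference for yourself, here is a small sketch that sorts the same list both ways (the variable names are only for illustration):

```perl
use strict;
use warnings;

my @numbers = (1, 2, 10);

# Default sort compares as strings, so "10" sorts before "2".
my @as_strings = sort @numbers;

# An explicit numeric comparison gives the order most people expect.
my @as_numbers = sort { $a <=> $b } @numbers;

# Swap $a and $b to sort in descending order.
my @descending = sort { $b <=> $a } @numbers;

print "string order:  @as_strings\n";
print "numeric order: @as_numbers\n";
print "descending:    @descending\n";
```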
1744
1745 If you have a complicated function needed to pull out the part you want
1746 to sort on, then don't do it inside the sort function. Pull it out
1747 first, because the sort BLOCK can be called many times for the same
1748 element. Here's an example of how to pull out the first word after the
1749 first number on each item, and then sort those words case-
1750 insensitively.
1751
1752 my @idx;
1753 for (@data) {
1754 my $item;
1755 ($item) = /\d+\s*(\S+)/;
1756 push @idx, uc($item);
1757 }
1758 my @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
1759
1760 which could also be written this way, using a trick that's come to be
1761 known as the Schwartzian Transform:
1762
1763 my @sorted = map { $_->[0] }
1764 sort { $a->[1] cmp $b->[1] }
1765 map { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
1766
1767 If you need to sort on several fields, the following paradigm is
1768 useful.
1769
1770 my @sorted = sort {
1771 field1($a) <=> field1($b) ||
1772 field2($a) cmp field2($b) ||
1773 field3($a) cmp field3($b)
1774 } @data;
1775
1776 This can be conveniently combined with precalculation of keys as given
1777 above.
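
As a rough sketch of that combination, assume each record is a string of the hypothetical form "surname given-name age". The fields are split out once per record and the comparison then works on the cached fields:

```perl
use strict;
use warnings;

# Hypothetical records: "surname given-name age"
my @data = (
    'Smith Alice 30',
    'Jones Bob 25',
    'Smith Aaron 30',
    'Jones Carol 41',
);

my @sorted =
    map  { $_->[0] }                      # 3. recover the original record
    sort {                                # 2. compare the cached fields
        $a->[1] cmp $b->[1]               #    surname, as a string
            ||
        $a->[3] <=> $b->[3]               #    then age, as a number
            ||
        $a->[2] cmp $b->[2]               #    then given name
    }
    map  { [ $_, split ' ' ] } @data;     # 1. split each record only once

print "$_\n" for @sorted;
```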
1778
1779 See the sort article in the "Far More Than You Ever Wanted To Know"
1780 collection in <http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz> for more
1781 about this approach.
1782
1783 See also the question later in perlfaq4 on sorting hashes.
1784
1785 How do I manipulate arrays of bits?
1786 Use "pack()" and "unpack()", or else "vec()" and the bitwise
1787 operations.
1788
1789 For example, you don't have to store individual bits in an array (which
1790 would mean that you're wasting a lot of space). To convert an array of
1791 bits to a string, use "vec()" to set the right bits. This sets $vec to
1792 have bit N set only if $ints[N] was set:
1793
1794 my @ints = (...); # array of bits, e.g. ( 1, 0, 0, 1, 1, 0 ... )
1795 my $vec = '';
1796 foreach( 0 .. $#ints ) {
1797 vec($vec,$_,1) = 1 if $ints[$_];
1798 }
1799
1800 The string $vec only takes up as many bits as it needs. For instance,
1801 if you had 16 entries in @ints, $vec only needs two bytes to store them
1802 (not counting the scalar variable overhead).
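
A quick way to check this is to build such a vector and ask for its length. The sixteen flags below are made up for the example; note that the string only grows as far as the highest bit you actually set:

```perl
use strict;
use warnings;

# Sixteen hypothetical flags, one per array element.
my @ints = (1, 0, 0, 1, 1, 0, 0, 0,  0, 0, 0, 0, 0, 0, 0, 1);

my $vec = '';
for my $n (0 .. $#ints) {
    vec($vec, $n, 1) = 1 if $ints[$n];
}

printf "%d flags stored in %d bytes\n", scalar @ints, length $vec;
print  "bits: ", unpack("b*", $vec), "\n";
```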
1803
1804 Here's how, given a vector in $vec, you can get those bits into your
1805 @ints array:
1806
1807 sub bitvec_to_list {
1808 my $vec = shift;
1809 my @ints;
1810 # Find null-byte density then select best algorithm
1811 if ($vec =~ tr/\0// / length $vec > 0.95) {
1812 use integer;
1813 my $i;
1814
1815 # This method is faster with mostly null-bytes
1816 while($vec =~ /[^\0]/g ) {
1817 $i = -9 + 8 * pos $vec;
1818 push @ints, $i if vec($vec, ++$i, 1);
1819 push @ints, $i if vec($vec, ++$i, 1);
1820 push @ints, $i if vec($vec, ++$i, 1);
1821 push @ints, $i if vec($vec, ++$i, 1);
1822 push @ints, $i if vec($vec, ++$i, 1);
1823 push @ints, $i if vec($vec, ++$i, 1);
1824 push @ints, $i if vec($vec, ++$i, 1);
1825 push @ints, $i if vec($vec, ++$i, 1);
1826 }
1827 }
1828 else {
1829 # This method is a fast general algorithm
1830 use integer;
1831 my $bits = unpack "b*", $vec;
1832 push @ints, 0 if $bits =~ s/^(\d)// && $1;
1833 push @ints, pos $bits while($bits =~ /1/g);
1834 }
1835
1836 return \@ints;
1837 }
1838
1839 This method gets faster the more sparse the bit vector is. (Courtesy
1840 of Tim Bunce and Winfried Koenig.)
1841
1842 You can make the while loop a lot shorter with this suggestion from
1843 Benjamin Goldberg:
1844
1845 while($vec =~ /[^\0]+/g ) {
1846 push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
1847 }
1848
1849 Or use the CPAN module Bit::Vector:
1850
1851 my $vector = Bit::Vector->new($num_of_bits);
1852 $vector->Index_List_Store(@ints);
1853 my @ints = $vector->Index_List_Read();
1854
1855 Bit::Vector provides efficient methods for bit vectors, sets of small
1856 integers and "big int" math.
1857
1858 Here's a more extensive illustration using vec():
1859
1860 # vec demo
1861 my $vector = "\xff\x0f\xef\xfe";
1862 print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
1863 unpack("N", $vector), "\n";
1864 my $is_set = vec($vector, 23, 1);
1865 print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
1866 pvec($vector);
1867
1868 set_vec(1,1,1);
1869 set_vec(3,1,1);
1870 set_vec(23,1,1);
1871
1872 set_vec(3,1,3);
1873 set_vec(3,2,3);
1874 set_vec(3,4,3);
1875 set_vec(3,4,7);
1876 set_vec(3,8,3);
1877 set_vec(3,8,7);
1878
1879 set_vec(0,32,17);
1880 set_vec(1,32,17);
1881
1882 sub set_vec {
1883 my ($offset, $width, $value) = @_;
1884 my $vector = '';
1885 vec($vector, $offset, $width) = $value;
1886 print "offset=$offset width=$width value=$value\n";
1887 pvec($vector);
1888 }
1889
1890 sub pvec {
1891 my $vector = shift;
1892 my $bits = unpack("b*", $vector);
1893 my $i = 0;
1894 my $BASE = 8;
1895
1896 print "vector length in bytes: ", length($vector), "\n";
1897 my @bytes = unpack("A8" x length($vector), $bits);
1898 print "bits are: @bytes\n\n";
1899 }
1900
1901 Why does defined() return true on empty arrays and hashes?
1902 The short story is that you should probably only use defined on scalars
1903 or functions, not on aggregates (arrays and hashes). See "defined" in
1904 perlfunc in the 5.004 release or later of Perl for more detail.
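
If what you really want to know is whether an aggregate has anything in it, you can simply test it in boolean context. A minimal sketch (the variable names are illustrative):

```perl
use strict;
use warnings;

my @array = ();
my %hash  = ();

# An aggregate in boolean (scalar) context is true only if it has elements.
my $array_is_empty = @array ? 0 : 1;
my $hash_is_empty  = %hash  ? 0 : 1;

push @array, 'first';
$hash{key} = undef;          # even an undef value makes the hash non-empty

my $array_has_items = @array ? 1 : 0;
my $hash_has_items  = %hash  ? 1 : 0;

print "array empty before: $array_is_empty, has items after: $array_has_items\n";
print "hash empty before:  $hash_is_empty, has items after: $hash_has_items\n";
```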
1905
1907 How do I process an entire hash?
1908 (contributed by brian d foy)
1909
1910 There are a couple of ways that you can process an entire hash. You can
1911 get a list of keys, then go through each key, or grab one key-value
1912 pair at a time.
1913
1914 To go through all of the keys, use the "keys" function. This extracts
1915 all of the keys of the hash and gives them back to you as a list. You
1916 can then get the value through the particular key you're processing:
1917
1918 foreach my $key ( keys %hash ) {
1919 my $value = $hash{$key};
1920 ...
1921 }
1922
1923 Once you have the list of keys, you can process that list before you
1924 process the hash elements. For instance, you can sort the keys so you
1925 can process them in lexical order:
1926
1927 foreach my $key ( sort keys %hash ) {
1928 my $value = $hash{$key};
1929 ...
1930 }
1931
1932 Or, you might want to only process some of the items. If you only want
1933 to deal with the keys that start with "text:", you can select just
1934 those using "grep":
1935
1936 foreach my $key ( grep /^text:/, keys %hash ) {
1937 my $value = $hash{$key};
1938 ...
1939 }
1940
1941 If the hash is very large, you might not want to create a long list of
1942 keys. To save some memory, you can grab one key-value pair at a time
1943 using "each()", which returns a pair you haven't seen yet:
1944
1945 while( my( $key, $value ) = each( %hash ) ) {
1946 ...
1947 }
1948
1949 The "each" operator returns the pairs in apparently random order, so if
1950 ordering matters to you, you'll have to stick with the "keys" method.
1951
1952 The "each()" operator can be a bit tricky though. You can't add or
1953 delete keys of the hash while you're using it without possibly skipping
1954 or re-processing some pairs after Perl internally rehashes all of the
1955 elements. Additionally, a hash has only one iterator, so if you mix
1956 "keys", "values", or "each" on the same hash, you risk resetting the
1957 iterator and messing up your processing. See the "each" entry in
1958 perlfunc for more details.
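
The shared iterator is easy to demonstrate. In this sketch (with a made-up hash) a call to "keys" in the middle of an "each" sequence sends "each" back to the start, while looping over a snapshot of the keys avoids the problem:

```perl
use strict;
use warnings;

my %hash = ( apple => 1, banana => 2, cherry => 3 );

my ($first) = each %hash;     # advances the hash's single iterator
my $count   = keys %hash;     # counting the keys resets that iterator
my ($again) = each %hash;     # so this returns the first pair again

print "first: $first, after keys(): $again\n";

# Taking a snapshot of the keys first makes it safe to call other
# hash functions, or to add and delete entries, inside the loop.
for my $key ( sort keys %hash ) {
    delete $hash{$key} if $hash{$key} > 1;
}

print "left: ", join(', ', sort keys %hash), "\n";
```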
1959
1960 How do I merge two hashes?
1961 (contributed by brian d foy)
1962
1963 Before you decide to merge two hashes, you have to decide what to do if
1964 both hashes contain keys that are the same and if you want to leave the
1965 original hashes as they were.
1966
1967 If you want to preserve the original hashes, copy one hash (%hash1) to
1968 a new hash (%new_hash), then add the keys from the other hash (%hash2)
1969 to the new hash. Checking that the key already exists in %new_hash
1970 gives you a chance to decide what to do with the duplicates:
1971
1972 my %new_hash = %hash1; # make a copy; leave %hash1 alone
1973
1974 foreach my $key2 ( keys %hash2 ) {
1975 if( exists $new_hash{$key2} ) {
1976 warn "Key [$key2] is in both hashes!";
1977 # handle the duplicate (perhaps only warning)
1978 ...
1979 next;
1980 }
1981 else {
1982 $new_hash{$key2} = $hash2{$key2};
1983 }
1984 }
1985
1986 If you don't want to create a new hash, you can still use this looping
1987 technique; just change the %new_hash to %hash1.
1988
1989 foreach my $key2 ( keys %hash2 ) {
1990 if( exists $hash1{$key2} ) {
1991 warn "Key [$key2] is in both hashes!";
1992 # handle the duplicate (perhaps only warning)
1993 ...
1994 next;
1995 }
1996 else {
1997 $hash1{$key2} = $hash2{$key2};
1998 }
1999 }
2000
2001 If you don't care that one hash overwrites keys and values from the
2002 other, you could just use a hash slice to add one hash to another. In
2003 this case, values from %hash2 replace values from %hash1 when they have
2004 keys in common:
2005
2006 @hash1{ keys %hash2 } = values %hash2;
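
If you also want to leave both originals alone, one common alternative is to flatten both hashes into a list and assign that to a new hash. A short sketch with made-up data:

```perl
use strict;
use warnings;

my %hash1 = ( colour => 'red',   size  => 'small' );
my %hash2 = ( size   => 'large', shape => 'round' );

# Later pairs win, so values from %hash2 replace those from %hash1.
my %merged = ( %hash1, %hash2 );

# Reverse the order to let %hash1 take precedence instead.
my %prefer_first = ( %hash2, %hash1 );

print "$_ => $merged{$_}\n" for sort keys %merged;
```

This builds a temporary list of every key and value, so for very large hashes the looping technique may be kinder to memory.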
2007
2008 What happens if I add or remove keys from a hash while iterating over it?
2009 (contributed by brian d foy)
2010
2011 The easy answer is "Don't do that!"
2012
2013 If you iterate through the hash with each(), you can delete the key
2014 most recently returned without worrying about it. If you delete or add
2015 other keys, the iterator may skip or double up on them since perl may
2016 rearrange the hash table. See the entry for "each()" in perlfunc.
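
Here is a small sketch of both cases, using a hypothetical %stock hash: deleting the current key inside the loop, and postponing additions until the loop has finished.

```perl
use strict;
use warnings;

my %stock = ( apples => 0, pears => 4, plums => 0, figs => 7 );

# Deleting the pair that each() has just returned is the one
# modification that is documented as safe during iteration.
while ( my ($item, $count) = each %stock ) {
    delete $stock{$item} if $count == 0;
}

# For any other change, decide what to do first and apply it afterwards.
my @to_add = map { $_ . '-reserve' } keys %stock;
$stock{$_} = 0 for @to_add;

print join(', ', sort keys %stock), "\n";
```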
2017
2018 How do I look up a hash element by value?
2019 Create a reverse hash:
2020
2021 my %by_value = reverse %by_key;
2022 my $key = $by_value{$value};
2023
2024 That's not particularly efficient. It would be more space-efficient to
2025 use:
2026
2027 while (my ($key, $value) = each %by_key) {
2028 $by_value{$value} = $key;
2029 }
2030
2031 If your hash could have repeated values, the methods above will only
2032 find one of the associated keys. This may or may not worry you. If it
2033 does worry you, you can always reverse the hash into a hash of arrays
2034 instead:
2035
2036 while (my ($key, $value) = each %by_key) {
2037 push @{$key_list_by_value{$value}}, $key;
2038 }
2039
2040 How can I know how many entries are in a hash?
2041 (contributed by brian d foy)
2042
2043 This is very similar to "How do I process an entire hash?", also in
2044 perlfaq4, but a bit simpler in the common cases.
2045
2046 You can use the "keys()" built-in function in scalar context to find
2047 out how many entries you have in a hash:
2048
2049 my $key_count = keys %hash; # must be scalar context!
2050
2051 If you want to find out how many entries have a defined value, that's a
2052 bit different. You have to check each value. A "grep" is handy:
2053
2054 my $defined_value_count = grep { defined } values %hash;
2055
2056 You can use that same structure to count the entries any way that you
2057 like. If you want the count of the keys with vowels in them, you just
2058 test for that instead:
2059
2060 my $vowel_count = grep { /[aeiou]/ } keys %hash;
2061
2062 The "grep" in scalar context returns the count. If you want the list of
2063 matching items, just use it in list context instead:
2064
2065 my @defined_values = grep { defined } values %hash;
2066
2067 The "keys()" function also resets the iterator, which means that you
2068 may see strange results if you use this between uses of other hash
2069 operators such as "each()".
2070
2071 How do I sort a hash (optionally by value instead of key)?
2072 (contributed by brian d foy)
2073
2074 To sort a hash, start with the keys. In this example, we give the list
2075 of keys to the sort function which then compares them ASCIIbetically
2076 (which might be affected by your locale settings). The output list has
2077 the keys in ASCIIbetical order. Once we have the keys, we can go
2078 through them to create a report which lists the keys in ASCIIbetical
2079 order.
2080
2081 my @keys = sort { $a cmp $b } keys %hash;
2082
2083 foreach my $key ( @keys ) {
2084 printf "%-20s %6d\n", $key, $hash{$key};
2085 }
2086
2087 We could get more fancy in the "sort()" block though. Instead of
2088 comparing the keys, we can compute a value with them and use that value
2089 as the comparison.
2090
2091 For instance, to make our report order case-insensitive, we use "lc" to
2092 lowercase the keys before comparing them:
2093
2094 my @keys = sort { lc $a cmp lc $b } keys %hash;
2095
2096 Note: if the computation is expensive or the hash has many elements,
2097 you may want to look at the Schwartzian Transform to cache the
2098 computation results.
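
Applied to hash keys, the transform might look like the following sketch. The sort_key() function is a hypothetical stand-in for whatever expensive computation you need; the counter is there only to show that it runs once per key:

```perl
use strict;
use warnings;

my %hash = ( 'Banana' => 3, 'apple' => 7, 'Cherry' => 1 );

# sort_key() stands in for a hypothetical expensive computation.
my $calls = 0;
sub sort_key { my $key = shift; $calls++; return lc $key }

my @keys =
    map  { $_->[0] }                    # unwrap the original key
    sort { $a->[1] cmp $b->[1] }        # compare the cached values
    map  { [ $_, sort_key($_) ] }       # compute each value exactly once
    keys %hash;

print "@keys (sort_key called $calls times)\n";
```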
2099
2100 If we want to sort by the hash value instead, we use the hash key to
2101 look it up. We still get out a list of keys, but this time they are
2102 ordered by their value.
2103
2104 my @keys = sort { $hash{$a} <=> $hash{$b} } keys %hash;
2105
2106 From there we can get more complex. If the hash values are the same, we
2107 can provide a secondary sort on the hash key.
2108
2109 my @keys = sort {
2110 $hash{$a} <=> $hash{$b}
2111 or
2112 "\L$a" cmp "\L$b"
2113 } keys %hash;
2114
2115 How can I always keep my hash sorted?
2116 You can look into using the "DB_File" module and "tie()" using the
2117 $DB_BTREE hash bindings as documented in "In Memory Databases" in
2118 DB_File. The Tie::IxHash module from CPAN might also be instructive.
2119 Although this does keep your hash sorted, you might not like the
2120 slowdown you suffer from the tie interface. Are you sure you need to do
2121 this? :)
2122
2123 What's the difference between "delete" and "undef" with hashes?
2124 Hashes contain pairs of scalars: the first is the key, the second is
2125 the value. The key will be coerced to a string, although the value can
2126 be any kind of scalar: string, number, or reference. If a key $key is
2127 present in %hash, "exists($hash{$key})" will return true. The value for
2128 a given key can be "undef", in which case $hash{$key} will be "undef"
2129 while "exists $hash{$key}" will return true. This corresponds to ($key,
2130 "undef") being in the hash.
2131
2132 Pictures help... Here's the %hash table:
2133
2134 keys values
2135 +------+------+
2136 | a | 3 |
2137 | x | 7 |
2138 | d | 0 |
2139 | e | 2 |
2140 +------+------+
2141
2142 And these conditions hold
2143
2144 $hash{'a'} is true
2145 $hash{'d'} is false
2146 defined $hash{'d'} is true
2147 defined $hash{'a'} is true
2148 exists $hash{'a'} is true (Perl 5 only)
2149 grep ($_ eq 'a', keys %hash) is true
2150
2151 If you now say
2152
2153 undef $hash{'a'}
2154
2155 your table now reads:
2156
2157 keys values
2158 +------+------+
2159 | a | undef|
2160 | x | 7 |
2161 | d | 0 |
2162 | e | 2 |
2163 +------+------+
2164
2165 and these conditions now hold; changes in caps:
2166
2167 $hash{'a'} is FALSE
2168 $hash{'d'} is false
2169 defined $hash{'d'} is true
2170 defined $hash{'a'} is FALSE
2171 exists $hash{'a'} is true (Perl 5 only)
2172 grep ($_ eq 'a', keys %hash) is true
2173
2174 Notice the last two: you have an undef value, but a defined key!
2175
2176 Now, consider this:
2177
2178 delete $hash{'a'}
2179
2180 your table now reads:
2181
2182 keys values
2183 +------+------+
2184 | x | 7 |
2185 | d | 0 |
2186 | e | 2 |
2187 +------+------+
2188
2189 and these conditions now hold; changes in caps:
2190
2191 $hash{'a'} is false
2192 $hash{'d'} is false
2193 defined $hash{'d'} is true
2194 defined $hash{'a'} is false
2195 exists $hash{'a'} is FALSE (Perl 5 only)
2196 grep ($_ eq 'a', keys %hash) is FALSE
2197
2198 See, the whole entry is gone!
2199
2200 Why don't my tied hashes make the defined/exists distinction?
2201 This depends on the tied hash's implementation of EXISTS(). For
2202 example, there isn't the concept of undef with hashes that are tied to
2203 DBM* files. It also means that exists() and defined() do the same thing
2204 with a DBM* file, and what they end up doing is not what they do with
2205 ordinary hashes.
2206
2207 How do I reset an each() operation part-way through?
2208 (contributed by brian d foy)
2209
2210 You can use the "keys" or "values" functions to reset "each". To simply
2211 reset the iterator used by "each" without doing anything else, use one
2212 of them in void context:
2213
2214 keys %hash; # resets iterator, nothing else.
2215 values %hash; # resets iterator, nothing else.
2216
2217 See the documentation for "each" in perlfunc.
2218
2219 How can I get the unique keys from two hashes?
2220 First you extract the keys from the hashes into lists, then solve the
2221 "removing duplicates" problem described above. For example:
2222
2223 my %seen = ();
2224 for my $element (keys(%foo), keys(%bar)) {
2225 $seen{$element}++;
2226 }
2227 my @uniq = keys %seen;
2228
2229 Or more succinctly:
2230
2231 my @uniq = keys %{{%foo,%bar}};
2232
2233 Or if you really want to save space:
2234
2235 my %seen = ();
2236 while (defined (my $key = each %foo)) {
2237 $seen{$key}++;
2238 }
2239 while (defined (my $key = each %bar)) {
2240 $seen{$key}++;
2241 }
2242 my @uniq = keys %seen;
2243
2244 How can I store a multidimensional array in a DBM file?
2245 Either stringify the structure yourself (no fun), or else get the MLDBM
2246 (which uses Data::Dumper) module from CPAN and layer it on top of
2247 either DB_File or GDBM_File. You might also try DBM::Deep, but it can
2248 be a bit slow.
2249
2250 How can I make my hash remember the order I put elements into it?
2251 Use the Tie::IxHash module from CPAN.
2252
2253 use Tie::IxHash;
2254
2255 tie my %myhash, 'Tie::IxHash';
2256
2257 for (my $i=0; $i<20; $i++) {
2258 $myhash{$i} = 2*$i;
2259 }
2260
2261 my @keys = keys %myhash;
2262 # @keys = (0,1,2,3,...)
2263
2264 Why does passing a subroutine an undefined element in a hash create it?
2265 (contributed by brian d foy)
2266
2267 Are you using a really old version of Perl?
2268
2269 Normally, accessing a hash key's value for a nonexistent key will not
2270 create the key.
2271
2272 my %hash = ();
2273 my $value = $hash{ 'foo' };
2274 print "This won't print\n" if exists $hash{ 'foo' };
2275
2276 Passing $hash{ 'foo' } to a subroutine used to be a special case,
2277 though. Since you could assign directly to $_[0], Perl had to be ready
2278 to make that assignment so it created the hash key ahead of time:
2279
2280 my_sub( $hash{ 'foo' } );
2281 print "This will print before 5.004\n" if exists $hash{ 'foo' };
2282
2283 sub my_sub {
2284 # $_[0] = 'bar'; # create hash key in case you do this
2285 1;
2286 }
2287
2288 Since Perl 5.004, however, this situation is a special case and Perl
2289 creates the hash key only when you make the assignment:
2290
2291 my_sub( $hash{ 'foo' } );
2292 print "This will print, even after 5.004\n" if exists $hash{ 'foo' };
2293
2294 sub my_sub {
2295 $_[0] = 'bar';
2296 }
2297
2298 However, if you want the old behavior (and think carefully about that
2299 because it's a weird side effect), you can pass a hash slice instead.
2300 Perl 5.004 didn't make this a special case:
2301
2302 my_sub( @hash{ qw/foo/ } );
2303
2304 How can I make the Perl equivalent of a C structure/C++ class/hash or array
2305 of hashes or arrays?
2306 Usually a hash ref, perhaps like this:
2307
2308 $record = {
2309 NAME => "Jason",
2310 EMPNO => 132,
2311 TITLE => "deputy peon",
2312 AGE => 23,
2313 SALARY => 37_000,
2314 PALS => [ "Norbert", "Rhys", "Phineas"],
2315 };
2316
2317 References are documented in perlref and perlreftut. Examples of
2318 complex data structures are given in perldsc and perllol. Examples of
2319 structures and object-oriented classes are in perltoot.
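
Once you have such a record, you get at its fields through the reference. A brief sketch using a cut-down version of the record above (the second staff member is invented for the example):

```perl
use strict;
use warnings;

my $record = {
    NAME => "Jason",
    AGE  => 23,
    PALS => [ "Norbert", "Rhys", "Phineas" ],
};

# Fields are reached through the reference with the arrow operator.
my $name      = $record->{NAME};
my $first_pal = $record->{PALS}[0];       # arrow is optional between subscripts
my $pal_count = @{ $record->{PALS} };     # array in scalar context

$record->{AGE}++;                         # fields can be updated in place

# An "array of structures" is simply an array of such references.
my @staff = ( $record, { NAME => "Rhys", AGE => 31, PALS => [] } );
my @names = map { $_->{NAME} } @staff;

print "$name has $pal_count pals; the first is $first_pal\n";
print "staff: @names\n";
```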
2320
2321 How can I use a reference as a hash key?
2322 (contributed by brian d foy and Ben Morrow)
2323
2324 Hash keys are strings, so you can't really use a reference as the key.
2325 When you try to do that, perl turns the reference into its stringified
2326 form (for instance, "HASH(0xDEADBEEF)"). From there you can't get back
2327 the reference from the stringified form, at least without doing some
2328 extra work on your own.
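
You can watch the stringification happen with a few lines of code. The names here are only for illustration:

```perl
use strict;
use warnings;

my $ref = [ 1, 2, 3 ];
my %hash;
$hash{$ref} = 'some value';

my ($key) = keys %hash;

# The key is a plain string such as "ARRAY(0x55d0c8a3e2f0)", not a reference.
my $key_is_ref  = ref($key) ? 1 : 0;
my $same_string = ( $key eq "$ref" ) ? 1 : 0;

print "key: $key\n";
print "is a reference? ", ( $key_is_ref ? "yes" : "no" ), "\n";
```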
2329
2330 Remember that the entry in the hash will still be there even if the
2331 referenced variable goes out of scope, and that it is entirely
2332 possible for Perl to subsequently allocate a different variable at the
2333 same address. This will mean a new variable might accidentally be
2334 associated with the value for an old.
2335
2336 If you have Perl 5.10 or later, and you just want to store a value
2337 against the reference for lookup later, you can use the core
2338 Hash::Util::FieldHash module. This will also handle renaming the keys
2339 if you use multiple threads (which causes all variables to be
2340 reallocated at new addresses, changing their stringification), and
2341 garbage-collecting the entries when the referenced variable goes out of
2342 scope.
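
A minimal sketch of the garbage-collecting behavior, assuming Perl 5.10 or later (the module is spelled Hash::Util::FieldHash; %notes and $object are illustrative names):

```perl
use strict;
use warnings;
use Hash::Util::FieldHash qw(fieldhash);

fieldhash my %notes;

my $inside;
{
    my $object = { name => 'temporary' };
    $notes{$object} = 'a value stored against the reference';
    $inside = keys %notes;     # counted while $object is still alive
}
my $outside = keys %notes;     # counted after $object has been destroyed

print "entries inside scope: $inside, after scope: $outside\n";
```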
2343
2344 If you actually need to be able to get a real reference back from each
2345 hash entry, you can use the Tie::RefHash module, which does the
2346 required work for you.
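
For instance, in this sketch (with made-up keys and values) the keys that come back out of the tied hash are still usable references:

```perl
use strict;
use warnings;
use Tie::RefHash;

tie my %colour_of, 'Tie::RefHash';

my $list = [ 1, 2, 3 ];
my $map  = { a => 1 };
$colour_of{$list} = 'red';
$colour_of{$map}  = 'blue';

# The keys come back as real references, so they can be dereferenced.
my @kinds = sort map { ref($_) } keys %colour_of;

print "key types: @kinds\n";
print "value for the array ref: $colour_of{$list}\n";
```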
2347
2348 How can I check if a key exists in a multilevel hash?
2349 (contributed by brian d foy)
2350
2351 The trick to this problem is avoiding accidental autovivification. If
2352 you want to check three keys deep, you might naively try this:
2353
2354 my %hash;
2355 if( exists $hash{key1}{key2}{key3} ) {
2356 ...;
2357 }
2358
2359 Even though you started with a completely empty hash, after that call
2360 to "exists" you've created the structure you needed to check for
2361 "key3":
2362
2363 %hash = (
2364 'key1' => {
2365 'key2' => {}
2366 }
2367 );
2368
2369 That's autovivification. You can get around this in a few ways. The
2370 easiest way is to just turn it off. The lexical "autovivification"
2371 pragma is available on CPAN. Now you don't add to the hash:
2372
2373 {
2374 no autovivification;
2375 my %hash;
2376 if( exists $hash{key1}{key2}{key3} ) {
2377 ...;
2378 }
2379 }
2380
2381 The Data::Diver module on CPAN can do it for you too. Its "Dive"
2382 subroutine can tell you not only if the keys exist but also get the
2383 value:
2384
2385 use Data::Diver qw(Dive);
2386
2387 my @exists = Dive( \%hash, qw(key1 key2 key3) );
2388 if( ! @exists ) {
2389 ...; # keys do not exist
2390 }
2391 elsif( ! defined $exists[0] ) {
2392 ...; # keys exist but value is undef
2393 }
2394
2395 You can easily do this yourself too by checking each level of the hash
2396 before you move onto the next level. This is essentially what
2397 Data::Diver does for you:
2398
2399 if( check_hash( \%hash, qw(key1 key2 key3) ) ) {
2400 ...;
2401 }
2402
2403 sub check_hash {
2404 my( $hash, @keys ) = @_;
2405
2406 return unless @keys;
2407
2408 foreach my $key ( @keys ) {
2409 return unless eval { exists $hash->{$key} };
2410 $hash = $hash->{$key};
2411 }
2412
2413 return 1;
2414 }
2415
2416 How can I prevent addition of unwanted keys into a hash?
2417 Since version 5.8.0, hashes can be restricted to a fixed number of
2418 given keys. Methods for creating and dealing with restricted hashes are
2419 exported by the Hash::Util module.
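
As a short sketch of how this can look, using the "lock_keys" and "unlock_keys" functions from Hash::Util on a hypothetical %config hash:

```perl
use strict;
use warnings;
use Hash::Util qw(lock_keys unlock_keys);

my %config = ( host => 'localhost', port => 8080 );

lock_keys(%config);                 # only 'host' and 'port' are allowed now

$config{port} = 9090;               # changing an existing value still works

# Storing under any other key dies, so trap the error with eval.
my $ok    = eval { $config{prot} = 'tcp'; 1 };
my $error = $ok ? '' : $@;

unlock_keys(%config);               # lift the restriction again
$config{protocol} = 'tcp';

print "rejected: $error";
```

This catches misspelled keys such as 'prot' at the point where they are used, rather than leaving a stray entry in the hash.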
2420
2422 How do I handle binary data correctly?
2423 Perl is binary-clean, so it can handle binary data just fine. On
2424 Windows or DOS, however, you have to use "binmode" for binary files to
2425 avoid conversions for line endings. In general, you should use
2426 "binmode" any time you want to work with binary data.
2427
2428 Also see "binmode" in perlfunc or perlopentut.
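
Here is a minimal round-trip sketch. It writes a few awkward bytes to a temporary file (created with the core File::Temp module) and reads them back, with "binmode" set on both handles:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Bytes that a text-mode layer could alter: CR, LF, Ctrl-Z, NUL, 0xFF.
my $data = "\x0d\x0a\x1a\x00\xff";

my ($out, $filename) = tempfile( UNLINK => 1 );
binmode $out;                       # no line-ending translation on output
print {$out} $data;
close $out or die "Cannot close $filename: $!";

open my $in, '<', $filename or die "Cannot open $filename: $!";
binmode $in;                        # and none on input
my $copy = do { local $/; <$in> };  # slurp the whole file
close $in;

printf "wrote %d bytes, read back %d bytes\n", length $data, length $copy;
```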
2429
2430 If you're concerned about 8-bit textual data then see perllocale. If
2431 you want to deal with multibyte characters, however, there are some
2432 gotchas. See the section on Regular Expressions.
2433
2434 How do I determine whether a scalar is a number/whole/integer/float?
2435 Assuming that you don't care about IEEE notations like "NaN" or
2436 "Infinity", you probably just want to use a regular expression:
2437
2438 use 5.010;
2439
2440 given( $number ) {
2441 when( /\D/ )
2442 { say "\thas nondigits"; continue }
2443 when( /^\d+\z/ )
2444 { say "\tis a whole number"; continue }
2445 when( /^-?\d+\z/ )
2446 { say "\tis an integer"; continue }
2447 when( /^[+-]?\d+\z/ )
2448 { say "\tis a +/- integer"; continue }
2449 when( /^-?(?:\d+\.?|\.\d)\d*\z/ )
2450 { say "\tis a real number"; continue }
2451 when( /^[+-]?(?=\.?\d)\d*\.?\d*(?:e[+-]?\d+)?\z/i)
2452 { say "\tis a C float" }
2453 }
2454
2455 There are also some commonly used modules for the task. Scalar::Util
2456 (distributed with 5.8) provides access to perl's internal function
2457 "looks_like_number" for determining whether a variable looks like a
2458 number. Data::Types exports functions that validate data types using
2459 both the above and other regular expressions. Thirdly, there is
2460 Regexp::Common which has regular expressions to match various types of
2461 numbers. Those three modules are available from the CPAN.
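
For example, a quick sketch with "looks_like_number" and a handful of made-up test strings:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

my @candidates = ( '42', '-3.14', '1e5', 'abc', '', '1.2.3', '0x1F' );

my @numeric     = grep {  looks_like_number($_) } @candidates;
my @not_numeric = grep { !looks_like_number($_) } @candidates;

print "numeric: @numeric\n";
print "not numeric: ", scalar(@not_numeric), " of ", scalar(@candidates), "\n";
```

Note that this asks whether Perl itself would treat the string as a number, so a hexadecimal-looking string such as '0x1F' does not qualify.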
2462
2463 If you're on a POSIX system, Perl supports the "POSIX::strtod" function
2464 for converting strings to doubles (and also "POSIX::strtol" for longs).
2465 Its semantics are somewhat cumbersome, so here's a "getnum" wrapper
2466 function for more convenient access. This function takes a string and
2467 returns the number it found, or "undef" for input that isn't a C float.
2468 The "is_numeric" function is a front end to "getnum" if you just want
2469 to say, "Is this a float?"
2470
2471 sub getnum {
2472 use POSIX qw(strtod);
2473 my $str = shift;
2474 $str =~ s/^\s+//;
2475 $str =~ s/\s+$//;
2476 $! = 0;
2477 my($num, $unparsed) = strtod($str);
2478 if (($str eq '') || ($unparsed != 0) || $!) {
2479 return undef;
2480 }
2481 else {
2482 return $num;
2483 }
2484 }
2485
2486 sub is_numeric { defined getnum($_[0]) }
2487
2488 Or you could check out the String::Scanf module on the CPAN instead.
2489
2490 How do I keep persistent data across program calls?
2491 For some specific applications, you can use one of the DBM modules.
2492 See AnyDBM_File. More generally, you should consult the FreezeThaw or
2493 Storable modules from CPAN. Starting from Perl 5.8, Storable is part of
2494 the standard distribution. Here's one example using Storable's "store"
2495 and "retrieve" functions:
2496
2497 use Storable;
2498 store(\%hash, "filename");
2499
2500 # later on...
2501 $href = retrieve("filename"); # by ref
2502 %hash = %{ retrieve("filename") }; # direct to hash
2503
2504 How do I print out or copy a recursive data structure?
2505 The Data::Dumper module on CPAN (or the 5.005 release of Perl) is great
2506 for printing out data structures. The Storable module on CPAN (or the
2507 5.8 release of Perl) provides a function called "dclone" that
2508 recursively copies its argument.
2509
2510 use Storable qw(dclone);
2511 $r2 = dclone($r1);
2512
2513 Here $r1 can be a reference to any kind of data structure you'd like.
2514 It will be deeply copied. Because "dclone" takes and returns
2515 references, you'd have to add extra punctuation if you had a hash of
2516 arrays that you wanted to copy.
2517
2518 %newhash = %{ dclone(\%oldhash) };
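
To see why the deep copy matters, compare it with a plain hash assignment. In this sketch (with made-up data) the original is changed after both copies are taken, and Data::Dumper prints the result:

```perl
use strict;
use warnings;
use Storable qw(dclone);
use Data::Dumper;

my %oldhash = ( fruits => [ 'apple', 'pear' ], nuts => [ 'hazel' ] );

my %shallow = %oldhash;                   # copies only the top level
my %deep    = %{ dclone(\%oldhash) };     # copies the inner arrays too

push @{ $oldhash{fruits} }, 'plum';       # change the original afterwards

local $Data::Dumper::Sortkeys = 1;        # stable key order for printing
print Data::Dumper->Dump( [ \%shallow, \%deep ], [ 'shallow', 'deep' ] );
```

The shallow copy still shares its inner arrays with the original and so picks up the new element; the deep copy does not.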
2519
2520 How do I define methods for every class/object?
2521 (contributed by Ben Morrow)
2522
2523 You can use the "UNIVERSAL" class (see UNIVERSAL). However, please be
2524 very careful to consider the consequences of doing this: adding methods
2525 to every object is very likely to have unintended consequences. If
2526 possible, it would be better to have all your objects inherit from some
2527 common base class, or to use an object system like Moose that supports
2528 roles.
2529
2530 How do I verify a credit card checksum?
2531 Get the Business::CreditCard module from CPAN.
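
The checksum used on most payment cards is the Luhn (mod 10) algorithm. If you are curious what such a check involves, here is a rough sketch; luhn_ok() is a hypothetical helper written for this example, not the module's code, and for real work the module is the better choice:

```perl
use strict;
use warnings;

# luhn_ok() is a hypothetical helper name, not part of any module.
sub luhn_ok {
    my $number = shift;
    $number =~ s/[\s-]//g;                    # allow spaces and dashes
    return 0 unless $number =~ /^\d+\z/;

    my $sum    = 0;
    my @digits = reverse split //, $number;   # work from the rightmost digit
    for my $i ( 0 .. $#digits ) {
        my $d = $digits[$i];
        if ( $i % 2 ) {                       # every second digit is doubled
            $d *= 2;
            $d -= 9 if $d > 9;                # same as adding its two digits
        }
        $sum += $d;
    }
    return $sum % 10 == 0 ? 1 : 0;
}

# 4111 1111 1111 1111 is a widely used test number, not a real account.
print luhn_ok('4111 1111 1111 1111') ? "valid\n" : "invalid\n";
```

Passing the checksum only shows that the digits are self-consistent; it says nothing about whether the account exists.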
2532
2533 How do I pack arrays of doubles or floats for XS code?
2534 The arrays.h/arrays.c code in the PGPLOT module on CPAN does just this.
2535 If you're doing a lot of float or double processing, consider using the
2536 PDL module from CPAN instead--it makes number-crunching easy.
2537
2538 See <http://search.cpan.org/dist/PGPLOT> for the code.
2539
2541 Copyright (c) 1997-2010 Tom Christiansen, Nathan Torkington, and other
2542 authors as noted. All rights reserved.
2543
2544 This documentation is free; you can redistribute it and/or modify it
2545 under the same terms as Perl itself.
2546
2547 Irrespective of its distribution, all code examples in this file are
2548 hereby placed into the public domain. You are permitted and encouraged
2549 to use this code in your own programs for fun or for profit as you see
2550 fit. A simple comment in the code giving credit would be courteous but
2551 is not required.
2552
2553
2554
2555perl v5.16.3 2013-03-04 PERLFAQ4(1)