perlfaq4(3)      User Contributed Perl Documentation      perlfaq4(3)

perlfaq4 - Data Manipulation

version 5.20230812

This section of the FAQ answers questions related to manipulating
numbers, dates, strings, arrays, hashes, and miscellaneous data issues.
14
16 Why am I getting long decimals (eg, 19.9499999999999) instead of the
17 numbers I should be getting (eg, 19.95)?
18 For the long explanation, see David Goldberg's "What Every Computer
19 Scientist Should Know About Floating-Point Arithmetic"
20 (<http://web.cse.msu.edu/~cse320/Documents/FloatingPoint.pdf>).
21
22 Internally, your computer represents floating-point numbers in binary.
23 Digital (as in powers of two) computers cannot store all numbers
24 exactly. Some real numbers lose precision in the process. This is a
25 problem with how computers store numbers and affects all computer
26 languages, not just Perl.
27
28 perlnumber shows the gory details of number representations and
29 conversions.
30
31 To limit the number of decimal places in your numbers, you can use the
32 "printf" or "sprintf" function. See "Floating-point Arithmetic" in
33 perlop for more details.
34
35 printf "%.2f", 10/3;
36
37 my $number = sprintf "%.2f", 10/3;
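As a minimal sketch of the two usual workarounds, you can round only when you display a value, and compare floating-point values with a tolerance instead of "==". The helper name nearly_equal and its default tolerance are illustrative assumptions, not something Perl provides:

```perl
use strict;
use warnings;

my $price = 19.95;

# Ask for more digits than the default output shows; the extra digits
# reveal the binary approximation that is actually stored.
my $stored = sprintf "%.20f", $price;

# Round only for display.
my $shown = sprintf "%.2f", $price;

# Hypothetical helper: treat two numbers as equal when they differ by
# less than a small tolerance (the default 1e-9 is an arbitrary choice).
sub nearly_equal {
    my ($x, $y, $tolerance) = @_;
    $tolerance = 1e-9 unless defined $tolerance;
    return abs($x - $y) < $tolerance;
}

print "stored: $stored\n";
print "shown:  $shown\n";
print "0.1 + 0.2 is close to 0.3\n" if nearly_equal(0.1 + 0.2, 0.3);
```

Pick the tolerance to suit the magnitude of your data; a fixed 1e-9 is only reasonable for values near 1.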
38
39 Why is int() broken?
40 Your int() is most probably working just fine. It's the numbers that
41 aren't quite what you think.
42
43 First, see the answer to "Why am I getting long decimals (eg,
44 19.9499999999999) instead of the numbers I should be getting (eg,
45 19.95)?".
46
47 For example, this
48
49 print int(0.6/0.2-2), "\n";
50
will on most computers print 0, not 1, because even such simple numbers
as 0.6 and 0.2 cannot be represented exactly by floating-point numbers.
What you think of above as 'three' is really more like
2.9999999999999995559.
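If what you wanted was the nearest integer rather than truncation, one option is to round with sprintf instead of calling int(). This is a small sketch of that idea, not the only fix:

```perl
use strict;
use warnings;

my $value = 0.6 / 0.2 - 2;               # slightly less than 1 on most machines

my $truncated = int($value);             # discards the fraction
my $rounded   = sprintf "%.0f", $value;  # rounds to the nearest integer

print "int: $truncated, rounded: $rounded\n";
```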
55
56 Why isn't my octal data interpreted correctly?
57 (contributed by brian d foy)
58
59 You're probably trying to convert a string to a number, which Perl only
60 converts as a decimal number. When Perl converts a string to a number,
61 it ignores leading spaces and zeroes, then assumes the rest of the
62 digits are in base 10:
63
64 my $string = '0644';
65
66 print $string + 0; # prints 644
67
68 print $string + 44; # prints 688, certainly not octal!
69
This problem usually involves one of the Perl built-ins that has the
same name as a Unix command that uses octal numbers as arguments on the
command line. In this example, "chmod" on the command line knows that
its first argument is octal because that's what it does:
74
75 %prompt> chmod 644 file
76
77 If you want to use the same literal digits (644) in Perl, you have to
78 tell Perl to treat them as octal numbers either by prefixing the digits
79 with a 0 or using "oct":
80
81 chmod( 0644, $filename ); # right, has leading zero
82 chmod( oct(644), $filename ); # also correct
83
84 The problem comes in when you take your numbers from something that
85 Perl thinks is a string, such as a command line argument in @ARGV:
86
87 chmod( $ARGV[0], $filename ); # wrong, even if "0644"
88
89 chmod( oct($ARGV[0]), $filename ); # correct, treat string as octal
90
91 You can always check the value you're using by printing it in octal
92 notation to ensure it matches what you think it should be. Print it in
93 octal and decimal format:
94
95 printf "0%o %d", $number, $number;
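Putting those pieces together, here is a short sketch. The variable $mode_string stands in for something like $ARGV[0], and the validation pattern is an assumption about what you want to accept:

```perl
use strict;
use warnings;

my $mode_string = '0644';             # stands in for something like $ARGV[0]

# Reject anything that does not look like a three or four digit octal mode.
die "not an octal mode: $mode_string\n"
    unless $mode_string =~ /\A[0-7]{3,4}\z/;

my $as_decimal = $mode_string + 0;    # 644, the wrong value for chmod
my $mode       = oct($mode_string);   # 420, the same as the literal 0644

my $check = sprintf "0%o %d", $mode, $mode;
print "$check\n";                     # prints "0644 420"
```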
96
97 Does Perl have a round() function? What about ceil() and floor()? Trig
98 functions?
99 Remember that int() merely truncates toward 0. For rounding to a
100 certain number of digits, sprintf() or printf() is usually the easiest
101 route.
102
103 printf("%.3f", 3.1415926535); # prints 3.142
104
105 The POSIX module (part of the standard Perl distribution) implements
106 ceil(), floor(), and a number of other mathematical and trigonometric
107 functions.
108
109 use POSIX;
110 my $ceil = ceil(3.5); # 4
111 my $floor = floor(3.5); # 3
112
113 In 5.000 to 5.003 perls, trigonometry was done in the Math::Complex
114 module. With 5.004, the Math::Trig module (part of the standard Perl
115 distribution) implements the trigonometric functions. Internally it
116 uses the Math::Complex module and some functions can break out from the
117 real axis into the complex plane, for example the inverse sine of 2.
118
119 Rounding in financial applications can have serious implications, and
120 the rounding method used should be specified precisely. In these cases,
121 it probably pays not to trust whichever system of rounding is being
122 used by Perl, but instead to implement the rounding function you need
123 yourself.
124
125 To see why, notice how you'll still have an issue on half-way-point
126 alternation:
127
128 for (my $i = -5; $i <= 5; $i += 0.5) { printf "%.0f ",$i }
129
130 -5 -4 -4 -4 -3 -2 -2 -2 -1 -0 0 0 1 2 2 2 3 4 4 4 5
131
132 Don't blame Perl. It's the same as in C. IEEE says we have to do this.
133 Perl numbers whose absolute values are integers under 2**31 (on 32-bit
134 machines) will work pretty much like mathematical integers. Other
135 numbers are not guaranteed.
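If you decide to write your own rounding function, state the tie-breaking rule in the code. The following sketch rounds halves away from zero; the name round_half_away is hypothetical, and the rule itself is only one of several you might need:

```perl
use strict;
use warnings;
use POSIX qw(floor ceil);

# Hypothetical helper: round to the nearest integer, with ties going
# away from zero (2.5 -> 3, -2.5 -> -3).
sub round_half_away {
    my ($n) = @_;
    return $n >= 0 ? floor($n + 0.5) : ceil($n - 0.5);
}

print join(' ', map { round_half_away($_) } -2.5, -1.5, -0.5, 0.5, 1.5, 2.5), "\n";
# prints "-3 -2 -1 1 2 3"
```

This only behaves predictably for inputs that are stored exactly, such as the halves above. For money, keeping amounts as integer cents avoids the question altogether.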
136
137 How do I convert between numeric representations/bases/radixes?
138 As always with Perl there is more than one way to do it. Below are a
139 few examples of approaches to making common conversions between number
representations. These are intended to be representative rather than
exhaustive.
142
143 Some of the examples later in perlfaq4 use the Bit::Vector module from
144 CPAN. The reason you might choose Bit::Vector over the perl built-in
145 functions is that it works with numbers of ANY size, that it is
146 optimized for speed on some operations, and for at least some
147 programmers the notation might be familiar.
148
149 How do I convert hexadecimal into decimal
150 Using perl's built in conversion of "0x" notation:
151
152 my $dec = 0xDEADBEEF;
153
154 Using the "hex" function:
155
156 my $dec = hex("DEADBEEF");
157
158 Using "pack":
159
160 my $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));
161
162 Using the CPAN module "Bit::Vector":
163
164 use Bit::Vector;
165 my $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
166 my $dec = $vec->to_Dec();
167
168 How do I convert from decimal to hexadecimal
169 Using "sprintf":
170
171 my $hex = sprintf("%X", 3735928559); # upper case A-F
172 my $hex = sprintf("%x", 3735928559); # lower case a-f
173
174 Using "unpack":
175
176 my $hex = unpack("H*", pack("N", 3735928559));
177
178 Using Bit::Vector:
179
180 use Bit::Vector;
181 my $vec = Bit::Vector->new_Dec(32, -559038737);
182 my $hex = $vec->to_Hex();
183
184 And Bit::Vector supports odd bit counts:
185
186 use Bit::Vector;
187 my $vec = Bit::Vector->new_Dec(33, 3735928559);
188 $vec->Resize(32); # suppress leading 0 if unwanted
189 my $hex = $vec->to_Hex();
190
191 How do I convert from octal to decimal
192 Using Perl's built in conversion of numbers with leading zeros:
193
194 my $dec = 033653337357; # note the leading 0!
195
196 Using the "oct" function:
197
198 my $dec = oct("33653337357");
199
200 Using Bit::Vector:
201
202 use Bit::Vector;
203 my $vec = Bit::Vector->new(32);
204 $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
205 my $dec = $vec->to_Dec();
206
207 How do I convert from decimal to octal
208 Using "sprintf":
209
210 my $oct = sprintf("%o", 3735928559);
211
212 Using Bit::Vector:
213
214 use Bit::Vector;
215 my $vec = Bit::Vector->new_Dec(32, -559038737);
216 my $oct = reverse join('', $vec->Chunk_List_Read(3));
217
218 How do I convert from binary to decimal
219 Perl 5.6 lets you write binary numbers directly with the "0b"
220 notation:
221
222 my $number = 0b10110110;
223
224 Using "oct":
225
226 my $input = "10110110";
227 my $decimal = oct( "0b$input" );
228
229 Using "pack" and "ord":
230
231 my $decimal = ord(pack('B8', '10110110'));
232
233 Using "pack" and "unpack" for larger strings:
234
235 my $int = unpack("N", pack("B32",
236 substr("0" x 32 . "11110101011011011111011101111", -32)));
237 my $dec = sprintf("%d", $int);
238
239 # substr() is used to left-pad a 32-character string with zeros.
240
241 Using Bit::Vector:
242
    use Bit::Vector;
    my $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
    my $dec = $vec->to_Dec();
245
246 How do I convert from decimal to binary
247 Using "sprintf" (perl 5.6+):
248
249 my $bin = sprintf("%b", 3735928559);
250
251 Using "unpack":
252
253 my $bin = unpack("B*", pack("N", 3735928559));
254
255 Using Bit::Vector:
256
257 use Bit::Vector;
258 my $vec = Bit::Vector->new_Dec(32, -559038737);
259 my $bin = $vec->to_Bin();
260
261 The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
262 are left as an exercise to the inclined reader.
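For the impatient, most of the remaining transformations fall out of combining a built-in that reads one base ("hex" or "oct") with a "sprintf" format that writes another. A sketch for values that fit in a native integer:

```perl
use strict;
use warnings;

# hex -> oct
my $oct_from_hex = sprintf "%o", hex("DEADBEEF");

# bin -> hex (oct() understands a leading "0b")
my $hex_from_bin = sprintf "%X", oct("0b11011110101011011011111011101111");

# oct -> bin
my $bin_from_oct = sprintf "%b", oct("33653337357");

print "$oct_from_hex\n";   # prints "33653337357"
print "$hex_from_bin\n";   # prints "DEADBEEF"
print "$bin_from_oct\n";   # prints "11011110101011011011111011101111"
```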
263
264 Why doesn't & work the way I want it to?
265 Perl's "&" bitwise operator works on both numbers and strings,
266 sometimes producing surprising results when you expected a number but
267 received a string. You probably expected perl to automatically convert
268 the operands to numbers like the mathematical operators would.
269 Instead, perl treats string operands as bitvectors.
270
271 Consider the bitwise difference between the number 3 and the bitvector
272 represented by "3". A number has the bit pattern for its magnitude. The
273 number 3 is 0b11 (a 2 and a 1). The bitvector has the bit pattern that
274 is the ordinal value for each octet, and that value is unrelated to any
275 numeric value that the digit represents. The character "3" is the
276 bitvector 0b0011_0011.
277
278 These operations have different results even though you might think
279 they look like the same "number":
280
281 11 & 3; # 0b0000_1011 & 0b0000_0011
282 # -> 0b0000_0011 (number 3)
283 "11" & "3"; # 0b0011_0001_0011_0001 & 0b0011_0011
284 # -> 0b0011_0001 (ASCII char "1")
285
286 Note that if any operand has a numeric value, perl uses numeric
287 semantics (although you should not count on this):
288
289 my($i, $j) = ( 11, 3 ); # $i & $j # 11 & 3 -> 3
290 my($i, $j) = ("11", 3 ); # $i & $j # 11 & 3 -> 3
291 my($i, $j) = ("11", "3"); # $i & $j # "11" & "3" -> 1
292
293 Remember that a perl scalar can have both string and numeric values at
294 the same time. A value that starts as a string and has never
295 encountered a numeric operation has no numeric value yet. Perl does
296 this to save time and work so it doesn't have to decide a numeric value
for a scalar it might never use as a number. In that case, string
semantics win. But if there is a numeric value already, numeric
semantics win. Force perl to compute the numeric value by adding 0:

    my($i, $j) = ("11", "3"); $j += 0;  # $i & $j   # "11" & 3 -> 3

However, this is not a documented feature, or as perlop says, it "is
not well defined". One way to ensure numeric semantics is to
explicitly convert both values to numbers:
306
307 (0+$i) & (0+$j)
308
309 To fix this annoyance, Perl v5.22 separated the string and number
310 behavior. The "bitwise" feature introduced four new operators that
311 would work with only string semantics: "&.", "|.", "^.", and "~.". The
312 original operators, "&", "|", "^", and "~", would then apply only
313 numeric semantics.
314
315 Enable this feature explicitly with feature:
316
317 use feature qw(bitwise);
318
319 Or, as of v5.28, require the minimum version of perl with "use":
320
321 use v5.28; # bitwise feature for free
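With the feature on, the operator you pick decides the semantics, no matter how the operands were created. A brief sketch, assuming a perl of v5.28 or later:

```perl
use v5.28;        # turns on strict and the bitwise feature
use warnings;

my $numeric = "11" & "3";     # numeric AND, even with string operands
my $stringy = "11" &. "3";    # string AND, octet by octet

print "numeric: $numeric\n";  # prints "numeric: 3"
print "string:  $stringy\n";  # prints "string:  1"
```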
322
323 How do I multiply matrices?
324 Use the Math::Matrix or Math::MatrixReal modules (available from CPAN)
325 or the PDL extension (also available from CPAN).
326
327 How do I perform an operation on a series of integers?
328 To call a function on each element in an array, and collect the
329 results, use:
330
331 my @results = map { my_func($_) } @array;
332
333 For example:
334
335 my @triple = map { 3 * $_ } @single;
336
337 To call a function on each element of an array, but ignore the results:
338
339 foreach my $iterator (@array) {
340 some_func($iterator);
341 }
342
343 To call a function on each integer in a (small) range, you can use:
344
345 my @results = map { some_func($_) } (5 .. 25);
346
347 but you should be aware that in this form, the ".." operator creates a
348 list of all integers in the range, which can take a lot of memory for
349 large ranges. However, the problem does not occur when using ".."
350 within a "for" loop, because in that case the range operator is
351 optimized to iterate over the range, without creating the entire list.
352 So
353
354 my @results = ();
355 for my $i (5 .. 500_005) {
356 push(@results, some_func($i));
357 }
358
359 or even
360
361 push(@results, some_func($_)) for 5 .. 500_005;
362
363 will not create an intermediate list of 500,000 integers.
364
365 How can I output Roman numerals?
366 Get the <http://www.cpan.org/modules/by-module/Roman> module.
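If you would rather not install a module, the conversion is short enough to write by hand. The following is a hand-rolled sketch, not the interface of the Roman module; the name to_roman and the 1 to 3999 limit are assumptions made for this example:

```perl
use strict;
use warnings;

# Value/symbol pairs, largest first, including the subtractive forms.
my @ROMAN = (
    [1000, 'M'], [900, 'CM'], [500, 'D'], [400, 'CD'],
    [100,  'C'], [90,  'XC'], [50,  'L'], [40,  'XL'],
    [10,   'X'], [9,   'IX'], [5,   'V'], [4,   'IV'],
    [1,    'I'],
);

# Hypothetical helper: returns the Roman numeral for an integer from
# 1 to 3999, or nothing for anything else.
sub to_roman {
    my ($number) = @_;
    return unless defined $number && $number =~ /\A[0-9]+\z/;
    return unless $number >= 1 && $number <= 3999;

    my $roman = '';
    for my $pair (@ROMAN) {
        my ($value, $symbol) = @$pair;
        while ($number >= $value) {
            $roman  .= $symbol;
            $number -= $value;
        }
    }
    return $roman;
}

print to_roman(1987), "\n";   # prints "MCMLXXXVII"
```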
367
368 Why aren't my random numbers random?
369 If you're using a version of Perl before 5.004, you must call "srand"
370 once at the start of your program to seed the random number generator.
371
372 BEGIN { srand() if $] < 5.004 }
373
374 5.004 and later automatically call "srand" at the beginning. Don't call
375 "srand" more than once--you make your numbers less random, rather than
376 more.
377
378 Computers are good at being predictable and bad at being random
379 (despite appearances caused by bugs in your programs :-). The random
380 article in the "Far More Than You Ever Wanted To Know" collection in
381 <http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz>, courtesy of Tom
382 Phoenix, talks more about this. John von Neumann said, "Anyone who
383 attempts to generate random numbers by deterministic means is, of
384 course, living in a state of sin."
385
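Before v5.20, Perl relied on the underlying system for the
implementation of "rand" and "srand"; on some systems, the generated
numbers were not random enough (especially on Windows; see
<http://www.perlmonks.org/?node_id=803632>). Since v5.20, Perl uses its
own drand48 implementation on every platform, which is consistent but
still not suitable for cryptography. Several CPAN modules in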
390 the "Math" namespace implement better pseudorandom generators; see for
391 example Math::Random::MT ("Mersenne Twister", fast), or
392 Math::TrulyRandom (uses the imperfections in the system's timer to
393 generate random numbers, which is rather slow). More algorithms for
394 random numbers are described in "Numerical Recipes in C" at
395 <http://www.nr.com/>
396
397 How do I get a random number between X and Y?
398 To get a random number between two values, you can use the rand()
399 built-in to get a random number between 0 and 1. From there, you shift
400 that into the range that you want.
401
402 rand($x) returns a number such that "0 <= rand($x) < $x". Thus what you
403 want to have perl figure out is a random number in the range from 0 to
404 the difference between your X and Y.
405
406 That is, to get a number between 10 and 15, inclusive, you want a
407 random number between 0 and 5 that you can then add to 10.
408
409 my $number = 10 + int rand( 15-10+1 ); # ( 10,11,12,13,14, or 15 )
410
411 Hence you derive the following simple function to abstract that. It
412 selects a random integer between the two given integers (inclusive).
413 For example: "random_int_between(50,120)".
414
415 sub random_int_between {
416 my($min, $max) = @_;
417 # Assumes that the two arguments are integers themselves!
418 return $min if $min == $max;
419 ($min, $max) = ($max, $min) if $min > $max;
420 return $min + int rand(1 + $max - $min);
421 }
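The same shift-the-range idea works when you want a fractional number instead of an integer. This sketch uses a hypothetical name, random_float_between, and returns a value that can equal the lower bound but never the upper one:

```perl
use strict;
use warnings;

sub random_float_between {
    my ($min, $max) = @_;
    return $min if $min == $max;          # rand(0) would behave like rand(1)
    ($min, $max) = ($max, $min) if $min > $max;
    return $min + rand($max - $min);      # $min <= result < $max
}

my $sample = random_float_between(2.5, 7.5);
print "$sample\n";
```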
422
424 How do I find the day or week of the year?
The day of the year is in the list returned by the "localtime"
function, counted from 0 for January 1. Without an argument,
"localtime" uses the current time.
427
428 my $day_of_year = (localtime)[7];
429
430 The POSIX module can also format a date as the day of the year or week
431 of the year.
432
433 use POSIX qw/strftime/;
434 my $day_of_year = strftime "%j", localtime;
435 my $week_of_year = strftime "%W", localtime;
436
437 To get the day of year for any date, use POSIX's "mktime" to get a time
438 in epoch seconds for the argument to "localtime".
439
    use POSIX qw/mktime strftime/;
    my $day_of_year = strftime "%j",
        localtime( mktime( 0, 0, 0, 18, 11, 87 ) );
443
444 You can also use Time::Piece, which comes with Perl and provides a
445 "localtime" that returns an object:
446
447 use Time::Piece;
448 my $day_of_year = localtime->yday;
449 my $week_of_year = localtime->week;
450
451 The Date::Calc module provides two functions to calculate these, too:
452
    use Date::Calc qw( Day_of_Year Week_of_Year );
    my $day_of_year  = Day_of_Year( 1987, 12, 18 );
    my $week_of_year = Week_of_Year( 1987, 12, 18 );
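Watch out for the different starting points: the value from "localtime" and from Time::Piece's yday counts from 0, while the "%j" format counts from 001. A small sketch with a fixed date, using Time::Piece's strptime to parse it:

```perl
use strict;
use warnings;
use Time::Piece;

my $date = Time::Piece->strptime('1987-12-18', '%Y-%m-%d');

my $yday   = $date->yday;             # counts from 0, like (localtime)[7]
my $julian = $date->strftime('%j');   # counts from 001, like POSIX strftime

print "yday: $yday, %j: $julian\n";   # prints "yday: 351, %j: 352"
```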
456
457 How do I find the current century or millennium?
458 Use the following simple functions:
459
460 sub get_century {
461 return int((((localtime(shift || time))[5] + 1999))/100);
462 }
463
464 sub get_millennium {
465 return 1+int((((localtime(shift || time))[5] + 1899))/1000);
466 }
467
468 On some systems, the POSIX module's strftime() function has been
469 extended in a non-standard way to use a %C format, which they sometimes
470 claim is the "century". It isn't, because on most such systems, this is
471 only the first two digits of the four-digit year, and thus cannot be
472 used to determine reliably the current century or millennium.
473
474 How can I compare two dates and find the difference?
475 (contributed by brian d foy)
476
477 You could just store all your dates as a number and then subtract.
478 Life isn't always that simple though.
479
480 The Time::Piece module, which comes with Perl, replaces localtime with
481 a version that returns an object. It also overloads the comparison
482 operators so you can compare them directly:
483
484 use Time::Piece;
485 my $date1 = localtime( $some_time );
486 my $date2 = localtime( $some_other_time );
487
488 if( $date1 < $date2 ) {
489 print "The date was in the past\n";
490 }
491
492 You can also get differences with a subtraction, which returns a
493 Time::Seconds object:
494
495 my $date_diff = $date1 - $date2;
496 print "The difference is ", $date_diff->days, " days\n";
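Here is the same idea as a complete sketch, with two fixed dates parsed by Time::Piece's strptime so that the result is reproducible:

```perl
use strict;
use warnings;
use Time::Piece;

my $start = Time::Piece->strptime('2023-01-01', '%Y-%m-%d');
my $end   = Time::Piece->strptime('2023-03-01', '%Y-%m-%d');

my $diff = $end - $start;             # a Time::Seconds object

print "start is earlier\n" if $start < $end;
print "The difference is ", $diff->days, " days\n";   # 59 days
```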
497
498 If you want to work with formatted dates, the Date::Manip, Date::Calc,
499 or DateTime modules can help you.
500
501 How can I take a string and turn it into epoch seconds?
502 If it's a regular enough string that it always has the same format, you
503 can split it up and pass the parts to "timelocal" in the standard
504 Time::Local module. Otherwise, you should look into the Date::Calc,
505 Date::Parse, and Date::Manip modules from CPAN.
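For the regular-format case, a sketch might look like this. It assumes a "YYYY-MM-DD HH:MM:SS" string and uses "timegm" (the UTC sibling of "timelocal", which takes the same arguments) so that the answer does not depend on your time zone:

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my $stamp = '2001-09-09 01:46:40';    # assumed fixed format, taken as UTC

my ($year, $mon, $mday, $hour, $min, $sec) =
    $stamp =~ /\A(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)\z/
    or die "unrecognized timestamp: $stamp\n";

# Months count from 0, as they do in the list that localtime returns.
my $epoch = timegm($sec, $min, $hour, $mday, $mon - 1, $year);

print "$epoch\n";                     # prints "1000000000"
```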
506
507 How can I find the Julian Day?
508 (contributed by brian d foy and Dave Cross)
509
510 You can use the Time::Piece module, part of the Standard Library, which
511 can convert a date/time to a Julian Day:
512
513 $ perl -MTime::Piece -le 'print localtime->julian_day'
514 2455607.7959375
515
516 Or the modified Julian Day:
517
518 $ perl -MTime::Piece -le 'print localtime->mjd'
519 55607.2961226851
520
521 Or even the day of the year (which is what some people think of as a
522 Julian day):
523
524 $ perl -MTime::Piece -le 'print localtime->yday'
525 45
526
527 You can also do the same things with the DateTime module:
528
529 $ perl -MDateTime -le'print DateTime->today->jd'
530 2453401.5
531 $ perl -MDateTime -le'print DateTime->today->mjd'
532 53401
533 $ perl -MDateTime -le'print DateTime->today->doy'
534 31
535
536 You can use the Time::JulianDay module available on CPAN. Ensure that
537 you really want to find a Julian day, though, as many people have
538 different ideas about Julian days (see
539 <http://www.hermetic.ch/cal_stud/jdn.htm> for instance):
540
541 $ perl -MTime::JulianDay -le 'print local_julian_day( time )'
542 55608
543
544 How do I find yesterday's date?
545 (contributed by brian d foy)
546
To do it correctly, you can use one of the "Date" modules since they
work with calendars instead of times. The DateTime module makes it
simple, and gives you the same time of day, only the day before, despite
daylight saving time changes:
551
552 use DateTime;
553
554 my $yesterday = DateTime->now->subtract( days => 1 );
555
556 print "Yesterday was $yesterday\n";
557
558 You can also use the Date::Calc module using its "Today_and_Now"
559 function.
560
561 use Date::Calc qw( Today_and_Now Add_Delta_DHMS );
562
563 my @date_time = Add_Delta_DHMS( Today_and_Now(), -1, 0, 0, 0 );
564
565 print "@date_time\n";
566
567 Most people try to use the time rather than the calendar to figure out
568 dates, but that assumes that days are twenty-four hours each. For most
569 people, there are two days a year when they aren't: the switch to and
570 from summer time throws this off. For example, the rest of the
571 suggestions will be wrong sometimes:
572
573 Starting with Perl 5.10, Time::Piece and Time::Seconds are part of the
574 standard distribution, so you might think that you could do something
575 like this:
576
577 use Time::Piece;
578 use Time::Seconds;
579
580 my $yesterday = localtime() - ONE_DAY; # WRONG
581 print "Yesterday was $yesterday\n";
582
583 The Time::Piece module exports a new "localtime" that returns an
584 object, and Time::Seconds exports the "ONE_DAY" constant that is a set
585 number of seconds. This means that it always gives the time 24 hours
586 ago, which is not always yesterday. This can cause problems around the
587 end of daylight saving time when there's one day that is 25 hours long.
588
589 You have the same problem with Time::Local, which will give the wrong
590 answer for those same special cases:
591
592 # contributed by Gunnar Hjalmarsson
593 use Time::Local;
594 my $today = timelocal 0, 0, 12, ( localtime )[3..5];
595 my ($d, $m, $y) = ( localtime $today-86400 )[3..5]; # WRONG
596 printf "Yesterday: %d-%02d-%02d\n", $y+1900, $m+1, $d;
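If you cannot install a date module, you can still do the arithmetic on the calendar fields rather than on the seconds. The following sketch is not part of the original answer; it relies on POSIX's "mktime" normalizing an out-of-range day of the month, and the name day_before is hypothetical:

```perl
use strict;
use warnings;
use POSIX qw(mktime strftime);

# Hypothetical helper: takes a calendar date and returns the date of the
# day before as YYYY-MM-DD. Subtracting 1 from the day of the month lets
# mktime roll back over month and year boundaries.
sub day_before {
    my ($year, $month, $day) = @_;
    my $epoch = mktime(0, 0, 12, $day - 1, $month - 1, $year - 1900, 0, 0, -1);
    return strftime('%Y-%m-%d', localtime($epoch));
}

my @now = localtime;
print "Yesterday: ", day_before($now[5] + 1900, $now[4] + 1, $now[3]), "\n";
```

Using noon keeps the result away from the hours where daylight saving changes usually happen.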
597
598 Does Perl have a Year 2000 or 2038 problem? Is Perl Y2K compliant?
599 (contributed by brian d foy)
600
601 Perl itself never had a Y2K problem, although that never stopped people
602 from creating Y2K problems on their own. See the documentation for
603 "localtime" for its proper use.
604
605 Starting with Perl 5.12, "localtime" and "gmtime" can handle dates past
606 03:14:08 January 19, 2038, when a 32-bit based time would overflow. You
607 still might get a warning on a 32-bit "perl":
608
609 % perl5.12 -E 'say scalar localtime( 0x9FFF_FFFFFFFF )'
610 Integer overflow in hexadecimal number at -e line 1.
611 Wed Nov 1 19:42:39 5576711
612
613 On a 64-bit "perl", you can get even larger dates for those really long
614 running projects:
615
616 % perl5.12 -E 'say scalar gmtime( 0x9FFF_FFFFFFFF )'
617 Thu Nov 2 00:42:39 5576711
618
619 You're still out of luck if you need to keep track of decaying protons
620 though.
621
623 How do I validate input?
624 (contributed by brian d foy)
625
626 There are many ways to ensure that values are what you expect or want
627 to accept. Besides the specific examples that we cover in the perlfaq,
628 you can also look at the modules with "Assert" and "Validate" in their
629 names, along with other modules such as Regexp::Common.
630
631 Some modules have validation for particular types of input, such as
632 Business::ISBN, Business::CreditCard, Email::Valid, and
633 Data::Validate::IP.
634
635 How do I unescape a string?
636 It depends just what you mean by "escape". URL escapes are dealt with
637 in perlfaq9. Shell escapes with the backslash ("\") character are
638 removed with
639
640 s/\\(.)/$1/g;
641
642 This won't expand "\n" or "\t" or any other special escapes.
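If you do want those special escapes expanded, one approach is a lookup table consulted from the replacement side. This is a sketch; the set of escapes in %escape is an assumption, so list the ones your data actually uses:

```perl
use strict;
use warnings;

# Escapes to expand; any other backslashed character stands for itself.
my %escape = ( n => "\n", t => "\t", r => "\r" );

my $raw = 'name\tvalue\.\n';          # single quotes keep the backslashes

(my $plain = $raw) =~ s/\\(.)/exists $escape{$1} ? $escape{$1} : $1/ge;

print $plain;                         # prints "name", a tab, "value." and a newline
```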
643
644 How do I remove consecutive pairs of characters?
645 (contributed by brian d foy)
646
647 You can use the substitution operator to find pairs of characters (or
648 runs of characters) and replace them with a single instance. In this
649 substitution, we find a character in "(.)". The memory parentheses
650 store the matched character in the back-reference "\g1" and we use that
651 to require that the same thing immediately follow it. We replace that
652 part of the string with the character in $1.
653
654 s/(.)\g1/$1/g;
655
656 We can also use the transliteration operator, "tr///". In this example,
657 the search list side of our "tr///" contains nothing, but the "c"
658 option complements that so it contains everything. The replacement list
659 also contains nothing, so the transliteration is almost a no-op since
660 it won't do any replacements (or more exactly, replace the character
661 with itself). However, the "s" option squashes duplicated and
662 consecutive characters in the string so a character does not show up
next to itself:
664
665 my $str = 'Haarlem'; # in the Netherlands
666 $str =~ tr///cs; # Now Harlem, like in New York
667
668 How do I expand function calls in a string?
669 (contributed by brian d foy)
670
671 This is documented in perlref, and although it's not the easiest thing
672 to read, it does work. In each of these examples, we call the function
673 inside the braces used to dereference a reference. If we have more than
674 one return value, we can construct and dereference an anonymous array.
675 In this case, we call the function in list context.
676
677 print "The time values are @{ [localtime] }.\n";
678
679 If we want to call the function in scalar context, we have to do a bit
680 more work. We can really have any code we like inside the braces, so we
681 simply have to end with the scalar reference, although how you do that
682 is up to you, and you can use code inside the braces. Note that the use
683 of parens creates a list context, so we need "scalar" to force the
684 scalar context on the function:
685
    print "The time is ${\(scalar localtime)}.\n";
687
688 print "The time is ${ my $x = localtime; \$x }.\n";
689
690 If your function already returns a reference, you don't need to create
691 the reference yourself.
692
693 sub timestamp { my $t = localtime; \$t }
694
695 print "The time is ${ timestamp() }.\n";
696
697 The "Interpolation" module can also do a lot of magic for you. You can
698 specify a variable name, in this case "E", to set up a tied hash that
699 does the interpolation for you. It has several other methods to do this
700 as well.
701
702 use Interpolation E => 'eval';
703 print "The time values are $E{localtime()}.\n";
704
705 In most cases, it is probably easier to simply use string
706 concatenation, which also forces scalar context.
707
708 print "The time is " . localtime() . ".\n";
709
710 How do I find matching/nesting anything?
711 To find something between two single characters, a pattern like
712 "/x([^x]*)x/" will get the intervening bits in $1. For multiple ones,
713 then something more like "/alpha(.*?)omega/" would be needed. For
714 nested patterns and/or balanced expressions, see the so-called (?PARNO)
715 construct (available since perl 5.10). The CPAN module Regexp::Common
716 can help to build such regular expressions (see in particular
717 Regexp::Common::balanced and Regexp::Common::delimited).
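For balanced parentheses, a recursive pattern might look like the following sketch. "(?1)" re-enters the first capture group, so the pattern can match a parenthesized group nested inside itself; the sample text is made up:

```perl
use strict;
use warnings;

# Group 1 matches "(", then any mix of non-parenthesis runs and nested
# groups (by recursing into group 1), then ")".
my $balanced = qr/ ( \( (?: [^()]++ | (?1) )* \) ) /x;

my $text   = 'f(a, g(b, c), (d)) and h(e)';
my @groups = $text =~ /$balanced/g;

print "$_\n" for @groups;
# prints "(a, g(b, c), (d))" and then "(e)"
```

Note that "(?1)" counts groups in the whole pattern, so take care when interpolating this into a larger regex that has capture groups of its own.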
718
More complex cases will require writing a parser, probably using a
parsing module from CPAN, like Regexp::Grammars, Parse::RecDescent,
Parse::Yapp, Text::Balanced, or Marpa::R2.
722
723 How do I reverse a string?
724 Use reverse() in scalar context, as documented in "reverse" in
725 perlfunc.
726
727 my $reversed = reverse $string;
728
729 How do I expand tabs in a string?
730 You can do it yourself:
731
732 1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
733
734 Or you can just use the Text::Tabs module (part of the standard Perl
735 distribution).
736
737 use Text::Tabs;
738 my @expanded_lines = expand(@lines_with_tabs);
739
740 How do I reformat a paragraph?
741 Use Text::Wrap (part of the standard Perl distribution):
742
743 use Text::Wrap;
744 print wrap("\t", ' ', @paragraphs);
745
746 The paragraphs you give to Text::Wrap should not contain embedded
747 newlines. Text::Wrap doesn't justify the lines (flush-right).
748
749 Or use the CPAN module Text::Autoformat. Formatting files can be easily
750 done by making a shell alias, like so:
751
752 alias fmt="perl -i -MText::Autoformat -n0777 \
753 -e 'print autoformat $_, {all=>1}' $*"
754
755 See the documentation for Text::Autoformat to appreciate its many
756 capabilities.
757
758 How can I access or change N characters of a string?
759 You can access the first characters of a string with substr(). To get
760 the first character, for example, start at position 0 and grab the
761 string of length 1.
762
763 my $string = "Just another Perl Hacker";
764 my $first_char = substr( $string, 0, 1 ); # 'J'
765
766 To change part of a string, you can use the optional fourth argument
767 which is the replacement string.
768
769 substr( $string, 13, 4, "Perl 5.8.0" );
770
771 You can also use substr() as an lvalue.
772
773 substr( $string, 13, 4 ) = "Perl 5.8.0";
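Two details that are easy to miss: a negative position counts from the end of the string, and the four-argument form returns the text it replaced. A short sketch with the same sample string:

```perl
use strict;
use warnings;

my $string = "Just another Perl Hacker";

my $last = substr($string, -6);              # the final six characters
my $old  = substr($string, 13, 4, "Unix");   # replaces in place, returns the old text

print "$last, $old, $string\n";
# prints "Hacker, Perl, Just another Unix Hacker"
```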
774
775 How do I change the Nth occurrence of something?
776 You have to keep track of N yourself. For example, let's say you want
777 to change the fifth occurrence of "whoever" or "whomever" into
778 "whosoever" or "whomsoever", case insensitively. These all assume that
779 $_ contains the string to be altered.
780
781 $count = 0;
782 s{((whom?)ever)}{
783 ++$count == 5 # is it the 5th?
784 ? "${2}soever" # yes, swap
785 : $1 # renege and leave it there
786 }ige;
787
788 In the more general case, you can use the "/g" modifier in a "while"
789 loop, keeping count of matches.
790
791 $WANT = 3;
792 $count = 0;
793 $_ = "One fish two fish red fish blue fish";
794 while (/(\w+)\s+fish\b/gi) {
795 if (++$count == $WANT) {
796 print "The third fish is a $1 one.\n";
797 }
798 }
799
800 That prints out: "The third fish is a red one." You can also use a
801 repetition count and repeated pattern like this:
802
803 /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
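The counting substitution generalizes into a small helper. This is a sketch with a hypothetical name, replace_nth; it returns a modified copy and leaves every other match as it was:

```perl
use strict;
use warnings;

# Hypothetical helper: replace only the Nth match of $pattern in $text.
sub replace_nth {
    my ($text, $pattern, $n, $replacement) = @_;
    my $count = 0;
    $text =~ s{($pattern)}{ ++$count == $n ? $replacement : $1 }ge;
    return $text;
}

print replace_nth("One fish two fish red fish blue fish", qr/fish/, 3, "herring"), "\n";
# prints "One fish two fish red herring blue fish"
```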
804
805 How can I count the number of occurrences of a substring within a string?
806 There are a number of ways, with varying efficiency. If you want a
807 count of a certain single character (X) within a string, you can use
808 the "tr///" function like so:
809
810 my $string = "ThisXlineXhasXsomeXx'sXinXit";
811 my $count = ($string =~ tr/X//);
812 print "There are $count X characters in the string";
813
814 This is fine if you are just looking for a single character. However,
815 if you are trying to count multiple character substrings within a
816 larger string, "tr///" won't work. What you can do is wrap a while()
817 loop around a global pattern match. For example, let's count negative
818 integers:
819
820 my $string = "-9 55 48 -2 23 -76 4 14 -44";
821 my $count = 0;
822 while ($string =~ /-\d+/g) { $count++ }
823 print "There are $count negative numbers in the string";
824
825 Another version uses a global match in list context, then assigns the
826 result to a scalar, producing a count of the number of matches.
827
828 my $count = () = $string =~ /-\d+/g;
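When the thing you are counting is a fixed string rather than a pattern, "index" in a loop avoids the regex engine and any need to quote metacharacters. A sketch, with a hypothetical name, that counts non-overlapping occurrences:

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping occurrences of a fixed string.
sub count_substr {
    my ($haystack, $needle) = @_;
    return 0 if $needle eq '';        # an empty needle would loop forever

    my ($count, $pos) = (0, 0);
    while (($pos = index($haystack, $needle, $pos)) != -1) {
        $count++;
        $pos += length $needle;       # step past this match
    }
    return $count;
}

print count_substr("a.b.c.d", "."), "\n";   # prints "3"
```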
829
830 How do I capitalize all the words on one line?
831 (contributed by brian d foy)
832
833 Damian Conway's Text::Autoformat handles all of the thinking for you.
834
835 use Text::Autoformat;
836 my $x = "Dr. Strangelove or: How I Learned to Stop ".
837 "Worrying and Love the Bomb";
838
839 print $x, "\n";
840 for my $style (qw( sentence title highlight )) {
841 print autoformat($x, { case => $style }), "\n";
842 }
843
844 How do you want to capitalize those words?
845
846 FRED AND BARNEY'S LODGE # all uppercase
847 Fred And Barney's Lodge # title case
848 Fred and Barney's Lodge # highlight case
849
850 It's not as easy a problem as it looks. How many words do you think are
851 in there? Wait for it... wait for it.... If you answered 5 you're
852 right. Perl words are groups of "\w+", but that's not what you want to
853 capitalize. How is Perl supposed to know not to capitalize that "s"
854 after the apostrophe? You could try a regular expression:
855
856 $string =~ s/ (
857 (^\w) #at the beginning of the line
858 | # or
859 (\s\w) #preceded by whitespace
860 )
861 /\U$1/xg;
862
863 $string =~ s/([\w']+)/\u\L$1/g;
864
865 Now, what if you don't want to capitalize that "and"? Just use
866 Text::Autoformat and get on with the next problem. :)
867
868 How can I split a [character]-delimited string except when inside
869 [character]?
870 Several modules can handle this sort of parsing--Text::Balanced,
871 Text::CSV, Text::CSV_XS, and Text::ParseWords, among others.
872
873 Take the example case of trying to split a string that is comma-
874 separated into its different fields. You can't use "split(/,/)" because
875 you shouldn't split if the comma is inside quotes. For example, take a
876 data line like this:
877
878 SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
879
880 Due to the restriction of the quotes, this is a fairly complex problem.
881 Thankfully, we have Jeffrey Friedl, author of Mastering Regular
882 Expressions, to handle these for us. He suggests (assuming your string
883 is contained in $text):
884
885 my @new = ();
886 push(@new, $+) while $text =~ m{
887 "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
888 | ([^,]+),?
889 | ,
890 }gx;
891 push(@new, undef) if substr($text,-1,1) eq ',';
892
893 If you want to represent quotation marks inside a quotation-mark-
894 delimited field, escape them with backslashes (eg, "like \"this\"").
895
896 Alternatively, the Text::ParseWords module (part of the standard Perl
897 distribution) lets you say:
898
899 use Text::ParseWords;
900 @new = quotewords(",", 0, $text);
901
902 For parsing or generating CSV, though, using Text::CSV rather than
903 implementing it yourself is highly recommended; you'll save yourself
904 odd bugs popping up later by just using code which has already been
905 tried and tested in production for years.
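
To see what the Text::ParseWords approach gives for the sample data line above, here is a small sketch. It only uses quotewords(), which ships with Perl; the variable names are arbitrary.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';

# Second argument 0: strip the quotes from the returned fields.
my @fields = quotewords(',', 0, $text);

printf "%2d: %s\n", $_, $fields[$_] for 0 .. $#fields;
```

Note that Text::ParseWords also treats single quotes as quoting characters, so fields containing apostrophes may not parse the way you expect. That is one more reason to reach for Text::CSV when the data is real CSV.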
906
907 How do I strip blank space from the beginning/end of a string?
908 (contributed by brian d foy)
909
910 A substitution can do this for you. For a single line, you want to
911 replace all the leading or trailing whitespace with nothing. You can do
912 that with a pair of substitutions:
913
914 s/^\s+//;
915 s/\s+$//;
916
917 You can also write that as a single substitution, although it turns out
918 the combined statement is slower than the separate ones. That might not
919 matter to you, though:
920
921 s/^\s+|\s+$//g;
922
923 In this regular expression, the alternation matches either at the
924 beginning or the end of the string since alternation binds less
925 tightly than the anchors. With the "/g" flag, the substitution
926 makes all possible matches, so it gets both. Remember, the trailing
927 newline matches the "\s+", and the "$" anchor can match to the
928 absolute end of the string, so the newline disappears too. Just add the
929 newline to the output, which has the added benefit of preserving
930 "blank" (consisting entirely of whitespace) lines which the "^\s+"
931 would remove all by itself:
932
933 while( <> ) {
934 s/^\s+|\s+$//g;
935 print "$_\n";
936 }
937
938 For a multi-line string, you can apply the regular expression to each
939 logical line in the string by adding the "/m" flag (for "multi-line").
940 With the "/m" flag, the "$" matches before an embedded newline, so it
941 doesn't remove it. This pattern still removes the newline at the end of
942 the string:
943
944 $string =~ s/^\s+|\s+$//gm;
945
946 Remember that lines consisting entirely of whitespace will disappear,
947 since the first part of the alternation can match the entire string and
948 replace it with nothing. If you need to keep embedded blank lines, you
949 have to do a little more work. Instead of matching any whitespace
950 (since that includes a newline), just match the other whitespace:
951
952 $string =~ s/^[\t\f ]+|[\t\f ]+$//mg;
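
Here is a short sketch that puts the substitutions above to work. The trim() helper is a hypothetical name used only for this example; it returns a trimmed copy rather than modifying its argument.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a trimmed copy and leaves the argument alone.
sub trim {
    my ($string) = @_;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

my $single = trim("  hello world \n");

# Per-line trimming that keeps embedded blank lines and the newlines.
my $multi = "  first \n\n\tsecond\t\n";
$multi =~ s/^[\t\f ]+|[\t\f ]+$//mg;

print "[$single]\n";
print $multi;
```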
953
954 How do I pad a string with blanks or pad a number with zeroes?
955 In the following examples, $pad_len is the length to which you wish to
956 pad the string, $text or $num contains the string to be padded, and
957 $pad_char contains the padding character. You can use a single
958 character string constant instead of the $pad_char variable if you know
959 what it is in advance. And in the same way you can use an integer in
960 place of $pad_len if you know the pad length in advance.
961
962 The simplest method uses the "sprintf" function. It can pad on the left
963 or right with blanks and on the left with zeroes and it will not
964 truncate the result. The "pack" function can only pad strings on the
965 right with blanks and it will truncate the result to a maximum length
966 of $pad_len.
967
968 # Left padding a string with blanks (no truncation):
969 my $padded = sprintf("%${pad_len}s", $text);
970 my $padded = sprintf("%*s", $pad_len, $text); # same thing
971
972 # Right padding a string with blanks (no truncation):
973 my $padded = sprintf("%-${pad_len}s", $text);
974 my $padded = sprintf("%-*s", $pad_len, $text); # same thing
975
976 # Left padding a number with 0 (no truncation):
977 my $padded = sprintf("%0${pad_len}d", $num);
978 my $padded = sprintf("%0*d", $pad_len, $num); # same thing
979
980 # Right padding a string with blanks using pack (will truncate):
981 my $padded = pack("A$pad_len",$text);
982
983 If you need to pad with a character other than blank or zero you can
984 use one of the following methods. They all generate a pad string with
985 the "x" operator and combine that with $text. These methods do not
986 truncate $text.
987
988 Left and right padding with any character, creating a new string:
989
990 my $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
991 my $padded = $text . $pad_char x ( $pad_len - length( $text ) );
992
993 Left and right padding with any character, modifying $text directly:
994
995 substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
996 $text .= $pad_char x ( $pad_len - length( $text ) );
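
The following sketch runs the main padding methods side by side on made-up values. The pad_left() helper is an assumption for this example; it checks the length first so that the "x" operator never sees a negative repeat count when the text is already long enough.

```perl
use strict;
use warnings;

my ($text, $num, $pad_len, $pad_char) = ('abc', 42, 6, '*');

my $left   = sprintf("%*s",  $pad_len, $text);    # blanks on the left
my $right  = sprintf("%-*s", $pad_len, $text);    # blanks on the right
my $zeroes = sprintf("%0*d", $pad_len, $num);     # zeroes on the left
my $packed = pack("A$pad_len", 'abcdefghij');     # truncated to $pad_len

# Hypothetical helper: left-pad with any character, never truncating.
sub pad_left {
    my ($string, $length, $char) = @_;
    my $missing = $length - length $string;
    return $string if $missing <= 0;
    return $char x $missing . $string;
}

my $stars = pad_left($text, $pad_len, $pad_char);

print "[$_]\n" for $left, $right, $zeroes, $packed, $stars;
```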
997
998 How do I extract selected columns from a string?
999 (contributed by brian d foy)
1000
1001 If you know the columns that contain the data, you can use "substr" to
1002 extract a single column.
1003
1004 my $column = substr( $line, $start_column, $length );
1005
1006 You can use "split" if the columns are separated by whitespace or some
1007 other delimiter, as long as whitespace or the delimiter cannot appear
1008 as part of the data.
1009
1010 my $line = ' fred barney betty ';
1011 my @columns = split /\s+/, $line;
1012 # ( '', 'fred', 'barney', 'betty' );
1013
1014 my $line = 'fred||barney||betty';
1015 my @columns = split /\|/, $line;
1016 # ( 'fred', '', 'barney', '', 'betty' );
1017
1018 If you want to work with comma-separated values, don't do this since
1019 that format is a bit more complicated. Use one of the modules that
1020 handle that format, such as Text::CSV, Text::CSV_XS, or Text::CSV_PP.
1021
1022 If you want to break apart an entire line of fixed columns, you can use
1023 "unpack" with the A (ASCII) format. By using a number after the format
1024 specifier, you can denote the column width. See the "pack" and "unpack"
1025 entries in perlfunc for more details.
1026
1027 my @fields = unpack( "A8 A8 A8 A16 A4", $line );
1028
1029 Note that spaces in the format argument to "unpack" do not denote
1030 literal spaces. If you have space separated data, you may want "split"
1031 instead.
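
As a small worked example of the "unpack" approach, this sketch builds a fixed-width record with "sprintf" and takes it apart again. The field widths and values are invented for the illustration.

```perl
use strict;
use warnings;

# Build a sample record: two 8-column fields and one 4-column field.
my $line = sprintf "%-8s%-8s%-4s", 'fred', 'barney', '42';

# The template comes first, then the data. "A" strips trailing spaces.
my ($first, $second, $third) = unpack "A8 A8 A4", $line;

print "[$first] [$second] [$third]\n";
```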
1032
1033 How do I find the soundex value of a string?
1034 (contributed by brian d foy)
1035
1036 You can use the "Text::Soundex" module. If you want to do fuzzy or
1037 close matching, you might also try the String::Approx,
1038 Text::Metaphone, and Text::DoubleMetaphone modules.
1039
1040 How can I expand variables in text strings?
1041 (contributed by brian d foy)
1042
1043 If you can avoid it, don't, or if you can use a templating system, such
1044 as Text::Template or Template Toolkit, do that instead. You might even
1045 be able to get the job done with "sprintf" or "printf":
1046
1047 my $string = sprintf 'Say hello to %s and %s', $foo, $bar;
1048
1049 However, for the one-off simple case where I don't want to pull out a
1050 full templating system, I'll use a string that has two Perl scalar
1051 variables in it. In this example, I want to expand $foo and $bar to
1052 the values of those variables:
1053
1054 my $foo = 'Fred';
1055 my $bar = 'Barney';
1056 $string = 'Say hello to $foo and $bar';
1057
1058 One way I can do this involves the substitution operator and a double
1059 "/e" flag. The first "/e" evaluates $1 on the replacement side and
1060 turns it into $foo. The second /e starts with $foo and replaces it with
1061 its value. $foo, then, turns into 'Fred', and that's finally what's
1062 left in the string:
1063
1064 $string =~ s/(\$\w+)/$1/eeg; # 'Say hello to Fred and Barney'
1065
1066 The "/e" will also silently ignore violations of strict, replacing
1067 undefined variable names with the empty string. Since I'm using the
1068 "/e" flag (twice even!), I have all of the same security problems I
1069 have with "eval" in its string form. If there's something odd in $foo,
1070 perhaps something like "@{[ system "rm -rf /" ]}", then I could get
1071 myself in trouble.
1072
1073 To get around the security problem, I could also pull the values from a
1074 hash instead of evaluating variable names. Using a single "/e", I can
1075 check the hash to ensure the value exists, and if it doesn't, I can
1076 replace the missing value with a marker, in this case "???" to signal
1077 that I missed something:
1078
1079 my $string = 'This has $foo and $bar';
1080
1081 my %Replacements = (
1082 foo => 'Fred',
1083 );
1084
1085 # $string =~ s/\$(\w+)/$Replacements{$1}/g;
1086 $string =~ s/\$(\w+)/
1087 exists $Replacements{$1} ? $Replacements{$1} : '???'
1088 /eg;
1089
1090 print $string;
1091
1092 Does Perl have anything like Ruby's #{} or Python's f string?
1093 Unlike the others, Perl allows you to embed a variable naked in a
1094 double quoted string, e.g. "variable $variable". When there isn't
1095 whitespace or other non-word characters following the variable name,
1096 you can add braces (e.g. "foo ${foo}bar") to ensure correct parsing.
1097
1098 An array can also be embedded directly in a string, and will be
1099 expanded by default with spaces between the elements. The default
1100 LIST_SEPARATOR can be changed by assigning a different string to the
1101 special variable $", such as "local $" = ', ';".
1102
1103 Perl also supports references within a string providing the equivalent
1104 of the features in the other two languages.
1105
1106 "${\ ... }" embedded within a string will work for most simple
1107 statements such as an object->method call. More complex code can be
1108 wrapped in a do block "${\ do{...} }".
1109
1110 When you want a list to be expanded per $", use "@{[ ... ]}".
1111
1112 use Time::Piece;
1113 use Time::Seconds;
1114 my $scalar = 'STRING';
1115 my @array = ( 'zorro', 'a', 1, 'B', 3 );
1116
1117 # Print the current date and time and then tomorrow
1118 my $t = Time::Piece->new;
1119 say "Now is: ${\ $t->cdate() }";
1120 say "Tomorrow: ${\ do{ my $T=Time::Piece->new + ONE_DAY ; $T->fullday }}";
1121
1122 # some variables in strings
1123 say "This is some scalar I have $scalar, this is an array @array.";
1124 say "You can also write it like this ${scalar} @{array}.";
1125
1126 # Change the $LIST_SEPARATOR
1127 local $" = ':';
1128 say "Set \$\" to delimit with ':' and sort the Array @{[ sort @array ]}";
1129
1130 You may also want to look at the module Quote::Code, and templating
1131 tools such as Template::Toolkit and Mojo::Template.
1132
1133 See also: "How can I expand variables in text strings?" and "How do I
1134 expand function calls in a string?" in this FAQ.
1135
1136 What's wrong with always quoting "$vars"?
1137 The problem is that those double-quotes force stringification--coercing
1138 numbers and references into strings--even when you don't want them to
1139 be strings. Think of it this way: double-quote expansion is used to
1140 produce new strings. If you already have a string, why do you need
1141 more?
1142
1143 If you get used to writing odd things like these:
1144
1145 print "$var"; # BAD
1146 my $new = "$old"; # BAD
1147 somefunc("$var"); # BAD
1148
1149 You'll be in trouble. Those should (in 99.8% of the cases) be the
1150 simpler and more direct:
1151
1152 print $var;
1153 my $new = $old;
1154 somefunc($var);
1155
1156 Otherwise, besides slowing you down, you're going to break code when
1157 the thing in the scalar is actually neither a string nor a number, but
1158 a reference:
1159
1160 func(\@array);
1161 sub func {
1162 my $aref = shift;
1163 my $oref = "$aref"; # WRONG
1164 }
1165
1166 You can also get into subtle problems on those few operations in Perl
1167 that actually do care about the difference between a string and a
1168 number, such as the magical "++" autoincrement operator or the
1169 syscall() function.
1170
1171 Stringification also destroys arrays.
1172
1173 my @lines = `command`;
1174 print "@lines"; # WRONG - extra blanks
1175 print @lines; # right
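
To make the reference and autoincrement problems concrete, here is a brief sketch. The variable names are arbitrary; the point is only what happens to each value.

```perl
use strict;
use warnings;

my @array = (1, 2, 3);
my $aref  = \@array;
my $copy  = "$aref";          # now just a string such as "ARRAY(0x...)"

my $still_ref = ref $aref;    # 'ARRAY'
my $not_a_ref = ref $copy;    # empty string: the reference is gone

# The magical increment applies to values that have only been strings.
my $label = 'a9';
$label++;                     # becomes 'b0'

my $number = 9;
$number++;                    # becomes 10

print "ref: '$still_ref', copy: '$not_a_ref', $label, $number\n";
```

Once the reference has been turned into a string there is no way to get the array back from it.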
1176
1177 Why don't my <<HERE documents work?
1178 Here documents are found in perlop. Check for these four things:
1179
1180 There must be no space after the << part.
1181 There (probably) should be a semicolon at the end of the opening token.
1182 You can't (easily) have any space in front of the tag.
1183 There needs to be at least a line separator after the end token.
1184
1185 If you want to indent the text in the here document, you can do this:
1186
1187 # all in one
1188 (my $VAR = <<HERE_TARGET) =~ s/^\s+//gm;
1189 your text
1190 goes here
1191 HERE_TARGET
1192
1193 But the HERE_TARGET must still be flush against the margin. If you
1194 want that indented also, you'll have to quote in the indentation.
1195
1196 (my $quote = <<' FINIS') =~ s/^\s+//gm;
1197 ...we will have peace, when you and all your works have
1198 perished--and the works of your dark master to whom you
1199 would deliver us. You are a liar, Saruman, and a corrupter
1200 of men's hearts. --Theoden in /usr/src/perl/taint.c
1201 FINIS
1202 $quote =~ s/\s+--/\n--/;
1203
1204 A nice general-purpose fixer-upper function for indented here documents
1205 follows. It expects to be called with a here document as its argument.
1206 It looks to see whether each line begins with a common substring, and
1207 if so, strips that substring off. Otherwise, it takes the amount of
1208 leading whitespace found on the first line and removes that much off
1209 each subsequent line.
1210
1211 sub fix {
1212 local $_ = shift;
1213 my ($white, $leader); # common whitespace and common leading string
1214 if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\g1\g2?.*\n)+$/) {
1215 ($white, $leader) = ($2, quotemeta($1));
1216 } else {
1217 ($white, $leader) = (/^(\s+)/, '');
1218 }
1219 s/^\s*?$leader(?:$white)?//gm;
1220 return $_;
1221 }
1222
1223 This works with leading special strings, dynamically determined:
1224
1225 my $remember_the_main = fix<<' MAIN_INTERPRETER_LOOP';
1226 @@@ int
1227 @@@ runops() {
1228 @@@ SAVEI32(runlevel);
1229 @@@ runlevel++;
1230 @@@ while ( op = (*op->op_ppaddr)() );
1231 @@@ TAINT_NOT;
1232 @@@ return 0;
1233 @@@ }
1234 MAIN_INTERPRETER_LOOP
1235
1236 Or with a fixed amount of leading whitespace, with remaining
1237 indentation correctly preserved:
1238
1239 my $poem = fix<<EVER_ON_AND_ON;
1240 Now far ahead the Road has gone,
1241 And I must follow, if I can,
1242 Pursuing it with eager feet,
1243 Until it joins some larger way
1244 Where many paths and errands meet.
1245 And whither then? I cannot say.
1246 --Bilbo in /usr/src/perl/pp_ctl.c
1247 EVER_ON_AND_ON
1248
1249 Beginning with Perl version 5.26, a much simpler and cleaner way to
1250 write indented here documents has been added to the language: the tilde
1251 (~) modifier. See "Indented Here-docs" in perlop for details.
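
For comparison with the fix() function above, here is a minimal sketch of the tilde modifier. It assumes Perl 5.26 or later; the program name "frobnicate" and the usage() sub are made up for the example.

```perl
use v5.26;    # the ~ modifier needs Perl 5.26 or later
use warnings;

sub usage {
    # The closing delimiter's indentation is removed from every line.
    my $text = <<~EOT;
        Usage: frobnicate [options]
          -v  be verbose
        EOT
    return $text;
}

print usage();
```

Indentation beyond that of the closing delimiter is kept, so the option line stays indented by two spaces.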
1252
1254 What is the difference between a list and an array?
1255 (contributed by brian d foy)
1256
1257 A list is a fixed collection of scalars. An array is a variable that
1258 holds a variable collection of scalars. An array can supply its
1259 collection for list operations, so list operations also work on arrays:
1260
1261 # slices
1262 ( 'dog', 'cat', 'bird' )[1,2];
1263 @animals[1,2];
1264
1265 # iteration
1266 foreach ( qw( dog cat bird ) ) { ... }
1267 foreach ( @animals ) { ... }
1268
1269 my @three = grep { length == 3 } qw( dog cat bird );
1270 my @three = grep { length == 3 } @animals;
1271
1272 # supply an argument list
1273 wash_animals( qw( dog cat bird ) );
1274 wash_animals( @animals );
1275
1276 Array operations, which change the scalars, rearrange them, or add or
1277 subtract some scalars, only work on arrays. These can't work on a list,
1278 which is fixed. Array operations include "shift", "unshift", "push",
1279 "pop", and "splice".
1280
1281 An array can also change its length:
1282
1283 $#animals = 1; # truncate to two elements
1284 $#animals = 10000; # pre-extend to 10,001 elements
1285
1286 You can change an array element, but you can't change a list element:
1287
1288 $animals[0] = 'Rottweiler';
1289 qw( dog cat bird )[0] = 'Rottweiler'; # syntax error!
1290
1291 foreach ( @animals ) {
1292 s/^d/fr/; # works fine
1293 }
1294
1295 foreach ( qw( dog cat bird ) ) {
1296 s/^d/fr/; # Error! Modification of read only value!
1297 }
1298
1299 If the list element is itself a variable, though, it appears that you
1300 can change a list element. In that case the list element is the variable,
1301 not the data. You're not changing the list element, but something the
1302 list element refers to. The list element itself doesn't change: it's
1303 still the same variable.
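
A short sketch of that last point, using two throwaway variables: the loop variable is an alias for each variable in the list, so the variables change even though the list itself does not.

```perl
use strict;
use warnings;

my ($x, $y) = (1, 2);

# The list elements are the variables themselves, so $_ aliases $x and $y.
$_ *= 10 for ($x, $y);

print "$x $y\n";
```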
1304
1305 You also have to be careful about context. You can assign an array to a
1306 scalar to get the number of elements in the array. This only works for
1307 arrays, though:
1308
1309 my $count = @animals; # only works with arrays
1310
1311 If you try to do the same thing with what you think is a list, you get
1312 a quite different result. Although it looks like you have a list on the
1313 righthand side, Perl actually sees a bunch of scalars separated by a
1314 comma:
1315
1316 my $scalar = ( 'dog', 'cat', 'bird' ); # $scalar gets bird
1317
1318 Since you're assigning to a scalar, the righthand side is in scalar
1319 context. The comma operator (yes, it's an operator!) in scalar context
1320 evaluates its lefthand side, throws away the result, and evaluates its
1321 righthand side and returns the result. In effect, that list-lookalike
1322 assigns to $scalar its rightmost value. Many people mess this up
1323 because they choose a list-lookalike whose last element is also the
1324 count they expect:
1325
1326 my $scalar = ( 1, 2, 3 ); # $scalar gets 3, accidentally
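
The sketch below lines up the different results side by side. The animals() sub is a hypothetical stand-in for any code that returns a list-lookalike; the last line uses the "() =" countof idiom to count the values without an array.

```perl
use strict;
use warnings;

sub animals { return ( 'dog', 'cat', 'bird' ) }

my @animals = animals();         # list assignment copies all three
my $count   = @animals;          # array in scalar context: 3
my $last    = animals();         # comma operator in scalar context: 'bird'
my $number  = () = animals();    # list assignment in scalar context: 3

print "$count $last $number\n";
```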
1327
1328 What is the difference between $array[1] and @array[1]?
1329 (contributed by brian d foy)
1330
1331 The difference is the sigil, that special character in front of the
1332 array name. The "$" sigil means "exactly one item", while the "@" sigil
1333 means "zero or more items". The "$" gets you a single scalar, while the
1334 "@" gets you a list.
1335
1336 The confusion arises because people incorrectly assume that the sigil
1337 denotes the variable type.
1338
1339 The $array[1] is a single-element access to the array. It's going to
1340 return the item in index 1 (or undef if there is no item there). If
1341 you intend to get exactly one element from the array, this is the form
1342 you should use.
1343
1344 The @array[1] is an array slice, although it has only one index. You
1345 can pull out multiple elements simultaneously by specifying additional
1346 indices as a list, like @array[1,4,3,0].
1347
1348 Using a slice on the lefthand side of the assignment supplies list
1349 context to the righthand side. This can lead to unexpected results.
1350 For instance, if you want to read a single line from a filehandle,
1351 assigning to a scalar value is fine:
1352
1353 $array[1] = <STDIN>;
1354
1355 However, in list context, the line input operator returns all of the
1356 lines as a list. The first line goes into @array[1] and the rest of the
1357 lines mysteriously disappear:
1358
1359 @array[1] = <STDIN>; # most likely not what you want
1360
1361 Either the "use warnings" pragma or the -w flag will warn you when you
1362 use an array slice with a single index.
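
You can watch the lines disappear with a sketch like this one. The next_input() sub is an invented stand-in for the line input operator: it hands out one queued line in scalar context and every remaining line in list context.

```perl
use strict;
use warnings;

# Hypothetical stand-in for <STDIN>.
my @pending = ( 'line 1', 'line 2', 'line 3' );
sub next_input {
    return wantarray ? splice(@pending) : shift @pending;
}

my @array;

$array[1] = next_input();            # scalar context: takes 'line 1' only
my $left_after_scalar = @pending;    # two lines are still queued

{
    no warnings 'syntax';            # silence the one-element slice warning
    @array[1] = next_input();        # list context: drains the queue
}
my $left_after_slice = @pending;     # nothing left: 'line 3' was thrown away

print "kept '$array[1]', $left_after_scalar then $left_after_slice queued\n";
```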
1363
1364 How can I remove duplicate elements from a list or array?
1365 (contributed by brian d foy)
1366
1367 Use a hash. When you think the words "unique" or "duplicated", think
1368 "hash keys".
1369
1370 If you don't care about the order of the elements, you could just
1371 create the hash then extract the keys. It's not important how you
1372 create that hash: just that you use "keys" to get the unique elements.
1373
1374 my %hash = map { $_, 1 } @array;
1375 # or a hash slice: @hash{ @array } = ();
1376 # or a foreach: $hash{$_} = 1 foreach ( @array );
1377
1378 my @unique = keys %hash;
1379
1380 If you want to use a module, try the "uniq" function from
1381 List::MoreUtils. In list context it returns the unique elements,
1382 preserving their order in the list. In scalar context, it returns the
1383 number of unique elements.
1384
1385 use List::MoreUtils qw(uniq);
1386
1387 my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
1388 my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7
1389
1390 You can also go through each element and skip the ones you've seen
1391 before. Use a hash to keep track. The first time the loop sees an
1392 element, that element has no key in %seen. The "next" statement creates
1393 the key and immediately uses its value, which is "undef", so the loop
1394 continues to the "push" and increments the value for that key. The next
1395 time the loop sees that same element, its key exists in the hash and
1396 the value for that key is true (since it's not 0 or "undef"), so the
1397 next skips that iteration and the loop goes to the next element.
1398
1399 my @unique = ();
1400 my %seen = ();
1401
1402 foreach my $elem ( @array ) {
1403 next if $seen{ $elem }++;
1404 push @unique, $elem;
1405 }
1406
1407 You can write this more briefly using a grep, which does the same
1408 thing.
1409
1410 my %seen = ();
1411 my @unique = grep { ! $seen{ $_ }++ } @array;
1412
1413 How can I tell whether a certain element is contained in a list or array?
1414 (portions of this answer contributed by Anno Siegel and brian d foy)
1415
1416 Hearing the word "in" is an indication that you probably should have
1417 used a hash, not a list or array, to store your data. Hashes are
1418 designed to answer this question quickly and efficiently. Arrays
1419 aren't.
1420
1421 That being said, there are several ways to approach this. If you are
1422 going to make this query many times over arbitrary string values, the
1423 fastest way is probably to invert the original array and maintain a
1424 hash whose keys are the first array's values:
1425
1426 my @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
1427 my %is_blue = ();
1428 for (@blues) { $is_blue{$_} = 1 }
1429
1430 Now you can check whether $is_blue{$some_color}. It might have been a
1431 good idea to keep the blues all in a hash in the first place.
1432
1433 If the values are all small integers, you could use a simple indexed
1434 array. This kind of an array will take up less space:
1435
1436 my @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
1437 my @is_tiny_prime = ();
1438 for (@primes) { $is_tiny_prime[$_] = 1 }
1439 # or simply @is_tiny_prime[@primes] = (1) x @primes;
1440
1441 Now you check whether $is_tiny_prime[$some_number].
1442
1443 If the values in question are integers instead of strings, you can save
1444 quite a lot of space by using bit strings instead:
1445
1446 my @articles = ( 1..10, 150..2000, 2017 );
1447 my $read = '';
1448 for (@articles) { vec($read,$_,1) = 1 }
1449
1450 Now check whether "vec($read,$n,1)" is true for some $n.
1451
1452 These methods guarantee fast individual tests but require a re-
1453 organization of the original list or array. They only pay off if you
1454 have to test multiple values against the same array.
1455
1456 If you are testing only once, the standard module List::Util exports
1457 the function "any" for this purpose. It works by stopping once it finds
1458 the element. It's written in C for speed, and its Perl equivalent looks
1459 like this subroutine:
1460
1461 sub any (&@) {
1462 my $code = shift;
1463 foreach (@_) {
1464 return 1 if $code->();
1465 }
1466 return 0;
1467 }
1468
1469 If speed is of little concern, the common idiom uses grep in scalar
1470 context (which returns the number of items that passed its condition)
1471 to traverse the entire list. This does have the benefit of telling you
1472 how many matches it found, though.
1473
1474 my $is_there = grep $_ eq $whatever, @array;
1475
1476 If you want to actually extract the matching elements, simply use grep
1477 in list context.
1478
1479 my @matches = grep $_ eq $whatever, @array;
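
Here is a compact sketch contrasting the one-off test with the repeated test. It assumes a List::Util recent enough to export "any"; the colors are sample data.

```perl
use strict;
use warnings;
use List::Util qw(any);

my @blues = qw(azure cerulean teal turquoise);

# One-off test: any() stops at the first match.
my $has_teal = any { $_ eq 'teal' } @blues;

# Repeated tests: build the lookup hash once, then each check is cheap.
my %is_blue = map { $_ => 1 } @blues;
my @checked = map { $is_blue{$_} ? "$_: yes" : "$_: no" } qw(teal crimson);

print "$_\n" for @checked;
```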
1480
1481 How do I compute the difference of two arrays? How do I compute the
1482 intersection of two arrays?
1483 Use a hash. Here's code to do both and more. It assumes that each
1484 element is unique in a given array:
1485
1486 my (@union, @intersection, @difference);
1487 my %count = ();
1488 foreach my $element (@array1, @array2) { $count{$element}++ }
1489 foreach my $element (keys %count) {
1490 push @union, $element;
1491 push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
1492 }
1493
1494 Note that this is the symmetric difference, that is, all elements in
1495 either A or in B but not in both. Think of it as an xor operation.
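
If you want the one-sided difference instead (elements of the first array that are not in the second), a lookup hash and "grep" will do. This is a sketch with made-up data; unlike the hash-key loop above, it preserves the order of the first array.

```perl
use strict;
use warnings;

my @array1 = qw(a b c d);
my @array2 = qw(c d e);

# Lookup table for the second array.
my %in_second = map { $_ => 1 } @array2;

my @only_first   = grep { !$in_second{$_} } @array1;
my @intersection = grep {  $in_second{$_} } @array1;

print "only in first: @only_first\n";
print "in both: @intersection\n";
```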
1496
1497 How do I test whether two arrays or hashes are equal?
1498 The following code works for single-level arrays. It uses a stringwise
1499 comparison, and does not distinguish defined versus undefined empty
1500 strings. Modify if you have other needs.
1501
1502 $are_equal = compare_arrays(\@frogs, \@toads);
1503
1504 sub compare_arrays {
1505 my ($first, $second) = @_;
1506 no warnings; # silence spurious -w undef complaints
1507 return 0 unless @$first == @$second;
1508 for (my $i = 0; $i < @$first; $i++) {
1509 return 0 if $first->[$i] ne $second->[$i];
1510 }
1511 return 1;
1512 }
1513
1514 For multilevel structures, you may wish to use an approach more like
1515 this one. It uses the CPAN module FreezeThaw:
1516
1517 use FreezeThaw qw(cmpStr);
1518 my @a = my @b = ( "this", "that", [ "more", "stuff" ] );
1519
1520 printf "a and b contain %s arrays\n",
1521 cmpStr(\@a, \@b) == 0
1522 ? "the same"
1523 : "different";
1524
1525 This approach also works for comparing hashes. Here we'll demonstrate
1526 two different answers:
1527
1528 use FreezeThaw qw(cmpStr cmpStrHard);
1529
1530 my %a = my %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
1531 $a{EXTRA} = \%b;
1532 $b{EXTRA} = \%a;
1533
1534 printf "a and b contain %s hashes\n",
1535 cmpStr(\%a, \%b) == 0 ? "the same" : "different";
1536
1537 printf "a and b contain %s hashes\n",
1538 cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
1539
1540 The first reports that both of the hashes contain the same data,
1541 while the second reports that they do not. Which you prefer is left as
1542 an exercise to the reader.
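
If you cannot install FreezeThaw, a recursive comparison is not hard to sketch. The deep_equal() sub below is a hypothetical helper, not part of any module. It assumes the data contains only plain scalars, array references, and hash references, compares leaves as strings, and does not detect cycles, so it would recurse forever on the self-referential hashes above.

```perl
use strict;
use warnings;

sub deep_equal {
    my ($x, $y) = @_;

    return 0 if ref $x ne ref $y;

    if (ref $x eq 'ARRAY') {
        return 0 unless @$x == @$y;
        for my $i (0 .. $#$x) {
            return 0 unless deep_equal($x->[$i], $y->[$i]);
        }
        return 1;
    }

    if (ref $x eq 'HASH') {
        return 0 unless scalar(keys %$x) == scalar(keys %$y);
        for my $key (keys %$x) {
            return 0 unless exists $y->{$key};
            return 0 unless deep_equal($x->{$key}, $y->{$key});
        }
        return 1;
    }

    # Leaves: undef only equals undef; everything else compares as a string.
    return 0 if ( defined($x) xor defined($y) );
    return 1 unless defined $x;
    return $x eq $y ? 1 : 0;
}

my @frogs = ( 'green', [ 1, 2 ], { legs => 4 } );
my @toads = ( 'green', [ 1, 2 ], { legs => 4 } );
my @newts = ( 'green', [ 1, 3 ], { legs => 4 } );

print deep_equal(\@frogs, \@toads) ? "same\n" : "different\n";
print deep_equal(\@frogs, \@newts) ? "same\n" : "different\n";
```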
1543
1544 How do I find the first array element for which a condition is true?
1545 To find the first array element which satisfies a condition, you can
1546 use the first() function in the List::Util module, which comes with
1547 Perl 5.8. This example finds the first element that contains "Perl".
1548
1549 use List::Util qw(first);
1550
1551 my $element = first { /Perl/ } @array;
1552
1553 If you cannot use List::Util, you can make your own loop to do the same
1554 thing. Once you find the element, you stop the loop with last.
1555
1556 my $found;
1557 foreach ( @array ) {
1558 if( /Perl/ ) { $found = $_; last }
1559 }
1560
1561 If you want the array index, use the firstidx() function from
1562 "List::MoreUtils":
1563
1564 use List::MoreUtils qw(firstidx);
1565 my $index = firstidx { /Perl/ } @array;
1566
1567 Or write it yourself, iterating through the indices and checking the
1568 array element at each index until you find one that satisfies the
1569 condition:
1570
1571 my( $found, $index ) = ( undef, -1 );
1572 for( my $i = 0; $i < @array; $i++ ) {
1573 if( $array[$i] =~ /Perl/ ) {
1574 $found = $array[$i];
1575 $index = $i;
1576 last;
1577 }
1578 }
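
If you would rather stay with the core List::Util module, one possible variation is to run first() over the indices instead of the elements. This is a sketch with sample data.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @array = ( 'Python', 'Learning Perl', 'Ruby', 'Programming Perl' );

# Search the indices rather than the elements.
my $index = first { $array[$_] =~ /Perl/ } 0 .. $#array;

print defined $index ? "found at index $index\n" : "not found\n";
```

Test the result with "defined", since index 0 is a valid answer but is false.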
1579
1580 How do I handle linked lists?
1581 (contributed by brian d foy)
1582
1583 Perl's arrays do not have a fixed size, so you don't need linked lists
1584 if you just want to add or remove items. You can use array operations
1585 such as "push", "pop", "shift", "unshift", or "splice" to do that.
1586
1587 Sometimes, however, linked lists can be useful in situations where you
1588 want to "shard" an array so you have many small arrays instead of a
1589 single big array. You can keep arrays longer than Perl's largest array
1590 index, lock smaller arrays separately in threaded programs, reallocate
1591 less memory, or quickly insert elements in the middle of the chain.
1592
1593 Steve Lembark goes through the details in his YAPC::NA 2009 talk "Perly
1594 Linked Lists" ( <http://www.slideshare.net/lembark/perly-linked-lists>
1595 ), although you can just use his LinkedList::Single module.
1596
1597 How do I handle circular lists?
1598 (contributed by brian d foy)
1599
1600 If you want to cycle through an array endlessly, you can increment the
1601 index modulo the number of elements in the array:
1602
1603 my @array = qw( a b c );
1604 my $i = 0;
1605
1606 while( 1 ) {
1607 print $array[ $i++ % @array ], "\n";
1608 last if $i > 20;
1609 }
1610
1611 You can also use Tie::Cycle to use a scalar that always has the next
1612 element of the circular array:
1613
1614 use Tie::Cycle;
1615
1616 tie my $cycle, 'Tie::Cycle', [ qw( FFFFFF 000000 FFFF00 ) ];
1617
1618 print $cycle; # FFFFFF
1619 print $cycle; # 000000
1620 print $cycle; # FFFF00
1621
1622 The Array::Iterator::Circular creates an iterator object for circular
1623 arrays:
1624
1625 use Array::Iterator::Circular;
1626
1627 my $color_iterator = Array::Iterator::Circular->new(
1628 qw(red green blue orange)
1629 );
1630
1631 foreach ( 1 .. 20 ) {
1632 print $color_iterator->next, "\n";
1633 }
1634
1635 How do I shuffle an array randomly?
1636 If you have either Perl 5.8.0 or later installed, or
1637 Scalar-List-Utils 1.03 or later installed, you can say:
1638
1639 use List::Util 'shuffle';
1640
1641 @shuffled = shuffle(@list);
1642
1643 If not, you can use a Fisher-Yates shuffle.
1644
1645 sub fisher_yates_shuffle {
1646 my $deck = shift; # $deck is a reference to an array
1647 return unless @$deck; # must not be empty!
1648
1649 my $i = @$deck;
1650 while (--$i) {
1651 my $j = int rand ($i+1);
1652 @$deck[$i,$j] = @$deck[$j,$i];
1653 }
1654 }
1655
1656 # shuffle my mpeg collection
1657 #
1658 my @mpeg = <audio/*/*.mp3>;
1659 fisher_yates_shuffle( \@mpeg ); # randomize @mpeg in place
1660 print @mpeg;
1661
1662 Note that the above implementation shuffles an array in place, unlike
1663 the List::Util::shuffle() which takes a list and returns a new shuffled
1664 list.
1665
1666 You've probably seen shuffling algorithms that work using splice,
1667 randomly picking another element to swap the current element with:
1668
1669 srand;
1670 @new = ();
1671 @old = 1 .. 10; # just a demo
1672 while (@old) {
1673 push(@new, splice(@old, rand @old, 1));
1674 }
1675
1676 This is bad because splice is already O(N), and since you do it N
1677 times, you just invented a quadratic algorithm; that is, O(N**2). This
1678 does not scale, although Perl is so efficient that you probably won't
1679 notice this until you have rather largish arrays.
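
For contrast with the splice version, here is a sketch of a linear-time variant that works on a copy and returns it, much as List::Util::shuffle() does. The shuffled_copy() name is an assumption for this example; the loop is the same Fisher-Yates swap shown earlier.

```perl
use strict;
use warnings;

sub shuffled_copy {
    my @deck = @_;
    for (my $i = $#deck; $i > 0; $i--) {
        my $j = int rand($i + 1);          # 0 <= $j <= $i
        @deck[$i, $j] = @deck[$j, $i];     # swap via array slices
    }
    return @deck;
}

my @original = 1 .. 10;
my @shuffled = shuffled_copy(@original);

print "@shuffled\n";
```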
1680
1681 How do I process/modify each element of an array?
1682 Use "for"/"foreach":
1683
1684 for (@lines) {
1685 s/foo/bar/; # change that word
1686 tr/XZ/ZX/; # swap those letters
1687 }
1688
1689 Here's another; let's compute spherical volumes:
1690
1691 my @volumes = @radii;
1692 for (@volumes) { # @volumes has changed parts
1693 $_ **= 3;
1694 $_ *= (4/3) * 3.14159; # this will be constant folded
1695 }
1696
1697 which can also be done with map() which is made to transform one list
1698 into another:
1699
1700 my @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
1701
If you want to do the same thing to modify the values of a hash, you
can use the "values" function. As of Perl 5.6 the values are not
copied, so if you modify $orbit (in this case), you modify the value.
1705
1706 for my $orbit ( values %orbits ) {
1707 ($orbit **= 3) *= (4/3) * 3.14159;
1708 }
1709
1710 Prior to perl 5.6 "values" returned copies of the values, so older perl
1711 code often contains constructions such as @orbits{keys %orbits} instead
1712 of "values %orbits" where the hash is to be modified.
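
A small sketch of both idioms side by side may help. The hash %price and
its contents are invented for the example; the point is only that each
loop variable is an alias for the stored value.

```perl
use strict;
use warnings;

my %price = ( apple => 2, pear => 3 );   # made-up sample data

# Each $p is an alias to a value stored in %price, so changing it
# changes the hash itself (Perl 5.6 or later).
for my $p ( values %price ) {
    $p *= 10;
}

# The older idiom reaches the same values through a hash slice.
$_ += 1 for @price{ keys %price };

print "$_ => $price{$_}\n" for sort keys %price;
```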
1713
1714 How do I select a random element from an array?
1715 Use the rand() function (see "rand" in perlfunc):
1716
1717 my $index = rand @array;
1718 my $element = $array[$index];
1719
1720 Or, simply:
1721
1722 my $element = $array[ rand @array ];
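
The one-liner works because an array subscript is truncated to an
integer. A minimal sketch, with a made-up list of colors:

```perl
use strict;
use warnings;

my @array = qw(red green blue);

# rand @array yields a fraction from 0 up to (but not including) 3;
# the subscript truncates it to 0, 1 or 2, so no explicit int() is needed.
my $element = $array[ rand @array ];
print "picked $element\n";    # one of red, green, blue
```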
1723
1724 How do I permute N elements of a list?
1725 Use the List::Permutor module on CPAN. If the list is actually an
1726 array, try the Algorithm::Permute module (also on CPAN). It's written
1727 in XS code and is very efficient:
1728
1729 use Algorithm::Permute;
1730
1731 my @array = 'a'..'d';
1732 my $p_iterator = Algorithm::Permute->new ( \@array );
1733
1734 while (my @perm = $p_iterator->next) {
1735 print "next permutation: (@perm)\n";
1736 }
1737
1738 For even faster execution, you could do:
1739
1740 use Algorithm::Permute;
1741
1742 my @array = 'a'..'d';
1743
1744 Algorithm::Permute::permute {
1745 print "next permutation: (@array)\n";
1746 } @array;
1747
Here's a little program that generates all permutations of all the
words on each line of input. The algorithm embodied in the permute()
function is discussed in Volume 4A of Knuth's The Art of Computer
Programming and will work on any list:
1752
1753 #!/usr/bin/perl -n
1754 # Fischer-Krause ordered permutation generator
1755
1756 sub permute (&@) {
1757 my $code = shift;
1758 my @idx = 0..$#_;
1759 while ( $code->(@_[@idx]) ) {
1760 my $p = $#idx;
1761 --$p while $idx[$p-1] > $idx[$p];
1762 my $q = $p or return;
1763 push @idx, reverse splice @idx, $p;
1764 ++$q while $idx[$p-1] > $idx[$q];
1765 @idx[$p-1,$q]=@idx[$q,$p-1];
1766 }
1767 }
1768
1769 permute { print "@_\n" } split;
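
To try the routine without feeding it input lines, the same permute()
can be called on a fixed list. The sketch below repeats the subroutine
and collects the permutations in an array; @perms is a name chosen only
for this example. Note that the block has to return a true value, or
the loop stops after the first permutation.

```perl
use strict;
use warnings;

# Same algorithm as above; must be defined before it is called so that
# the (&@) prototype lets us pass a bare block.
sub permute (&@) {
    my $code = shift;
    my @idx  = 0 .. $#_;
    while ( $code->( @_[@idx] ) ) {
        my $p = $#idx;
        --$p while $idx[ $p - 1 ] > $idx[$p];
        my $q = $p or return;
        push @idx, reverse splice @idx, $p;
        ++$q while $idx[ $p - 1 ] > $idx[$q];
        @idx[ $p - 1, $q ] = @idx[ $q, $p - 1 ];
    }
}

my @perms;
permute { push @perms, "@_" } qw(a b c);   # push returns a true count
print "$_\n" for @perms;
```

For three elements this collects six lines, starting with "a b c" and
ending with "c b a".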
1770
1771 The Algorithm::Loops module also provides the "NextPermute" and
1772 "NextPermuteNum" functions which efficiently find all unique
1773 permutations of an array, even if it contains duplicate values,
1774 modifying it in-place: if its elements are in reverse-sorted order then
1775 the array is reversed, making it sorted, and it returns false;
1776 otherwise the next permutation is returned.
1777
1778 "NextPermute" uses string order and "NextPermuteNum" numeric order, so
1779 you can enumerate all the permutations of 0..9 like this:
1780
1781 use Algorithm::Loops qw(NextPermuteNum);
1782
1783 my @list= 0..9;
1784 do { print "@list\n" } while NextPermuteNum @list;
1785
1786 How do I sort an array by (anything)?
1787 Supply a comparison function to sort() (described in "sort" in
1788 perlfunc):
1789
1790 @list = sort { $a <=> $b } @list;
1791
1792 The default sort function is cmp, string comparison, which would sort
1793 "(1, 2, 10)" into "(1, 10, 2)". "<=>", used above, is the numerical
1794 comparison operator.
1795
1796 If you have a complicated function needed to pull out the part you want
1797 to sort on, then don't do it inside the sort function. Pull it out
1798 first, because the sort BLOCK can be called many times for the same
1799 element. Here's an example of how to pull out the first word after the
1800 first number on each item, and then sort those words case-
1801 insensitively.
1802
1803 my @idx;
1804 for (@data) {
1805 my $item;
1806 ($item) = /\d+\s*(\S+)/;
1807 push @idx, uc($item);
1808 }
1809 my @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
1810
1811 which could also be written this way, using a trick that's come to be
1812 known as the Schwartzian Transform:
1813
1814 my @sorted = map { $_->[0] }
1815 sort { $a->[1] cmp $b->[1] }
1816 map { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
1817
1818 If you need to sort on several fields, the following paradigm is
1819 useful.
1820
1821 my @sorted = sort {
1822 field1($a) <=> field1($b) ||
1823 field2($a) cmp field2($b) ||
1824 field3($a) cmp field3($b)
1825 } @data;
1826
1827 This can be conveniently combined with precalculation of keys as given
1828 above.
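
As an illustration of that combination, here is a sketch that splits
each record once and then sorts on three fields. The record layout
("name:department:salary") and the sample data are assumptions made up
for the example.

```perl
use strict;
use warnings;

# Made-up sample records of the form "name:department:salary".
my @data = (
    'carol:sales:300',
    'alice:tech:500',
    'bob:sales:300',
    'dave:tech:400',
);

# Split each record once, sort on department, then salary (descending),
# then name, and finally pull the original strings back out.
my @sorted =
    map  { $_->[0] }
    sort {
           $a->[2] cmp $b->[2]
        || $b->[3] <=> $a->[3]
        || $a->[1] cmp $b->[1]
    }
    map  { [ $_, split /:/ ] } @data;

print "$_\n" for @sorted;
```

Swapping $a and $b in one comparison, as done for the salary field, is
the usual way to reverse the order of just that field.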
1829
1830 See the sort article in the "Far More Than You Ever Wanted To Know"
1831 collection in <http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz> for more
1832 about this approach.
1833
1834 See also the question later in perlfaq4 on sorting hashes.
1835
1836 How do I manipulate arrays of bits?
1837 Use pack() and unpack(), or else vec() and the bitwise operations.
1838
1839 For example, you don't have to store individual bits in an array (which
1840 would mean that you're wasting a lot of space). To convert an array of
1841 bits to a string, use vec() to set the right bits. This sets $vec to
1842 have bit N set only if $ints[N] was set:
1843
1844 my @ints = (...); # array of bits, e.g. ( 1, 0, 0, 1, 1, 0 ... )
1845 my $vec = '';
1846 foreach( 0 .. $#ints ) {
1847 vec($vec,$_,1) = 1 if $ints[$_];
1848 }
1849
1850 The string $vec only takes up as many bits as it needs. For instance,
1851 if you had 16 entries in @ints, $vec only needs two bytes to store them
1852 (not counting the scalar variable overhead).
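
You can check the size claim directly. This sketch packs 16 made-up
flags and reports the length of the resulting string:

```perl
use strict;
use warnings;

# 16 made-up flags; bits 0, 3, 4 and 15 are set.
my @ints = ( 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1 );

my $vec = '';
for my $n ( 0 .. $#ints ) {
    vec( $vec, $n, 1 ) = 1 if $ints[$n];
}

printf "%d flags packed into %d bytes\n", scalar @ints, length $vec;
print "bit string: ", unpack( "b*", $vec ), "\n";
```

The string only grows as far as the highest bit that is actually set, so
a list whose trailing entries are all zero may end up shorter than this.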
1853
1854 Here's how, given a vector in $vec, you can get those bits into your
1855 @ints array:
1856
1857 sub bitvec_to_list {
1858 my $vec = shift;
1859 my @ints;
1860 # Find null-byte density then select best algorithm
1861 if ($vec =~ tr/\0// / length $vec > 0.95) {
1862 use integer;
1863 my $i;
1864
1865 # This method is faster with mostly null-bytes
1866 while($vec =~ /[^\0]/g ) {
1867 $i = -9 + 8 * pos $vec;
1868 push @ints, $i if vec($vec, ++$i, 1);
1869 push @ints, $i if vec($vec, ++$i, 1);
1870 push @ints, $i if vec($vec, ++$i, 1);
1871 push @ints, $i if vec($vec, ++$i, 1);
1872 push @ints, $i if vec($vec, ++$i, 1);
1873 push @ints, $i if vec($vec, ++$i, 1);
1874 push @ints, $i if vec($vec, ++$i, 1);
1875 push @ints, $i if vec($vec, ++$i, 1);
1876 }
1877 }
1878 else {
1879 # This method is a fast general algorithm
1880 use integer;
1881 my $bits = unpack "b*", $vec;
1882 push @ints, 0 if $bits =~ s/^(\d)// && $1;
1883 push @ints, pos $bits while($bits =~ /1/g);
1884 }
1885
1886 return \@ints;
1887 }
1888
1889 This method gets faster the more sparse the bit vector is. (Courtesy
1890 of Tim Bunce and Winfried Koenig.)
1891
1892 You can make the while loop a lot shorter with this suggestion from
1893 Benjamin Goldberg:
1894
1895 while($vec =~ /[^\0]+/g ) {
1896 push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
1897 }
1898
1899 Or use the CPAN module Bit::Vector:
1900
use Bit::Vector;

my $vector = Bit::Vector->new($num_of_bits);
$vector->Index_List_Store(@ints);      # set the bits whose indices are in @ints
@ints = $vector->Index_List_Read();    # list the indices of the set bits
1904
Bit::Vector provides efficient methods for bit vectors, sets of small
integers, and "big int" math.
1907
1908 Here's a more extensive illustration using vec():
1909
1910 # vec demo
1911 my $vector = "\xff\x0f\xef\xfe";
1912 print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
1913 unpack("N", $vector), "\n";
1914 my $is_set = vec($vector, 23, 1);
1915 print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
1916 pvec($vector);
1917
1918 set_vec(1,1,1);
1919 set_vec(3,1,1);
1920 set_vec(23,1,1);
1921
1922 set_vec(3,1,3);
1923 set_vec(3,2,3);
1924 set_vec(3,4,3);
1925 set_vec(3,4,7);
1926 set_vec(3,8,3);
1927 set_vec(3,8,7);
1928
1929 set_vec(0,32,17);
1930 set_vec(1,32,17);
1931
1932 sub set_vec {
1933 my ($offset, $width, $value) = @_;
1934 my $vector = '';
1935 vec($vector, $offset, $width) = $value;
1936 print "offset=$offset width=$width value=$value\n";
1937 pvec($vector);
1938 }
1939
sub pvec {
    my $vector = shift;
    my $bits   = unpack("b*", $vector);

    print "vector length in bytes: ", length($vector), "\n";
    # split the bit string into one 8-character group per byte
    my @bytes = unpack("A8" x length($vector), $bits);
    print "bits are: @bytes\n\n";
}
1950
1951 Why does defined() return true on empty arrays and hashes?
The short story is that you should probably only use defined on scalars
or functions, not on aggregates (arrays and hashes). See "defined" in
perlfunc in the 5.004 release or later of Perl for more detail. Since
Perl 5.22, calling defined on an array or hash is a fatal error.
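
If what you really want to know is whether an aggregate has anything in
it, test the aggregate itself in boolean context. A minimal sketch (the
variable names are only for illustration):

```perl
use strict;
use warnings;

my @array = ();
my %hash  = ();

# In boolean context an array or hash is true only when it holds at
# least one element, which is usually the question being asked.
my $both_empty = !@array && !%hash;
print "both are empty\n" if $both_empty;

push @array, 'x';
$hash{x} = 1;
my $both_filled = @array && %hash;
print "both have contents now\n" if $both_filled;
```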
1955
1957 How do I process an entire hash?
1958 (contributed by brian d foy)
1959
There are a couple of ways that you can process an entire hash. You can
get a list of keys, then go through each key, or grab one key-value
pair at a time.
1963
1964 To go through all of the keys, use the "keys" function. This extracts
1965 all of the keys of the hash and gives them back to you as a list. You
1966 can then get the value through the particular key you're processing:
1967
foreach my $key ( keys %hash ) {
    my $value = $hash{$key};
    ...
}
1972
1973 Once you have the list of keys, you can process that list before you
1974 process the hash elements. For instance, you can sort the keys so you
1975 can process them in lexical order:
1976
foreach my $key ( sort keys %hash ) {
    my $value = $hash{$key};
    ...
}
1981
1982 Or, you might want to only process some of the items. If you only want
1983 to deal with the keys that start with "text:", you can select just
1984 those using "grep":
1985
foreach my $key ( grep /^text:/, keys %hash ) {
    my $value = $hash{$key};
    ...
}
1990
1991 If the hash is very large, you might not want to create a long list of
1992 keys. To save some memory, you can grab one key-value pair at a time
1993 using each(), which returns a pair you haven't seen yet:
1994
1995 while( my( $key, $value ) = each( %hash ) ) {
1996 ...
1997 }
1998
1999 The "each" operator returns the pairs in apparently random order, so if
2000 ordering matters to you, you'll have to stick with the "keys" method.
2001
2002 The each() operator can be a bit tricky though. You can't add or delete
2003 keys of the hash while you're using it without possibly skipping or re-
2004 processing some pairs after Perl internally rehashes all of the
2005 elements. Additionally, a hash has only one iterator, so if you mix
2006 "keys", "values", or "each" on the same hash, you risk resetting the
2007 iterator and messing up your processing. See the "each" entry in
2008 perlfunc for more details.
2009
2010 How do I merge two hashes?
2011 (contributed by brian d foy)
2012
2013 Before you decide to merge two hashes, you have to decide what to do if
2014 both hashes contain keys that are the same and if you want to leave the
2015 original hashes as they were.
2016
If you want to preserve the original hashes, copy one hash (%hash1) to
a new hash (%new_hash), then add the keys from the other hash (%hash2)
to the new hash. Checking that the key already exists in %new_hash
gives you a chance to decide what to do with the duplicates:
2021
2022 my %new_hash = %hash1; # make a copy; leave %hash1 alone
2023
2024 foreach my $key2 ( keys %hash2 ) {
2025 if( exists $new_hash{$key2} ) {
2026 warn "Key [$key2] is in both hashes!";
2027 # handle the duplicate (perhaps only warning)
2028 ...
2029 next;
2030 }
2031 else {
2032 $new_hash{$key2} = $hash2{$key2};
2033 }
2034 }
2035
2036 If you don't want to create a new hash, you can still use this looping
2037 technique; just change the %new_hash to %hash1.
2038
2039 foreach my $key2 ( keys %hash2 ) {
2040 if( exists $hash1{$key2} ) {
2041 warn "Key [$key2] is in both hashes!";
2042 # handle the duplicate (perhaps only warning)
2043 ...
2044 next;
2045 }
2046 else {
2047 $hash1{$key2} = $hash2{$key2};
2048 }
2049 }
2050
2051 If you don't care that one hash overwrites keys and values from the
2052 other, you could just use a hash slice to add one hash to another. In
2053 this case, values from %hash2 replace values from %hash1 when they have
2054 keys in common:
2055
2056 @hash1{ keys %hash2 } = values %hash2;
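
If you want the same "last one wins" behavior but would rather leave
both originals alone, one possible sketch is to flatten both hashes into
a new one. The sample data and the name %merged are made up here.

```perl
use strict;
use warnings;

my %hash1 = ( a => 1,  b => 2 );
my %hash2 = ( b => 20, c => 30 );

# Later pairs win, so %hash2 overrides %hash1 for shared keys,
# and neither original is touched.
my %merged = ( %hash1, %hash2 );

print join( ", ", map { "$_=$merged{$_}" } sort keys %merged ), "\n";
```

This builds a temporary list of every key and value, so for very large
hashes the looping technique above may be kinder to memory.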
2057
2058 What happens if I add or remove keys from a hash while iterating over it?
2059 (contributed by brian d foy)
2060
2061 The easy answer is "Don't do that!"
2062
2063 If you iterate through the hash with each(), you can delete the key
2064 most recently returned without worrying about it. If you delete or add
2065 other keys, the iterator may skip or double up on them since perl may
2066 rearrange the hash table. See the entry for each() in perlfunc.
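
Here is a small sketch of the one safe case, deleting the pair that
each() has just returned. The hash %stock and its contents are invented
for the example.

```perl
use strict;
use warnings;

my %stock = ( apples => 0, pears => 4, plums => 0, figs => 7 );

# Deleting the key that each() has just handed back is the one
# modification that is documented as safe during the loop.
while ( my ( $item, $count ) = each %stock ) {
    delete $stock{$item} if $count == 0;
}

print join( " ", sort keys %stock ), "\n";    # figs pears
```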
2067
2068 How do I look up a hash element by value?
2069 Create a reverse hash:
2070
2071 my %by_value = reverse %by_key;
2072 my $key = $by_value{$value};
2073
2074 That's not particularly efficient. It would be more space-efficient to
2075 use:
2076
2077 while (my ($key, $value) = each %by_key) {
2078 $by_value{$value} = $key;
2079 }
2080
2081 If your hash could have repeated values, the methods above will only
2082 find one of the associated keys. This may or may not worry you. If it
2083 does worry you, you can always reverse the hash into a hash of arrays
2084 instead:
2085
2086 while (my ($key, $value) = each %by_key) {
2087 push @{$key_list_by_value{$value}}, $key;
2088 }
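
With some made-up data, the hash-of-arrays version looks like this in
practice. Since each() returns pairs in no particular order, the sketch
sorts the collected keys before printing them.

```perl
use strict;
use warnings;

my %by_key = ( apple => 'red', cherry => 'red', lemon => 'yellow' );

my %key_list_by_value;
while ( my ( $key, $value ) = each %by_key ) {
    push @{ $key_list_by_value{$value} }, $key;
}

for my $value ( sort keys %key_list_by_value ) {
    my @keys = sort @{ $key_list_by_value{$value} };
    print "$value: @keys\n";
}
```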
2089
2090 How can I know how many entries are in a hash?
2091 (contributed by brian d foy)
2092
2093 This is very similar to "How do I process an entire hash?", also in
2094 perlfaq4, but a bit simpler in the common cases.
2095
2096 You can use the keys() built-in function in scalar context to find out
2097 how many entries you have in a hash:
2098
2099 my $key_count = keys %hash; # must be scalar context!
2100
2101 If you want to find out how many entries have a defined value, that's a
2102 bit different. You have to check each value. A "grep" is handy:
2103
2104 my $defined_value_count = grep { defined } values %hash;
2105
2106 You can use that same structure to count the entries any way that you
2107 like. If you want the count of the keys with vowels in them, you just
2108 test for that instead:
2109
2110 my $vowel_count = grep { /[aeiou]/ } keys %hash;
2111
2112 The "grep" in scalar context returns the count. If you want the list of
2113 matching items, just use it in list context instead:
2114
2115 my @defined_values = grep { defined } values %hash;
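
The context rules are easy to trip over, so here is a short sketch with
a made-up hash. Watch the parentheses in the last assignment: they turn
it into a list assignment.

```perl
use strict;
use warnings;

my %hash = ( one => 1, two => undef, three => 3 );

my $key_count     = keys %hash;                      # scalar context: a count
my $defined_count = grep { defined } values %hash;   # scalar context: a count
my @defined       = grep { defined } values %hash;   # list context: the values

# Parentheses on the left make this a list assignment, so it stores the
# first matching value rather than the count.
my ($not_a_count) = grep { defined } values %hash;

print "$key_count keys, $defined_count defined, values: @{[ sort @defined ]}\n";
```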
2116
2117 The keys() function also resets the iterator, which means that you may
2118 see strange results if you use this between uses of other hash
2119 operators such as each().
2120
2121 How do I sort a hash (optionally by value instead of key)?
2122 (contributed by brian d foy)
2123
2124 To sort a hash, start with the keys. In this example, we give the list
2125 of keys to the sort function which then compares them as strings. The
2126 output list has the keys in string order. Once we have the keys, we can
2127 go through them to create a report which lists the keys in string
2128 order:
2129
2130 my @keys = sort { $a cmp $b } keys %hash;
2131
2132 foreach my $key ( @keys ) {
2133 printf "%-20s %6d\n", $key, $hash{$key};
2134 }
2135
2136 We could get more fancy in the sort() block though. Instead of
2137 comparing the keys, we can compute a value with them and use that value
2138 as the comparison.
2139
2140 For instance, to make our report order case-insensitive, we use "fc" to
2141 safely lowercase the keys before comparing them:
2142
2143 use v5.16;
2144 my @keys = sort { fc($a) cmp fc($b) } keys %hash;
2145
2146 Earlier versions of this answer used "lc", but that could give
2147 unexpected results with some Unicode strings. See "fc" in perlfunc for
2148 the details. The Unicode::UCD module does the same thing for earlier
2149 perls.
2150
2151 Note: if the computation is expensive or the hash has many elements,
2152 you may want to look at the Schwartzian Transform to cache the
2153 computation results.
2154
2155 If we want to sort by the hash value instead, we use the hash key to
2156 look it up. We still get out a list of keys, but this time they are
2157 ordered by their value:
2158
2159 my @keys = sort { $hash{$a} <=> $hash{$b} } keys %hash;
2160
2161 From there we can get more complex. If the hash values are the same, we
2162 can provide a secondary sort on the hash key:
2163
2164 use v5.16;
2165 my @keys = sort {
2166 $hash{$a} <=> $hash{$b}
2167 or
2168 fc($a) cmp fc($b)
2169 } keys %hash;
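
Run against a small made-up hash, the two-level sort behaves like this:

```perl
use v5.16;    # enables fc (and strict)
use warnings;

my %hash = ( pear => 3, Apple => 3, fig => 1, banana => 2 );

my @keys = sort {
    $hash{$a} <=> $hash{$b}
        or
    fc($a) cmp fc($b)
} keys %hash;

printf "%-10s %3d\n", $_, $hash{$_} for @keys;
```

The keys come out as fig, banana, Apple, pear: ordered by value first,
with the tie between Apple and pear broken case-insensitively.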
2170
2171 How can I always keep my hash sorted?
2172 You can look into using the "DB_File" module and tie() using the
2173 $DB_BTREE hash bindings as documented in "In Memory Databases" in
2174 DB_File. The Tie::IxHash module from CPAN might also be instructive.
2175 Although this does keep your hash sorted, you might not like the
2176 slowdown you suffer from the tie interface. Are you sure you need to do
2177 this? :)
2178
2179 What's the difference between "delete" and "undef" with hashes?
2180 Hashes contain pairs of scalars: the first is the key, the second is
2181 the value. The key will be coerced to a string, although the value can
2182 be any kind of scalar: string, number, or reference. If a key $key is
2183 present in %hash, exists($hash{$key}) will return true. The value for a
2184 given key can be "undef", in which case $hash{$key} will be "undef"
2185 while "exists $hash{$key}" will return true. This corresponds to ($key,
2186 "undef") being in the hash.
2187
2188 Pictures help... Here's the %hash table:
2189
2190 keys values
2191 +------+------+
2192 | a | 3 |
2193 | x | 7 |
2194 | d | 0 |
2195 | e | 2 |
2196 +------+------+
2197
2198 And these conditions hold
2199
2200 $hash{'a'} is true
2201 $hash{'d'} is false
2202 defined $hash{'d'} is true
2203 defined $hash{'a'} is true
2204 exists $hash{'a'} is true (Perl 5 only)
2205 grep ($_ eq 'a', keys %hash) is true
2206
2207 If you now say
2208
2209 undef $hash{'a'}
2210
2211 your table now reads:
2212
2213 keys values
2214 +------+------+
2215 | a | undef|
2216 | x | 7 |
2217 | d | 0 |
2218 | e | 2 |
2219 +------+------+
2220
2221 and these conditions now hold; changes in caps:
2222
2223 $hash{'a'} is FALSE
2224 $hash{'d'} is false
2225 defined $hash{'d'} is true
2226 defined $hash{'a'} is FALSE
2227 exists $hash{'a'} is true (Perl 5 only)
2228 grep ($_ eq 'a', keys %hash) is true
2229
2230 Notice the last two: you have an undef value, but a defined key!
2231
2232 Now, consider this:
2233
2234 delete $hash{'a'}
2235
2236 your table now reads:
2237
2238 keys values
2239 +------+------+
2240 | x | 7 |
2241 | d | 0 |
2242 | e | 2 |
2243 +------+------+
2244
2245 and these conditions now hold; changes in caps:
2246
2247 $hash{'a'} is false
2248 $hash{'d'} is false
2249 defined $hash{'d'} is true
2250 defined $hash{'a'} is false
2251 exists $hash{'a'} is FALSE (Perl 5 only)
2252 grep ($_ eq 'a', keys %hash) is FALSE
2253
2254 See, the whole entry is gone!
2255
2256 Why don't my tied hashes make the defined/exists distinction?
2257 This depends on the tied hash's implementation of EXISTS(). For
2258 example, there isn't the concept of undef with hashes that are tied to
2259 DBM* files. It also means that exists() and defined() do the same thing
2260 with a DBM* file, and what they end up doing is not what they do with
2261 ordinary hashes.
2262
2263 How do I reset an each() operation part-way through?
2264 (contributed by brian d foy)
2265
2266 You can use the "keys" or "values" functions to reset "each". To simply
2267 reset the iterator used by "each" without doing anything else, use one
2268 of them in void context:
2269
2270 keys %hash; # resets iterator, nothing else.
2271 values %hash; # resets iterator, nothing else.
2272
2273 See the documentation for "each" in perlfunc.
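
A quick way to convince yourself, sketched with a made-up hash: take one
pair, reset, and take one pair again. As long as the hash is not
modified in between, you get the same key both times.

```perl
use strict;
use warnings;

my %hash = ( a => 1, b => 2, c => 3 );

my ($first) = each %hash;     # advance the iterator by one pair
keys %hash;                   # void context: just resets the iterator
my ($again) = each %hash;     # starts over from the beginning

print "first=$first again=$again\n";    # the same key both times
keys %hash;                   # leave the iterator reset for later code
```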
2274
2275 How can I get the unique keys from two hashes?
2276 First you extract the keys from the hashes into lists, then solve the
2277 "removing duplicates" problem described above. For example:
2278
2279 my %seen = ();
2280 for my $element (keys(%foo), keys(%bar)) {
2281 $seen{$element}++;
2282 }
2283 my @uniq = keys %seen;
2284
2285 Or more succinctly:
2286
2287 my @uniq = keys %{{%foo,%bar}};
2288
2289 Or if you really want to save space:
2290
my %seen = ();
while (defined (my $key = each %foo)) {
    $seen{$key}++;
}
while (defined (my $key = each %bar)) {
    $seen{$key}++;
}
my @uniq = keys %seen;
2299
2300 How can I store a multidimensional array in a DBM file?
2301 Either stringify the structure yourself (no fun), or else get the MLDBM
2302 (which uses Data::Dumper) module from CPAN and layer it on top of
2303 either DB_File or GDBM_File. You might also try DBM::Deep, but it can
2304 be a bit slow.
2305
2306 How can I make my hash remember the order I put elements into it?
Use the Tie::IxHash module from CPAN.
2308
2309 use Tie::IxHash;
2310
2311 tie my %myhash, 'Tie::IxHash';
2312
2313 for (my $i=0; $i<20; $i++) {
2314 $myhash{$i} = 2*$i;
2315 }
2316
2317 my @keys = keys %myhash;
2318 # @keys = (0,1,2,3,...)
2319
2320 Why does passing a subroutine an undefined element in a hash create it?
2321 (contributed by brian d foy)
2322
2323 Are you using a really old version of Perl?
2324
2325 Normally, accessing a hash key's value for a nonexistent key will not
2326 create the key.
2327
2328 my %hash = ();
2329 my $value = $hash{ 'foo' };
2330 print "This won't print\n" if exists $hash{ 'foo' };
2331
2332 Passing $hash{ 'foo' } to a subroutine used to be a special case,
2333 though. Since you could assign directly to $_[0], Perl had to be ready
2334 to make that assignment so it created the hash key ahead of time:
2335
2336 my_sub( $hash{ 'foo' } );
2337 print "This will print before 5.004\n" if exists $hash{ 'foo' };
2338
2339 sub my_sub {
2340 # $_[0] = 'bar'; # create hash key in case you do this
2341 1;
2342 }
2343
2344 Since Perl 5.004, however, this situation is a special case and Perl
2345 creates the hash key only when you make the assignment:
2346
2347 my_sub( $hash{ 'foo' } );
2348 print "This will print, even after 5.004\n" if exists $hash{ 'foo' };
2349
2350 sub my_sub {
2351 $_[0] = 'bar';
2352 }
2353
2354 However, if you want the old behavior (and think carefully about that
2355 because it's a weird side effect), you can pass a hash slice instead.
2356 Perl 5.004 didn't make this a special case:
2357
2358 my_sub( @hash{ qw/foo/ } );
2359
2360 How can I make the Perl equivalent of a C structure/C++ class/hash or array
2361 of hashes or arrays?
2362 Usually a hash ref, perhaps like this:
2363
my $record = {
    NAME   => "Jason",
    EMPNO  => 132,
    TITLE  => "deputy peon",
    AGE    => 23,
    SALARY => 37_000,
    PALS   => [ "Norbert", "Rhys", "Phineas" ],
};
2372
2373 References are documented in perlref and perlreftut. Examples of
2374 complex data structures are given in perldsc and perllol. Examples of
2375 structures and object-oriented classes are in perlootut.
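
Getting at the fields uses the arrow operator. In this sketch the second
record ("Sam") and the array @staff are invented to show how a list of
such records behaves like an array of structs.

```perl
use strict;
use warnings;

my $record = {
    NAME  => "Jason",
    EMPNO => 132,
    PALS  => [ "Norbert", "Rhys", "Phineas" ],
};

print "Name: $record->{NAME}\n";
print "First pal: $record->{PALS}[0]\n";
print "Pal count: ", scalar @{ $record->{PALS} }, "\n";

# An array of such records behaves like an array of structs.
my @staff = ( $record, { NAME => "Sam", EMPNO => 7, PALS => [] } );
my @names = map { $_->{NAME} } @staff;
print "Staff: @names\n";
```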
2376
2377 How can I use a reference as a hash key?
2378 (contributed by brian d foy and Ben Morrow)
2379
2380 Hash keys are strings, so you can't really use a reference as the key.
2381 When you try to do that, perl turns the reference into its stringified
2382 form (for instance, HASH(0xDEADBEEF)). From there you can't get back
2383 the reference from the stringified form, at least without doing some
2384 extra work on your own.
2385
2386 Remember that the entry in the hash will still be there even if the
2387 referenced variable goes out of scope, and that it is entirely
2388 possible for Perl to subsequently allocate a different variable at the
2389 same address. This will mean a new variable might accidentally be
2390 associated with the value for an old.
2391
2392 If you have Perl 5.10 or later, and you just want to store a value
2393 against the reference for lookup later, you can use the core
2394 Hash::Util::Fieldhash module. This will also handle renaming the keys
2395 if you use multiple threads (which causes all variables to be
2396 reallocated at new addresses, changing their stringification), and
2397 garbage-collecting the entries when the referenced variable goes out of
2398 scope.
2399
2400 If you actually need to be able to get a real reference back from each
2401 hash entry, you can use the Tie::RefHash module, which does the
2402 required work for you.
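
The difference is easy to see side by side. This sketch stores the same
array reference as a key in a plain hash and in a hash tied to
Tie::RefHash; the variable names are only for illustration.

```perl
use strict;
use warnings;
use Tie::RefHash;

my $point = [ 3, 4 ];

# Plain hash: the key is only the string form of the reference.
my %plain = ( $point => 'plain' );
my ($plain_key) = keys %plain;
print "plain key is a ", ( ref($plain_key) ? "reference" : "string" ), "\n";

# Tied hash: keys come back as the original references.
tie my %by_ref, 'Tie::RefHash';
$by_ref{$point} = 'tied';
my ($ref_key) = keys %by_ref;
print "tied key is a ", ( ref($ref_key) ? "reference" : "string" ), "\n";
print "x = $ref_key->[0]\n";
```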
2403
2404 How can I check if a key exists in a multilevel hash?
2405 (contributed by brian d foy)
2406
2407 The trick to this problem is avoiding accidental autovivification. If
2408 you want to check three keys deep, you might naïvely try this:
2409
2410 my %hash;
2411 if( exists $hash{key1}{key2}{key3} ) {
2412 ...;
2413 }
2414
2415 Even though you started with a completely empty hash, after that call
2416 to "exists" you've created the structure you needed to check for
2417 "key3":
2418
2419 %hash = (
2420 'key1' => {
2421 'key2' => {}
2422 }
2423 );
2424
2425 That's autovivification. You can get around this in a few ways. The
2426 easiest way is to just turn it off. The lexical "autovivification"
2427 pragma is available on CPAN. Now you don't add to the hash:
2428
2429 {
2430 no autovivification;
2431 my %hash;
2432 if( exists $hash{key1}{key2}{key3} ) {
2433 ...;
2434 }
2435 }
2436
2437 The Data::Diver module on CPAN can do it for you too. Its "Dive"
2438 subroutine can tell you not only if the keys exist but also get the
2439 value:
2440
2441 use Data::Diver qw(Dive);
2442
2443 my @exists = Dive( \%hash, qw(key1 key2 key3) );
2444 if( ! @exists ) {
2445 ...; # keys do not exist
2446 }
2447 elsif( ! defined $exists[0] ) {
2448 ...; # keys exist but value is undef
2449 }
2450
2451 You can easily do this yourself too by checking each level of the hash
2452 before you move onto the next level. This is essentially what
2453 Data::Diver does for you:
2454
2455 if( check_hash( \%hash, qw(key1 key2 key3) ) ) {
2456 ...;
2457 }
2458
2459 sub check_hash {
2460 my( $hash, @keys ) = @_;
2461
2462 return unless @keys;
2463
2464 foreach my $key ( @keys ) {
2465 return unless eval { exists $hash->{$key} };
2466 $hash = $hash->{$key};
2467 }
2468
2469 return 1;
2470 }
2471
2472 How can I prevent addition of unwanted keys into a hash?
2473 Since version 5.8.0, hashes can be restricted to a fixed number of
2474 given keys. Methods for creating and dealing with restricted hashes are
2475 exported by the Hash::Util module.
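
A minimal sketch using two of those functions, lock_keys and
unlock_keys. The %config hash and the misspelled key are made up; the
eval is there only so the example can report the error instead of dying.

```perl
use strict;
use warnings;
use Hash::Util qw(lock_keys unlock_keys);

my %config = ( host => 'localhost', port => 8080 );
lock_keys(%config);             # only host and port are allowed now

$config{port} = 9090;           # changing an existing value is fine

# A typo in the key name is now a run-time error rather than a new key.
my $ok    = eval { $config{prot} = 1; 1 };
my $error = $ok ? '' : $@;
print "rejected: $error" unless $ok;

unlock_keys(%config);           # lift the restriction again
$config{timeout} = 30;
```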
2476
2478 How do I handle binary data correctly?
2479 Perl is binary-clean, so it can handle binary data just fine. On
2480 Windows or DOS, however, you have to use "binmode" for binary files to
2481 avoid conversions for line endings. In general, you should use
2482 "binmode" any time you want to work with binary data.
2483
2484 Also see "binmode" in perlfunc or perlopentut.
2485
2486 If you're concerned about 8-bit textual data then see perllocale. If
2487 you want to deal with multibyte characters, however, there are some
2488 gotchas. See the section on Regular Expressions.
2489
2490 How do I determine whether a scalar is a number/whole/integer/float?
2491 Assuming that you don't care about IEEE notations like "NaN" or
2492 "Infinity", you probably just want to use a regular expression (see
2493 also perlretut and perlre):
2494
2495 use 5.010;
2496
2497 if ( /\D/ )
2498 { say "\thas nondigits"; }
2499 if ( /^\d+\z/ )
2500 { say "\tis a whole number"; }
2501 if ( /^-?\d+\z/ )
2502 { say "\tis an integer"; }
2503 if ( /^[+-]?\d+\z/ )
2504 { say "\tis a +/- integer"; }
2505 if ( /^-?(?:\d+\.?|\.\d)\d*\z/ )
2506 { say "\tis a real number"; }
2507 if ( /^[+-]?(?=\.?\d)\d*\.?\d*(?:e[+-]?\d+)?\z/i )
2508 { say "\tis a C float" }
2509
2510 There are also some commonly used modules for the task. Scalar::Util
2511 (distributed with 5.8) provides access to perl's internal function
2512 "looks_like_number" for determining whether a variable looks like a
2513 number. Data::Types exports functions that validate data types using
2514 both the above and other regular expressions. Thirdly, there is
2515 Regexp::Common which has regular expressions to match various types of
2516 numbers. Those three modules are available from the CPAN.
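
For the common case, a sketch with looks_like_number might be all you
need. The candidate strings are arbitrary test values.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

for my $candidate ( '42', '-3.14', '1e5', 'abc', '12abc', '' ) {
    printf "%-8s %s\n", "'$candidate'",
        looks_like_number($candidate) ? "looks like a number" : "does not";
}
```

Here the first three candidates pass and the last three do not. This
asks perl's own opinion, so it answers "would perl treat this as a
number?" rather than enforcing any particular format of your own.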
2517
2518 If you're on a POSIX system, Perl supports the "POSIX::strtod" function
2519 for converting strings to doubles (and also "POSIX::strtol" for longs).
2520 Its semantics are somewhat cumbersome, so here's a "getnum" wrapper
2521 function for more convenient access. This function takes a string and
2522 returns the number it found, or "undef" for input that isn't a C float.
2523 The "is_numeric" function is a front end to "getnum" if you just want
2524 to say, "Is this a float?"
2525
2526 sub getnum {
2527 use POSIX qw(strtod);
2528 my $str = shift;
2529 $str =~ s/^\s+//;
2530 $str =~ s/\s+$//;
2531 $! = 0;
2532 my($num, $unparsed) = strtod($str);
2533 if (($str eq '') || ($unparsed != 0) || $!) {
2534 return undef;
2535 }
2536 else {
2537 return $num;
2538 }
2539 }
2540
2541 sub is_numeric { defined getnum($_[0]) }
2542
2543 Or you could check out the String::Scanf module on the CPAN instead.
2544
2545 How do I keep persistent data across program calls?
For some specific applications, you can use one of the DBM modules.
See AnyDBM_File. More generally, you should consult the FreezeThaw or
Storable modules from CPAN. Starting from Perl 5.8, Storable is part of
the standard distribution. Here's one example using Storable's "store"
and "retrieve" functions:
2551
2552 use Storable;
2553 store(\%hash, "filename");
2554
2555 # later on...
2556 $href = retrieve("filename"); # by ref
2557 %hash = %{ retrieve("filename") }; # direct to hash
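
A complete round trip, as a sketch: the data is made up, and a temporary
file from File::Temp stands in for whatever path your program would
really use.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempfile);

my %hash = ( name => 'example', list => [ 1, 2, 3 ] );

# A temporary file stands in for a real path; it is removed at exit.
my ( $fh, $filename ) = tempfile( UNLINK => 1 );
close $fh;

store( \%hash, $filename );

# later on, possibly in another run of the program...
my $href = retrieve($filename);
print "name: $href->{name}, list: @{ $href->{list} }\n";
```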
2558
2559 How do I print out or copy a recursive data structure?
2560 The Data::Dumper module on CPAN (or the 5.005 release of Perl) is great
2561 for printing out data structures. The Storable module on CPAN (or the
2562 5.8 release of Perl), provides a function called "dclone" that
2563 recursively copies its argument.
2564
2565 use Storable qw(dclone);
2566 $r2 = dclone($r1);
2567
Here $r1 can be a reference to any kind of data structure you'd like.
It will be deeply copied. Because "dclone" takes and returns
references, you'd have to add extra punctuation if you had a hash of
arrays that you wanted to copy:
2572
2573 %newhash = %{ dclone(\%oldhash) };
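
To see why the deep copy matters, compare it with a plain assignment.
In this sketch (sample data invented) the plain copy still shares the
inner array with the original, while the dclone copy does not.

```perl
use strict;
use warnings;
use Storable qw(dclone);

my %oldhash = ( fruits => [ 'apple', 'pear' ] );

my %shallow = %oldhash;                   # inner array is shared
my %newhash = %{ dclone( \%oldhash ) };   # inner array is copied

push @{ $oldhash{fruits} }, 'plum';

print "shallow copy sees: @{ $shallow{fruits} }\n";   # apple pear plum
print "deep copy sees:    @{ $newhash{fruits} }\n";   # apple pear
```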
2574
2575 How do I define methods for every class/object?
2576 (contributed by Ben Morrow)
2577
You can use the "UNIVERSAL" class (see UNIVERSAL). However, please be
very careful to consider the consequences of doing this: adding methods
to every object is very likely to have unintended consequences. If
possible, it would be better to have all your objects inherit from some
common base class, or to use an object system like Moose that supports
roles.
2584
2585 How do I verify a credit card checksum?
2586 Get the Business::CreditCard module from CPAN.
2587
2588 How do I pack arrays of doubles or floats for XS code?
2589 The arrays.h/arrays.c code in the PGPLOT module on CPAN does just this.
2590 If you're doing a lot of float or double processing, consider using the
2591 PDL module from CPAN instead--it makes number-crunching easy.
2592
2593 See <https://metacpan.org/release/PGPLOT> for the code.
2594
2596 Copyright (c) 1997-2010 Tom Christiansen, Nathan Torkington, and other
2597 authors as noted. All rights reserved.
2598
2599 This documentation is free; you can redistribute it and/or modify it
2600 under the same terms as Perl itself.
2601
2602 Irrespective of its distribution, all code examples in this file are
2603 hereby placed into the public domain. You are permitted and encouraged
2604 to use this code in your own programs for fun or for profit as you see
2605 fit. A simple comment in the code giving credit would be courteous but
2606 is not required.
2607
2608
2609
2610perl v5.36.1 2023-08-24 perlfaq4(3)