PERLFAQ4(1)          Perl Programmers Reference Guide          PERLFAQ4(1)

perlfaq4 - Data Manipulation
7
9 This section of the FAQ answers questions related to manipulating
10 numbers, dates, strings, arrays, hashes, and miscellaneous data issues.
11
13 Why am I getting long decimals (eg, 19.9499999999999) instead of the
14 numbers I should be getting (eg, 19.95)?
15 For the long explanation, see David Goldberg's "What Every Computer
16 Scientist Should Know About Floating-Point Arithmetic"
17 (http://docs.sun.com/source/806-3568/ncg_goldberg.html).
18
19 Internally, your computer represents floating-point numbers in binary.
20 Digital (as in powers of two) computers cannot store all numbers
21 exactly. Some real numbers lose precision in the process. This is a
22 problem with how computers store numbers and affects all computer
23 languages, not just Perl.
24
25 perlnumber shows the gory details of number representations and
26 conversions.
27
To limit the number of decimal places in your numbers, you can use the
"printf" or "sprintf" function. See "Floating Point Arithmetic" in
perlop for more details.
31
32 printf "%.2f", 10/3;
33
34 my $number = sprintf "%.2f", 10/3;
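
As a rough illustration of the same point, the sketch below compares
floating-point values after rounding them to a fixed number of places,
or within a small tolerance, rather than with "==". The variable names
and the 1e-9 tolerance are arbitrary choices for this example, not
anything Perl prescribes.

```perl
use strict;
use warnings;

my $sum = 0.1 + 0.2;    # not exactly 0.3 in binary floating point

# Compare the rounded text forms rather than the raw values.
my $same_to_2_places = sprintf("%.2f", $sum) eq sprintf("%.2f", 0.3);

# Or accept any difference smaller than a tolerance you choose.
my $close_enough = abs($sum - 0.3) < 1e-9;

print "equal to two places\n" if $same_to_2_places;
print "within tolerance\n"    if $close_enough;
```

Which approach fits depends on whether you care about the displayed
digits or about the numeric distance between the values.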
35
36 Why is int() broken?
37 Your "int()" is most probably working just fine. It's the numbers that
38 aren't quite what you think.
39
40 First, see the answer to "Why am I getting long decimals (eg,
41 19.9499999999999) instead of the numbers I should be getting (eg,
42 19.95)?".
43
44 For example, this
45
46 print int(0.6/0.2-2), "\n";
47
will on most computers print 0, not 1, because even such simple numbers
as 0.6 and 0.2 cannot be represented exactly by floating-point numbers.
What you think of above as 'three' is really more like
2.9999999999999995559.
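
If what you want is the nearest integer rather than truncation, one
possible approach is to round first and only then treat the value as an
integer. This sketch uses "sprintf" for the rounding; the variable
names are just for illustration.

```perl
use strict;
use warnings;

my $value = 0.6 / 0.2 - 2;    # slightly less than 1 on most machines

my $truncated = int($value);              # truncation toward zero
my $rounded   = sprintf("%.0f", $value);  # round to the nearest integer

print "int: $truncated, rounded: $rounded\n";
```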
52
53 Why isn't my octal data interpreted correctly?
54 (contributed by brian d foy)
55
56 You're probably trying to convert a string to a number, which Perl only
57 converts as a decimal number. When Perl converts a string to a number,
58 it ignores leading spaces and zeroes, then assumes the rest of the
59 digits are in base 10:
60
61 my $string = '0644';
62
63 print $string + 0; # prints 644
64
65 print $string + 44; # prints 688, certainly not octal!
66
This problem usually involves one of the Perl built-ins that has the
same name as a Unix command that uses octal numbers as arguments on the
command line. In this example, "chmod" on the command line knows that
its first argument is octal because that's what it does:
71
72 %prompt> chmod 644 file
73
74 If you want to use the same literal digits (644) in Perl, you have to
75 tell Perl to treat them as octal numbers either by prefixing the digits
76 with a 0 or using "oct":
77
78 chmod( 0644, $file); # right, has leading zero
79 chmod( oct(644), $file ); # also correct
80
81 The problem comes in when you take your numbers from something that
82 Perl thinks is a string, such as a command line argument in @ARGV:
83
84 chmod( $ARGV[0], $file); # wrong, even if "0644"
85
86 chmod( oct($ARGV[0]), $file ); # correct, treat string as octal
87
88 You can always check the value you're using by printing it in octal
89 notation to ensure it matches what you think it should be. Print it in
90 octal and decimal format:
91
92 printf "0%o %d", $number, $number;
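
Putting those pieces together, here is a small sketch of the difference
between the default decimal conversion and "oct". The variable $arg is
a stand-in for a value such as $ARGV[0]; the other names are made up
for this example.

```perl
use strict;
use warnings;

my $arg = '0644';    # stand-in for a mode taken from $ARGV[0]

my $as_decimal = $arg + 0;     # 644: string-to-number conversion is base 10
my $mode       = oct($arg);    # 420: the same string read as octal

my $report = sprintf "0%o %d", $mode, $mode;    # "0644 420"
print "$report\n";
```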
93
94 Does Perl have a round() function? What about ceil() and floor()? Trig
95 functions?
96 Remember that "int()" merely truncates toward 0. For rounding to a
97 certain number of digits, "sprintf()" or "printf()" is usually the
98 easiest route.
99
100 printf("%.3f", 3.1415926535); # prints 3.142
101
102 The "POSIX" module (part of the standard Perl distribution) implements
103 "ceil()", "floor()", and a number of other mathematical and
104 trigonometric functions.
105
106 use POSIX;
107 $ceil = ceil(3.5); # 4
108 $floor = floor(3.5); # 3
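
The difference between truncating and flooring only shows up for
negative numbers. A minimal sketch, with variable names chosen for this
example:

```perl
use strict;
use warnings;
use POSIX qw(floor ceil);

my $n = -3.5;

my $truncated = int($n);      # -3: int() drops the fraction, moving toward 0
my $floored   = floor($n);    # -4: largest integer not greater than $n
my $ceiled    = ceil($n);     # -3: smallest integer not less than $n

print "$truncated $floored $ceiled\n";    # -3 -4 -3
```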
109
110 In 5.000 to 5.003 perls, trigonometry was done in the "Math::Complex"
111 module. With 5.004, the "Math::Trig" module (part of the standard Perl
112 distribution) implements the trigonometric functions. Internally it
113 uses the "Math::Complex" module and some functions can break out from
114 the real axis into the complex plane, for example the inverse sine of
115 2.
116
117 Rounding in financial applications can have serious implications, and
118 the rounding method used should be specified precisely. In these
119 cases, it probably pays not to trust whichever system rounding is being
120 used by Perl, but to instead implement the rounding function you need
121 yourself.
122
123 To see why, notice how you'll still have an issue on half-way-point
124 alternation:
125
126 for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}
127
128 0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
129 0.8 0.8 0.9 0.9 1.0 1.0
130
131 Don't blame Perl. It's the same as in C. IEEE says we have to do
132 this. Perl numbers whose absolute values are integers under 2**31 (on
133 32 bit machines) will work pretty much like mathematical integers.
134 Other numbers are not guaranteed.
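
One way to sidestep the half-way-point surprises, sketched here under
the assumption that your amounts can be held as whole numbers of the
smallest unit (cents, say), is to do the rounding with integer
arithmetic. The function name "div_round_half_up" and the tax example
are hypothetical, and round-half-up is only one of several rounding
rules an application might require.

```perl
use strict;
use warnings;

# Divide two non-negative integers, rounding halves upward.
# Hypothetical helper for illustration; assumes $den > 0.
sub div_round_half_up {
    my ($num, $den) = @_;
    my $quotient  = int($num / $den);
    my $remainder = $num % $den;
    return $quotient + (2 * $remainder >= $den ? 1 : 0);
}

# 7.5% of 1999 cents is 149.925 cents, which rounds to 150.
my $tax_cents = div_round_half_up(1999 * 75, 1000);
print "$tax_cents\n";    # 150
```

Because the half-way test compares integers, it does not depend on how
a decimal fraction happens to be stored in binary.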
135
136 How do I convert between numeric representations/bases/radixes?
137 As always with Perl there is more than one way to do it. Below are a
138 few examples of approaches to making common conversions between number
139 representations. This is intended to be representational rather than
140 exhaustive.
141
142 Some of the examples later in perlfaq4 use the "Bit::Vector" module
143 from CPAN. The reason you might choose "Bit::Vector" over the perl
144 built in functions is that it works with numbers of ANY size, that it
145 is optimized for speed on some operations, and for at least some
146 programmers the notation might be familiar.
147
148 How do I convert hexadecimal into decimal
149 Using perl's built in conversion of "0x" notation:
150
151 $dec = 0xDEADBEEF;
152
153 Using the "hex" function:
154
155 $dec = hex("DEADBEEF");
156
157 Using "pack":
158
159 $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));
160
161 Using the CPAN module "Bit::Vector":
162
163 use Bit::Vector;
164 $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
165 $dec = $vec->to_Dec();
166
167 How do I convert from decimal to hexadecimal
168 Using "sprintf":
169
170 $hex = sprintf("%X", 3735928559); # upper case A-F
171 $hex = sprintf("%x", 3735928559); # lower case a-f
172
173 Using "unpack":
174
175 $hex = unpack("H*", pack("N", 3735928559));
176
177 Using "Bit::Vector":
178
179 use Bit::Vector;
180 $vec = Bit::Vector->new_Dec(32, -559038737);
181 $hex = $vec->to_Hex();
182
183 And "Bit::Vector" supports odd bit counts:
184
185 use Bit::Vector;
186 $vec = Bit::Vector->new_Dec(33, 3735928559);
187 $vec->Resize(32); # suppress leading 0 if unwanted
188 $hex = $vec->to_Hex();
189
190 How do I convert from octal to decimal
191 Using Perl's built in conversion of numbers with leading zeros:
192
193 $dec = 033653337357; # note the leading 0!
194
195 Using the "oct" function:
196
197 $dec = oct("33653337357");
198
199 Using "Bit::Vector":
200
201 use Bit::Vector;
202 $vec = Bit::Vector->new(32);
203 $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
204 $dec = $vec->to_Dec();
205
206 How do I convert from decimal to octal
207 Using "sprintf":
208
209 $oct = sprintf("%o", 3735928559);
210
211 Using "Bit::Vector":
212
213 use Bit::Vector;
214 $vec = Bit::Vector->new_Dec(32, -559038737);
215 $oct = reverse join('', $vec->Chunk_List_Read(3));
216
217 How do I convert from binary to decimal
218 Perl 5.6 lets you write binary numbers directly with the "0b"
219 notation:
220
221 $number = 0b10110110;
222
223 Using "oct":
224
225 my $input = "10110110";
226 $decimal = oct( "0b$input" );
227
228 Using "pack" and "ord":
229
230 $decimal = ord(pack('B8', '10110110'));
231
232 Using "pack" and "unpack" for larger strings:
233
234 $int = unpack("N", pack("B32",
235 substr("0" x 32 . "11110101011011011111011101111", -32)));
236 $dec = sprintf("%d", $int);
237
238 # substr() is used to left pad a 32 character string with zeros.
239
240 Using "Bit::Vector":
241
use Bit::Vector;
$vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
$dec = $vec->to_Dec();
244
245 How do I convert from decimal to binary
246 Using "sprintf" (perl 5.6+):
247
248 $bin = sprintf("%b", 3735928559);
249
250 Using "unpack":
251
252 $bin = unpack("B*", pack("N", 3735928559));
253
254 Using "Bit::Vector":
255
256 use Bit::Vector;
257 $vec = Bit::Vector->new_Dec(32, -559038737);
258 $bin = $vec->to_Bin();
259
260 The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
261 are left as an exercise to the inclined reader.
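
As a hint for that exercise, one possible approach is to chain the
built-ins shown above: parse the input with "hex" or "oct", then format
the result with "sprintf". This sketch assumes the values fit in a
native integer.

```perl
use strict;
use warnings;

my $bin = sprintf "%b", hex("DEADBEEF");      # hex -> bin
my $hex = sprintf "%X", oct("0b10110110");    # bin -> hex
my $oct = sprintf "%o", hex("1FF");           # hex -> oct

print "$bin\n$hex\n$oct\n";
```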
262
263 Why doesn't & work the way I want it to?
The behavior of the bitwise operators ("&", "|", "^") depends on
whether they're used on numbers or strings. Given strings, the
operators treat each string as a series of bits and work with that (the
string "3" is the bit pattern 00110011). Given numbers, the operators
work with the binary form of the number (the number 3 is treated as the
bit pattern 00000011).
269
270 So, saying "11 & 3" performs the "and" operation on numbers (yielding
271 3). Saying "11" & "3" performs the "and" operation on strings
272 (yielding "1").
273
274 Most problems with "&" and "|" arise because the programmer thinks they
275 have a number but really it's a string. The rest arise because the
276 programmer says:
277
278 if ("\020\020" & "\101\101") {
279 # ...
280 }
281
but a string consisting of two null bytes (the result of "\020\020" &
"\101\101") is not a false value in Perl. To test whether the result
contains nothing but null bytes, you need:
284
285 if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
286 # ...
287 }
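
To make the string-versus-number distinction concrete, here is a small
sketch. The variables $x and $y are hypothetical stand-ins for values
that arrived as strings; adding 0 is one common way to get numbers from
them.

```perl
use strict;
use warnings;

my ($x, $y) = ("11", "3");    # both are strings, as if read from input

# Both operands are strings, so "&" works on their characters' bits.
my $string_and = $x & $y;                 # "1"

# Adding 0 yields numbers, so "&" works on the integer values.
my $numeric_and = ($x + 0) & ($y + 0);    # 3

print "$string_and $numeric_and\n";
```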
288
289 How do I multiply matrices?
290 Use the "Math::Matrix" or "Math::MatrixReal" modules (available from
291 CPAN) or the "PDL" extension (also available from CPAN).
292
293 How do I perform an operation on a series of integers?
294 To call a function on each element in an array, and collect the
295 results, use:
296
297 @results = map { my_func($_) } @array;
298
299 For example:
300
301 @triple = map { 3 * $_ } @single;
302
303 To call a function on each element of an array, but ignore the results:
304
305 foreach $iterator (@array) {
306 some_func($iterator);
307 }
308
309 To call a function on each integer in a (small) range, you can use:
310
311 @results = map { some_func($_) } (5 .. 25);
312
but you should be aware that the ".." operator creates a list of all
integers in the range. This can take a lot of memory for large ranges.
Instead use:
316
317 @results = ();
318 for ($i=5; $i < 500_005; $i++) {
319 push(@results, some_func($i));
320 }
321
This situation has been fixed in Perl 5.005. Use of ".." in a "for"
loop will iterate over the range, without creating the entire range.
324
325 for my $i (5 .. 500_005) {
326 push(@results, some_func($i));
327 }
328
329 will not create a list of 500,000 integers.
330
331 How can I output Roman numerals?
Get the "Roman" module from CPAN:
<http://www.cpan.org/modules/by-module/Roman>.
334
335 Why aren't my random numbers random?
336 If you're using a version of Perl before 5.004, you must call "srand"
337 once at the start of your program to seed the random number generator.
338
339 BEGIN { srand() if $] < 5.004 }
340
341 5.004 and later automatically call "srand" at the beginning. Don't
342 call "srand" more than once--you make your numbers less random, rather
343 than more.
344
Computers are good at being predictable and bad at being random
(despite appearances caused by bugs in your programs :-). The random
article in the "Far More Than You Ever Wanted To Know" collection in
<http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz>, courtesy of Tom
Phoenix, talks more about this. John von Neumann said, "Anyone who
attempts to generate random numbers by deterministic means is, of
course, living in a state of sin."
352
353 If you want numbers that are more random than "rand" with "srand"
354 provides, you should also check out the "Math::TrulyRandom" module from
355 CPAN. It uses the imperfections in your system's timer to generate
356 random numbers, but this takes quite a while. If you want a better
357 pseudorandom generator than comes with your operating system, look at
358 "Numerical Recipes in C" at <http://www.nr.com/>.
359
360 How do I get a random number between X and Y?
361 To get a random number between two values, you can use the "rand()"
362 built-in to get a random number between 0 and 1. From there, you shift
363 that into the range that you want.
364
365 "rand($x)" returns a number such that "0 <= rand($x) < $x". Thus what
366 you want to have perl figure out is a random number in the range from 0
367 to the difference between your X and Y.
368
369 That is, to get a number between 10 and 15, inclusive, you want a
370 random number between 0 and 5 that you can then add to 10.
371
372 my $number = 10 + int rand( 15-10+1 ); # ( 10,11,12,13,14, or 15 )
373
Hence you derive the following simple function to abstract that. It
selects a random integer between the two given integers (inclusive).
For example: "random_int_between(50,120)".
377
378 sub random_int_between {
379 my($min, $max) = @_;
380 # Assumes that the two arguments are integers themselves!
381 return $min if $min == $max;
382 ($min, $max) = ($max, $min) if $min > $max;
383 return $min + int rand(1 + $max - $min);
384 }
385
387 How do I find the day or week of the year?
388 The "localtime" function returns the day of the year. Without an
389 argument "localtime" uses the current time.
390
391 $day_of_year = (localtime)[7];
392
393 The "POSIX" module can also format a date as the day of the year or
394 week of the year.
395
396 use POSIX qw/strftime/;
397 my $day_of_year = strftime "%j", localtime;
398 my $week_of_year = strftime "%W", localtime;
399
400 To get the day of year for any date, use "POSIX"'s "mktime" to get a
401 time in epoch seconds for the argument to "localtime".
402
403 use POSIX qw/mktime strftime/;
404 my $week_of_year = strftime "%W",
405 localtime( mktime( 0, 0, 0, 18, 11, 87 ) );
406
407 The "Date::Calc" module provides two functions to calculate these.
408
use Date::Calc qw( Day_of_Year Week_of_Year );
410 my $day_of_year = Day_of_Year( 1987, 12, 18 );
411 my $week_of_year = Week_of_Year( 1987, 12, 18 );
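
One detail that is easy to trip over: the day of the year from
"localtime" (or "gmtime") counts from 0, while the "%j" format counts
from 1. The sketch below shows both for the same date, using "timegm"
from the standard "Time::Local" module so that the result does not
depend on the local time zone. That substitution is a choice made for
this example.

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use Time::Local qw(timegm);

# 18 December 1987, treated as UTC. Months are zero-based, so
# December is 11.
my $epoch = timegm(0, 0, 0, 18, 11, 1987);

my $yday = (gmtime $epoch)[7];               # 351, counting from 0
my $jday = strftime("%j", gmtime $epoch);    # "352", counting from 1

print "$yday $jday\n";
```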
412
413 How do I find the current century or millennium?
414 Use the following simple functions:
415
416 sub get_century {
417 return int((((localtime(shift || time))[5] + 1999))/100);
418 }
419
420 sub get_millennium {
421 return 1+int((((localtime(shift || time))[5] + 1899))/1000);
422 }
423
424 On some systems, the "POSIX" module's "strftime()" function has been
425 extended in a non-standard way to use a %C format, which they sometimes
426 claim is the "century". It isn't, because on most such systems, this is
427 only the first two digits of the four-digit year, and thus cannot be
428 used to reliably determine the current century or millennium.
429
430 How can I compare two dates and find the difference?
431 (contributed by brian d foy)
432
433 You could just store all your dates as a number and then subtract.
434 Life isn't always that simple though. If you want to work with
435 formatted dates, the "Date::Manip", "Date::Calc", or "DateTime" modules
436 can help you.
437
438 How can I take a string and turn it into epoch seconds?
439 If it's a regular enough string that it always has the same format, you
440 can split it up and pass the parts to "timelocal" in the standard
441 "Time::Local" module. Otherwise, you should look into the "Date::Calc"
442 and "Date::Manip" modules from CPAN.
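
For the regular-format case, the splitting might look like the sketch
below. The timestamp layout is an assumption made for this example, and
it uses "timegm" (also from "Time::Local") instead of "timelocal" so
that the result is the same in every time zone.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my $stamp = '1987-12-18 00:00:00';    # example input in a fixed format

my ($year, $mon, $mday, $hour, $min, $sec) =
    $stamp =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/
    or die "Unrecognized timestamp: $stamp";

# Months are zero-based. timegm() reads the fields as UTC; use
# timelocal() instead if the string is in local time.
my $epoch = timegm($sec, $min, $hour, $mday, $mon - 1, $year);

print "$epoch\n";    # 566784000
```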
443
444 How can I find the Julian Day?
445 (contributed by brian d foy and Dave Cross)
446
447 You can use the "Time::JulianDay" module available on CPAN. Ensure
448 that you really want to find a Julian day, though, as many people have
449 different ideas about Julian days. See
450 http://www.hermetic.ch/cal_stud/jdn.htm for instance.
451
452 You can also try the "DateTime" module, which can convert a date/time
453 to a Julian Day.
454
455 $ perl -MDateTime -le'print DateTime->today->jd'
456 2453401.5
457
458 Or the modified Julian Day
459
460 $ perl -MDateTime -le'print DateTime->today->mjd'
461 53401
462
463 Or even the day of the year (which is what some people think of as a
464 Julian day)
465
466 $ perl -MDateTime -le'print DateTime->today->doy'
467 31
468
469 How do I find yesterday's date?
470 (contributed by brian d foy)
471
Use one of the Date modules. The "DateTime" module makes it simple, and
gives you the same time of day, only the day before.
474
475 use DateTime;
476
477 my $yesterday = DateTime->now->subtract( days => 1 );
478
479 print "Yesterday was $yesterday\n";
480
481 You can also use the "Date::Calc" module using its "Today_and_Now"
482 function.
483
484 use Date::Calc qw( Today_and_Now Add_Delta_DHMS );
485
486 my @date_time = Add_Delta_DHMS( Today_and_Now(), -1, 0, 0, 0 );
487
488 print "@date_time\n";
489
490 Most people try to use the time rather than the calendar to figure out
491 dates, but that assumes that days are twenty-four hours each. For most
492 people, there are two days a year when they aren't: the switch to and
493 from summer time throws this off. Let the modules do the work.
494
495 If you absolutely must do it yourself (or can't use one of the
496 modules), here's a solution using "Time::Local", which comes with Perl:
497
498 # contributed by Gunnar Hjalmarsson
499 use Time::Local;
500 my $today = timelocal 0, 0, 12, ( localtime )[3..5];
501 my ($d, $m, $y) = ( localtime $today-86400 )[3..5];
502 printf "Yesterday: %d-%02d-%02d\n", $y+1900, $m+1, $d;
503
504 In this case, you measure the day starting at noon, and subtract 24
505 hours. Even if the length of the calendar day is 23 or 25 hours, you'll
506 still end up on the previous calendar day, although not at noon. Since
507 you don't care about the time, the one hour difference doesn't matter
508 and you end up with the previous date.
509
510 Does Perl have a Year 2000 or 2038 problem? Is Perl Y2K compliant?
511 (contributed by brian d foy)
512
513 Perl itself never had a Y2K problem, although that never stopped people
514 from creating Y2K problems on their own. See the documentation for
515 "localtime" for its proper use.
516
517 Starting with Perl 5.11, "localtime" and "gmtime" can handle dates past
518 03:14:08 January 19, 2038, when a 32-bit based time would overflow. You
519 still might get a warning on a 32-bit "perl":
520
521 % perl5.11.2 -E 'say scalar localtime( 0x9FFF_FFFFFFFF )'
522 Integer overflow in hexadecimal number at -e line 1.
523 Wed Nov 1 19:42:39 5576711
524
525 On a 64-bit "perl", you can get even larger dates for those really long
526 running projects:
527
528 % perl5.11.2 -E 'say scalar gmtime( 0x9FFF_FFFFFFFF )'
529 Thu Nov 2 00:42:39 5576711
530
You're still out of luck if you need to keep track of decaying protons
though.
533
535 How do I validate input?
536 (contributed by brian d foy)
537
538 There are many ways to ensure that values are what you expect or want
539 to accept. Besides the specific examples that we cover in the perlfaq,
540 you can also look at the modules with "Assert" and "Validate" in their
541 names, along with other modules such as "Regexp::Common".
542
543 Some modules have validation for particular types of input, such as
544 "Business::ISBN", "Business::CreditCard", "Email::Valid", and
545 "Data::Validate::IP".
546
547 How do I unescape a string?
It depends on just what you mean by "escape". URL escapes are dealt
with in perlfaq9. Shell escapes with the backslash ("\") character are
removed with
551
552 s/\\(.)/$1/g;
553
554 This won't expand "\n" or "\t" or any other special escapes.
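
A short sketch of what that substitution does and does not do; the
sample string is made up for this example:

```perl
use strict;
use warnings;

my $escaped = 'a\ b\\\\c\n';    # the characters: a \ space b \ \ c \ n

# Drop each backslash and keep the character that followed it.
(my $plain = $escaped) =~ s/\\(.)/$1/g;

print "$plain\n";    # a b\cn
```

Note that the escaped backslash becomes a single backslash and the
"\n" sequence becomes a plain "n", not a newline.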
555
556 How do I remove consecutive pairs of characters?
557 (contributed by brian d foy)
558
559 You can use the substitution operator to find pairs of characters (or
560 runs of characters) and replace them with a single instance. In this
561 substitution, we find a character in "(.)". The memory parentheses
562 store the matched character in the back-reference "\1" and we use that
563 to require that the same thing immediately follow it. We replace that
564 part of the string with the character in $1.
565
566 s/(.)\1/$1/g;
567
568 We can also use the transliteration operator, "tr///". In this example,
569 the search list side of our "tr///" contains nothing, but the "c"
570 option complements that so it contains everything. The replacement list
571 also contains nothing, so the transliteration is almost a no-op since
572 it won't do any replacements (or more exactly, replace the character
573 with itself). However, the "s" option squashes duplicated and
574 consecutive characters in the string so a character does not show up
575 next to itself
576
577 my $str = 'Haarlem'; # in the Netherlands
578 $str =~ tr///cs; # Now Harlem, like in New York
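
The two approaches differ once a character repeats more than twice.
This sketch runs both on the same made-up input:

```perl
use strict;
use warnings;

my $text = 'aaab';

(my $pairs = $text) =~ s/(.)\1/$1/g;    # "aab": one pair collapsed
(my $runs  = $text) =~ tr///cs;         # "ab":  whole run squeezed

print "$pairs $runs\n";
```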
579
580 How do I expand function calls in a string?
581 (contributed by brian d foy)
582
583 This is documented in perlref, and although it's not the easiest thing
584 to read, it does work. In each of these examples, we call the function
585 inside the braces used to dereference a reference. If we have more than
586 one return value, we can construct and dereference an anonymous array.
587 In this case, we call the function in list context.
588
589 print "The time values are @{ [localtime] }.\n";
590
591 If we want to call the function in scalar context, we have to do a bit
592 more work. We can really have any code we like inside the braces, so we
593 simply have to end with the scalar reference, although how you do that
594 is up to you, and you can use code inside the braces. Note that the use
595 of parens creates a list context, so we need "scalar" to force the
596 scalar context on the function:
597
print "The time is ${\(scalar localtime)}.\n";
599
600 print "The time is ${ my $x = localtime; \$x }.\n";
601
602 If your function already returns a reference, you don't need to create
603 the reference yourself.
604
605 sub timestamp { my $t = localtime; \$t }
606
607 print "The time is ${ timestamp() }.\n";
608
609 The "Interpolation" module can also do a lot of magic for you. You can
610 specify a variable name, in this case "E", to set up a tied hash that
611 does the interpolation for you. It has several other methods to do this
612 as well.
613
614 use Interpolation E => 'eval';
615 print "The time values are $E{localtime()}.\n";
616
617 In most cases, it is probably easier to simply use string
618 concatenation, which also forces scalar context.
619
620 print "The time is " . localtime() . ".\n";
621
622 How do I find matching/nesting anything?
623 This isn't something that can be done in one regular expression, no
624 matter how complicated. To find something between two single
625 characters, a pattern like "/x([^x]*)x/" will get the intervening bits
626 in $1. For multiple ones, then something more like "/alpha(.*?)omega/"
627 would be needed. But none of these deals with nested patterns. For
628 balanced expressions using "(", "{", "[" or "<" as delimiters, use the
629 CPAN module Regexp::Common, or see "(??{ code })" in perlre. For other
630 cases, you'll have to write a parser.
631
632 If you are serious about writing a parser, there are a number of
633 modules or oddities that will make your life a lot easier. There are
634 the CPAN modules "Parse::RecDescent", "Parse::Yapp", and
635 "Text::Balanced"; and the "byacc" program. Starting from perl 5.8 the
636 "Text::Balanced" is part of the standard distribution.
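
As a minimal sketch of the module route, "extract_bracketed" from
"Text::Balanced" pulls a balanced, possibly nested, bracketed chunk off
the front of a string. The sample text is made up for this example.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $text = '(a (b) c) rest';

# In list context, extract_bracketed returns the balanced substring
# found at the start of the text, followed by whatever remains.
my ($extracted, $remainder) = extract_bracketed($text, '()');

print "$extracted|$remainder\n";
```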
637
638 One simple destructive, inside-out approach that you might try is to
639 pull out the smallest nesting parts one at a time:
640
641 while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
642 # do something with $1
643 }
644
645 A more complicated and sneaky approach is to make Perl's regular
646 expression engine do it for you. This is courtesy Dean Inada, and
647 rather has the nature of an Obfuscated Perl Contest entry, but it
648 really does work:
649
650 # $_ contains the string to parse
651 # BEGIN and END are the opening and closing markers for the
652 # nested text.
653
654 @( = ('(','');
655 @) = (')','');
656 ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
657 @$ = (eval{/$re/},$@!~/unmatched/i);
658 print join("\n",@$[0..$#$]) if( $$[-1] );
659
660 How do I reverse a string?
661 Use "reverse()" in scalar context, as documented in "reverse" in
662 perlfunc.
663
664 $reversed = reverse $string;
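
The context matters because "reverse" in list context reverses the
order of its arguments, not the characters in them. A small sketch of
the difference, with a made-up sample string:

```perl
use strict;
use warnings;

my $string = "stressed";

my $reversed  = reverse $string;                   # scalar context: "desserts"
my $unchanged = join '', reverse $string;          # list context:   "stressed"
my $forced    = join '', scalar reverse $string;   # forced scalar:  "desserts"

print "$reversed $unchanged $forced\n";
```

This is why "print reverse $string" appears to do nothing: "print"
supplies list context.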
665
666 How do I expand tabs in a string?
667 You can do it yourself:
668
669 1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
670
671 Or you can just use the "Text::Tabs" module (part of the standard Perl
672 distribution).
673
674 use Text::Tabs;
675 @expanded_lines = expand(@lines_with_tabs);
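
As a quick check that the two approaches agree, this sketch expands the
same made-up line both ways, relying on the default tab stop of 8 in
"Text::Tabs":

```perl
use strict;
use warnings;
use Text::Tabs;    # exports expand(); tab stops default to every 8 columns

my $line = "a\tbc\td";

# The do-it-yourself substitution.
my $by_hand = $line;
1 while $by_hand =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

# The module version.
my $by_module = expand($line);

print length($by_hand), " ", length($by_module), "\n";    # 17 17
```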
676
677 How do I reformat a paragraph?
678 Use "Text::Wrap" (part of the standard Perl distribution):
679
680 use Text::Wrap;
681 print wrap("\t", ' ', @paragraphs);
682
683 The paragraphs you give to "Text::Wrap" should not contain embedded
684 newlines. "Text::Wrap" doesn't justify the lines (flush-right).
685
686 Or use the CPAN module "Text::Autoformat". Formatting files can be
687 easily done by making a shell alias, like so:
688
689 alias fmt="perl -i -MText::Autoformat -n0777 \
690 -e 'print autoformat $_, {all=>1}' $*"
691
692 See the documentation for "Text::Autoformat" to appreciate its many
693 capabilities.
694
695 How can I access or change N characters of a string?
696 You can access the first characters of a string with substr(). To get
697 the first character, for example, start at position 0 and grab the
698 string of length 1.
699
700 $string = "Just another Perl Hacker";
701 $first_char = substr( $string, 0, 1 ); # 'J'
702
703 To change part of a string, you can use the optional fourth argument
704 which is the replacement string.
705
706 substr( $string, 13, 4, "Perl 5.8.0" );
707
708 You can also use substr() as an lvalue.
709
710 substr( $string, 13, 4 ) = "Perl 5.8.0";
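
Two further details of "substr" that are handy here, shown as a small
sketch: the four-argument form returns the text it replaced, and a
negative offset counts from the end of the string.

```perl
use strict;
use warnings;

my $string = "Just another Perl Hacker";

# Four-argument substr() returns the text it replaced.
my $old = substr($string, 13, 4, "Perl 5.8.0");    # "Perl"

# A negative offset counts from the end of the string.
my $last_word = substr($string, -6);               # "Hacker"

print "$old -> $string ($last_word)\n";
```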
711
712 How do I change the Nth occurrence of something?
713 You have to keep track of N yourself. For example, let's say you want
714 to change the fifth occurrence of "whoever" or "whomever" into
715 "whosoever" or "whomsoever", case insensitively. These all assume that
716 $_ contains the string to be altered.
717
718 $count = 0;
719 s{((whom?)ever)}{
720 ++$count == 5 # is it the 5th?
721 ? "${2}soever" # yes, swap
722 : $1 # renege and leave it there
723 }ige;
724
725 In the more general case, you can use the "/g" modifier in a "while"
726 loop, keeping count of matches.
727
728 $WANT = 3;
729 $count = 0;
730 $_ = "One fish two fish red fish blue fish";
731 while (/(\w+)\s+fish\b/gi) {
732 if (++$count == $WANT) {
733 print "The third fish is a $1 one.\n";
734 }
735 }
736
737 That prints out: "The third fish is a red one." You can also use a
738 repetition count and repeated pattern like this:
739
740 /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
741
742 How can I count the number of occurrences of a substring within a string?
743 There are a number of ways, with varying efficiency. If you want a
744 count of a certain single character (X) within a string, you can use
745 the "tr///" function like so:
746
747 $string = "ThisXlineXhasXsomeXx'sXinXit";
748 $count = ($string =~ tr/X//);
749 print "There are $count X characters in the string";
750
751 This is fine if you are just looking for a single character. However,
752 if you are trying to count multiple character substrings within a
753 larger string, "tr///" won't work. What you can do is wrap a while()
754 loop around a global pattern match. For example, let's count negative
755 integers:
756
$string = "-9 55 48 -2 23 -76 4 14 -44";
$count = 0;
while ($string =~ /-\d+/g) { $count++ }
print "There are $count negative numbers in the string";
760
761 Another version uses a global match in list context, then assigns the
762 result to a scalar, producing a count of the number of matches.
763
764 $count = () = $string =~ /-\d+/g;
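
For a literal substring (no pattern needed), another of the "number of
ways" is a loop around "index". The helper name "count_substr" is
hypothetical, and this sketch counts non-overlapping occurrences only.

```perl
use strict;
use warnings;

# Count non-overlapping occurrences of a literal substring.
# Hypothetical helper for illustration.
sub count_substr {
    my ($haystack, $needle) = @_;
    return 0 unless length $needle;
    my ($count, $pos) = (0, 0);
    while (($pos = index($haystack, $needle, $pos)) != -1) {
        $count++;
        $pos += length $needle;    # skip past this occurrence
    }
    return $count;
}

my $fish = count_substr("One fish two fish red fish blue fish", "fish");
print "$fish\n";    # 4
```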
765
766 How do I capitalize all the words on one line?
767 (contributed by brian d foy)
768
769 Damian Conway's Text::Autoformat handles all of the thinking for you.
770
771 use Text::Autoformat;
772 my $x = "Dr. Strangelove or: How I Learned to Stop ".
773 "Worrying and Love the Bomb";
774
775 print $x, "\n";
776 for my $style (qw( sentence title highlight )) {
777 print autoformat($x, { case => $style }), "\n";
778 }
779
780 How do you want to capitalize those words?
781
782 FRED AND BARNEY'S LODGE # all uppercase
783 Fred And Barney's Lodge # title case
784 Fred and Barney's Lodge # highlight case
785
786 It's not as easy a problem as it looks. How many words do you think are
787 in there? Wait for it... wait for it.... If you answered 5 you're
788 right. Perl words are groups of "\w+", but that's not what you want to
789 capitalize. How is Perl supposed to know not to capitalize that "s"
790 after the apostrophe? You could try a regular expression:
791
792 $string =~ s/ (
793 (^\w) #at the beginning of the line
794 | # or
795 (\s\w) #preceded by whitespace
796 )
797 /\U$1/xg;
798
799 $string =~ s/([\w']+)/\u\L$1/g;
800
801 Now, what if you don't want to capitalize that "and"? Just use
802 Text::Autoformat and get on with the next problem. :)
803
804 How can I split a [character] delimited string except when inside
805 [character]?
806 Several modules can handle this sort of parsing--"Text::Balanced",
807 "Text::CSV", "Text::CSV_XS", and "Text::ParseWords", among others.
808
809 Take the example case of trying to split a string that is comma-
810 separated into its different fields. You can't use "split(/,/)" because
811 you shouldn't split if the comma is inside quotes. For example, take a
812 data line like this:
813
814 SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
815
816 Due to the restriction of the quotes, this is a fairly complex problem.
817 Thankfully, we have Jeffrey Friedl, author of Mastering Regular
818 Expressions, to handle these for us. He suggests (assuming your string
819 is contained in $text):
820
821 @new = ();
822 push(@new, $+) while $text =~ m{
823 "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
824 | ([^,]+),?
825 | ,
826 }gx;
827 push(@new, undef) if substr($text,-1,1) eq ',';
828
If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (eg,
"like \"this\"").
831
832 Alternatively, the "Text::ParseWords" module (part of the standard Perl
833 distribution) lets you say:
834
835 use Text::ParseWords;
836 @new = quotewords(",", 0, $text);
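
Applied to the sample data line above, that looks roughly like this
(the array name @fields is chosen for this example):

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';

# A false second argument tells quotewords() to drop the quotes.
my @fields = quotewords(',', 0, $text);

print scalar(@fields), " fields\n";
print "$fields[2]\n";     # Cimetrix, Inc
print "$fields[-1]\n";    # Error, Core Dumped
```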
837
838 How do I strip blank space from the beginning/end of a string?
839 (contributed by brian d foy)
840
841 A substitution can do this for you. For a single line, you want to
842 replace all the leading or trailing whitespace with nothing. You can do
843 that with a pair of substitutions.
844
845 s/^\s+//;
846 s/\s+$//;
847
848 You can also write that as a single substitution, although it turns out
849 the combined statement is slower than the separate ones. That might not
850 matter to you, though.
851
852 s/^\s+|\s+$//g;
853
In this regular expression, the alternation matches either at the
beginning or the end of the string since the alternation has a lower
precedence than the anchors, so each anchor stays with its own branch.
With the "/g" flag, the substitution makes all possible matches, so it
gets both. Remember, the trailing newline matches the "\s+", and the
"$" anchor can match at the physical end of the string, so the newline
disappears too. Just add the newline to the output, which has the added
benefit of preserving "blank" (consisting entirely of whitespace) lines
which the "^\s+" would remove all by itself.
863
864 while( <> )
865 {
866 s/^\s+|\s+$//g;
867 print "$_\n";
868 }
869
870 For a multi-line string, you can apply the regular expression to each
871 logical line in the string by adding the "/m" flag (for "multi-line").
872 With the "/m" flag, the "$" matches before an embedded newline, so it
873 doesn't remove it. It still removes the newline at the end of the
874 string.
875
876 $string =~ s/^\s+|\s+$//gm;
877
Remember that lines consisting entirely of whitespace will disappear,
since the first part of the alternation can match the entire string and
replace it with nothing. If you need to keep embedded blank lines, you
have to do a little more work. Instead of matching any whitespace
(since that includes a newline), just match the other whitespace.
883
884 $string =~ s/^[\t\f ]+|[\t\f ]+$//mg;
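
The substitutions in this answer can be wrapped in small helpers. This
is only a sketch; the names trim and trim_lines are invented here and
are not part of Perl:

```perl
use strict;
use warnings;

# Strip leading and trailing whitespace from a copy of a single-line string.
sub trim {
    my ($string) = @_;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

# Trim each logical line of a multi-line string. Matching only tabs,
# form feeds, and spaces leaves newlines (and so blank lines) in place.
sub trim_lines {
    my ($string) = @_;
    $string =~ s/^[\t\f ]+|[\t\f ]+$//mg;
    return $string;
}

print trim("  hello world \n"), "\n";        # hello world
print trim_lines("  one  \n\n\ttwo\t\n");    # "one", a blank line, "two"
```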
885
886 How do I pad a string with blanks or pad a number with zeroes?
887 In the following examples, $pad_len is the length to which you wish to
888 pad the string, $text or $num contains the string to be padded, and
889 $pad_char contains the padding character. You can use a single
890 character string constant instead of the $pad_char variable if you know
891 what it is in advance. And in the same way you can use an integer in
892 place of $pad_len if you know the pad length in advance.
893
894 The simplest method uses the "sprintf" function. It can pad on the left
895 or right with blanks and on the left with zeroes and it will not
896 truncate the result. The "pack" function can only pad strings on the
897 right with blanks and it will truncate the result to a maximum length
898 of $pad_len.
899
900 # Left padding a string with blanks (no truncation):
901 $padded = sprintf("%${pad_len}s", $text);
902 $padded = sprintf("%*s", $pad_len, $text); # same thing
903
904 # Right padding a string with blanks (no truncation):
905 $padded = sprintf("%-${pad_len}s", $text);
906 $padded = sprintf("%-*s", $pad_len, $text); # same thing
907
908 # Left padding a number with 0 (no truncation):
909 $padded = sprintf("%0${pad_len}d", $num);
910 $padded = sprintf("%0*d", $pad_len, $num); # same thing
911
912 # Right padding a string with blanks using pack (will truncate):
913 $padded = pack("A$pad_len",$text);
914
915 If you need to pad with a character other than blank or zero you can
916 use one of the following methods. They all generate a pad string with
917 the "x" operator and combine that with $text. These methods do not
918 truncate $text.
919
920 Left and right padding with any character, creating a new string:
921
922 $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
923 $padded = $text . $pad_char x ( $pad_len - length( $text ) );
924
925 Left and right padding with any character, modifying $text directly:
926
927 substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
928 $text .= $pad_char x ( $pad_len - length( $text ) );
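
The "x" approach can be packaged as a pair of helpers. This is a
sketch with made-up names (pad_left, pad_right); it checks the
remaining length explicitly so that text which is already long enough
is returned untouched:

```perl
use strict;
use warnings;

# Pad $text on the left with $pad_char up to $pad_len characters.
# Text that is already long enough is returned unchanged (no truncation).
sub pad_left {
    my ( $text, $pad_len, $pad_char ) = @_;
    my $missing = $pad_len - length $text;
    return $missing > 0 ? ( $pad_char x $missing ) . $text : $text;
}

# Same idea, padding on the right.
sub pad_right {
    my ( $text, $pad_len, $pad_char ) = @_;
    my $missing = $pad_len - length $text;
    return $missing > 0 ? $text . ( $pad_char x $missing ) : $text;
}

print pad_left( '42', 6, '*' ), "\n";         # ****42
print pad_right( 'abc', 6, '.' ), "\n";       # abc...
print pad_left( 'toolong', 3, '*' ), "\n";    # toolong
```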
929
930 How do I extract selected columns from a string?
931 (contributed by brian d foy)
932
933 If you know the columns that contain the data, you can use "substr" to
934 extract a single column.
935
936 my $column = substr( $line, $start_column, $length );
937
938 You can use "split" if the columns are separated by whitespace or some
939 other delimiter, as long as whitespace or the delimiter cannot appear
940 as part of the data.
941
942 my $line = ' fred barney betty ';
943 my @columns = split /\s+/, $line;
944 # ( '', 'fred', 'barney', 'betty' );
945
946 my $line = 'fred||barney||betty';
947 my @columns = split /\|/, $line;
948 # ( 'fred', '', 'barney', '', 'betty' );
949
950 If you want to work with comma-separated values, don't do this since
951 that format is a bit more complicated. Use one of the modules that
952 handle that format, such as "Text::CSV", "Text::CSV_XS", or
953 "Text::CSV_PP".
954
955 If you want to break apart an entire line of fixed columns, you can use
956 "unpack" with the A (ASCII) format. By using a number after the format
957 specifier, you can denote the column width. See the "pack" and "unpack"
958 entries in perlfunc for more details.
959
    my @fields = unpack( "A8 A8 A8 A16 A4", $line );
961
962 Note that spaces in the format argument to "unpack" do not denote
963 literal spaces. If you have space separated data, you may want "split"
964 instead.
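
Here is a small, self-contained sketch of the fixed-column case. The
record layout (8, 8, and 4 columns) and the split_fixed name are
assumptions chosen for the example. Note that "unpack" takes the
template first and the data second:

```perl
use strict;
use warnings;

# Break a fixed-width record into fields using an unpack template.
sub split_fixed {
    my ( $template, $record ) = @_;
    return unpack( $template, $record );    # template first, then the data
}

# Hypothetical record: 8-column name, 8-column town, 4-column age.
my $line = 'fred    bedrock 42  ';
my ( $name, $town, $age ) = split_fixed( 'A8 A8 A4', $line );
print "$name|$town|$age\n";    # fred|bedrock|42
```

The "A" format strips trailing spaces from each field, which is why no
extra trimming is needed here.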
965
966 How do I find the soundex value of a string?
967 (contributed by brian d foy)
968
You can use the "Text::Soundex" module. If you want to do fuzzy or
close matching, you might also try the "String::Approx",
"Text::Metaphone", and "Text::DoubleMetaphone" modules.
972
973 How can I expand variables in text strings?
974 (contributed by brian d foy)
975
976 If you can avoid it, don't, or if you can use a templating system, such
977 as "Text::Template" or "Template" Toolkit, do that instead. You might
978 even be able to get the job done with "sprintf" or "printf":
979
980 my $string = sprintf 'Say hello to %s and %s', $foo, $bar;
981
However, for the one-off simple case where I don't want to pull out a
full templating system, I'll use a string that has two Perl scalar
variables in it. In this example, I want to expand $foo and $bar to
their values:
986
987 my $foo = 'Fred';
988 my $bar = 'Barney';
989 $string = 'Say hello to $foo and $bar';
990
991 One way I can do this involves the substitution operator and a double
992 "/e" flag. The first "/e" evaluates $1 on the replacement side and
993 turns it into $foo. The second /e starts with $foo and replaces it with
994 its value. $foo, then, turns into 'Fred', and that's finally what's
995 left in the string:
996
997 $string =~ s/(\$\w+)/$1/eeg; # 'Say hello to Fred and Barney'
998
999 The "/e" will also silently ignore violations of strict, replacing
1000 undefined variable names with the empty string. Since I'm using the
1001 "/e" flag (twice even!), I have all of the same security problems I
1002 have with "eval" in its string form. If there's something odd in $foo,
1003 perhaps something like "@{[ system "rm -rf /" ]}", then I could get
1004 myself in trouble.
1005
1006 To get around the security problem, I could also pull the values from a
1007 hash instead of evaluating variable names. Using a single "/e", I can
1008 check the hash to ensure the value exists, and if it doesn't, I can
1009 replace the missing value with a marker, in this case "???" to signal
1010 that I missed something:
1011
1012 my $string = 'This has $foo and $bar';
1013
1014 my %Replacements = (
1015 foo => 'Fred',
1016 );
1017
1018 # $string =~ s/\$(\w+)/$Replacements{$1}/g;
1019 $string =~ s/\$(\w+)/
1020 exists $Replacements{$1} ? $Replacements{$1} : '???'
1021 /eg;
1022
1023 print $string;
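
The hash-based approach can be turned into a reusable routine. This is
a sketch; the name expand_template and the choice of passing a hash
reference are assumptions made for the example:

```perl
use strict;
use warnings;

# Replace $name placeholders with values from a hash reference.
# Unknown names become '???' instead of being evaluated as code.
sub expand_template {
    my ( $template, $values ) = @_;
    $template =~ s/\$(\w+)/
        exists $values->{$1} ? $values->{$1} : '???'
    /eg;
    return $template;
}

print expand_template( 'Say hello to $foo and $bar', { foo => 'Fred' } ), "\n";
# Say hello to Fred and ???
```

Because only a single "/e" is involved and the replacement is a hash
lookup, nothing from the template text is ever run as Perl code.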
1024
1025 What's wrong with always quoting "$vars"?
1026 The problem is that those double-quotes force stringification--coercing
1027 numbers and references into strings--even when you don't want them to
1028 be strings. Think of it this way: double-quote expansion is used to
1029 produce new strings. If you already have a string, why do you need
1030 more?
1031
1032 If you get used to writing odd things like these:
1033
1034 print "$var"; # BAD
1035 $new = "$old"; # BAD
1036 somefunc("$var"); # BAD
1037
1038 You'll be in trouble. Those should (in 99.8% of the cases) be the
1039 simpler and more direct:
1040
1041 print $var;
1042 $new = $old;
1043 somefunc($var);
1044
1045 Otherwise, besides slowing you down, you're going to break code when
1046 the thing in the scalar is actually neither a string nor a number, but
1047 a reference:
1048
1049 func(\@array);
1050 sub func {
1051 my $aref = shift;
1052 my $oref = "$aref"; # WRONG
1053 }
1054
1055 You can also get into subtle problems on those few operations in Perl
1056 that actually do care about the difference between a string and a
1057 number, such as the magical "++" autoincrement operator or the
1058 syscall() function.
1059
1060 Stringification also destroys arrays.
1061
1062 @lines = `command`;
1063 print "@lines"; # WRONG - extra blanks
1064 print @lines; # right
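
A short sketch makes the damage visible. The helper name is_array_ref
is invented for this example:

```perl
use strict;
use warnings;

# Reports whether its argument is still a usable array reference.
sub is_array_ref { return ref( $_[0] ) eq 'ARRAY' ? 1 : 0 }

my @array = ( 1, 2, 3 );
my $aref  = \@array;
my $oref  = "$aref";    # now only a string such as "ARRAY(0x...)"

print is_array_ref($aref), "\n";    # 1
print is_array_ref($oref), "\n";    # 0

# Interpolating an array joins its elements with a space.
my @lines  = ( "one\n", "two\n" );
my $joined = "@lines";              # "one\n two\n"
print $joined;
```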
1065
1066 Why don't my <<HERE documents work?
1067 Check for these three things:
1068
1069 There must be no space after the << part.
1070 There (probably) should be a semicolon at the end.
1071 You can't (easily) have any space in front of the tag.
1072
1073 If you want to indent the text in the here document, you can do this:
1074
    # all in one
    ($VAR = <<HERE_TARGET) =~ s/^\s+//gm;
        your text
        goes here
    HERE_TARGET
1080
1081 But the HERE_TARGET must still be flush against the margin. If you
1082 want that indented also, you'll have to quote in the indentation.
1083
1084 ($quote = <<' FINIS') =~ s/^\s+//gm;
1085 ...we will have peace, when you and all your works have
1086 perished--and the works of your dark master to whom you
1087 would deliver us. You are a liar, Saruman, and a corrupter
1088 of men's hearts. --Theoden in /usr/src/perl/taint.c
1089 FINIS
1090 $quote =~ s/\s+--/\n--/;
1091
1092 A nice general-purpose fixer-upper function for indented here documents
1093 follows. It expects to be called with a here document as its argument.
1094 It looks to see whether each line begins with a common substring, and
1095 if so, strips that substring off. Otherwise, it takes the amount of
1096 leading whitespace found on the first line and removes that much off
1097 each subsequent line.
1098
1099 sub fix {
1100 local $_ = shift;
1101 my ($white, $leader); # common whitespace and common leading string
1102 if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
1103 ($white, $leader) = ($2, quotemeta($1));
1104 } else {
1105 ($white, $leader) = (/^(\s+)/, '');
1106 }
1107 s/^\s*?$leader(?:$white)?//gm;
1108 return $_;
1109 }
1110
1111 This works with leading special strings, dynamically determined:
1112
1113 $remember_the_main = fix<<' MAIN_INTERPRETER_LOOP';
1114 @@@ int
1115 @@@ runops() {
1116 @@@ SAVEI32(runlevel);
1117 @@@ runlevel++;
1118 @@@ while ( op = (*op->op_ppaddr)() );
1119 @@@ TAINT_NOT;
1120 @@@ return 0;
1121 @@@ }
1122 MAIN_INTERPRETER_LOOP
1123
1124 Or with a fixed amount of leading whitespace, with remaining
1125 indentation correctly preserved:
1126
1127 $poem = fix<<EVER_ON_AND_ON;
1128 Now far ahead the Road has gone,
1129 And I must follow, if I can,
1130 Pursuing it with eager feet,
1131 Until it joins some larger way
1132 Where many paths and errands meet.
1133 And whither then? I cannot say.
1134 --Bilbo in /usr/src/perl/pp_ctl.c
1135 EVER_ON_AND_ON
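
For the simplest case, where all leading whitespace may go, the
"s/^\s+//gm" idiom from earlier in this answer can be tried as a small
function. This is a sketch; the name dedent is made up here:

```perl
use strict;
use warnings;

# Remove all leading whitespace from every line of the given text.
# Relative indentation is lost, and whitespace-only lines vanish,
# because \s also matches newlines.
sub dedent {
    my ($text) = @_;
    $text =~ s/^\s+//gm;
    return $text;
}

my $text = dedent(<<'END_TEXT');
    your text
    goes here
END_TEXT

print $text;    # two lines, both flush left
```

On Perl 5.26 and later, the "<<~" form of here document removes the
common indentation for you, which may make a helper like this
unnecessary.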
1136
1138 What is the difference between a list and an array?
1139 (contributed by brian d foy)
1140
1141 A list is a fixed collection of scalars. An array is a variable that
1142 holds a variable collection of scalars. An array can supply its
1143 collection for list operations, so list operations also work on arrays:
1144
1145 # slices
1146 ( 'dog', 'cat', 'bird' )[2,3];
1147 @animals[2,3];
1148
1149 # iteration
1150 foreach ( qw( dog cat bird ) ) { ... }
1151 foreach ( @animals ) { ... }
1152
1153 my @three = grep { length == 3 } qw( dog cat bird );
1154 my @three = grep { length == 3 } @animals;
1155
1156 # supply an argument list
1157 wash_animals( qw( dog cat bird ) );
1158 wash_animals( @animals );
1159
Array operations, which change the scalars, rearrange them, or add or
remove some scalars, only work on arrays. These can't work on a
list, which is fixed. Array operations include "shift", "unshift",
"push", "pop", and "splice".
1164
1165 An array can also change its length:
1166
1167 $#animals = 1; # truncate to two elements
1168 $#animals = 10000; # pre-extend to 10,001 elements
1169
1170 You can change an array element, but you can't change a list element:
1171
1172 $animals[0] = 'Rottweiler';
1173 qw( dog cat bird )[0] = 'Rottweiler'; # syntax error!
1174
1175 foreach ( @animals ) {
1176 s/^d/fr/; # works fine
1177 }
1178
1179 foreach ( qw( dog cat bird ) ) {
1180 s/^d/fr/; # Error! Modification of read only value!
1181 }
1182
If the list element is itself a variable, though, it appears that you
can change a list element. However, the list element is the variable,
not the data. You're not changing the list element, but something the
list element refers to. The list element itself doesn't change: it's
still the same variable.
1188
1189 You also have to be careful about context. You can assign an array to a
1190 scalar to get the number of elements in the array. This only works for
1191 arrays, though:
1192
1193 my $count = @animals; # only works with arrays
1194
1195 If you try to do the same thing with what you think is a list, you get
1196 a quite different result. Although it looks like you have a list on the
1197 righthand side, Perl actually sees a bunch of scalars separated by a
1198 comma:
1199
1200 my $scalar = ( 'dog', 'cat', 'bird' ); # $scalar gets bird
1201
Since you're assigning to a scalar, the righthand side is in scalar
context. The comma operator (yes, it's an operator!) in scalar context
evaluates its lefthand side, throws away the result, and evaluates its
righthand side and returns the result. In effect, that list-lookalike
assigns to $scalar its rightmost value. Many people mess this up
because they choose a list-lookalike whose last element is also the
count they expect:
1209
1210 my $scalar = ( 1, 2, 3 ); # $scalar gets 3, accidentally
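
The three situations can be compared side by side. This is a sketch
with invented sub names. The last one uses the common idiom of
assigning to an empty list first, which forces list context and then
yields the number of items:

```perl
use strict;
use warnings;

my @animals = qw( dog cat bird );

# An array in scalar context yields its element count.
sub array_count { my $count = @animals; return $count }

# A parenthesized, comma-separated expression in scalar context is the
# comma operator at work: it yields the rightmost value.
sub comma_result {
    no warnings 'void';    # the discarded constants would otherwise warn
    my $scalar = ( 'dog', 'cat', 'bird' );
    return $scalar;
}

# List assignment in scalar context yields the number of items assigned.
sub list_count { my $count = () = ( 'dog', 'cat', 'bird' ); return $count }

printf "%d %s %d\n", array_count(), comma_result(), list_count();    # 3 bird 3
```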
1211
1212 What is the difference between $array[1] and @array[1]?
1213 (contributed by brian d foy)
1214
1215 The difference is the sigil, that special character in front of the
1216 array name. The "$" sigil means "exactly one item", while the "@" sigil
1217 means "zero or more items". The "$" gets you a single scalar, while the
1218 "@" gets you a list.
1219
1220 The confusion arises because people incorrectly assume that the sigil
1221 denotes the variable type.
1222
1223 The $array[1] is a single-element access to the array. It's going to
1224 return the item in index 1 (or undef if there is no item there). If
1225 you intend to get exactly one element from the array, this is the form
1226 you should use.
1227
1228 The @array[1] is an array slice, although it has only one index. You
1229 can pull out multiple elements simultaneously by specifying additional
1230 indices as a list, like @array[1,4,3,0].
1231
1232 Using a slice on the lefthand side of the assignment supplies list
1233 context to the righthand side. This can lead to unexpected results.
1234 For instance, if you want to read a single line from a filehandle,
1235 assigning to a scalar value is fine:
1236
1237 $array[1] = <STDIN>;
1238
1239 However, in list context, the line input operator returns all of the
1240 lines as a list. The first line goes into @array[1] and the rest of the
1241 lines mysteriously disappear:
1242
1243 @array[1] = <STDIN>; # most likely not what you want
1244
1245 Either the "use warnings" pragma or the -w flag will warn you when you
1246 use an array slice with a single index.
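
To see the context difference without reading from a filehandle, here
is a sketch that uses "wantarray" in place of the line input operator.
The sub names context and demo are made up for the example:

```perl
use strict;
use warnings;

# Reports the context in which it was called.
sub context { return wantarray ? 'list' : 'scalar' }

sub demo {
    my @array;
    $array[0] = context();        # element assignment: scalar context
    {
        no warnings 'syntax';     # silence the single-index slice warning
        @array[1] = context();    # slice assignment: list context
    }
    return @array;
}

print join( ', ', demo() ), "\n";    # scalar, list
```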
1247
1248 How can I remove duplicate elements from a list or array?
1249 (contributed by brian d foy)
1250
1251 Use a hash. When you think the words "unique" or "duplicated", think
1252 "hash keys".
1253
1254 If you don't care about the order of the elements, you could just
1255 create the hash then extract the keys. It's not important how you
1256 create that hash: just that you use "keys" to get the unique elements.
1257
1258 my %hash = map { $_, 1 } @array;
1259 # or a hash slice: @hash{ @array } = ();
1260 # or a foreach: $hash{$_} = 1 foreach ( @array );
1261
1262 my @unique = keys %hash;
1263
1264 If you want to use a module, try the "uniq" function from
1265 "List::MoreUtils". In list context it returns the unique elements,
1266 preserving their order in the list. In scalar context, it returns the
1267 number of unique elements.
1268
1269 use List::MoreUtils qw(uniq);
1270
1271 my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
1272 my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7
1273
You can also go through each element and skip the ones you've seen
before. Use a hash to keep track. The first time the loop sees an
element, that element has no key in %seen. The "next" statement creates
the key and immediately uses its value, which is "undef", so the loop
continues to the "push" and increments the value for that key. The next
time the loop sees that same element, its key exists in the hash and
the value for that key is true (since it's not 0 or "undef"), so the
"next" skips that iteration and the loop goes to the next element.
1282
1283 my @unique = ();
1284 my %seen = ();
1285
1286 foreach my $elem ( @array )
1287 {
1288 next if $seen{ $elem }++;
1289 push @unique, $elem;
1290 }
1291
1292 You can write this more briefly using a grep, which does the same
1293 thing.
1294
1295 my %seen = ();
1296 my @unique = grep { ! $seen{ $_ }++ } @array;
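
Wrapped in a subroutine, the grep form gives an order-preserving
filter without any module. This is a sketch, and the name unique is
made up here. Elements are compared as strings, since they are used as
hash keys:

```perl
use strict;
use warnings;

# Return the distinct elements of the argument list, keeping the
# position of each element's first appearance.
sub unique {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

print join( ',', unique( 1, 2, 3, 4, 4, 5, 6, 5, 7 ) ), "\n";    # 1,2,3,4,5,6,7
```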
1297
1298 How can I tell whether a certain element is contained in a list or array?
1299 (portions of this answer contributed by Anno Siegel and brian d foy)
1300
1301 Hearing the word "in" is an indication that you probably should have
1302 used a hash, not a list or array, to store your data. Hashes are
1303 designed to answer this question quickly and efficiently. Arrays
1304 aren't.
1305
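That being said, there are several ways to approach this. In Perl 5.10
and later, you can use the smart match operator to check that an item
is contained in an array or a hash (the operator has been marked
experimental since Perl 5.18, so newer perls warn about it when
warnings are enabled):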
1309
1310 use 5.010;
1311
1312 if( $item ~~ @array )
1313 {
1314 say "The array contains $item"
1315 }
1316
1317 if( $item ~~ %hash )
1318 {
1319 say "The hash contains $item"
1320 }
1321
1322 With earlier versions of Perl, you have to do a bit more work. If you
1323 are going to make this query many times over arbitrary string values,
1324 the fastest way is probably to invert the original array and maintain a
1325 hash whose keys are the first array's values:
1326
1327 @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
1328 %is_blue = ();
1329 for (@blues) { $is_blue{$_} = 1 }
1330
1331 Now you can check whether $is_blue{$some_color}. It might have been a
1332 good idea to keep the blues all in a hash in the first place.
1333
1334 If the values are all small integers, you could use a simple indexed
1335 array. This kind of an array will take up less space:
1336
    @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
    @is_tiny_prime = ();
    for (@primes) { $is_tiny_prime[$_] = 1 }
    # or simply @is_tiny_prime[@primes] = (1) x @primes;
1341
1342 Now you check whether $is_tiny_prime[$some_number].
1343
1344 If the values in question are integers instead of strings, you can save
1345 quite a lot of space by using bit strings instead:
1346
1347 @articles = ( 1..10, 150..2000, 2017 );
1348 undef $read;
1349 for (@articles) { vec($read,$_,1) = 1 }
1350
1351 Now check whether "vec($read,$n,1)" is true for some $n.
1352
These methods guarantee fast individual tests but require a
reorganization of the original list or array. They only pay off if you
have to test multiple values against the same array.
1356
1357 If you are testing only once, the standard module "List::Util" exports
1358 the function "first" for this purpose. It works by stopping once it
1359 finds the element. It's written in C for speed, and its Perl equivalent
1360 looks like this subroutine:
1361
1362 sub first (&@) {
1363 my $code = shift;
1364 foreach (@_) {
1365 return $_ if &{$code}();
1366 }
1367 undef;
1368 }
1369
1370 If speed is of little concern, the common idiom uses grep in scalar
1371 context (which returns the number of items that passed its condition)
1372 to traverse the entire list. This does have the benefit of telling you
1373 how many matches it found, though.
1374
1375 my $is_there = grep $_ eq $whatever, @array;
1376
1377 If you want to actually extract the matching elements, simply use grep
1378 in list context.
1379
1380 my @matches = grep $_ eq $whatever, @array;
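
The two main strategies, a prepared hash for repeated tests and a
one-off scan with "first", can be put next to each other. This is a
sketch; the sub names are invented. The scan wraps "first" in
"defined" because a matching element could itself be false (0 or the
empty string):

```perl
use strict;
use warnings;
use List::Util qw(first);

my @blues   = qw( azure cerulean teal turquoise lapis-lazuli );
my %is_blue = map { $_ => 1 } @blues;

# Repeated membership tests: one hash lookup per query.
sub is_blue_hash { return exists $is_blue{ $_[0] } ? 1 : 0 }

# One-off test: scan the array and stop at the first match.
sub is_blue_scan {
    my ($color) = @_;
    return defined( first { $_ eq $color } @blues ) ? 1 : 0;
}

print is_blue_hash('teal'), is_blue_scan('teal'), "\n";    # 11
print is_blue_hash('red'),  is_blue_scan('red'),  "\n";    # 00
```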
1381
1382 How do I compute the difference of two arrays? How do I compute the
1383 intersection of two arrays?
1384 Use a hash. Here's code to do both and more. It assumes that each
1385 element is unique in a given array:
1386
1387 @union = @intersection = @difference = ();
1388 %count = ();
1389 foreach $element (@array1, @array2) { $count{$element}++ }
1390 foreach $element (keys %count) {
1391 push @union, $element;
1392 push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
1393 }
1394
1395 Note that this is the symmetric difference, that is, all elements in
1396 either A or in B but not in both. Think of it as an xor operation.
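
Here is the same counting technique as a self-contained routine with
sample data. This is a sketch; the name set_operations and the sorted
output order are choices made for the example. As above, it assumes
that no element repeats within one array:

```perl
use strict;
use warnings;

# Return references to the union, intersection, and symmetric
# difference of two arrays whose elements are unique within each array.
sub set_operations {
    my ( $array1, $array2 ) = @_;
    my ( %count, @union, @intersection, @difference );
    $count{$_}++ for @$array1, @$array2;
    for my $element ( sort keys %count ) {
        push @union, $element;
        push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
    }
    return ( \@union, \@intersection, \@difference );
}

my ( $union, $both, $either ) = set_operations( [qw(a b c)], [qw(b c d)] );
print "union: @$union\n";          # union: a b c d
print "intersection: @$both\n";    # intersection: b c
print "difference: @$either\n";    # difference: a d
```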
1397
1398 How do I test whether two arrays or hashes are equal?
1399 With Perl 5.10 and later, the smart match operator can give you the
1400 answer with the least amount of work:
1401
1402 use 5.010;
1403
1404 if( @array1 ~~ @array2 )
1405 {
1406 say "The arrays are the same";
1407 }
1408
1409 if( %hash1 ~~ %hash2 ) # doesn't check values!
1410 {
1411 say "The hash keys are the same";
1412 }
1413
1414 The following code works for single-level arrays. It uses a stringwise
1415 comparison, and does not distinguish defined versus undefined empty
1416 strings. Modify if you have other needs.
1417
1418 $are_equal = compare_arrays(\@frogs, \@toads);
1419
1420 sub compare_arrays {
1421 my ($first, $second) = @_;
1422 no warnings; # silence spurious -w undef complaints
1423 return 0 unless @$first == @$second;
1424 for (my $i = 0; $i < @$first; $i++) {
1425 return 0 if $first->[$i] ne $second->[$i];
1426 }
1427 return 1;
1428 }
1429
1430 For multilevel structures, you may wish to use an approach more like
1431 this one. It uses the CPAN module "FreezeThaw":
1432
1433 use FreezeThaw qw(cmpStr);
1434 @a = @b = ( "this", "that", [ "more", "stuff" ] );
1435
1436 printf "a and b contain %s arrays\n",
1437 cmpStr(\@a, \@b) == 0
1438 ? "the same"
1439 : "different";
1440
1441 This approach also works for comparing hashes. Here we'll demonstrate
1442 two different answers:
1443
1444 use FreezeThaw qw(cmpStr cmpStrHard);
1445
1446 %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
1447 $a{EXTRA} = \%b;
1448 $b{EXTRA} = \%a;
1449
1450 printf "a and b contain %s hashes\n",
1451 cmpStr(\%a, \%b) == 0 ? "the same" : "different";
1452
1453 printf "a and b contain %s hashes\n",
1454 cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
1455
The first reports that both of the hashes contain the same data,
while the second reports that they do not. Which you prefer is left as
an exercise to the reader.
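
For single-level hashes, a comparison that also checks the values can
be sketched in plain Perl. The name hashes_equal is made up, and the
sketch assumes that every value is a defined, non-reference scalar and
compares values as strings:

```perl
use strict;
use warnings;

# Compare two single-level hashes by key set and by stringwise values.
sub hashes_equal {
    my ( $first, $second ) = @_;
    return 0 unless keys %$first == keys %$second;
    for my $key ( keys %$first ) {
        return 0 unless exists $second->{$key};
        return 0 unless $first->{$key} eq $second->{$key};
    }
    return 1;
}

print hashes_equal( { a => 1, b => 2 }, { b => 2, a => 1 } ), "\n";    # 1
print hashes_equal( { a => 1, b => 2 }, { a => 1, b => 3 } ), "\n";    # 0
```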
1459
1460 How do I find the first array element for which a condition is true?
1461 To find the first array element which satisfies a condition, you can
1462 use the "first()" function in the "List::Util" module, which comes with
1463 Perl 5.8. This example finds the first element that contains "Perl".
1464
1465 use List::Util qw(first);
1466
1467 my $element = first { /Perl/ } @array;
1468
1469 If you cannot use "List::Util", you can make your own loop to do the
1470 same thing. Once you find the element, you stop the loop with last.
1471
1472 my $found;
1473 foreach ( @array ) {
1474 if( /Perl/ ) { $found = $_; last }
1475 }
1476
1477 If you want the array index, you can iterate through the indices and
1478 check the array element at each index until you find one that satisfies
1479 the condition.
1480
    my( $found, $index ) = ( undef, -1 );
    for( my $i = 0; $i < @array; $i++ ) {
        if( $array[$i] =~ /Perl/ ) {
            $found = $array[$i];
            $index = $i;
            last;
        }
    }
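
If "List::Util" is available, the index search can also be written
with "first" by iterating over the indices instead of the elements.
This is a sketch; the name first_index_matching is invented here:

```perl
use strict;
use warnings;
use List::Util qw(first);

# Return the index of the first element matching $pattern, or -1.
sub first_index_matching {
    my ( $pattern, @array ) = @_;
    my $index = first { $array[$_] =~ $pattern } 0 .. $#array;
    return defined $index ? $index : -1;
}

print first_index_matching( qr/Perl/, qw( C Python Perl Raku ) ), "\n";    # 2
print first_index_matching( qr/Perl/, qw( C Python ) ), "\n";              # -1
```

The "defined" test matters because index 0 is a valid answer but is
false in a boolean test.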
1489
1490 How do I handle linked lists?
In general, you usually don't need a linked list in Perl, since with
regular arrays, you can push and pop or shift and unshift at either
end, or you can use splice to add and/or remove an arbitrary number of
elements at arbitrary points. Both pop and shift are O(1) operations
on Perl's dynamic arrays. In the absence of shifts and pops, push in
general needs to reallocate on the order of every log(N) times, and
unshift will need to copy pointers each time.
1498
1499 If you really, really wanted, you could use structures as described in
1500 perldsc or perltoot and do just what the algorithm book tells you to
1501 do. For example, imagine a list node like this:
1502
1503 $node = {
1504 VALUE => 42,
1505 LINK => undef,
1506 };
1507
1508 You could walk the list this way:
1509
1510 print "List: ";
1511 for ($node = $head; $node; $node = $node->{LINK}) {
1512 print $node->{VALUE}, " ";
1513 }
1514 print "\n";
1515
1516 You could add to the list this way:
1517
1518 my ($head, $tail);
1519 $tail = append($head, 1); # grow a new head
1520 for $value ( 2 .. 10 ) {
1521 $tail = append($tail, $value);
1522 }
1523
1524 sub append {
1525 my($list, $value) = @_;
1526 my $node = { VALUE => $value };
1527 if ($list) {
1528 $node->{LINK} = $list->{LINK};
1529 $list->{LINK} = $node;
1530 }
1531 else {
1532 $_[0] = $node; # replace caller's version
1533 }
1534 return $node;
1535 }
1536
But again, Perl's built-in arrays are virtually always good enough.
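
Put together, the pieces above form a complete program. This sketch
reuses the "append" routine from this answer and adds two helpers,
list_values and build_list, whose names are made up for the example:

```perl
use strict;
use warnings;

# Append a value after the node $list. If $list is undefined, the new
# node becomes the caller's head through the @_ alias.
sub append {
    my ( $list, $value ) = @_;
    my $node = { VALUE => $value, LINK => undef };
    if ($list) {
        $node->{LINK} = $list->{LINK};
        $list->{LINK} = $node;
    }
    else {
        $_[0] = $node;    # replace caller's version
    }
    return $node;
}

# Collect the values by following LINK until it runs out.
sub list_values {
    my ($head) = @_;
    my @values;
    for ( my $node = $head; $node; $node = $node->{LINK} ) {
        push @values, $node->{VALUE};
    }
    return @values;
}

# Build a list from the given values and return its head node.
sub build_list {
    my @values = @_;
    my ( $head, $tail );
    $tail = append( $head, shift @values );    # grow a new head
    $tail = append( $tail, $_ ) for @values;
    return $head;
}

print join( ' ', list_values( build_list( 1 .. 5 ) ) ), "\n";    # 1 2 3 4 5
```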
1538
1539 How do I handle circular lists?
1540 (contributed by brian d foy)
1541
1542 If you want to cycle through an array endlessly, you can increment the
1543 index modulo the number of elements in the array:
1544
1545 my @array = qw( a b c );
1546 my $i = 0;
1547
1548 while( 1 ) {
1549 print $array[ $i++ % @array ], "\n";
1550 last if $i > 20;
1551 }
1552
1553 You can also use "Tie::Cycle" to use a scalar that always has the next
1554 element of the circular array:
1555
1556 use Tie::Cycle;
1557
1558 tie my $cycle, 'Tie::Cycle', [ qw( FFFFFF 000000 FFFF00 ) ];
1559
1560 print $cycle; # FFFFFF
1561 print $cycle; # 000000
1562 print $cycle; # FFFF00
1563
The "Array::Iterator::Circular" module creates an iterator object for
circular arrays:
1566
1567 use Array::Iterator::Circular;
1568
1569 my $color_iterator = Array::Iterator::Circular->new(
1570 qw(red green blue orange)
1571 );
1572
1573 foreach ( 1 .. 20 ) {
1574 print $color_iterator->next, "\n";
1575 }
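
If you would rather not install a module, a closure gives much the
same effect as the modulo loop shown earlier. This is a sketch; the
name make_cycle is invented, and it assumes a non-empty list, since an
empty one would lead to a modulus by zero:

```perl
use strict;
use warnings;

# Return a code reference that yields the items in order, endlessly.
sub make_cycle {
    my @items = @_;
    my $i     = 0;
    return sub { return $items[ $i++ % @items ] };
}

my $next_color = make_cycle( qw( red green blue ) );
print $next_color->(), "\n" for 1 .. 4;    # red, green, blue, red
```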
1576
1577 How do I shuffle an array randomly?
If you have Perl 5.8.0 or later installed, or Scalar-List-Utils 1.03
or later, you can say:
1580
1581 use List::Util 'shuffle';
1582
1583 @shuffled = shuffle(@list);
1584
1585 If not, you can use a Fisher-Yates shuffle.
1586
1587 sub fisher_yates_shuffle {
1588 my $deck = shift; # $deck is a reference to an array
1589 return unless @$deck; # must not be empty!
1590
1591 my $i = @$deck;
1592 while (--$i) {
1593 my $j = int rand ($i+1);
1594 @$deck[$i,$j] = @$deck[$j,$i];
1595 }
1596 }
1597
1598 # shuffle my mpeg collection
1599 #
1600 my @mpeg = <audio/*/*.mp3>;
1601 fisher_yates_shuffle( \@mpeg ); # randomize @mpeg in place
1602 print @mpeg;
1603
1604 Note that the above implementation shuffles an array in place, unlike
1605 the "List::Util::shuffle()" which takes a list and returns a new
1606 shuffled list.
1607
You've probably seen shuffling algorithms that work using splice,
randomly picking another element to swap the current element with:
1610
1611 srand;
1612 @new = ();
1613 @old = 1 .. 10; # just a demo
1614 while (@old) {
1615 push(@new, splice(@old, rand @old, 1));
1616 }
1617
1618 This is bad because splice is already O(N), and since you do it N
1619 times, you just invented a quadratic algorithm; that is, O(N**2). This
1620 does not scale, although Perl is so efficient that you probably won't
1621 notice this until you have rather largish arrays.
1622
1623 How do I process/modify each element of an array?
1624 Use "for"/"foreach":
1625
1626 for (@lines) {
1627 s/foo/bar/; # change that word
1628 tr/XZ/ZX/; # swap those letters
1629 }
1630
1631 Here's another; let's compute spherical volumes:
1632
1633 for (@volumes = @radii) { # @volumes has changed parts
1634 $_ **= 3;
1635 $_ *= (4/3) * 3.14159; # this will be constant folded
1636 }
1637
which can also be done with "map()", which is made to transform one
list into another:
1640
1641 @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
1642
1643 If you want to do the same thing to modify the values of the hash, you
1644 can use the "values" function. As of Perl 5.6 the values are not
1645 copied, so if you modify $orbit (in this case), you modify the value.
1646
1647 for $orbit ( values %orbits ) {
1648 ($orbit **= 3) *= (4/3) * 3.14159;
1649 }
1650
1651 Prior to perl 5.6 "values" returned copies of the values, so older perl
1652 code often contains constructions such as @orbits{keys %orbits} instead
1653 of "values %orbits" where the hash is to be modified.
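
The aliasing behavior of "values" is easy to check with a small
sketch. The name scale_values and the sample data are made up for the
example:

```perl
use strict;
use warnings;

# Multiply every value of the referenced hash in place.
sub scale_values {
    my ( $hash, $factor ) = @_;
    $_ *= $factor for values %$hash;    # each $_ aliases a stored value
    return $hash;
}

my %orbits = ( mercury => 1, venus => 2 );
scale_values( \%orbits, 10 );
print "$_ => $orbits{$_}\n" for sort keys %orbits;
# mercury => 10
# venus => 20
```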
1654
1655 How do I select a random element from an array?
1656 Use the "rand()" function (see "rand" in perlfunc):
1657
1658 $index = rand @array;
1659 $element = $array[$index];
1660
1661 Or, simply:
1662
1663 my $element = $array[ rand @array ];
1664
1665 How do I permute N elements of a list?
1666 Use the "List::Permutor" module on CPAN. If the list is actually an
1667 array, try the "Algorithm::Permute" module (also on CPAN). It's written
1668 in XS code and is very efficient:
1669
1670 use Algorithm::Permute;
1671
1672 my @array = 'a'..'d';
1673 my $p_iterator = Algorithm::Permute->new ( \@array );
1674
1675 while (my @perm = $p_iterator->next) {
1676 print "next permutation: (@perm)\n";
1677 }
1678
1679 For even faster execution, you could do:
1680
1681 use Algorithm::Permute;
1682
1683 my @array = 'a'..'d';
1684
1685 Algorithm::Permute::permute {
1686 print "next permutation: (@array)\n";
1687 } @array;
1688
1689 Here's a little program that generates all permutations of all the
1690 words on each line of input. The algorithm embodied in the "permute()"
1691 function is discussed in Volume 4 (still unpublished) of Knuth's The
1692 Art of Computer Programming and will work on any list:
1693
1694 #!/usr/bin/perl -n
1695 # Fischer-Krause ordered permutation generator
1696
1697 sub permute (&@) {
1698 my $code = shift;
1699 my @idx = 0..$#_;
1700 while ( $code->(@_[@idx]) ) {
1701 my $p = $#idx;
1702 --$p while $idx[$p-1] > $idx[$p];
1703 my $q = $p or return;
1704 push @idx, reverse splice @idx, $p;
1705 ++$q while $idx[$p-1] > $idx[$q];
1706 @idx[$p-1,$q]=@idx[$q,$p-1];
1707 }
1708 }
1709
1710 permute { print "@_\n" } split;
1711
1712 The "Algorithm::Loops" module also provides the "NextPermute" and
1713 "NextPermuteNum" functions which efficiently find all unique
1714 permutations of an array, even if it contains duplicate values,
1715 modifying it in-place: if its elements are in reverse-sorted order then
1716 the array is reversed, making it sorted, and it returns false;
1717 otherwise the next permutation is returned.
1718
1719 "NextPermute" uses string order and "NextPermuteNum" numeric order, so
1720 you can enumerate all the permutations of 0..9 like this:
1721
1722 use Algorithm::Loops qw(NextPermuteNum);
1723
1724 my @list= 0..9;
1725 do { print "@list\n" } while NextPermuteNum @list;
1726
1727 How do I sort an array by (anything)?
1728 Supply a comparison function to sort() (described in "sort" in
1729 perlfunc):
1730
1731 @list = sort { $a <=> $b } @list;
1732
1733 The default sort function is cmp, string comparison, which would sort
1734 "(1, 2, 10)" into "(1, 10, 2)". "<=>", used above, is the numerical
1735 comparison operator.
1736
1737 If you have a complicated function needed to pull out the part you want
1738 to sort on, then don't do it inside the sort function. Pull it out
1739 first, because the sort BLOCK can be called many times for the same
1740 element. Here's an example of how to pull out the first word after the
1741 first number on each item, and then sort those words case-
1742 insensitively.
1743
1744 @idx = ();
1745 for (@data) {
1746 ($item) = /\d+\s*(\S+)/;
1747 push @idx, uc($item);
1748 }
1749 @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
1750
1751 which could also be written this way, using a trick that's come to be
1752 known as the Schwartzian Transform:
1753
1754 @sorted = map { $_->[0] }
1755 sort { $a->[1] cmp $b->[1] }
1756 map { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
1757
1758 If you need to sort on several fields, the following paradigm is
1759 useful.
1760
1761 @sorted = sort {
1762 field1($a) <=> field1($b) ||
1763 field2($a) cmp field2($b) ||
1764 field3($a) cmp field3($b)
1765 } @data;
1766
1767 This can be conveniently combined with precalculation of keys as given
1768 above.
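
As a rough sketch of that combination, the following extracts two keys
from each item once, then sorts on the first numerically and on the
second case-insensitively. The "priority name" record layout and the
sample data are assumptions made up for the example:

```perl
use strict;
use warnings;

# Hypothetical records of the form "priority name".
my @data = ("2 pear", "1 Banana", "2 Apple", "1 cherry");

my @sorted =
    map  { $_->[0] }                         # 3. recover the original item
    sort { $a->[1] <=> $b->[1]               # 2. numeric field first,
                    ||
           $a->[2] cmp $b->[2] }             #    then the lowercased name
    map  { my ($n, $w) = /^(\d+)\s+(\S+)/;   # 1. extract both keys once
           [ $_, $n, lc $w ] } @data;

print "$_\n" for @sorted;
```

Each item is parsed exactly once, no matter how many comparisons the
sort ends up making.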
1769
1770 See the sort article in the "Far More Than You Ever Wanted To Know"
1771 collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz for more
1772 about this approach.
1773
1774 See also the question later in perlfaq4 on sorting hashes.
1775
1776 How do I manipulate arrays of bits?
1777 Use "pack()" and "unpack()", or else "vec()" and the bitwise
1778 operations.
1779
1780 For example, you don't have to store individual bits in an array (which
1781 would mean that you're wasting a lot of space). To convert an array of
1782 bits to a string, use "vec()" to set the right bits. This sets $vec to
1783 have bit N set only if $ints[N] was set:
1784
1785 @ints = (...); # array of bits, e.g. ( 1, 0, 0, 1, 1, 0 ... )
1786 $vec = '';
1787 foreach( 0 .. $#ints ) {
1788 vec($vec,$_,1) = 1 if $ints[$_];
1789 }
1790
1791 The string $vec only takes up as many bits as it needs. For instance,
1792 if you had 16 entries in @ints, $vec only needs two bytes to store them
1793 (not counting the scalar variable overhead).
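
You can check the size claim yourself. This sketch packs 16 made-up
flags (the last one is set, so the string really does extend to the
sixteenth bit) and reports the resulting length:

```perl
use strict;
use warnings;

# 16 sample bits; the values are arbitrary.
my @ints = (1, 0, 0, 1, 1, 0, 0, 0,  0, 0, 0, 0, 0, 0, 0, 1);

my $vec = '';
for my $n (0 .. $#ints) {
    vec($vec, $n, 1) = 1 if $ints[$n];
}

printf "%d flags packed into %d bytes\n", scalar(@ints), length $vec;
print  "bit string: ", unpack("b*", $vec), "\n";
```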
1794
1795 Here's how, given a vector in $vec, you can get those bits into your
1796 @ints array:
1797
1798 sub bitvec_to_list {
1799 my $vec = shift;
1800 my @ints;
1801 # Find null-byte density then select best algorithm
1802 if ($vec =~ tr/\0// / length $vec > 0.95) {
1803 use integer;
1804 my $i;
1805
1806 # This method is faster with mostly null-bytes
1807 while($vec =~ /[^\0]/g ) {
1808 $i = -9 + 8 * pos $vec;
1809 push @ints, $i if vec($vec, ++$i, 1);
1810 push @ints, $i if vec($vec, ++$i, 1);
1811 push @ints, $i if vec($vec, ++$i, 1);
1812 push @ints, $i if vec($vec, ++$i, 1);
1813 push @ints, $i if vec($vec, ++$i, 1);
1814 push @ints, $i if vec($vec, ++$i, 1);
1815 push @ints, $i if vec($vec, ++$i, 1);
1816 push @ints, $i if vec($vec, ++$i, 1);
1817 }
1818 }
1819 else {
1820 # This method is a fast general algorithm
1821 use integer;
1822 my $bits = unpack "b*", $vec;
1823 push @ints, 0 if $bits =~ s/^(\d)// && $1;
1824 push @ints, pos $bits while($bits =~ /1/g);
1825 }
1826
1827 return \@ints;
1828 }
1829
1830 This method gets faster the more sparse the bit vector is. (Courtesy
1831 of Tim Bunce and Winfried Koenig.)
1832
1833 You can make the while loop a lot shorter with this suggestion from
1834 Benjamin Goldberg:
1835
1836 while($vec =~ /[^\0]+/g ) {
1837 push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
1838 }
1839
1840 Or use the CPAN module "Bit::Vector":
1841
1842 $vector = Bit::Vector->new($num_of_bits);
1843 $vector->Index_List_Store(@ints);
1844 @ints = $vector->Index_List_Read();
1845
"Bit::Vector" provides efficient methods for bit vectors, sets of small
integers, and "big int" math.
1848
1849 Here's a more extensive illustration using vec():
1850
1851 # vec demo
1852 $vector = "\xff\x0f\xef\xfe";
1853 print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
1854 unpack("N", $vector), "\n";
1855 $is_set = vec($vector, 23, 1);
1856 print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
1857 pvec($vector);
1858
1859 set_vec(1,1,1);
1860 set_vec(3,1,1);
1861 set_vec(23,1,1);
1862
1863 set_vec(3,1,3);
1864 set_vec(3,2,3);
1865 set_vec(3,4,3);
1866 set_vec(3,4,7);
1867 set_vec(3,8,3);
1868 set_vec(3,8,7);
1869
1870 set_vec(0,32,17);
1871 set_vec(1,32,17);
1872
1873 sub set_vec {
1874 my ($offset, $width, $value) = @_;
1875 my $vector = '';
1876 vec($vector, $offset, $width) = $value;
1877 print "offset=$offset width=$width value=$value\n";
1878 pvec($vector);
1879 }
1880
sub pvec {
    my $vector = shift;
    my $bits = unpack("b*", $vector);

    print "vector length in bytes: ", length($vector), "\n";
    # split the bit string into one 8-character group per byte
    my @bytes = unpack("A8" x length($vector), $bits);
    print "bits are: @bytes\n\n";
}
1891
1892 Why does defined() return true on empty arrays and hashes?
1893 The short story is that you should probably only use defined on scalars
1894 or functions, not on aggregates (arrays and hashes). See "defined" in
1895 perlfunc in the 5.004 release or later of Perl for more detail.
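
If what you really want to know is whether an aggregate holds anything,
test it in boolean context instead. A minimal sketch, with made-up
variable names:

```perl
use strict;
use warnings;

my @array = ();
my %hash  = ();

# An array in boolean context is true only if it has elements.
print "array is empty\n" unless @array;

# A hash in boolean context is true only if it has at least one key.
print "hash is empty\n"  unless %hash;

# defined() remains the right tool for scalars.
my $scalar;
print "scalar is undefined\n" unless defined $scalar;
```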
1896
1898 How do I process an entire hash?
1899 (contributed by brian d foy)
1900
There are a couple of ways that you can process an entire hash. You can
get a list of keys, then go through each key, or grab one key-value
pair at a time.
1904
1905 To go through all of the keys, use the "keys" function. This extracts
1906 all of the keys of the hash and gives them back to you as a list. You
1907 can then get the value through the particular key you're processing:
1908
foreach my $key ( keys %hash ) {
    my $value = $hash{$key};
    ...
}
1913
1914 Once you have the list of keys, you can process that list before you
1915 process the hash elements. For instance, you can sort the keys so you
1916 can process them in lexical order:
1917
foreach my $key ( sort keys %hash ) {
    my $value = $hash{$key};
    ...
}
1922
1923 Or, you might want to only process some of the items. If you only want
1924 to deal with the keys that start with "text:", you can select just
1925 those using "grep":
1926
foreach my $key ( grep /^text:/, keys %hash ) {
    my $value = $hash{$key};
    ...
}
1931
1932 If the hash is very large, you might not want to create a long list of
1933 keys. To save some memory, you can grab one key-value pair at a time
1934 using "each()", which returns a pair you haven't seen yet:
1935
1936 while( my( $key, $value ) = each( %hash ) ) {
1937 ...
1938 }
1939
1940 The "each" operator returns the pairs in apparently random order, so if
1941 ordering matters to you, you'll have to stick with the "keys" method.
1942
1943 The "each()" operator can be a bit tricky though. You can't add or
1944 delete keys of the hash while you're using it without possibly skipping
1945 or re-processing some pairs after Perl internally rehashes all of the
1946 elements. Additionally, a hash has only one iterator, so if you use
1947 "keys", "values", or "each" on the same hash, you can reset the
1948 iterator and mess up your processing. See the "each" entry in perlfunc
1949 for more details.
1950
1951 How do I merge two hashes?
1952 (contributed by brian d foy)
1953
Before you decide to merge two hashes, you have to decide what to do if
both hashes contain keys that are the same, and whether you want to
leave the original hashes as they were.
1957
If you want to preserve the original hashes, copy one hash (%hash1) to
a new hash (%new_hash), then add the keys from the other hash (%hash2)
to the new hash. Checking that the key already exists in %new_hash
gives you a chance to decide what to do with the duplicates:
1962
1963 my %new_hash = %hash1; # make a copy; leave %hash1 alone
1964
1965 foreach my $key2 ( keys %hash2 )
1966 {
1967 if( exists $new_hash{$key2} )
1968 {
1969 warn "Key [$key2] is in both hashes!";
1970 # handle the duplicate (perhaps only warning)
1971 ...
1972 next;
1973 }
1974 else
1975 {
1976 $new_hash{$key2} = $hash2{$key2};
1977 }
1978 }
1979
1980 If you don't want to create a new hash, you can still use this looping
1981 technique; just change the %new_hash to %hash1.
1982
1983 foreach my $key2 ( keys %hash2 )
1984 {
1985 if( exists $hash1{$key2} )
1986 {
1987 warn "Key [$key2] is in both hashes!";
1988 # handle the duplicate (perhaps only warning)
1989 ...
1990 next;
1991 }
1992 else
1993 {
1994 $hash1{$key2} = $hash2{$key2};
1995 }
1996 }
1997
1998 If you don't care that one hash overwrites keys and values from the
1999 other, you could just use a hash slice to add one hash to another. In
2000 this case, values from %hash2 replace values from %hash1 when they have
2001 keys in common:
2002
2003 @hash1{ keys %hash2 } = values %hash2;
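
If you'd rather leave both originals alone and still let the second
hash win, a plain list assignment does the same job. Here's a small
sketch with made-up sample data; %merged is a name invented for the
example:

```perl
use strict;
use warnings;

my %hash1 = ( apple  => 1,  banana => 2  );
my %hash2 = ( banana => 20, cherry => 30 );

# Later pairs win, so %hash2's value for "banana" replaces %hash1's.
my %merged = ( %hash1, %hash2 );

print "$_ => $merged{$_}\n" for sort keys %merged;
```

Neither %hash1 nor %hash2 is modified, at the cost of building a third
hash.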
2004
2005 What happens if I add or remove keys from a hash while iterating over it?
2006 (contributed by brian d foy)
2007
2008 The easy answer is "Don't do that!"
2009
2010 If you iterate through the hash with each(), you can delete the key
2011 most recently returned without worrying about it. If you delete or add
2012 other keys, the iterator may skip or double up on them since perl may
2013 rearrange the hash table. See the entry for "each()" in perlfunc.
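
Here's a short sketch of the one safe case, deleting only the key that
each() has just handed back. The sample data is made up:

```perl
use strict;
use warnings;

my %hash = ( a => 1, b => 2, c => 3, d => 4 );

# Remove every pair with an even value. Only the most recently
# returned key is ever deleted, so the iterator stays consistent.
while ( my ($key, $value) = each %hash ) {
    delete $hash{$key} if $value % 2 == 0;
}

print join(", ", sort keys %hash), "\n";   # a, c
```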
2014
2015 How do I look up a hash element by value?
2016 Create a reverse hash:
2017
2018 %by_value = reverse %by_key;
2019 $key = $by_value{$value};
2020
2021 That's not particularly efficient. It would be more space-efficient to
2022 use:
2023
2024 while (($key, $value) = each %by_key) {
2025 $by_value{$value} = $key;
2026 }
2027
2028 If your hash could have repeated values, the methods above will only
2029 find one of the associated keys. This may or may not worry you. If
2030 it does worry you, you can always reverse the hash into a hash of
2031 arrays instead:
2032
2033 while (($key, $value) = each %by_key) {
2034 push @{$key_list_by_value{$value}}, $key;
2035 }
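
To make that concrete, here is the same loop run on a small made-up
hash in which two keys share a value:

```perl
use strict;
use warnings;

my %by_key = ( apple => 'red', cherry => 'red', banana => 'yellow' );

my %key_list_by_value;
while ( my ($key, $value) = each %by_key ) {
    push @{ $key_list_by_value{$value} }, $key;
}

# Each value now maps to an array of every key that carried it.
for my $value ( sort keys %key_list_by_value ) {
    my @keys = sort @{ $key_list_by_value{$value} };
    print "$value: @keys\n";
}
```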
2036
2037 How can I know how many entries are in a hash?
2038 (contributed by brian d foy)
2039
2040 This is very similar to "How do I process an entire hash?", also in
2041 perlfaq4, but a bit simpler in the common cases.
2042
You can use the "keys()" built-in function in scalar context to find
out how many entries you have in a hash:
2045
2046 my $key_count = keys %hash; # must be scalar context!
2047
2048 If you want to find out how many entries have a defined value, that's a
2049 bit different. You have to check each value. A "grep" is handy:
2050
2051 my $defined_value_count = grep { defined } values %hash;
2052
2053 You can use that same structure to count the entries any way that you
2054 like. If you want the count of the keys with vowels in them, you just
2055 test for that instead:
2056
2057 my $vowel_count = grep { /[aeiou]/ } keys %hash;
2058
2059 The "grep" in scalar context returns the count. If you want the list of
2060 matching items, just use it in list context instead:
2061
2062 my @defined_values = grep { defined } values %hash;
2063
2064 The "keys()" function also resets the iterator, which means that you
2065 may see strange results if you use this between uses of other hash
2066 operators such as "each()".
2067
2068 How do I sort a hash (optionally by value instead of key)?
2069 (contributed by brian d foy)
2070
2071 To sort a hash, start with the keys. In this example, we give the list
2072 of keys to the sort function which then compares them ASCIIbetically
2073 (which might be affected by your locale settings). The output list has
2074 the keys in ASCIIbetical order. Once we have the keys, we can go
2075 through them to create a report which lists the keys in ASCIIbetical
2076 order.
2077
2078 my @keys = sort { $a cmp $b } keys %hash;
2079
2080 foreach my $key ( @keys )
2081 {
2082 printf "%-20s %6d\n", $key, $hash{$key};
2083 }
2084
2085 We could get more fancy in the "sort()" block though. Instead of
2086 comparing the keys, we can compute a value with them and use that value
2087 as the comparison.
2088
2089 For instance, to make our report order case-insensitive, we use the
2090 "\L" sequence in a double-quoted string to make everything lowercase.
2091 The "sort()" block then compares the lowercased values to determine in
2092 which order to put the keys.
2093
2094 my @keys = sort { "\L$a" cmp "\L$b" } keys %hash;
2095
2096 Note: if the computation is expensive or the hash has many elements,
2097 you may want to look at the Schwartzian Transform to cache the
2098 computation results.
2099
2100 If we want to sort by the hash value instead, we use the hash key to
2101 look it up. We still get out a list of keys, but this time they are
2102 ordered by their value.
2103
2104 my @keys = sort { $hash{$a} <=> $hash{$b} } keys %hash;
2105
2106 From there we can get more complex. If the hash values are the same, we
2107 can provide a secondary sort on the hash key.
2108
2109 my @keys = sort {
2110 $hash{$a} <=> $hash{$b}
2111 or
2112 "\L$a" cmp "\L$b"
2113 } keys %hash;
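
Here is that two-level sort applied to a small made-up hash, so you can
see the tie-breaking in action:

```perl
use strict;
use warnings;

my %hash = ( pear => 3, Apple => 1, banana => 3, cherry => 1 );

my @keys = sort {
    $hash{$a} <=> $hash{$b}      # primary: numeric value
        or
    "\L$a" cmp "\L$b"            # secondary: case-insensitive key
} keys %hash;

printf "%-20s %6d\n", $_, $hash{$_} for @keys;
```

The keys come out as Apple, cherry, banana, pear: the two entries with
value 1 first, each group ordered by its lowercased key.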
2114
2115 How can I always keep my hash sorted?
2116 You can look into using the "DB_File" module and "tie()" using the
2117 $DB_BTREE hash bindings as documented in "In Memory Databases" in
2118 DB_File. The "Tie::IxHash" module from CPAN might also be instructive.
Although this does keep your hash sorted, you might not like the
slowdown you suffer from the tie interface. Are you sure you need to do
this? :)
2122
2123 What's the difference between "delete" and "undef" with hashes?
2124 Hashes contain pairs of scalars: the first is the key, the second is
2125 the value. The key will be coerced to a string, although the value can
2126 be any kind of scalar: string, number, or reference. If a key $key is
2127 present in %hash, "exists($hash{$key})" will return true. The value
2128 for a given key can be "undef", in which case $hash{$key} will be
2129 "undef" while "exists $hash{$key}" will return true. This corresponds
2130 to ($key, "undef") being in the hash.
2131
2132 Pictures help... Here's the %hash table:
2133
2134 keys values
2135 +------+------+
2136 | a | 3 |
2137 | x | 7 |
2138 | d | 0 |
2139 | e | 2 |
2140 +------+------+
2141
2142 And these conditions hold
2143
2144 $hash{'a'} is true
2145 $hash{'d'} is false
2146 defined $hash{'d'} is true
2147 defined $hash{'a'} is true
2148 exists $hash{'a'} is true (Perl 5 only)
2149 grep ($_ eq 'a', keys %hash) is true
2150
2151 If you now say
2152
2153 undef $hash{'a'}
2154
2155 your table now reads:
2156
2157 keys values
2158 +------+------+
2159 | a | undef|
2160 | x | 7 |
2161 | d | 0 |
2162 | e | 2 |
2163 +------+------+
2164
2165 and these conditions now hold; changes in caps:
2166
2167 $hash{'a'} is FALSE
2168 $hash{'d'} is false
2169 defined $hash{'d'} is true
2170 defined $hash{'a'} is FALSE
2171 exists $hash{'a'} is true (Perl 5 only)
2172 grep ($_ eq 'a', keys %hash) is true
2173
2174 Notice the last two: you have an undef value, but a defined key!
2175
2176 Now, consider this:
2177
2178 delete $hash{'a'}
2179
2180 your table now reads:
2181
2182 keys values
2183 +------+------+
2184 | x | 7 |
2185 | d | 0 |
2186 | e | 2 |
2187 +------+------+
2188
2189 and these conditions now hold; changes in caps:
2190
2191 $hash{'a'} is false
2192 $hash{'d'} is false
2193 defined $hash{'d'} is true
2194 defined $hash{'a'} is false
2195 exists $hash{'a'} is FALSE (Perl 5 only)
2196 grep ($_ eq 'a', keys %hash) is FALSE
2197
2198 See, the whole entry is gone!
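
If you'd like to check those tables for yourself, this sketch prints
the three tests after each step. The "status" helper is a hypothetical
name invented for the example:

```perl
use strict;
use warnings;

my %hash = ( a => 3, x => 7, d => 0, e => 2 );

# Hypothetical helper: report exists/defined/truth for one key.
sub status {
    my ($h, $key) = @_;
    return sprintf "exists=%d defined=%d true=%d",
        ( exists($h->{$key})  ? 1 : 0 ),
        ( defined($h->{$key}) ? 1 : 0 ),
        ( $h->{$key}          ? 1 : 0 );
}

print "initially:    ", status(\%hash, 'a'), "\n";
undef $hash{'a'};
print "after undef:  ", status(\%hash, 'a'), "\n";
delete $hash{'a'};
print "after delete: ", status(\%hash, 'a'), "\n";
```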
2199
2200 Why don't my tied hashes make the defined/exists distinction?
2201 This depends on the tied hash's implementation of EXISTS(). For
2202 example, there isn't the concept of undef with hashes that are tied to
2203 DBM* files. It also means that exists() and defined() do the same thing
2204 with a DBM* file, and what they end up doing is not what they do with
2205 ordinary hashes.
2206
2207 How do I reset an each() operation part-way through?
2208 (contributed by brian d foy)
2209
2210 You can use the "keys" or "values" functions to reset "each". To simply
2211 reset the iterator used by "each" without doing anything else, use one
2212 of them in void context:
2213
2214 keys %hash; # resets iterator, nothing else.
2215 values %hash; # resets iterator, nothing else.
2216
2217 See the documentation for "each" in perlfunc.
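
The following sketch abandons an iteration after one pair, resets the
iterator, and then confirms that a fresh loop visits every pair. The
hash contents and the $seen counter are made up for the example:

```perl
use strict;
use warnings;

my %hash = ( a => 1, b => 2, c => 3 );

# Take one pair and stop: the iterator is now left part-way through.
my ($first) = each %hash;

keys %hash;    # void context: resets the iterator, nothing else

# A fresh each() loop starts from the beginning again.
my $seen = 0;
while ( my ($key, $value) = each %hash ) {
    $seen++;
}

print "saw $seen of ", scalar(keys %hash), " pairs\n";
```

Without the reset, the loop would pick up where the first each() left
off and miss a pair.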
2218
2219 How can I get the unique keys from two hashes?
2220 First you extract the keys from the hashes into lists, then solve the
2221 "removing duplicates" problem described above. For example:
2222
2223 %seen = ();
2224 for $element (keys(%foo), keys(%bar)) {
2225 $seen{$element}++;
2226 }
2227 @uniq = keys %seen;
2228
2229 Or more succinctly:
2230
2231 @uniq = keys %{{%foo,%bar}};
2232
2233 Or if you really want to save space:
2234
2235 %seen = ();
2236 while (defined ($key = each %foo)) {
2237 $seen{$key}++;
2238 }
2239 while (defined ($key = each %bar)) {
2240 $seen{$key}++;
2241 }
2242 @uniq = keys %seen;
2243
2244 How can I store a multidimensional array in a DBM file?
2245 Either stringify the structure yourself (no fun), or else get the MLDBM
2246 (which uses Data::Dumper) module from CPAN and layer it on top of
2247 either DB_File or GDBM_File.
2248
2249 How can I make my hash remember the order I put elements into it?
Use the "Tie::IxHash" module from CPAN.
2251
2252 use Tie::IxHash;
2253
2254 tie my %myhash, 'Tie::IxHash';
2255
2256 for (my $i=0; $i<20; $i++) {
2257 $myhash{$i} = 2*$i;
2258 }
2259
2260 my @keys = keys %myhash;
2261 # @keys = (0,1,2,3,...)
2262
2263 Why does passing a subroutine an undefined element in a hash create it?
2264 (contributed by brian d foy)
2265
2266 Are you using a really old version of Perl?
2267
2268 Normally, accessing a hash key's value for a nonexistent key will not
2269 create the key.
2270
2271 my %hash = ();
2272 my $value = $hash{ 'foo' };
2273 print "This won't print\n" if exists $hash{ 'foo' };
2274
2275 Passing $hash{ 'foo' } to a subroutine used to be a special case,
2276 though. Since you could assign directly to $_[0], Perl had to be ready
2277 to make that assignment so it created the hash key ahead of time:
2278
2279 my_sub( $hash{ 'foo' } );
2280 print "This will print before 5.004\n" if exists $hash{ 'foo' };
2281
2282 sub my_sub {
2283 # $_[0] = 'bar'; # create hash key in case you do this
2284 1;
2285 }
2286
2287 Since Perl 5.004, however, this situation is a special case and Perl
2288 creates the hash key only when you make the assignment:
2289
2290 my_sub( $hash{ 'foo' } );
2291 print "This will print, even after 5.004\n" if exists $hash{ 'foo' };
2292
2293 sub my_sub {
2294 $_[0] = 'bar';
2295 }
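
You can watch the deferred creation happen by comparing a subroutine
that only reads its argument with one that assigns to it. This is a
sketch for Perl 5.004 or later; "read_only" and "assigns" are
hypothetical names:

```perl
use strict;
use warnings;

my %hash;

sub read_only { my $copy = $_[0]; return }   # never assigns to $_[0]
sub assigns   { $_[0] = 'bar';    return }   # assigns through the alias

read_only( $hash{'foo'} );
print "foo exists after read_only\n" if exists $hash{'foo'};   # not printed

assigns( $hash{'foo'} );
print "foo exists after assigns\n"   if exists $hash{'foo'};   # printed
```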
2296
2297 However, if you want the old behavior (and think carefully about that
2298 because it's a weird side effect), you can pass a hash slice instead.
2299 Perl 5.004 didn't make this a special case:
2300
2301 my_sub( @hash{ qw/foo/ } );
2302
2303 How can I make the Perl equivalent of a C structure/C++ class/hash or array
2304 of hashes or arrays?
2305 Usually a hash ref, perhaps like this:
2306
2307 $record = {
2308 NAME => "Jason",
2309 EMPNO => 132,
2310 TITLE => "deputy peon",
2311 AGE => 23,
2312 SALARY => 37_000,
2313 PALS => [ "Norbert", "Rhys", "Phineas"],
2314 };
2315
References are documented in perlref and perlreftut.
2317 Examples of complex data structures are given in perldsc and perllol.
2318 Examples of structures and object-oriented classes are in perltoot.
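
Once you have such a record, you reach its fields with the arrow
operator. A brief sketch using the record above; the extra pal added at
the end is made up:

```perl
use strict;
use warnings;

my $record = {
    NAME   => "Jason",
    EMPNO  => 132,
    TITLE  => "deputy peon",
    AGE    => 23,
    SALARY => 37_000,
    PALS   => [ "Norbert", "Rhys", "Phineas" ],
};

print "Name:      $record->{NAME}\n";
print "First pal: $record->{PALS}[0]\n";
print "All pals:  @{ $record->{PALS} }\n";

$record->{AGE}++;                          # update a field in place
push @{ $record->{PALS} }, "Quentin";      # hypothetical extra entry
```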
2319
2320 How can I use a reference as a hash key?
2321 (contributed by brian d foy and Ben Morrow)
2322
2323 Hash keys are strings, so you can't really use a reference as the key.
2324 When you try to do that, perl turns the reference into its stringified
2325 form (for instance, "HASH(0xDEADBEEF)"). From there you can't get back
2326 the reference from the stringified form, at least without doing some
2327 extra work on your own.
2328
2329 Remember that the entry in the hash will still be there even if the
2330 referenced variable goes out of scope, and that it is entirely
2331 possible for Perl to subsequently allocate a different variable at the
2332 same address. This will mean a new variable might accidentally be
2333 associated with the value for an old.
2334
If you have Perl 5.10 or later, and you just want to store a value
against the reference for lookup later, you can use the core
Hash::Util::FieldHash module. This will also handle renaming the keys
if you use multiple threads (which causes all variables to be
reallocated at new addresses, changing their stringification), and
garbage-collecting the entries when the referenced variable goes out of
scope.
2342
2343 If you actually need to be able to get a real reference back from each
2344 hash entry, you can use the Tie::RefHash module, which does the
2345 required work for you.
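
This sketch contrasts an ordinary hash, where the key comes back as a
plain string, with a hash tied to Tie::RefHash, where it comes back as
a usable reference. The variable names are made up:

```perl
use strict;
use warnings;
use Tie::RefHash;

my $aref = [ 1, 2, 3 ];

# Ordinary hash: the key is only the stringified form of the reference.
my %plain = ( $aref => 'plain' );
my ($plain_key) = keys %plain;
print "plain key is a reference? ", ( ref($plain_key) ? "yes" : "no" ), "\n";

# Tie::RefHash: keys() hands back the original reference.
tie my %refhash, 'Tie::RefHash';
$refhash{$aref} = 'tied';
my ($tied_key) = keys %refhash;
print "tied key is a reference?  ", ( ref($tied_key) ? "yes" : "no" ), "\n";
print "elements via tied key: @$tied_key\n";
```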
2346
2348 How do I handle binary data correctly?
2349 Perl is binary clean, so it can handle binary data just fine. On
2350 Windows or DOS, however, you have to use "binmode" for binary files to
2351 avoid conversions for line endings. In general, you should use
2352 "binmode" any time you want to work with binary data.
2353
2354 Also see "binmode" in perlfunc or perlopentut.
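
As a minimal sketch, this writes all 256 byte values to a scratch file
and reads them back with "binmode" on both handles. The use of
File::Temp for the scratch file is just a convenience chosen for the
example:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $bytes = pack "C*", 0 .. 255;    # every byte value, including \r and \n

my ($out, $filename) = tempfile( UNLINK => 1 );   # removed at exit
binmode $out;                       # no line-ending translation on write
print {$out} $bytes;
close $out or die "close failed: $!";

open my $in, '<', $filename or die "cannot open $filename: $!";
binmode $in;                        # nor on read
my $copy = do { local $/; <$in> };  # slurp the whole file
close $in;

print "round trip ", ( $copy eq $bytes ? "ok" : "FAILED" ), "\n";
```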
2355
2356 If you're concerned about 8-bit textual data then see perllocale. If
2357 you want to deal with multibyte characters, however, there are some
2358 gotchas. See the section on Regular Expressions.
2359
2360 How do I determine whether a scalar is a number/whole/integer/float?
2361 Assuming that you don't care about IEEE notations like "NaN" or
2362 "Infinity", you probably just want to use a regular expression.
2363
2364 if (/\D/) { print "has nondigits\n" }
2365 if (/^\d+$/) { print "is a whole number\n" }
2366 if (/^-?\d+$/) { print "is an integer\n" }
2367 if (/^[+-]?\d+$/) { print "is a +/- integer\n" }
2368 if (/^-?\d+\.?\d*$/) { print "is a real number\n" }
2369 if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number\n" }
2370 if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
2371 { print "a C float\n" }
2372
2373 There are also some commonly used modules for the task. Scalar::Util
2374 (distributed with 5.8) provides access to perl's internal function
2375 "looks_like_number" for determining whether a variable looks like a
2376 number. Data::Types exports functions that validate data types using
2377 both the above and other regular expressions. Thirdly, there is
2378 "Regexp::Common" which has regular expressions to match various types
2379 of numbers. Those three modules are available from the CPAN.
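
For instance, here is a small sketch that runs "looks_like_number" over
a few made-up candidate strings:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

for my $candidate ( '42', '-3.14', '1e5', 'abc', '', '0x1F' ) {
    printf "%-8s %s\n", "'$candidate'",
        looks_like_number($candidate) ? "looks like a number"
                                      : "does not look like a number";
}
```

The function answers the question "would Perl treat this string as a
number?", so its idea of a number may be broader or narrower than the
regular expressions above.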
2380
2381 If you're on a POSIX system, Perl supports the "POSIX::strtod"
2382 function. Its semantics are somewhat cumbersome, so here's a "getnum"
2383 wrapper function for more convenient access. This function takes a
2384 string and returns the number it found, or "undef" for input that isn't
2385 a C float. The "is_numeric" function is a front end to "getnum" if you
2386 just want to say, "Is this a float?"
2387
2388 sub getnum {
2389 use POSIX qw(strtod);
2390 my $str = shift;
2391 $str =~ s/^\s+//;
2392 $str =~ s/\s+$//;
2393 $! = 0;
2394 my($num, $unparsed) = strtod($str);
2395 if (($str eq '') || ($unparsed != 0) || $!) {
2396 return undef;
2397 }
2398 else {
2399 return $num;
2400 }
2401 }
2402
2403 sub is_numeric { defined getnum($_[0]) }
2404
2405 Or you could check out the String::Scanf module on the CPAN instead.
The "POSIX" module (part of the standard Perl distribution) provides
the "strtod" and "strtol" functions for converting strings to doubles
and longs, respectively.
2409
2410 How do I keep persistent data across program calls?
2411 For some specific applications, you can use one of the DBM modules.
2412 See AnyDBM_File. More generically, you should consult the "FreezeThaw"
2413 or "Storable" modules from CPAN. Starting from Perl 5.8 "Storable" is
2414 part of the standard distribution. Here's one example using
2415 "Storable"'s "store" and "retrieve" functions:
2416
2417 use Storable;
2418 store(\%hash, "filename");
2419
2420 # later on...
2421 $href = retrieve("filename"); # by ref
2422 %hash = %{ retrieve("filename") }; # direct to hash
2423
2424 How do I print out or copy a recursive data structure?
The "Data::Dumper" module on CPAN (or the 5.005 release of Perl) is
great for printing out data structures. The "Storable" module on CPAN
(or the 5.8 release of Perl) provides a function called "dclone" that
recursively copies its argument.
2429
2430 use Storable qw(dclone);
2431 $r2 = dclone($r1);
2432
2433 Where $r1 can be a reference to any kind of data structure you'd like.
2434 It will be deeply copied. Because "dclone" takes and returns
2435 references, you'd have to add extra punctuation if you had a hash of
2436 arrays that you wanted to copy.
2437
2438 %newhash = %{ dclone(\%oldhash) };
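
To see why the deep copy matters, compare it with a plain assignment.
In this sketch (sample data and the %shallow and %deep names are made
up) the original is changed after both copies are taken:

```perl
use strict;
use warnings;
use Storable qw(dclone);

my %oldhash = ( fruits => [ 'apple', 'banana' ] );

my %shallow = %oldhash;                   # copies only the top level
my %deep    = %{ dclone(\%oldhash) };     # copies the nested array too

push @{ $oldhash{fruits} }, 'cherry';     # modify the original

print "shallow copy sees: @{ $shallow{fruits} }\n";   # apple banana cherry
print "deep copy sees:    @{ $deep{fruits} }\n";      # apple banana
```

The shallow copy still shares the inner array with the original, while
the "dclone" copy is fully independent.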
2439
2440 How do I define methods for every class/object?
2441 (contributed by Ben Morrow)
2442
2443 You can use the "UNIVERSAL" class (see UNIVERSAL). However, please be
2444 very careful to consider the consequences of doing this: adding methods
to every object is very likely to have unintended consequences. If
possible, it would be better to have all your objects inherit from some
common base class, or to use an object system like Moose that supports
roles.
2449
2450 How do I verify a credit card checksum?
2451 Get the "Business::CreditCard" module from CPAN.
2452
2453 How do I pack arrays of doubles or floats for XS code?
2454 The arrays.h/arrays.c code in the "PGPLOT" module on CPAN does just
2455 this. If you're doing a lot of float or double processing, consider
2456 using the "PDL" module from CPAN instead--it makes number-crunching
2457 easy.
2458
2459 See <http://search.cpan.org/dist/PGPLOT> for the code.
2460
2462 Copyright (c) 1997-2010 Tom Christiansen, Nathan Torkington, and other
2463 authors as noted. All rights reserved.
2464
2465 This documentation is free; you can redistribute it and/or modify it
2466 under the same terms as Perl itself.
2467
2468 Irrespective of its distribution, all code examples in this file are
2469 hereby placed into the public domain. You are permitted and encouraged
2470 to use this code in your own programs for fun or for profit as you see
2471 fit. A simple comment in the code giving credit would be courteous but
2472 is not required.
2473
2474
2475
2476perl v5.12.4 2011-06-07 PERLFAQ4(1)