PERLFAQ4(1)         Perl Programmers Reference Guide         PERLFAQ4(1)



perlfaq4 - Data Manipulation

This section of the FAQ answers questions related to manipulating
numbers, dates, strings, arrays, hashes, and miscellaneous data issues.
11
13 Why am I getting long decimals (eg, 19.9499999999999) instead of the
14 numbers I should be getting (eg, 19.95)?
15 Internally, your computer represents floating-point numbers in binary.
16 Digital (as in powers of two) computers cannot store all numbers
17 exactly. Some real numbers lose precision in the process. This is a
18 problem with how computers store numbers and affects all computer
19 languages, not just Perl.
20
21 perlnumber shows the gory details of number representations and
22 conversions.
23
To limit the number of decimal places in your numbers, you can use the
printf or sprintf function. See "Floating-point Arithmetic" in perlop
for more details.

    printf "%.2f", 10/3;

    my $number = sprintf "%.2f", 10/3;
31
32 Why is int() broken?
33 Your "int()" is most probably working just fine. It's the numbers that
34 aren't quite what you think.
35
36 First, see the answer to "Why am I getting long decimals (eg,
37 19.9499999999999) instead of the numbers I should be getting (eg,
38 19.95)?".
39
For example, this

    print int(0.6/0.2-2), "\n";

will on most computers print 0, not 1, because even such simple numbers
as 0.6 and 0.2 cannot be represented exactly by floating-point numbers.
What you think of in the above as 'three' is really more like
2.9999999999999995559.
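
A minimal sketch of two common workarounds, assuming you already know
the result should be close to a whole number. The variable names are
illustrative only, and the "add 0.5" form is valid only for
non-negative values.

```perl
use strict;
use warnings;

my $value = 0.6 / 0.2 - 2;    # mathematically 1, stored as slightly less

my $truncated = int($value);              # 0 on typical IEEE 754 hardware
my $rounded   = sprintf("%.0f", $value);  # rounds to the nearest integer
my $half_up   = int($value + 0.5);        # non-negative values only

print "$truncated $rounded $half_up\n";
```

Which form you choose depends on whether you want rounding to nearest
or truncation of a value you have first nudged past the integer
boundary.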
48
49 Why isn't my octal data interpreted correctly?
50 (contributed by brian d foy)
51
52 You're probably trying to convert a string to a number, which Perl only
53 converts as a decimal number. When Perl converts a string to a number,
54 it ignores leading spaces and zeroes, then assumes the rest of the
55 digits are in base 10:
56
57 my $string = '0644';
58
59 print $string + 0; # prints 644
60
61 print $string + 44; # prints 688, certainly not octal!
62
This problem usually involves one of the Perl built-ins that has the
same name as a Unix command that uses octal numbers as arguments on the
command line. In this example, "chmod" on the command line knows that
its first argument is octal because that's what it does:
67
68 %prompt> chmod 644 file
69
70 If you want to use the same literal digits (644) in Perl, you have to
71 tell Perl to treat them as octal numbers either by prefixing the digits
72 with a 0 or using "oct":
73
74 chmod( 0644, $file); # right, has leading zero
75 chmod( oct(644), $file ); # also correct
76
77 The problem comes in when you take your numbers from something that
78 Perl thinks is a string, such as a command line argument in @ARGV:
79
80 chmod( $ARGV[0], $file); # wrong, even if "0644"
81
82 chmod( oct($ARGV[0]), $file ); # correct, treat string as octal
83
84 You can always check the value you're using by printing it in octal
85 notation to ensure it matches what you think it should be. Print it in
86 octal and decimal format:
87
88 printf "0%o %d", $number, $number;
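
As a short illustration of the point above, "oct" reads a string as
octal unless it carries a "0x" or "0b" prefix, in which case it reads
hexadecimal or binary. The variable names here are made up for the
example.

```perl
use strict;
use warnings;

# All four strings describe the same permission bits (rw-r--r--).
my $from_octal  = oct("0644");          # 420
my $from_plain  = oct("644");           # 420, the leading zero is optional
my $from_hex    = oct("0x1A4");         # 420
my $from_binary = oct("0b110100100");   # 420

printf "0%o %d\n", $_, $_ for $from_octal, $from_plain, $from_hex, $from_binary;
```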
89
90 Does Perl have a round() function? What about ceil() and floor()? Trig
91 functions?
92 Remember that "int()" merely truncates toward 0. For rounding to a
93 certain number of digits, "sprintf()" or "printf()" is usually the
94 easiest route.
95
96 printf("%.3f", 3.1415926535); # prints 3.142
97
98 The "POSIX" module (part of the standard Perl distribution) implements
99 "ceil()", "floor()", and a number of other mathematical and
100 trigonometric functions.
101
102 use POSIX;
103 $ceil = ceil(3.5); # 4
104 $floor = floor(3.5); # 3
105
106 In 5.000 to 5.003 perls, trigonometry was done in the "Math::Complex"
107 module. With 5.004, the "Math::Trig" module (part of the standard Perl
108 distribution) implements the trigonometric functions. Internally it
109 uses the "Math::Complex" module and some functions can break out from
110 the real axis into the complex plane, for example the inverse sine of
111 2.
112
113 Rounding in financial applications can have serious implications, and
114 the rounding method used should be specified precisely. In these
115 cases, it probably pays not to trust whichever system rounding is being
116 used by Perl, but to instead implement the rounding function you need
117 yourself.
118
119 To see why, notice how you'll still have an issue on half-way-point
120 alternation:
121
122 for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}
123
124 0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
125 0.8 0.8 0.9 0.9 1.0 1.0
126
127 Don't blame Perl. It's the same as in C. IEEE says we have to do
128 this. Perl numbers whose absolute values are integers under 2**31 (on
129 32 bit machines) will work pretty much like mathematical integers.
130 Other numbers are not guaranteed.
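
One way to sidestep floating-point surprises in financial code is to
keep amounts as integers and apply the rounding rule yourself. The
sketch below is only an illustration: "tax_in_cents" is a hypothetical
helper, it assumes non-negative amounts in whole cents and a rate in
basis points (825 means 8.25%), and it rounds half-way cases up.

```perl
use strict;
use warnings;

# Hypothetical helper: tax on an amount held in integer cents, with the
# rate in basis points, rounding half a cent upward.
sub tax_in_cents {
    my ($cents, $rate_bp) = @_;
    return int( ($cents * $rate_bp + 5_000) / 10_000 );
}

my $tax      = tax_in_cents(1995, 825);   # 19.95 taxed at 8.25%
my $half_way = tax_in_cents(200, 25);     # exactly half a cent before rounding

printf "tax: %d cents, half-way case: %d cent\n", $tax, $half_way;
```

Because every intermediate value is a modest integer, the arithmetic
stays exact, and the rounding rule is visible in the code rather than
left to the system.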
131
132 How do I convert between numeric representations/bases/radixes?
As always with Perl there is more than one way to do it. Below are a
few examples of approaches to making common conversions between number
representations. This is intended to be representative rather than
exhaustive.
137
138 Some of the examples later in perlfaq4 use the "Bit::Vector" module
139 from CPAN. The reason you might choose "Bit::Vector" over the perl
140 built in functions is that it works with numbers of ANY size, that it
141 is optimized for speed on some operations, and for at least some
142 programmers the notation might be familiar.
143
144 How do I convert hexadecimal into decimal
145 Using perl's built in conversion of "0x" notation:
146
147 $dec = 0xDEADBEEF;
148
149 Using the "hex" function:
150
151 $dec = hex("DEADBEEF");
152
153 Using "pack":
154
155 $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));
156
157 Using the CPAN module "Bit::Vector":
158
159 use Bit::Vector;
160 $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
161 $dec = $vec->to_Dec();
162
163 How do I convert from decimal to hexadecimal
164 Using "sprintf":
165
166 $hex = sprintf("%X", 3735928559); # upper case A-F
167 $hex = sprintf("%x", 3735928559); # lower case a-f
168
169 Using "unpack":
170
171 $hex = unpack("H*", pack("N", 3735928559));
172
173 Using "Bit::Vector":
174
175 use Bit::Vector;
176 $vec = Bit::Vector->new_Dec(32, -559038737);
177 $hex = $vec->to_Hex();
178
179 And "Bit::Vector" supports odd bit counts:
180
181 use Bit::Vector;
182 $vec = Bit::Vector->new_Dec(33, 3735928559);
183 $vec->Resize(32); # suppress leading 0 if unwanted
184 $hex = $vec->to_Hex();
185
186 How do I convert from octal to decimal
187 Using Perl's built in conversion of numbers with leading zeros:
188
189 $dec = 033653337357; # note the leading 0!
190
191 Using the "oct" function:
192
193 $dec = oct("33653337357");
194
195 Using "Bit::Vector":
196
197 use Bit::Vector;
198 $vec = Bit::Vector->new(32);
199 $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
200 $dec = $vec->to_Dec();
201
202 How do I convert from decimal to octal
203 Using "sprintf":
204
205 $oct = sprintf("%o", 3735928559);
206
207 Using "Bit::Vector":
208
209 use Bit::Vector;
210 $vec = Bit::Vector->new_Dec(32, -559038737);
211 $oct = reverse join('', $vec->Chunk_List_Read(3));
212
213 How do I convert from binary to decimal
Perl 5.6 and later let you write binary numbers directly with the "0b"
notation:
216
217 $number = 0b10110110;
218
219 Using "oct":
220
221 my $input = "10110110";
222 $decimal = oct( "0b$input" );
223
224 Using "pack" and "ord":
225
226 $decimal = ord(pack('B8', '10110110'));
227
228 Using "pack" and "unpack" for larger strings:
229
230 $int = unpack("N", pack("B32",
231 substr("0" x 32 . "11110101011011011111011101111", -32)));
232 $dec = sprintf("%d", $int);
233
234 # substr() is used to left pad a 32 character string with zeros.
235
Using "Bit::Vector":

    use Bit::Vector;
    $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
    $dec = $vec->to_Dec();
240
241 How do I convert from decimal to binary
242 Using "sprintf" (perl 5.6+):
243
244 $bin = sprintf("%b", 3735928559);
245
246 Using "unpack":
247
248 $bin = unpack("B*", pack("N", 3735928559));
249
250 Using "Bit::Vector":
251
252 use Bit::Vector;
253 $vec = Bit::Vector->new_Dec(32, -559038737);
254 $bin = $vec->to_Bin();
255
256 The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
257 are left as an exercise to the inclined reader.
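
For the inclined reader, one possible approach is to go through a plain
number: convert the input with "hex" or "oct", then format the result
with "sprintf". This sketch assumes values that fit in 32 bits, and the
variable names are illustrative.

```perl
use strict;
use warnings;

my $hex  = "DEADBEEF";
my $bits = "11011110101011011011111011101111";

my $hex_to_oct = sprintf( "%o", hex($hex) );         # "33653337357"
my $bin_to_hex = sprintf( "%X", oct("0b$bits") );    # "DEADBEEF"
my $hex_to_bin = sprintf( "%b", hex($hex) );         # same digits as $bits

print "$hex_to_oct $bin_to_hex $hex_to_bin\n";
```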
258
259 Why doesn't & work the way I want it to?
The behavior of the bitwise operators depends on whether they're used
on numbers or strings. Given strings, the operators treat each string
as a series of bits and work with that (the string "3" is the bit
pattern 00110011). Given numbers, they work with the binary form of the
number (the number 3 is treated as the bit pattern 00000011).
265
266 So, saying "11 & 3" performs the "and" operation on numbers (yielding
267 3). Saying "11" & "3" performs the "and" operation on strings
268 (yielding "1").
269
270 Most problems with "&" and "|" arise because the programmer thinks they
271 have a number but really it's a string. The rest arise because the
272 programmer says:
273
274 if ("\020\020" & "\101\101") {
275 # ...
276 }
277
but a string consisting of two null bytes (the result of "\020\020" &
"\101\101") is not a false value in Perl. To test whether every byte of
the result is null, you need:

    if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
        # ...
    }
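
The following sketch shows both pitfalls side by side. It assumes the
default operator behavior (that is, the "bitwise" feature is not
enabled), and the variable names are illustrative.

```perl
use strict;
use warnings;

my ($x, $y) = ("11", "3");               # strings, as if read from input

my $string_and  = $x & $y;               # "1": bytewise AND of "11" and "3"
my $numeric_and = ($x + 0) & ($y + 0);   # 3: both operands forced to numbers

# A string of null bytes is true in boolean context, so inspect its content.
my $mask_result = "\020\020" & "\101\101";
my $any_bit_set = $mask_result =~ /[^\000]/ ? 1 : 0;   # 0: every byte is null

print "$string_and $numeric_and $any_bit_set\n";
```

Adding 0 to each operand is one simple way to make sure "&" sees
numbers when the values came from input.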
284
285 How do I multiply matrices?
286 Use the Math::Matrix or Math::MatrixReal modules (available from CPAN)
287 or the PDL extension (also available from CPAN).
288
289 How do I perform an operation on a series of integers?
290 To call a function on each element in an array, and collect the
291 results, use:
292
293 @results = map { my_func($_) } @array;
294
295 For example:
296
297 @triple = map { 3 * $_ } @single;
298
299 To call a function on each element of an array, but ignore the results:
300
301 foreach $iterator (@array) {
302 some_func($iterator);
303 }
304
305 To call a function on each integer in a (small) range, you can use:
306
307 @results = map { some_func($_) } (5 .. 25);
308
but you should be aware that the ".." operator creates a list of all
integers in the range. This can take a lot of memory for large ranges.
Instead use:

    @results = ();
    for ($i=5; $i <= 500_005; $i++) {
        push(@results, some_func($i));
    }

This situation has been fixed in Perl 5.005. Use of ".." in a "for"
loop will iterate over the range, without creating the entire range.

    for my $i (5 .. 500_005) {
        push(@results, some_func($i));
    }

will not create a list of 500,001 integers.
326
327 How can I output Roman numerals?
Get the "Roman" module from CPAN
(http://www.cpan.org/modules/by-module/Roman).
329
330 Why aren't my random numbers random?
331 If you're using a version of Perl before 5.004, you must call "srand"
332 once at the start of your program to seed the random number generator.
333
334 BEGIN { srand() if $] < 5.004 }
335
336 5.004 and later automatically call "srand" at the beginning. Don't
337 call "srand" more than once--you make your numbers less random, rather
338 than more.
339
Computers are good at being predictable and bad at being random
(despite appearances caused by bugs in your programs :-). The random
article in the "Far More Than You Ever Wanted To Know" collection at
http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz , courtesy of Tom
Phoenix, talks more about this. John von Neumann said, "Anyone who
attempts to generate random numbers by deterministic means is, of
course, living in a state of sin."

If you want numbers that are more random than those "rand" with "srand"
provides, you should also check out the "Math::TrulyRandom" module from
CPAN. It uses the imperfections in your system's timer to generate
random numbers, but this takes quite a while. If you want a better
pseudorandom generator than comes with your operating system, look at
"Numerical Recipes in C" at http://www.nr.com/ .
354
355 How do I get a random number between X and Y?
356 To get a random number between two values, you can use the "rand()"
357 built-in to get a random number between 0 and 1. From there, you shift
358 that into the range that you want.
359
360 "rand($x)" returns a number such that "0 <= rand($x) < $x". Thus what
361 you want to have perl figure out is a random number in the range from 0
362 to the difference between your X and Y.
363
364 That is, to get a number between 10 and 15, inclusive, you want a
365 random number between 0 and 5 that you can then add to 10.
366
367 my $number = 10 + int rand( 15-10+1 ); # ( 10,11,12,13,14, or 15 )
368
Hence you derive the following simple function to abstract that. It
selects a random integer between the two given integers (inclusive).
For example: "random_int_between(50,120)".
372
373 sub random_int_between {
374 my($min, $max) = @_;
375 # Assumes that the two arguments are integers themselves!
376 return $min if $min == $max;
377 ($min, $max) = ($max, $min) if $min > $max;
378 return $min + int rand(1 + $max - $min);
379 }
380
382 How do I find the day or week of the year?
The localtime function returns the day of the year as a zero-based
number (1 January is day 0). Without an argument localtime uses the
current time.

    $day_of_year = (localtime)[7];
387
388 The "POSIX" module can also format a date as the day of the year or
389 week of the year.
390
391 use POSIX qw/strftime/;
392 my $day_of_year = strftime "%j", localtime;
393 my $week_of_year = strftime "%W", localtime;
394
395 To get the day of year for any date, use "POSIX"'s "mktime" to get a
396 time in epoch seconds for the argument to localtime.
397
398 use POSIX qw/mktime strftime/;
399 my $week_of_year = strftime "%W",
400 localtime( mktime( 0, 0, 0, 18, 11, 87 ) );
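
Note that the two approaches count differently: element 7 of the
"localtime" list starts at 0, while the "%j" format starts at 001. A
small sketch for a fixed date (noon is used here only to stay clear of
any midnight edge cases; the variable names are illustrative):

```perl
use strict;
use warnings;
use POSIX qw(mktime strftime);

# 18 December 1987 at noon local time (month is 0-based, year is minus 1900).
my @date = localtime( mktime( 0, 0, 12, 18, 11, 87 ) );

my $yday_zero_based = $date[7];                 # 1 January is day 0
my $yday_one_based  = strftime( "%j", @date );  # 1 January is day 001

print "$yday_zero_based $yday_one_based\n";
```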
401
The "Date::Calc" module provides two functions to calculate these.

    use Date::Calc qw( Day_of_Year Week_of_Year );
    my $day_of_year = Day_of_Year( 1987, 12, 18 );
    my $week_of_year = Week_of_Year( 1987, 12, 18 );
407
408 How do I find the current century or millennium?
409 Use the following simple functions:
410
411 sub get_century {
412 return int((((localtime(shift || time))[5] + 1999))/100);
413 }
414
415 sub get_millennium {
416 return 1+int((((localtime(shift || time))[5] + 1899))/1000);
417 }
418
419 On some systems, the "POSIX" module's "strftime()" function has been
420 extended in a non-standard way to use a %C format, which they sometimes
421 claim is the "century". It isn't, because on most such systems, this is
422 only the first two digits of the four-digit year, and thus cannot be
423 used to reliably determine the current century or millennium.
424
425 How can I compare two dates and find the difference?
426 (contributed by brian d foy)
427
428 You could just store all your dates as a number and then subtract.
429 Life isn't always that simple though. If you want to work with
430 formatted dates, the "Date::Manip", "Date::Calc", or "DateTime" modules
431 can help you.
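
For the simple "store as a number and subtract" case, a minimal sketch
using the standard "Time::Local" module might look like this. The
helper name "date_to_epoch" is made up for the example, and the dates
are treated as UTC so that summer time cannot change the length of a
day.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: epoch seconds for midnight UTC on a calendar date.
sub date_to_epoch {
    my ($year, $month, $day) = @_;
    return timegm( 0, 0, 0, $day, $month - 1, $year );
}

my $seconds = date_to_epoch( 1988, 1, 1 ) - date_to_epoch( 1987, 12, 18 );
my $days    = $seconds / 86_400;    # every UTC day here is 86,400 seconds

print "The dates are $days days apart\n";
```

For anything involving time zones, business days, or free-form input,
the modules named above are the safer choice.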
432
433 How can I take a string and turn it into epoch seconds?
434 If it's a regular enough string that it always has the same format, you
435 can split it up and pass the parts to "timelocal" in the standard
436 "Time::Local" module. Otherwise, you should look into the "Date::Calc"
437 and "Date::Manip" modules from CPAN.
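
As an illustration of the "regular enough string" case, the sketch
below assumes a timestamp of the form "YYYY-MM-DD hh:mm:ss" that is
already in UTC, and therefore uses "timegm"; use "timelocal" instead if
the string is in local time.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my $stamp = "2001-11-13 01:00:00";    # assumed to be in UTC

my ($year, $mon, $mday, $hour, $min, $sec) =
    $stamp =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/
    or die "Unrecognized timestamp: $stamp\n";

# Months are 0-based in Time::Local, as they are in localtime.
my $epoch = timegm( $sec, $min, $hour, $mday, $mon - 1, $year );

print "$epoch\n";    # 1005613200
```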
438
439 How can I find the Julian Day?
440 (contributed by brian d foy and Dave Cross)
441
442 You can use the "Time::JulianDay" module available on CPAN. Ensure
443 that you really want to find a Julian day, though, as many people have
444 different ideas about Julian days. See
445 http://www.hermetic.ch/cal_stud/jdn.htm for instance.
446
447 You can also try the "DateTime" module, which can convert a date/time
448 to a Julian Day.
449
450 $ perl -MDateTime -le'print DateTime->today->jd'
451 2453401.5
452
453 Or the modified Julian Day
454
455 $ perl -MDateTime -le'print DateTime->today->mjd'
456 53401
457
458 Or even the day of the year (which is what some people think of as a
459 Julian day)
460
461 $ perl -MDateTime -le'print DateTime->today->doy'
462 31
463
464 How do I find yesterday's date?
465 (contributed by brian d foy)
466
Use one of the Date modules. The "DateTime" module makes it simple, and
gives you the same time of day, only the day before.
469
470 use DateTime;
471
472 my $yesterday = DateTime->now->subtract( days => 1 );
473
474 print "Yesterday was $yesterday\n";
475
476 You can also use the "Date::Calc" module using its "Today_and_Now"
477 function.
478
479 use Date::Calc qw( Today_and_Now Add_Delta_DHMS );
480
481 my @date_time = Add_Delta_DHMS( Today_and_Now(), -1, 0, 0, 0 );
482
483 print "@date_time\n";
484
485 Most people try to use the time rather than the calendar to figure out
486 dates, but that assumes that days are twenty-four hours each. For most
487 people, there are two days a year when they aren't: the switch to and
488 from summer time throws this off. Let the modules do the work.
489
490 If you absolutely must do it yourself (or can't use one of the
491 modules), here's a solution using "Time::Local", which comes with Perl:
492
493 # contributed by Gunnar Hjalmarsson
494 use Time::Local;
495 my $today = timelocal 0, 0, 12, ( localtime )[3..5];
496 my ($d, $m, $y) = ( localtime $today-86400 )[3..5];
497 printf "Yesterday: %d-%02d-%02d\n", $y+1900, $m+1, $d;
498
499 In this case, you measure the day starting at noon, and subtract 24
500 hours. Even if the length of the calendar day is 23 or 25 hours, you'll
501 still end up on the previous calendar day, although not at noon. Since
502 you don't care about the time, the one hour difference doesn't matter
503 and you end up with the previous date.
504
505 Does Perl have a Year 2000 problem? Is Perl Y2K compliant?
506 Short answer: No, Perl does not have a Year 2000 problem. Yes, Perl is
507 Y2K compliant (whatever that means). The programmers you've hired to
508 use it, however, probably are not.
509
Long answer: The question betrays a misunderstanding of the issue.
Perl is just as Y2K compliant as your pencil--no more, and no less.
Can you use your pencil to write a non-Y2K-compliant memo? Of course
you can. Is that the pencil's fault? Of course it isn't.
514
515 The date and time functions supplied with Perl (gmtime and localtime)
516 supply adequate information to determine the year well beyond 2000
517 (2038 is when trouble strikes for 32-bit machines). The year returned
518 by these functions when used in a list context is the year minus 1900.
519 For years between 1910 and 1999 this happens to be a 2-digit decimal
520 number. To avoid the year 2000 problem simply do not treat the year as
521 a 2-digit number. It isn't.
522
523 When gmtime() and localtime() are used in scalar context they return a
524 timestamp string that contains a fully-expanded year. For example,
525 "$timestamp = gmtime(1005613200)" sets $timestamp to "Tue Nov 13
526 01:00:00 2001". There's no year 2000 problem here.
527
528 That doesn't mean that Perl can't be used to create non-Y2K compliant
529 programs. It can. But so can your pencil. It's the fault of the
530 user, not the language. At the risk of inflaming the NRA: "Perl
531 doesn't break Y2K, people do." See http://www.perl.org/about/y2k.html
532 for a longer exposition.
533
535 How do I validate input?
536 (contributed by brian d foy)
537
538 There are many ways to ensure that values are what you expect or want
539 to accept. Besides the specific examples that we cover in the perlfaq,
540 you can also look at the modules with "Assert" and "Validate" in their
541 names, along with other modules such as "Regexp::Common".
542
543 Some modules have validation for particular types of input, such as
544 "Business::ISBN", "Business::CreditCard", "Email::Valid", and
545 "Data::Validate::IP".
546
547 How do I unescape a string?
548 It depends just what you mean by "escape". URL escapes are dealt with
549 in perlfaq9. Shell escapes with the backslash ("\") character are
550 removed with
551
552 s/\\(.)/$1/g;
553
554 This won't expand "\n" or "\t" or any other special escapes.
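
If you do want a few special escapes expanded as well, one possible
approach is a lookup table inside the replacement. This is only a
sketch: the "%special" table is an assumption for the example and
covers just "\n" and "\t".

```perl
use strict;
use warnings;

# Escapes that stand for a different character; any other escaped
# character simply loses its backslash.
my %special = ( n => "\n", t => "\t" );

my $escaped = 'one\ttwo\nthree\\\\four\"five';
( my $plain = $escaped ) =~ s/\\(.)/exists $special{$1} ? $special{$1} : $1/ge;

print "$plain\n";
```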
555
556 How do I remove consecutive pairs of characters?
557 (contributed by brian d foy)
558
559 You can use the substitution operator to find pairs of characters (or
560 runs of characters) and replace them with a single instance. In this
561 substitution, we find a character in "(.)". The memory parentheses
562 store the matched character in the back-reference "\1" and we use that
563 to require that the same thing immediately follow it. We replace that
564 part of the string with the character in $1.
565
566 s/(.)\1/$1/g;
567
568 We can also use the transliteration operator, "tr///". In this example,
569 the search list side of our "tr///" contains nothing, but the "c"
570 option complements that so it contains everything. The replacement list
571 also contains nothing, so the transliteration is almost a no-op since
572 it won't do any replacements (or more exactly, replace the character
573 with itself). However, the "s" option squashes duplicated and
574 consecutive characters in the string so a character does not show up
575 next to itself
576
577 my $str = 'Haarlem'; # in the Netherlands
578 $str =~ tr///cs; # Now Harlem, like in New York
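
The two techniques differ once a character repeats more than twice:
the substitution collapses each pair, while "tr///cs" collapses each
whole run. A short comparison (the sample text is made up), including a
substitution that uses "\1+" to match entire runs:

```perl
use strict;
use warnings;

my $text = "baaad moooon";

( my $pairs    = $text ) =~ s/(.)\1/$1/g;    # each pair becomes one character
( my $runs     = $text ) =~ s/(.)\1+/$1/g;   # each run becomes one character
( my $squeezed = $text ) =~ tr///cs;         # same result as $runs, via tr

print "$pairs | $runs | $squeezed\n";
```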
579
580 How do I expand function calls in a string?
581 (contributed by brian d foy)
582
583 This is documented in perlref, and although it's not the easiest thing
584 to read, it does work. In each of these examples, we call the function
585 inside the braces used to dereference a reference. If we have more than
586 one return value, we can construct and dereference an anonymous array.
587 In this case, we call the function in list context.
588
589 print "The time values are @{ [localtime] }.\n";
590
591 If we want to call the function in scalar context, we have to do a bit
592 more work. We can really have any code we like inside the braces, so we
593 simply have to end with the scalar reference, although how you do that
594 is up to you, and you can use code inside the braces. Note that the use
595 of parens creates a list context, so we need "scalar" to force the
596 scalar context on the function:
597
    print "The time is ${\(scalar localtime)}.\n";

    print "The time is ${ my $x = localtime; \$x }.\n";
601
602 If your function already returns a reference, you don't need to create
603 the reference yourself.
604
605 sub timestamp { my $t = localtime; \$t }
606
607 print "The time is ${ timestamp() }.\n";
608
609 The "Interpolation" module can also do a lot of magic for you. You can
610 specify a variable name, in this case "E", to set up a tied hash that
611 does the interpolation for you. It has several other methods to do this
612 as well.
613
614 use Interpolation E => 'eval';
615 print "The time values are $E{localtime()}.\n";
616
617 In most cases, it is probably easier to simply use string
618 concatenation, which also forces scalar context.
619
620 print "The time is " . localtime() . ".\n";
621
622 How do I find matching/nesting anything?
A classical regular expression cannot do this, no matter how
complicated. To find something between two single characters, a pattern
like "/x([^x]*)x/" will get the intervening bits in $1. For
multi-character delimiters, something more like "/alpha(.*?)omega/"
would be needed. But neither of these deals with nested patterns. For
balanced expressions using "(", "{", "[" or "<" as delimiters, use the
CPAN module Regexp::Common, or see "(??{ code })" in perlre. For other
cases, you'll have to write a parser.
631
If you are serious about writing a parser, there are a number of
modules or oddities that will make your life a lot easier. There are
the CPAN modules "Parse::RecDescent", "Parse::Yapp", and
"Text::Balanced"; and the "byacc" program. Starting with perl 5.8,
"Text::Balanced" is part of the standard distribution.
637
638 One simple destructive, inside-out approach that you might try is to
639 pull out the smallest nesting parts one at a time:
640
641 while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
642 # do something with $1
643 }
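
Perl 5.10 and later also extend the regular expression engine with
recursive subpatterns, which can match balanced delimiters directly.
The following is a sketch for parentheses only, not a general parser;
the pattern and sample text are made up for illustration.

```perl
use strict;
use warnings;

# (?1) recurses into capture group 1, so the group matches an opening
# parenthesis, any mix of non-parentheses and nested groups, then the
# closing parenthesis.
my $balanced = qr/(\((?:[^()]++|(?1))*\))/;

my $text = "call(a, (b + c) * (d - (e))) and more";
my ($group) = $text =~ $balanced;

print "$group\n";    # (a, (b + c) * (d - (e)))
```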
644
645 A more complicated and sneaky approach is to make Perl's regular
646 expression engine do it for you. This is courtesy Dean Inada, and
647 rather has the nature of an Obfuscated Perl Contest entry, but it
648 really does work:
649
650 # $_ contains the string to parse
651 # BEGIN and END are the opening and closing markers for the
652 # nested text.
653
654 @( = ('(','');
655 @) = (')','');
656 ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
657 @$ = (eval{/$re/},$@!~/unmatched/i);
658 print join("\n",@$[0..$#$]) if( $$[-1] );
659
660 How do I reverse a string?
661 Use "reverse()" in scalar context, as documented in "reverse" in
662 perlfunc.
663
664 $reversed = reverse $string;
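
The context matters: in list context "reverse" reverses the order of
its arguments instead, so a single string comes back unchanged. A brief
sketch of the difference (variable names are illustrative):

```perl
use strict;
use warnings;

my $string = "Just another Perl hacker";

my $reversed = reverse $string;                 # scalar context: characters reversed
my ($same)   = reverse $string;                 # list context: one-element list, unchanged
my $forced   = join "", scalar reverse $string; # force scalar context inside a list

print "$reversed\n$same\n$forced\n";
```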
665
666 How do I expand tabs in a string?
667 You can do it yourself:
668
669 1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
670
671 Or you can just use the "Text::Tabs" module (part of the standard Perl
672 distribution).
673
674 use Text::Tabs;
675 @expanded_lines = expand(@lines_with_tabs);
676
677 How do I reformat a paragraph?
678 Use "Text::Wrap" (part of the standard Perl distribution):
679
680 use Text::Wrap;
681 print wrap("\t", ' ', @paragraphs);
682
683 The paragraphs you give to "Text::Wrap" should not contain embedded
684 newlines. "Text::Wrap" doesn't justify the lines (flush-right).
685
686 Or use the CPAN module "Text::Autoformat". Formatting files can be
687 easily done by making a shell alias, like so:
688
689 alias fmt="perl -i -MText::Autoformat -n0777 \
690 -e 'print autoformat $_, {all=>1}' $*"
691
692 See the documentation for "Text::Autoformat" to appreciate its many
693 capabilities.
694
695 How can I access or change N characters of a string?
696 You can access the first characters of a string with substr(). To get
697 the first character, for example, start at position 0 and grab the
698 string of length 1.
699
700 $string = "Just another Perl Hacker";
701 $first_char = substr( $string, 0, 1 ); # 'J'
702
703 To change part of a string, you can use the optional fourth argument
704 which is the replacement string.
705
706 substr( $string, 13, 4, "Perl 5.8.0" );
707
708 You can also use substr() as an lvalue.
709
710 substr( $string, 13, 4 ) = "Perl 5.8.0";
711
712 How do I change the Nth occurrence of something?
713 You have to keep track of N yourself. For example, let's say you want
714 to change the fifth occurrence of "whoever" or "whomever" into
715 "whosoever" or "whomsoever", case insensitively. These all assume that
716 $_ contains the string to be altered.
717
718 $count = 0;
719 s{((whom?)ever)}{
720 ++$count == 5 # is it the 5th?
721 ? "${2}soever" # yes, swap
722 : $1 # renege and leave it there
723 }ige;
724
725 In the more general case, you can use the "/g" modifier in a "while"
726 loop, keeping count of matches.
727
728 $WANT = 3;
729 $count = 0;
730 $_ = "One fish two fish red fish blue fish";
731 while (/(\w+)\s+fish\b/gi) {
732 if (++$count == $WANT) {
733 print "The third fish is a $1 one.\n";
734 }
735 }
736
737 That prints out: "The third fish is a red one." You can also use a
738 repetition count and repeated pattern like this:
739
740 /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
741
742 How can I count the number of occurrences of a substring within a string?
743 There are a number of ways, with varying efficiency. If you want a
744 count of a certain single character (X) within a string, you can use
745 the "tr///" function like so:
746
747 $string = "ThisXlineXhasXsomeXx'sXinXit";
748 $count = ($string =~ tr/X//);
749 print "There are $count X characters in the string";
750
751 This is fine if you are just looking for a single character. However,
752 if you are trying to count multiple character substrings within a
753 larger string, "tr///" won't work. What you can do is wrap a while()
754 loop around a global pattern match. For example, let's count negative
755 integers:
756
    $string = "-9 55 48 -2 23 -76 4 14 -44";
    $count = 0;
    while ($string =~ /-\d+/g) { $count++ }
    print "There are $count negative numbers in the string";
760
761 Another version uses a global match in list context, then assigns the
762 result to a scalar, producing a count of the number of matches.
763
764 $count = () = $string =~ /-\d+/g;
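
For a fixed substring rather than a pattern, a loop around "index" is
another option. This sketch counts non-overlapping occurrences; the
sample data and variable names are illustrative.

```perl
use strict;
use warnings;

my $string = "abracadabra";
my $needle = "abra";

my $count = 0;
my $pos   = 0;
while ( ( $pos = index( $string, $needle, $pos ) ) != -1 ) {
    $count++;
    $pos += length $needle;    # use "$pos++" here to count overlapping matches
}

# The list-assignment idiom gives the same answer for a literal substring.
my $by_regex = () = $string =~ /\Q$needle\E/g;

print "$count $by_regex\n";
```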
765
766 Does Perl have a Year 2038 problem?
767 No, all of Perl's built in date and time functions and modules will
768 work to about 2 billion years before and after 1970.
769
770 Many systems cannot count time past the year 2038. Older versions of
771 Perl were dependent on the system to do date calculation and thus
772 shared their 2038 bug.
773
774 How do I capitalize all the words on one line?
775 (contributed by brian d foy)
776
777 Damian Conway's Text::Autoformat handles all of the thinking for you.
778
779 use Text::Autoformat;
780 my $x = "Dr. Strangelove or: How I Learned to Stop ".
781 "Worrying and Love the Bomb";
782
783 print $x, "\n";
784 for my $style (qw( sentence title highlight )) {
785 print autoformat($x, { case => $style }), "\n";
786 }
787
788 How do you want to capitalize those words?
789
790 FRED AND BARNEY'S LODGE # all uppercase
791 Fred And Barney's Lodge # title case
792 Fred and Barney's Lodge # highlight case
793
794 It's not as easy a problem as it looks. How many words do you think are
795 in there? Wait for it... wait for it.... If you answered 5 you're
796 right. Perl words are groups of "\w+", but that's not what you want to
797 capitalize. How is Perl supposed to know not to capitalize that "s"
798 after the apostrophe? You could try a regular expression:
799
800 $string =~ s/ (
801 (^\w) #at the beginning of the line
802 | # or
803 (\s\w) #preceded by whitespace
804 )
805 /\U$1/xg;
806
807 $string =~ s/([\w']+)/\u\L$1/g;
808
809 Now, what if you don't want to capitalize that "and"? Just use
810 Text::Autoformat and get on with the next problem. :)
811
812 How can I split a [character] delimited string except when inside
813 [character]?
814 Several modules can handle this sort of parsing--"Text::Balanced",
815 "Text::CSV", "Text::CSV_XS", and "Text::ParseWords", among others.
816
817 Take the example case of trying to split a string that is comma-
818 separated into its different fields. You can't use "split(/,/)" because
819 you shouldn't split if the comma is inside quotes. For example, take a
820 data line like this:
821
822 SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
823
824 Due to the restriction of the quotes, this is a fairly complex problem.
825 Thankfully, we have Jeffrey Friedl, author of Mastering Regular
826 Expressions, to handle these for us. He suggests (assuming your string
827 is contained in $text):
828
829 @new = ();
830 push(@new, $+) while $text =~ m{
831 "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
832 | ([^,]+),?
833 | ,
834 }gx;
835 push(@new, undef) if substr($text,-1,1) eq ',';
836
If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (eg,
"like \"this\"").
839
840 Alternatively, the "Text::ParseWords" module (part of the standard Perl
841 distribution) lets you say:
842
843 use Text::ParseWords;
844 @new = quotewords(",", 0, $text);
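
Applied to the sample data line above, a complete run might look like
the following sketch. Be aware that "quotewords" treats apostrophes as
quote characters too, so data containing an unbalanced apostrophe needs
one of the CSV modules instead.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';

# The second argument (0) asks for the quotes themselves to be dropped.
my @fields = quotewords( ",", 0, $text );

printf "%d fields; third is '%s'\n", scalar @fields, $fields[2];
```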
845
846 How do I strip blank space from the beginning/end of a string?
847 (contributed by brian d foy)
848
849 A substitution can do this for you. For a single line, you want to
850 replace all the leading or trailing whitespace with nothing. You can do
851 that with a pair of substitutions.
852
853 s/^\s+//;
854 s/\s+$//;
855
856 You can also write that as a single substitution, although it turns out
857 the combined statement is slower than the separate ones. That might not
858 matter to you, though.
859
860 s/^\s+|\s+$//g;
861
In this regular expression, the alternation matches either at the
beginning or the end of the string since the alternation has a lower
precedence than the anchors. With the "/g" flag, the substitution
makes all possible matches, so it gets both. Remember, the trailing
newline matches the "\s+", and the "$" anchor can match at the
physical end of the string, so the newline disappears too. Just add the
newline to the output, which has the added benefit of preserving
"blank" (consisting entirely of whitespace) lines which the "^\s+"
would remove all by itself.
871
872 while( <> )
873 {
874 s/^\s+|\s+$//g;
875 print "$_\n";
876 }
877
878 For a multi-line string, you can apply the regular expression to each
879 logical line in the string by adding the "/m" flag (for "multi-line").
880 With the "/m" flag, the "$" matches before an embedded newline, so it
881 doesn't remove it. It still removes the newline at the end of the
882 string.
883
884 $string =~ s/^\s+|\s+$//gm;
885
Remember that lines consisting entirely of whitespace will disappear,
since the first part of the alternation can match the entire string and
replace it with nothing. If you need to keep embedded blank lines, you
have to do a little more work. Instead of matching any whitespace
(since that includes a newline), just match the other whitespace.
891
892 $string =~ s/^[\t\f ]+|[\t\f ]+$//mg;
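
If you trim strings often, you might wrap the pair of substitutions in
a small subroutine. The following is only a sketch; "trim" is a
made-up name, not a Perl built-in, and the sample strings are
invented.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the string with leading and
# trailing whitespace (including a trailing newline) removed.
sub trim {
    my ($string) = @_;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

my $single = trim("  fred flintstone \t\n");

# Per-line trimming that keeps the embedded blank line intact.
my $multi = "  fred  \n\n\tbarney \n";
$multi =~ s/^[\t\f ]+|[\t\f ]+$//mg;

print "[$single]\n";
print $multi;
```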
893
894 How do I pad a string with blanks or pad a number with zeroes?
895 In the following examples, $pad_len is the length to which you wish to
896 pad the string, $text or $num contains the string to be padded, and
897 $pad_char contains the padding character. You can use a single
898 character string constant instead of the $pad_char variable if you know
899 what it is in advance. And in the same way you can use an integer in
900 place of $pad_len if you know the pad length in advance.
901
902 The simplest method uses the "sprintf" function. It can pad on the left
903 or right with blanks and on the left with zeroes and it will not
904 truncate the result. The "pack" function can only pad strings on the
905 right with blanks and it will truncate the result to a maximum length
906 of $pad_len.
907
908 # Left padding a string with blanks (no truncation):
909 $padded = sprintf("%${pad_len}s", $text);
910 $padded = sprintf("%*s", $pad_len, $text); # same thing
911
912 # Right padding a string with blanks (no truncation):
913 $padded = sprintf("%-${pad_len}s", $text);
914 $padded = sprintf("%-*s", $pad_len, $text); # same thing
915
916 # Left padding a number with 0 (no truncation):
917 $padded = sprintf("%0${pad_len}d", $num);
918 $padded = sprintf("%0*d", $pad_len, $num); # same thing
919
920 # Right padding a string with blanks using pack (will truncate):
921 $padded = pack("A$pad_len",$text);
922
923 If you need to pad with a character other than blank or zero you can
924 use one of the following methods. They all generate a pad string with
925 the "x" operator and combine that with $text. These methods do not
926 truncate $text.
927
928 Left and right padding with any character, creating a new string:
929
930 $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
931 $padded = $text . $pad_char x ( $pad_len - length( $text ) );
932
933 Left and right padding with any character, modifying $text directly:
934
935 substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
936 $text .= $pad_char x ( $pad_len - length( $text ) );
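
To see the methods side by side, here is a small sketch with made-up
values for $text, $num, $pad_len, and $pad_char. The comments show
what each expression produces for these particular inputs.

```perl
use strict;
use warnings;

my ($text, $num, $pad_len, $pad_char) = ('abc', 42, 6, '*');

my $left   = sprintf('%*s',  $pad_len, $text);    # '   abc'
my $right  = sprintf('%-*s', $pad_len, $text);    # 'abc   '
my $zeroes = sprintf('%0*d', $pad_len, $num);     # '000042'
my $packed = pack("A$pad_len", 'abcdefghij');     # 'abcdef' (truncated)

# The "x" operator builds the pad string; "." then attaches the text.
my $stars  = $pad_char x ($pad_len - length $text) . $text;   # '***abc'

print "[$_]\n" for $left, $right, $zeroes, $packed, $stars;
```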
937
938 How do I extract selected columns from a string?
939 (contributed by brian d foy)
940
If you know the columns that contain the data, you can use "substr" to
extract a single column.
943
944 my $column = substr( $line, $start_column, $length );
945
946 You can use "split" if the columns are separated by whitespace or some
947 other delimiter, as long as whitespace or the delimiter cannot appear
948 as part of the data.
949
950 my $line = ' fred barney betty ';
951 my @columns = split /\s+/, $line;
952 # ( '', 'fred', 'barney', 'betty' );
953
954 my $line = 'fred||barney||betty';
955 my @columns = split /\|/, $line;
956 # ( 'fred', '', 'barney', '', 'betty' );
957
958 If you want to work with comma-separated values, don't do this since
959 that format is a bit more complicated. Use one of the modules that
960 handle that format, such as "Text::CSV", "Text::CSV_XS", or
961 "Text::CSV_PP".
962
If you want to break apart an entire line of fixed columns, you can use
"unpack" with the A (ASCII) format. By using a number after the format
specifier, you can denote the column width. See the "pack" and "unpack"
entries in perlfunc for more details.
967
    my @fields = unpack( "A8 A8 A8 A16 A4", $line );
969
970 Note that spaces in the format argument to "unpack" do not denote
971 literal spaces. If you have space separated data, you may want "split"
972 instead.
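
Here is a minimal sketch of both fixed-column techniques on an
invented data line with columns of 8, 8, and 4 characters. Note that
"unpack" takes the template first and the data second.

```perl
use strict;
use warnings;

# Three fixed-width columns of 8, 8 and 4 characters (made-up data).
my $line = 'fred    barney  0042';

# "A" strips trailing spaces from each extracted field.
my ($first, $second, $id) = unpack('A8 A8 A4', $line);

# substr pulls out one column by offset and length, padding included.
my $raw = substr($line, 8, 8);

print "$first|$second|$id|$raw|\n";
```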
973
974 How do I find the soundex value of a string?
975 (contributed by brian d foy)
976
You can use the "Text::Soundex" module. If you want to do fuzzy or
close matching, you might also try the "String::Approx",
"Text::Metaphone", and "Text::DoubleMetaphone" modules.
980
981 How can I expand variables in text strings?
982 (contributed by brian d foy)
983
984 If you can avoid it, don't, or if you can use a templating system, such
985 as "Text::Template" or "Template" Toolkit, do that instead. You might
986 even be able to get the job done with "sprintf" or "printf":
987
988 my $string = sprintf 'Say hello to %s and %s', $foo, $bar;
989
However, for the one-off simple case where I don't want to pull out a
full templating system, I'll use a string that has two Perl scalar
variables in it. In this example, I want to expand $foo and $bar to
their values:
994
995 my $foo = 'Fred';
996 my $bar = 'Barney';
997 $string = 'Say hello to $foo and $bar';
998
999 One way I can do this involves the substitution operator and a double
1000 "/e" flag. The first "/e" evaluates $1 on the replacement side and
1001 turns it into $foo. The second /e starts with $foo and replaces it with
1002 its value. $foo, then, turns into 'Fred', and that's finally what's
1003 left in the string:
1004
1005 $string =~ s/(\$\w+)/$1/eeg; # 'Say hello to Fred and Barney'
1006
1007 The "/e" will also silently ignore violations of strict, replacing
1008 undefined variable names with the empty string. Since I'm using the
1009 "/e" flag (twice even!), I have all of the same security problems I
1010 have with "eval" in its string form. If there's something odd in $foo,
1011 perhaps something like "@{[ system "rm -rf /" ]}", then I could get
1012 myself in trouble.
1013
1014 To get around the security problem, I could also pull the values from a
1015 hash instead of evaluating variable names. Using a single "/e", I can
1016 check the hash to ensure the value exists, and if it doesn't, I can
1017 replace the missing value with a marker, in this case "???" to signal
1018 that I missed something:
1019
1020 my $string = 'This has $foo and $bar';
1021
1022 my %Replacements = (
1023 foo => 'Fred',
1024 );
1025
1026 # $string =~ s/\$(\w+)/$Replacements{$1}/g;
1027 $string =~ s/\$(\w+)/
1028 exists $Replacements{$1} ? $Replacements{$1} : '???'
1029 /eg;
1030
1031 print $string;
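
If I need the hash-based replacement in more than one place, I might
wrap it in a subroutine. This is just a sketch; "expand_template" is a
name I made up, and it takes the replacements as a hash reference.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each $name in the template with the
# matching value from a hash reference, or '???' if there is no such key.
sub expand_template {
    my ($template, $values) = @_;
    $template =~ s/\$(\w+)/
        exists $values->{$1} ? $values->{$1} : '???'
    /eg;
    return $template;
}

my $result = expand_template(
    'Say hello to $foo and $bar',
    { foo => 'Fred' },
);
print "$result\n";
```

Because nothing from the template is ever evaluated as Perl variable
names, a hostile template can only ask for keys that are in the hash.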
1032
1033 What's wrong with always quoting "$vars"?
1034 The problem is that those double-quotes force stringification--coercing
1035 numbers and references into strings--even when you don't want them to
1036 be strings. Think of it this way: double-quote expansion is used to
1037 produce new strings. If you already have a string, why do you need
1038 more?
1039
1040 If you get used to writing odd things like these:
1041
1042 print "$var"; # BAD
1043 $new = "$old"; # BAD
1044 somefunc("$var"); # BAD
1045
1046 You'll be in trouble. Those should (in 99.8% of the cases) be the
1047 simpler and more direct:
1048
1049 print $var;
1050 $new = $old;
1051 somefunc($var);
1052
1053 Otherwise, besides slowing you down, you're going to break code when
1054 the thing in the scalar is actually neither a string nor a number, but
1055 a reference:
1056
1057 func(\@array);
1058 sub func {
1059 my $aref = shift;
1060 my $oref = "$aref"; # WRONG
1061 }
1062
1063 You can also get into subtle problems on those few operations in Perl
1064 that actually do care about the difference between a string and a
1065 number, such as the magical "++" autoincrement operator or the
1066 syscall() function.
1067
1068 Stringification also destroys arrays.
1069
1070 @lines = `command`;
1071 print "@lines"; # WRONG - extra blanks
1072 print @lines; # right
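
The following sketch shows what stringification does to a reference
and to an array. The variable names are illustrative only.

```perl
use strict;
use warnings;

my @array = (1, 2, 3);
my $aref  = \@array;
my $str   = "$aref";     # now just text such as ARRAY(0x55d0c8a2f4b8)

# ref() tells the two apart: only the real reference can be dereferenced.
my $real_kind = ref $aref;   # 'ARRAY'
my $fake_kind = ref $str;    # '' (empty string)

# Interpolating an array joins its elements with $" (a space by default).
my @lines  = ("one\n", "two\n");
my $joined = "@lines";       # "one\n two\n"

print "real: $real_kind, stringified: '$fake_kind'\n";
print $joined;
```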
1073
1074 Why don't my <<HERE documents work?
1075 Check for these three things:
1076
1077 There must be no space after the << part.
1078 There (probably) should be a semicolon at the end.
1079 You can't (easily) have any space in front of the tag.
1080
1081 If you want to indent the text in the here document, you can do this:
1082
1083 # all in one
1084 ($VAR = <<HERE_TARGET) =~ s/^\s+//gm;
1085 your text
1086 goes here
1087 HERE_TARGET
1088
1089 But the HERE_TARGET must still be flush against the margin. If you
1090 want that indented also, you'll have to quote in the indentation.
1091
1092 ($quote = <<' FINIS') =~ s/^\s+//gm;
1093 ...we will have peace, when you and all your works have
1094 perished--and the works of your dark master to whom you
1095 would deliver us. You are a liar, Saruman, and a corrupter
1096 of men's hearts. --Theoden in /usr/src/perl/taint.c
1097 FINIS
1098 $quote =~ s/\s+--/\n--/;
1099
1100 A nice general-purpose fixer-upper function for indented here documents
1101 follows. It expects to be called with a here document as its argument.
1102 It looks to see whether each line begins with a common substring, and
1103 if so, strips that substring off. Otherwise, it takes the amount of
1104 leading whitespace found on the first line and removes that much off
1105 each subsequent line.
1106
1107 sub fix {
1108 local $_ = shift;
1109 my ($white, $leader); # common whitespace and common leading string
1110 if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
1111 ($white, $leader) = ($2, quotemeta($1));
1112 } else {
1113 ($white, $leader) = (/^(\s+)/, '');
1114 }
1115 s/^\s*?$leader(?:$white)?//gm;
1116 return $_;
1117 }
1118
1119 This works with leading special strings, dynamically determined:
1120
1121 $remember_the_main = fix<<' MAIN_INTERPRETER_LOOP';
1122 @@@ int
1123 @@@ runops() {
1124 @@@ SAVEI32(runlevel);
1125 @@@ runlevel++;
1126 @@@ while ( op = (*op->op_ppaddr)() );
1127 @@@ TAINT_NOT;
1128 @@@ return 0;
1129 @@@ }
1130 MAIN_INTERPRETER_LOOP
1131
1132 Or with a fixed amount of leading whitespace, with remaining
1133 indentation correctly preserved:
1134
1135 $poem = fix<<EVER_ON_AND_ON;
1136 Now far ahead the Road has gone,
1137 And I must follow, if I can,
1138 Pursuing it with eager feet,
1139 Until it joins some larger way
1140 Where many paths and errands meet.
1141 And whither then? I cannot say.
1142 --Bilbo in /usr/src/perl/pp_ctl.c
1143 EVER_ON_AND_ON
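
If you can assume Perl 5.26 or later, the indented here document
syntax ("<<~") may save you the fixing up altogether. This sketch
wraps it in a made-up "greeting" subroutine just to show the
indentation.

```perl
use strict;
use warnings;
use v5.26;    # indented here documents need Perl 5.26 or later

sub greeting {
    # <<~ strips the indentation found in front of the terminator from
    # every line, and the terminator itself may be indented.
    my $text = <<~HERE_TARGET;
        your text
          goes here
        HERE_TARGET
    return $text;
}

print greeting();
```

Indentation beyond that of the terminator, like the two extra spaces
on the second line, is preserved.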
1144
1146 What is the difference between a list and an array?
1147 An array has a changeable length. A list does not. An array is
1148 something you can push or pop, while a list is a set of values. Some
1149 people make the distinction that a list is a value while an array is a
1150 variable. Subroutines are passed and return lists, you put things into
1151 list context, you initialize arrays with lists, and you "foreach()"
1152 across a list. "@" variables are arrays, anonymous arrays are arrays,
1153 arrays in scalar context behave like the number of elements in them,
1154 subroutines access their arguments through the array @_, and
1155 "push"/"pop"/"shift" only work on arrays.
1156
1157 As a side note, there's no such thing as a list in scalar context.
1158 When you say
1159
1160 $scalar = (2, 5, 7, 9);
1161
1162 you're using the comma operator in scalar context, so it uses the
1163 scalar comma operator. There never was a list there at all! This
1164 causes the last value to be returned: 9.
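
A short sketch makes the three cases easy to compare. The "void"
warnings are switched off only because the discarded values in the
last assignment would otherwise trigger them.

```perl
use strict;
use warnings;
no warnings 'void';    # the discarded 2, 5 and 7 would otherwise warn

my @array = (2, 5, 7, 9);

my $count   = @array;          # array in scalar context: its length, 4
my ($first) = @array;          # list assignment: the first element, 2
my $last    = (2, 5, 7, 9);    # comma operator in scalar context: 9

print "count=$count first=$first last=$last\n";
```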
1165
1166 What is the difference between $array[1] and @array[1]?
1167 The former is a scalar value; the latter an array slice, making it a
1168 list with one (scalar) value. You should use $ when you want a scalar
1169 value (most of the time) and @ when you want a list with one scalar
1170 value in it (very, very rarely; nearly never, in fact).
1171
1172 Sometimes it doesn't make a difference, but sometimes it does. For
1173 example, compare:
1174
1175 $good[0] = `some program that outputs several lines`;
1176
1177 with
1178
1179 @bad[0] = `same program that outputs several lines`;
1180
1181 The "use warnings" pragma and the -w flag will warn you about these
1182 matters.
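
To watch the difference without running an external program, this
sketch uses a made-up "fake_command" subroutine that is sensitive to
context in the same way backticks are. Warnings are left off here
because "use warnings" would complain about the slice.

```perl
use strict;

# Hypothetical stand-in for backticks: returns a list of lines in list
# context and one string in scalar context, just as `command` does.
sub fake_command {
    my @lines = ("line one\n", "line two\n");
    return wantarray ? @lines : join '', @lines;
}

my (@good, @bad);
$good[0] = fake_command();   # scalar assignment: the whole output
@bad[0]  = fake_command();   # list assignment to a slice: first line only

print "good: $good[0]";
print "bad:  $bad[0]";
```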
1183
1184 How can I remove duplicate elements from a list or array?
1185 (contributed by brian d foy)
1186
1187 Use a hash. When you think the words "unique" or "duplicated", think
1188 "hash keys".
1189
1190 If you don't care about the order of the elements, you could just
1191 create the hash then extract the keys. It's not important how you
1192 create that hash: just that you use "keys" to get the unique elements.
1193
1194 my %hash = map { $_, 1 } @array;
1195 # or a hash slice: @hash{ @array } = ();
1196 # or a foreach: $hash{$_} = 1 foreach ( @array );
1197
1198 my @unique = keys %hash;
1199
1200 If you want to use a module, try the "uniq" function from
1201 "List::MoreUtils". In list context it returns the unique elements,
1202 preserving their order in the list. In scalar context, it returns the
1203 number of unique elements.
1204
1205 use List::MoreUtils qw(uniq);
1206
1207 my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
1208 my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7
1209
You can also go through each element and skip the ones you've seen
before. Use a hash to keep track. The first time the loop sees an
element, that element has no key in %seen. The "next" statement creates
the key and immediately uses its value, which is "undef", so the loop
continues to the "push" and increments the value for that key. The next
time the loop sees that same element, its key exists in the hash and
the value for that key is true (since it's not 0 or "undef"), so the
"next" skips that iteration and the loop goes to the next element.
1218
1219 my @unique = ();
1220 my %seen = ();
1221
1222 foreach my $elem ( @array )
1223 {
1224 next if $seen{ $elem }++;
1225 push @unique, $elem;
1226 }
1227
1228 You can write this more briefly using a grep, which does the same
1229 thing.
1230
1231 my %seen = ();
1232 my @unique = grep { ! $seen{ $_ }++ } @array;
1233
1234 How can I tell whether a certain element is contained in a list or array?
1235 (portions of this answer contributed by Anno Siegel and brian d foy)
1236
1237 Hearing the word "in" is an indication that you probably should have
1238 used a hash, not a list or array, to store your data. Hashes are
1239 designed to answer this question quickly and efficiently. Arrays
1240 aren't.
1241
That being said, there are several ways to approach this. In Perl 5.10
and later, you can use the smart match operator to check that an item
is contained in an array or a hash. Be aware that smart match was
marked experimental in Perl 5.18, so newer perls warn about it:
1245
1246 use 5.010;
1247
1248 if( $item ~~ @array )
1249 {
1250 say "The array contains $item"
1251 }
1252
1253 if( $item ~~ %hash )
1254 {
1255 say "The hash contains $item"
1256 }
1257
1258 With earlier versions of Perl, you have to do a bit more work. If you
1259 are going to make this query many times over arbitrary string values,
1260 the fastest way is probably to invert the original array and maintain a
1261 hash whose keys are the first array's values:
1262
1263 @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
1264 %is_blue = ();
1265 for (@blues) { $is_blue{$_} = 1 }
1266
1267 Now you can check whether $is_blue{$some_color}. It might have been a
1268 good idea to keep the blues all in a hash in the first place.
1269
1270 If the values are all small integers, you could use a simple indexed
1271 array. This kind of an array will take up less space:
1272
    @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
    @is_tiny_prime = ();
    for (@primes) { $is_tiny_prime[$_] = 1 }
    # or simply  @is_tiny_prime[@primes] = (1) x @primes;
1277
1278 Now you check whether $is_tiny_prime[$some_number].
1279
1280 If the values in question are integers instead of strings, you can save
1281 quite a lot of space by using bit strings instead:
1282
1283 @articles = ( 1..10, 150..2000, 2017 );
1284 undef $read;
1285 for (@articles) { vec($read,$_,1) = 1 }
1286
1287 Now check whether "vec($read,$n,1)" is true for some $n.
1288
1289 These methods guarantee fast individual tests but require a re-
1290 organization of the original list or array. They only pay off if you
1291 have to test multiple values against the same array.
1292
1293 If you are testing only once, the standard module "List::Util" exports
1294 the function "first" for this purpose. It works by stopping once it
1295 finds the element. It's written in C for speed, and its Perl equivalent
1296 looks like this subroutine:
1297
1298 sub first (&@) {
1299 my $code = shift;
1300 foreach (@_) {
1301 return $_ if &{$code}();
1302 }
1303 undef;
1304 }
1305
1306 If speed is of little concern, the common idiom uses grep in scalar
1307 context (which returns the number of items that passed its condition)
1308 to traverse the entire list. This does have the benefit of telling you
1309 how many matches it found, though.
1310
1311 my $is_there = grep $_ eq $whatever, @array;
1312
1313 If you want to actually extract the matching elements, simply use grep
1314 in list context.
1315
1316 my @matches = grep $_ eq $whatever, @array;
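
Here is a small sketch that puts the one-off test and the repeated
test next to each other, reusing a few of the blues from above. Since
"first" returns the element itself, it is safer to check the result
with "defined".

```perl
use strict;
use warnings;
use List::Util qw(first);

my @array = qw(azure cerulean teal);

# One-off test: first() stops at the first match. It returns the element
# itself, so test with defined() in case the match is a false value like 0.
my $hit  = first { $_ eq 'teal' }  @array;
my $miss = first { $_ eq 'lapis' } @array;

# Repeated tests: build the lookup hash once, then each query is one lookup.
my %is_blue = map { $_ => 1 } @array;

print defined $hit  ? "found $hit\n"  : "no teal\n";
print defined $miss ? "found $miss\n" : "no lapis\n";
print "azure is blue\n" if $is_blue{azure};
```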
1317
1318 How do I compute the difference of two arrays? How do I compute the
1319 intersection of two arrays?
1320 Use a hash. Here's code to do both and more. It assumes that each
1321 element is unique in a given array:
1322
1323 @union = @intersection = @difference = ();
1324 %count = ();
1325 foreach $element (@array1, @array2) { $count{$element}++ }
1326 foreach $element (keys %count) {
1327 push @union, $element;
1328 push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
1329 }
1330
1331 Note that this is the symmetric difference, that is, all elements in
1332 either A or in B but not in both. Think of it as an xor operation.
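
Here is the same code as a runnable sketch with two made-up arrays,
plus a one-sided difference (elements of the first array that are
missing from the second) in case that is what you were after. The keys
are sorted only to make the output predictable.

```perl
use strict;
use warnings;

my @array1 = (1, 2, 3, 4);
my @array2 = (3, 4, 5, 6);

my (@union, @intersection, @difference);
my %count;
$count{$_}++ for @array1, @array2;
for my $element (sort { $a <=> $b } keys %count) {
    push @union, $element;
    push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
}

# One-sided difference: elements of @array1 that are not in @array2.
my %in_array2 = map { $_ => 1 } @array2;
my @only_in_1 = grep { !$in_array2{$_} } @array1;

print "union: @union\n";
print "intersection: @intersection\n";
print "symmetric difference: @difference\n";
print "only in first: @only_in_1\n";
```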
1333
1334 How do I test whether two arrays or hashes are equal?
1335 With Perl 5.10 and later, the smart match operator can give you the
1336 answer with the least amount of work:
1337
1338 use 5.010;
1339
1340 if( @array1 ~~ @array2 )
1341 {
1342 say "The arrays are the same";
1343 }
1344
1345 if( %hash1 ~~ %hash2 ) # doesn't check values!
1346 {
1347 say "The hash keys are the same";
1348 }
1349
1350 The following code works for single-level arrays. It uses a stringwise
1351 comparison, and does not distinguish defined versus undefined empty
1352 strings. Modify if you have other needs.
1353
1354 $are_equal = compare_arrays(\@frogs, \@toads);
1355
1356 sub compare_arrays {
1357 my ($first, $second) = @_;
1358 no warnings; # silence spurious -w undef complaints
1359 return 0 unless @$first == @$second;
1360 for (my $i = 0; $i < @$first; $i++) {
1361 return 0 if $first->[$i] ne $second->[$i];
1362 }
1363 return 1;
1364 }
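
The same idea carries over to single-level hashes if you want the
values checked as well as the keys. This is only a sketch;
"hashes_equal" is a made-up name, and it assumes every value is a
defined, non-reference scalar.

```perl
use strict;
use warnings;

# Hypothetical helper for single-level hashes with defined values:
# true if both hashes have the same keys and stringwise-equal values.
sub hashes_equal {
    my ($first, $second) = @_;
    return 0 unless scalar(keys %$first) == scalar(keys %$second);
    for my $key (keys %$first) {
        return 0 unless exists $second->{$key};
        return 0 unless $first->{$key} eq $second->{$key};
    }
    return 1;
}

my %frogs = (green => 1, brown => 2);
my %toads = (brown => 2, green => 1);
my %newts = (green => 1, brown => 3);

print "frogs and toads match\n" if hashes_equal(\%frogs, \%toads);
print "frogs and newts differ\n" unless hashes_equal(\%frogs, \%newts);
```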
1365
1366 For multilevel structures, you may wish to use an approach more like
1367 this one. It uses the CPAN module "FreezeThaw":
1368
1369 use FreezeThaw qw(cmpStr);
1370 @a = @b = ( "this", "that", [ "more", "stuff" ] );
1371
1372 printf "a and b contain %s arrays\n",
1373 cmpStr(\@a, \@b) == 0
1374 ? "the same"
1375 : "different";
1376
1377 This approach also works for comparing hashes. Here we'll demonstrate
1378 two different answers:
1379
1380 use FreezeThaw qw(cmpStr cmpStrHard);
1381
1382 %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
1383 $a{EXTRA} = \%b;
1384 $b{EXTRA} = \%a;
1385
1386 printf "a and b contain %s hashes\n",
1387 cmpStr(\%a, \%b) == 0 ? "the same" : "different";
1388
1389 printf "a and b contain %s hashes\n",
1390 cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
1391
The first reports that both of the hashes contain the same data, while
the second reports that they do not. Which you prefer is left as an
exercise to the reader.
1395
1396 How do I find the first array element for which a condition is true?
1397 To find the first array element which satisfies a condition, you can
1398 use the "first()" function in the "List::Util" module, which comes with
1399 Perl 5.8. This example finds the first element that contains "Perl".
1400
1401 use List::Util qw(first);
1402
1403 my $element = first { /Perl/ } @array;
1404
1405 If you cannot use "List::Util", you can make your own loop to do the
1406 same thing. Once you find the element, you stop the loop with last.
1407
1408 my $found;
1409 foreach ( @array ) {
1410 if( /Perl/ ) { $found = $_; last }
1411 }
1412
1413 If you want the array index, you can iterate through the indices and
1414 check the array element at each index until you find one that satisfies
1415 the condition.
1416
1417 my( $found, $index ) = ( undef, -1 );
1418 for( $i = 0; $i < @array; $i++ ) {
1419 if( $array[$i] =~ /Perl/ ) {
1420 $found = $array[$i];
1421 $index = $i;
1422 last;
1423 }
1424 }
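
If "List::Util" is available, you can also hand "first" the list of
indices instead of the elements. A minimal sketch with made-up data:

```perl
use strict;
use warnings;
use List::Util qw(first);

my @array = ('Python', 'Learning Perl', 'Programming Perl');

# Search the indices rather than the elements; first() returns undef
# when no element matches.
my $index = first { $array[$_] =~ /Perl/ } 0 .. $#array;

print defined $index ? "found at index $index\n" : "not found\n";
```

Check the result with "defined", since a match at index 0 is a false
value.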
1425
1426 How do I handle linked lists?
In general, you usually don't need a linked list in Perl, since with
regular arrays, you can push and pop or shift and unshift at either
end, or you can use splice to add and/or remove an arbitrary number of
elements at arbitrary points. Both pop and shift are O(1) operations
on Perl's dynamic arrays. In the absence of shifts and pops, push in
general needs to reallocate only on the order of log(N) times, and
unshift will need to copy pointers each time.
1434
1435 If you really, really wanted, you could use structures as described in
1436 perldsc or perltoot and do just what the algorithm book tells you to
1437 do. For example, imagine a list node like this:
1438
1439 $node = {
1440 VALUE => 42,
1441 LINK => undef,
1442 };
1443
1444 You could walk the list this way:
1445
1446 print "List: ";
1447 for ($node = $head; $node; $node = $node->{LINK}) {
1448 print $node->{VALUE}, " ";
1449 }
1450 print "\n";
1451
1452 You could add to the list this way:
1453
1454 my ($head, $tail);
1455 $tail = append($head, 1); # grow a new head
1456 for $value ( 2 .. 10 ) {
1457 $tail = append($tail, $value);
1458 }
1459
1460 sub append {
1461 my($list, $value) = @_;
1462 my $node = { VALUE => $value };
1463 if ($list) {
1464 $node->{LINK} = $list->{LINK};
1465 $list->{LINK} = $node;
1466 }
1467 else {
1468 $_[0] = $node; # replace caller's version
1469 }
1470 return $node;
1471 }
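
Put together as one runnable sketch, building a short list with
"append" and then walking it looks like this. The only addition is the
@values array, which collects the node values so they can be printed
in one go.

```perl
use strict;
use warnings;

# append() as shown above: insert a new node after $list, or create the
# first node through the @_ alias when the caller's variable is empty.
sub append {
    my ($list, $value) = @_;
    my $node = { VALUE => $value };
    if ($list) {
        $node->{LINK} = $list->{LINK};
        $list->{LINK} = $node;
    }
    else {
        $_[0] = $node;    # replace caller's version
    }
    return $node;
}

my ($head, $tail);
$tail = append($head, 1);               # grow a new head
$tail = append($tail, $_) for 2 .. 5;

# Walk the list from the head, collecting the values in order.
my @values;
for (my $node = $head; $node; $node = $node->{LINK}) {
    push @values, $node->{VALUE};
}
print "List: @values\n";
```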
1472
But again, Perl's built-in arrays are virtually always good enough.
1474
1475 How do I handle circular lists?
1476 (contributed by brian d foy)
1477
If you want to cycle through an array endlessly, you can increment the
index modulo the number of elements in the array:
1480
1481 my @array = qw( a b c );
1482 my $i = 0;
1483
1484 while( 1 ) {
1485 print $array[ $i++ % @array ], "\n";
1486 last if $i > 20;
1487 }
1488
1489 You can also use "Tie::Cycle" to use a scalar that always has the next
1490 element of the circular array:
1491
1492 use Tie::Cycle;
1493
1494 tie my $cycle, 'Tie::Cycle', [ qw( FFFFFF 000000 FFFF00 ) ];
1495
1496 print $cycle; # FFFFFF
1497 print $cycle; # 000000
1498 print $cycle; # FFFF00
1499
The "Array::Iterator::Circular" module creates an iterator object for
circular arrays:
1502
1503 use Array::Iterator::Circular;
1504
1505 my $color_iterator = Array::Iterator::Circular->new(
1506 qw(red green blue orange)
1507 );
1508
1509 foreach ( 1 .. 20 ) {
1510 print $color_iterator->next, "\n";
1511 }
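
If you would rather not install a module, a closure can do much the
same job. This is a sketch only; "make_cycle" is a made-up name, and
it assumes you pass it at least one item.

```perl
use strict;
use warnings;

# Hypothetical helper: return a closure that hands out the items in
# order and wraps around to the first one after the last.
sub make_cycle {
    my @items = @_;
    my $i = 0;
    return sub {
        my $item = $items[$i];
        $i = ($i + 1) % @items;
        return $item;
    };
}

my $next_color = make_cycle(qw(red green blue));
my @seen = map { $next_color->() } 1 .. 5;
print "@seen\n";
```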
1512
1513 How do I shuffle an array randomly?
1514 If you either have Perl 5.8.0 or later installed, or if you have
1515 Scalar-List-Utils 1.03 or later installed, you can say:
1516
1517 use List::Util 'shuffle';
1518
1519 @shuffled = shuffle(@list);
1520
1521 If not, you can use a Fisher-Yates shuffle.
1522
1523 sub fisher_yates_shuffle {
1524 my $deck = shift; # $deck is a reference to an array
1525 return unless @$deck; # must not be empty!
1526
1527 my $i = @$deck;
1528 while (--$i) {
1529 my $j = int rand ($i+1);
1530 @$deck[$i,$j] = @$deck[$j,$i];
1531 }
1532 }
1533
1534 # shuffle my mpeg collection
1535 #
1536 my @mpeg = <audio/*/*.mp3>;
1537 fisher_yates_shuffle( \@mpeg ); # randomize @mpeg in place
1538 print @mpeg;
1539
1540 Note that the above implementation shuffles an array in place, unlike
1541 the "List::Util::shuffle()" which takes a list and returns a new
1542 shuffled list.
1543
You've probably seen shuffling algorithms that work using splice,
randomly picking another element to swap the current element with:
1546
1547 srand;
1548 @new = ();
1549 @old = 1 .. 10; # just a demo
1550 while (@old) {
1551 push(@new, splice(@old, rand @old, 1));
1552 }
1553
1554 This is bad because splice is already O(N), and since you do it N
1555 times, you just invented a quadratic algorithm; that is, O(N**2). This
1556 does not scale, although Perl is so efficient that you probably won't
1557 notice this until you have rather largish arrays.
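
Whichever shuffle you pick, a cheap sanity check is to confirm that
the result still holds exactly the same elements. This sketch does
that for "List::Util::shuffle" with a made-up list of numbers.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

my @list = 1 .. 10;

# shuffle() returns a new list and leaves @list untouched.
my @shuffled = shuffle(@list);

# Whatever order comes out, the elements themselves must be the same.
my $same_elements =
    join(',', sort { $a <=> $b } @shuffled) eq join(',', @list);

print "shuffled: @shuffled\n";
print "same elements\n" if $same_elements;
```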
1558
1559 How do I process/modify each element of an array?
1560 Use "for"/"foreach":
1561
1562 for (@lines) {
1563 s/foo/bar/; # change that word
1564 tr/XZ/ZX/; # swap those letters
1565 }
1566
1567 Here's another; let's compute spherical volumes:
1568
1569 for (@volumes = @radii) { # @volumes has changed parts
1570 $_ **= 3;
1571 $_ *= (4/3) * 3.14159; # this will be constant folded
1572 }
1573
1574 which can also be done with "map()" which is made to transform one list
1575 into another:
1576
1577 @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
1578
1579 If you want to do the same thing to modify the values of the hash, you
1580 can use the "values" function. As of Perl 5.6 the values are not
1581 copied, so if you modify $orbit (in this case), you modify the value.
1582
1583 for $orbit ( values %orbits ) {
1584 ($orbit **= 3) *= (4/3) * 3.14159;
1585 }
1586
1587 Prior to perl 5.6 "values" returned copies of the values, so older perl
1588 code often contains constructions such as @orbits{keys %orbits} instead
1589 of "values %orbits" where the hash is to be modified.
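
A small sketch shows the difference between modifying in place and
building a new list. The hash and array contents are invented.

```perl
use strict;
use warnings;

my %price = (apple => 2, pear => 3);

# values() yields aliases, so changing $_ changes the hash itself.
$_ *= 10 for values %price;

# map builds a new list and leaves its input alone.
my @radii   = (1, 2);
my @squares = map { $_ ** 2 } @radii;

print "apple=$price{apple} pear=$price{pear}\n";
print "radii: @radii, squares: @squares\n";
```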
1590
1591 How do I select a random element from an array?
1592 Use the "rand()" function (see "rand" in perlfunc):
1593
1594 $index = rand @array;
1595 $element = $array[$index];
1596
1597 Or, simply:
1598
1599 my $element = $array[ rand @array ];
1600
1601 How do I permute N elements of a list?
1602 Use the "List::Permutor" module on CPAN. If the list is actually an
1603 array, try the "Algorithm::Permute" module (also on CPAN). It's written
1604 in XS code and is very efficient:
1605
1606 use Algorithm::Permute;
1607
1608 my @array = 'a'..'d';
1609 my $p_iterator = Algorithm::Permute->new ( \@array );
1610
1611 while (my @perm = $p_iterator->next) {
1612 print "next permutation: (@perm)\n";
1613 }
1614
1615 For even faster execution, you could do:
1616
1617 use Algorithm::Permute;
1618
1619 my @array = 'a'..'d';
1620
1621 Algorithm::Permute::permute {
1622 print "next permutation: (@array)\n";
1623 } @array;
1624
1625 Here's a little program that generates all permutations of all the
1626 words on each line of input. The algorithm embodied in the "permute()"
1627 function is discussed in Volume 4 (still unpublished) of Knuth's The
1628 Art of Computer Programming and will work on any list:
1629
1630 #!/usr/bin/perl -n
1631 # Fischer-Krause ordered permutation generator
1632
1633 sub permute (&@) {
1634 my $code = shift;
1635 my @idx = 0..$#_;
1636 while ( $code->(@_[@idx]) ) {
1637 my $p = $#idx;
1638 --$p while $idx[$p-1] > $idx[$p];
1639 my $q = $p or return;
1640 push @idx, reverse splice @idx, $p;
1641 ++$q while $idx[$p-1] > $idx[$q];
1642 @idx[$p-1,$q]=@idx[$q,$p-1];
1643 }
1644 }
1645
1646 permute { print "@_\n" } split;
1647
1648 The "Algorithm::Loops" module also provides the "NextPermute" and
1649 "NextPermuteNum" functions which efficiently find all unique
1650 permutations of an array, even if it contains duplicate values,
1651 modifying it in-place: if its elements are in reverse-sorted order then
1652 the array is reversed, making it sorted, and it returns false;
1653 otherwise the next permutation is returned.
1654
1655 "NextPermute" uses string order and "NextPermuteNum" numeric order, so
1656 you can enumerate all the permutations of 0..9 like this:
1657
1658 use Algorithm::Loops qw(NextPermuteNum);
1659
1660 my @list= 0..9;
1661 do { print "@list\n" } while NextPermuteNum @list;
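
If you only need the permutations of a short list and can't install a
module, a plain recursive subroutine will do. This sketch is not the
algorithm used by any of the modules above, "permutations" is a
made-up name, and it builds the whole result in memory, so keep the
list small.

```perl
use strict;
use warnings;

# Hypothetical helper: return every ordering of its arguments as a list
# of array references. Simple recursion, fine for short lists only.
sub permutations {
    my @items = @_;
    return ([]) unless @items;      # one way to arrange nothing

    my @result;
    for my $i (0 .. $#items) {
        my @rest = @items;
        my ($picked) = splice @rest, $i, 1;   # take item $i out
        push @result, [ $picked, @$_ ] for permutations(@rest);
    }
    return @result;
}

my @perms = permutations(qw(a b c));
print "@$_\n" for @perms;
```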
1662
1663 How do I sort an array by (anything)?
1664 Supply a comparison function to sort() (described in "sort" in
1665 perlfunc):
1666
1667 @list = sort { $a <=> $b } @list;
1668
1669 The default sort function is cmp, string comparison, which would sort
1670 "(1, 2, 10)" into "(1, 10, 2)". "<=>", used above, is the numerical
1671 comparison operator.
1672
1673 If you have a complicated function needed to pull out the part you want
1674 to sort on, then don't do it inside the sort function. Pull it out
1675 first, because the sort BLOCK can be called many times for the same
1676 element. Here's an example of how to pull out the first word after the
1677 first number on each item, and then sort those words case-
1678 insensitively.
1679
1680 @idx = ();
1681 for (@data) {
1682 ($item) = /\d+\s*(\S+)/;
1683 push @idx, uc($item);
1684 }
1685 @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
1686
1687 which could also be written this way, using a trick that's come to be
1688 known as the Schwartzian Transform:
1689
1690 @sorted = map { $_->[0] }
1691 sort { $a->[1] cmp $b->[1] }
1692 map { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
1693
1694 If you need to sort on several fields, the following paradigm is
1695 useful.
1696
1697 @sorted = sort {
1698 field1($a) <=> field1($b) ||
1699 field2($a) cmp field2($b) ||
1700 field3($a) cmp field3($b)
1701 } @data;
1702
1703 This can be conveniently combined with precalculation of keys as given
1704 above.
1705
1706 See the sort article in the "Far More Than You Ever Wanted To Know"
1707 collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz for more
1708 about this approach.
1709
1710 See also the question later in perlfaq4 on sorting hashes.
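
As a concrete case of sorting on several fields, this sketch sorts
some made-up records by age and breaks ties by name. The hash keys are
only for illustration.

```perl
use strict;
use warnings;

# Made-up records: sort by age (numeric), then by name (string).
my @people = (
    { name => 'Wilma',  age => 30 },
    { name => 'Fred',   age => 35 },
    { name => 'Barney', age => 30 },
);

my @sorted = sort {
    $a->{age}  <=> $b->{age}
        ||
    $a->{name} cmp $b->{name}
} @people;

my @names = map { $_->{name} } @sorted;
print "@names\n";
```

The second comparison only runs when the first one returns 0, that is,
when the ages are equal.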
1711
1712 How do I manipulate arrays of bits?
1713 Use "pack()" and "unpack()", or else "vec()" and the bitwise
1714 operations.
1715
1716 For example, you don't have to store individual bits in an array (which
1717 would mean that you're wasting a lot of space). To convert an array of
1718 bits to a string, use "vec()" to set the right bits. This sets $vec to
1719 have bit N set only if $ints[N] was set:
1720
1721 @ints = (...); # array of bits, e.g. ( 1, 0, 0, 1, 1, 0 ... )
1722 $vec = '';
1723 foreach( 0 .. $#ints ) {
1724 vec($vec,$_,1) = 1 if $ints[$_];
1725 }
1726
1727 The string $vec only takes up as many bits as it needs. For instance,
1728 if you had 16 entries in @ints, $vec only needs two bytes to store them
1729 (not counting the scalar variable overhead).
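
A short round trip makes this concrete. The sketch packs a made-up
array of ten bits into $vec and then reads the set positions back out
with a plain "grep" over every bit position.

```perl
use strict;
use warnings;

my @ints = (1, 0, 0, 1, 1, 0, 0, 0, 0, 1);

# Pack: set bit N of $vec for every true $ints[N].
my $vec = '';
vec($vec, $_, 1) = 1 for grep { $ints[$_] } 0 .. $#ints;

# Unpack: collect the positions of all set bits.
my @set = grep { vec($vec, $_, 1) } 0 .. 8 * length($vec) - 1;

printf "%d bytes, bits set: %s\n", length($vec), "@set";
```

Ten bits fit into two bytes, and only positions 0, 3, 4, and 9 come
back as set.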
1730
1731 Here's how, given a vector in $vec, you can get those bits into your
1732 @ints array:
1733
1734 sub bitvec_to_list {
1735 my $vec = shift;
1736 my @ints;
1737 # Find null-byte density then select best algorithm
1738 if ($vec =~ tr/\0// / length $vec > 0.95) {
1739 use integer;
1740 my $i;
1741
1742 # This method is faster with mostly null-bytes
1743 while($vec =~ /[^\0]/g ) {
1744 $i = -9 + 8 * pos $vec;
1745 push @ints, $i if vec($vec, ++$i, 1);
1746 push @ints, $i if vec($vec, ++$i, 1);
1747 push @ints, $i if vec($vec, ++$i, 1);
1748 push @ints, $i if vec($vec, ++$i, 1);
1749 push @ints, $i if vec($vec, ++$i, 1);
1750 push @ints, $i if vec($vec, ++$i, 1);
1751 push @ints, $i if vec($vec, ++$i, 1);
1752 push @ints, $i if vec($vec, ++$i, 1);
1753 }
1754 }
1755 else {
1756 # This method is a fast general algorithm
1757 use integer;
1758 my $bits = unpack "b*", $vec;
1759 push @ints, 0 if $bits =~ s/^(\d)// && $1;
1760 push @ints, pos $bits while($bits =~ /1/g);
1761 }
1762
1763 return \@ints;
1764 }
1765
1766 This method gets faster the more sparse the bit vector is. (Courtesy
1767 of Tim Bunce and Winfried Koenig.)
1768
1769 You can make the while loop a lot shorter with this suggestion from
1770 Benjamin Goldberg:
1771
1772 while($vec =~ /[^\0]+/g ) {
1773 push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
1774 }
1775
1776 Or use the CPAN module "Bit::Vector":
1777
1778 $vector = Bit::Vector->new($num_of_bits);
1779 $vector->Index_List_Store(@ints);
1780 @ints = $vector->Index_List_Read();
1781
"Bit::Vector" provides efficient methods for bit vectors, sets of small
integers, and "big int" math.
1784
1785 Here's a more extensive illustration using vec():
1786
1787 # vec demo
1788 $vector = "\xff\x0f\xef\xfe";
1789 print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
1790 unpack("N", $vector), "\n";
1791 $is_set = vec($vector, 23, 1);
1792 print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
1793 pvec($vector);
1794
1795 set_vec(1,1,1);
1796 set_vec(3,1,1);
1797 set_vec(23,1,1);
1798
1799 set_vec(3,1,3);
1800 set_vec(3,2,3);
1801 set_vec(3,4,3);
1802 set_vec(3,4,7);
1803 set_vec(3,8,3);
1804 set_vec(3,8,7);
1805
1806 set_vec(0,32,17);
1807 set_vec(1,32,17);
1808
1809 sub set_vec {
1810 my ($offset, $width, $value) = @_;
1811 my $vector = '';
1812 vec($vector, $offset, $width) = $value;
1813 print "offset=$offset width=$width value=$value\n";
1814 pvec($vector);
1815 }
1816
sub pvec {
    my $vector = shift;
    my $bits = unpack("b*", $vector);

    print "vector length in bytes: ", length($vector), "\n";
    # Split the bit string into groups of eight, one group per byte.
    my @bytes = unpack("A8" x length($vector), $bits);
    print "bits are: @bytes\n\n";
}
1827
1828 Why does defined() return true on empty arrays and hashes?
1829 The short story is that you should probably only use defined on scalars
1830 or functions, not on aggregates (arrays and hashes). See "defined" in
1831 perlfunc in the 5.004 release or later of Perl for more detail.
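
If what you really want to know is whether an aggregate has any elements, test it in boolean or scalar context instead. A minimal sketch (the variable names are only for illustration):

```perl
use strict;
use warnings;

my @array = ();
my %hash  = ();

# An array or hash in boolean context is true only if it has elements.
my $array_state = @array ? "non-empty" : "empty";
my $hash_state  = %hash  ? "non-empty" : "empty";

push @array, undef;      # one element, even though its value is undef
$hash{key} = undef;      # one pair, even though its value is undef

my $array_count = @array;          # number of elements
my $hash_count  = keys %hash;      # number of keys

print "array was $array_state, now holds $array_count element(s)\n";
print "hash was $hash_state, now holds $hash_count key(s)\n";
```

In Perl 5.22 and later, defined(@array) and defined(%hash) are fatal errors, so tests like these are the only option there.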
1832
1834 How do I process an entire hash?
1835 (contributed by brian d foy)
1836
1837 There are a couple of ways that you can process an entire hash. You can
get a list of keys, then go through each key, or grab one key-value
1839 pair at a time.
1840
1841 To go through all of the keys, use the "keys" function. This extracts
1842 all of the keys of the hash and gives them back to you as a list. You
1843 can then get the value through the particular key you're processing:
1844
1845 foreach my $key ( keys %hash ) {
    my $value = $hash{$key};
1847 ...
1848 }
1849
1850 Once you have the list of keys, you can process that list before you
1851 process the hash elements. For instance, you can sort the keys so you
1852 can process them in lexical order:
1853
1854 foreach my $key ( sort keys %hash ) {
    my $value = $hash{$key};
1856 ...
1857 }
1858
1859 Or, you might want to only process some of the items. If you only want
1860 to deal with the keys that start with "text:", you can select just
1861 those using "grep":
1862
1863 foreach my $key ( grep /^text:/, keys %hash ) {
    my $value = $hash{$key};
1865 ...
1866 }
1867
1868 If the hash is very large, you might not want to create a long list of
1869 keys. To save some memory, you can grab one key-value pair at a time
1870 using "each()", which returns a pair you haven't seen yet:
1871
1872 while( my( $key, $value ) = each( %hash ) ) {
1873 ...
1874 }
1875
1876 The "each" operator returns the pairs in apparently random order, so if
1877 ordering matters to you, you'll have to stick with the "keys" method.
1878
1879 The "each()" operator can be a bit tricky though. You can't add or
1880 delete keys of the hash while you're using it without possibly skipping
1881 or re-processing some pairs after Perl internally rehashes all of the
1882 elements. Additionally, a hash has only one iterator, so if you use
1883 "keys", "values", or "each" on the same hash, you can reset the
1884 iterator and mess up your processing. See the "each" entry in perlfunc
1885 for more details.
1886
1887 How do I merge two hashes?
1888 (contributed by brian d foy)
1889
Before you merge two hashes, you have to decide two things: what to do
when both hashes contain the same key, and whether you want to leave
the original hashes as they were.
1893
1894 If you want to preserve the original hashes, copy one hash (%hash1) to
a new hash (%new_hash), then add the keys from the other hash (%hash2)
1896 to the new hash. Checking that the key already exists in %new_hash
1897 gives you a chance to decide what to do with the duplicates:
1898
1899 my %new_hash = %hash1; # make a copy; leave %hash1 alone
1900
1901 foreach my $key2 ( keys %hash2 )
1902 {
1903 if( exists $new_hash{$key2} )
1904 {
1905 warn "Key [$key2] is in both hashes!";
1906 # handle the duplicate (perhaps only warning)
1907 ...
1908 next;
1909 }
1910 else
1911 {
1912 $new_hash{$key2} = $hash2{$key2};
1913 }
1914 }
1915
1916 If you don't want to create a new hash, you can still use this looping
1917 technique; just change the %new_hash to %hash1.
1918
1919 foreach my $key2 ( keys %hash2 )
1920 {
1921 if( exists $hash1{$key2} )
1922 {
1923 warn "Key [$key2] is in both hashes!";
1924 # handle the duplicate (perhaps only warning)
1925 ...
1926 next;
1927 }
1928 else
1929 {
1930 $hash1{$key2} = $hash2{$key2};
1931 }
1932 }
1933
1934 If you don't care that one hash overwrites keys and values from the
1935 other, you could just use a hash slice to add one hash to another. In
1936 this case, values from %hash2 replace values from %hash1 when they have
1937 keys in common:
1938
1939 @hash1{ keys %hash2 } = values %hash2;
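
If you want the same overwrite behavior but need to leave both originals untouched, a common idiom is to flatten both hashes into a new one. This is a sketch with made-up sample data; %merged is a name chosen here for illustration. Later pairs win, so %hash2 still takes precedence:

```perl
use strict;
use warnings;

my %hash1 = ( apple => 1, banana => 2 );
my %hash2 = ( banana => 20, cherry => 30 );

# Pairs from %hash2 come last in the list, so they replace
# any pairs from %hash1 that share a key.
my %merged = ( %hash1, %hash2 );

print "$_ => $merged{$_}\n" for sort keys %merged;
```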
1940
1941 What happens if I add or remove keys from a hash while iterating over it?
1942 (contributed by brian d foy)
1943
1944 The easy answer is "Don't do that!"
1945
1946 If you iterate through the hash with each(), you can delete the key
1947 most recently returned without worrying about it. If you delete or add
1948 other keys, the iterator may skip or double up on them since perl may
1949 rearrange the hash table. See the entry for "each()" in perlfunc.
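
As a small sketch of the one safe case, this loop deletes only the pair that "each()" has just handed back. The %balance data is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical data: drop every entry whose value is negative.
my %balance = ( alice => 10, bob => -5, carol => 0, dave => -1 );

while ( my ( $name, $amount ) = each %balance ) {
    # Deleting the pair that each() just returned is safe.
    delete $balance{$name} if $amount < 0;
}

my $remaining = join ", ", sort keys %balance;
print "$remaining\n";
```

If you need to add keys or delete other keys, it is usually simpler to loop over a list taken from "keys %balance" first, since that list is a snapshot and is not affected by later changes to the hash.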
1950
1951 How do I look up a hash element by value?
1952 Create a reverse hash:
1953
1954 %by_value = reverse %by_key;
1955 $key = $by_value{$value};
1956
1957 That's not particularly efficient. It would be more space-efficient to
1958 use:
1959
1960 while (($key, $value) = each %by_key) {
1961 $by_value{$value} = $key;
1962 }
1963
1964 If your hash could have repeated values, the methods above will only
1965 find one of the associated keys. This may or may not worry you. If
1966 it does worry you, you can always reverse the hash into a hash of
1967 arrays instead:
1968
1969 while (($key, $value) = each %by_key) {
1970 push @{$key_list_by_value{$value}}, $key;
1971 }
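
Here is the hash-of-arrays version run against some sample data, so you can see what the result looks like. The fruit and color names are invented for the example:

```perl
use strict;
use warnings;

# Hypothetical data with a repeated value.
my %color_of = ( apple => 'red', cherry => 'red', banana => 'yellow' );

my %fruits_by_color;
while ( my ( $fruit, $color ) = each %color_of ) {
    push @{ $fruits_by_color{$color} }, $fruit;
}

# each() returns pairs in no particular order, so sort for stable output.
for my $color ( sort keys %fruits_by_color ) {
    my @fruits = sort @{ $fruits_by_color{$color} };
    print "$color: @fruits\n";
}
```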
1972
1973 How can I know how many entries are in a hash?
1974 (contributed by brian d foy)
1975
1976 This is very similar to "How do I process an entire hash?", also in
1977 perlfaq4, but a bit simpler in the common cases.
1978
1979 You can use the "keys()" built-in function in scalar context to find
out how many entries you have in a hash:
1981
1982 my $key_count = keys %hash; # must be scalar context!
1983
1984 If you want to find out how many entries have a defined value, that's a
1985 bit different. You have to check each value. A "grep" is handy:
1986
1987 my $defined_value_count = grep { defined } values %hash;
1988
1989 You can use that same structure to count the entries any way that you
1990 like. If you want the count of the keys with vowels in them, you just
1991 test for that instead:
1992
1993 my $vowel_count = grep { /[aeiou]/ } keys %hash;
1994
1995 The "grep" in scalar context returns the count. If you want the list of
1996 matching items, just use it in list context instead:
1997
1998 my @defined_values = grep { defined } values %hash;
1999
2000 The "keys()" function also resets the iterator, which means that you
2001 may see strange results if you use this between uses of other hash
2002 operators such as "each()".
2003
2004 How do I sort a hash (optionally by value instead of key)?
2005 (contributed by brian d foy)
2006
2007 To sort a hash, start with the keys. In this example, we give the list
2008 of keys to the sort function which then compares them ASCIIbetically
2009 (which might be affected by your locale settings). The output list has
2010 the keys in ASCIIbetical order. Once we have the keys, we can go
2011 through them to create a report which lists the keys in ASCIIbetical
2012 order.
2013
2014 my @keys = sort { $a cmp $b } keys %hash;
2015
2016 foreach my $key ( @keys )
2017 {
2018 printf "%-20s %6d\n", $key, $hash{$key};
2019 }
2020
2021 We could get more fancy in the "sort()" block though. Instead of
2022 comparing the keys, we can compute a value with them and use that value
2023 as the comparison.
2024
2025 For instance, to make our report order case-insensitive, we use the
2026 "\L" sequence in a double-quoted string to make everything lowercase.
2027 The "sort()" block then compares the lowercased values to determine in
2028 which order to put the keys.
2029
2030 my @keys = sort { "\L$a" cmp "\L$b" } keys %hash;
2031
2032 Note: if the computation is expensive or the hash has many elements,
2033 you may want to look at the Schwartzian Transform to cache the
2034 computation results.
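
A minimal sketch of that transform, applied to the case-insensitive sort above. Here "lc" merely stands in for whatever expensive computation you have, and the sample hash is made up:

```perl
use strict;
use warnings;

my %hash = ( Banana => 3, apple => 7, Cherry => 1 );

my @keys =
    map  { $_->[1] }                   # 3. recover the original key
    sort { $a->[0] cmp $b->[0] }       # 2. compare the cached values
    map  { [ lc($_), $_ ] }            # 1. compute each sort key once
    keys %hash;

print "@keys\n";
```

Read it from the bottom up: each key is paired with its computed value exactly once, the pairs are sorted on that cached value, and the original keys are pulled back out at the end.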
2035
2036 If we want to sort by the hash value instead, we use the hash key to
2037 look it up. We still get out a list of keys, but this time they are
2038 ordered by their value.
2039
2040 my @keys = sort { $hash{$a} <=> $hash{$b} } keys %hash;
2041
2042 From there we can get more complex. If the hash values are the same, we
2043 can provide a secondary sort on the hash key.
2044
2045 my @keys = sort {
2046 $hash{$a} <=> $hash{$b}
2047 or
2048 "\L$a" cmp "\L$b"
2049 } keys %hash;
2050
2051 How can I always keep my hash sorted?
2052 You can look into using the "DB_File" module and "tie()" using the
2053 $DB_BTREE hash bindings as documented in "In Memory Databases" in
2054 DB_File. The "Tie::IxHash" module from CPAN might also be instructive.
Although this does keep your hash sorted, you might not like the
slowdown you suffer from the tie interface. Are you sure you need to do
2057 this? :)
2058
2059 What's the difference between "delete" and "undef" with hashes?
2060 Hashes contain pairs of scalars: the first is the key, the second is
2061 the value. The key will be coerced to a string, although the value can
2062 be any kind of scalar: string, number, or reference. If a key $key is
2063 present in %hash, "exists($hash{$key})" will return true. The value
2064 for a given key can be "undef", in which case $hash{$key} will be
2065 "undef" while "exists $hash{$key}" will return true. This corresponds
2066 to ($key, "undef") being in the hash.
2067
2068 Pictures help... here's the %hash table:
2069
2070 keys values
2071 +------+------+
2072 | a | 3 |
2073 | x | 7 |
2074 | d | 0 |
2075 | e | 2 |
2076 +------+------+
2077
2078 And these conditions hold
2079
2080 $hash{'a'} is true
2081 $hash{'d'} is false
2082 defined $hash{'d'} is true
2083 defined $hash{'a'} is true
2084 exists $hash{'a'} is true (Perl 5 only)
2085 grep ($_ eq 'a', keys %hash) is true
2086
2087 If you now say
2088
2089 undef $hash{'a'}
2090
2091 your table now reads:
2092
2093 keys values
2094 +------+------+
2095 | a | undef|
2096 | x | 7 |
2097 | d | 0 |
2098 | e | 2 |
2099 +------+------+
2100
2101 and these conditions now hold; changes in caps:
2102
2103 $hash{'a'} is FALSE
2104 $hash{'d'} is false
2105 defined $hash{'d'} is true
2106 defined $hash{'a'} is FALSE
2107 exists $hash{'a'} is true (Perl 5 only)
2108 grep ($_ eq 'a', keys %hash) is true
2109
2110 Notice the last two: you have an undef value, but a defined key!
2111
2112 Now, consider this:
2113
2114 delete $hash{'a'}
2115
2116 your table now reads:
2117
2118 keys values
2119 +------+------+
2120 | x | 7 |
2121 | d | 0 |
2122 | e | 2 |
2123 +------+------+
2124
2125 and these conditions now hold; changes in caps:
2126
2127 $hash{'a'} is false
2128 $hash{'d'} is false
2129 defined $hash{'d'} is true
2130 defined $hash{'a'} is false
2131 exists $hash{'a'} is FALSE (Perl 5 only)
2132 grep ($_ eq 'a', keys %hash) is FALSE
2133
2134 See, the whole entry is gone!
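
You can check the tables above for yourself with a short script. In this sketch, state_of is a hypothetical helper that reports, for one key, whether the value is defined and whether the key exists:

```perl
use strict;
use warnings;

my %hash = ( a => 3, x => 7, d => 0, e => 2 );

# Return "value-is-defined/key-exists" for a key, as 1 or 0 each.
sub state_of {
    my ($key) = @_;
    return join '/',
        ( defined $hash{$key} ? 1 : 0 ),
        ( exists  $hash{$key} ? 1 : 0 );
}

my $before      = state_of('a');    # defined and exists
undef $hash{'a'};
my $after_undef = state_of('a');    # not defined, still exists
delete $hash{'a'};
my $after_del   = state_of('a');    # not defined, does not exist

print "before=$before undef=$after_undef delete=$after_del\n";
```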
2135
2136 Why don't my tied hashes make the defined/exists distinction?
2137 This depends on the tied hash's implementation of EXISTS(). For
2138 example, there isn't the concept of undef with hashes that are tied to
2139 DBM* files. It also means that exists() and defined() do the same thing
2140 with a DBM* file, and what they end up doing is not what they do with
2141 ordinary hashes.
2142
2143 How do I reset an each() operation part-way through?
2144 (contributed by brian d foy)
2145
2146 You can use the "keys" or "values" functions to reset "each". To simply
2147 reset the iterator used by "each" without doing anything else, use one
2148 of them in void context:
2149
2150 keys %hash; # resets iterator, nothing else.
2151 values %hash; # resets iterator, nothing else.
2152
2153 See the documentation for "each" in perlfunc.
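
A short sketch of the reset in action (the sample hash is made up). After the call to "keys", the next "each" starts over and returns the same first key as before, because the hash has not been modified in between:

```perl
use strict;
use warnings;

my %hash = ( a => 1, b => 2, c => 3 );

my ($first)  = each %hash;    # start iterating
my ($second) = each %hash;    # iterator is now part-way through

keys %hash;                   # void context: just reset the iterator

my ($again) = each %hash;     # iteration starts over from the beginning

print "first=$first again=$again\n";
```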
2154
2155 How can I get the unique keys from two hashes?
2156 First you extract the keys from the hashes into lists, then solve the
2157 "removing duplicates" problem described above. For example:
2158
2159 %seen = ();
2160 for $element (keys(%foo), keys(%bar)) {
2161 $seen{$element}++;
2162 }
2163 @uniq = keys %seen;
2164
2165 Or more succinctly:
2166
2167 @uniq = keys %{{%foo,%bar}};
2168
2169 Or if you really want to save space:
2170
2171 %seen = ();
2172 while (defined ($key = each %foo)) {
2173 $seen{$key}++;
2174 }
2175 while (defined ($key = each %bar)) {
2176 $seen{$key}++;
2177 }
2178 @uniq = keys %seen;
2179
2180 How can I store a multidimensional array in a DBM file?
Either stringify the structure yourself (no fun), or else get the MLDBM
module (which uses Data::Dumper) from CPAN and layer it on top of
either DB_File or GDBM_File.
2184
2185 How can I make my hash remember the order I put elements into it?
Use the "Tie::IxHash" module from CPAN.
2187
2188 use Tie::IxHash;
2189
2190 tie my %myhash, 'Tie::IxHash';
2191
2192 for (my $i=0; $i<20; $i++) {
2193 $myhash{$i} = 2*$i;
2194 }
2195
2196 my @keys = keys %myhash;
2197 # @keys = (0,1,2,3,...)
2198
2199 Why does passing a subroutine an undefined element in a hash create it?
2200 (contributed by brian d foy)
2201
2202 Are you using a really old version of Perl?
2203
2204 Normally, accessing a hash key's value for a nonexistent key will not
2205 create the key.
2206
2207 my %hash = ();
2208 my $value = $hash{ 'foo' };
2209 print "This won't print\n" if exists $hash{ 'foo' };
2210
2211 Passing $hash{ 'foo' } to a subroutine used to be a special case,
2212 though. Since you could assign directly to $_[0], Perl had to be ready
2213 to make that assignment so it created the hash key ahead of time:
2214
2215 my_sub( $hash{ 'foo' } );
2216 print "This will print before 5.004\n" if exists $hash{ 'foo' };
2217
2218 sub my_sub {
2219 # $_[0] = 'bar'; # create hash key in case you do this
2220 1;
2221 }
2222
2223 Since Perl 5.004, however, this situation is a special case and Perl
2224 creates the hash key only when you make the assignment:
2225
2226 my_sub( $hash{ 'foo' } );
2227 print "This will print, even after 5.004\n" if exists $hash{ 'foo' };
2228
2229 sub my_sub {
2230 $_[0] = 'bar';
2231 }
2232
2233 However, if you want the old behavior (and think carefully about that
2234 because it's a weird side effect), you can pass a hash slice instead.
2235 Perl 5.004 didn't make this a special case:
2236
2237 my_sub( @hash{ qw/foo/ } );
2238
2239 How can I make the Perl equivalent of a C structure/C++ class/hash or array
2240 of hashes or arrays?
2241 Usually a hash ref, perhaps like this:
2242
2243 $record = {
2244 NAME => "Jason",
2245 EMPNO => 132,
2246 TITLE => "deputy peon",
2247 AGE => 23,
2248 SALARY => 37_000,
2249 PALS => [ "Norbert", "Rhys", "Phineas"],
2250 };
2251
References are documented in perlref and perlreftut.
2253 Examples of complex data structures are given in perldsc and perllol.
2254 Examples of structures and object-oriented classes are in perltoot.
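
Once you have such a record, you get at its fields with the arrow operator. A brief sketch using the same $record; the @staff array is added here only to show an array of records:

```perl
use strict;
use warnings;

my $record = {
    NAME   => "Jason",
    EMPNO  => 132,
    TITLE  => "deputy peon",
    AGE    => 23,
    SALARY => 37_000,
    PALS   => [ "Norbert", "Rhys", "Phineas" ],
};

my $name      = $record->{NAME};            # a scalar field
my $first_pal = $record->{PALS}[0];         # one element of the nested array
my @pals      = @{ $record->{PALS} };       # the whole nested array

$record->{AGE}++;                           # fields can be updated in place

# An array of such records gives you a simple table.
my @staff = ($record);
print "$staff[0]{NAME} ($staff[0]{AGE}) has ", scalar(@pals), " pals\n";
```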
2255
2256 How can I use a reference as a hash key?
2257 (contributed by brian d foy and Ben Morrow)
2258
2259 Hash keys are strings, so you can't really use a reference as the key.
2260 When you try to do that, perl turns the reference into its stringified
2261 form (for instance, "HASH(0xDEADBEEF)"). From there you can't get back
2262 the reference from the stringified form, at least without doing some
2263 extra work on your own.
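
A quick sketch of what happens with a plain hash (the variable names are for illustration only):

```perl
use strict;
use warnings;

my $array_ref = [ 1, 2, 3 ];
my %hash;
$hash{$array_ref} = 'some value';

my ($key) = keys %hash;

# The key is an ordinary string such as "ARRAY(0x...)", not a reference.
my $key_is_ref = ref($key) ? 1 : 0;

# Looking up by the original reference still works, because it
# stringifies to the same text.
my $value = $hash{$array_ref};

print "key=$key is_ref=$key_is_ref value=$value\n";
```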
2264
2265 Remember that the entry in the hash will still be there even if the
2266 referenced variable goes out of scope, and that it is entirely
2267 possible for Perl to subsequently allocate a different variable at the
2268 same address. This will mean a new variable might accidentally be
associated with the value for an old one.
2270
2271 If you have Perl 5.10 or later, and you just want to store a value
2272 against the reference for lookup later, you can use the core
Hash::Util::FieldHash module. This will also handle renaming the keys
2274 if you use multiple threads (which causes all variables to be
2275 reallocated at new addresses, changing their stringification), and
2276 garbage-collecting the entries when the referenced variable goes out of
2277 scope.
2278
2279 If you actually need to be able to get a real reference back from each
2280 hash entry, you can use the Tie::RefHash module, which does the
2281 required work for you.
2282
2284 How do I handle binary data correctly?
2285 Perl is binary clean, so it can handle binary data just fine. On
2286 Windows or DOS, however, you have to use "binmode" for binary files to
2287 avoid conversions for line endings. In general, you should use
2288 "binmode" any time you want to work with binary data.
2289
2290 Also see "binmode" in perlfunc or perlopentut.
2291
2292 If you're concerned about 8-bit textual data then see perllocale. If
2293 you want to deal with multibyte characters, however, there are some
2294 gotchas. See the section on Regular Expressions.
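
As a minimal sketch of the "binmode" advice, this writes a few awkward bytes (a NUL, a carriage return, a newline, and 0xFF) to a scratch file and reads them back unchanged. The file name is hypothetical and the file is removed afterwards:

```perl
use strict;
use warnings;

# Write and re-read a few bytes without letting any I/O layer
# translate them. The scratch file name is made up for this example.
my $file  = "binmode_demo.$$";
my $bytes = "\x00\x0d\x0a\xff";

open my $out, '>', $file or die "Cannot write $file: $!";
binmode $out;
print {$out} $bytes;
close $out or die "Cannot close $file: $!";

open my $in, '<', $file or die "Cannot read $file: $!";
binmode $in;
my $copy = do { local $/; <$in> };    # slurp the whole file
close $in;
unlink $file;

print "round trip ", ( $copy eq $bytes ? "ok" : "FAILED" ), "\n";
```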
2295
2296 How do I determine whether a scalar is a number/whole/integer/float?
2297 Assuming that you don't care about IEEE notations like "NaN" or
2298 "Infinity", you probably just want to use a regular expression.
2299
2300 if (/\D/) { print "has nondigits\n" }
2301 if (/^\d+$/) { print "is a whole number\n" }
2302 if (/^-?\d+$/) { print "is an integer\n" }
2303 if (/^[+-]?\d+$/) { print "is a +/- integer\n" }
2304 if (/^-?\d+\.?\d*$/) { print "is a real number\n" }
2305 if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number\n" }
2306 if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
2307 { print "a C float\n" }
2308
2309 There are also some commonly used modules for the task. Scalar::Util
2310 (distributed with 5.8) provides access to perl's internal function
2311 "looks_like_number" for determining whether a variable looks like a
2312 number. Data::Types exports functions that validate data types using
2313 both the above and other regular expressions. Thirdly, there is
2314 "Regexp::Common" which has regular expressions to match various types
2315 of numbers. Those three modules are available from the CPAN.
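
Here is a short sketch of "looks_like_number" at work. The candidate strings are made up, and %is_number is just a scratch hash for the results:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

my @candidates = ( '42', '-3.14', '1e5', 'abc', '12abc', '' );

my %is_number;
for my $candidate (@candidates) {
    $is_number{$candidate} = looks_like_number($candidate) ? 1 : 0;
    printf "%-7s %s\n", "'$candidate'",
        $is_number{$candidate} ? "looks like a number" : "does not";
}
```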
2316
2317 If you're on a POSIX system, Perl supports the "POSIX::strtod"
2318 function. Its semantics are somewhat cumbersome, so here's a "getnum"
2319 wrapper function for more convenient access. This function takes a
2320 string and returns the number it found, or "undef" for input that isn't
2321 a C float. The "is_numeric" function is a front end to "getnum" if you
2322 just want to say, "Is this a float?"
2323
2324 sub getnum {
2325 use POSIX qw(strtod);
2326 my $str = shift;
2327 $str =~ s/^\s+//;
2328 $str =~ s/\s+$//;
2329 $! = 0;
2330 my($num, $unparsed) = strtod($str);
2331 if (($str eq '') || ($unparsed != 0) || $!) {
2332 return undef;
2333 }
2334 else {
2335 return $num;
2336 }
2337 }
2338
2339 sub is_numeric { defined getnum($_[0]) }
2340
2341 Or you could check out the String::Scanf module on the CPAN instead.
The "POSIX" module (part of the standard Perl distribution) provides
the "strtod" and "strtol" functions for converting strings to doubles
and longs, respectively.
2345
2346 How do I keep persistent data across program calls?
2347 For some specific applications, you can use one of the DBM modules.
See AnyDBM_File. More generally, you should consult the "FreezeThaw"
2349 or "Storable" modules from CPAN. Starting from Perl 5.8 "Storable" is
2350 part of the standard distribution. Here's one example using
2351 "Storable"'s "store" and "retrieve" functions:
2352
2353 use Storable;
2354 store(\%hash, "filename");
2355
2356 # later on...
2357 $href = retrieve("filename"); # by ref
2358 %hash = %{ retrieve("filename") }; # direct to hash
2359
2360 How do I print out or copy a recursive data structure?
2361 The "Data::Dumper" module on CPAN (or the 5.005 release of Perl) is
2362 great for printing out data structures. The "Storable" module on CPAN
(or the 5.8 release of Perl) provides a function called "dclone" that
2364 recursively copies its argument.
2365
2366 use Storable qw(dclone);
2367 $r2 = dclone($r1);
2368
Here $r1 can be a reference to any kind of data structure you'd like.
2370 It will be deeply copied. Because "dclone" takes and returns
2371 references, you'd have to add extra punctuation if you had a hash of
2372 arrays that you wanted to copy.
2373
2374 %newhash = %{ dclone(\%oldhash) };
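
To see why the deep copy matters, compare it with a plain assignment. In this sketch (sample data invented), the shallow copy still shares the inner array with the original, while the "dclone" copy does not:

```perl
use strict;
use warnings;
use Storable qw(dclone);

my %oldhash = ( fruits => [ 'apple', 'banana' ] );

my %shallow = %oldhash;                   # copies only the top level
my %deep    = %{ dclone( \%oldhash ) };   # copies the nested arrays too

push @{ $oldhash{fruits} }, 'cherry';

# The shallow copy shares the inner array; the deep copy does not.
my $shallow_count = @{ $shallow{fruits} };
my $deep_count    = @{ $deep{fruits} };

print "shallow sees $shallow_count, deep sees $deep_count\n";
```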
2375
2376 How do I define methods for every class/object?
2377 (contributed by Ben Morrow)
2378
2379 You can use the "UNIVERSAL" class (see UNIVERSAL). However, please be
2380 very careful to consider the consequences of doing this: adding methods
2381 to every object is very likely to have unintended consequences. If
possible, it would be better to have all your objects inherit from some
2383 common base class, or to use an object system like Moose that supports
2384 roles.
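
A minimal sketch of the base-class alternative, using plain @ISA inheritance. All of the package and method names here (My::Base, My::Dog, My::Cat, describe) are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical base class holding the shared method.
package My::Base;
sub new      { my ($class, %args) = @_; return bless {%args}, $class }
sub describe { my ($self) = @_; return "I am a " . ref($self) }

# Two unrelated classes that opt in by inheriting from My::Base.
package My::Dog;
our @ISA = ('My::Base');

package My::Cat;
our @ISA = ('My::Base');

package main;

my @descriptions = map { $_->new->describe } qw(My::Dog My::Cat);
print "$_\n" for @descriptions;
```

Only classes that name My::Base as a parent get the method, so objects from other people's modules are left alone.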
2385
2386 How do I verify a credit card checksum?
2387 Get the "Business::CreditCard" module from CPAN.
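
If you only want to see how such a check works, card numbers conventionally carry a Luhn (mod 10) check digit. The following is a plain-Perl sketch of that checksum, not the module's own code; luhn_ok is a hypothetical helper name, and the number used is the widely published test number, not a real card:

```perl
use strict;
use warnings;

# Plain-Perl sketch of the Luhn checksum; luhn_ok is a hypothetical
# helper name, not a function from Business::CreditCard.
sub luhn_ok {
    my ($number) = @_;
    $number =~ s/[\s-]//g;                  # allow spaces and dashes
    return 0 unless $number =~ /^\d+$/;

    my $sum = 0;
    my @digits = reverse split //, $number;
    for my $i ( 0 .. $#digits ) {
        my $digit = $digits[$i];
        if ( $i % 2 ) {                     # every second digit from the right
            $digit *= 2;
            $digit -= 9 if $digit > 9;
        }
        $sum += $digit;
    }
    return $sum % 10 == 0 ? 1 : 0;
}

print luhn_ok('4111 1111 1111 1111') ? "valid\n" : "invalid\n";
```

A passing checksum only catches typing mistakes. It says nothing about whether the card exists or is in good standing.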
2388
2389 How do I pack arrays of doubles or floats for XS code?
2390 The arrays.h/arrays.c code in the "PGPLOT" module on CPAN does just
2391 this. If you're doing a lot of float or double processing, consider
2392 using the "PDL" module from CPAN instead--it makes number-crunching
2393 easy.
2394
2395 See <http://search.cpan.org/dist/PGPLOT> for the code.
2396
2402 See perlfaq for source control details and availability.
2403
2405 Copyright (c) 1997-2009 Tom Christiansen, Nathan Torkington, and other
2406 authors as noted. All rights reserved.
2407
2408 This documentation is free; you can redistribute it and/or modify it
2409 under the same terms as Perl itself.
2410
2411 Irrespective of its distribution, all code examples in this file are
2412 hereby placed into the public domain. You are permitted and encouraged
2413 to use this code in your own programs for fun or for profit as you see
2414 fit. A simple comment in the code giving credit would be courteous but
2415 is not required.
2416
2417
2418
2419perl v5.10.1 2009-08-15 PERLFAQ4(1)