1PERLOP(1) Perl Programmers Reference Guide PERLOP(1)
2
3
4
6 perlop - Perl operators and precedence
7
9 Operator Precedence and Associativity
10
11 Operator precedence and associativity work in Perl more or less like
12 they do in mathematics.
13
14 Operator precedence means some operators are evaluated before others.
15 For example, in "2 + 4 * 5", the multiplication has higher precedence
16 so "4 * 5" is evaluated first yielding "2 + 20 == 22" and not "6 * 5 ==
17 30".
18
19 Operator associativity defines what happens if a sequence of the same
20 operators is used one after another: whether the evaluator will evalu‐
21 ate the left operations first or the right. For example, in "8 - 4 -
22 2", subtraction is left associative so Perl evaluates the expression
23 left to right. "8 - 4" is evaluated first making the expression "4 - 2
24 == 2" and not "8 - 2 == 6".
25
26 Perl operators have the following associativity and precedence, listed
27 from highest precedence to lowest. Operators borrowed from C keep the
28 same precedence relationship with each other, even where C's precedence
29 is slightly screwy. (This makes learning Perl easier for C folks.)
30 With very few exceptions, these all operate on scalar values only, not
31 array values.
32
33 left terms and list operators (leftward)
34 left ->
35 nonassoc ++ --
36 right **
37 right ! ~ \ and unary + and -
38 left =~ !~
39 left * / % x
40 left + - .
41 left << >>
42 nonassoc named unary operators
43 nonassoc < > <= >= lt gt le ge
44 nonassoc == != <=> eq ne cmp
45 left &
46 left ⎪ ^
47 left &&
48 left ⎪⎪
49 nonassoc .. ...
50 right ?:
51 right = += -= *= etc.
52 left , =>
53 nonassoc list operators (rightward)
54 right not
55 left and
56 left or xor
57
58 In the following sections, these operators are covered in precedence
59 order.
60
61 Many operators can be overloaded for objects. See overload.
62
63 Terms and List Operators (Leftward)
64
65 A TERM has the highest precedence in Perl. They include variables,
66 quote and quote-like operators, any expression in parentheses, and any
67 function whose arguments are parenthesized. Actually, there aren't
68 really functions in this sense, just list operators and unary operators
69 behaving as functions because you put parentheses around the arguments.
70 These are all documented in perlfunc.
71
72 If any list operator (print(), etc.) or any unary operator (chdir(),
73 etc.) is followed by a left parenthesis as the next token, the opera‐
74 tor and arguments within parentheses are taken to be of highest prece‐
75 dence, just like a normal function call.
76
77 In the absence of parentheses, the precedence of list operators such as
78 "print", "sort", or "chmod" is either very high or very low depending
79 on whether you are looking at the left side or the right side of the
80 operator. For example, in
81
82 @ary = (1, 3, sort 4, 2);
83 print @ary; # prints 1324
84
85 the commas on the right of the sort are evaluated before the sort, but
86 the commas on the left are evaluated after. In other words, list oper‐
87 ators tend to gobble up all arguments that follow, and then act like a
88 simple TERM with regard to the preceding expression. Be careful with
89 parentheses:
90
91 # These evaluate exit before doing the print:
92 print($foo, exit); # Obviously not what you want.
93 print $foo, exit; # Nor is this.
94
95 # These do the print before evaluating exit:
96 (print $foo), exit; # This is what you want.
97 print($foo), exit; # Or this.
98 print ($foo), exit; # Or even this.
99
100 Also note that
101
102 print ($foo & 255) + 1, "\n";
103
104 probably doesn't do what you expect at first glance. The parentheses
105 enclose the argument list for "print" which is evaluated (printing the
106 result of "$foo & 255"). Then one is added to the return value of
107 "print" (usually 1). The result is something like this:
108
109 1 + 1, "\n"; # Obviously not what you meant.
110
111 To do what you meant properly, you must write:
112
113 print(($foo & 255) + 1, "\n");
114
115 See "Named Unary Operators" for more discussion of this.
116
117 Also parsed as terms are the "do {}" and "eval {}" constructs, as well
118 as subroutine and method calls, and the anonymous constructors "[]" and
119 "{}".
120
121 See also "Quote and Quote-like Operators" toward the end of this sec‐
122 tion, as well as "I/O Operators".
123
124 The Arrow Operator
125
126 ""->"" is an infix dereference operator, just as it is in C and C++.
127 If the right side is either a "[...]", "{...}", or a "(...)" subscript,
128 then the left side must be either a hard or symbolic reference to an
129 array, a hash, or a subroutine respectively. (Or technically speaking,
130 a location capable of holding a hard reference, if it's an array or
131 hash reference being used for assignment.) See perlreftut and perlref.
132
133 Otherwise, the right side is a method name or a simple scalar variable
134 containing either the method name or a subroutine reference, and the
135 left side must be either an object (a blessed reference) or a class
136 name (that is, a package name). See perlobj.
137
138 Auto-increment and Auto-decrement
139
140 "++" and "--" work as in C. That is, if placed before a variable, they
141 increment or decrement the variable by one before returning the value,
142 and if placed after, increment or decrement after returning the value.
143
144 $i = 0; $j = 0;
145 print $i++; # prints 0
146 print ++$j; # prints 1
147
148 Note that just as in C, Perl doesn't define when the variable is incre‐
149 mented or decremented. You just know it will be done sometime before or
150 after the value is returned. This also means that modifying a variable
151 twice in the same statement will lead to undefined behaviour. Avoid
152 statements like:
153
154 $i = $i ++;
155 print ++ $i + $i ++;
156
157 Perl will not guarantee what the result of the above statements is.
158
159 The auto-increment operator has a little extra builtin magic to it. If
160 you increment a variable that is numeric, or that has ever been used in
161 a numeric context, you get a normal increment. If, however, the vari‐
162 able has been used in only string contexts since it was set, and has a
163 value that is not the empty string and matches the pattern
164 "/^[a-zA-Z]*[0-9]*\z/", the increment is done as a string, preserving
165 each character within its range, with carry:
166
167 print ++($foo = '99'); # prints '100'
168 print ++($foo = 'a0'); # prints 'a1'
169 print ++($foo = 'Az'); # prints 'Ba'
170 print ++($foo = 'zz'); # prints 'aaa'
171
172 "undef" is always treated as numeric, and in particular is changed to 0
173 before incrementing (so that a post-increment of an undef value will
174 return 0 rather than "undef").
175
176 The auto-decrement operator is not magical.
177
178 Exponentiation
179
180 Binary "**" is the exponentiation operator. It binds even more tightly
181 than unary minus, so -2**4 is -(2**4), not (-2)**4. (This is imple‐
182 mented using C's pow(3) function, which actually works on doubles
183 internally.)
184
185 Symbolic Unary Operators
186
187 Unary "!" performs logical negation, i.e., "not". See also "not" for a
188 lower precedence version of this.
189
190 Unary "-" performs arithmetic negation if the operand is numeric. If
191 the operand is an identifier, a string consisting of a minus sign con‐
192 catenated with the identifier is returned. Otherwise, if the string
193 starts with a plus or minus, a string starting with the opposite sign
194 is returned. One effect of these rules is that -bareword is equivalent
195 to the string "-bareword". If, however, the string begins with a non-
196 alphabetic character (exluding "+" or "-"), Perl will attempt to con‐
197 vert the string to a numeric and the arithmetic negation is performed.
198 If the string cannot be cleanly converted to a numeric, Perl will give
199 the warning Argument "the string" isn't numeric in negation (-) at ....
200
201 Unary "~" performs bitwise negation, i.e., 1's complement. For exam‐
202 ple, "0666 & ~027" is 0640. (See also "Integer Arithmetic" and "Bit‐
203 wise String Operators".) Note that the width of the result is plat‐
204 form-dependent: ~0 is 32 bits wide on a 32-bit platform, but 64 bits
205 wide on a 64-bit platform, so if you are expecting a certain bit width,
206 remember to use the & operator to mask off the excess bits.
207
208 Unary "+" has no effect whatsoever, even on strings. It is useful syn‐
209 tactically for separating a function name from a parenthesized expres‐
210 sion that would otherwise be interpreted as the complete list of func‐
211 tion arguments. (See examples above under "Terms and List Operators
212 (Leftward)".)
213
214 Unary "\" creates a reference to whatever follows it. See perlreftut
215 and perlref. Do not confuse this behavior with the behavior of back‐
216 slash within a string, although both forms do convey the notion of pro‐
217 tecting the next thing from interpolation.
218
219 Binding Operators
220
221 Binary "=~" binds a scalar expression to a pattern match. Certain
222 operations search or modify the string $_ by default. This operator
223 makes that kind of operation work on some other string. The right
224 argument is a search pattern, substitution, or transliteration. The
225 left argument is what is supposed to be searched, substituted, or
226 transliterated instead of the default $_. When used in scalar context,
227 the return value generally indicates the success of the operation.
228 Behavior in list context depends on the particular operator. See "Reg‐
229 exp Quote-Like Operators" for details and perlretut for examples using
230 these operators.
231
232 If the right argument is an expression rather than a search pattern,
233 substitution, or transliteration, it is interpreted as a search pattern
234 at run time.
235
236 Binary "!~" is just like "=~" except the return value is negated in the
237 logical sense.
238
239 Multiplicative Operators
240
241 Binary "*" multiplies two numbers.
242
243 Binary "/" divides two numbers.
244
245 Binary "%" computes the modulus of two numbers. Given integer operands
246 $a and $b: If $b is positive, then "$a % $b" is $a minus the largest
247 multiple of $b that is not greater than $a. If $b is negative, then
248 "$a % $b" is $a minus the smallest multiple of $b that is not less than
249 $a (i.e. the result will be less than or equal to zero). Note that
250 when "use integer" is in scope, "%" gives you direct access to the mod‐
251 ulus operator as implemented by your C compiler. This operator is not
252 as well defined for negative operands, but it will execute faster.
253
254 Binary "x" is the repetition operator. In scalar context or if the
255 left operand is not enclosed in parentheses, it returns a string con‐
256 sisting of the left operand repeated the number of times specified by
257 the right operand. In list context, if the left operand is enclosed in
258 parentheses or is a list formed by "qw/STRING/", it repeats the list.
259 If the right operand is zero or negative, it returns an empty string or
260 an empty list, depending on the context.
261
262 print '-' x 80; # print row of dashes
263
264 print "\t" x ($tab/8), ' ' x ($tab%8); # tab over
265
266 @ones = (1) x 80; # a list of 80 1's
267 @ones = (5) x @ones; # set all elements to 5
268
269 Additive Operators
270
271 Binary "+" returns the sum of two numbers.
272
273 Binary "-" returns the difference of two numbers.
274
275 Binary "." concatenates two strings.
276
277 Shift Operators
278
279 Binary "<<" returns the value of its left argument shifted left by the
280 number of bits specified by the right argument. Arguments should be
281 integers. (See also "Integer Arithmetic".)
282
283 Binary ">>" returns the value of its left argument shifted right by the
284 number of bits specified by the right argument. Arguments should be
285 integers. (See also "Integer Arithmetic".)
286
287 Note that both "<<" and ">>" in Perl are implemented directly using
288 "<<" and ">>" in C. If "use integer" (see "Integer Arithmetic") is in
289 force then signed C integers are used, else unsigned C integers are
290 used. Either way, the implementation isn't going to generate results
291 larger than the size of the integer type Perl was built with (32 bits
292 or 64 bits).
293
294 The result of overflowing the range of the integers is undefined
295 because it is undefined also in C. In other words, using 32-bit inte‐
296 gers, "1 << 32" is undefined. Shifting by a negative number of bits is
297 also undefined.
298
299 Named Unary Operators
300
301 The various named unary operators are treated as functions with one
302 argument, with optional parentheses.
303
304 If any list operator (print(), etc.) or any unary operator (chdir(),
305 etc.) is followed by a left parenthesis as the next token, the opera‐
306 tor and arguments within parentheses are taken to be of highest prece‐
307 dence, just like a normal function call. For example, because named
308 unary operators are higher precedence than ⎪⎪:
309
310 chdir $foo ⎪⎪ die; # (chdir $foo) ⎪⎪ die
311 chdir($foo) ⎪⎪ die; # (chdir $foo) ⎪⎪ die
312 chdir ($foo) ⎪⎪ die; # (chdir $foo) ⎪⎪ die
313 chdir +($foo) ⎪⎪ die; # (chdir $foo) ⎪⎪ die
314
315 but, because * is higher precedence than named operators:
316
317 chdir $foo * 20; # chdir ($foo * 20)
318 chdir($foo) * 20; # (chdir $foo) * 20
319 chdir ($foo) * 20; # (chdir $foo) * 20
320 chdir +($foo) * 20; # chdir ($foo * 20)
321
322 rand 10 * 20; # rand (10 * 20)
323 rand(10) * 20; # (rand 10) * 20
324 rand (10) * 20; # (rand 10) * 20
325 rand +(10) * 20; # rand (10 * 20)
326
327 Regarding precedence, the filetest operators, like "-f", "-M", etc. are
328 treated like named unary operators, but they don't follow this func‐
329 tional parenthesis rule. That means, for example, that
330 "-f($file).".bak"" is equivalent to "-f "$file.bak"".
331
332 See also "Terms and List Operators (Leftward)".
333
334 Relational Operators
335
336 Binary "<" returns true if the left argument is numerically less than
337 the right argument.
338
339 Binary ">" returns true if the left argument is numerically greater
340 than the right argument.
341
342 Binary "<=" returns true if the left argument is numerically less than
343 or equal to the right argument.
344
345 Binary ">=" returns true if the left argument is numerically greater
346 than or equal to the right argument.
347
348 Binary "lt" returns true if the left argument is stringwise less than
349 the right argument.
350
351 Binary "gt" returns true if the left argument is stringwise greater
352 than the right argument.
353
354 Binary "le" returns true if the left argument is stringwise less than
355 or equal to the right argument.
356
357 Binary "ge" returns true if the left argument is stringwise greater
358 than or equal to the right argument.
359
360 Equality Operators
361
362 Binary "==" returns true if the left argument is numerically equal to
363 the right argument.
364
365 Binary "!=" returns true if the left argument is numerically not equal
366 to the right argument.
367
368 Binary "<=>" returns -1, 0, or 1 depending on whether the left argument
369 is numerically less than, equal to, or greater than the right argument.
370 If your platform supports NaNs (not-a-numbers) as numeric values, using
371 them with "<=>" returns undef. NaN is not "<", "==", ">", "<=" or ">="
372 anything (even NaN), so those 5 return false. NaN != NaN returns true,
373 as does NaN != anything else. If your platform doesn't support NaNs
374 then NaN is just a string with numeric value 0.
375
376 perl -le '$a = "NaN"; print "No NaN support here" if $a == $a'
377 perl -le '$a = "NaN"; print "NaN support here" if $a != $a'
378
379 Binary "eq" returns true if the left argument is stringwise equal to
380 the right argument.
381
382 Binary "ne" returns true if the left argument is stringwise not equal
383 to the right argument.
384
385 Binary "cmp" returns -1, 0, or 1 depending on whether the left argument
386 is stringwise less than, equal to, or greater than the right argument.
387
388 "lt", "le", "ge", "gt" and "cmp" use the collation (sort) order speci‐
389 fied by the current locale if "use locale" is in effect. See perllo‐
390 cale.
391
392 Bitwise And
393
394 Binary "&" returns its operands ANDed together bit by bit. (See also
395 "Integer Arithmetic" and "Bitwise String Operators".)
396
397 Note that "&" has lower priority than relational operators, so for
398 example the brackets are essential in a test like
399
400 print "Even\n" if ($x & 1) == 0;
401
402 Bitwise Or and Exclusive Or
403
404 Binary "⎪" returns its operands ORed together bit by bit. (See also
405 "Integer Arithmetic" and "Bitwise String Operators".)
406
407 Binary "^" returns its operands XORed together bit by bit. (See also
408 "Integer Arithmetic" and "Bitwise String Operators".)
409
410 Note that "⎪" and "^" have lower priority than relational operators, so
411 for example the brackets are essential in a test like
412
413 print "false\n" if (8 ⎪ 2) != 10;
414
415 C-style Logical And
416
417 Binary "&&" performs a short-circuit logical AND operation. That is,
418 if the left operand is false, the right operand is not even evaluated.
419 Scalar or list context propagates down to the right operand if it is
420 evaluated.
421
422 C-style Logical Or
423
424 Binary "⎪⎪" performs a short-circuit logical OR operation. That is, if
425 the left operand is true, the right operand is not even evaluated.
426 Scalar or list context propagates down to the right operand if it is
427 evaluated.
428
429 The "⎪⎪" and "&&" operators return the last value evaluated (unlike C's
430 "⎪⎪" and "&&", which return 0 or 1). Thus, a reasonably portable way to
431 find out the home directory might be:
432
433 $home = $ENV{'HOME'} ⎪⎪ $ENV{'LOGDIR'} ⎪⎪
434 (getpwuid($<))[7] ⎪⎪ die "You're homeless!\n";
435
436 In particular, this means that you shouldn't use this for selecting
437 between two aggregates for assignment:
438
439 @a = @b ⎪⎪ @c; # this is wrong
440 @a = scalar(@b) ⎪⎪ @c; # really meant this
441 @a = @b ? @b : @c; # this works fine, though
442
443 As more readable alternatives to "&&" and "⎪⎪" when used for control
444 flow, Perl provides "and" and "or" operators (see below). The short-
445 circuit behavior is identical. The precedence of "and" and "or" is
446 much lower, however, so that you can safely use them after a list oper‐
447 ator without the need for parentheses:
448
449 unlink "alpha", "beta", "gamma"
450 or gripe(), next LINE;
451
452 With the C-style operators that would have been written like this:
453
454 unlink("alpha", "beta", "gamma")
455 ⎪⎪ (gripe(), next LINE);
456
457 Using "or" for assignment is unlikely to do what you want; see below.
458
459 Range Operators
460
461 Binary ".." is the range operator, which is really two different opera‐
462 tors depending on the context. In list context, it returns a list of
463 values counting (up by ones) from the left value to the right value.
464 If the left value is greater than the right value then it returns the
465 empty list. The range operator is useful for writing "foreach (1..10)"
466 loops and for doing slice operations on arrays. In the current imple‐
467 mentation, no temporary array is created when the range operator is
468 used as the expression in "foreach" loops, but older versions of Perl
469 might burn a lot of memory when you write something like this:
470
471 for (1 .. 1_000_000) {
472 # code
473 }
474
475 The range operator also works on strings, using the magical auto-incre‐
476 ment, see below.
477
478 In scalar context, ".." returns a boolean value. The operator is
479 bistable, like a flip-flop, and emulates the line-range (comma) opera‐
480 tor of sed, awk, and various editors. Each ".." operator maintains its
481 own boolean state. It is false as long as its left operand is false.
482 Once the left operand is true, the range operator stays true until the
483 right operand is true, AFTER which the range operator becomes false
484 again. It doesn't become false till the next time the range operator
485 is evaluated. It can test the right operand and become false on the
486 same evaluation it became true (as in awk), but it still returns true
487 once. If you don't want it to test the right operand till the next
488 evaluation, as in sed, just use three dots ("...") instead of two. In
489 all other regards, "..." behaves just like ".." does.
490
491 The right operand is not evaluated while the operator is in the "false"
492 state, and the left operand is not evaluated while the operator is in
493 the "true" state. The precedence is a little lower than ⎪⎪ and &&.
494 The value returned is either the empty string for false, or a sequence
495 number (beginning with 1) for true. The sequence number is reset for
496 each range encountered. The final sequence number in a range has the
497 string "E0" appended to it, which doesn't affect its numeric value, but
498 gives you something to search for if you want to exclude the endpoint.
499 You can exclude the beginning point by waiting for the sequence number
500 to be greater than 1.
501
502 If either operand of scalar ".." is a constant expression, that operand
503 is considered true if it is equal ("==") to the current input line num‐
504 ber (the $. variable).
505
506 To be pedantic, the comparison is actually "int(EXPR) == int(EXPR)",
507 but that is only an issue if you use a floating point expression; when
508 implicitly using $. as described in the previous paragraph, the compar‐
509 ison is "int(EXPR) == int($.)" which is only an issue when $. is set
510 to a floating point value and you are not reading from a file. Fur‐
511 thermore, "span" .. "spat" or "2.18 .. 3.14" will not do what you want
512 in scalar context because each of the operands are evaluated using
513 their integer representation.
514
515 Examples:
516
517 As a scalar operator:
518
519 if (101 .. 200) { print; } # print 2nd hundred lines, short for
520 # if ($. == 101 .. $. == 200) ...
521
522 next LINE if (1 .. /^$/); # skip header lines, short for
523 # ... if ($. == 1 .. /^$/);
524 # (typically in a loop labeled LINE)
525
526 s/^/> / if (/^$/ .. eof()); # quote body
527
528 # parse mail messages
529 while (<>) {
530 $in_header = 1 .. /^$/;
531 $in_body = /^$/ .. eof;
532 if ($in_header) {
533 # ...
534 } else { # in body
535 # ...
536 }
537 } continue {
538 close ARGV if eof; # reset $. each file
539 }
540
541 Here's a simple example to illustrate the difference between the two
542 range operators:
543
544 @lines = (" - Foo",
545 "01 - Bar",
546 "1 - Baz",
547 " - Quux");
548
549 foreach (@lines) {
550 if (/0/ .. /1/) {
551 print "$_\n";
552 }
553 }
554
555 This program will print only the line containing "Bar". If the range
556 operator is changed to "...", it will also print the "Baz" line.
557
558 And now some examples as a list operator:
559
560 for (101 .. 200) { print; } # print $_ 100 times
561 @foo = @foo[0 .. $#foo]; # an expensive no-op
562 @foo = @foo[$#foo-4 .. $#foo]; # slice last 5 items
563
564 The range operator (in list context) makes use of the magical auto-
565 increment algorithm if the operands are strings. You can say
566
567 @alphabet = ('A' .. 'Z');
568
569 to get all normal letters of the English alphabet, or
570
571 $hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15];
572
573 to get a hexadecimal digit, or
574
575 @z2 = ('01' .. '31'); print $z2[$mday];
576
577 to get dates with leading zeros. If the final value specified is not
578 in the sequence that the magical increment would produce, the sequence
579 goes until the next value would be longer than the final value speci‐
580 fied.
581
582 Because each operand is evaluated in integer form, "2.18 .. 3.14" will
583 return two elements in list context.
584
585 @list = (2.18 .. 3.14); # same as @list = (2 .. 3);
586
587 Conditional Operator
588
589 Ternary "?:" is the conditional operator, just as in C. It works much
590 like an if-then-else. If the argument before the ? is true, the argu‐
591 ment before the : is returned, otherwise the argument after the : is
592 returned. For example:
593
594 printf "I have %d dog%s.\n", $n,
595 ($n == 1) ? '' : "s";
596
597 Scalar or list context propagates downward into the 2nd or 3rd argu‐
598 ment, whichever is selected.
599
600 $a = $ok ? $b : $c; # get a scalar
601 @a = $ok ? @b : @c; # get an array
602 $a = $ok ? @b : @c; # oops, that's just a count!
603
604 The operator may be assigned to if both the 2nd and 3rd arguments are
605 legal lvalues (meaning that you can assign to them):
606
607 ($a_or_b ? $a : $b) = $c;
608
609 Because this operator produces an assignable result, using assignments
610 without parentheses will get you in trouble. For example, this:
611
612 $a % 2 ? $a += 10 : $a += 2
613
614 Really means this:
615
616 (($a % 2) ? ($a += 10) : $a) += 2
617
618 Rather than this:
619
620 ($a % 2) ? ($a += 10) : ($a += 2)
621
622 That should probably be written more simply as:
623
624 $a += ($a % 2) ? 10 : 2;
625
626 Assignment Operators
627
628 "=" is the ordinary assignment operator.
629
630 Assignment operators work as in C. That is,
631
632 $a += 2;
633
634 is equivalent to
635
636 $a = $a + 2;
637
638 although without duplicating any side effects that dereferencing the
639 lvalue might trigger, such as from tie(). Other assignment operators
640 work similarly. The following are recognized:
641
642 **= += *= &= <<= &&=
643 -= /= ⎪= >>= ⎪⎪=
644 .= %= ^=
645 x=
646
647 Although these are grouped by family, they all have the precedence of
648 assignment.
649
650 Unlike in C, the scalar assignment operator produces a valid lvalue.
651 Modifying an assignment is equivalent to doing the assignment and then
652 modifying the variable that was assigned to. This is useful for modi‐
653 fying a copy of something, like this:
654
655 ($tmp = $global) =~ tr [A-Z] [a-z];
656
657 Likewise,
658
659 ($a += 2) *= 3;
660
661 is equivalent to
662
663 $a += 2;
664 $a *= 3;
665
666 Similarly, a list assignment in list context produces the list of lval‐
667 ues assigned to, and a list assignment in scalar context returns the
668 number of elements produced by the expression on the right hand side of
669 the assignment.
670
671 Comma Operator
672
673 Binary "," is the comma operator. In scalar context it evaluates its
674 left argument, throws that value away, then evaluates its right argu‐
675 ment and returns that value. This is just like C's comma operator.
676
677 In list context, it's just the list argument separator, and inserts
678 both its arguments into the list.
679
680 The "=>" operator is a synonym for the comma, but forces any word (con‐
681 sisting entirely of word characters) to its left to be interpreted as a
682 string (as of 5.001). This includes words that might otherwise be con‐
683 sidered a constant or function call.
684
685 use constant FOO => "something";
686
687 my %h = ( FOO => 23 );
688
689 is equivalent to:
690
691 my %h = ("FOO", 23);
692
693 It is NOT:
694
695 my %h = ("something", 23);
696
697 If the argument on the left is not a word, it is first interpreted as
698 an expression, and then the string value of that is used.
699
700 The "=>" operator is helpful in documenting the correspondence between
701 keys and values in hashes, and other paired elements in lists.
702
703 %hash = ( $key => $value );
704 login( $username => $password );
705
706 List Operators (Rightward)
707
708 On the right side of a list operator, it has very low precedence, such
709 that it controls all comma-separated expressions found there. The only
710 operators with lower precedence are the logical operators "and", "or",
711 and "not", which may be used to evaluate calls to list operators with‐
712 out the need for extra parentheses:
713
714 open HANDLE, "filename"
715 or die "Can't open: $!\n";
716
717 See also discussion of list operators in "Terms and List Operators
718 (Leftward)".
719
720 Logical Not
721
722 Unary "not" returns the logical negation of the expression to its
723 right. It's the equivalent of "!" except for the very low precedence.
724
725 Logical And
726
727 Binary "and" returns the logical conjunction of the two surrounding
728 expressions. It's equivalent to && except for the very low precedence.
729 This means that it short-circuits: i.e., the right expression is evalu‐
730 ated only if the left expression is true.
731
732 Logical or and Exclusive Or
733
734 Binary "or" returns the logical disjunction of the two surrounding
735 expressions. It's equivalent to ⎪⎪ except for the very low precedence.
736 This makes it useful for control flow
737
738 print FH $data or die "Can't write to FH: $!";
739
740 This means that it short-circuits: i.e., the right expression is evalu‐
741 ated only if the left expression is false. Due to its precedence, you
742 should probably avoid using this for assignment, only for control flow.
743
744 $a = $b or $c; # bug: this is wrong
745 ($a = $b) or $c; # really means this
746 $a = $b ⎪⎪ $c; # better written this way
747
748 However, when it's a list-context assignment and you're trying to use
749 "⎪⎪" for control flow, you probably need "or" so that the assignment
750 takes higher precedence.
751
752 @info = stat($file) ⎪⎪ die; # oops, scalar sense of stat!
753 @info = stat($file) or die; # better, now @info gets its due
754
755 Then again, you could always use parentheses.
756
757 Binary "xor" returns the exclusive-OR of the two surrounding expres‐
758 sions. It cannot short circuit, of course.
759
760 C Operators Missing From Perl
761
762 Here is what C has that Perl doesn't:
763
764 unary & Address-of operator. (But see the "\" operator for taking a
765 reference.)
766
767 unary * Dereference-address operator. (Perl's prefix dereferencing
768 operators are typed: $, @, %, and &.)
769
770 (TYPE) Type-casting operator.
771
772 Quote and Quote-like Operators
773
774 While we usually think of quotes as literal values, in Perl they func‐
775 tion as operators, providing various kinds of interpolating and pattern
776 matching capabilities. Perl provides customary quote characters for
777 these behaviors, but also provides a way for you to choose your quote
778 character for any of them. In the following table, a "{}" represents
779 any pair of delimiters you choose.
780
781 Customary Generic Meaning Interpolates
782 '' q{} Literal no
783 "" qq{} Literal yes
784 `` qx{} Command yes*
785 qw{} Word list no
786 // m{} Pattern match yes*
787 qr{} Pattern yes*
788 s{}{} Substitution yes*
789 tr{}{} Transliteration no (but see below)
790 <<EOF here-doc yes*
791
792 * unless the delimiter is ''.
793
794 Non-bracketing delimiters use the same character fore and aft, but the
795 four sorts of brackets (round, angle, square, curly) will all nest,
796 which means that
797
798 q{foo{bar}baz}
799
800 is the same as
801
802 'foo{bar}baz'
803
804 Note, however, that this does not always work for quoting Perl code:
805
806 $s = q{ if($a eq "}") ... }; # WRONG
807
808 is a syntax error. The "Text::Balanced" module (from CPAN, and starting
809 from Perl 5.8 part of the standard distribution) is able to do this
810 properly.
811
812 There can be whitespace between the operator and the quoting charac‐
813 ters, except when "#" is being used as the quoting character. "q#foo#"
814 is parsed as the string "foo", while "q #foo#" is the operator "q" fol‐
815 lowed by a comment. Its argument will be taken from the next line.
816 This allows you to write:
817
818 s {foo} # Replace foo
819 {bar} # with bar.
820
821 The following escape sequences are available in constructs that inter‐
822 polate and in transliterations.
823
824 \t tab (HT, TAB)
825 \n newline (NL)
826 \r return (CR)
827 \f form feed (FF)
828 \b backspace (BS)
829 \a alarm (bell) (BEL)
830 \e escape (ESC)
831 \033 octal char (ESC)
832 \x1b hex char (ESC)
833 \x{263a} wide hex char (SMILEY)
834 \c[ control char (ESC)
835 \N{name} named Unicode character
836
837 NOTE: Unlike C and other languages, Perl has no \v escape sequence for
838 the vertical tab (VT - ASCII 11).
839
840 The following escape sequences are available in constructs that inter‐
841 polate but not in transliterations.
842
843 \l lowercase next char
844 \u uppercase next char
845 \L lowercase till \E
846 \U uppercase till \E
847 \E end case modification
848 \Q quote non-word characters till \E
849
850 If "use locale" is in effect, the case map used by "\l", "\L", "\u" and
851 "\U" is taken from the current locale. See perllocale. If Unicode
852 (for example, "\N{}" or wide hex characters of 0x100 or beyond) is
853 being used, the case map used by "\l", "\L", "\u" and "\U" is as
854 defined by Unicode. For documentation of "\N{name}", see charnames.
855
856 All systems use the virtual "\n" to represent a line terminator, called
857 a "newline". There is no such thing as an unvarying, physical newline
858 character. It is only an illusion that the operating system, device
859 drivers, C libraries, and Perl all conspire to preserve. Not all sys‐
860 tems read "\r" as ASCII CR and "\n" as ASCII LF. For example, on a
861 Mac, these are reversed, and on systems without line terminator, print‐
862 ing "\n" may emit no actual data. In general, use "\n" when you mean a
863 "newline" for your system, but use the literal ASCII when you need an
864 exact character. For example, most networking protocols expect and
865 prefer a CR+LF ("\015\012" or "\cM\cJ") for line terminators, and
866 although they often accept just "\012", they seldom tolerate just
867 "\015". If you get in the habit of using "\n" for networking, you may
868 be burned some day.
869
870 For constructs that do interpolate, variables beginning with ""$"" or
871 ""@"" are interpolated. Subscripted variables such as $a[3] or
872 "$href->{key}[0]" are also interpolated, as are array and hash slices.
873 But method calls such as "$obj->meth" are not.
874
875 Interpolating an array or slice interpolates the elements in order,
876 separated by the value of $", so is equivalent to interpolating "join
877 $", @array". "Punctuation" arrays such as "@+" are only interpolated
878 if the name is enclosed in braces "@{+}".
879
880 You cannot include a literal "$" or "@" within a "\Q" sequence. An
881 unescaped "$" or "@" interpolates the corresponding variable, while
882 escaping will cause the literal string "\$" to be inserted. You'll
883 need to write something like "m/\Quser\E\@\Qhost/".
884
885 Patterns are subject to an additional level of interpretation as a reg‐
886 ular expression. This is done as a second pass, after variables are
887 interpolated, so that regular expressions may be incorporated into the
888 pattern from the variables. If this is not what you want, use "\Q" to
889 interpolate a variable literally.
890
891 Apart from the behavior described above, Perl does not expand multiple
892 levels of interpolation. In particular, contrary to the expectations
893 of shell programmers, back-quotes do NOT interpolate within double
894 quotes, nor do single quotes impede evaluation of variables when used
895 within double quotes.
896
897 Regexp Quote-Like Operators
898
899 Here are the quote-like operators that apply to pattern matching and
900 related activities.
901
902 ?PATTERN?
903 This is just like the "/pattern/" search, except that it
904 matches only once between calls to the reset() operator. This
905 is a useful optimization when you want to see only the first
906 occurrence of something in each file of a set of files, for
907 instance. Only "??" patterns local to the current package are
908 reset.
909
910 while (<>) {
911 if (?^$?) {
912 # blank line between header and body
913 }
914 } continue {
915 reset if eof; # clear ?? status for next file
916 }
917
918 This usage is vaguely deprecated, which means it just might
919 possibly be removed in some distant future version of Perl,
920 perhaps somewhere around the year 2168.
921
922 m/PATTERN/cgimosx
923 /PATTERN/cgimosx
924 Searches a string for a pattern match, and in scalar context
925 returns true if it succeeds, false if it fails. If no string
926 is specified via the "=~" or "!~" operator, the $_ string is
927 searched. (The string specified with "=~" need not be an
928 lvalue--it may be the result of an expression evaluation, but
929 remember the "=~" binds rather tightly.) See also perlre. See
930 perllocale for discussion of additional considerations that
931 apply when "use locale" is in effect.
932
933 Options are:
934
935 c Do not reset search position on a failed match when /g is in effect.
936 g Match globally, i.e., find all occurrences.
937 i Do case-insensitive pattern matching.
938 m Treat string as multiple lines.
939 o Compile pattern only once.
940 s Treat string as single line.
941 x Use extended regular expressions.
942
943 If "/" is the delimiter then the initial "m" is optional. With
944 the "m" you can use any pair of non-alphanumeric, non-white‐
945 space characters as delimiters. This is particularly useful
946 for matching path names that contain "/", to avoid LTS (leaning
947 toothpick syndrome). If "?" is the delimiter, then the match-
948 only-once rule of "?PATTERN?" applies. If "'" is the delim‐
949 iter, no interpolation is performed on the PATTERN.
950
951 PATTERN may contain variables, which will be interpolated (and
952 the pattern recompiled) every time the pattern search is evalu‐
953 ated, except for when the delimiter is a single quote. (Note
954 that $(, $), and $⎪ are not interpolated because they look like
955 end-of-string tests.) If you want such a pattern to be com‐
956 piled only once, add a "/o" after the trailing delimiter. This
957 avoids expensive run-time recompilations, and is useful when
958 the value you are interpolating won't change over the life of
959 the script. However, mentioning "/o" constitutes a promise
960 that you won't change the variables in the pattern. If you
961 change them, Perl won't even notice. See also
962 "qr/STRING/imosx".
963
964 If the PATTERN evaluates to the empty string, the last success‐
965 fully matched regular expression is used instead. In this case,
966 only the "g" and "c" flags on the empty pattern is honoured -
967 the other flags are taken from the original pattern. If no
968 match has previously succeeded, this will (silently) act
969 instead as a genuine empty pattern (which will always match).
970
971 If the "/g" option is not used, "m//" in list context returns a
972 list consisting of the subexpressions matched by the parenthe‐
973 ses in the pattern, i.e., ($1, $2, $3...). (Note that here $1
974 etc. are also set, and that this differs from Perl 4's behav‐
975 ior.) When there are no parentheses in the pattern, the return
976 value is the list "(1)" for success. With or without parenthe‐
977 ses, an empty list is returned upon failure.
978
979 Examples:
980
981 open(TTY, '/dev/tty');
982 <TTY> =~ /^y/i && foo(); # do foo if desired
983
984 if (/Version: *([0-9.]*)/) { $version = $1; }
985
986 next if m#^/usr/spool/uucp#;
987
988 # poor man's grep
989 $arg = shift;
990 while (<>) {
991 print if /$arg/o; # compile only once
992 }
993
994 if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))
995
996 This last example splits $foo into the first two words and the
997 remainder of the line, and assigns those three fields to $F1,
998 $F2, and $Etc. The conditional is true if any variables were
999 assigned, i.e., if the pattern matched.
1000
1001 The "/g" modifier specifies global pattern matching--that is,
1002 matching as many times as possible within the string. How it
1003 behaves depends on the context. In list context, it returns a
1004 list of the substrings matched by any capturing parentheses in
1005 the regular expression. If there are no parentheses, it
1006 returns a list of all the matched strings, as if there were
1007 parentheses around the whole pattern.
1008
1009 In scalar context, each execution of "m//g" finds the next
1010 match, returning true if it matches, and false if there is no
1011 further match. The position after the last match can be read
1012 or set using the pos() function; see "pos" in perlfunc. A
1013 failed match normally resets the search position to the begin‐
1014 ning of the string, but you can avoid that by adding the "/c"
1015 modifier (e.g. "m//gc"). Modifying the target string also
1016 resets the search position.
1017
1018 You can intermix "m//g" matches with "m/\G.../g", where "\G" is
1019 a zero-width assertion that matches the exact position where
1020 the previous "m//g", if any, left off. Without the "/g" modi‐
1021 fier, the "\G" assertion still anchors at pos(), but the match
1022 is of course only attempted once. Using "\G" without "/g" on a
1023 target string that has not previously had a "/g" match applied
1024 to it is the same as using the "\A" assertion to match the
1025 beginning of the string. Note also that, currently, "\G" is
1026 only properly supported when anchored at the very beginning of
1027 the pattern.
1028
1029 Examples:
1030
1031 # list context
1032 ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);
1033
1034 # scalar context
1035 $/ = "";
1036 while (defined($paragraph = <>)) {
1037 while ($paragraph =~ /[a-z]['")]*[.!?]+['")]*\s/g) {
1038 $sentences++;
1039 }
1040 }
1041 print "$sentences\n";
1042
1043 # using m//gc with \G
1044 $_ = "ppooqppqq";
1045 while ($i++ < 2) {
1046 print "1: '";
1047 print $1 while /(o)/gc; print "', pos=", pos, "\n";
1048 print "2: '";
1049 print $1 if /\G(q)/gc; print "', pos=", pos, "\n";
1050 print "3: '";
1051 print $1 while /(p)/gc; print "', pos=", pos, "\n";
1052 }
1053 print "Final: '$1', pos=",pos,"\n" if /\G(.)/;
1054
1055 The last example should print:
1056
1057 1: 'oo', pos=4
1058 2: 'q', pos=5
1059 3: 'pp', pos=7
1060 1: '', pos=7
1061 2: 'q', pos=8
1062 3: '', pos=8
1063 Final: 'q', pos=8
1064
1065 Notice that the final match matched "q" instead of "p", which a
1066 match without the "\G" anchor would have done. Also note that
1067 the final match did not update "pos" -- "pos" is only updated
1068 on a "/g" match. If the final match did indeed match "p", it's
1069 a good bet that you're running an older (pre-5.6.0) Perl.
1070
1071 A useful idiom for "lex"-like scanners is "/\G.../gc". You can
1072 combine several regexps like this to process a string
1073 part-by-part, doing different actions depending on which regexp
1074 matched. Each regexp tries to match where the previous one
1075 leaves off.
1076
1077 $_ = <<'EOL';
1078 $url = new URI::URL "http://www/"; die if $url eq "xXx";
1079 EOL
1080 LOOP:
1081 {
1082 print(" digits"), redo LOOP if /\G\d+\b[,.;]?\s*/gc;
1083 print(" lowercase"), redo LOOP if /\G[a-z]+\b[,.;]?\s*/gc;
1084 print(" UPPERCASE"), redo LOOP if /\G[A-Z]+\b[,.;]?\s*/gc;
1085 print(" Capitalized"), redo LOOP if /\G[A-Z][a-z]+\b[,.;]?\s*/gc;
1086 print(" MiXeD"), redo LOOP if /\G[A-Za-z]+\b[,.;]?\s*/gc;
1087 print(" alphanumeric"), redo LOOP if /\G[A-Za-z0-9]+\b[,.;]?\s*/gc;
1088 print(" line-noise"), redo LOOP if /\G[^A-Za-z0-9]+/gc;
1089 print ". That's all!\n";
1090 }
1091
1092 Here is the output (split into several lines):
1093
1094 line-noise lowercase line-noise lowercase UPPERCASE line-noise
1095 UPPERCASE line-noise lowercase line-noise lowercase line-noise
1096 lowercase lowercase line-noise lowercase lowercase line-noise
1097 MiXeD line-noise. That's all!
1098
1099 q/STRING/
1100 'STRING'
1101 A single-quoted, literal string. A backslash represents a
1102 backslash unless followed by the delimiter or another back‐
1103 slash, in which case the delimiter or backslash is interpo‐
1104 lated.
1105
1106 $foo = q!I said, "You said, 'She said it.'"!;
1107 $bar = q('This is it.');
1108 $baz = '\n'; # a two-character string
1109
1110 qq/STRING/
1111 "STRING"
1112 A double-quoted, interpolated string.
1113
1114 $_ .= qq
1115 (*** The previous line contains the naughty word "$1".\n)
1116 if /\b(tcl⎪java⎪python)\b/i; # :-)
1117 $baz = "\n"; # a one-character string
1118
1119 qr/STRING/imosx
1120 This operator quotes (and possibly compiles) its STRING as a
1121 regular expression. STRING is interpolated the same way as
1122 PATTERN in "m/PATTERN/". If "'" is used as the delimiter, no
1123 interpolation is done. Returns a Perl value which may be used
1124 instead of the corresponding "/STRING/imosx" expression.
1125
1126 For example,
1127
1128 $rex = qr/my.STRING/is;
1129 s/$rex/foo/;
1130
1131 is equivalent to
1132
1133 s/my.STRING/foo/is;
1134
1135 The result may be used as a subpattern in a match:
1136
1137 $re = qr/$pattern/;
1138 $string =~ /foo${re}bar/; # can be interpolated in other patterns
1139 $string =~ $re; # or used standalone
1140 $string =~ /$re/; # or this way
1141
1142 Since Perl may compile the pattern at the moment of execution
1143 of qr() operator, using qr() may have speed advantages in some
1144 situations, notably if the result of qr() is used standalone:
1145
1146 sub match {
1147 my $patterns = shift;
1148 my @compiled = map qr/$_/i, @$patterns;
1149 grep {
1150 my $success = 0;
1151 foreach my $pat (@compiled) {
1152 $success = 1, last if /$pat/;
1153 }
1154 $success;
1155 } @_;
1156 }
1157
1158 Precompilation of the pattern into an internal representation
1159 at the moment of qr() avoids a need to recompile the pattern
1160 every time a match "/$pat/" is attempted. (Perl has many other
1161 internal optimizations, but none would be triggered in the
1162 above example if we did not use qr() operator.)
1163
1164 Options are:
1165
1166 i Do case-insensitive pattern matching.
1167 m Treat string as multiple lines.
1168 o Compile pattern only once.
1169 s Treat string as single line.
1170 x Use extended regular expressions.
1171
1172 See perlre for additional information on valid syntax for
1173 STRING, and for a detailed look at the semantics of regular
1174 expressions.
1175
1176 qx/STRING/
1177 `STRING`
1178 A string which is (possibly) interpolated and then executed as
1179 a system command with "/bin/sh" or its equivalent. Shell wild‐
1180 cards, pipes, and redirections will be honored. The collected
1181 standard output of the command is returned; standard error is
1182 unaffected. In scalar context, it comes back as a single
1183 (potentially multi-line) string, or undef if the command
1184 failed. In list context, returns a list of lines (however
1185 you've defined lines with $/ or $INPUT_RECORD_SEPARATOR), or an
1186 empty list if the command failed.
1187
1188 Because backticks do not affect standard error, use shell file
1189 descriptor syntax (assuming the shell supports this) if you
1190 care to address this. To capture a command's STDERR and STDOUT
1191 together:
1192
1193 $output = `cmd 2>&1`;
1194
1195 To capture a command's STDOUT but discard its STDERR:
1196
1197 $output = `cmd 2>/dev/null`;
1198
1199 To capture a command's STDERR but discard its STDOUT (ordering
1200 is important here):
1201
1202 $output = `cmd 2>&1 1>/dev/null`;
1203
1204 To exchange a command's STDOUT and STDERR in order to capture
1205 the STDERR but leave its STDOUT to come out the old STDERR:
1206
1207 $output = `cmd 3>&1 1>&2 2>&3 3>&-`;
1208
1209 To read both a command's STDOUT and its STDERR separately, it's
1210 easiest to redirect them separately to files, and then read
1211 from those files when the program is done:
1212
1213 system("program args 1>program.stdout 2>program.stderr");
1214
1215 Using single-quote as a delimiter protects the command from
1216 Perl's double-quote interpolation, passing it on to the shell
1217 instead:
1218
1219 $perl_info = qx(ps $$); # that's Perl's $$
1220 $shell_info = qx'ps $$'; # that's the new shell's $$
1221
1222 How that string gets evaluated is entirely subject to the com‐
1223 mand interpreter on your system. On most platforms, you will
1224 have to protect shell metacharacters if you want them treated
1225 literally. This is in practice difficult to do, as it's
1226 unclear how to escape which characters. See perlsec for a
1227 clean and safe example of a manual fork() and exec() to emulate
1228 backticks safely.
1229
1230 On some platforms (notably DOS-like ones), the shell may not be
1231 capable of dealing with multiline commands, so putting newlines
1232 in the string may not get you what you want. You may be able
1233 to evaluate multiple commands in a single line by separating
1234 them with the command separator character, if your shell sup‐
1235 ports that (e.g. ";" on many Unix shells; "&" on the Windows NT
1236 "cmd" shell).
1237
1238 Beginning with v5.6.0, Perl will attempt to flush all files
1239 opened for output before starting the child process, but this
1240 may not be supported on some platforms (see perlport). To be
1241 safe, you may need to set $⎪ ($AUTOFLUSH in English) or call
1242 the "autoflush()" method of "IO::Handle" on any open handles.
1243
1244 Beware that some command shells may place restrictions on the
1245 length of the command line. You must ensure your strings don't
1246 exceed this limit after any necessary interpolations. See the
1247 platform-specific release notes for more details about your
1248 particular environment.
1249
1250 Using this operator can lead to programs that are difficult to
1251 port, because the shell commands called vary between systems,
1252 and may in fact not be present at all. As one example, the
1253 "type" command under the POSIX shell is very different from the
1254 "type" command under DOS. That doesn't mean you should go out
1255 of your way to avoid backticks when they're the right way to
1256 get something done. Perl was made to be a glue language, and
1257 one of the things it glues together is commands. Just under‐
1258 stand what you're getting yourself into.
1259
1260 See "I/O Operators" for more discussion.
1261
1262 qw/STRING/
1263 Evaluates to a list of the words extracted out of STRING, using
1264 embedded whitespace as the word delimiters. It can be under‐
1265 stood as being roughly equivalent to:
1266
1267 split(' ', q/STRING/);
1268
1269 the differences being that it generates a real list at compile
1270 time, and in scalar context it returns the last element in the
1271 list. So this expression:
1272
1273 qw(foo bar baz)
1274
1275 is semantically equivalent to the list:
1276
1277 'foo', 'bar', 'baz'
1278
1279 Some frequently seen examples:
1280
1281 use POSIX qw( setlocale localeconv )
1282 @EXPORT = qw( foo bar baz );
1283
1284 A common mistake is to try to separate the words with comma or
1285 to put comments into a multi-line "qw"-string. For this rea‐
1286 son, the "use warnings" pragma and the -w switch (that is, the
1287 $^W variable) produces warnings if the STRING contains the ","
1288 or the "#" character.
1289
1290 s/PATTERN/REPLACEMENT/egimosx
1291 Searches a string for a pattern, and if found, replaces that
1292 pattern with the replacement text and returns the number of
1293 substitutions made. Otherwise it returns false (specifically,
1294 the empty string).
1295
1296 If no string is specified via the "=~" or "!~" operator, the $_
1297 variable is searched and modified. (The string specified with
1298 "=~" must be scalar variable, an array element, a hash element,
1299 or an assignment to one of those, i.e., an lvalue.)
1300
1301 If the delimiter chosen is a single quote, no interpolation is
1302 done on either the PATTERN or the REPLACEMENT. Otherwise, if
1303 the PATTERN contains a $ that looks like a variable rather than
1304 an end-of-string test, the variable will be interpolated into
1305 the pattern at run-time. If you want the pattern compiled only
1306 once the first time the variable is interpolated, use the "/o"
1307 option. If the pattern evaluates to the empty string, the last
1308 successfully executed regular expression is used instead. See
1309 perlre for further explanation on these. See perllocale for
1310 discussion of additional considerations that apply when "use
1311 locale" is in effect.
1312
1313 Options are:
1314
1315 e Evaluate the right side as an expression.
1316 g Replace globally, i.e., all occurrences.
1317 i Do case-insensitive pattern matching.
1318 m Treat string as multiple lines.
1319 o Compile pattern only once.
1320 s Treat string as single line.
1321 x Use extended regular expressions.
1322
1323 Any non-alphanumeric, non-whitespace delimiter may replace the
1324 slashes. If single quotes are used, no interpretation is done
1325 on the replacement string (the "/e" modifier overrides this,
1326 however). Unlike Perl 4, Perl 5 treats backticks as normal
1327 delimiters; the replacement text is not evaluated as a command.
1328 If the PATTERN is delimited by bracketing quotes, the REPLACE‐
1329 MENT has its own pair of quotes, which may or may not be brack‐
1330 eting quotes, e.g., "s(foo)(bar)" or "s<foo>/bar/". A "/e"
1331 will cause the replacement portion to be treated as a full-
1332 fledged Perl expression and evaluated right then and there. It
1333 is, however, syntax checked at compile-time. A second "e" modi‐
1334 fier will cause the replacement portion to be "eval"ed before
1335 being run as a Perl expression.
1336
1337 Examples:
1338
1339 s/\bgreen\b/mauve/g; # don't change wintergreen
1340
1341 $path =~ s⎪/usr/bin⎪/usr/local/bin⎪;
1342
1343 s/Login: $foo/Login: $bar/; # run-time pattern
1344
1345 ($foo = $bar) =~ s/this/that/; # copy first, then change
1346
1347 $count = ($paragraph =~ s/Mister\b/Mr./g); # get change-count
1348
1349 $_ = 'abc123xyz';
1350 s/\d+/$&*2/e; # yields 'abc246xyz'
1351 s/\d+/sprintf("%5d",$&)/e; # yields 'abc 246xyz'
1352 s/\w/$& x 2/eg; # yields 'aabbcc 224466xxyyzz'
1353
1354 s/%(.)/$percent{$1}/g; # change percent escapes; no /e
1355 s/%(.)/$percent{$1} ⎪⎪ $&/ge; # expr now, so /e
1356 s/^=(\w+)/&pod($1)/ge; # use function call
1357
1358 # expand variables in $_, but dynamics only, using
1359 # symbolic dereferencing
1360 s/\$(\w+)/${$1}/g;
1361
1362 # Add one to the value of any numbers in the string
1363 s/(\d+)/1 + $1/eg;
1364
1365 # This will expand any embedded scalar variable
1366 # (including lexicals) in $_ : First $1 is interpolated
1367 # to the variable name, and then evaluated
1368 s/(\$\w+)/$1/eeg;
1369
1370 # Delete (most) C comments.
1371 $program =~ s {
1372 /\* # Match the opening delimiter.
1373 .*? # Match a minimal number of characters.
1374 \*/ # Match the closing delimiter.
1375 } []gsx;
1376
1377 s/^\s*(.*?)\s*$/$1/; # trim whitespace in $_, expensively
1378
1379 for ($variable) { # trim whitespace in $variable, cheap
1380 s/^\s+//;
1381 s/\s+$//;
1382 }
1383
1384 s/([^ ]*) *([^ ]*)/$2 $1/; # reverse 1st two fields
1385
1386 Note the use of $ instead of \ in the last example. Unlike
1387 sed, we use the \<digit> form in only the left hand side. Any‐
1388 where else it's $<digit>.
1389
1390 Occasionally, you can't use just a "/g" to get all the changes
1391 to occur that you might want. Here are two common cases:
1392
1393 # put commas in the right places in an integer
1394 1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g;
1395
1396 # expand tabs to 8-column spacing
1397 1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
1398
1399 tr/SEARCHLIST/REPLACEMENTLIST/cds
1400 y/SEARCHLIST/REPLACEMENTLIST/cds
1401 Transliterates all occurrences of the characters found in the
1402 search list with the corresponding character in the replacement
1403 list. It returns the number of characters replaced or deleted.
1404 If no string is specified via the =~ or !~ operator, the $_
1405 string is transliterated. (The string specified with =~ must
1406 be a scalar variable, an array element, a hash element, or an
1407 assignment to one of those, i.e., an lvalue.)
1408
1409 A character range may be specified with a hyphen, so
1410 "tr/A-J/0-9/" does the same replacement as "tr/ACEG‐
1411 IBDFHJ/0246813579/". For sed devotees, "y" is provided as a
1412 synonym for "tr". If the SEARCHLIST is delimited by bracketing
1413 quotes, the REPLACEMENTLIST has its own pair of quotes, which
1414 may or may not be bracketing quotes, e.g., "tr[A-Z][a-z]" or
1415 "tr(+\-*/)/ABCD/".
1416
1417 Note that "tr" does not do regular expression character classes
1418 such as "\d" or "[:lower:]". The <tr> operator is not equiva‐
1419 lent to the tr(1) utility. If you want to map strings between
1420 lower/upper cases, see "lc" in perlfunc and "uc" in perlfunc,
1421 and in general consider using the "s" operator if you need reg‐
1422 ular expressions.
1423
1424 Note also that the whole range idea is rather unportable
1425 between character sets--and even within character sets they may
1426 cause results you probably didn't expect. A sound principle is
1427 to use only ranges that begin from and end at either alphabets
1428 of equal case (a-e, A-E), or digits (0-4). Anything else is
1429 unsafe. If in doubt, spell out the character sets in full.
1430
1431 Options:
1432
1433 c Complement the SEARCHLIST.
1434 d Delete found but unreplaced characters.
1435 s Squash duplicate replaced characters.
1436
1437 If the "/c" modifier is specified, the SEARCHLIST character set
1438 is complemented. If the "/d" modifier is specified, any char‐
1439 acters specified by SEARCHLIST not found in REPLACEMENTLIST are
1440 deleted. (Note that this is slightly more flexible than the
1441 behavior of some tr programs, which delete anything they find
1442 in the SEARCHLIST, period.) If the "/s" modifier is specified,
1443 sequences of characters that were transliterated to the same
1444 character are squashed down to a single instance of the charac‐
1445 ter.
1446
1447 If the "/d" modifier is used, the REPLACEMENTLIST is always
1448 interpreted exactly as specified. Otherwise, if the REPLACE‐
1449 MENTLIST is shorter than the SEARCHLIST, the final character is
1450 replicated till it is long enough. If the REPLACEMENTLIST is
1451 empty, the SEARCHLIST is replicated. This latter is useful for
1452 counting characters in a class or for squashing character
1453 sequences in a class.
1454
1455 Examples:
1456
1457 $ARGV[1] =~ tr/A-Z/a-z/; # canonicalize to lower case
1458
1459 $cnt = tr/*/*/; # count the stars in $_
1460
1461 $cnt = $sky =~ tr/*/*/; # count the stars in $sky
1462
1463 $cnt = tr/0-9//; # count the digits in $_
1464
1465 tr/a-zA-Z//s; # bookkeeper -> bokeper
1466
1467 ($HOST = $host) =~ tr/a-z/A-Z/;
1468
1469 tr/a-zA-Z/ /cs; # change non-alphas to single space
1470
1471 tr [\200-\377]
1472 [\000-\177]; # delete 8th bit
1473
1474 If multiple transliterations are given for a character, only
1475 the first one is used:
1476
1477 tr/AAA/XYZ/
1478
1479 will transliterate any A to X.
1480
1481 Because the transliteration table is built at compile time,
1482 neither the SEARCHLIST nor the REPLACEMENTLIST are subjected to
1483 double quote interpolation. That means that if you want to use
1484 variables, you must use an eval():
1485
1486 eval "tr/$oldlist/$newlist/";
1487 die $@ if $@;
1488
1489 eval "tr/$oldlist/$newlist/, 1" or die $@;
1490
1491 <<EOF A line-oriented form of quoting is based on the shell
1492 "here-document" syntax. Following a "<<" you specify a string
1493 to terminate the quoted material, and all lines following the
1494 current line down to the terminating string are the value of
1495 the item. The terminating string may be either an identifier
1496 (a word), or some quoted text. If quoted, the type of quotes
1497 you use determines the treatment of the text, just as in regu‐
1498 lar quoting. An unquoted identifier works like double quotes.
1499 There must be no space between the "<<" and the identifier,
1500 unless the identifier is quoted. (If you put a space it will
1501 be treated as a null identifier, which is valid, and matches
1502 the first empty line.) The terminating string must appear by
1503 itself (unquoted and with no surrounding whitespace) on the
1504 terminating line.
1505
1506 print <<EOF;
1507 The price is $Price.
1508 EOF
1509
1510 print << "EOF"; # same as above
1511 The price is $Price.
1512 EOF
1513
1514 print << `EOC`; # execute commands
1515 echo hi there
1516 echo lo there
1517 EOC
1518
1519 print <<"foo", <<"bar"; # you can stack them
1520 I said foo.
1521 foo
1522 I said bar.
1523 bar
1524
1525 myfunc(<< "THIS", 23, <<'THAT');
1526 Here's a line
1527 or two.
1528 THIS
1529 and here's another.
1530 THAT
1531
1532 Just don't forget that you have to put a semicolon on the end
1533 to finish the statement, as Perl doesn't know you're not going
1534 to try to do this:
1535
1536 print <<ABC
1537 179231
1538 ABC
1539 + 20;
1540
1541 If you want your here-docs to be indented with the rest of the
1542 code, you'll need to remove leading whitespace from each line
1543 manually:
1544
1545 ($quote = <<'FINIS') =~ s/^\s+//gm;
1546 The Road goes ever on and on,
1547 down from the door where it began.
1548 FINIS
1549
1550 If you use a here-doc within a delimited construct, such as in
1551 "s///eg", the quoted material must come on the lines following
1552 the final delimiter. So instead of
1553
1554 s/this/<<E . 'that'
1555 the other
1556 E
1557 . 'more '/eg;
1558
1559 you have to write
1560
1561 s/this/<<E . 'that'
1562 . 'more '/eg;
1563 the other
1564 E
1565
1566 If the terminating identifier is on the last line of the pro‐
1567 gram, you must be sure there is a newline after it; otherwise,
1568 Perl will give the warning Can't find string terminator "END"
1569 anywhere before EOF....
1570
1571 Additionally, the quoting rules for the identifier are not
1572 related to Perl's quoting rules -- "q()", "qq()", and the like
1573 are not supported in place of '' and "", and the only interpo‐
1574 lation is for backslashing the quoting character:
1575
1576 print << "abc\"def";
1577 testing...
1578 abc"def
1579
1580 Finally, quoted strings cannot span multiple lines. The gen‐
1581 eral rule is that the identifier must be a string literal.
1582 Stick with that, and you should be safe.
1583
1584 Gory details of parsing quoted constructs
1585
1586 When presented with something that might have several different inter‐
1587 pretations, Perl uses the DWIM (that's "Do What I Mean") principle to
1588 pick the most probable interpretation. This strategy is so successful
1589 that Perl programmers often do not suspect the ambivalence of what they
1590 write. But from time to time, Perl's notions differ substantially from
1591 what the author honestly meant.
1592
1593 This section hopes to clarify how Perl handles quoted constructs.
1594 Although the most common reason to learn this is to unravel
1595 labyrinthine regular expressions, because the initial steps of parsing
1596 are the same for all quoting operators, they are all discussed
1597 together.
1598
1599 The most important Perl parsing rule is the first one discussed below:
1600 when processing a quoted construct, Perl first finds the end of that
1601 construct, then interprets its contents. If you understand this rule,
1602 you may skip the rest of this section on the first reading. The other
1603 rules are likely to contradict the user's expectations much less fre‐
1604 quently than this first one.
1605
1606 Some passes discussed below are performed concurrently, but because
1607 their results are the same, we consider them individually. For differ‐
1608 ent quoting constructs, Perl performs different numbers of passes, from
1609 one to five, but these passes are always performed in the same order.
1610
1611 Finding the end
1612 The first pass is finding the end of the quoted construct, whether
1613 it be a multicharacter delimiter "\nEOF\n" in the "<<EOF" con‐
1614 struct, a "/" that terminates a "qq//" construct, a "]" which ter‐
1615 minates "qq[]" construct, or a ">" which terminates a fileglob
1616 started with "<".
1617
1618 When searching for single-character non-pairing delimiters, such as
1619 "/", combinations of "\\" and "\/" are skipped. However, when
1620 searching for single-character pairing delimiter like "[", combina‐
1621 tions of "\\", "\]", and "\[" are all skipped, and nested "[", "]"
1622 are skipped as well. When searching for multicharacter delimiters,
1623 nothing is skipped.
1624
1625 For constructs with three-part delimiters ("s///", "y///", and
1626 "tr///"), the search is repeated once more.
1627
1628 During this search no attention is paid to the semantics of the
1629 construct. Thus:
1630
1631 "$hash{"$foo/$bar"}"
1632
1633 or:
1634
1635 m/
1636 bar # NOT a comment, this slash / terminated m//!
1637 /x
1638
1639 do not form legal quoted expressions. The quoted part ends on the
1640 first """ and "/", and the rest happens to be a syntax error.
1641 Because the slash that terminated "m//" was followed by a "SPACE",
1642 the example above is not "m//x", but rather "m//" with no "/x" mod‐
1643 ifier. So the embedded "#" is interpreted as a literal "#".
1644
1645 Also no attention is paid to "\c\" during this search. Thus the
1646 second "\" in "qq/\c\/" is interpreted as a part of "\/", and the
1647 following "/" is not recognized as a delimiter. Instead, use
1648 "\034" or "\x1c" at the end of quoted constructs.
1649
1650 Removal of backslashes before delimiters
1651 During the second pass, text between the starting and ending delim‐
1652 iters is copied to a safe location, and the "\" is removed from
1653 combinations consisting of "\" and delimiter--or delimiters, mean‐
1654 ing both starting and ending delimiters will should these differ.
1655 This removal does not happen for multi-character delimiters. Note
1656 that the combination "\\" is left intact, just as it was.
1657
1658 Starting from this step no information about the delimiters is used
1659 in parsing.
1660
1661 Interpolation
1662 The next step is interpolation in the text obtained, which is now
1663 delimiter-independent. There are four different cases.
1664
1665 "<<'EOF'", "m''", "s'''", "tr///", "y///"
1666 No interpolation is performed.
1667
1668 '', "q//"
1669 The only interpolation is removal of "\" from pairs "\\".
1670
1671 "", ``, "qq//", "qx//", "<file*glob>"
1672 "\Q", "\U", "\u", "\L", "\l" (possibly paired with "\E") are
1673 converted to corresponding Perl constructs. Thus,
1674 "$foo\Qbaz$bar" is converted to "$foo . (quotemeta("baz" .
1675 $bar))" internally. The other combinations are replaced with
1676 appropriate expansions.
1677
1678 Let it be stressed that whatever falls between "\Q" and "\E" is
1679 interpolated in the usual way. Something like "\Q\\E" has no
1680 "\E" inside. instead, it has "\Q", "\\", and "E", so the
1681 result is the same as for "\\\\E". As a general rule, back‐
1682 slashes between "\Q" and "\E" may lead to counterintuitive
1683 results. So, "\Q\t\E" is converted to "quotemeta("\t")", which
1684 is the same as "\\\t" (since TAB is not alphanumeric). Note
1685 also that:
1686
1687 $str = '\t';
1688 return "\Q$str";
1689
1690 may be closer to the conjectural intention of the writer of
1691 "\Q\t\E".
1692
1693 Interpolated scalars and arrays are converted internally to the
1694 "join" and "." catenation operations. Thus, "$foo XXX '@arr'"
1695 becomes:
1696
1697 $foo . " XXX '" . (join $", @arr) . "'";
1698
1699 All operations above are performed simultaneously, left to
1700 right.
1701
1702 Because the result of "\Q STRING \E" has all metacharacters
1703 quoted, there is no way to insert a literal "$" or "@" inside a
1704 "\Q\E" pair. If protected by "\", "$" will be quoted to became
1705 "\\\$"; if not, it is interpreted as the start of an interpo‐
1706 lated scalar.
1707
1708 Note also that the interpolation code needs to make a decision
1709 on where the interpolated scalar ends. For instance, whether
1710 "a $b -> {c}" really means:
1711
1712 "a " . $b . " -> {c}";
1713
1714 or:
1715
1716 "a " . $b -> {c};
1717
1718 Most of the time, the longest possible text that does not
1719 include spaces between components and which contains matching
1720 braces or brackets. because the outcome may be determined by
1721 voting based on heuristic estimators, the result is not
1722 strictly predictable. Fortunately, it's usually correct for
1723 ambiguous cases.
1724
1725 "?RE?", "/RE/", "m/RE/", "s/RE/foo/",
1726 Processing of "\Q", "\U", "\u", "\L", "\l", and interpolation
1727 happens (almost) as with "qq//" constructs, but the substitu‐
1728 tion of "\" followed by RE-special chars (including "\") is not
1729 performed. Moreover, inside "(?{BLOCK})", "(?# comment )", and
1730 a "#"-comment in a "//x"-regular expression, no processing is
1731 performed whatsoever. This is the first step at which the
1732 presence of the "//x" modifier is relevant.
1733
1734 Interpolation has several quirks: $⎪, $(, and $) are not inter‐
1735 polated, and constructs $var[SOMETHING] are voted (by several
1736 different estimators) to be either an array element or $var
1737 followed by an RE alternative. This is where the notation
1738 "${arr[$bar]}" comes handy: "/${arr[0-9]}/" is interpreted as
1739 array element "-9", not as a regular expression from the vari‐
1740 able $arr followed by a digit, which would be the interpreta‐
1741 tion of "/$arr[0-9]/". Since voting among different estimators
1742 may occur, the result is not predictable.
1743
1744 It is at this step that "\1" is begrudgingly converted to $1 in
1745 the replacement text of "s///" to correct the incorrigible sed
1746 hackers who haven't picked up the saner idiom yet. A warning
1747 is emitted if the "use warnings" pragma or the -w command-line
1748 flag (that is, the $^W variable) was set.
1749
1750 The lack of processing of "\\" creates specific restrictions on
1751 the post-processed text. If the delimiter is "/", one cannot
1752 get the combination "\/" into the result of this step. "/"
1753 will finish the regular expression, "\/" will be stripped to
1754 "/" on the previous step, and "\\/" will be left as is.
1755 Because "/" is equivalent to "\/" inside a regular expression,
1756 this does not matter unless the delimiter happens to be charac‐
1757 ter special to the RE engine, such as in "s*foo*bar*",
1758 "m[foo]", or "?foo?"; or an alphanumeric char, as in:
1759
1760 m m ^ a \s* b mmx;
1761
1762 In the RE above, which is intentionally obfuscated for illus‐
1763 tration, the delimiter is "m", the modifier is "mx", and after
1764 backslash-removal the RE is the same as for "m/ ^ a \s* b /mx".
1765 There's more than one reason you're encouraged to restrict your
1766 delimiters to non-alphanumeric, non-whitespace choices.
1767
1768 This step is the last one for all constructs except regular expres‐
1769 sions, which are processed further.
1770
1771 Interpolation of regular expressions
1772 Previous steps were performed during the compilation of Perl code,
1773 but this one happens at run time--although it may be optimized to
1774 be calculated at compile time if appropriate. After preprocessing
1775 described above, and possibly after evaluation if catenation, join‐
1776 ing, casing translation, or metaquoting are involved, the resulting
1777 string is passed to the RE engine for compilation.
1778
1779 Whatever happens in the RE engine might be better discussed in
1780 perlre, but for the sake of continuity, we shall do so here.
1781
1782 This is another step where the presence of the "//x" modifier is
1783 relevant. The RE engine scans the string from left to right and
1784 converts it to a finite automaton.
1785
1786 Backslashed characters are either replaced with corresponding lit‐
1787 eral strings (as with "\{"), or else they generate special nodes in
1788 the finite automaton (as with "\b"). Characters special to the RE
1789 engine (such as "⎪") generate corresponding nodes or groups of
1790 nodes. "(?#...)" comments are ignored. All the rest is either
1791 converted to literal strings to match, or else is ignored (as is
1792 whitespace and "#"-style comments if "//x" is present).
1793
1794 Parsing of the bracketed character class construct, "[...]", is
1795 rather different than the rule used for the rest of the pattern.
1796 The terminator of this construct is found using the same rules as
1797 for finding the terminator of a "{}"-delimited construct, the only
1798 exception being that "]" immediately following "[" is treated as
1799 though preceded by a backslash. Similarly, the terminator of
1800 "(?{...})" is found using the same rules as for finding the termi‐
1801 nator of a "{}"-delimited construct.
1802
1803 It is possible to inspect both the string given to RE engine and
1804 the resulting finite automaton. See the arguments "debug"/"debug‐
1805 color" in the "use re" pragma, as well as Perl's -Dr command-line
1806 switch documented in "Command Switches" in perlrun.
1807
1808 Optimization of regular expressions
1809 This step is listed for completeness only. Since it does not
1810 change semantics, details of this step are not documented and are
1811 subject to change without notice. This step is performed over the
1812 finite automaton that was generated during the previous pass.
1813
1814 It is at this stage that "split()" silently optimizes "/^/" to mean
1815 "/^/m".
1816
1817 I/O Operators
1818
1819 There are several I/O operators you should know about.
1820
1821 A string enclosed by backticks (grave accents) first undergoes double-
1822 quote interpolation. It is then interpreted as an external command,
1823 and the output of that command is the value of the backtick string,
1824 like in a shell. In scalar context, a single string consisting of all
1825 output is returned. In list context, a list of values is returned, one
1826 per line of output. (You can set $/ to use a different line termina‐
1827 tor.) The command is executed each time the pseudo-literal is evalu‐
1828 ated. The status value of the command is returned in $? (see perlvar
1829 for the interpretation of $?). Unlike in csh, no translation is done
1830 on the return data--newlines remain newlines. Unlike in any of the
1831 shells, single quotes do not hide variable names in the command from
1832 interpretation. To pass a literal dollar-sign through to the shell you
1833 need to hide it with a backslash. The generalized form of backticks is
1834 "qx//". (Because backticks always undergo shell expansion as well, see
1835 perlsec for security concerns.)
1836
1837 In scalar context, evaluating a filehandle in angle brackets yields the
1838 next line from that file (the newline, if any, included), or "undef" at
1839 end-of-file or on error. When $/ is set to "undef" (sometimes known as
1840 file-slurp mode) and the file is empty, it returns '' the first time,
1841 followed by "undef" subsequently.
1842
1843 Ordinarily you must assign the returned value to a variable, but there
1844 is one situation where an automatic assignment happens. If and only if
1845 the input symbol is the only thing inside the conditional of a "while"
1846 statement (even if disguised as a "for(;;)" loop), the value is auto‐
1847 matically assigned to the global variable $_, destroying whatever was
1848 there previously. (This may seem like an odd thing to you, but you'll
1849 use the construct in almost every Perl script you write.) The $_ vari‐
1850 able is not implicitly localized. You'll have to put a "local $_;"
1851 before the loop if you want that to happen.
1852
1853 The following lines are equivalent:
1854
1855 while (defined($_ = <STDIN>)) { print; }
1856 while ($_ = <STDIN>) { print; }
1857 while (<STDIN>) { print; }
1858 for (;<STDIN>;) { print; }
1859 print while defined($_ = <STDIN>);
1860 print while ($_ = <STDIN>);
1861 print while <STDIN>;
1862
1863 This also behaves similarly, but avoids $_ :
1864
1865 while (my $line = <STDIN>) { print $line }
1866
1867 In these loop constructs, the assigned value (whether assignment is
1868 automatic or explicit) is then tested to see whether it is defined.
1869 The defined test avoids problems where line has a string value that
1870 would be treated as false by Perl, for example a "" or a "0" with no
1871 trailing newline. If you really mean for such values to terminate the
1872 loop, they should be tested for explicitly:
1873
1874 while (($_ = <STDIN>) ne '0') { ... }
1875 while (<STDIN>) { last unless $_; ... }
1876
1877 In other boolean contexts, "<filehandle>" without an explicit "defined"
1878 test or comparison elicit a warning if the "use warnings" pragma or the
1879 -w command-line switch (the $^W variable) is in effect.
1880
1881 The filehandles STDIN, STDOUT, and STDERR are predefined. (The file‐
1882 handles "stdin", "stdout", and "stderr" will also work except in pack‐
1883 ages, where they would be interpreted as local identifiers rather than
1884 global.) Additional filehandles may be created with the open() func‐
1885 tion, amongst others. See perlopentut and "open" in perlfunc for
1886 details on this.
1887
1888 If a <FILEHANDLE> is used in a context that is looking for a list, a
1889 list comprising all input lines is returned, one line per list element.
1890 It's easy to grow to a rather large data space this way, so use with
1891 care.
1892
1893 <FILEHANDLE> may also be spelled "readline(*FILEHANDLE)". See "read‐
1894 line" in perlfunc.
1895
1896 The null filehandle <> is special: it can be used to emulate the behav‐
1897 ior of sed and awk. Input from <> comes either from standard input, or
1898 from each file listed on the command line. Here's how it works: the
1899 first time <> is evaluated, the @ARGV array is checked, and if it is
1900 empty, $ARGV[0] is set to "-", which when opened gives you standard
1901 input. The @ARGV array is then processed as a list of filenames. The
1902 loop
1903
1904 while (<>) {
1905 ... # code for each line
1906 }
1907
1908 is equivalent to the following Perl-like pseudo code:
1909
1910 unshift(@ARGV, '-') unless @ARGV;
1911 while ($ARGV = shift) {
1912 open(ARGV, $ARGV);
1913 while (<ARGV>) {
1914 ... # code for each line
1915 }
1916 }
1917
1918 except that it isn't so cumbersome to say, and will actually work. It
1919 really does shift the @ARGV array and put the current filename into the
1920 $ARGV variable. It also uses filehandle ARGV internally--<> is just a
1921 synonym for <ARGV>, which is magical. (The pseudo code above doesn't
1922 work because it treats <ARGV> as non-magical.)
1923
1924 You can modify @ARGV before the first <> as long as the array ends up
1925 containing the list of filenames you really want. Line numbers ($.)
1926 continue as though the input were one big happy file. See the example
1927 in "eof" in perlfunc for how to reset line numbers on each file.
1928
1929 If you want to set @ARGV to your own list of files, go right ahead.
1930 This sets @ARGV to all plain text files if no @ARGV was given:
1931
1932 @ARGV = grep { -f && -T } glob('*') unless @ARGV;
1933
1934 You can even set them to pipe commands. For example, this automati‐
1935 cally filters compressed arguments through gzip:
1936
1937 @ARGV = map { /\.(gz⎪Z)$/ ? "gzip -dc < $_ ⎪" : $_ } @ARGV;
1938
1939 If you want to pass switches into your script, you can use one of the
1940 Getopts modules or put a loop on the front like this:
1941
1942 while ($_ = $ARGV[0], /^-/) {
1943 shift;
1944 last if /^--$/;
1945 if (/^-D(.*)/) { $debug = $1 }
1946 if (/^-v/) { $verbose++ }
1947 # ... # other switches
1948 }
1949
1950 while (<>) {
1951 # ... # code for each line
1952 }
1953
1954 The <> symbol will return "undef" for end-of-file only once. If you
1955 call it again after this, it will assume you are processing another
1956 @ARGV list, and if you haven't set @ARGV, will read input from STDIN.
1957
1958 If what the angle brackets contain is a simple scalar variable (e.g.,
1959 <$foo>), then that variable contains the name of the filehandle to
1960 input from, or its typeglob, or a reference to the same. For example:
1961
1962 $fh = \*STDIN;
1963 $line = <$fh>;
1964
1965 If what's within the angle brackets is neither a filehandle nor a sim‐
1966 ple scalar variable containing a filehandle name, typeglob, or typeglob
1967 reference, it is interpreted as a filename pattern to be globbed, and
1968 either a list of filenames or the next filename in the list is
1969 returned, depending on context. This distinction is determined on syn‐
1970 tactic grounds alone. That means "<$x>" is always a readline() from an
1971 indirect handle, but "<$hash{key}>" is always a glob(). That's because
1972 $x is a simple scalar variable, but $hash{key} is not--it's a hash ele‐
1973 ment. Even "<$x >" (note the extra space) is treated as "glob("$x ")",
1974 not "readline($x)".
1975
1976 One level of double-quote interpretation is done first, but you can't
1977 say "<$foo>" because that's an indirect filehandle as explained in the
1978 previous paragraph. (In older versions of Perl, programmers would
1979 insert curly brackets to force interpretation as a filename glob:
1980 "<${foo}>". These days, it's considered cleaner to call the internal
1981 function directly as "glob($foo)", which is probably the right way to
1982 have done it in the first place.) For example:
1983
1984 while (<*.c>) {
1985 chmod 0644, $_;
1986 }
1987
1988 is roughly equivalent to:
1989
1990 open(FOO, "echo *.c ⎪ tr -s ' \t\r\f' '\\012\\012\\012\\012'⎪");
1991 while (<FOO>) {
1992 chomp;
1993 chmod 0644, $_;
1994 }
1995
1996 except that the globbing is actually done internally using the standard
1997 "File::Glob" extension. Of course, the shortest way to do the above
1998 is:
1999
2000 chmod 0644, <*.c>;
2001
2002 A (file)glob evaluates its (embedded) argument only when it is starting
2003 a new list. All values must be read before it will start over. In
2004 list context, this isn't important because you automatically get them
2005 all anyway. However, in scalar context the operator returns the next
2006 value each time it's called, or "undef" when the list has run out. As
2007 with filehandle reads, an automatic "defined" is generated when the
2008 glob occurs in the test part of a "while", because legal glob returns
2009 (e.g. a file called 0) would otherwise terminate the loop. Again,
2010 "undef" is returned only once. So if you're expecting a single value
2011 from a glob, it is much better to say
2012
2013 ($file) = <blurch*>;
2014
2015 than
2016
2017 $file = <blurch*>;
2018
2019 because the latter will alternate between returning a filename and
2020 returning false.
2021
2022 If you're trying to do variable interpolation, it's definitely better
2023 to use the glob() function, because the older notation can cause people
2024 to become confused with the indirect filehandle notation.
2025
2026 @files = glob("$dir/*.[ch]");
2027 @files = glob($files[$i]);
2028
2029 Constant Folding
2030
2031 Like C, Perl does a certain amount of expression evaluation at compile
2032 time whenever it determines that all arguments to an operator are
2033 static and have no side effects. In particular, string concatenation
2034 happens at compile time between literals that don't do variable substi‐
2035 tution. Backslash interpolation also happens at compile time. You can
2036 say
2037
2038 'Now is the time for all' . "\n" .
2039 'good men to come to.'
2040
2041 and this all reduces to one string internally. Likewise, if you say
2042
2043 foreach $file (@filenames) {
2044 if (-s $file > 5 + 100 * 2**16) { }
2045 }
2046
2047 the compiler will precompute the number which that expression repre‐
2048 sents so that the interpreter won't have to.
2049
2050 No-ops
2051
2052 Perl doesn't officially have a no-op operator, but the bare constants 0
2053 and 1 are special-cased to not produce a warning in a void context, so
2054 you can for example safely do
2055
2056 1 while foo();
2057
2058 Bitwise String Operators
2059
2060 Bitstrings of any size may be manipulated by the bitwise operators ("~
2061 ⎪ & ^").
2062
2063 If the operands to a binary bitwise op are strings of different sizes,
2064 ⎪ and ^ ops act as though the shorter operand had additional zero bits
2065 on the right, while the & op acts as though the longer operand were
2066 truncated to the length of the shorter. The granularity for such
2067 extension or truncation is one or more bytes.
2068
2069 # ASCII-based examples
2070 print "j p \n" ^ " a h"; # prints "JAPH\n"
2071 print "JA" ⎪ " ph\n"; # prints "japh\n"
2072 print "japh\nJunk" & '_____'; # prints "JAPH\n";
2073 print 'p N$' ^ " E<H\n"; # prints "Perl\n";
2074
2075 If you are intending to manipulate bitstrings, be certain that you're
2076 supplying bitstrings: If an operand is a number, that will imply a
2077 numeric bitwise operation. You may explicitly show which type of oper‐
2078 ation you intend by using "" or "0+", as in the examples below.
2079
2080 $foo = 150 ⎪ 105; # yields 255 (0x96 ⎪ 0x69 is 0xFF)
2081 $foo = '150' ⎪ 105; # yields 255
2082 $foo = 150 ⎪ '105'; # yields 255
2083 $foo = '150' ⎪ '105'; # yields string '155' (under ASCII)
2084
2085 $baz = 0+$foo & 0+$bar; # both ops explicitly numeric
2086 $biz = "$foo" ^ "$bar"; # both ops explicitly stringy
2087
2088 See "vec" in perlfunc for information on how to manipulate individual
2089 bits in a bit vector.
2090
2091 Integer Arithmetic
2092
2093 By default, Perl assumes that it must do most of its arithmetic in
2094 floating point. But by saying
2095
2096 use integer;
2097
2098 you may tell the compiler that it's okay to use integer operations (if
2099 it feels like it) from here to the end of the enclosing BLOCK. An
2100 inner BLOCK may countermand this by saying
2101
2102 no integer;
2103
2104 which lasts until the end of that BLOCK. Note that this doesn't mean
2105 everything is only an integer, merely that Perl may use integer opera‐
2106 tions if it is so inclined. For example, even under "use integer", if
2107 you take the sqrt(2), you'll still get 1.4142135623731 or so.
2108
2109 Used on numbers, the bitwise operators ("&", "⎪", "^", "~", "<<", and
2110 ">>") always produce integral results. (But see also "Bitwise String
2111 Operators".) However, "use integer" still has meaning for them. By
2112 default, their results are interpreted as unsigned integers, but if
2113 "use integer" is in effect, their results are interpreted as signed
2114 integers. For example, "~0" usually evaluates to a large integral
2115 value. However, "use integer; ~0" is "-1" on twos-complement machines.
2116
2117 Floating-point Arithmetic
2118
2119 While "use integer" provides integer-only arithmetic, there is no anal‐
2120 ogous mechanism to provide automatic rounding or truncation to a cer‐
2121 tain number of decimal places. For rounding to a certain number of
2122 digits, sprintf() or printf() is usually the easiest route. See perl‐
2123 faq4.
2124
2125 Floating-point numbers are only approximations to what a mathematician
2126 would call real numbers. There are infinitely more reals than floats,
2127 so some corners must be cut. For example:
2128
2129 printf "%.20g\n", 123456789123456789;
2130 # produces 123456789123456784
2131
2132 Testing for exact equality of floating-point equality or inequality is
2133 not a good idea. Here's a (relatively expensive) work-around to com‐
2134 pare whether two floating-point numbers are equal to a particular num‐
2135 ber of decimal places. See Knuth, volume II, for a more robust treat‐
2136 ment of this topic.
2137
2138 sub fp_equal {
2139 my ($X, $Y, $POINTS) = @_;
2140 my ($tX, $tY);
2141 $tX = sprintf("%.${POINTS}g", $X);
2142 $tY = sprintf("%.${POINTS}g", $Y);
2143 return $tX eq $tY;
2144 }
2145
2146 The POSIX module (part of the standard perl distribution) implements
2147 ceil(), floor(), and other mathematical and trigonometric functions.
2148 The Math::Complex module (part of the standard perl distribution)
2149 defines mathematical functions that work on both the reals and the
2150 imaginary numbers. Math::Complex not as efficient as POSIX, but POSIX
2151 can't work with complex numbers.
2152
2153 Rounding in financial applications can have serious implications, and
2154 the rounding method used should be specified precisely. In these
2155 cases, it probably pays not to trust whichever system rounding is being
2156 used by Perl, but to instead implement the rounding function you need
2157 yourself.
2158
2159 Bigger Numbers
2160
2161 The standard Math::BigInt and Math::BigFloat modules provide variable-
2162 precision arithmetic and overloaded operators, although they're cur‐
2163 rently pretty slow. At the cost of some space and considerable speed,
2164 they avoid the normal pitfalls associated with limited-precision repre‐
2165 sentations.
2166
2167 use Math::BigInt;
2168 $x = Math::BigInt->new('123456789123456789');
2169 print $x * $x;
2170
2171 # prints +15241578780673678515622620750190521
2172
2173 There are several modules that let you calculate with (bound only by
2174 memory and cpu-time) unlimited or fixed precision. There are also some
2175 non-standard modules that provide faster implementations via external C
2176 libraries.
2177
2178 Here is a short, but incomplete summary:
2179
2180 Math::Fraction big, unlimited fractions like 9973 / 12967
2181 Math::String treat string sequences like numbers
2182 Math::FixedPrecision calculate with a fixed precision
2183 Math::Currency for currency calculations
2184 Bit::Vector manipulate bit vectors fast (uses C)
2185 Math::BigIntFast Bit::Vector wrapper for big numbers
2186 Math::Pari provides access to the Pari C library
2187 Math::BigInteger uses an external C library
2188 Math::Cephes uses external Cephes C library (no big numbers)
2189 Math::Cephes::Fraction fractions via the Cephes library
2190 Math::GMP another one using an external C library
2191
2192 Choose wisely.
2193
2194
2195
2196perl v5.8.8 2006-01-07 PERLOP(1)