PERLOP(1) Perl Programmers Reference Guide PERLOP(1)

perlop - Perl operators and precedence
7
9 In Perl, the operator determines what operation is performed,
10 independent of the type of the operands. For example "$x + $y" is
11 always a numeric addition, and if $x or $y do not contain numbers, an
12 attempt is made to convert them to numbers first.
13
14 This is in contrast to many other dynamic languages, where the
15 operation is determined by the type of the first argument. It also
16 means that Perl has two versions of some operators, one for numeric and
17 one for string comparison. For example "$x == $y" compares two numbers
18 for equality, and "$x eq $y" compares two strings.
19
20 There are a few exceptions though: "x" can be either string repetition
21 or list repetition, depending on the type of the left operand, and "&",
22 "|", "^" and "~" can be either string or numeric bit operations.
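
A minimal sketch of this rule, using illustrative variable names that
are not part of the original text: the same two operands give
different results depending only on which operator is applied.

```perl
use strict;
use warnings;

# Hypothetical operands: both are strings that look like numbers.
my ($x, $y) = ("10", "10.0");

my $sum    = $x + $y;              # numeric addition
my $joined = $x . $y;              # string concatenation
my $num_eq = ($x == $y) ? 1 : 0;   # numeric comparison: equal
my $str_eq = ($x eq $y) ? 1 : 0;   # string comparison: not equal

print "$sum $joined $num_eq $str_eq\n";   # prints "20 1010.0 1 0"
```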
23
24 Operator Precedence and Associativity
25 Operator precedence and associativity work in Perl more or less like
26 they do in mathematics.
27
28 Operator precedence means some operators group more tightly than
29 others. For example, in "2 + 4 * 5", the multiplication has higher
30 precedence, so "4 * 5" is grouped together as the right-hand operand of
31 the addition, rather than "2 + 4" being grouped together as the left-
32 hand operand of the multiplication. It is as if the expression were
33 written "2 + (4 * 5)", not "(2 + 4) * 5". So the expression yields "2 +
34 20 == 22", rather than "6 * 5 == 30".
35
36 Operator associativity defines what happens if a sequence of the same
37 operators is used one after another: usually that they will be grouped
38 at the left or the right. For example, in "9 - 3 - 2", subtraction is
39 left associative, so "9 - 3" is grouped together as the left-hand
40 operand of the second subtraction, rather than "3 - 2" being grouped
41 together as the right-hand operand of the first subtraction. It is as
42 if the expression were written "(9 - 3) - 2", not "9 - (3 - 2)". So the
43 expression yields "6 - 2 == 4", rather than "9 - 1 == 8".
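
The two worked examples above can be checked directly. This is only a
sketch; the variable names are illustrative.

```perl
use strict;
use warnings;

my $by_precedence    = 2 + 4 * 5;     # grouped as 2 + (4 * 5)
my $with_parens      = (2 + 4) * 5;   # parentheses override precedence
my $left_associative = 9 - 3 - 2;     # grouped as (9 - 3) - 2
my $forced_right     = 9 - (3 - 2);   # parentheses override associativity

# prints "22 30 4 8"
print "$by_precedence $with_parens $left_associative $forced_right\n";
```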
44
45 For simple operators that evaluate all their operands and then combine
46 the values in some way, precedence and associativity (and parentheses)
47 imply some ordering requirements on those combining operations. For
48 example, in 2 + 4 * 5, the grouping implied by precedence means that
49 the multiplication of 4 and 5 must be performed before the addition of
50 2 and 20, simply because the result of that multiplication is required
51 as one of the operands of the addition. But the order of operations is
52 not fully determined by this: in "2 * 2 + 4 * 5" both multiplications
53 must be performed before the addition, but the grouping does not say
54 anything about the order in which the two multiplications are
55 performed. In fact Perl has a general rule that the operands of an
56 operator are evaluated in left-to-right order. A few operators such as
57 "&&=" have special evaluation rules that can result in an operand not
58 being evaluated at all; in general, the top-level operator in an
59 expression has control of operand evaluation.
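
The left-to-right rule and the special case of "&&=" can be observed
with a small tracing helper. The helper operand() is hypothetical and
exists only for this sketch.

```perl
use strict;
use warnings;

my @order;

# Hypothetical helper: records a label when it is evaluated, then
# returns the given value.
sub operand {
    my ($label, $value) = @_;
    push @order, $label;
    return $value;
}

my $result = operand(a => 2) * operand(b => 2)
           + operand(c => 4) * operand(d => 5);
my $trace = "@order";
print "$trace => $result\n";    # prints "a b c d => 24"

# With &&=, the right operand is not evaluated when the left is false.
@order = ();
my $flag = 0;
$flag &&= operand(e => 1);
my $evaluated = scalar @order;
print "$evaluated\n";           # prints "0"
```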
60
Rather than grouping to the left or to the right, some comparison
operators chain with certain operators of the same precedence (but
never with operators of different precedence). This chaining means
that each comparison is performed on
64 the two arguments surrounding it, with each interior argument taking
65 part in two comparisons, and the comparison results are implicitly
66 ANDed. Thus "$x < $y <= $z" behaves exactly like
67 "$x < $y && $y <= $z", assuming that "$y" is as simple a scalar as it
68 looks. The ANDing short-circuits just like "&&" does, stopping the
69 sequence of comparisons as soon as one yields false.
70
71 In a chained comparison, each argument expression is evaluated at most
72 once, even if it takes part in two comparisons, but the result of the
73 evaluation is fetched for each comparison. (It is not evaluated at all
74 if the short-circuiting means that it's not required for any
75 comparisons.) This matters if the computation of an interior argument
76 is expensive or non-deterministic. For example,
77
    if($x < expensive_sub() <= $z) { ...

is not entirely like

    if($x < expensive_sub() && expensive_sub() <= $z) { ...

but instead closer to

    my $tmp = expensive_sub();
    if($x < $tmp && $tmp <= $z) { ...
88
89 in that the subroutine is only called once. However, it's not exactly
90 like this latter code either, because the chained comparison doesn't
91 actually involve any temporary variable (named or otherwise): there is
92 no assignment. This doesn't make much difference where the expression
93 is a call to an ordinary subroutine, but matters more with an lvalue
94 subroutine, or if the argument expression yields some unusual kind of
95 scalar by other means. For example, if the argument expression yields
96 a tied scalar, then the expression is evaluated to produce that scalar
97 at most once, but the value of that scalar may be fetched up to twice,
98 once for each comparison in which it is actually used.
99
100 In this example, the expression is evaluated only once, and the tied
101 scalar (the result of the expression) is fetched for each comparison
102 that uses it.
103
    if ($x < $tied_scalar < $z) { ...
105
106 In the next example, the expression is evaluated only once, and the
107 tied scalar is fetched once as part of the operation within the
108 expression. The result of that operation is fetched for each
109 comparison, which normally doesn't matter unless that expression result
110 is also magical due to operator overloading.
111
    if ($x < $tied_scalar + 42 < $z) { ...
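
Returning to the ordinary subroutine case, the following sketch counts
how often an interior argument is evaluated. It assumes Perl 5.32 or
later (the release in which chained comparisons became available);
the counter and the body of expensive_sub() are invented for
illustration.

```perl
use v5.32;
use warnings;

my $calls = 0;
sub expensive_sub { $calls++; return 5 }   # stand-in for a costly call

my ($x, $z) = (1, 10);

# The interior argument takes part in two comparisons but is
# evaluated only once.
my $in_range = ($x < expensive_sub() <= $z) ? 1 : 0;

# The first comparison is false, so the chain short-circuits and the
# last argument is never evaluated.
my $big = 20;
my $skipped = ($big < $x <= expensive_sub()) ? 1 : 0;

print "$in_range $skipped $calls\n";   # prints "1 0 1"
```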
113
114 Some operators are instead non-associative, meaning that it is a syntax
115 error to use a sequence of those operators of the same precedence. For
116 example, "$x .. $y .. $z" is an error.
117
118 Perl operators have the following associativity and precedence, listed
119 from highest precedence to lowest. Operators borrowed from C keep the
120 same precedence relationship with each other, even where C's precedence
121 is slightly screwy. (This makes learning Perl easier for C folks.)
122 With very few exceptions, these all operate on scalar values only, not
123 array values.
124
    left        terms and list operators (leftward)
    left        ->
    nonassoc    ++ --
    right       **
    right       ! ~ ~. \ and unary + and -
    left        =~ !~
    left        * / % x
    left        + - .
    left        << >>
    nonassoc    named unary operators
    nonassoc    isa
    chained     < > <= >= lt gt le ge
    chain/na    == != eq ne <=> cmp ~~
    left        & &.
    left        | |. ^ ^.
    left        &&
    left        || //
    nonassoc    ..  ...
    right       ?:
    right       = += -= *= etc. goto last next redo dump
    left        , =>
    nonassoc    list operators (rightward)
    right       not
    left        and
    left        or xor
150
151 In the following sections, these operators are covered in detail, in
152 the same order in which they appear in the table above.
153
154 Many operators can be overloaded for objects. See overload.
155
156 Terms and List Operators (Leftward)
A TERM has the highest precedence in Perl. TERMs include variables,
158 quote and quote-like operators, any expression in parentheses, and any
159 function whose arguments are parenthesized. Actually, there aren't
160 really functions in this sense, just list operators and unary operators
161 behaving as functions because you put parentheses around the arguments.
162 These are all documented in perlfunc.
163
164 If any list operator (print(), etc.) or any unary operator (chdir(),
165 etc.) is followed by a left parenthesis as the next token, the
166 operator and arguments within parentheses are taken to be of highest
167 precedence, just like a normal function call.
168
169 In the absence of parentheses, the precedence of list operators such as
170 "print", "sort", or "chmod" is either very high or very low depending
171 on whether you are looking at the left side or the right side of the
172 operator. For example, in
173
    @ary = (1, 3, sort 4, 2);
    print @ary;         # prints 1324
176
177 the commas on the right of the "sort" are evaluated before the "sort",
178 but the commas on the left are evaluated after. In other words, list
179 operators tend to gobble up all arguments that follow, and then act
180 like a simple TERM with regard to the preceding expression. Be careful
181 with parentheses:
182
    # These evaluate exit before doing the print:
    print($foo, exit);  # Obviously not what you want.
    print $foo, exit;   # Nor is this.

    # These do the print before evaluating exit:
    (print $foo), exit; # This is what you want.
    print($foo), exit;  # Or this.
    print ($foo), exit; # Or even this.
191
192 Also note that
193
    print ($foo & 255) + 1, "\n";

probably doesn't do what you expect at first glance. The parentheses
enclose the argument list for "print" which is evaluated (printing the
result of "$foo & 255"). Then one is added to the return value of
"print" (usually 1). The result is something like this:

    1 + 1, "\n";    # Obviously not what you meant.

To do what you meant properly, you must write:

    print(($foo & 255) + 1, "\n");
206
207 See "Named Unary Operators" for more discussion of this.
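
As a sketch of the two safe spellings (wrapping the whole list in
parentheses, or using a unary plus so that the parenthesis is not
taken as the start of the argument list), the following captures the
output in an in-memory filehandle. The value of $foo and the capture
buffer are assumptions made for the example.

```perl
use strict;
use warnings;

my $foo = 0x1FF;    # illustrative value; $foo & 255 is 255

my $buf = '';
open my $out, '>', \$buf or die "cannot open in-memory handle: $!";
my $previous = select $out;      # send plain print to the buffer

print(($foo & 255) + 1, "\n");   # outer parentheses hold the whole list
print +($foo & 255) + 1, "\n";   # unary plus: "(" is not an argument list

select $previous;
close $out;

print $buf;                      # prints "256" twice, one per line
```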
208
209 Also parsed as terms are the "do {}" and "eval {}" constructs, as well
210 as subroutine and method calls, and the anonymous constructors "[]" and
211 "{}".
212
213 See also "Quote and Quote-like Operators" toward the end of this
214 section, as well as "I/O Operators".
215
216 The Arrow Operator
"->" is an infix dereference operator, just as it is in C and C++.
218 If the right side is either a "[...]", "{...}", or a "(...)" subscript,
219 then the left side must be either a hard or symbolic reference to an
220 array, a hash, or a subroutine respectively. (Or technically speaking,
221 a location capable of holding a hard reference, if it's an array or
222 hash reference being used for assignment.) See perlreftut and perlref.
223
224 Otherwise, the right side is a method name or a simple scalar variable
225 containing either the method name or a subroutine reference, and (if it
226 is a method name) the left side must be either an object (a blessed
227 reference) or a class name (that is, a package name). See perlobj.
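
The forms described above can be sketched as follows. The Counter
class and all variable names are hypothetical and exist only for this
example.

```perl
use strict;
use warnings;

my $aref = [10, 20, 30];
my $href = { colour => 'red' };
my $cref = sub { return "called with @_" };

my $second = $aref->[1];         # array element through a reference
my $colour = $href->{colour};    # hash element through a reference
my $reply  = $cref->(1, 2);      # subroutine call through a reference

# A tiny illustrative class for the method-call forms.
package Counter {
    sub new  { my ($class) = @_; return bless { n => 0 }, $class }
    sub bump { my ($self)  = @_; return ++$self->{n} }
}

my $obj    = Counter->new;       # class name on the left
my $method = 'bump';
$obj->bump;                      # method name on the right
my $count  = $obj->$method();    # method name held in a scalar

print "$second $colour $count\n";   # prints "20 red 2"
```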
228
229 The dereferencing cases (as opposed to method-calling cases) are
230 somewhat extended by the "postderef" feature. For the details of that
231 feature, consult "Postfix Dereference Syntax" in perlref.
232
233 Auto-increment and Auto-decrement
234 "++" and "--" work as in C. That is, if placed before a variable, they
235 increment or decrement the variable by one before returning the value,
236 and if placed after, increment or decrement after returning the value.
237
    $i = 0;  $j = 0;
    print $i++;  # prints 0
    print ++$j;  # prints 1
241
242 Note that just as in C, Perl doesn't define when the variable is
243 incremented or decremented. You just know it will be done sometime
244 before or after the value is returned. This also means that modifying
245 a variable twice in the same statement will lead to undefined behavior.
246 Avoid statements like:
247
    $i = $i ++;
    print ++ $i + $i ++;
250
251 Perl will not guarantee what the result of the above statements is.
252
253 The auto-increment operator has a little extra builtin magic to it. If
254 you increment a variable that is numeric, or that has ever been used in
255 a numeric context, you get a normal increment. If, however, the
256 variable has been used in only string contexts since it was set, and
257 has a value that is not the empty string and matches the pattern
258 "/^[a-zA-Z]*[0-9]*\z/", the increment is done as a string, preserving
259 each character within its range, with carry:
260
    print ++($foo = "99");      # prints "100"
    print ++($foo = "a0");      # prints "a1"
    print ++($foo = "Az");      # prints "Ba"
    print ++($foo = "zz");      # prints "aaa"
265
266 "undef" is always treated as numeric, and in particular is changed to 0
267 before incrementing (so that a post-increment of an undef value will
268 return 0 rather than "undef").
269
270 The auto-decrement operator is not magical.
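
A short sketch contrasting these cases; the variable names are
illustrative, and the "numeric" warning is silenced locally only to
keep the output clean.

```perl
use strict;
use warnings;

my $undef;
my $returned = $undef++;    # post-increment of undef returns 0

my $label = "Az";
$label++;                   # magical string increment: "Ba"

my $plain = "Az";
{
    no warnings 'numeric';  # "Az" is not numeric, so -- would warn
    $plain--;               # no magic: "Az" counts as 0, giving -1
}

print "$returned $undef $label $plain\n";   # prints "0 1 Ba -1"
```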
271
272 Exponentiation
273 Binary "**" is the exponentiation operator. It binds even more tightly
274 than unary minus, so "-2**4" is "-(2**4)", not "(-2)**4". (This is
275 implemented using C's pow(3) function, which actually works on doubles
276 internally.)
277
Note that certain exponentiation expressions are ill-defined: these
include "0**0", "1**Inf", and "Inf**0". Do not expect any particular
results from these special cases; the results are platform-dependent.
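
The binding of "**" relative to unary minus, and its right
associativity as listed in the precedence table, can be sketched as
follows (variable names are illustrative):

```perl
use strict;
use warnings;

my $neg_of_power = -2 ** 4;       # parsed as -(2 ** 4)
my $power_of_neg = (-2) ** 4;     # parentheses force the other grouping
my $right_assoc  = 2 ** 3 ** 2;   # parsed as 2 ** (3 ** 2)

print "$neg_of_power $power_of_neg $right_assoc\n";   # prints "-16 16 512"
```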
281
282 Symbolic Unary Operators
283 Unary "!" performs logical negation, that is, "not". See also "not"
284 for a lower precedence version of this.
285
286 Unary "-" performs arithmetic negation if the operand is numeric,
287 including any string that looks like a number. If the operand is an
288 identifier, a string consisting of a minus sign concatenated with the
289 identifier is returned. Otherwise, if the string starts with a plus or
290 minus, a string starting with the opposite sign is returned. One
291 effect of these rules is that "-bareword" is equivalent to the string
292 "-bareword". If, however, the string begins with a non-alphabetic
293 character (excluding "+" or "-"), Perl will attempt to convert the
294 string to a numeric, and the arithmetic negation is performed. If the
295 string cannot be cleanly converted to a numeric, Perl will give the
296 warning Argument "the string" isn't numeric in negation (-) at ....
297
298 Unary "~" performs bitwise negation, that is, 1's complement. For
299 example, "0666 & ~027" is 0640. (See also "Integer Arithmetic" and
300 "Bitwise String Operators".) Note that the width of the result is
301 platform-dependent: "~0" is 32 bits wide on a 32-bit platform, but 64
302 bits wide on a 64-bit platform, so if you are expecting a certain bit
303 width, remember to use the "&" operator to mask off the excess bits.
304
305 Starting in Perl 5.28, it is a fatal error to try to complement a
306 string containing a character with an ordinal value above 255.
307
If the "bitwise" feature is enabled via "use feature 'bitwise'" or "use
v5.28", then unary "~" always treats its argument as a number, and an
alternate form of the operator, "~.", always treats its argument as a
string. So ~0 and ~"0" will both give 2**32-1 on 32-bit platforms,
whereas ~.0 and ~."0" will both yield "\xcf" (the complement of the
single byte 0x30, the character "0"). Until Perl 5.28, this feature
produced a warning in the "experimental::bitwise" category.
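
A sketch of numeric and string complement under the "bitwise"
feature, assuming Perl 5.28 or later so that "use v5.28" enables it.
The mask 0xFFFF_FFFF is just one way to obtain a width-independent
result; the variable names are illustrative.

```perl
use v5.28;      # enables the "bitwise" feature
use warnings;

my $perm    = 0666 & ~027;          # clear the bits set in 027
my $low32   = ~0 & 0xFFFF_FFFF;     # mask to 32 bits on any platform
my $same    = (~"0" == ~0) ? 1 : 0; # "~" treats "0" as the number 0
my $strcomp = ~."0";                # "~." complements the byte 0x30

# prints "0640 ffffffff 1 cf"
printf "%04o %x %d %02x\n", $perm, $low32, $same, ord $strcomp;
```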
315
316 Unary "+" has no effect whatsoever, even on strings. It is useful
317 syntactically for separating a function name from a parenthesized
318 expression that would otherwise be interpreted as the complete list of
319 function arguments. (See examples above under "Terms and List
320 Operators (Leftward)".)
321
322 Unary "\" creates references. If its operand is a single sigilled
323 thing, it creates a reference to that object. If its operand is a
324 parenthesised list, then it creates references to the things mentioned
325 in the list. Otherwise it puts its operand in list context, and
326 creates a list of references to the scalars in the list provided by the
327 operand. See perlreftut and perlref. Do not confuse this behavior
328 with the behavior of backslash within a string, although both forms do
329 convey the notion of protecting the next thing from interpolation.
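
A sketch of the first two forms, plus the special case (documented in
perlref) in which a single parenthesised array yields references to
its elements. All names are illustrative.

```perl
use strict;
use warnings;

my @list = (1, 2, 3);
my ($x, $y) = ('a', 'b');

my $aref  = \@list;        # one reference to the whole array
my @pair  = \($x, $y);     # same as (\$x, \$y)
my @elems = \(@list);      # one reference per element of @list

${ $elems[0] } = 100;      # writes through to $list[0]
${ $pair[1] }  = 'B';      # writes through to $y

print scalar(@elems), " $list[0] $y ", ref($aref), "\n";
# prints "3 100 B ARRAY"
```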
330
331 Binding Operators
332 Binary "=~" binds a scalar expression to a pattern match. Certain
333 operations search or modify the string $_ by default. This operator
334 makes that kind of operation work on some other string. The right
335 argument is a search pattern, substitution, or transliteration. The
336 left argument is what is supposed to be searched, substituted, or
337 transliterated instead of the default $_. When used in scalar context,
338 the return value generally indicates the success of the operation. The
339 exceptions are substitution ("s///") and transliteration ("y///") with
340 the "/r" (non-destructive) option, which cause the return value to be
341 the result of the substitution. Behavior in list context depends on
342 the particular operator. See "Regexp Quote-Like Operators" for details
343 and perlretut for examples using these operators.
344
345 If the right argument is an expression rather than a search pattern,
346 substitution, or transliteration, it is interpreted as a search pattern
347 at run time. Note that this means that its contents will be
348 interpolated twice, so
349
    '\\' =~ q'\\';
351
352 is not ok, as the regex engine will end up trying to compile the
353 pattern "\", which it will consider a syntax error.
354
355 Binary "!~" is just like "=~" except the return value is negated in the
356 logical sense.
357
358 Binary "!~" with a non-destructive substitution ("s///r") or
359 transliteration ("y///r") is a syntax error.
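
The following sketch gathers the binding forms discussed in this
section. It assumes Perl 5.14 or later for the "/r" option; the
sample string and variable names are illustrative.

```perl
use strict;
use warnings;

my $str = "Hello";

my $matched  = ($str =~ /ell/) ? 1 : 0;   # bind a match to $str
my $negated  = ($str !~ /xyz/) ? 1 : 0;   # logically negated match
my $replaced = $str =~ s/l/L/gr;          # /r returns the new string
my $upper    = $str =~ tr/a-z/A-Z/r;      # /r works for tr/// (y///) too

# A right argument that is an ordinary expression is used as a
# pattern at run time.
my $pattern = 'l+o$';
my $dynamic = ($str =~ $pattern) ? 1 : 0;

# prints "1 1 HeLLo HELLO 1 Hello"; $str itself is unchanged
print "$matched $negated $replaced $upper $dynamic $str\n";
```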
360
361 Multiplicative Operators
362 Binary "*" multiplies two numbers.
363
364 Binary "/" divides two numbers.
365
366 Binary "%" is the modulo operator, which computes the division
367 remainder of its first argument with respect to its second argument.
368 Given integer operands $m and $n: If $n is positive, then "$m % $n" is
369 $m minus the largest multiple of $n less than or equal to $m. If $n is
370 negative, then "$m % $n" is $m minus the smallest multiple of $n that
371 is not less than $m (that is, the result will be less than or equal to
372 zero). If the operands $m and $n are floating point values and the
373 absolute value of $n (that is abs($n)) is less than "(UV_MAX + 1)",
374 only the integer portion of $m and $n will be used in the operation
375 (Note: here "UV_MAX" means the maximum of the unsigned integer type).
376 If the absolute value of the right operand (abs($n)) is greater than or
377 equal to "(UV_MAX + 1)", "%" computes the floating-point remainder $r
378 in the equation "($r = $m - $i*$n)" where $i is a certain integer that
379 makes $r have the same sign as the right operand $n (not as the left
380 operand $m like C function fmod()) and the absolute value less than
381 that of $n. Note that when "use integer" is in scope, "%" gives you
382 direct access to the modulo operator as implemented by your C compiler.
383 This operator is not as well defined for negative operands, but it will
384 execute faster.
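
Applying the rules above to small operands gives the following
results (a sketch without "use integer" in scope; the variable names
are illustrative):

```perl
use strict;
use warnings;

# Result takes the sign of the right operand.
my @results = (7 % 3, -7 % 3, 7 % -3, -7 % -3);

# For floating-point operands below UV_MAX + 1, only the integer
# portions are used, so this is 7 % 3.
my $float_mod = 7.9 % 3.9;

print "@results $float_mod\n";   # prints "1 2 -2 -1 1"
```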
385
386 Binary "x" is the repetition operator. In scalar context, or if the
387 left operand is neither enclosed in parentheses nor a "qw//" list, it
388 performs a string repetition. In that case it supplies scalar context
389 to the left operand, and returns a string consisting of the left
390 operand string repeated the number of times specified by the right
391 operand. If the "x" is in list context, and the left operand is either
392 enclosed in parentheses or a "qw//" list, it performs a list
393 repetition. In that case it supplies list context to the left operand,
394 and returns a list consisting of the left operand list repeated the
395 number of times specified by the right operand. If the right operand
396 is zero or negative (raising a warning on negative), it returns an
397 empty string or an empty list, depending on the context.
398
    print '-' x 80;             # print row of dashes

    print "\t" x ($tab/8), ' ' x ($tab%8);      # tab over

    @ones = (1) x 80;           # a list of 80 1's
    @ones = (5) x @ones;        # set all elements to 5
405
406 Additive Operators
407 Binary "+" returns the sum of two numbers.
408
409 Binary "-" returns the difference of two numbers.
410
411 Binary "." concatenates two strings.
412
413 Shift Operators
414 Binary "<<" returns the value of its left argument shifted left by the
415 number of bits specified by the right argument. Arguments should be
416 integers. (See also "Integer Arithmetic".)
417
418 Binary ">>" returns the value of its left argument shifted right by the
419 number of bits specified by the right argument. Arguments should be
420 integers. (See also "Integer Arithmetic".)
421
422 If "use integer" (see "Integer Arithmetic") is in force then signed C
423 integers are used (arithmetic shift), otherwise unsigned C integers are
424 used (logical shift), even for negative shiftees. In arithmetic right
425 shift the sign bit is replicated on the left, in logical shift zero
426 bits come in from the left.
427
428 Either way, the implementation isn't going to generate results larger
429 than the size of the integer type Perl was built with (32 bits or 64
430 bits).
431
432 Shifting by negative number of bits means the reverse shift: left shift
433 becomes right shift, right shift becomes left shift. This is unlike in
434 C, where negative shift is undefined.
435
436 Shifting by more bits than the size of the integers means most of the
437 time zero (all bits fall off), except that under "use integer" right
438 overshifting a negative shiftee results in -1. This is unlike in C,
439 where shifting by too many bits is undefined. A common C behavior is
440 "shift by modulo wordbits", so that for example
441
    1 >> 64 == 1 >> (64 % 64) == 1 >> 0 == 1  # Common C behavior.
443
444 but that is completely accidental.
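
The shift rules described in this section can be sketched as follows.
The example assumes a Perl recent enough to implement the negative
shift and overshift behavior described above (5.24 or later, to the
best of my knowledge); the variable names are illustrative.

```perl
use strict;
use warnings;

my $logical    = -8 >> 1;                       # unsigned: large positive
my $arithmetic = do { use integer; -8 >> 1 };   # signed: sign bit kept

my $reversed_left  = 8 << -1;    # negative count: acts as 8 >> 1
my $reversed_right = 8 >> -1;    # negative count: acts as 8 << 1

my $overshift     = 1 << 200;                      # all bits fall off
my $overshift_neg = do { use integer; -1 >> 200 }; # stays at -1

print "$arithmetic $reversed_left $reversed_right ",
      "$overshift $overshift_neg\n";
# prints "-4 4 16 0 -1"
```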
445
446 If you get tired of being subject to your platform's native integers,
447 the "use bigint" pragma neatly sidesteps the issue altogether:
448
    print 20 << 20;  # 20971520
    print 20 << 40;  # 5120 on 32-bit machines,
                     # 21990232555520 on 64-bit machines
    use bigint;
    print 20 << 100; # 25353012004564588029934064107520
454
455 Named Unary Operators
456 The various named unary operators are treated as functions with one
457 argument, with optional parentheses.
458
459 If any list operator (print(), etc.) or any unary operator (chdir(),
460 etc.) is followed by a left parenthesis as the next token, the
461 operator and arguments within parentheses are taken to be of highest
462 precedence, just like a normal function call. For example, because
463 named unary operators are higher precedence than "||":
464
    chdir $foo    || die;       # (chdir $foo) || die
    chdir($foo)   || die;       # (chdir $foo) || die
    chdir ($foo)  || die;       # (chdir $foo) || die
    chdir +($foo) || die;       # (chdir $foo) || die

but, because "*" is higher precedence than named operators:

    chdir $foo * 20;    # chdir ($foo * 20)
    chdir($foo) * 20;   # (chdir $foo) * 20
    chdir ($foo) * 20;  # (chdir $foo) * 20
    chdir +($foo) * 20; # chdir ($foo * 20)

    rand 10 * 20;       # rand (10 * 20)
    rand(10) * 20;      # (rand 10) * 20
    rand (10) * 20;     # (rand 10) * 20
    rand +(10) * 20;    # rand (10 * 20)
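
Because chdir and rand have side effects or random results, the same
grouping is easier to observe with a deterministic named unary
operator. This sketch uses lc for that purpose; the choice of lc and
the variable names are mine, not the original text's.

```perl
use strict;
use warnings;

# Named unary operators bind more tightly than "eq" ...
my $tight  = (lc "ABC" eq "abc") ? 1 : 0;   # (lc "ABC") eq "abc"

# ... but less tightly than ".", unless parentheses follow directly.
my $loose  = lc "AB" . "CD";      # lc("AB" . "CD")
my $called = lc("AB") . "CD";     # (lc "AB") . "CD"
my $spaced = lc ("AB") . "CD";    # (lc "AB") . "CD"
my $plus   = lc +("AB") . "CD";   # lc("AB" . "CD")

print "$tight $loose $called $spaced $plus\n";
# prints "1 abcd abCD abCD abcd"
```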
481
Regarding precedence, the filetest operators, like "-f", "-M", etc. are
treated like named unary operators, but they don't follow this
functional parenthesis rule. That means, for example, that
-f($file).".bak" is equivalent to -f "$file.bak".
486
487 See also "Terms and List Operators (Leftward)".
488
489 Relational Operators
Perl operators that return true or false generally return values that
can be safely used as numbers. For example, the relational operators
in this section and the equality operators in the next one return 1 for
true and, for false, a special version of the defined empty string, "",
which counts as a zero but is exempt from warnings about improper
numeric conversions, just as "0 but true" is.
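
The difference between the special false value and an ordinary empty
string can be sketched by collecting warnings. The warning handler
and variable names are assumptions made for this example.

```perl
use strict;
use warnings;

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };   # collect warnings

my $true  = (2 < 3);
my $false = (3 < 2);

my $as_number = $false + 0;    # 0, and no warning is raised
my $as_string = "[$false]";    # "[]": it is also an empty string

my $plain_empty = "";
my $noisy = $plain_empty + 0;  # an ordinary "" does warn here

print "$true $as_number $as_string ", scalar(@warnings), "\n";
# prints "1 0 [] 1"
```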
496
497 Binary "<" returns true if the left argument is numerically less than
498 the right argument.
499
500 Binary ">" returns true if the left argument is numerically greater
501 than the right argument.
502
503 Binary "<=" returns true if the left argument is numerically less than
504 or equal to the right argument.
505
506 Binary ">=" returns true if the left argument is numerically greater
507 than or equal to the right argument.
508
509 Binary "lt" returns true if the left argument is stringwise less than
510 the right argument.
511
512 Binary "gt" returns true if the left argument is stringwise greater
513 than the right argument.
514
515 Binary "le" returns true if the left argument is stringwise less than
516 or equal to the right argument.
517
518 Binary "ge" returns true if the left argument is stringwise greater
519 than or equal to the right argument.
520
521 A sequence of relational operators, such as "$x < $y <= $z", performs
522 chained comparisons, in the manner described above in the section
523 "Operator Precedence and Associativity". Beware that they do not chain
524 with equality operators, which have lower precedence.
525
526 Equality Operators
527 Binary "==" returns true if the left argument is numerically equal to
528 the right argument.
529
530 Binary "!=" returns true if the left argument is numerically not equal
531 to the right argument.
532
533 Binary "eq" returns true if the left argument is stringwise equal to
534 the right argument.
535
536 Binary "ne" returns true if the left argument is stringwise not equal
537 to the right argument.
538
539 A sequence of the above equality operators, such as "$x == $y == $z",
540 performs chained comparisons, in the manner described above in the
541 section "Operator Precedence and Associativity". Beware that they do
542 not chain with relational operators, which have higher precedence.
543
544 Binary "<=>" returns -1, 0, or 1 depending on whether the left argument
545 is numerically less than, equal to, or greater than the right argument.
546 If your platform supports "NaN"'s (not-a-numbers) as numeric values,
547 using them with "<=>" returns undef. "NaN" is not "<", "==", ">", "<="
548 or ">=" anything (even "NaN"), so those 5 return false. "NaN != NaN"
549 returns true, as does "NaN !=" anything else. If your platform doesn't
550 support "NaN"'s then "NaN" is just a string with numeric value 0.
551
    $ perl -le '$x = "NaN"; print "No NaN support here" if $x == $x'
    $ perl -le '$x = "NaN"; print "NaN support here" if $x != $x'
554
555 (Note that the bigint, bigrat, and bignum pragmas all support "NaN".)
556
557 Binary "cmp" returns -1, 0, or 1 depending on whether the left argument
558 is stringwise less than, equal to, or greater than the right argument.
559
The following shows the difference between "<=>" and "cmp":

    print 10 <=> 2;     # prints 1  (numerically, 10 is greater than 2)
    print 10 cmp 2;     # prints -1 (stringwise, "10" sorts before "2")

(likewise between gt and >, lt and <, etc.)
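
The same difference is most often seen when sorting. A minimal
sketch, with an illustrative list of values:

```perl
use strict;
use warnings;

my @values = (10, 9, 2, 33);

my @numeric = sort { $a <=> $b } @values;   # compares as numbers
my @string  = sort { $a cmp $b } @values;   # compares as strings

print "@numeric\n";    # prints "2 9 10 33"
print "@string\n";     # prints "10 2 33 9"
```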
566
567 Binary "~~" does a smartmatch between its arguments. Smart matching is
568 described in the next section.
569
570 The two-sided ordering operators "<=>" and "cmp", and the smartmatch
571 operator "~~", are non-associative with respect to each other and with
572 respect to the equality operators of the same precedence.
573
574 "lt", "le", "ge", "gt" and "cmp" use the collation (sort) order
575 specified by the current "LC_COLLATE" locale if a "use locale" form
576 that includes collation is in effect. See perllocale. Do not mix
577 these with Unicode, only use them with legacy 8-bit locale encodings.
578 The standard "Unicode::Collate" and "Unicode::Collate::Locale" modules
579 offer much more powerful solutions to collation issues.
580
For case-insensitive comparisons, use the "fc" case-folding function
(see "fc" in perlfunc), available in Perl v5.16 or later:
583
    if ( fc($x) eq fc($y) ) { ... }
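
A sketch of why fc can be preferable to lc for this purpose. It
relies on "use v5.16" to enable both the "fc" and "unicode_strings"
features; the sample strings are illustrative ("\x{DF}" is the German
sharp s).

```perl
use v5.16;      # enables the "fc" and "unicode_strings" features
use warnings;

my ($x, $y) = ("stra\x{DF}e", "STRASSE");

my $lc_equal = (lc($x) eq lc($y)) ? 1 : 0;   # lowercasing keeps U+00DF
my $fc_equal = (fc($x) eq fc($y)) ? 1 : 0;   # case folding maps it to "ss"

print "$lc_equal $fc_equal\n";   # prints "0 1"
```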
585
586 Class Instance Operator
587 Binary "isa" evaluates to true when the left argument is an object
588 instance of the class (or a subclass derived from that class) given by
the right argument. If the left argument is not defined, is not a
blessed object instance, or does not derive from the class given by the
right argument, the operator evaluates as false. The right argument may
give
592 the class either as a bareword or a scalar expression that yields a
593 string class name:
594
    if( $obj isa Some::Class ) { ... }

    if( $obj isa "Different::Class" ) { ... }
    if( $obj isa $name_of_class ) { ... }
599
600 This feature is available from Perl 5.31.6 onwards when enabled by "use
601 feature 'isa'". This feature is enabled automatically by a "use v5.36"
602 (or higher) declaration in the current scope.
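
A sketch of the cases described above, assuming Perl 5.32 or later.
The Animal and Dog classes are hypothetical, and the experimental
warning category is silenced because releases before 5.36 warn when
the operator is used.

```perl
use v5.32;
use warnings;
use feature 'isa';
no warnings 'experimental::isa';

# Hypothetical classes for illustration only.
package Animal { sub new { my ($class) = @_; return bless {}, $class } }
package Dog    { our @ISA = ('Animal'); }

my $dog        = Dog->new;
my $class_name = 'Animal';
my $nothing;                    # undefined
my $plain      = 'Dog';         # a string, not a blessed object

my @checks = (
    ($dog     isa Dog)         ? 1 : 0,   # instance of the class itself
    ($dog     isa Animal)      ? 1 : 0,   # instance of a subclass
    ($dog     isa $class_name) ? 1 : 0,   # class name from a scalar
    ($nothing isa Animal)      ? 1 : 0,   # undefined left argument
    ($plain   isa Animal)      ? 1 : 0,   # not a blessed object
);

print "@checks\n";   # prints "1 1 1 0 0"
```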
603
604 Smartmatch Operator
605 First available in Perl 5.10.1 (the 5.10.0 version behaved
606 differently), binary "~~" does a "smartmatch" between its arguments.
607 This is mostly used implicitly in the "when" construct described in
608 perlsyn, although not all "when" clauses call the smartmatch operator.
609 Unique among all of Perl's operators, the smartmatch operator can
610 recurse. The smartmatch operator is experimental and its behavior is
611 subject to change.
612
613 It is also unique in that all other Perl operators impose a context
614 (usually string or numeric context) on their operands, autoconverting
615 those operands to those imposed contexts. In contrast, smartmatch
616 infers contexts from the actual types of its operands and uses that
617 type information to select a suitable comparison mechanism.
618
619 The "~~" operator compares its operands "polymorphically", determining
620 how to compare them according to their actual types (numeric, string,
621 array, hash, etc.). Like the equality operators with which it shares
622 the same precedence, "~~" returns 1 for true and "" for false. It is
623 often best read aloud as "in", "inside of", or "is contained in",
624 because the left operand is often looked for inside the right operand.
That makes the order of the operands to the smartmatch operator often
opposite that of the regular match operator. In other words, the
627 "smaller" thing is usually placed in the left operand and the larger
628 one in the right.
629
630 The behavior of a smartmatch depends on what type of things its
631 arguments are, as determined by the following table. The first row of
632 the table whose types apply determines the smartmatch behavior.
633 Because what actually happens is mostly determined by the type of the
634 second operand, the table is sorted on the right operand instead of on
635 the left.
636
637 Left Right Description and pseudocode
638 ===============================================================
639 Any undef check whether Any is undefined
640 like: !defined Any
641
642 Any Object invoke ~~ overloading on Object, or die
643
644 Right operand is an ARRAY:
645
646 Left Right Description and pseudocode
647 ===============================================================
648 ARRAY1 ARRAY2 recurse on paired elements of ARRAY1 and ARRAY2[2]
649 like: (ARRAY1[0] ~~ ARRAY2[0])
650 && (ARRAY1[1] ~~ ARRAY2[1]) && ...
651 HASH ARRAY any ARRAY elements exist as HASH keys
652 like: grep { exists HASH->{$_} } ARRAY
653 Regexp ARRAY any ARRAY elements pattern match Regexp
654 like: grep { /Regexp/ } ARRAY
655 undef ARRAY undef in ARRAY
656 like: grep { !defined } ARRAY
657 Any ARRAY smartmatch each ARRAY element[3]
658 like: grep { Any ~~ $_ } ARRAY
659
660 Right operand is a HASH:
661
662 Left Right Description and pseudocode
663 ===============================================================
664 HASH1 HASH2 all same keys in both HASHes
665 like: keys HASH1 ==
666 grep { exists HASH2->{$_} } keys HASH1
667 ARRAY HASH any ARRAY elements exist as HASH keys
668 like: grep { exists HASH->{$_} } ARRAY
669 Regexp HASH any HASH keys pattern match Regexp
670 like: grep { /Regexp/ } keys HASH
671 undef HASH always false (undef cannot be a key)
672 like: 0 == 1
673 Any HASH HASH key existence
674 like: exists HASH->{Any}
675
676 Right operand is CODE:
677
678 Left Right Description and pseudocode
679 ===============================================================
680 ARRAY CODE sub returns true on all ARRAY elements[1]
681 like: !grep { !CODE->($_) } ARRAY
682 HASH CODE sub returns true on all HASH keys[1]
683 like: !grep { !CODE->($_) } keys HASH
684 Any CODE sub passed Any returns true
685 like: CODE->(Any)
686
687 Right operand is a Regexp:
688
    Left      Right      Description and pseudocode
    ===============================================================
    ARRAY     Regexp     any ARRAY elements match Regexp
                   like: grep { /Regexp/ } ARRAY
    HASH      Regexp     any HASH keys match Regexp
                   like: grep { /Regexp/ } keys HASH
    Any       Regexp     pattern match
                   like: Any =~ /Regexp/
697
698 Other:
699
    Left      Right      Description and pseudocode
    ===============================================================
    Object    Any        invoke ~~ overloading on Object,
                         or fall back to...

    Any       Num        numeric equality
                   like: Any == Num
    Num       nummy[4]   numeric equality
                   like: Num == nummy
    undef     Any        check whether undefined
                   like: !defined(Any)
    Any       Any        string equality
                   like: Any eq Any
713
714 Notes:
715
716 1. Empty hashes or arrays match.
717 2. That is, each element smartmatches the element of the same index in
718 the other array.[3]
719 3. If a circular reference is found, fall back to referential equality.
720 4. Either an actual number, or a string that looks like one.
721
722 The smartmatch implicitly dereferences any non-blessed hash or array
723 reference, so the "HASH" and "ARRAY" entries apply in those cases. For
724 blessed references, the "Object" entries apply. Smartmatches involving
725 hashes only consider hash keys, never hash values.
726
727 The "like" code entry is not always an exact rendition. For example,
728 the smartmatch operator short-circuits whenever possible, but "grep"
729 does not. Also, "grep" in scalar context returns the number of
730 matches, but "~~" returns only true or false.
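
To make the "like" notation concrete, here is a rough sketch of the
HASH1/HASH2 row written as ordinary Perl, without "~~". The helper name
same_keys is hypothetical, and the extra key-count comparison is an
assumption added so that the check is symmetric; this approximates the
table entry and is not the operator's actual implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: true when both hashes have exactly the same keys.
# Hash values are ignored, as in the smartmatch table.
sub same_keys {
    my ($h1, $h2) = @_;
    # Same number of keys, and every key of $h1 also exists in $h2.
    # Here grep is in scalar context, so it yields a count.
    return keys(%$h1) == keys(%$h2)
        && keys(%$h1) == grep { exists $h2->{$_} } keys %$h1;
}

my %x = (name => "Ann", rank => "Pvt");
my %y = (rank => 1,     name => 0);
my %z = (name => 1,     rank => 1, serial_num => 1);

my $xy = same_keys(\%x, \%y) ? 1 : 0;   # same key sets
my $xz = same_keys(\%x, \%z) ? 1 : 0;   # %z has an extra key
```

Note that grep visits every key even after a mismatch is found, which
is one of the differences from "~~" mentioned above.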
731
732 Unlike most operators, the smartmatch operator knows to treat "undef"
733 specially:
734
735 use v5.10.1;
736 @array = (1, 2, 3, undef, 4, 5);
737 say "some elements undefined" if undef ~~ @array;
738
739 Each operand is considered in a modified scalar context, the
740 modification being that array and hash variables are passed by
741 reference to the operator, which implicitly dereferences them. Both
742 elements of each pair are the same:
743
744 use v5.10.1;
745
746 my %hash = (red => 1, blue => 2, green => 3,
747 orange => 4, yellow => 5, purple => 6,
748 black => 7, grey => 8, white => 9);
749
750 my @array = qw(red blue green);
751
752 say "some array elements in hash keys" if @array ~~ %hash;
753 say "some array elements in hash keys" if \@array ~~ \%hash;
754
755 say "red in array" if "red" ~~ @array;
756 say "red in array" if "red" ~~ \@array;
757
758 say "some keys end in e" if /e$/ ~~ %hash;
759 say "some keys end in e" if /e$/ ~~ \%hash;
760
761 Two arrays smartmatch if each element in the first array smartmatches
762 (that is, is "in") the corresponding element in the second array,
763 recursively.
764
765 use v5.10.1;
766 my @little = qw(red blue green);
767 my @bigger = ("red", "blue", [ "orange", "green" ] );
768 if (@little ~~ @bigger) { # true!
769 say "little is contained in bigger";
770 }
771
772 Because the smartmatch operator recurses on nested arrays, this will
773 still report that "red" is in the array.
774
775 use v5.10.1;
776 my @array = qw(red blue green);
777 my $nested_array = [[[[[[[ @array ]]]]]]];
778 say "red in array" if "red" ~~ $nested_array;
779
If two arrays smartmatch each other, then they are deep copies of each
other's values, as this example reports:
782
783 use v5.12.0;
784 my @a = (0, 1, 2, [3, [4, 5], 6], 7);
785 my @b = (0, 1, 2, [3, [4, 5], 6], 7);
786
787 if (@a ~~ @b && @b ~~ @a) {
788 say "a and b are deep copies of each other";
789 }
790 elsif (@a ~~ @b) {
791 say "a smartmatches in b";
792 }
793 elsif (@b ~~ @a) {
794 say "b smartmatches in a";
795 }
796 else {
797 say "a and b don't smartmatch each other at all";
798 }
799
800 If you were to set "$b[3] = 4", then instead of reporting that "a and b
801 are deep copies of each other", it now reports that "b smartmatches in
802 a". That's because the corresponding position in @a contains an array
803 that (eventually) has a 4 in it.
804
805 Smartmatching one hash against another reports whether both contain the
806 same keys, no more and no less. This could be used to see whether two
807 records have the same field names, without caring what values those
808 fields might have. For example:
809
810 use v5.10.1;
811 sub make_dogtag {
812 state $REQUIRED_FIELDS = { name=>1, rank=>1, serial_num=>1 };
813
814 my ($class, $init_fields) = @_;
815
816 die "Must supply (only) name, rank, and serial number"
817 unless $init_fields ~~ $REQUIRED_FIELDS;
818
819 ...
820 }
821
822 However, this only does what you mean if $init_fields is indeed a hash
823 reference. The condition "$init_fields ~~ $REQUIRED_FIELDS" also allows
824 the strings "name", "rank", "serial_num" as well as any array reference
825 that contains "name" or "rank" or "serial_num" anywhere to pass
826 through.
827
828 The smartmatch operator is most often used as the implicit operator of
829 a "when" clause. See the section on "Switch Statements" in perlsyn.
830
831 Smartmatching of Objects
832
833 To avoid relying on an object's underlying representation, if the
834 smartmatch's right operand is an object that doesn't overload "~~", it
835 raises the exception ""Smartmatching a non-overloaded object breaks
836 encapsulation"". That's because one has no business digging around to
837 see whether something is "in" an object. These are all illegal on
838 objects without a "~~" overload:
839
840 %hash ~~ $object
841 42 ~~ $object
842 "fred" ~~ $object
843
844 However, you can change the way an object is smartmatched by
845 overloading the "~~" operator. This is allowed to extend the usual
846 smartmatch semantics. For objects that do have an "~~" overload, see
847 overload.
848
849 Using an object as the left operand is allowed, although not very
850 useful. Smartmatching rules take precedence over overloading, so even
851 if the object in the left operand has smartmatch overloading, this will
852 be ignored. A left operand that is a non-overloaded object falls back
853 on a string or numeric comparison of whatever the "ref" operator
854 returns. That means that
855
856 $object ~~ X
857
858 does not invoke the overload method with "X" as an argument. Instead
859 the above table is consulted as normal, and based on the type of "X",
860 overloading may or may not be invoked. For simple strings or numbers,
861 "in" becomes equivalent to this:
862
    $object ~~ $number        ref($object) == $number
    $object ~~ $string        ref($object) eq $string
865
866 For example, this reports that the handle smells IOish (but please
867 don't really do this!):
868
869 use IO::Handle;
870 my $fh = IO::Handle->new();
871 if ($fh ~~ /\bIO\b/) {
872 say "handle smells IOish";
873 }
874
875 That's because it treats $fh as a string like
876 "IO::Handle=GLOB(0x8039e0)", then pattern matches against that.
877
878 Bitwise And
Binary "&" returns its operands ANDed together bit by bit. Although no
warning is currently raised, the result is not well defined when this
operation is performed on operands that are neither numbers (see
"Integer Arithmetic") nor bitstrings (see "Bitwise String Operators").
883
884 Note that "&" has lower priority than relational operators, so for
885 example the parentheses are essential in a test like
886
887 print "Even\n" if ($x & 1) == 0;
888
889 If the "bitwise" feature is enabled via "use feature 'bitwise'" or "use
890 v5.28", then this operator always treats its operands as numbers.
891 Before Perl 5.28 this feature produced a warning in the
892 "experimental::bitwise" category.
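
The following sketch illustrates both points above: the grouping of "&"
relative to "==", and the difference between numeric and string
operands. The variable names are illustrative only, and the string case
assumes the "bitwise" feature is not enabled.

```perl
use strict;
use warnings;

my $x = 6;

# Intended test: is the lowest bit clear?
my $even = (($x & 1) == 0) ? 1 : 0;

# Without parentheses, "$x & 1 == 0" groups as "$x & (1 == 0)".
# The comparison is false, so the result is 0 whatever $x holds.
my $misparsed = ($x & (1 == 0)) ? 1 : 0;

# Numeric operands are ANDed as integers ...
my $num = 12 & 10;          # 0b1100 & 0b1010
# ... but two strings are ANDed character by character when the
# "bitwise" feature is off.
my $str = "12" & "10";
```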
893
894 Bitwise Or and Exclusive Or
895 Binary "|" returns its operands ORed together bit by bit.
896
897 Binary "^" returns its operands XORed together bit by bit.
898
Although no warning is currently raised, the results are not well
defined when these operations are performed on operands that are
neither numbers (see "Integer Arithmetic") nor bitstrings (see "Bitwise
String Operators").
903
904 Note that "|" and "^" have lower priority than relational operators, so
905 for example the parentheses are essential in a test like
906
907 print "false\n" if (8 | 2) != 10;
908
If the "bitwise" feature is enabled via "use feature 'bitwise'" or "use
v5.28", then these operators always treat their operands as numbers.
Before Perl 5.28 this feature produced a warning in the
"experimental::bitwise" category.
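
As a small illustration, the sketch below combines and toggles flag bits
and then shows how the unparenthesized form of the test above groups.
The flag values are hypothetical.

```perl
use strict;
use warnings;

my ($read, $write) = (4, 2);       # hypothetical flag bits

my $flags   = $read | $write;      # both bits set
my $toggled = $flags ^ $write;     # write bit flipped off again

# "(8 | 2) != 10" compares the ORed value with 10, which is false.
my $with_parens = ((8 | 2) != 10) ? 1 : 0;

# "8 | 2 != 10" groups as "8 | (2 != 10)", that is, 8 | 1.
my $without = 8 | (2 != 10);
```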
913
914 C-style Logical And
915 Binary "&&" performs a short-circuit logical AND operation. That is,
916 if the left operand is false, the right operand is not even evaluated.
917 Scalar or list context propagates down to the right operand if it is
918 evaluated.
919
920 C-style Logical Or
921 Binary "||" performs a short-circuit logical OR operation. That is, if
922 the left operand is true, the right operand is not even evaluated.
923 Scalar or list context propagates down to the right operand if it is
924 evaluated.
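
Short-circuiting can be observed by recording which operands actually
run. This is only a sketch; the helpers truthy and falsy are
hypothetical names.

```perl
use strict;
use warnings;

my @calls;                                      # which subs ran
sub truthy { push @calls, "truthy"; return 1 }  # hypothetical helpers
sub falsy  { push @calls, "falsy";  return 0 }

my $and = falsy()  && truthy();   # left is false: truthy() is skipped
my $or  = truthy() || falsy();    # left is true:  falsy() is skipped
```

Each expression calls only its left operand, so @calls ends up with one
entry per expression.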
925
926 Logical Defined-Or
927 Although it has no direct equivalent in C, Perl's "//" operator is
928 related to its C-style "or". In fact, it's exactly the same as "||",
929 except that it tests the left hand side's definedness instead of its
930 truth. Thus, "EXPR1 // EXPR2" returns the value of "EXPR1" if it's
931 defined, otherwise, the value of "EXPR2" is returned. ("EXPR1" is
932 evaluated in scalar context, "EXPR2" in the context of "//" itself).
Usually, this is the same result as "defined(EXPR1) ? EXPR1 : EXPR2"
(except that the ternary-operator form can be used as an lvalue, while
"EXPR1 // EXPR2" cannot). This is very useful for providing default
936 values for variables. If you actually want to test if at least one of
937 $x and $y is defined, use "defined($x // $y)".
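
The difference between "||" and "//" matters for values that are
defined but false. A minimal sketch, assuming a hypothetical options
hash %opt:

```perl
use strict;
use warnings;

my %opt = (verbose => 0);            # hypothetical options hash

my $v_or  = $opt{verbose} || 1;      # 0 is false, so || replaces it
my $v_dor = $opt{verbose} // 1;      # 0 is defined, so // keeps it
my $depth = $opt{depth}   // 3;      # missing key yields undef

my ($x, $y);                         # both undefined
my $any_defined = defined($x // $y) ? 1 : 0;
```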
938
939 The "||", "//" and "&&" operators return the last value evaluated
940 (unlike C's "||" and "&&", which return 0 or 1). Thus, a reasonably
941 portable way to find out the home directory might be:
942
943 $home = $ENV{HOME}
944 // $ENV{LOGDIR}
945 // (getpwuid($<))[7]
946 // die "You're homeless!\n";
947
In particular, this means that you shouldn't use "||" for selecting
between two aggregates for assignment:
950
951 @a = @b || @c; # This doesn't do the right thing
952 @a = scalar(@b) || @c; # because it really means this.
953 @a = @b ? @b : @c; # This works fine, though.
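
A runnable version of the lines above, with illustrative data, might
look like this:

```perl
use strict;
use warnings;

my @b = (10, 20, 30);
my @c = (99);

my @wrong = @b || @c;        # @b is forced into scalar context
my @right = @b ? @b : @c;    # the whole of @b

my @empty;
my @fallback = @empty || @c; # count is 0 (false), so @c is used
```

Here @wrong holds the single element 3, the number of elements in @b,
rather than a copy of @b.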
954
955 As alternatives to "&&" and "||" when used for control flow, Perl
956 provides the "and" and "or" operators (see below). The short-circuit
957 behavior is identical. The precedence of "and" and "or" is much lower,
958 however, so that you can safely use them after a list operator without
959 the need for parentheses:
960
961 unlink "alpha", "beta", "gamma"
962 or gripe(), next LINE;
963
964 With the C-style operators that would have been written like this:
965
966 unlink("alpha", "beta", "gamma")
967 || (gripe(), next LINE);
968
969 It would be even more readable to write that this way:
970
971 unless(unlink("alpha", "beta", "gamma")) {
972 gripe();
973 next LINE;
974 }
975
976 Using "or" for assignment is unlikely to do what you want; see below.
977
978 Range Operators
979 Binary ".." is the range operator, which is really two different
980 operators depending on the context. In list context, it returns a list
981 of values counting (up by ones) from the left value to the right value.
982 If the left value is greater than the right value then it returns the
983 empty list. The range operator is useful for writing "foreach (1..10)"
984 loops and for doing slice operations on arrays. In the current
985 implementation, no temporary array is created when the range operator
986 is used as the expression in "foreach" loops, but older versions of
987 Perl might burn a lot of memory when you write something like this:
988
989 for (1 .. 1_000_000) {
990 # code
991 }
992
993 The range operator also works on strings, using the magical auto-
994 increment, see below.
995
996 In scalar context, ".." returns a boolean value. The operator is
997 bistable, like a flip-flop, and emulates the line-range (comma)
998 operator of sed, awk, and various editors. Each ".." operator
999 maintains its own boolean state, even across calls to a subroutine that
1000 contains it. It is false as long as its left operand is false. Once
1001 the left operand is true, the range operator stays true until the right
1002 operand is true, AFTER which the range operator becomes false again.
1003 It doesn't become false till the next time the range operator is
1004 evaluated. It can test the right operand and become false on the same
1005 evaluation it became true (as in awk), but it still returns true once.
1006 If you don't want it to test the right operand until the next
1007 evaluation, as in sed, just use three dots ("...") instead of two. In
1008 all other regards, "..." behaves just like ".." does.
1009
1010 The right operand is not evaluated while the operator is in the "false"
1011 state, and the left operand is not evaluated while the operator is in
1012 the "true" state. The precedence is a little lower than || and &&.
1013 The value returned is either the empty string for false, or a sequence
1014 number (beginning with 1) for true. The sequence number is reset for
1015 each range encountered. The final sequence number in a range has the
1016 string "E0" appended to it, which doesn't affect its numeric value, but
1017 gives you something to search for if you want to exclude the endpoint.
1018 You can exclude the beginning point by waiting for the sequence number
1019 to be greater than 1.
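
The sequence numbers and the "E0" marker can be seen by capturing the
operator's value on each pass. This is a sketch with made-up input
lines:

```perl
use strict;
use warnings;

my @lines = ("intro", "BEGIN", "payload", "END", "outro");
my (@states, @inner);

for (@lines) {
    my $state = /BEGIN/ .. /END/;     # scalar context: flip-flop
    push @states, $state;
    # Keep lines strictly inside the range: skip the first line
    # (sequence number 1) and the last one (marked with "E0").
    push @inner, $_ if $state && $state > 1 && $state !~ /E0$/;
}
```

Only "payload" lies strictly between the two endpoints, so it is the
only line collected in @inner.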
1020
1021 If either operand of scalar ".." is a constant expression, that operand
1022 is considered true if it is equal ("==") to the current input line
1023 number (the $. variable).
1024
1025 To be pedantic, the comparison is actually "int(EXPR) == int(EXPR)",
1026 but that is only an issue if you use a floating point expression; when
1027 implicitly using $. as described in the previous paragraph, the
1028 comparison is "int(EXPR) == int($.)" which is only an issue when $. is
1029 set to a floating point value and you are not reading from a file.
Furthermore, "span" .. "spat" or "2.18 .. 3.14" will not do what you
want in scalar context because each of the operands is evaluated using
its integer representation.
1033
1034 Examples:
1035
1036 As a scalar operator:
1037
1038 if (101 .. 200) { print; } # print 2nd hundred lines, short for
1039 # if ($. == 101 .. $. == 200) { print; }
1040
1041 next LINE if (1 .. /^$/); # skip header lines, short for
1042 # next LINE if ($. == 1 .. /^$/);
1043 # (typically in a loop labeled LINE)
1044
1045 s/^/> / if (/^$/ .. eof()); # quote body
1046
1047 # parse mail messages
1048 while (<>) {
1049 $in_header = 1 .. /^$/;
1050 $in_body = /^$/ .. eof;
1051 if ($in_header) {
1052 # do something
1053 } else { # in body
1054 # do something else
1055 }
1056 } continue {
1057 close ARGV if eof; # reset $. each file
1058 }
1059
1060 Here's a simple example to illustrate the difference between the two
1061 range operators:
1062
1063 @lines = (" - Foo",
1064 "01 - Bar",
1065 "1 - Baz",
1066 " - Quux");
1067
1068 foreach (@lines) {
1069 if (/0/ .. /1/) {
1070 print "$_\n";
1071 }
1072 }
1073
1074 This program will print only the line containing "Bar". If the range
1075 operator is changed to "...", it will also print the "Baz" line.
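
To compare the two operators side by side, the same loop can collect
the matching lines instead of printing them (the array names are
illustrative):

```perl
use strict;
use warnings;

my @lines = (" - Foo", "01 - Bar", "1 - Baz", " - Quux");
my (@two_dots, @three_dots);

for (@lines) {
    # Each operator keeps its own state, so both can share one loop.
    push @two_dots,   $_ if /0/ ..  /1/;   # right operand tested at once
    push @three_dots, $_ if /0/ ... /1/;   # right operand tested next time
}
```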
1076
1077 And now some examples as a list operator:
1078
1079 for (101 .. 200) { print } # print $_ 100 times
1080 @foo = @foo[0 .. $#foo]; # an expensive no-op
1081 @foo = @foo[$#foo-4 .. $#foo]; # slice last 5 items
1082
1083 Because each operand is evaluated in integer form, "2.18 .. 3.14" will
1084 return two elements in list context.
1085
1086 @list = (2.18 .. 3.14); # same as @list = (2 .. 3);
1087
1088 The range operator in list context can make use of the magical auto-
1089 increment algorithm if both operands are strings, subject to the
1090 following rules:
1091
1092 • With one exception (below), if both strings look like numbers to
1093 Perl, the magic increment will not be applied, and the strings will
1094 be treated as numbers (more specifically, integers) instead.
1095
1096 For example, "-2".."2" is the same as -2..2, and "2.18".."3.14"
1097 produces "2, 3".
1098
1099 • The exception to the above rule is when the left-hand string begins
1100 with 0 and is longer than one character, in this case the magic
1101 increment will be applied, even though strings like "01" would
1102 normally look like a number to Perl.
1103
1104 For example, "01".."04" produces "01", "02", "03", "04", and
1105 "00".."-1" produces "00" through "99" - this may seem surprising,
1106 but see the following rules for why it works this way. To get
1107 dates with leading zeros, you can say:
1108
1109 @z2 = ("01" .. "31");
1110 print $z2[$mday];
1111
1112 If you want to force strings to be interpreted as numbers, you
1113 could say
1114
1115 @numbers = ( 0+$first .. 0+$last );
1116
1117 Note: In Perl versions 5.30 and below, any string on the left-hand
1118 side beginning with "0", including the string "0" itself, would
1119 cause the magic string increment behavior. This means that on these
1120 Perl versions, "0".."-1" would produce "0" through "99", which was
1121 inconsistent with "0..-1", which produces the empty list. This also
1122 means that "0".."9" now produces a list of integers instead of a
1123 list of strings.
1124
1125 • If the initial value specified isn't part of a magical increment
1126 sequence (that is, a non-empty string matching
1127 "/^[a-zA-Z]*[0-9]*\z/"), only the initial value will be returned.
1128
1129 For example, "ax".."az" produces "ax", "ay", "az", but "*x".."az"
1130 produces only "*x".
1131
1132 • For other initial values that are strings that do follow the rules
1133 of the magical increment, the corresponding sequence will be
1134 returned.
1135
1136 For example, you can say
1137
1138 @alphabet = ("A" .. "Z");
1139
1140 to get all normal letters of the English alphabet, or
1141
1142 $hexdigit = (0 .. 9, "a" .. "f")[$num & 15];
1143
1144 to get a hexadecimal digit.
1145
1146 • If the final value specified is not in the sequence that the
1147 magical increment would produce, the sequence goes until the next
1148 value would be longer than the final value specified. If the length
1149 of the final string is shorter than the first, the empty list is
1150 returned.
1151
1152 For example, "a".."--" is the same as "a".."zz", "0".."xx" produces
1153 "0" through "99", and "aaa".."--" returns the empty list.
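
The rules above can be checked with a few small ranges. This sketch
uses only examples whose results are stated in the rules, and avoids
the cases whose behavior changed after Perl 5.30:

```perl
use strict;
use warnings;

my @days    = ("01" .. "04");      # leading zero: magic increment
my @letters = ("ax" .. "az");      # ax, ay, az
my @odd     = ("*x" .. "az");      # "*x" cannot be incremented
my @numeric = ("2.18" .. "3.14");  # both look like numbers: integers
my @none    = ("aaa" .. "--");     # final value is shorter: empty
my @upto_zz = ("a" .. "--");       # same as "a" .. "zz"
```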
1154
As of Perl 5.26, the list-context range operator on strings works as
expected in the scope of "use feature 'unicode_strings'". In previous
versions, and outside the scope of that feature, it exhibits "The
"Unicode Bug"" in perlunicode: its behavior depends on the internal
encoding of the range endpoint.
1160
1161 Because the magical increment only works on non-empty strings matching
1162 "/^[a-zA-Z]*[0-9]*\z/", the following will only return an alpha:
1163
1164 use charnames "greek";
1165 my @greek_small = ("\N{alpha}" .. "\N{omega}");
1166
1167 To get the 25 traditional lowercase Greek letters, including both
1168 sigmas, you could use this instead:
1169
1170 use charnames "greek";
1171 my @greek_small = map { chr } ( ord("\N{alpha}")
1172 ..
1173 ord("\N{omega}")
1174 );
1175
1176 However, because there are many other lowercase Greek characters than
1177 just those, to match lowercase Greek characters in a regular
1178 expression, you could use the pattern "/(?:(?=\p{Greek})\p{Lower})+/"
1179 (or the experimental feature "/(?[ \p{Greek} & \p{Lower} ])+/").
1180
1181 Conditional Operator
1182 Ternary "?:" is the conditional operator, just as in C. It works much
1183 like an if-then-else. If the argument before the "?" is true, the
1184 argument before the ":" is returned, otherwise the argument after the
1185 ":" is returned. For example:
1186
1187 printf "I have %d dog%s.\n", $n,
1188 ($n == 1) ? "" : "s";
1189
1190 Scalar or list context propagates downward into the 2nd or 3rd
1191 argument, whichever is selected.
1192
1193 $x = $ok ? $y : $z; # get a scalar
1194 @x = $ok ? @y : @z; # get an array
1195 $x = $ok ? @y : @z; # oops, that's just a count!
1196
1197 The operator may be assigned to if both the 2nd and 3rd arguments are
1198 legal lvalues (meaning that you can assign to them):
1199
1200 ($x_or_y ? $x : $y) = $z;
1201
1202 Because this operator produces an assignable result, using assignments
1203 without parentheses will get you in trouble. For example, this:
1204
1205 $x % 2 ? $x += 10 : $x += 2
1206
1207 Really means this:
1208
1209 (($x % 2) ? ($x += 10) : $x) += 2
1210
1211 Rather than this:
1212
1213 ($x % 2) ? ($x += 10) : ($x += 2)
1214
1215 That should probably be written more simply as:
1216
1217 $x += ($x % 2) ? 10 : 2;
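
The effect of the grouping can be seen by running both forms on an odd
and an even value. The subroutine names below are hypothetical wrappers
around the expressions shown above:

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the expressions discussed above.
sub as_parsed {
    my ($x) = @_;
    (($x % 2) ? ($x += 10) : $x) += 2;   # how Perl groups the original
    return $x;
}
sub as_intended {
    my ($x) = @_;
    $x += ($x % 2) ? 10 : 2;
    return $x;
}

my @parsed   = map { as_parsed($_) }   3, 4;  # odd input gets +10 and +2
my @intended = map { as_intended($_) } 3, 4;

# The conditional is assignable when both branches are lvalues.
my ($p, $q) = (0, 0);
my $pick_p  = 1;
($pick_p ? $p : $q) = 7;                      # assigns to $p
```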
1218
1219 Assignment Operators
1220 "=" is the ordinary assignment operator.
1221
1222 Assignment operators work as in C. That is,
1223
1224 $x += 2;
1225
1226 is equivalent to
1227
1228 $x = $x + 2;
1229
1230 although without duplicating any side effects that dereferencing the
1231 lvalue might trigger, such as from tie(). Other assignment operators
1232 work similarly. The following are recognized:
1233
    **=    +=    *=    &=    &.=    <<=    &&=
           -=    /=    |=    |.=    >>=    ||=
           .=    %=    ^=    ^.=           //=
                 x=
1238
1239 Although these are grouped by family, they all have the precedence of
1240 assignment. These combined assignment operators can only operate on
1241 scalars, whereas the ordinary assignment operator can assign to arrays,
1242 hashes, lists and even references. (See "Context" and "List value
1243 constructors" in perldata, and "Assigning to References" in perlref.)
1244
1245 Unlike in C, the scalar assignment operator produces a valid lvalue.
1246 Modifying an assignment is equivalent to doing the assignment and then
1247 modifying the variable that was assigned to. This is useful for
1248 modifying a copy of something, like this:
1249
1250 ($tmp = $global) =~ tr/13579/24680/;
1251
Although as of 5.14, that can also be accomplished this way:
1253
1254 use v5.14;
1255 $tmp = ($global =~ tr/13579/24680/r);
1256
1257 Likewise,
1258
1259 ($x += 2) *= 3;
1260
1261 is equivalent to
1262
1263 $x += 2;
1264 $x *= 3;
1265
1266 Similarly, a list assignment in list context produces the list of
1267 lvalues assigned to, and a list assignment in scalar context returns
1268 the number of elements produced by the expression on the right hand
1269 side of the assignment.
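
A short sketch pulling these cases together, with illustrative variable
names:

```perl
use strict;
use warnings;

my $global = "13579";
(my $tmp = $global) =~ tr/13579/24680/;   # modifies the copy only

my $x = 1;
($x += 2) *= 3;                           # $x = (1 + 2) * 3

# List assignment in scalar context: number of right-hand elements,
# even though only two variables receive values.
my $count = (my ($first, $second) = (5, 6, 7));
```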
1270
1271 The three dotted bitwise assignment operators ("&.=" "|.=" "^.=") are
1272 new in Perl 5.22. See "Bitwise String Operators".
1273
1274 Comma Operator
1275 Binary "," is the comma operator. In scalar context it evaluates its
1276 left argument, throws that value away, then evaluates its right
1277 argument and returns that value. This is just like C's comma operator.
1278
1279 In list context, it's just the list argument separator, and inserts
1280 both its arguments into the list. These arguments are also evaluated
1281 from left to right.
1282
1283 The "=>" operator (sometimes pronounced "fat comma") is a synonym for
1284 the comma except that it causes a word on its left to be interpreted as
1285 a string if it begins with a letter or underscore and is composed only
1286 of letters, digits and underscores. This includes operands that might
1287 otherwise be interpreted as operators, constants, single number
1288 v-strings or function calls. If in doubt about this behavior, the left
1289 operand can be quoted explicitly.
1290
1291 Otherwise, the "=>" operator behaves exactly as the comma operator or
1292 list argument separator, according to context.
1293
1294 For example:
1295
1296 use constant FOO => "something";
1297
1298 my %h = ( FOO => 23 );
1299
1300 is equivalent to:
1301
1302 my %h = ("FOO", 23);
1303
1304 It is NOT:
1305
1306 my %h = ("something", 23);
1307
1308 The "=>" operator is helpful in documenting the correspondence between
1309 keys and values in hashes, and other paired elements in lists.
1310
1311 %hash = ( $key => $value );
1312 login( $username => $password );
1313
1314 The special quoting behavior ignores precedence, and hence may apply to
1315 part of the left operand:
1316
1317 print time.shift => "bbb";
1318
1319 That example prints something like "1314363215shiftbbb", because the
1320 "=>" implicitly quotes the "shift" immediately on its left, ignoring
1321 the fact that "time.shift" is the entire left operand.
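
When the value of a constant is wanted as the key, the automatic
quoting has to be defeated. The sketch below shows two common ways of
doing so, calling the constant with parentheses or using a plain comma:

```perl
use strict;
use warnings;
use constant FOO => "something";

my %quoted = ( FOO   => 23 );   # key is the string "FOO"
my %called = ( FOO() => 23 );   # parentheses defeat the quoting
my %comma  = ( FOO,     23 );   # plain comma: the constant is used
```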
1322
1323 List Operators (Rightward)
1324 On the right side of a list operator, the comma has very low
1325 precedence, such that it controls all comma-separated expressions found
1326 there. The only operators with lower precedence are the logical
1327 operators "and", "or", and "not", which may be used to evaluate calls
1328 to list operators without the need for parentheses:
1329
1330 open HANDLE, "< :encoding(UTF-8)", "filename"
1331 or die "Can't open: $!\n";
1332
1333 However, some people find that code harder to read than writing it with
1334 parentheses:
1335
1336 open(HANDLE, "< :encoding(UTF-8)", "filename")
1337 or die "Can't open: $!\n";
1338
1339 in which case you might as well just use the more customary "||"
1340 operator:
1341
1342 open(HANDLE, "< :encoding(UTF-8)", "filename")
1343 || die "Can't open: $!\n";
1344
1345 See also discussion of list operators in "Terms and List Operators
1346 (Leftward)".
1347
1348 Logical Not
1349 Unary "not" returns the logical negation of the expression to its
1350 right. It's the equivalent of "!" except for the very low precedence.
1351
1352 Logical And
1353 Binary "and" returns the logical conjunction of the two surrounding
1354 expressions. It's equivalent to "&&" except for the very low
1355 precedence. This means that it short-circuits: the right expression is
1356 evaluated only if the left expression is true.
1357
Logical Or and Exclusive Or
1359 Binary "or" returns the logical disjunction of the two surrounding
1360 expressions. It's equivalent to "||" except for the very low
1361 precedence. This makes it useful for control flow:
1362
1363 print FH $data or die "Can't write to FH: $!";
1364
This means that it short-circuits: the right expression is evaluated
only if the left expression is false. Due to its precedence, you must
be careful to avoid using it as a replacement for the "||" operator. It
usually works out better for flow control than in assignments:
1369
1370 $x = $y or $z; # bug: this is wrong
1371 ($x = $y) or $z; # really means this
1372 $x = $y || $z; # better written this way
1373
1374 However, when it's a list-context assignment and you're trying to use
1375 "||" for control flow, you probably need "or" so that the assignment
1376 takes higher precedence.
1377
1378 @info = stat($file) || die; # oops, scalar sense of stat!
1379 @info = stat($file) or die; # better, now @info gets its due
1380
1381 Then again, you could always use parentheses.
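
Both pitfalls can be reproduced without touching the filesystem. In
this sketch the subroutine context is a hypothetical stand-in for a
function such as stat that behaves differently in list and scalar
context:

```perl
use strict;
use warnings;

# Hypothetical stand-in: reports the context it was called in.
sub context { return wantarray ? "list" : "scalar" }

my ($y, $z) = (0, 5);

my $low;
$low = $y or context();        # groups as ($low = $y) or context()
my $high = $y || $z;           # "||" binds tighter than "="

my @tight = context() || die;  # "||" forces scalar context on the left
my @loose = context() or die;  # assignment happens first, in list context
```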
1382
1383 Binary "xor" returns the exclusive-OR of the two surrounding
1384 expressions. It cannot short-circuit (of course).
1385
1386 There is no low precedence operator for defined-OR.
1387
1388 C Operators Missing From Perl
1389 Here is what C has that Perl doesn't:
1390
1391 unary & Address-of operator. (But see the "\" operator for taking a
1392 reference.)
1393
1394 unary * Dereference-address operator. (Perl's prefix dereferencing
1395 operators are typed: "$", "@", "%", and "&".)
1396
1397 (TYPE) Type-casting operator.
1398
1399 Quote and Quote-like Operators
1400 While we usually think of quotes as literal values, in Perl they
1401 function as operators, providing various kinds of interpolating and
1402 pattern matching capabilities. Perl provides customary quote
1403 characters for these behaviors, but also provides a way for you to
1404 choose your quote character for any of them. In the following table, a
1405 "{}" represents any pair of delimiters you choose.
1406
    Customary  Generic        Meaning        Interpolates
        ''       q{}          Literal             no
        ""      qq{}          Literal             yes
        ``      qx{}          Command             yes*
                qw{}         Word list            no
        //       m{}       Pattern match          yes*
                qr{}          Pattern             yes*
                 s{}{}      Substitution          yes*
                tr{}{}    Transliteration         no (but see below)
                 y{}{}    Transliteration         no (but see below)
        <<EOF                 here-doc            yes*

        * unless the delimiter is ''.
1420
1421 Non-bracketing delimiters use the same character fore and aft, but the
1422 four sorts of ASCII brackets (round, angle, square, curly) all nest,
1423 which means that
1424
1425 q{foo{bar}baz}
1426
1427 is the same as
1428
1429 'foo{bar}baz'
1430
1431 Note, however, that this does not always work for quoting Perl code:
1432
1433 $s = q{ if($x eq "}") ... }; # WRONG
1434
1435 is a syntax error. The "Text::Balanced" module (standard as of v5.8,
1436 and from CPAN before then) is able to do this properly.
1437
1438 There can (and in some cases, must) be whitespace between the operator
1439 and the quoting characters, except when "#" is being used as the
1440 quoting character. "q#foo#" is parsed as the string "foo", while
1441 "q #foo#" is the operator "q" followed by a comment. Its argument will
1442 be taken from the next line. This allows you to write:
1443
1444 s {foo} # Replace foo
1445 {bar} # with bar.
1446
1447 The cases where whitespace must be used are when the quoting character
1448 is a word character (meaning it matches "/\w/"):
1449
1450 q XfooX # Works: means the string 'foo'
1451 qXfooX # WRONG!
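
A few of the generic forms in use, as a brief sketch with illustrative
variable names:

```perl
use strict;
use warnings;

my $nested = q{foo{bar}baz};     # brackets nest inside q{}
my $plain  = 'foo{bar}baz';

my $name   = "World";
my $interp = qq{Hello, $name!};  # qq{} interpolates like ""
my $raw    = q{Hello, $name!};   # q{} does not

my @words  = qw(alpha beta gamma);

my $spaced = q XfooX;            # word-character delimiter needs a space
```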
1452
1453 The following escape sequences are available in constructs that
1454 interpolate, and in transliterations whose delimiters aren't single
1455 quotes ("'"). In all the ones with braces, any number of blanks and/or
1456 tabs adjoining and within the braces are allowed (and ignored).
1457
    Sequence     Note  Description
    \t                 tab               (HT, TAB)
    \n                 newline           (NL)
    \r                 return            (CR)
    \f                 form feed         (FF)
    \b                 backspace         (BS)
    \a                 alarm (bell)      (BEL)
    \e                 escape            (ESC)
    \x{263A}     [1,8] hex char          (example shown: SMILEY)
    \x{ 263A }         Same, but shows optional blanks inside and
                       adjoining the braces
    \x1b         [2,8] restricted range hex char (example: ESC)
    \N{name}     [3]   named Unicode character or character sequence
    \N{U+263D}   [4,8] Unicode character (example: FIRST QUARTER MOON)
    \c[          [5]   control char      (example: chr(27))
    \o{23072}    [6,8] octal char        (example: SMILEY)
    \033         [7,8] restricted range octal char  (example: ESC)
1475
1476 Note that any escape sequence using braces inside interpolated
1477 constructs may have optional blanks (tab or space characters) adjoining
1478 with and inside of the braces, as illustrated above by the second
1479 "\x{ }" example.
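
Several of these sequences name the same character, which can be
confirmed by comparing each against chr(). This sketch assumes an ASCII
platform and a Perl recent enough to support "\o{}"; the hash name
%same is illustrative.

```perl
use strict;
use warnings;

# Each value is true when the escape yields the expected character.
my %same = (
    hex_braces  => "\x{263A}"   eq chr(0x263A),
    hex_short   => "\x1b"       eq chr(27),
    escape      => "\e"         eq chr(27),
    control     => "\c["        eq chr(27),
    octal_brace => "\o{23072}"  eq chr(0x263A),
    octal_short => "\033"       eq chr(27),
    code_point  => "\N{U+263D}" eq chr(0x263D),
);

my @failed = grep { !$same{$_} } sort keys %same;
```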
1480
1481 [1] The result is the character specified by the hexadecimal number
1482 between the braces. See "[8]" below for details on which
1483 character.
1484
1485 Blanks (tab or space characters) may separate the number from
1486 either or both of the braces.
1487
1488 Otherwise, only hexadecimal digits are valid between the braces.
1489 If an invalid character is encountered, a warning will be issued
1490 and the invalid character and all subsequent characters (valid or
1491 invalid) within the braces will be discarded.
1492
1493 If there are no valid digits between the braces, the generated
1494 character is the NULL character ("\x{00}"). However, an explicit
1495 empty brace ("\x{}") will not cause a warning (currently).
1496
1497 [2] The result is the character specified by the hexadecimal number in
1498 the range 0x00 to 0xFF. See "[8]" below for details on which
1499 character.
1500
1501 Only hexadecimal digits are valid following "\x". When "\x" is
1502 followed by fewer than two valid digits, any valid digits will be
1503 zero-padded. This means that "\x7" will be interpreted as "\x07",
1504 and a lone "\x" will be interpreted as "\x00". Except at the end
1505 of a string, having fewer than two valid digits will result in a
1506 warning. Note that although the warning says the illegal character
1507 is ignored, it is only ignored as part of the escape and will still
1508 be used as the subsequent character in the string. For example:
1509
    Original    Result    Warns?
    "\x7"       "\x07"    no
    "\x"        "\x00"    no
    "\x7q"      "\x07q"   yes
    "\xq"       "\x00q"   yes
1515
1516 [3] The result is the Unicode character or character sequence given by
1517 name. See charnames.
1518
1519 [4] "\N{U+hexadecimal number}" means the Unicode character whose
1520 Unicode code point is hexadecimal number.
1521
1522 [5] The character following "\c" is mapped to some other character as
1523 shown in the table:
1524
    Sequence   Value
    \c@        chr(0)
    \cA        chr(1)
    \ca        chr(1)
    \cB        chr(2)
    \cb        chr(2)
    ...
    \cZ        chr(26)
    \cz        chr(26)
    \c[        chr(27)
               # See below for chr(28)
    \c]        chr(29)
    \c^        chr(30)
    \c_        chr(31)
    \c?        chr(127) # (on ASCII platforms; see below for link to
                        #  EBCDIC discussion)
1541
In other words, the result is the character whose code point is that
of the uppercased character xor'd with 64. "\c?" is DELETE on ASCII
platforms because "ord("?") ^ 64" is 127, and "\c@" is NULL because
the ord of "@" is 64, so xor'ing it with 64 produces 0.
1546
1547 Also, "\c\X" yields " chr(28) . "X"" for any X, but cannot come at
1548 the end of a string, because the backslash would be parsed as
1549 escaping the end quote.
1550
1551 On ASCII platforms, the resulting characters from the list above
1552 are the complete set of ASCII controls. This isn't the case on
1553 EBCDIC platforms; see "OPERATOR DIFFERENCES" in perlebcdic for a
1554 full discussion of the differences between these for ASCII versus
1555 EBCDIC platforms.
1556
1557 Use of any other character following the "c" besides those listed
1558 above is discouraged, and as of Perl v5.20, the only characters
actually allowed are the printable ASCII ones, minus the left brace
"{". For any of the other allowed characters, the value is derived
by xor'ing with the seventh bit, which is 64, and a warning is
raised if enabled. Using a non-allowed character generates a fatal
error.
1564
1565 To get platform independent controls, you can use "\N{...}".
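
The xor rule for "\c" can be checked directly. The following is a minimal sketch that assumes an ASCII platform; the helper name ctrl_of is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: compute the control character for a given
# character by xor'ing the code point of its uppercase form with 64.
# Assumes an ASCII platform.
sub ctrl_of {
    my ($char) = @_;
    return chr( ord( uc $char ) ^ 64 );
}

my $ctrl_a   = ctrl_of('a');    # same as "\cA" and "\ca", chr(1)
my $ctrl_esc = ctrl_of('[');    # same as "\c[", chr(27)
my $ctrl_del = ctrl_of('?');    # same as "\c?", chr(127)

printf "%d %d %d\n", ord $ctrl_a, ord $ctrl_esc, ord $ctrl_del;
# prints: 1 27 127
```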
1566
1567 [6] The result is the character specified by the octal number between
1568 the braces. See "[8]" below for details on which character.
1569
1570 Blanks (tab or space characters) may separate the number from
1571 either or both of the braces.
1572
1573 Otherwise, if a character that isn't an octal digit is encountered,
1574 a warning is raised, and the value is based on the octal digits
1575 before it, discarding it and all following characters up to the
1576 closing brace. It is a fatal error if there are no octal digits at
1577 all.
1578
1579 [7] The result is the character specified by the three-digit octal
1580 number in the range 000 to 777 (but best to not use above 077, see
1581 next paragraph). See "[8]" below for details on which character.
1582
1583 Some contexts allow 2 or even 1 digit, but any usage without
1584 exactly three digits, the first being a zero, may give unintended
1585 results. (For example, in a regular expression it may be confused
1586 with a backreference; see "Octal escapes" in perlrebackslash.)
1587 Starting in Perl 5.14, you may use "\o{}" instead, which avoids all
1588 these problems. Otherwise, it is best to use this construct only
1589 for ordinals "\077" and below, remembering to pad to the left with
1590 zeros to make three digits. For larger ordinals, either use
1591 "\o{}", or convert to something else, such as to hex and use
1592 "\N{U+}" (which is portable between platforms with different
1593 character sets) or "\x{}" instead.
1594
1595 [8] Several constructs above specify a character by a number. That
1596 number gives the character's position in the character set encoding
1597 (indexed from 0). This is called synonymously its ordinal, code
1598 position, or code point. Perl works on platforms that have a
1599 native encoding currently of either ASCII/Latin1 or EBCDIC, each of
which allows specification of 256 characters. In general, if the
1601 number is 255 (0xFF, 0377) or below, Perl interprets this in the
1602 platform's native encoding. If the number is 256 (0x100, 0400) or
1603 above, Perl interprets it as a Unicode code point and the result is
1604 the corresponding Unicode character. For example "\x{50}" and
1605 "\o{120}" both are the number 80 in decimal, which is less than
1606 256, so the number is interpreted in the native character set
1607 encoding. In ASCII the character in the 80th position (indexed
1608 from 0) is the letter "P", and in EBCDIC it is the ampersand symbol
1609 "&". "\x{100}" and "\o{400}" are both 256 in decimal, so the
1610 number is interpreted as a Unicode code point no matter what the
1611 native encoding is. The name of the character in the 256th
position (indexed from 0) in Unicode is "LATIN CAPITAL LETTER A WITH
1613 MACRON".
1614
1615 An exception to the above rule is that "\N{U+hex number}" is always
1616 interpreted as a Unicode code point, so that "\N{U+0050}" is "P"
1617 even on EBCDIC platforms.
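
As a small illustration of the rules in [8], the sketch below compares the hex, octal, and "\N{U+...}" spellings of the same number. The expected values assume an ASCII platform; the variable names are made up for this example.

```perl
use strict;
use warnings;

# Hex and octal forms of the same number give the same character.
my $hex_p = "\x{50}";        # 0x50 == 80
my $oct_p = "\o{120}";       # 0120 == 80
my $uni_p = "\N{U+0050}";    # always a Unicode code point

# At 256 and above the number is always a Unicode code point.
my $macron = "\x{100}";      # LATIN CAPITAL LETTER A WITH MACRON

printf "%s %s %s %d\n", $hex_p, $oct_p, $uni_p, ord $macron;
# prints (on ASCII platforms): P P P 256
```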
1618
1619 NOTE: Unlike C and other languages, Perl has no "\v" escape sequence
1620 for the vertical tab (VT, which is 11 in both ASCII and EBCDIC), but
1621 you may use "\N{VT}", "\ck", "\N{U+0b}", or "\x0b". ("\v" does have
1622 meaning in regular expression patterns in Perl, see perlre.)
1623
1624 The following escape sequences are available in constructs that
1625 interpolate, but not in transliterations.
1626
    \l          lowercase next character only
    \u          titlecase (not uppercase!) next character only
    \L          lowercase all characters till \E or end of string
    \U          uppercase all characters till \E or end of string
    \F          foldcase all characters till \E or end of string
    \Q          quote (disable) pattern metacharacters till \E or
                end of string
    \E          end either case modification or quoted section
                (whichever was last seen)
1636
1637 See "quotemeta" in perlfunc for the exact definition of characters that
1638 are quoted by "\Q".
1639
1640 "\L", "\U", "\F", and "\Q" can stack, in which case you need one "\E"
1641 for each. For example:
1642
1643 say "This \Qquoting \ubusiness \Uhere isn't quite\E done yet,\E is it?";
1644 This quoting\ Business\ HERE\ ISN\'T\ QUITE\ done\ yet\, is it?
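
For comparison, here is a smaller sketch of the individual case and quoting escapes applied to an interpolated variable. The variable names and sample text are assumptions chosen for the example.

```perl
use strict;
use warnings;

my $name = "PeRL";

my $title  = "\u\L$name\E";        # titlecase first char, lowercase the rest
my $upper  = "\U$name\E rocks";    # \E ends the uppercase run
my $first  = "\l$name";            # lowercase the next character only
my $quoted = "\Qa.b*c\E";          # backslash the pattern metacharacters

print "$title|$upper|$first|$quoted\n";
# prints: Perl|PERL rocks|peRL|a\.b\*c
```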
1645
1646 If a "use locale" form that includes "LC_CTYPE" is in effect (see
1647 perllocale), the case map used by "\l", "\L", "\u", and "\U" is taken
1648 from the current locale. If Unicode (for example, "\N{}" or code
1649 points of 0x100 or beyond) is being used, the case map used by "\l",
1650 "\L", "\u", and "\U" is as defined by Unicode. That means that case-
1651 mapping a single character can sometimes produce a sequence of several
1652 characters. Under "use locale", "\F" produces the same results as "\L"
1653 for all locales but a UTF-8 one, where it instead uses the Unicode
1654 definition.
1655
1656 All systems use the virtual "\n" to represent a line terminator, called
1657 a "newline". There is no such thing as an unvarying, physical newline
1658 character. It is only an illusion that the operating system, device
1659 drivers, C libraries, and Perl all conspire to preserve. Not all
1660 systems read "\r" as ASCII CR and "\n" as ASCII LF. For example, on
1661 the ancient Macs (pre-MacOS X) of yesteryear, these used to be
1662 reversed, and on systems without a line terminator, printing "\n" might
1663 emit no actual data. In general, use "\n" when you mean a "newline"
1664 for your system, but use the literal ASCII when you need an exact
1665 character. For example, most networking protocols expect and prefer a
1666 CR+LF ("\015\012" or "\cM\cJ") for line terminators, and although they
1667 often accept just "\012", they seldom tolerate just "\015". If you get
1668 in the habit of using "\n" for networking, you may be burned some day.
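
A minimal sketch of the advice above: spell the terminator out as octal escapes when a protocol requires CR+LF. The request line is a hypothetical example, and the code points shown assume an ASCII platform.

```perl
use strict;
use warnings;

# Spell out CR+LF explicitly instead of relying on "\n".
my $CRLF = "\015\012";

# A hypothetical request line, used only for illustration.
my $request = "GET / HTTP/1.0" . $CRLF . $CRLF;

# The terminator is exactly two characters: CR (13) then LF (10).
my @codes = map { ord } split //, $CRLF;
print "@codes\n";
# prints: 13 10
```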
1669
1670 For constructs that do interpolate, variables beginning with ""$"" or
1671 ""@"" are interpolated. Subscripted variables such as $a[3] or
1672 "$href->{key}[0]" are also interpolated, as are array and hash slices.
1673 But method calls such as "$obj->meth" are not.
1674
1675 Interpolating an array or slice interpolates the elements in order,
1676 separated by the value of $", so is equivalent to interpolating
1677 "join $", @array". "Punctuation" arrays such as "@*" are usually
1678 interpolated only if the name is enclosed in braces "@{*}", but the
1679 arrays @_, "@+", and "@-" are interpolated even without braces.
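
The interpolation rules above can be sketched as follows. All variable names and values are invented for the example.

```perl
use strict;
use warnings;

my @langs = ( 'perl', 'awk', 'sed' );
my %port  = ( http => 80 );
my $href  = { key => [ 'first', 'second' ] };

my $default = "@langs";    # joined with $" (a space by default)
my $element = "$langs[0] $port{http} $href->{key}[1]";
my $slice   = "@langs[0,1] @port{'http'}";

my $custom;
{
    local $" = ', ';       # change the list separator for this block
    $custom = "@langs";
}

print "$default\n$element\n$slice\n$custom\n";
# prints:
#   perl awk sed
#   perl 80 second
#   perl awk 80
#   perl, awk, sed
```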
1680
1681 For double-quoted strings, the quoting from "\Q" is applied after
1682 interpolation and escapes are processed.
1683
1684 "abc\Qfoo\tbar$s\Exyz"
1685
1686 is equivalent to
1687
1688 "abc" . quotemeta("foo\tbar$s") . "xyz"
1689
1690 For the pattern of regex operators ("qr//", "m//" and "s///"), the
1691 quoting from "\Q" is applied after interpolation is processed, but
1692 before escapes are processed. This allows the pattern to match
1693 literally (except for "$" and "@"). For example, the following
1694 matches:
1695
1696 '\s\t' =~ /\Q\s\t/
1697
1698 Because "$" or "@" trigger interpolation, you'll need to use something
1699 like "/\Quser\E\@\Qhost/" to match them literally.
1700
1701 Patterns are subject to an additional level of interpretation as a
1702 regular expression. This is done as a second pass, after variables are
1703 interpolated, so that regular expressions may be incorporated into the
1704 pattern from the variables. If this is not what you want, use "\Q" to
1705 interpolate a variable literally.
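
As a hedged illustration of the difference "\Q" makes when a variable is interpolated into a pattern, consider the sketch below. The variable $host stands in for any user-supplied text and is an assumption of the example.

```perl
use strict;
use warnings;

my $host = 'a.b';    # hypothetical text containing a metacharacter

# Without \Q the "." is a regex wildcard, so 'axb' matches too.
my $loose  = ( 'axb' =~ /^$host$/ )     ? 1 : 0;

# With \Q the variable is matched literally.
my $quoted = ( 'axb' =~ /^\Q$host\E$/ ) ? 1 : 0;
my $exact  = ( 'a.b' =~ /^\Q$host\E$/ ) ? 1 : 0;

# In a pattern, \Q is applied before escapes are processed, so the
# characters \s\t are matched literally.
my $literal = ( '\s\t' =~ /\Q\s\t/ )    ? 1 : 0;

print "$loose $quoted $exact $literal\n";
# prints: 1 0 1 1
```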
1706
1707 Apart from the behavior described above, Perl does not expand multiple
1708 levels of interpolation. In particular, contrary to the expectations
1709 of shell programmers, back-quotes do NOT interpolate within double
1710 quotes, nor do single quotes impede evaluation of variables when used
1711 within double quotes.
1712
1713 Regexp Quote-Like Operators
1714 Here are the quote-like operators that apply to pattern matching and
1715 related activities.
1716
1717 "qr/STRING/msixpodualn"
1718 This operator quotes (and possibly compiles) its STRING as a
1719 regular expression. STRING is interpolated the same way as
1720 PATTERN in "m/PATTERN/". If "'" is used as the delimiter, no
1721 variable interpolation is done. Returns a Perl value which may
1722 be used instead of the corresponding "/STRING/msixpodualn"
1723 expression. The returned value is a normalized version of the
1724 original pattern. It magically differs from a string
1725 containing the same characters: ref(qr/x/) returns "Regexp";
1726 however, dereferencing it is not well defined (you currently
1727 get the normalized version of the original pattern, but this
1728 may change).
1729
1730 For example,
1731
    $rex = qr/my.STRING/is;
    print $rex;                 # prints (?^si:my.STRING) as of Perl 5.14;
                                # earlier versions print (?si-xm:my.STRING)
    s/$rex/foo/;
1735
1736 is equivalent to
1737
1738 s/my.STRING/foo/is;
1739
1740 The result may be used as a subpattern in a match:
1741
1742 $re = qr/$pattern/;
1743 $string =~ /foo${re}bar/; # can be interpolated in other
1744 # patterns
1745 $string =~ $re; # or used standalone
1746 $string =~ /$re/; # or this way
1747
1748 Since Perl may compile the pattern at the moment of execution
1749 of the qr() operator, using qr() may have speed advantages in
1750 some situations, notably if the result of qr() is used
1751 standalone:
1752
    sub match {
        my $patterns = shift;
        my @compiled = map qr/$_/i, @$patterns;
        grep {
            my $success = 0;
            foreach my $pat (@compiled) {
                $success = 1, last if /$pat/;
            }
            $success;
        } @_;
    }
1764
1765 Precompilation of the pattern into an internal representation
1766 at the moment of qr() avoids the need to recompile the pattern
1767 every time a match "/$pat/" is attempted. (Perl has many other
1768 internal optimizations, but none would be triggered in the
1769 above example if we did not use qr() operator.)
1770
1771 Options (specified by the following modifiers) are:
1772
    m   Treat string as multiple lines.
    s   Treat string as single line. (Make . match a newline)
    i   Do case-insensitive pattern matching.
    x   Use extended regular expressions; specifying two
        x's means \t and the SPACE character are ignored within
        square-bracketed character classes
    p   When matching preserve a copy of the matched string so
        that ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be
        defined (ignored starting in v5.20 as these are always
        defined starting in that release)
    o   Compile pattern only once.
    a   ASCII-restrict: Use ASCII for \d, \s, \w and [[:posix:]]
        character classes; specifying two a's adds the further
        restriction that no ASCII character will match a
        non-ASCII one under /i.
    l   Use the current run-time locale's rules.
    u   Use Unicode rules.
    d   Use Unicode or native charset, as in 5.12 and earlier.
    n   Non-capture mode. Don't let () fill in $1, $2, etc...
1792
1793 If a precompiled pattern is embedded in a larger pattern then
1794 the effect of "msixpluadn" will be propagated appropriately.
1795 The effect that the "/o" modifier has is not propagated, being
1796 restricted to those patterns explicitly using it.
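
A short sketch of that propagation: the precompiled pattern keeps its own "/i" when embedded, while the surrounding pattern stays case-sensitive. The sample strings are invented for the example.

```perl
use strict;
use warnings;

my $word = qr/perl/i;    # case-insensitive, precompiled

# The outer pattern has no /i, yet the embedded part keeps its own /i.
my $inner_ci = ( 'use PERL now' =~ /use ${word} now/ ) ? 1 : 0;

# The literal text of the outer pattern is still case-sensitive.
my $outer_cs = ( 'USE perl now' =~ /use ${word} now/ ) ? 1 : 0;

print "$inner_ci $outer_cs\n";
# prints: 1 0
```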
1797
1798 The "/a", "/d", "/l", and "/u" modifiers (added in Perl 5.14)
1799 control the character set rules, but "/a" is the only one you
1800 are likely to want to specify explicitly; the other three are
1801 selected automatically by various pragmas.
1802
1803 See perlre for additional information on valid syntax for
1804 STRING, and for a detailed look at the semantics of regular
1805 expressions. In particular, all modifiers except the largely
1806 obsolete "/o" are further explained in "Modifiers" in perlre.
1807 "/o" is described in the next section.
1808
1809 "m/PATTERN/msixpodualngc"
1810 "/PATTERN/msixpodualngc"
1811 Searches a string for a pattern match, and in scalar context
1812 returns true if it succeeds, false if it fails. If no string
1813 is specified via the "=~" or "!~" operator, the $_ string is
1814 searched. (The string specified with "=~" need not be an
1815 lvalue--it may be the result of an expression evaluation, but
1816 remember the "=~" binds rather tightly.) See also perlre.
1817
1818 Options are as described in "qr//" above; in addition, the
1819 following match process modifiers are available:
1820
    g   Match globally, i.e., find all occurrences.
    c   Do not reset search position on a failed match when /g is
        in effect.
1824
1825 If "/" is the delimiter then the initial "m" is optional. With
1826 the "m" you can use any pair of non-whitespace (ASCII)
1827 characters as delimiters. This is particularly useful for
1828 matching path names that contain "/", to avoid LTS (leaning
1829 toothpick syndrome). If "?" is the delimiter, then a match-
1830 only-once rule applies, described in "m?PATTERN?" below. If
1831 "'" (single quote) is the delimiter, no variable interpolation
1832 is performed on the PATTERN. When using a delimiter character
1833 valid in an identifier, whitespace is required after the "m".
1834
1835 PATTERN may contain variables, which will be interpolated every
1836 time the pattern search is evaluated, except for when the
1837 delimiter is a single quote. (Note that $(, $), and $| are not
1838 interpolated because they look like end-of-string tests.) Perl
1839 will not recompile the pattern unless an interpolated variable
1840 that it contains changes. You can force Perl to skip the test
1841 and never recompile by adding a "/o" (which stands for "once")
1842 after the trailing delimiter. Once upon a time, Perl would
1843 recompile regular expressions unnecessarily, and this modifier
1844 was useful to tell it not to do so, in the interests of speed.
1845 But now, the only reasons to use "/o" are one of:
1846
1847 1. The variables are thousands of characters long and you know
1848 that they don't change, and you need to wring out the last
1849 little bit of speed by having Perl skip testing for that.
1850 (There is a maintenance penalty for doing this, as
1851 mentioning "/o" constitutes a promise that you won't change
1852 the variables in the pattern. If you do change them, Perl
1853 won't even notice.)
1854
2. You want the pattern to use the initial values of the
1856 variables regardless of whether they change or not. (But
1857 there are saner ways of accomplishing this than using
1858 "/o".)
1859
1860 3. If the pattern contains embedded code, such as
1861
1862 use re 'eval';
1863 $code = 'foo(?{ $x })';
1864 /$code/
1865
1866 then perl will recompile each time, even though the pattern
1867 string hasn't changed, to ensure that the current value of
1868 $x is seen each time. Use "/o" if you want to avoid this.
1869
1870 The bottom line is that using "/o" is almost never a good idea.
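
To see why the "/o" promise is easy to break, the following sketch runs the same match with and without "/o" while the interpolated variable changes. The loop values are arbitrary choices for the example.

```perl
use strict;
use warnings;

my ( @with_o, @without_o );

for my $pat ( 'a', 'b' ) {
    # /o compiles the pattern on the first pass only; the second
    # pass still matches against 'a' even though $pat is now 'b'.
    push @with_o,    ( 'b' =~ /$pat/o ? 1 : 0 );

    # Without /o the current value of $pat is used each time.
    push @without_o, ( 'b' =~ /$pat/  ? 1 : 0 );
}

print "@with_o | @without_o\n";
# prints: 0 0 | 0 1
```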
1871
1872 The empty pattern "//"
1873 If the PATTERN evaluates to the empty string, the last
1874 successfully matched regular expression is used instead. In
1875 this case, only the "g" and "c" flags on the empty pattern are
1876 honored; the other flags are taken from the original pattern.
1877 If no match has previously succeeded, this will (silently) act
1878 instead as a genuine empty pattern (which will always match).
Using a user-supplied string as a pattern carries the risk that
an empty string triggers the "last successful match" behavior,
which can be very confusing. In such cases it is recommended
that you replace "m/$pattern/" with "m/(?:$pattern)/"
1883 to avoid this behavior.
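
The pitfall and the workaround can be sketched as follows. The variable $pattern stands in for user input and is an assumption of the example.

```perl
use strict;
use warnings;

my $pattern = '';    # hypothetical user-supplied pattern that is empty

# A successful match makes /b/ the "last successful pattern".
'abc' =~ /b/ or die "setup match failed";

# The empty pattern silently reuses /b/, so this does NOT match 'xyz'.
my $reused = ( 'xyz' =~ /$pattern/ )     ? 1 : 0;

# Wrapping in (?:...) keeps the pattern non-empty, so it matches.
my $safe   = ( 'xyz' =~ /(?:$pattern)/ ) ? 1 : 0;

print "$reused $safe\n";
# prints: 0 1
```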
1884
1885 The last successful pattern may be accessed as a variable via
1886 "${^LAST_SUCCESSFUL_PATTERN}". Matching against it, or the
1887 empty pattern should have the same effect, with the exception
1888 that when there is no last successful pattern the empty pattern
1889 will silently match, whereas using the
1890 "${^LAST_SUCCESSFUL_PATTERN}" variable will produce undefined
1891 warnings (if warnings are enabled). You can check
1892 defined(${^LAST_SUCCESSFUL_PATTERN}) to test if there is a
1893 "last successful match" in the current scope.
1894
1895 Note that it's possible to confuse Perl into thinking "//" (the
1896 empty regex) is really "//" (the defined-or operator). Perl is
1897 usually pretty good about this, but some pathological cases
1898 might trigger this, such as "$x///" (is that "($x) / (//)" or
1899 "$x // /"?) and "print $fh //" ("print $fh(//" or
1900 "print($fh //"?). In all of these examples, Perl will assume
1901 you meant defined-or. If you meant the empty regex, just use
1902 parentheses or spaces to disambiguate, or even prefix the empty
1903 regex with an "m" (so "//" becomes "m//").
1904
1905 Matching in list context
1906 If the "/g" option is not used, "m//" in list context returns a
1907 list consisting of the subexpressions matched by the
1908 parentheses in the pattern, that is, ($1, $2, $3...) (Note
1909 that here $1 etc. are also set). When there are no parentheses
1910 in the pattern, the return value is the list "(1)" for success.
1911 With or without parentheses, an empty list is returned upon
1912 failure.
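
The three list-context outcomes described above can be sketched like this. The sample line and variable names are made up for the example.

```perl
use strict;
use warnings;

my $line = 'Version: 5.36';

# With parentheses: the captured substrings are returned.
my ( $major, $minor ) = $line =~ /Version: (\d+)\.(\d+)/;

# Without parentheses: the list (1) is returned on success.
my @plain = $line =~ /Version/;

# On failure: an empty list, with or without parentheses.
my @failed = $line =~ /Release: (\d+)/;

printf "%s %s [%s] %d\n", $major, $minor, "@plain", scalar @failed;
# prints: 5 36 [1] 0
```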
1913
1914 Examples:
1915
1916 open(TTY, "+</dev/tty")
1917 || die "can't access /dev/tty: $!";
1918
1919 <TTY> =~ /^y/i && foo(); # do foo if desired
1920
1921 if (/Version: *([0-9.]*)/) { $version = $1; }
1922
1923 next if m#^/usr/spool/uucp#;
1924
1925 # poor man's grep
1926 $arg = shift;
1927 while (<>) {
1928 print if /$arg/o; # compile only once (no longer needed!)
1929 }
1930
1931 if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))
1932
1933 This last example splits $foo into the first two words and the
1934 remainder of the line, and assigns those three fields to $F1,
1935 $F2, and $Etc. The conditional is true if any variables were
1936 assigned; that is, if the pattern matched.
1937
1938 The "/g" modifier specifies global pattern matching--that is,
1939 matching as many times as possible within the string. How it
1940 behaves depends on the context. In list context, it returns a
1941 list of the substrings matched by any capturing parentheses in
1942 the regular expression. If there are no parentheses, it
1943 returns a list of all the matched strings, as if there were
1944 parentheses around the whole pattern.
1945
1946 In scalar context, each execution of "m//g" finds the next
1947 match, returning true if it matches, and false if there is no
1948 further match. The position after the last match can be read
1949 or set using the pos() function; see "pos" in perlfunc. A
1950 failed match normally resets the search position to the
1951 beginning of the string, but you can avoid that by adding the
1952 "/c" modifier (for example, "m//gc"). Modifying the target
1953 string also resets the search position.
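
A minimal sketch of "/g" in both contexts, using an invented sample string:

```perl
use strict;
use warnings;

my $text = 'a1b22c333';

# List context: every match is returned at once.
my @all = $text =~ /\d+/g;

# Scalar context: one match per execution; pos() reports where
# the most recent match ended.
my @ends;
while ( $text =~ /\d+/g ) {
    push @ends, pos $text;
}

print "@all | @ends\n";
# prints: 1 22 333 | 2 5 9
```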
1954
1955 "\G assertion"
1956 You can intermix "m//g" matches with "m/\G.../g", where "\G" is
1957 a zero-width assertion that matches the exact position where
1958 the previous "m//g", if any, left off. Without the "/g"
1959 modifier, the "\G" assertion still anchors at pos() as it was
1960 at the start of the operation (see "pos" in perlfunc), but the
1961 match is of course only attempted once. Using "\G" without
1962 "/g" on a target string that has not previously had a "/g"
1963 match applied to it is the same as using the "\A" assertion to
1964 match the beginning of the string. Note also that, currently,
1965 "\G" is only properly supported when anchored at the very
1966 beginning of the pattern.
1967
1968 Examples:
1969
    # list context
    ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);

    # scalar context
    local $/ = "";
    while ($paragraph = <>) {
        while ($paragraph =~ /\p{Ll}['")]*[.!?]+['")]*\s/g) {
            $sentences++;
        }
    }
    say $sentences;
1981
1982 Here's another way to check for sentences in a paragraph:
1983
    my $sentence_rx = qr{
        (?: (?<= ^ ) | (?<= \s ) )  # after start-of-string or
                                    # whitespace
        \p{Lu}                      # capital letter
        .*?                         # a bunch of anything
        (?<= \S )                   # that ends in non-whitespace
        (?<! \b [DMS]r )            # but isn't a common abbr.
        (?<! \b Mrs )
        (?<! \b Sra )
        (?<! \b St )
        [.?!]                       # followed by a sentence
                                    # ender
        (?= $ | \s )                # in front of end-of-string
                                    # or whitespace
    }sx;
    local $/ = "";
    while (my $paragraph = <>) {
        say "NEW PARAGRAPH";
        my $count = 0;
        while ($paragraph =~ /($sentence_rx)/g) {
            printf "\tgot sentence %d: <%s>\n", ++$count, $1;
        }
    }
2008
2009 Here's how to use "m//gc" with "\G":
2010
    $_ = "ppooqppqq";
    while ($i++ < 2) {
        print "1: '";
        print $1 while /(o)/gc; print "', pos=", pos, "\n";
        print "2: '";
        print $1 if /\G(q)/gc;  print "', pos=", pos, "\n";
        print "3: '";
        print $1 while /(p)/gc; print "', pos=", pos, "\n";
    }
    print "Final: '$1', pos=",pos,"\n" if /\G(.)/;
2021
2022 The last example should print:
2023
2024 1: 'oo', pos=4
2025 2: 'q', pos=5
2026 3: 'pp', pos=7
2027 1: '', pos=7
2028 2: 'q', pos=8
2029 3: '', pos=8
2030 Final: 'q', pos=8
2031
2032 Notice that the final match matched "q" instead of "p", which a
2033 match without the "\G" anchor would have done. Also note that
2034 the final match did not update "pos". "pos" is only updated on
2035 a "/g" match. If the final match did indeed match "p", it's a
2036 good bet that you're running an ancient (pre-5.6.0) version of
2037 Perl.
2038
2039 A useful idiom for "lex"-like scanners is "/\G.../gc". You can
2040 combine several regexps like this to process a string part-by-
2041 part, doing different actions depending on which regexp
2042 matched. Each regexp tries to match where the previous one
2043 leaves off.
2044
2045 $_ = <<'EOL';
2046 $url = URI::URL->new( "http://example.com/" );
2047 die if $url eq "xXx";
2048 EOL
2049
    LOOP: {
        print(" digits"),       redo LOOP if /\G\d+\b[,.;]?\s*/gc;
        print(" lowercase"),    redo LOOP
                                        if /\G\p{Ll}+\b[,.;]?\s*/gc;
        print(" UPPERCASE"),    redo LOOP
                                        if /\G\p{Lu}+\b[,.;]?\s*/gc;
        print(" Capitalized"),  redo LOOP
                                  if /\G\p{Lu}\p{Ll}+\b[,.;]?\s*/gc;
        print(" MiXeD"),        redo LOOP if /\G\pL+\b[,.;]?\s*/gc;
        print(" alphanumeric"), redo LOOP
                                if /\G[\p{Alpha}\pN]+\b[,.;]?\s*/gc;
        print(" line-noise"),   redo LOOP if /\G\W+/gc;
        print ". That's all!\n";
    }
2064
2065 Here is the output (split into several lines):
2066
2067 line-noise lowercase line-noise UPPERCASE line-noise UPPERCASE
2068 line-noise lowercase line-noise lowercase line-noise lowercase
2069 lowercase line-noise lowercase lowercase line-noise lowercase
2070 lowercase line-noise MiXeD line-noise. That's all!
2071
2072 "m?PATTERN?msixpodualngc"
2073 This is just like the "m/PATTERN/" search, except that it
2074 matches only once between calls to the reset() operator. This
2075 is a useful optimization when you want to see only the first
2076 occurrence of something in each file of a set of files, for
2077 instance. Only "m??" patterns local to the current package
2078 are reset.
2079
    while (<>) {
        if (m?^$?) {
                            # blank line between header and body
        }
    } continue {
        reset if eof;       # clear m?? status for next file
    }
2087
Another example switches the first "latin1" encoding it finds
to "utf8" in a pod file:
2090
2091 s//utf8/ if m? ^ =encoding \h+ \K latin1 ?x;
2092
2093 The match-once behavior is controlled by the match delimiter
2094 being "?"; with any other delimiter this is the normal "m//"
2095 operator.
2096
2097 In the past, the leading "m" in "m?PATTERN?" was optional, but
2098 omitting it would produce a deprecation warning. As of
2099 v5.22.0, omitting it produces a syntax error. If you encounter
2100 this construct in older code, you can just add "m".
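
The match-once behavior and the effect of reset() can be sketched without reading files. The helper first_alpha and the sample list are hypothetical, introduced only for this illustration.

```perl
use strict;
use warnings;

my @lines = ( 'alpha', 'beta', 'alpha', 'alpha' );

# Hypothetical helper: count how many lines the m?? pattern accepts.
sub first_alpha {
    my @hits;
    for my $line (@_) {
        push @hits, $line if $line =~ m?^alpha?;
    }
    return scalar @hits;
}

my $first  = first_alpha(@lines);    # only the first 'alpha' counts
my $second = first_alpha(@lines);    # the pattern is already used up
reset;                               # clear m?? status in this package
my $third  = first_alpha(@lines);    # matches once again

print "$first $second $third\n";
# prints: 1 0 1
```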
2101
2102 "s/PATTERN/REPLACEMENT/msixpodualngcer"
2103 Searches a string for a pattern, and if found, replaces that
2104 pattern with the replacement text and returns the number of
2105 substitutions made. Otherwise it returns false (a value that
2106 is both an empty string ("") and numeric zero (0) as described
2107 in "Relational Operators").
2108
2109 If the "/r" (non-destructive) option is used then it runs the
2110 substitution on a copy of the string and instead of returning
2111 the number of substitutions, it returns the copy whether or not
2112 a substitution occurred. The original string is never changed
2113 when "/r" is used. The copy will always be a plain string,
2114 even if the input is an object or a tied variable.
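
A brief sketch contrasting the two return conventions, with sample text invented for the example:

```perl
use strict;
use warnings;

my $original = 'this and this';

# Without /r: the target is modified and the count is returned.
my $changed = $original;
my $count   = ( $changed =~ s/this/that/g );

# With /r: the modified copy is returned and the source is untouched.
my $copy      = $original =~ s/this/that/gr;
my $unchanged = $original =~ s/absent/x/r;    # no match: copy of the input

print "$count | $changed | $copy | $unchanged | $original\n";
# prints: 2 | that and that | that and that | this and this | this and this
```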
2115
2116 If no string is specified via the "=~" or "!~" operator, the $_
2117 variable is searched and modified. Unless the "/r" option is
2118 used, the string specified must be a scalar variable, an array
2119 element, a hash element, or an assignment to one of those; that
2120 is, some sort of scalar lvalue.
2121
2122 If the delimiter chosen is a single quote, no variable
2123 interpolation is done on either the PATTERN or the REPLACEMENT.
2124 Otherwise, if the PATTERN contains a "$" that looks like a
2125 variable rather than an end-of-string test, the variable will
2126 be interpolated into the pattern at run-time. If you want the
2127 pattern compiled only once the first time the variable is
2128 interpolated, use the "/o" option. If the pattern evaluates to
2129 the empty string, the last successfully executed regular
2130 expression is used instead. See perlre for further explanation
2131 on these.
2132
2133 Options are as with "m//" with the addition of the following
2134 replacement specific options:
2135
    e   Evaluate the right side as an expression.
    ee  Evaluate the right side as a string then eval the
        result.
    r   Return substitution and leave the original string
        untouched.
2141
2142 Any non-whitespace delimiter may replace the slashes. Add
2143 space after the "s" when using a character allowed in
2144 identifiers. If single quotes are used, no interpretation is
2145 done on the replacement string (the "/e" modifier overrides
2146 this, however). Note that Perl treats backticks as normal
2147 delimiters; the replacement text is not evaluated as a command.
2148 If the PATTERN is delimited by bracketing quotes, the
2149 REPLACEMENT has its own pair of quotes, which may or may not be
2150 bracketing quotes, for example, "s(foo)(bar)" or "s<foo>/bar/".
2151 A "/e" will cause the replacement portion to be treated as a
2152 full-fledged Perl expression and evaluated right then and
2153 there. It is, however, syntax checked at compile-time. A
2154 second "e" modifier will cause the replacement portion to be
2155 "eval"ed before being run as a Perl expression.
2156
2157 Examples:
2158
2159 s/\bgreen\b/mauve/g; # don't change wintergreen
2160
2161 $path =~ s|/usr/bin|/usr/local/bin|;
2162
2163 s/Login: $foo/Login: $bar/; # run-time pattern
2164
2165 ($foo = $bar) =~ s/this/that/; # copy first, then
2166 # change
2167 ($foo = "$bar") =~ s/this/that/; # convert to string,
2168 # copy, then change
2169 $foo = $bar =~ s/this/that/r; # Same as above using /r
2170 $foo = $bar =~ s/this/that/r
2171 =~ s/that/the other/r; # Chained substitutes
2172 # using /r
    @foo = map { s/this/that/r } @bar;  # /r is very useful in
                                        # maps
2175
2176 $count = ($paragraph =~ s/Mister\b/Mr./g); # get change-cnt
2177
    $_ = 'abc123xyz';
    s/\d+/$&*2/e;               # yields 'abc246xyz'
    s/\d+/sprintf("%5d",$&)/e;  # yields 'abc  246xyz'
    s/\w/$& x 2/eg;             # yields 'aabbcc  224466xxyyzz'
2182
2183 s/%(.)/$percent{$1}/g; # change percent escapes; no /e
2184 s/%(.)/$percent{$1} || $&/ge; # expr now, so /e
2185 s/^=(\w+)/pod($1)/ge; # use function call
2186
2187 $_ = 'abc123xyz';
2188 $x = s/abc/def/r; # $x is 'def123xyz' and
2189 # $_ remains 'abc123xyz'.
2190
2191 # expand variables in $_, but dynamics only, using
2192 # symbolic dereferencing
2193 s/\$(\w+)/${$1}/g;
2194
2195 # Add one to the value of any numbers in the string
2196 s/(\d+)/1 + $1/eg;
2197
2198 # Titlecase words in the last 30 characters only (presuming
2199 # that the substring doesn't start in the middle of a word)
2200 substr($str, -30) =~ s/\b(\p{Alpha})(\p{Alpha}*)\b/\u$1\L$2/g;
2201
2202 # This will expand any embedded scalar variable
2203 # (including lexicals) in $_ : First $1 is interpolated
2204 # to the variable name, and then evaluated
2205 s/(\$\w+)/$1/eeg;
2206
2207 # Delete (most) C comments.
    $program =~ s {
        /\*     # Match the opening delimiter.
        .*?     # Match a minimal number of characters.
        \*/     # Match the closing delimiter.
    } []gsx;
2213
2214 s/^\s*(.*?)\s*$/$1/; # trim whitespace in $_,
2215 # expensively
2216
    for ($variable) {           # trim whitespace in $variable,
                                # cheap
        s/^\s+//;
        s/\s+$//;
    }
2222
2223 s/([^ ]*) *([^ ]*)/$2 $1/; # reverse 1st two fields
2224
2225 $foo !~ s/A/a/g; # Lowercase all A's in $foo; return
2226 # 0 if any were found and changed;
2227 # otherwise return 1
2228
Note the use of "$" instead of "\" in the replacement part of the
field-reversing example. Unlike sed, we use the \<digit> form only in
the left hand side. Anywhere else it's $<digit>.
2232
2233 Occasionally, you can't use just a "/g" to get all the changes
2234 to occur that you might want. Here are two common cases:
2235
2236 # put commas in the right places in an integer
2237 1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g;
2238
2239 # expand tabs to 8-column spacing
2240 1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
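
To make the repeated passes concrete, here is a sketch that applies both loops to small inputs bound with "=~". The sample values are assumptions chosen for the example.

```perl
use strict;
use warnings;

# Commas: for a single number, each pass inserts one comma,
# working from the right.
my $number = '1234567';
1 while $number =~ s/(\d)(\d\d\d)(?!\d)/$1,$2/g;

# Tabs: each pass expands the leftmost run of tabs, so length($`)
# already reflects the expansion done by earlier passes.
my $row = "a\tbc\td";
1 while $row =~ s/\t+/' ' x (length($&)*8 - length($`)%8)/e;

print "$number\n";          # prints: 1,234,567
print length($row), "\n";   # prints: 17
```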
2241
2242 While "s///" accepts the "/c" flag, it has no effect beyond
2243 producing a warning if warnings are enabled.
2244
2245 Quote-Like Operators
2246 "q/STRING/"
2247 'STRING'
2248 A single-quoted, literal string. A backslash represents a
2249 backslash unless followed by the delimiter or another backslash, in
2250 which case the delimiter or backslash is interpolated.
2251
2252 $foo = q!I said, "You said, 'She said it.'"!;
2253 $bar = q('This is it.');
2254 $baz = '\n'; # a two-character string
2255
2256 "qq/STRING/"
2257 "STRING"
2258 A double-quoted, interpolated string.
2259
2260 $_ .= qq
2261 (*** The previous line contains the naughty word "$1".\n)
2262 if /\b(tcl|java|python)\b/i; # :-)
2263 $baz = "\n"; # a one-character string
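
The difference between the two quoting styles can be checked by
comparing lengths. This is a minimal sketch; the variable names are
illustrative only.

```perl
use strict;
use warnings;

my $single  = '\n';          # backslash followed by "n"
my $double  = "\n";          # a single newline character
my $escaped = q(a\)b\\c);    # \) yields ")", \\ yields one backslash

printf "%d %d %s\n", length($single), length($double), $escaped;
```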
2264
2265 "qx/STRING/"
2266 `STRING`
2267 A string which is (possibly) interpolated and then executed as a
2268 system command, via /bin/sh or its equivalent if required. Shell
2269 wildcards, pipes, and redirections will be honored. Similarly to
2270 "system", if the string contains no shell metacharacters then it
will be executed directly. The collected standard output of the
2272 command is returned; standard error is unaffected. In scalar
2273 context, it comes back as a single (potentially multi-line) string,
2274 or "undef" if the shell (or command) could not be started. In list
2275 context, returns a list of lines (however you've defined lines with
2276 $/ or $INPUT_RECORD_SEPARATOR), or an empty list if the shell (or
2277 command) could not be started.
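
The effect of context can be seen in a short sketch. It assumes a
POSIX-style shell that provides "echo" and the ";" separator, and the
default value of $/.

```perl
use strict;
use warnings;

# Assumes a POSIX-style shell that provides "echo" and ";".
my $text  = `echo one; echo two`;   # scalar context: one string
my @lines = `echo one; echo two`;   # list context: one element per line

print "scalar: ", length($text), " characters\n";
print "list:   ", scalar(@lines), " lines\n";
```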
2278
2279 Because backticks do not affect standard error, use shell file
2280 descriptor syntax (assuming the shell supports this) if you care to
2281 address this. To capture a command's STDERR and STDOUT together:
2282
2283 $output = `cmd 2>&1`;
2284
2285 To capture a command's STDOUT but discard its STDERR:
2286
2287 $output = `cmd 2>/dev/null`;
2288
2289 To capture a command's STDERR but discard its STDOUT (ordering is
2290 important here):
2291
2292 $output = `cmd 2>&1 1>/dev/null`;
2293
2294 To exchange a command's STDOUT and STDERR in order to capture the
2295 STDERR but leave its STDOUT to come out the old STDERR:
2296
2297 $output = `cmd 3>&1 1>&2 2>&3 3>&-`;
2298
2299 To read both a command's STDOUT and its STDERR separately, it's
2300 easiest to redirect them separately to files, and then read from
2301 those files when the program is done:
2302
2303 system("program args 1>program.stdout 2>program.stderr");
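
The redirections above can be tried with a stand-in command that
writes one line to each stream. This is a sketch under assumptions:
the "sh -c" command line is hypothetical, and a POSIX-style shell and
a /dev/null device are required.

```perl
use strict;
use warnings;

# Hypothetical command that writes "out" to STDOUT and "err" to STDERR.
# Assumes a POSIX-style shell and a /dev/null device.
my $cmd = q{sh -c 'echo out; echo err 1>&2'};

my $stdout_only = `$cmd 2>/dev/null`;        # STDERR discarded
my $stderr_only = `$cmd 2>&1 1>/dev/null`;   # STDOUT discarded
my $both        = `$cmd 2>&1`;               # both streams merged

print "stdout only: $stdout_only";
print "stderr only: $stderr_only";
```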
2304
2305 The STDIN filehandle used by the command is inherited from Perl's
2306 STDIN. For example:
2307
2308 open(SPLAT, "stuff") || die "can't open stuff: $!";
2309 open(STDIN, "<&SPLAT") || die "can't dupe SPLAT: $!";
2310 print STDOUT `sort`;
2311
2312 will print the sorted contents of the file named "stuff".
2313
2314 Using single-quote as a delimiter protects the command from Perl's
2315 double-quote interpolation, passing it on to the shell instead:
2316
2317 $perl_info = qx(ps $$); # that's Perl's $$
2318 $shell_info = qx'ps $$'; # that's the new shell's $$
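
A variation on the same idea, using an ordinary variable rather than
$$, may make the difference easier to observe. GREETING is an
illustrative name, and a POSIX-style shell is assumed.

```perl
use strict;
use warnings;

# Assumes a POSIX-style shell. GREETING is an illustrative name.
local $ENV{GREETING} = "from the shell";
my $GREETING = "from Perl";

my $perl_side  = qx(echo $GREETING);   # Perl interpolates its own $GREETING
my $shell_side = qx'echo $GREETING';   # the shell expands the environment variable

print $perl_side, $shell_side;
```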
2319
2320 How that string gets evaluated is entirely subject to the command
2321 interpreter on your system. On most platforms, you will have to
2322 protect shell metacharacters if you want them treated literally.
2323 This is in practice difficult to do, as it's unclear how to escape
2324 which characters. See perlsec for a clean and safe example of a
2325 manual fork() and exec() to emulate backticks safely.
2326
2327 On some platforms (notably DOS-like ones), the shell may not be
2328 capable of dealing with multiline commands, so putting newlines in
2329 the string may not get you what you want. You may be able to
2330 evaluate multiple commands in a single line by separating them with
2331 the command separator character, if your shell supports that (for
2332 example, ";" on many Unix shells and "&" on the Windows NT "cmd"
2333 shell).
2334
2335 Perl will attempt to flush all files opened for output before
2336 starting the child process, but this may not be supported on some
2337 platforms (see perlport). To be safe, you may need to set $|
2338 ($AUTOFLUSH in "English") or call the autoflush() method of
2339 "IO::Handle" on any open handles.
2340
2341 Beware that some command shells may place restrictions on the
2342 length of the command line. You must ensure your strings don't
2343 exceed this limit after any necessary interpolations. See the
2344 platform-specific release notes for more details about your
2345 particular environment.
2346
2347 Using this operator can lead to programs that are difficult to
2348 port, because the shell commands called vary between systems, and
2349 may in fact not be present at all. As one example, the "type"
2350 command under the POSIX shell is very different from the "type"
2351 command under DOS. That doesn't mean you should go out of your way
2352 to avoid backticks when they're the right way to get something
2353 done. Perl was made to be a glue language, and one of the things
2354 it glues together is commands. Just understand what you're getting
2355 yourself into.
2356
2357 Like "system", backticks put the child process exit code in $?. If
2358 you'd like to manually inspect failure, you can check all possible
2359 failure modes by inspecting $? like this:
2360
2361 if ($? == -1) {
2362 print "failed to execute: $!\n";
2363 }
2364 elsif ($? & 127) {
2365 printf "child died with signal %d, %s coredump\n",
2366 ($? & 127), ($? & 128) ? 'with' : 'without';
2367 }
2368 else {
2369 printf "child exited with value %d\n", $? >> 8;
2370 }
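
The same decoding applies after backticks. In this sketch, "exit 3"
stands in for a failing command and a POSIX-style shell is assumed.

```perl
use strict;
use warnings;

# Assumes a POSIX-style shell; "exit 3" stands in for a failing command.
my $output = `sh -c 'exit 3'`;
my $status = $?;                 # save it before anything else resets $?

my $exit_code = $status >> 8;    # high byte: the exit value
my $signal    = $status & 127;   # low 7 bits: terminating signal, if any
print "exit=$exit_code signal=$signal\n";   # prints "exit=3 signal=0"
```

Copying $? into a lexical right away is a defensive habit, since any
later "system", backtick, or "wait" call overwrites it.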
2371
2372 Use the open pragma to control the I/O layers used when reading the
2373 output of the command, for example:
2374
2375 use open IN => ":encoding(UTF-8)";
2376 my $x = `cmd-producing-utf-8`;
2377
2378 "qx//" can also be called like a function with "readpipe" in
2379 perlfunc.
2380
2381 See "I/O Operators" for more discussion.
2382
2383 "qw/STRING/"
2384 Evaluates to a list of the words extracted out of STRING, using
2385 embedded whitespace as the word delimiters. It can be understood
2386 as being roughly equivalent to:
2387
2388 split(" ", q/STRING/);
2389
2390 the differences being that it only splits on ASCII whitespace,
2391 generates a real list at compile time, and in scalar context it
2392 returns the last element in the list. So this expression:
2393
2394 qw(foo bar baz)
2395
2396 is semantically equivalent to the list:
2397
2398 "foo", "bar", "baz"
2399
2400 Some frequently seen examples:
2401
use POSIX qw( setlocale localeconv );
2403 @EXPORT = qw( foo bar baz );
2404
2405 A common mistake is to try to separate the words with commas or to
2406 put comments into a multi-line "qw"-string. For this reason, the
2407 "use warnings" pragma and the -w switch (that is, the $^W variable)
produce warnings if the STRING contains the "," or the "#"
2409 character.
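
The comma mistake can be reproduced in a few lines. In this sketch the
"qw" warning category is switched off only so that the result can be
inspected quietly.

```perl
use strict;
use warnings;

my @words = qw(foo bar baz);   # three elements

# Commas are not separators inside qw(); they become part of the words.
# The 'qw' warning category is disabled here only to show the result.
my @oops = do { no warnings 'qw'; qw(a, b, c) };

print scalar(@words), " words; first bad element is '$oops[0]'\n";
```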
2410
2411 "tr/SEARCHLIST/REPLACEMENTLIST/cdsr"
2412 "y/SEARCHLIST/REPLACEMENTLIST/cdsr"
2413 Transliterates all occurrences of the characters found (or not
2414 found if the "/c" modifier is specified) in the search list with
2415 the positionally corresponding character in the replacement list,
2416 possibly deleting some, depending on the modifiers specified. It
2417 returns the number of characters replaced or deleted. If no string
2418 is specified via the "=~" or "!~" operator, the $_ string is
2419 transliterated.
2420
2421 For sed devotees, "y" is provided as a synonym for "tr".
2422
2423 If the "/r" (non-destructive) option is present, a new copy of the
2424 string is made and its characters transliterated, and this copy is
2425 returned no matter whether it was modified or not: the original
2426 string is always left unchanged. The new copy is always a plain
2427 string, even if the input string is an object or a tied variable.
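
A minimal sketch of the difference between the "/r" return value and
the usual count (requires a Perl that supports "/r" on "tr"):

```perl
use strict;
use warnings;

my $host  = "example";
my $upper = $host =~ tr/a-z/A-Z/r;   # new string; $host is untouched
my $count = $host =~ tr/a-z//;       # without /r the return value is a count

print "$host $upper $count\n";       # prints "example EXAMPLE 7"
```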
2428
2429 Unless the "/r" option is used, the string specified with "=~" must
2430 be a scalar variable, an array element, a hash element, or an
2431 assignment to one of those; in other words, an lvalue.
2432
The characters delimiting SEARCHLIST and REPLACEMENTLIST can be
2434 any printable character, not just forward slashes. If they are
2435 single quotes ("tr'SEARCHLIST'REPLACEMENTLIST'"), the only
2436 interpolation is removal of "\" from pairs of "\\"; so hyphens are
2437 interpreted literally rather than specifying a character range.
2438
2439 Otherwise, a character range may be specified with a hyphen, so
2440 "tr/A-J/0-9/" does the same replacement as
2441 "tr/ACEGIBDFHJ/0246813579/".
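
The effect of the delimiter on hyphens can be seen side by side. The
variable names in this sketch are illustrative.

```perl
use strict;
use warnings;

my $ranged  = "a-c b";
my $literal = "a-c b";

$ranged  =~ tr/a-c/xyz/;   # a-c is the range a, b, c
$literal =~ tr'a-c'xyz';   # a, -, c are three literal characters

print "$ranged | $literal\n";   # prints "x-z y | xyz b"
```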
2442
2443 If the SEARCHLIST is delimited by bracketing quotes, the
2444 REPLACEMENTLIST must have its own pair of quotes, which may or may
2445 not be bracketing quotes; for example, "tr(aeiouy)(yuoiea)" or
2446 "tr[+\-*/]"ABCD"". This final example shows a way to visually
2447 clarify what is going on for people who are more familiar with
2448 regular expression patterns than with "tr", and who may think
2449 forward slash delimiters imply that "tr" is more like a regular
2450 expression pattern than it actually is. (Another option might be
2451 to use "tr[...][...]".)
2452
"tr" isn't fully like bracketed character classes, just
(significantly) more like them than it is like full patterns. For
2455 example, characters appearing more than once in either list behave
2456 differently here than in patterns, and "tr" lists do not allow
2457 backslashed character classes such as "\d" or "\pL", nor variable
2458 interpolation, so "$" and "@" are always treated as literals.
2459
2460 The allowed elements are literals plus "\'" (meaning a single
2461 quote). If the delimiters aren't single quotes, also allowed are
2462 any of the escape sequences accepted in double-quoted strings.
2463 Escape sequence details are in the table near the beginning of this
2464 section.
2465
A hyphen at the beginning or end, or preceded by a backslash, is
also always considered a literal. Precede a delimiter character
2468 with a backslash to allow it.
2469
2470 The "tr" operator is not equivalent to the tr(1) utility.
2471 "tr[a-z][A-Z]" will uppercase the 26 letters "a" through "z", but
2472 for case changing not confined to ASCII, use "lc", "uc", "lcfirst",
2473 "ucfirst" (all documented in perlfunc), or the substitution
2474 operator "s/PATTERN/REPLACEMENT/" (with "\U", "\u", "\L", and "\l"
2475 string-interpolation escapes in the REPLACEMENT portion).
2476
2477 Most ranges are unportable between character sets, but certain ones
2478 signal Perl to do special handling to make them portable. There
2479 are two classes of portable ranges. The first are any subsets of
2480 the ranges "A-Z", "a-z", and "0-9", when expressed as literal
2481 characters.
2482
2483 tr/h-k/H-K/
2484
2485 capitalizes the letters "h", "i", "j", and "k" and nothing else, no
2486 matter what the platform's character set is. In contrast, all of
2487
2488 tr/\x68-\x6B/\x48-\x4B/
2489 tr/h-\x6B/H-\x4B/
2490 tr/\x68-k/\x48-K/
2491
2492 do the same capitalizations as the previous example when run on
2493 ASCII platforms, but something completely different on EBCDIC ones.
2494
2495 The second class of portable ranges is invoked when one or both of
the range's end points are expressed as "\N{...}".
2497
2498 $string =~ tr/\N{U+20}-\N{U+7E}//d;
2499
2500 removes from $string all the platform's characters which are
2501 equivalent to any of Unicode U+0020, U+0021, ... U+007D, U+007E.
2502 This is a portable range, and has the same effect on every platform
2503 it is run on. In this example, these are the ASCII printable
2504 characters. So after this is run, $string has only controls and
2505 characters which have no ASCII equivalents.
2506
2507 But, even for portable ranges, it is not generally obvious what is
2508 included without having to look things up in the manual. A sound
2509 principle is to use only ranges that both begin from, and end at,
2510 either ASCII alphabetics of equal case ("b-e", "B-E"), or digits
2511 ("1-4"). Anything else is unclear (and unportable unless "\N{...}"
2512 is used). If in doubt, spell out the character sets in full.
2513
2514 Options:
2515
2516 c Complement the SEARCHLIST.
2517 d Delete found but unreplaced characters.
2518 r Return the modified string and leave the original string
2519 untouched.
2520 s Squash duplicate replaced characters.
2521
2522 If the "/d" modifier is specified, any characters specified by
2523 SEARCHLIST not found in REPLACEMENTLIST are deleted. (Note that
2524 this is slightly more flexible than the behavior of some tr
2525 programs, which delete anything they find in the SEARCHLIST,
2526 period.)
2527
2528 If the "/s" modifier is specified, sequences of characters, all in
2529 a row, that were transliterated to the same character are squashed
2530 down to a single instance of that character.
2531
2532 my $a = "aaabbbca";
2533 $a =~ tr/ab/dd/s; # $a now is "dcd"
2534
2535 If the "/d" modifier is used, the REPLACEMENTLIST is always
2536 interpreted exactly as specified. Otherwise, if the
2537 REPLACEMENTLIST is shorter than the SEARCHLIST, the final
2538 character, if any, is replicated until it is long enough. There
2539 won't be a final character if and only if the REPLACEMENTLIST is
2540 empty, in which case REPLACEMENTLIST is copied from SEARCHLIST.
2541 An empty REPLACEMENTLIST is useful for counting characters in a
2542 class, or for squashing character sequences in a class.
2543
2544 tr/abcd// tr/abcd/abcd/
2545 tr/abcd/AB/ tr/abcd/ABBB/
2546 tr/abcd//d s/[abcd]//g
2547 tr/abcd/AB/d (tr/ab/AB/ + s/[cd]//g) - but run together
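
The equivalences in this table can be confirmed on a sample string.
This is a sketch; the input string is arbitrary.

```perl
use strict;
use warnings;

my $src = "a1b2c3d4";

(my $padded  = $src) =~ tr/abcd/AB/;    # short list padded: acts like tr/abcd/ABBB/
(my $deleted = $src) =~ tr/abcd/AB/d;   # c and d have no mate, so they are removed
(my $same    = $src) =~ tr/abcd//;      # empty list: nothing changes
my $count    = ($src =~ tr/abcd//);     # ...but matches are still counted

print "$padded $deleted $same $count\n";   # prints "A1B2B3B4 A1B234 a1b2c3d4 4"
```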
2548
2549 If the "/c" modifier is specified, the characters to be
2550 transliterated are the ones NOT in SEARCHLIST, that is, it is
2551 complemented. If "/d" and/or "/s" are also specified, they apply
to the complemented SEARCHLIST. Recall that if REPLACEMENTLIST is
2553 empty (except under "/d") a copy of SEARCHLIST is used instead.
2554 That copy is made after complementing under "/c". SEARCHLIST is
2555 sorted by code point order after complementing, and any
2556 REPLACEMENTLIST is applied to that sorted result. This means that
2557 under "/c", the order of the characters specified in SEARCHLIST is
2558 irrelevant. This can lead to different results on EBCDIC systems
2559 if REPLACEMENTLIST contains more than one character, hence it is
2560 generally non-portable to use "/c" with such a REPLACEMENTLIST.
2561
2562 Another way of describing the operation is this: If "/c" is
2563 specified, the SEARCHLIST is sorted by code point order, then
2564 complemented. If REPLACEMENTLIST is empty and "/d" is not
2565 specified, REPLACEMENTLIST is replaced by a copy of SEARCHLIST (as
2566 modified under "/c"), and these potentially modified lists are used
2567 as the basis for what follows. Any character in the target string
2568 that isn't in SEARCHLIST is passed through unchanged. Every other
2569 character in the target string is replaced by the character in
2570 REPLACEMENTLIST that positionally corresponds to its mate in
2571 SEARCHLIST, except that under "/s", the 2nd and following
2572 characters are squeezed out in a sequence of characters in a row
2573 that all translate to the same character. If SEARCHLIST is longer
2574 than REPLACEMENTLIST, characters in the target string that match a
2575 character in SEARCHLIST that doesn't have a correspondence in
2576 REPLACEMENTLIST are either deleted from the target string if "/d"
2577 is specified; or replaced by the final character in REPLACEMENTLIST
2578 if "/d" isn't specified.
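
Two common uses of "/c" follow from this description. Both use a
REPLACEMENTLIST of at most one character, so the portability caveat
above does not arise. The input strings are arbitrary.

```perl
use strict;
use warnings;

# Delete every character that is not an ASCII digit.
(my $digits = "tel: 555-1234") =~ tr/0-9//cd;

# Turn each run of non-letters into a single space.
(my $words = "one,  two;;three") =~ tr/a-zA-Z/ /cs;

print "$digits|$words\n";   # prints "5551234|one two three"
```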
2579
2580 Some examples:
2581
2582 $ARGV[1] =~ tr/A-Z/a-z/; # canonicalize to lower case ASCII
2583
2584 $cnt = tr/*/*/; # count the stars in $_
2585 $cnt = tr/*//; # same thing
2586
2587 $cnt = $sky =~ tr/*/*/; # count the stars in $sky
2588 $cnt = $sky =~ tr/*//; # same thing
2589
2590 $cnt = $sky =~ tr/*//c; # count all the non-stars in $sky
2591 $cnt = $sky =~ tr/*/*/c; # same, but transliterate each non-star
2592 # into a star, leaving the already-stars
2593 # alone. Afterwards, everything in $sky
2594 # is a star.
2595
2596 $cnt = tr/0-9//; # count the ASCII digits in $_
2597
2598 tr/a-zA-Z//s; # bookkeeper -> bokeper
2599 tr/o/o/s; # bookkeeper -> bokkeeper
2600 tr/oe/oe/s; # bookkeeper -> bokkeper
2601 tr/oe//s; # bookkeeper -> bokkeper
2602 tr/oe/o/s; # bookkeeper -> bokkopor
2603
2604 ($HOST = $host) =~ tr/a-z/A-Z/;
2605 $HOST = $host =~ tr/a-z/A-Z/r; # same thing
2606
2607 $HOST = $host =~ tr/a-z/A-Z/r # chained with s///r
2608 =~ s/:/ -p/r;
2609
2610 tr/a-zA-Z/ /cs; # change non-alphas to single space
2611
2612 @stripped = map tr/a-zA-Z/ /csr, @original;
2613 # /r with map
2614
2615 tr [\200-\377]
2616 [\000-\177]; # wickedly delete 8th bit
2617
2618 $foo !~ tr/A/a/ # transliterate all the A's in $foo to 'a',
2619 # return 0 if any were found and changed.
2620 # Otherwise return 1
2621
2622 If multiple transliterations are given for a character, only the
2623 first one is used:
2624
2625 tr/AAA/XYZ/
2626
2627 will transliterate any A to X.
2628
2629 Because the transliteration table is built at compile time, neither
2630 the SEARCHLIST nor the REPLACEMENTLIST are subjected to double
2631 quote interpolation. That means that if you want to use variables,
2632 you must use an eval():
2633
2634 eval "tr/$oldlist/$newlist/";
2635 die $@ if $@;
2636
2637 eval "tr/$oldlist/$newlist/, 1" or die $@;
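
If the same run-time lists are applied many times, one option is to
compile the transliteration once into a closure. This is a sketch
only: the names are illustrative, and the lists are assumed to be
trusted and free of unescaped "/" characters, since they are pasted
into source code.

```perl
use strict;
use warnings;

# Illustrative lists; in real code they must be trusted and must not
# contain the "/" delimiter unescaped.
my ($oldlist, $newlist) = ('a-c', 'A-C');

# Compile the transliteration once, then reuse the resulting closure.
my $translate = eval "sub { (my \$s = shift) =~ tr/$oldlist/$newlist/; \$s }"
    or die $@;

my $result = $translate->("abcabc xyz");
print "$result\n";   # prints "ABCABC xyz"
```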
2638
2639 "<<EOF"
2640 A line-oriented form of quoting is based on the shell "here-
2641 document" syntax. Following a "<<" you specify a string to
2642 terminate the quoted material, and all lines following the current
2643 line down to the terminating string are the value of the item.
2644
2645 Prefixing the terminating string with a "~" specifies that you want
2646 to use "Indented Here-docs" (see below).
2647
2648 The terminating string may be either an identifier (a word), or
2649 some quoted text. An unquoted identifier works like double quotes.
2650 There may not be a space between the "<<" and the identifier,
2651 unless the identifier is explicitly quoted. The terminating string
2652 must appear by itself (unquoted and with no surrounding whitespace)
2653 on the terminating line.
2654
2655 If the terminating string is quoted, the type of quotes used
determines the treatment of the text.
2657
2658 Double Quotes
2659 Double quotes indicate that the text will be interpolated using
2660 exactly the same rules as normal double quoted strings.
2661
2662 print <<EOF;
2663 The price is $Price.
2664 EOF
2665
2666 print << "EOF"; # same as above
2667 The price is $Price.
2668 EOF
2669
2670 Single Quotes
2671 Single quotes indicate the text is to be treated literally with
2672 no interpolation of its content. This is similar to single
2673 quoted strings except that backslashes have no special meaning,
with "\\" being treated as two backslashes and not one as it
would be in every other quoting construct.
2676
2677 Just as in the shell, a backslashed bareword following the "<<"
2678 means the same thing as a single-quoted string does:
2679
2680 $cost = <<'VISTA'; # hasta la ...
2681 That'll be $10 please, ma'am.
2682 VISTA
2683
2684 $cost = <<\VISTA; # Same thing!
2685 That'll be $10 please, ma'am.
2686 VISTA
2687
2688 This is the only form of quoting in perl where there is no need
2689 to worry about escaping content, something that code generators
2690 can and do make good use of.
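
The two treatments can be compared directly. In this sketch the
variable names are illustrative.

```perl
use strict;
use warnings;

my $Price = 42;

my $interpolated = <<"EOF";
The price is $Price.
EOF

my $literal = <<'EOF';
The price is $Price, and \\ stays as two backslashes.
EOF

print $interpolated, $literal;
```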
2691
2692 Backticks
2693 The content of the here doc is treated just as it would be if
2694 the string were embedded in backticks. Thus the content is
2695 interpolated as though it were double quoted and then executed
2696 via the shell, with the results of the execution returned.
2697
2698 print << `EOC`; # execute command and get results
2699 echo hi there
2700 EOC
2701
2702 Indented Here-docs
2703 The here-doc modifier "~" allows you to indent your here-docs
2704 to make the code more readable:
2705
2706 if ($some_var) {
2707 print <<~EOF;
2708 This is a here-doc
2709 EOF
2710 }
2711
2712 This will print...
2713
2714 This is a here-doc
2715
2716 ...with no leading whitespace.
2717
2718 The line containing the delimiter that marks the end of the
2719 here-doc determines the indentation template for the whole
2720 thing. Compilation croaks if any non-empty line inside the
2721 here-doc does not begin with the precise indentation of the
2722 terminating line. (An empty line consists of the single
2723 character "\n".) For example, suppose the terminating line
2724 begins with a tab character followed by 4 space characters.
2725 Every non-empty line in the here-doc must begin with a tab
2726 followed by 4 spaces. They are stripped from each line, and
2727 any leading white space remaining on a line serves as the
2728 indentation for that line. Currently, only the TAB and SPACE
2729 characters are treated as whitespace for this purpose. Tabs
2730 and spaces may be mixed, but are matched exactly; tabs remain
2731 tabs and are not expanded.
2732
2733 Additional beginning whitespace (beyond what preceded the
2734 delimiter) will be preserved:
2735
2736 print <<~EOF;
2737 This text is not indented
2738 This text is indented with two spaces
2739 This text is indented with two tabs
2740 EOF
2741
2742 Finally, the modifier may be used with all of the forms
2743 mentioned above:
2744
2745 <<~\EOF;
2746 <<~'EOF'
2747 <<~"EOF"
2748 <<~`EOF`
2749
2750 And whitespace may be used between the "~" and quoted
2751 delimiters:
2752
2753 <<~ 'EOF'; # ... "EOF", `EOF`
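
A short sketch of how the terminator's indentation is stripped while
deeper indentation survives. It assumes Perl 5.26 or later; the sub
name and the text are hypothetical.

```perl
use strict;
use warnings;
use v5.26;   # indented here-docs need Perl 5.26 or later

# Hypothetical sub returning a usage message for an imaginary tool.
sub usage {
    return <<~EOT;
        usage: tool [options]
          -h  show help
        EOT
}

my $text = usage();
print $text;
```

The terminator is indented by eight spaces, so eight spaces are
removed from each line; the two extra spaces before "-h" remain.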
2754
2755 It is possible to stack multiple here-docs in a row:
2756
2757 print <<"foo", <<"bar"; # you can stack them
2758 I said foo.
2759 foo
2760 I said bar.
2761 bar
2762
2763 myfunc(<< "THIS", 23, <<'THAT');
2764 Here's a line
2765 or two.
2766 THIS
2767 and here's another.
2768 THAT
2769
2770 Just don't forget that you have to put a semicolon on the end to
2771 finish the statement, as Perl doesn't know you're not going to try
2772 to do this:
2773
2774 print <<ABC
2775 179231
2776 ABC
2777 + 20;
2778
2779 If you want to remove the line terminator from your here-docs, use
2780 chomp().
2781
2782 chomp($string = <<'END');
2783 This is a string.
2784 END
2785
2786 If you want your here-docs to be indented with the rest of the
2787 code, use the "<<~FOO" construct described under "Indented Here-
2788 docs":
2789
2790 $quote = <<~'FINIS';
2791 The Road goes ever on and on,
2792 down from the door where it began.
2793 FINIS
2794
2795 If you use a here-doc within a delimited construct, such as in
2796 "s///eg", the quoted material must still come on the line following
2797 the "<<FOO" marker, which means it may be inside the delimited
2798 construct:
2799
2800 s/this/<<E . 'that'
2801 the other
2802 E
2803 . 'more '/eg;
2804
2805 It works this way as of Perl 5.18. Historically, it was
2806 inconsistent, and you would have to write
2807
2808 s/this/<<E . 'that'
2809 . 'more '/eg;
2810 the other
2811 E
2812
2813 outside of string evals.
2814
2815 Additionally, quoting rules for the end-of-string identifier are
2816 unrelated to Perl's quoting rules. q(), qq(), and the like are not
2817 supported in place of '' and "", and the only interpolation is for
2818 backslashing the quoting character:
2819
2820 print << "abc\"def";
2821 testing...
2822 abc"def
2823
2824 Finally, quoted strings cannot span multiple lines. The general
2825 rule is that the identifier must be a string literal. Stick with
2826 that, and you should be safe.
2827
2828 Gory details of parsing quoted constructs
2829 When presented with something that might have several different
2830 interpretations, Perl uses the DWIM (that's "Do What I Mean") principle
2831 to pick the most probable interpretation. This strategy is so
2832 successful that Perl programmers often do not suspect the ambivalence
2833 of what they write. But from time to time, Perl's notions differ
2834 substantially from what the author honestly meant.
2835
2836 This section hopes to clarify how Perl handles quoted constructs.
2837 Although the most common reason to learn this is to unravel
2838 labyrinthine regular expressions, because the initial steps of parsing
2839 are the same for all quoting operators, they are all discussed
2840 together.
2841
2842 The most important Perl parsing rule is the first one discussed below:
2843 when processing a quoted construct, Perl first finds the end of that
2844 construct, then interprets its contents. If you understand this rule,
2845 you may skip the rest of this section on the first reading. The other
2846 rules are likely to contradict the user's expectations much less
2847 frequently than this first one.
2848
2849 Some passes discussed below are performed concurrently, but because
2850 their results are the same, we consider them individually. For
2851 different quoting constructs, Perl performs different numbers of
2852 passes, from one to four, but these passes are always performed in the
2853 same order.
2854
2855 Finding the end
2856 The first pass is finding the end of the quoted construct. This
2857 results in saving to a safe location a copy of the text (between
2858 the starting and ending delimiters), normalized as necessary to
2859 avoid needing to know what the original delimiters were.
2860
2861 If the construct is a here-doc, the ending delimiter is a line that
2862 has a terminating string as the content. Therefore "<<EOF" is
2863 terminated by "EOF" immediately followed by "\n" and starting from
2864 the first column of the terminating line. When searching for the
2865 terminating line of a here-doc, nothing is skipped. In other
2866 words, lines after the here-doc syntax are compared with the
2867 terminating string line by line.
2868
For constructs other than here-docs, single characters are used as
2870 starting and ending delimiters. If the starting delimiter is an
2871 opening punctuation (that is "(", "[", "{", or "<"), the ending
2872 delimiter is the corresponding closing punctuation (that is ")",
2873 "]", "}", or ">"). If the starting delimiter is an unpaired
2874 character like "/" or a closing punctuation, the ending delimiter
2875 is the same as the starting delimiter. Therefore a "/" terminates
2876 a "qq//" construct, while a "]" terminates both "qq[]" and "qq]]"
2877 constructs.
2878
2879 When searching for single-character delimiters, escaped delimiters
2880 and "\\" are skipped. For example, while searching for terminating
2881 "/", combinations of "\\" and "\/" are skipped. If the delimiters
2882 are bracketing, nested pairs are also skipped. For example, while
2883 searching for a closing "]" paired with the opening "[",
2884 combinations of "\\", "\]", and "\[" are all skipped, and nested
2885 "[" and "]" are skipped as well. However, when backslashes are
2886 used as the delimiters (like "qq\\" and "tr\\\"), nothing is
2887 skipped. During the search for the end, backslashes that escape
delimiters or other backslashes are removed (strictly speaking, they
2889 are not copied to the safe location).
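
These delimiter rules can be observed with "q". The strings in this
sketch are arbitrary.

```perl
use strict;
use warnings;

my $nested  = q(a (b) c);   # nested parentheses are skipped
my $escaped = q/a\/b/;      # the backslash before "/" is removed
my $closing = q]abc];       # a closing bracket pairs with itself

print "$nested|$escaped|$closing\n";   # prints "a (b) c|a/b|abc"
```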
2890
2891 For constructs with three-part delimiters ("s///", "y///", and
2892 "tr///"), the search is repeated once more. If the first delimiter
2893 is not an opening punctuation, the three delimiters must be the
2894 same, such as "s!!!" and "tr)))", in which case the second
2895 delimiter terminates the left part and starts the right part at
2896 once. If the left part is delimited by bracketing punctuation
2897 (that is "()", "[]", "{}", or "<>"), the right part needs another
2898 pair of delimiters such as "s(){}" and "tr[]//". In these cases,
2899 whitespace and comments are allowed between the two parts, although
2900 the comment must follow at least one whitespace character;
2901 otherwise a character expected as the start of the comment may be
2902 regarded as the starting delimiter of the right part.
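
A sketch of bracketing delimiters with whitespace and comments between
the two parts. The strings are arbitrary.

```perl
use strict;
use warnings;

my $text = "hello world";

$text =~ s{world}   # pattern; the comment follows whitespace
         {Perl};    # replacement in its own delimiters

(my $shout = $text) =~ tr[a-z] [A-Z];

print "$text / $shout\n";   # prints "hello Perl / HELLO PERL"
```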
2903
2904 During this search no attention is paid to the semantics of the
2905 construct. Thus:
2906
2907 "$hash{"$foo/$bar"}"
2908
2909 or:
2910
2911 m/
2912 bar # NOT a comment, this slash / terminated m//!
2913 /x
2914
2915 do not form legal quoted expressions. The quoted part ends on the
2916 first """ and "/", and the rest happens to be a syntax error.
2917 Because the slash that terminated "m//" was followed by a "SPACE",
2918 the example above is not "m//x", but rather "m//" with no "/x"
2919 modifier. So the embedded "#" is interpreted as a literal "#".
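
One way around the problem is to choose delimiters that the comment
does not contain. In this sketch braces are used, so a "/" in the
comment is harmless; braces in the comment would still have to
balance.

```perl
use strict;
use warnings;

# With braces as delimiters, a "/" inside an /x comment is harmless.
# (Braces in the comment would still have to balance.)
my $re = qr{
    bar   # this slash / no longer ends the pattern
}x;

my $matched = "foobar" =~ $re ? 1 : 0;
print "matched: $matched\n";   # prints "matched: 1"
```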
2920
2921 Also no attention is paid to "\c\" (multichar control char syntax)
2922 during this search. Thus the second "\" in "qq/\c\/" is
2923 interpreted as a part of "\/", and the following "/" is not
2924 recognized as a delimiter. Instead, use "\034" or "\x1c" at the
2925 end of quoted constructs.
2926
2927 Interpolation
2928 The next step is interpolation in the text obtained, which is now
2929 delimiter-independent. There are multiple cases.
2930
2931 "<<'EOF'"
2932 No interpolation is performed. Note that the combination "\\"
2933 is left intact, since escaped delimiters are not available for
2934 here-docs.
2935
2936 "m''", the pattern of "s'''"
2937 No interpolation is performed at this stage. Any backslashed
2938 sequences including "\\" are treated at the stage of "Parsing
2939 regular expressions".
2940
2941 '', "q//", "tr'''", "y'''", the replacement of "s'''"
2942 The only interpolation is removal of "\" from pairs of "\\".
2943 Therefore "-" in "tr'''" and "y'''" is treated literally as a
2944 hyphen and no character range is available. "\1" in the
2945 replacement of "s'''" does not work as $1.
2946
2947 "tr///", "y///"
2948 No variable interpolation occurs. String modifying
2949 combinations for case and quoting such as "\Q", "\U", and "\E"
2950 are not recognized. The other escape sequences such as "\200"
2951 and "\t" and backslashed characters such as "\\" and "\-" are
2952 converted to appropriate literals. The character "-" is
2953 treated specially and therefore "\-" is treated as a literal
2954 "-".
2955
2956 "", ``, "qq//", "qx//", "<file*glob>", "<<"EOF""
2957 "\Q", "\U", "\u", "\L", "\l", "\F" (possibly paired with "\E")
2958 are converted to corresponding Perl constructs. Thus,
2959 "$foo\Qbaz$bar" is converted to
2960 "$foo . (quotemeta("baz" . $bar))" internally. The other
2961 escape sequences such as "\200" and "\t" and backslashed
2962 characters such as "\\" and "\-" are replaced with appropriate
2963 expansions.
2964
2965 Let it be stressed that whatever falls between "\Q" and "\E" is
2966 interpolated in the usual way. Something like "\Q\\E" has no
2967 "\E" inside. Instead, it has "\Q", "\\", and "E", so the
2968 result is the same as for "\\\\E". As a general rule,
2969 backslashes between "\Q" and "\E" may lead to counterintuitive
2970 results. So, "\Q\t\E" is converted to quotemeta("\t"), which
2971 is the same as "\\\t" (since TAB is not alphanumeric). Note
2972 also that:
2973
2974 $str = '\t';
2975 return "\Q$str";
2976
2977 may be closer to the conjectural intention of the writer of
2978 "\Q\t\E".
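
The two results differ in length, which makes the distinction easy to
check. A minimal sketch:

```perl
use strict;
use warnings;

my $quoted_tab = "\Q\t\E";   # quotemeta applied to a real TAB
my $str        = '\t';       # backslash followed by "t"
my $quoted_str = "\Q$str";   # quotemeta applied to those two characters

printf "%d %d\n", length($quoted_tab), length($quoted_str);   # prints "2 3"
```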
2979
2980 Interpolated scalars and arrays are converted internally to the
2981 "join" and "." catenation operations. Thus, "$foo XXX '@arr'"
2982 becomes:
2983
2984 $foo . " XXX '" . (join $", @arr) . "'";
2985
2986 All operations above are performed simultaneously, left to
2987 right.
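
The role of $" in that conversion can be seen by changing it locally.
The variable names in this sketch are illustrative.

```perl
use strict;
use warnings;

my $foo = "x";
my @arr = (1, 2, 3);

my $default = "$foo XXX '@arr'";                          # $" defaults to a space
my $custom  = do { local $" = "-"; "$foo XXX '@arr'" };   # same string, new separator
my $by_hand = $foo . " XXX '" . join("-", @arr) . "'";    # the explicit equivalent

print "$default\n$custom\n";
```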
2988
2989 Because the result of "\Q STRING \E" has all metacharacters
2990 quoted, there is no way to insert a literal "$" or "@" inside a
2991 "\Q\E" pair. If protected by "\", "$" will be quoted to become
2992 "\\\$"; if not, it is interpreted as the start of an
2993 interpolated scalar.
2994
2995 Note also that the interpolation code needs to make a decision
2996 on where the interpolated scalar ends. For instance, whether
2997 "a $x -> {c}" really means:
2998
2999 "a " . $x . " -> {c}";
3000
3001 or:
3002
3003 "a " . $x -> {c};
3004
Most of the time, the interpolated scalar is taken to be the
longest possible text that does not include spaces between
components and that contains matching braces or brackets.
Because the outcome may be determined by voting based on
heuristic estimators, the result is not strictly predictable.
Fortunately, it's usually correct for ambiguous cases.
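
For the "a $x -> {c}" case, the effect of the spaces can be checked
directly. The last line of this sketch shows one way to avoid relying
on the heuristics at all, by interpolating an explicit expression.

```perl
use strict;
use warnings;

my $x = { c => 1 };

my $deref   = "a $x->{c}";          # no spaces: the whole chain is interpolated
my $spaced  = "a $x -> {c}";        # spaces: only $x itself is interpolated
my $certain = "a @{[ $x->{c} ]}";   # explicit expression, no guessing

print "$deref\n$spaced\n$certain\n";
```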
3011
3012 The replacement of "s///"
3013 Processing of "\Q", "\U", "\u", "\L", "\l", "\F" and
3014 interpolation happens as with "qq//" constructs.
3015
3016 It is at this step that "\1" is begrudgingly converted to $1 in
3017 the replacement text of "s///", in order to correct the
3018 incorrigible sed hackers who haven't picked up the saner idiom
3019 yet. A warning is emitted if the "use warnings" pragma or the
3020 -w command-line flag (that is, the $^W variable) was set.
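
The two spellings can be compared side by side. In this sketch the warning for the sed-style form is switched off locally (it belongs to the "syntax" warnings category) so that the example runs quietly.

```perl
use strict;
use warnings;

my $sed_style = "abc";
{
    no warnings 'syntax';          # silences "\1 better written as $1"
    $sed_style =~ s/(b)/[\1]/;     # \1 is converted to $1 here
}

my $perl_style = "abc";
$perl_style =~ s/(b)/[$1]/;        # the preferred idiom
```

Both substitutions produce the same string; only the second is free of the warning.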
3021
"RE" in "m?RE?", "/RE/", "m/RE/", "s/RE/foo/"
3023 Processing of "\Q", "\U", "\u", "\L", "\l", "\F", "\E", and
3024 interpolation happens (almost) as with "qq//" constructs.
3025
3026 Processing of "\N{...}" is also done here, and compiled into an
3027 intermediate form for the regex compiler. (This is because, as
3028 mentioned below, the regex compilation may be done at execution
3029 time, and "\N{...}" is a compile-time construct.)
3030
3031 However any other combinations of "\" followed by a character
3032 are not substituted but only skipped, in order to parse them as
3033 regular expressions at the following step. As "\c" is skipped
3034 at this step, "@" of "\c@" in RE is possibly treated as an
3035 array symbol (for example @foo), even though the same text in
3036 "qq//" gives interpolation of "\c@".
3037
3038 Code blocks such as "(?{BLOCK})" are handled by temporarily
3039 passing control back to the perl parser, in a similar way that
3040 an interpolated array subscript expression such as
3041 "foo$array[1+f("[xyz")]bar" would be.
3042
3043 Moreover, inside "(?{BLOCK})", "(?# comment )", and a
3044 "#"-comment in a "/x"-regular expression, no processing is
3045 performed whatsoever. This is the first step at which the
3046 presence of the "/x" modifier is relevant.
3047
3048 Interpolation in patterns has several quirks: $|, $(, $), "@+"
3049 and "@-" are not interpolated, and constructs $var[SOMETHING]
3050 are voted (by several different estimators) to be either an
3051 array element or $var followed by an RE alternative. This is
where the notation "${arr[$bar]}" comes in handy: "/${arr[0-9]}/"
3053 is interpreted as array element -9, not as a regular expression
3054 from the variable $arr followed by a digit, which would be the
3055 interpretation of "/$arr[0-9]/". Since voting among different
3056 estimators may occur, the result is not predictable.
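
The braced form can be exercised as follows. This sketch covers only the unambiguous "${arr[...]}" spelling, since the unbraced form depends on the estimators; the array contents are made up for the example.

```perl
use strict;
use warnings;

my @arr = ('a' .. 'j');    # ten elements, so $arr[-9] is 'b'

# The subscript 0-9 is evaluated as an expression, giving index -9.
my $matches_b = ('b' =~ /\A${arr[0-9]}\z/) ? 1 : 0;
my $matches_a = ('a' =~ /\A${arr[0-9]}\z/) ? 1 : 0;
```

Only 'b' matches, which shows that the braces force the array-element reading.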
3057
3058 The lack of processing of "\\" creates specific restrictions on
3059 the post-processed text. If the delimiter is "/", one cannot
3060 get the combination "\/" into the result of this step. "/"
3061 will finish the regular expression, "\/" will be stripped to
3062 "/" on the previous step, and "\\/" will be left as is.
3063 Because "/" is equivalent to "\/" inside a regular expression,
this does not matter unless the delimiter happens to be a
character special to the RE engine, such as in "s*foo*bar*",
3066 "m[foo]", or "m?foo?"; or an alphanumeric char, as in:
3067
3068 m m ^ a \s* b mmx;
3069
3070 In the RE above, which is intentionally obfuscated for
3071 illustration, the delimiter is "m", the modifier is "mx", and
3072 after delimiter-removal the RE is the same as for
3073 "m/ ^ a \s* b /mx". There's more than one reason you're
3074 encouraged to restrict your delimiters to non-alphanumeric,
3075 non-whitespace choices.
3076
3077 This step is the last one for all constructs except regular
3078 expressions, which are processed further.
3079
3080 Parsing regular expressions
3081 Previous steps were performed during the compilation of Perl code,
3082 but this one happens at run time, although it may be optimized to
3083 be calculated at compile time if appropriate. After preprocessing
3084 described above, and possibly after evaluation if concatenation,
3085 joining, casing translation, or metaquoting are involved, the
3086 resulting string is passed to the RE engine for compilation.
3087
3088 Whatever happens in the RE engine might be better discussed in
3089 perlre, but for the sake of continuity, we shall do so here.
3090
3091 This is another step where the presence of the "/x" modifier is
3092 relevant. The RE engine scans the string from left to right and
3093 converts it into a finite automaton.
3094
3095 Backslashed characters are either replaced with corresponding
3096 literal strings (as with "\{"), or else they generate special nodes
3097 in the finite automaton (as with "\b"). Characters special to the
3098 RE engine (such as "|") generate corresponding nodes or groups of
3099 nodes. "(?#...)" comments are ignored. All the rest is either
3100 converted to literal strings to match, or else is ignored (as is
3101 whitespace and "#"-style comments if "/x" is present).
3102
3103 Parsing of the bracketed character class construct, "[...]", is
3104 rather different than the rule used for the rest of the pattern.
3105 The terminator of this construct is found using the same rules as
3106 for finding the terminator of a "{}"-delimited construct, the only
3107 exception being that "]" immediately following "[" is treated as
3108 though preceded by a backslash.
3109
3110 The terminator of runtime "(?{...})" is found by temporarily
3111 switching control to the perl parser, which should stop at the
3112 point where the logically balancing terminating "}" is found.
3113
It is possible to inspect both the string given to the RE engine
and the resulting finite automaton. See the arguments
3116 "debug"/"debugcolor" in the "use re" pragma, as well as Perl's -Dr
3117 command-line switch documented in "Command Switches" in perlrun.
3118
3119 Optimization of regular expressions
3120 This step is listed for completeness only. Since it does not
3121 change semantics, details of this step are not documented and are
3122 subject to change without notice. This step is performed over the
3123 finite automaton that was generated during the previous pass.
3124
3125 It is at this stage that split() silently optimizes "/^/" to mean
3126 "/^/m".
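
The effect is easy to observe: splitting on "/^/" breaks a multi-line string at the start of every line, not only at the start of the string. A short sketch with made-up input:

```perl
use strict;
use warnings;

my $text  = "one\ntwo\nthree\n";

# Behaves as if /^/m had been written: one element per line,
# each keeping its trailing newline.
my @lines = split /^/, $text;
```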
3127
3128 I/O Operators
3129 There are several I/O operators you should know about.
3130
3131 A string enclosed by backticks (grave accents) first undergoes double-
3132 quote interpolation. It is then interpreted as an external command,
3133 and the output of that command is the value of the backtick string,
3134 like in a shell. In scalar context, a single string consisting of all
3135 output is returned. In list context, a list of values is returned, one
3136 per line of output. (You can set $/ to use a different line
3137 terminator.) The command is executed each time the pseudo-literal is
3138 evaluated. The status value of the command is returned in $? (see
3139 perlvar for the interpretation of $?). Unlike in csh, no translation
3140 is done on the return data--newlines remain newlines. Unlike in any of
3141 the shells, single quotes do not hide variable names in the command
3142 from interpretation. To pass a literal dollar-sign through to the
shell you need to hide it with a backslash. The generalized form of
backticks is "qx//", or you can call the "readpipe" function (see
"readpipe" in perlfunc). (Because backticks always undergo shell
expansion as well, see perlsec for security concerns.)
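
The context rules and the use of $? can be sketched as follows. This assumes a Unix-like system where "echo" and "sh" are available; the commands are chosen only for illustration.

```perl
use strict;
use warnings;

# Scalar context: all output as one string.
my $out = `echo hello`;

# List context: one element per line of output.
my @lines = qx{echo one; echo two};

# The exit status is in the high byte of $?.
qx{sh -c 'exit 3'};
my $exit_code = $? >> 8;
```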
3147
3148 In scalar context, evaluating a filehandle in angle brackets yields the
3149 next line from that file (the newline, if any, included), or "undef" at
3150 end-of-file or on error. When $/ is set to "undef" (sometimes known as
3151 file-slurp mode) and the file is empty, it returns '' the first time,
3152 followed by "undef" subsequently.
3153
3154 Ordinarily you must assign the returned value to a variable, but there
3155 is one situation where an automatic assignment happens. If and only if
3156 the input symbol is the only thing inside the conditional of a "while"
3157 statement (even if disguised as a for(;;) loop), the value is
3158 automatically assigned to the global variable $_, destroying whatever
3159 was there previously. (This may seem like an odd thing to you, but
3160 you'll use the construct in almost every Perl script you write.) The
3161 $_ variable is not implicitly localized. You'll have to put a
3162 "local $_;" before the loop if you want that to happen. Furthermore,
3163 if the input symbol or an explicit assignment of the input symbol to a
3164 scalar is used as a "while"/"for" condition, then the condition
3165 actually tests for definedness of the expression's value, not for its
3166 regular truth value.
3167
3168 Thus the following lines are equivalent:
3169
3170 while (defined($_ = <STDIN>)) { print; }
3171 while ($_ = <STDIN>) { print; }
3172 while (<STDIN>) { print; }
3173 for (;<STDIN>;) { print; }
3174 print while defined($_ = <STDIN>);
3175 print while ($_ = <STDIN>);
3176 print while <STDIN>;
3177
3178 This also behaves similarly, but assigns to a lexical variable instead
3179 of to $_:
3180
3181 while (my $line = <STDIN>) { print $line }
3182
3183 In these loop constructs, the assigned value (whether assignment is
3184 automatic or explicit) is then tested to see whether it is defined.
3185 The defined test avoids problems where the line has a string value that
3186 would be treated as false by Perl; for example a "" or a "0" with no
3187 trailing newline. If you really mean for such values to terminate the
3188 loop, they should be tested for explicitly:
3189
3190 while (($_ = <STDIN>) ne '0') { ... }
3191 while (<STDIN>) { last unless $_; ... }
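
The effect of the implicit "defined" test can be seen with input whose last line is "0" with no trailing newline. The sketch below reads from an in-memory filehandle so that it needs no external file; the data is made up for the example.

```perl
use strict;
use warnings;

my $data = "first\n0";    # last line is "0" without a newline
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my @seen;
while (my $line = <$fh>) {    # tested for definedness, not truth
    push @seen, $line;
}
close $fh;
```

Both lines are collected, even though the string "0" is false.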
3192
3193 In other boolean contexts, "<FILEHANDLE>" without an explicit "defined"
3194 test or comparison elicits a warning if the "use warnings" pragma or
3195 the -w command-line switch (the $^W variable) is in effect.
3196
3197 The filehandles STDIN, STDOUT, and STDERR are predefined. (The
3198 filehandles "stdin", "stdout", and "stderr" will also work except in
3199 packages, where they would be interpreted as local identifiers rather
3200 than global.) Additional filehandles may be created with the open()
3201 function, amongst others. See perlopentut and "open" in perlfunc for
3202 details on this.
3203
3204 If a "<FILEHANDLE>" is used in a context that is looking for a list, a
3205 list comprising all input lines is returned, one line per list element.
3206 It's easy to grow to a rather large data space this way, so use with
3207 care.
3208
3209 "<FILEHANDLE>" may also be spelled readline(*FILEHANDLE). See
3210 "readline" in perlfunc.
3211
3212 The null filehandle "<>" (sometimes called the diamond operator) is
3213 special: it can be used to emulate the behavior of sed and awk, and any
3214 other Unix filter program that takes a list of filenames, doing the
3215 same to each line of input from all of them. Input from "<>" comes
3216 either from standard input, or from each file listed on the command
3217 line. Here's how it works: the first time "<>" is evaluated, the @ARGV
3218 array is checked, and if it is empty, $ARGV[0] is set to "-", which
3219 when opened gives you standard input. The @ARGV array is then
3220 processed as a list of filenames. The loop
3221
    while (<>) {
        ...                     # code for each line
    }
3225
3226 is equivalent to the following Perl-like pseudo code:
3227
    unshift(@ARGV, '-') unless @ARGV;
    while ($ARGV = shift) {
        open(ARGV, $ARGV);
        while (<ARGV>) {
            ...         # code for each line
        }
    }
3235
3236 except that it isn't so cumbersome to say, and will actually work. It
3237 really does shift the @ARGV array and put the current filename into the
3238 $ARGV variable. It also uses filehandle ARGV internally. "<>" is just
3239 a synonym for "<ARGV>", which is magical. (The pseudo code above
3240 doesn't work because it treats "<ARGV>" as non-magical.)
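
A self-contained way to watch "<>" walk through @ARGV is to point it at temporary files. In this sketch the files are created with File::Temp, and the contents are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create two small input files.
my @names;
for my $content ("a\nb\n", "c\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $content;
    close $fh or die "cannot close $name: $!";
    push @names, $name;
}

# Hand the list to the diamond operator.
@ARGV = @names;
my @log;
while (<>) {
    chomp;
    push @log, "$.:$_";    # $. keeps counting across files
}
```

After the loop, @ARGV is empty, because each name was shifted off as its file was opened.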
3241
3242 Since the null filehandle uses the two argument form of "open" in
3243 perlfunc it interprets special characters, so if you have a script like
3244 this:
3245
    while (<>) {
        print;
    }
3249
3250 and call it with "perl dangerous.pl 'rm -rfv *|'", it actually opens a
3251 pipe, executes the "rm" command and reads "rm"'s output from that pipe.
3252 If you want all items in @ARGV to be interpreted as file names, you can
3253 use the module "ARGV::readonly" from CPAN, or use the double diamond
3254 bracket:
3255
    while (<<>>) {
        print;
    }
3259
3260 Using double angle brackets inside of a while causes the open to use
3261 the three argument form (with the second argument being "<"), so all
3262 arguments in "ARGV" are treated as literal filenames (including "-").
3263 (Note that for convenience, if you use "<<>>" and if @ARGV is empty, it
3264 will still read from the standard input.)
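
The following sketch shows the double diamond reading a file whose name ends in "|", which the plain "<>" would have treated as a command to run. It assumes Perl 5.22 or later and a filesystem that allows "|" in filenames (as Unix-like systems do); the file name and contents are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $name = "$dir/data|";            # looks like a pipe to two-argument open

open my $out, '>', $name or die "cannot create $name: $!";
print {$out} "plain file contents\n";
close $out or die "cannot close $name: $!";

@ARGV = ($name);
my @lines;
while (<<>>) {                      # opened as a literal filename
    push @lines, $_;
}
```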
3265
3266 You can modify @ARGV before the first "<>" as long as the array ends up
3267 containing the list of filenames you really want. Line numbers ($.)
3268 continue as though the input were one big happy file. See the example
3269 in "eof" in perlfunc for how to reset line numbers on each file.
3270
3271 If you want to set @ARGV to your own list of files, go right ahead.
3272 This sets @ARGV to all plain text files if no @ARGV was given:
3273
3274 @ARGV = grep { -f && -T } glob('*') unless @ARGV;
3275
3276 You can even set them to pipe commands. For example, this
3277 automatically filters compressed arguments through gzip:
3278
3279 @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc < $_ |" : $_ } @ARGV;
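
To see what that "map" produces without running gzip, it can be applied to a sample list. The file names below are hypothetical.

```perl
use strict;
use warnings;

my @args   = ('notes.txt', 'log.gz', 'old.Z');

# Compressed names become pipe commands; others pass through unchanged.
my @mapped = map { /\.(gz|Z)$/ ? "gzip -dc < $_ |" : $_ } @args;
```

Note that this relies on the two-argument "open" behavior of "<>" described above, so it does not combine with "<<>>".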
3280
3281 If you want to pass switches into your script, you can use one of the
3282 "Getopts" modules or put a loop on the front like this:
3283
    while ($_ = $ARGV[0], /^-/) {
        shift;
        last if /^--$/;
        if (/^-D(.*)/) { $debug = $1 }
        if (/^-v/)     { $verbose++  }
        # ...           # other switches
    }

    while (<>) {
        # ...           # code for each line
    }
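
For comparison, here is roughly the same switch handling done with the core Getopt::Std module. This is a sketch: the command line is simulated by assigning to @ARGV, and unlike the loop above, "-v" is recorded as a simple flag, not a counter.

```perl
use strict;
use warnings;
use Getopt::Std;

# Simulated command line (hypothetical arguments).
@ARGV = ('-Dtrace', '-v', 'input.txt');

my %opts;
getopts('D:v', \%opts)              # -D takes a value, -v is a flag
    or die "usage: $0 [-D level] [-v] [file ...]\n";

my $debug   = $opts{D};
my $verbose = $opts{v};
```

After the call, the switches have been removed from @ARGV, leaving only the filenames for "<>" to process.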
3295
3296 The "<>" symbol will return "undef" for end-of-file only once. If you
3297 call it again after this, it will assume you are processing another
3298 @ARGV list, and if you haven't set @ARGV, will read input from STDIN.
3299
3300 If what the angle brackets contain is a simple scalar variable (for
3301 example, $foo), then that variable contains the name of the filehandle
3302 to input from, or its typeglob, or a reference to the same. For
3303 example:
3304
3305 $fh = \*STDIN;
3306 $line = <$fh>;
3307
3308 If what's within the angle brackets is neither a filehandle nor a
3309 simple scalar variable containing a filehandle name, typeglob, or
3310 typeglob reference, it is interpreted as a filename pattern to be
3311 globbed, and either a list of filenames or the next filename in the
3312 list is returned, depending on context. This distinction is determined
3313 on syntactic grounds alone. That means "<$x>" is always a readline()
3314 from an indirect handle, but "<$hash{key}>" is always a glob(). That's
3315 because $x is a simple scalar variable, but $hash{key} is not--it's a
3316 hash element. Even "<$x >" (note the extra space) is treated as
3317 "glob("$x ")", not readline($x).
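
In practice this means a filehandle stored in a hash or array element must be read with an explicit readline(). A minimal sketch, using an in-memory filehandle and made-up data:

```perl
use strict;
use warnings;

my $data = "line one\nline two\n";
open my $in, '<', \$data or die "cannot open in-memory file: $!";
my %h = (fh => $in);

# Simple scalar variable: angle brackets mean readline().
my $first = <$in>;

# Hash element: <$h{fh}> would be a glob(), so call readline() directly.
my $second = readline($h{fh});
```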
3318
3319 One level of double-quote interpretation is done first, but you can't
3320 say "<$foo>" because that's an indirect filehandle as explained in the
3321 previous paragraph. (In older versions of Perl, programmers would
3322 insert curly brackets to force interpretation as a filename glob:
3323 "<${foo}>". These days, it's considered cleaner to call the internal
3324 function directly as glob($foo), which is probably the right way to
3325 have done it in the first place.) For example:
3326
    while (<*.c>) {
        chmod 0644, $_;
    }
3330
3331 is roughly equivalent to:
3332
    open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
    while (<FOO>) {
        chomp;
        chmod 0644, $_;
    }
3338
3339 except that the globbing is actually done internally using the standard
3340 "File::Glob" extension. Of course, the shortest way to do the above
3341 is:
3342
3343 chmod 0644, <*.c>;
3344
3345 A (file)glob evaluates its (embedded) argument only when it is starting
3346 a new list. All values must be read before it will start over. In
3347 list context, this isn't important because you automatically get them
3348 all anyway. However, in scalar context the operator returns the next
3349 value each time it's called, or "undef" when the list has run out. As
3350 with filehandle reads, an automatic "defined" is generated when the
3351 glob occurs in the test part of a "while", because legal glob returns
3352 (for example, a file called 0) would otherwise terminate the loop.
3353 Again, "undef" is returned only once. So if you're expecting a single
3354 value from a glob, it is much better to say
3355
3356 ($file) = <blurch*>;
3357
3358 than
3359
3360 $file = <blurch*>;
3361
3362 because the latter will alternate between returning a filename and
3363 returning false.
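
The alternation can be observed with a directory that contains exactly one matching file. This sketch creates such a directory with File::Temp and assumes the temporary path contains no whitespace or glob metacharacters.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
open my $fh, '>', "$dir/blurch1" or die "cannot create file: $!";
close $fh;

# List context: the single match is returned directly.
my ($file) = glob("$dir/blurch*");

# Scalar context: the same glob op is evaluated four times and
# alternates between the filename and undef.
my @defined;
for my $call (1 .. 4) {
    my $next = glob("$dir/blurch*");
    push @defined, defined $next ? 1 : 0;
}
```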
3364
3365 If you're trying to do variable interpolation, it's definitely better
3366 to use the glob() function, because the older notation can cause people
3367 to become confused with the indirect filehandle notation.
3368
3369 @files = glob("$dir/*.[ch]");
3370 @files = glob($files[$i]);
3371
3372 If an angle-bracket-based globbing expression is used as the condition
3373 of a "while" or "for" loop, then it will be implicitly assigned to $_.
3374 If either a globbing expression or an explicit assignment of a globbing
3375 expression to a scalar is used as a "while"/"for" condition, then the
3376 condition actually tests for definedness of the expression's value, not
3377 for its regular truth value.
3378
3379 Constant Folding
3380 Like C, Perl does a certain amount of expression evaluation at compile
3381 time whenever it determines that all arguments to an operator are
3382 static and have no side effects. In particular, string concatenation
3383 happens at compile time between literals that don't do variable
3384 substitution. Backslash interpolation also happens at compile time.
3385 You can say
3386
3387 'Now is the time for all'
3388 . "\n"
3389 . 'good men to come to.'
3390
3391 and this all reduces to one string internally. Likewise, if you say
3392
    foreach $file (@filenames) {
        if (-s $file > 5 + 100 * 2**16) { }
    }
3396
3397 the compiler precomputes the number which that expression represents so
3398 that the interpreter won't have to.
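
One way to observe the folding is to ask the core B::Deparse module to print the compiled form of a subroutine. This is a sketch; the subroutine and its name are made up for the example.

```perl
use strict;
use warnings;
use B::Deparse;

# The right-hand side of the comparison is made of constants only.
my $check = sub {
    my $size = shift;
    return $size > 5 + 100 * 2**16;
};

# The deparsed text should show the precomputed value 6553605
# in place of the original arithmetic.
my $text = B::Deparse->new->coderef2text($check);
```

The same information is available from the command line with "perl -MO=Deparse".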
3399
3400 No-ops
3401 Perl doesn't officially have a no-op operator, but the bare constants 0
3402 and 1 are special-cased not to produce a warning in void context, so
3403 you can for example safely do
3404
3405 1 while foo();
3406
3407 Bitwise String Operators
3408 Bitstrings of any size may be manipulated by the bitwise operators ("~
3409 | & ^").
3410
3411 If the operands to a binary bitwise op are strings of different sizes,
3412 | and ^ ops act as though the shorter operand had additional zero bits
3413 on the right, while the & op acts as though the longer operand were
3414 truncated to the length of the shorter. The granularity for such
3415 extension or truncation is one or more bytes.
3416
3417 # ASCII-based examples
    print "j p \n" ^ " a h";          # prints "JAPH\n"
    print "JA" | "  ph\n";            # prints "japh\n"
    print "japh\nJunk" & '_____';     # prints "JAPH\n"
    print 'p N$' ^ " E<H\n";          # prints "Perl\n"
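
The padding and truncation rules can be checked with operands of different lengths. A short ASCII-based sketch with made-up strings:

```perl
use strict;
use warnings;

# Same length: each character is XORed with a space (0x20),
# which flips the case bit.
my $xor = "abc" ^ "   ";

# "|" pads the shorter operand with zero bytes on the right,
# so the extra "cd" passes through unchanged.
my $or = "AB" | "  cd";

# "&" truncates to the length of the shorter operand.
my $and = "abcdef" & "___";
```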
3422
3423 If you are intending to manipulate bitstrings, be certain that you're
3424 supplying bitstrings: If an operand is a number, that will imply a
3425 numeric bitwise operation. You may explicitly show which type of
3426 operation you intend by using "" or "0+", as in the examples below.
3427
3428 $foo = 150 | 105; # yields 255 (0x96 | 0x69 is 0xFF)
3429 $foo = '150' | 105; # yields 255
3430 $foo = 150 | '105'; # yields 255
3431 $foo = '150' | '105'; # yields string '155' (under ASCII)
3432
3433 $baz = 0+$foo & 0+$bar; # both ops explicitly numeric
3434 $biz = "$foo" ^ "$bar"; # both ops explicitly stringy
3435
3436 This somewhat unpredictable behavior can be avoided with the "bitwise"
3437 feature, new in Perl 5.22. You can enable it via use feature 'bitwise'
3438 or "use v5.28". Before Perl 5.28, it used to emit a warning in the
3439 "experimental::bitwise" category. Under this feature, the four
3440 standard bitwise operators ("~ | & ^") are always numeric. Adding a
3441 dot after each operator ("~. |. &. ^.") forces it to treat its operands
3442 as strings:
3443
3444 use feature "bitwise";
3445 $foo = 150 | 105; # yields 255 (0x96 | 0x69 is 0xFF)
3446 $foo = '150' | 105; # yields 255
3447 $foo = 150 | '105'; # yields 255
3448 $foo = '150' | '105'; # yields 255
3449 $foo = 150 |. 105; # yields string '155'
3450 $foo = '150' |. 105; # yields string '155'
3451 $foo = 150 |.'105'; # yields string '155'
3452 $foo = '150' |.'105'; # yields string '155'
3453
3454 $baz = $foo & $bar; # both operands numeric
3455 $biz = $foo ^. $bar; # both operands stringy
3456
3457 The assignment variants of these operators ("&= |= ^= &.= |.= ^.=")
3458 behave likewise under the feature.
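
A compact sketch of the feature, including an assignment variant. It assumes Perl 5.28 or later, where the feature no longer emits an experimental warning.

```perl
use strict;
use warnings;
use feature 'bitwise';

# Plain operators are numeric even for string operands.
my $numeric = '150' | '105';

# Dotted operators are stringwise even for numeric operands.
my $stringy = 150 |. 105;

# Assignment variants follow the same rule.
my $mask = 150;
$mask |= 105;
my $text = '150';
$text |.= '105';
```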
3459
3460 It is a fatal error if an operand contains a character whose ordinal
3461 value is above 0xFF, and hence not expressible except in UTF-8. The
3462 operation is performed on a non-UTF-8 copy for other operands encoded
3463 in UTF-8. See "Byte and Character Semantics" in perlunicode.
3464
3465 See "vec" in perlfunc for information on how to manipulate individual
3466 bits in a bit vector.
3467
3468 Integer Arithmetic
3469 By default, Perl assumes that it must do most of its arithmetic in
3470 floating point. But by saying
3471
3472 use integer;
3473
3474 you may tell the compiler to use integer operations (see integer for a
3475 detailed explanation) from here to the end of the enclosing BLOCK. An
3476 inner BLOCK may countermand this by saying
3477
3478 no integer;
3479
3480 which lasts until the end of that BLOCK. Note that this doesn't mean
3481 everything is an integer, merely that Perl will use integer operations
3482 for arithmetic, comparison, and bitwise operators. For example, even
3483 under "use integer", if you take the sqrt(2), you'll still get
3484 1.4142135623731 or so.
3485
3486 Used on numbers, the bitwise operators ("&" "|" "^" "~" "<<" ">>")
3487 always produce integral results. (But see also "Bitwise String
3488 Operators".) However, "use integer" still has meaning for them. By
3489 default, their results are interpreted as unsigned integers, but if
3490 "use integer" is in effect, their results are interpreted as signed
3491 integers. For example, "~0" usually evaluates to a large integral
3492 value. However, "use integer; ~0" is -1 on two's-complement machines.
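
The scope and effect of the pragma can be sketched as follows. The "-1" result assumes a two's-complement machine, as noted above.

```perl
use strict;
use warnings;

my ($quotient, $complement, $root);
{
    use integer;
    $quotient   = 10 / 3;     # integer division
    $complement = ~0;         # interpreted as a signed integer
    $root       = sqrt(2);    # functions are not affected
}

# Outside the block the defaults apply again.
my $float_quotient = 10 / 3;
my $unsigned       = ~0;
```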
3493
3494 Floating-point Arithmetic
3495 While "use integer" provides integer-only arithmetic, there is no
3496 analogous mechanism to provide automatic rounding or truncation to a
3497 certain number of decimal places. For rounding to a certain number of
3498 digits, sprintf() or printf() is usually the easiest route. See
3499 perlfaq4.
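
For example, a value can be rounded for display like this (a small sketch; the numbers are arbitrary):

```perl
use strict;
use warnings;

# Round to two decimal places; the result is a string.
my $two_places = sprintf("%.2f", 3.14159);

# Round to the nearest whole number.
my $whole = sprintf("%.0f", 2.7);
```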
3500
3501 Floating-point numbers are only approximations to what a mathematician
3502 would call real numbers. There are infinitely more reals than floats,
3503 so some corners must be cut. For example:
3504
3505 printf "%.20g\n", 123456789123456789;
3506 # produces 123456789123456784
3507
3508 Testing for exact floating-point equality or inequality is not a good
3509 idea. Here's a (relatively expensive) work-around to compare whether
3510 two floating-point numbers are equal to a particular number of decimal
3511 places. See Knuth, volume II, for a more robust treatment of this
3512 topic.
3513
    sub fp_equal {
        my ($X, $Y, $POINTS) = @_;
        my ($tX, $tY);
        $tX = sprintf("%.${POINTS}g", $X);
        $tY = sprintf("%.${POINTS}g", $Y);
        return $tX eq $tY;
    }
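
A different and commonly used approach, not the one shown above, is to compare the difference of the two numbers against a tolerance. The sketch below is one possible form; the function name, the default tolerance, and the scaling rule are all assumptions made for the example.

```perl
use strict;
use warnings;
use List::Util qw(max);

# True if $x and $y differ by no more than $tolerance, scaled by
# the larger magnitude (but never by less than 1).
sub nearly_equal {
    my ($x, $y, $tolerance) = @_;
    $tolerance = 1e-9 unless defined $tolerance;
    my $scale = max(1, abs($x), abs($y));
    return abs($x - $y) <= $tolerance * $scale;
}

my $sum   = 0.1 + 0.2;
my $close = nearly_equal($sum, 0.3) ? 1 : 0;
my $far   = nearly_equal(1, 1.1)    ? 1 : 0;
```

A suitable tolerance depends on the application, so it should be chosen deliberately and not left at a default.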
3521
3522 The POSIX module (part of the standard perl distribution) implements
3523 ceil(), floor(), and other mathematical and trigonometric functions.
3524 The "Math::Complex" module (part of the standard perl distribution)
3525 defines mathematical functions that work on both the reals and the
3526 imaginary numbers. "Math::Complex" is not as efficient as POSIX, but
3527 POSIX can't work with complex numbers.
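
A brief sketch of both modules in use, with arbitrary sample values:

```perl
use strict;
use warnings;
use POSIX qw(floor ceil);
use Math::Complex;

my $down = floor(-3.5);    # rounds toward negative infinity
my $up   = ceil(3.2);      # rounds toward positive infinity

# With Math::Complex loaded, sqrt() accepts negative arguments
# and returns a complex number.
my $z = sqrt(-4);
```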
3528
3529 Rounding in financial applications can have serious implications, and
3530 the rounding method used should be specified precisely. In these
3531 cases, it probably pays not to trust whichever system rounding is being
3532 used by Perl, but to instead implement the rounding function you need
3533 yourself.
3534
3535 Bigger Numbers
3536 The standard "Math::BigInt", "Math::BigRat", and "Math::BigFloat"
3537 modules, along with the "bignum", "bigint", and "bigrat" pragmas,
3538 provide variable-precision arithmetic and overloaded operators,
3539 although they're currently pretty slow. At the cost of some space and
3540 considerable speed, they avoid the normal pitfalls associated with
3541 limited-precision representations.
3542
3543 use 5.010;
3544 use bigint; # easy interface to Math::BigInt
3545 $x = 123456789123456789;
3546 say $x * $x;
3547 +15241578780673678515622620750190521
3548
3549 Or with rationals:
3550
3551 use 5.010;
3552 use bigrat;
3553 $x = 3/22;
3554 $y = 4/6;
3555 say "x/y is ", $x/$y;
3556 say "x*y is ", $x*$y;
3557 x/y is 9/44
3558 x*y is 1/11
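
The same calculations can be written without the pragmas by creating the objects explicitly, which keeps ordinary numbers elsewhere in the program unaffected. A minimal sketch:

```perl
use strict;
use warnings;
use Math::BigInt;
use Math::BigRat;

my $x      = Math::BigInt->new("123456789123456789");
my $square = $x * $x;               # overloaded multiplication

my $ratio  = Math::BigRat->new("3/22") / Math::BigRat->new("4/6");
```

Passing the values as strings avoids any loss of precision before the objects are created.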
3559
3560 Several modules let you calculate with unlimited or fixed precision
3561 (bound only by memory and CPU time). There are also some non-standard
3562 modules that provide faster implementations via external C libraries.
3563
3564 Here is a short, but incomplete summary:
3565
    Math::String            treat string sequences like numbers
    Math::FixedPrecision    calculate with a fixed precision
    Math::Currency          for currency calculations
    Bit::Vector             manipulate bit vectors fast (uses C)
    Math::BigIntFast        Bit::Vector wrapper for big numbers
    Math::Pari              provides access to the Pari C library
    Math::Cephes            uses the external Cephes C library (no
                            big numbers)
    Math::Cephes::Fraction  fractions via the Cephes library
    Math::GMP               another one using an external C library
    Math::GMPz              an alternative interface to libgmp's big ints
    Math::GMPq              an interface to libgmp's fraction numbers
    Math::GMPf              an interface to libgmp's floating point numbers
3579
3580 Choose wisely.
3581
3582
3583
3584perl v5.38.2 2023-11-30 PERLOP(1)