PERLOP(1)              Perl Programmers Reference Guide              PERLOP(1)

NAME
perlop - Perl operators and precedence

DESCRIPTION
In Perl, the operator determines what operation is performed,
independent of the type of the operands. For example "$x + $y" is
always a numeric addition, and if $x or $y do not contain numbers, an
attempt is made to convert them to numbers first.

This is in contrast to many other dynamic languages, where the
operation is determined by the type of the first argument. It also
means that Perl has two versions of some operators, one for numeric and
one for string comparison. For example "$x == $y" compares two numbers
for equality, and "$x eq $y" compares two strings.
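
As a small illustration of the paragraph above, the following sketch
compares the same two values with a numeric and a string operator. The
variable names are made up for this example.

```perl
use strict;
use warnings;

# Two strings that denote the same number but differ as text.
my ($x, $y) = ("10", "10.0");

my $num_equal = ($x == $y) ? 1 : 0;   # == converts both to numbers: 10 == 10
my $str_equal = ($x eq $y) ? 1 : 0;   # eq compares the text: "10" vs "10.0"

# The operator, not the operand, selects the operation.
my $sum = "3" + "4";                  # numeric addition
my $cat = 3 . 4;                      # string concatenation

print "numeric equal: $num_equal, string equal: $str_equal\n";
print "sum: $sum, concatenation: $cat\n";
```

Here "==" reports the values as equal while "eq" does not, because only
the operator decides whether the operands are treated as numbers or as
strings.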

There are a few exceptions though: "x" can be either string repetition
or list repetition, depending on the type of the left operand, and "&",
"|", "^" and "~" can be either string or numeric bit operations.

Operator Precedence and Associativity
Operator precedence and associativity work in Perl more or less like
they do in mathematics.

Operator precedence means some operators group more tightly than
others. For example, in "2 + 4 * 5", the multiplication has higher
precedence, so "4 * 5" is grouped together as the right-hand operand of
the addition, rather than "2 + 4" being grouped together as the
left-hand operand of the multiplication. It is as if the expression were
written "2 + (4 * 5)", not "(2 + 4) * 5". So the expression yields "2 +
20 == 22", rather than "6 * 5 == 30".

Operator associativity defines what happens if a sequence of the same
operators is used one after another: usually that they will be grouped
at the left or the right. For example, in "9 - 3 - 2", subtraction is
left associative, so "9 - 3" is grouped together as the left-hand
operand of the second subtraction, rather than "3 - 2" being grouped
together as the right-hand operand of the first subtraction. It is as
if the expression were written "(9 - 3) - 2", not "9 - (3 - 2)". So the
expression yields "6 - 2 == 4", rather than "9 - 1 == 8".
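
The grouping rules above can be checked directly. This is only a
sketch; the variable names are invented for the example, and "**" is
used as an instance of a right-associative operator.

```perl
use strict;
use warnings;

my $by_precedence = 2 + 4 * 5;     # grouped as 2 + (4 * 5)
my $forced        = (2 + 4) * 5;   # parentheses override precedence

my $left_assoc    = 9 - 3 - 2;     # grouped as (9 - 3) - 2
my $right_assoc   = 2 ** 3 ** 2;   # grouped as 2 ** (3 ** 2), i.e. 2 ** 9

print "$by_precedence $forced $left_assoc $right_assoc\n";
```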

For simple operators that evaluate all their operands and then combine
the values in some way, precedence and associativity (and parentheses)
imply some ordering requirements on those combining operations. For
example, in "2 + 4 * 5", the grouping implied by precedence means that
the multiplication of 4 and 5 must be performed before the addition of
2 and 20, simply because the result of that multiplication is required
as one of the operands of the addition. But the order of operations is
not fully determined by this: in "2 * 2 + 4 * 5" both multiplications
must be performed before the addition, but the grouping does not say
anything about the order in which the two multiplications are
performed. In fact Perl has a general rule that the operands of an
operator are evaluated in left-to-right order. A few operators such as
"&&=" have special evaluation rules that can result in an operand not
being evaluated at all; in general, the top-level operator in an
expression has control of operand evaluation.
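
The following sketch makes the evaluation order visible by recording
each operand as it is evaluated. The helper "trace" and the labels
passed to it are hypothetical names used only for this illustration.

```perl
use strict;
use warnings;

my @order;

# Hypothetical helper: records its label, then returns the value.
sub trace {
    my ($label, $value) = @_;
    push @order, $label;
    return $value;
}

# Both multiplications happen before the addition; their operands
# are evaluated left to right.
my $result = trace(a => 2) * trace(b => 2) + trace(c => 4) * trace(d => 5);

# "&&=" evaluates its right operand only if the left side is true.
my $flag = 0;
$flag &&= trace(skipped => 1);

print "result: $result, order: @order\n";
```

The label "skipped" never shows up in @order, because the false left
side of "&&=" suppresses evaluation of the right operand.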

Some comparison operators, in place of ordinary associativity, chain
with other operators of the same precedence (but never with operators
of different precedence). This chaining means that each comparison is
performed on the two arguments surrounding it, with each interior
argument taking part in two comparisons, and the comparison results are
implicitly ANDed. Thus "$x < $y <= $z" behaves exactly like
"$x < $y && $y <= $z", assuming that "$y" is as simple a scalar as it
looks. The ANDing short-circuits just like "&&" does, stopping the
sequence of comparisons as soon as one yields false.

In a chained comparison, each argument expression is evaluated at most
once, even if it takes part in two comparisons, but the result of the
evaluation is fetched for each comparison. (It is not evaluated at all
if the short-circuiting means that it's not required for any
comparisons.) This matters if the computation of an interior argument
is expensive or non-deterministic. For example,

    if($x < expensive_sub() <= $z) { ...

is not entirely like

    if($x < expensive_sub() && expensive_sub() <= $z) { ...

but instead closer to

    my $tmp = expensive_sub();
    if($x < $tmp && $tmp <= $z) { ...

in that the subroutine is only called once. However, it's not exactly
like this latter code either, because the chained comparison doesn't
actually involve any temporary variable (named or otherwise): there is
no assignment. This doesn't make much difference where the expression
is a call to an ordinary subroutine, but matters more with an lvalue
subroutine, or if the argument expression yields some unusual kind of
scalar by other means. For example, if the argument expression yields
a tied scalar, then the expression is evaluated to produce that scalar
at most once, but the value of that scalar may be fetched up to twice,
once for each comparison in which it is actually used.
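
A minimal sketch of the single-evaluation rule, assuming a Perl recent
enough to support chained comparisons (5.32 or later). The call
counters and the subroutine "counted" are invented for the example.

```perl
use v5.32;
use warnings;

my $calls = 0;
sub expensive_sub { $calls++; return 5 }

# The interior argument takes part in two comparisons
# but the subroutine is called only once.
my $in_range = (1 < expensive_sub() <= 10) ? 1 : 0;

my $late_calls = 0;
sub counted { $late_calls++; return 5 }

# The first comparison is false, so the chain short-circuits
# and the last argument is never evaluated.
my ($low, $mid) = (10, 3);
my $chain = ($low < $mid <= counted()) ? 1 : 0;

say "in range: $in_range, calls: $calls, late calls: $late_calls";
```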

In this example, the expression is evaluated only once, and the tied
scalar (the result of the expression) is fetched for each comparison
that uses it.

    if ($x < $tied_scalar < $z) { ...

In the next example, the expression is evaluated only once, and the
tied scalar is fetched once as part of the operation within the
expression. The result of that operation is fetched for each
comparison, which normally doesn't matter unless that expression result
is also magical due to operator overloading.

    if ($x < $tied_scalar + 42 < $z) { ...

Some operators are instead non-associative, meaning that it is a syntax
error to use a sequence of those operators of the same precedence. For
example, "$x .. $y .. $z" is an error.

Perl operators have the following associativity and precedence, listed
from highest precedence to lowest. Operators borrowed from C keep the
same precedence relationship with each other, even where C's precedence
is slightly screwy. (This makes learning Perl easier for C folks.)
With very few exceptions, these all operate on scalar values only, not
array values.

    left        terms and list operators (leftward)
    left        ->
    nonassoc    ++ --
    right       **
    right       ! ~ \ and unary + and -
    left        =~ !~
    left        * / % x
    left        + - .
    left        << >>
    nonassoc    named unary operators
    chained     < > <= >= lt gt le ge
    chain/na    == != eq ne <=> cmp ~~
    nonassoc    isa
    left        &
    left        | ^
    left        &&
    left        || //
    nonassoc    ..  ...
    right       ?:
    right       = += -= *= etc. goto last next redo dump
    left        , =>
    nonassoc    list operators (rightward)
    right       not
    left        and
    left        or xor

In the following sections, these operators are covered in detail, in
the same order in which they appear in the table above.

Many operators can be overloaded for objects. See overload.

Terms and List Operators (Leftward)
TERMs have the highest precedence in Perl. They include variables,
quote and quote-like operators, any expression in parentheses, and any
function whose arguments are parenthesized. Actually, there aren't
really functions in this sense, just list operators and unary operators
behaving as functions because you put parentheses around the arguments.
These are all documented in perlfunc.

If any list operator ("print()", etc.) or any unary operator
("chdir()", etc.) is followed by a left parenthesis as the next token,
the operator and arguments within parentheses are taken to be of
highest precedence, just like a normal function call.

In the absence of parentheses, the precedence of list operators such as
"print", "sort", or "chmod" is either very high or very low depending
on whether you are looking at the left side or the right side of the
operator. For example, in

    @ary = (1, 3, sort 4, 2);
    print @ary;         # prints 1324

the commas on the right of the "sort" are evaluated before the "sort",
but the commas on the left are evaluated after. In other words, list
operators tend to gobble up all arguments that follow, and then act
like a simple TERM with regard to the preceding expression. Be careful
with parentheses:

    # These evaluate exit before doing the print:
    print($foo, exit);  # Obviously not what you want.
    print $foo, exit;   # Nor is this.

    # These do the print before evaluating exit:
    (print $foo), exit; # This is what you want.
    print($foo), exit;  # Or this.
    print ($foo), exit; # Or even this.

Also note that

    print ($foo & 255) + 1, "\n";

probably doesn't do what you expect at first glance. The parentheses
enclose the argument list for "print" which is evaluated (printing the
result of "$foo & 255"). Then one is added to the return value of
"print" (usually 1). The result is something like this:

    1 + 1, "\n";    # Obviously not what you meant.

To do what you meant properly, you must write:

    print(($foo & 255) + 1, "\n");

See "Named Unary Operators" for more discussion of this.
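
The same parenthesis rule applies to user-defined subroutines. The
sketch below uses a hypothetical subroutine "total" (not part of Perl)
to show how a parenthesis directly after the name ends the argument
list early.

```perl
use strict;
use warnings;

# Hypothetical list operator: sums its arguments.
sub total { my $sum = 0; $sum += $_ for @_; return $sum }

# sort gobbles up everything to its right: sort(4, 2).
my @ary = (1, 3, sort 4, 2);

# The parentheses after "total" enclose the whole argument list,
# so this is (total(3) * 10, 5), a two-element list.
my @surprise = (total(1 + 2) * 10, 5);

# An extra pair of parentheses passes everything to total().
my @intended = (total((1 + 2) * 10, 5));

print "@ary | @surprise | @intended\n";
```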

Also parsed as terms are the "do {}" and "eval {}" constructs, as well
as subroutine and method calls, and the anonymous constructors "[]" and
"{}".

See also "Quote and Quote-like Operators" toward the end of this
section, as well as "I/O Operators".

The Arrow Operator
""->"" is an infix dereference operator, just as it is in C and C++.
If the right side is either a "[...]", "{...}", or a "(...)" subscript,
then the left side must be either a hard or symbolic reference to an
array, a hash, or a subroutine respectively. (Or technically speaking,
a location capable of holding a hard reference, if it's an array or
hash reference being used for assignment.) See perlreftut and perlref.

Otherwise, the right side is a method name or a simple scalar variable
containing either the method name or a subroutine reference, and the
left side must be either an object (a blessed reference) or a class
name (that is, a package name). See perlobj.

The dereferencing cases (as opposed to method-calling cases) are
somewhat extended by the "postderef" feature. For the details of that
feature, consult "Postfix Dereference Syntax" in perlref.

Auto-increment and Auto-decrement
"++" and "--" work as in C. That is, if placed before a variable, they
increment or decrement the variable by one before returning the value,
and if placed after, increment or decrement after returning the value.

    $i = 0;  $j = 0;
    print $i++;  # prints 0
    print ++$j;  # prints 1

Note that just as in C, Perl doesn't define when the variable is
incremented or decremented. You just know it will be done sometime
before or after the value is returned. This also means that modifying
a variable twice in the same statement will lead to undefined behavior.
Avoid statements like:

    $i = $i ++;
    print ++ $i + $i ++;

Perl will not guarantee what the result of the above statements is.

The auto-increment operator has a little extra builtin magic to it. If
you increment a variable that is numeric, or that has ever been used in
a numeric context, you get a normal increment. If, however, the
variable has been used in only string contexts since it was set, and
has a value that is not the empty string and matches the pattern
"/^[a-zA-Z]*[0-9]*\z/", the increment is done as a string, preserving
each character within its range, with carry:

    print ++($foo = "99");      # prints "100"
    print ++($foo = "a0");      # prints "a1"
    print ++($foo = "Az");      # prints "Ba"
    print ++($foo = "zz");      # prints "aaa"

"undef" is always treated as numeric, and in particular is changed to 0
before incrementing (so that a post-increment of an undef value will
return 0 rather than "undef").

The auto-decrement operator is not magical.
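
The string increment and the treatment of "undef" can be tried out with
a short sketch like the following. The starting values "a9" and "Zz"
are additional cases chosen for this illustration.

```perl
use strict;
use warnings;

# Strings matching /^[a-zA-Z]*[0-9]*\z/ are incremented as strings,
# with carry inside each character range.
my @results;
for my $start ("99", "a0", "Az", "zz", "a9", "Zz") {
    my $value = $start;
    $value++;
    push @results, $value;
}

# undef is treated as 0, so the post-increment returns 0.
my $counter;
my $returned = $counter++;

print "@results\n";
print "returned: $returned, counter is now: $counter\n";
```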

Exponentiation
Binary "**" is the exponentiation operator. It binds even more tightly
than unary minus, so "-2**4" is "-(2**4)", not "(-2)**4". (This is
implemented using C's pow(3) function, which actually works on doubles
internally.)

Note that certain exponentiation expressions are ill-defined: these
include "0**0", "1**Inf", and "Inf**0". Do not expect any particular
results from these special cases; the results are platform-dependent.
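
A brief sketch of how "**" interacts with unary minus; the variable
names are illustrative only.

```perl
use strict;
use warnings;

my $negated = -2 ** 4;      # parsed as -(2 ** 4)
my $grouped = (-2) ** 4;    # parentheses make the base negative
my $inverse = 2 ** -1;      # a negative exponent yields a fraction

print "$negated $grouped $inverse\n";
```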

Symbolic Unary Operators
Unary "!" performs logical negation, that is, "not". See also "not"
for a lower precedence version of this.

Unary "-" performs arithmetic negation if the operand is numeric,
including any string that looks like a number. If the operand is an
identifier, a string consisting of a minus sign concatenated with the
identifier is returned. Otherwise, if the string starts with a plus or
minus, a string starting with the opposite sign is returned. One
effect of these rules is that "-bareword" is equivalent to the string
"-bareword". If, however, the string begins with a non-alphabetic
character (excluding "+" or "-"), Perl will attempt to convert the
string to a numeric, and the arithmetic negation is performed. If the
string cannot be cleanly converted to a numeric, Perl will give the
warning 'Argument "the string" isn't numeric in negation (-) at ...'.
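
The string cases of unary minus are easy to overlook, so here is a
small sketch of them. The strings and the "-size" option name are
arbitrary examples.

```perl
use strict;
use warnings;

my $number   = -"10";      # looks like a number: arithmetic negation
my $ident    = -"foo";     # identifier: minus sign is prepended
my $flipped  = -"-foo";    # leading minus becomes a plus
my $flipped2 = -"+bar";    # leading plus becomes a minus

# "-bareword" is the string "-bareword", which is why option-style
# hash keys such as -size work.
my %options = (-size => 10);
my ($key)   = keys %options;

print "$number $ident $flipped $flipped2 $key\n";
```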

Unary "~" performs bitwise negation, that is, 1's complement. For
example, "0666 & ~027" is 0640. (See also "Integer Arithmetic" and
"Bitwise String Operators".) Note that the width of the result is
platform-dependent: "~0" is 32 bits wide on a 32-bit platform, but 64
bits wide on a 64-bit platform, so if you are expecting a certain bit
width, remember to use the "&" operator to mask off the excess bits.
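
Masking makes the result of "~" independent of the platform's integer
width. A minimal sketch, with mask widths chosen only for
illustration:

```perl
use strict;
use warnings;

# Clear the bits of 027 from 0666 (a umask-style calculation).
my $perm = 0666 & ~027;

# ~0 has every bit set; masking keeps a known width.
my $low16 = ~0 & 0xFFFF;

# Complement of 0x0F restricted to a single byte.
my $byte = ~0x0F & 0xFF;

printf "%04o %#x %#x\n", $perm, $low16, $byte;
```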

Starting in Perl 5.28, it is a fatal error to try to complement a
string containing a character with an ordinal value above 255.

If the "bitwise" feature is enabled via "use feature 'bitwise'" or "use
v5.28", then unary "~" always treats its argument as a number, and an
alternate form of the operator, "~.", always treats its argument as a
string. So "~0" and "~"0"" will both give 2**32-1 on 32-bit platforms,
whereas "~.0" and "~."0"" will both yield "\xff". Until Perl 5.28,
this feature produced a warning in the "experimental::bitwise"
category.

Unary "+" has no effect whatsoever, even on strings. It is useful
syntactically for separating a function name from a parenthesized
expression that would otherwise be interpreted as the complete list of
function arguments. (See examples above under "Terms and List
Operators (Leftward)".)

Unary "\" creates references. If its operand is a single sigilled
thing, it creates a reference to that object. If its operand is a
parenthesised list, then it creates references to the things mentioned
in the list. Otherwise it puts its operand in list context, and
creates a list of references to the scalars in the list provided by the
operand. See perlreftut and perlref. Do not confuse this behavior
with the behavior of backslash within a string, although both forms do
convey the notion of protecting the next thing from interpolation.

Binding Operators
Binary "=~" binds a scalar expression to a pattern match. Certain
operations search or modify the string $_ by default. This operator
makes that kind of operation work on some other string. The right
argument is a search pattern, substitution, or transliteration. The
left argument is what is supposed to be searched, substituted, or
transliterated instead of the default $_. When used in scalar context,
the return value generally indicates the success of the operation. The
exceptions are substitution ("s///") and transliteration ("y///") with
the "/r" (non-destructive) option, which cause the return value to be
the result of the substitution. Behavior in list context depends on
the particular operator. See "Regexp Quote-Like Operators" for details
and perlretut for examples using these operators.

If the right argument is an expression rather than a search pattern,
substitution, or transliteration, it is interpreted as a search pattern
at run time. Note that this means that its contents will be
interpolated twice, so

    '\\' =~ q'\\';

is not ok, as the regex engine will end up trying to compile the
pattern "\", which it will consider a syntax error.

Binary "!~" is just like "=~" except the return value is negated in the
logical sense.

Binary "!~" with a non-destructive substitution ("s///r") or
transliteration ("y///r") is a syntax error.
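
The sketch below runs through the binding forms described in this
section. The sample text and patterns are invented, and the "/r"
option assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

my $text = "Hello, world";

my $matched = ($text =~ /world/) ? 1 : 0;   # bind a match to $text
my $absent  = ($text !~ /perl/i) ? 1 : 0;   # negated match

# With /r the substitution returns the new string
# and leaves $text untouched.
my $copy = $text =~ s/world/Perl/r;

# A right-hand expression is used as a pattern at run time.
my $pattern = 'w.rld';
my $dynamic = ($text =~ $pattern) ? 1 : 0;

print "$matched $absent $dynamic '$copy' '$text'\n";
```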

Multiplicative Operators
Binary "*" multiplies two numbers.

Binary "/" divides two numbers.

Binary "%" is the modulo operator, which computes the division
remainder of its first argument with respect to its second argument.
Given integer operands $m and $n: If $n is positive, then "$m % $n" is
$m minus the largest multiple of $n less than or equal to $m. If $n is
negative, then "$m % $n" is $m minus the smallest multiple of $n that
is not less than $m (that is, the result will be less than or equal to
zero). If the operands $m and $n are floating point values and the
absolute value of $n (that is "abs($n)") is less than "(UV_MAX + 1)",
only the integer portion of $m and $n will be used in the operation
(Note: here "UV_MAX" means the maximum of the unsigned integer type).
If the absolute value of the right operand ("abs($n)") is greater than
or equal to "(UV_MAX + 1)", "%" computes the floating-point remainder
$r in the equation "($r = $m - $i*$n)" where $i is a certain integer
that makes $r have the same sign as the right operand $n (not as the
left operand $m like C function "fmod()") and the absolute value less
than that of $n. Note that when "use integer" is in scope, "%" gives
you direct access to the modulo operator as implemented by your C
compiler. This operator is not as well defined for negative operands,
but it will execute faster.
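
The sign rule for "%" is easiest to see with all four sign
combinations. This sketch uses small integers chosen for illustration
and deliberately leaves out "use integer", whose results for negative
operands depend on the C compiler.

```perl
use strict;
use warnings;

# Each pair is [$m, $n]; the result takes the sign of $n.
my @pairs = ([7, 3], [-7, 3], [7, -3], [-7, -3]);
my @remainders = map { $_->[0] % $_->[1] } @pairs;

# -7 % 3: the largest multiple of 3 not above -7 is -9,
# and -7 - (-9) is 2.
print "@remainders\n";
```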

Binary "x" is the repetition operator. In scalar context, or if the
left operand is neither enclosed in parentheses nor a "qw//" list, it
performs a string repetition. In that case it supplies scalar context
to the left operand, and returns a string consisting of the left
operand string repeated the number of times specified by the right
operand. If the "x" is in list context, and the left operand is either
enclosed in parentheses or a "qw//" list, it performs a list
repetition. In that case it supplies list context to the left operand,
and returns a list consisting of the left operand list repeated the
number of times specified by the right operand. If the right operand
is zero or negative (raising a warning on negative), it returns an
empty string or an empty list, depending on the context.

    print '-' x 80;             # print row of dashes

    print "\t" x ($tab/8), ' ' x ($tab%8);      # tab over

    @ones = (1) x 80;           # a list of 80 1's
    @ones = (5) x @ones;        # set all elements to 5

Additive Operators
Binary "+" returns the sum of two numbers.

Binary "-" returns the difference of two numbers.

Binary "." concatenates two strings.

Shift Operators
Binary "<<" returns the value of its left argument shifted left by the
number of bits specified by the right argument. Arguments should be
integers. (See also "Integer Arithmetic".)

Binary ">>" returns the value of its left argument shifted right by the
number of bits specified by the right argument. Arguments should be
integers. (See also "Integer Arithmetic".)

If "use integer" (see "Integer Arithmetic") is in force then signed C
integers are used (arithmetic shift), otherwise unsigned C integers are
used (logical shift), even for negative shiftees. In arithmetic right
shift the sign bit is replicated on the left, in logical shift zero
bits come in from the left.

Either way, the implementation isn't going to generate results larger
than the size of the integer type Perl was built with (32 bits or 64
bits).

Shifting by a negative number of bits means the reverse shift: left
shift becomes right shift, right shift becomes left shift. This is
unlike in C, where negative shift is undefined.

Shifting by more bits than the size of the integers usually yields
zero (all bits fall off), except that under "use integer" right
overshifting a negative shiftee results in -1. This is unlike in C,
where shifting by too many bits is undefined. A common C behavior is
"shift by modulo wordbits", so that for example

    1 >> 64 == 1 >> (64 % 64) == 1 >> 0 == 1  # Common C behavior.

but that is completely accidental.
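
The following sketch exercises the shift rules described above. It
assumes a Perl in which these rules are defined (they were settled in
Perl 5.24, to the best of our knowledge); the shift counts are
arbitrary.

```perl
use strict;
use warnings;

my ($one, $sixteen, $minus_eight) = (1, 16, -8);

my $plain    = $one << 4;        # ordinary left shift
my $reversed = $sixteen << -2;   # negative count: acts as 16 >> 2
my $overflow = $one << 200;      # more bits than the integer holds

my ($arith, $arith_over);
{
    use integer;                      # signed (arithmetic) shifts
    $arith      = $minus_eight >> 1;  # sign bit is replicated
    $arith_over = $minus_eight >> 200;
}

print "$plain $reversed $overflow $arith $arith_over\n";
```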

If you get tired of being subject to your platform's native integers,
the "use bigint" pragma neatly sidesteps the issue altogether:

    print 20 << 20;  # 20971520
    print 20 << 40;  # 5120 on 32-bit machines,
                     # 21990232555520 on 64-bit machines
    use bigint;
    print 20 << 100; # 25353012004564588029934064107520

Named Unary Operators
The various named unary operators are treated as functions with one
argument, with optional parentheses.

If any list operator ("print()", etc.) or any unary operator
("chdir()", etc.) is followed by a left parenthesis as the next token,
the operator and arguments within parentheses are taken to be of
highest precedence, just like a normal function call. For example,
because named unary operators are higher precedence than "||":

    chdir $foo    || die;       # (chdir $foo) || die
    chdir($foo)   || die;       # (chdir $foo) || die
    chdir ($foo)  || die;       # (chdir $foo) || die
    chdir +($foo) || die;       # (chdir $foo) || die

but, because "*" is higher precedence than named operators:

    chdir $foo * 20;    # chdir ($foo * 20)
    chdir($foo) * 20;   # (chdir $foo) * 20
    chdir ($foo) * 20;  # (chdir $foo) * 20
    chdir +($foo) * 20; # chdir ($foo * 20)

    rand 10 * 20;       # rand (10 * 20)
    rand(10) * 20;      # (rand 10) * 20
    rand (10) * 20;     # (rand 10) * 20
    rand +(10) * 20;    # rand (10 * 20)

Regarding precedence, the filetest operators, like "-f", "-M", etc. are
treated like named unary operators, but they don't follow this
functional parenthesis rule. That means, for example, that
"-f($file).".bak"" is equivalent to "-f "$file.bak"".

See also "Terms and List Operators (Leftward)".
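
Because "chdir" and "rand" are awkward to check in a script, the sketch
below shows the same precedence rules with "length", another named
unary operator. The sample string is arbitrary.

```perl
use strict;
use warnings;

my $s = "ab";

# "." binds more tightly than a named unary operator,
# so this is length($s . "xyz").
my $len_of_concat = length $s . "xyz";

# Parentheses right after the name end the argument list.
my $concat_of_len = length($s) . "xyz";

# "||" binds less tightly, so this is (length $s) || 7.
my $or_default = length $s || 7;

print "$len_of_concat $concat_of_len $or_default\n";
```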

Relational Operators
Perl operators that return true or false generally return values that
can be safely used as numbers. For example, the relational operators
in this section and the equality operators in the next one return 1 for
true and a special version of the defined empty string, "", which
counts as a zero but is exempt from warnings about improper numeric
conversions, just as "0 but true" is.
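
The special false value can be told apart from an ordinary empty string
by the warnings it does not produce. In this sketch the warning handler
and the variable names are assumptions made for the example.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };

my $true  = (1 < 2);
my $false = (2 < 1);

# The false value is a defined empty string that is also a quiet zero.
my $from_false    = $false + 5;
my $quiet_so_far  = scalar @warnings;

# An ordinary empty string warns when used as a number.
my $empty      = "";
my $from_empty = $empty + 5;
my $after_empty = scalar @warnings;

print "true: '$true', false: '$false', warnings: $after_empty\n";
```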

Binary "<" returns true if the left argument is numerically less than
the right argument.

Binary ">" returns true if the left argument is numerically greater
than the right argument.

Binary "<=" returns true if the left argument is numerically less than
or equal to the right argument.

Binary ">=" returns true if the left argument is numerically greater
than or equal to the right argument.

Binary "lt" returns true if the left argument is stringwise less than
the right argument.

Binary "gt" returns true if the left argument is stringwise greater
than the right argument.

Binary "le" returns true if the left argument is stringwise less than
or equal to the right argument.

Binary "ge" returns true if the left argument is stringwise greater
than or equal to the right argument.

A sequence of relational operators, such as "$x < $y <= $z", performs
chained comparisons, in the manner described above in the section
"Operator Precedence and Associativity". Beware that they do not chain
with equality operators, which have lower precedence.

Equality Operators
Binary "==" returns true if the left argument is numerically equal to
the right argument.

Binary "!=" returns true if the left argument is numerically not equal
to the right argument.

Binary "eq" returns true if the left argument is stringwise equal to
the right argument.

Binary "ne" returns true if the left argument is stringwise not equal
to the right argument.

A sequence of the above equality operators, such as "$x == $y == $z",
performs chained comparisons, in the manner described above in the
section "Operator Precedence and Associativity". Beware that they do
not chain with relational operators, which have higher precedence.

Binary "<=>" returns -1, 0, or 1 depending on whether the left argument
is numerically less than, equal to, or greater than the right argument.
If your platform supports "NaN"'s (not-a-numbers) as numeric values,
using them with "<=>" returns undef. "NaN" is not "<", "==", ">", "<="
or ">=" anything (even "NaN"), so those 5 return false. "NaN != NaN"
returns true, as does "NaN !=" anything else. If your platform doesn't
support "NaN"'s then "NaN" is just a string with numeric value 0.

    $ perl -le '$x = "NaN"; print "No NaN support here" if $x == $x'
    $ perl -le '$x = "NaN"; print "NaN support here" if $x != $x'

(Note that the bigint, bigrat, and bignum pragmas all support "NaN".)

Binary "cmp" returns -1, 0, or 1 depending on whether the left argument
is stringwise less than, equal to, or greater than the right argument.
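
The difference between "<=>" and "cmp" shows up most clearly in
"sort". The sketch below also probes the "NaN" case; producing "NaN" by
subtracting infinity from itself is an assumption that holds on
platforms with IEEE floating point.

```perl
use strict;
use warnings;

my @values = (10, 9, 100);

my @numeric = sort { $a <=> $b } @values;   # numeric order
my @textual = sort { $a cmp $b } @values;   # string order

# Assumes IEEE floating point: 9**9**9 overflows to infinity,
# and infinity minus infinity is NaN.
my $inf = 9**9**9;
my $nan = $inf - $inf;
my $has_nan    = ($nan != $nan) ? 1 : 0;
my $nan_vs_one = $nan <=> 1;                # undef where NaN is supported

print "numeric: @numeric\n";
print "textual: @textual\n";
print "NaN supported: $has_nan\n";
```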

Binary "~~" does a smartmatch between its arguments. Smart matching is
described in the next section.

The two-sided ordering operators "<=>" and "cmp", and the smartmatch
operator "~~", are non-associative with respect to each other and with
respect to the equality operators of the same precedence.

"lt", "le", "ge", "gt" and "cmp" use the collation (sort) order
specified by the current "LC_COLLATE" locale if a "use locale" form
that includes collation is in effect. See perllocale. Do not mix
these with Unicode, only use them with legacy 8-bit locale encodings.
The standard "Unicode::Collate" and "Unicode::Collate::Locale" modules
offer much more powerful solutions to collation issues.

For case-insensitive comparisons, look at the "fc" case-folding
function (see "fc" in perlfunc), available in Perl v5.16 or later:

    if ( fc($x) eq fc($y) ) { ... }

Class Instance Operator
Binary "isa" evaluates to true when the left argument is an object
instance of the class (or a subclass derived from that class) given by
the right argument. If the left argument is not defined, is not a
blessed object instance, or does not derive from the class given by the
right argument, the operator evaluates as false. The right argument may
give the class either as a bareword or a scalar expression that yields
a string class name:

    if( $obj isa Some::Class ) { ... }

    if( $obj isa "Different::Class" ) { ... }
    if( $obj isa $name_of_class ) { ... }

This is an experimental feature and is available from Perl 5.31.6 when
enabled by "use feature 'isa'". It emits a warning in the
"experimental::isa" category.
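
A minimal sketch of "isa", assuming Perl 5.32 or later. The classes
"Animal" and "Dog" are hypothetical and exist only for this example.

```perl
use v5.32;
use warnings;
use feature 'isa';
no warnings 'experimental::isa';

# Hypothetical classes: Dog derives from Animal.
package Animal { sub new { my ($class) = @_; return bless {}, $class } }
package Dog    { our @ISA = ('Animal') }

my $dog        = Dog->new;
my $plain_ref  = {};        # a reference, but not blessed
my $nothing;                # undefined
my $class_name = "Animal";

my $is_animal  = ($dog isa Animal)       ? 1 : 0;   # subclass instance
my $by_name    = ($dog isa $class_name)  ? 1 : 0;   # class given as string
my $unblessed  = ($plain_ref isa Animal) ? 1 : 0;
my $undefined  = ($nothing isa Animal)   ? 1 : 0;

say "$is_animal $by_name $unblessed $undefined";
```

Unlike calling "$obj->isa(...)" as a method, the operator does not die
when the left argument is undefined or unblessed; it simply evaluates
as false.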

Smartmatch Operator
First available in Perl 5.10.1 (the 5.10.0 version behaved
differently), binary "~~" does a "smartmatch" between its arguments.
This is mostly used implicitly in the "when" construct described in
perlsyn, although not all "when" clauses call the smartmatch operator.
Unique among all of Perl's operators, the smartmatch operator can
recurse. The smartmatch operator is experimental and its behavior is
subject to change.

It is also unique in that all other Perl operators impose a context
(usually string or numeric context) on their operands, autoconverting
those operands to those imposed contexts. In contrast, smartmatch
infers contexts from the actual types of its operands and uses that
type information to select a suitable comparison mechanism.

The "~~" operator compares its operands "polymorphically", determining
how to compare them according to their actual types (numeric, string,
array, hash, etc.). Like the equality operators with which it shares
the same precedence, "~~" returns 1 for true and "" for false. It is
often best read aloud as "in", "inside of", or "is contained in",
because the left operand is often looked for inside the right operand.
That makes the order of the operands to the smartmatch operator often
opposite that of the regular match operator. In other words, the
"smaller" thing is usually placed in the left operand and the larger
one in the right.

The behavior of a smartmatch depends on what type of things its
arguments are, as determined by the following table. The first row of
the table whose types apply determines the smartmatch behavior.
Because what actually happens is mostly determined by the type of the
second operand, the table is sorted on the right operand instead of on
the left.
629
630 Left Right Description and pseudocode
631 ===============================================================
632 Any undef check whether Any is undefined
633 like: !defined Any
634
635 Any Object invoke ~~ overloading on Object, or die
636
637 Right operand is an ARRAY:
638
639 Left Right Description and pseudocode
640 ===============================================================
641 ARRAY1 ARRAY2 recurse on paired elements of ARRAY1 and ARRAY2[2]
642 like: (ARRAY1[0] ~~ ARRAY2[0])
643 && (ARRAY1[1] ~~ ARRAY2[1]) && ...
644 HASH ARRAY any ARRAY elements exist as HASH keys
645 like: grep { exists HASH->{$_} } ARRAY
646 Regexp ARRAY any ARRAY elements pattern match Regexp
647 like: grep { /Regexp/ } ARRAY
648 undef ARRAY undef in ARRAY
649 like: grep { !defined } ARRAY
650 Any ARRAY smartmatch each ARRAY element[3]
651 like: grep { Any ~~ $_ } ARRAY
652
653 Right operand is a HASH:
654
655 Left Right Description and pseudocode
656 ===============================================================
657 HASH1 HASH2 all same keys in both HASHes
658 like: keys HASH1 ==
659 grep { exists HASH2->{$_} } keys HASH1
660 ARRAY HASH any ARRAY elements exist as HASH keys
661 like: grep { exists HASH->{$_} } ARRAY
662 Regexp HASH any HASH keys pattern match Regexp
663 like: grep { /Regexp/ } keys HASH
664 undef HASH always false (undef can't be a key)
665 like: 0 == 1
666 Any HASH HASH key existence
667 like: exists HASH->{Any}
668
669 Right operand is CODE:
670
671 Left Right Description and pseudocode
672 ===============================================================
673 ARRAY CODE sub returns true on all ARRAY elements[1]
674 like: !grep { !CODE->($_) } ARRAY
675 HASH CODE sub returns true on all HASH keys[1]
676 like: !grep { !CODE->($_) } keys HASH
677 Any CODE sub passed Any returns true
678 like: CODE->(Any)
679
680 Right operand is a Regexp:
681
    Left      Right      Description and pseudocode
    ===============================================================
    ARRAY     Regexp     any ARRAY elements match Regexp
                   like: grep { /Regexp/ } ARRAY
    HASH      Regexp     any HASH keys match Regexp
                   like: grep { /Regexp/ } keys HASH
    Any       Regexp     pattern match
                   like: Any =~ /Regexp/
690
691 Other:
692
    Left      Right      Description and pseudocode
    ===============================================================
    Object    Any        invoke ~~ overloading on Object,
                         or fall back to...

    Any       Num        numeric equality
                   like: Any == Num
    Num       nummy[4]   numeric equality
                   like: Num == nummy
    undef     Any        check whether undefined
                   like: !defined(Any)
    Any       Any        string equality
                   like: Any eq Any
706
707 Notes:
708
709 1. Empty hashes or arrays match.
710 2. That is, each element smartmatches the element of the same index in
711 the other array.[3]
712 3. If a circular reference is found, fall back to referential equality.
713 4. Either an actual number, or a string that looks like one.
714
715 The smartmatch implicitly dereferences any non-blessed hash or array
716 reference, so the "HASH" and "ARRAY" entries apply in those cases. For
717 blessed references, the "Object" entries apply. Smartmatches involving
718 hashes only consider hash keys, never hash values.
719
720 The "like" code entry is not always an exact rendition. For example,
721 the smartmatch operator short-circuits whenever possible, but "grep"
722 does not. Also, "grep" in scalar context returns the number of
723 matches, but "~~" returns only true or false.
724
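The gap between the "like" pseudocode and a short-circuiting test can
be sketched in plain Perl, without using "~~" at all. This is only an
illustration of the string-equality case; the helper name any_eq is
hypothetical and not part of Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 as soon as one element equals $want,
# without examining the rest of the list.
sub any_eq {
    my ($want, @list) = @_;
    for my $item (@list) {
        return 1 if defined $item && $item eq $want;
    }
    return 0;
}

my @colors = qw(red blue red green);

# grep visits every element and, in scalar context, yields a count.
my $count = grep { $_ eq "red" } @colors;

# The helper stops at the first match and yields only true or false.
my $found = any_eq("red", @colors);

print "grep count: $count\n";
print "any_eq:     $found\n";
```
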
725 Unlike most operators, the smartmatch operator knows to treat "undef"
726 specially:
727
728 use v5.10.1;
729 @array = (1, 2, 3, undef, 4, 5);
730 say "some elements undefined" if undef ~~ @array;
731
732 Each operand is considered in a modified scalar context, the
733 modification being that array and hash variables are passed by
734 reference to the operator, which implicitly dereferences them. Both
735 elements of each pair are the same:
736
737 use v5.10.1;
738
739 my %hash = (red => 1, blue => 2, green => 3,
740 orange => 4, yellow => 5, purple => 6,
741 black => 7, grey => 8, white => 9);
742
743 my @array = qw(red blue green);
744
745 say "some array elements in hash keys" if @array ~~ %hash;
746 say "some array elements in hash keys" if \@array ~~ \%hash;
747
748 say "red in array" if "red" ~~ @array;
749 say "red in array" if "red" ~~ \@array;
750
751 say "some keys end in e" if /e$/ ~~ %hash;
752 say "some keys end in e" if /e$/ ~~ \%hash;
753
754 Two arrays smartmatch if each element in the first array smartmatches
755 (that is, is "in") the corresponding element in the second array,
756 recursively.
757
758 use v5.10.1;
759 my @little = qw(red blue green);
760 my @bigger = ("red", "blue", [ "orange", "green" ] );
761 if (@little ~~ @bigger) { # true!
762 say "little is contained in bigger";
763 }
764
765 Because the smartmatch operator recurses on nested arrays, this will
766 still report that "red" is in the array.
767
768 use v5.10.1;
769 my @array = qw(red blue green);
770 my $nested_array = [[[[[[[ @array ]]]]]]];
771 say "red in array" if "red" ~~ $nested_array;
772
773 If two arrays smartmatch each other, then they are deep copies of each
774 others' values, as this example reports:
775
776 use v5.12.0;
777 my @a = (0, 1, 2, [3, [4, 5], 6], 7);
778 my @b = (0, 1, 2, [3, [4, 5], 6], 7);
779
780 if (@a ~~ @b && @b ~~ @a) {
781 say "a and b are deep copies of each other";
782 }
783 elsif (@a ~~ @b) {
784 say "a smartmatches in b";
785 }
786 elsif (@b ~~ @a) {
787 say "b smartmatches in a";
788 }
789 else {
790 say "a and b don't smartmatch each other at all";
791 }
792
793 If you were to set "$b[3] = 4", then instead of reporting that "a and b
794 are deep copies of each other", it now reports that "b smartmatches in
795 a". That's because the corresponding position in @a contains an array
796 that (eventually) has a 4 in it.
797
798 Smartmatching one hash against another reports whether both contain the
799 same keys, no more and no less. This could be used to see whether two
800 records have the same field names, without caring what values those
801 fields might have. For example:
802
803 use v5.10.1;
804 sub make_dogtag {
805 state $REQUIRED_FIELDS = { name=>1, rank=>1, serial_num=>1 };
806
807 my ($class, $init_fields) = @_;
808
809 die "Must supply (only) name, rank, and serial number"
810 unless $init_fields ~~ $REQUIRED_FIELDS;
811
812 ...
813 }
814
815 However, this only does what you mean if $init_fields is indeed a hash
816 reference. The condition "$init_fields ~~ $REQUIRED_FIELDS" also allows
817 the strings "name", "rank", "serial_num" as well as any array reference
818 that contains "name" or "rank" or "serial_num" anywhere to pass
819 through.
820
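One way to avoid that pitfall is to check the reference type
explicitly before comparing keys. The following is a minimal sketch
that does not use "~~"; the helper name same_keys and the sample data
are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true only if both arguments are unblessed hash
# references that have exactly the same set of keys.
sub same_keys {
    my ($got, $want) = @_;
    return 0 unless ref($got) eq 'HASH' && ref($want) eq 'HASH';
    return 0 unless keys(%$got) == keys(%$want);
    return !grep { !exists $want->{$_} } keys %$got;
}

my $required = { name => 1, rank => 1, serial_num => 1 };

my $ok_hash   = same_keys({ name => "Ann", rank => "Pvt",
                            serial_num => 7 }, $required);
my $ok_string = same_keys("name", $required);
my $ok_array  = same_keys([ "rank" ], $required);
my $ok_short  = same_keys({ name => "Ann" }, $required);

print "complete hash accepted\n" if $ok_hash;
print "plain string rejected\n"  unless $ok_string;
print "array ref rejected\n"     unless $ok_array;
print "partial hash rejected\n"  unless $ok_short;
```
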
821 The smartmatch operator is most often used as the implicit operator of
822 a "when" clause. See the section on "Switch Statements" in perlsyn.
823
824 Smartmatching of Objects
825
826 To avoid relying on an object's underlying representation, if the
827 smartmatch's right operand is an object that doesn't overload "~~", it
828 raises the exception ""Smartmatching a non-overloaded object breaks
829 encapsulation"". That's because one has no business digging around to
830 see whether something is "in" an object. These are all illegal on
831 objects without a "~~" overload:
832
833 %hash ~~ $object
834 42 ~~ $object
835 "fred" ~~ $object
836
837 However, you can change the way an object is smartmatched by
838 overloading the "~~" operator. This is allowed to extend the usual
839 smartmatch semantics. For objects that do have an "~~" overload, see
840 overload.
841
842 Using an object as the left operand is allowed, although not very
843 useful. Smartmatching rules take precedence over overloading, so even
844 if the object in the left operand has smartmatch overloading, this will
845 be ignored. A left operand that is a non-overloaded object falls back
846 on a string or numeric comparison of whatever the "ref" operator
847 returns. That means that
848
849 $object ~~ X
850
851 does not invoke the overload method with "X" as an argument. Instead
852 the above table is consulted as normal, and based on the type of "X",
853 overloading may or may not be invoked. For simple strings or numbers,
854 "in" becomes equivalent to this:
855
    $object ~~ $number        ref($object) == $number
    $object ~~ $string        ref($object) eq $string
858
859 For example, this reports that the handle smells IOish (but please
860 don't really do this!):
861
862 use IO::Handle;
863 my $fh = IO::Handle->new();
864 if ($fh ~~ /\bIO\b/) {
865 say "handle smells IOish";
866 }
867
868 That's because it treats $fh as a string like
869 "IO::Handle=GLOB(0x8039e0)", then pattern matches against that.
870
871 Bitwise And
872 Binary "&" returns its operands ANDed together bit by bit. Although no
873 warning is currently raised, the result is not well defined when this
operation is performed on operands that are neither numbers (see
875 "Integer Arithmetic") nor bitstrings (see "Bitwise String Operators").
876
877 Note that "&" has lower priority than relational operators, so for
878 example the parentheses are essential in a test like
879
880 print "Even\n" if ($x & 1) == 0;
881
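To see what goes wrong without the parentheses, here is a small
sketch. Without them, the test is parsed as "$x & (1 == 0)". Perl
normally reports a "Possible precedence problem" warning for this
form, which the example silences only so that the demonstration runs
quietly.

```perl
use strict;
use warnings;
no warnings 'precedence';    # silenced only for this demonstration

my $x = 4;

# Correct: test the low bit, then compare with zero.
my $with_parens = (($x & 1) == 0) ? "even" : "odd";

# Parsed as $x & (1 == 0), which is $x ANDed with a false value.
my $without_parens = ($x & 1 == 0) ? "even" : "odd";

print "with parentheses:    $with_parens\n";
print "without parentheses: $without_parens\n";
```
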
882 If the "bitwise" feature is enabled via "use feature 'bitwise'" or "use
883 v5.28", then this operator always treats its operands as numbers.
884 Before Perl 5.28 this feature produced a warning in the
885 "experimental::bitwise" category.
886
887 Bitwise Or and Exclusive Or
888 Binary "|" returns its operands ORed together bit by bit.
889
890 Binary "^" returns its operands XORed together bit by bit.
891
892 Although no warning is currently raised, the results are not well
defined when these operations are performed on operands that are
neither numbers (see "Integer Arithmetic") nor bitstrings (see "Bitwise
895 String Operators").
896
897 Note that "|" and "^" have lower priority than relational operators, so
898 for example the parentheses are essential in a test like
899
900 print "false\n" if (8 | 2) != 10;
901
If the "bitwise" feature is enabled via "use feature 'bitwise'" or "use
v5.28", then these operators always treat their operands as numbers.
Before Perl 5.28 this feature produced a warning in the
"experimental::bitwise" category.
906
907 C-style Logical And
908 Binary "&&" performs a short-circuit logical AND operation. That is,
909 if the left operand is false, the right operand is not even evaluated.
910 Scalar or list context propagates down to the right operand if it is
911 evaluated.
912
913 C-style Logical Or
914 Binary "||" performs a short-circuit logical OR operation. That is, if
915 the left operand is true, the right operand is not even evaluated.
916 Scalar or list context propagates down to the right operand if it is
917 evaluated.
918
919 Logical Defined-Or
920 Although it has no direct equivalent in C, Perl's "//" operator is
921 related to its C-style "or". In fact, it's exactly the same as "||",
922 except that it tests the left hand side's definedness instead of its
923 truth. Thus, "EXPR1 // EXPR2" returns the value of "EXPR1" if it's
924 defined, otherwise, the value of "EXPR2" is returned. ("EXPR1" is
925 evaluated in scalar context, "EXPR2" in the context of "//" itself).
926 Usually, this is the same result as "defined(EXPR1) ? EXPR1 : EXPR2"
(except that the ternary-operator form can be used as an lvalue, while
928 "EXPR1 // EXPR2" cannot). This is very useful for providing default
929 values for variables. If you actually want to test if at least one of
930 $x and $y is defined, use "defined($x // $y)".
931
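The difference between "//" and "||" shows up when the left operand is
defined but false. The following sketch uses a made-up options hash
(%opt and its keys are assumptions for illustration):

```perl
use strict;
use warnings;

my %opt = (retries => 0, name => "", timeout => undef);

# 0 is false, so || discards it and picks the default.
my $retries_or = $opt{retries} || 3;

# 0 is defined, so // keeps it.
my $retries_dor = $opt{retries} // 3;

# undef is not defined, so // falls through to the default.
my $timeout = $opt{timeout} // 30;

# True if at least one of the two values is defined.
my $either_defined = defined($opt{timeout} // $opt{name}) ? 1 : 0;

print "retries with ||: $retries_or\n";
print "retries with //: $retries_dor\n";
print "timeout:         $timeout\n";
print "either defined:  $either_defined\n";
```
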
932 The "||", "//" and "&&" operators return the last value evaluated
933 (unlike C's "||" and "&&", which return 0 or 1). Thus, a reasonably
934 portable way to find out the home directory might be:
935
936 $home = $ENV{HOME}
937 // $ENV{LOGDIR}
938 // (getpwuid($<))[7]
939 // die "You're homeless!\n";
940
941 In particular, this means that you shouldn't use this for selecting
942 between two aggregates for assignment:
943
944 @a = @b || @c; # This doesn't do the right thing
945 @a = scalar(@b) || @c; # because it really means this.
946 @a = @b ? @b : @c; # This works fine, though.
947
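A short sketch with concrete data makes the problem visible; the array
names and contents here are arbitrary:

```perl
use strict;
use warnings;

my @b     = (10, 20, 30);
my @c     = (7, 8);
my @empty = ();

# The left operand of || is evaluated in scalar context, so a
# non-empty @b contributes its element count, not its elements.
my @wrong = @b || @c;

# The conditional operator keeps list context for the chosen branch.
my @right = @b ? @b : @c;

# When the left array is empty, the right operand is used as a list.
my @fallback = @empty || @c;

print "wrong:    @wrong\n";
print "right:    @right\n";
print "fallback: @fallback\n";
```
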
948 As alternatives to "&&" and "||" when used for control flow, Perl
949 provides the "and" and "or" operators (see below). The short-circuit
950 behavior is identical. The precedence of "and" and "or" is much lower,
951 however, so that you can safely use them after a list operator without
952 the need for parentheses:
953
954 unlink "alpha", "beta", "gamma"
955 or gripe(), next LINE;
956
957 With the C-style operators that would have been written like this:
958
959 unlink("alpha", "beta", "gamma")
960 || (gripe(), next LINE);
961
962 It would be even more readable to write that this way:
963
964 unless(unlink("alpha", "beta", "gamma")) {
965 gripe();
966 next LINE;
967 }
968
969 Using "or" for assignment is unlikely to do what you want; see below.
970
971 Range Operators
972 Binary ".." is the range operator, which is really two different
973 operators depending on the context. In list context, it returns a list
974 of values counting (up by ones) from the left value to the right value.
975 If the left value is greater than the right value then it returns the
976 empty list. The range operator is useful for writing "foreach (1..10)"
977 loops and for doing slice operations on arrays. In the current
978 implementation, no temporary array is created when the range operator
979 is used as the expression in "foreach" loops, but older versions of
980 Perl might burn a lot of memory when you write something like this:
981
982 for (1 .. 1_000_000) {
983 # code
984 }
985
986 The range operator also works on strings, using the magical auto-
987 increment, see below.
988
989 In scalar context, ".." returns a boolean value. The operator is
990 bistable, like a flip-flop, and emulates the line-range (comma)
991 operator of sed, awk, and various editors. Each ".." operator
992 maintains its own boolean state, even across calls to a subroutine that
993 contains it. It is false as long as its left operand is false. Once
994 the left operand is true, the range operator stays true until the right
995 operand is true, AFTER which the range operator becomes false again.
996 It doesn't become false till the next time the range operator is
997 evaluated. It can test the right operand and become false on the same
998 evaluation it became true (as in awk), but it still returns true once.
999 If you don't want it to test the right operand until the next
1000 evaluation, as in sed, just use three dots ("...") instead of two. In
1001 all other regards, "..." behaves just like ".." does.
1002
1003 The right operand is not evaluated while the operator is in the "false"
1004 state, and the left operand is not evaluated while the operator is in
1005 the "true" state. The precedence is a little lower than || and &&.
1006 The value returned is either the empty string for false, or a sequence
1007 number (beginning with 1) for true. The sequence number is reset for
1008 each range encountered. The final sequence number in a range has the
1009 string "E0" appended to it, which doesn't affect its numeric value, but
1010 gives you something to search for if you want to exclude the endpoint.
1011 You can exclude the beginning point by waiting for the sequence number
1012 to be greater than 1.
1013
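The sequence numbers and the "E0" marker can be observed with a small
sketch. The input lines and the BEGIN/END markers are made up for
illustration.

```perl
use strict;
use warnings;

my @lines = ("intro", "BEGIN", "alpha", "beta", "END", "outro");
my (@all, @inner);

for (@lines) {
    # Assignment to a scalar gives scalar context: a flip-flop.
    my $seq = /^BEGIN$/ .. /^END$/;
    next unless $seq;

    # Record each line in the range with its sequence number.
    push @all, "$_:$seq";

    # Skip the first line (sequence 1) and the last (ends in E0).
    push @inner, $_ if $seq > 1 && $seq !~ /E0$/;
}

print "in range: @all\n";
print "interior: @inner\n";
```
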
1014 If either operand of scalar ".." is a constant expression, that operand
1015 is considered true if it is equal ("==") to the current input line
1016 number (the $. variable).
1017
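A minimal sketch of this behavior, reading from an in-memory
filehandle so that it is self-contained (the five sample lines are
arbitrary):

```perl
use strict;
use warnings;

# Build five lines of sample input and read them back through a
# filehandle, so that $. counts input lines.
my $text = join "", map { "line $_\n" } 1 .. 5;
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @picked;
while (my $line = <$fh>) {
    chomp $line;
    # Constant operands are compared against $. here.
    push @picked, $line if 2 .. 4;
}
close $fh;

print "$_\n" for @picked;
```
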
1018 To be pedantic, the comparison is actually "int(EXPR) == int(EXPR)",
1019 but that is only an issue if you use a floating point expression; when
1020 implicitly using $. as described in the previous paragraph, the
1021 comparison is "int(EXPR) == int($.)" which is only an issue when $. is
1022 set to a floating point value and you are not reading from a file.
Furthermore, "span" .. "spat" or "2.18 .. 3.14" will not do what you
want in scalar context because each operand is evaluated using its
integer representation.
1026
1027 Examples:
1028
1029 As a scalar operator:
1030
1031 if (101 .. 200) { print; } # print 2nd hundred lines, short for
1032 # if ($. == 101 .. $. == 200) { print; }
1033
1034 next LINE if (1 .. /^$/); # skip header lines, short for
1035 # next LINE if ($. == 1 .. /^$/);
1036 # (typically in a loop labeled LINE)
1037
1038 s/^/> / if (/^$/ .. eof()); # quote body
1039
1040 # parse mail messages
1041 while (<>) {
1042 $in_header = 1 .. /^$/;
1043 $in_body = /^$/ .. eof;
1044 if ($in_header) {
1045 # do something
1046 } else { # in body
1047 # do something else
1048 }
1049 } continue {
1050 close ARGV if eof; # reset $. each file
1051 }
1052
1053 Here's a simple example to illustrate the difference between the two
1054 range operators:
1055
1056 @lines = (" - Foo",
1057 "01 - Bar",
1058 "1 - Baz",
1059 " - Quux");
1060
1061 foreach (@lines) {
1062 if (/0/ .. /1/) {
1063 print "$_\n";
1064 }
1065 }
1066
1067 This program will print only the line containing "Bar". If the range
1068 operator is changed to "...", it will also print the "Baz" line.
1069
1070 And now some examples as a list operator:
1071
1072 for (101 .. 200) { print } # print $_ 100 times
1073 @foo = @foo[0 .. $#foo]; # an expensive no-op
1074 @foo = @foo[$#foo-4 .. $#foo]; # slice last 5 items
1075
1076 Because each operand is evaluated in integer form, "2.18 .. 3.14" will
1077 return two elements in list context.
1078
1079 @list = (2.18 .. 3.14); # same as @list = (2 .. 3);
1080
1081 The range operator in list context can make use of the magical auto-
1082 increment algorithm if both operands are strings, subject to the
1083 following rules:
1084
1085 • With one exception (below), if both strings look like numbers to
1086 Perl, the magic increment will not be applied, and the strings will
1087 be treated as numbers (more specifically, integers) instead.
1088
1089 For example, "-2".."2" is the same as "-2..2", and "2.18".."3.14"
1090 produces "2, 3".
1091
1092 • The exception to the above rule is when the left-hand string begins
1093 with 0 and is longer than one character, in this case the magic
1094 increment will be applied, even though strings like "01" would
1095 normally look like a number to Perl.
1096
1097 For example, "01".."04" produces "01", "02", "03", "04", and
1098 "00".."-1" produces "00" through "99" - this may seem surprising,
1099 but see the following rules for why it works this way. To get
1100 dates with leading zeros, you can say:
1101
1102 @z2 = ("01" .. "31");
1103 print $z2[$mday];
1104
1105 If you want to force strings to be interpreted as numbers, you
1106 could say
1107
1108 @numbers = ( 0+$first .. 0+$last );
1109
1110 Note: In Perl versions 5.30 and below, any string on the left-hand
1111 side beginning with "0", including the string "0" itself, would
1112 cause the magic string increment behavior. This means that on these
1113 Perl versions, "0".."-1" would produce "0" through "99", which was
1114 inconsistent with "0..-1", which produces the empty list. This also
1115 means that "0".."9" now produces a list of integers instead of a
1116 list of strings.
1117
1118 • If the initial value specified isn't part of a magical increment
1119 sequence (that is, a non-empty string matching
1120 "/^[a-zA-Z]*[0-9]*\z/"), only the initial value will be returned.
1121
1122 For example, "ax".."az" produces "ax", "ay", "az", but "*x".."az"
1123 produces only "*x".
1124
1125 • For other initial values that are strings that do follow the rules
1126 of the magical increment, the corresponding sequence will be
1127 returned.
1128
1129 For example, you can say
1130
1131 @alphabet = ("A" .. "Z");
1132
1133 to get all normal letters of the English alphabet, or
1134
1135 $hexdigit = (0 .. 9, "a" .. "f")[$num & 15];
1136
1137 to get a hexadecimal digit.
1138
1139 • If the final value specified is not in the sequence that the
1140 magical increment would produce, the sequence goes until the next
1141 value would be longer than the final value specified. If the length
1142 of the final string is shorter than the first, the empty list is
1143 returned.
1144
1145 For example, "a".."--" is the same as "a".."zz", "0".."xx" produces
1146 "0" through "99", and "aaa".."--" returns the empty list.
1147
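Several of the rules above can be checked with a short sketch. It uses
only ranges taken from the examples in the list, and leaves out the
cases whose result depends on the Perl version.

```perl
use strict;
use warnings;

my @padded  = ("01" .. "04");    # leading zero: magic increment
my @letters = ("ax" .. "az");    # ordinary magic increment
my @stuck   = ("*x" .. "az");    # "*x" is not in a magic sequence
my @none    = ("aaa" .. "--");   # final value shorter than initial
my @long    = ("a" .. "--");     # runs until length would exceed 2

print "padded:  @padded\n";
print "letters: @letters\n";
print "stuck:   @stuck\n";
print "none:    ", scalar(@none), " elements\n";
print "long:    ", scalar(@long), " elements, last is $long[-1]\n";
```
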
1148 As of Perl 5.26, the list-context range operator on strings works as
expected in the scope of "use feature 'unicode_strings'". In previous
1150 versions, and outside the scope of that feature, it exhibits "The
1151 "Unicode Bug"" in perlunicode: its behavior depends on the internal
1152 encoding of the range endpoint.
1153
1154 Because the magical increment only works on non-empty strings matching
1155 "/^[a-zA-Z]*[0-9]*\z/", the following will only return an alpha:
1156
1157 use charnames "greek";
1158 my @greek_small = ("\N{alpha}" .. "\N{omega}");
1159
1160 To get the 25 traditional lowercase Greek letters, including both
1161 sigmas, you could use this instead:
1162
1163 use charnames "greek";
1164 my @greek_small = map { chr } ( ord("\N{alpha}")
1165 ..
1166 ord("\N{omega}")
1167 );
1168
1169 However, because there are many other lowercase Greek characters than
1170 just those, to match lowercase Greek characters in a regular
1171 expression, you could use the pattern "/(?:(?=\p{Greek})\p{Lower})+/"
1172 (or the experimental feature "/(?[ \p{Greek} & \p{Lower} ])+/").
1173
1174 Conditional Operator
1175 Ternary "?:" is the conditional operator, just as in C. It works much
1176 like an if-then-else. If the argument before the "?" is true, the
1177 argument before the ":" is returned, otherwise the argument after the
1178 ":" is returned. For example:
1179
1180 printf "I have %d dog%s.\n", $n,
1181 ($n == 1) ? "" : "s";
1182
1183 Scalar or list context propagates downward into the 2nd or 3rd
1184 argument, whichever is selected.
1185
1186 $x = $ok ? $y : $z; # get a scalar
1187 @x = $ok ? @y : @z; # get an array
1188 $x = $ok ? @y : @z; # oops, that's just a count!
1189
1190 The operator may be assigned to if both the 2nd and 3rd arguments are
1191 legal lvalues (meaning that you can assign to them):
1192
1193 ($x_or_y ? $x : $y) = $z;
1194
1195 Because this operator produces an assignable result, using assignments
1196 without parentheses will get you in trouble. For example, this:
1197
1198 $x % 2 ? $x += 10 : $x += 2
1199
1200 Really means this:
1201
1202 (($x % 2) ? ($x += 10) : $x) += 2
1203
1204 Rather than this:
1205
1206 ($x % 2) ? ($x += 10) : ($x += 2)
1207
1208 That should probably be written more simply as:
1209
1210 $x += ($x % 2) ? 10 : 2;
1211
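The effect of the unparenthesized form can be checked directly. In
this sketch the starting values 3 and 4 are arbitrary; the odd case is
the one where the extra "+= 2" becomes visible.

```perl
use strict;
use warnings;

# Unparenthesized form: the trailing "+= 2" applies to whichever
# lvalue the conditional operator selects.
my $odd = 3;
$odd % 2 ? $odd += 10 : $odd += 2;     # adds 10, then 2 more

my $even = 4;
$even % 2 ? $even += 10 : $even += 2;  # adds 2 only

# Simpler form: choose the increment, then add it once.
my $fixed = 3;
$fixed += ($fixed % 2) ? 10 : 2;

print "odd, unparenthesized:  $odd\n";
print "even, unparenthesized: $even\n";
print "odd, simpler form:     $fixed\n";
```
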
1212 Assignment Operators
1213 "=" is the ordinary assignment operator.
1214
1215 Assignment operators work as in C. That is,
1216
1217 $x += 2;
1218
1219 is equivalent to
1220
1221 $x = $x + 2;
1222
1223 although without duplicating any side effects that dereferencing the
1224 lvalue might trigger, such as from "tie()". Other assignment operators
1225 work similarly. The following are recognized:
1226
1227 **= += *= &= &.= <<= &&=
1228 -= /= |= |.= >>= ||=
1229 .= %= ^= ^.= //=
1230 x=
1231
1232 Although these are grouped by family, they all have the precedence of
1233 assignment. These combined assignment operators can only operate on
1234 scalars, whereas the ordinary assignment operator can assign to arrays,
1235 hashes, lists and even references. (See "Context" and "List value
1236 constructors" in perldata, and "Assigning to References" in perlref.)
1237
1238 Unlike in C, the scalar assignment operator produces a valid lvalue.
1239 Modifying an assignment is equivalent to doing the assignment and then
1240 modifying the variable that was assigned to. This is useful for
1241 modifying a copy of something, like this:
1242
1243 ($tmp = $global) =~ tr/13579/24680/;
1244
Although as of 5.14, that can also be accomplished this way:
1246
1247 use v5.14;
1248 $tmp = ($global =~ tr/13579/24680/r);
1249
1250 Likewise,
1251
1252 ($x += 2) *= 3;
1253
1254 is equivalent to
1255
1256 $x += 2;
1257 $x *= 3;
1258
1259 Similarly, a list assignment in list context produces the list of
1260 lvalues assigned to, and a list assignment in scalar context returns
1261 the number of elements produced by the expression on the right hand
1262 side of the assignment.
1263
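These return values can be seen in a small sketch; all variable names
and values here are arbitrary.

```perl
use strict;
use warnings;

# Scalar assignment as an lvalue: modify the copy, not the original.
my $global = "13579";
(my $tmp = $global) =~ tr/13579/24680/;

# Modifying the result of a compound assignment.
my $x = 1;
($x += 2) *= 3;

# List assignment in scalar context: number of right-hand elements,
# even though only two variables receive values.
my $count = (my ($first, $second) = (10, 20, 30));

# List assignment in list context: the lvalues assigned to.
my @assigned = (my ($p, $q) = (1, 2, 3));

print "tmp=$tmp global=$global\n";
print "x=$x\n";
print "count=$count\n";
print "assigned=@assigned\n";
```
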
1264 The three dotted bitwise assignment operators ("&.=" "|.=" "^.=") are
1265 new in Perl 5.22. See "Bitwise String Operators".
1266
1267 Comma Operator
1268 Binary "," is the comma operator. In scalar context it evaluates its
1269 left argument, throws that value away, then evaluates its right
1270 argument and returns that value. This is just like C's comma operator.
1271
1272 In list context, it's just the list argument separator, and inserts
1273 both its arguments into the list. These arguments are also evaluated
1274 from left to right.
1275
1276 The "=>" operator (sometimes pronounced "fat comma") is a synonym for
1277 the comma except that it causes a word on its left to be interpreted as
1278 a string if it begins with a letter or underscore and is composed only
1279 of letters, digits and underscores. This includes operands that might
1280 otherwise be interpreted as operators, constants, single number
1281 v-strings or function calls. If in doubt about this behavior, the left
1282 operand can be quoted explicitly.
1283
1284 Otherwise, the "=>" operator behaves exactly as the comma operator or
1285 list argument separator, according to context.
1286
1287 For example:
1288
1289 use constant FOO => "something";
1290
1291 my %h = ( FOO => 23 );
1292
1293 is equivalent to:
1294
1295 my %h = ("FOO", 23);
1296
1297 It is NOT:
1298
1299 my %h = ("something", 23);
1300
1301 The "=>" operator is helpful in documenting the correspondence between
1302 keys and values in hashes, and other paired elements in lists.
1303
1304 %hash = ( $key => $value );
1305 login( $username => $password );
1306
1307 The special quoting behavior ignores precedence, and hence may apply to
1308 part of the left operand:
1309
1310 print time.shift => "bbb";
1311
1312 That example prints something like "1314363215shiftbbb", because the
1313 "=>" implicitly quotes the "shift" immediately on its left, ignoring
1314 the fact that "time.shift" is the entire left operand.
1315
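When the value of a constant is wanted as the key, the automatic
quoting has to be defeated. The sketch below shows two common ways of
doing so, calling the constant with parentheses or using a plain
comma; the hash names are arbitrary.

```perl
use strict;
use warnings;
use constant FOO => "something";

# The bareword to the left of => is quoted: the key is "FOO".
my %quoted = (FOO => 23);

# Parentheses make it a function call: the key is "something".
my %called = (FOO() => 23);

# A plain comma does no quoting: the key is "something".
my %comma = (FOO, 23);

print "quoted key: ", join(",", keys %quoted), "\n";
print "called key: ", join(",", keys %called), "\n";
print "comma key:  ", join(",", keys %comma),  "\n";
```
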
1316 List Operators (Rightward)
1317 On the right side of a list operator, the comma has very low
1318 precedence, such that it controls all comma-separated expressions found
1319 there. The only operators with lower precedence are the logical
1320 operators "and", "or", and "not", which may be used to evaluate calls
1321 to list operators without the need for parentheses:
1322
1323 open HANDLE, "< :encoding(UTF-8)", "filename"
1324 or die "Can't open: $!\n";
1325
1326 However, some people find that code harder to read than writing it with
1327 parentheses:
1328
1329 open(HANDLE, "< :encoding(UTF-8)", "filename")
1330 or die "Can't open: $!\n";
1331
1332 in which case you might as well just use the more customary "||"
1333 operator:
1334
1335 open(HANDLE, "< :encoding(UTF-8)", "filename")
1336 || die "Can't open: $!\n";
1337
1338 See also discussion of list operators in "Terms and List Operators
1339 (Leftward)".
1340
1341 Logical Not
1342 Unary "not" returns the logical negation of the expression to its
1343 right. It's the equivalent of "!" except for the very low precedence.
1344
1345 Logical And
1346 Binary "and" returns the logical conjunction of the two surrounding
1347 expressions. It's equivalent to "&&" except for the very low
1348 precedence. This means that it short-circuits: the right expression is
1349 evaluated only if the left expression is true.
1350
1351 Logical or and Exclusive Or
1352 Binary "or" returns the logical disjunction of the two surrounding
1353 expressions. It's equivalent to "||" except for the very low
1354 precedence. This makes it useful for control flow:
1355
1356 print FH $data or die "Can't write to FH: $!";
1357
1358 This means that it short-circuits: the right expression is evaluated
1359 only if the left expression is false. Due to its precedence, you must
1360 be careful to avoid using it as replacement for the "||" operator. It
1361 usually works out better for flow control than in assignments:
1362
1363 $x = $y or $z; # bug: this is wrong
1364 ($x = $y) or $z; # really means this
1365 $x = $y || $z; # better written this way
1366
1367 However, when it's a list-context assignment and you're trying to use
1368 "||" for control flow, you probably need "or" so that the assignment
1369 takes higher precedence.
1370
1371 @info = stat($file) || die; # oops, scalar sense of stat!
1372 @info = stat($file) or die; # better, now @info gets its due
1373
1374 Then again, you could always use parentheses.
1375
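The same effect can be reproduced without touching the filesystem. In
this sketch, the sub items is a hypothetical stand-in for a function
such as "stat" that returns a list.

```perl
use strict;
use warnings;

my @data = (10, 20, 30);
sub items { return @data }    # hypothetical list-returning function

# || binds tighter than =, so items() is called in scalar context
# and the array receives a single value.
my @scalar_sense = items() || die "no items";

# "or" binds more loosely than =, so the list assignment happens
# first and items() is called in list context.
my @list_sense = items() or die "no items";

# Parentheses around the assignment have the same effect.
(my @with_parens = items()) || die "no items";

print "with ||:          @scalar_sense\n";
print "with or:          @list_sense\n";
print "with parentheses: @with_parens\n";
```
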
1376 Binary "xor" returns the exclusive-OR of the two surrounding
1377 expressions. It cannot short-circuit (of course).
1378
1379 There is no low precedence operator for defined-OR.
1380
1381 C Operators Missing From Perl
1382 Here is what C has that Perl doesn't:
1383
1384 unary & Address-of operator. (But see the "\" operator for taking a
1385 reference.)
1386
1387 unary * Dereference-address operator. (Perl's prefix dereferencing
1388 operators are typed: "$", "@", "%", and "&".)
1389
1390 (TYPE) Type-casting operator.
1391
1392 Quote and Quote-like Operators
1393 While we usually think of quotes as literal values, in Perl they
1394 function as operators, providing various kinds of interpolating and
1395 pattern matching capabilities. Perl provides customary quote
1396 characters for these behaviors, but also provides a way for you to
1397 choose your quote character for any of them. In the following table, a
1398 "{}" represents any pair of delimiters you choose.
1399
    Customary  Generic        Meaning        Interpolates
        ''       q{}          Literal             no
        ""      qq{}          Literal             yes
        ``      qx{}          Command             yes*
                qw{}         Word list            no
        //       m{}       Pattern match          yes*
                qr{}          Pattern             yes*
                 s{}{}      Substitution          yes*
                tr{}{}    Transliteration         no (but see below)
                 y{}{}    Transliteration         no (but see below)
        <<EOF                 here-doc            yes*
1411
1412 * unless the delimiter is ''.
1413
1414 Non-bracketing delimiters use the same character fore and aft, but the
1415 four sorts of ASCII brackets (round, angle, square, curly) all nest,
1416 which means that
1417
1418 q{foo{bar}baz}
1419
1420 is the same as
1421
1422 'foo{bar}baz'
1423
1424 Note, however, that this does not always work for quoting Perl code:
1425
1426 $s = q{ if($x eq "}") ... }; # WRONG
1427
1428 is a syntax error. The "Text::Balanced" module (standard as of v5.8,
1429 and from CPAN before then) is able to do this properly.
1430
1431 There can (and in some cases, must) be whitespace between the operator
1432 and the quoting characters, except when "#" is being used as the
1433 quoting character. "q#foo#" is parsed as the string "foo", while
1434 "q #foo#" is the operator "q" followed by a comment. Its argument will
1435 be taken from the next line. This allows you to write:
1436
1437 s {foo} # Replace foo
1438 {bar} # with bar.
1439
1440 The cases where whitespace must be used are when the quoting character
1441 is a word character (meaning it matches "/\w/"):
1442
1443 q XfooX # Works: means the string 'foo'
1444 qXfooX # WRONG!
1445
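The delimiter rules described so far can be checked with a few
assignments. This is only a sketch; the variable names are arbitrary.

```perl
use strict;
use warnings;

# Bracketing delimiters nest, so the inner braces are literal text.
my $nested = q{foo{bar}baz};

# A word character can be the delimiter if whitespace precedes it.
my $spaced = q XfooX;

# With no whitespace, "#" is the delimiter rather than a comment.
my $hashed = q#foo#;

print "nested: $nested\n";
print "spaced: $spaced\n";
print "hashed: $hashed\n";
```
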
1446 The following escape sequences are available in constructs that
1447 interpolate, and in transliterations whose delimiters aren't single
1448 quotes ("'").
1449
    Sequence     Note  Description
    \t                  tab               (HT, TAB)
    \n                  newline           (NL)
    \r                  return            (CR)
    \f                  form feed         (FF)
    \b                  backspace         (BS)
    \a                  alarm (bell)      (BEL)
    \e                  escape            (ESC)
    \x{263A}     [1,8]  hex char          (example shown: SMILEY)
    \x1b         [2,8]  restricted range hex char (example: ESC)
    \N{name}     [3]    named Unicode character or character sequence
    \N{U+263D}   [4,8]  Unicode character (example: FIRST QUARTER MOON)
    \c[          [5]    control char      (example: chr(27))
    \o{23072}    [6,8]  octal char        (example: SMILEY)
    \033         [7,8]  restricted range octal char  (example: ESC)
1465
1466 [1] The result is the character specified by the hexadecimal number
1467 between the braces. See "[8]" below for details on which
1468 character.
1469
1470 Only hexadecimal digits are valid between the braces. If an
1471 invalid character is encountered, a warning will be issued and the
1472 invalid character and all subsequent characters (valid or invalid)
1473 within the braces will be discarded.
1474
1475 If there are no valid digits between the braces, the generated
1476 character is the NULL character ("\x{00}"). However, an explicit
1477 empty brace ("\x{}") will not cause a warning (currently).
1478
1479 [2] The result is the character specified by the hexadecimal number in
1480 the range 0x00 to 0xFF. See "[8]" below for details on which
1481 character.
1482
1483 Only hexadecimal digits are valid following "\x". When "\x" is
1484 followed by fewer than two valid digits, any valid digits will be
1485 zero-padded. This means that "\x7" will be interpreted as "\x07",
1486 and a lone "\x" will be interpreted as "\x00". Except at the end
1487 of a string, having fewer than two valid digits will result in a
1488 warning. Note that although the warning says the illegal character
1489 is ignored, it is only ignored as part of the escape and will still
1490 be used as the subsequent character in the string. For example:
1491
1492 Original Result Warns?
1493 "\x7" "\x07" no
1494 "\x" "\x00" no
1495 "\x7q" "\x07q" yes
1496 "\xq" "\x00q" yes
1497
1498 [3] The result is the Unicode character or character sequence given by
1499 name. See charnames.
1500
1501 [4] "\N{U+hexadecimal number}" means the Unicode character whose
1502 Unicode code point is hexadecimal number.
1503
1504 [5] The character following "\c" is mapped to some other character as
1505 shown in the table:
1506
1507 Sequence Value
1508 \c@ chr(0)
1509 \cA chr(1)
1510 \ca chr(1)
1511 \cB chr(2)
1512 \cb chr(2)
1513 ...
1514 \cZ chr(26)
1515 \cz chr(26)
1516 \c[ chr(27)
1517 # See below for chr(28)
1518 \c] chr(29)
1519 \c^ chr(30)
1520 \c_ chr(31)
1521 \c? chr(127) # (on ASCII platforms; see below for link to
1522 # EBCDIC discussion)
1523
1524 In other words, the result is the character whose code point is that
1525 of the uppercased character xor'd with 64. "\c?" is DELETE on ASCII platforms
1526 because "ord("?") ^ 64" is 127, and "\c@" is NULL because the ord
1527 of "@" is 64, so xor'ing it with 64 produces 0.
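
The mapping can be checked directly. The following is a minimal sketch that assumes an ASCII platform; the variable names are illustrative only:

```perl
use strict;
use warnings;

# Assumes an ASCII platform; EBCDIC results differ.
my $ctrl_a  = ord("\cA");           # 1
my $ctrl_la = ord("\ca");           # lowercase maps to the same control: 1
my $escape  = ord("\c[");           # 27
my $delete  = ord("\c?");           # 127

# The same value computed by hand: uppercase, then xor with 64.
my $by_hand = ord(uc "a") ^ 64;     # 65 ^ 64 == 1

print "$ctrl_a $ctrl_la $escape $delete $by_hand\n";   # prints "1 1 27 127 1"
```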
1528
1529 Also, "\c\X" yields "chr(28) . "X"" for any X, but cannot come at
1530 the end of a string, because the backslash would be parsed as
1531 escaping the end quote.
1532
1533 On ASCII platforms, the resulting characters from the list above
1534 are the complete set of ASCII controls. This isn't the case on
1535 EBCDIC platforms; see "OPERATOR DIFFERENCES" in perlebcdic for a
1536 full discussion of the differences between these for ASCII versus
1537 EBCDIC platforms.
1538
1539 Use of any other character following the "c" besides those listed
1540 above is discouraged, and as of Perl v5.20, the only characters
1541 actually allowed are the printable ASCII ones, minus the left brace
1542 "{". What happens for any of the allowed other characters is that
1543 the value is derived by xor'ing with the seventh bit, which is 64,
1544 and a warning raised if enabled. Using the non-allowed characters
1545 generates a fatal error.
1546
1547 To get platform independent controls, you can use "\N{...}".
1548
1549 [6] The result is the character specified by the octal number between
1550 the braces. See "[8]" below for details on which character.
1551
1552 If a character that isn't an octal digit is encountered, a warning
1553 is raised, and the value is based on the octal digits before it,
1554 discarding it and all following characters up to the closing brace.
1555 It is a fatal error if there are no octal digits at all.
1556
1557 [7] The result is the character specified by the three-digit octal
1558 number in the range 000 to 777 (but best to not use above 077, see
1559 next paragraph). See "[8]" below for details on which character.
1560
1561 Some contexts allow 2 or even 1 digit, but any usage without
1562 exactly three digits, the first being a zero, may give unintended
1563 results. (For example, in a regular expression it may be confused
1564 with a backreference; see "Octal escapes" in perlrebackslash.)
1565 Starting in Perl 5.14, you may use "\o{}" instead, which avoids all
1566 these problems. Otherwise, it is best to use this construct only
1567 for ordinals "\077" and below, remembering to pad to the left with
1568 zeros to make three digits. For larger ordinals, either use
1569 "\o{}", or convert to something else, such as to hex and use
1570 "\N{U+}" (which is portable between platforms with different
1571 character sets) or "\x{}" instead.
1572
1573 [8] Several constructs above specify a character by a number. That
1574 number gives the character's position in the character set encoding
1575 (indexed from 0). This is called synonymously its ordinal, code
1576 position, or code point. Perl works on platforms that have a
1577 native encoding currently of either ASCII/Latin1 or EBCDIC, each of
1578 which allows specification of 256 characters. In general, if the
1579 number is 255 (0xFF, 0377) or below, Perl interprets this in the
1580 platform's native encoding. If the number is 256 (0x100, 0400) or
1581 above, Perl interprets it as a Unicode code point and the result is
1582 the corresponding Unicode character. For example "\x{50}" and
1583 "\o{120}" both are the number 80 in decimal, which is less than
1584 256, so the number is interpreted in the native character set
1585 encoding. In ASCII the character in the 80th position (indexed
1586 from 0) is the letter "P", and in EBCDIC it is the ampersand symbol
1587 "&". "\x{100}" and "\o{400}" are both 256 in decimal, so the
1588 number is interpreted as a Unicode code point no matter what the
1589 native encoding is. The name of the character in the 256th
1590 position (indexed from 0) in Unicode is "LATIN CAPITAL LETTER A WITH
1591 MACRON".
1592
1593 An exception to the above rule is that "\N{U+hex number}" is always
1594 interpreted as a Unicode code point, so that "\N{U+0050}" is "P"
1595 even on EBCDIC platforms.
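
As a rough illustration of the 255/256 boundary, the sketch below compares the escapes discussed above. It assumes an ASCII platform for the "P" results, and the variable names are made up for the example:

```perl
use strict;
use warnings;

# Assumes an ASCII platform for the "P" results.
my $hex_native = "\x{50}";       # 80 decimal, below 256: native encoding
my $oct_native = "\o{120}";      # the same number written in octal
my $wide       = "\x{100}";      # 256: always a Unicode code point
my $portable   = "\N{U+0050}";   # always Unicode, so "P" on every platform

print "same\n" if $hex_native eq $oct_native;   # both are "P" on ASCII
printf "U+%04X\n", ord $wide;                   # prints "U+0100"
```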
1596
1597 NOTE: Unlike C and other languages, Perl has no "\v" escape sequence
1598 for the vertical tab (VT, which is 11 in both ASCII and EBCDIC), but
1599 you may use "\N{VT}", "\ck", "\N{U+0b}", or "\x0b". ("\v" does have
1600 meaning in regular expression patterns in Perl, see perlre.)
1601
1602 The following escape sequences are available in constructs that
1603 interpolate, but not in transliterations.
1604
1605 \l lowercase next character only
1606 \u titlecase (not uppercase!) next character only
1607 \L lowercase all characters till \E or end of string
1608 \U uppercase all characters till \E or end of string
1609 \F foldcase all characters till \E or end of string
1610 \Q quote (disable) pattern metacharacters till \E or
1611 end of string
1612 \E end either case modification or quoted section
1613 (whichever was last seen)
1614
1615 See "quotemeta" in perlfunc for the exact definition of characters that
1616 are quoted by "\Q".
1617
1618 "\L", "\U", "\F", and "\Q" can stack, in which case you need one "\E"
1619 for each. For example:
1620
1621 say "This \Qquoting \ubusiness \Uhere isn't quite\E done yet,\E is it?";
1622 This quoting\ Business\ HERE\ ISN\'T\ QUITE\ done\ yet\, is it?
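
A few smaller cases may make the rules easier to see. This is only a sketch; $name and the literal strings are hypothetical inputs:

```perl
use strict;
use warnings;

my $name = "mcCOY";                  # hypothetical input

my $title = "\u\L$name\E";           # lowercase all, titlecase the first: "Mccoy"
my $shout = "\U$name\E!";            # "\E" ends the uppercasing: "MCCOY!"

# "\U" nested inside "\Q": the first "\E" ends "\U", the second ends "\Q".
my $regex_safe = "\Qa.b \Uc.d\E e.f\E g.h";

print "$title $shout\n";             # prints "Mccoy MCCOY!"
print "$regex_safe\n";               # prints "a\.b\ C\.D\ e\.f g.h"
```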
1623
1624 If a "use locale" form that includes "LC_CTYPE" is in effect (see
1625 perllocale), the case map used by "\l", "\L", "\u", and "\U" is taken
1626 from the current locale. If Unicode (for example, "\N{}" or code
1627 points of 0x100 or beyond) is being used, the case map used by "\l",
1628 "\L", "\u", and "\U" is as defined by Unicode. That means that case-
1629 mapping a single character can sometimes produce a sequence of several
1630 characters. Under "use locale", "\F" produces the same results as "\L"
1631 for all locales but a UTF-8 one, where it instead uses the Unicode
1632 definition.
1633
1634 All systems use the virtual "\n" to represent a line terminator, called
1635 a "newline". There is no such thing as an unvarying, physical newline
1636 character. It is only an illusion that the operating system, device
1637 drivers, C libraries, and Perl all conspire to preserve. Not all
1638 systems read "\r" as ASCII CR and "\n" as ASCII LF. For example, on
1639 the ancient Macs (pre-MacOS X) of yesteryear, these used to be
1640 reversed, and on systems without a line terminator, printing "\n" might
1641 emit no actual data. In general, use "\n" when you mean a "newline"
1642 for your system, but use the literal ASCII when you need an exact
1643 character. For example, most networking protocols expect and prefer a
1644 CR+LF ("\015\012" or "\cM\cJ") for line terminators, and although they
1645 often accept just "\012", they seldom tolerate just "\015". If you get
1646 in the habit of using "\n" for networking, you may be burned some day.
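
For instance, a protocol line terminator can be spelled out explicitly. The request below is a hypothetical example, not a complete client:

```perl
use strict;
use warnings;

# Spell out CR+LF with octal escapes instead of relying on "\n".
my $CRLF = "\015\012";

# A hypothetical HTTP/1.0 request; the empty line ends the headers.
my $request = join $CRLF, "GET / HTTP/1.0", "Host: example.com", "", "";

# Show the ordinals of the terminator: prints "13.10".
print join(".", map { ord } split //, $CRLF), "\n";
```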
1647
1648 For constructs that do interpolate, variables beginning with "$" or
1649 "@" are interpolated. Subscripted variables such as $a[3] or
1650 "$href->{key}[0]" are also interpolated, as are array and hash slices.
1651 But method calls such as "$obj->meth" are not.
1652
1653 Interpolating an array or slice interpolates the elements in order,
1654 separated by the value of $", so is equivalent to interpolating
1655 "join $", @array". "Punctuation" arrays such as "@*" are usually
1656 interpolated only if the name is enclosed in braces "@{*}", but the
1657 arrays @_, "@+", and "@-" are interpolated even without braces.
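
A short sketch of these rules, using made-up data (@langs and $href are assumptions for the example):

```perl
use strict;
use warnings;

my @langs = ("awk", "sed", "perl");
my $href  = { key => ["first"] };

my $default = "@langs";                  # joined with a space: "awk sed perl"
my $custom  = do {
    local $" = ", ";                     # change the separator temporarily
    "@langs[0, 1] and $langs[2]";        # a slice, then a single element
};
my $deep = "got $href->{key}[0]";        # subscript chains interpolate

print "$default\n$custom\n$deep\n";
```

Localizing $" inside the "do" block keeps the changed separator from leaking into the rest of the program.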
1658
1659 For double-quoted strings, the quoting from "\Q" is applied after
1660 interpolation and escapes are processed.
1661
1662 "abc\Qfoo\tbar$s\Exyz"
1663
1664 is equivalent to
1665
1666 "abc" . quotemeta("foo\tbar$s") . "xyz"
1667
1668 For the pattern of regex operators ("qr//", "m//" and "s///"), the
1669 quoting from "\Q" is applied after interpolation is processed, but
1670 before escapes are processed. This allows the pattern to match
1671 literally (except for "$" and "@"). For example, the following
1672 matches:
1673
1674 '\s\t' =~ /\Q\s\t/
1675
1676 Because "$" or "@" trigger interpolation, you'll need to use something
1677 like "/\Quser\E\@\Qhost/" to match them literally.
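
The difference "\Q" makes when a variable is interpolated into a pattern can be seen in this sketch; $user is a hypothetical value that happens to contain metacharacters:

```perl
use strict;
use warnings;

my $user = 'a.b+c';                      # hypothetical value with metacharacters

# Unquoted, "." and "+" are metacharacters, so an unrelated string matches.
my $loose  = 'aXbbc' =~ /^$user$/ ? 1 : 0;                          # 1

# Quoted with \Q...\E, only the literal text matches.
my $strict = 'aXbbc' =~ /^\Q$user\E$/ ? 1 : 0;                      # 0
my $exact  = 'a.b+c@example.com' =~ /^\Q$user\E\@example/ ? 1 : 0;  # 1

print "$loose $strict $exact\n";         # prints "1 0 1"
```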
1678
1679 Patterns are subject to an additional level of interpretation as a
1680 regular expression. This is done as a second pass, after variables are
1681 interpolated, so that regular expressions may be incorporated into the
1682 pattern from the variables. If this is not what you want, use "\Q" to
1683 interpolate a variable literally.
1684
1685 Apart from the behavior described above, Perl does not expand multiple
1686 levels of interpolation. In particular, contrary to the expectations
1687 of shell programmers, back-quotes do NOT interpolate within double
1688 quotes, nor do single quotes impede evaluation of variables when used
1689 within double quotes.
1690
1691 Regexp Quote-Like Operators
1692 Here are the quote-like operators that apply to pattern matching and
1693 related activities.
1694
1695 "qr/STRING/msixpodualn"
1696 This operator quotes (and possibly compiles) its STRING as a
1697 regular expression. STRING is interpolated the same way as
1698 PATTERN in "m/PATTERN/". If "'" is used as the delimiter, no
1699 variable interpolation is done. Returns a Perl value which may
1700 be used instead of the corresponding "/STRING/msixpodualn"
1701 expression. The returned value is a normalized version of the
1702 original pattern. It magically differs from a string
1703 containing the same characters: "ref(qr/x/)" returns "Regexp";
1704 however, dereferencing it is not well defined (you currently
1705 get the normalized version of the original pattern, but this
1706 may change).
1707
1708 For example,
1709
1710 $rex = qr/my.STRING/is;
1711 print $rex; # prints (?^si:my.STRING) (before 5.14: (?si-xm:my.STRING))
1712 s/$rex/foo/;
1713
1714 is equivalent to
1715
1716 s/my.STRING/foo/is;
1717
1718 The result may be used as a subpattern in a match:
1719
1720 $re = qr/$pattern/;
1721 $string =~ /foo${re}bar/; # can be interpolated in other
1722 # patterns
1723 $string =~ $re; # or used standalone
1724 $string =~ /$re/; # or this way
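
A self-contained variant of the same idea, with illustrative names and data, might look like this:

```perl
use strict;
use warnings;

my $word = qr/[a-z]+/i;                  # compiled once, with /i attached

my $kind = ref $word;                    # "Regexp"

# Interpolated into a larger pattern; /i still applies to the embedded part.
my ($key, $value) = "Color = Red" =~ /^($word)\s*=\s*($word)$/;

# Used standalone on the right-hand side of =~.
my $is_word = "42" =~ $word ? 1 : 0;     # 0: no letters to match

print "$kind $key $value $is_word\n";    # prints "Regexp Color Red 0"
```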
1725
1726 Since Perl may compile the pattern at the moment of execution
1727 of the "qr()" operator, using "qr()" may have speed advantages
1728 in some situations, notably if the result of "qr()" is used
1729 standalone:
1730
1731 sub match {
1732 my $patterns = shift;
1733 my @compiled = map qr/$_/i, @$patterns;
1734 grep {
1735 my $success = 0;
1736 foreach my $pat (@compiled) {
1737 $success = 1, last if /$pat/;
1738 }
1739 $success;
1740 } @_;
1741 }
1742
1743 Precompilation of the pattern into an internal representation
1744 at the moment of "qr()" avoids the need to recompile the
1745 pattern every time a match "/$pat/" is attempted. (Perl has
1746 many other internal optimizations, but none would be triggered
1747 in the above example if we did not use the "qr()" operator.)
1748
1749 Options (specified by the following modifiers) are:
1750
1751 m Treat string as multiple lines.
1752 s Treat string as single line. (Make . match a newline)
1753 i Do case-insensitive pattern matching.
1754 x Use extended regular expressions; specifying two
1755 x's means \t and the SPACE character are ignored within
1756 square-bracketed character classes
1757 p When matching preserve a copy of the matched string so
1758 that ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be
1759 defined (ignored starting in v5.20) as these are always
1760 defined starting in that release
1761 o Compile pattern only once.
1762 a ASCII-restrict: Use ASCII for \d, \s, \w and [[:posix:]]
1763 character classes; specifying two a's adds the further
1764 restriction that no ASCII character will match a
1765 non-ASCII one under /i.
1766 l Use the current run-time locale's rules.
1767 u Use Unicode rules.
1768 d Use Unicode or native charset, as in 5.12 and earlier.
1769 n Non-capture mode. Don't let () fill in $1, $2, etc...
1770
1771 If a precompiled pattern is embedded in a larger pattern then
1772 the effect of "msixpluadn" will be propagated appropriately.
1773 The effect that the "/o" modifier has is not propagated, being
1774 restricted to those patterns explicitly using it.
1775
1776 The "/a", "/d", "/l", and "/u" modifiers (added in Perl 5.14)
1777 control the character set rules, but "/a" is the only one you
1778 are likely to want to specify explicitly; the other three are
1779 selected automatically by various pragmas.
1780
1781 See perlre for additional information on valid syntax for
1782 STRING, and for a detailed look at the semantics of regular
1783 expressions. In particular, all modifiers except the largely
1784 obsolete "/o" are further explained in "Modifiers" in perlre.
1785 "/o" is described in the next section.
1786
1787 "m/PATTERN/msixpodualngc"
1788 "/PATTERN/msixpodualngc"
1789 Searches a string for a pattern match, and in scalar context
1790 returns true if it succeeds, false if it fails. If no string
1791 is specified via the "=~" or "!~" operator, the $_ string is
1792 searched. (The string specified with "=~" need not be an
1793 lvalue--it may be the result of an expression evaluation, but
1794 remember the "=~" binds rather tightly.) See also perlre.
1795
1796 Options are as described in "qr//" above; in addition, the
1797 following match process modifiers are available:
1798
1799 g Match globally, i.e., find all occurrences.
1800 c Do not reset search position on a failed match when /g is
1801 in effect.
1802
1803 If "/" is the delimiter then the initial "m" is optional. With
1804 the "m" you can use any pair of non-whitespace (ASCII)
1805 characters as delimiters. This is particularly useful for
1806 matching path names that contain "/", to avoid LTS (leaning
1807 toothpick syndrome). If "?" is the delimiter, then a match-
1808 only-once rule applies, described in "m?PATTERN?" below. If
1809 "'" (single quote) is the delimiter, no variable interpolation
1810 is performed on the PATTERN. When using a delimiter character
1811 valid in an identifier, whitespace is required after the "m".
1812
1813 PATTERN may contain variables, which will be interpolated every
1814 time the pattern search is evaluated, except for when the
1815 delimiter is a single quote. (Note that $(, $), and $| are not
1816 interpolated because they look like end-of-string tests.) Perl
1817 will not recompile the pattern unless an interpolated variable
1818 that it contains changes. You can force Perl to skip the test
1819 and never recompile by adding a "/o" (which stands for "once")
1820 after the trailing delimiter. Once upon a time, Perl would
1821 recompile regular expressions unnecessarily, and this modifier
1822 was useful to tell it not to do so, in the interests of speed.
1823 But now, the only reasons to use "/o" are these:
1824
1825 1. The variables are thousands of characters long and you know
1826 that they don't change, and you need to wring out the last
1827 little bit of speed by having Perl skip testing for that.
1828 (There is a maintenance penalty for doing this, as
1829 mentioning "/o" constitutes a promise that you won't change
1830 the variables in the pattern. If you do change them, Perl
1831 won't even notice.)
1832
1833 2. You want the pattern to use the initial values of the
1834 variables regardless of whether they change or not. (But
1835 there are saner ways of accomplishing this than using
1836 "/o".)
1837
1838 3. If the pattern contains embedded code, such as
1839
1840 use re 'eval';
1841 $code = 'foo(?{ $x })';
1842 /$code/
1843
1844 then perl will recompile each time, even though the pattern
1845 string hasn't changed, to ensure that the current value of
1846 $x is seen each time. Use "/o" if you want to avoid this.
1847
1848 The bottom line is that using "/o" is almost never a good idea.
1849
1850 The empty pattern "//"
1851 If the PATTERN evaluates to the empty string, the last
1852 successfully matched regular expression is used instead. In
1853 this case, only the "g" and "c" flags on the empty pattern are
1854 honored; the other flags are taken from the original pattern.
1855 If no match has previously succeeded, this will (silently) act
1856 instead as a genuine empty pattern (which will always match).
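
The reuse of the last successful pattern is easy to miss, so here is a small sketch of it. The strings are arbitrary, and "m//" is written out to avoid any confusion with the defined-or operator:

```perl
use strict;
use warnings;

# A successful match: /b\w+/ becomes the last successful pattern.
my $primed = "alpha beta" =~ /b\w+/ ? 1 : 0;    # 1

# The empty pattern now behaves like /b\w+/ ...
my $reused = "bravo" =~ m// ? 1 : 0;            # 1

# ... so a string without a "b" fails, although a genuinely empty
# pattern would match any string.
my $failed = "alarm" =~ m// ? 1 : 0;            # 0

print "$primed $reused $failed\n";              # prints "1 1 0"
```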
1857
1858 Note that it's possible to confuse Perl into thinking "//" (the
1859 empty regex) is really "//" (the defined-or operator). Perl is
1860 usually pretty good about this, but some pathological cases
1861 might trigger this, such as "$x///" (is that "($x) / (//)" or
1862 "$x // /"?) and "print $fh //" ("print $fh(//" or
1863 "print($fh //"?). In all of these examples, Perl will assume
1864 you meant defined-or. If you meant the empty regex, just use
1865 parentheses or spaces to disambiguate, or even prefix the empty
1866 regex with an "m" (so "//" becomes "m//").
1867
1868 Matching in list context
1869 If the "/g" option is not used, "m//" in list context returns a
1870 list consisting of the subexpressions matched by the
1871 parentheses in the pattern, that is, ($1, $2, $3...). (Note
1872 that $1 etc. are also set here.) When there are no parentheses
1873 in the pattern, the return value is the list "(1)" for success.
1874 With or without parentheses, an empty list is returned upon
1875 failure.
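
The three possible return values can be sketched as follows (the input strings are invented for the illustration):

```perl
use strict;
use warnings;

# Captures come back as a list (and are also available in $1 and $2).
my ($major, $minor) = "Version: 5.36" =~ /Version: *(\d+)\.(\d+)/;

# No parentheses: a successful match returns the one-element list (1).
my @flag = "perl" =~ /er/;

# Failure returns the empty list, with or without parentheses.
my @none = "perl" =~ /(x)(y)/;

printf "%s.%s flag=(%s) none=%d\n", $major, $minor, "@flag", scalar @none;
# prints "5.36 flag=(1) none=0"
```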
1876
1877 Examples:
1878
1879 open(TTY, "+</dev/tty")
1880 || die "can't access /dev/tty: $!";
1881
1882 <TTY> =~ /^y/i && foo(); # do foo if desired
1883
1884 if (/Version: *([0-9.]*)/) { $version = $1; }
1885
1886 next if m#^/usr/spool/uucp#;
1887
1888 # poor man's grep
1889 $arg = shift;
1890 while (<>) {
1891 print if /$arg/o; # compile only once (no longer needed!)
1892 }
1893
1894 if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))
1895
1896 This last example splits $foo into the first two words and the
1897 remainder of the line, and assigns those three fields to $F1,
1898 $F2, and $Etc. The conditional is true if any variables were
1899 assigned; that is, if the pattern matched.
1900
1901 The "/g" modifier specifies global pattern matching--that is,
1902 matching as many times as possible within the string. How it
1903 behaves depends on the context. In list context, it returns a
1904 list of the substrings matched by any capturing parentheses in
1905 the regular expression. If there are no parentheses, it
1906 returns a list of all the matched strings, as if there were
1907 parentheses around the whole pattern.
1908
1909 In scalar context, each execution of "m//g" finds the next
1910 match, returning true if it matches, and false if there is no
1911 further match. The position after the last match can be read
1912 or set using the "pos()" function; see "pos" in perlfunc. A
1913 failed match normally resets the search position to the
1914 beginning of the string, but you can avoid that by adding the
1915 "/c" modifier (for example, "m//gc"). Modifying the target
1916 string also resets the search position.
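
To make the two contexts concrete, here is a minimal sketch; $dna is a hypothetical input string:

```perl
use strict;
use warnings;

my $dna = "AACGTAACC";                   # hypothetical input

# Scalar context: each iteration finds the next match and advances pos().
my @starts;
while ($dna =~ /AA/g) {
    push @starts, pos($dna) - 2;         # pos() is the offset just past the match
}
my $pos_after_failure = pos($dna);       # undef: the failed match reset it

# List context: all matches at once.
my @runs = $dna =~ /[CG]+/g;

print "@starts | @runs\n";               # prints "0 5 | CG CC"
```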
1917
1918 "\G assertion"
1919 You can intermix "m//g" matches with "m/\G.../g", where "\G" is
1920 a zero-width assertion that matches the exact position where
1921 the previous "m//g", if any, left off. Without the "/g"
1922 modifier, the "\G" assertion still anchors at "pos()" as it was
1923 at the start of the operation (see "pos" in perlfunc), but the
1924 match is of course only attempted once. Using "\G" without
1925 "/g" on a target string that has not previously had a "/g"
1926 match applied to it is the same as using the "\A" assertion to
1927 match the beginning of the string. Note also that, currently,
1928 "\G" is only properly supported when anchored at the very
1929 beginning of the pattern.
1930
1931 Examples:
1932
1933 # list context
1934 ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);
1935
1936 # scalar context
1937 local $/ = "";
1938 while ($paragraph = <>) {
1939 while ($paragraph =~ /\p{Ll}['")]*[.!?]+['")]*\s/g) {
1940 $sentences++;
1941 }
1942 }
1943 say $sentences;
1944
1945 Here's another way to check for sentences in a paragraph:
1946
1947 my $sentence_rx = qr{
1948 (?: (?<= ^ ) | (?<= \s ) ) # after start-of-string or
1949 # whitespace
1950 \p{Lu} # capital letter
1951 .*? # a bunch of anything
1952 (?<= \S ) # that ends in non-
1953 # whitespace
1954 (?<! \b [DMS]r ) # but isn't a common abbr.
1955 (?<! \b Mrs )
1956 (?<! \b Sra )
1957 (?<! \b St )
1958 [.?!] # followed by a sentence
1959 # ender
1960 (?= $ | \s ) # in front of end-of-string
1961 # or whitespace
1962 }sx;
1963 local $/ = "";
1964 while (my $paragraph = <>) {
1965 say "NEW PARAGRAPH";
1966 my $count = 0;
1967 while ($paragraph =~ /($sentence_rx)/g) {
1968 printf "\tgot sentence %d: <%s>\n", ++$count, $1;
1969 }
1970 }
1971
1972 Here's how to use "m//gc" with "\G":
1973
1974 $_ = "ppooqppqq";
1975 while ($i++ < 2) {
1976 print "1: '";
1977 print $1 while /(o)/gc; print "', pos=", pos, "\n";
1978 print "2: '";
1979 print $1 if /\G(q)/gc; print "', pos=", pos, "\n";
1980 print "3: '";
1981 print $1 while /(p)/gc; print "', pos=", pos, "\n";
1982 }
1983 print "Final: '$1', pos=",pos,"\n" if /\G(.)/;
1984
1985 The last example should print:
1986
1987 1: 'oo', pos=4
1988 2: 'q', pos=5
1989 3: 'pp', pos=7
1990 1: '', pos=7
1991 2: 'q', pos=8
1992 3: '', pos=8
1993 Final: 'q', pos=8
1994
1995 Notice that the final match matched "q" instead of "p", which a
1996 match without the "\G" anchor would have done. Also note that
1997 the final match did not update "pos". "pos" is only updated on
1998 a "/g" match. If the final match did indeed match "p", it's a
1999 good bet that you're running an ancient (pre-5.6.0) version of
2000 Perl.
2001
2002 A useful idiom for "lex"-like scanners is "/\G.../gc". You can
2003 combine several regexps like this to process a string part-by-
2004 part, doing different actions depending on which regexp
2005 matched. Each regexp tries to match where the previous one
2006 leaves off.
2007
2008 $_ = <<'EOL';
2009 $url = URI::URL->new( "http://example.com/" );
2010 die if $url eq "xXx";
2011 EOL
2012
2013 LOOP: {
2014 print(" digits"), redo LOOP if /\G\d+\b[,.;]?\s*/gc;
2015 print(" lowercase"), redo LOOP
2016 if /\G\p{Ll}+\b[,.;]?\s*/gc;
2017 print(" UPPERCASE"), redo LOOP
2018 if /\G\p{Lu}+\b[,.;]?\s*/gc;
2019 print(" Capitalized"), redo LOOP
2020 if /\G\p{Lu}\p{Ll}+\b[,.;]?\s*/gc;
2021 print(" MiXeD"), redo LOOP if /\G\pL+\b[,.;]?\s*/gc;
2022 print(" alphanumeric"), redo LOOP
2023 if /\G[\p{Alpha}\pN]+\b[,.;]?\s*/gc;
2024 print(" line-noise"), redo LOOP if /\G\W+/gc;
2025 print ". That's all!\n";
2026 }
2027
2028 Here is the output (split into several lines):
2029
2030 line-noise lowercase line-noise UPPERCASE line-noise UPPERCASE
2031 line-noise lowercase line-noise lowercase line-noise lowercase
2032 lowercase line-noise lowercase lowercase line-noise lowercase
2033 lowercase line-noise MiXeD line-noise. That's all!
2034
2035 "m?PATTERN?msixpodualngc"
2036 This is just like the "m/PATTERN/" search, except that it
2037 matches only once between calls to the "reset()" operator.
2038 This is a useful optimization when you want to see only the
2039 first occurrence of something in each file of a set of files,
2040 for instance. Only "m??" patterns local to the current
2041 package are reset.
2042
2043 while (<>) {
2044 if (m?^$?) {
2045 # blank line between header and body
2046 }
2047 } continue {
2048 reset if eof; # clear m?? status for next file
2049 }
2050
2051 Another example switches the first "latin1" encoding it finds
2052 to "utf8" in a pod file:
2053
2054 s//utf8/ if m? ^ =encoding \h+ \K latin1 ?x;
2055
2056 The match-once behavior is controlled by the match delimiter
2057 being "?"; with any other delimiter this is the normal "m//"
2058 operator.
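
The once-only behavior and the effect of "reset" can be sketched without any files; the loop data below is invented for the illustration:

```perl
use strict;
use warnings;

my @firsts;
for my $round (1, 2) {
    for my $line ("b1", "a1", "a2", "a3") {
        # Succeeds only for the first line starting with "a" since the last reset.
        push @firsts, "$round:$line" if $line =~ m?^a?;
    }
    reset;                               # re-arm m?? for the next round
}

print "@firsts\n";                       # prints "1:a1 2:a1"
```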
2059
2060 In the past, the leading "m" in "m?PATTERN?" was optional, but
2061 omitting it would produce a deprecation warning. As of
2062 v5.22.0, omitting it produces a syntax error. If you encounter
2063 this construct in older code, you can just add "m".
2064
2065 "s/PATTERN/REPLACEMENT/msixpodualngcer"
2066 Searches a string for a pattern, and if found, replaces that
2067 pattern with the replacement text and returns the number of
2068 substitutions made. Otherwise it returns false (a value that
2069 is both an empty string ("") and numeric zero (0) as described
2070 in "Relational Operators").
2071
2072 If the "/r" (non-destructive) option is used then it runs the
2073 substitution on a copy of the string and instead of returning
2074 the number of substitutions, it returns the copy whether or not
2075 a substitution occurred. The original string is never changed
2076 when "/r" is used. The copy will always be a plain string,
2077 even if the input is an object or a tied variable.
2078
2079 If no string is specified via the "=~" or "!~" operator, the $_
2080 variable is searched and modified. Unless the "/r" option is
2081 used, the string specified must be a scalar variable, an array
2082 element, a hash element, or an assignment to one of those; that
2083 is, some sort of scalar lvalue.
2084
2085 If the delimiter chosen is a single quote, no variable
2086 interpolation is done on either the PATTERN or the REPLACEMENT.
2087 Otherwise, if the PATTERN contains a "$" that looks like a
2088 variable rather than an end-of-string test, the variable will
2089 be interpolated into the pattern at run-time. If you want the
2090 pattern compiled only once the first time the variable is
2091 interpolated, use the "/o" option. If the pattern evaluates to
2092 the empty string, the last successfully executed regular
2093 expression is used instead. See perlre for further explanation
2094 on these.
2095
2096 Options are as with "m//" with the addition of the following
2097 replacement specific options:
2098
2099 e Evaluate the right side as an expression.
2100 ee Evaluate the right side as a string then eval the
2101 result.
2102 r Return substitution and leave the original string
2103 untouched.
2104
2105 Any non-whitespace delimiter may replace the slashes. Add
2106 space after the "s" when using a character allowed in
2107 identifiers. If single quotes are used, no interpretation is
2108 done on the replacement string (the "/e" modifier overrides
2109 this, however). Note that Perl treats backticks as normal
2110 delimiters; the replacement text is not evaluated as a command.
2111 If the PATTERN is delimited by bracketing quotes, the
2112 REPLACEMENT has its own pair of quotes, which may or may not be
2113 bracketing quotes, for example, "s(foo)(bar)" or "s<foo>/bar/".
2114 A "/e" will cause the replacement portion to be treated as a
2115 full-fledged Perl expression and evaluated right then and
2116 there. It is, however, syntax checked at compile-time. A
2117 second "e" modifier will cause the replacement portion to be
2118 "eval"ed before being run as a Perl expression.
2119
2120 Examples:
2121
2122 s/\bgreen\b/mauve/g; # don't change wintergreen
2123
2124 $path =~ s|/usr/bin|/usr/local/bin|;
2125
2126 s/Login: $foo/Login: $bar/; # run-time pattern
2127
2128 ($foo = $bar) =~ s/this/that/; # copy first, then
2129 # change
2130 ($foo = "$bar") =~ s/this/that/; # convert to string,
2131 # copy, then change
2132 $foo = $bar =~ s/this/that/r; # Same as above using /r
2133 $foo = $bar =~ s/this/that/r
2134 =~ s/that/the other/r; # Chained substitutes
2135 # using /r
2136 @foo = map { s/this/that/r } @bar; # /r is very useful in
2137 # maps
2138
2139 $count = ($paragraph =~ s/Mister\b/Mr./g); # get change-cnt
2140
2141 $_ = 'abc123xyz';
2142 s/\d+/$&*2/e; # yields 'abc246xyz'
2143 s/\d+/sprintf("%5d",$&)/e; # yields 'abc  246xyz'
2144 s/\w/$& x 2/eg; # yields 'aabbcc  224466xxyyzz'
2145
2146 s/%(.)/$percent{$1}/g; # change percent escapes; no /e
2147 s/%(.)/$percent{$1} || $&/ge; # expr now, so /e
2148 s/^=(\w+)/pod($1)/ge; # use function call
2149
2150 $_ = 'abc123xyz';
2151 $x = s/abc/def/r; # $x is 'def123xyz' and
2152 # $_ remains 'abc123xyz'.
2153
2154 # expand variables in $_, but dynamics only, using
2155 # symbolic dereferencing
2156 s/\$(\w+)/${$1}/g;
2157
2158 # Add one to the value of any numbers in the string
2159 s/(\d+)/1 + $1/eg;
2160
2161 # Titlecase words in the last 30 characters only
2162 substr($str, -30) =~ s/\b(\p{Alpha}+)\b/\u\L$1/g;
2163
2164 # This will expand any embedded scalar variable
2165 # (including lexicals) in $_ : First $1 is interpolated
2166 # to the variable name, and then evaluated
2167 s/(\$\w+)/$1/eeg;
2168
2169 # Delete (most) C comments.
2170 $program =~ s {
2171 /\* # Match the opening delimiter.
2172 .*? # Match a minimal number of characters.
2173 \*/ # Match the closing delimiter.
2174 } []gsx;
2175
2176 s/^\s*(.*?)\s*$/$1/; # trim whitespace in $_,
2177 # expensively
2178
2179 for ($variable) { # trim whitespace in $variable,
2180 # cheap
2181 s/^\s+//;
2182 s/\s+$//;
2183 }
2184
2185 s/([^ ]*) *([^ ]*)/$2 $1/; # reverse 1st two fields
2186
2187 $foo !~ s/A/a/g; # Lowercase all A's in $foo; return
2188 # 0 if any were found and changed;
2189 # otherwise return 1
2190
2191 Note the use of "$" instead of "\" in the field-reversing example.
2192 Unlike sed, we use the \<digit> form only in the left-hand side.
2193 Anywhere else it's $<digit>.
2194
2195 Occasionally, you can't use just a "/g" to get all the changes
2196 to occur that you might want. Here are two common cases:
2197
2198 # put commas in the right places in an integer
2199 1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g;
2200
2201 # expand tabs to 8-column spacing
2202 1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
2203
2204 While "s///" accepts the "/c" flag, it has no effect beyond
2205 producing a warning if warnings are enabled.
2206
2207 Quote-Like Operators
2208 "q/STRING/"
2209 'STRING'
2210 A single-quoted, literal string. A backslash represents a
2211 backslash unless followed by the delimiter or another backslash, in
2212 which case the delimiter or backslash is interpolated.
2213
2214 $foo = q!I said, "You said, 'She said it.'"!;
2215 $bar = q('This is it.');
2216 $baz = '\n'; # a two-character string
2217
2218 "qq/STRING/"
2219 "STRING"
2220 A double-quoted, interpolated string.
2221
2222 $_ .= qq
2223 (*** The previous line contains the naughty word "$1".\n)
2224 if /\b(tcl|java|python)\b/i; # :-)
2225 $baz = "\n"; # a one-character string
2226
2227 "qx/STRING/"
2228 "`STRING`"
2229 A string which is (possibly) interpolated and then executed as a
2230 system command, via /bin/sh or its equivalent if required. Shell
2231 wildcards, pipes, and redirections will be honored. Similarly to
2232 "system", if the string contains no shell metacharacters then it
2233 will be executed directly. The collected standard output of the
2234 command is returned; standard error is unaffected. In scalar
2235 context, it comes back as a single (potentially multi-line) string,
2236 or "undef" if the shell (or command) could not be started. In list
2237 context, returns a list of lines (however you've defined lines with
2238 $/ or $INPUT_RECORD_SEPARATOR), or an empty list if the shell (or
2239 command) could not be started.
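
The difference between the two contexts can be seen by running a small command. This sketch runs the current perl itself through $^X. It assumes a shell that understands double quotes and a perl path without spaces, and the command string is purely illustrative:

```perl
use strict;
use warnings;

# $^X is the path of the running perl; the quoting assumes a shell that
# understands double quotes and a path without spaces.
my $cmd = $^X . q{ -le "print 1; print 2"};

my $text  = qx{$cmd};                    # scalar context: one multi-line string
my @lines = qx{$cmd};                    # list context: one element per line
my $exit  = $? >> 8;                     # exit status of the last command run

print scalar(@lines), " lines, exit $exit\n";
```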
2240
2241 Because backticks do not affect standard error, use shell file
2242 descriptor syntax (assuming the shell supports this) if you care to
2243 address this. To capture a command's STDERR and STDOUT together:
2244
2245 $output = `cmd 2>&1`;
2246
2247 To capture a command's STDOUT but discard its STDERR:
2248
2249 $output = `cmd 2>/dev/null`;
2250
2251 To capture a command's STDERR but discard its STDOUT (ordering is
2252 important here):
2253
2254 $output = `cmd 2>&1 1>/dev/null`;
2255
2256 To exchange a command's STDOUT and STDERR in order to capture the
2257 STDERR but leave its STDOUT to come out the old STDERR:
2258
2259 $output = `cmd 3>&1 1>&2 2>&3 3>&-`;
2260
2261 To read both a command's STDOUT and its STDERR separately, it's
2262 easiest to redirect them separately to files, and then read from
2263 those files when the program is done:
2264
2265 system("program args 1>program.stdout 2>program.stderr");
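
As a rough illustration of the redirections above, the following sketch assumes a POSIX-style shell and a /dev/null device. The child command is a hypothetical one-liner, run with the current perl binary, that writes one line to each stream.

```perl
use strict;
use warnings;

my $perl = $^X;
# Hypothetical child: writes "out" to STDOUT and "err" to STDERR.
my $cmd = qq{"$perl" -e "print STDOUT qq(out\\n); print STDERR qq(err\\n)"};

my $both     = `$cmd 2>&1`;               # STDOUT and STDERR together
my $out_only = `$cmd 2>/dev/null`;        # STDOUT only, STDERR discarded
my $err_only = `$cmd 2>&1 1>/dev/null`;   # STDERR only, STDOUT discarded

print "both streams:\n$both";
print "stdout only: $out_only";
print "stderr only: $err_only";
```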
2266
2267 The STDIN filehandle used by the command is inherited from Perl's
2268 STDIN. For example:
2269
2270 open(SPLAT, "stuff") || die "can't open stuff: $!";
2271 open(STDIN, "<&SPLAT") || die "can't dupe SPLAT: $!";
2272 print STDOUT `sort`;
2273
2274 will print the sorted contents of the file named "stuff".
2275
2276 Using single-quote as a delimiter protects the command from Perl's
2277 double-quote interpolation, passing it on to the shell instead:
2278
2279 $perl_info = qx(ps $$); # that's Perl's $$
2280 $shell_info = qx'ps $$'; # that's the new shell's $$
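
The same distinction can be checked without "ps". This sketch assumes a Unix-like system where "echo" is available both as a program and inside the shell:

```perl
use strict;
use warnings;

chomp(my $perl_pid  = qx(echo $$));   # Perl interpolates its own $$ first
chomp(my $shell_pid = qx'echo $$');   # the shell expands $$ itself

print "perl:  $perl_pid\n";    # same as Perl's $$
print "shell: $shell_pid\n";   # PID of the process that ran the shell
```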
2281
2282 How that string gets evaluated is entirely subject to the command
2283 interpreter on your system. On most platforms, you will have to
2284 protect shell metacharacters if you want them treated literally.
2285 This is in practice difficult to do, as it's unclear how to escape
2286 which characters. See perlsec for a clean and safe example of a
2287 manual "fork()" and "exec()" to emulate backticks safely.
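
One commonly used way to sidestep shell quoting altogether is the list form of a piped "open", which passes the program and its arguments directly to the system without involving a shell. The following is a minimal sketch of that idea, not the example from perlsec; the argument value is made up for illustration.

```perl
use strict;
use warnings;

# A hostile-looking argument: a shell would treat ";" as a command separator.
my $arg = 'a b; echo hacked';

# List-form pipe open: program and arguments are handed over as a list,
# so no shell ever parses $arg.
open(my $fh, '-|', $^X, '-e', 'print "<$ARGV[0]>\n"', $arg)
    or die "cannot start child: $!";
my $output = do { local $/; <$fh> };   # slurp everything the child printed
close($fh) or warn "child exited with status $?\n";

print $output;
```

The child receives the argument as a single string, including the space and the semicolon.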
2288
2289 On some platforms (notably DOS-like ones), the shell may not be
2290 capable of dealing with multiline commands, so putting newlines in
2291 the string may not get you what you want. You may be able to
2292 evaluate multiple commands in a single line by separating them with
2293 the command separator character, if your shell supports that (for
2294 example, ";" on many Unix shells and "&" on the Windows NT "cmd"
2295 shell).
2296
2297 Perl will attempt to flush all files opened for output before
2298 starting the child process, but this may not be supported on some
2299 platforms (see perlport). To be safe, you may need to set $|
2300 ($AUTOFLUSH in "English") or call the "autoflush()" method of
2301 "IO::Handle" on any open handles.
2302
2303 Beware that some command shells may place restrictions on the
2304 length of the command line. You must ensure your strings don't
2305 exceed this limit after any necessary interpolations. See the
2306 platform-specific release notes for more details about your
2307 particular environment.
2308
2309 Using this operator can lead to programs that are difficult to
2310 port, because the shell commands called vary between systems, and
2311 may in fact not be present at all. As one example, the "type"
2312 command under the POSIX shell is very different from the "type"
2313 command under DOS. That doesn't mean you should go out of your way
2314 to avoid backticks when they're the right way to get something
2315 done. Perl was made to be a glue language, and one of the things
2316 it glues together is commands. Just understand what you're getting
2317 yourself into.
2318
2319 Like "system", backticks put the child process exit code in $?. If
2320 you'd like to manually inspect failure, you can check all possible
2321 failure modes by inspecting $? like this:
2322
2323 if ($? == -1) {
2324 print "failed to execute: $!\n";
2325 }
2326 elsif ($? & 127) {
2327 printf "child died with signal %d, %s coredump\n",
2328 ($? & 127), ($? & 128) ? 'with' : 'without';
2329 }
2330 else {
2331 printf "child exited with value %d\n", $? >> 8;
2332 }
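
The same checks can be wrapped in a small routine and applied right after the backticks. In this sketch "describe_status" is a hypothetical helper name, and the child is the current perl binary asked to exit with status 3 (a Unix-like shell is assumed).

```perl
use strict;
use warnings;

# describe_status() is a hypothetical helper, not a core function.
sub describe_status {
    my ($status, $errno) = @_;
    return "failed to execute: $errno" if $status == -1;
    return sprintf("child died with signal %d, %s coredump",
        $status & 127, ($status & 128) ? 'with' : 'without')
        if $status & 127;
    return sprintf("child exited with value %d", $status >> 8);
}

my $perl = $^X;
my $out  = `"$perl" -e "exit 3"`;     # the child exits with value 3
my $msg  = describe_status($?, $!);   # inspect $? immediately afterwards
print "$msg\n";
```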
2333
2334 Use the open pragma to control the I/O layers used when reading the
2335 output of the command, for example:
2336
2337 use open IN => ":encoding(UTF-8)";
2338 my $x = `cmd-producing-utf-8`;
2339
2340 "qx//" can also be called like a function with "readpipe" in
2341 perlfunc.
2342
2343 See "I/O Operators" for more discussion.
2344
2345 "qw/STRING/"
2346 Evaluates to a list of the words extracted out of STRING, using
2347 embedded whitespace as the word delimiters. It can be understood
2348 as being roughly equivalent to:
2349
2350 split(" ", q/STRING/);
2351
2352 the differences being that it only splits on ASCII whitespace,
2353 generates a real list at compile time, and in scalar context it
2354 returns the last element in the list. So this expression:
2355
2356 qw(foo bar baz)
2357
2358 is semantically equivalent to the list:
2359
2360 "foo", "bar", "baz"
2361
2362 Some frequently seen examples:
2363
use POSIX qw( setlocale localeconv );
2365 @EXPORT = qw( foo bar baz );
2366
A common mistake is to try to separate the words with commas or to
put comments into a multi-line "qw"-string. For this reason, the
"use warnings" pragma and the -w switch (that is, the $^W variable)
produce warnings if the STRING contains the "," or the "#"
character.
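
A short sketch of typical use and of the comma mistake follows. The variable names are illustrative, and the "qw" warning category is switched off in one place only to keep the demonstration quiet.

```perl
use strict;
use warnings;

my @words = qw(foo bar baz);
my %seen  = map { $_ => 1 } qw(foo bar baz);   # handy for lookup tables

# Commas are not separators inside qw; they become part of the words.
my @oops = do { no warnings 'qw'; qw(foo, bar, baz) };

print scalar(@words), " words; first mistaken word is '$oops[0]'\n";
```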
2372
2373 "tr/SEARCHLIST/REPLACEMENTLIST/cdsr"
2374 "y/SEARCHLIST/REPLACEMENTLIST/cdsr"
2375 Transliterates all occurrences of the characters found (or not
2376 found if the "/c" modifier is specified) in the search list with
2377 the positionally corresponding character in the replacement list,
2378 possibly deleting some, depending on the modifiers specified. It
2379 returns the number of characters replaced or deleted. If no string
2380 is specified via the "=~" or "!~" operator, the $_ string is
2381 transliterated.
2382
2383 For sed devotees, "y" is provided as a synonym for "tr".
2384
2385 If the "/r" (non-destructive) option is present, a new copy of the
2386 string is made and its characters transliterated, and this copy is
2387 returned no matter whether it was modified or not: the original
2388 string is always left unchanged. The new copy is always a plain
2389 string, even if the input string is an object or a tied variable.
2390
2391 Unless the "/r" option is used, the string specified with "=~" must
2392 be a scalar variable, an array element, a hash element, or an
2393 assignment to one of those; in other words, an lvalue.
2394
2395 If the characters delimiting SEARCHLIST and REPLACEMENTLIST are
2396 single quotes ("tr'SEARCHLIST'REPLACEMENTLIST'"), the only
2397 interpolation is removal of "\" from pairs of "\\".
2398
2399 Otherwise, a character range may be specified with a hyphen, so
2400 "tr/A-J/0-9/" does the same replacement as
2401 "tr/ACEGIBDFHJ/0246813579/".
2402
2403 If the SEARCHLIST is delimited by bracketing quotes, the
2404 REPLACEMENTLIST must have its own pair of quotes, which may or may
2405 not be bracketing quotes; for example, "tr[aeiouy][yuoiea]" or
2406 "tr(+\-*/)/ABCD/".
2407
2408 Characters may be literals, or (if the delimiters aren't single
2409 quotes) any of the escape sequences accepted in double-quoted
strings. But there is never any variable interpolation, so "$" and
"@" are always treated as literals. A hyphen at the beginning or
end, or preceded by a backslash, is also always considered a
literal. Escape sequence details are in the table near the
beginning of this section.
2415
2416 Note that "tr" does not do regular expression character classes
2417 such as "\d" or "\pL". The "tr" operator is not equivalent to the
2418 tr(1) utility. "tr[a-z][A-Z]" will uppercase the 26 letters "a"
2419 through "z", but for case changing not confined to ASCII, use "lc",
2420 "uc", "lcfirst", "ucfirst" (all documented in perlfunc), or the
2421 substitution operator "s/PATTERN/REPLACEMENT/" (with "\U", "\u",
2422 "\L", and "\l" string-interpolation escapes in the REPLACEMENT
2423 portion).
2424
2425 Most ranges are unportable between character sets, but certain ones
2426 signal Perl to do special handling to make them portable. There
2427 are two classes of portable ranges. The first are any subsets of
2428 the ranges "A-Z", "a-z", and "0-9", when expressed as literal
2429 characters.
2430
2431 tr/h-k/H-K/
2432
2433 capitalizes the letters "h", "i", "j", and "k" and nothing else, no
2434 matter what the platform's character set is. In contrast, all of
2435
2436 tr/\x68-\x6B/\x48-\x4B/
2437 tr/h-\x6B/H-\x4B/
2438 tr/\x68-k/\x48-K/
2439
2440 do the same capitalizations as the previous example when run on
2441 ASCII platforms, but something completely different on EBCDIC ones.
2442
2443 The second class of portable ranges is invoked when one or both of
the range's end points are expressed as "\N{...}". For example,
2445
2446 $string =~ tr/\N{U+20}-\N{U+7E}//d;
2447
2448 removes from $string all the platform's characters which are
2449 equivalent to any of Unicode U+0020, U+0021, ... U+007D, U+007E.
2450 This is a portable range, and has the same effect on every platform
2451 it is run on. In this example, these are the ASCII printable
2452 characters. So after this is run, $string has only controls and
2453 characters which have no ASCII equivalents.
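
A quick way to try this is with the "/r" option, which leaves the original untouched. The sample string below is made up for illustration.

```perl
use strict;
use warnings;

my $string = "caf\x{E9}\tau lait\n";

# /d deletes every character in the range; /r returns the edited copy.
my $leftover = $string =~ tr/\N{U+20}-\N{U+7E}//dr;

# What remains: U+00E9, a TAB, and the newline.
printf "%d characters left\n", length $leftover;
```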
2454
2455 But, even for portable ranges, it is not generally obvious what is
2456 included without having to look things up in the manual. A sound
2457 principle is to use only ranges that both begin from, and end at,
2458 either ASCII alphabetics of equal case ("b-e", "B-E"), or digits
2459 ("1-4"). Anything else is unclear (and unportable unless "\N{...}"
2460 is used). If in doubt, spell out the character sets in full.
2461
2462 Options:
2463
2464 c Complement the SEARCHLIST.
2465 d Delete found but unreplaced characters.
2466 r Return the modified string and leave the original string
2467 untouched.
2468 s Squash duplicate replaced characters.
2469
2470 If the "/d" modifier is specified, any characters specified by
2471 SEARCHLIST not found in REPLACEMENTLIST are deleted. (Note that
2472 this is slightly more flexible than the behavior of some tr
2473 programs, which delete anything they find in the SEARCHLIST,
2474 period.)
2475
2476 If the "/s" modifier is specified, sequences of characters, all in
2477 a row, that were transliterated to the same character are squashed
2478 down to a single instance of that character.
2479
my $a = "aaaba";
$a =~ tr/a/a/s;    # $a now is "aba"
2482
2483 If the "/d" modifier is used, the REPLACEMENTLIST is always
2484 interpreted exactly as specified. Otherwise, if the
2485 REPLACEMENTLIST is shorter than the SEARCHLIST, the final
2486 character, if any, is replicated until it is long enough. There
2487 won't be a final character if and only if the REPLACEMENTLIST is
2488 empty, in which case REPLACEMENTLIST is copied from SEARCHLIST.
2489 An empty REPLACEMENTLIST is useful for counting characters in a
2490 class, or for squashing character sequences in a class.
2491
2492 tr/abcd// tr/abcd/abcd/
2493 tr/abcd/AB/ tr/abcd/ABBB/
2494 tr/abcd//d s/[abcd]//g
2495 tr/abcd/AB/d (tr/ab/AB/ + s/[cd]//g) - but run together
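
The rows of this table can be tried out directly. The sketch below uses "/r" so that one sample string (chosen arbitrarily) can be reused for every case.

```perl
use strict;
use warnings;

my $text = 'abcd-abcd';

my $count   = ($text =~ tr/abcd//);     # empty list: counts, changes nothing
my $padded  = $text =~ tr/abcd/AB/r;    # short list: final "B" is repeated
my $deleted = $text =~ tr/abcd//dr;     # /d with empty list: delete them all
my $mixed   = $text =~ tr/abcd/AB/dr;   # /d: map a and b, delete c and d

print "$count $padded $deleted $mixed\n";
```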
2496
2497 If the "/c" modifier is specified, the characters to be
2498 transliterated are the ones NOT in SEARCHLIST, that is, it is
2499 complemented. If "/d" and/or "/s" are also specified, they apply
to the complemented SEARCHLIST. Recall that if REPLACEMENTLIST is
empty (except under "/d") a copy of SEARCHLIST is used instead.
2502 That copy is made after complementing under "/c". SEARCHLIST is
2503 sorted by code point order after complementing, and any
2504 REPLACEMENTLIST is applied to that sorted result. This means that
2505 under "/c", the order of the characters specified in SEARCHLIST is
2506 irrelevant. This can lead to different results on EBCDIC systems
2507 if REPLACEMENTLIST contains more than one character, hence it is
2508 generally non-portable to use "/c" with such a REPLACEMENTLIST.
2509
2510 Another way of describing the operation is this: If "/c" is
2511 specified, the SEARCHLIST is sorted by code point order, then
2512 complemented. If REPLACEMENTLIST is empty and "/d" is not
2513 specified, REPLACEMENTLIST is replaced by a copy of SEARCHLIST (as
2514 modified under "/c"), and these potentially modified lists are used
2515 as the basis for what follows. Any character in the target string
2516 that isn't in SEARCHLIST is passed through unchanged. Every other
2517 character in the target string is replaced by the character in
2518 REPLACEMENTLIST that positionally corresponds to its mate in
2519 SEARCHLIST, except that under "/s", the 2nd and following
2520 characters are squeezed out in a sequence of characters in a row
2521 that all translate to the same character. If SEARCHLIST is longer
2522 than REPLACEMENTLIST, characters in the target string that match a
2523 character in SEARCHLIST that doesn't have a correspondence in
2524 REPLACEMENTLIST are either deleted from the target string if "/d"
2525 is specified; or replaced by the final character in REPLACEMENTLIST
2526 if "/d" isn't specified.
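
The interaction of "/c" with an empty list, with "/d", and with "/s" can be illustrated with a sketch like this one. The sample text is arbitrary.

```perl
use strict;
use warnings;

my $text = "Hello, World! 123";

my $non_alpha = ($text =~ tr/a-zA-Z//c);   # complement + empty list: count
my $letters   = $text =~ tr/a-zA-Z//cdr;   # delete everything but letters
my $spaced    = $text =~ tr/a-zA-Z/ /csr;  # each run of non-letters -> " "

print "$non_alpha non-letters; '$letters'; '$spaced'\n";
```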
2527
2528 Some examples:
2529
2530 $ARGV[1] =~ tr/A-Z/a-z/; # canonicalize to lower case ASCII
2531
2532 $cnt = tr/*/*/; # count the stars in $_
2533 $cnt = tr/*//; # same thing
2534
2535 $cnt = $sky =~ tr/*/*/; # count the stars in $sky
2536 $cnt = $sky =~ tr/*//; # same thing
2537
2538 $cnt = $sky =~ tr/*//c; # count all the non-stars in $sky
2539 $cnt = $sky =~ tr/*/*/c; # same, but transliterate each non-star
2540 # into a star, leaving the already-stars
2541 # alone. Afterwards, everything in $sky
2542 # is a star.
2543
2544 $cnt = tr/0-9//; # count the ASCII digits in $_
2545
2546 tr/a-zA-Z//s; # bookkeeper -> bokeper
2547 tr/o/o/s; # bookkeeper -> bokkeeper
2548 tr/oe/oe/s; # bookkeeper -> bokkeper
2549 tr/oe//s; # bookkeeper -> bokkeper
2550 tr/oe/o/s; # bookkeeper -> bokkopor
2551
2552 ($HOST = $host) =~ tr/a-z/A-Z/;
2553 $HOST = $host =~ tr/a-z/A-Z/r; # same thing
2554
2555 $HOST = $host =~ tr/a-z/A-Z/r # chained with s///r
2556 =~ s/:/ -p/r;
2557
2558 tr/a-zA-Z/ /cs; # change non-alphas to single space
2559
2560 @stripped = map tr/a-zA-Z/ /csr, @original;
2561 # /r with map
2562
2563 tr [\200-\377]
2564 [\000-\177]; # wickedly delete 8th bit
2565
2566 $foo !~ tr/A/a/ # transliterate all the A's in $foo to 'a',
2567 # return 0 if any were found and changed.
2568 # Otherwise return 1
2569
2570 If multiple transliterations are given for a character, only the
2571 first one is used:
2572
2573 tr/AAA/XYZ/
2574
2575 will transliterate any A to X.
2576
Because the transliteration table is built at compile time, neither
the SEARCHLIST nor the REPLACEMENTLIST is subjected to double
quote interpolation. That means that if you want to use variables,
you must use an "eval()":
2581
2582 eval "tr/$oldlist/$newlist/";
2583 die $@ if $@;
2584
2585 eval "tr/$oldlist/$newlist/, 1" or die $@;
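
A fuller sketch of the "eval" approach, applied to a named variable rather than $_, might look like this. The list values are illustrative. Because the lists become part of the compiled source text, they should only ever come from trusted input.

```perl
use strict;
use warnings;

my ($oldlist, $newlist) = ('a-c', 'A-C');   # illustrative values
my $text = 'abcabc xyz';

# The lists are interpolated into the source text, which is then compiled.
# "\$text" is escaped so that the variable itself, not its value, appears
# in the evaluated code.
my $changed = eval "\$text =~ tr/$oldlist/$newlist/";
die $@ if $@;

print "$changed characters changed: $text\n";
```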
2586
2587 "<<EOF"
2588 A line-oriented form of quoting is based on the shell "here-
2589 document" syntax. Following a "<<" you specify a string to
2590 terminate the quoted material, and all lines following the current
2591 line down to the terminating string are the value of the item.
2592
2593 Prefixing the terminating string with a "~" specifies that you want
2594 to use "Indented Here-docs" (see below).
2595
2596 The terminating string may be either an identifier (a word), or
2597 some quoted text. An unquoted identifier works like double quotes.
2598 There may not be a space between the "<<" and the identifier,
2599 unless the identifier is explicitly quoted. The terminating string
2600 must appear by itself (unquoted and with no surrounding whitespace)
2601 on the terminating line.
2602
If the terminating string is quoted, the type of quotes used
determines the treatment of the text.
2605
2606 Double Quotes
2607 Double quotes indicate that the text will be interpolated using
2608 exactly the same rules as normal double quoted strings.
2609
2610 print <<EOF;
2611 The price is $Price.
2612 EOF
2613
2614 print << "EOF"; # same as above
2615 The price is $Price.
2616 EOF
2617
2618 Single Quotes
2619 Single quotes indicate the text is to be treated literally with
2620 no interpolation of its content. This is similar to single
2621 quoted strings except that backslashes have no special meaning,
2622 with "\\" being treated as two backslashes and not one as they
2623 would in every other quoting construct.
2624
2625 Just as in the shell, a backslashed bareword following the "<<"
2626 means the same thing as a single-quoted string does:
2627
2628 $cost = <<'VISTA'; # hasta la ...
2629 That'll be $10 please, ma'am.
2630 VISTA
2631
2632 $cost = <<\VISTA; # Same thing!
2633 That'll be $10 please, ma'am.
2634 VISTA
2635
2636 This is the only form of quoting in perl where there is no need
2637 to worry about escaping content, something that code generators
2638 can and do make good use of.
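
The contrast between the double-quoted and single-quoted forms can be seen by feeding the same body to both. This is a small sketch with made-up content:

```perl
use strict;
use warnings;

my $Price = 42;

# Double-quoted terminator: variables and escapes are processed.
my $interpolated = <<"EOF";
The price is $Price.\tThanks.
EOF

# Single-quoted terminator: the body is taken exactly as written.
my $literal = <<'EOF';
The price is $Price.\tThanks.
EOF

print $interpolated, $literal;
```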
2639
2640 Backticks
2641 The content of the here doc is treated just as it would be if
2642 the string were embedded in backticks. Thus the content is
2643 interpolated as though it were double quoted and then executed
2644 via the shell, with the results of the execution returned.
2645
2646 print << `EOC`; # execute command and get results
2647 echo hi there
2648 EOC
2649
2650 Indented Here-docs
2651 The here-doc modifier "~" allows you to indent your here-docs
2652 to make the code more readable:
2653
if ($some_var) {
  print <<~EOF;
    This is a here-doc
    EOF
}
2659
2660 This will print...
2661
2662 This is a here-doc
2663
2664 ...with no leading whitespace.
2665
2666 The delimiter is used to determine the exact whitespace to
2667 remove from the beginning of each line. All lines must have at
2668 least the same starting whitespace (except lines only
2669 containing a newline) or perl will croak. Tabs and spaces can
2670 be mixed, but are matched exactly. One tab will not be equal
2671 to 8 spaces!
2672
2673 Additional beginning whitespace (beyond what preceded the
2674 delimiter) will be preserved:
2675
print <<~EOF;
  This text is not indented
    This text is indented with two spaces
  		This text is indented with two tabs
  EOF
2681
2682 Finally, the modifier may be used with all of the forms
2683 mentioned above:
2684
2685 <<~\EOF;
2686 <<~'EOF'
2687 <<~"EOF"
2688 <<~`EOF`
2689
2690 And whitespace may be used between the "~" and quoted
2691 delimiters:
2692
2693 <<~ 'EOF'; # ... "EOF", `EOF`
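
Putting these pieces together, an indented here-doc inside a subroutine might be sketched as follows. This assumes a perl recent enough to support "<<~" (5.26 or later); "banner" is a hypothetical routine name.

```perl
use strict;
use warnings;

# banner() is a hypothetical routine used only for illustration.
sub banner {
    my ($name) = @_;
    # The terminator's indentation (eight spaces) is removed from each line.
    return <<~"EOT";
        Dear $name,
          this line keeps two extra spaces.
        EOT
}

my $text = banner('Camel');
print $text;
```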
2694
2695 It is possible to stack multiple here-docs in a row:
2696
2697 print <<"foo", <<"bar"; # you can stack them
2698 I said foo.
2699 foo
2700 I said bar.
2701 bar
2702
2703 myfunc(<< "THIS", 23, <<'THAT');
2704 Here's a line
2705 or two.
2706 THIS
2707 and here's another.
2708 THAT
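
To see what a routine actually receives from stacked here-docs, a sketch along these lines may help. Here "join_parts" is a hypothetical helper that simply joins its arguments.

```perl
use strict;
use warnings;

# join_parts() is a hypothetical helper used only for this illustration.
sub join_parts { return join '|', @_ }

# The bodies follow in the order in which their markers appear.
my $joined = join_parts(<<"FIRST", 23, <<'SECOND');
Here's a line
or two.
FIRST
and here's another.
SECOND

print $joined;
```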
2709
2710 Just don't forget that you have to put a semicolon on the end to
2711 finish the statement, as Perl doesn't know you're not going to try
2712 to do this:
2713
2714 print <<ABC
2715 179231
2716 ABC
2717 + 20;
2718
2719 If you want to remove the line terminator from your here-docs, use
2720 "chomp()".
2721
2722 chomp($string = <<'END');
2723 This is a string.
2724 END
2725
2726 If you want your here-docs to be indented with the rest of the
2727 code, use the "<<~FOO" construct described under "Indented Here-
2728 docs":
2729
2730 $quote = <<~'FINIS';
2731 The Road goes ever on and on,
2732 down from the door where it began.
2733 FINIS
2734
2735 If you use a here-doc within a delimited construct, such as in
2736 "s///eg", the quoted material must still come on the line following
2737 the "<<FOO" marker, which means it may be inside the delimited
2738 construct:
2739
2740 s/this/<<E . 'that'
2741 the other
2742 E
2743 . 'more '/eg;
2744
2745 It works this way as of Perl 5.18. Historically, it was
2746 inconsistent, and you would have to write
2747
2748 s/this/<<E . 'that'
2749 . 'more '/eg;
2750 the other
2751 E
2752
2753 outside of string evals.
2754
2755 Additionally, quoting rules for the end-of-string identifier are
2756 unrelated to Perl's quoting rules. "q()", "qq()", and the like are
2757 not supported in place of '' and "", and the only interpolation is
2758 for backslashing the quoting character:
2759
2760 print << "abc\"def";
2761 testing...
2762 abc"def
2763
2764 Finally, quoted strings cannot span multiple lines. The general
2765 rule is that the identifier must be a string literal. Stick with
2766 that, and you should be safe.
2767
2768 Gory details of parsing quoted constructs
2769 When presented with something that might have several different
2770 interpretations, Perl uses the DWIM (that's "Do What I Mean") principle
2771 to pick the most probable interpretation. This strategy is so
2772 successful that Perl programmers often do not suspect the ambivalence
2773 of what they write. But from time to time, Perl's notions differ
2774 substantially from what the author honestly meant.
2775
2776 This section hopes to clarify how Perl handles quoted constructs.
2777 Although the most common reason to learn this is to unravel
2778 labyrinthine regular expressions, because the initial steps of parsing
2779 are the same for all quoting operators, they are all discussed
2780 together.
2781
2782 The most important Perl parsing rule is the first one discussed below:
2783 when processing a quoted construct, Perl first finds the end of that
2784 construct, then interprets its contents. If you understand this rule,
2785 you may skip the rest of this section on the first reading. The other
2786 rules are likely to contradict the user's expectations much less
2787 frequently than this first one.
2788
2789 Some passes discussed below are performed concurrently, but because
2790 their results are the same, we consider them individually. For
2791 different quoting constructs, Perl performs different numbers of
2792 passes, from one to four, but these passes are always performed in the
2793 same order.
2794
2795 Finding the end
2796 The first pass is finding the end of the quoted construct. This
2797 results in saving to a safe location a copy of the text (between
2798 the starting and ending delimiters), normalized as necessary to
2799 avoid needing to know what the original delimiters were.
2800
2801 If the construct is a here-doc, the ending delimiter is a line that
2802 has a terminating string as the content. Therefore "<<EOF" is
2803 terminated by "EOF" immediately followed by "\n" and starting from
2804 the first column of the terminating line. When searching for the
2805 terminating line of a here-doc, nothing is skipped. In other
2806 words, lines after the here-doc syntax are compared with the
2807 terminating string line by line.
2808
For constructs other than here-docs, single characters are used as
2810 starting and ending delimiters. If the starting delimiter is an
2811 opening punctuation (that is "(", "[", "{", or "<"), the ending
2812 delimiter is the corresponding closing punctuation (that is ")",
2813 "]", "}", or ">"). If the starting delimiter is an unpaired
2814 character like "/" or a closing punctuation, the ending delimiter
2815 is the same as the starting delimiter. Therefore a "/" terminates
2816 a "qq//" construct, while a "]" terminates both "qq[]" and "qq]]"
2817 constructs.
2818
2819 When searching for single-character delimiters, escaped delimiters
2820 and "\\" are skipped. For example, while searching for terminating
2821 "/", combinations of "\\" and "\/" are skipped. If the delimiters
2822 are bracketing, nested pairs are also skipped. For example, while
2823 searching for a closing "]" paired with the opening "[",
2824 combinations of "\\", "\]", and "\[" are all skipped, and nested
2825 "[" and "]" are skipped as well. However, when backslashes are
2826 used as the delimiters (like "qq\\" and "tr\\\"), nothing is
2827 skipped. During the search for the end, backslashes that escape
2828 delimiters or other backslashes are removed (exactly speaking, they
2829 are not copied to the safe location).
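
The skipping rules can be observed with "q//", which does almost nothing beyond finding its end. A small sketch with arbitrary sample strings:

```perl
use strict;
use warnings;

my $nested  = q[a [nested] pair];   # inner brackets balance, so they stay
my $escaped = q(close \) paren);    # "\)" is skipped; the backslash is dropped
my $slash   = q/a\/b/;              # "\/" does not end the construct
my $backsl  = q/a\\b/;              # "\\" yields a single backslash

print "$nested\n$escaped\n$slash\n$backsl\n";
```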
2830
2831 For constructs with three-part delimiters ("s///", "y///", and
2832 "tr///"), the search is repeated once more. If the first delimiter
2833 is not an opening punctuation, the three delimiters must be the
2834 same, such as "s!!!" and "tr)))", in which case the second
2835 delimiter terminates the left part and starts the right part at
2836 once. If the left part is delimited by bracketing punctuation
2837 (that is "()", "[]", "{}", or "<>"), the right part needs another
2838 pair of delimiters such as "s(){}" and "tr[]//". In these cases,
2839 whitespace and comments are allowed between the two parts, although
2840 the comment must follow at least one whitespace character;
2841 otherwise a character expected as the start of the comment may be
2842 regarded as the starting delimiter of the right part.
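
For instance, a substitution and a transliteration with bracketing delimiters might be laid out as in this sketch (the sample text is arbitrary):

```perl
use strict;
use warnings;

my $text = 'foo baz';

$text =~ s{foo}   # the pattern; this comment follows whitespace
          {bar};  # the replacement, in its own pair of delimiters

(my $upper = $text) =~ tr[a-z] [A-Z];   # whitespace between the two parts

print "$text $upper\n";
```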
2843
2844 During this search no attention is paid to the semantics of the
2845 construct. Thus:
2846
2847 "$hash{"$foo/$bar"}"
2848
2849 or:
2850
2851 m/
2852 bar # NOT a comment, this slash / terminated m//!
2853 /x
2854
2855 do not form legal quoted expressions. The quoted part ends on the
2856 first """ and "/", and the rest happens to be a syntax error.
2857 Because the slash that terminated "m//" was followed by a "SPACE",
2858 the example above is not "m//x", but rather "m//" with no "/x"
2859 modifier. So the embedded "#" is interpreted as a literal "#".
2860
2861 Also no attention is paid to "\c\" (multichar control char syntax)
2862 during this search. Thus the second "\" in "qq/\c\/" is
2863 interpreted as a part of "\/", and the following "/" is not
2864 recognized as a delimiter. Instead, use "\034" or "\x1c" at the
2865 end of quoted constructs.
2866
2867 Interpolation
2868 The next step is interpolation in the text obtained, which is now
2869 delimiter-independent. There are multiple cases.
2870
2871 "<<'EOF'"
2872 No interpolation is performed. Note that the combination "\\"
2873 is left intact, since escaped delimiters are not available for
2874 here-docs.
2875
2876 "m''", the pattern of "s'''"
No interpolation is performed at this stage. Any backslashed
sequences including "\\" are treated at the stage of "parsing
regular expressions".
2880
2881 '', "q//", "tr'''", "y'''", the replacement of "s'''"
2882 The only interpolation is removal of "\" from pairs of "\\".
2883 Therefore "-" in "tr'''" and "y'''" is treated literally as a
2884 hyphen and no character range is available. "\1" in the
2885 replacement of "s'''" does not work as $1.
2886
2887 "tr///", "y///"
2888 No variable interpolation occurs. String modifying
2889 combinations for case and quoting such as "\Q", "\U", and "\E"
2890 are not recognized. The other escape sequences such as "\200"
2891 and "\t" and backslashed characters such as "\\" and "\-" are
2892 converted to appropriate literals. The character "-" is
2893 treated specially and therefore "\-" is treated as a literal
2894 "-".
2895
2896 "", "``", "qq//", "qx//", "<file*glob>", "<<"EOF""
2897 "\Q", "\U", "\u", "\L", "\l", "\F" (possibly paired with "\E")
2898 are converted to corresponding Perl constructs. Thus,
2899 "$foo\Qbaz$bar" is converted to
2900 "$foo . (quotemeta("baz" . $bar))" internally. The other
2901 escape sequences such as "\200" and "\t" and backslashed
2902 characters such as "\\" and "\-" are replaced with appropriate
2903 expansions.
2904
2905 Let it be stressed that whatever falls between "\Q" and "\E" is
2906 interpolated in the usual way. Something like "\Q\\E" has no
2907 "\E" inside. Instead, it has "\Q", "\\", and "E", so the
2908 result is the same as for "\\\\E". As a general rule,
2909 backslashes between "\Q" and "\E" may lead to counterintuitive
2910 results. So, "\Q\t\E" is converted to "quotemeta("\t")", which
2911 is the same as "\\\t" (since TAB is not alphanumeric). Note
2912 also that:
2913
2914 $str = '\t';
2915 return "\Q$str";
2916
2917 may be closer to the conjectural intention of the writer of
2918 "\Q\t\E".
2919
2920 Interpolated scalars and arrays are converted internally to the
2921 "join" and "." catenation operations. Thus, "$foo XXX '@arr'"
2922 becomes:
2923
2924 $foo . " XXX '" . (join $", @arr) . "'";
2925
2926 All operations above are performed simultaneously, left to
2927 right.
2928
2929 Because the result of "\Q STRING \E" has all metacharacters
2930 quoted, there is no way to insert a literal "$" or "@" inside a
2931 "\Q\E" pair. If protected by "\", "$" will be quoted to become
2932 "\\\$"; if not, it is interpreted as the start of an
2933 interpolated scalar.
2934
2935 Note also that the interpolation code needs to make a decision
2936 on where the interpolated scalar ends. For instance, whether
2937 "a $x -> {c}" really means:
2938
2939 "a " . $x . " -> {c}";
2940
2941 or:
2942
2943 "a " . $x -> {c};
2944
Most of the time, the choice is the longest possible text that
does not include spaces between components and which contains
matching braces or brackets. Because the outcome may be determined
by voting based on heuristic estimators, the result is not
strictly predictable. Fortunately, it's usually correct for
ambiguous cases.
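
The "\Q" and array-interpolation behavior described for these double-quoted constructs can be checked with a short sketch. The variable names and values are illustrative only.

```perl
use strict;
use warnings;

my $str      = '\t';        # two characters: backslash and "t"
my $from_var = "\Q$str";    # quotemeta('\t'): backslash, backslash, "t"
my $from_esc = "\Q\t\E";    # quotemeta("\t"): backslash, then a TAB

my $foo = 'start';
my @arr = (1, 2, 3);
my $interp = "$foo XXX '@arr'";                          # interpolated form
my $manual = $foo . " XXX '" . (join $", @arr) . "'";    # what it amounts to

printf "%d and %d characters; %s\n",
    length $from_var, length $from_esc, $interp;
```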
2951
2952 the replacement of "s///"
2953 Processing of "\Q", "\U", "\u", "\L", "\l", "\F" and
2954 interpolation happens as with "qq//" constructs.
2955
2956 It is at this step that "\1" is begrudgingly converted to $1 in
2957 the replacement text of "s///", in order to correct the
2958 incorrigible sed hackers who haven't picked up the saner idiom
2959 yet. A warning is emitted if the "use warnings" pragma or the
2960 -w command-line flag (that is, the $^W variable) was set.
2961
"RE" in "m?RE?", "/RE/", "m/RE/", "s/RE/foo/"
2963 Processing of "\Q", "\U", "\u", "\L", "\l", "\F", "\E", and
2964 interpolation happens (almost) as with "qq//" constructs.
2965
2966 Processing of "\N{...}" is also done here, and compiled into an
2967 intermediate form for the regex compiler. (This is because, as
2968 mentioned below, the regex compilation may be done at execution
2969 time, and "\N{...}" is a compile-time construct.)
2970
2971 However any other combinations of "\" followed by a character
2972 are not substituted but only skipped, in order to parse them as
2973 regular expressions at the following step. As "\c" is skipped
2974 at this step, "@" of "\c@" in RE is possibly treated as an
2975 array symbol (for example @foo), even though the same text in
2976 "qq//" gives interpolation of "\c@".
2977
2978 Code blocks such as "(?{BLOCK})" are handled by temporarily
2979 passing control back to the perl parser, in a similar way that
2980 an interpolated array subscript expression such as
2981 "foo$array[1+f("[xyz")]bar" would be.
2982
2983 Moreover, inside "(?{BLOCK})", "(?# comment )", and a
2984 "#"-comment in a "/x"-regular expression, no processing is
2985 performed whatsoever. This is the first step at which the
2986 presence of the "/x" modifier is relevant.
2987
2988 Interpolation in patterns has several quirks: $|, $(, $), "@+"
2989 and "@-" are not interpolated, and constructs $var[SOMETHING]
2990 are voted (by several different estimators) to be either an
2991 array element or $var followed by an RE alternative. This is
where the notation "${arr[$bar]}" comes in handy: "/${arr[0-9]}/"
2993 is interpreted as array element "-9", not as a regular
2994 expression from the variable $arr followed by a digit, which
2995 would be the interpretation of "/$arr[0-9]/". Since voting
2996 among different estimators may occur, the result is not
2997 predictable.
2998
2999 The lack of processing of "\\" creates specific restrictions on
3000 the post-processed text. If the delimiter is "/", one cannot
3001 get the combination "\/" into the result of this step. "/"
3002 will finish the regular expression, "\/" will be stripped to
3003 "/" on the previous step, and "\\/" will be left as is.
3004 Because "/" is equivalent to "\/" inside a regular expression,
this does not matter unless the delimiter happens to be a
character special to the RE engine, such as in "s*foo*bar*",
"m[foo]", or "m?foo?"; or an alphanumeric char, as in:
3008
3009 m m ^ a \s* b mmx;
3010
3011 In the RE above, which is intentionally obfuscated for
3012 illustration, the delimiter is "m", the modifier is "mx", and
3013 after delimiter-removal the RE is the same as for
3014 "m/ ^ a \s* b /mx". There's more than one reason you're
3015 encouraged to restrict your delimiters to non-alphanumeric,
3016 non-whitespace choices.
3017
3018 This step is the last one for all constructs except regular
3019 expressions, which are processed further.
3020
3021 parsing regular expressions
Previous steps were performed during the compilation of Perl code,
but this one happens at run time, although it may be optimized to
be calculated at compile time if appropriate. After the
preprocessing described above, and possibly after evaluation if
concatenation, joining, casing translation, or metaquoting are
involved, the resulting string is passed to the RE engine for
compilation.
3028
3029 Whatever happens in the RE engine might be better discussed in
3030 perlre, but for the sake of continuity, we shall do so here.
3031
3032 This is another step where the presence of the "/x" modifier is
3033 relevant. The RE engine scans the string from left to right and
3034 converts it into a finite automaton.
3035
3036 Backslashed characters are either replaced with corresponding
3037 literal strings (as with "\{"), or else they generate special nodes
3038 in the finite automaton (as with "\b"). Characters special to the
3039 RE engine (such as "|") generate corresponding nodes or groups of
3040 nodes. "(?#...)" comments are ignored. All the rest is either
3041 converted to literal strings to match, or else is ignored (as is
3042 whitespace and "#"-style comments if "/x" is present).
3043
Parsing of the bracketed character class construct, "[...]", differs
from the rules used for the rest of the pattern. The terminator of
this construct is found using the same rules as for finding the
terminator of a "{}"-delimited construct, the only exception being
that "]" immediately following "[" is treated as though preceded by
a backslash.
3050
3051 The terminator of runtime "(?{...})" is found by temporarily
3052 switching control to the perl parser, which should stop at the
3053 point where the logically balancing terminating "}" is found.
3054
It is possible to inspect both the string given to the RE engine and
the resulting finite automaton. See the arguments
"debug"/"debugcolor" in the "use re" pragma, as well as Perl's -Dr
command-line switch documented in "Command Switches" in perlrun.
3059
3060 Optimization of regular expressions
3061 This step is listed for completeness only. Since it does not
3062 change semantics, details of this step are not documented and are
3063 subject to change without notice. This step is performed over the
3064 finite automaton that was generated during the previous pass.
3065
3066 It is at this stage that "split()" silently optimizes "/^/" to mean
3067 "/^/m".
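A small sketch of the effect, using a made-up three-line string: because "split" treats "/^/" as "/^/m", the string is broken at the start of every line, not only at the start of the string.

```perl
use strict;
use warnings;

my $text  = "first\nsecond\nthird\n";   # sample input (assumed)
my @lines = split /^/, $text;           # handled as if written /^/m

printf "%d lines\n", scalar @lines;     # 3 lines
```

Each element keeps its trailing newline, since the split point is the zero-width position after it.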
3068
3069 I/O Operators
3070 There are several I/O operators you should know about.
3071
3072 A string enclosed by backticks (grave accents) first undergoes double-
3073 quote interpolation. It is then interpreted as an external command,
3074 and the output of that command is the value of the backtick string,
3075 like in a shell. In scalar context, a single string consisting of all
3076 output is returned. In list context, a list of values is returned, one
3077 per line of output. (You can set $/ to use a different line
3078 terminator.) The command is executed each time the pseudo-literal is
3079 evaluated. The status value of the command is returned in $? (see
3080 perlvar for the interpretation of $?). Unlike in csh, no translation
3081 is done on the return data--newlines remain newlines. Unlike in any of
3082 the shells, single quotes do not hide variable names in the command
3083 from interpretation. To pass a literal dollar-sign through to the
shell you need to hide it with a backslash. The generalized form of
backticks is "qx//", or you can call the "readpipe" function (see
"readpipe" in perlfunc). (Because backticks always undergo shell
expansion as well, see perlsec for security concerns.)
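The context rules can be sketched as follows. The child command is an assumption chosen so that the example depends only on the running perl ($^X); any command that prints a few lines would do.

```perl
use strict;
use warnings;

# Hypothetical child command: the current perl printing 1, 2, 3 on
# separate lines. The quotes protect a path that contains spaces.
my $cmd = qq{"$^X" -le "print for 1..3"};

my $all    = `$cmd`;     # scalar context: all output as one string
my $status = $? >> 8;    # exit status of the command just run
my @lines  = `$cmd`;     # list context: one element per line

printf "exit %d, %d lines\n", $status, scalar @lines;
```

Note that the command runs twice here, once per evaluation of the backtick string.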
3088
3089 In scalar context, evaluating a filehandle in angle brackets yields the
3090 next line from that file (the newline, if any, included), or "undef" at
3091 end-of-file or on error. When $/ is set to "undef" (sometimes known as
3092 file-slurp mode) and the file is empty, it returns '' the first time,
3093 followed by "undef" subsequently.
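As an illustration, the following sketch reads from an in-memory filehandle (a string standing in for a file, which is an assumption of the example) first line by line and then in file-slurp mode.

```perl
use strict;
use warnings;

my $data = "alpha\nbeta\ngamma\n";     # stands in for a file's contents
open(my $fh, '<', \$data) or die "open: $!";

my $first = <$fh>;                     # "alpha\n", newline included
my $rest  = do { local $/; <$fh> };    # slurp the remainder: "beta\ngamma\n"
my $eof   = <$fh>;                     # undef at end-of-file

close $fh;
print $first, $rest;
```

Localizing $/ inside the "do" block restores line-at-a-time reading as soon as the block ends.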
3094
3095 Ordinarily you must assign the returned value to a variable, but there
3096 is one situation where an automatic assignment happens. If and only if
3097 the input symbol is the only thing inside the conditional of a "while"
3098 statement (even if disguised as a "for(;;)" loop), the value is
3099 automatically assigned to the global variable $_, destroying whatever
3100 was there previously. (This may seem like an odd thing to you, but
3101 you'll use the construct in almost every Perl script you write.) The
3102 $_ variable is not implicitly localized. You'll have to put a
3103 "local $_;" before the loop if you want that to happen. Furthermore,
3104 if the input symbol or an explicit assignment of the input symbol to a
3105 scalar is used as a "while"/"for" condition, then the condition
3106 actually tests for definedness of the expression's value, not for its
3107 regular truth value.
3108
3109 Thus the following lines are equivalent:
3110
3111 while (defined($_ = <STDIN>)) { print; }
3112 while ($_ = <STDIN>) { print; }
3113 while (<STDIN>) { print; }
3114 for (;<STDIN>;) { print; }
3115 print while defined($_ = <STDIN>);
3116 print while ($_ = <STDIN>);
3117 print while <STDIN>;
3118
3119 This also behaves similarly, but assigns to a lexical variable instead
3120 of to $_:
3121
3122 while (my $line = <STDIN>) { print $line }
3123
3124 In these loop constructs, the assigned value (whether assignment is
3125 automatic or explicit) is then tested to see whether it is defined.
3126 The defined test avoids problems where the line has a string value that
3127 would be treated as false by Perl; for example a "" or a "0" with no
3128 trailing newline. If you really mean for such values to terminate the
3129 loop, they should be tested for explicitly:
3130
    while (defined($_ = <STDIN>) and $_ ne '0') { ... }   # stop at EOF or a bare "0"
    while (<STDIN>) { last unless $_; ... }
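To see why the implicit "defined" test matters, consider input whose final line is a bare "0" with no newline. The in-memory filehandle and its contents are assumptions made for this sketch.

```perl
use strict;
use warnings;

my $data = "1\n0";                     # last line is "0" with no newline
open(my $fh, '<', \$data) or die "open: $!";

my @seen;
while (my $line = <$fh>) {             # tested as defined(my $line = <$fh>)
    chomp $line;
    push @seen, $line;
}
close $fh;

print scalar(@seen), " lines read\n";  # 2 lines read
```

Without the definedness test, the false string "0" would end the loop before the last line was processed.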
3133
3134 In other boolean contexts, "<FILEHANDLE>" without an explicit "defined"
3135 test or comparison elicits a warning if the "use warnings" pragma or
3136 the -w command-line switch (the $^W variable) is in effect.
3137
3138 The filehandles STDIN, STDOUT, and STDERR are predefined. (The
3139 filehandles "stdin", "stdout", and "stderr" will also work except in
3140 packages, where they would be interpreted as local identifiers rather
3141 than global.) Additional filehandles may be created with the "open()"
3142 function, amongst others. See perlopentut and "open" in perlfunc for
3143 details on this.
3144
3145 If a "<FILEHANDLE>" is used in a context that is looking for a list, a
3146 list comprising all input lines is returned, one line per list element.
3147 It's easy to grow to a rather large data space this way, so use with
3148 care.
3149
3150 "<FILEHANDLE>" may also be spelled "readline(*FILEHANDLE)". See
3151 "readline" in perlfunc.
3152
3153 The null filehandle "<>" is special: it can be used to emulate the
3154 behavior of sed and awk, and any other Unix filter program that takes a
3155 list of filenames, doing the same to each line of input from all of
3156 them. Input from "<>" comes either from standard input, or from each
3157 file listed on the command line. Here's how it works: the first time
3158 "<>" is evaluated, the @ARGV array is checked, and if it is empty,
3159 $ARGV[0] is set to "-", which when opened gives you standard input.
3160 The @ARGV array is then processed as a list of filenames. The loop
3161
3162 while (<>) {
3163 ... # code for each line
3164 }
3165
3166 is equivalent to the following Perl-like pseudo code:
3167
3168 unshift(@ARGV, '-') unless @ARGV;
3169 while ($ARGV = shift) {
3170 open(ARGV, $ARGV);
3171 while (<ARGV>) {
3172 ... # code for each line
3173 }
3174 }
3175
3176 except that it isn't so cumbersome to say, and will actually work. It
3177 really does shift the @ARGV array and put the current filename into the
3178 $ARGV variable. It also uses filehandle ARGV internally. "<>" is just
3179 a synonym for "<ARGV>", which is magical. (The pseudo code above
3180 doesn't work because it treats "<ARGV>" as non-magical.)
3181
Since the null filehandle uses the two-argument form of "open" (see
"open" in perlfunc), it interprets special characters, so if you have
a script like this:
3185
3186 while (<>) {
3187 print;
3188 }
3189
3190 and call it with "perl dangerous.pl 'rm -rfv *|'", it actually opens a
3191 pipe, executes the "rm" command and reads "rm"'s output from that pipe.
3192 If you want all items in @ARGV to be interpreted as file names, you can
3193 use the module "ARGV::readonly" from CPAN, or use the double bracket:
3194
3195 while (<<>>) {
3196 print;
3197 }
3198
Using double angle brackets inside of a "while" causes the open to use
the three-argument form (with the second argument being "<"), so all
arguments in @ARGV are treated as literal filenames (including "-").
(Note that for convenience, if you use "<<>>" and if @ARGV is empty, it
will still read from the standard input.)
3204
3205 You can modify @ARGV before the first "<>" as long as the array ends up
3206 containing the list of filenames you really want. Line numbers ($.)
3207 continue as though the input were one big happy file. See the example
3208 in "eof" in perlfunc for how to reset line numbers on each file.
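The following sketch shows @ARGV being set before the first "<>" and $. counting across files. The two temporary files and their contents are invented for the example; File::Temp is used only to create them.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create two small input files (contents are made up for the sketch).
my @names;
for my $text ("a1\na2\n", "b1\n") {
    my ($out, $name) = tempfile(UNLINK => 1);
    print {$out} $text;
    close $out or die "close: $!";
    push @names, $name;
}

my @seen;
{
    local @ARGV = @names;              # pretend these were the arguments
    while (my $line = <>) {
        chomp $line;
        push @seen, "$.:$line";        # $. keeps counting across files
    }
}
print "@seen\n";                       # 1:a1 2:a2 3:b1
```

The names produced by File::Temp contain no characters that the two-argument "open" treats specially, which is why plain "<>" is acceptable here.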
3209
3210 If you want to set @ARGV to your own list of files, go right ahead.
3211 This sets @ARGV to all plain text files if no @ARGV was given:
3212
3213 @ARGV = grep { -f && -T } glob('*') unless @ARGV;
3214
3215 You can even set them to pipe commands. For example, this
3216 automatically filters compressed arguments through gzip:
3217
3218 @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc < $_ |" : $_ } @ARGV;
3219
If you want to pass switches into your script, you can use one of the
"Getopt" modules (such as Getopt::Std or Getopt::Long) or put a loop on
the front like this:
3222
3223 while ($_ = $ARGV[0], /^-/) {
3224 shift;
3225 last if /^--$/;
3226 if (/^-D(.*)/) { $debug = $1 }
3227 if (/^-v/) { $verbose++ }
3228 # ... # other switches
3229 }
3230
3231 while (<>) {
3232 # ... # code for each line
3233 }
3234
3235 The "<>" symbol will return "undef" for end-of-file only once. If you
3236 call it again after this, it will assume you are processing another
3237 @ARGV list, and if you haven't set @ARGV, will read input from STDIN.
3238
3239 If what the angle brackets contain is a simple scalar variable (for
3240 example, $foo), then that variable contains the name of the filehandle
3241 to input from, or its typeglob, or a reference to the same. For
3242 example:
3243
3244 $fh = \*STDIN;
3245 $line = <$fh>;
3246
3247 If what's within the angle brackets is neither a filehandle nor a
3248 simple scalar variable containing a filehandle name, typeglob, or
3249 typeglob reference, it is interpreted as a filename pattern to be
3250 globbed, and either a list of filenames or the next filename in the
3251 list is returned, depending on context. This distinction is determined
3252 on syntactic grounds alone. That means "<$x>" is always a "readline()"
3253 from an indirect handle, but "<$hash{key}>" is always a "glob()".
3254 That's because $x is a simple scalar variable, but $hash{key} is
3255 not--it's a hash element. Even "<$x >" (note the extra space) is
3256 treated as "glob("$x ")", not "readline($x)".
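A short sketch of the syntactic distinction. The hash, its key, and the brace pattern are hypothetical; a brace pattern is used because it expands without needing any files on disk.

```perl
use strict;
use warnings;

my $data = "from the handle\n";
open(my $x, '<', \$data) or die "open: $!";
my %hash = (key => '{alpha,beta}.txt');   # hypothetical pattern

my $line  = <$x>;            # simple scalar: a readline on the handle in $x
my @names = <$hash{key}>;    # hash element: a glob of the interpolated string

print $line;                 # from the handle
print "@names\n";            # alpha.txt beta.txt
```

Writing the second case as "glob($hash{key})" says the same thing without relying on the reader knowing this rule.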
3257
3258 One level of double-quote interpretation is done first, but you can't
3259 say "<$foo>" because that's an indirect filehandle as explained in the
3260 previous paragraph. (In older versions of Perl, programmers would
3261 insert curly brackets to force interpretation as a filename glob:
3262 "<${foo}>". These days, it's considered cleaner to call the internal
3263 function directly as "glob($foo)", which is probably the right way to
3264 have done it in the first place.) For example:
3265
3266 while (<*.c>) {
3267 chmod 0644, $_;
3268 }
3269
3270 is roughly equivalent to:
3271
3272 open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
3273 while (<FOO>) {
3274 chomp;
3275 chmod 0644, $_;
3276 }
3277
3278 except that the globbing is actually done internally using the standard
3279 "File::Glob" extension. Of course, the shortest way to do the above
3280 is:
3281
3282 chmod 0644, <*.c>;
3283
3284 A (file)glob evaluates its (embedded) argument only when it is starting
3285 a new list. All values must be read before it will start over. In
3286 list context, this isn't important because you automatically get them
3287 all anyway. However, in scalar context the operator returns the next
3288 value each time it's called, or "undef" when the list has run out. As
3289 with filehandle reads, an automatic "defined" is generated when the
3290 glob occurs in the test part of a "while", because legal glob returns
3291 (for example, a file called 0) would otherwise terminate the loop.
3292 Again, "undef" is returned only once. So if you're expecting a single
3293 value from a glob, it is much better to say
3294
3295 ($file) = <blurch*>;
3296
3297 than
3298
3299 $file = <blurch*>;
3300
3301 because the latter will alternate between returning a filename and
3302 returning false.
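The alternation can be observed with a brace pattern, which expands the same way whether or not matching files exist. The pattern itself is an assumption of this sketch.

```perl
use strict;
use warnings;

my @scalar_results;
for my $call (1 .. 4) {
    my $file = glob('{one,two}');          # scalar context: an iterator
    push @scalar_results, $file // 'undef';
}
print "@scalar_results\n";                 # one two undef one

my ($first) = glob('{one,two}');           # list context: the whole list
print "$first\n";                          # one
```

The fourth scalar call starts a new list, which is why "one" appears again after the single "undef".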
3303
3304 If you're trying to do variable interpolation, it's definitely better
3305 to use the "glob()" function, because the older notation can cause
3306 people to become confused with the indirect filehandle notation.
3307
3308 @files = glob("$dir/*.[ch]");
3309 @files = glob($files[$i]);
3310
3311 If an angle-bracket-based globbing expression is used as the condition
3312 of a "while" or "for" loop, then it will be implicitly assigned to $_.
3313 If either a globbing expression or an explicit assignment of a globbing
3314 expression to a scalar is used as a "while"/"for" condition, then the
3315 condition actually tests for definedness of the expression's value, not
3316 for its regular truth value.
3317
3318 Constant Folding
3319 Like C, Perl does a certain amount of expression evaluation at compile
3320 time whenever it determines that all arguments to an operator are
3321 static and have no side effects. In particular, string concatenation
3322 happens at compile time between literals that don't do variable
3323 substitution. Backslash interpolation also happens at compile time.
3324 You can say
3325
3326 'Now is the time for all'
3327 . "\n"
3328 . 'good men to come to.'
3329
3330 and this all reduces to one string internally. Likewise, if you say
3331
3332 foreach $file (@filenames) {
3333 if (-s $file > 5 + 100 * 2**16) { }
3334 }
3335
3336 the compiler precomputes the number which that expression represents so
3337 that the interpreter won't have to.
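One way to observe the folding is to regenerate source from the compiled code with the core B::Deparse module. This is a sketch for inspection only; the exact text Deparse emits may vary between Perl versions.

```perl
use strict;
use warnings;
use B::Deparse;

my $threshold = sub { return 5 + 100 * 2**16 };

# Deparse works from the compiled op tree, so a folded constant
# appears as a single literal rather than as the original expression.
my $compiled = B::Deparse->new->coderef2text($threshold);
print "$compiled\n";
```

The same check can be made from the command line with "perl -MO=Deparse".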
3338
3339 No-ops
3340 Perl doesn't officially have a no-op operator, but the bare constants 0
3341 and 1 are special-cased not to produce a warning in void context, so
3342 you can for example safely do
3343
3344 1 while foo();
3345
3346 Bitwise String Operators
3347 Bitstrings of any size may be manipulated by the bitwise operators ("~
3348 | & ^").
3349
3350 If the operands to a binary bitwise op are strings of different sizes,
3351 | and ^ ops act as though the shorter operand had additional zero bits
3352 on the right, while the & op acts as though the longer operand were
3353 truncated to the length of the shorter. The granularity for such
3354 extension or truncation is one or more bytes.
3355
3356 # ASCII-based examples
3357 print "j p \n" ^ " a h"; # prints "JAPH\n"
3358 print "JA" | " ph\n"; # prints "japh\n"
3359 print "japh\nJunk" & '_____'; # prints "JAPH\n";
3360 print 'p N$' ^ " E<H\n"; # prints "Perl\n";
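The padding and truncation rules are easiest to see from the lengths of the results. This sketch assumes ASCII and that the "bitwise" feature is not enabled.

```perl
use strict;
use warnings;

my $or  = "AB" | "a";    # "a" padded with a zero byte: "aB"
my $xor = "AB" ^ "a";    # likewise padded: " B"
my $and = "AB" & "a";    # "AB" truncated to one byte: "A"

printf "%d %d %d\n", length $or, length $xor, length $and;   # 2 2 1
```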
3361
3362 If you are intending to manipulate bitstrings, be certain that you're
3363 supplying bitstrings: If an operand is a number, that will imply a
3364 numeric bitwise operation. You may explicitly show which type of
3365 operation you intend by using "" or "0+", as in the examples below.
3366
3367 $foo = 150 | 105; # yields 255 (0x96 | 0x69 is 0xFF)
3368 $foo = '150' | 105; # yields 255
3369 $foo = 150 | '105'; # yields 255
3370 $foo = '150' | '105'; # yields string '155' (under ASCII)
3371
3372 $baz = 0+$foo & 0+$bar; # both ops explicitly numeric
3373 $biz = "$foo" ^ "$bar"; # both ops explicitly stringy
3374
3375 This somewhat unpredictable behavior can be avoided with the "bitwise"
3376 feature, new in Perl 5.22. You can enable it via
3377 "use feature 'bitwise'" or "use v5.28". Before Perl 5.28, it used to
3378 emit a warning in the "experimental::bitwise" category. Under this
3379 feature, the four standard bitwise operators ("~ | & ^") are always
3380 numeric. Adding a dot after each operator ("~. |. &. ^.") forces it to
3381 treat its operands as strings:
3382
3383 use feature "bitwise";
3384 $foo = 150 | 105; # yields 255 (0x96 | 0x69 is 0xFF)
3385 $foo = '150' | 105; # yields 255
3386 $foo = 150 | '105'; # yields 255
3387 $foo = '150' | '105'; # yields 255
3388 $foo = 150 |. 105; # yields string '155'
3389 $foo = '150' |. 105; # yields string '155'
3390 $foo = 150 |.'105'; # yields string '155'
3391 $foo = '150' |.'105'; # yields string '155'
3392
3393 $baz = $foo & $bar; # both operands numeric
3394 $biz = $foo ^. $bar; # both operands stringy
3395
3396 The assignment variants of these operators ("&= |= ^= &.= |.= ^.=")
3397 behave likewise under the feature.
3398
3399 It is a fatal error if an operand contains a character whose ordinal
3400 value is above 0xFF, and hence not expressible except in UTF-8. The
3401 operation is performed on a non-UTF-8 copy for other operands encoded
3402 in UTF-8. See "Byte and Character Semantics" in perlunicode.
3403
3404 See "vec" in perlfunc for information on how to manipulate individual
3405 bits in a bit vector.
3406
3407 Integer Arithmetic
3408 By default, Perl assumes that it must do most of its arithmetic in
3409 floating point. But by saying
3410
3411 use integer;
3412
3413 you may tell the compiler to use integer operations (see integer for a
3414 detailed explanation) from here to the end of the enclosing BLOCK. An
3415 inner BLOCK may countermand this by saying
3416
3417 no integer;
3418
3419 which lasts until the end of that BLOCK. Note that this doesn't mean
3420 everything is an integer, merely that Perl will use integer operations
3421 for arithmetic, comparison, and bitwise operators. For example, even
3422 under "use integer", if you take the sqrt(2), you'll still get
3423 1.4142135623731 or so.
3424
3425 Used on numbers, the bitwise operators ("&" "|" "^" "~" "<<" ">>")
3426 always produce integral results. (But see also "Bitwise String
3427 Operators".) However, "use integer" still has meaning for them. By
3428 default, their results are interpreted as unsigned integers, but if
3429 "use integer" is in effect, their results are interpreted as signed
3430 integers. For example, "~0" usually evaluates to a large integral
3431 value. However, "use integer; ~0" is "-1" on two's-complement
3432 machines.
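A brief sketch of the lexical scope of "use integer" and of its effect on division and on "~". The variable names are illustrative.

```perl
use strict;
use warnings;

my $float_quotient = 10 / 3;      # floating point: 3.333...
my $unsigned_not   = ~0;          # a large unsigned value

my ($int_quotient, $signed_not, $root);
{
    use integer;
    $int_quotient = 10 / 3;       # integer division: 3
    $signed_not   = ~0;           # -1 on two's-complement machines
    $root         = sqrt(2);      # still about 1.414; sqrt is unaffected
}

print "$int_quotient $signed_not\n";    # 3 -1
```

Outside the block, arithmetic reverts to the default behavior.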
3433
3434 Floating-point Arithmetic
3435 While "use integer" provides integer-only arithmetic, there is no
3436 analogous mechanism to provide automatic rounding or truncation to a
3437 certain number of decimal places. For rounding to a certain number of
3438 digits, "sprintf()" or "printf()" is usually the easiest route. See
3439 perlfaq4.
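For instance, rounding for display might look like the sketch below; the input value is arbitrary.

```perl
use strict;
use warnings;

my $value   = 3.14159;
my $rounded = sprintf("%.2f", $value);   # the string "3.14"

print "$rounded\n";
```

The result is a string; it becomes a number again as soon as it is used in numeric context.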
3440
3441 Floating-point numbers are only approximations to what a mathematician
3442 would call real numbers. There are infinitely more reals than floats,
3443 so some corners must be cut. For example:
3444
3445 printf "%.20g\n", 123456789123456789;
3446 # produces 123456789123456784
3447
3448 Testing for exact floating-point equality or inequality is not a good
3449 idea. Here's a (relatively expensive) work-around to compare whether
3450 two floating-point numbers are equal to a particular number of decimal
3451 places. See Knuth, volume II, for a more robust treatment of this
3452 topic.
3453
3454 sub fp_equal {
3455 my ($X, $Y, $POINTS) = @_;
3456 my ($tX, $tY);
3457 $tX = sprintf("%.${POINTS}g", $X);
3458 $tY = sprintf("%.${POINTS}g", $Y);
3459 return $tX eq $tY;
3460 }
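A commonly used alternative, which is not the method shown above, is to compare against a relative tolerance. The function name and the default tolerance are assumptions of this sketch, and the right tolerance depends on the application.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $x and $y differ by no more than
# $tolerance, scaled by the larger magnitude (but never less than 1).
sub nearly_equal {
    my ($x, $y, $tolerance) = @_;
    $tolerance //= 1e-9;
    my $scale = abs($x) > abs($y) ? abs($x) : abs($y);
    $scale = 1 if $scale < 1;
    return abs($x - $y) <= $tolerance * $scale;
}

# With IEEE 754 doubles, 0.1 + 0.2 == 0.3 is usually false.
print "close enough\n" if nearly_equal(0.1 + 0.2, 0.3);
```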
3461
3462 The POSIX module (part of the standard perl distribution) implements
3463 "ceil()", "floor()", and other mathematical and trigonometric
3464 functions. The "Math::Complex" module (part of the standard perl
3465 distribution) defines mathematical functions that work on both the
3466 reals and the imaginary numbers. "Math::Complex" is not as efficient
3467 as POSIX, but POSIX can't work with complex numbers.
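A minimal sketch of "ceil()" and "floor()" next to the built-in "int()", which truncates toward zero. The sample values are arbitrary.

```perl
use strict;
use warnings;
use POSIX qw(ceil floor);

my @results = (
    ceil(3.2),     #  4
    floor(3.2),    #  3
    ceil(-3.2),    # -3
    floor(-3.2),   # -4
    int(-3.2),     # -3 (truncation, not flooring)
);
print "@results\n";    # 4 3 -3 -4 -3
```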
3468
3469 Rounding in financial applications can have serious implications, and
3470 the rounding method used should be specified precisely. In these
3471 cases, it probably pays not to trust whichever system rounding is being
3472 used by Perl, but to instead implement the rounding function you need
3473 yourself.
3474
3475 Bigger Numbers
3476 The standard "Math::BigInt", "Math::BigRat", and "Math::BigFloat"
3477 modules, along with the "bignum", "bigint", and "bigrat" pragmas,
3478 provide variable-precision arithmetic and overloaded operators,
3479 although they're currently pretty slow. At the cost of some space and
3480 considerable speed, they avoid the normal pitfalls associated with
3481 limited-precision representations.
3482
3483 use 5.010;
3484 use bigint; # easy interface to Math::BigInt
3485 $x = 123456789123456789;
3486 say $x * $x;
3487 +15241578780673678515622620750190521
3488
3489 Or with rationals:
3490
3491 use 5.010;
3492 use bigrat;
3493 $x = 3/22;
3494 $y = 4/6;
3495 say "x/y is ", $x/$y;
3496 say "x*y is ", $x*$y;
3497 x/y is 9/44
3498 x*y is 1/11
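The same calculations can be written against the classes directly, without the pragmas, which keeps ordinary numeric literals elsewhere in the file untouched. This is a sketch under that assumption; the values are the ones used above.

```perl
use strict;
use warnings;
use Math::BigInt;
use Math::BigRat;

my $x      = Math::BigInt->new('123456789123456789');
my $square = $x * $x;                    # overloaded multiplication

my $p = Math::BigRat->new('3/22');
my $q = Math::BigRat->new('4/6');        # normalized to 2/3
my $quotient = $p / $q;
my $product  = $p * $q;

print "$square\n";                       # 15241578780673678515622620750190521
print "$quotient $product\n";            # 9/44 1/11
```

Passing the numbers as strings avoids any loss of precision before the objects are constructed.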
3499
3500 Several modules let you calculate with unlimited or fixed precision
3501 (bound only by memory and CPU time). There are also some non-standard
3502 modules that provide faster implementations via external C libraries.
3503
Here is a short but incomplete summary:

  Math::String            treat string sequences like numbers
  Math::FixedPrecision    calculate with a fixed precision
  Math::Currency          for currency calculations
  Bit::Vector             manipulate bit vectors fast (uses C)
  Math::BigIntFast        Bit::Vector wrapper for big numbers
  Math::Pari              provides access to the Pari C library
  Math::Cephes            uses the external Cephes C library (no
                          big numbers)
  Math::Cephes::Fraction  fractions via the Cephes library
  Math::GMP               another one using an external C library
  Math::GMPz              an alternative interface to libgmp's big ints
  Math::GMPq              an interface to libgmp's fraction numbers
  Math::GMPf              an interface to libgmp's floating point numbers
3519
3520 Choose wisely.
3521
3522
3523
3524perl v5.32.1 2021-05-31 PERLOP(1)