1PERLOP(1) Perl Programmers Reference Guide PERLOP(1)
2
3
4
6 perlop - Perl operators and precedence
7
9 In Perl, the operator determines what operation is performed,
10 independent of the type of the operands. For example "$x + $y" is
11 always a numeric addition, and if $x or $y do not contain numbers, an
12 attempt is made to convert them to numbers first.
13
14 This is in contrast to many other dynamic languages, where the
15 operation is determined by the type of the first argument. It also
16 means that Perl has two versions of some operators, one for numeric and
17 one for string comparison. For example "$x == $y" compares two numbers
18 for equality, and "$x eq $y" compares two strings.
19
20 There are a few exceptions though: "x" can be either string repetition
21 or list repetition, depending on the type of the left operand, and "&",
22 "|", "^" and "~" can be either string or numeric bit operations.
23
24 Operator Precedence and Associativity
25 Operator precedence and associativity work in Perl more or less like
26 they do in mathematics.
27
28 Operator precedence means some operators group more tightly than
29 others. For example, in "2 + 4 * 5", the multiplication has higher
30 precedence, so "4 * 5" is grouped together as the right-hand operand of
31 the addition, rather than "2 + 4" being grouped together as the left-
32 hand operand of the multiplication. It is as if the expression were
33 written "2 + (4 * 5)", not "(2 + 4) * 5". So the expression yields "2 +
34 20 == 22", rather than "6 * 5 == 30".
35
36 Operator associativity defines what happens if a sequence of the same
37 operators is used one after another: whether they will be grouped at
38 the left or the right. For example, in "9 - 3 - 2", subtraction is left
39 associative, so "9 - 3" is grouped together as the left-hand operand of
40 the second subtraction, rather than "3 - 2" being grouped together as
41 the right-hand operand of the first subtraction. It is as if the
42 expression were written "(9 - 3) - 2", not "9 - (3 - 2)". So the
43 expression yields "6 - 2 == 4", rather than "9 - 1 == 8".
44
45 For simple operators that evaluate all their operands and then combine
46 the values in some way, precedence and associativity (and parentheses)
47 imply some ordering requirements on those combining operations. For
48 example, in "2 + 4 * 5", the grouping implied by precedence means that
49 the multiplication of 4 and 5 must be performed before the addition of
50 2 and 20, simply because the result of that multiplication is required
51 as one of the operands of the addition. But the order of operations is
52 not fully determined by this: in "2 * 2 + 4 * 5" both multiplications
53 must be performed before the addition, but the grouping does not say
54 anything about the order in which the two multiplications are
55 performed. In fact Perl has a general rule that the operands of an
56 operator are evaluated in left-to-right order. A few operators such as
57 "&&=" have special evaluation rules that can result in an operand not
58 being evaluated at all; in general, the top-level operator in an
59 expression has control of operand evaluation.
60
61 Perl operators have the following associativity and precedence, listed
62 from highest precedence to lowest. Operators borrowed from C keep the
63 same precedence relationship with each other, even where C's precedence
64 is slightly screwy. (This makes learning Perl easier for C folks.)
65 With very few exceptions, these all operate on scalar values only, not
66 array values.
67
68 left terms and list operators (leftward)
69 left ->
70 nonassoc ++ --
71 right **
72 right ! ~ \ and unary + and -
73 left =~ !~
74 left * / % x
75 left + - .
76 left << >>
77 nonassoc named unary operators
78 nonassoc < > <= >= lt gt le ge
79 nonassoc == != <=> eq ne cmp ~~
80 left &
81 left | ^
82 left &&
83 left || //
84 nonassoc .. ...
85 right ?:
86 right = += -= *= etc. goto last next redo dump
87 left , =>
88 nonassoc list operators (rightward)
89 right not
90 left and
91 left or xor
92
93 In the following sections, these operators are covered in detail, in
94 the same order in which they appear in the table above.
95
96 Many operators can be overloaded for objects. See overload.
97
98 Terms and List Operators (Leftward)
99 A TERM has the highest precedence in Perl. They include variables,
100 quote and quote-like operators, any expression in parentheses, and any
101 function whose arguments are parenthesized. Actually, there aren't
102 really functions in this sense, just list operators and unary operators
103 behaving as functions because you put parentheses around the arguments.
104 These are all documented in perlfunc.
105
106 If any list operator ("print()", etc.) or any unary operator
107 ("chdir()", etc.) is followed by a left parenthesis as the next token,
108 the operator and arguments within parentheses are taken to be of
109 highest precedence, just like a normal function call.
110
111 In the absence of parentheses, the precedence of list operators such as
112 "print", "sort", or "chmod" is either very high or very low depending
113 on whether you are looking at the left side or the right side of the
114 operator. For example, in
115
116 @ary = (1, 3, sort 4, 2);
117 print @ary; # prints 1324
118
119 the commas on the right of the "sort" are evaluated before the "sort",
120 but the commas on the left are evaluated after. In other words, list
121 operators tend to gobble up all arguments that follow, and then act
122 like a simple TERM with regard to the preceding expression. Be careful
123 with parentheses:
124
125 # These evaluate exit before doing the print:
126 print($foo, exit); # Obviously not what you want.
127 print $foo, exit; # Nor is this.
128
129 # These do the print before evaluating exit:
130 (print $foo), exit; # This is what you want.
131 print($foo), exit; # Or this.
132 print ($foo), exit; # Or even this.
133
134 Also note that
135
136 print ($foo & 255) + 1, "\n";
137
138 probably doesn't do what you expect at first glance. The parentheses
139 enclose the argument list for "print" which is evaluated (printing the
140 result of "$foo & 255"). Then one is added to the return value of
141 "print" (usually 1). The result is something like this:
142
143 1 + 1, "\n"; # Obviously not what you meant.
144
145 To do what you meant properly, you must write:
146
147 print(($foo & 255) + 1, "\n");
148
149 See "Named Unary Operators" for more discussion of this.
150
151 Also parsed as terms are the "do {}" and "eval {}" constructs, as well
152 as subroutine and method calls, and the anonymous constructors "[]" and
153 "{}".
154
155 See also "Quote and Quote-like Operators" toward the end of this
156 section, as well as "I/O Operators".
157
158 The Arrow Operator
159 ""->"" is an infix dereference operator, just as it is in C and C++.
160 If the right side is either a "[...]", "{...}", or a "(...)" subscript,
161 then the left side must be either a hard or symbolic reference to an
162 array, a hash, or a subroutine respectively. (Or technically speaking,
163 a location capable of holding a hard reference, if it's an array or
164 hash reference being used for assignment.) See perlreftut and perlref.
165
166 Otherwise, the right side is a method name or a simple scalar variable
167 containing either the method name or a subroutine reference, and the
168 left side must be either an object (a blessed reference) or a class
169 name (that is, a package name). See perlobj.
170
171 The dereferencing cases (as opposed to method-calling cases) are
172 somewhat extended by the "postderef" feature. For the details of that
173 feature, consult "Postfix Dereference Syntax" in perlref.
174
175 Auto-increment and Auto-decrement
176 "++" and "--" work as in C. That is, if placed before a variable, they
177 increment or decrement the variable by one before returning the value,
178 and if placed after, increment or decrement after returning the value.
179
180 $i = 0; $j = 0;
181 print $i++; # prints 0
182 print ++$j; # prints 1
183
184 Note that just as in C, Perl doesn't define when the variable is
185 incremented or decremented. You just know it will be done sometime
186 before or after the value is returned. This also means that modifying
187 a variable twice in the same statement will lead to undefined behavior.
188 Avoid statements like:
189
190 $i = $i ++;
191 print ++ $i + $i ++;
192
193 Perl will not guarantee what the result of the above statements is.
194
195 The auto-increment operator has a little extra builtin magic to it. If
196 you increment a variable that is numeric, or that has ever been used in
197 a numeric context, you get a normal increment. If, however, the
198 variable has been used in only string contexts since it was set, and
199 has a value that is not the empty string and matches the pattern
200 "/^[a-zA-Z]*[0-9]*\z/", the increment is done as a string, preserving
201 each character within its range, with carry:
202
203 print ++($foo = "99"); # prints "100"
204 print ++($foo = "a0"); # prints "a1"
205 print ++($foo = "Az"); # prints "Ba"
206 print ++($foo = "zz"); # prints "aaa"
207
208 "undef" is always treated as numeric, and in particular is changed to 0
209 before incrementing (so that a post-increment of an undef value will
210 return 0 rather than "undef").
211
212 The auto-decrement operator is not magical.
213
214 Exponentiation
215 Binary "**" is the exponentiation operator. It binds even more tightly
216 than unary minus, so "-2**4" is "-(2**4)", not "(-2)**4". (This is
217 implemented using C's pow(3) function, which actually works on doubles
218 internally.)
219
220 Note that certain exponentiation expressions are ill-defined: these
221 include "0**0", "1**Inf", and "Inf**0". Do not expect any particular
222 results from these special cases, the results are platform-dependent.
223
224 Symbolic Unary Operators
225 Unary "!" performs logical negation, that is, "not". See also "not"
226 for a lower precedence version of this.
227
228 Unary "-" performs arithmetic negation if the operand is numeric,
229 including any string that looks like a number. If the operand is an
230 identifier, a string consisting of a minus sign concatenated with the
231 identifier is returned. Otherwise, if the string starts with a plus or
232 minus, a string starting with the opposite sign is returned. One
233 effect of these rules is that "-bareword" is equivalent to the string
234 "-bareword". If, however, the string begins with a non-alphabetic
235 character (excluding "+" or "-"), Perl will attempt to convert the
236 string to a numeric, and the arithmetic negation is performed. If the
237 string cannot be cleanly converted to a numeric, Perl will give the
238 warning Argument "the string" isn't numeric in negation (-) at ....
239
240 Unary "~" performs bitwise negation, that is, 1's complement. For
241 example, "0666 & ~027" is 0640. (See also "Integer Arithmetic" and
242 "Bitwise String Operators".) Note that the width of the result is
243 platform-dependent: "~0" is 32 bits wide on a 32-bit platform, but 64
244 bits wide on a 64-bit platform, so if you are expecting a certain bit
245 width, remember to use the "&" operator to mask off the excess bits.
246
247 Starting in Perl 5.28, it is a fatal error to try to complement a
248 string containing a character with an ordinal value above 255.
249
250 If the "bitwise" feature is enabled via "use feature 'bitwise'" or "use
251 v5.28", then unary "~" always treats its argument as a number, and an
252 alternate form of the operator, "~.", always treats its argument as a
253 string. So "~0" and "~"0"" will both give 2**32-1 on 32-bit platforms,
254 whereas "~.0" and "~."0"" will both yield "\xff". Until Perl 5.28,
255 this feature produced a warning in the "experimental::bitwise"
256 category.
257
258 Unary "+" has no effect whatsoever, even on strings. It is useful
259 syntactically for separating a function name from a parenthesized
260 expression that would otherwise be interpreted as the complete list of
261 function arguments. (See examples above under "Terms and List
262 Operators (Leftward)".)
263
264 Unary "\" creates references. If its operand is a single sigilled
265 thing, it creates a reference to that object. If its operand is a
266 parenthesised list, then it creates references to the things mentioned
267 in the list. Otherwise it puts its operand in list context, and
268 creates a list of references to the scalars in the list provided by the
269 operand. See perlreftut and perlref. Do not confuse this behavior
270 with the behavior of backslash within a string, although both forms do
271 convey the notion of protecting the next thing from interpolation.
272
273 Binding Operators
274 Binary "=~" binds a scalar expression to a pattern match. Certain
275 operations search or modify the string $_ by default. This operator
276 makes that kind of operation work on some other string. The right
277 argument is a search pattern, substitution, or transliteration. The
278 left argument is what is supposed to be searched, substituted, or
279 transliterated instead of the default $_. When used in scalar context,
280 the return value generally indicates the success of the operation. The
281 exceptions are substitution ("s///") and transliteration ("y///") with
282 the "/r" (non-destructive) option, which cause the return value to be
283 the result of the substitution. Behavior in list context depends on
284 the particular operator. See "Regexp Quote-Like Operators" for details
285 and perlretut for examples using these operators.
286
287 If the right argument is an expression rather than a search pattern,
288 substitution, or transliteration, it is interpreted as a search pattern
289 at run time. Note that this means that its contents will be
290 interpolated twice, so
291
292 '\\' =~ q'\\';
293
294 is not ok, as the regex engine will end up trying to compile the
295 pattern "\", which it will consider a syntax error.
296
297 Binary "!~" is just like "=~" except the return value is negated in the
298 logical sense.
299
300 Binary "!~" with a non-destructive substitution ("s///r") or
301 transliteration ("y///r") is a syntax error.
302
303 Multiplicative Operators
304 Binary "*" multiplies two numbers.
305
306 Binary "/" divides two numbers.
307
308 Binary "%" is the modulo operator, which computes the division
309 remainder of its first argument with respect to its second argument.
310 Given integer operands $m and $n: If $n is positive, then "$m % $n" is
311 $m minus the largest multiple of $n less than or equal to $m. If $n is
312 negative, then "$m % $n" is $m minus the smallest multiple of $n that
313 is not less than $m (that is, the result will be less than or equal to
314 zero). If the operands $m and $n are floating point values and the
315 absolute value of $n (that is "abs($n)") is less than "(UV_MAX + 1)",
316 only the integer portion of $m and $n will be used in the operation
317 (Note: here "UV_MAX" means the maximum of the unsigned integer type).
318 If the absolute value of the right operand ("abs($n)") is greater than
319 or equal to "(UV_MAX + 1)", "%" computes the floating-point remainder
320 $r in the equation "($r = $m - $i*$n)" where $i is a certain integer
321 that makes $r have the same sign as the right operand $n (not as the
322 left operand $m like C function "fmod()") and the absolute value less
323 than that of $n. Note that when "use integer" is in scope, "%" gives
324 you direct access to the modulo operator as implemented by your C
325 compiler. This operator is not as well defined for negative operands,
326 but it will execute faster.
327
328 Binary "x" is the repetition operator. In scalar context, or if the
329 left operand is neither enclosed in parentheses nor a "qw//" list, it
330 performs a string repetition. In that case it supplies scalar context
331 to the left operand, and returns a string consisting of the left
332 operand string repeated the number of times specified by the right
333 operand. If the "x" is in list context, and the left operand is either
334 enclosed in parentheses or a "qw//" list, it performs a list
335 repetition. In that case it supplies list context to the left operand,
336 and returns a list consisting of the left operand list repeated the
337 number of times specified by the right operand. If the right operand
338 is zero or negative (raising a warning on negative), it returns an
339 empty string or an empty list, depending on the context.
340
341 print '-' x 80; # print row of dashes
342
343 print "\t" x ($tab/8), ' ' x ($tab%8); # tab over
344
345 @ones = (1) x 80; # a list of 80 1's
346 @ones = (5) x @ones; # set all elements to 5
347
348 Additive Operators
349 Binary "+" returns the sum of two numbers.
350
351 Binary "-" returns the difference of two numbers.
352
353 Binary "." concatenates two strings.
354
355 Shift Operators
356 Binary "<<" returns the value of its left argument shifted left by the
357 number of bits specified by the right argument. Arguments should be
358 integers. (See also "Integer Arithmetic".)
359
360 Binary ">>" returns the value of its left argument shifted right by the
361 number of bits specified by the right argument. Arguments should be
362 integers. (See also "Integer Arithmetic".)
363
364 If "use integer" (see "Integer Arithmetic") is in force then signed C
365 integers are used (arithmetic shift), otherwise unsigned C integers are
366 used (logical shift), even for negative shiftees. In arithmetic right
367 shift the sign bit is replicated on the left, in logical shift zero
368 bits come in from the left.
369
370 Either way, the implementation isn't going to generate results larger
371 than the size of the integer type Perl was built with (32 bits or 64
372 bits).
373
374 Shifting by negative number of bits means the reverse shift: left shift
375 becomes right shift, right shift becomes left shift. This is unlike in
376 C, where negative shift is undefined.
377
378 Shifting by more bits than the size of the integers means most of the
379 time zero (all bits fall off), except that under "use integer" right
380 overshifting a negative shiftee results in -1. This is unlike in C,
381 where shifting by too many bits is undefined. A common C behavior is
382 "shift by modulo wordbits", so that for example
383
384 1 >> 64 == 1 >> (64 % 64) == 1 >> 0 == 1 # Common C behavior.
385
386 but that is completely accidental.
387
388 If you get tired of being subject to your platform's native integers,
389 the "use bigint" pragma neatly sidesteps the issue altogether:
390
391 print 20 << 20; # 20971520
392 print 20 << 40; # 5120 on 32-bit machines,
393 # 21990232555520 on 64-bit machines
394 use bigint;
395 print 20 << 100; # 25353012004564588029934064107520
396
397 Named Unary Operators
398 The various named unary operators are treated as functions with one
399 argument, with optional parentheses.
400
401 If any list operator ("print()", etc.) or any unary operator
402 ("chdir()", etc.) is followed by a left parenthesis as the next token,
403 the operator and arguments within parentheses are taken to be of
404 highest precedence, just like a normal function call. For example,
405 because named unary operators are higher precedence than "||":
406
407 chdir $foo || die; # (chdir $foo) || die
408 chdir($foo) || die; # (chdir $foo) || die
409 chdir ($foo) || die; # (chdir $foo) || die
410 chdir +($foo) || die; # (chdir $foo) || die
411
412 but, because "*" is higher precedence than named operators:
413
414 chdir $foo * 20; # chdir ($foo * 20)
415 chdir($foo) * 20; # (chdir $foo) * 20
416 chdir ($foo) * 20; # (chdir $foo) * 20
417 chdir +($foo) * 20; # chdir ($foo * 20)
418
419 rand 10 * 20; # rand (10 * 20)
420 rand(10) * 20; # (rand 10) * 20
421 rand (10) * 20; # (rand 10) * 20
422 rand +(10) * 20; # rand (10 * 20)
423
424 Regarding precedence, the filetest operators, like "-f", "-M", etc. are
425 treated like named unary operators, but they don't follow this
426 functional parenthesis rule. That means, for example, that
427 "-f($file).".bak"" is equivalent to "-f "$file.bak"".
428
429 See also "Terms and List Operators (Leftward)".
430
431 Relational Operators
432 Perl operators that return true or false generally return values that
433 can be safely used as numbers. For example, the relational operators
434 in this section and the equality operators in the next one return 1 for
435 true and a special version of the defined empty string, "", which
436 counts as a zero but is exempt from warnings about improper numeric
437 conversions, just as "0 but true" is.
438
439 Binary "<" returns true if the left argument is numerically less than
440 the right argument.
441
442 Binary ">" returns true if the left argument is numerically greater
443 than the right argument.
444
445 Binary "<=" returns true if the left argument is numerically less than
446 or equal to the right argument.
447
448 Binary ">=" returns true if the left argument is numerically greater
449 than or equal to the right argument.
450
451 Binary "lt" returns true if the left argument is stringwise less than
452 the right argument.
453
454 Binary "gt" returns true if the left argument is stringwise greater
455 than the right argument.
456
457 Binary "le" returns true if the left argument is stringwise less than
458 or equal to the right argument.
459
460 Binary "ge" returns true if the left argument is stringwise greater
461 than or equal to the right argument.
462
463 Equality Operators
464 Binary "==" returns true if the left argument is numerically equal to
465 the right argument.
466
467 Binary "!=" returns true if the left argument is numerically not equal
468 to the right argument.
469
470 Binary "<=>" returns -1, 0, or 1 depending on whether the left argument
471 is numerically less than, equal to, or greater than the right argument.
472 If your platform supports "NaN"'s (not-a-numbers) as numeric values,
473 using them with "<=>" returns undef. "NaN" is not "<", "==", ">", "<="
474 or ">=" anything (even "NaN"), so those 5 return false. "NaN != NaN"
475 returns true, as does "NaN !=" anything else. If your platform doesn't
476 support "NaN"'s then "NaN" is just a string with numeric value 0.
477
478 $ perl -le '$x = "NaN"; print "No NaN support here" if $x == $x'
479 $ perl -le '$x = "NaN"; print "NaN support here" if $x != $x'
480
481 (Note that the bigint, bigrat, and bignum pragmas all support "NaN".)
482
483 Binary "eq" returns true if the left argument is stringwise equal to
484 the right argument.
485
486 Binary "ne" returns true if the left argument is stringwise not equal
487 to the right argument.
488
489 Binary "cmp" returns -1, 0, or 1 depending on whether the left argument
490 is stringwise less than, equal to, or greater than the right argument.
491
492 Binary "~~" does a smartmatch between its arguments. Smart matching is
493 described in the next section.
494
495 "lt", "le", "ge", "gt" and "cmp" use the collation (sort) order
496 specified by the current "LC_COLLATE" locale if a "use locale" form
497 that includes collation is in effect. See perllocale. Do not mix
498 these with Unicode, only use them with legacy 8-bit locale encodings.
499 The standard "Unicode::Collate" and "Unicode::Collate::Locale" modules
500 offer much more powerful solutions to collation issues.
501
502 For case-insensitive comparisons, look at the "fc" in perlfunc case-
503 folding function, available in Perl v5.16 or later:
504
505 if ( fc($x) eq fc($y) ) { ... }
506
507 Smartmatch Operator
508 First available in Perl 5.10.1 (the 5.10.0 version behaved
509 differently), binary "~~" does a "smartmatch" between its arguments.
510 This is mostly used implicitly in the "when" construct described in
511 perlsyn, although not all "when" clauses call the smartmatch operator.
512 Unique among all of Perl's operators, the smartmatch operator can
513 recurse. The smartmatch operator is experimental and its behavior is
514 subject to change.
515
516 It is also unique in that all other Perl operators impose a context
517 (usually string or numeric context) on their operands, autoconverting
518 those operands to those imposed contexts. In contrast, smartmatch
519 infers contexts from the actual types of its operands and uses that
520 type information to select a suitable comparison mechanism.
521
522 The "~~" operator compares its operands "polymorphically", determining
523 how to compare them according to their actual types (numeric, string,
524 array, hash, etc.). Like the equality operators with which it shares
525 the same precedence, "~~" returns 1 for true and "" for false. It is
526 often best read aloud as "in", "inside of", or "is contained in",
527 because the left operand is often looked for inside the right operand.
528 That makes the order of the operands to the smartmatch operand often
529 opposite that of the regular match operator. In other words, the
530 "smaller" thing is usually placed in the left operand and the larger
531 one in the right.
532
533 The behavior of a smartmatch depends on what type of things its
534 arguments are, as determined by the following table. The first row of
535 the table whose types apply determines the smartmatch behavior.
536 Because what actually happens is mostly determined by the type of the
537 second operand, the table is sorted on the right operand instead of on
538 the left.
539
540 Left Right Description and pseudocode
541 ===============================================================
542 Any undef check whether Any is undefined
543 like: !defined Any
544
545 Any Object invoke ~~ overloading on Object, or die
546
547 Right operand is an ARRAY:
548
549 Left Right Description and pseudocode
550 ===============================================================
551 ARRAY1 ARRAY2 recurse on paired elements of ARRAY1 and ARRAY2[2]
552 like: (ARRAY1[0] ~~ ARRAY2[0])
553 && (ARRAY1[1] ~~ ARRAY2[1]) && ...
554 HASH ARRAY any ARRAY elements exist as HASH keys
555 like: grep { exists HASH->{$_} } ARRAY
556 Regexp ARRAY any ARRAY elements pattern match Regexp
557 like: grep { /Regexp/ } ARRAY
558 undef ARRAY undef in ARRAY
559 like: grep { !defined } ARRAY
560 Any ARRAY smartmatch each ARRAY element[3]
561 like: grep { Any ~~ $_ } ARRAY
562
563 Right operand is a HASH:
564
565 Left Right Description and pseudocode
566 ===============================================================
567 HASH1 HASH2 all same keys in both HASHes
568 like: keys HASH1 ==
569 grep { exists HASH2->{$_} } keys HASH1
570 ARRAY HASH any ARRAY elements exist as HASH keys
571 like: grep { exists HASH->{$_} } ARRAY
572 Regexp HASH any HASH keys pattern match Regexp
573 like: grep { /Regexp/ } keys HASH
574 undef HASH always false (undef can't be a key)
575 like: 0 == 1
576 Any HASH HASH key existence
577 like: exists HASH->{Any}
578
579 Right operand is CODE:
580
581 Left Right Description and pseudocode
582 ===============================================================
583 ARRAY CODE sub returns true on all ARRAY elements[1]
584 like: !grep { !CODE->($_) } ARRAY
585 HASH CODE sub returns true on all HASH keys[1]
586 like: !grep { !CODE->($_) } keys HASH
587 Any CODE sub passed Any returns true
588 like: CODE->(Any)
589
590 Right operand is a Regexp:
591
592 Left Right Description and pseudocode
593 ===============================================================
594 ARRAY Regexp any ARRAY elements match Regexp
595 like: grep { /Regexp/ } ARRAY
596 HASH Regexp any HASH keys match Regexp
597 like: grep { /Regexp/ } keys HASH
598 Any Regexp pattern match
599 like: Any =~ /Regexp/
600
601 Other:
602
603 Left Right Description and pseudocode
604 ===============================================================
605 Object Any invoke ~~ overloading on Object,
606 or fall back to...
607
608 Any Num numeric equality
609 like: Any == Num
610 Num nummy[4] numeric equality
611 like: Num == nummy
612 undef Any check whether undefined
613 like: !defined(Any)
614 Any Any string equality
615 like: Any eq Any
616
617 Notes:
618
619 1. Empty hashes or arrays match.
620 2. That is, each element smartmatches the element of the same index in
621 the other array.[3]
622 3. If a circular reference is found, fall back to referential equality.
623 4. Either an actual number, or a string that looks like one.
624
625 The smartmatch implicitly dereferences any non-blessed hash or array
626 reference, so the "HASH" and "ARRAY" entries apply in those cases. For
627 blessed references, the "Object" entries apply. Smartmatches involving
628 hashes only consider hash keys, never hash values.
629
630 The "like" code entry is not always an exact rendition. For example,
631 the smartmatch operator short-circuits whenever possible, but "grep"
632 does not. Also, "grep" in scalar context returns the number of
633 matches, but "~~" returns only true or false.
634
635 Unlike most operators, the smartmatch operator knows to treat "undef"
636 specially:
637
638 use v5.10.1;
639 @array = (1, 2, 3, undef, 4, 5);
640 say "some elements undefined" if undef ~~ @array;
641
642 Each operand is considered in a modified scalar context, the
643 modification being that array and hash variables are passed by
644 reference to the operator, which implicitly dereferences them. Both
645 elements of each pair are the same:
646
647 use v5.10.1;
648
649 my %hash = (red => 1, blue => 2, green => 3,
650 orange => 4, yellow => 5, purple => 6,
651 black => 7, grey => 8, white => 9);
652
653 my @array = qw(red blue green);
654
655 say "some array elements in hash keys" if @array ~~ %hash;
656 say "some array elements in hash keys" if \@array ~~ \%hash;
657
658 say "red in array" if "red" ~~ @array;
659 say "red in array" if "red" ~~ \@array;
660
661 say "some keys end in e" if /e$/ ~~ %hash;
662 say "some keys end in e" if /e$/ ~~ \%hash;
663
664 Two arrays smartmatch if each element in the first array smartmatches
665 (that is, is "in") the corresponding element in the second array,
666 recursively.
667
668 use v5.10.1;
669 my @little = qw(red blue green);
670 my @bigger = ("red", "blue", [ "orange", "green" ] );
671 if (@little ~~ @bigger) { # true!
672 say "little is contained in bigger";
673 }
674
675 Because the smartmatch operator recurses on nested arrays, this will
676 still report that "red" is in the array.
677
678 use v5.10.1;
679 my @array = qw(red blue green);
680 my $nested_array = [[[[[[[ @array ]]]]]]];
681 say "red in array" if "red" ~~ $nested_array;
682
683 If two arrays smartmatch each other, then they are deep copies of each
684 others' values, as this example reports:
685
686 use v5.12.0;
687 my @a = (0, 1, 2, [3, [4, 5], 6], 7);
688 my @b = (0, 1, 2, [3, [4, 5], 6], 7);
689
690 if (@a ~~ @b && @b ~~ @a) {
691 say "a and b are deep copies of each other";
692 }
693 elsif (@a ~~ @b) {
694 say "a smartmatches in b";
695 }
696 elsif (@b ~~ @a) {
697 say "b smartmatches in a";
698 }
699 else {
700 say "a and b don't smartmatch each other at all";
701 }
702
703 If you were to set "$b[3] = 4", then instead of reporting that "a and b
704 are deep copies of each other", it now reports that "b smartmatches in
705 a". That's because the corresponding position in @a contains an array
706 that (eventually) has a 4 in it.
707
708 Smartmatching one hash against another reports whether both contain the
709 same keys, no more and no less. This could be used to see whether two
710 records have the same field names, without caring what values those
711 fields might have. For example:
712
713 use v5.10.1;
714 sub make_dogtag {
715 state $REQUIRED_FIELDS = { name=>1, rank=>1, serial_num=>1 };
716
717 my ($class, $init_fields) = @_;
718
719 die "Must supply (only) name, rank, and serial number"
720 unless $init_fields ~~ $REQUIRED_FIELDS;
721
722 ...
723 }
724
725 However, this only does what you mean if $init_fields is indeed a hash
726 reference. The condition "$init_fields ~~ $REQUIRED_FIELDS" also allows
727 the strings "name", "rank", "serial_num" as well as any array reference
728 that contains "name" or "rank" or "serial_num" anywhere to pass
729 through.
730
731 The smartmatch operator is most often used as the implicit operator of
732 a "when" clause. See the section on "Switch Statements" in perlsyn.
733
734 Smartmatching of Objects
735
736 To avoid relying on an object's underlying representation, if the
737 smartmatch's right operand is an object that doesn't overload "~~", it
738 raises the exception ""Smartmatching a non-overloaded object breaks
739 encapsulation"". That's because one has no business digging around to
740 see whether something is "in" an object. These are all illegal on
741 objects without a "~~" overload:
742
743 %hash ~~ $object
744 42 ~~ $object
745 "fred" ~~ $object
746
747 However, you can change the way an object is smartmatched by
748 overloading the "~~" operator. This is allowed to extend the usual
749 smartmatch semantics. For objects that do have an "~~" overload, see
750 overload.
751
752 Using an object as the left operand is allowed, although not very
753 useful. Smartmatching rules take precedence over overloading, so even
754 if the object in the left operand has smartmatch overloading, this will
755 be ignored. A left operand that is a non-overloaded object falls back
756 on a string or numeric comparison of whatever the "ref" operator
757 returns. That means that
758
759 $object ~~ X
760
761 does not invoke the overload method with "X" as an argument. Instead
762 the above table is consulted as normal, and based on the type of "X",
763 overloading may or may not be invoked. For simple strings or numbers,
764 "in" becomes equivalent to this:
765
766 $object ~~ $number ref($object) == $number
767 $object ~~ $string ref($object) eq $string
768
769 For example, this reports that the handle smells IOish (but please
770 don't really do this!):
771
772 use IO::Handle;
773 my $fh = IO::Handle->new();
774 if ($fh ~~ /\bIO\b/) {
775 say "handle smells IOish";
776 }
777
778 That's because it treats $fh as a string like
779 "IO::Handle=GLOB(0x8039e0)", then pattern matches against that.
780
781 Bitwise And
782 Binary "&" returns its operands ANDed together bit by bit. Although no
783 warning is currently raised, the result is not well defined when this
784 operation is performed on operands that aren't either numbers (see
785 "Integer Arithmetic") nor bitstrings (see "Bitwise String Operators").
786
787 Note that "&" has lower priority than relational operators, so for
788 example the parentheses are essential in a test like
789
790 print "Even\n" if ($x & 1) == 0;
791
792 If the "bitwise" feature is enabled via "use feature 'bitwise'" or "use
793 v5.28", then this operator always treats its operands as numbers.
794 Before Perl 5.28 this feature produced a warning in the
795 "experimental::bitwise" category.
796
797 Bitwise Or and Exclusive Or
798 Binary "|" returns its operands ORed together bit by bit.
799
800 Binary "^" returns its operands XORed together bit by bit.
801
802 Although no warning is currently raised, the results are not well
803 defined when these operations are performed on operands that aren't
804 either numbers (see "Integer Arithmetic") nor bitstrings (see "Bitwise
805 String Operators").
806
807 Note that "|" and "^" have lower priority than relational operators, so
808 for example the parentheses are essential in a test like
809
810 print "false\n" if (8 | 2) != 10;
811
812 If the "bitwise" feature is enabled via "use feature 'bitwise'" or "use
813 v5.28", then this operator always treats its operands as numbers.
814 Before Perl 5.28. this feature produced a warning in the
815 "experimental::bitwise" category.
816
817 C-style Logical And
818 Binary "&&" performs a short-circuit logical AND operation. That is,
819 if the left operand is false, the right operand is not even evaluated.
820 Scalar or list context propagates down to the right operand if it is
821 evaluated.
822
823 C-style Logical Or
824 Binary "||" performs a short-circuit logical OR operation. That is, if
825 the left operand is true, the right operand is not even evaluated.
826 Scalar or list context propagates down to the right operand if it is
827 evaluated.
828
829 Logical Defined-Or
830 Although it has no direct equivalent in C, Perl's "//" operator is
831 related to its C-style "or". In fact, it's exactly the same as "||",
832 except that it tests the left hand side's definedness instead of its
833 truth. Thus, "EXPR1 // EXPR2" returns the value of "EXPR1" if it's
834 defined, otherwise, the value of "EXPR2" is returned. ("EXPR1" is
835 evaluated in scalar context, "EXPR2" in the context of "//" itself).
836 Usually, this is the same result as "defined(EXPR1) ? EXPR1 : EXPR2"
837 (except that the ternary-operator form can be used as a lvalue, while
838 "EXPR1 // EXPR2" cannot). This is very useful for providing default
839 values for variables. If you actually want to test if at least one of
840 $x and $y is defined, use "defined($x // $y)".
841
842 The "||", "//" and "&&" operators return the last value evaluated
843 (unlike C's "||" and "&&", which return 0 or 1). Thus, a reasonably
844 portable way to find out the home directory might be:
845
846 $home = $ENV{HOME}
847 // $ENV{LOGDIR}
848 // (getpwuid($<))[7]
849 // die "You're homeless!\n";
850
851 In particular, this means that you shouldn't use this for selecting
852 between two aggregates for assignment:
853
854 @a = @b || @c; # This doesn't do the right thing
855 @a = scalar(@b) || @c; # because it really means this.
856 @a = @b ? @b : @c; # This works fine, though.
857
858 As alternatives to "&&" and "||" when used for control flow, Perl
859 provides the "and" and "or" operators (see below). The short-circuit
860 behavior is identical. The precedence of "and" and "or" is much lower,
861 however, so that you can safely use them after a list operator without
862 the need for parentheses:
863
864 unlink "alpha", "beta", "gamma"
865 or gripe(), next LINE;
866
867 With the C-style operators that would have been written like this:
868
869 unlink("alpha", "beta", "gamma")
870 || (gripe(), next LINE);
871
872 It would be even more readable to write that this way:
873
874 unless(unlink("alpha", "beta", "gamma")) {
875 gripe();
876 next LINE;
877 }
878
879 Using "or" for assignment is unlikely to do what you want; see below.
880
881 Range Operators
882 Binary ".." is the range operator, which is really two different
883 operators depending on the context. In list context, it returns a list
884 of values counting (up by ones) from the left value to the right value.
885 If the left value is greater than the right value then it returns the
886 empty list. The range operator is useful for writing "foreach (1..10)"
887 loops and for doing slice operations on arrays. In the current
888 implementation, no temporary array is created when the range operator
889 is used as the expression in "foreach" loops, but older versions of
890 Perl might burn a lot of memory when you write something like this:
891
892 for (1 .. 1_000_000) {
893 # code
894 }
895
896 The range operator also works on strings, using the magical auto-
897 increment, see below.
898
899 In scalar context, ".." returns a boolean value. The operator is
900 bistable, like a flip-flop, and emulates the line-range (comma)
901 operator of sed, awk, and various editors. Each ".." operator
902 maintains its own boolean state, even across calls to a subroutine that
903 contains it. It is false as long as its left operand is false. Once
904 the left operand is true, the range operator stays true until the right
905 operand is true, AFTER which the range operator becomes false again.
906 It doesn't become false till the next time the range operator is
907 evaluated. It can test the right operand and become false on the same
908 evaluation it became true (as in awk), but it still returns true once.
909 If you don't want it to test the right operand until the next
910 evaluation, as in sed, just use three dots ("...") instead of two. In
911 all other regards, "..." behaves just like ".." does.
912
913 The right operand is not evaluated while the operator is in the "false"
914 state, and the left operand is not evaluated while the operator is in
915 the "true" state. The precedence is a little lower than || and &&.
916 The value returned is either the empty string for false, or a sequence
917 number (beginning with 1) for true. The sequence number is reset for
918 each range encountered. The final sequence number in a range has the
919 string "E0" appended to it, which doesn't affect its numeric value, but
920 gives you something to search for if you want to exclude the endpoint.
921 You can exclude the beginning point by waiting for the sequence number
922 to be greater than 1.
923
924 If either operand of scalar ".." is a constant expression, that operand
925 is considered true if it is equal ("==") to the current input line
926 number (the $. variable).
927
928 To be pedantic, the comparison is actually "int(EXPR) == int(EXPR)",
929 but that is only an issue if you use a floating point expression; when
930 implicitly using $. as described in the previous paragraph, the
931 comparison is "int(EXPR) == int($.)" which is only an issue when $. is
932 set to a floating point value and you are not reading from a file.
933 Furthermore, "span" .. "spat" or "2.18 .. 3.14" will not do what you
934 want in scalar context because each of the operands are evaluated using
935 their integer representation.
936
937 Examples:
938
939 As a scalar operator:
940
941 if (101 .. 200) { print; } # print 2nd hundred lines, short for
942 # if ($. == 101 .. $. == 200) { print; }
943
944 next LINE if (1 .. /^$/); # skip header lines, short for
945 # next LINE if ($. == 1 .. /^$/);
946 # (typically in a loop labeled LINE)
947
948 s/^/> / if (/^$/ .. eof()); # quote body
949
950 # parse mail messages
951 while (<>) {
952 $in_header = 1 .. /^$/;
953 $in_body = /^$/ .. eof;
954 if ($in_header) {
955 # do something
956 } else { # in body
957 # do something else
958 }
959 } continue {
960 close ARGV if eof; # reset $. each file
961 }
962
963 Here's a simple example to illustrate the difference between the two
964 range operators:
965
966 @lines = (" - Foo",
967 "01 - Bar",
968 "1 - Baz",
969 " - Quux");
970
971 foreach (@lines) {
972 if (/0/ .. /1/) {
973 print "$_\n";
974 }
975 }
976
977 This program will print only the line containing "Bar". If the range
978 operator is changed to "...", it will also print the "Baz" line.
979
980 And now some examples as a list operator:
981
982 for (101 .. 200) { print } # print $_ 100 times
983 @foo = @foo[0 .. $#foo]; # an expensive no-op
984 @foo = @foo[$#foo-4 .. $#foo]; # slice last 5 items
985
986 The range operator (in list context) makes use of the magical auto-
987 increment algorithm if the operands are strings. You can say
988
989 @alphabet = ("A" .. "Z");
990
991 to get all normal letters of the English alphabet, or
992
993 $hexdigit = (0 .. 9, "a" .. "f")[$num & 15];
994
995 to get a hexadecimal digit, or
996
997 @z2 = ("01" .. "31");
998 print $z2[$mday];
999
1000 to get dates with leading zeros.
1001
1002 If the final value specified is not in the sequence that the magical
1003 increment would produce, the sequence goes until the next value would
1004 be longer than the final value specified.
1005
1006 As of Perl 5.26, the list-context range operator on strings works as
1007 expected in the scope of "use feature 'unicode_strings". In previous
1008 versions, and outside the scope of that feature, it exhibits "The
1009 "Unicode Bug"" in perlunicode: its behavior depends on the internal
1010 encoding of the range endpoint.
1011
1012 If the initial value specified isn't part of a magical increment
1013 sequence (that is, a non-empty string matching "/^[a-zA-Z]*[0-9]*\z/"),
1014 only the initial value will be returned. So the following will only
1015 return an alpha:
1016
1017 use charnames "greek";
1018 my @greek_small = ("\N{alpha}" .. "\N{omega}");
1019
1020 To get the 25 traditional lowercase Greek letters, including both
1021 sigmas, you could use this instead:
1022
1023 use charnames "greek";
1024 my @greek_small = map { chr } ( ord("\N{alpha}")
1025 ..
1026 ord("\N{omega}")
1027 );
1028
1029 However, because there are many other lowercase Greek characters than
1030 just those, to match lowercase Greek characters in a regular
1031 expression, you could use the pattern "/(?:(?=\p{Greek})\p{Lower})+/"
1032 (or the experimental feature "/(?[ \p{Greek} & \p{Lower} ])+/").
1033
1034 Because each operand is evaluated in integer form, "2.18 .. 3.14" will
1035 return two elements in list context.
1036
1037 @list = (2.18 .. 3.14); # same as @list = (2 .. 3);
1038
1039 Conditional Operator
1040 Ternary "?:" is the conditional operator, just as in C. It works much
1041 like an if-then-else. If the argument before the "?" is true, the
1042 argument before the ":" is returned, otherwise the argument after the
1043 ":" is returned. For example:
1044
1045 printf "I have %d dog%s.\n", $n,
1046 ($n == 1) ? "" : "s";
1047
1048 Scalar or list context propagates downward into the 2nd or 3rd
1049 argument, whichever is selected.
1050
1051 $x = $ok ? $y : $z; # get a scalar
1052 @x = $ok ? @y : @z; # get an array
1053 $x = $ok ? @y : @z; # oops, that's just a count!
1054
1055 The operator may be assigned to if both the 2nd and 3rd arguments are
1056 legal lvalues (meaning that you can assign to them):
1057
1058 ($x_or_y ? $x : $y) = $z;
1059
1060 Because this operator produces an assignable result, using assignments
1061 without parentheses will get you in trouble. For example, this:
1062
1063 $x % 2 ? $x += 10 : $x += 2
1064
1065 Really means this:
1066
1067 (($x % 2) ? ($x += 10) : $x) += 2
1068
1069 Rather than this:
1070
1071 ($x % 2) ? ($x += 10) : ($x += 2)
1072
1073 That should probably be written more simply as:
1074
1075 $x += ($x % 2) ? 10 : 2;
1076
1077 Assignment Operators
1078 "=" is the ordinary assignment operator.
1079
1080 Assignment operators work as in C. That is,
1081
1082 $x += 2;
1083
1084 is equivalent to
1085
1086 $x = $x + 2;
1087
1088 although without duplicating any side effects that dereferencing the
1089 lvalue might trigger, such as from "tie()". Other assignment operators
1090 work similarly. The following are recognized:
1091
1092 **= += *= &= &.= <<= &&=
1093 -= /= |= |.= >>= ||=
1094 .= %= ^= ^.= //=
1095 x=
1096
1097 Although these are grouped by family, they all have the precedence of
1098 assignment. These combined assignment operators can only operate on
1099 scalars, whereas the ordinary assignment operator can assign to arrays,
1100 hashes, lists and even references. (See "Context" and "List value
1101 constructors" in perldata, and "Assigning to References" in perlref.)
1102
1103 Unlike in C, the scalar assignment operator produces a valid lvalue.
1104 Modifying an assignment is equivalent to doing the assignment and then
1105 modifying the variable that was assigned to. This is useful for
1106 modifying a copy of something, like this:
1107
1108 ($tmp = $global) =~ tr/13579/24680/;
1109
1110 Although as of 5.14, that can be also be accomplished this way:
1111
1112 use v5.14;
1113 $tmp = ($global =~ tr/13579/24680/r);
1114
1115 Likewise,
1116
1117 ($x += 2) *= 3;
1118
1119 is equivalent to
1120
1121 $x += 2;
1122 $x *= 3;
1123
1124 Similarly, a list assignment in list context produces the list of
1125 lvalues assigned to, and a list assignment in scalar context returns
1126 the number of elements produced by the expression on the right hand
1127 side of the assignment.
1128
1129 The three dotted bitwise assignment operators ("&.=" "|.=" "^.=") are
1130 new in Perl 5.22. See "Bitwise String Operators".
1131
1132 Comma Operator
1133 Binary "," is the comma operator. In scalar context it evaluates its
1134 left argument, throws that value away, then evaluates its right
1135 argument and returns that value. This is just like C's comma operator.
1136
1137 In list context, it's just the list argument separator, and inserts
1138 both its arguments into the list. These arguments are also evaluated
1139 from left to right.
1140
1141 The "=>" operator (sometimes pronounced "fat comma") is a synonym for
1142 the comma except that it causes a word on its left to be interpreted as
1143 a string if it begins with a letter or underscore and is composed only
1144 of letters, digits and underscores. This includes operands that might
1145 otherwise be interpreted as operators, constants, single number
1146 v-strings or function calls. If in doubt about this behavior, the left
1147 operand can be quoted explicitly.
1148
1149 Otherwise, the "=>" operator behaves exactly as the comma operator or
1150 list argument separator, according to context.
1151
1152 For example:
1153
1154 use constant FOO => "something";
1155
1156 my %h = ( FOO => 23 );
1157
1158 is equivalent to:
1159
1160 my %h = ("FOO", 23);
1161
1162 It is NOT:
1163
1164 my %h = ("something", 23);
1165
1166 The "=>" operator is helpful in documenting the correspondence between
1167 keys and values in hashes, and other paired elements in lists.
1168
1169 %hash = ( $key => $value );
1170 login( $username => $password );
1171
1172 The special quoting behavior ignores precedence, and hence may apply to
1173 part of the left operand:
1174
1175 print time.shift => "bbb";
1176
1177 That example prints something like "1314363215shiftbbb", because the
1178 "=>" implicitly quotes the "shift" immediately on its left, ignoring
1179 the fact that "time.shift" is the entire left operand.
1180
1181 List Operators (Rightward)
1182 On the right side of a list operator, the comma has very low
1183 precedence, such that it controls all comma-separated expressions found
1184 there. The only operators with lower precedence are the logical
1185 operators "and", "or", and "not", which may be used to evaluate calls
1186 to list operators without the need for parentheses:
1187
1188 open HANDLE, "< :encoding(UTF-8)", "filename"
1189 or die "Can't open: $!\n";
1190
1191 However, some people find that code harder to read than writing it with
1192 parentheses:
1193
1194 open(HANDLE, "< :encoding(UTF-8)", "filename")
1195 or die "Can't open: $!\n";
1196
1197 in which case you might as well just use the more customary "||"
1198 operator:
1199
1200 open(HANDLE, "< :encoding(UTF-8)", "filename")
1201 || die "Can't open: $!\n";
1202
1203 See also discussion of list operators in "Terms and List Operators
1204 (Leftward)".
1205
1206 Logical Not
1207 Unary "not" returns the logical negation of the expression to its
1208 right. It's the equivalent of "!" except for the very low precedence.
1209
1210 Logical And
1211 Binary "and" returns the logical conjunction of the two surrounding
1212 expressions. It's equivalent to "&&" except for the very low
1213 precedence. This means that it short-circuits: the right expression is
1214 evaluated only if the left expression is true.
1215
1216 Logical or and Exclusive Or
1217 Binary "or" returns the logical disjunction of the two surrounding
1218 expressions. It's equivalent to "||" except for the very low
1219 precedence. This makes it useful for control flow:
1220
1221 print FH $data or die "Can't write to FH: $!";
1222
1223 This means that it short-circuits: the right expression is evaluated
1224 only if the left expression is false. Due to its precedence, you must
1225 be careful to avoid using it as replacement for the "||" operator. It
1226 usually works out better for flow control than in assignments:
1227
1228 $x = $y or $z; # bug: this is wrong
1229 ($x = $y) or $z; # really means this
1230 $x = $y || $z; # better written this way
1231
1232 However, when it's a list-context assignment and you're trying to use
1233 "||" for control flow, you probably need "or" so that the assignment
1234 takes higher precedence.
1235
1236 @info = stat($file) || die; # oops, scalar sense of stat!
1237 @info = stat($file) or die; # better, now @info gets its due
1238
1239 Then again, you could always use parentheses.
1240
1241 Binary "xor" returns the exclusive-OR of the two surrounding
1242 expressions. It cannot short-circuit (of course).
1243
1244 There is no low precedence operator for defined-OR.
1245
1246 C Operators Missing From Perl
1247 Here is what C has that Perl doesn't:
1248
1249 unary & Address-of operator. (But see the "\" operator for taking a
1250 reference.)
1251
1252 unary * Dereference-address operator. (Perl's prefix dereferencing
1253 operators are typed: "$", "@", "%", and "&".)
1254
1255 (TYPE) Type-casting operator.
1256
1257 Quote and Quote-like Operators
1258 While we usually think of quotes as literal values, in Perl they
1259 function as operators, providing various kinds of interpolating and
1260 pattern matching capabilities. Perl provides customary quote
1261 characters for these behaviors, but also provides a way for you to
1262 choose your quote character for any of them. In the following table, a
1263 "{}" represents any pair of delimiters you choose.
1264
1265 Customary Generic Meaning Interpolates
1266 '' q{} Literal no
1267 "" qq{} Literal yes
1268 `` qx{} Command yes*
1269 qw{} Word list no
1270 // m{} Pattern match yes*
1271 qr{} Pattern yes*
1272 s{}{} Substitution yes*
1273 tr{}{} Transliteration no (but see below)
1274 y{}{} Transliteration no (but see below)
1275 <<EOF here-doc yes*
1276
1277 * unless the delimiter is ''.
1278
1279 Non-bracketing delimiters use the same character fore and aft, but the
1280 four sorts of ASCII brackets (round, angle, square, curly) all nest,
1281 which means that
1282
1283 q{foo{bar}baz}
1284
1285 is the same as
1286
1287 'foo{bar}baz'
1288
1289 Note, however, that this does not always work for quoting Perl code:
1290
1291 $s = q{ if($x eq "}") ... }; # WRONG
1292
1293 is a syntax error. The "Text::Balanced" module (standard as of v5.8,
1294 and from CPAN before then) is able to do this properly.
1295
1296 There can (and in some cases, must) be whitespace between the operator
1297 and the quoting characters, except when "#" is being used as the
1298 quoting character. "q#foo#" is parsed as the string "foo", while
1299 "q #foo#" is the operator "q" followed by a comment. Its argument will
1300 be taken from the next line. This allows you to write:
1301
1302 s {foo} # Replace foo
1303 {bar} # with bar.
1304
1305 The cases where whitespace must be used are when the quoting character
1306 is a word character (meaning it matches "/\w/"):
1307
1308 q XfooX # Works: means the string 'foo'
1309 qXfooX # WRONG!
1310
1311 The following escape sequences are available in constructs that
1312 interpolate, and in transliterations whose delimiters aren't single
1313 quotes ("'").
1314
1315 Sequence Note Description
1316 \t tab (HT, TAB)
1317 \n newline (NL)
1318 \r return (CR)
1319 \f form feed (FF)
1320 \b backspace (BS)
1321 \a alarm (bell) (BEL)
1322 \e escape (ESC)
1323 \x{263A} [1,8] hex char (example: SMILEY)
1324 \x1b [2,8] restricted range hex char (example: ESC)
1325 \N{name} [3] named Unicode character or character sequence
1326 \N{U+263D} [4,8] Unicode character (example: FIRST QUARTER MOON)
1327 \c[ [5] control char (example: chr(27))
1328 \o{23072} [6,8] octal char (example: SMILEY)
1329 \033 [7,8] restricted range octal char (example: ESC)
1330
1331 [1] The result is the character specified by the hexadecimal number
1332 between the braces. See "[8]" below for details on which
1333 character.
1334
1335 Only hexadecimal digits are valid between the braces. If an
1336 invalid character is encountered, a warning will be issued and the
1337 invalid character and all subsequent characters (valid or invalid)
1338 within the braces will be discarded.
1339
1340 If there are no valid digits between the braces, the generated
1341 character is the NULL character ("\x{00}"). However, an explicit
1342 empty brace ("\x{}") will not cause a warning (currently).
1343
1344 [2] The result is the character specified by the hexadecimal number in
1345 the range 0x00 to 0xFF. See "[8]" below for details on which
1346 character.
1347
1348 Only hexadecimal digits are valid following "\x". When "\x" is
1349 followed by fewer than two valid digits, any valid digits will be
1350 zero-padded. This means that "\x7" will be interpreted as "\x07",
1351 and a lone "\x" will be interpreted as "\x00". Except at the end
1352 of a string, having fewer than two valid digits will result in a
1353 warning. Note that although the warning says the illegal character
1354 is ignored, it is only ignored as part of the escape and will still
1355 be used as the subsequent character in the string. For example:
1356
1357 Original Result Warns?
1358 "\x7" "\x07" no
1359 "\x" "\x00" no
1360 "\x7q" "\x07q" yes
1361 "\xq" "\x00q" yes
1362
1363 [3] The result is the Unicode character or character sequence given by
1364 name. See charnames.
1365
1366 [4] "\N{U+hexadecimal number}" means the Unicode character whose
1367 Unicode code point is hexadecimal number.
1368
1369 [5] The character following "\c" is mapped to some other character as
1370 shown in the table:
1371
1372 Sequence Value
1373 \c@ chr(0)
1374 \cA chr(1)
1375 \ca chr(1)
1376 \cB chr(2)
1377 \cb chr(2)
1378 ...
1379 \cZ chr(26)
1380 \cz chr(26)
1381 \c[ chr(27)
1382 # See below for chr(28)
1383 \c] chr(29)
1384 \c^ chr(30)
1385 \c_ chr(31)
1386 \c? chr(127) # (on ASCII platforms; see below for link to
1387 # EBCDIC discussion)
1388
1389 In other words, it's the character whose code point has had 64
1390 xor'd with its uppercase. "\c?" is DELETE on ASCII platforms
1391 because "ord("?") ^ 64" is 127, and "\c@" is NULL because the ord
1392 of "@" is 64, so xor'ing 64 itself produces 0.
1393
1394 Also, "\c\X" yields " chr(28) . "X"" for any X, but cannot come at
1395 the end of a string, because the backslash would be parsed as
1396 escaping the end quote.
1397
1398 On ASCII platforms, the resulting characters from the list above
1399 are the complete set of ASCII controls. This isn't the case on
1400 EBCDIC platforms; see "OPERATOR DIFFERENCES" in perlebcdic for a
1401 full discussion of the differences between these for ASCII versus
1402 EBCDIC platforms.
1403
1404 Use of any other character following the "c" besides those listed
1405 above is discouraged, and as of Perl v5.20, the only characters
1406 actually allowed are the printable ASCII ones, minus the left brace
1407 "{". What happens for any of the allowed other characters is that
1408 the value is derived by xor'ing with the seventh bit, which is 64,
1409 and a warning raised if enabled. Using the non-allowed characters
1410 generates a fatal error.
1411
1412 To get platform independent controls, you can use "\N{...}".
1413
1414 [6] The result is the character specified by the octal number between
1415 the braces. See "[8]" below for details on which character.
1416
1417 If a character that isn't an octal digit is encountered, a warning
1418 is raised, and the value is based on the octal digits before it,
1419 discarding it and all following characters up to the closing brace.
1420 It is a fatal error if there are no octal digits at all.
1421
1422 [7] The result is the character specified by the three-digit octal
1423 number in the range 000 to 777 (but best to not use above 077, see
1424 next paragraph). See "[8]" below for details on which character.
1425
1426 Some contexts allow 2 or even 1 digit, but any usage without
1427 exactly three digits, the first being a zero, may give unintended
1428 results. (For example, in a regular expression it may be confused
1429 with a backreference; see "Octal escapes" in perlrebackslash.)
1430 Starting in Perl 5.14, you may use "\o{}" instead, which avoids all
1431 these problems. Otherwise, it is best to use this construct only
1432 for ordinals "\077" and below, remembering to pad to the left with
1433 zeros to make three digits. For larger ordinals, either use
1434 "\o{}", or convert to something else, such as to hex and use
1435 "\N{U+}" (which is portable between platforms with different
1436 character sets) or "\x{}" instead.
1437
1438 [8] Several constructs above specify a character by a number. That
1439 number gives the character's position in the character set encoding
1440 (indexed from 0). This is called synonymously its ordinal, code
1441 position, or code point. Perl works on platforms that have a
1442 native encoding currently of either ASCII/Latin1 or EBCDIC, each of
1443 which allow specification of 256 characters. In general, if the
1444 number is 255 (0xFF, 0377) or below, Perl interprets this in the
1445 platform's native encoding. If the number is 256 (0x100, 0400) or
1446 above, Perl interprets it as a Unicode code point and the result is
1447 the corresponding Unicode character. For example "\x{50}" and
1448 "\o{120}" both are the number 80 in decimal, which is less than
1449 256, so the number is interpreted in the native character set
1450 encoding. In ASCII the character in the 80th position (indexed
1451 from 0) is the letter "P", and in EBCDIC it is the ampersand symbol
1452 "&". "\x{100}" and "\o{400}" are both 256 in decimal, so the
1453 number is interpreted as a Unicode code point no matter what the
1454 native encoding is. The name of the character in the 256th
1455 position (indexed by 0) in Unicode is "LATIN CAPITAL LETTER A WITH
1456 MACRON".
1457
1458 An exception to the above rule is that "\N{U+hex number}" is always
1459 interpreted as a Unicode code point, so that "\N{U+0050}" is "P"
1460 even on EBCDIC platforms.
1461
1462 NOTE: Unlike C and other languages, Perl has no "\v" escape sequence
1463 for the vertical tab (VT, which is 11 in both ASCII and EBCDIC), but
1464 you may use "\N{VT}", "\ck", "\N{U+0b}", or "\x0b". ("\v" does have
1465 meaning in regular expression patterns in Perl, see perlre.)
1466
1467 The following escape sequences are available in constructs that
1468 interpolate, but not in transliterations.
1469
1470 \l lowercase next character only
1471 \u titlecase (not uppercase!) next character only
1472 \L lowercase all characters till \E or end of string
1473 \U uppercase all characters till \E or end of string
1474 \F foldcase all characters till \E or end of string
1475 \Q quote (disable) pattern metacharacters till \E or
1476 end of string
1477 \E end either case modification or quoted section
1478 (whichever was last seen)
1479
1480 See "quotemeta" in perlfunc for the exact definition of characters that
1481 are quoted by "\Q".
1482
1483 "\L", "\U", "\F", and "\Q" can stack, in which case you need one "\E"
1484 for each. For example:
1485
1486 say"This \Qquoting \ubusiness \Uhere isn't quite\E done yet,\E is it?";
1487 This quoting\ Business\ HERE\ ISN\'T\ QUITE\ done\ yet\, is it?
1488
1489 If a "use locale" form that includes "LC_CTYPE" is in effect (see
1490 perllocale), the case map used by "\l", "\L", "\u", and "\U" is taken
1491 from the current locale. If Unicode (for example, "\N{}" or code
1492 points of 0x100 or beyond) is being used, the case map used by "\l",
1493 "\L", "\u", and "\U" is as defined by Unicode. That means that case-
1494 mapping a single character can sometimes produce a sequence of several
1495 characters. Under "use locale", "\F" produces the same results as "\L"
1496 for all locales but a UTF-8 one, where it instead uses the Unicode
1497 definition.
1498
1499 All systems use the virtual "\n" to represent a line terminator, called
1500 a "newline". There is no such thing as an unvarying, physical newline
1501 character. It is only an illusion that the operating system, device
1502 drivers, C libraries, and Perl all conspire to preserve. Not all
1503 systems read "\r" as ASCII CR and "\n" as ASCII LF. For example, on
1504 the ancient Macs (pre-MacOS X) of yesteryear, these used to be
1505 reversed, and on systems without a line terminator, printing "\n" might
1506 emit no actual data. In general, use "\n" when you mean a "newline"
1507 for your system, but use the literal ASCII when you need an exact
1508 character. For example, most networking protocols expect and prefer a
1509 CR+LF ("\015\012" or "\cM\cJ") for line terminators, and although they
1510 often accept just "\012", they seldom tolerate just "\015". If you get
1511 in the habit of using "\n" for networking, you may be burned some day.
1512
1513 For constructs that do interpolate, variables beginning with ""$"" or
1514 ""@"" are interpolated. Subscripted variables such as $a[3] or
1515 "$href->{key}[0]" are also interpolated, as are array and hash slices.
1516 But method calls such as "$obj->meth" are not.
1517
1518 Interpolating an array or slice interpolates the elements in order,
1519 separated by the value of $", so is equivalent to interpolating
1520 "join $", @array". "Punctuation" arrays such as "@*" are usually
1521 interpolated only if the name is enclosed in braces "@{*}", but the
1522 arrays @_, "@+", and "@-" are interpolated even without braces.
1523
1524 For double-quoted strings, the quoting from "\Q" is applied after
1525 interpolation and escapes are processed.
1526
1527 "abc\Qfoo\tbar$s\Exyz"
1528
1529 is equivalent to
1530
1531 "abc" . quotemeta("foo\tbar$s") . "xyz"
1532
1533 For the pattern of regex operators ("qr//", "m//" and "s///"), the
1534 quoting from "\Q" is applied after interpolation is processed, but
1535 before escapes are processed. This allows the pattern to match
1536 literally (except for "$" and "@"). For example, the following
1537 matches:
1538
1539 '\s\t' =~ /\Q\s\t/
1540
1541 Because "$" or "@" trigger interpolation, you'll need to use something
1542 like "/\Quser\E\@\Qhost/" to match them literally.
1543
1544 Patterns are subject to an additional level of interpretation as a
1545 regular expression. This is done as a second pass, after variables are
1546 interpolated, so that regular expressions may be incorporated into the
1547 pattern from the variables. If this is not what you want, use "\Q" to
1548 interpolate a variable literally.
1549
1550 Apart from the behavior described above, Perl does not expand multiple
1551 levels of interpolation. In particular, contrary to the expectations
1552 of shell programmers, back-quotes do NOT interpolate within double
1553 quotes, nor do single quotes impede evaluation of variables when used
1554 within double quotes.
1555
1556 Regexp Quote-Like Operators
1557 Here are the quote-like operators that apply to pattern matching and
1558 related activities.
1559
1560 "qr/STRING/msixpodualn"
1561 This operator quotes (and possibly compiles) its STRING as a
1562 regular expression. STRING is interpolated the same way as
1563 PATTERN in "m/PATTERN/". If "'" is used as the delimiter, no
1564 variable interpolation is done. Returns a Perl value which may
1565 be used instead of the corresponding "/STRING/msixpodualn"
1566 expression. The returned value is a normalized version of the
1567 original pattern. It magically differs from a string
1568 containing the same characters: "ref(qr/x/)" returns "Regexp";
1569 however, dereferencing it is not well defined (you currently
1570 get the normalized version of the original pattern, but this
1571 may change).
1572
1573 For example,
1574
1575 $rex = qr/my.STRING/is;
1576 print $rex; # prints (?si-xm:my.STRING)
1577 s/$rex/foo/;
1578
1579 is equivalent to
1580
1581 s/my.STRING/foo/is;
1582
1583 The result may be used as a subpattern in a match:
1584
1585 $re = qr/$pattern/;
1586 $string =~ /foo${re}bar/; # can be interpolated in other
1587 # patterns
1588 $string =~ $re; # or used standalone
1589 $string =~ /$re/; # or this way
1590
1591 Since Perl may compile the pattern at the moment of execution
1592 of the "qr()" operator, using "qr()" may have speed advantages
1593 in some situations, notably if the result of "qr()" is used
1594 standalone:
1595
1596 sub match {
1597 my $patterns = shift;
1598 my @compiled = map qr/$_/i, @$patterns;
1599 grep {
1600 my $success = 0;
1601 foreach my $pat (@compiled) {
1602 $success = 1, last if /$pat/;
1603 }
1604 $success;
1605 } @_;
1606 }
1607
1608 Precompilation of the pattern into an internal representation
1609 at the moment of "qr()" avoids the need to recompile the
1610 pattern every time a match "/$pat/" is attempted. (Perl has
1611 many other internal optimizations, but none would be triggered
1612 in the above example if we did not use "qr()" operator.)
1613
1614 Options (specified by the following modifiers) are:
1615
1616 m Treat string as multiple lines.
1617 s Treat string as single line. (Make . match a newline)
1618 i Do case-insensitive pattern matching.
1619 x Use extended regular expressions; specifying two
1620 x's means \t and the SPACE character are ignored within
1621 square-bracketed character classes
1622 p When matching preserve a copy of the matched string so
1623 that ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be
1624 defined (ignored starting in v5.20) as these are always
1625 defined starting in that release
1626 o Compile pattern only once.
1627 a ASCII-restrict: Use ASCII for \d, \s, \w and [[:posix:]]
1628 character classes; specifying two a's adds the further
1629 restriction that no ASCII character will match a
1630 non-ASCII one under /i.
1631 l Use the current run-time locale's rules.
1632 u Use Unicode rules.
1633 d Use Unicode or native charset, as in 5.12 and earlier.
1634 n Non-capture mode. Don't let () fill in $1, $2, etc...
1635
1636 If a precompiled pattern is embedded in a larger pattern then
1637 the effect of "msixpluadn" will be propagated appropriately.
1638 The effect that the "/o" modifier has is not propagated, being
1639 restricted to those patterns explicitly using it.
1640
1641 The "/a", "/d", "/l", and "/u" modifiers (added in Perl 5.14)
1642 control the character set rules, but "/a" is the only one you
1643 are likely to want to specify explicitly; the other three are
1644 selected automatically by various pragmas.
1645
1646 See perlre for additional information on valid syntax for
1647 STRING, and for a detailed look at the semantics of regular
1648 expressions. In particular, all modifiers except the largely
1649 obsolete "/o" are further explained in "Modifiers" in perlre.
1650 "/o" is described in the next section.
1651
1652 "m/PATTERN/msixpodualngc"
1653 "/PATTERN/msixpodualngc"
1654 Searches a string for a pattern match, and in scalar context
1655 returns true if it succeeds, false if it fails. If no string
1656 is specified via the "=~" or "!~" operator, the $_ string is
1657 searched. (The string specified with "=~" need not be an
1658 lvalue--it may be the result of an expression evaluation, but
1659 remember the "=~" binds rather tightly.) See also perlre.
1660
1661 Options are as described in "qr//" above; in addition, the
1662 following match process modifiers are available:
1663
1664 g Match globally, i.e., find all occurrences.
1665 c Do not reset search position on a failed match when /g is
1666 in effect.
1667
1668 If "/" is the delimiter then the initial "m" is optional. With
1669 the "m" you can use any pair of non-whitespace (ASCII)
1670 characters as delimiters. This is particularly useful for
1671 matching path names that contain "/", to avoid LTS (leaning
1672 toothpick syndrome). If "?" is the delimiter, then a match-
1673 only-once rule applies, described in "m?PATTERN?" below. If
1674 "'" (single quote) is the delimiter, no variable interpolation
1675 is performed on the PATTERN. When using a delimiter character
1676 valid in an identifier, whitespace is required after the "m".
1677
1678 PATTERN may contain variables, which will be interpolated every
1679 time the pattern search is evaluated, except for when the
1680 delimiter is a single quote. (Note that $(, $), and $| are not
1681 interpolated because they look like end-of-string tests.) Perl
1682 will not recompile the pattern unless an interpolated variable
1683 that it contains changes. You can force Perl to skip the test
1684 and never recompile by adding a "/o" (which stands for "once")
1685 after the trailing delimiter. Once upon a time, Perl would
1686 recompile regular expressions unnecessarily, and this modifier
1687 was useful to tell it not to do so, in the interests of speed.
1688 But now, the only reasons to use "/o" are one of:
1689
1690 1. The variables are thousands of characters long and you know
1691 that they don't change, and you need to wring out the last
1692 little bit of speed by having Perl skip testing for that.
1693 (There is a maintenance penalty for doing this, as
1694 mentioning "/o" constitutes a promise that you won't change
1695 the variables in the pattern. If you do change them, Perl
1696 won't even notice.)
1697
1698 2. you want the pattern to use the initial values of the
1699 variables regardless of whether they change or not. (But
1700 there are saner ways of accomplishing this than using
1701 "/o".)
1702
1703 3. If the pattern contains embedded code, such as
1704
1705 use re 'eval';
1706 $code = 'foo(?{ $x })';
1707 /$code/
1708
1709 then perl will recompile each time, even though the pattern
1710 string hasn't changed, to ensure that the current value of
1711 $x is seen each time. Use "/o" if you want to avoid this.
1712
1713 The bottom line is that using "/o" is almost never a good idea.
1714
1715 The empty pattern "//"
1716 If the PATTERN evaluates to the empty string, the last
1717 successfully matched regular expression is used instead. In
1718 this case, only the "g" and "c" flags on the empty pattern are
1719 honored; the other flags are taken from the original pattern.
1720 If no match has previously succeeded, this will (silently) act
1721 instead as a genuine empty pattern (which will always match).
1722
1723 Note that it's possible to confuse Perl into thinking "//" (the
1724 empty regex) is really "//" (the defined-or operator). Perl is
1725 usually pretty good about this, but some pathological cases
1726 might trigger this, such as "$x///" (is that "($x) / (//)" or
1727 "$x // /"?) and "print $fh //" ("print $fh(//" or
1728 "print($fh //"?). In all of these examples, Perl will assume
1729 you meant defined-or. If you meant the empty regex, just use
1730 parentheses or spaces to disambiguate, or even prefix the empty
1731 regex with an "m" (so "//" becomes "m//").
1732
1733 Matching in list context
1734 If the "/g" option is not used, "m//" in list context returns a
1735 list consisting of the subexpressions matched by the
1736 parentheses in the pattern, that is, ($1, $2, $3...) (Note
1737 that here $1 etc. are also set). When there are no parentheses
1738 in the pattern, the return value is the list "(1)" for success.
1739 With or without parentheses, an empty list is returned upon
1740 failure.
1741
1742 Examples:
1743
1744 open(TTY, "+</dev/tty")
1745 || die "can't access /dev/tty: $!";
1746
1747 <TTY> =~ /^y/i && foo(); # do foo if desired
1748
1749 if (/Version: *([0-9.]*)/) { $version = $1; }
1750
1751 next if m#^/usr/spool/uucp#;
1752
1753 # poor man's grep
1754 $arg = shift;
1755 while (<>) {
1756 print if /$arg/o; # compile only once (no longer needed!)
1757 }
1758
1759 if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))
1760
1761 This last example splits $foo into the first two words and the
1762 remainder of the line, and assigns those three fields to $F1,
1763 $F2, and $Etc. The conditional is true if any variables were
1764 assigned; that is, if the pattern matched.
1765
1766 The "/g" modifier specifies global pattern matching--that is,
1767 matching as many times as possible within the string. How it
1768 behaves depends on the context. In list context, it returns a
1769 list of the substrings matched by any capturing parentheses in
1770 the regular expression. If there are no parentheses, it
1771 returns a list of all the matched strings, as if there were
1772 parentheses around the whole pattern.
1773
1774 In scalar context, each execution of "m//g" finds the next
1775 match, returning true if it matches, and false if there is no
1776 further match. The position after the last match can be read
1777 or set using the "pos()" function; see "pos" in perlfunc. A
1778 failed match normally resets the search position to the
1779 beginning of the string, but you can avoid that by adding the
1780 "/c" modifier (for example, "m//gc"). Modifying the target
1781 string also resets the search position.
1782
1783 "\G assertion"
1784 You can intermix "m//g" matches with "m/\G.../g", where "\G" is
1785 a zero-width assertion that matches the exact position where
1786 the previous "m//g", if any, left off. Without the "/g"
1787 modifier, the "\G" assertion still anchors at "pos()" as it was
1788 at the start of the operation (see "pos" in perlfunc), but the
1789 match is of course only attempted once. Using "\G" without
1790 "/g" on a target string that has not previously had a "/g"
1791 match applied to it is the same as using the "\A" assertion to
1792 match the beginning of the string. Note also that, currently,
1793 "\G" is only properly supported when anchored at the very
1794 beginning of the pattern.
1795
1796 Examples:
1797
1798 # list context
1799 ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);
1800
1801 # scalar context
1802 local $/ = "";
1803 while ($paragraph = <>) {
1804 while ($paragraph =~ /\p{Ll}['")]*[.!?]+['")]*\s/g) {
1805 $sentences++;
1806 }
1807 }
1808 say $sentences;
1809
1810 Here's another way to check for sentences in a paragraph:
1811
1812 my $sentence_rx = qr{
1813 (?: (?<= ^ ) | (?<= \s ) ) # after start-of-string or
1814 # whitespace
1815 \p{Lu} # capital letter
1816 .*? # a bunch of anything
1817 (?<= \S ) # that ends in non-
1818 # whitespace
1819 (?<! \b [DMS]r ) # but isn't a common abbr.
1820 (?<! \b Mrs )
1821 (?<! \b Sra )
1822 (?<! \b St )
1823 [.?!] # followed by a sentence
1824 # ender
1825 (?= $ | \s ) # in front of end-of-string
1826 # or whitespace
1827 }sx;
1828 local $/ = "";
1829 while (my $paragraph = <>) {
1830 say "NEW PARAGRAPH";
1831 my $count = 0;
1832 while ($paragraph =~ /($sentence_rx)/g) {
1833 printf "\tgot sentence %d: <%s>\n", ++$count, $1;
1834 }
1835 }
1836
1837 Here's how to use "m//gc" with "\G":
1838
1839 $_ = "ppooqppqq";
1840 while ($i++ < 2) {
1841 print "1: '";
1842 print $1 while /(o)/gc; print "', pos=", pos, "\n";
1843 print "2: '";
1844 print $1 if /\G(q)/gc; print "', pos=", pos, "\n";
1845 print "3: '";
1846 print $1 while /(p)/gc; print "', pos=", pos, "\n";
1847 }
1848 print "Final: '$1', pos=",pos,"\n" if /\G(.)/;
1849
1850 The last example should print:
1851
1852 1: 'oo', pos=4
1853 2: 'q', pos=5
1854 3: 'pp', pos=7
1855 1: '', pos=7
1856 2: 'q', pos=8
1857 3: '', pos=8
1858 Final: 'q', pos=8
1859
1860 Notice that the final match matched "q" instead of "p", which a
1861 match without the "\G" anchor would have done. Also note that
1862 the final match did not update "pos". "pos" is only updated on
1863 a "/g" match. If the final match did indeed match "p", it's a
1864 good bet that you're running an ancient (pre-5.6.0) version of
1865 Perl.
1866
1867 A useful idiom for "lex"-like scanners is "/\G.../gc". You can
1868 combine several regexps like this to process a string part-by-
1869 part, doing different actions depending on which regexp
1870 matched. Each regexp tries to match where the previous one
1871 leaves off.
1872
1873 $_ = <<'EOL';
1874 $url = URI::URL->new( "http://example.com/" );
1875 die if $url eq "xXx";
1876 EOL
1877
1878 LOOP: {
1879 print(" digits"), redo LOOP if /\G\d+\b[,.;]?\s*/gc;
1880 print(" lowercase"), redo LOOP
1881 if /\G\p{Ll}+\b[,.;]?\s*/gc;
1882 print(" UPPERCASE"), redo LOOP
1883 if /\G\p{Lu}+\b[,.;]?\s*/gc;
1884 print(" Capitalized"), redo LOOP
1885 if /\G\p{Lu}\p{Ll}+\b[,.;]?\s*/gc;
1886 print(" MiXeD"), redo LOOP if /\G\pL+\b[,.;]?\s*/gc;
1887 print(" alphanumeric"), redo LOOP
1888 if /\G[\p{Alpha}\pN]+\b[,.;]?\s*/gc;
1889 print(" line-noise"), redo LOOP if /\G\W+/gc;
1890 print ". That's all!\n";
1891 }
1892
1893 Here is the output (split into several lines):
1894
1895 line-noise lowercase line-noise UPPERCASE line-noise UPPERCASE
1896 line-noise lowercase line-noise lowercase line-noise lowercase
1897 lowercase line-noise lowercase lowercase line-noise lowercase
1898 lowercase line-noise MiXeD line-noise. That's all!
1899
1900 "m?PATTERN?msixpodualngc"
1901 This is just like the "m/PATTERN/" search, except that it
1902 matches only once between calls to the "reset()" operator.
1903 This is a useful optimization when you want to see only the
1904 first occurrence of something in each file of a set of files,
1905 for instance. Only "m??" patterns local to the current
1906 package are reset.
1907
1908 while (<>) {
1909 if (m?^$?) {
1910 # blank line between header and body
1911 }
1912 } continue {
1913 reset if eof; # clear m?? status for next file
1914 }
1915
1916 Another example switched the first "latin1" encoding it finds
1917 to "utf8" in a pod file:
1918
1919 s//utf8/ if m? ^ =encoding \h+ \K latin1 ?x;
1920
1921 The match-once behavior is controlled by the match delimiter
1922 being "?"; with any other delimiter this is the normal "m//"
1923 operator.
1924
1925 In the past, the leading "m" in "m?PATTERN?" was optional, but
1926 omitting it would produce a deprecation warning. As of
1927 v5.22.0, omitting it produces a syntax error. If you encounter
1928 this construct in older code, you can just add "m".
1929
1930 "s/PATTERN/REPLACEMENT/msixpodualngcer"
1931 Searches a string for a pattern, and if found, replaces that
1932 pattern with the replacement text and returns the number of
1933 substitutions made. Otherwise it returns false (a value that
1934 is both an empty string ("") and numeric zero (0) as described
1935 in "Relational Operators").
1936
1937 If the "/r" (non-destructive) option is used then it runs the
1938 substitution on a copy of the string and instead of returning
1939 the number of substitutions, it returns the copy whether or not
1940 a substitution occurred. The original string is never changed
1941 when "/r" is used. The copy will always be a plain string,
1942 even if the input is an object or a tied variable.
1943
1944 If no string is specified via the "=~" or "!~" operator, the $_
1945 variable is searched and modified. Unless the "/r" option is
1946 used, the string specified must be a scalar variable, an array
1947 element, a hash element, or an assignment to one of those; that
1948 is, some sort of scalar lvalue.
1949
1950 If the delimiter chosen is a single quote, no variable
1951 interpolation is done on either the PATTERN or the REPLACEMENT.
1952 Otherwise, if the PATTERN contains a "$" that looks like a
1953 variable rather than an end-of-string test, the variable will
1954 be interpolated into the pattern at run-time. If you want the
1955 pattern compiled only once the first time the variable is
1956 interpolated, use the "/o" option. If the pattern evaluates to
1957 the empty string, the last successfully executed regular
1958 expression is used instead. See perlre for further explanation
1959 on these.
1960
1961 Options are as with "m//" with the addition of the following
1962 replacement specific options:
1963
1964 e Evaluate the right side as an expression.
1965 ee Evaluate the right side as a string then eval the
1966 result.
1967 r Return substitution and leave the original string
1968 untouched.
1969
1970 Any non-whitespace delimiter may replace the slashes. Add
1971 space after the "s" when using a character allowed in
1972 identifiers. If single quotes are used, no interpretation is
1973 done on the replacement string (the "/e" modifier overrides
1974 this, however). Note that Perl treats backticks as normal
1975 delimiters; the replacement text is not evaluated as a command.
1976 If the PATTERN is delimited by bracketing quotes, the
1977 REPLACEMENT has its own pair of quotes, which may or may not be
1978 bracketing quotes, for example, "s(foo)(bar)" or "s<foo>/bar/".
1979 A "/e" will cause the replacement portion to be treated as a
1980 full-fledged Perl expression and evaluated right then and
1981 there. It is, however, syntax checked at compile-time. A
1982 second "e" modifier will cause the replacement portion to be
1983 "eval"ed before being run as a Perl expression.
1984
1985 Examples:
1986
1987 s/\bgreen\b/mauve/g; # don't change wintergreen
1988
1989 $path =~ s|/usr/bin|/usr/local/bin|;
1990
1991 s/Login: $foo/Login: $bar/; # run-time pattern
1992
1993 ($foo = $bar) =~ s/this/that/; # copy first, then
1994 # change
1995 ($foo = "$bar") =~ s/this/that/; # convert to string,
1996 # copy, then change
1997 $foo = $bar =~ s/this/that/r; # Same as above using /r
1998 $foo = $bar =~ s/this/that/r
1999 =~ s/that/the other/r; # Chained substitutes
2000 # using /r
2001 @foo = map { s/this/that/r } @bar # /r is very useful in
2002 # maps
2003
2004 $count = ($paragraph =~ s/Mister\b/Mr./g); # get change-cnt
2005
2006 $_ = 'abc123xyz';
2007 s/\d+/$&*2/e; # yields 'abc246xyz'
2008 s/\d+/sprintf("%5d",$&)/e; # yields 'abc 246xyz'
2009 s/\w/$& x 2/eg; # yields 'aabbcc 224466xxyyzz'
2010
2011 s/%(.)/$percent{$1}/g; # change percent escapes; no /e
2012 s/%(.)/$percent{$1} || $&/ge; # expr now, so /e
2013 s/^=(\w+)/pod($1)/ge; # use function call
2014
2015 $_ = 'abc123xyz';
2016 $x = s/abc/def/r; # $x is 'def123xyz' and
2017 # $_ remains 'abc123xyz'.
2018
2019 # expand variables in $_, but dynamics only, using
2020 # symbolic dereferencing
2021 s/\$(\w+)/${$1}/g;
2022
2023 # Add one to the value of any numbers in the string
2024 s/(\d+)/1 + $1/eg;
2025
2026 # Titlecase words in the last 30 characters only
2027 substr($str, -30) =~ s/\b(\p{Alpha}+)\b/\u\L$1/g;
2028
2029 # This will expand any embedded scalar variable
2030 # (including lexicals) in $_ : First $1 is interpolated
2031 # to the variable name, and then evaluated
2032 s/(\$\w+)/$1/eeg;
2033
2034 # Delete (most) C comments.
2035 $program =~ s {
2036 /\* # Match the opening delimiter.
2037 .*? # Match a minimal number of characters.
2038 \*/ # Match the closing delimiter.
2039 } []gsx;
2040
2041 s/^\s*(.*?)\s*$/$1/; # trim whitespace in $_,
2042 # expensively
2043
2044 for ($variable) { # trim whitespace in $variable,
2045 # cheap
2046 s/^\s+//;
2047 s/\s+$//;
2048 }
2049
2050 s/([^ ]*) *([^ ]*)/$2 $1/; # reverse 1st two fields
2051
2052 $foo !~ s/A/a/g; # Lowercase all A's in $foo; return
2053 # 0 if any were found and changed;
2054 # otherwise return 1
2055
2056 Note the use of "$" instead of "\" in the last example. Unlike
2057 sed, we use the \<digit> form only in the left hand side.
2058 Anywhere else it's $<digit>.
2059
2060 Occasionally, you can't use just a "/g" to get all the changes
2061 to occur that you might want. Here are two common cases:
2062
2063 # put commas in the right places in an integer
2064 1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g;
2065
2066 # expand tabs to 8-column spacing
2067 1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
2068
2069 Quote-Like Operators
2070 "q/STRING/"
2071 'STRING'
2072 A single-quoted, literal string. A backslash represents a
2073 backslash unless followed by the delimiter or another backslash, in
2074 which case the delimiter or backslash is interpolated.
2075
2076 $foo = q!I said, "You said, 'She said it.'"!;
2077 $bar = q('This is it.');
2078 $baz = '\n'; # a two-character string
2079
2080 "qq/STRING/"
2081 "STRING"
2082 A double-quoted, interpolated string.
2083
2084 $_ .= qq
2085 (*** The previous line contains the naughty word "$1".\n)
2086 if /\b(tcl|java|python)\b/i; # :-)
2087 $baz = "\n"; # a one-character string
2088
2089 "qx/STRING/"
2090 "`STRING`"
2091 A string which is (possibly) interpolated and then executed as a
2092 system command with /bin/sh or its equivalent. Shell wildcards,
2093 pipes, and redirections will be honored. The collected standard
2094 output of the command is returned; standard error is unaffected.
2095 In scalar context, it comes back as a single (potentially multi-
2096 line) string, or "undef" if the command failed. In list context,
2097 returns a list of lines (however you've defined lines with $/ or
2098 $INPUT_RECORD_SEPARATOR), or an empty list if the command failed.
2099
2100 Because backticks do not affect standard error, use shell file
2101 descriptor syntax (assuming the shell supports this) if you care to
2102 address this. To capture a command's STDERR and STDOUT together:
2103
2104 $output = `cmd 2>&1`;
2105
2106 To capture a command's STDOUT but discard its STDERR:
2107
2108 $output = `cmd 2>/dev/null`;
2109
2110 To capture a command's STDERR but discard its STDOUT (ordering is
2111 important here):
2112
2113 $output = `cmd 2>&1 1>/dev/null`;
2114
2115 To exchange a command's STDOUT and STDERR in order to capture the
2116 STDERR but leave its STDOUT to come out the old STDERR:
2117
2118 $output = `cmd 3>&1 1>&2 2>&3 3>&-`;
2119
2120 To read both a command's STDOUT and its STDERR separately, it's
2121 easiest to redirect them separately to files, and then read from
2122 those files when the program is done:
2123
2124 system("program args 1>program.stdout 2>program.stderr");
2125
2126 The STDIN filehandle used by the command is inherited from Perl's
2127 STDIN. For example:
2128
2129 open(SPLAT, "stuff") || die "can't open stuff: $!";
2130 open(STDIN, "<&SPLAT") || die "can't dupe SPLAT: $!";
2131 print STDOUT `sort`;
2132
2133 will print the sorted contents of the file named "stuff".
2134
2135 Using single-quote as a delimiter protects the command from Perl's
2136 double-quote interpolation, passing it on to the shell instead:
2137
2138 $perl_info = qx(ps $$); # that's Perl's $$
2139 $shell_info = qx'ps $$'; # that's the new shell's $$
2140
2141 How that string gets evaluated is entirely subject to the command
2142 interpreter on your system. On most platforms, you will have to
2143 protect shell metacharacters if you want them treated literally.
2144 This is in practice difficult to do, as it's unclear how to escape
2145 which characters. See perlsec for a clean and safe example of a
2146 manual "fork()" and "exec()" to emulate backticks safely.
2147
2148 On some platforms (notably DOS-like ones), the shell may not be
2149 capable of dealing with multiline commands, so putting newlines in
2150 the string may not get you what you want. You may be able to
2151 evaluate multiple commands in a single line by separating them with
2152 the command separator character, if your shell supports that (for
2153 example, ";" on many Unix shells and "&" on the Windows NT "cmd"
2154 shell).
2155
2156 Perl will attempt to flush all files opened for output before
2157 starting the child process, but this may not be supported on some
2158 platforms (see perlport). To be safe, you may need to set $|
2159 ($AUTOFLUSH in "English") or call the "autoflush()" method of
2160 "IO::Handle" on any open handles.
2161
2162 Beware that some command shells may place restrictions on the
2163 length of the command line. You must ensure your strings don't
2164 exceed this limit after any necessary interpolations. See the
2165 platform-specific release notes for more details about your
2166 particular environment.
2167
2168 Using this operator can lead to programs that are difficult to
2169 port, because the shell commands called vary between systems, and
2170 may in fact not be present at all. As one example, the "type"
2171 command under the POSIX shell is very different from the "type"
2172 command under DOS. That doesn't mean you should go out of your way
2173 to avoid backticks when they're the right way to get something
2174 done. Perl was made to be a glue language, and one of the things
2175 it glues together is commands. Just understand what you're getting
2176 yourself into.
2177
2178 Like "system", backticks put the child process exit code in $?. If
2179 you'd like to manually inspect failure, you can check all possible
2180 failure modes by inspecting $? like this:
2181
2182 if ($? == -1) {
2183 print "failed to execute: $!\n";
2184 }
2185 elsif ($? & 127) {
2186 printf "child died with signal %d, %s coredump\n",
2187 ($? & 127), ($? & 128) ? 'with' : 'without';
2188 }
2189 else {
2190 printf "child exited with value %d\n", $? >> 8;
2191 }
2192
2193 Use the open pragma to control the I/O layers used when reading the
2194 output of the command, for example:
2195
2196 use open IN => ":encoding(UTF-8)";
2197 my $x = `cmd-producing-utf-8`;
2198
2199 "qx//" can also be called like a function with "readpipe" in
2200 perlfunc.
2201
2202 See "I/O Operators" for more discussion.
2203
2204 "qw/STRING/"
2205 Evaluates to a list of the words extracted out of STRING, using
2206 embedded whitespace as the word delimiters. It can be understood
2207 as being roughly equivalent to:
2208
2209 split(" ", q/STRING/);
2210
2211 the differences being that it only splits on ASCII whitespace,
2212 generates a real list at compile time, and in scalar context it
2213 returns the last element in the list. So this expression:
2214
2215 qw(foo bar baz)
2216
2217 is semantically equivalent to the list:
2218
2219 "foo", "bar", "baz"
2220
2221 Some frequently seen examples:
2222
2223 use POSIX qw( setlocale localeconv )
2224 @EXPORT = qw( foo bar baz );
2225
2226 A common mistake is to try to separate the words with commas or to
2227 put comments into a multi-line "qw"-string. For this reason, the
2228 "use warnings" pragma and the -w switch (that is, the $^W variable)
2229 produces warnings if the STRING contains the "," or the "#"
2230 character.
2231
2232 "tr/SEARCHLIST/REPLACEMENTLIST/cdsr"
2233 "y/SEARCHLIST/REPLACEMENTLIST/cdsr"
2234 Transliterates all occurrences of the characters found (or not
2235 found if the "/c" modifier is specified) in the search list with
2236 the positionally corresponding character in the replacement list,
2237 possibly deleting some, depending on the modifiers specified. It
2238 returns the number of characters replaced or deleted. If no string
2239 is specified via the "=~" or "!~" operator, the $_ string is
2240 transliterated.
2241
2242 For sed devotees, "y" is provided as a synonym for "tr".
2243
2244 If the "/r" (non-destructive) option is present, a new copy of the
2245 string is made and its characters transliterated, and this copy is
2246 returned no matter whether it was modified or not: the original
2247 string is always left unchanged. The new copy is always a plain
2248 string, even if the input string is an object or a tied variable.
2249
2250 Unless the "/r" option is used, the string specified with "=~" must
2251 be a scalar variable, an array element, a hash element, or an
2252 assignment to one of those; in other words, an lvalue.
2253
2254 If the characters delimiting SEARCHLIST and REPLACEMENTLIST are
2255 single quotes ("tr'SEARCHLIST'REPLACEMENTLIST'"), the only
2256 interpolation is removal of "\" from pairs of "\\".
2257
2258 Otherwise, a character range may be specified with a hyphen, so
2259 "tr/A-J/0-9/" does the same replacement as
2260 "tr/ACEGIBDFHJ/0246813579/".
2261
2262 If the SEARCHLIST is delimited by bracketing quotes, the
2263 REPLACEMENTLIST must have its own pair of quotes, which may or may
2264 not be bracketing quotes; for example, "tr[aeiouy][yuoiea]" or
2265 "tr(+\-*/)/ABCD/".
2266
2267 Characters may be literals, or (if the delimiters aren't single
2268 quotes) any of the escape sequences accepted in double-quoted
2269 strings. But there is never any variable interpolation, so "$" and
2270 "@" are always treated as literals. A hyphen at the beginning or
2271 end, or preceded by a backslash is also always considered a
2272 literal. Escape sequence details are in the table near the
2273 beginning of this section.
2274
2275 Note that "tr" does not do regular expression character classes
2276 such as "\d" or "\pL". The "tr" operator is not equivalent to the
2277 tr(1) utility. "tr[a-z][A-Z]" will uppercase the 26 letters "a"
2278 through "z", but for case changing not confined to ASCII, use "lc",
2279 "uc", "lcfirst", "ucfirst" (all documented in perlfunc), or the
2280 substitution operator "s/PATTERN/REPLACEMENT/" (with "\U", "\u",
2281 "\L", and "\l" string-interpolation escapes in the REPLACEMENT
2282 portion).
2283
2284 Most ranges are unportable between character sets, but certain ones
2285 signal Perl to do special handling to make them portable. There
2286 are two classes of portable ranges. The first are any subsets of
2287 the ranges "A-Z", "a-z", and "0-9", when expressed as literal
2288 characters.
2289
2290 tr/h-k/H-K/
2291
2292 capitalizes the letters "h", "i", "j", and "k" and nothing else, no
2293 matter what the platform's character set is. In contrast, all of
2294
2295 tr/\x68-\x6B/\x48-\x4B/
2296 tr/h-\x6B/H-\x4B/
2297 tr/\x68-k/\x48-K/
2298
2299 do the same capitalizations as the previous example when run on
2300 ASCII platforms, but something completely different on EBCDIC ones.
2301
2302 The second class of portable ranges is invoked when one or both of
2303 the range's end points are expressed as "\N{...}"
2304
2305 $string =~ tr/\N{U+20}-\N{U+7E}//d;
2306
2307 removes from $string all the platform's characters which are
2308 equivalent to any of Unicode U+0020, U+0021, ... U+007D, U+007E.
2309 This is a portable range, and has the same effect on every platform
2310 it is run on. In this example, these are the ASCII printable
2311 characters. So after this is run, $string has only controls and
2312 characters which have no ASCII equivalents.
2313
2314 But, even for portable ranges, it is not generally obvious what is
2315 included without having to look things up in the manual. A sound
2316 principle is to use only ranges that both begin from, and end at,
2317 either ASCII alphabetics of equal case ("b-e", "B-E"), or digits
2318 ("1-4"). Anything else is unclear (and unportable unless "\N{...}"
2319 is used). If in doubt, spell out the character sets in full.
2320
2321 Options:
2322
2323 c Complement the SEARCHLIST.
2324 d Delete found but unreplaced characters.
2325 r Return the modified string and leave the original string
2326 untouched.
2327 s Squash duplicate replaced characters.
2328
2329 If the "/d" modifier is specified, any characters specified by
2330 SEARCHLIST not found in REPLACEMENTLIST are deleted. (Note that
2331 this is slightly more flexible than the behavior of some tr
2332 programs, which delete anything they find in the SEARCHLIST,
2333 period.)
2334
2335 If the "/s" modifier is specified, sequences of characters, all in
2336 a row, that were transliterated to the same character are squashed
2337 down to a single instance of that character.
2338
2339 my $a = "aaaba"
2340 $a =~ tr/a/a/s # $a now is "aba"
2341
2342 If the "/d" modifier is used, the REPLACEMENTLIST is always
2343 interpreted exactly as specified. Otherwise, if the
2344 REPLACEMENTLIST is shorter than the SEARCHLIST, the final
2345 character, if any, is replicated until it is long enough. There
2346 won't be a final character if and only if the REPLACEMENTLIST is
2347 empty, in which case REPLACEMENTLIST is copied from SEARCHLIST.
2348 An empty REPLACEMENTLIST is useful for counting characters in a
2349 class, or for squashing character sequences in a class.
2350
2351 tr/abcd// tr/abcd/abcd/
2352 tr/abcd/AB/ tr/abcd/ABBB/
2353 tr/abcd//d s/[abcd]//g
2354 tr/abcd/AB/d (tr/ab/AB/ + s/[cd]//g) - but run together
2355
2356 If the "/c" modifier is specified, the characters to be
2357 transliterated are the ones NOT in SEARCHLIST, that is, it is
2358 complemented. If "/d" and/or "/s" are also specified, they apply
2359 to the complemented SEARCHLIST. Recall, that if REPLACEMENTLIST is
2360 empty (except under "/d") a copy of SEARCHLIST is used instead.
2361 That copy is made after complementing under "/c". SEARCHLIST is
2362 sorted by code point order after complementing, and any
2363 REPLACEMENTLIST is applied to that sorted result. This means that
2364 under "/c", the order of the characters specified in SEARCHLIST is
2365 irrelevant. This can lead to different results on EBCDIC systems
2366 if REPLACEMENTLIST contains more than one character, hence it is
2367 generally non-portable to use "/c" with such a REPLACEMENTLIST.
2368
2369 Another way of describing the operation is this: If "/c" is
2370 specified, the SEARCHLIST is sorted by code point order, then
2371 complemented. If REPLACEMENTLIST is empty and "/d" is not
2372 specified, REPLACEMENTLIST is replaced by a copy of SEARCHLIST (as
2373 modified under "/c"), and these potentially modified lists are used
2374 as the basis for what follows. Any character in the target string
2375 that isn't in SEARCHLIST is passed through unchanged. Every other
2376 character in the target string is replaced by the character in
2377 REPLACEMENTLIST that positionally corresponds to its mate in
2378 SEARCHLIST, except that under "/s", the 2nd and following
2379 characters are squeezed out in a sequence of characters in a row
2380 that all translate to the same character. If SEARCHLIST is longer
2381 than REPLACEMENTLIST, characters in the target string that match a
2382 character in SEARCHLIST that doesn't have a correspondence in
2383 REPLACEMENTLIST are either deleted from the target string if "/d"
2384 is specified; or replaced by the final character in REPLACEMENTLIST
2385 if "/d" isn't specified.
2386
2387 Some examples:
2388
2389 $ARGV[1] =~ tr/A-Z/a-z/; # canonicalize to lower case ASCII
2390
2391 $cnt = tr/*/*/; # count the stars in $_
2392 $cnt = tr/*//; # same thing
2393
2394 $cnt = $sky =~ tr/*/*/; # count the stars in $sky
2395 $cnt = $sky =~ tr/*//; # same thing
2396
2397 $cnt = $sky =~ tr/*//c; # count all the non-stars in $sky
2398 $cnt = $sky =~ tr/*/*/c; # same, but transliterate each non-star
2399 # into a star, leaving the already-stars
2400 # alone. Afterwards, everything in $sky
2401 # is a star.
2402
2403 $cnt = tr/0-9//; # count the ASCII digits in $_
2404
2405 tr/a-zA-Z//s; # bookkeeper -> bokeper
2406 tr/o/o/s; # bookkeeper -> bokkeeper
2407 tr/oe/oe/s; # bookkeeper -> bokkeper
2408 tr/oe//s; # bookkeeper -> bokkeper
2409 tr/oe/o/s; # bookkeeper -> bokkopor
2410
2411 ($HOST = $host) =~ tr/a-z/A-Z/;
2412 $HOST = $host =~ tr/a-z/A-Z/r; # same thing
2413
2414 $HOST = $host =~ tr/a-z/A-Z/r # chained with s///r
2415 =~ s/:/ -p/r;
2416
2417 tr/a-zA-Z/ /cs; # change non-alphas to single space
2418
2419 @stripped = map tr/a-zA-Z/ /csr, @original;
2420 # /r with map
2421
2422 tr [\200-\377]
2423 [\000-\177]; # wickedly delete 8th bit
2424
2425 $foo !~ tr/A/a/ # transliterate all the A's in $foo to 'a',
2426 # return 0 if any were found and changed.
2427 # Otherwise return 1
2428
2429 If multiple transliterations are given for a character, only the
2430 first one is used:
2431
2432 tr/AAA/XYZ/
2433
2434 will transliterate any A to X.
2435
2436 Because the transliteration table is built at compile time, neither
2437 the SEARCHLIST nor the REPLACEMENTLIST are subjected to double
2438 quote interpolation. That means that if you want to use variables,
2439 you must use an "eval()":
2440
2441 eval "tr/$oldlist/$newlist/";
2442 die $@ if $@;
2443
2444 eval "tr/$oldlist/$newlist/, 1" or die $@;
2445
2446 "<<EOF"
2447 A line-oriented form of quoting is based on the shell "here-
2448 document" syntax. Following a "<<" you specify a string to
2449 terminate the quoted material, and all lines following the current
2450 line down to the terminating string are the value of the item.
2451
2452 Prefixing the terminating string with a "~" specifies that you want
2453 to use "Indented Here-docs" (see below).
2454
2455 The terminating string may be either an identifier (a word), or
2456 some quoted text. An unquoted identifier works like double quotes.
2457 There may not be a space between the "<<" and the identifier,
2458 unless the identifier is explicitly quoted. The terminating string
2459 must appear by itself (unquoted and with no surrounding whitespace)
2460 on the terminating line.
2461
2462 If the terminating string is quoted, the type of quotes used
2463 determine the treatment of the text.
2464
2465 Double Quotes
2466 Double quotes indicate that the text will be interpolated using
2467 exactly the same rules as normal double quoted strings.
2468
2469 print <<EOF;
2470 The price is $Price.
2471 EOF
2472
2473 print << "EOF"; # same as above
2474 The price is $Price.
2475 EOF
2476
2477 Single Quotes
2478 Single quotes indicate the text is to be treated literally with
2479 no interpolation of its content. This is similar to single
2480 quoted strings except that backslashes have no special meaning,
2481 with "\\" being treated as two backslashes and not one as they
2482 would in every other quoting construct.
2483
2484 Just as in the shell, a backslashed bareword following the "<<"
2485 means the same thing as a single-quoted string does:
2486
2487 $cost = <<'VISTA'; # hasta la ...
2488 That'll be $10 please, ma'am.
2489 VISTA
2490
2491 $cost = <<\VISTA; # Same thing!
2492 That'll be $10 please, ma'am.
2493 VISTA
2494
2495 This is the only form of quoting in perl where there is no need
2496 to worry about escaping content, something that code generators
2497 can and do make good use of.
2498
2499 Backticks
2500 The content of the here doc is treated just as it would be if
2501 the string were embedded in backticks. Thus the content is
2502 interpolated as though it were double quoted and then executed
2503 via the shell, with the results of the execution returned.
2504
2505 print << `EOC`; # execute command and get results
2506 echo hi there
2507 EOC
2508
2509 Indented Here-docs
2510 The here-doc modifier "~" allows you to indent your here-docs
2511 to make the code more readable:
2512
2513 if ($some_var) {
2514 print <<~EOF;
2515 This is a here-doc
2516 EOF
2517 }
2518
2519 This will print...
2520
2521 This is a here-doc
2522
2523 ...with no leading whitespace.
2524
2525 The delimiter is used to determine the exact whitespace to
2526 remove from the beginning of each line. All lines must have at
2527 least the same starting whitespace (except lines only
2528 containing a newline) or perl will croak. Tabs and spaces can
2529 be mixed, but are matched exactly. One tab will not be equal
2530 to 8 spaces!
2531
2532 Additional beginning whitespace (beyond what preceded the
2533 delimiter) will be preserved:
2534
2535 print <<~EOF;
2536 This text is not indented
2537 This text is indented with two spaces
2538 This text is indented with two tabs
2539 EOF
2540
2541 Finally, the modifier may be used with all of the forms
2542 mentioned above:
2543
2544 <<~\EOF;
2545 <<~'EOF'
2546 <<~"EOF"
2547 <<~`EOF`
2548
2549 And whitespace may be used between the "~" and quoted
2550 delimiters:
2551
2552 <<~ 'EOF'; # ... "EOF", `EOF`
2553
2554 It is possible to stack multiple here-docs in a row:
2555
2556 print <<"foo", <<"bar"; # you can stack them
2557 I said foo.
2558 foo
2559 I said bar.
2560 bar
2561
2562 myfunc(<< "THIS", 23, <<'THAT');
2563 Here's a line
2564 or two.
2565 THIS
2566 and here's another.
2567 THAT
2568
2569 Just don't forget that you have to put a semicolon on the end to
2570 finish the statement, as Perl doesn't know you're not going to try
2571 to do this:
2572
2573 print <<ABC
2574 179231
2575 ABC
2576 + 20;
2577
2578 If you want to remove the line terminator from your here-docs, use
2579 "chomp()".
2580
2581 chomp($string = <<'END');
2582 This is a string.
2583 END
2584
2585 If you want your here-docs to be indented with the rest of the
2586 code, use the "<<~FOO" construct described under "Indented Here-
2587 docs":
2588
2589 $quote = <<~'FINIS';
2590 The Road goes ever on and on,
2591 down from the door where it began.
2592 FINIS
2593
2594 If you use a here-doc within a delimited construct, such as in
2595 "s///eg", the quoted material must still come on the line following
2596 the "<<FOO" marker, which means it may be inside the delimited
2597 construct:
2598
2599 s/this/<<E . 'that'
2600 the other
2601 E
2602 . 'more '/eg;
2603
2604 It works this way as of Perl 5.18. Historically, it was
2605 inconsistent, and you would have to write
2606
2607 s/this/<<E . 'that'
2608 . 'more '/eg;
2609 the other
2610 E
2611
2612 outside of string evals.
2613
2614 Additionally, quoting rules for the end-of-string identifier are
2615 unrelated to Perl's quoting rules. "q()", "qq()", and the like are
2616 not supported in place of '' and "", and the only interpolation is
2617 for backslashing the quoting character:
2618
2619 print << "abc\"def";
2620 testing...
2621 abc"def
2622
2623 Finally, quoted strings cannot span multiple lines. The general
2624 rule is that the identifier must be a string literal. Stick with
2625 that, and you should be safe.
2626
2627 Gory details of parsing quoted constructs
2628 When presented with something that might have several different
2629 interpretations, Perl uses the DWIM (that's "Do What I Mean") principle
2630 to pick the most probable interpretation. This strategy is so
2631 successful that Perl programmers often do not suspect the ambivalence
2632 of what they write. But from time to time, Perl's notions differ
2633 substantially from what the author honestly meant.
2634
2635 This section hopes to clarify how Perl handles quoted constructs.
2636 Although the most common reason to learn this is to unravel
2637 labyrinthine regular expressions, because the initial steps of parsing
2638 are the same for all quoting operators, they are all discussed
2639 together.
2640
2641 The most important Perl parsing rule is the first one discussed below:
2642 when processing a quoted construct, Perl first finds the end of that
2643 construct, then interprets its contents. If you understand this rule,
2644 you may skip the rest of this section on the first reading. The other
2645 rules are likely to contradict the user's expectations much less
2646 frequently than this first one.
2647
2648 Some passes discussed below are performed concurrently, but because
2649 their results are the same, we consider them individually. For
2650 different quoting constructs, Perl performs different numbers of
2651 passes, from one to four, but these passes are always performed in the
2652 same order.
2653
2654 Finding the end
2655 The first pass is finding the end of the quoted construct. This
2656 results in saving to a safe location a copy of the text (between
2657 the starting and ending delimiters), normalized as necessary to
2658 avoid needing to know what the original delimiters were.
2659
2660 If the construct is a here-doc, the ending delimiter is a line that
2661 has a terminating string as the content. Therefore "<<EOF" is
2662 terminated by "EOF" immediately followed by "\n" and starting from
2663 the first column of the terminating line. When searching for the
2664 terminating line of a here-doc, nothing is skipped. In other
2665 words, lines after the here-doc syntax are compared with the
2666 terminating string line by line.
2667
2668 For the constructs except here-docs, single characters are used as
2669 starting and ending delimiters. If the starting delimiter is an
2670 opening punctuation (that is "(", "[", "{", or "<"), the ending
2671 delimiter is the corresponding closing punctuation (that is ")",
2672 "]", "}", or ">"). If the starting delimiter is an unpaired
2673 character like "/" or a closing punctuation, the ending delimiter
2674 is the same as the starting delimiter. Therefore a "/" terminates
2675 a "qq//" construct, while a "]" terminates both "qq[]" and "qq]]"
2676 constructs.
2677
2678 When searching for single-character delimiters, escaped delimiters
2679 and "\\" are skipped. For example, while searching for terminating
2680 "/", combinations of "\\" and "\/" are skipped. If the delimiters
2681 are bracketing, nested pairs are also skipped. For example, while
2682 searching for a closing "]" paired with the opening "[",
2683 combinations of "\\", "\]", and "\[" are all skipped, and nested
2684 "[" and "]" are skipped as well. However, when backslashes are
2685 used as the delimiters (like "qq\\" and "tr\\\"), nothing is
2686 skipped. During the search for the end, backslashes that escape
2687 delimiters or other backslashes are removed (exactly speaking, they
2688 are not copied to the safe location).
2689
2690 For constructs with three-part delimiters ("s///", "y///", and
2691 "tr///"), the search is repeated once more. If the first delimiter
2692 is not an opening punctuation, the three delimiters must be the
2693 same, such as "s!!!" and "tr)))", in which case the second
2694 delimiter terminates the left part and starts the right part at
2695 once. If the left part is delimited by bracketing punctuation
2696 (that is "()", "[]", "{}", or "<>"), the right part needs another
2697 pair of delimiters such as "s(){}" and "tr[]//". In these cases,
2698 whitespace and comments are allowed between the two parts, although
2699 the comment must follow at least one whitespace character;
2700 otherwise a character expected as the start of the comment may be
2701 regarded as the starting delimiter of the right part.
2702
2703 During this search no attention is paid to the semantics of the
2704 construct. Thus:
2705
2706 "$hash{"$foo/$bar"}"
2707
2708 or:
2709
2710 m/
2711 bar # NOT a comment, this slash / terminated m//!
2712 /x
2713
2714 do not form legal quoted expressions. The quoted part ends on the
2715 first """ and "/", and the rest happens to be a syntax error.
2716 Because the slash that terminated "m//" was followed by a "SPACE",
2717 the example above is not "m//x", but rather "m//" with no "/x"
2718 modifier. So the embedded "#" is interpreted as a literal "#".
2719
2720 Also no attention is paid to "\c\" (multichar control char syntax)
2721 during this search. Thus the second "\" in "qq/\c\/" is
2722 interpreted as a part of "\/", and the following "/" is not
2723 recognized as a delimiter. Instead, use "\034" or "\x1c" at the
2724 end of quoted constructs.
2725
2726 Interpolation
2727 The next step is interpolation in the text obtained, which is now
2728 delimiter-independent. There are multiple cases.
2729
2730 "<<'EOF'"
2731 No interpolation is performed. Note that the combination "\\"
2732 is left intact, since escaped delimiters are not available for
2733 here-docs.
2734
2735 "m''", the pattern of "s'''"
2736 No interpolation is performed at this stage. Any backslashed
2737 sequences including "\\" are treated at the stage to "parsing
2738 regular expressions".
2739
2740 '', "q//", "tr'''", "y'''", the replacement of "s'''"
2741 The only interpolation is removal of "\" from pairs of "\\".
2742 Therefore "-" in "tr'''" and "y'''" is treated literally as a
2743 hyphen and no character range is available. "\1" in the
2744 replacement of "s'''" does not work as $1.
2745
2746 "tr///", "y///"
2747 No variable interpolation occurs. String modifying
2748 combinations for case and quoting such as "\Q", "\U", and "\E"
2749 are not recognized. The other escape sequences such as "\200"
2750 and "\t" and backslashed characters such as "\\" and "\-" are
2751 converted to appropriate literals. The character "-" is
2752 treated specially and therefore "\-" is treated as a literal
2753 "-".
2754
2755 "", "``", "qq//", "qx//", "<file*glob>", "<<"EOF""
2756 "\Q", "\U", "\u", "\L", "\l", "\F" (possibly paired with "\E")
2757 are converted to corresponding Perl constructs. Thus,
2758 "$foo\Qbaz$bar" is converted to
2759 "$foo . (quotemeta("baz" . $bar))" internally. The other
2760 escape sequences such as "\200" and "\t" and backslashed
2761 characters such as "\\" and "\-" are replaced with appropriate
2762 expansions.
2763
2764 Let it be stressed that whatever falls between "\Q" and "\E" is
2765 interpolated in the usual way. Something like "\Q\\E" has no
2766 "\E" inside. Instead, it has "\Q", "\\", and "E", so the
2767 result is the same as for "\\\\E". As a general rule,
2768 backslashes between "\Q" and "\E" may lead to counterintuitive
2769 results. So, "\Q\t\E" is converted to "quotemeta("\t")", which
2770 is the same as "\\\t" (since TAB is not alphanumeric). Note
2771 also that:
2772
2773 $str = '\t';
2774 return "\Q$str";
2775
2776 may be closer to the conjectural intention of the writer of
2777 "\Q\t\E".
2778
2779 Interpolated scalars and arrays are converted internally to the
2780 "join" and "." catenation operations. Thus, "$foo XXX '@arr'"
2781 becomes:
2782
2783 $foo . " XXX '" . (join $", @arr) . "'";
2784
2785 All operations above are performed simultaneously, left to
2786 right.
2787
2788 Because the result of "\Q STRING \E" has all metacharacters
2789 quoted, there is no way to insert a literal "$" or "@" inside a
2790 "\Q\E" pair. If protected by "\", "$" will be quoted to become
2791 "\\\$"; if not, it is interpreted as the start of an
2792 interpolated scalar.
2793
2794 Note also that the interpolation code needs to make a decision
2795 on where the interpolated scalar ends. For instance, whether
2796 "a $x -> {c}" really means:
2797
2798 "a " . $x . " -> {c}";
2799
2800 or:
2801
2802 "a " . $x -> {c};
2803
2804 Most of the time, the longest possible text that does not
2805 include spaces between components and which contains matching
2806 braces or brackets. because the outcome may be determined by
2807 voting based on heuristic estimators, the result is not
2808 strictly predictable. Fortunately, it's usually correct for
2809 ambiguous cases.
2810
2811 the replacement of "s///"
2812 Processing of "\Q", "\U", "\u", "\L", "\l", "\F" and
2813 interpolation happens as with "qq//" constructs.
2814
2815 It is at this step that "\1" is begrudgingly converted to $1 in
2816 the replacement text of "s///", in order to correct the
2817 incorrigible sed hackers who haven't picked up the saner idiom
2818 yet. A warning is emitted if the "use warnings" pragma or the
2819 -w command-line flag (that is, the $^W variable) was set.
2820
2821 "RE" in "m?RE?", "/RE/", "m/RE/", "s/RE/foo/",
2822 Processing of "\Q", "\U", "\u", "\L", "\l", "\F", "\E", and
2823 interpolation happens (almost) as with "qq//" constructs.
2824
2825 Processing of "\N{...}" is also done here, and compiled into an
2826 intermediate form for the regex compiler. (This is because, as
2827 mentioned below, the regex compilation may be done at execution
2828 time, and "\N{...}" is a compile-time construct.)
2829
2830 However any other combinations of "\" followed by a character
2831 are not substituted but only skipped, in order to parse them as
2832 regular expressions at the following step. As "\c" is skipped
2833 at this step, "@" of "\c@" in RE is possibly treated as an
2834 array symbol (for example @foo), even though the same text in
2835 "qq//" gives interpolation of "\c@".
2836
2837 Code blocks such as "(?{BLOCK})" are handled by temporarily
2838 passing control back to the perl parser, in a similar way that
2839 an interpolated array subscript expression such as
2840 "foo$array[1+f("[xyz")]bar" would be.
2841
2842 Moreover, inside "(?{BLOCK})", "(?# comment )", and a
2843 "#"-comment in a "/x"-regular expression, no processing is
2844 performed whatsoever. This is the first step at which the
2845 presence of the "/x" modifier is relevant.
2846
2847 Interpolation in patterns has several quirks: $|, $(, $), "@+"
2848 and "@-" are not interpolated, and constructs $var[SOMETHING]
2849 are voted (by several different estimators) to be either an
2850 array element or $var followed by an RE alternative. This is
2851 where the notation "${arr[$bar]}" comes handy: "/${arr[0-9]}/"
2852 is interpreted as array element "-9", not as a regular
2853 expression from the variable $arr followed by a digit, which
2854 would be the interpretation of "/$arr[0-9]/". Since voting
2855 among different estimators may occur, the result is not
2856 predictable.
2857
2858 The lack of processing of "\\" creates specific restrictions on
2859 the post-processed text. If the delimiter is "/", one cannot
2860 get the combination "\/" into the result of this step. "/"
2861 will finish the regular expression, "\/" will be stripped to
2862 "/" on the previous step, and "\\/" will be left as is.
2863 Because "/" is equivalent to "\/" inside a regular expression,
2864 this does not matter unless the delimiter happens to be
2865 character special to the RE engine, such as in "s*foo*bar*",
2866 "m[foo]", or "m?foo?"; or an alphanumeric char, as in:
2867
2868 m m ^ a \s* b mmx;
2869
2870 In the RE above, which is intentionally obfuscated for
2871 illustration, the delimiter is "m", the modifier is "mx", and
2872 after delimiter-removal the RE is the same as for
2873 "m/ ^ a \s* b /mx". There's more than one reason you're
2874 encouraged to restrict your delimiters to non-alphanumeric,
2875 non-whitespace choices.
2876
2877 This step is the last one for all constructs except regular
2878 expressions, which are processed further.
2879
2880 parsing regular expressions
2881 Previous steps were performed during the compilation of Perl code,
2882 but this one happens at run time, although it may be optimized to
2883 be calculated at compile time if appropriate. After preprocessing
2884 described above, and possibly after evaluation if concatenation,
2885 joining, casing translation, or metaquoting are involved, the
2886 resulting string is passed to the RE engine for compilation.
2887
2888 Whatever happens in the RE engine might be better discussed in
2889 perlre, but for the sake of continuity, we shall do so here.
2890
2891 This is another step where the presence of the "/x" modifier is
2892 relevant. The RE engine scans the string from left to right and
2893 converts it into a finite automaton.
2894
2895 Backslashed characters are either replaced with corresponding
2896 literal strings (as with "\{"), or else they generate special nodes
2897 in the finite automaton (as with "\b"). Characters special to the
2898 RE engine (such as "|") generate corresponding nodes or groups of
2899 nodes. "(?#...)" comments are ignored. All the rest is either
2900 converted to literal strings to match, or else is ignored (as is
2901 whitespace and "#"-style comments if "/x" is present).
2902
2903 Parsing of the bracketed character class construct, "[...]", is
2904 rather different than the rule used for the rest of the pattern.
2905 The terminator of this construct is found using the same rules as
2906 for finding the terminator of a "{}"-delimited construct, the only
2907 exception being that "]" immediately following "[" is treated as
2908 though preceded by a backslash.
2909
2910 The terminator of runtime "(?{...})" is found by temporarily
2911 switching control to the perl parser, which should stop at the
2912 point where the logically balancing terminating "}" is found.
2913
2914 It is possible to inspect both the string given to RE engine and
2915 the resulting finite automaton. See the arguments
2916 "debug"/"debugcolor" in the "use re" pragma, as well as Perl's -Dr
2917 command-line switch documented in "Command Switches" in perlrun.
2918
2919 Optimization of regular expressions
2920 This step is listed for completeness only. Since it does not
2921 change semantics, details of this step are not documented and are
2922 subject to change without notice. This step is performed over the
2923 finite automaton that was generated during the previous pass.
2924
2925 It is at this stage that "split()" silently optimizes "/^/" to mean
2926 "/^/m".
2927
2928 I/O Operators
2929 There are several I/O operators you should know about.
2930
2931 A string enclosed by backticks (grave accents) first undergoes double-
2932 quote interpolation. It is then interpreted as an external command,
2933 and the output of that command is the value of the backtick string,
2934 like in a shell. In scalar context, a single string consisting of all
2935 output is returned. In list context, a list of values is returned, one
2936 per line of output. (You can set $/ to use a different line
2937 terminator.) The command is executed each time the pseudo-literal is
2938 evaluated. The status value of the command is returned in $? (see
2939 perlvar for the interpretation of $?). Unlike in csh, no translation
2940 is done on the return data--newlines remain newlines. Unlike in any of
2941 the shells, single quotes do not hide variable names in the command
2942 from interpretation. To pass a literal dollar-sign through to the
2943 shell you need to hide it with a backslash. The generalized form of
2944 backticks is "qx//", or you can call the "readpipe" in perlfunc
2945 function. (Because backticks always undergo shell expansion as well,
2946 see perlsec for security concerns.)
2947
2948 In scalar context, evaluating a filehandle in angle brackets yields the
2949 next line from that file (the newline, if any, included), or "undef" at
2950 end-of-file or on error. When $/ is set to "undef" (sometimes known as
2951 file-slurp mode) and the file is empty, it returns '' the first time,
2952 followed by "undef" subsequently.
2953
2954 Ordinarily you must assign the returned value to a variable, but there
2955 is one situation where an automatic assignment happens. If and only if
2956 the input symbol is the only thing inside the conditional of a "while"
2957 statement (even if disguised as a "for(;;)" loop), the value is
2958 automatically assigned to the global variable $_, destroying whatever
2959 was there previously. (This may seem like an odd thing to you, but
2960 you'll use the construct in almost every Perl script you write.) The
2961 $_ variable is not implicitly localized. You'll have to put a
2962 "local $_;" before the loop if you want that to happen. Furthermore,
2963 if the input symbol or an explicit assignment of the input symbol to a
2964 scalar is used as a "while"/"for" condition, then the condition
2965 actually tests for definedness of the expression's value, not for its
2966 regular truth value.
2967
2968 Thus the following lines are equivalent:
2969
2970 while (defined($_ = <STDIN>)) { print; }
2971 while ($_ = <STDIN>) { print; }
2972 while (<STDIN>) { print; }
2973 for (;<STDIN>;) { print; }
2974 print while defined($_ = <STDIN>);
2975 print while ($_ = <STDIN>);
2976 print while <STDIN>;
2977
2978 This also behaves similarly, but assigns to a lexical variable instead
2979 of to $_:
2980
2981 while (my $line = <STDIN>) { print $line }
2982
2983 In these loop constructs, the assigned value (whether assignment is
2984 automatic or explicit) is then tested to see whether it is defined.
2985 The defined test avoids problems where the line has a string value that
2986 would be treated as false by Perl; for example a "" or a "0" with no
2987 trailing newline. If you really mean for such values to terminate the
2988 loop, they should be tested for explicitly:
2989
2990 while (($_ = <STDIN>) ne '0') { ... }
2991 while (<STDIN>) { last unless $_; ... }
2992
2993 In other boolean contexts, "<FILEHANDLE>" without an explicit "defined"
2994 test or comparison elicits a warning if the "use warnings" pragma or
2995 the -w command-line switch (the $^W variable) is in effect.
2996
2997 The filehandles STDIN, STDOUT, and STDERR are predefined. (The
2998 filehandles "stdin", "stdout", and "stderr" will also work except in
2999 packages, where they would be interpreted as local identifiers rather
3000 than global.) Additional filehandles may be created with the "open()"
3001 function, amongst others. See perlopentut and "open" in perlfunc for
3002 details on this.
3003
3004 If a "<FILEHANDLE>" is used in a context that is looking for a list, a
3005 list comprising all input lines is returned, one line per list element.
3006 It's easy to grow to a rather large data space this way, so use with
3007 care.
3008
3009 "<FILEHANDLE>" may also be spelled "readline(*FILEHANDLE)". See
3010 "readline" in perlfunc.
3011
3012 The null filehandle "<>" is special: it can be used to emulate the
3013 behavior of sed and awk, and any other Unix filter program that takes a
3014 list of filenames, doing the same to each line of input from all of
3015 them. Input from "<>" comes either from standard input, or from each
3016 file listed on the command line. Here's how it works: the first time
3017 "<>" is evaluated, the @ARGV array is checked, and if it is empty,
3018 $ARGV[0] is set to "-", which when opened gives you standard input.
3019 The @ARGV array is then processed as a list of filenames. The loop
3020
3021 while (<>) {
3022 ... # code for each line
3023 }
3024
3025 is equivalent to the following Perl-like pseudo code:
3026
3027 unshift(@ARGV, '-') unless @ARGV;
3028 while ($ARGV = shift) {
3029 open(ARGV, $ARGV);
3030 while (<ARGV>) {
3031 ... # code for each line
3032 }
3033 }
3034
3035 except that it isn't so cumbersome to say, and will actually work. It
3036 really does shift the @ARGV array and put the current filename into the
3037 $ARGV variable. It also uses filehandle ARGV internally. "<>" is just
3038 a synonym for "<ARGV>", which is magical. (The pseudo code above
3039 doesn't work because it treats "<ARGV>" as non-magical.)
3040
3041 Since the null filehandle uses the two argument form of "open" in
3042 perlfunc it interprets special characters, so if you have a script like
3043 this:
3044
3045 while (<>) {
3046 print;
3047 }
3048
3049 and call it with "perl dangerous.pl 'rm -rfv *|'", it actually opens a
3050 pipe, executes the "rm" command and reads "rm"'s output from that pipe.
3051 If you want all items in @ARGV to be interpreted as file names, you can
3052 use the module "ARGV::readonly" from CPAN, or use the double bracket:
3053
3054 while (<<>>) {
3055 print;
3056 }
3057
3058 Using double angle brackets inside of a while causes the open to use
3059 the three argument form (with the second argument being "<"), so all
3060 arguments in "ARGV" are treated as literal filenames (including "-").
3061 (Note that for convenience, if you use "<<>>" and if @ARGV is empty, it
3062 will still read from the standard input.)
3063
3064 You can modify @ARGV before the first "<>" as long as the array ends up
3065 containing the list of filenames you really want. Line numbers ($.)
3066 continue as though the input were one big happy file. See the example
3067 in "eof" in perlfunc for how to reset line numbers on each file.
3068
3069 If you want to set @ARGV to your own list of files, go right ahead.
3070 This sets @ARGV to all plain text files if no @ARGV was given:
3071
3072 @ARGV = grep { -f && -T } glob('*') unless @ARGV;
3073
3074 You can even set them to pipe commands. For example, this
3075 automatically filters compressed arguments through gzip:
3076
3077 @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc < $_ |" : $_ } @ARGV;
3078
3079 If you want to pass switches into your script, you can use one of the
3080 "Getopts" modules or put a loop on the front like this:
3081
3082 while ($_ = $ARGV[0], /^-/) {
3083 shift;
3084 last if /^--$/;
3085 if (/^-D(.*)/) { $debug = $1 }
3086 if (/^-v/) { $verbose++ }
3087 # ... # other switches
3088 }
3089
3090 while (<>) {
3091 # ... # code for each line
3092 }
3093
3094 The "<>" symbol will return "undef" for end-of-file only once. If you
3095 call it again after this, it will assume you are processing another
3096 @ARGV list, and if you haven't set @ARGV, will read input from STDIN.
3097
3098 If what the angle brackets contain is a simple scalar variable (for
3099 example, $foo), then that variable contains the name of the filehandle
3100 to input from, or its typeglob, or a reference to the same. For
3101 example:
3102
3103 $fh = \*STDIN;
3104 $line = <$fh>;
3105
3106 If what's within the angle brackets is neither a filehandle nor a
3107 simple scalar variable containing a filehandle name, typeglob, or
3108 typeglob reference, it is interpreted as a filename pattern to be
3109 globbed, and either a list of filenames or the next filename in the
3110 list is returned, depending on context. This distinction is determined
3111 on syntactic grounds alone. That means "<$x>" is always a "readline()"
3112 from an indirect handle, but "<$hash{key}>" is always a "glob()".
3113 That's because $x is a simple scalar variable, but $hash{key} is
3114 not--it's a hash element. Even "<$x >" (note the extra space) is
3115 treated as "glob("$x ")", not "readline($x)".
3116
3117 One level of double-quote interpretation is done first, but you can't
3118 say "<$foo>" because that's an indirect filehandle as explained in the
3119 previous paragraph. (In older versions of Perl, programmers would
3120 insert curly brackets to force interpretation as a filename glob:
3121 "<${foo}>". These days, it's considered cleaner to call the internal
3122 function directly as "glob($foo)", which is probably the right way to
3123 have done it in the first place.) For example:
3124
3125 while (<*.c>) {
3126 chmod 0644, $_;
3127 }
3128
3129 is roughly equivalent to:
3130
3131 open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
3132 while (<FOO>) {
3133 chomp;
3134 chmod 0644, $_;
3135 }
3136
3137 except that the globbing is actually done internally using the standard
3138 "File::Glob" extension. Of course, the shortest way to do the above
3139 is:
3140
3141 chmod 0644, <*.c>;
3142
3143 A (file)glob evaluates its (embedded) argument only when it is starting
3144 a new list. All values must be read before it will start over. In
3145 list context, this isn't important because you automatically get them
3146 all anyway. However, in scalar context the operator returns the next
3147 value each time it's called, or "undef" when the list has run out. As
3148 with filehandle reads, an automatic "defined" is generated when the
3149 glob occurs in the test part of a "while", because legal glob returns
3150 (for example, a file called 0) would otherwise terminate the loop.
3151 Again, "undef" is returned only once. So if you're expecting a single
3152 value from a glob, it is much better to say
3153
3154 ($file) = <blurch*>;
3155
3156 than
3157
3158 $file = <blurch*>;
3159
3160 because the latter will alternate between returning a filename and
3161 returning false.
3162
3163 If you're trying to do variable interpolation, it's definitely better
3164 to use the "glob()" function, because the older notation can cause
3165 people to become confused with the indirect filehandle notation.
3166
3167 @files = glob("$dir/*.[ch]");
3168 @files = glob($files[$i]);
3169
3170 If an angle-bracket-based globbing expression is used as the condition
3171 of a "while" or "for" loop, then it will be implicitly assigned to $_.
3172 If either a globbing expression or an explicit assignment of a globbing
3173 expression to a scalar is used as a "while"/"for" condition, then the
3174 condition actually tests for definedness of the expression's value, not
3175 for its regular truth value.
3176
3177 Constant Folding
3178 Like C, Perl does a certain amount of expression evaluation at compile
3179 time whenever it determines that all arguments to an operator are
3180 static and have no side effects. In particular, string concatenation
3181 happens at compile time between literals that don't do variable
3182 substitution. Backslash interpolation also happens at compile time.
3183 You can say
3184
3185 'Now is the time for all'
3186 . "\n"
3187 . 'good men to come to.'
3188
3189 and this all reduces to one string internally. Likewise, if you say
3190
3191 foreach $file (@filenames) {
3192 if (-s $file > 5 + 100 * 2**16) { }
3193 }
3194
3195 the compiler precomputes the number which that expression represents so
3196 that the interpreter won't have to.
3197
3198 No-ops
3199 Perl doesn't officially have a no-op operator, but the bare constants 0
3200 and 1 are special-cased not to produce a warning in void context, so
3201 you can for example safely do
3202
3203 1 while foo();
3204
3205 Bitwise String Operators
3206 Bitstrings of any size may be manipulated by the bitwise operators ("~
3207 | & ^").
3208
3209 If the operands to a binary bitwise op are strings of different sizes,
3210 | and ^ ops act as though the shorter operand had additional zero bits
3211 on the right, while the & op acts as though the longer operand were
3212 truncated to the length of the shorter. The granularity for such
3213 extension or truncation is one or more bytes.
3214
3215 # ASCII-based examples
3216 print "j p \n" ^ " a h"; # prints "JAPH\n"
3217 print "JA" | " ph\n"; # prints "japh\n"
3218 print "japh\nJunk" & '_____'; # prints "JAPH\n";
3219 print 'p N$' ^ " E<H\n"; # prints "Perl\n";
3220
3221 If you are intending to manipulate bitstrings, be certain that you're
3222 supplying bitstrings: If an operand is a number, that will imply a
3223 numeric bitwise operation. You may explicitly show which type of
3224 operation you intend by using "" or "0+", as in the examples below.
3225
3226 $foo = 150 | 105; # yields 255 (0x96 | 0x69 is 0xFF)
3227 $foo = '150' | 105; # yields 255
3228 $foo = 150 | '105'; # yields 255
3229 $foo = '150' | '105'; # yields string '155' (under ASCII)
3230
3231 $baz = 0+$foo & 0+$bar; # both ops explicitly numeric
3232 $biz = "$foo" ^ "$bar"; # both ops explicitly stringy
3233
3234 This somewhat unpredictable behavior can be avoided with the "bitwise"
3235 feature, new in Perl 5.22. You can enable it via
3236 "use feature 'bitwise'" or "use v5.28". Before Perl 5.28, it used to
3237 emit a warning in the "experimental::bitwise" category. Under this
3238 feature, the four standard bitwise operators ("~ | & ^") are always
3239 numeric. Adding a dot after each operator ("~. |. &. ^.") forces it to
3240 treat its operands as strings:
3241
3242 use feature "bitwise";
3243 $foo = 150 | 105; # yields 255 (0x96 | 0x69 is 0xFF)
3244 $foo = '150' | 105; # yields 255
3245 $foo = 150 | '105'; # yields 255
3246 $foo = '150' | '105'; # yields 255
3247 $foo = 150 |. 105; # yields string '155'
3248 $foo = '150' |. 105; # yields string '155'
3249 $foo = 150 |.'105'; # yields string '155'
3250 $foo = '150' |.'105'; # yields string '155'
3251
3252 $baz = $foo & $bar; # both operands numeric
3253 $biz = $foo ^. $bar; # both operands stringy
3254
3255 The assignment variants of these operators ("&= |= ^= &.= |.= ^.=")
3256 behave likewise under the feature.
3257
3258 It is a fatal error if an operand contains a character whose ordinal
3259 value is above 0xFF, and hence not expressible except in UTF-8. The
3260 operation is performed on a non-UTF-8 copy for other operands encoded
3261 in UTF-8. See "Byte and Character Semantics" in perlunicode.
3262
3263 See "vec" in perlfunc for information on how to manipulate individual
3264 bits in a bit vector.
3265
3266 Integer Arithmetic
3267 By default, Perl assumes that it must do most of its arithmetic in
3268 floating point. But by saying
3269
3270 use integer;
3271
3272 you may tell the compiler to use integer operations (see integer for a
3273 detailed explanation) from here to the end of the enclosing BLOCK. An
3274 inner BLOCK may countermand this by saying
3275
3276 no integer;
3277
3278 which lasts until the end of that BLOCK. Note that this doesn't mean
3279 everything is an integer, merely that Perl will use integer operations
3280 for arithmetic, comparison, and bitwise operators. For example, even
3281 under "use integer", if you take the sqrt(2), you'll still get
3282 1.4142135623731 or so.
3283
3284 Used on numbers, the bitwise operators ("&" "|" "^" "~" "<<" ">>")
3285 always produce integral results. (But see also "Bitwise String
3286 Operators".) However, "use integer" still has meaning for them. By
3287 default, their results are interpreted as unsigned integers, but if
3288 "use integer" is in effect, their results are interpreted as signed
3289 integers. For example, "~0" usually evaluates to a large integral
3290 value. However, "use integer; ~0" is "-1" on two's-complement
3291 machines.
3292
3293 Floating-point Arithmetic
3294 While "use integer" provides integer-only arithmetic, there is no
3295 analogous mechanism to provide automatic rounding or truncation to a
3296 certain number of decimal places. For rounding to a certain number of
3297 digits, "sprintf()" or "printf()" is usually the easiest route. See
3298 perlfaq4.
3299
3300 Floating-point numbers are only approximations to what a mathematician
3301 would call real numbers. There are infinitely more reals than floats,
3302 so some corners must be cut. For example:
3303
3304 printf "%.20g\n", 123456789123456789;
3305 # produces 123456789123456784
3306
3307 Testing for exact floating-point equality or inequality is not a good
3308 idea. Here's a (relatively expensive) work-around to compare whether
3309 two floating-point numbers are equal to a particular number of decimal
3310 places. See Knuth, volume II, for a more robust treatment of this
3311 topic.
3312
3313 sub fp_equal {
3314 my ($X, $Y, $POINTS) = @_;
3315 my ($tX, $tY);
3316 $tX = sprintf("%.${POINTS}g", $X);
3317 $tY = sprintf("%.${POINTS}g", $Y);
3318 return $tX eq $tY;
3319 }
3320
3321 The POSIX module (part of the standard perl distribution) implements
3322 "ceil()", "floor()", and other mathematical and trigonometric
3323 functions. The "Math::Complex" module (part of the standard perl
3324 distribution) defines mathematical functions that work on both the
3325 reals and the imaginary numbers. "Math::Complex" is not as efficient
3326 as POSIX, but POSIX can't work with complex numbers.
3327
3328 Rounding in financial applications can have serious implications, and
3329 the rounding method used should be specified precisely. In these
3330 cases, it probably pays not to trust whichever system rounding is being
3331 used by Perl, but to instead implement the rounding function you need
3332 yourself.
3333
3334 Bigger Numbers
3335 The standard "Math::BigInt", "Math::BigRat", and "Math::BigFloat"
3336 modules, along with the "bignum", "bigint", and "bigrat" pragmas,
3337 provide variable-precision arithmetic and overloaded operators,
3338 although they're currently pretty slow. At the cost of some space and
3339 considerable speed, they avoid the normal pitfalls associated with
3340 limited-precision representations.
3341
3342 use 5.010;
3343 use bigint; # easy interface to Math::BigInt
3344 $x = 123456789123456789;
3345 say $x * $x;
3346 +15241578780673678515622620750190521
3347
3348 Or with rationals:
3349
3350 use 5.010;
3351 use bigrat;
3352 $x = 3/22;
3353 $y = 4/6;
3354 say "x/y is ", $x/$y;
3355 say "x*y is ", $x*$y;
3356 x/y is 9/44
3357 x*y is 1/11
3358
3359 Several modules let you calculate with unlimited or fixed precision
3360 (bound only by memory and CPU time). There are also some non-standard
3361 modules that provide faster implementations via external C libraries.
3362
3363 Here is a short, but incomplete summary:
3364
3365 Math::String treat string sequences like numbers
3366 Math::FixedPrecision calculate with a fixed precision
3367 Math::Currency for currency calculations
3368 Bit::Vector manipulate bit vectors fast (uses C)
3369 Math::BigIntFast Bit::Vector wrapper for big numbers
3370 Math::Pari provides access to the Pari C library
3371 Math::Cephes uses the external Cephes C library (no
3372 big numbers)
3373 Math::Cephes::Fraction fractions via the Cephes library
3374 Math::GMP another one using an external C library
3375 Math::GMPz an alternative interface to libgmp's big ints
3376 Math::GMPq an interface to libgmp's fraction numbers
3377 Math::GMPf an interface to libgmp's floating point numbers
3378
3379 Choose wisely.
3380
3381
3382
3383perl v5.30.2 2020-03-27 PERLOP(1)