1PERLOP(1) Perl Programmers Reference Guide PERLOP(1)
2
3
4
6 perlop - Perl operators and precedence
7
9 Operator Precedence and Associativity
10 Operator precedence and associativity work in Perl more or less like
11 they do in mathematics.
12
13 Operator precedence means some operators are evaluated before others.
14 For example, in "2 + 4 * 5", the multiplication has higher precedence
15 so "4 * 5" is evaluated first yielding "2 + 20 == 22" and not "6 * 5 ==
16 30".
17
18 Operator associativity defines what happens if a sequence of the same
19 operators is used one after another: whether the evaluator will
20 evaluate the left operations first or the right. For example, in "8 -
21 4 - 2", subtraction is left associative so Perl evaluates the
22 expression left to right. "8 - 4" is evaluated first making the
23 expression "4 - 2 == 2" and not "8 - 2 == 6".
24
25 Perl operators have the following associativity and precedence, listed
26 from highest precedence to lowest. Operators borrowed from C keep the
27 same precedence relationship with each other, even where C's precedence
28 is slightly screwy. (This makes learning Perl easier for C folks.)
29 With very few exceptions, these all operate on scalar values only, not
30 array values.
31
32 left terms and list operators (leftward)
33 left ->
34 nonassoc ++ --
35 right **
36 right ! ~ \ and unary + and -
37 left =~ !~
38 left * / % x
39 left + - .
40 left << >>
41 nonassoc named unary operators
42 nonassoc < > <= >= lt gt le ge
43 nonassoc == != <=> eq ne cmp ~~
44 left &
45 left | ^
46 left &&
47 left || //
48 nonassoc .. ...
49 right ?:
50 right = += -= *= etc.
51 left , =>
52 nonassoc list operators (rightward)
53 right not
54 left and
55 left or xor
56
57 In the following sections, these operators are covered in precedence
58 order.
59
60 Many operators can be overloaded for objects. See overload.
61
62 Terms and List Operators (Leftward)
63 A TERM has the highest precedence in Perl. They include variables,
64 quote and quote-like operators, any expression in parentheses, and any
65 function whose arguments are parenthesized. Actually, there aren't
66 really functions in this sense, just list operators and unary operators
67 behaving as functions because you put parentheses around the arguments.
68 These are all documented in perlfunc.
69
70 If any list operator (print(), etc.) or any unary operator (chdir(),
71 etc.) is followed by a left parenthesis as the next token, the
72 operator and arguments within parentheses are taken to be of highest
73 precedence, just like a normal function call.
74
75 In the absence of parentheses, the precedence of list operators such as
76 "print", "sort", or "chmod" is either very high or very low depending
77 on whether you are looking at the left side or the right side of the
78 operator. For example, in
79
80 @ary = (1, 3, sort 4, 2);
81 print @ary; # prints 1324
82
83 the commas on the right of the sort are evaluated before the sort, but
84 the commas on the left are evaluated after. In other words, list
85 operators tend to gobble up all arguments that follow, and then act
86 like a simple TERM with regard to the preceding expression. Be careful
87 with parentheses:
88
89 # These evaluate exit before doing the print:
90 print($foo, exit); # Obviously not what you want.
91 print $foo, exit; # Nor is this.
92
93 # These do the print before evaluating exit:
94 (print $foo), exit; # This is what you want.
95 print($foo), exit; # Or this.
96 print ($foo), exit; # Or even this.
97
98 Also note that
99
100 print ($foo & 255) + 1, "\n";
101
102 probably doesn't do what you expect at first glance. The parentheses
103 enclose the argument list for "print" which is evaluated (printing the
104 result of "$foo & 255"). Then one is added to the return value of
105 "print" (usually 1). The result is something like this:
106
107 1 + 1, "\n"; # Obviously not what you meant.
108
109 To do what you meant properly, you must write:
110
111 print(($foo & 255) + 1, "\n");
112
113 See "Named Unary Operators" for more discussion of this.
114
115 Also parsed as terms are the "do {}" and "eval {}" constructs, as well
116 as subroutine and method calls, and the anonymous constructors "[]" and
117 "{}".
118
119 See also "Quote and Quote-like Operators" toward the end of this
120 section, as well as "I/O Operators".
121
122 The Arrow Operator
123 ""->"" is an infix dereference operator, just as it is in C and C++.
124 If the right side is either a "[...]", "{...}", or a "(...)" subscript,
125 then the left side must be either a hard or symbolic reference to an
126 array, a hash, or a subroutine respectively. (Or technically speaking,
127 a location capable of holding a hard reference, if it's an array or
128 hash reference being used for assignment.) See perlreftut and perlref.
129
130 Otherwise, the right side is a method name or a simple scalar variable
131 containing either the method name or a subroutine reference, and the
132 left side must be either an object (a blessed reference) or a class
133 name (that is, a package name). See perlobj.
134
135 Auto-increment and Auto-decrement
136 "++" and "--" work as in C. That is, if placed before a variable, they
137 increment or decrement the variable by one before returning the value,
138 and if placed after, increment or decrement after returning the value.
139
140 $i = 0; $j = 0;
141 print $i++; # prints 0
142 print ++$j; # prints 1
143
144 Note that just as in C, Perl doesn't define when the variable is
145 incremented or decremented. You just know it will be done sometime
146 before or after the value is returned. This also means that modifying a
147 variable twice in the same statement will lead to undefined behavior.
148 Avoid statements like:
149
150 $i = $i ++;
151 print ++ $i + $i ++;
152
153 Perl will not guarantee what the result of the above statements is.
154
155 The auto-increment operator has a little extra builtin magic to it. If
156 you increment a variable that is numeric, or that has ever been used in
157 a numeric context, you get a normal increment. If, however, the
158 variable has been used in only string contexts since it was set, and
159 has a value that is not the empty string and matches the pattern
160 "/^[a-zA-Z]*[0-9]*\z/", the increment is done as a string, preserving
161 each character within its range, with carry:
162
163 print ++($foo = "99"); # prints "100"
164 print ++($foo = "a0"); # prints "a1"
165 print ++($foo = "Az"); # prints "Ba"
166 print ++($foo = "zz"); # prints "aaa"
167
168 "undef" is always treated as numeric, and in particular is changed to 0
169 before incrementing (so that a post-increment of an undef value will
170 return 0 rather than "undef").
171
172 The auto-decrement operator is not magical.
173
174 Exponentiation
175 Binary "**" is the exponentiation operator. It binds even more tightly
176 than unary minus, so -2**4 is -(2**4), not (-2)**4. (This is
177 implemented using C's pow(3) function, which actually works on doubles
178 internally.)
179
180 Symbolic Unary Operators
181 Unary "!" performs logical negation, that is, "not". See also "not"
182 for a lower precedence version of this.
183
184 Unary "-" performs arithmetic negation if the operand is numeric,
185 including any string that looks like a number. If the operand is an
186 identifier, a string consisting of a minus sign concatenated with the
187 identifier is returned. Otherwise, if the string starts with a plus or
188 minus, a string starting with the opposite sign is returned. One
189 effect of these rules is that -bareword is equivalent to the string
190 "-bareword". If, however, the string begins with a non-alphabetic
191 character (excluding "+" or "-"), Perl will attempt to convert the
192 string to a numeric and the arithmetic negation is performed. If the
193 string cannot be cleanly converted to a numeric, Perl will give the
194 warning Argument "the string" isn't numeric in negation (-) at ....
195
196 Unary "~" performs bitwise negation, that is, 1's complement. For
197 example, "0666 & ~027" is 0640. (See also "Integer Arithmetic" and
198 "Bitwise String Operators".) Note that the width of the result is
199 platform-dependent: ~0 is 32 bits wide on a 32-bit platform, but 64
200 bits wide on a 64-bit platform, so if you are expecting a certain bit
201 width, remember to use the "&" operator to mask off the excess bits.
202
203 When complementing strings, if all characters have ordinal values under
204 256, then their complements will, also. But if they do not, all
205 characters will be in either 32- or 64-bit complements, depending on
206 your architecture. So for example, "~"\x{3B1}"" is "\x{FFFF_FC4E}" on
207 32-bit machines and "\x{FFFF_FFFF_FFFF_FC4E}" on 64-bit machines.
208
209 Unary "+" has no effect whatsoever, even on strings. It is useful
210 syntactically for separating a function name from a parenthesized
211 expression that would otherwise be interpreted as the complete list of
212 function arguments. (See examples above under "Terms and List
213 Operators (Leftward)".)
214
215 Unary "\" creates a reference to whatever follows it. See perlreftut
216 and perlref. Do not confuse this behavior with the behavior of
217 backslash within a string, although both forms do convey the notion of
218 protecting the next thing from interpolation.
219
220 Binding Operators
221 Binary "=~" binds a scalar expression to a pattern match. Certain
222 operations search or modify the string $_ by default. This operator
223 makes that kind of operation work on some other string. The right
224 argument is a search pattern, substitution, or transliteration. The
225 left argument is what is supposed to be searched, substituted, or
226 transliterated instead of the default $_. When used in scalar context,
227 the return value generally indicates the success of the operation. The
228 exceptions are substitution (s///) and transliteration (y///) with the
229 "/r" (non-destructive) option, which cause the return value to be the
230 result of the substitution. Behavior in list context depends on the
231 particular operator. See "Regexp Quote-Like Operators" for details and
232 perlretut for examples using these operators.
233
234 If the right argument is an expression rather than a search pattern,
235 substitution, or transliteration, it is interpreted as a search pattern
236 at run time. Note that this means that its contents will be
237 interpolated twice, so
238
239 '\\' =~ q'\\';
240
241 is not ok, as the regex engine will end up trying to compile the
242 pattern "\", which it will consider a syntax error.
243
244 Binary "!~" is just like "=~" except the return value is negated in the
245 logical sense.
246
247 Binary "!~" with a non-destructive substitution (s///r) or
248 transliteration (y///r) is a syntax error.
249
250 Multiplicative Operators
251 Binary "*" multiplies two numbers.
252
253 Binary "/" divides two numbers.
254
255 Binary "%" is the modulo operator, which computes the division
256 remainder of its first argument with respect to its second argument.
257 Given integer operands $a and $b: If $b is positive, then "$a % $b" is
258 $a minus the largest multiple of $b less than or equal to $a. If $b is
259 negative, then "$a % $b" is $a minus the smallest multiple of $b that
260 is not less than $a (that is, the result will be less than or equal to
261 zero). If the operands $a and $b are floating point values and the
262 absolute value of $b (that is "abs($b)") is less than "(UV_MAX + 1)",
263 only the integer portion of $a and $b will be used in the operation
264 (Note: here "UV_MAX" means the maximum of the unsigned integer type).
265 If the absolute value of the right operand ("abs($b)") is greater than
266 or equal to "(UV_MAX + 1)", "%" computes the floating-point remainder
267 $r in the equation "($r = $a - $i*$b)" where $i is a certain integer
268 that makes $r have the same sign as the right operand $b (not as the
269 left operand $a like C function "fmod()") and the absolute value less
270 than that of $b. Note that when "use integer" is in scope, "%" gives
271 you direct access to the modulo operator as implemented by your C
272 compiler. This operator is not as well defined for negative operands,
273 but it will execute faster.
274
275 Binary "x" is the repetition operator. In scalar context or if the
276 left operand is not enclosed in parentheses, it returns a string
277 consisting of the left operand repeated the number of times specified
278 by the right operand. In list context, if the left operand is enclosed
279 in parentheses or is a list formed by "qw/STRING/", it repeats the
280 list. If the right operand is zero or negative, it returns an empty
281 string or an empty list, depending on the context.
282
283 print '-' x 80; # print row of dashes
284
285 print "\t" x ($tab/8), ' ' x ($tab%8); # tab over
286
287 @ones = (1) x 80; # a list of 80 1's
288 @ones = (5) x @ones; # set all elements to 5
289
290 Additive Operators
291 Binary "+" returns the sum of two numbers.
292
293 Binary "-" returns the difference of two numbers.
294
295 Binary "." concatenates two strings.
296
297 Shift Operators
298 Binary "<<" returns the value of its left argument shifted left by the
299 number of bits specified by the right argument. Arguments should be
300 integers. (See also "Integer Arithmetic".)
301
302 Binary ">>" returns the value of its left argument shifted right by the
303 number of bits specified by the right argument. Arguments should be
304 integers. (See also "Integer Arithmetic".)
305
306 Note that both "<<" and ">>" in Perl are implemented directly using
307 "<<" and ">>" in C. If "use integer" (see "Integer Arithmetic") is in
308 force then signed C integers are used, else unsigned C integers are
309 used. Either way, the implementation isn't going to generate results
310 larger than the size of the integer type Perl was built with (32 bits
311 or 64 bits).
312
313 The result of overflowing the range of the integers is undefined
314 because it is undefined also in C. In other words, using 32-bit
315 integers, "1 << 32" is undefined. Shifting by a negative number of
316 bits is also undefined.
317
318 If you get tired of being subject to your platform's native integers,
319 the "use bigint" pragma neatly sidesteps the issue altogether:
320
321 print 20 << 20; # 20971520
322 print 20 << 40; # 5120 on 32-bit machines,
323 # 21990232555520 on 64-bit machines
324 use bigint;
325 print 20 << 100; # 25353012004564588029934064107520
326
327 Named Unary Operators
328 The various named unary operators are treated as functions with one
329 argument, with optional parentheses.
330
331 If any list operator (print(), etc.) or any unary operator (chdir(),
332 etc.) is followed by a left parenthesis as the next token, the
333 operator and arguments within parentheses are taken to be of highest
334 precedence, just like a normal function call. For example, because
335 named unary operators are higher precedence than "||":
336
337 chdir $foo || die; # (chdir $foo) || die
338 chdir($foo) || die; # (chdir $foo) || die
339 chdir ($foo) || die; # (chdir $foo) || die
340 chdir +($foo) || die; # (chdir $foo) || die
341
342 but, because * is higher precedence than named operators:
343
344 chdir $foo * 20; # chdir ($foo * 20)
345 chdir($foo) * 20; # (chdir $foo) * 20
346 chdir ($foo) * 20; # (chdir $foo) * 20
347 chdir +($foo) * 20; # chdir ($foo * 20)
348
349 rand 10 * 20; # rand (10 * 20)
350 rand(10) * 20; # (rand 10) * 20
351 rand (10) * 20; # (rand 10) * 20
352 rand +(10) * 20; # rand (10 * 20)
353
354 Regarding precedence, the filetest operators, like "-f", "-M", etc. are
355 treated like named unary operators, but they don't follow this
356 functional parenthesis rule. That means, for example, that
357 "-f($file).".bak"" is equivalent to "-f "$file.bak"".
358
359 See also "Terms and List Operators (Leftward)".
360
361 Relational Operators
362 Perl operators that return true or false generally return values that
363 can be safely used as numbers. For example, the relational operators
364 in this section and the equality operators in the next one return 1 for
365 true and a special version of the defined empty string, "", which
366 counts as a zero but is exempt from warnings about improper numeric
367 conversions, just as "0 but true" is.
368
369 Binary "<" returns true if the left argument is numerically less than
370 the right argument.
371
372 Binary ">" returns true if the left argument is numerically greater
373 than the right argument.
374
375 Binary "<=" returns true if the left argument is numerically less than
376 or equal to the right argument.
377
378 Binary ">=" returns true if the left argument is numerically greater
379 than or equal to the right argument.
380
381 Binary "lt" returns true if the left argument is stringwise less than
382 the right argument.
383
384 Binary "gt" returns true if the left argument is stringwise greater
385 than the right argument.
386
387 Binary "le" returns true if the left argument is stringwise less than
388 or equal to the right argument.
389
390 Binary "ge" returns true if the left argument is stringwise greater
391 than or equal to the right argument.
392
393 Equality Operators
394 Binary "==" returns true if the left argument is numerically equal to
395 the right argument.
396
397 Binary "!=" returns true if the left argument is numerically not equal
398 to the right argument.
399
400 Binary "<=>" returns -1, 0, or 1 depending on whether the left argument
401 is numerically less than, equal to, or greater than the right argument.
402 If your platform supports NaNs (not-a-numbers) as numeric values, using
403 them with "<=>" returns undef. NaN is not "<", "==", ">", "<=" or ">="
404 anything (even NaN), so those 5 return false. NaN != NaN returns true,
405 as does NaN != anything else. If your platform doesn't support NaNs
406 then NaN is just a string with numeric value 0.
407
408 $ perl -le '$a = "NaN"; print "No NaN support here" if $a == $a'
409 $ perl -le '$a = "NaN"; print "NaN support here" if $a != $a'
410
411 (Note that the bigint, bigrat, and bignum pragmas all support "NaN".)
412
413 Binary "eq" returns true if the left argument is stringwise equal to
414 the right argument.
415
416 Binary "ne" returns true if the left argument is stringwise not equal
417 to the right argument.
418
419 Binary "cmp" returns -1, 0, or 1 depending on whether the left argument
420 is stringwise less than, equal to, or greater than the right argument.
421
422 Binary "~~" does a smartmatch between its arguments. Smart matching is
423 described in the next section.
424
425 "lt", "le", "ge", "gt" and "cmp" use the collation (sort) order
426 specified by the current locale if a legacy "use locale" (but not "use
427 locale ':not_characters'") is in effect. See perllocale. Do not mix
428 these with Unicode, only with legacy binary encodings. The standard
429 Unicode::Collate and Unicode::Collate::Locale modules offer much more
430 powerful solutions to collation issues.
431
432 Smartmatch Operator
433 First available in Perl 5.10.1 (the 5.10.0 version behaved
434 differently), binary "~~" does a "smartmatch" between its arguments.
435 This is mostly used implicitly in the "when" construct described in
436 perlsyn, although not all "when" clauses call the smartmatch operator.
437 Unique among all of Perl's operators, the smartmatch operator can
438 recurse.
439
440 It is also unique in that all other Perl operators impose a context
441 (usually string or numeric context) on their operands, autoconverting
442 those operands to those imposed contexts. In contrast, smartmatch
443 infers contexts from the actual types of its operands and uses that
444 type information to select a suitable comparison mechanism.
445
446 The "~~" operator compares its operands "polymorphically", determining
447 how to compare them according to their actual types (numeric, string,
448 array, hash, etc.) Like the equality operators with which it shares
449 the same precedence, "~~" returns 1 for true and "" for false. It is
450 often best read aloud as "in", "inside of", or "is contained in",
451 because the left operand is often looked for inside the right operand.
452 That makes the order of the operands to the smartmatch operand often
453 opposite that of the regular match operator. In other words, the
454 "smaller" thing is usually placed in the left operand and the larger
455 one in the right.
456
457 The behavior of a smartmatch depends on what type of things its
458 arguments are, as determined by the following table. The first row of
459 the table whose types apply determines the smartmatch behavior.
460 Because what actually happens is mostly determined by the type of the
461 second operand, the table is sorted on the right operand instead of on
462 the left.
463
464 Left Right Description and pseudocode
465 ===============================================================
466 Any undef check whether Any is undefined
467 like: !defined Any
468
469 Any Object invoke ~~ overloading on Object, or die
470
471 Right operand is an ARRAY:
472
473 Left Right Description and pseudocode
474 ===============================================================
475 ARRAY1 ARRAY2 recurse on paired elements of ARRAY1 and ARRAY2[2]
476 like: (ARRAY1[0] ~~ ARRAY2[0])
477 && (ARRAY1[1] ~~ ARRAY2[1]) && ...
478 HASH ARRAY any ARRAY elements exist as HASH keys
479 like: grep { exists HASH->{$_} } ARRAY
480 Regexp ARRAY any ARRAY elements pattern match Regexp
481 like: grep { /Regexp/ } ARRAY
482 undef ARRAY undef in ARRAY
483 like: grep { !defined } ARRAY
484 Any ARRAY smartmatch each ARRAY element[3]
485 like: grep { Any ~~ $_ } ARRAY
486
487 Right operand is a HASH:
488
489 Left Right Description and pseudocode
490 ===============================================================
491 HASH1 HASH2 all same keys in both HASHes
492 like: keys HASH1 ==
493 grep { exists HASH2->{$_} } keys HASH1
494 ARRAY HASH any ARRAY elements exist as HASH keys
495 like: grep { exists HASH->{$_} } ARRAY
496 Regexp HASH any HASH keys pattern match Regexp
497 like: grep { /Regexp/ } keys HASH
498 undef HASH always false (undef can't be a key)
499 like: 0 == 1
500 Any HASH HASH key existence
501 like: exists HASH->{Any}
502
503 Right operand is CODE:
504
505 Left Right Description and pseudocode
506 ===============================================================
507 ARRAY CODE sub returns true on all ARRAY elements[1]
508 like: !grep { !CODE->($_) } ARRAY
509 HASH CODE sub returns true on all HASH keys[1]
510 like: !grep { !CODE->($_) } keys HASH
511 Any CODE sub passed Any returns true
512 like: CODE->(Any)
513
514 Right operand is a Regexp:
515
516 Left Right Description and pseudocode
517 ===============================================================
518 ARRAY Regexp any ARRAY elements match Regexp
519 like: grep { /Regexp/ } ARRAY
520 HASH Regexp any HASH keys match Regexp
521 like: grep { /Regexp/ } keys HASH
522 Any Regexp pattern match
523 like: Any =~ /Regexp/
524
525 Other:
526
527 Left Right Description and pseudocode
528 ===============================================================
529 Object Any invoke ~~ overloading on Object,
530 or fall back to...
531
532 Any Num numeric equality
533 like: Any == Num
534 Num nummy[4] numeric equality
535 like: Num == nummy
536 undef Any check whether undefined
537 like: !defined(Any)
538 Any Any string equality
539 like: Any eq Any
540
541 Notes:
542
543 1. Empty hashes or arrays match.
544 2. That is, each element smartmatches the element of the same index in
545 the other array.[3]
546 3. If a circular reference is found, fall back to referential equality.
547 4. Either an actual number, or a string that looks like one.
548
549 The smartmatch implicitly dereferences any non-blessed hash or array
550 reference, so the "HASH" and "ARRAY" entries apply in those cases. For
551 blessed references, the "Object" entries apply. Smartmatches involving
552 hashes only consider hash keys, never hash values.
553
554 The "like" code entry is not always an exact rendition. For example,
555 the smartmatch operator short-circuits whenever possible, but "grep"
556 does not. Also, "grep" in scalar context returns the number of
557 matches, but "~~" returns only true or false.
558
559 Unlike most operators, the smartmatch operator knows to treat "undef"
560 specially:
561
562 use v5.10.1;
563 @array = (1, 2, 3, undef, 4, 5);
564 say "some elements undefined" if undef ~~ @array;
565
566 Each operand is considered in a modified scalar context, the
567 modification being that array and hash variables are passed by
568 reference to the operator, which implicitly dereferences them. Both
569 elements of each pair are the same:
570
571 use v5.10.1;
572
573 my %hash = (red => 1, blue => 2, green => 3,
574 orange => 4, yellow => 5, purple => 6,
575 black => 7, grey => 8, white => 9);
576
577 my @array = qw(red blue green);
578
579 say "some array elements in hash keys" if @array ~~ %hash;
580 say "some array elements in hash keys" if \@array ~~ \%hash;
581
582 say "red in array" if "red" ~~ @array;
583 say "red in array" if "red" ~~ \@array;
584
585 say "some keys end in e" if /e$/ ~~ %hash;
586 say "some keys end in e" if /e$/ ~~ \%hash;
587
588 Two arrays smartmatch if each element in the first array smartmatches
589 (that is, is "in") the corresponding element in the second array,
590 recursively.
591
592 use v5.10.1;
593 my @little = qw(red blue green);
594 my @bigger = ("red", "blue", [ "orange", "green" ] );
595 if (@little ~~ @bigger) { # true!
596 say "little is contained in bigger";
597 }
598
599 Because the smartmatch operator recurses on nested arrays, this will
600 still report that "red" is in the array.
601
602 use v5.10.1;
603 my @array = qw(red blue green);
604 my $nested_array = [[[[[[[ @array ]]]]]]];
605 say "red in array" if "red" ~~ $nested_array;
606
607 If two arrays smartmatch each other, then they are deep copies of each
608 others' values, as this example reports:
609
610 use v5.12.0;
611 my @a = (0, 1, 2, [3, [4, 5], 6], 7);
612 my @b = (0, 1, 2, [3, [4, 5], 6], 7);
613
614 if (@a ~~ @b && @b ~~ @a) {
615 say "a and b are deep copies of each other";
616 }
617 elsif (@a ~~ @b) {
618 say "a smartmatches in b";
619 }
620 elsif (@b ~~ @a) {
621 say "b smartmatches in a";
622 }
623 else {
624 say "a and b don't smartmatch each other at all";
625 }
626
627 If you were to set "$b[3] = 4", then instead of reporting that "a and b
628 are deep copies of each other", it now reports that "b smartmatches in
629 a". That because the corresponding position in @a contains an array
630 that (eventually) has a 4 in it.
631
632 Smartmatching one hash against another reports whether both contain the
633 same keys, no more and no less. This could be used to see whether two
634 records have the same field names, without caring what values those
635 fields might have. For example:
636
637 use v5.10.1;
638 sub make_dogtag {
639 state $REQUIRED_FIELDS = { name=>1, rank=>1, serial_num=>1 };
640
641 my ($class, $init_fields) = @_;
642
643 die "Must supply (only) name, rank, and serial number"
644 unless $init_fields ~~ $REQUIRED_FIELDS;
645
646 ...
647 }
648
649 or, if other non-required fields are allowed, use ARRAY ~~ HASH:
650
651 use v5.10.1;
652 sub make_dogtag {
653 state $REQUIRED_FIELDS = { name=>1, rank=>1, serial_num=>1 };
654
655 my ($class, $init_fields) = @_;
656
657 die "Must supply (at least) name, rank, and serial number"
658 unless [keys %{$init_fields}] ~~ $REQUIRED_FIELDS;
659
660 ...
661 }
662
663 The smartmatch operator is most often used as the implicit operator of
664 a "when" clause. See the section on "Switch Statements" in perlsyn.
665
666 Smartmatching of Objects
667
668 To avoid relying on an object's underlying representation, if the
669 smartmatch's right operand is an object that doesn't overload "~~", it
670 raises the exception ""Smartmatching a non-overloaded object breaks
671 encapsulation"". That's because one has no business digging around to
672 see whether something is "in" an object. These are all illegal on
673 objects without a "~~" overload:
674
675 %hash ~~ $object
676 42 ~~ $object
677 "fred" ~~ $object
678
679 However, you can change the way an object is smartmatched by
680 overloading the "~~" operator. This is allowed to extend the usual
681 smartmatch semantics. For objects that do have an "~~" overload, see
682 overload.
683
684 Using an object as the left operand is allowed, although not very
685 useful. Smartmatching rules take precedence over overloading, so even
686 if the object in the left operand has smartmatch overloading, this will
687 be ignored. A left operand that is a non-overloaded object falls back
688 on a string or numeric comparison of whatever the "ref" operator
689 returns. That means that
690
691 $object ~~ X
692
693 does not invoke the overload method with "X" as an argument. Instead
694 the above table is consulted as normal, and based on the type of "X",
695 overloading may or may not be invoked. For simple strings or numbers,
696 in becomes equivalent to this:
697
698 $object ~~ $number ref($object) == $number
699 $object ~~ $string ref($object) eq $string
700
701 For example, this reports that the handle smells IOish (but please
702 don't really do this!):
703
704 use IO::Handle;
705 my $fh = IO::Handle->new();
706 if ($fh ~~ /\bIO\b/) {
707 say "handle smells IOish";
708 }
709
710 That's because it treats $fh as a string like
711 "IO::Handle=GLOB(0x8039e0)", then pattern matches against that.
712
713 Bitwise And
714 Binary "&" returns its operands ANDed together bit by bit. (See also
715 "Integer Arithmetic" and "Bitwise String Operators".)
716
717 Note that "&" has lower priority than relational operators, so for
718 example the parentheses are essential in a test like
719
720 print "Even\n" if ($x & 1) == 0;
721
722 Bitwise Or and Exclusive Or
723 Binary "|" returns its operands ORed together bit by bit. (See also
724 "Integer Arithmetic" and "Bitwise String Operators".)
725
726 Binary "^" returns its operands XORed together bit by bit. (See also
727 "Integer Arithmetic" and "Bitwise String Operators".)
728
729 Note that "|" and "^" have lower priority than relational operators, so
730 for example the brackets are essential in a test like
731
732 print "false\n" if (8 | 2) != 10;
733
734 C-style Logical And
735 Binary "&&" performs a short-circuit logical AND operation. That is,
736 if the left operand is false, the right operand is not even evaluated.
737 Scalar or list context propagates down to the right operand if it is
738 evaluated.
739
740 C-style Logical Or
741 Binary "||" performs a short-circuit logical OR operation. That is, if
742 the left operand is true, the right operand is not even evaluated.
743 Scalar or list context propagates down to the right operand if it is
744 evaluated.
745
746 Logical Defined-Or
747 Although it has no direct equivalent in C, Perl's "//" operator is
748 related to its C-style or. In fact, it's exactly the same as "||",
749 except that it tests the left hand side's definedness instead of its
750 truth. Thus, "EXPR1 // EXPR2" returns the value of "EXPR1" if it's
751 defined, otherwise, the value of "EXPR2" is returned. ("EXPR1" is
752 evaluated in scalar context, "EXPR2" in the context of "//" itself).
753 Usually, this is the same result as "defined(EXPR1) ? EXPR1 : EXPR2"
754 (except that the ternary-operator form can be used as a lvalue, while
755 "EXPR1 // EXPR2" cannot). This is very useful for providing default
756 values for variables. If you actually want to test if at least one of
757 $a and $b is defined, use "defined($a // $b)".
758
759 The "||", "//" and "&&" operators return the last value evaluated
760 (unlike C's "||" and "&&", which return 0 or 1). Thus, a reasonably
761 portable way to find out the home directory might be:
762
763 $home = $ENV{HOME}
764 // $ENV{LOGDIR}
765 // (getpwuid($<))[7]
766 // die "You're homeless!\n";
767
768 In particular, this means that you shouldn't use this for selecting
769 between two aggregates for assignment:
770
771 @a = @b || @c; # this is wrong
772 @a = scalar(@b) || @c; # really meant this
773 @a = @b ? @b : @c; # this works fine, though
774
775 As alternatives to "&&" and "||" when used for control flow, Perl
776 provides the "and" and "or" operators (see below). The short-circuit
777 behavior is identical. The precedence of "and" and "or" is much lower,
778 however, so that you can safely use them after a list operator without
779 the need for parentheses:
780
781 unlink "alpha", "beta", "gamma"
782 or gripe(), next LINE;
783
784 With the C-style operators that would have been written like this:
785
786 unlink("alpha", "beta", "gamma")
787 || (gripe(), next LINE);
788
789 It would be even more readable to write that this way:
790
791 unless(unlink("alpha", "beta", "gamma")) {
792 gripe();
793 next LINE;
794 }
795
796 Using "or" for assignment is unlikely to do what you want; see below.
797
798 Range Operators
799 Binary ".." is the range operator, which is really two different
800 operators depending on the context. In list context, it returns a list
801 of values counting (up by ones) from the left value to the right value.
802 If the left value is greater than the right value then it returns the
803 empty list. The range operator is useful for writing "foreach (1..10)"
804 loops and for doing slice operations on arrays. In the current
805 implementation, no temporary array is created when the range operator
806 is used as the expression in "foreach" loops, but older versions of
807 Perl might burn a lot of memory when you write something like this:
808
809 for (1 .. 1_000_000) {
810 # code
811 }
812
813 The range operator also works on strings, using the magical auto-
814 increment, see below.
815
816 In scalar context, ".." returns a boolean value. The operator is
817 bistable, like a flip-flop, and emulates the line-range (comma)
818 operator of sed, awk, and various editors. Each ".." operator maintains
819 its own boolean state, even across calls to a subroutine that contains
820 it. It is false as long as its left operand is false. Once the left
821 operand is true, the range operator stays true until the right operand
822 is true, AFTER which the range operator becomes false again. It
823 doesn't become false till the next time the range operator is
824 evaluated. It can test the right operand and become false on the same
825 evaluation it became true (as in awk), but it still returns true once.
826 If you don't want it to test the right operand until the next
827 evaluation, as in sed, just use three dots ("...") instead of two. In
828 all other regards, "..." behaves just like ".." does.
829
830 The right operand is not evaluated while the operator is in the "false"
831 state, and the left operand is not evaluated while the operator is in
832 the "true" state. The precedence is a little lower than || and &&.
833 The value returned is either the empty string for false, or a sequence
834 number (beginning with 1) for true. The sequence number is reset for
835 each range encountered. The final sequence number in a range has the
836 string "E0" appended to it, which doesn't affect its numeric value, but
837 gives you something to search for if you want to exclude the endpoint.
838 You can exclude the beginning point by waiting for the sequence number
839 to be greater than 1.
840
841 If either operand of scalar ".." is a constant expression, that operand
842 is considered true if it is equal ("==") to the current input line
843 number (the $. variable).
844
845 To be pedantic, the comparison is actually "int(EXPR) == int(EXPR)",
846 but that is only an issue if you use a floating point expression; when
847 implicitly using $. as described in the previous paragraph, the
848 comparison is "int(EXPR) == int($.)" which is only an issue when $. is
849 set to a floating point value and you are not reading from a file.
850 Furthermore, "span" .. "spat" or "2.18 .. 3.14" will not do what you
851 want in scalar context because each of the operands are evaluated using
852 their integer representation.
853
854 Examples:
855
856 As a scalar operator:
857
858 if (101 .. 200) { print; } # print 2nd hundred lines, short for
859 # if ($. == 101 .. $. == 200) { print; }
860
861 next LINE if (1 .. /^$/); # skip header lines, short for
862 # next LINE if ($. == 1 .. /^$/);
863 # (typically in a loop labeled LINE)
864
865 s/^/> / if (/^$/ .. eof()); # quote body
866
867 # parse mail messages
868 while (<>) {
869 $in_header = 1 .. /^$/;
870 $in_body = /^$/ .. eof;
871 if ($in_header) {
872 # do something
873 } else { # in body
874 # do something else
875 }
876 } continue {
877 close ARGV if eof; # reset $. each file
878 }
879
880 Here's a simple example to illustrate the difference between the two
881 range operators:
882
883 @lines = (" - Foo",
884 "01 - Bar",
885 "1 - Baz",
886 " - Quux");
887
888 foreach (@lines) {
889 if (/0/ .. /1/) {
890 print "$_\n";
891 }
892 }
893
894 This program will print only the line containing "Bar". If the range
895 operator is changed to "...", it will also print the "Baz" line.
896
897 And now some examples as a list operator:
898
899 for (101 .. 200) { print } # print $_ 100 times
900 @foo = @foo[0 .. $#foo]; # an expensive no-op
901 @foo = @foo[$#foo-4 .. $#foo]; # slice last 5 items
902
903 The range operator (in list context) makes use of the magical auto-
904 increment algorithm if the operands are strings. You can say
905
906 @alphabet = ("A" .. "Z");
907
908 to get all normal letters of the English alphabet, or
909
910 $hexdigit = (0 .. 9, "a" .. "f")[$num & 15];
911
912 to get a hexadecimal digit, or
913
914 @z2 = ("01" .. "31");
915 print $z2[$mday];
916
917 to get dates with leading zeros.
918
919 If the final value specified is not in the sequence that the magical
920 increment would produce, the sequence goes until the next value would
921 be longer than the final value specified.
922
923 If the initial value specified isn't part of a magical increment
924 sequence (that is, a non-empty string matching "/^[a-zA-Z]*[0-9]*\z/"),
925 only the initial value will be returned. So the following will only
926 return an alpha:
927
928 use charnames "greek";
929 my @greek_small = ("\N{alpha}" .. "\N{omega}");
930
931 To get the 25 traditional lowercase Greek letters, including both
932 sigmas, you could use this instead:
933
934 use charnames "greek";
935 my @greek_small = map { chr } ( ord("\N{alpha}")
936 ..
937 ord("\N{omega}")
938 );
939
940 However, because there are many other lowercase Greek characters than
941 just those, to match lowercase Greek characters in a regular
942 expression, you would use the pattern "/(?:(?=\p{Greek})\p{Lower})+/".
943
944 Because each operand is evaluated in integer form, "2.18 .. 3.14" will
945 return two elements in list context.
946
947 @list = (2.18 .. 3.14); # same as @list = (2 .. 3);
948
949 Conditional Operator
950 Ternary "?:" is the conditional operator, just as in C. It works much
951 like an if-then-else. If the argument before the ? is true, the
952 argument before the : is returned, otherwise the argument after the :
953 is returned. For example:
954
955 printf "I have %d dog%s.\n", $n,
956 ($n == 1) ? "" : "s";
957
958 Scalar or list context propagates downward into the 2nd or 3rd
959 argument, whichever is selected.
960
961 $a = $ok ? $b : $c; # get a scalar
962 @a = $ok ? @b : @c; # get an array
963 $a = $ok ? @b : @c; # oops, that's just a count!
964
965 The operator may be assigned to if both the 2nd and 3rd arguments are
966 legal lvalues (meaning that you can assign to them):
967
968 ($a_or_b ? $a : $b) = $c;
969
970 Because this operator produces an assignable result, using assignments
971 without parentheses will get you in trouble. For example, this:
972
973 $a % 2 ? $a += 10 : $a += 2
974
975 Really means this:
976
977 (($a % 2) ? ($a += 10) : $a) += 2
978
979 Rather than this:
980
981 ($a % 2) ? ($a += 10) : ($a += 2)
982
983 That should probably be written more simply as:
984
985 $a += ($a % 2) ? 10 : 2;
986
987 Assignment Operators
988 "=" is the ordinary assignment operator.
989
990 Assignment operators work as in C. That is,
991
992 $a += 2;
993
994 is equivalent to
995
996 $a = $a + 2;
997
998 although without duplicating any side effects that dereferencing the
999 lvalue might trigger, such as from tie(). Other assignment operators
1000 work similarly. The following are recognized:
1001
1002 **= += *= &= <<= &&=
1003 -= /= |= >>= ||=
1004 .= %= ^= //=
1005 x=
1006
1007 Although these are grouped by family, they all have the precedence of
1008 assignment.
1009
1010 Unlike in C, the scalar assignment operator produces a valid lvalue.
1011 Modifying an assignment is equivalent to doing the assignment and then
1012 modifying the variable that was assigned to. This is useful for
1013 modifying a copy of something, like this:
1014
1015 ($tmp = $global) =~ tr/13579/24680/;
1016
1017 Although as of 5.14, that can be also be accomplished this way:
1018
1019 use v5.14;
1020 $tmp = ($global =~ tr/13579/24680/r);
1021
1022 Likewise,
1023
1024 ($a += 2) *= 3;
1025
1026 is equivalent to
1027
1028 $a += 2;
1029 $a *= 3;
1030
1031 Similarly, a list assignment in list context produces the list of
1032 lvalues assigned to, and a list assignment in scalar context returns
1033 the number of elements produced by the expression on the right hand
1034 side of the assignment.
1035
1036 Comma Operator
1037 Binary "," is the comma operator. In scalar context it evaluates its
1038 left argument, throws that value away, then evaluates its right
1039 argument and returns that value. This is just like C's comma operator.
1040
1041 In list context, it's just the list argument separator, and inserts
1042 both its arguments into the list. These arguments are also evaluated
1043 from left to right.
1044
1045 The "=>" operator is a synonym for the comma except that it causes a
1046 word on its left to be interpreted as a string if it begins with a
1047 letter or underscore and is composed only of letters, digits and
1048 underscores. This includes operands that might otherwise be
1049 interpreted as operators, constants, single number v-strings or
1050 function calls. If in doubt about this behavior, the left operand can
1051 be quoted explicitly.
1052
1053 Otherwise, the "=>" operator behaves exactly as the comma operator or
1054 list argument separator, according to context.
1055
1056 For example:
1057
1058 use constant FOO => "something";
1059
1060 my %h = ( FOO => 23 );
1061
1062 is equivalent to:
1063
1064 my %h = ("FOO", 23);
1065
1066 It is NOT:
1067
1068 my %h = ("something", 23);
1069
1070 The "=>" operator is helpful in documenting the correspondence between
1071 keys and values in hashes, and other paired elements in lists.
1072
1073 %hash = ( $key => $value );
1074 login( $username => $password );
1075
1076 The special quoting behavior ignores precedence, and hence may apply to
1077 part of the left operand:
1078
1079 print time.shift => "bbb";
1080
1081 That example prints something like "1314363215shiftbbb", because the
1082 "=>" implicitly quotes the "shift" immediately on its left, ignoring
1083 the fact that "time.shift" is the entire left operand.
1084
1085 List Operators (Rightward)
1086 On the right side of a list operator, the comma has very low
1087 precedence, such that it controls all comma-separated expressions found
1088 there. The only operators with lower precedence are the logical
1089 operators "and", "or", and "not", which may be used to evaluate calls
1090 to list operators without the need for parentheses:
1091
1092 open HANDLE, "< :utf8", "filename" or die "Can't open: $!\n";
1093
1094 However, some people find that code harder to read than writing it with
1095 parentheses:
1096
1097 open(HANDLE, "< :utf8", "filename") or die "Can't open: $!\n";
1098
1099 in which case you might as well just use the more customary "||"
1100 operator:
1101
1102 open(HANDLE, "< :utf8", "filename") || die "Can't open: $!\n";
1103
1104 See also discussion of list operators in "Terms and List Operators
1105 (Leftward)".
1106
1107 Logical Not
1108 Unary "not" returns the logical negation of the expression to its
1109 right. It's the equivalent of "!" except for the very low precedence.
1110
1111 Logical And
1112 Binary "and" returns the logical conjunction of the two surrounding
1113 expressions. It's equivalent to "&&" except for the very low
1114 precedence. This means that it short-circuits: the right expression is
1115 evaluated only if the left expression is true.
1116
1117 Logical or and Exclusive Or
1118 Binary "or" returns the logical disjunction of the two surrounding
1119 expressions. It's equivalent to "||" except for the very low
1120 precedence. This makes it useful for control flow:
1121
1122 print FH $data or die "Can't write to FH: $!";
1123
1124 This means that it short-circuits: the right expression is evaluated
1125 only if the left expression is false. Due to its precedence, you must
1126 be careful to avoid using it as replacement for the "||" operator. It
1127 usually works out better for flow control than in assignments:
1128
1129 $a = $b or $c; # bug: this is wrong
1130 ($a = $b) or $c; # really means this
1131 $a = $b || $c; # better written this way
1132
1133 However, when it's a list-context assignment and you're trying to use
1134 "||" for control flow, you probably need "or" so that the assignment
1135 takes higher precedence.
1136
1137 @info = stat($file) || die; # oops, scalar sense of stat!
1138 @info = stat($file) or die; # better, now @info gets its due
1139
1140 Then again, you could always use parentheses.
1141
1142 Binary "xor" returns the exclusive-OR of the two surrounding
1143 expressions. It cannot short-circuit (of course).
1144
1145 There is no low precedence operator for defined-OR.
1146
1147 C Operators Missing From Perl
1148 Here is what C has that Perl doesn't:
1149
1150 unary & Address-of operator. (But see the "\" operator for taking a
1151 reference.)
1152
1153 unary * Dereference-address operator. (Perl's prefix dereferencing
1154 operators are typed: $, @, %, and &.)
1155
1156 (TYPE) Type-casting operator.
1157
1158 Quote and Quote-like Operators
1159 While we usually think of quotes as literal values, in Perl they
1160 function as operators, providing various kinds of interpolating and
1161 pattern matching capabilities. Perl provides customary quote
1162 characters for these behaviors, but also provides a way for you to
1163 choose your quote character for any of them. In the following table, a
1164 "{}" represents any pair of delimiters you choose.
1165
1166 Customary Generic Meaning Interpolates
1167 '' q{} Literal no
1168 "" qq{} Literal yes
1169 `` qx{} Command yes*
1170 qw{} Word list no
1171 // m{} Pattern match yes*
1172 qr{} Pattern yes*
1173 s{}{} Substitution yes*
1174 tr{}{} Transliteration no (but see below)
1175 y{}{} Transliteration no (but see below)
1176 <<EOF here-doc yes*
1177
1178 * unless the delimiter is ''.
1179
1180 Non-bracketing delimiters use the same character fore and aft, but the
1181 four sorts of ASCII brackets (round, angle, square, curly) all nest,
1182 which means that
1183
1184 q{foo{bar}baz}
1185
1186 is the same as
1187
1188 'foo{bar}baz'
1189
1190 Note, however, that this does not always work for quoting Perl code:
1191
1192 $s = q{ if($a eq "}") ... }; # WRONG
1193
1194 is a syntax error. The "Text::Balanced" module (standard as of v5.8,
1195 and from CPAN before then) is able to do this properly.
1196
1197 There can be whitespace between the operator and the quoting
1198 characters, except when "#" is being used as the quoting character.
1199 "q#foo#" is parsed as the string "foo", while "q #foo#" is the operator
1200 "q" followed by a comment. Its argument will be taken from the next
1201 line. This allows you to write:
1202
1203 s {foo} # Replace foo
1204 {bar} # with bar.
1205
1206 The following escape sequences are available in constructs that
1207 interpolate, and in transliterations:
1208
1209 Sequence Note Description
1210 \t tab (HT, TAB)
1211 \n newline (NL)
1212 \r return (CR)
1213 \f form feed (FF)
1214 \b backspace (BS)
1215 \a alarm (bell) (BEL)
1216 \e escape (ESC)
1217 \x{263A} [1,8] hex char (example: SMILEY)
1218 \x1b [2,8] restricted range hex char (example: ESC)
1219 \N{name} [3] named Unicode character or character sequence
1220 \N{U+263D} [4,8] Unicode character (example: FIRST QUARTER MOON)
1221 \c[ [5] control char (example: chr(27))
1222 \o{23072} [6,8] octal char (example: SMILEY)
1223 \033 [7,8] restricted range octal char (example: ESC)
1224
1225 [1] The result is the character specified by the hexadecimal number
1226 between the braces. See "[8]" below for details on which
1227 character.
1228
1229 Only hexadecimal digits are valid between the braces. If an invalid
1230 character is encountered, a warning will be issued and the invalid
1231 character and all subsequent characters (valid or invalid) within
1232 the braces will be discarded.
1233
1234 If there are no valid digits between the braces, the generated
1235 character is the NULL character ("\x{00}"). However, an explicit
1236 empty brace ("\x{}") will not cause a warning (currently).
1237
1238 [2] The result is the character specified by the hexadecimal number in
1239 the range 0x00 to 0xFF. See "[8]" below for details on which
1240 character.
1241
1242 Only hexadecimal digits are valid following "\x". When "\x" is
1243 followed by fewer than two valid digits, any valid digits will be
1244 zero-padded. This means that "\x7" will be interpreted as "\x07",
1245 and a lone <\x> will be interpreted as "\x00". Except at the end
1246 of a string, having fewer than two valid digits will result in a
1247 warning. Note that although the warning says the illegal character
1248 is ignored, it is only ignored as part of the escape and will still
1249 be used as the subsequent character in the string. For example:
1250
1251 Original Result Warns?
1252 "\x7" "\x07" no
1253 "\x" "\x00" no
1254 "\x7q" "\x07q" yes
1255 "\xq" "\x00q" yes
1256
1257 [3] The result is the Unicode character or character sequence given by
1258 name. See charnames.
1259
1260 [4] "\N{U+hexadecimal number}" means the Unicode character whose
1261 Unicode code point is hexadecimal number.
1262
1263 [5] The character following "\c" is mapped to some other character as
1264 shown in the table:
1265
1266 Sequence Value
1267 \c@ chr(0)
1268 \cA chr(1)
1269 \ca chr(1)
1270 \cB chr(2)
1271 \cb chr(2)
1272 ...
1273 \cZ chr(26)
1274 \cz chr(26)
1275 \c[ chr(27)
1276 \c] chr(29)
1277 \c^ chr(30)
1278 \c? chr(127)
1279
1280 In other words, it's the character whose code point has had 64
1281 xor'd with its uppercase. "\c?" is DELETE because "ord("@") ^ 64"
1282 is 127, and "\c@" is NULL because the ord of "@" is 64, so xor'ing
1283 64 itself produces 0.
1284
1285 Also, "\c\X" yields " chr(28) . "X"" for any X, but cannot come at
1286 the end of a string, because the backslash would be parsed as
1287 escaping the end quote.
1288
1289 On ASCII platforms, the resulting characters from the list above
1290 are the complete set of ASCII controls. This isn't the case on
1291 EBCDIC platforms; see "OPERATOR DIFFERENCES" in perlebcdic for the
1292 complete list of what these sequences mean on both ASCII and EBCDIC
1293 platforms.
1294
1295 Use of any other character following the "c" besides those listed
1296 above is discouraged, and some are deprecated with the intention of
1297 removing those in a later Perl version. What happens for any of
1298 these other characters currently though, is that the value is
1299 derived by xor'ing with the seventh bit, which is 64.
1300
1301 To get platform independent controls, you can use "\N{...}".
1302
1303 [6] The result is the character specified by the octal number between
1304 the braces. See "[8]" below for details on which character.
1305
1306 If a character that isn't an octal digit is encountered, a warning
1307 is raised, and the value is based on the octal digits before it,
1308 discarding it and all following characters up to the closing brace.
1309 It is a fatal error if there are no octal digits at all.
1310
1311 [7] The result is the character specified by the three-digit octal
1312 number in the range 000 to 777 (but best to not use above 077, see
1313 next paragraph). See "[8]" below for details on which character.
1314
1315 Some contexts allow 2 or even 1 digit, but any usage without
1316 exactly three digits, the first being a zero, may give unintended
1317 results. (For example, in a regular expression it may be confused
1318 with a backreference; see "Octal escapes" in perlrebackslash.)
1319 Starting in Perl 5.14, you may use "\o{}" instead, which avoids all
1320 these problems. Otherwise, it is best to use this construct only
1321 for ordinals "\077" and below, remembering to pad to the left with
1322 zeros to make three digits. For larger ordinals, either use
1323 "\o{}", or convert to something else, such as to hex and use "\x{}"
1324 instead.
1325
1326 Having fewer than 3 digits may lead to a misleading warning message
1327 that says that what follows is ignored. For example, "\128" in the
1328 ASCII character set is equivalent to the two characters "\n8", but
1329 the warning "Illegal octal digit '8' ignored" will be thrown. If
1330 "\n8" is what you want, you can avoid this warning by padding your
1331 octal number with 0's: "\0128".
1332
1333 [8] Several constructs above specify a character by a number. That
1334 number gives the character's position in the character set encoding
1335 (indexed from 0). This is called synonymously its ordinal, code
1336 position, or code point. Perl works on platforms that have a
1337 native encoding currently of either ASCII/Latin1 or EBCDIC, each of
1338 which allow specification of 256 characters. In general, if the
1339 number is 255 (0xFF, 0377) or below, Perl interprets this in the
1340 platform's native encoding. If the number is 256 (0x100, 0400) or
1341 above, Perl interprets it as a Unicode code point and the result is
1342 the corresponding Unicode character. For example "\x{50}" and
1343 "\o{120}" both are the number 80 in decimal, which is less than
1344 256, so the number is interpreted in the native character set
1345 encoding. In ASCII the character in the 80th position (indexed
1346 from 0) is the letter "P", and in EBCDIC it is the ampersand symbol
1347 "&". "\x{100}" and "\o{400}" are both 256 in decimal, so the
1348 number is interpreted as a Unicode code point no matter what the
1349 native encoding is. The name of the character in the 256th
1350 position (indexed by 0) in Unicode is "LATIN CAPITAL LETTER A WITH
1351 MACRON".
1352
1353 There are a couple of exceptions to the above rule.
1354 "\N{U+hex number}" is always interpreted as a Unicode code point,
1355 so that "\N{U+0050}" is "P" even on EBCDIC platforms. And if
1356 "use encoding" is in effect, the number is considered to be in that
1357 encoding, and is translated from that into the platform's native
1358 encoding if there is a corresponding native character; otherwise to
1359 Unicode.
1360
1361 NOTE: Unlike C and other languages, Perl has no "\v" escape sequence
1362 for the vertical tab (VT - ASCII 11), but you may use "\ck" or "\x0b".
1363 ("\v" does have meaning in regular expression patterns in Perl, see
1364 perlre.)
1365
1366 The following escape sequences are available in constructs that
1367 interpolate, but not in transliterations.
1368
1369 \l lowercase next character only
1370 \u titlecase (not uppercase!) next character only
1371 \L lowercase all characters till \E or end of string
1372 \U uppercase all characters till \E or end of string
1373 \F foldcase all characters till \E or end of string
1374 \Q quote (disable) pattern metacharacters till \E or
1375 end of string
1376 \E end either case modification or quoted section
1377 (whichever was last seen)
1378
1379 See "quotemeta" in perlfunc for the exact definition of characters that
1380 are quoted by "\Q".
1381
1382 "\L", "\U", "\F", and "\Q" can stack, in which case you need one "\E"
1383 for each. For example:
1384
1385 say"This \Qquoting \ubusiness \Uhere isn't quite\E done yet,\E is it?";
1386 This quoting\ Business\ HERE\ ISN\'T\ QUITE\ done\ yet\, is it?
1387
1388 If "use locale" is in effect (but not "use locale ':not_characters'"),
1389 the case map used by "\l", "\L", "\u", and "\U" is taken from the
1390 current locale. See perllocale. If Unicode (for example, "\N{}" or
1391 code points of 0x100 or beyond) is being used, the case map used by
1392 "\l", "\L", "\u", and "\U" is as defined by Unicode. That means that
1393 case-mapping a single character can sometimes produce several
1394 characters. Under "use locale", "\F" produces the same results as
1395 "\L".
1396
1397 All systems use the virtual "\n" to represent a line terminator, called
1398 a "newline". There is no such thing as an unvarying, physical newline
1399 character. It is only an illusion that the operating system, device
1400 drivers, C libraries, and Perl all conspire to preserve. Not all
1401 systems read "\r" as ASCII CR and "\n" as ASCII LF. For example, on
1402 the ancient Macs (pre-MacOS X) of yesteryear, these used to be
1403 reversed, and on systems without line terminator, printing "\n" might
1404 emit no actual data. In general, use "\n" when you mean a "newline"
1405 for your system, but use the literal ASCII when you need an exact
1406 character. For example, most networking protocols expect and prefer a
1407 CR+LF ("\015\012" or "\cM\cJ") for line terminators, and although they
1408 often accept just "\012", they seldom tolerate just "\015". If you get
1409 in the habit of using "\n" for networking, you may be burned some day.
1410
1411 For constructs that do interpolate, variables beginning with ""$"" or
1412 ""@"" are interpolated. Subscripted variables such as $a[3] or
1413 "$href->{key}[0]" are also interpolated, as are array and hash slices.
1414 But method calls such as "$obj->meth" are not.
1415
1416 Interpolating an array or slice interpolates the elements in order,
1417 separated by the value of $", so is equivalent to interpolating "join
1418 $", @array". "Punctuation" arrays such as "@*" are usually
1419 interpolated only if the name is enclosed in braces "@{*}", but the
1420 arrays @_, "@+", and "@-" are interpolated even without braces.
1421
1422 For double-quoted strings, the quoting from "\Q" is applied after
1423 interpolation and escapes are processed.
1424
1425 "abc\Qfoo\tbar$s\Exyz"
1426
1427 is equivalent to
1428
1429 "abc" . quotemeta("foo\tbar$s") . "xyz"
1430
1431 For the pattern of regex operators ("qr//", "m//" and "s///"), the
1432 quoting from "\Q" is applied after interpolation is processed, but
1433 before escapes are processed. This allows the pattern to match
1434 literally (except for "$" and "@"). For example, the following matches:
1435
1436 '\s\t' =~ /\Q\s\t/
1437
1438 Because "$" or "@" trigger interpolation, you'll need to use something
1439 like "/\Quser\E\@\Qhost/" to match them literally.
1440
1441 Patterns are subject to an additional level of interpretation as a
1442 regular expression. This is done as a second pass, after variables are
1443 interpolated, so that regular expressions may be incorporated into the
1444 pattern from the variables. If this is not what you want, use "\Q" to
1445 interpolate a variable literally.
1446
1447 Apart from the behavior described above, Perl does not expand multiple
1448 levels of interpolation. In particular, contrary to the expectations
1449 of shell programmers, back-quotes do NOT interpolate within double
1450 quotes, nor do single quotes impede evaluation of variables when used
1451 within double quotes.
1452
1453 Regexp Quote-Like Operators
1454 Here are the quote-like operators that apply to pattern matching and
1455 related activities.
1456
1457 qr/STRING/msixpodual
1458 This operator quotes (and possibly compiles) its STRING as a
1459 regular expression. STRING is interpolated the same way as
1460 PATTERN in "m/PATTERN/". If "'" is used as the delimiter, no
1461 interpolation is done. Returns a Perl value which may be used
1462 instead of the corresponding "/STRING/msixpodual" expression.
1463 The returned value is a normalized version of the original
1464 pattern. It magically differs from a string containing the same
1465 characters: "ref(qr/x/)" returns "Regexp"; however,
1466 dereferencing it is not well defined (you currently get the
1467 normalized version of the original pattern, but this may
1468 change).
1469
1470 For example,
1471
1472 $rex = qr/my.STRING/is;
1473 print $rex; # prints (?si-xm:my.STRING)
1474 s/$rex/foo/;
1475
1476 is equivalent to
1477
1478 s/my.STRING/foo/is;
1479
1480 The result may be used as a subpattern in a match:
1481
1482 $re = qr/$pattern/;
1483 $string =~ /foo${re}bar/; # can be interpolated in other patterns
1484 $string =~ $re; # or used standalone
1485 $string =~ /$re/; # or this way
1486
1487 Since Perl may compile the pattern at the moment of execution
1488 of the qr() operator, using qr() may have speed advantages in
1489 some situations, notably if the result of qr() is used
1490 standalone:
1491
1492 sub match {
1493 my $patterns = shift;
1494 my @compiled = map qr/$_/i, @$patterns;
1495 grep {
1496 my $success = 0;
1497 foreach my $pat (@compiled) {
1498 $success = 1, last if /$pat/;
1499 }
1500 $success;
1501 } @_;
1502 }
1503
1504 Precompilation of the pattern into an internal representation
1505 at the moment of qr() avoids a need to recompile the pattern
1506 every time a match "/$pat/" is attempted. (Perl has many other
1507 internal optimizations, but none would be triggered in the
1508 above example if we did not use qr() operator.)
1509
1510 Options (specified by the following modifiers) are:
1511
1512 m Treat string as multiple lines.
1513 s Treat string as single line. (Make . match a newline)
1514 i Do case-insensitive pattern matching.
1515 x Use extended regular expressions.
1516 p When matching preserve a copy of the matched string so
1517 that ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined.
1518 o Compile pattern only once.
1519 a ASCII-restrict: Use ASCII for \d, \s, \w; specifying two a's
1520 further restricts /i matching so that no ASCII character will
1521 match a non-ASCII one
1522 l Use the locale
1523 u Use Unicode rules
1524 d Use Unicode or native charset, as in 5.12 and earlier
1525
1526 If a precompiled pattern is embedded in a larger pattern then
1527 the effect of "msixpluad" will be propagated appropriately.
1528 The effect the "o" modifier has is not propagated, being
1529 restricted to those patterns explicitly using it.
1530
1531 The last four modifiers listed above, added in Perl 5.14,
1532 control the character set semantics, but "/a" is the only one
1533 you are likely to want to specify explicitly; the other three
1534 are selected automatically by various pragmas.
1535
1536 See perlre for additional information on valid syntax for
1537 STRING, and for a detailed look at the semantics of regular
1538 expressions. In particular, all modifiers except the largely
1539 obsolete "/o" are further explained in "Modifiers" in perlre.
1540 "/o" is described in the next section.
1541
1542 m/PATTERN/msixpodualgc
1543 /PATTERN/msixpodualgc
1544 Searches a string for a pattern match, and in scalar context
1545 returns true if it succeeds, false if it fails. If no string
1546 is specified via the "=~" or "!~" operator, the $_ string is
1547 searched. (The string specified with "=~" need not be an
1548 lvalue--it may be the result of an expression evaluation, but
1549 remember the "=~" binds rather tightly.) See also perlre.
1550
1551 Options are as described in "qr//" above; in addition, the
1552 following match process modifiers are available:
1553
1554 g Match globally, i.e., find all occurrences.
1555 c Do not reset search position on a failed match when /g is in effect.
1556
1557 If "/" is the delimiter then the initial "m" is optional. With
1558 the "m" you can use any pair of non-whitespace (ASCII)
1559 characters as delimiters. This is particularly useful for
1560 matching path names that contain "/", to avoid LTS (leaning
1561 toothpick syndrome). If "?" is the delimiter, then a match-
1562 only-once rule applies, described in "m?PATTERN?" below. If
1563 "'" is the delimiter, no interpolation is performed on the
1564 PATTERN. When using a character valid in an identifier,
1565 whitespace is required after the "m".
1566
1567 PATTERN may contain variables, which will be interpolated every
1568 time the pattern search is evaluated, except for when the
1569 delimiter is a single quote. (Note that $(, $), and $| are not
1570 interpolated because they look like end-of-string tests.) Perl
1571 will not recompile the pattern unless an interpolated variable
1572 that it contains changes. You can force Perl to skip the test
1573 and never recompile by adding a "/o" (which stands for "once")
1574 after the trailing delimiter. Once upon a time, Perl would
1575 recompile regular expressions unnecessarily, and this modifier
1576 was useful to tell it not to do so, in the interests of speed.
1577 But now, the only reasons to use "/o" are either:
1578
1579 1. The variables are thousands of characters long and you know
1580 that they don't change, and you need to wring out the last
1581 little bit of speed by having Perl skip testing for that.
1582 (There is a maintenance penalty for doing this, as
1583 mentioning "/o" constitutes a promise that you won't change
1584 the variables in the pattern. If you do change them, Perl
1585 won't even notice.)
1586
1587 2. you want the pattern to use the initial values of the
1588 variables regardless of whether they change or not. (But
1589 there are saner ways of accomplishing this than using
1590 "/o".)
1591
1592 The bottom line is that using "/o" is almost never a good idea.
1593
1594 The empty pattern //
1595 If the PATTERN evaluates to the empty string, the last
1596 successfully matched regular expression is used instead. In
1597 this case, only the "g" and "c" flags on the empty pattern are
1598 honored; the other flags are taken from the original pattern.
1599 If no match has previously succeeded, this will (silently) act
1600 instead as a genuine empty pattern (which will always match).
1601
1602 Note that it's possible to confuse Perl into thinking "//" (the
1603 empty regex) is really "//" (the defined-or operator). Perl is
1604 usually pretty good about this, but some pathological cases
1605 might trigger this, such as "$a///" (is that "($a) / (//)" or
1606 "$a // /"?) and "print $fh //" ("print $fh(//" or "print($fh
1607 //"?). In all of these examples, Perl will assume you meant
1608 defined-or. If you meant the empty regex, just use parentheses
1609 or spaces to disambiguate, or even prefix the empty regex with
1610 an "m" (so "//" becomes "m//").
1611
1612 Matching in list context
1613 If the "/g" option is not used, "m//" in list context returns a
1614 list consisting of the subexpressions matched by the
1615 parentheses in the pattern, that is, ($1, $2, $3...). (Note
1616 that here $1 etc. are also set, and that this differs from Perl
1617 4's behavior.) When there are no parentheses in the pattern,
1618 the return value is the list "(1)" for success. With or
1619 without parentheses, an empty list is returned upon failure.
1620
1621 Examples:
1622
1623 open(TTY, "+</dev/tty")
1624 || die "can't access /dev/tty: $!";
1625
1626 <TTY> =~ /^y/i && foo(); # do foo if desired
1627
1628 if (/Version: *([0-9.]*)/) { $version = $1; }
1629
1630 next if m#^/usr/spool/uucp#;
1631
1632 # poor man's grep
1633 $arg = shift;
1634 while (<>) {
1635 print if /$arg/o; # compile only once (no longer needed!)
1636 }
1637
1638 if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))
1639
1640 This last example splits $foo into the first two words and the
1641 remainder of the line, and assigns those three fields to $F1,
1642 $F2, and $Etc. The conditional is true if any variables were
1643 assigned; that is, if the pattern matched.
1644
1645 The "/g" modifier specifies global pattern matching--that is,
1646 matching as many times as possible within the string. How it
1647 behaves depends on the context. In list context, it returns a
1648 list of the substrings matched by any capturing parentheses in
1649 the regular expression. If there are no parentheses, it returns
1650 a list of all the matched strings, as if there were parentheses
1651 around the whole pattern.
1652
1653 In scalar context, each execution of "m//g" finds the next
1654 match, returning true if it matches, and false if there is no
1655 further match. The position after the last match can be read
1656 or set using the "pos()" function; see "pos" in perlfunc. A
1657 failed match normally resets the search position to the
1658 beginning of the string, but you can avoid that by adding the
1659 "/c" modifier (for example, "m//gc"). Modifying the target
1660 string also resets the search position.
1661
1662 \G assertion
1663 You can intermix "m//g" matches with "m/\G.../g", where "\G" is
1664 a zero-width assertion that matches the exact position where
1665 the previous "m//g", if any, left off. Without the "/g"
1666 modifier, the "\G" assertion still anchors at "pos()" as it was
1667 at the start of the operation (see "pos" in perlfunc), but the
1668 match is of course only attempted once. Using "\G" without "/g"
1669 on a target string that has not previously had a "/g" match
1670 applied to it is the same as using the "\A" assertion to match
1671 the beginning of the string. Note also that, currently, "\G"
1672 is only properly supported when anchored at the very beginning
1673 of the pattern.
1674
1675 Examples:
1676
1677 # list context
1678 ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);
1679
1680 # scalar context
1681 local $/ = "";
1682 while ($paragraph = <>) {
1683 while ($paragraph =~ /\p{Ll}['")]*[.!?]+['")]*\s/g) {
1684 $sentences++;
1685 }
1686 }
1687 say $sentences;
1688
1689 Here's another way to check for sentences in a paragraph:
1690
1691 my $sentence_rx = qr{
1692 (?: (?<= ^ ) | (?<= \s ) ) # after start-of-string or whitespace
1693 \p{Lu} # capital letter
1694 .*? # a bunch of anything
1695 (?<= \S ) # that ends in non-whitespace
1696 (?<! \b [DMS]r ) # but isn't a common abbreviation
1697 (?<! \b Mrs )
1698 (?<! \b Sra )
1699 (?<! \b St )
1700 [.?!] # followed by a sentence ender
1701 (?= $ | \s ) # in front of end-of-string or whitespace
1702 }sx;
1703 local $/ = "";
1704 while (my $paragraph = <>) {
1705 say "NEW PARAGRAPH";
1706 my $count = 0;
1707 while ($paragraph =~ /($sentence_rx)/g) {
1708 printf "\tgot sentence %d: <%s>\n", ++$count, $1;
1709 }
1710 }
1711
1712 Here's how to use "m//gc" with "\G":
1713
1714 $_ = "ppooqppqq";
1715 while ($i++ < 2) {
1716 print "1: '";
1717 print $1 while /(o)/gc; print "', pos=", pos, "\n";
1718 print "2: '";
1719 print $1 if /\G(q)/gc; print "', pos=", pos, "\n";
1720 print "3: '";
1721 print $1 while /(p)/gc; print "', pos=", pos, "\n";
1722 }
1723 print "Final: '$1', pos=",pos,"\n" if /\G(.)/;
1724
1725 The last example should print:
1726
1727 1: 'oo', pos=4
1728 2: 'q', pos=5
1729 3: 'pp', pos=7
1730 1: '', pos=7
1731 2: 'q', pos=8
1732 3: '', pos=8
1733 Final: 'q', pos=8
1734
1735 Notice that the final match matched "q" instead of "p", which a
1736 match without the "\G" anchor would have done. Also note that
1737 the final match did not update "pos". "pos" is only updated on
1738 a "/g" match. If the final match did indeed match "p", it's a
1739 good bet that you're running a very old (pre-5.6.0) version of
1740 Perl.
1741
1742 A useful idiom for "lex"-like scanners is "/\G.../gc". You can
1743 combine several regexps like this to process a string part-by-
1744 part, doing different actions depending on which regexp
1745 matched. Each regexp tries to match where the previous one
1746 leaves off.
1747
1748 $_ = <<'EOL';
1749 $url = URI::URL->new( "http://example.com/" ); die if $url eq "xXx";
1750 EOL
1751
1752 LOOP: {
1753 print(" digits"), redo LOOP if /\G\d+\b[,.;]?\s*/gc;
1754 print(" lowercase"), redo LOOP if /\G\p{Ll}+\b[,.;]?\s*/gc;
1755 print(" UPPERCASE"), redo LOOP if /\G\p{Lu}+\b[,.;]?\s*/gc;
1756 print(" Capitalized"), redo LOOP if /\G\p{Lu}\p{Ll}+\b[,.;]?\s*/gc;
1757 print(" MiXeD"), redo LOOP if /\G\pL+\b[,.;]?\s*/gc;
1758 print(" alphanumeric"), redo LOOP if /\G[\p{Alpha}\pN]+\b[,.;]?\s*/gc;
1759 print(" line-noise"), redo LOOP if /\G\W+/gc;
1760 print ". That's all!\n";
1761 }
1762
1763 Here is the output (split into several lines):
1764
1765 line-noise lowercase line-noise UPPERCASE line-noise UPPERCASE
1766 line-noise lowercase line-noise lowercase line-noise lowercase
1767 lowercase line-noise lowercase lowercase line-noise lowercase
1768 lowercase line-noise MiXeD line-noise. That's all!
1769
1770 m?PATTERN?msixpodualgc
1771 ?PATTERN?msixpodualgc
1772 This is just like the "m/PATTERN/" search, except that it
1773 matches only once between calls to the reset() operator. This
1774 is a useful optimization when you want to see only the first
1775 occurrence of something in each file of a set of files, for
1776 instance. Only "m??" patterns local to the current package
1777 are reset.
1778
1779 while (<>) {
1780 if (m?^$?) {
1781 # blank line between header and body
1782 }
1783 } continue {
1784 reset if eof; # clear m?? status for next file
1785 }
1786
1787 Another example switched the first "latin1" encoding it finds
1788 to "utf8" in a pod file:
1789
1790 s//utf8/ if m? ^ =encoding \h+ \K latin1 ?x;
1791
1792 The match-once behavior is controlled by the match delimiter
1793 being "?"; with any other delimiter this is the normal "m//"
1794 operator.
1795
1796 For historical reasons, the leading "m" in "m?PATTERN?" is
1797 optional, but the resulting "?PATTERN?" syntax is deprecated,
1798 will warn on usage and might be removed from a future stable
1799 release of Perl (without further notice!).
1800
1801 s/PATTERN/REPLACEMENT/msixpodualgcer
1802 Searches a string for a pattern, and if found, replaces that
1803 pattern with the replacement text and returns the number of
1804 substitutions made. Otherwise it returns false (specifically,
1805 the empty string).
1806
1807 If the "/r" (non-destructive) option is used then it runs the
1808 substitution on a copy of the string and instead of returning
1809 the number of substitutions, it returns the copy whether or not
1810 a substitution occurred. The original string is never changed
1811 when "/r" is used. The copy will always be a plain string,
1812 even if the input is an object or a tied variable.
1813
1814 If no string is specified via the "=~" or "!~" operator, the $_
1815 variable is searched and modified. Unless the "/r" option is
1816 used, the string specified must be a scalar variable, an array
1817 element, a hash element, or an assignment to one of those; that
1818 is, some sort of scalar lvalue.
1819
1820 If the delimiter chosen is a single quote, no interpolation is
1821 done on either the PATTERN or the REPLACEMENT. Otherwise, if
1822 the PATTERN contains a $ that looks like a variable rather than
1823 an end-of-string test, the variable will be interpolated into
1824 the pattern at run-time. If you want the pattern compiled only
1825 once the first time the variable is interpolated, use the "/o"
1826 option. If the pattern evaluates to the empty string, the last
1827 successfully executed regular expression is used instead. See
1828 perlre for further explanation on these.
1829
1830 Options are as with m// with the addition of the following
1831 replacement specific options:
1832
1833 e Evaluate the right side as an expression.
1834 ee Evaluate the right side as a string then eval the result.
1835 r Return substitution and leave the original string untouched.
1836
1837 Any non-whitespace delimiter may replace the slashes. Add
1838 space after the "s" when using a character allowed in
1839 identifiers. If single quotes are used, no interpretation is
1840 done on the replacement string (the "/e" modifier overrides
1841 this, however). Unlike Perl 4, Perl 5 treats backticks as
1842 normal delimiters; the replacement text is not evaluated as a
1843 command. If the PATTERN is delimited by bracketing quotes, the
1844 REPLACEMENT has its own pair of quotes, which may or may not be
1845 bracketing quotes, for example, "s(foo)(bar)" or "s<foo>/bar/".
1846 A "/e" will cause the replacement portion to be treated as a
1847 full-fledged Perl expression and evaluated right then and
1848 there. It is, however, syntax checked at compile-time. A
1849 second "e" modifier will cause the replacement portion to be
1850 "eval"ed before being run as a Perl expression.
1851
1852 Examples:
1853
1854 s/\bgreen\b/mauve/g; # don't change wintergreen
1855
1856 $path =~ s|/usr/bin|/usr/local/bin|;
1857
1858 s/Login: $foo/Login: $bar/; # run-time pattern
1859
1860 ($foo = $bar) =~ s/this/that/; # copy first, then change
1861 ($foo = "$bar") =~ s/this/that/; # convert to string, copy, then change
1862 $foo = $bar =~ s/this/that/r; # Same as above using /r
1863 $foo = $bar =~ s/this/that/r
1864 =~ s/that/the other/r; # Chained substitutes using /r
1865 @foo = map { s/this/that/r } @bar # /r is very useful in maps
1866
1867 $count = ($paragraph =~ s/Mister\b/Mr./g); # get change-count
1868
1869 $_ = 'abc123xyz';
1870 s/\d+/$&*2/e; # yields 'abc246xyz'
1871 s/\d+/sprintf("%5d",$&)/e; # yields 'abc 246xyz'
1872 s/\w/$& x 2/eg; # yields 'aabbcc 224466xxyyzz'
1873
1874 s/%(.)/$percent{$1}/g; # change percent escapes; no /e
1875 s/%(.)/$percent{$1} || $&/ge; # expr now, so /e
1876 s/^=(\w+)/pod($1)/ge; # use function call
1877
1878 $_ = 'abc123xyz';
1879 $a = s/abc/def/r; # $a is 'def123xyz' and
1880 # $_ remains 'abc123xyz'.
1881
1882 # expand variables in $_, but dynamics only, using
1883 # symbolic dereferencing
1884 s/\$(\w+)/${$1}/g;
1885
1886 # Add one to the value of any numbers in the string
1887 s/(\d+)/1 + $1/eg;
1888
1889 # Titlecase words in the last 30 characters only
1890 substr($str, -30) =~ s/\b(\p{Alpha}+)\b/\u\L$1/g;
1891
1892 # This will expand any embedded scalar variable
1893 # (including lexicals) in $_ : First $1 is interpolated
1894 # to the variable name, and then evaluated
1895 s/(\$\w+)/$1/eeg;
1896
1897 # Delete (most) C comments.
1898 $program =~ s {
1899 /\* # Match the opening delimiter.
1900 .*? # Match a minimal number of characters.
1901 \*/ # Match the closing delimiter.
1902 } []gsx;
1903
1904 s/^\s*(.*?)\s*$/$1/; # trim whitespace in $_, expensively
1905
1906 for ($variable) { # trim whitespace in $variable, cheap
1907 s/^\s+//;
1908 s/\s+$//;
1909 }
1910
1911 s/([^ ]*) *([^ ]*)/$2 $1/; # reverse 1st two fields
1912
1913 Note the use of $ instead of \ in the last example. Unlike
1914 sed, we use the \<digit> form in only the left hand side.
1915 Anywhere else it's $<digit>.
1916
1917 Occasionally, you can't use just a "/g" to get all the changes
1918 to occur that you might want. Here are two common cases:
1919
1920 # put commas in the right places in an integer
1921 1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g;
1922
1923 # expand tabs to 8-column spacing
1924 1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
1925
1926 "s///le" is treated as a substitution followed by the "le"
1927 operator, not the "/le" flags. This may change in a future
1928 version of Perl. It produces a warning if warnings are
1929 enabled. To disambiguate, use a space or change the order of
1930 the flags:
1931
1932 s/foo/bar/ le 5; # "le" infix operator
1933 s/foo/bar/el; # "e" and "l" flags
1934
1935 Quote-Like Operators
1936 q/STRING/
1937 'STRING'
1938 A single-quoted, literal string. A backslash represents a
1939 backslash unless followed by the delimiter or another backslash, in
1940 which case the delimiter or backslash is interpolated.
1941
1942 $foo = q!I said, "You said, 'She said it.'"!;
1943 $bar = q('This is it.');
1944 $baz = '\n'; # a two-character string
1945
1946 qq/STRING/
1947 "STRING"
1948 A double-quoted, interpolated string.
1949
1950 $_ .= qq
1951 (*** The previous line contains the naughty word "$1".\n)
1952 if /\b(tcl|java|python)\b/i; # :-)
1953 $baz = "\n"; # a one-character string
1954
1955 qx/STRING/
1956 `STRING`
1957 A string which is (possibly) interpolated and then executed as a
1958 system command with "/bin/sh" or its equivalent. Shell wildcards,
1959 pipes, and redirections will be honored. The collected standard
1960 output of the command is returned; standard error is unaffected.
1961 In scalar context, it comes back as a single (potentially multi-
1962 line) string, or undef if the command failed. In list context,
1963 returns a list of lines (however you've defined lines with $/ or
1964 $INPUT_RECORD_SEPARATOR), or an empty list if the command failed.
1965
1966 Because backticks do not affect standard error, use shell file
1967 descriptor syntax (assuming the shell supports this) if you care to
1968 address this. To capture a command's STDERR and STDOUT together:
1969
1970 $output = `cmd 2>&1`;
1971
1972 To capture a command's STDOUT but discard its STDERR:
1973
1974 $output = `cmd 2>/dev/null`;
1975
1976 To capture a command's STDERR but discard its STDOUT (ordering is
1977 important here):
1978
1979 $output = `cmd 2>&1 1>/dev/null`;
1980
1981 To exchange a command's STDOUT and STDERR in order to capture the
1982 STDERR but leave its STDOUT to come out the old STDERR:
1983
1984 $output = `cmd 3>&1 1>&2 2>&3 3>&-`;
1985
1986 To read both a command's STDOUT and its STDERR separately, it's
1987 easiest to redirect them separately to files, and then read from
1988 those files when the program is done:
1989
1990 system("program args 1>program.stdout 2>program.stderr");
1991
1992 The STDIN filehandle used by the command is inherited from Perl's
1993 STDIN. For example:
1994
1995 open(SPLAT, "stuff") || die "can't open stuff: $!";
1996 open(STDIN, "<&SPLAT") || die "can't dupe SPLAT: $!";
1997 print STDOUT `sort`;
1998
1999 will print the sorted contents of the file named "stuff".
2000
2001 Using single-quote as a delimiter protects the command from Perl's
2002 double-quote interpolation, passing it on to the shell instead:
2003
2004 $perl_info = qx(ps $$); # that's Perl's $$
2005 $shell_info = qx'ps $$'; # that's the new shell's $$
2006
2007 How that string gets evaluated is entirely subject to the command
2008 interpreter on your system. On most platforms, you will have to
2009 protect shell metacharacters if you want them treated literally.
2010 This is in practice difficult to do, as it's unclear how to escape
2011 which characters. See perlsec for a clean and safe example of a
2012 manual fork() and exec() to emulate backticks safely.
2013
2014 On some platforms (notably DOS-like ones), the shell may not be
2015 capable of dealing with multiline commands, so putting newlines in
2016 the string may not get you what you want. You may be able to
2017 evaluate multiple commands in a single line by separating them with
2018 the command separator character, if your shell supports that (for
2019 example, ";" on many Unix shells and "&" on the Windows NT "cmd"
2020 shell).
2021
2022 Beginning with v5.6.0, Perl will attempt to flush all files opened
2023 for output before starting the child process, but this may not be
2024 supported on some platforms (see perlport). To be safe, you may
2025 need to set $| ($AUTOFLUSH in English) or call the "autoflush()"
2026 method of "IO::Handle" on any open handles.
2027
2028 Beware that some command shells may place restrictions on the
2029 length of the command line. You must ensure your strings don't
2030 exceed this limit after any necessary interpolations. See the
2031 platform-specific release notes for more details about your
2032 particular environment.
2033
2034 Using this operator can lead to programs that are difficult to
2035 port, because the shell commands called vary between systems, and
2036 may in fact not be present at all. As one example, the "type"
2037 command under the POSIX shell is very different from the "type"
2038 command under DOS. That doesn't mean you should go out of your way
2039 to avoid backticks when they're the right way to get something
2040 done. Perl was made to be a glue language, and one of the things
2041 it glues together is commands. Just understand what you're getting
2042 yourself into.
2043
2044 See "I/O Operators" for more discussion.
2045
2046 qw/STRING/
2047 Evaluates to a list of the words extracted out of STRING, using
2048 embedded whitespace as the word delimiters. It can be understood
2049 as being roughly equivalent to:
2050
2051 split(" ", q/STRING/);
2052
2053 the differences being that it generates a real list at compile
2054 time, and in scalar context it returns the last element in the
2055 list. So this expression:
2056
2057 qw(foo bar baz)
2058
2059 is semantically equivalent to the list:
2060
2061 "foo", "bar", "baz"
2062
2063 Some frequently seen examples:
2064
2065 use POSIX qw( setlocale localeconv )
2066 @EXPORT = qw( foo bar baz );
2067
2068 A common mistake is to try to separate the words with comma or to
2069 put comments into a multi-line "qw"-string. For this reason, the
2070 "use warnings" pragma and the -w switch (that is, the $^W variable)
2071 produces warnings if the STRING contains the "," or the "#"
2072 character.
2073
2074 tr/SEARCHLIST/REPLACEMENTLIST/cdsr
2075 y/SEARCHLIST/REPLACEMENTLIST/cdsr
2076 Transliterates all occurrences of the characters found in the
2077 search list with the corresponding character in the replacement
2078 list. It returns the number of characters replaced or deleted. If
2079 no string is specified via the "=~" or "!~" operator, the $_ string
2080 is transliterated.
2081
2082 If the "/r" (non-destructive) option is present, a new copy of the
2083 string is made and its characters transliterated, and this copy is
2084 returned no matter whether it was modified or not: the original
2085 string is always left unchanged. The new copy is always a plain
2086 string, even if the input string is an object or a tied variable.
2087
2088 Unless the "/r" option is used, the string specified with "=~" must
2089 be a scalar variable, an array element, a hash element, or an
2090 assignment to one of those; in other words, an lvalue.
2091
2092 A character range may be specified with a hyphen, so "tr/A-J/0-9/"
2093 does the same replacement as "tr/ACEGIBDFHJ/0246813579/". For sed
2094 devotees, "y" is provided as a synonym for "tr". If the SEARCHLIST
2095 is delimited by bracketing quotes, the REPLACEMENTLIST has its own
2096 pair of quotes, which may or may not be bracketing quotes; for
2097 example, "tr[aeiouy][yuoiea]" or "tr(+\-*/)/ABCD/".
2098
2099 Note that "tr" does not do regular expression character classes
2100 such as "\d" or "\pL". The "tr" operator is not equivalent to the
2101 tr(1) utility. If you want to map strings between lower/upper
2102 cases, see "lc" in perlfunc and "uc" in perlfunc, and in general
2103 consider using the "s" operator if you need regular expressions.
2104 The "\U", "\u", "\L", and "\l" string-interpolation escapes on the
2105 right side of a substitution operator will perform correct case-
2106 mappings, but "tr[a-z][A-Z]" will not (except sometimes on legacy
2107 7-bit data).
2108
2109 Note also that the whole range idea is rather unportable between
2110 character sets--and even within character sets they may cause
2111 results you probably didn't expect. A sound principle is to use
2112 only ranges that begin from and end at either alphabets of equal
2113 case (a-e, A-E), or digits (0-4). Anything else is unsafe. If in
2114 doubt, spell out the character sets in full.
2115
2116 Options:
2117
2118 c Complement the SEARCHLIST.
2119 d Delete found but unreplaced characters.
2120 s Squash duplicate replaced characters.
2121 r Return the modified string and leave the original string
2122 untouched.
2123
2124 If the "/c" modifier is specified, the SEARCHLIST character set is
2125 complemented. If the "/d" modifier is specified, any characters
2126 specified by SEARCHLIST not found in REPLACEMENTLIST are deleted.
2127 (Note that this is slightly more flexible than the behavior of some
2128 tr programs, which delete anything they find in the SEARCHLIST,
2129 period.) If the "/s" modifier is specified, sequences of characters
2130 that were transliterated to the same character are squashed down to
2131 a single instance of the character.
2132
2133 If the "/d" modifier is used, the REPLACEMENTLIST is always
2134 interpreted exactly as specified. Otherwise, if the
2135 REPLACEMENTLIST is shorter than the SEARCHLIST, the final character
2136 is replicated till it is long enough. If the REPLACEMENTLIST is
2137 empty, the SEARCHLIST is replicated. This latter is useful for
2138 counting characters in a class or for squashing character sequences
2139 in a class.
2140
2141 Examples:
2142
2143 $ARGV[1] =~ tr/A-Z/a-z/; # canonicalize to lower case ASCII
2144
2145 $cnt = tr/*/*/; # count the stars in $_
2146
2147 $cnt = $sky =~ tr/*/*/; # count the stars in $sky
2148
2149 $cnt = tr/0-9//; # count the digits in $_
2150
2151 tr/a-zA-Z//s; # bookkeeper -> bokeper
2152
2153 ($HOST = $host) =~ tr/a-z/A-Z/;
2154 $HOST = $host =~ tr/a-z/A-Z/r; # same thing
2155
2156 $HOST = $host =~ tr/a-z/A-Z/r # chained with s///r
2157 =~ s/:/ -p/r;
2158
2159 tr/a-zA-Z/ /cs; # change non-alphas to single space
2160
2161 @stripped = map tr/a-zA-Z/ /csr, @original;
2162 # /r with map
2163
2164 tr [\200-\377]
2165 [\000-\177]; # wickedly delete 8th bit
2166
2167 If multiple transliterations are given for a character, only the
2168 first one is used:
2169
2170 tr/AAA/XYZ/
2171
2172 will transliterate any A to X.
2173
2174 Because the transliteration table is built at compile time, neither
2175 the SEARCHLIST nor the REPLACEMENTLIST are subjected to double
2176 quote interpolation. That means that if you want to use variables,
2177 you must use an eval():
2178
2179 eval "tr/$oldlist/$newlist/";
2180 die $@ if $@;
2181
2182 eval "tr/$oldlist/$newlist/, 1" or die $@;
2183
2184 <<EOF
2185 A line-oriented form of quoting is based on the shell "here-
2186 document" syntax. Following a "<<" you specify a string to
2187 terminate the quoted material, and all lines following the current
2188 line down to the terminating string are the value of the item.
2189
2190 The terminating string may be either an identifier (a word), or
2191 some quoted text. An unquoted identifier works like double quotes.
2192 There may not be a space between the "<<" and the identifier,
2193 unless the identifier is explicitly quoted. (If you put a space it
2194 will be treated as a null identifier, which is valid, and matches
2195 the first empty line.) The terminating string must appear by
2196 itself (unquoted and with no surrounding whitespace) on the
2197 terminating line.
2198
2199 If the terminating string is quoted, the type of quotes used
2200 determine the treatment of the text.
2201
2202 Double Quotes
2203 Double quotes indicate that the text will be interpolated using
2204 exactly the same rules as normal double quoted strings.
2205
2206 print <<EOF;
2207 The price is $Price.
2208 EOF
2209
2210 print << "EOF"; # same as above
2211 The price is $Price.
2212 EOF
2213
2214 Single Quotes
2215 Single quotes indicate the text is to be treated literally with
2216 no interpolation of its content. This is similar to single
2217 quoted strings except that backslashes have no special meaning,
2218 with "\\" being treated as two backslashes and not one as they
2219 would in every other quoting construct.
2220
2221 Just as in the shell, a backslashed bareword following the "<<"
2222 means the same thing as a single-quoted string does:
2223
2224 $cost = <<'VISTA'; # hasta la ...
2225 That'll be $10 please, ma'am.
2226 VISTA
2227
2228 $cost = <<\VISTA; # Same thing!
2229 That'll be $10 please, ma'am.
2230 VISTA
2231
2232 This is the only form of quoting in perl where there is no need
2233 to worry about escaping content, something that code generators
2234 can and do make good use of.
2235
2236 Backticks
2237 The content of the here doc is treated just as it would be if
2238 the string were embedded in backticks. Thus the content is
2239 interpolated as though it were double quoted and then executed
2240 via the shell, with the results of the execution returned.
2241
2242 print << `EOC`; # execute command and get results
2243 echo hi there
2244 EOC
2245
2246 It is possible to stack multiple here-docs in a row:
2247
2248 print <<"foo", <<"bar"; # you can stack them
2249 I said foo.
2250 foo
2251 I said bar.
2252 bar
2253
2254 myfunc(<< "THIS", 23, <<'THAT');
2255 Here's a line
2256 or two.
2257 THIS
2258 and here's another.
2259 THAT
2260
2261 Just don't forget that you have to put a semicolon on the end to
2262 finish the statement, as Perl doesn't know you're not going to try
2263 to do this:
2264
2265 print <<ABC
2266 179231
2267 ABC
2268 + 20;
2269
2270 If you want to remove the line terminator from your here-docs, use
2271 "chomp()".
2272
2273 chomp($string = <<'END');
2274 This is a string.
2275 END
2276
2277 If you want your here-docs to be indented with the rest of the
2278 code, you'll need to remove leading whitespace from each line
2279 manually:
2280
2281 ($quote = <<'FINIS') =~ s/^\s+//gm;
2282 The Road goes ever on and on,
2283 down from the door where it began.
2284 FINIS
2285
2286 If you use a here-doc within a delimited construct, such as in
2287 "s///eg", the quoted material must come on the lines following the
2288 final delimiter. So instead of
2289
2290 s/this/<<E . 'that'
2291 the other
2292 E
2293 . 'more '/eg;
2294
2295 you have to write
2296
2297 s/this/<<E . 'that'
2298 . 'more '/eg;
2299 the other
2300 E
2301
2302 If the terminating identifier is on the last line of the program,
2303 you must be sure there is a newline after it; otherwise, Perl will
2304 give the warning Can't find string terminator "END" anywhere before
2305 EOF....
2306
2307 Additionally, quoting rules for the end-of-string identifier are
2308 unrelated to Perl's quoting rules. "q()", "qq()", and the like are
2309 not supported in place of '' and "", and the only interpolation is
2310 for backslashing the quoting character:
2311
2312 print << "abc\"def";
2313 testing...
2314 abc"def
2315
2316 Finally, quoted strings cannot span multiple lines. The general
2317 rule is that the identifier must be a string literal. Stick with
2318 that, and you should be safe.
2319
2320 Gory details of parsing quoted constructs
2321 When presented with something that might have several different
2322 interpretations, Perl uses the DWIM (that's "Do What I Mean") principle
2323 to pick the most probable interpretation. This strategy is so
2324 successful that Perl programmers often do not suspect the ambivalence
2325 of what they write. But from time to time, Perl's notions differ
2326 substantially from what the author honestly meant.
2327
2328 This section hopes to clarify how Perl handles quoted constructs.
2329 Although the most common reason to learn this is to unravel
2330 labyrinthine regular expressions, because the initial steps of parsing
2331 are the same for all quoting operators, they are all discussed
2332 together.
2333
2334 The most important Perl parsing rule is the first one discussed below:
2335 when processing a quoted construct, Perl first finds the end of that
2336 construct, then interprets its contents. If you understand this rule,
2337 you may skip the rest of this section on the first reading. The other
2338 rules are likely to contradict the user's expectations much less
2339 frequently than this first one.
2340
2341 Some passes discussed below are performed concurrently, but because
2342 their results are the same, we consider them individually. For
2343 different quoting constructs, Perl performs different numbers of
2344 passes, from one to four, but these passes are always performed in the
2345 same order.
2346
2347 Finding the end
2348 The first pass is finding the end of the quoted construct, where
2349 the information about the delimiters is used in parsing. During
2350 this search, text between the starting and ending delimiters is
2351 copied to a safe location. The text copied gets delimiter-
2352 independent.
2353
2354 If the construct is a here-doc, the ending delimiter is a line that
2355 has a terminating string as the content. Therefore "<<EOF" is
2356 terminated by "EOF" immediately followed by "\n" and starting from
2357 the first column of the terminating line. When searching for the
2358 terminating line of a here-doc, nothing is skipped. In other words,
2359 lines after the here-doc syntax are compared with the terminating
2360 string line by line.
2361
2362 For the constructs except here-docs, single characters are used as
2363 starting and ending delimiters. If the starting delimiter is an
2364 opening punctuation (that is "(", "[", "{", or "<"), the ending
2365 delimiter is the corresponding closing punctuation (that is ")",
2366 "]", "}", or ">"). If the starting delimiter is an unpaired
2367 character like "/" or a closing punctuation, the ending delimiter
2368 is same as the starting delimiter. Therefore a "/" terminates a
2369 "qq//" construct, while a "]" terminates "qq[]" and "qq]]"
2370 constructs.
2371
2372 When searching for single-character delimiters, escaped delimiters
2373 and "\\" are skipped. For example, while searching for terminating
2374 "/", combinations of "\\" and "\/" are skipped. If the delimiters
2375 are bracketing, nested pairs are also skipped. For example, while
2376 searching for closing "]" paired with the opening "[", combinations
2377 of "\\", "\]", and "\[" are all skipped, and nested "[" and "]" are
2378 skipped as well. However, when backslashes are used as the
2379 delimiters (like "qq\\" and "tr\\\"), nothing is skipped. During
2380 the search for the end, backslashes that escape delimiters or
2381 backslashes are removed (exactly speaking, they are not copied to
2382 the safe location).
2383
2384 For constructs with three-part delimiters ("s///", "y///", and
2385 "tr///"), the search is repeated once more. If the first delimiter
2386 is not an opening punctuation, three delimiters must be same such
2387 as "s!!!" and "tr)))", in which case the second delimiter
2388 terminates the left part and starts the right part at once. If the
2389 left part is delimited by bracketing punctuation (that is "()",
2390 "[]", "{}", or "<>"), the right part needs another pair of
2391 delimiters such as "s(){}" and "tr[]//". In these cases,
2392 whitespace and comments are allowed between both parts, though the
2393 comment must follow at least one whitespace character; otherwise a
2394 character expected as the start of the comment may be regarded as
2395 the starting delimiter of the right part.
2396
2397 During this search no attention is paid to the semantics of the
2398 construct. Thus:
2399
2400 "$hash{"$foo/$bar"}"
2401
2402 or:
2403
2404 m/
2405 bar # NOT a comment, this slash / terminated m//!
2406 /x
2407
2408 do not form legal quoted expressions. The quoted part ends on the
2409 first """ and "/", and the rest happens to be a syntax error.
2410 Because the slash that terminated "m//" was followed by a "SPACE",
2411 the example above is not "m//x", but rather "m//" with no "/x"
2412 modifier. So the embedded "#" is interpreted as a literal "#".
2413
2414 Also no attention is paid to "\c\" (multichar control char syntax)
2415 during this search. Thus the second "\" in "qq/\c\/" is interpreted
2416 as a part of "\/", and the following "/" is not recognized as a
2417 delimiter. Instead, use "\034" or "\x1c" at the end of quoted
2418 constructs.
2419
2420 Interpolation
2421 The next step is interpolation in the text obtained, which is now
2422 delimiter-independent. There are multiple cases.
2423
2424 "<<'EOF'"
2425 No interpolation is performed. Note that the combination "\\"
2426 is left intact, since escaped delimiters are not available for
2427 here-docs.
2428
2429 "m''", the pattern of "s'''"
2430 No interpolation is performed at this stage. Any backslashed
2431 sequences including "\\" are treated at the stage to "parsing
2432 regular expressions".
2433
2434 '', "q//", "tr'''", "y'''", the replacement of "s'''"
2435 The only interpolation is removal of "\" from pairs of "\\".
2436 Therefore "-" in "tr'''" and "y'''" is treated literally as a
2437 hyphen and no character range is available. "\1" in the
2438 replacement of "s'''" does not work as $1.
2439
2440 "tr///", "y///"
2441 No variable interpolation occurs. String modifying
2442 combinations for case and quoting such as "\Q", "\U", and "\E"
2443 are not recognized. The other escape sequences such as "\200"
2444 and "\t" and backslashed characters such as "\\" and "\-" are
2445 converted to appropriate literals. The character "-" is
2446 treated specially and therefore "\-" is treated as a literal
2447 "-".
2448
2449 "", "``", "qq//", "qx//", "<file*glob>", "<<"EOF""
2450 "\Q", "\U", "\u", "\L", "\l", "\F" (possibly paired with "\E")
2451 are converted to corresponding Perl constructs. Thus,
2452 "$foo\Qbaz$bar" is converted to "$foo . (quotemeta("baz" .
2453 $bar))" internally. The other escape sequences such as "\200"
2454 and "\t" and backslashed characters such as "\\" and "\-" are
2455 replaced with appropriate expansions.
2456
2457 Let it be stressed that whatever falls between "\Q" and "\E" is
2458 interpolated in the usual way. Something like "\Q\\E" has no
2459 "\E" inside. instead, it has "\Q", "\\", and "E", so the
2460 result is the same as for "\\\\E". As a general rule,
2461 backslashes between "\Q" and "\E" may lead to counterintuitive
2462 results. So, "\Q\t\E" is converted to "quotemeta("\t")", which
2463 is the same as "\\\t" (since TAB is not alphanumeric). Note
2464 also that:
2465
2466 $str = '\t';
2467 return "\Q$str";
2468
2469 may be closer to the conjectural intention of the writer of
2470 "\Q\t\E".
2471
2472 Interpolated scalars and arrays are converted internally to the
2473 "join" and "." catenation operations. Thus, "$foo XXX '@arr'"
2474 becomes:
2475
2476 $foo . " XXX '" . (join $", @arr) . "'";
2477
2478 All operations above are performed simultaneously, left to
2479 right.
2480
2481 Because the result of "\Q STRING \E" has all metacharacters
2482 quoted, there is no way to insert a literal "$" or "@" inside a
2483 "\Q\E" pair. If protected by "\", "$" will be quoted to became
2484 "\\\$"; if not, it is interpreted as the start of an
2485 interpolated scalar.
2486
2487 Note also that the interpolation code needs to make a decision
2488 on where the interpolated scalar ends. For instance, whether
2489 "a $b -> {c}" really means:
2490
2491 "a " . $b . " -> {c}";
2492
2493 or:
2494
2495 "a " . $b -> {c};
2496
2497 Most of the time, the longest possible text that does not
2498 include spaces between components and which contains matching
2499 braces or brackets. because the outcome may be determined by
2500 voting based on heuristic estimators, the result is not
2501 strictly predictable. Fortunately, it's usually correct for
2502 ambiguous cases.
2503
2504 the replacement of "s///"
2505 Processing of "\Q", "\U", "\u", "\L", "\l", "\F" and
2506 interpolation happens as with "qq//" constructs.
2507
2508 It is at this step that "\1" is begrudgingly converted to $1 in
2509 the replacement text of "s///", in order to correct the
2510 incorrigible sed hackers who haven't picked up the saner idiom
2511 yet. A warning is emitted if the "use warnings" pragma or the
2512 -w command-line flag (that is, the $^W variable) was set.
2513
2514 "RE" in "?RE?", "/RE/", "m/RE/", "s/RE/foo/",
2515 Processing of "\Q", "\U", "\u", "\L", "\l", "\F", "\E", and
2516 interpolation happens (almost) as with "qq//" constructs.
2517
2518 Processing of "\N{...}" is also done here, and compiled into an
2519 intermediate form for the regex compiler. (This is because, as
2520 mentioned below, the regex compilation may be done at execution
2521 time, and "\N{...}" is a compile-time construct.)
2522
2523 However any other combinations of "\" followed by a character
2524 are not substituted but only skipped, in order to parse them as
2525 regular expressions at the following step. As "\c" is skipped
2526 at this step, "@" of "\c@" in RE is possibly treated as an
2527 array symbol (for example @foo), even though the same text in
2528 "qq//" gives interpolation of "\c@".
2529
2530 Moreover, inside "(?{BLOCK})", "(?# comment )", and a
2531 "#"-comment in a "//x"-regular expression, no processing is
2532 performed whatsoever. This is the first step at which the
2533 presence of the "//x" modifier is relevant.
2534
2535 Interpolation in patterns has several quirks: $|, $(, $), "@+"
2536 and "@-" are not interpolated, and constructs $var[SOMETHING]
2537 are voted (by several different estimators) to be either an
2538 array element or $var followed by an RE alternative. This is
2539 where the notation "${arr[$bar]}" comes handy: "/${arr[0-9]}/"
2540 is interpreted as array element "-9", not as a regular
2541 expression from the variable $arr followed by a digit, which
2542 would be the interpretation of "/$arr[0-9]/". Since voting
2543 among different estimators may occur, the result is not
2544 predictable.
2545
2546 The lack of processing of "\\" creates specific restrictions on
2547 the post-processed text. If the delimiter is "/", one cannot
2548 get the combination "\/" into the result of this step. "/"
2549 will finish the regular expression, "\/" will be stripped to
2550 "/" on the previous step, and "\\/" will be left as is.
2551 Because "/" is equivalent to "\/" inside a regular expression,
2552 this does not matter unless the delimiter happens to be
2553 character special to the RE engine, such as in "s*foo*bar*",
2554 "m[foo]", or "?foo?"; or an alphanumeric char, as in:
2555
2556 m m ^ a \s* b mmx;
2557
2558 In the RE above, which is intentionally obfuscated for
2559 illustration, the delimiter is "m", the modifier is "mx", and
2560 after delimiter-removal the RE is the same as for "m/ ^ a \s* b
2561 /mx". There's more than one reason you're encouraged to
2562 restrict your delimiters to non-alphanumeric, non-whitespace
2563 choices.
2564
2565 This step is the last one for all constructs except regular
2566 expressions, which are processed further.
2567
2568 parsing regular expressions
2569 Previous steps were performed during the compilation of Perl code,
2570 but this one happens at run time, although it may be optimized to
2571 be calculated at compile time if appropriate. After preprocessing
2572 described above, and possibly after evaluation if concatenation,
2573 joining, casing translation, or metaquoting are involved, the
2574 resulting string is passed to the RE engine for compilation.
2575
2576 Whatever happens in the RE engine might be better discussed in
2577 perlre, but for the sake of continuity, we shall do so here.
2578
2579 This is another step where the presence of the "//x" modifier is
2580 relevant. The RE engine scans the string from left to right and
2581 converts it to a finite automaton.
2582
2583 Backslashed characters are either replaced with corresponding
2584 literal strings (as with "\{"), or else they generate special nodes
2585 in the finite automaton (as with "\b"). Characters special to the
2586 RE engine (such as "|") generate corresponding nodes or groups of
2587 nodes. "(?#...)" comments are ignored. All the rest is either
2588 converted to literal strings to match, or else is ignored (as is
2589 whitespace and "#"-style comments if "//x" is present).
2590
2591 Parsing of the bracketed character class construct, "[...]", is
2592 rather different than the rule used for the rest of the pattern.
2593 The terminator of this construct is found using the same rules as
2594 for finding the terminator of a "{}"-delimited construct, the only
2595 exception being that "]" immediately following "[" is treated as
2596 though preceded by a backslash. Similarly, the terminator of
2597 "(?{...})" is found using the same rules as for finding the
2598 terminator of a "{}"-delimited construct.
2599
2600 It is possible to inspect both the string given to RE engine and
2601 the resulting finite automaton. See the arguments
2602 "debug"/"debugcolor" in the "use re" pragma, as well as Perl's -Dr
2603 command-line switch documented in "Command Switches" in perlrun.
2604
2605 Optimization of regular expressions
2606 This step is listed for completeness only. Since it does not
2607 change semantics, details of this step are not documented and are
2608 subject to change without notice. This step is performed over the
2609 finite automaton that was generated during the previous pass.
2610
2611 It is at this stage that "split()" silently optimizes "/^/" to mean
2612 "/^/m".
2613
2614 I/O Operators
2615 There are several I/O operators you should know about.
2616
2617 A string enclosed by backticks (grave accents) first undergoes double-
2618 quote interpolation. It is then interpreted as an external command,
2619 and the output of that command is the value of the backtick string,
2620 like in a shell. In scalar context, a single string consisting of all
2621 output is returned. In list context, a list of values is returned, one
2622 per line of output. (You can set $/ to use a different line
2623 terminator.) The command is executed each time the pseudo-literal is
2624 evaluated. The status value of the command is returned in $? (see
2625 perlvar for the interpretation of $?). Unlike in csh, no translation
2626 is done on the return data--newlines remain newlines. Unlike in any of
2627 the shells, single quotes do not hide variable names in the command
2628 from interpretation. To pass a literal dollar-sign through to the
2629 shell you need to hide it with a backslash. The generalized form of
2630 backticks is "qx//". (Because backticks always undergo shell expansion
2631 as well, see perlsec for security concerns.)
2632
2633 In scalar context, evaluating a filehandle in angle brackets yields the
2634 next line from that file (the newline, if any, included), or "undef" at
2635 end-of-file or on error. When $/ is set to "undef" (sometimes known as
2636 file-slurp mode) and the file is empty, it returns '' the first time,
2637 followed by "undef" subsequently.
2638
2639 Ordinarily you must assign the returned value to a variable, but there
2640 is one situation where an automatic assignment happens. If and only if
2641 the input symbol is the only thing inside the conditional of a "while"
2642 statement (even if disguised as a "for(;;)" loop), the value is
2643 automatically assigned to the global variable $_, destroying whatever
2644 was there previously. (This may seem like an odd thing to you, but
2645 you'll use the construct in almost every Perl script you write.) The
2646 $_ variable is not implicitly localized. You'll have to put a "local
2647 $_;" before the loop if you want that to happen.
2648
2649 The following lines are equivalent:
2650
2651 while (defined($_ = <STDIN>)) { print; }
2652 while ($_ = <STDIN>) { print; }
2653 while (<STDIN>) { print; }
2654 for (;<STDIN>;) { print; }
2655 print while defined($_ = <STDIN>);
2656 print while ($_ = <STDIN>);
2657 print while <STDIN>;
2658
2659 This also behaves similarly, but assigns to a lexical variable instead
2660 of to $_:
2661
2662 while (my $line = <STDIN>) { print $line }
2663
2664 In these loop constructs, the assigned value (whether assignment is
2665 automatic or explicit) is then tested to see whether it is defined.
2666 The defined test avoids problems where the line has a string value that
2667 would be treated as false by Perl; for example a "" or a "0" with no
2668 trailing newline. If you really mean for such values to terminate the
2669 loop, they should be tested for explicitly:
2670
2671 while (($_ = <STDIN>) ne '0') { ... }
2672 while (<STDIN>) { last unless $_; ... }
2673
2674 In other boolean contexts, "<FILEHANDLE>" without an explicit "defined"
2675 test or comparison elicits a warning if the "use warnings" pragma or
2676 the -w command-line switch (the $^W variable) is in effect.
2677
2678 The filehandles STDIN, STDOUT, and STDERR are predefined. (The
2679 filehandles "stdin", "stdout", and "stderr" will also work except in
2680 packages, where they would be interpreted as local identifiers rather
2681 than global.) Additional filehandles may be created with the open()
2682 function, amongst others. See perlopentut and "open" in perlfunc for
2683 details on this.
2684
2685 If a <FILEHANDLE> is used in a context that is looking for a list, a
2686 list comprising all input lines is returned, one line per list element.
2687 It's easy to grow to a rather large data space this way, so use with
2688 care.
2689
2690 <FILEHANDLE> may also be spelled "readline(*FILEHANDLE)". See
2691 "readline" in perlfunc.
2692
2693 The null filehandle <> is special: it can be used to emulate the
2694 behavior of sed and awk, and any other Unix filter program that takes a
2695 list of filenames, doing the same to each line of input from all of
2696 them. Input from <> comes either from standard input, or from each
2697 file listed on the command line. Here's how it works: the first time
2698 <> is evaluated, the @ARGV array is checked, and if it is empty,
2699 $ARGV[0] is set to "-", which when opened gives you standard input.
2700 The @ARGV array is then processed as a list of filenames. The loop
2701
2702 while (<>) {
2703 ... # code for each line
2704 }
2705
2706 is equivalent to the following Perl-like pseudo code:
2707
2708 unshift(@ARGV, '-') unless @ARGV;
2709 while ($ARGV = shift) {
2710 open(ARGV, $ARGV);
2711 while (<ARGV>) {
2712 ... # code for each line
2713 }
2714 }
2715
2716 except that it isn't so cumbersome to say, and will actually work. It
2717 really does shift the @ARGV array and put the current filename into the
2718 $ARGV variable. It also uses filehandle ARGV internally. <> is just a
2719 synonym for <ARGV>, which is magical. (The pseudo code above doesn't
2720 work because it treats <ARGV> as non-magical.)
2721
2722 Since the null filehandle uses the two argument form of "open" in
2723 perlfunc it interprets special characters, so if you have a script like
2724 this:
2725
2726 while (<>) {
2727 print;
2728 }
2729
2730 and call it with "perl dangerous.pl 'rm -rfv *|'", it actually opens a
2731 pipe, executes the "rm" command and reads "rm"'s output from that pipe.
2732 If you want all items in @ARGV to be interpreted as file names, you can
2733 use the module "ARGV::readonly" from CPAN.
2734
2735 You can modify @ARGV before the first <> as long as the array ends up
2736 containing the list of filenames you really want. Line numbers ($.)
2737 continue as though the input were one big happy file. See the example
2738 in "eof" in perlfunc for how to reset line numbers on each file.
2739
2740 If you want to set @ARGV to your own list of files, go right ahead.
2741 This sets @ARGV to all plain text files if no @ARGV was given:
2742
2743 @ARGV = grep { -f && -T } glob('*') unless @ARGV;
2744
2745 You can even set them to pipe commands. For example, this
2746 automatically filters compressed arguments through gzip:
2747
2748 @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc < $_ |" : $_ } @ARGV;
2749
2750 If you want to pass switches into your script, you can use one of the
2751 Getopts modules or put a loop on the front like this:
2752
2753 while ($_ = $ARGV[0], /^-/) {
2754 shift;
2755 last if /^--$/;
2756 if (/^-D(.*)/) { $debug = $1 }
2757 if (/^-v/) { $verbose++ }
2758 # ... # other switches
2759 }
2760
2761 while (<>) {
2762 # ... # code for each line
2763 }
2764
2765 The <> symbol will return "undef" for end-of-file only once. If you
2766 call it again after this, it will assume you are processing another
2767 @ARGV list, and if you haven't set @ARGV, will read input from STDIN.
2768
2769 If what the angle brackets contain is a simple scalar variable (for
2770 example, <$foo>), then that variable contains the name of the
2771 filehandle to input from, or its typeglob, or a reference to the same.
2772 For example:
2773
2774 $fh = \*STDIN;
2775 $line = <$fh>;
2776
2777 If what's within the angle brackets is neither a filehandle nor a
2778 simple scalar variable containing a filehandle name, typeglob, or
2779 typeglob reference, it is interpreted as a filename pattern to be
2780 globbed, and either a list of filenames or the next filename in the
2781 list is returned, depending on context. This distinction is determined
2782 on syntactic grounds alone. That means "<$x>" is always a readline()
2783 from an indirect handle, but "<$hash{key}>" is always a glob(). That's
2784 because $x is a simple scalar variable, but $hash{key} is not--it's a
2785 hash element. Even "<$x >" (note the extra space) is treated as
2786 "glob("$x ")", not "readline($x)".
2787
2788 One level of double-quote interpretation is done first, but you can't
2789 say "<$foo>" because that's an indirect filehandle as explained in the
2790 previous paragraph. (In older versions of Perl, programmers would
2791 insert curly brackets to force interpretation as a filename glob:
2792 "<${foo}>". These days, it's considered cleaner to call the internal
2793 function directly as "glob($foo)", which is probably the right way to
2794 have done it in the first place.) For example:
2795
2796 while (<*.c>) {
2797 chmod 0644, $_;
2798 }
2799
2800 is roughly equivalent to:
2801
2802 open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
2803 while (<FOO>) {
2804 chomp;
2805 chmod 0644, $_;
2806 }
2807
2808 except that the globbing is actually done internally using the standard
2809 "File::Glob" extension. Of course, the shortest way to do the above
2810 is:
2811
2812 chmod 0644, <*.c>;
2813
2814 A (file)glob evaluates its (embedded) argument only when it is starting
2815 a new list. All values must be read before it will start over. In
2816 list context, this isn't important because you automatically get them
2817 all anyway. However, in scalar context the operator returns the next
2818 value each time it's called, or "undef" when the list has run out. As
2819 with filehandle reads, an automatic "defined" is generated when the
2820 glob occurs in the test part of a "while", because legal glob returns
2821 (for example, a file called 0) would otherwise terminate the loop.
2822 Again, "undef" is returned only once. So if you're expecting a single
2823 value from a glob, it is much better to say
2824
2825 ($file) = <blurch*>;
2826
2827 than
2828
2829 $file = <blurch*>;
2830
2831 because the latter will alternate between returning a filename and
2832 returning false.
2833
2834 If you're trying to do variable interpolation, it's definitely better
2835 to use the glob() function, because the older notation can cause people
2836 to become confused with the indirect filehandle notation.
2837
2838 @files = glob("$dir/*.[ch]");
2839 @files = glob($files[$i]);
2840
2841 Constant Folding
2842 Like C, Perl does a certain amount of expression evaluation at compile
2843 time whenever it determines that all arguments to an operator are
2844 static and have no side effects. In particular, string concatenation
2845 happens at compile time between literals that don't do variable
2846 substitution. Backslash interpolation also happens at compile time.
2847 You can say
2848
2849 'Now is the time for all'
2850 . "\n"
2851 . 'good men to come to.'
2852
2853 and this all reduces to one string internally. Likewise, if you say
2854
2855 foreach $file (@filenames) {
2856 if (-s $file > 5 + 100 * 2**16) { }
2857 }
2858
2859 the compiler precomputes the number which that expression represents so
2860 that the interpreter won't have to.
2861
2862 No-ops
2863 Perl doesn't officially have a no-op operator, but the bare constants 0
2864 and 1 are special-cased not to produce a warning in void context, so
2865 you can for example safely do
2866
2867 1 while foo();
2868
2869 Bitwise String Operators
2870 Bitstrings of any size may be manipulated by the bitwise operators ("~
2871 | & ^").
2872
2873 If the operands to a binary bitwise op are strings of different sizes,
2874 | and ^ ops act as though the shorter operand had additional zero bits
2875 on the right, while the & op acts as though the longer operand were
2876 truncated to the length of the shorter. The granularity for such
2877 extension or truncation is one or more bytes.
2878
2879 # ASCII-based examples
2880 print "j p \n" ^ " a h"; # prints "JAPH\n"
2881 print "JA" | " ph\n"; # prints "japh\n"
2882 print "japh\nJunk" & '_____'; # prints "JAPH\n";
2883 print 'p N$' ^ " E<H\n"; # prints "Perl\n";
2884
2885 If you are intending to manipulate bitstrings, be certain that you're
2886 supplying bitstrings: If an operand is a number, that will imply a
2887 numeric bitwise operation. You may explicitly show which type of
2888 operation you intend by using "" or "0+", as in the examples below.
2889
2890 $foo = 150 | 105; # yields 255 (0x96 | 0x69 is 0xFF)
2891 $foo = '150' | 105; # yields 255
2892 $foo = 150 | '105'; # yields 255
2893 $foo = '150' | '105'; # yields string '155' (under ASCII)
2894
2895 $baz = 0+$foo & 0+$bar; # both ops explicitly numeric
2896 $biz = "$foo" ^ "$bar"; # both ops explicitly stringy
2897
2898 See "vec" in perlfunc for information on how to manipulate individual
2899 bits in a bit vector.
2900
2901 Integer Arithmetic
2902 By default, Perl assumes that it must do most of its arithmetic in
2903 floating point. But by saying
2904
2905 use integer;
2906
2907 you may tell the compiler to use integer operations (see integer for a
2908 detailed explanation) from here to the end of the enclosing BLOCK. An
2909 inner BLOCK may countermand this by saying
2910
2911 no integer;
2912
2913 which lasts until the end of that BLOCK. Note that this doesn't mean
2914 everything is an integer, merely that Perl will use integer operations
2915 for arithmetic, comparison, and bitwise operators. For example, even
2916 under "use integer", if you take the sqrt(2), you'll still get
2917 1.4142135623731 or so.
2918
2919 Used on numbers, the bitwise operators ("&", "|", "^", "~", "<<", and
2920 ">>") always produce integral results. (But see also "Bitwise String
2921 Operators".) However, "use integer" still has meaning for them. By
2922 default, their results are interpreted as unsigned integers, but if
2923 "use integer" is in effect, their results are interpreted as signed
2924 integers. For example, "~0" usually evaluates to a large integral
2925 value. However, "use integer; ~0" is "-1" on two's-complement
2926 machines.
2927
2928 Floating-point Arithmetic
2929 While "use integer" provides integer-only arithmetic, there is no
2930 analogous mechanism to provide automatic rounding or truncation to a
2931 certain number of decimal places. For rounding to a certain number of
2932 digits, sprintf() or printf() is usually the easiest route. See
2933 perlfaq4.
2934
2935 Floating-point numbers are only approximations to what a mathematician
2936 would call real numbers. There are infinitely more reals than floats,
2937 so some corners must be cut. For example:
2938
2939 printf "%.20g\n", 123456789123456789;
2940 # produces 123456789123456784
2941
2942 Testing for exact floating-point equality or inequality is not a good
2943 idea. Here's a (relatively expensive) work-around to compare whether
2944 two floating-point numbers are equal to a particular number of decimal
2945 places. See Knuth, volume II, for a more robust treatment of this
2946 topic.
2947
2948 sub fp_equal {
2949 my ($X, $Y, $POINTS) = @_;
2950 my ($tX, $tY);
2951 $tX = sprintf("%.${POINTS}g", $X);
2952 $tY = sprintf("%.${POINTS}g", $Y);
2953 return $tX eq $tY;
2954 }
2955
2956 The POSIX module (part of the standard perl distribution) implements
2957 ceil(), floor(), and other mathematical and trigonometric functions.
2958 The Math::Complex module (part of the standard perl distribution)
2959 defines mathematical functions that work on both the reals and the
2960 imaginary numbers. Math::Complex not as efficient as POSIX, but POSIX
2961 can't work with complex numbers.
2962
2963 Rounding in financial applications can have serious implications, and
2964 the rounding method used should be specified precisely. In these
2965 cases, it probably pays not to trust whichever system rounding is being
2966 used by Perl, but to instead implement the rounding function you need
2967 yourself.
2968
2969 Bigger Numbers
2970 The standard "Math::BigInt", "Math::BigRat", and "Math::BigFloat"
2971 modules, along with the "bigint", "bigrat", and "bitfloat" pragmas,
2972 provide variable-precision arithmetic and overloaded operators,
2973 although they're currently pretty slow. At the cost of some space and
2974 considerable speed, they avoid the normal pitfalls associated with
2975 limited-precision representations.
2976
2977 use 5.010;
2978 use bigint; # easy interface to Math::BigInt
2979 $x = 123456789123456789;
2980 say $x * $x;
2981 +15241578780673678515622620750190521
2982
2983 Or with rationals:
2984
2985 use 5.010;
2986 use bigrat;
2987 $a = 3/22;
2988 $b = 4/6;
2989 say "a/b is ", $a/$b;
2990 say "a*b is ", $a*$b;
2991 a/b is 9/44
2992 a*b is 1/11
2993
2994 Several modules let you calculate with (bound only by memory and CPU
2995 time) unlimited or fixed precision. There are also some non-standard
2996 modules that provide faster implementations via external C libraries.
2997
2998 Here is a short, but incomplete summary:
2999
3000 Math::Fraction big, unlimited fractions like 9973 / 12967
3001 Math::String treat string sequences like numbers
3002 Math::FixedPrecision calculate with a fixed precision
3003 Math::Currency for currency calculations
3004 Bit::Vector manipulate bit vectors fast (uses C)
3005 Math::BigIntFast Bit::Vector wrapper for big numbers
3006 Math::Pari provides access to the Pari C library
3007 Math::BigInteger uses an external C library
3008 Math::Cephes uses external Cephes C library (no big numbers)
3009 Math::Cephes::Fraction fractions via the Cephes library
3010 Math::GMP another one using an external C library
3011
3012 Choose wisely.
3013
3014
3015
3016perl v5.16.3 2013-03-04 PERLOP(1)