perlreref(1)

PERLREREF(1)           Perl Programmers Reference Guide           PERLREREF(1)

NAME

       perlreref - Perl Regular Expressions Reference

DESCRIPTION

       This is a quick reference to Perl's regular expressions.  For full
       information see perlre and perlop, as well as the "SEE ALSO" section in
       this document.

   OPERATORS
       "=~" determines to which variable the regex is applied.  In its
       absence, $_ is used.

           $var =~ /foo/;

       "!~" determines to which variable the regex is applied, and negates the
       result of the match; it returns false if the match succeeds, and true
       if it fails.

           $var !~ /foo/;

       "m/pattern/msixpogc" searches a string for a pattern match, applying
       the given options.

           m  Multiline mode - ^ and $ match internal lines
           s  match as a Single line - . matches \n
           i  case-Insensitive
           x  eXtended legibility - free whitespace and comments
           p  Preserve a copy of the matched string -
              ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined.
           o  compile pattern Once
           g  Global - all occurrences
           c  don't reset pos on failed matches when using /g

       If 'pattern' is an empty string, the last successfully matched regex is
       used. Delimiters other than '/' may be used for both this operator and
       the following ones. The leading "m" can be omitted if the delimiter is
       '/'.

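       A minimal sketch of the binding operators and a few "m//" modifiers.
       The sample text and variable names are invented for illustration and
       do not come from perlre or perlop.

```perl
use strict;
use warnings;

my $text = "Foo bar\nfoo baz\n";

# =~ binds the match to $text; !~ negates the result.
my $has_foo = $text =~ /foo/  ? 1 : 0;
my $no_quux = $text !~ /quux/ ? 1 : 0;

# /i ignores case; /g in list context returns every match.
my @foos = $text =~ /foo/gi;

# /m lets ^ match after each internal newline.
my @line_starts = $text =~ /^(\w+)/mg;

# /x allows whitespace and comments; braces serve as delimiters here.
my ($second) = $text =~ m{
    ^ foo \s+   # a line starting with lowercase "foo"
    (\w+)       # capture the following word
}mx;
```

       Braces were chosen as the alternate delimiter only because they nest
       and read well with /x; any other delimiter would do.
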
       "qr/pattern/msixpo" lets you store a regex in a variable, or pass one
       around. Modifiers are as for "m//", and are stored within the regex.

       "s/pattern/replacement/msixpogce" substitutes matches of 'pattern' with
       'replacement'. Modifiers are as for "m//", with one addition:

           e  Evaluate 'replacement' as an expression

       'e' may be specified multiple times. 'replacement' is interpreted as a
       double quoted string unless a single-quote ("'") is the delimiter.

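       The following sketch shows "qr//" and "s///" working together. The
       input string and variable names are assumptions made up for this
       example.

```perl
use strict;
use warnings;

# qr// compiles a pattern, modifiers included, for later reuse.
my $word = qr/(\w+)/i;

my $line = "price: 3 apples, 4 pears";

# A qr// object can be interpolated into a larger pattern.
my ($label) = $line =~ /^$word:/;

# /e evaluates the replacement as Perl code; /g repeats for every match.
(my $doubled = $line) =~ s/(\d+)/$1 * 2/ge;

# In scalar context s/// returns the number of substitutions made.
my $count = ($line =~ s/(\d+)/<$1>/g);
```

       Copying into $doubled before substituting keeps the original string
       intact, which is a common idiom when the input is still needed.
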
       "?pattern?" is like "m/pattern/" but matches only once. No alternate
       delimiters can be used.  Must be reset with reset().

   SYNTAX
          \       Escapes the character immediately following it
          .       Matches any single character except a newline (unless /s is used)
          ^       Matches at the beginning of the string (or line, if /m is used)
          $       Matches at the end of the string (or line, if /m is used)
          *       Matches the preceding element 0 or more times
          +       Matches the preceding element 1 or more times
          ?       Matches the preceding element 0 or 1 times
          {...}   Specifies a range of occurrences for the element preceding it
          [...]   Matches any one of the characters contained within the brackets
          (...)   Groups subexpressions for capturing to $1, $2...
          (?:...) Groups subexpressions without capturing (cluster)
          |       Matches either the subexpression preceding or following it
          \1, \2, \3 ...           Matches the text from the Nth group
          \g1 or \g{1}, \g2 ...    Matches the text from the Nth group
          \g-1 or \g{-1}, \g-2 ... Matches the text from the Nth previous group
          \g{name}     Named backreference
          \k<name>     Named backreference
          \k'name'     Named backreference
          (?P=name)    Named backreference (python syntax)

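       As an illustration of the backreference forms listed above, here is a
       short sketch. The test strings and the group name "q" are hypothetical.

```perl
use strict;
use warnings;

# \1 matches whatever group 1 captured: here, a doubled word.
my ($dup) = "this is is a test" =~ /\b(\w+) \1\b/;

# \g{-1} refers to the closest preceding group, whatever its number.
my $rel = "abab" =~ /^(ab)\g{-1}$/ ? 1 : 0;

# A named group can be referenced with \k<name>; the closing quote
# must be the same character as the opening one.
my $quoted = q{say "hello" now} =~ /(?<q>["'])(.*?)\k<q>/ ? $2 : undef;
```

       Relative and named references are useful when a pattern is assembled
       from pieces and absolute group numbers are not known in advance.
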
   ESCAPE SEQUENCES
       These work as in normal strings.

          \a       Alarm (beep)
          \e       Escape
          \f       Formfeed
          \n       Newline
          \r       Carriage return
          \t       Tab
          \037     Any octal ASCII value
          \x7f     Any hexadecimal ASCII value
          \x{263a} A wide hexadecimal value
          \cx      Control-x
          \N{name} A named character

          \l  Lowercase next character
          \u  Titlecase next character
          \L  Lowercase until \E
          \U  Uppercase until \E
          \Q  Disable pattern metacharacters until \E
          \E  End modification

       For Titlecase, see "Titlecase".

       This one works differently from normal strings:

          \b  An assertion, not backspace, except in a character class

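       A small sketch of the case and quoting escapes, plus the two meanings
       of "\b". All strings and variable names are invented for the example.

```perl
use strict;
use warnings;

# \Q...\E quotes metacharacters in an interpolated string, so "1+1"
# is matched literally instead of as "one or more 1s followed by 1".
my $literal = "1+1";
my $quoted_hit   = "1+1=2" =~ /\Q$literal\E=/ ? 1 : 0;
my $unquoted_hit = "1+1=2" =~ /$literal=/     ? 1 : 0;

# \u and \L combine: lowercase everything, then titlecase the first char.
my $name  = "pERL";
my $fixed = "\u\L$name\E";

# \b is a word boundary in a pattern but a backspace inside [...].
my $boundary  = "cat flap" =~ /cat\b/ ? 1 : 0;
my $backspace = "\x08"     =~ /[\b]/  ? 1 : 0;
```
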
   CHARACTER CLASSES
          [amy]    Match 'a', 'm' or 'y'
          [f-j]    Dash specifies "range"
          [f-j-]   Dash escaped or at start or end means 'dash'
          [^f-j]   Caret indicates "match any character _except_ these"

       The following sequences work inside or outside a character class.  The
       first six are locale aware; all are Unicode aware. See perllocale and
       perlunicode for details.

          \d      A digit
          \D      A nondigit
          \w      A word character
          \W      A non-word character
          \s      A whitespace character
          \S      A non-whitespace character
          \h      A horizontal whitespace character
          \H      A non-horizontal whitespace character
          \v      A vertical whitespace character
          \V      A non-vertical whitespace character
          \R      A generic newline           (?>\v|\x0D\x0A)

          \C      Match a byte (with Unicode, '.' matches a character)
          \pP     Match P-named (Unicode) property
          \p{...} Match Unicode property with long name
          \PP     Match non-P
          \P{...} Match lack of Unicode property with long name
          \X      Match extended Unicode combining character sequence

       POSIX character classes and their Unicode and Perl equivalents:

          alnum   IsAlnum              Alphanumeric
          alpha   IsAlpha              Alphabetic
          ascii   IsASCII              Any ASCII char
          blank   IsSpace  [ \t]       Horizontal whitespace (GNU extension)
          cntrl   IsCntrl              Control characters
          digit   IsDigit  \d          Digits
          graph   IsGraph              Alphanumeric and punctuation
          lower   IsLower              Lowercase chars (locale and Unicode aware)
          print   IsPrint              Alphanumeric, punct, and space
          punct   IsPunct              Punctuation
          space   IsSpace  [\s\ck]     Whitespace
                  IsSpacePerl   \s     Perl's whitespace definition
          upper   IsUpper              Uppercase chars (locale and Unicode aware)
          word    IsWord   \w          Alphanumeric plus _ (Perl extension)
          xdigit  IsXDigit [0-9A-Fa-f] Hexadecimal digit

       Within a character class:

           POSIX       traditional   Unicode
           [:digit:]       \d        \p{IsDigit}
           [:^digit:]      \D        \P{IsDigit}

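       The class notations above can be compared side by side in a short
       sketch. The sample strings are assumptions chosen so that each class
       matches something.

```perl
use strict;
use warnings;

my $s = "ab-12_Z";

# A POSIX class is only valid inside a bracketed class.
my @digits    = $s =~ /[[:digit:]]/g;
my @nondigits = $s =~ /[[:^digit:]]/g;

# \w covers alphanumerics plus underscore; \W is its complement.
my @nonword = $s =~ /\W/g;

# A dash at the end of a class is literal; a leading caret negates.
my @dash_or_fj = "f-k" =~ /[f-j-]/g;
my @not_fj     = "f-k" =~ /[^f-j]/g;

# \p{...} matches by Unicode property, \P{...} by its absence.
my @upper = $s =~ /\p{IsUpper}/g;
```
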
   ANCHORS
       All are zero-width assertions.

          ^  Match string start (or line, if /m is used)
          $  Match string end (or line, if /m is used) or before newline
          \b Match word boundary (between \w and \W)
          \B Match except at word boundary (between \w and \w or \W and \W)
          \A Match string start (regardless of /m)
          \Z Match string end (before optional newline)
          \z Match absolute string end
          \G Match where previous m//g left off

          \K Keep the stuff left of the \K, don't include it in $&

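       A sketch of how the end-of-string anchors differ, and of "\G" and "\K"
       in use. The tiny tokenizer and its token labels are hypothetical and
       only meant to show the /gc idiom.

```perl
use strict;
use warnings;

my $line = "end\n";

# $ and \Z tolerate one trailing newline; \z does not.
my $dollar = $line =~ /end$/  ? 1 : 0;
my $big_z  = $line =~ /end\Z/ ? 1 : 0;
my $tiny_z = $line =~ /end\z/ ? 1 : 0;

# \G anchors at pos(); with /gc a failed match leaves pos() alone,
# so the next alternative is tried from the same position.
my $src = "12ab34";
my @tokens;
while (1) {
    if    ($src =~ /\G(\d+)/gc)    { push @tokens, "num:$1" }
    elsif ($src =~ /\G([a-z]+)/gc) { push @tokens, "word:$1" }
    else                           { last }
}

# \K drops what was matched so far from $&, so only the text
# after it is replaced.
(my $masked = "user=alice") =~ s/user=\K\w+/***/;
```
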
   QUANTIFIERS
       Quantifiers are greedy by default: they match the longest leftmost
       string.

          Maximal Minimal Possessive Allowed range
          ------- ------- ---------- -------------
          {n,m}   {n,m}?  {n,m}+     Must occur at least n times
                                     but no more than m times
          {n,}    {n,}?   {n,}+      Must occur at least n times
          {n}     {n}?    {n}+       Must occur exactly n times
          *       *?      *+         0 or more times (same as {0,})
          +       +?      ++         1 or more times (same as {1,})
          ?       ??      ?+         0 or 1 time (same as {0,1})

       The possessive forms (new in Perl 5.10) prevent backtracking: what gets
       matched by a pattern with a possessive quantifier will not be
       backtracked into, even if that causes the whole match to fail.

       There is no quantifier {,n}; that is interpreted as a literal string.

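       The three quantifier flavours can be contrasted on the same input. This
       is only a sketch; the strings and variable names are made up.

```perl
use strict;
use warnings;

my $html = "<b>one</b><b>two</b>";

# Greedy: .* takes as much as it can while still allowing a match.
my ($greedy) = $html =~ /<b>(.*)<\/b>/;

# Minimal: .*? stops at the first point where the rest can match.
my ($minimal) = $html =~ /<b>(.*?)<\/b>/;

# Possessive: a++ never gives characters back, so the trailing "a"
# can no longer match and the whole pattern fails.
my $backtracking = "aaa" =~ /^a+a$/  ? 1 : 0;
my $possessive   = "aaa" =~ /^a++a$/ ? 1 : 0;

# {n,m} bounds the repeat count.
my $bounded = "2009-02-12" =~ /^\d{4}-\d{1,2}-\d{1,2}$/ ? 1 : 0;
```
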
   EXTENDED CONSTRUCTS
          (?#text)          A comment
          (?:...)           Groups subexpressions without capturing (cluster)
          (?pimsx-imsx:...) Enable/disable option (as per m// modifiers)
          (?=...)           Zero-width positive lookahead assertion
          (?!...)           Zero-width negative lookahead assertion
          (?<=...)          Zero-width positive lookbehind assertion
          (?<!...)          Zero-width negative lookbehind assertion
          (?>...)           Grab what we can, prohibit backtracking
          (?|...)           Branch reset
          (?<name>...)      Named capture
          (?'name'...)      Named capture
          (?P<name>...)     Named capture (python syntax)
          (?{ code })       Embedded code, return value becomes $^R
          (??{ code })      Dynamic regex, return value used as regex
          (?N)              Recurse into subpattern number N
          (?-N), (?+N)      Recurse into Nth previous/next subpattern
          (?R), (?0)        Recurse at the beginning of the whole pattern
          (?&name)          Recurse into a named subpattern
          (?P>name)         Recurse into a named subpattern (python syntax)
          (?(cond)yes|no)
          (?(cond)yes)      Conditional expression, where "cond" can be:
                            (N)       subpattern N has matched something
                            (<name>)  named subpattern has matched something
                            ('name')  named subpattern has matched something
                            (?{code}) code condition
                            (R)       true if recursing
                            (RN)      true if recursing into Nth subpattern
                            (R&name)  true if recursing into named subpattern
                            (DEFINE)  always false, no "no" pattern allowed

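       A few of the constructs above in a runnable sketch. The inputs, the
       group names and the balanced-parentheses pattern are illustrative
       assumptions, not examples taken from perlre.

```perl
use strict;
use warnings;

# Lookahead asserts without consuming: match "foo" only before "bar",
# or only when "bar" does not follow.
my @pos_ahead = "foobar foobaz" =~ /foo(?=bar)/g;
my @neg_ahead = "foobar foobaz" =~ /foo(?!bar)\w+/g;

# Lookbehind: digits preceded by a dollar sign.
my ($price) = "cost: \$42" =~ /(?<=\$)(\d+)/;

# Named captures are available through %+.
my %date;
if ("2009-02-12" =~ /(?<year>\d{4})-(?<month>\d\d)-(?<day>\d\d)/) {
    %date = %+;
}

# Branch reset: both alternatives capture into $1.
my @reset = map { /(?|a(\d)|b(\d))/ ? $1 : () } qw(a1 b2);

# (?1) recurses into group 1, which matches balanced parentheses.
my $paren      = qr/^(\((?:[^()]++|(?1))*\))$/;
my $balanced   = "(a(b)c)" =~ $paren ? 1 : 0;
my $unbalanced = "(a(b"    =~ $paren ? 1 : 0;
```
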
   VARIABLES
          $_    Default variable for operators to use

          $`    Everything prior to matched string
          $&    Entire matched string
          $'    Everything after matched string

          ${^PREMATCH}   Everything prior to matched string
          ${^MATCH}      Entire matched string
          ${^POSTMATCH}  Everything after matched string

       The use of "$`", "$&" or "$'" will slow down all regex use within your
       program. Consult perlvar for "@-" to see equivalent expressions that
       won't cause a slowdown.  See also Devel::SawAmpersand. Starting with
       Perl 5.10, you can also use the equivalent variables "${^PREMATCH}",
       "${^MATCH}" and "${^POSTMATCH}", but for them to be defined, you have
       to specify the "/p" (preserve) modifier on your regular expression.

          $1, $2 ...  hold the Xth captured expr
          $+    Last parenthesized pattern match
          $^N   Holds the most recently closed capture
          $^R   Holds the result of the last (?{...}) expr
          @-    Offsets of starts of groups. $-[0] holds start of whole match
          @+    Offsets of ends of groups. $+[0] holds end of whole match
          %+    Named capture buffers
          %-    Named capture buffers, as array refs

       Captured groups are numbered according to their opening paren.

   FUNCTIONS
          lc          Lowercase a string
          lcfirst     Lowercase first char of a string
          uc          Uppercase a string
          ucfirst     Titlecase first char of a string

          pos         Return or set current match position
          quotemeta   Quote metacharacters
          reset       Reset ?pattern? status
          study       Analyze string for optimizing matching

          split       Use a regex to split a string into parts

       The first four of these are like the escape sequences "\L", "\l", "\U",
       and "\u".  For Titlecase, see "Titlecase".

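       A brief sketch of several of these functions. The strings and variable
       names are invented for the example.

```perl
use strict;
use warnings;

# lc/uc change a whole string; lcfirst/ucfirst only its first character.
my $shout = uc "perl";
my $title = ucfirst lc "PERL";

# pos() reports where the last successful m//g on the string ended.
my $text = "aXbXc";
$text =~ /X/g;
my $after_first = pos($text);

# quotemeta() backslashes non-word characters, like \Q...\E.
my $safe = quotemeta "a.b";

# split() uses a regex as the separator.
my @fields = split /\s*,\s*/, "one , two,three";
```
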
   TERMINOLOGY
       Titlecase

       A Unicode concept that is usually the same as uppercase, but differs
       for certain characters such as the German "sharp s".

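       To make the difference concrete, here is a sketch using a character
       with three distinct case forms. The choice of U+01C6 is an assumption
       made for this example; it is not the "sharp s" case mentioned above,
       but it shows titlecase and uppercase diverging.

```perl
use strict;
use warnings;

# U+01C6 (LATIN SMALL LETTER DZ WITH CARON) is a digraph with three
# case forms: lowercase U+01C6, titlecase U+01C5, uppercase U+01C4.
my $lower = "\x{1C6}";

my $title = ucfirst $lower;   # titlecase form
my $upper = uc $lower;        # uppercase form
```
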

AUTHOR

       Iain Truskett. Updated by the Perl 5 Porters.

       This document may be distributed under the same terms as Perl itself.

THANKS

       David P.C. Wollmann, Richard Soderberg, Sean M. Burke, Tom
       Christiansen, Jim Cromie, and Jeffrey Goff for useful advice.

perl v5.10.1                      2009-02-12                      PERLREREF(1)