perlreref(1)

PERLREREF(1)           Perl Programmers Reference Guide           PERLREREF(1)

NAME

       perlreref - Perl Regular Expressions Reference

DESCRIPTION

       This is a quick reference to Perl's regular expressions.  For full
       information see perlre and perlop, as well as the "SEE ALSO" section in
       this document.

       OPERATORS

         =~ determines to which variable the regex is applied.
            In its absence, $_ is used.

               $var =~ /foo/;

         !~ determines to which variable the regex is applied,
            and negates the result of the match; it returns
            false if the match succeeds, and true if it fails.

               $var !~ /foo/;

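       A minimal sketch of the two binding operators and of the implicit
       $_ target. The variable names and sample strings are made up for
       illustration.

```perl
use strict;
use warnings;

my $var = "a plate of food";

# =~ binds the regex to $var; "foo" occurs inside "food", so this is true.
my $bound = ($var =~ /foo/) ? 1 : 0;

# !~ binds and negates: true only when the pattern does not match.
my $negated = ($var !~ /bar/) ? 1 : 0;

# With no binding operator, the match is applied to $_.
my $implicit = 0;
for ("foo fighters") {
    $implicit = /foo/ ? 1 : 0;
}

print "bound=$bound negated=$negated implicit=$implicit\n";
```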
         m/pattern/igmsoxc searches a string for a pattern match,
            applying the given options.

               i  case-Insensitive
               g  Global - all occurrences
               m  Multiline mode - ^ and $ match internal lines
               s  match as a Single line - . matches \n
               o  compile pattern Once
               x  eXtended legibility - free whitespace and comments
               c  don't reset pos on failed matches when using /g

            If 'pattern' is an empty string, the last successfully matched
            regex is used. Delimiters other than '/' may be used for both this
            operator and the following ones.

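       The modifiers above can be sketched as follows. The sample text and
       variable names are assumptions chosen for the example; m{...} shows
       an alternate delimiter.

```perl
use strict;
use warnings;

my $text = "Foo bar\nfoo baz\n";

# /i ignores case; /g in list context returns every occurrence.
my @all = $text =~ /foo/ig;

# /m lets ^ match at the start of each internal line.
my @line_starts = $text =~ /^(\w+)/mg;

# /s lets . match a newline, so the match can span both lines.
my $spans = ($text =~ /bar.foo/s) ? 1 : 0;

# /x permits whitespace and comments inside the pattern.
my ($second) = $text =~ m{
    ^ \w+      # first word on the line
    \s+ (\w+)  # capture the second word
}mx;

# /c keeps pos() in place after a failed /g match in scalar context.
my $str  = "aab";
my $ok   = $str =~ /a/g;    # succeeds, pos($str) is now 1
my $fail = $str =~ /z/gc;   # fails, but pos($str) is not reset
my $kept = pos($str);
```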
         qr/pattern/imsox lets you store a regex in a variable,
            or pass one around. Modifiers are as for m//, and are stored
            within the regex.

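       A short sketch of storing a regex with qr// and passing it around.
       The helper count_matches is hypothetical, written only for this
       example.

```perl
use strict;
use warnings;

# Compile once; the /i modifier is stored inside the regex object.
my $word = qr/perl/i;

# Use it directly as a pattern...
my $direct = ("I like Perl" =~ $word) ? 1 : 0;

# ...or interpolate it into a larger pattern. The stored /i applies
# only to the interpolated part.
my $embedded = ("PERL 5" =~ /^$word \d\z/) ? 1 : 0;

# Pass it to a subroutine like any other scalar.
sub count_matches {
    my ($re, @strings) = @_;
    return scalar grep { $_ =~ $re } @strings;
}
my $count = count_matches($word, "perl", "Python", "PERL");
```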
         s/pattern/replacement/igmsoxe substitutes matches of
            'pattern' with 'replacement'. Modifiers are as for m//,
            with one addition:

               e  Evaluate replacement as an expression

            'e' may be specified multiple times. 'replacement' is interpreted
            as a double-quoted string unless a single quote (') is the delimiter.

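       The difference between a plain replacement, an evaluated one, and a
       single-quoted one can be sketched like this. The sample string and
       variable names are illustrative only.

```perl
use strict;
use warnings;

my $price = "apples cost 3 and pears cost 4";

# Plain substitution: the replacement is a double-quoted string,
# so $1 interpolates.
(my $tagged = $price) =~ s/(\d+)/<$1>/g;

# /e evaluates the replacement as a Perl expression instead.
(my $doubled = $price) =~ s/(\d+)/$1 * 2/ge;

# With ' as the delimiter the replacement is not interpolated.
(my $literal = "a") =~ s'a'$1';

# s/// returns the number of substitutions made.
my $copy = $price;
my $n = ($copy =~ s/cost/costs/g);
```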
         ?pattern? is like m/pattern/ but matches only once. No alternate
             delimiters can be used. Must be reset with reset (see perlfunc).

       SYNTAX

          \       Escapes the character immediately following it
          .       Matches any single character except a newline (unless /s is used)
          ^       Matches at the beginning of the string (or line, if /m is used)
          $       Matches at the end of the string (or line, if /m is used)
          *       Matches the preceding element 0 or more times
          +       Matches the preceding element 1 or more times
          ?       Matches the preceding element 0 or 1 times
          {...}   Specifies a range of occurrences for the element preceding it
          [...]   Matches any one of the characters contained within the brackets
          (...)   Groups subexpressions for capturing to $1, $2...
          (?:...) Groups subexpressions without capturing (cluster)
          |       Matches either the subexpression preceding or following it
          \1, \2 ...  The text from the Nth group

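       A small sketch combining several of the elements above: capturing
       and non-capturing groups, alternation, a counted repeat, a
       backreference and an escaped metacharacter. All strings are made up
       for the example.

```perl
use strict;
use warnings;

my $date = "2006-01-07";

# (...) captures; {n} repeats the preceding element exactly n times.
my ($year, $month, $day) = $date =~ /^(\d{4})-(\d{2})-(\d{2})\z/;

# (?:...) groups without capturing, so only the alternation is returned.
my ($ext) = "archive.tar.gz" =~ /(?:\.tar)?\.(gz|bz2)\z/;

# \1 refers back to the text matched by the first group.
my ($repeated) = "this is is a test" =~ /\b(\w+) \1\b/;

# A backslash makes a metacharacter literal.
my $has_dot = ("v5.8.8" =~ /5\.8/) ? 1 : 0;
```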
       ESCAPE SEQUENCES

       These work as in normal strings.

          \a       Alarm (beep)
          \e       Escape
          \f       Formfeed
          \n       Newline
          \r       Carriage return
          \t       Tab
          \037     Any octal ASCII value
          \x7f     Any hexadecimal ASCII value
          \x{263a} A wide hexadecimal value
          \cx      Control-x
          \N{name} A named character

          \l  Lowercase next character
          \u  Titlecase next character
          \L  Lowercase until \E
          \U  Uppercase until \E
          \Q  Disable pattern metacharacters until \E
          \E  End case modification

       For Titlecase, see "Titlecase".

       This one works differently from normal strings:

          \b  An assertion, not backspace, except in a character class

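       A brief sketch of \Q...\E, the case-modification escapes, and \b
       as a word-boundary assertion. The input strings are assumptions for
       illustration.

```perl
use strict;
use warnings;

# \Q...\E disables metacharacters in interpolated text.
my $literal = "1+1=2";
my $quoted_ok   = ("1+1=2" =~ /^\Q$literal\E\z/) ? 1 : 0;  # literal match
my $unquoted_ok = ("1+1=2" =~ /^$literal\z/)     ? 1 : 0;  # + acts as a quantifier

# \u and \L combine: lowercase everything, then titlecase the first char.
my $name  = "iAIN";
my $fixed = "\u\L$name\E";

# \x09 and \t name the same character, in patterns as in strings.
my $has_tab = ("a\tb" =~ /a\x09b/) ? 1 : 0;

# \b is a word-boundary assertion inside a pattern.
my $whole_word = ("cat catalog" =~ /\bcat\b/) ? 1 : 0;
my $no_word    = ("catalog"     =~ /\bcat\b/) ? 1 : 0;
```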
       CHARACTER CLASSES

          [amy]    Match 'a', 'm' or 'y'
          [f-j]    Dash specifies "range"
          [f-j-]   Dash escaped or at start or end means 'dash'
          [^f-j]   Caret indicates "match any character _except_ these"

       The following sequences work within or without a character class.  The
       first six are locale aware; all are Unicode aware.  The default
       character class equivalents are given.  See perllocale and perlunicode
       for details.

          \d      A digit                     [0-9]
          \D      A nondigit                  [^0-9]
          \w      A word character            [a-zA-Z0-9_]
          \W      A non-word character        [^a-zA-Z0-9_]
          \s      A whitespace character      [ \t\n\r\f]
          \S      A non-whitespace character  [^ \t\n\r\f]

          \C      Match a byte (with Unicode, '.' matches a character)
          \pP     Match P-named (Unicode) property
          \p{...} Match Unicode property with long name
          \PP     Match non-P
          \P{...} Match lack of Unicode property with long name
          \X      Match extended unicode sequence

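       The shorthand classes and a negated bracketed class might be used
       as in this sketch. The sample line is invented for the example.

```perl
use strict;
use warnings;

my $line = "order-42: 3 items, paid\tOK";

my @digits = $line =~ /\d+/g;      # runs of digits
my @words  = $line =~ /\w+/g;      # runs of word characters
my @fields = split /\s+/, $line;   # split on runs of whitespace

# A leading ^ negates the class; a trailing - is a literal dash.
(my $stripped = $line) =~ s/[^a-z-]//g;

# \p{...} matches by Unicode property, here uppercase letters.
my @upper = $line =~ /\p{IsUpper}/g;
```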
       POSIX character classes and their Unicode and Perl equivalents:

          alnum   IsAlnum              Alphanumeric
          alpha   IsAlpha              Alphabetic
          ascii   IsASCII              Any ASCII char
          blank   IsSpace  [ \t]       Horizontal whitespace (GNU extension)
          cntrl   IsCntrl              Control characters
          digit   IsDigit  \d          Digits
          graph   IsGraph              Alphanumeric and punctuation
          lower   IsLower              Lowercase chars (locale and Unicode aware)
          print   IsPrint              Alphanumeric, punct, and space
          punct   IsPunct              Punctuation
          space   IsSpace  [\s\ck]     Whitespace
                  IsSpacePerl   \s     Perl's whitespace definition
          upper   IsUpper              Uppercase chars (locale and Unicode aware)
          word    IsWord   \w          Alphanumeric plus _ (Perl extension)
          xdigit  IsXDigit [0-9A-Fa-f] Hexadecimal digit

       Within a character class:

           POSIX       traditional   Unicode
           [:digit:]       \d        \p{IsDigit}
           [:^digit:]      \D        \P{IsDigit}

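       POSIX classes are only valid inside a bracketed class, which gives
       the doubled brackets in this sketch. The identifier string is an
       assumption for illustration.

```perl
use strict;
use warnings;

my $id = "ab12-CD";

# [[:digit:]] matches the same ASCII characters as \d here.
my @digits = $id =~ /[[:digit:]]/g;

# [[:^digit:]] is the negated form, like \D.
my @not_digits = $id =~ /[[:^digit:]]/g;

# POSIX classes combine with other members of the same bracketed class.
my ($token) = $id =~ /^([[:lower:][:digit:]]+)/;

# xdigit accepts 0-9, A-F and a-f.
my $is_hex = ("DEADbeef" =~ /^[[:xdigit:]]+\z/) ? 1 : 0;
```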
       ANCHORS

       All are zero-width assertions.

          ^  Match string start (or line, if /m is used)
          $  Match string end (or line, if /m is used) or before newline
          \b Match word boundary (between \w and \W)
          \B Match except at word boundary (between \w and \w or \W and \W)
          \A Match string start (regardless of /m)
          \Z Match string end (before optional newline)
          \z Match absolute string end
          \G Match where previous m//g left off

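       The differences between $, \Z and \z, and between ^ and \A, show
       up on a string that ends in a newline. A sketch with made-up input:

```perl
use strict;
use warnings;

my $text = "first\nsecond\n";

my $dollar  = ($text =~ /second$/)   ? 1 : 0;  # $ may match before a final newline
my $big_z   = ($text =~ /second\Z/)  ? 1 : 0;  # so may \Z
my $small_z = ($text =~ /second\z/)  ? 1 : 0;  # \z needs the absolute end

my $caret   = ($text =~ /^second/)   ? 1 : 0;  # without /m: string start only
my $caret_m = ($text =~ /^second/m)  ? 1 : 0;  # with /m: start of any line
my $big_a   = ($text =~ /\Asecond/m) ? 1 : 0;  # \A ignores /m

# \G anchors each attempt where the previous m//g left off.
my $input = "aaabbb";
my @as;
push @as, $1 while $input =~ /\G(a)/g;
```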
       QUANTIFIERS

       Quantifiers are greedy by default -- match the longest leftmost.

          Maximal Minimal Allowed range
          ------- ------- -------------
          {n,m}   {n,m}?  Must occur at least n times but no more than m times
          {n,}    {n,}?   Must occur at least n times
          {n}     {n}?    Must occur exactly n times
          *       *?      0 or more times (same as {0,})
          +       +?      1 or more times (same as {1,})
          ?       ??      0 or 1 time (same as {0,1})

       There is no quantifier {,n} -- that gets understood as a literal
       string.

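       Maximal and minimal matching can be compared side by side. A
       minimal sketch, with sample strings chosen for the example:

```perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";

# Greedy: .+ runs to the last > in the string.
my ($greedy) = $html =~ /<(.+)>/;

# Minimal: .+? stops at the first > it can.
my ($minimal) = $html =~ /<(.+?)>/;

my ($range)      = "aaaaa" =~ /(a{2,3})/;    # takes as many as allowed
my ($lazy_range) = "aaaaa" =~ /(a{2,3}?)/;   # takes as few as allowed
my ($at_least)   = "aaaaa" =~ /(a{2,})/;     # no upper bound
```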
       EXTENDED CONSTRUCTS

          (?#text)         A comment
          (?imxs-imsx:...) Enable/disable option (as per m// modifiers)
          (?=...)          Zero-width positive lookahead assertion
          (?!...)          Zero-width negative lookahead assertion
          (?<=...)         Zero-width positive lookbehind assertion
          (?<!...)         Zero-width negative lookbehind assertion
          (?>...)          Grab what we can, prohibit backtracking
          (?{ code })      Embedded code, return value becomes $^R
          (??{ code })     Dynamic regex, return value used as regex
          (?(cond)yes|no)  cond being integer corresponding to capturing parens
          (?(cond)yes)        or a lookaround/eval zero-width assertion

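       A sketch of the lookaround, inline-option, atomic-group and
       conditional constructs; the embedded-code forms are left out. The
       strings and variable names are assumptions for illustration.

```perl
use strict;
use warnings;

my $s = "price: 100 USD, cost: 200 EUR";

# Lookahead: digits followed by " USD", which is not consumed.
my ($usd) = $s =~ /(\d+)(?= USD)/;

# Negative lookahead: a whole number not followed by " USD".
my ($other) = $s =~ /(\d+)\b(?! USD)/;

# Lookbehind: digits preceded by "cost: ".
my ($after_cost) = $s =~ /(?<=cost: )(\d+)/;

# (?i:...) turns on case-insensitivity for part of the pattern only.
my $partial_i = ("price: 100 usd" =~ /price: \d+ (?i:USD)/) ? 1 : 0;
my $outside_i = ("PRICE: 100 usd" =~ /price: \d+ (?i:USD)/) ? 1 : 0;

# (?>...) will not give back what it matched, so the trailing "ab" fails.
my $atomic = ("aaab" =~ /(?>a+)ab/) ? 1 : 0;
my $normal = ("aaab" =~ /a+ab/)     ? 1 : 0;

# (?(1)...) requires a closing paren only if group 1 matched an opening one.
my $re = qr/^(\()?\d+(?(1)\))\z/;
my @balanced = map { $_ =~ $re ? 1 : 0 } ("(42)", "42", "(42", "42)");
```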
       VARIABLES

          $_    Default variable for operators to use
          $*    Enable multiline matching (deprecated; not in 5.9.0 or later)

          $&    Entire matched string
          $`    Everything prior to matched string
          $'    Everything after the matched string

       The use of those last three will slow down all regex use within your
       program. Consult perlvar for @LAST_MATCH_START to see equivalent
       expressions that won't cause a slowdown.  See also Devel::SawAmpersand.

          $1, $2 ...  Hold the Nth captured expression
          $+    Last parenthesized pattern match
          $^N   Holds the most recently closed capture
          $^R   Holds the result of the last (?{...}) expr
          @-    Offsets of starts of groups. $-[0] holds start of whole match
          @+    Offsets of ends of groups. $+[0] holds end of whole match

       Captured groups are numbered according to their opening paren.

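       The match, prematch and postmatch text can be rebuilt from @- and
       @+ with substr, which avoids the three slow variables. A sketch
       under that assumption, with an invented sample string:

```perl
use strict;
use warnings;

my $str = "say hello world now";
my ($whole, $before, $after, $last_paren, $group2_start);

if ($str =~ /(hel+o) (\w+)/) {
    $whole  = substr($str, $-[0], $+[0] - $-[0]);  # text of the whole match
    $before = substr($str, 0, $-[0]);              # text before the match
    $after  = substr($str, $+[0]);                 # text after the match
    $last_paren   = $+;       # last parenthesized group that matched
    $group2_start = $-[2];    # offset where group 2 begins
}

# Groups are numbered by the position of their opening paren.
my @nested = "ab" =~ /((a)(b))/;
```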
       FUNCTIONS

          lc          Lowercase a string
          lcfirst     Lowercase first char of a string
          uc          Uppercase a string
          ucfirst     Titlecase first char of a string

          pos         Return or set current match position
          quotemeta   Quote metacharacters
          reset       Reset ?pattern? status
          study       Analyze string for optimizing matching

          split       Use regex to split a string into parts

       The first four of these are like the escape sequences "\L", "\l", "\U",
       and "\u".  For Titlecase, see "Titlecase".

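       A short sketch of the case functions together with quotemeta,
       split and pos. The input strings are made up for the example.

```perl
use strict;
use warnings;

my $title       = ucfirst(lc("PERL REGEX"));  # lowercase all, then titlecase first
my $lower_first = lcfirst("Perl");
my $shout       = uc("perl");

# quotemeta backslashes metacharacters so the text matches literally.
my $safe        = quotemeta("a.b*c");
my $literal_hit = ("xa.b*cx" =~ /$safe/) ? 1 : 0;

# split takes a regex as its separator.
my @parts = split /\s*,\s*/, "a , b,c ,d";

# pos reports where the last m//g on a string stopped, and can be assigned to.
my $s = "aXbXc";
my $found     = $s =~ /X/g;
my $first_pos = pos($s);
pos($s) = 3;                  # continue the search from offset 3
$found = $s =~ /X/g;
my $second_pos = pos($s);
```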
       TERMINOLOGY

       Titlecase

       Unicode concept which most often is equal to uppercase, but for certain
       characters like the German "sharp s" there is a difference.

AUTHOR

       Iain Truskett.

       This document may be distributed under the same terms as Perl itself.

THANKS

       David P.C. Wollmann, Richard Soderberg, Sean M. Burke, Tom
       Christiansen, Jim Cromie, and Jeffrey Goff for useful advice.

perl v5.8.8                       2006-01-07                      PERLREREF(1)