re_syntax(n)                 Tcl Built-In Commands                re_syntax(n)

______________________________________________________________________________

NAME

       re_syntax - Syntax of Tcl regular expressions
_________________________________________________________________

DESCRIPTION

       A regular expression describes strings of characters. It's a pattern
       that matches certain strings and does not match others.

DIFFERENT FLAVORS OF REs

       Regular expressions (“RE”s), as defined by POSIX, come in two flavors:
       extended REs (“ERE”s) and basic REs (“BRE”s). EREs are roughly those of
       the traditional egrep, while BREs are roughly those of the traditional
       ed. This implementation adds a third flavor, advanced REs (“ARE”s),
       basically EREs with some significant extensions.

       This manual page primarily describes AREs. BREs mostly exist for
       backward compatibility in some old programs; they will be discussed at
       the end. POSIX EREs are almost an exact subset of AREs. Features of
       AREs that are not present in EREs will be indicated.

REGULAR EXPRESSION SYNTAX

       Tcl regular expressions are implemented using the package written by
       Henry Spencer, based on the 1003.2 spec and some (not quite all) of the
       Perl5 extensions (thanks, Henry!). Much of the description of regular
       expressions below is copied verbatim from his manual entry.

       An ARE is one or more branches, separated by “|”, matching anything
       that matches any of the branches.

       A branch is zero or more constraints or quantified atoms, concatenated.
       It matches a match for the first, followed by a match for the second,
       etc; an empty branch matches the empty string.

   QUANTIFIERS
       A quantified atom is an atom possibly followed by a single quantifier.
       Without a quantifier, it matches a single match for the atom. The
       quantifiers, and what a so-quantified atom matches, are:

         *     a sequence of 0 or more matches of the atom

         +     a sequence of 1 or more matches of the atom

         ?     a sequence of 0 or 1 matches of the atom

         {m}   a sequence of exactly m matches of the atom

         {m,}  a sequence of m or more matches of the atom

         {m,n} a sequence of m through n (inclusive) matches of the atom; m
               may not exceed n

         *?  +?  ??  {m}?  {m,}?  {m,n}?
               non-greedy quantifiers, which match the same possibilities, but
               prefer the smallest number rather than the largest number of
               matches (see MATCHING)

       The forms using { and } are known as bounds. The numbers m and n are
       unsigned decimal integers with permissible values from 0 to 255
       inclusive.
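
       The quantifiers above can be tried out with a short sketch. It assumes
       Tcl's regexp command (described on its own manual page) and uses
       made-up sample strings; the variable names are illustrative only.

```tcl
# Greedy and non-greedy quantifiers applied to the same input.
set text "<b>one</b><b>two</b>"

# Greedy: ".*" prefers the longest match.
regexp {<b>.*</b>} $text greedy

# Non-greedy: ".*?" prefers the shortest match.
regexp {<b>.*?</b>} $text lazy

# Bounds: exactly three digits; two through four digits.
set exact  [regexp {^[0-9]{3}$} "123"]
set ranged [regexp {^[0-9]{2,4}$} "12345"]

puts $greedy            ;# <b>one</b><b>two</b>
puts $lazy              ;# <b>one</b>
puts "$exact $ranged"   ;# 1 0
```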

   ATOMS
       An atom is one of:

         (re)  matches a match for re (re is any regular expression) with the
               match noted for possible reporting

         (?:re)
               as previous, but does no reporting (a “non-capturing” set of
               parentheses)

         ()    matches an empty string, noted for possible reporting

         (?:)  matches an empty string, without reporting

         [chars]
               a bracket expression, matching any one of the chars (see
               BRACKET EXPRESSIONS for more detail)

         .     matches any single character

         \k    matches the non-alphanumeric character k taken as an ordinary
               character, e.g. \\ matches a backslash character

         \c    where c is alphanumeric (possibly followed by other
               characters), an escape (AREs only), see ESCAPES below

         {     when followed by a character other than a digit, matches the
               left-brace character “{”; when followed by a digit, it is the
               beginning of a bound (see above)

         x     where x is a single character with no other significance,
               matches that character.
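
       As a rough illustration of capturing and non-capturing atoms, the
       following sketch assumes Tcl's regexp command and an invented date
       string; none of the variable names come from this manual page.

```tcl
set date "2024-05-17"

# Capturing parentheses: each (re) is reported in its own variable.
regexp {([0-9]+)-([0-9]+)-([0-9]+)} $date whole year month day

# Non-capturing parentheses group without reporting, so only two
# submatch variables are filled here.
regexp {(?:[0-9]+)-([0-9]+)-([0-9]+)} $date whole2 month2 day2

# "." matches any single character; "\." matches only a literal dot.
set anyChar [regexp {^a.c$} "abc"]
set literal [regexp {^a\.c$} "abc"]

puts "$year $month $day"      ;# 2024 05 17
puts "$month2 $day2"          ;# 05 17
puts "$anyChar $literal"      ;# 1 0
```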

   CONSTRAINTS
       A constraint matches an empty string when specific conditions are met.
       A constraint may not be followed by a quantifier. The simple
       constraints are as follows; some more constraints are described later,
       under ESCAPES.

         ^       matches at the beginning of the string or a line (according
                 to whether matching is newline-sensitive or not, as described
                 in MATCHING, below).

         $       matches at the end of the string or a line (according to
                 whether matching is newline-sensitive or not, as described in
                 MATCHING, below).

                 The difference between string and line matching modes is
                 immaterial when the string does not contain a newline
                 character. The \A and \Z constraint escapes have a similar
                 purpose but are always constraints for the overall string.

                 The default newline-sensitivity depends on the command that
                 uses the regular expression, and can be overridden as
                 described in METASYNTAX, below.

         (?=re)  positive lookahead (AREs only), matches at any point where a
                 substring matching re begins

         (?!re)  negative lookahead (AREs only), matches at any point where no
                 substring matching re begins

       The lookahead constraints may not contain back references (see later),
       and all parentheses within them are considered non-capturing.

       An RE may not end with “\”.
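
       A small sketch of the anchors and lookahead constraints, assuming
       Tcl's regexp command with its -indices switch and invented sample
       strings:

```tcl
# ^ and $ anchor the match to the ends of the string (default mode).
set anchored   [regexp {^abc$} "abc"]
set unanchored [regexp {^abc$} "xabcx"]

# Positive lookahead: "foo" only where "bar" follows; "bar" itself is
# not part of the match.
regexp {foo(?=bar)} "foobar" pos

# Negative lookahead: "foo" only where "bar" does not follow.
set neg [regexp {foo(?!bar)} "foobar"]
regexp -indices {foo(?!bar)} "foobar foobaz" idx

puts "$anchored $unanchored"   ;# 1 0
puts $pos                      ;# foo
puts $neg                      ;# 0
puts $idx                      ;# 7 9
```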

BRACKET EXPRESSIONS

       A bracket expression is a list of characters enclosed in “[]”. It
       normally matches any single character from the list (but see below).
       If the list begins with “^”, it matches any single character (but see
       below) not from the rest of the list.

       If two characters in the list are separated by “-”, this is shorthand
       for the full range of characters between those two (inclusive) in the
       collating sequence, e.g. “[0-9]” in Unicode matches any conventional
       decimal digit. Two ranges may not share an endpoint, so e.g. “a-c-e”
       is illegal. Ranges in Tcl always use the Unicode collating sequence,
       but other programs may use other collating sequences and this can be a
       source of incompatibility between programs.

       To include a literal ] or - in the list, the simplest method is to
       enclose it in [. and .] to make it a collating element (see below).
       Alternatively, make it the first character (following a possible “^”),
       or (AREs only) precede it with “\”. Alternatively, for “-”, make it the
       last character, or the second endpoint of a range. To use a literal -
       as the first endpoint of a range, make it a collating element or (AREs
       only) precede it with “\”. With the exception of these, some
       combinations using [ (see next paragraphs), and escapes, all other
       special characters lose their special significance within a bracket
       expression.
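
       The placement rules for literal ] and - can be checked with a sketch
       like the following. It assumes Tcl's regexp command; the test strings
       are invented for illustration.

```tcl
# A range and a negated list.
set inRange [regexp {^[0-9]+$} "2024"]
set negated [regexp {^[^0-9]+$} "abc"]

# Literal "]" placed first and literal "-" placed last in the list.
set positional [regexp {^[]a-]+$} "a\]-"]

# In AREs, "\" can escape them instead.
set escaped [regexp {^[\]\-]+$} "\]-"]

# Other special characters are ordinary inside brackets: this list
# contains only "." and "*", so it does not match letters.
set ordinary [regexp {^[.*]+$} "abc"]

puts "$inRange $negated $positional $escaped $ordinary"   ;# 1 1 1 1 0
```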

   CHARACTER CLASSES
       Within a bracket expression, the name of a character class enclosed in
       [: and :] stands for the list of all characters (not all collating
       elements!) belonging to that class. Standard character classes are:

       alpha   A letter.

       upper   An upper-case letter.

       lower   A lower-case letter.

       digit   A decimal digit.

       xdigit  A hexadecimal digit.

       alnum   An alphanumeric (letter or digit).

       print   A "printable" (same as graph, except also including space).

       blank   A space or tab character.

       space   A character producing white space in displayed text.

       punct   A punctuation character.

       graph   A character with a visible representation (includes both alnum
               and punct).

       cntrl   A control character.

       A locale may provide others. A character class may not be used as an
       endpoint of a range.

              (Note: the current Tcl implementation has only one locale, the
              Unicode locale, which supports exactly the above classes.)
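
       A brief sketch of character classes in use, assuming Tcl's regexp
       command and illustrative input strings:

```tcl
# A class name goes inside the bracket expression: [[:name:]].
set letters [regexp {^[[:alpha:]]+$} "Tcl"]
set digits  [regexp {^[[:digit:]]+$} "8x1"]

# Classes can be mixed with ordinary list members ("_" and "." here).
set mixed [regexp {^[[:alnum:]_.]+$} "re_syntax.n"]

# Several bracket expressions in sequence: one upper-case letter
# followed by lower-case letters.
set cased [regexp {^[[:upper:]][[:lower:]]+$} "Tcl"]

puts "$letters $digits $mixed $cased"   ;# 1 0 1 1
```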

   BRACKETED CONSTRAINTS
       There are two special cases of bracket expressions: the bracket
       expressions “[[:<:]]” and “[[:>:]]” are constraints, matching empty
       strings at the beginning and end of a word respectively. A word is
       defined as a sequence of word characters that is neither preceded nor
       followed by word characters. A word character is an alnum character or
       an underscore (“_”). These special bracket expressions are deprecated;
       users of AREs should use constraint escapes instead (see below).

   COLLATING ELEMENTS
       Within a bracket expression, a collating element (a character, a
       multi-character sequence that collates as if it were a single
       character, or a collating-sequence name for either) enclosed in [. and
       .] stands for the sequence of characters of that collating element.
       The sequence is a single element of the bracket expression's list. A
       bracket expression in a locale that has multi-character collating
       elements can thus match more than one character. So (insidiously), a
       bracket expression that starts with ^ can match multi-character
       collating elements even if none of them appear in the bracket
       expression!

              (Note: Tcl has no multi-character collating elements. This
              information is only for illustration.)

       For example, assume the collating sequence includes a ch
       multi-character collating element. Then the RE “[[.ch.]]*c” (zero or
       more “chs” followed by “c”) matches the first five characters of
       “chchcc”. Also, the RE “[^c]b” matches all of “chb” (because “[^c]”
       matches the multi-character “ch”).

   EQUIVALENCE CLASSES
       Within a bracket expression, a collating element enclosed in [= and =]
       is an equivalence class, standing for the sequences of characters of
       all collating elements equivalent to that one, including itself. (If
       there are no other equivalent collating elements, the treatment is as
       if the enclosing delimiters were “[.” and “.]”.) For example, if o and
       ô are the members of an equivalence class, then “[[=o=]]”, “[[=ô=]]”,
       and “[oô]” are all synonymous. An equivalence class may not be an
       endpoint of a range.

              (Note: Tcl implements only the Unicode locale. It does not
              define any equivalence classes. The examples above are just
              illustrations.)

ESCAPES

       Escapes (AREs only), which begin with a \ followed by an alphanumeric
       character, come in several varieties: character entry, class
       shorthands, constraint escapes, and back references. A \ followed by an
       alphanumeric character but not constituting a valid escape is illegal
       in AREs. In EREs, there are no escapes: outside a bracket expression, a
       \ followed by an alphanumeric character merely stands for that
       character as an ordinary character, and inside a bracket expression, \
       is an ordinary character. (The latter is the one actual incompatibility
       between EREs and AREs.)

   CHARACTER-ENTRY ESCAPES
       Character-entry escapes (AREs only) exist to make it easier to specify
       non-printing and otherwise inconvenient characters in REs:

         \a   alert (bell) character, as in C

         \b   backspace, as in C

         \B   synonym for \ to help reduce backslash doubling in some
              applications where there are multiple levels of backslash
              processing

         \cX  (where X is any character) the character whose low-order 5 bits
              are the same as those of X, and whose other bits are all zero

         \e   the character whose collating-sequence name is “ESC”, or failing
              that, the character with octal value 033

         \f   formfeed, as in C

         \n   newline, as in C

         \r   carriage return, as in C

         \t   horizontal tab, as in C

         \uwxyz
              (where wxyz is one to four hexadecimal digits) the Unicode
              character U+wxyz in the local byte ordering

         \Ustuvwxyz
              (where stuvwxyz is one to eight hexadecimal digits) reserved
              for a Unicode extension up to 21 bits. The digits are parsed
              until the first non-hexadecimal character is encountered, the
              maximum of eight hexadecimal digits is reached, or an overflow
              would occur in the maximum value of U+10ffff.

         \v   vertical tab, as in C

         \xhh (where hh is one or two hexadecimal digits) the character whose
              hexadecimal value is 0xhh.

         \0   the character whose value is 0

         \xyz (where xyz is exactly three octal digits, and is not a back
              reference (see below)) the character whose octal value is 0xyz.
              The first digit must be in the range 0-3, otherwise the
              two-digit form is assumed.

         \xy  (where xy is exactly two octal digits, and is not a back
              reference (see below)) the character whose octal value is 0xy

       Hexadecimal digits are “0”-“9”, “a”-“f”, and “A”-“F”. Octal digits are
       “0”-“7”.

       The character-entry escapes are always taken as ordinary characters.
       For example, \135 is ] in Unicode, but \135 does not terminate a
       bracket expression. Beware, however, that some applications (e.g., C
       compilers and the Tcl interpreter if the regular expression is not
       quoted with braces) interpret such sequences themselves before the
       regular-expression package gets to see them, which may require doubling
       (quadrupling, etc.) the “\”.
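
       The following sketch shows a few character-entry escapes reaching the
       regular-expression package unchanged because the patterns are quoted
       with braces. It assumes Tcl's regexp command; the inputs are invented.

```tcl
# The subject strings use Tcl's own backslash substitution (double
# quotes); the patterns are braced, so the RE package sees the escapes.
set closeBracket "\]"

set tab   [regexp {^a\tb$} "a\tb"]
set hex   [regexp {^\x41$} "A"]
set uni   [regexp {^\u00e9$} "\u00e9"]

# \135 is "]" but is taken as an ordinary character, so it does not
# terminate the bracket expression.
set octal [regexp {^[\135]$} $closeBracket]

puts "$tab $hex $uni $octal"   ;# 1 1 1 1
```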

   CLASS-SHORTHAND ESCAPES
       Class-shorthand escapes (AREs only) provide shorthands for certain
       commonly-used character classes:

         \d        [[:digit:]]

         \s        [[:space:]]

         \w        [[:alnum:]_\u203F\u2040\u2054\uFE33\uFE34\uFE4D\uFE4E\uFE4F\uFF3F]
                   (including punctuation connector characters)

         \D        [^[:digit:]]

         \S        [^[:space:]]

         \W        [^[:alnum:]_\u203F\u2040\u2054\uFE33\uFE34\uFE4D\uFE4E\uFE4F\uFF3F]
                   (including punctuation connector characters)

       Within bracket expressions, “\d”, “\s”, and “\w” lose their outer
       brackets, and “\D”, “\S”, and “\W” are illegal. (So, for example,
       “[a-c\d]” is equivalent to “[a-c[:digit:]]”. Also, “[a-c\D]”, which is
       equivalent to “[a-c^[:digit:]]”, is illegal.)
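
       A short sketch of the class shorthands, including the effect of Tcl's
       own backslash processing when the pattern is not braced. It assumes
       Tcl's regexp command with its -all and -inline switches; the sample
       strings are invented.

```tcl
# Collect every run of digits and every run of word characters.
set digits [regexp -all -inline {\d+} "a1b22c333"]
set words  [regexp -all -inline {\w+} "foo_bar baz"]

# Inside a bracket expression \d loses its outer brackets.
set inside [regexp {^[a-c\d]+$} "abc123"]

# In double quotes Tcl processes the backslash first: "\\d+" reaches
# the RE package as \d+, while "\d+" reaches it as d+.
set doubled [regexp "\\d+" "42"]
set single  [regexp "\d+" "42"]

puts $digits               ;# 1 22 333
puts $words                ;# foo_bar baz
puts "$inside $doubled $single"   ;# 1 1 0
```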

   CONSTRAINT ESCAPES
       A constraint escape (AREs only) is a constraint, matching the empty
       string if specific conditions are met, written as an escape:

         \A    matches only at the beginning of the string (see MATCHING,
               below, for how this differs from “^”)

         \m    matches only at the beginning of a word

         \M    matches only at the end of a word

         \y    matches only at the beginning or end of a word

         \Y    matches only at a point that is not the beginning or end of a
               word

         \Z    matches only at the end of the string (see MATCHING, below, for
               how this differs from “$”)

         \m    (where m is a nonzero digit) a back reference, see below

         \mnn  (where m is a nonzero digit, and nn is some more digits, and
               the decimal value mnn is not greater than the number of closing
               capturing parentheses seen so far) a back reference, see below

       A word is defined as in the specification of “[[:<:]]” and “[[:>:]]”
       above. Constraint escapes are illegal within bracket expressions.
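
       The word-boundary escapes can be sketched as follows, assuming Tcl's
       regexp command and invented sample text:

```tcl
# \m and \M anchor at the beginning and end of a word.
set whole [regexp {\mcat\M} "a cat sat"]
set inner [regexp {\mcat\M} "concatenate"]

# \y matches at either word boundary; \Y only where there is none.
set boundary   [regexp {\ycat} "concatenate"]
set noBoundary [regexp {\Ycat\Y} "concatenate"]

# \A and \Z always refer to the ends of the whole string.
set ends [regexp {\Aabc\Z} "abc"]

puts "$whole $inner $boundary $noBoundary $ends"   ;# 1 0 0 1 1
```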

   BACK REFERENCES
       A back reference (AREs only) matches the same string matched by the
       parenthesized subexpression specified by the number, so that (e.g.)
       “([bc])\1” matches “bb” or “cc” but not “bc”. The subexpression must
       entirely precede the back reference in the RE. Subexpressions are
       numbered in the order of their leading parentheses. Non-capturing
       parentheses do not define subexpressions.

       There is an inherent historical ambiguity between octal character-entry
       escapes and back references, which is resolved by heuristics, as hinted
       at above. A leading zero always indicates an octal escape. A single
       non-zero digit, not followed by another digit, is always taken as a
       back reference. A multi-digit sequence not starting with a zero is
       taken as a back reference if it comes after a suitable subexpression
       (i.e. the number is in the legal range for a back reference), and
       otherwise is taken as octal.
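
       A minimal sketch of back references, using the example from the text
       plus an invented doubled-word search. It assumes Tcl's regexp command.

```tcl
# The example from the text: \1 must repeat what ([bc]) matched.
set same [regexp {([bc])\1} "bb"]
set diff [regexp {([bc])\1} "bc"]

# Find a word that is immediately repeated.
regexp {\m(\w+) \1\M} "this is is a test" whole word

puts "$same $diff"   ;# 1 0
puts $whole          ;# is is
puts $word           ;# is
```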

METASYNTAX

       In addition to the main syntax described above, there are some special
       forms and miscellaneous syntactic facilities available.

       Normally the flavor of RE being used is specified by
       application-dependent means. However, this can be overridden by a
       director. If an RE of any flavor begins with “***:”, the rest of the RE
       is an ARE. If an RE of any flavor begins with “***=”, the rest of the
       RE is taken to be a literal string, with all characters considered
       ordinary characters.

       An ARE may begin with embedded options: a sequence (?xyz) (where xyz is
       one or more alphabetic characters) specifies options affecting the rest
       of the RE. These supplement, and can override, any options specified by
       the application. The available option letters are:

         b  rest of RE is a BRE

         c  case-sensitive matching (usual default)

         e  rest of RE is an ERE

         i  case-insensitive matching (see MATCHING, below)

         m  historical synonym for n

         n  newline-sensitive matching (see MATCHING, below)

         p  partial newline-sensitive matching (see MATCHING, below)

         q  rest of RE is a literal (“quoted”) string, all ordinary characters

         s  non-newline-sensitive matching (usual default)

         t  tight syntax (usual default; see below)

         w  inverse partial newline-sensitive (“weird”) matching (see
            MATCHING, below)

         x  expanded syntax (see below)

       Embedded options take effect at the ) terminating the sequence. They
       are available only at the start of an ARE, and may not be used later
       within it.
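
       Directors and embedded options can be sketched as follows, assuming
       Tcl's regexp command (which treats patterns as AREs by default) and
       invented inputs:

```tcl
# "***=" director: the rest of the RE is a literal string, so "." is
# an ordinary character.
set literalHit  [regexp {***=a.c} "a.c"]
set literalMiss [regexp {***=a.c} "abc"]

# (?i) embedded option: case-insensitive matching.
set nocase [regexp {(?i)tcl} "TCL"]

# (?q) embedded option: the rest is a quoted (literal) string, so "+"
# is ordinary.
set quoted [regexp {(?q)1+1} "1+1=2"]

puts "$literalHit $literalMiss $nocase $quoted"   ;# 1 0 1 1
```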

       In addition to the usual (tight) RE syntax, in which all characters are
       significant, there is an expanded syntax, available in all flavors of
       RE with the -expanded switch, or in AREs with the embedded x option. In
       the expanded syntax, white-space characters are ignored and all
       characters between a # and the following newline (or the end of the RE)
       are ignored, permitting paragraphing and commenting a complex RE. There
       are three exceptions to that basic rule:

       •  a white-space character or “#” preceded by “\” is retained

       •  white space or “#” within a bracket expression is retained

       •  white space and comments are illegal within multi-character symbols
          like the ARE “(?:” or the BRE “\(”

       Expanded-syntax white-space characters are blank, tab, newline, and any
       character that belongs to the space character class.
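
       A sketch of the expanded syntax, assuming Tcl's regexp command and an
       invented date string. The pattern variable and capture names are
       illustrative only.

```tcl
# (?x) must come first in the RE; white space and "#" comments are
# then ignored.
set re {(?x)
    ([0-9]{4})   # year
    -
    ([0-9]{2})   # month
    -
    ([0-9]{2})   # day
}
set ok [regexp $re "2024-05-17" whole year month day]

# The -expanded switch has the same effect; "\ " keeps a literal space.
set ignored  [regexp -expanded {a b c} "abc"]
set retained [regexp -expanded {a\ b} "a b"]

puts "$ok $year $month $day"    ;# 1 2024 05 17
puts "$ignored $retained"       ;# 1 1
```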

       Finally, in an ARE, outside bracket expressions, the sequence “(?#ttt)”
       (where ttt is any text not containing a “)”) is a comment, completely
       ignored. Again, this is not allowed between the characters of
       multi-character symbols like “(?:”. Such comments are more a historical
       artifact than a useful facility, and their use is deprecated; use the
       expanded syntax instead.

       None of these metasyntax extensions is available if the application (or
       an initial “***=” director) has specified that the user's input be
       treated as a literal string rather than as an RE.

MATCHING

       In the event that an RE could match more than one substring of a given
       string, the RE matches the one starting earliest in the string. If the
       RE could match more than one substring starting at that point, its
       choice is determined by its preference: either the longest substring,
       or the shortest.

       Most atoms, and all constraints, have no preference. A parenthesized RE
       has the same preference (possibly none) as the RE. A quantified atom
       with quantifier {m} or {m}? has the same preference (possibly none) as
       the atom itself. A quantified atom with other normal quantifiers
       (including {m,n} with m equal to n) prefers longest match. A quantified
       atom with other non-greedy quantifiers (including {m,n}? with m equal
       to n) prefers shortest match. A branch has the same preference as the
       first quantified atom in it which has a preference. An RE consisting of
       two or more branches connected by the | operator prefers longest match.

       Subject to the constraints imposed by the rules for matching the whole
       RE, subexpressions also match the longest or shortest possible
       substrings, based on their preferences, with subexpressions starting
       earlier in the RE taking priority over ones starting later. Note that
       outer subexpressions thus take priority over their component
       subexpressions.

       The quantifiers {1,1} and {1,1}? can be used to force longest and
       shortest preference, respectively, on a subexpression or a whole RE.

              NOTE: This means that you can usually make an RE non-greedy
              overall by putting {1,1}? after one of the first non-constraint
              atoms or parenthesized sub-expressions in it. When writing an RE
              at this level of complexity, it pays to experiment with the
              placement of this non-greediness override on a suitable range of
              input texts.

              For example, this regular expression is non-greedy, and will
              match the shortest substring possible given that “abc” will be
              matched as early as possible (the quantifier does not change
              that):

                     ab{1,1}?c.*x.*cba

              The atom “a” has no greediness preference; we explicitly give
              one for “b”, and the remaining quantifiers are overridden to be
              non-greedy by the preceding non-greedy quantifier.

       Match lengths are measured in characters, not collating elements. An
       empty string is considered longer than no match at all. For example,
       “bb*” matches the three middle characters of “abbbc”,
       “(week|wee)(night|knights)” matches all ten characters of “weeknights”,
       when “(.*).*” is matched against “abc” the parenthesized subexpression
       matches all three characters, and when “(a*)*” is matched against “bc”
       both the whole RE and the parenthesized subexpression match an empty
       string.
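
       The four examples in the preceding paragraph can be reproduced with a
       sketch like this one, assuming Tcl's regexp command; the variable
       names are illustrative.

```tcl
# "bb*" against "abbbc": the three middle characters.
regexp {bb*} "abbbc" m1

# Alternation prefers the longest overall match.
regexp {(week|wee)(night|knights)} "weeknights" m2

# The earlier subexpression takes priority and matches everything.
regexp {(.*).*} "abc" m3 sub3

# An empty match is still a match.
set found [regexp {(a*)*} "bc" m4 sub4]

puts $m1                         ;# bbb
puts $m2                         ;# weeknights
puts "$m3 $sub3"                 ;# abc abc
puts "$found [string length $m4] [string length $sub4]"   ;# 1 0 0
```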

       If case-independent matching is specified, the effect is much as if all
       case distinctions had vanished from the alphabet. When an alphabetic
       that exists in multiple cases appears as an ordinary character outside
       a bracket expression, it is effectively transformed into a bracket
       expression containing both cases, so that x becomes “[xX]”. When it
       appears inside a bracket expression, all case counterparts of it are
       added to the bracket expression, so that “[x]” becomes “[xX]” and
       “[^x]” becomes “[^xX]”.
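
       A short sketch of case-independent matching, assuming Tcl's regexp
       command and its -nocase switch (documented with that command, not on
       this page):

```tcl
# Ordinary characters: "tcl" behaves like "[tT][cC][lL]".
set viaOption [regexp {(?i)tcl} "TcL"]
set viaSwitch [regexp -nocase {tcl} "TCL"]

# Bracket expressions: "[x]" behaves like "[xX]", "[^x]" like "[^xX]".
set inList  [regexp {(?i)^[x]$} "X"]
set negated [regexp {(?i)^[^x]$} "X"]

puts "$viaOption $viaSwitch $inList $negated"   ;# 1 1 1 0
```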

       If newline-sensitive matching is specified, . and bracket expressions
       using ^ will never match the newline character (so that matches will
       never cross newlines unless the RE explicitly arranges it) and ^ and $
       will match the empty string after and before a newline respectively, in
       addition to matching at beginning and end of string respectively. ARE
       \A and \Z continue to match beginning or end of string only.

       If partial newline-sensitive matching is specified, this affects . and
       bracket expressions as with newline-sensitive matching, but not ^ and
       $.

       If inverse partial newline-sensitive matching is specified, this
       affects ^ and $ as with newline-sensitive matching, but not . and
       bracket expressions. This is not very useful but is provided for
       symmetry.
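
       The three newline modes can be compared with a sketch like the
       following. It assumes Tcl's regexp command, whose -line, -linestop,
       and -lineanchor switches select the n, p, and w behaviors; the
       two-line sample text is invented.

```tcl
set text "first line\nsecond line"

# Default (non-newline-sensitive): "." matches the newline too.
regexp {^.*$} $text whole

# Newline-sensitive: "." stops at the newline, ^ and $ match at lines.
regexp -line {^.*$} $text firstLine
set starts   [regexp -all -inline -line {^[a-z]+} $text]
set embedded [regexp -all -inline {(?n)^[a-z]+} $text]

# Partial: only "." and bracket expressions are affected.
regexp -linestop {^.*} $text stopped

# Inverse partial: only ^ and $ are affected.
set lineAnchor [regexp -lineanchor {^second} $text]
set noAnchor   [regexp {^second} $text]

puts [expr {$whole eq $text}]    ;# 1
puts $firstLine                  ;# first line
puts $starts                     ;# first second
puts $stopped                    ;# first line
puts "$lineAnchor $noAnchor"     ;# 1 0
```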

LIMITS AND COMPATIBILITY

       No particular limit is imposed on the length of REs. Programs intended
       to be highly portable should not employ REs longer than 256 bytes, as a
       POSIX-compliant implementation can refuse to accept such REs.

       The only feature of AREs that is actually incompatible with POSIX EREs
       is that \ does not lose its special significance inside bracket
       expressions. All other ARE features use syntax which is illegal or has
       undefined or unspecified effects in POSIX EREs; the *** syntax of
       directors likewise is outside the POSIX syntax for both BREs and EREs.

       Many of the ARE extensions are borrowed from Perl, but some have been
       changed to clean them up, and a few Perl extensions are not present.
       Incompatibilities of note include “\b”, “\B”, the lack of special
       treatment for a trailing newline, the addition of complemented bracket
       expressions to the things affected by newline-sensitive matching, the
       restrictions on parentheses and back references in lookahead
       constraints, and the longest/shortest-match (rather than first-match)
       matching semantics.

       The matching rules for REs containing both normal and non-greedy
       quantifiers have changed since early beta-test versions of this
       package. (The new rules are much simpler and cleaner, but do not work
       as hard at guessing the user's real intentions.)

       Henry Spencer's original 1986 regexp package, still in widespread use
       (e.g., in pre-8.1 releases of Tcl), implemented an early version of
       today's EREs. There are four incompatibilities between regexp's
       near-EREs (“RREs” for short) and AREs. In roughly increasing order of
       significance:

       •  In AREs, \ followed by an alphanumeric character is either an escape
          or an error, while in RREs, it was just another way of writing the
          alphanumeric. This should not be a problem because there was no
          reason to write such a sequence in RREs.

       •  { followed by a digit in an ARE is the beginning of a bound, while
          in RREs, { was always an ordinary character. Such sequences should
          be rare, and will often result in an error because following
          characters will not look like a valid bound.

       •  In AREs, \ remains a special character within “[]”, so a literal \
          within [] must be written “\\”. \\ also gives a literal \ within []
          in RREs, but only truly paranoid programmers routinely doubled the
          backslash.

       •  AREs report the longest/shortest match for the RE, rather than the
          first found in a specified search order. This may affect some RREs
          which were written in the expectation that the first match would be
          reported. Careful crafting of RREs to optimize the search order for
          fast matching is obsolete, because AREs examine all possible matches
          in parallel and their performance is largely insensitive to their
          complexity. However, cases where the search order was exploited to
          deliberately find a match which was not the longest/shortest will
          need rewriting.

BASIC REGULAR EXPRESSIONS

       BREs differ from EREs in several respects. “|”, “+”, and “?” are
       ordinary characters and there is no equivalent for their functionality.
       The delimiters for bounds are “\{” and “\}”, with “{” and “}” by
       themselves ordinary characters. The parentheses for nested
       subexpressions are “\(” and “\)”, with “(” and “)” by themselves
       ordinary characters. “^” is an ordinary character except at the
       beginning of the RE or the beginning of a parenthesized subexpression,
       “$” is an ordinary character except at the end of the RE or the end of
       a parenthesized subexpression, and “*” is an ordinary character if it
       appears at the beginning of the RE or the beginning of a parenthesized
       subexpression (after a possible leading “^”). Finally, single-digit
       back references are available, and “\<” and “\>” are synonyms for
       “[[:<:]]” and “[[:>:]]” respectively; no other escapes are available.
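
       The BRE rules can be tried from Tcl by selecting the flavor with the
       embedded b option. This is a sketch that assumes Tcl's regexp command
       and invented inputs.

```tcl
# In a BRE "+" is an ordinary character.
set plusLiteral [regexp {(?b)^a+$} "a+"]
set plusRepeat  [regexp {(?b)^a+$} "aa"]

# Bounds are written \{ and \}; subexpressions are written \( and \).
set bound [regexp {(?b)^a\{2\}$} "aa"]
set group [regexp {(?b)^\(ab\)\1$} "abab"]

# Unescaped parentheses are ordinary characters.
set parens [regexp {(?b)^(ab)$} "(ab)"]

puts "$plusLiteral $plusRepeat $bound $group $parens"   ;# 1 0 1 1 1
```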

KEYWORDS

       match, regular expression, string

Tcl                                   8.1                         re_syntax(n)