Regexp::Assemble(3) User Contributed Perl Documentation Regexp::Assemble(3)

6 Regexp::Assemble - Assemble multiple Regular Expressions into a single
7 RE
8
10 use Regexp::Assemble;
11
12 my $ra = Regexp::Assemble->new;
13 $ra->add( 'ab+c' );
14 $ra->add( 'ab+-' );
15 $ra->add( 'a\w\d+' );
16 $ra->add( 'a\d+' );
17 print $ra->re; # prints a(?:\w?\d+|b+[-c])
18
20 Regexp::Assemble takes an arbitrary number of regular expressions and
21 assembles them into a single regular expression (or RE) that matches
22 all that the individual REs match.
23
24 As a result, instead of having a large list of expressions to loop
25 over, a target string only needs to be tested against one expression.
26 This is interesting when you have several thousand patterns to deal
27 with. Serious effort is made to produce the smallest pattern possible.
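As a minimal sketch of the idea above, the following compares testing each target against a list of patterns with testing it once against the assembled expression. The target strings are made up for the illustration; the patterns are the ones from the synopsis.

```perl
use strict;
use warnings;
use Regexp::Assemble;

# Illustrative patterns (the same ones used in the synopsis).
my @patterns = ( 'ab+c', 'ab+-', 'a\w\d+', 'a\d+' );

my $ra = Regexp::Assemble->new;
$ra->add($_) for @patterns;
my $re = $ra->re;

# Hypothetical target strings, chosen only for the demonstration.
my @targets = qw(abbc ab- ax42 a7 xyz);

my %result;
for my $target (@targets) {
    # Old way: loop over every pattern.
    my $by_loop = ( grep { $target =~ /$_/ } @patterns ) ? 1 : 0;
    # New way: a single test against the assembled expression.
    my $by_one  = ( $target =~ $re ) ? 1 : 0;
    $result{$target} = [ $by_loop, $by_one ];
    print "$target: loop=$by_loop assembled=$by_one\n";
}
```

Both columns should agree for every target, since the assembled expression is meant to match exactly what the individual patterns match.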
28
29 It is also possible to track the original patterns, so that you can
30 determine which, among the source patterns that form the assembled
31 pattern, was the one that caused the match to occur.
32
33 You should realise that large numbers of alternations are processed in
34 perl's regular expression engine in O(n) time, not O(1). If you are
35 still having performance problems, you should look at using a trie.
36 Note that Perl's own regular expression engine will implement trie
37 optimisations in perl 5.10 (they are already available in perl 5.9.3 if
38 you want to try them out). "Regexp::Assemble" will do the right thing
39 when it knows it's running on a trie'd perl. (At least in some version
40 after this one).
41
42 Some more examples of usage appear in the accompanying README. If that
43 file is not easy to access locally, you can find it on a web repository
44 such as <http://search.cpan.org/dist/Regexp-Assemble/README> or
45 <http://cpan.uwinnipeg.ca/htdocs/Regexp-Assemble/README.html>.
46
47 See also "LIMITATIONS".
48
50 add(LIST)
Takes a string, breaks it apart into a set of tokens (respecting meta
characters) and inserts the resulting list into the "R::A" object. It
uses a naive regular expression to lex the string that may be fooled by
complex expressions (specifically, it will fail to lex nested
parenthetical expressions such as "ab(cd(ef)?gh)ij" correctly). If this
is the case, the end of the string will not be tokenised correctly and
will be returned as one long string.
58
59 On the one hand, this may indicate that the patterns you are trying to
60 feed the "R::A" object are too complex. Simpler patterns might allow
61 the algorithm to work more effectively and perform more reductions in
62 the resulting pattern.
63
64 On the other hand, you can supply your own pattern to perform the
65 lexing if you need. The test suite contains an example of a lexer
66 pattern that will match one level of nested parentheses.
67
Note that there is an internal optimisation that will bypass much of
69 the lexing process. If a string contains no "\" (backslash), "[" (open
70 square bracket), "(" (open paren), "?" (question mark), "+" (plus), "*"
71 (star) or "{" (open curly), a character split will be performed
72 directly.
73
74 A list of strings may be supplied, thus you can pass it a file handle
75 of a file opened for reading:
76
77 $re->add( '\d+-\d+-\d+-\d+\.example\.com' );
78 $re->add( <IN> );
79
80 If the file is very large, it may be more efficient to use a "while"
81 loop, to read the file line-by-line:
82
83 $re->add($_) while <IN>;
84
85 The "add" method will chomp the lines automatically. If you do not want
86 this to occur (you want to keep the record separator), then disable
87 "chomp"ing.
88
89 $re->chomp(0);
90 $re->add($_) while <IN>;
91
92 This method is chainable.
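The following sketch shows the automatic chomping described above. An in-memory file handle stands in for a real pattern file, and the two patterns in it are invented for the example.

```perl
use strict;
use warnings;
use Regexp::Assemble;

# An in-memory file stands in for a real pattern file (an assumption
# made to keep the example self-contained).
my $data = "foo\\d+\nbar\n";
open my $in, '<', \$data or die "cannot open in-memory file: $!";

my $ra = Regexp::Assemble->new;
$ra->add(<$in>);    # list context: every line is passed to add()
close $in;

my $re = $ra->re;

# If the trailing newlines had been kept, these anchored matches
# would fail.
my $bar_ok = ( 'bar'   =~ /\A$re\z/ ) ? 1 : 0;
my $foo_ok = ( 'foo42' =~ /\A$re\z/ ) ? 1 : 0;
print "bar: $bar_ok, foo42: $foo_ok\n";
```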
93
94 add_file(FILENAME [...])
95 Takes a list of file names. Each file is opened and read line by line.
96 Each line is added to the assembly.
97
98 $r->add_file( 'file.1', 'file.2' );
99
100 If a file cannot be opened, the method will croak. If you cannot afford
to let this happen then you should wrap the call in an "eval" block.
102
Chomping happens automatically unless you use the chomp(0) method to
104 disable it. By default, input lines are read according to the value of
105 the "input_record_separator" attribute (if defined), and will otherwise
106 fall back to the current setting of the system $/ variable. The record
107 separator may also be specified on each call to "add_file". Internally,
108 the routine "local"ises the value of $/ to whatever is required, for
109 the duration of the call.
110
111 An alternate calling mechanism using a hash reference is available.
112 The recognised keys are:
113
114 file
115 Reference to a list of file names, or the name of a single file.
116
117 $r->add_file({file => ['file.1', 'file.2', 'file.3']});
118 $r->add_file({file => 'file.n'});
119
120 input_record_separator
121 If present, indicates what constitutes a line
122
123 $r->add_file({file => 'data.txt', input_record_separator => ':' });
124
125 rs An alias for input_record_separator (mnemonic: same as the English
126 variable names).
127
128 $r->add_file( {
129 file => [ 'pattern.txt', 'more.txt' ],
130 input_record_separator => "\r\n",
131 });
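Putting the pieces above together, here is a small sketch that loads colon-separated patterns from a scratch file and guards the call with "eval", as suggested earlier. The file contents are invented, and File::Temp is used only to keep the example self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Regexp::Assemble;

# Write three colon-separated patterns to a scratch file.
my ( $fh, $filename ) = tempfile( UNLINK => 1 );
print {$fh} 'alpha:beta\d+:gamma';
close $fh or die "close failed: $!";

my $ra = Regexp::Assemble->new;

# add_file croaks if a file cannot be opened, so trap the error.
my $ok = eval {
    $ra->add_file( { file => $filename, rs => ':' } );
    1;
};
die "could not load patterns: $@" unless $ok;

my $re = $ra->re;
print "beta7 matches\n" if 'beta7' =~ /\A$re\z/;
```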
132
133 clone()
134 Clones the contents of a Regexp::Assemble object and creates a new
135 object (in other words it performs a deep copy).
136
137 If the Storable module is installed, its dclone method will be used,
138 otherwise the cloning will be performed using a pure perl approach.
139
140 You can use this method to take a snapshot of the patterns that have
141 been added so far to an object, and generate an assembly from the
clone. Additional patterns may be added to the original object
143 afterwards.
144
145 my $re = $main->clone->re();
146 $main->add( 'another-pattern-\\d+' );
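A slightly fuller sketch of the snapshot technique follows. The variable names and patterns are illustrative only.

```perl
use strict;
use warnings;
use Regexp::Assemble;

my $main = Regexp::Assemble->new;
$main->add('foo\d+');

# Assemble from a clone, so that $main keeps its patterns.
my $snapshot = $main->clone->re;

# The original object can still grow afterwards.
$main->add('bar\d+');
my $full = $main->re;

for my $target (qw(foo1 bar7)) {
    printf "%s: snapshot=%s full=%s\n", $target,
        ( $target =~ $snapshot ? 'yes' : 'no' ),
        ( $target =~ $full     ? 'yes' : 'no' );
}
```

The snapshot should only know about the patterns that had been added when the clone was taken.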
147
148 insert(LIST)
149 Takes a list of tokens representing a regular expression and stores
150 them in the object. Note: you should not pass it a bare regular
151 expression, such as "ab+c?d*e". You must pass it as a list of tokens,
152 e.g. "('a', 'b+', 'c?', 'd*', 'e')".
153
154 This method is chainable, e.g.:
155
156 my $ra = Regexp::Assemble->new
157 ->insert( qw[ a b+ c? d* e ] )
158 ->insert( qw[ a c+ d+ e* f ] );
159
160 Lexing complex patterns with metacharacters and so on can consume a
161 significant proportion of the overall time to build an assembly. If
162 you have the information available in a tokenised form, calling
163 "insert" directly can be a big win.
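As an illustration, the sketch below builds an expression from pre-tokenised input and checks a few made-up target strings against it. Nothing here is specific to the module beyond "insert" and "re".

```perl
use strict;
use warnings;
use Regexp::Assemble;

# Each call to insert() receives one pattern, already split into tokens.
my $re = Regexp::Assemble->new
    ->insert( 'a', 'b+', 'c?', 'd*', 'e' )
    ->insert( 'a', 'c+', 'd+', 'e*', 'f' )
    ->re;

for my $target (qw(abbe accddf ab)) {
    printf "%-7s %s\n", $target,
        ( $target =~ /\A$re\z/ ? 'matches' : 'does not match' );
}
```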
164
165 lexstr
166 Use the "lexstr" method if you are curious to see how a pattern gets
167 tokenised. It takes a scalar on input, representing a pattern, and
168 returns a reference to an array, containing the tokenised pattern. You
169 can recover the original pattern by performing a "join":
170
my $token = $re->lexstr($pattern);
my $new_pattern = join( '', @$token );
173
174 If the original pattern contains unnecessary backslashes, or "\x4b"
175 escapes, or quotemeta escapes ("\Q"..."\E") the resulting pattern may
176 not be identical.
177
Calling "lexstr" does not add the pattern to the object, it is merely for
179 exploratory purposes. It will, however, update various statistical
180 counters.
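A short sketch of exploring the tokenisation follows. The pattern is an arbitrary example, and the exact token boundaries printed depend on the lexer in your installed version.

```perl
use strict;
use warnings;
use Regexp::Assemble;

my $ra      = Regexp::Assemble->new;
my $pattern = 'ab+c\d*';    # arbitrary example pattern

# lexstr returns a reference to an array of tokens.
my $tokens = $ra->lexstr($pattern);
print join( ' | ', @$tokens ), "\n";

# Joining the tokens gives the pattern back (no unnecessary escapes
# were used here, so it should be identical).
my $rebuilt = join '', @$tokens;
print "round trip ok\n" if $rebuilt eq $pattern;
```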
181
182 pre_filter(CODE)
183 Allows you to install a callback to check that the pattern being loaded
184 contains valid input. It receives the pattern as a whole to be added,
before it has been tokenised by the lexer. It may return 0 or "undef" to
186 indicate that the pattern should not be added, any true value indicates
187 that the contents are fine.
188
189 A filter to strip out trailing comments (marked by #):
190
191 $re->pre_filter( sub { $_[0] =~ s/\s*#.*$//; 1 } );
192
193 A filter to ignore blank lines:
194
195 $re->pre_filter( sub { length(shift) } );
196
197 If you want to remove the filter, pass "undef" as a parameter.
198
199 $ra->pre_filter(undef);
200
201 This method is chainable.
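The two filters shown above can be combined into one callback. This is a sketch under the assumption that "#" never appears legitimately inside your patterns; the input lines are invented.

```perl
use strict;
use warnings;
use Regexp::Assemble;

my $ra = Regexp::Assemble->new;

$ra->pre_filter( sub {
    $_[0] =~ s/\s*#.*$//;    # strip a trailing comment, in place
    return length $_[0];     # false (0) when nothing is left
} );

$ra->add(
    'foo\d+   # numbered foo',
    '',
    '# a comment-only line',
    'bar',
);

my $added = $ra->stats_add;
my $re    = $ra->re;
print "$added patterns were accepted\n";
```

Only the two real patterns should survive the filter; the blank line and the comment-only line are dropped before lexing.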
202
203 filter(CODE)
204 Allows you to install a callback to check that the pattern being loaded
205 contains valid input. It receives a list on input, after it has been
tokenised by the lexer. It may return 0 or undef to indicate that
207 the pattern should not be added, any true value indicates that the
208 contents are fine.
209
210 If you know that all patterns you expect to assemble contain a
restricted set of tokens (e.g. no spaces), you could do the
212 following:
213
214 $ra->filter(sub { not grep { / / } @_ });
215
216 or
217
sub only_spaces_and_digits {
    # reject the pattern if any token is not made up of digits and spaces
    not grep { !/\A[\d ]+\z/ } @_
}
221 $ra->filter( \&only_spaces_and_digits );
222
These two examples will silently ignore faulty patterns. If you want
224 the user to be made aware of the problem you should raise an error (via
225 "warn" or "die"), log an error message, whatever is best. If you want
226 to remove a filter, pass "undef" as a parameter.
227
228 $ra->filter(undef);
229
230 This method is chainable.
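One way to avoid silently losing patterns is to have the filter record what it rejects. The sketch below collects rejected patterns in an array (a hypothetical @rejected, not part of the module) so that the caller can report them afterwards.

```perl
use strict;
use warnings;
use Regexp::Assemble;

my @rejected;    # hypothetical bookkeeping, not part of the module
my $ra = Regexp::Assemble->new;

$ra->filter( sub {
    my @tokens = @_;
    if ( grep { / / } @tokens ) {
        # Remember the offending pattern instead of dropping it quietly.
        push @rejected, join '', @tokens;
        return 0;
    }
    return 1;
} );

$ra->add( 'ab+c', 'a b', 'abd' );

print "rejected: <$_>\n" for @rejected;
print $ra->stats_add, " patterns accepted\n";
```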
231
232 as_string
233 Assemble the expression and return it as a string. You may want to do
234 this if you are writing the pattern to a file. The following arguments
235 can be passed to control the aspect of the resulting pattern:
236
237 indent, the number of spaces used to indent nested grouping of a
238 pattern. Use this to produce a pretty-printed pattern (for some
239 definition of "pretty"). The resulting output is rather verbose. The
240 reason is to ensure that the metacharacters "(?:" and ")" always occur
on otherwise empty lines. This allows you to grep the result for an even
242 more synthetic view of the pattern:
243
244 egrep -v '^ *[()]' <regexp.file>
245
246 The result of the above is quite readable. Remember to backslash the
247 spaces appearing in your own patterns if you wish to use an indented
248 pattern in an "m/.../x" construct. Indenting is ignored if tracking is
249 enabled.
250
251 The indent argument takes precedence over the "indent" method/attribute
252 of the object.
253
254 Calling this method will drain the internal data structure. Large
255 numbers of patterns can eat a significant amount of memory, and this
256 lets perl recover the memory used for other purposes.
257
258 If you want to reduce the pattern and continue to add new patterns,
259 clone the object and reduce the clone, leaving the original object
260 intact.
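The sketch below follows the advice above: it reduces clones, so that the original object keeps its patterns, and produces both a compact and an indented rendering. The four words are arbitrary, and the exact layout of the indented output is not shown here because it depends on the installed version.

```perl
use strict;
use warnings;
use Regexp::Assemble;

my $ra = Regexp::Assemble->new->add(qw(brag tag dig dim));

# Reduce clones, leaving $ra intact for further additions.
my $compact = $ra->clone->as_string;
my $pretty  = $ra->clone->as_string( indent => 4 );

print "$compact\n\n$pretty\n";

# The indented form contains insignificant whitespace, so it must be
# compiled with the /x modifier.
my $re_x = qr/\A(?:$pretty)\z/x;
print "tag matches the indented pattern\n" if 'tag' =~ $re_x;
```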
261
262 re
Assembles the pattern and returns it as a compiled RE, using the "qr//"
264 operator.
265
266 As with "as_string", calling this method will reset the internal data
267 structures to free the memory used in assembling the RE.
268
269 The indent attribute, documented in the "as_string" method, can be used
270 here (it will be ignored if tracking is enabled).
271
272 With method chaining, it is possible to produce a RE without having a
273 temporary "Regexp::Assemble" object lying around, e.g.:
274
275 my $re = Regexp::Assemble->new
276 ->add( q[ab+cd+e] )
277 ->add( q[ac\\d+e] )
278 ->add( q[c\\d+e] )
279 ->re;
280
281 The $re variable now contains a Regexp object that can be used
282 directly:
283
while( <> ) {
    /$re/ and print "Something in [$_] matched\n";
}
287
288 The "re" method is called when the object is used in string context
289 (hence, within an "m//" operator), so by and large you do not even need
290 to save the RE in a separate variable. The following will work as
291 expected:
292
293 my $re = Regexp::Assemble->new->add( qw[ fee fie foe fum ] );
294 while( <IN> ) {
295 if( /($re)/ ) {
296 print "Here be giants: $1\n";
297 }
298 }
299
300 This approach does not work with tracked patterns. The "match" and
301 "matched" methods must be used instead, see below.
302
303 match(SCALAR)
304 The following information applies to Perl 5.8 and below. See the
305 section that follows for information on Perl 5.10.
306
307 If pattern tracking is in use, you must "use re 'eval'" in order to
308 make things work correctly. At a minimum, this will make your code look
309 like this:
310
my $did_match = do { use re 'eval'; $target =~ /$ra/ };
if( $did_match ) {
    print "matched ", $ra->matched, "\n";
}
315
316 (The main reason is that the $^R variable is currently broken and an
317 ugly workaround that runs some Perl code during the match is required,
318 in order to simulate what $^R should be doing. See Perl bug #32840 for
319 more information if you are curious. The README also contains more
320 information). This bug has been fixed in 5.10.
321
322 The important thing to note is that with "use re 'eval'", THERE ARE
323 SECURITY IMPLICATIONS WHICH YOU IGNORE AT YOUR PERIL. The problem is
324 this: if you do not have strict control over the patterns being fed to
325 "Regexp::Assemble" when tracking is enabled, and someone slips you a
326 pattern such as "/^(?{system 'rm -rf /'})/" and you attempt to match a
327 string against the resulting pattern, you will know Fear and Loathing.
328
What is more, the $^R workaround means that tracking does not work
330 if you perform a bare "/$re/" pattern match as shown above. You have to
331 instead call the "match" method, in order to supply the necessary
332 context to take care of the tracking housekeeping details.
333
334 if( defined( my $match = $ra->match($_)) ) {
335 print " $_ matched by $match\n";
336 }
337
338 In the case of a successful match, the original matched pattern is
339 returned directly. The matched pattern will also be available through
340 the "matched" method.
341
342 (Except that the above is not true for 5.6.0: the "match" method
343 returns true or undef, and the "matched" method always returns undef).
344
345 If you are capturing parts of the pattern e.g. "foo(bar)rat" you will
346 want to get at the captures. See the "mbegin", "mend", "mvar" and
347 "capture" methods. If you are not using captures then you may safely
348 ignore this section.
349
350 In 5.10, since the bug concerning $^R has been resolved, there is no
351 need to use "re 'eval'" and the assembled pattern does not require any
352 Perl code to be executed during the match.
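A minimal sketch of tracked matching through the "match" method, assuming Perl 5.10 or later and patterns that you control yourself. The patterns and target strings are invented for the example.

```perl
use strict;
use warnings;
use Regexp::Assemble;

my $ra = Regexp::Assemble->new( track => 1 );
$ra->add( 'foo\d+', 'bar+' );

my %seen;
for my $target (qw(foo42 barrr quux)) {
    # match() returns the source pattern, or undef if nothing matched.
    my $source = $ra->match($target);
    $seen{$target} = $source;
    print defined $source
        ? "$target matched by $source\n"
        : "$target: no match\n";
}
```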
353
354 new()
355 Creates a new "Regexp::Assemble" object. The following optional
356 key/value parameters may be employed. All keys have a corresponding
357 method that can be used to change the behaviour later on. As a general
358 rule, especially if you're just starting out, you don't have to bother
359 with any of these.
360
361 anchor_*, a family of optional attributes that allow anchors ("^",
362 "\b", "\Z"...) to be added to the resulting pattern.
363
364 flags, sets the "imsx" flags to add to the assembled regular
365 expression. Warning: no error checking is done, you should ensure that
366 the flags you pass are understood by the version of Perl you are using.
367 modifiers exists as an alias, for users familiar with Regexp::List.
368
369 chomp, controls whether the pattern should be chomped before being
370 lexed. Handy if you are reading patterns from a file. By default,
371 "chomp"ing is performed (this behaviour changed as of version 0.24,
372 prior versions did not chomp automatically). See also the "file"
373 attribute and the "add_file" method.
374
375 file, slurp the contents of the specified file and add them to the
376 assembly. Multiple files may be processed by using a list.
377
378 my $r = Regexp::Assemble->new(file => 're.list');
379
380 my $r = Regexp::Assemble->new(file => ['re.1', 're.2']);
381
382 If you really don't want chomping to occur, you will have to set the
383 "chomp" attribute to 0 (zero). You may also want to look at the
384 "input_record_separator" attribute, as well.
385
386 input_record_separator, controls what constitutes a record separator
387 when using the "file" attribute or the "add_file" method. May be
388 abbreviated to rs. See the $/ variable in perlvar.
389
390 lookahead, controls whether the pattern should contain zero-width
lookahead assertions (for instance: (?=[abc])(?:bob|alice|charles)).
392 This is not activated by default, because in many circumstances the
393 cost of processing the assertion itself outweighs the benefit of its
394 faculty for short-circuiting a match that will fail. This is sensitive
395 to the probability of a match succeeding, so if you're worried about
396 performance you'll have to benchmark a sample population of targets to
397 see which way the benefits lie.
398
track, controls whether you want to know which of the initial patterns was
400 the one that matched. See the "matched" method for more details. Note
401 for version 5.8 of Perl and below, in this mode of operation YOU SHOULD
402 BE AWARE OF THE SECURITY IMPLICATIONS that this entails. Perl 5.10 does
403 not suffer from any such restriction.
404
405 indent, the number of spaces used to indent nested grouping of a
406 pattern. Use this to produce a pretty-printed pattern. See the
407 "as_string" method for a more detailed explanation.
408
409 pre_filter, allows you to add a callback to enable sanity checks on the
410 pattern being loaded. This callback is triggered before the pattern is
411 split apart by the lexer. In other words, it operates on the entire
412 pattern. If you are loading patterns from a file, this would be an
413 appropriate place to remove comments.
414
415 filter, allows you to add a callback to enable sanity checks on the
416 pattern being loaded. This callback is triggered after the pattern has
417 been split apart by the lexer.
418
419 unroll_plus, controls whether to unroll, for example, "x+" into "x",
420 "x*", which may allow additional reductions in the resulting assembled
421 pattern.
422
423 reduce, controls whether tail reduction occurs or not. If set, patterns
424 like a(?:bc+d|ec+d) will be reduced to "a[be]c+d". That is, the end of
the pattern in each part of the b... and e... alternations is
426 identical, and hence is hoisted out of the alternation and placed after
427 it. On by default. Turn it off if you're really pressed for short
428 assembly times.
429
430 lex, specifies the pattern used to lex the input lines into tokens. You
431 could replace the default pattern by a more sophisticated version that
432 matches arbitrarily nested parentheses, for example.
433
debug, controls whether copious amounts of output are produced during
435 the loading stage or the reducing stage of assembly.
436
437 my $ra = Regexp::Assemble->new;
438 my $rb = Regexp::Assemble->new( chomp => 1, debug => 3 );
439
440 mutable, controls whether new patterns can be added to the object after
441 the assembled pattern is generated. DEPRECATED.
442
443 This method/attribute will be removed in a future release. It doesn't
444 really serve any purpose, and may be more effectively replaced by
445 cloning an existing "Regexp::Assemble" object and spinning out a
446 pattern from that instead.
447
448 source()
449 When using tracked mode, after a successful match is made, returns the
450 original source pattern that caused the match. In Perl 5.10, the $^R
variable can be used as an index to fetch the correct pattern from
452 the object.
453
454 If no successful match has been performed, or the object is not in
455 tracked mode, this method returns "undef".
456
457 my $r = Regexp::Assemble->new->track(1)->add(qw(foo? bar{2} [Rr]at));
458
459 for my $w (qw(this food is rather barren)) {
460 if ($w =~ /$r/) {
461 print "$w matched by ", $r->source($^R), $/;
462 }
463 else {
464 print "$w no match\n";
465 }
466 }
467
468 mbegin()
469 This method returns a copy of "@-" at the moment of the last match. You
470 should ordinarily not need to bother with this, "mvar" should be able
471 to supply all your needs.
472
473 mend()
474 This method returns a copy of "@+" at the moment of the last match.
475
476 mvar(NUMBER)
477 The "mvar" method returns the captures of the last match. mvar(1)
478 corresponds to $1, mvar(2) to $2, and so on. mvar(0) happens to return
479 the target string matched, as a byproduct of walking down the "@-" and
480 "@+" arrays after the match.
481
482 If called without a parameter, "mvar" will return a reference to an
483 array containing all captures.
484
485 capture
The "capture" method returns the captures of the last match as an
array. Unlike "mvar", this method does not include the matched string.
488 It is equivalent to getting an array back that contains "$1, $2, $3,
489 ...".
490
491 If no captures were found in the match, an empty array is returned,
492 rather than "undef". You are therefore guaranteed to be able to use
"for my $c ($re->capture) { ..." without having to check whether
494 anything was captured.
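The following sketch shows "mvar" and "capture" side by side after a call to "match". It assumes a single pattern with two simple (non-nested) capture groups; the pattern and the target string are made up.

```perl
use strict;
use warnings;
use Regexp::Assemble;

my $ra = Regexp::Assemble->new->add('(\d+)-(\w+)');

my $matched = $ra->match('order 123-abc');

if ($matched) {
    print "whole match: ", $ra->mvar(0), "\n";    # the text that matched
    print "first:       ", $ra->mvar(1), "\n";    # like $1
    print "second:      ", $ra->mvar(2), "\n";    # like $2

    # capture() returns only the captures, without the whole match.
    for my $c ( $ra->capture ) {
        print "capture: $c\n";
    }
}
```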
495
496 matched()
497 If pattern tracking has been set, via the "track" attribute, or through
498 the "track" method, this method will return the original pattern of the
last successful match. Returns undef if no match has yet been performed, or
if tracking has not been enabled.
501
See below in the NOTES section for additional subtleties of which you
should be aware when tracking patterns.
504
505 Note that this method is not available in 5.6.0, due to limitations in
506 the implementation of "(?{...})" at the time.
507
508 Statistics/Reporting routines
509 stats_add
510 Returns the number of patterns added to the assembly (whether by "add"
511 or "insert"). Duplicate patterns are not included in this total.
512
513 stats_dup
514 Returns the number of duplicate patterns added to the assembly. If
515 non-zero, this may be a sign that something is wrong with your data (or
516 at the least, some needless redundancy). This may occur when you have
517 two patterns (for instance, "a\-b" and "a-b") which map to the same
518 result.
519
520 stats_raw()
521 Returns the raw number of bytes in the patterns added to the assembly.
522 This includes both original and duplicate patterns. For instance,
523 adding the two patterns "ab" and "ab" will count as 4 bytes.
524
525 stats_cooked()
526 Return the true number of bytes added to the assembly. This will not
527 include duplicate patterns. Furthermore, it may differ from the raw
528 bytes due to quotemeta treatment. For instance, "abc\,def" will count
529 as 7 (not 8) bytes, because "\," will be stored as ",". Also, "\Qa.b\E"
530 is 7 bytes long, however, after the quotemeta directives are processed,
531 "a\.b" will be stored, for a total of 4 bytes.
532
533 stats_length()
534 Returns the length of the resulting assembled expression. Until
535 "as_string" or "re" have been called, the length will be 0 (since the
536 assembly will have not yet been performed). The length includes only
537 the pattern, not the additional ("(?-xism...") fluff added by the
538 compilation.
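To make the counters concrete, here is a sketch that adds the same two-byte pattern twice, as in the "stats_raw" description above, and prints the statistics before and after assembly.

```perl
use strict;
use warnings;
use Regexp::Assemble;

my $ra = Regexp::Assemble->new;
$ra->add( 'ab', 'ab' );    # the second one is a duplicate

my $length_before = $ra->stats_length;    # nothing assembled yet

printf "added=%d dup=%d raw=%d cooked=%d length=%d\n",
    $ra->stats_add, $ra->stats_dup, $ra->stats_raw,
    $ra->stats_cooked, $length_before;

# Assembling the pattern is what sets the length.
my $str = $ra->as_string;
printf "assembled '%s', length=%d\n", $str, $ra->stats_length;
```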
539
540 dup_warn(NUMBER|CODEREF)
541 Turns warnings about duplicate patterns on or off. By default, no
542 warnings are emitted. If the method is called with no parameters, or a
543 true parameter, the object will carp about patterns it has already
544 seen. To turn off the warnings, use 0 as a parameter.
545
546 $r->dup_warn();
547
548 The method may also be passed a code block. In this case the code will
549 be executed and it will receive a reference to the object in question,
550 and the lexed pattern.
551
552 $r->dup_warn(
553 sub {
554 my $self = shift;
555 print $self->stats_add, " patterns added at line $.\n",
556 join( '', @_ ), " added previously\n";
557 }
);
559
560 Anchor routines
561 Suppose you wish to assemble a series of patterns that all begin with
562 "^" and end with "$" (anchor pattern to the beginning and end of
563 line). Rather than add the anchors to each and every pattern (and
564 possibly forget to do so when a new entry is added), you may specify
565 the anchors in the object, and they will appear in the resulting
566 pattern, and you no longer need to (or should) put them in your source
567 patterns. For example, the two following snippets will produce
568 identical patterns:
569
570 $r->add(qw(^this ^that ^them))->as_string;
571
572 $r->add(qw(this that them))->anchor_line_begin->as_string;
573
574 # both techniques will produce ^th(?:at|em|is)
575
All anchors are possible: word ("\b") boundaries, line boundaries ("^"
and "$") and string boundaries ("\A" and "\Z" (or "\z" if you
absolutely need it)).

The shortcut "anchor_mumble", which implies both "anchor_mumble_begin"
and "anchor_mumble_end", is also available. If different anchors are
582 specified the most specific anchor wins. For instance, if both
583 "anchor_word_begin" and "anchor_line_begin" are specified,
584 "anchor_word_begin" takes precedence.
585
586 All the anchor methods are chainable.
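As a quick sketch of the effect of an anchor on matching, the following assembles two words with word-boundary anchors and tries them against a few invented strings.

```perl
use strict;
use warnings;
use Regexp::Assemble;

my $re = Regexp::Assemble->new
    ->add(qw(cat carrot))
    ->anchor_word    # same as anchor_word_begin plus anchor_word_end
    ->re;

for my $target ( 'a cat sat', 'carrot', 'concatenate' ) {
    printf "%-12s %s\n", $target,
        ( $target =~ $re ? 'matches' : 'does not match' );
}
```

The word "concatenate" contains "cat", but not at a word boundary, so it should not match.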
587
588 anchor_word_begin
589 The resulting pattern will be prefixed with a "\b" word boundary
590 assertion when the value is true. Set to 0 to disable.
591
592 $r->add('pre')->anchor_word_begin->as_string;
593 # produces '\bpre'
594
595 anchor_word_end
596 The resulting pattern will be suffixed with a "\b" word boundary
597 assertion when the value is true. Set to 0 to disable.
598
599 $r->add(qw(ing tion))
600 ->anchor_word_end
601 ->as_string; # produces '(?:tion|ing)\b'
602
603 anchor_word
The resulting pattern will have "\b" word boundary assertions at the
beginning and end of the pattern when the value is true. Set to 0 to
disable.

$r->add(qw(cat carrot))
    ->anchor_word(1)
    ->as_string; # produces '\bca(?:rro)?t\b'
611
612 anchor_line_begin
613 The resulting pattern will be prefixed with a "^" line boundary
614 assertion when the value is true. Set to 0 to disable.
615
616 $r->anchor_line_begin;
617 # or
618 $r->anchor_line_begin(1);
619
620 anchor_line_end
621 The resulting pattern will be suffixed with a "$" line boundary
622 assertion when the value is true. Set to 0 to disable.
623
624 # turn it off
625 $r->anchor_line_end(0);
626
627 anchor_line
The resulting pattern will have the "^" and "$" line boundary
assertions at the beginning and end of the pattern, respectively, when
the value is true. Set to 0 to disable.

$r->add(qw(cat carrot))
    ->anchor_line
    ->as_string; # produces '^ca(?:rro)?t$'
635
636 anchor_string_begin
637 The resulting pattern will be prefixed with a "\A" string boundary
638 assertion when the value is true. Set to 0 to disable.
639
640 $r->anchor_string_begin(1);
641
642 anchor_string_end
643 The resulting pattern will be suffixed with a "\Z" string boundary
644 assertion when the value is true. Set to 0 to disable.
645
646 # disable the string boundary end anchor
647 $r->anchor_string_end(0);
648
649 anchor_string_end_absolute
650 The resulting pattern will be suffixed with a "\z" string boundary
651 assertion when the value is true. Set to 0 to disable.
652
653 # disable the string boundary absolute end anchor
654 $r->anchor_string_end_absolute(0);
655
656 If you don't understand the difference between "\Z" and "\z", the
657 former will probably do what you want.
658
659 anchor_string
The resulting pattern will have the "\A" and "\Z" string boundary
assertions at the beginning and end of the pattern, respectively, when
the value is true. Set to 0 to disable.

$r->add(qw(cat carrot))
    ->anchor_string
    ->as_string; # produces '\Aca(?:rro)?t\Z'
667
668 anchor_string_absolute
The resulting pattern will have the "\A" and "\z" string boundary
assertions at the beginning and end of the pattern, respectively, when
the value is true. Set to 0 to disable.

$r->add(qw(cat carrot))
    ->anchor_string_absolute
    ->as_string; # produces '\Aca(?:rro)?t\z'
676
677 debug(NUMBER)
678 Turns debugging on or off. Statements are printed to the currently
679 selected file handle (STDOUT by default). If you are already using
680 this handle, you will have to arrange to select an output handle to a
file of your own choosing, before calling the "add", "as_string" or "re"
682 functions, otherwise it will scribble all over your carefully formatted
683 output.
684
• 0

Off. Turns off all debugging output.
686
687 • 1
688
689 Add. Trace the addition of patterns.
690
691 • 2
692
693 Reduce. Trace the process of reduction and assembly.
694
695 • 4
696
697 Lex. Trace the lexing of the input patterns into its constituent
698 tokens.
699
700 • 8
701
702 Time. Print to STDOUT the time taken to load all the patterns. This
703 is nothing more than the difference between the time the object was
704 instantiated and the time reduction was initiated.
705
706 # load=<num>
707
708 Any lengthy computation performed in the client code will be
709 reflected in this value. Another line will be printed after
710 reduction is complete.
711
712 # reduce=<num>
713
714 The above output lines will be changed to "load-epoch" and
715 "reduce-epoch" if the internal state of the object is corrupted and
716 the initial timestamp is lost.
717
718 The code attempts to load Time::HiRes in order to report fractional
719 seconds. If this is not successful, the elapsed time is displayed
720 in whole seconds.
721
Values can be added (or or'ed together) to combine traces; 7, for
instance, enables add, reduce and lex tracing:
723
724 $r->debug(7)->add( '\\d+abc' );
725
726 Calling "debug" with no arguments turns debugging off.
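Since the trace goes to the currently selected file handle, one way to keep it out of your own output is to select another handle while the object is working. The sketch below uses an in-memory handle (a choice made for the example; a log file would work the same way).

```perl
use strict;
use warnings;
use Regexp::Assemble;

# Send debugging output to an in-memory buffer instead of STDOUT.
my $trace = '';
open my $log, '>', \$trace or die "cannot open in-memory log: $!";
my $previous = select $log;

my $ra = Regexp::Assemble->new;
$ra->debug(7)->add( 'ab+c', 'ab+d' );
my $re = $ra->re;

# Restore the previously selected handle before printing anything else.
select $previous;
close $log;

printf "captured %d bytes of trace output\n", length $trace;
```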
727
728 dump()
729 Produces a synthetic view of the internal data structure. How to
730 interpret the results is left as an exercise to the reader.
731
732 print $r->dump;
733
734 chomp(0|1)
735 Turns chomping on or off.
736
737 IMPORTANT: As of version 0.24, chomping is now on by default as it
738 makes "add_file" Just Work. The only time you may run into trouble is
739 with add("\\$/"). So don't do that, or else explicitly turn off
740 chomping.
741
742 To avoid incorporating (spurious) record separators (such as "\n" on
743 Unix) when reading from a file, add() "chomp"s its input. If you don't
744 want this to happen, call "chomp" with a false value.
745
746 $re->chomp(0); # really want the record separators
747 $re->add(<DATA>);
748
749 fold_meta_pairs(NUMBER)
750 Determines whether "\s", "\S" and "\w", "\W" and "\d", "\D" are folded
751 into a "." (dot). Folding happens by default (for reasons of backwards
752 compatibility, even though it is wrong when the "/s" expression
753 modifier is active).
754
755 Call this method with a false value to prevent this behaviour (which is
756 only a problem when dealing with "\n" if the "/s" expression modifier
757 is also set).
758
759 $re->add( '\\w', '\\W' );
760 my $clone = $re->clone;
761
762 $clone->fold_meta_pairs(0);
print $clone->as_string; # prints '[\W\w]'
print $re->as_string; # prints '.'
765
766 indent(NUMBER)
767 Sets the level of indent for pretty-printing nested groups within a
768 pattern. See the "as_string" method for more details. When called
769 without a parameter, no indenting is performed.
770
771 $re->indent( 4 );
772 print $re->as_string;
773
774 lookahead(0|1)
775 Turns on zero-width lookahead assertions. This is usually beneficial
776 when you expect that the pattern will usually fail. If you expect that
777 the pattern will usually match you will probably be worse off.
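A small sketch to see the difference the attribute makes, using the three names from the description of the "lookahead" attribute of "new". The exact assembled strings are printed rather than assumed.

```perl
use strict;
use warnings;
use Regexp::Assemble;

my @names = qw(bob alice charles);

my $plain = Regexp::Assemble->new->add(@names)->as_string;
my $ahead = Regexp::Assemble->new( lookahead => 1 )->add(@names)->as_string;

print "without lookahead: $plain\n";
print "with lookahead:    $ahead\n";

# Both forms are expected to accept and reject the same strings.
for my $target (qw(alice dave)) {
    printf "%s: plain=%s lookahead=%s\n", $target,
        ( $target =~ /\A(?:$plain)\z/ ? 'yes' : 'no' ),
        ( $target =~ /\A(?:$ahead)\z/ ? 'yes' : 'no' );
}
```

Whether the lookahead version is actually faster depends on your data, so benchmark with a representative sample of targets.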
778
779 flags(STRING)
780 Sets the flags that govern how the pattern behaves (for versions of
781 Perl up to 5.9 or so, these are "imsx"). By default no flags are
782 enabled.
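For instance, a case-insensitive assembly might be sketched as follows, using the "flags" attribute of "new". The words are arbitrary.

```perl
use strict;
use warnings;
use Regexp::Assemble;

my $strict  = Regexp::Assemble->new->add(qw(foo bar))->re;
my $relaxed = Regexp::Assemble->new( flags => 'i' )->add(qw(foo bar))->re;

for my $target (qw(foo FOO Bar)) {
    printf "%s: default=%s with-i=%s\n", $target,
        ( $target =~ /\A$strict\z/  ? 'yes' : 'no' ),
        ( $target =~ /\A$relaxed\z/ ? 'yes' : 'no' );
}
```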
783
784 modifiers(STRING)
785 An alias of the "flags" method, for users familiar with "Regexp::List".
786
787 track(0|1)
788 Turns tracking on or off. When this attribute is enabled, additional
789 housekeeping information is inserted into the assembled expression
using "(?{...})" embedded code constructs. This provides the necessary
791 information to determine which, of the original patterns added, was the
792 one that caused the match.
793
794 $re->track( 1 );
795 if( $target =~ /$re/ ) {
796 print "$target matched by ", $re->matched, "\n";
797 }
798
799 Note that when this functionality is enabled, no reduction is performed
800 and no character classes are generated. In other words, "brag|tag" is
801 not reduced down to "(?:br|t)ag" and "dig|dim" is not reduced to
802 "di[gm]".
803
804 unroll_plus(0|1)
805 Turns the unrolling of plus metacharacters on or off. When a pattern is
broken up, "a+" becomes "a", "a*" (and "b+?" becomes "b", "b*?"). This
807 may allow the freed "a" to assemble with other patterns. Not enabled by
808 default.
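
The rewrite can be sketched on a list of tokens as follows. The helper
unroll_plus_tokens is hypothetical (it is not part of the module) and
only understands single characters, backslash escapes and simple
character classes as the thing being repeated.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite "X+" as "X", "X*" and "X+?" as "X", "X*?"
# in a list of already-lexed tokens. Other tokens pass through untouched.
sub unroll_plus_tokens {
    my @out;
    for my $token (@_) {
        # An atom (escape, single character or class) followed by + or +?
        if ($token =~ /\A(\\.|[^\\]|\[.*\])\+(\??)\z/s) {
            push @out, $1, "$1*$2";
        }
        else {
            push @out, $token;
        }
    }
    return @out;
}

print join(' ', unroll_plus_tokens('a+', 'b')), "\n";   # a a* b
print join(' ', unroll_plus_tokens('b+?', 'c')), "\n";  # b b*? c
```

After unrolling, the tokens of "a+b" start with a bare "a", which is
what lets them share a prefix with the tokens of, say, "ac".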
809
810 lex(SCALAR)
811 Change the pattern used to break a string apart into tokens. You can
812 examine the "eg/naive" script as a starting point.
813
814 reduce(0|1)
815 Turns pattern reduction on or off. A reduced pattern may be
816 considerably shorter than an unreduced pattern. Consider
817 "/sl(?:ip|op|ap)/" versus "/sl[aio]p/". An unreduced pattern will be
818 very similar to those produced by "Regexp::Optimizer". Reduction is on
819 by default. Turning it off speeds assembly (but assembly is pretty fast
820 -- it's the breaking up of the initial patterns in the lexing stage
821 that can consume a non-negligible amount of time).
822
823 mutable(0|1)
824 This method has been marked as DEPRECATED. It will be removed in a
825 future release. See the "clone" method for a technique to replace its
826 functionality.
827
828 reset()
829 Empties out the patterns that have been "add"ed or "insert"-ed into the
830 object. Does not modify the state of controller attributes such as
831 "debug", "lex", "reduce" and the like.
832
833 Default_Lexer
834 Warning: the "Default_Lexer" function is a class method, not an object
835 method. It is a fatal error to call it as an object method.
836
837 The "Default_Lexer" method lets you replace the default pattern used
838 for all subsequently created "Regexp::Assemble" objects. It will not
839 have any effect on existing objects. (It is also possible to override
840 the lexer pattern used on a per-object basis).
841
842 The parameter should be an ordinary scalar, not a compiled pattern. If
843 the pattern fails to match all parts of the string, the missing parts
844 will be returned as single chunks. Therefore the following pattern is
845 legal (albeit rather cork-brained):
846
847 Regexp::Assemble::Default_Lexer( '\\d' );
848
849 The above pattern will split up input strings digit by digit, and all
850 non-digit characters as single chunks.
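
A minimal sketch of that behaviour in plain Perl follows. The function
lex_with is a hypothetical stand-in for the module's tokenising step,
and it assumes that each stretch of text the lexer pattern skips over
comes back as one chunk, which is one reading of the description
above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the tokenising step: text matched by $lexer
# becomes a token, and each stretch the lexer skipped becomes one chunk.
sub lex_with {
    my ($lexer, $string) = @_;
    my @tokens;
    my $last = 0;
    while ($string =~ /($lexer)/g) {
        my $start = pos($string) - length($1);
        push @tokens, substr($string, $last, $start - $last) if $start > $last;
        push @tokens, $1;
        $last = pos($string);
    }
    # Whatever follows the final match is returned as a last chunk.
    push @tokens, substr($string, $last) if $last < length($string);
    return @tokens;
}

print join('|', lex_with('\\d', 'ab12cd3')), "\n";   # ab|1|2|cd|3
```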
851
"Cannot pass a [refname] to Default_Lexer"
854
You tried to replace the default lexer pattern with an object instead
of a scalar. Solution: You probably tried to call
"$obj->Default_Lexer". Call the qualified class method instead:
"Regexp::Assemble::Default_Lexer".
859
860 "filter method not passed a coderef"
861
862 "pre_filter method not passed a coderef"
863
864 A reference to a subroutine (anonymous or otherwise) was expected.
865 Solution: read the documentation for the "filter" method.
866
867 "duplicate pattern added: /.../"
868
869 The "dup_warn" attribute is active, and a duplicate pattern was added
870 (well duh!). Solution: clean your data.
871
872 "cannot open [file] for input: [reason]"
873
874 The "add_file" method was unable to open the specified file for
875 whatever reason. Solution: make sure the file exists and the script has
876 the required privileges to read it.
877
879 This module has been tested successfully with a range of versions of
880 perl, from 5.005_03 to 5.9.3. Use of 5.6.0 is not recommended.
881
882 The expressions produced by this module can be used with the PCRE
883 library.
884
885 Remember to "double up" your backslashes if the patterns are hard-coded
886 as constants in your program. That is, you should literally
887 add('a\\d+b') rather than add('a\d+b'). It usually will work either
888 way, but it's good practice to do so.
889
Where possible, supply the simplest tokens possible. Don't add
"X(?:-\d+){2}Y" when "X-\d+-\d+Y" will do. The reason is that if you
also add "X-\d+Z" the resulting assembly changes dramatically:
"X(?:(?:-\d+){2}Y|-\d+Z)" versus "X-\d+(?:-\d+Y|Z)". Since R::A doesn't
perform enough analysis, it won't "unroll" the "{2}" quantifier, and
will fail to notice the divergence after the first "-\d+".
896
897 Furthermore, when the string 'X-123000P' is matched against the first
898 assembly, the regexp engine will have to backtrack over each
899 alternation (the one that ends in Y and the one that ends in Z) before
900 determining that there is no match. No such backtracking occurs in the
901 second pattern: as soon as the engine encounters the 'P' in the target
902 string, neither of the alternations at that point ("-\d+Y" or "Z")
903 could succeed and so the match fails.
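
The two assemblies discussed above can be compared directly in plain
Perl. This sketch only checks that they accept and reject the same
targets; it does not measure the backtracking. The names $grouped and
$unrolled and the "\A...\z" anchors are additions made for the
example.

```perl
use strict;
use warnings;

# The two assemblies discussed above, anchored so whole strings are tested.
my $grouped  = qr/\AX(?:(?:-\d+){2}Y|-\d+Z)\z/;
my $unrolled = qr/\AX-\d+(?:-\d+Y|Z)\z/;

for my $target ('X-1-2Y', 'X-12Z', 'X-123000P') {
    printf "%-10s grouped=%d unrolled=%d\n",
        $target,
        ($target =~ $grouped  ? 1 : 0),
        ($target =~ $unrolled ? 1 : 0);
}
```

The first two targets match both patterns and 'X-123000P' matches
neither, so the difference lies purely in how much work the engine
does before giving up.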
904
905 "Regexp::Assemble" does, however, know how to build character classes.
906 Given "a-b", "axb" and "a\db", it will assemble these into "a[-\dx]b".
907 When "-" (dash) appears as a candidate for a character class it will be
908 the first character in the class. When "^" (circumflex) appears as a
909 candidate for a character class it will be the last character in the
910 class.
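
The placement rule can be sketched as follows. The helper make_class
is hypothetical and handles only the ordering; it sorts the remaining
members alphabetically, which is an assumption of this example rather
than a statement about the module.

```perl
use strict;
use warnings;

# Hypothetical helper: build a character class from single-character
# tokens, placing "-" first and "^" last so neither acts as a metacharacter.
sub make_class {
    my %seen  = map { $_ => 1 } @_;
    my $dash  = delete($seen{'-'}) ? '-' : '';
    my $caret = delete($seen{'^'}) ? '^' : '';
    return '[' . $dash . join('', sort keys %seen) . $caret . ']';
}

my $class = make_class('x', '-', '\\d');
print "a${class}b\n";                       # a[-\dx]b
```

A dash at the front cannot be read as a range, and a circumflex
anywhere but the front cannot be read as negation. (A class containing
only "^" would still need a backslash, which this sketch does not
handle.)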
911
It also knows about meta-characters that can "absorb" regular
characters. For instance, given "X\d" and "X5", it knows that 5 can be
represented by "\d" and so the assembly is just "X\d". The "absorbent"
meta-characters it deals with are ".", "\d", "\s" and "\w" and their
complements. It will replace "\d"/"\D", "\s"/"\S" and "\w"/"\W" by "."
(dot), and it will drop "\d" if "\w" is also present (and likewise "\D"
in the presence of "\W").
919
920 "Regexp::Assemble" deals correctly with "quotemeta"'s propensity to
921 backslash many characters that have no need to be. Backslashes on non-
922 metacharacters will be removed. Similarly, in character classes, a
923 number of characters lose their magic and so no longer need to be
924 backslashed within a character class. Two common examples are "."
925 (dot) and "$". Such characters will lose their backslash.
926
927 At the same time, it will also process "\Q...\E" sequences. When such a
928 sequence is encountered, the inner section is extracted and "quotemeta"
929 is applied to the section. The resulting quoted text is then used in
930 place of the original unquoted text, and the "\Q" and "\E"
931 metacharacters are thrown away. Similar processing occurs with the
932 "\U...\E" and "\L...\E" sequences. This may have surprising effects
933 when using a dispatch table. In this case, you will need to know
934 exactly what the module makes of your input. Use the "lexstr" method to
935 find out what's going on:
936
937 $pattern = join( '', @{$re->lexstr($pattern)} );
938
939 If all the digits 0..9 appear in a character class, "Regexp::Assemble"
940 will replace them by "\d". I'd do it for letters as well, but thinking
941 about accented characters and other glyphs hurts my head.
942
943 In an alternation, the longest paths are chosen first (for example,
944 "horse|bird|dog"). When two paths have the same length, the path with
945 the most subpaths will appear first. This aims to put the "busiest"
946 paths to the front of the alternation. For example, the list "bad",
947 "bit", "few", "fig" and "fun" will produce the pattern
948 "(?:f(?:ew|ig|un)|b(?:ad|it))". See eg/tld for a real-world example of
949 how alternations are sorted. Once you have looked at that, everything
950 should be crystal clear.
951
When tracking is in use, no reduction is performed, nor are character
classes formed. The reason is that it is too difficult to determine the
original pattern afterwards. Consider the two patterns "pale" and
"palm". These should be reduced to "pal[em]". The final character
matches one of two possibilities. To resolve whether it matched an 'e'
or 'm' would require keeping track of the fact that the pattern
finished up in a character class, which would then require a whole lot
more work to figure out which character of the class matched. Without
character classes it becomes much easier. Instead, "pal(?:e|m)" is
produced, which lets us find out more simply where we ended up.
962
Similarly, "dogfood" and "seafood" should form "(?:dog|sea)food". When
the pattern is being assembled, the tracking decision needs to be made
at the end of the grouping, but the tail of the pattern has not yet
been visited. Deferring things to make this work correctly is a vast
hassle. In this case, the pattern becomes merely "(?:dogfood|seafood)".
Tracked patterns will therefore be bulkier than simple patterns.
969
970 There is an open bug on this issue:
971
972 <http://rt.perl.org/rt3/Ticket/Display.html?id=32840>
973
974 If this bug is ever resolved, tracking would become much easier to deal
975 with (none of the "match" hassle would be required - you could just
976 match like a regular RE and it would Just Work).
977
979 perlre
980 General information about Perl's regular expressions.
981
982 re Specific information about "use re 'eval'".
983
984 Regex::PreSuf
"Regex::PreSuf" takes a string and chops it into tokens of length 1.
Since it can't deal with tokens of more than one character, it
can't deal with meta-characters and thus no regular expressions,
which is the main reason why I wrote this module.
989
990 Regexp::Optimizer
"Regexp::Optimizer" produces regular expressions that are similar
to those produced by R::A with reductions switched off. Its
biggest drawback is that it is exponentially slower than
Regexp::Assemble on very large sets of patterns.
995
996 Regexp::Parser
997 Fine grained analysis of regular expressions.
998
999 Regexp::Trie
1000 Funnily enough, this was my working name for "Regexp::Assemble"
1001 during its development. I changed the name because I thought it was
1002 too obscure. Anyway, "Regexp::Trie" does much the same as
1003 "Regexp::Optimizer" and "Regexp::Assemble" except that it runs much
1004 faster (according to the author). It does not recognise meta
1005 characters (that is, 'a+b' is interpreted as 'a\+b').
1006
1007 Text::Trie
1008 "Text::Trie" is well worth investigating. Tries can outperform very
1009 bushy (read: many alternations) patterns.
1010
1011 Tree::Trie
1012 "Tree::Trie" is another module that builds tries. The algorithm
1013 that "Regexp::Assemble" uses appears to be quite similar to the
1014 algorithm described therein, except that "R::A" solves its end-
1015 marker problem without having to rewrite the leaves.
1016
1018 For alternatives to this module, consider one of:
1019
1020 o Data::Munge
1021 o OnSearch::Regex
1022 o Regex::PreSuf
1023
1025 Some mildly complex cases are not handled well. See
1026 examples/failure.01.pl and
1027 <https://rt.cpan.org/Public/Bug/Display.html?id=104897>.
1028
See also <https://rt.cpan.org/Public/Bug/Display.html?id=106480> for a
discussion of some of the issues arising with the use of a huge number
of alternations. Thanks to Slaven Rezic for the details of trie versus
non-trie operations within Perl which influence regexp handling of
alternations.
1034
"Regexp::Assemble" does not attempt to find common substrings. For
instance, it will not collapse "/cabababc/" down to "/c(?:ab){3}c/".
If there's a module out there that performs this sort of string
analysis I'd like to know about it. But keep in mind that the
algorithms that do this are very expensive: quadratic or worse.
1040
"Regexp::Assemble" does not interpret meta-character modifiers. For
instance, if the following two patterns are given: "X\d" and "X\d+", it
will not determine that "\d" can be matched by "\d+". Instead, it will
produce "X(?:\d|\d+)". Along a similar line of reasoning, it will not
determine that "Z" and "Z\d+" are together equivalent to "Z\d*" (it
will produce "Z(?:\d+)?" instead).
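
The equivalences in question are easy to confirm by hand. The sketch
below compares each pattern quoted above with its tighter counterpart
over a few sample targets; the samples and variable names are
assumptions of the example.

```perl
use strict;
use warnings;

# Each pair: the pattern the module is described as producing, then the
# tighter hand-written equivalent that it does not find.
my @pairs = (
    [ qr/\AX(?:\d|\d+)\z/, qr/\AX\d+\z/ ],
    [ qr/\AZ(?:\d+)?\z/,   qr/\AZ\d*\z/ ],
);

for my $pair (@pairs) {
    my ($assembled, $tighter) = @$pair;
    for my $target (qw(X X1 X123 Z Z7 Z789 Zx)) {
        my $a_ok = $target =~ $assembled ? 1 : 0;
        my $t_ok = $target =~ $tighter   ? 1 : 0;
        die "patterns disagree on $target" if $a_ok != $t_ok;
    }
}
print "both pairs agree on every sample\n";
```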
1047
1048 You cannot remove a pattern that has been added to an object. You'll
1049 just have to start over again. Adding a pattern is difficult enough,
1050 I'd need a solid argument to convince me to add a "remove" method. If
1051 you need to do this you should read the documentation for the "clone"
1052 method.
1053
1054 "Regexp::Assemble" does not (yet)? employ the "(?>...)" construct.
1055
1056 The module does not produce POSIX-style regular expressions. This would
1057 be quite easy to add, if there was a demand for it.
1058
1060 Patterns that generate look-ahead assertions sometimes produce
1061 incorrect patterns in certain obscure corner cases. If you suspect that
1062 this is occurring in your pattern, disable lookaheads.
1063
1064 Tracking doesn't really work at all with 5.6.0. It works better in
1065 subsequent 5.6 releases. For maximum reliability, the use of a 5.8
1066 release is strongly recommended. Tracking barely works with 5.005_04.
1067 Of note, using "\d"-style meta-characters invariably causes panics.
1068 Tracking really comes into its own in Perl 5.10.
1069
If you feed "Regexp::Assemble" patterns with nested parentheses, there
is a chance that the resulting pattern will be uncompilable due to
mismatched parentheses (not enough closing parentheses). This is
normal, so long as the default lexer pattern is used. If you want to
find out which pattern among a list of 3000 patterns is to blame
(speaking from experience here), the eg/debugging script offers a
strategy for pinpointing the pattern at fault. While you may not be
able to use the script directly, the general approach is easy to
implement.
1079
The algorithm used to assemble the regular expressions makes extensive
use of mutually-recursive functions (that is, A calls B, B calls A,
...). For deeply similar expressions, it may be possible to provoke
"Deep recursion" warnings.
1084
The module has been tested extensively, and has an extensive test suite
(that achieves close to 100% statement coverage), but you never know...
A bug may manifest itself in two ways: creating a pattern that cannot
be compiled, such as "a\(bc)", or a pattern that compiles correctly but
that either matches things it shouldn't, or doesn't match things it
should. It is assumed that such problems will occur when the reduction
algorithm encounters some sort of edge case. A temporary work-around is
to disable reductions:
1093
1094 my $pattern = $assembler->reduce(0)->re;
1095
1096 A discussion about implementation details and where bugs might lurk
1097 appears in the README file. If this file is not available locally, you
1098 should be able to find a copy on the Web at your nearest CPAN mirror.
1099
1100 Seriously, though, a number of people have been using this module to
1101 create expressions anywhere from 140Kb to 600Kb in size, and it seems
1102 to be working according to spec. Thus, I don't think there are any
1103 serious bugs remaining.
1104
1105 If you are feeling brave, extensive debugging traces are available to
1106 figure out where assembly goes wrong.
1107
1108 Please report all bugs at
1109 <http://rt.cpan.org/NoAuth/Bugs.html?Dist=Regexp-Assemble>
1110
1111 Make sure you include the output from the following two commands:
1112
1113 perl -MRegexp::Assemble -le 'print $Regexp::Assemble::VERSION'
1114 perl -V
1115
1116 There is a mailing list for the discussion of "Regexp::Assemble".
1117 Subscription details are available at
1118 <http://listes.mongueurs.net/mailman/listinfo/regexp-assemble>.
1119
1121 This module grew out of work I did building access maps for Postfix, a
1122 modern SMTP mail transfer agent. See <http://www.postfix.org/> for more
1123 information. I used Perl to build large regular expressions for
1124 blocking dynamic/residential IP addresses to cut down on spam and
1125 viruses. Once I had the code running for this, it was easy to start
1126 adding stuff to block really blatant spam subject lines, bogus HELO
1127 strings, spammer mailer-ids and more...
1128
I presented the work at the French Perl Workshop in 2004, and the thing
most people asked was whether the underlying mechanism for assembling
the REs was available as a module. At that time it was nothing more
than a twisty maze of scripts, all different. The interest shown
indicated that a module was called for. I'd like to thank the people
who showed interest. Hey, it's going to make my messy scripts smaller,
in any case.
1136
1137 Thomas Drugeon was a valuable sounding board for trying out early
1138 ideas. Jean Forget and Philippe Blayo looked over an early version.
1139 H.Merijn Brandt stopped over in Paris one evening, and discussed things
1140 over a few beers.
1141
Nicholas Clark pointed out that what this module does (?:c|sh)ould be
done in perl's core, as per the 2004 TODO, but he nonetheless
encouraged me to continue with the development of this module. In any
event, this module allows one to gauge the difficulty of undertaking
the endeavour in C. I'd rather gouge my eyes out with a blunt pencil.
1147
1148 Paul Johnson settled the question as to whether this module should live
1149 in the Regex:: namespace, or Regexp:: namespace. If you're not
1150 convinced, try running the following one-liner:
1151
1152 perl -le 'print ref qr//'
1153
1154 Philippe Bruhat found a couple of corner cases where this module could
1155 produce incorrect results. Such feedback is invaluable, and only
1156 improves the module's quality.
1157
1159 The file Changes was converted into Changelog.ini by
1160 Module::Metadata::Changes.
1161
1163 David Landgren
1164
1165 Copyright (C) 2004-2011. All rights reserved.
1166
1167 http://www.landgren.net/perl/
1168
1169 If you use this module, I'd love to hear about what you're using it
1170 for. If you want to be informed of updates, send me a note.
1171
1172 Ron Savage is co-maint of the module, starting with V 0.36.
1173
1175 <https://github.com/ronsavage/Regexp-Assemble.git>
1176
1. Tree equivalencies. Currently, /contend/ /content/ /resend/ /resent/
produces (?:conten[dt]|resen[dt]) but it is possible to produce
(?:cont|res)en[dt] if one can spot the common tail nodes (and walk back
the equivalent paths). Or be by me my => /[bm][ey]/ in the simplest
case.
1183
1184 To do this requires a certain amount of restructuring of the code.
1185 Currently, the algorithm uses a two-phase approach. In the first phase,
1186 the trie is traversed and reductions are performed. In the second
1187 phase, the reduced trie is traversed and the pattern is emitted.
1188
1189 What has to occur is that the reduction and emission have to occur
1190 together. As a node is completed, it is replaced by its string
1191 representation. This then allows child nodes to be compared for
1192 equality with a simple 'eq'. Since there is only a single traversal,
1193 the overall generation time might drop, even though the context baggage
1194 required to delve through the tree will be more expensive to carry
1195 along (a hash rather than a couple of scalars).
1196
Actually, a simpler approach is to tack on a secret sentinel atom at
the end of every pattern, which gives the reduction algorithm
sufficient traction to create a perfect trie.
1200
1201 I'm rewriting the reduction code using this technique.
1202
1203 2. Investigate how (?>foo) works. Can it be applied?
1204
1205 5. How can a tracked pattern be serialised? (Add freeze and thaw
1206 methods).
1207
1208 6. Store callbacks per tracked pattern.
1209
1210 12. utf-8... hmmmm...
1211
1212 14. Adding qr//'ed patterns. For example, consider
1213 $r->add ( qr/^abc/i )
1214 ->add( qr/^abd/ )
1215 ->add( qr/^ab e/x );
1216 this should admit abc abC aBc aBC abd abe as matches
1217
16. Allow a fast, unsafe tracking mode, that can be used if a(?:bc)?
can't happen. (Possibly carp if it does appear during traversal)?
1220
17. given a-\d+-\d+-\d+-\d+-b, produce a(?:-\d+){4}-b. Something
along the lines of (.{4})(\1+) would let the regexp engine
itself be brought to bear on the matter, which is a rather
appealing idea. Consider
1225
1226 while(/(?!\+)(\S{2,}?)(\1+)/g) { ... $1, $2 ... }
1227
1228 as a starting point.
1229
1230 19. The reduction code has become unbelievably baroque. Adding code
1231 to handle (sting,singing,sing) => s(?:(?:ing)?|t)ing was far
1232 too difficult. Adding more stuff just breaks existing behaviour.
1233 And fixing the ^abcd$ ... bug broke stuff all over again.
1234 Now that the corner cases are more clearly identified, a full
1235 rewrite of the reduction code is needed. And would admit the
1236 possibility of implementing items 1 and 17.
1237
1238 20. Handle debug unrev with a separate bit
1239
1240 23. Japhy's http://www.perlmonks.org/index.pl?node_id=90876 list2range
1241 regexp
1242
24. Lookahead assertions contain serious bugs (as shown by
assembling powersets). Need to save more context during reduction,
which in turn will simplify the preparation of the lookahead
classes. See also 19.
1247
26. _lex() swamps the overall run-time. It stems from the decision
to use a single regexp to pull apart any pattern. A suite of
simpler regexps to pick off parens, char classes, quantifiers
and bare tokens may be faster. (This has been implemented as
_fastlex(), but it's only marginally faster.) Perhaps split-by-char
and lex a la C?
1254
1255 27. We don't, as yet, unroll_plus a paren e.g. (abc)+?
1256
1257 28. We don't reroll unrolled a a* to a+ in indented or tracked
1258 output
1259
1260 29. Use (*MARK n) in blead for tracked patterns, and use (*FAIL) for
1261 the unmatchable pattern.
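
As a sketch of where the starting point in item 17 might lead, the
following plain-Perl function applies that backreference pattern to
the text of a pattern and rewrites each run of repeats as a counted
group. The function reroll is hypothetical and deliberately naive: it
works on raw characters, so it knows nothing about character classes
or escapes that happen to straddle a repeat boundary.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse immediate repetitions found in the text
# of a pattern into a counted group, using the regexp from item 17.
sub reroll {
    my ($pattern) = @_;
    $pattern =~ s{(?!\+)(\S{2,}?)(\1+)}
                 {sprintf '(?:%s){%d}', $1, 1 + length($2) / length($1)}ge;
    return $pattern;
}

print reroll('a-\d+-\d+-\d+-\d+-b'), "\n";  # a(?:-\d+){4}-b
print reroll('cabababc'), "\n";             # c(?:ab){3}c
```

$1 holds the shortest repeated unit and $2 the additional copies that
follow it, so the repeat count is one more than the number of copies
in $2.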
1262
1264 This library is free software; you can redistribute it and/or modify it
1265 under the same terms as Perl itself.
1266
1267
1268
1269perl v5.38.0 2023-07-21 Regexp::Assemble(3)