PERLREAPI(1)           Perl Programmers Reference Guide           PERLREAPI(1)

NAME

       perlreapi - Perl regular expression plugin interface

DESCRIPTION

       As of Perl 5.9.5 there is a new interface for plugging in and using
       regular expression engines other than the default one.

       Each engine is supposed to provide access to a constant structure of
       the following format:

           typedef struct regexp_engine {
               REGEXP* (*comp) (pTHX_
                                const SV * const pattern, const U32 flags);
               I32     (*exec) (pTHX_
                                REGEXP * const rx,
                                char* stringarg,
                                char* strend, char* strbeg,
                                SSize_t minend, SV* sv,
                                void* data, U32 flags);
               char*   (*intuit) (pTHX_
                                  REGEXP * const rx, SV *sv,
                                  const char * const strbeg,
                                  char *strpos, char *strend, U32 flags,
                                  struct re_scream_pos_data_s *data);
               SV*     (*checkstr) (pTHX_ REGEXP * const rx);
               void    (*free) (pTHX_ REGEXP * const rx);
               void    (*numbered_buff_FETCH) (pTHX_
                                               REGEXP * const rx,
                                               const I32 paren,
                                               SV * const sv);
               void    (*numbered_buff_STORE) (pTHX_
                                               REGEXP * const rx,
                                               const I32 paren,
                                               SV const * const value);
               I32     (*numbered_buff_LENGTH) (pTHX_
                                                REGEXP * const rx,
                                                const SV * const sv,
                                                const I32 paren);
               SV*     (*named_buff) (pTHX_
                                      REGEXP * const rx,
                                      SV * const key,
                                      SV * const value,
                                      U32 flags);
               SV*     (*named_buff_iter) (pTHX_
                                           REGEXP * const rx,
                                           const SV * const lastkey,
                                           const U32 flags);
               SV*     (*qr_package)(pTHX_ REGEXP * const rx);
           #ifdef USE_ITHREADS
               void*   (*dupe) (pTHX_ REGEXP * const rx, CLONE_PARAMS *param);
           #endif
               REGEXP* (*op_comp) (...);
           } regexp_engine;

       When a regexp is compiled, its "engine" field is then set to point at
       the appropriate structure, so that when it needs to be used Perl can
       find the right routines to do so.

       In order to install a new regexp handler, $^H{regcomp} is set to an
       integer which (when cast appropriately) resolves to one of these
       structures.  When compiling, the "comp" method is executed, and the
       resulting "regexp" structure's engine field is expected to point back
       at the same structure.
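
       The back-pointer requirement can be illustrated with a minimal,
       self-contained sketch.  The "toy_regexp" and "toy_engine" types, the
       "toy_engine_vtable" structure and the "toy_comp" function are
       hypothetical stand-ins, not the real definitions from regexp.h, and
       the pTHX_ argument is omitted:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for REGEXP and regexp_engine; the real
 * definitions live in regexp.h and carry many more members. */
struct toy_engine;

typedef struct toy_regexp {
    const struct toy_engine *engine;   /* which engine compiled this */
} toy_regexp;

typedef struct toy_engine {
    toy_regexp *(*comp)(const char *pattern);
} toy_engine;

toy_regexp *toy_comp(const char *pattern);

/* The constant structure an engine provides. */
const toy_engine toy_engine_vtable = { toy_comp };

/* comp must leave the engine field pointing back at the structure it
 * was called through.  Returns NULL if allocation fails. */
toy_regexp *toy_comp(const char *pattern)
{
    toy_regexp *rx = malloc(sizeof *rx);

    assert(pattern != NULL);
    if (rx != NULL)
        rx->engine = &toy_engine_vtable;
    return rx;
}
```

       A caller that reaches "toy_comp" through "toy_engine_vtable.comp" can
       then find the same table again through "rx->engine".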

       The pTHX_ symbol in the definition is a macro used by Perl under
       threading to provide an extra argument to the routine holding a pointer
       back to the interpreter that is executing the regexp.  So under
       threading all routines get an extra argument.

Callbacks

   comp
           REGEXP* comp(pTHX_ const SV * const pattern, const U32 flags);

       Compile the pattern stored in "pattern" using the given "flags" and
       return a pointer to a prepared "REGEXP" structure that can perform the
       match.  See "The REGEXP structure" below for an explanation of the
       individual fields in the REGEXP struct.

       The "pattern" parameter is the scalar that was used as the pattern.
       Previous versions of Perl would pass two "char*" indicating the start
       and end of the stringified pattern; the following snippet can be used
       to get the old parameters:

           STRLEN plen;
           char*  exp = SvPV(pattern, plen);
           char* xend = exp + plen;

       Since any scalar can be passed as a pattern, it's possible to implement
       an engine that does something with an array (""ook" =~ [ qw/ eek hlagh
       / ]") or with the non-stringified form of a compiled regular expression
       (""ook" =~ qr/eek/").  Perl's own engine will always stringify
       everything using the snippet above, but that doesn't mean other engines
       have to.

       The "flags" parameter is a bitfield which indicates which of the
       "msixpn" flags the regex was compiled with.  It also contains
       additional info, such as whether "use locale" is in effect.

       The "eogc" flags are stripped out before being passed to the comp
       routine.  The regex engine does not need to know if any of these are
       set, as those flags should only affect what Perl does with the pattern
       and its match variables, not how it gets compiled and executed.

       By the time the comp callback is called, some of these flags have
       already had effect (noted below where applicable).  However most of
       their effect occurs after the comp callback has run, in routines that
       read the "rx->extflags" field which it populates.

       In general the flags should be preserved in "rx->extflags" after
       compilation, although the regex engine might want to add or delete some
       of them to invoke or disable some special behavior in Perl.  The flags
       along with any special behavior they cause are documented below:

       The pattern modifiers:

       "/m" - RXf_PMf_MULTILINE
           If this is in "rx->extflags" it will be passed to "Perl_fbm_instr"
           by "pp_split" which will treat the subject string as a multi-line
           string.

       "/s" - RXf_PMf_SINGLELINE
       "/i" - RXf_PMf_FOLD
       "/x" - RXf_PMf_EXTENDED
           If present on a regex, "#" comments will be handled differently by
           the tokenizer in some cases.

           TODO: Document those cases.

       "/p" - RXf_PMf_KEEPCOPY
           TODO: Document this

       Character set
           The character set rules are determined by an enum that is contained
           in this field.  This is still experimental and subject to change,
           but the current interface returns the rules by use of the in-line
           function "get_regex_charset(const U32 flags)".  The only currently
           documented value returned from it is REGEX_LOCALE_CHARSET, which is
           set if "use locale" is in effect.  If present in "rx->extflags",
           "split" will use the locale dependent definition of whitespace when
           RXf_SKIPWHITE or RXf_WHITE is in effect.  ASCII whitespace is
           defined as per isSPACE, and by the internal macros "is_utf8_space"
           under UTF-8, and "isSPACE_LC" under "use locale".

       Additional flags:

       RXf_SPLIT
           This flag was removed in perl 5.18.0.  "split ' '" is now
           special-cased solely in the parser.  RXf_SPLIT is still #defined,
           so you can test for it.  This is how it used to work:

           If "split" is invoked as "split ' '" or with no arguments (which
           really means "split(' ', $_)", see split), Perl will set this flag.
           The regex engine can then check for it and set the SKIPWHITE and
           WHITE extflags.  To do this, the Perl engine does:

               if (flags & RXf_SPLIT && r->prelen == 1 && r->precomp[0] == ' ')
                   r->extflags |= (RXf_SKIPWHITE|RXf_WHITE);

       These flags can be set during compilation to enable optimizations in
       the "split" operator.

       RXf_SKIPWHITE
           This flag was removed in perl 5.18.0.  It is still #defined, so you
           can set it, but doing so will have no effect.  This is how it used
           to work:

           If the flag is present in "rx->extflags", "split" will delete
           whitespace from the start of the subject string before it's
           operated on.  What is considered whitespace depends on whether the
           subject is a UTF-8 string and whether the "RXf_PMf_LOCALE" flag is
           set.

           If RXf_WHITE is set in addition to this flag, "split" will behave
           like "split " "" under the Perl engine.

       RXf_START_ONLY
           Tells the split operator to split the target string on newlines
           ("\n") without invoking the regex engine.

           Perl's engine sets this if the pattern is "/^/" ("plen == 1 && *exp
           == '^'"), even under "/^/s"; see split.  Of course a different
           regex engine might want to use the same optimizations with a
           different syntax.

       RXf_WHITE
           Tells the split operator to split the target string on whitespace
           without invoking the regex engine.  The definition of whitespace
           varies depending on whether the target string is a UTF-8 string
           and on whether RXf_PMf_LOCALE is set.

           Perl's engine sets this flag if the pattern is "\s+".

       RXf_NULL
           Tells the split operator to split the target string on characters.
           The definition of character varies depending on whether the target
           string is a UTF-8 string.

           Perl's engine sets this flag on empty patterns; this optimization
           makes "split //" much faster than it would otherwise be.  It's even
           faster than "unpack".

       RXf_NO_INPLACE_SUBST
           Added in perl 5.18.0, this flag indicates that a regular expression
           might perform an operation that would interfere with inplace
           substitution.  For instance it might contain lookbehind, or assign
           to non-magical variables (such as $REGMARK and $REGERROR) during
           matching.  "s///" will skip certain optimisations when this is set.
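
       As a rough illustration of how a comp routine might preserve the
       incoming flags and add "split" optimisation flags, consider the
       following self-contained sketch.  The "toy_u32" type, the "TOY_RXf_*"
       bit values and the "toy_extflags" function are hypothetical; the real
       RXf_* constants are defined in regexp.h and their values differ.  The
       three pattern tests mirror the conditions described above for Perl's
       own engine:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int toy_u32;   /* stand-in for Perl's U32 */

/* Hypothetical bit values; the real RXf_* constants are in regexp.h. */
#define TOY_RXf_START_ONLY (1u << 8)
#define TOY_RXf_WHITE      (1u << 9)
#define TOY_RXf_NULL       (1u << 10)

/* Compute the value to store in rx->extflags: keep the flags passed to
 * comp and add at most one split optimisation flag. */
toy_u32 toy_extflags(toy_u32 flags, const char *exp, size_t plen)
{
    toy_u32 extflags = flags;

    assert(exp != NULL);
    if (plen == 0)
        extflags |= TOY_RXf_NULL;         /* empty pattern */
    else if (plen == 1 && exp[0] == '^')
        extflags |= TOY_RXf_START_ONLY;   /* pattern is a lone caret */
    else if (plen == 3 && memcmp(exp, "\\s+", 3) == 0)
        extflags |= TOY_RXf_WHITE;        /* pattern is backslash-s-plus */

    return extflags;
}
```

       Any other pattern leaves the flags untouched, so "split" falls back to
       invoking the engine.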

   exec
           I32 exec(pTHX_ REGEXP * const rx,
                    char *stringarg, char* strend, char* strbeg,
                    SSize_t minend, SV* sv,
                    void* data, U32 flags);

       Execute a regexp.  The arguments are:

       rx  The regular expression to execute.

       sv  This is the SV to be matched against.  Note that the actual char
           array to be matched against is supplied by the arguments described
           below; the SV is just used to determine UTF8ness, "pos()" etc.

       strbeg
           Pointer to the physical start of the string.

       strend
           Pointer to the character following the physical end of the string
           (i.e. the "\0", if any).

       stringarg
           Pointer to the position in the string where matching should start;
           it might not be equal to "strbeg" (for example in a later iteration
           of "/.../g").

       minend
           Minimum length of string (measured in bytes from "stringarg") that
           must match; if the engine reaches the end of the match but hasn't
           reached this position in the string, it should fail.

       data
           Optimisation data; subject to change.

       flags
           Optimisation flags; subject to change.
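
       The "minend" rule can be sketched as a small check applied to a
       candidate match.  The "toy_minend_ok" function and its "match_end"
       parameter are hypothetical names used only for illustration; a real
       engine would perform the equivalent test inside its exec routine:

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if a candidate match ending at match_end satisfies minend,
 * 0 if the engine should reject it.  minend is measured in bytes from
 * stringarg. */
int toy_minend_ok(const char *stringarg, const char *match_end,
                  ptrdiff_t minend)
{
    assert(stringarg != NULL && match_end != NULL);
    return (match_end - stringarg) >= minend;
}
```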

   intuit
           char* intuit(pTHX_
                       REGEXP * const rx,
                       SV *sv,
                       const char * const strbeg,
                       char *strpos,
                       char *strend,
                       const U32 flags,
                       struct re_scream_pos_data_s *data);

       Find the start position where a regex match should be attempted, or
       possibly determine that the regex engine should not be run because the
       pattern can't match.  This is called, as appropriate, by the core,
       depending on the values of the "extflags" member of the "regexp"
       structure.

       Arguments:

           rx:     the regex to match against
           sv:     the SV being matched: only used for utf8 flag; the string
                   itself is accessed via the pointers below.  Note that on
                   something like an overloaded SV, SvPOK(sv) may be false
                   and the string pointers may point to something unrelated to
                   the SV itself.
           strbeg: real beginning of string
           strpos: the point in the string at which to begin matching
           strend: pointer to the byte following the last char of the string
           flags:  currently unused; set to 0
           data:   currently unused; set to NULL

   checkstr
           SV* checkstr(pTHX_ REGEXP * const rx);

       Return an SV containing a string that must appear in the pattern.  Used
       by "split" for optimising matches.

   free
           void free(pTHX_ REGEXP * const rx);

       Called by Perl when it is freeing a regexp pattern so that the engine
       can release any resources pointed to by the "pprivate" member of the
       "regexp" structure.  This is only responsible for freeing private data;
       Perl will handle releasing anything else contained in the "regexp"
       structure.

   Numbered capture callbacks
       Called to get/set the value of "$`", "$'", $& and their named
       equivalents, ${^PREMATCH}, ${^POSTMATCH} and ${^MATCH}, as well as the
       numbered capture groups ($1, $2, ...).

       The "paren" parameter will be 1 for $1, 2 for $2 and so forth, and have
       these symbolic values for the special variables:

           ${^PREMATCH}  RX_BUFF_IDX_CARET_PREMATCH
           ${^POSTMATCH} RX_BUFF_IDX_CARET_POSTMATCH
           ${^MATCH}     RX_BUFF_IDX_CARET_FULLMATCH
           $`            RX_BUFF_IDX_PREMATCH
           $'            RX_BUFF_IDX_POSTMATCH
           $&            RX_BUFF_IDX_FULLMATCH

       Note that in Perl 5.17.3 and earlier, the last three constants were
       also used for the caret variants of the variables.

       The names have been chosen by analogy with Tie::Scalar method names
       with an additional LENGTH callback for efficiency.  However numbered
       capture variables are currently not tied internally but implemented via
       magic.

       numbered_buff_FETCH

           void numbered_buff_FETCH(pTHX_ REGEXP * const rx, const I32 paren,
                                    SV * const sv);

       Fetch a specified numbered capture.  "sv" should be set to the scalar
       to return.  The scalar is passed as an argument rather than being
       returned from the function because when it's called Perl already has a
       scalar to store the value, so creating another one would be redundant.
       The scalar can be set with "sv_setsv", "sv_setpvn" and friends; see
       perlapi.

       This callback is where Perl untaints its own capture variables under
       taint mode (see perlsec).  See the "Perl_reg_numbered_buff_fetch"
       function in regcomp.c for how to untaint capture variables if that's
       something you'd like your engine to do as well.

       numbered_buff_STORE

           void    (*numbered_buff_STORE) (pTHX_
                                           REGEXP * const rx,
                                           const I32 paren,
                                           SV const * const value);

       Set the value of a numbered capture variable.  "value" is the scalar
       that is to be used as the new value.  It's up to the engine to make
       sure this is used as the new value (or reject it).

       Example:

           if ("ook" =~ /(o*)/) {
               # 'paren' will be '1' and 'value' will be 'ee'
               $1 =~ tr/o/e/;
           }

       Perl's own engine will croak on any attempt to modify the capture
       variables; to do this in another engine use the following callback
       (copied from "Perl_reg_numbered_buff_store"):

           void
           Example_reg_numbered_buff_store(pTHX_
                                           REGEXP * const rx,
                                           const I32 paren,
                                           SV const * const value)
           {
               PERL_UNUSED_ARG(rx);
               PERL_UNUSED_ARG(paren);
               PERL_UNUSED_ARG(value);

               if (!PL_localizing)
                   Perl_croak(aTHX_ PL_no_modify);
           }

       Actually Perl will not always croak in a statement that looks like it
       would modify a numbered capture variable.  This is because the STORE
       callback will not be called if Perl can determine that it doesn't have
       to modify the value.  This is exactly how tied variables behave in the
       same situation:

           package CaptureVar;
           use parent 'Tie::Scalar';

           sub TIESCALAR { bless [] }
           sub FETCH { undef }
           sub STORE { die "This doesn't get called" }

           package main;

           tie my $sv => "CaptureVar";
           $sv =~ y/a/b/;

       Because $sv is "undef" when the "y///" operator is applied to it, the
       transliteration won't actually execute and the program won't "die".
       This is different to how 5.8 and earlier versions behaved since the
       capture variables were READONLY variables then; now they'll just die
       when assigned to in the default engine.

       numbered_buff_LENGTH

           I32 numbered_buff_LENGTH (pTHX_
                                     REGEXP * const rx,
                                     const SV * const sv,
                                     const I32 paren);

       Get the "length" of a capture variable.  There's a special callback for
       this so that Perl doesn't have to do a FETCH and run "length" on the
       result.  Since the length is (in Perl's case) known from the offsets
       stored in "rx->offs", this is much more efficient:

           I32 s1  = rx->offs[paren].start;
           I32 s2  = rx->offs[paren].end;
           I32 len = s2 - s1;

       This is a little bit more complex in the case of UTF-8; see what
       "Perl_reg_numbered_buff_length" does with is_utf8_string_loclen.

   Named capture callbacks
       Called to get/set the value of "%+" and "%-", as well as by some
       utility functions in re.

       There are two callbacks: "named_buff" is called in all the cases where
       the FETCH, STORE, DELETE, CLEAR, EXISTS and SCALAR Tie::Hash callbacks
       would be called on changes to "%+" and "%-", and "named_buff_iter" in
       the same cases as FIRSTKEY and NEXTKEY.

       The "flags" parameter can be used to determine which of these
       operations the callbacks should respond to.  The following flags are
       currently defined:

       Which Tie::Hash operation is being performed from the Perl level on
       "%+" or "%-", if any:

           RXapif_FETCH
           RXapif_STORE
           RXapif_DELETE
           RXapif_CLEAR
           RXapif_EXISTS
           RXapif_SCALAR
           RXapif_FIRSTKEY
           RXapif_NEXTKEY

       Whether "%+" or "%-" is being operated on, if any:

           RXapif_ONE /* %+ */
           RXapif_ALL /* %- */

       Whether this is being called as "re::regname", "re::regnames" or
       "re::regnames_count", if any.  The first two will be combined with
       "RXapif_ONE" or "RXapif_ALL".

           RXapif_REGNAME
           RXapif_REGNAMES
           RXapif_REGNAMES_COUNT
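
       A callback would typically inspect "flags" to decide which operation
       it is being asked to perform.  The following self-contained sketch
       shows one way to decode them.  The "TOY_RXapif_*" bit values and both
       functions are hypothetical; the real RXapif_* constants are defined in
       regexp.h.  The sketch also assumes that at most one of the ONE/ALL
       bits is set:

```c
#include <assert.h>

typedef unsigned int toy_u32;   /* stand-in for Perl's U32 */

/* Hypothetical bit values; the real RXapif_* constants are in regexp.h. */
#define TOY_RXapif_FETCH   0x0001u
#define TOY_RXapif_STORE   0x0002u
#define TOY_RXapif_DELETE  0x0004u
#define TOY_RXapif_CLEAR   0x0008u
#define TOY_RXapif_EXISTS  0x0010u
#define TOY_RXapif_SCALAR  0x0020u
#define TOY_RXapif_ONE     0x0100u   /* %+ */
#define TOY_RXapif_ALL     0x0200u   /* %- */

/* Name of the Tie::Hash-style operation requested of named_buff. */
const char *toy_named_buff_op(toy_u32 flags)
{
    if (flags & TOY_RXapif_FETCH)  return "FETCH";
    if (flags & TOY_RXapif_STORE)  return "STORE";
    if (flags & TOY_RXapif_DELETE) return "DELETE";
    if (flags & TOY_RXapif_CLEAR)  return "CLEAR";
    if (flags & TOY_RXapif_EXISTS) return "EXISTS";
    if (flags & TOY_RXapif_SCALAR) return "SCALAR";
    return "UNKNOWN";
}

/* Returns 1 when %- is being operated on, 0 otherwise. */
int toy_targets_all(toy_u32 flags)
{
    /* Assumption of this sketch: ONE and ALL are never both set. */
    assert((flags & (TOY_RXapif_ONE | TOY_RXapif_ALL))
           != (TOY_RXapif_ONE | TOY_RXapif_ALL));
    return (flags & TOY_RXapif_ALL) != 0;
}
```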

       Internally "%+" and "%-" are implemented with a real tied interface via
       Tie::Hash::NamedCapture.  The methods in that package will call back
       into these functions.  However the usage of Tie::Hash::NamedCapture for
       this purpose might change in future releases.  For instance this might
       be implemented by magic instead (would need an extension to mgvtbl).

       named_buff

           SV*     (*named_buff) (pTHX_ REGEXP * const rx, SV * const key,
                                  SV * const value, U32 flags);

       named_buff_iter

           SV*     (*named_buff_iter) (pTHX_
                                       REGEXP * const rx,
                                       const SV * const lastkey,
                                       const U32 flags);

   qr_package
           SV* qr_package(pTHX_ REGEXP * const rx);

       The package the qr// magic object is blessed into (as seen by "ref
       qr//").  It is recommended that engines change this to their package
       name for identification regardless of whether they implement methods on
       the object.

       The package this method returns should also have the internal "Regexp"
       package in its @ISA.  "qr//->isa("Regexp")" should always be true
       regardless of what engine is being used.

       An example implementation might be:

           SV*
           Example_qr_package(pTHX_ REGEXP * const rx)
           {
               PERL_UNUSED_ARG(rx);
               return newSVpvs("re::engine::Example");
           }

       Any method calls on an object created with "qr//" will be dispatched to
       the package as a normal object.

           use re::engine::Example;
           my $re = qr//;
           $re->meth; # dispatched to re::engine::Example::meth()

       To retrieve the "REGEXP" object from the scalar in an XS function use
       the "SvRX" macro, see "REGEXP Functions" in perlapi.

           void meth(SV * rv)
           PPCODE:
               REGEXP * re = SvRX(rv);

   dupe
           void* dupe(pTHX_ REGEXP * const rx, CLONE_PARAMS *param);

       On threaded builds a regexp may need to be duplicated so that the
       pattern can be used by multiple threads.  This routine is expected to
       handle the duplication of any private data pointed to by the "pprivate"
       member of the "regexp" structure.  It will be called with the
       preconstructed new "regexp" structure as an argument.  The "pprivate"
       member will point at the old private structure, and it is this
       routine's responsibility to construct a copy and return a pointer to it
       (which Perl will then use to overwrite the field as passed to this
       routine).

       This allows the engine to dupe its private data but also if necessary
       modify the final structure if it really must.

       On unthreaded builds this field doesn't exist.
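
       The duplication contract can be sketched with stand-in types.  The
       "toy_private" and "toy_regexp" structures and the "toy_dupe" function
       are hypothetical; a real dupe routine also receives pTHX_ and a
       CLONE_PARAMS pointer, and would normally use Perl's own allocation
       routines rather than malloc:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical engine-private data. */
typedef struct toy_private {
    size_t len;       /* number of bytes in program */
    char  *program;   /* engine-specific compiled form */
} toy_private;

/* Hypothetical stand-in holding only the field used here. */
typedef struct toy_regexp {
    void *pprivate;
} toy_regexp;

/* Build a deep copy of the old private data that rx->pprivate still
 * points at and return it; the caller stores the result back into
 * rx->pprivate.  Returns NULL on allocation failure. */
void *toy_dupe(toy_regexp *rx)
{
    const toy_private *old;
    toy_private *copy;

    assert(rx != NULL && rx->pprivate != NULL);
    old = rx->pprivate;

    copy = malloc(sizeof *copy);
    if (copy == NULL)
        return NULL;

    copy->len = old->len;
    copy->program = malloc(old->len ? old->len : 1);
    if (copy->program == NULL) {
        free(copy);
        return NULL;
    }
    if (old->len)
        memcpy(copy->program, old->program, old->len);
    return copy;
}
```

       The copy shares no memory with the original, so each thread can free
       its own private data independently.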

   op_comp
       This is private to the Perl core and subject to change.  Should be left
       null.

The REGEXP structure

       The REGEXP struct is defined in regexp.h.  All regex engines must be
       able to correctly build such a structure in their "comp" routine.

       The REGEXP structure contains all the data that Perl needs to be aware
       of to properly work with the regular expression.  It includes data
       about optimisations that Perl can use to determine if the regex engine
       should really be used, and various other control info that is needed to
       properly execute patterns in various contexts, such as whether the
       pattern is anchored in some way, or what flags were used during the
       compile, or whether the program contains special constructs that Perl
       needs to be aware of.

       In addition it contains two fields that are intended for the private
       use of the regex engine that compiled the pattern.  These are the
       "intflags" and "pprivate" members.  "pprivate" is a void pointer to an
       arbitrary structure, whose use and management is the responsibility of
       the compiling engine.  Perl will never modify either of these values.

           typedef struct regexp {
               /* what engine created this regexp? */
               const struct regexp_engine* engine;

               /* what re is this a lightweight copy of? */
               struct regexp* mother_re;

               /* Information about the match that the Perl core uses to manage
                * things */
               U32 extflags;   /* Flags used both externally and internally */
               I32 minlen;     /* minimum possible number of chars in
                                  string to match */
               I32 minlenret;  /* minimum possible number of chars in $& */
               U32 gofs;       /* chars left of pos that we search from */

               /* substring data about strings that must appear
                  in the final match, used for optimisations */
               struct reg_substr_data *substrs;

               U32 nparens;  /* number of capture groups */

               /* private engine specific data */
               U32 intflags;   /* Engine Specific Internal flags */
               void *pprivate; /* Data private to the regex engine which
                                  created this object. */

               /* Data about the last/current match. These are modified during
                * matching */
               U32 lastparen;            /* highest close paren matched ($+) */
               U32 lastcloseparen;       /* last close paren matched ($^N) */
               regexp_paren_pair *swap;  /* Swap copy of *offs */
               regexp_paren_pair *offs;  /* Array of offsets for (@-) and
                                            (@+) */

               char *subbeg;  /* saved or original string so \digit works
                                 forever. */
               SV_SAVED_COPY  /* If non-NULL, SV which is COW from original */
               I32 sublen;    /* Length of string pointed by subbeg */
               I32 suboffset;  /* byte offset of subbeg from logical start of
                                  str */
               I32 subcoffset; /* suboffset equiv, but in chars (for @-/@+) */

               /* Information about the match that isn't often used */
               I32 prelen;           /* length of precomp */
               const char *precomp;  /* pre-compilation regular expression */

               char *wrapped;  /* wrapped version of the pattern */
               I32 wraplen;    /* length of wrapped */

               I32 seen_evals;   /* number of eval groups in the pattern - for
                                    security checks */
               HV *paren_names;  /* Optional hash of paren names */

               /* Refcount of this regexp */
               I32 refcnt;
           } regexp;

       The fields are discussed in more detail below:

   "engine"
       This field points at a "regexp_engine" structure which contains
       pointers to the subroutines that are to be used for performing a match.
       It is the compiling routine's responsibility to populate this field
       before returning the regexp object.

       Internally this is set to "NULL" unless a custom engine is specified in
       $^H{regcomp}; Perl's own set of callbacks can be accessed in the struct
       pointed to by "RE_ENGINE_PTR".

   "mother_re"
       TODO, see
       <http://www.mail-archive.com/perl5-changes@perl.org/msg17328.html>

   "extflags"
       This will be used by Perl to see what flags the regexp was compiled
       with.  This will normally be set to the value of the flags parameter by
       the comp callback.  See the comp documentation for valid flags.

   "minlen" "minlenret"
       The minimum string length (in characters) required for the pattern to
       match.  This is used to prune the search space by not bothering to
       match any closer to the end of a string than would allow a match.  For
       instance there is no point in even starting the regex engine if the
       minlen is 10 but the string is only 5 characters long.  There is no way
       that the pattern can match.

       "minlenret" is the minimum length (in characters) of the string that
       would be found in $& after a match.

       The difference between "minlen" and "minlenret" can be seen in the
       following pattern:

           /ns(?=\d)/

       where the "minlen" would be 3 but "minlenret" would only be 2 as the \d
       is required to match but is not actually included in the matched
       content.  This distinction is particularly important as the
       substitution logic uses the "minlenret" to tell if it can do in-place
       substitutions (these can result in considerable speed-up).
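
       The pruning that "minlen" makes possible can be sketched as follows.
       Both functions are hypothetical illustrations rather than code from
       the Perl core, and lengths are taken as plain character counts:

```c
#include <assert.h>

/* Returns 1 if a string of str_chars characters could possibly match a
 * pattern whose minimum match length is minlen, 0 otherwise. */
int toy_worth_matching(long str_chars, long minlen)
{
    assert(str_chars >= 0 && minlen >= 0);
    return str_chars >= minlen;
}

/* Number of start offsets worth trying: offsets closer to the end of
 * the string than minlen characters cannot begin a match. */
long toy_start_offsets(long str_chars, long minlen)
{
    if (!toy_worth_matching(str_chars, minlen))
        return 0;
    return str_chars - minlen + 1;
}
```

       With a minlen of 10 and a 5-character string, "toy_worth_matching"
       returns 0, which corresponds to the case described above where the
       engine need not be started at all.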
641
642   "gofs"
643       Left offset from pos() to start match at.
644
645   "substrs"
646       Substring data about strings that must appear in the final match.  This
647       is currently only used internally by Perl's engine, but might be used
648       in the future for all engines for optimisations.
649
650   "nparens", "lastparen", and "lastcloseparen"
651       These fields are used to keep track of how many paren groups could be
652       matched in the pattern, which was the last open paren to be entered,
653       and which was the last close paren to be entered.
654
655   "intflags"
656       The engine's private copy of the flags the pattern was compiled with.
657       Usually this is the same as "extflags" unless the engine chose to
658       modify one of them.
659
660   "pprivate"
661       A void* pointing to an engine-defined data structure.  The Perl engine
662       uses the "regexp_internal" structure (see "Base Structures" in
663       perlreguts) but a custom engine should use something else.
664
665   "swap"
666       Unused.  Left in for compatibility with Perl 5.10.0.
667
668   "offs"
669       A "regexp_paren_pair" structure which defines offsets into the string
670       being matched which correspond to the $& and $1, $2 etc. captures, the
671       "regexp_paren_pair" struct is defined as follows:
672
673           typedef struct regexp_paren_pair {
674               I32 start;
675               I32 end;
676           } regexp_paren_pair;
677
678       If "->offs[num].start" or "->offs[num].end" is "-1" then that capture
679       group did not match.  "->offs[0].start/end" represents $& (or
680       "${^MATCH}" under "/p") and "->offs[paren].end" matches $$paren where
681       $paren = 1>.
682
683   "precomp" "prelen"
684       Used for optimisations.  "precomp" holds a copy of the pattern that was
685       compiled and "prelen" its length.  When a new pattern is to be compiled
686       (such as inside a loop) the internal "regcomp" operator checks if the
687       last compiled "REGEXP"'s "precomp" and "prelen" are equivalent to the
688       new one, and if so uses the old pattern instead of compiling a new one.
689
690       The relevant snippet from "Perl_pp_regcomp":
691
692               if (!re || !re->precomp || re->prelen != (I32)len ||
693                   memNE(re->precomp, t, len))
694               /* Compile a new pattern */
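
          The same test can be read as a stand-alone predicate.  The sketch
          below is illustrative only: cached_rx is a stand-in holding just
          the two fields involved, needs_recompile() is a hypothetical name,
          and memcmp() stands in for the memNE macro.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the two REGEXP fields the check looks at (an assumption;
 * the real structure has many more members). */
typedef struct cached_rx {
    const char *precomp;  /* copy of the pattern that was compiled */
    int         prelen;   /* its length in bytes */
} cached_rx;

/* Hypothetical predicate mirroring the snippet above: returns 1 when a
 * new pattern must be compiled, 0 when the cached one can be reused. */
static int needs_recompile(const cached_rx *re, const char *t, size_t len)
{
    assert(t != NULL);
    return !re || !re->precomp || re->prelen != (int)len ||
           memcmp(re->precomp, t, len) != 0;
}
```

          The length comparison comes first so that the byte comparison only
          runs when both patterns have the same length.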
695
696   "paren_names"
697       This is a hash used internally to track named capture groups and their
698       offsets.  The keys are the names of the buffers; the values are
699       dualvars, with the IV slot holding the number of buffers with the given
700       name and the pv being an embedded array of I32.  The values may also be
701       contained independently in the data array in cases where named
702       backreferences are used.
703
704   "substrs"
705       Holds information on the longest string that must occur at a fixed
706       offset from the start of the pattern, and the longest string that must
707       occur at a floating offset from the start of the pattern.  Used to do
708       Fast-Boyer-Moore searches on the string to find out if it's worth
709       the regex engine at all, and if so where in the string to search.
710
711   "subbeg" "sublen" "saved_copy" "suboffset" "subcoffset"
712       Used during the execution phase for managing search and replace
713       patterns, and for providing the text for $&, $1 etc. "subbeg" points to
714       a buffer (either the original string, or a copy in the case of
715       "RX_MATCH_COPIED(rx)"), and "sublen" is the length of the buffer.  The
716       "RX_OFFS" start and end indices index into this buffer.
717
718       In the presence of the "REXEC_COPY_STR" flag, but with the addition of
719       the "REXEC_COPY_SKIP_PRE" or "REXEC_COPY_SKIP_POST" flags, an engine
720       can choose not to copy the full buffer (although it must still do so in
721       the presence of "RXf_PMf_KEEPCOPY" or the relevant bits being set in
722       "PL_sawampersand").  In this case, it may set "suboffset" to indicate
723       the number of bytes from the logical start of the buffer to the
724       physical start (i.e. "subbeg").  It should also set "subcoffset", the
725       number of characters in the offset. The latter is needed to support
726       "@-" and "@+" which work in characters, not bytes.
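
          A rough sketch of the byte arithmetic this implies, assuming that
          capture offsets are counted from the logical start of the buffer
          while "subbeg" points "suboffset" bytes past it.  The helper name
          capture_ptr() and the use of long for the offsets are assumptions
          made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: locate the first byte of a capture inside a
 * partially copied buffer.  `start` is the capture's byte offset from the
 * logical start of the string; `subbeg` is the physical start of the
 * saved buffer, which begins `suboffset` bytes after the logical start. */
static const char *capture_ptr(const char *subbeg, long suboffset, long start)
{
    assert(subbeg != NULL);
    assert(start >= suboffset);   /* capture must lie in the saved part */
    return subbeg + (start - suboffset);
}
```

          For example, if the engine skipped the first 6 bytes of "hello
          world" so that "subbeg" points at "world", a capture starting at
          logical offset 8 is found 2 bytes into the saved buffer.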
727
728   "wrapped" "wraplen"
729       Stores the string "qr//" stringifies to. The Perl engine for example
730       stores "(?^:eek)" in the case of "qr/eek/".
731
732       When using a custom engine that doesn't support the "(?:)" construct
733       for inline modifiers, it's probably best to have "qr//" stringify to
734       the supplied pattern.  Note that this will create undesired patterns in
735       cases such as:
736
737           my $x = qr/a|b/;  # "a|b"
738           my $y = qr/c/i;   # "c"
739           my $z = qr/$x$y/; # "a|bc"
740
741       There's no solution for this problem other than making the custom
742       engine understand a construct like "(?:)".
743
744   "seen_evals"
745       This stores the number of eval groups in the pattern.  This is used for
746       security purposes when embedding compiled regexes into larger patterns
747       with "qr//".
748
749   "refcnt"
750       The number of times the structure is referenced.  When this falls to 0,
751       the regexp is automatically freed by a call to pregfree.  This should
752       be set to 1 in each engine's "comp" routine.
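
          The counting discipline described here can be sketched in isolation
          as follows.  This is a toy model, not the Perl implementation:
          toy_rx, toy_comp(), toy_ref() and toy_unref() are hypothetical
          names, and free() stands in for the call to pregfree.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a compiled regexp; only the reference count is kept. */
typedef struct toy_rx {
    int refcnt;
} toy_rx;

/* Hypothetical "comp" routine: a fresh structure starts with refcnt 1. */
static toy_rx *toy_comp(void)
{
    toy_rx *rx = malloc(sizeof *rx);
    if (rx)
        rx->refcnt = 1;
    return rx;
}

/* Take an additional reference. */
static toy_rx *toy_ref(toy_rx *rx)
{
    assert(rx != NULL && rx->refcnt > 0);
    rx->refcnt++;
    return rx;
}

/* Drop a reference; frees the structure and returns 1 when the count
 * reaches 0, otherwise returns 0. */
static int toy_unref(toy_rx *rx)
{
    assert(rx != NULL && rx->refcnt > 0);
    if (--rx->refcnt > 0)
        return 0;
    free(rx);
    return 1;
}
```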
753

HISTORY

755       Originally part of perlreguts.
756

AUTHORS

758       Originally written by Yves Orton, expanded by Ævar Arnfjörð
759       Bjarmason.
760

LICENSE

762       Copyright 2006 Yves Orton and 2007 Ævar Arnfjörð Bjarmason.
763
764       This program is free software; you can redistribute it and/or modify it
765       under the same terms as Perl itself.
766
767
768
769perl v5.26.3                      2018-03-23                      PERLREAPI(1)