1PCRE2STACK(3)              Library Functions Manual              PCRE2STACK(3)
2
3
4

NAME

6       PCRE2 - Perl-compatible regular expressions (revised API)
7

PCRE2 DISCUSSION OF STACK USAGE

9
10       When  you  call  pcre2_match(),  it  makes  use of an internal function
11       called match(). This calls itself recursively at branch points  in  the
12       pattern,  in  order  to  remember the state of the match so that it can
13       back up and try a different alternative after a  failure.  As  matching
14       proceeds  deeper  and deeper into the tree of possibilities, the recur‐
15       sion depth increases. The match() function is also called in other cir‐
16       cumstances,  for  example,  whenever  a  parenthesized  sub-pattern  is
17       entered, and in certain cases of repetition.
18
19       Not all calls of match() increase the recursion depth; for an item such
20       as  a* it may be called several times at the same level, after matching
21       different numbers of a's. Furthermore, in a number of cases  where  the
22       result  of  the  recursive call would immediately be passed back as the
23       result of the current call (a "tail recursion"), the function  is  just
24       restarted instead.
25
26       Each  time the internal match() function is called recursively, it uses
27       memory from the process stack. For certain kinds of pattern  and  data,
28       very  large  amounts of stack may be needed, despite the recognition of
29       "tail recursion". Note that if  PCRE2  is  compiled  with  the  -fsani‐
30       tize=address  option  of  the  GCC compiler, the stack requirements are
31       greatly increased.
32
33       The above comments apply when pcre2_match() is run in its normal inter‐
34       pretive manner. If the compiled pattern was processed by pcre2_jit_com‐
35       pile(), and just-in-time compiling  was  successful,  and  the  options
36       passed  to  pcre2_match()  were  not incompatible, the matching process
37       uses the JIT-compiled code instead of the  match()  function.  In  this
38       case, the memory requirements are handled entirely differently. See the
39       pcre2jit documentation for details.
40
41       The  pcre2_dfa_match()  function  operates  in  a  different   way   to
42       pcre2_match(),  and uses recursion only when there is a regular expres‐
43       sion recursion or subroutine call in the  pattern.  This  includes  the
44       processing  of assertion and "once-only" subpatterns, which are handled
45       like subroutine calls.  Normally, these are never very  deep,  and  the
46       limit  on  the  complexity  of  pcre2_dfa_match()  is controlled by the
47       amount of workspace it is given.  However, it is possible to write pat‐
48       terns  with  runaway  infinite  recursions;  such  patterns  will cause
49       pcre2_dfa_match() to run out of stack unless a limit  is  applied  (see
50       below).
51
52       The   comments   in   the   next   three   sections  do  not  apply  to
53       pcre2_dfa_match(); they are relevant only for pcre2_match() without the
54       JIT optimization.
55
56   Reducing pcre2_match()'s stack usage
57
58       You  can often reduce the amount of recursion, and therefore the amount
59       of stack used, by modifying the pattern that  is  being  matched.  Con‐
60       sider, for example, this pattern:
61
62         ([^<]|<(?!inet))+
63
64       It  matches  from wherever it starts until it encounters "<inet" or the
65       end of the data, and is the kind of pattern that  might  be  used  when
66       processing an XML file. Each iteration of the outer parentheses matches
67       either one character that is not "<" or a "<" that is not  followed  by
68       "inet".  However,  each  time  a  parenthesis is processed, a recursion
69       occurs, so this formulation uses a stack frame for each matched charac‐
70       ter.  For  a long string, a lot of stack is required. Consider now this
71       rewritten pattern, which matches exactly the same strings:
72
73         ([^<]++|<(?!inet))+
74
75       This uses very much less stack, because runs of characters that do  not
76       contain  "<" are "swallowed" in one item inside the parentheses. Recur‐
77       sion happens only when a "<" character that is not followed  by  "inet"
78       is  encountered  (and  we assume this is relatively rare). A possessive
79       quantifier is used to stop any backtracking into the  runs  of  non-"<"
80       characters, but that is not related to stack usage.
81
82       This  example shows that one way of avoiding stack problems when match‐
83       ing long subject strings is to write repeated parenthesized subpatterns
84       to match more than one character whenever possible.
85
86   Compiling PCRE2 to use heap instead of stack for pcre2_match()
87
88       In  environments  where  stack memory is constrained, you might want to
89       compile PCRE2 to use heap memory instead of stack for remembering back-
90       up points when pcre2_match() is running. This makes it run more slowly,
91       however. Details of how to do this are given in the pcre2build documen‐
92       tation.  When built in this way, instead of using the stack, PCRE2 gets
93       memory for remembering backup points from the  heap.  By  default,  the
94       memory is obtained by calling the system malloc() function, but you can
95       arrange to supply your own memory management function. For details, see
96       the section entitled "The match context" in the pcre2api documentation.
97       Since the block sizes are always the same, it may be possible to imple‐
98       ment  a customized memory handler that is more efficient than the stan‐
99       dard function. The memory blocks obtained for this purpose are retained
100       and  re-used  if  possible while pcre2_match() is running. They are all
101       freed just before it exits.
102
103   Limiting pcre2_match()'s stack usage
104
105       You can set limits on the number of times the internal match() function
106       is  called,  both  in  total  and  recursively. If a limit is exceeded,
107       pcre2_match() returns an error code.  Setting  suitable  limits  should
108       prevent  it from running out of stack. The default values of the limits
109       are very large, and unlikely ever to operate. They can be changed  when
110       PCRE2  is built, and they can also be set when pcre2_match() is called.
111       For details of these interfaces, see the pcre2build  documentation  and
112       the section entitled "The match context" in the pcre2api documentation.
113
114       As a very rough rule of thumb, you should reckon on about 500 bytes per
115       recursion. Thus, if you want to limit your  stack  usage  to  8Mb,  you
116       should  set  the  limit at 16000 recursions. A 64Mb stack, on the other
117       hand, can support around 128000 recursions.
118
119       The pcre2test test program has a modifier called  "find_limits"  which,
120       if  applied  to  a  subject line, causes it to find the smallest limits
121       that allow a a pattern to match. This is done by calling  pcre2_match()
122       repeatedly with different limits.
123
124   Limiting pcre2_dfa_match()'s stack usage
125
126       The recursion limit, as described above for pcre2_match(), also applies
127       to pcre2_dfa_match(), whose use of recursive function calls for  recur‐
128       sions in the pattern can lead to runaway stack usage. The non-recursive
129       match limit is not relevant for DFA matching, and is ignored.
130
131   Changing stack size in Unix-like systems
132
133       In Unix-like environments, there is not often a problem with the  stack
134       unless  very  long  strings  are  involved, though the default limit on
135       stack size varies from system to system. Values from 8Mb  to  64Mb  are
136       common. You can find your default limit by running the command:
137
138         ulimit -s
139
140       Unfortunately,  the  effect  of  running out of stack is often SIGSEGV,
141       though sometimes a more explicit error message is given. You  can  nor‐
142       mally increase the limit on stack size by code such as this:
143
144         struct rlimit rlim;
145         getrlimit(RLIMIT_STACK, &rlim);
146         rlim.rlim_cur = 100*1024*1024;
147         setrlimit(RLIMIT_STACK, &rlim);
148
149       This  reads  the current limits (soft and hard) using getrlimit(), then
150       attempts to increase the soft limit to  100Mb  using  setrlimit().  You
151       must do this before calling pcre2_match().
152
153   Changing stack size in Mac OS X
154
155       Using setrlimit(), as described above, should also work on Mac OS X. It
156       is also possible to set a stack size when linking a program. There is a
157       discussion   about   stack  sizes  in  Mac  OS  X  at  this  web  site:
158       http://developer.apple.com/qa/qa2005/qa1419.html.
159

AUTHOR

161
162       Philip Hazel
163       University Computing Service
164       Cambridge, England.
165

REVISION

167
168       Last updated: 23 December 2016
169       Copyright (c) 1997-2016 University of Cambridge.
170
171
172
173PCRE2 10.23                    23 December 2016                  PCRE2STACK(3)
Impressum