1PCRESTACK(3)               Library Functions Manual               PCRESTACK(3)
2
3
4

NAME

6       PCRE - Perl-compatible regular expressions
7

PCRE DISCUSSION OF STACK USAGE

9
10       When  you call pcre[16|32]_exec(), it makes use of an internal function
11       called match(). This calls itself recursively at branch points  in  the
12       pattern,  in  order  to  remember the state of the match so that it can
13       back up and try a different alternative if  the  first  one  fails.  As
14       matching proceeds deeper and deeper into the tree of possibilities, the
15       recursion depth increases. The match() function is also called in other
16       circumstances,  for  example,  whenever  a parenthesized sub-pattern is
17       entered, and in certain cases of repetition.
18
19       Not all calls of match() increase the recursion depth; for an item such
20       as  a* it may be called several times at the same level, after matching
21       different numbers of a's. Furthermore, in a number of cases  where  the
22       result  of  the  recursive call would immediately be passed back as the
23       result of the current call (a "tail recursion"), the function  is  just
24       restarted instead.
25
26       The  above  comments apply when pcre[16|32]_exec() is run in its normal
27       interpretive  manner.   If   the   pattern   was   studied   with   the
28       PCRE_STUDY_JIT_COMPILE  option, and just-in-time compiling was success‐
29       ful, and the options passed to pcre[16|32]_exec() were  not  incompati‐
30       ble,  the  matching  process  uses the JIT-compiled code instead of the
31       match() function. In this case, the  memory  requirements  are  handled
32       entirely differently. See the pcrejit documentation for details.
33
34       The  pcre[16|32]_dfa_exec()  function operates in an entirely different
35       way, and uses recursion only when there is a regular expression  recur‐
36       sion or subroutine call in the pattern. This includes the processing of
37       assertion and "once-only" subpatterns, which are handled  like  subrou‐
38       tine  calls.  Normally, these are never very deep, and the limit on the
39       complexity of pcre[16|32]_dfa_exec() is controlled  by  the  amount  of
40       workspace  it is given.  However, it is possible to write patterns with
41       runaway    infinite    recursions;    such    patterns    will    cause
42       pcre[16|32]_dfa_exec()  to  run  out  of stack. At present, there is no
43       protection against this.
44
45       The comments that follow do NOT apply to  pcre[16|32]_dfa_exec();  they
46       are relevant only for pcre[16|32]_exec() without the JIT optimization.
47
48   Reducing pcre[16|32]_exec()'s stack usage
49
50       Each  time  that match() is actually called recursively, it uses memory
51       from the process stack. For certain kinds of  pattern  and  data,  very
52       large  amounts of stack may be needed, despite the recognition of "tail
53       recursion".  You can often reduce the amount of recursion,  and  there‐
54       fore  the  amount of stack used, by modifying the pattern that is being
55       matched. Consider, for example, this pattern:
56
57         ([^<]|<(?!inet))+
58
59       It matches from wherever it starts until it encounters "<inet"  or  the
60       end  of  the  data,  and is the kind of pattern that might be used when
61       processing an XML file. Each iteration of the outer parentheses matches
62       either  one  character that is not "<" or a "<" that is not followed by
63       "inet". However, each time a  parenthesis  is  processed,  a  recursion
64       occurs, so this formulation uses a stack frame for each matched charac‐
65       ter. For a long string, a lot of stack is required. Consider  now  this
66       rewritten pattern, which matches exactly the same strings:
67
68         ([^<]++|<(?!inet))+
69
70       This  uses very much less stack, because runs of characters that do not
71       contain "<" are "swallowed" in one item inside the parentheses.  Recur‐
72       sion  happens  only when a "<" character that is not followed by "inet"
73       is encountered (and we assume this is relatively  rare).  A  possessive
74       quantifier  is  used  to stop any backtracking into the runs of non-"<"
75       characters, but that is not related to stack usage.
76
77       This example shows that one way of avoiding stack problems when  match‐
78       ing long subject strings is to write repeated parenthesized subpatterns
79       to match more than one character whenever possible.
80
81   Compiling PCRE to use heap instead of stack for pcre[16|32]_exec()
82
83       In environments where stack memory is constrained, you  might  want  to
84       compile  PCRE to use heap memory instead of stack for remembering back-
85       up points when pcre[16|32]_exec() is running. This makes it run  a  lot
86       more slowly, however.  Details of how to do this are given in the pcre‐
87       build documentation. When built in  this  way,  instead  of  using  the
88       stack,  PCRE obtains and frees memory by calling the functions that are
89       pointed to by the pcre[16|32]_stack_malloc  and  pcre[16|32]_stack_free
90       variables.  By default, these point to malloc() and free(), but you can
91       replace the pointers to cause PCRE to use your own functions. Since the
92       block sizes are always the same, and are always freed in reverse order,
93       it may be possible to implement customized  memory  handlers  that  are
94       more efficient than the standard functions.
95
96   Limiting pcre[16|32]_exec()'s stack usage
97
98       You  can set limits on the number of times that match() is called, both
99       in total and recursively. If a limit  is  exceeded,  pcre[16|32]_exec()
100       returns  an  error code. Setting suitable limits should prevent it from
101       running out of stack. The default values of the limits are very  large,
102       and  unlikely  ever to operate. They can be changed when PCRE is built,
103       and they can also be set when pcre[16|32]_exec() is called. For details
104       of these interfaces, see the pcrebuild documentation and the section on
105       extra data for pcre[16|32]_exec() in the pcreapi documentation.
106
107       As a very rough rule of thumb, you should reckon on about 500 bytes per
108       recursion.  Thus,  if  you  want  to limit your stack usage to 8Mb, you
109       should set the limit at 16000 recursions. A 64Mb stack,  on  the  other
110       hand, can support around 128000 recursions.
111
112       In Unix-like environments, the pcretest test program has a command line
113       option (-S) that can be used to increase the size of its stack. As long
114       as  the  stack is large enough, another option (-M) can be used to find
115       the smallest limits that allow a particular pattern to  match  a  given
116       subject  string.  This is done by calling pcre[16|32]_exec() repeatedly
117       with different limits.
118
119   Obtaining an estimate of stack usage
120
121       The actual amount of stack used per recursion can  vary  quite  a  lot,
122       depending on the compiler that was used to build PCRE and the optimiza‐
123       tion or debugging options that were set for it. The rule of thumb value
124       of  500  bytes  mentioned  above  may be larger or smaller than what is
125       actually needed. A better approximation can be obtained by running this
126       command:
127
128         pcretest -m -C
129
130       The  -C  option causes pcretest to output information about the options
131       with which PCRE was compiled. When -m is also given (before -C), infor‐
132       mation about stack use is given in a line like this:
133
134         Match recursion uses stack: approximate frame size = 640 bytes
135
136       The value is approximate because some recursions need a bit more (up to
137       perhaps 16 more bytes).
138
139       If the above command is given when PCRE is compiled  to  use  the  heap
140       instead  of  the  stack  for recursion, the value that is output is the
141       size of each block that is obtained from the heap.
142
143   Changing stack size in Unix-like systems
144
145       In Unix-like environments, there is not often a problem with the  stack
146       unless  very  long  strings  are  involved, though the default limit on
147       stack size varies from system to system. Values from 8Mb  to  64Mb  are
148       common. You can find your default limit by running the command:
149
150         ulimit -s
151
152       Unfortunately,  the  effect  of  running out of stack is often SIGSEGV,
153       though sometimes a more explicit error message is given. You  can  nor‐
154       mally increase the limit on stack size by code such as this:
155
156         struct rlimit rlim;
157         getrlimit(RLIMIT_STACK, &rlim);
158         rlim.rlim_cur = 100*1024*1024;
159         setrlimit(RLIMIT_STACK, &rlim);
160
161       This  reads  the current limits (soft and hard) using getrlimit(), then
162       attempts to increase the soft limit to  100Mb  using  setrlimit().  You
163       must do this before calling pcre[16|32]_exec().
164
165   Changing stack size in Mac OS X
166
167       Using setrlimit(), as described above, should also work on Mac OS X. It
168       is also possible to set a stack size when linking a program. There is a
169       discussion   about   stack  sizes  in  Mac  OS  X  at  this  web  site:
170       http://developer.apple.com/qa/qa2005/qa1419.html.
171

AUTHOR

173
174       Philip Hazel
175       University Computing Service
176       Cambridge CB2 3QH, England.
177

REVISION

179
180       Last updated: 24 June 2012
181       Copyright (c) 1997-2012 University of Cambridge.
182
183
184
185PCRE 8.30                        24 June 2012                     PCRESTACK(3)
Impressum