1PCRESTACK(3) Library Functions Manual PCRESTACK(3)
2
3
4
6 PCRE - Perl-compatible regular expressions
7
9
10 When you call pcre[16|32]_exec(), it makes use of an internal function
11 called match(). This calls itself recursively at branch points in the
12 pattern, in order to remember the state of the match so that it can
13 back up and try a different alternative if the first one fails. As
14 matching proceeds deeper and deeper into the tree of possibilities, the
15 recursion depth increases. The match() function is also called in other
16 circumstances, for example, whenever a parenthesized sub-pattern is
17 entered, and in certain cases of repetition.
18
19 Not all calls of match() increase the recursion depth; for an item such
20 as a* it may be called several times at the same level, after matching
21 different numbers of a's. Furthermore, in a number of cases where the
22 result of the recursive call would immediately be passed back as the
23 result of the current call (a "tail recursion"), the function is just
24 restarted instead.
25
26 The above comments apply when pcre[16|32]_exec() is run in its normal
27 interpretive manner. If the pattern was studied with the
28 PCRE_STUDY_JIT_COMPILE option, and just-in-time compiling was success‐
29 ful, and the options passed to pcre[16|32]_exec() were not incompati‐
30 ble, the matching process uses the JIT-compiled code instead of the
31 match() function. In this case, the memory requirements are handled
32 entirely differently. See the pcrejit documentation for details.
33
34 The pcre[16|32]_dfa_exec() function operates in an entirely different
35 way, and uses recursion only when there is a regular expression recur‐
36 sion or subroutine call in the pattern. This includes the processing of
37 assertion and "once-only" subpatterns, which are handled like subrou‐
38 tine calls. Normally, these are never very deep, and the limit on the
39 complexity of pcre[16|32]_dfa_exec() is controlled by the amount of
40 workspace it is given. However, it is possible to write patterns with
41 runaway infinite recursions; such patterns will cause
42 pcre[16|32]_dfa_exec() to run out of stack. At present, there is no
43 protection against this.
44
45 The comments that follow do NOT apply to pcre[16|32]_dfa_exec(); they
46 are relevant only for pcre[16|32]_exec() without the JIT optimization.
47
48 Reducing pcre[16|32]_exec()'s stack usage
49
50 Each time that match() is actually called recursively, it uses memory
51 from the process stack. For certain kinds of pattern and data, very
52 large amounts of stack may be needed, despite the recognition of "tail
53 recursion". You can often reduce the amount of recursion, and there‐
54 fore the amount of stack used, by modifying the pattern that is being
55 matched. Consider, for example, this pattern:
56
57 ([^<]|<(?!inet))+
58
59 It matches from wherever it starts until it encounters "<inet" or the
60 end of the data, and is the kind of pattern that might be used when
61 processing an XML file. Each iteration of the outer parentheses matches
62 either one character that is not "<" or a "<" that is not followed by
63 "inet". However, each time a parenthesis is processed, a recursion
64 occurs, so this formulation uses a stack frame for each matched charac‐
65 ter. For a long string, a lot of stack is required. Consider now this
66 rewritten pattern, which matches exactly the same strings:
67
68 ([^<]++|<(?!inet))+
69
70 This uses very much less stack, because runs of characters that do not
71 contain "<" are "swallowed" in one item inside the parentheses. Recur‐
72 sion happens only when a "<" character that is not followed by "inet"
73 is encountered (and we assume this is relatively rare). A possessive
74 quantifier is used to stop any backtracking into the runs of non-"<"
75 characters, but that is not related to stack usage.
76
77 This example shows that one way of avoiding stack problems when match‐
78 ing long subject strings is to write repeated parenthesized subpatterns
79 to match more than one character whenever possible.
80
81 Compiling PCRE to use heap instead of stack for pcre[16|32]_exec()
82
83 In environments where stack memory is constrained, you might want to
84 compile PCRE to use heap memory instead of stack for remembering back-
85 up points when pcre[16|32]_exec() is running. This makes it run a lot
86 more slowly, however. Details of how to do this are given in the pcre‐
87 build documentation. When built in this way, instead of using the
88 stack, PCRE obtains and frees memory by calling the functions that are
89 pointed to by the pcre[16|32]_stack_malloc and pcre[16|32]_stack_free
90 variables. By default, these point to malloc() and free(), but you can
91 replace the pointers to cause PCRE to use your own functions. Since the
92 block sizes are always the same, and are always freed in reverse order,
93 it may be possible to implement customized memory handlers that are
94 more efficient than the standard functions.
95
96 Limiting pcre[16|32]_exec()'s stack usage
97
98 You can set limits on the number of times that match() is called, both
99 in total and recursively. If a limit is exceeded, pcre[16|32]_exec()
100 returns an error code. Setting suitable limits should prevent it from
101 running out of stack. The default values of the limits are very large,
102 and unlikely ever to operate. They can be changed when PCRE is built,
103 and they can also be set when pcre[16|32]_exec() is called. For details
104 of these interfaces, see the pcrebuild documentation and the section on
105 extra data for pcre[16|32]_exec() in the pcreapi documentation.
106
107 As a very rough rule of thumb, you should reckon on about 500 bytes per
108 recursion. Thus, if you want to limit your stack usage to 8Mb, you
109 should set the limit at 16000 recursions. A 64Mb stack, on the other
110 hand, can support around 128000 recursions.
111
112 In Unix-like environments, the pcretest test program has a command line
113 option (-S) that can be used to increase the size of its stack. As long
114 as the stack is large enough, another option (-M) can be used to find
115 the smallest limits that allow a particular pattern to match a given
116 subject string. This is done by calling pcre[16|32]_exec() repeatedly
117 with different limits.
118
119 Obtaining an estimate of stack usage
120
121 The actual amount of stack used per recursion can vary quite a lot,
122 depending on the compiler that was used to build PCRE and the optimiza‐
123 tion or debugging options that were set for it. The rule of thumb value
124 of 500 bytes mentioned above may be larger or smaller than what is
125 actually needed. A better approximation can be obtained by running this
126 command:
127
128 pcretest -m -C
129
130 The -C option causes pcretest to output information about the options
131 with which PCRE was compiled. When -m is also given (before -C), infor‐
132 mation about stack use is given in a line like this:
133
134 Match recursion uses stack: approximate frame size = 640 bytes
135
136 The value is approximate because some recursions need a bit more (up to
137 perhaps 16 more bytes).
138
139 If the above command is given when PCRE is compiled to use the heap
140 instead of the stack for recursion, the value that is output is the
141 size of each block that is obtained from the heap.
142
143 Changing stack size in Unix-like systems
144
145 In Unix-like environments, there is not often a problem with the stack
146 unless very long strings are involved, though the default limit on
147 stack size varies from system to system. Values from 8Mb to 64Mb are
148 common. You can find your default limit by running the command:
149
150 ulimit -s
151
152 Unfortunately, the effect of running out of stack is often SIGSEGV,
153 though sometimes a more explicit error message is given. You can nor‐
154 mally increase the limit on stack size by code such as this:
155
156 struct rlimit rlim;
157 getrlimit(RLIMIT_STACK, &rlim);
158 rlim.rlim_cur = 100*1024*1024;
159 setrlimit(RLIMIT_STACK, &rlim);
160
161 This reads the current limits (soft and hard) using getrlimit(), then
162 attempts to increase the soft limit to 100Mb using setrlimit(). You
163 must do this before calling pcre[16|32]_exec().
164
165 Changing stack size in Mac OS X
166
167 Using setrlimit(), as described above, should also work on Mac OS X. It
168 is also possible to set a stack size when linking a program. There is a
169 discussion about stack sizes in Mac OS X at this web site:
170 http://developer.apple.com/qa/qa2005/qa1419.html.
171
173
174 Philip Hazel
175 University Computing Service
176 Cambridge CB2 3QH, England.
177
179
180 Last updated: 24 June 2012
181 Copyright (c) 1997-2012 University of Cambridge.
182
183
184
185PCRE 8.30 24 June 2012 PCRESTACK(3)