1PERLDEBGUTS(1) Perl Programmers Reference Guide PERLDEBGUTS(1)
2
3
4
6 perldebguts - Guts of Perl debugging
7
9 This is not perldebug, which tells you how to use the debugger. This
10 manpage describes low-level details concerning the debugger's
11 internals, which range from difficult to impossible to understand for
12 anyone who isn't incredibly intimate with Perl's guts. Caveat lector.
13
15 Perl has special debugging hooks at compile-time and run-time used to
16 create debugging environments. These hooks are not to be confused with
17 the perl -Dxxx command described in perlrun, which is usable only if a
18 special Perl is built per the instructions in the INSTALL podpage in
19 the Perl source tree.
20
21 For example, whenever you call Perl's built-in "caller" function from
22 the package "DB", the arguments that the corresponding stack frame was
23 called with are copied to the @DB::args array. These mechanisms are
24 enabled by calling Perl with the -d switch. Specifically, the
25 following additional features are enabled (cf. "$^P" in perlvar):
26
27 · Perl inserts the contents of $ENV{PERL5DB} (or "BEGIN {require
28 'perl5db.pl'}" if not present) before the first line of your
29 program.
30
31 · Each array "@{"_<$filename"}" holds the lines of $filename for a
32 file compiled by Perl. The same is also true for "eval"ed strings
33 that contain subroutines, or which are currently being executed.
34 The $filename for "eval"ed strings looks like "(eval 34)".
35
36 Values in this array are magical in numeric context: they compare
37 equal to zero only if the line is not breakable.
38
39 · Each hash "%{"_<$filename"}" contains breakpoints and actions keyed
40 by line number. Individual entries (as opposed to the whole hash)
41 are settable. Perl only cares about Boolean true here, although
42 the values used by perl5db.pl have the form
43 "$break_condition\0$action".
44
45 The same holds for evaluated strings that contain subroutines, or
46 which are currently being executed. The $filename for "eval"ed
47 strings looks like "(eval 34)".
48
49 · Each scalar "${"_<$filename"}" contains "_<$filename". This is
50 also the case for evaluated strings that contain subroutines, or
51 which are currently being executed. The $filename for "eval"ed
52 strings looks like "(eval 34)".
53
54 · After each "require"d file is compiled, but before it is executed,
55 "DB::postponed(*{"_<$filename"})" is called if the subroutine
56 "DB::postponed" exists. Here, the $filename is the expanded name
57 of the "require"d file, as found in the values of %INC.
58
59 · After each subroutine "subname" is compiled, the existence of
60 $DB::postponed{subname} is checked. If this key exists,
61 "DB::postponed(subname)" is called if the "DB::postponed"
62 subroutine also exists.
63
64 · A hash %DB::sub is maintained, whose keys are subroutine names and
65 whose values have the form "filename:startline-endline".
66 "filename" has the form "(eval 34)" for subroutines defined inside
67 "eval"s.
68
69 · When the execution of your program reaches a point that can hold a
70 breakpoint, the "DB::DB()" subroutine is called if any of the
71 variables $DB::trace, $DB::single, or $DB::signal is true. These
72 variables are not "local"izable. This feature is disabled when
73 executing inside "DB::DB()", including functions called from it
74 unless "$^D & (1<<30)" is true.
75
76 · When execution of the program reaches a subroutine call, a call to
77 &DB::sub(args) is made instead, with $DB::sub holding the name of
78 the called subroutine. (This doesn't happen if the subroutine was
79 compiled in the "DB" package.)
80
81 If the call is to an lvalue subroutine, and &DB::lsub is defined
82 &DB::lsub(args) is called instead, otherwise falling back to
83 &DB::sub(args).
84
85 · When execution of the program uses "goto" to enter a non-XS
86 subroutine and the 0x80 bit is set in $^P, a call to &DB::goto is
87 made, with $DB::sub holding the name of the subroutine being
88 entered.
89
90 Note that if &DB::sub needs external data for it to work, no subroutine
91 call is possible without it. As an example, the standard debugger's
92 &DB::sub depends on the $DB::deep variable (it defines how many levels
93 of recursion deep into the debugger you can go before a mandatory
94 break). If $DB::deep is not defined, subroutine calls are not
95 possible, even though &DB::sub exists.
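
The @DB::args behaviour mentioned at the start of this section does not
depend on the -d switch, so it can be tried in an ordinary script. The
following is a minimal sketch only; the helper names "args_of_caller"
and "demo" are hypothetical and not part of Perl or of perl5db.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the arguments its caller was invoked with.
sub args_of_caller {
    package DB;                 # caller() must be called from package DB
    my @frame = caller(1);      # frame 1 is the sub that called this helper
    return @frame ? @DB::args : ();
}

# Hypothetical demo sub; its own arguments are what we expect to get back.
sub demo { return args_of_caller() }

my @seen = demo('a', 42);
print join(',', @seen), "\n";
```

Because "caller" is compiled inside "package DB", Perl fills @DB::args
with the arguments of the frame that was asked for, here the call to
"demo".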
96
97 Writing Your Own Debugger
98 Environment Variables
99
100 The "PERL5DB" environment variable can be used to define a debugger.
101 For example, the minimal "working" debugger (it actually doesn't do
102 anything) consists of one line:
103
104 sub DB::DB {}
105
106 It can easily be defined like this:
107
108 $ PERL5DB="sub DB::DB {}" perl -d your-script
109
110 Another brief debugger, slightly more useful, can be created with only
111 the line:
112
113 sub DB::DB {print ++$i; scalar <STDIN>}
114
115 This debugger prints a number which increments for each statement
116 encountered and waits for you to hit a newline before continuing to the
117 next statement.
118
119 The following debugger is actually useful:
120
121 {
122 package DB;
123 sub DB {}
124 sub sub {print ++$i, " $sub\n"; &$sub}
125 }
126
127 It prints the sequence number of each subroutine call and the name of
128 the called subroutine. Note that &DB::sub is being compiled into the
129 package "DB" through the use of the "package" directive.
130
131 When it starts, the debugger reads your rc file (./.perldb or ~/.perldb
132 under Unix), which can set important options. (A subroutine
133 (&afterinit) can be defined here as well; it is executed after the
134 debugger completes its own initialization.)
135
136 After the rc file is read, the debugger reads the PERLDB_OPTS
137 environment variable and uses it to set debugger options. The contents
138 of this variable are treated as if they were the argument of an "o ..."
139 debugger command (q.v. in "Configurable Options" in perldebug).
140
141 Debugger Internal Variables
142
143 In addition to the file and subroutine-related variables mentioned
144 above, the debugger also maintains various magical internal variables.
145
146 · @DB::dbline is an alias for "@{"::_<current_file"}", which holds
147 the lines of the currently-selected file (compiled by Perl), either
148 explicitly chosen with the debugger's "f" command, or implicitly by
149 flow of execution.
150
151 Values in this array are magical in numeric context: they compare
152 equal to zero only if the line is not breakable.
153
154 · %DB::dbline is an alias for "%{"::_<current_file"}", which contains
155 breakpoints and actions keyed by line number in the currently-
156 selected file, either explicitly chosen with the debugger's "f"
157 command, or implicitly by flow of execution.
158
159 As previously noted, individual entries (as opposed to the whole
160 hash) are settable. Perl only cares about Boolean true here,
161 although the values used by perl5db.pl have the form
162 "$break_condition\0$action".
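
As an illustration of the "$break_condition\0$action" layout, the sketch
below splits such a value on the NUL byte. The helper name
"split_dbline_entry" and the sample entry are assumptions made for this
example; perl5db.pl has its own code for this.

```perl
use strict;
use warnings;

# Hypothetical helper: split a perl5db.pl-style entry into its two parts.
# The action is undef when the entry contains no NUL separator.
sub split_dbline_entry {
    my ($entry) = @_;
    my ($condition, $action) = split /\0/, $entry, 2;
    return ($condition, $action);
}

# Made-up sample entry: break when $x > 10, and run a warn as the action.
my ($cond, $act) = split_dbline_entry("\$x > 10\0warn 'hit'");
print "condition: $cond\n";
print "action:    $act\n";
```

Perl itself only tests such an entry for truth, so any true value marks
the line; the two-part layout matters only to perl5db.pl.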
163
164 Debugger Customization Functions
165
166 Some functions are provided to simplify customization.
167
168 · See "Configurable Options" in perldebug for a description of
169 options parsed by "DB::parse_options(string)".
170
171 · "DB::dump_trace(skip[,count])" skips the specified number of frames
172 and returns a list containing information about the calling frames
173 (all of them, if "count" is missing). Each entry is a reference to a
174 hash with keys "context" (either ".", "$", or "@"), "sub"
175 (subroutine name, or info about "eval"), "args" ("undef" or a
176 reference to an array), "file", and "line".
177
178 · "DB::print_trace(FH, skip[, count[, short]])" prints formatted info
179 about caller frames. The last two functions may be convenient as
180 arguments to "<", "<<" commands.
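
To give a feel for the shape of the data, here is a rough sketch that
builds a similar list of hashes with the built-in "caller". It is not
DB::dump_trace: the name "simple_trace" is hypothetical, arguments are
not collected, and the exact frames and values the real function reports
may differ.

```perl
use strict;
use warnings;

# Hypothetical helper, not DB::dump_trace itself: walk the call stack with
# caller() and return hash references that use the same key names.
sub simple_trace {
    my ($skip) = @_;
    my @frames;
    my $level = ($skip // 0) + 1;       # level 0 is the call to simple_trace
    while (my @c = caller($level++)) {
        my $want = $c[5];               # wantarray value for that frame
        push @frames, {
            context => !defined $want ? '.' : $want ? '@' : '$',
            sub     => $c[3],
            args    => undef,           # this sketch does not collect arguments
            file    => $c[1],
            line    => $c[2],
        };
    }
    return @frames;
}

sub inner { return simple_trace(0) }            # called in list context below
sub outer { my @t = inner(); return \@t }       # called in scalar context below

my $trace = outer();
print "$_->{context} $_->{sub} at $_->{file}:$_->{line}\n" for @$trace;
```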
181
182 Note that any variables and functions that are not documented in this
183 manpage (or in perldebug) are considered for internal use only, and as
184 such are subject to change without notice.
185
187 The "frame" option can be used to control the output of frame
188 information. For example, contrast this expression trace:
189
190 $ perl -de 42
191 Stack dump during die enabled outside of evals.
192
193 Loading DB routines from perl5db.pl patch level 0.94
194 Emacs support available.
195
196 Enter h or 'h h' for help.
197
198 main::(-e:1): 0
199 DB<1> sub foo { 14 }
200
201 DB<2> sub bar { 3 }
202
203 DB<3> t print foo() * bar()
204 main::((eval 172):3): print foo() * bar();
205 main::foo((eval 168):2):
206 main::bar((eval 170):2):
207 42
208
209 with this one, once the "o"ption "frame=2" has been set:
210
211 DB<4> o f=2
212 frame = '2'
213 DB<5> t print foo() * bar()
214 3: foo() * bar()
215 entering main::foo
216 2: sub foo { 14 };
217 exited main::foo
218 entering main::bar
219 2: sub bar { 3 };
220 exited main::bar
221 42
222
223 By way of demonstration, we present below a laborious listing resulting
224 from setting your "PERLDB_OPTS" environment variable to the value "f=n
225 N", and running perl -d -V from the command line. Examples using
226 various values of "n" are shown to give you a feel for the difference
227 between settings. Long though it may be, this is not a complete
228 listing, but only excerpts.
229
230 1.
231 entering main::BEGIN
232 entering Config::BEGIN
233 Package lib/Exporter.pm.
234 Package lib/Carp.pm.
235 Package lib/Config.pm.
236 entering Config::TIEHASH
237 entering Exporter::import
238 entering Exporter::export
239 entering Config::myconfig
240 entering Config::FETCH
241 entering Config::FETCH
242 entering Config::FETCH
243 entering Config::FETCH
244
245 2.
246 entering main::BEGIN
247 entering Config::BEGIN
248 Package lib/Exporter.pm.
249 Package lib/Carp.pm.
250 exited Config::BEGIN
251 Package lib/Config.pm.
252 entering Config::TIEHASH
253 exited Config::TIEHASH
254 entering Exporter::import
255 entering Exporter::export
256 exited Exporter::export
257 exited Exporter::import
258 exited main::BEGIN
259 entering Config::myconfig
260 entering Config::FETCH
261 exited Config::FETCH
262 entering Config::FETCH
263 exited Config::FETCH
264 entering Config::FETCH
265
266 3.
267 in $=main::BEGIN() from /dev/null:0
268 in $=Config::BEGIN() from lib/Config.pm:2
269 Package lib/Exporter.pm.
270 Package lib/Carp.pm.
271 Package lib/Config.pm.
272 in $=Config::TIEHASH('Config') from lib/Config.pm:644
273 in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
274 in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from li
275 in @=Config::myconfig() from /dev/null:0
276 in $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574
277 in $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574
278 in $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574
279 in $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574
280 in $=Config::FETCH(ref(Config), 'osname') from lib/Config.pm:574
281 in $=Config::FETCH(ref(Config), 'osvers') from lib/Config.pm:574
282
283 4.
284 in $=main::BEGIN() from /dev/null:0
285 in $=Config::BEGIN() from lib/Config.pm:2
286 Package lib/Exporter.pm.
287 Package lib/Carp.pm.
288 out $=Config::BEGIN() from lib/Config.pm:0
289 Package lib/Config.pm.
290 in $=Config::TIEHASH('Config') from lib/Config.pm:644
291 out $=Config::TIEHASH('Config') from lib/Config.pm:644
292 in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
293 in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/
294 out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/
295 out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
296 out $=main::BEGIN() from /dev/null:0
297 in @=Config::myconfig() from /dev/null:0
298 in $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574
299 out $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574
300 in $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574
301 out $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574
302 in $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574
303 out $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574
304 in $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574
305
306 5.
307 in $=main::BEGIN() from /dev/null:0
308 in $=Config::BEGIN() from lib/Config.pm:2
309 Package lib/Exporter.pm.
310 Package lib/Carp.pm.
311 out $=Config::BEGIN() from lib/Config.pm:0
312 Package lib/Config.pm.
313 in $=Config::TIEHASH('Config') from lib/Config.pm:644
314 out $=Config::TIEHASH('Config') from lib/Config.pm:644
315 in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
316 in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E
317 out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E
318 out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
319 out $=main::BEGIN() from /dev/null:0
320 in @=Config::myconfig() from /dev/null:0
321 in $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574
322 out $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574
323 in $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574
324 out $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574
325
326 6.
327 in $=CODE(0x15eca4)() from /dev/null:0
328 in $=CODE(0x182528)() from lib/Config.pm:2
329 Package lib/Exporter.pm.
330 out $=CODE(0x182528)() from lib/Config.pm:0
331 scalar context return from CODE(0x182528): undef
332 Package lib/Config.pm.
333 in $=Config::TIEHASH('Config') from lib/Config.pm:628
334 out $=Config::TIEHASH('Config') from lib/Config.pm:628
335 scalar context return from Config::TIEHASH: empty hash
336 in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
337 in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171
338 out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171
339 scalar context return from Exporter::export: ''
340 out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
341 scalar context return from Exporter::import: ''
342
343 In all cases shown above, the line indentation shows the call tree. If
344 bit 2 of "frame" is set, a line is printed on exit from a subroutine as
345 well. If bit 4 is set, the arguments are printed along with the caller
346 info. If bit 8 is set, the arguments are printed even if they are tied
347 or references. If bit 16 is set, the return value is printed, too.
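
The bit meanings just listed can be summarized in a few lines of Perl.
This is only a sketch of the description above, with a hypothetical
helper name ("describe_frame"); it does not reproduce how perl5db.pl
tests the option internally.

```perl
use strict;
use warnings;

# Bit values and effects as described in the paragraph above.
my @FRAME_BITS = (
    [  2 => 'print a line on exit from a subroutine' ],
    [  4 => 'print the arguments along with the caller info' ],
    [  8 => 'print the arguments even if they are tied or references' ],
    [ 16 => 'print the return value' ],
);

# Hypothetical helper: list the effects enabled by a given "frame" value.
sub describe_frame {
    my ($frame) = @_;
    return map { $_->[1] } grep { $frame & $_->[0] } @FRAME_BITS;
}

for my $frame (1, 2, 6, 30) {
    my @effects = describe_frame($frame);
    print "frame=$frame: ",
          (@effects ? join('; ', @effects) : 'none of the bits above set'),
          "\n";
}
```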
348
349 When a package is compiled, a line like this
350
351 Package lib/Carp.pm.
352
353 is printed with proper indentation.
354
356 There are two ways to enable debugging output for regular expressions.
357
358 If your perl is compiled with "-DDEBUGGING", you may use the -Dr flag
359 on the command line.
360
361 Otherwise, one can "use re 'debug'", which has effects at compile time
362 and run time. Since Perl 5.9.5, this pragma is lexically scoped.
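
A small sketch of the lexical scoping follows; the variable names are
arbitrary. Only the pattern compiled inside the block is affected by the
pragma, and the debugging output goes to STDERR.

```perl
use strict;
use warnings;

my $re;
{
    use re 'debug';     # applies to regexes compiled inside this block
    $re = qr/[bc]d(ef*g)+h[ij]k$/;
}

# Compiled outside the block, so the pragma does not apply to this pattern.
my $plain = qr/foo/;

print "matched\n" if 'bdeghik' =~ $re;
```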
363
364 Compile-time Output
365 The debugging output at compile time looks like this:
366
367 Compiling REx '[bc]d(ef*g)+h[ij]k$'
368 size 45 Got 364 bytes for offset annotations.
369 first at 1
370 rarest char g at 0
371 rarest char d at 0
372 1: ANYOF[bc](12)
373 12: EXACT <d>(14)
374 14: CURLYX[0] {1,32767}(28)
375 16: OPEN1(18)
376 18: EXACT <e>(20)
377 20: STAR(23)
378 21: EXACT <f>(0)
379 23: EXACT <g>(25)
380 25: CLOSE1(27)
381 27: WHILEM[1/1](0)
382 28: NOTHING(29)
383 29: EXACT <h>(31)
384 31: ANYOF[ij](42)
385 42: EXACT <k>(44)
386 44: EOL(45)
387 45: END(0)
388 anchored 'de' at 1 floating 'gh' at 3..2147483647 (checking floating)
389 stclass 'ANYOF[bc]' minlen 7
390 Offsets: [45]
391 1[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 5[1]
392 0[0] 12[1] 0[0] 6[1] 0[0] 7[1] 0[0] 9[1] 8[1] 0[0] 10[1] 0[0]
393 11[1] 0[0] 12[0] 12[0] 13[1] 0[0] 14[4] 0[0] 0[0] 0[0] 0[0]
394 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 18[1] 0[0] 19[1] 20[0]
395 Omitting $` $& $' support.
396
397 The first line shows the pre-compiled form of the regex. The second
398 shows the size of the compiled form (in arbitrary units, usually 4-byte
399 words) and the total number of bytes allocated for the offset/length
400 table, usually 4+"size"*8. The next line shows the label id of the
401 first node that does a match.
402
403 The
404
405 anchored 'de' at 1 floating 'gh' at 3..2147483647 (checking floating)
406 stclass 'ANYOF[bc]' minlen 7
407
408 line (split into two lines above) contains optimizer information. In
409 the example shown, the optimizer found that the match should contain a
410 substring "de" at offset 1, plus substring "gh" at some offset between
411 3 and infinity. Moreover, when checking for these substrings (to
412 abandon impossible matches quickly), Perl will check for the substring
413 "gh" before checking for the substring "de". The optimizer may also
414 use the knowledge that the match starts (at the "first" id) with a
415 character class, and no string shorter than 7 characters can possibly
416 match.
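
The idea can be imitated by hand. The sketch below is an analogy written
for this example, not the engine's implementation: a hypothetical
"might_match" helper applies the same three facts (minimum length, "gh"
at offset 3 or later, "de" at offset 1 or later) as cheap necessary
conditions before the real match is attempted.

```perl
use strict;
use warnings;

my $re = qr/[bc]d(ef*g)+h[ij]k$/;

# Hypothetical pre-filter built from the optimizer line shown above.
# It only rejects strings that cannot match; it never proves a match.
sub might_match {
    my ($s) = @_;
    return 0 if length($s) < 7;             # minlen 7
    return 0 if index($s, 'gh', 3) < 0;     # floating 'gh', checked first
    return 0 if index($s, 'de', 1) < 0;     # anchored 'de' at offset 1
    return 1;
}

for my $s ('xyz', 'abcdefg__gh__', 'bdeghik') {
    my $verdict = !might_match($s) ? 'rejected early'
                : $s =~ $re        ? 'matched'
                :                    'passed the checks but failed to match';
    print "$s: $verdict\n";
}
```

The second string shows why these checks are only a filter: it contains
both substrings and is long enough, yet the full pattern still fails.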
417
418 The fields of interest which may appear in this line are
419
420 "anchored" STRING "at" POS
421 "floating" STRING "at" POS1..POS2
422 See above.
423
424 "checking floating/anchored"
425 Which substring to check first.
426
427 "minlen"
428 The minimal length of the match.
429
430 "stclass" TYPE
431 Type of first matching node.
432
433 "noscan"
434 Don't scan for the found substrings.
435
436 "isall"
437 Means that the optimizer information is all that the regular
438 expression contains, and thus one does not need to enter the regex
439 engine at all.
440
441 "GPOS"
442 Set if the pattern contains "\G".
443
444 "plus"
445 Set if the pattern starts with a repeated char (as in "x+y").
446
447 "implicit"
448 Set if the pattern starts with ".*".
449
450 "with eval"
451 Set if the pattern contains eval-groups, such as "(?{ code })" and
452 "(??{ code })".
453
454 "anchored(TYPE)"
455 If the pattern may match only at a handful of places, with "TYPE"
456 being "SBOL", "MBOL", or "GPOS". See the table below.
457
458 If a substring is known to match at end-of-line only, it may be
459 followed by "$", as in "floating 'k'$".
460
461 The optimizer-specific information is used to avoid entering (a slow)
462 regex engine on strings that definitely will not match. If the "isall"
463 flag is set, a call to the regex engine may be avoided even when the
464 optimizer found an appropriate place for the match.
465
466 Above the optimizer section is the list of nodes of the compiled form
467 of the regex. Each line has format
468
469 " "id: TYPE OPTIONAL-INFO (next-id)
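
As a sketch of how this format can be read mechanically, the following
parses the node lines from the dump shown earlier and follows the
next-id links starting at node 1. The regular expression and variable
names are assumptions made for this example, and the dump format is not
guaranteed to stay the same between Perl releases.

```perl
use strict;
use warnings;

# Node lines copied from the compile-time dump shown earlier.
my @dump = (
    '1: ANYOF[bc](12)',    '12: EXACT <d>(14)',
    '14: CURLYX[0] {1,32767}(28)',
    '16: OPEN1(18)',       '18: EXACT <e>(20)',
    '20: STAR(23)',        '21: EXACT <f>(0)',
    '23: EXACT <g>(25)',   '25: CLOSE1(27)',
    '27: WHILEM[1/1](0)',  '28: NOTHING(29)',
    '29: EXACT <h>(31)',   '31: ANYOF[ij](42)',
    '42: EXACT <k>(44)',   '44: EOL(45)',
    '45: END(0)',
);

# Split each line into id, TYPE, OPTIONAL-INFO and next-id.
my %node;
for my $line (@dump) {
    my ($id, $type, $info, $next) =
        $line =~ /^\s*(\d+):\s+([A-Z_0-9]+)\s*(.*?)\((\d+)\)\s*$/
        or next;
    $node{$id} = { type => $type, info => $info, next => $next };
}

# Follow the next-id links from node 1 until a next-id of 0 is reached.
my @path;
for (my $id = 1; $id && $node{$id}; $id = $node{$id}{next}) {
    push @path, join(':', $id, $node{$id}{type});
}
print join(' -> ', @path), "\n";
```

The nodes between 16 and 27 do not appear on this path; they form the
body of the CURLYX loop, whose own next-id (28) skips past it.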
470
471 Types of Nodes
472 Here are the current possible types, with short descriptions:
473
474 # TYPE arg-description [num-args] [longjump-len] DESCRIPTION
475
476 # Exit points
477
478 END no End of program.
479 SUCCEED no Return from a subroutine, basically.
480
481 # Line Start Anchors:
482 SBOL no Match "" at beginning of line: /^/, /\A/
483 MBOL no Same, assuming multiline: /^/m
484
485 # Line End Anchors:
486 SEOL no Match "" at end of line: /$/
487 MEOL no Same, assuming multiline: /$/m
488 EOS no Match "" at end of string: /\z/
489
490 # Match Start Anchors:
491 GPOS no Matches where last m//g left off.
492
493 # Word Boundary Opcodes:
494 BOUND no Like BOUNDA for non-utf8, otherwise match ""
495 between any Unicode \w\W or \W\w
496 BOUNDL no Like BOUND/BOUNDU, but \w and \W are defined
497 by current locale
498 BOUNDU no Match "" at any boundary of a given type
499 using Unicode rules
500 BOUNDA no Match "" at any boundary between \w\W or
501 \W\w, where \w is [_a-zA-Z0-9]
502 NBOUND no Like NBOUNDA for non-utf8, otherwise match
503 "" between any Unicode \w\w or \W\W
504 NBOUNDL no Like NBOUND/NBOUNDU, but \w and \W are
505 defined by current locale
506 NBOUNDU no Match "" at any non-boundary of a given type
507 using Unicode rules
508 NBOUNDA no Match "" between any \w\w or \W\W, where \w
509 is [_a-zA-Z0-9]
510
511 # [Special] alternatives:
512 REG_ANY no Match any one character (except newline).
513 SANY no Match any one character.
514 ANYOF sv 1 Match character in (or not in) this class,
515 single char match only
516 ANYOFD sv 1 Like ANYOF, but /d is in effect
517 ANYOFL sv 1 Like ANYOF, but /l is in effect
518
519 # POSIX Character Classes:
520 POSIXD none Some [[:class:]] under /d; the FLAGS field
521 gives which one
522 POSIXL none Some [[:class:]] under /l; the FLAGS field
523 gives which one
524 POSIXU none Some [[:class:]] under /u; the FLAGS field
525 gives which one
526 POSIXA none Some [[:class:]] under /a; the FLAGS field
527 gives which one
528 NPOSIXD none complement of POSIXD, [[:^class:]]
529 NPOSIXL none complement of POSIXL, [[:^class:]]
530 NPOSIXU none complement of POSIXU, [[:^class:]]
531 NPOSIXA none complement of POSIXA, [[:^class:]]
532
533 CLUMP no Match any extended grapheme cluster sequence
534
535 # Alternation
536
537 # BRANCH The set of branches constituting a single choice are
538 # hooked together with their "next" pointers, since
539 # precedence prevents anything being concatenated to
540 # any individual branch. The "next" pointer of the last
541 # BRANCH in a choice points to the thing following the
542 # whole choice. This is also where the final "next"
543 # pointer of each individual branch points; each branch
544 # starts with the operand node of a BRANCH node.
545 #
546 BRANCH node Match this alternative, or the next...
547
548 # Literals
549
550 EXACT str Match this string (preceded by length).
551 EXACTL str Like EXACT, but /l is in effect (used so
552 locale-related warnings can be checked for).
553 EXACTF str Match this non-UTF-8 string (not guaranteed
554 to be folded) using /id rules (w/len).
555 EXACTFL str Match this string (not guaranteed to be
556 folded) using /il rules (w/len).
557 EXACTFU str Match this string (folded iff in UTF-8,
558 length in folding doesn't change if not in
559 UTF-8) using /iu rules (w/len).
560 EXACTFA str Match this string (not guaranteed to be
561 folded) using /iaa rules (w/len).
562
563 EXACTFU_SS str Match this string (folded iff in UTF-8,
564 length in folding may change even if not in
565 UTF-8) using /iu rules (w/len).
566 EXACTFLU8 str Rare circumstances: like EXACTFU, but is
567 under /l, UTF-8, folded, and everything in
568 it is above 255.
569 EXACTFA_NO_TRIE str Match this string (which is not trie-able;
570 not guaranteed to be folded) using /iaa
571 rules (w/len).
572
573 # Do nothing types
574
575 NOTHING no Match empty string.
576 # A variant of above which delimits a group, thus stops optimizations
577 TAIL no Match empty string. Can jump here from
578 outside.
579
580 # Loops
581
582 # STAR,PLUS '?', and complex '*' and '+', are implemented as
583 # circular BRANCH structures. Simple cases
584 # (one character per match) are implemented with STAR
585 # and PLUS for speed and to minimize recursive plunges.
586 #
587 STAR node Match this (simple) thing 0 or more times.
588 PLUS node Match this (simple) thing 1 or more times.
589
590 CURLY sv 2 Match this simple thing {n,m} times.
591 CURLYN no 2 Capture next-after-this simple thing
592 CURLYM no 2 Capture this medium-complex thing {n,m}
593 times.
594 CURLYX sv 2 Match this complex thing {n,m} times.
595
596 # This terminator creates a loop structure for CURLYX
597 WHILEM no Do curly processing and see if rest matches.
598
599 # Buffer related
600
601 # OPEN,CLOSE,GROUPP ...are numbered at compile time.
602 OPEN num 1 Mark this point in input as start of #n.
603 CLOSE num 1 Analogous to OPEN.
604
605 REF num 1 Match some already matched string
606 REFF num 1 Match already matched string, folded using
607 native charset rules for non-utf8
608 REFFL num 1 Match already matched string, folded in loc.
609 REFFU num 1 Match already matched string, folded using
610 unicode rules for non-utf8
611 REFFA num 1 Match already matched string, folded using
612 unicode rules for non-utf8, no mixing ASCII,
613 non-ASCII
614
615 # Named references. Code in regcomp.c assumes that these all are after
616 # the numbered references
617 NREF no-sv 1 Match some already matched string
618 NREFF no-sv 1 Match already matched string, folded using
619 native charset rules for non-utf8
620 NREFFL no-sv 1 Match already matched string, folded in loc.
621 NREFFU num 1 Match already matched string, folded using
622 unicode rules for non-utf8
623 NREFFA num 1 Match already matched string, folded using
624 unicode rules for non-utf8, no mixing ASCII,
625 non-ASCII
626
627 # Support for long RE
628 LONGJMP off 1 1 Jump far away.
629 BRANCHJ off 1 1 BRANCH with long offset.
630
631 # Special Case Regops
632 IFMATCH off 1 2 Succeeds if the following matches.
633 UNLESSM off 1 2 Fails if the following matches.
634 SUSPEND off 1 1 "Independent" sub-RE.
635 IFTHEN off 1 1 Switch, should be preceded by switcher.
636 GROUPP num 1 Whether the group matched.
637
638 # The heavy worker
639
640 EVAL evl/flags Execute some Perl code.
641 2L
642
643 # Modifiers
644
645 MINMOD no Next operator is not greedy.
646 LOGICAL no Next opcode should set the flag only.
647
648 # This is not used yet
649 RENUM off 1 1 Group with independently numbered parens.
650
651 # Trie Related
652
653 # Behave the same as A|LIST|OF|WORDS would. The '..C' variants
654 # have inline charclass data (ascii only), the 'C' store it in the
655 # structure.
656
657 TRIE trie 1 Match many EXACT(F[ALU]?)? at once.
658 flags==type
659 TRIEC trie Same as TRIE, but with embedded charclass
660 charclass data
661
662 AHOCORASICK trie 1 Aho Corasick stclass. flags==type
663 AHOCORASICKC trie Same as AHOCORASICK, but with embedded
664 charclass charclass data
665
666 # Regex Subroutines
667 GOSUB num/ofs 2L recurse to paren arg1 at (signed) ofs arg2
668
669 # Special conditionals
670 NGROUPP no-sv 1 Whether the group matched.
671 INSUBP num 1 Whether we are in a specific recurse.
672 DEFINEP none 1 Never execute directly.
673
674 # Backtracking Verbs
675 ENDLIKE none Used only for the type field of verbs
676 OPFAIL no-sv 1 Same as (?!), but with verb arg
677 ACCEPT no-sv/num Accepts the current matched string, with
678 2L verbar
679
680 # Verbs With Arguments
681 VERB no-sv 1 Used only for the type field of verbs
682 PRUNE no-sv 1 Pattern fails at this startpoint if no-
683 backtracking through this
684 MARKPOINT no-sv 1 Push the current location for rollback by
685 cut.
686 SKIP no-sv 1 On failure skip forward (to the mark) before
687 retrying
688 COMMIT no-sv 1 Pattern fails outright if backtracking
689 through this
690 CUTGROUP no-sv 1 On failure go to the next alternation in the
691 group
692
693 # Control what to keep in $&.
694 KEEPS no $& begins here.
695
696 # New charclass like patterns
697 LNBREAK none generic newline pattern
698
699 # SPECIAL REGOPS
700
701 # This is not really a node, but an optimized away piece of a "long"
702 # node. To simplify debugging output, we mark it as if it were a node
703 OPTIMIZED off Placeholder for dump.
704
705 # Special opcode with the property that no opcode in a compiled program
706 # will ever be of this type. Thus it can be used as a flag value that
707 # no other opcode has been seen. END is used similarly, in that an END
708 # node can't be optimized. So END implies "unoptimizable" and PSEUDO
709 # means "not seen anything to optimize yet".
710 PSEUDO off Pseudo opcode for internal use.
711
712 Following the optimizer information is a dump of the offset/length
713 table, here split across several lines:
714
715 Offsets: [45]
716 1[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 5[1]
717 0[0] 12[1] 0[0] 6[1] 0[0] 7[1] 0[0] 9[1] 8[1] 0[0] 10[1] 0[0]
718 11[1] 0[0] 12[0] 12[0] 13[1] 0[0] 14[4] 0[0] 0[0] 0[0] 0[0]
719 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 18[1] 0[0] 19[1] 20[0]
720
721 The first line here indicates that the offset/length table contains 45
722 entries. Each entry is a pair of integers, denoted by
723 "offset[length]". Entries are numbered starting with 1, so entry #1
724 here is "1[4]" and entry #12 is "5[1]". "1[4]" indicates that the node
725 labeled "1:" (the "1: ANYOF[bc]") begins at character position 1 in the
726 pre-compiled form of the regex, and has a length of 4 characters.
727 "5[1]" in position 12 indicates that the node labeled "12:" (the "12:
728 EXACT <d>") begins at character position 5 in the pre-compiled form of
729 the regex, and has a length of 1 character. "12[1]" in position 14
730 indicates that the node labeled "14:" (the "14: CURLYX[0] {1,32767}")
731 begins at character position 12 in the pre-compiled form of the regex,
732 and has a length of 1 character; that is, it corresponds to the "+"
733 symbol in the precompiled regex.
734
735 "0[0]" items indicate that there is no corresponding node.
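
The correspondence can be checked with a short script. The sketch below
copies the table from the dump above and looks up the source text for a
few node ids; "node_source" is a hypothetical helper written for this
example. It also reproduces the 4+"size"*8 figure quoted earlier.

```perl
use strict;
use warnings;

my $pattern = '[bc]d(ef*g)+h[ij]k$';

# Offset/length table copied from the dump above.
my $table = join ' ',
    '1[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 5[1]',
    '0[0] 12[1] 0[0] 6[1] 0[0] 7[1] 0[0] 9[1] 8[1] 0[0] 10[1] 0[0]',
    '11[1] 0[0] 12[0] 12[0] 13[1] 0[0] 14[4] 0[0] 0[0] 0[0] 0[0]',
    '0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 18[1] 0[0] 19[1] 20[0]';

# One [offset, length] pair per entry; entry #N is stored at index N-1.
my @entries = map { [ /(\d+)\[(\d+)\]/ ] } split ' ', $table;

# Hypothetical helper: source text covered by the node with the given id.
sub node_source {
    my ($id) = @_;
    my ($offset, $length) = @{ $entries[$id - 1] };
    return $length ? substr($pattern, $offset - 1, $length) : '';
}

printf "%d entries, %d bytes\n", scalar @entries, 4 + @entries * 8;
print "node $_: '", node_source($_), "'\n" for 1, 12, 14, 31;
```

For this table the script reports 45 entries and 364 bytes, in agreement
with the "size 45 Got 364 bytes" line of the dump.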
736
737 Run-time Output
738 First of all, when doing a match, one may get no run-time output even
739 if debugging is enabled. This means that the regex engine was never
740 entered and that all of the job was therefore done by the optimizer.
741
742 If the regex engine was entered, the output may look like this:
743
744 Matching '[bc]d(ef*g)+h[ij]k$' against 'abcdefg__gh__'
745 Setting an EVAL scope, savestack=3
746 2 <ab> <cdefg__gh_> | 1: ANYOF
747 3 <abc> <defg__gh_> | 11: EXACT <d>
748 4 <abcd> <efg__gh_> | 13: CURLYX {1,32767}
749 4 <abcd> <efg__gh_> | 26: WHILEM
750 0 out of 1..32767 cc=effff31c
751 4 <abcd> <efg__gh_> | 15: OPEN1
752 4 <abcd> <efg__gh_> | 17: EXACT <e>
753 5 <abcde> <fg__gh_> | 19: STAR
754 EXACT <f> can match 1 times out of 32767...
755 Setting an EVAL scope, savestack=3
756 6 <bcdef> <g__gh__> | 22: EXACT <g>
757 7 <bcdefg> <__gh__> | 24: CLOSE1
758 7 <bcdefg> <__gh__> | 26: WHILEM
759 1 out of 1..32767 cc=effff31c
760 Setting an EVAL scope, savestack=12
761 7 <bcdefg> <__gh__> | 15: OPEN1
762 7 <bcdefg> <__gh__> | 17: EXACT <e>
763 restoring \1 to 4(4)..7
764 failed, try continuation...
765 7 <bcdefg> <__gh__> | 27: NOTHING
766 7 <bcdefg> <__gh__> | 28: EXACT <h>
767 failed...
768 failed...
769
770 The most significant information in the output is about the particular
771 node of the compiled regex that is currently being tested against the
772 target string. The format of these lines is
773
774 " "STRING-OFFSET <PRE-STRING> <POST-STRING> |ID: TYPE
775
776 The TYPE info is indented with respect to the backtracking level.
777 Other incidental information appears interspersed within.
778
780 Perl is a profligate wastrel when it comes to memory use. There is a
781 saying that to estimate memory usage of Perl, assume a reasonable
782 algorithm for memory allocation, multiply that estimate by 10, and
783 while you still may miss the mark, at least you won't be quite so
784 astonished. This is not absolutely true, but may provide a good grasp
785 of what happens.
786
787 Assume that an integer cannot take less than 20 bytes of memory, a
788 float cannot take less than 24 bytes, a string cannot take less than 32
789 bytes (all these examples assume 32-bit architectures; the results are
790 quite a bit worse on 64-bit architectures). If a variable is accessed
791 in two of three different ways (which require an integer, a float, or a
792 string), the memory footprint may increase yet another 20 bytes. A
793 sloppy malloc(3) implementation can inflate these numbers dramatically.
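
As a back-of-the-envelope illustration, the sketch below turns those
per-value minimums into a lower bound for a large collection. The helper
name "min_bytes" is hypothetical, and the figures are just the 32-bit
minimums quoted above; the bound ignores the container itself, string
contents, and any malloc overhead, so real usage will be higher.

```perl
use strict;
use warnings;

# Per-value lower bounds quoted above (32-bit builds), in bytes.
my %MIN_BYTES = (integer => 20, float => 24, string => 32);

# Hypothetical helper: lower bound for $count values of the given kind.
sub min_bytes {
    my ($kind, $count) = @_;
    die "unknown kind '$kind'\n" unless exists $MIN_BYTES{$kind};
    return $MIN_BYTES{$kind} * $count;
}

printf "%-8s %10d bytes\n", $_, min_bytes($_, 1_000_000)
    for qw(integer float string);
```

Under these assumptions, a million integers need at least 20,000,000
bytes before the array that holds them is counted.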
794
795 On the opposite end of the scale, a declaration like
796
797 sub foo;
798
799 may take up to 500 bytes of memory, depending on which release of Perl
800 you're running.
801
802 Anecdotal estimates of source-to-compiled code bloat suggest an
803 eightfold increase. This means that the compiled form of reasonable
804 (normally commented, properly indented etc.) code will take about eight
805 times more space in memory than the code took on disk.
806
807 The -DL command-line switch is obsolete since circa Perl 5.6.0 (it was
808 available only if Perl was built with "-DDEBUGGING"). The switch was
809 used to track Perl's memory allocations and possible memory leaks.
810 These days the use of malloc debugging tools like Purify or valgrind is
811 suggested instead. See also "PERL_MEM_LOG" in perlhacktips.
812
813 One way to find out how much memory is being used by Perl data
814 structures is to install the Devel::Size module from CPAN: it gives you
815 the minimum number of bytes required to store a particular data
816 structure. Be mindful of the difference between size(), which measures
817 only the structure itself, and total_size(), which also follows its references.
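
A minimal sketch of how the two functions might be compared follows. Devel::Size is a CPAN module and may not be installed, so the hypothetical wrapper size_report() loads it at run time and returns nothing when it is missing. The byte counts depend on the Perl build, so no particular output is shown.

```perl
use strict;
use warnings;

# Hypothetical wrapper: returns a hash ref of sizes in bytes, or nothing
# if Devel::Size (a CPAN module, not part of core Perl) is unavailable.
sub size_report {
    my ($ref) = @_;
    return unless eval { require Devel::Size; 1 };
    return {
        shallow => Devel::Size::size($ref),        # the structure itself
        total   => Devel::Size::total_size($ref),  # plus what it references
    };
}

my %data = (names => [qw(alice bob carol)], count => 3);
if (my $r = size_report(\%data)) {
    print "size: $r->{shallow} bytes, total_size: $r->{total} bytes\n";
}
else {
    print "Devel::Size is not installed\n";
}
```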
818
819 If Perl has been compiled using Perl's malloc, you can analyze Perl
820 memory usage by setting $ENV{PERL_DEBUG_MSTATS}.
821
822 Using $ENV{PERL_DEBUG_MSTATS}
823 If your perl is using Perl's malloc() and was compiled with the
824 necessary switches (this is the default), then it will print memory
825 usage statistics after compiling your code when
826 "$ENV{PERL_DEBUG_MSTATS} > 1", and before termination of the program
827 when "$ENV{PERL_DEBUG_MSTATS} >= 1". The report format is similar to
828 the following example:
829
830 $ PERL_DEBUG_MSTATS=2 perl -e "require Carp"
831 Memory allocation statistics after compilation: (buckets 4(4)..8188(8192)
832 14216 free: 130 117 28 7 9 0 2 2 1 0 0
833 437 61 36 0 5
834 60924 used: 125 137 161 55 7 8 6 16 2 0 1
835 74 109 304 84 20
836 Total sbrk(): 77824/21:119. Odd ends: pad+heads+chain+tail: 0+636+0+2048.
837 Memory allocation statistics after execution: (buckets 4(4)..8188(8192)
838 30888 free: 245 78 85 13 6 2 1 3 2 0 1
839 315 162 39 42 11
840 175816 used: 265 176 1112 111 26 22 11 27 2 1 1
841 196 178 1066 798 39
842 Total sbrk(): 215040/47:145. Odd ends: pad+heads+chain+tail: 0+2192+0+6144.
843
844 It is possible to ask for such statistics at arbitrary points in your
845 execution using the mstat() function from the standard Devel::Peek
846 module.
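
As a rough sketch, a call to mstat() could be guarded by a check of the build configuration, since the statistics exist only when perl uses its own malloc. The wrapper name report_memory() and the label text are illustrative assumptions. $Config{usemymalloc} and Devel::Peek::mstat() are the real interfaces being used.

```perl
use strict;
use warnings;
use Config;

# Hypothetical wrapper: print statistics only when this perl was built
# with Perl's own malloc. Returns 1 if a report was requested, else 0.
sub report_memory {
    my ($label) = @_;
    return 0 unless ($Config{usemymalloc} // 'n') eq 'y';
    require Devel::Peek;
    Devel::Peek::mstat($label);   # the report is written to STDERR
    return 1;
}

my @big = (1) x 100_000;
report_memory('after building @big: ')
    or warn "this perl does not use Perl's malloc; no statistics available\n";
```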
847
848 Here is some explanation of that format:
849
850 "buckets SMALLEST(APPROX)..GREATEST(APPROX)"
851 Perl's malloc() uses bucketed allocations. Every request is
852 rounded up to the closest bucket size available, and a bucket is
853 taken from the pool of buckets of that size.
854
855 The line above describes the limits of buckets currently in use.
856 Each bucket has two sizes: memory footprint and the maximal size of
857 user data that can fit into this bucket. In the example above,
858 "4(4)" means the smallest bucket has usable size 4 and memory
859 footprint 4, while "8188(8192)" means the biggest bucket has usable
860 size 8188 and memory footprint 8192.
861
862 In a Perl built for debugging, some buckets may have negative
863 usable size. This means that these buckets cannot (and will not)
864 be used. For larger buckets, the memory footprint may be one page
865 greater than a power of 2. If so, the corresponding power of two
866 is printed in the "APPROX" field above.
867
868 Free/Used
869 The one or two rows of numbers following that correspond to the number
870 of buckets of each size between "SMALLEST" and "GREATEST". In the
871 first row, the sizes (memory footprints) of buckets are powers of
872 two--or possibly one page greater. In the second row, if present,
873 the memory footprint of each bucket lies between the footprints of
874 the two buckets above it in the first row.
875
876 For example, suppose that in the previous report the memory
877 footprints were
878
879 free:    8     16    32    64    128  256 512 1024 2048 4096 8192
880       4     12    24    48    80
881
882 With a non-"DEBUGGING" perl, the buckets starting from 128 have a
883 4-byte overhead, and thus an 8192-long bucket may take up to
884 8188-byte allocations.
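
The relation between footprint and usable size stated above can be written down directly. This sketch assumes a non-"DEBUGGING" perl and takes the 128-byte threshold and the 4-byte overhead from the text. usable_size() is a hypothetical name, not a function provided by Perl.

```perl
use strict;
use warnings;

# Assumption taken from the text: on a non-DEBUGGING perl, buckets with a
# footprint of 128 bytes or more keep a 4-byte overhead inside the bucket;
# smaller buckets keep their overhead elsewhere.
sub usable_size {
    my ($footprint) = @_;
    return $footprint >= 128 ? $footprint - 4 : $footprint;
}

printf "%5d -> %5d\n", $_, usable_size($_) for 4, 64, 128, 8192;
```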
885
886 "Total sbrk(): SBRKed/SBRKs:CONTINUOUS"
887 The first two fields give the total amount of memory perl sbrk(2)ed
888 (ess-broken? :-) and number of sbrk(2)s used. The third number is
889 what perl thinks about continuity of returned chunks. So long as
890 this number is positive, malloc() will assume that it is probable
891 that sbrk(2) will provide continuous memory.
892
893 Memory allocated by external libraries is not counted.
894
895 "pad: 0"
896 The amount of sbrk(2)ed memory needed to keep buckets aligned.
897
898 "heads: 2192"
899 Although memory overhead of bigger buckets is kept inside the
900 bucket, for smaller buckets, it is kept in separate areas. This
901 field gives the total size of these areas.
902
903 "chain: 0"
904 malloc() may want to subdivide a bigger bucket into smaller
905 buckets. If only a part of the deceased bucket is left
906 unsubdivided, the rest is kept as an element of a linked list.
907 This field gives the total size of these chunks.
908
909 "tail: 6144"
910 To minimize the number of sbrk(2)s, malloc() asks for more memory
911 than it immediately needs. This field gives the size of the yet unused
912 part, which is sbrk(2)ed, but never touched.
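
The "Total sbrk()" line described above can be split into its fields with a short pattern. This is a sketch that assumes the line looks exactly like the sample report. parse_sbrk_line() and the hash key names are illustrative, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the fields of a "Total sbrk()" line.
sub parse_sbrk_line {
    my ($line) = @_;
    return unless $line =~ m{
        Total \s sbrk\(\): \s (\d+) / (\d+) : (-?\d+) \.
        \s+ Odd \s ends: \s pad\+heads\+chain\+tail: \s
        (\d+) \+ (\d+) \+ (\d+) \+ (\d+)
    }x;
    my %s;
    @s{qw(sbrked sbrks continuous pad heads chain tail)}
        = ($1, $2, $3, $4, $5, $6, $7);
    return \%s;
}

my $s = parse_sbrk_line(
    'Total sbrk(): 215040/47:145. Odd ends: pad+heads+chain+tail: 0+2192+0+6144.'
);
my $odd = $s->{pad} + $s->{heads} + $s->{chain} + $s->{tail};
my ($free, $used) = (30888, 175816);   # totals from the "free:" and "used:" rows
printf "odd ends: %d, accounted for: %d of %d\n",
    $odd, $free + $used + $odd, $s->{sbrked};
```

For the sample report after execution, the free total, the used total, and the four odd ends add up to 215040, the same figure as the sbrk() total.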
913
915 See also perldebug, perlguts, perlrun, re, and Devel::DProf.
916
917
918
919perl v5.26.3 2018-03-01 PERLDEBGUTS(1)