1Devel::NYTProf(3) User Contributed Perl Documentation Devel::NYTProf(3)
2
3
4
6 Devel::NYTProf - Powerful fast feature-rich Perl source code profiler
7
9 # profile code and write database to ./nytprof.out
10 perl -d:NYTProf some_perl.pl
11
12 # convert database into a set of html files, e.g., ./nytprof/index.html
13 # and open a web browser on the nytprof/index.html file
14 nytprofhtml --open
15
16 # or into comma separated files, e.g., ./nytprof/*.csv
17 nytprofcsv
18
19 I give talks on profiling perl code, including a detailed look at how
20 to use NYTProf and how to optimize your code, every year. A video of my
21 YAPC::NA 2014 talk can be found at <https://youtu.be/T7EK6RZAnEA>
22
24 Devel::NYTProf is a powerful, fast, feature-rich perl source code
25 profiler.
26
27 • Performs per-line statement profiling for fine detail
28
29 • Performs per-subroutine statement profiling for overview
30
31 • Performs per-opcode profiling for slow perl builtins
32
33 • Performs per-block statement profiling (the first profiler to do
34 so)
35
36 • Accounts correctly for time spent after calls return
37
38 • Performs inclusive and exclusive timing of subroutines
39
40 • Subroutine times are per calling location (a powerful feature)
41
42 • Can profile compile-time activity, just run-time, or just END time
43
44 • Uses novel techniques for efficient profiling
45
46 • Sub-microsecond (100ns) resolution on supported systems
47
48 • Very fast - the fastest statement and subroutine profilers for perl
49
50 • Handles applications that fork, with no performance cost
51
52 • Immune from noise caused by profiling overheads and I/O
53
54 • Program being profiled can stop/start the profiler
55
56 • Generates richly annotated and cross-linked html reports
57
58 • Captures source code, including string evals, for stable results
59
60 • Trivial to use with mod_perl - add one line to httpd.conf
61
62 • Includes an extensive test suite
63
64 • Tested on very large codebases
65
66 NYTProf is effectively two profilers in one: a statement profiler, and
67 a subroutine profiler.
68
69 Statement Profiling
70 The statement profiler measures the time between entering one perl
71 statement and entering the next. Whenever execution reaches a new
72 statement, the time since entering the previous statement is calculated
73 and added to the time associated with the line of the source file that
74 the previous statement starts on.
75
76 By default the statement profiler also determines the first line of the
77 current block and the first line of the current statement, and
78 accumulates times associated with those.
79
80 Another innovation unique to NYTProf is automatic compensation for a
81 problem inherent in simplistic statement-to-statement timing. Consider
82 a statement that calls a subroutine and then performs some other work
83 that doesn't execute new statements, for example:
84
85 foo(...) + mkdir(...);
86
87 In all other statement profilers the time spent in remainder of the
88 expression (mkdir in the example) will be recorded as having been spent
89 on the last statement executed in foo()! Here's another example:
90
91 while (<>) {
92 ...
93 1;
94 }
95
96 After the first time around the loop, any further time spent evaluating
97 the condition (waiting for input in this example) would be recorded as
98 having been spent on the last statement executed in the loop! (Until
99 perl bug #60954 is fixed this problem still applies to some loops. For
100 more information see
101 <http://rt.perl.org/rt3/Ticket/Display.html?id=60954>)
102
103 NYTProf avoids these problems by intercepting the opcodes which
104 indicate that control is returning into some previous statement and
105 adjusting the profile accordingly.
106
107 The statement profiler naturally generates a lot of data which is
108 streamed out to a file in a very compact format. NYTProf takes care to
109 not include the measurement and writing overheads in the profile times
110 (some profilers produce 'noisy' data due to periodic stdio flushing).
111
112 Subroutine Profiling
113 The subroutine profiler measures the time between entering a subroutine
114 and leaving it. It then increments a call count and accumulates the
115 duration. For each subroutine called, separate counts and durations
116 are stored for each location that called the subroutine.
117
118 Subroutine entry is detected by intercepting the "entersub" opcode.
119 Subroutine exit is detected via perl's internal save stack. As a result
120 the subroutine profiler is both fast and robust.
121
122 Subroutine Recursion
123
124 For subroutines that recurse directly or indirectly, such as
125 Error::try, the inclusive time is only measured for the outer-most
126 call.
127
128 The inclusive times of recursive calls are still measured and are
129 accumulated separately. Also the 'maximum recursion depth' per calling
130 location is recorded.
131
132 Goto &Subroutine
133
134 Perl implements a "goto &destination" as a "return" followed by a call
135 to &destination, so that's how it will appear in the report.
136
137 The "goto" will be shown with a very short time because it's
138 effectively just a "return". The &destination sub will show a call not
139 from the location of the "goto" but from the location of the call to
140 the sub that performed the "goto".
141
142 accept()
143
144 The perl built-in accept() function waits listening for a connection on
145 a socket, and so is a key part of pure-perl network service
146 applications.
147
148 The time spent waiting for a remotely initiated connection can be
149 relatively high but is not relevant to the performance of the
150 application. So the accept() function is treated as a special case. The
151 subroutine profiler discounts the time spent in the accept() function.
152 It does this in a way that also discounts that time from all the
153 callers up the call stack. The effect on the reports is that all
154 accept() calls appear to be instant.
155
156 The statement profiler still shows the time actually spent in the
157 statement that executed the accept() call.
158
159 Application Profiling
160 NYTProf records extra information in the data file to capture details
161 that may be useful when analyzing the performance. It also records the
162 filename and line ranges of all the subroutines.
163
164 NYTProf can profile applications that fork, and does so with no loss of
165 performance. NYTProf detects the fork and starts writing a new profile
166 file with the pid appended to the filename. Since nytprofhtml only
167 works with a single profile file you may want to merge multiple files
168 using nytprofmerge.
169
170 Fast Profiling
171 The NYTProf profiler is written almost entirely in C and great care has
172 been taken to ensure it's very efficient.
173
174 Apache Profiling
175 Just add one line near the start of your httpd.conf file:
176
177 PerlModule Devel::NYTProf::Apache
178
179 By default you'll get a /tmp/nytprof.$$.out file for the parent process
180 and a /tmp/nytprof.$parent.out.$$ file for each worker process.
181
182 NYTProf takes care to detect when control is returning back from perl
183 to mod_perl so time spent in mod_perl (such as waiting for the next
184 request) does not get allocated to the last statement executed.
185
186 Works with mod_perl 1 and 2. See Devel::NYTProf::Apache for more
187 information.
188
190 Usually you'd load Devel::NYTProf on the command line using the perl -d
191 option:
192
193 perl -d:NYTProf some_perl.pl
194
195 To save typing the ':NYTProf' you could set the "PERL5DB" environment
196 variable:
197
198 PERL5DB='use Devel::NYTProf'
199
200 and then just perl -d would work:
201
202 perl -d some_perl.pl
203
204 Or you can avoid the need to add the -d option at all by using the
205 "PERL5OPT" environment variable|perlrun/PERL5OPT>:
206
207 PERL5OPT=-d:NYTProf
208
209 That's also very handy when you can't alter the perl command line being
210 used to run the script you want to profile. Usually you'll want to
211 enable the "addpid=1" option to ensure any nested invocations of perl
212 don't overwrite the profile.
213
215 The behavior of Devel::NYTProf may be modified by setting the
216 environment variable "NYTPROF". It is possible to use this environment
217 variable to effect multiple setting by separating the values with a
218 ":". For example:
219
220 export NYTPROF=trace=2:start=init:file=/tmp/nytprof.out
221
222 Any colon or equal characters in a value can be escaped by preceding
223 them with a backslash.
224
225 addpid=1
226 Append the current process id to the end of the filename.
227
228 This avoids concurrent, or consecutive, processes from overwriting the
229 same file. If a fork is detected during profiling then the child
230 process will automatically add the process id to the filename.
231
232 addtimestamp=1
233 Append the current time, as integer epoch seconds, to the end of the
234 filename.
235
236 trace=N
237 Set trace level to N. 0 is off (the default). Higher values cause more
238 detailed trace output. Trace output is written to STDERR or wherever
239 the "log=F" option has specified.
240
241 log=F
242 Specify the name of the file that "trace=N" output should be written
243 to.
244
245 start=...
246 Specify at which phase of program execution the profiler should be
247 enabled:
248
249 start=begin - start immediately (the default)
250 start=init - start at beginning of INIT phase (after compilation/use/BEGIN)
251 start=end - start at beginning of END phase
252 start=no - don't automatically start
253
254 The start=no option is handy if you want to explicitly control
255 profiling by calling DB::enable_profile() and DB::disable_profile()
256 yourself. See "RUN-TIME CONTROL OF PROFILING".
257
258 The start=init option is handy if you want to avoid profiling the
259 loading and initialization of modules.
260
261 optimize=0
262 Disable the perl optimizer.
263
264 By default NYTProf leaves perl's optimizer enabled. That gives you
265 more accurate profile timing overall, but can lead to odd statement
266 counts for individual sets of lines. That's because the perl's peephole
267 optimizer has effectively rewritten the statements but you can't see
268 what the rewritten version looks like.
269
270 For example:
271
272 1 if (...) {
273 2 return;
274 3 }
275
276 may be rewritten as
277
278 1 return if (...)
279
280 so the profile won't show a statement count for line 2 in your source
281 code because the "return" was merged into the "if" statement on the
282 preceding line.
283
284 Also 'empty' statements like "1;" are removed entirely. Such
285 statements are empty because the optimizer has already removed the
286 pointless constant in void context. It then goes on to remove the now
287 empty statement (in perl >= 5.13.7).
288
289 Using the "optimize=0" option disables the optimizer so you'll get
290 lower overall performance but more accurately assigned statement
291 counts.
292
293 If you find any other examples of the effect of optimizer on NYTProf
294 output (other than performance, obviously) please let us know.
295
296 subs=0
297 Set to 0 to disable the collection of subroutine caller and timing
298 details.
299
300 blocks=1
301 Set to 1 to enable the determination of block and subroutine location
302 per statement. This makes the profiler about 50% slower (as of July
303 2008) and produces larger output files, but you gain some valuable
304 insight in where time is spent in the blocks within large subroutines
305 and scripts.
306
307 stmts=0
308 Set to 0 to disable the statement profiler. (Implies "blocks=0".) The
309 reports won't contain any statement timing detail.
310
311 This significantly reduces the overhead of the profiler and can also be
312 useful for profiling large applications that would normally generate a
313 very large profile data file.
314
315 calls=N
316 This option is new and experimental.
317
318 With calls=1 (the default) subroutine call return events are emitted
319 into the data stream as they happen. With calls=2 subroutine call
320 entry events are also emitted. With calls=0 no subroutine call events
321 are produced. This option depends on the "subs" option being enabled,
322 which it is by default.
323
324 The nytprofcalls utility can be used to process this data. It too is
325 new and experimental and so likely to change.
326
327 The subroutine profiler normally gathers data in memory and outputs a
328 summary when the profile data is being finalized, usually when the
329 program has finished. The summary contains aggregate information for
330 all the calls from one location to another, but the details of
331 individual calls have been lost. The calls option enables the
332 recording of individual call events and thus more detailed analysis and
333 reporting of that data.
334
335 leave=0
336 Set to 0 to disable the extra work done by the statement profiler to
337 allocate times accurately when returning into the middle of statement.
338 For example leaving a subroutine and returning into the middle of
339 statement, or re-evaluating a loop condition.
340
341 This feature also ensures that in embedded environments, such as
342 mod_perl, the last statement executed doesn't accumulate the time spent
343 'outside perl'.
344
345 findcaller=1
346 Force NYTProf to recalculate the name of the caller of the each sub
347 instead of 'inheriting' the name calculated when the caller was
348 entered. (Rarely needed, but might be useful in some odd cases.)
349
350 use_db_sub=1
351 Set to 1 to enable use of the traditional DB::DB() subroutine to
352 perform profiling, instead of the faster 'opcode redirection' technique
353 that's used by default. Also effectively sets "leave=0" (see above).
354
355 The default 'opcode redirection' technique can't profile subroutines
356 that were compiled before NYTProf was loaded. So using use_db_sub=1 can
357 be useful in cases where you can't load the profiler early in the life
358 of the application.
359
360 Another side effect of "use_db_sub=1" is that it enables recording of
361 the source code of the "perl -e '...'" and "perl -" input for old
362 versions of perl. See also "savesrc=0".
363
364 savesrc=0
365 Disable the saving of source code.
366
367 By default NYTProf saves a copy of all source code into the profile
368 data file. This makes the file self-contained, so the reporting tools
369 no longer depend on having the unmodified source code files available.
370
371 With "savesrc=0" some source code is still saved: the arguments to the
372 "perl -e" option, the script fed to perl via STDIN when using "perl -",
373 and the source code of string evals.
374
375 Saving the source code of string evals requires perl version 5.8.9+,
376 5.10.1+, or 5.12 or later.
377
378 Saving the source code of the "perl -e '...'" or "perl -" input
379 requires either a recent perl version, as above, or setting the
380 "use_db_sub=1" option.
381
382 slowops=N
383 Profile perl opcodes that can be slow. These include opcodes that make
384 system calls, such as "print", "read", "sysread", "socket" etc., plus
385 regular expression opcodes like "subst" and "match".
386
387 If "N" is 0 then slowops profiling is disabled.
388
389 If "N" is 1 then all the builtins are treated as being defined in the
390 "CORE" package. So times for "print" calls from anywhere in your code
391 are merged and accounted for as calls to an xsub called "CORE::print".
392
393 If "N" is 2 (the default) then builtins are treated as being defined in
394 the package that calls them. So calls to "print" from package "Foo" are
395 treated as calls to an xsub called "Foo::CORE:print". Note the single
396 colon after CORE.
397
398 The opcodes are currently profiled using their internal names, so
399 "printf" is "prtf" and the "-x" file test is "fteexec". This may change
400 in future.
401
402 Opcodes that call subroutines, perhaps by triggering a FETCH from a
403 tied variable, currently appear in the call tree as the caller of the
404 sub. This is likely to change in future.
405
406 usecputime=1
407 This option has been removed. Profiling won't be enabled if set.
408
409 Use the "clock=N" option to select a high-resolution CPU time clock, if
410 available on your system, instead. That will give you higher resolution
411 and work for the subroutine profiler as well.
412
413 file=...
414 Specify the output file to write profile data to (default:
415 './nytprof.out').
416
417 compress=...
418 Specify the compression level to use, if NYTProf is compiled with
419 compression support. Valid values are 0 to 9, with 0 disabling
420 compression. The default is 6 as higher values yield little extra
421 compression but the cpu cost starts to rise significantly. Using level
422 1 still gives you a significant reduction in file size.
423
424 If NYTProf was not compiled with compression support, this option is
425 silently ignored.
426
427 clock=N
428 Systems which support the "clock_gettime()" system call typically
429 support several clocks. By default NYTProf uses CLOCK_MONOTONIC.
430
431 This option enables you to select a different clock by specifying the
432 integer id of the clock (which may vary between operating system
433 types). If the clock you select isn't available then CLOCK_REALTIME is
434 used.
435
436 See "CLOCKS" for more information.
437
438 sigexit=1
439 When perl exits normally it runs any code defined in "END" blocks.
440 NYTProf defines an END block that finishes profiling and writes out the
441 final profile data.
442
443 If the process ends due to a signal then END blocks are not executed so
444 the profile will be incomplete and unusable. The "sigexit" option
445 tells NYTProf to catch some signals (e.g. INT, HUP, PIPE, SEGV, BUS)
446 and ensure a usable profile by executing:
447
448 DB::finish_profile();
449 exit 1;
450
451 You can also specify which signals to catch in this way by listing
452 them, separated by commas, as the value of the option (case is not
453 significant):
454
455 sigexit=int,hup
456
457 posix_exit=1
458 The NYTProf subroutine profiler normally detects calls to
459 "POSIX::_exit()" (which exits the process without running END blocks)
460 and automatically calls "DB::finish_profile()" for you, so NYTProf
461 'just works'.
462
463 When using the "subs=0" option to disable the subroutine profiler the
464 "posix_exit" option can be used to tell NYTProf to take other steps to
465 arrange for "DB::finish_profile()" to be called before
466 "POSIX::_exit()".
467
468 libcexit=1
469 Arranges for "finish_profile" to be called via the C library "atexit()"
470 function. This may help some tricky cases where the process may exit
471 without perl executing the "END" block that NYTProf uses to call
472 /finish_profile().
473
474 endatexit=1
475 Sets the PERL_EXIT_DESTRUCT_END flag in the PL_exit_flags of the perl
476 interpreter. This makes perl run "END" blocks in perl_destruct()
477 instead of perl_run() which may help in cases, like Apache, where perl
478 is embedded but perl_run() isn't called.
479
480 forkdepth=N
481 When a perl process that is being profiled executes a fork() the child
482 process is also profiled. The forkdepth option can be used to control
483 this. If forkdepth is zero then profiling will be disabled in the child
484 process.
485
486 If forkdepth is greater than zero then profiling will be enabled in the
487 child process and the forkdepth value in that process is decremented by
488 one.
489
490 If forkdepth is -1 (the default) then there's no limit on the number of
491 generations of children that are profiled.
492
493 nameevals=0
494 The 'file name' of a string eval is normally a string like ""(eval
495 N)"", where "N" is a sequence number. By default NYTProf asks perl to
496 give evals more informative names like ""(eval N)[file:line]"", where
497 "file" and "line" are the file and line number where the string "eval"
498 was executed.
499
500 The "nameevals=0" option can be used to disable the more informative
501 names and return to the default behaviour. This may be need in rare
502 cases where the application code is sensitive to the name given to a
503 "eval". (The most common case in when running test suites undef
504 NYTProf.)
505
506 The downside is that the NYTProf reporting tools are less useful and
507 may get confused if this option is used.
508
509 nameanonsubs=0
510 The name of a anonymous subroutine is normally ""__ANON__"". By
511 default NYTProf asks perl to give anonymous subroutines more
512 informative names like ""__ANON__[file:line]"", where "file" and "line"
513 are the file and line number where the anonymous subroutine was
514 defined.
515
516 The "nameanonsubs=0" option can be used to disable the more informative
517 names and return to the default behaviour. This may be need in rare
518 cases where the application code is sensitive to the name given to a
519 anonymous subroutines. (The most common case in when running test
520 suites undef NYTProf.)
521
522 The downside is that the NYTProf reporting tools are less useful and
523 may get confused if this option is used.
524
526 You can profile only parts of an application by calling
527 DB::disable_profile() to stop collecting profile data, and calling
528 DB::enable_profile() to start collecting profile data.
529
530 Using the "start=no" option lets you leave the profiler disabled
531 initially until you call DB::enable_profile() at the right moment. You
532 still need to load Devel::NYTProf as early as possible, even if you
533 don't call DB::enable_profile() until much later. That's because any
534 code that's compiled before Devel::NYTProf is loaded will not be
535 profiled by default. See also "use_db_sub=1".
536
537 The profile output file can't be used until it's been properly
538 completed and closed. Calling DB::disable_profile() doesn't do that.
539 To make a profile file usable before the profiled application has
540 completed you can call DB::finish_profile(). Alternatively you could
541 call DB::enable_profile($newfile).
542
543 Always call the DB::enable_profile(), DB::disable_profile() or
544 DB::finish_profile() function with the "DB::" prefix as shown because
545 you can't import them. They're provided automatically when NYTProf is
546 in use.
547
548 disable_profile
549 DB::disable_profile()
550
551 Stops collection of profile data until DB:enable_profile() is called.
552
553 Subroutine calls which were made while profiling was enabled and are
554 still on the call stack (have not yet exited) will still have their
555 profile data collected when they exit. Compare with "finish_profile"
556 below.
557
558 enable_profile
559 DB::enable_profile($newfile)
560 DB::enable_profile()
561
562 Enables collection of profile data. If $newfile is specified the
563 profile data will be written to $newfile (after completing and closing
564 the previous file, if any). If $newfile already exists it will be
565 deleted first. If DB::enable_profile() is called without a filename
566 argument then profile data will continue to be written to the current
567 file (nytprof.out by default).
568
569 finish_profile
570 DB::finish_profile()
571
572 Calls DB::disable_profile(), then completes the profile data file by
573 writing subroutine profile data, and then closes the file. The in
574 memory subroutine profile data is then discarded.
575
576 Normally NYTProf arranges to call finish_profile() for you via an END
577 block.
578
580 NYTProf tries very hard to gather accurate information. The nature of
581 the internals of perl mean that, in some cases, the information that's
582 gathered is accurate but surprising. In some cases it can appear to be
583 misleading. (Of course, in some cases it may actually be plain wrong.
584 Caveat lector.)
585
586 If Statement and Subroutine Timings Don't Match
587 NYTProf has two profilers: a statement profiler that's invoked when
588 perl moves from one perl statement to another, and a subroutine
589 profiler that's invoked when perl calls or returns from a subroutine.
590
591 The individual statement timings for a subroutine usually add up to
592 slightly less than the exclusive time for the subroutine. That's
593 because the handling of the subroutine call and return overheads is
594 included in the exclusive time for the subroutine. The difference may
595 only be a few microseconds but that may become noticeable for
596 subroutines that are called hundreds of thousands of times.
597
598 The statement profiler keeps track how much time was spent on
599 overheads, like writing statement profile data to disk. The subroutine
600 profiler subtracts the overheads that have accumulated between entering
601 and leaving the subroutine in order to give a more accurate profile.
602 The statement profiler is generally very fast because most writes get
603 buffered for zip compression so the profiler overhead per statement
604 tends to be very small, often a single 'tick'. The result is that the
605 accumulated overhead is quite noisy. This becomes more significant for
606 subroutines that are called frequently and are also fast. This may be
607 another, smaller, contribution to the discrepancy between statement
608 time and exclusive times.
609
610 If Headline Subroutine Timings Don't Match the Called Subs
611 Overall subroutine times are reported with a headline like "spent 10s
612 (2+8) within ...". In this example, 10 seconds were spent inside the
613 subroutine (the "inclusive time") and, of that, 8 seconds were spent in
614 subroutines called by this one. That leaves 2 seconds as the time
615 spent in the subroutine code itself (the "exclusive time", sometimes
616 also called the "self time").
617
618 The report shows the source code of the subroutine. Lines that make
619 calls to other subroutines are annotated with details of the time spent
620 in those calls.
621
622 Sometimes the sum of the times for calls made by the lines of code in
623 the subroutine is less than the inclusive-exclusive time reported in
624 the headline (10-2 = 8 seconds in the example above).
625
626 What's happening here is that calls to other subroutines are being made
627 but NYTProf isn't able to determine the calling location correctly so
628 the calls don't appear in the report in the correct place.
629
630 Using an old version of perl is one cause (see below). Another is
631 calling subroutines that exit via "goto ⊂" - most frequently
632 encountered in AUTOLOAD subs and code using the Memoize module.
633
634 In general the overall subroutine timing is accurate and should be
635 trusted more than the sum of statement or nested sub call timings.
636
637 Perl 5.10.1+ (or else 5.8.9+) is Recommended
638 These versions of perl yield much more detailed information about calls
639 to BEGIN, CHECK, INIT, and END blocks, the code handling tied or
640 overloaded variables, and callbacks from XS code.
641
642 Perl 5.12 will hopefully also fix an inaccuracy in the timing of the
643 last statement and the condition clause of some kinds of loops:
644 <http://rt.perl.org/rt3/Ticket/Display.html?id=60954>
645
646 eval $string
647 Perl treats each execution of a string eval ("eval $string;" not "eval
648 { ... }") as a distinct file, so NYTProf does as well. The 'files' are
649 given names with this structure:
650
651 (eval $sequence)[$filename:$line]
652
653 for example ""(eval 93)[/foo/bar.pm:42]"" would be the name given to
654 the 93rd execution of a string eval by that process and, in this case,
655 the 93rd eval happened to be one at line 42 of "/foo/bar.pm".
656
657 Nested string evals can give rise to file names like
658
659 (eval 1047)[(eval 93)[/foo/bar.pm:42]:17]
660
661 Merging Evals
662
663 Some applications execute a great many string eval statements. If
664 NYTProf generated a report page for each one it would not only slow
665 report generation but also make the overall report less useful by
666 scattering performance data too widely. On the other hand, being able
667 to see the actual source code executed by an eval, along with the
668 timing details, is often very useful.
669
670 To try to balance these conflicting needs, NYTProf currently merges
671 uninteresting string eval siblings.
672
673 What does that mean? Well, for each source code line that executed any
674 string evals, NYTProf first gathers the corresponding eval 'files' for
675 that line (known as the 'siblings') into groups keyed by distinct
676 source code.
677
678 Then, for each of those groups of siblings, NYTProf will 'merge' a
679 group that shares the same source code and doesn't execute any string
680 evals itself. Merging means to pick one sibling as the survivor and
681 merge and delete all the data from the others into it.
682
683 If there are a large number of sibling groups then the data for all of
684 them are merged into one regardless.
685
686 The report annotations will indicate when evals have been merged
687 together.
688
689 Merging Anonymous Subroutines
690
691 Anonymous subroutines defined within string evals have names like this:
692
693 main::__ANON__[(eval 75)[/foo/bar.pm:42]:12]
694
695 That anonymous subroutine was defined on line 12 of the source code
696 executed by the string eval on line 42 of /foo/bar.pm. That was the
697 75th string eval executed by the program.
698
699 Anonymous subroutines defined on the same line of sibling evals that
700 get merged are also merged. That is, the profile information is merged
701 into one and the others are discarded.
702
703 Timing
704
705 Care should be taken when interpreting the report annotations
706 associated with a string eval statement. Normally the report
707 annotations embedded into the source code related to timings from the
708 subroutine profiler. This isn't (currently) true of annotations for
709 string eval statements.
710
711 This makes a significant different if the eval defines any subroutines
712 that get called after the eval has returned. Because the time shown for
713 a string eval is based on the statement times it will include time
714 spent executing statements within the subs defined by the eval.
715
716 In future NYTProf may involve the subroutine profiler in timings evals
717 and so be able to avoid this issue.
718
719 Calls from XSUBs and Opcodes
720 Calls record the current filename and line number of the perl code at
721 the time the call was made. That's fine and accurate for calls from
722 perl code. For calls that originate from C code however, such as an
723 XSUB or an opcode, the filename and line number recorded are still
724 those of the last perl statement executed.
725
726 For example, a line that calls an xsub will appear in reports to also
727 have also called any subroutines that that xsub called. This can be
728 construed as a feature.
729
730 As an extreme example, the first time a regular expression that uses
731 character classes is executed on a unicode string you'll find profile
732 data like this:
733
734 # spent 1ms within main::BEGIN@4 which was called
735 # once (1ms+0s) by main::CORE:subst at line 0
736 4 s/ (?: [A-Z] | [\d] )+ (?= [\s] ) //x;
737 # spent 38.8ms making 1 call to main::CORE:subst
738 # spent 25.4ms making 2 calls to utf8::SWASHNEW, avg 12.7ms/call
739 # spent 12.4ms making 1 call to utf8::AUTOLOAD
740
742 You can reduce the cost of profiling by adjusting some options. The
743 trade-off is reduced detail and/or accuracy in reports.
744
745 If you don't need statement-level profiling then you can disable it via
746 "stmts=0". To further boost statement-level profiling performance try
747 "leave=0" but note that will apportion timings for some kinds of
748 statements less accurate).
749
750 If you don't need call stacks or flamegraph then disable it via
751 calls=0. If you don't need subroutine profiling then you can disable
752 it via "subs=0". If you do need it but don't need timings for perl
753 opcodes then set slowops=0.
754
755 Generally speaking, setting calls=0 and slowops=0 will give you a
756 useful boost with the least loss of detail.
757
758 Another approach is to only enable NYTProf in the sections of code that
759 interest you. See "RUN-TIME CONTROL OF PROFILING" for more details.
760
761 To speed up nytprofhtml try using the --minimal (-m) or --no-flame
762 options.
763
765 The Devel::NYTProf::Data module provides a low-level interface for
766 loading the profile data.
767
768 The Devel::NYTProf::Reader module provides an interface for generating
769 arbitrary reports. This means that you can implement your own output
770 format in perl. (Though the module is in a state of flux and may be
771 deprecated soon.)
772
773 Included in the bin directory of this distribution are some scripts
774 which turn the raw profile data into more useful formats:
775
776 nytprofhtml
777 Creates attractive, richly annotated, and fully cross-linked html
778 reports (including statistics, source code and color highlighting).
779 This is the main report generation tool for NYTProf.
780
781 nytprofcg
782 Translates a profile into a format that can be loaded into KCachegrind
783 <http://kcachegrind.github.io/>
784
785 nytprofcalls
786 Reads a profile and processes the calls events it contains.
787
788 nytprofmerge
789 Reads multiple profile data files and writes out a new file containing
790 the merged profile data.
791
793 Threads and Multiplicity
794 "Devel::NYTProf" is not currently thread safe or multiplicity safe. If
795 you'd be interested in helping to fix that then please get in touch
796 with us. Meanwhile, profiling is disabled when a thread is created, and
797 NYTProf tries to ignore any activity from perl interpreters other than
798 the first one that loaded it.
799
800 Coro
801 The "Devel::NYTProf" subroutine profiler gets confused by the stack
802 gymnastics performed by the Coro module and aborts. When profiling
803 applications that use Coro you should disable the subroutine profiler
804 using the "subs=0" option.
805
806 FCGI::Engine
807 Using "open('-|')" in code running under FCGI::Engine causes a panic in
808 nytprofcalls. See https://github.com/timbunce/devel-nytprof/issues/20
809 for more information.
810
811 For perl < 5.8.8 it may change what caller() returns
812 For example, the Readonly module croaks with "Invalid tie" when
813 profiled with perl versions before 5.8.8. That's because Readonly
814 explicitly checking for certain values from caller(). The NEXT module
815 is also affected.
816
817 For perl < 5.10.1 it can't see some implicit calls and callbacks
818 For perl versions prior to 5.8.9 and 5.10.1, some implicit subroutine
819 calls can't be seen by the subroutine profiler. Technically this
820 affects calls made via the various perl "call_*()" internal APIs.
821
822 For example, BEGIN/CHECK/INIT/END blocks, the "TIE"whatever subroutine
823 called by "tie()", all calls made via operator overloading, and
824 callbacks from XS code, are not seen.
825
826 The effect is that time in those subroutines is accumulated by the subs
827 that triggered the call to them. So time spent in calls invoked by perl
828 to handle overloading are accumulated by the subroutines that trigger
829 overloading (so it is measured, but the cost is dispersed across
830 possibly many calling locations).
831
832 Although the calls aren't seen by the subroutine profiler, the
833 individual statements executed by the code in the called subs are
834 profiled by the statement profiler.
835
836 #line directives
837 The reporting code currently doesn't handle #line directives, but at
838 least it warns about them. Patches welcome.
839
840 Freed values in @_ may be mutated
841 Perl has a class of bugs related to the fact that values placed in the
842 stack are not reference counted. Consider this example:
843
844 @a = (1..9); sub s { undef @a; print $_ for @_ } s(@a);
845
846 The "undef @a" frees the values that @_ refers to. Perl can sometimes
847 detect when a freed value is accessed and treats it as an undef.
848 However, if the freed value is assigned some new value then @_ is
849 effectively corrupted.
850
851 NYTProf allocates new values while it's profiling, in order to record
852 program activity, and so may appear to corrupt @_ in this (rare)
853 situation. If this happens, NYTProf is simply exposing an existing
854 problem in the code.
855
856 Lvalue subroutines aren't profiled when using use_db_sub=1
857 Currently 'lvalue' subroutines (subs that can be assigned to, like
858 "foo() = 42") are not profiled when using use_db_sub=1.
859
861 Here we discuss the way NYTProf gets high-resolution timing information
862 from your system and related issues.
863
864 POSIX Clocks
865 These are the clocks that your system may support if it supports the
866 POSIX "clock_gettime()" function. Other clock sources are listed in the
867 "Other Clocks" section below.
868
869 The "clock_gettime()" interface allows clocks to return times to
870 nanosecond precision. Of course few offer nanosecond accuracy but the
871 extra precision helps reduce the cumulative error that naturally occurs
872 when adding together many timings. When using these clocks NYTProf
873 outputs timings as a count of 100 nanosecond ticks.
874
875 CLOCK_MONOTONIC
876
877 CLOCK_MONOTONIC represents the amount of time since an unspecified
878 point in the past (typically system start-up time). It increments
879 uniformly independent of adjustments to 'wallclock time'. NYTProf will
880 use this clock by default, if available.
881
882 CLOCK_REALTIME
883
884 CLOCK_REALTIME is typically the system's main high resolution 'wall
885 clock time' source. The same source as used for the gettimeofday()
886 call used by most kinds of perl benchmarking and profiling tools.
887
888 The problem with real time is that it's far from simple. It tends to
889 drift and then be reset to match 'reality', either sharply or by small
890 adjustments (via the adjtime() system call).
891
892 Surprisingly, it can also go backwards, for reasons explained in
893 http://preview.tinyurl.com/5wawnn so CLOCK_MONOTONIC is preferred.
894
895 CLOCK_VIRTUAL
896
897 CLOCK_VIRTUAL increments only when the CPU is running in user mode on
898 behalf of the calling process.
899
900 CLOCK_PROF
901
902 CLOCK_PROF increments when the CPU is running in user or kernel mode.
903
904 CLOCK_PROCESS_CPUTIME_ID
905
906 CLOCK_PROCESS_CPUTIME_ID represents the amount of execution time of the
907 process associated with the clock.
908
909 CLOCK_THREAD_CPUTIME_ID
910
911 CLOCK_THREAD_CPUTIME_ID represents the amount of execution time of the
912 thread associated with the clock.
913
914 Finding Available POSIX Clocks
915
916 On unix-like systems you can find the CLOCK_* clocks available on you
917 system using a command like:
918
919 grep -r 'define *CLOCK_' /usr/include
920
921 Look for a group that includes CLOCK_REALTIME. The integer values
922 listed are the clock ids that you can use with the "clock=N" option.
923
924 A future version of NYTProf should be able to list the supported
925 clocks.
926
927 Other Clocks
928 This section lists other clock sources that NYTProf may use.
929
930 If your system doesn't support clock_gettime() then NYTProf will use
931 gettimeofday(), or the nearest equivalent,
932
933 gettimeofday
934
935 This is the traditional high resolution time of day interface for most
936 unix-like systems. With this clock NYTProf outputs timings as a count
937 of 1 microsecond ticks.
938
939 mach_absolute_time
940
941 On Mac OS X the mach_absolute_time() function is used. With this clock
942 NYTProf outputs timings as a count of 100 nanosecond ticks.
943
944 Time::HiRes
945
946 On systems which don't support other clocks, NYTProf falls back to
947 using the Time::HiRes module. With this clock NYTProf outputs timings
948 as a count of 1 microsecond ticks.
949
950 Clock References
951 Relevant specifications and manual pages:
952
953 http://www.opengroup.org/onlinepubs/000095399/functions/clock_getres.html
954 http://linux.die.net/man/3/clock_gettime
955
956 Why 'realtime' can appear to go backwards:
957
958 http://preview.tinyurl.com/5wawnn
959
960 The PostgreSQL pg_test_timing utility documentation has a good summary
961 of timing issues:
962
963 http://www.postgresql.org/docs/9.2/static/pgtesttiming.html
964
966 SMP Systems
967 On systems with multiple processors, which includes most modern
968 machines, (from Linux docs though applicable to most SMP systems):
969
970 The CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID clocks are realized on
971 many platforms using timers from the CPUs (TSC on i386, AR.ITC on Itanium).
972 These registers may differ between CPUs and as a consequence these clocks may
973 return bogus results if a process is migrated to another CPU.
974
975 If the CPUs in an SMP system have different clock sources then there is no way
976 to maintain a correlation between the timer registers since each CPU will run
977 at a slightly different frequency. If that is the case then
978 clock_getcpuclockid(0) will return ENOENT to signify this condition. The two
979 clocks will then only be useful if it can be ensured that a process stays on a
980 certain CPU.
981
982 The processors in an SMP system do not start all at exactly the same time and
983 therefore the timer registers are typically running at an offset. Some
984 architectures include code that attempts to limit these offsets on bootup.
985 However, the code cannot guarantee to accurately tune the offsets. Glibc
986 contains no provisions to deal with these offsets (unlike the Linux Kernel).
987 Typically these offsets are small and therefore the effects may be negligible
988 in most cases.
989
990 In summary, SMP systems are likely to give 'noisy' profiles. Setting a
991 "Processor Affinity" may help.
992
993 Processor Affinity
994
995 Processor affinity is an aspect of task scheduling on SMP systems.
996 "Processor affinity takes advantage of the fact that some remnants of a
997 process may remain in one processor's state (in particular, in its
998 cache) from the last time the process ran, and so scheduling it to run
999 on the same processor the next time could result in the process running
1000 more efficiently than if it were to run on another processor." (From
1001 http://en.wikipedia.org/wiki/Processor_affinity)
1002
1003 Setting an explicit processor affinity can avoid the problems described
1004 in "SMP Systems".
1005
1006 Processor affinity can be set using the "taskset" command on Linux.
1007
1008 Note that processor affinity is inherited by child processes, so if the
1009 process you're profiling spawns cpu intensive sub processes then your
1010 process will be impacted by those more than it otherwise would.
1011
1012 Windows
1013
1014 THIS SECTION DOESN'T MATCH THE CODE
1015
1016 On Windows NYTProf uses Time::HiRes which uses the windows
1017 QueryPerformanceCounter() API with some extra logic to adjust for the
1018 current clock speed and try to resync the raw counter to wallclock time
1019 every so often (every 30 seconds or if the timer drifts by more than
1020 0.5 of a seconds). This extra logic may lead to occasional spurious
1021 results.
1022
1023 (It would be great if someone could contribute a patch to NYTProf to
1024 use QueryPerformanceCounter() directly and avoid the overheads and
1025 resyncing behaviour of Time::HiRes.)
1026
1027 Virtual Machines
1028 I recommend you don't do performance profiling while running in a
1029 virtual machine. If you do you're likely to find inexplicable spikes
1030 of real-time appearing at unreasonable places in your code. You should
1031 pay less attention to the statement timings and rely more on the
1032 subroutine timings. They will still be noisy but less so than the
1033 statement times.
1034
1035 You could also try using the "clock=N" option to select a high-
1036 resolution cpu-time clock instead of a real-time one. That should be
1037 much less noisy, though you will lose visibility of wait-times due to
1038 network and disk I/O, for example.
1039
1041 Possibly. All complex software has bugs. Let me know if you find one.
1042
1044 Screenshots of nytprofhtml v2.01 reports can be seen at
1045 <http://timbunce.files.wordpress.com/2008/07/nytprof-perlcritic-index.png>
1046 and
1047 <http://timbunce.files.wordpress.com/2008/07/nytprof-perlcritic-all-perl-files.png>.
1048 A writeup of the new features of NYTProf v2 can be found at
1049 <http://blog.timbunce.org/2008/07/15/nytprof-v2-a-major-advance-in-perl-profilers/>
1050 and the background story, explaining the "why", can be found at
1051 <http://blog.timbunce.org/2008/07/16/nytprof-v2-the-background-story/>.
1052
1053 Mailing list and discussion at
1054 <http://groups.google.com/group/develnytprof-dev>
1055
1056 Blog posts <http://blog.timbunce.org/tag/nytprof/>
1057
1058 Public Github Repository and hacking instructions at
1059 <https://github.com/timbunce/devel-nytprof/>
1060
1061 nytprofhtml is a script included that produces html reports.
1062 nytprofcsv is another script included that produces plain text CSV
1063 reports.
1064
1065 Devel::NYTProf::Reader is the module that powers the report scripts.
1066 You might want to check this out if you plan to implement a custom
1067 report (though it's very likely to be deprecated in a future release).
1068
1069 Devel::NYTProf::ReadStream is the module that lets you read a profile
1070 data file as a stream of chunks of data.
1071
1072 Other tools:
1073
1074 DTrace <https://speakerdeck.com/mrallen1/perl-dtrace-and-you>
1075
1077 "Profile data incomplete, ..." or "Profile format error: ..."
1078 This error message means the file doesn't contain all the expected data
1079 or the data has been corrupted in some way. That may be because it was
1080 truncated (perhaps the filesystem was full) or, more commonly, because
1081 the all the expected data hasn't been written.
1082
1083 NYTProf writes some important data to the data file when finishing
1084 profiling. If you read the file before the profiling has finished
1085 you'll get this error.
1086
1087 If the process being profiled is still running you'll need to wait
1088 until it exits cleanly (runs "END" blocks or "finish_profile" is called
1089 explicitly).
1090
1091 If the process being profiled has exited then it's likely that it met
1092 with a sudden and unnatural death that didn't give NYTProf a chance to
1093 finish the profile. If the sudden death was due to a signal, like
1094 SIGTERM, or a SIGINT from pressing Ctrl-C, then the "sigexit=1" option
1095 may help.
1096
1097 If the sudden death was due to calling "POSIX::_exit($status)" then
1098 you'll need to call "finish_profile" before calling "POSIX::_exit".
1099
1100 You'll also get this error if the code trying to read the profile is
1101 itself being profiled. That's most likely to happen if you enable
1102 profiling via the "PERL5OPT" environment variable and have forgotten to
1103 unset it.
1104
1105 If you've encountered this error message, and you're sure you've
1106 understood the concerns described above, and you're sure they don't
1107 apply in your case, then please open an issue. Be sure to include
1108 sufficient information so I can see how you've addressed these likely
1109 causes.
1110
1111 Some source files don't have profile information
1112 This is usually due to NYTProf being initialized after some perl files
1113 have already been compiled.
1114
1115 If you can't alter the command line to add ""-d:NYTProf"" you could try
1116 using the "PERL5OPT" environment variable. See "PROFILING".
1117
1118 You could also try using the "use_db_sub=1" option.
1119
1120 Eval ... has unknown invoking fid
1121 When using the statement profiler you may see a warning message like
1122 this:
1123
1124 Eval '(eval 2)' (fid 9, flags:viastmt,savesrc) has unknown invoking fid 10
1125
1126 Notice that the eval file id (fid 9 in this case) is lower than the
1127 file id that invoked the eval (fid 10 in this case). This is a known
1128 problem caused by the way perl works and how the profiler assigns and
1129 outputs the file ids. The invoking fid is known but gets assigned a
1130 fid and output after the fid for the eval, and that causes the warning
1131 when the file is read.
1132
1133 Warning: %d subroutine calls had negative time
1134 There are two likely causes for this: clock instability, or accumulated
1135 timing errors.
1136
1137 Clock instability, if present on your system, is most likely to be
1138 noticeable on very small/fast subroutines that are called very few
1139 times.
1140
1141 Accumulated timing errors can arise because the subroutine profiler
1142 uses floating point values (NVs) to store the times. They are most
1143 likely to be noticed on subroutines that are called a few times but
1144 which make a large number of calls to very fast subroutines (such as
1145 opcodes). In this case the accumulated time apparently spent making
1146 those calls can be greater than the time spent in the calling
1147 subroutine.
1148
1149 If you rerun nytprofhtml (etc.) with the "trace=N" option set >0 you'll
1150 see trace messages like "%s call has negative time: incl %fs, excl
1151 %fs" for each affected subroutine.
1152
1153 Try profiling with the "slowops=N" option set to 0 to disable the
1154 profiling of opcodes. Since opcodes often execute in a few microseconds
1155 they are a common cause of this warning.
1156
1157 You could also try recompiling perl to use 'long doubles' for the NV
1158 floating point type (use Configure -Duselongdouble). If you try this
1159 please let me know. I'd also happily take a patch to use long doubles,
1160 if available, by default.
1161
1162 panic: buffer overflow ...
1163 You have unusually long subroutine names in your code. You'll need to
1164 rebuild Devel::NYTProf with the NYTP_MAX_SUB_NAME_LEN environment
1165 variable set to a value longer than the longest subroutine names in
1166 your code.
1167
1169 Tim Bunce (<http://www.tim.bunce.name> and <http://blog.timbunce.org>)
1170 leads the project and has done most of the development work thus far.
1171
1172 Nicholas Clark contributed zip compression and "nytprofmerge". Chia-
1173 liang Kao contributed "nytprofcg". Peter (Stig) Edwards contributed
1174 the VMS port. Jan Dubois contributed the Windows port. Gisle Aas
1175 contributed the Devel::NYTProf::ReadStream module. Steve Peters
1176 contributed greater perl version portability and use of POSIX high-
1177 resolution clocks. Other contributors are noted in the Changes file.
1178
1179 Many thanks to Adam Kaplan who created "NYTProf" initially by forking
1180 "Devel::FastProf" adding reporting from "Devel::Cover" and a test
1181 suite.
1182
1184 Copyright (C) 2008 by Adam Kaplan and The New York Times Company.
1185 Copyright (C) 2008-2016 by Tim Bunce, Ireland.
1186
1187 This library is free software; you can redistribute it and/or modify it
1188 under the same terms as Perl itself, either Perl version 5.8.8 or, at
1189 your option, any later version of Perl 5 you may have available.
1190
1191 Background
1192 Subroutine-level profilers:
1193
1194 Devel::DProf | 1995-10-31 | ILYAZ
1195 Devel::AutoProfiler | 2002-04-07 | GSLONDON
1196 Devel::Profiler | 2002-05-20 | SAMTREGAR
1197 Devel::Profile | 2003-04-13 | JAW
1198 Devel::DProfLB | 2006-05-11 | JAW
1199 Devel::WxProf | 2008-04-14 | MKUTTER
1200
1201 Statement-level profilers:
1202
1203 Devel::SmallProf | 1997-07-30 | ASHTED
1204 Devel::FastProf | 2005-09-20 | SALVA
1205 Devel::NYTProf | 2008-03-04 | AKAPLAN
1206 Devel::Profit | 2008-05-19 | LBROCARD
1207
1208 Devel::NYTProf is a (now distant) fork of Devel::FastProf, which was
1209 itself an evolution of Devel::SmallProf.
1210
1211 Adam Kaplan forked Devel::FastProf and added html report generation
1212 (based on Devel::Cover) and a test suite - a tricky thing to do for a
1213 profiler. Meanwhile Tim Bunce had been extending Devel::FastProf to
1214 add novel per-sub and per-block timing, plus subroutine caller
1215 tracking.
1216
1217 When Devel::NYTProf was released Tim switched to working on
1218 Devel::NYTProf because the html report would be a good way to show the
1219 extra profile data, and the test suite made development much easier and
1220 safer.
1221
1222 Then Tim went a little crazy and added a slew of new features, in
1223 addition to per-sub and per-block timing and subroutine caller
1224 tracking. These included the 'opcode interception' method of profiling,
1225 ultra-fast and robust inclusive subroutine timing, doubling
1226 performance, plus major changes to html reporting to display all the
1227 extra profile call and timing data in richly annotated and cross-linked
1228 reports.
1229
1230 Steve Peters came on board along the way with patches for portability
1231 and to keep NYTProf working with the latest development perl versions.
1232 Nicholas Clark added zip compression, many optimizations, and
1233 "nytprofmerge". Jan Dubois contributed Windows support.
1234
1235 Adam's work was sponsored by The New York Times Co.
1236 <http://open.nytimes.com>. Tim's work was partly sponsored by
1237 Shopzilla <http://www.shopzilla.com> during 2008 but hasn't been
1238 sponsored since then.
1239
1240 For the record, Tim has never worked for the New York Times nor has he
1241 received any kind of sponsorship or support from them in relation to
1242 NYTProf. The name of this module is simply result of the history
1243 outlined above, which can be summarised as: Adam forked an existing
1244 module when he added his enhancements and Tim didn't.
1245
1246
1247
1248perl v5.32.1 2021-04-07 Devel::NYTProf(3)