1Devel::NYTProf(3) User Contributed Perl Documentation Devel::NYTProf(3)
2
3
4
6 Devel::NYTProf - Powerful fast feature-rich Perl source code profiler
7
9 # profile code and write database to ./nytprof.out
10 perl -d:NYTProf some_perl.pl
11
12 # convert database into a set of html files, e.g., ./nytprof/index.html
13 # and open a web browser on the nytprof/index.html file
14 nytprofhtml --open
15
16 # or into comma separated files, e.g., ./nytprof/*.csv
17 nytprofcsv
18
19 Tim Bunce has often given talks on profiling perl code, including a
20 detailed look at how to use NYTProf and how to optimize your code. A
21 video of his YAPC::NA 2014 talk can be found at
22 <https://youtu.be/T7EK6RZAnEA>
23
25 Devel::NYTProf is a powerful, fast, feature-rich perl source code
26 profiler.
27
28 • Performs per-line statement profiling for fine detail
29
30 • Performs per-subroutine statement profiling for overview
31
32 • Performs per-opcode profiling for slow perl builtins
33
34 • Performs per-block statement profiling (the first profiler to do
35 so)
36
37 • Accounts correctly for time spent after calls return
38
39 • Performs inclusive and exclusive timing of subroutines
40
41 • Subroutine times are per calling location (a powerful feature)
42
43 • Can profile compile-time activity, just run-time, or just END time
44
45 • Uses novel techniques for efficient profiling
46
47 • Sub-microsecond (100ns) resolution on supported systems
48
49 • Very fast - the fastest statement and subroutine profilers for perl
50
51 • Handles applications that fork, with no performance cost
52
53 • Immune from noise caused by profiling overheads and I/O
54
55 • Program being profiled can stop/start the profiler
56
57 • Generates richly annotated and cross-linked html reports
58
59 • Captures source code, including string evals, for stable results
60
61 • Trivial to use with mod_perl - add one line to httpd.conf
62
63 • Includes an extensive test suite
64
65 • Tested on very large codebases
66
67 NYTProf is effectively two profilers in one: a statement profiler, and
68 a subroutine profiler.
69
70 Statement Profiling
71 The statement profiler measures the time between entering one perl
72 statement and entering the next. Whenever execution reaches a new
73 statement, the time since entering the previous statement is calculated
74 and added to the time associated with the line of the source file that
75 the previous statement starts on.
76
77 By default the statement profiler also determines the first line of the
78 current block and the first line of the current statement, and
79 accumulates times associated with those.
80
81 Another innovation unique to NYTProf is automatic compensation for a
82 problem inherent in simplistic statement-to-statement timing. Consider
83 a statement that calls a subroutine and then performs some other work
84 that doesn't execute new statements, for example:
85
86 foo(...) + mkdir(...);
87
88 In all other statement profilers the time spent in remainder of the
89 expression (mkdir in the example) will be recorded as having been spent
90 on the last statement executed in foo()! Here's another example:
91
92 while (<>) {
93 ...
94 1;
95 }
96
97 After the first time around the loop, any further time spent evaluating
98 the condition (waiting for input in this example) would be recorded as
99 having been spent on the last statement executed in the loop! (Until
100 perl bug #60954 is fixed this problem still applies to some loops. For
101 more information see
102 <http://rt.perl.org/rt3/Ticket/Display.html?id=60954>)
103
104 NYTProf avoids these problems by intercepting the opcodes which
105 indicate that control is returning into some previous statement and
106 adjusting the profile accordingly.
107
108 The statement profiler naturally generates a lot of data which is
109 streamed out to a file in a very compact format. NYTProf takes care to
110 not include the measurement and writing overheads in the profile times
111 (some profilers produce 'noisy' data due to periodic stdio flushing).
112
113 Subroutine Profiling
114 The subroutine profiler measures the time between entering a subroutine
115 and leaving it. It then increments a call count and accumulates the
116 duration. For each subroutine called, separate counts and durations
117 are stored for each location that called the subroutine.
118
119 Subroutine entry is detected by intercepting the "entersub" opcode.
120 Subroutine exit is detected via perl's internal save stack. As a result
121 the subroutine profiler is both fast and robust.
122
123 Subroutine Recursion
124
125 For subroutines that recurse directly or indirectly, such as
126 Error::try, the inclusive time is only measured for the outer-most
127 call.
128
129 The inclusive times of recursive calls are still measured and are
130 accumulated separately. Also the 'maximum recursion depth' per calling
131 location is recorded.
132
133 Goto &Subroutine
134
135 Perl implements a "goto &destination" as a "return" followed by a call
136 to &destination, so that's how it will appear in the report.
137
138 The "goto" will be shown with a very short time because it's
139 effectively just a "return". The &destination sub will show a call not
140 from the location of the "goto" but from the location of the call to
141 the sub that performed the "goto".
142
143 accept()
144
145 The perl built-in accept() function waits listening for a connection on
146 a socket, and so is a key part of pure-perl network service
147 applications.
148
149 The time spent waiting for a remotely initiated connection can be
150 relatively high but is not relevant to the performance of the
151 application. So the accept() function is treated as a special case. The
152 subroutine profiler discounts the time spent in the accept() function.
153 It does this in a way that also discounts that time from all the
154 callers up the call stack. The effect on the reports is that all
155 accept() calls appear to be instant.
156
157 The statement profiler still shows the time actually spent in the
158 statement that executed the accept() call.
159
160 Application Profiling
161 NYTProf records extra information in the data file to capture details
162 that may be useful when analyzing the performance. It also records the
163 filename and line ranges of all the subroutines.
164
165 NYTProf can profile applications that fork, and does so with no loss of
166 performance. NYTProf detects the fork and starts writing a new profile
167 file with the pid appended to the filename. Since nytprofhtml only
168 works with a single profile file you may want to merge multiple files
169 using nytprofmerge.
170
171 Fast Profiling
172 The NYTProf profiler is written almost entirely in C and great care has
173 been taken to ensure it's very efficient.
174
175 Apache Profiling
176 Just add one line near the start of your httpd.conf file:
177
178 PerlModule Devel::NYTProf::Apache
179
180 By default you'll get a /tmp/nytprof.$$.out file for the parent process
181 and a /tmp/nytprof.$parent.out.$$ file for each worker process.
182
183 NYTProf takes care to detect when control is returning back from perl
184 to mod_perl so time spent in mod_perl (such as waiting for the next
185 request) does not get allocated to the last statement executed.
186
187 Works with mod_perl 1 and 2. See Devel::NYTProf::Apache for more
188 information.
189
191 Usually you'd load Devel::NYTProf on the command line using the perl -d
192 option:
193
194 perl -d:NYTProf some_perl.pl
195
196 To save typing the ':NYTProf' you could set the "PERL5DB" environment
197 variable:
198
199 PERL5DB='use Devel::NYTProf'
200
201 and then just perl -d would work:
202
203 perl -d some_perl.pl
204
205 Or you can avoid the need to add the -d option at all by using the
206 "PERL5OPT" environment variable|perlrun/PERL5OPT>:
207
208 PERL5OPT=-d:NYTProf
209
210 That's also very handy when you can't alter the perl command line being
211 used to run the script you want to profile. Usually you'll want to
212 enable the "addpid=1" option to ensure any nested invocations of perl
213 don't overwrite the profile.
214
216 The behavior of Devel::NYTProf may be modified by setting the
217 environment variable "NYTPROF". It is possible to use this environment
218 variable to effect multiple setting by separating the values with a
219 ":". For example:
220
221 export NYTPROF=trace=2:start=init:file=/tmp/nytprof.out
222
223 Any colon or equal characters in a value can be escaped by preceding
224 them with a backslash.
225
226 addpid=1
227 Append the current process id to the end of the filename.
228
229 This avoids concurrent, or consecutive, processes from overwriting the
230 same file. If a fork is detected during profiling then the child
231 process will automatically add the process id to the filename.
232
233 addtimestamp=1
234 Append the current time, as integer epoch seconds, to the end of the
235 filename.
236
237 trace=N
238 Set trace level to N. 0 is off (the default). Higher values cause more
239 detailed trace output. Trace output is written to STDERR or wherever
240 the "log=F" option has specified.
241
242 log=F
243 Specify the name of the file that "trace=N" output should be written
244 to.
245
246 start=...
247 Specify at which phase of program execution the profiler should be
248 enabled:
249
250 start=begin - start immediately (the default)
251 start=init - start at beginning of INIT phase (after compilation/use/BEGIN)
252 start=end - start at beginning of END phase
253 start=no - don't automatically start
254
255 The start=no option is handy if you want to explicitly control
256 profiling by calling DB::enable_profile() and DB::disable_profile()
257 yourself. See "RUN-TIME CONTROL OF PROFILING".
258
259 The start=init option is handy if you want to avoid profiling the
260 loading and initialization of modules.
261
262 optimize=0
263 Disable the perl optimizer.
264
265 By default NYTProf leaves perl's optimizer enabled. That gives you
266 more accurate profile timing overall, but can lead to odd statement
267 counts for individual sets of lines. That's because the perl's peephole
268 optimizer has effectively rewritten the statements but you can't see
269 what the rewritten version looks like.
270
271 For example:
272
273 1 if (...) {
274 2 return;
275 3 }
276
277 may be rewritten as
278
279 1 return if (...)
280
281 so the profile won't show a statement count for line 2 in your source
282 code because the "return" was merged into the "if" statement on the
283 preceding line.
284
285 Also 'empty' statements like "1;" are removed entirely. Such
286 statements are empty because the optimizer has already removed the
287 pointless constant in void context. It then goes on to remove the now
288 empty statement (in perl >= 5.13.7).
289
290 Using the "optimize=0" option disables the optimizer so you'll get
291 lower overall performance but more accurately assigned statement
292 counts.
293
294 If you find any other examples of the effect of optimizer on NYTProf
295 output (other than performance, obviously) please let us know.
296
297 subs=0
298 Set to 0 to disable the collection of subroutine caller and timing
299 details.
300
301 blocks=1
302 Set to 1 to enable the determination of block and subroutine location
303 per statement. This makes the profiler about 50% slower (as of July
304 2008) and produces larger output files, but you gain some valuable
305 insight in where time is spent in the blocks within large subroutines
306 and scripts.
307
308 stmts=0
309 Set to 0 to disable the statement profiler. (Implies "blocks=0".) The
310 reports won't contain any statement timing detail.
311
312 This significantly reduces the overhead of the profiler and can also be
313 useful for profiling large applications that would normally generate a
314 very large profile data file.
315
316 calls=N
317 This option is new and experimental.
318
319 With calls=1 (the default) subroutine call return events are emitted
320 into the data stream as they happen. With calls=2 subroutine call
321 entry events are also emitted. With calls=0 no subroutine call events
322 are produced. This option depends on the "subs" option being enabled,
323 which it is by default.
324
325 The nytprofcalls utility can be used to process this data. It too is
326 new and experimental and so likely to change.
327
328 The subroutine profiler normally gathers data in memory and outputs a
329 summary when the profile data is being finalized, usually when the
330 program has finished. The summary contains aggregate information for
331 all the calls from one location to another, but the details of
332 individual calls have been lost. The calls option enables the
333 recording of individual call events and thus more detailed analysis and
334 reporting of that data.
335
336 leave=0
337 Set to 0 to disable the extra work done by the statement profiler to
338 allocate times accurately when returning into the middle of statement.
339 For example leaving a subroutine and returning into the middle of
340 statement, or re-evaluating a loop condition.
341
342 This feature also ensures that in embedded environments, such as
343 mod_perl, the last statement executed doesn't accumulate the time spent
344 'outside perl'.
345
346 findcaller=1
347 Force NYTProf to recalculate the name of the caller of the each sub
348 instead of 'inheriting' the name calculated when the caller was
349 entered. (Rarely needed, but might be useful in some odd cases.)
350
351 use_db_sub=1
352 Set to 1 to enable use of the traditional DB::DB() subroutine to
353 perform profiling, instead of the faster 'opcode redirection' technique
354 that's used by default. Also effectively sets "leave=0" (see above).
355
356 The default 'opcode redirection' technique can't profile subroutines
357 that were compiled before NYTProf was loaded. So using use_db_sub=1 can
358 be useful in cases where you can't load the profiler early in the life
359 of the application.
360
361 Another side effect of "use_db_sub=1" is that it enables recording of
362 the source code of the "perl -e '...'" and "perl -" input for old
363 versions of perl. See also "savesrc=0".
364
365 savesrc=0
366 Disable the saving of source code.
367
368 By default NYTProf saves a copy of all source code into the profile
369 data file. This makes the file self-contained, so the reporting tools
370 no longer depend on having the unmodified source code files available.
371
372 With "savesrc=0" some source code is still saved: the arguments to the
373 "perl -e" option, the script fed to perl via STDIN when using "perl -",
374 and the source code of string evals.
375
376 Saving the source code of string evals requires perl version 5.8.9+,
377 5.10.1+, or 5.12 or later.
378
379 Saving the source code of the "perl -e '...'" or "perl -" input
380 requires either a recent perl version, as above, or setting the
381 "use_db_sub=1" option.
382
383 slowops=N
384 Profile perl opcodes that can be slow. These include opcodes that make
385 system calls, such as "print", "read", "sysread", "socket" etc., plus
386 regular expression opcodes like "subst" and "match".
387
388 If "N" is 0 then slowops profiling is disabled.
389
390 If "N" is 1 then all the builtins are treated as being defined in the
391 "CORE" package. So times for "print" calls from anywhere in your code
392 are merged and accounted for as calls to an xsub called "CORE::print".
393
394 If "N" is 2 (the default) then builtins are treated as being defined in
395 the package that calls them. So calls to "print" from package "Foo" are
396 treated as calls to an xsub called "Foo::CORE:print". Note the single
397 colon after CORE.
398
399 The opcodes are currently profiled using their internal names, so
400 "printf" is "prtf" and the "-x" file test is "fteexec". This may change
401 in future.
402
403 Opcodes that call subroutines, perhaps by triggering a FETCH from a
404 tied variable, currently appear in the call tree as the caller of the
405 sub. This is likely to change in future.
406
407 usecputime=1
408 This option has been removed. Profiling won't be enabled if set.
409
410 Use the "clock=N" option to select a high-resolution CPU time clock, if
411 available on your system, instead. That will give you higher resolution
412 and work for the subroutine profiler as well.
413
414 file=...
415 Specify the output file to write profile data to (default:
416 './nytprof.out').
417
418 compress=...
419 Specify the compression level to use, if NYTProf is compiled with
420 compression support. Valid values are 0 to 9, with 0 disabling
421 compression. The default is 6 as higher values yield little extra
422 compression but the cpu cost starts to rise significantly. Using level
423 1 still gives you a significant reduction in file size.
424
425 If NYTProf was not compiled with compression support, this option is
426 silently ignored.
427
428 clock=N
429 Systems which support the "clock_gettime()" system call typically
430 support several clocks. By default NYTProf uses CLOCK_MONOTONIC.
431
432 This option enables you to select a different clock by specifying the
433 integer id of the clock (which may vary between operating system
434 types). If the clock you select isn't available then CLOCK_REALTIME is
435 used.
436
437 See "CLOCKS" for more information.
438
439 sigexit=1
440 When perl exits normally it runs any code defined in "END" blocks.
441 NYTProf defines an END block that finishes profiling and writes out the
442 final profile data.
443
444 If the process ends due to a signal then END blocks are not executed so
445 the profile will be incomplete and unusable. The "sigexit" option
446 tells NYTProf to catch some signals (e.g. INT, HUP, PIPE, SEGV, BUS)
447 and ensure a usable profile by executing:
448
449 DB::finish_profile();
450 exit 1;
451
452 You can also specify which signals to catch in this way by listing
453 them, separated by commas, as the value of the option (case is not
454 significant):
455
456 sigexit=int,hup
457
458 posix_exit=1
459 The NYTProf subroutine profiler normally detects calls to
460 "POSIX::_exit()" (which exits the process without running END blocks)
461 and automatically calls "DB::finish_profile()" for you, so NYTProf
462 'just works'.
463
464 When using the "subs=0" option to disable the subroutine profiler the
465 "posix_exit" option can be used to tell NYTProf to take other steps to
466 arrange for "DB::finish_profile()" to be called before
467 "POSIX::_exit()".
468
469 libcexit=1
470 Arranges for "finish_profile" to be called via the C library "atexit()"
471 function. This may help some tricky cases where the process may exit
472 without perl executing the "END" block that NYTProf uses to call
473 /finish_profile().
474
475 endatexit=1
476 Sets the PERL_EXIT_DESTRUCT_END flag in the PL_exit_flags of the perl
477 interpreter. This makes perl run "END" blocks in perl_destruct()
478 instead of perl_run() which may help in cases, like Apache, where perl
479 is embedded but perl_run() isn't called.
480
481 forkdepth=N
482 When a perl process that is being profiled executes a fork() the child
483 process is also profiled. The forkdepth option can be used to control
484 this. If forkdepth is zero then profiling will be disabled in the child
485 process.
486
487 If forkdepth is greater than zero then profiling will be enabled in the
488 child process and the forkdepth value in that process is decremented by
489 one.
490
491 If forkdepth is -1 (the default) then there's no limit on the number of
492 generations of children that are profiled.
493
494 nameevals=0
495 The 'file name' of a string eval is normally a string like ""(eval
496 N)"", where "N" is a sequence number. By default NYTProf asks perl to
497 give evals more informative names like ""(eval N)[file:line]"", where
498 "file" and "line" are the file and line number where the string "eval"
499 was executed.
500
501 The "nameevals=0" option can be used to disable the more informative
502 names and return to the default behaviour. This may be need in rare
503 cases where the application code is sensitive to the name given to a
504 "eval". (The most common case in when running test suites undef
505 NYTProf.)
506
507 The downside is that the NYTProf reporting tools are less useful and
508 may get confused if this option is used.
509
510 nameanonsubs=0
511 The name of a anonymous subroutine is normally ""__ANON__"". By
512 default NYTProf asks perl to give anonymous subroutines more
513 informative names like ""__ANON__[file:line]"", where "file" and "line"
514 are the file and line number where the anonymous subroutine was
515 defined.
516
517 The "nameanonsubs=0" option can be used to disable the more informative
518 names and return to the default behaviour. This may be need in rare
519 cases where the application code is sensitive to the name given to a
520 anonymous subroutines. (The most common case in when running test
521 suites undef NYTProf.)
522
523 The downside is that the NYTProf reporting tools are less useful and
524 may get confused if this option is used.
525
527 You can profile only parts of an application by calling
528 DB::disable_profile() to stop collecting profile data, and calling
529 DB::enable_profile() to start collecting profile data.
530
531 Using the "start=no" option lets you leave the profiler disabled
532 initially until you call DB::enable_profile() at the right moment. You
533 still need to load Devel::NYTProf as early as possible, even if you
534 don't call DB::enable_profile() until much later. That's because any
535 code that's compiled before Devel::NYTProf is loaded will not be
536 profiled by default. See also "use_db_sub=1".
537
538 The profile output file can't be used until it's been properly
539 completed and closed. Calling DB::disable_profile() doesn't do that.
540 To make a profile file usable before the profiled application has
541 completed you can call DB::finish_profile(). Alternatively you could
542 call DB::enable_profile($newfile).
543
544 Always call the DB::enable_profile(), DB::disable_profile() or
545 DB::finish_profile() function with the "DB::" prefix as shown because
546 you can't import them. They're provided automatically when NYTProf is
547 in use.
548
549 disable_profile
550 DB::disable_profile()
551
552 Stops collection of profile data until DB:enable_profile() is called.
553
554 Subroutine calls which were made while profiling was enabled and are
555 still on the call stack (have not yet exited) will still have their
556 profile data collected when they exit. Compare with "finish_profile"
557 below.
558
559 enable_profile
560 DB::enable_profile($newfile)
561 DB::enable_profile()
562
563 Enables collection of profile data. If $newfile is specified the
564 profile data will be written to $newfile (after completing and closing
565 the previous file, if any). If $newfile already exists it will be
566 deleted first. If DB::enable_profile() is called without a filename
567 argument then profile data will continue to be written to the current
568 file (nytprof.out by default).
569
570 finish_profile
571 DB::finish_profile()
572
573 Calls DB::disable_profile(), then completes the profile data file by
574 writing subroutine profile data, and then closes the file. The in
575 memory subroutine profile data is then discarded.
576
577 Normally NYTProf arranges to call finish_profile() for you via an END
578 block.
579
581 NYTProf tries very hard to gather accurate information. The nature of
582 the internals of perl mean that, in some cases, the information that's
583 gathered is accurate but surprising. In some cases it can appear to be
584 misleading. (Of course, in some cases it may actually be plain wrong.
585 Caveat lector.)
586
587 If Statement and Subroutine Timings Don't Match
588 NYTProf has two profilers: a statement profiler that's invoked when
589 perl moves from one perl statement to another, and a subroutine
590 profiler that's invoked when perl calls or returns from a subroutine.
591
592 The individual statement timings for a subroutine usually add up to
593 slightly less than the exclusive time for the subroutine. That's
594 because the handling of the subroutine call and return overheads is
595 included in the exclusive time for the subroutine. The difference may
596 only be a few microseconds but that may become noticeable for
597 subroutines that are called hundreds of thousands of times.
598
599 The statement profiler keeps track how much time was spent on
600 overheads, like writing statement profile data to disk. The subroutine
601 profiler subtracts the overheads that have accumulated between entering
602 and leaving the subroutine in order to give a more accurate profile.
603 The statement profiler is generally very fast because most writes get
604 buffered for zip compression so the profiler overhead per statement
605 tends to be very small, often a single 'tick'. The result is that the
606 accumulated overhead is quite noisy. This becomes more significant for
607 subroutines that are called frequently and are also fast. This may be
608 another, smaller, contribution to the discrepancy between statement
609 time and exclusive times.
610
611 If Headline Subroutine Timings Don't Match the Called Subs
612 Overall subroutine times are reported with a headline like "spent 10s
613 (2+8) within ...". In this example, 10 seconds were spent inside the
614 subroutine (the "inclusive time") and, of that, 8 seconds were spent in
615 subroutines called by this one. That leaves 2 seconds as the time
616 spent in the subroutine code itself (the "exclusive time", sometimes
617 also called the "self time").
618
619 The report shows the source code of the subroutine. Lines that make
620 calls to other subroutines are annotated with details of the time spent
621 in those calls.
622
623 Sometimes the sum of the times for calls made by the lines of code in
624 the subroutine is less than the inclusive-exclusive time reported in
625 the headline (10-2 = 8 seconds in the example above).
626
627 What's happening here is that calls to other subroutines are being made
628 but NYTProf isn't able to determine the calling location correctly so
629 the calls don't appear in the report in the correct place.
630
631 Using an old version of perl is one cause (see below). Another is
632 calling subroutines that exit via "goto ⊂" - most frequently
633 encountered in AUTOLOAD subs and code using the Memoize module.
634
635 In general the overall subroutine timing is accurate and should be
636 trusted more than the sum of statement or nested sub call timings.
637
638 Perl 5.10.1+ (or else 5.8.9+) is Recommended
639 These versions of perl yield much more detailed information about calls
640 to BEGIN, CHECK, INIT, and END blocks, the code handling tied or
641 overloaded variables, and callbacks from XS code.
642
643 Perl 5.12 will hopefully also fix an inaccuracy in the timing of the
644 last statement and the condition clause of some kinds of loops:
645 <http://rt.perl.org/rt3/Ticket/Display.html?id=60954>
646
647 eval $string
648 Perl treats each execution of a string eval ("eval $string;" not "eval
649 { ... }") as a distinct file, so NYTProf does as well. The 'files' are
650 given names with this structure:
651
652 (eval $sequence)[$filename:$line]
653
654 for example ""(eval 93)[/foo/bar.pm:42]"" would be the name given to
655 the 93rd execution of a string eval by that process and, in this case,
656 the 93rd eval happened to be one at line 42 of "/foo/bar.pm".
657
658 Nested string evals can give rise to file names like
659
660 (eval 1047)[(eval 93)[/foo/bar.pm:42]:17]
661
662 Merging Evals
663
664 Some applications execute a great many string eval statements. If
665 NYTProf generated a report page for each one it would not only slow
666 report generation but also make the overall report less useful by
667 scattering performance data too widely. On the other hand, being able
668 to see the actual source code executed by an eval, along with the
669 timing details, is often very useful.
670
671 To try to balance these conflicting needs, NYTProf currently merges
672 uninteresting string eval siblings.
673
674 What does that mean? Well, for each source code line that executed any
675 string evals, NYTProf first gathers the corresponding eval 'files' for
676 that line (known as the 'siblings') into groups keyed by distinct
677 source code.
678
679 Then, for each of those groups of siblings, NYTProf will 'merge' a
680 group that shares the same source code and doesn't execute any string
681 evals itself. Merging means to pick one sibling as the survivor and
682 merge and delete all the data from the others into it.
683
684 If there are a large number of sibling groups then the data for all of
685 them are merged into one regardless.
686
687 The report annotations will indicate when evals have been merged
688 together.
689
690 Merging Anonymous Subroutines
691
692 Anonymous subroutines defined within string evals have names like this:
693
694 main::__ANON__[(eval 75)[/foo/bar.pm:42]:12]
695
696 That anonymous subroutine was defined on line 12 of the source code
697 executed by the string eval on line 42 of /foo/bar.pm. That was the
698 75th string eval executed by the program.
699
700 Anonymous subroutines defined on the same line of sibling evals that
701 get merged are also merged. That is, the profile information is merged
702 into one and the others are discarded.
703
704 Timing
705
706 Care should be taken when interpreting the report annotations
707 associated with a string eval statement. Normally the report
708 annotations embedded into the source code related to timings from the
709 subroutine profiler. This isn't (currently) true of annotations for
710 string eval statements.
711
712 This makes a significant different if the eval defines any subroutines
713 that get called after the eval has returned. Because the time shown for
714 a string eval is based on the statement times it will include time
715 spent executing statements within the subs defined by the eval.
716
717 In future NYTProf may involve the subroutine profiler in timings evals
718 and so be able to avoid this issue.
719
720 Calls from XSUBs and Opcodes
721 Calls record the current filename and line number of the perl code at
722 the time the call was made. That's fine and accurate for calls from
723 perl code. For calls that originate from C code however, such as an
724 XSUB or an opcode, the filename and line number recorded are still
725 those of the last perl statement executed.
726
727 For example, a line that calls an xsub will appear in reports to also
728 have also called any subroutines that that xsub called. This can be
729 construed as a feature.
730
731 As an extreme example, the first time a regular expression that uses
732 character classes is executed on a unicode string you'll find profile
733 data like this:
734
735 # spent 1ms within main::BEGIN@4 which was called
736 # once (1ms+0s) by main::CORE:subst at line 0
737 4 s/ (?: [A-Z] | [\d] )+ (?= [\s] ) //x;
738 # spent 38.8ms making 1 call to main::CORE:subst
739 # spent 25.4ms making 2 calls to utf8::SWASHNEW, avg 12.7ms/call
740 # spent 12.4ms making 1 call to utf8::AUTOLOAD
741
743 You can reduce the cost of profiling by adjusting some options. The
744 trade-off is reduced detail and/or accuracy in reports.
745
746 If you don't need statement-level profiling then you can disable it via
747 "stmts=0". To further boost statement-level profiling performance try
748 "leave=0" but note that will apportion timings for some kinds of
749 statements less accurate).
750
751 If you don't need call stacks or flamegraph then disable it via
752 calls=0. If you don't need subroutine profiling then you can disable
753 it via "subs=0". If you do need it but don't need timings for perl
754 opcodes then set slowops=0.
755
756 Generally speaking, setting calls=0 and slowops=0 will give you a
757 useful boost with the least loss of detail.
758
759 Another approach is to only enable NYTProf in the sections of code that
760 interest you. See "RUN-TIME CONTROL OF PROFILING" for more details.
761
762 To speed up nytprofhtml try using the --minimal (-m) or --no-flame
763 options.
764
766 The Devel::NYTProf::Data module provides a low-level interface for
767 loading the profile data.
768
769 The Devel::NYTProf::Reader module provides an interface for generating
770 arbitrary reports. This means that you can implement your own output
771 format in perl. (Though the module is in a state of flux and may be
772 deprecated soon.)
773
774 Included in the bin directory of this distribution are some scripts
775 which turn the raw profile data into more useful formats:
776
777 nytprofhtml
778 Creates attractive, richly annotated, and fully cross-linked html
779 reports (including statistics, source code and color highlighting).
780 This is the main report generation tool for NYTProf.
781
782 nytprofcg
783 Translates a profile into a format that can be loaded into KCachegrind
784 <http://kcachegrind.github.io/>
785
786 nytprofcalls
787 Reads a profile and processes the calls events it contains.
788
789 nytprofmerge
790 Reads multiple profile data files and writes out a new file containing
791 the merged profile data.
792
794 Threads and Multiplicity
795 "Devel::NYTProf" is not currently thread safe or multiplicity safe. If
796 you'd be interested in helping to fix that then please get in touch
797 with us. Meanwhile, profiling is disabled when a thread is created, and
798 NYTProf tries to ignore any activity from perl interpreters other than
799 the first one that loaded it.
800
801 Coro
802 The "Devel::NYTProf" subroutine profiler gets confused by the stack
803 gymnastics performed by the Coro module and aborts. When profiling
804 applications that use Coro you should disable the subroutine profiler
805 using the "subs=0" option.
806
807 FCGI::Engine
808 Using "open('-|')" in code running under FCGI::Engine causes a panic in
809 nytprofcalls. See https://github.com/timbunce/devel-nytprof/issues/20
810 for more information.
811
812 For perl < 5.8.8 it may change what caller() returns
813 For example, the Readonly module croaks with "Invalid tie" when
814 profiled with perl versions before 5.8.8. That's because Readonly
815 explicitly checking for certain values from caller(). The NEXT module
816 is also affected.
817
818 For perl < 5.10.1 it can't see some implicit calls and callbacks
819 For perl versions prior to 5.8.9 and 5.10.1, some implicit subroutine
820 calls can't be seen by the subroutine profiler. Technically this
821 affects calls made via the various perl "call_*()" internal APIs.
822
823 For example, BEGIN/CHECK/INIT/END blocks, the "TIE"whatever subroutine
824 called by "tie()", all calls made via operator overloading, and
825 callbacks from XS code, are not seen.
826
827 The effect is that time in those subroutines is accumulated by the subs
828 that triggered the call to them. So time spent in calls invoked by perl
829 to handle overloading are accumulated by the subroutines that trigger
830 overloading (so it is measured, but the cost is dispersed across
831 possibly many calling locations).
832
833 Although the calls aren't seen by the subroutine profiler, the
834 individual statements executed by the code in the called subs are
835 profiled by the statement profiler.
836
837 #line directives
838 The reporting code currently doesn't handle #line directives, but at
839 least it warns about them. Patches welcome.
840
841 Freed values in @_ may be mutated
842 Perl has a class of bugs related to the fact that values placed in the
843 stack are not reference counted. Consider this example:
844
845 @a = (1..9); sub s { undef @a; print $_ for @_ } s(@a);
846
847 The "undef @a" frees the values that @_ refers to. Perl can sometimes
848 detect when a freed value is accessed and treats it as an undef.
849 However, if the freed value is assigned some new value then @_ is
850 effectively corrupted.
851
852 NYTProf allocates new values while it's profiling, in order to record
853 program activity, and so may appear to corrupt @_ in this (rare)
854 situation. If this happens, NYTProf is simply exposing an existing
855 problem in the code.
856
857 Lvalue subroutines aren't profiled when using use_db_sub=1
858 Currently 'lvalue' subroutines (subs that can be assigned to, like
859 "foo() = 42") are not profiled when using use_db_sub=1.
860
862 Here we discuss the way NYTProf gets high-resolution timing information
863 from your system and related issues.
864
865 POSIX Clocks
866 These are the clocks that your system may support if it supports the
867 POSIX "clock_gettime()" function. Other clock sources are listed in the
868 "Other Clocks" section below.
869
870 The "clock_gettime()" interface allows clocks to return times to
871 nanosecond precision. Of course few offer nanosecond accuracy but the
872 extra precision helps reduce the cumulative error that naturally occurs
873 when adding together many timings. When using these clocks NYTProf
874 outputs timings as a count of 100 nanosecond ticks.
875
876 CLOCK_MONOTONIC
877
878 CLOCK_MONOTONIC represents the amount of time since an unspecified
879 point in the past (typically system start-up time). It increments
880 uniformly independent of adjustments to 'wallclock time'. NYTProf will
881 use this clock by default, if available.
882
883 CLOCK_REALTIME
884
885 CLOCK_REALTIME is typically the system's main high resolution 'wall
886 clock time' source. The same source as used for the gettimeofday()
887 call used by most kinds of perl benchmarking and profiling tools.
888
889 The problem with real time is that it's far from simple. It tends to
890 drift and then be reset to match 'reality', either sharply or by small
891 adjustments (via the adjtime() system call).
892
893 Surprisingly, it can also go backwards, for reasons explained in
894 http://preview.tinyurl.com/5wawnn so CLOCK_MONOTONIC is preferred.
895
896 CLOCK_VIRTUAL
897
898 CLOCK_VIRTUAL increments only when the CPU is running in user mode on
899 behalf of the calling process.
900
901 CLOCK_PROF
902
903 CLOCK_PROF increments when the CPU is running in user or kernel mode.
904
905 CLOCK_PROCESS_CPUTIME_ID
906
907 CLOCK_PROCESS_CPUTIME_ID represents the amount of execution time of the
908 process associated with the clock.
909
910 CLOCK_THREAD_CPUTIME_ID
911
912 CLOCK_THREAD_CPUTIME_ID represents the amount of execution time of the
913 thread associated with the clock.
914
915 Finding Available POSIX Clocks
916
917 On unix-like systems you can find the CLOCK_* clocks available on you
918 system using a command like:
919
920 grep -r 'define *CLOCK_' /usr/include
921
922 Look for a group that includes CLOCK_REALTIME. The integer values
923 listed are the clock ids that you can use with the "clock=N" option.
924
925 A future version of NYTProf should be able to list the supported
926 clocks.
927
928 Other Clocks
929 This section lists other clock sources that NYTProf may use.
930
931 If your system doesn't support clock_gettime() then NYTProf will use
932 gettimeofday(), or the nearest equivalent,
933
934 gettimeofday
935
936 This is the traditional high resolution time of day interface for most
937 unix-like systems. With this clock NYTProf outputs timings as a count
938 of 1 microsecond ticks.
939
940 mach_absolute_time
941
942 On Mac OS X the mach_absolute_time() function is used. With this clock
943 NYTProf outputs timings as a count of 100 nanosecond ticks.
944
945 Time::HiRes
946
947 On systems which don't support other clocks, NYTProf falls back to
948 using the Time::HiRes module. With this clock NYTProf outputs timings
949 as a count of 1 microsecond ticks.
950
951 Clock References
952 Relevant specifications and manual pages:
953
954 http://www.opengroup.org/onlinepubs/000095399/functions/clock_getres.html
955 http://linux.die.net/man/3/clock_gettime
956
957 Why 'realtime' can appear to go backwards:
958
959 http://preview.tinyurl.com/5wawnn
960
961 The PostgreSQL pg_test_timing utility documentation has a good summary
962 of timing issues:
963
964 http://www.postgresql.org/docs/9.2/static/pgtesttiming.html
965
967 SMP Systems
968 On systems with multiple processors, which includes most modern
969 machines, (from Linux docs though applicable to most SMP systems):
970
971 The CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID clocks are realized on
972 many platforms using timers from the CPUs (TSC on i386, AR.ITC on Itanium).
973 These registers may differ between CPUs and as a consequence these clocks may
974 return bogus results if a process is migrated to another CPU.
975
976 If the CPUs in an SMP system have different clock sources then there is no way
977 to maintain a correlation between the timer registers since each CPU will run
978 at a slightly different frequency. If that is the case then
979 clock_getcpuclockid(0) will return ENOENT to signify this condition. The two
980 clocks will then only be useful if it can be ensured that a process stays on a
981 certain CPU.
982
983 The processors in an SMP system do not start all at exactly the same time and
984 therefore the timer registers are typically running at an offset. Some
985 architectures include code that attempts to limit these offsets on bootup.
986 However, the code cannot guarantee to accurately tune the offsets. Glibc
987 contains no provisions to deal with these offsets (unlike the Linux Kernel).
988 Typically these offsets are small and therefore the effects may be negligible
989 in most cases.
990
991 In summary, SMP systems are likely to give 'noisy' profiles. Setting a
992 "Processor Affinity" may help.
993
994 Processor Affinity
995
996 Processor affinity is an aspect of task scheduling on SMP systems.
997 "Processor affinity takes advantage of the fact that some remnants of a
998 process may remain in one processor's state (in particular, in its
999 cache) from the last time the process ran, and so scheduling it to run
1000 on the same processor the next time could result in the process running
1001 more efficiently than if it were to run on another processor." (From
1002 http://en.wikipedia.org/wiki/Processor_affinity)
1003
1004 Setting an explicit processor affinity can avoid the problems described
1005 in "SMP Systems".
1006
1007 Processor affinity can be set using the "taskset" command on Linux.
1008
1009 Note that processor affinity is inherited by child processes, so if the
1010 process you're profiling spawns cpu intensive sub processes then your
1011 process will be impacted by those more than it otherwise would.
1012
1013 Windows
1014
1015 THIS SECTION DOESN'T MATCH THE CODE
1016
1017 On Windows NYTProf uses Time::HiRes which uses the windows
1018 QueryPerformanceCounter() API with some extra logic to adjust for the
1019 current clock speed and try to resync the raw counter to wallclock time
1020 every so often (every 30 seconds or if the timer drifts by more than
1021 0.5 of a seconds). This extra logic may lead to occasional spurious
1022 results.
1023
1024 (It would be great if someone could contribute a patch to NYTProf to
1025 use QueryPerformanceCounter() directly and avoid the overheads and
1026 resyncing behaviour of Time::HiRes.)
1027
1028 Virtual Machines
1029 I recommend you don't do performance profiling while running in a
1030 virtual machine. If you do you're likely to find inexplicable spikes
1031 of real-time appearing at unreasonable places in your code. You should
1032 pay less attention to the statement timings and rely more on the
1033 subroutine timings. They will still be noisy but less so than the
1034 statement times.
1035
1036 You could also try using the "clock=N" option to select a high-
1037 resolution cpu-time clock instead of a real-time one. That should be
1038 much less noisy, though you will lose visibility of wait-times due to
1039 network and disk I/O, for example.
1040
1042 Possibly. All complex software has bugs. Let me know if you find one.
1043
1045 Screenshots of nytprofhtml v2.01 reports can be seen at
1046 <http://timbunce.files.wordpress.com/2008/07/nytprof-perlcritic-index.png>
1047 and
1048 <http://timbunce.files.wordpress.com/2008/07/nytprof-perlcritic-all-perl-files.png>.
1049 A writeup of the new features of NYTProf v2 can be found at
1050 <http://blog.timbunce.org/2008/07/15/nytprof-v2-a-major-advance-in-perl-profilers/>
1051 and the background story, explaining the "why", can be found at
1052 <http://blog.timbunce.org/2008/07/16/nytprof-v2-the-background-story/>.
1053
1054 Mailing list and discussion at
1055 <http://groups.google.com/group/develnytprof-dev>
1056
1057 Blog posts <http://blog.timbunce.org/tag/nytprof/>
1058
1059 Public Github Repository and hacking instructions at
1060 <https://github.com/timbunce/devel-nytprof/>
1061
1062 nytprofhtml is a script included that produces html reports.
1063 nytprofcsv is another script included that produces plain text CSV
1064 reports.
1065
1066 Devel::NYTProf::Reader is the module that powers the report scripts.
1067 You might want to check this out if you plan to implement a custom
1068 report (though it's very likely to be deprecated in a future release).
1069
1070 Devel::NYTProf::ReadStream is the module that lets you read a profile
1071 data file as a stream of chunks of data.
1072
1073 Other tools:
1074
1075 DTrace <https://speakerdeck.com/mrallen1/perl-dtrace-and-you>
1076
1078 "Profile data incomplete, ..." or "Profile format error: ..."
1079 This error message means the file doesn't contain all the expected data
1080 or the data has been corrupted in some way. That may be because it was
1081 truncated (perhaps the filesystem was full) or, more commonly, because
1082 the all the expected data hasn't been written.
1083
1084 NYTProf writes some important data to the data file when finishing
1085 profiling. If you read the file before the profiling has finished
1086 you'll get this error.
1087
1088 If the process being profiled is still running you'll need to wait
1089 until it exits cleanly (runs "END" blocks or "finish_profile" is called
1090 explicitly).
1091
1092 If the process being profiled has exited then it's likely that it met
1093 with a sudden and unnatural death that didn't give NYTProf a chance to
1094 finish the profile. If the sudden death was due to a signal, like
1095 SIGTERM, or a SIGINT from pressing Ctrl-C, then the "sigexit=1" option
1096 may help.
1097
1098 If the sudden death was due to calling "POSIX::_exit($status)" then
1099 you'll need to call "finish_profile" before calling "POSIX::_exit".
1100
1101 You'll also get this error if the code trying to read the profile is
1102 itself being profiled. That's most likely to happen if you enable
1103 profiling via the "PERL5OPT" environment variable and have forgotten to
1104 unset it.
1105
1106 If you've encountered this error message, and you're sure you've
1107 understood the concerns described above, and you're sure they don't
1108 apply in your case, then please open an issue. Be sure to include
1109 sufficient information so I can see how you've addressed these likely
1110 causes.
1111
1112 Some source files don't have profile information
1113 This is usually due to NYTProf being initialized after some perl files
1114 have already been compiled.
1115
1116 If you can't alter the command line to add ""-d:NYTProf"" you could try
1117 using the "PERL5OPT" environment variable. See "PROFILING".
1118
1119 You could also try using the "use_db_sub=1" option.
1120
1121 Eval ... has unknown invoking fid
1122 When using the statement profiler you may see a warning message like
1123 this:
1124
1125 Eval '(eval 2)' (fid 9, flags:viastmt,savesrc) has unknown invoking fid 10
1126
1127 Notice that the eval file id (fid 9 in this case) is lower than the
1128 file id that invoked the eval (fid 10 in this case). This is a known
1129 problem caused by the way perl works and how the profiler assigns and
1130 outputs the file ids. The invoking fid is known but gets assigned a
1131 fid and output after the fid for the eval, and that causes the warning
1132 when the file is read.
1133
1134 Warning: %d subroutine calls had negative time
1135 There are two likely causes for this: clock instability, or accumulated
1136 timing errors.
1137
1138 Clock instability, if present on your system, is most likely to be
1139 noticeable on very small/fast subroutines that are called very few
1140 times.
1141
1142 Accumulated timing errors can arise because the subroutine profiler
1143 uses floating point values (NVs) to store the times. They are most
1144 likely to be noticed on subroutines that are called a few times but
1145 which make a large number of calls to very fast subroutines (such as
1146 opcodes). In this case the accumulated time apparently spent making
1147 those calls can be greater than the time spent in the calling
1148 subroutine.
1149
1150 If you rerun nytprofhtml (etc.) with the "trace=N" option set >0 you'll
1151 see trace messages like "%s call has negative time: incl %fs, excl
1152 %fs" for each affected subroutine.
1153
1154 Try profiling with the "slowops=N" option set to 0 to disable the
1155 profiling of opcodes. Since opcodes often execute in a few microseconds
1156 they are a common cause of this warning.
1157
1158 You could also try recompiling perl to use 'long doubles' for the NV
1159 floating point type (use Configure -Duselongdouble). If you try this
1160 please let me know. I'd also happily take a patch to use long doubles,
1161 if available, by default.
1162
1163 panic: buffer overflow ...
1164 You have unusually long subroutine names in your code. You'll need to
1165 rebuild Devel::NYTProf with the NYTP_MAX_SUB_NAME_LEN environment
1166 variable set to a value longer than the longest subroutine names in
1167 your code.
1168
1170 Tim Bunce (<http://blog.timbunce.org>) leads the project and has done
1171 most of the development work thus far.
1172
1173 Nicholas Clark contributed zip compression and "nytprofmerge". Chia-
1174 liang Kao contributed "nytprofcg". Peter (Stig) Edwards contributed
1175 the VMS port. Jan Dubois contributed the Windows port. Gisle Aas
1176 contributed the Devel::NYTProf::ReadStream module. Steve Peters
1177 contributed greater perl version portability and use of POSIX high-
1178 resolution clocks. Other contributors are noted in the Changes file.
1179
1180 Many thanks to Adam Kaplan who created "NYTProf" initially by forking
1181 "Devel::FastProf" adding reporting from "Devel::Cover" and a test
1182 suite.
1183
1185 Copyright (C) 2008 by Adam Kaplan and The New York Times Company.
1186 Copyright (C) 2008-2016 by Tim Bunce, Ireland.
1187
1188 This library is free software; you can redistribute it and/or modify it
1189 under the same terms as Perl itself, either Perl version 5.8.8 or, at
1190 your option, any later version of Perl 5 you may have available.
1191
1192 Background
1193 Subroutine-level profilers:
1194
1195 Devel::DProf | 1995-10-31 | ILYAZ
1196 Devel::AutoProfiler | 2002-04-07 | GSLONDON
1197 Devel::Profiler | 2002-05-20 | SAMTREGAR
1198 Devel::Profile | 2003-04-13 | JAW
1199 Devel::DProfLB | 2006-05-11 | JAW
1200 Devel::WxProf | 2008-04-14 | MKUTTER
1201
1202 Statement-level profilers:
1203
1204 Devel::SmallProf | 1997-07-30 | ASHTED
1205 Devel::FastProf | 2005-09-20 | SALVA
1206 Devel::NYTProf | 2008-03-04 | AKAPLAN
1207 Devel::Profit | 2008-05-19 | LBROCARD
1208
1209 Devel::NYTProf is a (now distant) fork of Devel::FastProf, which was
1210 itself an evolution of Devel::SmallProf.
1211
1212 Adam Kaplan forked Devel::FastProf and added html report generation
1213 (based on Devel::Cover) and a test suite - a tricky thing to do for a
1214 profiler. Meanwhile Tim Bunce had been extending Devel::FastProf to
1215 add novel per-sub and per-block timing, plus subroutine caller
1216 tracking.
1217
1218 When Devel::NYTProf was released Tim switched to working on
1219 Devel::NYTProf because the html report would be a good way to show the
1220 extra profile data, and the test suite made development much easier and
1221 safer.
1222
1223 Then Tim went a little crazy and added a slew of new features, in
1224 addition to per-sub and per-block timing and subroutine caller
1225 tracking. These included the 'opcode interception' method of profiling,
1226 ultra-fast and robust inclusive subroutine timing, doubling
1227 performance, plus major changes to html reporting to display all the
1228 extra profile call and timing data in richly annotated and cross-linked
1229 reports.
1230
1231 Steve Peters came on board along the way with patches for portability
1232 and to keep NYTProf working with the latest development perl versions.
1233 Nicholas Clark added zip compression, many optimizations, and
1234 "nytprofmerge". Jan Dubois contributed Windows support.
1235
1236 Adam's work was sponsored by The New York Times Co.
1237 <http://open.nytimes.com>. Tim's work was partly sponsored by
1238 Shopzilla <http://www.shopzilla.com> during 2008 but hasn't been
1239 sponsored since then.
1240
1241 For the record, Tim has never worked for the New York Times nor has he
1242 received any kind of sponsorship or support from them in relation to
1243 NYTProf. The name of this module is simply result of the history
1244 outlined above, which can be summarised as: Adam forked an existing
1245 module when he added his enhancements and Tim didn't.
1246
1247
1248
1249perl v5.34.0 2021-07-22 Devel::NYTProf(3)