1Devel::NYTProf(3)     User Contributed Perl Documentation    Devel::NYTProf(3)
2
3
4

NAME

6       Devel::NYTProf - Powerful fast feature-rich Perl source code profiler
7

SYNOPSIS

9         # profile code and write database to ./nytprof.out
10         perl -d:NYTProf some_perl.pl
11
12         # convert database into a set of html files, e.g., ./nytprof/index.html
13         # and open a web browser on the nytprof/index.html file
14         nytprofhtml --open
15
16         # or into comma separated files, e.g., ./nytprof/*.csv
17         nytprofcsv
18
19       I give talks on profiling perl code, including a detailed look at how
20       to use NYTProf and how to optimize your code, every year. A video of my
21       YAPC::NA 2014 talk can be found at <https://youtu.be/T7EK6RZAnEA>
22

DESCRIPTION

24       Devel::NYTProf is a powerful, fast, feature-rich perl source code
25       profiler.
26
27       ·   Performs per-line statement profiling for fine detail
28
29       ·   Performs per-subroutine statement profiling for overview
30
31       ·   Performs per-opcode profiling for slow perl builtins
32
33       ·   Performs per-block statement profiling (the first profiler to do
34           so)
35
36       ·   Accounts correctly for time spent after calls return
37
38       ·   Performs inclusive and exclusive timing of subroutines
39
40       ·   Subroutine times are per calling location (a powerful feature)
41
42       ·   Can profile compile-time activity, just run-time, or just END time
43
44       ·   Uses novel techniques for efficient profiling
45
46       ·   Sub-microsecond (100ns) resolution on supported systems
47
48       ·   Very fast - the fastest statement and subroutine profilers for perl
49
50       ·   Handles applications that fork, with no performance cost
51
52       ·   Immune from noise caused by profiling overheads and I/O
53
54       ·   Program being profiled can stop/start the profiler
55
56       ·   Generates richly annotated and cross-linked html reports
57
58       ·   Captures source code, including string evals, for stable results
59
60       ·   Trivial to use with mod_perl - add one line to httpd.conf
61
62       ·   Includes an extensive test suite
63
64       ·   Tested on very large codebases
65
66       NYTProf is effectively two profilers in one: a statement profiler, and
67       a subroutine profiler.
68
69   Statement Profiling
70       The statement profiler measures the time between entering one perl
71       statement and entering the next. Whenever execution reaches a new
72       statement, the time since entering the previous statement is calculated
73       and added to the time associated with the line of the source file that
74       the previous statement starts on.
75
76       By default the statement profiler also determines the first line of the
77       current block and the first line of the current statement, and
78       accumulates times associated with those.
79
80       Another innovation unique to NYTProf is automatic compensation for a
81       problem inherent in simplistic statement-to-statement timing. Consider
82       a statement that calls a subroutine and then performs some other work
83       that doesn't execute new statements, for example:
84
85         foo(...) + mkdir(...);
86
87       In all other statement profilers the time spent in remainder of the
88       expression (mkdir in the example) will be recorded as having been spent
89       on the last statement executed in foo()! Here's another example:
90
91         while (<>) {
92            ...
93            1;
94         }
95
96       After the first time around the loop, any further time spent evaluating
97       the condition (waiting for input in this example) would be recorded as
98       having been spent on the last statement executed in the loop! (Until
99       perl bug #60954 is fixed this problem still applies to some loops. For
100       more information see
101       <http://rt.perl.org/rt3/Ticket/Display.html?id=60954>)
102
103       NYTProf avoids these problems by intercepting the opcodes which
104       indicate that control is returning into some previous statement and
105       adjusting the profile accordingly.
106
107       The statement profiler naturally generates a lot of data which is
108       streamed out to a file in a very compact format. NYTProf takes care to
109       not include the measurement and writing overheads in the profile times
110       (some profilers produce 'noisy' data due to periodic stdio flushing).
111
112   Subroutine Profiling
113       The subroutine profiler measures the time between entering a subroutine
114       and leaving it. It then increments a call count and accumulates the
115       duration.  For each subroutine called, separate counts and durations
116       are stored for each location that called the subroutine.
117
118       Subroutine entry is detected by intercepting the "entersub" opcode.
119       Subroutine exit is detected via perl's internal save stack. As a result
120       the subroutine profiler is both fast and robust.
121
122       Subroutine Recursion
123
124       For subroutines that recurse directly or indirectly, such as
125       Error::try, the inclusive time is only measured for the outer-most
126       call.
127
128       The inclusive times of recursive calls are still measured and are
129       accumulated separately. Also the 'maximum recursion depth' per calling
130       location is recorded.
131
132       Goto &Subroutine
133
134       Perl implements a "goto &destination" as a "return" followed by a call
135       to &destination, so that's how it will appear in the report.
136
137       The "goto" will be shown with a very short time because it's
138       effectively just a "return". The &destination sub will show a call not
139       from the location of the "goto" but from the location of the call to
140       the sub that performed the "goto".
141
142       accept()
143
144       The perl built-in accept() function waits listening for a connection on
145       a socket, and so is a key part of pure-perl network service
146       applications.
147
148       The time spent waiting for a remotely initiated connection can be
149       relatively high but is not relevant to the performance of the
150       application. So the accept() function is treated as a special case. The
151       subroutine profiler discounts the time spent in the accept() function.
152       It does this in a way that also discounts that time from all the
153       callers up the call stack. The effect on the reports is that all
154       accept() calls appear to be instant.
155
156       The statement profiler still shows the time actually spent in the
157       statement that executed the accept() call.
158
159   Application Profiling
160       NYTProf records extra information in the data file to capture details
161       that may be useful when analyzing the performance. It also records the
162       filename and line ranges of all the subroutines.
163
164       NYTProf can profile applications that fork, and does so with no loss of
165       performance.  NYTProf detects the fork and starts writing a new profile
166       file with the pid appended to the filename. Since nytprofhtml only
167       works with a single profile file you may want to merge multiple files
168       using nytprofmerge.
169
170   Fast Profiling
171       The NYTProf profiler is written almost entirely in C and great care has
172       been taken to ensure it's very efficient.
173
174   Apache Profiling
175       Just add one line near the start of your httpd.conf file:
176
177         PerlModule Devel::NYTProf::Apache
178
179       By default you'll get a /tmp/nytprof.$$.out file for the parent process
180       and a /tmp/nytprof.$parent.out.$$ file for each worker process.
181
182       NYTProf takes care to detect when control is returning back from perl
183       to mod_perl so time spent in mod_perl (such as waiting for the next
184       request) does not get allocated to the last statement executed.
185
186       Works with mod_perl 1 and 2. See Devel::NYTProf::Apache for more
187       information.
188

PROFILING

190       Usually you'd load Devel::NYTProf on the command line using the perl -d
191       option:
192
193         perl -d:NYTProf some_perl.pl
194
195       To save typing the ':NYTProf' you could set the PERL5DB env var
196
197         PERL5DB='use Devel::NYTProf'
198
199       and then just perl -d would work:
200
201         perl -d some_perl.pl
202
203       Or you can avoid the need to add the -d option at all by using the
204       "PERL5OPT" env var:
205
206         PERL5OPT=-d:NYTProf
207
208       That's also very handy when you can't alter the perl command line being
209       used to run the script you want to profile. Usually you'll want to
210       enable the "addpid=1" option to ensure any nested invocations of perl
211       don't overwrite the profile.
212

NYTPROF ENVIRONMENT VARIABLE

214       The behavior of Devel::NYTProf may be modified by setting the
215       environment variable "NYTPROF".  It is possible to use this environment
216       variable to effect multiple setting by separating the values with a
217       ":".  For example:
218
219         export NYTPROF=trace=2:start=init:file=/tmp/nytprof.out
220
221       Any colon or equal characters in a value can be escaped by preceding
222       them with a backslash.
223
224   addpid=1
225       Append the current process id to the end of the filename.
226
227       This avoids concurrent, or consecutive, processes from overwriting the
228       same file.  If a fork is detected during profiling then the child
229       process will automatically add the process id to the filename.
230
231   addtimestamp=1
232       Append the current time, as integer epoch seconds, to the end of the
233       filename.
234
235   trace=N
236       Set trace level to N. 0 is off (the default). Higher values cause more
237       detailed trace output. Trace output is written to STDERR or wherever
238       the "log=F" option has specified.
239
240   log=F
241       Specify the name of the file that "trace=N" output should be written
242       to.
243
244   start=...
245       Specify at which phase of program execution the profiler should be
246       enabled:
247
248         start=begin - start immediately (the default)
249         start=init  - start at beginning of INIT phase (after compilation/use/BEGIN)
250         start=end   - start at beginning of END phase
251         start=no    - don't automatically start
252
253       The start=no option is handy if you want to explicitly control
254       profiling by calling DB::enable_profile() and DB::disable_profile()
255       yourself.  See "RUN-TIME CONTROL OF PROFILING".
256
257       The start=init option is handy if you want to avoid profiling the
258       loading and initialization of modules.
259
260   optimize=0
261       Disable the perl optimizer.
262
263       By default NYTProf leaves perl's optimizer enabled.  That gives you
264       more accurate profile timing overall, but can lead to odd statement
265       counts for individual sets of lines. That's because the perl's peephole
266       optimizer has effectively rewritten the statements but you can't see
267       what the rewritten version looks like.
268
269       For example:
270
271         1     if (...) {
272         2         return;
273         3     }
274
275       may be rewritten as
276
277         1    return if (...)
278
279       so the profile won't show a statement count for line 2 in your source
280       code because the "return" was merged into the "if" statement on the
281       preceding line.
282
283       Also 'empty' statements like "1;" are removed entirely.  Such
284       statements are empty because the optimizer has already removed the
285       pointless constant in void context. It then goes on to remove the now
286       empty statement (in perl >= 5.13.7).
287
288       Using the "optimize=0" option disables the optimizer so you'll get
289       lower overall performance but more accurately assigned statement
290       counts.
291
292       If you find any other examples of the effect of optimizer on NYTProf
293       output (other than performance, obviously) please let us know.
294
295   subs=0
296       Set to 0 to disable the collection of subroutine caller and timing
297       details.
298
299   blocks=1
300       Set to 1 to enable the determination of block and subroutine location
301       per statement.  This makes the profiler about 50% slower (as of July
302       2008) and produces larger output files, but you gain some valuable
303       insight in where time is spent in the blocks within large subroutines
304       and scripts.
305
306   stmts=0
307       Set to 0 to disable the statement profiler. (Implies "blocks=0".)  The
308       reports won't contain any statement timing detail.
309
310       This significantly reduces the overhead of the profiler and can also be
311       useful for profiling large applications that would normally generate a
312       very large profile data file.
313
314   calls=N
315       This option is new and experimental.
316
317       With calls=1 (the default) subroutine call return events are emitted
318       into the data stream as they happen.  With calls=2 subroutine call
319       entry events are also emitted. With calls=0 no subroutine call events
320       are produced.  This option depends on the "subs" option being enabled,
321       which it is by default.
322
323       The nytprofcalls utility can be used to process this data. It too is
324       new and experimental and so likely to change.
325
326       The subroutine profiler normally gathers data in memory and outputs a
327       summary when the profile data is being finalized, usually when the
328       program has finished.  The summary contains aggregate information for
329       all the calls from one location to another, but the details of
330       individual calls have been lost.  The calls option enables the
331       recording of individual call events and thus more detailed analysis and
332       reporting of that data.
333
334   leave=0
335       Set to 0 to disable the extra work done by the statement profiler to
336       allocate times accurately when returning into the middle of statement.
337       For example leaving a subroutine and returning into the middle of
338       statement, or re-evaluating a loop condition.
339
340       This feature also ensures that in embedded environments, such as
341       mod_perl, the last statement executed doesn't accumulate the time spent
342       'outside perl'.
343
344   findcaller=1
345       Force NYTProf to recalculate the name of the caller of the each sub
346       instead of 'inheriting' the name calculated when the caller was
347       entered. (Rarely needed, but might be useful in some odd cases.)
348
349   use_db_sub=1
350       Set to 1 to enable use of the traditional DB::DB() subroutine to
351       perform profiling, instead of the faster 'opcode redirection' technique
352       that's used by default. Also effectively sets "leave=0" (see above).
353
354       The default 'opcode redirection' technique can't profile subroutines
355       that were compiled before NYTProf was loaded. So using use_db_sub=1 can
356       be useful in cases where you can't load the profiler early in the life
357       of the application.
358
359       Another side effect of "use_db_sub=1" is that it enables recording of
360       the source code of the "perl -e '...'" and "perl -" input for old
361       versions of perl. See also "savesrc=0".
362
363   savesrc=0
364       Disable the saving of source code.
365
366       By default NYTProf saves a copy of all source code into the profile
367       data file.  This makes the file self-contained, so the reporting tools
368       no longer depend on having the unmodified source code files available.
369
370       With "savesrc=0" some source code is still saved: the arguments to the
371       "perl -e" option, the script fed to perl via STDIN when using "perl -",
372       and the source code of string evals.
373
374       Saving the source code of string evals requires perl version 5.8.9+,
375       5.10.1+, or 5.12 or later.
376
377       Saving the source code of the "perl -e '...'" or "perl -" input
378       requires either a recent perl version, as above, or setting the
379       "use_db_sub=1" option.
380
381   slowops=N
382       Profile perl opcodes that can be slow. These include opcodes that make
383       system calls, such as "print", "read", "sysread", "socket" etc., plus
384       regular expression opcodes like "subst" and "match".
385
386       If "N" is 0 then slowops profiling is disabled.
387
388       If "N" is 1 then all the builtins are treated as being defined in the
389       "CORE" package. So times for "print" calls from anywhere in your code
390       are merged and accounted for as calls to an xsub called "CORE::print".
391
392       If "N" is 2 (the default) then builtins are treated as being defined in
393       the package that calls them. So calls to "print" from package "Foo" are
394       treated as calls to an xsub called "Foo::CORE:print". Note the single
395       colon after CORE.
396
397       The opcodes are currently profiled using their internal names, so
398       "printf" is "prtf" and the "-x" file test is "fteexec". This may change
399       in future.
400
401       Opcodes that call subroutines, perhaps by triggering a FETCH from a
402       tied variable, currently appear in the call tree as the caller of the
403       sub. This is likely to change in future.
404
405   usecputime=1
406       This option has been removed. Profiling won't be enabled if set.
407
408       Use the "clock=N" option to select a high-resolution CPU time clock, if
409       available on your system, instead. That will give you higher resolution
410       and work for the subroutine profiler as well.
411
412   file=...
413       Specify the output file to write profile data to (default:
414       './nytprof.out').
415
416   compress=...
417       Specify the compression level to use, if NYTProf is compiled with
418       compression support. Valid values are 0 to 9, with 0 disabling
419       compression. The default is 6 as higher values yield little extra
420       compression but the cpu cost starts to rise significantly. Using level
421       1 still gives you a significant reduction in file size.
422
423       If NYTProf was not compiled with compression support, this option is
424       silently ignored.
425
426   clock=N
427       Systems which support the "clock_gettime()" system call typically
428       support several clocks. By default NYTProf uses CLOCK_MONOTONIC.
429
430       This option enables you to select a different clock by specifying the
431       integer id of the clock (which may vary between operating system
432       types).  If the clock you select isn't available then CLOCK_REALTIME is
433       used.
434
435       See "CLOCKS" for more information.
436
437   sigexit=1
438       When perl exits normally it runs any code defined in "END" blocks.
439       NYTProf defines an END block that finishes profiling and writes out the
440       final profile data.
441
442       If the process ends due to a signal then END blocks are not executed so
443       the profile will be incomplete and unusable.  The "sigexit" option
444       tells NYTProf to catch some signals (e.g. INT, HUP, PIPE, SEGV, BUS)
445       and ensure a usable profile by executing:
446
447           DB::finish_profile();
448           exit 1;
449
450       You can also specify which signals to catch in this way by listing
451       them, separated by commas, as the value of the option (case is not
452       significant):
453
454           sigexit=int,hup
455
456   posix_exit=1
457       The NYTProf subroutine profiler normally detects calls to
458       "POSIX::_exit()" (which exits the process without running END blocks)
459       and automatically calls "DB::finish_profile()" for you, so NYTProf
460       'just works'.
461
462       When using the "subs=0" option to disable the subroutine profiler the
463       "posix_exit" option can be used to tell NYTProf to take other steps to
464       arrange for "DB::finish_profile()" to be called before
465       "POSIX::_exit()".
466
467   libcexit=1
468       Arranges for "finish_profile" to be called via the C library "atexit()"
469       function.  This may help some tricky cases where the process may exit
470       without perl executing the "END" block that NYTProf uses to call
471       /finish_profile().
472
473   endatexit=1
474       Sets the PERL_EXIT_DESTRUCT_END flag in the PL_exit_flags of the perl
475       interpreter.  This makes perl run "END" blocks in perl_destruct()
476       instead of perl_run() which may help in cases, like Apache, where perl
477       is embedded but perl_run() isn't called.
478
479   forkdepth=N
480       When a perl process that is being profiled executes a fork() the child
481       process is also profiled. The forkdepth option can be used to control
482       this. If forkdepth is zero then profiling will be disabled in the child
483       process.
484
485       If forkdepth is greater than zero then profiling will be enabled in the
486       child process and the forkdepth value in that process is decremented by
487       one.
488
489       If forkdepth is -1 (the default) then there's no limit on the number of
490       generations of children that are profiled.
491
492   nameevals=0
493       The 'file name' of a string eval is normally a string like ""(eval
494       N)"", where "N" is a sequence number. By default NYTProf asks perl to
495       give evals more informative names like ""(eval N)[file:line]"", where
496       "file" and "line" are the file and line number where the string "eval"
497       was executed.
498
499       The "nameevals=0" option can be used to disable the more informative
500       names and return to the default behaviour. This may be need in rare
501       cases where the application code is sensitive to the name given to a
502       "eval". (The most common case in when running test suites undef
503       NYTProf.)
504
505       The downside is that the NYTProf reporting tools are less useful and
506       may get confused if this option is used.
507
508   nameanonsubs=0
509       The name of a anonymous subroutine is normally ""__ANON__"".  By
510       default NYTProf asks perl to give anonymous subroutines more
511       informative names like ""__ANON__[file:line]"", where "file" and "line"
512       are the file and line number where the anonymous subroutine was
513       defined.
514
515       The "nameanonsubs=0" option can be used to disable the more informative
516       names and return to the default behaviour. This may be need in rare
517       cases where the application code is sensitive to the name given to a
518       anonymous subroutines.  (The most common case in when running test
519       suites undef NYTProf.)
520
521       The downside is that the NYTProf reporting tools are less useful and
522       may get confused if this option is used.
523

RUN-TIME CONTROL OF PROFILING

525       You can profile only parts of an application by calling
526       DB::disable_profile() to stop collecting profile data, and calling
527       DB::enable_profile() to start collecting profile data.
528
529       Using the "start=no" option lets you leave the profiler disabled
530       initially until you call DB::enable_profile() at the right moment. You
531       still need to load Devel::NYTProf as early as possible, even if you
532       don't call DB::enable_profile() until much later. That's because any
533       code that's compiled before Devel::NYTProf is loaded will not be
534       profiled by default. See also "use_db_sub=1".
535
536       The profile output file can't be used until it's been properly
537       completed and closed.  Calling DB::disable_profile() doesn't do that.
538       To make a profile file usable before the profiled application has
539       completed you can call DB::finish_profile(). Alternatively you could
540       call DB::enable_profile($newfile).
541
542       Always call the DB::enable_profile(), DB::disable_profile() or
543       DB::finish_profile() function with the "DB::" prefix as shown because
544       you can't import them. They're provided automatically when NYTProf is
545       in use.
546
547   disable_profile
548         DB::disable_profile()
549
550       Stops collection of profile data until DB:enable_profile() is called.
551
552       Subroutine calls which were made while profiling was enabled and are
553       still on the call stack (have not yet exited) will still have their
554       profile data collected when they exit. Compare with "finish_profile"
555       below.
556
557   enable_profile
558         DB::enable_profile($newfile)
559         DB::enable_profile()
560
561       Enables collection of profile data. If $newfile is specified the
562       profile data will be written to $newfile (after completing and closing
563       the previous file, if any).  If $newfile already exists it will be
564       deleted first.  If DB::enable_profile() is called without a filename
565       argument then profile data will continue to be written to the current
566       file (nytprof.out by default).
567
568   finish_profile
569         DB::finish_profile()
570
571       Calls DB::disable_profile(), then completes the profile data file by
572       writing subroutine profile data, and then closes the file. The in
573       memory subroutine profile data is then discarded.
574
575       Normally NYTProf arranges to call finish_profile() for you via an END
576       block.
577

DATA COLLECTION AND INTERPRETATION

579       NYTProf tries very hard to gather accurate information.  The nature of
580       the internals of perl mean that, in some cases, the information that's
581       gathered is accurate but surprising. In some cases it can appear to be
582       misleading.  (Of course, in some cases it may actually be plain wrong.
583       Caveat lector.)
584
585   If Statement and Subroutine Timings Don't Match
586       NYTProf has two profilers: a statement profiler that's invoked when
587       perl moves from one perl statement to another, and a subroutine
588       profiler that's invoked when perl calls or returns from a subroutine.
589
590       The individual statement timings for a subroutine usually add up to
591       slightly less than the exclusive time for the subroutine. That's
592       because the handling of the subroutine call and return overheads is
593       included in the exclusive time for the subroutine. The difference may
594       only be a few microseconds but that may become noticeable for
595       subroutines that are called hundreds of thousands of times.
596
597       The statement profiler keeps track how much time was spent on
598       overheads, like writing statement profile data to disk. The subroutine
599       profiler subtracts the overheads that have accumulated between entering
600       and leaving the subroutine in order to give a more accurate profile.
601       The statement profiler is generally very fast because most writes get
602       buffered for zip compression so the profiler overhead per statement
603       tends to be very small, often a single 'tick'.  The result is that the
604       accumulated overhead is quite noisy. This becomes more significant for
605       subroutines that are called frequently and are also fast.  This may be
606       another, smaller, contribution to the discrepancy between statement
607       time and exclusive times.
608
609   If Headline Subroutine Timings Don't Match the Called Subs
610       Overall subroutine times are reported with a headline like "spent 10s
611       (2+8) within ...".  In this example, 10 seconds were spent inside the
612       subroutine (the "inclusive time") and, of that, 8 seconds were spent in
613       subroutines called by this one.  That leaves 2 seconds as the time
614       spent in the subroutine code itself (the "exclusive time", sometimes
615       also called the "self time").
616
617       The report shows the source code of the subroutine. Lines that make
618       calls to other subroutines are annotated with details of the time spent
619       in those calls.
620
621       Sometimes the sum of the times for calls made by the lines of code in
622       the subroutine is less than the inclusive-exclusive time reported in
623       the headline (10-2 = 8 seconds in the example above).
624
625       What's happening here is that calls to other subroutines are being made
626       but NYTProf isn't able to determine the calling location correctly so
627       the calls don't appear in the report in the correct place.
628
629       Using an old version of perl is one cause (see below). Another is
630       calling subroutines that exit via "goto &sub;" - most frequently
631       encountered in AUTOLOAD subs and code using the Memoize module.
632
633       In general the overall subroutine timing is accurate and should be
634       trusted more than the sum of statement or nested sub call timings.
635
636   Perl 5.10.1+ (or else 5.8.9+) is Recommended
637       These versions of perl yield much more detailed information about calls
638       to BEGIN, CHECK, INIT, and END blocks, the code handling tied or
639       overloaded variables, and callbacks from XS code.
640
641       Perl 5.12 will hopefully also fix an inaccuracy in the timing of the
642       last statement and the condition clause of some kinds of loops:
643       <http://rt.perl.org/rt3/Ticket/Display.html?id=60954>
644
645   eval $string
646       Perl treats each execution of a string eval ("eval $string;" not "eval
647       { ...  }") as a distinct file, so NYTProf does as well. The 'files' are
648       given names with this structure:
649
650               (eval $sequence)[$filename:$line]
651
652       for example ""(eval 93)[/foo/bar.pm:42]"" would be the name given to
653       the 93rd execution of a string eval by that process and, in this case,
654       the 93rd eval happened to be one at line 42 of "/foo/bar.pm".
655
656       Nested string evals can give rise to file names like
657
658               (eval 1047)[(eval 93)[/foo/bar.pm:42]:17]
659
660       Merging Evals
661
662       Some applications execute a great many string eval statements. If
663       NYTProf generated a report page for each one it would not only slow
664       report generation but also make the overall report less useful by
665       scattering performance data too widely.  On the other hand, being able
666       to see the actual source code executed by an eval, along with the
667       timing details, is often very useful.
668
669       To try to balance these conflicting needs, NYTProf currently merges
670       uninteresting string eval siblings.
671
672       What does that mean? Well, for each source code line that executed any
673       string evals, NYTProf first gathers the corresponding eval 'files' for
674       that line (known as the 'siblings') into groups keyed by distinct
675       source code.
676
677       Then, for each of those groups of siblings, NYTProf will 'merge' a
678       group that shares the same source code and doesn't execute any string
679       evals itself.  Merging means to pick one sibling as the survivor and
680       merge and delete all the data from the others into it.
681
682       If there are a large number of sibling groups then the data for all of
683       them are merged into one regardless.
684
685       The report annotations will indicate when evals have been merged
686       together.
687
688       Merging Anonymous Subroutines
689
690       Anonymous subroutines defined within string evals have names like this:
691
692               main::__ANON__[(eval 75)[/foo/bar.pm:42]:12]
693
694       That anonymous subroutine was defined on line 12 of the source code
695       executed by the string eval on line 42 of /foo/bar.pm. That was the
696       75th string eval executed by the program.
697
698       Anonymous subroutines defined on the same line of sibling evals that
699       get merged are also merged. That is, the profile information is merged
700       into one and the others are discarded.
701
702       Timing
703
704       Care should be taken when interpreting the report annotations
705       associated with a string eval statement.  Normally the report
706       annotations embedded into the source code related to timings from the
707       subroutine profiler. This isn't (currently) true of annotations for
708       string eval statements.
709
710       This makes a significant different if the eval defines any subroutines
711       that get called after the eval has returned. Because the time shown for
712       a string eval is based on the statement times it will include time
713       spent executing statements within the subs defined by the eval.
714
715       In future NYTProf may involve the subroutine profiler in timings evals
716       and so be able to avoid this issue.
717
718   Calls from XSUBs and Opcodes
719       Calls record the current filename and line number of the perl code at
720       the time the call was made. That's fine and accurate for calls from
721       perl code. For calls that originate from C code however, such as an
722       XSUB or an opcode, the filename and line number recorded are still
723       those of the last perl statement executed.
724
725       For example, a line that calls an xsub will appear in reports to also
726       have also called any subroutines that that xsub called. This can be
727       construed as a feature.
728
729       As an extreme example, the first time a regular expression that uses
730       character classes is executed on a unicode string you'll find profile
731       data like this:
732
733             # spent 1ms within main::BEGIN@4 which was called
734             #    once (1ms+0s) by main::CORE:subst at line 0
735         4   s/ (?: [A-Z] | [\d] )+ (?= [\s] ) //x;
736             # spent  38.8ms making 1 call to main::CORE:subst
737             # spent  25.4ms making 2 calls to utf8::SWASHNEW, avg 12.7ms/call
738             # spent  12.4ms making 1 call to utf8::AUTOLOAD
739

MAKING NYTPROF FASTER

741       You can reduce the cost of profiling by adjusting some options. The
742       trade-off is reduced detail and/or accuracy in reports.
743
744       If you don't need statement-level profiling then you can disable it via
745       "stmts=0".  To further boost statement-level profiling performance try
746       "leave=0" but note that will apportion timings for some kinds of
747       statements less accurate).
748
749       If you don't need call stacks or flamegraph then disable it via
750       "calls=0".  If you don't need subroutine profiling then you can disable
751       it via "subs=0".  If you do need it but don't need timings for perl
752       opcodes then set "slowops=0".
753
754       Generally speaking, setting calls=0 and slowops=0 will give you a
755       useful boost with the least loss of detail.
756
757       Another approach is to only enable NYTProf in the sections of code that
758       interest you. See "RUN-TIME CONTROL OF PROFILING" for more details.
759
760       To speed up nytprofhtml try using the --minimal (-m) or --no-flame
761       options.
762

REPORTS

764       The Devel::NYTProf::Data module provides a low-level interface for
765       loading the profile data.
766
767       The Devel::NYTProf::Reader module provides an interface for generating
768       arbitrary reports.  This means that you can implement your own output
769       format in perl. (Though the module is in a state of flux and may be
770       deprecated soon.)
771
772       Included in the bin directory of this distribution are some scripts
773       which turn the raw profile data into more useful formats:
774
775   nytprofhtml
776       Creates attractive, richly annotated, and fully cross-linked html
777       reports (including statistics, source code and color highlighting).
778       This is the main report generation tool for NYTProf.
779
780   nytprofcg
781       Translates a profile into a format that can be loaded into KCachegrind
782       <http://kcachegrind.github.io/>
783
784   nytprofcalls
785       Reads a profile and processes the calls events it contains.
786
787   nytprofmerge
788       Reads multiple profile data files and writes out a new file containing
789       the merged profile data.
790

LIMITATIONS

792   Threads and Multiplicity
793       "Devel::NYTProf" is not currently thread safe or multiplicity safe.  If
794       you'd be interested in helping to fix that then please get in touch
795       with us. Meanwhile, profiling is disabled when a thread is created, and
796       NYTProf tries to ignore any activity from perl interpreters other than
797       the first one that loaded it.
798
799   Coro
800       The "Devel::NYTProf" subroutine profiler gets confused by the stack
801       gymnastics performed by the Coro module and aborts. When profiling
802       applications that use Coro you should disable the subroutine profiler
803       using the "subs=0" option.
804
805   FCGI::Engine
806       Using "open('-|')" in code running under FCGI::Engine causes a panic in
807       nytprofcalls.  See https://github.com/timbunce/devel-nytprof/issues/20
808       for more information.
809
810   For perl < 5.8.8 it may change what caller() returns
811       For example, the Readonly module croaks with "Invalid tie" when
812       profiled with perl versions before 5.8.8. That's because Readonly
813       explicitly checking for certain values from caller(). The NEXT module
814       is also affected.
815
816   For perl < 5.10.1 it can't see some implicit calls and callbacks
817       For perl versions prior to 5.8.9 and 5.10.1, some implicit subroutine
818       calls can't be seen by the subroutine profiler. Technically this
819       affects calls made via the various perl "call_*()" internal APIs.
820
821       For example, BEGIN/CHECK/INIT/END blocks, the "TIE"whatever subroutine
822       called by "tie()", all calls made via operator overloading, and
823       callbacks from XS code, are not seen.
824
825       The effect is that time in those subroutines is accumulated by the subs
826       that triggered the call to them. So time spent in calls invoked by perl
827       to handle overloading are accumulated by the subroutines that trigger
828       overloading (so it is measured, but the cost is dispersed across
829       possibly many calling locations).
830
831       Although the calls aren't seen by the subroutine profiler, the
832       individual statements executed by the code in the called subs are
833       profiled by the statement profiler.
834
835   #line directives
836       The reporting code currently doesn't handle #line directives, but at
837       least it warns about them. Patches welcome.
838
839   Freed values in @_ may be mutated
840       Perl has a class of bugs related to the fact that values placed in the
841       stack are not reference counted. Consider this example:
842
843         @a = (1..9);  sub s { undef @a; print $_ for @_ }  s(@a);
844
845       The "undef @a" frees the values that @_ refers to. Perl can sometimes
846       detect when a freed value is accessed and treats it as an undef.
847       However, if the freed value is assigned some new value then @_ is
848       effectively corrupted.
849
850       NYTProf allocates new values while it's profiling, in order to record
851       program activity, and so may appear to corrupt @_ in this (rare)
852       situation.  If this happens, NYTProf is simply exposing an existing
853       problem in the code.
854
855   Lvalue subroutines aren't profiled when using use_db_sub=1
856       Currently 'lvalue' subroutines (subs that can be assigned to, like
857       "foo() = 42") are not profiled when using use_db_sub=1.
858

CLOCKS

860       Here we discuss the way NYTProf gets high-resolution timing information
861       from your system and related issues.
862
863   POSIX Clocks
864       These are the clocks that your system may support if it supports the
865       POSIX "clock_gettime()" function. Other clock sources are listed in the
866       "Other Clocks" section below.
867
868       The "clock_gettime()" interface allows clocks to return times to
869       nanosecond precision. Of course few offer nanosecond accuracy but the
870       extra precision helps reduce the cumulative error that naturally occurs
871       when adding together many timings. When using these clocks NYTProf
872       outputs timings as a count of 100 nanosecond ticks.
873
874       CLOCK_MONOTONIC
875
876       CLOCK_MONOTONIC represents the amount of time since an unspecified
877       point in the past (typically system start-up time).  It increments
878       uniformly independent of adjustments to 'wallclock time'. NYTProf will
879       use this clock by default, if available.
880
881       CLOCK_REALTIME
882
883       CLOCK_REALTIME is typically the system's main high resolution 'wall
884       clock time' source.  The same source as used for the gettimeofday()
885       call used by most kinds of perl benchmarking and profiling tools.
886
887       The problem with real time is that it's far from simple. It tends to
888       drift and then be reset to match 'reality', either sharply or by small
889       adjustments (via the adjtime() system call).
890
891       Surprisingly, it can also go backwards, for reasons explained in
892       http://preview.tinyurl.com/5wawnn so CLOCK_MONOTONIC is preferred.
893
894       CLOCK_VIRTUAL
895
896       CLOCK_VIRTUAL increments only when the CPU is running in user mode on
897       behalf of the calling process.
898
899       CLOCK_PROF
900
901       CLOCK_PROF increments when the CPU is running in user or kernel mode.
902
903       CLOCK_PROCESS_CPUTIME_ID
904
905       CLOCK_PROCESS_CPUTIME_ID represents the amount of execution time of the
906       process associated with the clock.
907
908       CLOCK_THREAD_CPUTIME_ID
909
910       CLOCK_THREAD_CPUTIME_ID represents the amount of execution time of the
911       thread associated with the clock.
912
913       Finding Available POSIX Clocks
914
915       On unix-like systems you can find the CLOCK_* clocks available on you
916       system using a command like:
917
918         grep -r 'define *CLOCK_' /usr/include
919
920       Look for a group that includes CLOCK_REALTIME. The integer values
921       listed are the clock ids that you can use with the "clock=N" option.
922
923       A future version of NYTProf should be able to list the supported
924       clocks.
925
926   Other Clocks
927       This section lists other clock sources that NYTProf may use.
928
929       If your system doesn't support clock_gettime() then NYTProf will use
930       gettimeofday(), or the nearest equivalent,
931
932       gettimeofday
933
934       This is the traditional high resolution time of day interface for most
935       unix-like systems.  With this clock NYTProf outputs timings as a count
936       of 1 microsecond ticks.
937
938       mach_absolute_time
939
940       On Mac OS X the mach_absolute_time() function is used. With this clock
941       NYTProf outputs timings as a count of 100 nanosecond ticks.
942
943       Time::HiRes
944
945       On systems which don't support other clocks, NYTProf falls back to
946       using the Time::HiRes module.  With this clock NYTProf outputs timings
947       as a count of 1 microsecond ticks.
948
949   Clock References
950       Relevant specifications and manual pages:
951
952         http://www.opengroup.org/onlinepubs/000095399/functions/clock_getres.html
953         http://linux.die.net/man/3/clock_gettime
954
955       Why 'realtime' can appear to go backwards:
956
957         http://preview.tinyurl.com/5wawnn
958
959       The PostgreSQL pg_test_timing utility documentation has a good summary
960       of timing issues:
961
962         http://www.postgresql.org/docs/9.2/static/pgtesttiming.html
963

CAVEATS

965   SMP Systems
966       On systems with multiple processors, which includes most modern
967       machines, (from Linux docs though applicable to most SMP systems):
968
969         The CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID clocks are realized on
970         many platforms using timers from the CPUs (TSC on i386, AR.ITC on Itanium).
971         These registers may differ between CPUs and as a consequence these clocks may
972         return bogus results if a process is migrated to another CPU.
973
974         If the CPUs in an SMP system have different clock sources then there is no way
975         to maintain a correlation between the timer registers since each CPU will run
976         at a slightly different frequency. If that is the case then
977         clock_getcpuclockid(0) will return ENOENT to signify this condition. The two
978         clocks will then only be useful if it can be ensured that a process stays on a
979         certain CPU.
980
981         The processors in an SMP system do not start all at exactly the same time and
982         therefore the timer registers are typically running at an offset. Some
983         architectures include code that attempts to limit these offsets on bootup.
984         However, the code cannot guarantee to accurately tune the offsets. Glibc
985         contains no provisions to deal with these offsets (unlike the Linux Kernel).
986         Typically these offsets are small and therefore the effects may be negligible
987         in most cases.
988
989       In summary, SMP systems are likely to give 'noisy' profiles.  Setting a
990       "Processor Affinity" may help.
991
992       Processor Affinity
993
994       Processor affinity is an aspect of task scheduling on SMP systems.
995       "Processor affinity takes advantage of the fact that some remnants of a
996       process may remain in one processor's state (in particular, in its
997       cache) from the last time the process ran, and so scheduling it to run
998       on the same processor the next time could result in the process running
999       more efficiently than if it were to run on another processor." (From
1000       http://en.wikipedia.org/wiki/Processor_affinity)
1001
1002       Setting an explicit processor affinity can avoid the problems described
1003       in "SMP Systems".
1004
1005       Processor affinity can be set using the "taskset" command on Linux.
1006
1007       Note that processor affinity is inherited by child processes, so if the
1008       process you're profiling spawns cpu intensive sub processes then your
1009       process will be impacted by those more than it otherwise would.
1010
1011       Windows
1012
1013       THIS SECTION DOESN'T MATCH THE CODE
1014
1015       On Windows NYTProf uses Time::HiRes which uses the windows
1016       QueryPerformanceCounter() API with some extra logic to adjust for the
1017       current clock speed and try to resync the raw counter to wallclock time
1018       every so often (every 30 seconds or if the timer drifts by more than
1019       0.5 of a seconds).  This extra logic may lead to occasional spurious
1020       results.
1021
1022       (It would be great if someone could contribute a patch to NYTProf to
1023       use QueryPerformanceCounter() directly and avoid the overheads and
1024       resyncing behaviour of Time::HiRes.)
1025
1026   Virtual Machines
1027       I recommend you don't do performance profiling while running in a
1028       virtual machine.  If you do you're likely to find inexplicable spikes
1029       of real-time appearing at unreasonable places in your code. You should
1030       pay less attention to the statement timings and rely more on the
1031       subroutine timings. They will still be noisy but less so than the
1032       statement times.
1033
1034       You could also try using the "clock=N" option to select a high-
1035       resolution cpu-time clock instead of a real-time one. That should be
1036       much less noisy, though you will lose visibility of wait-times due to
1037       network and disk I/O, for example.
1038

BUGS

1040       Possibly. All complex software has bugs. Let me know if you find one.
1041

SEE ALSO

1043       Screenshots of nytprofhtml v2.01 reports can be seen at
1044       <http://timbunce.files.wordpress.com/2008/07/nytprof-perlcritic-index.png>
1045       and
1046       <http://timbunce.files.wordpress.com/2008/07/nytprof-perlcritic-all-perl-files.png>.
1047       A writeup of the new features of NYTProf v2 can be found at
1048       <http://blog.timbunce.org/2008/07/15/nytprof-v2-a-major-advance-in-perl-profilers/>
1049       and the background story, explaining the "why", can be found at
1050       <http://blog.timbunce.org/2008/07/16/nytprof-v2-the-background-story/>.
1051
1052       Mailing list and discussion at
1053       <http://groups.google.com/group/develnytprof-dev>
1054
1055       Blog posts <http://blog.timbunce.org/tag/nytprof/>
1056
1057       Public Github Repository and hacking instructions at
1058       <https://github.com/timbunce/devel-nytprof/>
1059
1060       nytprofhtml is a script included that produces html reports.
1061       nytprofcsv is another script included that produces plain text CSV
1062       reports.
1063
1064       Devel::NYTProf::Reader is the module that powers the report scripts.
1065       You might want to check this out if you plan to implement a custom
1066       report (though it's very likely to be deprecated in a future release).
1067
1068       Devel::NYTProf::ReadStream is the module that lets you read a profile
1069       data file as a stream of chunks of data.
1070
1071       Other tools:
1072
1073       DTrace <https://speakerdeck.com/mrallen1/perl-dtrace-and-you>
1074

TROUBLESHOOTING

1076   "Profile data incomplete, ..." or "Profile format error: ..."
1077       This error message means the file doesn't contain all the expected data
1078       or the data has been corrupted in some way.  That may be because it was
1079       truncated (perhaps the filesystem was full) or, more commonly, because
1080       the all the expected data hasn't been written.
1081
1082       NYTProf writes some important data to the data file when finishing
1083       profiling.  If you read the file before the profiling has finished
1084       you'll get this error.
1085
1086       If the process being profiled is still running you'll need to wait
1087       until it exits cleanly (runs "END" blocks or "finish_profile" is called
1088       explicitly).
1089
1090       If the process being profiled has exited then it's likely that it met
1091       with a sudden and unnatural death that didn't give NYTProf a chance to
1092       finish the profile.  If the sudden death was due to a signal, like
1093       SIGTERM, or a SIGINT from pressing Ctrl-C, then the "sigexit=1" option
1094       may help.
1095
1096       If the sudden death was due to calling "POSIX::_exit($status)" then
1097       you'll need to call "finish_profile" before calling "POSIX::_exit".
1098
1099       You'll also get this error if the code trying to read the profile is
1100       itself being profiled. That's most likely to happen if you enable
1101       profiling via the "PERL5OPT" environment variable and have forgotten to
1102       unset it.
1103
1104       If you've encountered this error message, and you're sure you've
1105       understood the concerns described above, and you're sure they don't
1106       apply in your case, then please open an issue.  Be sure to include
1107       sufficient information so I can see how you've addressed these likely
1108       causes.
1109
1110   Some source files don't have profile information
1111       This is usually due to NYTProf being initialized after some perl files
1112       have already been compiled.
1113
1114       If you can't alter the command line to add ""-d:NYTProf"" you could try
1115       using the "PERL5OPT" environment variable. See "PROFILING".
1116
1117       You could also try using the "use_db_sub=1" option.
1118
1119   Eval ... has unknown invoking fid
1120       When using the statement profiler you may see a warning message like
1121       this:
1122
1123         Eval '(eval 2)' (fid 9, flags:viastmt,savesrc) has unknown invoking fid 10
1124
1125       Notice that the eval file id (fid 9 in this case) is lower than the
1126       file id that invoked the eval (fid 10 in this case). This is a known
1127       problem caused by the way perl works and how the profiler assigns and
1128       outputs the file ids.  The invoking fid is known but gets assigned a
1129       fid and output after the fid for the eval, and that causes the warning
1130       when the file is read.
1131
1132   Warning: %d subroutine calls had negative time
1133       There are two likely causes for this: clock instability, or accumulated
1134       timing errors.
1135
1136       Clock instability, if present on your system, is most likely to be
1137       noticeable on very small/fast subroutines that are called very few
1138       times.
1139
1140       Accumulated timing errors can arise because the subroutine profiler
1141       uses floating point values (NVs) to store the times.  They are most
1142       likely to be noticed on subroutines that are called a few times but
1143       which make a large number of calls to very fast subroutines (such as
1144       opcodes). In this case the accumulated time apparently spent making
1145       those calls can be greater than the time spent in the calling
1146       subroutine.
1147
1148       If you rerun nytprofhtml (etc.) with the "trace=N" option set >0 you'll
1149       see trace messages like  "%s call has negative time: incl %fs, excl
1150       %fs" for each affected subroutine.
1151
1152       Try profiling with the "slowops=N" option set to 0 to disable the
1153       profiling of opcodes. Since opcodes often execute in a few microseconds
1154       they are a common cause of this warning.
1155
1156       You could also try recompiling perl to use 'long doubles' for the NV
1157       floating point type (use Configure -Duselongdouble). If you try this
1158       please let me know.  I'd also happily take a patch to use long doubles,
1159       if available, by default.
1160
1161   panic: buffer overflow ...
1162       You have unusually long subroutine names in your code. You'll need to
1163       rebuild Devel::NYTProf with the NYTP_MAX_SUB_NAME_LEN environment
1164       variable set to a value longer than the longest subroutine names in
1165       your code.
1166

AUTHORS AND CONTRIBUTORS

1168       Tim Bunce (<http://www.tim.bunce.name> and <http://blog.timbunce.org>)
1169       leads the project and has done most of the development work thus far.
1170
1171       Nicholas Clark contributed zip compression and "nytprofmerge".  Chia-
1172       liang Kao contributed "nytprofcg".  Peter (Stig) Edwards contributed
1173       the VMS port.  Jan Dubois contributed the Windows port.  Gisle Aas
1174       contributed the Devel::NYTProf::ReadStream module.  Steve Peters
1175       contributed greater perl version portability and use of POSIX high-
1176       resolution clocks.  Other contributors are noted in the Changes file.
1177
1178       Many thanks to Adam Kaplan who created "NYTProf" initially by forking
1179       "Devel::FastProf" adding reporting from "Devel::Cover" and a test
1180       suite.  For more details see "HISTORY" below.
1181
1183         Copyright (C) 2008 by Adam Kaplan and The New York Times Company.
1184         Copyright (C) 2008-2016 by Tim Bunce, Ireland.
1185
1186       This library is free software; you can redistribute it and/or modify it
1187       under the same terms as Perl itself, either Perl version 5.8.8 or, at
1188       your option, any later version of Perl 5 you may have available.
1189
1190   Background
1191       Subroutine-level profilers:
1192
1193         Devel::DProf        | 1995-10-31 | ILYAZ
1194         Devel::AutoProfiler | 2002-04-07 | GSLONDON
1195         Devel::Profiler     | 2002-05-20 | SAMTREGAR
1196         Devel::Profile      | 2003-04-13 | JAW
1197         Devel::DProfLB      | 2006-05-11 | JAW
1198         Devel::WxProf       | 2008-04-14 | MKUTTER
1199
1200       Statement-level profilers:
1201
1202         Devel::SmallProf    | 1997-07-30 | ASHTED
1203         Devel::FastProf     | 2005-09-20 | SALVA
1204         Devel::NYTProf      | 2008-03-04 | AKAPLAN
1205         Devel::Profit       | 2008-05-19 | LBROCARD
1206
1207       Devel::NYTProf is a (now distant) fork of Devel::FastProf, which was
1208       itself an evolution of Devel::SmallProf.
1209
1210       Adam Kaplan forked Devel::FastProf and added html report generation
1211       (based on Devel::Cover) and a test suite - a tricky thing to do for a
1212       profiler.  Meanwhile Tim Bunce had been extending Devel::FastProf to
1213       add novel per-sub and per-block timing, plus subroutine caller
1214       tracking.
1215
1216       When Devel::NYTProf was released Tim switched to working on
1217       Devel::NYTProf because the html report would be a good way to show the
1218       extra profile data, and the test suite made development much easier and
1219       safer.
1220
1221       Then Tim went a little crazy and added a slew of new features, in
1222       addition to per-sub and per-block timing and subroutine caller
1223       tracking. These included the 'opcode interception' method of profiling,
1224       ultra-fast and robust inclusive subroutine timing, doubling
1225       performance, plus major changes to html reporting to display all the
1226       extra profile call and timing data in richly annotated and cross-linked
1227       reports.
1228
1229       Steve Peters came on board along the way with patches for portability
1230       and to keep NYTProf working with the latest development perl versions.
1231       Nicholas Clark added zip compression, many optimizations, and
1232       "nytprofmerge".  Jan Dubois contributed Windows support.
1233
1234       Adam's work was sponsored by The New York Times Co.
1235       <http://open.nytimes.com>.  Tim's work was partly sponsored by
1236       Shopzilla <http://www.shopzilla.com> during 2008 but hasn't been
1237       sponsored since then.
1238
1239       For the record, Tim has never worked for the New York Times nor has he
1240       received any kind of sponsorship or support from them in relation to
1241       NYTProf. The name of this module is simply result of the history
1242       outlined above, which can be summarised as: Adam forked an existing
1243       module when he added his enhancements and Tim didn't.
1244
1245
1246
1247perl v5.30.1                      2020-01-29                 Devel::NYTProf(3)
Impressum