1Devel::NYTProf(3)     User Contributed Perl Documentation    Devel::NYTProf(3)
2
3
4

NAME

6       Devel::NYTProf - Powerful fast feature-rich Perl source code profiler
7

SYNOPSIS

9         # profile code and write database to ./nytprof.out
10         perl -d:NYTProf some_perl.pl
11
12         # convert database into a set of html files, e.g., ./nytprof/index.html
13         # and open a web browser on the nytprof/index.html file
14         nytprofhtml --open
15
16         # or into comma separated files, e.g., ./nytprof/*.csv
17         nytprofcsv
18
19       Tim Bunce has often given talks on profiling perl code, including a
20       detailed look at how to use NYTProf and how to optimize your code. A
21       video of his YAPC::NA 2014 talk can be found at
22       <https://youtu.be/T7EK6RZAnEA>
23

DESCRIPTION

25       Devel::NYTProf is a powerful, fast, feature-rich perl source code
26       profiler.
27
28       •   Performs per-line statement profiling for fine detail
29
30       •   Performs per-subroutine statement profiling for overview
31
32       •   Performs per-opcode profiling for slow perl builtins
33
34       •   Performs per-block statement profiling (the first profiler to do
35           so)
36
37       •   Accounts correctly for time spent after calls return
38
39       •   Performs inclusive and exclusive timing of subroutines
40
41       •   Subroutine times are per calling location (a powerful feature)
42
43       •   Can profile compile-time activity, just run-time, or just END time
44
45       •   Uses novel techniques for efficient profiling
46
47       •   Sub-microsecond (100ns) resolution on supported systems
48
49       •   Very fast - the fastest statement and subroutine profilers for perl
50
51       •   Handles applications that fork, with no performance cost
52
53       •   Immune from noise caused by profiling overheads and I/O
54
55       •   Program being profiled can stop/start the profiler
56
57       •   Generates richly annotated and cross-linked html reports
58
59       •   Captures source code, including string evals, for stable results
60
61       •   Trivial to use with mod_perl - add one line to httpd.conf
62
63       •   Includes an extensive test suite
64
65       •   Tested on very large codebases
66
67       NYTProf is effectively two profilers in one: a statement profiler, and
68       a subroutine profiler.
69
70   Statement Profiling
71       The statement profiler measures the time between entering one perl
72       statement and entering the next. Whenever execution reaches a new
73       statement, the time since entering the previous statement is calculated
74       and added to the time associated with the line of the source file that
75       the previous statement starts on.
76
77       By default the statement profiler also determines the first line of the
78       current block and the first line of the current statement, and
79       accumulates times associated with those.
80
81       Another innovation unique to NYTProf is automatic compensation for a
82       problem inherent in simplistic statement-to-statement timing. Consider
83       a statement that calls a subroutine and then performs some other work
84       that doesn't execute new statements, for example:
85
86         foo(...) + mkdir(...);
87
88       In all other statement profilers the time spent in remainder of the
89       expression (mkdir in the example) will be recorded as having been spent
90       on the last statement executed in foo()! Here's another example:
91
92         while (<>) {
93            ...
94            1;
95         }
96
97       After the first time around the loop, any further time spent evaluating
98       the condition (waiting for input in this example) would be recorded as
99       having been spent on the last statement executed in the loop! (Until
100       perl bug #60954 is fixed this problem still applies to some loops. For
101       more information see
102       <http://rt.perl.org/rt3/Ticket/Display.html?id=60954>)
103
104       NYTProf avoids these problems by intercepting the opcodes which
105       indicate that control is returning into some previous statement and
106       adjusting the profile accordingly.
107
108       The statement profiler naturally generates a lot of data which is
109       streamed out to a file in a very compact format. NYTProf takes care to
110       not include the measurement and writing overheads in the profile times
111       (some profilers produce 'noisy' data due to periodic stdio flushing).
112
113   Subroutine Profiling
114       The subroutine profiler measures the time between entering a subroutine
115       and leaving it. It then increments a call count and accumulates the
116       duration.  For each subroutine called, separate counts and durations
117       are stored for each location that called the subroutine.
118
119       Subroutine entry is detected by intercepting the "entersub" opcode.
120       Subroutine exit is detected via perl's internal save stack. As a result
121       the subroutine profiler is both fast and robust.
122
123       Subroutine Recursion
124
125       For subroutines that recurse directly or indirectly, such as
126       Error::try, the inclusive time is only measured for the outer-most
127       call.
128
129       The inclusive times of recursive calls are still measured and are
130       accumulated separately. Also the 'maximum recursion depth' per calling
131       location is recorded.
132
133       Goto &Subroutine
134
135       Perl implements a "goto &destination" as a "return" followed by a call
136       to &destination, so that's how it will appear in the report.
137
138       The "goto" will be shown with a very short time because it's
139       effectively just a "return". The &destination sub will show a call not
140       from the location of the "goto" but from the location of the call to
141       the sub that performed the "goto".
142
143       accept()
144
145       The perl built-in accept() function waits listening for a connection on
146       a socket, and so is a key part of pure-perl network service
147       applications.
148
149       The time spent waiting for a remotely initiated connection can be
150       relatively high but is not relevant to the performance of the
151       application. So the accept() function is treated as a special case. The
152       subroutine profiler discounts the time spent in the accept() function.
153       It does this in a way that also discounts that time from all the
154       callers up the call stack. The effect on the reports is that all
155       accept() calls appear to be instant.
156
157       The statement profiler still shows the time actually spent in the
158       statement that executed the accept() call.
159
160   Application Profiling
161       NYTProf records extra information in the data file to capture details
162       that may be useful when analyzing the performance. It also records the
163       filename and line ranges of all the subroutines.
164
165       NYTProf can profile applications that fork, and does so with no loss of
166       performance.  NYTProf detects the fork and starts writing a new profile
167       file with the pid appended to the filename. Since nytprofhtml only
168       works with a single profile file you may want to merge multiple files
169       using nytprofmerge.
170
171   Fast Profiling
172       The NYTProf profiler is written almost entirely in C and great care has
173       been taken to ensure it's very efficient.
174
175   Apache Profiling
176       Just add one line near the start of your httpd.conf file:
177
178         PerlModule Devel::NYTProf::Apache
179
180       By default you'll get a /tmp/nytprof.$$.out file for the parent process
181       and a /tmp/nytprof.$parent.out.$$ file for each worker process.
182
183       NYTProf takes care to detect when control is returning back from perl
184       to mod_perl so time spent in mod_perl (such as waiting for the next
185       request) does not get allocated to the last statement executed.
186
187       Works with mod_perl 1 and 2. See Devel::NYTProf::Apache for more
188       information.
189

PROFILING

191       Usually you'd load Devel::NYTProf on the command line using the perl -d
192       option:
193
194         perl -d:NYTProf some_perl.pl
195
196       To save typing the ':NYTProf' you could set the "PERL5DB" environment
197       variable:
198
199         PERL5DB='use Devel::NYTProf'
200
201       and then just perl -d would work:
202
203         perl -d some_perl.pl
204
205       Or you can avoid the need to add the -d option at all by using the
206       "PERL5OPT" environment variable|perlrun/PERL5OPT>:
207
208         PERL5OPT=-d:NYTProf
209
210       That's also very handy when you can't alter the perl command line being
211       used to run the script you want to profile. Usually you'll want to
212       enable the "addpid=1" option to ensure any nested invocations of perl
213       don't overwrite the profile.
214

NYTPROF ENVIRONMENT VARIABLE

216       The behavior of Devel::NYTProf may be modified by setting the
217       environment variable "NYTPROF".  It is possible to use this environment
218       variable to effect multiple setting by separating the values with a
219       ":".  For example:
220
221         export NYTPROF=trace=2:start=init:file=/tmp/nytprof.out
222
223       Any colon or equal characters in a value can be escaped by preceding
224       them with a backslash.
225
226   addpid=1
227       Append the current process id to the end of the filename.
228
229       This avoids concurrent, or consecutive, processes from overwriting the
230       same file.  If a fork is detected during profiling then the child
231       process will automatically add the process id to the filename.
232
233   addtimestamp=1
234       Append the current time, as integer epoch seconds, to the end of the
235       filename.
236
237   trace=N
238       Set trace level to N. 0 is off (the default). Higher values cause more
239       detailed trace output. Trace output is written to STDERR or wherever
240       the "log=F" option has specified.
241
242   log=F
243       Specify the name of the file that "trace=N" output should be written
244       to.
245
246   start=...
247       Specify at which phase of program execution the profiler should be
248       enabled:
249
250         start=begin - start immediately (the default)
251         start=init  - start at beginning of INIT phase (after compilation/use/BEGIN)
252         start=end   - start at beginning of END phase
253         start=no    - don't automatically start
254
255       The start=no option is handy if you want to explicitly control
256       profiling by calling DB::enable_profile() and DB::disable_profile()
257       yourself.  See "RUN-TIME CONTROL OF PROFILING".
258
259       The start=init option is handy if you want to avoid profiling the
260       loading and initialization of modules.
261
262   optimize=0
263       Disable the perl optimizer.
264
265       By default NYTProf leaves perl's optimizer enabled.  That gives you
266       more accurate profile timing overall, but can lead to odd statement
267       counts for individual sets of lines. That's because the perl's peephole
268       optimizer has effectively rewritten the statements but you can't see
269       what the rewritten version looks like.
270
271       For example:
272
273         1     if (...) {
274         2         return;
275         3     }
276
277       may be rewritten as
278
279         1    return if (...)
280
281       so the profile won't show a statement count for line 2 in your source
282       code because the "return" was merged into the "if" statement on the
283       preceding line.
284
285       Also 'empty' statements like "1;" are removed entirely.  Such
286       statements are empty because the optimizer has already removed the
287       pointless constant in void context. It then goes on to remove the now
288       empty statement (in perl >= 5.13.7).
289
290       Using the "optimize=0" option disables the optimizer so you'll get
291       lower overall performance but more accurately assigned statement
292       counts.
293
294       If you find any other examples of the effect of optimizer on NYTProf
295       output (other than performance, obviously) please let us know.
296
297   subs=0
298       Set to 0 to disable the collection of subroutine caller and timing
299       details.
300
301   blocks=1
302       Set to 1 to enable the determination of block and subroutine location
303       per statement.  This makes the profiler about 50% slower (as of July
304       2008) and produces larger output files, but you gain some valuable
305       insight in where time is spent in the blocks within large subroutines
306       and scripts.
307
308   stmts=0
309       Set to 0 to disable the statement profiler. (Implies "blocks=0".)  The
310       reports won't contain any statement timing detail.
311
312       This significantly reduces the overhead of the profiler and can also be
313       useful for profiling large applications that would normally generate a
314       very large profile data file.
315
316   calls=N
317       This option is new and experimental.
318
319       With calls=1 (the default) subroutine call return events are emitted
320       into the data stream as they happen.  With calls=2 subroutine call
321       entry events are also emitted. With calls=0 no subroutine call events
322       are produced.  This option depends on the "subs" option being enabled,
323       which it is by default.
324
325       The nytprofcalls utility can be used to process this data. It too is
326       new and experimental and so likely to change.
327
328       The subroutine profiler normally gathers data in memory and outputs a
329       summary when the profile data is being finalized, usually when the
330       program has finished.  The summary contains aggregate information for
331       all the calls from one location to another, but the details of
332       individual calls have been lost.  The calls option enables the
333       recording of individual call events and thus more detailed analysis and
334       reporting of that data.
335
336   leave=0
337       Set to 0 to disable the extra work done by the statement profiler to
338       allocate times accurately when returning into the middle of statement.
339       For example leaving a subroutine and returning into the middle of
340       statement, or re-evaluating a loop condition.
341
342       This feature also ensures that in embedded environments, such as
343       mod_perl, the last statement executed doesn't accumulate the time spent
344       'outside perl'.
345
346   findcaller=1
347       Force NYTProf to recalculate the name of the caller of the each sub
348       instead of 'inheriting' the name calculated when the caller was
349       entered. (Rarely needed, but might be useful in some odd cases.)
350
351   use_db_sub=1
352       Set to 1 to enable use of the traditional DB::DB() subroutine to
353       perform profiling, instead of the faster 'opcode redirection' technique
354       that's used by default. Also effectively sets "leave=0" (see above).
355
356       The default 'opcode redirection' technique can't profile subroutines
357       that were compiled before NYTProf was loaded. So using use_db_sub=1 can
358       be useful in cases where you can't load the profiler early in the life
359       of the application.
360
361       Another side effect of "use_db_sub=1" is that it enables recording of
362       the source code of the "perl -e '...'" and "perl -" input for old
363       versions of perl. See also "savesrc=0".
364
365   savesrc=0
366       Disable the saving of source code.
367
368       By default NYTProf saves a copy of all source code into the profile
369       data file.  This makes the file self-contained, so the reporting tools
370       no longer depend on having the unmodified source code files available.
371
372       With "savesrc=0" some source code is still saved: the arguments to the
373       "perl -e" option, the script fed to perl via STDIN when using "perl -",
374       and the source code of string evals.
375
376       Saving the source code of string evals requires perl version 5.8.9+,
377       5.10.1+, or 5.12 or later.
378
379       Saving the source code of the "perl -e '...'" or "perl -" input
380       requires either a recent perl version, as above, or setting the
381       "use_db_sub=1" option.
382
383   slowops=N
384       Profile perl opcodes that can be slow. These include opcodes that make
385       system calls, such as "print", "read", "sysread", "socket" etc., plus
386       regular expression opcodes like "subst" and "match".
387
388       If "N" is 0 then slowops profiling is disabled.
389
390       If "N" is 1 then all the builtins are treated as being defined in the
391       "CORE" package. So times for "print" calls from anywhere in your code
392       are merged and accounted for as calls to an xsub called "CORE::print".
393
394       If "N" is 2 (the default) then builtins are treated as being defined in
395       the package that calls them. So calls to "print" from package "Foo" are
396       treated as calls to an xsub called "Foo::CORE:print". Note the single
397       colon after CORE.
398
399       The opcodes are currently profiled using their internal names, so
400       "printf" is "prtf" and the "-x" file test is "fteexec". This may change
401       in future.
402
403       Opcodes that call subroutines, perhaps by triggering a FETCH from a
404       tied variable, currently appear in the call tree as the caller of the
405       sub. This is likely to change in future.
406
407   usecputime=1
408       This option has been removed. Profiling won't be enabled if set.
409
410       Use the "clock=N" option to select a high-resolution CPU time clock, if
411       available on your system, instead. That will give you higher resolution
412       and work for the subroutine profiler as well.
413
414   file=...
415       Specify the output file to write profile data to (default:
416       './nytprof.out').
417
418   compress=...
419       Specify the compression level to use, if NYTProf is compiled with
420       compression support. Valid values are 0 to 9, with 0 disabling
421       compression. The default is 6 as higher values yield little extra
422       compression but the cpu cost starts to rise significantly. Using level
423       1 still gives you a significant reduction in file size.
424
425       If NYTProf was not compiled with compression support, this option is
426       silently ignored.
427
428   clock=N
429       Systems which support the clock_gettime() system call typically support
430       several clocks. By default NYTProf uses CLOCK_MONOTONIC.
431
432       This option enables you to select a different clock by specifying the
433       integer id of the clock (which may vary between operating system
434       types).  If the clock you select isn't available then CLOCK_REALTIME is
435       used.
436
437       See "CLOCKS" for more information.
438
439   sigexit=1
440       When perl exits normally it runs any code defined in "END" blocks.
441       NYTProf defines an END block that finishes profiling and writes out the
442       final profile data.
443
444       If the process ends due to a signal then END blocks are not executed so
445       the profile will be incomplete and unusable.  The "sigexit" option
446       tells NYTProf to catch some signals (e.g. INT, HUP, PIPE, SEGV, BUS)
447       and ensure a usable profile by executing:
448
449           DB::finish_profile();
450           exit 1;
451
452       You can also specify which signals to catch in this way by listing
453       them, separated by commas, as the value of the option (case is not
454       significant):
455
456           sigexit=int,hup
457
458   posix_exit=1
459       The NYTProf subroutine profiler normally detects calls to
460       POSIX::_exit() (which exits the process without running END blocks) and
461       automatically calls DB::finish_profile() for you, so NYTProf 'just
462       works'.
463
464       When using the "subs=0" option to disable the subroutine profiler the
465       "posix_exit" option can be used to tell NYTProf to take other steps to
466       arrange for DB::finish_profile() to be called before POSIX::_exit().
467
468   libcexit=1
469       Arranges for "finish_profile" to be called via the C library atexit()
470       function.  This may help some tricky cases where the process may exit
471       without perl executing the "END" block that NYTProf uses to call
472       /finish_profile().
473
474   endatexit=1
475       Sets the PERL_EXIT_DESTRUCT_END flag in the PL_exit_flags of the perl
476       interpreter.  This makes perl run "END" blocks in perl_destruct()
477       instead of perl_run() which may help in cases, like Apache, where perl
478       is embedded but perl_run() isn't called.
479
480   forkdepth=N
481       When a perl process that is being profiled executes a fork() the child
482       process is also profiled. The forkdepth option can be used to control
483       this. If forkdepth is zero then profiling will be disabled in the child
484       process.
485
486       If forkdepth is greater than zero then profiling will be enabled in the
487       child process and the forkdepth value in that process is decremented by
488       one.
489
490       If forkdepth is -1 (the default) then there's no limit on the number of
491       generations of children that are profiled.
492
493   nameevals=0
494       The 'file name' of a string eval is normally a string like ""(eval
495       N)"", where "N" is a sequence number. By default NYTProf asks perl to
496       give evals more informative names like ""(eval N)[file:line]"", where
497       "file" and "line" are the file and line number where the string "eval"
498       was executed.
499
500       The "nameevals=0" option can be used to disable the more informative
501       names and return to the default behaviour. This may be need in rare
502       cases where the application code is sensitive to the name given to a
503       "eval". (The most common case in when running test suites undef
504       NYTProf.)
505
506       The downside is that the NYTProf reporting tools are less useful and
507       may get confused if this option is used.
508
509   nameanonsubs=0
510       The name of a anonymous subroutine is normally ""__ANON__"".  By
511       default NYTProf asks perl to give anonymous subroutines more
512       informative names like ""__ANON__[file:line]"", where "file" and "line"
513       are the file and line number where the anonymous subroutine was
514       defined.
515
516       The "nameanonsubs=0" option can be used to disable the more informative
517       names and return to the default behaviour. This may be need in rare
518       cases where the application code is sensitive to the name given to a
519       anonymous subroutines.  (The most common case in when running test
520       suites undef NYTProf.)
521
522       The downside is that the NYTProf reporting tools are less useful and
523       may get confused if this option is used.
524

RUN-TIME CONTROL OF PROFILING

526       You can profile only parts of an application by calling
527       DB::disable_profile() to stop collecting profile data, and calling
528       DB::enable_profile() to start collecting profile data.
529
530       Using the "start=no" option lets you leave the profiler disabled
531       initially until you call DB::enable_profile() at the right moment. You
532       still need to load Devel::NYTProf as early as possible, even if you
533       don't call DB::enable_profile() until much later. That's because any
534       code that's compiled before Devel::NYTProf is loaded will not be
535       profiled by default. See also "use_db_sub=1".
536
537       The profile output file can't be used until it's been properly
538       completed and closed.  Calling DB::disable_profile() doesn't do that.
539       To make a profile file usable before the profiled application has
540       completed you can call DB::finish_profile(). Alternatively you could
541       call DB::enable_profile($newfile).
542
543       Always call the DB::enable_profile(), DB::disable_profile() or
544       DB::finish_profile() function with the "DB::" prefix as shown because
545       you can't import them. They're provided automatically when NYTProf is
546       in use.
547
548   disable_profile
549         DB::disable_profile()
550
551       Stops collection of profile data until DB:enable_profile() is called.
552
553       Subroutine calls which were made while profiling was enabled and are
554       still on the call stack (have not yet exited) will still have their
555       profile data collected when they exit. Compare with "finish_profile"
556       below.
557
558   enable_profile
559         DB::enable_profile($newfile)
560         DB::enable_profile()
561
562       Enables collection of profile data. If $newfile is specified the
563       profile data will be written to $newfile (after completing and closing
564       the previous file, if any).  If $newfile already exists it will be
565       deleted first.  If DB::enable_profile() is called without a filename
566       argument then profile data will continue to be written to the current
567       file (nytprof.out by default).
568
569   finish_profile
570         DB::finish_profile()
571
572       Calls DB::disable_profile(), then completes the profile data file by
573       writing subroutine profile data, and then closes the file. The in
574       memory subroutine profile data is then discarded.
575
576       Normally NYTProf arranges to call finish_profile() for you via an END
577       block.
578

DATA COLLECTION AND INTERPRETATION

580       NYTProf tries very hard to gather accurate information.  The nature of
581       the internals of perl mean that, in some cases, the information that's
582       gathered is accurate but surprising. In some cases it can appear to be
583       misleading.  (Of course, in some cases it may actually be plain wrong.
584       Caveat lector.)
585
586   If Statement and Subroutine Timings Don't Match
587       NYTProf has two profilers: a statement profiler that's invoked when
588       perl moves from one perl statement to another, and a subroutine
589       profiler that's invoked when perl calls or returns from a subroutine.
590
591       The individual statement timings for a subroutine usually add up to
592       slightly less than the exclusive time for the subroutine. That's
593       because the handling of the subroutine call and return overheads is
594       included in the exclusive time for the subroutine. The difference may
595       only be a few microseconds but that may become noticeable for
596       subroutines that are called hundreds of thousands of times.
597
598       The statement profiler keeps track how much time was spent on
599       overheads, like writing statement profile data to disk. The subroutine
600       profiler subtracts the overheads that have accumulated between entering
601       and leaving the subroutine in order to give a more accurate profile.
602       The statement profiler is generally very fast because most writes get
603       buffered for zip compression so the profiler overhead per statement
604       tends to be very small, often a single 'tick'.  The result is that the
605       accumulated overhead is quite noisy. This becomes more significant for
606       subroutines that are called frequently and are also fast.  This may be
607       another, smaller, contribution to the discrepancy between statement
608       time and exclusive times.
609
610   If Headline Subroutine Timings Don't Match the Called Subs
611       Overall subroutine times are reported with a headline like "spent 10s
612       (2+8) within ...".  In this example, 10 seconds were spent inside the
613       subroutine (the "inclusive time") and, of that, 8 seconds were spent in
614       subroutines called by this one.  That leaves 2 seconds as the time
615       spent in the subroutine code itself (the "exclusive time", sometimes
616       also called the "self time").
617
618       The report shows the source code of the subroutine. Lines that make
619       calls to other subroutines are annotated with details of the time spent
620       in those calls.
621
622       Sometimes the sum of the times for calls made by the lines of code in
623       the subroutine is less than the inclusive-exclusive time reported in
624       the headline (10-2 = 8 seconds in the example above).
625
626       What's happening here is that calls to other subroutines are being made
627       but NYTProf isn't able to determine the calling location correctly so
628       the calls don't appear in the report in the correct place.
629
630       Using an old version of perl is one cause (see below). Another is
631       calling subroutines that exit via "goto &sub;" - most frequently
632       encountered in AUTOLOAD subs and code using the Memoize module.
633
634       In general the overall subroutine timing is accurate and should be
635       trusted more than the sum of statement or nested sub call timings.
636
637   Perl 5.10.1+ (or else 5.8.9+) is Recommended
638       These versions of perl yield much more detailed information about calls
639       to BEGIN, CHECK, INIT, and END blocks, the code handling tied or
640       overloaded variables, and callbacks from XS code.
641
642       Perl 5.12 will hopefully also fix an inaccuracy in the timing of the
643       last statement and the condition clause of some kinds of loops:
644       <http://rt.perl.org/rt3/Ticket/Display.html?id=60954>
645
646   eval $string
647       Perl treats each execution of a string eval ("eval $string;" not "eval
648       { ...  }") as a distinct file, so NYTProf does as well. The 'files' are
649       given names with this structure:
650
651               (eval $sequence)[$filename:$line]
652
653       for example ""(eval 93)[/foo/bar.pm:42]"" would be the name given to
654       the 93rd execution of a string eval by that process and, in this case,
655       the 93rd eval happened to be one at line 42 of "/foo/bar.pm".
656
657       Nested string evals can give rise to file names like
658
659               (eval 1047)[(eval 93)[/foo/bar.pm:42]:17]
660
661       Merging Evals
662
663       Some applications execute a great many string eval statements. If
664       NYTProf generated a report page for each one it would not only slow
665       report generation but also make the overall report less useful by
666       scattering performance data too widely.  On the other hand, being able
667       to see the actual source code executed by an eval, along with the
668       timing details, is often very useful.
669
670       To try to balance these conflicting needs, NYTProf currently merges
671       uninteresting string eval siblings.
672
673       What does that mean? Well, for each source code line that executed any
674       string evals, NYTProf first gathers the corresponding eval 'files' for
675       that line (known as the 'siblings') into groups keyed by distinct
676       source code.
677
678       Then, for each of those groups of siblings, NYTProf will 'merge' a
679       group that shares the same source code and doesn't execute any string
680       evals itself.  Merging means to pick one sibling as the survivor and
681       merge and delete all the data from the others into it.
682
683       If there are a large number of sibling groups then the data for all of
684       them are merged into one regardless.
685
686       The report annotations will indicate when evals have been merged
687       together.
688
689       Merging Anonymous Subroutines
690
691       Anonymous subroutines defined within string evals have names like this:
692
693               main::__ANON__[(eval 75)[/foo/bar.pm:42]:12]
694
695       That anonymous subroutine was defined on line 12 of the source code
696       executed by the string eval on line 42 of /foo/bar.pm. That was the
697       75th string eval executed by the program.
698
699       Anonymous subroutines defined on the same line of sibling evals that
700       get merged are also merged. That is, the profile information is merged
701       into one and the others are discarded.
702
703       Timing
704
705       Care should be taken when interpreting the report annotations
706       associated with a string eval statement.  Normally the report
707       annotations embedded into the source code related to timings from the
708       subroutine profiler. This isn't (currently) true of annotations for
709       string eval statements.
710
711       This makes a significant different if the eval defines any subroutines
712       that get called after the eval has returned. Because the time shown for
713       a string eval is based on the statement times it will include time
714       spent executing statements within the subs defined by the eval.
715
716       In future NYTProf may involve the subroutine profiler in timings evals
717       and so be able to avoid this issue.
718
719   Calls from XSUBs and Opcodes
720       Calls record the current filename and line number of the perl code at
721       the time the call was made. That's fine and accurate for calls from
722       perl code. For calls that originate from C code however, such as an
723       XSUB or an opcode, the filename and line number recorded are still
724       those of the last perl statement executed.
725
726       For example, a line that calls an xsub will appear in reports to also
727       have also called any subroutines that that xsub called. This can be
728       construed as a feature.
729
730       As an extreme example, the first time a regular expression that uses
731       character classes is executed on a unicode string you'll find profile
732       data like this:
733
734             # spent 1ms within main::BEGIN@4 which was called
735             #    once (1ms+0s) by main::CORE:subst at line 0
736         4   s/ (?: [A-Z] | [\d] )+ (?= [\s] ) //x;
737             # spent  38.8ms making 1 call to main::CORE:subst
738             # spent  25.4ms making 2 calls to utf8::SWASHNEW, avg 12.7ms/call
739             # spent  12.4ms making 1 call to utf8::AUTOLOAD
740

MAKING NYTPROF FASTER

742       You can reduce the cost of profiling by adjusting some options. The
743       trade-off is reduced detail and/or accuracy in reports.
744
745       If you don't need statement-level profiling then you can disable it via
746       "stmts=0".  To further boost statement-level profiling performance try
747       "leave=0" but note that will apportion timings for some kinds of
748       statements less accurate).
749
750       If you don't need call stacks or flamegraph then disable it via
751       calls=0.  If you don't need subroutine profiling then you can disable
752       it via "subs=0".  If you do need it but don't need timings for perl
753       opcodes then set slowops=0.
754
755       Generally speaking, setting calls=0 and slowops=0 will give you a
756       useful boost with the least loss of detail.
757
758       Another approach is to only enable NYTProf in the sections of code that
759       interest you. See "RUN-TIME CONTROL OF PROFILING" for more details.
760
761       To speed up nytprofhtml try using the --minimal (-m) or --no-flame
762       options.
763

REPORTS

765       The Devel::NYTProf::Data module provides a low-level interface for
766       loading the profile data.
767
768       The Devel::NYTProf::Reader module provides an interface for generating
769       arbitrary reports.  This means that you can implement your own output
770       format in perl. (Though the module is in a state of flux and may be
771       deprecated soon.)
772
773       Included in the bin directory of this distribution are some scripts
774       which turn the raw profile data into more useful formats:
775
776   nytprofhtml
777       Creates attractive, richly annotated, and fully cross-linked html
778       reports (including statistics, source code and color highlighting).
779       This is the main report generation tool for NYTProf.
780
781   nytprofcg
782       Translates a profile into a format that can be loaded into KCachegrind
783       <http://kcachegrind.github.io/>
784
785   nytprofcalls
786       Reads a profile and processes the calls events it contains.
787
788   nytprofmerge
789       Reads multiple profile data files and writes out a new file containing
790       the merged profile data.
791

LIMITATIONS

793   Threads and Multiplicity
794       "Devel::NYTProf" is not currently thread safe or multiplicity safe.  If
795       you'd be interested in helping to fix that then please get in touch
796       with us. Meanwhile, profiling is disabled when a thread is created, and
797       NYTProf tries to ignore any activity from perl interpreters other than
798       the first one that loaded it.
799
800   Coro
801       The "Devel::NYTProf" subroutine profiler gets confused by the stack
802       gymnastics performed by the Coro module and aborts. When profiling
803       applications that use Coro you should disable the subroutine profiler
804       using the "subs=0" option.
805
806   FCGI::Engine
807       Using open('-|') in code running under FCGI::Engine causes a panic in
808       nytprofcalls.  See https://github.com/timbunce/devel-nytprof/issues/20
809       for more information.
810
811   For perl < 5.8.8 it may change what caller() returns
812       For example, the Readonly module croaks with "Invalid tie" when
813       profiled with perl versions before 5.8.8. That's because Readonly
814       explicitly checking for certain values from caller(). The NEXT module
815       is also affected.
816
817   For perl < 5.10.1 it can't see some implicit calls and callbacks
818       For perl versions prior to 5.8.9 and 5.10.1, some implicit subroutine
819       calls can't be seen by the subroutine profiler. Technically this
820       affects calls made via the various perl "call_*()" internal APIs.
821
822       For example, BEGIN/CHECK/INIT/END blocks, the "TIE"whatever subroutine
823       called by tie(), all calls made via operator overloading, and callbacks
824       from XS code, are not seen.
825
826       The effect is that time in those subroutines is accumulated by the subs
827       that triggered the call to them. So time spent in calls invoked by perl
828       to handle overloading are accumulated by the subroutines that trigger
829       overloading (so it is measured, but the cost is dispersed across
830       possibly many calling locations).
831
832       Although the calls aren't seen by the subroutine profiler, the
833       individual statements executed by the code in the called subs are
834       profiled by the statement profiler.
835
836   #line directives
837       The reporting code currently doesn't handle #line directives, but at
838       least it warns about them. Patches welcome.
839
840   Freed values in @_ may be mutated
841       Perl has a class of bugs related to the fact that values placed in the
842       stack are not reference counted. Consider this example:
843
844         @a = (1..9);  sub s { undef @a; print $_ for @_ }  s(@a);
845
846       The "undef @a" frees the values that @_ refers to. Perl can sometimes
847       detect when a freed value is accessed and treats it as an undef.
848       However, if the freed value is assigned some new value then @_ is
849       effectively corrupted.
850
851       NYTProf allocates new values while it's profiling, in order to record
852       program activity, and so may appear to corrupt @_ in this (rare)
853       situation.  If this happens, NYTProf is simply exposing an existing
854       problem in the code.
855
856   Lvalue subroutines aren't profiled when using use_db_sub=1
857       Currently 'lvalue' subroutines (subs that can be assigned to, like
858       foo() = 42) are not profiled when using use_db_sub=1.
859

CLOCKS

861       Here we discuss the way NYTProf gets high-resolution timing information
862       from your system and related issues.
863
864   POSIX Clocks
865       These are the clocks that your system may support if it supports the
866       POSIX clock_gettime() function. Other clock sources are listed in the
867       "Other Clocks" section below.
868
869       The clock_gettime() interface allows clocks to return times to
870       nanosecond precision. Of course few offer nanosecond accuracy but the
871       extra precision helps reduce the cumulative error that naturally occurs
872       when adding together many timings. When using these clocks NYTProf
873       outputs timings as a count of 100 nanosecond ticks.
874
875       CLOCK_MONOTONIC
876
877       CLOCK_MONOTONIC represents the amount of time since an unspecified
878       point in the past (typically system start-up time).  It increments
879       uniformly independent of adjustments to 'wallclock time'. NYTProf will
880       use this clock by default, if available.
881
882       CLOCK_REALTIME
883
884       CLOCK_REALTIME is typically the system's main high resolution 'wall
885       clock time' source.  The same source as used for the gettimeofday()
886       call used by most kinds of perl benchmarking and profiling tools.
887
888       The problem with real time is that it's far from simple. It tends to
889       drift and then be reset to match 'reality', either sharply or by small
890       adjustments (via the adjtime() system call).
891
892       Surprisingly, it can also go backwards, for reasons explained in
893       http://preview.tinyurl.com/5wawnn so CLOCK_MONOTONIC is preferred.
894
895       CLOCK_VIRTUAL
896
897       CLOCK_VIRTUAL increments only when the CPU is running in user mode on
898       behalf of the calling process.
899
900       CLOCK_PROF
901
902       CLOCK_PROF increments when the CPU is running in user or kernel mode.
903
904       CLOCK_PROCESS_CPUTIME_ID
905
906       CLOCK_PROCESS_CPUTIME_ID represents the amount of execution time of the
907       process associated with the clock.
908
909       CLOCK_THREAD_CPUTIME_ID
910
911       CLOCK_THREAD_CPUTIME_ID represents the amount of execution time of the
912       thread associated with the clock.
913
914       Finding Available POSIX Clocks
915
916       On unix-like systems you can find the CLOCK_* clocks available on you
917       system using a command like:
918
919         grep -r 'define *CLOCK_' /usr/include
920
921       Look for a group that includes CLOCK_REALTIME. The integer values
922       listed are the clock ids that you can use with the "clock=N" option.
923
924       A future version of NYTProf should be able to list the supported
925       clocks.
926
927   Other Clocks
928       This section lists other clock sources that NYTProf may use.
929
930       If your system doesn't support clock_gettime() then NYTProf will use
931       gettimeofday(), or the nearest equivalent,
932
933       gettimeofday
934
935       This is the traditional high resolution time of day interface for most
936       unix-like systems.  With this clock NYTProf outputs timings as a count
937       of 1 microsecond ticks.
938
939       mach_absolute_time
940
941       On Mac OS X the mach_absolute_time() function is used. With this clock
942       NYTProf outputs timings as a count of 100 nanosecond ticks.
943
944       Time::HiRes
945
946       On systems which don't support other clocks, NYTProf falls back to
947       using the Time::HiRes module.  With this clock NYTProf outputs timings
948       as a count of 1 microsecond ticks.
949
950   Clock References
951       Relevant specifications and manual pages:
952
953         http://www.opengroup.org/onlinepubs/000095399/functions/clock_getres.html
954         http://linux.die.net/man/3/clock_gettime
955
956       Why 'realtime' can appear to go backwards:
957
958         http://preview.tinyurl.com/5wawnn
959
960       The PostgreSQL pg_test_timing utility documentation has a good summary
961       of timing issues:
962
963         http://www.postgresql.org/docs/9.2/static/pgtesttiming.html
964

CAVEATS

966   SMP Systems
967       On systems with multiple processors, which includes most modern
968       machines, (from Linux docs though applicable to most SMP systems):
969
970         The CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID clocks are realized on
971         many platforms using timers from the CPUs (TSC on i386, AR.ITC on Itanium).
972         These registers may differ between CPUs and as a consequence these clocks may
973         return bogus results if a process is migrated to another CPU.
974
975         If the CPUs in an SMP system have different clock sources then there is no way
976         to maintain a correlation between the timer registers since each CPU will run
977         at a slightly different frequency. If that is the case then
978         clock_getcpuclockid(0) will return ENOENT to signify this condition. The two
979         clocks will then only be useful if it can be ensured that a process stays on a
980         certain CPU.
981
982         The processors in an SMP system do not start all at exactly the same time and
983         therefore the timer registers are typically running at an offset. Some
984         architectures include code that attempts to limit these offsets on bootup.
985         However, the code cannot guarantee to accurately tune the offsets. Glibc
986         contains no provisions to deal with these offsets (unlike the Linux Kernel).
987         Typically these offsets are small and therefore the effects may be negligible
988         in most cases.
989
990       In summary, SMP systems are likely to give 'noisy' profiles.  Setting a
991       "Processor Affinity" may help.
992
993       Processor Affinity
994
995       Processor affinity is an aspect of task scheduling on SMP systems.
996       "Processor affinity takes advantage of the fact that some remnants of a
997       process may remain in one processor's state (in particular, in its
998       cache) from the last time the process ran, and so scheduling it to run
999       on the same processor the next time could result in the process running
1000       more efficiently than if it were to run on another processor." (From
1001       http://en.wikipedia.org/wiki/Processor_affinity)
1002
1003       Setting an explicit processor affinity can avoid the problems described
1004       in "SMP Systems".
1005
1006       Processor affinity can be set using the "taskset" command on Linux.
1007
1008       Note that processor affinity is inherited by child processes, so if the
1009       process you're profiling spawns cpu intensive sub processes then your
1010       process will be impacted by those more than it otherwise would.
1011
1012       Windows
1013
1014       THIS SECTION DOESN'T MATCH THE CODE
1015
1016       On Windows NYTProf uses Time::HiRes which uses the windows
1017       QueryPerformanceCounter() API with some extra logic to adjust for the
1018       current clock speed and try to resync the raw counter to wallclock time
1019       every so often (every 30 seconds or if the timer drifts by more than
1020       0.5 of a seconds).  This extra logic may lead to occasional spurious
1021       results.
1022
1023       (It would be great if someone could contribute a patch to NYTProf to
1024       use QueryPerformanceCounter() directly and avoid the overheads and
1025       resyncing behaviour of Time::HiRes.)
1026
1027   Virtual Machines
1028       I recommend you don't do performance profiling while running in a
1029       virtual machine.  If you do you're likely to find inexplicable spikes
1030       of real-time appearing at unreasonable places in your code. You should
1031       pay less attention to the statement timings and rely more on the
1032       subroutine timings. They will still be noisy but less so than the
1033       statement times.
1034
1035       You could also try using the "clock=N" option to select a high-
1036       resolution cpu-time clock instead of a real-time one. That should be
1037       much less noisy, though you will lose visibility of wait-times due to
1038       network and disk I/O, for example.
1039

BUGS

1041       Possibly. All complex software has bugs. Let me know if you find one.
1042

SEE ALSO

1044       Screenshots of nytprofhtml v2.01 reports can be seen at
1045       <http://timbunce.files.wordpress.com/2008/07/nytprof-perlcritic-index.png>
1046       and
1047       <http://timbunce.files.wordpress.com/2008/07/nytprof-perlcritic-all-perl-files.png>.
1048       A writeup of the new features of NYTProf v2 can be found at
1049       <http://blog.timbunce.org/2008/07/15/nytprof-v2-a-major-advance-in-perl-profilers/>
1050       and the background story, explaining the "why", can be found at
1051       <http://blog.timbunce.org/2008/07/16/nytprof-v2-the-background-story/>.
1052
1053       Mailing list and discussion at
1054       <http://groups.google.com/group/develnytprof-dev>
1055
1056       Blog posts <http://blog.timbunce.org/tag/nytprof/>
1057
1058       Public Github Repository and hacking instructions at
1059       <https://github.com/timbunce/devel-nytprof/>
1060
1061       nytprofhtml is a script included that produces html reports.
1062       nytprofcsv is another script included that produces plain text CSV
1063       reports.
1064
1065       Devel::NYTProf::Reader is the module that powers the report scripts.
1066       You might want to check this out if you plan to implement a custom
1067       report (though it's very likely to be deprecated in a future release).
1068
1069       Devel::NYTProf::ReadStream is the module that lets you read a profile
1070       data file as a stream of chunks of data.
1071
1072       Other tools:
1073
1074       DTrace <https://speakerdeck.com/mrallen1/perl-dtrace-and-you>
1075

TROUBLESHOOTING

1077   "Profile data incomplete, ..." or "Profile format error: ..."
1078       This error message means the file doesn't contain all the expected data
1079       or the data has been corrupted in some way.  That may be because it was
1080       truncated (perhaps the filesystem was full) or, more commonly, because
1081       the all the expected data hasn't been written.
1082
1083       NYTProf writes some important data to the data file when finishing
1084       profiling.  If you read the file before the profiling has finished
1085       you'll get this error.
1086
1087       If the process being profiled is still running you'll need to wait
1088       until it exits cleanly (runs "END" blocks or "finish_profile" is called
1089       explicitly).
1090
1091       If the process being profiled has exited then it's likely that it met
1092       with a sudden and unnatural death that didn't give NYTProf a chance to
1093       finish the profile.  If the sudden death was due to a signal, like
1094       SIGTERM, or a SIGINT from pressing Ctrl-C, then the "sigexit=1" option
1095       may help.
1096
1097       If the sudden death was due to calling POSIX::_exit($status) then
1098       you'll need to call "finish_profile" before calling "POSIX::_exit".
1099
1100       You'll also get this error if the code trying to read the profile is
1101       itself being profiled. That's most likely to happen if you enable
1102       profiling via the "PERL5OPT" environment variable and have forgotten to
1103       unset it.
1104
1105       If you've encountered this error message, and you're sure you've
1106       understood the concerns described above, and you're sure they don't
1107       apply in your case, then please open an issue.  Be sure to include
1108       sufficient information so I can see how you've addressed these likely
1109       causes.
1110
1111   Some source files don't have profile information
1112       This is usually due to NYTProf being initialized after some perl files
1113       have already been compiled.
1114
1115       If you can't alter the command line to add ""-d:NYTProf"" you could try
1116       using the "PERL5OPT" environment variable. See "PROFILING".
1117
1118       You could also try using the "use_db_sub=1" option.
1119
1120   Eval ... has unknown invoking fid
1121       When using the statement profiler you may see a warning message like
1122       this:
1123
1124         Eval '(eval 2)' (fid 9, flags:viastmt,savesrc) has unknown invoking fid 10
1125
1126       Notice that the eval file id (fid 9 in this case) is lower than the
1127       file id that invoked the eval (fid 10 in this case). This is a known
1128       problem caused by the way perl works and how the profiler assigns and
1129       outputs the file ids.  The invoking fid is known but gets assigned a
1130       fid and output after the fid for the eval, and that causes the warning
1131       when the file is read.
1132
1133   Warning: %d subroutine calls had negative time
1134       There are two likely causes for this: clock instability, or accumulated
1135       timing errors.
1136
1137       Clock instability, if present on your system, is most likely to be
1138       noticeable on very small/fast subroutines that are called very few
1139       times.
1140
1141       Accumulated timing errors can arise because the subroutine profiler
1142       uses floating point values (NVs) to store the times.  They are most
1143       likely to be noticed on subroutines that are called a few times but
1144       which make a large number of calls to very fast subroutines (such as
1145       opcodes). In this case the accumulated time apparently spent making
1146       those calls can be greater than the time spent in the calling
1147       subroutine.
1148
1149       If you rerun nytprofhtml (etc.) with the "trace=N" option set >0 you'll
1150       see trace messages like  "%s call has negative time: incl %fs, excl
1151       %fs" for each affected subroutine.
1152
1153       Try profiling with the "slowops=N" option set to 0 to disable the
1154       profiling of opcodes. Since opcodes often execute in a few microseconds
1155       they are a common cause of this warning.
1156
1157       You could also try recompiling perl to use 'long doubles' for the NV
1158       floating point type (use Configure -Duselongdouble). If you try this
1159       please let me know.  I'd also happily take a patch to use long doubles,
1160       if available, by default.
1161
1162   panic: buffer overflow ...
1163       You have unusually long subroutine names in your code. You'll need to
1164       rebuild Devel::NYTProf with the NYTP_MAX_SUB_NAME_LEN environment
1165       variable set to a value longer than the longest subroutine names in
1166       your code.
1167

AUTHORS AND CONTRIBUTORS

1169       Tim Bunce (<http://blog.timbunce.org>) leads the project and has done
1170       most of the development work thus far.
1171
1172       Nicholas Clark contributed zip compression and "nytprofmerge".  Chia-
1173       liang Kao contributed "nytprofcg".  Peter (Stig) Edwards contributed
1174       the VMS port.  Jan Dubois contributed the Windows port.  Gisle Aas
1175       contributed the Devel::NYTProf::ReadStream module.  Steve Peters
1176       contributed greater perl version portability and use of POSIX high-
1177       resolution clocks.  Other contributors are noted in the Changes file.
1178
1179       Many thanks to Adam Kaplan who created "NYTProf" initially by forking
1180       "Devel::FastProf" adding reporting from "Devel::Cover" and a test
1181       suite.
1182
1184         Copyright (C) 2008 by Adam Kaplan and The New York Times Company.
1185         Copyright (C) 2008-2016 by Tim Bunce, Ireland.
1186
1187       This library is free software; you can redistribute it and/or modify it
1188       under the same terms as Perl itself, either Perl version 5.8.8 or, at
1189       your option, any later version of Perl 5 you may have available.
1190
1191   Background
1192       Subroutine-level profilers:
1193
1194         Devel::DProf        | 1995-10-31 | ILYAZ
1195         Devel::AutoProfiler | 2002-04-07 | GSLONDON
1196         Devel::Profiler     | 2002-05-20 | SAMTREGAR
1197         Devel::Profile      | 2003-04-13 | JAW
1198         Devel::DProfLB      | 2006-05-11 | JAW
1199         Devel::WxProf       | 2008-04-14 | MKUTTER
1200
1201       Statement-level profilers:
1202
1203         Devel::SmallProf    | 1997-07-30 | ASHTED
1204         Devel::FastProf     | 2005-09-20 | SALVA
1205         Devel::NYTProf      | 2008-03-04 | AKAPLAN
1206         Devel::Profit       | 2008-05-19 | LBROCARD
1207
1208       Devel::NYTProf is a (now distant) fork of Devel::FastProf, which was
1209       itself an evolution of Devel::SmallProf.
1210
1211       Adam Kaplan forked Devel::FastProf and added html report generation
1212       (based on Devel::Cover) and a test suite - a tricky thing to do for a
1213       profiler.  Meanwhile Tim Bunce had been extending Devel::FastProf to
1214       add novel per-sub and per-block timing, plus subroutine caller
1215       tracking.
1216
1217       When Devel::NYTProf was released Tim switched to working on
1218       Devel::NYTProf because the html report would be a good way to show the
1219       extra profile data, and the test suite made development much easier and
1220       safer.
1221
1222       Then Tim went a little crazy and added a slew of new features, in
1223       addition to per-sub and per-block timing and subroutine caller
1224       tracking. These included the 'opcode interception' method of profiling,
1225       ultra-fast and robust inclusive subroutine timing, doubling
1226       performance, plus major changes to html reporting to display all the
1227       extra profile call and timing data in richly annotated and cross-linked
1228       reports.
1229
1230       Steve Peters came on board along the way with patches for portability
1231       and to keep NYTProf working with the latest development perl versions.
1232       Nicholas Clark added zip compression, many optimizations, and
1233       "nytprofmerge".  Jan Dubois contributed Windows support.
1234
1235       Adam's work was sponsored by The New York Times Co.
1236       <http://open.nytimes.com>.  Tim's work was partly sponsored by
1237       Shopzilla <http://www.shopzilla.com> during 2008 but hasn't been
1238       sponsored since then.
1239
1240       For the record, Tim has never worked for the New York Times nor has he
1241       received any kind of sponsorship or support from them in relation to
1242       NYTProf. The name of this module is simply result of the history
1243       outlined above, which can be summarised as: Adam forked an existing
1244       module when he added his enhancements and Tim didn't.
1245
1246
1247
1248perl v5.36.0                      2023-01-20                 Devel::NYTProf(3)
Impressum