1Devel::NYTProf(3) User Contributed Perl Documentation Devel::NYTProf(3)
2
3
4
6 Devel::NYTProf - Powerful fast feature-rich perl source code profiler
7
9 # profile code and write database to ./nytprof.out
10 perl -d:NYTProf some_perl.pl
11
12 # convert database into a set of html files, e.g., ./nytprof/index.html
13 # and open a web browser on the nytprof/index.html file
14 nytprofhtml --open
15
16 # or into comma separated files, e.g., ./nytprof/*.csv
17 nytprofcsv
18
19 A screencast about profiling perl code, including a detailed look at
20 how to use NYTProf and how to optimize your code, is available at
21 <http://timbunce.blip.tv/file/3913278/>
22
24 Devel::NYTProf is a powerful, fast, feature-rich perl source code
25 profiler.
26
27 · Performs per-line statement profiling for fine detail
28
29 · Performs per-subroutine statement profiling for overview
30
31 · Performs per-opcode profiling for slow perl builtins
32
33 · Performs per-block statement profiling (the first profiler to do
34 so)
35
36 · Accounts correctly for time spent after calls return
37
38 · Performs inclusive and exclusive timing of subroutines
39
40 · Subroutine times are per calling location (a powerful feature)
41
42 · Can profile compile-time activity, just run-time, or just END time
43
44 · Uses novel techniques for efficient profiling
45
46 · Sub-microsecond (100ns) resolution on supported systems
47
48 · Very fast - the fastest statement and subroutine profilers for perl
49
50 · Handles applications that fork, with no performance cost
51
52 · Immune from noise caused by profiling overheads and I/O
53
54 · Program being profiled can stop/start the profiler
55
56 · Generates richly annotated and cross-linked html reports
57
58 · Captures source code, including string evals, for stable results
59
60 · Trivial to use with mod_perl - add one line to httpd.conf
61
62 · Includes an extensive test suite
63
64 · Tested on very large codebases
65
66 NYTProf is effectively two profilers in one: a statement profiler, and
67 a subroutine profiler.
68
69 Statement Profiling
70 The statement profiler measures the time between entering one perl
71 statement and entering the next. Whenever execution reaches a new
72 statement, the time since entering the previous statement is calculated
73 and added to the time associated with the line of the source file that
74 the previous statement starts on.
75
76 By default the statement profiler also determines the first line of the
77 current block and the first line of the current statement, and
78 accumulates times associated with those.
79
80 Another innovation unique to NYTProf is automatic compensation for a
81 problem inherent in simplistic statement-to-statement timing. Consider
82 a statement that calls a subroutine and then performs some other work
83 that doesn't execute new statements, for example:
84
85 foo(...) + mkdir(...);
86
87 In all other statement profilers the time spent in remainder of the
88 expression (mkdir in the example) will be recorded as having been spent
89 on the last statement executed in foo()! Here's another example:
90
91 while (<>) {
92 ...
93 1;
94 }
95
96 After the first time around the loop, any further time spent evaluating
97 the condition (waiting for input in this example) would be be recorded
98 as having been spent on the last statement executed in the loop! (Until
99 perl bug #60954 is fixed this problem still applies to some loops. For
100 more information see
101 <http://rt.perl.org/rt3/Ticket/Display.html?id=60954>)
102
103 NYTProf avoids these problems by intercepting the opcodes which
104 indicate that control is returning into some previous statement and
105 adjusting the profile accordingly.
106
107 The statement profiler naturally generates a lot of data which is
108 streamed out to a file in a very compact format. NYTProf takes care to
109 not include the measurement and writing overheads in the profile times
110 (some profilers produce 'noisy' data due to periodic stdio flushing).
111
112 Subroutine Profiling
113 The subroutine profiler measures the time between entering a subroutine
114 and leaving it. It then increments a call count and accumulates the
115 duration. For each subroutine called, separate counts and durations
116 are stored for each location that called the subroutine.
117
118 Subroutine entry is detected by intercepting the "entersub" opcode.
119 Subroutine exit is detected via perl's internal save stack. The result
120 is both extremely fast and very robust.
121
122 Subroutine Recursion
123
124 For subroutines that recurse directly or indirectly, such as
125 Error::try, the inclusive time is only measured for the outer-most
126 call.
127
128 The inclusive times of recursive calls are still measured and are
129 accumulated separately. Also the 'maximum recursion depth' per calling
130 location is recorded.
131
132 Goto &Subroutine
133
134 Perl implements a "goto &destination" as a "return" followed by a call
135 to &destination, so that's how it will appear in the report.
136
137 The "goto" will be shown with a very short time because it's
138 effectively just a "return". The &destination sub will show a call not
139 from the location of the "goto" but from the location of the call to
140 the sub that performed the "goto".
141
142 accept()
143
144 The perl built-in accept() function waits listening for a connection on
145 a socket, and so is a key part of pure-perl network service
146 applications.
147
148 The time spent waiting for a remotely initiated connection can be
149 relatively high but is not relevant to the performance of the
150 application. So the accept() function is treated as a special case. The
151 subroutine profiler discounts the time spent in the accept() function.
152 It does this in a way that also discounts that time from all the
153 callers up the call stack. The effect on the reports is that all
154 accept() calls appear to be instant.
155
156 The statement profiler still shows the time actually spent in the
157 statement that executed the accept() call.
158
159 Application Profiling
160 NYTProf records extra information in the data file to capture details
161 that may be useful when analyzing the performance. It also records the
162 filename and line ranges of all the subroutines.
163
164 NYTProf can profile applications that fork, and does so with no loss of
165 performance. NYTProf detects the fork and starts writing a new profile
166 file with the pid appended to the filename.
167
168 Fast Profiling
169 The NYTProf profiler is written almost entirely in C and great care has
170 been taken to ensure it's very efficient.
171
172 Apache Profiling
173 Just add one line near the start of your httpd.conf file:
174
175 PerlModule Devel::NYTProf::Apache
176
177 By default you'll get a /tmp/nytprof.$$.out file for the parent process
178 and a /tmp/nytprof.$parent.out.$$ file for each worker process.
179
180 NYTProf takes care to detect when control is returning back from perl
181 to mod_perl so time spent in mod_perl (such as waiting for the next
182 request) does not get allocated to the last statement executed.
183
184 Works with mod_perl 1 and 2. See Devel::NYTProf::Apache for more
185 information.
186
188 Usually you'd load Devel::NYTProf on the command line using the perl -d
189 option:
190
191 perl -d:NYTProf some_perl.pl
192
193 To save typing the ':NYTProf' you could set the PERL5DB env var
194
195 PERL5DB='use Devel::NYTProf'
196
197 and then just perl -d would work:
198
199 perl -d some_perl.pl
200
201 Or you can avoid the need to add the -d option at all by using the
202 "PERL5OPT" env var:
203
204 PERL5OPT=-d:NYTProf
205
206 That's also very handy when you can't alter the perl command line being
207 used to run the script you want to profile. Usually you'll want to
208 enable the "addpid=1" option to ensure any nested invocations of perl
209 don't overwrite the profile.
210
212 The behavior of Devel::NYTProf may be modified by setting the
213 environment variable "NYTPROF". It is possible to use this environment
214 variable to effect multiple setting by separating the values with a
215 ":". For example:
216
217 export NYTPROF=trace=2:start=init:file=/tmp/nytprof.out
218
219 Any colon or equal characters in a value can be escaped by preceding
220 them with a backslash.
221
222 addpid=1
223 Append the current process id to the end of the filename.
224
225 This avoids concurrent, or consecutive, processes from overwriting the
226 same file.
227
228 trace=N
229 Set trace level to N. 0 is off (the default). Higher values cause more
230 detailed trace output. Trace output is written to STDERR or wherever
231 the "log=F" option has specified.
232
233 log=F
234 Specify the name of the file that "trace=N" output should be written
235 to.
236
237 start=...
238 Specify at which phase of program execution the profiler should be
239 enabled:
240
241 start=begin - start immediately (the default)
242 start=init - start at beginning of INIT phase (after compilation)
243 start=end - start at beginning of END phase
244 start=no - don't automatically start
245
246 The start=no option is handy if you want to explicitly control
247 profiling by calling DB::enable_profile() and DB::disable_profile()
248 yourself.
249
250 optimize=0
251 Disable the perl optimizer.
252
253 By default NYTProf leaves perl's optimizer enabled. That gives you
254 more accurate profile timing overall, but can lead to odd statement
255 counts for individual sets of lines. That's because the perl's peephole
256 optimizer has effectively rewritten the statements but you can't see
257 what the rewritten version looks like.
258
259 For example:
260
261 1 if (...) {
262 2 return;
263 3 }
264
265 may be rewritten as
266
267 1 return if (...)
268
269 so the profile won't show a statement count for line 2 in your source
270 code because the "return" was merged into the "if" statement on the
271 preceding line.
272
273 Also 'empty' statements like "1;" are removed entirely. Such
274 statements are empty because the optimizer has already removed the
275 pointless constant in void context. It then goes on to remove the now
276 empty statement (in perl >= 5.13.7).
277
278 Using the "optimize=0" option disables the optimizer so you'll get
279 lower overall performance but more accurately assigned statement
280 counts.
281
282 If you find any other examples of the effect of optimizer on NYTProf
283 output (other than performance, obviously) please let us know.
284
285 subs=0
286 Set to 0 to disable the collection of subroutine caller and timing
287 details.
288
289 blocks=0
290 Set to 0 to disable the determination of block and subroutine location
291 per statement. This makes the profiler about 50% faster (as of July
292 2008) and produces smaller output files, but you lose some valuable
293 information. The extra cost is likely to be reduced in later versions
294 anyway, as little optimization has been done on that part of the code.
295
296 stmts=0
297 Set to 0 to disable the statement profiler. (Implies "blocks=0".) The
298 reports won't contain any statement timing detail.
299
300 This significantly reduces the overhead of the profiler and can also be
301 useful for profiling large applications that would normally generate a
302 very large profile data file.
303
304 leave=0
305 Set to 0 to disable the extra work done by the statement profiler to
306 allocate times accurately when returning into the middle of statement.
307 For example leaving a subroutine and returning into the middle of
308 statement, or re-evaluating a loop condition.
309
310 This feature also ensures that in embedded environments, such as
311 mod_perl, the last statement executed doesn't accumulate the time spent
312 'outside perl'.
313
314 findcaller=1
315 Force NYTProf to recalculate the name of the caller of the each sub
316 instead of 'inheriting' the name calculated when the caller was
317 entered. (Rarely needed, but might be useful in some odd cases.)
318
319 use_db_sub=1
320 Set to 1 to enable use of the traditional DB::DB() subroutine to
321 perform profiling, instead of the faster 'opcode redirection' technique
322 that's used by default. Also effectively sets "leave=0" (see above).
323
324 The default 'opcode redirection' technique can't profile subroutines
325 that were compiled before NYTProf was loaded. So using use_db_sub=1 can
326 be useful in cases where you can't load the profiler early in the life
327 of the application.
328
329 Another side effect of "use_db_sub=1" is that it enables recording of
330 the source code of the "perl -e '...'" and "perl -" input for old
331 versions of perl. See also "savesrc=0".
332
333 savesrc=0
334 Disable the saving of source code.
335
336 By default NYTProf saves a copy of all source code into the profile
337 data file. This makes the file self-contained, so the reporting tools
338 no longer depend on having the unmodified source code files available.
339
340 With "savesrc=0" some source code is still saved: the arguments to the
341 "perl -e" option, the script fed to perl via STDIN when using "perl -",
342 and the source code of string evals.
343
344 Saving the source code of string evals requires perl version 5.8.9+,
345 5.10.1+, or 5.12 or later.
346
347 Saving the source code of the "perl -e '...'" or "perl -" input
348 requires either a recent perl version, as above, or setting the
349 "use_db_sub=1" option.
350
351 slowops=N
352 Profile perl opcodes that can be slow. These include opcodes that make
353 system calls, such as "print", "read", "sysread", "socket" etc., plus
354 regular expression opcodes like "subst" and "match".
355
356 If "N" is 0 then slowops profiling is disabled.
357
358 If "N" is 1 then all the builtins are treated as being defined in the
359 "CORE" package. So times for "print" calls from anywhere in your code
360 are merged and accounted for as calls to an xsub called "CORE::print".
361
362 If "N" is 2 (the default) then builtins are treated as being defined in
363 the package that calls them. So calls to "print" from package "Foo" are
364 treated as calls to an xsub called "Foo::CORE:print". Note the single
365 colon after CORE.
366
367 The opcodes are currently profiled using their internal names, so
368 "printf" is "prtf" and the "-x" file test is "fteexec". This may change
369 in future.
370
371 Opcodes that call subroutines, perhaps by triggering a FETCH from a
372 tied variable, currently appear in the call tree as the caller of the
373 sub. This is likely to change in future.
374
375 usecputime=1
376 Measure user CPU + system CPU time instead of the real elapsed 'wall
377 clock' time (which is the default). But there are better ways to do
378 this, read on.
379
380 Measuring CPU time has the advantage of making the measurements
381 independent of time spent blocked waiting for the cpu or network i/o
382 etc. But the method used by Cusecputime=1> also has the severe
383 disadvantage of having typically far less accurate timings.
384
385 Most systems have a 0.01 second granularity in the results from the
386 "times()" sytem call. With modern processors having multi- gigahertz
387 clocks, 0.01 seconds is like a lifetime. The 'ticks' of this CPU time
388 clock happen so rarely relative to the activity of a most applications
389 that you'd have to run the code for many hours to have any hope of
390 reasonably useful results.
391
392 A much better alternative is to use the "clock=N" option to select a
393 high-resolution CPU time clock, if available on your system, because
394 that'll give you higher resolution and work for the subroutine profiler
395 as well.
396
397 file=...
398 Specify the output file to write profile data to (default:
399 './nytprof.out').
400
401 compress=...
402 Specify the compression level to use, if NYTProf is compiled with
403 compression support. Valid values are 0 to 9, with 0 disabling
404 compression. The default is 6 as higher values yield little extra
405 compression but the cpu cost starts to rise significantly. Using level
406 1 still gives you a significant reduction in file size.
407
408 If NYTProf was not compiled with compression support, this option is
409 silently ignored.
410
411 clock=N
412 Systems which support the "clock_gettime()" system call typically
413 support several clocks. By default NYTProf uses CLOCK_MONOTONIC.
414
415 This option enables you to select a different clock by specifying the
416 integer id of the clock (which may vary between operating system
417 types). If the clock you select isn't available then CLOCK_REALTIME is
418 used.
419
420 See "CLOCKS" for more information.
421
422 sigexit=1
423 When perl exits normally it runs any code defined in "END" blocks.
424 NYTProf defines an END block that finishes profiling and writes out the
425 final profile data.
426
427 If the process ends due to a signal then END blocks are not executed.
428 The "sigexit" option tells NYTProf to catch some signals (e.g. INT,
429 HUP, PIPE, SEGV, BUS) and ensure a usable by executing:
430
431 DB::finish_profile();
432 exit 1;
433
434 You can also specify which signals to catch in this way by listing
435 them, separated by commas, as the value of the option (case is not
436 significant):
437
438 sigexit=int,hup
439
440 posix_exit=1
441 The NYTProf subroutine profiler normally detects calls to
442 "POSIX::_exit()" (which exits the process without running END blocks)
443 and automatically calls "DB::finish_profile()" for you, so NYTProf
444 'just works'.
445
446 When using the "subs=0" option to disable the subroutine profiler the
447 "posix_exit" option can be used to tell NYTProf to take other steps to
448 arrange for "DB::finish_profile()" to be called before
449 "POSIX::_exit()".
450
451 forkdepth=N
452 When a perl process that is being profiled executes a fork() the child
453 process is also profiled. The forkdepth option can be used to control
454 this. If forkdepth is zero then profiling will be disabled in the child
455 process.
456
457 If forkdepth is greater than zero then profiling will be enabled in the
458 child process and the forkdepth value in that process is decremented by
459 one.
460
461 If forkdepth is -1 (the default) then there's no limit on the number of
462 generations of children that are profiled.
463
464 nameevals=0
465 The 'file name' of a string eval is normally a string like ""(eval
466 N)"", where "N" is a sequence number. By default NYTProf asks perl to
467 give evals more informative names like ""(eval N)[file:line]"", where
468 "file" and "line" are the file and line number where the string "eval"
469 was executed.
470
471 The "nameevals=0" option can be used to disable the more informative
472 names and return to the default behaviour. This may be need in rare
473 cases where the application code is sensitive to the name given to a
474 "eval". (The most common case in when running test suites undef
475 NYTProf.)
476
477 The downside is that the NYTProf reporting tools are less useful and
478 may get confused if this option is used.
479
480 nameanonsubs=0
481 The name of a anonymous subroutine is normally ""__ANON__"". By
482 default NYTProf asks perl to give anonymous subroutines more
483 informative names like ""__ANON__[file:line]"", where "file" and "line"
484 are the file and line number where the anonymous subroutine was
485 defined.
486
487 The "nameanonsubs=0" option can be used to disable the more informative
488 names and return to the default behaviour. This may be need in rare
489 cases where the application code is sensitive to the name given to a
490 anonymous subroutines. (The most common case in when running test
491 suites undef NYTProf.)
492
493 The downside is that the NYTProf reporting tools are less useful and
494 may get confused if this option is used.
495
497 You can profile only parts of an application by calling
498 DB::disable_profile() to stop collecting profile data, and calling
499 DB::enable_profile() to start collecting profile data.
500
501 Using the "start=no" option lets you leave the profiler disabled
502 initially until you call DB::enable_profile() at the right moment.
503
504 The profile output file can't be used until it's been properly
505 completed and closed. Calling DB::disable_profile() doesn't do that.
506 To make a profile file usable before the profiled application has
507 completed you can call DB::finish_profile(). Alternatively you could
508 call DB::enable_profile($newfile).
509
510 disable_profile
511 DB::disable_profile()
512
513 Stops collection of profile data.
514
515 Subroutine calls which were made while profiling was enabled and are
516 still on the call stack (have not yet exited) will still have their
517 profile data collected when they exit.
518
519 enable_profile
520 DB::enable_profile($newfile)
521
522 Enables collection of profile data. If $newfile is true the profile
523 data will be written to $newfile (after completing and closing the
524 previous file, if any). If $newfile already exists it will be deleted
525 first.
526
527 finish_profile
528 DB::finish_profile()
529
530 Calls DB::disable_profile(), then completes the profile data file by
531 writing subroutine profile data, and then closes the file. The in
532 memory subroutine profile data is then discarded.
533
534 Normally NYTProf arranges to call finish_profile() for you via an END
535 block.
536
538 NYTProf tries very hard to gather accurate information. The nature of
539 the internals of perl mean that, in some cases, the information that's
540 gathered is accurate but surprising. In some cases it can appear to be
541 misleading. (Of course, in some cases it may actually be plain wrong.
542 Caveat lector.)
543
544 If Statement and Subroutine Timings Don't Match
545 NYTProf has two profilers: a statement profiler that's invoked when
546 perl moves from one perl statement to another, and a subroutine
547 profiler that's invoked when perl calls or returns from a subroutine.
548
549 The individual statement timings for a subroutine usually add up to
550 slightly less than the exclusive time for the subroutine. That's
551 because the handling of the subroutine call and return overheads is
552 included in the exclusive time for the subroutine. The difference may
553 only be a new microseconds but that may become noticeable for
554 subroutines that are called hundreds of thousands of times.
555
556 The statement profiler keeps track how much time was spent on
557 overheads, like writing statement profile data to disk. The subroutine
558 profiler subtracts the overheads that have accumulated between entering
559 and leaving the subroutine in order to give a more accurate profile.
560 The statement profiler is generally very fast because most writes get
561 buffered for zip compression so the profiler overhead per statement
562 tends to be very small, often a single 'tick'. The result is that the
563 accumulated overhead is quite noisy. This becomes more significant for
564 subroutines that are called frequently and are also fast. This may be
565 another, smaller, contribution to the discrepancy between statement
566 time and exclusive times.
567
568 If Headline Subroutine Timings Don't Match the Called Subs
569 Overall subroutine times are reported with a headline like "spent 10s
570 (2+8) within ...". In this example, 10 seconds were spent inside the
571 subroutine (the "inclusive time") and, of that, 8 seconds were spent in
572 subroutines called by this one. That leaves 2 seconds as the time
573 spent in the subroutine code itself (the "exclusive time", sometimes
574 also called the "self time").
575
576 The report shows the source code of the subroutine. Lines that make
577 calls to other subroutines are annotated with details of the time spent
578 in those calls.
579
580 Sometimes the sum of the times for calls made by the lines of code in
581 the subroutine is less than the inclusive-exclusive time reported in
582 the headline (10-2 = 8 seconds in the example above).
583
584 What's happening here is that calls to other subroutines are being made
585 but NYTProf isn't able to determine the calling location correctly so
586 the calls don't appear in the report in the correct place.
587
588 Using an old version of perl is one cause (see below). Another is
589 calling subroutines that exit via "goto ⊂" - most frequently
590 encountered in AUTOLOAD subs and code using the Memoize module.
591
592 In general the overall subroutine timing is accurate and should be
593 trusted more than the sum of statement or nested sub call timings.
594
595 Perl 5.10.1+ (or else 5.8.9+) is Recommended
596 These versions of perl yield much more detailed information about calls
597 to BEGIN, CHECK, INIT, and END blocks, the code handling tied or
598 overloaded variables, and callbacks from XS code.
599
600 Perl 5.12 will hopefully also fix an inaccuracy in the timing of the
601 last statement and the condition clause of some kinds of loops:
602 <http://rt.perl.org/rt3/Ticket/Display.html?id=60954>
603
604 eval $string
605 Perl treats each execution of a string eval ("eval $string;" not "eval
606 { ... }") as a distinct file, so NYTProf does as well. The 'files' are
607 given names with this structure:
608
609 (eval $sequence)[$filename:$line]
610
611 for example ""(eval 93)[/foo/bar.pm:42]"" would be the name given to
612 the 93rd execution of a string eval by that process and, in this case,
613 the 93rd eval happened to be one at line 42 of "/foo/bar.pm".
614
615 Nested string evals can give rise to file names like
616
617 (eval 1047)[(eval 93)[/foo/bar.pm:42]:17]
618
619 Merging Evals
620
621 Some applications execute a great many string eval statements. If
622 NYTProf generated a report page for each one it would not only slow
623 report generation but also make the overall report less useful by
624 scattering performance data too widely. On the other hand, being able
625 to see the actual source code executed by an eval, along with the
626 timing details, is often very useful.
627
628 To try to balance these conflicting needs, NYTProf currently merges
629 uninteresting string eval siblings.
630
631 What does that mean? Well, for each source code line that executed any
632 string evals, NYTProf first gathers the corresponding eval 'files' for
633 that line (known as the 'siblings') into groups keyed by distinct
634 source code.
635
636 Then, for each of those groups of siblings, NYTProf will 'merge' a
637 group that shares the same source code and doesn't execute any string
638 evals itself. Merging means to pick one sibling as the survivor and
639 merge and delete all the data from the others into it.
640
641 If there are a large number of sibling groups then the data for all of
642 them are merged into one regardless.
643
644 The report annotations will indicate when evals have been merged
645 together.
646
647 Merging Anonymous Subroutines
648
649 Anonymous subroutines defined within string evals have names like this:
650
651 main::__ANON__[(eval 75)[/foo/bar.pm:42]:12]
652
653 That anonymous subroutine was defined on line 12 of the source code
654 executed by the string eval on line 42 of /foo/bar.pm. That was the
655 75th string eval executed by the program.
656
657 Anonymous subroutines defined on the same line of sibling evals that
658 get merged are also merged. That is, the profile information is merged
659 into one and the others are discarded.
660
661 Timing
662
663 Care should be taken when interpreting the report annotations
664 associated with a string eval statement. Normally the report
665 annotations embedded into the source code related to timings from the
666 subroutine profiler. This isn't (currently) true of annotations for
667 string eval statements.
668
669 This makes a significant different if the eval defines any subroutines
670 that get called after the eval has returned. Because the time shown for
671 a string eval is based on the statement times it will include time
672 spent executing statements within the subs defined by the eval.
673
674 In future NYTProf may involve the subroutine profiler in timings evals
675 and so be able to avoid this issue.
676
677 Calls from XSUBs and Opcodes
678 Calls record the current filename and line number of the perl code at
679 the time the call was made. That's fine and accurate for calls from
680 perl code. For calls that originate from C code however, such as an
681 XSUB or an opcode, the filename and line number recorded are still
682 those of the last perl statement executed.
683
684 For example, a line that calls an xsub will appear in reports to also
685 have also called any subroutines that that xsub called. This can be
686 construed as a feature.
687
688 As an extreme example, the first time a regular expression that uses
689 character classes is executed on a unicode string you'll find profile
690 data like this:
691
692 # spent 1ms within main::BEGIN@4 which was called
693 # once (1ms+0s) by main::CORE:subst at line 0
694 4 s/ (?: [A-Z] | [\d] )+ (?= [\s] ) //x;
695 # spent 38.8ms making 1 call to main::CORE:subst
696 # spent 25.4ms making 2 calls to utf8::SWASHNEW, avg 12.7ms/call
697 # spent 12.4ms making 1 call to utf8::AUTOLOAD
698
700 You can reduce the cost of profiling by adjusting some options. The
701 trade-off is reduced detail and/or accuracy in reports.
702
703 If you don't need statement-level profiling then you can disable it via
704 "stmts=0". If you do want it but don't mind loosing block-level
705 timings then set "blocks=0". To further boost statement-level
706 profiling performance try "leave=0" but note that will apportion
707 timings for some kinds of statements less accurate).
708
709 If you don't need subroutine profiling then you can disable it via
710 "subs=0". If you do need it but don't need timings for perl opcodes
711 then set "slowops=0".
712
713 Generally speaking, setting blocks=0 and slowops=0 will give you a
714 useful boost with the least loss of detail.
715
716 Another approach is to only enable NYTProf in the sections of code that
717 interest you. See "RUN-TIME CONTROL OF PROFILING" for more details.
718
719 To speed up nytprofhtml try using the --minimal (-m) option.
720
722 The Devel::NYTProf::Data module provides a low-level interface for
723 loading the profile data.
724
725 The Devel::NYTProf::Reader module provides an interface for generating
726 arbitrary reports. This means that you can implement your own output
727 format in perl. (Though the module is in a state of flux and may be
728 deprecated soon.)
729
730 Included in the bin directory of this distribution are some scripts
731 which turn the raw profile data into more useful formats:
732
733 nytprofhtml
734 Creates attractive, richly annotated, and fully cross-linked html
735 reports (including statistics, source code and color highlighting).
736 This is the main report generation tool for NYTProf.
737
738 nytprofcsv
739 Creates comma delimited profile reports. Old and limited.
740
741 nytprofcg
742 Translates a profile into a format that can be loaded into KCachegrind
743 <http://kcachegrind.sourceforge.net>
744
745 nytprofmerge
746 Reads multiple profile data files and writes out a new file containing
747 the merged profile data.
748
750 Threads and Multiplicity
751 "Devel::NYTProf" is not currently thread safe or multiplicity safe. If
752 you'd be interested in helping to fix that then please get in touch
753 with us. Meanwhile, profiling is disabled when a thread is created, and
754 NYTProf tries to ignore any activity from perl interpreters other than
755 the first one that loaded it.
756
757 Coro
758 The "Devel::NYTProf" subroutine profiler gets confused by the stack
759 gymnastics performed by the Coro module and aborts. When profiling
760 applications that use Coro you should disable the subroutine profiler
761 using the "subs=0" option.
762
763 For perl < 5.8.8 it may change what caller() returns
764 For example, the Readonly module croaks with "Invalid tie" when
765 profiled with perl versions before 5.8.8. That's because Readonly
766 explicitly checking for certain values from caller(). The NEXT module
767 is also affected.
768
769 For perl < 5.10.1 it can't see some implicit calls and callbacks
770 For perl versions prior to 5.8.9 and 5.10.1, some implicit subroutine
771 calls can't be seen by the subroutine profiler. Technically this
772 affects calls made via the various perl "call_*()" internal APIs.
773
774 For example, BEGIN/CHECK/INIT/END blocks, the "TIE"whatever subroutine
775 called by "tie()", all calls made via operator overloading, and
776 callbacks from XS code, are not seen.
777
778 The effect is that time in those subroutines is accumulated by the subs
779 that triggered the call to them. So time spent in calls invoked by perl
780 to handle overloading are accumulated by the subroutines that trigger
781 overloading (so it is measured, but the cost is dispersed across
782 possibly many calling locations).
783
784 Although the calls aren't seen by the subroutine profiler, the
785 individual statements executed by the code in the called subs are
786 profiled by the statement profiler.
787
788 #line directives
789 The reporting code currently doesn't handle #line directives, but at
790 least it warns about them. Patches welcome.
791
792 Freed values in @_ may be mutated
793 Perl has a class of bugs related to the fact that values placed in the
794 stack are not reference counted. Consider this example:
795
796 @a = (1..9); sub s { undef @a; print $_ for @_ } s(@a);
797
798 The "undef @a" frees the values that @_ refers to. Perl can sometimes
799 detect when a freed value is accessed and treats it as an undef.
800 However, if the freed value is assigned some new value then @_ is
801 effectively corrupted.
802
803 NYTProf allocates new values while it's profiling, in order to record
804 program activity, and so may appear to corrupt @_ in this (rare)
805 situation. If this happens, NYTProf is simply exposing an existing
806 problem in the code.
807
808 Lvalue subroutines aren't profiled when using use_db_sub=1
809 Currently 'lvalue' subroutines (subs that can be assigned to, like
810 "foo() = 42") are not profiled when using use_db_sub=1.
811
813 Here we discuss the way NYTProf gets high-resolution timing information
814 from your system and related issues.
815
816 POSIX Clocks
817 These are the clocks that your system may support if it supports the
818 POSIX "clock_gettime()" function. Other clock sources are listed in the
819 "Other Clocks" section below.
820
821 The "clock_gettime()" interface allows clocks to return times to
822 nanosecond precision. Of course few offer nanosecond accuracy but the
823 extra precision helps reduce the cumulative error that naturally occurs
824 when adding together many timings. When using these clocks NYTProf
825 outputs timings as a count of 100 nanosecond ticks.
826
827 CLOCK_MONOTONIC
828
829 CLOCK_MONOTONIC represents the amount of time since an unspecified
830 point in the past (typically system start-up time). It increments
831 uniformly independent of adjustments to 'wallclock time'. NYTProf will
832 use this clock by default, if available.
833
834 CLOCK_REALTIME
835
836 CLOCK_REALTIME is typically the system's main high resolution 'wall
837 clock time' source. The same source as used for the gettimeofday()
838 call used by most kinds of perl benchmarking and profiling tools.
839
840 The problem with real time is that it's far from simple. It tends to
841 drift and then be reset to match 'reality', either sharply or by small
842 adjustments (via the adjtime() system call).
843
844 Surprisingly, it can also go backwards, for reasons explained in
845 http://preview.tinyurl.com/5wawnn so CLOCK_MONOTONIC is preferred.
846
847 CLOCK_VIRTUAL
848
849 CLOCK_VIRTUAL increments only when the CPU is running in user mode on
850 behalf of the calling process.
851
852 CLOCK_PROF
853
854 CLOCK_PROF increments when the CPU is running in user or kernel mode.
855
856 CLOCK_PROCESS_CPUTIME_ID
857
858 CLOCK_PROCESS_CPUTIME_ID represents the amount of execution time of the
859 process associated with the clock.
860
861 CLOCK_THREAD_CPUTIME_ID
862
863 CLOCK_THREAD_CPUTIME_ID represents the amount of execution time of the
864 thread associated with the clock.
865
866 Finding Available POSIX Clocks
867
868 On unix-like systems you can find the CLOCK_* clocks available on you
869 system using a command like:
870
871 grep -r 'define *CLOCK_' /usr/include
872
873 Look for a group that includes CLOCK_REALTIME. The integer values
874 listed are the clock ids that you can use with the "clock=N" option.
875
876 A future version of NYTProf should be able to list the supported
877 clocks.
878
879 Other Clocks
880 This section lists other clock sources that NYTProf may use.
881
882 If your system doesn't support clock_gettime() then NYTProf will use
883 gettimeofday(), or the nearest equivalent,
884
885 gettimeofday
886
887 This is the traditional high resolution time of day interface for most
888 unix-like systems. With this clock NYTProf outputs timings as a count
889 of 1 microsecond ticks.
890
891 mach_absolute_time
892
893 On Mac OS X the mach_absolute_time() function is used. With this clock
894 NYTProf outputs timings as a count of 100 nanosecond ticks.
895
896 Time::HiRes
897
898 On systems which don't support other clocks, NYTProf falls back to
899 using the Time::HiRes module. With this clock NYTProf outputs timings
900 as a count of 1 microsecond ticks.
901
902 Clock References
903 Relevant specifications and manual pages:
904
905 http://www.opengroup.org/onlinepubs/000095399/functions/clock_getres.html
906 http://linux.die.net/man/3/clock_gettime
907
908 Why 'realtime' can appear to go backwards:
909
910 http://preview.tinyurl.com/5wawnn
911
913 SMP Systems
914 On systems with multiple processors, which includes most modern
915 machines, (from Linux docs though applicable to most SMP systems):
916
917 The CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID clocks are realized on
918 many platforms using timers from the CPUs (TSC on i386, AR.ITC on Itanium).
919 These registers may differ between CPUs and as a consequence these clocks may
920 return bogus results if a process is migrated to another CPU.
921
922 If the CPUs in an SMP system have different clock sources then there is no way
923 to maintain a correlation between the timer registers since each CPU will run
924 at a slightly different frequency. If that is the case then
925 clock_getcpuclockid(0) will return ENOENT to signify this condition. The two
926 clocks will then only be useful if it can be ensured that a process stays on a
927 certain CPU.
928
929 The processors in an SMP system do not start all at exactly the same time and
930 therefore the timer registers are typically running at an offset. Some
931 architectures include code that attempts to limit these offsets on bootup.
932 However, the code cannot guarantee to accurately tune the offsets. Glibc
933 contains no provisions to deal with these offsets (unlike the Linux Kernel).
934 Typically these offsets are small and therefore the effects may be negligible
935 in most cases.
936
937 In summary, SMP systems are likely to give 'noisy' profiles. Setting a
938 "Processor Affinity" may help.
939
940 Processor Affinity
941
942 Processor affinity is an aspect of task scheduling on SMP systems.
943 "Processor affinity takes advantage of the fact that some remnants of a
944 process may remain in one processor's state (in particular, in its
945 cache) from the last time the process ran, and so scheduling it to run
946 on the same processor the next time could result in the process running
947 more efficiently than if it were to run on another processor." (From
948 http://en.wikipedia.org/wiki/Processor_affinity)
949
950 Setting an explicit processor affinity can avoid the problems described
951 in "SMP Systems".
952
953 Processor affinity can be set using the "taskset" command on Linux.
954
955 Note that processor affinity is inherited by child processes, so if the
956 process you're profiling spawns cpu intensive sub processes then your
957 process will be impacted by those more than it otherwise would.
958
959 Windows
960
961 On Windows NYTProf uses Time::HiRes which uses the windows
962 QueryPerformanceCounter() API with some extra logic to adjust for the
963 current clock speed and try to resync the raw counter to wallclock time
964 every so often (every 30 seconds or if the timer drifts by more than
965 0.5 of a seconds). This extra logic may lead to occasional spurious
966 results.
967
968 (It would be great if someone could contribute a patch to NYTProf to
969 use QueryPerformanceCounter() directly and avoid the overheads and
970 resyncing behaviour of Time::HiRes.)
971
972 Virtual Machines
973 I recommend you don't do performance profiling while running in a
974 virtual machine. If you do you're likely to find inexplicable spikes
975 of real-time appearing at unreasonable places in your code. You should
976 pay less attention to the statement timings and rely more on the
977 subroutine timings. They will still be noisy but less so than the
978 statement times.
979
980 You could also try using the "clock=N" option to select a high-
981 resolution cpu-time clock instead of a real-time one. That should be
982 much less noisy, though you will lose visibility of wait-times due to
983 network and disk I/O, for example.
984
985 If your system doesn't support the "clock=N" option then you could try
986 using the "usecputime=1" option. That will give you cpu-time
987 measurements but only at a very low 1/100th of a second resolution.
988
990 Possibly. All complex software has bugs. Let me know if you find one.
991
993 Screenshots of nytprofhtml v2.01 reports can be seen at
994 http://timbunce.files.wordpress.com/2008/07/nytprof-perlcritic-index.png
995 <http://timbunce.files.wordpress.com/2008/07/nytprof-perlcritic-
996 index.png> and
997 http://timbunce.files.wordpress.com/2008/07/nytprof-perlcritic-all-perl-files.png
998 <http://timbunce.files.wordpress.com/2008/07/nytprof-perlcritic-all-
999 perl-files.png>. A writeup of the new features of NYTProf v2 can be
1000 found at
1001 http://blog.timbunce.org/2008/07/15/nytprof-v2-a-major-advance-in-perl-profilers/
1002 <http://blog.timbunce.org/2008/07/15/nytprof-v2-a-major-advance-in-
1003 perl-profilers/> and the background story, explaining the "why", can be
1004 found at
1005 http://blog.timbunce.org/2008/07/16/nytprof-v2-the-background-story/
1006 <http://blog.timbunce.org/2008/07/16/nytprof-v2-the-background-story/>.
1007
1008 Mailing list and discussion at
1009 http://groups.google.com/group/develnytprof-dev
1010 <http://groups.google.com/group/develnytprof-dev>
1011
1012 Blog posts <http://blog.timbunce.org/tag/nytprof/>
1013
1014 Public SVN Repository and hacking instructions at
1015 http://code.google.com/p/perl-devel-nytprof/
1016 <http://code.google.com/p/perl-devel-nytprof/>
1017
1018 nytprofhtml is a script included that produces html reports.
1019 nytprofcsv is another script included that produces plain text CSV
1020 reports.
1021
1022 Devel::NYTProf::Reader is the module that powers the report scripts.
1023 You might want to check this out if you plan to implement a custom
1024 report (though it's very likely to be deprecated in a future release).
1025
1026 Devel::NYTProf::ReadStream is the module that lets you read a profile
1027 data file as a stream of chunks of data.
1028
1030 "Profile data incomplete, ..."
1031 This error message means the file doesn't contain all the expected
1032 data. That may be because it was truncated (perhaps the filesystem was
1033 full) or, more commonly, because the all the expected data hasn't been
1034 written.
1035
1036 NYTProf writes some important data to the data file when finishing
1037 profiling. If you read the file before the profiling has finished
1038 you'll get this error.
1039
1040 If the process being profiled is still running you'll need to wait
1041 until it exits cleanly (runs "END" blocks or "finish_profile" is called
1042 explicitly).
1043
1044 If the process being profiled has exited then it's likely that it met
1045 with a sudden and unnatural death that didn't give NYTProf a chance to
1046 finish the profile. If the sudden death was due to a signal then
1047 "sigexit=1" may help. If the sudden death was due to calling
1048 "POSIX::_exit($status)" then you'll need to call "finish_profile"
1049 before calling "POSIX::_exit".
1050
1051 Some files don't have profile information
1052 This is usually due to NYTProf being loaded after the other files, for
1053 example
1054
1055 If you can't alter the command line to add ""-d:NYTProf"" you could try
1056 using the "PERL5OPT" environment variable. See "PROFILING".
1057
1059 Tim Bunce (<http://www.tim.bunce.name> and <http://blog.timbunce.org>)
1060 leads the project and has done most of the development work thus far.
1061
1062 Nicholas Clark contributed zip compression and "nytprofmerge". Chia-
1063 liang Kao contributed "nytprofcg". Peter (Stig) Edwards contributed
1064 the VMS port. Jan Dubois contributed the Windows port. Gisle Aas
1065 contributed the Devel::NYTProf::ReadStream module. Steve Peters
1066 contributed greater perl version portability and use of POSIX high-
1067 resolution clocks. Other contributors are noted in the Changes file.
1068
1069 Many thanks to Adam Kaplan who created "NYTProf" initially by forking
1070 "Devel::FastProf" adding reporting from "Devel::Cover" and a test
1071 suite. For more details see "HISTORY" below.
1072
1074 Copyright (C) 2008 by Adam Kaplan and The New York Times Company.
1075 Copyright (C) 2008-2010 by Tim Bunce, Ireland.
1076
1077 This library is free software; you can redistribute it and/or modify it
1078 under the same terms as Perl itself, either Perl version 5.8.8 or, at
1079 your option, any later version of Perl 5 you may have available.
1080
1082 A bit of history (and a shameless plug from Adam)...
1083
1084 NYTProf stands for 'New York Times Profiler'. Indeed, this module was
1085 initially developed from Devel::FastProf by The New York Times Co. to
1086 help our developers quickly identify bottlenecks in large Perl
1087 applications. The NY Times loves Perl and we hope the community will
1088 benefit from our work as much as we have from theirs.
1089
1090 Please visit <http://open.nytimes.com>, our open source blog to see
1091 what we are up to, <http://code.nytimes.com> to see some of our open
1092 projects and then check out <http://nytimes.com> for the latest news!
1093
1094 Background
1095 Subroutine-level profilers:
1096
1097 Devel::DProf | 1995-10-31 | ILYAZ
1098 Devel::AutoProfiler | 2002-04-07 | GSLONDON
1099 Devel::Profiler | 2002-05-20 | SAMTREGAR
1100 Devel::Profile | 2003-04-13 | JAW
1101 Devel::DProfLB | 2006-05-11 | JAW
1102 Devel::WxProf | 2008-04-14 | MKUTTER
1103
1104 Statement-level profilers:
1105
1106 Devel::SmallProf | 1997-07-30 | ASHTED
1107 Devel::FastProf | 2005-09-20 | SALVA
1108 Devel::NYTProf | 2008-03-04 | AKAPLAN
1109 Devel::Profit | 2008-05-19 | LBROCARD
1110
1111 Devel::NYTProf is a (now distant) fork of Devel::FastProf, which was
1112 itself an evolution of Devel::SmallProf.
1113
1114 Adam Kaplan took Devel::FastProf and added html report generation
1115 (based on Devel::Cover) and a test suite - a tricky thing to do for a
1116 profiler. Meanwhile Tim Bunce had been extending Devel::FastProf to
1117 add novel per-sub and per-block timing, plus subroutine caller
1118 tracking.
1119
1120 When Devel::NYTProf was released Tim switched to working on
1121 Devel::NYTProf because the html report would be a good way to show the
1122 extra profile data, and the test suite made development much easier and
1123 safer.
1124
1125 Then he went a little crazy and added a slew of new features, in
1126 addition to per-sub and per-block timing and subroutine caller
1127 tracking. These included the 'opcode interception' method of profiling,
1128 ultra-fast and robust inclusive subroutine timing, doubling
1129 performance, plus major changes to html reporting to display all the
1130 extra profile call and timing data in richly annotated and cross-linked
1131 reports.
1132
1133 Steve Peters came on board along the way with patches for portability
1134 and to keep NYTProf working with the latest development perl versions.
1135 Nicholas Clark added zip compression, many optimizations, and
1136 "nytprofmerge". Jan Dubois contributed Windows support.
1137
1138 Adam's work is sponsored by The New York Times Co.
1139 <http://open.nytimes.com>. Tim's work was partly sponsored by
1140 Shopzilla <http://www.shopzilla.com> during 2008.
1141
1142
1143
1144perl v5.12.2 2010-11-30 Devel::NYTProf(3)