PERLPERF(1)            Perl Programmers Reference Guide            PERLPERF(1)

NAME

6       perlperf - Perl Performance and Optimization Techniques
7

DESCRIPTION

       This is an introduction to the use of performance and optimization
       techniques, with particular reference to perl programs.  While many
       perl developers have come from other languages and can apply their
       prior knowledge where appropriate, there are many other people who
       might benefit from a few perl specific pointers.  If you want the
       condensed version, perhaps the best advice comes from the renowned
       Japanese samurai Miyamoto Musashi, who wrote in 1645:

           "Do Not Engage in Useless Activity"


OVERVIEW

       Perhaps the most common mistake programmers make is to attempt to
       optimize their code before a program actually does anything useful -
       this is a bad idea.  There's no point in having an extremely fast
       program that doesn't work.  The first job is to get a program to
       correctly do something useful (not to mention ensuring the test
       suite is fully functional), and only then to consider optimizing it.
       Having decided to optimize existing working code, there are several
       simple but essential steps to consider which are intrinsic to any
       optimization process.

   ONE STEP SIDEWAYS
       Firstly, you need to establish a baseline time for the existing
       code; this timing needs to be reliable and repeatable.  You'll
       probably want to use the "Benchmark" or "Devel::NYTProf" modules, or
       something similar, for this step, or perhaps the Unix system "time"
       utility, whichever is appropriate.  See the base of this document
       for a longer list of benchmarking and profiling modules, and
       recommended further reading.

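       As a minimal sketch of what capturing such a baseline might look
       like (the "Benchmark" module is covered in more detail later in this
       document), the hypothetical tally_words() routine below simply
       stands in for whatever code you actually want to measure:

           #!/usr/bin/perl

           use strict;
           use warnings;

           use Benchmark qw(timethis);

           # A stand-in for the routine you actually want to baseline.
           sub tally_words {
               my $text  = shift;
               my @words = split /\s+/, $text;
               return scalar @words;
           }

           my $sample = "the quick brown fox jumps over the lazy dog " x 100;

           # A negative count asks Benchmark to keep running for at least
           # that many CPU seconds, which helps make the figures repeatable.
           timethis( -10, sub { tally_words($sample) } );

       Keep the report from this run; it is the number any later change has
       to beat.
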
   ONE STEP FORWARD
       Next, having examined the program for hot spots (places where the
       code seems to run slowly), change the code with the intention of
       making it run faster.  Using version control software, like
       "subversion", will ensure no changes are irreversible.  It's too
       easy to fiddle here and fiddle there - don't change too much at any
       one time or you might not discover which piece of code really was
       the slow bit.

   ANOTHER STEP SIDEWAYS
       It's not enough to say "that will make it run faster" - you have to
       check it.  Rerun the code under the control of the benchmarking or
       profiling modules from the first step above, and check that the new
       code executed the same task in less time.  Save your work and
       repeat...


GENERAL GUIDELINES

56       The critical thing when considering performance is to remember there is
57       no such thing as a "Golden Bullet", which is why there are no rules,
58       only guidelines.
59
       It is clear that inline code is going to be faster than subroutine
       or method calls, because there is less overhead, but this approach
       has the disadvantage of being less maintainable and comes at the
       cost of greater memory usage - there is no such thing as a free
       lunch.  If you are searching for an element in a list, it can be
       more efficient to store the data in a hash structure, and then
       simply look to see whether the key is defined, rather than looping
       through the entire array using grep() for instance.  substr() may be
       (a lot) faster than grep() but not as flexible, so you have another
       trade-off to assess.  Your code may contain a line which takes 0.01
       of a second to execute; if you call it 1,000 times - quite likely in
       a program parsing even medium sized files, for instance - you
       already have a 10 second delay in just one single code location, and
       if you call that line 100,000 times, your entire program will slow
       down to an unbearable crawl.

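       As an illustrative sketch of the list-versus-hash point (the data
       and the lookup below are made up purely for the example), compare a
       grep() scan of an array with a single hash lookup, using Benchmark's
       cmpthese():

           #!/usr/bin/perl

           use strict;
           use warnings;

           use Benchmark qw(cmpthese);

           my @list   = ( 1 .. 10_000 );
           my %seen   = map { $_ => 1 } @list;   # build the lookup table once
           my $wanted = 9_999;

           cmpthese( -5, {
               # scan the whole array every time
               grep_scan   => sub { my $found = grep { $_ == $wanted } @list },
               # a single hash lookup
               hash_lookup => sub { my $found = exists $seen{$wanted} },
           });

       Building %seen costs one pass over the data up front, so this only
       pays off when the same list is searched more than once.
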
       Using a subroutine as part of your sort is a powerful way to get
       exactly what you want, but will usually be slower than the built-in
       alphabetic "cmp" and numeric "<=>" sort operators.  It is possible
       to make multiple passes over your data, building indices to make the
       upcoming sort more efficient, and to use what is known as the "OM"
       (Orcish Maneuver) to cache the sort keys in advance.  The cache
       lookup, while a good idea, can itself be a source of slowdown by
       enforcing a double pass over the data - once to set up the cache,
       and once to sort the data.  Using "pack()" to extract the required
       sort key into a consistent string can be an efficient way to build a
       single string to compare, instead of using multiple sort keys, which
       makes it possible to use the standard, written in "C" and fast, perl
       "sort()" function on the output, and is the basis of the "GRT"
       (Guttman Rossler Transform).  Some string combinations can slow the
       "GRT" down, by just being too plain complex for its own good.

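       As a rough sketch of the two techniques just mentioned - and
       assuming hypothetical "name:score" records sorted by their numeric
       score - the Orcish Maneuver and the GRT might look something like
       this:

           #!/usr/bin/perl

           use strict;
           use warnings;

           # Hypothetical records of the form "name:score", sorted by score.
           my @records = map { sprintf "user%04d:%d", $_, int rand 100_000 }
                         1 .. 1_000;

           sub score { my ($rec) = @_; return ( split /:/, $rec )[1] }

           # Orcish Maneuver: compute each sort key at most once, caching it
           # in a hash as the sort runs.
           my %key;
           my @om_sorted = sort {
               ( $key{$a} //= score($a) ) <=> ( $key{$b} //= score($b) )
           } @records;

           # GRT: pack the key into a fixed-width prefix, sort plain strings
           # with the default (fast, C-level) string comparison, then strip
           # the prefix off again.  pack 'N' produces a big-endian 32-bit
           # value, so non-negative integer keys sort correctly as strings.
           my @grt_sorted = map  { substr $_, 4 }
                            sort
                            map  { pack( 'N', score($_) ) . $_ } @records;

       The //= operator needs perl 5.10 or later; on older perls ||= will
       do, provided a sort key of 0 or the empty string cannot occur.
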
       For applications using database backends, the standard "DBIx"
       namespace has several modules which try to help with keeping things
       nippy, not least because they try not to query the database until
       the latest possible moment, but always read the docs which come with
       your choice of libraries.  Among the many issues developers dealing
       with databases should remain aware of are always using "SQL"
       placeholders and considering pre-fetching data sets when this might
       prove advantageous.  Splitting the parsing of a single large file
       across multiple processes, using say "POE", "threads" or "fork", can
       also be a useful way of optimizing your usage of the available "CPU"
       resources, though this technique is fraught with concurrency issues
       and demands high attention to detail.

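       As a minimal, hypothetical DBI sketch of the placeholder advice (the
       SQLite database and "people" table here are invented for the
       example), prepare the statement once with a placeholder and then
       execute it repeatedly:

           #!/usr/bin/perl

           use strict;
           use warnings;

           use DBI;

           # A made-up database and table, purely for illustration.
           my $dbh = DBI->connect( 'dbi:SQLite:dbname=example.db', '', '',
                                   { RaiseError => 1, AutoCommit => 1 } );

           # Prepare once, with a placeholder; the driver can reuse the
           # statement handle and quotes the bound values safely for you.
           my $sth = $dbh->prepare(
               'SELECT name, score FROM people WHERE score > ?'
           );

           for my $threshold ( 10, 100, 1_000 ) {
               $sth->execute($threshold);
               while ( my ( $name, $score ) = $sth->fetchrow_array ) {
                   print "$name scored $score\n";
               }
           }

           $dbh->disconnect;
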
       Every case has a specific application and one or more exceptions,
       and there is no replacement for running a few tests and finding out
       which method works best for your particular environment.  This is
       why writing optimal code is not an exact science, and why we love
       using Perl so much - TMTOWTDI.


BENCHMARKS

110       Here are a few examples to demonstrate usage of Perl's benchmarking
111       tools.
112
113   Assigning and Dereferencing Variables.
114       I'm sure most of us have seen code which looks like, (or worse than),
115       this:
116
117           if ( $obj->{_ref}->{_myscore} >= $obj->{_ref}->{_yourscore} ) {
118               ...
119
       This sort of code can be a real eyesore to read, as well as being
       very sensitive to typos, and it's much clearer to dereference the
       variable explicitly.  We're side-stepping the issue of using object-
       oriented programming techniques to encapsulate variable access via
       methods, only accessible through an object; here we're just
       discussing the technical implementation of choice, and whether this
       has an effect on performance.  We can see whether this dereferencing
       operation has any overhead by putting comparative code in a file and
       running a "Benchmark" test.

130       # dereference
131
132           #!/usr/bin/perl
133
134           use strict;
135           use warnings;
136
137           use Benchmark;
138
139           my $ref = {
140                   'ref'   => {
141                       _myscore    => '100 + 1',
142                       _yourscore  => '102 - 1',
143                   },
144           };
145
146           timethese(1000000, {
147                   'direct'       => sub {
148                       my $x = $ref->{ref}->{_myscore} . $ref->{ref}->{_yourscore} ;
149                   },
150                   'dereference'  => sub {
151                       my $ref  = $ref->{ref};
152                       my $myscore = $ref->{_myscore};
153                       my $yourscore = $ref->{_yourscore};
154                       my $x = $myscore . $yourscore;
155                   },
156           });
157
       It's essential to run any timing measurements a sufficient number of
       times so that the numbers settle on a numerical average; otherwise
       each run will naturally fluctuate due to variations in the
       environment, for instance contention for "CPU" resources and network
       bandwidth.  Running the above code for one million iterations, we
       can take a look at the report output by the "Benchmark" module, to
       see which approach is the most effective.

166           $> perl dereference
167
168           Benchmark: timing 1000000 iterations of dereference, direct...
169           dereference:  2 wallclock secs ( 1.59 usr +  0.00 sys =  1.59 CPU) @ 628930.82/s (n=1000000)
170               direct:  1 wallclock secs ( 1.20 usr +  0.00 sys =  1.20 CPU) @ 833333.33/s (n=1000000)
171
       The difference is clear to see and the dereferencing approach is
       slower.  While it managed to execute an average of 628,930 times a
       second during our test, the direct approach managed to run an
       additional 204,403 times per second, unfortunately.  Unfortunately,
       because there are many examples of code written using the multiple
       layer direct variable access, and it's usually horrible.  It is,
       however, minusculely faster.  The question remains whether the
       minute gain is actually worth the eyestrain, or the loss of
       maintainability.

   Search and replace or tr
       If we have a string which needs to be modified, while a regex will
       almost always be much more flexible, "tr", an oft underused tool,
       can still be useful.  One scenario might be to replace all vowels
       with another character.  The regex solution might look like this:

187           $str =~ s/[aeiou]/x/g
188
189       The "tr" alternative might look like this:
190
191           $str =~ tr/aeiou/xxxxx/
192
193       We can put that into a test file which we can run to check which
194       approach is the fastest, using a global $STR variable to assign to the
195       "my $str" variable so as to avoid perl trying to optimize any of the
196       work away by noticing it's assigned only the once.
197
198       # regex-transliterate
199
200           #!/usr/bin/perl
201
202           use strict;
203           use warnings;
204
205           use Benchmark;
206
207           my $STR = "$$-this and that";
208
209           timethese( 1000000, {
210                   'sr'  => sub { my $str = $STR; $str =~ s/[aeiou]/x/g; return $str; },
211                   'tr'  => sub { my $str = $STR; $str =~ tr/aeiou/xxxxx/; return $str; },
212           });
213
214       Running the code gives us our results:
215
216           $> perl regex-transliterate
217
218           Benchmark: timing 1000000 iterations of sr, tr...
219                   sr:  2 wallclock secs ( 1.19 usr +  0.00 sys =  1.19 CPU) @ 840336.13/s (n=1000000)
220                   tr:  0 wallclock secs ( 0.49 usr +  0.00 sys =  0.49 CPU) @ 2040816.33/s (n=1000000)
221
222       The "tr" version is a clear winner.  One solution is flexible, the
223       other is fast - and it's appropriately the programmer's choice which to
224       use.
225
226       Check the "Benchmark" docs for further useful techniques.
227
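       For example, Benchmark's cmpthese() prints a percentage comparison
       chart rather than raw timings; a sketch reusing the sr and tr subs
       from the previous test might look like this:

           #!/usr/bin/perl

           use strict;
           use warnings;

           use Benchmark qw(cmpthese);

           my $STR = "$$-this and that";

           # A negative count runs each sub for at least 5 CPU seconds and
           # prints a table comparing the rates as percentages.
           cmpthese( -5, {
               'sr' => sub { my $str = $STR; $str =~ s/[aeiou]/x/g;   return $str },
               'tr' => sub { my $str = $STR; $str =~ tr/aeiou/xxxxx/; return $str },
           });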

PROFILING TOOLS

229       A slightly larger piece of code will provide something on which a
230       profiler can produce more extensive reporting statistics.  This example
231       uses the simplistic "wordmatch" program which parses a given input file
232       and spews out a short report on the contents.
233
234       # wordmatch
235
236           #!/usr/bin/perl
237
238           use strict;
239           use warnings;
240
241           =head1 NAME
242
243           filewords - word analysis of input file
244
245           =head1 SYNOPSIS
246
247               filewords -f inputfilename [-d]
248
249           =head1 DESCRIPTION
250
251           This program parses the given filename, specified with C<-f>, and displays a
252           simple analysis of the words found therein.  Use the C<-d> switch to enable
253           debugging messages.
254
255           =cut
256
257           use FileHandle;
258           use Getopt::Long;
259
260           my $debug   =  0;
261           my $file    = '';
262
263           my $result = GetOptions (
264               'debug'         => \$debug,
265               'file=s'        => \$file,
266           );
267           die("invalid args") unless $result;
268
269           unless ( -f $file ) {
270               die("Usage: $0 -f filename [-d]");
271           }
272           my $FH = FileHandle->new("< $file") or die("unable to open file($file): $!");
273
274           my $i_LINES = 0;
275           my $i_WORDS = 0;
276           my %count   = ();
277
278           my @lines = <$FH>;
279           foreach my $line ( @lines ) {
280               $i_LINES++;
281               $line =~ s/\n//;
282               my @words = split(/ +/, $line);
283               my $i_words = scalar(@words);
284               $i_WORDS = $i_WORDS + $i_words;
285               debug("line: $i_LINES supplying $i_words words: @words");
286               my $i_word = 0;
287               foreach my $word ( @words ) {
288                   $i_word++;
289                   $count{$i_LINES}{spec} += matches($i_word, $word, '[^a-zA-Z0-9]');
290                   $count{$i_LINES}{only} += matches($i_word, $word, '^[^a-zA-Z0-9]+$');
291                   $count{$i_LINES}{cons} += matches($i_word, $word, '^[(?i:bcdfghjklmnpqrstvwxyz)]+$');
292                   $count{$i_LINES}{vows} += matches($i_word, $word, '^[(?i:aeiou)]+$');
293                   $count{$i_LINES}{caps} += matches($i_word, $word, '^[(A-Z)]+$');
294               }
295           }
296
297           print report( %count );
298
299           sub matches {
300               my $i_wd  = shift;
301               my $word  = shift;
302               my $regex = shift;
303               my $has = 0;
304
305               if ( $word =~ /($regex)/ ) {
306                   $has++ if $1;
307               }
308
309               debug("word: $i_wd ".($has ? 'matches' : 'does not match')." chars: /$regex/");
310
311               return $has;
312           }
313
314           sub report {
315               my %report = @_;
316               my %rep;
317
318               foreach my $line ( keys %report ) {
319                   foreach my $key ( keys %{ $report{$line} } ) {
320                       $rep{$key} += $report{$line}{$key};
321                   }
322               }
323
324               my $report = qq|
325           $0 report for $file:
326           lines in file: $i_LINES
327           words in file: $i_WORDS
328           words with special (non-word) characters: $rep{spec}
329           words with only special (non-word) characters: $rep{only}
330           words with only consonants: $rep{cons}
331           words with only capital letters: $rep{caps}
332           words with only vowels: $rep{vows}
333           |;
334
335               return $report;
336           }
337
338           sub debug {
339               my $message = shift;
340
341               if ( $debug ) {
342                   print STDERR "DBG: $message\n";
343               }
344           }
345
346           exit 0;
347
   Devel::DProf
       This venerable module was the de-facto standard for Perl code
       profiling for more than a decade, but has since been replaced by a
       number of other modules which have brought us back into the 21st
       century.  Although you're recommended to evaluate your tool of
       choice from the several mentioned here and from the CPAN list at the
       base of this document (and currently Devel::NYTProf seems to be the
       weapon of choice - see below), we'll take a quick look at the output
       from Devel::DProf first, to set a baseline for Perl profiling tools.
       Run the above program under the control of "Devel::DProf" by using
       the "-d" switch on the command-line.

359           $> perl -d:DProf wordmatch -f perl5db.pl
360
361           <...multiple lines snipped...>
362
363           wordmatch report for perl5db.pl:
364           lines in file: 9428
365           words in file: 50243
366           words with special (non-word) characters: 20480
367           words with only special (non-word) characters: 7790
368           words with only consonants: 4801
369           words with only capital letters: 1316
370           words with only vowels: 1701
371
372       "Devel::DProf" produces a special file, called tmon.out by default, and
373       this file is read by the "dprofpp" program, which is already installed
374       as part of the "Devel::DProf" distribution.  If you call "dprofpp" with
375       no options, it will read the tmon.out file in the current directory and
376       produce a human readable statistics report of the run of your program.
377       Note that this may take a little time.
378
379           $> dprofpp
380
381           Total Elapsed Time = 2.951677 Seconds
382             User+System Time = 2.871677 Seconds
383           Exclusive Times
384           %Time ExclSec CumulS #Calls sec/call Csec/c  Name
385            102.   2.945  3.003 251215   0.0000 0.0000  main::matches
386            2.40   0.069  0.069 260643   0.0000 0.0000  main::debug
387            1.74   0.050  0.050      1   0.0500 0.0500  main::report
388            1.04   0.030  0.049      4   0.0075 0.0123  main::BEGIN
389            0.35   0.010  0.010      3   0.0033 0.0033  Exporter::as_heavy
390            0.35   0.010  0.010      7   0.0014 0.0014  IO::File::BEGIN
391            0.00       - -0.000      1        -      -  Getopt::Long::FindOption
392            0.00       - -0.000      1        -      -  Symbol::BEGIN
393            0.00       - -0.000      1        -      -  Fcntl::BEGIN
394            0.00       - -0.000      1        -      -  Fcntl::bootstrap
395            0.00       - -0.000      1        -      -  warnings::BEGIN
396            0.00       - -0.000      1        -      -  IO::bootstrap
397            0.00       - -0.000      1        -      -  Getopt::Long::ConfigDefaults
398            0.00       - -0.000      1        -      -  Getopt::Long::Configure
399            0.00       - -0.000      1        -      -  Symbol::gensym
400
401       "dprofpp" will produce some quite detailed reporting on the activity of
402       the "wordmatch" program.  The wallclock, user and system, times are at
403       the top of the analysis, and after this are the main columns defining
404       which define the report.  Check the "dprofpp" docs for details of the
405       many options it supports.
406
407       See also "Apache::DProf" which hooks "Devel::DProf" into "mod_perl".
408
409   Devel::Profiler
410       Let's take a look at the same program using a different profiler:
411       "Devel::Profiler", a drop-in Perl-only replacement for "Devel::DProf".
412       The usage is very slightly different in that instead of using the
413       special "-d:" flag, you pull "Devel::Profiler" in directly as a module
414       using "-M".
415
416           $> perl -MDevel::Profiler wordmatch -f perl5db.pl
417
418           <...multiple lines snipped...>
419
420           wordmatch report for perl5db.pl:
421           lines in file: 9428
422           words in file: 50243
423           words with special (non-word) characters: 20480
424           words with only special (non-word) characters: 7790
425           words with only consonants: 4801
426           words with only capital letters: 1316
427           words with only vowels: 1701
428
429       "Devel::Profiler" generates a tmon.out file which is compatible with
430       the "dprofpp" program, thus saving the construction of a dedicated
431       statistics reader program.  "dprofpp" usage is therefore identical to
432       the above example.
433
434           $> dprofpp
435
436           Total Elapsed Time =   20.984 Seconds
437             User+System Time =   19.981 Seconds
438           Exclusive Times
439           %Time ExclSec CumulS #Calls sec/call Csec/c  Name
440            49.0   9.792 14.509 251215   0.0000 0.0001  main::matches
441            24.4   4.887  4.887 260643   0.0000 0.0000  main::debug
442            0.25   0.049  0.049      1   0.0490 0.0490  main::report
443            0.00   0.000  0.000      1   0.0000 0.0000  Getopt::Long::GetOptions
444            0.00   0.000  0.000      2   0.0000 0.0000  Getopt::Long::ParseOptionSpec
445            0.00   0.000  0.000      1   0.0000 0.0000  Getopt::Long::FindOption
446            0.00   0.000  0.000      1   0.0000 0.0000  IO::File::new
447            0.00   0.000  0.000      1   0.0000 0.0000  IO::Handle::new
448            0.00   0.000  0.000      1   0.0000 0.0000  Symbol::gensym
449            0.00   0.000  0.000      1   0.0000 0.0000  IO::File::open
450
       Interestingly we get slightly different results, which is mostly
       because the algorithm which generates the report is different, even
       though the output file format was allegedly identical.  The elapsed,
       user and system times are clearly showing the time it took for
       "Devel::Profiler" to execute its own run, but the column listings
       feel more accurate somehow than the ones we had earlier from
       "Devel::DProf".  The 102% figure has disappeared, for example.  This
       is where we have to use the tools at our disposal, and recognise
       their pros and cons, before using them.  Interestingly, the numbers
       of calls for each subroutine are identical in the two reports; it's
       the percentages which differ.  As the author of "Devel::Profiler"
       writes:

463           ...running HTML::Template's test suite under Devel::DProf shows output()
464           taking NO time but Devel::Profiler shows around 10% of the time is in output().
465           I don't know which to trust but my gut tells me something is wrong with
466           Devel::DProf.  HTML::Template::output() is a big routine that's called for
467           every test. Either way, something needs fixing.
468
469       YMMV.
470
471       See also "Devel::Apache::Profiler" which hooks "Devel::Profiler" into
472       "mod_perl".
473
474   Devel::SmallProf
475       The "Devel::SmallProf" profiler examines the runtime of your Perl
476       program and produces a line-by-line listing to show how many times each
477       line was called, and how long each line took to execute.  It is called
478       by supplying the familiar "-d" flag to Perl at runtime.
479
480           $> perl -d:SmallProf wordmatch -f perl5db.pl
481
482           <...multiple lines snipped...>
483
484           wordmatch report for perl5db.pl:
485           lines in file: 9428
486           words in file: 50243
487           words with special (non-word) characters: 20480
488           words with only special (non-word) characters: 7790
489           words with only consonants: 4801
490           words with only capital letters: 1316
491           words with only vowels: 1701
492
493       "Devel::SmallProf" writes it's output into a file called smallprof.out,
494       by default.  The format of the file looks like this:
495
496           <num> <time> <ctime> <line>:<text>
497
498       When the program has terminated, the output may be examined and sorted
499       using any standard text filtering utilities.  Something like the
500       following may be sufficient:
501
502           $> cat smallprof.out | grep \d*: | sort -k3 | tac | head -n20
503
504           251215   1.65674   7.68000    75: if ( $word =~ /($regex)/ ) {
505           251215   0.03264   4.40000    79: debug("word: $i_wd ".($has ? 'matches' :
506           251215   0.02693   4.10000    81: return $has;
507           260643   0.02841   4.07000   128: if ( $debug ) {
508           260643   0.02601   4.04000   126: my $message = shift;
509           251215   0.02641   3.91000    73: my $has = 0;
510           251215   0.03311   3.71000    70: my $i_wd  = shift;
511           251215   0.02699   3.69000    72: my $regex = shift;
512           251215   0.02766   3.68000    71: my $word  = shift;
513            50243   0.59726   1.00000    59:  $count{$i_LINES}{cons} =
514            50243   0.48175   0.92000    61:  $count{$i_LINES}{spec} =
515            50243   0.00644   0.89000    56:  my $i_cons = matches($i_word, $word,
516            50243   0.48837   0.88000    63:  $count{$i_LINES}{caps} =
517            50243   0.00516   0.88000    58:  my $i_caps = matches($i_word, $word, '^[(A-
518            50243   0.00631   0.81000    54:  my $i_spec = matches($i_word, $word, '[^a-
519            50243   0.00496   0.80000    57:  my $i_vows = matches($i_word, $word,
520            50243   0.00688   0.80000    53:  $i_word++;
521            50243   0.48469   0.79000    62:  $count{$i_LINES}{only} =
522            50243   0.48928   0.77000    60:  $count{$i_LINES}{vows} =
523            50243   0.00683   0.75000    55:  my $i_only = matches($i_word, $word, '^[^a-
524
       You can immediately see a slightly different focus to the subroutine
       profiling modules, and we start to see exactly which line of code is
       taking the most time.  That regex line is looking a bit suspicious,
       for example.  Remember that these tools are supposed to be used
       together; there is no single best way to profile your code, so you
       need to use the best tools for the job.

532       See also "Apache::SmallProf" which hooks "Devel::SmallProf" into
533       "mod_perl".
534
   Devel::FastProf
       "Devel::FastProf" is another Perl line profiler.  It was written
       with a view to getting a faster line profiler than is possible with,
       for example, "Devel::SmallProf", because it's written in "C".  To
       use "Devel::FastProf", supply the "-d" argument to Perl:

541           $> perl -d:FastProf wordmatch -f perl5db.pl
542
543           <...multiple lines snipped...>
544
545           wordmatch report for perl5db.pl:
546           lines in file: 9428
547           words in file: 50243
548           words with special (non-word) characters: 20480
549           words with only special (non-word) characters: 7790
550           words with only consonants: 4801
551           words with only capital letters: 1316
552           words with only vowels: 1701
553
554       "Devel::FastProf" writes statistics to the file fastprof.out in the
555       current directory.  The output file, which can be specified, can be
556       interpreted by using the "fprofpp" command-line program.
557
558           $> fprofpp | head -n20
559
560           # fprofpp output format is:
561           # filename:line time count: source
562           wordmatch:75 3.93338 251215: if ( $word =~ /($regex)/ ) {
563           wordmatch:79 1.77774 251215: debug("word: $i_wd ".($has ? 'matches' : 'does not match')." chars: /$regex/");
564           wordmatch:81 1.47604 251215: return $has;
565           wordmatch:126 1.43441 260643: my $message = shift;
566           wordmatch:128 1.42156 260643: if ( $debug ) {
567           wordmatch:70 1.36824 251215: my $i_wd  = shift;
568           wordmatch:71 1.36739 251215: my $word  = shift;
569           wordmatch:72 1.35939 251215: my $regex = shift;
570
       Straightaway we can see that the number of times each line has been
       called is identical to the "Devel::SmallProf" output, and that the
       sequence is only very slightly different, based on the ordering of
       the amount of time each line took to execute - "if ( $debug ) {" and
       "my $message = shift;", for example, have swapped places.  The
       differences in the actual times recorded might lie in the algorithm
       used internally, or could be due to system resource limitations or
       contention.

       See also "DBIx::Profile", which will profile database queries
       running under the "DBIx::*" namespace.
581
   Devel::NYTProf
       "Devel::NYTProf" is the next generation of Perl code profiler,
       fixing many shortcomings in other tools and implementing many cool
       features.  First of all it can be used as a line profiler, a block
       profiler and a subroutine profiler, all at once.  It can also use
       sub-microsecond (100ns) resolution on systems which provide
       "clock_gettime()".  It can be started and stopped even by the
       program being profiled (see the sketch at the end of this section).
       It's a one-line entry to profile "mod_perl" applications.  It's
       written in "c" and is probably the fastest profiler available for
       Perl.  The list of coolness just goes on.  Enough of that, let's see
       how it works - just use the familiar "-d" switch to plug it in and
       run the code.

594           $> perl -d:NYTProf wordmatch -f perl5db.pl
595
596           wordmatch report for perl5db.pl:
597           lines in file: 9427
598           words in file: 50243
599           words with special (non-word) characters: 20480
600           words with only special (non-word) characters: 7790
601           words with only consonants: 4801
602           words with only capital letters: 1316
603           words with only vowels: 1701
604
605       "NYTProf" will generate a report database into the file nytprof.out by
606       default.  Human readable reports can be generated from here by using
607       the supplied "nytprofhtml" (HTML output) and "nytprofcsv" (CSV output)
608       programs.  We've used the Unix system "html2text" utility to convert
609       the nytprof/index.html file for convenience here.
610
611           $> html2text nytprof/index.html
612
613           Performance Profile Index
614           For wordmatch
615             Run on Fri Sep 26 13:46:39 2008
616           Reported on Fri Sep 26 13:47:23 2008
617
618                    Top 15 Subroutines -- ordered by exclusive time
619           |Calls |P |F |Inclusive|Exclusive|Subroutine                          |
620           |      |  |  |Time     |Time     |                                    |
621           |251215|5 |1 |13.09263 |10.47692 |main::              |matches        |
622           |260642|2 |1 |2.71199  |2.71199  |main::              |debug          |
623           |1     |1 |1 |0.21404  |0.21404  |main::              |report         |
624           |2     |2 |2 |0.00511  |0.00511  |XSLoader::          |load (xsub)    |
625           |14    |14|7 |0.00304  |0.00298  |Exporter::          |import         |
626           |3     |1 |1 |0.00265  |0.00254  |Exporter::          |as_heavy       |
627           |10    |10|4 |0.00140  |0.00140  |vars::              |import         |
628           |13    |13|1 |0.00129  |0.00109  |constant::          |import         |
629           |1     |1 |1 |0.00360  |0.00096  |FileHandle::        |import         |
630           |3     |3 |3 |0.00086  |0.00074  |warnings::register::|import         |
631           |9     |3 |1 |0.00036  |0.00036  |strict::            |bits           |
632           |13    |13|13|0.00032  |0.00029  |strict::            |import         |
633           |2     |2 |2 |0.00020  |0.00020  |warnings::          |import         |
634           |2     |1 |1 |0.00020  |0.00020  |Getopt::Long::      |ParseOptionSpec|
635           |7     |7 |6 |0.00043  |0.00020  |strict::            |unimport       |
636
637           For more information see the full list of 189 subroutines.
638
639       The first part of the report already shows the critical information
640       regarding which subroutines are using the most time.  The next gives
641       some statistics about the source files profiled.
642
643                   Source Code Files -- ordered by exclusive time then name
644           |Stmts  |Exclusive|Avg.   |Reports                     |Source File         |
645           |       |Time     |       |                            |                    |
646           |2699761|15.66654 |6e-06  |line   .    block   .    sub|wordmatch           |
647           |35     |0.02187  |0.00062|line   .    block   .    sub|IO/Handle.pm        |
648           |274    |0.01525  |0.00006|line   .    block   .    sub|Getopt/Long.pm      |
649           |20     |0.00585  |0.00029|line   .    block   .    sub|Fcntl.pm            |
650           |128    |0.00340  |0.00003|line   .    block   .    sub|Exporter/Heavy.pm   |
651           |42     |0.00332  |0.00008|line   .    block   .    sub|IO/File.pm          |
652           |261    |0.00308  |0.00001|line   .    block   .    sub|Exporter.pm         |
653           |323    |0.00248  |8e-06  |line   .    block   .    sub|constant.pm         |
654           |12     |0.00246  |0.00021|line   .    block   .    sub|File/Spec/Unix.pm   |
655           |191    |0.00240  |0.00001|line   .    block   .    sub|vars.pm             |
656           |77     |0.00201  |0.00003|line   .    block   .    sub|FileHandle.pm       |
657           |12     |0.00198  |0.00016|line   .    block   .    sub|Carp.pm             |
658           |14     |0.00175  |0.00013|line   .    block   .    sub|Symbol.pm           |
659           |15     |0.00130  |0.00009|line   .    block   .    sub|IO.pm               |
660           |22     |0.00120  |0.00005|line   .    block   .    sub|IO/Seekable.pm      |
661           |198    |0.00085  |4e-06  |line   .    block   .    sub|warnings/register.pm|
662           |114    |0.00080  |7e-06  |line   .    block   .    sub|strict.pm           |
663           |47     |0.00068  |0.00001|line   .    block   .    sub|warnings.pm         |
664           |27     |0.00054  |0.00002|line   .    block   .    sub|overload.pm         |
665           |9      |0.00047  |0.00005|line   .    block   .    sub|SelectSaver.pm      |
666           |13     |0.00045  |0.00003|line   .    block   .    sub|File/Spec.pm        |
667           |2701595|15.73869 |       |Total                       |
668           |128647 |0.74946  |       |Average                     |
669           |       |0.00201  |0.00003|Median                      |
670           |       |0.00121  |0.00003|Deviation                   |
671
672           Report produced by the NYTProf 2.03 Perl profiler, developed by Tim Bunce and
673           Adam Kaplan.
674
       At this point, if you're using the html report, you can click
       through the various links to drill down into each subroutine and
       each line of code.  Because we're using the text reporting here, and
       there's a whole directory full of reports built for each source
       file, we'll just display a part of the corresponding
       wordmatch-line.html file, sufficient to give an idea of the sort of
       output you can expect from this cool tool.
682
683           $> html2text nytprof/wordmatch-line.html
684
685           Performance Profile -- -block view-.-line view-.-sub view-
686           For wordmatch
687           Run on Fri Sep 26 13:46:39 2008
688           Reported on Fri Sep 26 13:47:22 2008
689
690           File wordmatch
691
692            Subroutines -- ordered by exclusive time
693           |Calls |P|F|Inclusive|Exclusive|Subroutine    |
694           |      | | |Time     |Time     |              |
695           |251215|5|1|13.09263 |10.47692 |main::|matches|
696           |260642|2|1|2.71199  |2.71199  |main::|debug  |
697           |1     |1|1|0.21404  |0.21404  |main::|report |
698           |0     |0|0|0        |0        |main::|BEGIN  |
699
700
701           |Line|Stmts.|Exclusive|Avg.   |Code                                           |
702           |    |      |Time     |       |                                               |
703           |1   |      |         |       |#!/usr/bin/perl                                |
704           |2   |      |         |       |                                               |
705           |    |      |         |       |use strict;                                    |
706           |3   |3     |0.00086  |0.00029|# spent 0.00003s making 1 calls to strict::    |
707           |    |      |         |       |import                                         |
708           |    |      |         |       |use warnings;                                  |
709           |4   |3     |0.01563  |0.00521|# spent 0.00012s making 1 calls to warnings::  |
710           |    |      |         |       |import                                         |
711           |5   |      |         |       |                                               |
712           |6   |      |         |       |=head1 NAME                                    |
713           |7   |      |         |       |                                               |
714           |8   |      |         |       |filewords - word analysis of input file        |
715           <...snip...>
716           |62  |1     |0.00445  |0.00445|print report( %count );                        |
717           |    |      |         |       |# spent 0.21404s making 1 calls to main::report|
718           |63  |      |         |       |                                               |
719           |    |      |         |       |# spent 23.56955s (10.47692+2.61571) within    |
720           |    |      |         |       |main::matches which was called 251215 times,   |
721           |    |      |         |       |avg 0.00005s/call: # 50243 times               |
722           |    |      |         |       |(2.12134+0.51939s) at line 57 of wordmatch, avg|
723           |    |      |         |       |0.00005s/call # 50243 times (2.17735+0.54550s) |
724           |64  |      |         |       |at line 56 of wordmatch, avg 0.00005s/call #   |
725           |    |      |         |       |50243 times (2.10992+0.51797s) at line 58 of   |
726           |    |      |         |       |wordmatch, avg 0.00005s/call # 50243 times     |
727           |    |      |         |       |(2.12696+0.51598s) at line 55 of wordmatch, avg|
728           |    |      |         |       |0.00005s/call # 50243 times (1.94134+0.51687s) |
729           |    |      |         |       |at line 54 of wordmatch, avg 0.00005s/call     |
730           |    |      |         |       |sub matches {                                  |
731           <...snip...>
732           |102 |      |         |       |                                               |
733           |    |      |         |       |# spent 2.71199s within main::debug which was  |
734           |    |      |         |       |called 260642 times, avg 0.00001s/call: #      |
735           |    |      |         |       |251215 times (2.61571+0s) by main::matches at  |
736           |103 |      |         |       |line 74 of wordmatch, avg 0.00001s/call # 9427 |
737           |    |      |         |       |times (0.09628+0s) at line 50 of wordmatch, avg|
738           |    |      |         |       |0.00001s/call                                  |
739           |    |      |         |       |sub debug {                                    |
740           |104 |260642|0.58496  |2e-06  |my $message = shift;                           |
741           |105 |      |         |       |                                               |
742           |106 |260642|1.09917  |4e-06  |if ( $debug ) {                                |
743           |107 |      |         |       |print STDERR "DBG: $message\n";                |
744           |108 |      |         |       |}                                              |
745           |109 |      |         |       |}                                              |
746           |110 |      |         |       |                                               |
747           |111 |1     |0.01501  |0.01501|exit 0;                                        |
748           |112 |      |         |       |                                               |
749
750       Oodles of very useful information in there - this seems to be the way
751       forward.
752
753       See also "Devel::NYTProf::Apache" which hooks "Devel::NYTProf" into
754       "mod_perl".
755
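       As mentioned above, the profiler can be started and stopped by the
       program being profiled.  At the time of writing, the
       "Devel::NYTProf" documentation describes DB::enable_profile() and
       DB::disable_profile() for this purpose; a minimal sketch, assuming
       the program is run under "-d:NYTProf" with the NYTPROF environment
       variable set to "start=no" so that profiling does not begin
       automatically, might look like this:

           #!/usr/bin/perl
           # run as:  NYTPROF=start=no perl -d:NYTProf selective-profile

           use strict;
           use warnings;

           expensive_setup();          # not profiled

           DB::enable_profile();       # start collecting data here
           interesting_bit();
           DB::disable_profile();      # ...and stop again here

           expensive_teardown();       # not profiled

           # Stand-in subroutines, purely for illustration.
           sub expensive_setup    { my $x = ''; $x .= 'x' x 1_000 for 1 .. 1_000 }
           sub interesting_bit    { my $t = 0;  $t += sqrt $_    for 1 .. 100_000 }
           sub expensive_teardown { my @junk = map { [ ($_) x 10 ] } 1 .. 10_000 }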

SORTING

       Perl modules are not the only tools a performance analyst has at
       their disposal; system tools like "time" should not be overlooked,
       as the next example shows, where we take a quick look at sorting.
       Many books, theses and articles have been written about efficient
       sorting algorithms, and this is not the place to repeat such work;
       there are also several good sorting modules which deserve a look:
       "Sort::Maker" and "Sort::Key" spring to mind.  However, it's still
       possible to make some observations on certain Perl specific issues
       relating to sorting data sets, and to give an example or two with
       regard to how sorting large data volumes can affect performance.
       Firstly, an often overlooked point when sorting large amounts of
       data is that one can attempt to reduce the data set to be dealt
       with; in many cases "grep()" can be quite useful as a simple filter:

771           @data = sort grep { /$filter/ } @incoming
772
       A command such as this can vastly reduce the volume of material to
       actually sort through in the first place, and should not be too
       lightly disregarded purely on the basis of its simplicity.  The
       "KISS" principle is too often overlooked - the next example uses the
       simple system "time" utility to demonstrate.  Let's take a look at
       an actual example of sorting the contents of a large file; an apache
       logfile would do.  This one has over a quarter of a million lines,
       is 50M in size, and a snippet of it looks like this:
781
782       # logfile
783
784           188.209-65-87.adsl-dyn.isp.belgacom.be - - [08/Feb/2007:12:57:16 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
785           188.209-65-87.adsl-dyn.isp.belgacom.be - - [08/Feb/2007:12:57:16 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
786           151.56.71.198 - - [08/Feb/2007:12:57:41 +0000] "GET /suse-on-vaio.html HTTP/1.1" 200 2858 "http://www.linux-on-laptops.com/sony.html" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1"
787           151.56.71.198 - - [08/Feb/2007:12:57:42 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/suse-on-vaio.html" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1"
788           151.56.71.198 - - [08/Feb/2007:12:57:43 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1"
789           217.113.68.60 - - [08/Feb/2007:13:02:15 +0000] "GET / HTTP/1.1" 304 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
790           217.113.68.60 - - [08/Feb/2007:13:02:16 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
791           debora.to.isac.cnr.it - - [08/Feb/2007:13:03:58 +0000] "GET /suse-on-vaio.html HTTP/1.1" 200 2858 "http://www.linux-on-laptops.com/sony.html" "Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.0 (like Gecko)"
792           debora.to.isac.cnr.it - - [08/Feb/2007:13:03:58 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/suse-on-vaio.html" "Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.0 (like Gecko)"
793           debora.to.isac.cnr.it - - [08/Feb/2007:13:03:58 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.0 (like Gecko)"
794           195.24.196.99 - - [08/Feb/2007:13:26:48 +0000] "GET / HTTP/1.0" 200 3309 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9"
795           195.24.196.99 - - [08/Feb/2007:13:26:58 +0000] "GET /data/css HTTP/1.0" 404 206 "http://www.rfi.net/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9"
796           195.24.196.99 - - [08/Feb/2007:13:26:59 +0000] "GET /favicon.ico HTTP/1.0" 404 209 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9"
797           crawl1.cosmixcorp.com - - [08/Feb/2007:13:27:57 +0000] "GET /robots.txt HTTP/1.0" 200 179 "-" "voyager/1.0"
798           crawl1.cosmixcorp.com - - [08/Feb/2007:13:28:25 +0000] "GET /links.html HTTP/1.0" 200 3413 "-" "voyager/1.0"
799           fhm226.internetdsl.tpnet.pl - - [08/Feb/2007:13:37:32 +0000] "GET /suse-on-vaio.html HTTP/1.1" 200 2858 "http://www.linux-on-laptops.com/sony.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
800           fhm226.internetdsl.tpnet.pl - - [08/Feb/2007:13:37:34 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/suse-on-vaio.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
801           80.247.140.134 - - [08/Feb/2007:13:57:35 +0000] "GET / HTTP/1.1" 200 3309 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
802           80.247.140.134 - - [08/Feb/2007:13:57:37 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
803           pop.compuscan.co.za - - [08/Feb/2007:14:10:43 +0000] "GET / HTTP/1.1" 200 3309 "-" "www.clamav.net"
804           livebot-207-46-98-57.search.live.com - - [08/Feb/2007:14:12:04 +0000] "GET /robots.txt HTTP/1.0" 200 179 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
805           livebot-207-46-98-57.search.live.com - - [08/Feb/2007:14:12:04 +0000] "GET /html/oracle.html HTTP/1.0" 404 214 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
806           dslb-088-064-005-154.pools.arcor-ip.net - - [08/Feb/2007:14:12:15 +0000] "GET / HTTP/1.1" 200 3309 "-" "www.clamav.net"
807           196.201.92.41 - - [08/Feb/2007:14:15:01 +0000] "GET / HTTP/1.1" 200 3309 "-" "MOT-L7/08.B7.DCR MIB/2.2.1 Profile/MIDP-2.0 Configuration/CLDC-1.1"
808
809       The specific task here is to sort the 286,525 lines of this file by
810       Response Code, Query, Browser, Referring Url, and lastly Date.  One
811       solution might be to use the following code, which iterates over the
812       files given on the command-line.
813
814       # sort-apache-log
815
816           #!/usr/bin/perl -n
817
818           use strict;
819           use warnings;
820
821           my @data;
822
823           LINE:
824           while ( <> ) {
825               my $line = $_;
826               if (
827                   $line =~ m/^(
828                       ([\w\.\-]+)             # client
829                       \s*-\s*-\s*\[
830                       ([^]]+)                 # date
831                       \]\s*"\w+\s*
832                       (\S+)                   # query
833                       [^"]+"\s*
834                       (\d+)                   # status
835                       \s+\S+\s+"[^"]*"\s+"
836                       ([^"]*)                 # browser
837                       "
838                       .*
839                   )$/x
840               ) {
841                   my @chunks = split(/ +/, $line);
842                   my $ip      = $1;
843                   my $date    = $2;
844                   my $query   = $3;
845                   my $status  = $4;
846                   my $browser = $5;
847
848                   push(@data, [$ip, $date, $query, $status, $browser, $line]);
849               }
850           }
851
852           my @sorted = sort {
853               $a->[3] cmp $b->[3]
854                       ||
855               $a->[2] cmp $b->[2]
856                       ||
857               $a->[0] cmp $b->[0]
858                       ||
859               $a->[1] cmp $b->[1]
860                       ||
861               $a->[4] cmp $b->[4]
862           } @data;
863
864           foreach my $data ( @sorted ) {
865               print $data->[5];
866           }
867
868           exit 0;
869
       When running this program, redirect "STDOUT" so that it is possible
       to check that the output from subsequent test runs is correct, and
       use the system "time" utility to check the overall runtime.
873
874           $> time ./sort-apache-log logfile > out-sort
875
876           real    0m17.371s
877           user    0m15.757s
878           sys     0m0.592s
879
       The program took just over 17 wallclock seconds to run.  Note the
       different values "time" outputs; it's important to always use the
       same one, and not to confuse what each one means.

884       Elapsed Real Time
885           The overall, or wallclock, time between when "time" was called, and
886           when it terminates.  The elapsed time includes both user and system
887           times, and time spent waiting for other users and processes on the
888           system.  Inevitably, this is the most approximate of the
889           measurements given.
890
891       User CPU Time
892           The user time is the amount of time the entire process spent on
893           behalf of the user on this system executing this program.
894
895       System CPU Time
           The system time is the amount of time the kernel itself spent
           executing routines, or system calls, on behalf of this process's
           user.
899
       Running this same process as a "Schwartzian Transform" it is
       possible to eliminate the input and output arrays for storing all
       the data, and work on the input directly as it arrives too.
       Otherwise, the code looks fairly similar:
904
905       # sort-apache-log-schwarzian
906
907           #!/usr/bin/perl -n
908
909           use strict;
910           use warnings;
911
912           print
913
914               map $_->[0] =>
915
916               sort {
917                   $a->[4] cmp $b->[4]
918                           ||
919                   $a->[3] cmp $b->[3]
920                           ||
921                   $a->[1] cmp $b->[1]
922                           ||
923                   $a->[2] cmp $b->[2]
924                           ||
925                   $a->[5] cmp $b->[5]
926               }
927               map  [ $_, m/^(
928                   ([\w\.\-]+)             # client
929                   \s*-\s*-\s*\[
930                   ([^]]+)                 # date
931                   \]\s*"\w+\s*
932                   (\S+)                   # query
933                   [^"]+"\s*
934                   (\d+)                   # status
935                   \s+\S+\s+"[^"]*"\s+"
936                   ([^"]*)                 # browser
937                   "
938                   .*
939               )$/xo ]
940
941               => <>;
942
943           exit 0;
944
945       Run the new code against the same logfile, as above, to check the new
946       time.
947
948           $> time ./sort-apache-log-schwarzian logfile > out-schwarz
949
950           real    0m9.664s
951           user    0m8.873s
952           sys     0m0.704s
953
       The time has been cut in half, which is a respectable speed
       improvement by any standard.  Naturally, it is important to check
       that the output is consistent with the first program run; this is
       where the Unix system "cksum" utility comes in.
958
959           $> cksum out-sort out-schwarz
960           3044173777 52029194 out-sort
961           3044173777 52029194 out-schwarz
962
       BTW.  Beware too of pressure from managers who see you speed a
       program up by 50% of the runtime once, only to get a request one
       month later to do the same again (true story) - you'll just have to
       point out that you're only human, even if you are a Perl programmer,
       and you'll see what you can do...
968

LOGGING

       An essential part of any good development process is appropriate
       error handling with appropriately informative messages; however,
       there exists a school of thought which suggests that log files
       should be chatty, as if an unbroken chain of output somehow ensures
       the survival of the program.  If speed is in any way an issue, this
       approach is wrong.

976       A common sight is code which looks something like this:
977
978           logger->debug( "A logging message via process-id: $$ INC: " . Dumper(\%INC) )
979
980       The problem is that this code will always be parsed and executed, even
981       when the debug level set in the logging configuration file is zero.
982       Once the debug() subroutine has been entered, and the internal $debug
983       variable confirmed to be zero, for example, the message which has been
984       sent in will be discarded and the program will continue.  In the
985       example given though, the \%INC hash will already have been dumped, and
986       the message string constructed, all of which work could be bypassed by
987       a debug variable at the statement level, like this:
988
989           logger->debug( "A logging message via process-id: $$ INC: " . Dumper(\%INC) ) if $DEBUG;
990
991       This effect can be demonstrated by setting up a test script with both
992       forms, including a "debug()" subroutine to emulate typical "logger()"
993       functionality.
994
995       # ifdebug
996
997           #!/usr/bin/perl
998
999           use strict;
1000           use warnings;
1001
1002           use Benchmark;
1003           use Data::Dumper;
1004           my $DEBUG = 0;
1005
1006           sub debug {
1007               my $msg = shift;
1008
1009               if ( $DEBUG ) {
1010                   print "DEBUG: $msg\n";
1011               }
1012           };
1013
1014           timethese(100000, {
1015                   'debug'       => sub {
1016                       debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) )
1017                   },
1018                   'ifdebug'  => sub {
1019                       debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) ) if $DEBUG
1020                   },
1021           });
1022
1023       Let's see what "Benchmark" makes of this:
1024
1025           $> perl ifdebug
1026           Benchmark: timing 100000 iterations of debug, ifdebug...
1027              ifdebug:  0 wallclock secs ( 0.01 usr +  0.00 sys =  0.01 CPU) @ 10000000.00/s (n=100000)
1028                       (warning: too few iterations for a reliable count)
1029                debug: 14 wallclock secs (13.18 usr +  0.04 sys = 13.22 CPU) @ 7564.30/s (n=100000)
1030
       In the one case the code, which does exactly the same thing as far
       as outputting any debugging information is concerned (in other
       words, nothing), takes 14 seconds, while in the other case the code
       takes one hundredth of a second.  Looks fairly definitive.  Use a
       $DEBUG variable BEFORE you call the subroutine, rather than relying
       on the smart functionality inside it.
1037
1038   Logging if DEBUG (constant)
1039       It's possible to take the previous idea a little further, by using a
1040       compile time "DEBUG" constant.
1041
1042       # ifdebug-constant
1043
1044           #!/usr/bin/perl
1045
1046           use strict;
1047           use warnings;
1048
1049           use Benchmark;
1050           use Data::Dumper;
1051           use constant
1052               DEBUG => 0
1053           ;
1054
1055           sub debug {
1056               if ( DEBUG ) {
1057                   my $msg = shift;
1058                   print "DEBUG: $msg\n";
1059               }
1060           };
1061
1062           timethese(100000, {
1063                   'sub'          => sub {
1064                       debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) )
1065                   },
1066                   'constant'  => sub {
1067                       debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) ) if DEBUG
1068                   },
1069           });
1070
1071       Running this program produces the following output:
1072
1073           $> perl ifdebug-constant
1074           Benchmark: timing 100000 iterations of constant, sub...
1075             constant:  0 wallclock secs (-0.00 usr +  0.00 sys = -0.00 CPU) @ -7205759403792793600000.00/s (n=100000)
1076                       (warning: too few iterations for a reliable count)
1077                  sub: 14 wallclock secs (13.09 usr +  0.00 sys = 13.09 CPU) @ 7639.42/s (n=100000)
1078
1079       The "DEBUG" constant wipes the floor with even the $debug variable,
1080       clocking in at minus zero seconds, and generates a "warning: too few
1081       iterations for a reliable count" message into the bargain.  To see what
1082       is really going on, and why we had too few iterations when we thought
1083       we asked for 100000, we can use the very useful "B::Deparse" to inspect
1084       the new code:
1085
1086           $> perl -MO=Deparse ifdebug-constant
1087
1088           use Benchmark;
1089           use Data::Dumper;
1090           use constant ('DEBUG', 0);
1091           sub debug {
1092               use warnings;
1093               use strict 'refs';
1094               0;
1095           }
1096           use warnings;
1097           use strict 'refs';
1098           timethese(100000, {'sub', sub {
1099               debug "A $0 logging message via process-id: $$" . Dumper(\%INC);
1100           }
1101           , 'constant', sub {
1102               0;
1103           }
1104           });
1105           ifdebug-constant syntax OK
1106
       The output shows that the body of the sub being tested under the
       'constant' label has been replaced with the value of the "DEBUG"
       constant: zero.  The line to be tested has been completely optimized
       away, and you can't get much more efficient than that.
1111

POSTSCRIPT

       This document has provided several ways to go about identifying
       hot-spots, and checking whether any modifications have improved the
       runtime of the code.
1116
       As a final thought, remember that it's not (at the time of writing)
       possible to produce a useful program which will run in zero or
       negative time, and this basic principle can be written as: useful
       programs are slow by their very definition.  It is of course
       possible to write a nearly instantaneous program, but it's not going
       to do very much; here's a very efficient one:
1123
1124           $> perl -e 0
1125
1126       Optimizing that any further is a job for "p5p".
1127

SEE ALSO

1129       Further reading can be found using the modules and links below.
1130
1131   PERLDOCS
1132       For example: "perldoc -f sort".
1133
1134       perlfaq4.
1135
1136       perlfork, perlfunc, perlretut, perlthrtut.
1137
1138       threads.
1139
1140   MAN PAGES
1141       "time".
1142
1143   MODULES
1144       It's not possible to individually showcase all the performance related
1145       code for Perl here, naturally, but here's a short list of modules from
1146       the CPAN which deserve further attention.
1147
1148           Apache::DProf
1149           Apache::SmallProf
1150           Benchmark
1151           DBIx::Profile
1152           Devel::AutoProfiler
1153           Devel::DProf
1154           Devel::DProfLB
1155           Devel::FastProf
1156           Devel::GraphVizProf
1157           Devel::NYTProf
1158           Devel::NYTProf::Apache
1159           Devel::Profiler
1160           Devel::Profile
1161           Devel::Profit
1162           Devel::SmallProf
1163           Devel::WxProf
1164           POE::Devel::Profiler
1165           Sort::Key
1166           Sort::Maker
1167
1168   URLS
1169       Very useful online reference material:
1170
1171           http://www.ccl4.org/~nick/P/Fast_Enough/
1172
1173           http://www-128.ibm.com/developerworks/library/l-optperl.html
1174
1175           http://perlbuzz.com/2007/11/bind-output-variables-in-dbi-for-speed-and-safety.html
1176
1177           http://en.wikipedia.org/wiki/Performance_analysis
1178
1179           http://apache.perl.org/docs/1.0/guide/performance.html
1180
1181           http://perlgolf.sourceforge.net/
1182
1183           http://www.sysarch.com/Perl/sort_paper.html
1184

AUTHOR

1186       Richard Foley <richard.foley@rfi.net> Copyright (c) 2008
1187
perl v5.16.3                      2013-03-04                       PERLPERF(1)