PERLPERF(1)            Perl Programmers Reference Guide            PERLPERF(1)
2
3
4

NAME

6       perlperf - Perl Performance and Optimization Techniques
7

DESCRIPTION

9       This is an introduction to performance and optimization techniques,
10       with particular reference to perl programs.  While many perl
11       developers have come from other languages, and can use their prior
12       knowledge where appropriate, there are many other people who might
13       benefit from a few perl-specific pointers.  If you want the condensed
14       version, perhaps the best advice comes from the renowned Japanese
15       Samurai, Miyamoto Musashi, who said:
16
17           "Do Not Engage in Useless Activity"
18
19       in 1645.
20

OVERVIEW

22       Perhaps the most common mistake programmers make is to attempt to
23       optimize their code before a program actually does anything useful -
24       this is a bad idea.  There's no point in having an extremely fast
25       program that doesn't work.  The first job is to get a program to
26       correctly do something useful (not to mention ensuring the test suite
27       is fully functional), and only then to consider optimizing it.  Having
28       decided to optimize existing working code, there are several simple but
29       essential steps to consider which are intrinsic to any optimization
30       process.
31
32   ONE STEP SIDEWAYS
33       Firstly, you need to establish a baseline time for the existing code,
34       and this timing needs to be reliable and repeatable.  You'll probably
35       want to use the "Benchmark" or "Devel::DProf" modules, or something
36       similar, for this step, or perhaps the Unix system "time" utility,
37       whichever is appropriate.  See the base of this document for a longer
38       list of benchmarking and profiling modules, and recommended further reading.
39
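       As a minimal sketch of taking such a baseline with the "Benchmark"
       module: "process_file()" below is just a hypothetical stand-in for
       whatever code you actually intend to measure, and the negative count
       asks "Benchmark" to keep running it for at least that many CPU seconds.

           #!/usr/bin/perl

           use strict;
           use warnings;

           use Benchmark qw(timethis);

           # A stand-in for the code being baselined - replace with your own.
           sub process_file {
               my $total = 0;
               $total += $_ for 1 .. 10_000;
               return $total;
           }

           # Run for at least 10 CPU seconds and report the rate, which gives
           # a repeatable figure to compare against after each change.
           timethis( -10, \&process_file );
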
40   ONE STEP FORWARD
41       Next, having examined the program for hot spots (places where the code
42       seems to run slowly), change the code with the intention of making it
43       run faster.  Using version control software, like "subversion", will
44       ensure no changes are irreversible.  It's too easy to fiddle here and
45       fiddle there - don't change too much at any one time or you might not
46       discover which piece of code really was the slow bit.
47
48   ANOTHER STEP SIDEWAYS
49       It's not enough to say: "that will make it run faster" - you have to
50       check it.  Rerun the code under the control of the benchmarking or
51       profiling modules, from the first step above, and check that the new
52       code executed the same task in less time.  Save your work and repeat...
53

GENERAL GUIDELINES

55       The critical thing when considering performance is to remember there
56       is no such thing as a "Silver Bullet", which is why there are no
57       rules, only guidelines.
58
59       It is clear that inline code is going to be faster than subroutine or
60       method calls, because there is less overhead, but this approach has
61       the disadvantage of being less maintainable and comes at the cost of
62       greater memory usage - there is no such thing as a free lunch.  If you
63       are searching for an element in a list, it can be more efficient to
64       store the data in a hash structure, and then simply look to see
65       whether the key is defined, rather than to loop through the entire
66       array using grep(), for instance (a comparison sketch follows below).
67       substr() may be (a lot) faster than grep() but not as flexible, so you
68       have another trade-off to assess.  Your code may contain a line which
69       takes 0.01 of a second to execute; if you call it 1,000 times, quite
70       likely in a program parsing even medium-sized files for instance, you
71       already have a 10 second delay in just one single code location, and
72       if you call that line 100,000 times, your entire program will slow down to an unbearable crawl.
73
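       Here is a minimal sketch of that comparison, using the "Benchmark"
       module; the list size and the value searched for are arbitrary choices
       for the example, and the exact numbers will vary with your hardware:

           #!/usr/bin/perl

           use strict;
           use warnings;

           use Benchmark;

           my @list   = ( 1 .. 1_000 );
           my %hash   = map { $_ => 1 } @list;
           my $wanted = 750;

           timethese( 100_000, {
                   'grep'   => sub { my $found = grep { $_ == $wanted } @list; },
                   'exists' => sub { my $found = exists $hash{$wanted}; },
           });
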
74       Using a subroutine as part of your sort is a powerful way to get
75       exactly what you want, but will usually be slower than the built-in
76       alphabetic "cmp" and numeric "<=>" sort operators.  It is possible to
77       make multiple passes over your data, building indices to make the
78       upcoming sort more efficient, and to use what is known as the "OM"
79       (Orcish Maneuver) to cache the sort keys in advance.  The cache
80       lookup, while a good idea, can itself be a source of slowdown by
81       enforcing a double pass over the data - once to set up the cache, and
82       once to sort the data.  Using "pack()" to extract the required sort
83       key into a consistent string can be an efficient way to build a single
84       string to compare, instead of using multiple sort keys, and makes it
85       possible to use the standard perl "sort()" function, which is written
86       in "C" and fast, on the output; this is the basis of the "GRT"
87       (Guttman-Rosler Transform).  Some string combinations can slow the
88       "GRT" down, by just being too plain complex for its own good.
89
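       As a concrete illustration, here is a minimal sketch of both
       techniques applied to some hypothetical "name:score" records, sorting
       by the numeric score; the record layout and the fixed-width
       "pack('N', ...)" key are assumptions made purely for this example:

           #!/usr/bin/perl

           use strict;
           use warnings;

           my @records = ( 'ann:42', 'bob:7', 'cal:108', 'deb:7' );

           # Orcish Maneuver ("OR-cache"): compute each sort key at most once
           # and cache it - note that a key which evaluates to false would be
           # recomputed, a well-known caveat of the ||= idiom.
           my %key_for;
           my @om_sorted = sort {
               ( $key_for{$a} ||= ( split /:/, $a )[1] )
                   <=>
               ( $key_for{$b} ||= ( split /:/, $b )[1] )
           } @records;

           # GRT: pack the numeric key into a fixed-width binary prefix, let
           # the plain string sort do the work, then strip the prefix again.
           my @grt_sorted = map  { substr( $_, 4 ) }
                            sort
                            map  { pack( 'N', ( split /:/ )[1] ) . $_ } @records;

           print "OM:  @om_sorted\n";
           print "GRT: @grt_sorted\n";
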
90       For applications using database backends, the standard "DBIx"
91       namespace tries to help with keeping things nippy, not least because
92       it tries not to query the database until the latest possible moment,
93       but always read the docs which come with your choice of libraries.
94       Among the many issues developers dealing with databases should remain
95       aware of are always using "SQL" placeholders and considering pre-
96       fetching data sets when this might prove advantageous.  Splitting up
97       the work of parsing a single large file among multiple processes,
98       using, say, "POE", "threads" or "fork", can also be a useful way of
99       optimizing your usage of the available "CPU" resources, though this
100       technique is fraught with concurrency issues and demands attention to detail.
101
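       To make the placeholder point concrete, here is a minimal sketch using
       the plain "DBI" module; the "SQLite" data source and the "results"
       table are hypothetical, invented for this example only:

           #!/usr/bin/perl

           use strict;
           use warnings;

           use DBI;

           # Hypothetical connection details - adjust for your own schema.
           my $dbh = DBI->connect( 'dbi:SQLite:dbname=example.db', '', '',
                                   { RaiseError => 1 } );

           # The placeholder (?) keeps user-supplied values out of the SQL
           # text and lets the prepared statement be reused cheaply.
           my $sth = $dbh->prepare(
               'SELECT name, score FROM results WHERE score > ?'
           );

           for my $threshold ( 10, 50, 90 ) {
               $sth->execute($threshold);
               # fetchall_arrayref pulls the whole result set in one go, which
               # can beat many individual fetchrow_* calls for modest sets.
               my $rows = $sth->fetchall_arrayref;
               print scalar(@$rows), " rows with score > $threshold\n";
           }

           $dbh->disconnect;
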
102       Every case has a specific application and one or more exceptions, and
103       there is no replacement for running a few tests and finding out which
104       method works best for your particular environment; this is why
105       writing optimal code is not an exact science, and why we love using
106       Perl so much - TMTOWTDI.
107

BENCHMARKS

109       Here are a few examples to demonstrate usage of Perl's benchmarking
110       tools.
111
112   Assigning and Dereferencing Variables.
113       I'm sure most of us have seen code which looks like (or worse than)
114       this:
115
116           if ( $obj->{_ref}->{_myscore} >= $obj->{_ref}->{_yourscore} ) {
117               ...
118
119       This sort of code can be a real eyesore to read, as well as being very
120       sensitive to typos, and it's much clearer to dereference the variable
121       explicitly.  We're side-stepping the issue of working with object-
122       oriented programming techniques to encapsulate variable access via
123       methods, only accessible through an object.  Here we're just discussing
124       the technical implementation of choice, and whether this has an effect
125       on performance.  We can see whether this dereferencing operation has
126       any overhead by putting comparative code in a file and running a
127       "Benchmark" test.
128
129       # dereference
130
131           #!/usr/bin/perl
132
133           use strict;
134           use warnings;
135
136           use Benchmark;
137
138           my $ref = {
139                   'ref'   => {
140                       _myscore    => '100 + 1',
141                       _yourscore  => '102 - 1',
142                   },
143           };
144
145           timethese(1000000, {
146                   'direct'       => sub {
147                       my $x = $ref->{ref}->{_myscore} . $ref->{ref}->{_yourscore} ;
148                   },
149                   'dereference'  => sub {
150                       my $ref  = $ref->{ref};
151                       my $myscore = $ref->{_myscore};
152                       my $yourscore = $ref->{_yourscore};
153                       my $x = $myscore . $yourscore;
154                   },
155           });
156
157       It's essential to run any timing measurements a sufficient number of
158       times so the numbers settle on a numerical average; this reduces the
159       effect of variations in the environment, such as contention for "CPU"
160       resources and network bandwidth, which would otherwise make each run
161       fluctuate.  Running the above code for one million iterations, we can
162       take a look at the report output by the "Benchmark" module, to see
163       which approach is the most effective.
164
165           $> perl dereference
166
167           Benchmark: timing 1000000 iterations of dereference, direct...
168           dereference:  2 wallclock secs ( 1.59 usr +  0.00 sys =  1.59 CPU) @ 628930.82/s (n=1000000)
169               direct:  1 wallclock secs ( 1.20 usr +  0.00 sys =  1.20 CPU) @ 833333.33/s (n=1000000)
170
171       The difference is clear to see and the dereferencing approach is
172       slower.  While it managed to execute an average of 628,930 times a
173       second during our test, the direct approach managed to run an
174       additional 204,403 times, unfortunately.  Unfortunately, because there
175       are many examples of code written using the multiple-layer direct
176       variable access, and it's usually horrible.  It is, however,
177       minusculely faster.  The question remains whether the minute gain is
178       actually worth the eyestrain, or the loss of maintainability.
179
180   Search and replace or tr
181       If we have a string which needs to be modified, while a regex will
182       almost always be much more flexible, "tr", an oft-underused tool, can
183       still be useful.  One scenario might be to replace all vowels with
184       another character.  The regex solution might look like this:
185
186           $str =~ s/[aeiou]/x/g
187
188       The "tr" alternative might look like this:
189
190           $str =~ tr/aeiou/xxxxx/
191
192       We can put that into a test file which we can run to check which
193       approach is the fastest, using a global $STR variable to assign to the
194       "my $str" variable so as to avoid perl trying to optimize any of the
195       work away by noticing it is assigned only once.
196
197       # regex-transliterate
198
199           #!/usr/bin/perl
200
201           use strict;
202           use warnings;
203
204           use Benchmark;
205
206           my $STR = "$$-this and that";
207
208           timethese( 1000000, {
209                   'sr'  => sub { my $str = $STR; $str =~ s/[aeiou]/x/g; return $str; },
210                   'tr'  => sub { my $str = $STR; $str =~ tr/aeiou/xxxxx/; return $str; },
211           });
212
213       Running the code gives us our results:
214
215           $> perl regex-transliterate
216
217           Benchmark: timing 1000000 iterations of sr, tr...
218                   sr:  2 wallclock secs ( 1.19 usr +  0.00 sys =  1.19 CPU) @ 840336.13/s (n=1000000)
219                   tr:  0 wallclock secs ( 0.49 usr +  0.00 sys =  0.49 CPU) @ 2040816.33/s (n=1000000)
220
221       The "tr" version is a clear winner.  One solution is flexible, the
222       other is fast - and it's appropriately the programmer's choice which to
223       use.
224
225       Check the "Benchmark" docs for further useful techniques.
226
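       One such technique worth knowing about is "cmpthese()", which runs the
       same timings but prints a percentage comparison chart; here is a
       minimal sketch reusing the regex versus "tr" test above, where the
       negative count asks "Benchmark" to run each case for at least that
       many CPU seconds rather than a fixed number of iterations:

           #!/usr/bin/perl

           use strict;
           use warnings;

           use Benchmark qw(cmpthese);

           my $STR = "$$-this and that";

           cmpthese( -3, {
                   'sr'  => sub { my $str = $STR; $str =~ s/[aeiou]/x/g;   return $str; },
                   'tr'  => sub { my $str = $STR; $str =~ tr/aeiou/xxxxx/; return $str; },
           });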

PROFILING TOOLS

228       A slightly larger piece of code will provide something on which a
229       profiler can produce more extensive reporting statistics.  This example
230       uses the simplistic "wordmatch" program which parses a given input file
231       and spews out a short report on the contents.
232
233       # wordmatch
234
235           #!/usr/bin/perl
236
237           use strict;
238           use warnings;
239
240           =head1 NAME
241
242           filewords - word analysis of input file
243
244           =head1 SYNOPSIS
245
246               filewords -f inputfilename [-d]
247
248           =head1 DESCRIPTION
249
250           This program parses the given filename, specified with C<-f>, and displays a
251           simple analysis of the words found therein.  Use the C<-d> switch to enable
252           debugging messages.
253
254           =cut
255
256           use FileHandle;
257           use Getopt::Long;
258
259           my $debug   =  0;
260           my $file    = '';
261
262           my $result = GetOptions (
263               'debug'         => \$debug,
264               'file=s'        => \$file,
265           );
266           die("invalid args") unless $result;
267
268           unless ( -f $file ) {
269               die("Usage: $0 -f filename [-d]");
270           }
271           my $FH = FileHandle->new("< $file") or die("unable to open file($file): $!");
272
273           my $i_LINES = 0;
274           my $i_WORDS = 0;
275           my %count   = ();
276
277           my @lines = <$FH>;
278           foreach my $line ( @lines ) {
279               $i_LINES++;
280               $line =~ s/\n//;
281               my @words = split(/ +/, $line);
282               my $i_words = scalar(@words);
283               $i_WORDS = $i_WORDS + $i_words;
284               debug("line: $i_LINES supplying $i_words words: @words");
285               my $i_word = 0;
286               foreach my $word ( @words ) {
287                   $i_word++;
288                   $count{$i_LINES}{spec} += matches($i_word, $word, '[^a-zA-Z0-9]');
289                   $count{$i_LINES}{only} += matches($i_word, $word, '^[^a-zA-Z0-9]+$');
290                   $count{$i_LINES}{cons} += matches($i_word, $word, '^[(?i:bcdfghjklmnpqrstvwxyz)]+$');
291                   $count{$i_LINES}{vows} += matches($i_word, $word, '^[(?i:aeiou)]+$');
292                   $count{$i_LINES}{caps} += matches($i_word, $word, '^[(A-Z)]+$');
293               }
294           }
295
296           print report( %count );
297
298           sub matches {
299               my $i_wd  = shift;
300               my $word  = shift;
301               my $regex = shift;
302               my $has = 0;
303
304               if ( $word =~ /($regex)/ ) {
305                   $has++ if $1;
306               }
307
308               debug("word: $i_wd ".($has ? 'matches' : 'does not match')." chars: /$regex/");
309
310               return $has;
311           }
312
313           sub report {
314               my %report = @_;
315               my %rep;
316
317               foreach my $line ( keys %report ) {
318                   foreach my $key ( keys %{ $report{$line} } ) {
319                       $rep{$key} += $report{$line}{$key};
320                   }
321               }
322
323               my $report = qq|
324           $0 report for $file:
325           lines in file: $i_LINES
326           words in file: $i_WORDS
327           words with special (non-word) characters: $rep{spec}
328           words with only special (non-word) characters: $rep{only}
329           words with only consonants: $rep{cons}
330           words with only capital letters: $rep{caps}
331           words with only vowels: $rep{vows}
332           |;
333
334               return $report;
335           }
336
337           sub debug {
338               my $message = shift;
339
340               if ( $debug ) {
341                   print STDERR "DBG: $message\n";
342               }
343           }
344
345           exit 0;
346
347   Devel::DProf
348       This venerable module has been the de-facto standard for Perl code
349       profiling for more than a decade, but has been replaced by a number of
350       other modules which have brought us into the 21st century.  Although
351       you are encouraged to evaluate a tool from the several mentioned here
352       and from the CPAN list at the base of this document (and currently
353       Devel::NYTProf seems to be the weapon of choice - see below), we'll
354       take a quick look at the output from Devel::DProf first, to set a
355       baseline for Perl profiling tools.  Run the above program under the
356       control of "Devel::DProf" by using the "-d" switch on the command-line.
357
358           $> perl -d:DProf wordmatch -f perl5db.pl
359
360           <...multiple lines snipped...>
361
362           wordmatch report for perl5db.pl:
363           lines in file: 9428
364           words in file: 50243
365           words with special (non-word) characters: 20480
366           words with only special (non-word) characters: 7790
367           words with only consonants: 4801
368           words with only capital letters: 1316
369           words with only vowels: 1701
370
371       "Devel::DProf" produces a special file, called tmon.out by default, and
372       this file is read by the "dprofpp" program, which is already installed
373       as part of the "Devel::DProf" distribution.  If you call "dprofpp" with
374       no options, it will read the tmon.out file in the current directory and
375       produce a human readable statistics report of the run of your program.
376       Note that this may take a little time.
377
378           $> dprofpp
379
380           Total Elapsed Time = 2.951677 Seconds
381             User+System Time = 2.871677 Seconds
382           Exclusive Times
383           %Time ExclSec CumulS #Calls sec/call Csec/c  Name
384            102.   2.945  3.003 251215   0.0000 0.0000  main::matches
385            2.40   0.069  0.069 260643   0.0000 0.0000  main::debug
386            1.74   0.050  0.050      1   0.0500 0.0500  main::report
387            1.04   0.030  0.049      4   0.0075 0.0123  main::BEGIN
388            0.35   0.010  0.010      3   0.0033 0.0033  Exporter::as_heavy
389            0.35   0.010  0.010      7   0.0014 0.0014  IO::File::BEGIN
390            0.00       - -0.000      1        -      -  Getopt::Long::FindOption
391            0.00       - -0.000      1        -      -  Symbol::BEGIN
392            0.00       - -0.000      1        -      -  Fcntl::BEGIN
393            0.00       - -0.000      1        -      -  Fcntl::bootstrap
394            0.00       - -0.000      1        -      -  warnings::BEGIN
395            0.00       - -0.000      1        -      -  IO::bootstrap
396            0.00       - -0.000      1        -      -  Getopt::Long::ConfigDefaults
397            0.00       - -0.000      1        -      -  Getopt::Long::Configure
398            0.00       - -0.000      1        -      -  Symbol::gensym
399
400       "dprofpp" will produce some quite detailed reporting on the activity of
401       the "wordmatch" program.  The wallclock, user and system, times are at
402       the top of the analysis, and after this are the main columns defining
403       which define the report.  Check the "dprofpp" docs for details of the
404       many options it supports.
405
406       See also "Apache::DProf" which hooks "Devel::DProf" into "mod_perl".
407
408   Devel::Profiler
409       Let's take a look at the same program using a different profiler:
410       "Devel::Profiler", a drop-in Perl-only replacement for "Devel::DProf".
411       The usage is very slightly different in that instead of using the
412       special "-d:" flag, you pull "Devel::Profiler" in directly as a module
413       using "-M".
414
415           $> perl -MDevel::Profiler wordmatch -f perl5db.pl
416
417           <...multiple lines snipped...>
418
419           wordmatch report for perl5db.pl:
420           lines in file: 9428
421           words in file: 50243
422           words with special (non-word) characters: 20480
423           words with only special (non-word) characters: 7790
424           words with only consonants: 4801
425           words with only capital letters: 1316
426           words with only vowels: 1701
427
428       "Devel::Profiler" generates a tmon.out file which is compatible with
429       the "dprofpp" program, thus saving the construction of a dedicated
430       statistics reader program.  "dprofpp" usage is therefore identical to
431       the above example.
432
433           $> dprofpp
434
435           Total Elapsed Time =   20.984 Seconds
436             User+System Time =   19.981 Seconds
437           Exclusive Times
438           %Time ExclSec CumulS #Calls sec/call Csec/c  Name
439            49.0   9.792 14.509 251215   0.0000 0.0001  main::matches
440            24.4   4.887  4.887 260643   0.0000 0.0000  main::debug
441            0.25   0.049  0.049      1   0.0490 0.0490  main::report
442            0.00   0.000  0.000      1   0.0000 0.0000  Getopt::Long::GetOptions
443            0.00   0.000  0.000      2   0.0000 0.0000  Getopt::Long::ParseOptionSpec
444            0.00   0.000  0.000      1   0.0000 0.0000  Getopt::Long::FindOption
445            0.00   0.000  0.000      1   0.0000 0.0000  IO::File::new
446            0.00   0.000  0.000      1   0.0000 0.0000  IO::Handle::new
447            0.00   0.000  0.000      1   0.0000 0.0000  Symbol::gensym
448            0.00   0.000  0.000      1   0.0000 0.0000  IO::File::open
449
450       Interestingly we get slightly different results, which is mostly
451       because the algorithm which generates the report is different, even
452       though the output file format was allegedly identical.  The elapsed,
453       user and system times clearly show the time it took for
454       "Devel::Profiler" to execute its own run, but the column listings feel
455       more accurate somehow than the ones we had earlier from "Devel::DProf".
456       The 102% figure has disappeared, for example.  This is where we have
457       to evaluate the tools at our disposal, and recognise their pros and
458       cons, before relying on them.  Interestingly, the numbers of calls for
459       each subroutine are identical in the two reports; it's the percentages
460       which differ.  As the author of "Devel::Profiler" writes:
461
462           ...running HTML::Template's test suite under Devel::DProf shows output()
463           taking NO time but Devel::Profiler shows around 10% of the time is in output().
464           I don't know which to trust but my gut tells me something is wrong with
465           Devel::DProf.  HTML::Template::output() is a big routine that's called for
466           every test. Either way, something needs fixing.
467
468       YMMV.
469
470       See also "Devel::Apache::Profiler" which hooks "Devel::Profiler" into
471       "mod_perl".
472
473   Devel::SmallProf
474       The "Devel::SmallProf" profiler examines the runtime of your Perl
475       program and produces a line-by-line listing to show how many times each
476       line was called, and how long each line took to execute.  It is called
477       by supplying the familiar "-d" flag to Perl at runtime.
478
479           $> perl -d:SmallProf wordmatch -f perl5db.pl
480
481           <...multiple lines snipped...>
482
483           wordmatch report for perl5db.pl:
484           lines in file: 9428
485           words in file: 50243
486           words with special (non-word) characters: 20480
487           words with only special (non-word) characters: 7790
488           words with only consonants: 4801
489           words with only capital letters: 1316
490           words with only vowels: 1701
491
492       "Devel::SmallProf" writes it's output into a file called smallprof.out,
493       by default.  The format of the file looks like this:
494
495           <num> <time> <ctime> <line>:<text>
496
497       When the program has terminated, the output may be examined and sorted
498       using any standard text filtering utilities.  Something like the
499       following may be sufficient:
500
501           $> cat smallprof.out | grep \d*: | sort -k3 | tac | head -n20
502
503           251215   1.65674   7.68000    75: if ( $word =~ /($regex)/ ) {
504           251215   0.03264   4.40000    79: debug("word: $i_wd ".($has ? 'matches' :
505           251215   0.02693   4.10000    81: return $has;
506           260643   0.02841   4.07000   128: if ( $debug ) {
507           260643   0.02601   4.04000   126: my $message = shift;
508           251215   0.02641   3.91000    73: my $has = 0;
509           251215   0.03311   3.71000    70: my $i_wd  = shift;
510           251215   0.02699   3.69000    72: my $regex = shift;
511           251215   0.02766   3.68000    71: my $word  = shift;
512            50243   0.59726   1.00000    59:  $count{$i_LINES}{cons} =
513            50243   0.48175   0.92000    61:  $count{$i_LINES}{spec} =
514            50243   0.00644   0.89000    56:  my $i_cons = matches($i_word, $word,
515            50243   0.48837   0.88000    63:  $count{$i_LINES}{caps} =
516            50243   0.00516   0.88000    58:  my $i_caps = matches($i_word, $word, '^[(A-
517            50243   0.00631   0.81000    54:  my $i_spec = matches($i_word, $word, '[^a-
518            50243   0.00496   0.80000    57:  my $i_vows = matches($i_word, $word,
519            50243   0.00688   0.80000    53:  $i_word++;
520            50243   0.48469   0.79000    62:  $count{$i_LINES}{only} =
521            50243   0.48928   0.77000    60:  $count{$i_LINES}{vows} =
522            50243   0.00683   0.75000    55:  my $i_only = matches($i_word, $word, '^[^a-
523
524       You can immediately see a slightly different focus to the subroutine
525       profiling modules, and we start to see exactly which line of code is
526       taking the most time.  That regex line is looking a bit suspicious, for
527       example.  Remember that these tools are supposed to be used together;
528       there is no single best way to profile your code, and you need to use
529       the best tools for the job.
530
531       See also "Apache::SmallProf" which hooks "Devel::SmallProf" into
532       "mod_perl".
533
534   Devel::FastProf
535       "Devel::FastProf" is another Perl line profiler.  This was written with
536       a view to getting a faster line profiler, than is possible with for
537       example "Devel::SmallProf", because it's written in "C".  To use
538       "Devel::FastProf", supply the "-d" argument to Perl:
539
540           $> perl -d:FastProf wordmatch -f perl5db.pl
541
542           <...multiple lines snipped...>
543
544           wordmatch report for perl5db.pl:
545           lines in file: 9428
546           words in file: 50243
547           words with special (non-word) characters: 20480
548           words with only special (non-word) characters: 7790
549           words with only consonants: 4801
550           words with only capital letters: 1316
551           words with only vowels: 1701
552
553       "Devel::FastProf" writes statistics to the file fastprof.out in the
554       current directory.  The output file, which can be specified, can be
555       interpreted by using the "fprofpp" command-line program.
556
557           $> fprofpp | head -n20
558
559           # fprofpp output format is:
560           # filename:line time count: source
561           wordmatch:75 3.93338 251215: if ( $word =~ /($regex)/ ) {
562           wordmatch:79 1.77774 251215: debug("word: $i_wd ".($has ? 'matches' : 'does not match')." chars: /$regex/");
563           wordmatch:81 1.47604 251215: return $has;
564           wordmatch:126 1.43441 260643: my $message = shift;
565           wordmatch:128 1.42156 260643: if ( $debug ) {
566           wordmatch:70 1.36824 251215: my $i_wd  = shift;
567           wordmatch:71 1.36739 251215: my $word  = shift;
568           wordmatch:72 1.35939 251215: my $regex = shift;
569
570       Straightaway we can see that the number of times each line has been
571       called is identical to the "Devel::SmallProf" output, and the sequence
572       is only very slightly different based on the ordering of the amount of
573       time each line took to execute, "if ( $debug ) { " and "my $message =
574       shift;", for example.  The differences in the actual times recorded
575       might lie in the algorithm used internally, or could be due to system
576       resource limitations or contention.
577
578       See also "DBIx::Profiler", which will profile database queries
579       running under the "DBIx::*" namespace.
580
581   Devel::NYTProf
582       "Devel::NYTProf" is the next generation of Perl code profiler, fixing
583       many shortcomings in other tools and implementing many cool features.
584       First of all it can be used as either a line profiler, a block or a
585       subroutine profiler, all at once.  It can also use sub-microsecond
586       (100ns) resolution on systems which provide "clock_gettime()".  It can
587       be started and stopped even by the program being profiled.  It's a one-
588       line entry to profile "mod_perl" applications.  It's written in "c" and
589       is probably the fastest profiler available for Perl.  The list of
590       coolness just goes on.  Enough of that, let's see how to it works -
591       just use the familiar "-d" switch to plug it in and run the code.
592
593           $> perl -d:NYTProf wordmatch -f perl5db.pl
594
595           wordmatch report for perl5db.pl:
596           lines in file: 9427
597           words in file: 50243
598           words with special (non-word) characters: 20480
599           words with only special (non-word) characters: 7790
600           words with only consonants: 4801
601           words with only capital letters: 1316
602           words with only vowels: 1701
603
604       "NYTProf" will generate a report database into the file nytprof.out by
605       default.  Human readable reports can be generated from here by using
606       the supplied "nytprofhtml" (HTML output) and "nytprofcsv" (CSV output)
607       programs.  We've used the Unix sytem "html2text" utility to convert the
608       nytprof/index.html file for convenience here.
609
610           $> html2text nytprof/index.html
611
612           Performance Profile Index
613           For wordmatch
614             Run on Fri Sep 26 13:46:39 2008
615           Reported on Fri Sep 26 13:47:23 2008
616
617                    Top 15 Subroutines -- ordered by exclusive time
618           |Calls |P |F |Inclusive|Exclusive|Subroutine                          |
619           |      |  |  |Time     |Time     |                                    |
620           |251215|5 |1 |13.09263 |10.47692 |main::              |matches        |
621           |260642|2 |1 |2.71199  |2.71199  |main::              |debug          |
622           |1     |1 |1 |0.21404  |0.21404  |main::              |report         |
623           |2     |2 |2 |0.00511  |0.00511  |XSLoader::          |load (xsub)    |
624           |14    |14|7 |0.00304  |0.00298  |Exporter::          |import         |
625           |3     |1 |1 |0.00265  |0.00254  |Exporter::          |as_heavy       |
626           |10    |10|4 |0.00140  |0.00140  |vars::              |import         |
627           |13    |13|1 |0.00129  |0.00109  |constant::          |import         |
628           |1     |1 |1 |0.00360  |0.00096  |FileHandle::        |import         |
629           |3     |3 |3 |0.00086  |0.00074  |warnings::register::|import         |
630           |9     |3 |1 |0.00036  |0.00036  |strict::            |bits           |
631           |13    |13|13|0.00032  |0.00029  |strict::            |import         |
632           |2     |2 |2 |0.00020  |0.00020  |warnings::          |import         |
633           |2     |1 |1 |0.00020  |0.00020  |Getopt::Long::      |ParseOptionSpec|
634           |7     |7 |6 |0.00043  |0.00020  |strict::            |unimport       |
635
636           For more information see the full list of 189 subroutines.
637
638       The first part of the report already shows the critical information
639       regarding which subroutines are using the most time.  The next gives
640       some statistics about the source files profiled.
641
642                   Source Code Files -- ordered by exclusive time then name
643           |Stmts  |Exclusive|Avg.   |Reports                     |Source File         |
644           |       |Time     |       |                            |                    |
645           |2699761|15.66654 |6e-06  |line   .    block   .    sub|wordmatch           |
646           |35     |0.02187  |0.00062|line   .    block   .    sub|IO/Handle.pm        |
647           |274    |0.01525  |0.00006|line   .    block   .    sub|Getopt/Long.pm      |
648           |20     |0.00585  |0.00029|line   .    block   .    sub|Fcntl.pm            |
649           |128    |0.00340  |0.00003|line   .    block   .    sub|Exporter/Heavy.pm   |
650           |42     |0.00332  |0.00008|line   .    block   .    sub|IO/File.pm          |
651           |261    |0.00308  |0.00001|line   .    block   .    sub|Exporter.pm         |
652           |323    |0.00248  |8e-06  |line   .    block   .    sub|constant.pm         |
653           |12     |0.00246  |0.00021|line   .    block   .    sub|File/Spec/Unix.pm   |
654           |191    |0.00240  |0.00001|line   .    block   .    sub|vars.pm             |
655           |77     |0.00201  |0.00003|line   .    block   .    sub|FileHandle.pm       |
656           |12     |0.00198  |0.00016|line   .    block   .    sub|Carp.pm             |
657           |14     |0.00175  |0.00013|line   .    block   .    sub|Symbol.pm           |
658           |15     |0.00130  |0.00009|line   .    block   .    sub|IO.pm               |
659           |22     |0.00120  |0.00005|line   .    block   .    sub|IO/Seekable.pm      |
660           |198    |0.00085  |4e-06  |line   .    block   .    sub|warnings/register.pm|
661           |114    |0.00080  |7e-06  |line   .    block   .    sub|strict.pm           |
662           |47     |0.00068  |0.00001|line   .    block   .    sub|warnings.pm         |
663           |27     |0.00054  |0.00002|line   .    block   .    sub|overload.pm         |
664           |9      |0.00047  |0.00005|line   .    block   .    sub|SelectSaver.pm      |
665           |13     |0.00045  |0.00003|line   .    block   .    sub|File/Spec.pm        |
666           |2701595|15.73869 |       |Total                       |
667           |128647 |0.74946  |       |Average                     |
668           |       |0.00201  |0.00003|Median                      |
669           |       |0.00121  |0.00003|Deviation                   |
670
671           Report produced by the NYTProf 2.03 Perl profiler, developed by Tim Bunce and
672           Adam Kaplan.
673
674       At this point, if you're using the HTML report, you can click through
675       the various links to drill down into each subroutine and each line of
676       code.  Because we're using the text reporting here, and there's a whole
677       directory full of reports built for each source file, we'll just
678       display a part of the corresponding wordmatch-line.html file,
679       sufficient to give an idea of the sort of output you can expect from
680       this cool tool.
681
682           $> html2text nytprof/wordmatch-line.html
683
684           Performance Profile -- -block view-.-line view-.-sub view-
685           For wordmatch
686           Run on Fri Sep 26 13:46:39 2008
687           Reported on Fri Sep 26 13:47:22 2008
688
689           File wordmatch
690
691            Subroutines -- ordered by exclusive time
692           |Calls |P|F|Inclusive|Exclusive|Subroutine    |
693           |      | | |Time     |Time     |              |
694           |251215|5|1|13.09263 |10.47692 |main::|matches|
695           |260642|2|1|2.71199  |2.71199  |main::|debug  |
696           |1     |1|1|0.21404  |0.21404  |main::|report |
697           |0     |0|0|0        |0        |main::|BEGIN  |
698
699
700           |Line|Stmts.|Exclusive|Avg.   |Code                                           |
701           |    |      |Time     |       |                                               |
702           |1   |      |         |       |#!/usr/bin/perl                                |
703           |2   |      |         |       |                                               |
704           |    |      |         |       |use strict;                                    |
705           |3   |3     |0.00086  |0.00029|# spent 0.00003s making 1 calls to strict::    |
706           |    |      |         |       |import                                         |
707           |    |      |         |       |use warnings;                                  |
708           |4   |3     |0.01563  |0.00521|# spent 0.00012s making 1 calls to warnings::  |
709           |    |      |         |       |import                                         |
710           |5   |      |         |       |                                               |
711           |6   |      |         |       |=head1 NAME                                    |
712           |7   |      |         |       |                                               |
713           |8   |      |         |       |filewords - word analysis of input file        |
714           <...snip...>
715           |62  |1     |0.00445  |0.00445|print report( %count );                        |
716           |    |      |         |       |# spent 0.21404s making 1 calls to main::report|
717           |63  |      |         |       |                                               |
718           |    |      |         |       |# spent 23.56955s (10.47692+2.61571) within    |
719           |    |      |         |       |main::matches which was called 251215 times,   |
720           |    |      |         |       |avg 0.00005s/call: # 50243 times               |
721           |    |      |         |       |(2.12134+0.51939s) at line 57 of wordmatch, avg|
722           |    |      |         |       |0.00005s/call # 50243 times (2.17735+0.54550s) |
723           |64  |      |         |       |at line 56 of wordmatch, avg 0.00005s/call #   |
724           |    |      |         |       |50243 times (2.10992+0.51797s) at line 58 of   |
725           |    |      |         |       |wordmatch, avg 0.00005s/call # 50243 times     |
726           |    |      |         |       |(2.12696+0.51598s) at line 55 of wordmatch, avg|
727           |    |      |         |       |0.00005s/call # 50243 times (1.94134+0.51687s) |
728           |    |      |         |       |at line 54 of wordmatch, avg 0.00005s/call     |
729           |    |      |         |       |sub matches {                                  |
730           <...snip...>
731           |102 |      |         |       |                                               |
732           |    |      |         |       |# spent 2.71199s within main::debug which was  |
733           |    |      |         |       |called 260642 times, avg 0.00001s/call: #      |
734           |    |      |         |       |251215 times (2.61571+0s) by main::matches at  |
735           |103 |      |         |       |line 74 of wordmatch, avg 0.00001s/call # 9427 |
736           |    |      |         |       |times (0.09628+0s) at line 50 of wordmatch, avg|
737           |    |      |         |       |0.00001s/call                                  |
738           |    |      |         |       |sub debug {                                    |
739           |104 |260642|0.58496  |2e-06  |my $message = shift;                           |
740           |105 |      |         |       |                                               |
741           |106 |260642|1.09917  |4e-06  |if ( $debug ) {                                |
742           |107 |      |         |       |print STDERR "DBG: $message\n";                |
743           |108 |      |         |       |}                                              |
744           |109 |      |         |       |}                                              |
745           |110 |      |         |       |                                               |
746           |111 |1     |0.01501  |0.01501|exit 0;                                        |
747           |112 |      |         |       |                                               |
748
749       Oodles of very useful information in there - this seems to be the way
750       forward.
751
752       See also "Devel::NYTProf::Apache" which hooks "Devel::NYTProf" into
753       "mod_perl".
754

SORTING

756       Perl modules are not the only tools a performance analyst has at
757       their disposal; system tools like "time" should not be overlooked, as
758       the next example shows, where we take a quick look at sorting.  Many
759       books, theses and articles have been written about efficient sorting
760       algorithms, and this is not the place to repeat such work; there are
761       also several good sorting modules which deserve a look: "Sort::Maker"
762       and "Sort::Key" spring to mind.  However, it's still possible to make
763       some observations on certain Perl-specific issues relating to sorting
764       data sets, and to give an example or two with regard to how sorting
765       large data volumes can affect performance.  Firstly, an often
766       overlooked point when sorting large amounts of data is that one can
767       attempt to reduce the data set to be dealt with; in many cases
768       "grep()" can be quite useful as a simple filter:
769
770           @data = sort grep { /$filter/ } @incoming
771
772       A command such as this can vastly reduce the volume of material to
773       actually sort through in the first place, and should not be too lightly
774       disregarded purely on the basis of its simplicity.  The "KISS"
775       principle is too often overlooked - the next example uses the simple
776       system "time" utility to demonstrate.  Let's take a look at an actual
777       example of sorting the contents of a large file; an apache logfile
778       will do.  This one has over a quarter of a million lines, is 50M in
779       size, and a snippet of it looks like this:
780
781       # logfile
782
783           188.209-65-87.adsl-dyn.isp.belgacom.be - - [08/Feb/2007:12:57:16 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
784           188.209-65-87.adsl-dyn.isp.belgacom.be - - [08/Feb/2007:12:57:16 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
785           151.56.71.198 - - [08/Feb/2007:12:57:41 +0000] "GET /suse-on-vaio.html HTTP/1.1" 200 2858 "http://www.linux-on-laptops.com/sony.html" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1"
786           151.56.71.198 - - [08/Feb/2007:12:57:42 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/suse-on-vaio.html" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1"
787           151.56.71.198 - - [08/Feb/2007:12:57:43 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1"
788           217.113.68.60 - - [08/Feb/2007:13:02:15 +0000] "GET / HTTP/1.1" 304 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
789           217.113.68.60 - - [08/Feb/2007:13:02:16 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
790           debora.to.isac.cnr.it - - [08/Feb/2007:13:03:58 +0000] "GET /suse-on-vaio.html HTTP/1.1" 200 2858 "http://www.linux-on-laptops.com/sony.html" "Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.0 (like Gecko)"
791           debora.to.isac.cnr.it - - [08/Feb/2007:13:03:58 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/suse-on-vaio.html" "Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.0 (like Gecko)"
792           debora.to.isac.cnr.it - - [08/Feb/2007:13:03:58 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.0 (like Gecko)"
793           195.24.196.99 - - [08/Feb/2007:13:26:48 +0000] "GET / HTTP/1.0" 200 3309 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9"
794           195.24.196.99 - - [08/Feb/2007:13:26:58 +0000] "GET /data/css HTTP/1.0" 404 206 "http://www.rfi.net/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9"
795           195.24.196.99 - - [08/Feb/2007:13:26:59 +0000] "GET /favicon.ico HTTP/1.0" 404 209 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9"
796           crawl1.cosmixcorp.com - - [08/Feb/2007:13:27:57 +0000] "GET /robots.txt HTTP/1.0" 200 179 "-" "voyager/1.0"
797           crawl1.cosmixcorp.com - - [08/Feb/2007:13:28:25 +0000] "GET /links.html HTTP/1.0" 200 3413 "-" "voyager/1.0"
798           fhm226.internetdsl.tpnet.pl - - [08/Feb/2007:13:37:32 +0000] "GET /suse-on-vaio.html HTTP/1.1" 200 2858 "http://www.linux-on-laptops.com/sony.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
799           fhm226.internetdsl.tpnet.pl - - [08/Feb/2007:13:37:34 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/suse-on-vaio.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
800           80.247.140.134 - - [08/Feb/2007:13:57:35 +0000] "GET / HTTP/1.1" 200 3309 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
801           80.247.140.134 - - [08/Feb/2007:13:57:37 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
802           pop.compuscan.co.za - - [08/Feb/2007:14:10:43 +0000] "GET / HTTP/1.1" 200 3309 "-" "www.clamav.net"
803           livebot-207-46-98-57.search.live.com - - [08/Feb/2007:14:12:04 +0000] "GET /robots.txt HTTP/1.0" 200 179 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
804           livebot-207-46-98-57.search.live.com - - [08/Feb/2007:14:12:04 +0000] "GET /html/oracle.html HTTP/1.0" 404 214 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
805           dslb-088-064-005-154.pools.arcor-ip.net - - [08/Feb/2007:14:12:15 +0000] "GET / HTTP/1.1" 200 3309 "-" "www.clamav.net"
806           196.201.92.41 - - [08/Feb/2007:14:15:01 +0000] "GET / HTTP/1.1" 200 3309 "-" "MOT-L7/08.B7.DCR MIB/2.2.1 Profile/MIDP-2.0 Configuration/CLDC-1.1"
807
808       The specific task here is to sort the 286,525 lines of this file by
809       Response Code, Query, Browser, Referring Url, and lastly Date.  One
810       solution might be to use the following code, which iterates over the
811       files given on the command-line.
812
813       # sort-apache-log
814
815           #!/usr/bin/perl
816
817           use strict;
818           use warnings;
819
820           my @data;
821
822           LINE:
823           while ( <> ) {
824               my $line = $_;
825               if (
826                   $line =~ m/^(
827                       ([\w\.\-]+)             # client
828                       \s*-\s*-\s*\[
829                       ([^]]+)                 # date
830                       \]\s*"\w+\s*
831                       (\S+)                   # query
832                       [^"]+"\s*
833                       (\d+)                   # status
834                       \s+\S+\s+"[^"]*"\s+"
835                       ([^"]*)                 # browser
836                       "
837                       .*
838                   )$/x
839               ) {
840                   my @chunks = split(/ +/, $line);
841                   my $ip      = $1;
842                   my $date    = $2;
843                   my $query   = $3;
844                   my $status  = $4;
845                   my $browser = $5;
846
847                   push(@data, [$ip, $date, $query, $status, $browser, $line]);
848               }
849           }
850
851           my @sorted = sort {
852               $a->[3] cmp $b->[3]
853                       ||
854               $a->[2] cmp $b->[2]
855                       ||
856               $a->[0] cmp $b->[0]
857                       ||
858               $a->[1] cmp $b->[1]
859                       ||
860               $a->[4] cmp $b->[4]
861           } @data;
862
863           foreach my $data ( @sorted ) {
864               print $data->[5];
865           }
866
867           exit 0;
868
869       When running this program, redirect "STDOUT" so that the output can be
870       checked for correctness against later test runs, and use the system
871       "time" utility to check the overall runtime.
872
873           $> time ./sort-apache-log logfile > out-sort
874
875           real    0m17.371s
876           user    0m15.757s
877           sys     0m0.592s
878
879       The program took just over 17 wallclock seconds to run.  Note the
880       different values "time" outputs; it's important to always use the same
881       one, and not to confuse what each one means.
882
883       Elapsed Real Time
884           The overall, or wallclock, time between when "time" was called, and
885           when it terminates.  The elapsed time includes both user and system
886           times, and time spent waiting for other users and processes on the
887           system.  Inevitably, this is the most approximate of the
888           measurements given.
889
890       User CPU Time
891           The user time is the amount of time the entire process spent on
892           behalf of the user on this system executing this program.
893
894       System CPU Time
895           The system time is the amount of time the kernel itself spent
896           executing routines, or system calls, on behalf of this process's
897           user.
898
899       Rewriting this same process as a "Schwartzian Transform", it is
900       possible to eliminate the input and output arrays used for storing all
901       the data, and to work on the input directly as it arrives.  Otherwise,
902       the code looks fairly similar:
903
904       # sort-apache-log-schwarzian
905
906           #!/usr/bin/perl
907
908           use strict;
909           use warnings;
910
911           print
912
913               map $_->[0] =>
914
915               sort {
916                   $a->[4] cmp $b->[4]
917                           ||
918                   $a->[3] cmp $b->[3]
919                           ||
920                   $a->[1] cmp $b->[1]
921                           ||
922                   $a->[2] cmp $b->[2]
923                           ||
924                   $a->[5] cmp $b->[5]
925               }
926               map  [ $_, m/^(
927                   ([\w\.\-]+)             # client
928                   \s*-\s*-\s*\[
929                   ([^]]+)                 # date
930                   \]\s*"\w+\s*
931                   (\S+)                   # query
932                   [^"]+"\s*
933                   (\d+)                   # status
934                   \s+\S+\s+"[^"]*"\s+"
935                   ([^"]*)                 # browser
936                   "
937                   .*
938               )$/xo ]
939
940               => <>;
941
942           exit 0;
943
944       Run the new code against the same logfile, as above, to check the new
945       time.
946
947           $> time ./sort-apache-log-schwarzian logfile > out-schwarz
948
949           real    0m9.664s
950           user    0m8.873s
951           sys     0m0.704s
952
953       The time has been cut in half, which is a respectable speed improvement
954       by any standard.  Naturally, it is important to check the output is
955       consistent with the first program run; this is where the Unix system
956       "cksum" utility comes in.
957
958           $> cksum out-sort out-schwarz
959           3044173777 52029194 out-sort
960           3044173777 52029194 out-schwarz
961
962       BTW, beware too of pressure from managers who see you speed a program
963       up by 50% of the runtime once, only to get a request one month later to
964       do the same again (true story) - you'll just have to point out you're
965       only human, even if you are a Perl programmer, and you'll see what you
966       can do...
967

LOGGING

969       An essential part of any good development process is appropriate error
970       handling with appropriately informative messages; however, there exists
971       a school of thought which suggests that log files should be chatty, as
972       if an unbroken chain of output somehow ensures the survival of the
973       program.  If speed is in any way an issue, this approach is wrong.
974
975       A common sight is code which looks something like this:
976
977           logger->debug( "A logging message via process-id: $$ INC: " . Dumper(\%INC) )
978
979       The problem is that this code will always be parsed and executed, even
980       when the debug level set in the logging configuration file is zero.
981       Once the debug() subroutine has been entered, and the internal $debug
982       variable confirmed to be zero, for example, the message which has been
983       sent in will be discarded and the program will continue.  In the
984       example given though, the \%INC hash will already have been dumped, and
985       the message string constructed, all of which work could be bypassed by
986       a debug variable at the statement level, like this:
987
988           logger->debug( "A logging message via process-id: $$ INC: " . Dumper(\%INC) ) if $DEBUG;
989
990       This effect can be demonstrated by setting up a test script with both
991       forms, including a "debug()" subroutine to emulate typical "logger()"
992       functionality.
993
994       # ifdebug
995
996           #!/usr/bin/perl
997
998           use strict;
999           use warnings;
1000
1001           use Benchmark;
1002           use Data::Dumper;
1003           my $DEBUG = 0;
1004
1005           sub debug {
1006               my $msg = shift;
1007
1008               if ( $DEBUG ) {
1009                   print "DEBUG: $msg\n";
1010               }
1011           };
1012
1013           timethese(100000, {
1014                   'debug'       => sub {
1015                       debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) )
1016                   },
1017                   'ifdebug'  => sub {
1018                       debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) ) if $DEBUG
1019                   },
1020           });
1021
1022       Let's see what "Benchmark" makes of this:
1023
1024           $> perl ifdebug
1025           Benchmark: timing 100000 iterations of debug, ifdebug...
1026              ifdebug:  0 wallclock secs ( 0.01 usr +  0.00 sys =  0.01 CPU) @ 10000000.00/s (n=100000)
1027                       (warning: too few iterations for a reliable count)
1028                debug: 14 wallclock secs (13.18 usr +  0.04 sys = 13.22 CPU) @ 7564.30/s (n=100000)
1029
1030       In the one case the code, which does exactly the same thing as far as
1031       outputting any debugging information is concerned - in other words
1032       nothing - takes 14 seconds, while in the other case the code takes one
1033       hundredth of a second.  Looks fairly definitive.  Use a $DEBUG variable
1034       BEFORE you call the subroutine, rather than relying on the smart
1035       functionality inside it.
1036
1037   Logging if DEBUG (constant)
1038       It's possible to take the previous idea a little further, by using a
1039       compile-time "DEBUG" constant.
1040
1041       # ifdebug-constant
1042
1043           #!/usr/bin/perl
1044
1045           use strict;
1046           use warnings;
1047
1048           use Benchmark;
1049           use Data::Dumper;
1050           use constant
1051               DEBUG => 0
1052           ;
1053
1054           sub debug {
1055               if ( DEBUG ) {
1056                   my $msg = shift;
1057                   print "DEBUG: $msg\n";
1058               }
1059           };
1060
1061           timethese(100000, {
1062                   'debug'       => sub {
1063                       debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) )
1064                   },
1065                   'constant'  => sub {
1066                       debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) ) if DEBUG
1067                   },
1068           });
1069
1070       Running this program produces the following output:
1071
1072           $> perl ifdebug-constant
1073           Benchmark: timing 100000 iterations of constant, debug...
1074             constant:  0 wallclock secs (-0.00 usr +  0.00 sys = -0.00 CPU) @ -7205759403792793600000.00/s (n=100000)
1075                       (warning: too few iterations for a reliable count)
1076                debug: 14 wallclock secs (13.09 usr +  0.00 sys = 13.09 CPU) @ 7639.42/s (n=100000)
1077
1078       The "DEBUG" constant wipes the floor with even the $debug variable,
1079       clocking in at minus zero seconds, and generates a "warning: too few
1080       iterations for a reliable count" message into the bargain.  To see what
1081       is really going on, and why we had too few iterations when we thought
1082       we asked for 100000, we can use the very useful "B::Deparse" to inspect
1083       the new code:
1084
1085           $> perl -MO=Deparse ifdebug-constant
1086
1087           use Benchmark;
1088           use Data::Dumper;
1089           use constant ('DEBUG', 0);
1090           sub debug {
1091               use warnings;
1092               use strict 'refs';
1093               0;
1094           }
1095           use warnings;
1096           use strict 'refs';
1097           timethese(100000, {'debug', sub {
1098               debug "A $0 logging message via process-id: $$" . Dumper(\%INC);
1099           }
1100           , 'constant', sub {
1101               0;
1102           }
1103           });
1104           ifdebug-constant syntax OK
1105
1106       The output shows that the body of the 'constant' test case has been
1107       replaced with the value of the "DEBUG" constant: zero.  The line to be
1108       tested has been completely optimized away, and you can't get much more
1109       efficient than that.
1110
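       One further option, for the cases where you cannot easily sprinkle "if
       $DEBUG" or constant guards around every call site, is to hand the
       logging routine a code reference and only build the expensive message
       when it is actually wanted.  This is a minimal sketch of the idea, not
       drawn from any particular logging library:

           #!/usr/bin/perl

           use strict;
           use warnings;

           use Data::Dumper;

           my $DEBUG = 0;

           # Accept either a plain string or a code reference; the code
           # reference is only invoked - and Dumper only run - when debugging
           # is actually enabled.
           sub debug {
               return unless $DEBUG;
               my $msg = shift;
               $msg = $msg->() if ref $msg eq 'CODE';
               print STDERR "DBG: $msg\n";
           }

           debug( sub { "process-id: $$ INC: " . Dumper(\%INC) } );
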

POSTSCRIPT

1112       This document has provided several ways to go about identifying hot-
1113       spots, and checking whether any modifications have improved the runtime
1114       of the code.
1115
1116       As a final thought, remember that it's not (at the time of writing)
1117       possible to produce a useful program which will run in zero or negative
1118       time, and this basic principle can be written as: useful programs are
1119       slow by their very definition.  It is of course possible to write a
1120       nearly instantaneous program, but it's not going to do very much;
1121       here's a very efficient one:
1122
1123           $> perl -e 0
1124
1125       Optimizing that any further is a job for "p5p".
1126

SEE ALSO

1128       Further reading can be found using the modules and links below.
1129
1130   PERLDOCS
1131       For example: "perldoc -f sort".
1132
1133       perlfaq4.
1134
1135       perlfork, perlfunc, perlretut, perlthrtut.
1136
1137       threads.
1138
1139   MAN PAGES
1140       "time".
1141
1142   MODULES
1143       It's not possible to individually showcase all the performance-related
1144       code for Perl here, naturally, but here's a short list of modules from
1145       the CPAN which deserve further attention.
1146
1147           Apache::DProf
1148           Apache::SmallProf
1149           Benchmark
1150           DBIx::Profiler
1151           Devel::AutoProfiler
1152           Devel::DProf
1153           Devel::DProfLB
1154           Devel::FastProf
1155           Devel::GraphVizProf
1156           Devel::NYTProf
1157           Devel::NYTProf::Apache
1158           Devel::Profiler
1159           Devel::Profile
1160           Devel::Profit
1161           Devel::SmallProf
1162           Devel::WxProf
1163           POE::Devel::Profiler
1164           Sort::Key
1165           Sort::Maker
1166
1167   URLS
1168       Very useful online reference material:
1169
1170           http://www.ccl4.org/~nick/P/Fast_Enough/
1171
1172           http://www-128.ibm.com/developerworks/library/l-optperl.html
1173
1174           http://perlbuzz.com/2007/11/bind-output-variables-in-dbi-for-speed-and-safety.html
1175
1176           http://en.wikipedia.org/wiki/Performance_analysis
1177
1178           http://apache.perl.org/docs/1.0/guide/performance.html
1179
1180           http://perlgolf.sourceforge.net/
1181
1182           http://www.sysarch.com/Perl/sort_paper.html
1183

AUTHOR

1185       Richard Foley <richard.foley@rfi.net> Copyright (c) 2008
1186
1187
1188
perl v5.12.4                      2011-06-07                       PERLPERF(1)