PERLPERF(1)            Perl Programmers Reference Guide            PERLPERF(1)
NAME
       perlperf - Perl Performance and Optimization Techniques

DESCRIPTION
       This is an introduction to performance and optimization techniques,
       with particular reference to perl programs.  While many perl
       developers have come from other languages, and can use their prior
       knowledge where appropriate, there are many other people who might
       benefit from a few perl specific pointers.  If you want the condensed
       version, perhaps the best advice comes from the renowned Japanese
       samurai Miyamoto Musashi, who wrote in 1645:

           "Do Not Engage in Useless Activity"

OVERVIEW
       Perhaps the most common mistake programmers make is to attempt to
       optimize their code before a program actually does anything useful -
       this is a bad idea.  There's no point in having an extremely fast
       program that doesn't work.  The first job is to get a program to
       correctly do something useful (not to mention ensuring the test
       suite is fully functional), and only then to consider optimizing it.
       Having decided to optimize existing working code, there are several
       simple but essential steps to consider which are intrinsic to any
       optimization process.

   ONE STEP SIDEWAYS
       Firstly, you need to establish a baseline time for the existing
       code; this timing needs to be reliable and repeatable.  You'll
       probably want to use the "Benchmark" or "Devel::DProf" modules, or
       something similar, for this step, or perhaps the unix system "time"
       utility, whichever is appropriate.  See the base of this document
       for a longer list of benchmarking and profiling modules, and
       recommended further reading.

   ONE STEP FORWARD
       Next, having examined the program for hot spots (places where the
       code seems to run slowly), change the code with the intention of
       making it run faster.  Using version control software, like
       "subversion", will ensure no changes are irreversible.  It's too
       easy to fiddle here and fiddle there - don't change too much at any
       one time or you might not discover which piece of code really was
       the slow bit.

   ANOTHER STEP SIDEWAYS
       It's not enough to say "that will make it run faster" - you have to
       check it.  Rerun the code under control of the benchmarking or
       profiling modules from the first step above, and check that the new
       code executed the same task in less time.  Save your work and
       repeat...

GENERAL GUIDELINES
       The critical thing when considering performance is to remember there
       is no such thing as a "Golden Bullet", which is why there are no
       rules, only guidelines.

       It is clear that inline code is going to be faster than subroutine
       or method calls, because there is less overhead, but this approach
       has the disadvantage of being less maintainable and comes at the
       cost of greater memory usage - there is no such thing as a free
       lunch.  If you are searching for an element in a list, it can be
       more efficient to store the data in a hash structure, and then
       simply look to see whether the key is defined, rather than looping
       through the entire array using grep(), for instance.  substr() may
       be (a lot) faster than grep(), but not as flexible, so you have
       another trade-off to assess.  Your code may contain a line which
       takes 0.01 of a second to execute; if you call it 1,000 times -
       quite likely in a program parsing even medium sized files, for
       instance - you already have a 10 second delay in just one single
       code location, and if you call that line 100,000 times, your entire
       program will slow down to an unbearable crawl.

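       To make the array-versus-hash trade-off concrete, here is a minimal
       "Benchmark" sketch; the data set, its size and the iteration count
       are invented purely for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Benchmark;

# hypothetical data set, made up for this example
my @list   = map { "item$_" } 1 .. 1000;
my %hash   = map { $_ => 1 } @list;
my $target = 'item1000';

timethese( 10000, {
    # scan the whole array for the target on every call
    'grep' => sub { my $found = grep { $_ eq $target } @list; },
    # single hash key lookup, no scan
    'hash' => sub { my $found = exists $hash{$target}; },
});
```

       On a typical run the hash lookup wins by a wide margin, and the gap
       grows with the size of the list.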
       Using a subroutine as part of your sort is a powerful way to get
       exactly what you want, but will usually be slower than the built-in
       alphabetic "cmp" and numeric "<=>" sort operators.  It is possible
       to make multiple passes over your data, building indices to make the
       upcoming sort more efficient, and to use what is known as the "OM"
       (Orcish Maneuver) to cache the sort keys in advance.  The cache
       lookup, while a good idea, can itself be a source of slowdown by
       enforcing a double pass over the data - once to set up the cache,
       and once to sort the data.  Using "pack()" to extract the required
       sort key into a consistent string can be an efficient way to build a
       single string to compare, instead of using multiple sort keys; this
       makes it possible to use the standard perl "sort()" function,
       written in "C" and fast, on the output, and is the basis of the
       "GRT" (Guttman Rossler Transform).  Some string combinations can
       slow the "GRT" down, by just being too complex for their own good.

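       As a concrete illustration of the "GRT", here is a minimal sketch
       which sorts hypothetical "name:score" records numerically by score;
       the record format and the data are invented for the example:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# hypothetical records in "name:score" format
my @records = ( 'ana:17', 'bob:3', 'cat:128', 'dan:42' );

# GRT: pack the numeric sort key as a fixed-width big-endian string onto
# the front of each record, sort the packed strings with the default
# (C-level, fast) string sort, then strip the 4-byte key off again
my @sorted = map  { substr( $_, 4 ) }
             sort
             map  { pack( 'N', (split /:/)[1] ) . $_ } @records;

# @sorted now holds: bob:3 ana:17 dan:42 cat:128
```

       Because the packed keys compare correctly as plain strings, no
       comparison subroutine is needed at all.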
       For applications using database backends, the standard "DBIx"
       namespace tries to help with keeping things nippy, not least because
       it tries not to query the database until the latest possible moment,
       but always read the docs which come with your choice of libraries.
       Among the many issues developers dealing with databases should
       remain aware of are to always use "SQL" placeholders and to consider
       pre-fetching data sets when this might prove advantageous.
       Splitting up a large file by assigning multiple processes to parsing
       a single file, using say "POE", "threads" or "fork", can also be a
       useful way of optimizing your usage of the available "CPU"
       resources, though this technique is fraught with concurrency issues
       and demands high attention to detail.

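       The process-splitting idea can be sketched with a plain "fork"; this
       toy example merely divides a made-up workload between two children
       and is invented for illustration - production code needs real error
       handling and a way to collect results (pipes, files, or a module
       such as "Parallel::ForkManager"):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# toy workload split across two child processes
my @chunks = ( [ 1 .. 500 ], [ 501 .. 1000 ] );
my @pids;

foreach my $chunk ( @chunks ) {
    my $pid = fork();
    die("fork failed: $!") unless defined $pid;
    if ( $pid == 0 ) {
        # child: process its share of the data, then exit
        my $sum = 0;
        $sum += $_ for @{ $chunk };
        exit 0;
    }
    push @pids, $pid;    # parent: remember the child
}

# reap the children; results would normally come back via a pipe or file
waitpid( $_, 0 ) for @pids;
```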
       Every case has a specific application and one or more exceptions,
       and there is no replacement for running a few tests and finding out
       which method works best for your particular environment.  This is
       why writing optimal code is not an exact science, and why we love
       using Perl so much - TMTOWTDI.

BENCHMARKS
       Here are a few examples to demonstrate usage of Perl's benchmarking
       tools.

   Assigning and Dereferencing Variables
       I'm sure most of us have seen code which looks like (or worse than)
       this:

           if ( $obj->{_ref}->{_myscore} >= $obj->{_ref}->{_yourscore} ) {
               ...

       This sort of code can be a real eyesore to read, as well as being
       very sensitive to typos, and it's much clearer to dereference the
       variable explicitly.  We're side-stepping the issue of using
       object-oriented programming techniques to encapsulate variable
       access via methods, only accessible through an object; here we're
       just discussing the technical implementation of choice, and whether
       this has an effect on performance.  We can see whether this
       dereferencing operation has any overhead by putting comparative code
       in a file and running a "Benchmark" test.

           # dereference

           #!/usr/bin/perl

           use strict;
           use warnings;

           use Benchmark;

           my $ref = {
                   'ref' => {
                           _myscore   => '100 + 1',
                           _yourscore => '102 - 1',
                   },
           };

           timethese(1000000, {
                   'direct'      => sub {
                           my $x = $ref->{ref}->{_myscore} . $ref->{ref}->{_yourscore} ;
                   },
                   'dereference' => sub {
                           my $ref       = $ref->{ref};
                           my $myscore   = $ref->{_myscore};
                           my $yourscore = $ref->{_yourscore};
                           my $x = $myscore . $yourscore;
                   },
           });

       It's essential to run any timing measurements a sufficient number of
       times so the numbers settle on a numerical average; otherwise each
       run will naturally fluctuate due to variations in the environment -
       contention for "CPU" resources and network bandwidth, for instance.
       Running the above code for one million iterations, we can take a
       look at the report output by the "Benchmark" module, to see which
       approach is the most effective.

           $> perl dereference

           Benchmark: timing 1000000 iterations of dereference, direct...
           dereference:  2 wallclock secs ( 1.59 usr +  0.00 sys =  1.59 CPU) @ 628930.82/s (n=1000000)
                direct:  1 wallclock secs ( 1.20 usr +  0.00 sys =  1.20 CPU) @ 833333.33/s (n=1000000)

       The difference is clear to see and the dereferencing approach is
       slower.  While it managed to execute an average of 628,930 times a
       second during our test, the direct approach managed to run an
       additional 204,403 times, unfortunately.  Unfortunately, because
       there are many examples of code written using the multiple layer
       direct variable access, and it's usually horrible.  It is, however,
       minusculely faster.  The question remains whether the minute gain is
       actually worth the eyestrain, or the loss of maintainability.

   Search and replace or tr
       If we have a string which needs to be modified, while a regex will
       almost always be much more flexible, "tr", an oft underused tool,
       can still be useful.  One scenario might be to replace all vowels
       with another character.  The regex solution might look like this:

           $str =~ s/[aeiou]/x/g

       The "tr" alternative might look like this:

           $str =~ tr/aeiou/xxxxx/

       We can put that into a test file which we can run to check which
       approach is the fastest, using a global $STR variable to assign to
       the "my $str" variable so as to avoid perl trying to optimize any of
       the work away by noticing it's assigned only the once.

           # regex-transliterate

           #!/usr/bin/perl

           use strict;
           use warnings;

           use Benchmark;

           my $STR = "$$-this and that";

           timethese( 1000000, {
                   'sr' => sub { my $str = $STR; $str =~ s/[aeiou]/x/g; return $str; },
                   'tr' => sub { my $str = $STR; $str =~ tr/aeiou/xxxxx/; return $str; },
           });

       Running the code gives us our results:

           $> perl regex-transliterate

           Benchmark: timing 1000000 iterations of sr, tr...
                   sr:  2 wallclock secs ( 1.19 usr +  0.00 sys =  1.19 CPU) @ 840336.13/s (n=1000000)
                   tr:  0 wallclock secs ( 0.49 usr +  0.00 sys =  0.49 CPU) @ 2040816.33/s (n=1000000)

       The "tr" version is a clear winner.  One solution is flexible, the
       other is fast - and it's appropriately the programmer's choice which
       to use in the circumstances.

       Check the "Benchmark" docs for further useful techniques.

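       One such technique is "cmpthese()", an exportable "Benchmark"
       function which runs the same timings but also prints a percentage
       comparison chart; here it is applied to the same two approaches (the
       test string is made up):

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Benchmark qw(cmpthese);

my $STR = "this and that";

# cmpthese() times each sub and prints a rate table plus a chart showing
# how much faster each entry is than the others
cmpthese( 100000, {
    'sr' => sub { my $str = $STR; $str =~ s/[aeiou]/x/g;   return $str; },
    'tr' => sub { my $str = $STR; $str =~ tr/aeiou/xxxxx/; return $str; },
});
```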
PROFILING TOOLS
       A slightly larger piece of code will provide something on which a
       profiler can produce more extensive reporting statistics.  This
       example uses the simplistic "wordmatch" program which parses a given
       input file and spews out a short report on the contents.

           # wordmatch

           #!/usr/bin/perl

           use strict;
           use warnings;

           =head1 NAME

           filewords - word analysis of input file

           =head1 SYNOPSIS

           filewords -f inputfilename [-d]

           =head1 DESCRIPTION

           This program parses the given filename, specified with C<-f>, and displays a
           simple analysis of the words found therein.  Use the C<-d> switch to enable
           debugging messages.

           =cut

           use FileHandle;
           use Getopt::Long;

           my $debug = 0;
           my $file  = '';

           my $result = GetOptions (
                   'debug'  => \$debug,
                   'file=s' => \$file,
           );
           die("invalid args") unless $result;

           unless ( -f $file ) {
                   die("Usage: $0 -f filename [-d]");
           }
           my $FH = FileHandle->new("< $file") or die("unable to open file($file): $!");

           my $i_LINES = 0;
           my $i_WORDS = 0;
           my %count   = ();

           my @lines = <$FH>;
           foreach my $line ( @lines ) {
                   $i_LINES++;
                   $line =~ s/\n//;
                   my @words = split(/ +/, $line);
                   my $i_words = scalar(@words);
                   $i_WORDS = $i_WORDS + $i_words;
                   debug("line: $i_LINES supplying $i_words words: @words");
                   my $i_word = 0;
                   foreach my $word ( @words ) {
                           $i_word++;
                           $count{$i_LINES}{spec} += matches($i_word, $word, '[^a-zA-Z0-9]');
                           $count{$i_LINES}{only} += matches($i_word, $word, '^[^a-zA-Z0-9]+$');
                           $count{$i_LINES}{cons} += matches($i_word, $word, '(?i)^[bcdfghjklmnpqrstvwxyz]+$');
                           $count{$i_LINES}{vows} += matches($i_word, $word, '(?i)^[aeiou]+$');
                           $count{$i_LINES}{caps} += matches($i_word, $word, '^[A-Z]+$');
                   }
           }

           print report( %count );

           sub matches {
                   my $i_wd  = shift;
                   my $word  = shift;
                   my $regex = shift;
                   my $has = 0;

                   if ( $word =~ /($regex)/ ) {
                           $has++ if $1;
                   }

                   debug("word: $i_wd ".($has ? 'matches' : 'does not match')." chars: /$regex/");

                   return $has;
           }

           sub report {
                   my %report = @_;
                   my %rep;

                   foreach my $line ( keys %report ) {
                           foreach my $key ( keys %{ $report{$line} } ) {
                                   $rep{$key} += $report{$line}{$key};
                           }
                   }

                   my $report = qq|
           $0 report for $file:
           lines in file: $i_LINES
           words in file: $i_WORDS
           words with special (non-word) characters: $rep{spec}
           words with only special (non-word) characters: $rep{only}
           words with only consonants: $rep{cons}
           words with only capital letters: $rep{caps}
           words with only vowels: $rep{vows}
           |;

                   return $report;
           }

           sub debug {
                   my $message = shift;

                   if ( $debug ) {
                           print STDERR "DBG: $message\n";
                   }
           }

           exit 0;

   Devel::DProf
       This venerable module was the de-facto standard for Perl code
       profiling for more than a decade, but has since been replaced by a
       number of other modules which have brought us back to the 21st
       century.  Although you're recommended to evaluate your tool from the
       several mentioned here and from the CPAN list at the base of this
       document (and currently Devel::NYTProf seems to be the weapon of
       choice - see below), we'll take a quick look at the output from
       Devel::DProf first, to set a baseline for Perl profiling tools.  Run
       the above program under the control of "Devel::DProf" by using the
       "-d" switch on the command-line.

           $> perl -d:DProf wordmatch -f perl5db.pl

           <...multiple lines snipped...>

           wordmatch report for perl5db.pl:
           lines in file: 9428
           words in file: 50243
           words with special (non-word) characters: 20480
           words with only special (non-word) characters: 7790
           words with only consonants: 4801
           words with only capital letters: 1316
           words with only vowels: 1701

       "Devel::DProf" produces a special file, called tmon.out by default,
       and this file is read by the "dprofpp" program, which is installed
       as part of the "Devel::DProf" distribution.  If you call "dprofpp"
       with no options, it will read the tmon.out file in the current
       directory and produce a human readable statistics report of the run
       of your program.  Note that this may take a little time.

           $> dprofpp

           Total Elapsed Time = 2.951677 Seconds
             User+System Time = 2.871677 Seconds
           Exclusive Times
           %Time ExclSec CumulS #Calls sec/call Csec/c  Name
           102.   2.945  3.003 251215   0.0000 0.0000  main::matches
           2.40   0.069  0.069 260643   0.0000 0.0000  main::debug
           1.74   0.050  0.050      1   0.0500 0.0500  main::report
           1.04   0.030  0.049      4   0.0075 0.0123  main::BEGIN
           0.35   0.010  0.010      3   0.0033 0.0033  Exporter::as_heavy
           0.35   0.010  0.010      7   0.0014 0.0014  IO::File::BEGIN
           0.00       - -0.000      1        -      -  Getopt::Long::FindOption
           0.00       - -0.000      1        -      -  Symbol::BEGIN
           0.00       - -0.000      1        -      -  Fcntl::BEGIN
           0.00       - -0.000      1        -      -  Fcntl::bootstrap
           0.00       - -0.000      1        -      -  warnings::BEGIN
           0.00       - -0.000      1        -      -  IO::bootstrap
           0.00       - -0.000      1        -      -  Getopt::Long::ConfigDefaults
           0.00       - -0.000      1        -      -  Getopt::Long::Configure
           0.00       - -0.000      1        -      -  Symbol::gensym

       "dprofpp" will produce some quite detailed reporting on the activity
       of the "wordmatch" program.  The wallclock, user and system times
       are at the top of the analysis, and after this are the main columns
       defining the report.  Check the "dprofpp" docs for details of the
       many options it supports.

       See also "Apache::DProf" which hooks "Devel::DProf" into "mod_perl".

   Devel::Profiler
       Let's take a look at the same program using a different profiler:
       "Devel::Profiler", a drop-in Perl-only replacement for
       "Devel::DProf".  The usage is very slightly different in that
       instead of using the special "-d:" flag, you pull "Devel::Profiler"
       in directly as a module using "-M".

           $> perl -MDevel::Profiler wordmatch -f perl5db.pl

           <...multiple lines snipped...>

           wordmatch report for perl5db.pl:
           lines in file: 9428
           words in file: 50243
           words with special (non-word) characters: 20480
           words with only special (non-word) characters: 7790
           words with only consonants: 4801
           words with only capital letters: 1316
           words with only vowels: 1701

       "Devel::Profiler" generates a tmon.out file which is compatible with
       the "dprofpp" program, thus saving the construction of a dedicated
       statistics reader program.  "dprofpp" usage is therefore identical
       to the above example.

           $> dprofpp

           Total Elapsed Time = 20.984 Seconds
             User+System Time = 19.981 Seconds
           Exclusive Times
           %Time ExclSec CumulS #Calls sec/call Csec/c  Name
           49.0   9.792 14.509 251215   0.0000 0.0001  main::matches
           24.4   4.887  4.887 260643   0.0000 0.0000  main::debug
           0.25   0.049  0.049      1   0.0490 0.0490  main::report
           0.00   0.000  0.000      1   0.0000 0.0000  Getopt::Long::GetOptions
           0.00   0.000  0.000      2   0.0000 0.0000  Getopt::Long::ParseOptionSpec
           0.00   0.000  0.000      1   0.0000 0.0000  Getopt::Long::FindOption
           0.00   0.000  0.000      1   0.0000 0.0000  IO::File::new
           0.00   0.000  0.000      1   0.0000 0.0000  IO::Handle::new
           0.00   0.000  0.000      1   0.0000 0.0000  Symbol::gensym
           0.00   0.000  0.000      1   0.0000 0.0000  IO::File::open

       Interestingly we get slightly different results, which is mostly
       because the algorithm which generates the report is different, even
       though the output file format was allegedly identical.  The elapsed,
       user and system times are clearly showing the time it took for
       "Devel::Profiler" to execute its own run, but the column listings
       feel more accurate somehow than the ones we had earlier from
       "Devel::DProf".  The 102% figure has disappeared, for example.  This
       is where we have to use the tools at our disposal, and recognise
       their pros and cons, before using them.  Interestingly, the numbers
       of calls for each subroutine are identical in the two reports; it's
       the percentages which differ.  As the author of "Devel::Profiler"
       writes:

           ...running HTML::Template's test suite under Devel::DProf shows output()
           taking NO time but Devel::Profiler shows around 10% of the time is in output().
           I don't know which to trust but my gut tells me something is wrong with
           Devel::DProf. HTML::Template::output() is a big routine that's called for
           every test. Either way, something needs fixing.

       YMMV.

       See also "Devel::Apache::Profiler" which hooks "Devel::Profiler"
       into "mod_perl".

   Devel::SmallProf
       The "Devel::SmallProf" profiler examines the runtime of your Perl
       program and produces a line-by-line listing to show how many times
       each line was called, and how long each line took to execute.  It is
       called by supplying the familiar "-d" flag to Perl at runtime.

           $> perl -d:SmallProf wordmatch -f perl5db.pl

           <...multiple lines snipped...>

           wordmatch report for perl5db.pl:
           lines in file: 9428
           words in file: 50243
           words with special (non-word) characters: 20480
           words with only special (non-word) characters: 7790
           words with only consonants: 4801
           words with only capital letters: 1316
           words with only vowels: 1701

       "Devel::SmallProf" writes its output into a file called
       smallprof.out, by default.  The format of the file looks like this:

           <num> <time> <ctime> <line>:<text>

       When the program has terminated, the output may be examined and
       sorted using any standard text filtering utilities.  Something like
       the following may be sufficient:

           $> cat smallprof.out | grep \d*: | sort -k3 | tac | head -n20

           251215   1.65674   7.68000    75: if ( $word =~ /($regex)/ ) {
           251215   0.03264   4.40000    79: debug("word: $i_wd ".($has ? 'matches' :
           251215   0.02693   4.10000    81: return $has;
           260643   0.02841   4.07000   128: if ( $debug ) {
           260643   0.02601   4.04000   126: my $message = shift;
           251215   0.02641   3.91000    73: my $has = 0;
           251215   0.03311   3.71000    70: my $i_wd = shift;
           251215   0.02699   3.69000    72: my $regex = shift;
           251215   0.02766   3.68000    71: my $word = shift;
            50243   0.59726   1.00000    59: $count{$i_LINES}{cons} =
            50243   0.48175   0.92000    61: $count{$i_LINES}{spec} =
            50243   0.00644   0.89000    56: my $i_cons = matches($i_word, $word,
            50243   0.48837   0.88000    63: $count{$i_LINES}{caps} =
            50243   0.00516   0.88000    58: my $i_caps = matches($i_word, $word, '^[(A-
            50243   0.00631   0.81000    54: my $i_spec = matches($i_word, $word, '[^a-
            50243   0.00496   0.80000    57: my $i_vows = matches($i_word, $word,
            50243   0.00688   0.80000    53: $i_word++;
            50243   0.48469   0.79000    62: $count{$i_LINES}{only} =
            50243   0.48928   0.77000    60: $count{$i_LINES}{vows} =
            50243   0.00683   0.75000    55: my $i_only = matches($i_word, $word, '^[^a-

       You can immediately see a slightly different focus to the subroutine
       profiling modules, and we start to see exactly which line of code is
       taking the most time.  That regex line is looking a bit suspicious,
       for example.  Remember that these tools are supposed to be used
       together; there is no single best way to profile your code, you need
       to use the best tools for the job.

       See also "Apache::SmallProf" which hooks "Devel::SmallProf" into
       "mod_perl".

   Devel::FastProf
       "Devel::FastProf" is another Perl line profiler.  This was written
       with a view to getting a faster line profiler than is possible with,
       for example, "Devel::SmallProf", because it's written in "C".  To
       use "Devel::FastProf", supply the "-d" argument to Perl:

           $> perl -d:FastProf wordmatch -f perl5db.pl

           <...multiple lines snipped...>

           wordmatch report for perl5db.pl:
           lines in file: 9428
           words in file: 50243
           words with special (non-word) characters: 20480
           words with only special (non-word) characters: 7790
           words with only consonants: 4801
           words with only capital letters: 1316
           words with only vowels: 1701

       "Devel::FastProf" writes statistics to the file fastprof.out in the
       current directory.  The output file, which can be specified, can be
       interpreted by using the "fprofpp" command-line program.

           $> fprofpp | head -n20

           # fprofpp output format is:
           # filename:line time count: source
           wordmatch:75 3.93338 251215: if ( $word =~ /($regex)/ ) {
           wordmatch:79 1.77774 251215: debug("word: $i_wd ".($has ? 'matches' : 'does not match')." chars: /$regex/");
           wordmatch:81 1.47604 251215: return $has;
           wordmatch:126 1.43441 260643: my $message = shift;
           wordmatch:128 1.42156 260643: if ( $debug ) {
           wordmatch:70 1.36824 251215: my $i_wd = shift;
           wordmatch:71 1.36739 251215: my $word = shift;
           wordmatch:72 1.35939 251215: my $regex = shift;

       Straightaway we can see that the number of times each line has been
       called is identical to the "Devel::SmallProf" output, and that the
       sequence is only very slightly different, based on the ordering of
       the amount of time each line took to execute - "if ( $debug ) {" and
       "my $message = shift;", for example.  The differences in the actual
       times recorded might be down to the algorithm used internally, or to
       system resource limitations or contention.

       See also "DBIx::Profiler", which will profile database queries
       running under the "DBIx::*" namespace.

   Devel::NYTProf
       "Devel::NYTProf" is the next generation of Perl code profiler,
       fixing many shortcomings in other tools and implementing many cool
       features.  First of all it can be used as a line profiler, a block
       profiler or a subroutine profiler, all at once.  It can also use
       sub-microsecond (100ns) resolution on systems which provide
       "clock_gettime()".  It can be started and stopped, even by the
       program being profiled.  It's a one-line entry to profile "mod_perl"
       applications.  It's written in "C" and is probably the fastest
       profiler available for Perl.  The list of coolness just goes on.
       Enough of that; let's see how it works - just use the familiar "-d"
       switch to plug it in and run the code.

           $> perl -d:NYTProf wordmatch -f perl5db.pl

           wordmatch report for perl5db.pl:
           lines in file: 9427
           words in file: 50243
           words with special (non-word) characters: 20480
           words with only special (non-word) characters: 7790
           words with only consonants: 4801
           words with only capital letters: 1316
           words with only vowels: 1701

       "NYTProf" will generate a report database into the file nytprof.out
       by default.  Human readable reports can be generated from there by
       using the supplied "nytprofhtml" (HTML output) and "nytprofcsv" (CSV
       output) programs.  We've used the unix system "html2text" utility to
       convert the nytprof/index.html file for convenience here.

           $> html2text nytprof/index.html

           Performance Profile Index
           For wordmatch
             Run on Fri Sep 26 13:46:39 2008
           Reported on Fri Sep 26 13:47:23 2008

                   Top 15 Subroutines -- ordered by exclusive time
           |Calls |P |F |Inclusive|Exclusive|Subroutine          |               |
           |      |  |  |Time     |Time     |                    |               |
           |251215|5 |1 |13.09263 |10.47692 |main::              |matches        |
           |260642|2 |1 |2.71199  |2.71199  |main::              |debug          |
           |1     |1 |1 |0.21404  |0.21404  |main::              |report         |
           |2     |2 |2 |0.00511  |0.00511  |XSLoader::          |load (xsub)    |
           |14    |14|7 |0.00304  |0.00298  |Exporter::          |import         |
           |3     |1 |1 |0.00265  |0.00254  |Exporter::          |as_heavy       |
           |10    |10|4 |0.00140  |0.00140  |vars::              |import         |
           |13    |13|1 |0.00129  |0.00109  |constant::          |import         |
           |1     |1 |1 |0.00360  |0.00096  |FileHandle::        |import         |
           |3     |3 |3 |0.00086  |0.00074  |warnings::register::|import         |
           |9     |3 |1 |0.00036  |0.00036  |strict::            |bits           |
           |13    |13|13|0.00032  |0.00029  |strict::            |import         |
           |2     |2 |2 |0.00020  |0.00020  |warnings::          |import         |
           |2     |1 |1 |0.00020  |0.00020  |Getopt::Long::      |ParseOptionSpec|
           |7     |7 |6 |0.00043  |0.00020  |strict::            |unimport       |

           For more information see the full list of 189 subroutines.

       The first part of the report already shows the critical information
       regarding which subroutines are using the most time.  The next gives
       some statistics about the source files profiled.

           Source Code Files -- ordered by exclusive time then name
           |Stmts  |Exclusive|Avg.   |Reports           |Source File         |
           |       |Time     |       |                  |                    |
           |2699761|15.66654 |6e-06  |line . block . sub|wordmatch           |
           |35     |0.02187  |0.00062|line . block . sub|IO/Handle.pm        |
           |274    |0.01525  |0.00006|line . block . sub|Getopt/Long.pm      |
           |20     |0.00585  |0.00029|line . block . sub|Fcntl.pm            |
           |128    |0.00340  |0.00003|line . block . sub|Exporter/Heavy.pm   |
           |42     |0.00332  |0.00008|line . block . sub|IO/File.pm          |
           |261    |0.00308  |0.00001|line . block . sub|Exporter.pm         |
           |323    |0.00248  |8e-06  |line . block . sub|constant.pm         |
           |12     |0.00246  |0.00021|line . block . sub|File/Spec/Unix.pm   |
           |191    |0.00240  |0.00001|line . block . sub|vars.pm             |
           |77     |0.00201  |0.00003|line . block . sub|FileHandle.pm       |
           |12     |0.00198  |0.00016|line . block . sub|Carp.pm             |
           |14     |0.00175  |0.00013|line . block . sub|Symbol.pm           |
           |15     |0.00130  |0.00009|line . block . sub|IO.pm               |
           |22     |0.00120  |0.00005|line . block . sub|IO/Seekable.pm      |
           |198    |0.00085  |4e-06  |line . block . sub|warnings/register.pm|
           |114    |0.00080  |7e-06  |line . block . sub|strict.pm           |
           |47     |0.00068  |0.00001|line . block . sub|warnings.pm         |
           |27     |0.00054  |0.00002|line . block . sub|overload.pm         |
           |9      |0.00047  |0.00005|line . block . sub|SelectSaver.pm      |
           |13     |0.00045  |0.00003|line . block . sub|File/Spec.pm        |
           |2701595|15.73869 |       |Total             |                    |
           |128647 |0.74946  |       |Average           |                    |
           |       |0.00201  |0.00003|Median            |                    |
           |       |0.00121  |0.00003|Deviation         |                    |

           Report produced by the NYTProf 2.03 Perl profiler, developed by
           Tim Bunce and Adam Kaplan.

       At this point, if you're using the html report, you can click
       through the various links to bore down into each subroutine and each
       line of code.  Because we're using the text reporting here, and
       there's a whole directory full of reports built for each source
       file, we'll just display a part of the corresponding
       wordmatch-line.html file, sufficient to give an idea of the sort of
       output you can expect from this cool tool.

           $> html2text nytprof/wordmatch-line.html

           Performance Profile -- -block view-.-line view-.-sub view-
           For wordmatch
             Run on Fri Sep 26 13:46:39 2008
           Reported on Fri Sep 26 13:47:22 2008

           File wordmatch

           Subroutines -- ordered by exclusive time
           |Calls |P|F|Inclusive|Exclusive|Subroutine    |
           |      | | |Time     |Time     |              |
           |251215|5|1|13.09263 |10.47692 |main::|matches|
           |260642|2|1|2.71199  |2.71199  |main::|debug  |
           |1     |1|1|0.21404  |0.21404  |main::|report |
           |0     |0|0|0        |0        |main::|BEGIN  |

           |Line|Stmts.|Exclusive|Avg.   |Code                                           |
           |    |      |Time     |       |                                               |
           |1   |      |         |       |#!/usr/bin/perl                                |
           |2   |      |         |       |                                               |
           |    |      |         |       |use strict;                                    |
           |3   |3     |0.00086  |0.00029|# spent 0.00003s making 1 calls to strict::    |
           |    |      |         |       |import                                         |
           |    |      |         |       |use warnings;                                  |
           |4   |3     |0.01563  |0.00521|# spent 0.00012s making 1 calls to warnings::  |
           |    |      |         |       |import                                         |
           |5   |      |         |       |                                               |
           |6   |      |         |       |=head1 NAME                                    |
           |7   |      |         |       |                                               |
           |8   |      |         |       |filewords - word analysis of input file        |
           <...snip...>
           |62  |1     |0.00445  |0.00445|print report( %count );                        |
           |    |      |         |       |# spent 0.21404s making 1 calls to main::report|
           |63  |      |         |       |                                               |
           |    |      |         |       |# spent 23.56955s (10.47692+2.61571) within    |
           |    |      |         |       |main::matches which was called 251215 times,   |
           |    |      |         |       |avg 0.00005s/call: # 50243 times               |
           |    |      |         |       |(2.12134+0.51939s) at line 57 of wordmatch, avg|
           |    |      |         |       |0.00005s/call # 50243 times (2.17735+0.54550s) |
           |64  |      |         |       |at line 56 of wordmatch, avg 0.00005s/call #   |
           |    |      |         |       |50243 times (2.10992+0.51797s) at line 58 of   |
           |    |      |         |       |wordmatch, avg 0.00005s/call # 50243 times     |
           |    |      |         |       |(2.12696+0.51598s) at line 55 of wordmatch, avg|
           |    |      |         |       |0.00005s/call # 50243 times (1.94134+0.51687s) |
           |    |      |         |       |at line 54 of wordmatch, avg 0.00005s/call     |
           |    |      |         |       |sub matches {                                  |
           <...snip...>
           |102 |      |         |       |                                               |
           |    |      |         |       |# spent 2.71199s within main::debug which was  |
           |    |      |         |       |called 260642 times, avg 0.00001s/call: #      |
           |    |      |         |       |251215 times (2.61571+0s) by main::matches at  |
           |103 |      |         |       |line 74 of wordmatch, avg 0.00001s/call # 9427 |
           |    |      |         |       |times (0.09628+0s) at line 50 of wordmatch, avg|
           |    |      |         |       |0.00001s/call                                  |
           |    |      |         |       |sub debug {                                    |
           |104 |260642|0.58496  |2e-06  |my $message = shift;                           |
           |105 |      |         |       |                                               |
           |106 |260642|1.09917  |4e-06  |if ( $debug ) {                                |
           |107 |      |         |       |print STDERR "DBG: $message\n";                |
           |108 |      |         |       |}                                              |
           |109 |      |         |       |}                                              |
           |110 |      |         |       |                                               |
           |111 |1     |0.01501  |0.01501|exit 0;                                        |
           |112 |      |         |       |                                               |

       Oodles of very useful information in there - this seems to be the
       way forward.

       See also "Devel::NYTProf::Apache" which hooks "Devel::NYTProf" into
       "mod_perl".

SORTING
Perl modules are not the only tools a performance analyst has at their
disposal; system tools like "time" should not be overlooked, as the next
example shows, where we take a quick look at sorting. Many books, theses
and articles have been written about efficient sorting algorithms, and
this is not the place to repeat such work; there are also several good
sorting modules which deserve a look, "Sort::Maker" and "Sort::Key"
among them. However, it is still possible to make some Perl-specific
observations about sorting data sets, and to give an example or two of
how sorting large data volumes can affect performance. Firstly, an
often overlooked point when sorting large amounts of data: one can
attempt to reduce the data set to be dealt with, and in many cases
"grep()" can be quite useful as a simple filter:

    @data = sort grep { /$filter/ } @incoming;

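To make the snippet concrete, here is a minimal self-contained sketch;
the sample data and the filter pattern are invented purely for
illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented sample data: keep only the lines matching the filter,
# then sort what survives -- the sort sees far fewer elements
# than the raw input would have supplied.
my @incoming = ( 'GET /c 404', 'GET /b 200', 'GET /a 404' );
my $filter   = qr/404/;

my @data = sort grep { /$filter/ } @incoming;
print "$_\n" for @data;
```

Here only the two matching lines ever reach the sort; on a real logfile
the reduction can be far more dramatic.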
A command such as this can vastly reduce the volume of material to
actually sort through in the first place, and should not be too lightly
disregarded purely on the basis of its simplicity. The "KISS" principle
is too often overlooked - the next example uses the simple system "time"
utility to demonstrate. Let's take a look at an actual example of
sorting the contents of a large file; an apache logfile would do. This
one has over a quarter of a million lines, is 50M in size, and a snippet
of it looks like this:

    # logfile

    188.209-65-87.adsl-dyn.isp.belgacom.be - - [08/Feb/2007:12:57:16 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
    188.209-65-87.adsl-dyn.isp.belgacom.be - - [08/Feb/2007:12:57:16 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
    151.56.71.198 - - [08/Feb/2007:12:57:41 +0000] "GET /suse-on-vaio.html HTTP/1.1" 200 2858 "http://www.linux-on-laptops.com/sony.html" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1"
    151.56.71.198 - - [08/Feb/2007:12:57:42 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/suse-on-vaio.html" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1"
    151.56.71.198 - - [08/Feb/2007:12:57:43 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1"
    217.113.68.60 - - [08/Feb/2007:13:02:15 +0000] "GET / HTTP/1.1" 304 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
    217.113.68.60 - - [08/Feb/2007:13:02:16 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
    debora.to.isac.cnr.it - - [08/Feb/2007:13:03:58 +0000] "GET /suse-on-vaio.html HTTP/1.1" 200 2858 "http://www.linux-on-laptops.com/sony.html" "Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.0 (like Gecko)"
    debora.to.isac.cnr.it - - [08/Feb/2007:13:03:58 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/suse-on-vaio.html" "Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.0 (like Gecko)"
    debora.to.isac.cnr.it - - [08/Feb/2007:13:03:58 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.0 (like Gecko)"
    195.24.196.99 - - [08/Feb/2007:13:26:48 +0000] "GET / HTTP/1.0" 200 3309 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9"
    195.24.196.99 - - [08/Feb/2007:13:26:58 +0000] "GET /data/css HTTP/1.0" 404 206 "http://www.rfi.net/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9"
    195.24.196.99 - - [08/Feb/2007:13:26:59 +0000] "GET /favicon.ico HTTP/1.0" 404 209 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9"
    crawl1.cosmixcorp.com - - [08/Feb/2007:13:27:57 +0000] "GET /robots.txt HTTP/1.0" 200 179 "-" "voyager/1.0"
    crawl1.cosmixcorp.com - - [08/Feb/2007:13:28:25 +0000] "GET /links.html HTTP/1.0" 200 3413 "-" "voyager/1.0"
    fhm226.internetdsl.tpnet.pl - - [08/Feb/2007:13:37:32 +0000] "GET /suse-on-vaio.html HTTP/1.1" 200 2858 "http://www.linux-on-laptops.com/sony.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
    fhm226.internetdsl.tpnet.pl - - [08/Feb/2007:13:37:34 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/suse-on-vaio.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
    80.247.140.134 - - [08/Feb/2007:13:57:35 +0000] "GET / HTTP/1.1" 200 3309 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
    80.247.140.134 - - [08/Feb/2007:13:57:37 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
    pop.compuscan.co.za - - [08/Feb/2007:14:10:43 +0000] "GET / HTTP/1.1" 200 3309 "-" "www.clamav.net"
    livebot-207-46-98-57.search.live.com - - [08/Feb/2007:14:12:04 +0000] "GET /robots.txt HTTP/1.0" 200 179 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
    livebot-207-46-98-57.search.live.com - - [08/Feb/2007:14:12:04 +0000] "GET /html/oracle.html HTTP/1.0" 404 214 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
    dslb-088-064-005-154.pools.arcor-ip.net - - [08/Feb/2007:14:12:15 +0000] "GET / HTTP/1.1" 200 3309 "-" "www.clamav.net"
    196.201.92.41 - - [08/Feb/2007:14:15:01 +0000] "GET / HTTP/1.1" 200 3309 "-" "MOT-L7/08.B7.DCR MIB/2.2.1 Profile/MIDP-2.0 Configuration/CLDC-1.1"

The specific task here is to sort the 286,525 lines of this file by
Response Code, Query, Browser, Referring Url, and lastly Date. One
solution might be to use the following code, which iterates over the
files given on the command-line.

    # sort-apache-log

    #!/usr/bin/perl -n

    use strict;
    use warnings;

    my @data;

    LINE:
    while ( <> ) {
        my $line = $_;
        if (
            $line =~ m/^(
                ([\w\.\-]+)             # client
                \s*-\s*-\s*\[
                ([^]]+)                 # date
                \]\s*"\w+\s*
                (\S+)                   # query
                [^"]+"\s*
                (\d+)                   # status
                \s+\S+\s+"[^"]*"\s+"
                ([^"]*)                 # browser
                "
                .*
            )$/x
        ) {
            my @chunks  = split(/ +/, $line);
            my $ip      = $1;
            my $date    = $2;
            my $query   = $3;
            my $status  = $4;
            my $browser = $5;

            push(@data, [$ip, $date, $query, $status, $browser, $line]);
        }
    }

    my @sorted = sort {
        $a->[3] cmp $b->[3]
            ||
        $a->[2] cmp $b->[2]
            ||
        $a->[0] cmp $b->[0]
            ||
        $a->[1] cmp $b->[1]
            ||
        $a->[4] cmp $b->[4]
    } @data;

    foreach my $data ( @sorted ) {
        print $data->[5];
    }

    exit 0;

When running this program, redirect "STDOUT" so it is possible to check
that the output of subsequent test runs is correct, and use the system
"time" utility to check the overall runtime.

    $> time ./sort-apache-log logfile > out-sort

    real    0m17.371s
    user    0m15.757s
    sys     0m0.592s

The program took just over 17 wallclock seconds to run. Note the
different values "time" outputs; it's important to always use the same
one, and not to confuse what each one means.

Elapsed Real Time
    The overall, or wallclock, time between when "time" was called and
    when it terminates. The elapsed time includes both user and system
    times, and time spent waiting for other users and processes on the
    system. Inevitably, this is the most approximate of the
    measurements given.

User CPU Time
    The user time is the amount of time the process itself spent
    executing this program on behalf of the user.

System CPU Time
    The system time is the amount of time the kernel spent executing
    routines, or system calls, on behalf of this process.

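The user and system figures can also be read from inside a running Perl
program via the built-in times() function, which returns the user and
system CPU times of the process (and of its reaped children) in
seconds. A minimal sketch, with an arbitrary busy-loop standing in for
real work:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Snapshot the CPU counters before the work...
my ( $user0, $system0 ) = times();

# ...do something mildly expensive (a stand-in for real work)...
my $total = 0;
$total += $_ for 1 .. 1_000_000;

# ...then report the user and system CPU consumed in between.
my ( $user1, $system1 ) = times();
printf "user: %.2fs  system: %.2fs\n",
    $user1 - $user0, $system1 - $system0;
```

This is handy for timing one phase of a long-running program, where
wrapping the whole process in the external "time" utility would be too
coarse.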
Running this same process as a "Schwartzian Transform" it is possible
to eliminate the input and output arrays for storing all the data, and
work on the input directly as it arrives. Otherwise, the code looks
fairly similar:

    # sort-apache-log-schwarzian

    #!/usr/bin/perl -n

    use strict;
    use warnings;

    print

    map $_->[0] =>

    sort {
        $a->[4] cmp $b->[4]
            ||
        $a->[3] cmp $b->[3]
            ||
        $a->[1] cmp $b->[1]
            ||
        $a->[2] cmp $b->[2]
            ||
        $a->[5] cmp $b->[5]
    }
    map [ $_, m/^(
        ([\w\.\-]+)             # client
        \s*-\s*-\s*\[
        ([^]]+)                 # date
        \]\s*"\w+\s*
        (\S+)                   # query
        [^"]+"\s*
        (\d+)                   # status
        \s+\S+\s+"[^"]*"\s+"
        ([^"]*)                 # browser
        "
        .*
    )$/xo ]

    => <>;

    exit 0;

Run the new code against the same logfile, as above, to check the new
time.

    $> time ./sort-apache-log-schwarzian logfile > out-schwarz

    real    0m9.664s
    user    0m8.873s
    sys     0m0.704s

The time has been cut almost in half, which is a respectable speed
improvement by any standard. Naturally, it is important to check that
the output is consistent with the first program run; this is where the
unix system "cksum" utility comes in.

    $> cksum out-sort out-schwarz
    3044173777 52029194 out-sort
    3044173777 52029194 out-schwarz

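The same consistency check can be performed from within Perl using the
core Digest::MD5 module; the file names below are the ones produced by
the runs above:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;

# Digest each output file; identical digests mean the faster program
# produced byte-for-byte identical output to the original.
my @digests;
for my $file ( 'out-sort', 'out-schwarz' ) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    binmode $fh;
    push @digests, Digest::MD5->new->addfile($fh)->hexdigest;
}
print $digests[0] eq $digests[1] ? "outputs match\n" : "outputs DIFFER\n";
```

Like "cksum", this only proves the outputs are identical to each other,
not that either one is correct; keep a separate known-good sample for
that.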
BTW. Beware too of pressure from managers who see you speed a program
up by 50% of the runtime once, only to get a request one month later to
do the same again (true story) - you'll just have to point out you're
only human, even if you are a Perl programmer, and you'll see what you
can do...

LOGGING
An essential part of any good development process is appropriate error
handling with appropriately informative messages; however, there exists
a school of thought which suggests that log files should be chatty, as
if an unbroken chain of output somehow ensures the survival of the
program. If speed is in any way an issue, this approach is wrong.

A common sight is code which looks something like this:

    logger->debug( "A logging message via process-id: $$ INC: " . Dumper(\%INC) )

The problem is that this code will always be parsed and executed, even
when the debug level set in the logging configuration file is zero.
Once the debug() subroutine has been entered, and the internal $debug
variable confirmed to be zero, for example, the message which has been
sent in will be discarded and the program will continue. In the
example given, though, the \%INC hash will already have been dumped and
the message string constructed, all of which work could be bypassed by
a debug variable at the statement level, like this:

    logger->debug( "A logging message via process-id: $$ INC: " . Dumper(\%INC) ) if $DEBUG;

This effect can be demonstrated by setting up a test script with both
forms, including a "debug()" subroutine to emulate typical "logger()"
functionality.

    # ifdebug

    #!/usr/bin/perl

    use strict;
    use warnings;

    use Benchmark;
    use Data::Dumper;
    my $DEBUG = 0;

    sub debug {
        my $msg = shift;

        if ( $DEBUG ) {
            print "DEBUG: $msg\n";
        }
    };

    timethese(100000, {
        'debug'   => sub {
            debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) )
        },
        'ifdebug' => sub {
            debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) ) if $DEBUG
        },
    });

Let's see what "Benchmark" makes of this:

    $> perl ifdebug
    Benchmark: timing 100000 iterations of debug, ifdebug...
       ifdebug:  0 wallclock secs ( 0.01 usr +  0.00 sys =  0.01 CPU) @ 10000000.00/s (n=100000)
                (warning: too few iterations for a reliable count)
         debug: 14 wallclock secs (13.18 usr +  0.04 sys = 13.22 CPU) @ 7564.30/s (n=100000)

In the one case the code, which does exactly the same thing as far as
outputting any debugging information is concerned (in other words,
nothing), takes 14 seconds; in the other case the code takes one
hundredth of a second. Looks fairly definitive. Check a $DEBUG
variable BEFORE you call the subroutine, rather than relying on the
smart functionality inside it.

Logging if DEBUG (constant)
It's possible to take the previous idea a little further, by using a
compile time "DEBUG" constant.

    # ifdebug-constant

    #!/usr/bin/perl

    use strict;
    use warnings;

    use Benchmark;
    use Data::Dumper;
    use constant
        DEBUG => 0
    ;

    sub debug {
        if ( DEBUG ) {
            my $msg = shift;
            print "DEBUG: $msg\n";
        }
    };

    timethese(100000, {
        'sub'      => sub {
            debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) )
        },
        'constant' => sub {
            debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) ) if DEBUG
        },
    });

Running this program produces the following output:

    $> perl ifdebug-constant
    Benchmark: timing 100000 iterations of constant, sub...
      constant:  0 wallclock secs (-0.00 usr +  0.00 sys = -0.00 CPU) @ -7205759403792793600000.00/s (n=100000)
                (warning: too few iterations for a reliable count)
           sub: 14 wallclock secs (13.09 usr +  0.00 sys = 13.09 CPU) @ 7639.42/s (n=100000)

The "DEBUG" constant wipes the floor with even the $DEBUG variable,
clocking in at minus zero seconds, and generates a "warning: too few
iterations for a reliable count" message into the bargain. To see what
is really going on, and why we had too few iterations when we thought
we asked for 100000, we can use the very useful "B::Deparse" to inspect
the new code:

    $> perl -MO=Deparse ifdebug-constant

    use Benchmark;
    use Data::Dumper;
    use constant ('DEBUG', 0);
    sub debug {
        use warnings;
        use strict 'refs';
        0;
    }
    use warnings;
    use strict 'refs';
    timethese(100000, {'sub', sub {
        debug "A $0 logging message via process-id: $$" . Dumper(\%INC);
    }
    , 'constant', sub {
        0;
    }
    });
    ifdebug-constant syntax OK

The output shows the constant() subroutine we're testing being replaced
with the value of the "DEBUG" constant: zero. The line to be tested
has been completely optimized away, and you can't get much more
efficient than that.

SUMMARY
This document has provided several ways to go about identifying hot
spots, and checking whether any modifications have improved the runtime
of the code.

As a final thought, remember that it's not (at the time of writing)
possible to produce a useful program which will run in zero or negative
time, and this basic principle can be written as: useful programs are
slow by their very definition. It is of course possible to write a
nearly instantaneous program, but it's not going to do very much.
Here's a very efficient one:

    $> perl -e 0

Optimizing that any further is a job for "p5p".

SEE ALSO
Further reading can be found using the modules and links below.

PERLDOCS
    For example: "perldoc -f sort".

    perlfaq4.

    perlfork, perlfunc, perlretut, perlthrtut.

    threads.

MAN PAGES
    "time".

MODULES
    It's not possible to individually showcase all the performance
    related code for Perl here, naturally, but here's a short list of
    modules from the CPAN which deserve further attention.

        Apache::DProf
        Apache::SmallProf
        Benchmark
        DBIx::Profiler
        Devel::AutoProfiler
        Devel::DProf
        Devel::DProfLB
        Devel::FastProf
        Devel::GraphVizProf
        Devel::NYTProf
        Devel::NYTProf::Apache
        Devel::Profiler
        Devel::Profile
        Devel::Profit
        Devel::SmallProf
        Devel::WxProf
        POE::Devel::Profiler
        Sort::Key
        Sort::Maker

URLS
    Very useful online reference material:

    http://www.ccl4.org/~nick/P/Fast_Enough/

    http://www-128.ibm.com/developerworks/library/l-optperl.html

    http://perlbuzz.com/2007/11/bind-output-variables-in-dbi-for-speed-and-safety.html

    http://en.wikipedia.org/wiki/Performance_analysis

    http://apache.perl.org/docs/1.0/guide/performance.html

    http://perlgolf.sourceforge.net/

    http://www.sysarch.com/Perl/sort_paper.html

    http://www.unix.org.ua/orelly/perl/prog/ch08_03.htm

AUTHOR
Richard Foley <richard.foley@rfi.net> Copyright (c) 2008



perl v5.10.1                      2009-05-14                     PERLPERF(1)