perlhack(1)

PERLHACK(1)            Perl Programmers Reference Guide            PERLHACK(1)

NAME

       perlhack - How to hack at the Perl internals

DESCRIPTION

9       This document attempts to explain how Perl development takes place, and
10       ends with some suggestions for people wanting to become bona fide
11       porters.
12
13       The perl5-porters mailing list is where the Perl standard distribution
14       is maintained and developed.  The list can get anywhere from 10 to 150
15       messages a day, depending on the heatedness of the debate.  Most days
16       there are two or three patches, extensions, features, or bugs being
17       discussed at a time.
18
19       A searchable archive of the list is at either:
20
21           http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/
22
23       or
24
25           http://archive.develooper.com/perl5-porters@perl.org/
26
27       List subscribers (the porters themselves) come in several flavours.
28       Some are quiet curious lurkers, who rarely pitch in and instead watch
29       the ongoing development to ensure they're forewarned of new changes or
30       features in Perl.  Some are representatives of vendors, who are there
31       to make sure that Perl continues to compile and work on their plat‐
32       forms.  Some patch any reported bug that they know how to fix, some are
33       actively patching their pet area (threads, Win32, the regexp engine),
34       while others seem to do nothing but complain.  In other words, it's
35       your usual mix of technical people.
36
       Over this group of porters presides Larry Wall.  He has the final word
       in what does and does not change in the Perl language.  Various
       releases of Perl are shepherded by a "pumpking", a porter responsible
       for gathering patches and deciding on a patch-by-patch,
       feature-by-feature basis what will and will not go into the release.
       For instance, Gurusamy Sarathy was the pumpking for the 5.6 release of
       Perl, Jarkko Hietaniemi was the pumpking for the 5.8 release, and
       Rafael Garcia-Suarez holds the pumpking crown for the 5.10 release.
45
       In addition, various people are pumpkings for different things.  For
       instance, Andy Dougherty and Jarkko Hietaniemi did a grand job as the
       Configure pumpkins up until the 5.8 release.  For the 5.10 release,
       H.Merijn Brand took over.
50
51       Larry sees Perl development along the lines of the US government:
52       there's the Legislature (the porters), the Executive branch (the pump‐
53       kings), and the Supreme Court (Larry).  The legislature can discuss and
54       submit patches to the executive branch all they like, but the executive
55       branch is free to veto them.  Rarely, the Supreme Court will side with
56       the executive branch over the legislature, or the legislature over the
57       executive branch.  Mostly, however, the legislature and the executive
58       branch are supposed to get along and work out their differences without
59       impeachment or court cases.
60
61       You might sometimes see reference to Rule 1 and Rule 2.  Larry's power
62       as Supreme Court is expressed in The Rules:
63
64       1   Larry is always by definition right about how Perl should behave.
65           This means he has final veto power on the core functionality.
66
67       2   Larry is allowed to change his mind about any matter at a later
68           date, regardless of whether he previously invoked Rule 1.
69
70       Got that?  Larry is always right, even when he was wrong.  It's rare to
71       see either Rule exercised, but they are often alluded to.
72
73       New features and extensions to the language are contentious, because
74       the criteria used by the pumpkings, Larry, and other porters to decide
75       which features should be implemented and incorporated are not codified
76       in a few small design goals as with some other languages.  Instead, the
77       heuristics are flexible and often difficult to fathom.  Here is one
78       person's list, roughly in decreasing order of importance, of heuristics
79       that new features have to be weighed against:
80
81       Does concept match the general goals of Perl?
82           These haven't been written anywhere in stone, but one approximation
83           is:
84
85            1. Keep it fast, simple, and useful.
86            2. Keep features/concepts as orthogonal as possible.
87            3. No arbitrary limits (platforms, data sizes, cultures).
88            4. Keep it open and exciting to use/patch/advocate Perl everywhere.
89            5. Either assimilate new technologies, or build bridges to them.
90
91       Where is the implementation?
92           All the talk in the world is useless without an implementation.  In
93           almost every case, the person or people who argue for a new feature
94           will be expected to be the ones who implement it.  Porters capable
95           of coding new features have their own agendas, and are not avail‐
96           able to implement your (possibly good) idea.
97
98       Backwards compatibility
           It's a cardinal sin to break existing Perl programs.  New warnings
           are contentious--some say that a program that emits warnings is not
           broken, while others say it is.  Adding keywords has the potential
           to break programs, and changing the meaning of existing token
           sequences or functions might break programs too.
104
105       Could it be a module instead?
106           Perl 5 has extension mechanisms, modules and XS, specifically to
107           avoid the need to keep changing the Perl interpreter.  You can
108           write modules that export functions, you can give those functions
109           prototypes so they can be called like built-in functions, you can
110           even write XS code to mess with the runtime data structures of the
111           Perl interpreter if you want to implement really complicated
112           things.  If it can be done in a module instead of in the core, it's
113           highly unlikely to be added.
114
115       Is the feature generic enough?
116           Is this something that only the submitter wants added to the lan‐
117           guage, or would it be broadly useful?  Sometimes, instead of adding
118           a feature with a tight focus, the porters might decide to wait
119           until someone implements the more generalized feature.  For
120           instance, instead of implementing a "delayed evaluation" feature,
121           the porters are waiting for a macro system that would permit
122           delayed evaluation and much more.
123
124       Does it potentially introduce new bugs?
125           Radical rewrites of large chunks of the Perl interpreter have the
126           potential to introduce new bugs.  The smaller and more localized
127           the change, the better.
128
129       Does it preclude other desirable features?
130           A patch is likely to be rejected if it closes off future avenues of
131           development.  For instance, a patch that placed a true and final
132           interpretation on prototypes is likely to be rejected because there
133           are still options for the future of prototypes that haven't been
134           addressed.
135
136       Is the implementation robust?
137           Good patches (tight code, complete, correct) stand more chance of
138           going in.  Sloppy or incorrect patches might be placed on the back
139           burner until the pumpking has time to fix, or might be discarded
140           altogether without further notice.
141
142       Is the implementation generic enough to be portable?
           The worst patches make use of system-specific features.  It's
           highly unlikely that nonportable additions to the Perl language
           will be accepted.
146
147       Is the implementation tested?
148           Patches which change behaviour (fixing bugs or introducing new fea‐
149           tures) must include regression tests to verify that everything
150           works as expected.  Without tests provided by the original author,
151           how can anyone else changing perl in the future be sure that they
152           haven't unwittingly broken the behaviour the patch implements? And
153           without tests, how can the patch's author be confident that his/her
154           hard work put into the patch won't be accidentally thrown away by
155           someone in the future?
156
157       Is there enough documentation?
           Patches without documentation are probably ill-thought-out or
           incomplete.  Nothing can be added without documentation, so
           submitting a patch for the appropriate manpages as well as the
           source code is always a good idea.
162
163       Is there another way to do it?
164           Larry said "Although the Perl Slogan is There's More Than One Way
165           to Do It, I hesitate to make 10 ways to do something".  This is a
166           tricky heuristic to navigate, though--one man's essential addition
167           is another man's pointless cruft.
168
169       Does it create too much work?
170           Work for the pumpking, work for Perl programmers, work for module
171           authors, ...  Perl is supposed to be easy.
172
173       Patches speak louder than words
174           Working code is always preferred to pie-in-the-sky ideas.  A patch
175           to add a feature stands a much higher chance of making it to the
176           language than does a random feature request, no matter how fer‐
177           vently argued the request might be.  This ties into "Will it be
178           useful?", as the fact that someone took the time to make the patch
179           demonstrates a strong desire for the feature.
180
181       If you're on the list, you might hear the word "core" bandied around.
182       It refers to the standard distribution.  "Hacking on the core" means
183       you're changing the C source code to the Perl interpreter.  "A core
184       module" is one that ships with Perl.
185
186       Keeping in sync
187
       The source code to the Perl interpreter, in its different versions, is
       kept in a repository managed by a revision control system (currently
       the Perforce program, see http://perforce.com/).  The pumpkings and a
       few others have access to the repository to check in changes.
       Periodically the pumpking for the development version of Perl will
       release a new version, so the rest of the porters can see what's
       changed.  The current state of the main trunk of the repository, and
       patches that describe the individual changes that have happened since
       the last public release, are available at this location:
197
198           http://public.activestate.com/pub/apc/
199           ftp://public.activestate.com/pub/apc/
200
201       If you're looking for a particular change, or a change that affected a
202       particular set of files, you may find the Perl Repository Browser use‐
203       ful:
204
205           http://public.activestate.com/cgi-bin/perlbrowse
206
207       You may also want to subscribe to the perl5-changes mailing list to
208       receive a copy of each patch that gets submitted to the maintenance and
209       development "branches" of the perl repository.  See
210       http://lists.perl.org/ for subscription information.
211
       If you are a member of the perl5-porters mailing list, it is a good
       thing to keep in touch with the most recent changes, if only to
       verify that what you would have posted as a bug report isn't already
       solved in the most recent available perl development branch, also known
       as perl-current, bleading edge perl, bleedperl or bleadperl.
217
218       Needless to say, the source code in perl-current is usually in a per‐
219       petual state of evolution.  You should expect it to be very buggy.  Do
220       not use it for any purpose other than testing and development.
221
222       Keeping in sync with the most recent branch can be done in several
223       ways, but the most convenient and reliable way is using rsync, avail‐
224       able at ftp://rsync.samba.org/pub/rsync/ .  (You can also get the most
225       recent branch by FTP.)
226
227       If you choose to keep in sync using rsync, there are two approaches to
228       doing so:
229
230       rsync'ing the source tree
231           Presuming you are in the directory where your perl source resides
232           and you have rsync installed and available, you can "upgrade" to
233           the bleadperl using:
234
            # rsync -avz rsync://public.activestate.com/perl-current/ .
236
237           This takes care of updating every single item in the source tree to
238           the latest applied patch level, creating files that are new (to
239           your distribution) and setting date/time stamps of existing files
240           to reflect the bleadperl status.
241
242           Note that this will not delete any files that were in '.' before
243           the rsync. Once you are sure that the rsync is running correctly,
244           run it with the --delete and the --dry-run options like this:
245
            # rsync -avz --delete --dry-run rsync://public.activestate.com/perl-current/ .
247
248           This will simulate an rsync run that also deletes files not present
249           in the bleadperl master copy. Observe the results from this run
250           closely. If you are sure that the actual run would delete no files
251           precious to you, you could remove the '--dry-run' option.
252
           You can then check which patch was the latest to be applied by
           looking in the file .patch, which will show the number of the
           latest patch.
256
257           If you have more than one machine to keep in sync, and not all of
258           them have access to the WAN (so you are not able to rsync all the
259           source trees to the real source), there are some ways to get around
260           this problem.
261
262           Using rsync over the LAN
263               Set up a local rsync server which makes the rsynced source tree
264               available to the LAN and sync the other machines against this
265               directory.
266
267               From http://rsync.samba.org/README.html :
268
269                  "Rsync uses rsh or ssh for communication. It does not need to be
270                   setuid and requires no special privileges for installation.  It
271                   does not require an inetd entry or a daemon.  You must, however,
272                   have a working rsh or ssh system.  Using ssh is recommended for
273                   its security features."
274
           Pushing over NFS
               With the other systems mounted over NFS, you can take an
               active pushing approach by checking the just-updated tree
               against the other not-yet-synced trees.  An example would be
279
                 #!/usr/bin/perl -w

                 use strict;
                 use File::Copy;

                 # Record mode, size and mtime for every file named in MANIFEST.
                 my %MF = map {
                     m/(\S+)/;
                     $1 => [ (stat $1)[2, 7, 9] ];     # mode, size, mtime
                     } `cat MANIFEST`;

                 # Each remote tree is reached through an NFS mount named after its host.
                 my %remote = map { $_ => "/$_/pro/3gl/CPAN/perl-5.7.1" } qw(host1 host2);

                 foreach my $host (keys %remote) {
                     unless (-d $remote{$host}) {
                         print STDERR "Cannot Xsync for host $host\n";
                         next;
                         }
                     foreach my $file (keys %MF) {
                         my $rfile = "$remote{$host}/$file";
                         my ($mode, $size, $mtime) = (stat $rfile)[2, 7, 9];
                         defined $size or ($mode, $size, $mtime) = (0, 0, 0);
                         # Skip files whose size and mtime already match the local copy.
                         $size == $MF{$file}[1] && $mtime == $MF{$file}[2] and next;
                         printf "%4s %-34s %8d %9d  %8d %9d\n",
                             $host, $file, $MF{$file}[1], $MF{$file}[2], $size, $mtime;
                         # Replace the remote copy, then restore its mtime and mode.
                         unlink $rfile;
                         copy ($file, $rfile);
                         utime time, $MF{$file}[2], $rfile;
                         chmod $MF{$file}[0], $rfile;
                         }
                     }
310
               though this is not perfect.  It could be improved by checking
               file checksums before updating.  Not all NFS systems support
               utime reliably (when used over NFS).
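
               The checksum improvement suggested above could look roughly
               like the following.  This is only a sketch, written in shell
               and not in Perl like the script above: the function name
               copy_if_changed and its two arguments are hypothetical, and it
               assumes the POSIX cksum utility is available on the machine
               doing the pushing.

```shell
# Sketch: copy a file only when its checksum differs from the remote copy.
# copy_if_changed and its arguments are hypothetical names for illustration.
copy_if_changed() {
    src=$1
    dst=$2
    # Reading from stdin makes cksum print only "<crc> <size>", no file name.
    src_sum=$(cksum < "$src")
    if [ -f "$dst" ]; then
        dst_sum=$(cksum < "$dst")
    else
        dst_sum=missing
    fi
    if [ "$src_sum" = "$dst_sum" ]; then
        echo "unchanged $dst"
    else
        # -p asks cp to preserve the mode and modification time where it can.
        cp -p "$src" "$dst" && echo "updated $dst"
    fi
}
```

               Comparing checksums means reading every remote file in full,
               so this trades extra NFS traffic for not having to rely on
               remote mtimes.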
314
315       rsync'ing the patches
316           The source tree is maintained by the pumpking who applies patches
317           to the files in the tree. These patches are either created by the
318           pumpking himself using "diff -c" after updating the file manually
319           or by applying patches sent in by posters on the perl5-porters
320           list.  These patches are also saved and rsync'able, so you can
321           apply them yourself to the source files.
322
323           Presuming you are in a directory where your patches reside, you can
324           get them in sync with
325
326            # rsync -avz rsync://public.activestate.com/perl-current-diffs/ .
327
328           This makes sure the latest available patch is downloaded to your
329           patch directory.
330
331           It's then up to you to apply these patches, using something like
332
            # last=`ls -t *.gz | sed q`
            # rsync -avz rsync://public.activestate.com/perl-current-diffs/ .
            # find . -name '*.gz' -newer $last -exec gzcat {} \; >blead.patch
            # cd ../perl-current
            # patch -p1 -N <../perl-current-diffs/blead.patch
338
339           or, since this is only a hint towards how it works, use CPAN-
340           patchaperl from Andreas König to have better control over the
341           patching process.
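
           As a further hint, the set of patches still to be applied can be
           worked out from the .patch file mentioned earlier.  The sketch
           below rests on two assumptions that are not guaranteed by anything
           above: that the diffs directory holds files named <number>.gz, and
           that .patch starts with the number of the last applied patch.  The
           function name pending_patches is hypothetical.

```shell
# Sketch: list the patch files that still need applying to a source tree.
# Assumes <number>.gz file names and a .patch file whose first word is the
# number of the last applied patch; both are assumptions for illustration.
pending_patches() {
    srcdir=$1
    diffdir=$2
    level=0
    if [ -r "$srcdir/.patch" ]; then
        read -r level rest < "$srcdir/.patch" || :
    fi
    # Fall back to 0 when .patch is empty or does not start with a number.
    case $level in
        ''|*[!0-9]*) level=0 ;;
    esac
    for f in "$diffdir"/*.gz; do
        [ -e "$f" ] || continue
        n=$(basename "$f" .gz)
        case $n in
            ''|*[!0-9]*) continue ;;    # skip names that are not plain numbers
        esac
        if [ "$n" -gt "$level" ]; then
            echo "$n"
        fi
    done | sort -n | sed 's/$/.gz/'     # ascending order, suffix restored
}

# Possible use, run from the directory holding the patches:
#   for p in $(pending_patches ../perl-current .); do
#       gzip -dc "$p" | (cd ../perl-current && patch -p1 -N)
#   done
```

           Sorting numerically matters because a plain alphabetical listing
           would put 10000.gz before 7583.gz.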
342
343       Why rsync the source tree
344
345       It's easier to rsync the source tree
346           Since you don't have to apply the patches yourself, you are sure
347           all files in the source tree are in the right state.
348
349       It's more reliable
350           While both the rsync-able source and patch areas are automatically
351           updated every few minutes, keep in mind that applying patches may
352           sometimes mean careful hand-holding, especially if your version of
353           the "patch" program does not understand how to deal with new files,
354           files with 8-bit characters, or files without trailing newlines.
355
356       Why rsync the patches
357
358       It's easier to rsync the patches
           If you have more than one machine that you want to keep in step
           with bleadperl, it's easier to rsync the patches only once and then
           apply them to all the source trees on the different machines.

           If you try to keep pace on 5 different machines, of which only one
           has access to the WAN, rsync'ing all the source trees would then
           have to be done 5 times over NFS.  Having rsync'ed the patches
           only once, you can apply them to all the source trees
           automatically.  Need I say more ;-)
368
369       It's a good reference
           If you not only like to have the most recent development branch,
           but also like to fix bugs or extend features, you will want to dive
           into the sources.  If you are a seasoned perl core diver, you don't
           need no manuals, tips, roadmaps, perlguts.pod or other aids to find
           your way around.  But if you are a starter, the patches may help you
           in finding where you should start and how to change the bits that
           bug you.
377
           The file Changes is updated on occasions the pumpking sees as his
           own little sync points.  On those occasions, he releases a tarball
           of the current source tree (e.g. perl@7582.tar.gz), which is an
           excellent starting point when choosing to use the 'rsync the
           patches' scheme.  Starting with perl@7582, which means a set of
           source files on which the latest applied patch is number 7582, you
           apply all succeeding patches available from then on (7583, 7584,
           ...).
386
387           You can use the patches later as a kind of search archive.
388
389           Finding a start point
               If you want to fix/change the behaviour of function/feature
               Foo, just scan the patches for ones that mention Foo either
               in the subject, the comments, or the body of the fix.  There is
               a good chance the patch shows you the files that it affected,
               which are very likely to be the starting point of your
               journey into the guts of perl.
396
397           Finding how to fix a bug
               If you've found where the function/feature Foo misbehaves, but
               you don't know how to fix it (but you do know the change you
               want to make), you can, again, peruse the patches for similar
               changes and see how others applied the fix.
402
403           Finding the source of misbehaviour
               When you keep in sync with bleadperl, the pumpking would love
               to see that the community efforts really work.  So after each of
               his sync points, you should run 'make test' to check that
               everything is still in working order.  If it is, you do 'make
               ok', which will send an OK report to perlbug@perl.org.  (If you
               do not have access to a mailer from the system on which you
               just successfully ran 'make test', you can do 'make okfile',
               which creates the file "perl.ok", which you can then take to
               your favourite mailer and mail yourself.)
413
               But of course, things will not always succeed, and one or more
               tests may not pass the 'make test'.  Before sending in a bug
               report (using 'make nok' or 'make nokfile'), check the mailing
               list to see whether someone else has reported the bug already
               and, if so, confirm it by replying to that message.  If not,
               you might want to trace the source of that misbehaviour
               before sending in the bug, which will help all the other
               porters in finding the solution.
422
               Here the saved patches come in very handy.  You can check the
               list of patches to see which patch changed what file and what
               change caused the misbehaviour.  If you note that in the bug
               report, it saves the person trying to solve it from having to
               look for that point.
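
           The kind of patch scanning described in the three cases above can
           be sketched as a small shell function.  It assumes the saved
           patches are gzip-compressed files ending in .gz and sitting in one
           directory; the function name patches_mentioning is hypothetical.

```shell
# Sketch: list the saved patches that mention a given term.
# Assumes gzip-compressed patch files ending in .gz in a single directory;
# patches_mentioning is a hypothetical name used for illustration.
patches_mentioning() {
    term=$1
    dir=${2:-.}
    for f in "$dir"/*.gz; do
        [ -e "$f" ] || continue
        # -F matches the term as a fixed string; -q only sets the exit status.
        if gzip -dc "$f" | grep -F -q -- "$term"; then
            echo "$f"
        fi
    done
}
```

           For example, "patches_mentioning pp_hot.c ../perl-current-diffs"
           would print the patches whose subject, comments or diff body
           mention that file name.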
428
429           If searching the patches is too bothersome, you might consider
430           using perl's bugtron to find more information about discussions and
431           ramblings on posted bugs.
432
433           If you want to get the best of both worlds, rsync both the source
434           tree for convenience, reliability and ease and rsync the patches
435           for reference.
436
437       Working with the source
438
439       Because you cannot use the Perforce client, you cannot easily generate
440       diffs against the repository, nor will merges occur when you update via
441       rsync.  If you edit a file locally and then rsync against the latest
442       source, changes made in the remote copy will overwrite your local ver‐
443       sions!
444
445       The best way to deal with this is to maintain a tree of symlinks to the
446       rsync'd source.  Then, when you want to edit a file, you remove the
447       symlink, copy the real file into the other tree, and edit it.  You can
448       then diff your edited file against the original to generate a patch,
449       and you can safely update the original tree.
450
451       Perl's Configure script can generate this tree of symlinks for you.
452       The following example assumes that you have used rsync to pull a copy
453       of the Perl source into the perl-rsync directory.  In the directory
454       above that one, you can execute the following commands:
455
456         mkdir perl-dev
457         cd perl-dev
458         ../perl-rsync/Configure -Dmksymlinks -Dusedevel -D"optimize=-g"
459
460       This will start the Perl configuration process.  After a few prompts,
461       you should see something like this:
462
463         Symbolic links are supported.
464
465         Checking how to test for symbolic links...
466         Your builtin 'test -h' may be broken.
467         Trying external '/usr/bin/test -h'.
468         You can test for symbolic links with '/usr/bin/test -h'.
469
470         Creating the symbolic links...
471         (First creating the subdirectories...)
472         (Then creating the symlinks...)
473
474       The specifics may vary based on your operating system, of course.
475       After you see this, you can abort the Configure script, and you will
476       see that the directory you are in has a tree of symlinks to the perl-
477       rsync directories and files.
478
479       If you plan to do a lot of work with the Perl source, here are some
480       Bourne shell script functions that can make your life easier:
481
           edit() {
               if [ -L "$1" ]; then
                   # Replace the symlink with a private, editable copy.
                   mv "$1" "$1.orig"
                   cp "$1.orig" "$1"
                   vi "$1"
               else
                   vi "$1"
               fi
           }

           unedit() {
               if [ -L "$1.orig" ]; then
                   # Discard the edited copy and restore the symlink.
                   rm "$1"
                   mv "$1.orig" "$1"
               fi
           }
498
499       Replace "vi" with your favorite flavor of editor.
500
501       Here is another function which will quickly generate a patch for the
502       files which have been edited in your symlink tree:
503
           mkpatchorig() {
               local diffopts
               for f in `find . -name '*.orig' | sed 's,^\./,,'`
               do
                   # Choose diff options from the extension of the edited file.
                   case `echo $f | sed 's,\.orig$,,;s,.*\.,,'` in
                       c)   diffopts=-p ;;
                       pod) diffopts='-F^=' ;;
                       *)   diffopts= ;;
                   esac
                   diff -du $diffopts $f `echo $f | sed 's,\.orig$,,'`
               done
           }
516
517       This function produces patches which include enough context to make
518       your changes obvious.  This makes it easier for the Perl pumpking(s) to
519       review them when you send them to the perl5-porters list, and that
520       means they're more likely to get applied.
521
       This function assumes GNU diff, and may require some tweaking for
       other diff variants.
524
525       Perlbug administration
526
527       There is a single remote administrative interface for modifying bug
528       status, category, open issues etc. using the RT bugtracker system,
529       maintained by Robert Spier.  Become an administrator, and close any
530       bugs you can get your sticky mitts on:
531
532               http://rt.perl.org
533
534       The bugtracker mechanism for perl5 bugs in particular is at:
535
536               http://bugs6.perl.org/perlbug
537
538       To email the bug system administrators:
539
540               "perlbug-admin" <perlbug-admin@perl.org>
541
542       Submitting patches
543
       Always submit patches to perl5-porters@perl.org.  If you're patching a
       core module and there's an author listed, send the author a copy (see
       "Patching a core module").  This lets other porters review your patch,
       which catches a surprising number of errors in patches.  Either use the
       diff program (available in source code form from
       ftp://ftp.gnu.org/pub/gnu/), or use Johan Vromans' makepatch (available
       from CPAN/authors/id/JV/).  Unified diffs are preferred, but context
       diffs are accepted.  Do not send RCS-style diffs or diffs without
       context lines.  More information is given in the Porting/patching.pod
       file in the Perl source distribution.  Please patch against the latest
       development version (e.g., if you're fixing a bug in the 5.005 track,
       patch against the latest 5.005_5x version).  Only patches that survive
       the heat of the development branch get applied to maintenance versions.
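
       As a rough illustration of the preferred format, a unified diff
       between a saved original and an edited file can be produced as
       follows.  This is a sketch only: the function name make_unified_patch
       is hypothetical, and it assumes a diff that understands the -u option
       (GNU diff does).

```shell
# Sketch: produce a unified diff between a saved original and an edited file.
# make_unified_patch is a hypothetical name; assumes diff supports -u.
make_unified_patch() {
    orig=$1
    new=$2
    status=0
    # diff exits with 1 when the files differ, which is the expected case
    # here, so only an exit status above 1 is reported as a failure.
    diff -u "$orig" "$new" || status=$?
    [ "$status" -le 1 ]
}
```

       Redirecting the output of "make_unified_patch perl.c.orig perl.c" to
       a file gives something that can be reviewed before it is mailed to
       the list.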
557
558       Your patch should update the documentation and test suite.  See "Writ‐
559       ing a test".
560
       To report a bug in Perl, use the program perlbug which comes with Perl
       (if you can't get Perl to work, send mail to the address
       perlbug@perl.org or perlbug@perl.com).  Reporting bugs through perlbug
       feeds into the automated bug-tracking system, access to which is
       provided through the web at http://bugs.perl.org/ .  It often pays to
       check the archives of the perl5-porters mailing list to see whether the
       bug you're reporting has been reported before, and if so whether it was
       considered a bug.  See above for the location of the searchable
       archives.
570
       The CPAN testers ( http://testers.cpan.org/ ) are a group of volunteers
       who test CPAN modules on a variety of platforms.  Perl Smokers (
       http://archives.develooper.com/daily-build@perl.org/ ) automatically
       test Perl source releases on platforms with various configurations.
       Both efforts welcome volunteers.
576
       It's a good idea to read and lurk for a while before chipping in.  That
       way you'll get to see the dynamic of the conversations, learn the
       personalities of the players, and hopefully be better prepared to make
       a useful contribution when you do speak up.
581
582       If after all this you still think you want to join the perl5-porters
583       mailing list, send mail to perl5-porters-subscribe@perl.org.  To unsub‐
584       scribe, send mail to perl5-porters-unsubscribe@perl.org.
585
586       To hack on the Perl guts, you'll need to read the following things:
587
588       perlguts
589          This is of paramount importance, since it's the documentation of
590          what goes where in the Perl source. Read it over a couple of times
591          and it might start to make sense - don't worry if it doesn't yet,
592          because the best way to study it is to read it in conjunction with
593          poking at Perl source, and we'll do that later on.
594
595          You might also want to look at Gisle Aas's illustrated perlguts -
596          there's no guarantee that this will be absolutely up-to-date with
597          the latest documentation in the Perl core, but the fundamentals will
598          be right. ( http://gisle.aas.no/perl/illguts/ )
599
600       perlxstut and perlxs
601          A working knowledge of XSUB programming is incredibly useful for
602          core hacking; XSUBs use techniques drawn from the PP code, the por‐
603          tion of the guts that actually executes a Perl program. It's a lot
604          gentler to learn those techniques from simple examples and explana‐
605          tion than from the core itself.
606
607       perlapi
608          The documentation for the Perl API explains what some of the inter‐
609          nal functions do, as well as the many macros used in the source.
610
611       Porting/pumpkin.pod
612          This is a collection of words of wisdom for a Perl porter; some of
613          it is only useful to the pumpkin holder, but most of it applies to
614          anyone wanting to go about Perl development.
615
616       The perl5-porters FAQ
617          This should be available from http://simon-cozens.org/writ‐
618          ings/p5p-faq ; alternatively, you can get the FAQ emailed to you by
619          sending mail to "perl5-porters-faq@perl.org". It contains hints on
620          reading perl5-porters, information on how perl5-porters works and
621          how Perl development in general works.
622
623       Finding Your Way Around
624
625       Perl maintenance can be split into a number of areas, and certain peo‐
626       ple (pumpkins) will have responsibility for each area. These areas
627       sometimes correspond to files or directories in the source kit. Among
628       the areas are:
629
630       Core modules
631          Modules shipped as part of the Perl core live in the lib/ and ext/
632          subdirectories: lib/ is for the pure-Perl modules, and ext/ contains
633          the core XS modules.
634
635       Tests
636          There are tests for nearly all the modules, built-ins and major bits
637          of functionality.  Test files all have a .t suffix.  Module tests
638          live in the lib/ and ext/ directories next to the module being
639          tested.  Others live in t/.  See "Writing a test".
640
641       Documentation
642          Documentation maintenance includes looking after everything in the
643          pod/ directory (as well as contributing new documentation) and the
644          documentation to the modules in core.
645
646       Configure
647          The configure process is the way we make Perl portable across the
648          myriad of operating systems it supports. Responsibility for the con‐
649          figure, build and installation process, as well as the overall
650          portability of the core code rests with the configure pumpkin - oth‐
651          ers help out with individual operating systems.
652
653          The files involved are the operating system directories (win32/,
654          os2/, vms/ and so on), the shell scripts which generate config.h and
655          Makefile, as well as the metaconfig files which generate Configure.
656          (metaconfig isn't included in the core distribution.)
657
658       Interpreter
659          And of course, there's the core of the Perl interpreter itself.
660          Let's have a look at that in a little more detail.
661
662       Before we leave looking at the layout, though, don't forget that MANI‐
663       FEST contains not only the file names in the Perl distribution, but
664       short descriptions of what's in them, too. For an overview of the
665       important files, try this:
666
667           perl -lne 'print if /^[^\/]+\.[ch]\s+/' MANIFEST
668
669       Elements of the interpreter
670
671       The work of the interpreter has two main stages: compiling the code
672       into the internal representation, or bytecode, and then executing it.
673       "Compiled code" in perlguts explains exactly how the compilation stage
674       happens.
675
676       Here is a short breakdown of perl's operation:
677
678       Startup
679          The action begins in perlmain.c (or miniperlmain.c for miniperl).
680          This is very high-level code, enough to fit on a single screen, and
681          it resembles the code found in perlembed; most of the real action
682          takes place in perl.c.
683
684          First, perlmain.c allocates some memory and constructs a Perl inter‐
685          preter:
686
687              1 PERL_SYS_INIT3(&argc,&argv,&env);
688              2
689              3 if (!PL_do_undump) {
690              4     my_perl = perl_alloc();
691              5     if (!my_perl)
692              6         exit(1);
693              7     perl_construct(my_perl);
694              8     PL_perl_destruct_level = 0;
695              9 }
696
697          Line 1 is a macro, and its definition is dependent on your operating
698          system. Line 3 references "PL_do_undump", a global variable - all
699          global variables in Perl start with "PL_". This tells you whether
700          the current running program was created with the "-u" flag to perl
701          and then undump, which means it's going to be false in any sane con‐
702          text.
703
704          Line 4 calls a function in perl.c to allocate memory for a Perl
705          interpreter. It's quite a simple function, and the guts of it looks
706          like this:
707
708              my_perl = (PerlInterpreter*)PerlMem_malloc(sizeof(PerlInterpreter));
709
710          Here you see an example of Perl's system abstraction, which we'll
711          look at later: "PerlMem_malloc" is either your system's "malloc", or
712          Perl's own "malloc" as defined in malloc.c if you selected that
713          option at configure time.
714
715          Next, in line 7, we construct the interpreter; this sets up all the
716          special variables that Perl needs, the stacks, and so on.
717
718          Now we pass Perl the command line options, and tell it to go:
719
720              exitstatus = perl_parse(my_perl, xs_init, argc, argv, (char **)NULL);
721              if (!exitstatus) {
722                  exitstatus = perl_run(my_perl);
723              }
724
725          "perl_parse" is actually a wrapper around "S_parse_body", as defined
726          in perl.c, which processes the command line options, sets up any
727          statically linked XS modules, opens the program and calls "yyparse"
728          to parse it.
729
730       Parsing
731          The aim of this stage is to take the Perl source, and turn it into
732          an op tree. We'll see what one of those looks like later. Strictly
733          speaking, there are three things going on here.
734
735          "yyparse", the parser, lives in perly.c, although you're better off
736          reading the original YACC input in perly.y. (Yes, Virginia, there is
737          a YACC grammar for Perl!) The job of the parser is to take your code
738          and "understand" it, splitting it into sentences, deciding which op‐
739          erands go with which operators and so on.
740
741          The parser is nobly assisted by the lexer, which chunks up your
742          input into tokens, and decides what type of thing each token is: a
743          variable name, an operator, a bareword, a subroutine, a core func‐
744          tion, and so on.  The main point of entry to the lexer is "yylex",
745          and that and its associated routines can be found in toke.c. Perl
746          isn't much like other computer languages: it's highly context-
747          sensitive at times, and it can be tricky to work out what sort of
748          token something is, or where a token ends. As such, there's a lot of
749          interplay between the tokeniser and the parser, which can get pretty
750          frightening if you're not used to it.
751
752          As the parser understands a Perl program, it builds up a tree of
753          operations for the interpreter to perform during execution. The rou‐
754          tines which construct and link together the various operations are
755          to be found in op.c, and will be examined later.
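
          To make the lexer's job concrete, here is a minimal sketch of a
          tokeniser in plain C. It is not taken from toke.c: the "TokType"
          classes and the "toy_lex" function are hypothetical names invented
          for this illustration, and the sketch deliberately ignores the
          context sensitivity that makes Perl's real lexer hard.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Toy token classes; hypothetical, far coarser than those in toke.c. */
typedef enum {
    TOK_END, TOK_SCALAR, TOK_NUMBER, TOK_WORD, TOK_OPERATOR
} TokType;

static int is_word_start(int c) { return isalpha(c) || c == '_'; }
static int is_word_char(int c)  { return isalnum(c) || c == '_'; }

/* Classify the token at *src and advance *src past it. */
TokType toy_lex(const char **src)
{
    const unsigned char *s;
    TokType type;

    assert(src != NULL && *src != NULL);
    s = (const unsigned char *)*src;
    while (isspace(*s))               /* skip leading whitespace */
        s++;
    if (*s == '\0') {
        type = TOK_END;
    } else if (*s == '$' && is_word_start(s[1])) {
        s += 2;                       /* sigil plus first name character */
        while (is_word_char(*s))
            s++;
        type = TOK_SCALAR;
    } else if (isdigit(*s)) {
        while (isdigit(*s))
            s++;
        type = TOK_NUMBER;
    } else if (is_word_start(*s)) {
        while (is_word_char(*s))
            s++;
        type = TOK_WORD;
    } else {
        s++;                          /* any other single character */
        type = TOK_OPERATOR;
    }
    *src = (const char *)s;
    return type;
}
```

          Calling "toy_lex" repeatedly on a string such as "$a = $b + 42;"
          yields one token class per call until it reports the end of input.
          The real lexer must additionally decide, for instance, whether a
          "/" starts a regular expression or is a division operator.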
756
757       Optimization
758          Now the parsing stage is complete, and the finished tree represents
759          the operations that the Perl interpreter needs to perform to execute
760          our program. Next, Perl does a dry run over the tree looking for
761          optimisations: constant expressions such as "3 + 4" will be computed
762          now, and the optimizer will also see if any multiple operations can
763          be replaced with a single one. For instance, to fetch the variable
764          $foo, instead of grabbing the glob *foo and looking at the scalar
765          component, the optimizer fiddles the op tree to use a function which
766          directly looks up the scalar in question. The main optimizer is
767          "peep" in op.c, and many ops have their own optimizing functions.
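
          Constant folding, the first optimisation mentioned above, can be
          sketched on a toy expression tree. This is only an illustration of
          the idea, not Perl's optimizer: the "Node" type and the "fold"
          function are hypothetical and have nothing to do with the real op
          structures.

```c
#include <assert.h>
#include <stddef.h>

/* Toy expression node; hypothetical, not Perl's op structure. */
typedef enum { NODE_CONST, NODE_VAR, NODE_ADD } NodeType;

typedef struct Node {
    NodeType type;
    double value;               /* used when type == NODE_CONST */
    struct Node *left, *right;  /* used when type == NODE_ADD */
} Node;

/* Fold constant subtrees in place, bottom-up.
   Returns 1 if the node is a constant after folding, 0 otherwise. */
int fold(Node *n)
{
    assert(n != NULL);
    if (n->type == NODE_CONST)
        return 1;
    if (n->type == NODE_ADD) {
        int l = fold(n->left);
        int r = fold(n->right);
        if (l && r) {
            /* Both operands are known now, so compute the sum once. */
            n->value = n->left->value + n->right->value;
            n->type = NODE_CONST;
            n->left = n->right = NULL;
            return 1;
        }
    }
    return 0;                   /* variables cannot be folded */
}
```

          Folding an addition of the constants 3 and 4 turns that node into
          the constant 7, while an addition involving a variable is left
          alone.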
768
769       Running
770          Now we're finally ready to go: we have compiled Perl byte code, and
771          all that's left to do is run it. The actual execution is done by the
772          "runops_standard" function in run.c; more specifically, it's done by
773          these three innocent looking lines:
774
775              while ((PL_op = CALL_FPTR(PL_op->op_ppaddr)(aTHX))) {
776                  PERL_ASYNC_CHECK();
777              }
778
779          You may be more comfortable with the Perl version of that:
780
781              PERL_ASYNC_CHECK() while $Perl::op = &{$Perl::op->{function}};
782
783          Well, maybe not. Anyway, each op contains a function pointer, which
784          stipulates the function which will actually carry out the operation.
785          This function will return the next op in the sequence - this allows
786          for things like "if" which choose the next op dynamically at run
787          time.  The "PERL_ASYNC_CHECK" makes sure that things like signals
788          interrupt execution if required.
789
790          The actual functions called are known as PP code, and they're spread
791          between four files: pp_hot.c contains the "hot" code, which is most
792          often used and highly optimized, pp_sys.c contains all the system-
793          specific functions, pp_ctl.c contains the functions which implement
794          control structures ("if", "while" and the like) and pp.c contains
795          everything else. These are, if you like, the C code for Perl's
796          built-in functions and operators.
797
798          Note that each "pp_" function is expected to return a pointer to the
799          next op. Calls to perl subs (and eval blocks) are handled within the
800          same runops loop, and do not consume extra space on the C stack. For
801          example, "pp_entersub" and "pp_entertry" just push a "CxSUB" or
802          "CxEVAL" block struct onto the context stack which contains the
803          address of the op following the sub call or eval. They then return
804          the first op of that sub or eval block, and so execution continues
805          with that sub or block.  Later, a "pp_leavesub" or "pp_leavetry" op
806          pops the "CxSUB" or "CxEVAL", retrieves the return op from it, and
807          returns it.
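
          The shape of such a run loop is easy to reproduce outside Perl. The
          following is a minimal sketch, assuming a much simpler op than
          Perl's: the "Op" struct, its fields and the "toy_" names are all
          hypothetical. Each op function returns the op to run next, and one
          of them picks its successor at run time, as a conditional would.

```c
#include <assert.h>
#include <stddef.h>

/* Toy op; hypothetical, far simpler than Perl's struct op. */
typedef struct Op Op;
struct Op {
    Op *(*ppaddr)(Op *self);  /* function that carries out the op */
    Op *next;                 /* default successor */
    Op *other;                /* alternative successor for a branch */
    int arg;
};

int toy_accumulator;          /* stand-in for interpreter state */

/* Add the op's argument, then continue with the next op. */
Op *pp_add_const(Op *self)
{
    toy_accumulator += self->arg;
    return self->next;
}

/* Choose the successor dynamically, as "if" does. */
Op *pp_branch_if_positive(Op *self)
{
    return toy_accumulator > 0 ? self->other : self->next;
}

/* The run loop: call each op's function until one returns NULL. */
void toy_runops(Op *op)
{
    assert(op != NULL);
    while ((op = op->ppaddr(op)) != NULL)
        ;
}
```

          Note that the loop itself knows nothing about branching: control
          flow lives entirely in the pointers that the op functions return.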
808
809       Exception handling
810          Perl's exception handling (i.e. "die" etc) is built on top of the
811          low-level "setjmp()"/"longjmp()" C-library functions. These basi‐
812          cally provide a way to capture the current PC and SP registers and
813          later restore them; i.e.  a "longjmp()" continues at the point in
814          code where a previous "setjmp()" was done, with anything further up
815          on the C stack being lost. This is why code should always save val‐
816          ues using "SAVE_FOO" rather than in auto variables.
817
818          The perl core wraps "setjmp()" etc in the macros "JMPENV_PUSH" and
819          "JMPENV_JUMP". The basic rule of perl exceptions is that "exit", and
820          "die" (in the absence of "eval") perform a JMPENV_JUMP(2), while
821          "die" within "eval" does a JMPENV_JUMP(3).
822
823          Entry points to perl, such as "perl_parse()", "perl_run()" and
824          "call_sv(cv, G_EVAL)", each do a "JMPENV_PUSH", then enter a runops
825          loop or whatever, and handle possible exception returns. For a 2
826          return, final cleanup is performed, such as popping stacks and call‐
827          ing "CHECK" or "END" blocks. Amongst other things, this is how scope
828          cleanup still occurs during an "exit".
829
830          If a "die" can find a "CxEVAL" block on the context stack, then the
831          stack is popped to that level and the return op in that block is
832          assigned to "PL_restartop"; then a JMPENV_JUMP(3) is performed.
833          This normally passes control back to the guard. In the case of
834          "perl_run" and "call_sv", a non-null "PL_restartop" triggers re-
835          entry to the runops loop. This is the normal way that "die" or
836          "croak" is handled within an "eval".
837
838          Sometimes ops are executed within an inner runops loop, such as tie,
839          sort or overload code. In this case, something like
840
841              sub FETCH { eval { die } }
842
843          would cause a longjmp right back to the guard in "perl_run", popping
844          both runops loops, which is clearly incorrect. One way to avoid this
845          is for the tie code to do a "JMPENV_PUSH" before executing "FETCH"
846          in the inner runops loop, but for efficiency reasons, perl in fact
847          just sets a flag, using "CATCH_SET(TRUE)". The "pp_require",
848          "pp_entereval" and "pp_entertry" ops check this flag, and if true,
849          they call "docatch", which does a "JMPENV_PUSH" and starts a new
850          runops level to execute the code, rather than doing it on the cur‐
851          rent loop.
852
853          As a further optimisation, on exit from the eval block in the
854          "FETCH", execution of the code following the block is still carried
855          on in the inner loop.  When an exception is raised, "docatch" com‐
856          pares the "JMPENV" level of the "CxEVAL" with "PL_top_env" and if
857          they differ, just re-throws the exception. In this way any inner
858          loops get popped.
859
860          Here's an example.
861
862              1: eval { tie @a, 'A' };
863              2: sub A::TIEARRAY {
864              3:     eval { die };
865              4:     die;
866              5: }
867
868          To run this code, "perl_run" is called, which does a "JMPENV_PUSH"
869          then enters a runops loop. This loop executes the eval and tie ops
870          on line 1, with the eval pushing a "CxEVAL" onto the context stack.
871
872          The "pp_tie" does a "CATCH_SET(TRUE)", then starts a second runops
873          loop to execute the body of "TIEARRAY". When it executes the
874          entertry op on line 3, "CATCH_GET" is true, so "pp_entertry" calls
875          "docatch" which does a "JMPENV_PUSH" and starts a third runops loop,
876          which then executes the die op. At this point the C call stack looks
877          like this:
878
879              Perl_pp_die
880              Perl_runops      # third loop
881              S_docatch_body
882              S_docatch
883              Perl_pp_entertry
884              Perl_runops      # second loop
885              S_call_body
886              Perl_call_sv
887              Perl_pp_tie
888              Perl_runops      # first loop
889              S_run_body
890              perl_run
891              main
892
893          and the context and data stacks, as shown by "-Dstv", look like:
894
895              STACK 0: MAIN
896                CX 0: BLOCK  =>
897                CX 1: EVAL   => AV()  PV("A"\0)
898                retop=leave
899              STACK 1: MAGIC
900                CX 0: SUB    =>
901                retop=(null)
902                CX 1: EVAL   => *
903              retop=nextstate
904
905          The die pops the first "CxEVAL" off the context stack, sets
906          "PL_restartop" from it, does a JMPENV_JUMP(3), and control returns
907          to the top "docatch". This then starts another third-level runops
908          level, which executes the nextstate, pushmark and die ops on line 4.
909          At the point that the second "pp_die" is called, the C call stack
910          looks exactly like that above, even though we are no longer within
911          an inner eval; this is because of the optimization mentioned
912          earlier. However, the context stack now looks like this, i.e. with
913          the top CxEVAL popped:
914
915              STACK 0: MAIN
916                CX 0: BLOCK  =>
917                CX 1: EVAL   => AV()  PV("A"\0)
918                retop=leave
919              STACK 1: MAGIC
920                CX 0: SUB    =>
921                retop=(null)
922
923          The die on line 4 pops the context stack back down to the CxEVAL,
924          leaving it as:
925
926              STACK 0: MAIN
927                CX 0: BLOCK  =>
928
929          As usual, "PL_restartop" is extracted from the "CxEVAL", and a
930          JMPENV_JUMP(3) done, which pops the C stack back to the docatch:
931
932              S_docatch
933              Perl_pp_entertry
934              Perl_runops      # second loop
935              S_call_body
936              Perl_call_sv
937              Perl_pp_tie
938              Perl_runops      # first loop
939              S_run_body
940              perl_run
941              main
942
943          In this case, because the "JMPENV" level recorded in the "CxEVAL"
944          differs from the current one, "docatch" just does a JMPENV_JUMP(3)
945          and the C stack unwinds to:
946
947              perl_run
948              main
949
950          Because "PL_restartop" is non-null, "run_body" starts a new runops
951          loop and execution continues.
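
          The guard-and-re-throw pattern can be tried out in isolation with
          plain "setjmp()"/"longjmp()". The sketch below is an assumption-
          laden toy, not Perl's implementation: "ToyEnv", "toy_guard",
          "toy_throw" and the other "toy_" names are hypothetical, and an
          integer level stands in for the "JMPENV" comparison described
          above.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Toy jump environment; hypothetical, inspired by the idea behind
   JMPENV_PUSH and JMPENV_JUMP rather than by their definitions. */
typedef struct ToyEnv {
    jmp_buf buf;
    struct ToyEnv *prev;
} ToyEnv;

ToyEnv *toy_top_env;     /* innermost active guard */
int toy_target_level;    /* guard level the exception is meant for */
int toy_code;            /* code carried by the exception */
int toy_trace;           /* set only by statements that should be skipped */

/* Jump to the innermost guard, abandoning the C stack above it. */
void toy_jump(void)
{
    assert(toy_top_env != NULL);
    longjmp(toy_top_env->buf, 1);
}

/* Raise an exception with `code`, addressed to guard `level`. */
void toy_throw(int level, int code)
{
    toy_target_level = level;
    toy_code = code;
    toy_jump();
}

/* Run `body` under a guard. Returns 0 on normal completion, or the
   exception code if an exception addressed to `level` arrived. */
int toy_guard(int level, void (*body)(void))
{
    ToyEnv env;

    env.prev = toy_top_env;
    toy_top_env = &env;              /* push this guard */
    if (setjmp(env.buf) == 0) {
        body();
        toy_top_env = env.prev;      /* pop on normal exit */
        return 0;
    }
    toy_top_env = env.prev;          /* pop after a jump */
    if (toy_target_level != level)
        toy_jump();                  /* not ours: re-throw outwards */
    return toy_code;
}

void toy_inner_body(void)
{
    toy_throw(1, 3);                 /* meant for the outer guard */
    toy_trace |= 1;                  /* never reached */
}

void toy_outer_body(void)
{
    (void)toy_guard(2, toy_inner_body);
    toy_trace |= 2;                  /* skipped by the re-throw */
}

/* Two nested guards; the exception passes through the inner one. */
int toy_demo(void)
{
    return toy_guard(1, toy_outer_body);
}
```

          In "toy_demo" the inner guard receives the jump first, notices that
          the exception is addressed to a different level, and jumps again,
          so both nested C frames are popped before the outer guard returns
          the code.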
952
953       Internal Variable Types
954
955       You should by now have had a look at perlguts, which tells you about
956       Perl's internal variable types: SVs, HVs, AVs and the rest. If not, do
957       that now.
958
959       These variables are used not only to represent Perl-space variables,
960       but also any constants in the code, as well as some structures com‐
961       pletely internal to Perl. The symbol table, for instance, is an ordi‐
962       nary Perl hash. Your code is represented by an SV as it's read into the
963       parser; any program files you call are opened via ordinary Perl file‐
964       handles, and so on.
965
966       The core Devel::Peek module lets us examine SVs from a Perl program.
967       Let's see, for instance, how Perl treats the constant "hello".
968
969             % perl -MDevel::Peek -e 'Dump("hello")'
970           1 SV = PV(0xa041450) at 0xa04ecbc
971           2   REFCNT = 1
972           3   FLAGS = (POK,READONLY,pPOK)
973           4   PV = 0xa0484e0 "hello"\0
974           5   CUR = 5
975           6   LEN = 6
976
977       Reading "Devel::Peek" output takes a bit of practise, so let's go
978       through it line by line.
979
980       Line 1 tells us we're looking at an SV which lives at 0xa04ecbc in mem‐
981       ory. SVs themselves are very simple structures, but they contain a
982       pointer to a more complex structure. In this case, it's a PV, a struc‐
983       ture which holds a string value, at location 0xa041450.  Line 2 is the
984       reference count; there are no other references to this data, so it's 1.
985
986       Line 3 shows the flags for this SV - it's OK to use it as a PV, it's a
987       read-only SV (because it's a constant) and the data is a PV internally.
988       Next we've got the contents of the string, starting at location
989       0xa0484e0.
990
991       Line 5 gives us the current length of the string - note that this does
992       not include the null terminator. Line 6 is not the length of the
993       string, but the length of the currently allocated buffer; as the string
994       grows, Perl automatically extends the available storage via a routine
995       called "SvGROW".
996
997       You can get at any of these quantities from C very easily; just add
998       "Sv" to the name of the field shown in the snippet, and you've got a
999       macro which will return the value: "SvCUR(sv)" returns the current
1000       length of the string, "SvREFCNT(sv)" returns the reference count,
1001       "SvPV(sv, len)" returns the string itself with its length, and so on.
1002       More macros to manipulate these properties can be found in perlguts.
1003
1004       Let's take an example of manipulating a PV, from "sv_catpvn", in sv.c:
1005
1006            1  void
1007            2  Perl_sv_catpvn(pTHX_ register SV *sv, register const char *ptr, register STRLEN len)
1008            3  {
1009            4      STRLEN tlen;
1010            5      char *junk;
1011
1012            6      junk = SvPV_force(sv, tlen);
1013            7      SvGROW(sv, tlen + len + 1);
1014            8      if (ptr == junk)
1015            9          ptr = SvPVX(sv);
1016           10      Move(ptr,SvPVX(sv)+tlen,len,char);
1017           11      SvCUR(sv) += len;
1018           12      *SvEND(sv) = '\0';
1019           13      (void)SvPOK_only_UTF8(sv);          /* validate pointer */
1020           14      SvTAINT(sv);
1021           15  }
1022
1023       This is a function which adds a string, "ptr", of length "len" onto the
1024       end of the PV stored in "sv". The first thing we do in line 6 is make
1025       sure that the SV has a valid PV, by calling the "SvPV_force" macro to
1026       force a PV. As a side effect, "tlen" gets set to the current length of
1027       the PV, and the PV itself is returned to "junk".
1028
1029       In line 7, we make sure that the SV will have enough room to accommo‐
1030       date the old string, the new string and the null terminator. If "LEN"
1031       isn't big enough, "SvGROW" will reallocate space for us.
1032
1033       Now, if "junk" is the same as the string we're trying to add, we can
1034       grab the string directly from the SV; "SvPVX" is the address of the PV
1035       in the SV.
1036
1037       Line 10 does the actual catenation: the "Move" macro moves a chunk of
1038       memory around: we move the string "ptr" to the end of the PV - that's
1039       the start of the PV plus its current length. We're moving "len" bytes
1040       of type "char". After doing so, we need to tell Perl we've extended the
1041       string, by altering "CUR" to reflect the new length. "SvEND" is a macro
1042       which gives us the end of the string, so that needs to be a "\0".
1043
1044       Line 13 manipulates the flags; since we've changed the PV, any IV or NV
1045       values will no longer be valid: if we have "$a=10; $a.="6";" we don't
1046       want to use the old IV of 10. "SvPOK_only_UTF8" is a special
1047       UTF-8-aware version of "SvPOK_only", a macro which turns off the IOK
1048       and NOK flags and turns on POK. The final "SvTAINT" is a macro which
1049       taints the SV if taint mode is on and tainted input was involved.
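
       The same grow-then-append pattern can be written in standalone C. What
       follows is a minimal sketch, assuming a bare struct in place of an SV:
       "ToyStr", "toy_grow" and "toy_catpvn" are hypothetical names, with
       "pv", "cur" and "len" playing the roles of the PV, CUR and LEN fields.
       Flags, UTF-8 and tainting are left out entirely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy string value; hypothetical, loosely mirroring PV, CUR and LEN. */
typedef struct {
    char *pv;     /* buffer holding the string */
    size_t cur;   /* current length, excluding the trailing '\0' */
    size_t len;   /* allocated size of the buffer */
} ToyStr;

/* Make sure the buffer holds at least `need` bytes. */
void toy_grow(ToyStr *s, size_t need)
{
    if (s->len < need) {
        char *p = realloc(s->pv, need);
        assert(p != NULL);  /* real code would report the failure */
        s->pv = p;
        s->len = need;
    }
}

/* Append `len` bytes from `ptr`, keeping the buffer '\0'-terminated. */
void toy_catpvn(ToyStr *s, const char *ptr, size_t len)
{
    int self = (ptr == s->pv);      /* appending the string to itself? */

    toy_grow(s, s->cur + len + 1);  /* old string, new string, '\0' */
    if (self)
        ptr = s->pv;                /* the buffer may have moved */
    memmove(s->pv + s->cur, ptr, len);
    s->cur += len;
    s->pv[s->cur] = '\0';
}
```

       Starting from an empty "ToyStr" (all fields zero), repeated calls build
       up the string while "cur" tracks its length and "len" the allocation.
       The self-append check is made before growing, because the old buffer
       address is no longer meaningful once the buffer has been reallocated.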
1050
1051       AVs and HVs are more complicated, but SVs are by far the most common
1052       variable type being thrown around. Having seen something of how we
1053       manipulate these, let's go on and look at how the op tree is con‐
1054       structed.
1055
1056       Op Trees
1057
1058       First, what is the op tree, anyway? The op tree is the parsed represen‐
1059       tation of your program, as we saw in our section on parsing, and it's
1060       the sequence of operations that Perl goes through to execute your pro‐
1061       gram, as we saw in "Running".
1062
1063       An op is a fundamental operation that Perl can perform: all the built-
1064       in functions and operators are ops, and there is a series of ops which
1065       deal with concepts the interpreter needs internally - entering and
1066       leaving a block, ending a statement, fetching a variable, and so on.
1067
1068       The op tree is connected in two ways: you can imagine that there are
1069       two "routes" through it, two orders in which you can traverse the tree.
1070       First, parse order reflects how the parser understood the code, and
1071       secondly, execution order tells perl what order to perform the opera‐
1072       tions in.
1073
1074       The easiest way to examine the op tree is to stop Perl after it has
1075       finished parsing, and get it to dump out the tree. This is exactly what
1076       the compiler backends B::Terse, B::Concise and B::Debug do.
1077
1078       Let's have a look at how Perl sees "$a = $b + $c":
1079
1080            % perl -MO=Terse -e '$a=$b+$c'
1081            1  LISTOP (0x8179888) leave
1082            2      OP (0x81798b0) enter
1083            3      COP (0x8179850) nextstate
1084            4      BINOP (0x8179828) sassign
1085            5          BINOP (0x8179800) add [1]
1086            6              UNOP (0x81796e0) null [15]
1087            7                  SVOP (0x80fafe0) gvsv  GV (0x80fa4cc) *b
1088            8              UNOP (0x81797e0) null [15]
1089            9                  SVOP (0x8179700) gvsv  GV (0x80efeb0) *c
1090           10          UNOP (0x816b4f0) null [15]
1091           11              SVOP (0x816dcf0) gvsv  GV (0x80fa460) *a
1092
1093       Let's start in the middle, at line 4. This is a BINOP, a binary opera‐
1094       tor, which is at location 0x8179828. The specific operator in question
1095       is "sassign" - scalar assignment - and you can find the code which
1096       implements it in the function "pp_sassign" in pp_hot.c. As a binary
1097       operator, it has two children: the add operator, providing the result
1098       of "$b+$c", is uppermost on line 5, and the left hand side is on line
1099       10.
1100
1101       Line 10 is the null op: this does exactly nothing. What is that doing
1102       there? If you see the null op, it's a sign that something has been
1103       optimized away after parsing. As we mentioned in "Optimization", the
1104       optimization stage sometimes converts two operations into one, for
1105       example when fetching a scalar variable. When this happens, instead of
1106       rewriting the op tree and cleaning up the dangling pointers, it's eas‐
1107       ier just to replace the redundant operation with the null op. Origi‐
1108       nally, the tree would have looked like this:
1109
1110           10          UNOP (0x816b4f0) rv2sv [15]
1111           11              SVOP (0x816dcf0) gv  GV (0x80fa460) *a
1112
1113       That is, fetch the "a" entry from the main symbol table, and then look
1114       at the scalar component of it: "gvsv" ("pp_gvsv" in pp_hot.c) happens
1115       to do both these things.
1116
1117       The right hand side, starting at line 5 is similar to what we've just
1118       seen: we have the "add" op ("pp_add" also in pp_hot.c) add together two
1119       "gvsv"s.
1120
1121       Now, what's this about?
1122
1123            1  LISTOP (0x8179888) leave
1124            2      OP (0x81798b0) enter
1125            3      COP (0x8179850) nextstate
1126
1127       "enter" and "leave" are scoping ops, and their job is to perform any
1128       housekeeping every time you enter and leave a block: lexical variables
1129       are tidied up, unreferenced variables are destroyed, and so on. Every
1130       program will have those first three lines: "leave" is a list, and its
1131       children are all the statements in the block. Statements are delimited
1132       by "nextstate", so a block is a collection of "nextstate" ops, with the
1133       ops to be performed for each statement being the children of
1134       "nextstate". "enter" is a single op which functions as a marker.
1135
1136       That's how Perl parsed the program, from top to bottom:
1137
1138                               Program
1139                                  |
1140                              Statement
1141                                  |
1142                                  =
1143                                 / \
1144                                /   \
1145                               $a   +
1146                                   / \
1147                                 $b   $c
1148
1149       However, it's impossible to perform the operations in this order: you
1150       have to find the values of $b and $c before you add them together, for
1151       instance. So, the other thread that runs through the op tree is the
1152       execution order: each op has a field "op_next" which points to the next
1153       op to be run, so following these pointers tells us how perl executes
1154       the code. We can traverse the tree in this order using the "exec"
1155       option to "B::Terse":
1156
1157            % perl -MO=Terse,exec -e '$a=$b+$c'
1158            1  OP (0x8179928) enter
1159            2  COP (0x81798c8) nextstate
1160            3  SVOP (0x81796c8) gvsv  GV (0x80fa4d4) *b
1161            4  SVOP (0x8179798) gvsv  GV (0x80efeb0) *c
1162            5  BINOP (0x8179878) add [1]
1163            6  SVOP (0x816dd38) gvsv  GV (0x80fa468) *a
1164            7  BINOP (0x81798a0) sassign
1165            8  LISTOP (0x8179900) leave
1166
1167       This probably makes more sense for a human: enter a block, start a
1168       statement. Get the values of $b and $c, and add them together.  Find
1169       $a, and assign one to the other. Then leave.
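
       The two routes through the tree can be modelled with a toy op that
       carries both kinds of pointer. This is a sketch under simplifying
       assumptions, not the code in op.c: "ToyOp", "toy_thread" and
       "toy_start" are hypothetical, and execution order is taken to be a
       plain post-order walk (children first, then the parent), which is what
       the dump above shows for this statement.

```c
#include <assert.h>
#include <stddef.h>

/* Toy op with both kinds of link; hypothetical, not Perl's struct op. */
typedef struct ToyOp {
    const char *name;
    struct ToyOp *first;    /* first child (parse order) */
    struct ToyOp *sibling;  /* next child of the same parent */
    struct ToyOp *next;     /* next op to run (execution order) */
} ToyOp;

/* Thread the subtree rooted at `op` in post-order: children first,
   then the op itself. `tail` is the last op threaded so far, or NULL.
   Returns the new tail. */
ToyOp *toy_thread(ToyOp *op, ToyOp *tail)
{
    ToyOp *kid;

    assert(op != NULL);
    for (kid = op->first; kid != NULL; kid = kid->sibling)
        tail = toy_thread(kid, tail);
    if (tail != NULL)
        tail->next = op;    /* run this op after the previous one */
    op->next = NULL;
    return op;
}

/* The first op to execute is the leftmost leaf. */
ToyOp *toy_start(ToyOp *root)
{
    assert(root != NULL);
    while (root->first != NULL)
        root = root->first;
    return root;
}
```

       Built for "$a = $b + $c", with "sassign" at the root and "add" as its
       first child, threading gives the order b, c, add, a, sassign: the same
       sequence as the "gvsv", "add" and "sassign" lines in the listing.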
1170
1171       The way Perl builds up these op trees in the parsing process can be
1172       unravelled by examining perly.y, the YACC grammar. Let's take the piece
1173       we need to construct the tree for "$a = $b + $c":
1174
1175           1 term    :   term ASSIGNOP term
1176           2                { $$ = newASSIGNOP(OPf_STACKED, $1, $2, $3); }
1177           3         |   term ADDOP term
1178           4                { $$ = newBINOP($2, 0, scalar($1), scalar($3)); }
1179
1180       If you're not used to reading BNF grammars, this is how it works:
1181       You're fed certain things by the tokeniser, which generally end up in
1182       upper case. Here, "ADDOP" is provided when the tokeniser sees "+" in
1183       your code. "ASSIGNOP" is provided when "=" is used for assigning. These
1184       are "terminal symbols", because you can't get any simpler than them.
1185
1186       The grammar, lines one and three of the snippet above, tells you how to
1187       build up more complex forms. These complex forms, "non-terminal
1188       symbols", are generally placed in lower case. "term" here is a
1189       non-terminal symbol, representing a single expression.
1190
1191       The grammar gives you the following rule: you can make the thing on the
1192       left of the colon if you see all the things on the right in sequence.
1193       This is called a "reduction", and the aim of parsing is to completely
1194       reduce the input. There are several different ways you can perform a
1195       reduction, separated by vertical bars: so, "term" followed by "=" fol‐
1196       lowed by "term" makes a "term", and "term" followed by "+" followed by
1197       "term" can also make a "term".
1198
1199       So, if you see two terms with an "=" or "+" between them, you can turn
1200       them into a single expression. When you do this, you execute the code
1201       in the block on the next line: if you see "=", you'll do the code in
1202       line 2. If you see "+", you'll do the code in line 4. It's this code
1203       which contributes to the op tree.
1204
1205                   |   term ADDOP term
1206                   { $$ = newBINOP($2, 0, scalar($1), scalar($3)); }
1207
1208       What this does is create a new binary op, and feed it a number of
1209       variables. The variables refer to the tokens: $1 is the first token in
1210       the input, $2 the second, and so on - think regular expression backref‐
1211       erences. $$ is the op returned from this reduction. So, we call "new‐
1212       BINOP" to create a new binary operator. The first parameter to "new‐
1213       BINOP", a function in op.c, is the op type. It's an addition operator,
1214       so we want the type to be "ADDOP". We could specify this directly, but
1215       it's right there as the second token in the input, so we use $2. The
1216       second parameter is the op's flags: 0 means "nothing special". Then the
1217       things to add: the left and right hand side of our expression, in
1218       scalar context.
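
       To make the shape of that reduction concrete, here is a toy op-tree
       builder in plain C. It is only a sketch: the toy_* names and the two
       op types are invented for this example, and the real "newBINOP" in
       op.c does considerably more (context, flags, constant folding). The
       only thing borrowed is the argument order: type, flags, left, right.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical op types, invented for this sketch. */
enum toy_optype { TOY_CONST, TOY_ADD };

/* A toy op: a type, flags, and up to two children. */
struct toy_op {
    enum toy_optype type;
    int             flags;
    double          value;          /* only meaningful for TOY_CONST */
    struct toy_op  *first, *last;   /* left and right operands */
};

/* Leaf node, standing in for a "term" the parser has already reduced. */
struct toy_op *toy_newconst(double value)
{
    struct toy_op *o = calloc(1, sizeof *o);
    assert(o != NULL);
    o->type  = TOY_CONST;
    o->value = value;
    return o;
}

/* Same argument order as newBINOP(type, flags, left, right). */
struct toy_op *toy_newbinop(enum toy_optype type, int flags,
                            struct toy_op *left, struct toy_op *right)
{
    struct toy_op *o = calloc(1, sizeof *o);
    assert(o != NULL);
    o->type  = type;
    o->flags = flags;
    o->first = left;
    o->last  = right;
    return o;
}

/* Evaluate the tree; handles only the two node kinds defined above. */
double toy_eval(const struct toy_op *o)
{
    if (o->type == TOY_ADD)
        return toy_eval(o->first) + toy_eval(o->last);
    return o->value;
}

/* Free a tree built with the constructors above. */
void toy_free(struct toy_op *o)
{
    if (o == NULL)
        return;
    toy_free(o->first);
    toy_free(o->last);
    free(o);
}
```

       In grammar terms, the action for "term ADDOP term" would correspond
       to toy_newbinop(TOY_ADD, 0, left, right), with the two children being
       whatever the earlier reductions produced for $1 and $3.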
1219
1220       Stacks
1221
1222       When perl executes something like "addop", how does it pass on its
1223       results to the next op? The answer is, through the use of stacks. Perl
1224       has a number of stacks to store things it's currently working on, and
1225       we'll look at the three most important ones here.
1226
1227       Argument stack
1228          Arguments are passed to PP code and returned from PP code using the
1229          argument stack, "ST". The typical way to handle arguments is to pop
1230          them off the stack, deal with them how you wish, and then push the
1231          result back onto the stack. This is how, for instance, the cosine
1232          operator works:
1233
1234                NV value;
1235                value = POPn;
1236                value = Perl_cos(value);
1237                XPUSHn(value);
1238
1239          We'll see a more tricky example of this when we consider Perl's
1240          macros below. "POPn" gives you the NV (floating point value) of the
1241          top SV on the stack: the $x in "cos($x)". Then we compute the
1242          cosine, and push the result back as an NV. The "X" in "XPUSHn" means
1243          that the stack should be extended if necessary - it can't be neces‐
1244          sary here, because we know there's room for one more item on the
1245          stack, since we've just removed one! The "XPUSH*" macros at least
1246          guarantee safety.
1247
1248          Alternatively, you can fiddle with the stack directly: "SP" gives
1249          you the first element in your portion of the stack, and "TOP*" gives
1250          you the top SV/IV/NV/etc. on the stack. So, for instance, to do
1251          unary negation of an integer:
1252
1253               SETi(-TOPi);
1254
1255          Just set the integer value of the top stack entry to its negation.
1256
1257          Argument stack manipulation in the core is exactly the same as it is
1258          in XSUBs - see perlxstut, perlxs and perlguts for a longer descrip‐
1259          tion of the macros used in stack manipulation.
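
          As a rough illustration of the two styles above (pop, compute,
          push versus changing the top entry in place), here is a toy stack
          in plain C. It is a sketch under loud assumptions: the stack holds
          doubles rather than SV pointers, every toy_* name is invented for
          this example, and halving stands in for cosine so the example
          needs nothing beyond the standard headers.

```c
#include <assert.h>

#define TOY_STACK_MAX 16

/* A toy argument stack of doubles; perl's real stack holds SV pointers. */
struct toy_stack {
    double slot[TOY_STACK_MAX];
    int    sp;                 /* index of the top item, -1 when empty */
};

void toy_stack_init(struct toy_stack *st) { st->sp = -1; }

/* Push after checking for room: the job the "X" does in XPUSHn. */
void toy_xpush(struct toy_stack *st, double v)
{
    assert(st->sp + 1 < TOY_STACK_MAX);
    st->slot[++st->sp] = v;
}

/* Remove and return the top item, roughly what POPn does for an NV. */
double toy_pop(struct toy_stack *st)
{
    assert(st->sp >= 0);
    return st->slot[st->sp--];
}

/* Pop, compute, push: the same shape as the cosine snippet. */
void toy_pp_halve(struct toy_stack *st)
{
    double value = toy_pop(st);
    toy_xpush(st, value / 2.0);
}

/* Update the top item in place: the same shape as SETi(-TOPi). */
void toy_pp_negate(struct toy_stack *st)
{
    assert(st->sp >= 0);
    st->slot[st->sp] = -st->slot[st->sp];
}
```

          Note that toy_pp_negate never moves the stack pointer, which is
          why the in-place style needs no room check at all.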
1260
1261       Mark stack
1262          I say "your portion of the stack" above because PP code doesn't nec‐
1263          essarily get the whole stack to itself: if your function calls
1264          another function, you'll only want to expose the arguments aimed for
1265          the called function, and not (necessarily) let it get at your own
1266          data. The way we do this is to have a "virtual" bottom-of-stack,
1267          exposed to each function. The mark stack keeps bookmarks to loca‐
1268          tions in the argument stack usable by each function. For instance,
1269          when dealing with a tied variable, (internally, something with "P"
1270          magic) Perl has to call methods for accesses to the tied variables.
1271          However, we need to separate the arguments exposed to the method from
1272          the arguments exposed to the original function - the store or fetch
1273          or whatever it may be. Here's how the tied "push" is implemented;
1274          see "av_push" in av.c:
1275
1276               1  PUSHMARK(SP);
1277               2  EXTEND(SP,2);
1278               3  PUSHs(SvTIED_obj((SV*)av, mg));
1279               4  PUSHs(val);
1280               5  PUTBACK;
1281               6  ENTER;
1282               7  call_method("PUSH", G_SCALAR|G_DISCARD);
1283               8  LEAVE;
1284               9  POPSTACK;
1285
1286          The lines which concern the mark stack are the first, fifth and last
1287          lines: they save away, restore and remove the current position of
1288          the argument stack.
1289
1290          Let's examine the whole implementation, for practice:
1291
1292               1  PUSHMARK(SP);
1293
1294          Push the current state of the stack pointer onto the mark stack.
1295          This is so that when we've finished adding items to the argument
1296          stack, Perl knows how many things we've added recently.
1297
1298               2  EXTEND(SP,2);
1299               3  PUSHs(SvTIED_obj((SV*)av, mg));
1300               4  PUSHs(val);
1301
1302          We're going to add two more items onto the argument stack: when you
1303          have a tied array, the "PUSH" subroutine receives the object and the
1304          value to be pushed, and that's exactly what we have here - the tied
1305          object, retrieved with "SvTIED_obj", and the value, the SV "val".
1306
1307               5  PUTBACK;
1308
1309          Next we tell Perl to make the change to the global stack pointer:
1310          "dSP" only gave us a local copy, not a reference to the global.
1311
1312               6  ENTER;
1313               7  call_method("PUSH", G_SCALAR|G_DISCARD);
1314               8  LEAVE;
1315
1316          "ENTER" and "LEAVE" localise a block of code - they make sure that
1317          all variables are tidied up, everything that has been localised gets
1318          its previous value returned, and so on. Think of them as the "{" and
1319          "}" of a Perl block.
1320
1321          To actually do the magic method call, we have to call a subroutine
1322          in Perl space: "call_method" takes care of that, and it's described
1323          in perlcall. We call the "PUSH" method in scalar context, and we're
1324          going to discard its return value.
1325
1326               9  POPSTACK;
1327
1328          Finally, we remove the value we placed on the mark stack, since we
1329          don't need it any more.
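
          The bookkeeping can be sketched with two arrays in plain C. This
          is a toy model, not perl's implementation: the toy_* names are
          invented, the arguments are plain ints, and who removes the mark
          is simplified to follow the description above (the caller pops it
          at the end).

```c
#include <assert.h>

#define TOY_MAX 32

struct toy_interp {
    int args[TOY_MAX];   /* argument stack */
    int sp;              /* index of the top argument, -1 when empty */
    int marks[TOY_MAX];  /* mark stack: remembered values of sp */
    int mp;              /* index of the top mark, -1 when empty */
};

void toy_interp_init(struct toy_interp *in) { in->sp = -1; in->mp = -1; }

void toy_push(struct toy_interp *in, int v)
{
    assert(in->sp + 1 < TOY_MAX);
    in->args[++in->sp] = v;
}

/* Remember where the callee's arguments begin (compare PUSHMARK). */
void toy_pushmark(struct toy_interp *in)
{
    assert(in->mp + 1 < TOY_MAX);
    in->marks[++in->mp] = in->sp;
}

/* Drop the bookmark once the call is over. */
void toy_popmark(struct toy_interp *in)
{
    assert(in->mp >= 0);
    in->mp--;
}

/* A callee that sums only the items above the current mark and leaves
 * the single result in their place. Anything below the mark belongs to
 * the caller and is never touched. */
void toy_call_sum(struct toy_interp *in)
{
    int mark, items, total = 0;

    assert(in->mp >= 0);
    mark  = in->marks[in->mp];
    items = in->sp - mark;          /* how many things were added */
    while (items-- > 0)
        total += in->args[in->sp--];
    toy_push(in, total);
}
```

          The callee works out its argument count purely from the distance
          between the stack pointer and the mark, so the caller's own data
          further down the stack stays out of reach.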
1330
1331       Save stack
1332          C doesn't have a concept of local scope, so perl provides one. We've
1333          seen that "ENTER" and "LEAVE" are used as scoping braces; the save
1334          stack implements the C equivalent of, for example:
1335
1336              {
1337                  local $foo = 42;
1338                  ...
1339              }
1340
1341          See "Localising Changes" in perlguts for how to use the save stack.
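
          The idea behind that can be sketched in a few lines of plain C.
          This is a toy under stated assumptions, not perl's save stack:
          the toy_* names are invented, only ints can be saved, and scope
          entry is reduced to remembering a depth to unwind to.

```c
#include <assert.h>

#define TOY_SAVE_MAX 32

/* One saved value: where it lives and what it used to be. */
struct toy_saved { int *addr; int old; };

struct toy_savestack {
    struct toy_saved entry[TOY_SAVE_MAX];
    int top;                        /* number of entries in use */
};

void toy_savestack_init(struct toy_savestack *ss) { ss->top = 0; }

/* Open a scope: note the depth that toy_leave should unwind to. */
int toy_enter(const struct toy_savestack *ss) { return ss->top; }

/* Record the current value of *addr so that it can be put back later;
 * the caller is then free to overwrite it, as "local $foo = 42" does. */
void toy_save_int(struct toy_savestack *ss, int *addr)
{
    assert(ss->top < TOY_SAVE_MAX);
    ss->entry[ss->top].addr = addr;
    ss->entry[ss->top].old  = *addr;
    ss->top++;
}

/* Close a scope: restore, newest first, everything saved since the
 * matching toy_enter. */
void toy_leave(struct toy_savestack *ss, int base)
{
    while (ss->top > base) {
        ss->top--;
        *ss->entry[ss->top].addr = ss->entry[ss->top].old;
    }
}
```

          Restoring in reverse order matters when the same variable is
          saved twice in one scope: the oldest value is the one that wins.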
1342
1343       Millions of Macros
1344
1345       One thing you'll notice about the Perl source is that it's full of
1346       macros. Some have called the pervasive use of macros the hardest thing
1347       to understand, others find it adds to clarity. Let's take an example,
1348       the code which implements the addition operator:
1349
1350          1  PP(pp_add)
1351          2  {
1352          3      dSP; dATARGET; tryAMAGICbin(add,opASSIGN);
1353          4      {
1354          5        dPOPTOPnnrl_ul;
1355          6        SETn( left + right );
1356          7        RETURN;
1357          8      }
1358          9  }
1359
1360       Every line here (apart from the braces, of course) contains a macro.
1361       The first line sets up the function declaration as Perl expects for PP
1362       code; line 3 sets up variable declarations for the argument stack and
1363       the target, the return value of the operation. Finally, it tries to see
1364       if the addition operation is overloaded; if so, the appropriate subrou‐
1365       tine is called.
1366
1367       Line 5 is another variable declaration - all variable declarations
1368       start with "d" - which pops from the top of the argument stack two NVs
1369       (hence "nn") and puts them into the variables "right" and "left", hence
1370       the "rl". These are the two operands to the addition operator. Next, we
1371       call "SETn" to set the NV of the return value to the result of adding
1372       the two values. This done, we return - the "RETURN" macro makes sure
1373       that our return value is properly handled, and we pass the next opera‐
1374       tor to run back to the main run loop.
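
       Stripped of the macros, the data movement in lines 5 and 6 looks
       roughly like the following plain C. It is a sketch only: the toy_*
       names are invented, the stack holds doubles instead of SVs, and
       overloading, the target and the "USE_LEFT" check are left out.

```c
#include <assert.h>

#define TOY_MAX 16

struct toy_stack {
    double slot[TOY_MAX];
    int    sp;                      /* index of the top item */
};

/* Toy counterpart of the body of pp_add. */
void toy_pp_add(struct toy_stack *st)
{
    double right, left;

    assert(st->sp >= 1);                /* two operands are required */
    right = st->slot[st->sp--];         /* read and remove, like POPn */
    left  = st->slot[st->sp];           /* read but keep, like TOPn */
    st->slot[st->sp] = left + right;    /* overwrite the top, like SETn */
}
```

       Two operands go in and one result comes out, so the stack ends up
       exactly one item shorter without any explicit push.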
1375
1376       Most of these macros are explained in perlapi, and some of the more
1377       important ones are explained in perlxs as well. Pay special attention
1378       to "Background and PERL_IMPLICIT_CONTEXT" in perlguts for information
1379       on the "[pad]THX_?" macros.
1380
1381       The .i Targets
1382
1383       You can expand the macros in a foo.c file by saying
1384
1385           make foo.i
1386
1387       which will expand the macros using cpp.  Don't be scared by the
1388       results.
1389
1390       Poking at Perl
1391
1392       To really poke around with Perl, you'll probably want to build Perl for
1393       debugging, like this:
1394
1395           ./Configure -d -D optimize=-g
1396           make
1397
1398       "-g" is a flag to the C compiler to have it produce debugging informa‐
1399       tion which will allow us to step through a running program.  Configure
1400       will also turn on the "DEBUGGING" compilation symbol which enables all
1401       the internal debugging code in Perl. There are a whole bunch of things
1402       you can debug with this: perlrun lists them all, and the best way to
1403       find out about them is to play about with them. The most useful options
1404       are probably
1405
1406           l  Context (loop) stack processing
1407           t  Trace execution
1408           o  Method and overloading resolution
1409           c  String/numeric conversions
1410
1411       Some of the functionality of the debugging code can be achieved using
1412       XS modules.
1413
1414           -Dr => use re 'debug'
1415           -Dx => use O 'Debug'
1416
1417       Using a source-level debugger
1418
1419       If the debugging output of "-D" doesn't help you, it's time to step
1420       through perl's execution with a source-level debugger.
1421
1422       ·  We'll use "gdb" for our examples here; the principles will apply to
1423          any debugger, but check the manual of the one you're using.
1424
1425       To fire up the debugger, type
1426
1427           gdb ./perl
1428
1429       You'll want to do that in your Perl source tree so the debugger can
1430       read the source code. You should see the copyright message, followed by
1431       the prompt.
1432
1433           (gdb)
1434
1435       "help" will get you into the documentation, but here are the most use‐
1436       ful commands:
1437
1438       run [args]
1439          Run the program with the given arguments.
1440
1441       break function_name
1442       break source.c:xxx
1443          Tells the debugger that we'll want to pause execution when we reach
1444          either the named function (but see "Internal Functions" in
1445          perlguts!) or the given line in the named source file.
1446
1447       step
1448          Steps through the program a line at a time.
1449
1450       next
1451          Steps through the program a line at a time, without descending into
1452          functions.
1453
1454       continue
1455          Run until the next breakpoint.
1456
1457       finish
1458          Run until the end of the current function, then stop again.
1459
1460       'enter'
1461          Just pressing Enter will do the most recent operation again - it's a
1462          blessing when stepping through miles of source code.
1463
1464       print
1465          Execute the given C code and print its results. WARNING: Perl makes
1466          heavy use of macros, and gdb does not necessarily support macros
1467          (see later "gdb macro support").  You'll have to substitute them
1468          yourself, or invoke cpp on the source code files (see "The .i
1469          Targets"). So, for instance, you can't say
1470
1471              print SvPV_nolen(sv)
1472
1473          but you have to say
1474
1475              print Perl_sv_2pv_nolen(sv)
1476
1477       You may find it helpful to have a "macro dictionary", which you can
1478       produce by saying "cpp -dM perl.c | sort". Even then, cpp won't
1479       recursively apply those macros for you.
1480
1481       gdb macro support
1482
1483       Recent versions of gdb have fairly good macro support, but in order to
1484       use it you'll need to compile perl with macro definitions included in
1485       the debugging information.  Using gcc version 3.1, this means configur‐
1486       ing with "-Doptimize=-g3".  Other compilers might use a different
1487       switch (if they support debugging macros at all).
1488
1489       Dumping Perl Data Structures
1490
1491       One way to get around this macro hell is to use the dumping functions
1492       in dump.c; these work a little like an internal Devel::Peek, but they
1493       also cover OPs and other structures that you can't get at from Perl.
1494       Let's take an example. We'll use the "$a = $b + $c" we used before, but
1495       give it a bit of context: "$b = "6XXXX"; $c = 2.3;". Where's a good
1496       place to stop and poke around?
1497
1498       What about "pp_add", the function we examined earlier to implement the
1499       "+" operator:
1500
1501           (gdb) break Perl_pp_add
1502           Breakpoint 1 at 0x46249f: file pp_hot.c, line 309.
1503
1504       Notice we use "Perl_pp_add" and not "pp_add" - see "Internal Functions"
1505       in perlguts.  With the breakpoint in place, we can run our program:
1506
1507           (gdb) run -e '$b = "6XXXX"; $c = 2.3; $a = $b + $c'
1508
1509       Lots of junk will go past as gdb reads in the relevant source files and
1510       libraries, and then:
1511
1512           Breakpoint 1, Perl_pp_add () at pp_hot.c:309
1513           309         dSP; dATARGET; tryAMAGICbin(add,opASSIGN);
1514           (gdb) step
1515           311           dPOPTOPnnrl_ul;
1516           (gdb)
1517
1518       We looked at this bit of code before, and we said that "dPOPTOPnnrl_ul"
1519       arranges for two "NV"s to be placed into "left" and "right" - let's
1520       slightly expand it:
1521
1522           #define dPOPTOPnnrl_ul  NV right = POPn; \
1523                                   SV *leftsv = TOPs; \
1524                                   NV left = USE_LEFT(leftsv) ? SvNV(leftsv) : 0.0
1525
1526       "POPn" takes the SV from the top of the stack and obtains its NV either
1527       directly (if "SvNOK" is set) or by calling the "sv_2nv" function.
1528       "TOPs" takes the next SV from the top of the stack - yes, "POPn" uses
1529       "TOPs" - but doesn't remove it. We then use "SvNV" to get the NV from
1530       "leftsv" in the same way as before - yes, "POPn" uses "SvNV".
1531
1532       Since we don't have an NV for $b, we'll have to use "sv_2nv" to convert
1533       it. If we step again, we'll find ourselves there:
1534
1535           Perl_sv_2nv (sv=0xa0675d0) at sv.c:1669
1536           1669        if (!sv)
1537           (gdb)
1538
1539       We can now use "Perl_sv_dump" to investigate the SV:
1540
1541           SV = PV(0xa057cc0) at 0xa0675d0
1542           REFCNT = 1
1543           FLAGS = (POK,pPOK)
1544           PV = 0xa06a510 "6XXXX"\0
1545           CUR = 5
1546           LEN = 6
1547           $1 = void
1548
1549       We know we're going to get 6 from this, so let's finish the subroutine:
1550
1551           (gdb) finish
1552           Run till exit from #0  Perl_sv_2nv (sv=0xa0675d0) at sv.c:1671
1553           0x462669 in Perl_pp_add () at pp_hot.c:311
1554           311           dPOPTOPnnrl_ul;
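
       Why 6? Numeric conversion uses the leading numeric part of the
       string and ignores the rest. As a loose analogy only (this is not a
       claim about how "sv_2nv" is implemented), the C library's strtod
       treats this particular input the same way. The function name below
       is invented for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Convert the leading numeric part of s to a double. On return,
 * *rest_len holds the number of trailing characters left unparsed. */
double toy_numeric_prefix(const char *s, size_t *rest_len)
{
    char  *end;
    double nv;

    assert(s != NULL && rest_len != NULL);
    nv = strtod(s, &end);       /* stops at the first non-numeric char */
    *rest_len = strlen(end);
    return nv;
}
```

       For "6XXXX" the conversion stops after the "6", leaving the four
       "X" characters unparsed.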
1555
1556       We can also dump out this op: the current op is always stored in
1557       "PL_op", and we can dump it with "Perl_op_dump". This'll give us simi‐
1558       lar output to B::Debug.
1559
1560           {
1561           13  TYPE = add  ===> 14
1562               TARG = 1
1563               FLAGS = (SCALAR,KIDS)
1564               {
1565                   TYPE = null  ===> (12)
1566                     (was rv2sv)
1567                   FLAGS = (SCALAR,KIDS)
1568                   {
1569           11          TYPE = gvsv  ===> 12
1570                       FLAGS = (SCALAR)
1571                       GV = main::b
1572                   }
1573               }
1574
1575       # finish this later #
1576
1577       Patching
1578
1579       All right, we've now had a look at how to navigate the Perl sources and
1580       some things you'll need to know when fiddling with them. Let's now get
1581       on and create a simple patch. Here's something Larry suggested: if a
1582       "U" is the first active format during a "pack", (for example, "pack
1583       "U3C8", @stuff") then the resulting string should be treated as UTF-8
1584       encoded.
1585
1586       How do we prepare to fix this up? First we locate the code in question
1587       - the "pack" happens at runtime, so it's going to be in one of the pp
1588       files. Sure enough, "pp_pack" is in pp.c. Since we're going to be
1589       altering this file, let's copy it to pp.c~.
1590
1591       [Well, it was in pp.c when this tutorial was written. It has now been
1592       split off with "pp_unpack" to its own file, pp_pack.c]
1593
1594       Now let's look over "pp_pack": we take a pattern into "pat", and then
1595       loop over the pattern, taking each format character in turn into
1596       "datumtype". Then for each possible format character, we swallow up
1597       the other arguments in the pattern (a field width, an asterisk, and so
1598       on) and convert the next chunk of input into the specified format, adding
1599       it onto the output SV "cat".
1600
1601       How do we know if the "U" is the first format in the "pat"? Well, if we
1602       have a pointer to the start of "pat" then, if we see a "U" we can test
1603       whether we're still at the start of the string. So, here's where "pat"
1604       is set up:
1605
1606           STRLEN fromlen;
1607           register char *pat = SvPVx(*++MARK, fromlen);
1608           register char *patend = pat + fromlen;
1609           register I32 len;
1610           I32 datumtype;
1611           SV *fromstr;
1612
1613       We'll have another string pointer in there:
1614
1615           STRLEN fromlen;
1616           register char *pat = SvPVx(*++MARK, fromlen);
1617           register char *patend = pat + fromlen;
1618        +  char *patcopy;
1619           register I32 len;
1620           I32 datumtype;
1621           SV *fromstr;
1622
1623       And just before we start the loop, we'll set "patcopy" to be the start
1624       of "pat":
1625
1626           items = SP - MARK;
1627           MARK++;
1628           sv_setpvn(cat, "", 0);
1629        +  patcopy = pat;
1630           while (pat < patend) {
1631
1632       Now if we see a "U" which was at the start of the string, we turn on
1633       the "UTF8" flag for the output SV, "cat":
1634
1635        +  if (datumtype == 'U' && pat==patcopy+1)
1636        +      SvUTF8_on(cat);
1637           if (datumtype == '#') {
1638               while (pat < patend && *pat != '\n')
1639                   pat++;
1640
1641       Remember that it has to be "patcopy+1" because the first character of
1642       the string is the "U" which has been swallowed into "datumtype"!
1643
1644       Oops, we forgot one thing: what if there are spaces at the start of the
1645       pattern? "pack("  U*", @stuff)" will have "U" as the first active char‐
1646       acter, even though it's not the first thing in the pattern. In this
1647       case, we have to advance "patcopy" along with "pat" when we see spaces:
1648
1649           if (isSPACE(datumtype))
1650               continue;
1651
1652       needs to become
1653
1654           if (isSPACE(datumtype)) {
1655               patcopy++;
1656               continue;
1657           }
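
       The pointer bookkeeping is easier to check in isolation. The
       following plain C function is a sketch of just that logic, not the
       real "pp_pack": the function name is invented, and the handling of
       repeat counts is reduced to skipping digits or a single "*".
       Comments, modifiers and the actual packing are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Return 1 if the first active (non-space) format in pat is 'U',
 * and 0 otherwise. */
int toy_first_active_is_U(const char *pat)
{
    const char *patend;
    const char *patcopy;
    int utf8 = 0;

    assert(pat != NULL);
    patend  = pat + strlen(pat);
    patcopy = pat;                  /* start of the active pattern */

    while (pat < patend) {
        /* Swallow one format character; pat now points just past it. */
        int datumtype = (unsigned char)*pat++;

        if (isspace(datumtype)) {
            patcopy++;              /* leading spaces move the start along */
            continue;
        }
        if (datumtype == 'U' && pat == patcopy + 1)
            utf8 = 1;

        /* Skip an optional repeat count: "*" or a run of digits. */
        if (pat < patend && *pat == '*')
            pat++;
        else
            while (pat < patend && isdigit((unsigned char)*pat))
                pat++;
    }
    return utf8;
}
```

       Because "patcopy" only advances on spaces, "pat == patcopy + 1" can
       hold only while no other format character has been consumed, which
       is exactly the "first active format" condition.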
1658
1659       OK. That's the C part done. Now we must do two additional things before
1660       this patch is ready to go: we've changed the behaviour of Perl, and so
1661       we must document that change. We must also provide some more regression
1662       tests to make sure our patch works and doesn't create a bug somewhere
1663       else along the line.
1664
1665       The regression tests for each operator live in t/op/, and so we make a
1666       copy of t/op/pack.t to t/op/pack.t~. Now we can add our tests to the
1667       end. First, we'll test that the "U" does indeed create Unicode strings.
1668
1669       t/op/pack.t has a sensible ok() function, but if it didn't we could use
1670       the one from t/test.pl.
1671
1672        require './test.pl';
1673        plan( tests => 159 );
1674
1675       so instead of this:
1676
1677        print 'not ' unless "1.20.300.4000" eq sprintf "%vd", pack("U*",1,20,300,4000);
1678        print "ok $test\n"; $test++;
1679
1680       we can write the more sensible version (see Test::More for a full
1681       explanation of is() and other testing functions):
1682
1683        is( "1.20.300.4000", sprintf("%vd", pack("U*",1,20,300,4000)),
1684                                              "U* produces unicode" );
1685
1686       Now we'll test that we got that space-at-the-beginning business right:
1687
1688        is( "1.20.300.4000", sprintf("%vd", pack("  U*",1,20,300,4000)),
1689                                              "  with spaces at the beginning" );
1690
1691       And finally we'll test that we don't make Unicode strings if "U" is not
1692       the first active format:
1693
1694        isnt( v1.20.300.4000, sprintf("%vd", pack("C0U*",1,20,300,4000)),
1695                                              "U* not first isn't unicode" );
1696
1697       Mustn't forget to change the number of tests which appears at the top,
1698       or else the automated tester will get confused.  This will either look
1699       like this:
1700
1701        print "1..156\n";
1702
1703       or this:
1704
1705        plan( tests => 156 );
1706
1707       We now compile up Perl, and run it through the test suite. Our new
1708       tests pass, hooray!
1709
1710       Finally, the documentation. The job is never done until the paperwork
1711       is over, so let's describe the change we've just made. The relevant
1712       place is pod/perlfunc.pod; again, we make a copy, and then we'll insert
1713       this text in the description of "pack":
1714
1715        =item *
1716
1717        If the pattern begins with a C<U>, the resulting string will be treated
1718        as UTF-8-encoded Unicode. You can force UTF-8 encoding on in a string
1719        with an initial C<U0>, and the bytes that follow will be interpreted as
1720        Unicode characters. If you don't want this to happen, you can begin your
1721        pattern with C<C0> (or anything else) to force Perl not to UTF-8 encode your
1722        string, and then follow this with a C<U*> somewhere in your pattern.
1723
1724       All done. Now let's create the patch. Porting/patching.pod tells us
1725       that if we're making major changes, we should copy the entire directory
1726       to somewhere safe before we begin fiddling, and then do
1727
1728           diff -ruN old new > patch
1729
1730       However, we know which files we've changed, and we can simply do this:
1731
1732           diff -u pp.c~             pp.c             >  patch
1733           diff -u t/op/pack.t~      t/op/pack.t      >> patch
1734           diff -u pod/perlfunc.pod~ pod/perlfunc.pod >> patch
1735
1736       We end up with a patch looking a little like this:
1737
1738           --- pp.c~       Fri Jun 02 04:34:10 2000
1739           +++ pp.c        Fri Jun 16 11:37:25 2000
1740           @@ -4375,6 +4375,7 @@
1741                register I32 items;
1742                STRLEN fromlen;
1743                register char *pat = SvPVx(*++MARK, fromlen);
1744           +    char *patcopy;
1745                register char *patend = pat + fromlen;
1746                register I32 len;
1747                I32 datumtype;
1748           @@ -4405,6 +4406,7 @@
1749           ...
1750
1751       And finally, we submit it, with our rationale, to perl5-porters. Job
1752       done!
1753
1754       Patching a core module
1755
1756       This works just like patching anything else, with an extra considera‐
1757       tion.  Many core modules also live on CPAN.  If this is so, patch the
1758       CPAN version instead of the core and send the patch off to the module
1759       maintainer (with a copy to p5p).  This will help the module maintainer
1760       keep the CPAN version in sync with the core version without constantly
1761       scanning p5p.
1762
1763       The list of maintainers of core modules is usefully documented in Port‐
1764       ing/Maintainers.pl.
1765
1766       Adding a new function to the core
1767
1768       If, as part of a patch to fix a bug, or just because you have an espe‐
1769       cially good idea, you decide to add a new function to the core, discuss
1770       your ideas on p5p well before you start work.  It may be that someone
1771       else has already attempted to do what you are considering and can give
1772       lots of good advice or even provide you with bits of code that they
1773       already started (but never finished).
1774
1775       You have to follow all of the advice given above for patching.  It is
1776       extremely important to test any addition thoroughly and add new tests
1777       to explore all boundary conditions that your new function is expected
1778       to handle.  If your new function is used only by one module (e.g.
1779       toke), then it should probably be named S_your_function (for static);
1780       on the other hand, if you expect it to be accessible from other functions
1781       in Perl, you should name it Perl_your_function.  See "Internal Func‐
1782       tions" in perlguts for more details.
1783
1784       The location of any new code is also an important consideration.  Don't
1785       just create a new top level .c file and put your code there; you would
1786       have to make changes to Configure (so the Makefile is created prop‐
1787       erly), as well as possibly lots of include files.  This is strictly
1788       pumpking business.
1789
1790       It is better to add your function to one of the existing top level
1791       source code files, but your choice is complicated by the nature of the
1792       Perl distribution.  Only the files that are marked as compiled static
1793       are located in the perl executable.  Everything else is located in the
1794       shared library (or DLL if you are running under WIN32).  So, for exam‐
1795       ple, if a function was only used by functions located in toke.c, then
1796       your code can go in toke.c.  If, however, you want to call the function
1797       from universal.c, then you should put your code in another location,
1798       for example util.c.
1799
1800       In addition to writing your C code, you will need to create an appro‐
1801       priate entry in embed.pl describing your function, then run 'make
1802       regen_headers' to create the entries in the numerous header files that
1803       perl needs to compile correctly.  See "Internal Functions" in perlguts
1804       for information on the various options that you can set in embed.pl.
1805       You will forget to do this a few (or many) times and you will get warn‐
1806       ings during the compilation phase.  Make sure that you mention this
1807       when you post your patch to P5P; the pumpking needs to know this.
1808
1809       When you write your new code, please be conscious of existing code con‐
1810       ventions used in the perl source files.  See perlstyle for details.
1811       Although most of the guidelines discussed seem to focus on Perl code,
1812       rather than C, they all apply (except when they don't ;).  See also the
1813       Porting/patching.pod file in the Perl source distribution for lots of
1814       details about both formatting and submitting patches of your changes.
1815
1816       Lastly, TEST TEST TEST TEST TEST any code before posting to p5p.  Test
1817       on as many platforms as you can find.  Test as many perl Configure
1818       options as you can (e.g. MULTIPLICITY).  If you have profiling or mem‐
1819       ory tools, see "EXTERNAL TOOLS FOR DEBUGGING PERL" below for how to use
1820       them to further test your code.  Remember that most of the people on
1821       P5P are doing this on their own time and don't have the time to debug
1822       your code.
1823
1824       Writing a test
1825
1826       Every module and built-in function has an associated test file (or
1827       should...).  If you add or change functionality, you have to write a
1828       test.  If you fix a bug, you have to write a test so that bug never
1829       comes back.  If you alter the docs, it would be nice to test what the
1830       new documentation says.
1831
1832       In short, if you submit a patch you probably also have to patch the
1833       tests.
1834
1835       For modules, the test file is right next to the module itself.
1836       lib/strict.t tests lib/strict.pm.  This is a recent innovation, so
1837       there are some snags (and it would be wonderful for you to brush them
1838       out), but it basically works that way.  Everything else lives in t/.
1839
1840       t/base/
1841          Testing of the absolute basic functionality of Perl.  Things like
1842          "if", basic file reads and writes, simple regexes, etc.  These are
1843          run first in the test suite and if any of them fail, something is
1844          really broken.
1845
1846       t/cmd/
1847          These test the basic control structures, "if/else", "while", subrou‐
1848          tines, etc.
1849
1850       t/comp/
1851          Tests basic issues of how Perl parses and compiles itself.
1852
1853       t/io/
1854          Tests for built-in IO functions, including command line arguments.
1855
1856       t/lib/
1857          The old home for the module tests; you shouldn't put anything new in
1858          here.  There are still some bits and pieces hanging around in here
1859          that need to be moved.  Perhaps you could move them?  Thanks!
1860
1861       t/op/
1862          Tests for perl's built in functions that don't fit into any of the
1863          other directories.
1864
1865       t/pod/
1866          Tests for POD directives.  There are still some tests for the Pod
1867          modules hanging around in here that need to be moved out into lib/.
1868
1869       t/run/
1870          Testing features of how perl actually runs, including exit codes and
1871          handling of PERL* environment variables.
1872
1873       t/uni/
1874          Tests for the core support of Unicode.
1875
1876       t/win32/
1877          Windows-specific tests.
1878
1879       t/x2p/
1880          A test suite for the s2p converter.
1881
1882       The core uses the same testing style as the rest of Perl, a simple
1883       "ok/not ok" run through Test::Harness, but there are a few special con‐
1884       siderations.
1885
1886       There are three ways to write a test in the core.  Test::More,
1887       t/test.pl and ad hoc "print $test ? "ok 42\n" : "not ok 42\n"".  The
1888       decision of which to use depends on what part of the test suite you're
1889       working on.  This is a measure to prevent a high-level failure (such as
1890       Config.pm breaking) from causing basic functionality tests to fail.
1891
1892       t/base t/comp
1893           Since we don't know if require works, or even subroutines, use ad
1894           hoc tests for these two.  Step carefully to avoid using the feature
1895           being tested.
1896
1897       t/cmd t/run t/io t/op
1898           Now that basic require() and subroutines are tested, you can use
1899           the t/test.pl library which emulates the important features of
1900           Test::More while using a minimum of core features.
1901
1902           You can also conditionally use certain libraries like Config, but
1903           be sure to skip the test gracefully if it's not there.
1904
1905       t/lib ext lib
1906           Now that the core of Perl is tested, Test::More can be used.  You
1907           can also use the full suite of core modules in the tests.
1908
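       As a rough illustration, the directory rule above can be written down
       as a small lookup.  This is only a sketch in shell: the function name
       suggest_test_style is made up for this example and nothing like it
       ships with Perl.  Directories that the rule doesn't mention are
       reported as "unknown".

```shell
#!/bin/sh
# Hypothetical helper (not part of the Perl source tree): given the path of
# a core test file, print the test-writing style recommended in the text.
suggest_test_style() {
    case "$1" in
        t/base/*|t/comp/*)
            # require() and even subroutines are not yet known to work
            echo "ad hoc" ;;
        t/cmd/*|t/run/*|t/io/*|t/op/*)
            # basic features are tested, so the minimal library is safe
            echo "t/test.pl" ;;
        t/lib/*|ext/*|lib/*)
            # the core is tested, so the full set of modules may be used
            echo "Test::More" ;;
        *)
            # the text gives no rule for any other directory
            echo "unknown" ;;
    esac
}
```

       For example, "suggest_test_style t/op/sprintf.t" prints t/test.pl.
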
       When you say "make test", Perl uses the t/TEST program to run the test
       suite (except under Win32, where it uses t/harness instead).  All tests
       are run from the t/ directory, not the directory which contains the
       test.  This causes some problems with the tests in lib/, so here's an
       opportunity for some patching.
1914
1915       You must be triply conscious of cross-platform concerns.  This usually
1916       boils down to using File::Spec and avoiding things like "fork()" and
1917       "system()" unless absolutely necessary.
1918
1919       Special Make Test Targets
1920
       There are various special make targets that can be used to test Perl
       slightly differently than the standard "test" target.  Not all of them
       are expected to give a 100% success rate.  Many of them have several
       aliases, and many of them are not available on certain operating
       systems.
1926
1927       coretest
1928           Run perl on all core tests (t/* and lib/[a-z]* pragma tests).
1929
1930           (Not available on Win32)
1931
1932       test.deparse
1933           Run all the tests through B::Deparse.  Not all tests will succeed.
1934
1935           (Not available on Win32)
1936
1937       test.taintwarn
1938           Run all tests with the -t command-line switch.  Not all tests are
1939           expected to succeed (until they're specifically fixed, of course).
1940
1941           (Not available on Win32)
1942
1943       minitest
1944           Run miniperl on t/base, t/comp, t/cmd, t/run, t/io, t/op, and t/uni
1945           tests.
1946
1947       test.valgrind check.valgrind utest.valgrind ucheck.valgrind
           (Only in Linux) Run all the tests using the memory leak + naughty
           memory access tool "valgrind".  The log files will be named
           testname.valgrind.
1951
1952       test.third check.third utest.third ucheck.third
1953           (Only in Tru64)  Run all the tests using the memory leak + naughty
1954           memory access tool "Third Degree".  The log files will be named
1955           perl.3log.testname.
1956
1957       test.torture torturetest
1958           Run all the usual tests and some extra tests.  As of Perl 5.8.0 the
1959           only extra tests are Abigail's JAPHs, t/japh/abigail.t.
1960
           You can also run the torture test with t/harness by giving the
           "-torture" argument to t/harness.
1963
1964       utest ucheck test.utf8 check.utf8
1965           Run all the tests with -Mutf8.  Not all tests will succeed.
1966
1967           (Not available on Win32)
1968
1969       minitest.utf16 test.utf16
1970           Runs the tests with UTF-16 encoded scripts, encoded with different
1971           versions of this encoding.
1972
1973           "make utest.utf16" runs the test suite with a combination of
1974           "-utf8" and "-utf16" arguments to t/TEST.
1975
1976           (Not available on Win32)
1977
1978       test_harness
           Run the test suite with the t/harness controlling program, instead
           of t/TEST. t/harness is more sophisticated, and uses the
           Test::Harness module, so using this test target supposes that perl
           mostly works. The main advantage for our purposes is that it prints
           a detailed summary of failed tests at the end. Also, unlike t/TEST,
           it doesn't redirect stderr to stdout.
1985
1986           Note that under Win32 t/harness is always used instead of t/TEST,
1987           so there is no special "test_harness" target.
1988
1989           Under Win32's "test" target you may use the TEST_SWITCHES and
1990           TEST_FILES environment variables to control the behaviour of t/har‐
1991           ness.  This means you can say
1992
1993               nmake test TEST_FILES="op/*.t"
1994               nmake test TEST_SWITCHES="-torture" TEST_FILES="op/*.t"
1995
1996       test-notty test_notty
1997           Sets PERL_SKIP_TTY_TEST to true before running normal test.
1998
1999       Running tests by hand
2000
       You can run part of the test suite by hand by using one of the
       following commands from the t/ directory:
2003
2004           ./perl -I../lib TEST list-of-.t-files
2005
2006       or
2007
2008           ./perl -I../lib harness list-of-.t-files
2009
       (If you don't specify test scripts, the whole test suite will be run.)
2011
2012       Using t/harness for testing
2013
2014       If you use "harness" for testing you have several command line options
2015       available to you. The arguments are as follows, and are in the order
2016       that they must appear if used together.
2017
2018           harness -v -torture -re=pattern LIST OF FILES TO TEST
2019           harness -v -torture -re LIST OF PATTERNS TO MATCH
2020
2021       If "LIST OF FILES TO TEST" is omitted the file list is obtained from
2022       the manifest. The file list may include shell wildcards which will be
2023       expanded out.
2024
       -v  Run the tests under verbose mode so you can see what tests were
           run, and debug output.
2027
2028       -torture
2029           Run the torture tests as well as the normal set.
2030
2031       -re=PATTERN
2032           Filter the file list so that all the test files run match PATTERN.
2033           Note that this form is distinct from the -re LIST OF PATTERNS form
2034           below in that it allows the file list to be provided as well.
2035
2036       -re LIST OF PATTERNS
           Filter the file list so that all the test files run match
           /(LIST|OF|PATTERNS)/. Note that with this form the patterns are
           joined by '|' and you cannot supply a list of files, instead the
           test files are obtained from the MANIFEST.
2041
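       The ordering requirement can be made concrete with a small wrapper
       that assembles the command line for you.  This is a sketch, not
       something in the distribution: harness_cmd and the H_VERBOSE,
       H_TORTURE and H_RE variables are invented here.  The function covers
       only the -re=PATTERN form, and it prints the command rather than
       running it.

```shell
#!/bin/sh
# Hypothetical wrapper: print a harness command line with the options in
# the order the text requires: -v, then -torture, then -re=PATTERN, then
# the list of test files.
harness_cmd() {
    cmd="./perl -I../lib harness"
    if [ -n "${H_VERBOSE:-}" ]; then cmd="$cmd -v"; fi
    if [ -n "${H_TORTURE:-}" ]; then cmd="$cmd -torture"; fi
    if [ -n "${H_RE:-}" ]; then cmd="$cmd -re=$H_RE"; fi
    # Any remaining arguments are the test files to run.
    for f in "$@"; do
        cmd="$cmd $f"
    done
    echo "$cmd"
}
```

       From the t/ directory, setting H_VERBOSE=1 and H_RE=sprintf and then
       calling "harness_cmd op/sprintf.t" prints a command you can paste
       into the shell.
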
       You can run an individual test by a command similar to

           ./perl -I../lib path/to/foo.t

       except that the harnesses set up some environment variables that may
       affect the execution of the test:
2048
2049       PERL_CORE=1
           indicates that we're running this test as part of the perl core
           test suite.  This is useful for modules that have a dual life on
           CPAN.
2052
2053       PERL_DESTRUCT_LEVEL=2
2054           is set to 2 if it isn't set already (see "PERL_DESTRUCT_LEVEL")
2055
2056       PERL
2057           (used only by t/TEST) if set, overrides the path to the perl exe‐
2058           cutable that should be used to run the tests (the default being
2059           ./perl).
2060
2061       PERL_SKIP_TTY_TEST
           if set, causes the tests that need a terminal to be skipped.  It's
           actually set automatically by the Makefile, but can also be forced
           artificially by running 'make test_notty'.
2065
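       If you run a single test by hand and want it to see roughly the same
       environment, something like the following sketch may help.  The
       function name single_test_cmd is hypothetical, and the function only
       prints a command line built from the variables described above; it
       does not run anything.  Honouring PERL here mirrors what t/TEST is
       described as doing.

```shell
#!/bin/sh
# Hypothetical helper: print the command that runs one test file directly
# while mimicking the environment variables the harnesses set up.
single_test_cmd() {
    # Keep an existing PERL_DESTRUCT_LEVEL; otherwise default to 2.
    level=${PERL_DESTRUCT_LEVEL:-2}
    # PERL, if set, overrides the interpreter path; the default is ./perl.
    perl=${PERL:-./perl}
    echo "env PERL_CORE=1 PERL_DESTRUCT_LEVEL=$level $perl -I../lib $1"
}
```

       From the t/ directory, "single_test_cmd op/sprintf.t" prints a
       command line that can be pasted into the shell.
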

EXTERNAL TOOLS FOR DEBUGGING PERL

2067       Sometimes it helps to use external tools while debugging and testing
2068       Perl.  This section tries to guide you through using some common test‐
2069       ing and debugging tools with Perl.  This is meant as a guide to inter‐
2070       facing these tools with Perl, not as any kind of guide to the use of
2071       the tools themselves.
2072
       NOTE 1: Running under memory debuggers such as Purify, valgrind, or
       Third Degree greatly slows down the execution: seconds become minutes,
       minutes become hours.  For example, as of Perl 5.8.1,
       ext/Encode/t/Unicode.t takes extraordinarily long to complete under
       e.g. Purify, Third Degree, and valgrind.  Under valgrind it takes more
       than six hours, even on a snappy computer; the test must be doing
       something that is quite unfriendly for memory debuggers.  If you don't
       feel like waiting, you can simply kill the perl process.
2081
       NOTE 2: To minimize the number of memory leak false alarms (see
       "PERL_DESTRUCT_LEVEL" for more information), you have to have the
       environment variable PERL_DESTRUCT_LEVEL set to 2.  The TEST and
       harness scripts do that automatically.  But if you are running some of
       the tests manually, you have to set it yourself.  For csh-like shells:
2087
2088           setenv PERL_DESTRUCT_LEVEL 2
2089
2090       and for Bourne-type shells:
2091
2092           PERL_DESTRUCT_LEVEL=2
2093           export PERL_DESTRUCT_LEVEL
2094
2095       or in UNIXy environments you can also use the "env" command:
2096
2097           env PERL_DESTRUCT_LEVEL=2 valgrind ./perl -Ilib ...
2098
       NOTE 3: There are known memory leaks when there are compile-time errors
       within eval or require; seeing "S_doeval" in the call stack is a good
       sign of these.  Fixing these leaks is non-trivial, unfortunately, but
       they must be fixed eventually.
2103
2104       Rational Software's Purify
2105
2106       Purify is a commercial tool that is helpful in identifying memory over‐
2107       runs, wild pointers, memory leaks and other such badness.  Perl must be
2108       compiled in a specific way for optimal testing with Purify.  Purify is
2109       available under Windows NT, Solaris, HP-UX, SGI, and Siemens Unix.
2110
2111       Purify on Unix
2112
2113       On Unix, Purify creates a new Perl binary.  To get the most benefit out
2114       of Purify, you should create the perl to Purify using:
2115
2116           sh Configure -Accflags=-DPURIFY -Doptimize='-g' \
2117            -Uusemymalloc -Dusemultiplicity
2118
2119       where these arguments mean:
2120
2121       -Accflags=-DPURIFY
2122           Disables Perl's arena memory allocation functions, as well as forc‐
2123           ing use of memory allocation functions derived from the system mal‐
2124           loc.
2125
2126       -Doptimize='-g'
2127           Adds debugging information so that you see the exact source state‐
2128           ments where the problem occurs.  Without this flag, all you will
2129           see is the source filename of where the error occurred.
2130
2131       -Uusemymalloc
2132           Disable Perl's malloc so that Purify can more closely monitor allo‐
2133           cations and leaks.  Using Perl's malloc will make Purify report
2134           most leaks in the "potential" leaks category.
2135
2136       -Dusemultiplicity
2137           Enabling the multiplicity option allows perl to clean up thoroughly
2138           when the interpreter shuts down, which reduces the number of bogus
2139           leak reports from Purify.
2140
2141       Once you've compiled a perl suitable for Purify'ing, then you can just:
2142
2143           make pureperl
2144
2145       which creates a binary named 'pureperl' that has been Purify'ed.  This
2146       binary is used in place of the standard 'perl' binary when you want to
2147       debug Perl memory problems.
2148
2149       As an example, to show any memory leaks produced during the standard
2150       Perl testset you would create and run the Purify'ed perl as:
2151
2152           make pureperl
2153           cd t
2154           ../pureperl -I../lib harness
2155
2156       which would run Perl on test.pl and report any memory problems.
2157
       Purify outputs messages in "Viewer" windows by default.  If you don't
       have a windowing environment or if you simply want the Purify output to
       unobtrusively go to a log file instead of to the interactive window,
       use the following options to output to the log file "perl.log":
2162
2163           setenv PURIFYOPTIONS "-chain-length=25 -windows=no \
2164            -log-file=perl.log -append-logfile=yes"
2165
2166       If you plan to use the "Viewer" windows, then you only need this
2167       option:
2168
2169           setenv PURIFYOPTIONS "-chain-length=25"
2170
2171       In Bourne-type shells:
2172
2173           PURIFYOPTIONS="..."
2174           export PURIFYOPTIONS
2175
2176       or if you have the "env" utility:
2177
2178           env PURIFYOPTIONS="..." ../pureperl ...
2179
2180       Purify on NT
2181
2182       Purify on Windows NT instruments the Perl binary 'perl.exe' on the fly.
2183       There are several options in the makefile you should change to get the
2184       most use out of Purify:
2185
2186       DEFINES
2187           You should add -DPURIFY to the DEFINES line so the DEFINES line
2188           looks something like:
2189
2190               DEFINES = -DWIN32 -D_CONSOLE -DNO_STRICT $(CRYPT_FLAG) -DPURIFY=1
2191
2192           to disable Perl's arena memory allocation functions, as well as to
2193           force use of memory allocation functions derived from the system
2194           malloc.
2195
2196       USE_MULTI = define
2197           Enabling the multiplicity option allows perl to clean up thoroughly
2198           when the interpreter shuts down, which reduces the number of bogus
2199           leak reports from Purify.
2200
2201       #PERL_MALLOC = define
2202           Disable Perl's malloc so that Purify can more closely monitor allo‐
2203           cations and leaks.  Using Perl's malloc will make Purify report
2204           most leaks in the "potential" leaks category.
2205
2206       CFG = Debug
2207           Adds debugging information so that you see the exact source state‐
2208           ments where the problem occurs.  Without this flag, all you will
2209           see is the source filename of where the error occurred.
2210
2211       As an example, to show any memory leaks produced during the standard
2212       Perl testset you would create and run Purify as:
2213
2214           cd win32
2215           make
2216           cd ../t
2217           purify ../perl -I../lib harness
2218
2219       which would instrument Perl in memory, run Perl on test.pl, then
2220       finally report any memory problems.
2221
2222       valgrind
2223
       The excellent valgrind tool can be used to find both memory leaks
       and illegal memory accesses.  As of August 2003 it unfortunately works
       only on x86 (ELF) Linux.  The special "test.valgrind" target can be
       used to run the tests under valgrind.  Found errors and memory leaks
       are logged in files named testname.valgrind.
2229
       As system libraries (most notably glibc) also trigger errors,
       valgrind allows such errors to be suppressed using suppression files.
       The default suppression file that comes with valgrind already catches
       a lot of them. Some additional suppressions are defined in t/perl.supp.
2234
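       After a run of the valgrind target you may want a quick list of the
       logs worth reading.  The sketch below assumes only that the log names
       end in ".valgrind", as described above.  Whether a clean test leaves
       an empty log, a log with just a banner, or no log at all isn't stated
       here, so the helper (whose name, valgrind_logs_with_output, is made
       up) merely filters out empty files.

```shell
#!/bin/sh
# Hypothetical helper: list the non-empty *.valgrind log files below a
# directory (default: t), one per line, in sorted order.
valgrind_logs_with_output() {
    dir=${1:-t}
    find "$dir" -type f -name '*.valgrind' | sort | while IFS= read -r log; do
        # -s is true only for files that exist and have a size above zero
        if [ -s "$log" ]; then
            echo "$log"
        fi
    done
}
```

       Run it from the top of the source tree as "valgrind_logs_with_output t".
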
2235       To get valgrind and for more information see
2236
2237           http://developer.kde.org/~sewardj/
2238
2239       Compaq's/Digital's/HP's Third Degree
2240
2241       Third Degree is a tool for memory leak detection and memory access
2242       checks.  It is one of the many tools in the ATOM toolkit.  The toolkit
2243       is only available on Tru64 (formerly known as Digital UNIX formerly
2244       known as DEC OSF/1).
2245
       When building Perl, you must first run Configure with the -Doptimize=-g
       and -Uusemymalloc flags; after that you can use the make targets
       "perl.third" and "test.third".  (What is required is that Perl must be
       compiled using the "-g" flag; you may need to re-Configure.)
2250
       The short story is that with "atom" you can instrument the Perl
       executable to create a new executable called perl.third.  When the
       instrumented executable is run, it creates a log of dubious memory
       traffic in a file called perl.3log.  See the manual pages of atom and
       third for more information.  The most extensive Third Degree
       documentation is available in the Compaq "Tru64 UNIX Programmer's
       Guide", chapter "Debugging Programs with Third Degree".
2258
       The "test.third" target leaves a lot of files named foo_bar.3log in the
       t/ subdirectory.  There is a problem with these files: Third Degree is
       so effective that it also finds problems in the system libraries.
       Therefore you should use the Porting/thirdclean script to clean up the
       *.3log files.

       There are also leaks that, for a certain definition of a leak,
       aren't.  See "PERL_DESTRUCT_LEVEL" for more information.
2267
2268       PERL_DESTRUCT_LEVEL
2269
       If you want to run any of the tests yourself manually using e.g.
       valgrind, or the pureperl or perl.third executables, please note that
       by default perl does not explicitly clean up all the memory it has
       allocated (such as global memory arenas) but instead lets the exit() of
       the whole program "take care" of such allocations, also known as
       "global destruction of objects".
2276
       There is a way to tell perl to do complete cleanup: set the environment
       variable PERL_DESTRUCT_LEVEL to a non-zero value.  The t/TEST wrapper
       does set this to 2, and this is what you need to do too, if you don't
       want to see the "global leaks".  For example, for "third-degreed" Perl:
2281
2282               env PERL_DESTRUCT_LEVEL=2 ./perl.third -Ilib t/foo/bar.t
2283
       (Note: the mod_perl apache module also uses this environment variable
       for its own purposes and extends its semantics. Refer to the mod_perl
       documentation for more information. Also, spawned threads do the
       equivalent of setting this variable to the value 1.)
2288
       If, at the end of a run, you get the message "N scalars leaked", you
       can recompile with "-DDEBUG_LEAKING_SCALARS", which will cause the
       addresses of all those leaked SVs to be dumped; it also converts
       "new_SV()" from a macro into a real function, so you can use your
       favourite debugger to discover where those pesky SVs were allocated.
2294
2295       Profiling
2296
       Depending on your platform there are various ways of profiling Perl.
2298
2299       There are two commonly used techniques of profiling executables: sta‐
2300       tistical time-sampling and basic-block counting.
2301
       The first method periodically takes samples of the CPU program counter,
       and since the program counter can be correlated with the code generated
       for functions, we get a statistical view of which functions the
       program is spending its time in.  The caveats are that very small/fast
       functions have a lower probability of showing up in the profile, and
       that periodically interrupting the program (this is usually done rather
       frequently, on the scale of milliseconds) imposes an additional
       overhead that may skew the results.  The first problem can be
       alleviated by running the code for longer (in general this is a good
       idea for profiling); the second problem is usually kept in check by the
       profiling tools themselves.
2313
       The second method divides up the generated code into basic blocks.
       Basic blocks are sections of code that are entered only at the
       beginning and exited only at the end.  For example, a conditional jump
       starts a basic block.  Basic block profiling usually works by
       instrumenting the code, adding "enter basic block #nnnn" book-keeping
       code to the generated code.  During the execution of the code the basic
       block counters are then updated appropriately.  The caveat is that the
       added extra code can skew the results: again, the profiling tools
       usually try to factor their own effects out of the results.
2323
2324       Gprof Profiling
2325
       gprof is a profiling tool available on many UNIX platforms; it uses
       statistical time-sampling.
2328
       You can build a profiled version of perl called "perl.gprof" by
       invoking the make target "perl.gprof".  (What is required is that Perl
       must be compiled using the "-pg" flag; you may need to re-Configure.)
       Running the profiled version of Perl will create an output file called
       gmon.out, which contains the profiling data collected during the
       execution.
2335
2336       The gprof tool can then display the collected data in various ways.
2337       Usually gprof understands the following options:
2338
2339       -a  Suppress statically defined functions from the profile.
2340
2341       -b  Suppress the verbose descriptions in the profile.
2342
2343       -e routine
2344           Exclude the given routine and its descendants from the profile.
2345
2346       -f routine
2347           Display only the given routine and its descendants in the profile.
2348
2349       -s  Generate a summary file called gmon.sum which then may be given to
2350           subsequent gprof runs to accumulate data over several runs.
2351
2352       -z  Display routines that have zero usage.
2353
2354       For more detailed explanation of the available commands and output for‐
2355       mats, see your own local documentation of gprof.
2356
2357       GCC gcov Profiling
2358
2359       Starting from GCC 3.0 basic block profiling is officially available for
2360       the GNU CC.
2361
       You can build a profiled version of perl called perl.gcov by invoking
       the make target "perl.gcov" (what is required is that Perl must be
       compiled using gcc with the flags "-fprofile-arcs -ftest-coverage"; you
       may need to re-Configure).
2366
2367       Running the profiled version of Perl will cause profile output to be
2368       generated.  For each source file an accompanying ".da" file will be
2369       created.
2370
2371       To display the results you use the "gcov" utility (which should be
2372       installed if you have gcc 3.0 or newer installed).  gcov is run on
2373       source code files, like this
2374
2375           gcov sv.c
2376
2377       which will cause sv.c.gcov to be created.  The .gcov files contain the
2378       source code annotated with relative frequencies of execution indicated
2379       by "#" markers.
2380
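       One quick use of the annotated files is to count the lines that were
       never run.  The sketch below assumes the output format of current gcov
       versions, where a never-executed line starts with a run of "#"
       characters ("#####") in the count column; older releases may differ.
       The function name count_unexecuted is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: count the lines that gcov marked as never executed.
# Assumes the marker is a run of at least five '#' characters as the first
# non-blank text on the line.
count_unexecuted() {
    grep -c '^[[:space:]]*#####' "$1"
}
```

       After "gcov sv.c", calling "count_unexecuted sv.c.gcov" prints the
       number of such lines in sv.c.gcov.
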
2381       Useful options of gcov include "-b" which will summarise the basic
2382       block, branch, and function call coverage, and "-c" which instead of
2383       relative frequencies will use the actual counts.  For more information
2384       on the use of gcov and basic block profiling with gcc, see the latest
2385       GNU CC manual, as of GCC 3.0 see
2386
2387           http://gcc.gnu.org/onlinedocs/gcc-3.0/gcc.html
2388
2389       and its section titled "8. gcov: a Test Coverage Program"
2390
2391           http://gcc.gnu.org/onlinedocs/gcc-3.0/gcc_8.html#SEC132
2392
2393       Pixie Profiling
2394
2395       Pixie is a profiling tool available on IRIX and Tru64 (aka Digital UNIX
2396       aka DEC OSF/1) platforms.  Pixie does its profiling using basic-block
2397       counting.
2398
       You can build a profiled version of perl called perl.pixie by invoking
       the make target "perl.pixie" (what is required is that Perl must be
       compiled using the "-g" flag; you may need to re-Configure).
2402
       In Tru64 a file called perl.Addrs will also be silently created; this
       file contains the addresses of the basic blocks.  Running the profiled
       version of Perl will create a new file called "perl.Counts" which
       contains the counts for the basic blocks for that particular program
       execution.
2408
2409       To display the results you use the prof utility.  The exact incantation
2410       depends on your operating system, "prof perl.Counts" in IRIX, and "prof
2411       -pixie -all -L. perl" in Tru64.
2412
2413       In IRIX the following prof options are available:
2414
2415       -h  Reports the most heavily used lines in descending order of use.
2416           Useful for finding the hotspot lines.
2417
2418       -l  Groups lines by procedure, with procedures sorted in descending
2419           order of use.  Within a procedure, lines are listed in source
2420           order.  Useful for finding the hotspots of procedures.
2421
2422       In Tru64 the following options are available:
2423
2424       -p[rocedures]
2425           Procedures sorted in descending order by the number of cycles exe‐
2426           cuted in each procedure.  Useful for finding the hotspot proce‐
2427           dures.  (This is the default option.)
2428
2429       -h[eavy]
2430           Lines sorted in descending order by the number of cycles executed
2431           in each line.  Useful for finding the hotspot lines.
2432
2433       -i[nvocations]
2434           The called procedures are sorted in descending order by number of
2435           calls made to the procedures.  Useful for finding the most used
2436           procedures.
2437
2438       -l[ines]
2439           Grouped by procedure, sorted by cycles executed per procedure.
2440           Useful for finding the hotspots of procedures.
2441
2442       -testcoverage
2443           The compiler emitted code for these lines, but the code was unexe‐
2444           cuted.
2445
2446       -z[ero]
2447           Unexecuted procedures.
2448
2449       For further information, see your system's manual pages for pixie and
2450       prof.
2451
2452       Miscellaneous tricks
2453
2454       ·   Those debugging perl with the DDD frontend over gdb may find the
2455           following useful:
2456
2457           You can extend the data conversion shortcuts menu, so for example
2458           you can display an SV's IV value with one click, without doing any
2459           typing.  To do that simply edit ~/.ddd/init file and add after:
2460
2461             ! Display shortcuts.
2462             Ddd*gdbDisplayShortcuts: \
2463             /t ()   // Convert to Bin\n\
2464             /d ()   // Convert to Dec\n\
2465             /x ()   // Convert to Hex\n\
2466             /o ()   // Convert to Oct(\n\
2467
2468           the following two lines:
2469
2470             ((XPV*) (())->sv_any )->xpv_pv  // 2pvx\n\
2471             ((XPVIV*) (())->sv_any )->xiv_iv // 2ivx
2472
           so now you can do ivx and pvx lookups, or you can plug in the
           sv_peek "conversion":
2475
2476             Perl_sv_peek(my_perl, (SV*)()) // sv_peek
2477
2478           (The my_perl is for threaded builds.)  Just remember that every
2479           line, but the last one, should end with \n\
2480
2481           Alternatively edit the init file interactively via: 3rd mouse but‐
2482           ton -> New Display -> Edit Menu
2483
2484           Note: you can define up to 20 conversion shortcuts in the gdb sec‐
2485           tion.
2486
       ·   If you see in a debugger a memory area mysteriously full of
           0xabababab, you may be seeing the effect of the Poison() macro; see
           perlclib.
2490
2491       CONCLUSION
2492
2493       We've had a brief look around the Perl source, an overview of the
2494       stages perl goes through when it's running your code, and how to use a
2495       debugger to poke at the Perl guts. We took a very simple problem and
2496       demonstrated how to solve it fully - with documentation, regression
2497       tests, and finally a patch for submission to p5p.  Finally, we talked
2498       about how to use external tools to debug and test Perl.
2499
2500       I'd now suggest you read over those references again, and then, as soon
2501       as possible, get your hands dirty. The best way to learn is by doing,
2502       so:
2503
2504       ·  Subscribe to perl5-porters, follow the patches and try and under‐
2505          stand them; don't be afraid to ask if there's a portion you're not
2506          clear on - who knows, you may unearth a bug in the patch...
2507
2508       ·  Keep up to date with the bleeding edge Perl distributions and get
2509          familiar with the changes. Try and get an idea of what areas people
2510          are working on and the changes they're making.
2511
2512       ·  Do read the README associated with your operating system, e.g.
2513          README.aix on the IBM AIX OS. Don't hesitate to supply patches to
2514          that README if you find anything missing or changed over a new OS
2515          release.
2516
2517       ·  Find an area of Perl that seems interesting to you, and see if you
2518          can work out how it works. Scan through the source, and step over it
2519          in the debugger. Play, poke, investigate, fiddle! You'll probably
2520          get to understand not just your chosen area but a much wider range
2521          of perl's activity as well, and probably sooner than you'd think.
2522
2523       The Road goes ever on and on, down from the door where it began.
2524
2525       If you can do these things, you've started on the long road to Perl
2526       porting.  Thanks for wanting to help make Perl better - and happy hack‐
2527       ing!
2528

AUTHOR

2530       This document was written by Nathan Torkington, and is maintained by
2531       the perl5-porters mailing list.
2532
2533
2534
2535perl v5.8.8                       2006-01-07                       PERLHACK(1)