perlguts(1)

PERLGUTS(1)            Perl Programmers Reference Guide            PERLGUTS(1)

NAME

       perlguts - Introduction to the Perl API

DESCRIPTION

       This document attempts to describe how to use the Perl API, as well as
       to provide some info on the basic workings of the Perl core.  It is far
       from complete and probably contains many errors.  Please refer any
       questions or comments to the author below.

Variables

   Datatypes
       Perl has three typedefs that handle Perl's three main data types:

           SV  Scalar Value
           AV  Array Value
           HV  Hash Value

       Each typedef has specific routines that manipulate the various data
       types.

   What is an "IV"?
       Perl uses a special typedef IV which is a simple signed integer type
       that is guaranteed to be large enough to hold a pointer (as well as an
       integer).  Additionally, there is the UV, which is simply an unsigned
       IV.

       Perl also uses two special typedefs, I32 and I16, which will always be
       at least 32 bits and 16 bits long, respectively.  (Again, there are U32
       and U16, as well.)  They will usually be exactly 32 and 16 bits long,
       but on Crays they will both be 64 bits.

   Working with SVs
       An SV can be created and loaded with one command.  There are five types
       of values that can be loaded: an integer value (IV), an unsigned
       integer value (UV), a double (NV), a string (PV), and another scalar
       (SV).  ("PV" stands for "Pointer Value".  You might think that it is
       misnamed because it is described as pointing only to strings.  However,
       it is possible to have it point to other things.  For example, it could
       point to an array of UVs.  But, using it for non-strings requires care,
       as the underlying assumption of much of the internals is that PVs are
       just for strings.  Often, for example, a trailing "NUL" is tacked on
       automatically.  The non-string use is documented only in this
       paragraph.)

       The seven routines are:

           SV*  newSViv(IV);
           SV*  newSVuv(UV);
           SV*  newSVnv(double);
           SV*  newSVpv(const char*, STRLEN);
           SV*  newSVpvn(const char*, STRLEN);
           SV*  newSVpvf(const char*, ...);
           SV*  newSVsv(SV*);

       "STRLEN" is an integer type ("Size_t", usually defined as "size_t" in
       config.h) guaranteed to be large enough to represent the size of any
       string that perl can handle.

       In the unlikely case of a SV requiring more complex initialization, you
       can create an empty SV with newSV(len).  If "len" is 0 an empty SV of
       type NULL is returned, else an SV of type PV is returned with len + 1
       (for the "NUL") bytes of storage allocated, accessible via SvPVX.  In
       both cases the SV has the undef value.

           SV *sv = newSV(0);   /* no storage allocated  */
           SV *sv = newSV(10);  /* 10 (+1) bytes of uninitialised storage
                                 * allocated */

       To change the value of an already-existing SV, there are eight
       routines:

           void  sv_setiv(SV*, IV);
           void  sv_setuv(SV*, UV);
           void  sv_setnv(SV*, double);
           void  sv_setpv(SV*, const char*);
           void  sv_setpvn(SV*, const char*, STRLEN);
           void  sv_setpvf(SV*, const char*, ...);
           void  sv_vsetpvfn(SV*, const char*, STRLEN, va_list *,
                                               SV **, Size_t, bool *);
           void  sv_setsv(SV*, SV*);

       Notice that you can choose to specify the length of the string to be
       assigned by using "sv_setpvn", "newSVpvn", or "newSVpv", or you may
       allow Perl to calculate the length by using "sv_setpv" or by specifying
       0 as the second argument to "newSVpv".  Be warned, though, that Perl
       will determine the string's length by using "strlen", which depends on
       the string terminating with a "NUL" character, and not otherwise
       containing NULs.

       The arguments of "sv_setpvf" are processed like "sprintf", and the
       formatted output becomes the value.

       "sv_vsetpvfn" is an analogue of "vsprintf", but it allows you to
       specify either a pointer to a variable argument list or the address and
       length of an array of SVs.  The last argument points to a boolean; on
       return, if that boolean is true, then locale-specific information has
       been used to format the string, and the string's contents are therefore
       untrustworthy (see perlsec).  This pointer may be NULL if that
       information is not important.  Note that this function requires you to
       specify the length of the format.

       The "sv_set*()" functions are not generic enough to operate on values
       that have "magic".  See "Magic Virtual Tables" later in this document.

       All SVs that contain strings should be terminated with a "NUL"
       character.  If a string is not "NUL"-terminated, there is a risk of
       core dumps and corruption from code which passes the string to C
       functions or system calls which expect a "NUL"-terminated string.
       Perl's own functions typically add a trailing "NUL" for this reason.
       Nevertheless, you should be very careful when you pass a string stored
       in an SV to a C function or system call.

       To access the actual value that an SV points to, you can use the
       macros:

           SvIV(SV*)
           SvUV(SV*)
           SvNV(SV*)
           SvPV(SV*, STRLEN len)
           SvPV_nolen(SV*)

       which will automatically coerce the actual scalar type into an IV, UV,
       double, or string.

       In the "SvPV" macro, the length of the string returned is placed into
       the variable "len" (this is a macro, so you do not use &len).  If you
       do not care what the length of the data is, use the "SvPV_nolen" macro.
       Historically the "SvPV" macro with the global variable "PL_na" has been
       used in this case.  But that can be quite inefficient because "PL_na"
       must be accessed in thread-local storage in threaded Perl.  In any
       case, remember that Perl allows arbitrary strings of data that may both
       contain NULs and might not be terminated by a "NUL".

       Also remember that C doesn't allow you to safely say "foo(SvPV(s, len),
       len);".  It might work with your compiler, but it won't work for
       everyone.  Break this sort of statement up into separate assignments:

           SV *s;
           STRLEN len;
           char *ptr;
           ptr = SvPV(s, len);
           foo(ptr, len);

       If you want to know if the scalar value is TRUE, you can use:

           SvTRUE(SV*)

       Although Perl will automatically grow strings for you, if you need to
       force Perl to allocate more memory for your SV, you can use the macro

           SvGROW(SV*, STRLEN newlen)

       which will determine if more memory needs to be allocated.  If so, it
       will call the function "sv_grow".  Note that "SvGROW" can only
       increase, not decrease, the allocated memory of an SV and that it does
       not automatically add space for the trailing "NUL" byte (perl's own
       string functions typically do "SvGROW(sv, len + 1)").

       If you want to write to an existing SV's buffer and set its value to a
       string, use SvPV_force() or one of its variants to force the SV to be a
       PV.  This will remove any of various types of non-stringness from the
       SV while preserving the content of the SV in the PV.  This can be used,
       for example, to append data from an API function to a buffer without
       extra copying:

           (void)SvPVbyte_force(sv, len);
           s = SvGROW(sv, len + needlen + 1);
           /* something that modifies up to needlen bytes at s+len, but
              modifies newlen bytes
                eg. newlen = read(fd, s + len, needlen);
              ignoring errors for these examples
            */
           s[len + newlen] = '\0';
           SvCUR_set(sv, len + newlen);
           SvUTF8_off(sv);
           SvSETMAGIC(sv);

       If you already have the data in memory or if you want to keep your code
       simple, you can use one of the sv_cat*() variants, such as sv_catpvn().
       If you want to insert anywhere in the string you can use sv_insert() or
       sv_insert_flags().

       If you don't need the existing content of the SV, you can avoid some
       copying with:

           SvPVCLEAR(sv);
           s = SvGROW(sv, needlen + 1);
           /* something that modifies up to needlen bytes at s, but modifies
              newlen bytes
                eg. newlen = read(fd, s, needlen);
            */
           s[newlen] = '\0';
           SvCUR_set(sv, newlen);
           SvPOK_only(sv); /* also clears SVf_UTF8 */
           SvSETMAGIC(sv);

       Again, if you already have the data in memory or want to avoid the
       complexity of the above, you can use sv_setpvn().

       If you have a buffer allocated with Newx() and want to set that as the
       SV's value, you can use sv_usepvn_flags().  That has some requirements
       if you want to avoid perl re-allocating the buffer to fit the trailing
       NUL:

          Newx(buf, somesize+1, char);
          /* ... fill in buf ... */
          buf[somesize] = '\0';
          sv_usepvn_flags(sv, buf, somesize, SV_SMAGIC | SV_HAS_TRAILING_NUL);
          /* buf now belongs to perl, don't release it */

       If you have an SV and want to know what kind of data Perl thinks is
       stored in it, you can use the following macros to check the type of SV
       you have.

           SvIOK(SV*)
           SvNOK(SV*)
           SvPOK(SV*)

       You can get and set the current length of the string stored in an SV
       with the following macros:

           SvCUR(SV*)
           SvCUR_set(SV*, STRLEN val)

       You can also get a pointer to the end of the string stored in the SV
       with the macro:

           SvEND(SV*)

       But note that these last three macros are valid only if "SvPOK()" is
       true.

       If you want to append something to the end of the string stored in an
       "SV*", you can use the following functions:

           void  sv_catpv(SV*, const char*);
           void  sv_catpvn(SV*, const char*, STRLEN);
           void  sv_catpvf(SV*, const char*, ...);
           void  sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **,
                                                             Size_t, bool *);
           void  sv_catsv(SV*, SV*);

       The first function calculates the length of the string to be appended
       by using "strlen".  In the second, you specify the length of the string
       yourself.  The third function processes its arguments like "sprintf"
       and appends the formatted output.  The fourth function works like
       "vsprintf".  You can specify the address and length of an array of SVs
       instead of the va_list argument.  The fifth function extends the string
       stored in the first SV with the string stored in the second SV.  It
       also forces the second SV to be interpreted as a string.

       The "sv_cat*()" functions are not generic enough to operate on values
       that have "magic".  See "Magic Virtual Tables" later in this document.

       If you know the name of a scalar variable, you can get a pointer to its
       SV by using the following:

           SV*  get_sv("package::varname", 0);

       This returns NULL if the variable does not exist.

       If you want to know if this variable (or any other SV) is actually
       "defined", you can call:

           SvOK(SV*)

       The scalar "undef" value is stored in an SV instance called
       "PL_sv_undef".

       Its address can be used whenever an "SV*" is needed.  Make sure that
       you don't try to compare a random sv with &PL_sv_undef.  For example,
       when interfacing with Perl code, the comparison will work correctly
       for:

         foo(undef);

       But won't work when called as:

         $x = undef;
         foo($x);

       So, to repeat, always use SvOK() to check whether an sv is defined.

       Also you have to be careful when using &PL_sv_undef as a value in AVs
       or HVs (see "AVs, HVs and undefined values").

       There are also the two values "PL_sv_yes" and "PL_sv_no", which contain
       boolean TRUE and FALSE values, respectively.  Like "PL_sv_undef", their
       addresses can be used whenever an "SV*" is needed.

       Do not be fooled into thinking that "(SV *) 0" is the same as
       &PL_sv_undef.  Take this code:

           SV* sv = (SV*) 0;
           if (I-am-to-return-a-real-value) {
                   sv = sv_2mortal(newSViv(42));
           }
           sv_setsv(ST(0), sv);

       This code tries to return a new SV (which contains the value 42) if it
       should return a real value, or undef otherwise.  Instead it has
       returned a NULL pointer which, somewhere down the line, will cause a
       segmentation violation, bus error, or just weird results.  Change the
       zero to &PL_sv_undef in the first line and all will be well.

       To free an SV that you've created, call "SvREFCNT_dec(SV*)".  Normally
       this call is not necessary (see "Reference Counts and Mortality").

   Offsets
       Perl provides the function "sv_chop" to efficiently remove characters
       from the beginning of a string; you give it an SV and a pointer to
       somewhere inside the PV, and it discards everything before the pointer.
       The efficiency comes by means of a little hack: instead of actually
       removing the characters, "sv_chop" sets the flag "OOK" (offset OK) to
       signal to other functions that the offset hack is in effect, and it
       moves the PV pointer (called "SvPVX") forward by the number of bytes
       chopped off, and adjusts "SvCUR" and "SvLEN" accordingly.  (A portion
       of the space between the old and new PV pointers is used to store the
       count of chopped bytes.)

       Hence, at this point, the start of the buffer that we allocated lives
       at "SvPVX(sv) - SvIV(sv)" in memory and the PV pointer is pointing into
       the middle of this allocated storage.

       This is best demonstrated by example.  Normally copy-on-write will
       prevent the substitution operator from using this hack, but if you
       can craft a string for which copy-on-write is not possible, you can see
       it in play.  In the current implementation, the final byte of a string
       buffer is used as a copy-on-write reference count.  If the buffer is
       not big enough, then copy-on-write is skipped.  First have a look at an
       empty string:

         % ./perl -Ilib -MDevel::Peek -le '$a=""; $a .= ""; Dump $a'
         SV = PV(0x7ffb7c008a70) at 0x7ffb7c030390
           REFCNT = 1
           FLAGS = (POK,pPOK)
           PV = 0x7ffb7bc05b50 ""\0
           CUR = 0
           LEN = 10

       Notice here the LEN is 10.  (It may differ on your platform.)  Extend
       the length of the string to one less than 10, and do a substitution:

        % ./perl -Ilib -MDevel::Peek -le '$a=""; $a.="123456789"; $a=~s/.//; \
                                                                   Dump($a)'
        SV = PV(0x7ffa04008a70) at 0x7ffa04030390
          REFCNT = 1
          FLAGS = (POK,OOK,pPOK)
          OFFSET = 1
          PV = 0x7ffa03c05b61 ( "\1" . ) "23456789"\0
          CUR = 8
          LEN = 9

       Here the number of bytes chopped off (1) is shown next as the OFFSET.
       The portion of the string between the "real" and the "fake" beginnings
       is shown in parentheses, and the values of "SvCUR" and "SvLEN" reflect
       the fake beginning, not the real one.  (The first character of the
       string buffer happens to have changed to "\1" here, not "1", because
       the current implementation stores the offset count in the string
       buffer.  This is subject to change.)

       Something similar to the offset hack is performed on AVs to enable
       efficient shifting and splicing off the beginning of the array; while
       "AvARRAY" points to the first element in the array that is visible from
       Perl, "AvALLOC" points to the real start of the C array.  These are
       usually the same, but a "shift" operation can be carried out by
       increasing "AvARRAY" by one and decreasing "AvFILL" and "AvMAX".
       Again, the location of the real start of the C array only comes into
       play when freeing the array.  See "av_shift" in av.c.

   What's Really Stored in an SV?
       Recall that the usual method of determining the type of scalar you have
       is to use "Sv*OK" macros.  Because a scalar can be both a number and a
       string, these macros will usually return TRUE, and calling the
       "Sv*V" macros will do the appropriate conversion of string to
       integer/double or integer/double to string.

       If you really need to know if you have an integer, double, or string
       pointer in an SV, you can use the following three macros instead:

           SvIOKp(SV*)
           SvNOKp(SV*)
           SvPOKp(SV*)

       These will tell you if you truly have an integer, double, or string
       pointer stored in your SV.  The "p" stands for private.

       There are various ways in which the private and public flags may
       differ.  For example, in perl 5.16 and earlier a tied SV may have a
       valid underlying value in the IV slot (so SvIOKp is true), but the data
       should be accessed via the FETCH routine rather than directly, so SvIOK
       is false.  (In perl 5.18 onwards, tied scalars use the flags the same
       way as untied scalars.)  Another is when numeric conversion has
       occurred and precision has been lost: only the private flag is set on
       'lossy' values.  So when an NV is converted to an IV with loss, SvIOKp,
       SvNOKp and SvNOK will be set, while SvIOK won't be.

       In general, though, it's best to use the "Sv*V" macros.

   Working with AVs
       There are two ways to create and load an AV.  The first method creates
       an empty AV:

           AV*  newAV();

       The second method both creates the AV and initially populates it with
       SVs:

           AV*  av_make(SSize_t num, SV **ptr);

       The second argument points to an array containing "num" "SV*"'s.  Once
       the AV has been created, the SVs can be destroyed, if so desired.

       Once the AV has been created, the following operations are possible on
       it:

           void  av_push(AV*, SV*);
           SV*   av_pop(AV*);
           SV*   av_shift(AV*);
           void  av_unshift(AV*, SSize_t num);

       These should be familiar operations, with the exception of
       "av_unshift".  This routine adds "num" elements at the front of the
       array with the "undef" value.  You must then use "av_store" (described
       below) to assign values to these new elements.

       Here are some other functions:

           SSize_t av_top_index(AV*);
           SV**    av_fetch(AV*, SSize_t key, I32 lval);
           SV**    av_store(AV*, SSize_t key, SV* val);

       The "av_top_index" function returns the highest index value in an array
       (just like $#array in Perl).  If the array is empty, -1 is returned.
       The "av_fetch" function returns the value at index "key", but if "lval"
       is non-zero, then "av_fetch" will store an undef value at that index.
       The "av_store" function stores the value "val" at index "key", and does
       not increment the reference count of "val".  Thus the caller is
       responsible for taking care of that, and if "av_store" returns NULL,
       the caller will have to decrement the reference count to avoid a memory
       leak.  Note that "av_fetch" and "av_store" both return "SV**"'s, not
       "SV*"'s as their return value.

       A few more:

           void  av_clear(AV*);
           void  av_undef(AV*);
           void  av_extend(AV*, SSize_t key);

       The "av_clear" function deletes all the elements in the AV* array, but
       does not actually delete the array itself.  The "av_undef" function
       will delete all the elements in the array plus the array itself.  The
       "av_extend" function extends the array so that it contains at least
       "key+1" elements.  If "key+1" is less than the currently allocated
       length of the array, then nothing is done.

       If you know the name of an array variable, you can get a pointer to its
       AV by using the following:

           AV*  get_av("package::varname", 0);

       This returns NULL if the variable does not exist.

       See "Understanding the Magic of Tied Hashes and Arrays" for more
       information on how to use the array access functions on tied arrays.

   Working with HVs
       To create an HV, you use the following routine:

           HV*  newHV();

       Once the HV has been created, the following operations are possible on
       it:

           SV**  hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
           SV**  hv_fetch(HV*, const char* key, U32 klen, I32 lval);

       The "klen" parameter is the length of the key being passed in (Note
       that you cannot pass 0 in as a value of "klen" to tell Perl to measure
       the length of the key).  The "val" argument contains the SV pointer to
       the scalar being stored, and "hash" is the precomputed hash value (zero
       if you want "hv_store" to calculate it for you).  The "lval" parameter
       indicates whether this fetch is actually a part of a store operation,
       in which case a new undefined value will be added to the HV with the
       supplied key and "hv_fetch" will return as if the value had already
       existed.

       Remember that "hv_store" and "hv_fetch" return "SV**"'s and not just
       "SV*".  To access the scalar value, you must first dereference the
       return value.  However, you should check to make sure that the return
       value is not NULL before dereferencing it.

       The first of these two functions checks if a hash table entry exists,
       and the second deletes it.

           bool  hv_exists(HV*, const char* key, U32 klen);
           SV*   hv_delete(HV*, const char* key, U32 klen, I32 flags);

       If "flags" does not include the "G_DISCARD" flag then "hv_delete" will
       create and return a mortal copy of the deleted value.

       And more miscellaneous functions:

           void   hv_clear(HV*);
           void   hv_undef(HV*);

       Like their AV counterparts, "hv_clear" deletes all the entries in the
       hash table but does not actually delete the hash table.  The "hv_undef"
       deletes both the entries and the hash table itself.

       Perl keeps the actual data in a linked list of structures with a
       typedef of HE.  These contain the actual key and value pointers (plus
       extra administrative overhead).  The key is a string pointer; the value
       is an "SV*".  However, once you have an "HE*", to get the actual key
       and value, use the routines specified below.

           I32    hv_iterinit(HV*);
                   /* Prepares starting point to traverse hash table */
           HE*    hv_iternext(HV*);
                   /* Get the next entry, and return a pointer to a
                      structure that has both the key and value */
           char*  hv_iterkey(HE* entry, I32* retlen);
                   /* Get the key from an HE structure and also return
                      the length of the key string */
           SV*    hv_iterval(HV*, HE* entry);
                   /* Return an SV pointer to the value of the HE
                      structure */
           SV*    hv_iternextsv(HV*, char** key, I32* retlen);
                   /* This convenience routine combines hv_iternext,
                      hv_iterkey, and hv_iterval.  The key and retlen
                      arguments are return values for the key and its
                      length.  The value is returned as the function's
                      SV* result */

       If you know the name of a hash variable, you can get a pointer to its
       HV by using the following:

           HV*  get_hv("package::varname", 0);

       This returns NULL if the variable does not exist.

       The hash algorithm is defined in the "PERL_HASH" macro:

           PERL_HASH(hash, key, klen)

       The exact implementation of this macro varies by architecture and
       version of perl, and the return value may change per invocation, so the
       value is only valid for the duration of a single perl process.

       See "Understanding the Magic of Tied Hashes and Arrays" for more
       information on how to use the hash access functions on tied hashes.

   Hash API Extensions
       Beginning with version 5.004, the following functions are also
       supported:

           HE*     hv_fetch_ent  (HV* tb, SV* key, I32 lval, U32 hash);
           HE*     hv_store_ent  (HV* tb, SV* key, SV* val, U32 hash);

           bool    hv_exists_ent (HV* tb, SV* key, U32 hash);
           SV*     hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);

           SV*     hv_iterkeysv  (HE* entry);

       Note that these functions take "SV*" keys, which simplifies writing of
       extension code that deals with hash structures.  These functions also
       allow passing of "SV*" keys to "tie" functions without forcing you to
       stringify the keys (unlike the previous set of functions).

       They also return and accept whole hash entries ("HE*"), making their
       use more efficient (since the hash number for a particular string
       doesn't have to be recomputed every time).  See perlapi for detailed
       descriptions.

       The following macros must always be used to access the contents of hash
       entries.  Note that the arguments to these macros must be simple
       variables, since they may get evaluated more than once.  See perlapi
       for detailed descriptions of these macros.

           HePV(HE* he, STRLEN len)
           HeVAL(HE* he)
           HeHASH(HE* he)
           HeSVKEY(HE* he)
           HeSVKEY_force(HE* he)
           HeSVKEY_set(HE* he, SV* sv)

       These two lower level macros are defined, but must only be used when
       dealing with keys that are not "SV*"s:

           HeKEY(HE* he)
           HeKLEN(HE* he)

       Note that both "hv_store" and "hv_store_ent" do not increment the
       reference count of the stored "val", which is the caller's
       responsibility.  If these functions return a NULL value, the caller
       will usually have to decrement the reference count of "val" to avoid a
       memory leak.

   AVs, HVs and undefined values
       Sometimes you have to store undefined values in AVs or HVs.  Although
       this may be a rare case, it can be tricky.  That's because you're used
       to using &PL_sv_undef if you need an undefined SV.

       For example, intuition tells you that this XS code:

           AV *av = newAV();
           av_store( av, 0, &PL_sv_undef );

       is equivalent to this Perl code:

           my @av;
           $av[0] = undef;

       Unfortunately, this isn't true.  In perl 5.18 and earlier, AVs use
       &PL_sv_undef as a marker for indicating that an array element has not
       yet been initialized.  Thus, "exists $av[0]" would be true for the
       above Perl code, but false for the array generated by the XS code.  In
       perl 5.20, storing &PL_sv_undef will create a read-only element,
       because the scalar &PL_sv_undef itself is stored, not a copy.

       Similar problems can occur when storing &PL_sv_undef in HVs:

           hv_store( hv, "key", 3, &PL_sv_undef, 0 );

       This will indeed make the value "undef", but if you try to modify the
       value of "key", you'll get the following error:

           Modification of non-creatable hash value attempted

       In perl 5.8.0, &PL_sv_undef was also used to mark placeholders in
       restricted hashes.  This caused such hash entries not to appear when
       iterating over the hash or when checking for the keys with the
       "hv_exists" function.

       You can run into similar problems when you store &PL_sv_yes or
       &PL_sv_no into AVs or HVs.  Trying to modify such elements will give
       you the following error:

           Modification of a read-only value attempted

       To make a long story short, you can use the special variables
       &PL_sv_undef, &PL_sv_yes and &PL_sv_no with AVs and HVs, but you have
       to make sure you know what you're doing.

       Generally, if you want to store an undefined value in an AV or HV, you
       should not use &PL_sv_undef, but rather create a new undefined value
       using the "newSV" function, for example:

           av_store( av, 42, newSV(0) );
           hv_store( hv, "foo", 3, newSV(0), 0 );

   References
       References are a special type of scalar that point to other data types
       (including other references).

       To create a reference, use either of the following functions:

           SV* newRV_inc((SV*) thing);
           SV* newRV_noinc((SV*) thing);

       The "thing" argument can be any of an "SV*", "AV*", or "HV*".  The
       functions are identical except that "newRV_inc" increments the
       reference count of the "thing", while "newRV_noinc" does not.  For
       historical reasons, "newRV" is a synonym for "newRV_inc".

       Once you have a reference, you can use the following macro to
       dereference the reference:

           SvRV(SV*)

       then call the appropriate routines, casting the returned "SV*" to
       either an "AV*" or "HV*", if required.

       To determine if an SV is a reference, you can use the following macro:

           SvROK(SV*)

       To discover what type of value the reference refers to, use the
       following macro and then check the return value.

           SvTYPE(SvRV(SV*))

       The most useful types that will be returned are:

           SVt_PVAV    Array
           SVt_PVHV    Hash
           SVt_PVCV    Code
           SVt_PVGV    Glob (possibly a file handle)

       Any numerical value returned which is less than SVt_PVAV will be a
       scalar of some form.

       See "svtype" in perlapi for more details.

697   Blessed References and Class Objects
698       References are also used to support object-oriented programming.  In
699       perl's OO lexicon, an object is simply a reference that has been
700       blessed into a package (or class).  Once the reference is blessed, the
701       programmer may use it to access the various methods in the class.
702
703       A reference can be blessed into a package with the following function:
704
705           SV* sv_bless(SV* sv, HV* stash);
706
707       The "sv" argument must be a reference value.  The "stash" argument
708       specifies which class the reference will belong to.  See "Stashes and
709       Globs" for information on converting class names into stashes.
710
711       /* Still under construction */
712
713       The following function upgrades "rv" to a reference if it is not
714       already one, creating a new SV for "rv" to point to.  If "classname" is
715       non-null, the new SV is blessed into that class.  The new SV is returned.
716
717               SV* newSVrv(SV* rv, const char* classname);
718
719       The following three functions copy integer, unsigned integer or double
720       into an SV whose reference is "rv".  SV is blessed if "classname" is
721       non-null.
722
723               SV* sv_setref_iv(SV* rv, const char* classname, IV iv);
724               SV* sv_setref_uv(SV* rv, const char* classname, UV uv);
725               SV* sv_setref_nv(SV* rv, const char* classname, NV nv);
726
727       The following function copies the pointer value (the address, not the
728       string!) into an SV whose reference is rv.  SV is blessed if
729       "classname" is non-null.
730
731               SV* sv_setref_pv(SV* rv, const char* classname, void* pv);
732
733       The following function copies a string into an SV whose reference is
734       "rv".  Set length to 0 to let Perl calculate the string length.  SV is
735       blessed if "classname" is non-null.
736
737           SV* sv_setref_pvn(SV* rv, const char* classname, char* pv,
738                                                                STRLEN length);
739
740       The following function tests whether the SV is blessed into the
741       specified class.  It does not check inheritance relationships.
742
743               int  sv_isa(SV* sv, const char* name);
744
745       The following function tests whether the SV is a reference to a blessed
746       object.
747
748               int  sv_isobject(SV* sv);
749
750       The following function tests whether the SV is derived from the
751       specified class.  SV can be either a reference to a blessed object or a
752       string containing a class name.  This is the function implementing the
753       "UNIVERSAL::isa" functionality.
754
755               bool sv_derived_from(SV* sv, const char* name);
756
757       To check if you've got an object derived from a specific class you have
758       to write:
759
760               if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... }
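
       The difference between an exact class match and an inheritance-aware
       check can be sketched in plain C, without the Perl headers.  This is
       only a toy model: "ToyClass", "toy_isa" and "toy_derived_from" are
       hypothetical names, and each class here has at most one parent,
       whereas a real Perl class may list several in @ISA.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a class: a name and at most one parent. */
typedef struct ToyClass {
    const char *name;
    const struct ToyClass *parent;
} ToyClass;

/* Plays the role of sv_isa(): matches the object's own class only. */
static int toy_isa(const ToyClass *cls, const char *name)
{
    assert(name != NULL);
    return cls != NULL && strcmp(cls->name, name) == 0;
}

/* Plays the role of sv_derived_from(): also matches any ancestor. */
static int toy_derived_from(const ToyClass *cls, const char *name)
{
    assert(name != NULL);
    for (; cls != NULL; cls = cls->parent)
        if (strcmp(cls->name, name) == 0)
            return 1;
    return 0;
}

/* A two-level hierarchy: Dog inherits from Animal. */
static const ToyClass toy_animal = { "Animal", NULL };
static const ToyClass toy_dog    = { "Dog", &toy_animal };
```

       With these definitions, toy_isa(&toy_dog, "Animal") is false while
       toy_derived_from(&toy_dog, "Animal") is true, which mirrors why the
       inheritance-aware function is the one to use for "is this object
       usable as an Animal" checks.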
761
762   Creating New Variables
763       To create a new Perl variable with an undef value which can be accessed
764       from your Perl script, use the following routines, depending on the
765       variable type.
766
767           SV*  get_sv("package::varname", GV_ADD);
768           AV*  get_av("package::varname", GV_ADD);
769           HV*  get_hv("package::varname", GV_ADD);
770
771       Notice the use of GV_ADD as the second parameter.  The new variable can
772       now be set, using the routines appropriate to the data type.
773
774       There are additional macros whose values may be bitwise OR'ed with the
775       "GV_ADD" argument to enable certain extra features.  Those bits are:
776
777       GV_ADDMULTI
778           Marks the variable as multiply defined, thus preventing the:
779
780             Name <varname> used only once: possible typo
781
782           warning.
783
784       GV_ADDWARN
785           Issues the warning:
786
787             Had to create <varname> unexpectedly
788
789           if the variable did not exist before the function was called.
790
791       If you do not specify a package name, the variable is created in the
792       current package.
793
794   Reference Counts and Mortality
795       Perl uses a reference count-driven garbage collection mechanism.  SVs,
796       AVs, or HVs (xV for short in the following) start their life with a
797       reference count of 1.  If the reference count of an xV ever drops to 0,
798       then it will be destroyed and its memory made available for reuse.  At
799       the most basic internal level, reference counts can be manipulated with
800       the following macros:
801
802           int SvREFCNT(SV* sv);
803           SV* SvREFCNT_inc(SV* sv);
804           void SvREFCNT_dec(SV* sv);
805
806       (There are also suffixed versions of the increment and decrement
807       macros, for situations where the full generality of these basic macros
808       can be exchanged for some performance.)
809
810       However, the way a programmer should think about references is not so
811       much in terms of the bare reference count, but in terms of ownership of
812       references.  A reference to an xV can be owned by any of a variety of
813       entities: another xV, the Perl interpreter, an XS data structure, a
814       piece of running code, or a dynamic scope.  An xV generally does not
815       know what entities own the references to it; it only knows how many
816       references there are, which is the reference count.
817
818       To correctly maintain reference counts, it is essential to keep track
819       of what references the XS code is manipulating.  The programmer should
820       always know where a reference has come from and who owns it, and be
821       aware of any creation or destruction of references, and any transfers
822       of ownership.  Because ownership isn't represented explicitly in the xV
823       data structures, only the reference count need be actually maintained
824       by the code, and that means that this understanding of ownership is not
825       actually evident in the code.  For example, transferring ownership of a
826       reference from one owner to another doesn't change the reference count
827       at all, so may be achieved with no actual code.  (The transferring code
828       doesn't touch the referenced object, but does need to ensure that the
829       former owner knows that it no longer owns the reference, and that the
830       new owner knows that it now does.)
831
832       An xV that is visible at the Perl level should not become unreferenced
833       and thus be destroyed.  Normally, an object will only become
834       unreferenced when it is no longer visible, often by the same means that
835       makes it invisible.  For example, a Perl reference value (RV) owns a
836       reference to its referent, so if the RV is overwritten that reference
837       gets destroyed, and the no-longer-reachable referent may be destroyed
838       as a result.
839
840       Many functions have some kind of reference manipulation as part of
841       their purpose.  Sometimes this is documented in terms of ownership of
842       references, and sometimes it is (less helpfully) documented in terms of
843       changes to reference counts.  For example, the newRV_inc() function is
844       documented to create a new RV (with reference count 1) and increment
845       the reference count of the referent that was supplied by the caller.
846       This is best understood as creating a new reference to the referent,
847       which is owned by the created RV, and returning to the caller ownership
848       of the sole reference to the RV.  The newRV_noinc() function instead
849       does not increment the reference count of the referent, but the RV
850       nevertheless ends up owning a reference to the referent.  It is
851       therefore implied that the caller of "newRV_noinc()" is relinquishing a
852       reference to the referent, making this conceptually a more complicated
853       operation even though it does less to the data structures.
854
855       For example, imagine you want to return a reference from an XSUB
856       function.  Inside the XSUB routine, you create an SV which initially
857       has just a single reference, owned by the XSUB routine.  This reference
858       needs to be disposed of before the routine is complete, otherwise it
859       will leak, preventing the SV from ever being destroyed.  So to create
860       an RV referencing the SV, it is most convenient to pass the SV to
861       "newRV_noinc()", which consumes that reference.  Now the XSUB routine
862       no longer owns a reference to the SV, but does own a reference to the
863       RV, which in turn owns a reference to the SV.  The ownership of the
864       reference to the RV is then transferred by the process of returning the
865       RV from the XSUB.
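
       The ownership rules above can be illustrated with a small standalone
       model.  It does not use the Perl headers: "ToySV", "toy_newRV_inc",
       "toy_newRV_noinc" and the "toy_live" counter are hypothetical names
       invented for this sketch, and a real SV carries far more state.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for an SV: a reference count and an optional referent. */
typedef struct ToySV {
    int refcnt;
    struct ToySV *referent;   /* non-NULL only when this value is an RV */
} ToySV;

static int toy_live = 0;      /* number of ToySVs currently allocated */

static ToySV *toy_new(void)
{
    ToySV *sv = calloc(1, sizeof *sv);
    if (!sv) abort();
    sv->refcnt = 1;           /* a new value starts with one reference */
    toy_live++;
    return sv;
}

static void toy_dec(ToySV *sv)
{
    if (sv && --sv->refcnt == 0) {
        toy_dec(sv->referent);   /* an RV gives up the reference it owns */
        free(sv);
        toy_live--;
    }
}

/* Modelled on newRV_inc(): the RV gets a new reference of its own. */
static ToySV *toy_newRV_inc(ToySV *thing)
{
    ToySV *rv = toy_new();
    assert(thing != NULL);
    thing->refcnt++;
    rv->referent = thing;
    return rv;
}

/* Modelled on newRV_noinc(): the caller's reference moves to the RV. */
static ToySV *toy_newRV_noinc(ToySV *thing)
{
    ToySV *rv = toy_new();
    assert(thing != NULL);
    rv->referent = thing;
    return rv;
}

/* Values still allocated after the RV is released, noinc variant. */
static int toy_left_after_noinc(void)
{
    ToySV *sv = toy_new();
    ToySV *rv = toy_newRV_noinc(sv);   /* we no longer own sv */
    toy_dec(rv);                       /* frees rv, and sv with it */
    return toy_live;
}

/* Same with the inc variant: our own reference keeps sv alive. */
static int toy_left_after_inc(void)
{
    ToySV *sv = toy_new();
    ToySV *rv = toy_newRV_inc(sv);     /* sv now has two references */
    int left;
    toy_dec(rv);                       /* frees rv only */
    left = toy_live;                   /* sv is still allocated */
    toy_dec(sv);                       /* release our own reference */
    return left;
}
```

       In the "noinc" scenario nothing is left allocated once the RV goes
       away; in the "inc" scenario the referent survives until the caller
       releases the reference it still owns, which is the leak an XSUB would
       have if it forgot that step.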
866
867       There are some convenience functions available that can help with the
868       destruction of xVs.  These functions introduce the concept of
869       "mortality".  Much documentation speaks of an xV itself being mortal,
870       but this is misleading.  It is really a reference to an xV that is
871       mortal, and it is possible for there to be more than one mortal
872       reference to a single xV.  For a reference to be mortal means that it
873       is owned by the temps stack, one of perl's many internal stacks, which
874       will destroy that reference "a short time later".  Usually the "short
875       time later" is the end of the current Perl statement.  However, it gets
876       more complicated around dynamic scopes: there can be multiple sets of
877       mortal references hanging around at the same time, with different death
878       dates.  Internally, the actual determinant for when mortal xV
879       references are destroyed depends on two macros, SAVETMPS and FREETMPS.
880       See perlcall and perlxs for more details on these macros.
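
       A rough mental model of the temps stack, written as standalone C, may
       help with the "different death dates" described above.  Every name
       here ("ToyXV", "toy_2mortal", "toy_savetmps", "toy_freetmps",
       "toy_leave") is hypothetical, and the real SAVETMPS and FREETMPS
       macros involve more machinery than this sketch shows.

```c
#include <assert.h>

#define TOY_TMPS_MAX 16

typedef struct { int refcnt; } ToyXV;

static ToyXV *toy_tmps[TOY_TMPS_MAX]; /* references owned by the temps stack */
static int toy_tmps_top   = 0;        /* number of entries in use */
static int toy_tmps_floor = 0;        /* entries below this are kept */

/* Modelled on sv_2mortal(): the caller's reference moves to the stack. */
static ToyXV *toy_2mortal(ToyXV *xv)
{
    assert(toy_tmps_top < TOY_TMPS_MAX);
    toy_tmps[toy_tmps_top++] = xv;
    return xv;
}

/* Modelled on SAVETMPS: protect everything currently on the stack.
 * Returns the previous floor so the scope exit can restore it. */
static int toy_savetmps(void)
{
    int old_floor = toy_tmps_floor;
    toy_tmps_floor = toy_tmps_top;
    return old_floor;
}

/* Modelled on FREETMPS: drop every mortal reference above the floor. */
static void toy_freetmps(void)
{
    while (toy_tmps_top > toy_tmps_floor)
        toy_tmps[--toy_tmps_top]->refcnt--;
}

/* Scope exit in this model: put the floor back where it was. */
static void toy_leave(int old_floor)
{
    toy_tmps_floor = old_floor;
}
```

       A mortal created before toy_savetmps() survives the inner
       toy_freetmps() and is only released by a later toy_freetmps() after
       toy_leave() has restored the outer floor.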
881
882       Mortal references are mainly used for xVs that are placed on perl's
883       main stack.  The stack is problematic for reference tracking, because
884       it contains a lot of xV references, but doesn't own those references:
885       they are not counted.  Currently, there are many bugs resulting from
886       xVs being destroyed while referenced by the stack, because the stack's
887       uncounted references aren't enough to keep the xVs alive.  So when
888       putting an (uncounted) reference on the stack, it is vitally important
889       to ensure that there will be a counted reference to the same xV that
890       will last at least as long as the uncounted reference.  But it's also
891       important that that counted reference be cleaned up at an appropriate
892       time, and not unduly prolong the xV's life.  A mortal reference is
893       often the best way to satisfy this requirement, especially if the xV
894       was created specifically to be put on the stack and would otherwise be
895       unreferenced.
896
897       To create a mortal reference, use the functions:
898
899           SV*  sv_newmortal()
900           SV*  sv_mortalcopy(SV*)
901           SV*  sv_2mortal(SV*)
902
903       "sv_newmortal()" creates an SV (with the undefined value) whose sole
904       reference is mortal.  "sv_mortalcopy()" creates an xV whose value is a
905       copy of a supplied xV and whose sole reference is mortal.
906       "sv_2mortal()" mortalises an existing xV reference: it transfers
907       ownership of a reference from the caller to the temps stack.  Because
908       "sv_newmortal" gives the new SV no value, it must normally be given one
909       via "sv_setpv", "sv_setiv", etc.:
910
911           SV *tmp = sv_newmortal();
912           sv_setiv(tmp, an_integer);
913
914       As that takes multiple C statements, it is quite common to see this
915       idiom instead:
916
917           SV *tmp = sv_2mortal(newSViv(an_integer));
918
919       The mortal routines are not just for SVs; AVs and HVs can be made
920       mortal by passing their address (cast to "SV*") to the "sv_2mortal" or
921       "sv_mortalcopy" routines.
922
923   Stashes and Globs
924       A stash is a hash that contains all variables that are defined within a
925       package.  Each key of the stash is a symbol name (shared by all the
926       different types of objects that have the same name), and each value in
927       the hash table is a GV (Glob Value).  This GV in turn contains
928       references to the various objects of that name, including (but not
929       limited to) the following:
930
931           Scalar Value
932           Array Value
933           Hash Value
934           I/O Handle
935           Format
936           Subroutine
937
938       There is a single stash called "PL_defstash" that holds the items that
939       exist in the "main" package.  To get at the items in other packages,
940       append the string "::" to the package name.  The items in the "Foo"
941       package are in the stash "Foo::" in PL_defstash.  The items in the
942       "Bar::Baz" package are in the stash "Baz::" in "Bar::"'s stash.
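
       To make the nesting concrete, the following standalone helper works
       out which stash key is looked up at each level for a given package
       name.  It is only an illustration: "toy_stash_key" is a hypothetical
       function, it performs no actual hash lookup, and it handles only the
       "::" separator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write the n-th (0-based) stash key for package `pkg` into `out`.
 * For "Bar::Baz": n = 0 gives "Bar::" (looked up in the main stash),
 * n = 1 gives "Baz::" (looked up in the stash found at level 0).
 * Returns 1 on success, 0 if there is no such level, the component
 * is empty, or `out` is too small. */
static int toy_stash_key(const char *pkg, int n, char *out, size_t outsz)
{
    const char *start = pkg;

    assert(pkg != NULL && out != NULL);
    for (;;) {
        const char *sep = strstr(start, "::");
        size_t len = sep ? (size_t)(sep - start) : strlen(start);

        if (n == 0) {
            if (len == 0 || len + 3 > outsz)
                return 0;
            memcpy(out, start, len);
            memcpy(out + len, "::", 3);  /* also copies the trailing NUL */
            return 1;
        }
        if (!sep)
            return 0;                    /* ran out of components */
        start = sep + 2;
        n--;
    }
}
```

       For "Foo" the only key is "Foo::"; for "Bar::Baz" the keys are "Bar::"
       and then "Baz::", matching the two cases described above.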
943
944       To get the stash pointer for a particular package, use either function:
945
946           HV*  gv_stashpv(const char* name, I32 flags)
947           HV*  gv_stashsv(SV*, I32 flags)
948
949       The first function takes a literal string, the second uses the string
950       stored in the SV.  Remember that a stash is just a hash table, so you
951       get back an "HV*".  Passing GV_ADD as the "flags" argument creates the
952       package if it does not already exist.
953
954       The name that "gv_stash*v" wants is the name of the package whose
955       symbol table you want.  The default package is called "main".  If you
956       have multiply nested packages, pass their names to "gv_stash*v",
957       separated by "::" as in the Perl language itself.
958
959       Alternatively, if you have an SV that is a blessed reference, you can
960       find out the stash pointer by using:
961
962           HV*  SvSTASH(SvRV(SV*));
963
964       then use the following to get the package name itself:
965
966           char*  HvNAME(HV* stash);
967
968       If you need to bless or re-bless an object you can use the following
969       function:
970
971           SV*  sv_bless(SV*, HV* stash)
972
973       where the first argument, an "SV*", must be a reference, and the second
974       argument is a stash.  The returned "SV*" can now be used in the same
975       way as any other SV.
976
977       For more information on references and blessings, consult perlref.
978
979   Double-Typed SVs
980       Scalar variables normally contain only one type of value, an integer,
981       double, pointer, or reference.  Perl will automatically convert the
982       actual scalar data from the stored type into the requested type.
983
984       Some scalar variables contain more than one type of scalar data.  For
985       example, the variable $! contains either the numeric value of "errno"
986       or its string equivalent from either "strerror" or "sys_errlist[]".
987
988       To force multiple data values into an SV, you must do two things: use
989       the "sv_set*v" routines to add the additional scalar type, then set a
990       flag so that Perl will believe it contains more than one type of data.
991       The four macros to set the flags are:
992
993               SvIOK_on
994               SvNOK_on
995               SvPOK_on
996               SvROK_on
997
998       The particular macro you must use depends on which "sv_set*v" routine
999       you called first.  This is because every "sv_set*v" routine turns on
1000       only the bit for the particular type of data being set, and turns off
1001       all the rest.
1002
1003       For example, to create a new Perl variable called "dberror" that
1004       contains both the numeric and descriptive string error values, you
1005       could use the following code:
1006
1007           extern int  dberror;
1008           extern char *dberror_list[];
1009
1010           SV* sv = get_sv("dberror", GV_ADD);
1011           sv_setiv(sv, (IV) dberror);
1012           sv_setpv(sv, dberror_list[dberror]);
1013           SvIOK_on(sv);
1014
1015       If the order of "sv_setiv" and "sv_setpv" had been reversed, then the
1016       macro "SvPOK_on" would need to be called instead of "SvIOK_on".
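
       Why the flag has to be switched back on can be seen in a small
       standalone model.  The names "ToyScalar", "TOY_IOK", "TOY_POK",
       "toy_setiv", "toy_setpv" and "toy_iok_on" are hypothetical, and the
       toy keeps a borrowed string pointer where a real SV would copy the
       string.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_IOK = 1, TOY_POK = 2 };   /* hypothetical "slot is valid" bits */

typedef struct {
    unsigned    flags;
    long        iv;
    const char *pv;
} ToyScalar;

/* Modelled on sv_setiv(): store the integer, leave only its flag on. */
static void toy_setiv(ToyScalar *sv, long iv)
{
    sv->iv = iv;
    sv->flags = TOY_IOK;
}

/* Modelled on sv_setpv(): store the string, leave only its flag on. */
static void toy_setpv(ToyScalar *sv, const char *pv)
{
    sv->pv = pv;
    sv->flags = TOY_POK;
}

/* Modelled on SvIOK_on(): declare the integer slot valid again. */
static void toy_iok_on(ToyScalar *sv)
{
    sv->flags |= TOY_IOK;
}

/* Build a value that is both a number and a string, in the same
 * order as the dberror example: integer first, then string. */
static ToyScalar toy_make_dualvar(long code, const char *message)
{
    ToyScalar sv = { 0, 0, NULL };

    assert(message != NULL);
    toy_setiv(&sv, code);
    toy_setpv(&sv, message);   /* clears TOY_IOK; sv.iv is still stored */
    toy_iok_on(&sv);           /* so mark the integer valid once more */
    return sv;
}
```

       After the second setter only the string flag is on, even though the
       integer is still sitting in its slot; turning the integer flag back
       on is what makes both values visible.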
1017
1018   Read-Only Values
1019       In Perl 5.16 and earlier, copy-on-write (see the next section) shared a
1020       flag bit with read-only scalars.  So the only way to test whether
1021       "sv_setsv", etc., will raise a "Modification of a read-only value"
1022       error in those versions is:
1023
1024           SvREADONLY(sv) && !SvIsCOW(sv)
1025
1026       Under Perl 5.18 and later, SvREADONLY only applies to read-only
1027       variables, and, under 5.20, copy-on-write scalars can also be read-
1028       only, so the above check is incorrect.  You just want:
1029
1030           SvREADONLY(sv)
1031
1032       If you need to do this check often, define your own macro like this:
1033
1034           #if PERL_VERSION >= 18
1035           # define SvTRULYREADONLY(sv) SvREADONLY(sv)
1036           #else
1037           # define SvTRULYREADONLY(sv) (SvREADONLY(sv) && !SvIsCOW(sv))
1038           #endif
1039
1040   Copy on Write
1041       Perl implements a copy-on-write (COW) mechanism for scalars, in which
1042       string copies are not immediately made when requested, but are deferred
1043       until made necessary by one or the other scalar changing.  This is
1044       mostly transparent, but one must take care not to modify string buffers
1045       that are shared by multiple SVs.
1046
1047       You can test whether an SV is using copy-on-write with "SvIsCOW(sv)".
1048
1049       You can force an SV to make its own copy of its string buffer by
1050       calling "sv_force_normal(sv)" or "SvPV_force_nolen(sv)".
1051
1052       If you want to make the SV drop its string buffer, use
1053       "sv_force_normal_flags(sv, SV_COW_DROP_PV)" or simply "sv_setsv(sv,
1054       NULL)".
1055
1056       All of these functions will croak on read-only scalars (see the
1057       previous section for more on those).
1058
1059       To test that your code is behaving correctly and not modifying COW
1060       buffers, on systems that support mmap(2) (i.e., Unix) you can configure
1061       perl with "-Accflags=-DPERL_DEBUG_READONLY_COW" and it will turn buffer
1062       violations into crashes.  You will find it to be marvellously slow, so
1063       you may want to skip perl's own tests.
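
       The deferred copy itself can be pictured with a standalone sketch.
       This is not how perl stores strings: "ToyBuf", "ToyStr" and the
       "toy_*" functions are hypothetical, and sharing is tracked here with a
       simple user count.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared buffer: the text plus a count of its users. */
typedef struct { int users; char text[32]; } ToyBuf;
typedef struct { ToyBuf *buf; } ToyStr;

static ToyStr toy_str_new(const char *s)
{
    ToyStr sv;

    assert(s != NULL);
    sv.buf = malloc(sizeof *sv.buf);
    if (!sv.buf) abort();
    sv.buf->users = 1;
    strncpy(sv.buf->text, s, sizeof sv.buf->text - 1);
    sv.buf->text[sizeof sv.buf->text - 1] = '\0';
    return sv;
}

/* "Copying" only shares the buffer; no bytes are copied yet. */
static ToyStr toy_str_copy(ToyStr src)
{
    src.buf->users++;
    return src;
}

/* Plays the role of SvIsCOW(): is the buffer shared? */
static int toy_is_cow(const ToyStr *sv)
{
    return sv->buf->users > 1;
}

/* Plays the role of sv_force_normal(): take a private copy if shared. */
static void toy_force_normal(ToyStr *sv)
{
    if (toy_is_cow(sv)) {
        ToyBuf *own = malloc(sizeof *own);
        if (!own) abort();
        *own = *sv->buf;
        own->users = 1;
        sv->buf->users--;
        sv->buf = own;
    }
}

/* A safe in-place edit: unshare first, then write. */
static void toy_set_first_char(ToyStr *sv, char c)
{
    toy_force_normal(sv);
    sv->buf->text[0] = c;
}

static void toy_str_free(ToyStr *sv)
{
    if (--sv->buf->users == 0)
        free(sv->buf);
    sv->buf = NULL;
}
```

       Writing through toy_set_first_char() leaves the other scalar's text
       untouched; writing straight into a shared buffer would change both,
       which is the mistake this section warns about.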
1064
1065   Magic Variables
1066       [This section still under construction.  Ignore everything here.  Post
1067       no bills.  Everything not permitted is forbidden.]
1068
1069       Any SV may be magical, that is, it has special features that a normal
1070       SV does not have.  These features are stored in the SV structure in a
1071       linked list of "struct magic"'s, typedef'ed to "MAGIC".
1072
1073           struct magic {
1074               MAGIC*      mg_moremagic;
1075               MGVTBL*     mg_virtual;
1076               U16         mg_private;
1077               char        mg_type;
1078               U8          mg_flags;
1079               I32         mg_len;
1080               SV*         mg_obj;
1081               char*       mg_ptr;
1082           };
1083
1084       Note this is current as of patchlevel 0, and could change at any time.
1085
1086   Assigning Magic
1087       Perl adds magic to an SV using the sv_magic function:
1088
1089         void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen);
1090
1091       The "sv" argument is a pointer to the SV that is to acquire a new
1092       magical feature.
1093
1094       If "sv" is not already magical, Perl uses the "SvUPGRADE" macro to
1095       convert "sv" to type "SVt_PVMG".  Perl then continues by adding new
1096       magic to the beginning of the linked list of magical features.  Any
1097       prior entry of the same type of magic is deleted.  Note that this can
1098       be overridden, and multiple instances of the same type of magic can be
1099       associated with an SV.
1100
1101       The "name" and "namlen" arguments are used to associate a string with
1102       the magic, typically the name of a variable.  "namlen" is stored in the
1103       "mg_len" field and if "name" is non-null then either a "savepvn" copy
1104       of "name" or "name" itself is stored in the "mg_ptr" field, depending
1105       on whether "namlen" is greater than zero or equal to zero respectively.
1106       As a special case, if "(name && namlen == HEf_SVKEY)" then "name" is
1107       assumed to contain an "SV*" and is stored as-is with its REFCNT
1108       incremented.
1109
1110       The sv_magic function uses "how" to determine which, if any, predefined
1111       "Magic Virtual Table" should be assigned to the "mg_virtual" field.
1112       See the "Magic Virtual Tables" section below.  The "how" argument is
1113       also stored in the "mg_type" field.  The value of "how" should be
1114       chosen from the set of macros "PERL_MAGIC_foo" found in perl.h.  Note
1115       that before these macros were added, Perl internals used to directly
1116       use character literals, so you may occasionally come across old code or
1117       documentation referring to 'U' magic rather than "PERL_MAGIC_uvar" for
1118       example.
1119
1120       The "obj" argument is stored in the "mg_obj" field of the "MAGIC"
1121       structure.  If it is not the same as the "sv" argument, the reference
1122       count of the "obj" object is incremented.  If it is the same, or if the
1123       "how" argument is "PERL_MAGIC_arylen", "PERL_MAGIC_regdatum",
1124       "PERL_MAGIC_regdata", or if it is a NULL pointer, then "obj" is merely
1125       stored, without the reference count being incremented.
1126
1127       See also "sv_magicext" in perlapi for a more flexible way to add magic
1128       to an SV.
1129
1130       There is also a function to add magic to an "HV":
1131
1132           void hv_magic(HV *hv, GV *gv, int how);
1133
1134       This simply calls "sv_magic" and coerces the "gv" argument into an
1135       "SV".
1136
1137       To remove the magic from an SV, call the function sv_unmagic:
1138
1139           int sv_unmagic(SV *sv, int type);
1140
1141       The "type" argument should be equal to the "how" value when the "SV"
1142       was initially made magical.
1143
1144       However, note that "sv_unmagic" removes all magic of a certain "type"
1145       from the "SV".  If you want to remove only certain magic of a "type"
1146       based on the magic virtual table, use "sv_unmagicext" instead:
1147
1148           int sv_unmagicext(SV *sv, int type, MGVTBL *vtbl);
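
       The difference between the two removal functions can be sketched on a
       toy linked list.  The code below is standalone and does not use the
       Perl headers; "ToyMagic", "ToyMagicalSV" and the "toy_*" functions are
       hypothetical, and unlike the real functions the toy removers return
       the number of entries removed.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct ToyMagic {
    struct ToyMagic *mg_moremagic;  /* next entry in the list */
    const void      *mg_virtual;    /* identifies the vtable in use */
    char             mg_type;
} ToyMagic;

typedef struct { ToyMagic *magic; } ToyMagicalSV;

/* Push a new entry at the head of the list; duplicates are allowed. */
static void toy_magic_add(ToyMagicalSV *sv, char how, const void *vtbl)
{
    ToyMagic *mg = malloc(sizeof *mg);
    if (!mg) abort();
    mg->mg_type = how;
    mg->mg_virtual = vtbl;
    mg->mg_moremagic = sv->magic;
    sv->magic = mg;
}

/* Remove entries of `type`; when `vtbl` is non-NULL, only those
 * entries that also use that vtable. */
static int toy_magic_remove(ToyMagicalSV *sv, char type, const void *vtbl)
{
    ToyMagic **link = &sv->magic;
    int removed = 0;

    while (*link) {
        ToyMagic *mg = *link;
        if (mg->mg_type == type && (!vtbl || mg->mg_virtual == vtbl)) {
            *link = mg->mg_moremagic;   /* unlink, then release */
            free(mg);
            removed++;
        } else {
            link = &mg->mg_moremagic;
        }
    }
    return removed;
}

/* Modelled on sv_unmagic(): every entry of the type goes. */
static int toy_unmagic(ToyMagicalSV *sv, char type)
{
    return toy_magic_remove(sv, type, NULL);
}

/* Modelled on sv_unmagicext(): only entries with a matching vtable. */
static int toy_unmagicext(ToyMagicalSV *sv, char type, const void *vtbl)
{
    assert(vtbl != NULL);
    return toy_magic_remove(sv, type, vtbl);
}

static int toy_magic_count(const ToyMagicalSV *sv)
{
    const ToyMagic *mg;
    int n = 0;
    for (mg = sv->magic; mg; mg = mg->mg_moremagic)
        n++;
    return n;
}
```

       This matters when two extensions attach the same type of magic to one
       SV: the vtable-aware remover lets each of them take away only its own
       entry.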
1149
1150   Magic Virtual Tables
1151       The "mg_virtual" field in the "MAGIC" structure is a pointer to an
1152       "MGVTBL" ("Magic Virtual Table"), a structure of function pointers
1153       that handle the various operations that might be applied to that
1154       variable.
1155
1156       The "MGVTBL" has five (or sometimes eight) pointers to the following
1157       routine types:
1158
1159           int  (*svt_get)  (pTHX_ SV* sv, MAGIC* mg);
1160           int  (*svt_set)  (pTHX_ SV* sv, MAGIC* mg);
1161           U32  (*svt_len)  (pTHX_ SV* sv, MAGIC* mg);
1162           int  (*svt_clear)(pTHX_ SV* sv, MAGIC* mg);
1163           int  (*svt_free) (pTHX_ SV* sv, MAGIC* mg);
1164
1165           int  (*svt_copy) (pTHX_ SV *sv, MAGIC* mg, SV *nsv,
1166                                                 const char *name, I32 namlen);
1167           int  (*svt_dup)  (pTHX_ MAGIC *mg, CLONE_PARAMS *param);
1168           int  (*svt_local)(pTHX_ SV *nsv, MAGIC *mg);
1169
1170       This MGVTBL structure is set at compile-time in perl.h and there are
1171       currently 32 types.  These different structures contain pointers to
1172       various routines that perform additional actions depending on which
1173       function is being called.
1174
1175          Function pointer    Action taken
1176          ----------------    ------------
1177          svt_get             Do something before the value of the SV is
1178                              retrieved.
1179          svt_set             Do something after the SV is assigned a value.
1180          svt_len             Report on the SV's length.
1181          svt_clear           Clear something the SV represents.
1182          svt_free            Free any extra storage associated with the SV.
1183
1184          svt_copy            copy tied variable magic to a tied element
1185          svt_dup             duplicate a magic structure during thread cloning
1186          svt_local           copy magic to local value during 'local'
1187
1188       For instance, the MGVTBL structure called "vtbl_sv" (which corresponds
1189       to an "mg_type" of "PERL_MAGIC_sv") contains:
1190
1191           { magic_get, magic_set, magic_len, 0, 0 }
1192
1193       Thus, when an SV is determined to be magical and of type
1194       "PERL_MAGIC_sv", if a get operation is being performed, the routine
1195       "magic_get" is called.  All the various routines for the various
1196       magical types begin with "magic_".  NOTE: the magic routines are not
1197       considered part of the Perl API, and may not be exported by the Perl
1198       library.
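
       The dispatch just described can be pictured with a standalone model of
       a five-slot table.  None of this is the real implementation: "ToyVar",
       "ToyMAGIC", "ToyMGVTBL", "toy_get" and the clock hook are hypothetical,
       and the pTHX_ context argument is left out.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyVar   ToyVar;
typedef struct ToyMAGIC ToyMAGIC;

/* Five slots, in the same order as the first five MGVTBL entries. */
typedef struct {
    int      (*svt_get)  (ToyVar *sv, ToyMAGIC *mg);
    int      (*svt_set)  (ToyVar *sv, ToyMAGIC *mg);
    unsigned (*svt_len)  (ToyVar *sv, ToyMAGIC *mg);
    int      (*svt_clear)(ToyVar *sv, ToyMAGIC *mg);
    int      (*svt_free) (ToyVar *sv, ToyMAGIC *mg);
} ToyMGVTBL;

struct ToyMAGIC {
    ToyMAGIC        *mg_moremagic;
    const ToyMGVTBL *mg_virtual;
};

struct ToyVar {
    long      value;
    ToyMAGIC *magic;
};

static long toy_clock = 0;

/* A get hook: refresh the stored value just before it is read. */
static int toy_clock_get(ToyVar *sv, ToyMAGIC *mg)
{
    (void)mg;
    sv->value = ++toy_clock;
    return 0;
}

/* Only the get slot is filled in; the empty slots are skipped. */
static const ToyMGVTBL toy_vtbl_clock = {
    toy_clock_get, NULL, NULL, NULL, NULL
};

/* Read the value, giving each magic entry's get hook a turn first. */
static long toy_get(ToyVar *sv)
{
    ToyMAGIC *mg;

    assert(sv != NULL);
    for (mg = sv->magic; mg; mg = mg->mg_moremagic)
        if (mg->mg_virtual && mg->mg_virtual->svt_get)
            mg->mg_virtual->svt_get(sv, mg);
    return sv->value;
}
```

       A variable with no magic simply returns its stored value, while one
       carrying the clock table produces a fresh value on every read.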
1199
1200       The last three slots are a recent addition, and for source code
1201       compatibility they are only checked for if one of the three flags
1202       MGf_COPY, MGf_DUP or MGf_LOCAL is set in mg_flags.  This means that
1203       most code can continue declaring a vtable as a 5-element value.  These
1204       three are currently used exclusively by the threading code, and are
1205       highly subject to change.
1206
1207       The current kinds of Magic Virtual Tables are:
1208
1209        mg_type
1210        (old-style char and macro)   MGVTBL         Type of magic
1211        --------------------------   ------         -------------
1212        \0 PERL_MAGIC_sv             vtbl_sv        Special scalar variable
1213        #  PERL_MAGIC_arylen         vtbl_arylen    Array length ($#ary)
1214        %  PERL_MAGIC_rhash          (none)         Extra data for restricted
1215                                                    hashes
1216        *  PERL_MAGIC_debugvar       vtbl_debugvar  $DB::single, signal, trace
1217                                                    vars
1218        .  PERL_MAGIC_pos            vtbl_pos       pos() lvalue
1219        :  PERL_MAGIC_symtab         (none)         Extra data for symbol
1220                                                    tables
1221        <  PERL_MAGIC_backref        vtbl_backref   For weak ref data
1222        @  PERL_MAGIC_arylen_p       (none)         To move arylen out of XPVAV
1223        B  PERL_MAGIC_bm             vtbl_regexp    Boyer-Moore
1224                                                    (fast string search)
1225        c  PERL_MAGIC_overload_table vtbl_ovrld     Holds overload table
1226                                                    (AMT) on stash
1227        D  PERL_MAGIC_regdata        vtbl_regdata   Regex match position data
1228                                                    (@+ and @- vars)
1229        d  PERL_MAGIC_regdatum       vtbl_regdatum  Regex match position data
1230                                                    element
1231        E  PERL_MAGIC_env            vtbl_env       %ENV hash
1232        e  PERL_MAGIC_envelem        vtbl_envelem   %ENV hash element
1233        f  PERL_MAGIC_fm             vtbl_regexp    Formline
1234                                                    ('compiled' format)
1235        g  PERL_MAGIC_regex_global   vtbl_mglob     m//g target
1236        H  PERL_MAGIC_hints          vtbl_hints     %^H hash
1237        h  PERL_MAGIC_hintselem      vtbl_hintselem %^H hash element
1238        I  PERL_MAGIC_isa            vtbl_isa       @ISA array
1239        i  PERL_MAGIC_isaelem        vtbl_isaelem   @ISA array element
1240        k  PERL_MAGIC_nkeys          vtbl_nkeys     scalar(keys()) lvalue
1241        L  PERL_MAGIC_dbfile         (none)         Debugger %_<filename
1242        l  PERL_MAGIC_dbline         vtbl_dbline    Debugger %_<filename
1243                                                    element
1244        N  PERL_MAGIC_shared         (none)         Shared between threads
1245        n  PERL_MAGIC_shared_scalar  (none)         Shared between threads
1246        o  PERL_MAGIC_collxfrm       vtbl_collxfrm  Locale transformation
1247        P  PERL_MAGIC_tied           vtbl_pack      Tied array or hash
1248        p  PERL_MAGIC_tiedelem       vtbl_packelem  Tied array or hash element
1249        q  PERL_MAGIC_tiedscalar     vtbl_packelem  Tied scalar or handle
1250        r  PERL_MAGIC_qr             vtbl_regexp    Precompiled qr// regex
1251        S  PERL_MAGIC_sig            (none)         %SIG hash
1252        s  PERL_MAGIC_sigelem        vtbl_sigelem   %SIG hash element
1253        t  PERL_MAGIC_taint          vtbl_taint     Taintedness
1254        U  PERL_MAGIC_uvar           vtbl_uvar      Available for use by
1255                                                    extensions
1256        u  PERL_MAGIC_uvar_elem      (none)         Reserved for use by
1257                                                    extensions
1258        V  PERL_MAGIC_vstring        (none)         SV was vstring literal
1259        v  PERL_MAGIC_vec            vtbl_vec       vec() lvalue
1260        w  PERL_MAGIC_utf8           vtbl_utf8      Cached UTF-8 information
1261        x  PERL_MAGIC_substr         vtbl_substr    substr() lvalue
1262        Y  PERL_MAGIC_nonelem        vtbl_nonelem   Array element that does not
1263                                                    exist
1264        y  PERL_MAGIC_defelem        vtbl_defelem   Shadow "foreach" iterator
1265                                                    variable / smart parameter
1266                                                    vivification
1267        \  PERL_MAGIC_lvref          vtbl_lvref     Lvalue reference
1268                                                    constructor
1269        ]  PERL_MAGIC_checkcall      vtbl_checkcall Inlining/mutation of call
1270                                                    to this CV
1271        ~  PERL_MAGIC_ext            (none)         Available for use by
1272                                                    extensions
1273
1274       When an uppercase and lowercase letter both exist in the table, then
1275       the uppercase letter is typically used to represent some kind of
1276       composite type (a list or a hash), and the lowercase letter is used to
1277       represent an element of that composite type.  Some internals code makes
1278       use of this case relationship.  However, 'v' and 'V' (vec and v-string)
1279       are in no way related.
1280
1281       The "PERL_MAGIC_ext" and "PERL_MAGIC_uvar" magic types are defined
1282       specifically for use by extensions and will not be used by perl itself.
1283       Extensions can use "PERL_MAGIC_ext" magic to 'attach' private
1284       information to variables (typically objects).  This is especially
1285       useful because there is no way for normal perl code to corrupt this
1286       private information (unlike using extra elements of a hash object).
1287
1288       Similarly, "PERL_MAGIC_uvar" magic can be used much like tie() to call
1289       a C function any time a scalar's value is used or changed.  The
1290       "MAGIC"'s "mg_ptr" field points to a "ufuncs" structure:
1291
           struct ufuncs {
               I32 (*uf_val)(pTHX_ IV, SV*);
               I32 (*uf_set)(pTHX_ IV, SV*);
               IV uf_index;
           };
1297
1298       When the SV is read from or written to, the "uf_val" or "uf_set"
1299       function will be called with "uf_index" as the first arg and a pointer
1300       to the SV as the second.  A simple example of how to add
1301       "PERL_MAGIC_uvar" magic is shown below.  Note that the ufuncs structure
1302       is copied by sv_magic, so you can safely allocate it on the stack.
1303
           void
           Umagic(sv)
               SV *sv;
           PREINIT:
               struct ufuncs uf;
           CODE:
               /* my_get_fn and my_set_fn are assumed to be defined elsewhere */
               uf.uf_val   = &my_get_fn;
               uf.uf_set   = &my_set_fn;
               uf.uf_index = 0;
               /* sv_magic copies uf, so a stack-allocated struct is fine */
               sv_magic(sv, 0, PERL_MAGIC_uvar, (char*)&uf, sizeof(uf));
1314
1315       Attaching "PERL_MAGIC_uvar" to arrays is permissible but has no effect.
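
       The way "uf_val" and "uf_set" are dispatched for a scalar can be
       illustrated with a toy model in plain C.  This is only a sketch of the
       idea, not perl's implementation: every "toy_*" name is hypothetical,
       the scalar holds just a "long", and the "pTHX_" context argument is
       left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; none of these are perl's real types. */
typedef struct toy_sv toy_sv;

typedef struct {
    int  (*uf_val)(long index, toy_sv *sv);  /* called when the value is read    */
    int  (*uf_set)(long index, toy_sv *sv);  /* called when the value is written */
    long uf_index;                           /* handed back as the first argument */
} toy_ufuncs;

struct toy_sv {
    long       value;
    int        has_hooks;   /* non-zero once hooks are attached */
    toy_ufuncs hooks;       /* private copy of the caller's struct */
};

/* Attach hooks by value, so the caller's struct may live on the stack. */
void toy_attach(toy_sv *sv, const toy_ufuncs *uf) {
    assert(sv != NULL && uf != NULL);
    sv->hooks = *uf;
    sv->has_hooks = 1;
}

/* Reading runs the 'get' hook first, then returns the value. */
long toy_read(toy_sv *sv) {
    if (sv->has_hooks && sv->hooks.uf_val)
        sv->hooks.uf_val(sv->hooks.uf_index, sv);
    return sv->value;
}

/* Writing stores the value, then runs the 'set' hook. */
void toy_write(toy_sv *sv, long v) {
    sv->value = v;
    if (sv->has_hooks && sv->hooks.uf_set)
        sv->hooks.uf_set(sv->hooks.uf_index, sv);
}

/* Example hooks: count how often each one fires. */
int toy_reads, toy_writes;
int toy_count_read(long index, toy_sv *sv)  { (void)index; (void)sv; return ++toy_reads; }
int toy_count_write(long index, toy_sv *sv) { (void)index; (void)sv; return ++toy_writes; }
```

       Because toy_attach() copies the structure, the caller may build it in
       a local variable, which mirrors the note above about sv_magic.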
1316
1317       For hashes there is a specialized hook that gives control over hash
1318       keys (but not values).  This hook calls "PERL_MAGIC_uvar" 'get' magic
1319       if the "set" function in the "ufuncs" structure is NULL.  The hook is
1320       activated whenever the hash is accessed with a key specified as an "SV"
1321       through the functions "hv_store_ent", "hv_fetch_ent", "hv_delete_ent",
1322       and "hv_exists_ent".  Accessing the key as a string through the
1323       functions without the "..._ent" suffix circumvents the hook.  See
1324       "GUTS" in Hash::Util::FieldHash for a detailed description.
1325
1326       Note that because multiple extensions may be using "PERL_MAGIC_ext" or
1327       "PERL_MAGIC_uvar" magic, it is important for extensions to take extra
1328       care to avoid conflict.  Typically only using the magic on objects
1329       blessed into the same class as the extension is sufficient.  For
1330       "PERL_MAGIC_ext" magic, it is usually a good idea to define an
1331       "MGVTBL", even if all its fields will be 0, so that individual "MAGIC"
1332       pointers can be identified as a particular kind of magic using their
1333       magic virtual table.  "mg_findext" provides an easy way to do that:
1334
           STATIC MGVTBL my_vtbl = { 0, 0, 0, 0, 0, 0, 0, 0 };

           MAGIC *mg;
           if ((mg = mg_findext(sv, PERL_MAGIC_ext, &my_vtbl))) {
               /* this is really ours, not another module's PERL_MAGIC_ext */
               my_priv_data_t *priv = (my_priv_data_t *)mg->mg_ptr;
               ...
           }
1343
1344       Also note that the "sv_set*()" and "sv_cat*()" functions described
1345       earlier do not invoke 'set' magic on their targets.  This must be done
1346       by the user either by calling the "SvSETMAGIC()" macro after calling
1347       these functions, or by using one of the "sv_set*_mg()" or
1348       "sv_cat*_mg()" functions.  Similarly, generic C code must call the
       "SvGETMAGIC()" macro to invoke any 'get' magic if it uses an SV
1350       obtained from external sources in functions that don't handle magic.
1351       See perlapi for a description of these functions.  For example, calls
1352       to the "sv_cat*()" functions typically need to be followed by
1353       "SvSETMAGIC()", but they don't need a prior "SvGETMAGIC()" since their
1354       implementation handles 'get' magic.
1355
1356   Finding Magic
1357           MAGIC *mg_find(SV *sv, int type); /* Finds the magic pointer of that
1358                                              * type */
1359
1360       This routine returns a pointer to a "MAGIC" structure stored in the SV.
1361       If the SV does not have that magical feature, "NULL" is returned.  If
1362       the SV has multiple instances of that magical feature, the first one
1363       will be returned.  "mg_findext" can be used to find a "MAGIC" structure
1364       of an SV based on both its magic type and its magic virtual table:
1365
1366           MAGIC *mg_findext(SV *sv, int type, MGVTBL *vtbl);
1367
1368       Also, if the SV passed to "mg_find" or "mg_findext" is not of type
1369       SVt_PVMG, Perl may core dump.
1370
1371           int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen);
1372
1373       This routine checks to see what types of magic "sv" has.  If the
1374       mg_type field is an uppercase letter, then the mg_obj is copied to
1375       "nsv", but the mg_type field is changed to be the lowercase letter.
1376
1377   Understanding the Magic of Tied Hashes and Arrays
1378       Tied hashes and arrays are magical beasts of the "PERL_MAGIC_tied"
1379       magic type.
1380
1381       WARNING: As of the 5.004 release, proper usage of the array and hash
1382       access functions requires understanding a few caveats.  Some of these
1383       caveats are actually considered bugs in the API, to be fixed in later
1384       releases, and are bracketed with [MAYCHANGE] below.  If you find
1385       yourself actually applying such information in this section, be aware
1386       that the behavior may change in the future, umm, without warning.
1387
       The perl tie function associates a variable with an object that
       implements the various FETCH, STORE, etc. methods.  To perform the
1390       equivalent of the perl tie function from an XSUB, you must mimic this
1391       behaviour.  The code below carries out the necessary steps -- firstly
1392       it creates a new hash, and then creates a second hash which it blesses
1393       into the class which will implement the tie methods.  Lastly it ties
1394       the two hashes together, and returns a reference to the new tied hash.
1395       Note that the code below does NOT call the TIEHASH method in the MyTie
1396       class - see "Calling Perl Routines from within C Programs" for details
1397       on how to do this.
1398
           SV*
           mytie()
           PREINIT:
               HV *hash;
               HV *stash;
               SV *tie;
           CODE:
               hash = newHV();                    /* the hash to be tied */
               tie = newRV_noinc((SV*)newHV());   /* object implementing the tie */
               stash = gv_stashpv("MyTie", GV_ADD);
               sv_bless(tie, stash);
               hv_magic(hash, (GV*)tie, PERL_MAGIC_tied);
               RETVAL = newRV_noinc((SV*)hash);
           OUTPUT:
               RETVAL
1414
1415       The "av_store" function, when given a tied array argument, merely
1416       copies the magic of the array onto the value to be "stored", using
1417       "mg_copy".  It may also return NULL, indicating that the value did not
1418       actually need to be stored in the array.  [MAYCHANGE] After a call to
1419       "av_store" on a tied array, the caller will usually need to call
1420       "mg_set(val)" to actually invoke the perl level "STORE" method on the
       TIEARRAY object.  If "av_store" did return NULL, a call to
       "SvREFCNT_dec(val)" will usually also be necessary to avoid a memory
       leak. [/MAYCHANGE]
1424
1425       The previous paragraph is applicable verbatim to tied hash access using
1426       the "hv_store" and "hv_store_ent" functions as well.
1427
1428       "av_fetch" and the corresponding hash functions "hv_fetch" and
1429       "hv_fetch_ent" actually return an undefined mortal value whose magic
1430       has been initialized using "mg_copy".  Note the value so returned does
1431       not need to be deallocated, as it is already mortal.  [MAYCHANGE] But
1432       you will need to call "mg_get()" on the returned value in order to
1433       actually invoke the perl level "FETCH" method on the underlying TIE
1434       object.  Similarly, you may also call "mg_set()" on the return value
1435       after possibly assigning a suitable value to it using "sv_setsv",
1436       which will invoke the "STORE" method on the TIE object. [/MAYCHANGE]
1437
1438       [MAYCHANGE] In other words, the array or hash fetch/store functions
1439       don't really fetch and store actual values in the case of tied arrays
1440       and hashes.  They merely call "mg_copy" to attach magic to the values
1441       that were meant to be "stored" or "fetched".  Later calls to "mg_get"
1442       and "mg_set" actually do the job of invoking the TIE methods on the
1443       underlying objects.  Thus the magic mechanism currently implements a
1444       kind of lazy access to arrays and hashes.
1445
1446       Currently (as of perl version 5.004), use of the hash and array access
1447       functions requires the user to be aware of whether they are operating
1448       on "normal" hashes and arrays, or on their tied variants.  The API may
1449       be changed to provide more transparent access to both tied and normal
1450       data types in future versions.  [/MAYCHANGE]
1451
1452       You would do well to understand that the TIEARRAY and TIEHASH
1453       interfaces are mere sugar to invoke some perl method calls while using
1454       the uniform hash and array syntax.  The use of this sugar imposes some
1455       overhead (typically about two to four extra opcodes per FETCH/STORE
1456       operation, in addition to the creation of all the mortal variables
1457       required to invoke the methods).  This overhead will be comparatively
1458       small if the TIE methods are themselves substantial, but if they are
1459       only a few statements long, the overhead will not be insignificant.
1460
1461   Localizing changes
1462       Perl has a very handy construction
1463
1464         {
1465           local $var = 2;
1466           ...
1467         }
1468
1469       This construction is approximately equivalent to
1470
1471         {
1472           my $oldvar = $var;
1473           $var = 2;
1474           ...
1475           $var = $oldvar;
1476         }
1477
1478       The biggest difference is that the first construction would reinstate
1479       the initial value of $var, irrespective of how control exits the block:
1480       "goto", "return", "die"/"eval", etc.  It is a little bit more efficient
1481       as well.
1482
       There is a way to achieve a similar task from C via the Perl API:
       create a pseudo-block, and arrange for some changes to be
       automatically undone at the end of it, either explicitly or via a
       non-local exit (via die()).
1486       A block-like construct is created by a pair of "ENTER"/"LEAVE" macros
1487       (see "Returning a Scalar" in perlcall).  Such a construct may be
1488       created specially for some important localized task, or an existing one
1489       (like boundaries of enclosing Perl subroutine/block, or an existing
1490       pair for freeing TMPs) may be used.  (In the second case the overhead
1491       of additional localization must be almost negligible.)  Note that any
1492       XSUB is automatically enclosed in an "ENTER"/"LEAVE" pair.
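
       The save-and-restore idea behind "ENTER"/"LEAVE" can be illustrated
       with a toy model in plain C.  The sketch is not perl's implementation:
       all "toy_*" names are hypothetical, only "int" variables are handled,
       and the fixed-size arrays are an assumption made to keep it short.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SAVE_MAX 32

/* One saved (address, old value) pair. */
typedef struct {
    int *where;
    int  old;
} toy_save_entry;

static toy_save_entry toy_savestack[TOY_SAVE_MAX];
static size_t toy_save_top;                 /* entries currently saved */
static size_t toy_scope_marks[TOY_SAVE_MAX];
static size_t toy_scope_depth;              /* open pseudo-blocks      */

/* In the spirit of ENTER: remember where this block's entries start. */
void toy_enter(void) {
    assert(toy_scope_depth < TOY_SAVE_MAX);
    toy_scope_marks[toy_scope_depth++] = toy_save_top;
}

/* In the spirit of SAVEINT(i): record the current value for later. */
void toy_save_int(int *p) {
    assert(p != NULL && toy_save_top < TOY_SAVE_MAX);
    toy_savestack[toy_save_top].where = p;
    toy_savestack[toy_save_top].old   = *p;
    toy_save_top++;
}

/* In the spirit of LEAVE: undo, newest first, everything saved since
 * the matching toy_enter(). */
void toy_leave(void) {
    size_t mark;
    assert(toy_scope_depth > 0);
    mark = toy_scope_marks[--toy_scope_depth];
    while (toy_save_top > mark) {
        toy_save_entry *e = &toy_savestack[--toy_save_top];
        *e->where = e->old;
    }
}
```

       With a variable holding 1, calling toy_enter(), toy_save_int(&var),
       assigning 2 and then calling toy_leave() puts 1 back, in the same
       spirit as "local $var = 2".  Nested blocks unwind independently
       because each toy_leave() stops at its own mark.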
1493
       Inside such a pseudo-block the following services are available:
1495
1496       "SAVEINT(int i)"
1497       "SAVEIV(IV i)"
1498       "SAVEI32(I32 i)"
1499       "SAVELONG(long i)"
           These macros arrange things to restore the value of integer
           variable "i" at the end of the enclosing pseudo-block.
1502
1503       SAVESPTR(s)
1504       SAVEPPTR(p)
1505           These macros arrange things to restore the value of pointers "s"
1506           and "p".  "s" must be a pointer of a type which survives conversion
1507           to "SV*" and back, "p" should be able to survive conversion to
1508           "char*" and back.
1509
1510       "SAVEFREESV(SV *sv)"
1511           The refcount of "sv" will be decremented at the end of pseudo-
1512           block.  This is similar to "sv_2mortal" in that it is also a
1513           mechanism for doing a delayed "SvREFCNT_dec".  However, while
1514           "sv_2mortal" extends the lifetime of "sv" until the beginning of
1515           the next statement, "SAVEFREESV" extends it until the end of the
1516           enclosing scope.  These lifetimes can be wildly different.
1517
1518           Also compare "SAVEMORTALIZESV".
1519
1520       "SAVEMORTALIZESV(SV *sv)"
1521           Just like "SAVEFREESV", but mortalizes "sv" at the end of the
1522           current scope instead of decrementing its reference count.  This
1523           usually has the effect of keeping "sv" alive until the statement
1524           that called the currently live scope has finished executing.
1525
1526       "SAVEFREEOP(OP *op)"
1527           The "OP *" is op_free()ed at the end of pseudo-block.
1528
1529       SAVEFREEPV(p)
1530           The chunk of memory which is pointed to by "p" is Safefree()ed at
1531           the end of pseudo-block.
1532
1533       "SAVECLEARSV(SV *sv)"
1534           Clears a slot in the current scratchpad which corresponds to "sv"
1535           at the end of pseudo-block.
1536
1537       "SAVEDELETE(HV *hv, char *key, I32 length)"
1538           The key "key" of "hv" is deleted at the end of pseudo-block.  The
1539           string pointed to by "key" is Safefree()ed.  If one has a key in
1540           short-lived storage, the corresponding string may be reallocated
1541           like this:
1542
1543             SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf));
1544
1545       "SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)"
1546           At the end of pseudo-block the function "f" is called with the only
1547           argument "p".
1548
1549       "SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)"
1550           At the end of pseudo-block the function "f" is called with the
1551           implicit context argument (if any), and "p".
1552
1553       "SAVESTACK_POS()"
1554           The current offset on the Perl internal stack (cf. "SP") is
1555           restored at the end of pseudo-block.
1556
1557       The following API list contains functions, thus one needs to provide
1558       pointers to the modifiable data explicitly (either C pointers, or
1559       Perlish "GV *"s).  Where the above macros take "int", a similar
1560       function takes "int *".
1561
1562       "SV* save_scalar(GV *gv)"
1563           Equivalent to Perl code "local $gv".
1564
1565       "AV* save_ary(GV *gv)"
1566       "HV* save_hash(GV *gv)"
1567           Similar to "save_scalar", but localize @gv and %gv.
1568
1569       "void save_item(SV *item)"
           Duplicates the current value of "SV".  On exit from the current
           "ENTER"/"LEAVE" pseudo-block, the value of "SV" is restored from
           the stored copy.  It doesn't handle magic.  Use "save_scalar" if
           magic is affected.
1574
1575       "void save_list(SV **sarg, I32 maxsarg)"
1576           A variant of "save_item" which takes multiple arguments via an
1577           array "sarg" of "SV*" of length "maxsarg".
1578
1579       "SV* save_svref(SV **sptr)"
1580           Similar to "save_scalar", but will reinstate an "SV *".
1581
1582       "void save_aptr(AV **aptr)"
1583       "void save_hptr(HV **hptr)"
1584           Similar to "save_svref", but localize "AV *" and "HV *".
1585
1586       The "Alias" module implements localization of the basic types within
1587       the caller's scope.  People who are interested in how to localize
1588       things in the containing scope should take a look there too.
1589

Subroutines

1591   XSUBs and the Argument Stack
1592       The XSUB mechanism is a simple way for Perl programs to access C
1593       subroutines.  An XSUB routine will have a stack that contains the
1594       arguments from the Perl program, and a way to map from the Perl data
1595       structures to a C equivalent.
1596
1597       The stack arguments are accessible through the ST(n) macro, which
1598       returns the "n"'th stack argument.  Argument 0 is the first argument
1599       passed in the Perl subroutine call.  These arguments are "SV*", and can
1600       be used anywhere an "SV*" is used.
1601
1602       Most of the time, output from the C routine can be handled through use
1603       of the RETVAL and OUTPUT directives.  However, there are some cases
1604       where the argument stack is not already long enough to handle all the
1605       return values.  An example is the POSIX tzname() call, which takes no
1606       arguments, but returns two, the local time zone's standard and summer
1607       time abbreviations.
1608
1609       To handle this situation, the PPCODE directive is used and the stack is
1610       extended using the macro:
1611
1612           EXTEND(SP, num);
1613
1614       where "SP" is the macro that represents the local copy of the stack
1615       pointer, and "num" is the number of elements the stack should be
1616       extended by.
1617
       Now that there is room on the stack, values can be pushed on it using
       the "PUSHs" macro.  The pushed values will often need to be "mortal"
       (see "Reference Counts and Mortality"):
1621
           PUSHs(sv_2mortal(newSViv(an_integer)))
           PUSHs(sv_2mortal(newSVuv(an_unsigned_integer)))
           PUSHs(sv_2mortal(newSVnv(a_double)))
           PUSHs(sv_2mortal(newSVpv("Some String",0)))
           /* Although the last example is better written as the more
            * efficient: */
           PUSHs(newSVpvs_flags("Some String", SVs_TEMP))
1629
       In the Perl program calling "tzname", the two values are then
       assigned as in:
1632
1633           ($standard_abbrev, $summer_abbrev) = POSIX::tzname;
1634
1635       An alternate (and possibly simpler) method to pushing values on the
1636       stack is to use the macro:
1637
1638           XPUSHs(SV*)
1639
1640       This macro automatically adjusts the stack for you, if needed.  Thus,
1641       you do not need to call "EXTEND" to extend the stack.
1642
       Despite suggestions in earlier versions of this document, the
       macros "(X)PUSH[iunp]" are not suited to XSUBs which return multiple
1645       results.  For that, either stick to the "(X)PUSHs" macros shown above,
1646       or use the new "m(X)PUSH[iunp]" macros instead; see "Putting a C value
1647       on Perl stack".
1648
1649       For more information, consult perlxs and perlxstut.
1650
1651   Autoloading with XSUBs
1652       If an AUTOLOAD routine is an XSUB, as with Perl subroutines, Perl puts
1653       the fully-qualified name of the autoloaded subroutine in the $AUTOLOAD
1654       variable of the XSUB's package.
1655
1656       But it also puts the same information in certain fields of the XSUB
1657       itself:
1658
1659           HV *stash           = CvSTASH(cv);
1660           const char *subname = SvPVX(cv);
1661           STRLEN name_length  = SvCUR(cv); /* in bytes */
1662           U32 is_utf8         = SvUTF8(cv);
1663
1664       "SvPVX(cv)" contains just the sub name itself, not including the
1665       package.  For an AUTOLOAD routine in UNIVERSAL or one of its
1666       superclasses, "CvSTASH(cv)" returns NULL during a method call on a
1667       nonexistent package.
1668
1669       Note: Setting $AUTOLOAD stopped working in 5.6.1, which did not support
1670       XS AUTOLOAD subs at all.  Perl 5.8.0 introduced the use of fields in
1671       the XSUB itself.  Perl 5.16.0 restored the setting of $AUTOLOAD.  If
1672       you need to support 5.8-5.14, use the XSUB's fields.
1673
1674   Calling Perl Routines from within C Programs
1675       There are four routines that can be used to call a Perl subroutine from
1676       within a C program.  These four are:
1677
1678           I32  call_sv(SV*, I32);
1679           I32  call_pv(const char*, I32);
1680           I32  call_method(const char*, I32);
1681           I32  call_argv(const char*, I32, char**);
1682
1683       The routine most often used is "call_sv".  The "SV*" argument contains
1684       either the name of the Perl subroutine to be called, or a reference to
1685       the subroutine.  The second argument consists of flags that control the
1686       context in which the subroutine is called, whether or not the
1687       subroutine is being passed arguments, how errors should be trapped, and
1688       how to treat return values.
1689
       All four routines return the number of values that the subroutine
       returned on the Perl stack.
1692
1693       These routines used to be called "perl_call_sv", etc., before Perl
1694       v5.6.0, but those names are now deprecated; macros of the same name are
1695       provided for compatibility.
1696
       When using any of these routines (except "call_argv"), the programmer
       must manipulate the Perl stack.  The tools for doing so include the
       following macros and functions:
1700
1701           dSP
1702           SP
1703           PUSHMARK()
1704           PUTBACK
1705           SPAGAIN
1706           ENTER
1707           SAVETMPS
1708           FREETMPS
1709           LEAVE
1710           XPUSH*()
1711           POP*()
1712
1713       For a detailed description of calling conventions from C to Perl,
1714       consult perlcall.
1715
1716   Putting a C value on Perl stack
       A lot of opcodes (an opcode is an elementary operation in the internal
       perl stack machine) put an SV* on the stack.  However, as an optimization
1719       the corresponding SV is (usually) not recreated each time.  The opcodes
1720       reuse specially assigned SVs (targets) which are (as a corollary) not
1721       constantly freed/created.
1722
1723       Each of the targets is created only once (but see "Scratchpads and
1724       recursion" below), and when an opcode needs to put an integer, a
1725       double, or a string on stack, it just sets the corresponding parts of
1726       its target and puts the target on stack.
1727
1728       The macro to put this target on stack is "PUSHTARG", and it is directly
1729       used in some opcodes, as well as indirectly in zillions of others,
1730       which use it via "(X)PUSH[iunp]".
1731
1732       Because the target is reused, you must be careful when pushing multiple
1733       values on the stack.  The following code will not do what you think:
1734
1735           XPUSHi(10);
1736           XPUSHi(20);
1737
1738       This translates as "set "TARG" to 10, push a pointer to "TARG" onto the
1739       stack; set "TARG" to 20, push a pointer to "TARG" onto the stack".  At
1740       the end of the operation, the stack does not contain the values 10 and
1741       20, but actually contains two pointers to "TARG", which we have set to
1742       20.
1743
1744       If you need to push multiple different values then you should either
1745       use the "(X)PUSHs" macros, or else use the new "m(X)PUSH[iunp]" macros,
1746       none of which make use of "TARG".  The "(X)PUSHs" macros simply push an
1747       SV* on the stack, which, as noted under "XSUBs and the Argument Stack",
1748       will often need to be "mortal".  The new "m(X)PUSH[iunp]" macros make
1749       this a little easier to achieve by creating a new mortal for you (via
1750       "(X)PUSHmortal"), pushing that onto the stack (extending it if
1751       necessary in the case of the "mXPUSH[iunp]" macros), and then setting
1752       its value.  Thus, instead of writing this to "fix" the example above:
1753
1754           XPUSHs(sv_2mortal(newSViv(10)))
1755           XPUSHs(sv_2mortal(newSViv(20)))
1756
1757       you can simply write:
1758
1759           mXPUSHi(10)
1760           mXPUSHi(20)
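
       The difference between the two styles can be reproduced with a toy
       stack in plain C.  This is a sketch under stated assumptions, not
       perl's code: the "toy_*" names are hypothetical, a value is just a
       "long", and fresh values come from malloc() rather than being real
       mortals.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { long iv; } toy_sv;

#define TOY_STACK_MAX 16
static toy_sv *toy_stack[TOY_STACK_MAX];
static int     toy_sp;      /* number of pointers on the stack */
static toy_sv  toy_targ;    /* the single reusable target      */

/* In the spirit of XPUSHi: set the shared target, push a pointer to it. */
void toy_xpushi(long v) {
    assert(toy_sp < TOY_STACK_MAX);
    toy_targ.iv = v;
    toy_stack[toy_sp++] = &toy_targ;
}

/* In the spirit of mXPUSHi: push a pointer to a freshly made value. */
void toy_mxpushi(long v) {
    toy_sv *sv;
    assert(toy_sp < TOY_STACK_MAX);
    sv = malloc(sizeof *sv);
    assert(sv != NULL);
    sv->iv = v;
    toy_stack[toy_sp++] = sv;
}

/* Read the value that stack slot i currently points at. */
long toy_peek(int i) {
    assert(i >= 0 && i < toy_sp);
    return toy_stack[i]->iv;
}

/* Empty the stack, freeing every value except the shared target. */
void toy_reset(void) {
    while (toy_sp > 0) {
        toy_sv *sv = toy_stack[--toy_sp];
        if (sv != &toy_targ)
            free(sv);
    }
}
```

       After toy_xpushi(10) and toy_xpushi(20), both slots point at the same
       target and both read back 20.  After toy_mxpushi(10) and
       toy_mxpushi(20), the slots read back 10 and 20.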
1761
1762       On a related note, if you do use "(X)PUSH[iunp]", then you're going to
1763       need a "dTARG" in your variable declarations so that the "*PUSH*"
1764       macros can make use of the local variable "TARG".  See also "dTARGET"
1765       and "dXSTARG".
1766
1767   Scratchpads
       The question remains as to when the SVs which are targets for opcodes are
1769       created.  The answer is that they are created when the current unit--a
1770       subroutine or a file (for opcodes for statements outside of
1771       subroutines)--is compiled.  During this time a special anonymous Perl
1772       array is created, which is called a scratchpad for the current unit.
1773
1774       A scratchpad keeps SVs which are lexicals for the current unit and are
1775       targets for opcodes.  A previous version of this document stated that
       one can deduce that an SV lives on a scratchpad by looking at its
1777       flags: lexicals have "SVs_PADMY" set, and targets have "SVs_PADTMP"
1778       set.  But this has never been fully true.  "SVs_PADMY" could be set on
1779       a variable that no longer resides in any pad.  While targets do have
1780       "SVs_PADTMP" set, it can also be set on variables that have never
1781       resided in a pad, but nonetheless act like targets.  As of perl 5.21.5,
1782       the "SVs_PADMY" flag is no longer used and is defined as 0.
1783       "SvPADMY()" now returns true for anything without "SVs_PADTMP".
1784
1785       The correspondence between OPs and targets is not 1-to-1.  Different
1786       OPs in the compile tree of the unit can use the same target, if this
1787       would not conflict with the expected life of the temporary.
1788
1789   Scratchpads and recursion
       In fact it is not 100% true that a compiled unit contains a pointer to
       the scratchpad AV.  Rather, it contains a pointer to an AV of
1792       (initially) one element, and this element is the scratchpad AV.  Why do
1793       we need an extra level of indirection?
1794
       The answer is recursion, and maybe threads.  Both of these can create
       several execution pointers going into the same subroutine.  So that
       the subroutine-child does not write over the temporaries of the
       subroutine-parent (whose lifespan covers the call to the child), the
       parent and the child should have different scratchpads.  (And the
       lexicals should be separate anyway!)
1801
1802       So each subroutine is born with an array of scratchpads (of length 1).
       On each entry to the subroutine it is checked that the current depth of
       the recursion is not more than the length of this array, and if it is,
       a new scratchpad is created and pushed into the array.
1806
1807       The targets on this scratchpad are "undef"s, but they are already
1808       marked with correct flags.
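
       The growth of the scratchpad array with recursion depth can be
       sketched in plain C.  The model below is an illustration only: the
       "toy_*" names are hypothetical, a pad is a small fixed array of
       "long", and keeping a pad around for reuse after the call returns is
       an assumption of the model.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_PAD_SLOTS 4

typedef struct { long slot[TOY_PAD_SLOTS]; } toy_pad;

typedef struct {
    toy_pad **pads;   /* pads[d-1] is used at recursion depth d */
    size_t    len;    /* pads created so far                    */
    size_t    depth;  /* current recursion depth, 0 when idle   */
} toy_padlist;

/* A new subroutine starts with exactly one scratchpad. */
toy_padlist *toy_padlist_new(void) {
    toy_padlist *pl = malloc(sizeof *pl);
    assert(pl != NULL);
    pl->pads = malloc(sizeof *pl->pads);
    assert(pl->pads != NULL);
    pl->pads[0] = calloc(1, sizeof **pl->pads);
    assert(pl->pads[0] != NULL);
    pl->len = 1;
    pl->depth = 0;
    return pl;
}

/* On entry: go one level deeper, adding a pad only if none exists yet. */
toy_pad *toy_sub_enter(toy_padlist *pl) {
    pl->depth++;
    if (pl->depth > pl->len) {
        toy_pad **grown = realloc(pl->pads, pl->depth * sizeof *grown);
        assert(grown != NULL);
        pl->pads = grown;
        pl->pads[pl->len] = calloc(1, sizeof **pl->pads);
        assert(pl->pads[pl->len] != NULL);
        pl->len++;
    }
    return pl->pads[pl->depth - 1];
}

/* On exit: step back up; the pad stays in the list (model assumption). */
void toy_sub_leave(toy_padlist *pl) {
    assert(pl->depth > 0);
    pl->depth--;
}

void toy_padlist_free(toy_padlist *pl) {
    size_t i;
    for (i = 0; i < pl->len; i++)
        free(pl->pads[i]);
    free(pl->pads);
    free(pl);
}
```

       Two nested calls to toy_sub_enter() return different pads, so a value
       written by the inner call cannot overwrite a temporary of the outer
       one.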
1809

Memory Allocation

1811   Allocation
1812       All memory meant to be used with the Perl API functions should be
1813       manipulated using the macros described in this section.  The macros
1814       provide the necessary transparency between differences in the actual
1815       malloc implementation that is used within perl.
1816
1817       It is suggested that you enable the version of malloc that is
1818       distributed with Perl.  It keeps pools of various sizes of unallocated
1819       memory in order to satisfy allocation requests more quickly.  However,
1820       on some platforms, it may cause spurious malloc or free errors.
1821
       The following three macros are used to initially allocate memory:
1823
1824           Newx(pointer, number, type);
1825           Newxc(pointer, number, type, cast);
1826           Newxz(pointer, number, type);
1827
1828       The first argument "pointer" should be the name of a variable that will
1829       point to the newly allocated memory.
1830
1831       The second and third arguments "number" and "type" specify how many of
1832       the specified type of data structure should be allocated.  The argument
1833       "type" is passed to "sizeof".  The final argument to "Newxc", "cast",
1834       should be used if the "pointer" argument is different from the "type"
1835       argument.
1836
1837       Unlike the "Newx" and "Newxc" macros, the "Newxz" macro calls "memzero"
1838       to zero out all the newly allocated memory.
1839
1840   Reallocation
1841           Renew(pointer, number, type);
1842           Renewc(pointer, number, type, cast);
1843           Safefree(pointer)
1844
       These three macros are used to change a memory buffer size or to free a
       piece of memory no longer needed.  The arguments to "Renew" and
       "Renewc" match those of "Newx" and "Newxc" described above.
1849
1850   Moving
1851           Move(source, dest, number, type);
1852           Copy(source, dest, number, type);
1853           Zero(dest, number, type);
1854
1855       These three macros are used to move, copy, or zero out previously
1856       allocated memory.  The "source" and "dest" arguments point to the
1857       source and destination starting points.  Perl will move, copy, or zero
       out "number" instances of the size of the "type" data structure (using
       the "sizeof" operator).
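
       The argument order (source, destination, count, type) can be tried
       out without perl by using stand-in macros.  The "TOY_*" macros below
       are hypothetical and simply wrap the C library; they are not perl's
       definitions.  perlapi documents "Move" as safe for overlapping
       regions and "Copy" as not, which is why the sketch maps them to
       memmove() and memcpy() respectively.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins with the same argument order as the perl macros. */
#define TOY_MOVE(src, dst, n, type) memmove((dst), (src), (n) * sizeof(type))
#define TOY_COPY(src, dst, n, type) memcpy((dst), (src), (n) * sizeof(type))
#define TOY_ZERO(dst, n, type)      memset((dst), 0, (n) * sizeof(type))

/* Shift the first n-1 elements of buf right by one, then clear slot 0. */
void toy_shift_right(int *buf, size_t n) {
    assert(buf != NULL && n > 0);
    TOY_MOVE(buf, buf + 1, n - 1, int);   /* regions overlap, so use MOVE */
    TOY_ZERO(buf, 1, int);
}

/* Duplicate n ints into a separate, non-overlapping buffer. */
void toy_duplicate(const int *src, int *dst, size_t n) {
    assert(src != NULL && dst != NULL);
    TOY_COPY(src, dst, n, int);
}
```

       Note that the count is a number of elements, not bytes; the macro
       multiplies it by the size of the element type.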
1860

PerlIO

1862       The most recent development releases of Perl have been experimenting
1863       with removing Perl's dependency on the "normal" standard I/O suite and
1864       allowing other stdio implementations to be used.  This involves
1865       creating a new abstraction layer that then calls whichever
1866       implementation of stdio Perl was compiled with.  All XSUBs should now
1867       use the functions in the PerlIO abstraction layer and not make any
1868       assumptions about what kind of stdio is being used.
1869
1870       For a complete description of the PerlIO abstraction, consult perlapio.
1871

Compiled code

1873   Code tree
1874       Here we describe the internal form your code is converted to by Perl.
1875       Start with a simple example:
1876
1877         $a = $b + $c;
1878
1879       This is converted to a tree similar to this one:
1880
1881                    assign-to
1882                  /           \
1883                 +             $a
1884               /   \
1885             $b     $c
1886
1887       (but slightly more complicated).  This tree reflects the way Perl
1888       parsed your code, but has nothing to do with the execution order.
1889       There is an additional "thread" going through the nodes of the tree
1890       which shows the order of execution of the nodes.  In our simplified
1891       example above it looks like:
1892
1893            $b ---> $c ---> + ---> $a ---> assign-to
1894
       The actual compile tree for "$a = $b + $c" is different, because some
       nodes are optimized away.  As a result, although the actual tree
       contains more nodes than our simplified example, the execution order is
       the same as in our example.
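
       The distinction between the tree and the execution "thread" can be
       sketched in plain C.  This is a toy model, not perl's op structures:
       the "toy_*" names are hypothetical, every node has at most two
       children, and threading the nodes in post-order is an assumption
       that happens to match this simple expression.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct toy_op {
    const char    *name;
    struct toy_op *first;   /* left child, or NULL  */
    struct toy_op *last;    /* right child, or NULL */
    struct toy_op *next;    /* next op to execute   */
} toy_op;

/* Post-order walk: link each node after its children.  Returns the
 * most recently threaded node. */
static toy_op *toy_thread(toy_op *o, toy_op *prev) {
    if (o == NULL)
        return prev;
    prev = toy_thread(o->first, prev);
    prev = toy_thread(o->last, prev);
    if (prev != NULL)
        prev->next = o;
    o->next = NULL;
    return o;
}

/* Thread the tree, then follow the thread and write the node names,
 * separated by spaces, into out. */
void toy_run_order(toy_op *root, char *out, size_t outlen) {
    toy_op *o = root;
    assert(root != NULL && out != NULL && outlen > 0);
    out[0] = '\0';
    toy_thread(root, NULL);
    while (o->first != NULL)      /* execution starts at the leftmost leaf */
        o = o->first;
    for (; o != NULL; o = o->next) {
        if (out[0] != '\0')
            strncat(out, " ", outlen - strlen(out) - 1);
        strncat(out, o->name, outlen - strlen(out) - 1);
    }
}
```

       Built with "assign-to" at the root, "+" and "$a" as its children, and
       "$b" and "$c" under "+", the thread visits the nodes in the order
       given above, even though the tree is rooted at the assignment.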
1899
1900   Examining the tree
1901       If you have your perl compiled for debugging (usually done with
1902       "-DDEBUGGING" on the "Configure" command line), you may examine the
1903       compiled tree by specifying "-Dx" on the Perl command line.  The output
1904       takes several lines per node, and for "$b+$c" it looks like this:
1905
1906           5           TYPE = add  ===> 6
1907                       TARG = 1
1908                       FLAGS = (SCALAR,KIDS)
1909                       {
1910                           TYPE = null  ===> (4)
1911                             (was rv2sv)
1912                           FLAGS = (SCALAR,KIDS)
1913                           {
1914           3                   TYPE = gvsv  ===> 4
1915                               FLAGS = (SCALAR)
1916                               GV = main::b
1917                           }
1918                       }
1919                       {
1920                           TYPE = null  ===> (5)
1921                             (was rv2sv)
1922                           FLAGS = (SCALAR,KIDS)
1923                           {
1924           4                   TYPE = gvsv  ===> 5
1925                               FLAGS = (SCALAR)
1926                               GV = main::c
1927                           }
1928                       }
1929
       This tree has 5 nodes (one per "TYPE" specifier); only 3 of them are
       not optimized away (one per number in the left column).  The immediate
       children of a given node correspond to "{}" pairs on the same level
       of indentation, so this listing corresponds to the tree:
1934
1935                          add
1936                        /     \
1937                      null    null
1938                       |       |
1939                      gvsv    gvsv
1940
       The execution order is indicated by "===>" marks, thus it is "3 4 5 6"
       (node 6 is not included in the above listing), i.e., "gvsv gvsv add
       whatever".
1944
1945       Each of these nodes represents an op, a fundamental operation inside
1946       the Perl core.  The code which implements each operation can be found
1947       in the pp*.c files; the function which implements the op with type
1948       "gvsv" is "pp_gvsv", and so on.  As the tree above shows, different ops
1949       have different numbers of children: "add" is a binary operator, as one
1950       would expect, and so has two children.  To accommodate the various
1951       different numbers of children, there are various types of op data
1952       structure, and they link together in different ways.
1953
1954       The simplest type of op structure is "OP": this has no children.  Unary
1955       operators, "UNOP"s, have one child, and this is pointed to by the
1956       "op_first" field.  Binary operators ("BINOP"s) have not only an
1957       "op_first" field but also an "op_last" field.  The most complex type of
1958       op is a "LISTOP", which has any number of children.  In this case, the
1959       first child is pointed to by "op_first" and the last child by
1960       "op_last".  The children in between can be found by iteratively
1961       following the "OpSIBLING" pointer from the first child to the last (but
1962       see below).
1963
1964       There are also some other op types: a "PMOP" holds a regular
1965       expression, and has no children, and a "LOOP" may or may not have
1966       children.  If the "op_children" field is non-zero, it behaves like a
1967       "LISTOP".  To complicate matters, if a "UNOP" is actually a "null" op
1968       after optimization (see "Compile pass 2: context propagation") it will
1969       still have children in accordance with its former type.
1970
1971       Finally, there is a "LOGOP", or logic op. Like a "LISTOP", this has one
1972       or more children, but it doesn't have an "op_last" field: so you have
1973       to follow "op_first" and then the "OpSIBLING" chain itself to find the
1974       last child. Instead it has an "op_other" field, which is comparable to
1975       the "op_next" field described below, and represents an alternate
1976       execution path. Operators like "and", "or" and "?" are "LOGOP"s. Note
1977       that in general, "op_other" may not point to any of the direct children
1978       of the "LOGOP".
1979
1980       Starting in version 5.21.2, perls built with the experimental define
1981       "-DPERL_OP_PARENT" add an extra boolean flag for each op, "op_moresib".
1982       When not set, this indicates that this is the last op in an "OpSIBLING"
1983       chain. This frees up the "op_sibling" field on the last sibling to
1984       point back to the parent op. Under this build, that field is also
1985       renamed "op_sibparent" to reflect its joint role. The macro
1986       OpSIBLING(o) wraps this special behaviour, and always returns NULL on
1987       the last sibling.  With this build the op_parent(o) function can be
1988       used to find the parent of any op. Thus for forward compatibility, you
1989       should always use the OpSIBLING(o) macro rather than accessing
1990       "op_sibling" directly.
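
       The interplay of the flag and the shared field can be modelled in a
       few lines of plain C.  This is a toy sketch, not the perl source:
       "toy_op", "TOY_SIBLING", "toy_parent" and the other "toy_" names are
       hypothetical and merely imitate "op_moresib", "op_sibparent",
       OpSIBLING(o) and op_parent(o) as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; the field names echo the text, not Perl's struct op. */
typedef struct toy_op {
    const char    *name;
    struct toy_op *first;      /* first child, or NULL */
    struct toy_op *sibparent;  /* next sibling if moresib, else the parent */
    bool           moresib;    /* true: sibparent points at a sibling */
} toy_op;

/* Analogue of OpSIBLING(o): always NULL on the last sibling. */
#define TOY_SIBLING(o) ((o)->moresib ? (o)->sibparent : NULL)

/* Analogue of op_parent(o): walk to the last sibling, then step up. */
static toy_op *
toy_parent(toy_op *o)
{
    while (o->moresib)
        o = o->sibparent;
    return o->sibparent;       /* NULL for the root */
}

/* Count the children of `o` using only the sibling accessor. */
static int
toy_count_kids(const toy_op *o)
{
    int n = 0;

    for (const toy_op *kid = o->first; kid != NULL; kid = TOY_SIBLING(kid))
        n++;
    return n;
}

/* Link kids[0..n-1] as the children of `parent`. */
static void
toy_set_kids(toy_op *parent, toy_op **kids, int n)
{
    parent->first = n > 0 ? kids[0] : NULL;
    for (int i = 0; i < n; i++) {
        kids[i]->moresib   = (i + 1 < n);
        kids[i]->sibparent = (i + 1 < n) ? kids[i + 1] : parent;
    }
}
```

       Code that only ever goes through the accessor macro keeps working
       whether or not the last sibling's field is reused for the parent,
       which is the point of the advice above.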
1991
1992       Another way to examine the tree is to use a compiler back-end module,
1993       such as B::Concise.
1994
1995   Compile pass 1: check routines
1996       The tree is created by the compiler while yacc code feeds it the
1997       constructions it recognizes.  Since yacc works bottom-up, so does the
1998       first pass of perl compilation.
1999
       What makes this pass interesting for perl developers is that some
       optimization may be performed during this pass.  This is optimization
       by so-called "check routines".  The correspondence between node names
       and corresponding check routines is described in opcode.pl (do not
       forget to run "make regen_headers" if you modify this file).
2005
       A check routine is called when the node is fully constructed except for
       the execution-order thread.  Since at this time there are no back-links
       to the currently constructed node, one can do almost any operation to
       the top-level node, including freeing it and/or creating new nodes
       above/below it.

       The check routine returns the node which should be inserted into the
       tree (if the top-level node was not modified, the check routine
       returns its argument).
2015
2016       By convention, check routines have names "ck_*".  They are usually
2017       called from "new*OP" subroutines (or "convert") (which in turn are
2018       called from perly.y).
2019
2020   Compile pass 1a: constant folding
2021       Immediately after the check routine is called the returned node is
2022       checked for being compile-time executable.  If it is (the value is
2023       judged to be constant) it is immediately executed, and a constant node
2024       with the "return value" of the corresponding subtree is substituted
2025       instead.  The subtree is deleted.
2026
2027       If constant folding was not performed, the execution-order thread is
2028       created.
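
       The idea can be sketched on a toy expression tree.  Everything here
       ("toy_node", "toy_fold", the three node kinds) is made up for the
       illustration and is far simpler than what perl does; in particular,
       the sketch rewrites the node in place rather than substituting a
       freshly built constant node.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOY_CONST, TOY_VAR, TOY_ADD } toy_kind;

typedef struct toy_node {
    toy_kind         kind;
    long             value;        /* meaningful only for TOY_CONST */
    struct toy_node *left, *right; /* meaningful only for TOY_ADD */
} toy_node;

/* Fold `n` if it is an addition of two constants: "execute" it now and
 * turn it into a constant holding the result.  Returns 1 if folded,
 * 0 otherwise.  The children are only unlinked here; a real
 * implementation would also release them. */
static int
toy_fold(toy_node *n)
{
    if (n->kind != TOY_ADD)
        return 0;
    if (n->left->kind != TOY_CONST || n->right->kind != TOY_CONST)
        return 0;

    n->value = n->left->value + n->right->value;
    n->kind  = TOY_CONST;
    n->left  = n->right = NULL;
    return 1;
}
```

       Because the tree is built bottom-up, an inner sum such as "2 + 3" is
       folded before the enclosing node is looked at, so "(2 + 3) + 4" ends
       up as a single constant, while "$x + 2" is left alone.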
2029
2030   Compile pass 2: context propagation
       When the context for a part of the compile tree is known, it is
       propagated down through the tree.  At this time the context can have 5
       values (instead of 2 for runtime context): void, boolean, scalar, list,
       and lvalue.  In contrast with pass 1, this pass is processed from top
       to bottom: a node's context determines the context for its children.
2036
       Additional context-dependent optimizations are performed at this time.
       Since at this moment the compile tree contains back-references (via
       "thread" pointers), nodes cannot be free()d now.  To allow nodes to be
       optimized away at this stage, such nodes are null()ified instead of
       being free()d (i.e. their type is changed to OP_NULL).
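
       A minimal sketch of why nullifying is safe where freeing is not,
       using an invented "toy_op" struct (the "was" field is an assumption
       of this sketch, in the spirit of the "(was rv2sv)" lines in the
       "-Dx" listing earlier; it is not a claim about how perl stores the
       former type):

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOY_NULL, TOY_RV2SV, TOY_GVSV, TOY_ADD } toy_type;

typedef struct toy_op {
    toy_type       type;   /* current type; TOY_NULL once optimized away */
    toy_type       was;    /* former type, kept for diagnostics */
    struct toy_op *first;  /* first child */
    struct toy_op *next;   /* execution-order thread */
} toy_op;

/* Optimize `o` away without freeing it: pointers held by other nodes
 * stay valid, and the children stay attached. */
static void
toy_nullify(toy_op *o)
{
    if (o->type == TOY_NULL)
        return;            /* already nullified; keep the original `was` */
    o->was  = o->type;
    o->type = TOY_NULL;
}
```

       After the call, any parent or thread pointer that referred to the
       node still refers to valid memory; only the node's type has changed.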
2042
2043   Compile pass 3: peephole optimization
       After the compile tree for a subroutine (or for an "eval" or a file) is
       created, an additional pass over the code is performed.  This pass is
       neither top-down nor bottom-up, but in the execution order (with
       additional complications for conditionals).  Optimizations performed at
       this stage are subject to the same restrictions as in pass 2.
2049
2050       Peephole optimizations are done by calling the function pointed to by
2051       the global variable "PL_peepp".  By default, "PL_peepp" just calls the
2052       function pointed to by the global variable "PL_rpeepp".  By default,
2053       that performs some basic op fixups and optimisations along the
2054       execution-order op chain, and recursively calls "PL_rpeepp" for each
2055       side chain of ops (resulting from conditionals).  Extensions may
2056       provide additional optimisations or fixups, hooking into either the
2057       per-subroutine or recursive stage, like this:
2058
2059           static peep_t prev_peepp;
2060           static void my_peep(pTHX_ OP *o)
2061           {
2062               /* custom per-subroutine optimisation goes here */
2063               prev_peepp(aTHX_ o);
2064               /* custom per-subroutine optimisation may also go here */
2065           }
2066           BOOT:
2067               prev_peepp = PL_peepp;
2068               PL_peepp = my_peep;
2069
2070           static peep_t prev_rpeepp;
2071           static void my_rpeep(pTHX_ OP *o)
2072           {
2073               OP *orig_o = o;
2074               for(; o; o = o->op_next) {
2075                   /* custom per-op optimisation goes here */
2076               }
2077               prev_rpeepp(aTHX_ orig_o);
2078           }
2079           BOOT:
2080               prev_rpeepp = PL_rpeepp;
2081               PL_rpeepp = my_rpeep;
2082
2083   Pluggable runops
2084       The compile tree is executed in a runops function.  There are two
2085       runops functions, in run.c and in dump.c.  "Perl_runops_debug" is used
2086       with DEBUGGING and "Perl_runops_standard" is used otherwise.  For fine
2087       control over the execution of the compile tree it is possible to
2088       provide your own runops function.
2089
2090       It's probably best to copy one of the existing runops functions and
2091       change it to suit your needs.  Then, in the BOOT section of your XS
2092       file, add the line:
2093
2094         PL_runops = my_runops;
2095
2096       This function should be as efficient as possible to keep your programs
2097       running as fast as possible.
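
       To give a feel for what such a function looks like, here is a
       self-contained toy model of a run loop with a pluggable slot.  None
       of these names exist in perl: "toy_interp", "toy_op",
       "toy_runops_standard" and "toy_runops_counting" are stand-ins chosen
       for this sketch, and the real runops functions differ in detail.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_op     toy_op;
typedef struct toy_interp toy_interp;

/* Each op's function does its work and returns the next op to run,
 * or NULL to stop. */
typedef toy_op *(*toy_ppaddr)(toy_interp *, toy_op *);

struct toy_op {
    toy_ppaddr ppaddr;  /* function implementing this op */
    toy_op    *next;    /* next op in execution order */
    long       arg;     /* operand used by toy_pp_add */
};

struct toy_interp {
    long acc;                               /* stands in for the stack */
    long ops_run;                           /* used by the counting loop */
    int (*runops)(toy_interp *, toy_op *);  /* the pluggable slot */
};

static toy_op *
toy_pp_add(toy_interp *in, toy_op *o)
{
    in->acc += o->arg;
    return o->next;
}

/* The plain loop: run ops until one returns NULL.  `o` must not be NULL. */
static int
toy_runops_standard(toy_interp *in, toy_op *o)
{
    while ((o = o->ppaddr(in, o)) != NULL)
        ;
    return 0;
}

/* A replacement loop that also counts how many ops were executed. */
static int
toy_runops_counting(toy_interp *in, toy_op *o)
{
    while (o != NULL) {
        in->ops_run++;
        o = o->ppaddr(in, o);
    }
    return 0;
}
```

       Swapping the loop is then a single assignment to the "runops"
       member.  Any extra work placed inside the loop body is paid once
       per op, which is why the advice above stresses efficiency.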
2098
2099   Compile-time scope hooks
2100       As of perl 5.14 it is possible to hook into the compile-time lexical
2101       scope mechanism using "Perl_blockhook_register".  This is used like
2102       this:
2103
2104           STATIC void my_start_hook(pTHX_ int full);
2105           STATIC BHK my_hooks;
2106
2107           BOOT:
2108               BhkENTRY_set(&my_hooks, bhk_start, my_start_hook);
2109               Perl_blockhook_register(aTHX_ &my_hooks);
2110
2111       This will arrange to have "my_start_hook" called at the start of
2112       compiling every lexical scope.  The available hooks are:
2113
2114       "void bhk_start(pTHX_ int full)"
2115           This is called just after starting a new lexical scope.  Note that
2116           Perl code like
2117
2118               if ($x) { ... }
2119
2120           creates two scopes: the first starts at the "(" and has "full ==
2121           1", the second starts at the "{" and has "full == 0".  Both end at
2122           the "}", so calls to "start" and "pre"/"post_end" will match.
2123           Anything pushed onto the save stack by this hook will be popped
2124           just before the scope ends (between the "pre_" and "post_end"
2125           hooks, in fact).
2126
2127       "void bhk_pre_end(pTHX_ OP **o)"
2128           This is called at the end of a lexical scope, just before unwinding
2129           the stack.  o is the root of the optree representing the scope; it
2130           is a double pointer so you can replace the OP if you need to.
2131
2132       "void bhk_post_end(pTHX_ OP **o)"
2133           This is called at the end of a lexical scope, just after unwinding
2134           the stack.  o is as above.  Note that it is possible for calls to
2135           "pre_" and "post_end" to nest, if there is something on the save
2136           stack that calls string eval.
2137
2138       "void bhk_eval(pTHX_ OP *const o)"
2139           This is called just before starting to compile an "eval STRING",
2140           "do FILE", "require" or "use", after the eval has been set up.  o
2141           is the OP that requested the eval, and will normally be an
2142           "OP_ENTEREVAL", "OP_DOFILE" or "OP_REQUIRE".
2143
2144       Once you have your hook functions, you need a "BHK" structure to put
2145       them in.  It's best to allocate it statically, since there is no way to
2146       free it once it's registered.  The function pointers should be inserted
2147       into this structure using the "BhkENTRY_set" macro, which will also set
2148       flags indicating which entries are valid.  If you do need to allocate
2149       your "BHK" dynamically for some reason, be sure to zero it before you
2150       start.
2151
2152       Once registered, there is no mechanism to switch these hooks off, so if
2153       that is necessary you will need to do this yourself.  An entry in "%^H"
2154       is probably the best way, so the effect is lexically scoped; however it
2155       is also possible to use the "BhkDISABLE" and "BhkENABLE" macros to
2156       temporarily switch entries on and off.  You should also be aware that
2157       generally speaking at least one scope will have opened before your
2158       extension is loaded, so you will see some "pre"/"post_end" pairs that
2159       didn't have a matching "start".
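
       The role of the validity flags, and the reason a dynamically
       allocated structure must be zeroed first, can be seen in a toy model.
       The "toy_bhk" struct and "TOY_BHK_" macros below are hypothetical
       and cover a single entry; they only imitate the behaviour described
       for "BhkENTRY_set", "BhkDISABLE" and "BhkENABLE".

```c
#include <assert.h>
#include <string.h>

/* Toy hook table; the real BHK has more entries and different types. */
typedef struct toy_bhk {
    unsigned flags;              /* which entries below are valid */
    void   (*start)(int full);
} toy_bhk;

#define TOY_BHKf_START 0x01u

/* Analogue of BhkENTRY_set for the one entry modelled here:
 * store the pointer and mark it valid. */
#define TOY_BHK_SET_START(hk, fn) \
    ((hk)->start = (fn), (hk)->flags |= TOY_BHKf_START)

/* Analogues of BhkDISABLE / BhkENABLE: only the flag changes. */
#define TOY_BHK_DISABLE_START(hk) ((hk)->flags &= ~TOY_BHKf_START)
#define TOY_BHK_ENABLE_START(hk)  ((hk)->flags |= TOY_BHKf_START)

static int toy_starts_seen;

/* Example hook: just counts how often it is called. */
static void
toy_count_start(int full)
{
    (void)full;
    toy_starts_seen++;
}

/* Call the start hook only if its flag says the pointer is valid.
 * With an unzeroed struct, a stray flag bit would send this through a
 * garbage pointer. */
static void
toy_call_start(const toy_bhk *hk, int full)
{
    if (hk->flags & TOY_BHKf_START)
        hk->start(full);
}
```

       In this model, disabling an entry leaves the function pointer in
       place, so re-enabling it needs no second registration.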
2160

Examining internal data structures with the "dump" functions

2162       To aid debugging, the source file dump.c contains a number of functions
2163       which produce formatted output of internal data structures.
2164
2165       The most commonly used of these functions is "Perl_sv_dump"; it's used
2166       for dumping SVs, AVs, HVs, and CVs.  The "Devel::Peek" module calls
2167       "sv_dump" to produce debugging output from Perl-space, so users of that
2168       module should already be familiar with its format.
2169
2170       "Perl_op_dump" can be used to dump an "OP" structure or any of its
2171       derivatives, and produces output similar to "perl -Dx"; in fact,
2172       "Perl_dump_eval" will dump the main root of the code being evaluated,
2173       exactly like "-Dx".
2174
2175       Other useful functions are "Perl_dump_sub", which turns a "GV" into an
2176       op tree, "Perl_dump_packsubs" which calls "Perl_dump_sub" on all the
2177       subroutines in a package like so: (Thankfully, these are all xsubs, so
2178       there is no op tree)
2179
2180           (gdb) print Perl_dump_packsubs(PL_defstash)
2181
2182           SUB attributes::bootstrap = (xsub 0x811fedc 0)
2183
2184           SUB UNIVERSAL::can = (xsub 0x811f50c 0)
2185
2186           SUB UNIVERSAL::isa = (xsub 0x811f304 0)
2187
2188           SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0)
2189
2190           SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0)
2191
2192       and "Perl_dump_all", which dumps all the subroutines in the stash and
2193       the op tree of the main root.
2194

How multiple interpreters and concurrency are supported

2196   Background and PERL_IMPLICIT_CONTEXT
2197       The Perl interpreter can be regarded as a closed box: it has an API for
2198       feeding it code or otherwise making it do things, but it also has
2199       functions for its own use.  This smells a lot like an object, and there
2200       are ways for you to build Perl so that you can have multiple
2201       interpreters, with one interpreter represented either as a C structure,
2202       or inside a thread-specific structure.  These structures contain all
2203       the context, the state of that interpreter.
2204
2205       One macro controls the major Perl build flavor: MULTIPLICITY.  The
2206       MULTIPLICITY build has a C structure that packages all the interpreter
2207       state.  With multiplicity-enabled perls, PERL_IMPLICIT_CONTEXT is also
2208       normally defined, and enables the support for passing in a "hidden"
2209       first argument that represents all three data structures.  MULTIPLICITY
2210       makes multi-threaded perls possible (with the ithreads threading model,
2211       related to the macro USE_ITHREADS.)
2212
       Two other "encapsulation" macros are PERL_GLOBAL_STRUCT and
       PERL_GLOBAL_STRUCT_PRIVATE (the latter turns on the former, and the
       former turns on MULTIPLICITY).  PERL_GLOBAL_STRUCT causes all the
       internal variables of Perl to be wrapped inside a single global struct,
       struct perl_vars, accessible as (globals) &PL_Vars or PL_VarsPtr or the
       function Perl_GetVars().  PERL_GLOBAL_STRUCT_PRIVATE goes one step
       further: there is still a single struct (allocated in main() either
       from heap or from stack) but there are no global data symbols pointing
       to it.  In either case the global struct should be initialized as the
       very first thing in main() using Perl_init_global_struct() and
       correspondingly torn down after perl_free() using
       Perl_free_global_struct(); please see miniperlmain.c for usage details.
       You may also need to use "dVAR" in your coding to "declare the global
       variables" when you are using them.  dTHX does this for you
       automatically.
2228
2229       To see whether you have non-const data you can use a BSD (or GNU)
2230       compatible "nm":
2231
2232         nm libperl.a | grep -v ' [TURtr] '
2233
2234       If this displays any "D" or "d" symbols (or possibly "C" or "c"), you
2235       have non-const data.  The symbols the "grep" removed are as follows:
2236       "Tt" are text, or code, the "Rr" are read-only (const) data, and the
2237       "U" is <undefined>, external symbols referred to.
2238
2239       The test t/porting/libperl.t does this kind of symbol sanity checking
2240       on "libperl.a".
2241
2242       For backward compatibility reasons defining just PERL_GLOBAL_STRUCT
2243       doesn't actually hide all symbols inside a big global struct: some
2244       PerlIO_xxx vtables are left visible.  The PERL_GLOBAL_STRUCT_PRIVATE
2245       then hides everything (see how the PERLIO_FUNCS_DECL is used).
2246
2247       All this obviously requires a way for the Perl internal functions to be
2248       either subroutines taking some kind of structure as the first argument,
2249       or subroutines taking nothing as the first argument.  To enable these
2250       two very different ways of building the interpreter, the Perl source
2251       (as it does in so many other situations) makes heavy use of macros and
2252       subroutine naming conventions.
2253
       First problem: deciding which functions will be public API functions
       and which will be private.  All functions whose names begin with "S_"
       are private (think "S" for "secret" or "static").  All other functions
       begin with "Perl_", but just because a function begins with "Perl_"
       does not mean it is part of the API.  (See "Internal Functions".)  The
       easiest way to be sure a function is part of the API is to find its
       entry in perlapi.  If it exists in perlapi, it's part of the API.  If
       it doesn't, and you think it should be (i.e., you need it for your
       extension), send mail via perlbug explaining why you think it should
       be.
2264
2265       Second problem: there must be a syntax so that the same subroutine
2266       declarations and calls can pass a structure as their first argument, or
2267       pass nothing.  To solve this, the subroutines are named and declared in
2268       a particular way.  Here's a typical start of a static function used
2269       within the Perl guts:
2270
2271         STATIC void
2272         S_incline(pTHX_ char *s)
2273
2274       STATIC becomes "static" in C, and may be #define'd to nothing in some
2275       configurations in the future.
2276
2277       A public function (i.e. part of the internal API, but not necessarily
2278       sanctioned for use in extensions) begins like this:
2279
2280         void
2281         Perl_sv_setiv(pTHX_ SV* dsv, IV num)
2282
2283       "pTHX_" is one of a number of macros (in perl.h) that hide the details
2284       of the interpreter's context.  THX stands for "thread", "this", or
2285       "thingy", as the case may be.  (And no, George Lucas is not involved.
2286       :-) The first character could be 'p' for a prototype, 'a' for argument,
2287       or 'd' for declaration, so we have "pTHX", "aTHX" and "dTHX", and their
2288       variants.
2289
2290       When Perl is built without options that set PERL_IMPLICIT_CONTEXT,
2291       there is no first argument containing the interpreter's context.  The
2292       trailing underscore in the pTHX_ macro indicates that the macro
2293       expansion needs a comma after the context argument because other
2294       arguments follow it.  If PERL_IMPLICIT_CONTEXT is not defined, pTHX_
2295       will be ignored, and the subroutine is not prototyped to take the extra
2296       argument.  The form of the macro without the trailing underscore is
2297       used when there are no additional explicit arguments.
2298
2299       When a core function calls another, it must pass the context.  This is
2300       normally hidden via macros.  Consider "sv_setiv".  It expands into
2301       something like this:
2302
2303           #ifdef PERL_IMPLICIT_CONTEXT
2304             #define sv_setiv(a,b)      Perl_sv_setiv(aTHX_ a, b)
2305             /* can't do this for vararg functions, see below */
2306           #else
2307             #define sv_setiv           Perl_sv_setiv
2308           #endif
2309
2310       This works well, and means that XS authors can gleefully write:
2311
2312           sv_setiv(foo, bar);
2313
2314       and still have it work under all the modes Perl could have been
2315       compiled with.
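
       The macro trick is easier to see in miniature.  The following
       stand-alone sketch imitates it with invented names ("pTOY_", "aTOY_",
       "TOY_IMPLICIT_CONTEXT", "Toy_set_iv" and so on); it does not use the
       perl headers and glosses over everything except how one source text
       can compile with or without a leading context argument.

```c
#include <assert.h>

typedef struct toy_interp { long last_iv; } toy_interp;

#ifndef TOY_IMPLICIT_CONTEXT
#  define TOY_IMPLICIT_CONTEXT 1   /* set to 0 to model a build without it */
#endif

#if TOY_IMPLICIT_CONTEXT
#  define pTOY   toy_interp *my_toy    /* 'p': prototype form */
#  define pTOY_  pTOY,                 /* ...when more parameters follow */
#  define aTOY   my_toy                /* 'a': argument form */
#  define aTOY_  aTOY,
#  define TOY_STATE(field) (my_toy->field)
#else
static toy_interp toy_global;          /* state lives in one global */
#  define pTOY   void
#  define pTOY_
#  define aTOY
#  define aTOY_
#  define TOY_STATE(field) (toy_global.field)
#endif

/* "Public" functions carry the long name and the context parameter. */
static void Toy_set_iv(pTOY_ long num) { TOY_STATE(last_iv) = num; }
static long Toy_get_iv(pTOY)           { return TOY_STATE(last_iv); }

/* Short names hide the context argument from callers. */
#define toy_set_iv(n) Toy_set_iv(aTOY_ n)
#define toy_get_iv()  Toy_get_iv(aTOY)

/* A caller written once; it compiles under either setting. */
static long
toy_roundtrip(pTOY_ long num)
{
    toy_set_iv(num);
    return toy_get_iv();
}
```

       With the default setting, "toy_roundtrip" takes an interpreter
       pointer first, so two "toy_interp" objects keep separate state; with
       "TOY_IMPLICIT_CONTEXT" set to 0 the same body compiles against the
       single global instead.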
2316
2317       This doesn't work so cleanly for varargs functions, though, as macros
2318       imply that the number of arguments is known in advance.  Instead we
2319       either need to spell them out fully, passing "aTHX_" as the first
2320       argument (the Perl core tends to do this with functions like
2321       Perl_warner), or use a context-free version.
2322
2323       The context-free version of Perl_warner is called
2324       Perl_warner_nocontext, and does not take the extra argument.  Instead
2325       it does dTHX; to get the context from thread-local storage.  We
2326       "#define warner Perl_warner_nocontext" so that extensions get source
2327       compatibility at the expense of performance.  (Passing an arg is
2328       cheaper than grabbing it from thread-local storage.)
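
       The two calling styles can be compared side by side in a small
       model.  The names "Toy_warner", "Toy_warner_nocontext" and
       "toy_set_context" are made up for this sketch, and a plain static
       pointer stands in for thread-local storage, so the sketch itself is
       not thread-safe.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

typedef struct toy_interp { char last_warning[128]; } toy_interp;

/* Stand-in for the thread-local slot holding the current interpreter. */
static toy_interp *toy_current;

static void
toy_set_context(toy_interp *in)
{
    toy_current = in;
}

/* Shared worker taking a va_list, so both front ends can use it. */
static void
toy_vwarner(toy_interp *in, const char *fmt, va_list ap)
{
    vsnprintf(in->last_warning, sizeof in->last_warning, fmt, ap);
}

/* Context passed explicitly as the first argument. */
static void
Toy_warner(toy_interp *in, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    toy_vwarner(in, fmt, ap);
    va_end(ap);
}

/* No context argument: look it up in the slot instead. */
static void
Toy_warner_nocontext(const char *fmt, ...)
{
    toy_interp *in = toy_current;   /* plays the role of dTHX */
    va_list ap;

    assert(in != NULL);
    va_start(ap, fmt);
    toy_vwarner(in, fmt, ap);
    va_end(ap);
}

/* The short, extension-facing name resolves to the context-free variant. */
#define toy_warner Toy_warner_nocontext
```

       The context-free variant costs a lookup on every call, while the
       explicit variant reuses a pointer the caller already holds.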
2329
2330       You can ignore [pad]THXx when browsing the Perl headers/sources.  Those
2331       are strictly for use within the core.  Extensions and embedders need
2332       only be aware of [pad]THX.
2333
2334   So what happened to dTHR?
2335       "dTHR" was introduced in perl 5.005 to support the older thread model.
2336       The older thread model now uses the "THX" mechanism to pass context
2337       pointers around, so "dTHR" is not useful any more.  Perl 5.6.0 and
2338       later still have it for backward source compatibility, but it is
2339       defined to be a no-op.
2340
2341   How do I use all this in extensions?
2342       When Perl is built with PERL_IMPLICIT_CONTEXT, extensions that call any
2343       functions in the Perl API will need to pass the initial context
2344       argument somehow.  The kicker is that you will need to write it in such
2345       a way that the extension still compiles when Perl hasn't been built
2346       with PERL_IMPLICIT_CONTEXT enabled.
2347
2348       There are three ways to do this.  First, the easy but inefficient way,
2349       which is also the default, in order to maintain source compatibility
2350       with extensions: whenever XSUB.h is #included, it redefines the aTHX
2351       and aTHX_ macros to call a function that will return the context.
2352       Thus, something like:
2353
2354               sv_setiv(sv, num);
2355
2356       in your extension will translate to this when PERL_IMPLICIT_CONTEXT is
2357       in effect:
2358
2359               Perl_sv_setiv(Perl_get_context(), sv, num);
2360
2361       or to this otherwise:
2362
2363               Perl_sv_setiv(sv, num);
2364
2365       You don't have to do anything new in your extension to get this; since
2366       the Perl library provides Perl_get_context(), it will all just work.
2367
2368       The second, more efficient way is to use the following template for
2369       your Foo.xs:
2370
2371               #define PERL_NO_GET_CONTEXT     /* we want efficiency */
2372               #include "EXTERN.h"
2373               #include "perl.h"
2374               #include "XSUB.h"
2375
2376               STATIC void my_private_function(int arg1, int arg2);
2377
2378               STATIC void
2379               my_private_function(int arg1, int arg2)
2380               {
2381                   dTHX;       /* fetch context */
2382                   ... call many Perl API functions ...
2383               }
2384
2385               [... etc ...]
2386
2387               MODULE = Foo            PACKAGE = Foo
2388
2389               /* typical XSUB */
2390
2391               void
2392               my_xsub(arg)
2393                       int arg
2394                   CODE:
2395                       my_private_function(arg, 10);
2396
       Note that the only two changes from the normal way of writing an
       extension are the addition of a "#define PERL_NO_GET_CONTEXT" before
       including the Perl headers, and a "dTHX;" declaration at the
       start of every function that will call the Perl API.  (You'll know
       which functions need this, because the C compiler will complain that
       there's an undeclared identifier in those functions.)  No changes are
       needed for the XSUBs themselves, because the XS() macro is correctly
       defined to pass in the implicit context if needed.
2405
2406       The third, even more efficient way is to ape how it is done within the
2407       Perl guts:
2408
2409               #define PERL_NO_GET_CONTEXT     /* we want efficiency */
2410               #include "EXTERN.h"
2411               #include "perl.h"
2412               #include "XSUB.h"
2413
2414               /* pTHX_ only needed for functions that call Perl API */
2415               STATIC void my_private_function(pTHX_ int arg1, int arg2);
2416
2417               STATIC void
2418               my_private_function(pTHX_ int arg1, int arg2)
2419               {
2420                   /* dTHX; not needed here, because THX is an argument */
2421                   ... call Perl API functions ...
2422               }
2423
2424               [... etc ...]
2425
2426               MODULE = Foo            PACKAGE = Foo
2427
2428               /* typical XSUB */
2429
2430               void
2431               my_xsub(arg)
2432                       int arg
2433                   CODE:
2434                       my_private_function(aTHX_ arg, 10);
2435
2436       This implementation never has to fetch the context using a function
2437       call, since it is always passed as an extra argument.  Depending on
2438       your needs for simplicity or efficiency, you may mix the previous two
2439       approaches freely.
2440
2441       Never add a comma after "pTHX" yourself--always use the form of the
2442       macro with the underscore for functions that take explicit arguments,
2443       or the form without the argument for functions with no explicit
2444       arguments.
2445
       If Perl is compiled with "-DPERL_GLOBAL_STRUCT", the "dVAR"
       definition is needed if the Perl global variables (see perlvars.h or
       globvar.sym) are accessed in the function and "dTHX" is not used
       ("dTHX" includes "dVAR" if necessary).  The need for "dVAR" shows up
       only with that compile-time define, because otherwise the Perl global
       variables are visible as-is.
2452
2453   Should I do anything special if I call perl from multiple threads?
2454       If you create interpreters in one thread and then proceed to call them
2455       in another, you need to make sure perl's own Thread Local Storage (TLS)
2456       slot is initialized correctly in each of those threads.
2457
2458       The "perl_alloc" and "perl_clone" API functions will automatically set
2459       the TLS slot to the interpreter they created, so that there is no need
2460       to do anything special if the interpreter is always accessed in the
2461       same thread that created it, and that thread did not create or call any
2462       other interpreters afterwards.  If that is not the case, you have to
2463       set the TLS slot of the thread before calling any functions in the Perl
2464       API on that particular interpreter.  This is done by calling the
2465       "PERL_SET_CONTEXT" macro in that thread as the first thing you do:
2466
2467               /* do this before doing anything else with some_perl */
2468               PERL_SET_CONTEXT(some_perl);
2469
2470               ... other Perl API calls on some_perl go here ...
2471
2472   Future Plans and PERL_IMPLICIT_SYS
2473       Just as PERL_IMPLICIT_CONTEXT provides a way to bundle up everything
2474       that the interpreter knows about itself and pass it around, so too are
2475       there plans to allow the interpreter to bundle up everything it knows
2476       about the environment it's running on.  This is enabled with the
2477       PERL_IMPLICIT_SYS macro.  Currently it only works with USE_ITHREADS on
2478       Windows.
2479
       This makes it possible to provide an extra pointer (called the "host"
       environment) for all the system calls.  Each part of the system
       interface can then maintain its own state, broken down into seven C
       structures.  These are thin wrappers around the usual system calls (see
       win32/perllib.c) for the default perl executable, but for a more
       ambitious host (like the one that would do fork() emulation) all the
       extra work needed to pretend that different interpreters are actually
       different "processes" would be done here.
2488
2489       The Perl engine/interpreter and the host are orthogonal entities.
2490       There could be one or more interpreters in a process, and one or more
2491       "hosts", with free association between them.
2492

Internal Functions

2494       All of Perl's internal functions which will be exposed to the outside
2495       world are prefixed by "Perl_" so that they will not conflict with XS
2496       functions or functions used in a program in which Perl is embedded.
2497       Similarly, all global variables begin with "PL_".  (By convention,
2498       static functions start with "S_".)
2499
2500       Inside the Perl core ("PERL_CORE" defined), you can get at the
2501       functions either with or without the "Perl_" prefix, thanks to a bunch
2502       of defines that live in embed.h.  Note that extension code should not
2503       set "PERL_CORE"; this exposes the full perl internals, and is likely to
2504       cause breakage of the XS in each new perl release.
2505
2506       The file embed.h is generated automatically from embed.pl and
2507       embed.fnc.  embed.pl also creates the prototyping header files for the
2508       internal functions, generates the documentation and a lot of other bits
2509       and pieces.  It's important that when you add a new function to the
2510       core or change an existing one, you change the data in the table in
2511       embed.fnc as well.  Here's a sample entry from that table:
2512
2513           Apd |SV**   |av_fetch   |AV* ar|I32 key|I32 lval
2514
2515       The second column is the return type, the third column the name.
2516       Columns after that are the arguments.  The first column is a set of
2517       flags:
2518
2519       A  This function is a part of the public API.  All such functions
2520          should also have 'd'; very few do not.
2521
2522       p  This function has a "Perl_" prefix; i.e. it is defined as
2523          "Perl_av_fetch".
2524
2525       d  This function has documentation using the "apidoc" feature which
2526          we'll look at in a second.  Some functions have 'd' but not 'A';
2527          docs are good.
2528
2529       Other available flags are:
2530
2531       s  This is a static function and is defined as "STATIC S_whatever", and
2532          usually called within the sources as "whatever(...)".
2533
2534       n  This does not need an interpreter context, so the definition has no
2535          "pTHX", and it follows that callers don't use "aTHX".  (See
2536          "Background and PERL_IMPLICIT_CONTEXT".)
2537
2538       r  This function never returns; "croak", "exit" and friends.
2539
2540       f  This function takes a variable number of arguments, "printf" style.
2541          The argument list should end with "...", like this:
2542
2543              Afprd   |void   |croak          |const char* pat|...
2544
2545       M  This function is part of the experimental development API, and may
2546          change or disappear without notice.
2547
2548       o  This function should not have a compatibility macro to define, say,
2549          "Perl_parse" to "parse".  It must be called as "Perl_parse".
2550
2551       x  This function isn't exported out of the Perl core.
2552
2553       m  This is implemented as a macro.
2554
2555       X  This function is explicitly exported.
2556
2557       E  This function is visible to extensions included in the Perl core.
2558
2559       b  Binary backward compatibility; this function is a macro but also has
2560          a "Perl_" implementation (which is exported).
2561
2562       others
2563          See the comments at the top of "embed.fnc" for others.
2564
2565       If you edit embed.pl or embed.fnc, you will need to run "make
2566       regen_headers" to force a rebuild of embed.h and other auto-generated
2567       files.
2568
2569   Formatted Printing of IVs, UVs, and NVs
2570       If you are printing IVs, UVs, or NVs, then instead of the stdio(3)
2571       style formatting codes like %d, %ld, %f, you should use the following
2572       macros for portability:
2573
2574               IVdf            IV in decimal
2575               UVuf            UV in decimal
2576               UVof            UV in octal
2577               UVxf            UV in hexadecimal
2578               NVef            NV %e-like
2579               NVff            NV %f-like
2580               NVgf            NV %g-like
2581
2582       These will take care of 64-bit integers and long doubles.  For example:
2583
2584               printf("IV is %"IVdf"\n", iv);
2585
2586       The IVdf will expand to whatever is the correct format for the IVs.
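
           The same trick exists in standard C99, which may make the
           mechanism easier to see outside of Perl.  The sketch below uses
           "PRId64" from <inttypes.h> as an analogue of IVdf; it is not
           Perl's macro, and "format_i64" is a hypothetical helper.  In
           both cases the macro is a string literal that the compiler
           joins to the neighbouring literals.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper: format a 64-bit signed value into buf.
 * PRId64 is a string literal such as "ld" or "lld"; adjacent string
 * literals are concatenated by the compiler, just like "%"IVdf"\n". */
static int format_i64(char *buf, size_t size, int64_t value)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "value is %" PRId64 "\n", value);
}
```

           Writing "%ld" here instead would be wrong on any platform where
           the 64-bit type is not "long", which is the portability problem
           the Perl macros address for IVs, UVs and NVs.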
2587
2588       Note that there are different "long doubles": Perl will use whatever
2589       the compiler has.
2590
2591       If you are printing addresses of pointers, use UVxf combined with
2592       PTR2UV(), do not use %lx or %p.
2593
2594   Formatted Printing of "Size_t" and "SSize_t"
2595       The most general way to do this is to cast them to a UV or IV, and
2596       print as in the previous section.
2597
2598       But if you're using "PerlIO_printf()", it's less typing and visual
2599       clutter to use the "%z" length modifier (for siZe):
2600
2601               PerlIO_printf(PerlIO_stdout(), "STRLEN is %zu\n", len);
2602
2603       This modifier is not portable, so its use should be restricted to
2604       "PerlIO_printf()".
2605
2606   Pointer-To-Integer and Integer-To-Pointer
2607       Because pointer size does not necessarily equal integer size, use the
2608       following macros to do it right.
2609
2610               PTR2UV(pointer)
2611               PTR2IV(pointer)
2612               PTR2NV(pointer)
2613               INT2PTR(pointertotype, integer)
2614
2615       For example:
2616
2617               IV  iv = ...;
2618               SV *sv = INT2PTR(SV*, iv);
2619
2620       and
2621
2622               AV *av = ...;
2623               UV  uv = PTR2UV(av);
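
           For comparison, standard C offers "uintptr_t" for the same
           purpose.  The sketch below is an analogy in plain C, not the
           definition of the Perl macros; "ptr_to_uint", "uint_to_ptr" and
           "round_trips" are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers mirroring the idea behind PTR2UV and INT2PTR. */
static uintptr_t ptr_to_uint(const void *p)
{
    return (uintptr_t)p;
}

static void *uint_to_ptr(uintptr_t u)
{
    return (void *)u;
}

/* Store a pointer in an integer slot and recover it unchanged. */
static int round_trips(int *p)
{
    uintptr_t u = ptr_to_uint(p);
    int *q = uint_to_ptr(u);
    assert(sizeof u >= sizeof p);   /* the integer must be wide enough */
    return q == p;
}
```

           Going through a plain "int" or "long" instead can lose bits
           where pointers are wider than that type.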
2624
2625   Exception Handling
2626       There are a couple of macros to do very basic exception handling in XS
2627       modules.  You have to define "NO_XSLOCKS" before including XSUB.h to be
2628       able to use these macros:
2629
2630               #define NO_XSLOCKS
2631               #include "XSUB.h"
2632
2633       You can use these macros if you call code that may croak, but you need
2634       to do some cleanup before giving control back to Perl.  For example:
2635
2636               dXCPT;    /* set up necessary variables */
2637
2638               XCPT_TRY_START {
2639                 code_that_may_croak();
2640               } XCPT_TRY_END
2641
2642               XCPT_CATCH
2643               {
2644                 /* do cleanup here */
2645                 XCPT_RETHROW;
2646               }
2647
2648       Note that you always have to rethrow an exception that has been caught.
2649       Using these macros, it is not possible to just catch the exception and
2650       ignore it.  If you have to ignore the exception, you have to use the
2651       "call_*" functions.
2652
2653       The advantage of using the above macros is that you don't have to set
2654       up an extra function for "call_*", and that using these macros is
2655       faster than using "call_*".
2656
2657   Source Documentation
2658       There's an effort going on to document the internal functions and
2659       automatically produce reference manuals from them -- perlapi is one
2660       such manual which details all the functions which are available to XS
2661       writers.  perlintern is the autogenerated manual for the functions
2662       which are not part of the API and are supposedly for internal use only.
2663
2664       Source documentation is created by putting POD comments into the C
2665       source, like this:
2666
2667        /*
2668        =for apidoc sv_setiv
2669
2670        Copies an integer into the given SV.  Does not handle 'set' magic.  See
2671        L<perlapi/sv_setiv_mg>.
2672
2673        =cut
2674        */
2675
2676       Please try and supply some documentation if you add functions to the
2677       Perl core.
2678
2679   Backwards compatibility
2680       The Perl API changes over time.  New functions are added or the
2681       interfaces of existing functions are changed.  The "Devel::PPPort"
2682       module tries to provide compatibility code for some of these changes,
2683       so XS writers don't have to code it themselves when supporting multiple
2684       versions of Perl.
2685
2686       "Devel::PPPort" generates a C header file ppport.h that can also be run
2687       as a Perl script.  To generate ppport.h, run:
2688
2689           perl -MDevel::PPPort -eDevel::PPPort::WriteFile
2690
2691       Besides checking existing XS code, the script can also be used to
2692       retrieve compatibility information for various API calls using the
2693       "--api-info" command line switch.  For example:
2694
2695         % perl ppport.h --api-info=sv_magicext
2696
2697       For details, see "perldoc ppport.h".
2698

Unicode Support

2700       Perl 5.6.0 introduced Unicode support.  It's important for porters and
2701       XS writers to understand this support and make sure that the code they
2702       write does not corrupt Unicode data.
2703
2704   What is Unicode, anyway?
2705       In the olden, less enlightened times, we all used to use ASCII.  Most
2706       of us did, anyway.  The big problem with ASCII is that it's American.
2707       Well, no, that's not actually the problem; the problem is that it's not
2708       particularly useful for people who don't use the Roman alphabet.  What
2709       used to happen was that particular languages would stick their own
2710       alphabet in the upper range of the sequence, between 128 and 255.  Of
2711       course, we then ended up with plenty of variants that weren't quite
2712       ASCII, and the whole point of it being a standard was lost.
2713
2714       Worse still, if you've got a language like Chinese or Japanese that has
2715       hundreds or thousands of characters, then you really can't fit them
2716       into a mere 256, so they had to forget about ASCII altogether, and
2717       build their own systems using pairs of numbers to refer to one
2718       character.
2719
2720       To fix this, some people formed Unicode, Inc. and produced a new
2721       character set containing all the characters you can possibly think of
2722       and more.  There are several ways of representing these characters, and
2723       the one Perl uses is called UTF-8.  UTF-8 uses a variable number of
2724       bytes to represent a character.  You can learn more about Unicode and
2725       Perl's Unicode model in perlunicode.
2726
2727       (On EBCDIC platforms, Perl instead uses UTF-EBCDIC, which is a form of
2728       UTF-8 adapted for EBCDIC platforms.  Below, we just talk about UTF-8.
2729       UTF-EBCDIC is like UTF-8, but the details are different.  The macros
2730       hide the differences from you; just remember that the particular
2731       numbers and bit patterns presented below will differ in UTF-EBCDIC.)
2732
2733   How can I recognise a UTF-8 string?
2734       You can't.  This is because UTF-8 data is stored in bytes just like
2735       non-UTF-8 data.  The Unicode character 200 (0xC8 for you hex types),
2736       capital E with a grave accent, is represented by the two bytes
2737       "v195.136".  Unfortunately, the non-Unicode string "chr(195).chr(136)"
2738       has that byte sequence as well.  So you can't tell just by looking --
2739       this is what makes Unicode input an interesting problem.
2740
2741       In general, you either have to know what you're dealing with, or you
2742       have to guess.  The API function "is_utf8_string" can help; it'll tell
2743       you if a string contains only valid UTF-8 characters, and the chances
2744       of a non-UTF-8 string looking like valid UTF-8 become very small very
2745       quickly with increasing string length.  On a character-by-character
2746       basis, "isUTF8_CHAR" will tell you whether the current character in a
2747       string is valid UTF-8.
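
           To see why such a check helps with guessing, here is a much
           simplified structural validator in standalone C.  It is a
           sketch under stated assumptions, not Perl's "is_utf8_string":
           "looks_like_utf8" is a hypothetical name, and it does not
           screen out every overlong form or surrogate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified check: returns 1 if buf[0..len) consists only
 * of structurally valid 1- to 4-byte UTF-8 sequences, else 0.  It rejects
 * the always-invalid lead bytes 0xC0, 0xC1 and 0xF5..0xFF. */
static int looks_like_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    assert(buf != NULL || len == 0);
    while (i < len) {
        unsigned char c = buf[i];
        size_t extra;
        if (c < 0x80)                    extra = 0;
        else if (c >= 0xC2 && c <= 0xDF) extra = 1;
        else if (c >= 0xE0 && c <= 0xEF) extra = 2;
        else if (c >= 0xF0 && c <= 0xF4) extra = 3;
        else return 0;      /* stray continuation byte or invalid lead */
        if (extra > len - i - 1)
            return 0;       /* sequence is cut off by the end of buf */
        for (size_t k = 1; k <= extra; k++)
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;   /* expected a 10xxxxxx continuation byte */
        i += extra + 1;
    }
    return 1;
}
```

           A typical Latin-1 string with one accented letter fails the
           check, but a short byte pair that happens to form a valid
           sequence passes, so a positive answer remains a guess.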
2748
2749   How does UTF-8 represent Unicode characters?
2750       As mentioned above, UTF-8 uses a variable number of bytes to store a
2751       character.  Characters with values 0...127 are stored in one byte, just
2752       like good ol' ASCII.  Character 128 is stored as "v194.128"; this
2753       continues up to character 191, which is "v194.191".  Now we've run out
2754       of bits (191 is binary 10111111) so we move on; character 192 is
2755       "v195.128".  And so it goes on, moving to three bytes at character
2756       2048.  "Unicode Encodings" in perlunicode has pictures of how this
2757       works.
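
           The byte values quoted above can be reproduced with a small
           standalone encoder.  This is a sketch for illustration only:
           "encode_utf8" is a hypothetical function, it covers just code
           points below 0x10000, and real code should use the API function
           "uvchr_to_utf8".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoder for code points below 0x10000 (surrogates are not
 * screened out).  Writes 1 to 3 bytes to out and returns the count. */
static size_t encode_utf8(unsigned long cp, unsigned char *out)
{
    assert(out != NULL && cp < 0x10000UL);
    if (cp < 0x80) {                        /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

           Each continuation byte carries six payload bits, which is why
           the second byte wraps from 191 back to 128 when the first byte
           moves from 194 to 195.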
2758
2759       Assuming you know you're dealing with a UTF-8 string, you can find out
2760       how long the first character in it is with the "UTF8SKIP" macro:
2761
2762           char *utf = "\305\233\340\240\201";
2763           I32 len;
2764
2765           len = UTF8SKIP(utf); /* len is 2 here */
2766           utf += len;
2767           len = UTF8SKIP(utf); /* len is 3 here */
2768
2769       Another way to skip over characters in a UTF-8 string is to use
2770       "utf8_hop", which takes a string and a number of characters to skip
2771       over.  You're on your own about bounds checking, though, so don't use
2772       it lightly.
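
           The idea of skipping by lead byte, with the bounds check added,
           can be sketched in standalone C.  "skip_len" and "count_chars"
           are hypothetical names, and the table is the plain UTF-8 one
           for ASCII platforms; it is not Perl's "UTF8SKIP", which also
           covers Perl's extended encoding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for UTF8SKIP: the length of a character as
 * announced by its first byte.  Continuation bytes and non-standard lead
 * bytes yield 1 so that a caller always makes progress. */
static size_t skip_len(unsigned char lead)
{
    if (lead < 0xC0) return 1;   /* ASCII or continuation byte */
    if (lead < 0xE0) return 2;
    if (lead < 0xF0) return 3;
    if (lead < 0xF8) return 4;
    return 1;                    /* not a standard lead byte */
}

/* Count characters by hopping from lead byte to lead byte, stopping at
 * end instead of running past it. */
static size_t count_chars(const unsigned char *s, const unsigned char *end)
{
    size_t n = 0;
    assert(s != NULL && end != NULL && s <= end);
    while (s < end) {
        size_t step = skip_len(*s);
        if (step > (size_t)(end - s))
            break;               /* truncated final character */
        s += step;
        n++;
    }
    return n;
}
```

           Passing an explicit end pointer is the design choice that makes
           the loop safe on truncated input.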
2773
2774       All bytes in a multi-byte UTF-8 character will have the high bit set,
2775       so you can test if you need to do something special with this character
2776       like this (the "UTF8_IS_INVARIANT()" is a macro that tests whether the
2777       byte is encoded as a single byte even in UTF-8):
2778
2779           U8 *utf;     /* Initialize this to point to the beginning of the
2780                           sequence to convert */
2781           U8 *utf_end; /* Initialize this to 1 beyond the end of the sequence
2782                           pointed to by 'utf' */
2783           UV uv;       /* Returned code point; note: a UV, not a U8, not a
2784                           char */
2785           STRLEN len; /* Returned length of character in bytes */
2786
2787           if (!UTF8_IS_INVARIANT(*utf))
2788               /* Must treat this as UTF-8 */
2789               uv = utf8_to_uvchr_buf(utf, utf_end, &len);
2790           else
2791               /* OK to treat this character as a byte */
2792               uv = *utf;
2793
2794       You can also see in that example that we use "utf8_to_uvchr_buf" to get
2795       the value of the character; the inverse function "uvchr_to_utf8" is
2796       available for putting a UV into UTF-8:
2797
2798           if (!UVCHR_IS_INVARIANT(uv))
2799               /* Must treat this as UTF8 */
2800               utf8 = uvchr_to_utf8(utf8, uv);
2801           else
2802               /* OK to treat this character as a byte */
2803               *utf8++ = uv;
2804
2805       You must convert characters to UVs using the above functions if you're
2806       ever in a situation where you have to match UTF-8 and non-UTF-8
2807       characters.  You may not skip over UTF-8 characters in this case.  If
2808       you do this, you'll lose the ability to match hi-bit non-UTF-8
2809       characters; for instance, if your UTF-8 string contains "v195.136", and
2810       you skip that character, you can never match a "chr(200)" in a
2811       non-UTF-8 string.  So don't do that!
2812
2813       (Note that we don't have to test for invariant characters in the
2814       examples above.  The functions work on any well-formed UTF-8 input.
2815       It's just that it's faster to avoid the function overhead when it's not
2816       needed.)
2817
2818   How does Perl store UTF-8 strings?
2819       Currently, Perl deals with UTF-8 strings and non-UTF-8 strings slightly
2820       differently.  A flag in the SV, "SVf_UTF8", indicates that the string
2821       is internally encoded as UTF-8.  Without it, the byte value is the
2822       codepoint number and vice versa.  This flag is only meaningful if the
2823       SV is "SvPOK" or immediately after stringification via "SvPV" or a
2824       similar macro.  You can check and manipulate this flag with the
2825       following macros:
2826
2827           SvUTF8(sv)
2828           SvUTF8_on(sv)
2829           SvUTF8_off(sv)
2830
2831       This flag has an important effect on Perl's treatment of the string: if
2832       UTF-8 data is not properly distinguished, regular expressions,
2833       "length", "substr" and other string handling operations will have
2834       undesirable (wrong) results.
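
           A toy model makes the effect concrete.  The struct below is a
           hypothetical miniature of "bytes plus a flag" and has nothing
           to do with the real SV layout; "char_length" is likewise an
           invented helper.  It shows the same two bytes having length 2
           or length 1 depending only on the flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of "bytes plus a flag"; not Perl's SV layout. */
struct flagged_str {
    const unsigned char *bytes;
    size_t               nbytes;
    int                  is_utf8;   /* plays the role of SVf_UTF8 */
};

/* Length in characters: without the flag every byte is a character; with
 * it, only bytes that are not continuation bytes (10xxxxxx) start one. */
static size_t char_length(const struct flagged_str *s)
{
    size_t n = 0;
    assert(s != NULL);
    if (!s->is_utf8)
        return s->nbytes;
    for (size_t i = 0; i < s->nbytes; i++)
        if ((s->bytes[i] & 0xC0) != 0x80)
            n++;
    return n;
}
```

           Dropping or inventing the flag therefore changes the answer
           without changing a single byte of the string.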
2835
2836       The problem comes when you have, for instance, a string that isn't
2837       flagged as UTF-8, and contains a byte sequence that could be UTF-8 --
2838       especially when combining non-UTF-8 and UTF-8 strings.
2839
2840       Never forget that the "SVf_UTF8" flag is separate from the PV value;
2841       you need to be sure you don't accidentally knock it off while you're
2842       manipulating SVs.  More specifically, you cannot expect to do this:
2843
2844           SV *sv;
2845           SV *nsv;
2846           STRLEN len;
2847           char *p;
2848
2849           p = SvPV(sv, len);
2850           frobnicate(p);
2851           nsv = newSVpvn(p, len);
2852
2853       The "char*" string does not tell you the whole story, and you can't
2854       copy or reconstruct an SV just by copying the string value.  Check if
2855       the old SV has the UTF8 flag set (after the "SvPV" call), and act
2856       accordingly:
2857
2858           p = SvPV(sv, len);
2859           is_utf8 = SvUTF8(sv);
2860           frobnicate(p, is_utf8);
2861           nsv = newSVpvn(p, len);
2862           if (is_utf8)
2863               SvUTF8_on(nsv);
2864
2865       In the above, your "frobnicate" function has been changed to be made
2866       aware of whether or not it's dealing with UTF-8 data, so that it can
2867       handle the string appropriately.
2868
2869       Since just passing an SV to an XS function and copying the data of the
2870       SV is not enough to copy the UTF8 flags, passing just a "char *" to an
2871       XS function is even less adequate.
2872
2873       For full generality, use the "DO_UTF8" macro to see if the string in an
2874       SV is to be treated as UTF-8.  This takes into account if the call to
2875       the XS function is being made from within the scope of "use bytes".  If
2876       so, the underlying bytes that comprise the UTF-8 string are to be
2877       exposed, rather than the character they represent.  But this pragma
2878       should only really be used for debugging and perhaps low-level testing
2879       at the byte level.  Hence most XS code need not concern itself with
2880       this, but various areas of the perl core do need to support it.
2881
2882       And this isn't the whole story.  Starting in Perl v5.12, strings that
2883       aren't encoded in UTF-8 may also be treated as Unicode under various
2884       conditions (see "ASCII Rules versus Unicode Rules" in perlunicode).
2885       This is only really a problem for characters whose ordinals are between
2886       128 and 255, and whose behavior varies under ASCII versus Unicode rules
2887       in ways that your code cares about (see "The "Unicode Bug"" in
2888       perlunicode).  There is no published API for dealing with this, as it
2889       is subject to change, but you can look at the code for "pp_lc" in pp.c
2890       for an example as to how it's currently done.
2891
2892   How do I convert a string to UTF-8?
2893       If you're mixing UTF-8 and non-UTF-8 strings, it is necessary to
2894       upgrade the non-UTF-8 strings to UTF-8.  If you've got an SV, the
2895       easiest way to do this is:
2896
2897           sv_utf8_upgrade(sv);
2898
2899       However, you must not do this, for example:
2900
2901           if (!SvUTF8(left))
2902               sv_utf8_upgrade(left);
2903
2904       If you do this in a binary operator, you will actually change one of
2905       the strings that came into the operator, and, while it shouldn't be
2906       noticeable by the end user, it can cause problems in deficient code.
2907
2908       Instead, "bytes_to_utf8" will give you a UTF-8-encoded copy of its
2909       string argument.  This is useful for having the data available for
2910       comparisons and so on, without harming the original SV.  There's also
2911       "utf8_to_bytes" to go the other way, but naturally, this will fail if
2912       the string contains any characters above 255 that can't be represented
2913       in a single byte.
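
           The "copy, don't modify" approach can be sketched in standalone
           C for an ASCII platform.  "latin1_to_utf8" is a hypothetical
           helper that only illustrates what an upgrading copy involves;
           it is not the implementation of "bytes_to_utf8", and it uses
           malloc rather than Perl's allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: return a newly malloc'ed UTF-8 copy of a Latin-1
 * (one byte per character) buffer and store its length in *out_len.
 * The input is left untouched; the caller frees the result.
 * Returns NULL if allocation fails. */
static unsigned char *latin1_to_utf8(const unsigned char *in, size_t len,
                                     size_t *out_len)
{
    unsigned char *out, *d;
    assert(in != NULL && out_len != NULL);
    out = malloc(len * 2 + 1);      /* each byte grows to at most two */
    if (out == NULL)
        return NULL;
    d = out;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            *d++ = c;               /* invariant: same byte either way */
        } else {
            *d++ = (unsigned char)(0xC0 | (c >> 6));
            *d++ = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    *d = '\0';
    *out_len = (size_t)(d - out);
    return out;
}
```

           The original buffer stays as it was, so an operator that only
           needs the upgraded form for a comparison does not alter its
           operands.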
2914
2915   How do I compare strings?
2916       "sv_cmp" in perlapi and "sv_cmp_flags" in perlapi do a lexicographic
2917       comparison of two SVs, and handle UTF-8ness properly.  Note, however,
2918       that Unicode specifies a much fancier mechanism for collation,
2919       available via the Unicode::Collate module.
2920
2921       To just compare two strings for equality/non-equality, you can use
2922       "memEQ()" and "memNE()" as usual, provided that the strings are either
2923       both UTF-8 encoded or both not UTF-8 encoded.
2924
2925       To compare two strings case-insensitively, use "foldEQ_utf8()" (the
2926       strings don't have to have the same UTF-8ness).
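
           Why byte-wise equality needs matching encodings can be shown in
           standalone C.  Both helpers below are hypothetical
           illustrations for an ASCII platform, not Perl API functions,
           and "latin1_eq_utf8" assumes its UTF-8 argument is well formed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: is the Latin-1 buffer equal, character for
 * character, to the UTF-8 buffer?  Assumes well-formed UTF-8 input. */
static int latin1_eq_utf8(const unsigned char *l, size_t llen,
                          const unsigned char *u, size_t ulen)
{
    size_t i = 0, j = 0;
    assert(l != NULL && u != NULL);
    while (i < llen && j < ulen) {
        unsigned long cp;
        if (u[j] < 0x80) {
            cp = u[j++];
        } else if ((u[j] & 0xE0) == 0xC0 && j + 1 < ulen) {
            cp = ((unsigned long)(u[j] & 0x1F) << 6) | (u[j + 1] & 0x3F);
            j += 2;
        } else {
            return 0;   /* longer sequence or truncated: no Latin-1 match */
        }
        if (cp != l[i++])
            return 0;
    }
    return i == llen && j == ulen;
}

/* Byte-wise equality, valid only when both buffers share an encoding. */
static int bytes_eq(const unsigned char *a, size_t alen,
                    const unsigned char *b, size_t blen)
{
    return alen == blen && memcmp(a, b, alen) == 0;
}
```

           The same character, stored once as a single byte and once as
           two UTF-8 bytes, compares unequal byte-wise but equal once the
           encodings are reconciled.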
2927
2928   Is there anything else I need to know?
2929       Not really.  Just remember these things:
2930
2931       ·  There's no way to tell if a "char *" or "U8 *" string is UTF-8 or
2932          not.  But you can tell if an SV is to be treated as UTF-8 by calling
2933          "DO_UTF8" on it, after stringifying it with "SvPV" or a similar
2934          macro.  And, you can tell if an SV is actually UTF-8 (even if it is not
2935          to be treated as such) by looking at its "SvUTF8" flag (again after
2936          stringifying it).  Don't forget to set the flag if something should
2937          be UTF-8.  Treat the flag as part of the PV, even though it's not --
2938          if you pass on the PV to somewhere, pass on the flag too.
2939
2940       ·  If a string is UTF-8, always use "utf8_to_uvchr_buf" to get at the
2941          value, unless "UTF8_IS_INVARIANT(*s)" in which case you can use *s.
2942
2943       ·  When writing a character UV to a UTF-8 string, always use
2944          "uvchr_to_utf8", unless "UVCHR_IS_INVARIANT(uv)" in which case you
2945          can use "*s = uv".
2946
2947       ·  Mixing UTF-8 and non-UTF-8 strings is tricky.  Use "bytes_to_utf8"
2948          to get a new string which is UTF-8 encoded, and then combine them.
2949

Custom Operators

2951       Custom operator support is an experimental feature that allows you to
2952       define your own ops.  This is primarily to allow the building of
2953       interpreters for other languages in the Perl core, but it also allows
2954       optimizations through the creation of "macro-ops" (ops which perform
2955       the functions of multiple ops which are usually executed together, such
2956       as "gvsv, gvsv, add".)
2957
2958       This feature is implemented as a new op type, "OP_CUSTOM".  The Perl
2959       core does not "know" anything special about this op type, and so it
2960       will not be involved in any optimizations.  This also means that you
2961       can define your custom ops to be any op structure -- unary, binary,
2962       list and so on -- you like.
2963
2964       It's important to know what custom operators won't do for you.  They
2965       won't let you add new syntax to Perl, directly.  They won't even let
2966       you add new keywords, directly.  In fact, they won't change the way
2967       Perl compiles a program at all.  You have to do those changes yourself,
2968       after Perl has compiled the program.  You do this either by
2969       manipulating the op tree using a "CHECK" block and the "B::Generate"
2970       module, or by adding a custom peephole optimizer with the "optimize"
2971       module.
2972
2973       When you do this, you replace ordinary Perl ops with custom ops by
2974       creating ops with the type "OP_CUSTOM" and the "op_ppaddr" of your own
2975       PP function.  This should be defined in XS code, and should look like
2976       the PP ops in "pp_*.c".  You are responsible for ensuring that your op
2977       takes the appropriate number of values from the stack, and you are
2978       responsible for adding stack marks if necessary.
2979
2980       You should also "register" your op with the Perl interpreter so that it
2981       can produce sensible error and warning messages.  Since it is possible
2982       to have multiple custom ops within the one "logical" op type
2983       "OP_CUSTOM", Perl uses the value of "o->op_ppaddr" to determine which
2984       custom op it is dealing with.  You should create an "XOP" structure for
2985       each ppaddr you use, set the properties of the custom op with
2986       "XopENTRY_set", and register the structure against the ppaddr using
2987       "Perl_custom_op_register".  A trivial example might look like:
2988
2989           static XOP my_xop;
2990           static OP *my_pp(pTHX);
2991
2992           BOOT:
2993               XopENTRY_set(&my_xop, xop_name, "myxop");
2994               XopENTRY_set(&my_xop, xop_desc, "Useless custom op");
2995               Perl_custom_op_register(aTHX_ my_pp, &my_xop);
2996
2997       The available fields in the structure are:
2998
2999       xop_name
3000           A short name for your op.  This will be included in some error
3001           messages, and will also be returned as "$op->name" by the B module,
3002           so it will appear in the output of modules like B::Concise.
3003
3004       xop_desc
3005           A short description of the function of the op.
3006
3007       xop_class
3008           Which of the various *OP structures this op uses.  This should be
3009           one of the "OA_*" constants from op.h, namely
3010
3011           OA_BASEOP
3012           OA_UNOP
3013           OA_BINOP
3014           OA_LOGOP
3015           OA_LISTOP
3016           OA_PMOP
3017           OA_SVOP
3018           OA_PADOP
3019           OA_PVOP_OR_SVOP
3020               This should be interpreted as '"PVOP"' only.  The "_OR_SVOP" is
3021               because the only core "PVOP", "OP_TRANS", can sometimes be a
3022               "SVOP" instead.
3023
3024           OA_LOOP
3025           OA_COP
3026
3027           The other "OA_*" constants should not be used.
3028
3029       xop_peep
3030           This member is of type "Perl_cpeep_t", which expands to "void
3031           (*Perl_cpeep_t)(pTHX_ OP *o, OP *oldop)".  If it is set, this
3032           function will be called from "Perl_rpeep" when ops of this type are
3033           encountered by the peephole optimizer.  o is the OP that needs
3034           optimizing; oldop is the previous OP optimized, whose "op_next"
3035           points to o.
3036
3037       "B::Generate" directly supports the creation of custom ops by name.
3038

Dynamic Scope and the Context Stack

3040       Note: this section describes a non-public internal API that is subject
3041       to change without notice.
3042
3043   Introduction to the context stack
3044       In Perl, dynamic scoping refers to the runtime nesting of things like
3045       subroutine calls, evals etc, as well as the entering and exiting of
3046       block scopes. For example, the restoring of a "local"ised variable is
3047       determined by the dynamic scope.
3048
3049       Perl tracks the dynamic scope by a data structure called the context
3050       stack, which is an array of "PERL_CONTEXT" structures, and which is
3051       itself a big union for all the types of context. Whenever a new scope
3052       is entered (such as a block, a "for" loop, or a subroutine call), a new
3053       context entry is pushed onto the stack. Similarly when leaving a block
3054       or returning from a subroutine call etc. a context is popped. Since the
3055       context stack represents the current dynamic scope, it can be searched.
3056       For example, "next LABEL" searches back through the stack looking for a
3057       loop context that matches the label; "return" pops contexts until it
3058       finds a sub or eval context or similar; "caller" examines sub contexts
3059       on the stack.
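
           The push, pop and search behaviour can be pictured with a toy
           model in standalone C.  Everything in the sketch is
           hypothetical ("toy_cx", "toy_push", "toy_find_loop" and so
           on); the real "PERL_CONTEXT" is far richer and the real search
           routines do more work.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model; the real PERL_CONTEXT is far richer. */
enum toy_type { TOY_BLOCK, TOY_LOOP, TOY_SUB };

struct toy_cx {
    enum toy_type type;
    const char   *label;    /* loop label, or NULL */
};

#define TOY_MAX 16
static struct toy_cx toy_stack[TOY_MAX];
static int toy_ix = -1;     /* index of the current frame; -1 when empty */

/* Entering a scope pushes a frame and returns its address. */
static struct toy_cx *toy_push(enum toy_type type, const char *label)
{
    assert(toy_ix + 1 < TOY_MAX);
    toy_ix++;
    toy_stack[toy_ix].type  = type;
    toy_stack[toy_ix].label = label;
    return &toy_stack[toy_ix];
}

/* Leaving a scope pops the current frame. */
static void toy_pop(void)
{
    assert(toy_ix >= 0);
    toy_ix--;
}

/* What a "next LABEL" needs: search from the innermost frame outwards
 * for a loop with that label.  Returns its index, or -1 if none. */
static int toy_find_loop(const char *label)
{
    assert(label != NULL);
    for (int i = toy_ix; i >= 0; i--)
        if (toy_stack[i].type == TOY_LOOP && toy_stack[i].label != NULL
            && strcmp(toy_stack[i].label, label) == 0)
            return i;
    return -1;
}
```

           Searching from the top of the stack downwards is what makes the
           innermost matching scope win.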
3060
3061       Each context entry is labelled with a context type, "cx_type". Typical
3062       context types are "CXt_SUB", "CXt_EVAL" etc., as well as "CXt_BLOCK"
3063       and "CXt_NULL", which represent a basic scope (as pushed by "pp_enter")
3064       and a sort block respectively. The type determines which parts of the
3065       context union are valid.
3066
3067       The main division in the context struct is between a substitution scope
3068       ("CXt_SUBST") and block scopes, which are everything else. The former
3069       is just used while executing "s///e", and won't be discussed further
3070       here.
3071
3072       All the block scope types share a common base, which corresponds to
3073       "CXt_BLOCK". This stores the old values of various scope-related
3074       variables like "PL_curpm", as well as information about the current
3075       scope, such as "gimme". On scope exit, the old variables are restored.
3076
3077       Particular block scope types store extra per-type information. For
3078       example, "CXt_SUB" stores the currently executing CV, while the various
3079       for loop types might hold the original loop variable SV. On scope exit,
3080       the per-type data is processed; for example the CV has its reference
3081       count decremented, and the original loop variable is restored.
3082
3083       The macro "cxstack" returns the base of the current context stack,
3084       while "cxstack_ix" is the index of the current frame within that stack.
3085
3086       In fact, the context stack is actually part of a stack-of-stacks
3087       system; whenever something unusual is done such as calling a "DESTROY"
3088       or tie handler, a new stack is pushed, then popped at the end.
3089
3090       Note that the API described here changed considerably in perl 5.24;
3091       prior to that, big macros like "PUSHBLOCK" and "POPSUB" were used; in
3092       5.24 they were replaced by the inline static functions described below.
3093       In addition, the ordering and detail of how these macros/functions work
3094       changed in many ways, often subtly. In particular, the old macros didn't
3095       handle saving the savestack and temps stack positions, and required
3096       additional "ENTER", "SAVETMPS" and "LEAVE" compared to the new functions.
3097       The old-style macros will not be described further.
3098
3099   Pushing contexts
3100       For pushing a new context, the two basic functions are "cx =
3101       cx_pushblock()", which pushes a new basic context block and returns its
3102       address, and a family of similar functions with names like
3103       "cx_pushsub(cx)" which populate the additional type-dependent fields in
3104       the "cx" struct. Note that "CXt_NULL" and "CXt_BLOCK" don't have their
3105       own push functions, as they don't store any data beyond that pushed by
3106       "cx_pushblock".
3107
3108       The fields of the context struct and the arguments to the "cx_*"
3109       functions are subject to change between perl releases, representing
3110       whatever is convenient or efficient for that release.
3111
3112       A typical context stack pushing can be found in "pp_entersub"; the
3113       following shows a simplified and stripped-down example of a non-XS
3114       call, along with comments showing roughly what each function does.
3115
3116        dMARK;
3117        U8 gimme      = GIMME_V;
3118        bool hasargs  = cBOOL(PL_op->op_flags & OPf_STACKED);
3119        OP *retop     = PL_op->op_next;
3120        I32 old_ss_ix = PL_savestack_ix;
3121        CV *cv        = ....;
3122
3123        /* ... make mortal copies of stack args which are PADTMPs here ... */
3124
3125        /* ... do any additional savestack pushes here ... */
3126
3127        /* Now push a new context entry of type 'CXt_SUB'; initially just
3128         * doing the actions common to all block types: */
3129
3130        cx = cx_pushblock(CXt_SUB, gimme, MARK, old_ss_ix);
3131
3132            /* this does (approximately):
3133                CXINC;              // cxstack_ix++ (grow if necessary)
3134                cx = CX_CUR();      // and get the address of new frame
3135                cx->cx_type        = CXt_SUB;
3136                cx->blk_gimme      = gimme;
3137                cx->blk_oldsp      = MARK - PL_stack_base;
3138                cx->blk_oldsaveix  = old_ss_ix;
3139                cx->blk_oldcop     = PL_curcop;
3140                cx->blk_oldmarksp  = PL_markstack_ptr - PL_markstack;
3141                cx->blk_oldscopesp = PL_scopestack_ix;
3142                cx->blk_oldpm      = PL_curpm;
3143                cx->blk_old_tmpsfloor = PL_tmps_floor;
3144
3145                PL_tmps_floor        = PL_tmps_ix;
3146            */
3147
3148
3149        /* then update the new context frame with subroutine-specific info,
3150         * such as the CV about to be executed: */
3151
3152        cx_pushsub(cx, cv, retop, hasargs);
3153
3154            /* this does (approximately):
3155                cx->blk_sub.cv          = cv;
3156                cx->blk_sub.olddepth    = CvDEPTH(cv);
3157                cx->blk_sub.prevcomppad = PL_comppad;
3158                cx->cx_type            |= (hasargs) ? CXp_HASARGS : 0;
3159                cx->blk_sub.retop       = retop;
3160                SvREFCNT_inc_simple_void_NN(cv);
3161            */
3162
3163       Note that "cx_pushblock()" sets two new floors: for the args stack (to
3164       "MARK") and the temps stack (to "PL_tmps_ix"). While executing at this
3165       scope level, every "nextstate" (amongst others) will reset the args and
3166       tmps stack levels to these floors. Note that since "cx_pushblock" uses
3167       the current value of "PL_tmps_ix" rather than it being passed as an
3168       arg, this dictates at what point "cx_pushblock" should be called. In
3169       particular, any new mortals which should be freed only on scope exit
3170       (rather than at the next "nextstate") should be created first.
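
       The effect of the temps floor on the ordering of calls can be
       illustrated with a small standalone model. This is only a sketch: the
       "toy_" names are invented for illustration and are not part of the
       Perl API, and plain integers stand in for mortal SVs.

```c
#include <assert.h>

/* Toy model only: none of these names exist in the Perl API. */
enum { TOY_TMPS_MAX = 16 };

typedef struct {
    int tmps[TOY_TMPS_MAX]; /* stand-in for the temps stack */
    int tmps_ix;            /* index of the top temp, -1 when empty */
    int tmps_floor;         /* temps at or below this index survive */
} toy_interp;

typedef struct {
    int old_tmpsfloor;      /* stand-in for cx->blk_old_tmpsfloor */
} toy_context;

static void toy_init(toy_interp *in)
{
    in->tmps_ix = -1;
    in->tmps_floor = -1;
}

static void toy_new_mortal(toy_interp *in, int value)
{
    assert(in->tmps_ix + 1 < TOY_TMPS_MAX);
    in->tmps[++in->tmps_ix] = value;
}

/* Roughly the temps-related part of cx_pushblock(): remember the old
 * floor and raise the floor to the current top of the temps stack. */
static void toy_pushblock(toy_interp *in, toy_context *cx)
{
    cx->old_tmpsfloor = in->tmps_floor;
    in->tmps_floor = in->tmps_ix;
}

/* Roughly what a nextstate does to temps: drop everything above the floor. */
static void toy_nextstate(toy_interp *in)
{
    if (in->tmps_ix > in->tmps_floor)
        in->tmps_ix = in->tmps_floor;
}

/* Roughly the temps-related part of cx_popblock(): restore the old floor. */
static void toy_popblock(toy_interp *in, const toy_context *cx)
{
    in->tmps_floor = cx->old_tmpsfloor;
}
```

       In this model, a mortal created before "toy_pushblock()" sits at or
       below the new floor and so survives every "toy_nextstate()" inside the
       scope; a mortal created afterwards is dropped at the next one.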
3171
3172       Most callers of "cx_pushblock" simply set the new args stack floor to
3173       the top of the previous stack frame, but for "CXt_LOOP_LIST" it stores
3174       the items being iterated over on the stack, and so sets "blk_oldsp" to
3175       the top of these items instead. Note that, contrary to its name,
3176       "blk_oldsp" doesn't always represent the value to restore "PL_stack_sp"
3177       to on scope exit.
3178
3179       Note the early capture of "PL_savestack_ix" to "old_ss_ix", which is
3180       later passed as an arg to "cx_pushblock". In the case of "pp_entersub",
3181       this is because, although most values needing saving are stored in
3182       fields of the context struct, an extra value needs saving only when the
3183       debugger is running, and it doesn't make sense to bloat the struct for
3184       this rare case. So instead it is saved on the savestack. Since this
3185       value gets calculated and saved before the context is pushed, it is
3186       necessary to pass the old value of "PL_savestack_ix" to "cx_pushblock",
3187       to ensure that the saved value gets freed during scope exit.  For most
3188       users of "cx_pushblock", where nothing needs pushing on the save stack,
3189       "PL_savestack_ix" is just passed directly as an arg to "cx_pushblock".
3190
3191       Note that where possible, values should be saved in the context struct
3192       rather than on the save stack; it's much faster that way.
3193
3194       Normally "cx_pushblock" should be immediately followed by the
3195       appropriate "cx_pushfoo", with nothing between them; this is because if
3196       code in-between could die (e.g. a warning upgraded to fatal), then the
3197       context stack unwinding code in "dounwind" would see (in the example
3198       above) a "CXt_SUB" context frame, but without all the subroutine-
3199       specific fields set, and crashes would soon ensue.
3200
3201       Where the two must be separate, initially set the type to "CXt_NULL" or
3202       "CXt_BLOCK", and later change it to "CXt_foo" when doing the
3203       "cx_pushfoo". This is exactly what "pp_enteriter" does, once it's
3204       determined which type of loop it's pushing.
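
       The placeholder-type technique can be sketched in isolation as
       follows. The frame layout and all "toy_" names are hypothetical; the
       point is only that the unwinder never looks at type-specific fields
       until the type says they are valid.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these types and functions are not part of the Perl API. */
enum toy_cx_type { TOY_CX_NULL, TOY_CX_LOOP_ARY, TOY_CX_LOOP_LIST };

typedef struct {
    enum toy_cx_type type;
    const int *items;   /* loop-specific; valid only for loop types */
} toy_frame;

/* Common part of the push: the frame starts out as a harmless placeholder
 * and the loop-specific field is deliberately left unset. */
static void toy_pushblock(toy_frame *f)
{
    f->type = TOY_CX_NULL;
}

/* Type-specific part: fill in the fields first, then publish the type. */
static void toy_pushloop(toy_frame *f, enum toy_cx_type type,
                         const int *items)
{
    f->items = items;
    f->type  = type;
}

/* What an unwinder might do with one frame. Returns 1 if loop-specific
 * cleanup ran, 0 if the frame was still a placeholder. */
static int toy_unwind_frame(toy_frame *f)
{
    if (f->type == TOY_CX_NULL)
        return 0;               /* never touches f->items */
    assert(f->items != NULL);
    f->items = NULL;
    return 1;
}
```

       If something dies between the two pushes, the unwinder sees only the
       placeholder type and skips the loop-specific cleanup.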
3205
3206   Popping contexts
       Contexts are popped using "cx_popsub()" etc. and "cx_popblock()". Note,
       however, that unlike "cx_pushblock", which increments the context
       stack index itself, neither of these functions actually decrements
       it; this is done separately using "CX_POP()".
3211
3212       There are two main ways that contexts are popped. During normal
3213       execution as scopes are exited, functions like "pp_leave",
3214       "pp_leaveloop" and "pp_leavesub" process and pop just one context using
3215       "cx_popfoo" and "cx_popblock". On the other hand, things like
3216       "pp_return" and "next" may have to pop back several scopes until a sub
3217       or loop context is found, and exceptions (such as "die") need to pop
3218       back contexts until an eval context is found. Both of these are
3219       accomplished by "dounwind()", which is capable of processing and
3220       popping all contexts above the target one.
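
       As a rough illustration of unwinding to a target frame, here is a
       standalone model of a context stack. It is a sketch under simplifying
       assumptions: the "toy_" names are invented, frames carry only a type,
       and the per-frame cleanup is reduced to a counter.

```c
#include <assert.h>

/* Toy model only: these names are not part of the Perl API. */
enum toy_cx_type { TOY_CX_BLOCK, TOY_CX_LOOP, TOY_CX_SUB, TOY_CX_EVAL };
enum { TOY_CX_MAX = 8 };

typedef struct {
    enum toy_cx_type types[TOY_CX_MAX];
    int ix;       /* index of the current frame, -1 when empty */
    int popped;   /* number of frames processed so far (for the demo) */
} toy_cxstack;

static void toy_cxstack_init(toy_cxstack *s)
{
    s->ix = -1;
    s->popped = 0;
}

static void toy_push(toy_cxstack *s, enum toy_cx_type type)
{
    assert(s->ix + 1 < TOY_CX_MAX);
    s->types[++s->ix] = type;
}

/* Find the nearest enclosing frame of the wanted type, or -1 if none. */
static int toy_find(const toy_cxstack *s, enum toy_cx_type wanted)
{
    for (int i = s->ix; i >= 0; i--)
        if (s->types[i] == wanted)
            return i;
    return -1;
}

/* Process and pop every frame above 'target'; the target frame itself is
 * left in place for the caller to deal with. */
static void toy_unwind(toy_cxstack *s, int target)
{
    while (s->ix > target) {
        /* type-specific and common cleanup would go here, while the
         * frame is still the current one */
        s->popped++;
        s->ix--;
    }
}
```

       A "return" would look for the nearest "TOY_CX_SUB" frame and unwind to
       it, whereas a "die" would look for the nearest "TOY_CX_EVAL" frame.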
3221
3222       Here is a typical example of context popping, as found in "pp_leavesub"
3223       (simplified slightly):
3224
        U8 gimme;
        PERL_CONTEXT *cx;
        SV **oldsp;
        OP *retop;

        cx = CX_CUR();

        gimme = cx->blk_gimme;
        oldsp = PL_stack_base + cx->blk_oldsp; /* last arg of previous frame */

        if (gimme == G_VOID)
            PL_stack_sp = oldsp;
        else
            leave_adjust_stacks(oldsp, oldsp, gimme, 0);

        CX_LEAVE_SCOPE(cx);
        cx_popsub(cx);
        cx_popblock(cx);
        retop = cx->blk_sub.retop;
        CX_POP(cx);

        return retop;
3247
       The steps above are in a very specific order, designed to be the
       reverse order of when the context was pushed. The first thing to do is
       to copy and/or protect any return arguments and free any temps in
       the current scope. Scope exits like an rvalue sub normally return a
       mortal copy of their return args (as opposed to lvalue subs). It is
       important to make this copy before the save stack is popped or
       variables are restored, or bad things like the following can happen:
3255
           sub f { my $x =...; $x }  # $x freed before we get to copy it
           sub f { /(...)/;    $1 }  # PL_curpm restored before $1 copied
3258
3259       Although we wish to free any temps at the same time, we have to be
3260       careful not to free any temps which are keeping return args alive; nor
3261       to free the temps we have just created while mortal copying return
3262       args. Fortunately, "leave_adjust_stacks()" is capable of making mortal
3263       copies of return args, shifting args down the stack, and only
3264       processing those entries on the temps stack that are safe to do so.
3265
3266       In void context no args are returned, so it's more efficient to skip
3267       calling "leave_adjust_stacks()". Also in void context, a "nextstate" op
3268       is likely to be imminently called which will do a "FREETMPS", so
3269       there's no need to do that either.
3270
3271       The next step is to pop savestack entries: "CX_LEAVE_SCOPE(cx)" is just
3272       defined as "LEAVE_SCOPE(cx->blk_oldsaveix)". Note that during the
3273       popping, it's possible for perl to call destructors, call "STORE" to
3274       undo localisations of tied vars, and so on. Any of these can die or
3275       call "exit()". In this case, "dounwind()" will be called, and the
       current context stack frame will be re-processed. Thus it is vital that
       all steps in popping a context are done in such a way as to support
       reentrancy.  The alternative, decrementing "cxstack_ix" before
       processing the frame, would lead to leaks and the like if something
       died halfway through, or to overwriting of the current frame.
3281
3282       "CX_LEAVE_SCOPE" itself is safely re-entrant: if only half the
3283       savestack items have been popped before dying and getting trapped by
3284       eval, then the "CX_LEAVE_SCOPE"s in "dounwind" or "pp_leaveeval" will
3285       continue where the first one left off.
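
       The resumable behaviour can be modelled with a few lines of
       standalone C. This is a sketch, not the real "leave_scope": the
       "toy_" names are invented, and a return code stands in for the
       longjmp that a real "die" would perform.

```c
#include <assert.h>

/* Toy model only: these names are not part of the Perl API. */
enum { TOY_SS_MAX = 8 };

typedef struct {
    int entries[TOY_SS_MAX]; /* 0: restore succeeds, 1: restore "dies" */
    int ix;                  /* number of entries currently on the stack */
    int restored;            /* how many restore actions have been run */
} toy_savestack;

/* Pop entries until only 'base' remain. Returns 0 when done, or -1 if a
 * restore action "died". The index is lowered before the action runs, so
 * a later call resumes below the entry that died instead of repeating it. */
static int toy_leave_scope(toy_savestack *ss, int base)
{
    while (ss->ix > base) {
        int dies = ss->entries[--ss->ix];
        ss->restored++;
        if (dies)
            return -1;
    }
    return 0;
}
```

       Calling "toy_leave_scope()" a second time with the same base, as an
       unwinder would, finishes the remaining entries, and each entry is
       processed exactly once overall.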
3286
3287       The next step is the type-specific context processing; in this case
3288       "cx_popsub". In part, this looks like:
3289
           cv = cx->blk_sub.cv;
           CvDEPTH(cv) = cx->blk_sub.olddepth;
           cx->blk_sub.cv = NULL;
           SvREFCNT_dec(cv);
3294
       where it's processing the just-executed CV. Note that before it
       decrements the CV's reference count, it nulls the "blk_sub.cv". This
       means that if it re-enters, the CV won't be freed twice. It also means
       that you can't rely on such type-specific fields having useful values
       after the return from "cx_popfoo".
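
       The null-before-decrement idiom is not specific to Perl and can be
       shown in a standalone sketch. The "toy_" types below are hypothetical
       stand-ins for a CV and a sub context frame; a counter records how
       often the object is "freed".

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these names are not part of the Perl API. */
typedef struct {
    int refcnt;
    int freed;      /* how many times the object has been "freed" */
} toy_cv;

typedef struct {
    toy_cv *cv;     /* stand-in for cx->blk_sub.cv */
} toy_subframe;

/* Drop one reference; a NULL pointer is silently ignored. */
static void toy_refcnt_dec(toy_cv *cv)
{
    if (cv && --cv->refcnt == 0)
        cv->freed++;
}

/* Clear the frame's pointer first, then drop the reference. If the frame
 * is processed a second time, it finds NULL and does nothing further. */
static void toy_popsub(toy_subframe *frame)
{
    toy_cv *cv = frame->cv;
    frame->cv = NULL;
    toy_refcnt_dec(cv);
}
```

       Running "toy_popsub()" twice on the same frame releases the object
       only once, which is the property the re-entrancy argument relies on.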
3300
3301       Next, "cx_popblock" restores all the various interpreter vars to their
3302       previous values or previous high water marks; it expands to:
3303
           PL_markstack_ptr = PL_markstack + cx->blk_oldmarksp;
           PL_scopestack_ix = cx->blk_oldscopesp;
           PL_curpm         = cx->blk_oldpm;
           PL_curcop        = cx->blk_oldcop;
           PL_tmps_floor    = cx->blk_old_tmpsfloor;
3309
3310       Note that it doesn't restore "PL_stack_sp"; as mentioned earlier, which
3311       value to restore it to depends on the context type (specifically "for
3312       (list) {}"), and what args (if any) it returns; and that will already
3313       have been sorted out earlier by "leave_adjust_stacks()".
3314
       Finally, the context stack pointer is actually decremented by
       "CX_POP(cx)".  After this point, it's possible that the current
       context frame could be overwritten by other contexts being pushed.
3318       Although things like ties and "DESTROY" are supposed to work within a
3319       new context stack, it's best not to assume this. Indeed on debugging
3320       builds, "CX_POP(cx)" deliberately sets "cx" to null to detect code that
3321       is still relying on the field values in that context frame. Note in the
3322       "pp_leavesub()" example above, we grab "blk_sub.retop" before calling
3323       "CX_POP".
3324
3325   Redoing contexts
       Finally, there is "cx_topblock(cx)", which acts like a
       super-"nextstate" with regard to resetting various vars to their base
3328       values. It is used in places like "pp_next", "pp_redo" and "pp_goto"
3329       where rather than exiting a scope, we want to re-initialise the scope.
3330       As well as resetting "PL_stack_sp" like "nextstate", it also resets
3331       "PL_markstack_ptr", "PL_scopestack_ix" and "PL_curpm". Note that it
3332       doesn't do a "FREETMPS".
3333

Slab-based operator allocation

3335       Note: this section describes a non-public internal API that is subject
3336       to change without notice.
3337
3338       Perl's internal error-handling mechanisms implement "die" (and its
3339       internal equivalents) using longjmp. If this occurs during lexing,
3340       parsing or compilation, we must ensure that any ops allocated as part
3341       of the compilation process are freed. (Older Perl versions did not
3342       adequately handle this situation: when failing a parse, they would leak
3343       ops that were stored in C "auto" variables and not linked anywhere
3344       else.)
3345
3346       To handle this situation, Perl uses op slabs that are attached to the
3347       currently-compiling CV. A slab is a chunk of allocated memory. New ops
3348       are allocated as regions of the slab. If the slab fills up, a new one
3349       is created (and linked from the previous one). When an error occurs and
3350       the CV is freed, any ops remaining are freed.
3351
3352       Each op is preceded by two pointers: one points to the next op in the
3353       slab, and the other points to the slab that owns it. The next-op
3354       pointer is needed so that Perl can iterate over a slab and free all its
3355       ops. (Op structures are of different sizes, so the slab's ops can't
3356       merely be treated as a dense array.)  The slab pointer is needed for
3357       accessing a reference count on the slab: when the last op on a slab is
3358       freed, the slab itself is freed.
3359
3360       The slab allocator puts the ops at the end of the slab first. This will
3361       tend to allocate the leaves of the op tree first, and the layout will
3362       therefore hopefully be cache-friendly. In addition, this means that
3363       there's no need to store the size of the slab (see below on why slabs
3364       vary in size), because Perl can follow pointers to find the last op.
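
       The layout described in the last two paragraphs can be sketched as a
       standalone toy allocator. Everything here is an assumption made for
       illustration: the "toy_" names, measuring storage in pointer-sized
       words, and the demo-only "freed_flag". The CV's own reference on the
       slab and the marking of freed ops are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: these names are not part of the Perl API. */
typedef struct toy_slab toy_slab;

/* Two-pointer header placed in front of every op body. */
typedef struct toy_slot {
    struct toy_slot *next;  /* slot allocated just before this one */
    toy_slab        *slab;  /* slab that owns this slot */
} toy_slot;

struct toy_slab {
    toy_slot *first;        /* most recently allocated slot */
    size_t    refcnt;       /* number of live ops in this slab */
    size_t    avail;        /* unused words at the front of words[] */
    int      *freed_flag;   /* demo only: set to 1 when the slab is freed */
    void     *words[];      /* storage, in pointer-sized words */
};

#define TOY_HEADER_WORDS (sizeof(toy_slot) / sizeof(void *))

static toy_slab *toy_slab_new(size_t nwords, int *freed_flag)
{
    toy_slab *slab = malloc(sizeof *slab + nwords * sizeof(void *));
    if (!slab)
        return NULL;
    slab->first = NULL;
    slab->refcnt = 0;
    slab->avail = nwords;
    slab->freed_flag = freed_flag;
    return slab;
}

/* Carve an op with 'body_words' words of payload from the END of the free
 * space. Returns a pointer to the op body, or NULL if the slab is full. */
static void *toy_op_alloc(toy_slab *slab, size_t body_words)
{
    size_t need = TOY_HEADER_WORDS + body_words;
    if (need > slab->avail)
        return NULL;
    slab->avail -= need;
    toy_slot *slot = (toy_slot *)&slab->words[slab->avail];
    slot->next = slab->first;
    slot->slab = slab;
    slab->first = slot;
    slab->refcnt++;
    return slot + 1;        /* the body starts right after the header */
}

/* Ops differ in size, so iteration must follow the next pointers. */
static size_t toy_slab_count(const toy_slab *slab)
{
    size_t n = 0;
    for (const toy_slot *s = slab->first; s; s = s->next)
        n++;
    return n;
}

/* Find the owning slab through the header; free it with its last op. */
static void toy_op_free(void *op)
{
    toy_slot *slot = (toy_slot *)op - 1;
    toy_slab *slab = slot->slab;
    if (--slab->refcnt == 0) {
        *slab->freed_flag = 1;
        free(slab);
    }
}
```

       Because each op is carved from the end of the remaining space, later
       ops sit at lower addresses, and walking the "next" pointers from the
       newest slot eventually reaches the slot at the very end of the slab.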
3365
       It might seem possible to eliminate slab reference counts altogether, by
3367       having all ops implicitly attached to "PL_compcv" when allocated and
3368       freed when the CV is freed. That would also allow "op_free" to skip
3369       "FreeOp" altogether, and thus free ops faster. But that doesn't work in
3370       those cases where ops need to survive beyond their CVs, such as re-
3371       evals.
3372
       The CV also has to have a reference count on the slab. Sometimes the
       first op created is immediately freed. Without the CV's own count, the
       slab's reference count would then reach 0, and the slab would be freed
       with the CV still pointing to it.
3376
3377       CVs use the "CVf_SLABBED" flag to indicate that the CV has a reference
3378       count on the slab. When this flag is set, the slab is accessible via
3379       "CvSTART" when "CvROOT" is not set, or by subtracting two pointers
3380       "(2*sizeof(I32 *))" from "CvROOT" when it is set. The alternative to
3381       this approach of sneaking the slab into "CvSTART" during compilation
3382       would be to enlarge the "xpvcv" struct by another pointer. But that
3383       would make all CVs larger, even though slab-based op freeing is
3384       typically of benefit only for programs that make significant use of
3385       string eval.
3386
3387       When the "CVf_SLABBED" flag is set, the CV takes responsibility for
3388       freeing the slab. If "CvROOT" is not set when the CV is freed or
3389       undeffed, it is assumed that a compilation error has occurred, so the
3390       op slab is traversed and all the ops are freed.
3391
3392       Under normal circumstances, the CV forgets about its slab (decrementing
3393       the reference count) when the root is attached. So the slab reference
3394       counting that happens when ops are freed takes care of freeing the
3395       slab. In some cases, the CV is told to forget about the slab
3396       ("cv_forget_slab") precisely so that the ops can survive after the CV
3397       is done away with.
3398
       Forgetting the slab when the root is attached is not strictly
       necessary, but it makes things more robust: there is code all over
       the place, both in core and on CPAN, that does things with "CvROOT",
       and forgetting the slab avoids potential problems with "CvROOT" being
       written over.
3404
       Since the CV takes ownership of its slab when flagged, that flag is
       never copied when a CV is cloned. Otherwise one CV could free a slab
       that another CV still points to, because forced freeing of ops
       ignores the reference count (but asserts that it looks right).
3409
3410       To avoid slab fragmentation, freed ops are marked as freed and attached
3411       to the slab's freed chain (an idea stolen from DBM::Deep). Those freed
3412       ops are reused when possible. Not reusing freed ops would be simpler,
3413       but it would result in significantly higher memory usage for programs
3414       with large "if (DEBUG) {...}" blocks.
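
       A freed chain of this kind can be sketched as a simple free list.
       The model below is deliberately simplified and its "toy_" names are
       invented: all slots have the same size, so any freed slot can satisfy
       any request, which sidesteps the size matching a real allocator
       with variable-sized ops would need.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these names are not part of the Perl API. */
enum { TOY_NSLOTS = 4 };

typedef struct toy_op {
    struct toy_op *next_free; /* link in the freed chain while freed */
    int is_freed;             /* marks the slot as freed */
    int payload;
} toy_op;

typedef struct {
    toy_op  slots[TOY_NSLOTS];
    size_t  used;             /* slots handed out from fresh space */
    toy_op *freed;            /* head of the freed chain */
} toy_pool;

static void toy_pool_init(toy_pool *p)
{
    p->used = 0;
    p->freed = NULL;
}

/* Prefer a previously freed slot; fall back to fresh space. */
static toy_op *toy_alloc(toy_pool *p)
{
    toy_op *op;
    if (p->freed) {
        op = p->freed;
        p->freed = op->next_free;
    } else if (p->used < TOY_NSLOTS) {
        op = &p->slots[p->used++];
    } else {
        return NULL;
    }
    op->next_free = NULL;
    op->is_freed = 0;
    op->payload = 0;
    return op;
}

/* Mark the slot as freed and attach it to the freed chain. */
static void toy_free(toy_pool *p, toy_op *op)
{
    assert(!op->is_freed);
    op->is_freed = 1;
    op->next_free = p->freed;
    p->freed = op;
}
```

       After a free, the next allocation returns the same slot rather than
       consuming fresh space, so repeatedly allocating and discarding ops
       does not grow the pool.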
3415
3416       "SAVEFREEOP" is slightly problematic under this scheme. Sometimes it
3417       can cause an op to be freed after its CV. If the CV has forcibly freed
3418       the ops on its slab and the slab itself, then we will be fiddling with
3419       a freed slab. Making "SAVEFREEOP" a no-op doesn't help, as sometimes an
3420       op can be savefreed when there is no compilation error, so the op would
       never be freed. That op holds a reference count on the slab, so the whole
3422       slab would leak. So "SAVEFREEOP" now sets a special flag on the op
3423       ("->op_savefree"). The forced freeing of ops after a compilation error
3424       won't free any ops thus marked.
3425
3426       Since many pieces of code create tiny subroutines consisting of only a
3427       few ops, and since a huge slab would be quite a bit of baggage for
3428       those to carry around, the first slab is always very small. To avoid
3429       allocating too many slabs for a single CV, each subsequent slab is
3430       twice the size of the previous.
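
       The effect of the doubling rule on the number of slabs is easy to
       work out. The helper below is a hypothetical sketch: the "toy_" name
       is invented, sizes are in arbitrary units, and any upper limit on
       slab size that an implementation might impose is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: this function is not part of the Perl API.
 * Returns how many slabs are needed to hold 'total' units when the first
 * slab holds 'first_size' units and each later slab is twice as large as
 * the one before it. */
static unsigned toy_slabs_needed(size_t first_size, size_t total)
{
    unsigned count = 0;
    size_t size = first_size;
    size_t capacity = 0;

    assert(first_size > 0);
    while (capacity < total) {
        capacity += size;   /* add one more slab */
        size *= 2;          /* the next slab is twice as big */
        count++;
    }
    return count;
}
```

       With an arbitrary first slab of 64 units, holding 1000 units takes 5
       slabs (64 + 128 + 256 + 512 + 1024), whereas fixed-size slabs of 64
       units would need 16.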
3431
3432       Smartmatch expects to be able to allocate an op at run time, run it,
3433       and then throw it away. For that to work the op is simply malloced when
       "PL_compcv" hasn't been set up. So all slab-allocated ops are marked as
3435       such ("->op_slabbed"), to distinguish them from malloced ops.
3436

AUTHORS

3438       Until May 1997, this document was maintained by Jeff Okamoto
3439       <okamoto@corp.hp.com>.  It is now maintained as part of Perl itself by
3440       the Perl 5 Porters <perl5-porters@perl.org>.
3441
3442       With lots of help and suggestions from Dean Roehrich, Malcolm Beattie,
3443       Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil
3444       Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer,
3445       Stephen McCamant, and Gurusamy Sarathy.
3446