PERLGUTS(1)            Perl Programmers Reference Guide            PERLGUTS(1)

NAME

       perlguts - Introduction to the Perl API

DESCRIPTION

       This document attempts to describe how to use the Perl API, as well as
       to provide some info on the basic workings of the Perl core.  It is far
       from complete and probably contains many errors.  Please refer any
       questions or comments to the author below.

Variables

   Datatypes
       Perl has three typedefs that handle Perl's three main data types:

           SV  Scalar Value
           AV  Array Value
           HV  Hash Value

       Each typedef has specific routines that manipulate the various data
       types.

   What is an "IV"?
       Perl uses a special typedef IV which is a simple signed integer type
       that is guaranteed to be large enough to hold a pointer (as well as an
       integer).  Additionally, there is the UV, which is simply an unsigned
       IV.

       Perl also uses several special typedefs to declare variables to hold
       integers of (at least) a given size.  Use I8, I16, I32, and I64 to
       declare a signed integer variable which has at least as many bits as
       the number in its name.  These all evaluate to the native C type that
       is closest to the given number of bits, but no smaller than that
       number.  For example, on many platforms, a "short" is 16 bits long, and
       if so, I16 will evaluate to a "short".  But on platforms where a
       "short" isn't exactly 16 bits, Perl will use the smallest type that
       contains 16 bits or more.

       Use U8, U16, U32, and U64 to declare the corresponding unsigned
       integer types.

       If the platform doesn't support 64-bit integers, both I64 and U64 will
       be undefined.  Use IV and UV to declare the largest practicable integer
       types, and "WIDEST_UTYPE" in perlapi for the absolute maximum unsigned
       type, which may not be usable in all circumstances.

       A numeric constant can be specified with "INT16_C" in perlapi,
       "UINTMAX_C" in perlapi, and similar.

   Working with SVs
       An SV can be created and loaded with one command.  There are five types
       of values that can be loaded: an integer value (IV), an unsigned
       integer value (UV), a double (NV), a string (PV), and another scalar
       (SV).  ("PV" stands for "Pointer Value".  You might think that it is
       misnamed because it is described as pointing only to strings.  However,
       it is possible to have it point to other things.  For example, it could
       point to an array of UVs.  But, using it for non-strings requires care,
       as the underlying assumption of much of the internals is that PVs are
       just for strings.  Often, for example, a trailing "NUL" is tacked on
       automatically.  The non-string use is documented only in this
       paragraph.)

       The seven routines are:

           SV*  newSViv(IV);
           SV*  newSVuv(UV);
           SV*  newSVnv(double);
           SV*  newSVpv(const char*, STRLEN);
           SV*  newSVpvn(const char*, STRLEN);
           SV*  newSVpvf(const char*, ...);
           SV*  newSVsv(SV*);

       "STRLEN" is an integer type ("Size_t", usually defined as "size_t" in
       config.h) guaranteed to be large enough to represent the size of any
       string that perl can handle.

       In the unlikely case of an SV requiring more complex initialization,
       you can create an empty SV with newSV(len).  If "len" is 0 an empty SV
       of type NULL is returned, else an SV of type PV is returned with
       len + 1 (for the "NUL") bytes of storage allocated, accessible via
       SvPVX.  In both cases the SV has the undef value.

           SV *sv = newSV(0);   /* no storage allocated  */
           SV *sv = newSV(10);  /* 10 (+1) bytes of uninitialised storage
                                 * allocated */

       To change the value of an already-existing SV, there are eight
       routines:

           void  sv_setiv(SV*, IV);
           void  sv_setuv(SV*, UV);
           void  sv_setnv(SV*, double);
           void  sv_setpv(SV*, const char*);
           void  sv_setpvn(SV*, const char*, STRLEN);
           void  sv_setpvf(SV*, const char*, ...);
           void  sv_vsetpvfn(SV*, const char*, STRLEN, va_list *,
                                               SV **, Size_t, bool *);
           void  sv_setsv(SV*, SV*);

       Notice that you can choose to specify the length of the string to be
       assigned by using "sv_setpvn", "newSVpvn", or "newSVpv", or you may
       allow Perl to calculate the length by using "sv_setpv" or by specifying
       0 as the second argument to "newSVpv".  Be warned, though, that Perl
       will determine the string's length by using "strlen", which depends on
       the string terminating with a "NUL" character, and not otherwise
       containing NULs.

       The arguments of "sv_setpvf" are processed like "sprintf", and the
       formatted output becomes the value.

       "sv_vsetpvfn" is an analogue of "vsprintf", but it allows you to
       specify either a pointer to a variable argument list or the address and
       length of an array of SVs.  The last argument points to a boolean; on
       return, if that boolean is true, then locale-specific information has
       been used to format the string, and the string's contents are therefore
       untrustworthy (see perlsec).  This pointer may be NULL if that
       information is not important.  Note that this function requires you to
       specify the length of the format.

       The "sv_set*()" functions are not generic enough to operate on values
       that have "magic".  See "Magic Virtual Tables" later in this document.

       All SVs that contain strings should be terminated with a "NUL"
       character.  If it is not "NUL"-terminated there is a risk of core dumps
       and corruptions from code which passes the string to C functions or
       system calls which expect a "NUL"-terminated string.  Perl's own
       functions typically add a trailing "NUL" for this reason.
       Nevertheless, you should be very careful when you pass a string stored
       in an SV to a C function or system call.

       To access the actual value that an SV points to, Perl's API exposes
       several macros that coerce the actual scalar type into an IV, UV,
       double, or string:

       •   "SvIV(SV*)" ("IV") and "SvUV(SV*)" ("UV")

       •   "SvNV(SV*)" ("double")

       •   Strings are a bit complicated:

           •   Byte string: "SvPVbyte(SV*, STRLEN len)" or
               "SvPVbyte_nolen(SV*)"

               If the Perl string is "\xff\xff", then this returns a 2-byte
               "char*".

               This is suitable for Perl strings that represent bytes.

           •   UTF-8 string: "SvPVutf8(SV*, STRLEN len)" or
               "SvPVutf8_nolen(SV*)"

               If the Perl string is "\xff\xff", then this returns a 4-byte
               "char*".

               This is suitable for Perl strings that represent characters.

               CAVEAT: That "char*" will be encoded via Perl's internal UTF-8
               variant, which means that if the SV contains non-Unicode code
               points (e.g., 0x110000), then the result may contain extensions
               over valid UTF-8.  See "is_strict_utf8_string" in perlapi for
               some methods Perl gives you to check the UTF-8 validity of
               these macros' returns.

           •   You can also use "SvPV(SV*, STRLEN len)" or "SvPV_nolen(SV*)"
               to fetch the SV's raw internal buffer. This is tricky, though;
               if your Perl string is "\xff\xff", then depending on the SV's
               internal encoding you might get back a 2-byte OR a 4-byte
               "char*".  Moreover, if it's the 4-byte string, that could come
               from either Perl "\xff\xff" stored UTF-8 encoded, or Perl
               "\xc3\xbf\xc3\xbf" stored as raw octets. To differentiate
               between these you MUST look up the SV's UTF8 bit (cf. "SvUTF8")
               to know whether the source Perl string is 2 characters
               ("SvUTF8" would be on) or 4 characters ("SvUTF8" would be off).

               IMPORTANT: Use of "SvPV", "SvPV_nolen", or similarly-named
               macros without looking up the SV's UTF8 bit is almost certainly
               a bug if non-ASCII input is allowed.

               When the UTF8 bit is on, the same CAVEAT about UTF-8 validity
               applies here as for "SvPVutf8".

           (See "How do I pass a Perl string to a C library?" for more
           details.)

           In "SvPVbyte", "SvPVutf8", and "SvPV", the length of the "char*"
           returned is placed into the variable "len" (these are macros, so
           you do not use &len). If you do not care what the length of the
           data is, use "SvPVbyte_nolen", "SvPVutf8_nolen", or "SvPV_nolen"
           instead.  The global variable "PL_na" can also be given to
           "SvPVbyte"/"SvPVutf8"/"SvPV" in this case.  But that can be quite
           inefficient because "PL_na" must be accessed in thread-local
           storage in threaded Perl.  In any case, remember that Perl allows
           arbitrary strings of data that may both contain NULs and might not
           be terminated by a "NUL".

           Also remember that C doesn't allow you to safely say
           "foo(SvPVbyte(s, len), len);".  It might work with your compiler,
           but it won't work for everyone.  Break this sort of statement up
           into separate assignments:

               SV *s;
               STRLEN len;
               char *ptr;
               ptr = SvPVbyte(s, len);
               foo(ptr, len);

       If you want to know if the scalar value is TRUE, you can use:

           SvTRUE(SV*)

       Although Perl will automatically grow strings for you, if you need to
       force Perl to allocate more memory for your SV, you can use the macro

           SvGROW(SV*, STRLEN newlen)

       which will determine if more memory needs to be allocated.  If so, it
       will call the function "sv_grow".  Note that "SvGROW" can only
       increase, not decrease, the allocated memory of an SV and that it does
       not automatically add space for the trailing "NUL" byte (perl's own
       string functions typically do "SvGROW(sv, len + 1)").

       If you want to write to an existing SV's buffer and set its value to a
       string, use SvPVbyte_force() or one of its variants to force the SV to
       be a PV.  This will remove any of various types of non-stringness from
       the SV while preserving the content of the SV in the PV.  This can be
       used, for example, to append data from an API function to a buffer
       without extra copying:

           (void)SvPVbyte_force(sv, len);
           s = SvGROW(sv, len + needlen + 1);
           /* something that modifies up to needlen bytes at s+len, but
              modifies newlen bytes
                eg. newlen = read(fd, s + len, needlen);
              ignoring errors for these examples
            */
           s[len + newlen] = '\0';
           SvCUR_set(sv, len + newlen);
           SvUTF8_off(sv);
           SvSETMAGIC(sv);

       If you already have the data in memory or if you want to keep your code
       simple, you can use one of the sv_cat*() variants, such as sv_catpvn().
       If you want to insert anywhere in the string you can use sv_insert() or
       sv_insert_flags().

       If you don't need the existing content of the SV, you can avoid some
       copying with:

           SvPVCLEAR(sv);
           s = SvGROW(sv, needlen + 1);
           /* something that modifies up to needlen bytes at s, but modifies
              newlen bytes
                eg. newlen = read(fd, s, needlen);
            */
           s[newlen] = '\0';
           SvCUR_set(sv, newlen);
           SvPOK_only(sv); /* also clears SVf_UTF8 */
           SvSETMAGIC(sv);

       Again, if you already have the data in memory or want to avoid the
       complexity of the above, you can use sv_setpvn().

       If you have a buffer allocated with Newx() and want to set that as the
       SV's value, you can use sv_usepvn_flags().  That has some requirements
       if you want to avoid perl re-allocating the buffer to fit the trailing
       NUL:

          Newx(buf, somesize+1, char);
          /* ... fill in buf ... */
          buf[somesize] = '\0';
          sv_usepvn_flags(sv, buf, somesize, SV_SMAGIC | SV_HAS_TRAILING_NUL);
          /* buf now belongs to perl, don't release it */

       If you have an SV and want to know what kind of data Perl thinks is
       stored in it, you can use the following macros to check the type of SV
       you have.

           SvIOK(SV*)
           SvNOK(SV*)
           SvPOK(SV*)

       You can get and set the current length of the string stored in an SV
       with the following macros:

           SvCUR(SV*)
           SvCUR_set(SV*, STRLEN val)

       You can also get a pointer to the end of the string stored in the SV
       with the macro:

           SvEND(SV*)

       But note that these last three macros are valid only if "SvPOK()" is
       true.

       If you want to append something to the end of the string stored in an
       "SV*", you can use the following functions:

           void  sv_catpv(SV*, const char*);
           void  sv_catpvn(SV*, const char*, STRLEN);
           void  sv_catpvf(SV*, const char*, ...);
           void  sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **,
                                                            Size_t, bool *);
           void  sv_catsv(SV*, SV*);

       The first function calculates the length of the string to be appended
       by using "strlen".  In the second, you specify the length of the string
       yourself.  The third function processes its arguments like "sprintf"
       and appends the formatted output.  The fourth function works like
       "vsprintf".  You can specify the address and length of an array of SVs
       instead of the va_list argument.  The fifth function extends the string
       stored in the first SV with the string stored in the second SV.  It
       also forces the second SV to be interpreted as a string.

       The "sv_cat*()" functions are not generic enough to operate on values
       that have "magic".  See "Magic Virtual Tables" later in this document.

       If you know the name of a scalar variable, you can get a pointer to its
       SV by using the following:

           SV*  get_sv("package::varname", 0);

       This returns NULL if the variable does not exist.

       If you want to know if this variable (or any other SV) is actually
       "defined", you can call:

           SvOK(SV*)

       The scalar "undef" value is stored in an SV instance called
       "PL_sv_undef".

       Its address can be used whenever an "SV*" is needed.  Make sure that
       you don't try to compare a random sv with &PL_sv_undef.  For example,
       when interfacing Perl code, it'll work correctly for:

         foo(undef);

       But won't work when called as:

         $x = undef;
         foo($x);

       So, to repeat, always use SvOK() to check whether an sv is defined.

       Also you have to be careful when using &PL_sv_undef as a value in AVs
       or HVs (see "AVs, HVs and undefined values").

       There are also the two values "PL_sv_yes" and "PL_sv_no", which contain
       boolean TRUE and FALSE values, respectively.  Like "PL_sv_undef", their
       addresses can be used whenever an "SV*" is needed.

       Do not be fooled into thinking that "(SV *) 0" is the same as
       &PL_sv_undef.  Take this code:

           SV* sv = (SV*) 0;
           if (I-am-to-return-a-real-value) {
                   sv = sv_2mortal(newSViv(42));
           }
           sv_setsv(ST(0), sv);

       This code tries to return a new SV (which contains the value 42) if it
       should return a real value, or undef otherwise.  Instead it has
       returned a NULL pointer which, somewhere down the line, will cause a
       segmentation violation, bus error, or just weird results.  Change the
       zero to &PL_sv_undef in the first line and all will be well.

       To free an SV that you've created, call "SvREFCNT_dec(SV*)".  Normally
       this call is not necessary (see "Reference Counts and Mortality").

   Offsets
       Perl provides the function "sv_chop" to efficiently remove characters
       from the beginning of a string; you give it an SV and a pointer to
       somewhere inside the PV, and it discards everything before the pointer.
       The efficiency comes by means of a little hack: instead of actually
       removing the characters, "sv_chop" sets the flag "OOK" (offset OK) to
       signal to other functions that the offset hack is in effect, and it
       moves the PV pointer (called "SvPVX") forward by the number of bytes
       chopped off, and adjusts "SvCUR" and "SvLEN" accordingly.  (A portion
       of the space between the old and new PV pointers is used to store the
       count of chopped bytes.)

       Hence, at this point, the start of the buffer that we allocated lives
       at "SvPVX(sv) - SvIV(sv)" in memory and the PV pointer is pointing into
       the middle of this allocated storage.

       This is best demonstrated by example.  Normally copy-on-write will
       prevent the substitution operator from using this hack, but if you
       can craft a string for which copy-on-write is not possible, you can see
       it in play.  In the current implementation, the final byte of a string
       buffer is used as a copy-on-write reference count.  If the buffer is
       not big enough, then copy-on-write is skipped.  First have a look at an
       empty string:

         % ./perl -Ilib -MDevel::Peek -le '$a=""; $a .= ""; Dump $a'
         SV = PV(0x7ffb7c008a70) at 0x7ffb7c030390
           REFCNT = 1
           FLAGS = (POK,pPOK)
           PV = 0x7ffb7bc05b50 ""\0
           CUR = 0
           LEN = 10

       Notice here the LEN is 10.  (It may differ on your platform.)  Extend
       the length of the string to one less than 10, and do a substitution:

        % ./perl -Ilib -MDevel::Peek -le '$a=""; $a.="123456789"; $a=~s/.//; \
                                                                   Dump($a)'
        SV = PV(0x7ffa04008a70) at 0x7ffa04030390
          REFCNT = 1
          FLAGS = (POK,OOK,pPOK)
          OFFSET = 1
          PV = 0x7ffa03c05b61 ( "\1" . ) "23456789"\0
          CUR = 8
          LEN = 9

       Here the number of bytes chopped off (1) is shown next as the OFFSET.
       The portion of the string between the "real" and the "fake" beginnings
       is shown in parentheses, and the values of "SvCUR" and "SvLEN" reflect
       the fake beginning, not the real one.  (The first character of the
       string buffer happens to have changed to "\1" here, not "1", because
       the current implementation stores the offset count in the string
       buffer.  This is subject to change.)

       Something similar to the offset hack is performed on AVs to enable
       efficient shifting and splicing off the beginning of the array; while
       "AvARRAY" points to the first element in the array that is visible from
       Perl, "AvALLOC" points to the real start of the C array.  These are
       usually the same, but a "shift" operation can be carried out by
       increasing "AvARRAY" by one and decreasing "AvFILL" and "AvMAX".
       Again, the location of the real start of the C array only comes into
       play when freeing the array.  See "av_shift" in av.c.

   What's Really Stored in an SV?
       Recall that the usual method of determining the type of scalar you have
       is to use "Sv*OK" macros.  Because a scalar can be both a number and a
       string, these macros will usually return TRUE, and calling the
       "Sv*V" macros will do the appropriate conversion of string to
       integer/double or integer/double to string.

       If you really need to know if you have an integer, double, or string
       pointer in an SV, you can use the following three macros instead:

           SvIOKp(SV*)
           SvNOKp(SV*)
           SvPOKp(SV*)

       These will tell you if you truly have an integer, double, or string
       pointer stored in your SV.  The "p" stands for private.

       There are various ways in which the private and public flags may
       differ.  For example, in perl 5.16 and earlier a tied SV may have a
       valid underlying value in the IV slot (so SvIOKp is true), but the data
       should be accessed via the FETCH routine rather than directly, so SvIOK
       is false.  (In perl 5.18 onwards, tied scalars use the flags the same
       way as untied scalars.)  Another is when numeric conversion has
       occurred and precision has been lost: only the private flag is set on
       'lossy' values.  So when an NV is converted to an IV with loss, SvIOKp,
       SvNOKp and SvNOK will be set, while SvIOK won't be.

       In general, though, it's best to use the "Sv*V" macros.

   Working with AVs
       There are two ways to create and load an AV.  The first method creates
       an empty AV:

           AV*  newAV();

       The second method both creates the AV and initially populates it with
       SVs:

           AV*  av_make(SSize_t num, SV **ptr);

       The second argument points to an array containing "num" "SV*"'s.  Once
       the AV has been created, the SVs can be destroyed, if so desired.

       Once the AV has been created, the following operations are possible on
       it:

           void  av_push(AV*, SV*);
           SV*   av_pop(AV*);
           SV*   av_shift(AV*);
           void  av_unshift(AV*, SSize_t num);

       These should be familiar operations, with the exception of
       "av_unshift".  This routine adds "num" elements at the front of the
       array with the "undef" value.  You must then use "av_store" (described
       below) to assign values to these new elements.

       Here are some other functions:

           SSize_t av_top_index(AV*);
           SV**    av_fetch(AV*, SSize_t key, I32 lval);
           SV**    av_store(AV*, SSize_t key, SV* val);

       The "av_top_index" function returns the highest index value in an array
       (just like $#array in Perl).  If the array is empty, -1 is returned.
       The "av_fetch" function returns the value at index "key", but if "lval"
       is non-zero, then "av_fetch" will store an undef value at that index.
       The "av_store" function stores the value "val" at index "key", and does
       not increment the reference count of "val".  Thus the caller is
       responsible for taking care of that, and if "av_store" returns NULL,
       the caller will have to decrement the reference count to avoid a memory
       leak.  Note that "av_fetch" and "av_store" both return "SV**"'s, not
       "SV*"'s as their return value.

       A few more:

           void  av_clear(AV*);
           void  av_undef(AV*);
           void  av_extend(AV*, SSize_t key);

       The "av_clear" function deletes all the elements in the AV* array, but
       does not actually delete the array itself.  The "av_undef" function
       will delete all the elements in the array plus the array itself.  The
       "av_extend" function extends the array so that it contains at least
       "key+1" elements.  If "key+1" is less than the currently allocated
       length of the array, then nothing is done.

       If you know the name of an array variable, you can get a pointer to its
       AV by using the following:

           AV*  get_av("package::varname", 0);

       This returns NULL if the variable does not exist.

       See "Understanding the Magic of Tied Hashes and Arrays" for more
       information on how to use the array access functions on tied arrays.

   Working with HVs
       To create an HV, you use the following routine:

           HV*  newHV();

       Once the HV has been created, the following operations are possible on
       it:

           SV**  hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
           SV**  hv_fetch(HV*, const char* key, U32 klen, I32 lval);

       The "klen" parameter is the length of the key being passed in (Note
       that you cannot pass 0 in as a value of "klen" to tell Perl to measure
       the length of the key).  The "val" argument contains the SV pointer to
       the scalar being stored, and "hash" is the precomputed hash value (zero
       if you want "hv_store" to calculate it for you).  The "lval" parameter
       indicates whether this fetch is actually a part of a store operation,
       in which case a new undefined value will be added to the HV with the
       supplied key and "hv_fetch" will return as if the value had already
       existed.

       Remember that "hv_store" and "hv_fetch" return "SV**"'s and not just
       "SV*".  To access the scalar value, you must first dereference the
       return value.  However, you should check to make sure that the return
       value is not NULL before dereferencing it.

       The first of these two functions checks if a hash table entry exists,
       and the second deletes it.

           bool  hv_exists(HV*, const char* key, U32 klen);
           SV*   hv_delete(HV*, const char* key, U32 klen, I32 flags);

       If "flags" does not include the "G_DISCARD" flag then "hv_delete" will
       create and return a mortal copy of the deleted value.

       And more miscellaneous functions:

           void   hv_clear(HV*);
           void   hv_undef(HV*);

       Like their AV counterparts, "hv_clear" deletes all the entries in the
       hash table but does not actually delete the hash table.  The "hv_undef"
       function deletes both the entries and the hash table itself.

       Perl keeps the actual data in a linked list of structures with a
       typedef of HE.  These contain the actual key and value pointers (plus
       extra administrative overhead).  The key is a string pointer; the value
       is an "SV*".  However, once you have an "HE*", to get the actual key
       and value, use the routines specified below.

           I32    hv_iterinit(HV*);
                   /* Prepares starting point to traverse hash table */
           HE*    hv_iternext(HV*);
                   /* Get the next entry, and return a pointer to a
                      structure that has both the key and value */
           char*  hv_iterkey(HE* entry, I32* retlen);
                   /* Get the key from an HE structure and also return
                      the length of the key string */
           SV*    hv_iterval(HV*, HE* entry);
                   /* Return an SV pointer to the value of the HE
                      structure */
           SV*    hv_iternextsv(HV*, char** key, I32* retlen);
                   /* This convenience routine combines hv_iternext,
                      hv_iterkey, and hv_iterval.  The key and retlen
                      arguments are return values for the key and its
                      length.  The value is the SV* return value */

       If you know the name of a hash variable, you can get a pointer to its
       HV by using the following:

           HV*  get_hv("package::varname", 0);

       This returns NULL if the variable does not exist.

       The hash algorithm is defined in the "PERL_HASH" macro:

           PERL_HASH(hash, key, klen)

       The exact implementation of this macro varies by architecture and
       version of perl, and the return value may change per invocation, so the
       value is only valid for the duration of a single perl process.

       See "Understanding the Magic of Tied Hashes and Arrays" for more
       information on how to use the hash access functions on tied hashes.

   Hash API Extensions
       Beginning with version 5.004, the following functions are also
       supported:

           HE*     hv_fetch_ent  (HV* tb, SV* key, I32 lval, U32 hash);
           HE*     hv_store_ent  (HV* tb, SV* key, SV* val, U32 hash);

           bool    hv_exists_ent (HV* tb, SV* key, U32 hash);
           SV*     hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);

           SV*     hv_iterkeysv  (HE* entry);

       Note that these functions take "SV*" keys, which simplifies writing of
       extension code that deals with hash structures.  These functions also
       allow passing of "SV*" keys to "tie" functions without forcing you to
       stringify the keys (unlike the previous set of functions).

       They also return and accept whole hash entries ("HE*"), making their
       use more efficient (since the hash number for a particular string
       doesn't have to be recomputed every time).  See perlapi for detailed
       descriptions.

       The following macros must always be used to access the contents of hash
       entries.  Note that the arguments to these macros must be simple
       variables, since they may get evaluated more than once.  See perlapi
       for detailed descriptions of these macros.

           HePV(HE* he, STRLEN len)
           HeVAL(HE* he)
           HeHASH(HE* he)
           HeSVKEY(HE* he)
           HeSVKEY_force(HE* he)
           HeSVKEY_set(HE* he, SV* sv)

       These two lower level macros are defined, but must only be used when
       dealing with keys that are not "SV*"s:

           HeKEY(HE* he)
           HeKLEN(HE* he)

       Note that neither "hv_store" nor "hv_store_ent" increments the
       reference count of the stored "val"; that is the caller's
       responsibility.  If these functions return a NULL value, the caller
       will usually have to decrement the reference count of "val" to avoid a
       memory leak.

   AVs, HVs and undefined values
       Sometimes you have to store undefined values in AVs or HVs.  Although
       this may be a rare case, it can be tricky.  That's because you're used
       to using &PL_sv_undef if you need an undefined SV.

       For example, intuition tells you that this XS code:

           AV *av = newAV();
           av_store( av, 0, &PL_sv_undef );

       is equivalent to this Perl code:

           my @av;
           $av[0] = undef;

       Unfortunately, this isn't true.  In perl 5.18 and earlier, AVs use
       &PL_sv_undef as a marker for indicating that an array element has not
       yet been initialized.  Thus, "exists $av[0]" would be true for the
       above Perl code, but false for the array generated by the XS code.  In
       perl 5.20, storing &PL_sv_undef will create a read-only element,
       because the scalar &PL_sv_undef itself is stored, not a copy.

       Similar problems can occur when storing &PL_sv_undef in HVs:

           hv_store( hv, "key", 3, &PL_sv_undef, 0 );

       This will indeed make the value "undef", but if you try to modify the
       value of "key", you'll get the following error:

           Modification of non-creatable hash value attempted

       In perl 5.8.0, &PL_sv_undef was also used to mark placeholders in
       restricted hashes.  This caused such hash entries not to appear when
       iterating over the hash or when checking for the keys with the
       "hv_exists" function.

       You can run into similar problems when you store &PL_sv_yes or
       &PL_sv_no into AVs or HVs.  Trying to modify such elements will give
       you the following error:

           Modification of a read-only value attempted

       To make a long story short, you can use the special variables
       &PL_sv_undef, &PL_sv_yes and &PL_sv_no with AVs and HVs, but you have
       to make sure you know what you're doing.

       Generally, if you want to store an undefined value in an AV or HV, you
       should not use &PL_sv_undef, but rather create a new undefined value
710       using the "newSV" function, for example:
711
712           av_store( av, 42, newSV(0) );
713           hv_store( hv, "foo", 3, newSV(0), 0 );
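
       The difference between the two approaches can be seen in a toy model
       written in plain C.  This is only a sketch and not Perl API code: every
       "toy_" name is invented here, with "toy_undef" standing in for
       &PL_sv_undef and "toy_new_undef()" standing in for "newSV(0)".

```c
#include <assert.h>
#include <stdlib.h>

/* Toy scalar: just enough state to show the read-only problem. */
typedef struct {
    int  defined;    /* 0 while the scalar holds undef */
    int  readonly;   /* 1 if assignment must be refused */
    long value;
} toy_sv;

/* One shared, read-only undefined value (hypothetical stand-in for
 * PL_sv_undef). */
toy_sv toy_undef = { 0, 1, 0 };

/* A fresh, writable, undefined scalar (hypothetical stand-in for
 * newSV(0)).  The caller owns the result and must free() it. */
toy_sv *toy_new_undef(void)
{
    toy_sv *sv = calloc(1, sizeof *sv);
    assert(sv != NULL);
    return sv;
}

/* Assign to a scalar; returns 0 if the scalar is read-only. */
int toy_assign(toy_sv *sv, long v)
{
    if (sv->readonly)
        return 0;
    sv->value = v;
    sv->defined = 1;
    return 1;
}
```

       A container slot that holds the address of "toy_undef" refuses every
       later assignment, because the shared scalar itself was stored.  A slot
       filled by "toy_new_undef()" is undefined too, but accepts a value.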
714
715   References
716       References are a special type of scalar that point to other data types
717       (including other references).
718
719       To create a reference, use either of the following functions:
720
721           SV* newRV_inc((SV*) thing);
722           SV* newRV_noinc((SV*) thing);
723
724       The "thing" argument can be any of an "SV*", "AV*", or "HV*".  The
725       functions are identical except that "newRV_inc" increments the
726       reference count of the "thing", while "newRV_noinc" does not.  For
727       historical reasons, "newRV" is a synonym for "newRV_inc".
728
729       Once you have a reference, you can use the following macro to
730       dereference the reference:
731
732           SvRV(SV*)
733
734       then call the appropriate routines, casting the returned "SV*" to
735       either an "AV*" or "HV*", if required.
736
737       To determine if an SV is a reference, you can use the following macro:
738
739           SvROK(SV*)
740
741       To discover what type of value the reference refers to, use the
742       following macro and then check the return value.
743
744           SvTYPE(SvRV(SV*))
745
746       The most useful types that will be returned are:
747
748           SVt_PVAV    Array
749           SVt_PVHV    Hash
750           SVt_PVCV    Code
751           SVt_PVGV    Glob (possibly a file handle)
752
753       Any numerical value returned which is less than SVt_PVAV will be a
754       scalar of some form.
755
756       See "svtype" in perlapi for more details.
757
758   Blessed References and Class Objects
759       References are also used to support object-oriented programming.  In
760       perl's OO lexicon, an object is simply a reference that has been
761       blessed into a package (or class).  Once blessed, the programmer may
762       now use the reference to access the various methods in the class.
763
764       A reference can be blessed into a package with the following function:
765
766           SV* sv_bless(SV* sv, HV* stash);
767
768       The "sv" argument must be a reference value.  The "stash" argument
769       specifies which class the reference will belong to.  See "Stashes and
770       Globs" for information on converting class names into stashes.
771
772       /* Still under construction */
773
774       The following function upgrades "rv" to a reference if it is not one
775       already, creating a new SV for "rv" to point to.  If "classname" is
776       non-null, the SV is blessed into the specified class.  The SV is returned.
777
778               SV* newSVrv(SV* rv, const char* classname);
779
780       The following three functions copy integer, unsigned integer or double
781       into an SV whose reference is "rv".  SV is blessed if "classname" is
782       non-null.
783
784               SV* sv_setref_iv(SV* rv, const char* classname, IV iv);
785               SV* sv_setref_uv(SV* rv, const char* classname, UV uv);
786               SV* sv_setref_nv(SV* rv, const char* classname, NV nv);
787
788       The following function copies the pointer value (the address, not the
789       string!) into an SV whose reference is rv.  SV is blessed if
790       "classname" is non-null.
791
792               SV* sv_setref_pv(SV* rv, const char* classname, void* pv);
793
794       The following function copies a string into an SV whose reference is
795       "rv".  Set length to 0 to let Perl calculate the string length.  SV is
796       blessed if "classname" is non-null.
797
798           SV* sv_setref_pvn(SV* rv, const char* classname, char* pv,
799                                                                STRLEN length);
800
801       The following function tests whether the SV is blessed into the
802       specified class.  It does not check inheritance relationships.
803
804               int  sv_isa(SV* sv, const char* name);
805
806       The following function tests whether the SV is a reference to a blessed
807       object.
808
809               int  sv_isobject(SV* sv);
810
811       The following function tests whether the SV is derived from the
812       specified class.  SV can be either a reference to a blessed object or a
813       string containing a class name.  This is the function implementing the
814       "UNIVERSAL::isa" functionality.
815
816               bool sv_derived_from(SV* sv, const char* name);
817
818       To check if you've got an object derived from a specific class you have
819       to write:
820
821               if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... }
822
823   Creating New Variables
824       To create a new Perl variable with an undef value which can be accessed
825       from your Perl script, use the following routines, depending on the
826       variable type.
827
828           SV*  get_sv("package::varname", GV_ADD);
829           AV*  get_av("package::varname", GV_ADD);
830           HV*  get_hv("package::varname", GV_ADD);
831
832       Notice the use of GV_ADD as the second parameter.  The new variable can
833       now be set, using the routines appropriate to the data type.
834
835       There are additional macros whose values may be bitwise OR'ed with the
836       "GV_ADD" argument to enable certain extra features.  Those bits are:
837
838       GV_ADDMULTI
839           Marks the variable as multiply defined, thus preventing the:
840
841             Name <varname> used only once: possible typo
842
843           warning.
844
845       GV_ADDWARN
846           Issues the warning:
847
848             Had to create <varname> unexpectedly
849
850           if the variable did not exist before the function was called.
851
852       If you do not specify a package name, the variable is created in the
853       current package.
854
855   Reference Counts and Mortality
856       Perl uses a reference count-driven garbage collection mechanism.  SVs,
857       AVs, or HVs (xV for short in the following) start their life with a
858       reference count of 1.  If the reference count of an xV ever drops to 0,
859       then it will be destroyed and its memory made available for reuse.  At
860       the most basic internal level, reference counts can be manipulated with
861       the following macros:
862
863           int SvREFCNT(SV* sv);
864           SV* SvREFCNT_inc(SV* sv);
865           void SvREFCNT_dec(SV* sv);
866
867       (There are also suffixed versions of the increment and decrement
868       macros, for situations where the full generality of these basic macros
869       can be exchanged for some performance.)
870
871       However, the way a programmer should think about references is not so
872       much in terms of the bare reference count, but in terms of ownership of
873       references.  A reference to an xV can be owned by any of a variety of
874       entities: another xV, the Perl interpreter, an XS data structure, a
875       piece of running code, or a dynamic scope.  An xV generally does not
876       know what entities own the references to it; it only knows how many
877       references there are, which is the reference count.
878
879       To correctly maintain reference counts, it is essential to keep track
880       of what references the XS code is manipulating.  The programmer should
881       always know where a reference has come from and who owns it, and be
882       aware of any creation or destruction of references, and any transfers
883       of ownership.  Because ownership isn't represented explicitly in the xV
884       data structures, only the reference count need be actually maintained
885       by the code, and that means that this understanding of ownership is not
886       actually evident in the code.  For example, transferring ownership of a
887       reference from one owner to another doesn't change the reference count
888       at all, so may be achieved with no actual code.  (The transferring code
889       doesn't touch the referenced object, but does need to ensure that the
890       former owner knows that it no longer owns the reference, and that the
891       new owner knows that it now does.)
892
893       An xV that is visible at the Perl level should not become unreferenced
894       and thus be destroyed.  Normally, an object will only become
895       unreferenced when it is no longer visible, often by the same means that
896       makes it invisible.  For example, a Perl reference value (RV) owns a
897       reference to its referent, so if the RV is overwritten that reference
898       gets destroyed, and the no-longer-reachable referent may be destroyed
899       as a result.
900
901       Many functions have some kind of reference manipulation as part of
902       their purpose.  Sometimes this is documented in terms of ownership of
903       references, and sometimes it is (less helpfully) documented in terms of
904       changes to reference counts.  For example, the newRV_inc() function is
905       documented to create a new RV (with reference count 1) and increment
906       the reference count of the referent that was supplied by the caller.
907       This is best understood as creating a new reference to the referent,
908       which is owned by the created RV, and returning to the caller ownership
909       of the sole reference to the RV.  The newRV_noinc() function instead
910       does not increment the reference count of the referent, but the RV
911       nevertheless ends up owning a reference to the referent.  It is
912       therefore implied that the caller of "newRV_noinc()" is relinquishing a
913       reference to the referent, making this conceptually a more complicated
914       operation even though it does less to the data structures.
915
916       For example, imagine you want to return a reference from an XSUB
917       function.  Inside the XSUB routine, you create an SV which initially
918       has just a single reference, owned by the XSUB routine.  This reference
919       needs to be disposed of before the routine is complete, otherwise it
920       will leak, preventing the SV from ever being destroyed.  So to create
921       an RV referencing the SV, it is most convenient to pass the SV to
922       "newRV_noinc()", which consumes that reference.  Now the XSUB routine
923       no longer owns a reference to the SV, but does own a reference to the
924       RV, which in turn owns a reference to the SV.  The ownership of the
925       reference to the RV is then transferred by the process of returning the
926       RV from the XSUB.
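
       The ownership bookkeeping described above can be modelled in a few
       lines of plain C.  This is a toy sketch and not Perl API code: the
       "toy_" names are invented for illustration, and only the reference
       count and the RV-to-referent link are modelled.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for an xV. */
typedef struct toy_xv {
    int refcnt;
    struct toy_xv *referent;   /* non-NULL when this object acts as an RV */
} toy_xv;

int toy_live = 0;              /* number of objects not yet destroyed */

/* Create an object; like a real xV it starts with one reference,
 * which is owned by the caller. */
toy_xv *toy_new(void)
{
    toy_xv *x = malloc(sizeof *x);
    assert(x != NULL);
    x->refcnt = 1;
    x->referent = NULL;
    toy_live++;
    return x;
}

toy_xv *toy_inc(toy_xv *x)
{
    x->refcnt++;
    return x;
}

/* Give up one reference; destroy the object when none are left. */
void toy_dec(toy_xv *x)
{
    assert(x->refcnt > 0);
    if (--x->refcnt == 0) {
        if (x->referent)
            toy_dec(x->referent);   /* an RV owns one reference to its referent */
        free(x);
        toy_live--;
    }
}

/* Models newRV_inc(): the RV gets a new reference and the caller
 * keeps the one it already had. */
toy_xv *toy_newrv_inc(toy_xv *thing)
{
    toy_xv *rv = toy_new();
    rv->referent = toy_inc(thing);
    return rv;
}

/* Models newRV_noinc(): the caller's reference is handed over to the
 * RV, so the count of the referent does not change. */
toy_xv *toy_newrv_noinc(toy_xv *thing)
{
    toy_xv *rv = toy_new();
    rv->referent = thing;
    return rv;
}
```

       With "toy_newrv_noinc()", releasing the RV is enough to destroy both
       objects, which mirrors the XSUB scenario above.  With
       "toy_newrv_inc()", the caller still owns its original reference and
       must release it separately, or the referent leaks.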
927
928       There are some convenience functions available that can help with the
929       destruction of xVs.  These functions introduce the concept of
930       "mortality".  Much documentation speaks of an xV itself being mortal,
931       but this is misleading.  It is really a reference to an xV that is
932       mortal, and it is possible for there to be more than one mortal
933       reference to a single xV.  For a reference to be mortal means that it
934       is owned by the temps stack, one of perl's many internal stacks, which
935       will destroy that reference "a short time later".  Usually the "short
936       time later" is the end of the current Perl statement.  However, it gets
937       more complicated around dynamic scopes: there can be multiple sets of
938       mortal references hanging around at the same time, with different death
939       dates.  Internally, the actual determinant for when mortal xV
940       references are destroyed depends on two macros, SAVETMPS and FREETMPS.
941       See perlcall and perlxs and "Temporaries Stack" below for more details
942       on these macros.
943
944       Mortal references are mainly used for xVs that are placed on perl's
945       main stack.  The stack is problematic for reference tracking, because
946       it contains a lot of xV references, but doesn't own those references:
947       they are not counted.  Currently, there are many bugs resulting from
948       xVs being destroyed while referenced by the stack, because the stack's
949       uncounted references aren't enough to keep the xVs alive.  So when
950       putting an (uncounted) reference on the stack, it is vitally important
951       to ensure that there will be a counted reference to the same xV that
952       will last at least as long as the uncounted reference.  But it's also
953       important that that counted reference be cleaned up at an appropriate
954       time, and not unduly prolong the xV's life.  For there to be a mortal
955       reference is often the best way to satisfy this requirement, especially
956       if the xV was created especially to be put on the stack and would
957       otherwise be unreferenced.
958
959       To create a mortal reference, use the functions:
960
961           SV*  sv_newmortal()
962           SV*  sv_mortalcopy(SV*)
963           SV*  sv_2mortal(SV*)
964
965       "sv_newmortal()" creates an SV (with the undefined value) whose sole
966       reference is mortal.  "sv_mortalcopy()" creates an xV whose value is a
967       copy of a supplied xV and whose sole reference is mortal.
968       "sv_2mortal()" mortalises an existing xV reference: it transfers
969       ownership of a reference from the caller to the temps stack.  Because
970       "sv_newmortal" gives the new SV no value, it must normally be given one
971       via "sv_setpv", "sv_setiv", etc. :
972
973           SV *tmp = sv_newmortal();
974           sv_setiv(tmp, an_integer);
975
976       As that takes more than one C statement, it is quite common to see
977       this idiom instead:
978
979           SV *tmp = sv_2mortal(newSViv(an_integer));
980
981       The mortal routines are not just for SVs; AVs and HVs can be made
982       mortal by passing their address (cast to "SV*") to the
983       "sv_2mortal" or "sv_mortalcopy" routines.
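
       How mortal references acquire different death dates can be sketched
       with a toy temps stack in plain C.  This is not Perl API code: the
       "toy_" names are invented, the stack has a fixed size, and the old
       floor is handed back to the caller rather than saved on a separate
       stack as the interpreter is assumed to do.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int refcnt; } toy_sv;   /* only the count is modelled */

enum { TOY_TMPS_MAX = 64 };

toy_sv *toy_tmps[TOY_TMPS_MAX];  /* temps stack: owns one reference per slot */
size_t  toy_tmps_ix = 0;         /* number of slots in use */
size_t  toy_tmps_floor = 0;      /* slots below this survive toy_freetmps() */

/* Models sv_2mortal(): the caller's reference now belongs to the
 * temps stack. */
toy_sv *toy_2mortal(toy_sv *sv)
{
    assert(toy_tmps_ix < TOY_TMPS_MAX);
    toy_tmps[toy_tmps_ix++] = sv;
    return sv;
}

/* Models SAVETMPS: start a new set of temporaries and return the old
 * floor so that it can be restored later. */
size_t toy_savetmps(void)
{
    size_t old_floor = toy_tmps_floor;
    toy_tmps_floor = toy_tmps_ix;
    return old_floor;
}

/* Models FREETMPS: drop every mortal reference above the floor. */
void toy_freetmps(void)
{
    while (toy_tmps_ix > toy_tmps_floor) {
        toy_sv *sv = toy_tmps[--toy_tmps_ix];
        assert(sv->refcnt > 0);
        sv->refcnt--;            /* a real xV would be destroyed at zero */
    }
}

/* Models leaving the dynamic scope: restore the saved floor. */
void toy_leave(size_t old_floor)
{
    toy_tmps_floor = old_floor;
}
```

       A reference mortalised before "toy_savetmps()" survives the inner
       "toy_freetmps()" and is only released after "toy_leave()" has restored
       the outer floor.  A reference mortalised inside the inner scope is
       released by the first "toy_freetmps()".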
984
985   Stashes and Globs
986       A stash is a hash that contains all variables that are defined within a
987       package.  Each key of the stash is a symbol name (shared by all the
988       different types of objects that have the same name), and each value in
989       the hash table is a GV (Glob Value).  This GV in turn contains
990       references to the various objects of that name, including (but not
991       limited to) the following:
992
993           Scalar Value
994           Array Value
995           Hash Value
996           I/O Handle
997           Format
998           Subroutine
999
1000       There is a single stash called "PL_defstash" that holds the items that
1001       exist in the "main" package.  To get at the items in other packages,
1002       append the string "::" to the package name.  The items in the "Foo"
1003       package are in the stash "Foo::" in PL_defstash.  The items in the
1004       "Bar::Baz" package are in the stash "Baz::" in "Bar::"'s stash.
1005
1006       To get the stash pointer for a particular package, use one of these functions:
1007
1008           HV*  gv_stashpv(const char* name, I32 flags)
1009           HV*  gv_stashsv(SV*, I32 flags)
1010
1011       The first function takes a literal string, the second uses the string
1012       stored in the SV.  Remember that a stash is just a hash table, so you
1013       get back an "HV*".  The "flags" argument will create a new package if
1014       it is set to GV_ADD.
1015
1016       The name that "gv_stash*v" wants is the name of the package whose
1017       symbol table you want.  The default package is called "main".  If you
1018       have multiply nested packages, pass their names to "gv_stash*v",
1019       separated by "::" as in the Perl language itself.
1020
1021       Alternatively, if you have an SV that is a blessed reference, you can
1022       find out the stash pointer by using:
1023
1024           HV*  SvSTASH(SvRV(SV*));
1025
1026       then use the following to get the package name itself:
1027
1028           char*  HvNAME(HV* stash);
1029
1030       If you need to bless or re-bless an object you can use the following
1031       function:
1032
1033           SV*  sv_bless(SV*, HV* stash)
1034
1035       where the first argument, an "SV*", must be a reference, and the second
1036       argument is a stash.  The returned "SV*" can now be used in the same
1037       way as any other SV.
1038
1039       For more information on references and blessings, consult perlref.
1040
1041   I/O Handles
1042       Like AVs and HVs, IO objects are another type of non-scalar SV which
1043       may contain input and output PerlIO objects or a "DIR *" from
1044       opendir().
1045
1046       You can create a new IO object:
1047
1048           IO*  newIO();
1049
1050       Unlike other SVs, a new IO object is automatically blessed into the
1051       IO::File class.
1052
1053       The IO object contains an input and output PerlIO handle:
1054
1055         PerlIO *IoIFP(IO *io);
1056         PerlIO *IoOFP(IO *io);
1057
1058       Typically if the IO object has been opened on a file, the input handle
1059       is always present, but the output handle is only present if the file is
1060       open for output.  For a file, if both are present they will be the same
1061       PerlIO object.
1062
1063       Distinct input and output PerlIO objects are created for sockets and
1064       character devices.
1065
1066       The IO object also contains other data associated with Perl I/O
1067       handles:
1068
1069         IV IoLINES(io);                /* $. */
1070         IV IoPAGE(io);                 /* $% */
1071         IV IoPAGE_LEN(io);             /* $= */
1072         IV IoLINES_LEFT(io);           /* $- */
1073         char *IoTOP_NAME(io);          /* $^ */
1074         GV *IoTOP_GV(io);              /* $^ */
1075         char *IoFMT_NAME(io);          /* $~ */
1076         GV *IoFMT_GV(io);              /* $~ */
1077         char *IoBOTTOM_NAME(io);
1078         GV *IoBOTTOM_GV(io);
1079         char IoTYPE(io);
1080         U8 IoFLAGS(io);
1081
1082       Most of these are involved with formats.
1083
1084       IoFLAGS() may contain a combination of flags, the most interesting of
1085       which are "IOf_FLUSH" ($|) for autoflush and "IOf_UNTAINT", settable
1086       with IO::Handle's untaint() method.
1087
1088       The IO object may also contain a directory handle:
1089
1090         DIR *IoDIRP(io);
1091
1092       suitable for use with PerlDir_read() etc.
1093
1094       All of these accessor macros are lvalues; there are no distinct
1095       "_set()" macros to modify the members of the IO object.
1096
1097   Double-Typed SVs
1098       Scalar variables normally contain only one type of value, an integer,
1099       double, pointer, or reference.  Perl will automatically convert the
1100       actual scalar data from the stored type into the requested type.
1101
1102       Some scalar variables contain more than one type of scalar data.  For
1103       example, the variable $! contains either the numeric value of "errno"
1104       or its string equivalent from either "strerror" or "sys_errlist[]".
1105
1106       To force multiple data values into an SV, you must do two things: use
1107       the "sv_set*v" routines to add the additional scalar type, then set a
1108       flag so that Perl will believe it contains more than one type of data.
1109       The four macros to set the flags are:
1110
1111               SvIOK_on
1112               SvNOK_on
1113               SvPOK_on
1114               SvROK_on
1115
1116       The particular macro you must use depends on which "sv_set*v" routine
1117       you called first.  This is because every "sv_set*v" routine turns on
1118       only the bit for the particular type of data being set, and turns off
1119       all the rest.
1120
1121       For example, to create a new Perl variable called "dberror" that
1122       contains both the numeric and descriptive string error values, you
1123       could use the following code:
1124
1125           extern int  dberror;
1126           extern char *dberror_list[];
1127
1128           SV* sv = get_sv("dberror", GV_ADD);
1129           sv_setiv(sv, (IV) dberror);
1130           sv_setpv(sv, dberror_list[dberror]);
1131           SvIOK_on(sv);
1132
1133       If the order of "sv_setiv" and "sv_setpv" had been reversed, then the
1134       macro "SvPOK_on" would need to be called instead of "SvIOK_on".
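
       Why the extra flag has to be turned on by hand can be seen in a toy
       model in plain C.  This is a sketch and not Perl API code: the "toy_"
       names and flag values are invented, and the real SV layout is more
       involved than the fixed-size structure assumed here.

```c
#include <assert.h>
#include <string.h>

enum { TOY_IOK = 1, TOY_POK = 2 };   /* hypothetical flag bits */

typedef struct {
    unsigned flags;   /* which of the slots below hold valid data */
    long     iv;      /* integer slot */
    char     pv[64];  /* string slot */
} toy_sv;

/* Like sv_setiv(): store the integer and mark only that slot valid. */
void toy_setiv(toy_sv *sv, long iv)
{
    sv->iv = iv;
    sv->flags = TOY_IOK;
}

/* Like sv_setpv(): store the string and mark only that slot valid.
 * The integer slot keeps its old contents but is no longer trusted. */
void toy_setpv(toy_sv *sv, const char *s)
{
    assert(strlen(s) < sizeof sv->pv);
    strcpy(sv->pv, s);
    sv->flags = TOY_POK;
}

/* Like SvIOK_on(): declare that the integer slot is valid as well. */
void toy_iok_on(toy_sv *sv)
{
    sv->flags |= TOY_IOK;
}
```

       After "toy_setiv()" followed by "toy_setpv()", only "TOY_POK" is set
       even though the integer is still stored.  Calling "toy_iok_on()"
       afterwards makes both slots count as valid, which is the role played
       by "SvIOK_on" in the example above.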
1135
1136   Read-Only Values
1137       In Perl 5.16 and earlier, copy-on-write (see the next section) shared a
1138       flag bit with read-only scalars.  So the only way to test whether
1139       "sv_setsv", etc., will raise a "Modification of a read-only value"
1140       error in those versions is:
1141
1142           SvREADONLY(sv) && !SvIsCOW(sv)
1143
1144       Under Perl 5.18 and later, SvREADONLY only applies to read-only
1145       variables, and, under 5.20, copy-on-write scalars can also be read-
1146       only, so the above check is incorrect.  You just want:
1147
1148           SvREADONLY(sv)
1149
1150       If you need to do this check often, define your own macro like this:
1151
1152           #if PERL_VERSION >= 18
1153           # define SvTRULYREADONLY(sv) SvREADONLY(sv)
1154           #else
1155           # define SvTRULYREADONLY(sv) (SvREADONLY(sv) && !SvIsCOW(sv))
1156           #endif
1157
1158   Copy on Write
1159       Perl implements a copy-on-write (COW) mechanism for scalars, in which
1160       string copies are not immediately made when requested, but are deferred
1161       until made necessary by one or the other scalar changing.  This is
1162       mostly transparent, but one must take care not to modify string buffers
1163       that are shared by multiple SVs.
1164
1165       You can test whether an SV is using copy-on-write with "SvIsCOW(sv)".
1166
1167       You can force an SV to make its own copy of its string buffer by
1168       calling "sv_force_normal(sv)" or "SvPV_force_nolen(sv)".
1169
1170       If you want to make the SV drop its string buffer, use
1171       "sv_force_normal_flags(sv, SV_COW_DROP_PV)" or simply "sv_setsv(sv,
1172       NULL)".
1173
1174       All of these functions will croak on read-only scalars (see the
1175       previous section for more on those).
1176
1177       To test that your code is behaving correctly and not modifying COW
1178       buffers, on systems that support mmap(2) (i.e., Unix) you can configure
1179       perl with "-Accflags=-DPERL_DEBUG_READONLY_COW" and it will turn buffer
1180       violations into crashes.  You will find it to be marvellously slow, so
1181       you may want to skip perl's own tests.
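
       The deferred copy described in this section can be illustrated with a
       toy string type in plain C.  This is a sketch of the general
       copy-on-write technique and not Perl's implementation: the "cow_" names
       are invented, and the sharer count is kept in a separate structure
       purely for simplicity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int   sharers;   /* how many cow_str values point at this buffer */
    char *bytes;     /* NUL-terminated string data */
} cow_buf;

typedef struct { cow_buf *buf; } cow_str;

/* Create a string that owns a private copy of s. */
cow_str cow_new(const char *s)
{
    cow_str t;
    t.buf = malloc(sizeof *t.buf);
    assert(t.buf != NULL);
    t.buf->bytes = malloc(strlen(s) + 1);
    assert(t.buf->bytes != NULL);
    strcpy(t.buf->bytes, s);
    t.buf->sharers = 1;
    return t;
}

/* "Copy" a string by sharing its buffer; no bytes are copied yet. */
cow_str cow_copy(cow_str src)
{
    src.buf->sharers++;
    return src;
}

/* Rough analogue of SvIsCOW(): is the buffer currently shared? */
int cow_is_shared(cow_str t)
{
    return t.buf->sharers > 1;
}

/* Rough analogue of sv_force_normal(): give t a buffer of its own. */
void cow_force_normal(cow_str *t)
{
    if (cow_is_shared(*t)) {
        cow_buf *shared = t->buf;
        *t = cow_new(shared->bytes);   /* the deferred copy happens here */
        shared->sharers--;
    }
}

/* Writers must unshare the buffer before touching the bytes. */
void cow_set_char(cow_str *t, size_t i, char c)
{
    assert(i < strlen(t->buf->bytes));
    cow_force_normal(t);
    t->buf->bytes[i] = c;
}

/* Release one string; the buffer is freed with its last sharer. */
void cow_free(cow_str t)
{
    if (--t.buf->sharers == 0) {
        free(t.buf->bytes);
        free(t.buf);
    }
}
```

       Writing through "cow_set_char()" leaves the other sharer untouched.
       Writing directly to "bytes" while the buffer is shared would change
       both strings, which is the mistake the text above warns against.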
1182
1183   Magic Variables
1184       [This section still under construction.  Ignore everything here.  Post
1185       no bills.  Everything not permitted is forbidden.]
1186
1187       Any SV may be magical, that is, it has special features that a normal
1188       SV does not have.  These features are stored in the SV structure in a
1189       linked list of "struct magic"'s, typedef'ed to "MAGIC".
1190
1191           struct magic {
1192               MAGIC*      mg_moremagic;
1193               MGVTBL*     mg_virtual;
1194               U16         mg_private;
1195               char        mg_type;
1196               U8          mg_flags;
1197               I32         mg_len;
1198               SV*         mg_obj;
1199               char*       mg_ptr;
1200           };
1201
1202       Note this is current as of patchlevel 0, and could change at any time.
1203
1204   Assigning Magic
1205       Perl adds magic to an SV using the sv_magic function:
1206
1207         void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen);
1208
1209       The "sv" argument is a pointer to the SV that is to acquire a new
1210       magical feature.
1211
1212       If "sv" is not already magical, Perl uses the "SvUPGRADE" macro to
1213       convert "sv" to type "SVt_PVMG".  Perl then continues by adding new
1214       magic to the beginning of the linked list of magical features.  Any
1215       prior entry of the same type of magic is deleted.  Note that this can
1216       be overridden, and multiple instances of the same type of magic can be
1217       associated with an SV.
1218
1219       The "name" and "namlen" arguments are used to associate a string with
1220       the magic, typically the name of a variable.  "namlen" is stored in the
1221       "mg_len" field and if "name" is non-null then either a "savepvn" copy
1222       of "name" or "name" itself is stored in the "mg_ptr" field, depending
1223       on whether "namlen" is greater than zero or equal to zero respectively.
1224       As a special case, if "(name && namlen == HEf_SVKEY)" then "name" is
1225       assumed to contain an "SV*" and is stored as-is with its REFCNT
1226       incremented.
1227
1228       The sv_magic function uses "how" to determine which, if any, predefined
1229       "Magic Virtual Table" should be assigned to the "mg_virtual" field.
1230       See the "Magic Virtual Tables" section below.  The "how" argument is
1231       also stored in the "mg_type" field.  The value of "how" should be
1232       chosen from the set of macros "PERL_MAGIC_foo" found in perl.h.  Note
1233       that before these macros were added, Perl internals used to directly
1234       use character literals, so you may occasionally come across old code or
1235       documentation referring to 'U' magic rather than "PERL_MAGIC_uvar" for
1236       example.
1237
1238       The "obj" argument is stored in the "mg_obj" field of the "MAGIC"
1239       structure.  If it is not the same as the "sv" argument, the reference
1240       count of the "obj" object is incremented.  If it is the same, or if the
1241       "how" argument is "PERL_MAGIC_arylen", "PERL_MAGIC_regdatum",
1242       "PERL_MAGIC_regdata", or if it is a NULL pointer, then "obj" is merely
1243       stored, without the reference count being incremented.
1244
1245       See also "sv_magicext" in perlapi for a more flexible way to add magic
1246       to an SV.
1247
1248       There is also a function to add magic to an "HV":
1249
1250           void hv_magic(HV *hv, GV *gv, int how);
1251
1252       This simply calls "sv_magic" and coerces the "gv" argument into an
1253       "SV".
1254
1255       To remove the magic from an SV, call the function sv_unmagic:
1256
1257           int sv_unmagic(SV *sv, int type);
1258
1259       The "type" argument should be equal to the "how" value when the "SV"
1260       was initially made magical.
1261
1262       However, note that "sv_unmagic" removes all magic of a certain "type"
1263       from the "SV".  If you want to remove only certain magic of a "type"
1264       based on the magic virtual table, use "sv_unmagicext" instead:
1265
1266           int sv_unmagicext(SV *sv, int type, MGVTBL *vtbl);
1267
1268   Magic Virtual Tables
1269       The "mg_virtual" field in the "MAGIC" structure is a pointer to an
1270       "MGVTBL" (short for "Magic Virtual Table"), a structure of function
1271       pointers that handle the various operations that might be applied to
1272       that variable.
1273
1274       The "MGVTBL" has five (or sometimes eight) pointers to the following
1275       routine types:
1276
1277           int  (*svt_get)  (pTHX_ SV* sv, MAGIC* mg);
1278           int  (*svt_set)  (pTHX_ SV* sv, MAGIC* mg);
1279           U32  (*svt_len)  (pTHX_ SV* sv, MAGIC* mg);
1280           int  (*svt_clear)(pTHX_ SV* sv, MAGIC* mg);
1281           int  (*svt_free) (pTHX_ SV* sv, MAGIC* mg);
1282
1283           int  (*svt_copy) (pTHX_ SV *sv, MAGIC* mg, SV *nsv,
1284                                                 const char *name, I32 namlen);
1285           int  (*svt_dup)  (pTHX_ MAGIC *mg, CLONE_PARAMS *param);
1286           int  (*svt_local)(pTHX_ SV *nsv, MAGIC *mg);
1287
1288       This MGVTBL structure is set at compile-time in perl.h and there are
1289       currently 32 types.  These different structures contain pointers to
1290       various routines that perform additional actions depending on which
1291       function is being called.
1292
1293          Function pointer    Action taken
1294          ----------------    ------------
1295          svt_get             Do something before the value of the SV is
1296                              retrieved.
1297          svt_set             Do something after the SV is assigned a value.
1298          svt_len             Report on the SV's length.
1299          svt_clear           Clear something the SV represents.
1300          svt_free            Free any extra storage associated with the SV.
1301
1302          svt_copy            copy tied variable magic to a tied element
1303          svt_dup             duplicate a magic structure during thread cloning
1304          svt_local           copy magic to local value during 'local'
1305
1306       For instance, the MGVTBL structure called "vtbl_sv" (which corresponds
1307       to an "mg_type" of "PERL_MAGIC_sv") contains:
1308
1309           { magic_get, magic_set, magic_len, 0, 0 }
1310
1311       Thus, when an SV is determined to be magical and of type
1312       "PERL_MAGIC_sv", if a get operation is being performed, the routine
1313       "magic_get" is called.  All the various routines for the various
1314       magical types begin with "magic_".  NOTE: the magic routines are not
1315       considered part of the Perl API, and may not be exported by the Perl
1316       library.
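
       The dispatch through a table of function pointers can be sketched in
       plain C.  This is a toy model and not Perl API code: the "toy_" names
       are invented, only the get and set slots are modelled, and a scalar
       carries at most one vtable instead of a linked list of "MAGIC"
       structures.

```c
#include <assert.h>
#include <stddef.h>

struct toy_sv;

/* A cut-down vtable: a NULL slot means "no special action", just as a
 * 0 entry does in the vtbl_sv example above. */
typedef struct {
    int (*get)(struct toy_sv *sv);   /* run before the value is read */
    int (*set)(struct toy_sv *sv);   /* run after the value is assigned */
} toy_vtbl;

typedef struct toy_sv {
    long value;
    const toy_vtbl *magic;           /* NULL for a non-magical scalar */
} toy_sv;

/* Read the value, letting any get magic refresh it first. */
long toy_fetch(toy_sv *sv)
{
    assert(sv != NULL);
    if (sv->magic && sv->magic->get)
        sv->magic->get(sv);
    return sv->value;
}

/* Assign the value, then let any set magic react to it. */
void toy_store(toy_sv *sv, long v)
{
    assert(sv != NULL);
    sv->value = v;
    if (sv->magic && sv->magic->set)
        sv->magic->set(sv);
}

/* Example magic: a scalar that mirrors a C variable, in the way that
 * $! mirrors errno. */
long toy_backing = 0;

int toy_mirror_get(toy_sv *sv) { sv->value = toy_backing; return 0; }
int toy_mirror_set(toy_sv *sv) { toy_backing = sv->value; return 0; }

const toy_vtbl toy_mirror_vtbl = { toy_mirror_get, toy_mirror_set };
```

       A scalar whose "magic" field points at "toy_mirror_vtbl" reads from
       and writes through to "toy_backing", while a scalar with a NULL
       "magic" field behaves as an ordinary value.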
1317
1318       The last three slots are a recent addition, and for source code
1319       compatibility they are only checked for if one of the three flags
1320       MGf_COPY, MGf_DUP or MGf_LOCAL is set in mg_flags.  This means that
1321       most code can continue declaring a vtable as a 5-element value.  These
1322       three are currently used exclusively by the threading code, and are
1323       highly subject to change.
1324
1325       The current kinds of Magic Virtual Tables are:
1326
1327        mg_type
1328        (old-style char and macro)   MGVTBL         Type of magic
1329        --------------------------   ------         -------------
1330        \0 PERL_MAGIC_sv             vtbl_sv        Special scalar variable
1331        #  PERL_MAGIC_arylen         vtbl_arylen    Array length ($#ary)
1332        %  PERL_MAGIC_rhash          (none)         Extra data for restricted
1333                                                    hashes
1334        *  PERL_MAGIC_debugvar       vtbl_debugvar  $DB::single, signal, trace
1335                                                    vars
1336        .  PERL_MAGIC_pos            vtbl_pos       pos() lvalue
1337        :  PERL_MAGIC_symtab         (none)         Extra data for symbol
1338                                                    tables
1339        <  PERL_MAGIC_backref        vtbl_backref   For weak ref data
1340        @  PERL_MAGIC_arylen_p       (none)         To move arylen out of XPVAV
1341        B  PERL_MAGIC_bm             vtbl_regexp    Boyer-Moore
1342                                                    (fast string search)
1343        c  PERL_MAGIC_overload_table vtbl_ovrld     Holds overload table
1344                                                    (AMT) on stash
1345        D  PERL_MAGIC_regdata        vtbl_regdata   Regex match position data
1346                                                    (@+ and @- vars)
1347        d  PERL_MAGIC_regdatum       vtbl_regdatum  Regex match position data
1348                                                    element
1349        E  PERL_MAGIC_env            vtbl_env       %ENV hash
1350        e  PERL_MAGIC_envelem        vtbl_envelem   %ENV hash element
1351        f  PERL_MAGIC_fm             vtbl_regexp    Formline
1352                                                    ('compiled' format)
1353        g  PERL_MAGIC_regex_global   vtbl_mglob     m//g target
1354        H  PERL_MAGIC_hints          vtbl_hints     %^H hash
1355        h  PERL_MAGIC_hintselem      vtbl_hintselem %^H hash element
1356        I  PERL_MAGIC_isa            vtbl_isa       @ISA array
1357        i  PERL_MAGIC_isaelem        vtbl_isaelem   @ISA array element
1358        k  PERL_MAGIC_nkeys          vtbl_nkeys     scalar(keys()) lvalue
1359        L  PERL_MAGIC_dbfile         (none)         Debugger %_<filename
1360        l  PERL_MAGIC_dbline         vtbl_dbline    Debugger %_<filename
1361                                                    element
1362        N  PERL_MAGIC_shared         (none)         Shared between threads
1363        n  PERL_MAGIC_shared_scalar  (none)         Shared between threads
1364        o  PERL_MAGIC_collxfrm       vtbl_collxfrm  Locale transformation
1365        P  PERL_MAGIC_tied           vtbl_pack      Tied array or hash
1366        p  PERL_MAGIC_tiedelem       vtbl_packelem  Tied array or hash element
1367        q  PERL_MAGIC_tiedscalar     vtbl_packelem  Tied scalar or handle
1368        r  PERL_MAGIC_qr             vtbl_regexp    Precompiled qr// regex
1369        S  PERL_MAGIC_sig            (none)         %SIG hash
1370        s  PERL_MAGIC_sigelem        vtbl_sigelem   %SIG hash element
1371        t  PERL_MAGIC_taint          vtbl_taint     Taintedness
1372        U  PERL_MAGIC_uvar           vtbl_uvar      Available for use by
1373                                                    extensions
1374        u  PERL_MAGIC_uvar_elem      (none)         Reserved for use by
1375                                                    extensions
1376        V  PERL_MAGIC_vstring        (none)         SV was vstring literal
1377        v  PERL_MAGIC_vec            vtbl_vec       vec() lvalue
1378        w  PERL_MAGIC_utf8           vtbl_utf8      Cached UTF-8 information
1379        x  PERL_MAGIC_substr         vtbl_substr    substr() lvalue
1380        Y  PERL_MAGIC_nonelem        vtbl_nonelem   Array element that does not
1381                                                    exist
1382        y  PERL_MAGIC_defelem        vtbl_defelem   Shadow "foreach" iterator
1383                                                    variable / smart parameter
1384                                                    vivification
1385        \  PERL_MAGIC_lvref          vtbl_lvref     Lvalue reference
1386                                                    constructor
1387        ]  PERL_MAGIC_checkcall      vtbl_checkcall Inlining/mutation of call
1388                                                    to this CV
1389        ~  PERL_MAGIC_ext            (none)         Available for use by
1390                                                    extensions
1391
1392       When an uppercase and lowercase letter both exist in the table, then
1393       the uppercase letter is typically used to represent some kind of
1394       composite type (a list or a hash), and the lowercase letter is used to
1395       represent an element of that composite type.  Some internals code makes
1396       use of this case relationship.  However, 'v' and 'V' (vec and v-string)
1397       are in no way related.
1398
1399       The "PERL_MAGIC_ext" and "PERL_MAGIC_uvar" magic types are defined
1400       specifically for use by extensions and will not be used by perl itself.
1401       Extensions can use "PERL_MAGIC_ext" magic to 'attach' private
1402       information to variables (typically objects).  This is especially
1403       useful because there is no way for normal perl code to corrupt this
1404       private information (unlike using extra elements of a hash object).
1405
1406       Similarly, "PERL_MAGIC_uvar" magic can be used much like tie() to call
1407       a C function any time a scalar's value is used or changed.  The
1408       "MAGIC"'s "mg_ptr" field points to a "ufuncs" structure:
1409
1410           struct ufuncs {
1411               I32 (*uf_val)(pTHX_ IV, SV*);
1412               I32 (*uf_set)(pTHX_ IV, SV*);
1413               IV uf_index;
1414           };
1415
1416       When the SV is read from or written to, the "uf_val" or "uf_set"
1417       function will be called with "uf_index" as the first arg and a pointer
1418       to the SV as the second.  A simple example of how to add
1419       "PERL_MAGIC_uvar" magic is shown below.  Note that the ufuncs structure
1420       is copied by sv_magic, so you can safely allocate it on the stack.
1421
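           void
           Umagic(sv)
               SV *sv;
           PREINIT:
               struct ufuncs uf;
           CODE:
               /* my_get_fn and my_set_fn are assumed to be defined
                * elsewhere with the uf_val/uf_set signature shown above */
               uf.uf_val   = &my_get_fn;
               uf.uf_set   = &my_set_fn;
               uf.uf_index = 0;
               /* sv_magic copies uf, so a stack-allocated struct is safe */
               sv_magic(sv, 0, PERL_MAGIC_uvar, (char*)&uf, sizeof(uf));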
1432
1433       Attaching "PERL_MAGIC_uvar" to arrays is permissible but has no effect.
1434
1435       For hashes there is a specialized hook that gives control over hash
1436       keys (but not values).  This hook calls "PERL_MAGIC_uvar" 'get' magic
1437       if the "set" function in the "ufuncs" structure is NULL.  The hook is
1438       activated whenever the hash is accessed with a key specified as an "SV"
1439       through the functions "hv_store_ent", "hv_fetch_ent", "hv_delete_ent",
1440       and "hv_exists_ent".  Accessing the key as a string through the
1441       functions without the "..._ent" suffix circumvents the hook.  See
1442       "GUTS" in Hash::Util::FieldHash for a detailed description.
1443
1444       Note that because multiple extensions may be using "PERL_MAGIC_ext" or
1445       "PERL_MAGIC_uvar" magic, it is important for extensions to take extra
1446       care to avoid conflict.  Typically only using the magic on objects
1447       blessed into the same class as the extension is sufficient.  For
1448       "PERL_MAGIC_ext" magic, it is usually a good idea to define an
1449       "MGVTBL", even if all its fields will be 0, so that individual "MAGIC"
1450       pointers can be identified as a particular kind of magic using their
1451       magic virtual table.  "mg_findext" provides an easy way to do that:
1452
1453           STATIC MGVTBL my_vtbl = { 0, 0, 0, 0, 0, 0, 0, 0 };
1454
1455           MAGIC *mg;
1456           if ((mg = mg_findext(sv, PERL_MAGIC_ext, &my_vtbl))) {
1457               /* this is really ours, not another module's PERL_MAGIC_ext */
1458               my_priv_data_t *priv = (my_priv_data_t *)mg->mg_ptr;
1459               ...
1460           }
1461
1462       Also note that the "sv_set*()" and "sv_cat*()" functions described
1463       earlier do not invoke 'set' magic on their targets.  This must be done
1464       by the user either by calling the "SvSETMAGIC()" macro after calling
1465       these functions, or by using one of the "sv_set*_mg()" or
1466       "sv_cat*_mg()" functions.  Similarly, generic C code must call the
       "SvGETMAGIC()" macro to invoke any 'get' magic if it uses an SV
1468       obtained from external sources in functions that don't handle magic.
1469       See perlapi for a description of these functions.  For example, calls
1470       to the "sv_cat*()" functions typically need to be followed by
1471       "SvSETMAGIC()", but they don't need a prior "SvGETMAGIC()" since their
1472       implementation handles 'get' magic.
1473
1474   Finding Magic
1475           MAGIC *mg_find(SV *sv, int type); /* Finds the magic pointer of that
1476                                              * type */
1477
1478       This routine returns a pointer to a "MAGIC" structure stored in the SV.
1479       If the SV does not have that magical feature, "NULL" is returned.  If
1480       the SV has multiple instances of that magical feature, the first one
1481       will be returned.  "mg_findext" can be used to find a "MAGIC" structure
1482       of an SV based on both its magic type and its magic virtual table:
1483
1484           MAGIC *mg_findext(SV *sv, int type, MGVTBL *vtbl);
1485
1486       Also, if the SV passed to "mg_find" or "mg_findext" is not of type
1487       SVt_PVMG, Perl may core dump.
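
       The difference between the two lookups can be sketched with a
       stand-alone C model of a magic chain.  This is an illustration under
       simplifying assumptions, not the real implementation: the "toy_"
       types and functions are hypothetical, and the chain is a plain
       singly linked list in which only the address of the vtable matters.

```c
#include <stddef.h>

/* Contents are irrelevant here; only the table's address is compared. */
typedef struct { int unused; } toy_vtbl;

typedef struct toy_magic {
    struct toy_magic *next;
    char              type;   /* e.g. '~' for extension magic */
    const toy_vtbl   *vtbl;
    void             *ptr;    /* private payload */
} toy_magic;

/* First entry of the given type, whatever its vtable. */
static toy_magic *toy_find(toy_magic *head, char type)
{
    toy_magic *mg;
    for (mg = head; mg; mg = mg->next)
        if (mg->type == type)
            return mg;
    return NULL;
}

/* First entry whose type AND vtable address both match. */
static toy_magic *toy_findext(toy_magic *head, char type,
                              const toy_vtbl *vtbl)
{
    toy_magic *mg;
    for (mg = head; mg; mg = mg->next)
        if (mg->type == type && mg->vtbl == vtbl)
            return mg;
    return NULL;
}

static const toy_vtbl other_module_vtbl = { 0 };
static const toy_vtbl my_module_vtbl    = { 0 };

/* Two modules attach '~' magic to the same value.  Returns
 * (payload found by type) * 10 + (payload found by type and vtable). */
static int toy_findext_demo(void)
{
    int other_data = 1, my_data = 2;
    toy_magic mine  = { NULL,  '~', &my_module_vtbl,    &my_data };
    toy_magic other = { &mine, '~', &other_module_vtbl, &other_data };
    toy_magic *by_type = toy_find(&other, '~');
    toy_magic *by_vtbl = toy_findext(&other, '~', &my_module_vtbl);

    if (!by_type || !by_vtbl)
        return -1;
    return *(int *)by_type->ptr * 10 + *(int *)by_vtbl->ptr;
}
```

       Searching by type alone returns whichever entry comes first in the
       chain, here the other module's, while the search that also compares
       the vtable address finds the caller's own entry.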
1488
1489           int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen);
1490
1491       This routine checks to see what types of magic "sv" has.  If the
1492       mg_type field is an uppercase letter, then the mg_obj is copied to
1493       "nsv", but the mg_type field is changed to be the lowercase letter.
1494
1495   Understanding the Magic of Tied Hashes and Arrays
1496       Tied hashes and arrays are magical beasts of the "PERL_MAGIC_tied"
1497       magic type.
1498
1499       WARNING: As of the 5.004 release, proper usage of the array and hash
1500       access functions requires understanding a few caveats.  Some of these
1501       caveats are actually considered bugs in the API, to be fixed in later
1502       releases, and are bracketed with [MAYCHANGE] below.  If you find
1503       yourself actually applying such information in this section, be aware
1504       that the behavior may change in the future, umm, without warning.
1505
1506       The perl tie function associates a variable with an object that
       implements the various GET, SET, etc. methods.  To perform the
1508       equivalent of the perl tie function from an XSUB, you must mimic this
1509       behaviour.  The code below carries out the necessary steps -- firstly
1510       it creates a new hash, and then creates a second hash which it blesses
1511       into the class which will implement the tie methods.  Lastly it ties
1512       the two hashes together, and returns a reference to the new tied hash.
1513       Note that the code below does NOT call the TIEHASH method in the MyTie
1514       class - see "Calling Perl Routines from within C Programs" for details
1515       on how to do this.
1516
           SV*
           mytie()
           PREINIT:
               HV *hash;
               HV *stash;
               SV *tie;
           CODE:
               hash = newHV();                   /* the hash to be tied */
               tie = newRV_noinc((SV*)newHV());  /* the tie object */
               stash = gv_stashpv("MyTie", GV_ADD);
               sv_bless(tie, stash);
               hv_magic(hash, (GV*)tie, PERL_MAGIC_tied);
               RETVAL = newRV_noinc((SV*)hash);
           OUTPUT:
               RETVAL
1532
1533       The "av_store" function, when given a tied array argument, merely
1534       copies the magic of the array onto the value to be "stored", using
1535       "mg_copy".  It may also return NULL, indicating that the value did not
1536       actually need to be stored in the array.  [MAYCHANGE] After a call to
1537       "av_store" on a tied array, the caller will usually need to call
1538       "mg_set(val)" to actually invoke the perl level "STORE" method on the
1539       TIEARRAY object.  If "av_store" did return NULL, a call to
       "SvREFCNT_dec(val)" will usually also be necessary to avoid a memory
1541       leak. [/MAYCHANGE]
1542
1543       The previous paragraph is applicable verbatim to tied hash access using
1544       the "hv_store" and "hv_store_ent" functions as well.
1545
1546       "av_fetch" and the corresponding hash functions "hv_fetch" and
1547       "hv_fetch_ent" actually return an undefined mortal value whose magic
1548       has been initialized using "mg_copy".  Note the value so returned does
1549       not need to be deallocated, as it is already mortal.  [MAYCHANGE] But
1550       you will need to call "mg_get()" on the returned value in order to
1551       actually invoke the perl level "FETCH" method on the underlying TIE
1552       object.  Similarly, you may also call "mg_set()" on the return value
1553       after possibly assigning a suitable value to it using "sv_setsv",
1554       which will invoke the "STORE" method on the TIE object. [/MAYCHANGE]
1555
1556       [MAYCHANGE] In other words, the array or hash fetch/store functions
1557       don't really fetch and store actual values in the case of tied arrays
1558       and hashes.  They merely call "mg_copy" to attach magic to the values
1559       that were meant to be "stored" or "fetched".  Later calls to "mg_get"
1560       and "mg_set" actually do the job of invoking the TIE methods on the
1561       underlying objects.  Thus the magic mechanism currently implements a
1562       kind of lazy access to arrays and hashes.
1563
1564       Currently (as of perl version 5.004), use of the hash and array access
1565       functions requires the user to be aware of whether they are operating
1566       on "normal" hashes and arrays, or on their tied variants.  The API may
1567       be changed to provide more transparent access to both tied and normal
1568       data types in future versions.  [/MAYCHANGE]
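
       The lazy access described in the bracketed paragraphs can be
       modelled in stand-alone C.  This is a sketch only: the "toy_" names
       are hypothetical, the "tie object" is a plain struct holding two
       callbacks, and toy_av_fetch(), toy_mg_get() and toy_mg_set() merely
       imitate the division of labour between "av_fetch", "mg_get" and
       "mg_set".

```c
#include <stddef.h>

/* Hypothetical tie object: callbacks standing in for FETCH and STORE. */
typedef struct {
    int  (*fetch)(void *obj, size_t index);
    void (*store)(void *obj, size_t index, int value);
    void  *obj;
} toy_tie;

/* Hypothetical element: carries "magic" copied from its container. */
typedef struct {
    int            value;  /* meaningless until toy_mg_get() is called */
    const toy_tie *tie;
    size_t         index;
} toy_elem;

/* Imitates av_fetch on a tied array: attaches magic, calls nothing. */
static toy_elem toy_av_fetch(const toy_tie *tie, size_t index)
{
    toy_elem e = { 0, tie, index };
    return e;
}

/* Imitates mg_get: this is where FETCH really runs. */
static void toy_mg_get(toy_elem *e)
{
    e->value = e->tie->fetch(e->tie->obj, e->index);
}

/* Imitates mg_set: this is where STORE really runs. */
static void toy_mg_set(toy_elem *e)
{
    e->tie->store(e->tie->obj, e->index, e->value);
}

/* A trivial tie class that counts how often it is called. */
typedef struct { int data[4]; int fetches; int stores; } toy_backing;

static int backing_fetch(void *obj, size_t i)
{
    toy_backing *b = obj;
    b->fetches++;
    return b->data[i];
}

static void backing_store(void *obj, size_t i, int v)
{
    toy_backing *b = obj;
    b->stores++;
    b->data[i] = v;
}

/* Returns fetches_before * 1000 + fetches * 100 + stores * 10 + data[2]. */
static int toy_lazy_demo(void)
{
    toy_backing b = { { 5, 6, 7, 8 }, 0, 0 };
    toy_tie tie = { backing_fetch, backing_store, &b };
    toy_elem e = toy_av_fetch(&tie, 2);
    int fetches_before = b.fetches;  /* still 0: nothing fetched yet */

    toy_mg_get(&e);                  /* FETCH runs now */
    e.value += 1;
    toy_mg_set(&e);                  /* STORE runs now */
    return fetches_before * 1000 + b.fetches * 100 + b.stores * 10
           + b.data[2];
}
```

       The value 0 recorded before toy_mg_get() is the behaviour the text
       warns about: the fetch function alone does not invoke any method.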
1569
1570       You would do well to understand that the TIEARRAY and TIEHASH
1571       interfaces are mere sugar to invoke some perl method calls while using
1572       the uniform hash and array syntax.  The use of this sugar imposes some
1573       overhead (typically about two to four extra opcodes per FETCH/STORE
1574       operation, in addition to the creation of all the mortal variables
1575       required to invoke the methods).  This overhead will be comparatively
1576       small if the TIE methods are themselves substantial, but if they are
1577       only a few statements long, the overhead will not be insignificant.
1578
1579   Localizing changes
1580       Perl has a very handy construction
1581
1582         {
1583           local $var = 2;
1584           ...
1585         }
1586
1587       This construction is approximately equivalent to
1588
1589         {
1590           my $oldvar = $var;
1591           $var = 2;
1592           ...
1593           $var = $oldvar;
1594         }
1595
1596       The biggest difference is that the first construction would reinstate
1597       the initial value of $var, irrespective of how control exits the block:
1598       "goto", "return", "die"/"eval", etc.  It is a little bit more efficient
1599       as well.
1600
       There is a way to achieve a similar task from C via the Perl API:
       create a pseudo-block, and arrange for some changes to be
       automatically undone at the end of it, either explicitly or via a
       non-local exit (via die()).
1604       A block-like construct is created by a pair of "ENTER"/"LEAVE" macros
1605       (see "Returning a Scalar" in perlcall).  Such a construct may be
1606       created specially for some important localized task, or an existing one
1607       (like boundaries of enclosing Perl subroutine/block, or an existing
1608       pair for freeing TMPs) may be used.  (In the second case the overhead
1609       of additional localization must be almost negligible.)  Note that any
1610       XSUB is automatically enclosed in an "ENTER"/"LEAVE" pair.
1611
       Inside such a pseudo-block the following services are available:
1613
1614       "SAVEINT(int i)"
1615       "SAVEIV(IV i)"
1616       "SAVEI32(I32 i)"
1617       "SAVELONG(long i)"
1618       "SAVEI8(I8 i)"
1619       "SAVEI16(I16 i)"
1620       "SAVEBOOL(int i)"
1621           These macros arrange things to restore the value of integer
1622           variable "i" at the end of the enclosing pseudo-block.
1623
1624       SAVESPTR(s)
1625       SAVEPPTR(p)
1626           These macros arrange things to restore the value of pointers "s"
1627           and "p".  "s" must be a pointer of a type which survives conversion
1628           to "SV*" and back, "p" should be able to survive conversion to
1629           "char*" and back.
1630
1631       "SAVEFREESV(SV *sv)"
1632           The refcount of "sv" will be decremented at the end of pseudo-
1633           block.  This is similar to "sv_2mortal" in that it is also a
1634           mechanism for doing a delayed "SvREFCNT_dec".  However, while
1635           "sv_2mortal" extends the lifetime of "sv" until the beginning of
1636           the next statement, "SAVEFREESV" extends it until the end of the
1637           enclosing scope.  These lifetimes can be wildly different.
1638
1639           Also compare "SAVEMORTALIZESV".
1640
1641       "SAVEMORTALIZESV(SV *sv)"
1642           Just like "SAVEFREESV", but mortalizes "sv" at the end of the
1643           current scope instead of decrementing its reference count.  This
1644           usually has the effect of keeping "sv" alive until the statement
1645           that called the currently live scope has finished executing.
1646
1647       "SAVEFREEOP(OP *op)"
1648           The "OP *" is op_free()ed at the end of pseudo-block.
1649
1650       SAVEFREEPV(p)
1651           The chunk of memory which is pointed to by "p" is Safefree()ed at
1652           the end of pseudo-block.
1653
1654       "SAVECLEARSV(SV *sv)"
1655           Clears a slot in the current scratchpad which corresponds to "sv"
1656           at the end of pseudo-block.
1657
1658       "SAVEDELETE(HV *hv, char *key, I32 length)"
1659           The key "key" of "hv" is deleted at the end of pseudo-block.  The
1660           string pointed to by "key" is Safefree()ed.  If one has a key in
1661           short-lived storage, the corresponding string may be reallocated
1662           like this:
1663
1664             SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf));
1665
1666       "SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)"
1667           At the end of pseudo-block the function "f" is called with the only
1668           argument "p".
1669
1670       "SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)"
1671           At the end of pseudo-block the function "f" is called with the
1672           implicit context argument (if any), and "p".
1673
1674       "SAVESTACK_POS()"
1675           The current offset on the Perl internal stack (cf. "SP") is
1676           restored at the end of pseudo-block.
1677
1678       The following API list contains functions, thus one needs to provide
1679       pointers to the modifiable data explicitly (either C pointers, or
1680       Perlish "GV *"s).  Where the above macros take "int", a similar
1681       function takes "int *".
1682
       Other macros above have functions implementing them, but it's probably
       best to just use the macro, and not those or the ones below.
1685
1686       "SV* save_scalar(GV *gv)"
1687           Equivalent to Perl code "local $gv".
1688
1689       "AV* save_ary(GV *gv)"
1690       "HV* save_hash(GV *gv)"
1691           Similar to "save_scalar", but localize @gv and %gv.
1692
1693       "void save_item(SV *item)"
1694           Duplicates the current value of "SV". On the exit from the current
1695           "ENTER"/"LEAVE" pseudo-block the value of "SV" will be restored
1696           using the stored value.  It doesn't handle magic.  Use
1697           "save_scalar" if magic is affected.
1698
1699       "void save_list(SV **sarg, I32 maxsarg)"
1700           A variant of "save_item" which takes multiple arguments via an
1701           array "sarg" of "SV*" of length "maxsarg".
1702
1703       "SV* save_svref(SV **sptr)"
1704           Similar to "save_scalar", but will reinstate an "SV *".
1705
1706       "void save_aptr(AV **aptr)"
1707       "void save_hptr(HV **hptr)"
1708           Similar to "save_svref", but localize "AV *" and "HV *".
1709
1710       The "Alias" module implements localization of the basic types within
1711       the caller's scope.  People who are interested in how to localize
1712       things in the containing scope should take a look there too.
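
       The pseudo-block mechanism of this section can be pictured as a stack
       of undo records that is unwound back to a saved mark.  The following
       stand-alone C model is a sketch under that assumption, not Perl's
       actual save stack: the "toy_" names are hypothetical, only an
       integer save and a destructor are modelled, and non-local exits are
       not handled at all.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SAVE_MAX 32

/* One undo record: restore an int, or call a destructor. */
typedef struct {
    int   *int_addr;              /* non-NULL: restore *int_addr */
    int    old_value;
    void (*destructor)(void *);   /* non-NULL: call destructor(arg) */
    void  *arg;
} toy_save_entry;

static toy_save_entry toy_savestack[TOY_SAVE_MAX];
static size_t toy_save_ix  = 0;          /* next free undo slot */
static size_t toy_scope[TOY_SAVE_MAX];   /* toy_save_ix at each enter */
static size_t toy_scope_ix = 0;

/* Imitates ENTER: remember how tall the undo stack is right now. */
static void toy_enter(void)
{
    assert(toy_scope_ix < TOY_SAVE_MAX);
    toy_scope[toy_scope_ix++] = toy_save_ix;
}

/* Imitates SAVEINT: record the current value for later restoration. */
static void toy_save_int(int *p)
{
    toy_save_entry e = { p, *p, NULL, NULL };
    assert(toy_save_ix < TOY_SAVE_MAX);
    toy_savestack[toy_save_ix++] = e;
}

/* Imitates SAVEDESTRUCTOR: arrange for f(arg) at the end of the block. */
static void toy_save_destructor(void (*f)(void *), void *arg)
{
    toy_save_entry e = { NULL, 0, f, arg };
    assert(toy_save_ix < TOY_SAVE_MAX);
    toy_savestack[toy_save_ix++] = e;
}

/* Imitates LEAVE: undo everything recorded since the matching enter,
 * most recent record first. */
static void toy_leave(void)
{
    size_t floor;
    assert(toy_scope_ix > 0);
    floor = toy_scope[--toy_scope_ix];
    while (toy_save_ix > floor) {
        toy_save_entry *e = &toy_savestack[--toy_save_ix];
        if (e->int_addr)
            *e->int_addr = e->old_value;
        if (e->destructor)
            e->destructor(e->arg);
    }
}

static void toy_count_call(void *arg) { ++*(int *)arg; }

/* Returns (value inside block) * 100 + (value after) * 10 + calls. */
static int toy_localize_demo(void)
{
    int var = 1, calls = 0, inside;

    toy_enter();
    toy_save_int(&var);
    toy_save_destructor(toy_count_call, &calls);
    var = 2;
    inside = var;
    toy_leave();
    return inside * 100 + var * 10 + calls;
}
```

       Because the records are undone in reverse order, nested blocks in
       this model restore their own changes before those of the enclosing
       block.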
1713

Subroutines

1715   XSUBs and the Argument Stack
1716       The XSUB mechanism is a simple way for Perl programs to access C
1717       subroutines.  An XSUB routine will have a stack that contains the
1718       arguments from the Perl program, and a way to map from the Perl data
1719       structures to a C equivalent.
1720
1721       The stack arguments are accessible through the ST(n) macro, which
1722       returns the "n"'th stack argument.  Argument 0 is the first argument
1723       passed in the Perl subroutine call.  These arguments are "SV*", and can
1724       be used anywhere an "SV*" is used.
1725
1726       Most of the time, output from the C routine can be handled through use
1727       of the RETVAL and OUTPUT directives.  However, there are some cases
1728       where the argument stack is not already long enough to handle all the
1729       return values.  An example is the POSIX tzname() call, which takes no
1730       arguments, but returns two, the local time zone's standard and summer
1731       time abbreviations.
1732
1733       To handle this situation, the PPCODE directive is used and the stack is
1734       extended using the macro:
1735
1736           EXTEND(SP, num);
1737
1738       where "SP" is the macro that represents the local copy of the stack
1739       pointer, and "num" is the number of elements the stack should be
1740       extended by.
1741
       Now that there is room on the stack, values can be pushed on it using
       the "PUSHs" macro.  The pushed values will often need to be "mortal"
       (see "Reference Counts and Mortality"):
1745
1746           PUSHs(sv_2mortal(newSViv(an_integer)))
1747           PUSHs(sv_2mortal(newSVuv(an_unsigned_integer)))
1748           PUSHs(sv_2mortal(newSVnv(a_double)))
1749           PUSHs(sv_2mortal(newSVpv("Some String",0)))
1750           /* Although the last example is better written as the more
1751            * efficient: */
1752           PUSHs(newSVpvs_flags("Some String", SVs_TEMP))
1753
       Now, in the Perl program calling "tzname", the two values will be
       assigned as in:
1756
1757           ($standard_abbrev, $summer_abbrev) = POSIX::tzname;
1758
       An alternate (and possibly simpler) method of pushing values on the
       stack is to use the macro:
1761
1762           XPUSHs(SV*)
1763
1764       This macro automatically adjusts the stack for you, if needed.  Thus,
1765       you do not need to call "EXTEND" to extend the stack.
1766
       Despite suggestions in earlier versions of this document, the
1768       macros "(X)PUSH[iunp]" are not suited to XSUBs which return multiple
1769       results.  For that, either stick to the "(X)PUSHs" macros shown above,
1770       or use the new "m(X)PUSH[iunp]" macros instead; see "Putting a C value
1771       on Perl stack".
1772
1773       For more information, consult perlxs and perlxstut.
1774
1775   Autoloading with XSUBs
1776       If an AUTOLOAD routine is an XSUB, as with Perl subroutines, Perl puts
1777       the fully-qualified name of the autoloaded subroutine in the $AUTOLOAD
1778       variable of the XSUB's package.
1779
1780       But it also puts the same information in certain fields of the XSUB
1781       itself:
1782
1783           HV *stash           = CvSTASH(cv);
1784           const char *subname = SvPVX(cv);
1785           STRLEN name_length  = SvCUR(cv); /* in bytes */
1786           U32 is_utf8         = SvUTF8(cv);
1787
1788       "SvPVX(cv)" contains just the sub name itself, not including the
1789       package.  For an AUTOLOAD routine in UNIVERSAL or one of its
1790       superclasses, "CvSTASH(cv)" returns NULL during a method call on a
1791       nonexistent package.
1792
1793       Note: Setting $AUTOLOAD stopped working in 5.6.1, which did not support
1794       XS AUTOLOAD subs at all.  Perl 5.8.0 introduced the use of fields in
1795       the XSUB itself.  Perl 5.16.0 restored the setting of $AUTOLOAD.  If
1796       you need to support 5.8-5.14, use the XSUB's fields.
1797
1798   Calling Perl Routines from within C Programs
1799       There are four routines that can be used to call a Perl subroutine from
1800       within a C program.  These four are:
1801
1802           I32  call_sv(SV*, I32);
1803           I32  call_pv(const char*, I32);
1804           I32  call_method(const char*, I32);
1805           I32  call_argv(const char*, I32, char**);
1806
1807       The routine most often used is "call_sv".  The "SV*" argument contains
1808       either the name of the Perl subroutine to be called, or a reference to
1809       the subroutine.  The second argument consists of flags that control the
1810       context in which the subroutine is called, whether or not the
1811       subroutine is being passed arguments, how errors should be trapped, and
1812       how to treat return values.
1813
1814       All four routines return the number of arguments that the subroutine
1815       returned on the Perl stack.
1816
1817       These routines used to be called "perl_call_sv", etc., before Perl
1818       v5.6.0, but those names are now deprecated; macros of the same name are
1819       provided for compatibility.
1820
       When using any of these routines (except "call_argv"), the programmer
       must manipulate the Perl stack.  The tools for doing so include the
       following macros and functions:
1824
1825           dSP
1826           SP
1827           PUSHMARK()
1828           PUTBACK
1829           SPAGAIN
1830           ENTER
1831           SAVETMPS
1832           FREETMPS
1833           LEAVE
1834           XPUSH*()
1835           POP*()
1836
1837       For a detailed description of calling conventions from C to Perl,
1838       consult perlcall.
1839
1840   Putting a C value on Perl stack
       A lot of opcodes (an opcode is an elementary operation in the internal
       perl stack machine) put an SV* on the stack.  However, as an optimization
1843       the corresponding SV is (usually) not recreated each time.  The opcodes
1844       reuse specially assigned SVs (targets) which are (as a corollary) not
1845       constantly freed/created.
1846
1847       Each of the targets is created only once (but see "Scratchpads and
1848       recursion" below), and when an opcode needs to put an integer, a
1849       double, or a string on stack, it just sets the corresponding parts of
1850       its target and puts the target on stack.
1851
1852       The macro to put this target on stack is "PUSHTARG", and it is directly
1853       used in some opcodes, as well as indirectly in zillions of others,
1854       which use it via "(X)PUSH[iunp]".
1855
1856       Because the target is reused, you must be careful when pushing multiple
1857       values on the stack.  The following code will not do what you think:
1858
1859           XPUSHi(10);
1860           XPUSHi(20);
1861
1862       This translates as "set "TARG" to 10, push a pointer to "TARG" onto the
1863       stack; set "TARG" to 20, push a pointer to "TARG" onto the stack".  At
1864       the end of the operation, the stack does not contain the values 10 and
1865       20, but actually contains two pointers to "TARG", which we have set to
1866       20.
1867
1868       If you need to push multiple different values then you should either
1869       use the "(X)PUSHs" macros, or else use the new "m(X)PUSH[iunp]" macros,
1870       none of which make use of "TARG".  The "(X)PUSHs" macros simply push an
1871       SV* on the stack, which, as noted under "XSUBs and the Argument Stack",
1872       will often need to be "mortal".  The new "m(X)PUSH[iunp]" macros make
1873       this a little easier to achieve by creating a new mortal for you (via
1874       "(X)PUSHmortal"), pushing that onto the stack (extending it if
1875       necessary in the case of the "mXPUSH[iunp]" macros), and then setting
1876       its value.  Thus, instead of writing this to "fix" the example above:
1877
1878           XPUSHs(sv_2mortal(newSViv(10)))
1879           XPUSHs(sv_2mortal(newSViv(20)))
1880
1881       you can simply write:
1882
1883           mXPUSHi(10)
1884           mXPUSHi(20)
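
       The aliasing problem and its fix can be reproduced in a few lines of
       stand-alone C.  This is a toy model rather than the real macros: the
       "toy_" names are hypothetical, the stack holds pointers to a minimal
       integer-only SV, and there are no bounds checks.

```c
#include <stddef.h>

typedef struct { long iv; } toy_sv;   /* hypothetical integer-only SV */

#define TOY_STACK_MAX 8

static toy_sv *toy_stack[TOY_STACK_MAX];    /* the argument stack */
static size_t  toy_sp = 0;

static toy_sv  toy_targ;                    /* the one reusable target */
static toy_sv  toy_mortals[TOY_STACK_MAX];  /* pool of fresh values */
static size_t  toy_mortal_ix = 0;

/* In the spirit of XPUSHi: set the shared target, push a pointer to it. */
static void toy_push_targ_i(long v)
{
    toy_targ.iv = v;
    toy_stack[toy_sp++] = &toy_targ;
}

/* In the spirit of mXPUSHi: take a fresh value, set it, push it. */
static void toy_push_mortal_i(long v)
{
    toy_sv *sv = &toy_mortals[toy_mortal_ix++];
    sv->iv = v;
    toy_stack[toy_sp++] = sv;
}

static void toy_reset(void)
{
    toy_sp = 0;
    toy_mortal_ix = 0;
}

/* Both stack slots point at the same target, so both read 20. */
static long toy_targ_demo(void)
{
    toy_reset();
    toy_push_targ_i(10);
    toy_push_targ_i(20);
    return toy_stack[0]->iv * 100 + toy_stack[1]->iv;
}

/* Each slot points at its own value, so 10 and 20 both survive. */
static long toy_mortal_demo(void)
{
    toy_reset();
    toy_push_mortal_i(10);
    toy_push_mortal_i(20);
    return toy_stack[0]->iv * 100 + toy_stack[1]->iv;
}
```

       In this model toy_targ_demo() yields 2020 and toy_mortal_demo()
       yields 1020, matching the two outcomes described in the text.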
1885
1886       On a related note, if you do use "(X)PUSH[iunp]", then you're going to
1887       need a "dTARG" in your variable declarations so that the "*PUSH*"
1888       macros can make use of the local variable "TARG".  See also "dTARGET"
1889       and "dXSTARG".
1890
1891   Scratchpads
       The question remains of when the SVs which are targets for opcodes are
1893       created.  The answer is that they are created when the current unit--a
1894       subroutine or a file (for opcodes for statements outside of
1895       subroutines)--is compiled.  During this time a special anonymous Perl
1896       array is created, which is called a scratchpad for the current unit.
1897
1898       A scratchpad keeps SVs which are lexicals for the current unit and are
1899       targets for opcodes.  A previous version of this document stated that
       one can deduce that an SV lives on a scratchpad by looking at its
1901       flags: lexicals have "SVs_PADMY" set, and targets have "SVs_PADTMP"
1902       set.  But this has never been fully true.  "SVs_PADMY" could be set on
1903       a variable that no longer resides in any pad.  While targets do have
1904       "SVs_PADTMP" set, it can also be set on variables that have never
1905       resided in a pad, but nonetheless act like targets.  As of perl 5.21.5,
1906       the "SVs_PADMY" flag is no longer used and is defined as 0.
1907       "SvPADMY()" now returns true for anything without "SVs_PADTMP".
1908
1909       The correspondence between OPs and targets is not 1-to-1.  Different
1910       OPs in the compile tree of the unit can use the same target, if this
1911       would not conflict with the expected life of the temporary.
1912
1913   Scratchpads and recursion
1914       In fact it is not 100% true that a compiled unit contains a pointer to
1915       the scratchpad AV.  In fact it contains a pointer to an AV of
1916       (initially) one element, and this element is the scratchpad AV.  Why do
1917       we need an extra level of indirection?
1918
1919       The answer is recursion, and maybe threads.  Both these can create
1920       several execution pointers going into the same subroutine.  So that
1921       the subroutine-child does not write over the temporaries of the
1922       subroutine-parent (whose lifespan covers the call to the child), the
1923       parent and the child should have different scratchpads.  (And the
1924       lexicals should be separate anyway!)
1925
1926       So each subroutine is born with an array of scratchpads (of length 1).
1927       On each entry to the subroutine, perl checks whether the current depth
1928       of the recursion is more than the length of this array, and if it is,
1929       a new scratchpad is created and pushed onto the array.
1930
1931       The targets on this scratchpad are "undef"s, but they are already
1932       marked with correct flags.
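
       The extra level of indirection can be pictured with a small
       standalone model.  This is only a sketch in plain C, not Perl's actual
       pad code: "ToyPad", "ToyPadlist" and the "toy_*" functions are
       hypothetical names, and a pad is reduced to a fixed array of ints.

```c
#include <assert.h>
#include <stdlib.h>

#define PAD_SLOTS 4                 /* hypothetical fixed pad size */

typedef struct { int slot[PAD_SLOTS]; } ToyPad;

typedef struct {
    ToyPad **pads;                  /* pads[d-1] serves recursion depth d */
    size_t   len;                   /* number of pads created so far */
} ToyPadlist;

/* A unit is born with a pad list of length 1. */
static ToyPadlist *toy_padlist_new(void)
{
    ToyPadlist *pl = malloc(sizeof *pl);
    assert(pl);
    pl->pads = malloc(sizeof *pl->pads);
    assert(pl->pads);
    pl->pads[0] = calloc(1, sizeof(ToyPad));
    assert(pl->pads[0]);
    pl->len = 1;
    return pl;
}

/* On entry at a (1-based) depth, push a fresh pad if none exists yet. */
static ToyPad *toy_pad_for_depth(ToyPadlist *pl, size_t depth)
{
    assert(depth >= 1 && depth <= pl->len + 1);
    if (depth > pl->len) {
        ToyPad **grown = realloc(pl->pads, depth * sizeof *grown);
        assert(grown);
        pl->pads = grown;
        pl->pads[depth - 1] = calloc(1, sizeof(ToyPad));
        assert(pl->pads[depth - 1]);
        pl->len = depth;
    }
    return pl->pads[depth - 1];
}

static void toy_padlist_free(ToyPadlist *pl)
{
    for (size_t i = 0; i < pl->len; i++)
        free(pl->pads[i]);
    free(pl->pads);
    free(pl);
}
```

       In this model a parent call at depth 1 and a child call at depth 2
       get distinct pads, so the child cannot overwrite the parent's
       temporaries, and a pad created once is reused on later calls at the
       same depth.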
1933

Memory Allocation

1935   Allocation
1936       All memory meant to be used with the Perl API functions should be
1937       manipulated using the macros described in this section.  The macros
1938       hide the differences between the actual malloc implementations that
1939       may be used within perl.
1940
1941       The following three macros are used to initially allocate memory:
1942
1943           Newx(pointer, number, type);
1944           Newxc(pointer, number, type, cast);
1945           Newxz(pointer, number, type);
1946
1947       The first argument "pointer" should be the name of a variable that will
1948       point to the newly allocated memory.
1949
1950       The second and third arguments "number" and "type" specify how many of
1951       the specified type of data structure should be allocated.  The argument
1952       "type" is passed to "sizeof".  The final argument to "Newxc", "cast",
1953       should be used if the type that "pointer" points to differs from the
1954       "type" argument.
1955
1956       Unlike the "Newx" and "Newxc" macros, the "Newxz" macro calls "memzero"
1957       to zero out all the newly allocated memory.
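
       To make the roles of "number", "type" and "cast" concrete, here is a
       sketch using stand-in macros built on the C library.  The "TOY_*"
       macros and functions are hypothetical and only mimic the argument
       shape; the real macros are defined in Perl's headers and do more
       (for example, size checks).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; NOT the definitions used by perl. */
#define TOY_NEWX(p, n, type) \
    ((p) = (type *)malloc((size_t)(n) * sizeof(type)))
#define TOY_NEWXC(p, n, type, cast) \
    ((p) = (cast *)malloc((size_t)(n) * sizeof(type)))
#define TOY_NEWXZ(p, n, type) \
    ((p) = (type *)calloc((size_t)(n), sizeof(type)))

/* Allocate n zero-filled ints; return 1 if every element reads as 0. */
static int toy_all_zero(size_t n)
{
    int *v;
    int ok = 1;
    assert(n > 0);
    TOY_NEWXZ(v, n, int);
    if (!v)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (v[i] != 0)
            ok = 0;
    free(v);
    return ok;
}

/* Room for n doubles, addressed as raw bytes: the "cast" case. */
static size_t toy_bytes_for_doubles(size_t n)
{
    unsigned char *bytes;
    assert(n > 0);
    TOY_NEWXC(bytes, n, double, unsigned char);
    if (!bytes)
        return 0;
    bytes[n * sizeof(double) - 1] = 0xFF;   /* last byte is in bounds */
    free(bytes);
    return n * sizeof(double);
}
```

       The point of the sketch is that the size is always "number" times
       "sizeof(type)", and that "cast" only changes the pointer type of the
       result, not the amount of memory requested.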
1958
1959   Reallocation
1960           Renew(pointer, number, type);
1961           Renewc(pointer, number, type, cast);
1962           Safefree(pointer);
1963
1964       These three macros are used to change the size of a memory buffer or
1965       to free a piece of memory that is no longer needed.  The arguments to
1966       "Renew" and "Renewc" match those of "Newx" and "Newxc", described
1967       under "Allocation" above.
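
       The allocate, resize, free life cycle looks roughly like the
       following.  This is a sketch in plain C: "toy_fill_grow_sum" is a
       hypothetical function, and "malloc", "realloc" and "free" stand in
       for "Newx", "Renew" and "Safefree" respectively.

```c
#include <assert.h>
#include <stdlib.h>

/* Fill first_n ints with 1..first_n, grow to final_n, fill the rest,
 * and return the sum of all elements (or -1 on allocation failure). */
static long toy_fill_grow_sum(size_t first_n, size_t final_n)
{
    assert(first_n > 0 && first_n <= final_n);

    int *buf = malloc(first_n * sizeof(int));       /* like Newx */
    if (!buf)
        return -1;
    for (size_t i = 0; i < first_n; i++)
        buf[i] = (int)i + 1;

    int *grown = realloc(buf, final_n * sizeof(int)); /* like Renew */
    if (!grown) {
        free(buf);
        return -1;
    }
    buf = grown;                    /* old contents are preserved */
    for (size_t i = first_n; i < final_n; i++)
        buf[i] = (int)i + 1;

    long sum = 0;
    for (size_t i = 0; i < final_n; i++)
        sum += buf[i];

    free(buf);                                      /* like Safefree */
    return sum;
}
```

       As with "realloc", the buffer may move when it is resized, so any
       other pointers into the old buffer must be refreshed afterwards.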
1968
1969   Moving
1970           Move(source, dest, number, type);
1971           Copy(source, dest, number, type);
1972           Zero(dest, number, type);
1973
1974       These three macros are used to move, copy, or zero out previously
1975       allocated memory.  The "source" and "dest" arguments point to the
1976       source and destination starting points.  Perl will move, copy, or zero
1977       out "number" instances of the size of the "type" data structure (using
1978       the "sizeof" function).
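
       Note the argument order: source first, then destination, which is
       the reverse of the C library calls.  In current perls "Move" is
       built on "memmove" and "Copy" on "memcpy", so only "Move" is safe
       when the two regions overlap.  The sketch below uses hypothetical
       "TOY_*" stand-ins to show this; it is not the definition from
       Perl's headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins with the same (source, dest, number, type) shape. */
#define TOY_MOVE(s, d, n, type) memmove((d), (s), (size_t)(n) * sizeof(type))
#define TOY_COPY(s, d, n, type) memcpy((d), (s), (size_t)(n) * sizeof(type))
#define TOY_ZERO(d, n, type)    memset((d), 0, (size_t)(n) * sizeof(type))

/* Shift the first four of five ints one slot to the right.  Source and
 * destination overlap, so the MOVE form is required. */
static void toy_shift_right(int v[5])
{
    assert(v != NULL);
    TOY_MOVE(v, v + 1, 4, int);
    TOY_ZERO(v, 1, int);            /* clear the vacated first slot */
}

/* Duplicate n ints into a separate, non-overlapping buffer. */
static void toy_dup(const int *src, int *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    TOY_COPY(src, dst, n, int);
}
```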
1979

PerlIO

1981       The most recent development releases of Perl have been experimenting
1982       with removing Perl's dependency on the "normal" standard I/O suite and
1983       allowing other stdio implementations to be used.  This involves
1984       creating a new abstraction layer that then calls whichever
1985       implementation of stdio Perl was compiled with.  All XSUBs should now
1986       use the functions in the PerlIO abstraction layer and not make any
1987       assumptions about what kind of stdio is being used.
1988
1989       For a complete description of the PerlIO abstraction, consult perlapio.
1990

Compiled code

1992   Code tree
1993       Here we describe the internal form your code is converted to by Perl.
1994       Start with a simple example:
1995
1996         $a = $b + $c;
1997
1998       This is converted to a tree similar to this one:
1999
2000                    assign-to
2001                  /           \
2002                 +             $a
2003               /   \
2004             $b     $c
2005
2006       (but slightly more complicated).  This tree reflects the way Perl
2007       parsed your code, but has nothing to do with the execution order.
2008       There is an additional "thread" going through the nodes of the tree
2009       which shows the order of execution of the nodes.  In our simplified
2010       example above it looks like:
2011
2012            $b ---> $c ---> + ---> $a ---> assign-to
2013
2014       But with the actual compile tree for "$a = $b + $c" it is different:
2015       some nodes are optimized away.  As a corollary, though the actual
2016       tree contains more nodes than our simplified example, the execution
2017       order is the same as in our example.
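
       The difference between the tree links and the execution "thread"
       can be modelled in a few lines of plain C.  This is a sketch of the
       simplified example only: "ToyOp" and the "toy_*" functions are
       hypothetical, and the field names merely echo the real ones.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct ToyOp {
    const char   *name;
    struct ToyOp *first, *last;     /* tree links (children) */
    struct ToyOp *next;             /* execution-order thread */
} ToyOp;

/* Build the simplified tree for "$a = $b + $c" and thread it.
 * Returns the op that executes first. */
static ToyOp *toy_build(ToyOp ops[5])
{
    ToyOp *b = &ops[0], *c = &ops[1], *add = &ops[2];
    ToyOp *a = &ops[3], *assign = &ops[4];

    memset(ops, 0, 5 * sizeof(ToyOp));
    b->name = "$b";  c->name = "$c";  add->name = "+";
    a->name = "$a";  assign->name = "assign-to";

    /* Parse structure: assign-to(+($b, $c), $a). */
    add->first = b;       add->last = c;
    assign->first = add;  assign->last = a;

    /* Execution order: $b, $c, +, $a, assign-to. */
    b->next = c;  c->next = add;  add->next = a;  a->next = assign;
    return b;
}

/* Follow the thread, writing space-separated names into out. */
static void toy_exec_order(const ToyOp *start, char *out, size_t outsz)
{
    assert(outsz > 0);
    out[0] = '\0';
    for (const ToyOp *o = start; o; o = o->next) {
        if (out[0])
            strncat(out, " ", outsz - strlen(out) - 1);
        strncat(out, o->name, outsz - strlen(out) - 1);
    }
}
```

       Walking "first"/"last" from the root visits the nodes in parse
       order; walking "next" from the returned op visits them in the order
       they would run.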
2018
2019   Examining the tree
2020       If you have your perl compiled for debugging (usually done with
2021       "-DDEBUGGING" on the "Configure" command line), you may examine the
2022       compiled tree by specifying "-Dx" on the Perl command line.  The output
2023       takes several lines per node, and for "$b+$c" it looks like this:
2024
2025           5           TYPE = add  ===> 6
2026                       TARG = 1
2027                       FLAGS = (SCALAR,KIDS)
2028                       {
2029                           TYPE = null  ===> (4)
2030                             (was rv2sv)
2031                           FLAGS = (SCALAR,KIDS)
2032                           {
2033           3                   TYPE = gvsv  ===> 4
2034                               FLAGS = (SCALAR)
2035                               GV = main::b
2036                           }
2037                       }
2038                       {
2039                           TYPE = null  ===> (5)
2040                             (was rv2sv)
2041                           FLAGS = (SCALAR,KIDS)
2042                           {
2043           4                   TYPE = gvsv  ===> 5
2044                               FLAGS = (SCALAR)
2045                               GV = main::c
2046                           }
2047                       }
2048
2049       This tree has 5 nodes (one per "TYPE" specifier), but only 3 of them
2050       are not optimized away (one per number in the left column).  The
2051       immediate children of a given node correspond to "{}" pairs on the same
2052       level of indentation, thus this listing corresponds to the tree:
2053
2054                          add
2055                        /     \
2056                      null    null
2057                       |       |
2058                      gvsv    gvsv
2059
2060       The execution order is indicated by "===>" marks, thus it is "3 4 5 6"
2061       (node 6 is not included in the above listing), i.e., "gvsv gvsv add
2062       whatever".
2063
2064       Each of these nodes represents an op, a fundamental operation inside
2065       the Perl core.  The code which implements each operation can be found
2066       in the pp*.c files; the function which implements the op with type
2067       "gvsv" is "pp_gvsv", and so on.  As the tree above shows, different ops
2068       have different numbers of children: "add" is a binary operator, as one
2069       would expect, and so has two children.  To accommodate the various
2070       different numbers of children, there are various types of op data
2071       structure, and they link together in different ways.
2072
2073       The simplest type of op structure is "OP": this has no children.  Unary
2074       operators, "UNOP"s, have one child, and this is pointed to by the
2075       "op_first" field.  Binary operators ("BINOP"s) have not only an
2076       "op_first" field but also an "op_last" field.  The most complex type of
2077       op is a "LISTOP", which has any number of children.  In this case, the
2078       first child is pointed to by "op_first" and the last child by
2079       "op_last".  The children in between can be found by iteratively
2080       following the "OpSIBLING" pointer from the first child to the last (but
2081       see below).
2082
2083       There are also some other op types: a "PMOP" holds a regular
2084       expression, and has no children, and a "LOOP" may or may not have
2085       children.  If the "op_children" field is non-zero, it behaves like a
2086       "LISTOP".  To complicate matters, if a "UNOP" is actually a "null" op
2087       after optimization (see "Compile pass 2: context propagation") it will
2088       still have children in accordance with its former type.
2089
2090       Finally, there is a "LOGOP", or logic op. Like a "LISTOP", this has one
2091       or more children, but it doesn't have an "op_last" field: so you have
2092       to follow "op_first" and then the "OpSIBLING" chain itself to find the
2093       last child. Instead it has an "op_other" field, which is comparable to
2094       the "op_next" field described below, and represents an alternate
2095       execution path. Operators like "and", "or" and "?" are "LOGOP"s. Note
2096       that in general, "op_other" may not point to any of the direct children
2097       of the "LOGOP".
2098
2099       Starting in version 5.21.2, perls built with the experimental define
2100       "-DPERL_OP_PARENT" add an extra boolean flag for each op, "op_moresib".
2101       When not set, this indicates that this is the last op in an "OpSIBLING"
2102       chain. This frees up the "op_sibling" field on the last sibling to
2103       point back to the parent op. Under this build, that field is also
2104       renamed "op_sibparent" to reflect its joint role. The macro
2105       OpSIBLING(o) wraps this special behaviour, and always returns NULL on
2106       the last sibling.  With this build the op_parent(o) function can be
2107       used to find the parent of any op. Thus for forward compatibility, you
2108       should always use the OpSIBLING(o) macro rather than accessing
2109       "op_sibling" directly.
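
       The joint sibling/parent pointer can be sketched as follows.  This
       is a toy model in plain C, not the real op structures: "ToyOp",
       "toy_sibling" and "toy_parent" are hypothetical stand-ins for the
       roles played by "op_sibparent", "op_moresib", OpSIBLING(o) and
       op_parent(o).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyOp {
    struct ToyOp *first;        /* first child, or NULL */
    struct ToyOp *sibparent;    /* next sibling, or the parent if last */
    bool          moresib;      /* true if sibparent is a sibling */
} ToyOp;

/* Like OpSIBLING(o): NULL on the last sibling. */
static ToyOp *toy_sibling(const ToyOp *o)
{
    assert(o != NULL);
    return o->moresib ? o->sibparent : NULL;
}

/* Like op_parent(o): skip to the last sibling, then step up. */
static ToyOp *toy_parent(const ToyOp *o)
{
    assert(o != NULL);
    while (o->moresib)
        o = o->sibparent;
    return o->sibparent;
}

/* Count children by following the sibling chain from the first child. */
static size_t toy_count_kids(const ToyOp *parent)
{
    size_t n = 0;
    for (const ToyOp *kid = parent->first; kid; kid = toy_sibling(kid))
        n++;
    return n;
}
```

       Code that reads the raw pointer on the last sibling would wander
       into the parent, which is why the accessor is the only safe way to
       walk the chain.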
2110
2111       Another way to examine the tree is to use a compiler back-end module,
2112       such as B::Concise.
2113
2114   Compile pass 1: check routines
2115       The tree is created by the compiler while yacc code feeds it the
2116       constructions it recognizes.  Since yacc works bottom-up, so does the
2117       first pass of perl compilation.
2118
2119       What makes this pass interesting for perl developers is that some
2120       optimization may be performed on this pass.  This is optimization by
2121       so-called "check routines".  The correspondence between node names and
2122       corresponding check routines is described in opcode.pl (do not forget
2123       to run "make regen_headers" if you modify this file).
2124
2125       A check routine is called when the node is fully constructed except for
2126       the execution-order thread.  Since at this time there are no back-links
2127       to the currently constructed node, one can do almost any operation on
2128       the top-level node, including freeing it and/or creating new nodes
2129       above/below it.
2130
2131       The check routine returns the node which should be inserted into the
2132       tree (if the top-level node was not modified, the check routine
2133       returns its argument).
2134
2135       By convention, check routines have names "ck_*".  They are usually
2136       called from "new*OP" subroutines (or "convert") (which in turn are
2137       called from perly.y).
2138
2139   Compile pass 1a: constant folding
2140       Immediately after the check routine is called the returned node is
2141       checked for being compile-time executable.  If it is (the value is
2142       judged to be constant) it is immediately executed, and a constant node
2143       with the "return value" of the corresponding subtree is substituted
2144       instead.  The subtree is deleted.
2145
2146       If constant folding was not performed, the execution-order thread is
2147       created.
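
       The idea of folding during bottom-up construction can be shown with
       a toy expression node.  This is a sketch under simplifying
       assumptions, not perl's folding code: "ToyNode" and "toy_fold" are
       hypothetical, only addition is handled, and the node is rewritten
       in place rather than replaced by a newly allocated constant node.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOY_CONST, TOY_VAR, TOY_ADD } ToyType;

typedef struct ToyNode {
    ToyType type;
    long    value;                  /* meaningful for TOY_CONST only */
    struct ToyNode *left, *right;   /* children of a TOY_ADD node */
} ToyNode;

/* Fold an ADD node whose children are both constants.
 * Returns 1 if the node was turned into a constant, 0 otherwise. */
static int toy_fold(ToyNode *n)
{
    assert(n != NULL);
    if (n->type != TOY_ADD)
        return 0;
    if (n->left->type != TOY_CONST || n->right->type != TOY_CONST)
        return 0;
    n->value = n->left->value + n->right->value;  /* "execute" it now */
    n->type  = TOY_CONST;
    n->left  = n->right = NULL;                   /* subtree is dropped */
    return 1;
}
```

       Because nodes are built bottom-up, an inner "1 + 2" has already
       become a constant by the time the enclosing "(1 + 2) + 3" is
       examined, so the outer node folds as well.  A node with a variable
       child is left alone.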
2148
2149   Compile pass 2: context propagation
2150       When the context for a part of the compile tree is known, it is
2151       propagated down through the tree.  At this time the context can have 5
2152       values (instead of 2 for runtime context): void, boolean, scalar, list,
2153       and lvalue.  In contrast with pass 1, this pass is processed from top
2154       to bottom: a node's context determines the context for its children.
2155
2156       Additional context-dependent optimizations are performed at this time.
2157       Since at this moment the compile tree contains back-references (via
2158       "thread" pointers), nodes cannot be free()d now.  To allow nodes to be
2159       optimized away at this stage, such nodes are null()ified instead of
2160       being free()d (i.e. their type is changed to OP_NULL).
2161
2162   Compile pass 3: peephole optimization
2163       After the compile tree for a subroutine (or for an "eval" or a file) is
2164       created, an additional pass over the code is performed.  This pass is
2165       neither top-down nor bottom-up, but in the execution order (with
2166       additional complications for conditionals).  Optimizations performed at
2167       this stage are subject to the same restrictions as in pass 2.
2168
2169       Peephole optimizations are done by calling the function pointed to by
2170       the global variable "PL_peepp".  By default, "PL_peepp" just calls the
2171       function pointed to by the global variable "PL_rpeepp".  By default,
2172       that performs some basic op fixups and optimisations along the
2173       execution-order op chain, and recursively calls "PL_rpeepp" for each
2174       side chain of ops (resulting from conditionals).  Extensions may
2175       provide additional optimisations or fixups, hooking into either the
2176       per-subroutine or recursive stage, like this:
2177
2178           static peep_t prev_peepp;
2179           static void my_peep(pTHX_ OP *o)
2180           {
2181               /* custom per-subroutine optimisation goes here */
2182               prev_peepp(aTHX_ o);
2183               /* custom per-subroutine optimisation may also go here */
2184           }
2185           BOOT:
2186               prev_peepp = PL_peepp;
2187               PL_peepp = my_peep;
2188
2189           static peep_t prev_rpeepp;
2190           static void my_rpeep(pTHX_ OP *first)
2191           {
2192               OP *o = first, *t = first;
2193               for(; o; o = o->op_next, t = t->op_next) {
2194                   /* custom per-op optimisation goes here */
2195                   o = o->op_next;
2196                   if (!o || o == t) break;
2197                   /* custom per-op optimisation goes AND here */
2198               }
2199               prev_rpeepp(aTHX_ first);
2200           }
2201           BOOT:
2202               prev_rpeepp = PL_rpeepp;
2203               PL_rpeepp = my_rpeep;
2204
2205   Pluggable runops
2206       The compile tree is executed in a runops function.  There are two
2207       runops functions, in run.c and in dump.c.  "Perl_runops_debug" is used
2208       with DEBUGGING and "Perl_runops_standard" is used otherwise.  For fine
2209       control over the execution of the compile tree it is possible to
2210       provide your own runops function.
2211
2212       It's probably best to copy one of the existing runops functions and
2213       change it to suit your needs.  Then, in the BOOT section of your XS
2214       file, add the line:
2215
2216         PL_runops = my_runops;
2217
2218       This function should be as efficient as possible to keep your programs
2219       running as fast as possible.
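
       The shape of a runops function, and what swapping it means, can be
       seen in a standalone model.  This is only a sketch: "ToyOp",
       "ToyInterp" and the "toy_*" functions are hypothetical, and the real
       functions work on interpreter state rather than taking the op as an
       argument.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyOp ToyOp;
typedef struct { long acc; unsigned long ops_run; } ToyInterp;

struct ToyOp {
    ToyOp *(*ppaddr)(ToyInterp *, ToyOp *);  /* runs op, returns next */
    ToyOp *next;
    long   arg;
};

/* A trivial op implementation: add arg to the accumulator. */
static ToyOp *toy_pp_add(ToyInterp *in, ToyOp *o)
{
    in->acc += o->arg;
    return o->next;
}

/* Plain loop: call each op's function until one returns NULL. */
static void toy_runops_standard(ToyInterp *in, ToyOp *start)
{
    ToyOp *o = start;
    assert(start != NULL);
    while ((o = o->ppaddr(in, o)) != NULL) {
        /* nothing else to do */
    }
}

/* Custom loop: same contract, but counts every op it dispatches. */
static void toy_runops_counting(ToyInterp *in, ToyOp *start)
{
    ToyOp *o = start;
    assert(start != NULL);
    do {
        in->ops_run++;
    } while ((o = o->ppaddr(in, o)) != NULL);
}

/* The pluggable slot: whichever function is stored here runs the ops. */
typedef void (*toy_runops_t)(ToyInterp *, ToyOp *);
```

       Every extra statement in the loop body is paid once per op, which
       is why a replacement runops function should do as little as
       possible.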
2220
2221   Compile-time scope hooks
2222       As of perl 5.14 it is possible to hook into the compile-time lexical
2223       scope mechanism using "Perl_blockhook_register".  This is used like
2224       this:
2225
2226           STATIC void my_start_hook(pTHX_ int full);
2227           STATIC BHK my_hooks;
2228
2229           BOOT:
2230               BhkENTRY_set(&my_hooks, bhk_start, my_start_hook);
2231               Perl_blockhook_register(aTHX_ &my_hooks);
2232
2233       This will arrange to have "my_start_hook" called at the start of
2234       compiling every lexical scope.  The available hooks are:
2235
2236       "void bhk_start(pTHX_ int full)"
2237           This is called just after starting a new lexical scope.  Note that
2238           Perl code like
2239
2240               if ($x) { ... }
2241
2242           creates two scopes: the first starts at the "(" and has "full ==
2243           1", the second starts at the "{" and has "full == 0".  Both end at
2244           the "}", so calls to "start" and "pre"/"post_end" will match.
2245           Anything pushed onto the save stack by this hook will be popped
2246           just before the scope ends (between the "pre_" and "post_end"
2247           hooks, in fact).
2248
2249       "void bhk_pre_end(pTHX_ OP **o)"
2250           This is called at the end of a lexical scope, just before unwinding
2251           the stack.  o is the root of the optree representing the scope; it
2252           is a double pointer so you can replace the OP if you need to.
2253
2254       "void bhk_post_end(pTHX_ OP **o)"
2255           This is called at the end of a lexical scope, just after unwinding
2256           the stack.  o is as above.  Note that it is possible for calls to
2257           "pre_" and "post_end" to nest, if there is something on the save
2258           stack that calls string eval.
2259
2260       "void bhk_eval(pTHX_ OP *const o)"
2261           This is called just before starting to compile an "eval STRING",
2262           "do FILE", "require" or "use", after the eval has been set up.  o
2263           is the OP that requested the eval, and will normally be an
2264           "OP_ENTEREVAL", "OP_DOFILE" or "OP_REQUIRE".
2265
2266       Once you have your hook functions, you need a "BHK" structure to put
2267       them in.  It's best to allocate it statically, since there is no way to
2268       free it once it's registered.  The function pointers should be inserted
2269       into this structure using the "BhkENTRY_set" macro, which will also set
2270       flags indicating which entries are valid.  If you do need to allocate
2271       your "BHK" dynamically for some reason, be sure to zero it before you
2272       start.
2273
2274       Once registered, there is no mechanism to switch these hooks off, so if
2275       that is necessary you will need to do this yourself.  An entry in "%^H"
2276       is probably the best way, so the effect is lexically scoped; however it
2277       is also possible to use the "BhkDISABLE" and "BhkENABLE" macros to
2278       temporarily switch entries on and off.  You should also be aware that
2279       generally speaking at least one scope will have opened before your
2280       extension is loaded, so you will see some "pre"/"post_end" pairs that
2281       didn't have a matching "start".
2282

Examining internal data structures with the "dump" functions

2284       To aid debugging, the source file dump.c contains a number of functions
2285       which produce formatted output of internal data structures.
2286
2287       The most commonly used of these functions is "Perl_sv_dump"; it's used
2288       for dumping SVs, AVs, HVs, and CVs.  The "Devel::Peek" module calls
2289       "sv_dump" to produce debugging output from Perl-space, so users of that
2290       module should already be familiar with its format.
2291
2292       "Perl_op_dump" can be used to dump an "OP" structure or any of its
2293       derivatives, and produces output similar to "perl -Dx"; in fact,
2294       "Perl_dump_eval" will dump the main root of the code being evaluated,
2295       exactly like "-Dx".
2296
2297       Other useful functions are "Perl_dump_sub", which turns a "GV" into an
2298       op tree, "Perl_dump_packsubs" which calls "Perl_dump_sub" on all the
2299       subroutines in a package like so: (Thankfully, these are all xsubs, so
2300       there is no op tree)
2301
2302           (gdb) print Perl_dump_packsubs(PL_defstash)
2303
2304           SUB attributes::bootstrap = (xsub 0x811fedc 0)
2305
2306           SUB UNIVERSAL::can = (xsub 0x811f50c 0)
2307
2308           SUB UNIVERSAL::isa = (xsub 0x811f304 0)
2309
2310           SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0)
2311
2312           SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0)
2313
2314       and "Perl_dump_all", which dumps all the subroutines in the stash and
2315       the op tree of the main root.
2316

How multiple interpreters and concurrency are supported

2318   Background and PERL_IMPLICIT_CONTEXT
2319       The Perl interpreter can be regarded as a closed box: it has an API for
2320       feeding it code or otherwise making it do things, but it also has
2321       functions for its own use.  This smells a lot like an object, and there
2322       is a way for you to build Perl so that you can have multiple
2323       interpreters, with one interpreter represented either as a C structure,
2324       or inside a thread-specific structure.  These structures contain all
2325       the context, the state of that interpreter.
2326
2327       The macro that controls the major Perl build flavor is MULTIPLICITY.
2328       The MULTIPLICITY build has a C structure that packages all the
2329       interpreter state.  With multiplicity-enabled perls,
2330       PERL_IMPLICIT_CONTEXT is also normally defined, and enables the support
2331       for passing in a "hidden" first argument that represents that
2332       structure.  MULTIPLICITY makes multi-threaded perls possible (with the
2333       ithreads threading model, related to the macro USE_ITHREADS.)
2334
2335       To see whether you have non-const data you can use a BSD (or GNU)
2336       compatible "nm":
2337
2338         nm libperl.a | grep -v ' [TURtr] '
2339
2340       If this displays any "D" or "d" symbols (or possibly "C" or "c"), you
2341       have non-const data.  The symbols the "grep" removed are as follows:
2342       "Tt" are text, or code, the "Rr" are read-only (const) data, and the
2343       "U" is <undefined>, external symbols referred to.
2344
2345       The test t/porting/libperl.t does this kind of symbol sanity checking
2346       on "libperl.a".
2347
2348       All this obviously requires a way for the Perl internal functions to be
2349       either subroutines taking some kind of structure as the first argument,
2350       or subroutines taking nothing as the first argument.  To enable these
2351       two very different ways of building the interpreter, the Perl source
2352       (as it does in so many other situations) makes heavy use of macros and
2353       subroutine naming conventions.
2354
2355       First problem: deciding which functions will be public API functions
2356       and which will be private.  All functions whose names begin "S_" are
2357       private (think "S" for "secret" or "static").  All other functions
2358       begin with "Perl_", but just because a function begins with "Perl_"
2359       does not mean it is part of the API.  (See "Internal Functions".)  The
2360       easiest way to be sure a function is part of the API is to find its
2361       entry in perlapi.  If it exists in perlapi, it's part of the API.  If
2362       it doesn't, and you think it should be (i.e., you need it for your
2363       extension), submit an issue at <https://github.com/Perl/perl5/issues>
2364       explaining why you think it should be.
2365
2366       Second problem: there must be a syntax so that the same subroutine
2367       declarations and calls can pass a structure as their first argument, or
2368       pass nothing.  To solve this, the subroutines are named and declared in
2369       a particular way.  Here's a typical start of a static function used
2370       within the Perl guts:
2371
2372         STATIC void
2373         S_incline(pTHX_ char *s)
2374
2375       STATIC becomes "static" in C, and may be #define'd to nothing in some
2376       configurations in the future.
2377
2378       A public function (i.e. part of the internal API, but not necessarily
2379       sanctioned for use in extensions) begins like this:
2380
2381         void
2382         Perl_sv_setiv(pTHX_ SV* dsv, IV num)
2383
2384       "pTHX_" is one of a number of macros (in perl.h) that hide the details
2385       of the interpreter's context.  THX stands for "thread", "this", or
2386       "thingy", as the case may be.  (And no, George Lucas is not involved.
2387       :-) The first character could be 'p' for a prototype, 'a' for argument,
2388       or 'd' for declaration, so we have "pTHX", "aTHX" and "dTHX", and their
2389       variants.
2390
2391       When Perl is built without options that set PERL_IMPLICIT_CONTEXT,
2392       there is no first argument containing the interpreter's context.  The
2393       trailing underscore in the pTHX_ macro indicates that the macro
2394       expansion needs a comma after the context argument because other
2395       arguments follow it.  If PERL_IMPLICIT_CONTEXT is not defined, pTHX_
2396       will be ignored, and the subroutine is not prototyped to take the extra
2397       argument.  The form of the macro without the trailing underscore is
2398       used when there are no additional explicit arguments.
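
       The macro trick is easier to follow in miniature.  The following
       is a sketch with hypothetical names throughout ("TOY_IMPLICIT_CONTEXT",
       "pTOY", "aTOY", "ToyInterp", "Toy_bump" and so on); it imitates the
       pattern and is not copied from perl.h.

```c
#include <assert.h>

/* Remove this define to model a build without an implicit context. */
#define TOY_IMPLICIT_CONTEXT

typedef struct { long counter; } ToyInterp;

#ifdef TOY_IMPLICIT_CONTEXT
#  define pTOY      ToyInterp *my_toy   /* prototype, no other args  */
#  define pTOY_     pTOY,               /* prototype, more args      */
#  define aTOY      my_toy              /* argument, no other args   */
#  define aTOY_     aTOY,               /* argument, more args       */
#  define TOY_STATE (my_toy)
#else
static ToyInterp toy_global;            /* single global interpreter */
#  define pTOY      void
#  define pTOY_
#  define aTOY
#  define aTOY_
#  define TOY_STATE (&toy_global)
#endif

static void Toy_bump(pTOY_ long by)
{
    assert(TOY_STATE != 0);
    TOY_STATE->counter += by;
}

static long Toy_get(pTOY)
{
    return TOY_STATE->counter;
}

/* Short names that pass the context along without the caller typing it. */
#define toy_bump(by) Toy_bump(aTOY_ by)
#define toy_get()    Toy_get(aTOY)

static long toy_demo(pTOY)
{
    toy_bump(2);
    toy_bump(40);
    return toy_get();
}
```

       The body of "toy_demo" is identical in both builds; only the macro
       expansions differ.  The underscore forms exist solely to supply the
       comma when further arguments follow.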
2399
2400       When a core function calls another, it must pass the context.  This is
2401       normally hidden via macros.  Consider "sv_setiv".  It expands into
2402       something like this:
2403
2404           #ifdef PERL_IMPLICIT_CONTEXT
2405             #define sv_setiv(a,b)      Perl_sv_setiv(aTHX_ a, b)
2406             /* can't do this for vararg functions, see below */
2407           #else
2408             #define sv_setiv           Perl_sv_setiv
2409           #endif
2410
2411       This works well, and means that XS authors can gleefully write:
2412
2413           sv_setiv(foo, bar);
2414
2415       and still have it work under all the modes Perl could have been
2416       compiled with.
2417
2418       This doesn't work so cleanly for varargs functions, though, as macros
2419       imply that the number of arguments is known in advance.  Instead we
2420       either need to spell them out fully, passing "aTHX_" as the first
2421       argument (the Perl core tends to do this with functions like
2422       Perl_warner), or use a context-free version.
2423
2424       The context-free version of Perl_warner is called
2425       Perl_warner_nocontext, and does not take the extra argument.  Instead
2426       it does "dTHX;" to get the context from thread-local storage.  We
2427       "#define warner Perl_warner_nocontext" so that extensions get source
2428       compatibility at the expense of performance.  (Passing an arg is
2429       cheaper than grabbing it from thread-local storage.)
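
       The two calling styles for a varargs function can be sketched like
       this.  It is a toy model in C11, not perl's implementation:
       "ToyInterp", "toy_current" and the "Toy_note*" functions are
       hypothetical, and a "_Thread_local" variable stands in for the
       thread-local storage slot.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

typedef struct { char last[64]; } ToyInterp;

/* Stands in for the per-thread slot that holds the current context. */
static _Thread_local ToyInterp *toy_current;

static void Toy_vnote(ToyInterp *in, const char *fmt, va_list ap)
{
    vsnprintf(in->last, sizeof in->last, fmt, ap);
}

/* Explicit-context form: the caller passes the interpreter. */
static void Toy_note(ToyInterp *in, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    Toy_vnote(in, fmt, ap);
    va_end(ap);
}

/* Context-free form: looks the interpreter up itself on every call. */
static void Toy_note_nocontext(const char *fmt, ...)
{
    ToyInterp *in = toy_current;    /* plays the role of "dTHX;" */
    va_list ap;
    assert(in != NULL);
    va_start(ap, fmt);
    Toy_vnote(in, fmt, ap);
    va_end(ap);
}

/* The short name maps to the context-free form, as with "warner". */
#define toy_note Toy_note_nocontext
```

       A plain "#define" of the name works here because it does not need
       to know how many arguments follow, which is exactly what a
       function-like macro cannot express portably for this case.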
2430
2431       You can ignore [pad]THXx when browsing the Perl headers/sources.  Those
2432       are strictly for use within the core.  Extensions and embedders need
2433       only be aware of [pad]THX.
2434
2435   So what happened to dTHR?
2436       "dTHR" was introduced in perl 5.005 to support the older thread model.
2437       The older thread model now uses the "THX" mechanism to pass context
2438       pointers around, so "dTHR" is not useful any more.  Perl 5.6.0 and
2439       later still have it for backward source compatibility, but it is
2440       defined to be a no-op.
2441
2442   How do I use all this in extensions?
2443       When Perl is built with PERL_IMPLICIT_CONTEXT, extensions that call any
2444       functions in the Perl API will need to pass the initial context
2445       argument somehow.  The kicker is that you will need to write it in such
2446       a way that the extension still compiles when Perl hasn't been built
2447       with PERL_IMPLICIT_CONTEXT enabled.
2448
2449       There are three ways to do this.  First, the easy but inefficient way,
2450       which is also the default, in order to maintain source compatibility
2451       with extensions: whenever XSUB.h is #included, it redefines the aTHX
2452       and aTHX_ macros to call a function that will return the context.
2453       Thus, something like:
2454
2455               sv_setiv(sv, num);
2456
2457       in your extension will translate to this when PERL_IMPLICIT_CONTEXT is
2458       in effect:
2459
2460               Perl_sv_setiv(Perl_get_context(), sv, num);
2461
2462       or to this otherwise:
2463
2464               Perl_sv_setiv(sv, num);
2465
2466       You don't have to do anything new in your extension to get this; since
2467       the Perl library provides Perl_get_context(), it will all just work.
2468
2469       The second, more efficient way is to use the following template for
2470       your Foo.xs:
2471
2472               #define PERL_NO_GET_CONTEXT     /* we want efficiency */
2473               #include "EXTERN.h"
2474               #include "perl.h"
2475               #include "XSUB.h"
2476
2477               STATIC void my_private_function(int arg1, int arg2);
2478
2479               STATIC void
2480               my_private_function(int arg1, int arg2)
2481               {
2482                   dTHX;       /* fetch context */
2483                   ... call many Perl API functions ...
2484               }
2485
2486               [... etc ...]
2487
2488               MODULE = Foo            PACKAGE = Foo
2489
2490               /* typical XSUB */
2491
2492               void
2493               my_xsub(arg)
2494                       int arg
2495                   CODE:
2496                       my_private_function(arg, 10);
2497
2498       Note that the only two changes from the normal way of writing an
2499       extension are the addition of a "#define PERL_NO_GET_CONTEXT" before
2500       including the Perl headers, followed by a "dTHX;" declaration at the
2501       start of every function that will call the Perl API.  (You'll know
2502       which functions need this, because the C compiler will complain that
2503       there's an undeclared identifier in those functions.)  No changes are
2504       needed for the XSUBs themselves, because the XS() macro is correctly
2505       defined to pass in the implicit context if needed.
2506
2507       The third, even more efficient way is to ape how it is done within the
2508       Perl guts:
2509
2510               #define PERL_NO_GET_CONTEXT     /* we want efficiency */
2511               #include "EXTERN.h"
2512               #include "perl.h"
2513               #include "XSUB.h"
2514
2515               /* pTHX_ only needed for functions that call Perl API */
2516               STATIC void my_private_function(pTHX_ int arg1, int arg2);
2517
2518               STATIC void
2519               my_private_function(pTHX_ int arg1, int arg2)
2520               {
2521                   /* dTHX; not needed here, because THX is an argument */
2522                   ... call Perl API functions ...
2523               }
2524
2525               [... etc ...]
2526
2527               MODULE = Foo            PACKAGE = Foo
2528
2529               /* typical XSUB */
2530
2531               void
2532               my_xsub(arg)
2533                       int arg
2534                   CODE:
2535                       my_private_function(aTHX_ arg, 10);
2536
2537       This implementation never has to fetch the context using a function
2538       call, since it is always passed as an extra argument.  Depending on
2539       your needs for simplicity or efficiency, you may mix the previous two
2540       approaches freely.
2541
2542       Never add a comma after "pTHX" yourself--always use the form of the
2543       macro with the underscore for functions that take explicit arguments,
2544       or the form without the underscore for functions with no explicit
2545       arguments.
2546
2547   Should I do anything special if I call perl from multiple threads?
2548       If you create interpreters in one thread and then proceed to call them
2549       in another, you need to make sure perl's own Thread Local Storage (TLS)
2550       slot is initialized correctly in each of those threads.
2551
2552       The "perl_alloc" and "perl_clone" API functions will automatically set
2553       the TLS slot to the interpreter they created, so that there is no need
2554       to do anything special if the interpreter is always accessed in the
2555       same thread that created it, and that thread did not create or call any
2556       other interpreters afterwards.  If that is not the case, you have to
2557       set the TLS slot of the thread before calling any functions in the Perl
2558       API on that particular interpreter.  This is done by calling the
2559       "PERL_SET_CONTEXT" macro in that thread as the first thing you do:
2560
2561               /* do this before doing anything else with some_perl */
2562               PERL_SET_CONTEXT(some_perl);
2563
2564               ... other Perl API calls on some_perl go here ...
2565
2566   Future Plans and PERL_IMPLICIT_SYS
2567       Just as PERL_IMPLICIT_CONTEXT provides a way to bundle up everything
2568       that the interpreter knows about itself and pass it around, so too are
2569       there plans to allow the interpreter to bundle up everything it knows
2570       about the environment it's running on.  This is enabled with the
2571       PERL_IMPLICIT_SYS macro.  Currently it only works with USE_ITHREADS on
2572       Windows.
2573
2574       This allows the ability to provide an extra pointer (called the "host"
2575       environment) for all the system calls.  This makes it possible for all
2576       the system stuff to maintain their own state, broken down into seven C
2577       structures.  These are thin wrappers around the usual system calls (see
2578       win32/perllib.c) for the default perl executable, but for a more
2579       ambitious host (like the one that would do fork() emulation) all the
2580       extra work needed to pretend that different interpreters are actually
2581       different "processes", would be done here.
2582
2583       The Perl engine/interpreter and the host are orthogonal entities.
2584       There could be one or more interpreters in a process, and one or more
2585       "hosts", with free association between them.
2586

Internal Functions

2588       All of Perl's internal functions which will be exposed to the outside
2589       world are prefixed by "Perl_" so that they will not conflict with XS
2590       functions or functions used in a program in which Perl is embedded.
2591       Similarly, all global variables begin with "PL_".  (By convention,
2592       static functions start with "S_".)
2593
2594       Inside the Perl core ("PERL_CORE" defined), you can get at the
2595       functions either with or without the "Perl_" prefix, thanks to a bunch
2596       of defines that live in embed.h.  Note that extension code should not
2597       set "PERL_CORE"; this exposes the full perl internals, and is likely to
2598       cause breakage of the XS in each new perl release.
2599
2600       The file embed.h is generated automatically from embed.pl and
2601       embed.fnc.  embed.pl also creates the prototyping header files for the
2602       internal functions, generates the documentation and a lot of other bits
2603       and pieces.  It's important that when you add a new function to the
2604       core or change an existing one, you change the data in the table in
2605       embed.fnc as well.  Here's a sample entry from that table:
2606
2607           Apd |SV**   |av_fetch   |AV* ar|I32 key|I32 lval
2608
2609       The first column is a set of flags, the second column the return type,
2610       the third column the name.  Columns after that are the arguments.  The
2611       flags are documented at the top of embed.fnc.
2612
2613       If you edit embed.pl or embed.fnc, you will need to run "make
2614       regen_headers" to force a rebuild of embed.h and other auto-generated
2615       files.
2616
2617   Formatted Printing of IVs, UVs, and NVs
2618       If you are printing IVs, UVs, or NVs, then instead of the stdio(3)-style
2619       formatting codes like %d, %ld, %f, you should use the following macros
2620       for portability:
2621
2622               IVdf            IV in decimal
2623               UVuf            UV in decimal
2624               UVof            UV in octal
2625               UVxf            UV in hexadecimal
2626               NVef            NV %e-like
2627               NVff            NV %f-like
2628               NVgf            NV %g-like
2629
2630       These will take care of 64-bit integers and long doubles.  For example:
2631
2632               printf("IV is %" IVdf "\n", iv);
2633
2634       The "IVdf" will expand to whatever is the correct format for the IVs.
2635       Note that the spaces are required around the format in case the code is
2636       compiled with C++, to maintain compliance with its standard.
2637
2638       Note that there are different "long doubles": Perl will use whatever
2639       the compiler has.
2640
2641       If you are printing addresses of pointers, use %p or UVxf combined with
2642       PTR2UV().
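
       The same string-literal concatenation technique is available in
       standard C through <inttypes.h>, which may help when reasoning about
       what "IVdf" does.  This is only an analogy under stated assumptions:
       "my_iv", "MY_IVdf" and "format_iv" are hypothetical names, and perl
       chooses its own type and format for IVs at configuration time.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical stand-ins: my_iv plays the role of IV, MY_IVdf of IVdf. */
typedef intmax_t my_iv;
#define MY_IVdf PRIdMAX

/* Formats "IV is <n>\n" into buf and returns what snprintf returns. */
static int format_iv(char *buf, size_t size, my_iv iv)
{
    assert(buf != NULL);
    /* Adjacent string literals are joined by the compiler, so the format
     * becomes something like "IV is %jd\n".  The spaces around MY_IVdf
     * matter in C++11 and later, where a name glued to the closing quote
     * is parsed as a user-defined literal suffix. */
    return snprintf(buf, size, "IV is %" MY_IVdf "\n", iv);
}
```

       The caller supplies the buffer, for example a 64-byte array on the
       stack, and the value is passed as the typedef'd type so that it
       always matches the format.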
2643
2644   Formatted Printing of SVs
2645       The contents of SVs may be printed using the "SVf" format, like so:
2646
2647        Perl_croak(aTHX_ "This croaked because: %" SVf "\n", SVfARG(err_msg))
2648
2649       where "err_msg" is an SV.
2650
2651       Not all scalar types are printable.  Simple values certainly are: one
2652       of IV, UV, NV, or PV.  Also, if the SV is a reference to some value,
2653       either it will be dereferenced and the value printed, or information
2654       about the type of that value and its address are displayed.  The
2655       results of printing any other type of SV are undefined and likely to
2656       lead to an interpreter crash.  NVs are printed using a %g-ish format.
2657
2658       Note that the spaces are required around the "SVf" in case the code is
2659       compiled with C++, to maintain compliance with its standard.
2660
2661       Note that any filehandle being printed to under UTF-8 must be expecting
2662       UTF-8 in order to get good results and avoid Wide-character warnings.
2663       One way to do this for typical filehandles is to invoke perl with the
2664       "-C" parameter.  (See "-C [number/list]" in perlrun.)
2665
2666       You can use this to concatenate two scalars:
2667
2668        SV *var1 = get_sv("var1", GV_ADD);
2669        SV *var2 = get_sv("var2", GV_ADD);
2670        SV *var3 = newSVpvf("var1=%" SVf " and var2=%" SVf,
2671                            SVfARG(var1), SVfARG(var2));
2672
2673   Formatted Printing of Strings
2674       If you just want the bytes printed in a 7bit NUL-terminated string, you
2675       can just use %s (assuming they are all really only 7bit).  But if there
2676       is a possibility the value will be encoded as UTF-8 or contains bytes
2677       above 0x7F (and therefore 8bit), you should instead use the "UTF8f"
2678       format.  And as its parameter, use the "UTF8fARG()" macro:
2679
2680        char * msg;
2681
2682        /* U+2018: \xE2\x80\x98 LEFT SINGLE QUOTATION MARK
2683           U+2019: \xE2\x80\x99 RIGHT SINGLE QUOTATION MARK */
2684        if (can_utf8)
2685          msg = "\xE2\x80\x98Uses fancy quotes\xE2\x80\x99";
2686        else
2687          msg = "'Uses simple quotes'";
2688
2689        Perl_croak(aTHX_ "The message is: %" UTF8f "\n",
2690                         UTF8fARG(can_utf8, strlen(msg), msg));
2691
2692       The first parameter to "UTF8fARG" is a boolean: 1 if the string is in
2693       UTF-8; 0 if the string is in native byte encoding (Latin1).  The second
2694       parameter is the number of bytes in the string to print.  And the third
2695       and final parameter is a pointer to the first byte in the string.
2696
2697       Note that any filehandle being printed to under UTF-8 must be expecting
2698       UTF-8 in order to get good results and avoid Wide-character warnings.
2699       One way to do this for typical filehandles is to invoke perl with the
2700       "-C" parameter.  (See "-C [number/list]" in perlrun.)
2701
2702   Formatted Printing of "Size_t" and "SSize_t"
2703       The most general way to do this is to cast them to a UV or IV, and
2704       print as in the previous section.
2705
2706       But if you're using "PerlIO_printf()", it's less typing and visual
2707       clutter to use the %z length modifier (for siZe):
2708
2709               PerlIO_printf(PerlIO_stdout(), "STRLEN is %zu\n", len);
2710
2711       This modifier is not portable, so its use should be restricted to
2712       "PerlIO_printf()".
2713
2714   Formatted Printing of "Ptrdiff_t", "intmax_t", "short" and other special
2715       sizes
2716       There are modifiers for these special situations if you are using
2717       "PerlIO_printf()".  See "size" in perlfunc.
2718
2719   Pointer-To-Integer and Integer-To-Pointer
2720       Because pointer size does not necessarily equal integer size, use the
2721       following macros to do it right:
2722
2723               PTR2UV(pointer)
2724               PTR2IV(pointer)
2725               PTR2NV(pointer)
2726               INT2PTR(pointertotype, integer)
2727
2728       For example:
2729
2730               IV  iv = ...;
2731               SV *sv = INT2PTR(SV*, iv);
2732
2733       and
2734
2735               AV *av = ...;
2736               UV  uv = PTR2UV(av);
2737
2738       There are also
2739
2740        PTR2nat(pointer)   /* pointer to integer of PTRSIZE */
2741        PTR2ul(pointer)    /* pointer to unsigned long */
2742
2743       There is also "PTRV", which gives the native type for an integer the
2744       same size as pointers, such as "unsigned" or "unsigned long".
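
       For comparison, standard C offers "uintptr_t" for the same purpose.
       The sketch below is a plain-C analogy, not perl's definition of these
       macros: "MY_PTR2UV", "MY_INT2PTR" and "round_trip" are hypothetical
       names, and it assumes the platform provides "uintptr_t".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical analogues of PTR2UV and INT2PTR built on uintptr_t, an
 * unsigned integer type able to hold a converted void pointer. */
#define MY_PTR2UV(p)        ((uintptr_t)(void *)(p))
#define MY_INT2PTR(type, i) ((type)(void *)(uintptr_t)(i))

/* Round-trips a pointer through an integer and returns the result. */
static int *round_trip(int *p)
{
    uintptr_t as_int;

    assert(sizeof(uintptr_t) >= sizeof(void *));
    as_int = MY_PTR2UV(p);
    return MY_INT2PTR(int *, as_int);
}
```

       Going through an integer type of the right width is what makes the
       round trip safe; a plain "int" or "long" may be narrower than a
       pointer on some platforms.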
2745
2746   Exception Handling
2747       There are a couple of macros to do very basic exception handling in XS
2748       modules.  You have to define "NO_XSLOCKS" before including XSUB.h to be
2749       able to use these macros:
2750
2751               #define NO_XSLOCKS
2752               #include "XSUB.h"
2753
2754       You can use these macros if you call code that may croak, but you need
2755       to do some cleanup before giving control back to Perl.  For example:
2756
2757               dXCPT;    /* set up necessary variables */
2758
2759               XCPT_TRY_START {
2760                 code_that_may_croak();
2761               } XCPT_TRY_END
2762
2763               XCPT_CATCH
2764               {
2765                 /* do cleanup here */
2766                 XCPT_RETHROW;
2767               }
2768
2769       Note that you always have to rethrow an exception that has been caught.
2770       Using these macros, it is not possible to just catch the exception and
2771       ignore it.  If you have to ignore the exception, you have to use one
2772       of the "call_*" functions.
2773
2774       The advantage of using the above macros is that you don't have to set up
2775       an extra function for "call_*", and that using these macros is faster
2776       than using "call_*".
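
       The catch, clean up, rethrow pattern can be modelled with standard
       setjmp(3) and longjmp(3).  The sketch below only mirrors the control
       flow described above; it is not how XSUB.h defines the macros, and
       every name in it ("my_throw", "call_with_cleanup", "demo" and the
       helpers) is hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

static jmp_buf *current_env = NULL;  /* innermost active handler */
static int pending_code = 0;         /* code of the exception in flight */
static int cleanup_ran = 0;

/* Stand-in for croak(): jump to the innermost handler. */
static void my_throw(int code)
{
    assert(current_env != NULL);
    pending_code = code;
    longjmp(*current_env, 1);
}

/* Runs fn(); if it throws, runs cleanup() and then rethrows to the
 * handler that was active before this function was entered. */
static void call_with_cleanup(void (*fn)(void), void (*cleanup)(void))
{
    jmp_buf env;
    jmp_buf *outer = current_env;    /* not modified after setjmp */

    current_env = &env;
    if (setjmp(env) == 0) {
        fn();
        current_env = outer;         /* normal exit: pop our handler */
        return;
    }
    current_env = outer;             /* caught: pop our handler first */
    cleanup();
    my_throw(pending_code);          /* the rethrow is not optional */
}

static void failing_step(void)      { my_throw(42); }
static void release_resources(void) { cleanup_ran = 1; }

/* Outermost handler: returns the code that reached it, or 0. */
static int demo(void)
{
    jmp_buf env;

    current_env = &env;
    if (setjmp(env) == 0) {
        call_with_cleanup(failing_step, release_resources);
        current_env = NULL;
        return 0;
    }
    current_env = NULL;
    return pending_code;
}
```

       The inner handler restores the outer one before cleaning up, so the
       rethrown exception reaches whoever was listening before the guarded
       call started.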
2777
2778   Source Documentation
2779       There's an effort going on to document the internal functions and
2780       automatically produce reference manuals from them -- perlapi is one
2781       such manual which details all the functions which are available to XS
2782       writers.  perlintern is the autogenerated manual for the functions
2783       which are not part of the API and are supposedly for internal use only.
2784
2785       Source documentation is created by putting POD comments into the C
2786       source, like this:
2787
2788        /*
2789        =for apidoc sv_setiv
2790
2791        Copies an integer into the given SV.  Does not handle 'set' magic.  See
2792        L<perlapi/sv_setiv_mg>.
2793
2794        =cut
2795        */
2796
2797       Please try and supply some documentation if you add functions to the
2798       Perl core.
2799
2800   Backwards compatibility
2801       The Perl API changes over time.  New functions are added or the
2802       interfaces of existing functions are changed.  The "Devel::PPPort"
2803       module tries to provide compatibility code for some of these changes,
2804       so XS writers don't have to code it themselves when supporting multiple
2805       versions of Perl.
2806
2807       "Devel::PPPort" generates a C header file ppport.h that can also be run
2808       as a Perl script.  To generate ppport.h, run:
2809
2810           perl -MDevel::PPPort -eDevel::PPPort::WriteFile
2811
2812       Besides checking existing XS code, the script can also be used to
2813       retrieve compatibility information for various API calls using the
2814       "--api-info" command line switch.  For example:
2815
2816         % perl ppport.h --api-info=sv_magicext
2817
2818       For details, see "perldoc ppport.h".
2819

Unicode Support

2821       Perl 5.6.0 introduced Unicode support.  It's important for porters and
2822       XS writers to understand this support and make sure that the code they
2823       write does not corrupt Unicode data.
2824
2825   What is Unicode, anyway?
2826       In the olden, less enlightened times, we all used to use ASCII.  Most
2827       of us did, anyway.  The big problem with ASCII is that it's American.
2828       Well, no, that's not actually the problem; the problem is that it's not
2829       particularly useful for people who don't use the Roman alphabet.  What
2830       used to happen was that particular languages would stick their own
2831       alphabet in the upper range of the sequence, between 128 and 255.  Of
2832       course, we then ended up with plenty of variants that weren't quite
2833       ASCII, and the whole point of it being a standard was lost.
2834
2835       Worse still, if you've got a language like Chinese or Japanese that has
2836       hundreds or thousands of characters, then you really can't fit them
2837       into a mere 256, so they had to forget about ASCII altogether, and
2838       build their own systems using pairs of numbers to refer to one
2839       character.
2840
2841       To fix this, some people formed Unicode, Inc. and produced a new
2842       character set containing all the characters you can possibly think of
2843       and more.  There are several ways of representing these characters, and
2844       the one Perl uses is called UTF-8.  UTF-8 uses a variable number of
2845       bytes to represent a character.  You can learn more about Unicode and
2846       Perl's Unicode model in perlunicode.
2847
2848       (On EBCDIC platforms, Perl instead uses UTF-EBCDIC, which is a form of
2849       UTF-8 adapted for EBCDIC platforms.  Below, we just talk about UTF-8.
2850       UTF-EBCDIC is like UTF-8, but the details are different.  The macros
2851       hide the differences from you; just remember that the particular
2852       numbers and bit patterns presented below will differ in UTF-EBCDIC.)
2853
2854   How can I recognise a UTF-8 string?
2855       You can't.  This is because UTF-8 data is stored in bytes just like
2856       non-UTF-8 data.  The Unicode character 200 (0xC8 for you hex types),
2857       capital E with a grave accent, is represented by the two bytes
2858       "v195.136".  Unfortunately, the non-Unicode string "chr(195).chr(136)"
2859       has that byte sequence as well.  So you can't tell just by looking --
2860       this is what makes Unicode input an interesting problem.
2861
2862       In general, you either have to know what you're dealing with, or you
2863       have to guess.  The API function "is_utf8_string" can help; it'll tell
2864       you if a string contains only valid UTF-8 characters, and the chances
2865       of a non-UTF-8 string looking like valid UTF-8 become very small very
2866       quickly with increasing string length.  On a character-by-character
2867       basis, "isUTF8_CHAR" will tell you whether the current character in a
2868       string is valid UTF-8.
2869
2870   How does UTF-8 represent Unicode characters?
2871       As mentioned above, UTF-8 uses a variable number of bytes to store a
2872       character.  Characters with values 0...127 are stored in one byte, just
2873       like good ol' ASCII.  Character 128 is stored as "v194.128"; this
2874       continues up to character 191, which is "v194.191".  Now we've run out
2875       of bits (191 is binary 10111111) so we move on; character 192 is
2876       "v195.128".  And so it goes on, moving to three bytes at character
2877       2048.  "Unicode Encodings" in perlunicode has pictures of how this
2878       works.
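
       The byte values quoted above follow directly from the UTF-8 bit
       layout.  The function below is a minimal plain-C sketch of that
       layout for code points up to U+FFFF; "encode_utf8" is a hypothetical
       name, the sketch ignores surrogates and UTF-EBCDIC, and real XS code
       should use "uvchr_to_utf8" instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes code point cp (at most U+FFFF in this sketch) as UTF-8 into
 * out, which must have room for 3 bytes.  Returns the byte count. */
static size_t encode_utf8(uint32_t cp, unsigned char *out)
{
    assert(out != NULL && cp <= 0xFFFF);
    if (cp < 0x80) {                 /* 0..127: one byte, same as ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                /* 128..2047: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* 2048..65535: 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

       Each continuation byte carries six bits of the code point, which is
       why the second byte wraps from 191 back to 128 when the first byte
       steps from 194 to 195.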
2879
2880       Assuming you know you're dealing with a UTF-8 string, you can find out
2881       how long the first character in it is with the "UTF8SKIP" macro:
2882
2883           char *utf = "\305\233\340\240\201";
2884           I32 len;
2885
2886           len = UTF8SKIP(utf); /* len is 2 here */
2887           utf += len;
2888           len = UTF8SKIP(utf); /* len is 3 here */
2889
2890       Another way to skip over characters in a UTF-8 string is to use
2891       "utf8_hop", which takes a string and a number of characters to skip
2892       over.  You're on your own about bounds checking, though, so don't use
2893       it lightly.
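
       To make the bounds-checking concern concrete, here is a plain-C
       sketch of a forward hop that refuses to run past the end of the
       buffer.  It does not use the Perl API: "utf8_len_from_lead" and
       "hop_forward" are hypothetical names, and the sketch trusts the lead
       byte without validating the continuation bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes implied by a UTF-8 lead byte.  Anything that is not a
 * recognised multi-byte lead counts as 1, so a scan always advances. */
static size_t utf8_len_from_lead(unsigned char lead)
{
    if ((lead & 0xE0) == 0xC0) return 2;   /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;   /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;   /* 11110xxx */
    return 1;
}

/* Skips exactly n characters forward from s without passing end.
 * Returns the new position, or NULL if the buffer ends first. */
static const unsigned char *
hop_forward(const unsigned char *s, const unsigned char *end, size_t n)
{
    assert(s != NULL && end != NULL && s <= end);
    while (n-- > 0) {
        size_t len;

        if (s >= end)
            return NULL;                   /* ran out of characters */
        len = utf8_len_from_lead(*s);
        if ((size_t)(end - s) < len)
            return NULL;                   /* truncated character */
        s += len;
    }
    return s;
}
```

       Passing the end pointer explicitly is the design choice that matters
       here: the function can then report failure instead of reading past
       the buffer.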
2894
2895       All bytes in a multi-byte UTF-8 character will have the high bit set,
2896       so you can test if you need to do something special with this character
2897       like this (the "UTF8_IS_INVARIANT()" is a macro that tests whether the
2898       byte is encoded as a single byte even in UTF-8):
2899
2900           U8 *utf;     /* Initialize this to point to the beginning of the
2901                           sequence to convert */
2902           U8 *utf_end; /* Initialize this to 1 beyond the end of the sequence
2903                           pointed to by 'utf' */
2904           UV uv;       /* Returned code point; note: a UV, not a U8, not a
2905                           char */
2906           STRLEN len; /* Returned length of character in bytes */
2907
2908           if (!UTF8_IS_INVARIANT(*utf))
2909               /* Must treat this as UTF-8 */
2910               uv = utf8_to_uvchr_buf(utf, utf_end, &len);
2911           else
2912               /* OK to treat this character as a byte */
2913               uv = *utf;
2914
2915       You can also see in that example that we use "utf8_to_uvchr_buf" to get
2916       the value of the character; the inverse function "uvchr_to_utf8" is
2917       available for putting a UV into UTF-8:
2918
2919           if (!UVCHR_IS_INVARIANT(uv))
2920               /* Must treat this as UTF8 */
2921               utf8 = uvchr_to_utf8(utf8, uv);
2922           else
2923               /* OK to treat this character as a byte */
2924               *utf8++ = uv;
2925
2926       You must convert characters to UVs using the above functions if you're
2927       ever in a situation where you have to match UTF-8 and non-UTF-8
2928       characters.  You may not skip over UTF-8 characters in this case.  If
2929       you do this, you'll lose the ability to match hi-bit non-UTF-8
2930       characters; for instance, if your UTF-8 string contains "v195.136", and
2931       you skip that character, you can never match a "chr(200)" in a
2932       non-UTF-8 string.  So don't do that!
2933
2934       (Note that we don't have to test for invariant characters in the
2935       examples above.  The functions work on any well-formed UTF-8 input.
2936       It's just that it's faster to avoid the function overhead when it's not
2937       needed.)
2938
2939   How does Perl store UTF-8 strings?
2940       Currently, Perl deals with UTF-8 strings and non-UTF-8 strings slightly
2941       differently.  A flag in the SV, "SVf_UTF8", indicates that the string
2942       is internally encoded as UTF-8.  Without it, the byte value is the
2943       codepoint number and vice versa.  This flag is only meaningful if the
2944       SV is "SvPOK" or immediately after stringification via "SvPV" or a
2945       similar macro.  You can check and manipulate this flag with the
2946       following macros:
2947
2948           SvUTF8(sv)
2949           SvUTF8_on(sv)
2950           SvUTF8_off(sv)
2951
2952       This flag has an important effect on Perl's treatment of the string: if
2953       UTF-8 data is not properly distinguished, regular expressions,
2954       "length", "substr" and other string handling operations will have
2955       undesirable (wrong) results.
2956
2957       The problem comes when you have, for instance, a string that isn't
2958       flagged as UTF-8, and contains a byte sequence that could be UTF-8 --
2959       especially when combining non-UTF-8 and UTF-8 strings.
2960
2961       Never forget that the "SVf_UTF8" flag is separate from the PV value;
2962       you need to be sure you don't accidentally knock it off while you're
2963       manipulating SVs.  More specifically, you cannot expect to do this:
2964
2965           SV *sv;
2966           SV *nsv;
2967           STRLEN len;
2968           char *p;
2969
2970           p = SvPV(sv, len);
2971           frobnicate(p);
2972           nsv = newSVpvn(p, len);
2973
2974       The "char*" string does not tell you the whole story, and you can't
2975       copy or reconstruct an SV just by copying the string value.  Check if
2976       the old SV has the UTF8 flag set (after the "SvPV" call), and act
2977       accordingly:
2978
2979           p = SvPV(sv, len);
2980           is_utf8 = SvUTF8(sv);
2981           frobnicate(p, is_utf8);
2982           nsv = newSVpvn(p, len);
2983           if (is_utf8)
2984               SvUTF8_on(nsv);
2985
2986       In the above, your "frobnicate" function has been changed to be made
2987       aware of whether or not it's dealing with UTF-8 data, so that it can
2988       handle the string appropriately.
2989
2990       Since just passing an SV to an XS function and copying the data of the
2991       SV is not enough to copy the UTF8 flags, passing just a "char *" to an
2992       XS function is even less adequate.
2993
2994       For full generality, use the "DO_UTF8" macro to see if the string in an
2995       SV is to be treated as UTF-8.  This takes into account if the call to
2996       the XS function is being made from within the scope of "use bytes".  If
2997       so, the underlying bytes that comprise the UTF-8 string are to be
2998       exposed, rather than the character they represent.  But this pragma
2999       should only really be used for debugging and perhaps low-level testing
3000       at the byte level.  Hence most XS code need not concern itself with
3001       this, but various areas of the perl core do need to support it.
3002
3003       And this isn't the whole story.  Starting in Perl v5.12, strings that
3004       aren't encoded in UTF-8 may also be treated as Unicode under various
3005       conditions (see "ASCII Rules versus Unicode Rules" in perlunicode).
3006       This is only really a problem for characters whose ordinals are between
3007       128 and 255, and their behavior varies under ASCII versus Unicode rules
3008       in ways that your code cares about (see "The "Unicode Bug"" in
3009       perlunicode).  There is no published API for dealing with this, as it
3010       is subject to change, but you can look at the code for "pp_lc" in pp.c
3011       for an example as to how it's currently done.
3012
3013   How do I pass a Perl string to a C library?
3014       A Perl string, conceptually, is an opaque sequence of code points.
3015       Many C libraries expect their inputs to be "classical" C strings, which
3016       are arrays of octets 1-255, terminated with a NUL byte. Your job when
3017       writing an interface between Perl and a C library is to define the
3018       mapping between Perl and that library.
3019
3020       Generally speaking, "SvPVbyte" and related macros suit this task well.
3021       These assume that your Perl string is a "byte string", i.e., is either
3022       raw, undecoded input into Perl or is pre-encoded to, e.g., UTF-8.
3023
3024       Alternatively, if your C library expects UTF-8 text, you can use
3025       "SvPVutf8" and related macros. This has the same effect as encoding to
3026       UTF-8 then calling the corresponding "SvPVbyte"-related macro.
3027
3028       Some C libraries may expect other encodings (e.g., UTF-16LE). To give
3029       Perl strings to such libraries you must either do that encoding in Perl
3030       then use "SvPVbyte", or use an intermediary C library to convert from
3031       however Perl stores the string to the desired encoding.
3032
3033       Take care also that NULs in your Perl string don't confuse the C
3034       library. If possible, give the string's length to the C library; if
3035       that's not possible, consider rejecting strings that contain NUL bytes.
3036
3037       What about "SvPV", "SvPV_nolen", etc.?
3038
3039       Consider a 3-character Perl string "$foo = "\x64\x78\x8c"".  Perl can
3040       store these 3 characters either of two ways:
3041
3042       •   bytes: 0x64 0x78 0x8c
3043
3044       •   UTF-8: 0x64 0x78 0xc2 0x8c
3045
3046       Now let's say you convert $foo to a C string thus:
3047
3048           STRLEN len;
3049           char *str = SvPV(foo_sv, len);
3050
3051       At this point "str" could point to a 3-byte C string or a 4-byte one.
3052
3053       Generally speaking, we want "str" to be the same regardless of how Perl
3054       stores $foo, so the ambiguity here is undesirable. "SvPVbyte" and
3055       "SvPVutf8" solve that by giving predictable output: use "SvPVbyte" if
3056       your C library expects byte strings, or "SvPVutf8" if it expects UTF-8.
3057
3058       If your C library happens to support both encodings, then
3059       "SvPV"--always in tandem with lookups to "SvUTF8"!--may be safe and
3060       (slightly) more efficient.
3061
3062       TESTING TIP: Use utf8's "upgrade" and "downgrade" functions in your
3063       tests to ensure consistent handling regardless of Perl's internal
3064       encoding.
3065
3066   How do I convert a string to UTF-8?
3067       If you're mixing UTF-8 and non-UTF-8 strings, it is necessary to
3068       upgrade the non-UTF-8 strings to UTF-8.  If you've got an SV, the
3069       easiest way to do this is:
3070
3071           sv_utf8_upgrade(sv);
3072
3073       However, you must not do this, for example:
3074
3075           if (!SvUTF8(left))
3076               sv_utf8_upgrade(left);
3077
3078       If you do this in a binary operator, you will actually change one of
3079       the strings that came into the operator, and, while it shouldn't be
3080       noticeable by the end user, it can cause problems in deficient code.
3081
3082       Instead, "bytes_to_utf8" will give you a UTF-8-encoded copy of its
3083       string argument.  This is useful for having the data available for
3084       comparisons and so on, without harming the original SV.  There's also
3085       "utf8_to_bytes" to go the other way, but naturally, this will fail if
3086       the string contains any characters above 255 that can't be represented
3087       in a single byte.
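
       What upgrading and downgrading do to the bytes can be modelled in
       plain C.  The sketch below is an illustration under stated
       assumptions, not the implementation of "bytes_to_utf8" or
       "utf8_to_bytes": "latin1_to_utf8" and "utf8_to_latin1" are
       hypothetical names, the caller supplies the output buffers, and only
       ASCII platforms are considered.

```c
#include <assert.h>
#include <stddef.h>

/* Upgrade: native (Latin-1) bytes -> UTF-8.  out needs room for
 * 2 * len bytes.  Returns the number of bytes written. */
static size_t latin1_to_utf8(const unsigned char *in, size_t len,
                             unsigned char *out)
{
    size_t i, o = 0;

    assert(in != NULL && out != NULL);
    for (i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            out[o++] = in[i];              /* invariant: copy as is */
        } else {
            out[o++] = (unsigned char)(0xC0 | (in[i] >> 6));
            out[o++] = (unsigned char)(0x80 | (in[i] & 0x3F));
        }
    }
    return o;
}

/* Downgrade: UTF-8 -> native bytes.  out needs room for len bytes.
 * Returns the byte count, or (size_t)-1 if a character is above 255
 * or the input is malformed. */
static size_t utf8_to_latin1(const unsigned char *in, size_t len,
                             unsigned char *out)
{
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);
    while (i < len) {
        if (in[i] < 0x80) {
            out[o++] = in[i++];
        } else if ((in[i] == 0xC2 || in[i] == 0xC3) && i + 1 < len
                   && (in[i + 1] & 0xC0) == 0x80) {
            /* Only lead bytes 0xC2 and 0xC3 encode 128..255. */
            out[o++] = (unsigned char)(((in[i] & 0x03) << 6)
                                       | (in[i + 1] & 0x3F));
            i += 2;
        } else {
            return (size_t)-1;             /* cannot fit in one byte */
        }
    }
    return o;
}
```

       Both functions write to a separate buffer, which matches the point
       made above: the original data stays untouched while a converted copy
       is used for comparisons.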
3088
3089   How do I compare strings?
3090       "sv_cmp" in perlapi and "sv_cmp_flags" in perlapi do a lexicographic
3091       comparison of two SVs, and handle UTF-8ness properly.  Note, however,
3092       that Unicode specifies a much fancier mechanism for collation,
3093       available via the Unicode::Collate module.
3094
3095       To just compare two strings for equality/non-equality, you can just use
3096       "memEQ()" and "memNE()" as usual, except the strings must be both UTF-8
3097       or not UTF-8 encoded.
3098
3099       To compare two strings case-insensitively, use "foldEQ_utf8()" (the
3100       strings don't have to have the same UTF-8ness).
3101
3102   Is there anything else I need to know?
3103       Not really.  Just remember these things:
3104
3105       •  There's no way to tell if a "char *" or "U8 *" string is UTF-8 or
3106          not.  But you can tell if an SV is to be treated as UTF-8 by calling
3107          "DO_UTF8" on it, after stringifying it with "SvPV" or a similar
3108          macro.  And, you can tell if SV is actually UTF-8 (even if it is not
3109          to be treated as such) by looking at its "SvUTF8" flag (again after
3110          stringifying it).  Don't forget to set the flag if something should
3111          be UTF-8.  Treat the flag as part of the PV, even though it's not --
3112          if you pass on the PV to somewhere, pass on the flag too.
3113
3114       •  If a string is UTF-8, always use "utf8_to_uvchr_buf" to get at the
3115          value, unless "UTF8_IS_INVARIANT(*s)" in which case you can use *s.
3116
3117       •  When writing a character UV to a UTF-8 string, always use
3118          "uvchr_to_utf8", unless "UVCHR_IS_INVARIANT(uv)" in which case you
3119          can use "*s = uv".
3120
3121       •  Mixing UTF-8 and non-UTF-8 strings is tricky.  Use "bytes_to_utf8"
3122          to get a new string which is UTF-8 encoded, and then combine them.
3123

Custom Operators

3125       Custom operator support is an experimental feature that allows you to
3126       define your own ops.  This is primarily to allow the building of
3127       interpreters for other languages in the Perl core, but it also allows
3128       optimizations through the creation of "macro-ops" (ops which perform
3129       the functions of multiple ops which are usually executed together, such
3130       as "gvsv, gvsv, add".)
3131
3132       This feature is implemented as a new op type, "OP_CUSTOM".  The Perl
3133       core does not "know" anything special about this op type, and so it
3134       will not be involved in any optimizations.  This also means that you
3135       can define your custom ops to be any op structure -- unary, binary,
3136       list and so on -- you like.
3137
3138       It's important to know what custom operators won't do for you.  They
3139       won't let you add new syntax to Perl, directly.  They won't even let
3140       you add new keywords, directly.  In fact, they won't change the way
3141       Perl compiles a program at all.  You have to do those changes yourself,
3142       after Perl has compiled the program.  You do this either by
3143       manipulating the op tree using a "CHECK" block and the "B::Generate"
3144       module, or by adding a custom peephole optimizer with the "optimize"
3145       module.
3146
3147       When you do this, you replace ordinary Perl ops with custom ops by
3148       creating ops with the type "OP_CUSTOM" and the "op_ppaddr" of your own
3149       PP function.  This should be defined in XS code, and should look like
3150       the PP ops in "pp_*.c".  You are responsible for ensuring that your op
3151       takes the appropriate number of values from the stack, and you are
3152       responsible for adding stack marks if necessary.
3153
3154       You should also "register" your op with the Perl interpreter so that it
3155       can produce sensible error and warning messages.  Since it is possible
3156       to have multiple custom ops within the one "logical" op type
3157       "OP_CUSTOM", Perl uses the value of "o->op_ppaddr" to determine which
3158       custom op it is dealing with.  You should create an "XOP" structure for
3159       each ppaddr you use, set the properties of the custom op with
3160       "XopENTRY_set", and register the structure against the ppaddr using
3161       "Perl_custom_op_register".  A trivial example might look like:
3162
3163           static XOP my_xop;
3164           static OP *my_pp(pTHX);
3165
3166           BOOT:
3167               XopENTRY_set(&my_xop, xop_name, "myxop");
3168               XopENTRY_set(&my_xop, xop_desc, "Useless custom op");
3169               Perl_custom_op_register(aTHX_ my_pp, &my_xop);
3170
3171       The available fields in the structure are:
3172
3173       xop_name
           A short name for your op.  This will be included in some error
           messages, and will also be returned as "$op->name" by the B module,
           so it will appear in the output of modules like B::Concise.
3177
3178       xop_desc
3179           A short description of the function of the op.
3180
3181       xop_class
3182           Which of the various *OP structures this op uses.  This should be
3183           one of the "OA_*" constants from op.h, namely
3184
3185           OA_BASEOP
3186           OA_UNOP
3187           OA_BINOP
3188           OA_LOGOP
3189           OA_LISTOP
3190           OA_PMOP
3191           OA_SVOP
3192           OA_PADOP
3193           OA_PVOP_OR_SVOP
3194               This should be interpreted as '"PVOP"' only.  The "_OR_SVOP" is
3195               because the only core "PVOP", "OP_TRANS", can sometimes be a
3196               "SVOP" instead.
3197
3198           OA_LOOP
3199           OA_COP
3200
3201           The other "OA_*" constants should not be used.
3202
3203       xop_peep
           This member is of type "Perl_cpeep_t", which expands to "void
           (*Perl_cpeep_t)(pTHX_ OP *o, OP *oldop)".  If it is set, this
           function will be called from "Perl_rpeep" when ops of this type are
           encountered by the peephole optimizer.  o is the OP that needs
           optimizing; oldop is the previous OP optimized, whose "op_next"
           points to o.
3210
3211       "B::Generate" directly supports the creation of custom ops by name.
3212

Stacks

3214       Descriptions above occasionally refer to "the stack", but there are in
3215       fact many stack-like data structures within the perl interpreter. When
3216       otherwise unqualified, "the stack" usually refers to the value stack.
3217
3218       The various stacks have different purposes, and operate in slightly
3219       different ways. Their differences are noted below.
3220
3221   Value Stack
3222       This stack stores the values that regular perl code is operating on,
3223       usually intermediate values of expressions within a statement. The
3224       stack itself is formed of an array of SV pointers.
3225
3226       The base of this stack is pointed to by the interpreter variable
3227       "PL_stack_base", of type "SV **".
3228
3229       The head of the stack is "PL_stack_sp", and points to the most
3230       recently-pushed item.
3231
3232       Items are pushed to the stack by using the "PUSHs()" macro or its
3233       variants described above; "XPUSHs()", "mPUSHs()", "mXPUSHs()" and the
3234       typed versions. Note carefully that the non-"X" versions of these
3235       macros do not check the size of the stack and assume it to be big
3236       enough. These must be paired with a suitable check of the stack's size,
3237       such as the "EXTEND" macro to ensure it is large enough. For example
3238
3239           EXTEND(SP, 4);
3240           mPUSHi(10);
3241           mPUSHi(20);
3242           mPUSHi(30);
3243           mPUSHi(40);
3244
3245       This is slightly more performant than making four separate checks in
3246       four separate "mXPUSHi()" calls.
3247
3248       As a further performance optimisation, the various "PUSH" macros all
3249       operate using a local variable "SP", rather than the interpreter-global
3250       variable "PL_stack_sp". This variable is declared by the "dSP" macro -
3251       though it is normally implied by XSUBs and similar so it is rare you
3252       have to consider it directly. Once declared, the "PUSH" macros will
3253       operate only on this local variable, so before invoking any other perl
3254       core functions you must use the "PUTBACK" macro to return the value
3255       from the local "SP" variable back to the interpreter variable.
3256       Similarly, after calling a perl core function which may have had reason
3257       to move the stack or push/pop values to it, you must use the "SPAGAIN"
3258       macro which refreshes the local "SP" value back from the interpreter
3259       one.
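
       The hand-off between the local "SP" and the interpreter-global
       pointer can be illustrated with a small standalone model.  This is a
       toy sketch in plain C, not perl's implementation: every "toy_" and
       "TOY_" name below is a hypothetical stand-in, and the stack holds
       plain integers rather than SV pointers.

```c
#include <stddef.h>

/* Toy model only: these names are hypothetical stand-ins, not perl's API. */
static int  toy_stack[16];
static int *toy_stack_base = toy_stack;       /* plays the role of PL_stack_base */
static int *toy_stack_sp   = toy_stack;       /* plays the role of PL_stack_sp */

#define TOY_dSP      int *sp = toy_stack_sp   /* declare the local copy */
#define TOY_PUSH(v)  (*++sp = (v))            /* push via the local copy only */
#define TOY_PUTBACK  (toy_stack_sp = sp)      /* local -> global */
#define TOY_SPAGAIN  (sp = toy_stack_sp)      /* global -> local */

/* Stands in for a core function that pushes using the global pointer. */
static void toy_core_push(int v)
{
    *++toy_stack_sp = v;
}

/* Pushes 1 and 2 locally, lets the "core" push 3, then pushes 4.
 * Returns the resulting stack height. */
static ptrdiff_t toy_demo(void)
{
    TOY_dSP;
    TOY_PUSH(1);
    TOY_PUSH(2);
    TOY_PUTBACK;           /* publish our pushes before calling out */
    toy_core_push(3);
    TOY_SPAGAIN;           /* pick up whatever the callee did */
    TOY_PUSH(4);
    TOY_PUTBACK;
    return toy_stack_sp - toy_stack_base;
}
```

       Without the first "TOY_PUTBACK" the callee would overwrite the slot
       holding 1; without "TOY_SPAGAIN" the final push would overwrite the
       callee's 3.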
3260
       Items are popped from the stack by using the "POPs" macro or its typed
       versions. There is also a macro "TOPs" that inspects the topmost item
       without removing it.
3264
3265       Note specifically that SV pointers on the value stack do not contribute
3266       to the overall reference count of the xVs being referred to. If newly-
3267       created xVs are being pushed to the stack you must arrange for them to
3268       be destroyed at a suitable time; usually by using one of the "mPUSH*"
3269       macros or "sv_2mortal()" to mortalise the xV.
3270
3271   Mark Stack
3272       The value stack stores individual perl scalar values as temporaries
3273       between expressions. Some perl expressions operate on entire lists; for
3274       that purpose we need to know where on the stack each list begins. This
3275       is the purpose of the mark stack.
3276
3277       The mark stack stores integers as I32 values, which are the height of
3278       the value stack at the time before the list began; thus the mark itself
3279       actually points to the value stack entry one before the list. The list
3280       itself starts at "mark + 1".
3281
3282       The base of this stack is pointed to by the interpreter variable
3283       "PL_markstack", of type "I32 *".
3284
3285       The head of the stack is "PL_markstack_ptr", and points to the most
3286       recently-pushed item.
3287
       Items are pushed to the stack by using the "PUSHMARK()" macro. Even
       though the stack itself stores (value) stack indices as integers, the
       "PUSHMARK" macro should be given a stack pointer directly; it will
       calculate the index offset by comparing to the "PL_stack_base"
       variable. Thus almost always the code to perform this is
3293
3294           PUSHMARK(SP);
3295
3296       Items are popped from the stack by the "POPMARK" macro. There is also a
3297       macro "TOPMARK" that inspects the topmost item without removing it.
3298       These macros return I32 index values directly. There is also the
3299       "dMARK" macro which declares a new SV double-pointer variable, called
3300       "mark", which points at the marked stack slot; this is the usual macro
3301       that C code will use when operating on lists given on the stack.
3302
3303       As noted above, the "mark" variable itself will point at the most
3304       recently pushed value on the value stack before the list begins, and so
3305       the list itself starts at "mark + 1". The values of the list may be
3306       iterated by code such as
3307
3308           for(SV **svp = mark + 1; svp <= PL_stack_sp; svp++) {
3309             SV *item = *svp;
3310             ...
3311           }
3312
       Note specifically that if the list is empty, "mark" will equal
       "PL_stack_sp".
3315
       Because the "mark" variable is a pointer into the value stack, extra
       care must be taken if "EXTEND" or any of the "XPUSH" macros are
       invoked within the function, because the stack may need to be moved
       to extend it and so the existing pointer will now be invalid. If this
       may be a problem, a possible solution is to track the mark offset as
       an integer and recompute the mark pointer later on, after the stack
       has been moved.
3323
3324           I32 markoff = POPMARK;
3325
3326           ...
3327
           SV **mark = PL_stack_base + markoff;
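
       The reason an integer offset survives a stack move while a pointer
       does not can be seen in a standalone model.  The following is a toy
       sketch in plain C under stated assumptions: "ToyStack", "toy_extend"
       and the other "toy_" names are hypothetical, the stack holds integers
       instead of SV pointers, and "realloc" stands in for whatever perl
       does to grow the real stack.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: hypothetical names, not perl's API. */
typedef struct {
    int   *base;   /* plays the role of PL_stack_base; may move on growth */
    int   *sp;     /* plays the role of PL_stack_sp; points at the top item */
    size_t cap;    /* number of slots allocated */
} ToyStack;

static int toy_init(ToyStack *s, size_t cap)
{
    s->base = malloc(cap * sizeof *s->base);
    if (!s->base) return -1;
    s->sp  = s->base;    /* slot 0 is never used, so sp == base means empty */
    s->cap = cap;
    return 0;
}

/* Make room for n more items; the base address may change. */
static int toy_extend(ToyStack *s, size_t n)
{
    size_t used = (size_t)(s->sp - s->base);
    if (used + n < s->cap) return 0;
    size_t newcap = (used + n + 1) * 2;
    int *nb = realloc(s->base, newcap * sizeof *nb);
    if (!nb) return -1;
    s->base = nb;
    s->sp   = nb + used;
    s->cap  = newcap;
    return 0;
}

/* Sums the list above the mark after forcing the stack to grow.
 * Returns -1 on allocation failure. */
static long toy_sum_list(void)
{
    ToyStack s;
    if (toy_init(&s, 4) != 0) return -1;

    *++s.sp = 99;                          /* an unrelated earlier value */
    ptrdiff_t markoff = s.sp - s.base;     /* the offset a mark would hold */
    *++s.sp = 10;
    *++s.sp = 20;                          /* the list is (10, 20) */

    if (toy_extend(&s, 100) != 0) { free(s.base); return -1; }
    *++s.sp = 30;                          /* the list is now (10, 20, 30) */

    int *mark = s.base + markoff;          /* recomputed after the move */
    long sum = 0;
    for (int *p = mark + 1; p <= s.sp; p++)
        sum += *p;

    free(s.base);
    return sum;
}
```

       Had "mark" been computed as a pointer before "toy_extend" ran, it
       could have been left pointing into the old, released block.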
3329
3330   Temporaries Stack
3331       As noted above, xV references on the main value stack do not contribute
3332       to the reference count of an xV, and so another mechanism is used to
3333       track when temporary values which live on the stack must be released.
3334       This is the job of the temporaries stack.
3335
3336       The temporaries stack stores pointers to xVs whose reference counts
3337       will be decremented soon.
3338
3339       The base of this stack is pointed to by the interpreter variable
3340       "PL_tmps_stack", of type "SV **".
3341
3342       The head of the stack is indexed by "PL_tmps_ix", an integer which
3343       stores the index in the array of the most recently-pushed item.
3344
3345       There is no public API to directly push items to the temporaries stack.
3346       Instead, the API function "sv_2mortal()" is used to mortalize an xV,
3347       adding its address to the temporaries stack.
3348
3349       Likewise, there is no public API to read values from the temporaries
       Likewise, there is no public API to read values from the temporaries
       stack.  Instead, the macros "SAVETMPS" and "FREETMPS" are used. The
       "SAVETMPS" macro establishes the base level of the temporaries stack,
       by capturing the current value of "PL_tmps_ix" into "PL_tmps_floor" and
       saving the previous value to the save stack. Thereafter, whenever
       "FREETMPS" is invoked all of the temporaries that have been pushed
       since that level are reclaimed.
3356
3357       While it is common to see these two macros in pairs within an "ENTER"/
3358       "LEAVE" pair, it is not necessary to match them. It is permitted to
3359       invoke "FREETMPS" multiple times since the most recent "SAVETMPS"; for
3360       example in a loop iterating over elements of a list. While you can
3361       invoke "SAVETMPS" multiple times within a scope pair, it is unlikely to
3362       be useful. Subsequent invocations will move the temporaries floor
3363       further up, thus effectively trapping the existing temporaries to only
3364       be released at the end of the scope.
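
       The floor mechanism described above can be modelled in a few lines.
       This is a toy sketch in plain C, not perl's implementation: the
       "toy_" names are hypothetical, a "ToyVal" carries nothing but a
       reference count, and the old floor is kept in a local variable
       where perl would keep it on the save stack.

```c
/* Toy model only: hypothetical names, not perl's API. */
typedef struct { int refcnt; } ToyVal;

static ToyVal *toy_tmps[32];
static int toy_tmps_ix    = -1;   /* index of the most recently pushed item */
static int toy_tmps_floor = -1;   /* entries at or below this index are kept */

/* Like sv_2mortal(): schedule one reference count decrement. */
static ToyVal *toy_mortal(ToyVal *v)
{
    toy_tmps[++toy_tmps_ix] = v;
    return v;
}

/* Like SAVETMPS: raise the floor and hand back the old one. */
static int toy_savetmps(void)
{
    int old = toy_tmps_floor;
    toy_tmps_floor = toy_tmps_ix;
    return old;
}

/* Like FREETMPS: release everything pushed above the floor. */
static void toy_freetmps(void)
{
    while (toy_tmps_ix > toy_tmps_floor)
        toy_tmps[toy_tmps_ix--]->refcnt--;
}

/* Returns the reference counts of a, b and c, packed as decimal digits,
 * as seen just before the enclosing level is cleaned up. */
static int toy_tmps_demo(void)
{
    ToyVal a = {1}, b = {1}, c = {1};

    toy_mortal(&a);                  /* belongs to the enclosing level */
    int old_floor = toy_savetmps();
    toy_mortal(&b);
    toy_freetmps();                  /* releases b only */
    toy_mortal(&c);
    toy_freetmps();                  /* a second call is fine: releases c */
    int snapshot = a.refcnt * 100 + b.refcnt * 10 + c.refcnt;

    toy_tmps_floor = old_floor;      /* what leaving the scope would restore */
    toy_freetmps();                  /* the enclosing level now releases a */
    return snapshot;
}
```

       The value "a" stays alive across both inner "toy_freetmps" calls
       because it sits at the floor rather than above it.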
3365
3366   Save Stack
       The save stack is used by perl to implement the "local" keyword and
       other similar behaviours; that is, any cleanup operations that need to
       be performed when leaving the current scope. Items pushed to this
       stack generally capture the current value of some internal variable or
       state, which will be restored when the scope is unwound due to
       leaving, "return", "die", "goto" or other reasons.
3373
       Whereas other perl internal stacks store individual items all of the
       same type (usually SV pointers or integers), the items pushed to the
       save stack are formed of many different types, having multiple fields
       to them. For example, the "SAVEt_INT" type needs to store both the
       address of the "int" variable to restore, and the value to restore it
       to. This information could have been stored using fields of a "struct",
       but such a struct would have to be large enough to store three
       pointers in the largest case, which would waste a lot of space in most
       of the smaller cases.
3383
       Instead, the stack stores information in a variable-length encoding of
       "ANY" structures. The final value pushed is stored in the "UV" field
       and encodes the kind of item held by the preceding items, the count
       and types of which depend on what kind of item is being stored.
       The kind field is pushed last because that will be the first field to
       be popped when unwinding items from the stack.
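
       A variable-length encoding with the kind pushed last might be
       modelled as follows.  This is a toy sketch in plain C rather than
       perl's own code: "ToyAny", the two "TOY_SAVE_" kinds and all "toy_"
       functions are hypothetical, and only two kinds of entry are
       supported.

```c
#include <stdint.h>

/* Toy model only: hypothetical names, not perl's API. */
typedef union {
    void      *ptr;
    int        i;
    uintptr_t  uv;
    void     (*fn)(void);
} ToyAny;                              /* plays the role of ANY */

enum { TOY_SAVE_INT = 1, TOY_SAVE_DESTRUCTOR = 2 };

static ToyAny toy_savestack[64];
static int    toy_savestack_ix = 0;    /* index at which the next slot goes */

/* Two payload slots (address, old value), then the kind. */
static void toy_save_int(int *var)
{
    toy_savestack[toy_savestack_ix++].ptr = var;
    toy_savestack[toy_savestack_ix++].i   = *var;
    toy_savestack[toy_savestack_ix++].uv  = TOY_SAVE_INT;
}

/* One payload slot (the function), then the kind. */
static void toy_save_destructor(void (*fn)(void))
{
    toy_savestack[toy_savestack_ix++].fn = fn;
    toy_savestack[toy_savestack_ix++].uv = TOY_SAVE_DESTRUCTOR;
}

/* Unwind to base. The kind is popped first and says how many payload
 * slots lie beneath it and how to interpret them. */
static void toy_leave_scope(int base)
{
    while (toy_savestack_ix > base) {
        uintptr_t kind = toy_savestack[--toy_savestack_ix].uv;
        switch ((int)kind) {
        case TOY_SAVE_INT: {
            int  val = toy_savestack[--toy_savestack_ix].i;
            int *var = toy_savestack[--toy_savestack_ix].ptr;
            *var = val;
            break;
        }
        case TOY_SAVE_DESTRUCTOR:
            toy_savestack[--toy_savestack_ix].fn();
            break;
        default:
            break;
        }
    }
}

static int toy_cleanups = 0;
static void toy_count_cleanup(void) { toy_cleanups++; }

/* Saves x, changes it, unwinds, and returns the restored value. */
static int toy_save_demo(void)
{
    int x = 5;
    int base = toy_savestack_ix;
    toy_save_int(&x);
    toy_save_destructor(toy_count_cleanup);
    x = 42;
    toy_leave_scope(base);
    return x;
}
```

       The integer entry occupies three slots and the destructor entry
       two, so neither pays for the size of the largest possible entry.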
3390
3391       The base of this stack is pointed to by the interpreter variable
3392       "PL_savestack", of type "ANY *".
3393
3394       The head of the stack is indexed by "PL_savestack_ix", an integer which
3395       stores the index in the array at which the next item should be pushed.
3396       (Note that this is different to most other stacks, which reference the
3397       most recently-pushed item).
3398
3399       Items are pushed to the save stack by using the various "SAVE...()"
3400       macros.  Many of these macros take a variable and store both its
3401       address and current value on the save stack, ensuring that value gets
3402       restored on scope exit.
3403
3404           SAVEI8(i8)
3405           SAVEI16(i16)
3406           SAVEI32(i32)
3407           SAVEINT(i)
3408           ...
3409
3410       There are also a variety of other special-purpose macros which save
3411       particular types or values of interest. "SAVETMPS" has already been
3412       mentioned above.  Others include "SAVEFREEPV" which arranges for a PV
3413       (i.e. a string buffer) to be freed, or "SAVEDESTRUCTOR" which arranges
3414       for a given function pointer to be invoked on scope exit. A full list
3415       of such macros can be found in scope.h.
3416
3417       There is no public API for popping individual values or items from the
3418       save stack. Instead, via the scope stack, the "ENTER" and "LEAVE" pair
3419       form a way to start and stop nested scopes. Leaving a nested scope via
3420       "LEAVE" will restore all of the saved values that had been pushed since
3421       the most recent "ENTER".
3422
3423   Scope Stack
3424       As with the mark stack to the value stack, the scope stack forms a pair
3425       with the save stack. The scope stack stores the height of the save
3426       stack at which nested scopes begin, and allows the save stack to be
3427       unwound back to that point when the scope is left.
3428
3429       When perl is built with debugging enabled, there is a second part to
3430       this stack storing human-readable string names describing the type of
3431       stack context. Each push operation saves the name as well as the height
3432       of the save stack, and each pop operation checks the topmost name with
3433       what is expected, causing an assertion failure if the name does not
3434       match.
3435
3436       The base of this stack is pointed to by the interpreter variable
3437       "PL_scopestack", of type "I32 *". If enabled, the scope stack names are
3438       stored in a separate array pointed to by "PL_scopestack_name", of type
3439       "const char **".
3440
3441       The head of the stack is indexed by "PL_scopestack_ix", an integer
3442       which stores the index of the array or arrays at which the next item
3443       should be pushed. (Note that this is different to most other stacks,
3444       which reference the most recently-pushed item).
3445
       Values are pushed to the scope stack using the "ENTER" macro, which
       begins a new nested scope. Any items pushed to the save stack are then
       restored at the matching invocation of the "LEAVE" macro.
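
       The pairing of the two stacks can be shown with a standalone model.
       This is a toy sketch in plain C under simplifying assumptions: all
       "toy_" names are hypothetical, and every save stack entry is a
       fixed-size (address, old value) pair rather than perl's
       variable-length encoding.

```c
/* Toy model only: hypothetical names, not perl's API. */
typedef struct { int *var; int old; } ToySave;

static ToySave toy_saves[32];
static int     toy_save_ix  = 0;    /* next free save stack slot */
static int     toy_scopes[8];
static int     toy_scope_ix = 0;    /* next free scope stack slot */

/* Like ENTER: remember how high the save stack is right now. */
static void toy_enter(void)
{
    toy_scopes[toy_scope_ix++] = toy_save_ix;
}

/* Like SAVEINT(): record the address and current value of an int. */
static void toy_save(int *var)
{
    toy_saves[toy_save_ix].var = var;
    toy_saves[toy_save_ix].old = *var;
    toy_save_ix++;
}

/* Like LEAVE: unwind the save stack to the recorded height. */
static void toy_leave(void)
{
    int floor = toy_scopes[--toy_scope_ix];
    while (toy_save_ix > floor) {
        toy_save_ix--;
        *toy_saves[toy_save_ix].var = toy_saves[toy_save_ix].old;
    }
}

/* Returns 10 * (x after the inner leave) + (x after the outer leave). */
static int toy_scope_demo(void)
{
    int x = 1;
    toy_enter();
    toy_save(&x);
    x = 2;

    toy_enter();
    toy_save(&x);
    x = 3;
    toy_leave();                  /* undoes only the inner save */
    int after_inner = x;

    toy_leave();                  /* undoes the outer save */
    return after_inner * 10 + x;
}
```

       Each "toy_leave" restores only what was saved since its own
       "toy_enter", which is why nested scopes unwind independently.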
3449

Dynamic Scope and the Context Stack

3451       Note: this section describes a non-public internal API that is subject
3452       to change without notice.
3453
3454   Introduction to the context stack
3455       In Perl, dynamic scoping refers to the runtime nesting of things like
3456       subroutine calls, evals etc, as well as the entering and exiting of
3457       block scopes. For example, the restoring of a "local"ised variable is
3458       determined by the dynamic scope.
3459
       Perl tracks the dynamic scope by a data structure called the context
       stack, which is an array of "PERL_CONTEXT" structures, each of which
       is a big union of all the types of context. Whenever a new scope
3463       is entered (such as a block, a "for" loop, or a subroutine call), a new
3464       context entry is pushed onto the stack. Similarly when leaving a block
3465       or returning from a subroutine call etc. a context is popped. Since the
3466       context stack represents the current dynamic scope, it can be searched.
3467       For example, "next LABEL" searches back through the stack looking for a
3468       loop context that matches the label; "return" pops contexts until it
3469       finds a sub or eval context or similar; "caller" examines sub contexts
3470       on the stack.
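
       Searching the context stack for a labelled loop can be sketched
       with a standalone model.  This is a toy in plain C, not perl's
       lookup code: "ToyContext", the "TOY_CX_" types and the "toy_"
       functions are hypothetical, and the sketch ignores any special
       handling perl applies when the search crosses a sub or eval frame.

```c
#include <stddef.h>
#include <string.h>

/* Toy model only: hypothetical names, not perl's API. */
enum toy_cx_type { TOY_CX_BLOCK, TOY_CX_SUB, TOY_CX_LOOP };

typedef struct {
    enum toy_cx_type type;
    const char      *label;    /* used by TOY_CX_LOOP only; may be NULL */
} ToyContext;

/* Search from the innermost frame (index top) outwards for a loop
 * carrying the given label. Returns its index, or -1 if there is none. */
static int toy_find_loop(const ToyContext *stack, int top, const char *label)
{
    for (int i = top; i >= 0; i--) {
        if (stack[i].type == TOY_CX_LOOP
            && stack[i].label != NULL
            && strcmp(stack[i].label, label) == 0)
            return i;
    }
    return -1;
}

/* A fixed stack: sub { OUTER: loop { block { INNER: loop { block } } } } */
static int toy_next_demo(const char *label)
{
    static const ToyContext stack[] = {
        { TOY_CX_SUB,   NULL    },
        { TOY_CX_LOOP,  "OUTER" },
        { TOY_CX_BLOCK, NULL    },
        { TOY_CX_LOOP,  "INNER" },
        { TOY_CX_BLOCK, NULL    },
    };
    int top = (int)(sizeof stack / sizeof stack[0]) - 1;
    return toy_find_loop(stack, top, label);
}
```

       Every frame above the returned index would then have to be popped
       before control resumes in the loop that was found.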
3471
       Each context entry is labelled with a context type, "cx_type". Typical
       context types are "CXt_SUB", "CXt_EVAL" etc., as well as "CXt_BLOCK"
       and "CXt_NULL" which represent, respectively, a basic scope (as pushed
       by "pp_enter") and a sort block. The type determines which parts of
       the context union are valid.
3477
3478       The main division in the context struct is between a substitution scope
3479       ("CXt_SUBST") and block scopes, which are everything else. The former
3480       is just used while executing "s///e", and won't be discussed further
3481       here.
3482
3483       All the block scope types share a common base, which corresponds to
3484       "CXt_BLOCK". This stores the old values of various scope-related
3485       variables like "PL_curpm", as well as information about the current
3486       scope, such as "gimme". On scope exit, the old variables are restored.
3487
3488       Particular block scope types store extra per-type information. For
3489       example, "CXt_SUB" stores the currently executing CV, while the various
3490       for loop types might hold the original loop variable SV. On scope exit,
3491       the per-type data is processed; for example the CV has its reference
3492       count decremented, and the original loop variable is restored.
3493
3494       The macro "cxstack" returns the base of the current context stack,
3495       while "cxstack_ix" is the index of the current frame within that stack.
3496
3497       In fact, the context stack is actually part of a stack-of-stacks
3498       system; whenever something unusual is done such as calling a "DESTROY"
3499       or tie handler, a new stack is pushed, then popped at the end.
3500
       Note that the API described here changed considerably in perl 5.24;
       prior to that, big macros like "PUSHBLOCK" and "POPSUB" were used; in
       5.24 they were replaced by the inline static functions described below.
       In addition, the ordering and detail of how these macros/functions work
       changed in many ways, often subtly. In particular the old macros
       didn't handle saving the savestack and temps stack positions, and
       required additional "ENTER", "SAVETMPS" and "LEAVE" calls compared to
       the new functions. The old-style macros will not be described further.
3509
3510   Pushing contexts
3511       For pushing a new context, the two basic functions are "cx =
3512       cx_pushblock()", which pushes a new basic context block and returns its
3513       address, and a family of similar functions with names like
3514       "cx_pushsub(cx)" which populate the additional type-dependent fields in
3515       the "cx" struct. Note that "CXt_NULL" and "CXt_BLOCK" don't have their
3516       own push functions, as they don't store any data beyond that pushed by
3517       "cx_pushblock".
3518
3519       The fields of the context struct and the arguments to the "cx_*"
3520       functions are subject to change between perl releases, representing
3521       whatever is convenient or efficient for that release.
3522
       A typical example of pushing a context can be found in "pp_entersub";
       the following shows a simplified and stripped-down example of a non-XS
       call, along with comments showing roughly what each function does.
3526
3527        dMARK;
3528        U8 gimme      = GIMME_V;
3529        bool hasargs  = cBOOL(PL_op->op_flags & OPf_STACKED);
3530        OP *retop     = PL_op->op_next;
3531        I32 old_ss_ix = PL_savestack_ix;
        CV *cv        = ....;
        PERL_CONTEXT *cx;
3533
3534        /* ... make mortal copies of stack args which are PADTMPs here ... */
3535
3536        /* ... do any additional savestack pushes here ... */
3537
3538        /* Now push a new context entry of type 'CXt_SUB'; initially just
3539         * doing the actions common to all block types: */
3540
3541        cx = cx_pushblock(CXt_SUB, gimme, MARK, old_ss_ix);
3542
            /* this does (approximately):
                CXINC;              // cxstack_ix++ (grow if necessary)
                cx = CX_CUR();      // and get the address of new frame
3546                cx->cx_type        = CXt_SUB;
3547                cx->blk_gimme      = gimme;
3548                cx->blk_oldsp      = MARK - PL_stack_base;
3549                cx->blk_oldsaveix  = old_ss_ix;
3550                cx->blk_oldcop     = PL_curcop;
3551                cx->blk_oldmarksp  = PL_markstack_ptr - PL_markstack;
3552                cx->blk_oldscopesp = PL_scopestack_ix;
3553                cx->blk_oldpm      = PL_curpm;
3554                cx->blk_old_tmpsfloor = PL_tmps_floor;
3555
3556                PL_tmps_floor        = PL_tmps_ix;
3557            */
3558
3559
3560        /* then update the new context frame with subroutine-specific info,
3561         * such as the CV about to be executed: */
3562
3563        cx_pushsub(cx, cv, retop, hasargs);
3564
3565            /* this does (approximately):
3566                cx->blk_sub.cv          = cv;
3567                cx->blk_sub.olddepth    = CvDEPTH(cv);
3568                cx->blk_sub.prevcomppad = PL_comppad;
3569                cx->cx_type            |= (hasargs) ? CXp_HASARGS : 0;
3570                cx->blk_sub.retop       = retop;
3571                SvREFCNT_inc_simple_void_NN(cv);
3572            */
3573
3574       Note that "cx_pushblock()" sets two new floors: for the args stack (to
3575       "MARK") and the temps stack (to "PL_tmps_ix"). While executing at this
3576       scope level, every "nextstate" (amongst others) will reset the args and
3577       tmps stack levels to these floors. Note that since "cx_pushblock" uses
3578       the current value of "PL_tmps_ix" rather than it being passed as an
3579       arg, this dictates at what point "cx_pushblock" should be called. In
3580       particular, any new mortals which should be freed only on scope exit
3581       (rather than at the next "nextstate") should be created first.
3582
       Most callers of "cx_pushblock" simply set the new args stack floor to
       the top of the previous stack frame, but a "CXt_LOOP_LIST" context
       stores the items being iterated over on the stack, and so sets
       "blk_oldsp" to the top of these items instead. Note that, contrary to
       its name, "blk_oldsp" doesn't always represent the value to restore
       "PL_stack_sp" to on scope exit.
3589
3590       Note the early capture of "PL_savestack_ix" to "old_ss_ix", which is
3591       later passed as an arg to "cx_pushblock". In the case of "pp_entersub",
3592       this is because, although most values needing saving are stored in
3593       fields of the context struct, an extra value needs saving only when the
3594       debugger is running, and it doesn't make sense to bloat the struct for
3595       this rare case. So instead it is saved on the savestack. Since this
3596       value gets calculated and saved before the context is pushed, it is
3597       necessary to pass the old value of "PL_savestack_ix" to "cx_pushblock",
3598       to ensure that the saved value gets freed during scope exit.  For most
3599       users of "cx_pushblock", where nothing needs pushing on the save stack,
3600       "PL_savestack_ix" is just passed directly as an arg to "cx_pushblock".
3601
3602       Note that where possible, values should be saved in the context struct
3603       rather than on the save stack; it's much faster that way.
3604
3605       Normally "cx_pushblock" should be immediately followed by the
3606       appropriate "cx_pushfoo", with nothing between them; this is because if
3607       code in-between could die (e.g. a warning upgraded to fatal), then the
3608       context stack unwinding code in "dounwind" would see (in the example
3609       above) a "CXt_SUB" context frame, but without all the subroutine-
3610       specific fields set, and crashes would soon ensue.
3611
3612       Where the two must be separate, initially set the type to "CXt_NULL" or
3613       "CXt_BLOCK", and later change it to "CXt_foo" when doing the
3614       "cx_pushfoo". This is exactly what "pp_enteriter" does, once it's
3615       determined which type of loop it's pushing.
3616
3617   Popping contexts
       Contexts are popped using "cx_popsub()" etc. and "cx_popblock()". Note
       however, that unlike "cx_pushblock", neither of these functions
       actually decrements the current context stack index; this is done
       separately using "CX_POP()".
3622
3623       There are two main ways that contexts are popped. During normal
3624       execution as scopes are exited, functions like "pp_leave",
3625       "pp_leaveloop" and "pp_leavesub" process and pop just one context using
3626       "cx_popfoo" and "cx_popblock". On the other hand, things like
3627       "pp_return" and "next" may have to pop back several scopes until a sub
3628       or loop context is found, and exceptions (such as "die") need to pop
3629       back contexts until an eval context is found. Both of these are
3630       accomplished by "dounwind()", which is capable of processing and
3631       popping all contexts above the target one.
3632
3633       Here is a typical example of context popping, as found in "pp_leavesub"
3634       (simplified slightly):
3635
3636        U8 gimme;
3637        PERL_CONTEXT *cx;
3638        SV **oldsp;
3639        OP *retop;
3640
3641        cx = CX_CUR();
3642
3643        gimme = cx->blk_gimme;
3644        oldsp = PL_stack_base + cx->blk_oldsp; /* last arg of previous frame */
3645
3646        if (gimme == G_VOID)
3647            PL_stack_sp = oldsp;
3648        else
3649            leave_adjust_stacks(oldsp, oldsp, gimme, 0);
3650
3651        CX_LEAVE_SCOPE(cx);
3652        cx_popsub(cx);
3653        cx_popblock(cx);
3654        retop = cx->blk_sub.retop;
3655        CX_POP(cx);
3656
3657        return retop;
3658
3659       The steps above are in a very specific order, designed to be the
3660       reverse order of when the context was pushed. The first thing to do is
3661       to copy and/or protect any return arguments and free any temps in the
3662       current scope. Scope exits like an rvalue sub normally return a mortal
3663       copy of their return args (as opposed to lvalue subs). It is important
3664       to make this copy before the save stack is popped or variables are
3665       restored, or bad things like the following can happen:
3666
3667           sub f { my $x =...; $x }  # $x freed before we get to copy it
3668           sub f { /(...)/;    $1 }  # PL_curpm restored before $1 copied
3669
3670       Although we wish to free any temps at the same time, we have to be
3671       careful not to free any temps which are keeping return args alive; nor
3672       to free the temps we have just created while mortal copying return
3673       args. Fortunately, "leave_adjust_stacks()" is capable of making mortal
3674       copies of return args, shifting args down the stack, and only
3675       processing those entries on the temps stack that are safe to do so.
3676
3677       In void context no args are returned, so it's more efficient to skip
3678       calling "leave_adjust_stacks()". Also in void context, a "nextstate" op
3679       is likely to be imminently called which will do a "FREETMPS", so
3680       there's no need to do that either.
3681
3682       The next step is to pop savestack entries: "CX_LEAVE_SCOPE(cx)" is just
3683       defined as "LEAVE_SCOPE(cx->blk_oldsaveix)". Note that during the
3684       popping, it's possible for perl to call destructors, call "STORE" to
3685       undo localisations of tied vars, and so on. Any of these can die or
3686       call "exit()". In this case, "dounwind()" will be called, and the
3687       current context stack frame will be re-processed. Thus it is vital that
       all steps in popping a context are done in such a way as to support
3689       reentrancy.  The other alternative, of decrementing "cxstack_ix" before
3690       processing the frame, would lead to leaks and the like if something
3691       died halfway through, or overwriting of the current frame.
3692
3693       "CX_LEAVE_SCOPE" itself is safely re-entrant: if only half the
3694       savestack items have been popped before dying and getting trapped by
3695       eval, then the "CX_LEAVE_SCOPE"s in "dounwind" or "pp_leaveeval" will
3696       continue where the first one left off.
3697
3698       The next step is the type-specific context processing; in this case
3699       "cx_popsub". In part, this looks like:
3700
3701           cv = cx->blk_sub.cv;
3702           CvDEPTH(cv) = cx->blk_sub.olddepth;
3703           cx->blk_sub.cv = NULL;
3704           SvREFCNT_dec(cv);
3705
       where it's processing the just-executed CV. Note that before it
3707       decrements the CV's reference count, it nulls the "blk_sub.cv". This
3708       means that if it re-enters, the CV won't be freed twice. It also means
3709       that you can't rely on such type-specific fields having useful values
3710       after the return from "cx_popfoo".
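
       The value of clearing the field before dropping the reference can
       be demonstrated with a standalone model.  This is a toy sketch in
       plain C, not perl's code: "ToyCV", "ToyFrame" and the "toy_"
       functions are hypothetical, and an ordinary nested call stands in
       for the unwinding that would follow a "die" inside a destructor.

```c
#include <stddef.h>

/* Toy model only: hypothetical names, not perl's API. */
typedef struct { int refcnt; int freed; } ToyCV;
typedef struct { ToyCV *cv; } ToyFrame;

static void toy_pop_frame(ToyFrame *cx);

/* When non-NULL, freeing a CV re-processes this frame once, standing in
 * for a destructor that dies while the frame is still current. */
static ToyFrame *toy_reenter_with = NULL;

static void toy_refcnt_dec(ToyCV *cv)
{
    if (cv == NULL)
        return;
    if (--cv->refcnt == 0) {
        cv->freed++;
        if (toy_reenter_with != NULL) {
            ToyFrame *frame = toy_reenter_with;
            toy_reenter_with = NULL;
            toy_pop_frame(frame);      /* the same frame, a second time */
        }
    }
}

static void toy_pop_frame(ToyFrame *cx)
{
    ToyCV *cv = cx->cv;
    cx->cv = NULL;          /* clear the field first ... */
    toy_refcnt_dec(cv);     /* ... so a re-entrant pass finds nothing here */
}

/* Returns how many times the CV was freed. */
static int toy_reentry_demo(void)
{
    ToyCV    cv    = { 1, 0 };
    ToyFrame frame = { &cv };

    toy_reenter_with = &frame;
    toy_pop_frame(&frame);
    return cv.freed;
}
```

       If "toy_pop_frame" decremented before clearing, the nested pass
       would see the same pointer and decrement the count a second time.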
3711
3712       Next, "cx_popblock" restores all the various interpreter vars to their
3713       previous values or previous high water marks; it expands to:
3714
3715           PL_markstack_ptr = PL_markstack + cx->blk_oldmarksp;
3716           PL_scopestack_ix = cx->blk_oldscopesp;
3717           PL_curpm         = cx->blk_oldpm;
3718           PL_curcop        = cx->blk_oldcop;
3719           PL_tmps_floor    = cx->blk_old_tmpsfloor;
3720
3721       Note that it doesn't restore "PL_stack_sp"; as mentioned earlier, which
3722       value to restore it to depends on the context type (specifically "for
3723       (list) {}"), and what args (if any) it returns; and that will already
3724       have been sorted out earlier by "leave_adjust_stacks()".
3725
3726       Finally, the context stack pointer is actually decremented by
       "CX_POP(cx)".  After this point, it's possible that the current
3728       context frame could be overwritten by other contexts being pushed.
3729       Although things like ties and "DESTROY" are supposed to work within a
3730       new context stack, it's best not to assume this. Indeed on debugging
3731       builds, "CX_POP(cx)" deliberately sets "cx" to null to detect code that
3732       is still relying on the field values in that context frame. Note in the
3733       "pp_leavesub()" example above, we grab "blk_sub.retop" before calling
3734       "CX_POP".
3735
3736   Redoing contexts
3737       Finally, there is "cx_topblock(cx)", which acts like a
       super-"nextstate" as regards resetting various vars to their base
3739       values. It is used in places like "pp_next", "pp_redo" and "pp_goto"
3740       where rather than exiting a scope, we want to re-initialise the scope.
3741       As well as resetting "PL_stack_sp" like "nextstate", it also resets
3742       "PL_markstack_ptr", "PL_scopestack_ix" and "PL_curpm". Note that it
3743       doesn't do a "FREETMPS".
3744

Slab-based operator allocation

3746       Note: this section describes a non-public internal API that is subject
3747       to change without notice.
3748
3749       Perl's internal error-handling mechanisms implement "die" (and its
3750       internal equivalents) using longjmp. If this occurs during lexing,
3751       parsing or compilation, we must ensure that any ops allocated as part
3752       of the compilation process are freed. (Older Perl versions did not
3753       adequately handle this situation: when failing a parse, they would leak
3754       ops that were stored in C "auto" variables and not linked anywhere
3755       else.)
3756
3757       To handle this situation, Perl uses op slabs that are attached to the
3758       currently-compiling CV. A slab is a chunk of allocated memory. New ops
3759       are allocated as regions of the slab. If the slab fills up, a new one
3760       is created (and linked from the previous one). When an error occurs and
3761       the CV is freed, any ops remaining are freed.
3762
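       The chaining described above can be sketched with a toy allocator.
       This is only an illustration under stated assumptions: the names
       "toy_slab", "toy_op_alloc" and "toy_slab_free_all" are hypothetical,
       sizes are counted in pointer-sized words, and none of this is Perl's
       real slab code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy types for illustration; not Perl's real slab structs. */
typedef struct toy_slab {
    struct toy_slab *next;  /* slab created after this one filled up */
    size_t capacity;        /* number of pointer-sized words in words[] */
    size_t used;            /* words already handed out */
    void *words[];          /* storage that "ops" are carved from */
} toy_slab;

static toy_slab *toy_slab_new(size_t capacity)
{
    toy_slab *slab = malloc(sizeof *slab + capacity * sizeof(void *));
    if (!slab)
        return NULL;
    slab->next = NULL;
    slab->capacity = capacity;
    slab->used = 0;
    return slab;
}

/* Carve nwords out of the last slab in the chain; if it is full,
 * create a new slab and link it from the previous one. */
static void *toy_op_alloc(toy_slab *head, size_t nwords)
{
    toy_slab *slab = head;
    assert(head != NULL && nwords > 0);
    while (slab->next)
        slab = slab->next;
    if (slab->capacity - slab->used < nwords) {
        size_t cap = nwords > slab->capacity ? nwords : slab->capacity;
        slab->next = toy_slab_new(cap);
        if (!slab->next)
            return NULL;
        slab = slab->next;
    }
    void *op = &slab->words[slab->used];
    slab->used += nwords;
    return op;
}

/* Error path: release every slab (and so every op) in one pass.
 * Returns the number of slabs released. */
static size_t toy_slab_free_all(toy_slab *head)
{
    size_t n = 0;
    while (head) {
        toy_slab *next = head->next;
        free(head);
        head = next;
        n++;
    }
    return n;
}
```

       The point of the design is that a failed compilation only needs the
       head of the chain to release everything, no matter where the
       individual op pointers were stored.
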
3763       Each op is preceded by two pointers: one points to the next op in the
3764       slab, and the other points to the slab that owns it. The next-op
3765       pointer is needed so that Perl can iterate over a slab and free all its
3766       ops. (Op structures are of different sizes, so the slab's ops can't
3767       merely be treated as a dense array.)  The slab pointer is needed for
3768       accessing a reference count on the slab: when the last op on a slab is
3769       freed, the slab itself is freed.
3770
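       As a rough sketch of the two-pointer header and the slab reference
       count, consider the following toy code. The layout and the names
       ("toy_slot", "toy_op_alloc", "toy_op_free", "toy_slabs_freed") are
       assumptions made for illustration and do not match Perl's real
       structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct toy_slab toy_slab;

/* Hypothetical header that precedes every op body. */
typedef struct toy_slot {
    struct toy_slot *next;  /* next op slot in the same slab */
    toy_slab *slab;         /* slab that owns this slot */
} toy_slot;

struct toy_slab {
    toy_slot *first;        /* chain of all slots carved from this slab */
    size_t refcnt;          /* number of live ops */
    size_t used, capacity;  /* bytes handed out / bytes available */
    unsigned char *buf;
};

static size_t toy_slabs_freed = 0;  /* test aid only */

static toy_slab *toy_slab_new(size_t capacity)
{
    toy_slab *slab = malloc(sizeof *slab);
    if (!slab)
        return NULL;
    slab->buf = malloc(capacity);
    if (!slab->buf) {
        free(slab);
        return NULL;
    }
    slab->first = NULL;
    slab->refcnt = 0;
    slab->used = 0;
    slab->capacity = capacity;
    return slab;
}

/* Ops differ in size, so each slot is header + rounded-up body. */
static void *toy_op_alloc(toy_slab *slab, size_t body_size)
{
    size_t align = sizeof(void *);
    size_t need = sizeof(toy_slot) + (body_size + align - 1) / align * align;
    if (slab->capacity - slab->used < need)
        return NULL;
    toy_slot *slot = (toy_slot *)(slab->buf + slab->used);
    slab->used += need;
    slot->next = slab->first;   /* keep the slots walkable */
    slab->first = slot;
    slot->slab = slab;
    slab->refcnt++;
    return slot + 1;            /* the op body follows the header */
}

/* Step back over the header to find the owning slab. */
static void toy_op_free(void *op)
{
    toy_slab *slab = ((toy_slot *)op - 1)->slab;
    assert(slab->refcnt > 0);
    if (--slab->refcnt == 0) {  /* last op gone: free the slab itself */
        free(slab->buf);
        free(slab);
        toy_slabs_freed++;
    }
}

/* Iterate over the slab by following the next-op pointers. */
static size_t toy_slab_count_ops(const toy_slab *slab)
{
    size_t n = 0;
    for (const toy_slot *s = slab->first; s; s = s->next)
        n++;
    return n;
}
```

       Because the slots have different sizes, "toy_slab_count_ops" cannot
       index them like an array and has to follow the chain instead, which
       is the reason given above for the next-op pointer.
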
3771       The slab allocator puts the ops at the end of the slab first. This will
3772       tend to allocate the leaves of the op tree first, and the layout will
3773       therefore hopefully be cache-friendly. In addition, this means that
3774       there's no need to store the size of the slab (see below on why slabs
3775       vary in size), because Perl can follow pointers to find the last op.
3776
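       Allocating from the end of the slab first can be pictured with a
       bump allocator that grows downwards. This is a minimal sketch with
       hypothetical names ("toy_slab", "toy_alloc_from_end") and a fixed
       buffer size chosen only for the example.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLAB_WORDS 32   /* arbitrary size for the example */

typedef struct {
    void *words[TOY_SLAB_WORDS];  /* pointer-sized storage units */
    size_t first_free;            /* words below this index are unused */
} toy_slab;

static void toy_slab_init(toy_slab *slab)
{
    slab->first_free = TOY_SLAB_WORDS;  /* start at the very end */
}

/* Take nwords from the END of the unused region, moving downwards. */
static void **toy_alloc_from_end(toy_slab *slab, size_t nwords)
{
    if (nwords > slab->first_free)
        return NULL;                    /* slab is full */
    slab->first_free -= nwords;
    return &slab->words[slab->first_free];
}
```

       With this scheme, an op allocated early (typically a leaf) ends up
       at a higher address than the ops allocated after it.
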
3777       It might seem possible to eliminate slab reference counts altogether,
3778       by having all ops implicitly attached to "PL_compcv" when allocated and
3779       freed when the CV is freed. That would also allow "op_free" to skip
3780       "FreeOp" altogether, and thus free ops faster. But that doesn't work in
3781       those cases where ops need to survive beyond their CVs, such as re-
3782       evals.
3783
3784       The CV also has to hold a reference count on the slab. Otherwise, when
3785       the first op created is immediately freed (as sometimes happens), the
3786       count would reach 0, freeing the slab with the CV still pointing to it.
3787
3788       CVs use the "CVf_SLABBED" flag to indicate that the CV has a reference
3789       count on the slab. When this flag is set, the slab is accessible via
3790       "CvSTART" when "CvROOT" is not set, or by subtracting two pointers
3791       "(2*sizeof(I32 *))" from "CvROOT" when it is set. The alternative to
3792       this approach of sneaking the slab into "CvSTART" during compilation
3793       would be to enlarge the "xpvcv" struct by another pointer. But that
3794       would make all CVs larger, even though slab-based op freeing is
3795       typically of benefit only for programs that make significant use of
3796       string eval.
3797
3798       When the "CVf_SLABBED" flag is set, the CV takes responsibility for
3799       freeing the slab. If "CvROOT" is not set when the CV is freed or
3800       undeffed, it is assumed that a compilation error has occurred, so the
3801       op slab is traversed and all the ops are freed.
3802
3803       Under normal circumstances, the CV forgets about its slab (decrementing
3804       the reference count) when the root is attached. So the slab reference
3805       counting that happens when ops are freed takes care of freeing the
3806       slab. In some cases, the CV is told to forget about the slab
3807       ("cv_forget_slab") precisely so that the ops can survive after the CV
3808       is done away with.
3809
3810       Forgetting the slab when the root is attached is not strictly
3811       necessary, but avoids potential problems with "CvROOT" being written
3812       over. There is code all over the place, both in core and on CPAN, that
3813       does things with "CvROOT", so forgetting the slab makes things more
3814       robust.
3815
3816       Since the CV takes ownership of its slab when flagged, that flag is
3817       never copied when a CV is cloned. Otherwise, one CV could free a slab
3818       that another CV still points to, because forced freeing of ops ignores
3819       the reference count (but asserts that it looks right).
3820
3821       To avoid slab fragmentation, freed ops are marked as freed and attached
3822       to the slab's freed chain (an idea stolen from DBM::Deep). Those freed
3823       ops are reused when possible. Not reusing freed ops would be simpler,
3824       but it would result in significantly higher memory usage for programs
3825       with large "if (DEBUG) {...}" blocks.
3826
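       The freed chain can be sketched as a simple free list that is
       searched before any new space is used. The names and the first-fit
       policy here are assumptions for illustration, and "malloc" merely
       stands in for carving a fresh slot out of the slab.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical slot record; not Perl's real layout. */
typedef struct toy_slot {
    struct toy_slot *next_freed;  /* link in the slab's freed chain */
    size_t size;                  /* size of the slot's op body */
    int is_freed;                 /* marked when the op is freed */
} toy_slot;

typedef struct {
    toy_slot *freed;       /* chain of freed slots available for reuse */
    size_t fresh_allocs;   /* slots that had to be newly created */
} toy_slab;

static toy_slot *toy_alloc(toy_slab *slab, size_t size)
{
    /* First look for a freed slot that is big enough. */
    for (toy_slot **link = &slab->freed; *link; link = &(*link)->next_freed) {
        if ((*link)->size >= size) {
            toy_slot *slot = *link;
            *link = slot->next_freed;   /* unlink from the freed chain */
            slot->next_freed = NULL;
            slot->is_freed = 0;
            return slot;
        }
    }
    /* Nothing reusable: create a new slot (stand-in for slab space). */
    toy_slot *slot = malloc(sizeof *slot);
    if (!slot)
        return NULL;
    slot->next_freed = NULL;
    slot->size = size;
    slot->is_freed = 0;
    slab->fresh_allocs++;
    return slot;
}

/* Freeing only marks the slot and pushes it onto the freed chain. */
static void toy_free(toy_slab *slab, toy_slot *slot)
{
    slot->is_freed = 1;
    slot->next_freed = slab->freed;
    slab->freed = slot;
}
```

       In this sketch a freed slot of 48 bytes satisfies a later request
       for 32 bytes, so no new space is consumed for the second op.
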
3827       "SAVEFREEOP" is slightly problematic under this scheme. Sometimes it
3828       can cause an op to be freed after its CV. If the CV has forcibly freed
3829       the ops on its slab and the slab itself, then we will be fiddling with
3830       a freed slab. Making "SAVEFREEOP" a no-op doesn't help, as sometimes an
3831       op can be savefreed when there is no compilation error, so the op would
3832       never be freed. That op holds a reference count on the slab, so the whole
3833       slab would leak. So "SAVEFREEOP" now sets a special flag on the op
3834       ("->op_savefree"). The forced freeing of ops after a compilation error
3835       won't free any ops thus marked.
3836
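       The effect of the "op_savefree" flag on forced freeing can be
       sketched as follows. The struct and the function "toy_force_free"
       are hypothetical; only the idea of skipping marked ops is taken from
       the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical op record with just the fields the sketch needs. */
typedef struct toy_op {
    struct toy_op *next;    /* next op in the slab */
    unsigned freed : 1;     /* op has already been freed */
    unsigned savefree : 1;  /* a pending savefree will free it later */
} toy_op;

/* Forced freeing after a compilation error: free every op in the
 * chain except those already freed or marked as savefreed.
 * Returns the number of ops freed by this call. */
static size_t toy_force_free(toy_op *first)
{
    size_t n = 0;
    for (toy_op *op = first; op; op = op->next) {
        if (op->freed || op->savefree)
            continue;       /* leave it for the savestack to handle */
        op->freed = 1;
        n++;
    }
    return n;
}
```

       The marked op is left alone here, so that whatever frees it later
       does not operate on an op that has already been released.
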
3837       Since many pieces of code create tiny subroutines consisting of only a
3838       few ops, and since a huge slab would be quite a bit of baggage for
3839       those to carry around, the first slab is always very small. To avoid
3840       allocating too many slabs for a single CV, each subsequent slab is
3841       twice the size of the previous.
3842
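       The effect of doubling can be checked with a little arithmetic. The
       function below is a sketch only: the name "toy_slabs_needed" and the
       first-slab size of 8 slots used in the note are assumptions, and the
       sketch ignores any upper limit on slab size.

```c
#include <assert.h>
#include <stddef.h>

/* Count how many slabs are needed to hold total_slots when the first
 * slab holds first_size slots and each later slab is twice as big. */
static unsigned toy_slabs_needed(size_t first_size, size_t total_slots)
{
    unsigned count = 0;
    size_t size = first_size;
    size_t capacity = 0;

    assert(first_size > 0);
    while (capacity < total_slots) {
        capacity += size;   /* add one more slab */
        size *= 2;          /* the next slab will be twice as large */
        count++;
    }
    return count;
}
```

       Assuming a first slab of 8 slots, a tiny subroutine of 5 slots fits
       in one small slab, while 1000 slots need only 7 slabs (8 + 16 + ...
       + 512 = 1016), where fixed-size slabs of 8 would need 125.
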
3843       Smartmatch expects to be able to allocate an op at run time, run it,
3844       and then throw it away. For that to work the op is simply malloced when
3845       "PL_compcv" hasn't been set up. So all slab-allocated ops are marked as
3846       such ("->op_slabbed"), to distinguish them from malloced ops.
3847

AUTHORS

3849       Until May 1997, this document was maintained by Jeff Okamoto
3850       <okamoto@corp.hp.com>.  It is now maintained as part of Perl itself by
3851       the Perl 5 Porters <perl5-porters@perl.org>.
3852
3853       With lots of help and suggestions from Dean Roehrich, Malcolm Beattie,
3854       Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil
3855       Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer,
3856       Stephen McCamant, and Gurusamy Sarathy.
3857

SEE ALSO

3859       perlapi, perlintern, perlxs, perlembed
3860
3861
3862
3863perl v5.34.1                      2022-03-15                       PERLGUTS(1)