perlguts(1)

PERLGUTS(1)            Perl Programmers Reference Guide            PERLGUTS(1)

NAME

       perlguts - Introduction to the Perl API

DESCRIPTION

       This document attempts to describe how to use the Perl API, as well as
       to provide some info on the basic workings of the Perl core. It is far
       from complete and probably contains many errors. Please refer any
       questions or comments to the author below.

Variables

   Datatypes
       Perl has three typedefs that handle Perl's three main data types:

           SV  Scalar Value
           AV  Array Value
           HV  Hash Value

       Each typedef has specific routines that manipulate the various data
       types.

   What is an "IV"?
       Perl uses a special typedef IV which is a simple signed integer type
       that is guaranteed to be large enough to hold a pointer (as well as an
       integer).  Additionally, there is the UV, which is simply an unsigned
       IV.

       Perl also uses two special typedefs, I32 and I16, which will always be
       at least 32 bits and 16 bits long, respectively. (Again, there are U32
       and U16, as well.)  They will usually be exactly 32 and 16 bits long,
       but on Crays they will both be 64 bits.

   Working with SVs
       An SV can be created and loaded with one command.  There are five types
       of values that can be loaded: an integer value (IV), an unsigned
       integer value (UV), a double (NV), a string (PV), and another scalar
       (SV).

       The seven routines are:

           SV*  newSViv(IV);
           SV*  newSVuv(UV);
           SV*  newSVnv(double);
           SV*  newSVpv(const char*, STRLEN);
           SV*  newSVpvn(const char*, STRLEN);
           SV*  newSVpvf(const char*, ...);
           SV*  newSVsv(SV*);

       "STRLEN" is an integer type (Size_t, usually defined as size_t in
       config.h) guaranteed to be large enough to represent the size of any
       string that perl can handle.

       In the unlikely case of an SV requiring more complex initialisation,
       you can create an empty SV with newSV(len).  If "len" is 0 an empty SV
       of type NULL is returned, else an SV of type PV is returned with
       len + 1 (for the NUL) bytes of storage allocated, accessible via SvPVX.
       In both cases the SV has the undef value.

           SV *sv = newSV(0);   /* no storage allocated  */
           SV *sv = newSV(10);  /* 10 (+1) bytes of uninitialised storage
                                 * allocated */

       To change the value of an already-existing SV, there are eight
       routines:

           void  sv_setiv(SV*, IV);
           void  sv_setuv(SV*, UV);
           void  sv_setnv(SV*, double);
           void  sv_setpv(SV*, const char*);
           void  sv_setpvn(SV*, const char*, STRLEN);
           void  sv_setpvf(SV*, const char*, ...);
           void  sv_vsetpvfn(SV*, const char*, STRLEN, va_list *,
                                                           SV **, I32, bool *);
           void  sv_setsv(SV*, SV*);

       Notice that you can choose to specify the length of the string to be
       assigned by using "sv_setpvn", "newSVpvn", or "newSVpv", or you may
       allow Perl to calculate the length by using "sv_setpv" or by specifying
       0 as the second argument to "newSVpv".  Be warned, though, that Perl
       will determine the string's length by using "strlen", which depends on
       the string terminating with a NUL character, and not otherwise
       containing NULs.
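
       The "strlen" pitfall can be seen without any Perl headers.  The
       following plain C sketch is illustrative only: the helper names
       measured_by_strlen and explicit_length are hypothetical and are not
       part of the Perl API.  It shows why data with an embedded NUL should be
       passed with an explicit length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Five bytes of payload with a NUL in the middle; the compiler appends
 * one more terminating NUL, so sizeof(payload) is 6. */
static const char payload[] = "ab\0cd";

/* Hypothetical helper: the length a strlen-based routine would see. */
static size_t measured_by_strlen(const char *s)
{
    return strlen(s);            /* stops at the first NUL byte */
}

/* Hypothetical helper: the true payload length, known to the caller. */
static size_t explicit_length(void)
{
    return sizeof(payload) - 1;  /* drop only the terminating NUL */
}
```

       A strlen-based routine would see only the first two bytes of this
       payload, whereas a caller that passes the explicit length keeps all
       five.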

       The arguments of "sv_setpvf" are processed like "sprintf", and the
       formatted output becomes the value.

       "sv_vsetpvfn" is an analogue of "vsprintf", but it allows you to
       specify either a pointer to a variable argument list or the address and
       length of an array of SVs.  The last argument points to a boolean; on
       return, if that boolean is true, then locale-specific information has
       been used to format the string, and the string's contents are therefore
       untrustworthy (see perlsec).  This pointer may be NULL if that
       information is not important.  Note that this function requires you to
       specify the length of the format.

       The "sv_set*()" functions are not generic enough to operate on values
       that have "magic".  See "Magic Virtual Tables" later in this document.

       All SVs that contain strings should be terminated with a NUL character.
       If a string is not NUL-terminated, there is a risk of core dumps and
       corruption from code which passes the string to C functions or system
       calls which expect a NUL-terminated string.  Perl's own functions
       typically add a trailing NUL for this reason.  Nevertheless, you should
       be very careful when you pass a string stored in an SV to a C function
       or system call.

       To access the actual value that an SV points to, you can use the
       macros:

           SvIV(SV*)
           SvUV(SV*)
           SvNV(SV*)
           SvPV(SV*, STRLEN len)
           SvPV_nolen(SV*)

       which will automatically coerce the actual scalar type into an IV, UV,
       double, or string.

       In the "SvPV" macro, the length of the string returned is placed into
       the variable "len" (this is a macro, so you do not use &len).  If you
       do not care what the length of the data is, use the "SvPV_nolen" macro.
       Historically the "SvPV" macro with the global variable "PL_na" has been
       used in this case.  But that can be quite inefficient because "PL_na"
       must be accessed in thread-local storage in threaded Perl.  In any
       case, remember that Perl allows arbitrary strings of data that may
       contain NULs and might not be terminated by a NUL.

       Also remember that C doesn't allow you to safely say "foo(SvPV(s, len),
       len);". It might work with your compiler, but it won't work for
       everyone.  Break this sort of statement up into separate assignments:

           SV *s;
           STRLEN len;
           char *ptr;
           ptr = SvPV(s, len);
           foo(ptr, len);

       If you want to know if the scalar value is TRUE, you can use:

           SvTRUE(SV*)

       Although Perl will automatically grow strings for you, if you need to
       force Perl to allocate more memory for your SV, you can use the macro

           SvGROW(SV*, STRLEN newlen)

       which will determine if more memory needs to be allocated.  If so, it
       will call the function "sv_grow".  Note that "SvGROW" can only
       increase, not decrease, the allocated memory of an SV and that it does
       not automatically add space for the trailing NUL byte (perl's own
       string functions typically do "SvGROW(sv, len + 1)").
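
       The grow-only behaviour and the caller's responsibility for the NUL
       byte can be modelled in plain C.  This is only a sketch under stated
       assumptions: ToyBuf, toy_grow and toy_set are hypothetical names, and
       the struct is not Perl's real SV layout.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for the string part of an SV; not Perl's real layout. */
typedef struct {
    char  *pv;   /* start of the buffer             */
    size_t cur;  /* bytes in use, excluding the NUL */
    size_t len;  /* bytes allocated                 */
} ToyBuf;

/* Grow-only: reallocate when newlen exceeds the allocation, otherwise
 * leave the buffer alone.  Returns the (possibly moved) buffer, or NULL
 * if realloc fails. */
static char *toy_grow(ToyBuf *b, size_t newlen)
{
    if (newlen > b->len) {
        char *p = realloc(b->pv, newlen);
        if (p == NULL)
            return NULL;
        b->pv  = p;
        b->len = newlen;
    }
    return b->pv;
}

/* Store n bytes plus a trailing NUL.  The caller of toy_grow, not
 * toy_grow itself, accounts for the extra byte.  Returns 0 on success. */
static int toy_set(ToyBuf *b, const char *src, size_t n)
{
    if (toy_grow(b, n + 1) == NULL)
        return -1;
    memcpy(b->pv, src, n);
    b->pv[n] = '\0';
    b->cur = n;
    return 0;
}
```

       In this model, asking toy_grow for fewer bytes than are already
       allocated leaves the buffer untouched, which mirrors the "can only
       increase" rule described above.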

       If you have an SV and want to know what kind of data Perl thinks is
       stored in it, you can use the following macros to check the type of SV
       you have.

           SvIOK(SV*)
           SvNOK(SV*)
           SvPOK(SV*)

       You can get and set the current length of the string stored in an SV
       with the following macros:

           SvCUR(SV*)
           SvCUR_set(SV*, I32 val)

       You can also get a pointer to the end of the string stored in the SV
       with the macro:

           SvEND(SV*)

       But note that these last three macros are valid only if "SvPOK()" is
       true.

       If you want to append something to the end of the string stored in an
       "SV*", you can use the following functions:

           void  sv_catpv(SV*, const char*);
           void  sv_catpvn(SV*, const char*, STRLEN);
           void  sv_catpvf(SV*, const char*, ...);
           void  sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **,
                                                                 I32, bool *);
           void  sv_catsv(SV*, SV*);

       The first function calculates the length of the string to be appended
       by using "strlen".  In the second, you specify the length of the string
       yourself.  The third function processes its arguments like "sprintf"
       and appends the formatted output.  The fourth function works like
       "vsprintf".  You can specify the address and length of an array of SVs
       instead of the va_list argument. The fifth function extends the string
       stored in the first SV with the string stored in the second SV.  It
       also forces the second SV to be interpreted as a string.

       The "sv_cat*()" functions are not generic enough to operate on values
       that have "magic".  See "Magic Virtual Tables" later in this document.

       If you know the name of a scalar variable, you can get a pointer to its
       SV by using the following:

           SV*  get_sv("package::varname", 0);

       This returns NULL if the variable does not exist.

       If you want to know if this variable (or any other SV) is actually
       "defined", you can call:

           SvOK(SV*)

       The scalar "undef" value is stored in an SV instance called
       "PL_sv_undef".

       Its address can be used whenever an "SV*" is needed. Make sure that you
       don't try to compare an arbitrary SV with &PL_sv_undef. For example,
       when interfacing with Perl code, the comparison will work correctly
       for:

         foo(undef);

       But won't work when called as:

         $x = undef;
         foo($x);

       So, to repeat: always use SvOK() to check whether an SV is defined.

       Also you have to be careful when using &PL_sv_undef as a value in AVs
       or HVs (see "AVs, HVs and undefined values").

       There are also the two values "PL_sv_yes" and "PL_sv_no", which contain
       boolean TRUE and FALSE values, respectively.  Like "PL_sv_undef", their
       addresses can be used whenever an "SV*" is needed.

       Do not be fooled into thinking that "(SV *) 0" is the same as
       &PL_sv_undef.  Take this code:

           SV* sv = (SV*) 0;
           if (I-am-to-return-a-real-value) {
                   sv = sv_2mortal(newSViv(42));
           }
           sv_setsv(ST(0), sv);

       This code tries to return a new SV (which contains the value 42) if it
       should return a real value, or undef otherwise.  Instead it has
       returned a NULL pointer which, somewhere down the line, will cause a
       segmentation violation, bus error, or just weird results.  Change the
       zero to &PL_sv_undef in the first line and all will be well.

       To free an SV that you've created, call "SvREFCNT_dec(SV*)".  Normally
       this call is not necessary (see "Reference Counts and Mortality").

   Offsets
       Perl provides the function "sv_chop" to efficiently remove characters
       from the beginning of a string; you give it an SV and a pointer to
       somewhere inside the PV, and it discards everything before the pointer.
       The efficiency comes by means of a little hack: instead of actually
       removing the characters, "sv_chop" sets the flag "OOK" (offset OK) to
       signal to other functions that the offset hack is in effect, and it
       puts the number of bytes chopped off into the IV field of the SV. It
       then moves the PV pointer (called "SvPVX") forward that many bytes, and
       adjusts "SvCUR" and "SvLEN".

       Hence, at this point, the start of the buffer that we allocated lives
       at "SvPVX(sv) - SvIV(sv)" in memory and the PV pointer is pointing into
       the middle of this allocated storage.

       This is best demonstrated by example:

         % ./perl -Ilib -MDevel::Peek -le '$a="12345"; $a=~s/.//; Dump($a)'
         SV = PVIV(0x8128450) at 0x81340f0
           REFCNT = 1
           FLAGS = (POK,OOK,pPOK)
           IV = 1  (OFFSET)
           PV = 0x8135781 ( "1" . ) "2345"\0
           CUR = 4
           LEN = 5

       Here the number of bytes chopped off (1) is put into IV, and
       "Devel::Peek::Dump" helpfully reminds us that this is an offset. The
       portion of the string between the "real" and the "fake" beginnings is
       shown in parentheses, and the values of "SvCUR" and "SvLEN" reflect the
       fake beginning, not the real one.
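
       The bookkeeping behind the offset hack can be sketched in plain C.
       This is a toy model, not Perl's implementation: ToyStr, toy_init,
       toy_chop and toy_real_start are hypothetical names, and the offset is
       kept in its own field rather than in an IV slot guarded by a flag.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model of the fields involved; not Perl's real SV layout. */
typedef struct {
    char  *pv;      /* visible start of the string (like SvPVX)    */
    size_t cur;     /* visible string length       (like SvCUR)    */
    size_t len;     /* visible allocated size      (like SvLEN)    */
    size_t offset;  /* bytes chopped so far (the IV slot when OOK) */
} ToyStr;

/* Build a ToyStr holding a copy of s.  Returns 0 on success. */
static int toy_init(ToyStr *t, const char *s)
{
    size_t n = strlen(s);
    t->pv = malloc(n + 1);
    if (t->pv == NULL)
        return -1;
    memcpy(t->pv, s, n + 1);
    t->cur = n;
    t->len = n + 1;
    t->offset = 0;
    return 0;
}

/* Discard everything before ptr, which must point inside the string.
 * No bytes are moved: only the bookkeeping changes. */
static void toy_chop(ToyStr *t, const char *ptr)
{
    size_t delta = (size_t)(ptr - t->pv);
    assert(delta <= t->cur);
    t->offset += delta;
    t->pv     += delta;
    t->cur    -= delta;
    t->len    -= delta;
}

/* The pointer that malloc returned, needed when freeing. */
static char *toy_real_start(const ToyStr *t)
{
    return t->pv - t->offset;
}
```

       Chopping one byte from "12345" in this model leaves cur at 4, len at 5
       and offset at 1, the same numbers that the Dump output above reports
       as CUR, LEN and IV.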

       Something similar to the offset hack is performed on AVs to enable
       efficient shifting and splicing off the beginning of the array; while
       "AvARRAY" points to the first element in the array that is visible from
       Perl, "AvALLOC" points to the real start of the C array. These are
       usually the same, but a "shift" operation can be carried out by
       increasing "AvARRAY" by one and decreasing "AvFILL" and "AvMAX".
       Again, the location of the real start of the C array only comes into
       play when freeing the array. See "av_shift" in av.c.
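
       The same idea for arrays can be modelled in a few lines of plain C.
       The sketch below is an assumption-laden toy: ToyArray, toy_wrap and
       toy_shift are hypothetical names, the elements are ints rather than
       "SV*"s, and the real "av_shift" in av.c does more work than this.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of an array of ints; Perl's AV holds SV* elements and has
 * more bookkeeping than this. */
typedef struct {
    int      *alloc;  /* real start of the C array (like AvALLOC) */
    int      *array;  /* first visible element     (like AvARRAY) */
    ptrdiff_t fill;   /* highest visible index     (like AvFILL)  */
    ptrdiff_t max;    /* highest usable index      (like AvMAX)   */
} ToyArray;

/* Wrap caller-owned storage of n slots, all of them in use. */
static void toy_wrap(ToyArray *a, int *storage, ptrdiff_t n)
{
    a->alloc = storage;
    a->array = storage;
    a->fill  = n - 1;
    a->max   = n - 1;
}

/* Remove and return the first element without moving the others.
 * The caller must ensure the array is not empty (fill >= 0). */
static int toy_shift(ToyArray *a)
{
    int first;
    assert(a->fill >= 0);
    first = a->array[0];
    a->array += 1;   /* the visible window starts one slot later */
    a->fill  -= 1;
    a->max   -= 1;
    return first;
}
```

       After one toy_shift the visible window starts one slot past alloc,
       while alloc still records where the storage really begins.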

   What's Really Stored in an SV?
       Recall that the usual method of determining the type of scalar you have
       is to use "Sv*OK" macros.  Because a scalar can be both a number and a
       string, these macros will usually return TRUE, and calling the
       "Sv*V" macros will do the appropriate conversion of string to
       integer/double or integer/double to string.

       If you really need to know if you have an integer, double, or string
       pointer in an SV, you can use the following three macros instead:

           SvIOKp(SV*)
           SvNOKp(SV*)
           SvPOKp(SV*)

       These will tell you if you truly have an integer, double, or string
       pointer stored in your SV.  The "p" stands for private.

       There are various ways in which the private and public flags may
       differ.  For example, a tied SV may have a valid underlying value in
       the IV slot (so SvIOKp is true), but the data should be accessed via
       the FETCH routine rather than directly, so SvIOK is false. Another is
       when numeric conversion has occurred and precision has been lost: only
       the private flag is set on 'lossy' values. So when an NV is converted
       to an IV with loss, SvIOKp, SvNOKp and SvNOK will be set, while SvIOK
       won't be.

       In general, though, it's best to use the "Sv*V" macros.

   Working with AVs
       There are two ways to create and load an AV.  The first method creates
       an empty AV:

           AV*  newAV();

       The second method both creates the AV and initially populates it with
       SVs:

           AV*  av_make(I32 num, SV **ptr);

       The second argument points to an array containing "num" "SV*"'s.  Once
       the AV has been created, the SVs can be destroyed, if so desired.

       Once the AV has been created, the following operations are possible on
       it:

           void  av_push(AV*, SV*);
           SV*   av_pop(AV*);
           SV*   av_shift(AV*);
           void  av_unshift(AV*, I32 num);

       These should be familiar operations, with the exception of
       "av_unshift".  This routine adds "num" elements at the front of the
       array with the "undef" value.  You must then use "av_store" (described
       below) to assign values to these new elements.

       Here are some other functions:

           I32   av_len(AV*);
           SV**  av_fetch(AV*, I32 key, I32 lval);
           SV**  av_store(AV*, I32 key, SV* val);

       The "av_len" function returns the highest index value in an array (just
       like $#array in Perl).  If the array is empty, -1 is returned.  The
       "av_fetch" function returns the value at index "key", but if "lval" is
       non-zero, then "av_fetch" will store an undef value at that index.  The
       "av_store" function stores the value "val" at index "key", and does not
       increment the reference count of "val".  Thus the caller is responsible
       for taking care of that, and if "av_store" returns NULL, the caller
       will have to decrement the reference count to avoid a memory leak.
       Note that "av_fetch" and "av_store" both return "SV**"'s, not "SV*"'s
       as their return value.

       A few more:

           void  av_clear(AV*);
           void  av_undef(AV*);
           void  av_extend(AV*, I32 key);

       The "av_clear" function deletes all the elements in the AV* array, but
       does not actually delete the array itself.  The "av_undef" function
       will delete all the elements in the array plus the array itself.  The
       "av_extend" function extends the array so that it contains at least
       "key+1" elements.  If "key+1" is less than the currently allocated
       length of the array, then nothing is done.

       If you know the name of an array variable, you can get a pointer to its
       AV by using the following:

           AV*  get_av("package::varname", 0);

       This returns NULL if the variable does not exist.

       See "Understanding the Magic of Tied Hashes and Arrays" for more
       information on how to use the array access functions on tied arrays.

   Working with HVs
       To create an HV, you use the following routine:

           HV*  newHV();

       Once the HV has been created, the following operations are possible on
       it:

           SV**  hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
           SV**  hv_fetch(HV*, const char* key, U32 klen, I32 lval);

       The "klen" parameter is the length of the key being passed in (Note
       that you cannot pass 0 in as a value of "klen" to tell Perl to measure
       the length of the key).  The "val" argument contains the SV pointer to
       the scalar being stored, and "hash" is the precomputed hash value (zero
       if you want "hv_store" to calculate it for you).  The "lval" parameter
       indicates whether this fetch is actually a part of a store operation,
       in which case a new undefined value will be added to the HV with the
       supplied key and "hv_fetch" will return as if the value had already
       existed.

       Remember that "hv_store" and "hv_fetch" return "SV**"'s and not just
       "SV*".  To access the scalar value, you must first dereference the
       return value.  However, you should check to make sure that the return
       value is not NULL before dereferencing it.

       The first of these two functions checks if a hash table entry exists,
       and the second deletes it.

           bool  hv_exists(HV*, const char* key, U32 klen);
           SV*   hv_delete(HV*, const char* key, U32 klen, I32 flags);

       If "flags" does not include the "G_DISCARD" flag then "hv_delete" will
       create and return a mortal copy of the deleted value.

       And more miscellaneous functions:

           void   hv_clear(HV*);
           void   hv_undef(HV*);

       Like their AV counterparts, "hv_clear" deletes all the entries in the
       hash table but does not actually delete the hash table.  The "hv_undef"
       deletes both the entries and the hash table itself.

       Perl keeps the actual data in a linked list of structures with a
       typedef of HE.  These contain the actual key and value pointers (plus
       extra administrative overhead).  The key is a string pointer; the value
       is an "SV*".  However, once you have an "HE*", to get the actual key
       and value, use the routines specified below.

           I32    hv_iterinit(HV*);
                   /* Prepares starting point to traverse hash table */
           HE*    hv_iternext(HV*);
                   /* Get the next entry, and return a pointer to a
                      structure that has both the key and value */
           char*  hv_iterkey(HE* entry, I32* retlen);
                   /* Get the key from an HE structure and also return
                      the length of the key string */
           SV*    hv_iterval(HV*, HE* entry);
                   /* Return an SV pointer to the value of the HE
                      structure */
           SV*    hv_iternextsv(HV*, char** key, I32* retlen);
                   /* This convenience routine combines hv_iternext,
                      hv_iterkey, and hv_iterval.  The key and retlen
                      arguments are return values for the key and its
                      length.  The value is returned as the function's
                      SV* result */

       If you know the name of a hash variable, you can get a pointer to its
       HV by using the following:

           HV*  get_hv("package::varname", 0);

       This returns NULL if the variable does not exist.

       The hash algorithm is defined in the "PERL_HASH(hash, key, klen)"
       macro:

           hash = 0;
           while (klen--)
               hash = (hash * 33) + *key++;
           hash = hash + (hash >> 5);                  /* after 5.6 */

       The last step was added in version 5.6 to improve distribution of lower
       bits in the resulting hash value.
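
       The macro body quoted above can be wrapped in a standalone C function
       for experimentation.  This is a sketch under two assumptions that the
       text does not state: the hash is a 32-bit unsigned value, and key bytes
       are read as unsigned char.  The name toy_perl_hash is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the algorithm quoted above.  Assumptions: a 32-bit unsigned
 * hash and key bytes read as unsigned char. */
static uint32_t toy_perl_hash(const char *key, size_t klen)
{
    const unsigned char *p = (const unsigned char *)key;
    uint32_t hash = 0;

    while (klen--)
        hash = (hash * 33u) + *p++;
    hash = hash + (hash >> 5);   /* final mixing step, added in 5.6 */
    return hash;
}
```

       Under those assumptions, the one-byte key "a" (byte value 97) hashes
       to 97 + (97 >> 5) = 100.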

       See "Understanding the Magic of Tied Hashes and Arrays" for more
       information on how to use the hash access functions on tied hashes.

   Hash API Extensions
       Beginning with version 5.004, the following functions are also
       supported:

           HE*     hv_fetch_ent  (HV* tb, SV* key, I32 lval, U32 hash);
           HE*     hv_store_ent  (HV* tb, SV* key, SV* val, U32 hash);

           bool    hv_exists_ent (HV* tb, SV* key, U32 hash);
           SV*     hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);

           SV*     hv_iterkeysv  (HE* entry);

       Note that these functions take "SV*" keys, which simplifies writing of
       extension code that deals with hash structures.  These functions also
       allow passing of "SV*" keys to "tie" functions without forcing you to
       stringify the keys (unlike the previous set of functions).

       They also return and accept whole hash entries ("HE*"), making their
       use more efficient (since the hash number for a particular string
       doesn't have to be recomputed every time).  See perlapi for detailed
       descriptions.

       The following macros must always be used to access the contents of hash
       entries.  Note that the arguments to these macros must be simple
       variables, since they may get evaluated more than once.  See perlapi
       for detailed descriptions of these macros.

           HePV(HE* he, STRLEN len)
           HeVAL(HE* he)
           HeHASH(HE* he)
           HeSVKEY(HE* he)
           HeSVKEY_force(HE* he)
           HeSVKEY_set(HE* he, SV* sv)

       These two lower level macros are defined, but must only be used when
       dealing with keys that are not "SV*"s:

           HeKEY(HE* he)
           HeKLEN(HE* he)

       Note that both "hv_store" and "hv_store_ent" do not increment the
       reference count of the stored "val", which is the caller's
       responsibility.  If these functions return a NULL value, the caller
       will usually have to decrement the reference count of "val" to avoid a
       memory leak.

   AVs, HVs and undefined values
       Sometimes you have to store undefined values in AVs or HVs. Although
       this may be a rare case, it can be tricky. That's because you're used
       to using &PL_sv_undef if you need an undefined SV.

       For example, intuition tells you that this XS code:

           AV *av = newAV();
           av_store( av, 0, &PL_sv_undef );

       is equivalent to this Perl code:

           my @av;
           $av[0] = undef;

       Unfortunately, this isn't true. AVs use &PL_sv_undef as a marker for
       indicating that an array element has not yet been initialized.  Thus,
       "exists $av[0]" would be true for the above Perl code, but false for
       the array generated by the XS code.

       Other problems can occur when storing &PL_sv_undef in HVs:

           hv_store( hv, "key", 3, &PL_sv_undef, 0 );

       This will indeed make the value "undef", but if you try to modify the
       value of "key", you'll get the following error:

           Modification of non-creatable hash value attempted

       In perl 5.8.0, &PL_sv_undef was also used to mark placeholders in
       restricted hashes. This caused such hash entries not to appear when
       iterating over the hash or when checking for the keys with the
       "hv_exists" function.

       You can run into similar problems when you store &PL_sv_yes or
       &PL_sv_no into AVs or HVs. Trying to modify such elements will give you
       the following error:

           Modification of a read-only value attempted

       To make a long story short, you can use the special variables
       &PL_sv_undef, &PL_sv_yes and &PL_sv_no with AVs and HVs, but you have
       to make sure you know what you're doing.

       Generally, if you want to store an undefined value in an AV or HV, you
       should not use &PL_sv_undef, but rather create a new undefined value
       using the "newSV" function, for example:

           av_store( av, 42, newSV(0) );
           hv_store( hv, "foo", 3, newSV(0), 0 );

   References
       References are a special type of scalar that point to other data types
       (including other references).

       To create a reference, use either of the following functions:

           SV* newRV_inc((SV*) thing);
           SV* newRV_noinc((SV*) thing);

       The "thing" argument can be any of an "SV*", "AV*", or "HV*".  The
       functions are identical except that "newRV_inc" increments the
       reference count of the "thing", while "newRV_noinc" does not.  For
       historical reasons, "newRV" is a synonym for "newRV_inc".

       Once you have a reference, you can use the following macro to
       dereference the reference:

           SvRV(SV*)

       then call the appropriate routines, casting the returned "SV*" to
       either an "AV*" or "HV*", if required.

       To determine if an SV is a reference, you can use the following macro:

           SvROK(SV*)

       To discover what type of value the reference refers to, use the
       following macro and then check the return value.

           SvTYPE(SvRV(SV*))

       The most useful types that will be returned are:

           SVt_IV    Scalar
           SVt_NV    Scalar
           SVt_PV    Scalar
           SVt_RV    Scalar
           SVt_PVAV  Array
           SVt_PVHV  Hash
           SVt_PVCV  Code
           SVt_PVGV  Glob (possibly a file handle)
           SVt_PVMG  Blessed or Magical Scalar

       See the sv.h header file for more details.

   Blessed References and Class Objects
       References are also used to support object-oriented programming.  In
       perl's OO lexicon, an object is simply a reference that has been
       blessed into a package (or class).  Once the reference is blessed, the
       programmer may use it to access the various methods in the class.

       A reference can be blessed into a package with the following function:

           SV* sv_bless(SV* sv, HV* stash);

       The "sv" argument must be a reference value.  The "stash" argument
       specifies which class the reference will belong to.  See "Stashes and
       Globs" for information on converting class names into stashes.

       /* Still under construction */

       The following function upgrades "rv" to a reference if it is not
       already one, and creates a new SV for "rv" to point to.  If
       "classname" is non-null, the SV is blessed into the specified class.
       The new SV is returned.

           SV* newSVrv(SV* rv, const char* classname);

       The following three functions copy integer, unsigned integer or double
       into an SV whose reference is "rv".  SV is blessed if "classname" is
       non-null.

           SV* sv_setref_iv(SV* rv, const char* classname, IV iv);
           SV* sv_setref_uv(SV* rv, const char* classname, UV uv);
           SV* sv_setref_nv(SV* rv, const char* classname, NV nv);

       The following function copies the pointer value (the address, not the
       string!) into an SV whose reference is "rv".  SV is blessed if
       "classname" is non-null.

           SV* sv_setref_pv(SV* rv, const char* classname, void* pv);

       The following function copies a string into an SV whose reference is
       "rv".  Set length to 0 to let Perl calculate the string length.  SV is
       blessed if "classname" is non-null.

           SV* sv_setref_pvn(SV* rv, const char* classname, char* pv,
                                                                STRLEN length);

       The following function tests whether the SV is blessed into the
       specified class.  It does not check inheritance relationships.

           int  sv_isa(SV* sv, const char* name);

       The following function tests whether the SV is a reference to a blessed
       object.

           int  sv_isobject(SV* sv);

       The following function tests whether the SV is derived from the
       specified class. SV can be either a reference to a blessed object or a
       string containing a class name. This is the function implementing the
       "UNIVERSAL::isa" functionality.

           bool sv_derived_from(SV* sv, const char* name);

       To check if you've got an object derived from a specific class you have
       to write:

           if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... }

   Creating New Variables
       To create a new Perl variable with an undef value which can be accessed
       from your Perl script, use the following routines, depending on the
       variable type.

           SV*  get_sv("package::varname", GV_ADD);
           AV*  get_av("package::varname", GV_ADD);
           HV*  get_hv("package::varname", GV_ADD);

       Notice the use of GV_ADD as the second parameter.  The new variable can
       now be set, using the routines appropriate to the data type.

       There are additional macros whose values may be bitwise OR'ed with the
       "GV_ADD" argument to enable certain extra features.  Those bits are:

       GV_ADDMULTI
           Marks the variable as multiply defined, thus preventing the:

             Name <varname> used only once: possible typo

           warning.

       GV_ADDWARN
           Issues the warning:

             Had to create <varname> unexpectedly

           if the variable did not exist before the function was called.

       If you do not specify a package name, the variable is created in the
       current package.
715
716   Reference Counts and Mortality
717       Perl uses a reference count-driven garbage collection mechanism. SVs,
718       AVs, or HVs (xV for short in the following) start their life with a
719       reference count of 1.  If the reference count of an xV ever drops to 0,
720       then it will be destroyed and its memory made available for reuse.
721
722       This normally doesn't happen at the Perl level unless a variable is
723       undef'ed or the last variable holding a reference to it is changed or
724       overwritten.  At the internal level, however, reference counts can be
725       manipulated with the following macros:
726
727           int SvREFCNT(SV* sv);
728           SV* SvREFCNT_inc(SV* sv);
729           void SvREFCNT_dec(SV* sv);
730
731       However, there is one other function which manipulates the reference
732       count of its argument.  The "newRV_inc" function, you will recall,
733       creates a reference to the specified argument.  As a side effect, it
734       increments the argument's reference count.  If this is not what you
735       want, use "newRV_noinc" instead.
736
737       For example, imagine you want to return a reference from an XSUB
738       function.  Inside the XSUB routine, you create an SV which initially
739       has a reference count of one.  Then you call "newRV_inc", passing it
740       the just-created SV.  This returns the reference as a new SV, but the
741       reference count of the SV you passed to "newRV_inc" has been
742       incremented to two.  Now you return the reference from the XSUB routine
743       and forget about the SV.  But Perl hasn't!  Whenever the returned
744       reference is destroyed, the reference count of the original SV is
745       decreased to one and nothing happens.  The SV will hang around without
746       any way to access it until Perl itself terminates.  This is a memory
747       leak.
748
749       The correct procedure, then, is to use "newRV_noinc" instead of
750       "newRV_inc".  Then, if and when the last reference is destroyed, the
751       reference count of the SV will go to zero and it will be destroyed,
752       stopping any memory leak.
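
       The leak described above can be reproduced in a small stand-alone
       model.  This is only a sketch: "toy_sv", "toy_ref_inc" and
       "toy_ref_noinc" are hypothetical stand-ins for an SV, "newRV_inc" and
       "newRV_noinc", and a fixed pool replaces real allocation.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_POOL_SIZE 16

/* Hypothetical stand-in for an SV: a reference count, an optional referent
 * (set when the value is itself a reference), and an in-use marker. */
typedef struct toy_sv {
    int refcnt;
    struct toy_sv *referent;
    int in_use;
} toy_sv;

toy_sv toy_pool[TOY_POOL_SIZE];   /* fixed arena, so the demo needs no malloc */
int toy_live = 0;                 /* values created and not yet destroyed */

/* New values start life with a reference count of 1. */
toy_sv *toy_new(void)
{
    int i;
    for (i = 0; i < TOY_POOL_SIZE; i++) {
        if (!toy_pool[i].in_use) {
            toy_pool[i].in_use = 1;
            toy_pool[i].refcnt = 1;
            toy_pool[i].referent = NULL;
            toy_live++;
            return &toy_pool[i];
        }
    }
    assert(!"toy pool exhausted");
    return NULL;
}

/* Drop one count; at zero the value is destroyed, and a destroyed
 * reference releases the count it held on its referent. */
void toy_dec(toy_sv *sv)
{
    assert(sv != NULL && sv->refcnt > 0);
    if (--sv->refcnt == 0) {
        if (sv->referent != NULL)
            toy_dec(sv->referent);
        sv->in_use = 0;
        toy_live--;
    }
}

/* In the spirit of newRV_noinc(): the new reference adopts the caller's
 * existing count on the target. */
toy_sv *toy_ref_noinc(toy_sv *target)
{
    toy_sv *rv = toy_new();
    rv->referent = target;
    return rv;
}

/* In the spirit of newRV_inc(): the target's count is bumped first. */
toy_sv *toy_ref_inc(toy_sv *target)
{
    target->refcnt++;
    return toy_ref_noinc(target);
}

/* Create a value, wrap it in a reference, forget the value, then destroy
 * the reference.  Returns how many values were left behind. */
int toy_leaked_after(int use_inc)
{
    int before = toy_live;
    toy_sv *sv = toy_new();
    toy_sv *rv = use_inc ? toy_ref_inc(sv) : toy_ref_noinc(sv);
    toy_dec(rv);
    return toy_live - before;
}
```

       With the incrementing variant one value is left behind with a count
       of one and no owner; with the non-incrementing variant nothing
       remains.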
753
754       There are some convenience functions available that can help with the
755       destruction of xVs.  These functions introduce the concept of
756       "mortality".  An xV that is mortal has had its reference count marked
757       to be decremented, but not actually decremented, until "a short time
758       later".  Generally the term "short time later" means a single Perl
759       statement, such as a call to an XSUB function.  The actual determinant
760       for when mortal xVs have their reference count decremented depends on
761       two macros, SAVETMPS and FREETMPS.  See perlcall and perlxs for more
762       details on these macros.
763
764       "Mortalization" then is at its simplest a deferred "SvREFCNT_dec".
765       However, if you mortalize a variable twice, the reference count will
766       later be decremented twice.
767
768       "Mortal" SVs are mainly used for SVs that are placed on perl's stack.
769       For example an SV which is created just to pass a number to a called
770       sub is made mortal to have it cleaned up automatically when it's popped
771       off the stack. Similarly, results returned by XSUBs (which are pushed
772       on the stack) are often made mortal.
773
774       To create a mortal variable, use the functions:
775
776           SV*  sv_newmortal()
777           SV*  sv_2mortal(SV*)
778           SV*  sv_mortalcopy(SV*)
779
780       The first call creates a mortal SV (with no value), the second converts
781       an existing SV to a mortal SV (and thus defers a call to
782       "SvREFCNT_dec"), and the third creates a mortal copy of an existing SV.
783       Because "sv_newmortal" gives the new SV no value, it must normally be
784       given one via "sv_setpv", "sv_setiv", etc. :
785
786           SV *tmp = sv_newmortal();
787           sv_setiv(tmp, an_integer);
788
789       As that is multiple C statements, it is quite common to see this idiom
790       instead:
791
792           SV *tmp = sv_2mortal(newSViv(an_integer));
793
794       You should be careful about creating mortal variables.  Strange things
795       can happen if you make the same value mortal within multiple contexts,
796       or if you make a variable mortal multiple times. Thinking of
797       "Mortalization" as deferred "SvREFCNT_dec" should help to minimize such
798       problems.  For example if you are passing an SV which you know has a
799       high enough REFCNT to survive its use on the stack you need not do any
800       mortalization.  If you are not sure then doing an "SvREFCNT_inc" and
801       "sv_2mortal", or making a "sv_mortalcopy" is safer.
802
803       The mortal routines are not just for SVs; AVs and HVs can be made
804       mortal by passing their address (cast to "SV*") to the
805       "sv_2mortal" or "sv_mortalcopy" routines.
806
807   Stashes and Globs
808       A stash is a hash that contains all variables that are defined within a
809       package.  Each key of the stash is a symbol name (shared by all the
810       different types of objects that have the same name), and each value in
811       the hash table is a GV (Glob Value).  This GV in turn contains
812       references to the various objects of that name, including (but not
813       limited to) the following:
814
815           Scalar Value
816           Array Value
817           Hash Value
818           I/O Handle
819           Format
820           Subroutine
821
822       There is a single stash called "PL_defstash" that holds the items that
823       exist in the "main" package.  To get at the items in other packages,
824       append the string "::" to the package name.  The items in the "Foo"
825       package are in the stash "Foo::" in PL_defstash.  The items in the
826       "Bar::Baz" package are in the stash "Baz::" in "Bar::"'s stash.
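
       The naming rule for nested stashes can be illustrated with a small
       helper that yields the stash key at each level of a package name.
       This is a hypothetical function ("toy_stash_key" is not part of the
       Perl API) and it only models the naming; it performs no lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write the stash key for the component at the given depth of a package
 * name into out: depth 0 of "Bar::Baz" gives "Bar::", depth 1 gives
 * "Baz::".  Returns 1 on success, 0 if there is no such component or the
 * buffer is too small. */
int toy_stash_key(const char *package, size_t depth, char *out, size_t outlen)
{
    const char *p = package;
    assert(package != NULL && out != NULL);
    for (;;) {
        const char *sep = strstr(p, "::");
        size_t len = (sep != NULL) ? (size_t)(sep - p) : strlen(p);
        if (len == 0)
            return 0;                 /* empty component */
        if (depth == 0) {
            if (len + 3 > outlen)
                return 0;             /* no room for name, "::" and NUL */
            memcpy(out, p, len);
            memcpy(out + len, "::", 3);
            return 1;
        }
        if (sep == NULL)
            return 0;                 /* ran out of components */
        p = sep + 2;
        depth--;
    }
}
```

       For "Bar::Baz" the keys are "Bar::" (looked up in the main stash) and
       then "Baz::" (looked up in the stash found at the previous step).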
827
828       To get the stash pointer for a particular package, use the function:
829
830           HV*  gv_stashpv(const char* name, I32 flags)
831           HV*  gv_stashsv(SV*, I32 flags)
832
833       The first function takes a literal string, the second uses the string
834       stored in the SV.  Remember that a stash is just a hash table, so you
835       get back an "HV*".  If the "flags" argument is set to GV_ADD, the
836       package is created if it does not already exist.
837
838       The name that "gv_stash*v" wants is the name of the package whose
839       symbol table you want.  The default package is called "main".  If you
840       have multiply nested packages, pass their names to "gv_stash*v",
841       separated by "::" as in the Perl language itself.
842
843       Alternatively, if you have an SV that is a blessed reference, you can
844       find out the stash pointer by using:
845
846           HV*  SvSTASH(SvRV(SV*));
847
848       then use the following to get the package name itself:
849
850           char*  HvNAME(HV* stash);
851
852       If you need to bless or re-bless an object you can use the following
853       function:
854
855           SV*  sv_bless(SV*, HV* stash)
856
857       where the first argument, an "SV*", must be a reference, and the second
858       argument is a stash.  The returned "SV*" can now be used in the same
859       way as any other SV.
860
861       For more information on references and blessings, consult perlref.
862
863   Double-Typed SVs
864       Scalar variables normally contain only one type of value, an integer,
865       double, pointer, or reference.  Perl will automatically convert the
866       actual scalar data from the stored type into the requested type.
867
868       Some scalar variables contain more than one type of scalar data.  For
869       example, the variable $! contains either the numeric value of "errno"
870       or its string equivalent from either "strerror" or "sys_errlist[]".
871
872       To force multiple data values into an SV, you must do two things: use
873       the "sv_set*v" routines to add the additional scalar type, then set a
874       flag so that Perl will believe it contains more than one type of data.
875       The four macros to set the flags are:
876
877               SvIOK_on
878               SvNOK_on
879               SvPOK_on
880               SvROK_on
881
882       The particular macro you must use depends on which "sv_set*v" routine
883       you called first.  This is because every "sv_set*v" routine turns on
884       only the bit for the particular type of data being set, and turns off
885       all the rest.
886
887       For example, to create a new Perl variable called "dberror" that
888       contains both the numeric and descriptive string error values, you
889       could use the following code:
890
891           extern int  dberror;
892           extern char *dberror_list[];
893
894           SV* sv = get_sv("dberror", GV_ADD);
895           sv_setiv(sv, (IV) dberror);
896           sv_setpv(sv, dberror_list[dberror]);
897           SvIOK_on(sv);
898
899       If the order of "sv_setiv" and "sv_setpv" had been reversed, then the
900       macro "SvPOK_on" would need to be called instead of "SvIOK_on".
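
       Why the flag has to be turned back on can be seen in a stand-alone
       model of the flag handling.  This is a sketch only: "toy_sv", the
       "TOY_IOK" and "TOY_POK" bits and the error table are hypothetical,
       and real SVs store their values quite differently.

```c
#include <assert.h>
#include <string.h>

enum { TOY_IOK = 1, TOY_POK = 2 };   /* hypothetical "slot is valid" bits */

typedef struct toy_sv {
    unsigned flags;
    long     iv;
    char     pv[64];
} toy_sv;

/* In the spirit of sv_setiv(): store the integer and leave only the
 * integer bit set. */
void toy_setiv(toy_sv *sv, long iv)
{
    sv->iv = iv;
    sv->flags = TOY_IOK;
}

/* In the spirit of sv_setpv(): store the string and leave only the
 * string bit set. */
void toy_setpv(toy_sv *sv, const char *pv)
{
    assert(pv != NULL);
    strncpy(sv->pv, pv, sizeof sv->pv - 1);
    sv->pv[sizeof sv->pv - 1] = '\0';
    sv->flags = TOY_POK;
}

/* In the spirit of SvIOK_on(): declare the integer slot valid again. */
void toy_iok_on(toy_sv *sv)
{
    sv->flags |= TOY_IOK;
}

const char *const toy_errors[] = { "no error", "not found", "locked" };

/* Build a dual value for an error code, in the same order as the text. */
void toy_set_dual_error(toy_sv *sv, int code)
{
    assert(code >= 0 && code < 3);
    toy_setiv(sv, code);
    toy_setpv(sv, toy_errors[code]);   /* this clears TOY_IOK ... */
    toy_iok_on(sv);                    /* ... so it is turned back on */
}
```

       The integer itself is still stored after the string is set; only the
       bit that says it may be trusted has been cleared.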
901
902   Magic Variables
903       [This section still under construction.  Ignore everything here.  Post
904       no bills.  Everything not permitted is forbidden.]
905
906       Any SV may be magical, that is, it has special features that a normal
907       SV does not have.  These features are stored in the SV structure in a
908       linked list of "struct magic"'s, typedef'ed to "MAGIC".
909
910           struct magic {
911               MAGIC*      mg_moremagic;
912               MGVTBL*     mg_virtual;
913               U16         mg_private;
914               char        mg_type;
915               U8          mg_flags;
916               I32         mg_len;
917               SV*         mg_obj;
918               char*       mg_ptr;
919           };
920
921       Note this is current as of patchlevel 0, and could change at any time.
922
923   Assigning Magic
924       Perl adds magic to an SV using the sv_magic function:
925
926         void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen);
927
928       The "sv" argument is a pointer to the SV that is to acquire a new
929       magical feature.
930
931       If "sv" is not already magical, Perl uses the "SvUPGRADE" macro to
932       convert "sv" to type "SVt_PVMG". Perl then continues by adding new
933       magic to the beginning of the linked list of magical features.  Any
934       prior entry of the same type of magic is deleted.  Note that this can
935       be overridden, and multiple instances of the same type of magic can be
936       associated with an SV.
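
       The default list handling just described (remove prior entries of the
       same type, then add the new entry at the front) can be sketched as
       follows.  The names "toy_magic", "toy_magic_add" and "toy_unmagic" are
       hypothetical, and the sketch ignores everything "sv_magic" does
       besides maintaining the list.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct toy_magic {
    struct toy_magic *moremagic;   /* next entry, like mg_moremagic */
    char type;                     /* like mg_type */
    int  payload;                  /* hypothetical private data */
} toy_magic;

/* Remove every entry of the given type; returns the new list head. */
toy_magic *toy_unmagic(toy_magic *head, char type)
{
    toy_magic **link = &head;
    while (*link != NULL) {
        toy_magic *mg = *link;
        if (mg->type == type) {
            *link = mg->moremagic;
            free(mg);
        } else {
            link = &mg->moremagic;
        }
    }
    return head;
}

/* Drop prior entries of the same type, then put the new entry at the
 * beginning of the list.  Returns the new list head. */
toy_magic *toy_magic_add(toy_magic *head, char type, int payload)
{
    toy_magic *mg = malloc(sizeof *mg);
    assert(mg != NULL);
    head = toy_unmagic(head, type);
    mg->moremagic = head;
    mg->type = type;
    mg->payload = payload;
    return mg;
}

/* Number of entries in the list. */
int toy_count(const toy_magic *head)
{
    int n = 0;
    for (; head != NULL; head = head->moremagic)
        n++;
    return n;
}
```

       Adding a second entry of a type that is already present leaves the
       list the same length, with the newer entry at its head.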
937
938       The "name" and "namlen" arguments are used to associate a string with
939       the magic, typically the name of a variable. "namlen" is stored in the
940       "mg_len" field and if "name" is non-null then either a "savepvn" copy
941       of "name" or "name" itself is stored in the "mg_ptr" field, depending
942       on whether "namlen" is greater than zero or equal to zero respectively.
943       As a special case, if "(name && namlen == HEf_SVKEY)" then "name" is
944       assumed to contain an "SV*" and is stored as-is with its REFCNT
945       incremented.
946
947       The sv_magic function uses "how" to determine which, if any, predefined
948       "Magic Virtual Table" should be assigned to the "mg_virtual" field.
949       See the "Magic Virtual Tables" section below.  The "how" argument is
950       also stored in the "mg_type" field. The value of "how" should be chosen
951       from the set of macros "PERL_MAGIC_foo" found in perl.h. Note that
952       before these macros were added, Perl internals used to directly use
953       character literals, so you may occasionally come across old code or
954       documentation referring to 'U' magic rather than "PERL_MAGIC_uvar" for
955       example.
956
957       The "obj" argument is stored in the "mg_obj" field of the "MAGIC"
958       structure.  If it is not the same as the "sv" argument, the reference
959       count of the "obj" object is incremented.  If it is the same, or if the
960       "how" argument is "PERL_MAGIC_arylen", or if it is a NULL pointer, then
961       "obj" is merely stored, without the reference count being incremented.
962
963       See also "sv_magicext" in perlapi for a more flexible way to add magic
964       to an SV.
965
966       There is also a function to add magic to an "HV":
967
968           void hv_magic(HV *hv, GV *gv, int how);
969
970       This simply calls "sv_magic" and coerces the "gv" argument into an
971       "SV".
972
973       To remove the magic from an SV, call the function sv_unmagic:
974
975           int sv_unmagic(SV *sv, int type);
976
977       The "type" argument should be equal to the "how" value when the "SV"
978       was initially made magical.
979
980       However, note that "sv_unmagic" removes all magic of a certain "type"
981       from the "SV". If you want to remove only certain magic of a "type"
982       based on the magic virtual table, use "sv_unmagicext" instead:
983
984           int sv_unmagicext(SV *sv, int type, MGVTBL *vtbl);
985
986   Magic Virtual Tables
987       The "mg_virtual" field in the "MAGIC" structure is a pointer to an
988       "MGVTBL" (short for "Magic Virtual Table"), a structure of function
989       pointers that handle the various operations that might be applied to
990       that variable.
991
992       The "MGVTBL" has five (or sometimes eight) pointers to the following
993       routine types:
994
995           int  (*svt_get)(SV* sv, MAGIC* mg);
996           int  (*svt_set)(SV* sv, MAGIC* mg);
997           U32  (*svt_len)(SV* sv, MAGIC* mg);
998           int  (*svt_clear)(SV* sv, MAGIC* mg);
999           int  (*svt_free)(SV* sv, MAGIC* mg);
1000
1001           int  (*svt_copy)(SV *sv, MAGIC* mg, SV *nsv,
1002                                                 const char *name, I32 namlen);
1003           int  (*svt_dup)(MAGIC *mg, CLONE_PARAMS *param);
1004           int  (*svt_local)(SV *nsv, MAGIC *mg);
1005
1006       This MGVTBL structure is set at compile-time in perl.h and there are
1007       currently 32 types.  These different structures contain pointers to
1008       various routines that perform additional actions depending on which
1009       function is being called.
1010
1011          Function pointer    Action taken
1012          ----------------    ------------
1013          svt_get             Do something before the value of the SV is
1014                              retrieved.
1015          svt_set             Do something after the SV is assigned a value.
1016          svt_len             Report on the SV's length.
1017          svt_clear           Clear something the SV represents.
1018          svt_free            Free any extra storage associated with the SV.
1019
1020          svt_copy            copy tied variable magic to a tied element
1021          svt_dup             duplicate a magic structure during thread cloning
1022          svt_local           copy magic to local value during 'local'
1023
1024       For instance, the MGVTBL structure called "vtbl_sv" (which corresponds
1025       to an "mg_type" of "PERL_MAGIC_sv") contains:
1026
1027           { magic_get, magic_set, magic_len, 0, 0 }
1028
1029       Thus, when an SV is determined to be magical and of type
1030       "PERL_MAGIC_sv", if a get operation is being performed, the routine
1031       "magic_get" is called.  All the various routines for the various
1032       magical types begin with "magic_".  NOTE: the magic routines are not
1033       considered part of the Perl API, and may not be exported by the Perl
1034       library.
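
       Dispatch through a table of function pointers can be sketched in
       isolation.  The model below is hypothetical throughout ("toy_vtbl",
       "toy_fetch", "toy_store" and the "toy_clock" hooks are not perl
       names) and keeps only the get and set slots.

```c
#include <assert.h>
#include <stddef.h>

struct toy_sv;
struct toy_magic;

/* Cut-down analogue of MGVTBL with only the get and set slots. */
typedef struct toy_vtbl {
    int (*get)(struct toy_sv *sv, struct toy_magic *mg);
    int (*set)(struct toy_sv *sv, struct toy_magic *mg);
} toy_vtbl;

typedef struct toy_magic {
    struct toy_magic *moremagic;   /* next entry in the list */
    const toy_vtbl   *vtbl;        /* may be NULL, like a "(none)" row */
    int               calls;       /* hypothetical counter for the demo */
} toy_magic;

typedef struct toy_sv {
    int        value;
    toy_magic *magic;
} toy_sv;

/* Read path: run each entry's get hook, if any, before using the value. */
int toy_fetch(toy_sv *sv)
{
    toy_magic *mg;
    assert(sv != NULL);
    for (mg = sv->magic; mg != NULL; mg = mg->moremagic) {
        if (mg->vtbl != NULL && mg->vtbl->get != NULL)
            mg->vtbl->get(sv, mg);
    }
    return sv->value;
}

/* Write path: assign first, then run each entry's set hook, if any. */
void toy_store(toy_sv *sv, int value)
{
    toy_magic *mg;
    assert(sv != NULL);
    sv->value = value;
    for (mg = sv->magic; mg != NULL; mg = mg->moremagic) {
        if (mg->vtbl != NULL && mg->vtbl->set != NULL)
            mg->vtbl->set(sv, mg);
    }
}

int toy_clock = 0;   /* external state exposed through a magical value */

/* get hook: refresh the cached value before it is read. */
int toy_clock_get(toy_sv *sv, toy_magic *mg)
{
    mg->calls++;
    sv->value = toy_clock;
    return 0;
}

/* set hook: propagate the assignment to the external state. */
int toy_clock_set(toy_sv *sv, toy_magic *mg)
{
    mg->calls++;
    toy_clock = sv->value;
    return 0;
}

const toy_vtbl toy_clock_vtbl = { toy_clock_get, toy_clock_set };
```

       A value with no magic entries is read and written directly; a value
       carrying the "toy_clock_vtbl" entry reflects and updates the external
       variable instead.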
1035
1036       The last three slots are a recent addition, and for source code
1037       compatibility they are only checked for if one of the three flags
1038       MGf_COPY, MGf_DUP or MGf_LOCAL is set in mg_flags. This means that most
1039       code can continue declaring a vtable as a 5-element value. These three
1040       are currently used exclusively by the threading code, and are highly
1041       subject to change.
1042
1043       The current kinds of Magic Virtual Tables are:
1044
1045        mg_type
1046        (old-style char and macro)   MGVTBL          Type of magic
1047        --------------------------   ------          -------------
1048        \0 PERL_MAGIC_sv             vtbl_sv         Special scalar variable
1049        #  PERL_MAGIC_arylen         vtbl_arylen     Array length ($#ary)
1050        %  PERL_MAGIC_rhash          (none)          extra data for restricted
1051                                                     hashes
1052        .  PERL_MAGIC_pos            vtbl_pos        pos() lvalue
1053        :  PERL_MAGIC_symtab         (none)          extra data for symbol
1054                                                     tables
1055        <  PERL_MAGIC_backref        vtbl_backref    for weak ref data
1056        @  PERL_MAGIC_arylen_p       (none)          to move arylen out of
1057                                                     XPVAV
1058        A  PERL_MAGIC_overload       vtbl_amagic     %OVERLOAD hash
1059        a  PERL_MAGIC_overload_elem  vtbl_amagicelem %OVERLOAD hash element
1060        B  PERL_MAGIC_bm             vtbl_regexp     Boyer-Moore
1061                                                     (fast string search)
1062        c  PERL_MAGIC_overload_table vtbl_ovrld      Holds overload table
1063                                                     (AMT) on stash
1064        D  PERL_MAGIC_regdata        vtbl_regdata    Regex match position data
1065                                                     (@+ and @- vars)
1066        d  PERL_MAGIC_regdatum       vtbl_regdatum   Regex match position data
1067                                                     element
1068        E  PERL_MAGIC_env            vtbl_env        %ENV hash
1069        e  PERL_MAGIC_envelem        vtbl_envelem    %ENV hash element
1070        f  PERL_MAGIC_fm             vtbl_regdata    Formline
1071                                                     ('compiled' format)
1072        G  PERL_MAGIC_study          vtbl_regexp     study()ed string
1073        g  PERL_MAGIC_regex_global   vtbl_mglob      m//g target
1074        H  PERL_MAGIC_hints          vtbl_hints      %^H hash
1075        h  PERL_MAGIC_hintselem      vtbl_hintselem  %^H hash element
1076        I  PERL_MAGIC_isa            vtbl_isa        @ISA array
1077        i  PERL_MAGIC_isaelem        vtbl_isaelem    @ISA array element
1078        k  PERL_MAGIC_nkeys          vtbl_nkeys      scalar(keys()) lvalue
1079        L  PERL_MAGIC_dbfile         (none)          Debugger %_<filename
1080        l  PERL_MAGIC_dbline         vtbl_dbline     Debugger %_<filename
1081                                                     element
1082        N  PERL_MAGIC_shared         (none)          Shared between threads
1083        n  PERL_MAGIC_shared_scalar  (none)          Shared between threads
1084        o  PERL_MAGIC_collxfrm       vtbl_collxfrm   Locale transformation
1085        P  PERL_MAGIC_tied           vtbl_pack       Tied array or hash
1086        p  PERL_MAGIC_tiedelem       vtbl_packelem   Tied array or hash element
1087        q  PERL_MAGIC_tiedscalar     vtbl_packelem   Tied scalar or handle
1088        r  PERL_MAGIC_qr             vtbl_regexp     precompiled qr// regex
1089        S  PERL_MAGIC_sig            (none)          %SIG hash
1090        s  PERL_MAGIC_sigelem        vtbl_sigelem    %SIG hash element
1091        t  PERL_MAGIC_taint          vtbl_taint      Taintedness
1092        U  PERL_MAGIC_uvar           vtbl_uvar       Available for use by
1093                                                     extensions
1094        u  PERL_MAGIC_uvar_elem      (none)          Reserved for use by
1095                                                     extensions
1096        V  PERL_MAGIC_vstring        vtbl_vstring    SV was vstring literal
1097        v  PERL_MAGIC_vec            vtbl_vec        vec() lvalue
1098        w  PERL_MAGIC_utf8           vtbl_utf8       Cached UTF-8 information
1099        x  PERL_MAGIC_substr         vtbl_substr     substr() lvalue
1100        y  PERL_MAGIC_defelem        vtbl_defelem    Shadow "foreach" iterator
1101                                                     variable / smart parameter
1102                                                     vivification
1103        ]  PERL_MAGIC_checkcall      (none)          inlining/mutation of call
1104                                                     to this CV
1105        ~  PERL_MAGIC_ext            (none)          Available for use by
1106                                                     extensions
1107
1108       When an uppercase and lowercase letter both exist in the table, then
1109       the uppercase letter is typically used to represent some kind of
1110       composite type (a list or a hash), and the lowercase letter is used to
1111       represent an element of that composite type. Some internals code makes
1112       use of this case relationship.  However, 'v' and 'V' (vec and v-string)
1113       are in no way related.
1114
1115       The "PERL_MAGIC_ext" and "PERL_MAGIC_uvar" magic types are defined
1116       specifically for use by extensions and will not be used by perl itself.
1117       Extensions can use "PERL_MAGIC_ext" magic to 'attach' private
1118       information to variables (typically objects).  This is especially
1119       useful because there is no way for normal perl code to corrupt this
1120       private information (unlike using extra elements of a hash object).
1121
1122       Similarly, "PERL_MAGIC_uvar" magic can be used much like tie() to call
1123       a C function any time a scalar's value is used or changed.  The
1124       "MAGIC"'s "mg_ptr" field points to a "ufuncs" structure:
1125
1126           struct ufuncs {
1127               I32 (*uf_val)(pTHX_ IV, SV*);
1128               I32 (*uf_set)(pTHX_ IV, SV*);
1129               IV uf_index;
1130           };
1131
1132       When the SV is read from or written to, the "uf_val" or "uf_set"
1133       function will be called with "uf_index" as the first arg and a pointer
1134       to the SV as the second.  A simple example of how to add
1135       "PERL_MAGIC_uvar" magic is shown below.  Note that the ufuncs structure
1136       is copied by sv_magic, so you can safely allocate it on the stack.
1137
1138           void
1139           Umagic(sv)
1140               SV *sv;
1141           PREINIT:
1142               struct ufuncs uf;
1143           CODE:
1144               uf.uf_val   = &my_get_fn;
1145               uf.uf_set   = &my_set_fn;
1146               uf.uf_index = 0;
1147               sv_magic(sv, 0, PERL_MAGIC_uvar, (char*)&uf, sizeof(uf));
1148
1149       Attaching "PERL_MAGIC_uvar" to arrays is permissible but has no effect.
1150
1151       For hashes there is a specialized hook that gives control over hash
1152       keys (but not values).  This hook calls "PERL_MAGIC_uvar" 'get' magic
1153       if the "set" function in the "ufuncs" structure is NULL.  The hook is
1154       activated whenever the hash is accessed with a key specified as an "SV"
1155       through the functions "hv_store_ent", "hv_fetch_ent", "hv_delete_ent",
1156       and "hv_exists_ent".  Accessing the key as a string through the
1157       functions without the "..._ent" suffix circumvents the hook.  See
1158       "GUTS" in Hash::Util::FieldHash for a detailed description.
1159
1160       Note that because multiple extensions may be using "PERL_MAGIC_ext" or
1161       "PERL_MAGIC_uvar" magic, it is important for extensions to take extra
1162       care to avoid conflict.  Typically only using the magic on objects
1163       blessed into the same class as the extension is sufficient.  For
1164       "PERL_MAGIC_ext" magic, it is usually a good idea to define an
1165       "MGVTBL", even if all its fields will be 0, so that individual "MAGIC"
1166       pointers can be identified as a particular kind of magic using their
1167       magic virtual table. "mg_findext" provides an easy way to do that:
1168
1169           STATIC MGVTBL my_vtbl = { 0, 0, 0, 0, 0, 0, 0, 0 };
1170
1171           MAGIC *mg;
1172           if ((mg = mg_findext(sv, PERL_MAGIC_ext, &my_vtbl))) {
1173               /* this is really ours, not another module's PERL_MAGIC_ext */
1174               my_priv_data_t *priv = (my_priv_data_t *)mg->mg_ptr;
1175               ...
1176           }
1177
1178       Also note that the "sv_set*()" and "sv_cat*()" functions described
1179       earlier do not invoke 'set' magic on their targets.  This must be done
1180       by the user either by calling the "SvSETMAGIC()" macro after calling
1181       these functions, or by using one of the "sv_set*_mg()" or
1182       "sv_cat*_mg()" functions.  Similarly, generic C code must call the
1183       "SvGETMAGIC()" macro to invoke any 'get' magic if it uses an SV
1184       obtained from external sources in functions that don't handle magic.
1185       See perlapi for a description of these functions.  For example, calls
1186       to the "sv_cat*()" functions typically need to be followed by
1187       "SvSETMAGIC()", but they don't need a prior "SvGETMAGIC()" since their
1188       implementation handles 'get' magic.
1189
1190   Finding Magic
1191           MAGIC *mg_find(SV *sv, int type); /* Finds the magic pointer of that
1192                                              * type */
1193
1194       This routine returns a pointer to a "MAGIC" structure stored in the SV.
1195       If the SV does not have that magical feature, "NULL" is returned. If
1196       the SV has multiple instances of that magical feature, the first one
1197       will be returned. "mg_findext" can be used to find a "MAGIC" structure
1198       of an SV based on both its magic type and its magic virtual table:
1199
1200           MAGIC *mg_findext(SV *sv, int type, MGVTBL *vtbl);
1201
1202       Also, if the SV passed to "mg_find" or "mg_findext" is not of type
1203       SVt_PVMG, Perl may core dump.
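
       The difference between searching by type alone and searching by type
       plus virtual table can be sketched with a plain linked list.  The
       names "toy_magic", "toy_find" and "toy_findext" are hypothetical;
       the point is only that a table's address serves as its identity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical virtual table; only its address matters here. */
typedef struct toy_vtbl {
    int unused;
} toy_vtbl;

typedef struct toy_magic {
    struct toy_magic *moremagic;   /* next entry in the list */
    char              type;        /* like mg_type */
    const toy_vtbl   *vtbl;        /* like mg_virtual */
    void             *ptr;         /* private data, like mg_ptr */
} toy_magic;

/* First entry with the given type, or NULL (the mg_find behaviour). */
toy_magic *toy_find(toy_magic *head, char type)
{
    for (; head != NULL; head = head->moremagic) {
        if (head->type == type)
            return head;
    }
    return NULL;
}

/* First entry matching both the type and the virtual table, or NULL
 * (the mg_findext behaviour). */
toy_magic *toy_findext(toy_magic *head, char type, const toy_vtbl *vtbl)
{
    assert(vtbl != NULL);
    for (; head != NULL; head = head->moremagic) {
        if (head->type == type && head->vtbl == vtbl)
            return head;
    }
    return NULL;
}
```

       When two extensions have each attached an entry of the same type,
       the search by type returns whichever comes first, while the search
       by type and table returns the entry belonging to the caller.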
1204
1205           int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen);
1206
1207       This routine checks to see what types of magic "sv" has.  If the
1208       mg_type field is an uppercase letter, then the mg_obj is copied to
1209       "nsv", but the mg_type field is changed to be the lowercase letter.
1210
1211   Understanding the Magic of Tied Hashes and Arrays
1212       Tied hashes and arrays are magical beasts of the "PERL_MAGIC_tied"
1213       magic type.
1214
1215       WARNING: As of the 5.004 release, proper usage of the array and hash
1216       access functions requires understanding a few caveats.  Some of these
1217       caveats are actually considered bugs in the API, to be fixed in later
1218       releases, and are bracketed with [MAYCHANGE] below. If you find
1219       yourself actually applying such information in this section, be aware
1220       that the behavior may change in the future, umm, without warning.
1221
1222       The perl tie function associates a variable with an object that
1223       implements the various GET, SET, etc methods.  To perform the
1224       equivalent of the perl tie function from an XSUB, you must mimic this
1225       behaviour.  The code below carries out the necessary steps - firstly it
1226       creates a new hash, and then creates a second hash which it blesses
1227       into the class which will implement the tie methods. Lastly it ties the
1228       two hashes together, and returns a reference to the new tied hash.
1229       Note that the code below does NOT call the TIEHASH method in the MyTie
1230       class - see "Calling Perl Routines from within C Programs" for details
1231       on how to do this.
1232
1233           SV*
1234           mytie()
1235           PREINIT:
1236               HV *hash;
1237               HV *stash;
1238               SV *tie;
1239           CODE:
1240               hash = newHV();
1241               tie = newRV_noinc((SV*)newHV());
1242               stash = gv_stashpv("MyTie", GV_ADD);
1243               sv_bless(tie, stash);
1244               hv_magic(hash, (GV*)tie, PERL_MAGIC_tied);
1245               RETVAL = newRV_noinc((SV*)hash);
1246           OUTPUT:
1247               RETVAL
1248
1249       The "av_store" function, when given a tied array argument, merely
1250       copies the magic of the array onto the value to be "stored", using
1251       "mg_copy".  It may also return NULL, indicating that the value did not
1252       actually need to be stored in the array.  [MAYCHANGE] After a call to
1253       "av_store" on a tied array, the caller will usually need to call
1254       "mg_set(val)" to actually invoke the perl level "STORE" method on the
1255       TIEARRAY object.  If "av_store" did return NULL, a call to
1256       "SvREFCNT_dec(val)" will usually also be necessary to avoid a memory
1257       leak. [/MAYCHANGE]
1258
1259       The previous paragraph is applicable verbatim to tied hash access using
1260       the "hv_store" and "hv_store_ent" functions as well.
1261
1262       "av_fetch" and the corresponding hash functions "hv_fetch" and
1263       "hv_fetch_ent" actually return an undefined mortal value whose magic
1264       has been initialized using "mg_copy".  Note the value so returned does
1265       not need to be deallocated, as it is already mortal.  [MAYCHANGE] But
1266       you will need to call "mg_get()" on the returned value in order to
1267       actually invoke the perl level "FETCH" method on the underlying TIE
1268       object.  Similarly, you may also call "mg_set()" on the return value
1269       after possibly assigning a suitable value to it using "sv_setsv",
1270       which will invoke the "STORE" method on the TIE object. [/MAYCHANGE]
1271
1272       [MAYCHANGE] In other words, the array or hash fetch/store functions
1273       don't really fetch and store actual values in the case of tied arrays
1274       and hashes.  They merely call "mg_copy" to attach magic to the values
1275       that were meant to be "stored" or "fetched".  Later calls to "mg_get"
1276       and "mg_set" actually do the job of invoking the TIE methods on the
1277       underlying objects.  Thus the magic mechanism currently implements a
1278       kind of lazy access to arrays and hashes.
1279
1280       Currently (as of perl version 5.004), use of the hash and array access
1281       functions requires the user to be aware of whether they are operating
1282       on "normal" hashes and arrays, or on their tied variants.  The API may
1283       be changed to provide more transparent access to both tied and normal
1284       data types in future versions.  [/MAYCHANGE]
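
       The lazy access pattern described above can be modelled without any
       perl at all.  In this sketch the names are hypothetical ("toy_tie",
       "toy_elem", "toy_av_fetch", "toy_mg_get" and "toy_mg_set"); the fetch
       step only attaches the tie to a placeholder, and the methods run
       when the get or set step is invoked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tie object: FETCH and STORE callbacks plus storage. */
typedef struct toy_tie {
    int  (*fetch)(struct toy_tie *self, int index);
    void (*store)(struct toy_tie *self, int index, int value);
    int  data[4];
    int  fetch_calls;
    int  store_calls;
} toy_tie;

/* Placeholder returned by a fetch on a tied array: it knows which tie
 * and index it stands for, but holds no real value yet. */
typedef struct toy_elem {
    toy_tie *tie;
    int      index;
    int      value;
    int      defined;
} toy_elem;

/* In the spirit of av_fetch on a tied array: attach magic, call nothing. */
toy_elem toy_av_fetch(toy_tie *tie, int index)
{
    toy_elem e;
    assert(tie != NULL);
    e.tie = tie;
    e.index = index;
    e.value = 0;
    e.defined = 0;
    return e;
}

/* In the spirit of mg_get(): this is where FETCH actually runs. */
void toy_mg_get(toy_elem *e)
{
    e->value = e->tie->fetch(e->tie, e->index);
    e->defined = 1;
}

/* In the spirit of mg_set(): this is where STORE actually runs. */
void toy_mg_set(toy_elem *e)
{
    e->tie->store(e->tie, e->index, e->value);
}

int toy_fetch_impl(toy_tie *self, int index)
{
    assert(index >= 0 && index < 4);
    self->fetch_calls++;
    return self->data[index];
}

void toy_store_impl(toy_tie *self, int index, int value)
{
    assert(index >= 0 && index < 4);
    self->store_calls++;
    self->data[index] = value;
}
```

       Until the get step runs, the placeholder is undefined and no method
       has been called; assigning to the placeholder changes nothing in the
       underlying object until the set step runs.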
1285
1286       You would do well to understand that the TIEARRAY and TIEHASH
1287       interfaces are mere sugar to invoke some perl method calls while using
1288       the uniform hash and array syntax.  The use of this sugar imposes some
1289       overhead (typically about two to four extra opcodes per FETCH/STORE
1290       operation, in addition to the creation of all the mortal variables
1291       required to invoke the methods).  This overhead will be comparatively
1292       small if the TIE methods are themselves substantial, but if they are
1293       only a few statements long, the overhead will not be insignificant.
1294
1295   Localizing changes
1296       Perl has a very handy construction
1297
1298         {
1299           local $var = 2;
1300           ...
1301         }
1302
1303       This construction is approximately equivalent to
1304
1305         {
1306           my $oldvar = $var;
1307           $var = 2;
1308           ...
1309           $var = $oldvar;
1310         }
1311
1312       The biggest difference is that the first construction would reinstate
1313       the initial value of $var, irrespective of how control exits the block:
1314       "goto", "return", "die"/"eval", etc. It is a little bit more efficient
1315       as well.
1316
1317       A similar task can be achieved from C via the Perl API: create a
1318       pseudo-block, and arrange for some changes to be automatically undone
1319       at the end of it, either explicitly or by a non-local exit (via die()).
1320       A block-like construct is created by a pair of "ENTER"/"LEAVE" macros
1321       (see "Returning a Scalar" in perlcall).  Such a construct may be
1322       created specially for some important localized task, or an existing one
1323       (like boundaries of enclosing Perl subroutine/block, or an existing
1324       pair for freeing TMPs) may be used. (In the second case the overhead of
1325       additional localization must be almost negligible.) Note that any XSUB
1326       is automatically enclosed in an "ENTER"/"LEAVE" pair.
1327
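       As a rough illustration of the idea, here is a toy save stack in plain
       C.  It is a sketch under stated assumptions, not how perl implements
       its scopes: toy_enter, toy_save_int and toy_leave are hypothetical
       stand-ins for "ENTER", "SAVEINT" and "LEAVE", only "int" variables are
       handled, and non-local exits are not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: names such as toy_enter/toy_save_int are hypothetical
 * and are not part of the Perl API. */
#define TOY_SAVE_MAX  64
#define TOY_SCOPE_MAX 16

typedef struct {
    int *addr;   /* variable to restore */
    int  old;    /* value it had when saved */
} toy_save_entry;

static toy_save_entry toy_savestack[TOY_SAVE_MAX];
static size_t toy_save_ix = 0;               /* next free save slot */
static size_t toy_scope_marks[TOY_SCOPE_MAX];
static size_t toy_scope_ix = 0;              /* current nesting depth */

/* Like ENTER: remember where this pseudo-block's saves begin. */
void toy_enter(void)
{
    assert(toy_scope_ix < TOY_SCOPE_MAX);
    toy_scope_marks[toy_scope_ix++] = toy_save_ix;
}

/* Like SAVEINT(i): record the current value so it can be restored. */
void toy_save_int(int *addr)
{
    assert(toy_save_ix < TOY_SAVE_MAX);
    toy_savestack[toy_save_ix].addr = addr;
    toy_savestack[toy_save_ix].old  = *addr;
    toy_save_ix++;
}

/* Like LEAVE: undo, newest first, everything saved since toy_enter(). */
void toy_leave(void)
{
    size_t mark;
    assert(toy_scope_ix > 0);
    mark = toy_scope_marks[--toy_scope_ix];
    while (toy_save_ix > mark) {
        toy_save_ix--;
        *toy_savestack[toy_save_ix].addr = toy_savestack[toy_save_ix].old;
    }
}

/* Change *var inside a pseudo-block; return the value seen inside it. */
int toy_demo(int *var)
{
    int seen;
    toy_enter();
    toy_save_int(var);
    *var = 2;
    seen = *var;
    toy_leave();
    return seen;
}
```

       After toy_demo(&v) returns, v holds whatever it held before the call,
       just as $var does after the "local" block above.
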
1328       Inside such a pseudo-block the following service is available:
1329
1330       "SAVEINT(int i)"
1331       "SAVEIV(IV i)"
1332       "SAVEI32(I32 i)"
1333       "SAVELONG(long i)"
1334           These macros arrange things to restore the value of integer
1335           variable "i" at the end of enclosing pseudo-block.
1336
1337       SAVESPTR(s)
1338       SAVEPPTR(p)
1339           These macros arrange things to restore the value of pointers "s"
1340           and "p". "s" must be a pointer of a type which survives conversion
1341           to "SV*" and back, "p" should be able to survive conversion to
1342           "char*" and back.
1343
1344       "SAVEFREESV(SV *sv)"
1345           The refcount of "sv" would be decremented at the end of pseudo-
1346           block.  This is similar to "sv_2mortal" in that it is also a
1347           mechanism for doing a delayed "SvREFCNT_dec".  However, while
1348           "sv_2mortal" extends the lifetime of "sv" until the beginning of
1349           the next statement, "SAVEFREESV" extends it until the end of the
1350           enclosing scope.  These lifetimes can be wildly different.
1351
1352           Also compare "SAVEMORTALIZESV".
1353
1354       "SAVEMORTALIZESV(SV *sv)"
1355           Just like "SAVEFREESV", but mortalizes "sv" at the end of the
1356           current scope instead of decrementing its reference count.  This
1357           usually has the effect of keeping "sv" alive until the statement
1358           that called the currently live scope has finished executing.
1359
1360       "SAVEFREEOP(OP *op)"
1361           The "OP *" is op_free()ed at the end of pseudo-block.
1362
1363       SAVEFREEPV(p)
1364           The chunk of memory which is pointed to by "p" is Safefree()ed at
1365           the end of pseudo-block.
1366
1367       "SAVECLEARSV(SV *sv)"
1368           Clears a slot in the current scratchpad which corresponds to "sv"
1369           at the end of pseudo-block.
1370
1371       "SAVEDELETE(HV *hv, char *key, I32 length)"
1372           The key "key" of "hv" is deleted at the end of pseudo-block. The
1373           string pointed to by "key" is Safefree()ed.  If one has a key in
1374           short-lived storage, the corresponding string may be reallocated
1375           like this:
1376
1377             SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf));
1378
1379       "SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)"
1380           At the end of pseudo-block the function "f" is called with the only
1381           argument "p".
1382
1383       "SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)"
1384           At the end of pseudo-block the function "f" is called with the
1385           implicit context argument (if any), and "p".
1386
1387       "SAVESTACK_POS()"
1388           The current offset on the Perl internal stack (cf. "SP") is
1389           restored at the end of pseudo-block.
1390
1391       The following API list contains functions, thus one needs to provide
1392       pointers to the modifiable data explicitly (either C pointers, or
1393       Perlish "GV *"s).  Where the above macros take "int", a similar
1394       function takes "int *".
1395
1396       "SV* save_scalar(GV *gv)"
1397           Equivalent to Perl code "local $gv".
1398
1399       "AV* save_ary(GV *gv)"
1400       "HV* save_hash(GV *gv)"
1401           Similar to "save_scalar", but localize @gv and %gv.
1402
1403       "void save_item(SV *item)"
1404           Duplicates the current value of "SV".  On the exit from the
1405           current "ENTER"/"LEAVE" pseudo-block the value of "SV" will be
1406           restored using the stored value.  It doesn't handle magic.  Use
1407           "save_scalar" if magic is affected.
1408
1409       "void save_list(SV **sarg, I32 maxsarg)"
1410           A variant of "save_item" which takes multiple arguments via an
1411           array "sarg" of "SV*" of length "maxsarg".
1412
1413       "SV* save_svref(SV **sptr)"
1414           Similar to "save_scalar", but will reinstate an "SV *".
1415
1416       "void save_aptr(AV **aptr)"
1417       "void save_hptr(HV **hptr)"
1418           Similar to "save_svref", but localize "AV *" and "HV *".
1419
1420       The "Alias" module implements localization of the basic types within
1421       the caller's scope.  People who are interested in how to localize
1422       things in the containing scope should take a look there too.
1423

Subroutines

1425   XSUBs and the Argument Stack
1426       The XSUB mechanism is a simple way for Perl programs to access C
1427       subroutines.  An XSUB routine will have a stack that contains the
1428       arguments from the Perl program, and a way to map from the Perl data
1429       structures to a C equivalent.
1430
1431       The stack arguments are accessible through the ST(n) macro, which
1432       returns the "n"'th stack argument.  Argument 0 is the first argument
1433       passed in the Perl subroutine call.  These arguments are "SV*", and can
1434       be used anywhere an "SV*" is used.
1435
1436       Most of the time, output from the C routine can be handled through use
1437       of the RETVAL and OUTPUT directives.  However, there are some cases
1438       where the argument stack is not already long enough to handle all the
1439       return values.  An example is the POSIX tzname() call, which takes no
1440       arguments, but returns two, the local time zone's standard and summer
1441       time abbreviations.
1442
1443       To handle this situation, the PPCODE directive is used and the stack is
1444       extended using the macro:
1445
1446           EXTEND(SP, num);
1447
1448       where "SP" is the macro that represents the local copy of the stack
1449       pointer, and "num" is the number of elements the stack should be
1450       extended by.
1451
1452       Now that there is room on the stack, values can be pushed on it using
1453       the "PUSHs" macro.  The pushed values will often need to be "mortal"
1454       (see "Reference Counts and Mortality"):
1455
1456           PUSHs(sv_2mortal(newSViv(an_integer)))
1457           PUSHs(sv_2mortal(newSVuv(an_unsigned_integer)))
1458           PUSHs(sv_2mortal(newSVnv(a_double)))
1459           PUSHs(sv_2mortal(newSVpv("Some String",0)))
1460           /* Although the last example is better written as the more
1461            * efficient: */
1462           PUSHs(newSVpvs_flags("Some String", SVs_TEMP))
1463
1464       Now, in the Perl program calling "tzname", the two values will be
1465       assigned as in:
1466
1467           ($standard_abbrev, $summer_abbrev) = POSIX::tzname;
1468
1469       An alternate (and possibly simpler) method to pushing values on the
1470       stack is to use the macro:
1471
1472           XPUSHs(SV*)
1473
1474       This macro automatically adjusts the stack for you, if needed.  Thus,
1475       you do not need to call "EXTEND" to extend the stack.
1476
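       The difference between extending first and pushing afterwards, and
       letting the push extend for you, can be sketched with a toy stack in
       plain C.  All names here (toy_arg_stack, toy_extend, toy_push,
       toy_xpush) are hypothetical, and the values are plain ints rather than
       "SV*"; the real macros work on the interpreter's own stack.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: toy_arg_stack and its functions are hypothetical names,
 * not the Perl API.  Values are plain ints rather than SV*. */
typedef struct {
    int   *base;   /* start of the allocated stack */
    size_t len;    /* number of values pushed so far */
    size_t cap;    /* number of slots allocated */
} toy_arg_stack;

/* Like EXTEND(SP, num): make sure there is room for num more values.
 * Returns 0 on success, -1 if memory could not be obtained. */
int toy_extend(toy_arg_stack *s, size_t num)
{
    if (s->len + num > s->cap) {
        size_t newcap = s->len + num;
        int *p = realloc(s->base, newcap * sizeof *p);
        if (p == NULL)
            return -1;
        s->base = p;
        s->cap  = newcap;
    }
    return 0;
}

/* Like PUSHs: the caller must already have made room. */
void toy_push(toy_arg_stack *s, int value)
{
    assert(s->len < s->cap);
    s->base[s->len++] = value;
}

/* Like XPUSHs: extend by one slot if needed, then push. */
int toy_xpush(toy_arg_stack *s, int value)
{
    if (toy_extend(s, 1) != 0)
        return -1;
    toy_push(s, value);
    return 0;
}
```

       toy_push() trusts its caller, as "PUSHs" does; toy_xpush() pays for a
       capacity check on every call, which is the trade-off between the two
       styles.
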
1477       Despite what earlier versions of this document suggested, the macros
1478       "(X)PUSH[iunp]" are not suited to XSUBs which return multiple
1479       results.  For that, either stick to the "(X)PUSHs" macros shown above,
1480       or use the new "m(X)PUSH[iunp]" macros instead; see "Putting a C value
1481       on Perl stack".
1482
1483       For more information, consult perlxs and perlxstut.
1484
1485   Autoloading with XSUBs
1486       If an AUTOLOAD routine is an XSUB, as with Perl subroutines, Perl puts
1487       the fully-qualified name of the autoloaded subroutine in the $AUTOLOAD
1488       variable of the XSUB's package.
1489
1490       But it also puts the same information in certain fields of the XSUB
1491       itself:
1492
1493           HV *stash           = CvSTASH(cv);
1494           const char *subname = SvPVX(cv);
1495           STRLEN name_length  = SvCUR(cv); /* in bytes */
1496           U32 is_utf8         = SvUTF8(cv);
1497
1498       "SvPVX(cv)" contains just the sub name itself, not including the
1499       package.  For an AUTOLOAD routine in UNIVERSAL or one of its
1500       superclasses, "CvSTASH(cv)" returns NULL during a method call on a
1501       nonexistent package.
1502
1503       Note: Setting $AUTOLOAD stopped working in 5.6.1, which did not support
1504       XS AUTOLOAD subs at all.  Perl 5.8.0 introduced the use of fields in
1505       the XSUB itself.  Perl 5.16.0 restored the setting of $AUTOLOAD.  If
1506       you need to support 5.8-5.14, use the XSUB's fields.
1507
1508   Calling Perl Routines from within C Programs
1509       There are four routines that can be used to call a Perl subroutine from
1510       within a C program.  These four are:
1511
1512           I32  call_sv(SV*, I32);
1513           I32  call_pv(const char*, I32);
1514           I32  call_method(const char*, I32);
1515           I32  call_argv(const char*, I32, register char**);
1516
1517       The routine most often used is "call_sv".  The "SV*" argument contains
1518       either the name of the Perl subroutine to be called, or a reference to
1519       the subroutine.  The second argument consists of flags that control the
1520       context in which the subroutine is called, whether or not the
1521       subroutine is being passed arguments, how errors should be trapped, and
1522       how to treat return values.
1523
1524       All four routines return the number of arguments that the subroutine
1525       returned on the Perl stack.
1526
1527       These routines used to be called "perl_call_sv", etc., before Perl
1528       v5.6.0, but those names are now deprecated; macros of the same name are
1529       provided for compatibility.
1530
1531       When using any of these routines (except "call_argv"), the programmer
1532       must manipulate the Perl stack.  The tools for doing so include the
1533       following macros and functions:
1534
1535           dSP
1536           SP
1537           PUSHMARK()
1538           PUTBACK
1539           SPAGAIN
1540           ENTER
1541           SAVETMPS
1542           FREETMPS
1543           LEAVE
1544           XPUSH*()
1545           POP*()
1546
1547       For a detailed description of calling conventions from C to Perl,
1548       consult perlcall.
1549
1550   Memory Allocation
1551       Allocation
1552
1553       All memory meant to be used with the Perl API functions should be
1554       manipulated using the macros described in this section.  The macros
1555       provide the necessary transparency between differences in the actual
1556       malloc implementation that is used within perl.
1557
1558       It is suggested that you enable the version of malloc that is
1559       distributed with Perl.  It keeps pools of various sizes of unallocated
1560       memory in order to satisfy allocation requests more quickly.  However,
1561       on some platforms, it may cause spurious malloc or free errors.
1562
1563       The following three macros are used to initially allocate memory:
1564
1565           Newx(pointer, number, type);
1566           Newxc(pointer, number, type, cast);
1567           Newxz(pointer, number, type);
1568
1569       The first argument "pointer" should be the name of a variable that will
1570       point to the newly allocated memory.
1571
1572       The second and third arguments "number" and "type" specify how many of
1573       the specified type of data structure should be allocated.  The argument
1574       "type" is passed to "sizeof".  The final argument to "Newxc", "cast",
1575       should be used if the "pointer" argument is different from the "type"
1576       argument.
1577
1578       Unlike the "Newx" and "Newxc" macros, the "Newxz" macro calls "memzero"
1579       to zero out all the newly allocated memory.
1580
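       To make the argument conventions concrete, here is a sketch of
       stand-ins built on the C library.  The TOY_* macros and the two helper
       functions are hypothetical; the real macros go through perl's own
       allocator and differ in details such as out-of-memory handling, which
       this sketch leaves to the caller.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins built on the C library; the TOY_* names are hypothetical. */
#define TOY_NEWX(ptr, n, type) \
    ((ptr) = (type *)malloc((size_t)(n) * sizeof(type)))
#define TOY_NEWXC(ptr, n, type, cast) \
    ((ptr) = (cast *)malloc((size_t)(n) * sizeof(type)))
#define TOY_NEWXZ(ptr, n, type) \
    ((ptr) = (type *)calloc((size_t)(n), sizeof(type)))
#define TOY_SAFEFREE(ptr) free(ptr)

/* Allocate a zero-filled array of count longs and return the sum of its
 * elements (0 if every element really is zero), or -1 if the allocation
 * failed. */
long toy_sum_of_fresh_zeroed(size_t count)
{
    long *buf;
    long sum = 0;
    size_t i;

    TOY_NEWXZ(buf, count, long);
    if (buf == NULL)
        return -1;
    for (i = 0; i < count; i++)
        sum += buf[i];
    TOY_SAFEFREE(buf);
    return sum;
}

/* Allocate room for count doubles but keep the address in an
 * unsigned char pointer, the situation the "cast" argument is for.
 * The caller frees the result. */
unsigned char *toy_raw_doubles(size_t count)
{
    unsigned char *raw;
    TOY_NEWXC(raw, count, double, unsigned char);
    return raw;
}
```

       Note that "type" names the element type, not the pointer type, which
       is why "sizeof" can be applied to it directly.
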
1581       Reallocation
1582
1583           Renew(pointer, number, type);
1584           Renewc(pointer, number, type, cast);
1585           Safefree(pointer)
1586
1587       These three macros are used to change a memory buffer size or to free a
1588       piece of memory no longer needed.  The arguments to "Renew" and
1589       "Renewc" match those of "Newx" and "Newxc"; unlike the older "New"
1590       and "Newc" macros, none of them takes a "magic cookie" argument.
1591
1592       Moving
1593
1594           Move(source, dest, number, type);
1595           Copy(source, dest, number, type);
1596           Zero(dest, number, type);
1597
1598       These three macros are used to move, copy, or zero out previously
1599       allocated memory.  The "source" and "dest" arguments point to the
1600       source and destination starting points.  Perl will move, copy, or zero
1601       out "number" instances of the size of the "type" data structure (using
1602       the "sizeof" operator).
1603
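       The argument order (source first, then destination) is easy to get
       backwards, so here is a sketch using stand-ins built on the C library.
       The TOY_* names and helper functions are hypothetical.  The sketch
       assumes, as perlapi describes, that "Move" tolerates overlapping
       regions while "Copy" may not, and models them with memmove and memcpy
       respectively.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-ins; the TOY_* names are hypothetical.  Note the argument
 * order: source first, then destination, as in the macros above. */
#define TOY_MOVE(src, dst, n, type) \
    memmove((dst), (src), (size_t)(n) * sizeof(type))
#define TOY_COPY(src, dst, n, type) \
    memcpy((dst), (src), (size_t)(n) * sizeof(type))
#define TOY_ZERO(dst, n, type) \
    memset((dst), 0, (size_t)(n) * sizeof(type))

/* Shift the first (len - 1) elements one place to the right and zero the
 * vacated first slot.  Source and destination overlap, so TOY_MOVE is
 * used rather than TOY_COPY. */
void toy_shift_right(int *a, size_t len)
{
    assert(len > 0);
    TOY_MOVE(a, a + 1, len - 1, int);
    TOY_ZERO(a, 1, int);
}

/* Duplicate len ints into a separate, non-overlapping buffer. */
void toy_duplicate(const int *from, int *to, size_t len)
{
    TOY_COPY(from, to, len, int);
}
```

       In each case the count is a number of elements, not of bytes; the
       macro multiplies by the size of "type" itself.
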
1604   PerlIO
1605       The most recent development releases of Perl have been experimenting
1606       with removing Perl's dependency on the "normal" standard I/O suite and
1607       allowing other stdio implementations to be used.  This involves
1608       creating a new abstraction layer that then calls whichever
1609       implementation of stdio Perl was compiled with.  All XSUBs should now
1610       use the functions in the PerlIO abstraction layer and not make any
1611       assumptions about what kind of stdio is being used.
1612
1613       For a complete description of the PerlIO abstraction, consult perlapio.
1614
1615   Putting a C value on Perl stack
1616       A lot of opcodes (an opcode is an elementary operation in the internal
1617       perl stack machine) put an SV* on the stack.  However, as an
1618       optimization the corresponding SV is (usually) not recreated each time.
1619       The opcodes reuse specially assigned SVs (targets) which are (as a
1620       corollary) not constantly freed/created.
1621
1622       Each of the targets is created only once (but see "Scratchpads and
1623       recursion" below), and when an opcode needs to put an integer, a
1624       double, or a string on stack, it just sets the corresponding parts of
1625       its target and puts the target on stack.
1626
1627       The macro to put this target on stack is "PUSHTARG", and it is directly
1628       used in some opcodes, as well as indirectly in zillions of others,
1629       which use it via "(X)PUSH[iunp]".
1630
1631       Because the target is reused, you must be careful when pushing multiple
1632       values on the stack. The following code will not do what you think:
1633
1634           XPUSHi(10);
1635           XPUSHi(20);
1636
1637       This translates as "set "TARG" to 10, push a pointer to "TARG" onto the
1638       stack; set "TARG" to 20, push a pointer to "TARG" onto the stack".  At
1639       the end of the operation, the stack does not contain the values 10 and
1640       20, but actually contains two pointers to "TARG", which we have set to
1641       20.
1642
1643       If you need to push multiple different values then you should either
1644       use the "(X)PUSHs" macros, or else use the new "m(X)PUSH[iunp]" macros,
1645       none of which make use of "TARG".  The "(X)PUSHs" macros simply push an
1646       SV* on the stack, which, as noted under "XSUBs and the Argument Stack",
1647       will often need to be "mortal".  The new "m(X)PUSH[iunp]" macros make
1648       this a little easier to achieve by creating a new mortal for you (via
1649       "(X)PUSHmortal"), pushing that onto the stack (extending it if
1650       necessary in the case of the "mXPUSH[iunp]" macros), and then setting
1651       its value.  Thus, instead of writing this to "fix" the example above:
1652
1653           XPUSHs(sv_2mortal(newSViv(10)))
1654           XPUSHs(sv_2mortal(newSViv(20)))
1655
1656       you can simply write:
1657
1658           mXPUSHi(10)
1659           mXPUSHi(20)
1660
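       The pitfall and its cure can be reproduced in a few lines of plain C.
       This is a toy model, not the Perl API: toy_sv, toy_ptr_stack,
       toy_xpush_targ and toy_xpush_fresh are hypothetical names, the values
       are bare longs, and the fresh values are freed by the caller rather
       than by a mortal mechanism.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: toy_sv, toy_xpush_targ and friends are hypothetical. */
typedef struct { long iv; } toy_sv;

typedef struct {
    toy_sv *slot[8];   /* the "stack": pointers to values, not values */
    int     top;
} toy_ptr_stack;

/* Like XPUSHi: store the number in the shared target, push the target. */
void toy_xpush_targ(toy_ptr_stack *s, toy_sv *targ, long n)
{
    assert(s->top < 8);
    targ->iv = n;
    s->slot[s->top++] = targ;
}

/* Like mXPUSHi: create a fresh value for every push.  Returns 0 on
 * success, -1 on allocation failure.  The caller frees the values here;
 * in perl the mortal mechanism would do that. */
int toy_xpush_fresh(toy_ptr_stack *s, long n)
{
    toy_sv *sv;
    assert(s->top < 8);
    sv = malloc(sizeof *sv);
    if (sv == NULL)
        return -1;
    sv->iv = n;
    s->slot[s->top++] = sv;
    return 0;
}
```

       Pushing 10 and then 20 with toy_xpush_targ() leaves two identical
       pointers on the stack, both seeing 20; doing the same with
       toy_xpush_fresh() leaves two distinct values, 10 and 20.
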
1661       On a related note, if you do use "(X)PUSH[iunp]", then you're going to
1662       need a "dTARG" in your variable declarations so that the "*PUSH*"
1663       macros can make use of the local variable "TARG".  See also "dTARGET"
1664       and "dXSTARG".
1665
1666   Scratchpads
1667       The question remains of when the SVs which are targets for opcodes are
1668       created. The answer is that they are created when the current unit--a
1669       subroutine or a file (for opcodes for statements outside of
1670       subroutines)--is compiled. During this time a special anonymous Perl
1671       array is created, which is called a scratchpad for the current unit.
1672
1673       A scratchpad keeps SVs which are lexicals for the current unit and are
1674       targets for opcodes. One can deduce that an SV lives on a scratchpad by
1675       looking at its flags: lexicals have "SVs_PADMY" set, and targets have
1676       "SVs_PADTMP" set.
1677
1678       The correspondence between OPs and targets is not 1-to-1. Different OPs
1679       in the compile tree of the unit can use the same target, if this would
1680       not conflict with the expected life of the temporary.
1681
1682   Scratchpads and recursion
1683       Strictly speaking, a compiled unit does not contain a pointer to the
1684       scratchpad AV itself.  It contains a pointer to an AV of
1685       (initially) one element, and this element is the scratchpad AV. Why do
1686       we need an extra level of indirection?
1687
1688       The answer is recursion, and maybe threads. Both of these can create
1689       several execution pointers going into the same subroutine. For the
1690       subroutine-child not to write over the temporaries of the subroutine-
1691       parent (lifespan of which covers the call to the child), the parent and
1692       the child should have different scratchpads. (And the lexicals should
1693       be separate anyway!)
1694
1695       So each subroutine is born with an array of scratchpads (of length 1).
1696       On each entry to the subroutine it is checked that the current depth of
1697       the recursion is not more than the length of this array, and if it is,
1698       a new scratchpad is created and pushed into the array.
1699
1700       The targets on this scratchpad are "undef"s, but they are already
1701       marked with correct flags.
1702
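       The bookkeeping described in this section can be sketched as a toy
       model in plain C.  All names (toy_padlist, toy_padlist_init,
       toy_pad_for_depth, toy_padlist_free) are hypothetical, a scratchpad is
       reduced to an array of ints, and zero stands in for "undef".

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: toy_padlist and its functions are hypothetical names. */
typedef struct {
    int  **pads;    /* pads[d - 1] is the scratchpad for recursion depth d */
    size_t npads;   /* how many scratchpads exist so far */
    size_t padsize; /* slots per scratchpad (lexicals plus targets) */
} toy_padlist;

/* A subroutine is "born" with one scratchpad.  Returns 0 on success. */
int toy_padlist_init(toy_padlist *pl, size_t padsize)
{
    pl->pads = malloc(sizeof *pl->pads);
    if (pl->pads == NULL)
        return -1;
    pl->pads[0] = calloc(padsize, sizeof **pl->pads);
    if (pl->pads[0] == NULL) {
        free(pl->pads);
        return -1;
    }
    pl->npads = 1;
    pl->padsize = padsize;
    return 0;
}

/* On entry at the given depth (1 for the outermost call), return the
 * scratchpad to use, creating one if the depth exceeds the array length.
 * Returns NULL if memory could not be obtained. */
int *toy_pad_for_depth(toy_padlist *pl, size_t depth)
{
    assert(depth >= 1 && depth <= pl->npads + 1);
    if (depth > pl->npads) {
        int **grown = realloc(pl->pads, depth * sizeof *grown);
        int *pad;
        if (grown == NULL)
            return NULL;
        pl->pads = grown;
        pad = calloc(pl->padsize, sizeof *pad);
        if (pad == NULL)
            return NULL;
        pl->pads[pl->npads++] = pad;
    }
    return pl->pads[depth - 1];
}

/* Release every scratchpad and the array that holds them. */
void toy_padlist_free(toy_padlist *pl)
{
    size_t i;
    for (i = 0; i < pl->npads; i++)
        free(pl->pads[i]);
    free(pl->pads);
}
```

       A scratchpad created for depth 2 is kept and reused by later calls
       that reach the same depth, so the array only ever grows to the deepest
       recursion seen.
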

Compiled code

1704   Code tree
1705       Here we describe the internal form your code is converted to by Perl.
1706       Start with a simple example:
1707
1708         $a = $b + $c;
1709
1710       This is converted to a tree similar to this one:
1711
1712                    assign-to
1713                  /           \
1714                 +             $a
1715               /   \
1716             $b     $c
1717
1718       (but slightly more complicated).  This tree reflects the way Perl
1719       parsed your code, but has nothing to do with the execution order.
1720       There is an additional "thread" going through the nodes of the tree
1721       which shows the order of execution of the nodes.  In our simplified
1722       example above it looks like:
1723
1724            $b ---> $c ---> + ---> $a ---> assign-to
1725
1726       But with the actual compile tree for "$a = $b + $c" it is different:
1727       some nodes are optimized away.  As a corollary, though the actual tree
1728       contains more nodes than our simplified example, the execution order is
1729       the same as in our example.
1730
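       The relation between the tree and the execution "thread" can be shown
       with a toy model in plain C.  The struct and functions (toy_op,
       toy_thread, toy_trace) are hypothetical; the field names merely echo
       the real "op_first", "op_sibling" and "op_next".  The sketch threads
       the simplified tree in the order described above: children first, then
       the node itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: toy_op, toy_thread and toy_trace are hypothetical. */
typedef struct toy_op {
    const char    *name;
    struct toy_op *first;    /* first child, or NULL */
    struct toy_op *sibling;  /* next child of the same parent, or NULL */
    struct toy_op *next;     /* next op in execution order, or NULL */
} toy_op;

/* Link the subtree rooted at o into execution order: children first, left
 * to right, then the node itself.  *tail is the last op threaded so far
 * (NULL before the first call).  Returns the first op to execute. */
toy_op *toy_thread(toy_op *o, toy_op **tail)
{
    toy_op *start = NULL;
    toy_op *kid;

    for (kid = o->first; kid != NULL; kid = kid->sibling) {
        toy_op *kid_start = toy_thread(kid, tail);
        if (start == NULL)
            start = kid_start;
    }
    if (*tail != NULL)
        (*tail)->next = o;
    *tail = o;
    o->next = NULL;
    return start != NULL ? start : o;
}

/* Write the names of the ops, in execution order, into buf. */
void toy_trace(const toy_op *start, char *buf, size_t size)
{
    assert(size > 0);
    buf[0] = '\0';
    for (; start != NULL; start = start->next) {
        if (strlen(buf) + strlen(start->name) + 2 > size)
            break;              /* out of room: stop rather than overflow */
        if (buf[0] != '\0')
            strcat(buf, " ");
        strcat(buf, start->name);
    }
}
```

       Built for the simplified tree of "$a = $b + $c", the thread visits
       "$b", "$c", "+", "$a" and finally "assign-to".
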
1731   Examining the tree
1732       If you have your perl compiled for debugging (usually done with
1733       "-DDEBUGGING" on the "Configure" command line), you may examine the
1734       compiled tree by specifying "-Dx" on the Perl command line.  The output
1735       takes several lines per node, and for "$b+$c" it looks like this:
1736
1737           5           TYPE = add  ===> 6
1738                       TARG = 1
1739                       FLAGS = (SCALAR,KIDS)
1740                       {
1741                           TYPE = null  ===> (4)
1742                             (was rv2sv)
1743                           FLAGS = (SCALAR,KIDS)
1744                           {
1745           3                   TYPE = gvsv  ===> 4
1746                               FLAGS = (SCALAR)
1747                               GV = main::b
1748                           }
1749                       }
1750                       {
1751                           TYPE = null  ===> (5)
1752                             (was rv2sv)
1753                           FLAGS = (SCALAR,KIDS)
1754                           {
1755           4                   TYPE = gvsv  ===> 5
1756                               FLAGS = (SCALAR)
1757                               GV = main::c
1758                           }
1759                       }
1760
1761       This tree has 5 nodes (one per "TYPE" specifier), only 3 of them are
1762       not optimized away (one per number in the left column).  The immediate
1763       children of the given node correspond to "{}" pairs on the same level
1764       of indentation, thus this listing corresponds to the tree:
1765
1766                          add
1767                        /     \
1768                      null    null
1769                       |       |
1770                      gvsv    gvsv
1771
1772       The execution order is indicated by "===>" marks, thus it is "3 4 5 6"
1773       (node 6 is not included in the above listing), i.e., "gvsv gvsv add
1774       whatever".
1775
1776       Each of these nodes represents an op, a fundamental operation inside
1777       the Perl core. The code which implements each operation can be found in
1778       the pp*.c files; the function which implements the op with type "gvsv"
1779       is "pp_gvsv", and so on. As the tree above shows, different ops have
1780       different numbers of children: "add" is a binary operator, as one would
1781       expect, and so has two children. To accommodate the various different
1782       numbers of children, there are various types of op data structure, and
1783       they link together in different ways.
1784
1785       The simplest type of op structure is "OP": this has no children. Unary
1786       operators, "UNOP"s, have one child, and this is pointed to by the
1787       "op_first" field. Binary operators ("BINOP"s) have not only an
1788       "op_first" field but also an "op_last" field. The most complex type of
1789       op is a "LISTOP", which has any number of children. In this case, the
1790       first child is pointed to by "op_first" and the last child by
1791       "op_last". The children in between can be found by iteratively
1792       following the "op_sibling" pointer from the first child to the last.
1793
1794       There are also two other op types: a "PMOP" holds a regular expression,
1795       and has no children, and a "LOOP" may or may not have children. If the
1796       "op_children" field is non-zero, it behaves like a "LISTOP". To
1797       complicate matters, if a "UNOP" is actually a "null" op after
1798       optimization (see "Compile pass 2: context propagation") it will still
1799       have children in accordance with its former type.
1800
1801       Another way to examine the tree is to use a compiler back-end module,
1802       such as B::Concise.
1803
1804   Compile pass 1: check routines
1805       The tree is created by the compiler while yacc code feeds it the
1806       constructions it recognizes. Since yacc works bottom-up, so does the
1807       first pass of perl compilation.
1808
1809       What makes this pass interesting for perl developers is that some
1810       optimization may be performed on this pass.  This is optimization by
1811       so-called "check routines".  The correspondence between node names and
1812       corresponding check routines is described in opcode.pl (do not forget
1813       to run "make regen_headers" if you modify this file).
1814
1815       A check routine is called when the node is fully constructed except for
1816       the execution-order thread.  Since at this time there are no back-links
1817       to the currently constructed node, one can do almost any operation to
1818       the top-level node, including freeing it and/or creating new nodes
1819       above/below it.
1820
1821       The check routine returns the node which should be inserted into the
1822       tree (if the top-level node was not modified, check routine returns its
1823       argument).
1824
1825       By convention, check routines have names "ck_*". They are usually
1826       called from "new*OP" subroutines (or "convert") (which in turn are
1827       called from perly.y).
1828
1829   Compile pass 1a: constant folding
1830       Immediately after the check routine is called the returned node is
1831       checked for being compile-time executable.  If it is (the value is
1832       judged to be constant) it is immediately executed, and a constant node
1833       with the "return value" of the corresponding subtree is substituted
1834       instead.  The subtree is deleted.
1835
1836       If constant folding was not performed, the execution-order thread is
1837       created.
1838
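       Constant folding during bottom-up construction can be sketched as
       follows.  This is a toy model in plain C under loud assumptions: the
       names (toy_node, toy_const, toy_var, toy_new_add, toy_free_tree) are
       hypothetical, only integer addition is handled, and "executing" the
       subtree is reduced to adding two numbers.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: every name here is hypothetical, not the Perl API. */
typedef enum { TOY_CONST, TOY_VAR, TOY_ADD } toy_type;

typedef struct toy_node {
    toy_type         type;
    long             value;        /* used by TOY_CONST only */
    struct toy_node *left, *right; /* used by TOY_ADD only */
} toy_node;

static toy_node *toy_alloc(toy_type type, long value,
                           toy_node *l, toy_node *r)
{
    toy_node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();                   /* keeps the sketch short */
    n->type = type;
    n->value = value;
    n->left = l;
    n->right = r;
    return n;
}

toy_node *toy_const(long value)
{
    return toy_alloc(TOY_CONST, value, NULL, NULL);
}

toy_node *toy_var(void)
{
    return toy_alloc(TOY_VAR, 0, NULL, NULL);
}

/* Build an "add" node bottom-up.  If both operands are constants, the sum
 * is computed now, a constant node is returned in place of the subtree,
 * and the operands are freed. */
toy_node *toy_new_add(toy_node *l, toy_node *r)
{
    if (l->type == TOY_CONST && r->type == TOY_CONST) {
        toy_node *folded = toy_const(l->value + r->value);
        free(l);
        free(r);
        return folded;
    }
    return toy_alloc(TOY_ADD, 0, l, r);
}

/* Free a whole tree, children before parent. */
void toy_free_tree(toy_node *n)
{
    if (n == NULL)
        return;
    toy_free_tree(n->left);
    toy_free_tree(n->right);
    free(n);
}
```

       Because construction is bottom-up, folding an inner sum can make the
       enclosing sum foldable too: (1 + 2) + 4 ends up as a single constant
       node, while a sum involving a variable keeps its "add" node.
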
1839   Compile pass 2: context propagation
1840       When a context for a part of compile tree is known, it is propagated
1841       down through the tree.  At this time the context can have 5 values
1842       (instead of 2 for runtime context): void, boolean, scalar, list, and
1843       lvalue.  In contrast with pass 1, this pass is processed from top to
1844       bottom: a node's context determines the context for its children.
1845
1846       Additional context-dependent optimizations are performed at this time.
1847       Since at this moment the compile tree contains back-references (via
1848       "thread" pointers), nodes cannot be free()d now.  To allow optimized-
1849       away nodes at this stage, such nodes are null()ified instead of
1850       free()ing (i.e. their type is changed to OP_NULL).
1851
1852   Compile pass 3: peephole optimization
1853       After the compile tree for a subroutine (or for an "eval" or a file) is
1854       created, an additional pass over the code is performed. This pass is
1855       neither top-down nor bottom-up, but in the execution order (with
1856       additional complications for conditionals).  Optimizations performed at
1857       this stage are subject to the same restrictions as in pass 2.
1858
1859       Peephole optimizations are done by calling the function pointed to by
1860       the global variable "PL_peepp".  By default, "PL_peepp" just calls the
1861       function pointed to by the global variable "PL_rpeepp".  By default,
1862       that performs some basic op fixups and optimisations along the
1863       execution-order op chain, and recursively calls "PL_rpeepp" for each
1864       side chain of ops (resulting from conditionals).  Extensions may
1865       provide additional optimisations or fixups, hooking into either the
1866       per-subroutine or recursive stage, like this:
1867
1868           static peep_t prev_peepp;
1869           static void my_peep(pTHX_ OP *o)
1870           {
1871               /* custom per-subroutine optimisation goes here */
1872               prev_peepp(aTHX_ o);
1873               /* custom per-subroutine optimisation may also go here */
1874           }
1875           BOOT:
1876               prev_peepp = PL_peepp;
1877               PL_peepp = my_peep;
1878
1879           static peep_t prev_rpeepp;
1880           static void my_rpeep(pTHX_ OP *o)
1881           {
1882               OP *orig_o = o;
1883               for(; o; o = o->op_next) {
1884                   /* custom per-op optimisation goes here */
1885               }
1886               prev_rpeepp(aTHX_ orig_o);
1887           }
1888           BOOT:
1889               prev_rpeepp = PL_rpeepp;
1890               PL_rpeepp = my_rpeep;
1891
1892   Pluggable runops
1893       The compile tree is executed in a runops function.  There are two
1894       runops functions, in run.c and in dump.c.  "Perl_runops_debug" is used
1895       with DEBUGGING and "Perl_runops_standard" is used otherwise.  For fine
1896       control over the execution of the compile tree it is possible to
1897       provide your own runops function.
1898
1899       It's probably best to copy one of the existing runops functions and
1900       change it to suit your needs.  Then, in the BOOT section of your XS
1901       file, add the line:
1902
1903         PL_runops = my_runops;
1904
1905       This function should be as efficient as possible to keep your programs
1906       running as fast as possible.
1907
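       To give a feel for what a runops function does, here is a toy
       interpreter loop in plain C.  It is loosely modelled on the idea that
       each op's function returns the next op to run; it is not a copy of
       run.c.  All names (toy_rop, toy_interp, the toy_pp_* functions and the
       two toy_runops_* loops) are hypothetical, and the stack holds longs
       rather than "SV*".

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: every name here is hypothetical, not the Perl API. */
struct toy_interp;

typedef struct toy_rop {
    struct toy_rop *(*ppaddr)(struct toy_interp *); /* code for this op */
    struct toy_rop *next;                           /* execution order */
    long            arg;                            /* operand, if any */
} toy_rop;

typedef struct toy_interp {
    toy_rop *op;        /* op currently being executed */
    long     stack[16];
    int      sp;        /* number of values on the stack */
    long     ops_run;   /* maintained by the counting loop only */
} toy_interp;

/* Push this op's operand. */
toy_rop *toy_pp_const(toy_interp *in)
{
    assert(in->sp < 16);
    in->stack[in->sp++] = in->op->arg;
    return in->op->next;
}

/* Replace the top two values by their sum. */
toy_rop *toy_pp_add(toy_interp *in)
{
    assert(in->sp >= 2);
    in->stack[in->sp - 2] += in->stack[in->sp - 1];
    in->sp--;
    return in->op->next;
}

/* The plain loop: run each op; the op says which op comes next.
 * in->op must point at the first op on entry. */
int toy_runops_standard(toy_interp *in)
{
    while ((in->op = in->op->ppaddr(in)) != NULL)
        ;
    return 0;
}

/* A replacement loop that also counts how many ops were executed. */
int toy_runops_counting(toy_interp *in)
{
    while (in->op != NULL) {
        in->ops_run++;
        in->op = in->op->ppaddr(in);
    }
    return 0;
}
```

       Swapping one loop for the other is a matter of storing a different
       function pointer, which is what assigning to "PL_runops" does for the
       real interpreter.
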
1908   Compile-time scope hooks
1909       As of perl 5.14 it is possible to hook into the compile-time lexical
1910       scope mechanism using "Perl_blockhook_register". This is used like
1911       this:
1912
1913           STATIC void my_start_hook(pTHX_ int full);
1914           STATIC BHK my_hooks;
1915
1916           BOOT:
1917               BhkENTRY_set(&my_hooks, bhk_start, my_start_hook);
1918               Perl_blockhook_register(aTHX_ &my_hooks);
1919
1920       This will arrange to have "my_start_hook" called at the start of
1921       compiling every lexical scope. The available hooks are:
1922
1923       "void bhk_start(pTHX_ int full)"
1924           This is called just after starting a new lexical scope. Note that
1925           Perl code like
1926
1927               if ($x) { ... }
1928
1929           creates two scopes: the first starts at the "(" and has "full ==
1930           1", the second starts at the "{" and has "full == 0". Both end at
1931           the "}", so calls to "start" and "pre/post_end" will match.
1932           Anything pushed onto the save stack by this hook will be popped
1933           just before the scope ends (between the "pre_" and "post_end"
1934           hooks, in fact).
1935
1936       "void bhk_pre_end(pTHX_ OP **o)"
1937           This is called at the end of a lexical scope, just before unwinding
1938           the stack. o is the root of the optree representing the scope; it
1939           is a double pointer so you can replace the OP if you need to.
1940
1941       "void bhk_post_end(pTHX_ OP **o)"
1942           This is called at the end of a lexical scope, just after unwinding
1943           the stack. o is as above. Note that it is possible for calls to
1944           "pre_" and "post_end" to nest, if there is something on the save
1945           stack that calls string eval.
1946
1947       "void bhk_eval(pTHX_ OP *const o)"
1948           This is called just before starting to compile an "eval STRING",
1949           "do FILE", "require" or "use", after the eval has been set up. o is
1950           the OP that requested the eval, and will normally be an
1951           "OP_ENTEREVAL", "OP_DOFILE" or "OP_REQUIRE".
1952
1953       Once you have your hook functions, you need a "BHK" structure to put
1954       them in. It's best to allocate it statically, since there is no way to
1955       free it once it's registered. The function pointers should be inserted
1956       into this structure using the "BhkENTRY_set" macro, which will also set
1957       flags indicating which entries are valid. If you do need to allocate
1958       your "BHK" dynamically for some reason, be sure to zero it before you
1959       start.
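
       The bookkeeping described above can be pictured with a small
       stand-alone sketch.  It is not perl's real "BHK" layout: "my_bhk",
       "MY_BHK_START", "my_bhk_set_start" and the other "my_" names are
       hypothetical, invented only to show why a flags word that records
       the valid entries makes zeroing the structure important.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a hook table; NOT perl's real BHK layout. */
typedef void (*my_start_hook_t)(int full);

#define MY_BHK_START 0x01u          /* the "start" entry is valid */

typedef struct {
    unsigned        flags;          /* which entries below may be called */
    my_start_hook_t start;
} my_bhk;

/* Store a hook and mark its entry valid in one step. */
static void my_bhk_set_start(my_bhk *h, my_start_hook_t fn)
{
    assert(h != NULL && fn != NULL);
    h->start = fn;
    h->flags |= MY_BHK_START;
}

/* Temporarily switch entries off and on again. */
static void my_bhk_disable(my_bhk *h, unsigned which) { h->flags &= ~which; }
static void my_bhk_enable(my_bhk *h, unsigned which)  { h->flags |= which; }

/* Caller side: only entries flagged valid are invoked.
 * Returns 1 if the hook ran, 0 if it was skipped. */
static int my_bhk_call_start(const my_bhk *h, int full)
{
    if (!(h->flags & MY_BHK_START) || h->start == NULL)
        return 0;
    h->start(full);
    return 1;
}

static int my_calls;                /* counts hook invocations */
static void my_counting_hook(int full) { (void)full; my_calls++; }
```

       A table that was not zeroed could carry a stray flag bit and send
       the caller through a garbage function pointer, which is the reason
       for the advice about dynamically allocated structures.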
1960
1961       Once registered, there is no mechanism to switch these hooks off, so if
1962       that is necessary you will need to do this yourself. An entry in "%^H"
1963       is probably the best way, so the effect is lexically scoped; however it
1964       is also possible to use the "BhkDISABLE" and "BhkENABLE" macros to
1965       temporarily switch entries on and off. You should also be aware that
1966       generally speaking at least one scope will have opened before your
1967       extension is loaded, so you will see some "pre/post_end" pairs that
1968       didn't have a matching "start".
1969

Examining internal data structures with the "dump" functions

1971       To aid debugging, the source file dump.c contains a number of functions
1972       which produce formatted output of internal data structures.
1973
1974       The most commonly used of these functions is "Perl_sv_dump"; it's used
1975       for dumping SVs, AVs, HVs, and CVs. The "Devel::Peek" module calls
1976       "sv_dump" to produce debugging output from Perl-space, so users of that
1977       module should already be familiar with its format.
1978
1979       "Perl_op_dump" can be used to dump an "OP" structure or any of its
1980       derivatives, and produces output similar to "perl -Dx"; in fact,
1981       "Perl_dump_eval" will dump the main root of the code being evaluated,
1982       exactly like "-Dx".
1983
1984       Other useful functions are "Perl_dump_sub", which turns a "GV" into an
1985       op tree, "Perl_dump_packsubs" which calls "Perl_dump_sub" on all the
1986       subroutines in a package like so: (Thankfully, these are all xsubs, so
1987       there is no op tree)
1988
1989           (gdb) print Perl_dump_packsubs(PL_defstash)
1990
1991           SUB attributes::bootstrap = (xsub 0x811fedc 0)
1992
1993           SUB UNIVERSAL::can = (xsub 0x811f50c 0)
1994
1995           SUB UNIVERSAL::isa = (xsub 0x811f304 0)
1996
1997           SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0)
1998
1999           SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0)
2000
2001       and "Perl_dump_all", which dumps all the subroutines in the stash and
2002       the op tree of the main root.
2003

How multiple interpreters and concurrency are supported

2005   Background and PERL_IMPLICIT_CONTEXT
2006       The Perl interpreter can be regarded as a closed box: it has an API for
2007       feeding it code or otherwise making it do things, but it also has
2008       functions for its own use.  This smells a lot like an object, and there
2009       are ways for you to build Perl so that you can have multiple
2010       interpreters, with one interpreter represented either as a C structure,
2011       or inside a thread-specific structure.  These structures contain all
2012       the context, the state of that interpreter.
2013
2014       One macro controls the major Perl build flavor: MULTIPLICITY. The
2015       MULTIPLICITY build has a C structure that packages all the interpreter
2016       state. With multiplicity-enabled perls, PERL_IMPLICIT_CONTEXT is also
2017       normally defined, and enables support for passing in a "hidden"
2018       first argument that represents the interpreter's context. MULTIPLICITY
2019       makes multi-threaded perls possible (with the ithreads threading model,
2020       related to the macro USE_ITHREADS.)
2021
2022       Two other "encapsulation" macros are the PERL_GLOBAL_STRUCT and
2023       PERL_GLOBAL_STRUCT_PRIVATE (the latter turns on the former, and the
2024       former turns on MULTIPLICITY.)  The PERL_GLOBAL_STRUCT causes all the
2025       internal variables of Perl to be wrapped inside a single global struct,
2026       struct perl_vars, accessible as (globals) &PL_Vars or PL_VarsPtr or the
2027       function Perl_GetVars().  PERL_GLOBAL_STRUCT_PRIVATE goes one step
2028       further: there is still a single struct (allocated in main() either
2029       from heap or from stack) but there are no global data symbols pointing
2030       to it.  In either case the global struct should be initialised as the
2031       very first thing in main() using Perl_init_global_struct() and
2032       correspondingly torn down after perl_free() using
2033       Perl_free_global_struct(); please see miniperlmain.c for usage details.
2034       You may also need to use "dVAR" in your coding to "declare the global
2035       variables" when you are using them.  dTHX does this for you
2036       automatically.
2037
2038       To see whether you have non-const data you can use a BSD-compatible
2039       "nm":
2040
2041         nm libperl.a | grep -v ' [TURtr] '
2042
2043       If this displays any "D" or "d" symbols, you have non-const data.
2044
2045       For backward compatibility reasons defining just PERL_GLOBAL_STRUCT
2046       doesn't actually hide all symbols inside a big global struct: some
2047       PerlIO_xxx vtables are left visible.  The PERL_GLOBAL_STRUCT_PRIVATE
2048       then hides everything (see how the PERLIO_FUNCS_DECL is used).
2049
2050       All this obviously requires a way for the Perl internal functions to be
2051       either subroutines taking some kind of structure as the first argument,
2052       or subroutines taking nothing as the first argument.  To enable these
2053       two very different ways of building the interpreter, the Perl source
2054       (as it does in so many other situations) makes heavy use of macros and
2055       subroutine naming conventions.
2056
2057       First problem: deciding which functions will be public API functions
2058       and which will be private.  All functions whose names begin "S_" are
2059       private (think "S" for "secret" or "static").  All other functions
2060       begin with "Perl_", but just because a function begins with "Perl_"
2061       does not mean it is part of the API. (See "Internal Functions".) The
2062       easiest way to be sure a function is part of the API is to find its
2063       entry in perlapi.  If it exists in perlapi, it's part of the API.  If
2064       it doesn't, and you think it should be (i.e., you need it for your
2065       extension), send mail via perlbug explaining why you think it should
2066       be.
2067
2068       Second problem: there must be a syntax so that the same subroutine
2069       declarations and calls can pass a structure as their first argument, or
2070       pass nothing.  To solve this, the subroutines are named and declared in
2071       a particular way.  Here's a typical start of a static function used
2072       within the Perl guts:
2073
2074         STATIC void
2075         S_incline(pTHX_ char *s)
2076
2077       STATIC becomes "static" in C, and may be #define'd to nothing in some
2078       configurations in the future.
2079
2080       A public function (i.e. part of the internal API, but not necessarily
2081       sanctioned for use in extensions) begins like this:
2082
2083         void
2084         Perl_sv_setiv(pTHX_ SV* dsv, IV num)
2085
2086       "pTHX_" is one of a number of macros (in perl.h) that hide the details
2087       of the interpreter's context.  THX stands for "thread", "this", or
2088       "thingy", as the case may be.  (And no, George Lucas is not involved.
2089       :-) The first character could be 'p' for a prototype, 'a' for argument,
2090       or 'd' for declaration, so we have "pTHX", "aTHX" and "dTHX", and their
2091       variants.
2092
2093       When Perl is built without options that set PERL_IMPLICIT_CONTEXT,
2094       there is no first argument containing the interpreter's context.  The
2095       trailing underscore in the pTHX_ macro indicates that the macro
2096       expansion needs a comma after the context argument because other
2097       arguments follow it.  If PERL_IMPLICIT_CONTEXT is not defined, pTHX_
2098       will be ignored, and the subroutine is not prototyped to take the extra
2099       argument.  The form of the macro without the trailing underscore is
2100       used when there are no additional explicit arguments.
2101
2102       When a core function calls another, it must pass the context.  This is
2103       normally hidden via macros.  Consider "sv_setiv".  It expands into
2104       something like this:
2105
2106           #ifdef PERL_IMPLICIT_CONTEXT
2107             #define sv_setiv(a,b)      Perl_sv_setiv(aTHX_ a, b)
2108             /* can't do this for vararg functions, see below */
2109           #else
2110             #define sv_setiv           Perl_sv_setiv
2111           #endif
2112
2113       This works well, and means that XS authors can gleefully write:
2114
2115           sv_setiv(foo, bar);
2116
2117       and still have it work under all the modes Perl could have been
2118       compiled with.
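
       The macro trick can be reproduced in miniature outside of perl.  In
       the following sketch every name ("my_interp", "pMY_", "aMY_",
       "MY_IMPLICIT_CONTEXT" and so on) is hypothetical and merely mirrors
       the roles of "pTHX_", "aTHX_" and PERL_IMPLICIT_CONTEXT; it is not
       how perl.h actually spells things.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of the scheme; none of these names exist in
 * perl.h.  Build with or without -DMY_IMPLICIT_CONTEXT. */
typedef struct my_interp { long counter; } my_interp;

#ifdef MY_IMPLICIT_CONTEXT
#  define pMY   my_interp *my_ctx   /* prototype, no further arguments */
#  define pMY_  pMY,                /* prototype, more arguments follow */
#  define aMY   my_ctx              /* argument, no further arguments */
#  define aMY_  aMY,                /* argument, more arguments follow */
#  define MY_COUNTER (my_ctx->counter)
#else
static my_interp my_global_interp;  /* the one and only interpreter */
#  define pMY   void
#  define pMY_
#  define aMY
#  define aMY_
#  define MY_COUNTER (my_global_interp.counter)
#endif

/* "Public" functions carry the context only when it is configured in. */
static void My_add(pMY_ long n) { MY_COUNTER += n; }
static long My_get(pMY)         { return MY_COUNTER; }

/* Short names hide the context argument from callers. */
#define my_add(n) My_add(aMY_ n)
#define my_get()  My_get(aMY)

/* Caller code is identical in both build flavours. */
static long my_demo(pMY)
{
    my_add(2);
    my_add(40);
    return my_get();
}
```

       The body of "my_demo" never mentions the context, yet it compiles
       in both configurations because the short-name macros supply it
       when it is needed.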
2119
2120       This doesn't work so cleanly for varargs functions, though, as macros
2121       imply that the number of arguments is known in advance.  Instead we
2122       either need to spell them out fully, passing "aTHX_" as the first
2123       argument (the Perl core tends to do this with functions like
2124       Perl_warner), or use a context-free version.
2125
2126       The context-free version of Perl_warner is called
2127       Perl_warner_nocontext, and does not take the extra argument.  Instead
2128       it does dTHX; to get the context from thread-local storage.  We
2129       "#define warner Perl_warner_nocontext" so that extensions get source
2130       compatibility at the expense of performance.  (Passing an arg is
2131       cheaper than grabbing it from thread-local storage.)
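
       As a rough illustration of the two flavours of a varargs function,
       here is a stand-alone sketch.  All names in it ("my_interp",
       "My_warnf", "My_warnf_nocontext", "my_current_interp") are
       hypothetical, and a plain static pointer stands in for the
       thread-local storage that a real implementation would consult.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical interpreter state: just remembers the last warning. */
typedef struct my_interp { char last_warning[64]; } my_interp;

/* Stand-in for the thread-local slot holding the current context. */
static my_interp *my_current_interp;

/* Shared worker taking a va_list, so both front ends can forward to it. */
static void My_vwarnf(my_interp *ctx, const char *fmt, va_list ap)
{
    assert(ctx != NULL && fmt != NULL);
    vsnprintf(ctx->last_warning, sizeof ctx->last_warning, fmt, ap);
}

/* Flavour 1: the caller spells out the context as the first argument. */
static void My_warnf(my_interp *ctx, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    My_vwarnf(ctx, fmt, ap);
    va_end(ap);
}

/* Flavour 2: no context argument; it is looked up on every call,
 * which is the convenience-for-speed trade described above. */
static void My_warnf_nocontext(const char *fmt, ...)
{
    my_interp *ctx = my_current_interp;
    va_list ap;
    va_start(ap, fmt);
    My_vwarnf(ctx, fmt, ap);
    va_end(ap);
}
```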
2132
2133       You can ignore [pad]THXx when browsing the Perl headers/sources.  Those
2134       are strictly for use within the core.  Extensions and embedders need
2135       only be aware of [pad]THX.
2136
2137   So what happened to dTHR?
2138       "dTHR" was introduced in perl 5.005 to support the older thread model.
2139       The older thread model now uses the "THX" mechanism to pass context
2140       pointers around, so "dTHR" is not useful any more.  Perl 5.6.0 and
2141       later still have it for backward source compatibility, but it is
2142       defined to be a no-op.
2143
2144   How do I use all this in extensions?
2145       When Perl is built with PERL_IMPLICIT_CONTEXT, extensions that call any
2146       functions in the Perl API will need to pass the initial context
2147       argument somehow.  The kicker is that you will need to write it in such
2148       a way that the extension still compiles when Perl hasn't been built
2149       with PERL_IMPLICIT_CONTEXT enabled.
2150
2151       There are three ways to do this.  First, the easy but inefficient way,
2152       which is also the default, in order to maintain source compatibility
2153       with extensions: whenever XSUB.h is #included, it redefines the aTHX
2154       and aTHX_ macros to call a function that will return the context.
2155       Thus, something like:
2156
2157               sv_setiv(sv, num);
2158
2159       in your extension will translate to this when PERL_IMPLICIT_CONTEXT is
2160       in effect:
2161
2162               Perl_sv_setiv(Perl_get_context(), sv, num);
2163
2164       or to this otherwise:
2165
2166               Perl_sv_setiv(sv, num);
2167
2168       You don't have to do anything new in your extension to get this; since
2169       the Perl library provides Perl_get_context(), it will all just work.
2170
2171       The second, more efficient way is to use the following template for
2172       your Foo.xs:
2173
2174               #define PERL_NO_GET_CONTEXT     /* we want efficiency */
2175               #include "EXTERN.h"
2176               #include "perl.h"
2177               #include "XSUB.h"
2178
2179               STATIC void my_private_function(int arg1, int arg2);
2180
2181               STATIC void
2182               my_private_function(int arg1, int arg2)
2183               {
2184                   dTHX;       /* fetch context */
2185                   ... call many Perl API functions ...
2186               }
2187
2188               [... etc ...]
2189
2190               MODULE = Foo            PACKAGE = Foo
2191
2192               /* typical XSUB */
2193
2194               void
2195               my_xsub(arg)
2196                       int arg
2197                   CODE:
2198                       my_private_function(arg, 10);
2199
2200       Note that the only two changes from the normal way of writing an
2201       extension are the addition of a "#define PERL_NO_GET_CONTEXT" before
2202       including the Perl headers, followed by a "dTHX;" declaration at the
2203       start of every function that will call the Perl API.  (You'll know
2204       which functions need this, because the C compiler will complain that
2205       there's an undeclared identifier in those functions.)  No changes are
2206       needed for the XSUBs themselves, because the XS() macro is correctly
2207       defined to pass in the implicit context if needed.
2208
2209       The third, even more efficient way is to ape how it is done within the
2210       Perl guts:
2211
2212               #define PERL_NO_GET_CONTEXT     /* we want efficiency */
2213               #include "EXTERN.h"
2214               #include "perl.h"
2215               #include "XSUB.h"
2216
2217               /* pTHX_ only needed for functions that call Perl API */
2218               STATIC void my_private_function(pTHX_ int arg1, int arg2);
2219
2220               STATIC void
2221               my_private_function(pTHX_ int arg1, int arg2)
2222               {
2223                   /* dTHX; not needed here, because THX is an argument */
2224                   ... call Perl API functions ...
2225               }
2226
2227               [... etc ...]
2228
2229               MODULE = Foo            PACKAGE = Foo
2230
2231               /* typical XSUB */
2232
2233               void
2234               my_xsub(arg)
2235                       int arg
2236                   CODE:
2237                       my_private_function(aTHX_ arg, 10);
2238
2239       This implementation never has to fetch the context using a function
2240       call, since it is always passed as an extra argument.  Depending on
2241       your needs for simplicity or efficiency, you may mix the previous two
2242       approaches freely.
2243
2244       Never add a comma after "pTHX" yourself--always use the form of the
2245       macro with the underscore for functions that take explicit arguments,
2246       or the form without the underscore for functions with no explicit
2247       arguments.
2248
2249       If Perl is compiled with "-DPERL_GLOBAL_STRUCT", the "dVAR"
2250       definition is needed in any function that accesses the Perl global
2251       variables (see perlvars.h or globvar.sym) and does not use "dTHX"
2252       ("dTHX" includes "dVAR" if necessary).  The need for "dVAR" shows up
2253       only with that compile-time define, because otherwise the Perl global
2254       variables are visible as-is.
2255
2256   Should I do anything special if I call perl from multiple threads?
2257       If you create interpreters in one thread and then proceed to call them
2258       in another, you need to make sure perl's own Thread Local Storage (TLS)
2259       slot is initialized correctly in each of those threads.
2260
2261       The "perl_alloc" and "perl_clone" API functions will automatically set
2262       the TLS slot to the interpreter they created, so that there is no need
2263       to do anything special if the interpreter is always accessed in the
2264       same thread that created it, and that thread did not create or call any
2265       other interpreters afterwards.  If that is not the case, you have to
2266       set the TLS slot of the thread before calling any functions in the Perl
2267       API on that particular interpreter.  This is done by calling the
2268       "PERL_SET_CONTEXT" macro in that thread as the first thing you do:
2269
2270               /* do this before doing anything else with some_perl */
2271               PERL_SET_CONTEXT(some_perl);
2272
2273               ... other Perl API calls on some_perl go here ...
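
       Why the slot can end up pointing at the wrong interpreter is easy
       to see in a toy model.  The sketch below is an assumption-laden
       illustration, not perl's implementation: "my_interp", "my_alloc",
       "MY_SET_CONTEXT" and "my_tls_slot" are hypothetical, and C11
       "_Thread_local" stands in for whatever TLS mechanism the platform
       provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical interpreter object. */
typedef struct my_interp { int id; } my_interp;

/* One slot per thread, naming that thread's "current" interpreter. */
static _Thread_local my_interp *my_tls_slot;

#define MY_SET_CONTEXT(p) (my_tls_slot = (p))
#define MY_GET_CONTEXT()  (my_tls_slot)

/* Like the allocation functions described above, creating an
 * interpreter also points the calling thread's slot at it. */
static my_interp *my_alloc(int id)
{
    my_interp *p = malloc(sizeof *p);
    assert(p != NULL);
    p->id = id;
    MY_SET_CONTEXT(p);
    return p;
}

/* An "API function" that finds its interpreter through the slot. */
static int my_current_id(void)
{
    assert(MY_GET_CONTEXT() != NULL);
    return MY_GET_CONTEXT()->id;
}
```

       After a second "my_alloc" the slot names the newer interpreter, so
       working on the first one again requires an explicit
       "MY_SET_CONTEXT" beforehand, which is the same discipline the text
       asks for with "PERL_SET_CONTEXT".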
2274
2275   Future Plans and PERL_IMPLICIT_SYS
2276       Just as PERL_IMPLICIT_CONTEXT provides a way to bundle up everything
2277       that the interpreter knows about itself and pass it around, so too are
2278       there plans to allow the interpreter to bundle up everything it knows
2279       about the environment it's running on.  This is enabled with the
2280       PERL_IMPLICIT_SYS macro.  Currently it only works with USE_ITHREADS on
2281       Windows.
2282
2283       This makes it possible to provide an extra pointer (called the "host"
2284       environment) for all the system calls, so that all the system-level
2285       code can maintain its own state, broken down into seven C
2286       structures.  These are thin wrappers around the usual system calls (see
2287       win32/perllib.c) for the default perl executable, but for a more
2288       ambitious host (like the one that would do fork() emulation) all the
2289       extra work needed to pretend that different interpreters are actually
2290       different "processes", would be done here.
2291
2292       The Perl engine/interpreter and the host are orthogonal entities.
2293       There could be one or more interpreters in a process, and one or more
2294       "hosts", with free association between them.
2295

Internal Functions

2297       All of Perl's internal functions which will be exposed to the outside
2298       world are prefixed by "Perl_" so that they will not conflict with XS
2299       functions or functions used in a program in which Perl is embedded.
2300       Similarly, all global variables begin with "PL_". (By convention,
2301       static functions start with "S_".)
2302
2303       Inside the Perl core ("PERL_CORE" defined), you can get at the
2304       functions either with or without the "Perl_" prefix, thanks to a bunch
2305       of defines that live in embed.h. Note that extension code should not
2306       set "PERL_CORE"; this exposes the full perl internals, and is likely to
2307       cause breakage of the XS in each new perl release.
2308
2309       The file embed.h is generated automatically from embed.pl and
2310       embed.fnc. embed.pl also creates the prototyping header files for the
2311       internal functions, generates the documentation and a lot of other bits
2312       and pieces. It's important that when you add a new function to the core
2313       or change an existing one, you change the data in the table in
2314       embed.fnc as well. Here's a sample entry from that table:
2315
2316           Apd |SV**   |av_fetch   |AV* ar|I32 key|I32 lval
2317
2318       The second column is the return type, the third column the name.
2319       Columns after that are the arguments. The first column is a set of
2320       flags:
2321
2322       A  This function is a part of the public API. All such functions should
2323          also have 'd', very few do not.
2324
2325       p  This function has a "Perl_" prefix; i.e. it is defined as
2326          "Perl_av_fetch".
2327
2328       d  This function has documentation using the "apidoc" feature which
2329          we'll look at in a second.  Some functions have 'd' but not 'A';
2330          docs are good.
2331
2332       Other available flags are:
2333
2334       s  This is a static function and is defined as "STATIC S_whatever", and
2335          usually called within the sources as "whatever(...)".
2336
2337       n  This does not need an interpreter context, so the definition has no
2338          "pTHX", and it follows that callers don't use "aTHX".  (See
2339          "Background and PERL_IMPLICIT_CONTEXT".)
2340
2341       r  This function never returns; "croak", "exit" and friends.
2342
2343       f  This function takes a variable number of arguments, "printf" style.
2344          The argument list should end with "...", like this:
2345
2346              Afprd   |void   |croak          |const char* pat|...
2347
2348       M  This function is part of the experimental development API, and may
2349          change or disappear without notice.
2350
2351       o  This function should not have a compatibility macro to define, say,
2352          "Perl_parse" to "parse". It must be called as "Perl_parse".
2353
2354       x  This function isn't exported out of the Perl core.
2355
2356       m  This is implemented as a macro.
2357
2358       X  This function is explicitly exported.
2359
2360       E  This function is visible to extensions included in the Perl core.
2361
2362       b  Binary backward compatibility; this function is a macro but also has
2363          a "Perl_" implementation (which is exported).
2364
2365       others
2366          See the comments at the top of "embed.fnc" for others.
2367
2368       If you edit embed.pl or embed.fnc, you will need to run "make
2369       regen_headers" to force a rebuild of embed.h and other auto-generated
2370       files.
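
       To make the column layout of a table entry concrete, here is a
       small stand-alone sketch that splits such a line on "|" and trims
       the padding.  It is not how embed.pl works (that is a Perl
       script); "my_split_entry", "my_has_flag" and "MY_FIELD_MAX" are
       hypothetical helpers written only for this illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MY_FIELD_MAX 32   /* hypothetical per-field buffer size */

/* Split one table line on '|', trimming blanks around each field.
 * Over-long fields are truncated.  Returns the number of fields stored,
 * which is at most max_fields. */
static size_t my_split_entry(const char *line,
                             char fields[][MY_FIELD_MAX],
                             size_t max_fields)
{
    size_t n = 0;
    assert(line != NULL && fields != NULL);
    while (n < max_fields) {
        const char *bar  = strchr(line, '|');
        const char *stop = bar ? bar : line + strlen(line);
        size_t len;

        while (line < stop && isspace((unsigned char)*line))
            line++;
        while (stop > line && isspace((unsigned char)stop[-1]))
            stop--;
        len = (size_t)(stop - line);
        if (len >= MY_FIELD_MAX)
            len = MY_FIELD_MAX - 1;
        memcpy(fields[n], line, len);
        fields[n][len] = '\0';
        n++;
        if (bar == NULL)
            break;
        line = bar + 1;
    }
    return n;
}

/* True if the flags column contains the given flag letter. */
static int my_has_flag(const char *flags, char flag)
{
    return flag != '\0' && strchr(flags, flag) != NULL;
}
```

       Applied to the sample "av_fetch" entry, field 0 holds the flags,
       field 1 the return type, field 2 the name, and the remaining
       fields the arguments.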
2371
2372   Formatted Printing of IVs, UVs, and NVs
2373       If you are printing IVs, UVs, or NVs, then instead of stdio(3)-style
2374       formatting codes like %d, %ld, %f, you should use the following macros
2375       for portability:
2376
2377               IVdf            IV in decimal
2378               UVuf            UV in decimal
2379               UVof            UV in octal
2380               UVxf            UV in hexadecimal
2381               NVef            NV %e-like
2382               NVff            NV %f-like
2383               NVgf            NV %g-like
2384
2385       These will take care of 64-bit integers and long doubles.  For example:
2386
2387               printf("IV is %" IVdf "\n", iv);
2388
2389       The IVdf will expand to whatever is the correct format for the IVs.
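
       The same idea exists in standard C as the <inttypes.h> format
       macros, which makes for a perl-free illustration.  In this sketch
       "my_iv", "my_uv", "MY_IVdf" and "MY_UVxf" are hypothetical
       stand-ins fixed at 64 bits; the real macros expand to whatever
       Configure determined for the platform.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the integer types and their format macros. */
typedef int64_t  my_iv;
typedef uint64_t my_uv;
#define MY_IVdf PRId64   /* expands to a string literal such as "ld" */
#define MY_UVxf PRIx64

/* The format macro is a string literal, so it is glued to the
 * neighbouring literals by ordinary string concatenation. */
static int my_format_iv(char *buf, size_t size, my_iv iv)
{
    assert(buf != NULL);
    return snprintf(buf, size, "IV is %" MY_IVdf, iv);
}

static int my_format_uv_hex(char *buf, size_t size, my_uv uv)
{
    assert(buf != NULL);
    return snprintf(buf, size, "UV is 0x%" MY_UVxf, uv);
}
```

       Leaving a space between the quoted pieces and the macro name keeps
       the source valid for C++ compilers as well, which otherwise may
       try to read the macro name as a literal suffix.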
2390
2391       If you are printing addresses of pointers, use UVxf combined with
2392       PTR2UV(); do not use %lx or %p.
2393
2394   Pointer-To-Integer and Integer-To-Pointer
2395       Because pointer size does not necessarily equal integer size, use the
2396       following macros to do it right.
2397
2398               PTR2UV(pointer)
2399               PTR2IV(pointer)
2400               PTR2NV(pointer)
2401               INT2PTR(pointertotype, integer)
2402
2403       For example:
2404
2405               IV  iv = ...;
2406               SV *sv = INT2PTR(SV*, iv);
2407
2408       and
2409
2410               AV *av = ...;
2411               UV  uv = PTR2UV(av);
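
       For readers without a perl build at hand, the round trip can be
       tried in plain C.  The macros below are hypothetical analogues
       built on "uintptr_t"; the real PTR2UV and INT2PTR are defined in
       terms of perl's own integer types and may expand differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical analogues of the conversion macros. */
#define MY_PTR2UV(p)        ((uintptr_t)(p))
#define MY_INT2PTR(type, i) ((type)(uintptr_t)(i))

/* Convert a pointer to an integer and back; returns 1 if it survived. */
static int my_round_trip(int *p)
{
    uintptr_t u = MY_PTR2UV(p);
    int *q = MY_INT2PTR(int *, u);
    assert(p != 0);
    return q == p;
}
```

       Going through an integer type that is known to be wide enough is
       what avoids the truncation a plain cast to "int" or "long" could
       cause on some platforms.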
2412
2413   Exception Handling
2414       There are a couple of macros to do very basic exception handling in XS
2415       modules. You have to define "NO_XSLOCKS" before including XSUB.h to be
2416       able to use these macros:
2417
2418               #define NO_XSLOCKS
2419               #include "XSUB.h"
2420
2421       You can use these macros if you call code that may croak, but you need
2422       to do some cleanup before giving control back to Perl. For example:
2423
2424               dXCPT;    /* set up necessary variables */
2425
2426               XCPT_TRY_START {
2427                 code_that_may_croak();
2428               } XCPT_TRY_END
2429
2430               XCPT_CATCH
2431               {
2432                 /* do cleanup here */
2433                 XCPT_RETHROW;
2434               }
2435
2436       Note that you always have to rethrow an exception that has been caught.
2437       Using these macros, it is not possible to just catch the exception and
2438       ignore it. If you have to ignore the exception, you have to use one of
2439       the "call_*" functions.
2440
2441       The advantage of using the above macros is that you don't have to set up
2442       an extra function for "call_*", and that using these macros is faster
2443       than using "call_*".
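
       The catch, clean up, rethrow shape can be demonstrated with
       nothing but setjmp(3) and longjmp(3).  This is only a sketch of
       the control flow under that assumption: the real macros do more
       bookkeeping, and every name here ("my_croak", "my_guarded",
       "my_top_env" and so on) is hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

static jmp_buf *my_top_env;   /* innermost active handler */
static int my_error_code;     /* code carried by the pending "exception" */
static int my_cleanups;       /* how many times cleanup has run */

/* Throw: record the code and jump to the innermost handler. */
static void my_croak(int code)
{
    assert(my_top_env != NULL);
    my_error_code = code;
    longjmp(*my_top_env, 1);
}

static void my_code_that_may_croak(int fail)
{
    if (fail)
        my_croak(7);
}

/* Try the call, always clean up, then rethrow if something was caught. */
static void my_guarded(int fail)
{
    jmp_buf env;
    jmp_buf *outer = my_top_env;
    char *scratch = malloc(16);   /* resource that must not leak */
    int caught;

    my_top_env = &env;
    if (setjmp(env) == 0) {
        my_code_that_may_croak(fail);
        caught = 0;
    } else {
        caught = 1;
    }
    my_top_env = outer;           /* leave the "try" block */

    free(scratch);                /* cleanup happens on both paths */
    my_cleanups++;
    if (caught)
        my_croak(my_error_code);  /* rethrow to the outer handler */
}

/* Outermost handler: returns 0 on success or the error code. */
static int my_run(int fail)
{
    jmp_buf env;
    int result;

    my_error_code = 0;
    my_top_env = &env;
    if (setjmp(env) == 0) {
        my_guarded(fail);
        result = 0;
    } else {
        result = my_error_code;
    }
    my_top_env = NULL;
    return result;
}
```

       Note that "my_guarded" restores the outer handler before it
       rethrows; forgetting that step would make the rethrow jump back
       into the handler that has just finished.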
2444
2445   Source Documentation
2446       There's an effort going on to document the internal functions and
2447       automatically produce reference manuals from them - perlapi is one such
2448       manual which details all the functions which are available to XS
2449       writers. perlintern is the autogenerated manual for the functions which
2450       are not part of the API and are supposedly for internal use only.
2451
2452       Source documentation is created by putting POD comments into the C
2453       source, like this:
2454
2455        /*
2456        =for apidoc sv_setiv
2457
2458        Copies an integer into the given SV.  Does not handle 'set' magic.  See
2459        C<sv_setiv_mg>.
2460
2461        =cut
2462        */
2463
2464       Please try and supply some documentation if you add functions to the
2465       Perl core.
2466
2467   Backwards compatibility
2468       The Perl API changes over time. New functions are added or the
2469       interfaces of existing functions are changed. The "Devel::PPPort"
2470       module tries to provide compatibility code for some of these changes,
2471       so XS writers don't have to code it themselves when supporting multiple
2472       versions of Perl.
2473
2474       "Devel::PPPort" generates a C header file ppport.h that can also be run
2475       as a Perl script. To generate ppport.h, run:
2476
2477           perl -MDevel::PPPort -eDevel::PPPort::WriteFile
2478
2479       Besides checking existing XS code, the script can also be used to
2480       retrieve compatibility information for various API calls using the
2481       "--api-info" command line switch. For example:
2482
2483         % perl ppport.h --api-info=sv_magicext
2484
2485       For details, see "perldoc ppport.h".
2486

Unicode Support

2488       Perl 5.6.0 introduced Unicode support. It's important for porters and
2489       XS writers to understand this support and make sure that the code they
2490       write does not corrupt Unicode data.
2491
2492   What is Unicode, anyway?
2493       In the olden, less enlightened times, we all used to use ASCII. Most of
2494       us did, anyway. The big problem with ASCII is that it's American. Well,
2495       no, that's not actually the problem; the problem is that it's not
2496       particularly useful for people who don't use the Roman alphabet. What
2497       used to happen was that particular languages would stick their own
2498       alphabet in the upper range of the sequence, between 128 and 255. Of
2499       course, we then ended up with plenty of variants that weren't quite
2500       ASCII, and the whole point of it being a standard was lost.
2501
2502       Worse still, if you've got a language like Chinese or Japanese that has
2503       hundreds or thousands of characters, then you really can't fit them
2504       into a mere 256, so they had to forget about ASCII altogether, and
2505       build their own systems using pairs of numbers to refer to one
2506       character.
2507
2508       To fix this, some people formed Unicode, Inc. and produced a new
2509       character set containing all the characters you can possibly think of
2510       and more. There are several ways of representing these characters, and
2511       the one Perl uses is called UTF-8. UTF-8 uses a variable number of
2512       bytes to represent a character. You can learn more about Unicode and
2513       Perl's Unicode model in perlunicode.
2514
2515   How can I recognise a UTF-8 string?
2516       You can't. This is because UTF-8 data is stored in bytes just like
2517       non-UTF-8 data. The Unicode character 200 (0xC8 for you hex types),
2518       capital E with a grave accent, is represented by the two bytes
2519       "v195.136". Unfortunately, the non-Unicode string "chr(195).chr(136)"
2520       has that byte sequence as well. So you can't tell just by looking -
2521       this is what makes Unicode input an interesting problem.
2522
2523       In general, you either have to know what you're dealing with, or you
2524       have to guess.  The API function "is_utf8_string" can help; it'll tell
2525       you if a string contains only valid UTF-8 characters. However, it can't
2526       do the work for you. On a character-by-character basis, "is_utf8_char"
2527       will tell you whether the current character in a string is valid UTF-8.
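
       What such a validity check can and cannot tell you is shown by
       the following stand-alone sketch.  "my_is_utf8" is a hypothetical
       helper that applies the strict Unicode rules (no overlong forms,
       no surrogates, nothing above U+10FFFF); it makes no claim to
       accept exactly the same input as perl's own functions.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the len bytes at s are well-formed UTF-8, else 0. */
static int my_is_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    assert(s != NULL || len == 0);
    while (i < len) {
        unsigned char c = s[i];
        unsigned long cp;
        size_t need, k;

        if (c < 0x80) { i++; continue; }           /* plain ASCII */
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1Fu; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0Fu; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07u; }
        else return 0;        /* stray continuation or invalid lead byte */

        if (len - i <= need)
            return 0;                              /* truncated sequence */
        for (k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;                          /* not a continuation */
            cp = (cp << 6) | (s[i + k] & 0x3Fu);
        }
        if ((need == 1 && cp < 0x80) ||
            (need == 2 && cp < 0x800) ||
            (need == 3 && cp < 0x10000))
            return 0;                              /* overlong encoding */
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return 0;                              /* not a valid scalar */
        i += need + 1;
    }
    return 1;
}
```

       The bytes 195 and 136 pass this check, yet the very same bytes
       are also a perfectly good two-character Latin-1 string, so a
       positive answer means "could be UTF-8", never "is UTF-8".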
2528
2529   How does UTF-8 represent Unicode characters?
2530       As mentioned above, UTF-8 uses a variable number of bytes to store a
2531       character. Characters with values 0...127 are stored in one byte, just
2532       like good ol' ASCII. Character 128 is stored as "v194.128"; this
2533       continues up to character 191, which is "v194.191". Now we've run out
2534       of bits (191 is binary 10111111) so we move on; 192 is "v195.128". And
2535       so it goes on, moving to three bytes at character 2048.
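
       The byte values quoted above can be checked with a few lines of
       plain C.  "my_uv_to_utf8" is a hypothetical helper written for
       this illustration, not a perl API function; it handles code
       points up to U+10FFFF and does not reject surrogates.

```c
#include <assert.h>
#include <stddef.h>

/* Encode code point cp as UTF-8 into out (room for 4 bytes needed).
 * Returns the number of bytes written. */
static size_t my_uv_to_utf8(unsigned long cp, unsigned char *out)
{
    assert(out != NULL && cp <= 0x10FFFFul);
    if (cp < 0x80) {                    /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                 /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

       Each continuation byte carries six payload bits, which is why the
       second byte wraps from 191 back to 128 when the code point goes
       from 191 to 192, and why 2048 (the first value needing twelve
       bits) is where three-byte sequences begin.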
2536
2537       Assuming you know you're dealing with a UTF-8 string, you can find out
2538       how long the first character in it is with the "UTF8SKIP" macro:
2539
2540           char *utf = "\305\233\340\240\201";
2541           I32 len;
2542
2543           len = UTF8SKIP(utf); /* len is 2 here */
2544           utf += len;
2545           len = UTF8SKIP(utf); /* len is 3 here */
2546
2547       Another way to skip over characters in a UTF-8 string is to use
2548       "utf8_hop", which takes a string and a number of characters to skip
2549       over. You're on your own about bounds checking, though, so don't use it
2550       lightly.
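
       If you want to see what bounds-checked skipping might look like,
       here is a stand-alone sketch.  Both functions are hypothetical:
       "my_utf8_skip" derives the sequence length from the lead byte by
       bit tests (perl uses its own lookup, which this does not
       reproduce), and "my_utf8_hop_bounded" refuses to run past the end
       of the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the sequence introduced by this lead byte.
 * Continuation and invalid bytes yield 1 so callers always advance. */
static size_t my_utf8_skip(unsigned char lead)
{
    if (lead < 0x80)           return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Advance n characters from s without passing end (one past the last
 * byte).  Returns the new position, or NULL if the buffer is too short. */
static const unsigned char *my_utf8_hop_bounded(const unsigned char *s,
                                                const unsigned char *end,
                                                size_t n)
{
    assert(s != NULL && end != NULL && s <= end);
    while (n-- > 0) {
        size_t step;
        if (s >= end)
            return NULL;
        step = my_utf8_skip(*s);
        if ((size_t)(end - s) < step)
            return NULL;               /* character is cut off */
        s += step;
    }
    return s;
}
```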
2551
2552       All bytes in a multi-byte UTF-8 character will have the high bit set,
2553       so you can test if you need to do something special with this character
2554       like this (the UTF8_IS_INVARIANT() is a macro that tests whether the
2555       byte can be encoded as a single byte even in UTF-8):
2556
2557           U8 *utf;
2558           U8 *utf_end; /* 1 beyond buffer pointed to by utf */
2559           UV uv;      /* Note: a UV, not a U8, not a char */
2560           STRLEN len; /* length of character in bytes */
2561
2562           if (!UTF8_IS_INVARIANT(*utf))
2563               /* Must treat this as UTF-8 */
2564               uv = utf8_to_uvchr_buf(utf, utf_end, &len);
2565           else
2566               /* OK to treat this character as a byte */
2567               uv = *utf;
2568
2569       You can also see in that example that we use "utf8_to_uvchr_buf" to get
2570       the value of the character; the inverse function "uvchr_to_utf8" is
2571       available for putting a UV into UTF-8:
2572
2573           if (!UTF8_IS_INVARIANT(uv))
2574               /* Must treat this as UTF8 */
2575               utf8 = uvchr_to_utf8(utf8, uv);
2576           else
2577               /* OK to treat this character as a byte */
2578               *utf8++ = uv;
2579
2580       You must convert characters to UVs using the above functions if you're
2581       ever in a situation where you have to match UTF-8 and non-UTF-8
2582       characters. You may not skip over UTF-8 characters in this case. If you
2583       do this, you'll lose the ability to match hi-bit non-UTF-8 characters;
2584       for instance, if your UTF-8 string contains "v195.136", and you skip
2585       that character, you can never match a "chr(200)" in a non-UTF-8 string.
2586       So don't do that!
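
       To make the point concrete, here is a plain-C sketch that compares a
       UTF-8 buffer with a non-UTF-8 buffer character by character instead of
       byte by byte. "utf8_eq_bytes" is a hypothetical helper, not a Perl API
       function. It relies on the fact that characters up to 255 occupy at
       most two bytes in UTF-8 (lead byte 0xC2 or 0xC3 on ASCII platforms), so
       anything longer can never equal a single byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the UTF-8 buffer `u` and the
 * non-UTF-8 (one byte per character) buffer `b` hold the same characters,
 * 0 otherwise. */
static int utf8_eq_bytes(const unsigned char *u, size_t ulen,
                         const unsigned char *b, size_t blen)
{
    size_t i = 0, j = 0;

    assert(u != NULL && b != NULL);
    while (i < ulen && j < blen) {
        unsigned long cp;
        if (u[i] < 0x80) {
            cp = u[i++];                       /* invariant: use the byte */
        } else if ((u[i] == 0xC2 || u[i] == 0xC3) && i + 1 < ulen
                   && (u[i + 1] & 0xC0) == 0x80) {
            /* Two-byte sequence: decode to a value in 128..255. */
            cp = ((unsigned long)(u[i] & 0x1F) << 6) | (u[i + 1] & 0x3F);
            i += 2;
        } else {
            return 0;   /* above 255 or malformed: no single byte matches */
        }
        if (cp != b[j++])
            return 0;
    }
    return i == ulen && j == blen;
}
```

       A byte-wise comparison such as "memcmp" would give the wrong answer
       here: character 200 is the two bytes 0xC3 0x88 in the UTF-8 buffer but
       the single byte 0xC8 in the other.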
2587
2588   How does Perl store UTF-8 strings?
2589       Currently, Perl deals with Unicode strings and non-Unicode strings
2590       slightly differently. A flag in the SV, "SVf_UTF8", indicates that the
2591       string is internally encoded as UTF-8. Without it, the byte value is
2592       the codepoint number and vice versa (in other words, the string is
2593       encoded as iso-8859-1, but "use feature 'unicode_strings'" is needed to
2594       get iso-8859-1 semantics). You can check and manipulate this flag with
2595       the following macros:
2596
2597           SvUTF8(sv)
2598           SvUTF8_on(sv)
2599           SvUTF8_off(sv)
2600
2601       This flag has an important effect on Perl's treatment of the string: if
2602       Unicode data is not properly distinguished, regular expressions,
2603       "length", "substr" and other string handling operations will have
2604       undesirable results.
2605
2606       The problem comes when you have, for instance, a string that isn't
2607       flagged as UTF-8, and contains a byte sequence that could be UTF-8 -
2608       especially when combining non-UTF-8 and UTF-8 strings.
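
       The ambiguity is easy to demonstrate outside Perl. In the plain-C
       sketch below, "char_count" is a hypothetical helper and "is_utf8" plays
       the role of the flag: the same two bytes count as two characters
       without it and as one character with it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of characters in `nbytes` bytes at `p`.
 * Without the flag every byte is a character; with it, only bytes that
 * are not UTF-8 continuation bytes (10xxxxxx) start a character.
 * Assumes well-formed UTF-8 when the flag is set. */
static size_t char_count(const unsigned char *p, size_t nbytes, int is_utf8)
{
    size_t i, n = 0;

    assert(p != NULL || nbytes == 0);
    if (!is_utf8)
        return nbytes;
    for (i = 0; i < nbytes; i++)
        if ((p[i] & 0xC0) != 0x80)
            n++;
    return n;
}
```

       For the bytes 0xC3 0xA9, the count is 2 with the flag off and 1 with
       it on, so the bytes alone cannot tell you which reading is intended.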
2609
2610       Never forget that the "SVf_UTF8" flag is separate from the PV value; you
2611       need to be sure you don't accidentally knock it off while you're
2612       manipulating SVs. More specifically, you cannot expect to do this:
2613
2614           SV *sv;
2615           SV *nsv;
2616           STRLEN len;
2617           char *p;
2618
2619           p = SvPV(sv, len);
2620           frobnicate(p);
2621           nsv = newSVpvn(p, len);
2622
2623       The "char*" string does not tell you the whole story, and you can't
2624       copy or reconstruct an SV just by copying the string value. Check if
2625       the old SV has the UTF8 flag set, and act accordingly:
2626
2627           p = SvPV(sv, len);
2628           frobnicate(p);
2629           nsv = newSVpvn(p, len);
2630           if (SvUTF8(sv))
2631               SvUTF8_on(nsv);
2632
2633       In fact, your "frobnicate" function should be made aware of whether or
2634       not it's dealing with UTF-8 data, so that it can handle the string
2635       appropriately.
2636
2637       Since passing an SV to an XS function and copying its data is not
2638       enough to carry over the UTF8 flag, passing just a "char *" to an XS
2639       function is even less adequate.
2640
2641   How do I convert a string to UTF-8?
2642       If you're mixing UTF-8 and non-UTF-8 strings, it is necessary to
2643       upgrade one of the strings to UTF-8. If you've got an SV, the easiest
2644       way to do this is:
2645
2646           sv_utf8_upgrade(sv);
2647
2648       However, you must not do this, for example:
2649
2650           if (!SvUTF8(left))
2651               sv_utf8_upgrade(left);
2652
2653       If you do this in a binary operator, you will actually change one of
2654       the strings that came into the operator, and, while it shouldn't be
2655       noticeable by the end user, it can cause problems in deficient code.
2656
2657       Instead, "bytes_to_utf8" will give you a UTF-8-encoded copy of its
2658       string argument. This is useful for having the data available for
2659       comparisons and so on, without harming the original SV. There's also
2660       "utf8_to_bytes" to go the other way, but naturally, this will fail if
2661       the string contains any characters above 255 that can't be represented
2662       in a single byte.
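
       The two directions can be modelled in plain C as follows. This is a
       sketch, not Perl's code: "latin1_to_utf8_copy" and
       "utf8_to_latin1_inplace" are hypothetical names with their own
       signatures and failure conventions, and an ASCII platform is assumed.
       The first returns a fresh copy and leaves its input alone; the second
       refuses to convert if any character is above 255.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of the upgrade direction: returns a malloc'ed UTF-8
 * copy of the `len` one-byte characters at `s` and stores the new length
 * in *newlen. The caller frees the result. */
static unsigned char *
latin1_to_utf8_copy(const unsigned char *s, size_t len, size_t *newlen)
{
    unsigned char *out = malloc(2 * len + 1);   /* worst case: all doubled */
    unsigned char *d = out;
    size_t i;

    assert(s != NULL && newlen != NULL);
    if (!out)
        return NULL;
    for (i = 0; i < len; i++) {
        if (s[i] < 0x80) {
            *d++ = s[i];
        } else {
            *d++ = (unsigned char)(0xC0 | (s[i] >> 6));
            *d++ = (unsigned char)(0x80 | (s[i] & 0x3F));
        }
    }
    *d = '\0';
    *newlen = (size_t)(d - out);
    return out;
}

/* Hypothetical model of the downgrade direction: converts in place and
 * updates *len. Returns 0 on success, or -1 (buffer untouched) if some
 * character does not fit in one byte or the input is malformed. */
static int utf8_to_latin1_inplace(unsigned char *s, size_t *len)
{
    size_t i, j = 0;

    /* First pass: check that every character is at most 255. */
    for (i = 0; i < *len; i++) {
        if (s[i] < 0x80)
            continue;
        if ((s[i] != 0xC2 && s[i] != 0xC3) || i + 1 >= *len
            || (s[i + 1] & 0xC0) != 0x80)
            return -1;
        i++;                                   /* skip continuation byte */
    }
    /* Second pass: collapse each two-byte sequence into one byte. */
    for (i = 0; i < *len; i++) {
        if (s[i] < 0x80) {
            s[j++] = s[i];
        } else {
            s[j++] = (unsigned char)(((s[i] & 0x03) << 6) | (s[i + 1] & 0x3F));
            i++;
        }
    }
    *len = j;
    return 0;
}
```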
2663
2664   Is there anything else I need to know?
2665       Not really. Just remember these things:
2666
2667       ·  There's no way to tell if a "char *" or "U8 *" string is UTF-8 or
2668          not. You can tell if an SV is UTF-8 by looking at its "SvUTF8" flag.
2669          Don't forget to set the flag if something should be UTF-8. Treat the
2670          flag as part of the PV, even though it's not - if you pass on the PV
2671          to somewhere, pass on the flag too.
2672
2673       ·  If a string is UTF-8, always use "utf8_to_uvchr_buf" to get at the
2674          value, unless "UTF8_IS_INVARIANT(*s)" in which case you can use *s.
2675
2676       ·  When writing a character "uv" to a UTF-8 string, always use
2677          "uvchr_to_utf8", unless "UTF8_IS_INVARIANT(uv)" in which case you
2678          can use "*s = uv".
2679
2680       ·  Mixing UTF-8 and non-UTF-8 strings is tricky. Use "bytes_to_utf8" to
2681          get a new string which is UTF-8 encoded, and then combine them.
2682

Custom Operators

2684       Custom operator support is a new experimental feature that allows you
2685       to define your own ops. This is primarily to allow the building of
2686       interpreters for other languages in the Perl core, but it also allows
2687       optimizations through the creation of "macro-ops" (ops which perform
2688       the functions of multiple ops which are usually executed together, such
2689       as "gvsv, gvsv, add").
2690
2691       This feature is implemented as a new op type, "OP_CUSTOM". The Perl
2692       core does not "know" anything special about this op type, and so it
2693       will not be involved in any optimizations. This also means that you can
2694       define your custom ops to be any op structure - unary, binary, list and
2695       so on - you like.
2696
2697       It's important to know what custom operators won't do for you. They
2698       won't let you add new syntax to Perl, directly. They won't even let you
2699       add new keywords, directly. In fact, they won't change the way Perl
2700       compiles a program at all. You have to do those changes yourself, after
2701       Perl has compiled the program. You do this either by manipulating the
2702       op tree using a "CHECK" block and the "B::Generate" module, or by
2703       adding a custom peephole optimizer with the "optimize" module.
2704
2705       When you do this, you replace ordinary Perl ops with custom ops by
2706       creating ops with the type "OP_CUSTOM" and the "op_ppaddr" of your own PP
2707       function. This should be defined in XS code, and should look like the
2708       PP ops in "pp_*.c". You are responsible for ensuring that your op takes
2709       the appropriate number of values from the stack, and you are
2710       responsible for adding stack marks if necessary.
2711
2712       You should also "register" your op with the Perl interpreter so that it
2713       can produce sensible error and warning messages. Since it is possible
2714       to have multiple custom ops within the one "logical" op type
2715       "OP_CUSTOM", Perl uses the value of "o->op_ppaddr" to determine which
2716       custom op it is dealing with. You should create an "XOP" structure for
2717       each ppaddr you use, set the properties of the custom op with
2718       "XopENTRY_set", and register the structure against the ppaddr using
2719       "Perl_custom_op_register". A trivial example might look like:
2720
2721           static XOP my_xop;
2722           static OP *my_pp(pTHX);
2723
2724           BOOT:
2725               XopENTRY_set(&my_xop, xop_name, "myxop");
2726               XopENTRY_set(&my_xop, xop_desc, "Useless custom op");
2727               Perl_custom_op_register(aTHX_ my_pp, &my_xop);
2728
2729       The available fields in the structure are:
2730
2731       xop_name
2732           A short name for your op. This will be included in some error
2733           messages, and will also be returned as "$op->name" by the B module,
2734           so it will appear in the output of modules like B::Concise.
2735
2736       xop_desc
2737           A short description of the function of the op.
2738
2739       xop_class
2740           Which of the various *OP structures this op uses. This should be
2741           one of the "OA_*" constants from op.h, namely
2742
2743           OA_BASEOP
2744           OA_UNOP
2745           OA_BINOP
2746           OA_LOGOP
2747           OA_LISTOP
2748           OA_PMOP
2749           OA_SVOP
2750           OA_PADOP
2751           OA_PVOP_OR_SVOP
2752               This should be interpreted as "PVOP" only. The "_OR_SVOP" is
2753               because the only core "PVOP", "OP_TRANS", can sometimes be a
2754               "SVOP" instead.
2755
2756           OA_LOOP
2757           OA_COP
2758
2759           The other "OA_*" constants should not be used.
2760
2761       xop_peep
2762           This member is of type "Perl_cpeep_t", which expands to "void
2763           (*Perl_cpeep_t)(pTHX_ OP *o, OP *oldop)". If it is set, this
2764           function will be called from "Perl_rpeep" when ops of this type are
2765           encountered by the peephole optimizer. o is the OP that needs
2766           optimizing; oldop is the previous OP optimized, whose "op_next"
2767           points to o.
2768
2769       "B::Generate" directly supports the creation of custom ops by name.
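
       The idea of telling custom ops apart by their function pointer, since
       they all share one op type, can be pictured with a small plain-C model.
       Everything here is hypothetical ("pp_func", "struct op_info",
       "register_custom", "custom_name") and much simpler than the real
       interpreter; it only illustrates a lookup keyed on the function
       address.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*pp_func)(void);      /* stand-in for a PP function pointer */

/* Stand-in for the descriptive fields attached to one custom op. */
struct op_info {
    const char *name;
    const char *desc;
};

#define MAX_CUSTOM 16
static struct { pp_func fn; const struct op_info *info; } registry[MAX_CUSTOM];
static size_t registry_used;

/* Associate `info` with the function `fn`. Returns 0, or -1 if full. */
static int register_custom(pp_func fn, const struct op_info *info)
{
    assert(fn != NULL && info != NULL);
    if (registry_used == MAX_CUSTOM)
        return -1;
    registry[registry_used].fn = fn;
    registry[registry_used].info = info;
    registry_used++;
    return 0;
}

/* Look up the name registered for `fn`, or NULL if none was registered. */
static const char *custom_name(pp_func fn)
{
    size_t i;
    for (i = 0; i < registry_used; i++)
        if (registry[i].fn == fn)
            return registry[i].info->name;
    return NULL;
}

/* Two ops of the same "type", distinguishable only by their address. */
static int pp_first(void)  { return 1; }
static int pp_second(void) { return 2; }
```

       Registering "pp_first" with a descriptor and then looking up
       "pp_second" yields no name, which is why each function you use needs
       its own registration.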
2770

AUTHORS

2772       Until May 1997, this document was maintained by Jeff Okamoto
2773       <okamoto@corp.hp.com>.  It is now maintained as part of Perl itself by
2774       the Perl 5 Porters <perl5-porters@perl.org>.
2775
2776       With lots of help and suggestions from Dean Roehrich, Malcolm Beattie,
2777       Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil
2778       Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer,
2779       Stephen McCamant, and Gurusamy Sarathy.
2780