perlguts(1)

PERLGUTS(1)            Perl Programmers Reference Guide            PERLGUTS(1)

NAME

       perlguts - Introduction to the Perl API

DESCRIPTION

       This document attempts to describe how to use the Perl API, as well as
       to provide some info on the basic workings of the Perl core. It is far
       from complete and probably contains many errors. Please refer any
       questions or comments to the author below.

Variables

       Datatypes

       Perl has three typedefs that handle Perl's three main data types:

           SV  Scalar Value
           AV  Array Value
           HV  Hash Value

       Each typedef has specific routines that manipulate the various data
       types.

       What is an "IV"?

       Perl uses a special typedef IV which is a simple signed integer type
       that is guaranteed to be large enough to hold a pointer (as well as an
       integer).  Additionally, there is the UV, which is simply an unsigned
       IV.

       Perl also uses two special typedefs, I32 and I16, which will always be
       at least 32 bits and 16 bits long, respectively. (Again, there are U32
       and U16, as well.)  They will usually be exactly 32 and 16 bits long,
       but on Crays they will both be 64 bits.

       Working with SVs

       An SV can be created and loaded with one command.  There are five types
       of values that can be loaded: an integer value (IV), an unsigned
       integer value (UV), a double (NV), a string (PV), and another scalar
       (SV).

       The seven routines are:

           SV*  newSViv(IV);
           SV*  newSVuv(UV);
           SV*  newSVnv(double);
           SV*  newSVpv(const char*, STRLEN);
           SV*  newSVpvn(const char*, STRLEN);
           SV*  newSVpvf(const char*, ...);
           SV*  newSVsv(SV*);

       "STRLEN" is an integer type (Size_t, usually defined as size_t in
       config.h) guaranteed to be large enough to represent the size of any
       string that perl can handle.

       In the unlikely case of an SV requiring more complex initialisation,
       you can create an empty SV with newSV(len).  If "len" is 0 an empty SV
       of type NULL is returned, else an SV of type PV is returned with len + 1
       (for the NUL) bytes of storage allocated, accessible via SvPVX.  In
       both cases the SV has value undef.

           SV *sv = newSV(0);   /* no storage allocated  */
           SV *sv = newSV(10);  /* 10 (+1) bytes of uninitialised storage allocated  */

       To change the value of an already-existing SV, there are eight
       routines:

           void  sv_setiv(SV*, IV);
           void  sv_setuv(SV*, UV);
           void  sv_setnv(SV*, double);
           void  sv_setpv(SV*, const char*);
           void  sv_setpvn(SV*, const char*, STRLEN);
           void  sv_setpvf(SV*, const char*, ...);
           void  sv_vsetpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
           void  sv_setsv(SV*, SV*);

       Notice that you can choose to specify the length of the string to be
       assigned by using "sv_setpvn", "newSVpvn", or "newSVpv", or you may
       allow Perl to calculate the length by using "sv_setpv" or by specifying
       0 as the second argument to "newSVpv".  Be warned, though, that Perl
       will determine the string's length by using "strlen", which depends on
       the string terminating with a NUL character.

       The arguments of "sv_setpvf" are processed like "sprintf", and the
       formatted output becomes the value.

       "sv_vsetpvfn" is an analogue of "vsprintf", but it allows you to
       specify either a pointer to a variable argument list or the address and
       length of an array of SVs.  The last argument points to a boolean; on
       return, if that boolean is true, then locale-specific information has
       been used to format the string, and the string's contents are therefore
       untrustworthy (see perlsec).  This pointer may be NULL if that
       information is not important.  Note that this function requires you to
       specify the length of the format.

       The "sv_set*()" functions are not generic enough to operate on values
       that have "magic".  See "Magic Virtual Tables" later in this document.

       All SVs that contain strings should be terminated with a NUL character.
       If a string is not NUL-terminated there is a risk of core dumps and
       corruptions from code which passes the string to C functions or system
       calls which expect a NUL-terminated string.  Perl's own functions
       typically add a trailing NUL for this reason.  Nevertheless, you should
       be very careful when you pass a string stored in an SV to a C function
       or system call.

       To access the actual value that an SV points to, you can use the
       macros:

           SvIV(SV*)
           SvUV(SV*)
           SvNV(SV*)
           SvPV(SV*, STRLEN len)
           SvPV_nolen(SV*)

       which will automatically coerce the actual scalar type into an IV, UV,
       double, or string.

       In the "SvPV" macro, the length of the string returned is placed into
       the variable "len" (this is a macro, so you do not use &len).  If you
       do not care what the length of the data is, use the "SvPV_nolen" macro.
       Historically the "SvPV" macro with the global variable "PL_na" has been
       used in this case.  But that can be quite inefficient because "PL_na"
       must be accessed in thread-local storage in threaded Perl.  In any
       case, remember that Perl allows arbitrary strings of data that may both
       contain NULs and might not be terminated by a NUL.

       Also remember that C doesn't allow you to safely say "foo(SvPV(s, len),
       len);". It might work with your compiler, but it won't work for
       everyone.  Break this sort of statement up into separate assignments:

           SV *s;
           STRLEN len;
           char * ptr;
           ptr = SvPV(s, len);
           foo(ptr, len);

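       The hazard behind this advice can be reproduced without Perl.  The
       following is a minimal sketch in plain C, not Perl's actual "SvPV":
       "toy_sv", "TOY_PV", "foo_len" and "safe_call" are hypothetical names
       invented for illustration.  "TOY_PV" imitates the one property that
       matters here: it returns a pointer and assigns the length to its
       second argument as a side effect.

```c
#include <stddef.h>

/* Hypothetical stand-in for an SV that holds a string; not Perl's layout. */
typedef struct {
    const char *pv;   /* string buffer */
    size_t      cur;  /* string length, excluding the trailing NUL */
} toy_sv;

/* Like SvPV, this macro yields the string pointer and, as a side effect,
 * assigns the length to its second argument. */
#define TOY_PV(sv, lenvar) ((lenvar) = (sv)->cur, (sv)->pv)

/* Hypothetical consumer that needs both the pointer and the length. */
static size_t foo_len(const char *ptr, size_t len)
{
    (void)ptr;
    return len;
}

/* Safe pattern: the macro runs in its own statement, so len has been
 * assigned before it is read as an argument. */
static size_t safe_call(const toy_sv *s)
{
    size_t len;
    const char *ptr;

    ptr = TOY_PV(s, len);
    return foo_len(ptr, len);
}

/* The unsafe form would be foo_len(TOY_PV(s, len), len).  C does not
 * specify the order in which function arguments are evaluated, so len
 * could be read before the macro has assigned it. */
```

       In this sketch "safe_call" always passes the length that belongs to
       the pointer, whichever order the compiler chooses for evaluating
       arguments.
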
       If you want to know if the scalar value is TRUE, you can use:

           SvTRUE(SV*)

       Although Perl will automatically grow strings for you, if you need to
       force Perl to allocate more memory for your SV, you can use the macro

           SvGROW(SV*, STRLEN newlen)

       which will determine if more memory needs to be allocated.  If so, it
       will call the function "sv_grow".  Note that "SvGROW" can only
       increase, not decrease, the allocated memory of an SV and that it does
       not automatically add a byte for a trailing NUL (perl's own string
       functions typically do "SvGROW(sv, len + 1)").

       If you have an SV and want to know what kind of data Perl thinks is
       stored in it, you can use the following macros to check the type of SV
       you have.

           SvIOK(SV*)
           SvNOK(SV*)
           SvPOK(SV*)

       You can get and set the current length of the string stored in an SV
       with the following macros:

           SvCUR(SV*)
           SvCUR_set(SV*, I32 val)

       You can also get a pointer to the end of the string stored in the SV
       with the macro:

           SvEND(SV*)

       But note that these last three macros are valid only if "SvPOK()" is
       true.

       If you want to append something to the end of the string stored in an
       "SV*", you can use the following functions:

           void  sv_catpv(SV*, const char*);
           void  sv_catpvn(SV*, const char*, STRLEN);
           void  sv_catpvf(SV*, const char*, ...);
           void  sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
           void  sv_catsv(SV*, SV*);

       The first function calculates the length of the string to be appended
       by using "strlen".  In the second, you specify the length of the string
       yourself.  The third function processes its arguments like "sprintf"
       and appends the formatted output.  The fourth function works like
       "vsprintf".  You can specify the address and length of an array of SVs
       instead of the va_list argument. The fifth function extends the string
       stored in the first SV with the string stored in the second SV.  It
       also forces the second SV to be interpreted as a string.

       The "sv_cat*()" functions are not generic enough to operate on values
       that have "magic".  See "Magic Virtual Tables" later in this document.

       If you know the name of a scalar variable, you can get a pointer to its
       SV by using the following:

           SV*  get_sv("package::varname", FALSE);

       This returns NULL if the variable does not exist.

       If you want to know if this variable (or any other SV) is actually
       "defined", you can call:

           SvOK(SV*)

       The scalar "undef" value is stored in an SV instance called
       "PL_sv_undef".

       Its address can be used whenever an "SV*" is needed. Make sure that you
       don't try to compare a random sv with &PL_sv_undef. For example, when
       interfacing Perl code, it'll work correctly for:

         foo(undef);

       But won't work when called as:

         $x = undef;
         foo($x);

       So, to repeat: always use SvOK() to check whether an sv is defined.

       Also you have to be careful when using &PL_sv_undef as a value in AVs
       or HVs (see "AVs, HVs and undefined values").

       There are also the two values "PL_sv_yes" and "PL_sv_no", which contain
       boolean TRUE and FALSE values, respectively.  Like "PL_sv_undef", their
       addresses can be used whenever an "SV*" is needed.

       Do not be fooled into thinking that "(SV *) 0" is the same as
       &PL_sv_undef.  Take this code:

           SV* sv = (SV*) 0;
           if (I-am-to-return-a-real-value) {
                   sv = sv_2mortal(newSViv(42));
           }
           sv_setsv(ST(0), sv);

       This code tries to return a new SV (which contains the value 42) if it
       should return a real value, or undef otherwise.  Instead it has
       returned a NULL pointer which, somewhere down the line, will cause a
       segmentation violation, bus error, or just weird results.  Change the
       zero to &PL_sv_undef in the first line and all will be well.

       To free an SV that you've created, call "SvREFCNT_dec(SV*)".  Normally
       this call is not necessary (see "Reference Counts and Mortality").

       Offsets

       Perl provides the function "sv_chop" to efficiently remove characters
       from the beginning of a string; you give it an SV and a pointer to
       somewhere inside the PV, and it discards everything before the pointer.
       The efficiency comes by means of a little hack: instead of actually
       removing the characters, "sv_chop" sets the flag "OOK" (offset OK) to
       signal to other functions that the offset hack is in effect, and it
       puts the number of bytes chopped off into the IV field of the SV. It
       then moves the PV pointer (called "SvPVX") forward that many bytes, and
       adjusts "SvCUR" and "SvLEN".

       Hence, at this point, the start of the buffer that we allocated lives
       at "SvPVX(sv) - SvIV(sv)" in memory and the PV pointer is pointing into
       the middle of this allocated storage.

       This is best demonstrated by example:

         % ./perl -Ilib -MDevel::Peek -le '$a="12345"; $a=~s/.//; Dump($a)'
         SV = PVIV(0x8128450) at 0x81340f0
           REFCNT = 1
           FLAGS = (POK,OOK,pPOK)
           IV = 1  (OFFSET)
           PV = 0x8135781 ( "1" . ) "2345"\0
           CUR = 4
           LEN = 5

       Here the number of bytes chopped off (1) is put into IV, and
       "Devel::Peek::Dump" helpfully reminds us that this is an offset. The
       portion of the string between the "real" and the "fake" beginnings is
       shown in parentheses, and the values of "SvCUR" and "SvLEN" reflect the
       fake beginning, not the real one.

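       The bookkeeping behind the offset hack can be sketched in plain C.
       This is a toy model written for illustration, not the code in sv.c:
       "toy_str", "toy_init", "toy_chop" and "toy_free" are hypothetical
       names, and the fields merely echo the roles of "SvPVX", "SvCUR",
       "SvLEN" and the IV slot described above.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy model of a string SV; this is not Perl's real structure. */
typedef struct {
    char  *pv;      /* visible start of the string (role of SvPVX) */
    size_t cur;     /* visible string length (role of SvCUR) */
    size_t len;     /* bytes available from pv onward (role of SvLEN) */
    size_t offset;  /* bytes chopped so far (role of the IV slot) */
} toy_str;

/* Allocate a toy string holding a copy of s.  Returns 0 on success. */
static int toy_init(toy_str *t, const char *s)
{
    size_t n = strlen(s);

    t->pv = malloc(n + 1);
    if (t->pv == NULL)
        return -1;
    memcpy(t->pv, s, n + 1);
    t->cur = n;
    t->len = n + 1;
    t->offset = 0;
    return 0;
}

/* Discard everything before ptr, which must point inside the string.
 * No bytes are moved; only the bookkeeping changes. */
static void toy_chop(toy_str *t, char *ptr)
{
    size_t delta = (size_t)(ptr - t->pv);

    t->offset += delta;
    t->pv     += delta;
    t->cur    -= delta;
    t->len    -= delta;
}

/* The allocation must be released from its real start, not from pv. */
static void toy_free(toy_str *t)
{
    free(t->pv - t->offset);
    t->pv = NULL;
}
```

       Initialising the model with "12345" and chopping one byte leaves the
       visible string "2345" with a length of 4, 5 bytes available and an
       offset of 1, which mirrors the CUR, LEN and IV lines of the dump
       above.
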
       Something similar to the offset hack is performed on AVs to enable
       efficient shifting and splicing off the beginning of the array; while
       "AvARRAY" points to the first element in the array that is visible from
       Perl, "AvALLOC" points to the real start of the C array. These are
       usually the same, but a "shift" operation can be carried out by
       increasing "AvARRAY" by one and decreasing "AvFILL" and "AvMAX".
       Again, the location of the real start of the C array only comes into
       play when freeing the array. See "av_shift" in av.c.

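       The same idea for arrays can be sketched as follows.  This is a toy
       model in plain C with hypothetical names ("toy_av", "toy_av_init",
       "toy_av_shift"), not the implementation in av.c; it uses an array of
       int rather than of "SV*" to stay self-contained.

```c
#include <stddef.h>

/* Toy model of an array; the struct is invented for illustration. */
typedef struct {
    int  *alloc;  /* real start of the C array (role of AvALLOC) */
    int  *array;  /* first element visible to the user (role of AvARRAY) */
    long  fill;   /* highest visible index, -1 when empty (role of AvFILL) */
    long  max;    /* highest usable index counted from array */
} toy_av;

/* Wrap caller-provided storage holding count elements. */
static void toy_av_init(toy_av *av, int *storage, long count)
{
    av->alloc = storage;
    av->array = storage;
    av->fill  = count - 1;
    av->max   = count - 1;
}

/* Remove and return the first element by advancing the visible start.
 * Nothing is copied.  The caller must ensure the array is not empty. */
static int toy_av_shift(toy_av *av)
{
    int first = av->array[0];

    av->array++;
    av->fill--;
    av->max--;
    return first;
}
```

       After one shift, "array" is one element past "alloc"; only "alloc"
       still identifies the start of the storage.
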
       What's Really Stored in an SV?

       Recall that the usual method of determining the type of scalar you have
       is to use "Sv*OK" macros.  Because a scalar can be both a number and a
       string, these macros will usually return TRUE, and calling the
       "Sv*V" macros will do the appropriate conversion of string to
       integer/double or integer/double to string.

       If you really need to know if you have an integer, double, or string
       pointer in an SV, you can use the following three macros instead:

           SvIOKp(SV*)
           SvNOKp(SV*)
           SvPOKp(SV*)

       These will tell you if you truly have an integer, double, or string
       pointer stored in your SV.  The "p" stands for private.

       There are various ways in which the private and public flags may
       differ.  For example, a tied SV may have a valid underlying value in
       the IV slot (so SvIOKp is true), but the data should be accessed via
       the FETCH routine rather than directly, so SvIOK is false. Another is
       when numeric conversion has occurred and precision has been lost: only
       the private flag is set on 'lossy' values. So when an NV is converted
       to an IV with loss, SvIOKp, SvNOKp and SvNOK will be set, while SvIOK
       won't be.

       In general, though, it's best to use the "Sv*V" macros.

       Working with AVs

       There are two ways to create and load an AV.  The first method creates
       an empty AV:

           AV*  newAV();

       The second method both creates the AV and initially populates it with
       SVs:

           AV*  av_make(I32 num, SV **ptr);

       The second argument points to an array containing "num" "SV*"'s.  Once
       the AV has been created, the SVs can be destroyed, if so desired.

       Once the AV has been created, the following operations are possible on
       AVs:

           void  av_push(AV*, SV*);
           SV*   av_pop(AV*);
           SV*   av_shift(AV*);
           void  av_unshift(AV*, I32 num);

       These should be familiar operations, with the exception of
       "av_unshift".  This routine adds "num" elements at the front of the
       array with the "undef" value.  You must then use "av_store" (described
       below) to assign values to these new elements.

       Here are some other functions:

           I32   av_len(AV*);
           SV**  av_fetch(AV*, I32 key, I32 lval);
           SV**  av_store(AV*, I32 key, SV* val);

       The "av_len" function returns the highest index value in the array
       (just like $#array in Perl).  If the array is empty, -1 is returned.
       The "av_fetch" function returns the value at index "key", but if "lval"
       is non-zero, then "av_fetch" will store an undef value at that index.
       The "av_store" function stores the value "val" at index "key", and does
       not increment the reference count of "val".  Thus the caller is
       responsible for taking care of that, and if "av_store" returns NULL,
       the caller will have to decrement the reference count to avoid a memory
       leak.  Note that "av_fetch" and "av_store" both return "SV**"'s, not
       "SV*"'s as their return value.

           void  av_clear(AV*);
           void  av_undef(AV*);
           void  av_extend(AV*, I32 key);

       The "av_clear" function deletes all the elements in the AV* array, but
       does not actually delete the array itself.  The "av_undef" function
       will delete all the elements in the array plus the array itself.  The
       "av_extend" function extends the array so that it contains at least
       "key+1" elements.  If "key+1" is less than the currently allocated
       length of the array, then nothing is done.

       If you know the name of an array variable, you can get a pointer to its
       AV by using the following:

           AV*  get_av("package::varname", FALSE);

       This returns NULL if the variable does not exist.

       See "Understanding the Magic of Tied Hashes and Arrays" for more
       information on how to use the array access functions on tied arrays.

       Working with HVs

       To create an HV, you use the following routine:

           HV*  newHV();

       Once the HV has been created, the following operations are possible on
       HVs:

           SV**  hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
           SV**  hv_fetch(HV*, const char* key, U32 klen, I32 lval);

       The "klen" parameter is the length of the key being passed in (note
       that you cannot pass 0 in as a value of "klen" to tell Perl to measure
       the length of the key).  The "val" argument contains the SV pointer to
       the scalar being stored, and "hash" is the precomputed hash value (zero
       if you want "hv_store" to calculate it for you).  The "lval" parameter
       indicates whether this fetch is actually a part of a store operation,
       in which case a new undefined value will be added to the HV with the
       supplied key and "hv_fetch" will return as if the value had already
       existed.

       Remember that "hv_store" and "hv_fetch" return "SV**"'s and not just
       "SV*".  To access the scalar value, you must first dereference the
       return value.  However, you should check to make sure that the return
       value is not NULL before dereferencing it.

       The following two functions check whether a hash table entry exists
       and delete it, respectively.

           bool  hv_exists(HV*, const char* key, U32 klen);
           SV*   hv_delete(HV*, const char* key, U32 klen, I32 flags);

       If "flags" does not include the "G_DISCARD" flag then "hv_delete" will
       create and return a mortal copy of the deleted value.

       And more miscellaneous functions:

           void   hv_clear(HV*);
           void   hv_undef(HV*);

       Like their AV counterparts, "hv_clear" deletes all the entries in the
       hash table but does not actually delete the hash table.  The "hv_undef"
       deletes both the entries and the hash table itself.

       Perl keeps the actual data in a linked list of structures with a
       typedef of HE.  These contain the actual key and value pointers (plus
       extra administrative overhead).  The key is a string pointer; the value
       is an "SV*".  However, once you have an "HE*", to get the actual key
       and value, use the routines specified below.

           I32    hv_iterinit(HV*);
                   /* Prepares starting point to traverse hash table */
           HE*    hv_iternext(HV*);
                   /* Get the next entry, and return a pointer to a
                      structure that has both the key and value */
           char*  hv_iterkey(HE* entry, I32* retlen);
                   /* Get the key from an HE structure and also return
                      the length of the key string */
           SV*    hv_iterval(HV*, HE* entry);
                   /* Return an SV pointer to the value of the HE
                      structure */
           SV*    hv_iternextsv(HV*, char** key, I32* retlen);
                   /* This convenience routine combines hv_iternext,
                      hv_iterkey, and hv_iterval.  The key and retlen
                      arguments are return values for the key and its
                      length.  The value is returned as the SV* result */

       If you know the name of a hash variable, you can get a pointer to its
       HV by using the following:

           HV*  get_hv("package::varname", FALSE);

       This returns NULL if the variable does not exist.

       The hash algorithm is defined in the "PERL_HASH(hash, key, klen)"
       macro:

           hash = 0;
           while (klen--)
               hash = (hash * 33) + *key++;
           hash = hash + (hash >> 5);                  /* after 5.6 */

       The last step was added in version 5.6 to improve distribution of lower
       bits in the resulting hash value.

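       To experiment with the algorithm outside of Perl, the steps above can
       be wrapped in a function.  This is a sketch under stated assumptions,
       not the macro from the Perl headers: "toy_perl_hash" is a hypothetical
       name, "uint32_t" stands in for U32, and the key bytes are read as
       "unsigned char".

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of the documented algorithm.  uint32_t stands in for U32 and the
 * bytes are read as unsigned char; both choices are assumptions. */
static uint32_t toy_perl_hash(const char *key, size_t klen)
{
    const unsigned char *p = (const unsigned char *)key;
    uint32_t hash = 0;

    while (klen--)
        hash = (hash * 33) + *p++;
    hash = hash + (hash >> 5);   /* final step, added in version 5.6 */
    return hash;
}
```

       For the one-byte key "a" (byte value 97 in ASCII) the loop yields 97
       and the final step adds 97 >> 5 = 3, so the function returns 100.
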
       See "Understanding the Magic of Tied Hashes and Arrays" for more
       information on how to use the hash access functions on tied hashes.

       Hash API Extensions

       Beginning with version 5.004, the following functions are also
       supported:

           HE*     hv_fetch_ent  (HV* tb, SV* key, I32 lval, U32 hash);
           HE*     hv_store_ent  (HV* tb, SV* key, SV* val, U32 hash);

           bool    hv_exists_ent (HV* tb, SV* key, U32 hash);
           SV*     hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);

           SV*     hv_iterkeysv  (HE* entry);

       Note that these functions take "SV*" keys, which simplifies writing of
       extension code that deals with hash structures.  These functions also
       allow passing of "SV*" keys to "tie" functions without forcing you to
       stringify the keys (unlike the previous set of functions).

       They also return and accept whole hash entries ("HE*"), making their
       use more efficient (since the hash number for a particular string
       doesn't have to be recomputed every time).  See perlapi for detailed
       descriptions.

       The following macros must always be used to access the contents of hash
       entries.  Note that the arguments to these macros must be simple
       variables, since they may get evaluated more than once.  See perlapi
       for detailed descriptions of these macros.

           HePV(HE* he, STRLEN len)
           HeVAL(HE* he)
           HeHASH(HE* he)
           HeSVKEY(HE* he)
           HeSVKEY_force(HE* he)
           HeSVKEY_set(HE* he, SV* sv)

       These two lower level macros are defined, but must only be used when
       dealing with keys that are not "SV*"s:

           HeKEY(HE* he)
           HeKLEN(HE* he)

       Note that both "hv_store" and "hv_store_ent" do not increment the
       reference count of the stored "val", which is the caller's
       responsibility.  If these functions return a NULL value, the caller
       will usually have to decrement the reference count of "val" to avoid a
       memory leak.

       AVs, HVs and undefined values

       Sometimes you have to store undefined values in AVs or HVs. Although
       this may be a rare case, it can be tricky. That's because you're used
       to using &PL_sv_undef if you need an undefined SV.

       For example, intuition tells you that this XS code:

           AV *av = newAV();
           av_store( av, 0, &PL_sv_undef );

       is equivalent to this Perl code:

           my @av;
           $av[0] = undef;

       Unfortunately, this isn't true. AVs use &PL_sv_undef as a marker for
       indicating that an array element has not yet been initialized.  Thus,
       "exists $av[0]" would be true for the above Perl code, but false for
       the array generated by the XS code.

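       The difference between "exists but undefined" and "never initialised"
       can be modelled in a few lines of plain C.  This is only a toy sketch
       and not how Perl implements arrays: "toy_sv", "toy_undef",
       "toy_exists" and "toy_defined" are hypothetical names, with
       "toy_undef" playing the role of the single shared "PL_sv_undef".

```c
#include <stddef.h>

/* Toy scalar: just a defined flag and a value. */
typedef struct {
    int defined;
    int value;
} toy_sv;

/* One shared "undef" object, playing the role of PL_sv_undef. */
static toy_sv toy_undef = { 0, 0 };

/* Toy "exists": a slot holding the shared undef object (or nothing at all)
 * is treated as never having been initialised. */
static int toy_exists(toy_sv *const *slots, size_t n, size_t i)
{
    return i < n && slots[i] != NULL && slots[i] != &toy_undef;
}

/* Toy "defined": inspects the value itself, in the spirit of SvOK. */
static int toy_defined(const toy_sv *sv)
{
    return sv != NULL && sv->defined;
}
```

       In this model a slot that points at "toy_undef" and a slot that points
       at a freshly created undefined scalar are both reported as undefined,
       but only the second one is reported as existing.
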
       Other problems can occur when storing &PL_sv_undef in HVs:

           hv_store( hv, "key", 3, &PL_sv_undef, 0 );

       This will indeed make the value "undef", but if you try to modify the
       value of "key", you'll get the following error:

           Modification of non-creatable hash value attempted

       In perl 5.8.0, &PL_sv_undef was also used to mark placeholders in
       restricted hashes. This caused such hash entries not to appear when
       iterating over the hash or when checking for the keys with the
       "hv_exists" function.

       You can run into similar problems when you store &PL_sv_yes or
       &PL_sv_no into AVs or HVs. Trying to modify such elements will give
       you the following error:

           Modification of a read-only value attempted

       To make a long story short, you can use the special variables
       &PL_sv_undef, &PL_sv_yes and &PL_sv_no with AVs and HVs, but you
       have to make sure you know what you're doing.

       Generally, if you want to store an undefined value in an AV or HV, you
       should not use &PL_sv_undef, but rather create a new undefined value
       using the "newSV" function, for example:

           av_store( av, 42, newSV(0) );
           hv_store( hv, "foo", 3, newSV(0), 0 );

       References

       References are a special type of scalar that point to other data types
       (including references).

       To create a reference, use either of the following functions:

           SV* newRV_inc((SV*) thing);
           SV* newRV_noinc((SV*) thing);

       The "thing" argument can be any of an "SV*", "AV*", or "HV*".  The
       functions are identical except that "newRV_inc" increments the
       reference count of the "thing", while "newRV_noinc" does not.  For
       historical reasons, "newRV" is a synonym for "newRV_inc".

       Once you have a reference, you can use the following macro to
       dereference the reference:

           SvRV(SV*)

       then call the appropriate routines, casting the returned "SV*" to
       either an "AV*" or "HV*", if required.

       To determine if an SV is a reference, you can use the following macro:

           SvROK(SV*)

       To discover what type of value the reference refers to, use the
       following macro and then check the return value.

           SvTYPE(SvRV(SV*))

       The most useful types that will be returned are:

           SVt_IV    Scalar
           SVt_NV    Scalar
           SVt_PV    Scalar
           SVt_RV    Scalar
           SVt_PVAV  Array
           SVt_PVHV  Hash
           SVt_PVCV  Code
           SVt_PVGV  Glob (possibly a file handle)
           SVt_PVMG  Blessed or Magical Scalar

           See the sv.h header file for more details.

       Blessed References and Class Objects

       References are also used to support object-oriented programming.  In
       perl's OO lexicon, an object is simply a reference that has been
       blessed into a package (or class).  Once blessed, the programmer may
       now use the reference to access the various methods in the class.

       A reference can be blessed into a package with the following function:

           SV* sv_bless(SV* sv, HV* stash);

       The "sv" argument must be a reference value.  The "stash" argument
       specifies which class the reference will belong to.  See "Stashes and
       Globs" for information on converting class names into stashes.

       /* Still under construction */

       Upgrades rv to reference if not already one.  Creates new SV for rv to
       point to.  If "classname" is non-null, the SV is blessed into the
       specified class.  SV is returned.

               SV* newSVrv(SV* rv, const char* classname);

       Copies integer, unsigned integer or double into an SV whose reference
       is "rv".  SV is blessed if "classname" is non-null.

               SV* sv_setref_iv(SV* rv, const char* classname, IV iv);
               SV* sv_setref_uv(SV* rv, const char* classname, UV uv);
               SV* sv_setref_nv(SV* rv, const char* classname, NV nv);

       Copies the pointer value (the address, not the string!) into an SV
       whose reference is rv.  SV is blessed if "classname" is non-null.

               SV* sv_setref_pv(SV* rv, const char* classname, void* pv);

       Copies string into an SV whose reference is "rv".  Set length to 0 to
       let Perl calculate the string length.  SV is blessed if "classname" is
       non-null.

               SV* sv_setref_pvn(SV* rv, const char* classname, char* pv, STRLEN length);

       Tests whether the SV is blessed into the specified class.  It does not
       check inheritance relationships.

               int  sv_isa(SV* sv, const char* name);

       Tests whether the SV is a reference to a blessed object.

               int  sv_isobject(SV* sv);

       Tests whether the SV is derived from the specified class. SV can be
       either a reference to a blessed object or a string containing a class
       name. This is the function implementing the "UNIVERSAL::isa"
       functionality.

               bool sv_derived_from(SV* sv, const char* name);

       To check if you've got an object derived from a specific class you have
       to write:

               if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... }

       Creating New Variables

       To create a new Perl variable with an undef value which can be accessed
       from your Perl script, use the following routines, depending on the
       variable type.

           SV*  get_sv("package::varname", TRUE);
           AV*  get_av("package::varname", TRUE);
           HV*  get_hv("package::varname", TRUE);

       Notice the use of TRUE as the second parameter.  The new variable can
       now be set, using the routines appropriate to the data type.

       There are additional macros whose values may be bitwise OR'ed with the
       "TRUE" argument to enable certain extra features.  Those bits are:

       GV_ADDMULTI
           Marks the variable as multiply defined, thus preventing the:

             Name <varname> used only once: possible typo

           warning.

       GV_ADDWARN
           Issues the warning:

             Had to create <varname> unexpectedly

           if the variable did not exist before the function was called.

       If you do not specify a package name, the variable is created in the
       current package.

714       Reference Counts and Mortality
715
716       Perl uses a reference count-driven garbage collection mechanism. SVs,
717       AVs, or HVs (xV for short in the following) start their life with a
718       reference count of 1.  If the reference count of an xV ever drops to 0,
719       then it will be destroyed and its memory made available for reuse.
720
721       This normally doesn't happen at the Perl level unless a variable is
722       undef'ed or the last variable holding a reference to it is changed or
723       overwritten.  At the internal level, however, reference counts can be
724       manipulated with the following macros:
725
726           int SvREFCNT(SV* sv);
727           SV* SvREFCNT_inc(SV* sv);
728           void SvREFCNT_dec(SV* sv);
729
730       However, there is one other function which manipulates the reference
731       count of its argument.  The "newRV_inc" function, you will recall, cre‐
732       ates a reference to the specified argument.  As a side effect, it
733       increments the argument's reference count.  If this is not what you
734       want, use "newRV_noinc" instead.
735
736       For example, imagine you want to return a reference from an XSUB func‐
737       tion.  Inside the XSUB routine, you create an SV which initially has a
738       reference count of one.  Then you call "newRV_inc", passing it the
739       just-created SV.  This returns the reference as a new SV, but the ref‐
740       erence count of the SV you passed to "newRV_inc" has been incremented
741       to two.  Now you return the reference from the XSUB routine and forget
742       about the SV.  But Perl hasn't!  Whenever the returned reference is
743       destroyed, the reference count of the original SV is decreased to one
744       and nothing happens.  The SV will hang around without any way to access
745       it until Perl itself terminates.  This is a memory leak.
746
747       The correct procedure, then, is to use "newRV_noinc" instead of
748       "newRV_inc".  Then, if and when the last reference is destroyed, the
749       reference count of the SV will go to zero and it will be destroyed,
750       stopping any memory leak.
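
       The difference between the two constructors can be sketched with a
       toy model in plain C.  This is only an illustration under simplifying
       assumptions, not the interpreter's implementation: "toy_sv" and every
       "toy_" function are hypothetical names, and a toy value carries
       nothing but a reference count, an optional target, and a flag that
       records its destruction.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for an SV: only the reference count and, for a
 * reference, the thing it points at.  Not the real Perl structure. */
typedef struct toy_sv {
    int refcnt;
    struct toy_sv *target;   /* non-NULL when this toy SV is a reference */
    int *freed_flag;         /* set to 1 when the toy SV is destroyed */
} toy_sv;

static toy_sv *toy_new(int *freed_flag) {
    toy_sv *sv = calloc(1, sizeof *sv);
    assert(sv != NULL);
    sv->refcnt = 1;              /* every new value starts at 1 */
    sv->freed_flag = freed_flag;
    return sv;
}

static void toy_dec(toy_sv *sv) {
    if (sv == NULL) return;
    assert(sv->refcnt > 0);
    if (--sv->refcnt == 0) {
        toy_dec(sv->target);     /* a dying reference releases its target */
        if (sv->freed_flag) *sv->freed_flag = 1;
        free(sv);
    }
}

/* Models newRV_inc: the new reference takes an extra count on target. */
static toy_sv *toy_newrv_inc(toy_sv *target) {
    toy_sv *rv = toy_new(NULL);
    rv->target = target;
    target->refcnt++;
    return rv;
}

/* Models newRV_noinc: the reference adopts the caller's existing count. */
static toy_sv *toy_newrv_noinc(toy_sv *target) {
    toy_sv *rv = toy_new(NULL);
    rv->target = target;
    return rv;
}
```

       In this model, destroying a reference made with "toy_newrv_inc"
       leaves the target at a count of one, so the target survives unless
       the creator also drops its own count.  Destroying a reference made
       with "toy_newrv_noinc" takes the target to zero and frees it.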
751
752       There are some convenience functions available that can help with the
753       destruction of xVs.  These functions introduce the concept of "mortal‐
754       ity".  An xV that is mortal has had its reference count marked to be
755       decremented, but not actually decremented, until "a short time later".
756       Generally the term "short time later" means a single Perl statement,
757       such as a call to an XSUB function.  The actual determinant for when
758       mortal xVs have their reference count decremented depends on two
759       macros, SAVETMPS and FREETMPS.  See perlcall and perlxs for more
760       details on these macros.
761
762       "Mortalization" then is at its simplest a deferred "SvREFCNT_dec".
763       However, if you mortalize a variable twice, the reference count will
764       later be decremented twice.
765
766       "Mortal" SVs are mainly used for SVs that are placed on perl's stack.
767       For example an SV which is created just to pass a number to a called
768       sub is made mortal to have it cleaned up automatically when it's popped
769       off the stack. Similarly, results returned by XSUBs (which are pushed
770       on the stack) are often made mortal.
771
772       To create a mortal variable, use the functions:
773
774           SV*  sv_newmortal()
775           SV*  sv_2mortal(SV*)
776           SV*  sv_mortalcopy(SV*)
777
778       The first call creates a mortal SV (with no value), the second converts
779       an existing SV to a mortal SV (and thus defers a call to
780       "SvREFCNT_dec"), and the third creates a mortal copy of an existing SV.
781       Because "sv_newmortal" gives the new SV no value, it must normally be
782       given one via "sv_setpv", "sv_setiv", etc.:
783
784           SV *tmp = sv_newmortal();
785           sv_setiv(tmp, an_integer);
786
787       Because that takes multiple C statements, it is quite common to see
788       this idiom instead:
789
790           SV *tmp = sv_2mortal(newSViv(an_integer));
791
792       You should be careful about creating mortal variables.  Strange things
793       can happen if you make the same value mortal within multiple contexts,
794       or if you make a variable mortal multiple times. Thinking of "Mortal‐
795       ization" as deferred "SvREFCNT_dec" should help to minimize such prob‐
796       lems.  For example if you are passing an SV which you know has high
797       enough REFCNT to survive its use on the stack you need not do any mor‐
798       talization.  If you are not sure then doing an "SvREFCNT_inc" and
799       "sv_2mortal", or making a "sv_mortalcopy" is safer.
800
801       The mortal routines are not just for SVs -- AVs and HVs can be made
802       mortal by passing their address (cast to "SV*") to the
803       "sv_2mortal" or "sv_mortalcopy" routines.
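
       The idea of mortalization as a deferred decrement can be sketched
       with a small standalone model.  It rests on assumptions made for
       illustration only: the fixed-size array, the "toy_" names, and the
       way the floor is returned to the caller are hypothetical and do not
       reflect how the interpreter stores its temporaries.

```c
#include <assert.h>

#define TOY_TMPS_MAX 16

typedef struct { int refcnt; } toy_sv;

/* Toy temporaries stack: pointers whose decrement has been deferred. */
static toy_sv *toy_tmps[TOY_TMPS_MAX];
static int toy_tmps_top = 0;    /* number of pending entries */
static int toy_tmps_floor = 0;  /* entries below this survive toy_freetmps */

/* Models sv_2mortal: schedule one deferred decrement, return the value. */
static toy_sv *toy_2mortal(toy_sv *sv) {
    assert(toy_tmps_top < TOY_TMPS_MAX);
    toy_tmps[toy_tmps_top++] = sv;
    return sv;
}

/* Models SAVETMPS: temporaries created before this point are protected.
 * Returns the old floor so the caller can restore it afterwards. */
static int toy_savetmps(void) {
    int old_floor = toy_tmps_floor;
    toy_tmps_floor = toy_tmps_top;
    return old_floor;
}

/* Models FREETMPS: perform every deferred decrement above the floor. */
static void toy_freetmps(void) {
    while (toy_tmps_top > toy_tmps_floor) {
        toy_sv *sv = toy_tmps[--toy_tmps_top];
        sv->refcnt--;   /* a real interpreter would free the value at 0 */
    }
}
```

       A value passed to "toy_2mortal" twice appears on the toy stack
       twice, so "toy_freetmps" lowers its count by two.  That is why
       mortalizing the same value more than once is only safe when its
       reference count has been raised to match.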
804
805       Stashes and Globs
806
807       A stash is a hash that contains all variables that are defined within a
808       package.  Each key of the stash is a symbol name (shared by all the
809       different types of objects that have the same name), and each value in
810       the hash table is a GV (Glob Value).  This GV in turn contains refer‐
811       ences to the various objects of that name, including (but not limited
812       to) the following:
813
814           Scalar Value
815           Array Value
816           Hash Value
817           I/O Handle
818           Format
819           Subroutine
820
821       There is a single stash called "PL_defstash" that holds the items that
822       exist in the "main" package.  To get at the items in other packages,
823       append the string "::" to the package name.  The items in the "Foo"
824       package are in the stash "Foo::" in PL_defstash.  The items in the
825       "Bar::Baz" package are in the stash "Baz::" in "Bar::"'s stash.
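
       The nesting rule can be made concrete with a helper that computes
       the stash key for each level of a package name.  This is a sketch
       in plain C; "toy_stash_key" is a hypothetical function, not part of
       the Perl API, and it only manipulates strings rather than looking
       anything up in a real stash.

```c
#include <assert.h>
#include <string.h>

/* Copies the stash key for component number `index` (0-based) of a
 * package name into `out`, for example "Bar::" and then "Baz::" for
 * "Bar::Baz".  Returns 1 on success, 0 if there is no such component
 * or `out` is too small.  Hypothetical helper, not a Perl API call. */
static int toy_stash_key(const char *package, int index,
                         char *out, size_t outlen) {
    const char *start = package;
    assert(package != NULL && out != NULL && index >= 0);
    for (;;) {
        const char *sep = strstr(start, "::");
        size_t len = sep ? (size_t)(sep - start) : strlen(start);
        if (len == 0)
            return 0;                 /* empty component: nothing to look up */
        if (index == 0) {
            if (len + 3 > outlen)     /* component + "::" + NUL */
                return 0;
            memcpy(out, start, len);
            memcpy(out + len, "::", 3);
            return 1;
        }
        if (sep == NULL)
            return 0;                 /* ran out of components */
        start = sep + 2;
        index--;
    }
}
```

       For "Bar::Baz" the helper yields "Bar::" at index 0, the key to
       look up in PL_defstash, and "Baz::" at index 1, the key to look up
       in the stash found by the first step.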
826
827       To get the stash pointer for a particular package, use the function:
828
829           HV*  gv_stashpv(const char* name, I32 create)
830           HV*  gv_stashsv(SV*, I32 create)
831
832       The first function takes a literal string, the second uses the string
833       stored in the SV.  Remember that a stash is just a hash table, so you
834       get back an "HV*".  The "create" flag will create a new package if it
835       is set.
836
837       The name that "gv_stash*v" wants is the name of the package whose sym‐
838       bol table you want.  The default package is called "main".  If you have
839       multiply nested packages, pass their names to "gv_stash*v", separated
840       by "::" as in the Perl language itself.
841
842       Alternately, if you have an SV that is a blessed reference, you can
843       find out the stash pointer by using:
844
845           HV*  SvSTASH(SvRV(SV*));
846
847       then use the following to get the package name itself:
848
849           char*  HvNAME(HV* stash);
850
851       If you need to bless or re-bless an object you can use the following
852       function:
853
854           SV*  sv_bless(SV*, HV* stash)
855
856       where the first argument, an "SV*", must be a reference, and the second
857       argument is a stash.  The returned "SV*" can now be used in the same
858       way as any other SV.
859
860       For more information on references and blessings, consult perlref.
861
862       Double-Typed SVs
863
864       Scalar variables normally contain only one type of value, an integer,
865       double, pointer, or reference.  Perl will automatically convert the
866       actual scalar data from the stored type into the requested type.
867
868       Some scalar variables contain more than one type of scalar data.  For
869       example, the variable $! contains either the numeric value of "errno"
870       or its string equivalent from either "strerror" or "sys_errlist[]".
871
872       To force multiple data values into an SV, you must do two things: use
873       the "sv_set*v" routines to add the additional scalar type, then set a
874       flag so that Perl will believe it contains more than one type of data.
875       The four macros to set the flags are:
876
877               SvIOK_on
878               SvNOK_on
879               SvPOK_on
880               SvROK_on
881
882       The particular macro you must use depends on which "sv_set*v" routine
883       you called first.  This is because every "sv_set*v" routine turns on
884       only the bit for the particular type of data being set, and turns off
885       all the rest.
886
887       For example, to create a new Perl variable called "dberror" that con‐
888       tains both the numeric and descriptive string error values, you could
889       use the following code:
890
891           extern int  dberror;
892           extern char *dberror_list[];
893
894           SV* sv = get_sv("dberror", TRUE);
895           sv_setiv(sv, (IV) dberror);
896           sv_setpv(sv, dberror_list[dberror]);
897           SvIOK_on(sv);
898
899       If the order of "sv_setiv" and "sv_setpv" had been reversed, then the
900       macro "SvPOK_on" would need to be called instead of "SvIOK_on".
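
       Why the extra flag macro is needed can be seen in a toy model of
       the flag handling.  The sketch below assumes just two flag bits and
       a fixed string buffer; "toy_sv", "TOY_IOK", "TOY_POK" and the
       "toy_" functions are hypothetical and much simpler than the real
       SV flags.

```c
#include <assert.h>
#include <string.h>

/* Toy flag bits standing in for the IOK and POK state of an SV. */
enum { TOY_IOK = 1 << 0, TOY_POK = 1 << 1 };

typedef struct {
    unsigned flags;
    long iv;
    char pv[64];
} toy_sv;

/* Models sv_setiv: store the integer and leave only the integer bit on. */
static void toy_setiv(toy_sv *sv, long iv) {
    sv->iv = iv;
    sv->flags = TOY_IOK;
}

/* Models sv_setpv: store the string and leave only the string bit on. */
static void toy_setpv(toy_sv *sv, const char *pv) {
    assert(strlen(pv) < sizeof sv->pv);
    strcpy(sv->pv, pv);
    sv->flags = TOY_POK;
}

/* Models SvIOK_on: declare the stored integer valid again. */
static void toy_iok_on(toy_sv *sv) {
    sv->flags |= TOY_IOK;
}
```

       After "toy_setiv" followed by "toy_setpv" the integer is still in
       the structure but its flag is off.  Calling "toy_iok_on" turns the
       flag back on so that both values count as valid.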
901
902       Magic Variables
903
904       [This section still under construction.  Ignore everything here.  Post
905       no bills.  Everything not permitted is forbidden.]
906
907       Any SV may be magical, that is, it has special features that a normal
908       SV does not have.  These features are stored in the SV structure in a
909       linked list of "struct magic"'s, typedef'ed to "MAGIC".
910
911           struct magic {
912               MAGIC*      mg_moremagic;
913               MGVTBL*     mg_virtual;
914               U16         mg_private;
915               char        mg_type;
916               U8          mg_flags;
917               SV*         mg_obj;
918               char*       mg_ptr;
919               I32         mg_len;
920           };
921
922       Note this is current as of patchlevel 0, and could change at any time.
923
924       Assigning Magic
925
926       Perl adds magic to an SV using the sv_magic function:
927
928           void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen);
929
930       The "sv" argument is a pointer to the SV that is to acquire a new magi‐
931       cal feature.
932
933       If "sv" is not already magical, Perl uses the "SvUPGRADE" macro to con‐
934       vert "sv" to type "SVt_PVMG". Perl then continues by adding new magic
935       to the beginning of the linked list of magical features.  Any prior
936       entry of the same type of magic is deleted.  Note that this can be
937       overridden, and multiple instances of the same type of magic can be
938       associated with an SV.
939
940       The "name" and "namlen" arguments are used to associate a string with
941       the magic, typically the name of a variable. "namlen" is stored in the
942       "mg_len" field and if "name" is non-null then either a "savepvn" copy
943       of "name" or "name" itself is stored in the "mg_ptr" field, depending
944       on whether "namlen" is greater than zero or equal to zero respectively.
945       As a special case, if "(name && namlen == HEf_SVKEY)" then "name" is
946       assumed to contain an "SV*" and is stored as-is with its REFCNT incre‐
947       mented.
948
949       The sv_magic function uses "how" to determine which, if any, predefined
950       "Magic Virtual Table" should be assigned to the "mg_virtual" field.
951       See the "Magic Virtual Tables" section below.  The "how" argument is
952       also stored in the "mg_type" field. The value of "how" should be chosen
953       from the set of macros "PERL_MAGIC_foo" found in perl.h. Note that
954       before these macros were added, Perl internals used to directly use
955       character literals, so you may occasionally come across old code or
956       documentation referring to 'U' magic rather than "PERL_MAGIC_uvar" for
957       example.
958
959       The "obj" argument is stored in the "mg_obj" field of the "MAGIC"
960       structure.  If it is not the same as the "sv" argument, the reference
961       count of the "obj" object is incremented.  If it is the same, or if the
962       "how" argument is "PERL_MAGIC_arylen", or if it is a NULL pointer, then
963       "obj" is merely stored, without the reference count being incremented.
964
965       See also "sv_magicext" in perlapi for a more flexible way to add magic
966       to an SV.
967
968       There is also a function to add magic to an "HV":
969
970           void hv_magic(HV *hv, GV *gv, int how);
971
972       This simply calls "sv_magic" and coerces the "gv" argument into an
973       "SV".
974
975       To remove the magic from an SV, call the function sv_unmagic:
976
977           void sv_unmagic(SV *sv, int type);
978
979       The "type" argument should be equal to the "how" value when the "SV"
980       was initially made magical.
981
982       Magic Virtual Tables
983
984       The "mg_virtual" field in the "MAGIC" structure is a pointer to an
985       "MGVTBL" ("Magic Virtual Table"), a structure of function pointers
986       that handle the various operations that might be applied to that
987       variable.
988
989       The "MGVTBL" has five pointers to the following routine types:
990
991           int  (*svt_get)(SV* sv, MAGIC* mg);
992           int  (*svt_set)(SV* sv, MAGIC* mg);
993           U32  (*svt_len)(SV* sv, MAGIC* mg);
994           int  (*svt_clear)(SV* sv, MAGIC* mg);
995           int  (*svt_free)(SV* sv, MAGIC* mg);
996
997       This MGVTBL structure is set at compile-time in perl.h and there are
998       currently 19 types (or 21 with overloading turned on).  These different
999       structures contain pointers to various routines that perform additional
1000       actions depending on which function is being called.
1001
1002           Function pointer    Action taken
1003           ----------------    ------------
1004           svt_get             Do something before the value of the SV is retrieved.
1005           svt_set             Do something after the SV is assigned a value.
1006           svt_len             Report on the SV's length.
1007           svt_clear           Clear something the SV represents.
1008           svt_free            Free any extra storage associated with the SV.
1009
1010       For instance, the MGVTBL structure called "vtbl_sv" (which corresponds
1011       to an "mg_type" of "PERL_MAGIC_sv") contains:
1012
1013           { magic_get, magic_set, magic_len, 0, 0 }
1014
1015       Thus, when an SV is determined to be magical and of type
1016       "PERL_MAGIC_sv", if a get operation is being performed, the routine
1017       "magic_get" is called.  All the various routines for the various magi‐
1018       cal types begin with "magic_".  NOTE: the magic routines are not con‐
1019       sidered part of the Perl API, and may not be exported by the Perl
1020       library.
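
       The dispatch through a virtual table can be sketched in standalone
       C.  The model below is an assumption-laden simplification: it keeps
       only the get and set slots, stores a plain "long" as the value, and
       uses hypothetical "toy_" names throughout.  It does not call the
       real "magic_" routines.

```c
#include <assert.h>
#include <stddef.h>

struct toy_sv;
struct toy_magic;

/* Toy counterpart of MGVTBL, reduced to the get and set slots. */
typedef struct {
    int (*get)(struct toy_sv *sv, struct toy_magic *mg);
    int (*set)(struct toy_sv *sv, struct toy_magic *mg);
} toy_vtbl;

/* Toy counterpart of MAGIC: a list node carrying a type and a table. */
typedef struct toy_magic {
    struct toy_magic *moremagic;
    const toy_vtbl *virtual_table;   /* NULL models a "(none)" entry */
    char type;
} toy_magic;

typedef struct toy_sv {
    long value;
    toy_magic *magic;   /* head of the linked list, NULL if not magical */
} toy_sv;

/* Reads the value, first running every non-NULL get slot in the list. */
static long toy_fetch(toy_sv *sv) {
    toy_magic *mg;
    assert(sv != NULL);
    for (mg = sv->magic; mg != NULL; mg = mg->moremagic)
        if (mg->virtual_table != NULL && mg->virtual_table->get != NULL)
            mg->virtual_table->get(sv, mg);
    return sv->value;
}

/* Assigns the value, then runs every non-NULL set slot in the list. */
static void toy_store(toy_sv *sv, long value) {
    toy_magic *mg;
    assert(sv != NULL);
    sv->value = value;
    for (mg = sv->magic; mg != NULL; mg = mg->moremagic)
        if (mg->virtual_table != NULL && mg->virtual_table->set != NULL)
            mg->virtual_table->set(sv, mg);
}

/* Example get handler: refreshes the value before it is read. */
static int toy_get_answer(toy_sv *sv, toy_magic *mg) {
    (void)mg;
    sv->value = 42;
    return 0;
}

/* Example set handler: reacts after an assignment by clamping at zero. */
static int toy_set_clamp(toy_sv *sv, toy_magic *mg) {
    (void)mg;
    if (sv->value < 0)
        sv->value = 0;
    return 0;
}
```

       A table with a NULL slot, like the zeros in the "vtbl_sv" example
       above, simply means that no extra action is taken for that
       operation.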
1021
1022       The current kinds of Magic Virtual Tables are:
1023
1024           mg_type
1025           (old-style char and macro)   MGVTBL         Type of magic
1026           --------------------------   ------         ----------------------------
1027           \0 PERL_MAGIC_sv             vtbl_sv        Special scalar variable
1028           A  PERL_MAGIC_overload       vtbl_amagic    %OVERLOAD hash
1029           a  PERL_MAGIC_overload_elem  vtbl_amagicelem %OVERLOAD hash element
1030           c  PERL_MAGIC_overload_table (none)         Holds overload table (AMT)
1031                                                       on stash
1032           B  PERL_MAGIC_bm             vtbl_bm        Boyer-Moore (fast string search)
1033           D  PERL_MAGIC_regdata        vtbl_regdata   Regex match position data
1034                                                       (@+ and @- vars)
1035           d  PERL_MAGIC_regdatum       vtbl_regdatum  Regex match position data
1036                                                       element
1037           E  PERL_MAGIC_env            vtbl_env       %ENV hash
1038           e  PERL_MAGIC_envelem        vtbl_envelem   %ENV hash element
1039           f  PERL_MAGIC_fm             vtbl_fm        Formline ('compiled' format)
1040           g  PERL_MAGIC_regex_global   vtbl_mglob     m//g target / study()ed string
1041           I  PERL_MAGIC_isa            vtbl_isa       @ISA array
1042           i  PERL_MAGIC_isaelem        vtbl_isaelem   @ISA array element
1043           k  PERL_MAGIC_nkeys          vtbl_nkeys     scalar(keys()) lvalue
1044           L  PERL_MAGIC_dbfile         (none)         Debugger %_<filename
1045           l  PERL_MAGIC_dbline         vtbl_dbline    Debugger %_<filename element
1046           m  PERL_MAGIC_mutex          vtbl_mutex     ???
1047           o  PERL_MAGIC_collxfrm       vtbl_collxfrm  Locale collate transformation
1048           P  PERL_MAGIC_tied           vtbl_pack      Tied array or hash
1049           p  PERL_MAGIC_tiedelem       vtbl_packelem  Tied array or hash element
1050           q  PERL_MAGIC_tiedscalar     vtbl_packelem  Tied scalar or handle
1051           r  PERL_MAGIC_qr             vtbl_qr        precompiled qr// regex
1052           S  PERL_MAGIC_sig            vtbl_sig       %SIG hash
1053           s  PERL_MAGIC_sigelem        vtbl_sigelem   %SIG hash element
1054           t  PERL_MAGIC_taint          vtbl_taint     Taintedness
1055           U  PERL_MAGIC_uvar           vtbl_uvar      Available for use by extensions
1056           v  PERL_MAGIC_vec            vtbl_vec       vec() lvalue
1057           V  PERL_MAGIC_vstring        (none)         v-string scalars
1058           w  PERL_MAGIC_utf8           vtbl_utf8      UTF-8 length+offset cache
1059           x  PERL_MAGIC_substr         vtbl_substr    substr() lvalue
1060           y  PERL_MAGIC_defelem        vtbl_defelem   Shadow "foreach" iterator
1061                                                       variable / smart parameter
1062                                                       vivification
1063           *  PERL_MAGIC_glob           vtbl_glob      GV (typeglob)
1064           #  PERL_MAGIC_arylen         vtbl_arylen    Array length ($#ary)
1065           .  PERL_MAGIC_pos            vtbl_pos       pos() lvalue
1066           <  PERL_MAGIC_backref        vtbl_backref   ???
1067           ~  PERL_MAGIC_ext            (none)         Available for use by extensions
1068
1069       When an uppercase and lowercase letter both exist in the table, then
1070       the uppercase letter is typically used to represent some kind of com‐
1071       posite type (a list or a hash), and the lowercase letter is used to
1072       represent an element of that composite type. Some internals code makes
1073       use of this case relationship.  However, 'v' and 'V' (vec and v-string)
1074       are in no way related.
1075
1076       The "PERL_MAGIC_ext" and "PERL_MAGIC_uvar" magic types are defined
1077       specifically for use by extensions and will not be used by perl itself.
1078       Extensions can use "PERL_MAGIC_ext" magic to 'attach' private informa‐
1079       tion to variables (typically objects).  This is especially useful
1080       because there is no way for normal perl code to corrupt this private
1081       information (unlike using extra elements of a hash object).
1082
1083       Similarly, "PERL_MAGIC_uvar" magic can be used much like tie() to call
1084       a C function any time a scalar's value is used or changed.  The
1085       "MAGIC"'s "mg_ptr" field points to a "ufuncs" structure:
1086
1087           struct ufuncs {
1088               I32 (*uf_val)(pTHX_ IV, SV*);
1089               I32 (*uf_set)(pTHX_ IV, SV*);
1090               IV uf_index;
1091           };
1092
1093       When the SV is read from or written to, the "uf_val" or "uf_set" func‐
1094       tion will be called with "uf_index" as the first arg and a pointer to
1095       the SV as the second.  A simple example of how to add "PERL_MAGIC_uvar"
1096       magic is shown below.  Note that the ufuncs structure is copied by
1097       sv_magic, so you can safely allocate it on the stack.
1098
1099           void
1100           Umagic(sv)
1101               SV *sv;
1102           PREINIT:
1103               struct ufuncs uf;
1104           CODE:
1105               uf.uf_val   = &my_get_fn;
1106               uf.uf_set   = &my_set_fn;
1107               uf.uf_index = 0;
1108               sv_magic(sv, 0, PERL_MAGIC_uvar, (char*)&uf, sizeof(uf));
1109
1110       Note that because multiple extensions may be using "PERL_MAGIC_ext" or
1111       "PERL_MAGIC_uvar" magic, it is important for extensions to take extra
1112       care to avoid conflict.  Typically only using the magic on objects
1113       blessed into the same class as the extension is sufficient.  For
1114       "PERL_MAGIC_ext" magic, it may also be appropriate to add an I32 'sig‐
1115       nature' at the top of the private data area and check that.
1116
1117       Also note that the "sv_set*()" and "sv_cat*()" functions described ear‐
1118       lier do not invoke 'set' magic on their targets.  This must be done by
1119       the user either by calling the "SvSETMAGIC()" macro after calling these
1120       functions, or by using one of the "sv_set*_mg()" or "sv_cat*_mg()"
1121       functions.  Similarly, generic C code must call the "SvGETMAGIC()"
1122       macro to invoke any 'get' magic if they use an SV obtained from exter‐
1123       nal sources in functions that don't handle magic.  See perlapi for a
1124       description of these functions.  For example, calls to the "sv_cat*()"
1125       functions typically need to be followed by "SvSETMAGIC()", but they
1126       don't need a prior "SvGETMAGIC()" since their implementation handles
1127       'get' magic.
1128
1129       Finding Magic
1130
1131           MAGIC* mg_find(SV*, int type); /* Finds the magic pointer of that type */
1132
1133       This routine returns a pointer to the "MAGIC" structure stored in the
1134       SV.  If the SV does not have that magical feature, "NULL" is returned.
1135       Also, if the SV is not of type SVt_PVMG, Perl may core dump.
1136
1137           int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen);
1138
1139       This routine checks to see what types of magic "sv" has.  If the
1140       mg_type field is an uppercase letter, then the mg_obj is copied to
1141       "nsv", but the mg_type field is changed to be the lowercase letter.
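
       The case rule that "mg_copy" applies can be sketched as follows.
       This is only a model of the type mapping described above: the
       "toy_magic" node and "toy_copied_types" are hypothetical, and the
       sketch records the resulting type characters instead of attaching
       real magic to another SV.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Toy MAGIC node: only the list link and the type character. */
typedef struct toy_magic {
    struct toy_magic *moremagic;
    char type;
} toy_magic;

/* Models the type rule of mg_copy: for each uppercase (container) entry
 * in the source list, record the lowercase (element) type in `out`.
 * Returns the number of types written; other entries are skipped. */
static size_t toy_copied_types(const toy_magic *src, char *out, size_t outlen) {
    size_t n = 0;
    assert(out != NULL);
    for (; src != NULL && n < outlen; src = src->moremagic) {
        unsigned char c = (unsigned char)src->type;
        if (isupper(c))
            out[n++] = (char)tolower(c);
    }
    return n;
}
```

       For a list holding 'P' (tied) and 't' (taint) magic, only 'P'
       qualifies, and it is recorded as 'p', the element type listed in
       the table of magic types.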
1142
1143       Understanding the Magic of Tied Hashes and Arrays
1144
1145       Tied hashes and arrays are magical beasts of the "PERL_MAGIC_tied"
1146       magic type.
1147
1148       WARNING: As of the 5.004 release, proper usage of the array and hash
1149       access functions requires understanding a few caveats.  Some of these
1150       caveats are actually considered bugs in the API, to be fixed in later
1151       releases, and are bracketed with [MAYCHANGE] below. If you find your‐
1152       self actually applying such information in this section, be aware that
1153       the behavior may change in the future, umm, without warning.
1154
1155       The perl tie function associates a variable with an object that imple‐
1156       ments the various GET, SET, etc methods.  To perform the equivalent of
1157       the perl tie function from an XSUB, you must mimic this behaviour.  The
1158       code below carries out the necessary steps - firstly it creates a new
1159       hash, and then creates a second hash which it blesses into the class
1160       which will implement the tie methods. Lastly it ties the two hashes
1161       together, and returns a reference to the new tied hash.  Note that the
1162       code below does NOT call the TIEHASH method in the MyTie class - see
1163       "Calling Perl Routines from within C Programs" for details on how to do
1164       this.
1165
1166           SV*
1167           mytie()
1168           PREINIT:
1169               HV *hash;
1170               HV *stash;
1171               SV *tie;
1172           CODE:
1173               hash = newHV();
1174               tie = newRV_noinc((SV*)newHV());
1175               stash = gv_stashpv("MyTie", TRUE);
1176               sv_bless(tie, stash);
1177               hv_magic(hash, (GV*)tie, PERL_MAGIC_tied);
1178               RETVAL = newRV_noinc((SV*)hash);
1179           OUTPUT:
1180               RETVAL
1181
1182       The "av_store" function, when given a tied array argument, merely
1183       copies the magic of the array onto the value to be "stored", using
1184       "mg_copy".  It may also return NULL, indicating that the value did not
1185       actually need to be stored in the array.  [MAYCHANGE] After a call to
1186       "av_store" on a tied array, the caller will usually need to call
1187       "mg_set(val)" to actually invoke the perl level "STORE" method on the
1188       TIEARRAY object.  If "av_store" did return NULL, a call to
1189       "SvREFCNT_dec(val)" will usually also be necessary to avoid a memory leak.
1190       [/MAYCHANGE]
1191
1192       The previous paragraph is applicable verbatim to tied hash access using
1193       the "hv_store" and "hv_store_ent" functions as well.
1194
1195       "av_fetch" and the corresponding hash functions "hv_fetch" and
1196       "hv_fetch_ent" actually return an undefined mortal value whose magic
1197       has been initialized using "mg_copy".  Note the value so returned does
1198       not need to be deallocated, as it is already mortal.  [MAYCHANGE] But
1199       you will need to call "mg_get()" on the returned value in order to
1200       actually invoke the perl level "FETCH" method on the underlying TIE
1201       object.  Similarly, you may also call "mg_set()" on the return value
1202       after possibly assigning a suitable value to it using "sv_setsv",
1203       which will invoke the "STORE" method on the TIE object. [/MAYCHANGE]
1204
1205       [MAYCHANGE] In other words, the array or hash fetch/store functions
1206       don't really fetch and store actual values in the case of tied arrays
1207       and hashes.  They merely call "mg_copy" to attach magic to the values
1208       that were meant to be "stored" or "fetched".  Later calls to "mg_get"
1209       and "mg_set" actually do the job of invoking the TIE methods on the
1210       underlying objects.  Thus the magic mechanism currently implements a
1211       kind of lazy access to arrays and hashes.
1212
1213       Currently (as of perl version 5.004), use of the hash and array access
1214       functions requires the user to be aware of whether they are operating
1215       on "normal" hashes and arrays, or on their tied variants.  The API may
1216       be changed to provide more transparent access to both tied and normal
1217       data types in future versions.  [/MAYCHANGE]
1218
1219       You would do well to understand that the TIEARRAY and TIEHASH inter‐
1220       faces are mere sugar to invoke some perl method calls while using the
1221       uniform hash and array syntax.  The use of this sugar imposes some
1222       overhead (typically about two to four extra opcodes per FETCH/STORE
1223       operation, in addition to the creation of all the mortal variables
1224       required to invoke the methods).  This overhead will be comparatively
1225       small if the TIE methods are themselves substantial, but if they are
1226       only a few statements long, the overhead will not be insignificant.
1227
1228       Localizing changes
1229
1230       Perl has a very handy construction
1231
1232         {
1233           local $var = 2;
1234           ...
1235         }
1236
1237       This construction is approximately equivalent to
1238
1239         {
1240           my $oldvar = $var;
1241           $var = 2;
1242           ...
1243           $var = $oldvar;
1244         }
1245
1246       The biggest difference is that the first construction would reinstate
1247       the initial value of $var, irrespective of how control exits the block:
1248       "goto", "return", "die"/"eval", etc. It is a little bit more efficient
1249       as well.
1250
1251       There is a way to achieve a similar task from C via the Perl API: create
1252       a pseudo-block, and arrange for some changes to be automatically undone
1253       at the end of it, either explicitly or via a non-local exit (via die()).
1254       A block-like construct is created by a pair of "ENTER"/"LEAVE" macros
1255       (see "Returning a Scalar" in perlcall).  Such a construct may be cre‐
1256       ated specially for some important localized task, or an existing one
1257       (like boundaries of enclosing Perl subroutine/block, or an existing
1258       pair for freeing TMPs) may be used. (In the second case the overhead of
1259       additional localization must be almost negligible.) Note that any XSUB
1260       is automatically enclosed in an "ENTER"/"LEAVE" pair.
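
       How a pseudo-block undoes changes can be sketched with a toy save
       stack.  The model makes several assumptions for brevity: it saves
       only "int" values, uses fixed-size arrays, ignores non-local exits,
       and all "toy_" names are hypothetical rather than taken from the
       Perl sources.

```c
#include <assert.h>

#define TOY_SAVE_MAX  32
#define TOY_SCOPE_MAX 8

/* One saved integer: where it lives and what it held. */
typedef struct { int *addr; int old_value; } toy_save;

static toy_save toy_savestack[TOY_SAVE_MAX];
static int toy_save_top = 0;
static int toy_scope_marks[TOY_SCOPE_MAX];
static int toy_scope_depth = 0;

/* Models ENTER: remember where this pseudo-block's saves begin. */
static void toy_enter(void) {
    assert(toy_scope_depth < TOY_SCOPE_MAX);
    toy_scope_marks[toy_scope_depth++] = toy_save_top;
}

/* Models SAVEINT(i): record the current value so toy_leave restores it. */
static void toy_saveint(int *addr) {
    assert(toy_save_top < TOY_SAVE_MAX);
    toy_savestack[toy_save_top].addr = addr;
    toy_savestack[toy_save_top].old_value = *addr;
    toy_save_top++;
}

/* Models LEAVE: undo this pseudo-block's saves, newest first. */
static void toy_leave(void) {
    int mark;
    assert(toy_scope_depth > 0);
    mark = toy_scope_marks[--toy_scope_depth];
    while (toy_save_top > mark) {
        toy_save *s = &toy_savestack[--toy_save_top];
        *s->addr = s->old_value;
    }
}
```

       Nested pseudo-blocks behave like nested "local" statements: each
       "toy_leave" restores only the values saved since the matching
       "toy_enter".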
1261
1262       Inside such a pseudo-block the following service is available:
1263
1264       "SAVEINT(int i)"
1265       "SAVEIV(IV i)"
1266       "SAVEI32(I32 i)"
1267       "SAVELONG(long i)"
1268           These macros arrange things to restore the value of integer vari‐
1269           able "i" at the end of enclosing pseudo-block.
1270
1271       SAVESPTR(s)
1272       SAVEPPTR(p)
1273           These macros arrange things to restore the value of pointers "s"
1274           and "p". "s" must be a pointer of a type which survives conversion
1275           to "SV*" and back, "p" should be able to survive conversion to
1276           "char*" and back.
1277
1278       "SAVEFREESV(SV *sv)"
1279           The refcount of "sv" is decremented at the end of the pseudo-block.
1280           This is similar to "sv_2mortal" in that it is also a mechanism for
1281           doing a delayed "SvREFCNT_dec".  However, while "sv_2mortal"
1282           extends the lifetime of "sv" until the beginning of the next
1283           statement, "SAVEFREESV" extends it until the end of the enclosing
1284           scope.  These lifetimes can be wildly different.
1285
1286           Also compare "SAVEMORTALIZESV".
1287
1288       "SAVEMORTALIZESV(SV *sv)"
1289           Just like "SAVEFREESV", but mortalizes "sv" at the end of the cur‐
1290           rent scope instead of decrementing its reference count.  This usu‐
1291           ally has the effect of keeping "sv" alive until the statement that
1292           called the currently live scope has finished executing.
1293
1294       "SAVEFREEOP(OP *op)"
1295           The "OP *" is op_free()ed at the end of pseudo-block.
1296
1297       SAVEFREEPV(p)
1298           The chunk of memory which is pointed to by "p" is Safefree()ed at
1299           the end of pseudo-block.
1300
1301       "SAVECLEARSV(SV *sv)"
1302           Clears a slot in the current scratchpad which corresponds to "sv"
1303           at the end of pseudo-block.
1304
1305       "SAVEDELETE(HV *hv, char *key, I32 length)"
1306           The key "key" of "hv" is deleted at the end of pseudo-block. The
1307           string pointed to by "key" is Safefree()ed.  If one has a key in
1308           short-lived storage, the corresponding string may be reallocated
1309           like this:
1310
1311             SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf));
1312
1313       "SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)"
1314           At the end of pseudo-block the function "f" is called with the only
1315           argument "p".
1316
1317       "SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)"
1318           At the end of pseudo-block the function "f" is called with the
1319           implicit context argument (if any), and "p".
1320
1321       "SAVESTACK_POS()"
1322           The current offset on the Perl internal stack (cf. "SP") is
1323           restored at the end of pseudo-block.
1324
1325       The following API list contains functions, thus one needs to provide
1326       pointers to the modifiable data explicitly (either C pointers, or Perl‐
1327       ish "GV *"s).  Where the above macros take "int", a similar function
1328       takes "int *".
1329
1330       "SV* save_scalar(GV *gv)"
1331           Equivalent to Perl code "local $gv".
1332
1333       "AV* save_ary(GV *gv)"
1334       "HV* save_hash(GV *gv)"
1335           Similar to "save_scalar", but localize @gv and %gv.
1336
1337       "void save_item(SV *item)"
1338           Duplicates the current value of the "SV".  On exit from the
1339           current "ENTER"/"LEAVE" pseudo-block, the value of the "SV" is
1340           restored from the stored copy.
1341
1342       "void save_list(SV **sarg, I32 maxsarg)"
1343           A variant of "save_item" which takes multiple arguments via an
1344           array "sarg" of "SV*" of length "maxsarg".
1345
1346       "SV* save_svref(SV **sptr)"
1347           Similar to "save_scalar", but will reinstate an "SV *".
1348
1349       "void save_aptr(AV **aptr)"
1350       "void save_hptr(HV **hptr)"
1351           Similar to "save_svref", but localize "AV *" and "HV *".
1352
1353       The "Alias" module implements localization of the basic types within
1354       the caller's scope.  People who are interested in how to localize
1355       things in the containing scope should take a look there too.
1356

Subroutines

1358       XSUBs and the Argument Stack
1359
1360       The XSUB mechanism is a simple way for Perl programs to access C sub‐
1361       routines.  An XSUB routine will have a stack that contains the argu‐
1362       ments from the Perl program, and a way to map from the Perl data struc‐
1363       tures to a C equivalent.
1364
1365       The stack arguments are accessible through the ST(n) macro, which
1366       returns the "n"'th stack argument.  Argument 0 is the first argument
1367       passed in the Perl subroutine call.  These arguments are "SV*", and can
1368       be used anywhere an "SV*" is used.
1369
1370       Most of the time, output from the C routine can be handled through use
1371       of the RETVAL and OUTPUT directives.  However, there are some cases
1372       where the argument stack is not already long enough to handle all the
1373       return values.  An example is the POSIX tzname() call, which takes no
1374       arguments, but returns two, the local time zone's standard and summer
1375       time abbreviations.
1376
1377       To handle this situation, the PPCODE directive is used and the stack is
1378       extended using the macro:
1379
1380           EXTEND(SP, num);
1381
1382       where "SP" is the macro that represents the local copy of the stack
1383       pointer, and "num" is the number of elements the stack should be
1384       extended by.
1385
1386       Now that there is room on the stack, values can be pushed on it using
1387       the "PUSHs" macro.  The pushed values will often need to be "mortal"
1388       (see "Reference Counts and Mortality"):
1389
1390           PUSHs(sv_2mortal(newSViv(an_integer)));
1391           PUSHs(sv_2mortal(newSVuv(an_unsigned_integer)));
1392           PUSHs(sv_2mortal(newSVnv(a_double)));
1393           PUSHs(sv_2mortal(newSVpv("Some String",0)));
1394
1395       When the Perl program then calls "tzname", the two values are assigned
1396       as in:
1397
1398           ($standard_abbrev, $summer_abbrev) = POSIX::tzname;
1399
1400       An alternative (and possibly simpler) method of pushing values on the
1401       stack is to use the macro:
1402
1403           XPUSHs(SV*)
1404
1405       This macro automatically adjusts the stack for you, if needed.  Thus,
1406       you do not need to call "EXTEND" to extend the stack.
1407
1408       Despite suggestions in earlier versions of this document, the macros
1409       "(X)PUSH[iunp]" are not suited to XSUBs which return multiple
1410       results.  For that, either stick to the "(X)PUSHs" macros shown above,
1411       or use the new "m(X)PUSH[iunp]" macros instead; see "Putting a C value
1412       on Perl stack".
1413
1414       For more information, consult perlxs and perlxstut.
1415
1416       Calling Perl Routines from within C Programs
1417
1418       There are four routines that can be used to call a Perl subroutine from
1419       within a C program.  These four are:
1420
1421           I32  call_sv(SV*, I32);
1422           I32  call_pv(const char*, I32);
1423           I32  call_method(const char*, I32);
1424           I32  call_argv(const char*, I32, register char**);
1425
1426       The routine most often used is "call_sv".  The "SV*" argument contains
1427       either the name of the Perl subroutine to be called, or a reference to
1428       the subroutine.  The second argument consists of flags that control the
1429       context in which the subroutine is called, whether or not the subrou‐
1430       tine is being passed arguments, how errors should be trapped, and how
1431       to treat return values.
1432
1433       All four routines return the number of arguments that the subroutine
1434       returned on the Perl stack.
1435
1436       These routines used to be called "perl_call_sv", etc., before Perl
1437       v5.6.0, but those names are now deprecated; macros of the same name are
1438       provided for compatibility.
1439
1440       When using any of these routines (except "call_argv"), the programmer
1441       must manipulate the Perl stack.  The macros and functions used for this
1442       include the following:
1443
1444           dSP
1445           SP
1446           PUSHMARK()
1447           PUTBACK
1448           SPAGAIN
1449           ENTER
1450           SAVETMPS
1451           FREETMPS
1452           LEAVE
1453           XPUSH*()
1454           POP*()
1455
1456       For a detailed description of calling conventions from C to Perl, con‐
1457       sult perlcall.
1458
1459       Memory Allocation
1460
1461       Allocation
1462
1463       All memory meant to be used with the Perl API functions should be
1464       manipulated using the macros described in this section.  The macros
1465       provide the necessary transparency between differences in the actual
1466       malloc implementation that is used within perl.
1467
1468       It is suggested that you enable the version of malloc that is distrib‐
1469       uted with Perl.  It keeps pools of various sizes of unallocated memory
1470       in order to satisfy allocation requests more quickly.  However, on some
1471       platforms, it may cause spurious malloc or free errors.
1472
1473       The following three macros are used to initially allocate memory:
1474
1475           Newx(pointer, number, type);
1476           Newxc(pointer, number, type, cast);
1477           Newxz(pointer, number, type);
1478
1479       The first argument "pointer" should be the name of a variable that will
1480       point to the newly allocated memory.
1481
1482       The second and third arguments "number" and "type" specify how many of
1483       the specified type of data structure should be allocated.  The argument
1484       "type" is passed to "sizeof".  The final argument to "Newxc", "cast",
1485       should be used if the "pointer" argument is different from the "type"
1486       argument.
1487
1488       Unlike the "Newx" and "Newxc" macros, the "Newxz" macro calls "memzero"
1489       to zero out all the newly allocated memory.
1490
1491       Reallocation
1492
1493           Renew(pointer, number, type);
1494           Renewc(pointer, number, type, cast);
1495           Safefree(pointer)
1496
1497       These three macros are used to change a memory buffer size or to free a
1498       piece of memory no longer needed.  The arguments to "Renew" and
1499       "Renewc" match those of "Newx" and "Newxc"; unlike the older "New" and
1500       "Newc" macros, none of them takes a "magic cookie" argument.
1501
1502       Moving
1503
1504           Move(source, dest, number, type);
1505           Copy(source, dest, number, type);
1506           Zero(dest, number, type);
1507
1508       These three macros are used to move, copy, or zero out previously allo‐
1509       cated memory.  The "source" and "dest" arguments point to the source
1510       and destination starting points.  Perl will move, copy, or zero out
1511       "number" instances of the size of the "type" data structure (using the
1512       "sizeof" function).
1513
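       One detail that is easy to miss is the argument order: the source
       comes first and the destination second, the reverse of the C library
       routines.  The sketch below uses hypothetical "TOY_" macros that are
       assumed to mirror the shape of "Move", "Copy" and "Zero" on top of
       "memmove", "memcpy" and "memset"; it is an illustration in plain C and
       not the definitions from Perl's headers.

```c
#include <assert.h>
#include <string.h>

/* Source first, destination second, then element count and element type. */
#define TOY_MOVE(src, dst, n, type) memmove((dst), (src), (n) * sizeof(type))
#define TOY_COPY(src, dst, n, type) memcpy((dst), (src), (n) * sizeof(type))
#define TOY_ZERO(dst, n, type)      memset((dst), 0, (n) * sizeof(type))

/* Shift buf[0..n-2] one place to the right and clear buf[0].
 * Source and destination overlap, so a move (not a copy) is required. */
void toy_shift_right(int *buf, size_t n)
{
    assert(buf != NULL);
    if (n == 0)
        return;
    TOY_MOVE(buf, buf + 1, n - 1, int);
    TOY_ZERO(buf, 1, int);
}

/* Duplicate n ints into a separate, non-overlapping buffer. */
void toy_duplicate(const int *src, int *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    TOY_COPY(src, dst, n, int);
}
```

       The overlapping shift is the case where a move is needed; for two
       distinct buffers a copy is sufficient.
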
1514       PerlIO
1515
1516       The most recent development releases of Perl have been experimenting
1517       with removing Perl's dependency on the "normal" standard I/O suite and
1518       allowing other stdio implementations to be used.  This involves creat‐
1519       ing a new abstraction layer that then calls whichever implementation of
1520       stdio Perl was compiled with.  All XSUBs should now use the functions
1521       in the PerlIO abstraction layer and not make any assumptions about what
1522       kind of stdio is being used.
1523
1524       For a complete description of the PerlIO abstraction, consult perlapio.
1525
1526       Putting a C value on Perl stack
1527
1528       A lot of opcodes (an opcode is an elementary operation in the internal
1529       perl stack machine) put an SV* on the stack.  However, as an
1530       optimization the corresponding SV is (usually) not recreated each time.
1531       The opcodes reuse specially assigned SVs (targets) which are (as a
1532       corollary) not constantly freed/created.
1533
1534       Each of the targets is created only once (but see "Scratchpads and
1535       recursion" below), and when an opcode needs to put an integer, a
1536       double, or a string on the stack, it just sets the corresponding parts
1537       of its target and puts the target on the stack.
1538
1539       The macro to put this target on the stack is "PUSHTARG", and it is
1540       directly used in some opcodes, as well as indirectly in zillions of
1541       others, which use it via "(X)PUSH[iunp]".
1542
1543       Because the target is reused, you must be careful when pushing multiple
1544       values on the stack. The following code will not do what you think:
1545
1546           XPUSHi(10);
1547           XPUSHi(20);
1548
1549       This translates as "set "TARG" to 10, push a pointer to "TARG" onto the
1550       stack; set "TARG" to 20, push a pointer to "TARG" onto the stack".  At
1551       the end of the operation, the stack does not contain the values 10 and
1552       20, but actually contains two pointers to "TARG", which we have set to
1553       20.
1554
1555       If you need to push multiple different values then you should either
1556       use the "(X)PUSHs" macros, or else use the new "m(X)PUSH[iunp]" macros,
1557       none of which make use of "TARG".  The "(X)PUSHs" macros simply push an
1558       SV* on the stack, which, as noted under "XSUBs and the Argument Stack",
1559       will often need to be "mortal".  The new "m(X)PUSH[iunp]" macros make
1560       this a little easier to achieve by creating a new mortal for you (via
1561       "(X)PUSHmortal"), pushing that onto the stack (extending it if neces‐
1562       sary in the case of the "mXPUSH[iunp]" macros), and then setting its
1563       value.  Thus, instead of writing this to "fix" the example above:
1564
1565           XPUSHs(sv_2mortal(newSViv(10)));
1566           XPUSHs(sv_2mortal(newSViv(20)));
1567
1568       you can simply write:
1569
1570           mXPUSHi(10);
1571           mXPUSHi(20);
1572
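       The difference between pushing the shared target and pushing a fresh
       value each time can be modelled without any Perl headers.  The
       following is a toy sketch in plain C: "toy_sv", "toy_xpushi" and
       "toy_mxpushi" are hypothetical names standing in for an SV, "XPUSHi"
       and "mXPUSHi", and the static arrays replace real stack growth and
       mortal bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { long iv; } toy_sv;      /* stand-in for an SV holding an integer */

#define TOY_STACK_MAX 8
static toy_sv  toy_targ;                 /* the one target, reused by every push */
static toy_sv  toy_mortals[TOY_STACK_MAX];
static size_t  toy_mortal_ix = 0;
static toy_sv *toy_stack[TOY_STACK_MAX];
static size_t  toy_sp = 0;

/* Like XPUSHi: set TARG, then push a pointer to TARG. */
void toy_xpushi(long value)
{
    assert(toy_sp < TOY_STACK_MAX);
    toy_targ.iv = value;
    toy_stack[toy_sp++] = &toy_targ;
}

/* Like mXPUSHi: take a fresh SV, push it, then set its value. */
void toy_mxpushi(long value)
{
    toy_sv *sv;
    assert(toy_sp < TOY_STACK_MAX && toy_mortal_ix < TOY_STACK_MAX);
    sv = &toy_mortals[toy_mortal_ix++];
    toy_stack[toy_sp++] = sv;
    sv->iv = value;
}

/* Read the integer held by stack element i. */
long toy_stack_iv(size_t i)
{
    assert(i < toy_sp);
    return toy_stack[i]->iv;
}
```

       Calling "toy_xpushi(10)" and then "toy_xpushi(20)" leaves two stack
       entries that both read 20, because both point at the same target;
       the same two calls made with "toy_mxpushi" leave 10 and 20.
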
1573       On a related note, if you do use "(X)PUSH[iunp]", then you're going to
1574       need a "dTARG" in your variable declarations so that the "*PUSH*"
1575       macros can make use of the local variable "TARG".  See also "dTARGET"
1576       and "dXSTARG".
1577
1578       Scratchpads
1579
1580       The question remains of when the SVs which are targets for opcodes are
1581       created.  The answer is that they are created when the current unit --
1582       a subroutine or a file (for opcodes for statements outside of
1583       subroutines) -- is compiled.  During this time a special anonymous Perl
1584       array is created, which is called a scratchpad for the current unit.
1585
1586       A scratchpad keeps SVs which are lexicals for the current unit and are
1587       targets for opcodes. One can deduce that an SV lives on a scratchpad by
1588       looking at its flags: lexicals have "SVs_PADMY" set, and targets have
1589       "SVs_PADTMP" set.
1590
1591       The correspondence between OPs and targets is not 1-to-1. Different OPs
1592       in the compile tree of the unit can use the same target, if this would
1593       not conflict with the expected life of the temporary.
1594
1595       Scratchpads and recursion
1596
1597       It is not 100% true that a compiled unit contains a pointer to the
1598       scratchpad AV.  In fact it contains a pointer to an AV of (initially)
1599       one element, and this element is the scratchpad AV.  Why do we need an
1600       extra level of indirection?
1601
1602       The answer is recursion, and maybe threads.  Both of these can create
1603       several execution pointers going into the same subroutine.  For the
1604       subroutine-child not to overwrite the temporaries of the
1605       subroutine-parent (whose lifespan covers the call to the child), the
1606       parent and the child should have different scratchpads.  (And the
1607       lexicals should be separate anyway!)
1608
1609       So each subroutine is born with an array of scratchpads (of length 1).
1610       On each entry to the subroutine it is checked that the current depth of
1611       the recursion is not more than the length of this array, and if it is,
1612       a new scratchpad is created and pushed into the array.
1613
1614       The targets on this scratchpad are "undef"s, but they are already
1615       marked with correct flags.
1616
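       The "one scratchpad per recursion depth" scheme can be sketched as a
       growable array of pads indexed by depth.  This is a toy model in plain
       C under stated assumptions, not the core's pad code: "toy_sub",
       "toy_pad", "toy_enter_sub" and "toy_leave_sub" are hypothetical names,
       pads hold plain integers rather than SVs, and for brevity even the
       first pad is created lazily rather than at compile time.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_PAD_SLOTS 4

typedef struct { long slot[TOY_PAD_SLOTS]; } toy_pad;   /* one scratchpad */

typedef struct {
    toy_pad **pads;    /* pads[d - 1] serves recursion depth d */
    size_t    npads;   /* how many scratchpads exist so far */
    size_t    depth;   /* current recursion depth, 0 when not running */
} toy_sub;

/* Enter the sub one level deeper; returns that level's pad,
 * or NULL if memory could not be allocated. */
toy_pad *toy_enter_sub(toy_sub *sub)
{
    assert(sub != NULL);
    if (sub->depth + 1 > sub->npads) {    /* deeper than ever before */
        toy_pad *fresh;
        toy_pad **grown = realloc(sub->pads, (sub->npads + 1) * sizeof *grown);
        if (grown == NULL)
            return NULL;
        sub->pads = grown;
        fresh = calloc(1, sizeof *fresh); /* new pad starts out zeroed */
        if (fresh == NULL)
            return NULL;
        sub->pads[sub->npads++] = fresh;
    }
    return sub->pads[sub->depth++];
}

/* Leave one level; the pad is kept for the next call at this depth. */
void toy_leave_sub(toy_sub *sub)
{
    assert(sub != NULL && sub->depth > 0);
    sub->depth--;
}

/* Release every pad and the array that holds them. */
void toy_free_sub(toy_sub *sub)
{
    size_t i;
    for (i = 0; i < sub->npads; i++)
        free(sub->pads[i]);
    free(sub->pads);
    sub->pads = NULL;
    sub->npads = sub->depth = 0;
}
```

       A recursive call receives a pad distinct from its caller's, so the
       caller's temporaries survive the call, and a later call at the same
       depth reuses the pad created earlier.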

Compiled code

1618       Code tree
1619
1620       Here we describe the internal form your code is converted to by Perl.
1621       Start with a simple example:
1622
1623         $a = $b + $c;
1624
1625       This is converted to a tree similar to this one:
1626
1627                    assign-to
1628                  /           \
1629                 +             $a
1630               /   \
1631             $b     $c
1632
1633       (but slightly more complicated).  This tree reflects the way Perl
1634       parsed your code, but has nothing to do with the execution order.
1635       There is an additional "thread" going through the nodes of the tree
1636       which shows the order of execution of the nodes.  In our simplified
1637       example above it looks like:
1638
1639            $b ---> $c ---> + ---> $a ---> assign-to
1640
1641       But with the actual compile tree for "$a = $b + $c" it is different:
1642       some nodes are optimized away.  As a corollary, though the actual tree
1643       contains more nodes than our simplified example, the execution order is
1644       the same as in our example.
1645
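       The separation between tree shape and execution order can be made
       concrete with a small model.  In the sketch below, which is plain C
       and not the core's op structures, "toy_op" is a hypothetical node type
       whose "first" and "sibling" fields describe the tree and whose "next"
       field is the execution-order thread; "toy_build_assign" wires up the
       simplified tree for "$a = $b + $c" by hand.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_op toy_op;
struct toy_op {
    const char *name;
    toy_op *first;     /* first child: tree structure */
    toy_op *sibling;   /* next child of the same parent: tree structure */
    toy_op *next;      /* next op to execute: the "thread" */
};

/* Follow the execution thread from start, storing up to max names in out.
 * Returns the number of names stored. */
size_t toy_exec_order(const toy_op *start, const char **out, size_t max)
{
    size_t n = 0;
    assert(out != NULL);
    for (const toy_op *op = start; op != NULL && n < max; op = op->next)
        out[n++] = op->name;
    return n;
}

/* Fill ops[0..4] with the simplified tree for "$a = $b + $c" and
 * return the op that executes first. */
toy_op *toy_build_assign(toy_op ops[5])
{
    toy_op *b = &ops[0], *c = &ops[1], *add = &ops[2];
    toy_op *a = &ops[3], *assign = &ops[4];

    /*                  name         first  sibling  next   */
    *b      = (toy_op){ "$b",        NULL,  c,       c      };
    *c      = (toy_op){ "$c",        NULL,  NULL,    add    };
    *add    = (toy_op){ "+",         b,     a,       a      };
    *a      = (toy_op){ "$a",        NULL,  NULL,    assign };
    *assign = (toy_op){ "assign-to", add,   NULL,    NULL   };
    return b;
}
```

       Walking "next" from the returned node visits "$b", "$c", "+", "$a"
       and "assign-to" in that order, even though "assign-to" is the root of
       the tree.
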
1646       Examining the tree
1647
1648       If you have your perl compiled for debugging (usually done with "-DDE‐
1649       BUGGING" on the "Configure" command line), you may examine the compiled
1650       tree by specifying "-Dx" on the Perl command line.  The output takes
1651       several lines per node, and for "$b+$c" it looks like this:
1652
1653           5           TYPE = add  ===> 6
1654                       TARG = 1
1655                       FLAGS = (SCALAR,KIDS)
1656                       {
1657                           TYPE = null  ===> (4)
1658                             (was rv2sv)
1659                           FLAGS = (SCALAR,KIDS)
1660                           {
1661           3                   TYPE = gvsv  ===> 4
1662                               FLAGS = (SCALAR)
1663                               GV = main::b
1664                           }
1665                       }
1666                       {
1667                           TYPE = null  ===> (5)
1668                             (was rv2sv)
1669                           FLAGS = (SCALAR,KIDS)
1670                           {
1671           4                   TYPE = gvsv  ===> 5
1672                               FLAGS = (SCALAR)
1673                               GV = main::c
1674                           }
1675                       }
1676
1677       This tree has 5 nodes (one per "TYPE" specifier), but only 3 of them
1678       are not optimized away (one per number in the left column).  The
1679       immediate children of a given node correspond to "{}" pairs on the same
1680       level of indentation, thus this listing corresponds to the tree:
1681
1682                          add
1683                        /     \
1684                      null    null
1685                       ⎪       ⎪
1686                      gvsv    gvsv
1687
1688       The execution order is indicated by "===>" marks, thus it is "3 4 5 6"
1689       (node 6 is not included in the above listing), i.e., "gvsv gvsv add
1690       whatever".
1691
1692       Each of these nodes represents an op, a fundamental operation inside
1693       the Perl core. The code which implements each operation can be found in
1694       the pp*.c files; the function which implements the op with type "gvsv"
1695       is "pp_gvsv", and so on. As the tree above shows, different ops have
1696       different numbers of children: "add" is a binary operator, as one would
1697       expect, and so has two children. To accommodate the various different
1698       numbers of children, there are various types of op data structure, and
1699       they link together in different ways.
1700
1701       The simplest type of op structure is "OP": this has no children. Unary
1702       operators, "UNOP"s, have one child, and this is pointed to by the
1703       "op_first" field. Binary operators ("BINOP"s) have not only an
1704       "op_first" field but also an "op_last" field. The most complex type of
1705       op is a "LISTOP", which has any number of children. In this case, the
1706       first child is pointed to by "op_first" and the last child by
1707       "op_last". The children in between can be found by iteratively follow‐
1708       ing the "op_sibling" pointer from the first child to the last.
1709
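       The way a list-like op reaches all of its children through "first",
       "last" and the sibling chain can be sketched as follows.  This is a
       toy model in plain C: "toy_op", "toy_append_kid" and "toy_count_kids"
       are hypothetical names, and the single struct stands in for the
       several real op structure types.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_op toy_op;
struct toy_op {
    const char *name;
    toy_op *first;     /* first child, or NULL if there are none */
    toy_op *last;      /* last child, or NULL if there are none */
    toy_op *sibling;   /* next child of the same parent */
};

/* Append kid as the new last child of parent. */
void toy_append_kid(toy_op *parent, toy_op *kid)
{
    assert(parent != NULL && kid != NULL);
    kid->sibling = NULL;
    if (parent->last != NULL)
        parent->last->sibling = kid;   /* chain after the old last child */
    else
        parent->first = kid;           /* first child of an empty parent */
    parent->last = kid;
}

/* Count the children by following the sibling chain from the first child. */
size_t toy_count_kids(const toy_op *parent)
{
    size_t n = 0;
    assert(parent != NULL);
    for (const toy_op *kid = parent->first; kid != NULL; kid = kid->sibling)
        n++;
    return n;
}
```

       Only the first and last children are stored in the parent; everything
       in between is found by walking the chain, which is why the number of
       children is not limited by the structure itself.
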
1710       There are also two other op types: a "PMOP" holds a regular expression,
1711       and has no children, and a "LOOP" may or may not have children. If the
1712       "op_children" field is non-zero, it behaves like a "LISTOP". To compli‐
1713       cate matters, if a "UNOP" is actually a "null" op after optimization
1714       (see "Compile pass 2: context propagation") it will still have children
1715       in accordance with its former type.
1716
1717       Another way to examine the tree is to use a compiler back-end module,
1718       such as B::Concise.
1719
1720       Compile pass 1: check routines
1721
1722       The tree is created by the compiler while yacc code feeds it the
1723       constructs it recognizes.  Since yacc works bottom-up, so does the
1724       first pass of perl compilation.
1725
1726       What makes this pass interesting for perl developers is that some opti‐
1727       mization may be performed on this pass.  This is optimization by so-
1728       called "check routines".  The correspondence between node names and
1729       corresponding check routines is described in opcode.pl (do not forget
1730       to run "make regen_headers" if you modify this file).
1731
1732       A check routine is called when the node is fully constructed except for
1733       the execution-order thread.  Since at this time there are no back-links
1734       to the currently constructed node, one can do almost any operation to
1735       the top-level node, including freeing it and/or creating new nodes
1736       above/below it.
1737
1738       The check routine returns the node which should be inserted into the
1739       tree (if the top-level node was not modified, the check routine returns
1740       its argument).
1741
1742       By convention, check routines have names "ck_*". They are usually
1743       called from "new*OP" subroutines (or "convert") (which in turn are
1744       called from perly.y).
1745
1746       Compile pass 1a: constant folding
1747
1748       Immediately after the check routine is called the returned node is
1749       checked for being compile-time executable.  If it is (the value is
1750       judged to be constant) it is immediately executed, and a constant node
1751       with the "return value" of the corresponding subtree is substituted
1752       instead.  The subtree is deleted.
1753
1754       If constant folding was not performed, the execution-order thread is
1755       created.
1756
1757       Compile pass 2: context propagation
1758
1759       When a context for a part of the compile tree is known, it is
1760       propagated down through the tree.  At this time the context can have 5
1761       values (instead of 2 for runtime context): void, boolean, scalar, list,
1762       and lvalue.  In contrast with pass 1, this pass is processed from top to
1763       bottom: a node's context determines the context for its children.
1764
1765       Additional context-dependent optimizations are performed at this time.
1766       Since at this moment the compile tree contains back-references (via
1767       "thread" pointers), nodes cannot be free()d now.  To allow nodes to be
1768       optimized away at this stage, such nodes are null()ified instead of
1769       being free()d (i.e. their type is changed to OP_NULL).
1770
1771       Compile pass 3: peephole optimization
1772
1773       After the compile tree for a subroutine (or for an "eval" or a file) is
1774       created, an additional pass over the code is performed.  This pass is
1775       neither top-down nor bottom-up, but in the execution order (with
1776       additional complications for conditionals).  These optimizations are
1777       done in the subroutine peep().  Optimizations performed at this stage
1778       are subject to the same restrictions as in pass 2.
1779
1780       Pluggable runops
1781
1782       The compile tree is executed in a runops function.  There are two
1783       runops functions, in run.c and in dump.c.  "Perl_runops_debug" is used
1784       with DEBUGGING and "Perl_runops_standard" is used otherwise.  For fine
1785       control over the execution of the compile tree it is possible to pro‐
1786       vide your own runops function.
1787
1788       It's probably best to copy one of the existing runops functions and
1789       change it to suit your needs.  Then, in the BOOT section of your XS
1790       file, add the line:
1791
1792         PL_runops = my_runops;
1793
1794       This function should be as efficient as possible to keep your programs
1795       running as fast as possible.
1796
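       The idea of a replaceable execution loop can be illustrated with a toy
       model.  The sketch below is plain C and does not use the real
       "PL_runops" or op structures: "toy_op", "toy_runops" and the two loop
       functions are hypothetical, and each op is assumed to run itself and
       return the next op to execute, with NULL ending the loop.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_op toy_op;
struct toy_op {
    toy_op *(*ppaddr)(toy_op *self);   /* runs the op, returns the next op */
    toy_op *next;
};

long toy_accumulator = 0;              /* state changed by running ops */
unsigned long toy_ops_run = 0;         /* filled in by the counting loop only */

/* A trivial op implementation: add one and continue with the next op. */
toy_op *toy_pp_incr(toy_op *self)
{
    toy_accumulator++;
    return self->next;
}

/* Plain loop: run each op until one returns NULL. */
int toy_runops_standard(toy_op *op)
{
    while (op != NULL) {
        assert(op->ppaddr != NULL);
        op = op->ppaddr(op);
    }
    return 0;
}

/* Replacement loop: same behaviour, plus a count of ops executed. */
int toy_runops_counting(toy_op *op)
{
    while (op != NULL) {
        assert(op->ppaddr != NULL);
        toy_ops_run++;
        op = op->ppaddr(op);
    }
    return 0;
}

/* The pluggable hook: callers always go through this pointer. */
int (*toy_runops)(toy_op *) = toy_runops_standard;
```

       Assigning "toy_runops = toy_runops_counting;" changes how every later
       run is driven without touching the ops themselves.  The extra work
       inside the loop is paid once per op, which is why such a function
       should stay small.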

Examining internal data structures with the "dump" functions

1798       To aid debugging, the source file dump.c contains a number of functions
1799       which produce formatted output of internal data structures.
1800
1801       The most commonly used of these functions is "Perl_sv_dump"; it's used
1802       for dumping SVs, AVs, HVs, and CVs. The "Devel::Peek" module calls
1803       "sv_dump" to produce debugging output from Perl-space, so users of that
1804       module should already be familiar with its format.
1805
1806       "Perl_op_dump" can be used to dump an "OP" structure or any of its de‐
1807       rivatives, and produces output similar to "perl -Dx"; in fact,
1808       "Perl_dump_eval" will dump the main root of the code being evaluated,
1809       exactly like "-Dx".
1810
1811       Other useful functions are "Perl_dump_sub", which turns a "GV" into an
1812       op tree, "Perl_dump_packsubs" which calls "Perl_dump_sub" on all the
1813       subroutines in a package like so: (Thankfully, these are all xsubs, so
1814       there is no op tree)
1815
1816           (gdb) print Perl_dump_packsubs(PL_defstash)
1817
1818           SUB attributes::bootstrap = (xsub 0x811fedc 0)
1819
1820           SUB UNIVERSAL::can = (xsub 0x811f50c 0)
1821
1822           SUB UNIVERSAL::isa = (xsub 0x811f304 0)
1823
1824           SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0)
1825
1826           SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0)
1827
1828       and "Perl_dump_all", which dumps all the subroutines in the stash and
1829       the op tree of the main root.
1830

How multiple interpreters and concurrency are supported

1832       Background and PERL_IMPLICIT_CONTEXT
1833
1834       The Perl interpreter can be regarded as a closed box: it has an API for
1835       feeding it code or otherwise making it do things, but it also has func‐
1836       tions for its own use.  This smells a lot like an object, and there are
1837       ways for you to build Perl so that you can have multiple interpreters,
1838       with one interpreter represented either as a C structure, or inside a
1839       thread-specific structure.  These structures contain all the context,
1840       the state of that interpreter.
1841
1842       Two macros control the major Perl build flavors: MULTIPLICITY and
1843       USE_5005THREADS.  The MULTIPLICITY build has a C structure that pack‐
1844       ages all the interpreter state, and there is a similar thread-specific
1845       data structure under USE_5005THREADS.  In both cases,
1846       PERL_IMPLICIT_CONTEXT is also normally defined, and enables the support
1847       for passing in a "hidden" first argument that represents all three data
1848       structures.
1849
1850       All this obviously requires a way for the Perl internal functions to be
1851       either subroutines taking some kind of structure as the first argument,
1852       or subroutines taking nothing as the first argument.  To enable these
1853       two very different ways of building the interpreter, the Perl source
1854       (as it does in so many other situations) makes heavy use of macros and
1855       subroutine naming conventions.
1856
1857       First problem: deciding which functions will be public API functions
1858       and which will be private.  All functions whose names begin "S_" are
1859       private (think "S" for "secret" or "static").  All other functions
1860       begin with "Perl_", but just because a function begins with "Perl_"
1861       does not mean it is part of the API. (See "Internal Functions".) The
1862       easiest way to be sure a function is part of the API is to find its
1863       entry in perlapi.  If it exists in perlapi, it's part of the API.  If
1864       it doesn't, and you think it should be (i.e., you need it for your
1865       extension), send mail via perlbug explaining why you think it should
1866       be.
1867
1868       Second problem: there must be a syntax so that the same subroutine dec‐
1869       larations and calls can pass a structure as their first argument, or
1870       pass nothing.  To solve this, the subroutines are named and declared in
1871       a particular way.  Here's a typical start of a static function used
1872       within the Perl guts:
1873
1874         STATIC void
1875         S_incline(pTHX_ char *s)
1876
1877       STATIC becomes "static" in C, and may be #define'd to nothing in some
1878       configurations in the future.
1879
1880       A public function (i.e. part of the internal API, but not necessarily
1881       sanctioned for use in extensions) begins like this:
1882
1883         void
1884         Perl_sv_setiv(pTHX_ SV* dsv, IV num)
1885
1886       "pTHX_" is one of a number of macros (in perl.h) that hide the details
1887       of the interpreter's context.  THX stands for "thread", "this", or
1888       "thingy", as the case may be.  (And no, George Lucas is not involved.
1889       :-) The first character could be 'p' for a prototype, 'a' for argument,
1890       or 'd' for declaration, so we have "pTHX", "aTHX" and "dTHX", and their
1891       variants.
1892
1893       When Perl is built without options that set PERL_IMPLICIT_CONTEXT,
1894       there is no first argument containing the interpreter's context.  The
1895       trailing underscore in the pTHX_ macro indicates that the macro expan‐
1896       sion needs a comma after the context argument because other arguments
1897       follow it.  If PERL_IMPLICIT_CONTEXT is not defined, pTHX_ will be
1898       ignored, and the subroutine is not prototyped to take the extra argu‐
1899       ment.  The form of the macro without the trailing underscore is used
1900       when there are no additional explicit arguments.
1901
1902       When a core function calls another, it must pass the context.  This is
1903       normally hidden via macros.  Consider "sv_setiv".  It expands into
1904       something like this:
1905
1906           #ifdef PERL_IMPLICIT_CONTEXT
1907             #define sv_setiv(a,b)      Perl_sv_setiv(aTHX_ a, b)
1908             /* can't do this for vararg functions, see below */
1909           #else
1910             #define sv_setiv           Perl_sv_setiv
1911           #endif
1912
1913       This works well, and means that XS authors can gleefully write:
1914
1915           sv_setiv(foo, bar);
1916
1917       and still have it work under all the modes Perl could have been com‐
1918       piled with.
1919
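       The macro scheme can be tried out in miniature.  The following sketch
       is plain C with hypothetical "toy_" names throughout: "toy_pTHX_" and
       "toy_aTHX_" are simplified stand-ins for "pTHX_" and "aTHX_",
       "TOY_IMPLICIT_CONTEXT" plays the role of PERL_IMPLICIT_CONTEXT, and
       the interpreter state is reduced to a single counter.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { long counter; } toy_interp;   /* all "interpreter" state */

#define TOY_IMPLICIT_CONTEXT 1   /* set to 0 to model a build without it */

#if TOY_IMPLICIT_CONTEXT
#  define toy_pTHX    toy_interp *my_perl      /* prototype, no other args */
#  define toy_pTHX_   toy_pTHX,                /* prototype, more args follow */
#  define toy_aTHX    my_perl                  /* argument, no other args */
#  define toy_aTHX_   toy_aTHX,                /* argument, more args follow */
#  define TOY_STATE   (*my_perl)
#else
static toy_interp toy_global_interp;           /* one global interpreter */
#  define toy_pTHX    void
#  define toy_pTHX_
#  define toy_aTHX
#  define toy_aTHX_
#  define TOY_STATE   toy_global_interp
#endif

void Toy_add(toy_pTHX_ long n) { TOY_STATE.counter += n; }
long Toy_count(toy_pTHX)       { return TOY_STATE.counter; }

/* One "core" function calling another passes the context along. */
void Toy_add_twice(toy_pTHX_ long n)
{
    Toy_add(toy_aTHX_ n);
    Toy_add(toy_aTHX_ n);
}

/* Short names that hide the context argument from callers. */
#define toy_add_twice(n)  Toy_add_twice(toy_aTHX_ n)
#define toy_count()       Toy_count(toy_aTHX)

/* Caller code is written once and compiles in either configuration. */
long toy_demo(void)
{
    toy_interp interp = { 0 };
    toy_interp *my_perl = &interp;
    (void)my_perl;                 /* unused when there is no context */
    toy_add_twice(5);
    assert(toy_count() == 10);
    return toy_count();
}
```

       The body of "toy_demo" never mentions the context explicitly; whether
       an extra first argument is passed is decided entirely by the macro
       definitions.
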
1920       This doesn't work so cleanly for varargs functions, though, as macros
1921       imply that the number of arguments is known in advance.  Instead we
1922       either need to spell them out fully, passing "aTHX_" as the first argu‐
1923       ment (the Perl core tends to do this with functions like Perl_warner),
1924       or use a context-free version.
1925
1926       The context-free version of Perl_warner is called Perl_warner_nocon‐
1927       text, and does not take the extra argument.  Instead it does dTHX; to
1928       get the context from thread-local storage.  We "#define warner
1929       Perl_warner_nocontext" so that extensions get source compatibility at
1930       the expense of performance.  (Passing an arg is cheaper than grabbing
1931       it from thread-local storage.)
1932
1933       You can ignore [pad]THXx when browsing the Perl headers/sources.  Those
1934       are strictly for use within the core.  Extensions and embedders need
1935       only be aware of [pad]THX.
1936
1937       So what happened to dTHR?
1938
1939       "dTHR" was introduced in perl 5.005 to support the older thread model.
1940       The older thread model now uses the "THX" mechanism to pass context
1941       pointers around, so "dTHR" is not useful any more.  Perl 5.6.0 and
1942       later still have it for backward source compatibility, but it is
1943       defined to be a no-op.
1944
1945       How do I use all this in extensions?
1946
1947       When Perl is built with PERL_IMPLICIT_CONTEXT, extensions that call any
1948       functions in the Perl API will need to pass the initial context argu‐
1949       ment somehow.  The kicker is that you will need to write it in such a
1950       way that the extension still compiles when Perl hasn't been built with
1951       PERL_IMPLICIT_CONTEXT enabled.
1952
1953       There are three ways to do this.  First, the easy but inefficient way,
1954       which is also the default, in order to maintain source compatibility
1955       with extensions: whenever XSUB.h is #included, it redefines the aTHX
1956       and aTHX_ macros to call a function that will return the context.
1957       Thus, something like:
1958
1959               sv_setiv(sv, num);
1960
1961       in your extension will translate to this when PERL_IMPLICIT_CONTEXT is
1962       in effect:
1963
1964               Perl_sv_setiv(Perl_get_context(), sv, num);
1965
1966       or to this otherwise:
1967
1968               Perl_sv_setiv(sv, num);
1969
1970       You have to do nothing new in your extension to get this; since the
1971       Perl library provides Perl_get_context(), it will all just work.
1972
1973       The second, more efficient way is to use the following template for
1974       your Foo.xs:
1975
1976               #define PERL_NO_GET_CONTEXT     /* we want efficiency */
1977               #include "EXTERN.h"
1978               #include "perl.h"
1979               #include "XSUB.h"
1980
1981               static SV *my_private_function(int arg1, int arg2);
1982
1983               static SV *
1984               my_private_function(int arg1, int arg2)
1985               {
1986                   dTHX;       /* fetch context */
1987                   ... call many Perl API functions ...
1988               }
1989
1990               [... etc ...]
1991
1992               MODULE = Foo            PACKAGE = Foo
1993
1994               /* typical XSUB */
1995
1996               void
1997               my_xsub(arg)
1998                       int arg
1999                   CODE:
2000                       my_private_function(arg, 10);
2001
2002       Note that the only two changes from the normal way of writing an exten‐
2003       sion are the addition of a "#define PERL_NO_GET_CONTEXT" before includ‐
2004       ing the Perl headers, followed by a "dTHX;" declaration at the start of
2005       every function that will call the Perl API.  (You'll know which func‐
2006       tions need this, because the C compiler will complain that there's an
2007       undeclared identifier in those functions.)  No changes are needed for
2008       the XSUBs themselves, because the XS() macro is correctly defined to
2009       pass in the implicit context if needed.
2010
2011       The third, even more efficient way is to ape how it is done within the
2012       Perl guts:
2013
2014               #define PERL_NO_GET_CONTEXT     /* we want efficiency */
2015               #include "EXTERN.h"
2016               #include "perl.h"
2017               #include "XSUB.h"
2018
2019               /* pTHX_ only needed for functions that call Perl API */
2020               static SV *my_private_function(pTHX_ int arg1, int arg2);
2021
2022               static SV *
2023               my_private_function(pTHX_ int arg1, int arg2)
2024               {
2025                   /* dTHX; not needed here, because THX is an argument */
2026                   ... call Perl API functions ...
2027               }
2028
2029               [... etc ...]
2030
2031               MODULE = Foo            PACKAGE = Foo
2032
2033               /* typical XSUB */
2034
2035               void
2036               my_xsub(arg)
2037                       int arg
2038                   CODE:
2039                       my_private_function(aTHX_ arg, 10);
2040
2041       This implementation never has to fetch the context using a function
2042       call, since it is always passed as an extra argument.  Depending on
2043       your needs for simplicity or efficiency, you may mix the previous two
2044       approaches freely.
2045
2046       Never add a comma after "pTHX" yourself. Always use the form of the
2047       macro with the underscore for functions that take explicit arguments,
2048       or the form without the underscore for functions with no explicit
2049       arguments.
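
           The comma rule is easier to see in a stripped-down model.  The
           sketch below is plain C and does not use the Perl headers; the
           names Interp, pCTX, pCTX_, aCTX and aCTX_ are hypothetical
           stand-ins for the interpreter structure and the [pa]THX family,
           and it models only a build where the context is passed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the interpreter structure. */
typedef struct Interp { int counter; } Interp;

/* Model of the pTHX/aTHX family: the underscore forms carry the comma. */
#define pCTX   Interp *my_ctx   /* parameter, no further arguments    */
#define pCTX_  pCTX,            /* parameter, more arguments follow   */
#define aCTX   my_ctx           /* argument, no further arguments     */
#define aCTX_  aCTX,            /* argument, more arguments follow    */

/* Takes explicit arguments, so it uses the underscore form. */
static int bump_by(pCTX_ int amount)
{
    assert(my_ctx != NULL);
    my_ctx->counter += amount;
    return my_ctx->counter;
}

/* Takes no explicit arguments, so it uses the form without the
 * underscore in its parameter list. */
static int bump_once(pCTX)
{
    return bump_by(aCTX_ 1);
}
```

           Because the comma lives inside the macro, the same source can
           compile when the macros expand to nothing, which is what lets
           one extension build with or without PERL_IMPLICIT_CONTEXT.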
2050
2051       Should I do anything special if I call perl from multiple threads?
2052
2053       If you create interpreters in one thread and then proceed to call them
2054       in another, you need to make sure perl's own Thread Local Storage (TLS)
2055       slot is initialized correctly in each of those threads.
2056
2057       The "perl_alloc" and "perl_clone" API functions will automatically set
2058       the TLS slot to the interpreter they created, so that there is no need
2059       to do anything special if the interpreter is always accessed in the
2060       same thread that created it, and that thread did not create or call any
2061       other interpreters afterwards.  If that is not the case, you have to
2062       set the TLS slot of the thread before calling any functions in the Perl
2063       API on that particular interpreter.  This is done by calling the
2064       "PERL_SET_CONTEXT" macro in that thread as the first thing you do:
2065
2066               /* do this before doing anything else with some_perl */
2067               PERL_SET_CONTEXT(some_perl);
2068
2069               ... other Perl API calls on some_perl go here ...
2070
2071       Future Plans and PERL_IMPLICIT_SYS
2072
2073       Just as PERL_IMPLICIT_CONTEXT provides a way to bundle up everything
2074       that the interpreter knows about itself and pass it around, so too are
2075       there plans to allow the interpreter to bundle up everything it knows
2076       about the environment it's running on.  This is enabled with the
2077       PERL_IMPLICIT_SYS macro.  Currently it only works with USE_ITHREADS and
2078       USE_5005THREADS on Windows (see inside iperlsys.h).
2079
2080       This makes it possible to pass an extra pointer (called the "host"
2081       environment) to all the system calls, so that the system-level code
2082       can maintain its own state, broken down into seven C structures.
2083       These are thin wrappers around the usual system calls (see
2084       win32/perllib.c) for the default perl executable, but for a more ambi‐
2085       tious host (like the one that would do fork() emulation) all the extra
2086       work needed to pretend that different interpreters are actually differ‐
2087       ent "processes", would be done here.
2088
2089       The Perl engine/interpreter and the host are orthogonal entities.
2090       There could be one or more interpreters in a process, and one or more
2091       "hosts", with free association between them.
2092

Internal Functions

2094       All of Perl's internal functions which will be exposed to the outside
2095       world are prefixed by "Perl_" so that they will not conflict with XS
2096       functions or functions used in a program in which Perl is embedded.
2097       Similarly, all global variables begin with "PL_". (By convention,
2098       static functions start with "S_".)
2099
2100       Inside the Perl core, you can get at the functions either with or with‐
2101       out the "Perl_" prefix, thanks to a bunch of defines that live in
2102       embed.h. This header file is generated automatically from embed.pl and
2103       embed.fnc. embed.pl also creates the prototyping header files for the
2104       internal functions, generates the documentation and a lot of other bits
2105       and pieces. It's important that when you add a new function to the core
2106       or change an existing one, you change the data in the table in
2107       embed.fnc as well. Here's a sample entry from that table:
2108
2109           Apd |SV**   |av_fetch   |AV* ar|I32 key|I32 lval
2110
2111       The second column is the return type, the third column the name. Col‐
2112       umns after that are the arguments. The first column is a set of flags:
2113
2114       A  This function is a part of the public API. All such functions should
2115          also have 'd'; very few do not.
2116
2117       p  This function has a "Perl_" prefix; i.e. it is defined as
2118          "Perl_av_fetch".
2119
2120       d  This function has documentation using the "apidoc" feature which
2121          we'll look at in a second.  Some functions have 'd' but not 'A';
2122          docs are good.
2123
2124       Other available flags are:
2125
2126       s  This is a static function and is defined as "STATIC S_whatever", and
2127          usually called within the sources as "whatever(...)".
2128
2129       n  This does not need an interpreter context, so the definition has no
2130          "pTHX", and it follows that callers don't use "aTHX".  (See "Back‐
2131          ground and PERL_IMPLICIT_CONTEXT" in perlguts.)
2132
2133       r  This function never returns; "croak", "exit" and friends.
2134
2135       f  This function takes a variable number of arguments, "printf" style.
2136          The argument list should end with "...", like this:
2137
2138              Afprd   |void   |croak          |const char* pat|...
2139
2140       M  This function is part of the experimental development API, and may
2141          change or disappear without notice.
2142
2143       o  This function should not have a compatibility macro to define, say,
2144          "Perl_parse" to "parse". It must be called as "Perl_parse".
2145
2146       x  This function isn't exported out of the Perl core.
2147
2148       m  This is implemented as a macro.
2149
2150       X  This function is explicitly exported.
2151
2152       E  This function is visible to extensions included in the Perl core.
2153
2154       b  Binary backward compatibility; this function is a macro but also has
2155          a "Perl_" implementation (which is exported).
2156
2157       others
2158          See the comments at the top of "embed.fnc" for others.
2159
2160       If you edit embed.pl or embed.fnc, you will need to run "make
2161       regen_headers" to force a rebuild of embed.h and other auto-generated
2162       files.
2163
2164       Formatted Printing of IVs, UVs, and NVs
2165
2166       If you are printing IVs, UVs, or NVs, then instead of the stdio(3) style
2167       formatting codes like %d, %ld, %f, you should use the following macros
2168       for portability:
2169
2170               IVdf            IV in decimal
2171               UVuf            UV in decimal
2172               UVof            UV in octal
2173               UVxf            UV in hexadecimal
2174               NVef            NV %e-like
2175               NVff            NV %f-like
2176               NVgf            NV %g-like
2177
2178       These will take care of 64-bit integers and long doubles.  For example:
2179
2180               printf("IV is %" IVdf "\n", iv);
2181
2182       The IVdf will expand to whatever is the correct format for the IVs.
2183
2184       If you are printing addresses of pointers, use UVxf combined with
2185       PTR2UV(); do not use %lx or %p.
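
           The format macros work through string literal concatenation.
           The following sketch shows the same technique in plain C with
           <inttypes.h>; my_iv and MY_IVdf are hypothetical stand-ins
           chosen for illustration and are not Perl's own definitions.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins, not Perl's definitions: the widest signed
 * integer type and the format fragment that matches it. */
typedef intmax_t my_iv;
#define MY_IVdf PRIdMAX

/* Format iv into buf; returns what snprintf returns. */
static int format_iv(char *buf, size_t size, my_iv iv)
{
    assert(buf != NULL && size > 0);
    /* Adjacent string literals are concatenated by the compiler, so the
     * macro supplies the length modifier and the conversion character. */
    return snprintf(buf, size, "IV is %" MY_IVdf "\n", iv);
}
```

           The caller never writes %d or %ld directly, so the format
           keeps matching the argument when the integer type changes
           width from one platform to another.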
2186
2187       Pointer-To-Integer and Integer-To-Pointer
2188
2189       Because pointer size does not necessarily equal integer size, use the
2190       following macros to convert between the two correctly.
2191
2192               PTR2UV(pointer)
2193               PTR2IV(pointer)
2194               PTR2NV(pointer)
2195               INT2PTR(pointertotype, integer)
2196
2197       For example:
2198
2199               IV  iv = ...;
2200               SV *sv = INT2PTR(SV*, iv);
2201
2202       and
2203
2204               AV *av = ...;
2205               UV  uv = PTR2UV(av);
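
           As a rough standalone illustration of why a dedicated
           conversion is used, the sketch below sends a pointer through
           an integer wide enough to hold it.  MY_PTR2UV and MY_INT2PTR
           are hypothetical stand-ins built on uintptr_t; they are an
           assumption of this example, not the definitions in the Perl
           headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for PTR2UV and INT2PTR, built on uintptr_t. */
#define MY_PTR2UV(p)        ((uintptr_t)(p))
#define MY_INT2PTR(type, i) ((type)(uintptr_t)(i))

/* Send a pointer through an integer and back again. */
static int *via_integer(int *p)
{
    uintptr_t uv = MY_PTR2UV(p);
    int *back = MY_INT2PTR(int *, uv);
    assert(back == p);   /* the round trip preserves the pointer */
    return back;
}
```

           Casting through a plain int or long instead can truncate the
           pointer on platforms where those types are narrower than a
           pointer.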
2206
2207       Source Documentation
2208
2209       There's an effort going on to document the internal functions and auto‐
2210       matically produce reference manuals from them - perlapi is one such
2211       manual which details all the functions which are available to XS writ‐
2212       ers. perlintern is the autogenerated manual for the functions which are
2213       not part of the API and are supposedly for internal use only.
2214
2215       Source documentation is created by putting POD comments into the C
2216       source, like this:
2217
2218        /*
2219        =for apidoc sv_setiv
2220
2221        Copies an integer into the given SV.  Does not handle 'set' magic.  See
2222        C<sv_setiv_mg>.
2223
2224        =cut
2225        */
2226
2227       Please try and supply some documentation if you add functions to the
2228       Perl core.
2229
2230       Backwards compatibility
2231
2232       The Perl API changes over time. New functions are added or the inter‐
2233       faces of existing functions are changed. The "Devel::PPPort" module
2234       tries to provide compatibility code for some of these changes, so XS
2235       writers don't have to code it themselves when supporting multiple ver‐
2236       sions of Perl.
2237
2238       "Devel::PPPort" generates a C header file ppport.h that can also be run
2239       as a Perl script. To generate ppport.h, run:
2240
2241           perl -MDevel::PPPort -eDevel::PPPort::WriteFile
2242
2243       Besides checking existing XS code, the script can also be used to
2244       retrieve compatibility information for various API calls using the
2245       "--api-info" command line switch. For example:
2246
2247         % perl ppport.h --api-info=sv_magicext
2248
2249       For details, see "perldoc ppport.h".
2250

Unicode Support

2252       Perl 5.6.0 introduced Unicode support. It's important for porters and
2253       XS writers to understand this support and make sure that the code they
2254       write does not corrupt Unicode data.
2255
2256       What is Unicode, anyway?
2257
2258       In the olden, less enlightened times, we all used to use ASCII. Most of
2259       us did, anyway. The big problem with ASCII is that it's American. Well,
2260       no, that's not actually the problem; the problem is that it's not par‐
2261       ticularly useful for people who don't use the Roman alphabet. What used
2262       to happen was that particular languages would stick their own alphabet
2263       in the upper range of the sequence, between 128 and 255. Of course, we
2264       then ended up with plenty of variants that weren't quite ASCII, and the
2265       whole point of it being a standard was lost.
2266
2267       Worse still, if you've got a language like Chinese or Japanese that has
2268       hundreds or thousands of characters, then you really can't fit them
2269       into a mere 256, so they had to forget about ASCII altogether, and
2270       build their own systems using pairs of numbers to refer to one charac‐
2271       ter.
2272
2273       To fix this, some people formed Unicode, Inc. and produced a new char‐
2274       acter set containing all the characters you can possibly think of and
2275       more. There are several ways of representing these characters, and the
2276       one Perl uses is called UTF-8. UTF-8 uses a variable number of bytes to
2277       represent a character, instead of just one. You can learn more about
2278       Unicode at http://www.unicode.org/
2279
2280       How can I recognise a UTF-8 string?
2281
2282       You can't. This is because UTF-8 data is stored in bytes just like
2283       non-UTF-8 data. The Unicode character 200 (0xC8 for you hex types),
2284       capital E with a grave accent, is represented by the two bytes
2285       "v195.136". Unfortunately, the non-Unicode string "chr(195).chr(136)"
2286       has that byte sequence as well. So you can't tell just by looking -
2287       this is what makes Unicode input an interesting problem.
2288
2289       The API function "is_utf8_string" can help; it'll tell you if a string
2290       contains only valid UTF-8 characters. However, it can't do the work for
2291       you. On a character-by-character basis, "is_utf8_char" will tell you
2292       whether the current character in a string is valid UTF-8.
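
           To make concrete what such a check can and cannot tell you,
           here is a simplified structural validator in plain C.  It is a
           sketch, not Perl's implementation: well_formed_utf8 is a
           hypothetical name, and the check only verifies that lead
           bytes are followed by the right number of continuation bytes
           (it does not reject overlong forms or surrogates).

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if every lead byte in s[0..len) is followed by the number
 * of continuation bytes it announces, 0 otherwise. */
static int well_formed_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    assert(s != NULL || len == 0);
    while (i < len) {
        size_t need;
        unsigned char c = s[i++];
        if (c < 0x80)                need = 0;   /* single byte      */
        else if ((c & 0xE0) == 0xC0) need = 1;   /* two-byte lead    */
        else if ((c & 0xF0) == 0xE0) need = 2;   /* three-byte lead  */
        else if ((c & 0xF8) == 0xF0) need = 3;   /* four-byte lead   */
        else return 0;          /* stray continuation or invalid lead */
        while (need--) {
            if (i >= len || (s[i] & 0xC0) != 0x80)
                return 0;       /* truncated or not a continuation    */
            i++;
        }
    }
    return 1;
}
```

           A positive answer only says the bytes could be UTF-8.  A pair
           of high-bit bytes that was meant as two separate characters
           passes the same check, which is why the check cannot decide
           for you.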
2293
2294       How does UTF-8 represent Unicode characters?
2295
2296       As mentioned above, UTF-8 uses a variable number of bytes to store a
2297       character. Characters with values 0...127 are stored in one byte, just
2298       like good ol' ASCII. Character 128 is stored as "v194.128"; this
2299       continues up to character 191, which is "v194.191". Now we've run out of
2300       bits (191 is binary 10111111) so we move on; 192 is "v195.128". And so
2301       it goes on, moving to three bytes at character 2048.
2302
2303       Assuming you know you're dealing with a UTF-8 string, you can find out
2304       how long the first character in it is with the "UTF8SKIP" macro:
2305
2306           char *utf = "\305\233\340\240\201";
2307           I32 len;
2308
2309           len = UTF8SKIP(utf); /* len is 2 here */
2310           utf += len;
2311           len = UTF8SKIP(utf); /* len is 3 here */
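
           The byte layout described above can be reproduced in a few
           lines of plain C.  This is a sketch for ASCII platforms only;
           encode_utf8 and skip_len are hypothetical helpers written for
           this example (skip_len plays a role similar to UTF8SKIP but is
           not its definition), and the encoder is limited to values
           below 0x10000.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode cp (below 0x10000) into out; returns the number of bytes. */
static size_t encode_utf8(uint32_t cp, unsigned char *out)
{
    assert(out != NULL && cp < 0x10000);
    if (cp < 0x80) {                      /* 0..127: one byte        */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 128..2047: two bytes    */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* 2048..65535: three bytes */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Length in bytes of the character that starts with this lead byte. */
static size_t skip_len(unsigned char lead)
{
    if (lead < 0x80)           return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;   /* not a lead byte: step one byte (a choice of this sketch) */
}
```

           With these helpers, character 129 encodes as bytes 194 and
           129, character 192 as 195 and 128, and character 2048 is the
           first to need three bytes.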
2312
2313       Another way to skip over characters in a UTF-8 string is to use
2314       "utf8_hop", which takes a string and a number of characters to skip
2315       over. You're on your own about bounds checking, though, so don't use it
2316       lightly.
2317
2318       All bytes in a multi-byte UTF-8 character will have the high bit set,
2319       so you can test if you need to do something special with this character
2320       like this (UTF8_IS_INVARIANT() is a macro that tests whether the
2321       byte can be encoded as a single byte even in UTF-8):
2322
2323           U8 *utf;
2324           UV uv;      /* Note: a UV, not a U8, not a char */
2325
2326           if (!UTF8_IS_INVARIANT(*utf))
2327               /* Must treat this as UTF-8 */
2328               uv = utf8_to_uv(utf);
2329           else
2330               /* OK to treat this character as a byte */
2331               uv = *utf;
2332
2333       You can also see in that example that we use "utf8_to_uv" to get the
2334       value of the character; the inverse function "uv_to_utf8" is available
2335       for putting a UV into UTF-8:
2336
2337           if (!UTF8_IS_INVARIANT(uv))
2338               /* Must treat this as UTF8 */
2339               utf8 = uv_to_utf8(utf8, uv);
2340           else
2341               /* OK to treat this character as a byte */
2342               *utf8++ = uv;
2343
2344       You must convert characters to UVs using the above functions if you're
2345       ever in a situation where you have to match UTF-8 and non-UTF-8 charac‐
2346       ters. You may not skip over UTF-8 characters in this case. If you do
2347       this, you'll lose the ability to match hi-bit non-UTF-8 characters; for
2348       instance, if your UTF-8 string contains "v195.136", and you skip that
2349       character, you can never match a "chr(200)" in a non-UTF-8 string.  So
2350       don't do that!
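
           A small model of matching across the two kinds of string may
           help.  The sketch below is plain C and assumes well-formed
           input of at most three bytes per character; next_char is a
           hypothetical helper for this example, not a Perl API
           function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the character at s[*pos] and advance *pos past it.  When
 * is_utf8 is false every byte is a character in its own right. */
static uint32_t next_char(const unsigned char *s, size_t *pos, int is_utf8)
{
    unsigned char c;
    uint32_t cp;

    assert(s != NULL && pos != NULL);
    c = s[(*pos)++];
    if (!is_utf8 || c < 0x80)
        return c;                          /* the byte is the character */
    if ((c & 0xE0) == 0xC0) {              /* two-byte form             */
        cp  = (uint32_t)(c & 0x1F) << 6;
        cp |= (uint32_t)(s[(*pos)++] & 0x3F);
        return cp;
    }
    cp  = (uint32_t)(c & 0x0F) << 12;      /* three-byte form           */
    cp |= (uint32_t)(s[(*pos)++] & 0x3F) << 6;
    cp |= (uint32_t)(s[(*pos)++] & 0x3F);
    return cp;
}
```

           Comparing the values returned for each string, rather than
           the raw bytes, lets a single byte 200 in a non-UTF-8 string
           match the two bytes 195 and 136 in a UTF-8 string.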
2351
2352       How does Perl store UTF-8 strings?
2353
2354       Currently, Perl deals with Unicode strings and non-Unicode strings
2355       slightly differently. If a string has been identified as being UTF-8
2356       encoded, Perl will set a flag in the SV, "SVf_UTF8". You can check and
2357       manipulate this flag with the following macros:
2358
2359           SvUTF8(sv)
2360           SvUTF8_on(sv)
2361           SvUTF8_off(sv)
2362
2363       This flag has an important effect on Perl's treatment of the string: if
2364       Unicode data is not properly distinguished, regular expressions,
2365       "length", "substr" and other string handling operations will have unde‐
2366       sirable results.
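
           To see why the flag changes the result of something as basic
           as "length", consider this plain C model.  The flagged_str
           type and char_length function are hypothetical and only mimic
           the idea of a byte buffer that carries a separate UTF-8 flag;
           the counting assumes well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a string value with a separate UTF-8 flag. */
typedef struct {
    const unsigned char *bytes;
    size_t len;        /* length in bytes                  */
    int is_utf8;       /* plays the role of the UTF-8 flag */
} flagged_str;

/* Count characters: bytes when unflagged, lead bytes when flagged. */
static size_t char_length(const flagged_str *s)
{
    size_t i, n = 0;
    assert(s != NULL);
    if (!s->is_utf8)
        return s->len;
    for (i = 0; i < s->len; i++)
        if ((s->bytes[i] & 0xC0) != 0x80)   /* skip continuation bytes */
            n++;
    return n;
}
```

           The same two bytes count as two characters without the flag
           and as one character with it, so losing the flag silently
           changes what the string means.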
2367
2368       The problem comes when you have, for instance, a string that isn't
2369       flagged as UTF-8, and contains a byte sequence that could be UTF-8 -
2370       especially when combining non-UTF-8 and UTF-8 strings.
2371
2372       Never forget that the "SVf_UTF8" flag is separate from the PV value;
2373       you need to be sure you don't accidentally knock it off while you're
2374       manipulating SVs. More specifically, you cannot expect to do this:
2375
2376           SV *sv;
2377           SV *nsv;
2378           STRLEN len;
2379           char *p;
2380
2381           p = SvPV(sv, len);
2382           frobnicate(p);
2383           nsv = newSVpvn(p, len);
2384
2385       The "char*" string does not tell you the whole story, and you can't
2386       copy or reconstruct an SV just by copying the string value. Check if
2387       the old SV has the UTF-8 flag set, and act accordingly:
2388
2389           p = SvPV(sv, len);
2390           frobnicate(p);
2391           nsv = newSVpvn(p, len);
2392           if (SvUTF8(sv))
2393               SvUTF8_on(nsv);
2394
2395       In fact, your "frobnicate" function should be made aware of whether or
2396       not it's dealing with UTF-8 data, so that it can handle the string
2397       appropriately.
2398
2399       Copying the data of an SV that was passed to an XS function is not
2400       enough to copy the UTF-8 flag, and passing a bare "char *" to an XS
2401       function is even worse, because the flag is lost before you start.
2402
2403       How do I convert a string to UTF-8?
2404
2405       If you're mixing UTF-8 and non-UTF-8 strings, you might find it neces‐
2406       sary to upgrade one of the strings to UTF-8. If you've got an SV, the
2407       easiest way to do this is:
2408
2409           sv_utf8_upgrade(sv);
2410
2411       However, you must not do this, for example:
2412
2413           if (!SvUTF8(left))
2414               sv_utf8_upgrade(left);
2415
2416       If you do this in a binary operator, you will actually change one of
2417       the strings that came into the operator, and, while it shouldn't be
2418       noticeable by the end user, it can cause problems.
2419
2420       Instead, "bytes_to_utf8" will give you a UTF-8-encoded copy of its
2421       string argument. This is useful for having the data available for com‐
2422       parisons and so on, without harming the original SV. There's also
2423       "utf8_to_bytes" to go the other way, but naturally, this will fail if
2424       the string contains any characters above 255 that can't be represented
2425       in a single byte.
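
           The idea of upgrading into a copy can be sketched in plain C
           as follows.  This is an illustration for ASCII platforms, not
           the real function: upgrade_copy is a hypothetical name, its
           signature is invented for this example, and it allocates with
           malloc rather than Perl's own allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Return a malloc'd, NUL-terminated UTF-8 copy of the len bytes at
 * src and store its length in *out_len.  The caller frees the copy.
 * The source buffer is never modified. */
static unsigned char *upgrade_copy(const unsigned char *src, size_t len,
                                   size_t *out_len)
{
    unsigned char *dst, *d;
    size_t i;

    assert(src != NULL && out_len != NULL);
    dst = malloc(2 * len + 1);   /* each byte grows to at most two */
    if (dst == NULL)
        return NULL;
    d = dst;
    for (i = 0; i < len; i++) {
        unsigned char c = src[i];
        if (c < 0x80) {
            *d++ = c;                               /* unchanged      */
        } else {
            *d++ = (unsigned char)(0xC0 | (c >> 6));    /* lead byte  */
            *d++ = (unsigned char)(0x80 | (c & 0x3F));  /* continuation */
        }
    }
    *d = '\0';
    *out_len = (size_t)(d - dst);
    return dst;
}
```

           Because the result is a fresh buffer, the operand that came
           into the operator keeps its original bytes, which is the
           property the in-place upgrade lacks.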
2426
2427       Is there anything else I need to know?
2428
2429       Not really. Just remember these things:
2430
2431       ·  There's no way to tell if a string is UTF-8 or not. You can tell if
2432          an SV is UTF-8 by looking at its "SvUTF8" flag. Don't forget to set
2433          the flag if something should be UTF-8. Treat the flag as part of the
2434          PV, even though it's not - if you pass on the PV to somewhere, pass
2435          on the flag too.
2436
2437       ·  If a string is UTF-8, always use "utf8_to_uv" to get at the value,
2438          unless "UTF8_IS_INVARIANT(*s)" in which case you can use *s.
2439
2440       ·  When writing a character "uv" to a UTF-8 string, always use
2441          "uv_to_utf8", unless "UTF8_IS_INVARIANT(uv)" in which case you can
2442          use "*s = uv".
2443
2444       ·  Mixing UTF-8 and non-UTF-8 strings is tricky. Use "bytes_to_utf8" to
2445          get a new string which is UTF-8 encoded. There are tricks you can
2446          use to delay deciding whether you need to use a UTF-8 string until
2447          you get to a high character - "HALF_UPGRADE" is one of those.
2448

Custom Operators

2450       Custom operator support is a new experimental feature that allows you
2451       to define your own ops. This is primarily to allow the building of
2452       interpreters for other languages in the Perl core, but it also allows
2453       optimizations through the creation of "macro-ops" (ops which perform
2454       the functions of multiple ops which are usually executed together, such
2455       as "gvsv, gvsv, add".)
2456
2457       This feature is implemented as a new op type, "OP_CUSTOM". The Perl
2458       core does not "know" anything special about this op type, and so it
2459       will not be involved in any optimizations. This also means that you can
2460       define your custom ops to be any op structure - unary, binary, list and
2461       so on - you like.
2462
2463       It's important to know what custom operators won't do for you. They
2464       won't let you add new syntax to Perl, directly. They won't even let you
2465       add new keywords, directly. In fact, they won't change the way Perl
2466       compiles a program at all. You have to do those changes yourself, after
2467       Perl has compiled the program. You do this either by manipulating the
2468       op tree using a "CHECK" block and the "B::Generate" module, or by
2469       adding a custom peephole optimizer with the "optimize" module.
2470
2471       When you do this, you replace ordinary Perl ops with custom ops by
2472       creating ops with the type "OP_CUSTOM" and the "op_ppaddr" of your own PP
2473       function. This should be defined in XS code, and should look like the
2474       PP ops in "pp_*.c". You are responsible for ensuring that your op takes
2475       the appropriate number of values from the stack, and you are responsi‐
2476       ble for adding stack marks if necessary.
2477
2478       You should also "register" your op with the Perl interpreter so that it
2479       can produce sensible error and warning messages. Since it is possible
2480       to have multiple custom ops within the one "logical" op type
2481       "OP_CUSTOM", Perl uses the value of "o->op_ppaddr" as a key into the
2482       "PL_custom_op_descs" and "PL_custom_op_names" hashes. This means you
2483       need to enter a name and description for your op at the appropriate
2484       place in the "PL_custom_op_names" and "PL_custom_op_descs" hashes.
2485
2486       Forthcoming versions of "B::Generate" (version 1.0 and above) should
2487       directly support the creation of custom ops by name.
2488

AUTHORS

2490       Until May 1997, this document was maintained by Jeff Okamoto
2491       <okamoto@corp.hp.com>.  It is now maintained as part of Perl itself by
2492       the Perl 5 Porters <perl5-porters@perl.org>.
2493
2494       With lots of help and suggestions from Dean Roehrich, Malcolm Beattie,
2495       Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil Bow‐
2496       ers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer, Stephen
2497       McCamant, and Gurusamy Sarathy.
2498