PERLGUTS(1)        Perl Programmers Reference Guide        PERLGUTS(1)

perlguts - Introduction to the Perl API
7
9 This document attempts to describe how to use the Perl API, as well as
10 to provide some info on the basic workings of the Perl core. It is far
11 from complete and probably contains many errors. Please refer any
12 questions or comments to the author below.
13
15 Datatypes
16 Perl has three typedefs that handle Perl's three main data types:
17
18 SV Scalar Value
19 AV Array Value
20 HV Hash Value
21
22 Each typedef has specific routines that manipulate the various data
23 types.
24
25 What is an "IV"?
26 Perl uses a special typedef IV which is a simple signed integer type
27 that is guaranteed to be large enough to hold a pointer (as well as an
28 integer). Additionally, there is the UV, which is simply an unsigned
29 IV.
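
The pointer-holding guarantee can be illustrated without the Perl
headers. The sketch below is only an approximation: "my_iv" and "my_uv"
are hypothetical stand-ins built on C99's intptr_t and uintptr_t, whereas
the real IV and UV typedefs come from perl.h and are chosen when Perl is
configured.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for IV and UV (assumption: not Perl's own
 * typedefs, just integer types wide enough for a pointer). */
typedef intptr_t  my_iv;
typedef uintptr_t my_uv;

/* Store a pointer in an integer slot. */
my_iv ptr_to_iv(void *p)
{
    return (my_iv)p;
}

/* Recover the pointer from the integer slot. */
void *iv_to_ptr(my_iv iv)
{
    return (void *)iv;
}

/* Returns 1 when a pointer survives the round trip through my_iv. */
int iv_round_trip_ok(void)
{
    int object = 42;
    my_iv slot = ptr_to_iv(&object);
    int *back = (int *)iv_to_ptr(slot);

    return back == &object && *back == 42;
}
```

This is the property that lets extension code stash a C pointer in the
integer slot of a scalar and get it back later.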
30
Perl also uses two special typedefs, I32 and I16, which will always be
at least 32 bits and 16 bits long, respectively. (Again, there are U32
and U16, as well.) They will usually be exactly 32 and 16 bits long,
but on Crays they will both be 64 bits.
35
36 Working with SVs
37 An SV can be created and loaded with one command. There are five types
38 of values that can be loaded: an integer value (IV), an unsigned
39 integer value (UV), a double (NV), a string (PV), and another scalar
40 (SV).
41
42 The seven routines are:
43
44 SV* newSViv(IV);
45 SV* newSVuv(UV);
46 SV* newSVnv(double);
47 SV* newSVpv(const char*, STRLEN);
48 SV* newSVpvn(const char*, STRLEN);
49 SV* newSVpvf(const char*, ...);
50 SV* newSVsv(SV*);
51
52 "STRLEN" is an integer type (Size_t, usually defined as size_t in
53 config.h) guaranteed to be large enough to represent the size of any
54 string that perl can handle.
55
In the unlikely case of an SV requiring more complex initialisation, you
can create an empty SV with newSV(len). If "len" is 0, an empty SV of
type NULL is returned; otherwise an SV of type PV is returned with
len + 1 (for the NUL) bytes of storage allocated, accessible via SvPVX.
In both cases the SV has the undef value.
61
62 SV *sv = newSV(0); /* no storage allocated */
63 SV *sv = newSV(10); /* 10 (+1) bytes of uninitialised storage
64 * allocated */
65
66 To change the value of an already-existing SV, there are eight
67 routines:
68
    void  sv_setiv(SV*, IV);
    void  sv_setuv(SV*, UV);
    void  sv_setnv(SV*, double);
    void  sv_setpv(SV*, const char*);
    void  sv_setpvn(SV*, const char*, STRLEN);
    void  sv_setpvf(SV*, const char*, ...);
    void  sv_vsetpvfn(SV*, const char*, STRLEN, va_list *,
                      SV **, I32, bool *);
    void  sv_setsv(SV*, SV*);
78
79 Notice that you can choose to specify the length of the string to be
80 assigned by using "sv_setpvn", "newSVpvn", or "newSVpv", or you may
81 allow Perl to calculate the length by using "sv_setpv" or by specifying
82 0 as the second argument to "newSVpv". Be warned, though, that Perl
83 will determine the string's length by using "strlen", which depends on
84 the string terminating with a NUL character, and not otherwise
85 containing NULs.
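
The pitfall can be seen in plain C, with no Perl involved. In this
minimal sketch the buffer "payload" and the helper names are
hypothetical; the point is only that a length computed by "strlen"
silently truncates data with an embedded NUL, while an explicit length
(as passed to "sv_setpvn" or "newSVpvn") does not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Five payload bytes with a NUL in the middle; the compiler appends
 * one more NUL as the terminator, so sizeof(payload) is 6. */
static const char payload[] = "ab\0cd";

/* Length as a strlen-based routine would see it: stops at the
 * first NUL byte. */
size_t payload_strlen_length(void)
{
    return strlen(payload);
}

/* Length the caller actually knows and should pass explicitly. */
size_t payload_true_length(void)
{
    return sizeof(payload) - 1;
}
```

Here the strlen-based length is 2 while the true length is 5, so three
bytes would be lost if the length were left for Perl to calculate.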
86
87 The arguments of "sv_setpvf" are processed like "sprintf", and the
88 formatted output becomes the value.
89
90 "sv_vsetpvfn" is an analogue of "vsprintf", but it allows you to
91 specify either a pointer to a variable argument list or the address and
92 length of an array of SVs. The last argument points to a boolean; on
93 return, if that boolean is true, then locale-specific information has
94 been used to format the string, and the string's contents are therefore
95 untrustworthy (see perlsec). This pointer may be NULL if that
96 information is not important. Note that this function requires you to
97 specify the length of the format.
98
99 The "sv_set*()" functions are not generic enough to operate on values
100 that have "magic". See "Magic Virtual Tables" later in this document.
101
All SVs that contain strings should be terminated with a NUL character.
If a string is not NUL-terminated, there is a risk of core dumps and
corruption from code that passes the string to C functions or system
calls which expect a NUL-terminated string. Perl's own functions
typically add a trailing NUL for this reason. Nevertheless, you should
be very careful when you pass a string stored in an SV to a C function
or system call.
109
110 To access the actual value that an SV points to, you can use the
111 macros:
112
113 SvIV(SV*)
114 SvUV(SV*)
115 SvNV(SV*)
116 SvPV(SV*, STRLEN len)
117 SvPV_nolen(SV*)
118
119 which will automatically coerce the actual scalar type into an IV, UV,
120 double, or string.
121
122 In the "SvPV" macro, the length of the string returned is placed into
123 the variable "len" (this is a macro, so you do not use &len). If you
124 do not care what the length of the data is, use the "SvPV_nolen" macro.
125 Historically the "SvPV" macro with the global variable "PL_na" has been
126 used in this case. But that can be quite inefficient because "PL_na"
127 must be accessed in thread-local storage in threaded Perl. In any
128 case, remember that Perl allows arbitrary strings of data that may both
129 contain NULs and might not be terminated by a NUL.
130
131 Also remember that C doesn't allow you to safely say "foo(SvPV(s, len),
132 len);". It might work with your compiler, but it won't work for
133 everyone. Break this sort of statement up into separate assignments:
134
135 SV *s;
136 STRLEN len;
137 char *ptr;
138 ptr = SvPV(s, len);
139 foo(ptr, len);
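
The reason is that C leaves the order in which function arguments are
evaluated unspecified, so the second argument may be read before the
macro has assigned to it. The stand-alone sketch below mimics the
situation; "struct toy_sv", "TOY_SvPV" and "count_bytes" are
hypothetical names invented for the illustration and are not part of
the Perl API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an SV that holds a string. */
struct toy_sv {
    const char *pv;   /* string pointer */
    size_t      cur;  /* string length  */
};

/* Like SvPV: yields the string pointer and, as a side effect,
 * stores the length in lenvar. */
#define TOY_SvPV(sv, lenvar) ((lenvar) = (sv)->cur, (sv)->pv)

/* A consumer that needs both the pointer and the length. */
size_t count_bytes(const char *ptr, size_t len)
{
    return ptr ? len : 0;
}

/* Safe form: the macro runs in its own statement, so len is
 * assigned before it is read as an argument. */
size_t safe_call(const struct toy_sv *s)
{
    size_t len;
    const char *ptr = TOY_SvPV(s, len);

    return count_bytes(ptr, len);
}
```

Writing "count_bytes(TOY_SvPV(s, len), len)" instead would let the
compiler read "len" before the macro has set it.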
140
141 If you want to know if the scalar value is TRUE, you can use:
142
143 SvTRUE(SV*)
144
145 Although Perl will automatically grow strings for you, if you need to
146 force Perl to allocate more memory for your SV, you can use the macro
147
148 SvGROW(SV*, STRLEN newlen)
149
150 which will determine if more memory needs to be allocated. If so, it
151 will call the function "sv_grow". Note that "SvGROW" can only
152 increase, not decrease, the allocated memory of an SV and that it does
153 not automatically add space for the trailing NUL byte (perl's own
154 string functions typically do "SvGROW(sv, len + 1)").
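
The grow-only behaviour can be modelled in a few lines of plain C. This
is a rough sketch and not Perl's implementation: "struct toy_buf" and
"toy_grow" are hypothetical, and realloc stands in for whatever
"sv_grow" does internally.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the string body of an SV. */
struct toy_buf {
    char  *pv;   /* buffer pointer   */
    size_t len;  /* allocated bytes  */
};

/* Ensure at least newlen bytes are allocated.  Never shrinks.
 * Returns the (possibly moved) buffer, or NULL if allocation fails,
 * in which case the old buffer is left untouched. */
char *toy_grow(struct toy_buf *b, size_t newlen)
{
    if (newlen > b->len) {
        char *p = realloc(b->pv, newlen);

        if (p == NULL)
            return NULL;
        b->pv = p;
        b->len = newlen;
    }
    return b->pv;
}
```

A caller that wants room for a 10-byte string plus its trailing NUL
would ask for 11 bytes, mirroring the "SvGROW(sv, len + 1)" idiom.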
155
156 If you have an SV and want to know what kind of data Perl thinks is
157 stored in it, you can use the following macros to check the type of SV
158 you have.
159
160 SvIOK(SV*)
161 SvNOK(SV*)
162 SvPOK(SV*)
163
164 You can get and set the current length of the string stored in an SV
165 with the following macros:
166
167 SvCUR(SV*)
168 SvCUR_set(SV*, I32 val)
169
170 You can also get a pointer to the end of the string stored in the SV
171 with the macro:
172
173 SvEND(SV*)
174
175 But note that these last three macros are valid only if "SvPOK()" is
176 true.
177
If you want to append something to the end of the string stored in an
"SV*", you can use the following functions:

    void  sv_catpv(SV*, const char*);
    void  sv_catpvn(SV*, const char*, STRLEN);
    void  sv_catpvf(SV*, const char*, ...);
    void  sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **,
                      I32, bool *);
    void  sv_catsv(SV*, SV*);
187
188 The first function calculates the length of the string to be appended
189 by using "strlen". In the second, you specify the length of the string
190 yourself. The third function processes its arguments like "sprintf"
191 and appends the formatted output. The fourth function works like
192 "vsprintf". You can specify the address and length of an array of SVs
193 instead of the va_list argument. The fifth function extends the string
194 stored in the first SV with the string stored in the second SV. It
195 also forces the second SV to be interpreted as a string.
196
197 The "sv_cat*()" functions are not generic enough to operate on values
198 that have "magic". See "Magic Virtual Tables" later in this document.
199
200 If you know the name of a scalar variable, you can get a pointer to its
201 SV by using the following:
202
203 SV* get_sv("package::varname", 0);
204
205 This returns NULL if the variable does not exist.
206
207 If you want to know if this variable (or any other SV) is actually
208 "defined", you can call:
209
210 SvOK(SV*)
211
212 The scalar "undef" value is stored in an SV instance called
213 "PL_sv_undef".
214
215 Its address can be used whenever an "SV*" is needed. Make sure that you
216 don't try to compare a random sv with &PL_sv_undef. For example when
217 interfacing Perl code, it'll work correctly for:
218
219 foo(undef);
220
221 But won't work when called as:
222
223 $x = undef;
224 foo($x);
225
So, to repeat: always use SvOK() to check whether an sv is defined.
227
228 Also you have to be careful when using &PL_sv_undef as a value in AVs
229 or HVs (see "AVs, HVs and undefined values").
230
231 There are also the two values "PL_sv_yes" and "PL_sv_no", which contain
232 boolean TRUE and FALSE values, respectively. Like "PL_sv_undef", their
233 addresses can be used whenever an "SV*" is needed.
234
235 Do not be fooled into thinking that "(SV *) 0" is the same as
236 &PL_sv_undef. Take this code:
237
238 SV* sv = (SV*) 0;
239 if (I-am-to-return-a-real-value) {
240 sv = sv_2mortal(newSViv(42));
241 }
242 sv_setsv(ST(0), sv);
243
244 This code tries to return a new SV (which contains the value 42) if it
245 should return a real value, or undef otherwise. Instead it has
246 returned a NULL pointer which, somewhere down the line, will cause a
247 segmentation violation, bus error, or just weird results. Change the
248 zero to &PL_sv_undef in the first line and all will be well.
249
250 To free an SV that you've created, call "SvREFCNT_dec(SV*)". Normally
251 this call is not necessary (see "Reference Counts and Mortality").
252
253 Offsets
254 Perl provides the function "sv_chop" to efficiently remove characters
255 from the beginning of a string; you give it an SV and a pointer to
256 somewhere inside the PV, and it discards everything before the pointer.
257 The efficiency comes by means of a little hack: instead of actually
258 removing the characters, "sv_chop" sets the flag "OOK" (offset OK) to
259 signal to other functions that the offset hack is in effect, and it
260 puts the number of bytes chopped off into the IV field of the SV. It
261 then moves the PV pointer (called "SvPVX") forward that many bytes, and
262 adjusts "SvCUR" and "SvLEN".
263
264 Hence, at this point, the start of the buffer that we allocated lives
265 at "SvPVX(sv) - SvIV(sv)" in memory and the PV pointer is pointing into
266 the middle of this allocated storage.
267
268 This is best demonstrated by example:
269
270 % ./perl -Ilib -MDevel::Peek -le '$a="12345"; $a=~s/.//; Dump($a)'
271 SV = PVIV(0x8128450) at 0x81340f0
272 REFCNT = 1
273 FLAGS = (POK,OOK,pPOK)
274 IV = 1 (OFFSET)
275 PV = 0x8135781 ( "1" . ) "2345"\0
276 CUR = 4
277 LEN = 5
278
279 Here the number of bytes chopped off (1) is put into IV, and
280 "Devel::Peek::Dump" helpfully reminds us that this is an offset. The
281 portion of the string between the "real" and the "fake" beginnings is
282 shown in parentheses, and the values of "SvCUR" and "SvLEN" reflect the
283 fake beginning, not the real one.
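
The bookkeeping can be modelled in plain C. The sketch below is a toy,
not Perl's code: "struct toy_pv" and the "toy_*" functions are
hypothetical, and the "offset" field plays the role that the IV slot
plays when the OOK flag is set.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the string part of an SV. */
struct toy_pv {
    char  *pv;      /* visible start of the string (like SvPVX)   */
    size_t cur;     /* visible string length (like SvCUR)         */
    size_t len;     /* bytes available from pv (like SvLEN)       */
    size_t offset;  /* bytes chopped so far (IV slot under OOK)   */
};

/* Copy str into a fresh buffer.  Returns 1 on success, 0 on failure. */
int toy_init(struct toy_pv *s, const char *str)
{
    size_t n = strlen(str);

    s->pv = malloc(n + 1);
    if (s->pv == NULL)
        return 0;
    memcpy(s->pv, str, n + 1);
    s->cur = n;
    s->len = n + 1;
    s->offset = 0;
    return 1;
}

/* Discard everything before ptr without moving any bytes. */
void toy_chop(struct toy_pv *s, char *ptr)
{
    size_t delta;

    assert(ptr >= s->pv && ptr <= s->pv + s->cur);
    delta = (size_t)(ptr - s->pv);
    s->offset += delta;
    s->pv     += delta;
    s->cur    -= delta;
    s->len    -= delta;
}

/* The real start of the allocation is pv - offset; free from there. */
void toy_free(struct toy_pv *s)
{
    free(s->pv - s->offset);
    s->pv = NULL;
    s->cur = s->len = s->offset = 0;
}
```

Chopping one byte off "12345" leaves cur at 4, len at 5 and offset at 1,
the same numbers that the Dump output above reports.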
284
285 Something similar to the offset hack is performed on AVs to enable
286 efficient shifting and splicing off the beginning of the array; while
287 "AvARRAY" points to the first element in the array that is visible from
288 Perl, "AvALLOC" points to the real start of the C array. These are
289 usually the same, but a "shift" operation can be carried out by
290 increasing "AvARRAY" by one and decreasing "AvFILL" and "AvMAX".
291 Again, the location of the real start of the C array only comes into
292 play when freeing the array. See "av_shift" in av.c.
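
A toy version of the same idea for arrays, under loud assumptions: the
elements are plain ints rather than "SV*"s, and "struct toy_av" and
"toy_shift" are hypothetical names whose fields merely mirror AvALLOC,
AvARRAY, AvFILL and AvMAX.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an AV holding ints instead of SV*s. */
struct toy_av {
    int      *alloc;  /* real start of the C array (like AvALLOC)    */
    int      *array;  /* first visible element (like AvARRAY)        */
    ptrdiff_t fill;   /* highest visible index, -1 if empty (AvFILL) */
    ptrdiff_t max;    /* highest usable index from array (AvMAX)     */
};

/* Remove the first element without moving the others.
 * Returns 1 and stores the element in *out, or 0 if the array
 * is empty. */
int toy_shift(struct toy_av *av, int *out)
{
    if (av->fill < 0)
        return 0;
    *out = av->array[0];
    av->array++;   /* visible start moves forward...        */
    av->fill--;    /* ...so both index limits shrink by one */
    av->max--;
    return 1;
}
```

After one shift "array" is one element past "alloc"; only code that
frees or reallocates the storage needs to look at "alloc" again.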
293
294 What's Really Stored in an SV?
Recall that the usual method of determining the type of scalar you have
is to use "Sv*OK" macros. Because a scalar can be both a number and a
string, these macros will usually return TRUE, and calling the
"Sv*V" macros will do the appropriate conversion of string to
integer/double or integer/double to string.
300
301 If you really need to know if you have an integer, double, or string
302 pointer in an SV, you can use the following three macros instead:
303
304 SvIOKp(SV*)
305 SvNOKp(SV*)
306 SvPOKp(SV*)
307
308 These will tell you if you truly have an integer, double, or string
309 pointer stored in your SV. The "p" stands for private.
310
There are various ways in which the private and public flags may
differ. For example, a tied SV may have a valid underlying value in
the IV slot (so SvIOKp is true), but the data should be accessed via
the FETCH routine rather than directly, so SvIOK is false. Another case
is when numeric conversion has occurred and precision has been lost:
only the private flag is set on 'lossy' values. So when an NV is
converted to an IV with loss, SvIOKp, SvNOKp and SvNOK will be set,
while SvIOK won't be.
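
The lossy-conversion case can be sketched with a toy flag word. All
names and bit values below are hypothetical and do not match the real
definitions in sv.h; the sketch also assumes the double fits in a long.

```c
#include <assert.h>

/* Hypothetical flag bits (the real ones live in sv.h). */
enum {
    TOY_IOK  = 1 << 0,  /* public:  integer slot is the true value */
    TOY_NOK  = 1 << 1,  /* public:  double slot is the true value  */
    TOY_IOKp = 1 << 2,  /* private: integer slot holds something   */
    TOY_NOKp = 1 << 3   /* private: double slot holds something    */
};

/* Hypothetical stand-in for a numeric SV. */
struct toy_num {
    unsigned flags;
    long     iv;
    double   nv;
};

/* Assign a double: only the NV flags are set. */
void toy_set_nv(struct toy_num *s, double nv)
{
    s->nv = nv;
    s->flags = TOY_NOK | TOY_NOKp;
}

/* Fetch an integer view, caching it in the integer slot.  The
 * public flag is set only if the conversion lost nothing. */
long toy_get_iv(struct toy_num *s)
{
    if (!(s->flags & TOY_IOKp)) {
        s->iv = (long)s->nv;          /* truncates toward zero */
        s->flags |= TOY_IOKp;
        if ((double)s->iv == s->nv)
            s->flags |= TOY_IOK;      /* exact, so also public */
    }
    return s->iv;
}
```

Converting 3.75 leaves the private integer flag and both NV flags set
but not the public integer flag, while converting 4.0 sets all four.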
319
320 In general, though, it's best to use the "Sv*V" macros.
321
322 Working with AVs
323 There are two ways to create and load an AV. The first method creates
324 an empty AV:
325
326 AV* newAV();
327
328 The second method both creates the AV and initially populates it with
329 SVs:
330
331 AV* av_make(I32 num, SV **ptr);
332
333 The second argument points to an array containing "num" "SV*"'s. Once
334 the AV has been created, the SVs can be destroyed, if so desired.
335
336 Once the AV has been created, the following operations are possible on
337 it:
338
339 void av_push(AV*, SV*);
340 SV* av_pop(AV*);
341 SV* av_shift(AV*);
342 void av_unshift(AV*, I32 num);
343
344 These should be familiar operations, with the exception of
345 "av_unshift". This routine adds "num" elements at the front of the
346 array with the "undef" value. You must then use "av_store" (described
347 below) to assign values to these new elements.
348
349 Here are some other functions:
350
351 I32 av_len(AV*);
352 SV** av_fetch(AV*, I32 key, I32 lval);
353 SV** av_store(AV*, I32 key, SV* val);
354
355 The "av_len" function returns the highest index value in an array (just
356 like $#array in Perl). If the array is empty, -1 is returned. The
357 "av_fetch" function returns the value at index "key", but if "lval" is
358 non-zero, then "av_fetch" will store an undef value at that index. The
359 "av_store" function stores the value "val" at index "key", and does not
360 increment the reference count of "val". Thus the caller is responsible
361 for taking care of that, and if "av_store" returns NULL, the caller
362 will have to decrement the reference count to avoid a memory leak.
363 Note that "av_fetch" and "av_store" both return "SV**"'s, not "SV*"'s
364 as their return value.
365
366 A few more:
367
368 void av_clear(AV*);
369 void av_undef(AV*);
370 void av_extend(AV*, I32 key);
371
372 The "av_clear" function deletes all the elements in the AV* array, but
373 does not actually delete the array itself. The "av_undef" function
374 will delete all the elements in the array plus the array itself. The
375 "av_extend" function extends the array so that it contains at least
376 "key+1" elements. If "key+1" is less than the currently allocated
377 length of the array, then nothing is done.
378
379 If you know the name of an array variable, you can get a pointer to its
380 AV by using the following:
381
382 AV* get_av("package::varname", 0);
383
384 This returns NULL if the variable does not exist.
385
386 See "Understanding the Magic of Tied Hashes and Arrays" for more
387 information on how to use the array access functions on tied arrays.
388
389 Working with HVs
390 To create an HV, you use the following routine:
391
392 HV* newHV();
393
394 Once the HV has been created, the following operations are possible on
395 it:
396
397 SV** hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
398 SV** hv_fetch(HV*, const char* key, U32 klen, I32 lval);
399
400 The "klen" parameter is the length of the key being passed in (Note
401 that you cannot pass 0 in as a value of "klen" to tell Perl to measure
402 the length of the key). The "val" argument contains the SV pointer to
403 the scalar being stored, and "hash" is the precomputed hash value (zero
404 if you want "hv_store" to calculate it for you). The "lval" parameter
405 indicates whether this fetch is actually a part of a store operation,
406 in which case a new undefined value will be added to the HV with the
407 supplied key and "hv_fetch" will return as if the value had already
408 existed.
409
410 Remember that "hv_store" and "hv_fetch" return "SV**"'s and not just
411 "SV*". To access the scalar value, you must first dereference the
412 return value. However, you should check to make sure that the return
413 value is not NULL before dereferencing it.
414
415 The first of these two functions checks if a hash table entry exists,
416 and the second deletes it.
417
418 bool hv_exists(HV*, const char* key, U32 klen);
419 SV* hv_delete(HV*, const char* key, U32 klen, I32 flags);
420
421 If "flags" does not include the "G_DISCARD" flag then "hv_delete" will
422 create and return a mortal copy of the deleted value.
423
424 And more miscellaneous functions:
425
426 void hv_clear(HV*);
427 void hv_undef(HV*);
428
429 Like their AV counterparts, "hv_clear" deletes all the entries in the
430 hash table but does not actually delete the hash table. The "hv_undef"
431 deletes both the entries and the hash table itself.
432
433 Perl keeps the actual data in a linked list of structures with a
434 typedef of HE. These contain the actual key and value pointers (plus
435 extra administrative overhead). The key is a string pointer; the value
436 is an "SV*". However, once you have an "HE*", to get the actual key
437 and value, use the routines specified below.
438
439 I32 hv_iterinit(HV*);
440 /* Prepares starting point to traverse hash table */
441 HE* hv_iternext(HV*);
442 /* Get the next entry, and return a pointer to a
443 structure that has both the key and value */
444 char* hv_iterkey(HE* entry, I32* retlen);
445 /* Get the key from an HE structure and also return
446 the length of the key string */
447 SV* hv_iterval(HV*, HE* entry);
448 /* Return an SV pointer to the value of the HE
449 structure */
450 SV* hv_iternextsv(HV*, char** key, I32* retlen);
451 /* This convenience routine combines hv_iternext,
452 hv_iterkey, and hv_iterval. The key and retlen
453 arguments are return values for the key and its
454 length. The value is returned in the SV* argument */
455
456 If you know the name of a hash variable, you can get a pointer to its
457 HV by using the following:
458
459 HV* get_hv("package::varname", 0);
460
461 This returns NULL if the variable does not exist.
462
463 The hash algorithm is defined in the "PERL_HASH(hash, key, klen)"
464 macro:
465
466 hash = 0;
467 while (klen--)
468 hash = (hash * 33) + *key++;
469 hash = hash + (hash >> 5); /* after 5.6 */
470
471 The last step was added in version 5.6 to improve distribution of lower
472 bits in the resulting hash value.
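
Written out as an ordinary function, the computation looks roughly like
the sketch below. "toy_perl_hash" is a hypothetical name, uint32_t
stands in for U32, and each key byte is treated as unsigned, which is an
assumption that makes no difference for ASCII keys.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Times-33 hash over klen bytes of key, followed by the extra
 * mixing step that the text says was added in 5.6. */
uint32_t toy_perl_hash(const char *key, size_t klen)
{
    uint32_t hash = 0;

    while (klen--)
        hash = (hash * 33) + (unsigned char)*key++;
    hash = hash + (hash >> 5);
    return hash;
}
```

For the one-byte key "a" (byte value 97) the loop yields 97 and the
final step adds 97 >> 5 = 3, giving 100; for "ab" the loop yields
97 * 33 + 98 = 3299 and the final step adds 103, giving 3402.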
473
474 See "Understanding the Magic of Tied Hashes and Arrays" for more
475 information on how to use the hash access functions on tied hashes.
476
477 Hash API Extensions
478 Beginning with version 5.004, the following functions are also
479 supported:
480
481 HE* hv_fetch_ent (HV* tb, SV* key, I32 lval, U32 hash);
482 HE* hv_store_ent (HV* tb, SV* key, SV* val, U32 hash);
483
484 bool hv_exists_ent (HV* tb, SV* key, U32 hash);
485 SV* hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);
486
487 SV* hv_iterkeysv (HE* entry);
488
489 Note that these functions take "SV*" keys, which simplifies writing of
490 extension code that deals with hash structures. These functions also
491 allow passing of "SV*" keys to "tie" functions without forcing you to
492 stringify the keys (unlike the previous set of functions).
493
494 They also return and accept whole hash entries ("HE*"), making their
495 use more efficient (since the hash number for a particular string
496 doesn't have to be recomputed every time). See perlapi for detailed
497 descriptions.
498
499 The following macros must always be used to access the contents of hash
500 entries. Note that the arguments to these macros must be simple
501 variables, since they may get evaluated more than once. See perlapi
502 for detailed descriptions of these macros.
503
504 HePV(HE* he, STRLEN len)
505 HeVAL(HE* he)
506 HeHASH(HE* he)
507 HeSVKEY(HE* he)
508 HeSVKEY_force(HE* he)
509 HeSVKEY_set(HE* he, SV* sv)
510
511 These two lower level macros are defined, but must only be used when
512 dealing with keys that are not "SV*"s:
513
514 HeKEY(HE* he)
515 HeKLEN(HE* he)
516
517 Note that both "hv_store" and "hv_store_ent" do not increment the
518 reference count of the stored "val", which is the caller's
519 responsibility. If these functions return a NULL value, the caller
520 will usually have to decrement the reference count of "val" to avoid a
521 memory leak.
522
523 AVs, HVs and undefined values
524 Sometimes you have to store undefined values in AVs or HVs. Although
525 this may be a rare case, it can be tricky. That's because you're used
526 to using &PL_sv_undef if you need an undefined SV.
527
528 For example, intuition tells you that this XS code:
529
530 AV *av = newAV();
531 av_store( av, 0, &PL_sv_undef );
532
533 is equivalent to this Perl code:
534
535 my @av;
536 $av[0] = undef;
537
538 Unfortunately, this isn't true. AVs use &PL_sv_undef as a marker for
539 indicating that an array element has not yet been initialized. Thus,
540 "exists $av[0]" would be true for the above Perl code, but false for
541 the array generated by the XS code.
542
543 Other problems can occur when storing &PL_sv_undef in HVs:
544
545 hv_store( hv, "key", 3, &PL_sv_undef, 0 );
546
547 This will indeed make the value "undef", but if you try to modify the
548 value of "key", you'll get the following error:
549
550 Modification of non-creatable hash value attempted
551
552 In perl 5.8.0, &PL_sv_undef was also used to mark placeholders in
553 restricted hashes. This caused such hash entries not to appear when
554 iterating over the hash or when checking for the keys with the
555 "hv_exists" function.
556
557 You can run into similar problems when you store &PL_sv_yes or
558 &PL_sv_no into AVs or HVs. Trying to modify such elements will give you
559 the following error:
560
561 Modification of a read-only value attempted
562
563 To make a long story short, you can use the special variables
564 &PL_sv_undef, &PL_sv_yes and &PL_sv_no with AVs and HVs, but you have
565 to make sure you know what you're doing.
566
567 Generally, if you want to store an undefined value in an AV or HV, you
568 should not use &PL_sv_undef, but rather create a new undefined value
569 using the "newSV" function, for example:
570
571 av_store( av, 42, newSV(0) );
572 hv_store( hv, "foo", 3, newSV(0), 0 );
573
574 References
575 References are a special type of scalar that point to other data types
576 (including other references).
577
578 To create a reference, use either of the following functions:
579
580 SV* newRV_inc((SV*) thing);
581 SV* newRV_noinc((SV*) thing);
582
583 The "thing" argument can be any of an "SV*", "AV*", or "HV*". The
584 functions are identical except that "newRV_inc" increments the
585 reference count of the "thing", while "newRV_noinc" does not. For
586 historical reasons, "newRV" is a synonym for "newRV_inc".
587
588 Once you have a reference, you can use the following macro to
589 dereference the reference:
590
591 SvRV(SV*)
592
593 then call the appropriate routines, casting the returned "SV*" to
594 either an "AV*" or "HV*", if required.
595
596 To determine if an SV is a reference, you can use the following macro:
597
598 SvROK(SV*)
599
600 To discover what type of value the reference refers to, use the
601 following macro and then check the return value.
602
603 SvTYPE(SvRV(SV*))
604
605 The most useful types that will be returned are:
606
607 SVt_IV Scalar
608 SVt_NV Scalar
609 SVt_PV Scalar
610 SVt_RV Scalar
611 SVt_PVAV Array
612 SVt_PVHV Hash
613 SVt_PVCV Code
614 SVt_PVGV Glob (possibly a file handle)
615 SVt_PVMG Blessed or Magical Scalar
616
617 See the sv.h header file for more details.
618
619 Blessed References and Class Objects
620 References are also used to support object-oriented programming. In
621 perl's OO lexicon, an object is simply a reference that has been
622 blessed into a package (or class). Once blessed, the programmer may
623 now use the reference to access the various methods in the class.
624
625 A reference can be blessed into a package with the following function:
626
627 SV* sv_bless(SV* sv, HV* stash);
628
629 The "sv" argument must be a reference value. The "stash" argument
630 specifies which class the reference will belong to. See "Stashes and
631 Globs" for information on converting class names into stashes.
632
633 /* Still under construction */
634
The following function upgrades "rv" to a reference if it is not
already one, and creates a new SV for "rv" to point to. If "classname"
is non-null, the new SV is blessed into the specified class. The new SV
is returned.
638
639 SV* newSVrv(SV* rv, const char* classname);
640
641 The following three functions copy integer, unsigned integer or double
642 into an SV whose reference is "rv". SV is blessed if "classname" is
643 non-null.
644
    SV* sv_setref_iv(SV* rv, const char* classname, IV iv);
    SV* sv_setref_uv(SV* rv, const char* classname, UV uv);
    SV* sv_setref_nv(SV* rv, const char* classname, NV nv);
648
649 The following function copies the pointer value (the address, not the
650 string!) into an SV whose reference is rv. SV is blessed if
651 "classname" is non-null.
652
653 SV* sv_setref_pv(SV* rv, const char* classname, void* pv);
654
655 The following function copies a string into an SV whose reference is
656 "rv". Set length to 0 to let Perl calculate the string length. SV is
657 blessed if "classname" is non-null.
658
659 SV* sv_setref_pvn(SV* rv, const char* classname, char* pv,
660 STRLEN length);
661
662 The following function tests whether the SV is blessed into the
663 specified class. It does not check inheritance relationships.
664
665 int sv_isa(SV* sv, const char* name);
666
667 The following function tests whether the SV is a reference to a blessed
668 object.
669
670 int sv_isobject(SV* sv);
671
672 The following function tests whether the SV is derived from the
673 specified class. SV can be either a reference to a blessed object or a
674 string containing a class name. This is the function implementing the
675 "UNIVERSAL::isa" functionality.
676
677 bool sv_derived_from(SV* sv, const char* name);
678
679 To check if you've got an object derived from a specific class you have
680 to write:
681
682 if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... }
683
684 Creating New Variables
685 To create a new Perl variable with an undef value which can be accessed
686 from your Perl script, use the following routines, depending on the
687 variable type.
688
689 SV* get_sv("package::varname", GV_ADD);
690 AV* get_av("package::varname", GV_ADD);
691 HV* get_hv("package::varname", GV_ADD);
692
693 Notice the use of GV_ADD as the second parameter. The new variable can
694 now be set, using the routines appropriate to the data type.
695
696 There are additional macros whose values may be bitwise OR'ed with the
697 "GV_ADD" argument to enable certain extra features. Those bits are:
698
699 GV_ADDMULTI
700 Marks the variable as multiply defined, thus preventing the:
701
702 Name <varname> used only once: possible typo
703
704 warning.
705
706 GV_ADDWARN
707 Issues the warning:
708
709 Had to create <varname> unexpectedly
710
711 if the variable did not exist before the function was called.
712
713 If you do not specify a package name, the variable is created in the
714 current package.
715
716 Reference Counts and Mortality
717 Perl uses a reference count-driven garbage collection mechanism. SVs,
718 AVs, or HVs (xV for short in the following) start their life with a
719 reference count of 1. If the reference count of an xV ever drops to 0,
720 then it will be destroyed and its memory made available for reuse.
721
722 This normally doesn't happen at the Perl level unless a variable is
723 undef'ed or the last variable holding a reference to it is changed or
724 overwritten. At the internal level, however, reference counts can be
725 manipulated with the following macros:
726
727 int SvREFCNT(SV* sv);
728 SV* SvREFCNT_inc(SV* sv);
729 void SvREFCNT_dec(SV* sv);
730
731 However, there is one other function which manipulates the reference
732 count of its argument. The "newRV_inc" function, you will recall,
733 creates a reference to the specified argument. As a side effect, it
734 increments the argument's reference count. If this is not what you
735 want, use "newRV_noinc" instead.
736
737 For example, imagine you want to return a reference from an XSUB
738 function. Inside the XSUB routine, you create an SV which initially
739 has a reference count of one. Then you call "newRV_inc", passing it
740 the just-created SV. This returns the reference as a new SV, but the
741 reference count of the SV you passed to "newRV_inc" has been
742 incremented to two. Now you return the reference from the XSUB routine
743 and forget about the SV. But Perl hasn't! Whenever the returned
744 reference is destroyed, the reference count of the original SV is
745 decreased to one and nothing happens. The SV will hang around without
746 any way to access it until Perl itself terminates. This is a memory
747 leak.
748
749 The correct procedure, then, is to use "newRV_noinc" instead of
750 "newRV_inc". Then, if and when the last reference is destroyed, the
751 reference count of the SV will go to zero and it will be destroyed,
752 stopping any memory leak.
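
The leak described above can be reproduced with a toy reference counter.
Nothing here is Perl's implementation: "struct rc_sv" and the "rc_*"
functions are hypothetical, and "rc_live" simply counts toy values that
have not been freed yet.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an SV that may be a reference. */
struct rc_sv {
    int           refcnt;
    struct rc_sv *rv;   /* referent if this is a reference, else NULL */
};

int rc_live = 0;        /* number of toy SVs not yet freed */

/* New values start life with a reference count of 1. */
struct rc_sv *rc_new(void)
{
    struct rc_sv *sv = calloc(1, sizeof *sv);

    if (sv == NULL)
        abort();
    sv->refcnt = 1;
    rc_live++;
    return sv;
}

/* Drop one count; at zero, release the referent and free. */
void rc_dec(struct rc_sv *sv)
{
    if (sv != NULL && --sv->refcnt == 0) {
        rc_dec(sv->rv);
        free(sv);
        rc_live--;
    }
}

/* Reference that takes over the caller's count on thing. */
struct rc_sv *rc_newRV_noinc(struct rc_sv *thing)
{
    struct rc_sv *ref = rc_new();

    ref->rv = thing;
    return ref;
}

/* Reference that adds its own count on thing. */
struct rc_sv *rc_newRV_inc(struct rc_sv *thing)
{
    thing->refcnt++;
    return rc_newRV_noinc(thing);
}
```

With "rc_newRV_inc", destroying the reference leaves the referent alive
with a count of 1 that nobody owns; with "rc_newRV_noinc" the referent
goes away together with the reference.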
753
754 There are some convenience functions available that can help with the
755 destruction of xVs. These functions introduce the concept of
756 "mortality". An xV that is mortal has had its reference count marked
757 to be decremented, but not actually decremented, until "a short time
758 later". Generally the term "short time later" means a single Perl
759 statement, such as a call to an XSUB function. The actual determinant
760 for when mortal xVs have their reference count decremented depends on
761 two macros, SAVETMPS and FREETMPS. See perlcall and perlxs for more
762 details on these macros.
763
764 "Mortalization" then is at its simplest a deferred "SvREFCNT_dec".
765 However, if you mortalize a variable twice, the reference count will
766 later be decremented twice.
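
A deferred decrement can be modelled as a small stack of pending
pointers. This is a sketch under stated assumptions, not how Perl stores
its temporaries: "struct mt_sv", the fixed-size "mt_tmps" array and the
"mt_*" functions are all hypothetical, with the saved "mark" standing in
for what SAVETMPS records and "mt_free_tmps" for what FREETMPS does.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted value. */
struct mt_sv {
    int refcnt;
};

enum { MT_TMPS_MAX = 16 };

struct mt_sv *mt_tmps[MT_TMPS_MAX];  /* pending deferred decrements */
int mt_tmps_ix = 0;                  /* next free slot              */
int mt_freed = 0;                    /* values freed so far         */

struct mt_sv *mt_new(void)
{
    struct mt_sv *sv = malloc(sizeof *sv);

    if (sv == NULL)
        abort();
    sv->refcnt = 1;
    return sv;
}

void mt_dec(struct mt_sv *sv)
{
    if (--sv->refcnt == 0) {
        free(sv);
        mt_freed++;
    }
}

/* Schedule one decrement for later; the count is unchanged now. */
struct mt_sv *mt_2mortal(struct mt_sv *sv)
{
    assert(mt_tmps_ix < MT_TMPS_MAX);
    mt_tmps[mt_tmps_ix++] = sv;
    return sv;
}

/* Perform every decrement scheduled since mark was saved. */
void mt_free_tmps(int mark)
{
    while (mt_tmps_ix > mark)
        mt_dec(mt_tmps[--mt_tmps_ix]);
}
```

A value mortalized once is freed when the pending decrements run; a
value mortalized twice needs a count of 2 beforehand, otherwise the
second decrement would touch freed memory.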
767
768 "Mortal" SVs are mainly used for SVs that are placed on perl's stack.
769 For example an SV which is created just to pass a number to a called
770 sub is made mortal to have it cleaned up automatically when it's popped
771 off the stack. Similarly, results returned by XSUBs (which are pushed
772 on the stack) are often made mortal.
773
774 To create a mortal variable, use the functions:
775
776 SV* sv_newmortal()
777 SV* sv_2mortal(SV*)
778 SV* sv_mortalcopy(SV*)
779
780 The first call creates a mortal SV (with no value), the second converts
781 an existing SV to a mortal SV (and thus defers a call to
782 "SvREFCNT_dec"), and the third creates a mortal copy of an existing SV.
783 Because "sv_newmortal" gives the new SV no value, it must normally be
784 given one via "sv_setpv", "sv_setiv", etc. :
785
786 SV *tmp = sv_newmortal();
787 sv_setiv(tmp, an_integer);
788
As that is multiple C statements, it is quite common to see this idiom
instead:
791
792 SV *tmp = sv_2mortal(newSViv(an_integer));
793
794 You should be careful about creating mortal variables. Strange things
795 can happen if you make the same value mortal within multiple contexts,
796 or if you make a variable mortal multiple times. Thinking of
797 "Mortalization" as deferred "SvREFCNT_dec" should help to minimize such
798 problems. For example if you are passing an SV which you know has a
799 high enough REFCNT to survive its use on the stack you need not do any
800 mortalization. If you are not sure then doing an "SvREFCNT_inc" and
801 "sv_2mortal", or making a "sv_mortalcopy" is safer.
802
The mortal routines are not just for SVs; AVs and HVs can be made
mortal by passing their address (cast to "SV*") to the
"sv_2mortal" or "sv_mortalcopy" routines.
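The "deferred SvREFCNT_dec" view of mortality can be pictured with a
small stand-alone model. The sketch below is plain C and does not use
the Perl API: "ToySV", "toy_2mortal", "toy_free_tmps" and the fixed-size
array are hypothetical names invented for illustration, and perl's real
temporaries stack is more elaborate. It only shows that mortalizing
records a pending decrement, and that mortalizing the same value twice
leads to two decrements later.

```c
#include <assert.h>

/* Toy stand-in for an SV: a reference count and a "freed" marker. */
typedef struct ToySV {
    int refcnt;
    int freed;
} ToySV;

#define TOY_TMPS_MAX 16

/* Toy stand-in for the list of pending (deferred) decrements. */
static ToySV *toy_tmps[TOY_TMPS_MAX];
static int toy_tmps_ix = 0;

/* Immediate decrement: mark the value freed when the count hits zero. */
void toy_refcnt_dec(ToySV *sv)
{
    assert(sv->refcnt > 0);
    if (--sv->refcnt == 0)
        sv->freed = 1;
}

/* Mortalize: record one pending decrement and hand the value back. */
ToySV *toy_2mortal(ToySV *sv)
{
    assert(toy_tmps_ix < TOY_TMPS_MAX);
    toy_tmps[toy_tmps_ix++] = sv;
    return sv;
}

/* Perform every pending decrement recorded above position "floor". */
void toy_free_tmps(int floor)
{
    while (toy_tmps_ix > floor)
        toy_refcnt_dec(toy_tmps[--toy_tmps_ix]);
}
```

In this model a value with a count of 1 survives "toy_2mortal" and is
freed only by the later "toy_free_tmps" call. A value with a count of 2
that is mortalized twice also ends up freed, which is the hazard the
text above warns about.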
806
807 Stashes and Globs
808 A stash is a hash that contains all variables that are defined within a
809 package. Each key of the stash is a symbol name (shared by all the
810 different types of objects that have the same name), and each value in
811 the hash table is a GV (Glob Value). This GV in turn contains
812 references to the various objects of that name, including (but not
813 limited to) the following:
814
815 Scalar Value
816 Array Value
817 Hash Value
818 I/O Handle
819 Format
820 Subroutine
821
822 There is a single stash called "PL_defstash" that holds the items that
823 exist in the "main" package. To get at the items in other packages,
824 append the string "::" to the package name. The items in the "Foo"
825 package are in the stash "Foo::" in PL_defstash. The items in the
826 "Bar::Baz" package are in the stash "Baz::" in "Bar::"'s stash.
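The nesting can be made concrete by splitting a package name into the
successive keys that would be looked up, starting from PL_defstash. The
helper below is a hedged sketch in plain C, not part of the Perl API:
"toy_stash_key" is a hypothetical name, and the code assumes that "::"
is the only package separator.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy the stash key for component number "n" (0-based) of "package"
 * into "out", with the trailing "::" appended.  Returns 1 on success,
 * or 0 if there is no such component or "out" is too small.
 */
int toy_stash_key(const char *package, int n, char *out, size_t outlen)
{
    const char *start = package;
    const char *sep;
    size_t len;

    assert(package != NULL && out != NULL && n >= 0);

    /* Skip the first n components. */
    while (n-- > 0) {
        sep = strstr(start, "::");
        if (sep == NULL)
            return 0;
        start = sep + 2;
    }
    if (*start == '\0')
        return 0;

    sep = strstr(start, "::");
    len = (sep != NULL) ? (size_t)(sep - start) : strlen(start);
    if (len + 3 > outlen)          /* name + "::" + NUL */
        return 0;

    memcpy(out, start, len);
    memcpy(out + len, "::", 3);    /* copies the NUL as well */
    return 1;
}
```

For "Bar::Baz" the helper yields "Bar::" for component 0 and "Baz::" for
component 1, matching the two lookups described above.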
827
To get the stash pointer for a particular package, use one of these functions:
829
830 HV* gv_stashpv(const char* name, I32 flags)
831 HV* gv_stashsv(SV*, I32 flags)
832
The first function takes a literal string, the second uses the string
stored in the SV. Remember that a stash is just a hash table, so you
get back an "HV*". The "flags" argument will create a new package if it
is set to GV_ADD.
837
838 The name that "gv_stash*v" wants is the name of the package whose
839 symbol table you want. The default package is called "main". If you
840 have multiply nested packages, pass their names to "gv_stash*v",
841 separated by "::" as in the Perl language itself.
842
843 Alternately, if you have an SV that is a blessed reference, you can
844 find out the stash pointer by using:
845
846 HV* SvSTASH(SvRV(SV*));
847
848 then use the following to get the package name itself:
849
850 char* HvNAME(HV* stash);
851
852 If you need to bless or re-bless an object you can use the following
853 function:
854
855 SV* sv_bless(SV*, HV* stash)
856
857 where the first argument, an "SV*", must be a reference, and the second
858 argument is a stash. The returned "SV*" can now be used in the same
859 way as any other SV.
860
861 For more information on references and blessings, consult perlref.
862
863 Double-Typed SVs
864 Scalar variables normally contain only one type of value, an integer,
865 double, pointer, or reference. Perl will automatically convert the
866 actual scalar data from the stored type into the requested type.
867
868 Some scalar variables contain more than one type of scalar data. For
869 example, the variable $! contains either the numeric value of "errno"
870 or its string equivalent from either "strerror" or "sys_errlist[]".
871
872 To force multiple data values into an SV, you must do two things: use
873 the "sv_set*v" routines to add the additional scalar type, then set a
874 flag so that Perl will believe it contains more than one type of data.
875 The four macros to set the flags are:
876
877 SvIOK_on
878 SvNOK_on
879 SvPOK_on
880 SvROK_on
881
882 The particular macro you must use depends on which "sv_set*v" routine
883 you called first. This is because every "sv_set*v" routine turns on
884 only the bit for the particular type of data being set, and turns off
885 all the rest.
886
887 For example, to create a new Perl variable called "dberror" that
888 contains both the numeric and descriptive string error values, you
889 could use the following code:
890
891 extern int dberror;
extern char *dberror_list[];
893
894 SV* sv = get_sv("dberror", GV_ADD);
895 sv_setiv(sv, (IV) dberror);
896 sv_setpv(sv, dberror_list[dberror]);
897 SvIOK_on(sv);
898
899 If the order of "sv_setiv" and "sv_setpv" had been reversed, then the
900 macro "SvPOK_on" would need to be called instead of "SvIOK_on".
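The effect of the flag handling can be sketched without the Perl
headers. In the toy model below, "ToySV", "TOY_IOK", "TOY_POK" and the
"toy_*" functions are hypothetical stand-ins (the real flag values and
SV layout differ). The setters follow the rule stated above: each turns
on only its own flag and turns off the rest.

```c
#include <assert.h>

#define TOY_IOK 0x1u   /* integer slot holds a valid value */
#define TOY_POK 0x2u   /* string slot holds a valid value  */

typedef struct ToySV {
    unsigned flags;
    long iv;
    const char *pv;
} ToySV;

/* Store an integer; only the integer flag stays on. */
void toy_setiv(ToySV *sv, long iv)
{
    sv->iv = iv;
    sv->flags = TOY_IOK;
}

/* Store a string; only the string flag stays on. */
void toy_setpv(ToySV *sv, const char *pv)
{
    sv->pv = pv;
    sv->flags = TOY_POK;
}

/* Analogue of a flag macro: validate the integer slot again. */
void toy_iok_on(ToySV *sv)
{
    sv->flags |= TOY_IOK;
}

/* Same order of calls as the dberror example above. */
void toy_set_dual(ToySV *sv, long code, const char *message)
{
    assert(sv && message);
    toy_setiv(sv, code);
    toy_setpv(sv, message);   /* clears TOY_IOK; sv->iv is untouched */
    toy_iok_on(sv);           /* both slots are now marked valid */
}
```

Without the final "toy_iok_on" call the stored integer would still be
present in the structure, but nothing would report it as valid.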
901
902 Magic Variables
903 [This section still under construction. Ignore everything here. Post
904 no bills. Everything not permitted is forbidden.]
905
906 Any SV may be magical, that is, it has special features that a normal
907 SV does not have. These features are stored in the SV structure in a
908 linked list of "struct magic"'s, typedef'ed to "MAGIC".
909
910 struct magic {
911 MAGIC* mg_moremagic;
912 MGVTBL* mg_virtual;
913 U16 mg_private;
914 char mg_type;
915 U8 mg_flags;
916 I32 mg_len;
917 SV* mg_obj;
918 char* mg_ptr;
919 };
920
921 Note this is current as of patchlevel 0, and could change at any time.
922
923 Assigning Magic
924 Perl adds magic to an SV using the sv_magic function:
925
926 void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen);
927
928 The "sv" argument is a pointer to the SV that is to acquire a new
929 magical feature.
930
931 If "sv" is not already magical, Perl uses the "SvUPGRADE" macro to
932 convert "sv" to type "SVt_PVMG". Perl then continues by adding new
933 magic to the beginning of the linked list of magical features. Any
934 prior entry of the same type of magic is deleted. Note that this can
935 be overridden, and multiple instances of the same type of magic can be
936 associated with an SV.
937
938 The "name" and "namlen" arguments are used to associate a string with
939 the magic, typically the name of a variable. "namlen" is stored in the
940 "mg_len" field and if "name" is non-null then either a "savepvn" copy
941 of "name" or "name" itself is stored in the "mg_ptr" field, depending
942 on whether "namlen" is greater than zero or equal to zero respectively.
943 As a special case, if "(name && namlen == HEf_SVKEY)" then "name" is
944 assumed to contain an "SV*" and is stored as-is with its REFCNT
945 incremented.
946
947 The sv_magic function uses "how" to determine which, if any, predefined
948 "Magic Virtual Table" should be assigned to the "mg_virtual" field.
949 See the "Magic Virtual Tables" section below. The "how" argument is
950 also stored in the "mg_type" field. The value of "how" should be chosen
951 from the set of macros "PERL_MAGIC_foo" found in perl.h. Note that
952 before these macros were added, Perl internals used to directly use
953 character literals, so you may occasionally come across old code or
954 documentation referring to 'U' magic rather than "PERL_MAGIC_uvar" for
955 example.
956
957 The "obj" argument is stored in the "mg_obj" field of the "MAGIC"
958 structure. If it is not the same as the "sv" argument, the reference
959 count of the "obj" object is incremented. If it is the same, or if the
960 "how" argument is "PERL_MAGIC_arylen", or if it is a NULL pointer, then
961 "obj" is merely stored, without the reference count being incremented.
962
963 See also "sv_magicext" in perlapi for a more flexible way to add magic
964 to an SV.
965
966 There is also a function to add magic to an "HV":
967
968 void hv_magic(HV *hv, GV *gv, int how);
969
970 This simply calls "sv_magic" and coerces the "gv" argument into an
971 "SV".
972
973 To remove the magic from an SV, call the function sv_unmagic:
974
975 int sv_unmagic(SV *sv, int type);
976
977 The "type" argument should be equal to the "how" value when the "SV"
978 was initially made magical.
979
980 However, note that "sv_unmagic" removes all magic of a certain "type"
981 from the "SV". If you want to remove only certain magic of a "type"
982 based on the magic virtual table, use "sv_unmagicext" instead:
983
984 int sv_unmagicext(SV *sv, int type, MGVTBL *vtbl);
985
986 Magic Virtual Tables
The "mg_virtual" field in the "MAGIC" structure is a pointer to an
"MGVTBL" (short for "Magic Virtual Table"), which is a structure of
function pointers that handle the various operations that might be
applied to that variable.
991
992 The "MGVTBL" has five (or sometimes eight) pointers to the following
993 routine types:
994
995 int (*svt_get)(SV* sv, MAGIC* mg);
996 int (*svt_set)(SV* sv, MAGIC* mg);
997 U32 (*svt_len)(SV* sv, MAGIC* mg);
998 int (*svt_clear)(SV* sv, MAGIC* mg);
999 int (*svt_free)(SV* sv, MAGIC* mg);
1000
1001 int (*svt_copy)(SV *sv, MAGIC* mg, SV *nsv,
1002 const char *name, I32 namlen);
1003 int (*svt_dup)(MAGIC *mg, CLONE_PARAMS *param);
1004 int (*svt_local)(SV *nsv, MAGIC *mg);
1005
1006 This MGVTBL structure is set at compile-time in perl.h and there are
1007 currently 32 types. These different structures contain pointers to
1008 various routines that perform additional actions depending on which
1009 function is being called.
1010
Function pointer    Action taken
----------------    ------------
svt_get             Do something before the value of the SV is
                    retrieved.
svt_set             Do something after the SV is assigned a value.
svt_len             Report on the SV's length.
svt_clear           Clear something the SV represents.
svt_free            Free any extra storage associated with the SV.

svt_copy            copy tied variable magic to a tied element
svt_dup             duplicate a magic structure during thread cloning
svt_local           copy magic to local value during 'local'
1023
1024 For instance, the MGVTBL structure called "vtbl_sv" (which corresponds
1025 to an "mg_type" of "PERL_MAGIC_sv") contains:
1026
1027 { magic_get, magic_set, magic_len, 0, 0 }
1028
1029 Thus, when an SV is determined to be magical and of type
1030 "PERL_MAGIC_sv", if a get operation is being performed, the routine
1031 "magic_get" is called. All the various routines for the various
1032 magical types begin with "magic_". NOTE: the magic routines are not
1033 considered part of the Perl API, and may not be exported by the Perl
1034 library.
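The dispatch just described can be sketched as a stand-alone model. The
code below is plain C and is not perl's implementation: "ToyVtbl",
"ToyMagic", "ToySV" and the "toy_*" functions are hypothetical names,
and only the get and set slots are modelled. It walks the linked list
of magic and calls each non-NULL hook, the get hook before the value is
read and the set hook after it is written.

```c
#include <assert.h>
#include <stddef.h>

struct ToySV;
struct ToyMagic;

/* Cut-down analogue of MGVTBL: only the get and set slots. */
typedef struct ToyVtbl {
    int (*get)(struct ToySV *sv, struct ToyMagic *mg);
    int (*set)(struct ToySV *sv, struct ToyMagic *mg);
} ToyVtbl;

/* Cut-down analogue of MAGIC: a list link, a vtable and a type. */
typedef struct ToyMagic {
    struct ToyMagic *moremagic;
    const ToyVtbl *virt;
    char type;
} ToyMagic;

typedef struct ToySV {
    long iv;
    ToyMagic *magic;
} ToySV;

/* Run every non-NULL get hook, then read the value. */
long toy_get_iv(ToySV *sv)
{
    ToyMagic *mg;
    assert(sv != NULL);
    for (mg = sv->magic; mg != NULL; mg = mg->moremagic)
        if (mg->virt != NULL && mg->virt->get != NULL)
            mg->virt->get(sv, mg);
    return sv->iv;
}

/* Write the value, then run every non-NULL set hook. */
void toy_set_iv(ToySV *sv, long iv)
{
    ToyMagic *mg;
    assert(sv != NULL);
    sv->iv = iv;
    for (mg = sv->magic; mg != NULL; mg = mg->moremagic)
        if (mg->virt != NULL && mg->virt->set != NULL)
            mg->virt->set(sv, mg);
}

/* Example hooks: get supplies a value, set just counts its calls. */
int toy_set_calls = 0;

int toy_example_get(ToySV *sv, ToyMagic *mg)
{
    (void)mg;
    sv->iv = 42;
    return 0;
}

int toy_example_set(ToySV *sv, ToyMagic *mg)
{
    (void)sv;
    (void)mg;
    toy_set_calls++;
    return 0;
}

const ToyVtbl toy_example_vtbl = { toy_example_get, toy_example_set };
```

A scalar with no magic attached is read and written directly. One that
carries "toy_example_vtbl" reports 42 on every read, whatever was last
stored in it.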
1035
1036 The last three slots are a recent addition, and for source code
1037 compatibility they are only checked for if one of the three flags
1038 MGf_COPY, MGf_DUP or MGf_LOCAL is set in mg_flags. This means that most
1039 code can continue declaring a vtable as a 5-element value. These three
1040 are currently used exclusively by the threading code, and are highly
1041 subject to change.
1042
1043 The current kinds of Magic Virtual Tables are:
1044
mg_type
(old-style char and macro)    MGVTBL          Type of magic
--------------------------    ------          -------------
\0 PERL_MAGIC_sv              vtbl_sv         Special scalar variable
#  PERL_MAGIC_arylen          vtbl_arylen     Array length ($#ary)
%  PERL_MAGIC_rhash           (none)          extra data for restricted
                                              hashes
.  PERL_MAGIC_pos             vtbl_pos        pos() lvalue
:  PERL_MAGIC_symtab          (none)          extra data for symbol
                                              tables
<  PERL_MAGIC_backref         vtbl_backref    for weak ref data
@  PERL_MAGIC_arylen_p        (none)          to move arylen out of
                                              XPVAV
A  PERL_MAGIC_overload        vtbl_amagic     %OVERLOAD hash
a  PERL_MAGIC_overload_elem   vtbl_amagicelem %OVERLOAD hash element
B  PERL_MAGIC_bm              vtbl_regexp     Boyer-Moore
                                              (fast string search)
c  PERL_MAGIC_overload_table  vtbl_ovrld      Holds overload table
                                              (AMT) on stash
D  PERL_MAGIC_regdata         vtbl_regdata    Regex match position data
                                              (@+ and @- vars)
d  PERL_MAGIC_regdatum        vtbl_regdatum   Regex match position data
                                              element
E  PERL_MAGIC_env             vtbl_env        %ENV hash
e  PERL_MAGIC_envelem         vtbl_envelem    %ENV hash element
f  PERL_MAGIC_fm              vtbl_regdata    Formline
                                              ('compiled' format)
G  PERL_MAGIC_study           vtbl_regexp     study()ed string
g  PERL_MAGIC_regex_global    vtbl_mglob      m//g target
H  PERL_MAGIC_hints           vtbl_hints      %^H hash
h  PERL_MAGIC_hintselem       vtbl_hintselem  %^H hash element
I  PERL_MAGIC_isa             vtbl_isa        @ISA array
i  PERL_MAGIC_isaelem         vtbl_isaelem    @ISA array element
k  PERL_MAGIC_nkeys           vtbl_nkeys      scalar(keys()) lvalue
L  PERL_MAGIC_dbfile          (none)          Debugger %_<filename
l  PERL_MAGIC_dbline          vtbl_dbline     Debugger %_<filename
                                              element
N  PERL_MAGIC_shared          (none)          Shared between threads
n  PERL_MAGIC_shared_scalar   (none)          Shared between threads
o  PERL_MAGIC_collxfrm        vtbl_collxfrm   Locale transformation
P  PERL_MAGIC_tied            vtbl_pack       Tied array or hash
p  PERL_MAGIC_tiedelem        vtbl_packelem   Tied array or hash element
q  PERL_MAGIC_tiedscalar      vtbl_packelem   Tied scalar or handle
r  PERL_MAGIC_qr              vtbl_regexp     precompiled qr// regex
S  PERL_MAGIC_sig             (none)          %SIG hash
s  PERL_MAGIC_sigelem         vtbl_sigelem    %SIG hash element
t  PERL_MAGIC_taint           vtbl_taint      Taintedness
U  PERL_MAGIC_uvar            vtbl_uvar       Available for use by
                                              extensions
u  PERL_MAGIC_uvar_elem       (none)          Reserved for use by
                                              extensions
V  PERL_MAGIC_vstring         vtbl_vstring    SV was vstring literal
v  PERL_MAGIC_vec             vtbl_vec        vec() lvalue
w  PERL_MAGIC_utf8            vtbl_utf8       Cached UTF-8 information
x  PERL_MAGIC_substr          vtbl_substr     substr() lvalue
y  PERL_MAGIC_defelem         vtbl_defelem    Shadow "foreach" iterator
                                              variable / smart parameter
                                              vivification
]  PERL_MAGIC_checkcall       (none)          inlining/mutation of call
                                              to this CV
~  PERL_MAGIC_ext             (none)          Available for use by
                                              extensions
1107
1108 When an uppercase and lowercase letter both exist in the table, then
1109 the uppercase letter is typically used to represent some kind of
1110 composite type (a list or a hash), and the lowercase letter is used to
1111 represent an element of that composite type. Some internals code makes
1112 use of this case relationship. However, 'v' and 'V' (vec and v-string)
1113 are in no way related.
1114
1115 The "PERL_MAGIC_ext" and "PERL_MAGIC_uvar" magic types are defined
1116 specifically for use by extensions and will not be used by perl itself.
1117 Extensions can use "PERL_MAGIC_ext" magic to 'attach' private
1118 information to variables (typically objects). This is especially
1119 useful because there is no way for normal perl code to corrupt this
1120 private information (unlike using extra elements of a hash object).
1121
1122 Similarly, "PERL_MAGIC_uvar" magic can be used much like tie() to call
1123 a C function any time a scalar's value is used or changed. The
1124 "MAGIC"'s "mg_ptr" field points to a "ufuncs" structure:
1125
1126 struct ufuncs {
1127 I32 (*uf_val)(pTHX_ IV, SV*);
1128 I32 (*uf_set)(pTHX_ IV, SV*);
1129 IV uf_index;
1130 };
1131
1132 When the SV is read from or written to, the "uf_val" or "uf_set"
1133 function will be called with "uf_index" as the first arg and a pointer
1134 to the SV as the second. A simple example of how to add
1135 "PERL_MAGIC_uvar" magic is shown below. Note that the ufuncs structure
1136 is copied by sv_magic, so you can safely allocate it on the stack.
1137
1138 void
1139 Umagic(sv)
1140 SV *sv;
1141 PREINIT:
1142 struct ufuncs uf;
1143 CODE:
1144 uf.uf_val = &my_get_fn;
1145 uf.uf_set = &my_set_fn;
1146 uf.uf_index = 0;
1147 sv_magic(sv, 0, PERL_MAGIC_uvar, (char*)&uf, sizeof(uf));
1148
1149 Attaching "PERL_MAGIC_uvar" to arrays is permissible but has no effect.
1150
1151 For hashes there is a specialized hook that gives control over hash
1152 keys (but not values). This hook calls "PERL_MAGIC_uvar" 'get' magic
1153 if the "set" function in the "ufuncs" structure is NULL. The hook is
1154 activated whenever the hash is accessed with a key specified as an "SV"
1155 through the functions "hv_store_ent", "hv_fetch_ent", "hv_delete_ent",
1156 and "hv_exists_ent". Accessing the key as a string through the
1157 functions without the "..._ent" suffix circumvents the hook. See
1158 "GUTS" in Hash::Util::FieldHash for a detailed description.
1159
1160 Note that because multiple extensions may be using "PERL_MAGIC_ext" or
1161 "PERL_MAGIC_uvar" magic, it is important for extensions to take extra
1162 care to avoid conflict. Typically only using the magic on objects
1163 blessed into the same class as the extension is sufficient. For
1164 "PERL_MAGIC_ext" magic, it is usually a good idea to define an
1165 "MGVTBL", even if all its fields will be 0, so that individual "MAGIC"
1166 pointers can be identified as a particular kind of magic using their
1167 magic virtual table. "mg_findext" provides an easy way to do that:
1168
1169 STATIC MGVTBL my_vtbl = { 0, 0, 0, 0, 0, 0, 0, 0 };
1170
1171 MAGIC *mg;
1172 if ((mg = mg_findext(sv, PERL_MAGIC_ext, &my_vtbl))) {
1173 /* this is really ours, not another module's PERL_MAGIC_ext */
1174 my_priv_data_t *priv = (my_priv_data_t *)mg->mg_ptr;
1175 ...
1176 }
1177
1178 Also note that the "sv_set*()" and "sv_cat*()" functions described
1179 earlier do not invoke 'set' magic on their targets. This must be done
1180 by the user either by calling the "SvSETMAGIC()" macro after calling
1181 these functions, or by using one of the "sv_set*_mg()" or
1182 "sv_cat*_mg()" functions. Similarly, generic C code must call the
1183 "SvGETMAGIC()" macro to invoke any 'get' magic if they use an SV
1184 obtained from external sources in functions that don't handle magic.
1185 See perlapi for a description of these functions. For example, calls
1186 to the "sv_cat*()" functions typically need to be followed by
1187 "SvSETMAGIC()", but they don't need a prior "SvGETMAGIC()" since their
1188 implementation handles 'get' magic.
1189
1190 Finding Magic
1191 MAGIC *mg_find(SV *sv, int type); /* Finds the magic pointer of that
1192 * type */
1193
1194 This routine returns a pointer to a "MAGIC" structure stored in the SV.
1195 If the SV does not have that magical feature, "NULL" is returned. If
1196 the SV has multiple instances of that magical feature, the first one
1197 will be returned. "mg_findext" can be used to find a "MAGIC" structure
1198 of an SV based on both its magic type and its magic virtual table:
1199
1200 MAGIC *mg_findext(SV *sv, int type, MGVTBL *vtbl);
1201
1202 Also, if the SV passed to "mg_find" or "mg_findext" is not of type
1203 SVt_PVMG, Perl may core dump.
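The difference between the two lookups can be sketched with a
stand-alone linked list. This is a toy model in plain C, not the perl
source: "ToyVtbl", "ToyMagic", "toy_magic_add", "toy_mg_find" and
"toy_mg_findext" are hypothetical names. Unlike the default behaviour
described for sv_magic, the toy "add" never removes an earlier entry of
the same type, so two entries of one type can coexist.

```c
#include <assert.h>
#include <stddef.h>

/* Only the address of a vtable matters here, so it has no slots. */
typedef struct ToyVtbl {
    int unused;
} ToyVtbl;

typedef struct ToyMagic {
    struct ToyMagic *moremagic;
    const ToyVtbl *virt;
    char type;
} ToyMagic;

/* Add "mg" at the beginning of the list headed by *head. */
void toy_magic_add(ToyMagic **head, ToyMagic *mg)
{
    assert(head != NULL && mg != NULL);
    mg->moremagic = *head;
    *head = mg;
}

/* First entry of the given type, or NULL. */
ToyMagic *toy_mg_find(ToyMagic *head, char type)
{
    for (; head != NULL; head = head->moremagic)
        if (head->type == type)
            return head;
    return NULL;
}

/* First entry with both the given type and the given vtable, or NULL. */
ToyMagic *toy_mg_findext(ToyMagic *head, char type, const ToyVtbl *vtbl)
{
    for (; head != NULL; head = head->moremagic)
        if (head->type == type && head->virt == vtbl)
            return head;
    return NULL;
}
```

When two extensions each attach '~' magic with their own vtable, the
type-only lookup returns whichever entry is nearer the head of the
list. The lookup that also compares the vtable returns the caller's
own entry.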
1204
1205 int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen);
1206
1207 This routine checks to see what types of magic "sv" has. If the
1208 mg_type field is an uppercase letter, then the mg_obj is copied to
1209 "nsv", but the mg_type field is changed to be the lowercase letter.
1210
1211 Understanding the Magic of Tied Hashes and Arrays
1212 Tied hashes and arrays are magical beasts of the "PERL_MAGIC_tied"
1213 magic type.
1214
1215 WARNING: As of the 5.004 release, proper usage of the array and hash
1216 access functions requires understanding a few caveats. Some of these
1217 caveats are actually considered bugs in the API, to be fixed in later
1218 releases, and are bracketed with [MAYCHANGE] below. If you find
1219 yourself actually applying such information in this section, be aware
1220 that the behavior may change in the future, umm, without warning.
1221
1222 The perl tie function associates a variable with an object that
1223 implements the various GET, SET, etc methods. To perform the
1224 equivalent of the perl tie function from an XSUB, you must mimic this
1225 behaviour. The code below carries out the necessary steps - firstly it
1226 creates a new hash, and then creates a second hash which it blesses
1227 into the class which will implement the tie methods. Lastly it ties the
1228 two hashes together, and returns a reference to the new tied hash.
1229 Note that the code below does NOT call the TIEHASH method in the MyTie
1230 class - see "Calling Perl Routines from within C Programs" for details
1231 on how to do this.
1232
1233 SV*
1234 mytie()
1235 PREINIT:
1236 HV *hash;
1237 HV *stash;
1238 SV *tie;
1239 CODE:
1240 hash = newHV();
1241 tie = newRV_noinc((SV*)newHV());
1242 stash = gv_stashpv("MyTie", GV_ADD);
1243 sv_bless(tie, stash);
1244 hv_magic(hash, (GV*)tie, PERL_MAGIC_tied);
RETVAL = newRV_noinc((SV*)hash);
1246 OUTPUT:
1247 RETVAL
1248
1249 The "av_store" function, when given a tied array argument, merely
1250 copies the magic of the array onto the value to be "stored", using
1251 "mg_copy". It may also return NULL, indicating that the value did not
1252 actually need to be stored in the array. [MAYCHANGE] After a call to
1253 "av_store" on a tied array, the caller will usually need to call
1254 "mg_set(val)" to actually invoke the perl level "STORE" method on the
1255 TIEARRAY object. If "av_store" did return NULL, a call to
"SvREFCNT_dec(val)" will usually also be necessary to avoid a memory
1257 leak. [/MAYCHANGE]
1258
1259 The previous paragraph is applicable verbatim to tied hash access using
1260 the "hv_store" and "hv_store_ent" functions as well.
1261
1262 "av_fetch" and the corresponding hash functions "hv_fetch" and
1263 "hv_fetch_ent" actually return an undefined mortal value whose magic
1264 has been initialized using "mg_copy". Note the value so returned does
1265 not need to be deallocated, as it is already mortal. [MAYCHANGE] But
1266 you will need to call "mg_get()" on the returned value in order to
1267 actually invoke the perl level "FETCH" method on the underlying TIE
1268 object. Similarly, you may also call "mg_set()" on the return value
1269 after possibly assigning a suitable value to it using "sv_setsv",
1270 which will invoke the "STORE" method on the TIE object. [/MAYCHANGE]
1271
1272 [MAYCHANGE] In other words, the array or hash fetch/store functions
1273 don't really fetch and store actual values in the case of tied arrays
1274 and hashes. They merely call "mg_copy" to attach magic to the values
1275 that were meant to be "stored" or "fetched". Later calls to "mg_get"
1276 and "mg_set" actually do the job of invoking the TIE methods on the
1277 underlying objects. Thus the magic mechanism currently implements a
1278 kind of lazy access to arrays and hashes.
1279
1280 Currently (as of perl version 5.004), use of the hash and array access
1281 functions requires the user to be aware of whether they are operating
1282 on "normal" hashes and arrays, or on their tied variants. The API may
1283 be changed to provide more transparent access to both tied and normal
1284 data types in future versions. [/MAYCHANGE]
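The lazy access described above can be pictured with a toy model. The
code below is plain C rather than Perl API code, and every name in it
("ToyTie", "ToyElem", "toy_av_fetch", "toy_mg_get", "toy_mg_set" and
the example callbacks) is hypothetical. The fetch routine only builds a
placeholder that remembers the tie and the index; the tie's methods run
when the get or set step is invoked on that placeholder.

```c
#include <assert.h>
#include <stddef.h>

/* Toy tie object: FETCH/STORE-like callbacks over a small backing store. */
typedef struct ToyTie {
    long (*fetch)(struct ToyTie *tie, int index);
    void (*store)(struct ToyTie *tie, int index, long value);
    long backing[4];
    int fetch_calls;
    int store_calls;
} ToyTie;

/* Toy placeholder value carrying "element magic": the tie and the index. */
typedef struct ToyElem {
    ToyTie *tie;
    int index;
    long value;
} ToyElem;

/* Analogue of a fetch on a tied array: no tie method is called here. */
ToyElem toy_av_fetch(ToyTie *tie, int index)
{
    ToyElem e;
    assert(tie != NULL);
    e.tie = tie;
    e.index = index;
    e.value = 0;
    return e;
}

/* Analogue of the get step: this is where the FETCH-like method runs. */
void toy_mg_get(ToyElem *e)
{
    e->value = e->tie->fetch(e->tie, e->index);
}

/* Analogue of the set step: this is where the STORE-like method runs. */
void toy_mg_set(ToyElem *e)
{
    e->tie->store(e->tie, e->index, e->value);
}

/* Example tie methods that count how often they are called. */
long toy_example_fetch(ToyTie *tie, int index)
{
    assert(index >= 0 && index < 4);
    tie->fetch_calls++;
    return tie->backing[index];
}

void toy_example_store(ToyTie *tie, int index, long value)
{
    assert(index >= 0 && index < 4);
    tie->store_calls++;
    tie->backing[index] = value;
}
```

In this model, assigning to the placeholder changes nothing in the
backing store until "toy_mg_set" is called, which mirrors the need to
invoke the set step explicitly after storing into a tied container.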
1285
1286 You would do well to understand that the TIEARRAY and TIEHASH
1287 interfaces are mere sugar to invoke some perl method calls while using
1288 the uniform hash and array syntax. The use of this sugar imposes some
1289 overhead (typically about two to four extra opcodes per FETCH/STORE
1290 operation, in addition to the creation of all the mortal variables
1291 required to invoke the methods). This overhead will be comparatively
1292 small if the TIE methods are themselves substantial, but if they are
1293 only a few statements long, the overhead will not be insignificant.
1294
1295 Localizing changes
1296 Perl has a very handy construction
1297
1298 {
1299 local $var = 2;
1300 ...
1301 }
1302
1303 This construction is approximately equivalent to
1304
1305 {
1306 my $oldvar = $var;
1307 $var = 2;
1308 ...
1309 $var = $oldvar;
1310 }
1311
1312 The biggest difference is that the first construction would reinstate
1313 the initial value of $var, irrespective of how control exits the block:
1314 "goto", "return", "die"/"eval", etc. It is a little bit more efficient
1315 as well.
1316
There is a way to achieve a similar task from C via the Perl API: create a
pseudo-block, and arrange for some changes to be automatically undone
at the end of it, whether it ends explicitly or via a non-local exit (via die()).
1320 A block-like construct is created by a pair of "ENTER"/"LEAVE" macros
1321 (see "Returning a Scalar" in perlcall). Such a construct may be
1322 created specially for some important localized task, or an existing one
1323 (like boundaries of enclosing Perl subroutine/block, or an existing
1324 pair for freeing TMPs) may be used. (In the second case the overhead of
1325 additional localization must be almost negligible.) Note that any XSUB
1326 is automatically enclosed in an "ENTER"/"LEAVE" pair.
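The bookkeeping behind such a pseudo-block can be sketched as a stack
of undo records. The model below is plain C and only illustrates the
idea: "toy_enter", "toy_leave", "toy_save_int" and
"toy_save_destructor" are hypothetical names, the arrays are fixed-size
for brevity, and non-local exits are not modelled. Entering a block
remembers the current height of the save stack; leaving it unwinds the
records above that height, newest first.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SAVE_MAX 32

typedef void (*toy_destructor_t)(void *p);

/* One undo record: restore an int, or call a function, or both. */
typedef struct ToySave {
    int *int_ptr;            /* non-NULL: *int_ptr = old_int on leave */
    int old_int;
    toy_destructor_t func;   /* non-NULL: func(arg) on leave */
    void *arg;
} ToySave;

static ToySave toy_savestack[TOY_SAVE_MAX];
static int toy_save_ix = 0;
static int toy_scopes[TOY_SAVE_MAX];
static int toy_scope_ix = 0;

/* Open a pseudo-block: remember where its undo records start. */
void toy_enter(void)
{
    assert(toy_scope_ix < TOY_SAVE_MAX);
    toy_scopes[toy_scope_ix++] = toy_save_ix;
}

/* Arrange for *p to get its current value back at toy_leave(). */
void toy_save_int(int *p)
{
    ToySave *s;
    assert(p != NULL && toy_save_ix < TOY_SAVE_MAX);
    s = &toy_savestack[toy_save_ix++];
    s->int_ptr = p;
    s->old_int = *p;
    s->func = NULL;
    s->arg = NULL;
}

/* Arrange for f(arg) to be called at toy_leave(). */
void toy_save_destructor(toy_destructor_t f, void *arg)
{
    ToySave *s;
    assert(f != NULL && toy_save_ix < TOY_SAVE_MAX);
    s = &toy_savestack[toy_save_ix++];
    s->int_ptr = NULL;
    s->old_int = 0;
    s->func = f;
    s->arg = arg;
}

/* Close the innermost pseudo-block, undoing its records newest first. */
void toy_leave(void)
{
    int floor;
    assert(toy_scope_ix > 0);
    floor = toy_scopes[--toy_scope_ix];
    while (toy_save_ix > floor) {
        ToySave *s = &toy_savestack[--toy_save_ix];
        if (s->int_ptr != NULL)
            *s->int_ptr = s->old_int;
        if (s->func != NULL)
            s->func(s->arg);
    }
}

/* Example destructor: counts its calls through an int pointer. */
void toy_count_call(void *p)
{
    ++*(int *)p;
}
```

Nested blocks unwind independently: leaving the inner block restores
only what was saved inside it, and the outer block's records stay
pending until its own leave.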
1327
1328 Inside such a pseudo-block the following service is available:
1329
1330 "SAVEINT(int i)"
1331 "SAVEIV(IV i)"
1332 "SAVEI32(I32 i)"
1333 "SAVELONG(long i)"
These macros arrange things to restore the value of the integer
variable "i" at the end of the enclosing pseudo-block.
1336
1337 SAVESPTR(s)
1338 SAVEPPTR(p)
1339 These macros arrange things to restore the value of pointers "s"
1340 and "p". "s" must be a pointer of a type which survives conversion
1341 to "SV*" and back, "p" should be able to survive conversion to
1342 "char*" and back.
1343
1344 "SAVEFREESV(SV *sv)"
1345 The refcount of "sv" would be decremented at the end of pseudo-
1346 block. This is similar to "sv_2mortal" in that it is also a
1347 mechanism for doing a delayed "SvREFCNT_dec". However, while
1348 "sv_2mortal" extends the lifetime of "sv" until the beginning of
1349 the next statement, "SAVEFREESV" extends it until the end of the
1350 enclosing scope. These lifetimes can be wildly different.
1351
1352 Also compare "SAVEMORTALIZESV".
1353
1354 "SAVEMORTALIZESV(SV *sv)"
1355 Just like "SAVEFREESV", but mortalizes "sv" at the end of the
1356 current scope instead of decrementing its reference count. This
1357 usually has the effect of keeping "sv" alive until the statement
1358 that called the currently live scope has finished executing.
1359
1360 "SAVEFREEOP(OP *op)"
1361 The "OP *" is op_free()ed at the end of pseudo-block.
1362
1363 SAVEFREEPV(p)
1364 The chunk of memory which is pointed to by "p" is Safefree()ed at
1365 the end of pseudo-block.
1366
1367 "SAVECLEARSV(SV *sv)"
1368 Clears a slot in the current scratchpad which corresponds to "sv"
1369 at the end of pseudo-block.
1370
1371 "SAVEDELETE(HV *hv, char *key, I32 length)"
1372 The key "key" of "hv" is deleted at the end of pseudo-block. The
1373 string pointed to by "key" is Safefree()ed. If one has a key in
1374 short-lived storage, the corresponding string may be reallocated
1375 like this:
1376
1377 SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf));
1378
1379 "SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)"
1380 At the end of pseudo-block the function "f" is called with the only
1381 argument "p".
1382
1383 "SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)"
1384 At the end of pseudo-block the function "f" is called with the
1385 implicit context argument (if any), and "p".
1386
1387 "SAVESTACK_POS()"
1388 The current offset on the Perl internal stack (cf. "SP") is
1389 restored at the end of pseudo-block.
1390
1391 The following API list contains functions, thus one needs to provide
1392 pointers to the modifiable data explicitly (either C pointers, or
1393 Perlish "GV *"s). Where the above macros take "int", a similar
1394 function takes "int *".
1395
1396 "SV* save_scalar(GV *gv)"
1397 Equivalent to Perl code "local $gv".
1398
1399 "AV* save_ary(GV *gv)"
1400 "HV* save_hash(GV *gv)"
1401 Similar to "save_scalar", but localize @gv and %gv.
1402
1403 "void save_item(SV *item)"
Duplicates the current value of "item". On exit from the current
"ENTER"/"LEAVE" pseudo-block, the value of "item" is restored from
the stored copy. It doesn't handle magic. Use "save_scalar" if
magic is affected.
1408
1409 "void save_list(SV **sarg, I32 maxsarg)"
1410 A variant of "save_item" which takes multiple arguments via an
1411 array "sarg" of "SV*" of length "maxsarg".
1412
1413 "SV* save_svref(SV **sptr)"
1414 Similar to "save_scalar", but will reinstate an "SV *".
1415
1416 "void save_aptr(AV **aptr)"
1417 "void save_hptr(HV **hptr)"
1418 Similar to "save_svref", but localize "AV *" and "HV *".
1419
1420 The "Alias" module implements localization of the basic types within
1421 the caller's scope. People who are interested in how to localize
1422 things in the containing scope should take a look there too.
1423
1425 XSUBs and the Argument Stack
1426 The XSUB mechanism is a simple way for Perl programs to access C
1427 subroutines. An XSUB routine will have a stack that contains the
1428 arguments from the Perl program, and a way to map from the Perl data
1429 structures to a C equivalent.
1430
1431 The stack arguments are accessible through the ST(n) macro, which
1432 returns the "n"'th stack argument. Argument 0 is the first argument
1433 passed in the Perl subroutine call. These arguments are "SV*", and can
1434 be used anywhere an "SV*" is used.
1435
1436 Most of the time, output from the C routine can be handled through use
1437 of the RETVAL and OUTPUT directives. However, there are some cases
1438 where the argument stack is not already long enough to handle all the
1439 return values. An example is the POSIX tzname() call, which takes no
1440 arguments, but returns two, the local time zone's standard and summer
1441 time abbreviations.
1442
1443 To handle this situation, the PPCODE directive is used and the stack is
1444 extended using the macro:
1445
1446 EXTEND(SP, num);
1447
1448 where "SP" is the macro that represents the local copy of the stack
1449 pointer, and "num" is the number of elements the stack should be
1450 extended by.
1451
Now that there is room on the stack, values can be pushed on it using
the "PUSHs" macro. The pushed values will often need to be "mortal"
(see "Reference Counts and Mortality"):
1455
1456 PUSHs(sv_2mortal(newSViv(an_integer)))
1457 PUSHs(sv_2mortal(newSVuv(an_unsigned_integer)))
1458 PUSHs(sv_2mortal(newSVnv(a_double)))
1459 PUSHs(sv_2mortal(newSVpv("Some String",0)))
1460 /* Although the last example is better written as the more
1461 * efficient: */
1462 PUSHs(newSVpvs_flags("Some String", SVs_TEMP))
1463
Now, in the Perl program calling "tzname", the two values will be
assigned as in:
1466
1467 ($standard_abbrev, $summer_abbrev) = POSIX::tzname;
1468
1469 An alternate (and possibly simpler) method to pushing values on the
1470 stack is to use the macro:
1471
1472 XPUSHs(SV*)
1473
1474 This macro automatically adjusts the stack for you, if needed. Thus,
1475 you do not need to call "EXTEND" to extend the stack.
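The relationship between the two styles of pushing can be sketched with
a growable array. This is a toy model in plain C: "ToyStack",
"toy_extend", "toy_push" and "toy_xpush" are hypothetical names, the
elements are "long" rather than "SV*", and the real macros work on a
local copy of the stack pointer rather than on a structure like this.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct ToyStack {
    long *base;
    size_t len;   /* slots in use */
    size_t cap;   /* slots allocated */
} ToyStack;

/* Make sure at least n more slots are free.  Returns 1 on success and
 * 0 if memory could not be obtained. */
int toy_extend(ToyStack *st, size_t n)
{
    if (st->cap - st->len < n) {
        size_t new_cap = st->len + n;
        long *p = realloc(st->base, new_cap * sizeof *p);
        if (p == NULL)
            return 0;
        st->base = p;
        st->cap = new_cap;
    }
    return 1;
}

/* Push without checking for room; the caller must have extended. */
void toy_push(ToyStack *st, long v)
{
    assert(st->len < st->cap);
    st->base[st->len++] = v;
}

/* Push, extending first if needed.  Returns 1 on success. */
int toy_xpush(ToyStack *st, long v)
{
    if (!toy_extend(st, 1))
        return 0;
    toy_push(st, v);
    return 1;
}

/* Release the storage and reset the stack to empty. */
void toy_stack_free(ToyStack *st)
{
    free(st->base);
    st->base = NULL;
    st->len = 0;
    st->cap = 0;
}
```

Reserving room once and then using the unchecked push avoids a capacity
test per element, while the checking push is the more convenient choice
when the number of results is not known in advance.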
1476
Despite suggestions to the contrary in earlier versions of this
document, the macros "(X)PUSH[iunp]" are not suited to XSUBs which
return multiple results. For that, either stick to the "(X)PUSHs"
macros shown above, or use the new "m(X)PUSH[iunp]" macros instead;
see "Putting a C value on Perl stack".
1482
1483 For more information, consult perlxs and perlxstut.
1484
1485 Autoloading with XSUBs
1486 If an AUTOLOAD routine is an XSUB, as with Perl subroutines, Perl puts
1487 the fully-qualified name of the autoloaded subroutine in the $AUTOLOAD
1488 variable of the XSUB's package.
1489
1490 But it also puts the same information in certain fields of the XSUB
1491 itself:
1492
1493 HV *stash = CvSTASH(cv);
1494 const char *subname = SvPVX(cv);
1495 STRLEN name_length = SvCUR(cv); /* in bytes */
1496 U32 is_utf8 = SvUTF8(cv);
1497
1498 "SvPVX(cv)" contains just the sub name itself, not including the
1499 package. For an AUTOLOAD routine in UNIVERSAL or one of its
1500 superclasses, "CvSTASH(cv)" returns NULL during a method call on a
1501 nonexistent package.
1502
1503 Note: Setting $AUTOLOAD stopped working in 5.6.1, which did not support
1504 XS AUTOLOAD subs at all. Perl 5.8.0 introduced the use of fields in
1505 the XSUB itself. Perl 5.16.0 restored the setting of $AUTOLOAD. If
1506 you need to support 5.8-5.14, use the XSUB's fields.
1507
1508 Calling Perl Routines from within C Programs
1509 There are four routines that can be used to call a Perl subroutine from
1510 within a C program. These four are:
1511
1512 I32 call_sv(SV*, I32);
1513 I32 call_pv(const char*, I32);
1514 I32 call_method(const char*, I32);
1515 I32 call_argv(const char*, I32, register char**);
1516
1517 The routine most often used is "call_sv". The "SV*" argument contains
1518 either the name of the Perl subroutine to be called, or a reference to
1519 the subroutine. The second argument consists of flags that control the
1520 context in which the subroutine is called, whether or not the
1521 subroutine is being passed arguments, how errors should be trapped, and
1522 how to treat return values.
1523
1524 All four routines return the number of values that the subroutine
1525 returned on the Perl stack.
1526
1527 These routines used to be called "perl_call_sv", etc., before Perl
1528 v5.6.0, but those names are now deprecated; macros of the same name are
1529 provided for compatibility.
1530
1531 When using any of these routines (except "call_argv"), the programmer
1532 must manipulate the Perl stack. The tools for doing so include the
1533 following macros and functions:
1534
1535 dSP
1536 SP
1537 PUSHMARK()
1538 PUTBACK
1539 SPAGAIN
1540 ENTER
1541 SAVETMPS
1542 FREETMPS
1543 LEAVE
1544 XPUSH*()
1545 POP*()
1546
1547 For a detailed description of calling conventions from C to Perl,
1548 consult perlcall.
1549
1550 Memory Allocation
1551 Allocation
1552
1553 All memory meant to be used with the Perl API functions should be
1554 manipulated using the macros described in this section. The macros
1555 provide the necessary transparency between differences in the actual
1556 malloc implementation that is used within perl.
1557
1558 It is suggested that you enable the version of malloc that is
1559 distributed with Perl. It keeps pools of various sizes of unallocated
1560 memory in order to satisfy allocation requests more quickly. However,
1561 on some platforms, it may cause spurious malloc or free errors.
1562
1563 The following three macros are used to initially allocate memory:
1564
1565 Newx(pointer, number, type);
1566 Newxc(pointer, number, type, cast);
1567 Newxz(pointer, number, type);
1568
1569 The first argument "pointer" should be the name of a variable that will
1570 point to the newly allocated memory.
1571
1572 The second and third arguments "number" and "type" specify how many of
1573 the specified type of data structure should be allocated. The argument
1574 "type" is passed to "sizeof". The final argument to "Newxc", "cast",
1575 should be used if the "pointer" argument is different from the "type"
1576 argument.
1577
1578 Unlike the "Newx" and "Newxc" macros, the "Newxz" macro calls "memzero"
1579 to zero out all the newly allocated memory.
1580
1581 Reallocation
1582
1583 Renew(pointer, number, type);
1584 Renewc(pointer, number, type, cast);
1585 Safefree(pointer)
1586
1587 These three macros are used to change a memory buffer size or to free a
1588 piece of memory no longer needed. The arguments to "Renew" and
1589 "Renewc" match those of "Newx" and "Newxc"; unlike the older "New" and
1590 "Newc" macros, none of them takes a "magic cookie" argument.
1591
1592 Moving
1593
1594 Move(source, dest, number, type);
1595 Copy(source, dest, number, type);
1596 Zero(dest, number, type);
1597
1598 These three macros are used to move, copy, or zero out previously
1599 allocated memory. The "source" and "dest" arguments point to the
1600 source and destination starting points. Perl will move, copy, or zero
1601 out "number" instances of the size of the "type" data structure (using
1602 the "sizeof" function).
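
The element-count convention shared by these macros can be sketched in
plain C. The "DEMO_*" macros and "demo_shift_sum" below are hypothetical
stand-ins built on the C library, not the real Perl macros; they assume
only what the text says, namely that each macro scales "number" by
"sizeof(type)". The sketch also assumes that a "move" must tolerate
overlapping regions while a "copy" need not.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins, NOT the real Perl macros: each one scales the
 * element count by sizeof(type), as the text describes. */
#define DEMO_NEWXZ(p, n, t)   ((p) = (t *)calloc((n), sizeof(t)))
#define DEMO_MOVE(s, d, n, t) memmove((d), (s), (n) * sizeof(t)) /* overlap-safe */
#define DEMO_COPY(s, d, n, t) memcpy((d), (s), (n) * sizeof(t))  /* no overlap */
#define DEMO_ZERO(d, n, t)    memset((d), 0, (n) * sizeof(t))
#define DEMO_SAFEFREE(p)      free(p)

/* Fill a zeroed buffer with 1..5, shift the first four elements up by
 * one slot, and return the sum of the result, or -1 if the allocation
 * fails. */
long demo_shift_sum(void)
{
    long *buf, sum = 0;
    int i;

    DEMO_NEWXZ(buf, 5, long);
    if (!buf)
        return -1;
    for (i = 0; i < 5; i++)
        buf[i] = i + 1;

    /* Source and destination overlap, so this has to be a "move". */
    DEMO_MOVE(buf, buf + 1, 4, long);
    assert(buf[0] == 1 && buf[1] == 1);   /* slot 0 is left untouched */

    for (i = 0; i < 5; i++)
        sum += buf[i];
    DEMO_SAFEFREE(buf);
    return sum;
}
```

After the shift the buffer holds 1, 1, 2, 3, 4. In an extension the
real macros should be used instead, so that the memory comes from
whichever malloc perl was built with.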
1603
1604 PerlIO
1605 The most recent development releases of Perl have been experimenting
1606 with removing Perl's dependency on the "normal" standard I/O suite and
1607 allowing other stdio implementations to be used. This involves
1608 creating a new abstraction layer that then calls whichever
1609 implementation of stdio Perl was compiled with. All XSUBs should now
1610 use the functions in the PerlIO abstraction layer and not make any
1611 assumptions about what kind of stdio is being used.
1612
1613 For a complete description of the PerlIO abstraction, consult perlapio.
1614
1615 Putting a C value on Perl stack
1616 A lot of opcodes (an opcode is an elementary operation in the internal
1617 perl stack machine) put an SV* on the stack. However, as an optimization
1618 the corresponding SV is (usually) not recreated each time. The opcodes
1619 reuse specially assigned SVs (targets) which are (as a corollary) not
1620 constantly freed/created.
1621
1622 Each of the targets is created only once (but see "Scratchpads and
1623 recursion" below), and when an opcode needs to put an integer, a
1624 double, or a string on the stack, it just sets the corresponding parts
1625 of its target and puts the target on the stack.
1626
1627 The macro to put this target on the stack is "PUSHTARG", and it is
1628 directly used in some opcodes, as well as indirectly in zillions of
1629 others, which use it via "(X)PUSH[iunp]".
1630
1631 Because the target is reused, you must be careful when pushing multiple
1632 values on the stack. The following code will not do what you think:
1633
1634 XPUSHi(10);
1635 XPUSHi(20);
1636
1637 This translates as "set "TARG" to 10, push a pointer to "TARG" onto the
1638 stack; set "TARG" to 20, push a pointer to "TARG" onto the stack". At
1639 the end of the operation, the stack does not contain the values 10 and
1640 20, but actually contains two pointers to "TARG", which we have set to
1641 20.
1642
1643 If you need to push multiple different values then you should either
1644 use the "(X)PUSHs" macros, or else use the new "m(X)PUSH[iunp]" macros,
1645 none of which make use of "TARG". The "(X)PUSHs" macros simply push an
1646 SV* on the stack, which, as noted under "XSUBs and the Argument Stack",
1647 will often need to be "mortal". The new "m(X)PUSH[iunp]" macros make
1648 this a little easier to achieve by creating a new mortal for you (via
1649 "(X)PUSHmortal"), pushing that onto the stack (extending it if
1650 necessary in the case of the "mXPUSH[iunp]" macros), and then setting
1651 its value. Thus, instead of writing this to "fix" the example above:
1652
1653 XPUSHs(sv_2mortal(newSViv(10)));
1654 XPUSHs(sv_2mortal(newSViv(20)));
1655
1656 you can simply write:
1657
1658 mXPUSHi(10);
1659 mXPUSHi(20);
1660
1661 On a related note, if you do use "(X)PUSH[iunp]", then you're going to
1662 need a "dTARG" in your variable declarations so that the "*PUSH*"
1663 macros can make use of the local variable "TARG". See also "dTARGET"
1664 and "dXSTARG".
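
The aliasing problem is easy to reproduce outside perl. The following
is a toy model in plain C, not Perl API code: "demo_sv", "demo_stack"
and the "demo_*" functions are hypothetical names. It assumes only what
the text states, that the TARG-based macros push a pointer to one shared
scalar while the mortal-based macros push a fresh scalar each time.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_STACK_MAX 8

/* Toy stand-ins: a "scalar" is a boxed long, and the stack holds
 * pointers to scalars, much as the Perl stack holds SV pointers. */
typedef struct { long value; } demo_sv;

typedef struct {
    demo_sv *slot[DEMO_STACK_MAX];
    int      top;     /* number of pointers pushed so far */
    demo_sv  targ;    /* the single reusable target */
} demo_stack;

/* Like the XPUSHi pattern: set the shared target, push a pointer to it. */
void demo_push_targ(demo_stack *st, long v)
{
    assert(st->top < DEMO_STACK_MAX);
    st->targ.value = v;
    st->slot[st->top++] = &st->targ;
}

/* Like the mXPUSHi pattern: push a freshly allocated scalar each time. */
void demo_push_fresh(demo_stack *st, long v)
{
    demo_sv *sv = malloc(sizeof *sv);
    assert(sv != NULL && st->top < DEMO_STACK_MAX);
    sv->value = v;
    st->slot[st->top++] = sv;
}

/* Value seen through the pointer at stack position i. */
long demo_peek(const demo_stack *st, int i)
{
    assert(i >= 0 && i < st->top);
    return st->slot[i]->value;
}

/* Empty the stack and release the fresh scalars. (In real perl,
 * mortals are released later by FREETMPS.) */
void demo_clear(demo_stack *st)
{
    while (st->top > 0) {
        demo_sv *sv = st->slot[--st->top];
        if (sv != &st->targ)
            free(sv);
    }
}
```

Pushing 10 and then 20 with "demo_push_targ" leaves two stack entries
that both read 20; the same two pushes with "demo_push_fresh" read 10
and 20.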
1665
1666 Scratchpads
1667 The question remains of when the SVs which are targets for opcodes are
1668 created. The answer is that they are created when the current unit--a
1669 subroutine or a file (for opcodes for statements outside of
1670 subroutines)--is compiled. During this time a special anonymous Perl
1671 array is created, which is called a scratchpad for the current unit.
1672
1673 A scratchpad keeps SVs which are lexicals for the current unit and are
1674 targets for opcodes. One can deduce that an SV lives on a scratchpad by
1675 looking at its flags: lexicals have "SVs_PADMY" set, and targets have
1676 "SVs_PADTMP" set.
1677
1678 The correspondence between OPs and targets is not 1-to-1. Different OPs
1679 in the compile tree of the unit can use the same target, if this would
1680 not conflict with the expected life of the temporary.
1681
1682 Scratchpads and recursion
1683 In fact it is not 100% true that a compiled unit contains a pointer to
1684 the scratchpad AV. In fact it contains a pointer to an AV of
1685 (initially) one element, and this element is the scratchpad AV. Why do
1686 we need an extra level of indirection?
1687
1688 The answer is recursion, and maybe threads. Both these can create
1689 several execution pointers going into the same subroutine. For the
1690 subroutine-child not to write over the temporaries for the
1691 subroutine-parent (lifespan of which covers the call to the child), the
1692 parent and the child should have different scratchpads. (And the
1693 lexicals should be separate anyway!)
1694
1695 So each subroutine is born with an array of scratchpads (of length 1).
1696 On each entry to the subroutine it is checked that the current depth of
1697 the recursion is not more than the length of this array, and if it is,
1698 a new scratchpad is created and pushed into the array.
1699
1700 The targets on this scratchpad are "undef"s, but they are already
1701 marked with correct flags.
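
The depth check described above can be modelled in a few lines of
plain C. This is a toy sketch, not perl's pad code: "demo_pad",
"demo_padlist" and the "demo_*" functions are hypothetical, and unlike
the real array (which the text says starts with one element) this one
starts empty and creates its first pad on first entry.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_PAD_SLOTS 4

typedef struct { long slot[DEMO_PAD_SLOTS]; } demo_pad;

/* Per-subroutine array of scratchpads, one per recursion depth seen. */
typedef struct {
    demo_pad **pads;
    int        npads;   /* pads created so far */
    int        depth;   /* current recursion depth, 0 when not running */
} demo_padlist;

/* Enter the sub: create a new pad only if the depth exceeds the
 * number of pads that already exist. */
demo_pad *demo_enter(demo_padlist *pl)
{
    pl->depth++;
    if (pl->depth > pl->npads) {
        demo_pad **grown =
            realloc(pl->pads, (size_t)pl->depth * sizeof *grown);
        assert(grown != NULL);
        pl->pads = grown;
        pl->pads[pl->npads] = calloc(1, sizeof(demo_pad));
        assert(pl->pads[pl->npads] != NULL);
        pl->npads++;
    }
    return pl->pads[pl->depth - 1];
}

void demo_leave(demo_padlist *pl)
{
    assert(pl->depth > 0);
    pl->depth--;
}

/* Recursive toy sub: each level stores n in slot 0 of its own pad,
 * recurses, then checks that the child did not overwrite it.
 * Returns 1 if every level's temporary survived. */
int demo_recurse(demo_padlist *pl, long n)
{
    demo_pad *pad = demo_enter(pl);
    int ok = 1;

    pad->slot[0] = n;
    if (n > 0)
        ok = demo_recurse(pl, n - 1);
    ok = ok && (pad->slot[0] == n);
    demo_leave(pl);
    return ok;
}
```

Calling "demo_recurse" with n = 3 reaches depth 4 and so creates four
pads; a later, shallower call reuses them instead of allocating more.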
1702
1704 Code tree
1705 Here we describe the internal form your code is converted to by Perl.
1706 Start with a simple example:
1707
1708 $a = $b + $c;
1709
1710 This is converted to a tree similar to this one:
1711
1712 assign-to
1713 / \
1714 + $a
1715 / \
1716 $b $c
1717
1718 (but slightly more complicated). This tree reflects the way Perl
1719 parsed your code, but has nothing to do with the execution order.
1720 There is an additional "thread" going through the nodes of the tree
1721 which shows the order of execution of the nodes. In our simplified
1722 example above it looks like:
1723
1724 $b ---> $c ---> + ---> $a ---> assign-to
1725
1726 But with the actual compile tree for "$a = $b + $c" it is different:
1727 some nodes are optimized away. As a corollary, though the actual tree
1728 contains more nodes than our simplified example, the execution order is
1729 the same as in our example.
1730
1731 Examining the tree
1732 If you have your perl compiled for debugging (usually done with
1733 "-DDEBUGGING" on the "Configure" command line), you may examine the
1734 compiled tree by specifying "-Dx" on the Perl command line. The output
1735 takes several lines per node, and for "$b+$c" it looks like this:
1736
1737 5 TYPE = add ===> 6
1738 TARG = 1
1739 FLAGS = (SCALAR,KIDS)
1740 {
1741 TYPE = null ===> (4)
1742 (was rv2sv)
1743 FLAGS = (SCALAR,KIDS)
1744 {
1745 3 TYPE = gvsv ===> 4
1746 FLAGS = (SCALAR)
1747 GV = main::b
1748 }
1749 }
1750 {
1751 TYPE = null ===> (5)
1752 (was rv2sv)
1753 FLAGS = (SCALAR,KIDS)
1754 {
1755 4 TYPE = gvsv ===> 5
1756 FLAGS = (SCALAR)
1757 GV = main::c
1758 }
1759 }
1760
1761 This tree has 5 nodes (one per "TYPE" specifier), of which only 3 are
1762 not optimized away (one per number in the left column). The immediate
1763 children of the given node correspond to "{}" pairs on the same level
1764 of indentation, thus this listing corresponds to the tree:
1765
1766 add
1767 / \
1768 null null
1769 | |
1770 gvsv gvsv
1771
1772 The execution order is indicated by "===>" marks, thus it is "3 4 5 6"
1773 (node 6 is not included in the above listing), i.e., "gvsv gvsv add
1774 whatever".
1775
1776 Each of these nodes represents an op, a fundamental operation inside
1777 the Perl core. The code which implements each operation can be found in
1778 the pp*.c files; the function which implements the op with type "gvsv"
1779 is "pp_gvsv", and so on. As the tree above shows, different ops have
1780 different numbers of children: "add" is a binary operator, as one would
1781 expect, and so has two children. To accommodate the various different
1782 numbers of children, there are various types of op data structure, and
1783 they link together in different ways.
1784
1785 The simplest type of op structure is "OP": this has no children. Unary
1786 operators, "UNOP"s, have one child, and this is pointed to by the
1787 "op_first" field. Binary operators ("BINOP"s) have not only an
1788 "op_first" field but also an "op_last" field. The most complex type of
1789 op is a "LISTOP", which has any number of children. In this case, the
1790 first child is pointed to by "op_first" and the last child by
1791 "op_last". The children in between can be found by iteratively
1792 following the "op_sibling" pointer from the first child to the last.
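
The two kinds of links, structural ("op_first"/"op_sibling") and
execution order ("op_next"), can be illustrated with a toy node type.
The struct below is a hypothetical "demo_op", not perl's "struct op";
only the field names are borrowed from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy op node, NOT perl's struct op; field names echo the text. */
typedef struct demo_op {
    const char     *name;
    struct demo_op *op_first;    /* first child, or NULL */
    struct demo_op *op_sibling;  /* next child of the same parent */
    struct demo_op *op_next;     /* next op in execution order */
} demo_op;

/* Count children by following op_sibling from op_first. */
int demo_count_kids(const demo_op *o)
{
    int n = 0;
    const demo_op *kid;

    for (kid = o->op_first; kid; kid = kid->op_sibling)
        n++;
    return n;
}

/* Write the execution order, starting at 'start', into buf as a
 * space-separated list of op names (truncated if buf is too small). */
void demo_exec_order(const demo_op *start, char *buf, size_t len)
{
    const demo_op *o;

    assert(len > 0);
    buf[0] = '\0';
    for (o = start; o; o = o->op_next) {
        if (buf[0] != '\0')
            strncat(buf, " ", len - strlen(buf) - 1);
        strncat(buf, o->name, len - strlen(buf) - 1);
    }
}
```

Wiring five such nodes up as in the simplified "$a = $b + $c" tree
earlier and walking "op_next" from "$b" visits the nodes in the order
given there, while "demo_count_kids" reports two children for both "+"
and "assign-to".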
1793
1794 There are also two other op types: a "PMOP" holds a regular expression,
1795 and has no children, and a "LOOP" may or may not have children. If the
1796 "op_children" field is non-zero, it behaves like a "LISTOP". To
1797 complicate matters, if a "UNOP" is actually a "null" op after
1798 optimization (see "Compile pass 2: context propagation") it will still
1799 have children in accordance with its former type.
1800
1801 Another way to examine the tree is to use a compiler back-end module,
1802 such as B::Concise.
1803
1804 Compile pass 1: check routines
1805 The tree is created by the compiler while yacc code feeds it the
1806 constructions it recognizes. Since yacc works bottom-up, so does the
1807 first pass of perl compilation.
1808
1809 What makes this pass interesting for perl developers is that some
1810 optimization may be performed on this pass. This is optimization by
1811 so-called "check routines". The correspondence between node names and
1812 corresponding check routines is described in opcode.pl (do not forget
1813 to run "make regen_headers" if you modify this file).
1814
1815 A check routine is called when the node is fully constructed except for
1816 the execution-order thread. Since at this time there are no back-links
1817 to the currently constructed node, one can do almost any operation to the
1818 top-level node, including freeing it and/or creating new nodes
1819 above/below it.
1820
1821 The check routine returns the node which should be inserted into the
1822 tree (if the top-level node was not modified, the check routine returns its
1823 argument).
1824
1825 By convention, check routines have names "ck_*". They are usually
1826 called from "new*OP" subroutines (or "convert") (which in turn are
1827 called from perly.y).
1828
1829 Compile pass 1a: constant folding
1830 Immediately after the check routine is called the returned node is
1831 checked for being compile-time executable. If it is (the value is
1832 judged to be constant) it is immediately executed, and a constant node
1833 with the "return value" of the corresponding subtree is substituted
1834 instead. The subtree is deleted.
1835
1836 If constant folding was not performed, the execution-order thread is
1837 created.
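
As a rough illustration of the idea, here is constant folding on a toy
expression tree in plain C. Everything in it ("demo_node", "demo_fold",
the "DEMO_*" kinds) is hypothetical. One deliberate simplification: the
text says a new constant node is substituted and the subtree deleted,
whereas this sketch rewrites the node in place.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { DEMO_CONST, DEMO_VAR, DEMO_ADD, DEMO_MUL } demo_kind;

typedef struct demo_node {
    demo_kind         kind;
    long              value;        /* meaningful for DEMO_CONST only */
    struct demo_node *left, *right; /* children of DEMO_ADD / DEMO_MUL */
} demo_node;

/* Called on a freshly built node whose children have already been
 * through this step (bottom-up, like pass 1). If the node is a binary
 * operator with two constant children, it is executed now and turned
 * into a constant; otherwise it is returned unchanged. */
demo_node *demo_fold(demo_node *n)
{
    assert(n != NULL);
    if ((n->kind == DEMO_ADD || n->kind == DEMO_MUL)
        && n->left->kind == DEMO_CONST
        && n->right->kind == DEMO_CONST) {
        n->value = (n->kind == DEMO_ADD)
                 ? n->left->value + n->right->value
                 : n->left->value * n->right->value;
        n->kind  = DEMO_CONST;
        n->left  = n->right = NULL;   /* the subtree is dropped */
    }
    return n;
}
```

For a tree representing "2 * 3 + $x", folding the multiplication
yields the constant 6, while the addition stays an operator because one
of its operands is not constant.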
1838
1839 Compile pass 2: context propagation
1840 When a context for a part of the compile tree is known, it is propagated
1841 down through the tree. At this time the context can have 5 values
1842 (instead of 2 for runtime context): void, boolean, scalar, list, and
1843 lvalue. In contrast with pass 1, this pass is processed from top to
1844 bottom: a node's context determines the context for its children.
1845
1846 Additional context-dependent optimizations are performed at this time.
1847 Since at this moment the compile tree contains back-references (via
1848 "thread" pointers), nodes cannot be free()d now. To allow optimized-
1849 away nodes at this stage, such nodes are null()ified instead of
1850 free()ing (i.e. their type is changed to OP_NULL).
1851
1852 Compile pass 3: peephole optimization
1853 After the compile tree for a subroutine (or for an "eval" or a file) is
1854 created, an additional pass over the code is performed. This pass is
1855 neither top-down nor bottom-up, but in the execution order (with
1856 additional complications for conditionals). Optimizations performed at
1857 this stage are subject to the same restrictions as in pass 2.
1858
1859 Peephole optimizations are done by calling the function pointed to by
1860 the global variable "PL_peepp". By default, "PL_peepp" just calls the
1861 function pointed to by the global variable "PL_rpeepp". By default,
1862 that performs some basic op fixups and optimisations along the
1863 execution-order op chain, and recursively calls "PL_rpeepp" for each
1864 side chain of ops (resulting from conditionals). Extensions may
1865 provide additional optimisations or fixups, hooking into either the
1866 per-subroutine or recursive stage, like this:
1867
1868 static peep_t prev_peepp;
1869 static void my_peep(pTHX_ OP *o)
1870 {
1871 /* custom per-subroutine optimisation goes here */
1872 prev_peepp(aTHX_ o);
1873 /* custom per-subroutine optimisation may also go here */
1874 }
1875 BOOT:
1876 prev_peepp = PL_peepp;
1877 PL_peepp = my_peep;
1878
1879 static peep_t prev_rpeepp;
1880 static void my_rpeep(pTHX_ OP *o)
1881 {
1882 OP *orig_o = o;
1883 for(; o; o = o->op_next) {
1884 /* custom per-op optimisation goes here */
1885 }
1886 prev_rpeepp(aTHX_ orig_o);
1887 }
1888 BOOT:
1889 prev_rpeepp = PL_rpeepp;
1890 PL_rpeepp = my_rpeep;
1891
1892 Pluggable runops
1893 The compile tree is executed in a runops function. There are two
1894 runops functions, in run.c and in dump.c. "Perl_runops_debug" is used
1895 with DEBUGGING and "Perl_runops_standard" is used otherwise. For fine
1896 control over the execution of the compile tree it is possible to
1897 provide your own runops function.
1898
1899 It's probably best to copy one of the existing runops functions and
1900 change it to suit your needs. Then, in the BOOT section of your XS
1901 file, add the line:
1902
1903 PL_runops = my_runops;
1904
1905 This function should be as efficient as possible to keep your programs
1906 running as fast as possible.
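
The shape of such a loop, and of a replacement for it, can be shown
with a toy model in plain C. This is not perl's run.c: the "demo_*"
names are hypothetical, and the toy passes the current op as an
explicit argument where the real loop keeps it in interpreter state.

```c
#include <assert.h>
#include <stddef.h>

typedef struct demo_op demo_op;
struct demo_op {
    demo_op *(*op_ppaddr)(demo_op *self); /* runs the op, returns next */
    demo_op  *op_next;
    long      hits;                       /* times this op has run */
};

/* A trivial op body: record the visit and continue with op_next. */
demo_op *demo_pp_nop(demo_op *self)
{
    self->hits++;
    return self->op_next;
}

/* Standard loop: keep running the current op until one returns NULL. */
int demo_runops_standard(demo_op *op)
{
    assert(op != NULL);
    while ((op = op->op_ppaddr(op)) != NULL)
        ;
    return 0;
}

/* Custom loop: same traversal, but counts how many ops were executed. */
long demo_ops_executed = 0;

int demo_runops_counting(demo_op *op)
{
    assert(op != NULL);
    do {
        demo_ops_executed++;
    } while ((op = op->op_ppaddr(op)) != NULL);
    return 0;
}

/* The pluggable hook, analogous in spirit to assigning PL_runops. */
int (*demo_runops)(demo_op *) = demo_runops_standard;
```

Assigning "demo_runops_counting" to "demo_runops" changes how every
later run is driven without touching the ops themselves. Because the
loop body runs once per op, any extra work placed there is paid on every
op executed.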
1907
1908 Compile-time scope hooks
1909 As of perl 5.14 it is possible to hook into the compile-time lexical
1910 scope mechanism using "Perl_blockhook_register". This is used like
1911 this:
1912
1913 STATIC void my_start_hook(pTHX_ int full);
1914 STATIC BHK my_hooks;
1915
1916 BOOT:
1917 BhkENTRY_set(&my_hooks, bhk_start, my_start_hook);
1918 Perl_blockhook_register(aTHX_ &my_hooks);
1919
1920 This will arrange to have "my_start_hook" called at the start of
1921 compiling every lexical scope. The available hooks are:
1922
1923 "void bhk_start(pTHX_ int full)"
1924 This is called just after starting a new lexical scope. Note that
1925 Perl code like
1926
1927 if ($x) { ... }
1928
1929 creates two scopes: the first starts at the "(" and has "full ==
1930 1", the second starts at the "{" and has "full == 0". Both end at
1931 the "}", so calls to "start" and "pre/post_end" will match.
1932 Anything pushed onto the save stack by this hook will be popped
1933 just before the scope ends (between the "pre_" and "post_end"
1934 hooks, in fact).
1935
1936 "void bhk_pre_end(pTHX_ OP **o)"
1937 This is called at the end of a lexical scope, just before unwinding
1938 the stack. o is the root of the optree representing the scope; it
1939 is a double pointer so you can replace the OP if you need to.
1940
1941 "void bhk_post_end(pTHX_ OP **o)"
1942 This is called at the end of a lexical scope, just after unwinding
1943 the stack. o is as above. Note that it is possible for calls to
1944 "pre_" and "post_end" to nest, if there is something on the save
1945 stack that calls string eval.
1946
1947 "void bhk_eval(pTHX_ OP *const o)"
1948 This is called just before starting to compile an "eval STRING",
1949 "do FILE", "require" or "use", after the eval has been set up. o is
1950 the OP that requested the eval, and will normally be an
1951 "OP_ENTEREVAL", "OP_DOFILE" or "OP_REQUIRE".
1952
1953 Once you have your hook functions, you need a "BHK" structure to put
1954 them in. It's best to allocate it statically, since there is no way to
1955 free it once it's registered. The function pointers should be inserted
1956 into this structure using the "BhkENTRY_set" macro, which will also set
1957 flags indicating which entries are valid. If you do need to allocate
1958 your "BHK" dynamically for some reason, be sure to zero it before you
1959 start.
1960
1961 Once registered, there is no mechanism to switch these hooks off, so if
1962 that is necessary you will need to do this yourself. An entry in "%^H"
1963 is probably the best way, so the effect is lexically scoped; however it
1964 is also possible to use the "BhkDISABLE" and "BhkENABLE" macros to
1965 temporarily switch entries on and off. You should also be aware that
1966 generally speaking at least one scope will have opened before your
1967 extension is loaded, so you will see some "pre/post_end" pairs that
1968 didn't have a matching "start".
1969
1971 To aid debugging, the source file dump.c contains a number of functions
1972 which produce formatted output of internal data structures.
1973
1974 The most commonly used of these functions is "Perl_sv_dump"; it's used
1975 for dumping SVs, AVs, HVs, and CVs. The "Devel::Peek" module calls
1976 "sv_dump" to produce debugging output from Perl-space, so users of that
1977 module should already be familiar with its format.
1978
1979 "Perl_op_dump" can be used to dump an "OP" structure or any of its
1980 derivatives, and produces output similar to "perl -Dx"; in fact,
1981 "Perl_dump_eval" will dump the main root of the code being evaluated,
1982 exactly like "-Dx".
1983
1984 Other useful functions are "Perl_dump_sub", which turns a "GV" into an
1985 op tree, "Perl_dump_packsubs" which calls "Perl_dump_sub" on all the
1986 subroutines in a package like so: (Thankfully, these are all xsubs, so
1987 there is no op tree)
1988
1989 (gdb) print Perl_dump_packsubs(PL_defstash)
1990
1991 SUB attributes::bootstrap = (xsub 0x811fedc 0)
1992
1993 SUB UNIVERSAL::can = (xsub 0x811f50c 0)
1994
1995 SUB UNIVERSAL::isa = (xsub 0x811f304 0)
1996
1997 SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0)
1998
1999 SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0)
2000
2001 and "Perl_dump_all", which dumps all the subroutines in the stash and
2002 the op tree of the main root.
2003
2005 Background and PERL_IMPLICIT_CONTEXT
2006 The Perl interpreter can be regarded as a closed box: it has an API for
2007 feeding it code or otherwise making it do things, but it also has
2008 functions for its own use. This smells a lot like an object, and there
2009 are ways for you to build Perl so that you can have multiple
2010 interpreters, with one interpreter represented either as a C structure,
2011 or inside a thread-specific structure. These structures contain all
2012 the context, the state of that interpreter.
2013
2014 One macro controls the major Perl build flavor: MULTIPLICITY. The
2015 MULTIPLICITY build has a C structure that packages all the interpreter
2016 state. With multiplicity-enabled perls, PERL_IMPLICIT_CONTEXT is also
2017 normally defined, and enables the support for passing in a "hidden"
2018 first argument that represents all three data structures. MULTIPLICITY
2019 makes multi-threaded perls possible (with the ithreads threading model,
2020 related to the macro USE_ITHREADS.)
2021
2022 Two other "encapsulation" macros are the PERL_GLOBAL_STRUCT and
2023 PERL_GLOBAL_STRUCT_PRIVATE (the latter turns on the former, and the
2024 former turns on MULTIPLICITY.) The PERL_GLOBAL_STRUCT causes all the
2025 internal variables of Perl to be wrapped inside a single global struct,
2026 struct perl_vars, accessible as (globals) &PL_Vars or PL_VarsPtr or the
2027 function Perl_GetVars(). The PERL_GLOBAL_STRUCT_PRIVATE goes one step
2028 further: there is still a single struct (allocated in main() either
2029 from heap or from stack) but there are no global data symbols pointing
2030 to it. In either case the global struct should be initialised as the
2031 very first thing in main() using Perl_init_global_struct() and
2032 correspondingly torn down after perl_free() using
2033 Perl_free_global_struct(); please see miniperlmain.c for usage details.
2034 You may also need to use "dVAR" in your coding to "declare the global
2035 variables" when you are using them. dTHX does this for you
2036 automatically.
2037
2038 To see whether you have non-const data you can use a BSD-compatible
2039 "nm":
2040
2041 nm libperl.a | grep -v ' [TURtr] '
2042
2043 If this displays any "D" or "d" symbols, you have non-const data.
2044
2045 For backward compatibility reasons defining just PERL_GLOBAL_STRUCT
2046 doesn't actually hide all symbols inside a big global struct: some
2047 PerlIO_xxx vtables are left visible. The PERL_GLOBAL_STRUCT_PRIVATE
2048 then hides everything (see how the PERLIO_FUNCS_DECL is used).
2049
2050 All this obviously requires a way for the Perl internal functions to be
2051 either subroutines taking some kind of structure as the first argument,
2052 or subroutines taking nothing as the first argument. To enable these
2053 two very different ways of building the interpreter, the Perl source
2054 (as it does in so many other situations) makes heavy use of macros and
2055 subroutine naming conventions.
2056
2057 First problem: deciding which functions will be public API functions
2058 and which will be private. All functions whose names begin "S_" are
2059 private (think "S" for "secret" or "static"). All other functions
2060 begin with "Perl_", but just because a function begins with "Perl_"
2061 does not mean it is part of the API. (See "Internal Functions".) The
2062 easiest way to be sure a function is part of the API is to find its
2063 entry in perlapi. If it exists in perlapi, it's part of the API. If
2064 it doesn't, and you think it should be (i.e., you need it for your
2065 extension), send mail via perlbug explaining why you think it should
2066 be.
2067
2068 Second problem: there must be a syntax so that the same subroutine
2069 declarations and calls can pass a structure as their first argument, or
2070 pass nothing. To solve this, the subroutines are named and declared in
2071 a particular way. Here's a typical start of a static function used
2072 within the Perl guts:
2073
2074 STATIC void
2075 S_incline(pTHX_ char *s)
2076
2077 STATIC becomes "static" in C, and may be #define'd to nothing in some
2078 configurations in the future.
2079
2080 A public function (i.e. part of the internal API, but not necessarily
2081 sanctioned for use in extensions) begins like this:
2082
2083 void
2084 Perl_sv_setiv(pTHX_ SV* dsv, IV num)
2085
2086 "pTHX_" is one of a number of macros (in perl.h) that hide the details
2087 of the interpreter's context. THX stands for "thread", "this", or
2088 "thingy", as the case may be. (And no, George Lucas is not involved.
2089 :-) The first character could be 'p' for a prototype, 'a' for argument,
2090 or 'd' for declaration, so we have "pTHX", "aTHX" and "dTHX", and their
2091 variants.
2092
2093 When Perl is built without options that set PERL_IMPLICIT_CONTEXT,
2094 there is no first argument containing the interpreter's context. The
2095 trailing underscore in the pTHX_ macro indicates that the macro
2096 expansion needs a comma after the context argument because other
2097 arguments follow it. If PERL_IMPLICIT_CONTEXT is not defined, pTHX_
2098 will be ignored, and the subroutine is not prototyped to take the extra
2099 argument. The form of the macro without the trailing underscore is
2100 used when there are no additional explicit arguments.
2101
2102 When a core function calls another, it must pass the context. This is
2103 normally hidden via macros. Consider "sv_setiv". It expands into
2104 something like this:
2105
2106 #ifdef PERL_IMPLICIT_CONTEXT
2107 #define sv_setiv(a,b) Perl_sv_setiv(aTHX_ a, b)
2108 /* can't do this for vararg functions, see below */
2109 #else
2110 #define sv_setiv Perl_sv_setiv
2111 #endif
2112
2113 This works well, and means that XS authors can gleefully write:
2114
2115 sv_setiv(foo, bar);
2116
2117 and still have it work under all the modes Perl could have been
2118 compiled with.
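
The macro trick can be tried out in isolation. The sketch below is a
self-contained toy, not an excerpt from perl.h: all "demo_"/"Demo_"
names and the "DEMO_IMPLICIT_CONTEXT" switch are hypothetical stand-ins
for the real macros, and "demo_interp" is a one-field stand-in for the
interpreter structure.

```c
#include <assert.h>

/* Hypothetical switch standing in for PERL_IMPLICIT_CONTEXT; set it
 * to 0 to model a build without the hidden context argument. */
#define DEMO_IMPLICIT_CONTEXT 1

typedef struct { long last_iv; } demo_interp;

#if DEMO_IMPLICIT_CONTEXT
#  define demo_pTHX   demo_interp *my_perl
#  define demo_pTHX_  demo_pTHX,
#  define demo_aTHX   my_perl
#  define demo_aTHX_  demo_aTHX,
#  define demo_CTX    (*my_perl)
#else
static demo_interp demo_global_interp;
#  define demo_pTHX   void
#  define demo_pTHX_
#  define demo_aTHX
#  define demo_aTHX_
#  define demo_CTX    demo_global_interp
#endif

/* "Public" function: one definition serves both builds. */
void Demo_set_iv(demo_pTHX_ long *dst, long num)
{
    assert(dst != 0);
    *dst = num;
    demo_CTX.last_iv = num;     /* touch the interpreter state */
}

/* Short name that hides the context argument from callers. */
#define demo_set_iv(a, b) Demo_set_iv(demo_aTHX_ a, b)

/* A caller that has the context in scope under the expected name. */
long demo_caller(demo_pTHX)
{
    long x = 0;

    demo_set_iv(&x, 42);        /* same source line in either build */
    return x + demo_CTX.last_iv;
}
```

With the switch set to 1, "demo_caller" takes a "demo_interp *" and
the short-name call expands to "Demo_set_iv(my_perl, &x, 42)"; with it
set to 0, both functions lose the extra parameter and the call site is
unchanged.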
2119
2120 This doesn't work so cleanly for varargs functions, though, as macros
2121 imply that the number of arguments is known in advance. Instead we
2122 either need to spell them out fully, passing "aTHX_" as the first
2123 argument (the Perl core tends to do this with functions like
2124 Perl_warner), or use a context-free version.
2125
2126 The context-free version of Perl_warner is called
2127 Perl_warner_nocontext, and does not take the extra argument. Instead
2128 it does dTHX; to get the context from thread-local storage. We
2129 "#define warner Perl_warner_nocontext" so that extensions get source
2130 compatibility at the expense of performance. (Passing an arg is
2131 cheaper than grabbing it from thread-local storage.)
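
The trade-off between the two forms can be sketched in plain C. The
names below ("Demo_warner", "Demo_warner_nocontext",
"demo_get_context", "demo_interp") are hypothetical, and a plain global
pointer stands in for the thread-local slot to keep the example short.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

typedef struct {
    char last_warning[128];
    int  fetches;             /* how often the context was looked up */
} demo_interp;

/* Stand-in for the thread-local slot (a global, for brevity). */
demo_interp *demo_current_interp = 0;

demo_interp *demo_get_context(void)
{
    assert(demo_current_interp != 0);
    demo_current_interp->fetches++;
    return demo_current_interp;
}

/* Shared worker taking a va_list. */
static void demo_vwarner(demo_interp *my_perl, const char *fmt,
                         va_list ap)
{
    vsnprintf(my_perl->last_warning, sizeof my_perl->last_warning,
              fmt, ap);
}

/* Explicit-context form: the caller spells out the context argument. */
void Demo_warner(demo_interp *my_perl, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    demo_vwarner(my_perl, fmt, ap);
    va_end(ap);
}

/* Context-free form: looks the context up itself on every call. */
void Demo_warner_nocontext(const char *fmt, ...)
{
    demo_interp *my_perl = demo_get_context();
    va_list ap;

    va_start(ap, fmt);
    demo_vwarner(my_perl, fmt, ap);
    va_end(ap);
}

/* The short name is a plain alias, so no argument list is needed. */
#define demo_warner Demo_warner_nocontext
```

Callers of "Demo_warner" never touch the lookup function, while each
call through the "demo_warner" alias performs one lookup; that lookup
is the cost the text refers to.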
2132
2133 You can ignore [pad]THXx when browsing the Perl headers/sources. Those
2134 are strictly for use within the core. Extensions and embedders need
2135 only be aware of [pad]THX.
2136
2137 So what happened to dTHR?
2138 "dTHR" was introduced in perl 5.005 to support the older thread model.
2139 The older thread model now uses the "THX" mechanism to pass context
2140 pointers around, so "dTHR" is not useful any more. Perl 5.6.0 and
2141 later still have it for backward source compatibility, but it is
2142 defined to be a no-op.
2143
2144 How do I use all this in extensions?
2145 When Perl is built with PERL_IMPLICIT_CONTEXT, extensions that call any
2146 functions in the Perl API will need to pass the initial context
2147 argument somehow. The kicker is that you will need to write it in such
2148 a way that the extension still compiles when Perl hasn't been built
2149 with PERL_IMPLICIT_CONTEXT enabled.
2150
2151 There are three ways to do this. First, the easy but inefficient way,
2152 which is also the default, in order to maintain source compatibility
2153 with extensions: whenever XSUB.h is #included, it redefines the aTHX
2154 and aTHX_ macros to call a function that will return the context.
2155 Thus, something like:
2156
2157 sv_setiv(sv, num);
2158
2159 in your extension will translate to this when PERL_IMPLICIT_CONTEXT is
2160 in effect:
2161
2162 Perl_sv_setiv(Perl_get_context(), sv, num);
2163
2164 or to this otherwise:
2165
2166 Perl_sv_setiv(sv, num);
2167
2168 You don't have to do anything new in your extension to get this; since
2169 the Perl library provides Perl_get_context(), it will all just work.
2170
2171 The second, more efficient way is to use the following template for
2172 your Foo.xs:
2173
2174 #define PERL_NO_GET_CONTEXT /* we want efficiency */
2175 #include "EXTERN.h"
2176 #include "perl.h"
2177 #include "XSUB.h"
2178
2179 STATIC void my_private_function(int arg1, int arg2);
2180
2181 STATIC void
2182 my_private_function(int arg1, int arg2)
2183 {
2184 dTHX; /* fetch context */
2185 ... call many Perl API functions ...
2186 }
2187
2188 [... etc ...]
2189
2190 MODULE = Foo PACKAGE = Foo
2191
2192 /* typical XSUB */
2193
2194 void
2195 my_xsub(arg)
2196 int arg
2197 CODE:
2198 my_private_function(arg, 10);
2199
2200 Note that the only two changes from the normal way of writing an
2201 extension are the addition of a "#define PERL_NO_GET_CONTEXT" before
2202 including the Perl headers, followed by a "dTHX;" declaration at the
2203 start of every function that will call the Perl API. (You'll know
2204 which functions need this, because the C compiler will complain that
2205 there's an undeclared identifier in those functions.) No changes are
2206 needed for the XSUBs themselves, because the XS() macro is correctly
2207 defined to pass in the implicit context if needed.
2208
2209 The third, even more efficient way is to ape how it is done within the
2210 Perl guts:
2211
2212 #define PERL_NO_GET_CONTEXT /* we want efficiency */
2213 #include "EXTERN.h"
2214 #include "perl.h"
2215 #include "XSUB.h"
2216
2217 /* pTHX_ only needed for functions that call Perl API */
2218 STATIC void my_private_function(pTHX_ int arg1, int arg2);
2219
2220 STATIC void
2221 my_private_function(pTHX_ int arg1, int arg2)
2222 {
2223 /* dTHX; not needed here, because THX is an argument */
2224 ... call Perl API functions ...
2225 }
2226
2227 [... etc ...]
2228
2229 MODULE = Foo PACKAGE = Foo
2230
2231 /* typical XSUB */
2232
2233 void
2234 my_xsub(arg)
2235 int arg
2236 CODE:
2237 my_private_function(aTHX_ arg, 10);
2238
2239 This implementation never has to fetch the context using a function
2240 call, since it is always passed as an extra argument. Depending on
2241 your needs for simplicity or efficiency, you may mix the previous two
2242 approaches freely.
2243
2244 Never add a comma after "pTHX" yourself--always use the form of the
2245 macro with the underscore for functions that take explicit arguments,
2246 or the form without the underscore for functions with no explicit
2247 arguments.
2248
2249 If Perl is compiled with "-DPERL_GLOBAL_STRUCT", the "dVAR"
2250 definition is needed if the Perl global variables (see perlvars.h or
2251 globvar.sym) are accessed in the function and "dTHX" is not used (the
2252 "dTHX" includes the "dVAR" if necessary). One notices the need for
2253 "dVAR" only with the said compile-time define, because otherwise the
2254 Perl global variables are visible as-is.
2255
2256 Should I do anything special if I call perl from multiple threads?
2257 If you create interpreters in one thread and then proceed to call them
2258 in another, you need to make sure perl's own Thread Local Storage (TLS)
2259 slot is initialized correctly in each of those threads.
2260
2261 The "perl_alloc" and "perl_clone" API functions will automatically set
2262 the TLS slot to the interpreter they created, so that there is no need
2263 to do anything special if the interpreter is always accessed in the
2264 same thread that created it, and that thread did not create or call any
2265 other interpreters afterwards. If that is not the case, you have to
2266 set the TLS slot of the thread before calling any functions in the Perl
2267 API on that particular interpreter. This is done by calling the
2268 "PERL_SET_CONTEXT" macro in that thread as the first thing you do:
2269
2270 /* do this before doing anything else with some_perl */
2271 PERL_SET_CONTEXT(some_perl);
2272
2273 ... other Perl API calls on some_perl go here ...
2274
2275 Future Plans and PERL_IMPLICIT_SYS
2276 Just as PERL_IMPLICIT_CONTEXT provides a way to bundle up everything
2277 that the interpreter knows about itself and pass it around, so too are
2278 there plans to allow the interpreter to bundle up everything it knows
2279 about the environment it's running on. This is enabled with the
2280 PERL_IMPLICIT_SYS macro. Currently it only works with USE_ITHREADS on
2281 Windows.
2282
2283 This makes it possible to provide an extra pointer (called the "host"
2284 environment) for all the system calls, so that the system-level code
2285 can maintain its own state, broken down into seven C
2286 structures. These are thin wrappers around the usual system calls (see
2287 win32/perllib.c) for the default perl executable, but for a more
2288 ambitious host (like the one that would do fork() emulation) all the
2289 extra work needed to pretend that different interpreters are actually
2290 different "processes" would be done here.
2291
2292 The Perl engine/interpreter and the host are orthogonal entities.
2293 There could be one or more interpreters in a process, and one or more
2294 "hosts", with free association between them.
2295
2297 All of Perl's internal functions which will be exposed to the outside
2298 world are prefixed by "Perl_" so that they will not conflict with XS
2299 functions or functions used in a program in which Perl is embedded.
2300 Similarly, all global variables begin with "PL_". (By convention,
2301 static functions start with "S_".)
2302
2303 Inside the Perl core ("PERL_CORE" defined), you can get at the
2304 functions either with or without the "Perl_" prefix, thanks to a bunch
2305 of defines that live in embed.h. Note that extension code should not
2306 set "PERL_CORE"; this exposes the full perl internals, and is likely to
2307 cause breakage of the XS in each new perl release.
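
As a rough standalone illustration of the naming scheme (not the contents of embed.h), the sketch below gives a function a prefixed real name and a short-name macro that supplies the context. The "My_" prefix, "my_interp" and "table_fetch" are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the interpreter structure. */
typedef struct my_interp {
    int fetches;
} my_interp;

/* The real function carries a prefix so that it cannot clash with
 * names in an embedding program. "My_" stands in for "Perl_". */
static int My_table_fetch(my_interp *my_ctx, const int *table, int key)
{
    assert(my_ctx != NULL && table != NULL);
    my_ctx->fetches++;
    return table[key];
}

/* The short name is a macro that adds the prefix and passes the
 * context variable that is in scope at the call site. */
#define table_fetch(table, key) My_table_fetch(my_ctx, (table), (key))

/* Code that has my_ctx in scope can use the short name. */
static int sum_first_two(my_interp *my_ctx, const int *table)
{
    return table_fetch(table, 0) + table_fetch(table, 1);
}
```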
2308
2309 The file embed.h is generated automatically from embed.pl and
2310 embed.fnc. embed.pl also creates the prototyping header files for the
2311 internal functions, generates the documentation and a lot of other bits
2312 and pieces. It's important that when you add a new function to the core
2313 or change an existing one, you change the data in the table in
2314 embed.fnc as well. Here's a sample entry from that table:
2315
2316 Apd |SV** |av_fetch |AV* ar|I32 key|I32 lval
2317
2318 The second column is the return type, the third column the name.
2319 Columns after that are the arguments. The first column is a set of
2320 flags:
2321
2322 A This function is a part of the public API. All such functions should
2323 also have 'd'; very few do not.
2324
2325 p This function has a "Perl_" prefix; i.e. it is defined as
2326 "Perl_av_fetch".
2327
2328 d This function has documentation using the "apidoc" feature which
2329 we'll look at in a second. Some functions have 'd' but not 'A';
2330 docs are good.
2331
2332 Other available flags are:
2333
2334 s This is a static function and is defined as "STATIC S_whatever", and
2335 usually called within the sources as "whatever(...)".
2336
2337 n This does not need an interpreter context, so the definition has no
2338 "pTHX", and it follows that callers don't use "aTHX". (See
2339 "Background and PERL_IMPLICIT_CONTEXT".)
2340
2341 r This function never returns; "croak", "exit" and friends.
2342
2343 f This function takes a variable number of arguments, "printf" style.
2344 The argument list should end with "...", like this:
2345
2346 Afprd |void |croak |const char* pat|...
2347
2348 M This function is part of the experimental development API, and may
2349 change or disappear without notice.
2350
2351 o This function should not have a compatibility macro to define, say,
2352 "Perl_parse" to "parse". It must be called as "Perl_parse".
2353
2354 x This function isn't exported out of the Perl core.
2355
2356 m This is implemented as a macro.
2357
2358 X This function is explicitly exported.
2359
2360 E This function is visible to extensions included in the Perl core.
2361
2362 b Binary backward compatibility; this function is a macro but also has
2363 a "Perl_" implementation (which is exported).
2364
2365 others
2366 See the comments at the top of "embed.fnc" for others.
2367
2368 If you edit embed.pl or embed.fnc, you will need to run "make
2369 regen_headers" to force a rebuild of embed.h and other auto-generated
2370 files.
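
To make the column layout concrete, here is a small standalone sketch that splits one table line on "|" and inspects the flags column. It is not how embed.pl works (that is a Perl script); "fnc_entry", "parse_fnc_line" and "has_flag" are hypothetical names, and the fixed field limits are assumptions made to keep the example short.

```c
#include <assert.h>
#include <string.h>

#define MAX_FIELDS    16
#define MAX_FIELD_LEN 64

/* Hypothetical holder for one parsed table line. */
typedef struct fnc_entry {
    char fields[MAX_FIELDS][MAX_FIELD_LEN];
    int  nfields;       /* flags, return type, name, then arguments */
} fnc_entry;

/* Copy [start, end) into dst with surrounding blanks removed. */
static void copy_trimmed(char *dst, size_t dstlen,
                         const char *start, const char *end)
{
    size_t n;
    while (start < end && (*start == ' ' || *start == '\t')) start++;
    while (end > start && (end[-1] == ' ' || end[-1] == '\t')) end--;
    n = (size_t)(end - start);
    if (n >= dstlen) n = dstlen - 1;    /* truncate overlong fields */
    memcpy(dst, start, n);
    dst[n] = '\0';
}

/* Split one line on '|'. Returns the number of fields stored. */
static int parse_fnc_line(const char *line, fnc_entry *out)
{
    const char *p = line;
    assert(line != NULL && out != NULL);
    out->nfields = 0;
    while (out->nfields < MAX_FIELDS) {
        const char *bar = strchr(p, '|');
        const char *end = bar ? bar : p + strlen(p);
        copy_trimmed(out->fields[out->nfields++], MAX_FIELD_LEN, p, end);
        if (bar == NULL) break;
        p = bar + 1;
    }
    return out->nfields;
}

/* True if the flags column contains the given flag letter. */
static int has_flag(const fnc_entry *e, char flag)
{
    assert(e != NULL && flag != '\0');
    return e->nfields > 0 && strchr(e->fields[0], flag) != NULL;
}
```

For the sample entry, fields[0] is the flag set "Apd", fields[1] the return type, fields[2] the name, and fields[3] onwards the arguments.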
2371
2372 Formatted Printing of IVs, UVs, and NVs
2373 If you are printing IVs, UVs, or NVs, then instead of the stdio(3) style
2374 formatting codes like %d, %ld, %f, you should use the following macros
2375 for portability:
2376
2377 IVdf IV in decimal
2378 UVuf UV in decimal
2379 UVof UV in octal
2380 UVxf UV in hexadecimal
2381 NVef NV %e-like
2382 NVff NV %f-like
2383 NVgf NV %g-like
2384
2385 These will take care of 64-bit integers and long doubles. For example:
2386
2387 printf("IV is %" IVdf "\n", iv);
2388
2389 The IVdf will expand to whatever is the correct format for the IVs.
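
The mechanism is ordinary string-literal concatenation, the same one that <inttypes.h> uses. The standalone sketch below assumes, purely for illustration, that the wide integer type is 64 bits; "my_iv", "my_uv", "MY_IVdf" and "MY_UVxf" are hypothetical stand-ins, not the definitions Perl's Configure produces.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins: assume the wide integer types are 64-bit. */
typedef int64_t  my_iv;
typedef uint64_t my_uv;
#define MY_IVdf PRId64          /* conversion for my_iv in decimal */
#define MY_UVxf PRIx64          /* conversion for my_uv in hexadecimal */

/* The macro is a string literal, so it is pasted between the pieces
 * of the format string at compile time. */
static int format_iv(char *buf, size_t buflen, my_iv iv)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen, "IV is %" MY_IVdf "\n", iv);
}

static int format_uv_hex(char *buf, size_t buflen, my_uv uv)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen, "UV is 0x%" MY_UVxf "\n", uv);
}
```

Leaving a space on each side of the macro keeps the source valid for C++11 compilers as well, which would otherwise read the macro name as a literal suffix.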
2390
2391 If you are printing addresses of pointers, use UVxf combined with
2392 PTR2UV(); do not use %lx or %p.
2393
2394 Pointer-To-Integer and Integer-To-Pointer
2395 Because pointer size does not necessarily equal integer size, use the
2396 following macros to do it right.
2397
2398 PTR2UV(pointer)
2399 PTR2IV(pointer)
2400 PTR2NV(pointer)
2401 INT2PTR(pointertotype, integer)
2402
2403 For example:
2404
2405 IV iv = ...;
2406 SV *sv = INT2PTR(SV*, iv);
2407
2408 and
2409
2410 AV *av = ...;
2411 UV uv = PTR2UV(av);
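
A standalone sketch of the same round trip, assuming the platform provides uintptr_t. "MY_PTR2UV" and "MY_INT2PTR" are hypothetical analogues written for this example and are not the definitions from perl.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: an integer type wide enough for a pointer,
 * and conversion macros that go through uintptr_t. */
typedef uintptr_t my_uv;
#define MY_PTR2UV(p)        ((my_uv)(uintptr_t)(p))
#define MY_INT2PTR(type, i) ((type)(uintptr_t)(i))

typedef struct node {
    int value;
} node;

/* Store a pointer in an integer slot ... */
static my_uv stash_pointer(node *n)
{
    assert(n != NULL);
    return MY_PTR2UV(n);
}

/* ... and get the same pointer back, naming the target type. */
static node *recover_pointer(my_uv slot)
{
    return MY_INT2PTR(node *, slot);
}
```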
2412
2413 Exception Handling
2414 There are a couple of macros to do very basic exception handling in XS
2415 modules. You have to define "NO_XSLOCKS" before including XSUB.h to be
2416 able to use these macros:
2417
2418 #define NO_XSLOCKS
2419 #include "XSUB.h"
2420
2421 You can use these macros if you call code that may croak, but you need
2422 to do some cleanup before giving control back to Perl. For example:
2423
2424 dXCPT; /* set up necessary variables */
2425
2426 XCPT_TRY_START {
2427 code_that_may_croak();
2428 } XCPT_TRY_END
2429
2430 XCPT_CATCH
2431 {
2432 /* do cleanup here */
2433 XCPT_RETHROW;
2434 }
2435
2436 Note that you always have to rethrow an exception that has been caught.
2437 Using these macros, it is not possible to just catch the exception and
2438 ignore it. If you have to ignore the exception, you have to use one of
2439 the "call_*" functions.
2440
2441 The advantage of using the above macros is that you don't have to set up
2442 an extra function for "call_*", and that using these macros is faster
2443 than using "call_*".
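
The try, clean up, rethrow shape can be sketched in plain C with setjmp/longjmp. This is only an illustration of the control flow and makes no claim about how the XCPT macros are implemented; "my_throw", "guarded_call", "run_guarded" and the global handler pointer are all hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical innermost handler; NULL means none is active. */
static jmp_buf *current_handler = NULL;
static int pending_code = 0;    /* code of the exception in flight */
static int cleanups_run = 0;    /* lets a caller observe the cleanup */

/* Hypothetical stand-in for croak(): jump to the innermost handler. */
static void my_throw(int code)
{
    assert(code != 0);
    if (current_handler == NULL)
        abort();
    pending_code = code;
    longjmp(*current_handler, 1);
}

static void code_that_may_fail(int fail)
{
    if (fail)
        my_throw(42);
}

/* Runs the risky code; on failure cleans up and always rethrows. */
static void guarded_call(int fail)
{
    jmp_buf here;
    jmp_buf *outer = current_handler;

    current_handler = &here;
    if (setjmp(here) == 0) {
        code_that_may_fail(fail);
        current_handler = outer;        /* normal exit */
    } else {
        current_handler = outer;        /* restore before rethrowing */
        cleanups_run++;                 /* do cleanup here */
        my_throw(pending_code);         /* hand it to the outer handler */
    }
}

/* Outermost level: returns the exception code, or 0 if none. */
static int run_guarded(int fail)
{
    jmp_buf top;
    jmp_buf *outer = current_handler;

    current_handler = &top;
    if (setjmp(top) == 0) {
        guarded_call(fail);
        current_handler = outer;
        return 0;
    }
    current_handler = outer;
    return pending_code;
}
```

Only the outermost level swallows the failure; the inner level does its cleanup and passes the failure on, which mirrors the rule that a caught exception must be rethrown.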
2444
2445 Source Documentation
2446 There's an effort going on to document the internal functions and
2447 automatically produce reference manuals from them - perlapi is one such
2448 manual which details all the functions which are available to XS
2449 writers. perlintern is the autogenerated manual for the functions which
2450 are not part of the API and are supposedly for internal use only.
2451
2452 Source documentation is created by putting POD comments into the C
2453 source, like this:
2454
2455 /*
2456 =for apidoc sv_setiv
2457
2458 Copies an integer into the given SV. Does not handle 'set' magic. See
2459 C<sv_setiv_mg>.
2460
2461 =cut
2462 */
2463
2464 Please try and supply some documentation if you add functions to the
2465 Perl core.
2466
2467 Backwards compatibility
2468 The Perl API changes over time. New functions are added or the
2469 interfaces of existing functions are changed. The "Devel::PPPort"
2470 module tries to provide compatibility code for some of these changes,
2471 so XS writers don't have to code it themselves when supporting multiple
2472 versions of Perl.
2473
2474 "Devel::PPPort" generates a C header file ppport.h that can also be run
2475 as a Perl script. To generate ppport.h, run:
2476
2477 perl -MDevel::PPPort -eDevel::PPPort::WriteFile
2478
2479 Besides checking existing XS code, the script can also be used to
2480 retrieve compatibility information for various API calls using the
2481 "--api-info" command line switch. For example:
2482
2483 % perl ppport.h --api-info=sv_magicext
2484
2485 For details, see "perldoc ppport.h".
2486
2488 Perl 5.6.0 introduced Unicode support. It's important for porters and
2489 XS writers to understand this support and make sure that the code they
2490 write does not corrupt Unicode data.
2491
2492 What is Unicode, anyway?
2493 In the olden, less enlightened times, we all used to use ASCII. Most of
2494 us did, anyway. The big problem with ASCII is that it's American. Well,
2495 no, that's not actually the problem; the problem is that it's not
2496 particularly useful for people who don't use the Roman alphabet. What
2497 used to happen was that particular languages would stick their own
2498 alphabet in the upper range of the sequence, between 128 and 255. Of
2499 course, we then ended up with plenty of variants that weren't quite
2500 ASCII, and the whole point of it being a standard was lost.
2501
2502 Worse still, if you've got a language like Chinese or Japanese that has
2503 hundreds or thousands of characters, then you really can't fit them
2504 into a mere 256, so they had to forget about ASCII altogether, and
2505 build their own systems using pairs of numbers to refer to one
2506 character.
2507
2508 To fix this, some people formed Unicode, Inc. and produced a new
2509 character set containing all the characters you can possibly think of
2510 and more. There are several ways of representing these characters, and
2511 the one Perl uses is called UTF-8. UTF-8 uses a variable number of
2512 bytes to represent a character. You can learn more about Unicode and
2513 Perl's Unicode model in perlunicode.
2514
2515 How can I recognise a UTF-8 string?
2516 You can't. This is because UTF-8 data is stored in bytes just like
2517 non-UTF-8 data. The Unicode character 200 (0xC8 for you hex types),
2518 capital E with a grave accent, is represented by the two bytes
2519 "v195.136". Unfortunately, the non-Unicode string "chr(195).chr(136)"
2520 has that byte sequence as well. So you can't tell just by looking -
2521 this is what makes Unicode input an interesting problem.
2522
2523 In general, you either have to know what you're dealing with, or you
2524 have to guess. The API function "is_utf8_string" can help; it'll tell
2525 you if a string contains only valid UTF-8 characters. However, it can't
2526 do the work for you. On a character-by-character basis, "is_utf8_char"
2527 will tell you whether the current character in a string is valid UTF-8.
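
To show what such a validity check can and cannot tell you, here is a standalone well-formedness test. It is a sketch that follows the standard Unicode rules (no overlong forms, no surrogates, nothing above 0x10FFFF) and is not the behaviour of "is_utf8_string", whose acceptance rules may differ. "utf8_seq_len" and "looks_like_utf8" are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Length (1-4) of the well-formed UTF-8 sequence at s, or 0 if the
 * bytes at s do not form one. avail is the number of bytes left. */
static size_t utf8_seq_len(const unsigned char *s, size_t avail)
{
    size_t len = 0, i;
    unsigned long cp = 0;

    if (avail == 0) return 0;
    if (s[0] < 0x80) return 1;
    if ((s[0] & 0xE0) == 0xC0)      { len = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; }
    else return 0;      /* stray continuation or invalid lead byte */

    if (avail < len) return 0;
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates and out-of-range values. */
    if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
        (len == 4 && cp < 0x10000))
        return 0;
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    if (cp > 0x10FFFF) return 0;
    return len;
}

/* 1 if the whole buffer is well-formed UTF-8, else 0. */
static int looks_like_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;
    assert(s != NULL || n == 0);
    while (i < n) {
        size_t len = utf8_seq_len(s + i, n - i);
        if (len == 0) return 0;
        i += len;
    }
    return 1;
}
```

A result of 1 only says the bytes could be UTF-8. A two-byte buffer holding 195 and 136 passes, yet it could equally be two single-byte characters, so the check cannot replace knowing where the data came from.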
2528
2529 How does UTF-8 represent Unicode characters?
2530 As mentioned above, UTF-8 uses a variable number of bytes to store a
2531 character. Characters with values 0...127 are stored in one byte, just
2532 like good ol' ASCII. Character 128 is stored as "v194.128"; this
2533 continues up to character 191, which is "v194.191". Now we've run out
2534 of bits (191 is binary 10111111) so we move on; 192 is "v195.128". And
2535 so it goes on, moving to three bytes at character 2048.
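
The byte values quoted above can be reproduced with a few shifts and masks. The encoder below is a standalone sketch, not "uvchr_to_utf8"; "encode_utf8" is a hypothetical name, and for brevity it does not reject surrogate code points.

```c
#include <assert.h>
#include <stddef.h>

/* Writes cp as UTF-8 into out (room for 4 bytes is assumed) and
 * returns the number of bytes written, or 0 if cp is above 0x10FFFF. */
static size_t encode_utf8(unsigned long cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {                    /* 0...127: one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 128...2047: two bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                 /* 2048...65535: three bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {               /* the rest: four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Each continuation byte carries six payload bits, which is why the second byte runs from 128 to 191 before the first byte has to increase.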
2536
2537 Assuming you know you're dealing with a UTF-8 string, you can find out
2538 how long the first character in it is with the "UTF8SKIP" macro:
2539
2540 char *utf = "\305\233\340\240\201";
2541 I32 len;
2542
2543 len = UTF8SKIP(utf); /* len is 2 here */
2544 utf += len;
2545 len = UTF8SKIP(utf); /* len is 3 here */
2546
2547 Another way to skip over characters in a UTF-8 string is to use
2548 "utf8_hop", which takes a string and a number of characters to skip
2549 over. You're on your own about bounds checking, though, so don't use it
2550 lightly.
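
Since the bounds checking is left to the caller, it may help to see what a checked hop looks like. The sketch below is standalone and does not use "UTF8SKIP" or "utf8_hop"; "skip_len" and "hop_forward" are hypothetical analogues, and the end pointer is an extra parameter introduced for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical analogue of UTF8SKIP: the character length implied
 * by the lead byte. Assumes s points at the start of a character. */
static size_t skip_len(const unsigned char *s)
{
    if (*s < 0x80) return 1;
    if ((*s & 0xE0) == 0xC0) return 2;
    if ((*s & 0xF0) == 0xE0) return 3;
    if ((*s & 0xF8) == 0xF0) return 4;
    return 1;           /* malformed lead byte: step over one byte */
}

/* Advance n characters, but never past end (one beyond the buffer). */
static const unsigned char *hop_forward(const unsigned char *s,
                                        const unsigned char *end,
                                        size_t n)
{
    assert(s != NULL && end != NULL && s <= end);
    while (n-- > 0 && s < end) {
        size_t len = skip_len(s);
        if ((size_t)(end - s) < len)
            return end;         /* truncated final character */
        s += len;
    }
    return s;
}
```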
2551
2552 All bytes in a multi-byte UTF-8 character will have the high bit set,
2553 so you can test if you need to do something special with this character
2554 like this (UTF8_IS_INVARIANT() is a macro that tests whether the
2555 byte can be encoded as a single byte even in UTF-8):
2556
2557 U8 *utf;
2558 U8 *utf_end; /* 1 beyond buffer pointed to by utf */
2559 UV uv; /* Note: a UV, not a U8, not a char */
2560 STRLEN len; /* length of character in bytes */
2561
2562 if (!UTF8_IS_INVARIANT(*utf))
2563 /* Must treat this as UTF-8 */
2564 uv = utf8_to_uvchr_buf(utf, utf_end, &len);
2565 else
2566 /* OK to treat this character as a byte */
2567 uv = *utf;
2568
2569 You can also see in that example that we use "utf8_to_uvchr_buf" to get
2570 the value of the character; the inverse function "uvchr_to_utf8" is
2571 available for putting a UV into UTF-8:
2572
2573 if (!UTF8_IS_INVARIANT(uv))
2574 /* Must treat this as UTF8 */
2575 utf8 = uvchr_to_utf8(utf8, uv);
2576 else
2577 /* OK to treat this character as a byte */
2578 *utf8++ = uv;
2579
2580 You must convert characters to UVs using the above functions if you're
2581 ever in a situation where you have to match UTF-8 and non-UTF-8
2582 characters. You may not skip over UTF-8 characters in this case. If you
2583 do this, you'll lose the ability to match hi-bit non-UTF-8 characters;
2584 for instance, if your UTF-8 string contains "v195.136", and you skip
2585 that character, you can never match a "chr(200)" in a non-UTF-8 string.
2586 So don't do that!
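
A standalone sketch of matching by character value rather than by byte: one string holds one byte per character, the other is UTF-8, and the comparison decodes the UTF-8 side first. "decode_utf8" and "same_characters" are hypothetical helpers, and the UTF-8 input is assumed to be well formed.

```c
#include <assert.h>
#include <stddef.h>

/* Decode the character at s (assumed well-formed UTF-8) and store
 * its length in bytes in *len. */
static unsigned long decode_utf8(const unsigned char *s, size_t *len)
{
    if (s[0] < 0x80) {
        *len = 1;
        return s[0];
    }
    if ((s[0] & 0xE0) == 0xC0) {
        *len = 2;
        return ((unsigned long)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
    }
    if ((s[0] & 0xF0) == 0xE0) {
        *len = 3;
        return ((unsigned long)(s[0] & 0x0F) << 12)
             | ((unsigned long)(s[1] & 0x3F) << 6)
             | (s[2] & 0x3F);
    }
    *len = 4;
    return ((unsigned long)(s[0] & 0x07) << 18)
         | ((unsigned long)(s[1] & 0x3F) << 12)
         | ((unsigned long)(s[2] & 0x3F) << 6)
         | (s[3] & 0x3F);
}

/* 1 if the byte string (one byte per character) and the UTF-8
 * string contain the same sequence of characters, else 0. */
static int same_characters(const unsigned char *bytes, size_t nbytes,
                           const unsigned char *utf8, size_t nutf8)
{
    size_t i = 0, j = 0;
    assert(bytes != NULL && utf8 != NULL);
    while (i < nbytes && j < nutf8) {
        size_t len;
        unsigned long cp = decode_utf8(utf8 + j, &len);
        if (cp != bytes[i])
            return 0;
        i++;
        j += len;
    }
    return i == nbytes && j == nutf8;
}
```

Comparing the raw bytes would report a mismatch at the first high-bit character even though both strings spell the same text.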
2587
2588 How does Perl store UTF-8 strings?
2589 Currently, Perl deals with Unicode strings and non-Unicode strings
2590 slightly differently. A flag in the SV, "SVf_UTF8", indicates that the
2591 string is internally encoded as UTF-8. Without it, the byte value is
2592 the codepoint number and vice versa (in other words, the string is
2593 encoded as iso-8859-1, but "use feature 'unicode_strings'" is needed to
2594 get iso-8859-1 semantics). You can check and manipulate this flag with
2595 the following macros:
2596
2597 SvUTF8(sv)
2598 SvUTF8_on(sv)
2599 SvUTF8_off(sv)
2600
2601 This flag has an important effect on Perl's treatment of the string: if
2602 Unicode data is not properly distinguished, regular expressions,
2603 "length", "substr" and other string handling operations will have
2604 undesirable results.
2605
2606 The problem comes when you have, for instance, a string that isn't
2607 flagged as UTF-8, and contains a byte sequence that could be UTF-8 -
2608 especially when combining non-UTF-8 and UTF-8 strings.
2609
2610 Never forget that the "SVf_UTF8" flag is separate from the PV value; you
2611 need to be sure you don't accidentally knock it off while you're
2612 manipulating SVs. More specifically, you cannot expect to do this:
2613
2614 SV *sv;
2615 SV *nsv;
2616 STRLEN len;
2617 char *p;
2618
2619 p = SvPV(sv, len);
2620 frobnicate(p);
2621 nsv = newSVpvn(p, len);
2622
2623 The "char*" string does not tell you the whole story, and you can't
2624 copy or reconstruct an SV just by copying the string value. Check if
2625 the old SV has the UTF8 flag set, and act accordingly:
2626
2627 p = SvPV(sv, len);
2628 frobnicate(p);
2629 nsv = newSVpvn(p, len);
2630 if (SvUTF8(sv))
2631 SvUTF8_on(nsv);
2632
2633 In fact, your "frobnicate" function should be made aware of whether or
2634 not it's dealing with UTF-8 data, so that it can handle the string
2635 appropriately.
2636
2637 Since just passing an SV to an XS function and copying the data of the
2638 SV is not enough to copy the UTF8 flag, passing just a "char *" to an
2639 XS function is even less adequate.
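
The effect of losing the flag can be reproduced without Perl at all. In the standalone sketch below, "my_str" is a hypothetical miniature of a string scalar (bytes plus an encoding flag); it is not the layout of an SV, and the helper names are invented for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of a string scalar. */
typedef struct my_str {
    char  *pv;          /* the bytes */
    size_t len;         /* length in bytes */
    int    is_utf8;     /* how the bytes are to be interpreted */
} my_str;

/* Build a string from bytes alone; the flag starts out off. */
static my_str *my_str_new(const char *p, size_t len)
{
    my_str *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->pv = malloc(len + 1);
    if (s->pv == NULL) { free(s); return NULL; }
    memcpy(s->pv, p, len);
    s->pv[len] = '\0';
    s->len = len;
    s->is_utf8 = 0;
    return s;
}

static void my_str_free(my_str *s)
{
    if (s != NULL) { free(s->pv); free(s); }
}

/* A faithful copy carries the flag along with the bytes. */
static my_str *my_str_copy(const my_str *src)
{
    my_str *dst;
    assert(src != NULL);
    dst = my_str_new(src->pv, src->len);
    if (dst != NULL) dst->is_utf8 = src->is_utf8;
    return dst;
}

/* The length in characters depends on the flag. */
static size_t my_str_chars(const my_str *s)
{
    size_t i, n = 0;
    assert(s != NULL);
    if (!s->is_utf8) return s->len;
    for (i = 0; i < s->len; i++)
        if (((unsigned char)s->pv[i] & 0xC0) != 0x80)
            n++;        /* count every byte that starts a character */
    return n;
}
```

Rebuilding a string from the bytes alone yields a different character count than the flag-preserving copy, which is the same kind of surprise "length" or "substr" would produce.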
2640
2641 How do I convert a string to UTF-8?
2642 If you're mixing UTF-8 and non-UTF-8 strings, it is necessary to
2643 upgrade one of the strings to UTF-8. If you've got an SV, the easiest
2644 way to do this is:
2645
2646 sv_utf8_upgrade(sv);
2647
2648 However, you must not do this, for example:
2649
2650 if (!SvUTF8(left))
2651 sv_utf8_upgrade(left);
2652
2653 If you do this in a binary operator, you will actually change one of
2654 the strings that came into the operator, and, while it shouldn't be
2655 noticeable by the end user, it can cause problems in deficient code.
2656
2657 Instead, "bytes_to_utf8" will give you a UTF-8-encoded copy of its
2658 string argument. This is useful for having the data available for
2659 comparisons and so on, without harming the original SV. There's also
2660 "utf8_to_bytes" to go the other way, but naturally, this will fail if
2661 the string contains any characters above 255 that can't be represented
2662 in a single byte.
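
The copy-instead-of-modify idea, and the reason the reverse direction can fail, are shown in the standalone sketch below. These are not the Perl functions: "latin1_to_utf8_copy" and "utf8_to_latin1_copy" are hypothetical helpers that both return freshly allocated buffers, which the caller must free.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a new NUL-terminated UTF-8 copy of a one-byte-per-character
 * string; the source is left untouched. NULL on allocation failure. */
static unsigned char *latin1_to_utf8_copy(const unsigned char *s,
                                          size_t len, size_t *outlen)
{
    unsigned char *out, *d;
    size_t i;
    assert(s != NULL && outlen != NULL);
    out = malloc(2 * len + 1);          /* worst case: every byte doubles */
    if (out == NULL) return NULL;
    d = out;
    for (i = 0; i < len; i++) {
        if (s[i] < 0x80) {
            *d++ = s[i];
        } else {
            *d++ = (unsigned char)(0xC0 | (s[i] >> 6));
            *d++ = (unsigned char)(0x80 | (s[i] & 0x3F));
        }
    }
    *d = '\0';
    *outlen = (size_t)(d - out);
    return out;
}

/* The other way round. Returns NULL if the input is malformed or
 * contains a character above 255, which has no one-byte form. */
static unsigned char *utf8_to_latin1_copy(const unsigned char *s,
                                          size_t len, size_t *outlen)
{
    unsigned char *out, *d;
    size_t i = 0;
    assert(s != NULL && outlen != NULL);
    out = malloc(len + 1);
    if (out == NULL) return NULL;
    d = out;
    while (i < len) {
        if (s[i] < 0x80) {
            *d++ = s[i++];
        } else if ((s[i] == 0xC2 || s[i] == 0xC3) && i + 1 < len
                   && (s[i + 1] & 0xC0) == 0x80) {
            /* lead bytes 0xC2 and 0xC3 cover characters 128...255 */
            *d++ = (unsigned char)(((s[i] & 0x03) << 6) | (s[i + 1] & 0x3F));
            i += 2;
        } else {
            free(out);
            return NULL;
        }
    }
    *d = '\0';
    *outlen = (size_t)(d - out);
    return out;
}
```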
2663
2664 Is there anything else I need to know?
2665 Not really. Just remember these things:
2666
2667 · There's no way to tell if a string is UTF-8 or not. You can tell if
2668 an SV is UTF-8 by looking at its "SvUTF8" flag. Don't forget to set
2669 the flag if something should be UTF-8. Treat the flag as part of the
2670 PV, even though it's not - if you pass on the PV to somewhere, pass
2671 on the flag too.
2672
2673 · If a string is UTF-8, always use "utf8_to_uvchr_buf" to get at the
2674 value, unless "UTF8_IS_INVARIANT(*s)" in which case you can use *s.
2675
2676 · When writing a character "uv" to a UTF-8 string, always use
2677 "uvchr_to_utf8", unless "UTF8_IS_INVARIANT(uv)" in which case you
2678 can use "*s = uv".
2679
2680 · Mixing UTF-8 and non-UTF-8 strings is tricky. Use "bytes_to_utf8" to
2681 get a new string which is UTF-8 encoded, and then combine them.
2682
2684 Custom operator support is a new experimental feature that allows you
2685 to define your own ops. This is primarily to allow the building of
2686 interpreters for other languages in the Perl core, but it also allows
2687 optimizations through the creation of "macro-ops" (ops which perform
2688 the functions of multiple ops which are usually executed together, such
2689 as "gvsv, gvsv, add".)
2690
2691 This feature is implemented as a new op type, "OP_CUSTOM". The Perl
2692 core does not "know" anything special about this op type, and so it
2693 will not be involved in any optimizations. This also means that you can
2694 define your custom ops to be any op structure - unary, binary, list and
2695 so on - you like.
2696
2697 It's important to know what custom operators won't do for you. They
2698 won't let you add new syntax to Perl, directly. They won't even let you
2699 add new keywords, directly. In fact, they won't change the way Perl
2700 compiles a program at all. You have to do those changes yourself, after
2701 Perl has compiled the program. You do this either by manipulating the
2702 op tree using a "CHECK" block and the "B::Generate" module, or by
2703 adding a custom peephole optimizer with the "optimize" module.
2704
2705 When you do this, you replace ordinary Perl ops with custom ops by
2706 creating ops with the type "OP_CUSTOM" and the "op_ppaddr" of your own PP
2707 function. This should be defined in XS code, and should look like the
2708 PP ops in "pp_*.c". You are responsible for ensuring that your op takes
2709 the appropriate number of values from the stack, and you are
2710 responsible for adding stack marks if necessary.
2711
2712 You should also "register" your op with the Perl interpreter so that it
2713 can produce sensible error and warning messages. Since it is possible
2714 to have multiple custom ops within the one "logical" op type
2715 "OP_CUSTOM", Perl uses the value of "o->op_ppaddr" to determine which
2716 custom op it is dealing with. You should create an "XOP" structure for
2717 each ppaddr you use, set the properties of the custom op with
2718 "XopENTRY_set", and register the structure against the ppaddr using
2719 "Perl_custom_op_register". A trivial example might look like:
2720
2721 static XOP my_xop;
2722 static OP *my_pp(pTHX);
2723
2724 BOOT:
2725 XopENTRY_set(&my_xop, xop_name, "myxop");
2726 XopENTRY_set(&my_xop, xop_desc, "Useless custom op");
2727 Perl_custom_op_register(aTHX_ my_pp, &my_xop);
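
The idea of identifying an op by the address of its function can be sketched without the Perl headers. The table below is a hypothetical miniature and not how the interpreter stores its registrations; "pp_func", "my_xop", "custom_op_register" and "custom_op_name" are invented for this example, and the fixed table size is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*pp_func)(int);    /* hypothetical "PP function" type */

/* Hypothetical descriptor, loosely modelled on the fields above. */
typedef struct my_xop {
    const char *name;
    const char *desc;
} my_xop;

#define MAX_CUSTOM_OPS 8
static struct {
    pp_func       ppaddr;
    const my_xop *xop;
} registry[MAX_CUSTOM_OPS];
static size_t registry_used = 0;

/* Register a descriptor against a function address. Returns 0 on
 * success, -1 if the table is full or the address is already known. */
static int custom_op_register(pp_func ppaddr, const my_xop *xop)
{
    size_t i;
    assert(ppaddr != NULL && xop != NULL);
    for (i = 0; i < registry_used; i++)
        if (registry[i].ppaddr == ppaddr)
            return -1;
    if (registry_used == MAX_CUSTOM_OPS)
        return -1;
    registry[registry_used].ppaddr = ppaddr;
    registry[registry_used].xop = xop;
    registry_used++;
    return 0;
}

/* The name to use in diagnostics, looked up by function address. */
static const char *custom_op_name(pp_func ppaddr)
{
    size_t i;
    for (i = 0; i < registry_used; i++)
        if (registry[i].ppaddr == ppaddr)
            return registry[i].xop->name;
    return "unknown custom operator";
}

static int pp_double(int x) { return 2 * x; }
static int pp_negate(int x) { return -x; }     /* never registered */
static const my_xop double_xop = { "double", "Doubles its operand" };
```

An unregistered function still runs, but diagnostics can only describe it generically, which is the practical reason for registering every ppaddr you use.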
2728
2729 The available fields in the structure are:
2730
2731 xop_name
2732 A short name for your op. This will be included in some error
2733 messages, and will also be returned as "$op->name" by the B module,
2734 so it will appear in the output of modules like B::Concise.
2735
2736 xop_desc
2737 A short description of the function of the op.
2738
2739 xop_class
2740 Which of the various *OP structures this op uses. This should be
2741 one of the "OA_*" constants from op.h, namely
2742
2743 OA_BASEOP
2744 OA_UNOP
2745 OA_BINOP
2746 OA_LOGOP
2747 OA_LISTOP
2748 OA_PMOP
2749 OA_SVOP
2750 OA_PADOP
2751 OA_PVOP_OR_SVOP
2752 This should be interpreted as "PVOP" only. The "_OR_SVOP" is
2753 because the only core "PVOP", "OP_TRANS", can sometimes be a
2754 "SVOP" instead.
2755
2756 OA_LOOP
2757 OA_COP
2758
2759 The other "OA_*" constants should not be used.
2760
2761 xop_peep
2762 This member is of type "Perl_cpeep_t", which expands to "void
2763 (*Perl_cpeep_t)(pTHX_ OP *o, OP *oldop)". If it is set, this
2764 function will be called from "Perl_rpeep" when ops of this type are
2765 encountered by the peephole optimizer. o is the OP that needs
2766 optimizing; oldop is the previous OP optimized, whose "op_next"
2767 points to o.
2768
2769 "B::Generate" directly supports the creation of custom ops by name.
2770
2772 Until May 1997, this document was maintained by Jeff Okamoto
2773 <okamoto@corp.hp.com>. It is now maintained as part of Perl itself by
2774 the Perl 5 Porters <perl5-porters@perl.org>.
2775
2776 With lots of help and suggestions from Dean Roehrich, Malcolm Beattie,
2777 Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil
2778 Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer,
2779 Stephen McCamant, and Gurusamy Sarathy.
2780
2782 perlapi, perlintern, perlxs, perlembed
2783
2784
2785
2786perl v5.16.3 2013-03-04 PERLGUTS(1)