PERLGUTS(1)          Perl Programmers Reference Guide          PERLGUTS(1)

perlguts - Introduction to the Perl API
7
9 This document attempts to describe how to use the Perl API, as well as
10 to provide some info on the basic workings of the Perl core. It is far
11 from complete and probably contains many errors. Please refer any ques‐
12 tions or comments to the author below.
13
15 Datatypes
16
17 Perl has three typedefs that handle Perl's three main data types:
18
19 SV Scalar Value
20 AV Array Value
21 HV Hash Value
22
23 Each typedef has specific routines that manipulate the various data
24 types.
25
26 What is an "IV"?
27
28 Perl uses a special typedef IV which is a simple signed integer type
29 that is guaranteed to be large enough to hold a pointer (as well as an
30 integer). Additionally, there is the UV, which is simply an unsigned
31 IV.
32
Perl also uses two special typedefs, I32 and I16, which will always be
at least 32 bits and 16 bits long, respectively. (Again, there are U32
and U16, as well.) They will usually be exactly 32 and 16 bits long,
but on Crays they will both be 64 bits.
37
38 Working with SVs
39
40 An SV can be created and loaded with one command. There are five types
41 of values that can be loaded: an integer value (IV), an unsigned inte‐
42 ger value (UV), a double (NV), a string (PV), and another scalar (SV).
43
44 The seven routines are:
45
46 SV* newSViv(IV);
47 SV* newSVuv(UV);
48 SV* newSVnv(double);
49 SV* newSVpv(const char*, STRLEN);
50 SV* newSVpvn(const char*, STRLEN);
51 SV* newSVpvf(const char*, ...);
52 SV* newSVsv(SV*);
53
54 "STRLEN" is an integer type (Size_t, usually defined as size_t in con‐
55 fig.h) guaranteed to be large enough to represent the size of any
56 string that perl can handle.
57
In the unlikely case of an SV requiring more complex initialisation,
you can create an empty SV with newSV(len). If "len" is 0, an empty SV
of type NULL is returned; otherwise an SV of type PV is returned with
len + 1 (for the NUL) bytes of storage allocated, accessible via
SvPVX. In both cases the SV has value undef.
63
64 SV *sv = newSV(0); /* no storage allocated */
65 SV *sv = newSV(10); /* 10 (+1) bytes of uninitialised storage allocated */
66
67 To change the value of an already-existing SV, there are eight rou‐
68 tines:
69
    void  sv_setiv(SV*, IV);
    void  sv_setuv(SV*, UV);
    void  sv_setnv(SV*, double);
    void  sv_setpv(SV*, const char*);
    void  sv_setpvn(SV*, const char*, STRLEN);
    void  sv_setpvf(SV*, const char*, ...);
    void  sv_vsetpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
    void  sv_setsv(SV*, SV*);
78
79 Notice that you can choose to specify the length of the string to be
80 assigned by using "sv_setpvn", "newSVpvn", or "newSVpv", or you may
81 allow Perl to calculate the length by using "sv_setpv" or by specifying
82 0 as the second argument to "newSVpv". Be warned, though, that Perl
83 will determine the string's length by using "strlen", which depends on
84 the string terminating with a NUL character.
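
The "strlen" caveat can be seen without any Perl headers. The
following is a minimal sketch in plain C, not Perl's implementation:
"effective_length" and "payload" are hypothetical names that only
mimic the "pass 0 and let the callee measure" convention described
above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: falls back to strlen() when len is 0, the way
 * the text describes for a zero length argument. */
static size_t effective_length(const char *s, size_t len)
{
    return len ? len : strlen(s);
}

/* Five bytes of payload with a NUL in the middle, plus a final NUL. */
static const char payload[] = { 'a', 'b', '\0', 'c', 'd', '\0' };
```

With an explicit length of 5 all five payload bytes are kept; with a
length of 0 the measurement stops at the embedded NUL and only the
first two bytes are seen.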
85
86 The arguments of "sv_setpvf" are processed like "sprintf", and the for‐
87 matted output becomes the value.
88
89 "sv_vsetpvfn" is an analogue of "vsprintf", but it allows you to spec‐
90 ify either a pointer to a variable argument list or the address and
91 length of an array of SVs. The last argument points to a boolean; on
92 return, if that boolean is true, then locale-specific information has
93 been used to format the string, and the string's contents are therefore
94 untrustworthy (see perlsec). This pointer may be NULL if that informa‐
95 tion is not important. Note that this function requires you to specify
96 the length of the format.
97
98 The "sv_set*()" functions are not generic enough to operate on values
99 that have "magic". See "Magic Virtual Tables" later in this document.
100
101 All SVs that contain strings should be terminated with a NUL character.
102 If it is not NUL-terminated there is a risk of core dumps and corrup‐
103 tions from code which passes the string to C functions or system calls
104 which expect a NUL-terminated string. Perl's own functions typically
105 add a trailing NUL for this reason. Nevertheless, you should be very
106 careful when you pass a string stored in an SV to a C function or sys‐
107 tem call.
108
109 To access the actual value that an SV points to, you can use the
110 macros:
111
112 SvIV(SV*)
113 SvUV(SV*)
114 SvNV(SV*)
115 SvPV(SV*, STRLEN len)
116 SvPV_nolen(SV*)
117
118 which will automatically coerce the actual scalar type into an IV, UV,
119 double, or string.
120
121 In the "SvPV" macro, the length of the string returned is placed into
122 the variable "len" (this is a macro, so you do not use &len). If you
123 do not care what the length of the data is, use the "SvPV_nolen" macro.
124 Historically the "SvPV" macro with the global variable "PL_na" has been
125 used in this case. But that can be quite inefficient because "PL_na"
126 must be accessed in thread-local storage in threaded Perl. In any
127 case, remember that Perl allows arbitrary strings of data that may both
128 contain NULs and might not be terminated by a NUL.
129
130 Also remember that C doesn't allow you to safely say "foo(SvPV(s, len),
131 len);". It might work with your compiler, but it won't work for every‐
132 one. Break this sort of statement up into separate assignments:
133
134 SV *s;
135 STRLEN len;
136 char * ptr;
137 ptr = SvPV(s, len);
138 foo(ptr, len);
139
140 If you want to know if the scalar value is TRUE, you can use:
141
142 SvTRUE(SV*)
143
144 Although Perl will automatically grow strings for you, if you need to
145 force Perl to allocate more memory for your SV, you can use the macro
146
147 SvGROW(SV*, STRLEN newlen)
148
which will determine if more memory needs to be allocated. If so, it
will call the function "sv_grow". Note that "SvGROW" can only
increase, not decrease, the allocated memory of an SV and that it does
not automatically add a byte for the trailing NUL (perl's own string
functions typically do "SvGROW(sv, len + 1)").
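
The grow-only behaviour and the "len + 1" habit can be modelled in a
few lines of plain C. This is only a sketch under stated assumptions:
"ToyStr", "toy_grow" and "toy_cat" are hypothetical stand-ins, and the
struct is not Perl's real SV layout.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a string SV; not Perl's real layout. */
typedef struct {
    char  *buf;  /* allocated storage */
    size_t cur;  /* bytes in use, excluding the trailing NUL */
    size_t len;  /* bytes allocated */
} ToyStr;

/* Grow-only: a request smaller than the current allocation is a no-op.
 * Returns 0 on success, -1 on allocation failure. */
static int toy_grow(ToyStr *s, size_t newlen)
{
    if (newlen <= s->len)
        return 0;
    char *p = realloc(s->buf, newlen);
    if (!p)
        return -1;
    s->buf = p;
    s->len = newlen;
    return 0;
}

/* Append n bytes, asking for one extra byte for the trailing NUL,
 * because toy_grow (like the macro in the text) does not add it. */
static int toy_cat(ToyStr *s, const char *src, size_t n)
{
    if (toy_grow(s, s->cur + n + 1) != 0)
        return -1;
    memcpy(s->buf + s->cur, src, n);
    s->cur += n;
    s->buf[s->cur] = '\0';
    return 0;
}
```

Starting from an all-zero "ToyStr", appending three bytes leaves four
bytes allocated, and a later request for two bytes changes nothing.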
154
155 If you have an SV and want to know what kind of data Perl thinks is
156 stored in it, you can use the following macros to check the type of SV
157 you have.
158
159 SvIOK(SV*)
160 SvNOK(SV*)
161 SvPOK(SV*)
162
163 You can get and set the current length of the string stored in an SV
164 with the following macros:
165
166 SvCUR(SV*)
167 SvCUR_set(SV*, I32 val)
168
169 You can also get a pointer to the end of the string stored in the SV
170 with the macro:
171
172 SvEND(SV*)
173
174 But note that these last three macros are valid only if "SvPOK()" is
175 true.
176
If you want to append something to the end of the string stored in an
"SV*", you can use the following functions:

    void  sv_catpv(SV*, const char*);
    void  sv_catpvn(SV*, const char*, STRLEN);
    void  sv_catpvf(SV*, const char*, ...);
    void  sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
    void  sv_catsv(SV*, SV*);
185
186 The first function calculates the length of the string to be appended
187 by using "strlen". In the second, you specify the length of the string
188 yourself. The third function processes its arguments like "sprintf"
189 and appends the formatted output. The fourth function works like
190 "vsprintf". You can specify the address and length of an array of SVs
191 instead of the va_list argument. The fifth function extends the string
192 stored in the first SV with the string stored in the second SV. It
193 also forces the second SV to be interpreted as a string.
194
195 The "sv_cat*()" functions are not generic enough to operate on values
196 that have "magic". See "Magic Virtual Tables" later in this document.
197
198 If you know the name of a scalar variable, you can get a pointer to its
199 SV by using the following:
200
201 SV* get_sv("package::varname", FALSE);
202
203 This returns NULL if the variable does not exist.
204
205 If you want to know if this variable (or any other SV) is actually
206 "defined", you can call:
207
208 SvOK(SV*)
209
210 The scalar "undef" value is stored in an SV instance called
211 "PL_sv_undef".
212
213 Its address can be used whenever an "SV*" is needed. Make sure that you
214 don't try to compare a random sv with &PL_sv_undef. For example when
215 interfacing Perl code, it'll work correctly for:
216
217 foo(undef);
218
219 But won't work when called as:
220
221 $x = undef;
222 foo($x);
223
So, to repeat: always use SvOK() to check whether an sv is defined.
225
226 Also you have to be careful when using &PL_sv_undef as a value in AVs
227 or HVs (see "AVs, HVs and undefined values").
228
229 There are also the two values "PL_sv_yes" and "PL_sv_no", which contain
230 boolean TRUE and FALSE values, respectively. Like "PL_sv_undef", their
231 addresses can be used whenever an "SV*" is needed.
232
233 Do not be fooled into thinking that "(SV *) 0" is the same as
234 &PL_sv_undef. Take this code:
235
236 SV* sv = (SV*) 0;
237 if (I-am-to-return-a-real-value) {
238 sv = sv_2mortal(newSViv(42));
239 }
240 sv_setsv(ST(0), sv);
241
242 This code tries to return a new SV (which contains the value 42) if it
243 should return a real value, or undef otherwise. Instead it has
244 returned a NULL pointer which, somewhere down the line, will cause a
245 segmentation violation, bus error, or just weird results. Change the
246 zero to &PL_sv_undef in the first line and all will be well.
247
248 To free an SV that you've created, call "SvREFCNT_dec(SV*)". Normally
249 this call is not necessary (see "Reference Counts and Mortality").
250
251 Offsets
252
253 Perl provides the function "sv_chop" to efficiently remove characters
254 from the beginning of a string; you give it an SV and a pointer to
255 somewhere inside the PV, and it discards everything before the pointer.
256 The efficiency comes by means of a little hack: instead of actually
257 removing the characters, "sv_chop" sets the flag "OOK" (offset OK) to
258 signal to other functions that the offset hack is in effect, and it
259 puts the number of bytes chopped off into the IV field of the SV. It
260 then moves the PV pointer (called "SvPVX") forward that many bytes, and
261 adjusts "SvCUR" and "SvLEN".
262
263 Hence, at this point, the start of the buffer that we allocated lives
264 at "SvPVX(sv) - SvIV(sv)" in memory and the PV pointer is pointing into
265 the middle of this allocated storage.
266
267 This is best demonstrated by example:
268
269 % ./perl -Ilib -MDevel::Peek -le '$a="12345"; $a=~s/.//; Dump($a)'
270 SV = PVIV(0x8128450) at 0x81340f0
271 REFCNT = 1
272 FLAGS = (POK,OOK,pPOK)
273 IV = 1 (OFFSET)
274 PV = 0x8135781 ( "1" . ) "2345"\0
275 CUR = 4
276 LEN = 5
277
278 Here the number of bytes chopped off (1) is put into IV, and
279 "Devel::Peek::Dump" helpfully reminds us that this is an offset. The
280 portion of the string between the "real" and the "fake" beginnings is
281 shown in parentheses, and the values of "SvCUR" and "SvLEN" reflect the
282 fake beginning, not the real one.
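
The bookkeeping behind the dump above can be imitated with a small
plain C model. It is a sketch, not the code in sv.c: "ToyPV",
"toy_init", "toy_chop" and "toy_free" are hypothetical names, and the
"offset" member merely plays the role that the IV slot plays in the
description.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *pv;      /* "fake" start: what callers see */
    size_t cur;     /* visible string length */
    size_t len;     /* visible allocation size */
    size_t offset;  /* bytes chopped off so far */
} ToyPV;

/* Copy text into fresh storage. Returns 0 on success, -1 on failure. */
static int toy_init(ToyPV *s, const char *text)
{
    size_t n = strlen(text);
    s->pv = malloc(n + 1);
    if (!s->pv)
        return -1;
    memcpy(s->pv, text, n + 1);
    s->cur = n;
    s->len = n + 1;
    s->offset = 0;
    return 0;
}

/* Discard everything before ptr, which must point inside the string.
 * No bytes are moved; only the bookkeeping changes. */
static void toy_chop(ToyPV *s, char *ptr)
{
    size_t delta = (size_t)(ptr - s->pv);
    s->pv += delta;
    s->cur -= delta;
    s->len -= delta;
    s->offset += delta;
}

/* The real start of the buffer is only needed when freeing it. */
static void toy_free(ToyPV *s)
{
    free(s->pv - s->offset);
    s->pv = NULL;
    s->cur = s->len = s->offset = 0;
}
```

Chopping one byte off "12345" in this model yields an offset of 1, a
length of 4 and an allocation size of 5, the same figures that
"Devel::Peek::Dump" reports.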
283
284 Something similar to the offset hack is performed on AVs to enable
285 efficient shifting and splicing off the beginning of the array; while
286 "AvARRAY" points to the first element in the array that is visible from
287 Perl, "AvALLOC" points to the real start of the C array. These are usu‐
288 ally the same, but a "shift" operation can be carried out by increasing
289 "AvARRAY" by one and decreasing "AvFILL" and "AvLEN". Again, the loca‐
290 tion of the real start of the C array only comes into play when freeing
291 the array. See "av_shift" in av.c.
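
The same idea for arrays, again as a plain C sketch rather than the
code in av.c. "ToyAV" and "toy_shift" are hypothetical; the elements
are plain ints to keep the example short, and "max" stands in for what
the text calls "AvLEN".

```c
#include <stddef.h>

typedef struct {
    int      *alloc;  /* real start of the C array */
    int      *array;  /* first element visible to the user */
    ptrdiff_t fill;   /* highest visible index, -1 when empty */
    ptrdiff_t max;    /* highest index usable without reallocating */
} ToyAV;

/* Remove and return the first element without moving any data.
 * The caller must make sure the array is not empty. */
static int toy_shift(ToyAV *av)
{
    int first = *av->array;
    av->array++;   /* the visible start moves forward by one */
    av->fill--;
    av->max--;
    return first;  /* av->alloc is untouched; it is needed for freeing */
}
```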
292
293 What's Really Stored in an SV?
294
Recall that the usual method of determining the type of scalar you have
is to use "Sv*OK" macros. Because a scalar can be both a number and a
string, these macros will usually return TRUE, and calling the "Sv*V"
macros will do the appropriate conversion of string to integer/double
or integer/double to string.
300
301 If you really need to know if you have an integer, double, or string
302 pointer in an SV, you can use the following three macros instead:
303
304 SvIOKp(SV*)
305 SvNOKp(SV*)
306 SvPOKp(SV*)
307
308 These will tell you if you truly have an integer, double, or string
309 pointer stored in your SV. The "p" stands for private.
310
There are various ways in which the private and public flags may
differ. For example, a tied SV may have a valid underlying value in the
IV slot (so SvIOKp is true), but the data should be accessed via the
FETCH routine rather than directly, so SvIOK is false. Another is when
numeric conversion has occurred and precision has been lost: only the
private flag is set on 'lossy' values. So when an NV is converted to an
IV with loss, SvIOKp, SvNOKp and SvNOK will be set, while SvIOK won't
be.
318
319 In general, though, it's best to use the "Sv*V" macros.
320
321 Working with AVs
322
323 There are two ways to create and load an AV. The first method creates
324 an empty AV:
325
326 AV* newAV();
327
328 The second method both creates the AV and initially populates it with
329 SVs:
330
331 AV* av_make(I32 num, SV **ptr);
332
333 The second argument points to an array containing "num" "SV*"'s. Once
334 the AV has been created, the SVs can be destroyed, if so desired.
335
336 Once the AV has been created, the following operations are possible on
337 AVs:
338
339 void av_push(AV*, SV*);
340 SV* av_pop(AV*);
341 SV* av_shift(AV*);
342 void av_unshift(AV*, I32 num);
343
344 These should be familiar operations, with the exception of
345 "av_unshift". This routine adds "num" elements at the front of the
346 array with the "undef" value. You must then use "av_store" (described
347 below) to assign values to these new elements.
348
349 Here are some other functions:
350
351 I32 av_len(AV*);
352 SV** av_fetch(AV*, I32 key, I32 lval);
353 SV** av_store(AV*, I32 key, SV* val);
354
The "av_len" function returns the highest index value in the array
(just like $#array in Perl). If the array is empty, -1 is returned.
The "av_fetch" function returns the value at index "key", but if
"lval" is non-zero, then "av_fetch" will store an undef value at that
index. The "av_store" function stores the value "val" at index "key",
and does not increment the reference count of "val". Thus the caller
is responsible for taking care of that, and if "av_store" returns
NULL, the caller will have to decrement the reference count to avoid a
memory leak. Note that "av_fetch" and "av_store" both return "SV**"s,
not "SV*"s, as their return value.
365
366 void av_clear(AV*);
367 void av_undef(AV*);
368 void av_extend(AV*, I32 key);
369
370 The "av_clear" function deletes all the elements in the AV* array, but
371 does not actually delete the array itself. The "av_undef" function
372 will delete all the elements in the array plus the array itself. The
373 "av_extend" function extends the array so that it contains at least
374 "key+1" elements. If "key+1" is less than the currently allocated
375 length of the array, then nothing is done.
376
377 If you know the name of an array variable, you can get a pointer to its
378 AV by using the following:
379
380 AV* get_av("package::varname", FALSE);
381
382 This returns NULL if the variable does not exist.
383
384 See "Understanding the Magic of Tied Hashes and Arrays" for more infor‐
385 mation on how to use the array access functions on tied arrays.
386
387 Working with HVs
388
389 To create an HV, you use the following routine:
390
391 HV* newHV();
392
393 Once the HV has been created, the following operations are possible on
394 HVs:
395
396 SV** hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
397 SV** hv_fetch(HV*, const char* key, U32 klen, I32 lval);
398
399 The "klen" parameter is the length of the key being passed in (Note
400 that you cannot pass 0 in as a value of "klen" to tell Perl to measure
401 the length of the key). The "val" argument contains the SV pointer to
402 the scalar being stored, and "hash" is the precomputed hash value (zero
403 if you want "hv_store" to calculate it for you). The "lval" parameter
404 indicates whether this fetch is actually a part of a store operation,
405 in which case a new undefined value will be added to the HV with the
406 supplied key and "hv_fetch" will return as if the value had already
407 existed.
408
409 Remember that "hv_store" and "hv_fetch" return "SV**"'s and not just
410 "SV*". To access the scalar value, you must first dereference the
411 return value. However, you should check to make sure that the return
412 value is not NULL before dereferencing it.
413
These two functions check whether a hash table entry exists and delete
it, respectively.
415
416 bool hv_exists(HV*, const char* key, U32 klen);
417 SV* hv_delete(HV*, const char* key, U32 klen, I32 flags);
418
419 If "flags" does not include the "G_DISCARD" flag then "hv_delete" will
420 create and return a mortal copy of the deleted value.
421
422 And more miscellaneous functions:
423
424 void hv_clear(HV*);
425 void hv_undef(HV*);
426
427 Like their AV counterparts, "hv_clear" deletes all the entries in the
428 hash table but does not actually delete the hash table. The "hv_undef"
429 deletes both the entries and the hash table itself.
430
Perl keeps the actual data in a linked list of structures with a
typedef of HE. These contain the actual key and value pointers (plus
extra administrative overhead). The key is a string pointer; the value
is an "SV*". However, once you have an "HE*", to get the actual key
and value, use the routines specified below.
436
    I32    hv_iterinit(HV*);
            /* Prepares starting point to traverse hash table */
    HE*    hv_iternext(HV*);
            /* Get the next entry, and return a pointer to a
               structure that has both the key and value */
    char*  hv_iterkey(HE* entry, I32* retlen);
            /* Get the key from an HE structure and also return
               the length of the key string */
    SV*    hv_iterval(HV*, HE* entry);
            /* Return an SV pointer to the value of the HE
               structure */
    SV*    hv_iternextsv(HV*, char** key, I32* retlen);
            /* This convenience routine combines hv_iternext,
               hv_iterkey, and hv_iterval.  The key and retlen
               arguments are return values for the key and its
               length.  The value is returned as the function's
               SV* result */
453
454 If you know the name of a hash variable, you can get a pointer to its
455 HV by using the following:
456
457 HV* get_hv("package::varname", FALSE);
458
459 This returns NULL if the variable does not exist.
460
461 The hash algorithm is defined in the "PERL_HASH(hash, key, klen)"
462 macro:
463
464 hash = 0;
465 while (klen--)
466 hash = (hash * 33) + *key++;
467 hash = hash + (hash >> 5); /* after 5.6 */
468
469 The last step was added in version 5.6 to improve distribution of lower
470 bits in the resulting hash value.
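
Written out as a standalone C function, the steps quoted above look
roughly like this. It is a sketch under two assumptions that the text
does not state: the hash is a 32-bit unsigned value, and key bytes are
read as unsigned. The name "toy_perl_hash" is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply-by-33 hash with the final mixing step described above. */
static uint32_t toy_perl_hash(const char *key, size_t klen)
{
    const unsigned char *p = (const unsigned char *)key;
    uint32_t hash = 0;

    while (klen--)
        hash = (hash * 33) + *p++;
    hash = hash + (hash >> 5);  /* spreads information into the low bits */
    return hash;
}
```

For the one-byte key "a" (byte value 97) the loop yields 97, and the
last step adds 97 >> 5 = 3, giving 100.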
471
472 See "Understanding the Magic of Tied Hashes and Arrays" for more infor‐
473 mation on how to use the hash access functions on tied hashes.
474
475 Hash API Extensions
476
477 Beginning with version 5.004, the following functions are also sup‐
478 ported:
479
480 HE* hv_fetch_ent (HV* tb, SV* key, I32 lval, U32 hash);
481 HE* hv_store_ent (HV* tb, SV* key, SV* val, U32 hash);
482
483 bool hv_exists_ent (HV* tb, SV* key, U32 hash);
484 SV* hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);
485
486 SV* hv_iterkeysv (HE* entry);
487
488 Note that these functions take "SV*" keys, which simplifies writing of
489 extension code that deals with hash structures. These functions also
490 allow passing of "SV*" keys to "tie" functions without forcing you to
491 stringify the keys (unlike the previous set of functions).
492
493 They also return and accept whole hash entries ("HE*"), making their
494 use more efficient (since the hash number for a particular string
495 doesn't have to be recomputed every time). See perlapi for detailed
496 descriptions.
497
498 The following macros must always be used to access the contents of hash
499 entries. Note that the arguments to these macros must be simple vari‐
500 ables, since they may get evaluated more than once. See perlapi for
501 detailed descriptions of these macros.
502
503 HePV(HE* he, STRLEN len)
504 HeVAL(HE* he)
505 HeHASH(HE* he)
506 HeSVKEY(HE* he)
507 HeSVKEY_force(HE* he)
508 HeSVKEY_set(HE* he, SV* sv)
509
510 These two lower level macros are defined, but must only be used when
511 dealing with keys that are not "SV*"s:
512
513 HeKEY(HE* he)
514 HeKLEN(HE* he)
515
Note that neither "hv_store" nor "hv_store_ent" increments the
reference count of the stored "val", which is the caller's
responsibility. If these functions return a NULL value, the caller will
usually have to decrement the reference count of "val" to avoid a
memory leak.
520
521 AVs, HVs and undefined values
522
523 Sometimes you have to store undefined values in AVs or HVs. Although
524 this may be a rare case, it can be tricky. That's because you're used
525 to using &PL_sv_undef if you need an undefined SV.
526
527 For example, intuition tells you that this XS code:
528
529 AV *av = newAV();
530 av_store( av, 0, &PL_sv_undef );
531
532 is equivalent to this Perl code:
533
534 my @av;
535 $av[0] = undef;
536
537 Unfortunately, this isn't true. AVs use &PL_sv_undef as a marker for
538 indicating that an array element has not yet been initialized. Thus,
539 "exists $av[0]" would be true for the above Perl code, but false for
540 the array generated by the XS code.
541
542 Other problems can occur when storing &PL_sv_undef in HVs:
543
544 hv_store( hv, "key", 3, &PL_sv_undef, 0 );
545
546 This will indeed make the value "undef", but if you try to modify the
547 value of "key", you'll get the following error:
548
549 Modification of non-creatable hash value attempted
550
551 In perl 5.8.0, &PL_sv_undef was also used to mark placeholders in
552 restricted hashes. This caused such hash entries not to appear when
553 iterating over the hash or when checking for the keys with the
554 "hv_exists" function.
555
You can run into similar problems when you store &PL_sv_yes or
&PL_sv_no into AVs or HVs. Trying to modify such elements will give
you the following error:

    Modification of a read-only value attempted

To make a long story short, you can use the special variables
&PL_sv_undef, &PL_sv_yes and &PL_sv_no with AVs and HVs, but you
have to make sure you know what you're doing.
565
566 Generally, if you want to store an undefined value in an AV or HV, you
567 should not use &PL_sv_undef, but rather create a new undefined value
568 using the "newSV" function, for example:
569
570 av_store( av, 42, newSV(0) );
571 hv_store( hv, "foo", 3, newSV(0), 0 );
572
573 References
574
575 References are a special type of scalar that point to other data types
576 (including references).
577
578 To create a reference, use either of the following functions:
579
580 SV* newRV_inc((SV*) thing);
581 SV* newRV_noinc((SV*) thing);
582
583 The "thing" argument can be any of an "SV*", "AV*", or "HV*". The
584 functions are identical except that "newRV_inc" increments the refer‐
585 ence count of the "thing", while "newRV_noinc" does not. For histori‐
586 cal reasons, "newRV" is a synonym for "newRV_inc".
587
588 Once you have a reference, you can use the following macro to derefer‐
589 ence the reference:
590
591 SvRV(SV*)
592
593 then call the appropriate routines, casting the returned "SV*" to
594 either an "AV*" or "HV*", if required.
595
596 To determine if an SV is a reference, you can use the following macro:
597
598 SvROK(SV*)
599
600 To discover what type of value the reference refers to, use the follow‐
601 ing macro and then check the return value.
602
603 SvTYPE(SvRV(SV*))
604
605 The most useful types that will be returned are:
606
    SVt_IV    Scalar
    SVt_NV    Scalar
    SVt_PV    Scalar
    SVt_RV    Scalar
    SVt_PVAV  Array
    SVt_PVHV  Hash
    SVt_PVCV  Code
    SVt_PVGV  Glob (possibly a file handle)
    SVt_PVMG  Blessed or Magical Scalar
616
617 See the sv.h header file for more details.
618
619 Blessed References and Class Objects
620
621 References are also used to support object-oriented programming. In
622 perl's OO lexicon, an object is simply a reference that has been
623 blessed into a package (or class). Once blessed, the programmer may
624 now use the reference to access the various methods in the class.
625
626 A reference can be blessed into a package with the following function:
627
628 SV* sv_bless(SV* sv, HV* stash);
629
630 The "sv" argument must be a reference value. The "stash" argument
631 specifies which class the reference will belong to. See "Stashes and
632 Globs" for information on converting class names into stashes.
633
634 /* Still under construction */
635
636 Upgrades rv to reference if not already one. Creates new SV for rv to
637 point to. If "classname" is non-null, the SV is blessed into the spec‐
638 ified class. SV is returned.
639
640 SV* newSVrv(SV* rv, const char* classname);
641
642 Copies integer, unsigned integer or double into an SV whose reference
643 is "rv". SV is blessed if "classname" is non-null.
644
    SV*  sv_setref_iv(SV* rv, const char* classname, IV iv);
    SV*  sv_setref_uv(SV* rv, const char* classname, UV uv);
    SV*  sv_setref_nv(SV* rv, const char* classname, NV nv);
648
649 Copies the pointer value (the address, not the string!) into an SV
650 whose reference is rv. SV is blessed if "classname" is non-null.
651
    SV*  sv_setref_pv(SV* rv, const char* classname, void* pv);
653
654 Copies string into an SV whose reference is "rv". Set length to 0 to
655 let Perl calculate the string length. SV is blessed if "classname" is
656 non-null.
657
    SV*  sv_setref_pvn(SV* rv, const char* classname, const char* pv, STRLEN length);
659
660 Tests whether the SV is blessed into the specified class. It does not
661 check inheritance relationships.
662
663 int sv_isa(SV* sv, const char* name);
664
665 Tests whether the SV is a reference to a blessed object.
666
667 int sv_isobject(SV* sv);
668
669 Tests whether the SV is derived from the specified class. SV can be
670 either a reference to a blessed object or a string containing a class
671 name. This is the function implementing the "UNIVERSAL::isa" function‐
672 ality.
673
674 bool sv_derived_from(SV* sv, const char* name);
675
676 To check if you've got an object derived from a specific class you have
677 to write:
678
679 if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... }
680
681 Creating New Variables
682
683 To create a new Perl variable with an undef value which can be accessed
684 from your Perl script, use the following routines, depending on the
685 variable type.
686
687 SV* get_sv("package::varname", TRUE);
688 AV* get_av("package::varname", TRUE);
689 HV* get_hv("package::varname", TRUE);
690
691 Notice the use of TRUE as the second parameter. The new variable can
692 now be set, using the routines appropriate to the data type.
693
694 There are additional macros whose values may be bitwise OR'ed with the
695 "TRUE" argument to enable certain extra features. Those bits are:
696
697 GV_ADDMULTI
698 Marks the variable as multiply defined, thus preventing the:
699
700 Name <varname> used only once: possible typo
701
702 warning.
703
704 GV_ADDWARN
705 Issues the warning:
706
707 Had to create <varname> unexpectedly
708
709 if the variable did not exist before the function was called.
710
711 If you do not specify a package name, the variable is created in the
712 current package.
713
714 Reference Counts and Mortality
715
716 Perl uses a reference count-driven garbage collection mechanism. SVs,
717 AVs, or HVs (xV for short in the following) start their life with a
718 reference count of 1. If the reference count of an xV ever drops to 0,
719 then it will be destroyed and its memory made available for reuse.
720
721 This normally doesn't happen at the Perl level unless a variable is
722 undef'ed or the last variable holding a reference to it is changed or
723 overwritten. At the internal level, however, reference counts can be
724 manipulated with the following macros:
725
726 int SvREFCNT(SV* sv);
727 SV* SvREFCNT_inc(SV* sv);
728 void SvREFCNT_dec(SV* sv);
729
730 However, there is one other function which manipulates the reference
731 count of its argument. The "newRV_inc" function, you will recall, cre‐
732 ates a reference to the specified argument. As a side effect, it
733 increments the argument's reference count. If this is not what you
734 want, use "newRV_noinc" instead.
735
736 For example, imagine you want to return a reference from an XSUB func‐
737 tion. Inside the XSUB routine, you create an SV which initially has a
738 reference count of one. Then you call "newRV_inc", passing it the
739 just-created SV. This returns the reference as a new SV, but the ref‐
740 erence count of the SV you passed to "newRV_inc" has been incremented
741 to two. Now you return the reference from the XSUB routine and forget
742 about the SV. But Perl hasn't! Whenever the returned reference is
743 destroyed, the reference count of the original SV is decreased to one
744 and nothing happens. The SV will hang around without any way to access
745 it until Perl itself terminates. This is a memory leak.
746
747 The correct procedure, then, is to use "newRV_noinc" instead of
748 "newRV_inc". Then, if and when the last reference is destroyed, the
749 reference count of the SV will go to zero and it will be destroyed,
750 stopping any memory leak.
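
The leak and its fix can be reproduced with a toy reference-counting
model in plain C. Everything here is hypothetical ("Toy", "toy_new",
"toy_dec", "toy_ref_inc", "toy_ref_noinc", "toy_live"); it imitates the
counting rules described above and is not Perl's implementation.

```c
#include <stdlib.h>

typedef struct Toy {
    int         refcnt;
    struct Toy *target;  /* non-NULL when this value is a reference */
} Toy;

static int toy_live = 0;  /* number of values not yet freed */

/* New values start life with a reference count of 1. */
static Toy *toy_new(void)
{
    Toy *t = calloc(1, sizeof *t);
    if (!t)
        abort();
    t->refcnt = 1;
    toy_live++;
    return t;
}

/* Drop one count; free at zero. A dying reference releases its target. */
static void toy_dec(Toy *t)
{
    if (--t->refcnt > 0)
        return;
    if (t->target)
        toy_dec(t->target);
    free(t);
    toy_live--;
}

/* Make a reference and bump the target's count (the leaky choice when
 * the caller then forgets about the target). */
static Toy *toy_ref_inc(Toy *thing)
{
    Toy *rv = toy_new();
    rv->target = thing;
    thing->refcnt++;
    return rv;
}

/* Make a reference that takes over the caller's count on the target. */
static Toy *toy_ref_noinc(Toy *thing)
{
    Toy *rv = toy_new();
    rv->target = thing;
    return rv;
}
```

In this model, destroying a reference made with "toy_ref_inc" leaves
the target alive with a count of 1, while destroying one made with
"toy_ref_noinc" frees the target as well.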
751
752 There are some convenience functions available that can help with the
753 destruction of xVs. These functions introduce the concept of "mortal‐
754 ity". An xV that is mortal has had its reference count marked to be
755 decremented, but not actually decremented, until "a short time later".
756 Generally the term "short time later" means a single Perl statement,
757 such as a call to an XSUB function. The actual determinant for when
758 mortal xVs have their reference count decremented depends on two
759 macros, SAVETMPS and FREETMPS. See perlcall and perlxs for more
760 details on these macros.
761
762 "Mortalization" then is at its simplest a deferred "SvREFCNT_dec".
763 However, if you mortalize a variable twice, the reference count will
764 later be decremented twice.
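
Reading mortalization as a deferred decrement suggests the following
toy model in plain C. It is a sketch only: "ToyVal", "toy_2mortal",
"toy_free_tmps" and the fixed-size pending list are assumptions made
for illustration, and nothing is actually freed here.

```c
#include <stddef.h>

#define TOY_TMPS_MAX 16

typedef struct { int refcnt; } ToyVal;

static ToyVal *toy_tmps[TOY_TMPS_MAX];  /* pending deferred decrements */
static size_t  toy_tmps_n = 0;

/* Record one deferred decrement. Returns its argument, or NULL if the
 * pending list is full. The count itself is not touched yet. */
static ToyVal *toy_2mortal(ToyVal *v)
{
    if (toy_tmps_n == TOY_TMPS_MAX)
        return NULL;
    toy_tmps[toy_tmps_n++] = v;
    return v;
}

/* Apply every pending decrement, emptying the list. */
static void toy_free_tmps(void)
{
    while (toy_tmps_n > 0)
        toy_tmps[--toy_tmps_n]->refcnt--;
}
```

A value with a count of 2 that is passed to "toy_2mortal" twice still
has a count of 2 until "toy_free_tmps" runs, after which it is 0.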
765
766 "Mortal" SVs are mainly used for SVs that are placed on perl's stack.
767 For example an SV which is created just to pass a number to a called
768 sub is made mortal to have it cleaned up automatically when it's popped
769 off the stack. Similarly, results returned by XSUBs (which are pushed
770 on the stack) are often made mortal.
771
772 To create a mortal variable, use the functions:
773
774 SV* sv_newmortal()
775 SV* sv_2mortal(SV*)
776 SV* sv_mortalcopy(SV*)
777
The first call creates a mortal SV (with no value), the second converts
an existing SV to a mortal SV (and thus defers a call to
"SvREFCNT_dec"), and the third creates a mortal copy of an existing SV.
Because "sv_newmortal" gives the new SV no value, it must normally be
given one via "sv_setpv", "sv_setiv", etc.:
783
784 SV *tmp = sv_newmortal();
785 sv_setiv(tmp, an_integer);
786
As that is multiple C statements, it is quite common to see this idiom
instead:
789
790 SV *tmp = sv_2mortal(newSViv(an_integer));
791
792 You should be careful about creating mortal variables. Strange things
793 can happen if you make the same value mortal within multiple contexts,
794 or if you make a variable mortal multiple times. Thinking of "Mortal‐
795 ization" as deferred "SvREFCNT_dec" should help to minimize such prob‐
796 lems. For example if you are passing an SV which you know has high
797 enough REFCNT to survive its use on the stack you need not do any mor‐
798 talization. If you are not sure then doing an "SvREFCNT_inc" and
799 "sv_2mortal", or making a "sv_mortalcopy" is safer.
800
The mortal routines are not just for SVs: AVs and HVs can be made
mortal by passing their address (cast to "SV*") to the "sv_2mortal" or
"sv_mortalcopy" routines.
804
805 Stashes and Globs
806
807 A stash is a hash that contains all variables that are defined within a
808 package. Each key of the stash is a symbol name (shared by all the
809 different types of objects that have the same name), and each value in
810 the hash table is a GV (Glob Value). This GV in turn contains refer‐
811 ences to the various objects of that name, including (but not limited
812 to) the following:
813
814 Scalar Value
815 Array Value
816 Hash Value
817 I/O Handle
818 Format
819 Subroutine
820
821 There is a single stash called "PL_defstash" that holds the items that
822 exist in the "main" package. To get at the items in other packages,
823 append the string "::" to the package name. The items in the "Foo"
824 package are in the stash "Foo::" in PL_defstash. The items in the
825 "Bar::Baz" package are in the stash "Baz::" in "Bar::"'s stash.
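
The nesting rule above means a package name is looked up one component at
a time, each with "::" appended. The following plain C sketch only does
the string splitting; it does not touch real stashes, and the function
name "toy_stash_key" is hypothetical. It also ignores special cases such
as a leading "::" or an explicit "main::" prefix.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the n-th (0-based) stash key of a package name into out.
 * For "Bar::Baz", key 0 is "Bar::" and key 1 is "Baz::".
 * Returns 1 on success, 0 if there is no such component or if
 * out is too small to hold it.
 */
static int toy_stash_key(const char *pkg, int n, char *out, size_t outlen)
{
    const char *start = pkg;

    for (;;) {
        const char *sep = strstr(start, "::");
        size_t len = sep ? (size_t)(sep - start) : strlen(start);

        if (len == 0)
            return 0;                 /* empty component: nothing to look up */
        if (n == 0) {
            if (len + 3 > outlen)     /* component + "::" + NUL */
                return 0;
            memcpy(out, start, len);
            memcpy(out + len, "::", 3);
            return 1;
        }
        if (!sep)
            return 0;                 /* ran out of components */
        start = sep + 2;
        n--;
    }
}
```

Each key produced this way is what would be looked up in the stash found
by the previous key, starting from PL_defstash.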
826
To get the stash pointer for a particular package, use one of the functions:
828
829 HV* gv_stashpv(const char* name, I32 create)
830 HV* gv_stashsv(SV*, I32 create)
831
832 The first function takes a literal string, the second uses the string
833 stored in the SV. Remember that a stash is just a hash table, so you
834 get back an "HV*". The "create" flag will create a new package if it
835 is set.
836
837 The name that "gv_stash*v" wants is the name of the package whose sym‐
838 bol table you want. The default package is called "main". If you have
839 multiply nested packages, pass their names to "gv_stash*v", separated
840 by "::" as in the Perl language itself.
841
842 Alternately, if you have an SV that is a blessed reference, you can
843 find out the stash pointer by using:
844
845 HV* SvSTASH(SvRV(SV*));
846
847 then use the following to get the package name itself:
848
849 char* HvNAME(HV* stash);
850
851 If you need to bless or re-bless an object you can use the following
852 function:
853
854 SV* sv_bless(SV*, HV* stash)
855
856 where the first argument, an "SV*", must be a reference, and the second
857 argument is a stash. The returned "SV*" can now be used in the same
858 way as any other SV.
859
860 For more information on references and blessings, consult perlref.
861
862 Double-Typed SVs
863
864 Scalar variables normally contain only one type of value, an integer,
865 double, pointer, or reference. Perl will automatically convert the
866 actual scalar data from the stored type into the requested type.
867
868 Some scalar variables contain more than one type of scalar data. For
869 example, the variable $! contains either the numeric value of "errno"
870 or its string equivalent from either "strerror" or "sys_errlist[]".
871
872 To force multiple data values into an SV, you must do two things: use
873 the "sv_set*v" routines to add the additional scalar type, then set a
874 flag so that Perl will believe it contains more than one type of data.
875 The four macros to set the flags are:
876
877 SvIOK_on
878 SvNOK_on
879 SvPOK_on
880 SvROK_on
881
882 The particular macro you must use depends on which "sv_set*v" routine
883 you called first. This is because every "sv_set*v" routine turns on
884 only the bit for the particular type of data being set, and turns off
885 all the rest.
886
887 For example, to create a new Perl variable called "dberror" that con‐
888 tains both the numeric and descriptive string error values, you could
889 use the following code:
890
extern int dberror;
extern char *dberror_list[];   /* message strings, indexed by error code */

SV* sv = get_sv("dberror", TRUE);
sv_setiv(sv, (IV) dberror);
sv_setpv(sv, dberror_list[dberror]);
SvIOK_on(sv);
898
899 If the order of "sv_setiv" and "sv_setpv" had been reversed, then the
900 macro "SvPOK_on" would need to be called instead of "SvIOK_on".
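
Why the extra flag macro is needed can be seen in a small plain C model.
This is a sketch under stated assumptions: the struct, the flag values and
all "toy_" names are hypothetical and do not match Perl's real SV layout.
Only the rule described above is modelled: each setter turns on its own
flag and turns off the rest.

```c
#include <assert.h>

/* Toy flag bits; the real flags live in the SV's own flag word. */
enum { TOY_IOK = 1, TOY_POK = 2 };

typedef struct {
    unsigned    flags;
    long        iv;
    const char *pv;
} toy_dual_sv;

/* Each setter turns on its own flag and turns the other one off. */
static void toy_setiv(toy_dual_sv *sv, long iv)
{
    sv->iv = iv;
    sv->flags = TOY_IOK;
}

static void toy_setpv(toy_dual_sv *sv, const char *pv)
{
    sv->pv = pv;
    sv->flags = TOY_POK;
}

/* Like SvIOK_on in spirit: declare the integer slot valid again. */
static void toy_iok_on(toy_dual_sv *sv)
{
    sv->flags |= TOY_IOK;
}

/* Store a number and a string in one value; returns the final flags. */
static unsigned toy_make_dual(toy_dual_sv *sv, long code, const char *text)
{
    toy_setiv(sv, code);
    toy_setpv(sv, text);            /* clears TOY_IOK; sv->iv is untouched */
    assert(sv->flags == TOY_POK);
    toy_iok_on(sv);                 /* both slots are now marked valid */
    return sv->flags;
}
```

Had the two setters been called in the opposite order, the model would
end with only TOY_IOK set, and the string flag would be the one to turn
back on.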
901
902 Magic Variables
903
904 [This section still under construction. Ignore everything here. Post
905 no bills. Everything not permitted is forbidden.]
906
907 Any SV may be magical, that is, it has special features that a normal
908 SV does not have. These features are stored in the SV structure in a
909 linked list of "struct magic"'s, typedef'ed to "MAGIC".
910
911 struct magic {
912 MAGIC* mg_moremagic;
913 MGVTBL* mg_virtual;
914 U16 mg_private;
915 char mg_type;
916 U8 mg_flags;
917 SV* mg_obj;
918 char* mg_ptr;
919 I32 mg_len;
920 };
921
922 Note this is current as of patchlevel 0, and could change at any time.
923
924 Assigning Magic
925
926 Perl adds magic to an SV using the sv_magic function:
927
928 void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen);
929
930 The "sv" argument is a pointer to the SV that is to acquire a new magi‐
931 cal feature.
932
933 If "sv" is not already magical, Perl uses the "SvUPGRADE" macro to con‐
934 vert "sv" to type "SVt_PVMG". Perl then continues by adding new magic
935 to the beginning of the linked list of magical features. Any prior
936 entry of the same type of magic is deleted. Note that this can be
937 overridden, and multiple instances of the same type of magic can be
938 associated with an SV.
939
940 The "name" and "namlen" arguments are used to associate a string with
941 the magic, typically the name of a variable. "namlen" is stored in the
942 "mg_len" field and if "name" is non-null then either a "savepvn" copy
943 of "name" or "name" itself is stored in the "mg_ptr" field, depending
944 on whether "namlen" is greater than zero or equal to zero respectively.
945 As a special case, if "(name && namlen == HEf_SVKEY)" then "name" is
946 assumed to contain an "SV*" and is stored as-is with its REFCNT incre‐
947 mented.
948
949 The sv_magic function uses "how" to determine which, if any, predefined
950 "Magic Virtual Table" should be assigned to the "mg_virtual" field.
951 See the "Magic Virtual Tables" section below. The "how" argument is
952 also stored in the "mg_type" field. The value of "how" should be chosen
953 from the set of macros "PERL_MAGIC_foo" found in perl.h. Note that
954 before these macros were added, Perl internals used to directly use
955 character literals, so you may occasionally come across old code or
956 documentation referring to 'U' magic rather than "PERL_MAGIC_uvar" for
957 example.
958
959 The "obj" argument is stored in the "mg_obj" field of the "MAGIC"
960 structure. If it is not the same as the "sv" argument, the reference
961 count of the "obj" object is incremented. If it is the same, or if the
962 "how" argument is "PERL_MAGIC_arylen", or if it is a NULL pointer, then
963 "obj" is merely stored, without the reference count being incremented.
964
965 See also "sv_magicext" in perlapi for a more flexible way to add magic
966 to an SV.
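
The list handling described above (new magic goes at the head, and a
prior entry of the same type is dropped) can be sketched in plain C. This
is a model of the description only, not of Perl's implementation: the
struct keeps just three fields, nothing is reference counted, and every
"toy_" name is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct toy_magic {
    struct toy_magic *moremagic;   /* next entry, like mg_moremagic */
    char              type;        /* like mg_type */
    void             *obj;         /* like mg_obj, but not refcounted */
} toy_magic;

/* Find the first entry of the given type, or NULL if there is none. */
static toy_magic *toy_mg_find(toy_magic *head, char type)
{
    for (; head; head = head->moremagic)
        if (head->type == type)
            return head;
    return NULL;
}

/* Remove and free every entry of the given type. */
static void toy_unmagic(toy_magic **head, char type)
{
    while (*head) {
        if ((*head)->type == type) {
            toy_magic *dead = *head;
            *head = dead->moremagic;
            free(dead);
        } else {
            head = &(*head)->moremagic;
        }
    }
}

/* Drop any prior entry of this type, then link the new one at the head.
 * Returns 0 on success, -1 if memory could not be allocated. */
static int toy_magic_add(toy_magic **head, char type, void *obj)
{
    toy_magic *mg = malloc(sizeof *mg);

    if (!mg)
        return -1;
    toy_unmagic(head, type);
    mg->moremagic = *head;
    mg->type = type;
    mg->obj = obj;
    *head = mg;
    return 0;
}

/* Returns 1 if the chain behaves as the text describes. */
static int toy_magic_demo(void)
{
    toy_magic *chain = NULL;
    int a = 1, b = 2, c = 3;
    int ok = toy_magic_add(&chain, '~', &a) == 0
          && toy_magic_add(&chain, 'U', &b) == 0
          && toy_magic_add(&chain, '~', &c) == 0;

    ok = ok
      && chain->type == '~' && chain->obj == &c     /* newest entry first */
      && chain->moremagic->type == 'U'
      && chain->moremagic->moremagic == NULL        /* old '~' entry gone */
      && toy_mg_find(chain, 't') == NULL;
    toy_unmagic(&chain, '~');
    toy_unmagic(&chain, 'U');
    return ok && chain == NULL;
}
```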
967
968 There is also a function to add magic to an "HV":
969
970 void hv_magic(HV *hv, GV *gv, int how);
971
972 This simply calls "sv_magic" and coerces the "gv" argument into an
973 "SV".
974
975 To remove the magic from an SV, call the function sv_unmagic:
976
977 void sv_unmagic(SV *sv, int type);
978
979 The "type" argument should be equal to the "how" value when the "SV"
980 was initially made magical.
981
982 Magic Virtual Tables
983
The "mg_virtual" field in the "MAGIC" structure is a pointer to an
"MGVTBL" (short for "Magic Virtual Table"), a structure of function
pointers that handle the various operations that might be applied to
that variable.
988
989 The "MGVTBL" has five pointers to the following routine types:
990
991 int (*svt_get)(SV* sv, MAGIC* mg);
992 int (*svt_set)(SV* sv, MAGIC* mg);
993 U32 (*svt_len)(SV* sv, MAGIC* mg);
994 int (*svt_clear)(SV* sv, MAGIC* mg);
995 int (*svt_free)(SV* sv, MAGIC* mg);
996
997 This MGVTBL structure is set at compile-time in perl.h and there are
998 currently 19 types (or 21 with overloading turned on). These different
999 structures contain pointers to various routines that perform additional
1000 actions depending on which function is being called.
1001
1002 Function pointer Action taken
1003 ---------------- ------------
1004 svt_get Do something before the value of the SV is retrieved.
1005 svt_set Do something after the SV is assigned a value.
1006 svt_len Report on the SV's length.
1007 svt_clear Clear something the SV represents.
1008 svt_free Free any extra storage associated with the SV.
1009
1010 For instance, the MGVTBL structure called "vtbl_sv" (which corresponds
1011 to an "mg_type" of "PERL_MAGIC_sv") contains:
1012
1013 { magic_get, magic_set, magic_len, 0, 0 }
1014
1015 Thus, when an SV is determined to be magical and of type
1016 "PERL_MAGIC_sv", if a get operation is being performed, the routine
1017 "magic_get" is called. All the various routines for the various magi‐
1018 cal types begin with "magic_". NOTE: the magic routines are not con‐
1019 sidered part of the Perl API, and may not be exported by the Perl
1020 library.
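
The dispatch through a table of function pointers, with empty slots
skipped, can be sketched in plain C as follows. This is a toy model, not
Perl's code: the types, the field names and every "toy_" identifier are
hypothetical, and the hooks take a single argument rather than the
"(SV*, MAGIC*)" pair used by the real routines.

```c
#include <stddef.h>

typedef struct toy_value toy_value;

/* Five hooks, in the same order as the MGVTBL fields listed above. */
typedef struct {
    int      (*on_get)(toy_value *v);
    int      (*on_set)(toy_value *v);
    unsigned (*on_len)(toy_value *v);
    int      (*on_clear)(toy_value *v);
    int      (*on_free)(toy_value *v);
} toy_vtbl;

struct toy_value {
    int             value;
    int             gets_seen;    /* how many times the get hook ran */
    const toy_vtbl *vtbl;         /* NULL means "not magical" */
};

static int toy_count_get(toy_value *v)
{
    v->gets_seen++;
    return 0;
}

/* Only a get hook; the other slots are empty, like the 0s in vtbl_sv. */
static const toy_vtbl toy_counting_vtbl = { toy_count_get, 0, 0, 0, 0 };

/* Read the value, running the get hook first if there is one. */
static int toy_fetch(toy_value *v)
{
    if (v->vtbl && v->vtbl->on_get)
        v->vtbl->on_get(v);
    return v->value;
}

/* Clear the value; an empty clear slot means a plain reset. */
static void toy_clear(toy_value *v)
{
    if (v->vtbl && v->vtbl->on_clear)
        v->vtbl->on_clear(v);
    v->value = 0;
}

/* Returns 1 if two fetches ran the hook twice and clear skipped it. */
static int toy_vtbl_demo(void)
{
    toy_value v = { 42, 0, &toy_counting_vtbl };
    int first  = toy_fetch(&v);
    int second = toy_fetch(&v);

    toy_clear(&v);
    return first == 42 && second == 42 && v.gets_seen == 2 && v.value == 0;
}
```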
1021
1022 The current kinds of Magic Virtual Tables are:
1023
mg_type
(old-style char and macro)    MGVTBL           Type of magic
--------------------------    ------           ----------------------------
\0 PERL_MAGIC_sv              vtbl_sv          Special scalar variable
A  PERL_MAGIC_overload        vtbl_amagic      %OVERLOAD hash
a  PERL_MAGIC_overload_elem   vtbl_amagicelem  %OVERLOAD hash element
c  PERL_MAGIC_overload_table  (none)           Holds overload table (AMT)
                                               on stash
B  PERL_MAGIC_bm              vtbl_bm          Boyer-Moore (fast string search)
D  PERL_MAGIC_regdata         vtbl_regdata     Regex match position data
                                               (@+ and @- vars)
d  PERL_MAGIC_regdatum        vtbl_regdatum    Regex match position data
                                               element
E  PERL_MAGIC_env             vtbl_env         %ENV hash
e  PERL_MAGIC_envelem         vtbl_envelem     %ENV hash element
f  PERL_MAGIC_fm              vtbl_fm          Formline ('compiled' format)
g  PERL_MAGIC_regex_global    vtbl_mglob       m//g target / study()ed string
I  PERL_MAGIC_isa             vtbl_isa         @ISA array
i  PERL_MAGIC_isaelem         vtbl_isaelem     @ISA array element
k  PERL_MAGIC_nkeys           vtbl_nkeys       scalar(keys()) lvalue
L  PERL_MAGIC_dbfile          (none)           Debugger %_<filename
l  PERL_MAGIC_dbline          vtbl_dbline      Debugger %_<filename element
m  PERL_MAGIC_mutex           vtbl_mutex       ???
o  PERL_MAGIC_collxfrm        vtbl_collxfrm    Locale collate transformation
P  PERL_MAGIC_tied            vtbl_pack        Tied array or hash
p  PERL_MAGIC_tiedelem        vtbl_packelem    Tied array or hash element
q  PERL_MAGIC_tiedscalar      vtbl_packelem    Tied scalar or handle
r  PERL_MAGIC_qr              vtbl_qr          precompiled qr// regex
S  PERL_MAGIC_sig             vtbl_sig         %SIG hash
s  PERL_MAGIC_sigelem         vtbl_sigelem     %SIG hash element
t  PERL_MAGIC_taint           vtbl_taint       Taintedness
U  PERL_MAGIC_uvar            vtbl_uvar        Available for use by extensions
v  PERL_MAGIC_vec             vtbl_vec         vec() lvalue
V  PERL_MAGIC_vstring         (none)           v-string scalars
w  PERL_MAGIC_utf8            vtbl_utf8        UTF-8 length+offset cache
x  PERL_MAGIC_substr          vtbl_substr      substr() lvalue
y  PERL_MAGIC_defelem         vtbl_defelem     Shadow "foreach" iterator
                                               variable / smart parameter
                                               vivification
*  PERL_MAGIC_glob            vtbl_glob        GV (typeglob)
#  PERL_MAGIC_arylen          vtbl_arylen      Array length ($#ary)
.  PERL_MAGIC_pos             vtbl_pos         pos() lvalue
<  PERL_MAGIC_backref         vtbl_backref     ???
~  PERL_MAGIC_ext             (none)           Available for use by extensions
1068
1069 When an uppercase and lowercase letter both exist in the table, then
1070 the uppercase letter is typically used to represent some kind of com‐
1071 posite type (a list or a hash), and the lowercase letter is used to
1072 represent an element of that composite type. Some internals code makes
1073 use of this case relationship. However, 'v' and 'V' (vec and v-string)
1074 are in no way related.
1075
1076 The "PERL_MAGIC_ext" and "PERL_MAGIC_uvar" magic types are defined
1077 specifically for use by extensions and will not be used by perl itself.
1078 Extensions can use "PERL_MAGIC_ext" magic to 'attach' private informa‐
1079 tion to variables (typically objects). This is especially useful
1080 because there is no way for normal perl code to corrupt this private
1081 information (unlike using extra elements of a hash object).
1082
1083 Similarly, "PERL_MAGIC_uvar" magic can be used much like tie() to call
1084 a C function any time a scalar's value is used or changed. The
1085 "MAGIC"'s "mg_ptr" field points to a "ufuncs" structure:
1086
1087 struct ufuncs {
1088 I32 (*uf_val)(pTHX_ IV, SV*);
1089 I32 (*uf_set)(pTHX_ IV, SV*);
1090 IV uf_index;
1091 };
1092
1093 When the SV is read from or written to, the "uf_val" or "uf_set" func‐
1094 tion will be called with "uf_index" as the first arg and a pointer to
1095 the SV as the second. A simple example of how to add "PERL_MAGIC_uvar"
1096 magic is shown below. Note that the ufuncs structure is copied by
1097 sv_magic, so you can safely allocate it on the stack.
1098
1099 void
1100 Umagic(sv)
1101 SV *sv;
1102 PREINIT:
1103 struct ufuncs uf;
1104 CODE:
1105 uf.uf_val = &my_get_fn;
1106 uf.uf_set = &my_set_fn;
1107 uf.uf_index = 0;
1108 sv_magic(sv, 0, PERL_MAGIC_uvar, (char*)&uf, sizeof(uf));
1109
1110 Note that because multiple extensions may be using "PERL_MAGIC_ext" or
1111 "PERL_MAGIC_uvar" magic, it is important for extensions to take extra
1112 care to avoid conflict. Typically only using the magic on objects
1113 blessed into the same class as the extension is sufficient. For
1114 "PERL_MAGIC_ext" magic, it may also be appropriate to add an I32 'sig‐
1115 nature' at the top of the private data area and check that.
1116
1117 Also note that the "sv_set*()" and "sv_cat*()" functions described ear‐
1118 lier do not invoke 'set' magic on their targets. This must be done by
1119 the user either by calling the "SvSETMAGIC()" macro after calling these
1120 functions, or by using one of the "sv_set*_mg()" or "sv_cat*_mg()"
functions. Similarly, generic C code must call the "SvGETMAGIC()"
macro to invoke any 'get' magic if it uses an SV obtained from external
sources in functions that don't handle magic. See perlapi for a
1124 description of these functions. For example, calls to the "sv_cat*()"
1125 functions typically need to be followed by "SvSETMAGIC()", but they
1126 don't need a prior "SvGETMAGIC()" since their implementation handles
1127 'get' magic.
1128
1129 Finding Magic
1130
1131 MAGIC* mg_find(SV*, int type); /* Finds the magic pointer of that type */
1132
1133 This routine returns a pointer to the "MAGIC" structure stored in the
1134 SV. If the SV does not have that magical feature, "NULL" is returned.
1135 Also, if the SV is not of type SVt_PVMG, Perl may core dump.
1136
1137 int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen);
1138
1139 This routine checks to see what types of magic "sv" has. If the
1140 mg_type field is an uppercase letter, then the mg_obj is copied to
1141 "nsv", but the mg_type field is changed to be the lowercase letter.
1142
1143 Understanding the Magic of Tied Hashes and Arrays
1144
1145 Tied hashes and arrays are magical beasts of the "PERL_MAGIC_tied"
1146 magic type.
1147
1148 WARNING: As of the 5.004 release, proper usage of the array and hash
1149 access functions requires understanding a few caveats. Some of these
1150 caveats are actually considered bugs in the API, to be fixed in later
1151 releases, and are bracketed with [MAYCHANGE] below. If you find your‐
1152 self actually applying such information in this section, be aware that
1153 the behavior may change in the future, umm, without warning.
1154
1155 The perl tie function associates a variable with an object that imple‐
1156 ments the various GET, SET, etc methods. To perform the equivalent of
1157 the perl tie function from an XSUB, you must mimic this behaviour. The
1158 code below carries out the necessary steps - firstly it creates a new
1159 hash, and then creates a second hash which it blesses into the class
1160 which will implement the tie methods. Lastly it ties the two hashes
1161 together, and returns a reference to the new tied hash. Note that the
1162 code below does NOT call the TIEHASH method in the MyTie class - see
1163 "Calling Perl Routines from within C Programs" for details on how to do
1164 this.
1165
SV*
mytie()
PREINIT:
    HV *hash;
    HV *stash;
    SV *tie;
CODE:
    hash = newHV();
    tie = newRV_noinc((SV*)newHV());
    stash = gv_stashpv("MyTie", TRUE);
    sv_bless(tie, stash);
    hv_magic(hash, (GV*)tie, PERL_MAGIC_tied);
    RETVAL = newRV_noinc((SV*)hash);
OUTPUT:
    RETVAL
1181
1182 The "av_store" function, when given a tied array argument, merely
1183 copies the magic of the array onto the value to be "stored", using
1184 "mg_copy". It may also return NULL, indicating that the value did not
1185 actually need to be stored in the array. [MAYCHANGE] After a call to
1186 "av_store" on a tied array, the caller will usually need to call
1187 "mg_set(val)" to actually invoke the perl level "STORE" method on the
TIEARRAY object. If "av_store" did return NULL, a call to
"SvREFCNT_dec(val)" will usually also be necessary to avoid a memory leak.
1190 [/MAYCHANGE]
1191
1192 The previous paragraph is applicable verbatim to tied hash access using
1193 the "hv_store" and "hv_store_ent" functions as well.
1194
1195 "av_fetch" and the corresponding hash functions "hv_fetch" and
1196 "hv_fetch_ent" actually return an undefined mortal value whose magic
1197 has been initialized using "mg_copy". Note the value so returned does
1198 not need to be deallocated, as it is already mortal. [MAYCHANGE] But
1199 you will need to call "mg_get()" on the returned value in order to
1200 actually invoke the perl level "FETCH" method on the underlying TIE
1201 object. Similarly, you may also call "mg_set()" on the return value
1202 after possibly assigning a suitable value to it using "sv_setsv",
1203 which will invoke the "STORE" method on the TIE object. [/MAYCHANGE]
1204
1205 [MAYCHANGE] In other words, the array or hash fetch/store functions
1206 don't really fetch and store actual values in the case of tied arrays
1207 and hashes. They merely call "mg_copy" to attach magic to the values
1208 that were meant to be "stored" or "fetched". Later calls to "mg_get"
1209 and "mg_set" actually do the job of invoking the TIE methods on the
1210 underlying objects. Thus the magic mechanism currently implements a
1211 kind of lazy access to arrays and hashes.
1212
1213 Currently (as of perl version 5.004), use of the hash and array access
1214 functions requires the user to be aware of whether they are operating
1215 on "normal" hashes and arrays, or on their tied variants. The API may
1216 be changed to provide more transparent access to both tied and normal
1217 data types in future versions. [/MAYCHANGE]
1218
1219 You would do well to understand that the TIEARRAY and TIEHASH inter‐
1220 faces are mere sugar to invoke some perl method calls while using the
1221 uniform hash and array syntax. The use of this sugar imposes some
1222 overhead (typically about two to four extra opcodes per FETCH/STORE
1223 operation, in addition to the creation of all the mortal variables
1224 required to invoke the methods). This overhead will be comparatively
1225 small if the TIE methods are themselves substantial, but if they are
1226 only a few statements long, the overhead will not be insignificant.
1227
1228 Localizing changes
1229
1230 Perl has a very handy construction
1231
1232 {
1233 local $var = 2;
1234 ...
1235 }
1236
1237 This construction is approximately equivalent to
1238
1239 {
1240 my $oldvar = $var;
1241 $var = 2;
1242 ...
1243 $var = $oldvar;
1244 }
1245
1246 The biggest difference is that the first construction would reinstate
1247 the initial value of $var, irrespective of how control exits the block:
1248 "goto", "return", "die"/"eval", etc. It is a little bit more efficient
1249 as well.
1250
There is a way to achieve a similar task from C via the Perl API: create
a pseudo-block, and arrange for some changes to be automatically undone
at the end of it, whether it ends explicitly or via a non-local exit
(via die()). A block-like construct is created by a pair of
"ENTER"/"LEAVE" macros (see "Returning a Scalar" in perlcall). Such a
construct may be created specially for some important localized task,
or an existing one (like the boundaries of the enclosing Perl
subroutine/block, or an existing pair for freeing TMPs) may be used. (In
the second case the overhead of additional localization should be almost
negligible.) Note that any XSUB is automatically enclosed in an
"ENTER"/"LEAVE" pair.
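
The idea of recording old values and restoring them when the pseudo-block
ends can be sketched in plain C. This is a toy model only: the arrays and
every "toy_" name are hypothetical, only "int" variables are handled, and
the non-local exit case (die()) is not modelled at all.

```c
#include <assert.h>
#include <stddef.h>

/* One saved integer: where it lives and what it held. */
typedef struct { int *where; int old; } toy_save;

#define TOY_SAVE_MAX 32
static toy_save toy_savestack[TOY_SAVE_MAX];
static size_t   toy_save_top = 0;           /* next free savestack slot */
static size_t   toy_scopes[TOY_SAVE_MAX];   /* savestack height per scope */
static size_t   toy_scope_top = 0;

/* Like ENTER in spirit: remember how tall the savestack is right now. */
static void toy_enter(void)
{
    assert(toy_scope_top < TOY_SAVE_MAX);
    toy_scopes[toy_scope_top++] = toy_save_top;
}

/* Like SAVEINT in spirit: record the current value for later restoring. */
static void toy_save_int(int *p)
{
    assert(toy_save_top < TOY_SAVE_MAX);
    toy_savestack[toy_save_top].where = p;
    toy_savestack[toy_save_top].old = *p;
    toy_save_top++;
}

/* Like LEAVE in spirit: undo, newest first, everything saved since the
 * matching toy_enter(). */
static void toy_leave(void)
{
    size_t base;

    assert(toy_scope_top > 0);
    base = toy_scopes[--toy_scope_top];
    while (toy_save_top > base) {
        toy_save_top--;
        *toy_savestack[toy_save_top].where = toy_savestack[toy_save_top].old;
    }
}

/* C rendering of: { local $var = 2; { local $var = 3; } }
 * Returns 1 if each toy_leave() restored the expected value. */
static int toy_local_demo(void)
{
    int var = 1, after_inner, after_outer;

    toy_enter();
    toy_save_int(&var);
    var = 2;
    toy_enter();
    toy_save_int(&var);
    var = 3;
    toy_leave();
    after_inner = var;      /* back to 2 */
    toy_leave();
    after_outer = var;      /* back to 1 */
    return after_inner == 2 && after_outer == 1;
}
```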
1261
1262 Inside such a pseudo-block the following service is available:
1263
1264 "SAVEINT(int i)"
1265 "SAVEIV(IV i)"
1266 "SAVEI32(I32 i)"
1267 "SAVELONG(long i)"
1268 These macros arrange things to restore the value of integer vari‐
1269 able "i" at the end of enclosing pseudo-block.
1270
1271 SAVESPTR(s)
1272 SAVEPPTR(p)
1273 These macros arrange things to restore the value of pointers "s"
1274 and "p". "s" must be a pointer of a type which survives conversion
1275 to "SV*" and back, "p" should be able to survive conversion to
1276 "char*" and back.
1277
1278 "SAVEFREESV(SV *sv)"
The refcount of "sv" will be decremented at the end of the
pseudo-block. This is similar to "sv_2mortal" in that it is also a
mechanism for doing a delayed "SvREFCNT_dec". However, while
"sv_2mortal" extends the lifetime of "sv" until the beginning of the
next statement, "SAVEFREESV" extends it until the end of the enclosing
scope. These lifetimes can be wildly different.
1285
1286 Also compare "SAVEMORTALIZESV".
1287
1288 "SAVEMORTALIZESV(SV *sv)"
1289 Just like "SAVEFREESV", but mortalizes "sv" at the end of the cur‐
1290 rent scope instead of decrementing its reference count. This usu‐
1291 ally has the effect of keeping "sv" alive until the statement that
1292 called the currently live scope has finished executing.
1293
1294 "SAVEFREEOP(OP *op)"
1295 The "OP *" is op_free()ed at the end of pseudo-block.
1296
1297 SAVEFREEPV(p)
1298 The chunk of memory which is pointed to by "p" is Safefree()ed at
1299 the end of pseudo-block.
1300
1301 "SAVECLEARSV(SV *sv)"
1302 Clears a slot in the current scratchpad which corresponds to "sv"
1303 at the end of pseudo-block.
1304
1305 "SAVEDELETE(HV *hv, char *key, I32 length)"
1306 The key "key" of "hv" is deleted at the end of pseudo-block. The
1307 string pointed to by "key" is Safefree()ed. If one has a key in
1308 short-lived storage, the corresponding string may be reallocated
1309 like this:
1310
1311 SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf));
1312
1313 "SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)"
1314 At the end of pseudo-block the function "f" is called with the only
1315 argument "p".
1316
1317 "SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)"
1318 At the end of pseudo-block the function "f" is called with the
1319 implicit context argument (if any), and "p".
1320
1321 "SAVESTACK_POS()"
1322 The current offset on the Perl internal stack (cf. "SP") is
1323 restored at the end of pseudo-block.
1324
1325 The following API list contains functions, thus one needs to provide
1326 pointers to the modifiable data explicitly (either C pointers, or Perl‐
1327 ish "GV *"s). Where the above macros take "int", a similar function
1328 takes "int *".
1329
1330 "SV* save_scalar(GV *gv)"
1331 Equivalent to Perl code "local $gv".
1332
1333 "AV* save_ary(GV *gv)"
1334 "HV* save_hash(GV *gv)"
1335 Similar to "save_scalar", but localize @gv and %gv.
1336
1337 "void save_item(SV *item)"
Duplicates the current value of "item". On exit from the current
"ENTER"/"LEAVE" pseudo-block, the value of "item" is restored from
the stored copy.
1341
1342 "void save_list(SV **sarg, I32 maxsarg)"
1343 A variant of "save_item" which takes multiple arguments via an
1344 array "sarg" of "SV*" of length "maxsarg".
1345
1346 "SV* save_svref(SV **sptr)"
1347 Similar to "save_scalar", but will reinstate an "SV *".
1348
1349 "void save_aptr(AV **aptr)"
1350 "void save_hptr(HV **hptr)"
1351 Similar to "save_svref", but localize "AV *" and "HV *".
1352
1353 The "Alias" module implements localization of the basic types within
1354 the caller's scope. People who are interested in how to localize
1355 things in the containing scope should take a look there too.
1356
1358 XSUBs and the Argument Stack
1359
1360 The XSUB mechanism is a simple way for Perl programs to access C sub‐
1361 routines. An XSUB routine will have a stack that contains the argu‐
1362 ments from the Perl program, and a way to map from the Perl data struc‐
1363 tures to a C equivalent.
1364
1365 The stack arguments are accessible through the ST(n) macro, which
1366 returns the "n"'th stack argument. Argument 0 is the first argument
1367 passed in the Perl subroutine call. These arguments are "SV*", and can
1368 be used anywhere an "SV*" is used.
1369
1370 Most of the time, output from the C routine can be handled through use
1371 of the RETVAL and OUTPUT directives. However, there are some cases
1372 where the argument stack is not already long enough to handle all the
1373 return values. An example is the POSIX tzname() call, which takes no
1374 arguments, but returns two, the local time zone's standard and summer
1375 time abbreviations.
1376
1377 To handle this situation, the PPCODE directive is used and the stack is
1378 extended using the macro:
1379
1380 EXTEND(SP, num);
1381
1382 where "SP" is the macro that represents the local copy of the stack
1383 pointer, and "num" is the number of elements the stack should be
1384 extended by.
1385
Now that there is room on the stack, values can be pushed on it using
the "PUSHs" macro. The pushed values will often need to be "mortal" (see
"Reference Counts and Mortality"):

PUSHs(sv_2mortal(newSViv(an_integer)));
PUSHs(sv_2mortal(newSVuv(an_unsigned_integer)));
PUSHs(sv_2mortal(newSVnv(a_double)));
PUSHs(sv_2mortal(newSVpv("Some String",0)));
1394
Now, in the Perl program calling "tzname", the two values will be
assigned as in:
1397
1398 ($standard_abbrev, $summer_abbrev) = POSIX::tzname;
1399
1400 An alternate (and possibly simpler) method to pushing values on the
1401 stack is to use the macro:
1402
1403 XPUSHs(SV*)
1404
This macro automatically adjusts the stack for you, if needed. Thus,
you do not need to call "EXTEND" to extend the stack.
1407
Despite what earlier versions of this document suggested, the macros
"(X)PUSH[iunp]" are not suited to XSUBs which return multiple results.
For that, either stick to the "(X)PUSHs" macros shown above, or use the
new "m(X)PUSH[iunp]" macros instead; see "Putting a C value on Perl
stack".
1413
1414 For more information, consult perlxs and perlxstut.
1415
1416 Calling Perl Routines from within C Programs
1417
1418 There are four routines that can be used to call a Perl subroutine from
1419 within a C program. These four are:
1420
1421 I32 call_sv(SV*, I32);
1422 I32 call_pv(const char*, I32);
1423 I32 call_method(const char*, I32);
1424 I32 call_argv(const char*, I32, register char**);
1425
1426 The routine most often used is "call_sv". The "SV*" argument contains
1427 either the name of the Perl subroutine to be called, or a reference to
1428 the subroutine. The second argument consists of flags that control the
1429 context in which the subroutine is called, whether or not the subrou‐
1430 tine is being passed arguments, how errors should be trapped, and how
1431 to treat return values.
1432
1433 All four routines return the number of arguments that the subroutine
1434 returned on the Perl stack.
1435
1436 These routines used to be called "perl_call_sv", etc., before Perl
1437 v5.6.0, but those names are now deprecated; macros of the same name are
1438 provided for compatibility.
1439
When using any of these routines (except "call_argv"), the programmer
must manipulate the Perl stack. The tools for doing so include the
following macros and functions:
1443
1444 dSP
1445 SP
1446 PUSHMARK()
1447 PUTBACK
1448 SPAGAIN
1449 ENTER
1450 SAVETMPS
1451 FREETMPS
1452 LEAVE
1453 XPUSH*()
1454 POP*()
1455
1456 For a detailed description of calling conventions from C to Perl, con‐
1457 sult perlcall.
1458
1459 Memory Allocation
1460
1461 Allocation
1462
1463 All memory meant to be used with the Perl API functions should be
1464 manipulated using the macros described in this section. The macros
1465 provide the necessary transparency between differences in the actual
1466 malloc implementation that is used within perl.
1467
1468 It is suggested that you enable the version of malloc that is distrib‐
1469 uted with Perl. It keeps pools of various sizes of unallocated memory
1470 in order to satisfy allocation requests more quickly. However, on some
1471 platforms, it may cause spurious malloc or free errors.
1472
The following three macros are used to initially allocate memory:
1474
1475 Newx(pointer, number, type);
1476 Newxc(pointer, number, type, cast);
1477 Newxz(pointer, number, type);
1478
1479 The first argument "pointer" should be the name of a variable that will
1480 point to the newly allocated memory.
1481
1482 The second and third arguments "number" and "type" specify how many of
1483 the specified type of data structure should be allocated. The argument
1484 "type" is passed to "sizeof". The final argument to "Newxc", "cast",
1485 should be used if the "pointer" argument is different from the "type"
1486 argument.
1487
1488 Unlike the "Newx" and "Newxc" macros, the "Newxz" macro calls "memzero"
1489 to zero out all the newly allocated memory.
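
The argument order, and the role of the extra "cast" argument, can be
illustrated with stand-in macros written in plain C. These are
hypothetical "TOY_" macros built on the system allocator, not Perl's own
definitions, which go through Perl's memory layer and may differ in
detail.

```c
#include <stdlib.h>

/* Hypothetical stand-ins built on the system allocator. */
#define TOY_Newx(ptr, n, type)        ((ptr) = (type *)malloc((n) * sizeof(type)))
#define TOY_Newxc(ptr, n, type, cast) ((ptr) = (cast *)malloc((n) * sizeof(type)))
#define TOY_Newxz(ptr, n, type)       ((ptr) = (type *)calloc((n), sizeof(type)))

/* Returns 1 if both allocations succeed and the zeroed block is zero. */
static int toy_alloc_demo(void)
{
    long          *counts = NULL;
    unsigned char *raw    = NULL;
    int            ok     = 0;

    TOY_Newxz(counts, 4, long);                /* four longs, all zero */
    TOY_Newxc(raw, 4, long, unsigned char);    /* same size, byte pointer */
    if (counts && raw) {
        raw[4 * sizeof(long) - 1] = 0xFF;      /* last byte is in bounds */
        ok = counts[0] == 0 && counts[3] == 0;
    }
    free(raw);
    free(counts);
    return ok;
}
```

The "cast" form is the one to reach for when the variable receiving the
memory has a different pointer type from the element type used for
sizing, as with "raw" here.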
1490
1491 Reallocation
1492
1493 Renew(pointer, number, type);
1494 Renewc(pointer, number, type, cast);
1495 Safefree(pointer)
1496
These three macros are used to change a memory buffer size or to free a
piece of memory no longer needed. The arguments to "Renew" and
"Renewc" match those of "Newx" and "Newxc" described above.
1501
1502 Moving
1503
1504 Move(source, dest, number, type);
1505 Copy(source, dest, number, type);
1506 Zero(dest, number, type);
1507
These three macros are used to move, copy, or zero out previously
allocated memory. The "source" and "dest" arguments point to the source
and destination starting points. Perl will move, copy, or zero out
"number" instances of the size of the "type" data structure (using the
"sizeof" operator).
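
As documented in perlapi, "Move" is the interface to the C memmove
function and may be used on overlapping regions, while "Copy" is the
interface to memcpy and may fail on them. The plain C sketch below uses
hypothetical "TOY_" stand-ins with the same (source, dest, number, type)
argument order, which is the reverse of memcpy's own.

```c
#include <string.h>

/* Hypothetical stand-ins; note the (source, dest) argument order. */
#define TOY_Move(src, dst, n, type) memmove((dst), (src), (n) * sizeof(type))
#define TOY_Copy(src, dst, n, type) memcpy((dst), (src), (n) * sizeof(type))
#define TOY_Zero(dst, n, type)      memset((dst), 0, (n) * sizeof(type))

/* Shift four elements up by one slot, clear the first slot, copy the
 * result to a second array, and return element i (0 <= i <= 4) of it. */
static int toy_shifted_at(int i)
{
    int buf[5] = { 1, 2, 3, 4, 0 };
    int out[5];

    TOY_Move(buf, buf + 1, 4, int);   /* ranges overlap: Move, not Copy */
    TOY_Zero(buf, 1, int);            /* clear the vacated first slot */
    TOY_Copy(buf, out, 5, int);       /* distinct arrays: Copy is fine */
    return out[i];
}
```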
1513
1514 PerlIO
1515
The most recent development releases of Perl have been experimenting
1517 with removing Perl's dependency on the "normal" standard I/O suite and
1518 allowing other stdio implementations to be used. This involves creat‐
1519 ing a new abstraction layer that then calls whichever implementation of
1520 stdio Perl was compiled with. All XSUBs should now use the functions
1521 in the PerlIO abstraction layer and not make any assumptions about what
1522 kind of stdio is being used.
1523
1524 For a complete description of the PerlIO abstraction, consult perlapio.
1525
1526 Putting a C value on Perl stack
1527
A lot of opcodes (an opcode is an elementary operation in the internal
perl stack machine) put an SV* on the stack. However, as an optimization
the corresponding SV is (usually) not recreated each time. The opcodes
reuse specially assigned SVs (targets) which are (as a corollary) not
constantly freed/created.
1533
1534 Each of the targets is created only once (but see "Scratchpads and
1535 recursion" below), and when an opcode needs to put an integer, a dou‐
1536 ble, or a string on stack, it just sets the corresponding parts of its
1537 target and puts the target on stack.
1538
1539 The macro to put this target on stack is "PUSHTARG", and it is directly
1540 used in some opcodes, as well as indirectly in zillions of others,
1541 which use it via "(X)PUSH[iunp]".
1542
1543 Because the target is reused, you must be careful when pushing multiple
1544 values on the stack. The following code will not do what you think:
1545
1546 XPUSHi(10);
1547 XPUSHi(20);
1548
1549 This translates as "set "TARG" to 10, push a pointer to "TARG" onto the
1550 stack; set "TARG" to 20, push a pointer to "TARG" onto the stack". At
1551 the end of the operation, the stack does not contain the values 10 and
1552 20, but actually contains two pointers to "TARG", which we have set to
1553 20.
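
The aliasing can be reproduced outside Perl with a toy model. In the sketch below ToySV, ToyStack and the toy_* functions are made up for illustration and have nothing to do with the real SV or stack types; the only point is that the stack holds pointers, so pushing one shared target twice leaves two references to the same storage.

```c
#include <stddef.h>

/* Toy model only: a "scalar" holding an integer, and a stack of pointers. */
typedef struct { long iv; } ToySV;

typedef struct {
    ToySV *slot[8];
    size_t top;             /* number of occupied slots */
} ToyStack;

/* Store the value in the given scalar, then push the scalar's address. */
void toy_push_iv(ToyStack *st, ToySV *sv, long value)
{
    sv->iv = value;
    st->slot[st->top++] = sv;
}

/* Push 10 and 20 through one shared target; report what slot 0 reads. */
long toy_shared_target_demo(void)
{
    ToyStack st = { { NULL }, 0 };
    ToySV targ = { 0 };
    toy_push_iv(&st, &targ, 10);
    toy_push_iv(&st, &targ, 20);    /* also changes what slot 0 points at */
    return st.slot[0]->iv;
}

/* Same pushes, but each value gets a scalar of its own. */
long toy_fresh_scalars_demo(void)
{
    ToyStack st = { { NULL }, 0 };
    ToySV first = { 0 }, second = { 0 };
    toy_push_iv(&st, &first, 10);
    toy_push_iv(&st, &second, 20);
    return st.slot[0]->iv;
}
```

With the shared target the first slot ends up reading 20; with separate scalars it still reads 10.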
1554
1555 If you need to push multiple different values then you should either
1556 use the "(X)PUSHs" macros, or else use the new "m(X)PUSH[iunp]" macros,
1557 none of which make use of "TARG". The "(X)PUSHs" macros simply push an
1558 SV* on the stack, which, as noted under "XSUBs and the Argument Stack",
1559 will often need to be "mortal". The new "m(X)PUSH[iunp]" macros make
1560 this a little easier to achieve by creating a new mortal for you (via
1561 "(X)PUSHmortal"), pushing that onto the stack (extending it if neces‐
1562 sary in the case of the "mXPUSH[iunp]" macros), and then setting its
1563 value. Thus, instead of writing this to "fix" the example above:
1564
XPUSHs(sv_2mortal(newSViv(10)));
XPUSHs(sv_2mortal(newSViv(20)));
1567
1568 you can simply write:
1569
mXPUSHi(10);
mXPUSHi(20);
1572
1573 On a related note, if you do use "(X)PUSH[iunp]", then you're going to
1574 need a "dTARG" in your variable declarations so that the "*PUSH*"
1575 macros can make use of the local variable "TARG". See also "dTARGET"
1576 and "dXSTARG".
1577
1578 Scratchpads
1579
The question remains of when the SVs which are targets for opcodes are
created. The answer is that they are created when the current unit (a
subroutine, or a file for opcodes belonging to statements outside of
subroutines) is compiled. During this time a special anonymous Perl
array is created, which is called a scratchpad for the current unit.
1585
A scratchpad keeps SVs which are lexicals for the current unit and are
targets for opcodes. One can deduce that an SV lives on a scratchpad by
looking at its flags: lexicals have "SVs_PADMY" set, and targets have
"SVs_PADTMP" set.
1590
1591 The correspondence between OPs and targets is not 1-to-1. Different OPs
1592 in the compile tree of the unit can use the same target, if this would
1593 not conflict with the expected life of the temporary.
1594
1595 Scratchpads and recursion
1596
Strictly speaking, a compiled unit does not contain a pointer to the
scratchpad AV. It contains a pointer to an AV of (initially) one
element, and this element is the scratchpad AV. Why do we need an extra
level of indirection?
1601
The answer is recursion, and maybe threads. Both of these can create
several execution pointers going into the same subroutine. So that the
child subroutine does not overwrite the temporaries of the parent
subroutine (whose lifespan covers the call to the child), the parent
and the child should have different scratchpads. (And the lexicals
should be separate anyway!)
1608
So each subroutine is born with an array of scratchpads (of length 1).
On each entry to the subroutine it is checked that the current depth of
the recursion is not more than the length of this array, and if it is,
a new scratchpad is created and pushed into the array.
1613
1614 The targets on this scratchpad are "undef"s, but they are already
1615 marked with correct flags.
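
The "one scratchpad per recursion depth" idea can be sketched in a few lines of plain C. Everything here (ToyPad, ToyPadList, toy_pad_for_depth, toy_padlist_free) is hypothetical and only models the bookkeeping; unlike a real subroutine, the toy list starts empty and creates even its first pad on demand.

```c
#include <stdlib.h>

#define TOY_PAD_SLOTS 4

typedef struct { long slot[TOY_PAD_SLOTS]; } ToyPad;

typedef struct {
    ToyPad **pads;      /* pads[d - 1] is the pad used at recursion depth d */
    size_t   npads;     /* how many pads exist so far */
} ToyPadList;

/* Return the pad for a 1-based recursion depth, creating zero-filled pads
 * until the list is long enough. Returns NULL on depth 0 or on allocation
 * failure. */
ToyPad *toy_pad_for_depth(ToyPadList *pl, size_t depth)
{
    if (depth == 0)
        return NULL;
    while (pl->npads < depth) {
        ToyPad **grown = realloc(pl->pads, (pl->npads + 1) * sizeof *grown);
        if (grown == NULL)
            return NULL;
        pl->pads = grown;
        pl->pads[pl->npads] = calloc(1, sizeof(ToyPad));
        if (pl->pads[pl->npads] == NULL)
            return NULL;
        pl->npads++;
    }
    return pl->pads[depth - 1];
}

/* Release every pad and reset the list to empty. */
void toy_padlist_free(ToyPadList *pl)
{
    for (size_t i = 0; i < pl->npads; i++)
        free(pl->pads[i]);
    free(pl->pads);
    pl->pads = NULL;
    pl->npads = 0;
}
```

A recursive call at depth 2 gets its own pad, so writing a temporary there leaves the depth 1 pad untouched.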
1616
1618 Code tree
1619
1620 Here we describe the internal form your code is converted to by Perl.
1621 Start with a simple example:
1622
1623 $a = $b + $c;
1624
1625 This is converted to a tree similar to this one:
1626
         assign-to
       /           \
      +             $a
    /   \
  $b     $c
1632
1633 (but slightly more complicated). This tree reflects the way Perl
1634 parsed your code, but has nothing to do with the execution order.
1635 There is an additional "thread" going through the nodes of the tree
1636 which shows the order of execution of the nodes. In our simplified
1637 example above it looks like:
1638
1639 $b ---> $c ---> + ---> $a ---> assign-to
1640
But with the actual compile tree for "$a = $b + $c" it is different:
some nodes are optimized away. As a corollary, though the actual tree
contains more nodes than our simplified example, the execution order is
the same as in our example.
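
To make the two kinds of link concrete, here is a small stand-alone model of the simplified tree. ToyOp and its fields are invented for this sketch (they only loosely echo the real op structures described later); the tree links record how the code was parsed, while the "next" link is the execution-order thread.

```c
#include <stddef.h>

typedef struct ToyOp {
    const char   *name;
    struct ToyOp *next;     /* execution-order thread */
    struct ToyOp *first;    /* first child in the parse tree */
    struct ToyOp *sibling;  /* next child of the same parent */
} ToyOp;

/* Fill nodes[0..4] with the simplified tree for "$a = $b + $c" and return
 * the op at which execution starts. */
ToyOp *toy_build_example(ToyOp nodes[5])
{
    ToyOp *b = &nodes[0], *c = &nodes[1], *add = &nodes[2],
          *a = &nodes[3], *assign = &nodes[4];

    /*                 name         next    first  sibling */
    *b      = (ToyOp){ "$b",        c,      NULL,  c    };
    *c      = (ToyOp){ "$c",        add,    NULL,  NULL };
    *add    = (ToyOp){ "+",         a,      b,     a    };
    *a      = (ToyOp){ "$a",        assign, NULL,  NULL };
    *assign = (ToyOp){ "assign-to", NULL,   add,   NULL };
    return b;
}

/* Follow the thread from start, recording up to max op names in order.
 * Returns how many names were stored. */
size_t toy_exec_order(const ToyOp *start, const char **names, size_t max)
{
    size_t n = 0;
    for (const ToyOp *op = start; op != NULL && n < max; op = op->next)
        names[n++] = op->name;
    return n;
}
```

Walking the thread visits the five ops in the order given in the text, even though the root of the tree ("assign-to") comes last.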
1645
1646 Examining the tree
1647
If you have your perl compiled for debugging (usually done with
"-DDEBUGGING" on the "Configure" command line), you may examine the
compiled tree by specifying "-Dx" on the Perl command line. The output
takes several lines per node, and for "$b+$c" it looks like this:
1652
    5           TYPE = add  ===> 6
                TARG = 1
                FLAGS = (SCALAR,KIDS)
                {
                    TYPE = null  ===> (4)
                      (was rv2sv)
                    FLAGS = (SCALAR,KIDS)
                    {
    3                   TYPE = gvsv  ===> 4
                        FLAGS = (SCALAR)
                        GV = main::b
                    }
                }
                {
                    TYPE = null  ===> (5)
                      (was rv2sv)
                    FLAGS = (SCALAR,KIDS)
                    {
    4                   TYPE = gvsv  ===> 5
                        FLAGS = (SCALAR)
                        GV = main::c
                    }
                }
1676
This tree has 5 nodes (one per "TYPE" specifier), but only 3 of them
are not optimized away (one per number in the left column). The
immediate children of a given node correspond to "{}" pairs on the same
level of indentation, thus this listing corresponds to the tree:
1681
           add
         /     \
       null    null
        |       |
       gvsv    gvsv
1687
1688 The execution order is indicated by "===>" marks, thus it is "3 4 5 6"
1689 (node 6 is not included into above listing), i.e., "gvsv gvsv add what‐
1690 ever".
1691
1692 Each of these nodes represents an op, a fundamental operation inside
1693 the Perl core. The code which implements each operation can be found in
1694 the pp*.c files; the function which implements the op with type "gvsv"
1695 is "pp_gvsv", and so on. As the tree above shows, different ops have
1696 different numbers of children: "add" is a binary operator, as one would
1697 expect, and so has two children. To accommodate the various different
1698 numbers of children, there are various types of op data structure, and
1699 they link together in different ways.
1700
1701 The simplest type of op structure is "OP": this has no children. Unary
1702 operators, "UNOP"s, have one child, and this is pointed to by the
1703 "op_first" field. Binary operators ("BINOP"s) have not only an
1704 "op_first" field but also an "op_last" field. The most complex type of
1705 op is a "LISTOP", which has any number of children. In this case, the
1706 first child is pointed to by "op_first" and the last child by
1707 "op_last". The children in between can be found by iteratively follow‐
1708 ing the "op_sibling" pointer from the first child to the last.
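
A minimal sketch of that linkage, in plain C with made-up names (KidOp, kid_append, kid_count): the parent knows only its first and last child, and everything in between is reached through the sibling chain.

```c
#include <stddef.h>

/* Toy node: just the three links discussed above, nothing else. */
typedef struct KidOp {
    struct KidOp *first;    /* first child, or NULL */
    struct KidOp *last;     /* last child, or NULL */
    struct KidOp *sibling;  /* next child of the same parent, or NULL */
} KidOp;

/* Append kid as the new last child of parent. */
void kid_append(KidOp *parent, KidOp *kid)
{
    kid->sibling = NULL;
    if (parent->last != NULL)
        parent->last->sibling = kid;
    else
        parent->first = kid;
    parent->last = kid;
}

/* Count children by walking the sibling chain from the first child. */
size_t kid_count(const KidOp *parent)
{
    size_t n = 0;
    for (const KidOp *k = parent->first; k != NULL; k = k->sibling)
        n++;
    return n;
}
```

A node with no children has both links NULL, a unary-style node has first equal to last, and a list-style node can grow without the parent storing more than two pointers.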
1709
1710 There are also two other op types: a "PMOP" holds a regular expression,
1711 and has no children, and a "LOOP" may or may not have children. If the
1712 "op_children" field is non-zero, it behaves like a "LISTOP". To compli‐
1713 cate matters, if a "UNOP" is actually a "null" op after optimization
1714 (see "Compile pass 2: context propagation") it will still have children
1715 in accordance with its former type.
1716
1717 Another way to examine the tree is to use a compiler back-end module,
1718 such as B::Concise.
1719
1720 Compile pass 1: check routines
1721
1722 The tree is created by the compiler while yacc code feeds it the con‐
1723 structions it recognizes. Since yacc works bottom-up, so does the first
1724 pass of perl compilation.
1725
1726 What makes this pass interesting for perl developers is that some opti‐
1727 mization may be performed on this pass. This is optimization by so-
1728 called "check routines". The correspondence between node names and
1729 corresponding check routines is described in opcode.pl (do not forget
1730 to run "make regen_headers" if you modify this file).
1731
A check routine is called when the node is fully constructed except for
the execution-order thread. Since at this time there are no back-links
to the currently constructed node, one can do almost any operation to
the top-level node, including freeing it and/or creating new nodes
above/below it.
1737
1738 The check routine returns the node which should be inserted into the
1739 tree (if the top-level node was not modified, check routine returns its
1740 argument).
1741
1742 By convention, check routines have names "ck_*". They are usually
1743 called from "new*OP" subroutines (or "convert") (which in turn are
1744 called from perly.y).
1745
1746 Compile pass 1a: constant folding
1747
1748 Immediately after the check routine is called the returned node is
1749 checked for being compile-time executable. If it is (the value is
1750 judged to be constant) it is immediately executed, and a constant node
1751 with the "return value" of the corresponding subtree is substituted
1752 instead. The subtree is deleted.
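
The idea can be shown on a toy expression tree. This is only a sketch: FoldNode and fold_check are hypothetical, the only foldable operation is addition, and the "deleted" subtree is simply unlinked rather than freed. It is not how the Perl compiler is written, just the same bottom-up pattern.

```c
#include <stddef.h>

typedef enum { FOLD_CONST, FOLD_VAR, FOLD_ADD } FoldType;

typedef struct FoldNode {
    FoldType type;
    long value;                     /* meaningful for FOLD_CONST only */
    struct FoldNode *left, *right;  /* meaningful for FOLD_ADD only */
} FoldNode;

/* Called right after a node has been built (children are already final):
 * if both operands of an addition are constants, compute the sum now and
 * turn the node itself into a constant. Returns the node to keep. */
FoldNode *fold_check(FoldNode *node)
{
    if (node->type == FOLD_ADD
        && node->left->type == FOLD_CONST
        && node->right->type == FOLD_CONST)
    {
        node->value = node->left->value + node->right->value;
        node->type  = FOLD_CONST;
        node->left  = node->right = NULL;   /* the subtree is dropped */
    }
    return node;
}
```

Because construction is bottom-up, folding "2 + 3" first means an enclosing "(2 + 3) + 2" sees two constants and folds as well, whereas "(2 + 3) + $x" stays an addition.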
1753
1754 If constant folding was not performed, the execution-order thread is
1755 created.
1756
1757 Compile pass 2: context propagation
1758
1759 When a context for a part of compile tree is known, it is propagated
1760 down through the tree. At this time the context can have 5 values
1761 (instead of 2 for runtime context): void, boolean, scalar, list, and
1762 lvalue. In contrast with the pass 1 this pass is processed from top to
1763 bottom: a node's context determines the context for its children.
1764
Additional context-dependent optimizations are performed at this time.
Since at this moment the compile tree contains back-references (via
"thread" pointers), nodes cannot be free()d now. To allow nodes to be
optimized away at this stage, such nodes are null()ified instead of
being free()d (i.e. their type is changed to OP_NULL).
1770
1771 Compile pass 3: peephole optimization
1772
After the compile tree for a subroutine (or for an "eval" or a file) is
created, an additional pass over the code is performed. This pass is
neither top-down nor bottom-up, but in the execution order (with
additional complications for conditionals). These optimizations are done
in the subroutine peep(). Optimizations performed at this stage are
subject to the same restrictions as in pass 2.
1779
1780 Pluggable runops
1781
1782 The compile tree is executed in a runops function. There are two
1783 runops functions, in run.c and in dump.c. "Perl_runops_debug" is used
1784 with DEBUGGING and "Perl_runops_standard" is used otherwise. For fine
1785 control over the execution of the compile tree it is possible to pro‐
1786 vide your own runops function.
1787
1788 It's probably best to copy one of the existing runops functions and
1789 change it to suit your needs. Then, in the BOOT section of your XS
1790 file, add the line:
1791
1792 PL_runops = my_runops;
1793
1794 This function should be as efficient as possible to keep your programs
1795 running as fast as possible.
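
To show the shape of such a function without pulling in the Perl headers, here is a toy dispatch loop in plain C. RunOp, pp_toy_add, the two toy_runops_* functions and the toy_runops hook are all invented for this sketch; the hook merely plays the part that PL_runops plays above, and the counting variant stands in for whatever extra control your own function adds.

```c
#include <stddef.h>

typedef struct RunOp RunOp;
typedef RunOp *(*RunFn)(RunOp *self, long *acc);

struct RunOp {
    RunFn  ppaddr;  /* code implementing this op; returns the op to run next */
    RunOp *next;    /* execution-order thread */
    long   arg;
};

/* A sample op: add this op's argument to the accumulator. */
RunOp *pp_toy_add(RunOp *self, long *acc)
{
    *acc += self->arg;
    return self->next;
}

/* Plain loop: keep calling ops until one of them returns NULL. */
void toy_runops_standard(RunOp *op, long *acc)
{
    while (op != NULL)
        op = op->ppaddr(op, acc);
}

unsigned long toy_ops_executed = 0;

/* Customised loop: same dispatch, plus a count of ops executed. */
void toy_runops_counting(RunOp *op, long *acc)
{
    while (op != NULL) {
        toy_ops_executed++;
        op = op->ppaddr(op, acc);
    }
}

/* Replaceable hook through which all execution is started. */
void (*toy_runops)(RunOp *, long *) = toy_runops_standard;
```

Swapping the hook changes how every subsequent run is driven, without touching the ops themselves. Anything added inside the loop is paid for on every single op, which is why the loop should stay lean.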
1796
1798 To aid debugging, the source file dump.c contains a number of functions
1799 which produce formatted output of internal data structures.
1800
1801 The most commonly used of these functions is "Perl_sv_dump"; it's used
1802 for dumping SVs, AVs, HVs, and CVs. The "Devel::Peek" module calls
1803 "sv_dump" to produce debugging output from Perl-space, so users of that
1804 module should already be familiar with its format.
1805
1806 "Perl_op_dump" can be used to dump an "OP" structure or any of its de‐
1807 rivatives, and produces output similar to "perl -Dx"; in fact,
1808 "Perl_dump_eval" will dump the main root of the code being evaluated,
1809 exactly like "-Dx".
1810
1811 Other useful functions are "Perl_dump_sub", which turns a "GV" into an
1812 op tree, "Perl_dump_packsubs" which calls "Perl_dump_sub" on all the
1813 subroutines in a package like so: (Thankfully, these are all xsubs, so
1814 there is no op tree)
1815
1816 (gdb) print Perl_dump_packsubs(PL_defstash)
1817
1818 SUB attributes::bootstrap = (xsub 0x811fedc 0)
1819
1820 SUB UNIVERSAL::can = (xsub 0x811f50c 0)
1821
1822 SUB UNIVERSAL::isa = (xsub 0x811f304 0)
1823
1824 SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0)
1825
1826 SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0)
1827
1828 and "Perl_dump_all", which dumps all the subroutines in the stash and
1829 the op tree of the main root.
1830
1832 Background and PERL_IMPLICIT_CONTEXT
1833
1834 The Perl interpreter can be regarded as a closed box: it has an API for
1835 feeding it code or otherwise making it do things, but it also has func‐
1836 tions for its own use. This smells a lot like an object, and there are
1837 ways for you to build Perl so that you can have multiple interpreters,
1838 with one interpreter represented either as a C structure, or inside a
1839 thread-specific structure. These structures contain all the context,
1840 the state of that interpreter.
1841
Two macros control the major Perl build flavors: MULTIPLICITY and
USE_5005THREADS. The MULTIPLICITY build has a C structure that packages
all the interpreter state, and there is a similar thread-specific data
structure under USE_5005THREADS. In both cases, PERL_IMPLICIT_CONTEXT
is also normally defined, and enables the support for passing in a
"hidden" first argument that represents whichever of these data
structures is in use.
1849
1850 All this obviously requires a way for the Perl internal functions to be
1851 either subroutines taking some kind of structure as the first argument,
1852 or subroutines taking nothing as the first argument. To enable these
1853 two very different ways of building the interpreter, the Perl source
1854 (as it does in so many other situations) makes heavy use of macros and
1855 subroutine naming conventions.
1856
1857 First problem: deciding which functions will be public API functions
1858 and which will be private. All functions whose names begin "S_" are
1859 private (think "S" for "secret" or "static"). All other functions
1860 begin with "Perl_", but just because a function begins with "Perl_"
1861 does not mean it is part of the API. (See "Internal Functions".) The
1862 easiest way to be sure a function is part of the API is to find its
1863 entry in perlapi. If it exists in perlapi, it's part of the API. If
1864 it doesn't, and you think it should be (i.e., you need it for your
1865 extension), send mail via perlbug explaining why you think it should
1866 be.
1867
1868 Second problem: there must be a syntax so that the same subroutine dec‐
1869 larations and calls can pass a structure as their first argument, or
1870 pass nothing. To solve this, the subroutines are named and declared in
1871 a particular way. Here's a typical start of a static function used
1872 within the Perl guts:
1873
1874 STATIC void
1875 S_incline(pTHX_ char *s)
1876
1877 STATIC becomes "static" in C, and may be #define'd to nothing in some
1878 configurations in future.
1879
1880 A public function (i.e. part of the internal API, but not necessarily
1881 sanctioned for use in extensions) begins like this:
1882
1883 void
1884 Perl_sv_setiv(pTHX_ SV* dsv, IV num)
1885
1886 "pTHX_" is one of a number of macros (in perl.h) that hide the details
1887 of the interpreter's context. THX stands for "thread", "this", or
1888 "thingy", as the case may be. (And no, George Lucas is not involved.
1889 :-) The first character could be 'p' for a prototype, 'a' for argument,
1890 or 'd' for declaration, so we have "pTHX", "aTHX" and "dTHX", and their
1891 variants.
1892
1893 When Perl is built without options that set PERL_IMPLICIT_CONTEXT,
1894 there is no first argument containing the interpreter's context. The
1895 trailing underscore in the pTHX_ macro indicates that the macro expan‐
1896 sion needs a comma after the context argument because other arguments
1897 follow it. If PERL_IMPLICIT_CONTEXT is not defined, pTHX_ will be
1898 ignored, and the subroutine is not prototyped to take the extra argu‐
1899 ment. The form of the macro without the trailing underscore is used
1900 when there are no additional explicit arguments.
1901
1902 When a core function calls another, it must pass the context. This is
1903 normally hidden via macros. Consider "sv_setiv". It expands into
1904 something like this:
1905
1906 #ifdef PERL_IMPLICIT_CONTEXT
1907 #define sv_setiv(a,b) Perl_sv_setiv(aTHX_ a, b)
1908 /* can't do this for vararg functions, see below */
1909 #else
1910 #define sv_setiv Perl_sv_setiv
1911 #endif
1912
1913 This works well, and means that XS authors can gleefully write:
1914
1915 sv_setiv(foo, bar);
1916
1917 and still have it work under all the modes Perl could have been com‐
1918 piled with.
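
The macro trick is easier to see in isolation. The following stand-alone sketch re-creates it with invented names (TOY_IMPLICIT_CONTEXT, pTOY_, aTOY_, ToyInterp, Toy_set and so on); none of these exist in Perl, and the real definitions in perl.h are more involved. It assumes, as the Perl macros do, that the context variable has a fixed name (here "my_toy") in every scope that makes a call.

```c
/* Toy reimplementation of the scheme; none of these names are Perl's. */
#define TOY_IMPLICIT_CONTEXT 1  /* remove to build the context-free flavour */

typedef struct { long calls; long last; } ToyInterp;

#ifdef TOY_IMPLICIT_CONTEXT
#  define pTOY    ToyInterp *my_toy   /* prototype, no further arguments */
#  define pTOY_   pTOY,               /* prototype, more arguments follow */
#  define aTOY    my_toy              /* argument, no further arguments */
#  define aTOY_   aTOY,               /* argument, more arguments follow */
#  define TOY_CTX (*my_toy)
#else
   static ToyInterp toy_global;       /* the single, global interpreter */
#  define pTOY    void
#  define pTOY_
#  define aTOY
#  define aTOY_
#  define TOY_CTX toy_global
#endif

/* The "public" function, declared with the hidden first parameter. */
void Toy_set(pTOY_ long value)
{
    TOY_CTX.calls++;
    TOY_CTX.last = value;
}

/* The short name callers actually write. */
#define toy_set(v) Toy_set(aTOY_ v)

/* A caller that already has the context in scope just uses short names. */
void toy_set_twice(pTOY_ long value)
{
    toy_set(value);
    toy_set(value * 2);
}
```

The body of toy_set_twice is identical in both flavours; only the macro expansions differ.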
1919
1920 This doesn't work so cleanly for varargs functions, though, as macros
1921 imply that the number of arguments is known in advance. Instead we
1922 either need to spell them out fully, passing "aTHX_" as the first argu‐
1923 ment (the Perl core tends to do this with functions like Perl_warner),
1924 or use a context-free version.
1925
The context-free version of Perl_warner is called
Perl_warner_nocontext, and does not take the extra argument. Instead it
does dTHX; to get the context from thread-local storage. We "#define
warner Perl_warner_nocontext" so that extensions get source
compatibility at the expense of performance. (Passing an arg is cheaper
than grabbing it from thread-local storage.)
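
Here is the same pattern in miniature, as a sketch with hypothetical names (WarnInterp, Warn_warner, Warn_warner_nocontext, warn_set_context). A plain static pointer stands in for thread-local storage, and the "warning" is just formatted into a buffer so the effect is easy to inspect.

```c
#include <stdarg.h>
#include <stdio.h>

typedef struct { char last_warning[128]; } WarnInterp;

static WarnInterp *warn_current;    /* stand-in for thread-local storage */

/* Must be called before the context-free variant is used. */
void warn_set_context(WarnInterp *interp)
{
    warn_current = interp;
}

/* Shared worker that takes an already-started va_list. */
static void warn_vformat(WarnInterp *interp, const char *fmt, va_list ap)
{
    vsnprintf(interp->last_warning, sizeof interp->last_warning, fmt, ap);
}

/* Full form: the context is spelled out as the first argument. */
void Warn_warner(WarnInterp *interp, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    warn_vformat(interp, fmt, ap);
    va_end(ap);
}

/* Context-free form: looks the context up itself on every call. */
void Warn_warner_nocontext(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    warn_vformat(warn_current, fmt, ap);
    va_end(ap);
}

/* A plain rename works for any number of arguments, unlike a
 * function-like macro that would have to splice the context in. */
#define warn_warner Warn_warner_nocontext
```

The rename costs a lookup per call, which is the trade-off described above.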
1932
1933 You can ignore [pad]THXx when browsing the Perl headers/sources. Those
1934 are strictly for use within the core. Extensions and embedders need
1935 only be aware of [pad]THX.
1936
1937 So what happened to dTHR?
1938
1939 "dTHR" was introduced in perl 5.005 to support the older thread model.
1940 The older thread model now uses the "THX" mechanism to pass context
1941 pointers around, so "dTHR" is not useful any more. Perl 5.6.0 and
1942 later still have it for backward source compatibility, but it is
1943 defined to be a no-op.
1944
1945 How do I use all this in extensions?
1946
1947 When Perl is built with PERL_IMPLICIT_CONTEXT, extensions that call any
1948 functions in the Perl API will need to pass the initial context argu‐
1949 ment somehow. The kicker is that you will need to write it in such a
1950 way that the extension still compiles when Perl hasn't been built with
1951 PERL_IMPLICIT_CONTEXT enabled.
1952
1953 There are three ways to do this. First, the easy but inefficient way,
1954 which is also the default, in order to maintain source compatibility
1955 with extensions: whenever XSUB.h is #included, it redefines the aTHX
1956 and aTHX_ macros to call a function that will return the context.
1957 Thus, something like:
1958
1959 sv_setiv(sv, num);
1960
1961 in your extension will translate to this when PERL_IMPLICIT_CONTEXT is
1962 in effect:
1963
1964 Perl_sv_setiv(Perl_get_context(), sv, num);
1965
1966 or to this otherwise:
1967
1968 Perl_sv_setiv(sv, num);
1969
1970 You have to do nothing new in your extension to get this; since the
1971 Perl library provides Perl_get_context(), it will all just work.
1972
1973 The second, more efficient way is to use the following template for
1974 your Foo.xs:
1975
1976 #define PERL_NO_GET_CONTEXT /* we want efficiency */
1977 #include "EXTERN.h"
1978 #include "perl.h"
1979 #include "XSUB.h"
1980
static SV *my_private_function(int arg1, int arg2);

static SV *
my_private_function(int arg1, int arg2)
{
    dTHX;    /* fetch context */
    ... call many Perl API functions ...
}
1989
1990 [... etc ...]
1991
1992 MODULE = Foo PACKAGE = Foo
1993
1994 /* typical XSUB */
1995
1996 void
1997 my_xsub(arg)
1998 int arg
1999 CODE:
2000 my_private_function(arg, 10);
2001
Note that the only two changes from the normal way of writing an
extension are the addition of a "#define PERL_NO_GET_CONTEXT" before
including the Perl headers, and a "dTHX;" declaration at the start of
every function that will call the Perl API. (You'll know which
functions need this, because the C compiler will complain that there's
an undeclared identifier in those functions.) No changes are needed for
the XSUBs themselves, because the XS() macro is correctly defined to
pass in the implicit context if needed.
2010
2011 The third, even more efficient way is to ape how it is done within the
2012 Perl guts:
2013
2014 #define PERL_NO_GET_CONTEXT /* we want efficiency */
2015 #include "EXTERN.h"
2016 #include "perl.h"
2017 #include "XSUB.h"
2018
/* pTHX_ only needed for functions that call Perl API */
static SV *my_private_function(pTHX_ int arg1, int arg2);

static SV *
my_private_function(pTHX_ int arg1, int arg2)
{
    /* dTHX; not needed here, because THX is an argument */
    ... call Perl API functions ...
}
2028
2029 [... etc ...]
2030
2031 MODULE = Foo PACKAGE = Foo
2032
2033 /* typical XSUB */
2034
2035 void
2036 my_xsub(arg)
2037 int arg
2038 CODE:
2039 my_private_function(aTHX_ arg, 10);
2040
2041 This implementation never has to fetch the context using a function
2042 call, since it is always passed as an extra argument. Depending on
2043 your needs for simplicity or efficiency, you may mix the previous two
2044 approaches freely.
2045
Never add a comma after "pTHX" yourself. Always use the form of the
macro with the underscore for functions that take explicit arguments,
or the form without the underscore for functions with no explicit
arguments.
2050
2051 Should I do anything special if I call perl from multiple threads?
2052
2053 If you create interpreters in one thread and then proceed to call them
2054 in another, you need to make sure perl's own Thread Local Storage (TLS)
2055 slot is initialized correctly in each of those threads.
2056
2057 The "perl_alloc" and "perl_clone" API functions will automatically set
2058 the TLS slot to the interpreter they created, so that there is no need
2059 to do anything special if the interpreter is always accessed in the
2060 same thread that created it, and that thread did not create or call any
2061 other interpreters afterwards. If that is not the case, you have to
2062 set the TLS slot of the thread before calling any functions in the Perl
2063 API on that particular interpreter. This is done by calling the
2064 "PERL_SET_CONTEXT" macro in that thread as the first thing you do:
2065
2066 /* do this before doing anything else with some_perl */
2067 PERL_SET_CONTEXT(some_perl);
2068
2069 ... other Perl API calls on some_perl go here ...
2070
2071 Future Plans and PERL_IMPLICIT_SYS
2072
2073 Just as PERL_IMPLICIT_CONTEXT provides a way to bundle up everything
2074 that the interpreter knows about itself and pass it around, so too are
2075 there plans to allow the interpreter to bundle up everything it knows
2076 about the environment it's running on. This is enabled with the
2077 PERL_IMPLICIT_SYS macro. Currently it only works with USE_ITHREADS and
2078 USE_5005THREADS on Windows (see inside iperlsys.h).
2079
This makes it possible to provide an extra pointer (called the "host"
environment) for all the system calls, so that the system-level code
can maintain its own state, broken down into seven C structures. These
are thin wrappers around the usual system calls (see win32/perllib.c)
for the default perl executable, but for a more ambitious host (like
the one that would do fork() emulation) all the extra work needed to
pretend that different interpreters are actually different "processes"
would be done here.
2088
2089 The Perl engine/interpreter and the host are orthogonal entities.
2090 There could be one or more interpreters in a process, and one or more
2091 "hosts", with free association between them.
2092
Internal Functions

2094 All of Perl's internal functions which will be exposed to the outside
2095 world are prefixed by "Perl_" so that they will not conflict with XS
2096 functions or functions used in a program in which Perl is embedded.
2097 Similarly, all global variables begin with "PL_". (By convention,
2098 static functions start with "S_".)
2099
2100 Inside the Perl core, you can get at the functions either with or with‐
2101 out the "Perl_" prefix, thanks to a bunch of defines that live in
2102 embed.h. This header file is generated automatically from embed.pl and
2103 embed.fnc. embed.pl also creates the prototyping header files for the
2104 internal functions, generates the documentation and a lot of other bits
2105 and pieces. It's important that when you add a new function to the core
2106 or change an existing one, you change the data in the table in
2107 embed.fnc as well. Here's a sample entry from that table:
2108
Apd |SV**   |av_fetch   |AV* ar|I32 key|I32 lval
2110
2111 The second column is the return type, the third column the name. Col‐
2112 umns after that are the arguments. The first column is a set of flags:
2113
A This function is a part of the public API. All such functions should
also have 'd'; very few do not.
2116
2117 p This function has a "Perl_" prefix; i.e. it is defined as
2118 "Perl_av_fetch".
2119
2120 d This function has documentation using the "apidoc" feature which
2121 we'll look at in a second. Some functions have 'd' but not 'A';
2122 docs are good.
2123
2124 Other available flags are:
2125
2126 s This is a static function and is defined as "STATIC S_whatever", and
2127 usually called within the sources as "whatever(...)".
2128
n This does not need an interpreter context, so the definition has no
"pTHX", and it follows that callers don't use "aTHX". (See
"Background and PERL_IMPLICIT_CONTEXT" in perlguts.)
2132
2133 r This function never returns; "croak", "exit" and friends.
2134
2135 f This function takes a variable number of arguments, "printf" style.
2136 The argument list should end with "...", like this:
2137
Afprd |void |croak |const char* pat|...
2139
2140 M This function is part of the experimental development API, and may
2141 change or disappear without notice.
2142
2143 o This function should not have a compatibility macro to define, say,
2144 "Perl_parse" to "parse". It must be called as "Perl_parse".
2145
2146 x This function isn't exported out of the Perl core.
2147
2148 m This is implemented as a macro.
2149
2150 X This function is explicitly exported.
2151
2152 E This function is visible to extensions included in the Perl core.
2153
2154 b Binary backward compatibility; this function is a macro but also has
2155 a "Perl_" implementation (which is exported).
2156
2157 others
2158 See the comments at the top of "embed.fnc" for others.
2159
2160 If you edit embed.pl or embed.fnc, you will need to run "make
2161 regen_headers" to force a rebuild of embed.h and other auto-generated
2162 files.
2163
2164 Formatted Printing of IVs, UVs, and NVs
2165
If you are printing IVs, UVs, or NVs, then instead of the stdio(3) style
formatting codes like %d, %ld, %f, you should use the following macros
for portability:
2169
2170 IVdf IV in decimal
2171 UVuf UV in decimal
2172 UVof UV in octal
2173 UVxf UV in hexadecimal
2174 NVef NV %e-like
2175 NVff NV %f-like
2176 NVgf NV %g-like
2177
2178 These will take care of 64-bit integers and long doubles. For example:
2179
printf("IV is %" IVdf "\n", iv);
2181
2182 The IVdf will expand to whatever is the correct format for the IVs.
2183
If you are printing addresses of pointers, use UVxf combined with
PTR2UV(); do not use %lx or %p.
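
The mechanism is the same string-literal pasting that standard C uses for the PRI* macros in <inttypes.h>. As a sketch that compiles without the Perl headers, the stand-ins below (ToyIV, ToyUV, TOY_IVdf, TOY_UVxf and the two toy_format_* functions are all made up) map the idea onto intmax_t and uintmax_t; the real macros expand to whatever suits the IV and UV types your perl was built with.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical stand-ins; the real IVdf and UVxf come from perl's headers. */
typedef intmax_t  ToyIV;
typedef uintmax_t ToyUV;
#define TOY_IVdf PRIdMAX    /* conversion letters only, no leading '%' */
#define TOY_UVxf PRIxMAX

/* Format an integer: the macro is pasted between two string literals. */
int toy_format_iv(char *buf, size_t size, ToyIV iv)
{
    return snprintf(buf, size, "IV is %" TOY_IVdf, iv);
}

/* Format an address: convert the pointer to an unsigned integer first,
 * then print that integer in hexadecimal. */
int toy_format_addr(char *buf, size_t size, const void *p)
{
    return snprintf(buf, size, "0x%" TOY_UVxf, (ToyUV)(uintptr_t)p);
}
```

Leaving spaces around the macro inside the format string keeps the code valid for C++ compilers too.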
2186
2187 Pointer-To-Integer and Integer-To-Pointer
2188
Because pointer size does not necessarily equal integer size, use the
following macros to do it right.
2191
2192 PTR2UV(pointer)
2193 PTR2IV(pointer)
2194 PTR2NV(pointer)
2195 INT2PTR(pointertotype, integer)
2196
2197 For example:
2198
2199 IV iv = ...;
2200 SV *sv = INT2PTR(SV*, iv);
2201
2202 and
2203
2204 AV *av = ...;
2205 UV uv = PTR2UV(av);
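
For comparison, the standard C way to make the same round trip goes through uintptr_t. The sketch below uses invented names (TOY_PTR2UV, TOY_INT2PTR, ToyThing, toy_stash, toy_unstash) and is only an analogy; in XS code you should use the Perl macros, which are matched to the sizes of IV and UV.

```c
#include <stdint.h>

/* Hypothetical look-alikes built on uintptr_t; not the Perl macros. */
#define TOY_PTR2UV(p)        ((uintptr_t)(p))
#define TOY_INT2PTR(type, i) ((type)(uintptr_t)(i))

typedef struct { int refcnt; } ToyThing;

/* Hide a pointer inside an integer-sized cookie... */
uintptr_t toy_stash(ToyThing *thing)
{
    return TOY_PTR2UV(thing);
}

/* ...and recover the original pointer from the cookie later on. */
ToyThing *toy_unstash(uintptr_t cookie)
{
    return TOY_INT2PTR(ToyThing *, cookie);
}
```

Going through an integer type that is wide enough for a pointer is what makes the round trip safe; a direct cast to a narrower type such as int would not be.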
2206
2207 Source Documentation
2208
2209 There's an effort going on to document the internal functions and auto‐
2210 matically produce reference manuals from them - perlapi is one such
2211 manual which details all the functions which are available to XS writ‐
2212 ers. perlintern is the autogenerated manual for the functions which are
2213 not part of the API and are supposedly for internal use only.
2214
2215 Source documentation is created by putting POD comments into the C
2216 source, like this:
2217
2218 /*
2219 =for apidoc sv_setiv
2220
2221 Copies an integer into the given SV. Does not handle 'set' magic. See
2222 C<sv_setiv_mg>.
2223
2224 =cut
2225 */
2226
2227 Please try and supply some documentation if you add functions to the
2228 Perl core.
2229
2230 Backwards compatibility
2231
2232 The Perl API changes over time. New functions are added or the inter‐
2233 faces of existing functions are changed. The "Devel::PPPort" module
2234 tries to provide compatibility code for some of these changes, so XS
2235 writers don't have to code it themselves when supporting multiple ver‐
2236 sions of Perl.
2237
2238 "Devel::PPPort" generates a C header file ppport.h that can also be run
2239 as a Perl script. To generate ppport.h, run:
2240
2241 perl -MDevel::PPPort -eDevel::PPPort::WriteFile
2242
2243 Besides checking existing XS code, the script can also be used to
2244 retrieve compatibility information for various API calls using the
2245 "--api-info" command line switch. For example:
2246
2247 % perl ppport.h --api-info=sv_magicext
2248
2249 For details, see "perldoc ppport.h".
2250
2252 Perl 5.6.0 introduced Unicode support. It's important for porters and
2253 XS writers to understand this support and make sure that the code they
2254 write does not corrupt Unicode data.
2255
2256 What is Unicode, anyway?
2257
2258 In the olden, less enlightened times, we all used to use ASCII. Most of
2259 us did, anyway. The big problem with ASCII is that it's American. Well,
2260 no, that's not actually the problem; the problem is that it's not par‐
2261 ticularly useful for people who don't use the Roman alphabet. What used
2262 to happen was that particular languages would stick their own alphabet
2263 in the upper range of the sequence, between 128 and 255. Of course, we
2264 then ended up with plenty of variants that weren't quite ASCII, and the
2265 whole point of it being a standard was lost.
2266
2267 Worse still, if you've got a language like Chinese or Japanese that has
2268 hundreds or thousands of characters, then you really can't fit them
2269 into a mere 256, so they had to forget about ASCII altogether, and
2270 build their own systems using pairs of numbers to refer to one charac‐
2271 ter.
2272
2273 To fix this, some people formed Unicode, Inc. and produced a new char‐
2274 acter set containing all the characters you can possibly think of and
2275 more. There are several ways of representing these characters, and the
2276 one Perl uses is called UTF-8. UTF-8 uses a variable number of bytes to
2277 represent a character, instead of just one. You can learn more about
2278 Unicode at http://www.unicode.org/
2279
2280 How can I recognise a UTF-8 string?
2281
You can't. This is because UTF-8 data is stored in bytes just like
non-UTF-8 data. The Unicode character 200 (0xC8 for you hex types),
capital E with a grave accent, is represented by the two bytes
"v195.136". Unfortunately, the non-Unicode string "chr(195).chr(136)"
has that byte sequence as well. So you can't tell just by looking -
this is what makes Unicode input an interesting problem.
2288
2289 The API function "is_utf8_string" can help; it'll tell you if a string
2290 contains only valid UTF-8 characters. However, it can't do the work for
2291 you. On a character-by-character basis, "is_utf8_char" will tell you
2292 whether the current character in a string is valid UTF-8.
2293
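As a rough illustration of what a validity check can and cannot tell
you, here is a minimal standalone sketch in plain C. The helper
"looks_like_utf8" is hypothetical and not part of the Perl API, and it
assumes strict Unicode rules (shortest form only, no surrogates, nothing
above 0x10FFFF); Perl's own "is_utf8_string" may accept or reject
different inputs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a Perl API function): returns 1 if the len
 * bytes at s are well-formed UTF-8 under strict Unicode rules, else 0. */
int looks_like_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL || len == 0);
    while (i < len) {
        unsigned char c = s[i];
        unsigned long cp, min;
        size_t need, k;

        if (c < 0x80) {                 /* plain ASCII byte */
            i++;
            continue;
        }
        /* Lead byte: work out how many continuation bytes must follow
         * and the smallest code point allowed at that length. */
        if ((c & 0xE0) == 0xC0)      { need = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
        else return 0;                  /* stray continuation or bad lead */

        if (len - i - 1 < need)
            return 0;                   /* sequence is cut short */
        for (k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;               /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;                   /* overlong, too big, or surrogate */
        i += need + 1;
    }
    return 1;
}
```

A result of 1 only says the bytes could be UTF-8. A byte string that was
never meant as UTF-8 but happens to contain a valid sequence passes the
same test, which is why the check can't do the work for you.
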
2294 How does UTF-8 represent Unicode characters?
2295
As mentioned above, UTF-8 uses a variable number of bytes to store a
character. Characters with values 0...127 are stored in one byte, just
like good ol' ASCII. Character 128 is stored as "v194.128"; this
continues up to character 191, which is "v194.191". Now we've run out of
bits (191 is binary 10111111) so we move on; 192 is "v195.128". And so
it goes on, moving to three bytes at character 2048.
2302
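The byte layout described above can be checked with a small standalone
encoder. This is only a sketch in plain C; "encode_utf8" is a
hypothetical name, not a Perl API function, and it assumes the standard
Unicode range up to 0x10FFFF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: write code point cp as UTF-8 into out (which must
 * have room for 4 bytes). Returns the number of bytes written, or 0 if
 * cp is above 0x10FFFF. */
size_t encode_utf8(unsigned long cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {                    /* 0...127: one byte, as in ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 128...2047: two bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                 /* 2048...65535: three bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {               /* the rest: four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Feeding it 191 gives the bytes 194, 191; feeding it 192 gives 195, 128;
and 2048 is the first value that needs three bytes.
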
2303 Assuming you know you're dealing with a UTF-8 string, you can find out
2304 how long the first character in it is with the "UTF8SKIP" macro:
2305
    char *utf = "\305\233\340\240\201";
    I32 len;

    len = UTF8SKIP(utf); /* len is 2 here */
    utf += len;
    len = UTF8SKIP(utf); /* len is 3 here */
2312
2313 Another way to skip over characters in a UTF-8 string is to use
2314 "utf8_hop", which takes a string and a number of characters to skip
2315 over. You're on your own about bounds checking, though, so don't use it
2316 lightly.
2317
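If you want the bounds checking done for you, one approach is to carry
an end pointer along. The following standalone sketch in plain C shows
the idea; "skip_len" and "hop_checked" are hypothetical names, not Perl
API functions, and the sketch only knows about sequences of up to four
bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: length in bytes implied by a lead byte. Returns 1
 * for continuation or invalid bytes so that callers always advance. */
size_t skip_len(unsigned char lead)
{
    if (lead < 0x80)           return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Hypothetical helper: advance n characters from s without ever going
 * past end. Returns the new position, or NULL if the buffer holds fewer
 * than n complete characters. */
const unsigned char *hop_checked(const unsigned char *s,
                                 const unsigned char *end, size_t n)
{
    assert(s != NULL && end != NULL && s <= end);
    while (n > 0) {
        size_t len;

        if (s >= end)
            return NULL;                /* ran out of characters */
        len = skip_len(*s);
        if ((size_t)(end - s) < len)
            return NULL;                /* last character is cut short */
        s += len;
        n--;
    }
    return s;
}
```

With the five-byte string "\305\233\340\240\201", hopping one character
lands on byte 2, hopping two lands exactly on the end, and hopping three
returns NULL instead of running off the buffer.
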
All bytes in a multi-byte UTF-8 character will have the high bit set,
so you can test whether a character needs special treatment like this
("UTF8_IS_INVARIANT()" is a macro that tests whether a byte is still
encoded as that same single byte in UTF-8):
2322
    U8 *utf;
    UV uv;      /* Note: a UV, not a U8, not a char */

    if (!UTF8_IS_INVARIANT(*utf))
        /* Must treat this as UTF-8 */
        uv = utf8_to_uv(utf);
    else
        /* OK to treat this character as a byte */
        uv = *utf;
2332
2333 You can also see in that example that we use "utf8_to_uv" to get the
2334 value of the character; the inverse function "uv_to_utf8" is available
2335 for putting a UV into UTF-8:
2336
    if (!UTF8_IS_INVARIANT(uv))
        /* Must treat this as UTF-8 */
        utf8 = uv_to_utf8(utf8, uv);
    else
        /* OK to treat this character as a byte */
        *utf8++ = uv;
2343
You must convert characters to UVs using the above functions if you're
ever in a situation where you have to match UTF-8 and non-UTF-8
characters. You may not skip over UTF-8 characters in this case. If you
do this, you'll lose the ability to match hi-bit non-UTF-8 characters;
for instance, if your UTF-8 string contains "v195.136", and you skip
that character, you can never match a "chr(200)" in a non-UTF-8 string.
So don't do that!
2351
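To make the point concrete, here is a minimal standalone sketch in plain
C that compares a non-UTF-8 byte string with a UTF-8 string by decoding
each UTF-8 character to its value first. "latin1_eq_utf8" is a
hypothetical helper, not a Perl API function, and it assumes the
non-UTF-8 string holds one character per byte with values 0...255.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the byte string b (one character per
 * byte) and the UTF-8 string u contain the same characters, else 0. */
int latin1_eq_utf8(const unsigned char *b, size_t blen,
                   const unsigned char *u, size_t ulen)
{
    size_t i = 0, j = 0;

    assert(b != NULL && u != NULL);
    while (i < blen && j < ulen) {
        unsigned long cp;

        if (u[j] < 0x80) {
            cp = u[j];                  /* invariant: the byte is the value */
            j += 1;
        } else if ((u[j] & 0xE0) == 0xC0 && j + 1 < ulen
                   && (u[j + 1] & 0xC0) == 0x80) {
            /* two-byte character: decode it rather than skipping it */
            cp = ((unsigned long)(u[j] & 0x1F) << 6) | (u[j + 1] & 0x3F);
            j += 2;
        } else {
            return 0;   /* longer or malformed: can't equal a single byte */
        }
        if (cp != b[i])
            return 0;
        i++;
    }
    return i == blen && j == ulen;
}
```

The one-byte string "\310" (chr(200)) matches the UTF-8 bytes
"\303\210", whereas the two-byte non-UTF-8 string "\303\210" does not,
even though its bytes are identical.
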
2352 How does Perl store UTF-8 strings?
2353
2354 Currently, Perl deals with Unicode strings and non-Unicode strings
2355 slightly differently. If a string has been identified as being UTF-8
2356 encoded, Perl will set a flag in the SV, "SVf_UTF8". You can check and
2357 manipulate this flag with the following macros:
2358
    SvUTF8(sv)
    SvUTF8_on(sv)
    SvUTF8_off(sv)
2362
This flag has an important effect on Perl's treatment of the string: if
Unicode data is not properly distinguished, regular expressions,
"length", "substr" and other string handling operations will have
undesirable results.
2367
The problem comes when you have, for instance, a string that isn't
flagged as UTF-8, and contains a byte sequence that could be UTF-8 -
especially when combining non-UTF-8 and UTF-8 strings.
2371
Never forget that the "SVf_UTF8" flag is separate from the PV value; you
need to be sure you don't accidentally knock it off while you're
manipulating SVs. More specifically, you cannot expect to do this:
2375
    SV *sv;
    SV *nsv;
    STRLEN len;
    char *p;

    p = SvPV(sv, len);
    frobnicate(p);
    nsv = newSVpvn(p, len);
2384
2385 The "char*" string does not tell you the whole story, and you can't
2386 copy or reconstruct an SV just by copying the string value. Check if
2387 the old SV has the UTF-8 flag set, and act accordingly:
2388
    p = SvPV(sv, len);
    frobnicate(p);
    nsv = newSVpvn(p, len);
    if (SvUTF8(sv))
        SvUTF8_on(nsv);
2394
2395 In fact, your "frobnicate" function should be made aware of whether or
2396 not it's dealing with UTF-8 data, so that it can handle the string
2397 appropriately.
2398
Copying the data out of an SV that was passed to an XS function is not
enough to copy the UTF-8 flag, so passing just a "char *" to an XS
function is even less adequate.
2402
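Why the flag has to travel with the bytes can be shown without Perl at
all. The following standalone sketch in plain C models a string value as
bytes plus a flag; "FlaggedStr" and the "fs_*" functions are hypothetical
stand-ins for an SV and are not how Perl implements anything.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an SV: string bytes plus a UTF-8 flag. */
typedef struct {
    char  *pv;
    size_t len;
    int    is_utf8;
} FlaggedStr;

/* Build a value from raw bytes only; the flag starts off, just as it
 * would when all you have is a char* and a length. */
FlaggedStr fs_new(const char *p, size_t len)
{
    FlaggedStr s;

    assert(p != NULL);
    s.pv = malloc(len + 1);
    s.len = s.pv ? len : 0;
    s.is_utf8 = 0;
    if (s.pv) {
        memcpy(s.pv, p, len);
        s.pv[len] = '\0';
    }
    return s;
}

/* Copy the bytes and carry the flag across as well. */
FlaggedStr fs_copy(const FlaggedStr *src)
{
    FlaggedStr s = fs_new(src->pv, src->len);

    s.is_utf8 = src->is_utf8;
    return s;
}

/* Length in characters: bytes if unflagged, otherwise every byte that
 * is not a UTF-8 continuation byte. */
size_t fs_length(const FlaggedStr *s)
{
    size_t i, n = 0;

    if (!s->is_utf8)
        return s->len;
    for (i = 0; i < s->len; i++)
        if (((unsigned char)s->pv[i] & 0xC0) != 0x80)
            n++;
    return n;
}
```

A value holding "\303\210" with the flag set has length 1. Rebuild it
from the bytes alone with "fs_new" and the length becomes 2; copy it with
"fs_copy" and it stays 1.
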
2403 How do I convert a string to UTF-8?
2404
If you're mixing UTF-8 and non-UTF-8 strings, you might find it
necessary to upgrade one of the strings to UTF-8. If you've got an SV,
the easiest way to do this is:
2408
2409 sv_utf8_upgrade(sv);
2410
2411 However, you must not do this, for example:
2412
    if (!SvUTF8(left))
        sv_utf8_upgrade(left);
2415
2416 If you do this in a binary operator, you will actually change one of
2417 the strings that came into the operator, and, while it shouldn't be
2418 noticeable by the end user, it can cause problems.
2419
Instead, "bytes_to_utf8" will give you a UTF-8-encoded copy of its
string argument. This is useful for having the data available for
comparisons and so on, without harming the original SV. There's also
"utf8_to_bytes" to go the other way, but naturally, this will fail if
the string contains any characters above 255 that can't be represented
in a single byte.
2426
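The two directions behave quite differently, which a standalone sketch
in plain C can show. The functions below are hypothetical illustrations
and not the Perl API functions named above; their names, signatures and
failure conventions are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: return a freshly malloc'd UTF-8 copy of the
 * one-byte-per-character string s, leaving s untouched. Stores the new
 * length in *outlen. Returns NULL if allocation fails. */
unsigned char *latin1_to_utf8_copy(const unsigned char *s, size_t len,
                                   size_t *outlen)
{
    unsigned char *out, *d;
    size_t i;

    assert(s != NULL && outlen != NULL);
    out = malloc(2 * len + 1);          /* worst case: every byte doubles */
    if (!out)
        return NULL;
    d = out;
    for (i = 0; i < len; i++) {
        if (s[i] < 0x80) {
            *d++ = s[i];
        } else {
            *d++ = (unsigned char)(0xC0 | (s[i] >> 6));
            *d++ = (unsigned char)(0x80 | (s[i] & 0x3F));
        }
    }
    *d = '\0';
    *outlen = (size_t)(d - out);
    return out;
}

/* Hypothetical helper: convert UTF-8 in place to one byte per character.
 * Returns 0 and updates *len on success. Returns -1 and leaves the
 * buffer alone if any character is above 255 or malformed. */
int utf8_to_latin1_inplace(unsigned char *s, size_t *len)
{
    size_t i = 0, n;
    unsigned char *d = s;

    assert(s != NULL && len != NULL);
    n = *len;
    while (i < n) {                     /* first pass: check only */
        if (s[i] < 0x80)
            i++;
        else if ((s[i] == 0xC2 || s[i] == 0xC3) && i + 1 < n
                 && (s[i + 1] & 0xC0) == 0x80)
            i += 2;                     /* 128...255 always start C2 or C3 */
        else
            return -1;
    }
    i = 0;
    while (i < n) {                     /* second pass: rewrite */
        if (s[i] < 0x80) {
            *d++ = s[i++];
        } else {
            *d++ = (unsigned char)(((s[i] & 0x03) << 6) | (s[i + 1] & 0x3F));
            i += 2;
        }
    }
    *len = (size_t)(d - s);
    return 0;
}
```

Going up always works and never touches the original. Going down works
for chr(200) but is refused for a character such as 300, whose UTF-8
bytes are 196, 172.
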
2427 Is there anything else I need to know?
2428
2429 Not really. Just remember these things:
2430
· There's no way to tell if a string is UTF-8 or not. You can tell if
  an SV is UTF-8 by looking at its "SvUTF8" flag. Don't forget to set
  the flag if something should be UTF-8. Treat the flag as part of the
  PV, even though it's not - if you pass on the PV to somewhere, pass
  on the flag too.
2436
2437 · If a string is UTF-8, always use "utf8_to_uv" to get at the value,
2438 unless "UTF8_IS_INVARIANT(*s)" in which case you can use *s.
2439
· When writing a character "uv" to a UTF-8 string, always use
  "uv_to_utf8", unless "UTF8_IS_INVARIANT(uv)" in which case you can
  use "*s = uv".
2443
2444 · Mixing UTF-8 and non-UTF-8 strings is tricky. Use "bytes_to_utf8" to
2445 get a new string which is UTF-8 encoded. There are tricks you can
2446 use to delay deciding whether you need to use a UTF-8 string until
2447 you get to a high character - "HALF_UPGRADE" is one of those.
2448
Custom Operators
Custom operator support is a new experimental feature that allows you
to define your own ops. This is primarily to allow the building of
interpreters for other languages in the Perl core, but it also allows
optimizations through the creation of "macro-ops" (ops which perform
the functions of multiple ops which are usually executed together, such
as "gvsv, gvsv, add").
2456
2457 This feature is implemented as a new op type, "OP_CUSTOM". The Perl
2458 core does not "know" anything special about this op type, and so it
2459 will not be involved in any optimizations. This also means that you can
2460 define your custom ops to be any op structure - unary, binary, list and
2461 so on - you like.
2462
2463 It's important to know what custom operators won't do for you. They
2464 won't let you add new syntax to Perl, directly. They won't even let you
2465 add new keywords, directly. In fact, they won't change the way Perl
2466 compiles a program at all. You have to do those changes yourself, after
2467 Perl has compiled the program. You do this either by manipulating the
2468 op tree using a "CHECK" block and the "B::Generate" module, or by
2469 adding a custom peephole optimizer with the "optimize" module.
2470
When you do this, you replace ordinary Perl ops with custom ops by
creating ops with the type "OP_CUSTOM" and the "pp_addr" of your own PP
function. This should be defined in XS code, and should look like the
PP ops in "pp_*.c". You are responsible for ensuring that your op takes
the appropriate number of values from the stack, and you are
responsible for adding stack marks if necessary.
2477
You should also "register" your op with the Perl interpreter so that it
can produce sensible error and warning messages. Since it is possible
to have multiple custom ops within the one "logical" op type
"OP_CUSTOM", Perl uses the value of "o->op_ppaddr" as a key into the
"PL_custom_op_descs" and "PL_custom_op_names" hashes. This means you
need to enter a name and description for your op at the appropriate
place in the "PL_custom_op_names" and "PL_custom_op_descs" hashes.
2485
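The idea of using the PP function's address as the lookup key can be
modelled in a few lines of plain C. This is a standalone sketch only:
the "pp_func" type, the fixed-size table and every function name below
are hypothetical, and real code would store entries in the two Perl
hashes named above rather than in an array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the type of a PP function pointer. */
typedef void (*pp_func)(void);

/* One registry entry: the function address is the key. */
typedef struct {
    pp_func     ppaddr;
    const char *name;
    const char *desc;
} CustomOpInfo;

#define MAX_CUSTOM_OPS 16
static CustomOpInfo registry[MAX_CUSTOM_OPS];
static size_t registry_used = 0;

/* Record a name and description for the op implemented by f.
 * Returns 0 on success, -1 if the table is full. */
int register_custom_op(pp_func f, const char *name, const char *desc)
{
    assert(f != NULL && name != NULL && desc != NULL);
    if (registry_used == MAX_CUSTOM_OPS)
        return -1;
    registry[registry_used].ppaddr = f;
    registry[registry_used].name = name;
    registry[registry_used].desc = desc;
    registry_used++;
    return 0;
}

/* Look an op up by its function address; NULL if it was never
 * registered. */
const CustomOpInfo *find_custom_op(pp_func f)
{
    size_t i;

    for (i = 0; i < registry_used; i++)
        if (registry[i].ppaddr == f)
            return &registry[i];
    return NULL;
}

/* Two dummy "ops" sharing the one logical op type. */
void pp_first_demo(void)  { }
void pp_second_demo(void) { }
```

Because both dummy ops would carry the same op type, only the function
address tells them apart, so that is what the lookup has to use.
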
2486 Forthcoming versions of "B::Generate" (version 1.0 and above) should
2487 directly support the creation of custom ops by name.
2488
AUTHORS
2490 Until May 1997, this document was maintained by Jeff Okamoto
2491 <okamoto@corp.hp.com>. It is now maintained as part of Perl itself by
2492 the Perl 5 Porters <perl5-porters@perl.org>.
2493
With lots of help and suggestions from Dean Roehrich, Malcolm Beattie,
Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil
Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer,
Stephen McCamant, and Gurusamy Sarathy.
2498
SEE ALSO
2500 perlapi(1), perlintern(1), perlxs(1), perlembed(1)
2501
2502
2503
2504perl v5.8.8 2006-01-07 PERLGUTS(1)