PERLGUTS(1) Perl Programmers Reference Guide PERLGUTS(1)

6 perlguts - Introduction to the Perl API
7
9 This document attempts to describe how to use the Perl API, as well as
10 to provide some info on the basic workings of the Perl core. It is far
11 from complete and probably contains many errors. Please refer any
12 questions or comments to the author below.
13
15 Datatypes
16 Perl has three typedefs that handle Perl's three main data types:
17
18 SV Scalar Value
19 AV Array Value
20 HV Hash Value
21
22 Each typedef has specific routines that manipulate the various data
23 types.
24
25 What is an "IV"?
26 Perl uses a special typedef IV which is a simple signed integer type
27 that is guaranteed to be large enough to hold a pointer (as well as an
28 integer). Additionally, there is the UV, which is simply an unsigned
29 IV.
30
Perl also uses two special typedefs, I32 and I16, which will always be
at least 32 bits and 16 bits long, respectively. (Again, there are U32
and U16, as well.) They will usually be exactly 32 and 16 bits long,
but on Crays they will both be 64 bits.
35
36 Working with SVs
37 An SV can be created and loaded with one command. There are five types
38 of values that can be loaded: an integer value (IV), an unsigned
39 integer value (UV), a double (NV), a string (PV), and another scalar
40 (SV). ("PV" stands for "Pointer Value". You might think that it is
41 misnamed because it is described as pointing only to strings. However,
42 it is possible to have it point to other things. For example, it could
43 point to an array of UVs. But, using it for non-strings requires care,
44 as the underlying assumption of much of the internals is that PVs are
45 just for strings. Often, for example, a trailing "NUL" is tacked on
46 automatically. The non-string use is documented only in this
47 paragraph.)
48
49 The seven routines are:
50
51 SV* newSViv(IV);
52 SV* newSVuv(UV);
53 SV* newSVnv(double);
54 SV* newSVpv(const char*, STRLEN);
55 SV* newSVpvn(const char*, STRLEN);
56 SV* newSVpvf(const char*, ...);
57 SV* newSVsv(SV*);
58
59 "STRLEN" is an integer type ("Size_t", usually defined as "size_t" in
60 config.h) guaranteed to be large enough to represent the size of any
61 string that perl can handle.
62
In the unlikely case of an SV requiring more complex initialization, you
64 can create an empty SV with newSV(len). If "len" is 0 an empty SV of
65 type NULL is returned, else an SV of type PV is returned with len + 1
66 (for the "NUL") bytes of storage allocated, accessible via SvPVX. In
67 both cases the SV has the undef value.
68
69 SV *sv = newSV(0); /* no storage allocated */
70 SV *sv = newSV(10); /* 10 (+1) bytes of uninitialised storage
71 * allocated */
72
73 To change the value of an already-existing SV, there are eight
74 routines:
75
76 void sv_setiv(SV*, IV);
77 void sv_setuv(SV*, UV);
78 void sv_setnv(SV*, double);
79 void sv_setpv(SV*, const char*);
void sv_setpvn(SV*, const char*, STRLEN);
81 void sv_setpvf(SV*, const char*, ...);
82 void sv_vsetpvfn(SV*, const char*, STRLEN, va_list *,
83 SV **, Size_t, bool *);
84 void sv_setsv(SV*, SV*);
85
86 Notice that you can choose to specify the length of the string to be
87 assigned by using "sv_setpvn", "newSVpvn", or "newSVpv", or you may
88 allow Perl to calculate the length by using "sv_setpv" or by specifying
89 0 as the second argument to "newSVpv". Be warned, though, that Perl
90 will determine the string's length by using "strlen", which depends on
91 the string terminating with a "NUL" character, and not otherwise
92 containing NULs.
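To see why the length-taking routines matter, here is a minimal sketch in plain C that needs no Perl headers. The helper "measured_length" is hypothetical; it only mimics what a "strlen"-based routine such as "sv_setpv" would see when handed a buffer with an embedded NUL.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: the length that a strlen()-based routine would
 * see for a NUL-terminated buffer that really holds real_len bytes. */
static size_t measured_length(const char *buf, size_t real_len)
{
    size_t seen = strlen(buf);      /* stops at the first NUL byte */
    assert(seen <= real_len);       /* buf must be terminated within real_len */
    return seen;
}
```

For the seven-byte buffer "abc\0def" this reports 3, so everything after the embedded NUL would be silently dropped; passing the real length to "sv_setpvn" or "newSVpvn" avoids that.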
93
94 The arguments of "sv_setpvf" are processed like "sprintf", and the
95 formatted output becomes the value.
96
97 "sv_vsetpvfn" is an analogue of "vsprintf", but it allows you to
98 specify either a pointer to a variable argument list or the address and
99 length of an array of SVs. The last argument points to a boolean; on
100 return, if that boolean is true, then locale-specific information has
101 been used to format the string, and the string's contents are therefore
102 untrustworthy (see perlsec). This pointer may be NULL if that
103 information is not important. Note that this function requires you to
104 specify the length of the format.
105
106 The "sv_set*()" functions are not generic enough to operate on values
107 that have "magic". See "Magic Virtual Tables" later in this document.
108
109 All SVs that contain strings should be terminated with a "NUL"
110 character. If it is not "NUL"-terminated there is a risk of core dumps
111 and corruptions from code which passes the string to C functions or
112 system calls which expect a "NUL"-terminated string. Perl's own
113 functions typically add a trailing "NUL" for this reason.
114 Nevertheless, you should be very careful when you pass a string stored
115 in an SV to a C function or system call.
116
117 To access the actual value that an SV points to, you can use the
118 macros:
119
120 SvIV(SV*)
121 SvUV(SV*)
122 SvNV(SV*)
123 SvPV(SV*, STRLEN len)
124 SvPV_nolen(SV*)
125
126 which will automatically coerce the actual scalar type into an IV, UV,
127 double, or string.
128
129 In the "SvPV" macro, the length of the string returned is placed into
130 the variable "len" (this is a macro, so you do not use &len). If you
131 do not care what the length of the data is, use the "SvPV_nolen" macro.
132 Historically the "SvPV" macro with the global variable "PL_na" has been
133 used in this case. But that can be quite inefficient because "PL_na"
134 must be accessed in thread-local storage in threaded Perl. In any
135 case, remember that Perl allows arbitrary strings of data that may both
136 contain NULs and might not be terminated by a "NUL".
137
138 Also remember that C doesn't allow you to safely say "foo(SvPV(s, len),
139 len);". It might work with your compiler, but it won't work for
140 everyone. Break this sort of statement up into separate assignments:
141
142 SV *s;
143 STRLEN len;
144 char *ptr;
145 ptr = SvPV(s, len);
146 foo(ptr, len);
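The same hazard can be reproduced without Perl. The following sketch uses a hypothetical macro "TOY_PV" that, like "SvPV", assigns to its second argument as a side effect; "struct toy_sv", "foo" and "call_foo_safely" are illustrative names, not part of the Perl API.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an SV holding a string; not Perl's real layout. */
struct toy_sv { const char *buf; size_t cur; };

/* Like SvPV, this macro assigns to `len` as a side effect. */
#define TOY_PV(sv, len) ((len) = (sv)->cur, (sv)->buf)

static size_t seen_len;     /* records what the callee received */

static void foo(const char *ptr, size_t len)
{
    assert(ptr != NULL);
    seen_len = len;
}

/* Safe: the assignment to `len` is complete before foo() is called. */
static void call_foo_safely(const struct toy_sv *sv)
{
    size_t len;
    const char *ptr = TOY_PV(sv, len);
    foo(ptr, len);
    /* Unsafe alternative: foo(TOY_PV(sv, len), len);
     * C does not order the evaluation of function arguments, so the
     * second argument may be read before the macro assigns to len. */
}
```

The unsafe form is left in a comment on purpose, since executing it would rely on behaviour the C standard does not define.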
147
148 If you want to know if the scalar value is TRUE, you can use:
149
150 SvTRUE(SV*)
151
152 Although Perl will automatically grow strings for you, if you need to
153 force Perl to allocate more memory for your SV, you can use the macro
154
155 SvGROW(SV*, STRLEN newlen)
156
157 which will determine if more memory needs to be allocated. If so, it
158 will call the function "sv_grow". Note that "SvGROW" can only
159 increase, not decrease, the allocated memory of an SV and that it does
160 not automatically add space for the trailing "NUL" byte (perl's own
161 string functions typically do "SvGROW(sv, len + 1)").
162
163 If you want to write to an existing SV's buffer and set its value to a
164 string, use SvPV_force() or one of its variants to force the SV to be a
165 PV. This will remove any of various types of non-stringness from the
166 SV while preserving the content of the SV in the PV. This can be used,
167 for example, to append data from an API function to a buffer without
168 extra copying:
169
170 (void)SvPVbyte_force(sv, len);
171 s = SvGROW(sv, len + needlen + 1);
/* something that may write up to needlen bytes at s+len, but
actually writes newlen bytes
eg. newlen = read(fd, s + len, needlen);
ignoring errors for these examples
*/
177 s[len + newlen] = '\0';
178 SvCUR_set(sv, len + newlen);
179 SvUTF8_off(sv);
180 SvSETMAGIC(sv);
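The grow-then-write-in-place pattern above can be tried out on a toy buffer in plain C. This is only a sketch under stated assumptions: "struct toy_buf", "toy_grow" and "toy_append" are hypothetical names, "memcpy" stands in for the API call that produces the data, and magic and UTF-8 handling are left out entirely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy string buffer: cur = bytes in use, len = bytes allocated. */
struct toy_buf { char *pv; size_t cur; size_t len; };

/* Grow-only, like SvGROW: never shrinks, adds no room for the NUL. */
static char *toy_grow(struct toy_buf *b, size_t newlen)
{
    if (newlen > b->len) {
        char *p = realloc(b->pv, newlen);
        assert(p != NULL);          /* sketch only: no real error handling */
        b->pv = p;
        b->len = newlen;
    }
    return b->pv;
}

/* Append up to needlen bytes directly into the buffer, no temp copy. */
static void toy_append(struct toy_buf *b, const char *src, size_t needlen)
{
    size_t len = b->cur;
    char *s = toy_grow(b, len + needlen + 1);   /* +1 for the trailing NUL */
    size_t newlen = strlen(src);                /* producer may write less */
    if (newlen > needlen)
        newlen = needlen;
    memcpy(s + len, src, newlen);
    s[len + newlen] = '\0';
    b->cur = len + newlen;                      /* analogous to SvCUR_set */
}
```

Starting from an empty "struct toy_buf b = { NULL, 0, 0 }", two calls to "toy_append" extend the same allocation as long as it is already large enough.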
181
182 If you already have the data in memory or if you want to keep your code
183 simple, you can use one of the sv_cat*() variants, such as sv_catpvn().
184 If you want to insert anywhere in the string you can use sv_insert() or
185 sv_insert_flags().
186
187 If you don't need the existing content of the SV, you can avoid some
188 copying with:
189
190 SvPVCLEAR(sv);
191 s = SvGROW(sv, needlen + 1);
/* something that may write up to needlen bytes at s, but actually
writes newlen bytes
eg. newlen = read(fd, s, needlen);
*/
196 s[newlen] = '\0';
197 SvCUR_set(sv, newlen);
198 SvPOK_only(sv); /* also clears SVf_UTF8 */
199 SvSETMAGIC(sv);
200
201 Again, if you already have the data in memory or want to avoid the
202 complexity of the above, you can use sv_setpvn().
203
204 If you have a buffer allocated with Newx() and want to set that as the
205 SV's value, you can use sv_usepvn_flags(). That has some requirements
206 if you want to avoid perl re-allocating the buffer to fit the trailing
207 NUL:
208
209 Newx(buf, somesize+1, char);
210 /* ... fill in buf ... */
211 buf[somesize] = '\0';
212 sv_usepvn_flags(sv, buf, somesize, SV_SMAGIC | SV_HAS_TRAILING_NUL);
213 /* buf now belongs to perl, don't release it */
214
215 If you have an SV and want to know what kind of data Perl thinks is
216 stored in it, you can use the following macros to check the type of SV
217 you have.
218
219 SvIOK(SV*)
220 SvNOK(SV*)
221 SvPOK(SV*)
222
223 You can get and set the current length of the string stored in an SV
224 with the following macros:
225
226 SvCUR(SV*)
SvCUR_set(SV*, STRLEN val)
228
229 You can also get a pointer to the end of the string stored in the SV
230 with the macro:
231
232 SvEND(SV*)
233
234 But note that these last three macros are valid only if "SvPOK()" is
235 true.
236
If you want to append something to the end of the string stored in an
"SV*", you can use the following functions:
239
240 void sv_catpv(SV*, const char*);
241 void sv_catpvn(SV*, const char*, STRLEN);
242 void sv_catpvf(SV*, const char*, ...);
void sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **,
Size_t, bool *);
245 void sv_catsv(SV*, SV*);
246
247 The first function calculates the length of the string to be appended
248 by using "strlen". In the second, you specify the length of the string
249 yourself. The third function processes its arguments like "sprintf"
250 and appends the formatted output. The fourth function works like
251 "vsprintf". You can specify the address and length of an array of SVs
252 instead of the va_list argument. The fifth function extends the string
253 stored in the first SV with the string stored in the second SV. It
254 also forces the second SV to be interpreted as a string.
255
256 The "sv_cat*()" functions are not generic enough to operate on values
257 that have "magic". See "Magic Virtual Tables" later in this document.
258
259 If you know the name of a scalar variable, you can get a pointer to its
260 SV by using the following:
261
262 SV* get_sv("package::varname", 0);
263
264 This returns NULL if the variable does not exist.
265
266 If you want to know if this variable (or any other SV) is actually
267 "defined", you can call:
268
269 SvOK(SV*)
270
271 The scalar "undef" value is stored in an SV instance called
272 "PL_sv_undef".
273
274 Its address can be used whenever an "SV*" is needed. Make sure that
275 you don't try to compare a random sv with &PL_sv_undef. For example
276 when interfacing Perl code, it'll work correctly for:
277
278 foo(undef);
279
280 But won't work when called as:
281
282 $x = undef;
283 foo($x);
284
So, to repeat: always use SvOK() to check whether an sv is defined.
286
287 Also you have to be careful when using &PL_sv_undef as a value in AVs
288 or HVs (see "AVs, HVs and undefined values").
289
290 There are also the two values "PL_sv_yes" and "PL_sv_no", which contain
291 boolean TRUE and FALSE values, respectively. Like "PL_sv_undef", their
292 addresses can be used whenever an "SV*" is needed.
293
294 Do not be fooled into thinking that "(SV *) 0" is the same as
295 &PL_sv_undef. Take this code:
296
297 SV* sv = (SV*) 0;
298 if (I-am-to-return-a-real-value) {
299 sv = sv_2mortal(newSViv(42));
300 }
301 sv_setsv(ST(0), sv);
302
303 This code tries to return a new SV (which contains the value 42) if it
304 should return a real value, or undef otherwise. Instead it has
305 returned a NULL pointer which, somewhere down the line, will cause a
306 segmentation violation, bus error, or just weird results. Change the
307 zero to &PL_sv_undef in the first line and all will be well.
308
309 To free an SV that you've created, call "SvREFCNT_dec(SV*)". Normally
310 this call is not necessary (see "Reference Counts and Mortality").
311
312 Offsets
313 Perl provides the function "sv_chop" to efficiently remove characters
314 from the beginning of a string; you give it an SV and a pointer to
315 somewhere inside the PV, and it discards everything before the pointer.
316 The efficiency comes by means of a little hack: instead of actually
317 removing the characters, "sv_chop" sets the flag "OOK" (offset OK) to
318 signal to other functions that the offset hack is in effect, and it
319 moves the PV pointer (called "SvPVX") forward by the number of bytes
320 chopped off, and adjusts "SvCUR" and "SvLEN" accordingly. (A portion
321 of the space between the old and new PV pointers is used to store the
322 count of chopped bytes.)
323
324 Hence, at this point, the start of the buffer that we allocated lives
325 at "SvPVX(sv) - SvIV(sv)" in memory and the PV pointer is pointing into
326 the middle of this allocated storage.
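The bookkeeping behind the offset hack can be modelled in a few lines of plain C. This is a simplified sketch, not Perl's implementation: "struct toy_sv", "toy_chop" and "toy_real_start" are hypothetical names, and the chopped-byte count is kept in a separate field rather than inside the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a string SV; field names are illustrative only. */
struct toy_sv {
    char  *pv;      /* visible start of the string (cf. SvPVX) */
    size_t cur;     /* bytes in use                (cf. SvCUR) */
    size_t len;     /* bytes allocated             (cf. SvLEN) */
    size_t offset;  /* bytes chopped so far; 0 means no offset */
};

/* Discard everything before ptr, which must point inside the string. */
static void toy_chop(struct toy_sv *sv, char *ptr)
{
    size_t delta;
    assert(ptr >= sv->pv && ptr <= sv->pv + sv->cur);
    delta = (size_t)(ptr - sv->pv);
    sv->pv     += delta;    /* no bytes are moved */
    sv->cur    -= delta;
    sv->len    -= delta;
    sv->offset += delta;
}

/* The address that must eventually be handed back to the allocator. */
static char *toy_real_start(const struct toy_sv *sv)
{
    return sv->pv - sv->offset;
}
```

Chopping one byte from a nine-byte string held in a ten-byte buffer leaves cur at 8, len at 9 and offset at 1, while "toy_real_start" still yields the original allocation.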
327
328 This is best demonstrated by example. Normally copy-on-write will
prevent the substitution operator from using this hack, but if you
330 can craft a string for which copy-on-write is not possible, you can see
331 it in play. In the current implementation, the final byte of a string
332 buffer is used as a copy-on-write reference count. If the buffer is
333 not big enough, then copy-on-write is skipped. First have a look at an
334 empty string:
335
336 % ./perl -Ilib -MDevel::Peek -le '$a=""; $a .= ""; Dump $a'
337 SV = PV(0x7ffb7c008a70) at 0x7ffb7c030390
338 REFCNT = 1
339 FLAGS = (POK,pPOK)
340 PV = 0x7ffb7bc05b50 ""\0
341 CUR = 0
342 LEN = 10
343
344 Notice here the LEN is 10. (It may differ on your platform.) Extend
345 the length of the string to one less than 10, and do a substitution:
346
347 % ./perl -Ilib -MDevel::Peek -le '$a=""; $a.="123456789"; $a=~s/.//; \
348 Dump($a)'
349 SV = PV(0x7ffa04008a70) at 0x7ffa04030390
350 REFCNT = 1
351 FLAGS = (POK,OOK,pPOK)
352 OFFSET = 1
353 PV = 0x7ffa03c05b61 ( "\1" . ) "23456789"\0
354 CUR = 8
355 LEN = 9
356
357 Here the number of bytes chopped off (1) is shown next as the OFFSET.
358 The portion of the string between the "real" and the "fake" beginnings
359 is shown in parentheses, and the values of "SvCUR" and "SvLEN" reflect
360 the fake beginning, not the real one. (The first character of the
361 string buffer happens to have changed to "\1" here, not "1", because
362 the current implementation stores the offset count in the string
363 buffer. This is subject to change.)
364
365 Something similar to the offset hack is performed on AVs to enable
366 efficient shifting and splicing off the beginning of the array; while
367 "AvARRAY" points to the first element in the array that is visible from
368 Perl, "AvALLOC" points to the real start of the C array. These are
369 usually the same, but a "shift" operation can be carried out by
370 increasing "AvARRAY" by one and decreasing "AvFILL" and "AvMAX".
371 Again, the location of the real start of the C array only comes into
372 play when freeing the array. See "av_shift" in av.c.
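As a rough illustration of that pointer arithmetic, here is a toy array of ints in plain C. The structure and the function "toy_shift" are hypothetical; the field names merely echo "AvALLOC", "AvARRAY", "AvFILL" and "AvMAX", and the real "av_shift" does more work than this.

```c
#include <assert.h>
#include <stddef.h>

/* Toy array of ints; field names echo AvALLOC/AvARRAY/AvFILL/AvMAX. */
struct toy_av {
    int      *alloc;  /* real start of the C array */
    int      *array;  /* first element visible to the user */
    ptrdiff_t fill;   /* highest visible index, -1 when empty */
    ptrdiff_t max;    /* highest index that fits without reallocating */
};

/* Remove and return the first element without moving the others. */
static int toy_shift(struct toy_av *av)
{
    int first;
    assert(av->fill >= 0);      /* caller must not shift an empty array */
    first = av->array[0];
    av->array++;
    av->fill--;
    av->max--;
    return first;
}
```

After one shift, "array" is one element ahead of "alloc", and only "alloc" is suitable for freeing the storage.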
373
374 What's Really Stored in an SV?
Recall that the usual method of determining the type of scalar you have
is to use "Sv*OK" macros. Because a scalar can be both a number and a
string, these macros will usually return TRUE, and calling the
"Sv*V" macros will do the appropriate conversion of string to
integer/double or integer/double to string.
380
381 If you really need to know if you have an integer, double, or string
382 pointer in an SV, you can use the following three macros instead:
383
384 SvIOKp(SV*)
385 SvNOKp(SV*)
386 SvPOKp(SV*)
387
388 These will tell you if you truly have an integer, double, or string
389 pointer stored in your SV. The "p" stands for private.
390
391 There are various ways in which the private and public flags may
392 differ. For example, in perl 5.16 and earlier a tied SV may have a
393 valid underlying value in the IV slot (so SvIOKp is true), but the data
394 should be accessed via the FETCH routine rather than directly, so SvIOK
395 is false. (In perl 5.18 onwards, tied scalars use the flags the same
396 way as untied scalars.) Another is when numeric conversion has
397 occurred and precision has been lost: only the private flag is set on
398 'lossy' values. So when an NV is converted to an IV with loss, SvIOKp,
SvNOKp and SvNOK will be set, while SvIOK won't be.
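The lossy-conversion case can be pictured with a small bitmask model in plain C. Everything here is hypothetical: the "TOY_" flag values are made up and do not match Perl's real flag bits, and "toy_cache_iv" is only a guess at the shape of the logic, not the code in sv.c.

```c
#include <assert.h>

/* Hypothetical flag bits; Perl's real flag values differ. */
enum {
    TOY_IOK  = 1 << 0,  /* public: integer slot may be used freely */
    TOY_NOK  = 1 << 1,  /* public: double slot may be used freely */
    TOY_pIOK = 1 << 2,  /* private: integer slot holds something */
    TOY_pNOK = 1 << 3   /* private: double slot holds something */
};

struct toy_sv { unsigned flags; long iv; double nv; };

/* Cache an integer view of the double already stored in sv->nv. */
static void toy_cache_iv(struct toy_sv *sv)
{
    assert(sv->flags & TOY_NOK);
    sv->iv = (long)sv->nv;
    sv->flags |= TOY_pIOK;
    if ((double)sv->iv == sv->nv)
        sv->flags |= TOY_IOK;   /* lossless, so the public flag is set too */
}
```

With 3.0 both integer flags end up set; with 3.5 only the private one does, which is how a caller can tell that the cached integer is an approximation.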
400
401 In general, though, it's best to use the "Sv*V" macros.
402
403 Working with AVs
404 There are two ways to create and load an AV. The first method creates
405 an empty AV:
406
407 AV* newAV();
408
409 The second method both creates the AV and initially populates it with
410 SVs:
411
412 AV* av_make(SSize_t num, SV **ptr);
413
414 The second argument points to an array containing "num" "SV*"'s. Once
415 the AV has been created, the SVs can be destroyed, if so desired.
416
417 Once the AV has been created, the following operations are possible on
418 it:
419
420 void av_push(AV*, SV*);
421 SV* av_pop(AV*);
422 SV* av_shift(AV*);
423 void av_unshift(AV*, SSize_t num);
424
425 These should be familiar operations, with the exception of
426 "av_unshift". This routine adds "num" elements at the front of the
427 array with the "undef" value. You must then use "av_store" (described
428 below) to assign values to these new elements.
429
430 Here are some other functions:
431
432 SSize_t av_top_index(AV*);
433 SV** av_fetch(AV*, SSize_t key, I32 lval);
434 SV** av_store(AV*, SSize_t key, SV* val);
435
436 The "av_top_index" function returns the highest index value in an array
437 (just like $#array in Perl). If the array is empty, -1 is returned.
438 The "av_fetch" function returns the value at index "key", but if "lval"
439 is non-zero, then "av_fetch" will store an undef value at that index.
440 The "av_store" function stores the value "val" at index "key", and does
441 not increment the reference count of "val". Thus the caller is
442 responsible for taking care of that, and if "av_store" returns NULL,
443 the caller will have to decrement the reference count to avoid a memory
444 leak. Note that "av_fetch" and "av_store" both return "SV**"'s, not
445 "SV*"'s as their return value.
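The ownership rule for "av_store" can be sketched with a toy container in plain C. All names here are hypothetical, the out-of-range index is just an arbitrary way to provoke a failure, and the sketch ignores what happens to a value that previously occupied the slot.

```c
#include <assert.h>
#include <stddef.h>

struct toy_val { int refcnt; };

#define TOY_SLOTS 4
struct toy_array { struct toy_val *slot[TOY_SLOTS]; };

/* Like av_store: takes over the caller's reference on success and
 * returns a pointer to the slot; returns NULL, taking nothing, on failure. */
static struct toy_val **toy_store(struct toy_array *a, ptrdiff_t key,
                                  struct toy_val *val)
{
    if (key < 0 || key >= TOY_SLOTS)
        return NULL;
    a->slot[key] = val;
    return &a->slot[key];
}

/* Caller-side pattern: release the reference if the store failed. */
static int toy_store_or_release(struct toy_array *a, ptrdiff_t key,
                                struct toy_val *val)
{
    assert(val->refcnt > 0);
    if (toy_store(a, key, val) == NULL) {
        val->refcnt--;          /* analogous to SvREFCNT_dec(val) */
        return 0;
    }
    return 1;
}
```

On success the count is left alone because the array now owns the reference the caller held; on failure the caller still owns it and has to give it up.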
446
447 A few more:
448
449 void av_clear(AV*);
450 void av_undef(AV*);
451 void av_extend(AV*, SSize_t key);
452
453 The "av_clear" function deletes all the elements in the AV* array, but
454 does not actually delete the array itself. The "av_undef" function
455 will delete all the elements in the array plus the array itself. The
456 "av_extend" function extends the array so that it contains at least
457 "key+1" elements. If "key+1" is less than the currently allocated
458 length of the array, then nothing is done.
459
460 If you know the name of an array variable, you can get a pointer to its
461 AV by using the following:
462
463 AV* get_av("package::varname", 0);
464
465 This returns NULL if the variable does not exist.
466
467 See "Understanding the Magic of Tied Hashes and Arrays" for more
468 information on how to use the array access functions on tied arrays.
469
470 Working with HVs
471 To create an HV, you use the following routine:
472
473 HV* newHV();
474
475 Once the HV has been created, the following operations are possible on
476 it:
477
478 SV** hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
479 SV** hv_fetch(HV*, const char* key, U32 klen, I32 lval);
480
481 The "klen" parameter is the length of the key being passed in (Note
482 that you cannot pass 0 in as a value of "klen" to tell Perl to measure
483 the length of the key). The "val" argument contains the SV pointer to
484 the scalar being stored, and "hash" is the precomputed hash value (zero
485 if you want "hv_store" to calculate it for you). The "lval" parameter
486 indicates whether this fetch is actually a part of a store operation,
487 in which case a new undefined value will be added to the HV with the
488 supplied key and "hv_fetch" will return as if the value had already
489 existed.
490
491 Remember that "hv_store" and "hv_fetch" return "SV**"'s and not just
492 "SV*". To access the scalar value, you must first dereference the
493 return value. However, you should check to make sure that the return
494 value is not NULL before dereferencing it.
495
496 The first of these two functions checks if a hash table entry exists,
497 and the second deletes it.
498
499 bool hv_exists(HV*, const char* key, U32 klen);
500 SV* hv_delete(HV*, const char* key, U32 klen, I32 flags);
501
502 If "flags" does not include the "G_DISCARD" flag then "hv_delete" will
503 create and return a mortal copy of the deleted value.
504
505 And more miscellaneous functions:
506
507 void hv_clear(HV*);
508 void hv_undef(HV*);
509
510 Like their AV counterparts, "hv_clear" deletes all the entries in the
511 hash table but does not actually delete the hash table. The "hv_undef"
512 deletes both the entries and the hash table itself.
513
514 Perl keeps the actual data in a linked list of structures with a
515 typedef of HE. These contain the actual key and value pointers (plus
516 extra administrative overhead). The key is a string pointer; the value
517 is an "SV*". However, once you have an "HE*", to get the actual key
518 and value, use the routines specified below.
519
520 I32 hv_iterinit(HV*);
521 /* Prepares starting point to traverse hash table */
522 HE* hv_iternext(HV*);
523 /* Get the next entry, and return a pointer to a
524 structure that has both the key and value */
525 char* hv_iterkey(HE* entry, I32* retlen);
526 /* Get the key from an HE structure and also return
527 the length of the key string */
528 SV* hv_iterval(HV*, HE* entry);
529 /* Return an SV pointer to the value of the HE
530 structure */
531 SV* hv_iternextsv(HV*, char** key, I32* retlen);
/* This convenience routine combines hv_iternext,
hv_iterkey, and hv_iterval. The key and retlen
arguments are return values for the key and its
length. The value is returned as the SV* result */
536
537 If you know the name of a hash variable, you can get a pointer to its
538 HV by using the following:
539
540 HV* get_hv("package::varname", 0);
541
542 This returns NULL if the variable does not exist.
543
544 The hash algorithm is defined in the "PERL_HASH" macro:
545
546 PERL_HASH(hash, key, klen)
547
548 The exact implementation of this macro varies by architecture and
549 version of perl, and the return value may change per invocation, so the
550 value is only valid for the duration of a single perl process.
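To illustrate why a stored hash value cannot be reused in another process, here is a sketch of a seeded hash in plain C. It uses FNV-1a-style mixing purely as a stand-in; it is not the algorithm behind "PERL_HASH", and "toy_hash" and its seed parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a-style mixing with a caller-supplied seed; not Perl's PERL_HASH. */
static uint32_t toy_hash(uint32_t seed, const char *key, size_t klen)
{
    uint32_t h = seed;
    size_t i;
    assert(key != NULL || klen == 0);
    for (i = 0; i < klen; i++) {
        h ^= (unsigned char)key[i];
        h *= 16777619u;
    }
    return h;
}
```

The same key hashed with the same seed always gives the same number, but a different seed gives a different number, so a value computed in one process is meaningless in another that picked a different seed.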
551
552 See "Understanding the Magic of Tied Hashes and Arrays" for more
553 information on how to use the hash access functions on tied hashes.
554
555 Hash API Extensions
556 Beginning with version 5.004, the following functions are also
557 supported:
558
559 HE* hv_fetch_ent (HV* tb, SV* key, I32 lval, U32 hash);
560 HE* hv_store_ent (HV* tb, SV* key, SV* val, U32 hash);
561
562 bool hv_exists_ent (HV* tb, SV* key, U32 hash);
563 SV* hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);
564
565 SV* hv_iterkeysv (HE* entry);
566
567 Note that these functions take "SV*" keys, which simplifies writing of
568 extension code that deals with hash structures. These functions also
569 allow passing of "SV*" keys to "tie" functions without forcing you to
570 stringify the keys (unlike the previous set of functions).
571
572 They also return and accept whole hash entries ("HE*"), making their
573 use more efficient (since the hash number for a particular string
574 doesn't have to be recomputed every time). See perlapi for detailed
575 descriptions.
576
577 The following macros must always be used to access the contents of hash
578 entries. Note that the arguments to these macros must be simple
579 variables, since they may get evaluated more than once. See perlapi
580 for detailed descriptions of these macros.
581
582 HePV(HE* he, STRLEN len)
583 HeVAL(HE* he)
584 HeHASH(HE* he)
585 HeSVKEY(HE* he)
586 HeSVKEY_force(HE* he)
587 HeSVKEY_set(HE* he, SV* sv)
588
589 These two lower level macros are defined, but must only be used when
590 dealing with keys that are not "SV*"s:
591
592 HeKEY(HE* he)
593 HeKLEN(HE* he)
594
Note that neither "hv_store" nor "hv_store_ent" increments the
reference count of the stored "val"; doing so is the caller's
responsibility. If these functions return a NULL value, the caller
will usually have to decrement the reference count of "val" to avoid a
memory leak.
600
601 AVs, HVs and undefined values
602 Sometimes you have to store undefined values in AVs or HVs. Although
603 this may be a rare case, it can be tricky. That's because you're used
604 to using &PL_sv_undef if you need an undefined SV.
605
606 For example, intuition tells you that this XS code:
607
608 AV *av = newAV();
609 av_store( av, 0, &PL_sv_undef );
610
611 is equivalent to this Perl code:
612
613 my @av;
614 $av[0] = undef;
615
616 Unfortunately, this isn't true. In perl 5.18 and earlier, AVs use
617 &PL_sv_undef as a marker for indicating that an array element has not
618 yet been initialized. Thus, "exists $av[0]" would be true for the
619 above Perl code, but false for the array generated by the XS code. In
620 perl 5.20, storing &PL_sv_undef will create a read-only element,
621 because the scalar &PL_sv_undef itself is stored, not a copy.
622
623 Similar problems can occur when storing &PL_sv_undef in HVs:
624
625 hv_store( hv, "key", 3, &PL_sv_undef, 0 );
626
627 This will indeed make the value "undef", but if you try to modify the
628 value of "key", you'll get the following error:
629
630 Modification of non-creatable hash value attempted
631
632 In perl 5.8.0, &PL_sv_undef was also used to mark placeholders in
633 restricted hashes. This caused such hash entries not to appear when
634 iterating over the hash or when checking for the keys with the
635 "hv_exists" function.
636
637 You can run into similar problems when you store &PL_sv_yes or
638 &PL_sv_no into AVs or HVs. Trying to modify such elements will give
639 you the following error:
640
641 Modification of a read-only value attempted
642
643 To make a long story short, you can use the special variables
644 &PL_sv_undef, &PL_sv_yes and &PL_sv_no with AVs and HVs, but you have
645 to make sure you know what you're doing.
646
647 Generally, if you want to store an undefined value in an AV or HV, you
648 should not use &PL_sv_undef, but rather create a new undefined value
649 using the "newSV" function, for example:
650
651 av_store( av, 42, newSV(0) );
652 hv_store( hv, "foo", 3, newSV(0), 0 );
653
654 References
655 References are a special type of scalar that point to other data types
656 (including other references).
657
658 To create a reference, use either of the following functions:
659
660 SV* newRV_inc((SV*) thing);
661 SV* newRV_noinc((SV*) thing);
662
663 The "thing" argument can be any of an "SV*", "AV*", or "HV*". The
664 functions are identical except that "newRV_inc" increments the
665 reference count of the "thing", while "newRV_noinc" does not. For
666 historical reasons, "newRV" is a synonym for "newRV_inc".
667
668 Once you have a reference, you can use the following macro to
669 dereference the reference:
670
671 SvRV(SV*)
672
673 then call the appropriate routines, casting the returned "SV*" to
674 either an "AV*" or "HV*", if required.
675
676 To determine if an SV is a reference, you can use the following macro:
677
678 SvROK(SV*)
679
680 To discover what type of value the reference refers to, use the
681 following macro and then check the return value.
682
683 SvTYPE(SvRV(SV*))
684
685 The most useful types that will be returned are:
686
687 SVt_PVAV Array
688 SVt_PVHV Hash
689 SVt_PVCV Code
690 SVt_PVGV Glob (possibly a file handle)
691
692 Any numerical value returned which is less than SVt_PVAV will be a
693 scalar of some form.
694
695 See "svtype" in perlapi for more details.
696
697 Blessed References and Class Objects
698 References are also used to support object-oriented programming. In
699 perl's OO lexicon, an object is simply a reference that has been
700 blessed into a package (or class). Once blessed, the programmer may
701 now use the reference to access the various methods in the class.
702
703 A reference can be blessed into a package with the following function:
704
705 SV* sv_bless(SV* sv, HV* stash);
706
707 The "sv" argument must be a reference value. The "stash" argument
708 specifies which class the reference will belong to. See "Stashes and
709 Globs" for information on converting class names into stashes.
710
711 /* Still under construction */
712
The following function upgrades "rv" to a reference if it is not
already one, and creates a new SV for "rv" to point to. If "classname"
is non-null, the new SV is blessed into the specified class. The new
SV is returned.
716
717 SV* newSVrv(SV* rv, const char* classname);
718
719 The following three functions copy integer, unsigned integer or double
720 into an SV whose reference is "rv". SV is blessed if "classname" is
721 non-null.
722
723 SV* sv_setref_iv(SV* rv, const char* classname, IV iv);
724 SV* sv_setref_uv(SV* rv, const char* classname, UV uv);
SV* sv_setref_nv(SV* rv, const char* classname, NV nv);
726
727 The following function copies the pointer value (the address, not the
728 string!) into an SV whose reference is rv. SV is blessed if
729 "classname" is non-null.
730
731 SV* sv_setref_pv(SV* rv, const char* classname, void* pv);
732
733 The following function copies a string into an SV whose reference is
734 "rv". Set length to 0 to let Perl calculate the string length. SV is
735 blessed if "classname" is non-null.
736
737 SV* sv_setref_pvn(SV* rv, const char* classname, char* pv,
738 STRLEN length);
739
740 The following function tests whether the SV is blessed into the
741 specified class. It does not check inheritance relationships.
742
743 int sv_isa(SV* sv, const char* name);
744
745 The following function tests whether the SV is a reference to a blessed
746 object.
747
748 int sv_isobject(SV* sv);
749
750 The following function tests whether the SV is derived from the
751 specified class. SV can be either a reference to a blessed object or a
752 string containing a class name. This is the function implementing the
753 "UNIVERSAL::isa" functionality.
754
755 bool sv_derived_from(SV* sv, const char* name);
756
757 To check if you've got an object derived from a specific class you have
758 to write:
759
760 if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... }
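The difference between an exact-class test and an inheritance-aware test can be shown with a toy class chain in plain C. This is only a model: "struct toy_class", "toy_isa" and "toy_derived_from" are hypothetical, and the sketch assumes single inheritance, whereas Perl classes may have several parents.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy class with a single parent; Perl's @ISA allows several. */
struct toy_class { const char *name; const struct toy_class *parent; };

/* Exact-class test, in the spirit of sv_isa. */
static int toy_isa(const struct toy_class *c, const char *name)
{
    assert(c != NULL);
    return strcmp(c->name, name) == 0;
}

/* Walks up the parents, in the spirit of sv_derived_from. */
static int toy_derived_from(const struct toy_class *c, const char *name)
{
    for (; c != NULL; c = c->parent)
        if (strcmp(c->name, name) == 0)
            return 1;
    return 0;
}
```

With a class "Dog" whose parent is "Animal", only the second function reports a match for "Animal".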
761
762 Creating New Variables
763 To create a new Perl variable with an undef value which can be accessed
764 from your Perl script, use the following routines, depending on the
765 variable type.
766
767 SV* get_sv("package::varname", GV_ADD);
768 AV* get_av("package::varname", GV_ADD);
769 HV* get_hv("package::varname", GV_ADD);
770
771 Notice the use of GV_ADD as the second parameter. The new variable can
772 now be set, using the routines appropriate to the data type.
773
774 There are additional macros whose values may be bitwise OR'ed with the
775 "GV_ADD" argument to enable certain extra features. Those bits are:
776
777 GV_ADDMULTI
778 Marks the variable as multiply defined, thus preventing the:
779
780 Name <varname> used only once: possible typo
781
782 warning.
783
784 GV_ADDWARN
785 Issues the warning:
786
787 Had to create <varname> unexpectedly
788
789 if the variable did not exist before the function was called.
790
791 If you do not specify a package name, the variable is created in the
792 current package.
793
794 Reference Counts and Mortality
795 Perl uses a reference count-driven garbage collection mechanism. SVs,
796 AVs, or HVs (xV for short in the following) start their life with a
797 reference count of 1. If the reference count of an xV ever drops to 0,
798 then it will be destroyed and its memory made available for reuse. At
799 the most basic internal level, reference counts can be manipulated with
800 the following macros:
801
802 int SvREFCNT(SV* sv);
803 SV* SvREFCNT_inc(SV* sv);
804 void SvREFCNT_dec(SV* sv);
805
806 (There are also suffixed versions of the increment and decrement
807 macros, for situations where the full generality of these basic macros
808 can be exchanged for some performance.)
809
810 However, the way a programmer should think about references is not so
811 much in terms of the bare reference count, but in terms of ownership of
812 references. A reference to an xV can be owned by any of a variety of
813 entities: another xV, the Perl interpreter, an XS data structure, a
814 piece of running code, or a dynamic scope. An xV generally does not
815 know what entities own the references to it; it only knows how many
816 references there are, which is the reference count.
817
818 To correctly maintain reference counts, it is essential to keep track
819 of what references the XS code is manipulating. The programmer should
820 always know where a reference has come from and who owns it, and be
821 aware of any creation or destruction of references, and any transfers
822 of ownership. Because ownership isn't represented explicitly in the xV
823 data structures, only the reference count need be actually maintained
824 by the code, and that means that this understanding of ownership is not
825 actually evident in the code. For example, transferring ownership of a
826 reference from one owner to another doesn't change the reference count
827 at all, so may be achieved with no actual code. (The transferring code
828 doesn't touch the referenced object, but does need to ensure that the
829 former owner knows that it no longer owns the reference, and that the
830 new owner knows that it now does.)
831
832 An xV that is visible at the Perl level should not become unreferenced
833 and thus be destroyed. Normally, an object will only become
834 unreferenced when it is no longer visible, often by the same means that
835 makes it invisible. For example, a Perl reference value (RV) owns a
836 reference to its referent, so if the RV is overwritten that reference
837 gets destroyed, and the no-longer-reachable referent may be destroyed
838 as a result.
839
840 Many functions have some kind of reference manipulation as part of
841 their purpose. Sometimes this is documented in terms of ownership of
842 references, and sometimes it is (less helpfully) documented in terms of
843 changes to reference counts. For example, the newRV_inc() function is
844 documented to create a new RV (with reference count 1) and increment
845 the reference count of the referent that was supplied by the caller.
846 This is best understood as creating a new reference to the referent,
847 which is owned by the created RV, and returning to the caller ownership
848 of the sole reference to the RV. The newRV_noinc() function instead
849 does not increment the reference count of the referent, but the RV
850 nevertheless ends up owning a reference to the referent. It is
851 therefore implied that the caller of "newRV_noinc()" is relinquishing a
852 reference to the referent, making this conceptually a more complicated
853 operation even though it does less to the data structures.
854
855 For example, imagine you want to return a reference from an XSUB
856 function. Inside the XSUB routine, you create an SV which initially
857 has just a single reference, owned by the XSUB routine. This reference
858 needs to be disposed of before the routine is complete, otherwise it
859 will leak, preventing the SV from ever being destroyed. So to create
860 an RV referencing the SV, it is most convenient to pass the SV to
861 "newRV_noinc()", which consumes that reference. Now the XSUB routine
862 no longer owns a reference to the SV, but does own a reference to the
863 RV, which in turn owns a reference to the SV. The ownership of the
864 reference to the RV is then transferred by the process of returning the
865 RV from the XSUB.
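The ownership view of newRV_inc() and newRV_noinc() described above can be made concrete with a small model in plain C. This is only a sketch: it does not use the Perl headers, and "ToyValue", "toy_live_count" and every "toy_*" function are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an xV.  A non-NULL referent means this
 * value plays the role of an RV that owns one reference to it. */
typedef struct ToyValue {
    int refcnt;
    struct ToyValue *referent;
} ToyValue;

int toy_live_count = 0;   /* values created and not yet destroyed */

/* The caller owns the sole reference to the new value. */
ToyValue *toy_new(void) {
    ToyValue *v = malloc(sizeof *v);
    assert(v != NULL);
    v->refcnt = 1;
    v->referent = NULL;
    toy_live_count++;
    return v;
}

ToyValue *toy_refcnt_inc(ToyValue *v) {
    v->refcnt++;
    return v;
}

/* Dropping the last reference destroys the value, which in turn
 * releases the reference it owned to its referent. */
void toy_refcnt_dec(ToyValue *v) {
    assert(v->refcnt > 0);
    if (--v->refcnt == 0) {
        if (v->referent != NULL)
            toy_refcnt_dec(v->referent);
        free(v);
        toy_live_count--;
    }
}

/* Modelled on newRV_inc(): the RV gets a new reference of its own;
 * the caller still owns the reference it passed in. */
ToyValue *toy_new_rv_inc(ToyValue *target) {
    ToyValue *rv = toy_new();
    rv->referent = toy_refcnt_inc(target);
    return rv;
}

/* Modelled on newRV_noinc(): the caller's reference is handed over
 * to the RV, so no count changes. */
ToyValue *toy_new_rv_noinc(ToyValue *target) {
    ToyValue *rv = toy_new();
    rv->referent = target;
    return rv;
}
```

After toy_new_rv_noinc() the caller must treat its pointer to the target as borrowed: releasing the RV is enough to destroy both values. After toy_new_rv_inc() the caller still has to release its own reference.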
866
867 There are some convenience functions available that can help with the
868 destruction of xVs. These functions introduce the concept of
869 "mortality". Much documentation speaks of an xV itself being mortal,
870 but this is misleading. It is really a reference to an xV that is
871 mortal, and it is possible for there to be more than one mortal
872 reference to a single xV. For a reference to be mortal means that it
873 is owned by the temps stack, one of perl's many internal stacks, which
874 will destroy that reference "a short time later". Usually the "short
875 time later" is the end of the current Perl statement. However, it gets
876 more complicated around dynamic scopes: there can be multiple sets of
877 mortal references hanging around at the same time, with different death
878 dates. Internally, the actual determinant for when mortal xV
879 references are destroyed depends on two macros, SAVETMPS and FREETMPS.
880 See perlcall and perlxs for more details on these macros.
881
882 Mortal references are mainly used for xVs that are placed on perl's
883 main stack. The stack is problematic for reference tracking, because
884 it contains a lot of xV references, but doesn't own those references:
885 they are not counted. Currently, there are many bugs resulting from
886 xVs being destroyed while referenced by the stack, because the stack's
887 uncounted references aren't enough to keep the xVs alive. So when
888 putting an (uncounted) reference on the stack, it is vitally important
889 to ensure that there will be a counted reference to the same xV that
890 will last at least as long as the uncounted reference. But it's also
891 important that that counted reference be cleaned up at an appropriate
892 time, and not unduly prolong the xV's life. For there to be a mortal
893 reference is often the best way to satisfy this requirement, especially
894 if the xV was created especially to be put on the stack and would
895 otherwise be unreferenced.
896
897 To create a mortal reference, use the functions:
898
899 SV* sv_newmortal()
900 SV* sv_mortalcopy(SV*)
901 SV* sv_2mortal(SV*)
902
903 "sv_newmortal()" creates an SV (with the undefined value) whose sole
904 reference is mortal. "sv_mortalcopy()" creates an xV whose value is a
905 copy of a supplied xV and whose sole reference is mortal.
906 "sv_2mortal()" mortalises an existing xV reference: it transfers
907 ownership of a reference from the caller to the temps stack. Because
908 "sv_newmortal" gives the new SV no value, it must normally be given one
909 via "sv_setpv", "sv_setiv", etc.:
910
911 SV *tmp = sv_newmortal();
912 sv_setiv(tmp, an_integer);
913
914 As that is multiple C statements, it is quite common to see this idiom
915 instead:
916
917 SV *tmp = sv_2mortal(newSViv(an_integer));
918
919 The mortal routines are not just for SVs; AVs and HVs can be made
920 mortal by passing their address (cast to "SV*") to the
921 "sv_2mortal" or "sv_mortalcopy" routines.
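The idea that a mortal reference is one owned by the temps stack, and that different groups of mortal references can have different death dates, can be sketched as follows. This is a simplified model in plain C, not perl's implementation: all names are hypothetical, the stack has a fixed size, and the way the floor is saved and restored is an assumption made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_TMPS_MAX 64

/* Hypothetical stand-in for an xV: only the reference count matters. */
typedef struct { int refcnt; } ToySV;

static ToySV *toy_tmps[TOY_TMPS_MAX];  /* references owned by the stack */
static int toy_tmps_ix = 0;            /* number of entries in use */
static int toy_tmps_floor = 0;         /* entries below this survive */

/* Modelled on sv_2mortal(): the caller's reference now belongs to
 * the temps stack.  The reference count itself does not change. */
ToySV *toy_2mortal(ToySV *sv) {
    assert(sv != NULL && toy_tmps_ix < TOY_TMPS_MAX);
    toy_tmps[toy_tmps_ix++] = sv;
    return sv;
}

/* Modelled on SAVETMPS: protect everything already on the stack.
 * Returns the previous floor so the caller can restore it later. */
int toy_savetmps(void) {
    int old_floor = toy_tmps_floor;
    toy_tmps_floor = toy_tmps_ix;
    return old_floor;
}

/* Modelled on FREETMPS: release every mortal reference above the
 * floor.  A real implementation would destroy values reaching 0. */
void toy_freetmps(void) {
    while (toy_tmps_ix > toy_tmps_floor) {
        ToySV *sv = toy_tmps[--toy_tmps_ix];
        sv->refcnt--;
    }
}

/* Leaving the scope puts the old floor back. */
void toy_restore_floor(int old_floor) {
    assert(old_floor >= 0 && old_floor <= toy_tmps_ix);
    toy_tmps_floor = old_floor;
}
```

A value with two references, only one of which is mortal, ends up with a count of 1 after toy_freetmps(), which matches the point that it is the reference, not the xV, that is mortal.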
922
923 Stashes and Globs
924 A stash is a hash that contains all variables that are defined within a
925 package. Each key of the stash is a symbol name (shared by all the
926 different types of objects that have the same name), and each value in
927 the hash table is a GV (Glob Value). This GV in turn contains
928 references to the various objects of that name, including (but not
929 limited to) the following:
930
931 Scalar Value
932 Array Value
933 Hash Value
934 I/O Handle
935 Format
936 Subroutine
937
938 There is a single stash called "PL_defstash" that holds the items that
939 exist in the "main" package. To get at the items in other packages,
940 append the string "::" to the package name. The items in the "Foo"
941 package are in the stash "Foo::" in PL_defstash. The items in the
942 "Bar::Baz" package are in the stash "Baz::" in "Bar::"'s stash.
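The nesting rule can be illustrated by splitting a package name into the chain of stash keys that would be looked up, starting from PL_defstash. The helper below is a hypothetical plain C sketch ("toy_stash_key" is not a Perl function); it only handles the "::" separator and performs no lookup at all.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write the idx-th stash key of package name pkg into out, e.g.
 * "Bar::Baz" yields "Bar::" for idx 0 and "Baz::" for idx 1.
 * Returns 1 on success, 0 if there is no such component or the
 * output buffer is too small. */
int toy_stash_key(const char *pkg, int idx, char *out, size_t outlen) {
    const char *start = pkg;

    assert(pkg != NULL && out != NULL && idx >= 0);
    for (;;) {
        const char *sep = strstr(start, "::");
        size_t len = sep ? (size_t)(sep - start) : strlen(start);

        if (len == 0)
            return 0;                    /* empty component */
        if (idx == 0) {
            if (len + 3 > outlen)        /* name + "::" + NUL */
                return 0;
            memcpy(out, start, len);
            memcpy(out + len, "::", 3);  /* copies the NUL as well */
            return 1;
        }
        if (sep == NULL)
            return 0;                    /* ran out of components */
        start = sep + 2;
        idx--;
    }
}
```

For "Bar::Baz" the keys are "Bar::" (found in PL_defstash) and then "Baz::" (found in the stash for "Bar::"), as described above.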
943
944 To get the stash pointer for a particular package, use the function:
945
946 HV* gv_stashpv(const char* name, I32 flags)
947 HV* gv_stashsv(SV*, I32 flags)
948
949 The first function takes a literal string, the second uses the string
950 stored in the SV. Remember that a stash is just a hash table, so you
951 get back an "HV*". The "flags" argument will create a new package if it
952 is set to GV_ADD.
953
954 The name that "gv_stash*v" wants is the name of the package whose
955 symbol table you want. The default package is called "main". If you
956 have multiply nested packages, pass their names to "gv_stash*v",
957 separated by "::" as in the Perl language itself.
958
959 Alternatively, if you have an SV that is a blessed reference, you can
960 find out the stash pointer by using:
961
962 HV* SvSTASH(SvRV(SV*));
963
964 then use the following to get the package name itself:
965
966 char* HvNAME(HV* stash);
967
968 If you need to bless or re-bless an object you can use the following
969 function:
970
971 SV* sv_bless(SV*, HV* stash)
972
973 where the first argument, an "SV*", must be a reference, and the second
974 argument is a stash. The returned "SV*" can now be used in the same
975 way as any other SV.
976
977 For more information on references and blessings, consult perlref.
978
979 Double-Typed SVs
980 Scalar variables normally contain only one type of value, an integer,
981 double, pointer, or reference. Perl will automatically convert the
982 actual scalar data from the stored type into the requested type.
983
984 Some scalar variables contain more than one type of scalar data. For
985 example, the variable $! contains either the numeric value of "errno"
986 or its string equivalent from either "strerror" or "sys_errlist[]".
987
988 To force multiple data values into an SV, you must do two things: use
989 the "sv_set*v" routines to add the additional scalar type, then set a
990 flag so that Perl will believe it contains more than one type of data.
991 The four macros to set the flags are:
992
993 SvIOK_on
994 SvNOK_on
995 SvPOK_on
996 SvROK_on
997
998 The particular macro you must use depends on which "sv_set*v" routine
999 you called first. This is because every "sv_set*v" routine turns on
1000 only the bit for the particular type of data being set, and turns off
1001 all the rest.
1002
1003 For example, to create a new Perl variable called "dberror" that
1004 contains both the numeric and descriptive string error values, you
1005 could use the following code:
1006
1007 extern int dberror;
1008 extern char *dberror_list[];
1009
1010 SV* sv = get_sv("dberror", GV_ADD);
1011 sv_setiv(sv, (IV) dberror);
1012 sv_setpv(sv, dberror_list[dberror]);
1013 SvIOK_on(sv);
1014
1015 If the order of "sv_setiv" and "sv_setpv" had been reversed, then the
1016 macro "SvPOK_on" would need to be called instead of "SvIOK_on".
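Why the flag must be turned back on by hand can be seen in a small model where each setter marks only its own slot as valid. This is a plain C sketch with hypothetical names ("ToySV", "TOY_IOK", "toy_setiv" and so on), not the real SV layout; unlike "sv_setpv", the toy setter stores the pointer it is given rather than copying the string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical validity flags, one per stored representation. */
enum { TOY_IOK = 1, TOY_POK = 2 };

typedef struct {
    unsigned flags;    /* which of the slots below are valid */
    long iv;           /* integer slot */
    const char *pv;    /* string slot (borrowed, not copied) */
} ToySV;

/* Each setter fills its own slot and marks it as the only valid one.
 * The other slot keeps its old contents but loses its flag. */
void toy_setiv(ToySV *sv, long iv) {
    assert(sv != NULL);
    sv->iv = iv;
    sv->flags = TOY_IOK;
}

void toy_setpv(ToySV *sv, const char *pv) {
    assert(sv != NULL);
    sv->pv = pv;
    sv->flags = TOY_POK;
}

/* Re-validate a slot that an earlier setter already filled. */
void toy_iok_on(ToySV *sv) { sv->flags |= TOY_IOK; }
void toy_pok_on(ToySV *sv) { sv->flags |= TOY_POK; }
```

Calling toy_setiv(), then toy_setpv(), leaves only TOY_POK set even though the integer slot still holds the number; toy_iok_on() makes both valid, mirroring the "SvIOK_on" call in the example above.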
1017
1018 Read-Only Values
1019 In Perl 5.16 and earlier, copy-on-write (see the next section) shared a
1020 flag bit with read-only scalars. So the only way to test whether
1021 "sv_setsv", etc., will raise a "Modification of a read-only value"
1022 error in those versions is:
1023
1024 SvREADONLY(sv) && !SvIsCOW(sv)
1025
1026 Under Perl 5.18 and later, SvREADONLY only applies to read-only
1027 variables, and, under 5.20, copy-on-write scalars can also be
1028 read-only, so the above check is incorrect. You just want:
1029
1030 SvREADONLY(sv)
1031
1032 If you need to do this check often, define your own macro like this:
1033
1034 #if PERL_VERSION >= 18
1035 # define SvTRULYREADONLY(sv) SvREADONLY(sv)
1036 #else
1037 # define SvTRULYREADONLY(sv) (SvREADONLY(sv) && !SvIsCOW(sv))
1038 #endif
1039
1040 Copy on Write
1041 Perl implements a copy-on-write (COW) mechanism for scalars, in which
1042 string copies are not immediately made when requested, but are deferred
1043 until made necessary by one or the other scalar changing. This is
1044 mostly transparent, but one must take care not to modify string buffers
1045 that are shared by multiple SVs.
1046
1047 You can test whether an SV is using copy-on-write with "SvIsCOW(sv)".
1048
1049 You can force an SV to make its own copy of its string buffer by
1050 calling "sv_force_normal(sv)" or SvPV_force_nolen(sv).
1051
1052 If you want to make the SV drop its string buffer, use
1053 "sv_force_normal_flags(sv, SV_COW_DROP_PV)" or simply "sv_setsv(sv,
1054 NULL)".
1055
1056 All of these functions will croak on read-only scalars (see the
1057 previous section for more on those).
1058
1059 To test that your code is behaving correctly and not modifying COW
1060 buffers, on systems that support mmap(2) (i.e., Unix) you can configure
1061 perl with "-Accflags=-DPERL_DEBUG_READONLY_COW" and it will turn buffer
1062 violations into crashes. You will find it to be marvellously slow, so
1063 you may want to skip perl's own tests.
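The general copy-on-write idea (share the buffer on copy, split it just before the first modification) can be sketched in plain C. The model below is generic and makes no claim about how perl records sharing internally; "ToyBuf", "ToyStr" and the "toy_str_*" functions are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A string buffer that may be shared by several ToyStr values. */
typedef struct {
    int owners;    /* how many ToyStr values point here */
    char *text;
} ToyBuf;

typedef struct { ToyBuf *buf; } ToyStr;

ToyStr toy_str_new(const char *text) {
    ToyStr s;
    s.buf = malloc(sizeof *s.buf);
    assert(s.buf != NULL);
    s.buf->owners = 1;
    s.buf->text = malloc(strlen(text) + 1);
    assert(s.buf->text != NULL);
    strcpy(s.buf->text, text);
    return s;
}

/* "Copying" only shares the buffer; no bytes are duplicated yet. */
ToyStr toy_str_copy(ToyStr src) {
    src.buf->owners++;
    return src;
}

int toy_str_is_cow(const ToyStr *s) {
    return s->buf->owners > 1;
}

/* Modelled on the idea of sv_force_normal(): give s a private copy
 * of the buffer if it is currently shared. */
void toy_str_force_normal(ToyStr *s) {
    if (toy_str_is_cow(s)) {
        ToyBuf *shared = s->buf;
        *s = toy_str_new(shared->text);
        shared->owners--;
    }
}

/* Every write goes through the split, so a shared buffer is never
 * modified in place. */
void toy_str_set_char(ToyStr *s, size_t i, char c) {
    assert(i < strlen(s->buf->text));
    toy_str_force_normal(s);
    s->buf->text[i] = c;
}

void toy_str_free(ToyStr *s) {
    if (--s->buf->owners == 0) {
        free(s->buf->text);
        free(s->buf);
    }
    s->buf = NULL;
}
```

Writing to s->buf->text directly, without going through toy_str_force_normal(), is the kind of mistake the warning above is about: it would silently change every value sharing the buffer.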
1064
1065 Magic Variables
1066 [This section still under construction. Ignore everything here. Post
1067 no bills. Everything not permitted is forbidden.]
1068
1069 Any SV may be magical, that is, it has special features that a normal
1070 SV does not have. These features are stored in the SV structure in a
1071 linked list of "struct magic"'s, typedef'ed to "MAGIC".
1072
1073 struct magic {
1074 MAGIC* mg_moremagic;
1075 MGVTBL* mg_virtual;
1076 U16 mg_private;
1077 char mg_type;
1078 U8 mg_flags;
1079 I32 mg_len;
1080 SV* mg_obj;
1081 char* mg_ptr;
1082 };
1083
1084 Note this is current as of patchlevel 0, and could change at any time.
1085
1086 Assigning Magic
1087 Perl adds magic to an SV using the sv_magic function:
1088
1089 void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen);
1090
1091 The "sv" argument is a pointer to the SV that is to acquire a new
1092 magical feature.
1093
1094 If "sv" is not already magical, Perl uses the "SvUPGRADE" macro to
1095 convert "sv" to type "SVt_PVMG". Perl then continues by adding new
1096 magic to the beginning of the linked list of magical features. Any
1097 prior entry of the same type of magic is deleted. Note that this can
1098 be overridden, and multiple instances of the same type of magic can be
1099 associated with an SV.
1100
1101 The "name" and "namlen" arguments are used to associate a string with
1102 the magic, typically the name of a variable. "namlen" is stored in the
1103 "mg_len" field and if "name" is non-null then either a "savepvn" copy
1104 of "name" or "name" itself is stored in the "mg_ptr" field, depending
1105 on whether "namlen" is greater than zero or equal to zero respectively.
1106 As a special case, if "(name && namlen == HEf_SVKEY)" then "name" is
1107 assumed to contain an "SV*" and is stored as-is with its REFCNT
1108 incremented.
1109
1110 The sv_magic function uses "how" to determine which, if any, predefined
1111 "Magic Virtual Table" should be assigned to the "mg_virtual" field.
1112 See the "Magic Virtual Tables" section below. The "how" argument is
1113 also stored in the "mg_type" field. The value of "how" should be
1114 chosen from the set of macros "PERL_MAGIC_foo" found in perl.h. Note
1115 that before these macros were added, Perl internals used to directly
1116 use character literals, so you may occasionally come across old code or
1117 documentation referring to 'U' magic rather than "PERL_MAGIC_uvar" for
1118 example.
1119
1120 The "obj" argument is stored in the "mg_obj" field of the "MAGIC"
1121 structure. If it is not the same as the "sv" argument, the reference
1122 count of the "obj" object is incremented. If it is the same, or if the
1123 "how" argument is "PERL_MAGIC_arylen", "PERL_MAGIC_regdatum",
1124 "PERL_MAGIC_regdata", or if it is a NULL pointer, then "obj" is merely
1125 stored, without the reference count being incremented.
1126
1127 See also "sv_magicext" in perlapi for a more flexible way to add magic
1128 to an SV.
1129
1130 There is also a function to add magic to an "HV":
1131
1132 void hv_magic(HV *hv, GV *gv, int how);
1133
1134 This simply calls "sv_magic" and coerces the "gv" argument into an
1135 "SV".
1136
1137 To remove the magic from an SV, call the function sv_unmagic:
1138
1139 int sv_unmagic(SV *sv, int type);
1140
1141 The "type" argument should be equal to the "how" value when the "SV"
1142 was initially made magical.
1143
1144 However, note that "sv_unmagic" removes all magic of a certain "type"
1145 from the "SV". If you want to remove only certain magic of a "type"
1146 based on the magic virtual table, use "sv_unmagicext" instead:
1147
1148 int sv_unmagicext(SV *sv, int type, MGVTBL *vtbl);
1149
1150 Magic Virtual Tables
1151 The "mg_virtual" field in the "MAGIC" structure is a pointer to an
1152 "MGVTBL" (short for "Magic Virtual Table"), a structure of function
1153 pointers that handle the various operations that might be applied to
1154 that variable.
1155
1156 The "MGVTBL" has five (or sometimes eight) pointers to the following
1157 routine types:
1158
1159 int (*svt_get) (pTHX_ SV* sv, MAGIC* mg);
1160 int (*svt_set) (pTHX_ SV* sv, MAGIC* mg);
1161 U32 (*svt_len) (pTHX_ SV* sv, MAGIC* mg);
1162 int (*svt_clear)(pTHX_ SV* sv, MAGIC* mg);
1163 int (*svt_free) (pTHX_ SV* sv, MAGIC* mg);
1164
1165 int (*svt_copy) (pTHX_ SV *sv, MAGIC* mg, SV *nsv,
1166 const char *name, I32 namlen);
1167 int (*svt_dup) (pTHX_ MAGIC *mg, CLONE_PARAMS *param);
1168 int (*svt_local)(pTHX_ SV *nsv, MAGIC *mg);
1169
1170 This MGVTBL structure is set at compile-time in perl.h and there are
1171 currently 32 types. These different structures contain pointers to
1172 various routines that perform additional actions depending on which
1173 function is being called.
1174
1175 Function pointer Action taken
1176 ---------------- ------------
1177 svt_get Do something before the value of the SV is
1178 retrieved.
1179 svt_set Do something after the SV is assigned a value.
1180 svt_len Report on the SV's length.
1181 svt_clear Clear something the SV represents.
1182 svt_free Free any extra storage associated with the SV.
1183
1184 svt_copy copy tied variable magic to a tied element
1185 svt_dup duplicate a magic structure during thread cloning
1186 svt_local copy magic to local value during 'local'
1187
1188 For instance, the MGVTBL structure called "vtbl_sv" (which corresponds
1189 to an "mg_type" of "PERL_MAGIC_sv") contains:
1190
1191 { magic_get, magic_set, magic_len, 0, 0 }
1192
1193 Thus, when an SV is determined to be magical and of type
1194 "PERL_MAGIC_sv", if a get operation is being performed, the routine
1195 "magic_get" is called. All the various routines for the various
1196 magical types begin with "magic_". NOTE: the magic routines are not
1197 considered part of the Perl API, and may not be exported by the Perl
1198 library.
1199
1200 The last three slots are a recent addition, and for source code
1201 compatibility they are only checked for if one of the three flags
1202 MGf_COPY, MGf_DUP or MGf_LOCAL is set in mg_flags. This means that
1203 most code can continue declaring a vtable as a 5-element value. These
1204 three are currently used exclusively by the threading code, and are
1205 highly subject to change.
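How a list of magic entries with virtual tables leads to callbacks being run can be sketched with a cut-down model in plain C. It is only an illustration under stated assumptions: the structures keep just the fields needed here, there is no pTHX_ argument, and all names ("ToySV", "ToyMagic", "ToyVtbl", "toy_mg_get", "toy_mirror_*") are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToySV ToySV;
typedef struct ToyMagic ToyMagic;

/* Cut-down virtual table: only the get and set slots. */
typedef struct {
    int (*get)(ToySV *sv, ToyMagic *mg);
    int (*set)(ToySV *sv, ToyMagic *mg);
} ToyVtbl;

struct ToyMagic {
    ToyMagic *moremagic;   /* next entry in the list */
    const ToyVtbl *virt;   /* may be NULL or have NULL slots */
    void *ptr;             /* private data for the callbacks */
};

struct ToySV {
    long iv;
    ToyMagic *magic;       /* head of the magic list */
};

/* New magic goes to the beginning of the list. */
void toy_add_magic(ToySV *sv, ToyMagic *mg) {
    assert(sv != NULL && mg != NULL);
    mg->moremagic = sv->magic;
    sv->magic = mg;
}

/* Run every get callback before the value is read. */
void toy_mg_get(ToySV *sv) {
    for (ToyMagic *mg = sv->magic; mg != NULL; mg = mg->moremagic)
        if (mg->virt != NULL && mg->virt->get != NULL)
            mg->virt->get(sv, mg);
}

/* Run every set callback after the value has been assigned. */
void toy_mg_set(ToySV *sv) {
    for (ToyMagic *mg = sv->magic; mg != NULL; mg = mg->moremagic)
        if (mg->virt != NULL && mg->virt->set != NULL)
            mg->virt->set(sv, mg);
}

/* Example callbacks: mirror the value to and from an external int
 * whose address is kept in the ptr field. */
int toy_mirror_get(ToySV *sv, ToyMagic *mg) {
    sv->iv = *(int *)mg->ptr;
    return 0;
}

int toy_mirror_set(ToySV *sv, ToyMagic *mg) {
    *(int *)mg->ptr = (int)sv->iv;
    return 0;
}

const ToyVtbl toy_mirror_vtbl = { toy_mirror_get, toy_mirror_set };
```

Note that in this model, as with the real API, assigning to the value does not run the set callback by itself; the code that did the assignment has to trigger it.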
1206
1207 The current kinds of Magic Virtual Tables are:
1208
1209 mg_type
1210 (old-style char and macro) MGVTBL Type of magic
1211 -------------------------- ------ -------------
1212 \0 PERL_MAGIC_sv vtbl_sv Special scalar variable
1213 # PERL_MAGIC_arylen vtbl_arylen Array length ($#ary)
1214 % PERL_MAGIC_rhash (none) Extra data for restricted
1215 hashes
1216 * PERL_MAGIC_debugvar vtbl_debugvar $DB::single, signal, trace
1217 vars
1218 . PERL_MAGIC_pos vtbl_pos pos() lvalue
1219 : PERL_MAGIC_symtab (none) Extra data for symbol
1220 tables
1221 < PERL_MAGIC_backref vtbl_backref For weak ref data
1222 @ PERL_MAGIC_arylen_p (none) To move arylen out of XPVAV
1223 B PERL_MAGIC_bm vtbl_regexp Boyer-Moore
1224 (fast string search)
1225 c PERL_MAGIC_overload_table vtbl_ovrld Holds overload table
1226 (AMT) on stash
1227 D PERL_MAGIC_regdata vtbl_regdata Regex match position data
1228 (@+ and @- vars)
1229 d PERL_MAGIC_regdatum vtbl_regdatum Regex match position data
1230 element
1231 E PERL_MAGIC_env vtbl_env %ENV hash
1232 e PERL_MAGIC_envelem vtbl_envelem %ENV hash element
1233 f PERL_MAGIC_fm vtbl_regexp Formline
1234 ('compiled' format)
1235 g PERL_MAGIC_regex_global vtbl_mglob m//g target
1236 H PERL_MAGIC_hints vtbl_hints %^H hash
1237 h PERL_MAGIC_hintselem vtbl_hintselem %^H hash element
1238 I PERL_MAGIC_isa vtbl_isa @ISA array
1239 i PERL_MAGIC_isaelem vtbl_isaelem @ISA array element
1240 k PERL_MAGIC_nkeys vtbl_nkeys scalar(keys()) lvalue
1241 L PERL_MAGIC_dbfile (none) Debugger %_<filename
1242 l PERL_MAGIC_dbline vtbl_dbline Debugger %_<filename
1243 element
1244 N PERL_MAGIC_shared (none) Shared between threads
1245 n PERL_MAGIC_shared_scalar (none) Shared between threads
1246 o PERL_MAGIC_collxfrm vtbl_collxfrm Locale transformation
1247 P PERL_MAGIC_tied vtbl_pack Tied array or hash
1248 p PERL_MAGIC_tiedelem vtbl_packelem Tied array or hash element
1249 q PERL_MAGIC_tiedscalar vtbl_packelem Tied scalar or handle
1250 r PERL_MAGIC_qr vtbl_regexp Precompiled qr// regex
1251 S PERL_MAGIC_sig (none) %SIG hash
1252 s PERL_MAGIC_sigelem vtbl_sigelem %SIG hash element
1253 t PERL_MAGIC_taint vtbl_taint Taintedness
1254 U PERL_MAGIC_uvar vtbl_uvar Available for use by
1255 extensions
1256 u PERL_MAGIC_uvar_elem (none) Reserved for use by
1257 extensions
1258 V PERL_MAGIC_vstring (none) SV was vstring literal
1259 v PERL_MAGIC_vec vtbl_vec vec() lvalue
1260 w PERL_MAGIC_utf8 vtbl_utf8 Cached UTF-8 information
1261 x PERL_MAGIC_substr vtbl_substr substr() lvalue
1262 Y PERL_MAGIC_nonelem vtbl_nonelem Array element that does not
1263 exist
1264 y PERL_MAGIC_defelem vtbl_defelem Shadow "foreach" iterator
1265 variable / smart parameter
1266 vivification
1267 \ PERL_MAGIC_lvref vtbl_lvref Lvalue reference
1268 constructor
1269 ] PERL_MAGIC_checkcall vtbl_checkcall Inlining/mutation of call
1270 to this CV
1271 ~ PERL_MAGIC_ext (none) Available for use by
1272 extensions
1273
1274 When an uppercase and lowercase letter both exist in the table, then
1275 the uppercase letter is typically used to represent some kind of
1276 composite type (a list or a hash), and the lowercase letter is used to
1277 represent an element of that composite type. Some internals code makes
1278 use of this case relationship. However, 'v' and 'V' (vec and v-string)
1279 are in no way related.
1280
1281 The "PERL_MAGIC_ext" and "PERL_MAGIC_uvar" magic types are defined
1282 specifically for use by extensions and will not be used by perl itself.
1283 Extensions can use "PERL_MAGIC_ext" magic to 'attach' private
1284 information to variables (typically objects). This is especially
1285 useful because there is no way for normal perl code to corrupt this
1286 private information (unlike using extra elements of a hash object).
1287
1288 Similarly, "PERL_MAGIC_uvar" magic can be used much like tie() to call
1289 a C function any time a scalar's value is used or changed. The
1290 "MAGIC"'s "mg_ptr" field points to a "ufuncs" structure:
1291
1292 struct ufuncs {
1293 I32 (*uf_val)(pTHX_ IV, SV*);
1294 I32 (*uf_set)(pTHX_ IV, SV*);
1295 IV uf_index;
1296 };
1297
1298 When the SV is read from or written to, the "uf_val" or "uf_set"
1299 function will be called with "uf_index" as the first arg and a pointer
1300 to the SV as the second. A simple example of how to add
1301 "PERL_MAGIC_uvar" magic is shown below. Note that the ufuncs structure
1302 is copied by sv_magic, so you can safely allocate it on the stack.
1303
1304 void
1305 Umagic(sv)
1306 SV *sv;
1307 PREINIT:
1308 struct ufuncs uf;
1309 CODE:
1310 uf.uf_val = &my_get_fn;
1311 uf.uf_set = &my_set_fn;
1312 uf.uf_index = 0;
1313 sv_magic(sv, 0, PERL_MAGIC_uvar, (char*)&uf, sizeof(uf));
1314
1315 Attaching "PERL_MAGIC_uvar" to arrays is permissible but has no effect.
1316
1317 For hashes there is a specialized hook that gives control over hash
1318 keys (but not values). This hook calls "PERL_MAGIC_uvar" 'get' magic
1319 if the "set" function in the "ufuncs" structure is NULL. The hook is
1320 activated whenever the hash is accessed with a key specified as an "SV"
1321 through the functions "hv_store_ent", "hv_fetch_ent", "hv_delete_ent",
1322 and "hv_exists_ent". Accessing the key as a string through the
1323 functions without the "..._ent" suffix circumvents the hook. See
1324 "GUTS" in Hash::Util::FieldHash for a detailed description.
1325
1326 Note that because multiple extensions may be using "PERL_MAGIC_ext" or
1327 "PERL_MAGIC_uvar" magic, it is important for extensions to take extra
1328 care to avoid conflict. Typically only using the magic on objects
1329 blessed into the same class as the extension is sufficient. For
1330 "PERL_MAGIC_ext" magic, it is usually a good idea to define an
1331 "MGVTBL", even if all its fields will be 0, so that individual "MAGIC"
1332 pointers can be identified as a particular kind of magic using their
1333 magic virtual table. "mg_findext" provides an easy way to do that:
1334
1335 STATIC MGVTBL my_vtbl = { 0, 0, 0, 0, 0, 0, 0, 0 };
1336
1337 MAGIC *mg;
1338 if ((mg = mg_findext(sv, PERL_MAGIC_ext, &my_vtbl))) {
1339 /* this is really ours, not another module's PERL_MAGIC_ext */
1340 my_priv_data_t *priv = (my_priv_data_t *)mg->mg_ptr;
1341 ...
1342 }
1343
1344 Also note that the "sv_set*()" and "sv_cat*()" functions described
1345 earlier do not invoke 'set' magic on their targets. This must be done
1346 by the user either by calling the "SvSETMAGIC()" macro after calling
1347 these functions, or by using one of the "sv_set*_mg()" or
1348 "sv_cat*_mg()" functions. Similarly, generic C code must call the
1349 "SvGETMAGIC()" macro to invoke any 'get' magic if it uses an SV
1350 obtained from external sources in functions that don't handle magic.
1351 See perlapi for a description of these functions. For example, calls
1352 to the "sv_cat*()" functions typically need to be followed by
1353 "SvSETMAGIC()", but they don't need a prior "SvGETMAGIC()" since their
1354 implementation handles 'get' magic.
1355
1356 Finding Magic
1357 MAGIC *mg_find(SV *sv, int type); /* Finds the magic pointer of that
1358 * type */
1359
1360 This routine returns a pointer to a "MAGIC" structure stored in the SV.
1361 If the SV does not have that magical feature, "NULL" is returned. If
1362 the SV has multiple instances of that magical feature, the first one
1363 will be returned. "mg_findext" can be used to find a "MAGIC" structure
1364 of an SV based on both its magic type and its magic virtual table:
1365
1366 MAGIC *mg_findext(SV *sv, int type, MGVTBL *vtbl);
1367
1368 Also, if the SV passed to "mg_find" or "mg_findext" is not of type
1369 SVt_PVMG, Perl may core dump.
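The difference between searching by type alone and searching by type plus virtual table can be sketched on a bare linked list. This plain C model uses hypothetical names ("ToyMagic", "ToyVtbl", "toy_mg_find", "toy_mg_findext") and takes the head of the list directly instead of an SV.

```c
#include <assert.h>
#include <stddef.h>

/* Only the address of a vtable matters here; it acts as an identity. */
typedef struct { int unused; } ToyVtbl;

typedef struct ToyMagic {
    struct ToyMagic *moremagic;   /* next entry in the list */
    const ToyVtbl *virt;
    char type;
} ToyMagic;

/* Modelled on mg_find(): the first entry of the given type wins. */
ToyMagic *toy_mg_find(ToyMagic *head, char type) {
    for (ToyMagic *mg = head; mg != NULL; mg = mg->moremagic)
        if (mg->type == type)
            return mg;
    return NULL;
}

/* Modelled on mg_findext(): the type and the vtable must both match. */
ToyMagic *toy_mg_findext(ToyMagic *head, char type, const ToyVtbl *vtbl) {
    assert(vtbl != NULL);
    for (ToyMagic *mg = head; mg != NULL; mg = mg->moremagic)
        if (mg->type == type && mg->virt == vtbl)
            return mg;
    return NULL;
}
```

With two '~' entries attached by different extensions, toy_mg_find() returns whichever comes first in the list, while toy_mg_findext() returns the entry belonging to the extension whose vtable is passed in.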
1370
1371 int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen);
1372
1373 This routine checks to see what types of magic "sv" has. If the
1374 mg_type field is an uppercase letter, then the mg_obj is copied to
1375 "nsv", but the mg_type field is changed to be the lowercase letter.
1376
1377 Understanding the Magic of Tied Hashes and Arrays
1378 Tied hashes and arrays are magical beasts of the "PERL_MAGIC_tied"
1379 magic type.
1380
1381 WARNING: As of the 5.004 release, proper usage of the array and hash
1382 access functions requires understanding a few caveats. Some of these
1383 caveats are actually considered bugs in the API, to be fixed in later
1384 releases, and are bracketed with [MAYCHANGE] below. If you find
1385 yourself actually applying such information in this section, be aware
1386 that the behavior may change in the future, umm, without warning.
1387
1388 The perl tie function associates a variable with an object that
1389 implements the various GET, SET, etc. methods. To perform the
1390 equivalent of the perl tie function from an XSUB, you must mimic this
1391 behaviour. The code below carries out the necessary steps -- firstly
1392 it creates a new hash, and then creates a second hash which it blesses
1393 into the class which will implement the tie methods. Lastly it ties
1394 the two hashes together, and returns a reference to the new tied hash.
1395 Note that the code below does NOT call the TIEHASH method in the MyTie
1396 class - see "Calling Perl Routines from within C Programs" for details
1397 on how to do this.
1398
1399 SV*
1400 mytie()
1401 PREINIT:
1402 HV *hash;
1403 HV *stash;
1404 SV *tie;
1405 CODE:
1406 hash = newHV();
1407 tie = newRV_noinc((SV*)newHV());
1408 stash = gv_stashpv("MyTie", GV_ADD);
1409 sv_bless(tie, stash);
1410 hv_magic(hash, (GV*)tie, PERL_MAGIC_tied);
1411 RETVAL = newRV_noinc((SV*)hash);
1412 OUTPUT:
1413 RETVAL
1414
1415 The "av_store" function, when given a tied array argument, merely
1416 copies the magic of the array onto the value to be "stored", using
1417 "mg_copy". It may also return NULL, indicating that the value did not
1418 actually need to be stored in the array. [MAYCHANGE] After a call to
1419 "av_store" on a tied array, the caller will usually need to call
1420 "mg_set(val)" to actually invoke the perl level "STORE" method on the
1421 TIEARRAY object. If "av_store" did return NULL, a call to
1422 "SvREFCNT_dec(val)" will usually also be necessary to avoid a memory
1423 leak. [/MAYCHANGE]
1424
1425 The previous paragraph is applicable verbatim to tied hash access using
1426 the "hv_store" and "hv_store_ent" functions as well.
1427
1428 "av_fetch" and the corresponding hash functions "hv_fetch" and
1429 "hv_fetch_ent" actually return an undefined mortal value whose magic
1430 has been initialized using "mg_copy". Note the value so returned does
1431 not need to be deallocated, as it is already mortal. [MAYCHANGE] But
1432 you will need to call "mg_get()" on the returned value in order to
1433 actually invoke the perl level "FETCH" method on the underlying TIE
1434 object. Similarly, you may also call "mg_set()" on the return value
1435 after possibly assigning a suitable value to it using "sv_setsv",
1436 which will invoke the "STORE" method on the TIE object. [/MAYCHANGE]
1437
1438 [MAYCHANGE] In other words, the array or hash fetch/store functions
1439 don't really fetch and store actual values in the case of tied arrays
1440 and hashes. They merely call "mg_copy" to attach magic to the values
1441 that were meant to be "stored" or "fetched". Later calls to "mg_get"
1442 and "mg_set" actually do the job of invoking the TIE methods on the
1443 underlying objects. Thus the magic mechanism currently implements a
1444 kind of lazy access to arrays and hashes.
1445
1446 Currently (as of perl version 5.004), use of the hash and array access
1447 functions requires the user to be aware of whether they are operating
1448 on "normal" hashes and arrays, or on their tied variants. The API may
1449 be changed to provide more transparent access to both tied and normal
1450 data types in future versions. [/MAYCHANGE]
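The lazy behaviour described above (fetch hands back a placeholder, and the tie methods only run when the magic is invoked) can be modelled in a few lines of plain C. This is a sketch under simplifying assumptions, with hypothetical names throughout: the "tied array" is a fixed C array, and the FETCH and STORE methods are ordinary function pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the object implementing the tie methods. */
typedef struct {
    int (*fetch)(int index);
    void (*store)(int index, int value);
} ToyTied;

/* Placeholder element: remembers where it came from, but holds no
 * real value until its get magic has been invoked. */
typedef struct {
    int value;
    int defined;
    const ToyTied *tie;
    int index;
} ToyElem;

int toy_backing[4] = { 10, 20, 30, 40 };
int toy_fetch_calls = 0;

int toy_fetch(int index) {
    assert(index >= 0 && index < 4);
    toy_fetch_calls++;
    return toy_backing[index];
}

void toy_store(int index, int value) {
    assert(index >= 0 && index < 4);
    toy_backing[index] = value;
}

const ToyTied toy_tie = { toy_fetch, toy_store };

/* Modelled on av_fetch() on a tied array: no FETCH call happens
 * here, only the bookkeeping needed to call it later. */
ToyElem toy_av_fetch(const ToyTied *tie, int index) {
    ToyElem e = { 0, 0, tie, index };
    return e;
}

/* Modelled on mg_get(): this is where FETCH actually runs. */
void toy_mg_get(ToyElem *e) {
    e->value = e->tie->fetch(e->index);
    e->defined = 1;
}

/* Modelled on mg_set(): this is where STORE actually runs. */
void toy_mg_set(ToyElem *e) {
    e->tie->store(e->index, e->value);
}
```

Forgetting the toy_mg_get() call leaves the element undefined, and forgetting toy_mg_set() leaves the backing store unchanged, which are the two pitfalls the [MAYCHANGE] paragraphs warn about.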
1451
1452 You would do well to understand that the TIEARRAY and TIEHASH
1453 interfaces are mere sugar to invoke some perl method calls while using
1454 the uniform hash and array syntax. The use of this sugar imposes some
1455 overhead (typically about two to four extra opcodes per FETCH/STORE
1456 operation, in addition to the creation of all the mortal variables
1457 required to invoke the methods). This overhead will be comparatively
1458 small if the TIE methods are themselves substantial, but if they are
1459 only a few statements long, the overhead will not be insignificant.
1460
1461 Localizing changes
1462 Perl has a very handy construction
1463
1464 {
1465 local $var = 2;
1466 ...
1467 }
1468
1469 This construction is approximately equivalent to
1470
1471 {
1472 my $oldvar = $var;
1473 $var = 2;
1474 ...
1475 $var = $oldvar;
1476 }
1477
1478 The biggest difference is that the first construction would reinstate
1479 the initial value of $var, irrespective of how control exits the block:
1480 "goto", "return", "die"/"eval", etc. It is a little bit more efficient
1481 as well.
1482
There is a way to achieve a similar task from C via the Perl API:
create a pseudo-block, and arrange for some changes to be automatically
undone at the end of it, either explicitly or via a non-local exit (via
die()). A block-like construct is created by a pair of "ENTER"/"LEAVE"
macros (see "Returning a Scalar" in perlcall). Such a construct may be
created specially for some important localized task, or an existing one
(like the boundaries of the enclosing Perl subroutine/block, or an
existing pair for freeing TMPs) may be used. (In the second case the
overhead of additional localization should be almost negligible.) Note
that any XSUB is automatically enclosed in an "ENTER"/"LEAVE" pair.
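
The undo-on-exit behaviour described above can be pictured with a small
plain-C model. This is only a sketch under stated assumptions: it does
not use the Perl API, every "save_model_*" name is hypothetical, and
the real save stack stores many more kinds of entries than an "int".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a save stack; not Perl's real data structures. */
#define SAVE_MODEL_MAX 64

struct save_model_entry {
    int *addr;   /* variable to restore; NULL marks a scope boundary */
    int  old;    /* value the variable had when it was saved */
};

static struct save_model_entry save_model_stack[SAVE_MODEL_MAX];
static size_t save_model_top = 0;

/* Plays the role of ENTER: mark where the pseudo-block starts. */
static void save_model_enter(void)
{
    assert(save_model_top < SAVE_MODEL_MAX);
    save_model_stack[save_model_top].addr = NULL;
    save_model_stack[save_model_top].old = 0;
    save_model_top++;
}

/* Plays the role of SAVEINT(i): remember the current value. */
static void save_model_int(int *addr)
{
    assert(addr != NULL && save_model_top < SAVE_MODEL_MAX);
    save_model_stack[save_model_top].addr = addr;
    save_model_stack[save_model_top].old = *addr;
    save_model_top++;
}

/* Plays the role of LEAVE: undo saved changes, newest first,
 * back to the most recent boundary. */
static void save_model_leave(void)
{
    while (save_model_top > 0) {
        struct save_model_entry *e = &save_model_stack[--save_model_top];
        if (e->addr == NULL)
            return;             /* reached the matching enter */
        *e->addr = e->old;
    }
}

/* Returns the value of var after the pseudo-block has been left. */
static int save_model_demo(void)
{
    int var = 1;
    save_model_enter();
    save_model_int(&var);
    var = 2;                    /* visible only inside the pseudo-block */
    save_model_leave();
    return var;                 /* the saved value, 1, is back */
}
```

In real XS code the same shape is written with "ENTER", one of the
"SAVE*" macros listed below, and "LEAVE"; the model only shows why the
order of restoration is newest first.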
1493
1494 Inside such a pseudo-block the following service is available:
1495
1496 "SAVEINT(int i)"
1497 "SAVEIV(IV i)"
1498 "SAVEI32(I32 i)"
1499 "SAVELONG(long i)"
1500 These macros arrange things to restore the value of integer
1501 variable "i" at the end of enclosing pseudo-block.
1502
1503 SAVESPTR(s)
1504 SAVEPPTR(p)
1505 These macros arrange things to restore the value of pointers "s"
1506 and "p". "s" must be a pointer of a type which survives conversion
1507 to "SV*" and back, "p" should be able to survive conversion to
1508 "char*" and back.
1509
1510 "SAVEFREESV(SV *sv)"
1511 The refcount of "sv" will be decremented at the end of pseudo-
1512 block. This is similar to "sv_2mortal" in that it is also a
1513 mechanism for doing a delayed "SvREFCNT_dec". However, while
1514 "sv_2mortal" extends the lifetime of "sv" until the beginning of
1515 the next statement, "SAVEFREESV" extends it until the end of the
1516 enclosing scope. These lifetimes can be wildly different.
1517
1518 Also compare "SAVEMORTALIZESV".
1519
1520 "SAVEMORTALIZESV(SV *sv)"
1521 Just like "SAVEFREESV", but mortalizes "sv" at the end of the
1522 current scope instead of decrementing its reference count. This
1523 usually has the effect of keeping "sv" alive until the statement
1524 that called the currently live scope has finished executing.
1525
1526 "SAVEFREEOP(OP *op)"
1527 The "OP *" is op_free()ed at the end of pseudo-block.
1528
1529 SAVEFREEPV(p)
1530 The chunk of memory which is pointed to by "p" is Safefree()ed at
1531 the end of pseudo-block.
1532
1533 "SAVECLEARSV(SV *sv)"
1534 Clears a slot in the current scratchpad which corresponds to "sv"
1535 at the end of pseudo-block.
1536
1537 "SAVEDELETE(HV *hv, char *key, I32 length)"
1538 The key "key" of "hv" is deleted at the end of pseudo-block. The
1539 string pointed to by "key" is Safefree()ed. If one has a key in
1540 short-lived storage, the corresponding string may be reallocated
1541 like this:
1542
1543 SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf));
1544
1545 "SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)"
1546 At the end of pseudo-block the function "f" is called with the only
1547 argument "p".
1548
1549 "SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)"
1550 At the end of pseudo-block the function "f" is called with the
1551 implicit context argument (if any), and "p".
1552
1553 "SAVESTACK_POS()"
1554 The current offset on the Perl internal stack (cf. "SP") is
1555 restored at the end of pseudo-block.
1556
1557 The following API list contains functions, thus one needs to provide
1558 pointers to the modifiable data explicitly (either C pointers, or
1559 Perlish "GV *"s). Where the above macros take "int", a similar
1560 function takes "int *".
1561
1562 "SV* save_scalar(GV *gv)"
1563 Equivalent to Perl code "local $gv".
1564
1565 "AV* save_ary(GV *gv)"
1566 "HV* save_hash(GV *gv)"
1567 Similar to "save_scalar", but localize @gv and %gv.
1568
"void save_item(SV *item)"
Duplicates the current value of "item"; on exit from the current
"ENTER"/"LEAVE" pseudo-block, the value of "item" is restored from
the stored copy. It doesn't handle magic. Use "save_scalar" if
magic is affected.
1574
1575 "void save_list(SV **sarg, I32 maxsarg)"
1576 A variant of "save_item" which takes multiple arguments via an
1577 array "sarg" of "SV*" of length "maxsarg".
1578
1579 "SV* save_svref(SV **sptr)"
1580 Similar to "save_scalar", but will reinstate an "SV *".
1581
1582 "void save_aptr(AV **aptr)"
1583 "void save_hptr(HV **hptr)"
1584 Similar to "save_svref", but localize "AV *" and "HV *".
1585
1586 The "Alias" module implements localization of the basic types within
1587 the caller's scope. People who are interested in how to localize
1588 things in the containing scope should take a look there too.
1589
1591 XSUBs and the Argument Stack
1592 The XSUB mechanism is a simple way for Perl programs to access C
1593 subroutines. An XSUB routine will have a stack that contains the
1594 arguments from the Perl program, and a way to map from the Perl data
1595 structures to a C equivalent.
1596
1597 The stack arguments are accessible through the ST(n) macro, which
1598 returns the "n"'th stack argument. Argument 0 is the first argument
1599 passed in the Perl subroutine call. These arguments are "SV*", and can
1600 be used anywhere an "SV*" is used.
1601
1602 Most of the time, output from the C routine can be handled through use
1603 of the RETVAL and OUTPUT directives. However, there are some cases
1604 where the argument stack is not already long enough to handle all the
1605 return values. An example is the POSIX tzname() call, which takes no
1606 arguments, but returns two, the local time zone's standard and summer
1607 time abbreviations.
1608
1609 To handle this situation, the PPCODE directive is used and the stack is
1610 extended using the macro:
1611
1612 EXTEND(SP, num);
1613
1614 where "SP" is the macro that represents the local copy of the stack
1615 pointer, and "num" is the number of elements the stack should be
1616 extended by.
1617
Now that there is room on the stack, values can be pushed on it using
the "PUSHs" macro. The pushed values will often need to be "mortal"
(see "Reference Counts and Mortality"):
1621
    PUSHs(sv_2mortal(newSViv(an_integer)));
    PUSHs(sv_2mortal(newSVuv(an_unsigned_integer)));
    PUSHs(sv_2mortal(newSVnv(a_double)));
    PUSHs(sv_2mortal(newSVpv("Some String",0)));
    /* Although the last example is better written as the more
     * efficient: */
    PUSHs(newSVpvs_flags("Some String", SVs_TEMP));
1629
The Perl program calling "tzname" can then assign the two values as
in:
1632
1633 ($standard_abbrev, $summer_abbrev) = POSIX::tzname;
1634
1635 An alternate (and possibly simpler) method to pushing values on the
1636 stack is to use the macro:
1637
1638 XPUSHs(SV*)
1639
1640 This macro automatically adjusts the stack for you, if needed. Thus,
1641 you do not need to call "EXTEND" to extend the stack.
1642
Despite suggestions to the contrary in earlier versions of this
document, the macros "(X)PUSH[iunp]" are not suited to XSUBs which
return multiple results. For that, either stick to the "(X)PUSHs"
macros shown above, or use the new "m(X)PUSH[iunp]" macros instead; see
"Putting a C value on Perl stack".
1648
1649 For more information, consult perlxs and perlxstut.
1650
1651 Autoloading with XSUBs
1652 If an AUTOLOAD routine is an XSUB, as with Perl subroutines, Perl puts
1653 the fully-qualified name of the autoloaded subroutine in the $AUTOLOAD
1654 variable of the XSUB's package.
1655
1656 But it also puts the same information in certain fields of the XSUB
1657 itself:
1658
1659 HV *stash = CvSTASH(cv);
1660 const char *subname = SvPVX(cv);
1661 STRLEN name_length = SvCUR(cv); /* in bytes */
1662 U32 is_utf8 = SvUTF8(cv);
1663
1664 "SvPVX(cv)" contains just the sub name itself, not including the
1665 package. For an AUTOLOAD routine in UNIVERSAL or one of its
1666 superclasses, "CvSTASH(cv)" returns NULL during a method call on a
1667 nonexistent package.
1668
1669 Note: Setting $AUTOLOAD stopped working in 5.6.1, which did not support
1670 XS AUTOLOAD subs at all. Perl 5.8.0 introduced the use of fields in
1671 the XSUB itself. Perl 5.16.0 restored the setting of $AUTOLOAD. If
1672 you need to support 5.8-5.14, use the XSUB's fields.
1673
1674 Calling Perl Routines from within C Programs
1675 There are four routines that can be used to call a Perl subroutine from
1676 within a C program. These four are:
1677
1678 I32 call_sv(SV*, I32);
1679 I32 call_pv(const char*, I32);
1680 I32 call_method(const char*, I32);
1681 I32 call_argv(const char*, I32, char**);
1682
1683 The routine most often used is "call_sv". The "SV*" argument contains
1684 either the name of the Perl subroutine to be called, or a reference to
1685 the subroutine. The second argument consists of flags that control the
1686 context in which the subroutine is called, whether or not the
1687 subroutine is being passed arguments, how errors should be trapped, and
1688 how to treat return values.
1689
All four routines return the number of values that the subroutine
returned on the Perl stack.
1692
1693 These routines used to be called "perl_call_sv", etc., before Perl
1694 v5.6.0, but those names are now deprecated; macros of the same name are
1695 provided for compatibility.
1696
When using any of these routines (except "call_argv"), the programmer
must manipulate the Perl stack. This involves the following macros and
functions:
1700
1701 dSP
1702 SP
1703 PUSHMARK()
1704 PUTBACK
1705 SPAGAIN
1706 ENTER
1707 SAVETMPS
1708 FREETMPS
1709 LEAVE
1710 XPUSH*()
1711 POP*()
1712
1713 For a detailed description of calling conventions from C to Perl,
1714 consult perlcall.
1715
1716 Putting a C value on Perl stack
A lot of opcodes (an opcode is an elementary operation in the internal
perl stack machine) put an SV* on the stack. However, as an
optimization the corresponding SV is (usually) not recreated each time.
The opcodes reuse specially assigned SVs (targets) which are (as a
corollary) not constantly freed/created.
1722
1723 Each of the targets is created only once (but see "Scratchpads and
1724 recursion" below), and when an opcode needs to put an integer, a
1725 double, or a string on stack, it just sets the corresponding parts of
1726 its target and puts the target on stack.
1727
1728 The macro to put this target on stack is "PUSHTARG", and it is directly
1729 used in some opcodes, as well as indirectly in zillions of others,
1730 which use it via "(X)PUSH[iunp]".
1731
1732 Because the target is reused, you must be careful when pushing multiple
1733 values on the stack. The following code will not do what you think:
1734
1735 XPUSHi(10);
1736 XPUSHi(20);
1737
1738 This translates as "set "TARG" to 10, push a pointer to "TARG" onto the
1739 stack; set "TARG" to 20, push a pointer to "TARG" onto the stack". At
1740 the end of the operation, the stack does not contain the values 10 and
1741 20, but actually contains two pointers to "TARG", which we have set to
1742 20.
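
The aliasing problem can be reproduced without Perl at all. The
following is a minimal plain-C sketch, not Perl API code: "targ_model"
stands in for "TARG", "targ_stack" for the argument stack, and all of
these names are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-ins: targ_model plays the role of TARG and
 * targ_stack the role of the Perl argument stack. */
static int  targ_model;
static int  targ_fresh[2];
static int *targ_stack[2];

/* Models XPUSHi(10); XPUSHi(20): both slots point at the one target. */
static void targ_push_twice_shared(void)
{
    targ_model = 10; targ_stack[0] = &targ_model;
    targ_model = 20; targ_stack[1] = &targ_model;
}

/* Models pushing two separate values: each slot has its own storage. */
static void targ_push_twice_fresh(void)
{
    targ_fresh[0] = 10; targ_stack[0] = &targ_fresh[0];
    targ_fresh[1] = 20; targ_stack[1] = &targ_fresh[1];
}

/* Read back what a stack slot currently refers to. */
static int targ_stack_value(int i)
{
    assert(i >= 0 && i < 2 && targ_stack[i] != 0);
    return *targ_stack[i];
}
```

After "targ_push_twice_shared()" both slots read back as 20, while
after "targ_push_twice_fresh()" they read back as 10 and 20.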
1743
1744 If you need to push multiple different values then you should either
1745 use the "(X)PUSHs" macros, or else use the new "m(X)PUSH[iunp]" macros,
1746 none of which make use of "TARG". The "(X)PUSHs" macros simply push an
1747 SV* on the stack, which, as noted under "XSUBs and the Argument Stack",
1748 will often need to be "mortal". The new "m(X)PUSH[iunp]" macros make
1749 this a little easier to achieve by creating a new mortal for you (via
1750 "(X)PUSHmortal"), pushing that onto the stack (extending it if
1751 necessary in the case of the "mXPUSH[iunp]" macros), and then setting
1752 its value. Thus, instead of writing this to "fix" the example above:
1753
    XPUSHs(sv_2mortal(newSViv(10)));
    XPUSHs(sv_2mortal(newSViv(20)));
1756
1757 you can simply write:
1758
    mXPUSHi(10);
    mXPUSHi(20);
1761
1762 On a related note, if you do use "(X)PUSH[iunp]", then you're going to
1763 need a "dTARG" in your variable declarations so that the "*PUSH*"
1764 macros can make use of the local variable "TARG". See also "dTARGET"
1765 and "dXSTARG".
1766
1767 Scratchpads
The question remains of when the SVs which are targets for opcodes are
created. The answer is that they are created when the current unit (a
subroutine, or a file for opcodes for statements outside of
subroutines) is compiled. During this time a special anonymous Perl
array is created, which is called a scratchpad for the current unit.
1773
1774 A scratchpad keeps SVs which are lexicals for the current unit and are
1775 targets for opcodes. A previous version of this document stated that
1776 one can deduce that an SV lives on a scratchpad by looking on its
1777 flags: lexicals have "SVs_PADMY" set, and targets have "SVs_PADTMP"
1778 set. But this has never been fully true. "SVs_PADMY" could be set on
1779 a variable that no longer resides in any pad. While targets do have
1780 "SVs_PADTMP" set, it can also be set on variables that have never
1781 resided in a pad, but nonetheless act like targets. As of perl 5.21.5,
1782 the "SVs_PADMY" flag is no longer used and is defined as 0.
1783 "SvPADMY()" now returns true for anything without "SVs_PADTMP".
1784
1785 The correspondence between OPs and targets is not 1-to-1. Different
1786 OPs in the compile tree of the unit can use the same target, if this
1787 would not conflict with the expected life of the temporary.
1788
1789 Scratchpads and recursion
It is not 100% true that a compiled unit contains a pointer to the
scratchpad AV. In fact it contains a pointer to an AV of (initially)
one element, and this element is the scratchpad AV. Why do we need an
extra level of indirection?
1794
The answer is recursion, and maybe threads. Both of these can create
several execution pointers going into the same subroutine. For the
subroutine-child not to write over the temporaries of the
subroutine-parent (whose lifespan covers the call to the child), the
parent and the child should have different scratchpads. (And the
lexicals should be separate anyway!)
1801
So each subroutine is born with an array of scratchpads (of length 1).
On each entry to the subroutine it is checked that the current depth of
the recursion is not more than the length of this array, and if it is,
a new scratchpad is created and pushed into the array.
1806
1807 The targets on this scratchpad are "undef"s, but they are already
1808 marked with correct flags.
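
The depth check can be sketched in plain C. This is a model only, with
hypothetical "pad_*" names and "int" slots in place of SVs; unlike
Perl, it creates even the first scratchpad lazily to keep the code
short.

```c
#include <assert.h>
#include <stdlib.h>

#define PAD_SLOTS 4   /* hypothetical number of targets and lexicals */

struct pad_list {
    int  **pads;     /* pads[d - 1] is the scratchpad for depth d */
    size_t length;   /* number of scratchpads created so far */
};

/* Return the scratchpad for the given recursion depth (1 = outermost
 * call), creating it the first time that depth is reached. */
static int *pad_for_depth(struct pad_list *pl, size_t depth)
{
    assert(pl != NULL && depth >= 1 && depth <= pl->length + 1);
    if (depth > pl->length) {
        int **grown = realloc(pl->pads, depth * sizeof *grown);
        if (grown == NULL)
            abort();
        pl->pads = grown;
        pl->pads[depth - 1] = calloc(PAD_SLOTS, sizeof(int));
        if (pl->pads[depth - 1] == NULL)
            abort();
        pl->length = depth;
    }
    return pl->pads[depth - 1];
}

/* A recursive "subroutine": each level keeps n in slot 0 of its own
 * pad, recurses, and then reports what slot 0 holds afterwards. */
static int pad_recurse(struct pad_list *pl, size_t depth, int n)
{
    int *pad = pad_for_depth(pl, depth);
    pad[0] = n;
    if (n > 0)
        pad_recurse(pl, depth + 1, n - 1);
    return pad[0];
}

/* Runs the recursion from depth 1 with n = 3 and frees the pads. */
static int pad_recursion_demo(void)
{
    struct pad_list pl = { NULL, 0 };
    int result = pad_recurse(&pl, 1, 3);
    size_t i;
    for (i = 0; i < pl.length; i++)
        free(pl.pads[i]);
    free(pl.pads);
    return result;
}
```

Because every depth has its own pad, the outermost call still finds its
own value in slot 0 after the nested calls return; with a single shared
pad it would find the innermost call's value instead.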
1809
1811 Allocation
1812 All memory meant to be used with the Perl API functions should be
1813 manipulated using the macros described in this section. The macros
1814 provide the necessary transparency between differences in the actual
1815 malloc implementation that is used within perl.
1816
1817 It is suggested that you enable the version of malloc that is
1818 distributed with Perl. It keeps pools of various sizes of unallocated
1819 memory in order to satisfy allocation requests more quickly. However,
1820 on some platforms, it may cause spurious malloc or free errors.
1821
The following three macros are used to initially allocate memory:
1823
1824 Newx(pointer, number, type);
1825 Newxc(pointer, number, type, cast);
1826 Newxz(pointer, number, type);
1827
1828 The first argument "pointer" should be the name of a variable that will
1829 point to the newly allocated memory.
1830
The second and third arguments "number" and "type" specify how many of
the specified type of data structure should be allocated. The argument
"type" is passed to "sizeof". The final argument to "Newxc", "cast",
should be used if the type of the "pointer" variable differs from the
"type" argument.
1836
1837 Unlike the "Newx" and "Newxc" macros, the "Newxz" macro calls "memzero"
1838 to zero out all the newly allocated memory.
1839
1840 Reallocation
1841 Renew(pointer, number, type);
1842 Renewc(pointer, number, type, cast);
1843 Safefree(pointer)
1844
These three macros are used to change a memory buffer size or to free a
piece of memory no longer needed. The arguments to "Renew" and
"Renewc" match those of "Newx" and "Newxc".
1849
1850 Moving
1851 Move(source, dest, number, type);
1852 Copy(source, dest, number, type);
1853 Zero(dest, number, type);
1854
1855 These three macros are used to move, copy, or zero out previously
1856 allocated memory. The "source" and "dest" arguments point to the
1857 source and destination starting points. Perl will move, copy, or zero
1858 out "number" instances of the size of the "type" data structure (using
1859 the "sizeof" function).
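
Note that the argument order is source first, destination second, which
is the reverse of the C library calls these macros wrap. The sketch
below uses simplified stand-ins with hypothetical "MODEL_" names so
that it compiles without the Perl headers; the real macros add casts
and, in some builds, overflow checks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for Move, Copy and Zero (hypothetical names). */
#define MODEL_Move(src, dst, n, type) memmove((dst), (src), (n) * sizeof(type))
#define MODEL_Copy(src, dst, n, type) memcpy((dst), (src), (n) * sizeof(type))
#define MODEL_Zero(dst, n, type)      memset((dst), 0, (n) * sizeof(type))

/* Shift the first n-1 elements of buf up by one slot, then zero the
 * vacated first slot.  Source and destination overlap, so the
 * Move-style macro is needed rather than the Copy-style one. */
static void model_shift_up(int *buf, size_t n)
{
    assert(buf != NULL && n >= 1);
    MODEL_Move(buf, buf + 1, n - 1, int);
    MODEL_Zero(buf, 1, int);
}

/* Shifts {1,2,3,4}, copies the result into a second, non-overlapping
 * array, and packs the four elements into one decimal number. */
static int model_shift_demo(void)
{
    int a[4] = { 1, 2, 3, 4 };
    int b[4];
    model_shift_up(a, 4);
    MODEL_Copy(a, b, 4, int);
    return b[0] * 1000 + b[1] * 100 + b[2] * 10 + b[3];
}
```

Here the array becomes {0, 1, 2, 3}, so "model_shift_demo()" returns
123.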
1860
1862 The most recent development releases of Perl have been experimenting
1863 with removing Perl's dependency on the "normal" standard I/O suite and
1864 allowing other stdio implementations to be used. This involves
1865 creating a new abstraction layer that then calls whichever
1866 implementation of stdio Perl was compiled with. All XSUBs should now
1867 use the functions in the PerlIO abstraction layer and not make any
1868 assumptions about what kind of stdio is being used.
1869
1870 For a complete description of the PerlIO abstraction, consult perlapio.
1871
1873 Code tree
1874 Here we describe the internal form your code is converted to by Perl.
1875 Start with a simple example:
1876
1877 $a = $b + $c;
1878
1879 This is converted to a tree similar to this one:
1880
1881 assign-to
1882 / \
1883 + $a
1884 / \
1885 $b $c
1886
1887 (but slightly more complicated). This tree reflects the way Perl
1888 parsed your code, but has nothing to do with the execution order.
1889 There is an additional "thread" going through the nodes of the tree
1890 which shows the order of execution of the nodes. In our simplified
1891 example above it looks like:
1892
1893 $b ---> $c ---> + ---> $a ---> assign-to
1894
But with the actual compile tree for "$a = $b + $c" it is different:
some nodes are optimized away. As a corollary, though the actual tree
contains more nodes than our simplified example, the execution order is
the same as in our example.
1899
1900 Examining the tree
1901 If you have your perl compiled for debugging (usually done with
1902 "-DDEBUGGING" on the "Configure" command line), you may examine the
1903 compiled tree by specifying "-Dx" on the Perl command line. The output
1904 takes several lines per node, and for "$b+$c" it looks like this:
1905
1906 5 TYPE = add ===> 6
1907 TARG = 1
1908 FLAGS = (SCALAR,KIDS)
1909 {
1910 TYPE = null ===> (4)
1911 (was rv2sv)
1912 FLAGS = (SCALAR,KIDS)
1913 {
1914 3 TYPE = gvsv ===> 4
1915 FLAGS = (SCALAR)
1916 GV = main::b
1917 }
1918 }
1919 {
1920 TYPE = null ===> (5)
1921 (was rv2sv)
1922 FLAGS = (SCALAR,KIDS)
1923 {
1924 4 TYPE = gvsv ===> 5
1925 FLAGS = (SCALAR)
1926 GV = main::c
1927 }
1928 }
1929
This tree has 5 nodes (one per "TYPE" specifier), of which only 3 are
not optimized away (one per number in the left column). The immediate
children of a given node correspond to "{}" pairs on the same level of
indentation, thus this listing corresponds to the tree:
1934
1935 add
1936 / \
1937 null null
1938 | |
1939 gvsv gvsv
1940
The execution order is indicated by "===>" marks, thus it is "3 4 5 6"
(node 6 is not included in the above listing), i.e., "gvsv gvsv add
whatever".
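
The difference between the tree links and the execution-order thread
can be modelled in a few lines of plain C. This is a sketch with
hypothetical names ("thread_op" and its fields), not the real "OP"
structure, and it simplifies by leaving the "null" ops out of the
thread entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct thread_op {
    const char       *name;
    struct thread_op *first;    /* tree link: first child */
    struct thread_op *sibling;  /* tree link: next child of same parent */
    struct thread_op *next;     /* thread link: op executed after this */
};

/* Build the five ops for "$b + $c" and follow the thread, collecting
 * the names of the ops in the order they would be executed. */
static const char *thread_demo_order(void)
{
    static char buf[64];
    struct thread_op add    = { "add",  NULL, NULL, NULL };
    struct thread_op null_b = { "null", NULL, NULL, NULL };
    struct thread_op null_c = { "null", NULL, NULL, NULL };
    struct thread_op gvsv_b = { "gvsv", NULL, NULL, NULL };
    struct thread_op gvsv_c = { "gvsv", NULL, NULL, NULL };
    const struct thread_op *o;

    /* Tree links mirror how the expression was parsed. */
    add.first      = &null_b;
    null_b.sibling = &null_c;
    null_b.first   = &gvsv_b;
    null_c.first   = &gvsv_c;

    /* Thread links give the execution order; null ops are skipped. */
    gvsv_b.next = &gvsv_c;
    gvsv_c.next = &add;

    buf[0] = '\0';
    for (o = &gvsv_b; o != NULL; o = o->next) {
        assert(strlen(buf) + strlen(o->name) + 2 <= sizeof buf);
        if (buf[0] != '\0')
            strcat(buf, " ");
        strcat(buf, o->name);
    }
    return buf;
}
```

Walking the tree from "add" visits five nodes, while walking the thread
from the first "gvsv" yields "gvsv gvsv add".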
1944
1945 Each of these nodes represents an op, a fundamental operation inside
1946 the Perl core. The code which implements each operation can be found
1947 in the pp*.c files; the function which implements the op with type
1948 "gvsv" is "pp_gvsv", and so on. As the tree above shows, different ops
1949 have different numbers of children: "add" is a binary operator, as one
1950 would expect, and so has two children. To accommodate the various
1951 different numbers of children, there are various types of op data
1952 structure, and they link together in different ways.
1953
1954 The simplest type of op structure is "OP": this has no children. Unary
1955 operators, "UNOP"s, have one child, and this is pointed to by the
1956 "op_first" field. Binary operators ("BINOP"s) have not only an
1957 "op_first" field but also an "op_last" field. The most complex type of
1958 op is a "LISTOP", which has any number of children. In this case, the
1959 first child is pointed to by "op_first" and the last child by
1960 "op_last". The children in between can be found by iteratively
1961 following the "OpSIBLING" pointer from the first child to the last (but
1962 see below).
1963
1964 There are also some other op types: a "PMOP" holds a regular
1965 expression, and has no children, and a "LOOP" may or may not have
1966 children. If the "op_children" field is non-zero, it behaves like a
1967 "LISTOP". To complicate matters, if a "UNOP" is actually a "null" op
1968 after optimization (see "Compile pass 2: context propagation") it will
1969 still have children in accordance with its former type.
1970
1971 Finally, there is a "LOGOP", or logic op. Like a "LISTOP", this has one
1972 or more children, but it doesn't have an "op_last" field: so you have
1973 to follow "op_first" and then the "OpSIBLING" chain itself to find the
1974 last child. Instead it has an "op_other" field, which is comparable to
1975 the "op_next" field described below, and represents an alternate
1976 execution path. Operators like "and", "or" and "?" are "LOGOP"s. Note
1977 that in general, "op_other" may not point to any of the direct children
1978 of the "LOGOP".
1979
1980 Starting in version 5.21.2, perls built with the experimental define
1981 "-DPERL_OP_PARENT" add an extra boolean flag for each op, "op_moresib".
1982 When not set, this indicates that this is the last op in an "OpSIBLING"
1983 chain. This frees up the "op_sibling" field on the last sibling to
1984 point back to the parent op. Under this build, that field is also
1985 renamed "op_sibparent" to reflect its joint role. The macro
1986 OpSIBLING(o) wraps this special behaviour, and always returns NULL on
1987 the last sibling. With this build the op_parent(o) function can be
1988 used to find the parent of any op. Thus for forward compatibility, you
1989 should always use the OpSIBLING(o) macro rather than accessing
1990 "op_sibling" directly.
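
The joint role of the sibling/parent field can be sketched as follows.
This is a plain-C model with hypothetical names ("sib_op",
"sib_next", "sib_parent"); it mirrors the description above rather
than the actual definitions in the Perl headers.

```c
#include <assert.h>
#include <stddef.h>

struct sib_op {
    struct sib_op *first;      /* first child, if any */
    struct sib_op *sibparent;  /* next sibling, or parent on last child */
    int            moresib;    /* non-zero if sibparent is a sibling */
};

/* Models OpSIBLING(o): always NULL on the last sibling. */
static struct sib_op *sib_next(const struct sib_op *o)
{
    assert(o != NULL);
    return o->moresib ? o->sibparent : NULL;
}

/* Models op_parent(o): walk to the last sibling, whose pointer
 * refers back to the parent. */
static struct sib_op *sib_parent(struct sib_op *o)
{
    assert(o != NULL);
    while (o->moresib)
        o = o->sibparent;
    return o->sibparent;
}

/* Count the children of a parent by following the sibling chain. */
static int sib_count_children(const struct sib_op *parent)
{
    const struct sib_op *kid;
    int n = 0;
    for (kid = parent->first; kid != NULL; kid = sib_next(kid))
        n++;
    return n;
}

/* Builds a parent with three children; returns 1 if the chain has
 * three entries and both ends of it lead back to the parent. */
static int sib_demo(void)
{
    struct sib_op list = { NULL, NULL, 0 };
    struct sib_op kid[3];
    int i;
    for (i = 0; i < 3; i++) {
        kid[i].first = NULL;
        kid[i].moresib = (i < 2);
        kid[i].sibparent = (i < 2) ? &kid[i + 1] : &list;
    }
    list.first = &kid[0];
    return sib_count_children(&list) == 3
        && sib_parent(&kid[0]) == &list
        && sib_parent(&kid[2]) == &list;
}
```

Code that reads the raw field instead of going through the accessor
would mistake the parent for a fourth sibling, which is why the macro
is recommended.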
1991
1992 Another way to examine the tree is to use a compiler back-end module,
1993 such as B::Concise.
1994
1995 Compile pass 1: check routines
1996 The tree is created by the compiler while yacc code feeds it the
1997 constructions it recognizes. Since yacc works bottom-up, so does the
1998 first pass of perl compilation.
1999
2000 What makes this pass interesting for perl developers is that some
2001 optimization may be performed on this pass. This is optimization by
2002 so-called "check routines". The correspondence between node names and
2003 corresponding check routines is described in opcode.pl (do not forget
2004 to run "make regen_headers" if you modify this file).
2005
A check routine is called when the node is fully constructed except for
the execution-order thread. Since at this time there are no back-links
to the currently constructed node, one can do almost any operation to
the top-level node, including freeing it and/or creating new nodes
above/below it.
2011
2012 The check routine returns the node which should be inserted into the
2013 tree (if the top-level node was not modified, check routine returns its
2014 argument).
2015
2016 By convention, check routines have names "ck_*". They are usually
2017 called from "new*OP" subroutines (or "convert") (which in turn are
2018 called from perly.y).
2019
2020 Compile pass 1a: constant folding
2021 Immediately after the check routine is called the returned node is
2022 checked for being compile-time executable. If it is (the value is
2023 judged to be constant) it is immediately executed, and a constant node
2024 with the "return value" of the corresponding subtree is substituted
2025 instead. The subtree is deleted.
2026
2027 If constant folding was not performed, the execution-order thread is
2028 created.
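
The folding step can be illustrated with a toy expression tree. This is
a sketch only: the "fold_*" names are hypothetical, the only foldable
operation is integer addition, and the node is rewritten in place
rather than replaced by a new constant node as Perl does.

```c
#include <assert.h>
#include <stddef.h>

enum fold_type { FOLD_CONST, FOLD_VAR, FOLD_ADD };

struct fold_node {
    enum fold_type    type;
    int               value;         /* used by FOLD_CONST only */
    struct fold_node *left, *right;  /* used by FOLD_ADD only */
};

/* Called on a freshly built node whose children were processed
 * earlier (bottom-up).  Returns the node to put into the tree. */
static struct fold_node *fold_check(struct fold_node *n)
{
    assert(n != NULL);
    if (n->type == FOLD_ADD
        && n->left->type == FOLD_CONST
        && n->right->type == FOLD_CONST) {
        /* "Execute" the subtree now and keep only its result. */
        n->value = n->left->value + n->right->value;
        n->type  = FOLD_CONST;
        n->left  = n->right = NULL;  /* real code would free these */
    }
    return n;
}

/* Builds (1 + 2) + x.  The inner sum folds to a constant; the outer
 * sum cannot fold because x is not known at compile time.  Returns
 * the folded inner value, or -1 if folding went wrong. */
static int fold_demo(void)
{
    struct fold_node one   = { FOLD_CONST, 1, NULL, NULL };
    struct fold_node two   = { FOLD_CONST, 2, NULL, NULL };
    struct fold_node x     = { FOLD_VAR,   0, NULL, NULL };
    struct fold_node inner = { FOLD_ADD,   0, &one, &two };
    struct fold_node outer = { FOLD_ADD,   0, NULL, NULL };
    struct fold_node *r = fold_check(&inner);

    outer.left  = r;
    outer.right = &x;
    if (fold_check(&outer)->type != FOLD_ADD || r->type != FOLD_CONST)
        return -1;
    return r->value;
}
```

Because folding runs right after each node is built, the outer node
already sees a constant on its left by the time it is checked.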
2029
2030 Compile pass 2: context propagation
When a context for a part of the compile tree is known, it is
propagated down through the tree. At this time the context can have 5
values (instead of 2 for runtime context): void, boolean, scalar, list,
and lvalue. In contrast with pass 1, this pass is processed from top
to bottom: a node's context determines the context for its children.
2036
2037 Additional context-dependent optimizations are performed at this time.
2038 Since at this moment the compile tree contains back-references (via
2039 "thread" pointers), nodes cannot be free()d now. To allow optimized-
2040 away nodes at this stage, such nodes are null()ified instead of
2041 free()ing (i.e. their type is changed to OP_NULL).
2042
2043 Compile pass 3: peephole optimization
After the compile tree for a subroutine (or for an "eval" or a file) is
created, an additional pass over the code is performed. This pass is
neither top-down nor bottom-up, but in the execution order (with
additional complications for conditionals). Optimizations performed at
this stage are subject to the same restrictions as in pass 2.
2049
2050 Peephole optimizations are done by calling the function pointed to by
2051 the global variable "PL_peepp". By default, "PL_peepp" just calls the
2052 function pointed to by the global variable "PL_rpeepp". By default,
2053 that performs some basic op fixups and optimisations along the
2054 execution-order op chain, and recursively calls "PL_rpeepp" for each
2055 side chain of ops (resulting from conditionals). Extensions may
2056 provide additional optimisations or fixups, hooking into either the
2057 per-subroutine or recursive stage, like this:
2058
2059 static peep_t prev_peepp;
2060 static void my_peep(pTHX_ OP *o)
2061 {
2062 /* custom per-subroutine optimisation goes here */
2063 prev_peepp(aTHX_ o);
2064 /* custom per-subroutine optimisation may also go here */
2065 }
2066 BOOT:
2067 prev_peepp = PL_peepp;
2068 PL_peepp = my_peep;
2069
2070 static peep_t prev_rpeepp;
2071 static void my_rpeep(pTHX_ OP *o)
2072 {
2073 OP *orig_o = o;
2074 for(; o; o = o->op_next) {
2075 /* custom per-op optimisation goes here */
2076 }
2077 prev_rpeepp(aTHX_ orig_o);
2078 }
2079 BOOT:
2080 prev_rpeepp = PL_rpeepp;
2081 PL_rpeepp = my_rpeep;
2082
2083 Pluggable runops
2084 The compile tree is executed in a runops function. There are two
2085 runops functions, in run.c and in dump.c. "Perl_runops_debug" is used
2086 with DEBUGGING and "Perl_runops_standard" is used otherwise. For fine
2087 control over the execution of the compile tree it is possible to
2088 provide your own runops function.
2089
2090 It's probably best to copy one of the existing runops functions and
2091 change it to suit your needs. Then, in the BOOT section of your XS
2092 file, add the line:
2093
2094 PL_runops = my_runops;
2095
2096 This function should be as efficient as possible to keep your programs
2097 running as fast as possible.
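
The shape of a runops loop, and what replacing it buys you, can be
sketched in plain C. This is a model with hypothetical "run_*" names:
"run_current" stands in for "PL_runops", and the op structure is
reduced to a function pointer and a thread link.

```c
#include <assert.h>
#include <stddef.h>

struct run_op;
typedef struct run_op *(*run_ppaddr_t)(struct run_op *);

struct run_op {
    run_ppaddr_t   ppaddr;  /* function implementing this op */
    struct run_op *next;    /* next op in execution order */
};

static int run_ops_executed;  /* bookkeeping done by the custom loop */

/* An op that does nothing except hand control to the next op. */
static struct run_op *run_pp_nop(struct run_op *o)
{
    return o->next;
}

/* Standard-style loop: call each op until one returns NULL. */
static void run_standard(struct run_op *o)
{
    while (o != NULL)
        o = o->ppaddr(o);
}

/* Custom loop: the same, plus a counter updated before every op. */
static void run_counting(struct run_op *o)
{
    while (o != NULL) {
        assert(o->ppaddr != NULL);
        run_ops_executed++;
        o = o->ppaddr(o);
    }
}

/* Plays the role of PL_runops. */
static void (*run_current)(struct run_op *) = run_standard;

/* Installs the counting loop, runs a chain of three ops, and
 * returns how many ops the loop saw. */
static int run_demo(void)
{
    struct run_op c = { run_pp_nop, NULL };
    struct run_op b = { run_pp_nop, &c };
    struct run_op a = { run_pp_nop, &b };

    run_ops_executed = 0;
    run_current = run_counting;   /* like PL_runops = my_runops; */
    run_current(&a);
    return run_ops_executed;
}
```

Everything added inside the loop body runs once per op, which is why
the efficiency advice above matters.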
2098
2099 Compile-time scope hooks
2100 As of perl 5.14 it is possible to hook into the compile-time lexical
2101 scope mechanism using "Perl_blockhook_register". This is used like
2102 this:
2103
2104 STATIC void my_start_hook(pTHX_ int full);
2105 STATIC BHK my_hooks;
2106
2107 BOOT:
2108 BhkENTRY_set(&my_hooks, bhk_start, my_start_hook);
2109 Perl_blockhook_register(aTHX_ &my_hooks);
2110
2111 This will arrange to have "my_start_hook" called at the start of
2112 compiling every lexical scope. The available hooks are:
2113
2114 "void bhk_start(pTHX_ int full)"
2115 This is called just after starting a new lexical scope. Note that
2116 Perl code like
2117
2118 if ($x) { ... }
2119
2120 creates two scopes: the first starts at the "(" and has "full ==
2121 1", the second starts at the "{" and has "full == 0". Both end at
2122 the "}", so calls to "start" and "pre"/"post_end" will match.
2123 Anything pushed onto the save stack by this hook will be popped
2124 just before the scope ends (between the "pre_" and "post_end"
2125 hooks, in fact).
2126
2127 "void bhk_pre_end(pTHX_ OP **o)"
2128 This is called at the end of a lexical scope, just before unwinding
2129 the stack. o is the root of the optree representing the scope; it
2130 is a double pointer so you can replace the OP if you need to.
2131
2132 "void bhk_post_end(pTHX_ OP **o)"
2133 This is called at the end of a lexical scope, just after unwinding
2134 the stack. o is as above. Note that it is possible for calls to
2135 "pre_" and "post_end" to nest, if there is something on the save
2136 stack that calls string eval.
2137
2138 "void bhk_eval(pTHX_ OP *const o)"
2139 This is called just before starting to compile an "eval STRING",
2140 "do FILE", "require" or "use", after the eval has been set up. o
2141 is the OP that requested the eval, and will normally be an
2142 "OP_ENTEREVAL", "OP_DOFILE" or "OP_REQUIRE".
2143
2144 Once you have your hook functions, you need a "BHK" structure to put
2145 them in. It's best to allocate it statically, since there is no way to
2146 free it once it's registered. The function pointers should be inserted
2147 into this structure using the "BhkENTRY_set" macro, which will also set
2148 flags indicating which entries are valid. If you do need to allocate
2149 your "BHK" dynamically for some reason, be sure to zero it before you
2150 start.
2151
2152 Once registered, there is no mechanism to switch these hooks off, so if
2153 that is necessary you will need to do this yourself. An entry in "%^H"
2154 is probably the best way, so the effect is lexically scoped; however it
2155 is also possible to use the "BhkDISABLE" and "BhkENABLE" macros to
2156 temporarily switch entries on and off. You should also be aware that
2157 generally speaking at least one scope will have opened before your
2158 extension is loaded, so you will see some "pre"/"post_end" pairs that
2159 didn't have a matching "start".
2160
2162 To aid debugging, the source file dump.c contains a number of functions
2163 which produce formatted output of internal data structures.
2164
2165 The most commonly used of these functions is "Perl_sv_dump"; it's used
2166 for dumping SVs, AVs, HVs, and CVs. The "Devel::Peek" module calls
2167 "sv_dump" to produce debugging output from Perl-space, so users of that
2168 module should already be familiar with its format.
2169
2170 "Perl_op_dump" can be used to dump an "OP" structure or any of its
2171 derivatives, and produces output similar to "perl -Dx"; in fact,
2172 "Perl_dump_eval" will dump the main root of the code being evaluated,
2173 exactly like "-Dx".
2174
2175 Other useful functions are "Perl_dump_sub", which turns a "GV" into an
2176 op tree, "Perl_dump_packsubs" which calls "Perl_dump_sub" on all the
2177 subroutines in a package like so: (Thankfully, these are all xsubs, so
2178 there is no op tree)
2179
2180 (gdb) print Perl_dump_packsubs(PL_defstash)
2181
2182 SUB attributes::bootstrap = (xsub 0x811fedc 0)
2183
2184 SUB UNIVERSAL::can = (xsub 0x811f50c 0)
2185
2186 SUB UNIVERSAL::isa = (xsub 0x811f304 0)
2187
2188 SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0)
2189
2190 SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0)
2191
2192 and "Perl_dump_all", which dumps all the subroutines in the stash and
2193 the op tree of the main root.
2194
2196 Background and PERL_IMPLICIT_CONTEXT
2197 The Perl interpreter can be regarded as a closed box: it has an API for
2198 feeding it code or otherwise making it do things, but it also has
2199 functions for its own use. This smells a lot like an object, and there
2200 are ways for you to build Perl so that you can have multiple
2201 interpreters, with one interpreter represented either as a C structure,
2202 or inside a thread-specific structure. These structures contain all
2203 the context, the state of that interpreter.
2204
2205 One macro controls the major Perl build flavor: MULTIPLICITY. The
2206 MULTIPLICITY build has a C structure that packages all the interpreter
2207 state. With multiplicity-enabled perls, PERL_IMPLICIT_CONTEXT is also
2208 normally defined, and enables the support for passing in a "hidden"
2209 first argument that represents the interpreter's state. MULTIPLICITY
2210 makes multi-threaded perls possible (with the ithreads threading model,
2211 related to the macro USE_ITHREADS.)
2212
2213 Two other "encapsulation" macros are the PERL_GLOBAL_STRUCT and
2214 PERL_GLOBAL_STRUCT_PRIVATE (the latter turns on the former, and the
2215 former turns on MULTIPLICITY.) The PERL_GLOBAL_STRUCT causes all the
2216 internal variables of Perl to be wrapped inside a single global struct,
2217 struct perl_vars, accessible as (globals) &PL_Vars or PL_VarsPtr or the
2218 function Perl_GetVars(). The PERL_GLOBAL_STRUCT_PRIVATE goes one step
2219 further: there is still a single struct (allocated in main() either
2220 from heap or from stack) but there are no global data symbols pointing
2221 to it. In either case the global struct should be initialized as the
2222 very first thing in main() using Perl_init_global_struct() and
2223 correspondingly torn down after perl_free() using
2224 Perl_free_global_struct(); please see miniperlmain.c for usage details.
2225 You may also need to use "dVAR" in your coding to "declare the global
2226 variables" when you are using them. dTHX does this for you
2227 automatically.
2228
2229 To see whether you have non-const data you can use a BSD (or GNU)
2230 compatible "nm":
2231
2232 nm libperl.a | grep -v ' [TURtr] '
2233
2234 If this displays any "D" or "d" symbols (or possibly "C" or "c"), you
2235 have non-const data. The symbols the "grep" removed are as follows:
2236 "Tt" are text, or code, the "Rr" are read-only (const) data, and the
2237 "U" is <undefined>, external symbols referred to.
2238
2239 The test t/porting/libperl.t does this kind of symbol sanity checking
2240 on "libperl.a".
2241
2242 For backward compatibility reasons defining just PERL_GLOBAL_STRUCT
2243 doesn't actually hide all symbols inside a big global struct: some
2244 PerlIO_xxx vtables are left visible. The PERL_GLOBAL_STRUCT_PRIVATE
2245 then hides everything (see how the PERLIO_FUNCS_DECL is used).
2246
2247 All this obviously requires a way for the Perl internal functions to be
2248 either subroutines taking some kind of structure as the first argument,
2249 or subroutines taking nothing as the first argument. To enable these
2250 two very different ways of building the interpreter, the Perl source
2251 (as it does in so many other situations) makes heavy use of macros and
2252 subroutine naming conventions.
2253
2254 First problem: deciding which functions will be public API functions
2255 and which will be private. All functions whose names begin "S_" are
2256 private (think "S" for "secret" or "static"). All other functions
2257 begin with "Perl_", but just because a function begins with "Perl_"
2258 does not mean it is part of the API. (See "Internal Functions".) The
2259 easiest way to be sure a function is part of the API is to find its
2260 entry in perlapi. If it exists in perlapi, it's part of the API. If
2261 it doesn't, and you think it should be (i.e., you need it for your
2262 extension), send mail via perlbug explaining why you think it should
2263 be.
2264
2265 Second problem: there must be a syntax so that the same subroutine
2266 declarations and calls can pass a structure as their first argument, or
2267 pass nothing. To solve this, the subroutines are named and declared in
2268 a particular way. Here's a typical start of a static function used
2269 within the Perl guts:
2270
2271 STATIC void
2272 S_incline(pTHX_ char *s)
2273
2274 STATIC becomes "static" in C, and may be #define'd to nothing in some
2275 configurations in the future.
2276
2277 A public function (i.e. part of the internal API, but not necessarily
2278 sanctioned for use in extensions) begins like this:
2279
2280 void
2281 Perl_sv_setiv(pTHX_ SV* dsv, IV num)
2282
2283 "pTHX_" is one of a number of macros (in perl.h) that hide the details
2284 of the interpreter's context. THX stands for "thread", "this", or
2285 "thingy", as the case may be. (And no, George Lucas is not involved.
2286 :-) The first character could be 'p' for a prototype, 'a' for argument,
2287 or 'd' for declaration, so we have "pTHX", "aTHX" and "dTHX", and their
2288 variants.
2289
2290 When Perl is built without options that set PERL_IMPLICIT_CONTEXT,
2291 there is no first argument containing the interpreter's context. The
2292 trailing underscore in the pTHX_ macro indicates that the macro
2293 expansion needs a comma after the context argument because other
2294 arguments follow it. If PERL_IMPLICIT_CONTEXT is not defined, pTHX_
2295 will be ignored, and the subroutine is not prototyped to take the extra
2296 argument. The form of the macro without the trailing underscore is
2297 used when there are no additional explicit arguments.
2298
2299 When a core function calls another, it must pass the context. This is
2300 normally hidden via macros. Consider "sv_setiv". It expands into
2301 something like this:
2302
2303 #ifdef PERL_IMPLICIT_CONTEXT
2304 #define sv_setiv(a,b) Perl_sv_setiv(aTHX_ a, b)
2305 /* can't do this for vararg functions, see below */
2306 #else
2307 #define sv_setiv Perl_sv_setiv
2308 #endif
2309
2310 This works well, and means that XS authors can gleefully write:
2311
2312 sv_setiv(foo, bar);
2313
2314 and still have it work under all the modes Perl could have been
2315 compiled with.
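
The macro mechanism described above can be sketched in plain C without
the Perl headers. Everything named below (my_interp, MY_IMPLICIT_CONTEXT,
the my_pTHX_ and my_aTHX_ macros, My_counter_add and counter_add) is a
hypothetical stand-in invented for this illustration; the real
definitions in perl.h and embed.h are considerably more involved.

```c
#include <assert.h>

/* Hypothetical stand-in for the structure holding interpreter state. */
typedef struct my_interp {
    long counter;
} my_interp;

#define MY_IMPLICIT_CONTEXT   /* remove to model a build without it */

#ifdef MY_IMPLICIT_CONTEXT
#  define my_pTHX   my_interp *my_perl     /* prototype, no other args */
#  define my_pTHX_  my_interp *my_perl,    /* prototype, args follow   */
#  define my_aTHX   my_perl                /* argument, no other args  */
#  define my_aTHX_  my_perl,               /* argument, args follow    */
#  define MY_STATE  (*my_perl)
#else
static my_interp my_global_interp;         /* single global state      */
#  define my_pTHX   void
#  define my_pTHX_
#  define my_aTHX
#  define my_aTHX_
#  define MY_STATE  my_global_interp
#endif

/* Long name: the context parameter appears only if the build has one. */
long
My_counter_add(my_pTHX_ long n)
{
    MY_STATE.counter += n;
    return MY_STATE.counter;
}

/* Short name: hides the context argument from the caller. */
#ifdef MY_IMPLICIT_CONTEXT
#  define counter_add(n)  My_counter_add(my_aTHX_ n)
#else
#  define counter_add     My_counter_add
#endif

/* A caller that already has the context in scope through my_pTHX_. */
long
My_counter_add_twice(my_pTHX_ long n)
{
    counter_add(n);
    return counter_add(n);
}
```

With MY_IMPLICIT_CONTEXT defined, counter_add(n) inside
My_counter_add_twice expands to My_counter_add(my_perl, n). Without it,
the same source compiles to calls that take no context argument and
operate on the single global state.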
2316
2317 This doesn't work so cleanly for varargs functions, though, as macros
2318 imply that the number of arguments is known in advance. Instead we
2319 either need to spell them out fully, passing "aTHX_" as the first
2320 argument (the Perl core tends to do this with functions like
2321 Perl_warner), or use a context-free version.
2322
2323 The context-free version of Perl_warner is called
2324 Perl_warner_nocontext, and does not take the extra argument. Instead
2325 it does dTHX; to get the context from thread-local storage. We
2326 "#define warner Perl_warner_nocontext" so that extensions get source
2327 compatibility at the expense of performance. (Passing an arg is
2328 cheaper than grabbing it from thread-local storage.)
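
The trade-off between the two forms can be sketched in standalone C.
This is an illustration only: my_interp, My_warner, My_warner_nocontext
and the my_set_context/my_get_context helpers are hypothetical names,
and a C11 thread-local variable stands in for whatever storage a real
perl uses for its context.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

typedef struct my_interp {
    char last_warning[128];   /* where this sketch "delivers" warnings */
} my_interp;

/* Stand-in for the per-thread slot holding the current interpreter. */
static _Thread_local my_interp *my_current_interp;

void       my_set_context(my_interp *p) { my_current_interp = p; }
my_interp *my_get_context(void)         { return my_current_interp; }

/* Shared worker that takes an already started va_list. */
static void
my_vwarner(my_interp *my_perl, const char *pat, va_list args)
{
    vsnprintf(my_perl->last_warning, sizeof my_perl->last_warning,
              pat, args);
}

/* Context-passing form: the caller spells out the context argument. */
void
My_warner(my_interp *my_perl, const char *pat, ...)
{
    va_list args;
    va_start(args, pat);
    my_vwarner(my_perl, pat, args);
    va_end(args);
}

/* Context-free form: same arguments minus the context, which is
 * looked up from the per-thread slot on every call. */
void
My_warner_nocontext(const char *pat, ...)
{
    my_interp *my_perl = my_get_context();   /* the role dTHX plays */
    va_list args;

    assert(my_perl != NULL);
    va_start(args, pat);
    my_vwarner(my_perl, pat, args);
    va_end(args);
}

/* Extension code sees the short name mapped to the context-free form. */
#define my_warner My_warner_nocontext
```

A function-like macro cannot splice a context argument in front of an
unknown number of arguments, which is why the short name here is a plain
alias for the context-free function and pays for a lookup on each call.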
2329
2330 You can ignore [pad]THXx when browsing the Perl headers/sources. Those
2331 are strictly for use within the core. Extensions and embedders need
2332 only be aware of [pad]THX.
2333
2334 So what happened to dTHR?
2335 "dTHR" was introduced in perl 5.005 to support the older thread model.
2336 The older thread model now uses the "THX" mechanism to pass context
2337 pointers around, so "dTHR" is not useful any more. Perl 5.6.0 and
2338 later still have it for backward source compatibility, but it is
2339 defined to be a no-op.
2340
2341 How do I use all this in extensions?
2342 When Perl is built with PERL_IMPLICIT_CONTEXT, extensions that call any
2343 functions in the Perl API will need to pass the initial context
2344 argument somehow. The kicker is that you will need to write it in such
2345 a way that the extension still compiles when Perl hasn't been built
2346 with PERL_IMPLICIT_CONTEXT enabled.
2347
2348 There are three ways to do this. First, the easy but inefficient way,
2349 which is also the default, in order to maintain source compatibility
2350 with extensions: whenever XSUB.h is #included, it redefines the aTHX
2351 and aTHX_ macros to call a function that will return the context.
2352 Thus, something like:
2353
2354 sv_setiv(sv, num);
2355
2356 in your extension will translate to this when PERL_IMPLICIT_CONTEXT is
2357 in effect:
2358
2359 Perl_sv_setiv(Perl_get_context(), sv, num);
2360
2361 or to this otherwise:
2362
2363 Perl_sv_setiv(sv, num);
2364
2365 You don't have to do anything new in your extension to get this; since
2366 the Perl library provides Perl_get_context(), it will all just work.
2367
2368 The second, more efficient way is to use the following template for
2369 your Foo.xs:
2370
2371 #define PERL_NO_GET_CONTEXT /* we want efficiency */
2372 #include "EXTERN.h"
2373 #include "perl.h"
2374 #include "XSUB.h"
2375
2376 STATIC void my_private_function(int arg1, int arg2);
2377
2378 STATIC void
2379 my_private_function(int arg1, int arg2)
2380 {
2381 dTHX; /* fetch context */
2382 ... call many Perl API functions ...
2383 }
2384
2385 [... etc ...]
2386
2387 MODULE = Foo PACKAGE = Foo
2388
2389 /* typical XSUB */
2390
2391 void
2392 my_xsub(arg)
2393 int arg
2394 CODE:
2395 my_private_function(arg, 10);
2396
2397 Note that the only two changes from the normal way of writing an
2398 extension are the addition of a "#define PERL_NO_GET_CONTEXT" before
2399 including the Perl headers, followed by a "dTHX;" declaration at the
2400 start of every function that will call the Perl API. (You'll know
2401 which functions need this, because the C compiler will complain that
2402 there's an undeclared identifier in those functions.) No changes are
2403 needed for the XSUBs themselves, because the XS() macro is correctly
2404 defined to pass in the implicit context if needed.
2405
2406 The third, even more efficient way is to ape how it is done within the
2407 Perl guts:
2408
2409 #define PERL_NO_GET_CONTEXT /* we want efficiency */
2410 #include "EXTERN.h"
2411 #include "perl.h"
2412 #include "XSUB.h"
2413
2414 /* pTHX_ only needed for functions that call Perl API */
2415 STATIC void my_private_function(pTHX_ int arg1, int arg2);
2416
2417 STATIC void
2418 my_private_function(pTHX_ int arg1, int arg2)
2419 {
2420 /* dTHX; not needed here, because THX is an argument */
2421 ... call Perl API functions ...
2422 }
2423
2424 [... etc ...]
2425
2426 MODULE = Foo PACKAGE = Foo
2427
2428 /* typical XSUB */
2429
2430 void
2431 my_xsub(arg)
2432 int arg
2433 CODE:
2434 my_private_function(aTHX_ arg, 10);
2435
2436 This implementation never has to fetch the context using a function
2437 call, since it is always passed as an extra argument. Depending on
2438 your needs for simplicity or efficiency, you may mix the previous two
2439 approaches freely.
2440
2441 Never add a comma after "pTHX" yourself--always use the form of the
2442 macro with the underscore for functions that take explicit arguments,
2443 or the form without the underscore for functions with no explicit
2444 arguments.
2445
2446 If one is compiling Perl with "-DPERL_GLOBAL_STRUCT", the "dVAR"
2447 definition is needed if the Perl global variables (see perlvars.h or
2448 globvar.sym) are accessed in the function and "dTHX" is not used (the
2449 "dTHX" includes the "dVAR" if necessary). One notices the need for
2450 "dVAR" only with the said compile-time define, because otherwise the
2451 Perl global variables are visible as-is.
2452
2453 Should I do anything special if I call perl from multiple threads?
2454 If you create interpreters in one thread and then proceed to call them
2455 in another, you need to make sure perl's own Thread Local Storage (TLS)
2456 slot is initialized correctly in each of those threads.
2457
2458 The "perl_alloc" and "perl_clone" API functions will automatically set
2459 the TLS slot to the interpreter they created, so that there is no need
2460 to do anything special if the interpreter is always accessed in the
2461 same thread that created it, and that thread did not create or call any
2462 other interpreters afterwards. If that is not the case, you have to
2463 set the TLS slot of the thread before calling any functions in the Perl
2464 API on that particular interpreter. This is done by calling the
2465 "PERL_SET_CONTEXT" macro in that thread as the first thing you do:
2466
2467 /* do this before doing anything else with some_perl */
2468 PERL_SET_CONTEXT(some_perl);
2469
2470 ... other Perl API calls on some_perl go here ...
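
Why the slot has to be set explicitly can be seen in a small standalone
model. The names here (my_interp, my_tls_slot, MY_SET_CONTEXT,
my_interp_init, my_api_call) are hypothetical and only mimic the
behaviour described above; no real Perl API is used.

```c
#include <assert.h>
#include <stddef.h>

typedef struct my_interp {
    int id;
    int calls;     /* number of API calls made on this interpreter */
} my_interp;

/* Stand-in for perl's thread-local slot (C11 thread storage). */
static _Thread_local my_interp *my_tls_slot;

#define MY_SET_CONTEXT(p)  (my_tls_slot = (p))

/* Like the allocation functions in the text, initialising an
 * interpreter also points the current thread's slot at it. */
void
my_interp_init(my_interp *p, int id)
{
    p->id = id;
    p->calls = 0;
    MY_SET_CONTEXT(p);
}

/* A context-free "API call": acts on whatever the slot holds and
 * returns that interpreter's id. */
int
my_api_call(void)
{
    assert(my_tls_slot != NULL);
    my_tls_slot->calls++;
    return my_tls_slot->id;
}
```

After two calls to my_interp_init in the same thread, the slot refers to
the second interpreter. A call meant for the first one silently lands on
the second unless MY_SET_CONTEXT is used first, which is the situation
the paragraph above warns about.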
2471
2472 Future Plans and PERL_IMPLICIT_SYS
2473 Just as PERL_IMPLICIT_CONTEXT provides a way to bundle up everything
2474 that the interpreter knows about itself and pass it around, so too are
2475 there plans to allow the interpreter to bundle up everything it knows
2476 about the environment it's running on. This is enabled with the
2477 PERL_IMPLICIT_SYS macro. Currently it only works with USE_ITHREADS on
2478 Windows.
2479
2480 This allows the ability to provide an extra pointer (called the "host"
2481 environment) for all the system calls. This makes it possible for all
2482 the system stuff to maintain their own state, broken down into seven C
2483 structures. These are thin wrappers around the usual system calls (see
2484 win32/perllib.c) for the default perl executable, but for a more
2485 ambitious host (like the one that would do fork() emulation) all the
2486 extra work needed to pretend that different interpreters are actually
2487 different "processes", would be done here.
2488
2489 The Perl engine/interpreter and the host are orthogonal entities.
2490 There could be one or more interpreters in a process, and one or more
2491 "hosts", with free association between them.
2492
2494 All of Perl's internal functions which will be exposed to the outside
2495 world are prefixed by "Perl_" so that they will not conflict with XS
2496 functions or functions used in a program in which Perl is embedded.
2497 Similarly, all global variables begin with "PL_". (By convention,
2498 static functions start with "S_".)
2499
2500 Inside the Perl core ("PERL_CORE" defined), you can get at the
2501 functions either with or without the "Perl_" prefix, thanks to a bunch
2502 of defines that live in embed.h. Note that extension code should not
2503 set "PERL_CORE"; this exposes the full perl internals, and is likely to
2504 cause breakage of the XS in each new perl release.
2505
2506 The file embed.h is generated automatically from embed.pl and
2507 embed.fnc. embed.pl also creates the prototyping header files for the
2508 internal functions, generates the documentation and a lot of other bits
2509 and pieces. It's important that when you add a new function to the
2510 core or change an existing one, you change the data in the table in
2511 embed.fnc as well. Here's a sample entry from that table:
2512
2513 Apd |SV** |av_fetch |AV* ar|I32 key|I32 lval
2514
2515 The second column is the return type, the third column the name.
2516 Columns after that are the arguments. The first column is a set of
2517 flags:
2518
2519 A This function is a part of the public API. All such functions
2520 should also have 'd'; very few do not.
2521
2522 p This function has a "Perl_" prefix; i.e. it is defined as
2523 "Perl_av_fetch".
2524
2525 d This function has documentation using the "apidoc" feature which
2526 we'll look at in a second. Some functions have 'd' but not 'A';
2527 docs are good.
2528
2529 Other available flags are:
2530
2531 s This is a static function and is defined as "STATIC S_whatever", and
2532 usually called within the sources as "whatever(...)".
2533
2534 n This does not need an interpreter context, so the definition has no
2535 "pTHX", and it follows that callers don't use "aTHX". (See
2536 "Background and PERL_IMPLICIT_CONTEXT".)
2537
2538 r This function never returns; "croak", "exit" and friends.
2539
2540 f This function takes a variable number of arguments, "printf" style.
2541 The argument list should end with "...", like this:
2542
2543 Afprd |void |croak |const char* pat|...
2544
2545 M This function is part of the experimental development API, and may
2546 change or disappear without notice.
2547
2548 o This function should not have a compatibility macro to define, say,
2549 "Perl_parse" to "parse". It must be called as "Perl_parse".
2550
2551 x This function isn't exported out of the Perl core.
2552
2553 m This is implemented as a macro.
2554
2555 X This function is explicitly exported.
2556
2557 E This function is visible to extensions included in the Perl core.
2558
2559 b Binary backward compatibility; this function is a macro but also has
2560 a "Perl_" implementation (which is exported).
2561
2562 others
2563 See the comments at the top of "embed.fnc" for others.
2564
2565 If you edit embed.pl or embed.fnc, you will need to run "make
2566 regen_headers" to force a rebuild of embed.h and other auto-generated
2567 files.
2568
2569 Formatted Printing of IVs, UVs, and NVs
2570 If you are printing IVs, UVs, or NVs, then instead of the stdio(3) style
2571 formatting codes like %d, %ld, %f, you should use the following macros
2572 for portability:
2573
2574 IVdf IV in decimal
2575 UVuf UV in decimal
2576 UVof UV in octal
2577 UVxf UV in hexadecimal
2578 NVef NV %e-like
2579 NVff NV %f-like
2580 NVgf NV %g-like
2581
2582 These will take care of 64-bit integers and long doubles. For example:
2583
2584 printf("IV is %" IVdf "\n", iv);
2585
2586 The IVdf will expand to whatever is the correct format for the IVs.
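
The same idea exists in standard C as the <inttypes.h> format macros,
which makes for a self-contained illustration. In this sketch my_IV,
my_UV, my_IVdf and my_UVxf are hypothetical stand-ins; a real perl picks
the underlying types and format strings at configuration time.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical stand-ins: an integer type chosen by configuration,
 * paired with format macros that always match it. */
typedef int64_t  my_IV;
typedef uint64_t my_UV;
#define my_IVdf PRId64
#define my_UVxf PRIx64

/* Formats iv in decimal; returns the number of characters produced. */
int
my_format_iv(char *buf, size_t size, my_IV iv)
{
    return snprintf(buf, size, "IV is %" my_IVdf, iv);
}

/* Formats uv in hexadecimal; returns the number of characters. */
int
my_format_uv_hex(char *buf, size_t size, my_UV uv)
{
    return snprintf(buf, size, "UV is 0x%" my_UVxf, uv);
}
```

The format macro is a string literal that the compiler concatenates with
its neighbours, so the conversion always matches the width of the type.
Keeping whitespace between the literals and the macro name avoids
problems with compilers that treat a name directly following a string
literal as a literal suffix.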
2587
2588 Note that there are different "long doubles": Perl will use whatever
2589 the compiler has.
2590
2591 If you are printing addresses of pointers, use UVxf combined with
2592 PTR2UV(), do not use %lx or %p.
2593
2594 Formatted Printing of "Size_t" and "SSize_t"
2595 The most general way to do this is to cast them to a UV or IV, and
2596 print as in the previous section.
2597
2598 But if you're using "PerlIO_printf()", it's less typing and visual
2599 clutter to use the "%z" length modifier (for siZe):
2600
2601 PerlIO_printf(PerlIO_stdout(), "STRLEN is %zu\n", len);
2602
2603 This modifier is not portable, so its use should be restricted to
2604 "PerlIO_printf()".
2605
2606 Pointer-To-Integer and Integer-To-Pointer
2607 Because pointer size does not necessarily equal integer size, use the
2608 following macros to do it right.
2609
2610 PTR2UV(pointer)
2611 PTR2IV(pointer)
2612 PTR2NV(pointer)
2613 INT2PTR(pointertotype, integer)
2614
2615 For example:
2616
2617 IV iv = ...;
2618 SV *sv = INT2PTR(SV*, iv);
2619
2620 and
2621
2622 AV *av = ...;
2623 UV uv = PTR2UV(av);
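
A standalone analogue of these conversions can be written with the
standard uintptr_t type. The names MY_PTR2UV, MY_INT2PTR, my_UV and the
node helpers are hypothetical, and the sketch assumes the platform
provides uintptr_t (an optional but very widely available type).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: an unsigned integer type wide enough to hold a pointer. */
typedef uintptr_t my_UV;

/* Convert through uintptr_t so the cast is well defined even when
 * pointers and plain ints differ in size. */
#define MY_PTR2UV(p)         ((my_UV)(uintptr_t)(p))
#define MY_INT2PTR(type, i)  ((type)(uintptr_t)(i))

typedef struct node {
    int value;
} node;

/* Stash a pointer in an integer slot, as one might in an IV field. */
my_UV
node_to_handle(node *n)
{
    return MY_PTR2UV(n);
}

/* Recover the pointer from the integer handle. */
node *
handle_to_node(my_UV h)
{
    return MY_INT2PTR(node *, h);
}
```

Casting a pointer directly to int or long can truncate it on platforms
where those types are narrower than a pointer, which is the mistake the
macros are there to prevent.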
2624
2625 Exception Handling
2626 There are a couple of macros to do very basic exception handling in XS
2627 modules. You have to define "NO_XSLOCKS" before including XSUB.h to be
2628 able to use these macros:
2629
2630 #define NO_XSLOCKS
2631 #include "XSUB.h"
2632
2633 You can use these macros if you call code that may croak, but you need
2634 to do some cleanup before giving control back to Perl. For example:
2635
2636 dXCPT; /* set up necessary variables */
2637
2638 XCPT_TRY_START {
2639 code_that_may_croak();
2640 } XCPT_TRY_END
2641
2642 XCPT_CATCH
2643 {
2644 /* do cleanup here */
2645 XCPT_RETHROW;
2646 }
2647
2648 Note that you always have to rethrow an exception that has been caught.
2649 Using these macros, it is not possible to just catch the exception and
2650 ignore it. If you have to ignore the exception, you have to use one of
2651 the "call_*" functions.
2652
2653 The advantage of using the above macros is that you don't have to set up
2654 an extra function for "call_*", and that using these macros is faster
2655 than using "call_*".
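
The try, clean up, rethrow pattern can be illustrated in plain C with
setjmp and longjmp. This is a sketch of the general technique only, not
a description of how XSUB.h defines the macros; my_croak, my_top_env,
my_guarded_work, my_run and my_live_buffers are hypothetical names.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf *my_top_env;   /* innermost active handler            */
int my_live_buffers;          /* allocations that are not yet freed  */

/* Stand-in for croak(): unwinds to the innermost handler. */
static void
my_croak(void)
{
    assert(my_top_env != NULL);
    longjmp(*my_top_env, 1);
}

static void
code_that_may_croak(int fail)
{
    if (fail)
        my_croak();
}

/* Allocates a buffer, runs the risky code, and frees the buffer
 * whether or not that code croaks. A caught exception is rethrown. */
void
my_guarded_work(int fail)
{
    jmp_buf env;
    jmp_buf *outer = my_top_env;   /* not modified after setjmp */
    char *buf = malloc(16);        /* not modified after setjmp */

    my_live_buffers++;
    my_top_env = &env;
    if (setjmp(env) == 0) {
        code_that_may_croak(fail);     /* the "try" body         */
        my_top_env = outer;
        free(buf);
        my_live_buffers--;
    } else {
        my_top_env = outer;            /* the "catch" body       */
        free(buf);                     /* do cleanup here        */
        my_live_buffers--;
        my_croak();                    /* rethrow to the caller  */
    }
}

/* Outermost caller: returns 1 if an exception reached it, else 0. */
int
my_run(int fail)
{
    jmp_buf env;
    int caught;

    my_top_env = &env;
    if (setjmp(env) == 0) {
        my_guarded_work(fail);
        caught = 0;
    } else {
        caught = 1;
    }
    my_top_env = NULL;
    return caught;
}
```

The inner handler restores the previous handler before rethrowing, so
the exception continues to the outer level after the buffer has been
released. Swallowing it instead would leave the outer code unaware that
anything went wrong.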
2656
2657 Source Documentation
2658 There's an effort going on to document the internal functions and
2659 automatically produce reference manuals from them -- perlapi is one
2660 such manual which details all the functions which are available to XS
2661 writers. perlintern is the autogenerated manual for the functions
2662 which are not part of the API and are supposedly for internal use only.
2663
2664 Source documentation is created by putting POD comments into the C
2665 source, like this:
2666
2667 /*
2668 =for apidoc sv_setiv
2669
2670 Copies an integer into the given SV. Does not handle 'set' magic. See
2671 L<perlapi/sv_setiv_mg>.
2672
2673 =cut
2674 */
2675
2676 Please try and supply some documentation if you add functions to the
2677 Perl core.
2678
2679 Backwards compatibility
2680 The Perl API changes over time. New functions are added or the
2681 interfaces of existing functions are changed. The "Devel::PPPort"
2682 module tries to provide compatibility code for some of these changes,
2683 so XS writers don't have to code it themselves when supporting multiple
2684 versions of Perl.
2685
2686 "Devel::PPPort" generates a C header file ppport.h that can also be run
2687 as a Perl script. To generate ppport.h, run:
2688
2689 perl -MDevel::PPPort -eDevel::PPPort::WriteFile
2690
2691 Besides checking existing XS code, the script can also be used to
2692 retrieve compatibility information for various API calls using the
2693 "--api-info" command line switch. For example:
2694
2695 % perl ppport.h --api-info=sv_magicext
2696
2697 For details, see "perldoc ppport.h".
2698
2700 Perl 5.6.0 introduced Unicode support. It's important for porters and
2701 XS writers to understand this support and make sure that the code they
2702 write does not corrupt Unicode data.
2703
2704 What is Unicode, anyway?
2705 In the olden, less enlightened times, we all used to use ASCII. Most
2706 of us did, anyway. The big problem with ASCII is that it's American.
2707 Well, no, that's not actually the problem; the problem is that it's not
2708 particularly useful for people who don't use the Roman alphabet. What
2709 used to happen was that particular languages would stick their own
2710 alphabet in the upper range of the sequence, between 128 and 255. Of
2711 course, we then ended up with plenty of variants that weren't quite
2712 ASCII, and the whole point of it being a standard was lost.
2713
2714 Worse still, if you've got a language like Chinese or Japanese that has
2715 hundreds or thousands of characters, then you really can't fit them
2716 into a mere 256, so they had to forget about ASCII altogether, and
2717 build their own systems using pairs of numbers to refer to one
2718 character.
2719
2720 To fix this, some people formed Unicode, Inc. and produced a new
2721 character set containing all the characters you can possibly think of
2722 and more. There are several ways of representing these characters, and
2723 the one Perl uses is called UTF-8. UTF-8 uses a variable number of
2724 bytes to represent a character. You can learn more about Unicode and
2725 Perl's Unicode model in perlunicode.
2726
2727 (On EBCDIC platforms, Perl uses instead UTF-EBCDIC, which is a form of
2728 UTF-8 adapted for EBCDIC platforms. Below, we just talk about UTF-8.
2729 UTF-EBCDIC is like UTF-8, but the details are different. The macros
2730 hide the differences from you, just remember that the particular
2731 numbers and bit patterns presented below will differ in UTF-EBCDIC.)
2732
2733 How can I recognise a UTF-8 string?
2734 You can't. This is because UTF-8 data is stored in bytes just like
2735 non-UTF-8 data. The Unicode character 200, (0xC8 for you hex types)
2736 capital E with a grave accent, is represented by the two bytes
2737 "v195.136". Unfortunately, the non-Unicode string "chr(195).chr(136)"
2738 has that byte sequence as well. So you can't tell just by looking --
2739 this is what makes Unicode input an interesting problem.
2740
2741 In general, you either have to know what you're dealing with, or you
2742 have to guess. The API function "is_utf8_string" can help; it'll tell
2743 you if a string contains only valid UTF-8 characters, and the chances
2744 of a non-UTF-8 string looking like valid UTF-8 become very small very
2745 quickly with increasing string length. On a character-by-character
2746 basis, "isUTF8_CHAR" will tell you whether the current character in a
2747 string is valid UTF-8.
2748
2749 How does UTF-8 represent Unicode characters?
2750 As mentioned above, UTF-8 uses a variable number of bytes to store a
2751 character. Characters with values 0...127 are stored in one byte, just
2752 like good ol' ASCII. Character 128 is stored as "v194.128"; this
2753 continues up to character 191, which is "v194.191". Now we've run out
2754 of bits (191 is binary 10111111) so we move on; character 192 is
2755 "v195.128". And so it goes on, moving to three bytes at character
2756 2048. "Unicode Encodings" in perlunicode has pictures of how this
2757 works.
2758
2759 Assuming you know you're dealing with a UTF-8 string, you can find out
2760 how long the first character in it is with the "UTF8SKIP" macro:
2761
2762 char *utf = "\305\233\340\240\201";
2763 I32 len;
2764
2765 len = UTF8SKIP(utf); /* len is 2 here */
2766 utf += len;
2767 len = UTF8SKIP(utf); /* len is 3 here */
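
What such a skip computation looks like can be shown with a standalone
function for standard UTF-8 sequences of up to four bytes. The names
my_utf8_skip and my_utf8_count are hypothetical, and the real macro
handles more cases than this sketch does.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence that starts with lead byte b.
 * Continuation bytes and invalid lead bytes yield 1 so that a scan
 * always makes progress. */
size_t
my_utf8_skip(unsigned char b)
{
    if (b < 0x80)           return 1;   /* 0xxxxxxx */
    if ((b & 0xE0) == 0xC0) return 2;   /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;   /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4;   /* 11110xxx */
    return 1;
}

/* Counts the characters in the n bytes at s, trusting the lead
 * bytes. Never reads at or beyond s + n. */
size_t
my_utf8_count(const unsigned char *s, size_t n)
{
    size_t chars = 0;
    size_t i = 0;

    while (i < n) {
        i += my_utf8_skip(s[i]);
        chars++;
    }
    return chars;
}
```

For the five-byte string used above, the lead byte \305 gives a length
of 2 and the lead byte \340 gives a length of 3, so the string holds two
characters.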
2768
2769 Another way to skip over characters in a UTF-8 string is to use
2770 "utf8_hop", which takes a string and a number of characters to skip
2771 over. You're on your own about bounds checking, though, so don't use
2772 it lightly.
2773
2774 All bytes in a multi-byte UTF-8 character will have the high bit set,
2775 so you can test if you need to do something special with this character
2776 like this (the "UTF8_IS_INVARIANT()" is a macro that tests whether the
2777 byte is encoded as a single byte even in UTF-8):
2778
2779 U8 *utf; /* Initialize this to point to the beginning of the
2780 sequence to convert */
2781 U8 *utf_end; /* Initialize this to 1 beyond the end of the sequence
2782 pointed to by 'utf' */
2783 UV uv; /* Returned code point; note: a UV, not a U8, not a
2784 char */
2785 STRLEN len; /* Returned length of character in bytes */
2786
2787 if (!UTF8_IS_INVARIANT(*utf))
2788 /* Must treat this as UTF-8 */
2789 uv = utf8_to_uvchr_buf(utf, utf_end, &len);
2790 else
2791 /* OK to treat this character as a byte */
2792 uv = *utf;
2793
2794 You can also see in that example that we use "utf8_to_uvchr_buf" to get
2795 the value of the character; the inverse function "uvchr_to_utf8" is
2796 available for putting a UV into UTF-8:
2797
2798 if (!UVCHR_IS_INVARIANT(uv))
2799 /* Must treat this as UTF8 */
2800 utf8 = uvchr_to_utf8(utf8, uv);
2801 else
2802 /* OK to treat this character as a byte */
2803 *utf8++ = uv;
2804
2805 You must convert characters to UVs using the above functions if you're
2806 ever in a situation where you have to match UTF-8 and non-UTF-8
2807 characters. You may not skip over UTF-8 characters in this case. If
2808 you do this, you'll lose the ability to match hi-bit non-UTF-8
2809 characters; for instance, if your UTF-8 string contains "v195.136", and
2810 you skip that character, you can never match a "chr(200)" in a
2811 non-UTF-8 string. So don't do that!
2812
2813 (Note that we don't have to test for invariant characters in the
2814 examples above. The functions work on any well-formed UTF-8 input.
2815 It's just that it's faster to avoid the function overhead when it's not
2816 needed.)
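
For readers who want to see the bit manipulation behind such
conversions, here is a standalone encoder and decoder for code points up
to 0xFFFF. The names my_cp_to_utf8 and my_utf8_to_cp are hypothetical;
the sketch does not reject overlong forms or surrogates and makes no
claim to match the behaviour of the Perl functions on malformed input.

```c
#include <assert.h>
#include <stddef.h>

/* Encodes code point cp (at most 0xFFFF in this sketch) at d and
 * returns the pointer just past the bytes written. */
unsigned char *
my_cp_to_utf8(unsigned char *d, unsigned long cp)
{
    assert(cp <= 0xFFFF);
    if (cp < 0x80) {
        *d++ = (unsigned char)cp;
    } else if (cp < 0x800) {
        *d++ = (unsigned char)(0xC0 | (cp >> 6));
        *d++ = (unsigned char)(0x80 | (cp & 0x3F));
    } else {
        *d++ = (unsigned char)(0xE0 | (cp >> 12));
        *d++ = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        *d++ = (unsigned char)(0x80 | (cp & 0x3F));
    }
    return d;
}

/* Decodes one character from [s, end). On success stores its byte
 * length in *len and returns the code point. On truncated or
 * malformed input stores 0 in *len and returns 0. */
unsigned long
my_utf8_to_cp(const unsigned char *s, const unsigned char *end,
              size_t *len)
{
    size_t need, i;
    unsigned long cp;

    *len = 0;
    if (s >= end)
        return 0;
    if (s[0] < 0x80)                { need = 1; cp = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { need = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 3; cp = s[0] & 0x0F; }
    else
        return 0;
    if ((size_t)(end - s) < need)
        return 0;                   /* sequence runs past the end */
    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;               /* not a continuation byte    */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    *len = need;
    return cp;
}
```

Encoding code point 200 yields the two bytes 0xC3 0x88, and decoding
those bytes gives 200 again with a length of 2. The end pointer lets the
decoder refuse a sequence that is cut short rather than read past the
buffer.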
2817
2818 How does Perl store UTF-8 strings?
2819 Currently, Perl deals with UTF-8 strings and non-UTF-8 strings slightly
2820 differently. A flag in the SV, "SVf_UTF8", indicates that the string
2821 is internally encoded as UTF-8. Without it, the byte value is the
2822 codepoint number and vice versa. This flag is only meaningful if the
2823 SV is "SvPOK" or immediately after stringification via "SvPV" or a
2824 similar macro. You can check and manipulate this flag with the
2825 following macros:
2826
2827 SvUTF8(sv)
2828 SvUTF8_on(sv)
2829 SvUTF8_off(sv)
2830
2831 This flag has an important effect on Perl's treatment of the string: if
2832 UTF-8 data is not properly distinguished, regular expressions,
2833 "length", "substr" and other string handling operations will have
2834 undesirable (wrong) results.
2835
2836 The problem comes when you have, for instance, a string that isn't
2837 flagged as UTF-8, and contains a byte sequence that could be UTF-8 --
2838 especially when combining non-UTF-8 and UTF-8 strings.
2839
2840 Never forget that the "SVf_UTF8" flag is separate from the PV value;
2841 you need to be sure you don't accidentally knock it off while you're
2842 manipulating SVs. More specifically, you cannot expect to do this:
2843
2844 SV *sv;
2845 SV *nsv;
2846 STRLEN len;
2847 char *p;
2848
2849 p = SvPV(sv, len);
2850 frobnicate(p);
2851 nsv = newSVpvn(p, len);
2852
2853 The "char*" string does not tell you the whole story, and you can't
2854 copy or reconstruct an SV just by copying the string value. Check if
2855 the old SV has the UTF8 flag set (after the "SvPV" call), and act
2856 accordingly:
2857
2858 p = SvPV(sv, len);
2859 is_utf8 = SvUTF8(sv);
2860 frobnicate(p, is_utf8);
2861 nsv = newSVpvn(p, len);
2862 if (is_utf8)
2863 SvUTF8_on(nsv);
2864
2865 In the above, your "frobnicate" function has been changed to be made
2866 aware of whether or not it's dealing with UTF-8 data, so that it can
2867 handle the string appropriately.
2868
2869 Since just passing an SV to an XS function and copying the data of the
2870 SV is not enough to copy the UTF8 flags, even less right is just
2871 passing a "char *" to an XS function.
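
The consequence of dropping the flag can be demonstrated with a toy
string type in standalone C. Nothing here is Perl API: my_str and its
helpers are hypothetical, with the is_utf8 member playing the role that
the UTF8 flag plays for an SV.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct my_str {
    char  *bytes;
    size_t len;       /* length in bytes                        */
    int    is_utf8;   /* plays the role of the SV's UTF8 flag   */
} my_str;

/* Number of characters: every byte counts when the string is not
 * UTF-8, otherwise only bytes that are not continuation bytes. */
size_t
my_str_chars(const my_str *s)
{
    size_t i, n = 0;

    if (!s->is_utf8)
        return s->len;
    for (i = 0; i < s->len; i++)
        if (((unsigned char)s->bytes[i] & 0xC0) != 0x80)
            n++;
    return n;
}

/* Copies the bytes AND the flag. Returns 0 on success, or -1 if
 * memory could not be allocated. The caller frees dst->bytes. */
int
my_str_copy(my_str *dst, const my_str *src)
{
    dst->bytes = malloc(src->len + 1);
    if (dst->bytes == NULL)
        return -1;
    memcpy(dst->bytes, src->bytes, src->len);
    dst->bytes[src->len] = '\0';
    dst->len = src->len;
    dst->is_utf8 = src->is_utf8;
    return 0;
}
```

The same two bytes count as one character with the flag set and as two
characters without it, so a copy that loses the flag changes what the
string means even though every byte is identical.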
2872
2873 For full generality, use the "DO_UTF8" macro to see if the string in an
2874 SV is to be treated as UTF-8. This takes into account if the call to
2875 the XS function is being made from within the scope of "use bytes". If
2876 so, the underlying bytes that comprise the UTF-8 string are to be
2877 exposed, rather than the character they represent. But this pragma
2878 should only really be used for debugging and perhaps low-level testing
2879 at the byte level. Hence most XS code need not concern itself with
2880 this, but various areas of the perl core do need to support it.
2881
2882 And this isn't the whole story. Starting in Perl v5.12, strings that
2883 aren't encoded in UTF-8 may also be treated as Unicode under various
2884 conditions (see "ASCII Rules versus Unicode Rules" in perlunicode).
2885 This is only really a problem for characters whose ordinals are between
2886 128 and 255, and their behavior varies under ASCII versus Unicode rules
2887 in ways that your code cares about (see "The "Unicode Bug"" in
2888 perlunicode). There is no published API for dealing with this, as it
2889 is subject to change, but you can look at the code for "pp_lc" in pp.c
2890 for an example as to how it's currently done.
2891
2892 How do I convert a string to UTF-8?
2893 If you're mixing UTF-8 and non-UTF-8 strings, it is necessary to
2894 upgrade the non-UTF-8 strings to UTF-8. If you've got an SV, the
2895 easiest way to do this is:
2896
2897 sv_utf8_upgrade(sv);
2898
2899 However, you must not do this, for example:
2900
2901 if (!SvUTF8(left))
2902 sv_utf8_upgrade(left);
2903
2904 If you do this in a binary operator, you will actually change one of
2905 the strings that came into the operator, and, while it shouldn't be
2906 noticeable by the end user, it can cause problems in deficient code.
2907
2908 Instead, "bytes_to_utf8" will give you a UTF-8-encoded copy of its
2909 string argument. This is useful for having the data available for
2910 comparisons and so on, without harming the original SV. There's also
2911 "utf8_to_bytes" to go the other way, but naturally, this will fail if
2912 the string contains any characters above 255 that can't be represented
2913 in a single byte.
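
The kind of conversion involved in producing such a copy can be sketched
in standalone C. The function my_bytes_to_utf8 below is hypothetical and
its signature is chosen for this illustration; it treats each input byte
as a code point in the range 0 to 255 and leaves the input untouched.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns a newly allocated, NUL-terminated UTF-8 copy of the len
 * bytes at s, treating each byte as a code point 0..255. The new
 * length in bytes is stored in *newlen. Returns NULL if memory could
 * not be allocated. The caller frees the result. */
unsigned char *
my_bytes_to_utf8(const unsigned char *s, size_t len, size_t *newlen)
{
    /* Worst case: every byte becomes two bytes, plus the NUL. */
    unsigned char *out = malloc(2 * len + 1);
    unsigned char *d = out;
    size_t i;

    if (out == NULL)
        return NULL;
    for (i = 0; i < len; i++) {
        if (s[i] < 0x80) {
            *d++ = s[i];                                /* unchanged */
        } else {
            *d++ = (unsigned char)(0xC0 | (s[i] >> 6));
            *d++ = (unsigned char)(0x80 | (s[i] & 0x3F));
        }
    }
    *d = '\0';
    *newlen = (size_t)(d - out);
    return out;
}
```

Because the result is a fresh buffer, the original bytes remain
available in their non-UTF-8 form, which is the property that makes a
copying conversion safe to use inside an operator.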
2914
2915 How do I compare strings?
2916 "sv_cmp" in perlapi and "sv_cmp_flags" in perlapi do a lexicographic
2917 comparison of two SV's, and handle UTF-8ness properly. Note, however,
2918 that Unicode specifies a much fancier mechanism for collation,
2919 available via the Unicode::Collate module.
2920
2921 To just compare two strings for equality/non-equality, you can just use
2922 "memEQ()" and "memNE()" as usual, except the strings must both be UTF-8
2923 encoded or both be non-UTF-8 encoded.
2924
2925 To compare two strings case-insensitively, use "foldEQ_utf8()" (the
2926 strings don't have to have the same UTF-8ness).
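
Why a plain byte comparison is only valid for strings of the same
UTF-8ness is easiest to see with a comparison that works by code point
instead. The function below is a hypothetical standalone sketch, not a
Perl API function; it assumes the UTF-8 argument is well formed.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the 8-bit string a (alen bytes, one code point per
 * byte) denotes the same code points as the UTF-8 string u (ulen
 * bytes), and 0 otherwise. */
int
my_bytes_eq_utf8(const unsigned char *a, size_t alen,
                 const unsigned char *u, size_t ulen)
{
    size_t i = 0, j = 0;

    while (i < alen && j < ulen) {
        unsigned long cp;

        if (u[j] < 0x80) {
            cp = u[j++];
        } else if ((u[j] & 0xFE) == 0xC2 && j + 1 < ulen) {
            /* Lead bytes 0xC2 and 0xC3 cover code points 128..255. */
            cp = ((unsigned long)(u[j] & 0x1F) << 6) | (u[j + 1] & 0x3F);
            j += 2;
        } else {
            return 0;   /* above 255 or truncated: no byte can match */
        }
        if (a[i++] != cp)
            return 0;
    }
    return i == alen && j == ulen;
}
```

The single byte 0xC8 and the UTF-8 sequence 0xC3 0x88 both denote code
point 200, so they compare equal here even though a byte comparison
would report a difference. Conversely, identical bytes can denote
different characters when only one side is UTF-8.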
2927
2928 Is there anything else I need to know?
2929 Not really. Just remember these things:
2930
2931 · There's no way to tell if a "char *" or "U8 *" string is UTF-8 or
2932 not. But you can tell if an SV is to be treated as UTF-8 by calling
2933 "DO_UTF8" on it, after stringifying it with "SvPV" or a similar
2934 macro. And, you can tell if an SV is actually UTF-8 (even if it is not
2935 to be treated as such) by looking at its "SvUTF8" flag (again after
2936 stringifying it). Don't forget to set the flag if something should
2937 be UTF-8. Treat the flag as part of the PV, even though it's not --
2938 if you pass on the PV to somewhere, pass on the flag too.
2939
2940 · If a string is UTF-8, always use "utf8_to_uvchr_buf" to get at the
2941 value, unless "UTF8_IS_INVARIANT(*s)" in which case you can use *s.
2942
· When writing a character UV to a UTF-8 string, always use
  "uvchr_to_utf8", unless "UVCHR_IS_INVARIANT(uv)" in which case you
  can use "*s = uv".
2946
2947 · Mixing UTF-8 and non-UTF-8 strings is tricky. Use "bytes_to_utf8"
2948 to get a new string which is UTF-8 encoded, and then combine them.
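
The second and third points above, about reading and writing characters, can be sketched as follows. This is a stand-alone toy in plain C for ASCII platforms. "toy_decode" and "toy_encode" are invented for the illustration and only loosely model what "utf8_to_uvchr_buf" and "uvchr_to_utf8" do; in particular they handle only code points up to U+10FFFF and do not reject overlong sequences.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long toy_uv;           /* stand-in for a UV */

/* Write cp at d and return the number of bytes written. */
static size_t
toy_encode(unsigned char *d, toy_uv cp)
{
    if (cp < 0x80) {                    /* invariant: just "*s = uv" */
        d[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        d[0] = (unsigned char)(0xC0 | (cp >> 6));
        d[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        d[0] = (unsigned char)(0xE0 | (cp >> 12));
        d[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        d[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    d[0] = (unsigned char)(0xF0 | (cp >> 18));
    d[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    d[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    d[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Decode one character from [s, end). On success *len is its length
 * in bytes; on malformed or truncated input *len is 0. */
static toy_uv
toy_decode(const unsigned char *s, const unsigned char *end, size_t *len)
{
    size_t need, i;
    toy_uv cp;

    *len = 0;
    if (s >= end)
        return 0;
    if (s[0] < 0x80) {                  /* invariant: the byte is the value */
        *len = 1;
        return s[0];
    }
    if ((s[0] & 0xE0) == 0xC0)      { need = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 4; cp = s[0] & 0x07; }
    else
        return 0;                       /* stray continuation byte etc. */

    if ((size_t)(end - s) < need)
        return 0;                       /* would read past the buffer */
    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    *len = need;
    return cp;
}
```

Passing the end of the buffer to the decoder is the important design point: without it, a truncated sequence at the end of a string would cause a read past the buffer.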
2949
2951 Custom operator support is an experimental feature that allows you to
2952 define your own ops. This is primarily to allow the building of
2953 interpreters for other languages in the Perl core, but it also allows
2954 optimizations through the creation of "macro-ops" (ops which perform
2955 the functions of multiple ops which are usually executed together, such
2956 as "gvsv, gvsv, add".)
2957
2958 This feature is implemented as a new op type, "OP_CUSTOM". The Perl
2959 core does not "know" anything special about this op type, and so it
2960 will not be involved in any optimizations. This also means that you
2961 can define your custom ops to be any op structure -- unary, binary,
2962 list and so on -- you like.
2963
2964 It's important to know what custom operators won't do for you. They
2965 won't let you add new syntax to Perl, directly. They won't even let
2966 you add new keywords, directly. In fact, they won't change the way
2967 Perl compiles a program at all. You have to do those changes yourself,
2968 after Perl has compiled the program. You do this either by
2969 manipulating the op tree using a "CHECK" block and the "B::Generate"
2970 module, or by adding a custom peephole optimizer with the "optimize"
2971 module.
2972
2973 When you do this, you replace ordinary Perl ops with custom ops by
2974 creating ops with the type "OP_CUSTOM" and the "op_ppaddr" of your own
2975 PP function. This should be defined in XS code, and should look like
2976 the PP ops in "pp_*.c". You are responsible for ensuring that your op
2977 takes the appropriate number of values from the stack, and you are
2978 responsible for adding stack marks if necessary.
2979
2980 You should also "register" your op with the Perl interpreter so that it
2981 can produce sensible error and warning messages. Since it is possible
2982 to have multiple custom ops within the one "logical" op type
2983 "OP_CUSTOM", Perl uses the value of "o->op_ppaddr" to determine which
2984 custom op it is dealing with. You should create an "XOP" structure for
2985 each ppaddr you use, set the properties of the custom op with
2986 "XopENTRY_set", and register the structure against the ppaddr using
2987 "Perl_custom_op_register". A trivial example might look like:
2988
    static XOP my_xop;
    static OP *my_pp(pTHX);

    BOOT:
        XopENTRY_set(&my_xop, xop_name, "myxop");
        XopENTRY_set(&my_xop, xop_desc, "Useless custom op");
        Perl_custom_op_register(aTHX_ my_pp, &my_xop);
2996
2997 The available fields in the structure are:
2998
2999 xop_name
A short name for your op. This will be included in some error
messages, and will also be returned as "$op->name" by the B module,
so it will appear in the output of modules like B::Concise.
3003
3004 xop_desc
3005 A short description of the function of the op.
3006
3007 xop_class
3008 Which of the various *OP structures this op uses. This should be
3009 one of the "OA_*" constants from op.h, namely
3010
3011 OA_BASEOP
3012 OA_UNOP
3013 OA_BINOP
3014 OA_LOGOP
3015 OA_LISTOP
3016 OA_PMOP
3017 OA_SVOP
3018 OA_PADOP
3019 OA_PVOP_OR_SVOP
3020 This should be interpreted as '"PVOP"' only. The "_OR_SVOP" is
3021 because the only core "PVOP", "OP_TRANS", can sometimes be a
3022 "SVOP" instead.
3023
3024 OA_LOOP
3025 OA_COP
3026
3027 The other "OA_*" constants should not be used.
3028
3029 xop_peep
This member is of type "Perl_cpeep_t", which expands to "void
(*Perl_cpeep_t)(pTHX_ OP *o, OP *oldop)". If it is set, this
function will be called from "Perl_rpeep" when ops of this type are
encountered by the peephole optimizer. o is the OP that needs
optimizing; oldop is the previous OP optimized, whose "op_next"
points to o.
3036
3037 "B::Generate" directly supports the creation of custom ops by name.
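
Because every custom op shares the one type "OP_CUSTOM", the only thing that distinguishes them is the function pointer. The following stand-alone sketch models that idea with a small table keyed by function pointer. It is a toy in plain C, not the interpreter's data structure: all "toy_" names and the fixed table size are invented, and falling back to the name "custom" for an unregistered pointer is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*toy_ppaddr_t)(void);      /* stand-in for a PP function */

typedef struct {
    const char *name;                   /* cf. xop_name */
    const char *desc;                   /* cf. xop_desc */
} toy_xop;

#define TOY_MAX_XOPS 16

static struct {
    toy_ppaddr_t   ppaddr;
    const toy_xop *xop;
} toy_registry[TOY_MAX_XOPS];
static size_t toy_registry_len;

/* Associate a descriptor with a function pointer; re-registering the
 * same pointer replaces the descriptor. Returns 0, or -1 if full. */
static int
toy_register(toy_ppaddr_t ppaddr, const toy_xop *xop)
{
    size_t i;

    for (i = 0; i < toy_registry_len; i++) {
        if (toy_registry[i].ppaddr == ppaddr) {
            toy_registry[i].xop = xop;
            return 0;
        }
    }
    if (toy_registry_len == TOY_MAX_XOPS)
        return -1;
    toy_registry[toy_registry_len].ppaddr = ppaddr;
    toy_registry[toy_registry_len].xop = xop;
    toy_registry_len++;
    return 0;
}

/* Name to use in messages for the op whose function is ppaddr. */
static const char *
toy_op_name(toy_ppaddr_t ppaddr)
{
    size_t i;

    for (i = 0; i < toy_registry_len; i++)
        if (toy_registry[i].ppaddr == ppaddr)
            return toy_registry[i].xop->name;
    return "custom";                    /* unregistered: generic name */
}

/* Two dummy "PP functions" so there is something to register. */
static int toy_pp_a(void) { return 1; }
static int toy_pp_b(void) { return 2; }
```

After registering "toy_pp_a" with a descriptor named "myxop", looking up "toy_pp_a" yields "myxop", while the unregistered "toy_pp_b" yields only the generic name. This is why an op that skips registration produces less helpful diagnostics.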
3038
3040 Note: this section describes a non-public internal API that is subject
3041 to change without notice.
3042
3043 Introduction to the context stack
3044 In Perl, dynamic scoping refers to the runtime nesting of things like
3045 subroutine calls, evals etc, as well as the entering and exiting of
3046 block scopes. For example, the restoring of a "local"ised variable is
3047 determined by the dynamic scope.
3048
Perl tracks the dynamic scope by a data structure called the context
stack, which is an array of "PERL_CONTEXT" structures, each of which
is a big union covering all the types of context. Whenever a new scope
is entered (such as a block, a "for" loop, or a subroutine call), a new
context entry is pushed onto the stack. Similarly, when leaving a block
or returning from a subroutine call etc., a context is popped. Since the
context stack represents the current dynamic scope, it can be searched.
For example, "next LABEL" searches back through the stack looking for a
loop context that matches the label; "return" pops contexts until it
finds a sub or eval context or similar; "caller" examines sub contexts
on the stack.
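
The push, pop and search behaviour just described can be modelled in a few lines. The sketch below is a stand-alone toy in plain C: the "toy_" names, the fixed-size array and the reduced set of context types are all invented here, and the real search routines do more (warnings, for example) than this model shows.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_cxtype { TOY_BLOCK, TOY_SUB, TOY_EVAL, TOY_LOOP };

typedef struct {
    enum toy_cxtype type;
    const char     *label;              /* loop label, or NULL */
} toy_cx;

#define TOY_CX_MAX 32
static toy_cx toy_cxstack[TOY_CX_MAX];
static int    toy_cxstack_ix = -1;      /* index of the current frame */

/* Enter a scope: push a new frame and return its address. */
static toy_cx *
toy_push(enum toy_cxtype type, const char *label)
{
    toy_cx *cx;

    assert(toy_cxstack_ix + 1 < TOY_CX_MAX);
    cx = &toy_cxstack[++toy_cxstack_ix];
    cx->type = type;
    cx->label = label;
    return cx;
}

/* Leave a scope: pop the current frame. */
static void
toy_pop(void)
{
    assert(toy_cxstack_ix >= 0);
    toy_cxstack_ix--;
}

/* Like "next LABEL": search back for a loop frame with that label.
 * A NULL label matches the innermost loop. Returns an index or -1. */
static int
toy_find_loop(const char *label)
{
    int i;

    for (i = toy_cxstack_ix; i >= 0; i--) {
        const toy_cx *cx = &toy_cxstack[i];
        if (cx->type != TOY_LOOP)
            continue;
        if (!label || (cx->label && strcmp(cx->label, label) == 0))
            return i;
    }
    return -1;
}

/* Like "return": find the innermost sub or eval frame, or -1. */
static int
toy_find_sub(void)
{
    int i;

    for (i = toy_cxstack_ix; i >= 0; i--)
        if (toy_cxstack[i].type == TOY_SUB || toy_cxstack[i].type == TOY_EVAL)
            return i;
    return -1;
}

/* Pop every frame above index ix, leaving ix as the current frame. */
static void
toy_unwind_to(int ix)
{
    while (toy_cxstack_ix > ix)
        toy_pop();
}
```

With a sub, two nested loops labelled OUTER and INNER, and a bare block pushed in that order, searching for OUTER finds index 1, an unlabelled search finds index 2, and unwinding to index 1 discards the inner loop and the block in one go.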
3060
Each context entry is labelled with a context type, "cx_type". Typical
context types are "CXt_SUB", "CXt_EVAL" etc., as well as "CXt_BLOCK"
and "CXt_NULL", which represent a basic scope (as pushed by "pp_enter")
and a sort block respectively. The type determines which parts of the
context union are valid.
3066
3067 The main division in the context struct is between a substitution scope
3068 ("CXt_SUBST") and block scopes, which are everything else. The former
3069 is just used while executing "s///e", and won't be discussed further
3070 here.
3071
3072 All the block scope types share a common base, which corresponds to
3073 "CXt_BLOCK". This stores the old values of various scope-related
3074 variables like "PL_curpm", as well as information about the current
3075 scope, such as "gimme". On scope exit, the old variables are restored.
3076
3077 Particular block scope types store extra per-type information. For
3078 example, "CXt_SUB" stores the currently executing CV, while the various
3079 for loop types might hold the original loop variable SV. On scope exit,
3080 the per-type data is processed; for example the CV has its reference
3081 count decremented, and the original loop variable is restored.
3082
3083 The macro "cxstack" returns the base of the current context stack,
3084 while "cxstack_ix" is the index of the current frame within that stack.
3085
3086 In fact, the context stack is actually part of a stack-of-stacks
3087 system; whenever something unusual is done such as calling a "DESTROY"
3088 or tie handler, a new stack is pushed, then popped at the end.
3089
Note that the API described here changed considerably in perl 5.24;
prior to that, big macros like "PUSHBLOCK" and "POPSUB" were used; in
5.24 they were replaced by the inline static functions described below.
In addition, the ordering and detail of how these macros/functions work
changed in many ways, often subtly. In particular, the old macros
didn't handle saving the savestack and temps stack positions, and
required additional "ENTER", "SAVETMPS" and "LEAVE" compared to the new
functions. The old-style macros will not be described further.
3098
3099 Pushing contexts
3100 For pushing a new context, the two basic functions are "cx =
3101 cx_pushblock()", which pushes a new basic context block and returns its
3102 address, and a family of similar functions with names like
3103 "cx_pushsub(cx)" which populate the additional type-dependent fields in
3104 the "cx" struct. Note that "CXt_NULL" and "CXt_BLOCK" don't have their
3105 own push functions, as they don't store any data beyond that pushed by
3106 "cx_pushblock".
3107
3108 The fields of the context struct and the arguments to the "cx_*"
3109 functions are subject to change between perl releases, representing
3110 whatever is convenient or efficient for that release.
3111
3112 A typical context stack pushing can be found in "pp_entersub"; the
3113 following shows a simplified and stripped-down example of a non-XS
3114 call, along with comments showing roughly what each function does.
3115
    dMARK;
    PERL_CONTEXT *cx;
    U8 gimme = GIMME_V;
    bool hasargs = cBOOL(PL_op->op_flags & OPf_STACKED);
    OP *retop = PL_op->op_next;
    I32 old_ss_ix = PL_savestack_ix;
    CV *cv = ....;

    /* ... make mortal copies of stack args which are PADTMPs here ... */

    /* ... do any additional savestack pushes here ... */

    /* Now push a new context entry of type 'CXt_SUB'; initially just
     * doing the actions common to all block types: */

    cx = cx_pushblock(CXt_SUB, gimme, MARK, old_ss_ix);

    /* this does (approximately):
     *   CXINC;               // cxstack_ix++ (grow if necessary)
     *   cx = CX_CUR();       // and get the address of new frame
     *   cx->cx_type           = CXt_SUB;
     *   cx->blk_gimme         = gimme;
     *   cx->blk_oldsp         = MARK - PL_stack_base;
     *   cx->blk_oldsaveix     = old_ss_ix;
     *   cx->blk_oldcop        = PL_curcop;
     *   cx->blk_oldmarksp     = PL_markstack_ptr - PL_markstack;
     *   cx->blk_oldscopesp    = PL_scopestack_ix;
     *   cx->blk_oldpm         = PL_curpm;
     *   cx->blk_old_tmpsfloor = PL_tmps_floor;
     *
     *   PL_tmps_floor         = PL_tmps_ix;
     */

    /* then update the new context frame with subroutine-specific info,
     * such as the CV about to be executed: */

    cx_pushsub(cx, cv, retop, hasargs);

    /* this does (approximately):
     *   cx->blk_sub.cv          = cv;
     *   cx->blk_sub.olddepth    = CvDEPTH(cv);
     *   cx->blk_sub.prevcomppad = PL_comppad;
     *   cx->cx_type            |= (hasargs) ? CXp_HASARGS : 0;
     *   cx->blk_sub.retop       = retop;
     *   SvREFCNT_inc_simple_void_NN(cv);
     */
3162
3163 Note that "cx_pushblock()" sets two new floors: for the args stack (to
3164 "MARK") and the temps stack (to "PL_tmps_ix"). While executing at this
3165 scope level, every "nextstate" (amongst others) will reset the args and
3166 tmps stack levels to these floors. Note that since "cx_pushblock" uses
3167 the current value of "PL_tmps_ix" rather than it being passed as an
3168 arg, this dictates at what point "cx_pushblock" should be called. In
3169 particular, any new mortals which should be freed only on scope exit
3170 (rather than at the next "nextstate") should be created first.
3171
3172 Most callers of "cx_pushblock" simply set the new args stack floor to
3173 the top of the previous stack frame, but for "CXt_LOOP_LIST" it stores
3174 the items being iterated over on the stack, and so sets "blk_oldsp" to
3175 the top of these items instead. Note that, contrary to its name,
3176 "blk_oldsp" doesn't always represent the value to restore "PL_stack_sp"
3177 to on scope exit.
3178
3179 Note the early capture of "PL_savestack_ix" to "old_ss_ix", which is
3180 later passed as an arg to "cx_pushblock". In the case of "pp_entersub",
3181 this is because, although most values needing saving are stored in
3182 fields of the context struct, an extra value needs saving only when the
3183 debugger is running, and it doesn't make sense to bloat the struct for
3184 this rare case. So instead it is saved on the savestack. Since this
3185 value gets calculated and saved before the context is pushed, it is
3186 necessary to pass the old value of "PL_savestack_ix" to "cx_pushblock",
3187 to ensure that the saved value gets freed during scope exit. For most
3188 users of "cx_pushblock", where nothing needs pushing on the save stack,
3189 "PL_savestack_ix" is just passed directly as an arg to "cx_pushblock".
3190
3191 Note that where possible, values should be saved in the context struct
3192 rather than on the save stack; it's much faster that way.
3193
3194 Normally "cx_pushblock" should be immediately followed by the
3195 appropriate "cx_pushfoo", with nothing between them; this is because if
3196 code in-between could die (e.g. a warning upgraded to fatal), then the
3197 context stack unwinding code in "dounwind" would see (in the example
3198 above) a "CXt_SUB" context frame, but without all the subroutine-
3199 specific fields set, and crashes would soon ensue.
3200
3201 Where the two must be separate, initially set the type to "CXt_NULL" or
3202 "CXt_BLOCK", and later change it to "CXt_foo" when doing the
3203 "cx_pushfoo". This is exactly what "pp_enteriter" does, once it's
3204 determined which type of loop it's pushing.
3205
3206 Popping contexts
Contexts are popped using "cx_popsub()" etc. and "cx_popblock()". Note
however, that unlike "cx_pushblock", neither of these functions
actually decrements the current context stack index; this is done
separately using "CX_POP()".
3211
3212 There are two main ways that contexts are popped. During normal
3213 execution as scopes are exited, functions like "pp_leave",
3214 "pp_leaveloop" and "pp_leavesub" process and pop just one context using
3215 "cx_popfoo" and "cx_popblock". On the other hand, things like
3216 "pp_return" and "next" may have to pop back several scopes until a sub
3217 or loop context is found, and exceptions (such as "die") need to pop
3218 back contexts until an eval context is found. Both of these are
3219 accomplished by "dounwind()", which is capable of processing and
3220 popping all contexts above the target one.
3221
3222 Here is a typical example of context popping, as found in "pp_leavesub"
3223 (simplified slightly):
3224
    U8 gimme;
    PERL_CONTEXT *cx;
    SV **oldsp;
    OP *retop;

    cx = CX_CUR();

    gimme = cx->blk_gimme;
    oldsp = PL_stack_base + cx->blk_oldsp; /* last arg of previous frame */

    if (gimme == G_VOID)
        PL_stack_sp = oldsp;
    else
        leave_adjust_stacks(oldsp, oldsp, gimme, 0);

    CX_LEAVE_SCOPE(cx);
    cx_popsub(cx);
    cx_popblock(cx);
    retop = cx->blk_sub.retop;
    CX_POP(cx);

    return retop;
3247
The steps above are in a very specific order, designed to be the
reverse order of when the context was pushed. The first thing to do is
to copy and/or protect any return arguments and free any temps in
the current scope. Scope exits like an rvalue sub normally return a
mortal copy of their return args (as opposed to lvalue subs). It is
important to make this copy before the save stack is popped or
variables are restored, or bad things like the following can happen:
3255
    sub f { my $x =...; $x } # $x freed before we get to copy it
    sub f { /(...)/; $1 }    # PL_curpm restored before $1 copied
3258
3259 Although we wish to free any temps at the same time, we have to be
3260 careful not to free any temps which are keeping return args alive; nor
3261 to free the temps we have just created while mortal copying return
3262 args. Fortunately, "leave_adjust_stacks()" is capable of making mortal
3263 copies of return args, shifting args down the stack, and only
3264 processing those entries on the temps stack that are safe to do so.
3265
3266 In void context no args are returned, so it's more efficient to skip
3267 calling "leave_adjust_stacks()". Also in void context, a "nextstate" op
3268 is likely to be imminently called which will do a "FREETMPS", so
3269 there's no need to do that either.
3270
3271 The next step is to pop savestack entries: "CX_LEAVE_SCOPE(cx)" is just
3272 defined as "LEAVE_SCOPE(cx->blk_oldsaveix)". Note that during the
3273 popping, it's possible for perl to call destructors, call "STORE" to
3274 undo localisations of tied vars, and so on. Any of these can die or
3275 call "exit()". In this case, "dounwind()" will be called, and the
3276 current context stack frame will be re-processed. Thus it is vital that
3277 all steps in popping a context are done in such a way to support
3278 reentrancy. The other alternative, of decrementing "cxstack_ix" before
3279 processing the frame, would lead to leaks and the like if something
3280 died halfway through, or overwriting of the current frame.
3281
3282 "CX_LEAVE_SCOPE" itself is safely re-entrant: if only half the
3283 savestack items have been popped before dying and getting trapped by
3284 eval, then the "CX_LEAVE_SCOPE"s in "dounwind" or "pp_leaveeval" will
3285 continue where the first one left off.
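
One way to obtain this "continue where the first one left off" property is to remove each entry from the stack before acting on it. The stand-alone sketch below models that in plain C. All "toy_" names are invented, and a "die" during a restore action is simulated by that action calling the unwinding routine itself rather than by a real longjmp.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*toy_restore_fn)(int slot);

#define TOY_SS_MAX 16
static struct {
    toy_restore_fn fn;
    int            slot;
} toy_savestack[TOY_SS_MAX];
static int toy_savestack_ix;            /* number of live entries */
static int toy_restored[TOY_SS_MAX];    /* times each slot was restored */

/* Push a restore action onto the toy savestack. */
static void
toy_save(toy_restore_fn fn, int slot)
{
    assert(toy_savestack_ix < TOY_SS_MAX);
    toy_savestack[toy_savestack_ix].fn = fn;
    toy_savestack[toy_savestack_ix].slot = slot;
    toy_savestack_ix++;
}

/* Pop and run entries until only `base` remain. The index is
 * decremented BEFORE the action runs, so if the action re-enters
 * this function, the entry is not seen a second time. */
static void
toy_leave_scope(int base)
{
    while (toy_savestack_ix > base) {
        int i = --toy_savestack_ix;
        toy_savestack[i].fn(toy_savestack[i].slot);
    }
}

/* An ordinary restore action. */
static void
toy_plain(int slot)
{
    toy_restored[slot]++;
}

/* A restore action that "dies": the unwinder then processes the
 * same scope again from wherever the index now stands. */
static void
toy_dying(int slot)
{
    toy_restored[slot]++;
    toy_leave_scope(0);
}
```

If three entries are saved and the middle one dies while being restored, the nested call finishes the remaining entry and the outer loop then finds nothing left to do; every entry is restored exactly once.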
3286
3287 The next step is the type-specific context processing; in this case
3288 "cx_popsub". In part, this looks like:
3289
    cv = cx->blk_sub.cv;
    CvDEPTH(cv) = cx->blk_sub.olddepth;
    cx->blk_sub.cv = NULL;
    SvREFCNT_dec(cv);
3294
where it's processing the just-executed CV. Note that before it
decrements the CV's reference count, it nulls the "blk_sub.cv". This
means that if it re-enters, the CV won't be freed twice. It also means
that you can't rely on such type-specific fields having useful values
after the return from "cx_popfoo".
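
The "null the field, then release" ordering is a general pattern, and a stand-alone model may make the reason clearer. The sketch below is plain C with invented "toy_" names. The destructor deliberately re-processes the same frame, which is a stand-in for what happens when freeing the CV triggers code that dies and the frame gets unwound again.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_cv {
    int   refcnt;
    void (*on_free)(void);      /* "destructor"; may re-enter */
} toy_cv;

typedef struct {
    toy_cv *cv;                 /* cf. blk_sub.cv */
} toy_frame;

static toy_frame toy_current;   /* the frame being popped */
static int       toy_free_count;

/* Drop one reference; run the destructor when it reaches zero. */
static void
toy_cv_dec(toy_cv *cv)
{
    if (cv && --cv->refcnt == 0) {
        toy_free_count++;
        if (cv->on_free)
            cv->on_free();
    }
}

/* Type-specific pop: take the pointer out of the frame FIRST, then
 * release it. A re-entrant call finds NULL and does nothing. */
static void
toy_popsub(toy_frame *f)
{
    toy_cv *cv = f->cv;

    f->cv = NULL;
    toy_cv_dec(cv);
}

/* Destructor that causes the current frame to be processed again. */
static void
toy_reenter(void)
{
    toy_popsub(&toy_current);
}
```

With a CV whose reference count is 1 and whose destructor re-enters, "toy_popsub" frees it exactly once. Had the field been cleared after the decrement instead, the nested call would have seen the stale pointer and decremented the count a second time.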
3300
3301 Next, "cx_popblock" restores all the various interpreter vars to their
3302 previous values or previous high water marks; it expands to:
3303
    PL_markstack_ptr = PL_markstack + cx->blk_oldmarksp;
    PL_scopestack_ix = cx->blk_oldscopesp;
    PL_curpm         = cx->blk_oldpm;
    PL_curcop        = cx->blk_oldcop;
    PL_tmps_floor    = cx->blk_old_tmpsfloor;
3309
3310 Note that it doesn't restore "PL_stack_sp"; as mentioned earlier, which
3311 value to restore it to depends on the context type (specifically "for
3312 (list) {}"), and what args (if any) it returns; and that will already
3313 have been sorted out earlier by "leave_adjust_stacks()".
3314
Finally, the context stack pointer is actually decremented by
"CX_POP(cx)". After this point, it's possible that the current
context frame could be overwritten by other contexts being pushed.
Although things like ties and "DESTROY" are supposed to work within a
new context stack, it's best not to assume this. Indeed on debugging
builds, "CX_POP(cx)" deliberately sets "cx" to null to detect code that
is still relying on the field values in that context frame. Note that
in the "pp_leavesub()" example above, we grab "blk_sub.retop" before
calling "CX_POP".
3324
3325 Redoing contexts
Finally, there is "cx_topblock(cx)", which acts like a
super-"nextstate" as regards resetting various vars to their base
values. It is used in places like "pp_next", "pp_redo" and "pp_goto"
where rather than exiting a scope, we want to re-initialise the scope.
As well as resetting "PL_stack_sp" like "nextstate", it also resets
"PL_markstack_ptr", "PL_scopestack_ix" and "PL_curpm". Note that it
doesn't do a "FREETMPS".
3333
3335 Note: this section describes a non-public internal API that is subject
3336 to change without notice.
3337
3338 Perl's internal error-handling mechanisms implement "die" (and its
3339 internal equivalents) using longjmp. If this occurs during lexing,
3340 parsing or compilation, we must ensure that any ops allocated as part
3341 of the compilation process are freed. (Older Perl versions did not
3342 adequately handle this situation: when failing a parse, they would leak
3343 ops that were stored in C "auto" variables and not linked anywhere
3344 else.)
3345
3346 To handle this situation, Perl uses op slabs that are attached to the
3347 currently-compiling CV. A slab is a chunk of allocated memory. New ops
3348 are allocated as regions of the slab. If the slab fills up, a new one
3349 is created (and linked from the previous one). When an error occurs and
3350 the CV is freed, any ops remaining are freed.
3351
3352 Each op is preceded by two pointers: one points to the next op in the
3353 slab, and the other points to the slab that owns it. The next-op
3354 pointer is needed so that Perl can iterate over a slab and free all its
3355 ops. (Op structures are of different sizes, so the slab's ops can't
3356 merely be treated as a dense array.) The slab pointer is needed for
3357 accessing a reference count on the slab: when the last op on a slab is
3358 freed, the slab itself is freed.
3359
3360 The slab allocator puts the ops at the end of the slab first. This will
3361 tend to allocate the leaves of the op tree first, and the layout will
3362 therefore hopefully be cache-friendly. In addition, this means that
3363 there's no need to store the size of the slab (see below on why slabs
3364 vary in size), because Perl can follow pointers to find the last op.
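
The layout described in the last two paragraphs (two pointers in front of each op, allocation from the end of the slab, and a reference count that frees the slab with its last op) can be modelled as below. This is a stand-alone toy in plain C, not the core's allocator: all "toy_" names are invented, sizes are counted in pointer-sized units, and, unlike the scheme described above, the toy keeps an explicit count of the free units for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct toy_slab toy_slab;

/* The two pointers that precede every op in this model. */
typedef struct toy_slot {
    struct toy_slot *next;      /* next op in the slab, NULL for last */
    toy_slab        *slab;      /* slab that owns this op */
} toy_slot;

struct toy_slab {
    size_t    refcnt;           /* number of live ops on the slab */
    size_t    avail;            /* free units left at the front */
    toy_slot *first;            /* most recently allocated op */
    void     *units[];          /* storage, carved up from the end */
};

static toy_slab *
toy_slab_new(size_t nunits)
{
    toy_slab *slab = malloc(sizeof *slab + nunits * sizeof(void *));

    if (slab) {
        slab->refcnt = 0;
        slab->avail = nunits;
        slab->first = NULL;
    }
    return slab;
}

/* Step back from an op body to the header in front of it. */
static toy_slot *
toy_slot_of(void *op)
{
    return (toy_slot *)op - 1;
}

/* Allocate an op body of body_bytes, taking space from the END of
 * the unused region. Returns NULL if the slab is full. */
static void *
toy_op_alloc(toy_slab *slab, size_t body_bytes)
{
    size_t hdr = sizeof(toy_slot) / sizeof(void *);
    size_t need = hdr + (body_bytes + sizeof(void *) - 1) / sizeof(void *);
    toy_slot *slot;

    if (need > slab->avail)
        return NULL;            /* a real allocator would add a slab */
    slab->avail -= need;
    slot = (toy_slot *)&slab->units[slab->avail];
    slot->next = slab->first;   /* link to the op allocated before us */
    slot->slab = slab;
    slab->first = slot;
    slab->refcnt++;
    return slot + 1;
}

/* Free one op. Returns 1 if that also freed the whole slab. */
static int
toy_op_free(void *op)
{
    toy_slab *slab = toy_slot_of(op)->slab;

    if (--slab->refcnt > 0)
        return 0;
    free(slab);
    return 1;
}
```

Three ops allocated in turn end up at descending addresses, each header pointing at the op allocated before it, and the slab is released only when the third of them is freed. Ops of different sizes are why the chain of "next" pointers is needed to walk the slab.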
3365
It might seem possible to eliminate slab reference counts altogether,
by having all ops implicitly attached to "PL_compcv" when allocated and
freed when the CV is freed. That would also allow "op_free" to skip
"FreeOp" altogether, and thus free ops faster. But that doesn't work in
those cases where ops need to survive beyond their CVs, such as
re-evals.
3372
3373 The CV also has to have a reference count on the slab. Sometimes the
3374 first op created is immediately freed. If the reference count of the
3375 slab reaches 0, then it will be freed with the CV still pointing to it.
3376
3377 CVs use the "CVf_SLABBED" flag to indicate that the CV has a reference
3378 count on the slab. When this flag is set, the slab is accessible via
3379 "CvSTART" when "CvROOT" is not set, or by subtracting two pointers
3380 "(2*sizeof(I32 *))" from "CvROOT" when it is set. The alternative to
3381 this approach of sneaking the slab into "CvSTART" during compilation
3382 would be to enlarge the "xpvcv" struct by another pointer. But that
3383 would make all CVs larger, even though slab-based op freeing is
3384 typically of benefit only for programs that make significant use of
3385 string eval.
3386
3387 When the "CVf_SLABBED" flag is set, the CV takes responsibility for
3388 freeing the slab. If "CvROOT" is not set when the CV is freed or
3389 undeffed, it is assumed that a compilation error has occurred, so the
3390 op slab is traversed and all the ops are freed.
3391
3392 Under normal circumstances, the CV forgets about its slab (decrementing
3393 the reference count) when the root is attached. So the slab reference
3394 counting that happens when ops are freed takes care of freeing the
3395 slab. In some cases, the CV is told to forget about the slab
3396 ("cv_forget_slab") precisely so that the ops can survive after the CV
3397 is done away with.
3398
Forgetting the slab when the root is attached is not strictly
necessary, but it avoids potential problems with "CvROOT" being written
over. Code throughout the core and on CPAN does things with "CvROOT",
so forgetting the slab makes things more robust.
3404
Since the CV takes ownership of its slab when flagged, that flag is
never copied when a CV is cloned. Otherwise one CV could free a slab
that another CV still points to, since forced freeing of ops ignores
the reference count (but asserts that it looks right).
3409
3410 To avoid slab fragmentation, freed ops are marked as freed and attached
3411 to the slab's freed chain (an idea stolen from DBM::Deep). Those freed
3412 ops are reused when possible. Not reusing freed ops would be simpler,
3413 but it would result in significantly higher memory usage for programs
3414 with large "if (DEBUG) {...}" blocks.
3415
3416 "SAVEFREEOP" is slightly problematic under this scheme. Sometimes it
3417 can cause an op to be freed after its CV. If the CV has forcibly freed
3418 the ops on its slab and the slab itself, then we will be fiddling with
3419 a freed slab. Making "SAVEFREEOP" a no-op doesn't help, as sometimes an
3420 op can be savefreed when there is no compilation error, so the op would
3421 never be freed. It holds a reference count on the slab, so the whole
3422 slab would leak. So "SAVEFREEOP" now sets a special flag on the op
3423 ("->op_savefree"). The forced freeing of ops after a compilation error
3424 won't free any ops thus marked.
3425
3426 Since many pieces of code create tiny subroutines consisting of only a
3427 few ops, and since a huge slab would be quite a bit of baggage for
3428 those to carry around, the first slab is always very small. To avoid
3429 allocating too many slabs for a single CV, each subsequent slab is
3430 twice the size of the previous.
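
The effect of the doubling rule is easy to see with a little arithmetic. The sketch below is a stand-alone toy in plain C. The first-slab size of 8 units is a made-up figure chosen only for the example, the "toy_" names are invented, and per-op headers and space wasted at slab boundaries are ignored.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_FIRST_SLAB_UNITS 8  /* hypothetical size of the first slab */

/* Size of the slab that follows one of `prev` units; 0 means that
 * no slab has been allocated yet. */
static size_t
toy_next_slab_units(size_t prev)
{
    return prev == 0 ? TOY_FIRST_SLAB_UNITS : prev * 2;
}

/* Number of slabs a CV ends up with once it has allocated `units`
 * in total. */
static size_t
toy_slabs_needed(size_t units)
{
    size_t slabs = 0, capacity = 0, size = 0;

    while (capacity < units) {
        size = toy_next_slab_units(size);
        capacity += size;
        slabs++;
    }
    return slabs;
}
```

Under these assumed numbers a tiny sub that needs 5 units carries a single 8-unit slab, while a sub needing 1000 units gets by with 7 slabs (8 + 16 + ... + 512 = 1016 units) rather than the 125 it would need if every slab stayed at 8 units.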
3431
3432 Smartmatch expects to be able to allocate an op at run time, run it,
3433 and then throw it away. For that to work the op is simply malloced when
3434 PL_compcv hasn't been set up. So all slab-allocated ops are marked as
3435 such ("->op_slabbed"), to distinguish them from malloced ops.
3436
3438 Until May 1997, this document was maintained by Jeff Okamoto
3439 <okamoto@corp.hp.com>. It is now maintained as part of Perl itself by
3440 the Perl 5 Porters <perl5-porters@perl.org>.
3441
3442 With lots of help and suggestions from Dean Roehrich, Malcolm Beattie,
3443 Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil
3444 Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer,
3445 Stephen McCamant, and Gurusamy Sarathy.
3446
3448 perlapi, perlintern, perlxs, perlembed
3449
3450
3451
perl v5.30.1 2019-11-29 PERLGUTS(1)