1INTERNALS(1) User Contributed Perl Documentation INTERNALS(1)
2
3
4
6 PDL::Internals - description of some aspects of the current internals
7
9 Intro
This document explains various aspects of the current implementation of
PDL. If you just want to use PDL for something, you definitely do not
need to read this. Even if you want to interface your C routines to PDL
or create new PDL::PP functions, you do not need to read this man page
(though it may be informative). This document is primarily intended for
people interested in debugging or changing the internals of PDL. Reading
it requires a good understanding of the C language, of programming and
data structures in general, and some understanding of Perl. If you read
through this document, understand all of it, can point to what any part
of it refers to in the PDL core sources, and additionally struggle to
understand PDL::PP, you will be awarded the title "PDL Guru" (of course,
the current version of this document is so incomplete that this is next
to impossible from just these notes).
24
25 Warning: If it seems that this document has gotten out of date, please
26 inform the PDL porters email list (pdl-devel@lists.sourceforge.net).
27 This may well happen.
28
29 Piddles
30 The pdl data object is generally an opaque scalar reference into a pdl
31 structure in memory. Alternatively, it may be a hash reference with the
32 "PDL" field containing the scalar reference (this makes overloading
33 piddles easy, see PDL::Objects). You can easily find out at the Perl
34 level which type of piddle you are dealing with. The example code below
35 demonstrates how to do it:
36
    # check if this is a piddle
    die "not a piddle" unless UNIVERSAL::isa($pdl, 'PDL');
    # is it a scalar ref or a hash ref?
    if (UNIVERSAL::isa($pdl, "HASH")) {
        die "not a valid PDL" unless exists $pdl->{PDL} &&
            UNIVERSAL::isa($pdl->{PDL},'PDL');
        print "This is a hash reference,",
              " the PDL field contains the scalar ref\n";
    } else {
        print "This is a scalar ref that points to address $$pdl in memory\n";
    }
48
The scalar reference points to the numeric address of a C structure of
type "pdl", which is defined in pdl.h. The mapping between the object at
the Perl level and the C structure containing the actual data and
structural information that make up a piddle is done by the PDL typemap.
The functions used in the PDL typemap are defined near the top of the
file pdlcore.h. So what does the structure look like?
55
56 struct pdl {
57 unsigned long magicno; /* Always stores PDL_MAGICNO as a sanity check */
58 /* This is first so most pointer accesses to wrong type are caught */
59 int state; /* What's in this pdl */
60
61 pdl_trans *trans; /* Opaque pointer to internals of transformation from
62 parent */
63
64 pdl_vaffine *vafftrans;
65
66 void* sv; /* (optional) pointer back to original sv.
67 ALWAYS check for non-null before use.
68 We cannot inc refcnt on this one or we'd
69 never get destroyed */
70
71 void *datasv; /* Pointer to SV containing data. Refcnt inced */
72 void *data; /* Null: no data alloced for this one */
73 PDL_Indx nvals; /* How many values allocated */
74 int datatype;
75 PDL_Indx *dims; /* Array of data dimensions */
76 PDL_Indx *dimincs; /* Array of data default increments */
77 short ndims; /* Number of data dimensions */
78
79 unsigned char *threadids; /* Starting index of the thread index set n */
80 unsigned char nthreadids;
81
82 pdl_children children;
83
84 PDL_Indx def_dims[PDL_NDIMS]; /* Preallocated space for efficiency */
85 PDL_Indx def_dimincs[PDL_NDIMS]; /* Preallocated space for efficiency */
86 unsigned char def_threadids[PDL_NTHREADIDS];
87
88 struct pdl_magic *magic;
89
90 void *hdrsv; /* "header", settable from outside */
91 };
92
93 This is quite a structure for just storing some data in - what is going
94 on?
95
96 Data storage
97 We are going to start with some of the simpler members: first of
98 all, there is the member
99
100 void *datasv;
101
102 which is really a pointer to a Perl SV structure ("SV *"). The SV
103 is expected to be representing a string, in which the data of the
104 piddle is stored in a tightly packed form. This pointer counts as
105 a reference to the SV so the reference count has been incremented
106 when the "SV *" was placed here (this reference count business has
107 to do with Perl's garbage collection mechanism -- don't worry if
108 this doesn't mean much to you). This pointer is allowed to have
109 the value "NULL" which means that there is no actual Perl SV for
110 this data - for instance, the data might be allocated by a "mmap"
operation. Note that the use of an SV* is purely for convenience: it
allows easy transformation of packed data from files into piddles.
Other implementations are not excluded.
114
115 The actual pointer to data is stored in the member
116
117 void *data;
118
119 which contains a pointer to a memory area with space for
120
121 PDL_Indx nvals;
122
123 data items of the data type of this piddle. PDL_Indx is either
124 'long' or 'long long' depending on whether your perl is 64bit or
125 not.
126
127 The data type of the data is stored in the variable
128
129 int datatype;
130
131 the values for this member are given in the enum "pdl_datatypes"
132 (see pdl.h). Currently we have byte, short, unsigned short, long,
133 index (either long or long long), long long, float and double
134 types, see also PDL::Types.
135
136 Dimensions
137 The number of dimensions in the piddle is given by the member
138
    short ndims;
140
141 which shows how many entries there are in the arrays
142
143 PDL_Indx *dims;
144 PDL_Indx *dimincs;
145
146 These arrays are intimately related: "dims" gives the sizes of the
147 dimensions and "dimincs" is always calculated by the code
148
149 PDL_Indx inc = 1;
150 for(i=0; i<it->ndims; i++) {
151 it->dimincs[i] = inc; inc *= it->dims[i];
152 }
153
154 in the routine "pdl_resize_defaultincs" in "pdlapi.c". What this
155 means is that the dimincs can be used to calculate the offset by
156 code like
157
158 PDL_Indx offs = 0;
159 for(i=0; i<it->ndims; i++) {
160 offs += it->dimincs[i] * index[i];
161 }
162
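but this is not always the right thing to do without checking certain
things first: for example, a piddle that shares its parent's data
through a virtual affine transformation (discussed later in this
document) is addressed with the increments and offset of that
transformation rather than with the default increments.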
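
The two loops above can be tried outside of PDL with a small standalone
sketch. Everything named "toy_..." here is made up for illustration and
is not part of the PDL sources; "toy_indx" merely stands in for
"PDL_Indx". The sketch only covers the default (densely packed) layout.

```c
#include <assert.h>
#include <stddef.h>

typedef long long toy_indx;   /* stand-in for PDL_Indx (an assumption) */

/* Fill incs[] with the default increments for dims[]. The first
 * dimension varies fastest, so incs[0] is always 1. Returns the total
 * number of elements described by dims[]. */
toy_indx toy_default_incs(const toy_indx *dims, toy_indx *incs, int ndims)
{
    toy_indx inc = 1;
    int i;
    assert(dims != NULL && incs != NULL && ndims >= 0);
    for (i = 0; i < ndims; i++) {
        incs[i] = inc;
        inc *= dims[i];
    }
    return inc;
}

/* Offset (in elements, not bytes) of the element at index[] */
toy_indx toy_offset(const toy_indx *incs, const toy_indx *index, int ndims)
{
    toy_indx offs = 0;
    int i;
    assert(incs != NULL && index != NULL && ndims >= 0);
    for (i = 0; i < ndims; i++)
        offs += incs[i] * index[i];
    return offs;
}
```

For dims (3,4,5) this gives increments (1,3,12) and 60 elements in
total; the element at index (2,1,3) then sits at offset 2*1 + 1*3 + 3*12
= 41 from the start of the data.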
165
166 Default storage
167 Since the vast majority of piddles don't have more than 6
168 dimensions, it is more efficient to have default storage for the
169 dimensions and dimincs inside the PDL struct.
170
171 PDL_Indx def_dims[PDL_NDIMS];
172 PDL_Indx def_dimincs[PDL_NDIMS];
173
174 The "dims" and "dimincs" may be set to point to the beginning of
175 these arrays if "ndims" is smaller than or equal to the compile-
176 time constant "PDL_NDIMS". This is important to note when freeing
177 a piddle struct. The same applies for the threadids:
178
179 unsigned char def_threadids[PDL_NTHREADIDS];
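
Why this matters when freeing can be seen in a minimal sketch. It is
not the code from pdlapi.c: the "toy_pdl" struct, the "toy_..."
functions and the value of "TOY_NDIMS" are assumptions made for the
example. The point is only that a "dims" pointer aimed at the
preallocated array must never be passed to free().

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_NDIMS 6          /* stand-in for PDL_NDIMS (value assumed) */
typedef long long toy_indx;  /* stand-in for PDL_Indx */

typedef struct toy_pdl {
    toy_indx *dims;                 /* def_dims, heap memory, or NULL */
    short     ndims;
    toy_indx  def_dims[TOY_NDIMS];  /* preallocated space */
} toy_pdl;

/* Make room for ndims dimensions (old contents are not preserved).
 * Expects a zero-initialised or previously set up struct.
 * Returns 0 on success, -1 if memory could not be allocated. */
int toy_set_ndims(toy_pdl *it, short ndims)
{
    toy_indx *fresh;
    assert(it != NULL && ndims >= 0);
    if (ndims <= TOY_NDIMS) {
        fresh = it->def_dims;       /* small case: no allocation needed */
    } else {
        fresh = malloc((size_t)ndims * sizeof *fresh);
        if (fresh == NULL)
            return -1;
    }
    if (it->dims != it->def_dims)
        free(it->dims);             /* only heap blocks are freed */
    it->dims = fresh;
    it->ndims = ndims;
    return 0;
}

/* Release the dimension storage without touching the embedded array */
void toy_free_dims(toy_pdl *it)
{
    assert(it != NULL);
    if (it->dims != it->def_dims)
        free(it->dims);             /* free(NULL) is a harmless no-op */
    it->dims = NULL;
    it->ndims = 0;
}
```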
180
181 Magic
182 It is possible to attach magic to piddles, much like Perl's own
183 magic mechanism. If the member pointer
184
185 struct pdl_magic *magic;
186
187 is nonzero, the PDL has some magic attached to it. The
188 implementation of magic can be gleaned from the file pdlmagic.c in
189 the distribution.
190
191 State
192 One of the first members of the structure is
193
194 int state;
195
196 The possible flags and their meanings are given in "pdl.h". These
197 are mainly used to implement the lazy evaluation mechanism and
198 keep track of piddles in these operations.
199
200 Transformations and virtual affine transformations
201 As you should already know, piddles often carry information about
202 where they come from. For example, the code
203
204 $y = $x->slice("2:5");
205 $y .= 1;
206
207 will alter $x. So $y and $x know that they are connected via a
208 "slice"-transformation. This information is stored in the members
209
210 pdl_trans *trans;
211 pdl_vaffine *vafftrans;
212
213 Both $x (the parent) and $y (the child) store this information
214 about the transformation in appropriate slots of the "pdl"
215 structure.
216
217 "pdl_trans" and "pdl_vaffine" are structures that we will look at
218 in more detail below.
219
220 The Perl SVs
221 When piddles are referred to through Perl SVs, we store an
222 additional reference to it in the member
223
224 void* sv;
225
in order to be able to return a reference to the user when they
want to inspect the transformation structure on the Perl side.
228
229 Also, we store an opaque
230
231 void *hdrsv;
232
233 which is just for use by the user to hook up arbitrary data with
234 this sv. This one is generally manipulated through sethdr and
235 gethdr calls.
236
237 Smart references and transformations: slicing and dicing
238 Smart references and most other fundamental functions operating on
239 piddles are implemented via transformations (as mentioned above) which
240 are represented by the type "pdl_trans" in PDL.
241
242 A transformation links input and output piddles and contains all the
243 infrastructure that defines how:
244
245 • output piddles are obtained from input piddles;
246
• changes in smartly linked output piddles (e.g. the child of a
sliced parent piddle) flow back to the input piddle in
transformations where this is supported (the most commonly used
example being "slice");
251
252 • datatype and size of output piddles that need to be created are
253 obtained.
254
255 In general, executing a PDL function on a group of piddles results in
256 creation of a transformation of the requested type that links all input
257 and output arguments (at least those that are piddles). In PDL
258 functions that support data flow between input and output args (e.g.
259 "slice", "index") this transformation links parent (input) and child
260 (output) piddles permanently until either the link is explicitly broken
261 by user request ("sever" at the Perl level) or all parents and children
262 have been destroyed. In those cases the transformation is lazy-
263 evaluated, e.g. only executed when piddle values are actually accessed.
264
265 In non-flowing functions, for example addition ("+") and inner products
266 ("inner"), the transformation is installed just as in flowing functions
267 but then the transformation is immediately executed and destroyed
268 (breaking the link between input and output args) before the function
269 returns.
270
271 It should be noted that the close link between input and output args of
272 a flowing function (like slice) requires that piddle objects that are
273 linked in such a way be kept alive beyond the point where they have
274 gone out of scope from the point of view of Perl:
275
276 $x = zeroes(20);
277 $y = $x->slice('2:4');
278 undef $x; # last reference to $x is now destroyed
279
280 Although $x should now be destroyed according to Perl's rules the
281 underlying "pdl" structure must actually only be freed when $y also
282 goes out of scope (since it still references internally some of $x's
283 data). This example demonstrates that such a dataflow paradigm between
284 PDL objects necessitates a special destruction algorithm that takes the
285 links between piddles into account and couples the lifespan of those
286 objects. The non-trivial algorithm is implemented in the function
287 "pdl_destroy" in pdlapi.c. In fact, most of the code in pdlapi.c and
288 pdlfamily.c is concerned with making sure that piddles ("pdl *"s) are
289 created, updated and freed at the right times depending on interactions
290 with other piddles via PDL transformations (remember, "pdl_trans").
291
292 Accessing children and parents of a piddle
When piddles are dynamically linked via transformations as suggested
above, input and output piddles are referred to as parents and children,
respectively.
296
297 An example of processing the children of a piddle is provided by the
298 "baddata" method of PDL::Bad (only available if you have compiled PDL
299 with the "WITH_BADVAL" option set to 1, but still useful as an
300 example!).
301
302 Consider the following situation:
303
304 pdl> $x = rvals(7,7,{Centre=>[3,4]});
305 pdl> $y = $x->slice('2:4,3:5');
306 pdl> ? vars
307 PDL variables in package main::
308
309 Name Type Dimension Flow State Mem
310 ----------------------------------------------------------------
311 $x Double D [7,7] P 0.38Kb
312 $y Double D [3,3] -C 0.00Kb
313
314 Now, if I suddenly decide that $x should be flagged as possibly
315 containing bad values, using
316
317 pdl> $x->badflag(1)
318
then I want the state of $y, its child, to be changed as well (since
it will either share or inherit some of $x's data and so may also
contain bad values), so that I get a 'B' in the State field:
322
323 pdl> ? vars
324 PDL variables in package main::
325
326 Name Type Dimension Flow State Mem
327 ----------------------------------------------------------------
328 $x Double D [7,7] PB 0.38Kb
329 $y Double D [3,3] -CB 0.00Kb
330
331 This bit of magic is performed by the "propagate_badflag" function,
332 which is listed below:
333
334 /* newval = 1 means set flag, 0 means clear it */
335 /* thanks to Christian Soeller for this */
336
337 void propagate_badflag( pdl *it, int newval ) {
338 PDL_DECL_CHILDLOOP(it)
339 PDL_START_CHILDLOOP(it)
340 {
341 pdl_trans *trans = PDL_CHILDLOOP_THISCHILD(it);
342 int i;
343 for( i = trans->vtable->nparents;
344 i < trans->vtable->npdls;
345 i++ ) {
346 pdl *child = trans->pdls[i];
347
348 if ( newval ) child->state |= PDL_BADVAL;
349 else child->state &= ~PDL_BADVAL;
350
351 /* make sure we propagate to grandchildren, etc */
352 propagate_badflag( child, newval );
353
354 } /* for: i */
355 }
356 PDL_END_CHILDLOOP(it)
357 } /* propagate_badflag */
358
Given a piddle ("pdl *it"), the routine loops through each "pdl_trans"
structure, where access to this structure is provided by the
"PDL_CHILDLOOP_THISCHILD" macro. The children of the piddle are stored
in the "pdls" array, after the parents, hence the loop from "i =
...nparents" to "i = ...npdls - 1". Once we have the pointer to the
child piddle, we can do what we want to it (here we change the value of
the "state" variable, but the details are unimportant). What is
important is that we call "propagate_badflag" on this piddle, to ensure
we loop through its children. This recursion ensures we get to all the
offspring of a particular piddle.
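
The recursion can be exercised without the PDL headers using toy
stand-ins. In this sketch the structs, the fixed-size arrays, the
"toy_link" helper and the value of "TOY_BADVAL" are all invented for
the example; the real child list is walked with the "PDL_*CHILDLOOP"
macros and the real flag is "PDL_BADVAL". Only the shape of the
traversal is meant to match.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BADVAL   0x0400  /* hypothetical bit, not the real PDL_BADVAL */
#define TOY_MAXTRANS 4
#define TOY_MAXPDLS  4

struct toy_trans;

typedef struct toy_pdl {
    int state;                                /* bit flags */
    int ntrans;                               /* transformations we feed */
    struct toy_trans *children[TOY_MAXTRANS];
} toy_pdl;

typedef struct toy_trans {
    int nparents;                 /* pdls[0 .. nparents-1] are parents   */
    int npdls;                    /* pdls[nparents .. npdls-1]: children */
    toy_pdl *pdls[TOY_MAXPDLS];
} toy_trans;

/* Set (newval != 0) or clear the flag on all descendants of it.
 * As in the listing above, the piddle itself is left to the caller. */
void toy_propagate(toy_pdl *it, int newval)
{
    int t, i;
    assert(it != NULL);
    for (t = 0; t < it->ntrans; t++) {
        toy_trans *trans = it->children[t];
        for (i = trans->nparents; i < trans->npdls; i++) {
            toy_pdl *child = trans->pdls[i];
            if (newval) child->state |= TOY_BADVAL;
            else        child->state &= ~TOY_BADVAL;
            toy_propagate(child, newval);     /* grandchildren, etc */
        }
    }
}

/* Register trans as a one-parent, one-child link (hypothetical helper) */
void toy_link(toy_trans *trans, toy_pdl *parent, toy_pdl *child)
{
    assert(trans != NULL && parent != NULL && child != NULL);
    assert(parent->ntrans < TOY_MAXTRANS);
    trans->nparents = 1;
    trans->npdls = 2;
    trans->pdls[0] = parent;
    trans->pdls[1] = child;
    parent->children[parent->ntrans++] = trans;
}
```

Linking x to y and y to z and then calling toy_propagate on x marks
both y and z, which corresponds to a slice of a slice picking up the
'B' state.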
369
370 Access to parents is similar, with the "for" loop replaced by:
371
372 for( i = 0;
373 i < trans->vtable->nparents;
374 i++ ) {
375 /* do stuff with parent #i: trans->pdls[i] */
376 }
377
378 What's in a transformation ("pdl_trans")
379 All transformations are implemented as structures
380
381 struct XXX_trans {
382 int magicno; /* to detect memory overwrites */
383 short flags; /* state of the trans */
384 pdl_transvtable *vtable; /* the all important vtable */
385 void (*freeproc)(struct pdl_trans *); /* Call to free this trans
386 (in case we had to malloc some stuff for this trans) */
387 pdl *pdls[NP]; /* The pdls involved in the transformation */
388 int __datatype; /* the type of the transformation */
    /* in general more members,
       depending on the actual transformation (slice, add, etc)
     */
392 };
393
394 The transformation identifies all "pdl"s involved in the trans
395
396 pdl *pdls[NP];
397
398 with "NP" depending on the number of piddle args of the particular
399 trans. It records a state
400
401 short flags;
402
403 and the datatype
404
405 int __datatype;
406
of the trans (to which all piddles must be converted unless they are
explicitly typed; PDL functions created with PDL::PP make sure that
these conversions are done as necessary). Most important is the pointer
to the vtable (virtual table) that contains the actual functionality
412 pdl_transvtable *vtable;
413
414 The vtable structure in turn looks something like (slightly simplified
415 from pdl.h for clarity)
416
417 typedef struct pdl_transvtable {
418 pdl_transtype transtype;
419 int flags;
420 int nparents; /* number of parent pdls (input) */
    int npdls; /* total number of pdls (parents + children) */
422 char *per_pdl_flags; /* optimization flags */
423 void (*redodims)(pdl_trans *tr); /* figure out dims of children */
424 void (*readdata)(pdl_trans *tr); /* flow parents to children */
425 void (*writebackdata)(pdl_trans *tr); /* flow backwards */
    void (*freetrans)(pdl_trans *tr); /* Free both the trans itself and
                                         its contents */
428 pdl_trans *(*copy)(pdl_trans *tr); /* Full copy */
429 int structsize;
430 char *name; /* For debuggers, mostly */
431 } pdl_transvtable;
432
433 We focus on the callback functions:
434
435 void (*redodims)(pdl_trans *tr);
436
437 "redodims" will work out the dimensions of piddles that need to be
438 created and is called from within the API function that should be
439 called to ensure that the dimensions of a piddle are accessible
440 (pdlapi.c):
441
442 void pdl_make_physdims(pdl *it)
443
444 "readdata" and "writebackdata" are responsible for the actual
445 computations of the child data from the parents or parent data from
446 those of the children, respectively (the dataflow aspect). The PDL
447 core makes sure that these are called as needed when piddle data is
448 accessed (lazy-evaluation). The general API function to ensure that a
449 piddle is up-to-date is
450
451 void pdl_make_physvaffine(pdl *it)
452
453 which should be called before accessing piddle data from XS/C (see
454 Core.xs for some examples).
455
456 "freetrans" frees dynamically allocated memory associated with the
457 trans as needed and "copy" can copy the transformation. Again,
458 functions built with PDL::PP make sure that copying and freeing via
459 these callbacks happens at the right times. (If they fail to do that we
460 have got a memory leak -- this has happened in the past ;).
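
How a vtable of callbacks drives a transformation can be illustrated
with a deliberately tiny model. This is a sketch under heavy
simplification, not generated PDL::PP output: the "toy_..." types are
invented, the "piddles" are single doubles, and only "readdata" and
"writebackdata" are modelled.

```c
#include <assert.h>
#include <stddef.h>

struct toy_trans;

typedef struct toy_vtable {
    int nparents;                               /* inputs */
    int npdls;                                  /* inputs + outputs */
    void (*readdata)(struct toy_trans *tr);     /* parents -> children */
    void (*writebackdata)(struct toy_trans *tr);/* children -> parents */
    const char *name;                           /* for debugging */
} toy_vtable;

typedef struct toy_trans {
    const toy_vtable *vtable;
    double *pdls[2];       /* one parent, then one child */
} toy_trans;

/* A "times two" transformation: child = 2 * parent and back again */
static void twice_readdata(toy_trans *tr)
{
    *tr->pdls[1] = 2.0 * *tr->pdls[0];
}

static void twice_writebackdata(toy_trans *tr)
{
    *tr->pdls[0] = *tr->pdls[1] / 2.0;
}

const toy_vtable toy_twice_vtable = {
    1, 2, twice_readdata, twice_writebackdata, "twice"
};

/* Generic core code only ever goes through the vtable */
void toy_update_child(toy_trans *tr)
{
    assert(tr != NULL && tr->vtable != NULL && tr->vtable->readdata != NULL);
    tr->vtable->readdata(tr);
}

void toy_update_parent(toy_trans *tr)
{
    assert(tr != NULL && tr->vtable != NULL
           && tr->vtable->writebackdata != NULL);
    tr->vtable->writebackdata(tr);
}
```

The generic functions know nothing about "times two"; swapping in a
different vtable changes the operation without touching them, which is
the property the core relies on.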
461
462 The transformation and vtable code is hardly ever written by hand but
463 rather generated by PDL::PP from concise descriptions.
464
Certain types of transformations can be optimized very efficiently,
obviating the need for explicit "readdata" and "writebackdata" methods.
Those transformations are called pdl_vaffine. Most dimension
manipulating functions (e.g., "slice", "xchg") belong to this class.

The basic trick is that parent and child of such a transformation work
on the same (shared) block of data, which they just choose to interpret
differently (by using different "dims", "dimincs" and "offs" on the
same data; compare the "pdl" structure above). Each operation on a
piddle sharing data with another one in this way therefore
automatically propagates from child to parent and back; after all, they
are reading and writing the same block of memory. This is currently not
Perl thread safe, which is no big loss since the whole PDL core is not
reentrant (Perl threading "!=" PDL threading!).
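
The shared-block idea can be shown in one dimension with a standalone
sketch. The "toy_view" type and its functions are assumptions made for
this example and do not mirror the real "pdl_vaffine" layout; they only
demonstrate that a child described by an offset and an increment writes
straight into the parent's memory.

```c
#include <assert.h>
#include <stddef.h>

typedef long long toy_indx;   /* stand-in for PDL_Indx */

typedef struct toy_view {
    double  *data;   /* shared block, owned by the parent */
    toy_indx offs;   /* position of this view's first element */
    toy_indx inc;    /* step between consecutive elements */
    toy_indx n;      /* number of elements visible through the view */
} toy_view;

/* Build a view of elements first..last (inclusive) of parent.
 * No data is copied; only the interpretation changes. */
toy_view toy_slice(toy_view parent, toy_indx first, toy_indx last)
{
    toy_view child;
    assert(first >= 0 && first <= last && last < parent.n);
    child.data = parent.data;
    child.offs = parent.offs + first * parent.inc;
    child.inc  = parent.inc;
    child.n    = last - first + 1;
    return child;
}

/* Address of element i as seen through the view */
double *toy_at(toy_view v, toy_indx i)
{
    assert(i >= 0 && i < v.n);
    return v.data + v.offs + i * v.inc;
}
```

Taking toy_slice(x, 2, 5) of an eight-element block and assigning 1 to
every element of the result changes elements 2 to 5 of the block
itself, which is the effect of the earlier Perl example
'$y = $x->slice("2:5"); $y .= 1;'.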
479
480 Signatures: threading over elementary operations
Most of the functionality of PDL threading (automatic iteration of
elementary operations over multi-dimensional piddles) is implemented in
the file pdlthread.c.
484
485 The PDL::PP generated functions (in particular the "readdata" and
486 "writebackdata" callbacks) use this infrastructure to make sure that
487 the fundamental operation implemented by the trans is performed in
488 agreement with PDL's threading semantics.
489
490 Defining new PDL functions -- Glue code generation
Please see PDL::PP and the examples in the PDL distribution.
Implementation and syntax are currently far from perfect, but it does a
good job!
494
495 The Core struct
496 As discussed in PDL::API, PDL uses a pointer to a structure to allow
497 PDL modules access to its core routines. The definition of this
498 structure (the "Core" struct) is in pdlcore.h (created by pdlcore.h.PL
499 in Basic/Core) and looks something like
500
    /* Structure to hold pointers to core PDL routines so as to be used by
     * many modules
     */
504 struct Core {
505 I32 Version;
506 pdl* (*SvPDLV) ( SV* );
507 void (*SetSV_PDL) ( SV *sv, pdl *it );
508 #if defined(PDL_clean_namespace) || defined(PDL_OLD_API)
509 pdl* (*new) ( ); /* make it work with gimp-perl */
510 #else
511 pdl* (*pdlnew) ( ); /* renamed because of C++ clash */
512 #endif
513 pdl* (*tmp) ( );
514 pdl* (*create) (int type);
515 void (*destroy) (pdl *it);
516 ...
    };
518 typedef struct Core Core;
519
520 The first field of the structure ("Version") is used to ensure
521 consistency between modules at run time; the following code is placed
522 in the BOOT section of the generated xs code:
523
524 if (PDL->Version != PDL_CORE_VERSION)
525 Perl_croak(aTHX_ "Foo needs to be recompiled against the newly installed PDL");
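
The reason for such a check is that a module compiled against one
layout of the struct would otherwise call through function pointers at
the wrong positions. A reduced sketch of the idea follows; the
"toy_core" struct, the function and the number 10 are invented for the
example and stand in for "Core", the BOOT-time test and
"PDL_CORE_VERSION".

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CORE_VERSION 10   /* made-up value baked in at compile time */

typedef struct toy_core {
    int Version;              /* first field, so it can always be read */
    /* pointers to core routines would follow here */
} toy_core;

/* Returns 1 if the running core matches the version this module was
 * compiled against, 0 otherwise. A real module would croak instead. */
int toy_core_compatible(const toy_core *core)
{
    assert(core != NULL);
    return core->Version == TOY_CORE_VERSION;
}
```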
526
527 If you add a new field to the Core struct you should:
528
529 • discuss it on the pdl porters email list
530 (pdl-devel@lists.sourceforge.net) [with the possibility of making
531 your changes to a separate branch of the CVS tree if it's a change
532 that will take time to complete]
533
534 • increase by 1 the value of the $pdl_core_version variable in
535 pdlcore.h.PL. This sets the value of the "PDL_CORE_VERSION" C
536 macro used to populate the Version field
537
538 • add documentation (e.g. to PDL::API) if it's a "useful" function
539 for external module writers (as well as ensuring the code is as
540 well documented as the rest of PDL ;)
541
543 This description is far from perfect. If you need more details or
544 something is still unclear please ask on the pdl-devel mailing list
545 (pdl-devel@lists.sourceforge.net).
546
548 Copyright(C) 1997 Tuomas J. Lukka (lukka@fas.harvard.edu), 2000 Doug
549 Burke (djburke@cpan.org), 2002 Christian Soeller & Doug Burke, 2013
550 Chris Marshall.
551
552 Redistribution in the same form is allowed but reprinting requires a
553 permission from the author.
554
555
556
557perl v5.32.1 2021-02-15 INTERNALS(1)