INTERNALS(1)          User Contributed Perl Documentation          INTERNALS(1)
2
3
4
6 PDL::Internals - description of some aspects of the current internals
7
9 Intro
10 This document explains various aspects of the current implementation of
11 PDL. If you just want to use PDL for something, you definitely do not
12 need to read this. Even if you want to interface your C routines to PDL
13 or create new PDL::PP functions, you do not need to read this man page
14 (though it may be informative). This document is primarily intended for
15 people interested in debugging or changing the internals of PDL. To
16 read this, a good understanding of the C language and programming and
17 data structures in general is required, as well as some Perl
understanding. If you read through this document, understand all of it,
can point to what any part of it refers to in the PDL core sources, and
additionally struggle to understand PDL::PP, you will be awarded the
title "PDL Guru" (of course, the current version of this document is so
incomplete that this is next to impossible from just these notes).
24
25 Warning: If it seems that this document has gotten out of date, please
26 inform the PDL porters email list (pdl-porters@jach.hawaii.edu). This
27 may well happen.
28
29 Piddles
30 The pdl data object is generally an opaque scalar reference into a pdl
31 structure in memory. Alternatively, it may be a hash reference with the
32 "PDL" field containing the scalar reference (this makes overloading
33 piddles easy, see PDL::Objects). You can easily find out at the Perl
34 level which type of piddle you are dealing with. The example code below
35 demonstrates how to do it:
36
    # check if this is a piddle
    die "not a piddle" unless UNIVERSAL::isa($pdl, 'PDL');
    # is it a scalar ref or a hash ref?
    if (UNIVERSAL::isa($pdl, "HASH")) {
        die "not a valid PDL" unless exists $pdl->{PDL} &&
            UNIVERSAL::isa($pdl->{PDL},'PDL');
        print "This is a hash reference,",
            " the PDL field contains the scalar ref\n";
    } else {
        print "This is a scalar ref that points to address $$pdl in memory\n";
    }
48
The scalar reference points to the numeric address of a C structure of
type "pdl" which is defined in pdl.h. The mapping between the object at
the Perl level and the C structure containing the actual data and
structure that make up a piddle is done by the PDL typemap. The
functions used in the PDL typemap are defined pretty much at the top of
the file pdlcore.h. So what does the structure look like?
55
    struct pdl {
       unsigned long magicno; /* Always stores PDL_MAGICNO as a sanity check */
         /* This is first so most pointer accesses to wrong type are caught */
       int state;        /* What's in this pdl */

       pdl_trans *trans; /* Opaque pointer to internals of transformation from
                            parent */

       pdl_vaffine *vafftrans;

       void*    sv;      /* (optional) pointer back to original sv.
                            ALWAYS check for non-null before use.
                            We cannot inc refcnt on this one or we'd
                            never get destroyed */

       void *datasv;     /* Pointer to SV containing data. Refcnt inced */
       void *data;       /* Null: no data alloced for this one */
       int nvals;        /* How many values allocated */
       int datatype;
       PDL_Long   *dims;      /* Array of data dimensions */
       PDL_Long   *dimincs;   /* Array of data default increments */
       short    ndims;     /* Number of data dimensions */

       unsigned char *threadids;  /* Starting index of the thread index set n */
       unsigned char nthreadids;

       pdl *progenitor; /* I'm in a mutated family. make_physical_now must
                           copy me to the new generation. */
       pdl *future_me;  /* I'm the "then" pdl and this is my "now" (or more
                           modern version, anyway) */

       pdl_children children;

       short living_for; /* perl side not referenced; delete me when */

       PDL_Long   def_dims[PDL_NDIMS];   /* Preallocated space for efficiency */
       PDL_Long   def_dimincs[PDL_NDIMS];   /* Preallocated space for efficiency */
       unsigned char def_threadids[PDL_NTHREADIDS];

       struct pdl_magic *magic;

       void *hdrsv; /* "header", settable from outside */
    };
99
100 This is quite a structure for just storing some data in - what is going
101 on?
102
103 Data storage
104 We are going to start with some of the simpler members: first of
105 all, there is the member
106
107 void *datasv;
108
109 which is really a pointer to a Perl SV structure ("SV *"). The SV
110 is expected to be representing a string, in which the data of the
111 piddle is stored in a tightly packed form. This pointer counts as
112 a reference to the SV so the reference count has been incremented
113 when the "SV *" was placed here (this reference count business has
114 to do with Perl's garbage collection mechanism -- don't worry if
115 this doesn't mean much to you). This pointer is allowed to have
116 the value "NULL" which means that there is no actual Perl SV for
117 this data - for instance, the data might be allocated by a "mmap"
118 operation. Note the use of an SV* was purely for convenience, it
119 allows easy transformation of packed data from files into piddles.
120 Other implementations are not excluded.
121
122 The actual pointer to data is stored in the member
123
124 void *data;
125
126 which contains a pointer to a memory area with space for
127
128 int nvals;
129
130 data items of the data type of this piddle.
131
132 The data type of the data is stored in the variable
133
134 int datatype;
135
136 the values for this member are given in the enum "pdl_datatypes"
137 (see pdl.h). Currently we have byte, short, unsigned short, long,
138 float and double types, see also PDL::Types.
139
140 Dimensions
141 The number of dimensions in the piddle is given by the member
142
    short ndims;
144
145 which shows how many entries there are in the arrays
146
147 PDL_Long *dims;
148 PDL_Long *dimincs;
149
150 These arrays are intimately related: "dims" gives the sizes of the
151 dimensions and "dimincs" is always calculated by the code
152
    int i, inc = 1;
    for (i = 0; i < it->ndims; i++) {
        it->dimincs[i] = inc; inc *= it->dims[i];
    }
157
158 in the routine "pdl_resize_defaultincs" in "pdlapi.c". What this
159 means is that the dimincs can be used to calculate the offset by
160 code like
161
    int i, offs = 0;
    for (i = 0; i < it->ndims; i++) {
        offs += it->dimincs[i] * index[i];
    }
166
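but this is not always the right thing to do, at least without first
checking how the piddle's data is stored: for example, a piddle that
shares its parent's data through a virtual affine transformation (see
below) has to be indexed with the increments and offset of that
transformation instead.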
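
As a minimal, self-contained sketch of the two loops above, the code
below computes the default increments and an element offset for a
physical piddle. It does not use pdl.h: "toy_pdl", "TOY_NDIMS",
"toy_resize_defaultincs" and "toy_offset" are hypothetical stand-ins
introduced for illustration, and plain "long" replaces "PDL_Long".

```c
#include <assert.h>

#define TOY_NDIMS 6   /* stand-in for PDL_NDIMS (assumed value) */

/* Hypothetical stand-in for the dimension-related part of struct pdl. */
typedef struct {
    long  dims[TOY_NDIMS];
    long  dimincs[TOY_NDIMS];
    short ndims;
} toy_pdl;

/* Same loop as quoted above: the first dimension varies fastest. */
static void toy_resize_defaultincs(toy_pdl *it)
{
    long inc = 1;
    int i;
    for (i = 0; i < it->ndims; i++) {
        it->dimincs[i] = inc;
        inc *= it->dims[i];
    }
}

/* Offset (in elements, not bytes) of the element at index[]. */
static long toy_offset(const toy_pdl *it, const long *index)
{
    long offs = 0;
    int i;
    for (i = 0; i < it->ndims; i++) {
        assert(index[i] >= 0 && index[i] < it->dims[i]);
        offs += it->dimincs[i] * index[i];
    }
    return offs;
}
```

For dims (3,4,5) the increments come out as (1,3,12), so the element at
index (2,1,4) lives at offset 2*1 + 1*3 + 4*12 = 53.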
169
170 Default storage
171 Since the vast majority of piddles don't have more than 6
172 dimensions, it is more efficient to have default storage for the
173 dimensions and dimincs inside the PDL struct.
174
175 PDL_Long def_dims[PDL_NDIMS];
176 PDL_Long def_dimincs[PDL_NDIMS];
177
178 The "dims" and "dimincs" may be set to point to the beginning of
179 these arrays if "ndims" is smaller than or equal to the compile-
180 time constant "PDL_NDIMS". This is important to note when freeing
181 a piddle struct. The same applies for the threadids:
182
183 unsigned char def_threadids[PDL_NTHREADIDS];
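
The consequence for freeing can be sketched as follows. This is an
illustration under stated assumptions, not the code from pdlapi.c:
"toy_pdl", "TOY_NDIMS" and the three functions are hypothetical
stand-ins, and the only point is that "dims" and "dimincs" may be
passed to free() only when they do not point at the embedded arrays.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_NDIMS 6   /* stand-in for PDL_NDIMS (assumed value) */

typedef struct {
    long *dims;                   /* points at def_dims or at heap memory */
    long *dimincs;                /* points at def_dimincs or at heap memory */
    short ndims;
    long  def_dims[TOY_NDIMS];
    long  def_dimincs[TOY_NDIMS];
} toy_pdl;

/* Start out using the embedded storage. */
static void toy_init(toy_pdl *it)
{
    it->dims = it->def_dims;
    it->dimincs = it->def_dimincs;
    it->ndims = 0;
}

/* Release heap arrays, but never the embedded defaults. */
static void toy_free_dims(toy_pdl *it)
{
    if (it->dims != it->def_dims)       free(it->dims);
    if (it->dimincs != it->def_dimincs) free(it->dimincs);
    toy_init(it);
}

/* Make room for ndims dimensions; returns 0 on success, -1 on failure. */
static int toy_reallocdims(toy_pdl *it, short ndims)
{
    assert(ndims >= 0);
    toy_free_dims(it);
    if (ndims > TOY_NDIMS) {
        long *d = malloc(ndims * sizeof *d);
        long *n = malloc(ndims * sizeof *n);
        if (!d || !n) { free(d); free(n); return -1; }
        it->dims = d;
        it->dimincs = n;
    }
    it->ndims = ndims;
    return 0;
}
```

With this layout a piddle of up to TOY_NDIMS dimensions never touches
the heap for its dimension bookkeeping, which is the efficiency gain the
text refers to.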
184
185 Magic
186 It is possible to attach magic to piddles, much like Perl's own
187 magic mechanism. If the member pointer
188
189 struct pdl_magic *magic;
190
191 is nonzero, the PDL has some magic attached to it. The
192 implementation of magic can be gleaned from the file pdlmagic.c in
193 the distribution.
194
195 State
196 One of the first members of the structure is
197
198 int state;
199
200 The possible flags and their meanings are given in "pdl.h". These
201 are mainly used to implement the lazy evaluation mechanism and
202 keep track of piddles in these operations.
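
How such a flag word supports lazy evaluation can be sketched in a few
lines. The flag names and values below are hypothetical (the real ones
are in "pdl.h" and are not reproduced here), as are "toy_pdl" and the
helper functions.

```c
#include <assert.h>

/* Hypothetical flag values; the real ones are defined in pdl.h. */
#define TOY_ALLOCATED      0x0001   /* data has been allocated */
#define TOY_PARENTCHANGED  0x0002   /* a parent changed; data is stale */

typedef struct { int state; } toy_pdl;

static void toy_set(toy_pdl *it, int flag)   { it->state |= flag; }
static void toy_clear(toy_pdl *it, int flag) { it->state &= ~flag; }

static int toy_test(const toy_pdl *it, int flag)
{
    return (it->state & flag) != 0;
}

/* Lazy evaluation in miniature: the data only has to be (re)computed
   while it is unallocated or marked stale. */
static int toy_needs_update(const toy_pdl *it)
{
    assert(it != 0);
    return !toy_test(it, TOY_ALLOCATED) || toy_test(it, TOY_PARENTCHANGED);
}
```

A freshly created "toy_pdl" needs an update, stops needing one once it
is allocated, and needs one again as soon as a parent marks it stale.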
203
204 Transformations and virtual affine transformations
205 As you should already know, piddles often carry information about
206 where they come from. For example, the code
207
208 $b = $a->slice("2:5");
209 $b .= 1;
210
211 will alter $a. So $b and $a know that they are connected via a
212 "slice"-transformation. This information is stored in the members
213
214 pdl_trans *trans;
215 pdl_vaffine *vafftrans;
216
217 Both $a (the parent) and $b (the child) store this information
218 about the transformation in appropriate slots of the "pdl"
219 structure.
220
221 "pdl_trans" and "pdl_vaffine" are structures that we will look at
222 in more detail below.
223
224 The Perl SVs
225 When piddles are referred to through Perl SVs, we store an
226 additional reference to it in the member
227
228 void* sv;
229
230 in order to be able to return a reference to the user when he
231 wants to inspect the transformation structure on the Perl side.
232
233 Also, we store an opaque
234
235 void *hdrsv;
236
237 which is just for use by the user to hook up arbitrary data with
238 this sv. This one is generally manipulated through sethdr and
239 gethdr calls.
240
241 Smart references and transformations: slicing and dicing
Smart references and most other fundamental functions operating on
piddles are implemented via transformations (as mentioned above) which
are represented by the type "pdl_trans" in PDL.
245
246 A transformation links input and output piddles and contains all the
247 infrastructure that defines how
248
249 · output piddles are obtained from input piddles
250
· changes in smartly linked output piddles (e.g. the child of a
sliced parent piddle) flow back to the input piddle in
transformations where this is supported (the most often used
example being "slice" here).
255
256 · datatype and size of output piddles that need to be created are
257 obtained
258
259 In general, executing a PDL function on a group of piddles results in
260 creation of a transformation of the requested type that links all input
261 and output arguments (at least those that are piddles). In PDL
262 functions that support data flow between input and output args (e.g.
263 "slice", "index") this transformation links parent (input) and child
264 (output) piddles permanently until either the link is explicitly broken
by user request ("sever" at the perl level) or all parents and children
have been destroyed. In those cases the transformation is
lazy-evaluated, i.e. only executed when piddle values are actually
accessed.
268
269 In non-flowing functions, for example addition ("+") and inner products
270 ("inner"), the transformation is installed just as in flowing functions
271 but then the transformation is immediately executed and destroyed
272 (breaking the link between input and output args) before the function
273 returns.
274
275 It should be noted that the close link between input and output args of
276 a flowing function (like slice) requires that piddle objects that are
277 linked in such a way be kept alive beyond the point where they have
278 gone out of scope from the point of view of perl:
279
280 $a = zeroes(20);
281 $b = $a->slice('2:4');
282 undef $a; # last reference to $a is now destroyed
283
284 Although $a should now be destroyed according to perl's rules the
285 underlying "pdl" structure must actually only be freed when $b also
286 goes out of scope (since it still references internally some of $a's
287 data). This example demonstrates that such a dataflow paradigm between
288 PDL objects necessitates a special destruction algorithm that takes the
289 links between piddles into account and couples the lifespan of those
290 objects. The non-trivial algorithm is implemented in the function
291 "pdl_destroy" in pdlapi.c. In fact, most of the code in pdlapi.c and
292 pdlfamily.c is concerned with making sure that piddles ("pdl *"s) are
293 created, updated and freed at the right times depending on interactions
294 with other piddles via PDL transformations (remember, "pdl_trans").
295
296 Accessing children and parents of a piddle
When piddles are dynamically linked via transformations as suggested
above, input and output piddles are referred to as parents and children,
respectively.
300
An example of processing the children of a piddle is provided by the
"baddata" method of PDL::Bad (only available if you have compiled PDL
with the "WITH_BADVAL" option set to 1, but still useful as an
example!).
305
306 Consider the following situation:
307
308 perldl> $a = rvals(7,7,Centre=>[3,4]);
309 perldl> $b = $a->slice('2:4,3:5');
310 perldl> ? vars
311 PDL variables in package main::
312
313 Name Type Dimension Flow State Mem
314 ----------------------------------------------------------------
315 $a Double D [7,7] P 0.38Kb
316 $b Double D [3,3] VC 0.00Kb
317
318 Now, if I suddenly decide that $a should be flagged as possibly
319 containing bad values, using
320
321 perldl> $a->baddata(1)
322
then I want the state of $b, its child, to be changed as well (since
it will either share or inherit some of $a's data and so also be bad),
so that I get a 'B' in the State field:
326
327 perldl> ? vars
328 PDL variables in package main::
329
330 Name Type Dimension Flow State Mem
331 ----------------------------------------------------------------
332 $a Double D [7,7] PB 0.38Kb
333 $b Double D [3,3] VCB 0.00Kb
334
335 This bit of magic is performed by the "propogate_badflag" function,
336 which is listed below:
337
    /* newval = 1 means set flag, 0 means clear it */
    /* thanks to Christian Soeller for this */

    void propogate_badflag( pdl *it, int newval ) {
        PDL_DECL_CHILDLOOP(it)
        PDL_START_CHILDLOOP(it)
        {
            pdl_trans *trans = PDL_CHILDLOOP_THISCHILD(it);
            int i;
            for( i = trans->vtable->nparents;
                 i < trans->vtable->npdls;
                 i++ ) {
                pdl *child = trans->pdls[i];

                if ( newval ) child->state |=  PDL_BADVAL;
                else          child->state &= ~PDL_BADVAL;

                /* make sure we propagate to grandchildren, etc */
                propogate_badflag( child, newval );

            } /* for: i */
        }
        PDL_END_CHILDLOOP(it)
    } /* propogate_badflag */
362
Given a piddle ("pdl *it"), the routine loops through each "pdl_trans"
structure, where access to this structure is provided by the
"PDL_CHILDLOOP_THISCHILD" macro. The children of the piddle are stored
in the "pdls" array, after the parents, hence the loop from "i =
...nparents" to "i = ...npdls - 1". Once we have the pointer to the
child piddle, we can do what we want to it; here we change the value of
the "state" variable, but the details are unimportant. What is
important is that we call "propogate_badflag" on this piddle, to ensure
we loop through its children. This recursion ensures we get to all the
offspring of a particular piddle.
373
374 Access to parents is similar, with the "for" loop replaced by:
375
    for( i = 0;
         i < trans->vtable->nparents;
         i++ ) {
        /* do stuff with parent #i: trans->pdls[i] */
    }
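
The layout of the "pdls" array (parents first, children after them) can
be exercised in isolation. The sketch below uses hypothetical stand-ins
("toy_pdl", "toy_vtable", "toy_trans", "TOY_BADVAL" and both functions)
rather than the real types from pdl.h; only the two index ranges are
taken from the text above.

```c
#include <assert.h>

#define TOY_BADVAL 0x0400   /* hypothetical flag value */

typedef struct { int state; } toy_pdl;

typedef struct {
    int nparents;   /* number of inputs */
    int npdls;      /* total number of pdls: parents, then children */
} toy_vtable;

typedef struct {
    const toy_vtable *vtable;
    toy_pdl *pdls[4];   /* parents first, then children */
} toy_trans;

/* Parent range: indices 0 .. nparents-1. */
static int toy_any_parent_bad(const toy_trans *trans)
{
    int i;
    for (i = 0; i < trans->vtable->nparents; i++)
        if (trans->pdls[i]->state & TOY_BADVAL) return 1;
    return 0;
}

/* Child range: indices nparents .. npdls-1. */
static void toy_mark_children(toy_trans *trans)
{
    int i, bad = toy_any_parent_bad(trans);
    assert(trans->vtable->nparents <= trans->vtable->npdls);
    for (i = trans->vtable->nparents; i < trans->vtable->npdls; i++) {
        if (bad) trans->pdls[i]->state |=  TOY_BADVAL;
        else     trans->pdls[i]->state &= ~TOY_BADVAL;
    }
}
```

With two parents and one child, the child ends up flagged exactly when
at least one of the parents is.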
381
382 What's in a transformation ("pdl_trans")
383 All transformations are implemented as structures
384
    struct XXX_trans {
        int magicno;             /* to detect memory overwrites */
        short flags;             /* state of the trans */
        pdl_transvtable *vtable; /* the all important vtable */
        void (*freeproc)(struct pdl_trans *);  /* Call to free this trans
            (in case we had to malloc some stuff for this trans) */
        pdl *pdls[NP];           /* The pdls involved in the transformation */
        int __datatype;          /* the type of the transformation */
        /* in general more members,
           depending on the actual transformation (slice, add, etc)
         */
    };
397
398 The transformation identifies all "pdl"s involved in the trans
399
400 pdl *pdls[NP];
401
402 with "NP" depending on the number of piddle args of the particular
403 trans. It records a state
404
405 short flags;
406
407 and the datatype
408
409 int __datatype;
410
of the trans (to which all piddles must be converted unless they are
explicitly typed; PDL functions created with PDL::PP make sure that
these conversions are done as necessary). Most important is the pointer
to the vtable (virtual table) that contains the actual functionality
415
416 pdl_transvtable *vtable;
417
418 The vtable structure in turn looks something like (slightly simplified
419 from pdl.h for clarity)
420
    typedef struct pdl_transvtable {
        pdl_transtype transtype;
        int flags;
        int nparents;   /* number of parent pdls (input) */
        int npdls;      /* total number of pdls (parents + children) */
        char *per_pdl_flags;  /* optimization flags */
        void (*redodims)(pdl_trans *tr);  /* figure out dims of children */
        void (*readdata)(pdl_trans *tr);  /* flow parents to children  */
        void (*writebackdata)(pdl_trans *tr); /* flow backwards */
        void (*freetrans)(pdl_trans *tr); /* Free both the contents of the
                                             trans and the trans itself */
        pdl_trans *(*copy)(pdl_trans *tr); /* Full copy */
        int structsize;
        char *name; /* For debuggers, mostly */
    } pdl_transvtable;
436
437 We focus on the callback functions:
438
439 void (*redodims)(pdl_trans *tr);
440
441 "redodims" will work out the dimensions of piddles that need to be
442 created and is called from within the API function that should be
443 called to ensure that the dimensions of a piddle are accessible
444 (pdlapi.c):
445
446 void pdl_make_physdims(pdl *it)
447
448 "readdata" and "writebackdata" are responsible for the actual
449 computations of the child data from the parents or parent data from
450 those of the children, respectively (the dataflow aspect). The PDL
451 core makes sure that these are called as needed when piddle data is
452 accessed (lazy-evaluation). The general API function to ensure that a
453 piddle is up-to-date is
454
455 void pdl_make_physvaffine(pdl *it)
456
457 which should be called before accessing piddle data from XS/C (see
458 Core.xs for some examples).
459
460 "freetrans" frees dynamically allocated memory associated with the
461 trans as needed and "copy" can copy the transformation. Again,
462 functions built with PDL::PP make sure that copying and freeing via
463 these callbacks happens at the right times. (If they fail to do that we
464 have got a memory leak -- this has happened in the past ;).
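
The division of labour between the core and the callbacks can be
sketched with a toy transformation whose child is twice its parent.
Everything here is a hypothetical stand-in ("toy_trans", "toy_vtable",
the "double_*" callbacks and the two "toy_update_*" functions); the real
structures carry many more members and the real core decides when to
call the callbacks based on the piddle state.

```c
#include <assert.h>

struct toy_trans;

typedef struct {
    void (*readdata)(struct toy_trans *tr);       /* parent -> child */
    void (*writebackdata)(struct toy_trans *tr);  /* child -> parent */
    const char *name;
} toy_vtable;

typedef struct toy_trans {
    const toy_vtable *vtable;
    double *parent;
    double *child;
    int n;
} toy_trans;

/* Callbacks of one particular transformation: child = 2 * parent. */
static void double_readdata(toy_trans *tr)
{
    int i;
    for (i = 0; i < tr->n; i++) tr->child[i] = 2.0 * tr->parent[i];
}

static void double_writebackdata(toy_trans *tr)
{
    int i;
    for (i = 0; i < tr->n; i++) tr->parent[i] = tr->child[i] / 2.0;
}

static const toy_vtable double_vtable = {
    double_readdata, double_writebackdata, "double"
};

/* The "core" never knows which transformation it is dealing with;
   it only calls through the vtable. */
static void toy_update_child(toy_trans *tr)
{
    assert(tr->vtable->readdata != 0);
    tr->vtable->readdata(tr);
}

static void toy_update_parent(toy_trans *tr)
{
    assert(tr->vtable->writebackdata != 0);
    tr->vtable->writebackdata(tr);
}
```

Adding a new kind of transformation then means supplying another vtable
instance, without touching the two dispatch functions.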
465
466 The transformation and vtable code is hardly ever written by hand but
467 rather generated by PDL::PP from concise descriptions.
468
469 Certain types of transformations can be optimized very efficiently
470 obviating the need for explicit "readdata" and "writebackdata" methods.
471 Those transformations are called pdl_vaffine. Most dimension
472 manipulating functions (e.g., "slice", "xchg") belong to this class.
473
The basic trick is that parent and child of such a transformation work
on the same (shared) block of data which they just choose to interpret
differently (by using different "dims", "dimincs" and "offs" on the
same data, compare the "pdl" structure above). Each operation on a
piddle sharing data with another one in this way therefore
automatically flows from child to parent and back -- after all they are
reading and writing the same block of memory. This is currently not
perl thread safe -- no big loss since the whole PDL core is not
reentrant (perl threading "!=" PDL threading!).
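
The shared-data idea can be shown with a one-dimensional toy view. The
type "toy_view" and the functions below are hypothetical and much
simpler than "pdl_vaffine"; the sketch mirrors the earlier Perl example
"$b = $a->slice("2:5"); $b .= 1;" by giving the child its own dim,
increment and offset over the parent's buffer.

```c
#include <assert.h>

/* Hypothetical 1-D view: shared data plus its own dim, inc and offs. */
typedef struct {
    double *data;   /* shared with the parent, never copied */
    long dim;
    long inc;
    long offs;
} toy_view;

/* View of elements first..last (inclusive), like slice("first:last"). */
static toy_view toy_slice(const toy_view *parent, long first, long last)
{
    toy_view child;
    assert(first >= 0 && first <= last && last < parent->dim);
    child.data = parent->data;
    child.dim  = last - first + 1;
    child.inc  = parent->inc;
    child.offs = parent->offs + first * parent->inc;
    return child;
}

/* Address of element i as seen through the view. */
static double *toy_at(const toy_view *v, long i)
{
    assert(i >= 0 && i < v->dim);
    return &v->data[v->offs + i * v->inc];
}

/* Assign one value to every element of the view. */
static void toy_assign(toy_view *v, double value)
{
    long i;
    for (i = 0; i < v->dim; i++) *toy_at(v, i) = value;
}
```

Assigning through the child changes elements 2 to 5 of the parent's
buffer directly; no "readdata" or "writebackdata" step is involved.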
483
484 Signatures: threading over elementary operations
Most of the functionality of PDL threading (automatic iteration of
elementary operations over multidim piddles) is implemented in the file
pdlthread.c.
488
489 The PDL::PP generated functions (in particular the "readdata" and
490 "writebackdata" callbacks) use this infrastructure to make sure that
491 the fundamental operation implemented by the trans is performed in
492 agreement with PDL's threading semantics.
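
What "automatic iteration of elementary operations" means can be
sketched for the simplest case: an operation defined on a vector of
length n, applied to an (n, m) input with default increments. The
functions "toy_sum" and "toy_thread_sum" are hypothetical; the real
code in pdlthread.c handles any number of thread dimensions and
arbitrary increments.

```c
#include <assert.h>

/* Elementary operation on one vector of length n: its sum. */
static double toy_sum(const double *a, long n, long inc)
{
    double s = 0.0;
    long i;
    for (i = 0; i < n; i++) s += a[i * inc];
    return s;
}

/* Thread toy_sum over one extra dimension of size m. With default
   increments (1, n), row j starts at offset j * n. */
static void toy_thread_sum(const double *a, long n, long m, double *b)
{
    long j;
    assert(n >= 0 && m >= 0);
    for (j = 0; j < m; j++)
        b[j] = toy_sum(a + j * n, n, 1);
}
```

The elementary operation itself knows nothing about the extra
dimension; the surrounding loop supplies one vector at a time.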
493
494 Defining new PDL functions -- Glue code generation
495 Please, see PDL::PP and examples in the PDL distribution.
496 Implementation and syntax are currently far from perfect but it does a
497 good job!
498
499 The Core struct
500 As discussed in PDL::API, PDL uses a pointer to a structure to allow
501 PDL modules access to its core routines. The definition of this
502 structure (the "Core" struct) is in pdlcore.h (created by pdlcore.h.PL
503 in Basic/Core) and looks something like
504
    /* Structure to hold pointers to core PDL routines so as to be used by
     * many modules
     */
    struct Core {
        I32    Version;
        pdl*   (*SvPDLV)      ( SV* );
        void   (*SetSV_PDL)   ( SV *sv, pdl *it );
    #if defined(PDL_clean_namespace) || defined(PDL_OLD_API)
        pdl*   (*new)         ( );  /* make it work with gimp-perl */
    #else
        pdl*   (*pdlnew)      ( );  /* renamed because of C++ clash */
    #endif
        pdl*   (*tmp)         ( );
        pdl*   (*create)      (int type);
        void   (*destroy)     (pdl *it);
        ...
    };
    typedef struct Core Core;
523
524 The first field of the structure ("Version") is used to ensure
525 consistency between modules at run time; the following code is placed
526 in the BOOT section of the generated xs code:
527
    if (PDL->Version != PDL_CORE_VERSION)
        Perl_croak(aTHX_ "Foo needs to be recompiled against the newly installed PDL");
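
The idea behind this check can be shown without any Perl headers. In
the sketch below "toy_core", "TOY_CORE_VERSION" and
"toy_core_compatible" are hypothetical stand-ins for the Core struct,
the "PDL_CORE_VERSION" macro and the BOOT-time test.

```c
#include <assert.h>

#define TOY_CORE_VERSION 7   /* hypothetical; stands in for PDL_CORE_VERSION */

typedef struct {
    int Version;   /* first field, as in the Core struct */
    /* ... pointers to core routines would follow ... */
} toy_core;

/* Returns 1 if the module was compiled against the same version of
   the struct as the core it has been handed at run time, else 0. */
static int toy_core_compatible(const toy_core *core)
{
    assert(core != 0);
    return core->Version == TOY_CORE_VERSION;
}
```

Because the version is the first member, it can be read reliably even
when the members after it have been added, removed or reordered, which
is exactly the situation the check is meant to detect.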
530
531 If you add a new field to the Core struct you should:
532
533 · discuss it on the pdl porters email list
534 (pdl-porters@jach.hawaii.edu) [with the possibility of making your
535 changes to a separate branch of the CVS tree if it's a change that
536 will take time to complete]
537
538 · increase by 1 the value of the $pdl_core_version variable in
539 pdlcore.h.PL. This sets the value of the "PDL_CORE_VERSION" C
540 macro used to populate the Version field
541
542 · add documentation (eg to PDL::API) if it's a "useful" function for
543 external module writers (as well as ensuring the code is as well
544 documented as the rest of PDL ;)
545
547 This description is far from perfect. If you need more details or
548 something is still unclear please ask on the pdl-porters mailing list
549 (pdl-porters@jach.hawaii.edu).
550
552 Copyright(C) 1997 Tuomas J. Lukka (lukka@fas.harvard.edu), 2000 Doug
553 Burke (djburke@cpan.org), 2002 Christian Soeller & Doug Burke.
554
555 Redistribution in the same form is allowed but reprinting requires a
556 permission from the author.
557
558
559
perl v5.12.3                      2009-10-17                      INTERNALS(1)