INTERNALS(1)          User Contributed Perl Documentation          INTERNALS(1)

6 PDL::Internals - description of some aspects of the current internals
7
9 Intro
10 This document explains various aspects of the current implementation of
11 PDL. If you just want to use PDL for something, you definitely do not
12 need to read this. Even if you want to interface your C routines to PDL
13 or create new PDL::PP functions, you do not need to read this man page
14 (though it may be informative). This document is primarily intended for
15 people interested in debugging or changing the internals of PDL. To
16 read this, a good understanding of the C language and programming and
17 data structures in general is required, as well as some Perl
understanding. If you read through this document, understand all of
it, can point to what any part of it refers to in the PDL core
sources, and additionally struggle to understand PDL::PP, you will be
awarded the title "PDL Guru" (of course, the current version of this
document is so incomplete that this is next to impossible from just
these notes).
24
25 Warning: If it seems that this document has gotten out of date, please
26 inform the PDL porters email list (pdl-devel@lists.sourceforge.net).
27 This may well happen.
28
29 Piddles
30 The pdl data object is generally an opaque scalar reference into a pdl
31 structure in memory. Alternatively, it may be a hash reference with the
32 "PDL" field containing the scalar reference (this makes overloading
33 piddles easy, see PDL::Objects). You can easily find out at the Perl
34 level which type of piddle you are dealing with. The example code below
35 demonstrates how to do it:
36
# check if this is a piddle
die "not a piddle" unless UNIVERSAL::isa($pdl, 'PDL');
# is it a scalar ref or a hash ref?
if (UNIVERSAL::isa($pdl, "HASH")) {
    die "not a valid PDL" unless exists $pdl->{PDL} &&
        UNIVERSAL::isa($pdl->{PDL},'PDL');
    print "This is a hash reference,",
        " the PDL field contains the scalar ref\n";
} else {
    print "This is a scalar ref that points to address $$pdl in memory\n";
}
48
The scalar reference points to the numeric address of a C structure of
type "pdl", which is defined in pdl.h. The mapping between the object at
the Perl level and the C structure containing the actual data and
structural information that makes up a piddle is done by the PDL
typemap. The functions used in the PDL typemap are defined pretty much
at the top of the file pdlcore.h. So what does the structure look like?
55
struct pdl {
    unsigned long magicno; /* Always stores PDL_MAGICNO as a sanity check */
      /* This is first so most pointer accesses to wrong type are caught */
    int state;        /* What's in this pdl */

    pdl_trans *trans; /* Opaque pointer to internals of transformation from
                         parent */

    pdl_vaffine *vafftrans;

    void*    sv;      /* (optional) pointer back to original sv.
                         ALWAYS check for non-null before use.
                         We cannot inc refcnt on this one or we'd
                         never get destroyed */

    void *datasv;     /* Pointer to SV containing data. Refcnt inced */
    void *data;       /* Null: no data alloced for this one */
    PDL_Indx nvals;   /* How many values allocated */
    int datatype;
    PDL_Indx *dims;   /* Array of data dimensions */
    PDL_Indx *dimincs; /* Array of data default increments */
    short    ndims;   /* Number of data dimensions */

    unsigned char *threadids;  /* Starting index of the thread index set n */
    unsigned char nthreadids;

    pdl_children children;

    PDL_Indx def_dims[PDL_NDIMS];     /* Preallocated space for efficiency */
    PDL_Indx def_dimincs[PDL_NDIMS];  /* Preallocated space for efficiency */
    unsigned char def_threadids[PDL_NTHREADIDS];

    struct pdl_magic *magic;

    void *hdrsv; /* "header", settable from outside */
};
92
93 This is quite a structure for just storing some data in - what is going
94 on?
95
96 Data storage
97 We are going to start with some of the simpler members: first of
98 all, there is the member
99
100 void *datasv;
101
102 which is really a pointer to a Perl SV structure ("SV *"). The SV
103 is expected to be representing a string, in which the data of the
104 piddle is stored in a tightly packed form. This pointer counts as
105 a reference to the SV so the reference count has been incremented
106 when the "SV *" was placed here (this reference count business has
107 to do with Perl's garbage collection mechanism -- don't worry if
108 this doesn't mean much to you). This pointer is allowed to have
109 the value "NULL" which means that there is no actual Perl SV for
110 this data - for instance, the data might be allocated by a "mmap"
111 operation. Note the use of an SV* was purely for convenience, it
112 allows easy transformation of packed data from files into piddles.
113 Other implementations are not excluded.
114
115 The actual pointer to data is stored in the member
116
117 void *data;
118
119 which contains a pointer to a memory area with space for
120
121 PDL_Indx nvals;
122
123 data items of the data type of this piddle. PDL_Indx is either
124 'long' or 'long long' depending on whether your perl is 64bit or
125 not.
126
127 The data type of the data is stored in the variable
128
129 int datatype;
130
the values for this member are given in the enum "pdl_datatypes"
(see pdl.h). The types include byte, short, unsigned short, long,
float and double; see PDL::Types for the authoritative list.
134
135 Dimensions
136 The number of dimensions in the piddle is given by the member
137
short ndims;
139
140 which shows how many entries there are in the arrays
141
142 PDL_Indx *dims;
143 PDL_Indx *dimincs;
144
145 These arrays are intimately related: "dims" gives the sizes of the
146 dimensions and "dimincs" is always calculated by the code
147
PDL_Indx inc = 1;
for(i=0; i<it->ndims; i++) {
    it->dimincs[i] = inc; inc *= it->dims[i];
}
152
153 in the routine "pdl_resize_defaultincs" in "pdlapi.c". What this
154 means is that the dimincs can be used to calculate the offset by
155 code like
156
PDL_Indx offs = 0;
for(i=0; i<it->ndims; i++) {
    offs += it->dimincs[i] * index[i];
}
161
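but this is not always the right thing to do. For example, when a
piddle shares its parent's data through a virtual affine
transformation (described below), the default increments do not
describe where its elements actually live in that shared memory.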
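
As a rough illustration of the two loops above, the following
self-contained C sketch computes default increments and turns an index
into a flat offset. The names "Indx", "default_incs" and "flat_offset"
are invented for this example and are not part of the PDL API; "Indx"
merely stands in for PDL_Indx.

```c
#include <assert.h>
#include <stddef.h>

typedef long Indx;  /* stand-in for PDL_Indx; an assumption of this sketch */

/* Fill incs[] so that dimension 0 varies fastest, as in the loop above. */
void default_incs(const Indx *dims, Indx *incs, int ndims) {
    assert(dims != NULL && incs != NULL);
    Indx inc = 1;
    for (int i = 0; i < ndims; i++) {
        incs[i] = inc;
        inc *= dims[i];
    }
}

/* Flat offset of the element at index[]; only meaningful when incs[]
 * really describes the memory layout (i.e. default increments). */
Indx flat_offset(const Indx *incs, const Indx *index, int ndims) {
    assert(incs != NULL && index != NULL);
    Indx offs = 0;
    for (int i = 0; i < ndims; i++)
        offs += incs[i] * index[i];
    return offs;
}
```

For dims (3,4) this gives increments (1,3), so the index (2,1) maps to
the offset 2*1 + 1*3 = 5.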
164
165 Default storage
166 Since the vast majority of piddles don't have more than 6
167 dimensions, it is more efficient to have default storage for the
168 dimensions and dimincs inside the PDL struct.
169
170 PDL_Indx def_dims[PDL_NDIMS];
171 PDL_Indx def_dimincs[PDL_NDIMS];
172
173 The "dims" and "dimincs" may be set to point to the beginning of
174 these arrays if "ndims" is smaller than or equal to the compile-
175 time constant "PDL_NDIMS". This is important to note when freeing
176 a piddle struct. The same applies for the threadids:
177
178 unsigned char def_threadids[PDL_NTHREADIDS];
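
Why this matters when freeing can be seen in a small sketch. It is a
toy model only: "toy_pdl", "toy_setdims", "toy_freedims" and the value
of "NDIMS_DEFAULT" are assumptions made for this example, not the code
used in pdlapi.c.

```c
#include <assert.h>
#include <stdlib.h>

#define NDIMS_DEFAULT 6  /* plays the role of PDL_NDIMS; value assumed */

typedef struct {
    long *dims;                     /* points at def_dims or at heap memory */
    short ndims;
    long  def_dims[NDIMS_DEFAULT];  /* preallocated space */
} toy_pdl;

/* Use the inline array when it is large enough, otherwise allocate.
 * Returns 0 on success and -1 if the allocation fails. */
int toy_setdims(toy_pdl *it, short ndims) {
    assert(it != NULL && ndims >= 0);
    if (ndims <= NDIMS_DEFAULT) {
        it->dims = it->def_dims;
    } else {
        it->dims = malloc(sizeof(long) * (size_t)ndims);
        if (it->dims == NULL) return -1;
    }
    it->ndims = ndims;
    return 0;
}

/* Free dims only if it was heap-allocated; calling free() on the
 * inline def_dims array would be a bug. */
void toy_freedims(toy_pdl *it) {
    assert(it != NULL);
    if (it->dims != it->def_dims)
        free(it->dims);
    it->dims = NULL;
    it->ndims = 0;
}
```

The pointer comparison in "toy_freedims" is the check that the text
above warns about: the struct itself owns the default storage, so only
separately allocated arrays may be released.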
179
180 Magic
181 It is possible to attach magic to piddles, much like Perl's own
182 magic mechanism. If the member pointer
183
184 struct pdl_magic *magic;
185
is non-NULL, the PDL has some magic attached to it. The
implementation of magic can be gleaned from the file pdlmagic.c in
the distribution.
189
190 State
191 One of the first members of the structure is
192
193 int state;
194
195 The possible flags and their meanings are given in "pdl.h". These
196 are mainly used to implement the lazy evaluation mechanism and
197 keep track of piddles in these operations.
198
199 Transformations and virtual affine transformations
200 As you should already know, piddles often carry information about
201 where they come from. For example, the code
202
203 $b = $a->slice("2:5");
204 $b .= 1;
205
206 will alter $a. So $b and $a know that they are connected via a
207 "slice"-transformation. This information is stored in the members
208
209 pdl_trans *trans;
210 pdl_vaffine *vafftrans;
211
212 Both $a (the parent) and $b (the child) store this information
213 about the transformation in appropriate slots of the "pdl"
214 structure.
215
216 "pdl_trans" and "pdl_vaffine" are structures that we will look at
217 in more detail below.
218
219 The Perl SVs
When piddles are referred to through Perl SVs, we store an
additional pointer back to that SV in the member

void* sv;

so that a reference can be returned to a user who wants to
inspect the transformation structure on the Perl side.
227
228 Also, we store an opaque
229
230 void *hdrsv;
231
232 which is just for use by the user to hook up arbitrary data with
233 this sv. This one is generally manipulated through sethdr and
234 gethdr calls.
235
236 Smart references and transformations: slicing and dicing
237 Smart references and most other fundamental functions operating on
238 piddles are implemented via transformations (as mentioned above) which
239 are represented by the type "pdl_trans" in PDL.
240
241 A transformation links input and output piddles and contains all the
242 infrastructure that defines how:
243
244 · output piddles are obtained from input piddles;
245
· changes in smartly linked output piddles (e.g. the child of a
sliced parent piddle) flow back to the input piddle in
transformations where this is supported (the most commonly used
example being "slice");
250
251 · datatype and size of output piddles that need to be created are
252 obtained.
253
254 In general, executing a PDL function on a group of piddles results in
255 creation of a transformation of the requested type that links all input
256 and output arguments (at least those that are piddles). In PDL
257 functions that support data flow between input and output args (e.g.
258 "slice", "index") this transformation links parent (input) and child
259 (output) piddles permanently until either the link is explicitly broken
260 by user request ("sever" at the Perl level) or all parents and children
have been destroyed. In those cases the transformation is lazily
evaluated, i.e. only executed when piddle values are actually accessed.
263
264 In non-flowing functions, for example addition ("+") and inner products
265 ("inner"), the transformation is installed just as in flowing functions
266 but then the transformation is immediately executed and destroyed
267 (breaking the link between input and output args) before the function
268 returns.
269
270 It should be noted that the close link between input and output args of
271 a flowing function (like slice) requires that piddle objects that are
272 linked in such a way be kept alive beyond the point where they have
273 gone out of scope from the point of view of Perl:
274
275 $a = zeroes(20);
276 $b = $a->slice('2:4');
277 undef $a; # last reference to $a is now destroyed
278
279 Although $a should now be destroyed according to Perl's rules the
280 underlying "pdl" structure must actually only be freed when $b also
281 goes out of scope (since it still references internally some of $a's
282 data). This example demonstrates that such a dataflow paradigm between
283 PDL objects necessitates a special destruction algorithm that takes the
284 links between piddles into account and couples the lifespan of those
285 objects. The non-trivial algorithm is implemented in the function
286 "pdl_destroy" in pdlapi.c. In fact, most of the code in pdlapi.c and
287 pdlfamily.c is concerned with making sure that piddles ("pdl *"s) are
288 created, updated and freed at the right times depending on interactions
289 with other piddles via PDL transformations (remember, "pdl_trans").
290
291 Accessing children and parents of a piddle
When piddles are dynamically linked via transformations as suggested
above, input and output piddles are referred to as parents and
children, respectively.
295
296 An example of processing the children of a piddle is provided by the
297 "baddata" method of PDL::Bad (only available if you have compiled PDL
298 with the "WITH_BADVAL" option set to 1, but still useful as an
299 example!).
300
301 Consider the following situation:
302
303 pdl> $a = rvals(7,7,{Centre=>[3,4]});
304 pdl> $b = $a->slice('2:4,3:5');
305 pdl> ? vars
306 PDL variables in package main::
307
308 Name Type Dimension Flow State Mem
309 ----------------------------------------------------------------
310 $a Double D [7,7] P 0.38Kb
311 $b Double D [3,3] -C 0.00Kb
312
313 Now, if I suddenly decide that $a should be flagged as possibly
314 containing bad values, using
315
316 pdl> $a->badflag(1)
317
then I want the state of $b - its child - to be changed as well (since
it will either share or inherit some of $a's data and so also be bad),
so that I get a 'B' in the State field:
321
322 pdl> ? vars
323 PDL variables in package main::
324
325 Name Type Dimension Flow State Mem
326 ----------------------------------------------------------------
327 $a Double D [7,7] PB 0.38Kb
328 $b Double D [3,3] -CB 0.00Kb
329
330 This bit of magic is performed by the "propagate_badflag" function,
331 which is listed below:
332
/* newval = 1 means set flag, 0 means clear it */
/* thanks to Christian Soeller for this */

void propagate_badflag( pdl *it, int newval ) {
    PDL_DECL_CHILDLOOP(it)
    PDL_START_CHILDLOOP(it)
    {
        pdl_trans *trans = PDL_CHILDLOOP_THISCHILD(it);
        int i;
        for( i = trans->vtable->nparents;
             i < trans->vtable->npdls;
             i++ ) {
            pdl *child = trans->pdls[i];

            if ( newval ) child->state |=  PDL_BADVAL;
            else          child->state &= ~PDL_BADVAL;

            /* make sure we propagate to grandchildren, etc */
            propagate_badflag( child, newval );

        } /* for: i */
    }
    PDL_END_CHILDLOOP(it)
} /* propagate_badflag */
357
Given a piddle ("pdl *it"), the routine loops through each "pdl_trans"
structure, where access to this structure is provided by the
"PDL_CHILDLOOP_THISCHILD" macro. The children of the piddle are stored
in the "pdls" array, after the parents, hence the loop from "i =
...nparents" to "i = ...npdls - 1". Once we have the pointer to the
child piddle, we can do what we want to it (here we change the value of
the "state" variable, but the details are unimportant). What is
important is that we call "propagate_badflag" on this piddle, to ensure
we loop through its children. This recursion ensures we get to all the
offspring of a particular piddle.
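
The recursion can be tried outside PDL with a stripped-down model. In
the sketch below "bf_pdl", "bf_trans", "bf_propagate" and "BF_BADVAL"
are stand-ins invented for this example; the fixed-size arrays replace
the child-loop macros and the vtable, so this is an approximation of the
idea rather than the real data structures.

```c
#include <assert.h>
#include <stddef.h>

#define BF_BADVAL   0x1  /* stand-in for PDL_BADVAL; value assumed */
#define BF_MAXCHILD 4
#define BF_MAXPDLS  4

struct bf_trans;

typedef struct bf_pdl {
    int state;
    struct bf_trans *children[BF_MAXCHILD]; /* trans using this pdl as parent */
    int nchildren;
} bf_pdl;

typedef struct bf_trans {
    int nparents;              /* pdls[0 .. nparents-1] are parents        */
    int npdls;                 /* pdls[nparents .. npdls-1] are children   */
    bf_pdl *pdls[BF_MAXPDLS];
} bf_trans;

/* Set or clear the flag on every descendant of "it" (not on "it" itself). */
void bf_propagate(bf_pdl *it, int newval) {
    assert(it != NULL);
    for (int c = 0; c < it->nchildren; c++) {
        bf_trans *trans = it->children[c];
        for (int i = trans->nparents; i < trans->npdls; i++) {
            bf_pdl *child = trans->pdls[i];
            if (newval) child->state |=  BF_BADVAL;
            else        child->state &= ~BF_BADVAL;
            bf_propagate(child, newval);  /* grandchildren, etc. */
        }
    }
}
```

As in the original routine, the piddle passed in is left untouched; the
caller is expected to have set its flag already.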
368
369 Access to parents is similar, with the "for" loop replaced by:
370
for( i = 0;
     i < trans->vtable->nparents;
     i++ ) {
    /* do stuff with parent #i: trans->pdls[i] */
}
376
377 What's in a transformation ("pdl_trans")
378 All transformations are implemented as structures
379
struct XXX_trans {
    int magicno;              /* to detect memory overwrites */
    short flags;              /* state of the trans */
    pdl_transvtable *vtable;  /* the all important vtable */
    void (*freeproc)(struct pdl_trans *);  /* Call to free this trans
        (in case we had to malloc some stuff for this trans) */
    pdl *pdls[NP];            /* The pdls involved in the transformation */
    int __datatype;           /* the type of the transformation */
    /* in general more members,
       depending on the actual transformation (slice, add, etc) */
};
392
393 The transformation identifies all "pdl"s involved in the trans
394
395 pdl *pdls[NP];
396
397 with "NP" depending on the number of piddle args of the particular
398 trans. It records a state
399
400 short flags;
401
402 and the datatype
403
404 int __datatype;
405
of the trans (to which all piddles must be converted unless they are
explicitly typed; PDL functions created with PDL::PP make sure that
these conversions are done as necessary). Most important is the pointer
to the vtable (virtual table) that contains the actual functionality
410
411 pdl_transvtable *vtable;
412
413 The vtable structure in turn looks something like (slightly simplified
414 from pdl.h for clarity)
415
typedef struct pdl_transvtable {
    pdl_transtype transtype;
    int flags;
    int nparents;   /* number of parent pdls (input) */
    int npdls;      /* total number of pdls (parents + children) */
    char *per_pdl_flags;  /* optimization flags */
    void (*redodims)(pdl_trans *tr);       /* figure out dims of children */
    void (*readdata)(pdl_trans *tr);       /* flow parents to children */
    void (*writebackdata)(pdl_trans *tr);  /* flow backwards */
    void (*freetrans)(pdl_trans *tr);      /* Free the trans and its contents */
    pdl_trans *(*copy)(pdl_trans *tr);     /* Full copy */
    int structsize;
    char *name;     /* For debuggers, mostly */
} pdl_transvtable;
431
432 We focus on the callback functions:
433
434 void (*redodims)(pdl_trans *tr);
435
436 "redodims" will work out the dimensions of piddles that need to be
437 created and is called from within the API function that should be
438 called to ensure that the dimensions of a piddle are accessible
439 (pdlapi.c):
440
441 void pdl_make_physdims(pdl *it)
442
443 "readdata" and "writebackdata" are responsible for the actual
444 computations of the child data from the parents or parent data from
445 those of the children, respectively (the dataflow aspect). The PDL
446 core makes sure that these are called as needed when piddle data is
447 accessed (lazy-evaluation). The general API function to ensure that a
448 piddle is up-to-date is
449
450 void pdl_make_physvaffine(pdl *it)
451
452 which should be called before accessing piddle data from XS/C (see
453 Core.xs for some examples).
454
455 "freetrans" frees dynamically allocated memory associated with the
456 trans as needed and "copy" can copy the transformation. Again,
457 functions built with PDL::PP make sure that copying and freeing via
458 these callbacks happens at the right times. (If they fail to do that we
459 have got a memory leak -- this has happened in the past ;).
460
461 The transformation and vtable code is hardly ever written by hand but
462 rather generated by PDL::PP from concise descriptions.
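
To make the role of the callbacks concrete, here is a minimal,
hand-written sketch of vtable dispatch for a "times two"
transformation on a single value. All names ("vt_trans", "vt_table",
"double_read" and so on) are hypothetical and the structures are far
simpler than the ones in pdl.h; real code of this kind is generated by
PDL::PP.

```c
#include <assert.h>
#include <stddef.h>

typedef struct vt_trans vt_trans;

typedef struct {
    void (*readdata)(vt_trans *tr);       /* parent -> child */
    void (*writebackdata)(vt_trans *tr);  /* child -> parent */
    const char *name;                     /* for debugging */
} vt_table;

struct vt_trans {
    const vt_table *vtable;
    double *parent;  /* one input value  */
    double *child;   /* one output value */
};

/* Callbacks of the hypothetical "times two" transformation. */
void double_read(vt_trans *tr)      { *tr->child  = 2.0 * *tr->parent; }
void double_writeback(vt_trans *tr) { *tr->parent = *tr->child / 2.0; }

const vt_table double_vtable = { double_read, double_writeback, "double" };

/* Generic code only ever goes through the vtable, so it works for any
 * transformation that fills in the same table. */
void vt_update_child(vt_trans *tr) {
    assert(tr != NULL && tr->vtable != NULL);
    tr->vtable->readdata(tr);
}

void vt_update_parent(vt_trans *tr) {
    assert(tr != NULL && tr->vtable != NULL);
    tr->vtable->writebackdata(tr);
}
```

The generic functions never mention the doubling itself, which is the
point of the indirection: the core can drive any transformation without
knowing what it computes.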
463
464 Certain types of transformations can be optimized very efficiently
465 obviating the need for explicit "readdata" and "writebackdata" methods.
466 Those transformations are called pdl_vaffine. Most dimension
467 manipulating functions (e.g., "slice", "xchg") belong to this class.
468
The basic trick is that parent and child of such a transformation work
on the same (shared) block of data which they just choose to interpret
differently (by using different "dims", "dimincs" and "offs" on the
same data, compare the "pdl" structure above). Each operation on a
piddle sharing data with another one in this way therefore
automatically flows from child to parent and back -- after all they are
reading and writing the same block of memory. This is currently not
Perl thread safe -- no big loss since the whole PDL core is not
reentrant (Perl threading "!=" PDL threading!).
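
The shared-memory trick can be sketched in a few lines of C for the
one-dimensional case. The type "view1d" and the functions "view_slice",
"view_at" and "view_fill" are assumptions made for this illustration;
they mirror the idea of an offset plus increment, not the actual
"pdl_vaffine" layout.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double *data;  /* shared block of memory, owned by the parent */
    long offs;     /* offset of the first element of this view    */
    long inc;      /* step between consecutive elements           */
    long n;        /* number of elements in this view             */
} view1d;

/* Child view of elements first..last (inclusive) of a parent view. */
view1d view_slice(view1d parent, long first, long last) {
    assert(0 <= first && first <= last && last < parent.n);
    view1d child;
    child.data = parent.data;                      /* no copy is made */
    child.offs = parent.offs + first * parent.inc;
    child.inc  = parent.inc;
    child.n    = last - first + 1;
    return child;
}

/* Address of element i of a view. */
double *view_at(view1d v, long i) {
    assert(0 <= i && i < v.n);
    return &v.data[v.offs + i * v.inc];
}

/* Assign one value to every element of a view. */
void view_fill(view1d v, double value) {
    for (long i = 0; i < v.n; i++)
        *view_at(v, i) = value;
}
```

Filling a view made with view_slice(a, 2, 5) corresponds to the
slice("2:5") example earlier in this document: the parent sees the new
values at once because both views address the same buffer.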
478
479 Signatures: threading over elementary operations
Most of the functionality of PDL threading (automatic iteration of
elementary operations over multi-dim piddles) is implemented in the
file pdlthread.c.
483
484 The PDL::PP generated functions (in particular the "readdata" and
485 "writebackdata" callbacks) use this infrastructure to make sure that
486 the fundamental operation implemented by the trans is performed in
487 agreement with PDL's threading semantics.
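
What "automatic iteration of elementary operations" means can be shown
with a hand-rolled example, assuming an elementary operation that sums
one vector and an input with one extra dimension. The functions
"sum_vector" and "sum_threaded" are invented here and do not reflect
how pdlthread.c is organised.

```c
#include <assert.h>
#include <stddef.h>

/* Elementary operation: sum of one vector of length n with stride inc. */
double sum_vector(const double *a, long n, long inc) {
    double s = 0.0;
    for (long i = 0; i < n; i++)
        s += a[i * inc];
    return s;
}

/* "Thread" the elementary operation over an extra dimension of size m.
 * a has dims (n, m) with default increments (1, n); out has dims (m). */
void sum_threaded(const double *a, long n, long m, double *out) {
    assert(a != NULL && out != NULL);
    for (long j = 0; j < m; j++)
        out[j] = sum_vector(a + j * n, n, 1);
}
```

The elementary routine knows nothing about the second dimension; the
outer loop supplies it, which is the job the threading infrastructure
does for PDL::PP generated code.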
488
489 Defining new PDL functions -- Glue code generation
490 Please, see PDL::PP and examples in the PDL distribution.
491 Implementation and syntax are currently far from perfect but it does a
492 good job!
493
494 The Core struct
495 As discussed in PDL::API, PDL uses a pointer to a structure to allow
496 PDL modules access to its core routines. The definition of this
497 structure (the "Core" struct) is in pdlcore.h (created by pdlcore.h.PL
498 in Basic/Core) and looks something like
499
/* Structure to hold pointers to core PDL routines so as to be used by
 * many modules
 */
struct Core {
    I32    Version;
    pdl*   (*SvPDLV)    ( SV* );
    void   (*SetSV_PDL) ( SV *sv, pdl *it );
#if defined(PDL_clean_namespace) || defined(PDL_OLD_API)
    pdl*   (*new)       ( );  /* make it work with gimp-perl */
#else
    pdl*   (*pdlnew)    ( );  /* renamed because of C++ clash */
#endif
    pdl*   (*tmp)       ( );
    pdl*   (*create)    (int type);
    void   (*destroy)   (pdl *it);
    ...
};
typedef struct Core Core;
518
519 The first field of the structure ("Version") is used to ensure
520 consistency between modules at run time; the following code is placed
521 in the BOOT section of the generated xs code:
522
if (PDL->Version != PDL_CORE_VERSION)
    Perl_croak(aTHX_ "Foo needs to be recompiled against the newly installed PDL");
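
The same pattern, a table of function pointers guarded by a version
number, can be sketched without Perl or PDL. Everything in the block
below ("ToyCore", "toy_boot", "toy_call_add", the version value 12) is
made up for illustration; a real module would croak rather than return
an error code.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CORE_VERSION 12  /* version this "module" was built against */

typedef struct {
    int Version;
    int (*add)(int, int);    /* hypothetical core routine */
} ToyCore;

int toy_add(int a, int b) { return a + b; }

/* Returns 0 if the table matches the compiled-in version, -1 otherwise. */
int toy_boot(const ToyCore *core) {
    if (core == NULL || core->Version != TOY_CORE_VERSION)
        return -1;
    return 0;
}

/* Call through the table; only safe after toy_boot() has succeeded. */
int toy_call_add(const ToyCore *core, int a, int b) {
    assert(core != NULL && core->add != NULL);
    return core->add(a, b);
}
```

Checking the version before the first call matters because a module
compiled against an older layout would otherwise pick up the wrong
function pointers from the table.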
525
526 If you add a new field to the Core struct you should:
527
528 · discuss it on the pdl porters email list
529 (pdl-devel@lists.sourceforge.net) [with the possibility of making
530 your changes to a separate branch of the CVS tree if it's a change
531 that will take time to complete]
532
533 · increase by 1 the value of the $pdl_core_version variable in
534 pdlcore.h.PL. This sets the value of the "PDL_CORE_VERSION" C
535 macro used to populate the Version field
536
537 · add documentation (e.g. to PDL::API) if it's a "useful" function
538 for external module writers (as well as ensuring the code is as
539 well documented as the rest of PDL ;)
540
542 This description is far from perfect. If you need more details or
543 something is still unclear please ask on the pdl-devel mailing list
544 (pdl-devel@lists.sourceforge.net).
545
547 Copyright(C) 1997 Tuomas J. Lukka (lukka@fas.harvard.edu), 2000 Doug
548 Burke (djburke@cpan.org), 2002 Christian Soeller & Doug Burke, 2013
549 Chris Marshall.
550
551 Redistribution in the same form is allowed but reprinting requires a
552 permission from the author.
553

perl v5.28.1                      2018-05-05                      INTERNALS(1)