INTERNALS(1)          User Contributed Perl Documentation         INTERNALS(1)

NAME

       PDL::Internals - description of some aspects of the current internals

DESCRIPTION

   Intro
       This document explains various aspects of the current implementation of
       PDL. If you just want to use PDL for something, you definitely do not
       need to read this. Even if you want to interface your C routines to PDL
       or create new PDL::PP functions, you do not need to read this man page
       (though it may be informative). This document is primarily intended for
       people interested in debugging or changing the internals of PDL. To
       read this, a good understanding of the C language, of programming and
       data structures in general, and some understanding of Perl are
       required. If you read through this document and understand all of
       it and are able to point to what any part of this document refers to in
       the PDL core sources and additionally struggle to understand PDL::PP,
       you will be awarded the title "PDL Guru" (of course, the current
       version of this document is so incomplete that this is next to
       impossible from just these notes).

       Warning: If it seems that this document has gotten out of date, please
       inform the PDL porters email list (pdl-porters@jach.hawaii.edu).  This
       may well happen.

   Piddles
       The pdl data object is generally an opaque scalar reference into a pdl
       structure in memory. Alternatively, it may be a hash reference with the
       "PDL" field containing the scalar reference (this makes overloading
       piddles easy, see PDL::Objects). You can easily find out at the Perl
       level which type of piddle you are dealing with. The example code below
       demonstrates how to do it:

          # check if this is a piddle
          die "not a piddle" unless UNIVERSAL::isa($pdl, 'PDL');
          # is it a scalar ref or a hash ref?
          if (UNIVERSAL::isa($pdl, "HASH")) {
            die "not a valid PDL" unless exists $pdl->{PDL} &&
               UNIVERSAL::isa($pdl->{PDL},'PDL');
            print "This is a hash reference,",
               " the PDL field contains the scalar ref\n";
          } else {
            print "This is a scalar ref that points to address $$pdl in memory\n";
          }

       The scalar reference points to the numeric address of a C structure of
       type "pdl" which is defined in pdl.h. The mapping between the object at
       the Perl level and the C structure containing the actual data and
       structural information that makes up a piddle is done by the PDL
       typemap.  The functions used in the PDL typemap are defined pretty much
       at the top of the file pdlcore.h. So what does the structure look like?

               struct pdl {
                  unsigned long magicno; /* Always stores PDL_MAGICNO as a sanity check */
                    /* This is first so most pointer accesses to wrong type are caught */
                  int state;        /* What's in this pdl */

                  pdl_trans *trans; /* Opaque pointer to internals of transformation from
                                       parent */

                  pdl_vaffine *vafftrans;

                  void*    sv;      /* (optional) pointer back to original sv.
                                         ALWAYS check for non-null before use.
                                         We cannot inc refcnt on this one or we'd
                                         never get destroyed */

                  void *datasv;        /* Pointer to SV containing data. Refcnt inced */
                  void *data;            /* Null: no data alloced for this one */
                  int nvals;           /* How many values allocated */
                  int datatype;
                  PDL_Long   *dims;      /* Array of data dimensions */
                  PDL_Long   *dimincs;   /* Array of data default increments */
                  short    ndims;     /* Number of data dimensions */

                  unsigned char *threadids;  /* Starting index of the thread index set n */
                  unsigned char nthreadids;

                  pdl *progenitor; /* I'm in a mutated family. make_physical_now must
                                      copy me to the new generation. */
                  pdl *future_me;  /* I'm the "then" pdl and this is my "now" (or more modern
                                      version, anyway) */

                  pdl_children children;

                  short living_for; /* perl side not referenced; delete me when */

                  PDL_Long   def_dims[PDL_NDIMS];   /* Preallocated space for efficiency */
                  PDL_Long   def_dimincs[PDL_NDIMS];   /* Preallocated space for efficiency */
                  unsigned char def_threadids[PDL_NTHREADIDS];

                  struct pdl_magic *magic;

                  void *hdrsv; /* "header", settable from outside */
               };

       This is quite a structure for just storing some data in - what is going
       on?

       Data storage
            We are going to start with some of the simpler members: first of
            all, there is the member

                    void *datasv;

            which is really a pointer to a Perl SV structure ("SV *"). The SV
            is expected to be representing a string, in which the data of the
            piddle is stored in a tightly packed form. This pointer counts as
            a reference to the SV so the reference count has been incremented
            when the "SV *" was placed here (this reference count business has
            to do with Perl's garbage collection mechanism -- don't worry if
            this doesn't mean much to you). This pointer is allowed to have
            the value "NULL" which means that there is no actual Perl SV for
            this data - for instance, the data might be allocated by a "mmap"
            operation. Note that the use of an SV* was purely for convenience:
            it allows easy transformation of packed data from files into
            piddles. Other implementations are not excluded.

            The actual pointer to data is stored in the member

                    void *data;

            which contains a pointer to a memory area with space for

                    int nvals;

            data items of the data type of this piddle.

            The data type of the data is stored in the variable

                    int datatype;

            The values for this member are given in the enum "pdl_datatypes"
            (see pdl.h). Currently we have byte, short, unsigned short, long,
            float and double types; see also PDL::Types.

       Dimensions
            The number of dimensions in the piddle is given by the member

                    short ndims;

            which shows how many entries there are in the arrays

                    PDL_Long   *dims;
                    PDL_Long   *dimincs;

            These arrays are intimately related: "dims" gives the sizes of the
            dimensions and "dimincs" is always calculated by the code

                    int inc = 1;
                    for(i=0; i<it->ndims; i++) {
                            it->dimincs[i] = inc; inc *= it->dims[i];
                    }

            in the routine "pdl_resize_defaultincs" in "pdlapi.c".  What this
            means is that the dimincs can be used to calculate the offset by
            code like

                    int offs = 0;
                    for(i=0; i<it->ndims; i++) {
                            offs += it->dimincs[i] * index[i];
                    }

            but this is not always the right thing to do, at least without
            checking for certain things first.
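
            As a rough illustration only (this is not code from pdlapi.c),
            the two loops above can be combined into a small self-contained
            C sketch. The type "dim_pdl" and the functions
            "dim_default_incs" and "dim_offset" are hypothetical stand-ins
            that copy just the members discussed here, with a fixed limit of
            6 dimensions assumed for simplicity:

```c
#include <assert.h>
#include <stddef.h>

#define DIM_MAX 6            /* assumed limit, for this sketch only */

/* Hypothetical stand-in for the dimension members of struct pdl. */
typedef struct {
    int  ndims;
    long dims[DIM_MAX];
    long dimincs[DIM_MAX];
} dim_pdl;

/* Same recurrence as the dimincs loop quoted above:
 * dimincs[0] = 1, dimincs[i] = dimincs[i-1] * dims[i-1]. */
static void dim_default_incs(dim_pdl *it)
{
    long inc = 1;
    int i;
    assert(it->ndims >= 0 && it->ndims <= DIM_MAX);
    for (i = 0; i < it->ndims; i++) {
        it->dimincs[i] = inc;
        inc *= it->dims[i];
    }
}

/* Flat element offset for index[0 .. ndims-1], assuming the
 * default increments are the ones in use. */
static long dim_offset(const dim_pdl *it, const long *index)
{
    long offs = 0;
    int i;
    for (i = 0; i < it->ndims; i++)
        offs += it->dimincs[i] * index[i];
    return offs;
}
```

            For dims (3,4,5) this yields increments (1,3,12), so the element
            at index (2,1,4) sits at offset 2*1 + 1*3 + 4*12 = 53.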

       Default storage
            Since the vast majority of piddles don't have more than 6
            dimensions, it is more efficient to have default storage for the
            dimensions and dimincs inside the PDL struct.

                    PDL_Long   def_dims[PDL_NDIMS];
                    PDL_Long   def_dimincs[PDL_NDIMS];

            The "dims" and "dimincs" may be set to point to the beginning of
            these arrays if "ndims" is smaller than or equal to the
            compile-time constant "PDL_NDIMS". This is important to note when
            freeing a piddle struct.  The same applies for the threadids:

                    unsigned char def_threadids[PDL_NTHREADIDS];
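
            Why this matters when freeing can be sketched as follows. This
            is an assumption-laden toy, not the PDL implementation:
            "store_pdl", "STORE_NDIMS" and the three functions are
            hypothetical names, and only the "dims" pointer is modelled. The
            point is simply that the pointer must be compared against the
            inline array before it is handed to "free":

```c
#include <assert.h>
#include <stdlib.h>

#define STORE_NDIMS 6        /* stand-in for PDL_NDIMS */

/* Hypothetical stand-in: a dims pointer plus inline default storage. */
typedef struct {
    long *dims;
    long  def_dims[STORE_NDIMS];
    int   ndims;
} store_pdl;

/* Use the inline array when it is large enough, otherwise allocate.
 * Returns 0 on success, -1 if the allocation fails. */
static int store_set_ndims(store_pdl *it, int ndims)
{
    assert(ndims >= 0);
    if (ndims <= STORE_NDIMS) {
        it->dims = it->def_dims;
    } else {
        it->dims = malloc((size_t)ndims * sizeof *it->dims);
        if (it->dims == NULL)
            return -1;
    }
    it->ndims = ndims;
    return 0;
}

/* True when dims points into the struct itself. */
static int store_uses_default(const store_pdl *it)
{
    return it->dims == it->def_dims;
}

/* Only heap storage may be passed to free(). */
static void store_free_dims(store_pdl *it)
{
    if (it->dims != it->def_dims)
        free(it->dims);
    it->dims = NULL;
    it->ndims = 0;
}
```

            Calling "free" on the inline array would be undefined behaviour,
            which is why the comparison comes first.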

       Magic
            It is possible to attach magic to piddles, much like Perl's own
            magic mechanism. If the member pointer

                       struct pdl_magic *magic;

            is nonzero, the PDL has some magic attached to it. The
            implementation of magic can be gleaned from the file pdlmagic.c in
            the distribution.

       State
            One of the first members of the structure is

                    int state;

            The possible flags and their meanings are given in "pdl.h".  These
            are mainly used to implement the lazy evaluation mechanism and
            keep track of piddles in these operations.

       Transformations and virtual affine transformations
            As you should already know, piddles often carry information about
            where they come from. For example, the code

                    $b = $a->slice("2:5");
                    $b .= 1;

            will alter $a. So $b and $a know that they are connected via a
            "slice"-transformation. This information is stored in the members

                    pdl_trans *trans;
                    pdl_vaffine *vafftrans;

            Both $a (the parent) and $b (the child) store this information
            about the transformation in appropriate slots of the "pdl"
            structure.

            "pdl_trans" and "pdl_vaffine" are structures that we will look at
            in more detail below.

       The Perl SVs
            When piddles are referred to through Perl SVs, we store an
            additional reference to the SV in the member

                    void*    sv;

            in order to be able to return a reference to the user when they
            want to inspect the transformation structure on the Perl side.

            Also, we store an opaque

                    void *hdrsv;

            which is just for use by the user to hook up arbitrary data with
            this sv.  This one is generally manipulated through sethdr and
            gethdr calls.

   Smart references and transformations: slicing and dicing
       Smart references and most other fundamental functions operating on
       piddles are implemented via transformations (as mentioned above) which
       are represented by the type "pdl_trans" in PDL.

       A transformation links input and output piddles and contains all the
       infrastructure that defines how

       ·   output piddles are obtained from input piddles

       ·   changes in smartly linked output piddles (e.g. the child of a
           sliced parent piddle) flow back to the input piddle in
           transformations where this is supported (the most often used
           example being "slice" here).

       ·   datatype and size of output piddles that need to be created are
           obtained

       In general, executing a PDL function on a group of piddles results in
       creation of a transformation of the requested type that links all input
       and output arguments (at least those that are piddles). In PDL
       functions that support data flow between input and output args (e.g.
       "slice", "index") this transformation links parent (input) and child
       (output) piddles permanently until either the link is explicitly broken
       by user request ("sever" at the perl level) or all parents and children
       have been destroyed. In those cases the transformation is
       lazy-evaluated, i.e. only executed when piddle values are actually
       accessed.

       In non-flowing functions, for example addition ("+") and inner products
       ("inner"), the transformation is installed just as in flowing functions
       but then the transformation is immediately executed and destroyed
       (breaking the link between input and output args) before the function
       returns.

       It should be noted that the close link between input and output args of
       a flowing function (like slice) requires that piddle objects that are
       linked in such a way be kept alive beyond the point where they have
       gone out of scope from the point of view of perl:

         $a = zeroes(20);
         $b = $a->slice('2:4');
         undef $a;    # last reference to $a is now destroyed

       Although $a should now be destroyed according to perl's rules, the
       underlying "pdl" structure must actually only be freed when $b also
       goes out of scope (since it still references internally some of $a's
       data). This example demonstrates that such a dataflow paradigm between
       PDL objects necessitates a special destruction algorithm that takes the
       links between piddles into account and couples the lifespan of those
       objects. The non-trivial algorithm is implemented in the function
       "pdl_destroy" in pdlapi.c. In fact, most of the code in pdlapi.c and
       pdlfamily.c is concerned with making sure that piddles ("pdl *"s) are
       created, updated and freed at the right times depending on interactions
       with other piddles via PDL transformations (remember, "pdl_trans").

   Accessing children and parents of a piddle
       When piddles are dynamically linked via transformations as suggested
       above, input and output piddles are referred to as parents and
       children, respectively.

       An example of processing the children of a piddle is provided by the
       "baddata" method of PDL::Bad (only available if you have compiled PDL
       with the "WITH_BADVAL" option set to 1, but still useful as an
       example!).

       Consider the following situation:

        perldl> $a = rvals(7,7,Centre=>[3,4]);
        perldl> $b = $a->slice('2:4,3:5');
        perldl> ? vars
        PDL variables in package main::

        Name         Type   Dimension       Flow  State          Mem
        ----------------------------------------------------------------
        $a           Double D [7,7]                P            0.38Kb
        $b           Double D [3,3]                VC           0.00Kb

       Now, if I suddenly decide that $a should be flagged as possibly
       containing bad values, using

        perldl> $a->baddata(1)

       then I want the state of $b - its child - to be changed as well (since
       it will either share or inherit some of $a's data and so be also bad),
       so that I get a 'B' in the State field:

        perldl> ? vars
        PDL variables in package main::

        Name         Type   Dimension       Flow  State          Mem
        ----------------------------------------------------------------
        $a           Double D [7,7]                PB           0.38Kb
        $b           Double D [3,3]                VCB          0.00Kb

       This bit of magic is performed by the "propogate_badflag" function,
       which is listed below:

        /* newval = 1 means set flag, 0 means clear it */
        /* thanks to Christian Soeller for this */

        void propogate_badflag( pdl *it, int newval ) {
           PDL_DECL_CHILDLOOP(it)
           PDL_START_CHILDLOOP(it)
           {
               pdl_trans *trans = PDL_CHILDLOOP_THISCHILD(it);
               int i;
               for( i = trans->vtable->nparents;
                    i < trans->vtable->npdls;
                    i++ ) {
                   pdl *child = trans->pdls[i];

                   if ( newval ) child->state |=  PDL_BADVAL;
                   else          child->state &= ~PDL_BADVAL;

                   /* make sure we propogate to grandchildren, etc */
                   propogate_badflag( child, newval );

               } /* for: i */
           }
           PDL_END_CHILDLOOP(it)
        } /* propogate_badflag */

       Given a piddle ("pdl *it"), the routine loops through each "pdl_trans"
       structure, where access to this structure is provided by the
       "PDL_CHILDLOOP_THISCHILD" macro.  The children of the piddle are stored
       in the "pdls" array, after the parents, hence the loop from "i =
       ...nparents" to "i = ...npdls - 1".  Once we have the pointer to the
       child piddle, we can do what we want to it (here we change the value of
       the "state" variable, but the details are unimportant).  What is
       important is that we call "propogate_badflag" on this piddle, to ensure
       we loop through its children. This recursion ensures we get to all the
       offspring of a particular piddle.
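
       The traversal can be tried outside PDL with a toy model. The sketch
       below is only an approximation under stated assumptions: "flag_pdl",
       "flag_trans", "FLAG_BADVAL" and "flag_propagate" are hypothetical
       names, the child-loop macros are replaced by a plain array of
       transformations, and "nparents"/"npdls" are stored directly in the
       trans rather than in a vtable. What it keeps is the layout described
       above: parents first in "pdls", children from "nparents" up to
       "npdls - 1".

```c
#include <assert.h>
#include <stddef.h>

#define FLAG_BADVAL   0x0400 /* hypothetical bit value */
#define FLAG_MAXPDLS  4
#define FLAG_MAXTRANS 4

struct flag_trans;

/* Toy piddle: a state word plus the transformations it is a parent of. */
typedef struct flag_pdl {
    int state;
    struct flag_trans *children[FLAG_MAXTRANS];
    int nchildren;
} flag_pdl;

/* Toy transformation: parents occupy pdls[0 .. nparents-1],
 * children occupy pdls[nparents .. npdls-1]. */
typedef struct flag_trans {
    int nparents;
    int npdls;
    flag_pdl *pdls[FLAG_MAXPDLS];
} flag_trans;

/* newval != 0 sets the flag on all descendants, 0 clears it. */
static void flag_propagate(flag_pdl *it, int newval)
{
    int t, i;
    for (t = 0; t < it->nchildren; t++) {
        flag_trans *trans = it->children[t];
        assert(trans->nparents <= trans->npdls);
        for (i = trans->nparents; i < trans->npdls; i++) {
            flag_pdl *child = trans->pdls[i];
            if (newval) child->state |=  FLAG_BADVAL;
            else        child->state &= ~FLAG_BADVAL;
            flag_propagate(child, newval);  /* grandchildren, etc. */
        }
    }
}
```

       As in the original routine, the piddle passed in is not modified
       itself; only its descendants are.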

       Access to parents is similar, with the "for" loop replaced by:

               for( i = 0;
                    i < trans->vtable->nparents;
                    i++ ) {
                  /* do stuff with parent #i: trans->pdls[i] */
               }

   What's in a transformation ("pdl_trans")
       All transformations are implemented as structures

         struct XXX_trans {
               int magicno; /* to detect memory overwrites */
               short flags; /* state of the trans */
               pdl_transvtable *vtable;   /* the all important vtable */
               void (*freeproc)(struct pdl_trans *);  /* Call to free this trans
                       (in case we had to malloc some stuff for this trans) */
               pdl *pdls[NP]; /* The pdls involved in the transformation */
               int __datatype; /* the type of the transformation */
               /* in general more members
                * depending on the actual transformation (slice, add, etc)
                */
         };

       The transformation identifies all "pdl"s involved in the trans

         pdl *pdls[NP];

       with "NP" depending on the number of piddle args of the particular
       trans. It records a state

         short flags;

       and the datatype

         int __datatype;

       of the trans (to which all piddles must be converted unless they are
       explicitly typed; PDL functions created with PDL::PP make sure that
       these conversions are done as necessary). Most important is the pointer
       to the vtable (virtual table) that contains the actual functionality

        pdl_transvtable *vtable;

       The vtable structure in turn looks something like (slightly simplified
       from pdl.h for clarity)

         typedef struct pdl_transvtable {
               pdl_transtype transtype;
               int flags;
               int nparents;   /* number of parent pdls (input) */
               int npdls;      /* total number of pdls (parents + children) */
               char *per_pdl_flags;  /* optimization flags */
               void (*redodims)(pdl_trans *tr);  /* figure out dims of children */
               void (*readdata)(pdl_trans *tr);  /* flow parents to children  */
               void (*writebackdata)(pdl_trans *tr); /* flow backwards */
               void (*freetrans)(pdl_trans *tr); /* Free both the contents of the
                                               trans and the trans itself */
               pdl_trans *(*copy)(pdl_trans *tr); /* Full copy */
               int structsize;
               char *name; /* For debuggers, mostly */
         } pdl_transvtable;

       We focus on the callback functions:

               void (*redodims)(pdl_trans *tr);

       "redodims" will work out the dimensions of piddles that need to be
       created and is called from within the API function that should be
       called to ensure that the dimensions of a piddle are accessible
       (pdlapi.c):

          void pdl_make_physdims(pdl *it)

       "readdata" and "writebackdata" are responsible for the actual
       computations of the child data from the parents or parent data from
       those of the children, respectively (the dataflow aspect).  The PDL
       core makes sure that these are called as needed when piddle data is
       accessed (lazy-evaluation). The general API function to ensure that a
       piddle is up-to-date is

         void pdl_make_physvaffine(pdl *it)

       which should be called before accessing piddle data from XS/C (see
       Core.xs for some examples).
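
       The idea of a vtable whose "readdata" callback runs only when data
       is actually requested can be illustrated with a deliberately tiny
       model. Everything named here ("lazy_trans", "lazy_vtable", the
       "dirty" member, "lazy_get_child") is hypothetical and much simpler
       than the real PDL structures; the sketch assumes one scalar parent
       and one scalar child.

```c
#include <assert.h>
#include <stddef.h>

typedef struct lazy_trans lazy_trans;

/* Toy vtable: just one callback and a name. */
typedef struct {
    void (*readdata)(lazy_trans *tr);  /* flow parent to child */
    const char *name;
} lazy_vtable;

/* Toy transformation with one scalar parent and one scalar child. */
struct lazy_trans {
    const lazy_vtable *vtable;
    int    dirty;    /* nonzero: child is out of date */
    double parent;
    double child;
};

/* Example operation: child = 2 * parent. */
static void double_readdata(lazy_trans *tr)
{
    tr->child = 2.0 * tr->parent;
}

static const lazy_vtable double_vtable = { double_readdata, "double" };

/* Run readdata through the vtable only if the child is stale. */
static double lazy_get_child(lazy_trans *tr)
{
    assert(tr->vtable != NULL && tr->vtable->readdata != NULL);
    if (tr->dirty) {
        tr->vtable->readdata(tr);
        tr->dirty = 0;
    }
    return tr->child;
}
```

       Nothing is computed when the transformation is set up; the callback
       fires on the first call to "lazy_get_child" and again only after the
       trans has been marked dirty.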

       "freetrans" frees dynamically allocated memory associated with the
       trans as needed and "copy" can copy the transformation.  Again,
       functions built with PDL::PP make sure that copying and freeing via
       these callbacks happens at the right times. (If they fail to do that we
       have got a memory leak -- this has happened in the past ;).

       The transformation and vtable code is hardly ever written by hand but
       rather generated by PDL::PP from concise descriptions.

       Certain types of transformations can be optimized very efficiently,
       obviating the need for explicit "readdata" and "writebackdata" methods.
       Those transformations are called pdl_vaffine. Most dimension
       manipulating functions (e.g., "slice", "xchg") belong to this class.

       The basic trick is that parent and child of such a transformation work
       on the same (shared) block of data which they just choose to interpret
       differently (by using different "dims", "dimincs" and "offs" on the
       same data, compare the "pdl" structure above).  Each operation on a
       piddle sharing data with another one in this way therefore
       automatically flows from child to parent and back -- after all they are
       reading and writing the same block of memory. This is currently not
       perl thread safe -- no big loss since the whole PDL core is not
       reentrant (perl threading "!=" PDL threading!).
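
       A one-dimensional sketch of the shared-block trick follows. It is
       not PDL's pdl_vaffine code; "view1d", "view_slice" and "view_at" are
       hypothetical names, and the model assumes a single dimension with
       one offset and one increment. It mirrors the earlier Perl example
       "$b = $a->slice("2:5"); $b .= 1;", where writing through the child
       changes the parent because both address the same memory.

```c
#include <assert.h>
#include <stddef.h>

/* Toy 1-D view: shares data with its parent, owns nothing. */
typedef struct {
    double *data;   /* shared block */
    long    offs;   /* offset of element 0 within the block */
    long    dim;    /* number of elements visible through the view */
    long    inc;    /* step between consecutive elements */
} view1d;

/* Child view covering parent elements first..last (inclusive). */
static view1d view_slice(view1d parent, long first, long last)
{
    view1d child;
    assert(0 <= first && first <= last && last < parent.dim);
    child.data = parent.data;
    child.offs = parent.offs + first * parent.inc;
    child.dim  = last - first + 1;
    child.inc  = parent.inc;
    return child;
}

/* Address of element i of a view. */
static double *view_at(view1d v, long i)
{
    assert(0 <= i && i < v.dim);
    return v.data + v.offs + i * v.inc;
}
```

       No data is copied by "view_slice": only the offset and the visible
       length differ between parent and child.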

   Signatures: threading over elementary operations
       Most of the functionality of PDL threading (automatic iteration of
       elementary operations over multidim piddles) is implemented in the file
       pdlthread.c.

       The PDL::PP generated functions (in particular the "readdata" and
       "writebackdata" callbacks) use this infrastructure to make sure that
       the fundamental operation implemented by the trans is performed in
       agreement with PDL's threading semantics.

   Defining new PDL functions -- Glue code generation
       Please see PDL::PP and the examples in the PDL distribution.
       Implementation and syntax are currently far from perfect but it does a
       good job!

   The Core struct
       As discussed in PDL::API, PDL uses a pointer to a structure to allow
       PDL modules access to its core routines. The definition of this
       structure (the "Core" struct) is in pdlcore.h (created by pdlcore.h.PL
       in Basic/Core) and looks something like

        /* Structure to hold pointers to core PDL routines so as to be used by
         * many modules
         */
        struct Core {
           I32    Version;
           pdl*   (*SvPDLV)      ( SV*  );
           void   (*SetSV_PDL)   ( SV *sv, pdl *it );
        #if defined(PDL_clean_namespace) || defined(PDL_OLD_API)
           pdl*   (*new)      ( );     /* make it work with gimp-perl */
        #else
           pdl*   (*pdlnew)      ( );  /* renamed because of C++ clash */
        #endif
           pdl*   (*tmp)         ( );
           pdl*   (*create)      (int type);
           void   (*destroy)     (pdl *it);
           ...
        };
        typedef struct Core Core;

       The first field of the structure ("Version") is used to ensure
       consistency between modules at run time; the following code is placed
       in the BOOT section of the generated xs code:

        if (PDL->Version != PDL_CORE_VERSION)
          Perl_croak(aTHX_ "Foo needs to be recompiled against the newly installed PDL");

       If you add a new field to the Core struct you should:

       ·    discuss it on the pdl porters email list
            (pdl-porters@jach.hawaii.edu) [with the possibility of making your
            changes to a separate branch of the CVS tree if it's a change that
            will take time to complete]

       ·    increase by 1 the value of the $pdl_core_version variable in
            pdlcore.h.PL. This sets the value of the "PDL_CORE_VERSION" C
            macro used to populate the Version field

       ·    add documentation (e.g. to PDL::API) if it's a "useful" function
            for external module writers (as well as ensuring the code is as
            well documented as the rest of PDL ;)

BUGS

       This description is far from perfect. If you need more details or
       something is still unclear, please ask on the pdl-porters mailing list
       (pdl-porters@jach.hawaii.edu).

AUTHOR

       Copyright(C) 1997 Tuomas J. Lukka (lukka@fas.harvard.edu), 2000 Doug
       Burke (djburke@cpan.org), 2002 Christian Soeller & Doug Burke.

       Redistribution in the same form is allowed but reprinting requires a
       permission from the author.

perl v5.12.3                      2009-10-17                      INTERNALS(1)