INTERNALS(1)          User Contributed Perl Documentation         INTERNALS(1)

NAME

       PDL::Internals - description of some aspects of the current internals

DESCRIPTION

   Intro
       This document explains various aspects of the current implementation of
       PDL. If you just want to use PDL for something, you definitely do not
       need to read this. Even if you want to interface your C routines to PDL
       or create new PDL::PP functions, you do not need to read this man page
       (though it may be informative). This document is primarily intended for
       people interested in debugging or changing the internals of PDL. To
       read this, a good understanding of the C language and of programming
       and data structures in general is required, as well as some
       understanding of Perl. If you read through this document and understand
       all of it and are able to point to what any part of this document
       refers to in the PDL core sources and additionally struggle to
       understand PDL::PP, you will be awarded the title "PDL Guru" (of
       course, the current version of this document is so incomplete that
       this is next to impossible from just these notes).

       Warning: If it seems that this document has gotten out of date, please
       inform the PDL porters email list (pdl-devel@lists.sourceforge.net).
       This may well happen.

   Piddles
       The pdl data object is generally an opaque scalar reference into a pdl
       structure in memory. Alternatively, it may be a hash reference with the
       "PDL" field containing the scalar reference (this makes overloading
       piddles easy, see PDL::Objects). You can easily find out at the Perl
       level which type of piddle you are dealing with. The example code below
       demonstrates how to do it:

          # check if this is a piddle
          die "not a piddle" unless UNIVERSAL::isa($pdl, 'PDL');
          # is it a scalar ref or a hash ref?
          if (UNIVERSAL::isa($pdl, "HASH")) {
            die "not a valid PDL" unless exists $pdl->{PDL} &&
               UNIVERSAL::isa($pdl->{PDL},'PDL');
            print "This is a hash reference,",
               " the PDL field contains the scalar ref\n";
          } else {
            print "This is a scalar ref that points to address $$pdl in memory\n";
          }

       The scalar reference points to the numeric address of a C structure of
       type "pdl" which is defined in pdl.h. The mapping between the object at
       the Perl level and the C structure containing the actual data and
       structural information that makes up a piddle is done by the PDL
       typemap. The functions used in the PDL typemap are defined pretty much
       at the top of the file pdlcore.h. So what does the structure look like?

               struct pdl {
                  unsigned long magicno; /* Always stores PDL_MAGICNO as a sanity check */
                    /* This is first so most pointer accesses to wrong type are caught */
                  int state;        /* What's in this pdl */

                  pdl_trans *trans; /* Opaque pointer to internals of transformation from
                                       parent */

                  pdl_vaffine *vafftrans;

                  void*    sv;      /* (optional) pointer back to original sv.
                                       ALWAYS check for non-null before use.
                                       We cannot inc refcnt on this one or we'd
                                       never get destroyed */

                  void *datasv;     /* Pointer to SV containing data. Refcnt inced */
                  void *data;       /* Null: no data alloced for this one */
                  PDL_Indx nvals;   /* How many values allocated */
                  int datatype;
                  PDL_Indx   *dims;      /* Array of data dimensions */
                  PDL_Indx   *dimincs;   /* Array of data default increments */
                  short    ndims;        /* Number of data dimensions */

                  unsigned char *threadids;  /* Starting index of the thread index set n */
                  unsigned char nthreadids;

                  pdl_children children;

                  PDL_Indx   def_dims[PDL_NDIMS];      /* Preallocated space for efficiency */
                  PDL_Indx   def_dimincs[PDL_NDIMS];   /* Preallocated space for efficiency */
                  unsigned char def_threadids[PDL_NTHREADIDS];

                  struct pdl_magic *magic;

                  void *hdrsv; /* "header", settable from outside */
               };

       This is quite a structure for just storing some data in - what is going
       on?

       Data storage
            We are going to start with some of the simpler members: first of
            all, there is the member

                    void *datasv;

            which is really a pointer to a Perl SV structure ("SV *"). The SV
            is expected to represent a string in which the data of the piddle
            is stored in a tightly packed form. This pointer counts as a
            reference to the SV, so the reference count was incremented when
            the "SV *" was placed here (this reference count business has to
            do with Perl's garbage collection mechanism -- don't worry if this
            doesn't mean much to you). This pointer is allowed to have the
            value "NULL", which means that there is no actual Perl SV for this
            data - for instance, the data might be allocated by a "mmap"
            operation. Note that the use of an SV* was purely for convenience:
            it allows easy transformation of packed data from files into
            piddles. Other implementations are not excluded.

            The actual pointer to data is stored in the member

                    void *data;

            which contains a pointer to a memory area with space for

                    PDL_Indx nvals;

            data items of the data type of this piddle.  PDL_Indx is either
            'long' or 'long long' depending on whether your perl is 64bit or
            not.

            The data type of the data is stored in the variable

                    int datatype;

            the values for this member are given in the enum "pdl_datatypes"
            (see pdl.h). Currently we have byte, short, unsigned short, long,
            index (either long or long long), long long, float and double
            types, see also PDL::Types.
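
            To make the relation between "data", "nvals" and "datatype"
            concrete, here is a minimal sketch of how the size of the data
            area follows from the other two members. The enum, its values
            and the function names are hypothetical stand-ins covering only
            a subset of the types; the real codes are in pdl.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a few datatype codes; the real values live
 * in the pdl_datatypes enum in pdl.h. */
enum sketch_datatype { SK_BYTE, SK_SHORT, SK_USHORT, SK_FLOAT, SK_DOUBLE };

/* Size in bytes of one element of the given type, or 0 if unknown. */
static size_t sketch_howbig(int datatype) {
    switch (datatype) {
    case SK_BYTE:   return sizeof(unsigned char);
    case SK_SHORT:  return sizeof(short);
    case SK_USHORT: return sizeof(unsigned short);
    case SK_FLOAT:  return sizeof(float);
    case SK_DOUBLE: return sizeof(double);
    default:        return 0;
    }
}

/* Bytes that must be available behind the data pointer for nvals items. */
static size_t sketch_data_bytes(long nvals, int datatype) {
    assert(nvals >= 0);
    return (size_t)nvals * sketch_howbig(datatype);
}
```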

       Dimensions
            The number of dimensions in the piddle is given by the member

                    short ndims;

            which shows how many entries there are in the arrays

                    PDL_Indx   *dims;
                    PDL_Indx   *dimincs;

            These arrays are intimately related: "dims" gives the sizes of the
            dimensions and "dimincs" is always calculated by the code

                    PDL_Indx inc = 1;
                    for(i=0; i<it->ndims; i++) {
                            it->dimincs[i] = inc; inc *= it->dims[i];
                    }

            in the routine "pdl_resize_defaultincs" in "pdlapi.c".  What this
            means is that the dimincs can be used to calculate the offset by
            code like

                    PDL_Indx offs = 0;
                    for(i=0; i<it->ndims; i++) {
                            offs += it->dimincs[i] * index[i];
                    }

            but this is not always the right thing to do, at least without
            checking for certain things first. For example, a piddle with a
            virtual affine transformation (described further down) shares
            its parent's data and is addressed through the increments and
            offset of that transformation rather than through its own
            "dimincs".
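
            The two loops above can be tried out in isolation. The following
            sketch uses a plain "long" in place of PDL_Indx and hypothetical
            function names; it is not the code from pdlapi.c. For dimensions
            (3,4,5) the increments come out as (1,3,12), so the element at
            index (2,1,3) sits at offset 2*1 + 1*3 + 3*12 = 41.

```c
#include <assert.h>

typedef long sk_indx;   /* stand-in for PDL_Indx */

/* Fill incs[] so that incs[i] is the product of dims[0..i-1]. */
static void sketch_default_incs(const sk_indx *dims, sk_indx *incs, int ndims) {
    sk_indx inc = 1;
    for (int i = 0; i < ndims; i++) {
        incs[i] = inc;
        inc *= dims[i];
    }
}

/* Flat offset of the element at index[] for the given increments. */
static sk_indx sketch_offset(const sk_indx *incs, const sk_indx *index, int ndims) {
    sk_indx offs = 0;
    for (int i = 0; i < ndims; i++) {
        assert(index[i] >= 0);
        offs += incs[i] * index[i];
    }
    return offs;
}
```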

       Default storage
            Since the vast majority of piddles don't have more than 6
            dimensions, it is more efficient to have default storage for the
            dimensions and dimincs inside the PDL struct.

                    PDL_Indx   def_dims[PDL_NDIMS];
                    PDL_Indx   def_dimincs[PDL_NDIMS];

            The "dims" and "dimincs" may be set to point to the beginning of
            these arrays if "ndims" is smaller than or equal to the compile-
            time constant "PDL_NDIMS". This is important to note when freeing
            a piddle struct: arrays that point into the struct itself must
            not be freed separately.  The same applies for the threadids:

                    unsigned char def_threadids[PDL_NTHREADIDS];
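
            The pattern can be sketched as follows. This is an illustration
            under simplifying assumptions, not the PDL implementation: the
            struct, the constant SK_NDIMS and both function names are
            hypothetical, and only the "dims" array is handled.

```c
#include <assert.h>
#include <stdlib.h>

#define SK_NDIMS 6   /* stand-in for PDL_NDIMS */

typedef struct {
    long *dims;               /* points at def_dims or at heap memory */
    int   ndims;
    long  def_dims[SK_NDIMS]; /* preallocated space inside the struct */
} sk_pdl;

/* Make room for ndims dimensions; returns 0 on success, -1 on failure. */
static int sketch_reify_dims(sk_pdl *it, int ndims) {
    assert(ndims >= 0);
    if (ndims <= SK_NDIMS) {
        it->dims = it->def_dims;          /* no allocation needed */
    } else {
        it->dims = malloc((size_t)ndims * sizeof *it->dims);
        if (it->dims == NULL) return -1;
    }
    it->ndims = ndims;
    return 0;
}

/* Release the dims array, but only if it was heap allocated. */
static void sketch_free_dims(sk_pdl *it) {
    if (it->dims != it->def_dims) free(it->dims);
    it->dims = NULL;
    it->ndims = 0;
}
```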

       Magic
            It is possible to attach magic to piddles, much like Perl's own
            magic mechanism. If the member pointer

                    struct pdl_magic *magic;

            is nonzero, the PDL has some magic attached to it. The
            implementation of magic can be gleaned from the file pdlmagic.c in
            the distribution.

       State
            One of the first members of the structure is

                    int state;

            The possible flags and their meanings are given in "pdl.h".  These
            are mainly used to implement the lazy evaluation mechanism and
            keep track of piddles in these operations.
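
            Since "state" is a set of bit flags, it is manipulated with the
            usual mask operations. The sketch below shows the idiom only;
            the flag names and values are made up for illustration and do
            not correspond to the definitions in pdl.h.

```c
#include <assert.h>

/* Hypothetical flag values for illustration; the real ones are in pdl.h. */
#define SK_FLAG_ALLOCATED 0x0001
#define SK_FLAG_DIRTY     0x0002
#define SK_FLAG_BADVAL    0x0400

/* Return state with the given flag switched on. */
static int sketch_set(int state, int flag) {
    assert(flag != 0);
    return state | flag;
}

/* Return state with the given flag switched off. */
static int sketch_clear(int state, int flag) {
    assert(flag != 0);
    return state & ~flag;
}

/* Nonzero if the given flag is set in state. */
static int sketch_isset(int state, int flag) {
    return (state & flag) != 0;
}
```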

       Transformations and virtual affine transformations
            As you should already know, piddles often carry information about
            where they come from. For example, the code

                    $y = $x->slice("2:5");
                    $y .= 1;

            will alter $x. So $y and $x know that they are connected via a
            "slice"-transformation. This information is stored in the members

                    pdl_trans *trans;
                    pdl_vaffine *vafftrans;

            Both $x (the parent) and $y (the child) store this information
            about the transformation in appropriate slots of the "pdl"
            structure.

            "pdl_trans" and "pdl_vaffine" are structures that we will look at
            in more detail below.

       The Perl SVs
            When a piddle is referred to through a Perl SV, we store an
            additional pointer to that SV in the member

                    void*    sv;

            in order to be able to return a reference to users when they
            want to inspect the transformation structure on the Perl side.

            Also, we store an opaque

                    void *hdrsv;

            which is just for use by the user to hook up arbitrary data with
            this sv.  This one is generally manipulated through sethdr and
            gethdr calls.

   Smart references and transformations: slicing and dicing
       Smart references and most other fundamental functions operating on
       piddles are implemented via transformations (as mentioned above) which
       are represented by the type "pdl_trans" in PDL.

       A transformation links input and output piddles and contains all the
       infrastructure that defines how:

       ·   output piddles are obtained from input piddles;

       ·   changes in smartly linked output piddles (e.g. the child of a
           sliced parent piddle) flow back to the input piddle in
           transformations where this is supported (the most often used
           example being "slice" here);

       ·   datatype and size of output piddles that need to be created are
           obtained.

       In general, executing a PDL function on a group of piddles results in
       creation of a transformation of the requested type that links all input
       and output arguments (at least those that are piddles). In PDL
       functions that support data flow between input and output args (e.g.
       "slice", "index") this transformation links parent (input) and child
       (output) piddles permanently until either the link is explicitly broken
       by user request ("sever" at the Perl level) or all parents and children
       have been destroyed. In those cases the transformation is lazy-
       evaluated, i.e. only executed when piddle values are actually accessed.

       In non-flowing functions, for example addition ("+") and inner products
       ("inner"), the transformation is installed just as in flowing functions
       but then the transformation is immediately executed and destroyed
       (breaking the link between input and output args) before the function
       returns.

       It should be noted that the close link between input and output args of
       a flowing function (like slice) requires that piddle objects that are
       linked in such a way be kept alive beyond the point where they have
       gone out of scope from the point of view of Perl:

         $x = zeroes(20);
         $y = $x->slice('2:4');
         undef $x;    # last reference to $x is now destroyed

       Although $x should now be destroyed according to Perl's rules, the
       underlying "pdl" structure must actually only be freed when $y also
       goes out of scope (since it still references internally some of $x's
       data). This example demonstrates that such a dataflow paradigm between
       PDL objects necessitates a special destruction algorithm that takes the
       links between piddles into account and couples the lifespan of those
       objects. The non-trivial algorithm is implemented in the function
       "pdl_destroy" in pdlapi.c. In fact, most of the code in pdlapi.c and
       pdlfamily.c is concerned with making sure that piddles ("pdl *"s) are
       created, updated and freed at the right times depending on interactions
       with other piddles via PDL transformations (remember, "pdl_trans").
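
       The coupling of lifespans can be illustrated with a toy model. This
       is deliberately much simpler than, and not derived from, the actual
       "pdl_destroy" algorithm: all names are hypothetical, each node has at
       most one parent, and "freeing" just sets a flag so the outcome can be
       inspected.

```c
#include <assert.h>
#include <stddef.h>

typedef struct sk_node {
    struct sk_node *parent;   /* node whose data this one borrows, or NULL */
    int nchildren;            /* live nodes borrowing from this one */
    int perl_alive;           /* 1 while a Perl-level reference exists */
    int freed;                /* set when the structure would be freed */
} sk_node;

/* Free a node only when neither Perl nor any child still needs it,
 * then check whether that releases its parent as well. */
static void sketch_maybe_destroy(sk_node *n) {
    if (n == NULL || n->freed || n->perl_alive || n->nchildren > 0) return;
    n->freed = 1;
    if (n->parent != NULL) {
        assert(n->parent->nchildren > 0);
        n->parent->nchildren--;
        sketch_maybe_destroy(n->parent);
    }
}

/* Called when the last Perl-level reference goes away. */
static void sketch_drop_perl_ref(sk_node *n) {
    n->perl_alive = 0;
    sketch_maybe_destroy(n);
}
```

       In terms of the Perl example, dropping the reference to the parent
       first leaves it allocated; only when the child is dropped too are
       both released.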

   Accessing children and parents of a piddle
       When piddles are dynamically linked via transformations as suggested
       above, input and output piddles are referred to as parents and
       children, respectively.

       An example of processing the children of a piddle is provided by the
       "baddata" method of PDL::Bad (only available if you have compiled PDL
       with the "WITH_BADVAL" option set to 1, but still useful as an
       example!).

       Consider the following situation:

        pdl> $x = rvals(7,7,{Centre=>[3,4]});
        pdl> $y = $x->slice('2:4,3:5');
        pdl> ? vars
        PDL variables in package main::

        Name         Type   Dimension       Flow  State          Mem
        ----------------------------------------------------------------
        $x           Double D [7,7]                P            0.38Kb
        $y           Double D [3,3]                -C           0.00Kb

       Now, if I suddenly decide that $x should be flagged as possibly
       containing bad values, using

        pdl> $x->badflag(1)

       then I want the state of $y - its child - to be changed as well (since
       it will either share or inherit some of $x's data and so be also bad),
       so that I get a 'B' in the State field:

        pdl> ? vars
        PDL variables in package main::

        Name         Type   Dimension       Flow  State          Mem
        ----------------------------------------------------------------
        $x           Double D [7,7]                PB           0.38Kb
        $y           Double D [3,3]                -CB          0.00Kb

       This bit of magic is performed by the "propagate_badflag" function,
       which is listed below:

        /* newval = 1 means set flag, 0 means clear it */
        /* thanks to Christian Soeller for this */

        void propagate_badflag( pdl *it, int newval ) {
           PDL_DECL_CHILDLOOP(it)
           PDL_START_CHILDLOOP(it)
           {
               pdl_trans *trans = PDL_CHILDLOOP_THISCHILD(it);
               int i;
               for( i = trans->vtable->nparents;
                    i < trans->vtable->npdls;
                    i++ ) {
                   pdl *child = trans->pdls[i];

                   if ( newval ) child->state |=  PDL_BADVAL;
                   else          child->state &= ~PDL_BADVAL;

                   /* make sure we propagate to grandchildren, etc */
                   propagate_badflag( child, newval );

               } /* for: i */
           }
           PDL_END_CHILDLOOP(it)
        } /* propagate_badflag */

       Given a piddle ("pdl *it"), the routine loops through each "pdl_trans"
       structure, where access to this structure is provided by the
       "PDL_CHILDLOOP_THISCHILD" macro.  The children of the piddle are stored
       in the "pdls" array, after the parents, hence the loop from "i =
       ...nparents" to "i = ...npdls - 1".  Once we have the pointer to the
       child piddle, we can do what we want to it (here we change the value of
       the "state" variable, but the details are unimportant).  What is
       important is that we call "propagate_badflag" on this piddle, to ensure
       we loop through its children. This recursion ensures we get to all the
       offspring of a particular piddle.
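
       The same traversal can be run outside PDL with mock structures. The
       sketch below replaces the PDL_*CHILDLOOP macros with a plain array of
       transformations per piddle; the type names, the array sizes and the
       flag value are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>

#define TREE_BADVAL   0x0400  /* hypothetical flag value */
#define TREE_MAXTRANS 4
#define TREE_MAXPDLS  4

struct tree_trans;

typedef struct tree_pdl {
    int state;
    int ntrans;                                /* transformations fed by this pdl */
    struct tree_trans *children[TREE_MAXTRANS];
} tree_pdl;

typedef struct tree_trans {
    int nparents;                 /* pdls[0 .. nparents-1] are parents */
    int npdls;                    /* pdls[nparents .. npdls-1] are children */
    tree_pdl *pdls[TREE_MAXPDLS];
} tree_trans;

/* Set (newval != 0) or clear the flag on every descendant of it. */
static void tree_propagate(tree_pdl *it, int newval) {
    for (int t = 0; t < it->ntrans; t++) {
        tree_trans *trans = it->children[t];
        for (int i = trans->nparents; i < trans->npdls; i++) {
            tree_pdl *child = trans->pdls[i];
            assert(child != NULL);
            if (newval) child->state |=  TREE_BADVAL;
            else        child->state &= ~TREE_BADVAL;
            tree_propagate(child, newval);   /* grandchildren, etc */
        }
    }
}
```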

       Access to parents is similar, with the "for" loop replaced by:

               for( i = 0;
                    i < trans->vtable->nparents;
                    i++ ) {
                  /* do stuff with parent #i: trans->pdls[i] */
               }

   What's in a transformation ("pdl_trans")
       All transformations are implemented as structures

         struct XXX_trans {
               int magicno; /* to detect memory overwrites */
               short flags; /* state of the trans */
               pdl_transvtable *vtable;   /* the all important vtable */
               void (*freeproc)(struct pdl_trans *);  /* Call to free this trans
                       (in case we had to malloc some stuff for this trans) */
               pdl *pdls[NP]; /* The pdls involved in the transformation */
               int __datatype; /* the type of the transformation */
               /* in general more members,
                * depending on the actual transformation (slice, add, etc)
                */
         };

       The transformation identifies all "pdl"s involved in the trans

         pdl *pdls[NP];

       with "NP" depending on the number of piddle args of the particular
       trans. It records a state

         short flags;

       and the datatype

         int __datatype;

       of the trans (to which all piddles must be converted unless they are
       explicitly typed; PDL functions created with PDL::PP make sure that
       these conversions are done as necessary). Most important is the pointer
       to the vtable (virtual table) that contains the actual functionality

         pdl_transvtable *vtable;

       The vtable structure in turn looks something like (slightly simplified
       from pdl.h for clarity)

         typedef struct pdl_transvtable {
               pdl_transtype transtype;
               int flags;
               int nparents;   /* number of parent pdls (input) */
               int npdls;      /* total number of pdls (parents + children) */
               char *per_pdl_flags;  /* optimization flags */
               void (*redodims)(pdl_trans *tr);  /* figure out dims of children */
               void (*readdata)(pdl_trans *tr);  /* flow parents to children  */
               void (*writebackdata)(pdl_trans *tr); /* flow backwards */
               void (*freetrans)(pdl_trans *tr); /* Free both the trans and
                                               its contents */
               pdl_trans *(*copy)(pdl_trans *tr); /* Full copy */
               int structsize;
               char *name; /* For debuggers, mostly */
         } pdl_transvtable;

       We focus on the callback functions:

               void (*redodims)(pdl_trans *tr);

       "redodims" will work out the dimensions of piddles that need to be
       created and is called from within the API function that should be
       called to ensure that the dimensions of a piddle are accessible
       (pdlapi.c):

          void pdl_make_physdims(pdl *it)

       "readdata" and "writebackdata" are responsible for the actual
       computations of the child data from the parents or parent data from
       those of the children, respectively (the dataflow aspect).  The PDL
       core makes sure that these are called as needed when piddle data is
       accessed (lazy-evaluation). The general API function to ensure that a
       piddle is up-to-date is

         void pdl_make_physvaffine(pdl *it)

       which should be called before accessing piddle data from XS/C (see
       Core.xs for some examples).

       "freetrans" frees dynamically allocated memory associated with the
       trans as needed and "copy" can copy the transformation.  Again,
       functions built with PDL::PP make sure that copying and freeing via
       these callbacks happens at the right times. (If they fail to do that we
       have got a memory leak -- this has happened in the past ;).

       The transformation and vtable code is hardly ever written by hand but
       rather generated by PDL::PP from concise descriptions.
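
       The role of the vtable is easiest to see in a stripped-down model.
       In the sketch below a transformation carries a pointer to a table
       with a single "readdata" callback, and a generic driver calls
       through that pointer without knowing which operation it runs. All
       "vt_" names are hypothetical, and plain "double" arrays stand in for
       piddles; real PDL::PP output is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

struct vt_trans;

/* A cut-down vtable: just a name and one callback. */
typedef struct {
    const char *name;
    void (*readdata)(struct vt_trans *tr);   /* compute child from parents */
} vt_vtable;

typedef struct vt_trans {
    const vt_vtable *vtable;
    double *pdls[3];          /* two parents followed by one child */
    size_t n;                 /* elements per array */
} vt_trans;

/* The readdata callback of a hypothetical "add" operation. */
static void vt_add_readdata(vt_trans *tr) {
    for (size_t i = 0; i < tr->n; i++)
        tr->pdls[2][i] = tr->pdls[0][i] + tr->pdls[1][i];
}

static const vt_vtable vt_add_vtable = { "add", vt_add_readdata };

/* Generic driver: knows nothing about "add", only about the vtable. */
static void vt_run(vt_trans *tr) {
    assert(tr->vtable != NULL && tr->vtable->readdata != NULL);
    tr->vtable->readdata(tr);
}
```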

       Certain types of transformations can be optimized very efficiently,
       obviating the need for explicit "readdata" and "writebackdata" methods.
       Those transformations are called pdl_vaffine. Most dimension
       manipulating functions (e.g., "slice", "xchg") belong to this class.

       The basic trick is that parent and child of such a transformation work
       on the same (shared) block of data which they just choose to interpret
       differently (by using different "dims", "dimincs" and "offs" on the
       same data, compare the "pdl" structure above).  Each operation on a
       piddle sharing data with another one in this way therefore
       automatically flows from child to parent and back -- after all they are
       reading and writing the same block of memory. This is currently not
       Perl thread safe -- no big loss since the whole PDL core is not
       reentrant (Perl threading "!=" PDL threading!).
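
       A one-dimensional sketch of the idea: the child is nothing more than
       an offset and an increment into the parent's memory, so writing
       through the child changes the parent. The "va_" names are
       hypothetical and this is not the layout of the real "pdl_vaffine"
       structure. The usage mirrors the earlier Perl example, where
       assigning to the slice "2:5" altered the parent.

```c
#include <assert.h>
#include <stddef.h>

/* A view onto someone else's data: an offset and an increment. */
typedef struct {
    double *data;   /* shared with the parent, never copied */
    long    offs;   /* where element 0 of the view sits in data */
    long    inc;    /* step between consecutive view elements */
    long    n;      /* number of elements in the view */
} va_view;

/* View of parent elements first, first+step, ... (count of them). */
static va_view va_slice(double *parent, long first, long step, long count) {
    va_view v = { parent, first, step, count };
    return v;
}

/* Address of view element i inside the shared block. */
static double *va_at(const va_view *v, long i) {
    assert(i >= 0 && i < v->n);
    return v->data + v->offs + v->inc * i;
}

/* Assign a constant through the view. */
static void va_fill(const va_view *v, double value) {
    for (long i = 0; i < v->n; i++) *va_at(v, i) = value;
}
```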

   Signatures: threading over elementary operations
       Most of the functionality of PDL threading (automatic iteration of
       elementary operations over multi-dim piddles) is implemented in the
       file pdlthread.c.

       The PDL::PP generated functions (in particular the "readdata" and
       "writebackdata" callbacks) use this infrastructure to make sure that
       the fundamental operation implemented by the trans is performed in
       agreement with PDL's threading semantics.
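
       What "automatic iteration of elementary operations" amounts to can
       be sketched by hand for one case: an elementary operation that
       reduces a vector to a scalar, applied to a two-dimensional array.
       The code below is an illustration with hypothetical names and fixed
       default increments; it does not use or reproduce the machinery in
       pdlthread.c.

```c
#include <assert.h>
#include <stddef.h>

/* Elementary operation on one vector: sum of n values spaced inc apart. */
static double th_sum(const double *a, long n, long inc) {
    assert(n >= 0);
    double s = 0.0;
    for (long i = 0; i < n; i++) s += a[i * inc];
    return s;
}

/* Apply th_sum along dim 0 of a dims0 x dims1 array, once for every
 * position along dim 1. The outer loop is what threading supplies. */
static void th_sum_threaded(const double *a, long dims0, long dims1,
                            double *out) {
    long inc0 = 1, inc1 = dims0;          /* default increments */
    for (long j = 0; j < dims1; j++)      /* the implicit thread loop */
        out[j] = th_sum(a + j * inc1, dims0, inc0);
}
```

       The author of the elementary operation writes only the inner part;
       the loop over the extra dimension is added around it.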

   Defining new PDL functions -- Glue code generation
       Please see PDL::PP and examples in the PDL distribution.
       Implementation and syntax are currently far from perfect but it does a
       good job!

   The Core struct
       As discussed in PDL::API, PDL uses a pointer to a structure to allow
       PDL modules access to its core routines. The definition of this
       structure (the "Core" struct) is in pdlcore.h (created by pdlcore.h.PL
       in Basic/Core) and looks something like

        /* Structure to hold pointers to core PDL routines so as to be used
         * by many modules
         */
        struct Core {
           I32    Version;
           pdl*   (*SvPDLV)      ( SV*  );
           void   (*SetSV_PDL)   ( SV *sv, pdl *it );
        #if defined(PDL_clean_namespace) || defined(PDL_OLD_API)
           pdl*   (*new)      ( );     /* make it work with gimp-perl */
        #else
           pdl*   (*pdlnew)      ( );  /* renamed because of C++ clash */
        #endif
           pdl*   (*tmp)         ( );
           pdl*   (*create)      (int type);
           void   (*destroy)     (pdl *it);
           ...
        };
        typedef struct Core Core;

       The first field of the structure ("Version") is used to ensure
       consistency between modules at run time; the following code is placed
       in the BOOT section of the generated xs code:

        if (PDL->Version != PDL_CORE_VERSION)
          Perl_croak(aTHX_ "Foo needs to be recompiled against the newly installed PDL");
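
       The pattern of a versioned table of function pointers can be shown
       without Perl or XS. In this sketch the "cs_" names, the version
       number and the single "nelem" entry are all hypothetical; where the
       generated code croaks on a mismatch, the sketch simply returns NULL.

```c
#include <assert.h>
#include <stddef.h>

#define CS_CORE_VERSION 12   /* hypothetical; stands in for PDL_CORE_VERSION */

/* A cut-down table of core routines, shared through one pointer. */
typedef struct {
    int  Version;
    long (*nelem)(const long *dims, int ndims);   /* hypothetical entry */
} cs_core;

/* Number of elements implied by a list of dimension sizes. */
static long cs_nelem(const long *dims, int ndims) {
    assert(ndims >= 0);
    long n = 1;
    for (int i = 0; i < ndims; i++) n *= dims[i];
    return n;
}

/* What the core would export. */
static const cs_core cs_the_core = { CS_CORE_VERSION, cs_nelem };

/* What a module would do at boot: refuse a mismatched table. */
static const cs_core *cs_boot(const cs_core *core, int compiled_against) {
    return (core != NULL && core->Version == compiled_against) ? core : NULL;
}
```

       Because a module reaches every routine through the table, adding or
       reordering fields changes its layout, which is why the version must
       be bumped whenever the struct changes.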

       If you add a new field to the Core struct you should:

       ·    discuss it on the pdl porters email list
            (pdl-devel@lists.sourceforge.net) [with the possibility of making
            your changes to a separate branch of the CVS tree if it's a change
            that will take time to complete]

       ·    increase by 1 the value of the $pdl_core_version variable in
            pdlcore.h.PL. This sets the value of the "PDL_CORE_VERSION" C
            macro used to populate the Version field

       ·    add documentation (e.g. to PDL::API) if it's a "useful" function
            for external module writers (as well as ensuring the code is as
            well documented as the rest of PDL ;)

BUGS

       This description is far from perfect. If you need more details or
       something is still unclear please ask on the pdl-devel mailing list
       (pdl-devel@lists.sourceforge.net).

AUTHOR

       Copyright(C) 1997 Tuomas J. Lukka (lukka@fas.harvard.edu), 2000 Doug
       Burke (djburke@cpan.org), 2002 Christian Soeller & Doug Burke, 2013
       Chris Marshall.

       Redistribution in the same form is allowed but reprinting requires a
       permission from the author.

perl v5.32.0                      2020-09-17                      INTERNALS(1)