INDEXING(1)           User Contributed Perl Documentation          INDEXING(1)

NAME

6       PDL::Indexing - how to index piddles.
7

OVERVIEW

9       This manpage should serve as a first tutorial on the indexing and
10       threading features of PDL.
11
12       Like all vectorized languages, PDL automates looping over arrays using
13       a variant of mathematical vector notation.  The automatic looping is
14       called "threading", in part because ultimately PDL will implement
15       parallel processing to speed up the loops.
16
17       A lot of the flexibility and power of PDL relies on the indexing and
18       threading features of the perl extension.  Indexing allows access to
19       the data of a piddle in a very flexible way.  Threading provides
20       efficient vectorization of simple operations.
21
22       The values of a piddle are stored compactly as typed values in a single
23       block of memory, not (as in a normal Perl list-of-lists) as individual
24       Perl scalars.
25
26       In the sections that follow many "methods" are called out -- these are
27       Perl operators that apply to PDLs.  From the perldl shell, you can find
28       out more about each method by typing "?" followed by the method name.
29
30   Dimension lists
31       A piddle (PDL variable), in general, is an N-dimensional array where N
32       can be 0 (for a scalar), 1 (e.g. for a sound sample), or higher values
33       for images and more complex structures.  Each dimension of the piddle
34       has a positive integer size.  The "perl" interpreter treats each piddle
35       as a special type of Perl scalar (a blessed perl object, actually --
36       but you don't have to know that to use them) that can be used anywhere
37       you can put a normal scalar.
38
39       You can access the dimensions of a piddle as a Perl list and otherwise
40       determine the size of a piddle with several methods.  The important
41       ones are:
42
43       nelem - the total number of elements in a PDL
44       ndims - returns the number of dimensions in a PDL
45       dims - returns the dimension list of a PDL as a Perl list
46       dim - returns the size of a particular dimension of a PDL
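
       A minimal sketch of these four methods, assuming a working PDL
       installation. The 3 x 4 x 2 shape and the variable names are chosen
       only for illustration:

```perl
use strict;
use warnings;
use PDL;

# A 3 x 4 x 2 piddle holding 0..23 (the shape is an arbitrary example).
my $pdl = sequence(3, 4, 2);

my $nelem = $pdl->nelem;     # total number of elements: 3*4*2
my $ndims = $pdl->ndims;     # number of dimensions
my @dims  = $pdl->dims;      # dimension list as a Perl list
my $dim1  = $pdl->dim(1);    # size of dimension 1 only

print "nelem=$nelem ndims=$ndims dims=(@dims) dim(1)=$dim1\n";
# prints: nelem=24 ndims=3 dims=(3 4 2) dim(1)=4
```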
47
48   Indexing and Dataflow
49       PDL maintains a notion of "dataflow" between a piddle and indexed
50       subfields of that piddle.  When you produce an indexed subfield or
51       single element of a parent piddle, the child and parent remain attached
52       until you manually disconnect them.  This lets you represent the same
53       data different ways within your code -- for example, you can consider
54       an RGB image simultaneously as a collection of (R,G,B) values in a 3 x
55       1000 x 1000 image, and as three separate 1000 x 1000 color planes
56       stored in different variables.  Modifying any of the variables changes
57       the underlying memory, and the changes are reflected in all
58       representations of the data.
59
60       There are two important methods that let you control dataflow
61       connections between a child and parent PDL:
62
63       copy - forces an explicit copy of a PDL
64       sever - breaks the dataflow connection between a PDL and its parents
65       (if any)
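
       The following sketch contrasts an attached child with a copied and a
       severed one. The variable names are illustrative; only "slice",
       "copy", "sever" and the in-place operators described in this manpage
       are used:

```perl
use strict;
use warnings;
use PDL;

my $parent = sequence(6);                     # [0 1 2 3 4 5]

my $child = $parent->slice('1:3');            # stays attached to $parent
my $copy  = $parent->slice('1:3')->copy;      # independent physical copy
my $cut   = $parent->slice('1:3');
$cut->sever;                                  # break the link to $parent

$child += 10;    # flows back: parent elements 1..3 become 11, 12, 13
$copy  .= -1;    # changes only the copy
$cut   .= -2;    # changes only the severed piddle

print join(' ', $parent->list), "\n";         # prints: 0 11 12 13 4 5
```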
66
67   Threading and Dimension Order
68       Most PDL operations act on the first few dimensions of their piddle
69       arguments.  For example, "sumover" sums all elements along the first
70       dimension in the list (dimension 0).  If you feed in a three-
71       dimensional piddle, then the first dimension is considered the "active"
72       dimension and the later dimensions are "thread" dimensions because they
73       are simply looped over.  There are several ways to transpose or re-
74       order the dimension list of a PDL.  Those techniques are very fast
75       since they don't touch the underlying data, only change the way that
76       PDL accesses the data.  The main dimension ordering functions are:
77
78       mv - moves a particular dimension somewhere else in the dimension list
79       xchg - exchanges two dimensions in the dimension list, leaving the rest
80       alone
81       reorder - allows wholesale mixing of the dimensions
82       clump - clumps together two or more small dimensions into one larger
83       one
84       squeeze - eliminates any dimensions of size 1
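
       As a rough illustration of how these functions rearrange the
       dimension list, the sketch below prints the resulting dimensions for
       a small piddle, and then uses "xchg" to change which dimension
       "sumover" treats as the active one. The shapes are arbitrary
       examples:

```perl
use strict;
use warnings;
use PDL;

my $x = zeroes(2, 3, 4);

print "mv(0,2):        ", join(',', $x->mv(0, 2)->dims), "\n";         # 3,4,2
print "xchg(0,2):      ", join(',', $x->xchg(0, 2)->dims), "\n";       # 4,3,2
print "reorder(2,0,1): ", join(',', $x->reorder(2, 0, 1)->dims), "\n"; # 4,2,3
print "clump(2):       ", join(',', $x->clump(2)->dims), "\n";         # 6,4
print "squeeze:        ", join(',', zeroes(2, 1, 4)->squeeze->dims), "\n"; # 2,4

# sumover acts on dimension 0; the remaining dimension is looped over.
my $m        = sequence(3, 2);                 # rows [0 1 2] and [3 4 5]
my @row_sums = $m->sumover->list;              # 3, 12
my @col_sums = $m->xchg(0, 1)->sumover->list;  # 3, 5, 7

print "row sums: @row_sums\ncolumn sums: @col_sums\n";
```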
85
86   Physical and Dummy Dimensions
87       ·    document perl level threading
88
89       ·    threadids
90
91       ·    update and correct description of slice
92
93       ·    new functions in slice.pd (affine, lag, splitdim)
94
95       ·    reworking of paragraph on explicit threading
96

Indexing and threading with PDL

98       A lot of the flexibility and power of PDL relies on the indexing and
99       looping features of the perl extension. Indexing allows access to the
100       data of a pdl object in a very flexible way. Threading provides
101       efficient implicit looping functionality (since the loops are
102       implemented as optimized C code).
103
104       Pdl objects (later often called "pdls") are perl objects that represent
105       multidimensional arrays and operations on those. In contrast to simple
106       perl @x style lists the array data is compactly stored in a single
107       block of memory thus taking up a lot less memory and enabling use of
108       fast C code to implement operations (e.g. addition, etc) on pdls.
109
110   pdls can have children
       Central to many of the indexing capabilities of PDL is the relation of
       "parent" and "child" between pdls. Many of the indexing commands create
       a new pdl from an existing pdl. The new pdl is the "child" and the old
       one is the "parent". The data of the new pdl is defined by a
       transformation that specifies how to generate (compute) its data from
       the parent's data. The relation between the child pdl and its parent
       is often bidirectional, meaning that changes in the child's data are
       propagated back to the parent. (Note: our terminology is already
       aimed at the new dataflow features. The kind of dataflow that is used
       by the indexing commands (about which you will learn in a minute) is
       always in operation, not only when you have explicitly switched on
       dataflow in your pdl by saying "$a->doflow". For further information
       about dataflow, check the dataflow manpage.)
124
       Another way to interpret the pdls created by our indexing commands is
       to view them as a kind of intelligent pointer that points back to some
       portion or all of its parent's data. Therefore, it is not surprising
       that the parent's data (or a portion of it) changes when manipulated
       through this "pointer". With these introductory remarks out of the
       way, we will dive right in and describe the indexing commands, with
       some typical examples of how they might be used in PDL programs. We
       will further illustrate the pointer/dataflow analogies in the context
       of some of the examples later on.
135
136       There are two different implementations of this ``smart pointer''
137       relationship: the first one, which is a little slower but works for any
138       transformation is simply to do the transformation forwards and
139       backwards as necessary. The other is to consider the child piddle a
140       ``virtual'' piddle, which only stores a pointer to the parent and
141       access information so that routines which use the child piddle actually
142       directly access the data in the parent.  If the virtual piddle is given
143       to a routine which cannot use it, PDL transparently physicalizes the
144       virtual piddle before letting the routine use it.
145
146       Currently (1.94_01) all transformations which are ``affine'', i.e. the
147       indices of the data item in the parent piddle are determined by a
148       linear transformation (+ constant) from the indices of the child piddle
149       result in virtual piddles. All other indexing routines (e.g.
150       "->index(...)") result in physical piddles.  All routines compiled by
151       PP can accept affine piddles (except those routines that pass pointers
152       to external library functions).
153
154       Note that whether something is affine or not does not affect the
155       semantics of what you do in any way: both
156
157        $a->index(...) .= 5;
158        $a->slice(...) .= 5;
159
160       change the data in $a. The affinity does, however, have a significant
161       impact on memory usage and performance.
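
       A small sketch of that equivalence, using the temporary-variable
       idiom that appears later in this manpage so that it does not depend
       on lvalue subroutines. The index values and variable names are
       illustrative only:

```perl
use strict;
use warnings;
use PDL;

my $v = sequence(5);                         # [0 1 2 3 4]

# index() picks arbitrary elements (not affine); writes still reach $v.
(my $picked = $v->index(pdl(0, 4))) .= 5;

# slice() picks a regular range (affine); writes reach $v in the same way.
(my $range = $v->slice('1:2')) .= 7;

print join(' ', $v->list), "\n";             # prints: 5 7 7 3 5
```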
162
163   Slicing pdls
       Probably the most important application of the concept of parent/child
       pdls is the representation of rectangular slices of a physical pdl by a
       virtual pdl. Having talked long enough about concepts, let's get more
       specific. Suppose we are working with a 2D pdl representing a 5x5 image
       (it's unusually small so that we can print it without filling several
       screens full of digits ;).
170
171        perldl> $im = sequence(5,5)
172        perldl> p $im
173
174        [
175         [ 0  1  2  3  4]
176         [ 5  6  7  8  9]
177         [10 11 12 13 14]
178         [15 16 17 18 19]
179         [20 21 22 23 24]
180        ]
181
182        perldl> help vars
183        PDL variables in package main::
184
185        Name         Type   Dimension       Flow  State          Mem
186        ----------------------------------------------------------------
187        $im          Double D [5,5]                P            0.20Kb
188
189       [ here it might be appropriate to quickly talk about the "help vars"
190       command that provides information about pdls in the interactive
191       "perldl" shell that comes with pdl.  ]
192
193       Now suppose we want to create a 1-D pdl that just references one line
194       of the image, say line 2; or a pdl that represents all even lines of
195       the image (imagine we have to deal with even and odd frames of an
196       interlaced image due to some peculiar behaviour of our frame grabber).
197       As another frequent application of slices we might want to create a pdl
198       that represents a rectangular region of the image with top and bottom
199       reversed. All these effects (and many more) can be easily achieved with
200       the powerful slice function:
201
202        perldl> $line = $im->slice(':,(2)')
203        perldl> $even = $im->slice(':,1:-1:2')
204        perldl> $area = $im->slice('3:4,3:1')
205        perldl> help vars  # or just PDL->vars
206        PDL variables in package main::
207
208        Name         Type   Dimension       Flow  State          Mem
209        ----------------------------------------------------------------
210        $even        Double D [5,2]                -C           0.00Kb
211        $im          Double D [5,5]                P            0.20Kb
212        $line        Double D [5]                  -C           0.00Kb
213        $area        Double D [2,3]                -C           0.00Kb
214
215       All three "child" pdls are children of $im or in the other (largely
216       equivalent) interpretation pointers to data of $im.  Operations on
217       those virtual pdls access only those portions of the data as specified
218       by the argument to slice. So we can just print line 2:
219
220        perldl> p $line
221        [10 11 12 13 14]
222
223       Also note the difference in the "Flow State" of $area above and below:
224
225        perldl> p $area
226        perldl> help $area
227        This variable is Double D [2,3]                VC           0.00Kb
228
       The following demonstrates that $im and $line really behave as you
       would expect from a pointer-like object (or in the dataflow picture:
       the changes in $line's data are propagated back to $im):
232
233        perldl> $im++
234        perldl> p $line
235        [11 12 13 14 15]
236        perldl> $line += 2
237        perldl> p $im
238
239        [
240         [ 1  2  3  4  5]
241         [ 6  7  8  9 10]
242         [13 14 15 16 17]
243         [16 17 18 19 20]
244         [21 22 23 24 25]
245        ]
246
       Note how assignment operations on the child virtual pdls change the
       parent physical pdl and vice versa (however, the basic "=" assignment
       doesn't; use ".=" to obtain that effect. See below for the reasons).
       The virtual child pdls are something like "live links" to the
       "original" parent pdl. As previously said, they can be thought of as
       working similarly to a C pointer. But in contrast to a C pointer they
       carry a lot more information. Firstly, they specify the structure of
       the data they represent (the dimensionality of the new pdl) and
       secondly, they specify how to create this structure from the parent's
       data. The way this works is buried in the internals of PDL and is not
       important for you to know, unless you want to hack the core in the
       future or would like to become a PDL guru in general (for a
       definition of this strange creature see PDL::Internals).
260
261       The previous examples have demonstrated typical usage of the slice
262       function. Since the slicing functionality is so important here is an
263       explanation of the syntax for the string argument to slice:
264
265        $vpdl = $a->slice('ind0,ind1...')
266
267       where "ind0" specifies what to do with index No 0 of the pdl $a, etc.
268       Each element of the comma separated list can have one of the following
269       forms:
270
271       ':'   Use the whole dimension
272
273       'n'   Use only index "n". The dimension of this index in the resulting
274             virtual pdl is 1. An example involving those first two index
275             formats:
276
277              perldl> $column = $im->slice('2,:')
278              perldl> $row = $im->slice(':,0')
279              perldl> p $column
280
281              [
282               [ 3]
283               [ 8]
284               [15]
285               [18]
286               [23]
287              ]
288
289              perldl> p $row
290
291              [
292               [1 2 3 4 5]
293              ]
294
295              perldl> help $column
296              This variable is Double D [1,5]                VC           0.00Kb
297
298              perldl> help $row
299              This variable is Double D [5,1]                VC           0.00Kb
300
301       '(n)' Use only index "n". This dimension is removed from the resulting
302             pdl (relying on the fact that a dimension of size 1 can always be
303             removed). The distinction between this case and the previous one
304             becomes important in assignments where left and right hand side
305             have to have appropriate dimensions.
306
307              perldl> $line = $im->slice(':,(0)')
308              perldl> help $line
309              This variable is Double D [5]                  -C           0.00Kb
310
311              perldl> p $line
312              [1 2 3 4 5]
313
314             Spot the difference to the previous example?
315
316       'n1:n2' or 'n1:n2:n3'
317             Take the range of indices from "n1" to "n2" or (second form) take
318             the range of indices from "n1" to "n2" with step "n3". An example
319             for the use of this format is the previous definition of the
320             subimage composed of even lines.
321
322              perldl> $even = $im->slice(':,1:-1:2')
323
             This example also demonstrates that negative indices work like
             they do for normal perl style arrays by counting backwards from
             the end of the dimension (in the example, -1 is equivalent to
             index 4). If "n2" is smaller than "n1", the elements in the
             virtual pdl are effectively reversed with respect to its parent.
329
330       '*[n]'
331             Add a dummy dimension. The size of this dimension will be 1 by
332             default or equal to "n" if the optional numerical argument is
333             given.
334
             Now, this is a bit strange at first sight. What is a dummy
             dimension? A dummy dimension inserts a dimension where there
             wasn't one before. How is that done? Well, in the case of the
             new dimension having size 1 it can be easily explained by the
             way in which you can identify a vector (with "m" elements) with
             a "(1,m)" or "(m,1)" matrix. The same obviously holds for
             higher dimensional objects. More interesting is the case of a
             dummy dimension of size greater than one (e.g.
             "slice('*5,:')"). This works in the same way as a dummy
             dimension created by a call to the dummy function, so read on
             and check its explanation below.
345
346       '([n1:n2[:n3]]=i)'
347             [Not yet implemented ??????]  With an argument like this you make
348             generalised diagonals. The diagonal will be dimension no. "i" of
349             the new output pdl and (if optional part in brackets specified)
350             will extend along the range of indices specified of the
351             respective parent pdl's dimension. In general an argument like
352             this only makes sense if there are other arguments like this in
353             the same call to slice. The part in brackets is optional for this
354             type of argument. All arguments of this type that specify the
355             same target dimension "i" have to relate to the same number of
356             indices in their parent dimension. The best way to explain it is
357             probably to give an example, here we make a pdl that refers to
358             the elements along the space diagonal of its parent pdl (a cube):
359
360              $cube = zeroes(5,5,5);
361              $sdiag = $cube->slice('(=0),(=0),(=0)');
362
363             The above command creates a virtual pdl that represents the
364             diagonal along the parents' dimension no. 0, 1 and 2 and makes
365             its dimension 0 (the only dimension) of it. You use the extended
366             syntax if the dimension sizes of the parent dimensions you want
367             to build the diagonal from have different sizes or you want to
368             reverse the sequence of elements in the diagonal, e.g.
369
370              $rect = zeroes(12,3,5,6,2);
371              $vpdl = $rect->slice('2:7,(0:1=1),(4),(5:4=1),(=1)');
372
373             So the elements of $vpdl will then be related to those of its
374             parent in way we can express as:
375
               vpdl(i,j) = rect(i+2,j,4,5-j,j)       0<=i<6, 0<=j<2
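
       The implemented forms described above (':', 'n', '(n)', ranges and
       '*n') can be compared side by side. This is only a sketch: it uses a
       fresh 5x5 "sequence" image rather than the modified $im from the
       session above, and the %dims_for hash is a hypothetical helper for
       collecting the results:

```perl
use strict;
use warnings;
use PDL;

my $im = sequence(5, 5);
my %dims_for;    # slice string => resulting dimension list

for my $spec (':,2', ':,(2)', ':,1:-1:2', '3:4,3:1', '*3,:,(0)') {
    my $view = $im->slice($spec);
    $dims_for{$spec} = join(',', $view->dims);
    printf "%-10s -> dims (%s)\n", $spec, $dims_for{$spec};
}
# ':,2'       keeps a dimension of size 1:       (5,1)
# ':,(2)'     removes that dimension:            (5)
# ':,1:-1:2'  takes lines 1 and 3:               (5,2)
# '3:4,3:1'   takes lines 3, 2, 1 in that order: (2,3)
# '*3,:,(0)'  inserts a dummy dim of size 3:     (3,5)

# A reversed range reads the parent's lines back to front.
print join(' ', $im->slice('3:4,3:1')->list), "\n";   # prints: 18 19 13 14 8 9
```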
377
378       [ work in the new index function: "$b = $a->index($c);" ???? ]
379
380   There are different kinds of assignments in PDL
381       The previous examples have already shown that virtual pdls can be used
382       to operate on or access portions of data of a parent pdl. They can also
383       be used as lvalues in assignments (as the use of "++" in some of the
384       examples above has already demonstrated). For explicit assignments to
385       the data represented by a virtual pdl you have to use the overloaded
386       ".=" operator (which in this context we call propagated assignment).
387       Why can't you use the normal assignment operator "="?
388
       Well, you definitely still can use the '=' operator but it wouldn't do
       what you want. This is due to the fact that the '=' operator cannot be
       overloaded in the same way as other assignment operators. If we tried
       to use '=' to assign data to a portion of a physical pdl through a
       virtual pdl we wouldn't achieve the desired effect. Instead, the
       variable representing the virtual pdl (a reference to a blessed
       thingy) would after the assignment just contain the reference to
       another blessed thingy, which would behave to future assignments as a
       "physical" copy of the original rvalue [this is actually not yet
       clear and subject of discussions in the PDL developers mailing list].
       In that sense it would break the connection of the pdl to the parent
       [ isn't this behaviour in a sense the opposite of what happens in
       dataflow, where ".=" breaks the connection to the parent? ].
402
403       E.g.
404
405        perldl> $line = $im->slice(':,(2)')
406        perldl> $line = zeroes(5);
407        perldl> $line++;
408        perldl> p $im
409
410        [
411         [ 1  2  3  4  5]
412         [ 6  7  8  9 10]
413         [13 14 15 16 17]
414         [16 17 18 19 20]
415         [21 22 23 24 25]
416        ]
417
418        perldl> p $line
419        [1 1 1 1 1]
420
421       But using ".="
422
423        perldl> $line = $im->slice(':,(2)')
424        perldl> $line .= zeroes(5)
425        perldl> $line++
426        perldl> p $im
427
428        [
429         [ 1  2  3  4  5]
430         [ 6  7  8  9 10]
431         [ 1  1  1  1  1]
432         [16 17 18 19 20]
433         [21 22 23 24 25]
434        ]
435
436        perldl> print $line
437        [1 1 1 1 1]
438
439       Also, you can substitute
440
441        perldl> $line .= 0;
442
443       for the assignment above (the zero is converted to a scalar piddle,
444       with no dimensions so it can be assigned to any piddle).
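
       The same contrast as a self-contained script, as a sketch: it starts
       from a fresh 5x5 "sequence" image, so the numbers differ from the
       interactive session above, and it checks the sum of line 2 after
       each kind of assignment:

```perl
use strict;
use warnings;
use PDL;

my $im = sequence(5, 5);            # line 2 holds 10 11 12 13 14

# Plain '=' only rebinds the Perl variable; $im is not touched.
my $line = $im->slice(':,(2)');
$line = zeroes(5);
my $sum_after_plain = $im->slice(':,(2)')->sum;        # still 60

# '.=' writes through the virtual pdl into the parent.
$line = $im->slice(':,(2)');
$line .= 0;
my $sum_after_propagated = $im->slice(':,(2)')->sum;   # now 0

print "after '=': $sum_after_plain, after '.=': $sum_after_propagated\n";
```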
445
446       Related to the assignment feature is a little trap for the unwary:
447       since perl currently does not allow subroutines to return lvalues the
448       following shortcut of the above is flagged as a compile time error:
449
450        perldl> $im->slice(':,(2)') .= zeroes(5)->xvals->float
451
452       instead you have to say something like
453
454        perldl> ($pdl = $im->slice(':,(2)')) .= zeroes(5)->xvals->float
455
456       We hope that future versions of perl will allow the simpler syntax
457       (i.e. allow subroutines to return lvalues).  [Note: perl v5.6.0 does
458       allow this, but it is an experimental feature. However, early reports
459       suggest it works in simple situations]
460
461       Note that there can be a problem with assignments like this when lvalue
462       and rvalue pdls refer to overlapping portions of data in the parent
463       pdl:
464
        # reverse the elements of line 1 of $a
        ($tmp = $a->slice(':,(1)')) .= $a->slice('-1:0,(1)');
467
       Currently, the parent data on the right side of the assignment is not
       copied before the (internal) assignment loop proceeds. Therefore, the
       outcome of this assignment will depend on the sequence in which
       elements are assigned and will almost certainly not be what you
       wanted. The semantics are currently undefined and liable to change
       at any time. To obtain the desired behaviour, use
474
475        ($tmp = $a->slice(':,(1)')) .= $a->slice('-1:0,(1)')->copy;
476
477       which makes a physical copy of the slice or
478
479        ($tmp = $a->slice(':,(1)')) .= $a->slice('-1:0,(1)')->sever;
480
481       which returns the same slice but severs the connection of the slice to
482       its parent.
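
       A runnable sketch of the "copy" variant on a small 3x2 piddle (the
       size and the name $x are chosen for illustration). Because the right
       hand side is copied first, the result does not depend on the order
       in which elements are assigned:

```perl
use strict;
use warnings;
use PDL;

my $x = sequence(3, 2);             # lines [0 1 2] and [3 4 5]

# Reverse line 1 in place; copy() detaches the source from $x first.
(my $tmp = $x->slice(':,(1)')) .= $x->slice('-1:0,(1)')->copy;

print join(' ', $x->list), "\n";    # prints: 0 1 2 5 4 3
```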
483
484   Other functions that manipulate dimensions
       Having talked extensively about the slice function it should be noted
       that this is not the only PDL indexing function. There are additional
       indexing functions which are also useful (especially in the context of
       threading which we will talk about later). Here is a list of them,
       with some examples of how to use them.
490
491       "dummy"
492           inserts a dummy dimension of the size you specify (default 1) at
493           the chosen location. You can't wait to hear how that is achieved?
494           Well, all elements with index "(X,x,Y)" ("0<=x<size_of_dummy_dim")
495           just map to the element with index "(X,Y)" of the parent pdl (where
496           "X" and "Y" refer to the group of indices before and after the
497           location where the dummy dimension was inserted.)
498
499           This example calculates the x coordinate of the centroid of an
500           image (later we will learn that we didn't actually need the dummy
501           dimension thanks to the magic of implicit threading; but using
502           dummy dimensions the code would also work in a threadless world;
503           though once you have worked with PDL threads you wouldn't want to
504           live without them again).
505
506            # centroid
507            ($xd,$yd) = $im->dims;
508            $xc = sum($im*xvals(zeroes($xd))->dummy(1,$yd))/sum($im);
509
510           Let's explain how that works in a little more detail. First, the
511           product:
512
            $xvs = xvals(zeroes($xd));
            print $xvs->dummy(1,$yd);       # repeat the line $yd times
            $prod = $im*$xvs->dummy(1,$yd); # form the pixelwise product with
                                            # the repeated line of x-values
517
518           The rest is then summing the results of the pixelwise product
519           together and normalising with the sum of all pixel values in the
520           original image thereby calculating the x-coordinate of the "center
521           of mass" of the image (interpreting pixel values as local mass)
522           which is known as the centroid of an image.
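
           The centroid calculation can be tried on a tiny test image. This
           is a sketch under the assumption that a 5x5 "sequence" image is
           an acceptable stand-in for real data; it also computes the same
           value without the dummy dimension, relying on implicit
           threading:

```perl
use strict;
use warnings;
use PDL;

my $im = sequence(5, 5);                   # stand-in for a real image
my ($xd, $yd) = $im->dims;

my $xvs = xvals(zeroes($xd));              # [0 1 2 3 4]

# Explicit version: repeat the x-values along a dummy dimension 1.
my $xc_dummy = sum($im * $xvs->dummy(1, $yd)) / sum($im);

# Implicit version: threading repeats $xvs over dimension 1 for us.
my $xc_thread = sum($im * $xvs) / sum($im);

print "x centroid: $xc_dummy (dummy), $xc_thread (implicit threading)\n";
```

           For this test image the pixel values sum to 300 and the weighted
           sum is 650, so both results should be 650/300.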
523
524           Next is a (from the point of view of memory consumption) very cheap
525           conversion from greyscale to RGB, i.e. every pixel holds now a
526           triple of values instead of a scalar. The three values in the
527           triple are, fortunately, all the same for a grey image, so that our
528           trick works well in that it maps all the three members of the
529           triple to the same source element:
530
531            # a cheap greyscale to RGB conversion
532            $rgb = $grey->dummy(0,3)
533
534           Unfortunately this trick cannot be used to convert your old B/W
535           photos to color ones in the way you'd like. :(
536
537           Note that the memory usage of piddles with dummy dimensions is
538           especially sensitive to the internal representation. If the piddle
539           can be represented as a virtual affine (``vaffine'') piddle, only
540           the control structures are stored. But if $b in
541
542            $a = zeroes(10000);
543            $b = $a->dummy(1,10000);
544
           is made physical by some routine, you will find that the memory
           usage of your program has suddenly grown by about 800MB (10000 x
           10000 double-precision values at 8 bytes each).
547
548       "diagonal"
549           replaces two dimensions (which have to be of equal size) by one
550           dimension that references all the elements along the "diagonal"
551           along those two dimensions. Here, we have two examples which should
552           appear familiar to anyone who has ever done some linear algebra.
553           Firstly, make a unity matrix:
554
555            # unity matrix
556            $e = zeroes(float, 3, 3); # make everything zero
557            ($tmp = $e->diagonal(0,1)) .= 1; # set the elements along the diagonal to 1
558            print $e;
559
560           Or the other diagonal:
561
            ($tmp = $e->slice(':,-1:0')->diagonal(0,1)) .= 2;
563            print $e;
564
           (Did you notice how we used the slice function to reverse the
           sequence of lines before setting the diagonal of the new child,
           thereby setting the cross diagonal of the parent?)  Or a mapping
           from the space of square matrices to the field over which the
           matrices are defined, the trace of a matrix:
570
571            # trace of a matrix
572            $trace = sum($mat->diagonal(0,1));  # sum all the diagonal elements
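
           Put together as one script, the three "diagonal" examples might
           look like the sketch below. It assumes the cross diagonal is
           reached by reversing dimension 1 with slice(':,-1:0'); the
           variable names are illustrative:

```perl
use strict;
use warnings;
use PDL;

my $e = zeroes(3, 3);

# Main diagonal: elements (0,0), (1,1), (2,2) of $e.
(my $main = $e->diagonal(0, 1)) .= 1;

# Cross diagonal: reverse the lines first, then take the diagonal of the
# reversed child. This reaches elements (0,2), (1,1), (2,0) of $e.
(my $cross = $e->slice(':,-1:0')->diagonal(0, 1)) .= 2;

print $e;

# Trace: sum of the main diagonal (1 + 2 + 1 after the two assignments).
my $trace = sum($e->diagonal(0, 1));
print "trace = $trace\n";                  # prints: trace = 4
```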
573
574       "xchg" and "mv"
575           xchg exchanges or "transposes" the two  specified dimensions.  A
576           straightforward example:
577
578            # transpose a matrix (without explicitly reshuffling data and
579            # making a copy)
580            $prod = $a x $a->xchg(0,1);
581
582           $prod should now be pretty close to the unity matrix if $a is an
583           orthogonal matrix. Often "xchg" will be used in the context of
584           threading but more about that later.
585
586           mv works in a similar fashion. It moves a dimension (specified by
587           its number in the parent) to a new position in the new child pdl:
588
589            $b = $a->mv(4,0);  # make the 5th dimension of $a the first in the
590                               # new child $b
591
           The difference between "xchg" and "mv" is that "xchg" only swaps
           the positions of the two specified dimensions, whereas "mv"
           moves the first dimension to the position given by the second
           argument, shifting the dimensions in between accordingly.
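
           The difference shows up directly in the dimension lists. The
           sketch below uses an arbitrary five-dimensional shape, and a 2x2
           rotation matrix as an assumed example of an orthogonal matrix
           for the transpose product:

```perl
use strict;
use warnings;
use PDL;

my $x = zeroes(2, 3, 4, 5, 6);

my @moved   = $x->mv(4, 0)->dims;      # 6,2,3,4,5: the others shift along
my @swapped = $x->xchg(4, 0)->dims;    # 6,3,4,5,2: only two dims trade places

print "mv(4,0):   @moved\n";
print "xchg(4,0): @swapped\n";

# Transpose via xchg: a rotation by 90 degrees times its transpose
# gives the unity matrix.
my $rot  = pdl([[0, -1], [1, 0]]);
my $prod = $rot x $rot->xchg(0, 1);
print $prod;
```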
596
597       "clump"
598           collapses several dimensions into one. Its only argument specifies
599           how many dimensions of the source pdl should be collapsed (starting
600           from the first). An (admittedly unrealistic) example is a 3D pdl
601           which holds data from a stack of image files that you have just
602           read in. However, the data from each image really represents a 1D
603           time series and has only been arranged that way because it was
604           digitized with a frame grabber. So to have it again as an array of
605           time sequences you say
606
607            perldl> $seqs = $stack->clump(2)
608            perldl> help vars
609            PDL variables in package main::
610
611            Name         Type   Dimension       Flow  State          Mem
612            ----------------------------------------------------------------
613            $seqs        Double D [8000,50]            -C           0.00Kb
614            $stack       Double D [100,80,50]          P            3.05Mb
615
616           Unrealistic as it may seem, our confocal microscope software writes
617           data (sometimes) this way. But more often you use clump to achieve
618           a certain effect when using implicit or explicit threading.
619
620   Calls to indexing functions can be chained
621       As you might have noticed in some of the examples above calls to the
622       indexing functions can be nicely chained since all of these functions
623       return a newly created child object. However, when doing extensive
624       index manipulations in a chain be sure to keep track of what you are
625       doing, e.g.
626
627        $a->xchg(0,1)->mv(0,4)
628
629       moves the dimension 1 of $a to position 4 since when the second command
630       is executed the original dimension 1 has been moved to position 0 of
631       the new child that calls the "mv" function. I think you get the idea
632       (in spite of my convoluted explanations).
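
       Printing the dimension list after each step is a simple way to keep
       track. In this sketch every dimension has a different (arbitrary)
       size, so the original dimension 1, of size 3, is easy to follow:

```perl
use strict;
use warnings;
use PDL;

my $x = zeroes(2, 3, 4, 5, 6);         # dimension 1 has size 3

my @after_xchg  = $x->xchg(0, 1)->dims;             # 3,2,4,5,6
my @after_chain = $x->xchg(0, 1)->mv(0, 4)->dims;   # 2,4,5,6,3

print "after xchg(0,1):          @after_xchg\n";
print "after xchg(0,1)->mv(0,4): @after_chain\n";
```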
633
634   Propagated assignments ('.=') and dummy dimensions
       A subtlety related to indexing is the assignment to pdls containing
       dummy dimensions of size greater than 1. These assignments (using ".=")
       are forbidden since several elements of the lvalue pdl point to the
       same element of the parent. As a consequence the values of those parent
       elements are potentially ambiguous and would depend on the sequence in
       which the implementation makes the assignments to elements. Therefore,
       an assignment like this:
642
643        $a = pdl [1,2,3];
644        $b = $a->dummy(1,4);
645        $b .= yvals(zeroes(3,4));
646
647       can produce unexpected results and the results are explicitly undefined
648       by PDL because when PDL gets parallel computing features, the current
649       result may well change.
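
       If what you want is an independent (3,4) array that starts out as
       four copies of the parent, one option is to copy the dummied pdl
       before assigning to it. This is a sketch under that assumption about
       the intent; the names $p, $view and $copy are made up here.

```perl
use strict;
use warnings;
use PDL;

my $p    = pdl(1, 2, 3);
my $view = $p->dummy(1, 4);         # virtual (3,4) pdl: every row is $p itself
my $copy = $p->dummy(1, 4)->copy;   # independent (3,4) pdl with its own data

# Well defined: every element of $copy is written exactly once.
$copy .= yvals(zeroes(3, 4));

print join(',', $view->dims), "\n";   # 3,4
print $copy->at(1, 3), "\n";          # 3
print "$p\n";                         # [1 2 3]  (the parent is untouched)
```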
650
651       From the point of view of dataflow the introduction of greater-size-
652       than-one dummy dimensions is regarded as an irreversible transformation
653       (similar to the terminology in thermodynamics) which precludes backward
654       propagation of assignment to a parent (which you had explicitly
655       requested using the ".=" assignment). A similar problem to watch out
656       for occurs in the context of threading where sometimes dummy dimensions
657       are created implicitly during the thread loop (see below).
658
659   Reasons for the parent/child (or "pointer") concept
660       [ this will have to wait a bit ]
661
662        XXXXX being memory efficient
663        XXXXX in the context of threading
664        XXXXX very flexible and powerful way of accessing portions of pdl data
665              (in much more general way than sec, etc allow)
666        XXXXX efficient implementation
667        XXXXX difference to section/at, etc.
668
669   How to make things physical again
670       [ XXXXX fill in later when everything has settled a bit more ]
671
672        ** When needed (xsub routine interfacing C lib function)
673        ** How achieved (->physical)
674        ** How to test (isphysical (explain how it works currently))
675        ** ->copy and ->sever
676

Threading

       In the previous paragraph on indexing we have already mentioned the
       term occasionally but now it's really time to talk explicitly about
       "threading" with pdls. The term threading has many different meanings
       in different fields of computing. Within the framework of PDL it could
       probably be loosely defined as an implicit looping facility. It is
       implicit because you don't specify anything like enclosing for-loops
       but rather the loops are automatically (or 'magically') generated by
       PDL based on the dimensions of the pdls involved. This should give you
       a first idea why the index/dimension manipulating functions you have
       met in the previous paragraphs are especially important and useful in
       the context of threading.  The other ingredient for threading (apart
       from the pdls involved) is a function that is threading aware
       (generally, these are PDL::PP compiled functions) and that the pdls are
       "threaded" over.  So much for the terminology; now let's try to
       shed some light on what it all means.
693
694   Implicit threading - a first example
695       There are two slightly different variants of threading. We start with
696       what we call "implicit threading". Let's pick a practical example that
697       involves looping of a function over many elements of a pdl. Suppose we
698       have an RGB image that we want to convert to greyscale. The RGB image
699       is represented by a 3-dim pdl "im(3,x,y)" where the first dimension
700       contains the three color components of each pixel and "x" and "y" are
701       width and height of the image, respectively. Next we need to specify
702       how to convert a color-triple at a given pixel into a greyvalue (to be
703       a realistic example it should represent the relative intensity with
704       which our color insensitive eye cells would detect that color to
705       achieve what we would call a natural conversion from color to
706       greyscale). An approximation that works quite well is to compute the
707       grey intensity from each RGB triplet (r,g,b) as a weighted sum
708
709        greyvalue = 77/256*r + 150/256*g + 29/256*b =
710            inner([77,150,29]/256, [r,g,b])
711
712       where the last form indicates that we can write this as an inner
713       product of the 3-vector comprising the weights for red, green and blue
714       components with the 3-vector containing the color components.
715       Traditionally, we might have written a function like the following to
716       process the whole image:
717
        my @dims = $im->dims;
        # here normally check that first dim has correct size (3), etc
        $grey = zeroes(@dims[1,2]);    # make the pdl for the resulting grey image
        $w = pdl([77,150,29]) / 256;   # the vector of weights
        for ($j=0; $j<$dims[2]; $j++) {
           for ($i=0; $i<$dims[1]; $i++) {
               # compute the pixel value; the double-quoted string
               # interpolates $i and $j into the slice specification
               $tmp = inner($w,$im->slice(":,($i),($j)"));
               set($grey,$i,$j,$tmp); # and set it in the greyscale image
           }
        }
729
       Now we write the same using threading (noting that "inner" is a
       threading aware function defined in the PDL::Primitive package)

        $grey = inner($im,pdl([77,150,29])/256);
734
735       We have ended up with a one-liner that automatically creates the pdl
736       $grey with the right number and size of dimensions and performs the
737       loops automatically (these loops are implemented as fast C code in the
738       internals of PDL).  Well, we still owe you an explanation how this
739       'magic' is achieved.
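
       The claim that the one-liner and the explicit loops agree can be
       checked on a small test case. In this sketch the "image" is just
       sequence(3,4,5), an arbitrary stand-in with 4 x 5 pixels, and method
       calls are used for set and abs.

```perl
use strict;
use warnings;
use PDL;

# Stand-in image: 3 colour components for each of 4 x 5 pixels.
my $im = sequence(3, 4, 5);
my $w  = pdl(77, 150, 29) / 256;          # weight vector

# Threaded version: one call, the loops over the pixels are implicit.
my $grey = inner($im, $w);

# Explicit Perl loops for comparison.
my (undef, $nx, $ny) = $im->dims;
my $check = zeroes($nx, $ny);
for my $j (0 .. $ny - 1) {
    for my $i (0 .. $nx - 1) {
        my $pixel = $im->slice(":,($i),($j)");      # the 3 components
        $check->set($i, $j, inner($w, $pixel)->sclr);
    }
}

my $diff = ($grey - $check)->abs->max;    # largest deviation between the two
print join(',', $grey->dims), "\n";       # 4,5
print $diff < 1e-9 ? "same\n" : "different\n";
```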
740
741   How does the example work ?
742       The first thing to note is that every function that is threading aware
743       (these are without exception functions compiled from concise
744       descriptions by PDL::PP, later just called PP-functions) expects a
745       defined (minimum) number of dimensions (we call them core dimensions)
746       from each of its pdl arguments. The inner function expects two one-
747       dimensional (input) parameters from which it calculates a zero-
748       dimensional (output) parameter. We write that symbolically as
749       "inner((n),(n),[o]())" and call it "inner"'s signature, where n
750       represents the size of that dimension. n being equal in the first and
751       second parameter means that those dimensions have to be of equal size
752       in any call. As a different example take the outer product which takes
753       two 1D vectors to generate a 2D matrix, symbolically written as
754       "outer((n),(m),[o](n,m))". The "[o]" in both examples indicates that
755       this (here third) argument is an output argument. In the latter example
756       the dimensions of first and second argument don't have to agree but you
757       see how they determine the size of the two dimensions of the output
758       pdl.
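
       The two signatures can be seen at work on plain vectors. A minimal
       sketch, with made-up vectors $u, $v and $t:

```perl
use strict;
use warnings;
use PDL;

my $u = pdl(1, 2, 3);
my $v = pdl(4, 5, 6);
my $t = pdl(1, 2);

my $dot = inner($u, $v);   # (n),(n) -> ()    : both sizes must agree
my $mat = outer($u, $t);   # (n),(m) -> (n,m) : the sizes may differ

print $dot->getndims, "\n";          # 0
print $dot->sclr, "\n";              # 32
print join(',', $mat->dims), "\n";   # 3,2
```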
759
       Here is the point where threading finally enters the game. If you call
       PP-functions with pdls that have more than the required core
       dimensions, the first dimensions of the pdl arguments are used as the
       core dimensions and the additional extra dimensions are threaded over.
       Let us demonstrate this first with our example above

        $grey = inner($im,$w); # w is the weight vector from above

       In this case $w is 1D and so supplies just the core dimension; $im is
       3D, more specifically "(3,x,y)". The first dimension (of size 3) is the
       required core dimension that matches (as required by inner) the first
       (and only) dimension of $w. The second dimension is the first thread
       dimension (of size "x") and the third is here the second thread
       dimension (of size "y"). The output pdl is automatically created (just
       as if $grey had been set to "null" and passed as the output argument).
       The output dimensions are obtained by appending the loop dimensions
       (here "(x,y)") to the core output dimensions (here 0D) to yield the
       final dimensions of the autocreated pdl (here "0D+2D=2D" to yield a 2D
       output of size "(x,y)").
779
       So the above command calls the core functionality that computes the
       inner product of two 1D vectors "x*y" times, once for $w and each 1D
       slice of the form "(':,(i),(j)')" of $im, and sets the respective
       element of the output pdl "$grey(i,j)" to the result of each
       computation. We could write that symbolically as

        $grey(0,0) = f($w,$im(:,(0),(0)))
        $grey(1,0) = f($w,$im(:,(1),(0)))
            .
            .
            .
        $grey(x-2,y-1) = f($w,$im(:,(x-2),(y-1)))
        $grey(x-1,y-1) = f($w,$im(:,(x-1),(y-1)))

       But this is done automatically by PDL without writing any explicit perl
       loops.  We see that the command really creates an output pdl with the
       right dimensions and sets the elements indeed to the result of the
       computation for each pixel of the input image.
798
       When even more pdls and extra dimensions are involved things get a bit
       more complicated. We will first give the general rules for how the
       thread dimensions depend on the dimensions of the input pdls, enabling
       you to figure out the dimensionality of an autocreated output pdl (for
       any given set of input pdls and core dimensions of the PP-function in
       question). The general rules will most likely appear a bit confusing
       at first sight, so we'll set out to illustrate the usage with a set of
       further examples (which will hopefully also demonstrate that there are
       indeed many practical situations where threading comes in extremely
       handy).
809
810   A call for coding discipline
811       Before we point out the other technical details of threading, please
812       note this call for programming discipline when using threading:
813
814       In order to preserve human readability, PLEASE comment any nontrivial
815       expression in your code involving threading.  Most importantly, for any
816       subroutine, include information at the beginning about what you expect
817       the dimensions to represent (or ranges of dimensions).
818
819       As a warning, look at this undocumented function and try to guess what
820       might be going on:
821
        sub lookup {
          my ($im,$palette) = @_;
          my $res;
          index($palette->xchg(0,1),
                     $im->long->dummy(0,($palette->dims)[0]),
                     ($res=null));
          return $res;
        }
830
831       Would you agree that it might be difficult to figure out expected
832       dimensions, purpose of the routine, etc ?  (If you want to find out
833       what this piece of code does, see below)
834
835   How to figure out the loop dimensions
       There are a couple of rules that allow you to figure out the number
       and size of loop dimensions (and whether the sizes of your input pdls
       comply with the threading rules). Dimensions of any pdl argument are
       broken down into two groups in the following: core dimensions (as
       defined by the PP-function, see Appendix B for a list of PDL
       primitives) and extra dimensions, which comprise all remaining
       dimensions of that pdl. For example, calling a function "func" with
       the signature "func((n,m),[o](n))" with a pdl "a(2,4,7,1,3)" as
       "func($a,($o = null))" results in the semantic splitting of a's
       dimensions into: core dimensions "(2,4)" and extra dimensions
       "(7,1,3)".
846
847       R0    Core dimensions are identified with the first N dimensions of the
848             respective pdl argument (and are required). Any further
849             dimensions are extra dimensions and used to determine the loop
850             dimensions.
851
852       R1    The number of (implicit) loop dimensions is equal to the maximal
853             number of extra dimensions taken over the set of pdl arguments.
854
855       R2    The size of each of the loop dimensions is derived from the size
856             of the respective dimensions of the pdl arguments. The size of a
857             loop dimension is given by the maximal size found in any of the
858             pdls having this extra dimension.
859
860       R3    For all pdls that have a given extra dimension the size must be
861             equal to the size of the loop dimension (as determined by the
862             previous rule) or 1; otherwise you raise a runtime exception. If
863             the size of the extra dimension in a pdl is one it is implicitly
864             treated as a dummy dimension of size equal to that loop dim size
865             when performing the thread loop.
866
867       R4    If a pdl doesn't have a loop dimension, in the thread loop this
868             pdl is treated as if having a dummy dimension of size equal to
869             the size of that loop dimension.
870
871       R5    If output autocreation is used (by setting the relevant pdl to
872             "PDL->null" before invocation) the number of dimensions of the
873             created pdl is equal to the sum of the number of core output
874             dimensions + number of loop dimensions. The size of the core
875             output dimensions is derived from the relevant dimension of input
876             pdls (as specified in the function definition) and the sizes of
877             the other dimensions are equal to the size of the loop dimension
878             it is derived from. The automatically created pdl will be
879             physical (unless dataflow is in operation).
880
881       In this context, note that you can run into the problem with assignment
882       to pdls containing greater-than-one dummy dimensions (see above).
883       Although your output pdl(s) didn't contain any dummy dimensions in the
884       first place they may end up with implicitly created dummy dimensions
885       according to R4.
886
887       As an example, suppose we have a (here unspecified) PP-function with
888       the signature:
889
890        func((m,n),(m,n,o),(m),[o](m,o))
891
892       and you call it with 3 pdls "a(5,3,10,11)", "b(5,3,2,10,1,12)", and
893       "c(5,1,11,12)" as
894
895        func($a,$b,$c,($d=null))
896
       then the number of loop dimensions is 3 (by "R0+R1" from $b and $c)
       with sizes "(10,11,12)" (by R2); the two output core dimensions are
       "(5,2)" (from the signature of func), resulting in a 5-dimensional
       output pdl $d of size "(5,2,10,11,12)" (see R5), and (the
       automatically created) $d is derived from "($a,$b,$c)" in a way that
       can be expressed in pdl pseudo-code as

        $d(:,:,i,j,k) .= func($a(:,:,i,j),$b(:,:,:,i,0,k),$c(:,0,j,k))
           with 0<=i<10, 0<=j<11, 0<=k<12
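
       The bookkeeping in this example can be reproduced with a few lines of
       plain Perl. The helper loop_dims below is hypothetical (it is not part
       of PDL) and only models rules R0-R4 on lists of dimension sizes; it
       does not touch any data.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of PDL: apply rules R0-R4 to plain lists of
# dimension sizes. $ncore->[$k] is the number of core dims of argument $k.
sub loop_dims {
    my ($ncore, @args) = @_;
    my @loop;
    for my $k (0 .. $#args) {
        my @dims = @{ $args[$k] };
        die "argument $k has too few dimensions\n" if @dims < $ncore->[$k];
        my @extra = @dims[ $ncore->[$k] .. $#dims ];   # R0: strip core dims
        for my $i (0 .. $#extra) {
            my $have = $loop[$i] // 1;                 # R4: a missing dim acts like size 1
            if ($have == 1) {
                $loop[$i] = $extra[$i];                # R2: take the larger size
            }
            elsif ($extra[$i] != 1 && $extra[$i] != $have) {
                die "loop dim $i: size $extra[$i] does not match $have\n";   # R3
            }
        }
    }
    return @loop;   # R1: one entry per extra dim of the "longest" argument
}

# The example above: func((m,n),(m,n,o),(m),[o](m,o))
my @loop = loop_dims([2, 3, 1],
                     [5, 3, 10, 11], [5, 3, 2, 10, 1, 12], [5, 1, 11, 12]);
my @out  = (5, 2, @loop);    # R5: core output dims (m,o), then the loop dims

print "loop dims: @loop\n";  # loop dims: 10 11 12
print "output:    @out\n";   # output:    5 2 10 11 12
```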
906
       If we analyze the color to greyscale conversion again with these rules
       in mind we note another great advantage of implicit threading.  We can
       call the conversion with a pdl representing a pixel ("im(3)"), a line
       of rgb pixels ("im(3,x)"), a proper color image ("im(3,x,y)") or a
       whole stack of RGB images ("im(3,x,y,z)"). As long as $im is of the
       form "(3,...)" the automatically created output pdl will contain the
       right number of dimensions and contain the intensity data as we expect
       it, since the loops have been implicitly performed thanks to implicit
       threading. You can easily convince yourself that calling with a color
       pixel $grey is 0D, with a line it turns out 1D "grey(x)", with an
       image we get "grey(x,y)" and finally we get a converted image stack
       "grey(x,y,z)".
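
       A short sketch makes this easy to try out. The all-ones test data and
       the sizes 4, 5 and 2 are arbitrary choices.

```perl
use strict;
use warnings;
use PDL;

my $w = pdl(77, 150, 29) / 256;     # weight vector
my %shape;                          # input dims => output dims

for my $dims ([3], [3, 4], [3, 4, 5], [3, 4, 5, 2]) {
    my $im   = ones(@$dims);        # pixel, line, image, image stack
    my $grey = inner($im, $w);      # the same call in all four cases
    $shape{ join(',', @$dims) } = join(',', $grey->dims);
    printf "im(%s) -> grey(%s)\n", join(',', @$dims), join(',', $grey->dims);
}
```

       The first line of output reads "im(3) -> grey()", i.e. a 0D result,
       and the last one "im(3,4,5,2) -> grey(4,5,2)".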
919
920       Let's fill these general rules with some more life by going through a
921       couple of further examples. The reader may try to figure out equivalent
922       formulations with explicit for-looping and compare the flexibility of
923       those routines using implicit threading to the explicit formulation.
924       Furthermore, especially when using several thread dimensions it is a
925       useful exercise to check the relative speed by doing some benchmark
926       tests (which we still have to do).
927
       First in the row is a slightly reworked centroid example, now coded
       with threading in mind.

        # threaded mult to calculate centroid coords, works for stacks as well
        $xc = sumover(($im*xvals(zeroes(($im->dims)[0])))->clump(2)) /
              sumover($im->clump(2));

       Let's analyse what's going on step by step. First the product:

        $prod = $im*xvals(zeroes(($im->dims)[0]))
938
939       This will actually work for $im being one, two, three, and higher
940       dimensional. If $im is one-dimensional it's just an ordinary product
941       (in the sense that every element of $im is multiplied with the
942       respective element of "xvals(...)"), if $im has more dimensions further
943       threading is done by adding appropriate dummy dimensions to
944       "xvals(...)"  according to R4.  More importantly, the two sumover
945       operations show a first example of how to make use of the dimension
946       manipulating commands. A quick look at sumover's signature will remind
947       you that it will only "gobble up" the first dimension of a given input
948       pdl. But what if we want to really compute the sum over all elements of
949       the first two dimensions? Well, nothing keeps us from passing a virtual
950       pdl into sumover which in this case is formed by clumping the first two
951       dimensions of the "parent pdl" into one. From the point of view of the
952       parent pdl the sum is now computed over the first two dimensions, just
953       as we wanted, though sumover has just done the job as specified by its
954       signature. Got it ?
955
       Another little finesse of writing the code like that: we intentionally
       used "sumover($pdl->clump(2))" instead of "sum($pdl)" so that we can
       either pass just an image "(x,y)" or a stack of images "(x,y,t)" into
       this routine and get either just one x-coordinate or a vector of
       x-coordinates (of size t) in return.
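
       Wrapped into a subroutine, the centroid calculation can be tried on a
       single image and on a stack. The name xcentroid and the test images
       (one bright pixel per frame) are made up for this sketch.

```perl
use strict;
use warnings;
use PDL;

# x-centroid of an image (x,y) or of every image in a stack (x,y,t).
sub xcentroid {
    my ($im) = @_;
    my $x = xvals(zeroes(($im->dims)[0]));     # 0,1,2,... along dim 0
    return sumover(($im * $x)->clump(2)) / sumover($im->clump(2));
}

my $im = zeroes(5, 4);
$im->set(3, 1, 2);                 # single bright pixel at x=3, y=1

my $stack = zeroes(5, 4, 2);
$stack->set(1, 2, 0, 1);           # frame 0: pixel at x=1
$stack->set(4, 0, 1, 1);           # frame 1: pixel at x=4

print xcentroid($im), "\n";        # 3
print xcentroid($stack), "\n";     # [1 4]
```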
961
       Another set of common operations are what one could call "projection
       operations". These operations take an N-D pdl as input and return an
       (N-1)-D "projected" pdl. These operations are often performed with
       functions like sumover, prodover, minimum and maximum.  Using again
       images as examples we might want to calculate the maximum pixel value
       for each line of an image or image stack. We know how to do that

        # maxima of lines (as function of line number and time)
        maximum($stack,($ret=null));

       But what if you want to calculate maxima per column when implicit
       threading always applies the core functionality to the first dimension
       and threads over all others? How can we arrange for the core
       functionality to be applied to the second dimension instead, with
       threading done over the others? Can you guess it? Yes, we make a
       virtual pdl that has the second dimension of the "parent pdl" as its
       first dimension using the "mv" command.

        # maxima of columns (as function of column number and time)
        maximum($stack->mv(0,1),($ret=null));

       and calculating all the sums of sub-slices over the third dimension is
       now almost too easy

        # sums of pixels in time (assuming time is the third dim)
        sumover($stack->mv(2,0),($ret=null));
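
       The three projections can be compared on a small stack whose values
       encode their own position. In this sketch $stack is sequence(4,3,2),
       so the element at (x,y,t) holds x + 4*y + 12*t. Note that it is
       "mv(2,0)", i.e. "move dimension 2 to position 0", that brings the
       time dimension to the front.

```perl
use strict;
use warnings;
use PDL;

my $stack = sequence(4, 3, 2);     # value at (x,y,t) is x + 4*y + 12*t

my $line_max = $stack->maximum;             # core dim is x   -> result (y,t)
my $col_max  = $stack->mv(0, 1)->maximum;   # y is now first  -> result (x,t)
my $time_sum = $stack->mv(2, 0)->sumover;   # t is now first  -> result (x,y)

print join(',', $line_max->dims), "\n";   # 3,2
print join(',', $col_max->dims),  "\n";   # 4,2
print join(',', $time_sum->dims), "\n";   # 4,3
```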
988
989       Finally, if you want to apply the operation to all elements (like max
990       over all elements or sum over all elements) regardless of the
991       dimensions of the pdl in question "clump" comes in handy. As an example
992       look at the definition of "sum" (as defined in "Basic.pm"):
993
        sub sum {
          my ($name) = @_;
          PDL::Primitive::sumover($name->clump(-1),($tmp=null));
          return $tmp->at(); # return a perl number, not a 0D pdl
        }
998
999       We have already mentioned that all basic operations support threading
1000       and assignment is no exception. So here are a couple of threaded
1001       assignments
1002
        perldl> $im = zeroes(byte, 10,20)
        perldl> $line = exp(-rvals(10)**2/9)
        # threaded assignment
        perldl> $im .= $line      # set every line of $im to $line
        perldl> $im .= 5          # set every element of $im to 5
1008
1009       By now you probably see how it works and what it does, don't you?
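
       As a script, with values that are easy to check, the two threaded
       assignments might look like this. It is a sketch only; sequence(10)
       replaces the Gaussian profile so that the result can be read off
       directly.

```perl
use strict;
use warnings;
use PDL;

my $im   = zeroes(10, 20);
my $line = sequence(10);           # 0..9 along the first dimension

$im .= $line;                      # $line is threaded over the 20 lines of $im
my $after_line = $im->at(3, 7);
print "$after_line\n";             # 3

$im .= 5;                          # a scalar is threaded over every element
my $after_five = $im->at(3, 7);
print "$after_five\n";             # 5
```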
1010
       To finish the examples in this paragraph here is a function to create
       an RGB image from what is called a palette image. The palette image
       consists of two parts: an image of indices into a color lookup table
       and the color lookup table itself. [ describe how it works ] We are
       going to use a PP-function we haven't encountered yet in the previous
       examples. It is the aptly named index function, signature
       "((n),(),[o]())" (see Appendix B) with the core functionality that
       "index(pdl (0,2,4,5),2,($ret=null))" will return the element with index
       2 of the first input pdl. In this case, $ret will contain the value 4.
       So here is the example:

        # a threaded index lookup to generate an RGB, or RGBA or YMCK image
        # from a palette image (represented by a lookup table $palette and
        # a color-index image $im)
        # you can say just dummy(0) since the rules of threading make it fit
        perldl> index($palette->xchg(0,1),
                      $im->long->dummy(0,($palette->dims)[0]),
                      ($res=null));
1029
       Let's go through it and explain the steps involved. Assuming we are
       dealing with an RGB lookup-table, $palette is of size "(3,x)". First we
       exchange the dimensions of the palette so that looping is done over the
       first dimension of $palette (of size 3, representing the r, g, and b
       components). Now looking at $im, we add a dummy dimension of size equal
       to the number of components (in the case we are discussing here we
       could have just used the number 3 since we have 3 color components).
       We can use a dummy dimension since for red, green and blue color
       components we use the same index from the original image, e.g.
       assuming a certain pixel of $im had the value 4 then the lookup should
       produce the triple
1041
1042        [palette(0,4),palette(1,4),palette(2,4)]
1043
1044       for the new red, green and blue components of the output image.
1045       Hopefully by now you have some sort of idea what the above piece of
1046       code is supposed to do (it is often actually quite complicated to
1047       describe in detail how a piece of threading code works; just go ahead
1048       and experiment a bit to get a better feeling for it).
1049
       If you have read the threading rules carefully, then you might have
       noticed that we didn't have to explicitly state the size of the dummy
       dimension that we created for $im; when we create it with size 1 (the
       default) the rules of threading make it automatically fit to the
       desired size (by rule R3, in our example the size would be 3 assuming a
       palette of size "(3,x)"). Since situations like this do occur often in
       practice this is actually why rule R3 has been introduced (the part
       that makes dimensions of size 1 fit to the thread loop dim size). So we
       can just say
1059
1060        perldl> index($palette->xchg(0,1),$im->long->dummy(0),($res=null));
1061
       Again, you can convince yourself that this routine will create the
       right output if called with a pixel ($im is 0D), a line ($im is 1D), an
       image ($im is 2D), ..., and with an RGB lookup table (palette is
       "(3,x)") or an RGBA lookup table (palette is "(4,x)", see e.g. OpenGL).
       This flexibility is achieved by the rules of threading which are made
       to do the right thing in most situations.
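
       Here is the lookup on a tiny made-up palette image. The sketch uses
       the method-call form "$pdl->index(...)" rather than the three-argument
       call shown above; the four-entry palette and the 2 x 2 index image are
       assumptions chosen so that the result is easy to verify by eye.

```perl
use strict;
use warnings;
use PDL;

# Hypothetical 4-entry RGB palette, dims (3,4): red, green, blue, white.
my $palette = pdl([[255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 255, 255]]);

# 2 x 2 image of palette indices: im(0,0)=0, im(1,0)=1, im(0,1)=2, im(1,1)=3.
my $im = pdl([[0, 1], [2, 3]]);

# dummy(0) has size 1 and is stretched to the number of colour components
# by rule R3, so the size does not have to be given explicitly.
my $rgb = $palette->xchg(0, 1)->index($im->long->dummy(0));

print join(',', $rgb->dims), "\n";       # 3,2,2
print $rgb->slice(':,(1),(0)'), "\n";    # [0 255 0]  (pixel (1,0) is green)
```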
1068
1069       To wrap it all up once again, the general idea is as follows. If you
1070       want to achieve looping over certain dimensions and have the core
1071       functionality applied to another specified set of dimensions you use
1072       the dimension manipulating commands to create a (or several) virtual
1073       pdl(s) so that from the point of view of the parent pdl(s) you get what
1074       you want (always having the signature of the function in question and
1075       R1-R5 in mind!). Easy, isn't it ?
1076
1077   Output autocreation and PP-function calling conventions
       At this point we have to digress into some technical detail that has
       to do with the general calling conventions of PP-functions and the
       automatic creation of output arguments.  Basically, there are two ways
       of invoking pdl routines, namely
1082
1083        $result = func($a,$b);
1084
1085       and
1086
1087        func($a,$b,$result);
1088
1089       If you are only using implicit threading then the output variable can
1090       be automatically created by PDL. You flag that to the PP-function by
1091       setting the output argument to a special kind of pdl that is returned
1092       from a call to the function "PDL->null" that returns an essentially
1093       "empty" pdl (for those interested in details there is a flag in the C
1094       pdl structure for this). The dimensions of the created pdl are
1095       determined by the rules of implicit threading: the first dimensions are
1096       the core output dimensions to which the threading dimensions are
1097       appended (which are in turn determined by the dimensions of the input
1098       pdls as described above).  So you can say
1099
1100        func($a,$b,($result=PDL->null));
1101
1102       or
1103
1104        $result = func($a,$b)
1105
1106       which are exactly equivalent.
1107
1108       Be warned that you can not use output autocreation when using explicit
1109       threading (for reasons explained in the following section on explicit
1110       threading, the second variant of threading).
1111
1112       In "tight" loops you probably want to avoid the implicit creation of a
1113       temporary pdl in each step of the loop that comes along with the
1114       "functional" style but rather say
1115
1116        # create output pdl of appropriate size only at first invocation
1117        $result = null;
1118        for (0...$n) {
1119             func($a,$b,$result); # in all but the first invocation $result
1120             func2($b);           # is defined and has the right size to
1121                                  # take the output provided $b's dims don't change
1122             twiddle($result,$a); # do something from $result to $a for iteration
1123        }
1124
1125       The take-home message of this section once more: be aware of the
1126       limitation on output creation when using explicit threading.
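
       The calling conventions can be compared side by side, using "inner"
       as the PP-function. A minimal sketch with made-up inputs $x and $w;
       the third variant passes a preallocated output pdl of the right size
       instead of a null pdl.

```perl
use strict;
use warnings;
use PDL;

my $x = sequence(3, 4);            # four 3-vectors
my $w = pdl(1, 2, 3);

my $r1 = inner($x, $w);            # functional style, output autocreated
inner($x, $w, (my $r2 = null));    # explicit output argument, autocreated

my $r3 = zeroes(4);                # preallocated output of the right size
inner($x, $w, $r3);                # the result is written into $r3

print "$r1\n$r2\n$r3\n";           # [8 26 44 62] three times
```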
1127
1128   Explicit threading
1129       Having so far only talked about the first flavour of threading it is
1130       now about time to introduce the second variant. Instead of shuffling
1131       around dimensions all the time and relying on the rules of implicit
1132       threading to get it all right you sometimes might want to specify in a
1133       more explicit way how to perform the thread loop. It is probably not
1134       too surprising that this variant of the game is called explicit
1135       threading.  Now, before we create the wrong impression: it is not
1136       either implicit or explicit; the two flavours do mix. But more about
1137       that later.
1138
1139       The two most used functions with explicit threading are thread and
1140       unthread.  We start with an example that illustrates typical usage of
1141       the former:
1142
1143        [ # ** this is the worst possible example to start with ]
1144        #  but can be used to show that $mat += $line is different from
1145        #                               $mat->thread(0) += $line
1146        # explicit threading to add a vector to each column of a matrix
1147        perldl> $mat  = zeroes(4,3)
1148        perldl> $line = pdl (3.1416,2,-2)
1149        perldl> ($tmp = $mat->thread(0)) += $line
1150
       In this example, "$mat->thread(0)" tells PDL that you want the first
       dimension (dimension 0) of this pdl to be threaded over first, leading
       to a thread loop that can be expressed as

        for (j=0; j<3; j++) {
           for (i=0; i<4; i++) {
               mat(i,j) += line(j);
           }
        }
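
       For comparison, the same loop can also be obtained with implicit
       threading alone, by giving $line a leading dummy dimension. This is
       shown as a cross-check of the loop above, not as a replacement for
       "thread".

```perl
use strict;
use warnings;
use PDL;

my $mat  = zeroes(4, 3);
my $line = pdl(3.1416, 2, -2);

# dummy(0) turns $line into a (1,3) pdl; the size-1 dimension is stretched
# to 4 by rule R3, so that mat(i,j) += line(j) for all i.
$mat += $line->dummy(0);

print $mat->at(0, 1), " ", $mat->at(3, 1), "\n";   # 2 2
print $mat->at(2, 2), "\n";                        # -2
```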
1160
1161       "thread" takes a list of numbers as arguments which explicitly specify
1162       which dimensions to thread over first. With the introduction of
1163       explicit threading the dimensions of a pdl are conceptually split into
1164       three different groups the latter two of which we have already
1165       encountered: thread dimensions, core dimensions and extra dimensions.
1166
       Conceptually, it is best to think of those dimensions of a pdl that
       have been specified in a call to "thread" as being taken away from the
       set of normal dimensions and put on a separate stack. So assuming we
       have a pdl "a(4,7,2,8)" saying

        $b = $a->thread(2,1)

       creates a new virtual pdl of dimension "b(4,8)" (which we call the
       remaining dims) that also has 2 thread dimensions of size "(2,7)". For
       the purposes of this document we write that symbolically as
       "b(4,8){2,7}". An important difference to the previous examples where
       only implicit threading was used is the fact that the core dimensions
       are matched against the remaining dimensions which are not necessarily
       the first dimensions of the pdl. We will now specify how the presence
       of thread dimensions changes the rules R0-R5 for thread loops (which
       apply to the special case where none of the pdl arguments has any
       thread dimensions).
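
       The split into remaining and thread dimensions is pure bookkeeping on
       the dimension list and can be modelled in plain Perl. The helper
       split_thread_dims below is hypothetical (not part of PDL) and works on
       lists of sizes only.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of PDL: model what thread(@picked) does to a
# list of dimension sizes. Returns the remaining dims and the thread dims.
sub split_thread_dims {
    my ($dims, @picked) = @_;
    my %is_picked = map { $_ => 1 } @picked;
    my @thread    = @{$dims}[@picked];            # in the order given
    my @remaining = map  { $dims->[$_] }
                    grep { !$is_picked{$_} } 0 .. $#$dims;
    return (\@remaining, \@thread);
}

# a(4,7,2,8)->thread(2,1) from the text
my ($rest, $thr) = split_thread_dims([4, 7, 2, 8], 2, 1);
printf "b(%s){%s}\n", join(',', @$rest), join(',', @$thr);   # b(4,8){2,7}
```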
1184
       T0  Core dimensions are matched against the first n remaining
           dimensions of the pdl argument (note the difference to R0). Any
           further remaining dimensions are extra dimensions and are used to
           determine the implicit loop dimensions.
1189
1190       T1a The number of implicit loop dimensions is equal to the maximal
1191           number of extra dimensions taken over the set of pdl arguments.
1192
1193       T1b The number of explicit loop dimensions is equal to the maximal
1194           number of thread dimensions taken over the set of pdl arguments.
1195
1196       T1c The total number of loop dimensions is equal to the sum of explicit
1197           loop dimensions and implicit loop dimensions. In the thread loop,
1198           explicit loop dimensions are threaded over first followed by
1199           implicit loop dimensions.
1200
1201       T2  The size of each of the loop dimensions is derived from the size of
1202           the respective dimensions of the pdl arguments. It is given by the
1203           maximal size found in any pdls having this thread dimension (for
1204           explicit loop dimensions) or extra dimension (for implicit loop
1205           dimensions).
1206
1207       T3  This rule applies to any explicit loop dimension as well as any
1208           implicit loop dimension. For all pdls that have a given
1209           thread/extra dimension the size must be equal to the size of the
           respective explicit/implicit loop dimension or 1; otherwise a
           runtime exception is raised. If the size of a thread/extra dimension
1212           of a pdl is one it is implicitly treated as a dummy dimension of
1213           size equal to the explicit/implicit loop dimension.
1214
1215       T4  If a pdl doesn't have a thread/extra dimension that corresponds to
1216           an explicit/implicit loop dimension, in the thread loop this pdl is
1217           treated as if having a dummy dimension of size equal to the size of
1218           that loop dimension.
1219
1220       T4a All pdls that do have thread dimensions must have the same number
1221           of thread dimensions.
1222
1223       T5  Output autocreation cannot be used if any of the pdl arguments has
1224           any thread dimensions. Otherwise R5 applies.
1225
       The same restrictions apply with regard to implicit dummy dimensions
       (created by application of T4) as already mentioned in the section on
       implicit threading: if any of the output pdls has a dummy dimension
       (explicit or implicitly created) of size greater than one, a runtime
       exception will be raised.
1231
1232       Let us demonstrate these rules at work in a generic case.  Suppose we
1233       have a (here unspecified) PP-function with the signature:
1234
1235        func((m,n),(m),(),[o](m))
1236
       and we call it with 3 pdls "a(5,3,10,11)", "b(3,5,10,1,12)", "c(10)"
       and an output pdl "d(3,11,5,10,12)" (which cannot be automatically
       created here) as
1240
1241        func($a->thread(1,3),$b->thread(0,3),$c,$d->thread(0,1))
1242
1243       From the signature of func and the above call the pdls split into the
1244       following groups of core, extra and thread dimensions (written in the
1245       form "pdl(core dims){thread dims}[extra dims]"):
1246
1247        a(5,10){3,11}[] b(5){3,1}[10,12] c(){}[10] d(5){3,11}[10,12]
1248
       With this to help us along (it is in general helpful to write the
       arguments down like this when you start playing with threading and want
       to keep track of what is going on) we further deduce that there are 2
       explicit loop dimensions (by T1b from $a and $b) with sizes "(3,11)"
       (by T2) and 2 implicit loop dimensions (by T1a from $b and $d) with
       sizes "(10,12)" (by T2). The elements of $d are computed from the
       input pdls in a way that can be expressed in pdl pseudo-code as
1256
1257        for (l=0;l<12;l++)
1258         for (k=0;k<10;k++)
1259          for (j=0;j<11;j++)         effect of treating it as dummy dim (index j)
1260           for (i=0;i<3;i++)                         |
1261              d(i,j,:,k,l) = func(a(:,i,:,j),b(i,:,k,0,l),c(k))
1262
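       The bookkeeping above can also be checked mechanically. The
       following is a minimal pure-Perl sketch of rules T1 to T4, not part
       of PDL: the helper name "loop_sizes" is hypothetical, and it only
       models the loop sizes (it does not check T4a or the restriction on
       output pdls). It is fed the thread and extra dimensions written down
       for $a, $b, $c and $d.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper (not a PDL function): given one list of dimension
# sizes per pdl argument, return the loop sizes implied by T1, T2 and T3.
sub loop_sizes {
    my @dim_lists = @_;                                   # one array ref per pdl
    my $ndims = max(0, map { scalar @$_ } @dim_lists);    # T1a / T1b
    my @sizes;
    for my $i (0 .. $ndims - 1) {
        # T4: a pdl lacking this dimension behaves like a dummy dim of size 1
        my @present = map { $i < @$_ ? $_->[$i] : 1 } @dim_lists;
        my $size = max(@present);                         # T2
        for my $s (@present) {                            # T3
            die "dimension $i: size $s does not match loop size $size\n"
                unless $s == $size || $s == 1;
        }
        push @sizes, $size;
    }
    return @sizes;
}

# Thread dims {..} and extra dims [..] of $a, $b, $c, $d from the example.
my @explicit = loop_sizes([3, 11], [3, 1], [],   [3, 11]);
my @implicit = loop_sizes([],      [10, 12], [10], [10, 12]);

# T1c: explicit loop dimensions come first, then the implicit ones.
print "loop dims: (", join(',', @explicit, @implicit), ")\n";
```

       For the call above this prints "loop dims: (3,11,10,12)", matching
       the four nested loops of the pseudo-code (innermost loop first).
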
       Uhhmpf, this example was really not easy in terms of bookkeeping. It
       serves mostly as an example of how to figure out what's going on when
       you encounter a complicated looking expression. But now it is really
       time to show that threading is useful by giving some more of our
       so-called "practical" examples.
1268
1269       [ The following examples will need some additional explanations in the
1270       future. For the moment please try to live with the comments in the code
1271       fragments. ]
1272
1273       Example 1:
1274
        # inverse of a matrix represented by its eigenvectors and eigenvalues
        # given a symmetrical matrix M = A^T x diag(lambda_i) x A
        #    =>  inverse M^-1 = A^T x diag(1/lambda_i) x A
        # first $tmp = diag(1/lambda_i)*A
        # then  A^T * $tmp by threaded inner product
        # index handling so that matrices print correctly under pdl
1281        $inv .= $evecs*0;  # just copy to get appropriately sized output
1282        $tmp .= $evecs;    # initialise, no backpropagation
1283        ($tmp2 = $tmp->thread(0)) /= $evals;    #  threaded division
1284        # and now a matrix multiplication in disguise
1285        PDL::Primitive::inner($evecs->xchg(0,1)->thread(-1,1),
1286                              $tmp->thread(0,-1),
1287                              $inv->thread(0,1));
1288        # alternative for matrix mult using implicit threading,
1289        # first xchg only for transpose
1290        PDL::Primitive::inner($evecs->xchg(0,1)->dummy(1),
1291                              $tmp->xchg(0,1)->dummy(2),
1292                              ($inv=null));
1293
1294       Example 2:
1295
        # outer product by threaded multiplication
        # note that we need an explicit call to my_biop1
        # when using explicit threading
        $res=zeroes(($a->dims)[0],($b->dims)[0]);
        my_biop1($a->thread(0,-1),$b->thread(-1,0),$res->thread(0,1),"*");
        # similar thing by implicit threading with autocreated pdl
1302        $res = $a->dummy(1) * $b->dummy(0);
1303
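       The implicit-threading variant of Example 2 can be tried on two
       small vectors. This is a sketch that assumes the PDL module is
       installed; the variable names $x and $y and the sample values are
       chosen here for illustration (they stand in for $a and $b above).

```perl
use strict;
use warnings;
use PDL;

my $x = pdl(1, 2, 3);        # dims (3)
my $y = pdl(10, 20);         # dims (2)

# dummy(1) gives $x the dims (3,1); dummy(0) gives $y the dims (1,2).
# The size-1 dimensions are then stretched by implicit threading,
# so that $res(i,j) = $x(i) * $y(j).
my $res = $x->dummy(1) * $y->dummy(0);

print join(',', $res->dims), "\n";   # 3,2
print $res->at(2, 1), "\n";          # 3 * 20 = 60
```

       The output pdl is created automatically here, which is possible
       because none of the arguments has thread dimensions.
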
1304       Example 3:
1305
        # different use of thread and unthread: shuffle a number of
        # dimensions in one go without lots of calls to ->xchg and ->mv;
        # just try it out and compare the child pdl with its parent
1312        $trans = $a->thread(4,1,0,3,2)->unthread;
1313
1314       Example 4:
1315
1316        # calculate a couple of bounding boxes
1317        # $bb will hold BB as [xmin,xmax],[ymin,ymax],[zmin,zmax]
1318        # we use again thread and unthread to shuffle dimensions around
1319        perldl> $bb = zeroes(double, 2,3 );
1320        perldl> minimum($vertices->thread(0)->clump->unthread(1),
1321                        $bb->slice('(0),:'));
1322        perldl> maximum($vertices->thread(0)->clump->unthread(1),
1323                        $bb->slice('(1),:'));
1324
1325       Example 5:
1326
1327        # calculate a self-ratioed (i.e. self normalized) sequence of images
1328        # uses explicit threading and an implicitly threaded division
1329        $stack = read_image_stack();
1330        # calculate the average (per pixel average) of the first $n+1 images
        $aver = zeroes(($stack->dims)[0,1]);  # make the output pdl
        sumover($stack->slice(":,:,0:$n")->thread(0,1),$aver->thread(0,1));
1333        $aver /= ($n+1);
        $stack /= $aver;  # normalize the stack by doing a threaded division
1335        # implicit versus explicit
1336        # alternatively calculate $aver with implicit threading and autocreation
1337        sumover($stack->slice(":,:,0:$n")->mv(2,0),($aver=null));
1338        $aver /= ($n+1);
1339        #
1340
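       The implicit-threading variant of Example 5 can be run on a small
       synthetic stack. This is a sketch that assumes the PDL module is
       installed; the stack built with sequence() and the value of $n are
       made up for illustration and stand in for read_image_stack().

```perl
use strict;
use warnings;
use PDL;

my $n     = 1;                          # average over the first $n+1 images
my $stack = sequence(2, 2, 3) + 1;      # three 2x2 "images" holding 1..12

# Move the image index to the front so that sumover sums over images;
# the two pixel dimensions are extra dimensions and are threaded over.
my $aver = sumover($stack->slice(":,:,0:$n")->mv(2, 0)) / ($n + 1);

# $aver has dims (2,2); dividing the (2,2,3) stack by it threads over
# the third dimension, so every image is divided by the same average.
my $norm = $stack / $aver;

print join(',', $aver->dims), "\n";     # 2,2
print $aver->at(1, 1), "\n";            # (4 + 8) / 2 = 6
print $norm->at(1, 1, 2), "\n";         # 12 / 6 = 2
```

       Unlike the in-place division in Example 5, this sketch keeps the
       original stack and stores the normalized images in a new pdl.
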
1341   Implicit versus explicit threading
       In this paragraph we are going to illustrate when explicit threading is
       preferable to implicit threading and vice versa. But then again,
       this is probably not the best way of putting the case since you already
       know: the two flavours do mix. So, it's more about how to get the best
       of both worlds and, anyway, in the best of perl traditions: TIMTOWTDI!
1347
1348       [ Sorry, this still has to be filled in in a later release; either
1349       refer to above examples or choose some new ones ]
1350
       Finally, this may be a good place to justify all the technical detail
       we have been going on about for a couple of pages: why threading?

       Well, code that uses threading should be (considerably) faster than
       code that uses explicit for-loops (or similar perl constructs) to
       achieve the same functionality. Especially on supercomputers (with
       vector computing facilities/parallel processing) PDL threading will be
       implemented in a way that takes advantage of the additional facilities
       of these machines. Furthermore, it is a conceptually simple construct
       (though technical details might get involved at times) and can greatly
       reduce the syntactical complexity of PDL code (but keep the admonition
       for documentation in mind). Once you are comfortable with the threading
       way of thinking (and coding) it shouldn't be too difficult to
       understand code that somebody else has written (provided they gave
       you an idea of what the expected input dimensions are, etc.). As a
       general tip to increase the performance of your code: if you have to
       introduce a loop into your code, try to reformulate the problem so that
       you can use threading to perform the loop (as with anything, there are
       exceptions to this rule of thumb, but the authors of this document tend
       to think that these are rare cases ;).
1371

PDL::PP

1373   An easy way to define functions that are aware of indexing and threading
1374       (and the universe and everything)
       PDL::PP is part of the PDL distribution. It is used to generate
       functions that are aware of indexing and threading rules from very
       concise descriptions. It can be useful for you if you want to write
       your own functions or if you want to interface functions from an
       external library so that they support indexing and threading (and maybe
       dataflow as well, see PDL::Dataflow). For further details check
       PDL::PP.
1382

Appendix A

1384   Affine transformations - a special class of simple and powerful
1385       transformations
1386       [ This is also something to be added in future releases. Do we already
1387       have the general make_affine routine in PDL ? It is possible that we
1388       will reference another appropriate manpage from here ]
1389

Appendix B

   Signatures of standard PDL::PP compiled functions
       A selection of signatures of PDL primitives to show how many dimensions
       PP compiled functions gobble up (and therefore you can figure out what
       will be threaded over). Most of these functions are the basic ones
       defined in "primitive.pd".
1396
1397        # functions in primitive.pd
1398        #
1399        sumover        ((n),[o]())
1400        prodover       ((n),[o]())
1401        axisvalues     ((n))                                   inplace
1402        inner          ((n),(n),[o]())
1403        outer          ((n),(m),[o](n,m))
1404        innerwt        ((n),(n),(n),[o]())
1405        inner2         ((m),(m,n),(n),[o]())
1406        inner2t        ((j,n),(n,m),(m,k),[o]())
1407        index          (1D,0D,[o])
1408        minimum        (1D,[o])
1409        maximum        (1D,[o])
1410        wstat          ((n),(n),(),[o],())
1411        assgn          ((),())
1412
1413        # basic operations
1414        binary operations ((),(),[o]())
1415        unary operations  ((),[o]())
1416
1418       Copyright (C) 1997 Christian Soeller (c.soeller@auckland.ac.nz) &
1419       Tuomas J. Lukka (lukka@fas.harvard.edu). All rights reserved. Although
1420       destined for release as a man page with the standard PDL distribution,
1421       it is not public domain. Permission is granted to freely distribute
1422       verbatim copies of this document provided that no modifications outside
1423       of formatting be made, and that this notice remain intact.  You are
1424       permitted and encouraged to use its code and derivatives thereof in
1425       your own source code for fun or for profit as you see fit.
1426
1427
1428
1429perl v5.12.3                      2009-10-17                       INDEXING(1)