INDEXING(1)           User Contributed Perl Documentation          INDEXING(1)



NAME

6       PDL::Indexing - Introduction to indexing and slicing ndarrays.
7

OVERVIEW

9       This man page should serve as a first tutorial on the indexing and
10       threading features of PDL.
11
12       Like all vectorized languages, PDL automates looping over multi-
13       dimensional data structures ("ndarrays") using a variant of
14       mathematical vector notation.  The automatic looping is called
15       "threading", in part because ultimately PDL will implement parallel
16       processing to speed up the loops.
17
18       A lot of the flexibility and power of PDL relies on the indexing and
19       threading features of the Perl extension.  Indexing allows access to
20       the data of an ndarray in a very flexible way.  Threading provides
21       efficient vectorization of simple operations.
22
23       The values of an ndarray are stored compactly as typed values in a
24       single block of memory, not (as in a normal Perl list-of-lists) as
25       individual Perl scalars.
26
27       In the sections that follow many "methods" are called out -- these are
28       Perl operators that apply to ndarrays.  From the perldl (or pdl2)
29       shell, you can find out more about each method by typing "?" followed
30       by the method name.
31
32   Dimension lists
33       An ndarray (PDL variable), in general, is an N-dimensional array where N
34       can be 0 (for a scalar), 1 (e.g. for a sound sample), or higher values
35       for images and more complex structures.  Each dimension of the ndarray
36       has a positive integer size.  The "perl" interpreter treats each
37       ndarray as a special type of Perl scalar (a blessed Perl object,
38       actually -- but you don't have to know that to use them) that can be
39       used anywhere you can put a normal scalar.
40
41       You can access the dimensions of an ndarray as a Perl list and
42       otherwise determine the size of an ndarray with several methods.  The
43       important ones are:
44
45       nelem - the total number of elements in an ndarray
46       ndims - returns the number of dimensions in an ndarray
47       dims - returns the dimension list of an ndarray as a Perl list
48       dim - returns the size of a particular dimension of an ndarray
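
       A minimal interactive sketch of these methods in the perldl shell (the
       example ndarray and its sizes are arbitrary):

        pdl> $x = zeroes(7,3,2)
        pdl> p $x->nelem
        42
        pdl> p $x->ndims
        3
        pdl> p join ',', $x->dims
        7,3,2
        pdl> p $x->dim(1)
        3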
49
50   Indexing and Dataflow
51       PDL maintains a notion of "dataflow" between an ndarray and indexed
52       subfields of that ndarray.  When you produce an indexed subfield or
53       single element of a parent ndarray, the child and parent remain
54       attached until you manually disconnect them.  This lets you represent
55       the same data different ways within your code -- for example, you can
56       consider an RGB image simultaneously as a collection of (R,G,B) values
57       in a 3 x 1000 x 1000 image, and as three separate 1000 x 1000 color
58       planes stored in different variables.  Modifying any of the variables
59       changes the underlying memory, and the changes are reflected in all
60       representations of the data.
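
       A minimal sketch of that idea (the sizes are arbitrary; the color plane
       below shares data with the parent RGB ndarray):

        $rgb = zeroes(byte, 3, 1000, 1000);  # (R,G,B) triples of a 1000x1000 image
        $red = $rgb->slice('(0),:,:');       # the 1000x1000 red plane, still attached
        $red .= 255;                         # writes through to $rgb as well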
61
62       There are two important methods that let you control dataflow
63       connections between a child and parent ndarray:
64
65       copy - forces an explicit copy of an ndarray
66       sever - breaks the dataflow connection between an ndarray and its
67       parents (if any)
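
       For example, continuing the sketch above:

        $green = $rgb->slice('(1),:,:')->sever; # detached; later changes stay local
        $blue  = $rgb->slice('(2),:,:')->copy;  # an independent copy of the data
        $green .= 128;                          # no longer modifies $rgb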
68
69   Threading and Dimension Order
70       Most PDL operations act on the first few dimensions of their ndarray
71       arguments.  For example, "sumover" sums all elements along the first
72       dimension in the list (dimension 0).  If you feed in a three-
73       dimensional ndarray, then the first dimension is considered the
74       "active" dimension and the later dimensions are "thread" dimensions
75       because they are simply looped over.  There are several ways to
76       transpose or re-order the dimension list of an ndarray.  Those
77       techniques are very fast since they don't touch the underlying data,
78       only change the way that PDL accesses the data.  The main dimension
79       ordering functions are:
80
81       mv - moves a particular dimension somewhere else in the dimension list
82       xchg - exchanges two dimensions in the dimension list, leaving the rest
83       alone
84       reorder - allows wholesale mixing of the dimensions
85       clump - clumps together two or more small dimensions into one larger
86       one
87       squeeze - eliminates any dimensions of size 1
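
       A short sketch of how these methods rearrange a dimension list (the
       sizes are made up so that each dimension is easy to track):

        pdl> $x = zeroes(2,3,4,5,1)
        pdl> p join ',', $x->mv(2,0)->dims            # 4,2,3,5,1
        pdl> p join ',', $x->xchg(0,3)->dims          # 5,3,4,2,1
        pdl> p join ',', $x->reorder(4,3,2,1,0)->dims # 1,5,4,3,2
        pdl> p join ',', $x->clump(2)->dims           # 6,4,5,1
        pdl> p join ',', $x->squeeze->dims            # 2,3,4,5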
88
89   Physical and Dummy Dimensions
90       •    document Perl level threading
91
92       •    threadids
93
94       •    update and correct description of slice
95
96       •    new functions in slice.pd (affine, lag, splitdim)
97
98       •    reworking of paragraph on explicit threading
99

Indexing and threading with PDL

101       A lot of the flexibility and power of PDL relies on the indexing and
102       looping features of the Perl extension. Indexing allows access to the
103       data of an ndarray in a very flexible way. Threading provides efficient
104       implicit looping functionality (since the loops are implemented as
105       optimized C code).
106
107       ndarrays are Perl objects that represent multidimensional arrays and
108       operations on those. In contrast to simple Perl @x style lists, the
109       array data is stored compactly in a single block of memory, thus
110       taking up a lot less memory and enabling the use of fast C code to
111       implement operations (e.g. addition) on ndarrays.
112
113   ndarrays can have children
114       Central to many of the indexing capabilities of PDL are the relation of
115       "parent" and "child" between ndarrays. Many of the indexing commands
116       create a new ndarray from an existing ndarray. The new ndarray is the
117       "child" and the old one is the "parent". The data of the new ndarray is
118       defined by a transformation that specifies how to generate (compute)
119       its data from the parent's data. The relation between the child ndarray
120       and its parent are often bidirectional, meaning that changes in the
121       child's data are propagated back to the parent. (Note: You see, we are
122       aiming in our terminology already towards the new dataflow features.
123       The kind of dataflow that is used by the indexing commands (about which
124       you will learn in a minute) is always in operation, not only when you
125       have explicitly switched on dataflow in your ndarray by saying
126       "$x->doflow". For further information about data flow check the
127       dataflow man page.)
128
129       Another way to interpret the ndarrays created by our indexing commands
130       is to view them as a kind of intelligent pointer that points back to
131       some portion or all of its parent's data. Therefore, it is not
132       surprising that the parent's data (or a portion of it) changes when
133       manipulated through this "pointer". After these introductory remarks
134       that hopefully prepared you for what is coming (rather than confuse you
135       too much) we are going to dive right in and start with a description of
136       the indexing commands and some typical examples of how they might be used
137       in PDL programs. We will further illustrate the pointer/dataflow
138       analogies in the context of some of the examples later on.
139
140       There are two different implementations of this ``smart pointer''
141       relationship: the first one, which is a little slower but works for any
142       transformation, is simply to do the transformation forwards and
143       backwards as necessary. The other is to consider the child ndarray a
144       ``virtual'' ndarray, which only stores a pointer to the parent and
145       access information so that routines which use the child ndarray
146       actually directly access the data in the parent.  If the virtual
147       ndarray is given to a routine which cannot use it, PDL transparently
148       physicalizes the virtual ndarray before letting the routine use it.
149
150       Currently (1.94_01) all transformations which are ``affine'' (i.e.
151       where the indices of the data item in the parent ndarray are
152       determined by a linear transformation, plus a constant, from the
153       indices of the child ndarray) result in virtual ndarrays. All other
154       indexing routines (e.g.  "->index(...)") result in physical ndarrays.
155       All routines compiled by PP can accept affine ndarrays (except those
156       routines that pass pointers to external library functions).
157
158       Note that whether something is affine or not does not affect the
159       semantics of what you do in any way: both
160
161        $x->index(...) .= 5;
162        $x->slice(...) .= 5;
163
164       change the data in $x. The affinity does, however, have a significant
165       impact on memory usage and performance.
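
       A small sketch of that point (the values are arbitrary); both forms
       write through to $x, only the internal representation of the child
       differs:

        $x = sequence(10);
        $x->slice('0:4') .= 5;       # affine (virtual) child
        $x->index(pdl(7,9)) .= 5;    # non-affine, physical child
        print $x;                    # [5 5 5 5 5 5 6 5 8 5]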
166
167   Slicing ndarrays
168       Probably the most important application of the concept of parent/child
169       ndarrays is the representation of rectangular slices of a physical
170       ndarray by a virtual ndarray. Having talked long enough about concepts
171       let's get more specific. Suppose we are working with a 2D ndarray
172       representing a 5x5 image (it's unusually small so that we can print it
173       without filling several screens full of digits).
174
175        pdl> $im = sequence(5,5)
176        pdl> p $im
177
178        [
179         [ 0  1  2  3  4]
180         [ 5  6  7  8  9]
181         [10 11 12 13 14]
182         [15 16 17 18 19]
183         [20 21 22 23 24]
184        ]
185
186        pdl> help vars
187        PDL variables in package main::
188
189        Name         Type   Dimension       Flow  State          Mem
190        ----------------------------------------------------------------
191        $im          Double D [5,5]                P            0.20Kb
192
193       [ here it might be appropriate to quickly talk about the "help vars"
194       command that provides information about ndarrays in the interactive
195       "perldl" or "pdl2" shell that comes with PDL.  ]
196
197       Now suppose we want to create a 1-D ndarray that just references one
198       line of the image, say line 2; or an ndarray that represents all even
199       lines of the image (imagine we have to deal with even and odd frames of
200       an interlaced image due to some peculiar behaviour of our frame
201       grabber). As another frequent application of slices we might want to
202       create an ndarray that represents a rectangular region of the image
203       with top and bottom reversed. All these effects (and many more) can be
204       easily achieved with the powerful slice function:
205
206        pdl> $line = $im->slice(':,(2)')
207        pdl> $even = $im->slice(':,1:-1:2')
208        pdl> $area = $im->slice('3:4,3:1')
209        pdl> help vars  # or just PDL->vars
210        PDL variables in package main::
211
212        Name         Type   Dimension       Flow  State          Mem
213        ----------------------------------------------------------------
214        $even        Double D [5,2]                -C           0.00Kb
215        $im          Double D [5,5]                P            0.20Kb
216        $line        Double D [5]                  -C           0.00Kb
217        $area        Double D [2,3]                -C           0.00Kb
218
219       All three "child" ndarrays are children of $im or in the other (largely
220       equivalent) interpretation pointers to data of $im.  Operations on
221       those virtual ndarrays access only those portions of the data as
222       specified by the argument to slice. So we can just print line 2:
223
224        pdl> p $line
225        [10 11 12 13 14]
226
227       Also note the difference in the "Flow State" of $area above and below:
228
229        pdl> p $area
230        pdl> help $area
231        This variable is Double D [2,3]                VC           0.00Kb
232
233       The following demonstrates that $im and $line really behave as you
234       would expect from a pointer-like object (or in the dataflow picture:
235       the changes in $line's data are propagated back to $im):
236
237        pdl> $im++
238        pdl> p $line
239        [11 12 13 14 15]
240        pdl> $line += 2
241        pdl> p $im
242
243        [
244         [ 1  2  3  4  5]
245         [ 6  7  8  9 10]
246         [13 14 15 16 17]
247         [16 17 18 19 20]
248         [21 22 23 24 25]
249        ]
250
251       Note how assignment operations on the child virtual ndarrays change the
252       parent physical ndarray and vice versa (however, the basic "="
253       assignment doesn't; use ".=" to obtain that effect -- see below for the
254       reasons).  The virtual child ndarrays are something like "live links"
255       to the "original" parent ndarray. As said above, they can be thought
256       of as working similarly to a C pointer. But in contrast to a C pointer
257       they carry a lot more information. Firstly, they specify the structure
258       of the data they represent (the dimensionality of the new ndarray) and
259       secondly, they specify how to create this structure from its parent's
260       data (the way this works is buried in the internals of PDL and is not
261       important for you to know anyway, unless you want to hack the core in
262       the future or would like to become a PDL guru in general -- for a
263       definition of this strange creature see PDL::Internals).
264
265       The previous examples have demonstrated typical usage of the slice
266       function. Since the slicing functionality is so important here is an
267       explanation of the syntax for the string argument to slice:
268
269        $vpdl = $x->slice('ind0,ind1...')
270
271       where "ind0" specifies what to do with index No 0 of the ndarray $x,
272       etc. Each element of the comma separated list can have one of the
273       following forms:
274
275       ':'   Use the whole dimension
276
277       'n'   Use only index "n". The dimension of this index in the resulting
278             virtual ndarray is 1. An example involving those first two index
279             formats:
280
281              pdl> $column = $im->slice('2,:')
282              pdl> $row = $im->slice(':,0')
283              pdl> p $column
284
285              [
286               [ 3]
287               [ 8]
288               [15]
289               [18]
290               [23]
291              ]
292
293              pdl> p $row
294
295              [
296               [1 2 3 4 5]
297              ]
298
299              pdl> help $column
300              This variable is Double D [1,5]                VC           0.00Kb
301
302              pdl> help $row
303              This variable is Double D [5,1]                VC           0.00Kb
304
305       '(n)' Use only index "n". This dimension is removed from the resulting
306             ndarray (relying on the fact that a dimension of size 1 can
307             always be removed). The distinction between this case and the
308             previous one becomes important in assignments where left and
309             right hand side have to have appropriate dimensions.
310
311              pdl> $line = $im->slice(':,(0)')
312              pdl> help $line
313              This variable is Double D [5]                  -C           0.00Kb
314
315              pdl> p $line
316              [1 2 3 4 5]
317
318             Spot the difference to the previous example?
319
320       'n1:n2' or 'n1:n2:n3'
321             Take the range of indices from "n1" to "n2" or (second form) take
322             the range of indices from "n1" to "n2" with step "n3". An example
323             for the use of this format is the previous definition of the sub-
324             image composed of even lines.
325
326              pdl> $even = $im->slice(':,1:-1:2')
327
328             This example also demonstrates that negative indices work like
329             they do for normal Perl style arrays by counting backwards from
330             the end of the dimension. If "n2" is smaller than "n1" (in the
331             example -1 is equivalent to index 4) the elements in the virtual
332             ndarray are effectively reversed with respect to its parent.
333
334       '*[n]'
335             Add a dummy dimension. The size of this dimension will be 1 by
336             default or equal to "n" if the optional numerical argument is
337             given.
338
339             Now, this is really something a bit strange at first sight. What
340             is a dummy dimension? A dummy dimension inserts a dimension where
341             there wasn't one before. How is that done? Well, in the case of
342             the new dimension having size 1 it can be easily explained by the
343             way in which you can identify a vector (with "m" elements) with
344             a "(1,m)" or "(m,1)" matrix. The same holds obviously for higher
345             dimensional objects. More interesting is the case of a dummy
346             dimension of size greater than one (e.g. "slice('*5,:')"). This
347             works in the same way that a call to the dummy function creates a
348             new dummy dimension.  So read on and check its explanation below.
349
350       '([n1:n2[:n3]]=i)'
351             [Not yet implemented ??????]  With an argument like this you make
352             generalised diagonals. The diagonal will be dimension no. "i" of
353             the new output ndarray and (if optional part in brackets
354             specified) will extend along the range of indices specified of
355             the respective parent ndarray's dimension. In general an argument
356             like this only makes sense if there are other arguments like this
357             in the same call to slice. The part in brackets is optional for
358             this type of argument. All arguments of this type that specify
359             the same target dimension "i" have to relate to the same number
360             of indices in their parent dimension. The best way to explain it
361             is probably to give an example, here we make an ndarray that
362             refers to the elements along the space diagonal of its parent
363             ndarray (a cube):
364
365              $cube = zeroes(5,5,5);
366              $sdiag = $cube->slice('(=0),(=0),(=0)');
367
368             The above command creates a virtual ndarray that represents the
369             diagonal along the parents' dimension no. 0, 1 and 2 and makes
370             its dimension 0 (the only dimension) of it. You use the extended
371             syntax if the dimension sizes of the parent dimensions you want
372             to build the diagonal from have different sizes or you want to
373             reverse the sequence of elements in the diagonal, e.g.
374
375              $rect = zeroes(12,3,5,6,2);
376              $vpdl = $rect->slice('2:7,(0:1=1),(4),(5:4=1),(=1)');
377
378             So the elements of $vpdl will then be related to those of its
379             parent in way we can express as:
380
381               vpdl(i,j) = rect(i+2,j,4,5-j,j)       0<=i<6, 0<=j<2
382
383       [ work in the new index function: "$y = $x->index($c);" ???? ]
384
385   There are different kinds of assignments in PDL
386       The previous examples have already shown that virtual ndarrays can be
387       used to operate on or access portions of data of a parent ndarray. They
388       can also be used as lvalues in assignments (as the use of "++" in some
389       of the examples above has already demonstrated). For explicit
390       assignments to the data represented by a virtual ndarray you have to
391       use the overloaded ".=" operator (which in this context we call
392       propagated assignment). Why can't you use the normal assignment
393       operator "="?
394
395       Well, you definitely still can use the '=' operator, but it wouldn't
396       do what you want. This is due to the fact that the '=' operator cannot
397       be overloaded in the same way as other assignment operators. If we
398       tried to use '=' to assign data to a portion of a physical ndarray
399       through a virtual ndarray we wouldn't achieve the desired effect:
400       instead, the variable representing the virtual ndarray (a reference to
401       a blessed thingy) would after the assignment just contain the reference
402       to another blessed thingy, which would behave to future assignments as
403       a "physical" copy of the original rvalue [this is actually not yet
404       clear and the subject of discussions on the PDL developers mailing
405       list]. In that sense it would break the connection of the ndarray to
406       the parent [ isn't this behaviour in a sense the opposite of what
407       happens in dataflow, where ".=" breaks the connection to the parent? ].
408
409       E.g.
410
411        pdl> $line = $im->slice(':,(2)')
412        pdl> $line = zeroes(5);
413        pdl> $line++;
414        pdl> p $im
415
416        [
417         [ 1  2  3  4  5]
418         [ 6  7  8  9 10]
419         [13 14 15 16 17]
420         [16 17 18 19 20]
421         [21 22 23 24 25]
422        ]
423
424        pdl> p $line
425        [1 1 1 1 1]
426
427       But using ".="
428
429        pdl> $line = $im->slice(':,(2)')
430        pdl> $line .= zeroes(5)
431        pdl> $line++
432        pdl> p $im
433
434        [
435         [ 1  2  3  4  5]
436         [ 6  7  8  9 10]
437         [ 1  1  1  1  1]
438         [16 17 18 19 20]
439         [21 22 23 24 25]
440        ]
441
442        pdl> print $line
443        [1 1 1 1 1]
444
445       Also, you can substitute
446
447        pdl> $line .= 0;
448
449       for the assignment above (the zero is converted to a scalar ndarray,
450       with no dimensions so it can be assigned to any ndarray).
451
452       A nice feature in recent perl versions is lvalue subroutines (i.e.,
453       versions 5.6.x and higher including all perls currently supported by
454       PDL).  That allows one to use the slicing syntax on both sides of the
455       assignment:
456
457        pdl> $im->slice(':,(2)') .= zeroes(5)->xvals->float
458
459       Related to the lvalue sub assignment feature is a little trap for the
460       unwary: recent perls introduced a "feature" which breaks PDL's use of
461       lvalue subs for slice assignments when running under the perl debugger,
462       "perl -d".  Under the debugger, the above usage gives an error like: "
463       Can't return a temporary from lvalue subroutine... " So you must use
464       syntax like this:
465
466        pdl> ($pdl = $im->slice(':,(2)')) .= zeroes(5)->xvals->float
467
468       which works both with and without the debugger but is arguably clumsy
469       and awkward to read.
470
471       Note that there can be a problem with assignments like this when lvalue
472       and rvalue ndarrays refer to overlapping portions of data in the parent
473       ndarray:
474
475        # reverse the elements of line 1 of $x
476        ($tmp = $x->slice(':,(1)')) .= $x->slice('-1:0,(1)');
477
478       Currently, the parent data on the right side of the assignments is not
479       copied before the (internal) assignment loop proceeds. Therefore, the
480       outcome of this assignment will depend on the sequence in which
481       elements are assigned and almost certainly not do what you wanted.  So
482       the semantics are currently undefined for now and liable to change
483       anytime. To obtain the desired behaviour, use
484
485        ($tmp = $x->slice(':,(1)')) .= $x->slice('-1:0,(1)')->copy;
486
487       which makes a physical copy of the slice or
488
489        ($tmp = $x->slice(':,(1)')) .= $x->slice('-1:0,(1)')->sever;
490
491       which returns the same slice but severs the connection of the slice to
492       its parent.
493
494   Other functions that manipulate dimensions
495       Having talked extensively about the slice function it should be noted
496       that this is not the only PDL indexing function. There are additional
497       indexing functions which are also useful (especially in the context of
498       threading which we will talk about later). Here is a list and some
499       examples of how to use them.
500
501       "dummy"
502           inserts a dummy dimension of the size you specify (default 1) at
503           the chosen location. You can't wait to hear how that is achieved?
504           Well, all elements with index "(X,x,Y)" ("0<=x<size_of_dummy_dim")
505           just map to the element with index "(X,Y)" of the parent ndarray
506           (where "X" and "Y" refer to the group of indices before and after
507           the location where the dummy dimension was inserted.)
508
509           This example calculates the x coordinate of the centroid of an
510           image (later we will learn that we didn't actually need the dummy
511           dimension thanks to the magic of implicit threading; but using
512           dummy dimensions the code would also work in a thread-less world;
513           though once you have worked with PDL threads you wouldn't want to
514           live without them again).
515
516            # centroid
517            ($xd,$yd) = $im->dims;
518            $xc = sum($im*xvals(zeroes($xd))->dummy(1,$yd))/sum($im);
519
520           Let's explain how that works in a little more detail. First, the
521           product:
522
523            $xvs = xvals(zeroes($xd));
524            print $xvs->dummy(1,$yd);      # repeat the line $yd times
525            $prod = $im*$xvs->dummy(1,$yd); # form the pixel-wise product with
526                                           # the repeated line of x-values
527
528           The rest is then summing the results of the pixel-wise product
529           together and normalizing with the sum of all pixel values in the
530           original image thereby calculating the x-coordinate of the "center
531           of mass" of the image (interpreting pixel values as local mass)
532           which is known as the centroid of an image.
533
534           Next is a (from the point of view of memory consumption) very cheap
535           conversion from grey-scale to RGB, i.e. every pixel now holds a
536           triple of values instead of a scalar. The three values in the
537           triple are, fortunately, all the same for a grey image, so that our
538           trick works well in that it maps all the three members of the
539           triple to the same source element:
540
541            # a cheap grey-scale to RGB conversion
542            $rgb = $grey->dummy(0,3)
543
544           Unfortunately this trick cannot be used to convert your old B/W
545           photos to color ones in the way you'd like. :(
546
547           Note that the memory usage of ndarrays with dummy dimensions is
548           especially sensitive to the internal representation. If the ndarray
549           can be represented as a virtual affine (``vaffine'') ndarray, only
550           the control structures are stored. But if $y in
551
552            $x = zeroes(10000);
553            $y = $x->dummy(1,10000);
554
555           is made physical by some routine, you will find that the memory
556           usage of your program has suddenly grown by some 800Mb.
557
558       "diagonal"
559           replaces two dimensions (which have to be of equal size) by one
560           dimension that references all the elements along the "diagonal"
561           along those two dimensions. Here, we have two examples which should
562           appear familiar to anyone who has ever done some linear algebra.
563           Firstly, make a unity matrix:
564
565            # unity matrix
566            $e = zeroes(float, 3, 3); # make everything zero
567            ($tmp = $e->diagonal(0,1)) .= 1; # set the elements along the diagonal to 1
568            print $e;
569
570           Or the other diagonal:
571
572            ($tmp = $e->slice(':,-1:0')->diagonal(0,1)) .= 2;
573            print $e;
574
575           (Did you notice how we used the slice function to revert the
576           sequence of lines before setting the diagonal of the new child,
577           thereby setting the cross diagonal of the parent ?)  Or a mapping
578           from the space of diagonal matrices to the field over which the
579           matrices are defined, the trace of a matrix:
580
581            # trace of a matrix
582            $trace = sum($mat->diagonal(0,1));  # sum all the diagonal elements
583
584       "xchg" and "mv"
585           xchg exchanges or "transposes" the two  specified dimensions.  A
586           straightforward example:
587
588            # transpose a matrix (without explicitly reshuffling data and
589            # making a copy)
590            $prod = $x x $x->xchg(0,1);
591
592           $prod should now be pretty close to the unity matrix if $x is an
593           orthogonal matrix. Often "xchg" will be used in the context of
594           threading but more about that later.
595
596           mv works in a similar fashion. It moves a dimension (specified by
597           its number in the parent) to a new position in the new child
598           ndarray:
599
600            $y = $x->mv(4,0);  # make the 5th dimension of $x the first in the
601                               # new child $y
602
603           The difference between "xchg" and "mv" is that "xchg" only swaps
604           the positions of two dimensions with each other, whereas "mv"
605           moves the first specified dimension to the position of the second,
606           shifting the other dimensions around accordingly.
607
608       "clump"
609           collapses several dimensions into one. Its only argument specifies
610           how many dimensions of the source ndarray should be collapsed
611           (starting from the first). An (admittedly unrealistic) example is a
612           3D ndarray which holds data from a stack of image files that you
613           have just read in. However, the data from each image really
614           represents a 1D time series and has only been arranged that way
615           because it was digitized with a frame grabber. So to have it again
616           as an array of time sequences you say
617
618            pdl> $seqs = $stack->clump(2)
619            pdl> help vars
620            PDL variables in package main::
621
622            Name         Type   Dimension       Flow  State          Mem
623            ----------------------------------------------------------------
624            $seqs        Double D [8000,50]            -C           0.00Kb
625            $stack       Double D [100,80,50]          P            3.05Mb
626
627           Unrealistic as it may seem, our confocal microscope software writes
628           data (sometimes) this way. But more often you use clump to achieve
629           a certain effect when using implicit or explicit threading.
630
631   Calls to indexing functions can be chained
632       As you might have noticed in some of the examples above calls to the
633       indexing functions can be nicely chained since all of these functions
634       return a newly created child object. However, when doing extensive
635       index manipulations in a chain be sure to keep track of what you are
636       doing, e.g.
637
638        $x->xchg(0,1)->mv(0,4)
639
640       moves the dimension 1 of $x to position 4 since when the second command
641       is executed the original dimension 1 has been moved to position 0 of
642       the new child that calls the "mv" function. I think you get the idea
643       (in spite of my convoluted explanations).
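
       A short sketch of that point (dimension sizes chosen so that each one
       is easy to track):

        pdl> $x = zeroes(11,22,33,44,55)
        pdl> p join ',', $x->xchg(0,1)->mv(0,4)->dims
        11,33,44,55,22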
644
645   Propagated assignments ('.=') and dummy dimensions
646       A subtlety related to indexing is the assignment to ndarrays containing
647       dummy dimensions of size greater than 1. These assignments (using ".=")
648       are forbidden since several elements of the lvalue ndarray point to the
649       same element of the parent. As a consequence the value of those parent
650       elements are potentially ambiguous and would depend on the sequence in
651       which the implementation makes the assignments to elements. Therefore,
652       an assignment like this:
653
654        $x = pdl [1,2,3];
655        $y = $x->dummy(1,4);
656        $y .= yvals(zeroes(3,4));
657
658       can produce unexpected results and the results are explicitly undefined
659       by PDL because when PDL gets parallel computing features, the current
660       result may well change.
661
662       From the point of view of dataflow the introduction of greater-size-
663       than-one dummy dimensions is regarded as an irreversible transformation
664       (similar to the terminology in thermodynamics) which precludes backward
665       propagation of assignment to a parent (which you had explicitly
666       requested using the ".=" assignment). A similar problem to watch out
667       for occurs in the context of threading where sometimes dummy dimensions
668       are created implicitly during the thread loop (see below).
669
670   Reasons for the parent/child (or "pointer") concept
671       [ this will have to wait a bit ]
672
673        XXXXX being memory efficient
674        XXXXX in the context of threading
675        XXXXX very flexible and powerful way of accessing portions of ndarray data
676              (in much more general way than sec, etc allow)
677        XXXXX efficient implementation
678        XXXXX difference to section/at, etc.
679
680   How to make things physical again
681       [ XXXXX fill in later when everything has settled a bit more ]
682
683        ** When needed (xsub routine interfacing C lib function)
684        ** How achieved (->physical)
685        ** How to test (isphysical (explain how it works currently))
686        ** ->copy and ->sever
687

Threading

689       In the previous paragraph on indexing we have already mentioned the
690       term occasionally, but now it's really time to talk explicitly about
691       "threading" with ndarrays. The term threading has many different
692       meanings in different fields of computing. Within the framework of PDL
693       it could probably be loosely defined as an implicit looping facility.
694       It is implicit because you don't specify anything like enclosing for-
695       loops but rather the loops are automatically (or 'magically') generated
696       by PDL based on the dimensions of the ndarrays involved. This should
697       give you a first idea why the index/dimension manipulating functions
698       you have met in the previous paragraphs are especially important and
699       useful in the context of threading.  The other ingredient for threading
700       (apart from the ndarrays involved) is a function that is threading
701       aware (generally, these are PDL::PP compiled functions) and that the
702       ndarrays are "threaded" over.  So much for the terminology; now
703       let's try to shed some light on what it all means.
704
705   Implicit threading - a first example
706       There are two slightly different variants of threading. We start with
707       what we call "implicit threading". Let's pick a practical example that
708       involves looping a function over many elements of an ndarray. Suppose
709       we have an RGB image that we want to convert to grey-scale. The RGB
710       image is represented by a 3-dim ndarray "im(3,x,y)" where the first
711       dimension contains the three color components of each pixel and "x" and
712       "y" are width and height of the image, respectively. Next we need to
713       specify how to convert a color-triple at a given pixel into a grey-
714       value (to be a realistic example it should represent the relative
715       intensity with which our color insensitive eye cells would detect that
716       color to achieve what we would call a natural conversion from color to
717       grey-scale). An approximation that works quite well is to compute the
718       grey intensity from each RGB triplet (r,g,b) as a weighted sum
719
720        grey-value = 77/256*r + 150/256*g + 29/256*b =
721            inner([77,150,29]/256, [r,g,b])
722
723       where the last form indicates that we can write this as an inner
724       product of the 3-vector comprising the weights for red, green and blue
725       components with the 3-vector containing the color components.
726       Traditionally, we might have written a function like the following to
727       process the whole image:
728
729        my @dims=$im->dims;
730        # here normally check that first dim has correct size (3), etc
731        $grey=zeroes(@dims[1,2]);   # make the ndarray for the resulting grey image
732        $w = pdl [77,150,29] / 256; # the vector of weights
733        for ($j=0;$j<$dims[2];$j++) {
734           for ($i=0;$i<$dims[1];$i++) {
735               # compute the pixel value
736               $tmp = inner($w,$im->slice(":,($i),($j)"));
737               set($grey,$i,$j,$tmp); # and set it in the grey-scale image
738           }
739        }
740
741       Now we write the same using threading (noting that "inner" is a
742       threading aware function defined in the PDL::Primitive package)
743
744        $grey = inner($im,pdl([77,150,29]/256));
745
746       We have ended up with a one-liner that automatically creates the
747       ndarray $grey with the right number and size of dimensions and performs
748       the loops automatically (these loops are implemented as fast C code in
749       the internals of PDL).  Well, we still owe you an explanation of how
750       this 'magic' is achieved.
751
752   How does the example work ?
753       The first thing to note is that every function that is threading aware
754       (these are without exception functions compiled from concise
755       descriptions by PDL::PP, later just called PP-functions) expects a
756       defined (minimum) number of dimensions (we call them core dimensions)
757       from each of its ndarray arguments. The inner function expects two one-
758       dimensional (input) parameters from which it calculates a zero-
759       dimensional (output) parameter. We write that symbolically as
760       "inner((n),(n),[o]())" and call it "inner"'s signature, where n
761       represents the size of that dimension. n being equal in the first and
762       second parameter means that those dimensions have to be of equal size
763       in any call. As a different example take the outer product which takes
764       two 1D vectors to generate a 2D matrix, symbolically written as
765       "outer((n),(m),[o](n,m))". The "[o]" in both examples indicates that
766       this (here third) argument is an output argument. In the latter example
767       the dimensions of first and second argument don't have to agree but you
768       see how they determine the size of the two dimensions of the output
769       ndarray.
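
       As a small illustration of reading such signatures (a sketch; the
       vector sizes are arbitrary):

        pdl> p inner(sequence(3), sequence(3))->ndims   # (n),(n) -> ()   : 0-D result
        0
        pdl> p outer(sequence(3), sequence(4))->shape   # (n),(m) -> (n,m)
        [3 4]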
770
771       Here is the point when threading finally enters the game. If you call
772       PP-functions with ndarrays that have more than the required core
773       dimensions the first dimensions of the ndarray arguments are used as
774       the core dimensions and the additional extra dimensions are threaded
775       over. Let us demonstrate this first with our example above
776
777        $grey = inner($im,$w); # w is the weight vector from above
778
779       In this case $w is 1D and so supplies just the core dimension; $im is
780       3D, more specifically "(3,x,y)". The first dimension (of size 3) is the
781       required core dimension that matches (as required by inner) the first
782       (and only) dimension of $w. The second dimension is the first thread
783       dimension (of size "x") and the third is here the second thread
784       dimension (of size "y"). The output ndarray is automatically created
785       (the functional call above is equivalent to setting $grey to "null"
786       before invocation). The output dimensions are obtained by appending
787       the loop dimensions (here "(x,y)") to the core output dimensions (here
788       0D) to yield the final dimensions of the auto-created ndarray (here
789       "0D+2D=2D" to yield a 2D output of size "(x,y)").
790
791       So the above command calls the core functionality that computes the
792       inner product of two 1D vectors "x*y" times with $w and all 1D slices
793       of the form "(':,(i),(j)')" of $im and sets the respective elements of
794       the output ndarray "$grey(i,j)" to the result of each computation. We
795       could write that symbolically as
796
797        $grey(0,0) = f($w,$im(:,(0),(0)))
798        $grey(1,0) = f($w,$im(:,(1),(0)))
799            .
800            .
801            .
802        $grey(x-2,y-1) = f($w,$im(:,(x-2),(y-1)))
803        $grey(x-1,y-1) = f($w,$im(:,(x-1),(y-1)))
804
805       But this is done automatically by PDL without writing any explicit Perl
806       loops.  We see that the command really creates an output ndarray with
807       the right dimensions and sets the elements indeed to the result of the
808       computation for each pixel of the input image.
809
810       When even more ndarrays and extra dimensions are involved things get a
811       bit more complicated. We will first give the general rules how the
812       thread dimensions depend on the dimensions of input ndarrays enabling
813       you to figure out the dimensionality of an auto-created output ndarray
814       (for any given set of input ndarrays and core dimensions of the PP-
815       function in question). The general rules will most likely appear a bit
816       confusing on first sight so that we'll set out to illustrate the usage
817       with a set of further examples (which will hopefully also demonstrate
818       that there are indeed many practical situations where threading comes
819       in extremely handy).
820
821   A call for coding discipline
822       Before we point out the other technical details of threading, please
823       note this call for programming discipline when using threading:
824
825       In order to preserve human readability, PLEASE comment any nontrivial
826       expression in your code involving threading.  Most importantly, for any
827       subroutine, include information at the beginning about what you expect
828       the dimensions to represent (or ranges of dimensions).
829
830       As a warning, look at this undocumented function and try to guess what
831       might be going on:
832
833        sub lookup {
834          my ($im,$palette) = @_;
835          my $res;
836          index($palette->xchg(0,1),
837                     $im->long->dummy(0,($palette->dims)[0]),
838                     ($res=null));
839          return $res;
840        }
841
842       Would you agree that it might be difficult to figure out expected
843       dimensions, purpose of the routine, etc ?  (If you want to find out
844       what this piece of code does, see below)
845
846   How to figure out the loop dimensions
847       There are a couple of rules that allow you to figure out number and
848       size of loop dimensions (and if the size of your input ndarrays comply
849       with the threading rules). Dimensions of any ndarray argument are
850       broken down into two groups in the following: Core dimensions (as
851       defined by the PP-function, see Appendix B for a list of PDL
852       primitives) and extra dimensions which comprises all remaining
853       dimensions of that ndarray. For example calling a function "func" with
854       the signature "func((n,m),[o](n))" with an ndarray "$x(2,4,7,1,3)" as
855       "f($x,($o = null))" results in the semantic splitting of x's dimensions
856       into: core dimensions "(2,4)" and extra dimensions "(7,1,3)".
857
858       R0    Core dimensions are identified with the first N dimensions of the
859             respective ndarray argument (and are required). Any further
860             dimensions are extra dimensions and used to determine the loop
861             dimensions.
862
863       R1    The number of (implicit) loop dimensions is equal to the maximal
864             number of extra dimensions taken over the set of ndarray
865             arguments.
866
867       R2    The size of each of the loop dimensions is derived from the size
868             of the respective dimensions of the ndarray arguments. The size
869             of a loop dimension is given by the maximal size found in any of
870             the ndarrays having this extra dimension.
871
872       R3    For all ndarrays that have a given extra dimension the size must
873             be equal to the size of the loop dimension (as determined by the
874             previous rule) or 1; otherwise a runtime exception is raised. If
875             the size of the extra dimension in an ndarray is one it is
876             implicitly treated as a dummy dimension of size equal to that
877             loop dim size when performing the thread loop.
878
879       R4    If an ndarray doesn't have a loop dimension, in the thread loop
880             this ndarray is treated as if having a dummy dimension of size
881             equal to the size of that loop dimension.
882
883       R5    If output auto-creation is used (by setting the relevant ndarray
884             to "PDL->null" before invocation) the number of dimensions of the
885             created ndarray is equal to the sum of the number of core output
886             dimensions + number of loop dimensions. The size of the core
887             output dimensions is derived from the relevant dimension of input
888             ndarrays (as specified in the function definition) and the sizes
889             of the other dimensions are equal to the size of the loop
890             dimension it is derived from. The automatically created ndarray
891             will be physical (unless dataflow is in operation).
892
893       In this context, note that you can run into the problem with assignment
894       to ndarrays containing greater-than-one dummy dimensions (see above).
895       Although your output ndarray(s) didn't contain any dummy dimensions in
896       the first place they may end up with implicitly created dummy
897       dimensions according to R4.
898
899       As an example, suppose we have a (here unspecified) PP-function with
900       the signature:
901
902        func((m,n),(m,n,o),(m),[o](m,o))
903
904       and you call it with 3 ndarrays "$x(5,3,10,11)", "$y(5,3,2,10,1,12)",
905       and "$z(5,1,11,12)" as
906
907        func($x,$y,$z,($d=null))
908
909       then the number of loop dimensions is 3 (by "R0+R1" from $y and $z)
910       with sizes "(10,11,12)" (by R2); the two output core dimensions are
911       "(5,2)" (from the signature of func) resulting in a 5-dimensional
912       output ndarray of size "(5,2,10,11,12)" (see R5); the automatically
913       created $d is derived from "($x,$y,$z)" in a way that
914       can be expressed in pdl pseudo-code as
915
916        $d(:,:,i,j,k) .= func($x(:,:,i,j),$y(:,:,:,i,0,k),$z(:,0,j,k))
917           with 0<=i<10, 0<=j<11, 0<=k<12
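
       A concrete, runnable variant of the same bookkeeping with a real
       PP-function (a sketch using "inner", signature "((n),(n),[o]())"; the
       sizes are arbitrary):

        pdl> $x = sequence(3,5,2)       # core dim (3), extra dims (5,2)
        pdl> $y = sequence(3,5)         # core dim (3), extra dim  (5)
        pdl> p inner($x,$y)->shape      # loop dims (5,2) by R1/R2; R4 recycles $y
        [5 2]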
918
919       If we analyze the color to grey-scale conversion again with these rules
920       in mind we note another great advantage of implicit threading.  We can
921       call the conversion with an ndarray representing a pixel (im(3)), a
922       line of rgb pixels ("im(3,x)"), a proper color image ("im(3,x,y)") or a
923       whole stack of RGB images ("im(3,x,y,z)"). As long as $im is of the
924       form "(3,...)" the automatically created output ndarray will contain
925       the right number of dimensions and contain the intensity data as we
926       expect it since the loops have been implicitly performed thanks to
927       implicit threading. You can easily convince yourself that when calling
928       with a color pixel $grey is 0D, with a line it turns out 1D "grey(x)",
929       with an image we get "grey(x,y)" and finally we get a converted image
930       stack "grey(x,y,z)".
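
       You can check this quickly in the shell (a sketch; the image sizes
       below are arbitrary):

        pdl> $w = pdl([77,150,29])/256
        pdl> p inner(sequence(3), $w)->ndims        # single pixel -> 0-D
        0
        pdl> p inner(sequence(3,10), $w)->shape     # line         -> grey(x)
        [10]
        pdl> p inner(sequence(3,10,7), $w)->shape   # image        -> grey(x,y)
        [10 7]
        pdl> p inner(sequence(3,10,7,4), $w)->shape # image stack  -> grey(x,y,z)
        [10 7 4]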
931
932       Let's fill these general rules with some more life by going through a
933       couple of further examples. The reader may try to figure out equivalent
934       formulations with explicit for-looping and compare the flexibility of
935       those routines using implicit threading to the explicit formulation.
936       Furthermore, especially when using several thread dimensions it is a
937       useful exercise to check the relative speed by doing some benchmark
938       tests (which we still have to do).
939
940       First in the row is a slightly reworked centroid example, now coded
941       with threading in mind.
942
943        # threaded mult to calculate centroid coords, works for stacks as well
944        $xc = sumover(($im*xvals(($im->dims)[0]))->clump(2)) /
945              sumover($im->clump(2));
946
947       Let's analyze what's going on step by step. First the product:
948
949        $prod = $im*xvals(zeroes(($im->dims)[0]))
950
951       This will actually work for $im being one, two, three, and higher
952       dimensional. If $im is one-dimensional it's just an ordinary product
953       (in the sense that every element of $im is multiplied with the
954       respective element of "xvals(...)"), if $im has more dimensions further
955       threading is done by adding appropriate dummy dimensions to
956       "xvals(...)"  according to R4.  More importantly, the two sumover
957       operations show a first example of how to make use of the dimension
958       manipulating commands. A quick look at sumover's signature will remind
959       you that it will only "gobble up" the first dimension of a given input
960       ndarray. But what if we want to really compute the sum over all
961       elements of the first two dimensions? Well, nothing keeps us from
962       passing a virtual ndarray into sumover which in this case is formed by
963       clumping the first two dimensions of the "parent ndarray" into one.
964       From the point of view of the parent ndarray the sum is now computed
965       over the first two dimensions, just as we wanted, though sumover has
966       just done the job as specified by its signature. Got it ?
967
968       Another little finesse of writing the code like that: we intentionally
969       used "sumover($pdl->clump(2))" instead of "sum($pdl)" so that we can
970       either pass just an image "(x,y)" or a stack of images "(x,y,t)" into
971       this routine and get either just one x-coordinate or a vector of
972       x-coordinates (of size t) in return.
973
974       Another set of common operations are what one could call "projection
975       operations". These operations take an N-D ndarray as input and return a
976       (N-1)-D "projected" ndarray. These operations are often performed with
977       functions like sumover, prodover, minimum and maximum.  Using again
978       images as examples we might want to calculate the maximum pixel value
979       for each line of an image or image stack. We know how to do that
980
981        # maxima of lines (as function of line number and time)
982        maximum($stack,($ret=null));
983
984       But what if you want to calculate maxima per column when implicit
985       threading always applies the core functionality to the first dimension
986       and threads over all others? How can we arrange instead for the core
987       functionality to be applied to the second dimension and threading to be
988       done over the others? Can you guess it? Yes, we make a virtual ndarray
989       that has the second dimension of the "parent ndarray" as its first
990       dimension using the "mv" command.
991
992        # maxima of columns (as function of column number and time)
993        maximum($stack->mv(1,0),($ret=null));
994
995       and calculating all the sums of sub-slices over the third dimension is
996       now almost too easy
997
998        # sums of pixels in time (assuming time is the third dim)
999        sumover($stack->mv(2,0),($ret=null));
1000
1001       Finally, if you want to apply the operation to all elements (like max
1002       over all elements or sum over all elements) regardless of the
1003       dimensions of the ndarray in question "clump" comes in handy. As an
1004       example look at the definition of "sum" (as defined in "Ufunc.pm"):
1005
1006        sub sum {
1007          PDL::Ufunc::sumover($_[0]->clump(-1), my $tmp = null);
1008          return $tmp->at(); # return a Perl number, not a 0D ndarray
1009        }
1010
1011       We have already mentioned that all basic operations support threading
1012       and assignment is no exception. So here are a couple of threaded
1013       assignments
1014
1015        pdl> $im = zeroes(byte, 10,20)
1016        pdl> $line = exp(-rvals(10)**2/9)
1017        # threaded assignment
1018        pdl> $im .= $line      # set every line of $im to $line
1019        pdl> $im2 .= 5         # set every element of $im2 to 5
1020
1021       By now you probably see how it works and what it does, don't you?
1022
1023       To finish the examples in this paragraph here is a function to create
1024       an RGB image from what is called a palette image. The palette image
1025       consists of two parts: an image of indices into a color lookup table
1026       and the color lookup table itself. [ describe how it works ] We are
1027       going to use a PP-function we haven't encountered yet in the previous
1028       examples. It is the aptly named index function, signature
1029       "((n),(),[o]())" (see Appendix B) with the core functionality that
1030       "index(pdl (0,2,4,5),2,($ret=null))" will return the element with index
1031       2 of the first input ndarray. In this case, $ret will contain the value
1032       4.  So here is the example:
1033
1034        # a threaded index lookup to generate an RGB, or RGBA or YMCK image
1035        # from a palette image (represented by a lookup table $palette and
1036        # a color-index image $im)
1037        # you can say just dummy(0) since the rules of threading make it fit
1038        pdl> index($palette->xchg(0,1),
1039                      $im->long->dummy(0,($palette->dims)[0]),
1040                      ($res=null));
1041
1042       Let's go through it and explain the steps involved. Assuming we are
1043       dealing with an RGB lookup table, $palette is of size "(3,x)". First we
1044       exchange the dimensions of the palette so that looping is done over the
1045       first dimension of $palette (of size 3, representing the r, g, and b
1046       components). Now looking at $im, we add a dummy dimension of size equal
1047       to the number of components (in the case we are
1048       discussing here we could have just used the number 3 since we have 3
1049       color components). We can use a dummy dimension since for red, green
1050       and blue color components we use the same index from the original
1051       image, e.g.  assuming a certain pixel of $im had the value 4 then the
1052       lookup should produce the triple
1053
1054        [palette(0,4),palette(1,4),palette(2,4)]
1055
1056       for the new red, green and blue components of the output image.
1057       Hopefully by now you have some sort of idea what the above piece of
1058       code is supposed to do (it is often actually quite complicated to
1059       describe in detail how a piece of threading code works; just go ahead
1060       and experiment a bit to get a better feeling for it).
1061
1062       If you have read the threading rules carefully, then you might have
1063       noticed that we didn't have to explicitly state the size of the dummy
1064       dimension that we created for $im; when we create it with size 1 (the
1065       default) the rules of threading make it automatically fit to the
1066       desired size (by rule R3, in our example the size would be 3 assuming a
1067       palette of size "(3,x)"). Since situations like this do occur often in
1068       practice this is actually why rule R3 has been introduced (the part
1069       that makes dimensions of size 1 fit to the thread loop dim size). So we
1070       can just say
1071
1072        pdl> index($palette->xchg(0,1),$im->long->dummy(0),($res=null));
1073
       Again, you can convince yourself that this routine will create the
       right output if called with a pixel ($im is 0D), a line ($im is
       1D), an image ($im is 2D), ..., an RGB lookup table ($palette is
       "(3,x)") or an RGBA lookup table ($palette is "(4,x)", see e.g.
       OpenGL). This flexibility is achieved by the rules of threading,
       which are made to do the right thing in most situations.
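
       For concreteness, here is a tiny made-up example run in the shell:
       a palette with four entries and a 2x2 index image (all values are
       arbitrary and only serve to illustrate the dimensions involved):

        pdl> $palette = pdl([255,0,0],[0,255,0],[0,0,255],[255,255,255])
        pdl> $im = pdl([0,1],[2,3])      # a 2x2 image of palette indices
        pdl> index($palette->xchg(0,1), $im->long->dummy(0), ($res=null))
        pdl> p $res->info    # $res is a (3,2,2) ndarray, i.e. an RGB image
        pdl> p $res->slice(':,(0),(0)')  # [255 0 0], color of palette entry 0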
1080
       To wrap it all up once again, the general idea is as follows: if
       you want to loop over certain dimensions and have the core
       functionality applied to another specified set of dimensions, you
       use the dimension-manipulating commands to create one (or several)
       virtual ndarrays so that the function sees the dimensions of the
       parent ndarray(s) arranged the way you want (always keeping the
       signature of the function in question and rules R1-R5 in mind!).
       Easy, isn't it?
1088
1089   Output auto-creation and PP-function calling conventions
1090       At this point we have to divert to some technical detail that has to do
1091       with the general calling conventions of PP-functions and the automatic
1092       creation of output arguments.  Basically, there are two ways of
1093       invoking PDL routines, namely
1094
1095        $result = func($x,$y);
1096
1097       and
1098
1099        func($x,$y,$result);
1100
       If you are only using implicit threading then the output variable
       can be automatically created by PDL. You flag that to the
       PP-function by setting the output argument to the special kind of
       ndarray returned by a call to "PDL->null", an essentially "empty"
       ndarray (for those interested in details there is a flag in the C
       pdl structure for this). The dimensions of the created ndarray are
       determined by the rules of implicit threading: the first
       dimensions are the core output dimensions, to which the threading
       dimensions are appended (which are in turn determined by the
       dimensions of the input ndarrays as described above).  So you can
       say
1111
1112        func($x,$y,($result=PDL->null));
1113
1114       or
1115
        $result = func($x,$y);
1117
1118       which are exactly equivalent.
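
       As a small concrete illustration (using "sumover", whose signature
       "((n),[o]())" is listed in Appendix B, and a made-up input), the
       two call styles really do produce the same thing:

        pdl> $x = sequence(3,2)        # [[0 1 2][3 4 5]]
        pdl> $s1 = sumover($x)         # functional style, auto-created output
        pdl> sumover($x,($s2=null))    # explicit output argument
        pdl> p $s1, " ", $s2           # both are [3 12]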
1119
       Be warned that you cannot use output auto-creation when using
       explicit threading (for reasons explained in the following section
       on explicit threading, the second variant of threading).
1123
1124       In "tight" loops you probably want to avoid the implicit creation of a
1125       temporary ndarray in each step of the loop that comes along with the
1126       "functional" style but rather say
1127
        # create output ndarray of appropriate size only at first invocation
        $result = null;
        for (0..$n) {
             func($x,$y,$result); # in all but the first invocation $result
             func2($y);           # is defined and has the right size to
                                  # take the output, provided the dims of
                                  # $x and $y don't change
             twiddle($result,$x); # do something with $result and $x
        }
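
       As a concrete sketch of that pattern (the data and the loop body
       here are made up purely for illustration), accumulating inner
       products over a series of vectors reuses a single output ndarray
       instead of creating a fresh temporary on every pass:

        # illustration only: reuse one pre-created output across iterations
        $dot = null;                    # gets its size on the first call
        for my $i (0..9) {
            my $v = random(100);        # some per-iteration input vector
            PDL::Primitive::inner($v, $v, $dot);  # no new temporary here
            print "step $i: ", $dot->sclr, "\n";
        }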
1136
1137       The take-home message of this section once more: be aware of the
1138       limitation on output creation when using explicit threading.
1139
1140   Explicit threading
       Having so far only talked about the first flavour of threading, it
       is now about time to introduce the second variant. Instead of
       shuffling around dimensions all the time and relying on the rules
       of implicit threading to get it all right, you sometimes might
       want to specify in a more explicit way how to perform the thread
       loop. It is probably not too surprising that this variant of the
       game is called explicit threading.  Now, before we create the
       wrong impression: it is not a matter of either implicit or
       explicit; the two flavours do mix. But more about that later.
1150
1151       The two most used functions with explicit threading are thread and
1152       unthread.  We start with an example that illustrates typical usage of
1153       the former:
1154
1155        [ # ** this is the worst possible example to start with ]
1156        #  but can be used to show that $mat += $line is different from
1157        #                               $mat->thread(0) += $line
1158        # explicit threading to add a vector to each column of a matrix
1159        pdl> $mat  = zeroes(4,3)
1160        pdl> $line = pdl (3.1416,2,-2)
1161        pdl> ($tmp = $mat->thread(0)) += $line
1162
       In this example, "$mat->thread(0)" tells PDL to make the first
       dimension of $mat (of size 4) an explicit thread dimension; the
       remaining dimension (of size 3) is then matched against $line by
       implicit threading. This leads to a thread loop that can be
       expressed as
1166
1167        for (j=0; j<3; j++) {
1168           for (i=0; i<4; i++) {
1169               mat(i,j) += src(j);
1170           }
1171        }
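
       Continuing the shell session above you can check the effect
       directly (the exact output formatting may differ slightly). Note
       that a plain "$mat += $line" would not do the same thing; with
       these sizes the first dims (4 and 3) don't agree and neither is of
       size 1, so it raises a dimension error (rule R3):

        pdl> p $mat     # each printed row holds one element of $line
        [
         [3.1416 3.1416 3.1416 3.1416]
         [     2      2      2      2]
         [    -2     -2     -2     -2]
        ]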
1172
1173       "thread" takes a list of numbers as arguments which explicitly specify
1174       which dimensions to thread over first. With the introduction of
1175       explicit threading the dimensions of an ndarray are conceptually split
1176       into three different groups the latter two of which we have already
1177       encountered: thread dimensions, core dimensions and extra dimensions.
1178
1179       Conceptually, it is best to think of those dimensions of an ndarray
1180       that have been specified in a call to "thread" as being taken away from
1181       the set of normal dimensions and put on a separate stack. So assuming
1182       we have an ndarray "x(4,7,2,8)" saying
1183
1184        $y = $x->thread(2,1)
1185
1186       creates a new virtual ndarray of dimension "y(4,8)" (which we call the
1187       remaining dims) that also has 2 thread dimensions of size "(2,7)". For
1188       the purposes of this document we write that symbolically as
1189       "y(4,8){2,7}". An important difference to the previous examples where
1190       only implicit threading was used is the fact that the core dimensions
1191       are matched against the remaining dimensions which are not necessarily
1192       the first dimensions of the ndarray. We will now specify how the
1193       presence of thread dimensions changes the rules R1-R5 for thread loops
1194       (which apply to the special case where none of the ndarray arguments
1195       has any thread dimensions).
1196
1197       T0  Core dimensions are matched against the first n remaining
1198           dimensions of the ndarray argument (note the difference to R1). Any
1199           further remaining dimensions are extra dimensions and are used to
1200           determine the implicit loop dimensions.
1201
1202       T1a The number of implicit loop dimensions is equal to the maximal
1203           number of extra dimensions taken over the set of ndarray arguments.
1204
1205       T1b The number of explicit loop dimensions is equal to the maximal
1206           number of thread dimensions taken over the set of ndarray
1207           arguments.
1208
1209       T1c The total number of loop dimensions is equal to the sum of explicit
1210           loop dimensions and implicit loop dimensions. In the thread loop,
1211           explicit loop dimensions are threaded over first followed by
1212           implicit loop dimensions.
1213
1214       T2  The size of each of the loop dimensions is derived from the size of
1215           the respective dimensions of the ndarray arguments. It is given by
1216           the maximal size found in any ndarrays having this thread dimension
1217           (for explicit loop dimensions) or extra dimension (for implicit
1218           loop dimensions).
1219
       T3  This rule applies to any explicit loop dimension as well as
           any implicit loop dimension. For all ndarrays that have a
           given thread/extra dimension, the size must be equal to the
           size of the respective explicit/implicit loop dimension or 1;
           otherwise a runtime exception is raised. If the size of a
           thread/extra dimension of an ndarray is one, it is implicitly
           treated as a dummy dimension of size equal to the
           explicit/implicit loop dimension.
1227
1228       T4  If an ndarray doesn't have a thread/extra dimension that
1229           corresponds to an explicit/implicit loop dimension, in the thread
1230           loop this ndarray is treated as if having a dummy dimension of size
1231           equal to the size of that loop dimension.
1232
1233       T4a All ndarrays that do have thread dimensions must have the same
1234           number of thread dimensions.
1235
1236       T5  Output auto-creation cannot be used if any of the ndarray arguments
1237           has any thread dimensions. Otherwise R5 applies.
1238
1239       The same restrictions apply with regard to implicit dummy dimensions
1240       (created by application of T4) as already mentioned in the section on
1241       implicit threading: if any of the output ndarrays has an (explicit or
1242       implicitly created) greater-than-one dummy dimension a runtime
1243       exception will be raised.
1244
1245       Let us demonstrate these rules at work in a generic case.  Suppose we
1246       have a (here unspecified) PP-function with the signature:
1247
1248        func((m,n),(m),(),[o](m))
1249
       and you call it with 3 ndarrays "a(5,3,10,11)", "b(3,5,10,1,12)",
       "c(10)" and an output ndarray "d(3,11,5,10,12)" (which here cannot
       be automatically created) as

        func($a->thread(1,3),$b->thread(0,3),$c,$d->thread(0,1))
1255
1256       From the signature of func and the above call the ndarrays split into
1257       the following groups of core, extra and thread dimensions (written in
1258       the form "pdl(core dims){thread dims}[extra dims]"):
1259
1260        a(5,10){3,11}[] b(5){3,1}[10,12] c(){}[10] d(5){3,11}[10,12]
1261
       With this to help us along (it is in general helpful to write the
       arguments down like this when you start playing with threading and
       want to keep track of what is going on) we further deduce that the
       number of explicit loop dimensions is 2 (by T1b, from $a and $b)
       with sizes "(3,11)" (by T2); that there are 2 implicit loop
       dimensions (by T1a, from $b and $d) of size "(10,12)" (by T2); and
       that the elements of $d are computed from the input ndarrays in a
       way that can be expressed in pdl pseudo-code as
1269
        for (l=0;l<12;l++)
         for (k=0;k<10;k++)
          for (j=0;j<11;j++)
           for (i=0;i<3;i++)
              d(i,j,:,k,l) = func(a(:,i,:,j),b(i,:,k,0,l),c(k))
        # note the constant index 0 in b: its size-1 thread dimension is
        # treated as a dummy dimension, so the loop index j maps to 0 there
1275
       Ugh, this example was really not easy in terms of bookkeeping. It
       serves mostly as an example of how to figure out what's going on
       when you encounter a complicated-looking expression. But now it is
       really time to show that threading is useful by giving some more
       of our so-called "practical" examples.
1281
1282       [ The following examples will need some additional explanations in the
1283       future. For the moment please try to live with the comments in the code
1284       fragments. ]
1285
1286       Example 1:
1287
        # inverse of a matrix represented by eigvecs and eigvals
        # given a symmetric matrix M = A^T x diag(lambda_i) x A
        #    =>  inverse M^-1 = A^T x diag(1/lambda_i) x A
        # first $tmp = diag(1/lambda_i)*A
        # then  A^T * $tmp by threaded inner product
        # index handling so that matrices print correctly under pdl
        $inv = $evecs * 0;     # just to get an appropriately sized output
        $tmp = $evecs->copy;   # initialise, no back-propagation
        ($tmp2 = $tmp->thread(0)) /= $evals;    # threaded division
        # and now a matrix multiplication in disguise
        PDL::Primitive::inner($evecs->xchg(0,1)->thread(-1,1),
                              $tmp->thread(0,-1),
                              $inv->thread(0,1));
        # alternative for matrix mult using implicit threading,
        # first xchg only for transpose
        PDL::Primitive::inner($evecs->xchg(0,1)->dummy(1),
                              $tmp->xchg(0,1)->dummy(2),
                              ($inv=null));
1306
1307       Example 2:
1308
        # outer product by threaded multiplication
        # note: with explicit threading we must call the function (here
        # my_biop1) explicitly with an output argument, since output
        # auto-creation is not available (rule T5)
        $res = zeroes(($x->dims)[0],($y->dims)[0]);
        my_biop1($x->thread(0,-1),$y->thread(-1,0),$res->thread(0,1),"*");
1314        # similar thing by implicit threading with auto-created ndarray
1315        $res = $x->dummy(1) * $y->dummy(0);
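
       A quick check of the implicit variant with two small made-up
       vectors shows the expected outer product:

        pdl> $x = pdl(1,2,3)
        pdl> $y = pdl(10,20)
        pdl> p $x->dummy(1) * $y->dummy(0)
        [
         [10 20 30]
         [20 40 60]
        ]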
1316
1317       Example 3:
1318
        # different use of thread and unthread to shuffle a number of
        # dimensions in one go without lots of calls to ->xchg and ->mv
        # just try it out and compare the child ndarray with its parent
        $trans = $x->thread(4,1,0,3,2)->unthread;
1326
1327       Example 4:
1328
1329        # calculate a couple of bounding boxes
1330        # $bb will hold BB as [xmin,xmax],[ymin,ymax],[zmin,zmax]
1331        # we use again thread and unthread to shuffle dimensions around
1332        pdl> $bb = zeroes(double, 2,3 );
1333        pdl> minimum($vertices->thread(0)->clump->unthread(1), $bb->slice('(0),:'));
1334        pdl> maximum($vertices->thread(0)->clump->unthread(1), $bb->slice('(1),:'));
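
       For instance, with a made-up set of 100 random 3D vertices (so
       $vertices has dims "(3,100)") this fills $bb with the minimum and
       maximum of each coordinate:

        pdl> $vertices = random(3,100)    # 100 points, (x,y,z) along dim 0
        pdl> $bb = zeroes(double, 2,3)
        pdl> minimum($vertices->thread(0)->clump->unthread(1), $bb->slice('(0),:'))
        pdl> maximum($vertices->thread(0)->clump->unthread(1), $bb->slice('(1),:'))
        pdl> p $bb      # each printed row is [min,max] of one coordinate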
1335
1336       Example 5:
1337
1338        # calculate a self-rationed (i.e. self normalized) sequence of images
1339        # uses explicit threading and an implicitly threaded division
1340        $stack = read_image_stack();
1341        # calculate the average (per pixel average) of the first $n+1 images
        $aver = zeroes(($stack->dims)[0,1]);  # make the output ndarray
        sumover($stack->slice(":,:,0:$n")->thread(0,1),$aver->thread(0,1));
1344        $aver /= ($n+1);
1345        $stack /= $aver;  # normalize the stack by doing a threaded division
1346        # implicit versus explicit
1347        # alternatively calculate $aver with implicit threading and auto-creation
1348        sumover($stack->slice(":,:,0:$n")->mv(2,0),($aver=null));
1349        $aver /= ($n+1);
1350        #
1351
1352   Implicit versus explicit threading
       In this section we are going to illustrate when explicit threading
       is preferable over implicit threading and vice versa. But then
       again, this is probably not the best way of putting it since you
       already know: the two flavours do mix. So it's more about how to
       get the best of both worlds and, anyway, in the best of Perl
       traditions: TIMTOWTDI!
1358
1359       [ Sorry, this still has to be filled in in a later release; either
1360       refer to above examples or choose some new ones ]
1361
       Finally, this may be a good place to justify all the technical
       detail we have been going on about for a couple of pages: why
       threading?
1364
       Well, code that uses threading should be (considerably) faster
       than code that uses explicit for-loops (or similar Perl
       constructs) to achieve the same functionality. Especially on
       supercomputers (with vector computing facilities/parallel
       processing) PDL threading will be implemented in a way that takes
       advantage of the additional facilities of these machines.
       Furthermore, it is a conceptually simple construct (though
       technical details might get involved at times) and can greatly
       reduce the syntactical complexity of PDL code (but keep the
       admonition for documentation in mind). Once you are comfortable
       with the threading way of thinking (and coding) it shouldn't be
       too difficult to understand code that somebody else has written
       (provided they gave you an idea of the expected input dimensions,
       etc.). As a general tip to increase the performance of your code:
       if you have to introduce a loop into your code, try to reformulate
       the problem so that you can use threading to perform the loop (as
       with anything there are exceptions to this rule of thumb; but the
       authors of this document tend to think that these are rare cases
       ;).
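
       To make the style difference concrete, here is a small made-up
       comparison: summing the rows of an image with a Perl-level loop
       versus a single threaded call (timings will of course vary, but
       the threaded version runs its loop in compiled code):

        my $im = random(1000,1000);     # some made-up image

        # Perl-level loop: one slice and one sum per row
        my @rowsums;
        for my $j (0 .. $im->dim(1)-1) {
            push @rowsums, $im->slice(":,($j)")->sum;
        }
        my $slow = pdl(@rowsums);

        # threaded equivalent: a single call, the loop runs in C
        my $fast = $im->sumover;        # same numbers as $slow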
1382

PDL::PP

1384   An easy way to define functions that are aware of indexing and threading
1385       (and the universe and everything)
       PDL::PP is part of the PDL distribution. It is used to generate
       functions that are aware of indexing and threading rules from very
       concise descriptions. It can be useful for you if you want to
       write your own functions or if you want to interface functions
       from an external library so that they support indexing and
       threading (and maybe dataflow as well, see PDL::Dataflow). For
       further details check PDL::PP.
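
       To give a flavour of how concise such a description is, here is a
       minimal sketch of a PP definition for a sum-like function with the
       familiar signature "((n),[o]())"; see PDL::PP for the real details
       of "pp_def" and its "Pars" and "Code" keys:

        # mysum.pd -- processed by PDL::PP into C code plus a Perl wrapper
        pp_def('mysum',
            Pars => 'a(n); [o]b()',
            Code => '
                double tmp = 0;
                loop(n) %{ tmp += $a(); %}
                $b() = tmp;
            ',
        );
        pp_done();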
1393

Appendix A

1395   Affine transformations - a special class of simple and powerful
1396       transformations
1397       [ This is also something to be added in future releases. Do we already
1398       have the general make_affine routine in PDL ? It is possible that we
1399       will reference another appropriate man page from here ]
1400

Appendix B

1402   signatures of standard PDL::PP compiled functions
       A selection of signatures of PDL primitives to show how many
       dimensions PP compiled functions gobble up (and therefore what
       will be threaded over). Most of these functions are the basic ones
       defined in "primitive.pd".
1407
1408        # functions in primitive.pd
1409        #
1410        sumover        ((n),[o]())
1411        prodover       ((n),[o]())
1412        axisvalues     ((n))                                   inplace
1413        inner          ((n),(n),[o]())
1414        outer          ((n),(m),[o](n,m))
1415        innerwt        ((n),(n),(n),[o]())
1416        inner2         ((m),(m,n),(n),[o]())
1417        inner2t        ((j,n),(n,m),(m,k),[o]())
        index          ((n),(),[o]())
        minimum        ((n),[o]())
        maximum        ((n),[o]())
1421        wstat          ((n),(n),(),[o],())
1422        assgn          ((),())
1423
1424        # basic operations
1425        binary operations ((),(),[o]())
1426        unary operations  ((),[o]())
1427
COPYRIGHT
       Copyright (C) 1997 Christian Soeller (c.soeller@auckland.ac.nz) &
1430       Tuomas J. Lukka (lukka@fas.harvard.edu). All rights reserved. Although
1431       destined for release as a man page with the standard PDL distribution,
1432       it is not public domain. Permission is granted to freely distribute
1433       verbatim copies of this document provided that no modifications outside
1434       of formatting be made, and that this notice remain intact.  You are
1435       permitted and encouraged to use its code and derivatives thereof in
1436       your own source code for fun or for profit as you see fit.
1437
1438
1439
1440perl v5.34.0                      2021-08-16                       INDEXING(1)