PDL::Indexing(1)

INDEXING(1)           User Contributed Perl Documentation          INDEXING(1)

NAME

       PDL::Indexing - how to index piddles.

DESCRIPTION

       This manpage should serve as a first tutorial on the indexing and
       threading features of PDL.

       This manpage is still in alpha development and not yet complete. "Meta"
       comments that point out deficiencies/omissions of this document will be
       surrounded by square brackets ([]), e.g. [ Hopefully I will be able to
       remove this paragraph at some time in the future ]. Furthermore, it is
       possible that there are errors in the code examples. Please report any
       errors to Christian Soeller (c.soeller@auckland.ac.nz).

       Still to be done are (please bear with us and/or ask on the mailing
       list, see PDL::FAQ):

       ·    document perl level threading

       ·    threadids

       ·    update and correct description of slice

       ·    new functions in slice.pd (affine, lag, splitdim)

       ·    reworking of paragraph on explicit threading

Indexing and threading with PDL

       A lot of the flexibility and power of PDL relies on the indexing and
       looping features of the perl extension. Indexing allows access to the
       data of a pdl object in a very flexible way. Threading provides
       efficient implicit looping functionality (since the loops are
       implemented as optimized C code).

       Pdl objects (later often called "pdls") are perl objects that represent
       multidimensional arrays and operations on those. In contrast to simple
       perl @x style lists the array data is compactly stored in a single
       block of memory thus taking up a lot less memory and enabling use of
       fast C code to implement operations (e.g. addition, etc) on pdls.

       pdls can have children

       Central to many of the indexing capabilities of PDL is the relation of
       "parent" and "child" between pdls. Many of the indexing commands create
       a new pdl from an existing pdl. The new pdl is the "child" and the old
       one is the "parent". The data of the new pdl is defined by a
       transformation that specifies how to generate (compute) its data from
       the parent's data. The relation between the child pdl and its parent is
       often bidirectional, meaning that changes in the child's data are
       propagated back to the parent. (Note: You see, we are aiming in our
       terminology already towards the new dataflow features. The kind of
       dataflow that is used by the indexing commands (about which you will
       learn in a minute) is always in operation, not only when you have
       explicitly switched on dataflow in your pdl by saying "$a->doflow". For
       further information about data flow check the dataflow manpage.)

       Another way to interpret the pdls created by our indexing commands is
       to view them as a kind of intelligent pointer that points back to some
       portion or all of its parent's data. Therefore, it is not surprising
       that the parent's data (or a portion of it) changes when manipulated
       through this "pointer". After these introductory remarks, which
       hopefully prepared you for what is coming (rather than confusing you
       too much), we are going to dive right in and start with a description
       of the indexing commands and some typical examples of how they might be
       used in PDL programs. We will further illustrate the pointer/dataflow
       analogies in the context of some of the examples later on.

       There are two different implementations of this ``smart pointer''
       relationship. The first one, which is a little slower but works for any
       transformation, is simply to do the transformation forwards and
       backwards as necessary. The other is to consider the child piddle a
       ``virtual'' piddle, which only stores a pointer to the parent and
       access information, so that routines which use the child piddle
       actually directly access the data in the parent. If the virtual piddle
       is given to a routine which cannot use it, PDL transparently
       physicalizes the virtual piddle before letting the routine use it.

       Currently (1.94_01) all transformations which are ``affine'', i.e. the
       indices of the data item in the parent piddle are determined by a
       linear transformation (+ constant) from the indices of the child
       piddle, result in virtual piddles. All other indexing routines (e.g.
       "->index(...)") result in physical piddles. All routines compiled by PP
       can accept affine piddles (except those routines that pass pointers to
       external library functions).

       Note that whether something is affine or not does not affect the
       semantics of what you do in any way: both

        $a->index(...) .= 5;
        $a->slice(...) .= 5;

       change the data in $a. The affinity does, however, have a significant
       impact on memory usage and performance.
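
       As a minimal sketch of that point (the data and variable names are
       made up for illustration, and a reasonably recent PDL is assumed), the
       following writes through an affine child created by slice and through a
       non-affine child created by index. In both cases the parent changes:

```perl
use strict;
use warnings;
use PDL;

# Hypothetical data, chosen only for illustration.
my $x = sequence(6);                 # [0 1 2 3 4 5]

# Affine child: a slice is a view on the parent's data.
my $s = $x->slice('0:1');
$s .= 5;                             # propagated back to $x

# Non-affine child: index() with a vector of positions.
my $i = $x->index(pdl(3, 5));
$i .= 7;                             # also propagated back to $x

print $x, "\n";                      # [5 5 2 7 4 7]
```

       The temporary variables are used only so that the sketch does not
       depend on lvalue subroutine support.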

       Slicing pdls

       Probably the most important application of the concept of parent/child
       pdls is the representation of rectangular slices of a physical pdl by a
       virtual pdl. Having talked long enough about concepts, let's get more
       specific. Suppose we are working with a 2D pdl representing a 5x5 image
       (it's unusually small so that we can print it without filling several
       screens full of digits ;).

        perldl> $im = sequence(5,5)
        perldl> p $im

        [
         [ 0  1  2  3  4]
         [ 5  6  7  8  9]
         [10 11 12 13 14]
         [15 16 17 18 19]
         [20 21 22 23 24]
        ]

        perldl> help vars
        PDL variables in package main::

        Name         Type   Dimension       Flow  State          Mem
        ----------------------------------------------------------------
        $im          Double D [5,5]                P            0.20Kb

       [ here it might be appropriate to quickly talk about the "help vars"
       command that provides information about pdls in the interactive
       "perldl" shell that comes with pdl.  ]

       Now suppose we want to create a 1-D pdl that just references one line
       of the image, say line 2; or a pdl that represents all even lines of
       the image (imagine we have to deal with even and odd frames of an
       interlaced image due to some peculiar behaviour of our frame grabber).
       As another frequent application of slices we might want to create a pdl
       that represents a rectangular region of the image with top and bottom
       reversed. All these effects (and many more) can be easily achieved with
       the powerful slice function:

        perldl> $line = $im->slice(':,(2)')
        perldl> $even = $im->slice(':,1:-1:2')
        perldl> $area = $im->slice('3:4,3:1')
        perldl> help vars  # or just PDL->vars
        PDL variables in package main::

        Name         Type   Dimension       Flow  State          Mem
        ----------------------------------------------------------------
        $even        Double D [5,2]                -C           0.00Kb
        $im          Double D [5,5]                P            0.20Kb
        $line        Double D [5]                  -C           0.00Kb
        $area        Double D [2,3]                -C           0.00Kb

       All three "child" pdls are children of $im or, in the other (largely
       equivalent) interpretation, pointers to data of $im. Operations on
       those virtual pdls access only those portions of the data that are
       specified by the argument to slice. So we can just print line 2:

        perldl> p $line
        [10 11 12 13 14]

       Also note the difference in the "Flow State" of $area above and below:

        perldl> p $area
        perldl> help $area
        This variable is Double D [2,3]                VC           0.00Kb

       The following demonstrates that $im and $line really behave as you
       would expect from a pointer-like object (or in the dataflow picture:
       the changes in $line's data are propagated back to $im):

        perldl> $im++
        perldl> p $line
        [11 12 13 14 15]
        perldl> $line += 2
        perldl> p $im

        [
         [ 1  2  3  4  5]
         [ 6  7  8  9 10]
         [13 14 15 16 17]
         [16 17 18 19 20]
         [21 22 23 24 25]
        ]

       Note how assignment operations on the child virtual pdls change the
       parent physical pdl and vice versa (however, the basic "=" assignment
       doesn't; use ".=" to obtain that effect. See below for the reasons).
       The virtual child pdls are something like "live links" to the
       "original" parent pdl. As previously said, they can be thought of as
       working similarly to a C pointer. But in contrast to a C pointer they
       carry a lot more information. Firstly, they specify the structure of
       the data they represent (the dimensionality of the new pdl) and
       secondly, they specify how to create this structure from the parent's
       data (the way this works is buried in the internals of PDL and not
       important for you to know anyway, unless you want to hack the core in
       the future or would like to become a PDL guru in general; for a
       definition of this strange creature see PDL::Internals).

       The previous examples have demonstrated typical usage of the slice
       function. Since the slicing functionality is so important, here is an
       explanation of the syntax for the string argument to slice:

        $vpdl = $a->slice('ind0,ind1...')

       where "ind0" specifies what to do with index No 0 of the pdl $a, etc.
       Each element of the comma separated list can have one of the following
       forms:

       ':'   Use the whole dimension

       'n'   Use only index "n". The size of this dimension in the resulting
             virtual pdl is 1. An example involving those first two index
             formats:

              perldl> $column = $im->slice('2,:')
              perldl> $row = $im->slice(':,0')
              perldl> p $column

              [
               [ 3]
               [ 8]
               [15]
               [18]
               [23]
              ]

              perldl> p $row

              [
               [1 2 3 4 5]
              ]

              perldl> help $column
              This variable is Double D [1,5]                VC           0.00Kb

              perldl> help $row
              This variable is Double D [5,1]                VC           0.00Kb

       '(n)' Use only index "n". This dimension is removed from the resulting
             pdl (relying on the fact that a dimension of size 1 can always be
             removed). The distinction between this case and the previous one
             becomes important in assignments, where left and right hand side
             have to have appropriate dimensions.

              perldl> $line = $im->slice(':,(0)')
              perldl> help $line
              This variable is Double D [5]                  -C           0.00Kb

              perldl> p $line
              [1 2 3 4 5]

             Spot the difference from the previous example?

       'n1:n2' or 'n1:n2:n3'
             Take the range of indices from "n1" to "n2" or (second form) take
             the range of indices from "n1" to "n2" with step "n3". An example
             for the use of this format is the previous definition of the
             subimage composed of even lines.

              perldl> $even = $im->slice(':,1:-1:2')

             This example also demonstrates that negative indices work like
             they do for normal perl style arrays by counting backwards from
             the end of the dimension (in the example -1 is equivalent to
             index 4). If "n2" is smaller than "n1", the elements in the
             virtual pdl are effectively reversed with respect to its parent.

       '*[n]'
             Add a dummy dimension. The size of this dimension will be 1 by
             default or equal to "n" if the optional numerical argument is
             given.

             Now, this is really something a bit strange at first sight. What
             is a dummy dimension? A dummy dimension inserts a dimension where
             there wasn't one before. How is that done? Well, in the case of
             the new dimension having size 1 it can be easily explained by the
             way in which you can identify a vector (with "m" elements) with
             a "(1,m)" or "(m,1)" matrix. The same obviously holds for higher
             dimensional objects. More interesting is the case of a dummy
             dimension of size greater than one (e.g. "slice('*5,:')"). This
             works in the same way as a call to the dummy function creates a
             new dummy dimension. So read on and check its explanation below.

       '([n1:n2[:n3]]=i)'
             [Not yet implemented ??????]  With an argument like this you make
             generalised diagonals. The diagonal will be dimension no. "i" of
             the new output pdl and (if the optional part in brackets is
             specified) will extend along the specified range of indices of
             the respective parent pdl's dimension. In general an argument
             like this only makes sense if there are other arguments like
             this in the same call to slice. All arguments of this type that
             specify the same target dimension "i" have to relate to the same
             number of indices in their parent dimension. The best way to
             explain it is probably to give an example. Here we make a pdl
             that refers to the elements along the space diagonal of its
             parent pdl (a cube):

              $cube = zeroes(5,5,5);
              $sdiag = $cube->slice('(=0),(=0),(=0)');

             The above command creates a virtual pdl that represents the
             diagonal along the parent's dimensions no. 0, 1 and 2 and makes
             it dimension 0 (the only dimension) of the new pdl. You use the
             extended syntax if the parent dimensions you want to build the
             diagonal from have different sizes or you want to reverse the
             sequence of elements in the diagonal, e.g.

              $rect = zeroes(12,3,5,6,2);
              $vpdl = $rect->slice('2:7,(0:1=1),(4),(5:4=1),(=1)');

             So the elements of $vpdl will then be related to those of its
             parent in a way we can express as:

               vpdl(i,j) = rect(i+2,j,4,5-j,j)       0<=i<6, 0<=j<2
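
       The implemented argument forms can be compared side by side by looking
       at the dimensions of the resulting children. This is only a sketch: it
       uses a fresh 5x5 test image, and all variable names are chosen for
       illustration.

```perl
use strict;
use warnings;
use PDL;

my $im = sequence(5,5);                       # fresh 5x5 test image

my $kept    = $im->slice(':,0');              # 'n'   keeps a size-1 dimension
my $dropped = $im->slice(':,(0)');            # '(n)' removes the dimension
my $step    = $im->slice(':,1:-1:2');         # lines 1 and 3
my $flipped = $im->slice(':,-1:0');           # lines in reverse order
my $dummy   = $dropped->slice('*3,:');        # new leading dimension of size 3

print join(',', $_->dims), "\n"
    for $kept, $dropped, $step, $flipped, $dummy;
# 5,1
# 5
# 5,2
# 5,5
# 3,5

print $flipped->slice(':,(0)'), "\n";         # [20 21 22 23 24]
```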

       [ work in the new index function: "$b = $a->index($c);" ???? ]

       There are different kinds of assignments in PDL

       The previous examples have already shown that virtual pdls can be used
       to operate on or access portions of data of a parent pdl. They can also
       be used as lvalues in assignments (as the use of "++" in some of the
       examples above has already demonstrated). For explicit assignments to
       the data represented by a virtual pdl you have to use the overloaded
       ".=" operator (which in this context we call propagated assignment).
       Why can't you use the normal assignment operator "="?

       Well, you definitely still can use the '=' operator but it wouldn't do
       what you want. This is due to the fact that the '=' operator cannot be
       overloaded in the same way as other assignment operators. If we tried
       to use '=' to assign data to a portion of a physical pdl through a
       virtual pdl we wouldn't achieve the desired effect. Instead, the
       variable representing the virtual pdl (a reference to a blessed thingy)
       would after the assignment just contain the reference to another
       blessed thingy, which would behave to future assignments as a
       "physical" copy of the original rvalue [this is actually not yet clear
       and subject of discussions in the PDL developers mailing list]. In that
       sense it would break the connection of the pdl to the parent [ isn't
       this behaviour in a sense the opposite of what happens in dataflow,
       where ".=" breaks the connection to the parent? ].

       E.g.

        perldl> $line = $im->slice(':,(2)')
        perldl> $line = zeroes(5);
        perldl> $line++;
        perldl> p $im

        [
         [ 1  2  3  4  5]
         [ 6  7  8  9 10]
         [13 14 15 16 17]
         [16 17 18 19 20]
         [21 22 23 24 25]
        ]

        perldl> p $line
        [1 1 1 1 1]

       But using ".="

        perldl> $line = $im->slice(':,(2)')
        perldl> $line .= zeroes(5)
        perldl> $line++
        perldl> p $im

        [
         [ 1  2  3  4  5]
         [ 6  7  8  9 10]
         [ 1  1  1  1  1]
         [16 17 18 19 20]
         [21 22 23 24 25]
        ]

        perldl> print $line
        [1 1 1 1 1]

       Also, you can substitute

        perldl> $line .= 0;

       for the assignment above (the zero is converted to a scalar piddle
       with no dimensions, so it can be assigned to any piddle).

       Related to the assignment feature is a little trap for the unwary:
       since perl currently does not allow subroutines to return lvalues, the
       following shortcut of the above is flagged as a compile time error:

        perldl> $im->slice(':,(2)') .= zeroes(5)->xvals->float

       Instead you have to say something like

        perldl> ($pdl = $im->slice(':,(2)')) .= zeroes(5)->xvals->float

       We hope that future versions of perl will allow the simpler syntax
       (i.e. allow subroutines to return lvalues).  [Note: perl v5.6.0 does
       allow this, but it is an experimental feature. However, early reports
       suggest it works in simple situations]
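
       For comparison, here is a sketch of both spellings. It assumes a Perl
       and PDL recent enough that slice is usable as an lvalue subroutine; on
       older installations only the second form with the temporary variable
       is expected to work. The variable names are illustrative.

```perl
use strict;
use warnings;
use PDL;

my $im = sequence(5,5);

# Direct form: requires lvalue subroutine support (an assumption about
# the installed Perl/PDL versions).
$im->slice(':,(2)') .= xvals(5);

# Portable form: assign the slice to a temporary first.
(my $row = $im->slice(':,(3)')) .= 0;

print $im;
```

       After this, line 2 of $im holds the values 0..4 and line 3 holds zeros,
       while the remaining lines are untouched.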

       Note that there can be a problem with assignments like this when lvalue
       and rvalue pdls refer to overlapping portions of data in the parent
       pdl:

        # reverse the elements of line 1 of $a
        ($tmp = $a->slice(':,(1)')) .= $a->slice('-1:0,(1)');

       Currently, the parent data on the right side of the assignment is not
       copied before the (internal) assignment loop proceeds. Therefore, the
       outcome of this assignment will depend on the sequence in which
       elements are assigned and will almost certainly not be what you wanted.
       So the semantics are currently undefined and liable to change anytime.
       To obtain the desired behaviour, use

        ($tmp = $a->slice(':,(1)')) .= $a->slice('-1:0,(1)')->copy;

       which makes a physical copy of the slice or

        ($tmp = $a->slice(':,(1)')) .= $a->slice('-1:0,(1)')->sever;

       which returns the same slice but severs the connection of the slice to
       its parent.
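
       The copy variant can be tried on a small pdl. This is a minimal sketch
       with made-up data; only the version using copy is shown because its
       result does not depend on the order in which elements are assigned.

```perl
use strict;
use warnings;
use PDL;

my $x = sequence(5,2);            # line 0: 0..4, line 1: 5..9

# The right hand side is copied first, so source and target no longer
# share memory while the assignment loop runs.
(my $row = $x->slice(':,(1)')) .= $x->slice('-1:0,(1)')->copy;

print $x->slice(':,(1)'), "\n";   # [9 8 7 6 5]
```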

       Other functions that manipulate dimensions

       Having talked extensively about the slice function it should be noted
       that this is not the only PDL indexing function. There are additional
       indexing functions which are also useful (especially in the context of
       threading which we will talk about later). Here is a list and some
       examples of how to use them.

       "dummy"
           inserts a dummy dimension of the size you specify (default 1) at
           the chosen location. You can't wait to hear how that is achieved?
           Well, all elements with index "(X,x,Y)" ("0<=x<size_of_dummy_dim")
           just map to the element with index "(X,Y)" of the parent pdl (where
           "X" and "Y" refer to the group of indices before and after the
           location where the dummy dimension was inserted.)

           This example calculates the x coordinate of the centroid of an
           image (later we will learn that we didn't actually need the dummy
           dimension thanks to the magic of implicit threading; but using
           dummy dimensions the code would also work in a threadless world;
           though once you have worked with PDL threads you wouldn't want to
           live without them again).

            # centroid
            ($xd,$yd) = $im->dims;
            $xc = sum($im*xvals(zeroes($xd))->dummy(1,$yd))/sum($im);

           Let's explain how that works in a little more detail. First, the
           product:

            $xvs = xvals(zeroes($xd));
            print $xvs->dummy(1,$yd);       # repeat the line $yd times
            $prod = $im*$xvs->dummy(1,$yd); # form the pixelwise product with
                                            # the repeated line of x-values

           The rest is then summing the results of the pixelwise product
           together and normalising with the sum of all pixel values in the
           original image, thereby calculating the x-coordinate of the "center
           of mass" of the image (interpreting pixel values as local mass),
           which is known as the centroid of an image.

           Next is a (from the point of view of memory consumption) very cheap
           conversion from greyscale to RGB, i.e. every pixel now holds a
           triple of values instead of a scalar. The three values in the
           triple are, fortunately, all the same for a grey image, so that our
           trick works well in that it maps all three members of the triple
           to the same source element:

            # a cheap greyscale to RGB conversion
            $rgb = $grey->dummy(0,3)

           Unfortunately this trick cannot be used to convert your old B/W
           photos to color ones in the way you'd like. :(

           Note that the memory usage of piddles with dummy dimensions is
           especially sensitive to the internal representation. If the piddle
           can be represented as a virtual affine (``vaffine'') piddle, only
           the control structures are stored. But if $b in

            $a = zeroes(10000);
            $b = $a->dummy(1,10000);

           is made physical by some routine, you will find that the memory
           usage of your program has suddenly grown by roughly 800MB (10^8
           double precision values of 8 bytes each).

       "diagonal"
           replaces two dimensions (which have to be of equal size) by one
           dimension that references all the elements along the "diagonal"
           along those two dimensions. Here, we have two examples which should
           appear familiar to anyone who has ever done some linear algebra.
           Firstly, make a unity matrix:

            # unity matrix
            $e = zeroes(float, 3, 3); # make everything zero
            ($tmp = $e->diagonal(0,1)) .= 1; # set the elements along the diagonal to 1
            print $e;

           Or the other diagonal:

            ($tmp = $e->slice(':,-1:0')->diagonal(0,1)) .= 2;
            print $e;

           (Did you notice how we used the slice function to reverse the
           sequence of lines before setting the diagonal of the new child,
           thereby setting the cross diagonal of the parent?)  Or a mapping
           from the space of square matrices to the field over which the
           matrices are defined, the trace of a matrix:

            # trace of a matrix
            $trace = sum($mat->diagonal(0,1));  # sum all the diagonal elements

       "xchg" and "mv"
           xchg exchanges or "transposes" the two specified dimensions. A
           straightforward example:

            # transpose a matrix (without explicitly reshuffling data and
            # making a copy)
            $prod = $a x $a->xchg(0,1);

           $prod should now be pretty close to the unity matrix if $a is an
           orthogonal matrix. Often "xchg" will be used in the context of
           threading but more about that later.

           mv works in a similar fashion. It moves a dimension (specified by
           its number in the parent) to a new position in the new child pdl:

            $b = $a->mv(4,0);  # make the 5th dimension of $a the first in the
                               # new child $b

           The difference between "xchg" and "mv" is that "xchg" only swaps
           the positions of the two named dimensions, whereas "mv" moves the
           first named dimension to the position given by the second
           argument, shifting the other dimensions around accordingly.

       "clump"
           collapses several dimensions into one. Its only argument specifies
           how many dimensions of the source pdl should be collapsed (starting
           from the first). An (admittedly unrealistic) example is a 3D pdl
           which holds data from a stack of image files that you have just
           read in. However, the data from each image really represents a 1D
           time series and has only been arranged that way because it was
           digitized with a frame grabber. So to have it again as an array of
           time sequences you say

            perldl> $seqs = $stack->clump(2)
            perldl> help vars
            PDL variables in package main::

            Name         Type   Dimension       Flow  State          Mem
            ----------------------------------------------------------------
            $seqs        Double D [8000,50]            -C           0.00Kb
            $stack       Double D [100,80,50]          P            3.05Mb

           Unrealistic as it may seem, our confocal microscope software writes
           data (sometimes) this way. But more often you use clump to achieve
           a certain effect when using implicit or explicit threading.
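
       The functions in this list can be exercised together on small test
       data. The following is a sketch only: the image, the pixel values and
       all variable names are invented for illustration, and the dimension
       sizes were picked so that the results are easy to check by hand.

```perl
use strict;
use warnings;
use PDL;

# dummy: x coordinate of the centroid of a tiny test image
my $im = zeroes(5,4);
set($im, 1, 0, 1);                        # pixel (x=1,y=0) has value 1
set($im, 3, 2, 3);                        # pixel (x=3,y=2) has value 3
my ($xd, $yd) = $im->dims;
my $xc = sum($im * xvals(zeroes($xd))->dummy(1,$yd)) / sum($im);
print "centroid x: $xc\n";                # (1*1 + 3*3)/4 = 2.5

# diagonal: unity matrix, cross diagonal and trace
my $e = zeroes(3,3);
(my $diag  = $e->diagonal(0,1)) .= 1;
(my $cross = $e->slice(':,-1:0')->diagonal(0,1)) .= 2;
my $trace = sum($e->diagonal(0,1));       # 1 + 2 + 1 = 4
print $e, "trace: $trace\n";

# xchg, mv and clump only rearrange dimensions
my $x = zeroes(2,3,4,5,6);
print join(',', $x->xchg(0,4)->dims), "\n";   # 6,3,4,5,2
print join(',', $x->mv(4,0)->dims),   "\n";   # 6,2,3,4,5
print join(',', $x->clump(2)->dims),  "\n";   # 6,4,5,6
```

       Note that the centre element of $e ends up as 2 because it lies on
       both diagonals and the cross diagonal is assigned last.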

       Calls to indexing functions can be chained

       As you might have noticed in some of the examples above, calls to the
       indexing functions can be nicely chained since all of these functions
       return a newly created child object. However, when doing extensive
       index manipulations in a chain be sure to keep track of what you are
       doing, e.g.

        $a->xchg(0,1)->mv(0,4)

       moves the dimension 1 of $a to position 4 since when the second command
       is executed the original dimension 1 has been moved to position 0 of
       the new child that calls the "mv" function. I think you get the idea
       (in spite of my convoluted explanations).
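
       Giving every dimension a different size makes it easy to follow where
       each one ends up. A small sketch, with sizes chosen only for
       illustration:

```perl
use strict;
use warnings;
use PDL;

my $x = zeroes(2,3,4,5,6);            # dimension 1 has size 3

my $y = $x->xchg(0,1);                # swap dimensions 0 and 1
print join(',', $y->dims), "\n";      # 3,2,4,5,6

my $z = $y->mv(0,4);                  # move the new dimension 0 to position 4
print join(',', $z->dims), "\n";      # 2,4,5,6,3
```

       The dimension of size 3, which was dimension 1 of the original pdl, is
       now at position 4.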

       Propagated assignments ('.=') and dummy dimensions

       A subtlety related to indexing is the assignment to pdls containing
       dummy dimensions of size greater than 1. These assignments (using ".=")
       should be avoided since several elements of the lvalue pdl point to the
       same element of the parent. As a consequence the values of those parent
       elements are potentially ambiguous and would depend on the sequence in
       which the implementation makes the assignments to elements. Therefore,
       an assignment like this:

        $a = pdl [1,2,3];
        $b = $a->dummy(1,4);
        $b .= yvals(zeroes(3,4));

       can produce unexpected results and the results are explicitly undefined
       by PDL because when PDL gets parallel computing features, the current
       result may well change.

       From the point of view of dataflow the introduction of greater-size-
       than-one dummy dimensions is regarded as an irreversible transformation
       (similar to the terminology in thermodynamics) which precludes backward
       propagation of assignment to a parent (which you had explicitly
       requested using the ".=" assignment). A similar problem to watch out
       for occurs in the context of threading where sometimes dummy dimensions
       are created implicitly during the thread loop (see below).
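
       If what you actually want is an independent array of the enlarged
       shape, one possible way around the ambiguity (a sketch, not the only
       option) is to make a physical copy of the pdl with the dummy dimension
       and assign to that. The copy has one storage location per element, so
       the assignment is well defined, and the parent is left untouched:

```perl
use strict;
use warnings;
use PDL;

my $x = pdl [1,2,3];
my $d = $x->dummy(1,4);           # virtual, dims (3,4): every line aliases $x
my $c = $d->copy;                 # physical copy, no longer linked to $x

$c .= yvals(zeroes(3,4));         # 12 distinct target elements

print $c;                         # lines of 0s, 1s, 2s and 3s
print $x, "\n";                   # [1 2 3]
```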

       Reasons for the parent/child (or "pointer") concept

       [ this will have to wait a bit ]

        XXXXX being memory efficient
        XXXXX in the context of threading
        XXXXX very flexible and powerful way of accessing portions of pdl data
              (in much more general way than sec, etc allow)
        XXXXX efficient implementation
        XXXXX difference to section/at, etc.

       How to make things physical again

       [ XXXXX fill in later when everything has settled a bit more ]

        ** When needed (xsub routine interfacing C lib function)
        ** How achieved (->physical)
        ** How to test (isphysical (explain how it works currently))
        ** ->copy and ->sever

Threading

       In the previous paragraph on indexing we have already mentioned the
       term occasionally but now it's really time to talk explicitly about
       "threading" with pdls. The term threading has many different meanings
       in different fields of computing. Within the framework of PDL it could
       probably be loosely defined as an implicit looping facility. It is
       implicit because you don't specify anything like enclosing for-loops
       but rather the loops are automatically (or 'magically') generated by
       PDL based on the dimensions of the pdls involved. This should give you
       a first idea why the index/dimension manipulating functions you have
       met in the previous paragraphs are especially important and useful in
       the context of threading. The other ingredient for threading (apart
       from the pdls involved) is a threading aware function (generally, these
       are PDL::PP compiled functions) over which the pdls are "threaded". So
       much about the terminology and now let's try to shed some light on what
       it all means.
636
637       Implicit threading - a first example
638
639       There are two slightly different variants of threading. We start with
640       what we call "implicit threading". Let's pick a practical example that
641       involves looping of a function over many elements of a pdl. Suppose we
642       have an RGB image that we want to convert to greyscale. The RGB image
643       is represented by a 3-dim pdl "im(3,x,y)" where the first dimension
644       contains the three color components of each pixel and "x" and "y" are
645       width and height of the image, respectively. Next we need to specify
646       how to convert a color-triple at a given pixel into a greyvalue (to be
647       a realistic example it should represent the relative intensity with
648       which our color insensitive eye cells would detect that color to
649       achieve what we would call a natural conversion from color to
650       greyscale). An approximation that works quite well is to compute the
651       grey intensity from each RGB triplet (r,g,b) as a weighted sum
652
653        greyvalue = 77/256*r + 150/256*g + 29/256*b =
654            inner([77,150,29]/256, [r,g,b])
655
656       where the last form indicates that we can write this as an inner prod‐
657       uct of the 3-vector comprising the weights for red, green and blue com‐
658       ponents with the 3-vector containing the color components. Tradition‐
659       ally, we might have written a function like the following to process
660       the whole image:
661
           my @dims = $im->dims;
           # here normally check that first dim has correct size (3), etc
           $grey = zeroes(@dims[1,2]);   # make the pdl for the resulting grey image
           $w = pdl([77,150,29]) / 256;  # the vector of weights
           for ($j=0; $j<$dims[2]; $j++) {
              for ($i=0; $i<$dims[1]; $i++) {
                  # compute the pixel value
                  $tmp = inner($w, $im->slice(":,($i),($j)"));
                  set($grey, $i, $j, $tmp->sclr); # and set it in the greyscale image
              }
           }
673
674       Now we write the same using threading (noting that "inner" is a thread‐
675       ing aware function defined in the PDL::Primitive package)
676
           $grey = inner($im, pdl([77,150,29])/256);
678
679       We have ended up with a one-liner that automatically creates the pdl
680       $grey with the right number and size of dimensions and performs the
681       loops automatically (these loops are implemented as fast C code in the
682       internals of PDL).  Well, we still owe you an explanation how this
683       'magic' is achieved.
684
685       How does the example work ?
686
687       The first thing to note is that every function that is threading aware
688       (these are without exception functions compiled from concise descrip‐
689       tions by PDL::PP, later just called PP-functions) expects a defined
690       (minimum) number of dimensions (we call them core dimensions) from each
691       of its pdl arguments. The inner function expects two one-dimensional
692       (input) parameters from which it calculates a zero-dimensional (output)
693       parameter. We write that symbolically as "inner((n),(n),[o]())" and
694       call it "inner"'s signature, where n represents the size of that dimen‐
695       sion. n being equal in the first and second parameter means that those
696       dimensions have to be of equal size in any call. As a different example
697       take the outer product which takes two 1D vectors to generate a 2D
698       matrix, symbolically written as "outer((n),(m),[o](n,m))". The "[o]" in
699       both examples indicates that this (here third) argument is an output
700       argument. In the latter example the dimensions of first and second
701       argument don't have to agree but you see how they determine the size of
702       the two dimensions of the output pdl.
703
704       Here is the point when threading finally enters the game. If you call
705       PP-functions with pdls that have more than the required core dimensions
706       the first dimensions of the pdl arguments are used as the core dimen‐
707       sions and the additional extra dimensions are threaded over. Let us
708       demonstrate this first with our example above
709
710        $grey = inner($im,$w); # w is the weight vector from above
711
712       In this case $w is 1D and so supplied just the core dimension, $im is
713       3D, more specifically "(3,x,y)". The first dimension (of size 3) is the
714       required core dimension that matches (as required by inner) the first
715       (and only) dimension of $w. The second dimension is the first thread
716       dimension (of size "x") and the third is here the second thread dimen‐
717       sion (of size "y"). The output pdl is automatically created (as
718       requested by setting $grey to "null" prior to invocation). The output
719       dimensions are obtained by appending the loop dimensions (here "(x,y)")
720       to the core output dimensions (here 0D) to yield the final dimensions
721       of the autocreated pdl (here "0D+2D=2D" to yield a 2D output of size
722       "(x,y)").
723
724       So the above command calls the core functionality that computes the
725       inner product of two 1D vectors "x*y" times with $w and all 1D slices
726       of the form "(':,(i),(j)')" of $im and sets the respective elements of
727       the output pdl "$grey(i,j)" to the result of each computation. We could
728       write that symbolically as
729
730        $grey(0,0) = f($w,$im(:,(0),(0)))
731        $grey(1,0) = f($w,$im(:,(1),(0)))
732            .
733            .
734            .
735        $grey(x-2,y-1) = f($w,$im(:,(x-2),(y-1)))
736        $grey(x-1,y-1) = f($w,$im(:,(x-1),(y-1)))
737
738       But this is done automatically by PDL without writing any explicit perl
739       loops.  We see that the command really creates an output pdl with the
740       right dimensions and sets the elements indeed to the result of the com‐
741       putation for each pixel of the input image.
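
          The symbolic listing above can be spelled out in plain Perl. The
          following is only a sketch of what the thread loop computes, not
          how PDL implements it: nested Perl arrays stand in for the pdl,
          the 2x2 pixel values are made up for illustration, and "inner1d"
          is a hypothetical helper playing the role of the core
          functionality.

```perl
use strict;
use warnings;

my @w = map { $_ / 256 } (77, 150, 29);   # the weight vector $w

# hypothetical 2x2 image; $im[$j][$i] = [r,g,b] stands for im(:,i,j)
my @im = (
    [ [256, 0, 0], [0, 256, 0]     ],
    [ [0, 0, 256], [256, 256, 256] ],
);

# core functionality: inner product of two 1D vectors
sub inner1d {
    my ($u, $v) = @_;
    my $sum = 0;
    $sum += $u->[$_] * $v->[$_] for 0 .. $#$u;
    return $sum;
}

# the "thread loop": one call of the core functionality per pixel
my @grey;
for my $j (0 .. $#im) {
    for my $i (0 .. $#{ $im[$j] }) {
        $grey[$j][$i] = inner1d(\@w, $im[$j][$i]);
    }
}

print "@$_\n" for @grey;   # prints "77 150" and "29 256"
```

          With threading, both for-loops and the bookkeeping for @grey
          disappear into the single call to inner.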
742
743       When even more pdls and extra dimensions are involved things get a bit
744       more complicated. We will first give the general rules how the thread
745       dimensions depend on the dimensions of input pdls enabling you to fig‐
746       ure out the dimensionality of an autocreated output pdl (for any given
747       set of input pdls and core dimensions of the PP-function in question).
748       The general rules will most likely appear a bit confusing at first
749       sight, so we'll set out to illustrate the usage with a set of further
750       examples (which will hopefully also demonstrate that there are
751       indeed many practical situations where threading comes in extremely
752       handy).
753
754       A call for coding discipline
755
756       Before we point out the other technical details of threading, please
757       note this call for programming discipline when using threading:
758
759       In order to preserve human readability, PLEASE comment any nontrivial
760       expression in your code involving threading.  Most importantly, for any
761       subroutine, include information at the beginning about what you expect
762       the dimensions to represent (or ranges of dimensions).
763
764       As a warning, look at this undocumented function and try to guess what
765       might be going on:
766
           sub lookup {
             my ($im,$palette) = @_;
             my $res;
             index($palette->xchg(0,1),
                        $im->long->dummy(0,($palette->dims)[0]),
                        ($res=null));
             return $res;
           }
775
776       Would you agree that it might be difficult to figure out expected
777       dimensions, purpose of the routine, etc ?  (If you want to find out
778       what this piece of code does, see below)
779
780       How to figure out the loop dimensions
781
782       There are a couple of rules that allow you to figure out the number and
783       size of loop dimensions (and whether the sizes of your input pdls comply
784       with the threading rules). Dimensions of any pdl argument are broken down
785       into two groups in the following: core dimensions (as defined by the
786       PP-function, see Appendix B for a list of PDL primitives) and extra
787       dimensions, which comprise all remaining dimensions of that pdl. For
788       example, calling a function "func" with the signature
789       "func((n,m),[o](n))" with a pdl "a(2,4,7,1,3)" as "func($a,($o = null))"
790       results in the semantic splitting of a's dimensions into core dimensions
791       "(2,4)" and extra dimensions "(7,1,3)".
792
793       R0    Core dimensions are identified with the first N dimensions of the
794             respective pdl argument (and are required). Any further dimen‐
795             sions are extra dimensions and used to determine the loop dimen‐
796             sions.
797
798       R1    The number of (implicit) loop dimensions is equal to the maximal
799             number of extra dimensions taken over the set of pdl arguments.
800
801       R2    The size of each of the loop dimensions is derived from the size
802             of the respective dimensions of the pdl arguments. The size of a
803             loop dimension is given by the maximal size found in any of the
804             pdls having this extra dimension.
805
806       R3    For all pdls that have a given extra dimension the size must be
807             equal to the size of the loop dimension (as determined by the
808             previous rule) or 1; otherwise you raise a runtime exception. If
809             the size of the extra dimension in a pdl is one it is implicitly
810             treated as a dummy dimension of size equal to that loop dim size
811             when performing the thread loop.
812
813       R4    If a pdl doesn't have a loop dimension, in the thread loop this
814             pdl is treated as if having a dummy dimension of size equal to
815             the size of that loop dimension.
816
817       R5    If output autocreation is used (by setting the relevant pdl to
818             "PDL->null" before invocation) the number of dimensions of the
819             created pdl is equal to the sum of the number of core output
820             dimensions + number of loop dimensions. The size of the core out‐
821             put dimensions is derived from the relevant dimension of input
822             pdls (as specified in the function definition) and the sizes of
823             the other dimensions are equal to the size of the loop dimension
824             it is derived from. The automatically created pdl will be physi‐
825             cal (unless dataflow is in operation).
826
827       In this context, note that you can run into the problem with assignment
828       to pdls containing greater-than-one dummy dimensions (see above).
829       Although your output pdl(s) didn't contain any dummy dimensions in the
830       first place they may end up with implicitly created dummy dimensions
831       according to R4.
832
833       As an example, suppose we have a (here unspecified) PP-function with
834       the signature:
835
836        func((m,n),(m,n,o),(m),[o](m,o))
837
838       and you call it with 3 pdls "a(5,3,10,11)", "b(5,3,2,10,1,12)", and
839       "c(5,1,11,12)" as
840
841        func($a,$b,$c,($d=null))
842
843       then the number of loop dimensions is 3 (by "R0+R1" from $b and $c)
844       with sizes "(10,11,12)" (by R2); the two output core dimensions are
845       "(5,2)" (from the signature of func) resulting in a 5-dimensional
846       output pdl $d of size "(5,2,10,11,12)" (see R5) and (the automatically
847       created) $d is derived from "($a,$b,$c)" in a way that can be expressed
848       in pdl pseudo-code as
849
           $d(:,:,i,j,k) .= func($a(:,:,i,j),$b(:,:,:,i,0,k),$c(:,0,j,k))
              with 0<=i<10, 0<=j<11, 0<=k<12
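
          The bookkeeping behind rules R0-R5 is mechanical enough to sketch
          in plain Perl. The helper "loop_dims" below is hypothetical (it is
          not part of PDL) and only models the rules: each argument is given
          as its number of core dimensions plus its list of dimension sizes,
          and the result is the list of loop dimension sizes. It is applied
          to the three pdls of the example above.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Plain-Perl model of rules R0-R3 (an illustration, not PDL's code).
# Each argument is [ number_of_core_dims, [ dim sizes ... ] ].
sub loop_dims {
    my @args = @_;

    # R0: everything after the first N (core) dims is an extra dim
    my @extra = map {
        my ($ncore, $dims) = @$_;
        [ @{$dims}[ $ncore .. $#$dims ] ];
    } @args;

    # R1: number of loop dims = maximal number of extra dims
    my $nloop = max(0, map { scalar @$_ } @extra);

    my @loop;
    for my $i (0 .. $nloop - 1) {
        # R2: loop size = maximal size among the pdls having this extra dim
        my @sizes = map { $_->[$i] } grep { $i < @$_ } @extra;
        my $size  = max(@sizes);

        # R3: every size must equal the loop size or be 1
        for my $s (@sizes) {
            die "extra dim $i: size $s does not fit loop size $size\n"
                unless $s == $size || $s == 1;
        }
        push @loop, $size;
    }
    return @loop;
}

my @loop = loop_dims(
    [ 2, [5, 3, 10, 11] ],         # $a, core dims (m,n)
    [ 3, [5, 3, 2, 10, 1, 12] ],   # $b, core dims (m,n,o)
    [ 1, [5, 1, 11, 12] ],         # $c, core dims (m)
);

# R5: output dims = core output dims (m,o) = (5,2), then the loop dims
my @out = (5, 2, @loop);

print "loop dims:   (@loop)\n";    # (10 11 12)
print "output dims: (@out)\n";     # (5 2 10 11 12)
```

          R4 needs no code of its own here: a pdl that lacks a given extra
          dimension is simply left out of the size comparison.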
852
853       If we analyze the color to greyscale conversion again with these rules
854       in mind we note another great advantage of implicit threading.  We can
855       call the conversion with a pdl representing a pixel (im(3)), a line of
856       rgb pixels ("im(3,x)"), a proper color image ("im(3,x,y)") or a whole
857       stack of RGB images ("im(3,x,y,z)"). As long as $im is of the form
858       "(3,...)" the automatically created output pdl will contain the right
859       number of dimensions and contain the intensity data as we expect it
860       since the loops have been implicitly performed thanks to implicit
861       threading. You can easily convince yourself that calling with a color
862       pixel $grey is 0D, with a line it turns out 1D grey(x), with an image
863       we get "grey(x,y)" and finally we get a converted image stack
864       "grey(x,y,z)".
865
866       Let's fill these general rules with some more life by going through a
867       couple of further examples. The reader may try to figure out equivalent
868       formulations with explicit for-looping and compare the flexibility of
869       those routines using implicit threading to the explicit formulation.
870       Furthermore, especially when using several thread dimensions it is a
871       useful exercise to check the relative speed by doing some benchmark
872       tests (which we still have to do).
873
874       First in the row is a slightly reworked centroid example, now coded
875       with threading in mind.
876
877        # threaded mult to calculate centroid coords, works for stacks as well
878        $xc = sumover(($im*xvals(($im->dims)[0]))->clump(2)) /
879              sumover($im->clump(2));
880
881       Let's analyse what's going on step by step. First the product:
882
883        $prod = $im*xvals(zeroes(($im->dims)[0]))
884
885       This will actually work for $im being one, two, three, and higher
886       dimensional. If $im is one-dimensional it's just an ordinary product
887       (in the sense that every element of $im is multiplied with the respec‐
888       tive element of "xvals(...)"), if $im has more dimensions further
889       threading is done by adding appropriate dummy dimensions to
890       "xvals(...)"  according to R4.  More importantly, the two sumover oper‐
891       ations show a first example of how to make use of the dimension manipu‐
892       lating commands. A quick look at sumover's signature will remind you
893       that it will only "gobble up" the first dimension of a given input pdl.
894       But what if we want to really compute the sum over all elements of the
895       first two dimensions? Well, nothing keeps us from passing a virtual pdl
896       into sumover which in this case is formed by clumping the first two
897       dimensions of the "parent pdl" into one. From the point of view of the
898       parent pdl the sum is now computed over the first two dimensions, just
899       as we wanted, though sumover has just done the job as specified by its
900       signature. Got it ?
901
902       Another little finesse of writing the code like that: we intentionally
903       used "sumover($pdl->clump(2))" instead of "sum($pdl)" so that we can
904       either pass just an image "(x,y)" or a stack of images "(x,y,t)" into
905       this routine and get either just one x-coordinate or a vector of
906       x-coordinates (of size t) in return.
907
908       Another set of common operations are what one could call "projection
909       operations". These operations take an N-D pdl as input and return a
910       (N-1)-D "projected" pdl. These operations are often performed with
911       functions like sumover, prodover, minimum and maximum.  Using again
912       images as examples we might want to calculate the maximum pixel value
913       for each line of an image or image stack. We know how to do that
914
915        # maxima of lines (as function of line number and time)
916        maximum($stack,($ret=null));
917
918       But what if you want to calculate maxima per column when implicit
919       threading always applies the core functionality to the first dimension
920       and threads over all others? How can we arrange that the core
921       functionality is instead applied to the second dimension and threading
922       is done over the others? Can you guess it? Yes, we make a virtual pdl
923       that has the second dimension of the "parent pdl" as its first
924       dimension using the "mv" command.
925
926        # maxima of columns (as function of column number and time)
927        maximum($stack->mv(0,1),($ret=null));
928
929       and calculating all the sums of sub-slices over the third dimension is
930       now almost too easy
931
           # sums of pixels in time (assuming time is the third dim);
           # mv(2,0) moves the third dimension to the front
           sumover($stack->mv(2,0),($ret=null));
934
935       Finally, if you want to apply the operation to all elements (like max
936       over all elements or sum over all elements) regardless of the dimen‐
937       sions of the pdl in question "clump" comes in handy. As an example look
938       at the definition of "sum" (as defined in "Basic.pm"):
939
           sub sum {
             my ($pdl) = @_;
             PDL::Primitive::sumover($pdl->clump(-1),($tmp=null));
             return $tmp->at(); # return a perl number, not a 0D pdl
           }
944
945       We have already mentioned that all basic operations support threading
946       and assignment is no exception. So here are a couple of threaded
947       assignments
948
           perldl> $im = zeroes(byte, 10,20)
           perldl> $line = exp(-rvals(10)**2/9)
           # threaded assignment
           perldl> $im .= $line      # set every line of $im to $line
           perldl> $im .= 5          # set every element of $im to 5
954
955       By now you probably see how it works and what it does, don't you?
956
957       To finish the examples in this paragraph here is a function to create
958       an RGB image from what is called a palette image. The palette image
959       consists of two parts: an image of indices into a color lookup table
960       and the color lookup table itself. [ describe how it works ] We are
961       going to use a PP-function we haven't encountered yet in the previous
962       examples. It is the aptly named index function, signature
963       "((n),(),[o]())" (see Appendix B) with the core functionality that
964       "index(pdl (0,2,4,5),2,($ret=null))" will return the element with index
965       2 of the first input pdl. In this case, $ret will contain the value 4.
966       So here is the example:
967
           # a threaded index lookup to generate an RGB, or RGBA or YMCK image
           # from a palette image (represented by a lookup table $palette and
           # a color-index image $im)
           # you can say just dummy(0) since the rules of threading make it fit
           perldl> index($palette->xchg(0,1),
                         $im->long->dummy(0,($palette->dims)[0]),
                         ($res=null));
975
976       Let's go through it and explain the steps involved. Assuming we are
977       dealing with an RGB lookup-table $palette is of size "(3,x)". First we
978       exchange the dimensions of the palette so that looping is done over the
979       first dimension of $palette (of size 3, representing the r, g, and b
980       components). Now looking at $im, we add a dummy dimension of size equal
981       to the number of components (in the case we are discussing here we
982       could have just used the number 3 since we have 3 color components).
983       We can use a dummy dimension since for red, green and blue
984       color components we use the same index from the original image, e.g.
985       assuming a certain pixel of $im had the value 4 then the lookup should
986       produce the triple
987
988        [palette(0,4),palette(1,4),palette(2,4)]
989
990       for the new red, green and blue components of the output image. Hope‐
991       fully by now you have some sort of idea what the above piece of code is
992       supposed to do (it is often actually quite complicated to describe in
993       detail how a piece of threading code works; just go ahead and experi‐
994       ment a bit to get a better feeling for it).
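
          For comparison, here is the same lookup written with explicit
          loops in plain Perl. This is only a sketch of what the threaded
          index call computes: the 4-entry palette and the 2x3 index image
          are made-up data, and nested Perl arrays stand in for the pdls.

```perl
use strict;
use warnings;

# Hypothetical data. $palette[$idx] is the [r,g,b] triple of palette
# entry $idx, i.e. palette(c,idx) in the notation used above.
my @palette = ([0,0,0], [255,0,0], [0,255,0], [0,0,255]);
my @im      = ([0,1,2], [3,2,1]);      # $im[$y][$x] is a palette index

my @res;                               # $res[$y][$x] = [r,g,b]
for my $y (0 .. $#im) {
    for my $x (0 .. $#{ $im[$y] }) {
        my $idx = $im[$y][$x];
        # the same index is used for every colour component c
        $res[$y][$x] = [ map { $palette[$idx][$_] } 0 .. 2 ];
    }
}

# first image line as a list of colour triples
print join(' ', map { sprintf '[%s]', join(',', @$_) } @{ $res[0] }), "\n";
# prints: [0,0,0] [255,0,0] [0,255,0]
```

          The loop over the colour component (the inner map) is what the
          dummy dimension on $im provides in the threaded version.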
995
996       If you have read the threading rules carefully, then you might have
997       noticed that we didn't have to explicitly state the size of the dummy
998       dimension that we created for $im; when we create it with size 1 (the
999       default) the rules of threading make it automatically fit to the
1000       desired size (by rule R3, in our example the size would be 3 assuming a
1001       palette of size "(3,x)"). Since situations like this do occur often in
1002       practice this is actually why rule R3 has been introduced (the part
1003       that makes dimensions of size 1 fit to the thread loop dim size). So we
1004       can just say
1005
1006        perldl> index($palette->xchg(0,1),$im->long->dummy(0),($res=null));
1007
1008       Again, you can convince yourself that this routine will create the
1009       right output if called with a pixel ($im is 0D), a line ($im is 1D), an
1010       image ($im is 2D), ..., an RGB lookup table (palette is "(3,x)") or an
1011       RGBA lookup table (palette is "(4,x)", see e.g. OpenGL). This flexibility
1012       is achieved by the rules of threading which are made to do the
1013       right thing in most situations.
1014
1015       To wrap it all up once again, the general idea is as follows. If you
1016       want to achieve looping over certain dimensions and have the core func‐
1017       tionality applied to another specified set of dimensions you use the
1018       dimension manipulating commands to create a (or several) virtual pdl(s)
1019       so that from the point of view of the parent pdl(s) you get what you
1020       want (always having the signature of the function in question and R1-R5
1021       in mind!). Easy, isn't it ?
1022
1023       Output autocreation and PP-function calling conventions
1024
1025       At this point we have to divert to some technical detail that has to do
1026       with the general calling conventions of PP-functions and the automatic
1027       creation of output arguments.  Basically, there are two ways of invok‐
1028       ing pdl routines, namely
1029
1030        $result = func($a,$b);
1031
1032       and
1033
1034        func($a,$b,$result);
1035
1036       If you are only using implicit threading then the output variable can
1037       be automatically created by PDL. You flag that to the PP-function by
1038       setting the output argument to a special kind of pdl that is returned
1039       from a call to the function "PDL->null" that returns an essentially
1040       "empty" pdl (for those interested in details there is a flag in the C
1041       pdl structure for this). The dimensions of the created pdl are deter‐
1042       mined by the rules of implicit threading: the first dimensions are the
1043       core output dimensions to which the threading dimensions are appended
1044       (which are in turn determined by the dimensions of the input pdls as
1045       described above).  So you can say
1046
1047        func($a,$b,($result=PDL->null));
1048
1049       or
1050
1051        $result = func($a,$b)
1052
1053       which are exactly equivalent.
1054
1055       Be warned that you can not use output autocreation when using explicit
1056       threading (for reasons explained in the following section on explicit
1057       threading, the second variant of threading).
1058
1059       In "tight" loops you probably want to avoid the implicit creation of a
1060       temporary pdl in each step of the loop that comes along with the "func‐
1061       tional" style but rather say
1062
            # create output pdl of appropriate size only at first invocation
            $result = null;
            for (0..$n) {
                 # in all but the first invocation $result is defined and has
                 # the right size to take the output, provided $b's dims
                 # don't change
                 func($a,$b,$result);
                 func2($b);
                 twiddle($result,$a); # do something from $result to $a for iteration
            }
1071
1072       The take-home message of this section once more: be aware of the limi‐
1073       tation on output creation when using explicit threading.
1074
1075       Explicit threading
1076
1077       Having so far only talked about the first flavour of threading it is
1078       now about time to introduce the second variant. Instead of shuffling
1079       around dimensions all the time and relying on the rules of implicit
1080       threading to get it all right you sometimes might want to specify in a
1081       more explicit way how to perform the thread loop. It is probably not
1082       too surprising that this variant of the game is called explicit thread‐
1083       ing.  Now, before we create the wrong impression: it is not either
1084       implicit or explicit; the two flavours do mix. But more about that
1085       later.
1086
1087       The two most used functions with explicit threading are thread and
1088       unthread.  We start with an example that illustrates typical usage of
1089       the former:
1090
1091        [ # ** this is the worst possible example to start with ]
1092        #  but can be used to show that $mat += $line is different from
1093        #                               $mat->thread(0) += $line
1094        # explicit threading to add a vector to each column of a matrix
1095        perldl> $mat  = zeroes(4,3)
1096        perldl> $line = pdl (3.1416,2,-2)
1097        perldl> ($tmp = $mat->thread(0)) += $line
1098
1099       In this example, "$mat->thread(0)" tells PDL that you want the first
1100       dimension of this pdl to be threaded over first leading to a thread
1101       loop that can be expressed as
1102
            for (j=0; j<3; j++) {
               for (i=0; i<4; i++) {
                   mat(i,j) += line(j);
               }
            }
1108
1109       "thread" takes a list of numbers as arguments which explicitly specify
1110       which dimensions to thread over first. With the introduction of
1111       explicit threading the dimensions of a pdl are conceptually split into
1112       three different groups the latter two of which we have already encoun‐
1113       tered: thread dimensions, core dimensions and extra dimensions.
1114
1115       Conceptually, it is best to think of those dimensions of a pdl that
1116       have been specified in a call to "thread" as being taken away from the
1117       set of normal dimensions and put on a separate stack. So assuming we
1118       have a pdl "a(4,7,2,8)" saying
1119
1120        $b = $a->thread(2,1)
1121
1122       creates a new virtual pdl of dimension "b(4,8)" (which we call the
1123       remaining dims) that also has 2 thread dimensions of size "(2,7)". For
1124       the purposes of this document we write that symbolically as
1125       "b(4,8){2,7}". An important difference to the previous examples where
1126       only implicit threading was used is the fact that the core dimensions
1127       are matched against the remaining dimensions which are not necessarily
1128       the first dimensions of the pdl. We will now specify how the presence
1129       of thread dimensions changes the rules R1-R5 for threadloops (which
1130       apply to the special case where none of the pdl arguments has any
1131       thread dimensions).
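
           The effect of "thread" on the dimension list can be modelled in a
           few lines of plain Perl. The helper "split_thread_dims" is
           hypothetical (not a PDL function) and handles only non-negative
           dimension indices; it merely reproduces the "b(4,8){2,7}"
           bookkeeping from the example above.

```perl
use strict;
use warnings;

# Model of what thread() does to the dimension list: the listed
# dimensions move to a separate "thread" stack (in the order given),
# the others stay behind as remaining dims. Non-negative indices only.
sub split_thread_dims {
    my ($dims, @thread_idx) = @_;
    my %taken  = map { $_ => 1 } @thread_idx;
    my @thread = @{$dims}[@thread_idx];
    my @remain = @{$dims}[ grep { !$taken{$_} } 0 .. $#$dims ];
    return (\@remain, \@thread);
}

# a(4,7,2,8)->thread(2,1)
my ($remain, $thread) = split_thread_dims([4, 7, 2, 8], 2, 1);
printf "b(%s){%s}\n", join(',', @$remain), join(',', @$thread);
# prints: b(4,8){2,7}
```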
1132
1133       T0  Core dimensions are matched against the first n remaining
1134           dimensions of the pdl argument (note the difference to R0). Any
1135           further remaining dimensions are extra dimensions and are used to
1136           determine the implicit loop dimensions.
1137
1138       T1a The number of implicit loop dimensions is equal to the maximal num‐
1139           ber of extra dimensions taken over the set of pdl arguments.
1140
1141       T1b The number of explicit loop dimensions is equal to the maximal num‐
1142           ber of thread dimensions taken over the set of pdl arguments.
1143
1144       T1c The total number of loop dimensions is equal to the sum of explicit
1145           loop dimensions and implicit loop dimensions. In the thread loop,
1146           explicit loop dimensions are threaded over first followed by
1147           implicit loop dimensions.
1148
1149       T2  The size of each of the loop dimensions is derived from the size of
1150           the respective dimensions of the pdl arguments. It is given by the
1151           maximal size found in any pdls having this thread dimension (for
1152           explicit loop dimensions) or extra dimension (for implicit loop
1153           dimensions).
1154
1155       T3  This rule applies to any explicit loop dimension as well as any
1156           implicit loop dimension. For all pdls that have a given
1157           thread/extra dimension the size must be equal to the size of the
1158           respective explicit/implicit loop dimension or 1; otherwise you
1159           raise a runtime exception. If the size of a thread/extra dimension
1160           of a pdl is one it is implicitly treated as a dummy dimension of
1161           size equal to the explicit/implicit loop dimension.
1162
1163       T4  If a pdl doesn't have a thread/extra dimension that corresponds to
1164           an explicit/implicit loop dimension, in the thread loop this pdl is
1165           treated as if having a dummy dimension of size equal to the size of
1166           that loop dimension.
1167
1168       T4a All pdls that do have thread dimensions must have the same number
1169           of thread dimensions.
1170
1171       T5  Output autocreation cannot be used if any of the pdl arguments has
1172           any thread dimensions. Otherwise R5 applies.
1173
1174       The same restrictions apply with regard to implicit dummy dimensions
1175       (created by application of T4) as already mentioned in the section on
1176       implicit threading: if any of the output pdls has an (explicit or
1177       implicitly created) greater-than-one dummy dimension a runtime excep‐
1178       tion will be raised.
1179
1180       Let us demonstrate these rules at work in a generic case.  Suppose we
1181       have a (here unspecified) PP-function with the signature:
1182
1183        func((m,n),(m),(),[o](m))
1184
1185       and you call it with 3 pdls "a(5,3,10,11)", "b(3,5,10,1,12)", "c(10)"
1186       and an output pdl "d(3,11,5,10,12)" (which can here not be automati‐
1187       cally created) as
1188
1189        func($a->thread(1,3),$b->thread(0,3),$c,$d->thread(0,1))
1190
1191       From the signature of func and the above call the pdls split into the
1192       following groups of core, extra and thread dimensions (written in the
1193       form "pdl(core dims){thread dims}[extra dims]"):
1194
1195        a(5,10){3,11}[] b(5){3,1}[10,12] c(){}[10] d(5){3,11}[10,12]
1196
1197       With this to help us along (it is in general helpful to write the argu‐
1198       ments down like this when you start playing with threading and want to
1199       keep track of what is going on) we further deduce that the number of
1200       explicit loop dimensions is 2 (by T1b from $a and $b) with sizes
1201       "(3,11)" (by T2); 2 implicit loop dimensions (by T1a from $b and $d) of
1202       size "(10,12)" (by T2) and the elements of $d are computed from the input
1203       pdls in a way that can be expressed in pdl pseudo-code as
1204
1205        for (l=0;l<12;l++)
1206         for (k=0;k<10;k++)
1207          for (j=0;j<11;j++)         effect of treating it as dummy dim (index j)
1208           for (i=0;i<3;i++)                         ⎪
1209              d(i,j,:,k,l) = func(a(:,i,:,j),b(i,:,k,0,l),c(k))
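
           The loop sizes deduced above can be checked with a small
           plain-Perl sketch of rules T1-T4a. It starts from the grouping
           written above in the form "pdl(core dims){thread dims}[extra
           dims]"; the helper "combine" is hypothetical and only models the
           rules, it is not how PDL is implemented.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Combine one kind of dims (thread dims or extra dims) across all
# arguments into loop sizes, following T2 and T3.
sub combine {
    my ($label, @groups) = @_;
    my $n = max(0, map { scalar @$_ } @groups);
    my @loop;
    for my $i (0 .. $n - 1) {
        # T4: pdls lacking this dim are skipped (treated as dummy dims)
        my @sizes = map { $_->[$i] } grep { $i < @$_ } @groups;
        my $size  = max(@sizes);                       # T2
        for my $s (@sizes) {                           # T3
            die "$label dim $i: size $s does not fit loop size $size\n"
                unless $s == $size || $s == 1;
        }
        push @loop, $size;
    }
    return @loop;
}

# thread and extra dims of each argument, as grouped above
my %args = (
    a => { thread => [3, 11], extra => [] },
    b => { thread => [3, 1],  extra => [10, 12] },
    c => { thread => [],      extra => [10] },
    d => { thread => [3, 11], extra => [10, 12] },
);

# T4a: pdls that have thread dims must all have the same number of them
my %counts;
$counts{ scalar @{ $_->{thread} } }++
    for grep { @{ $_->{thread} } } values %args;
die "numbers of thread dims differ\n" if keys %counts > 1;

my @explicit = combine('thread', map { $_->{thread} } values %args);  # T1b
my @implicit = combine('extra',  map { $_->{extra}  } values %args);  # T1a

printf "explicit loop dims: (%s)\n", join(',', @explicit);   # (3,11)
printf "implicit loop dims: (%s)\n", join(',', @implicit);   # (10,12)
```

           Following T1c, the explicit loop dimensions (3,11) are threaded
           over first, followed by the implicit ones (10,12), which matches
           the nesting of the four for-loops above.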
1210
1211       Uhhmpf, this example was really not easy in terms of bookkeeping. It
1212       serves mostly as an example how to figure out what's going on when you
1213       encounter a complicated looking expression. But now it is really time
1214       to show that threading is useful by giving some more of our so called
1215       "practical" examples.
1216
1217       [ The following examples will need some additional explanations in the
1218       future. For the moment please try to live with the comments in the code
1219       fragments. ]
1220
1221       Example 1:
1222
1223        # inverse of a matrix represented by eigvecs and eigvals
1224        # given a symmetric matrix M = A^T x diag(lambda_i) x A
1225        #    =>  inverse M^-1 = A^T x diag(1/lambda_i) x A
1226        # first $tmp = diag(1/lambda_i)*A
1227        # then  A^T * $tmp by threaded inner product
1228        # index handling so that matrices print correctly under pdl
1229        $inv .= $evecs*0;  # just copy to get appropriately sized output
1230        $tmp .= $evecs;    # initialise, no backpropagation
1231        ($tmp2 = $tmp->thread(0)) /= $evals;    #  threaded division
1232        # and now a matrix multiplication in disguise
1233        PDL::Primitive::inner($evecs->xchg(0,1)->thread(-1,1),
1234                              $tmp->thread(0,-1),
1235                              $inv->thread(0,1));
1236        # alternative for matrix mult using implicit threading,
1237        # first xchg only for transpose
1238        PDL::Primitive::inner($evecs->xchg(0,1)->dummy(1),
1239                              $tmp->xchg(0,1)->dummy(2),
1240                              ($inv=null));
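
As a self-contained check of the implicit-threading variant, here is a minimal sketch under stated assumptions: the 2x2 rotation matrix and the eigenvalues are made-up illustration data, the rows of $evecs are taken to be the eigenvectors, and the division by the eigenvalues is written with dummy(0) rather than the explicit thread call used above.

```perl
use strict;
use warnings;
use PDL;

# Illustrative data (an assumption): rows of $evecs are orthonormal eigenvectors.
my ($c, $s) = (cos(0.3), sin(0.3));
my $evecs = pdl([[$c, $s], [-$s, $c]]);
my $evals = pdl(2, 5);

# M = A^T x diag(lambda) x A, built with ordinary matrix multiplication;
# $evals->dummy(0) scales row i of A by lambda_i.
my $m = $evecs->transpose() x ($evecs * $evals->dummy(0));

# $tmp = diag(1/lambda) x A: divide row i of A by lambda_i.
my $tmp = $evecs / $evals->dummy(0);

# A^T x $tmp as an implicitly threaded inner product over dim 0.
my $inv = inner($evecs->xchg(0,1)->dummy(1), $tmp->xchg(0,1));

# $m x $inv should be the identity up to rounding.
my $err = abs(($m x $inv) - pdl([[1, 0], [0, 1]]))->max;
print "max deviation from identity: $err\n";
```

The printed deviation should be at the level of floating point rounding.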
1241
1242       Example 2:
1243
1244        # outer product by threaded multiplication
1245        # stress that we need to do it with explicit call to my_biop1
1246        # when using explicit threading
1247        $res=zeroes(($a->dims)[0],($b->dims)[0]);
1248        my_biop1($a->thread(0,-1),$b->thread(-1,0),$res->thread(0,1),"*");
1249        # similar thing by implicit threading with autocreated pdl
1250        $res = $a->dummy(1) * $b->dummy(0);
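
The implicit-threading form can be tried directly. This is a small sketch with made-up input vectors (named $x and $y here to avoid Perl's sort variables $a and $b); it compares the result against the outer primitive listed in Appendix B.

```perl
use strict;
use warnings;
use PDL;

my $x = pdl(1, 2, 3);     # dims (3)
my $y = pdl(10, 20);      # dims (2)

# dims (3,1) times dims (1,2): threading expands both to (3,2)
my $res = $x->dummy(1) * $y->dummy(0);

# reference result from the outer primitive, signature ((n),(m),[o](n,m))
my $ref = outer($x, $y);

print "dims: ", join(',', $res->dims), "\n";
print $res;
```

Each element of $res is the product $x(i) * $y(j), stored at position (i,j).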
1251
1252       Example 3:
1253
1254        # different use of thread and unthread to shuffle a number of
1255        # dimensions in one go without lots of calls to ->xchg and ->mv
1256
1257        # use thread/unthread to shuffle dimensions around
1258        # just try it out and compare the child pdl with its parent
1259        $trans = $a->thread(4,1,0,3,2)->unthread;
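
To see what such a shuffle does to the dimension list, the following sketch uses the reorder method, assuming a PDL version that provides it; it is expected to produce the same dimension order as the thread/unthread idiom above, but that equivalence is an assumption here rather than something this document states. The input pdl is made up.

```perl
use strict;
use warnings;
use PDL;

my $x = sequence(2, 3, 4, 5, 6);

# new dim 0 is old dim 4, new dim 1 is old dim 1, and so on
my $trans = $x->reorder(4, 1, 0, 3, 2);

print "parent dims: ", join(',', $x->dims), "\n";
print "child dims:  ", join(',', $trans->dims), "\n";
```

An element at index (i0,i1,i2,i3,i4) of $trans corresponds to index (i2,i1,i4,i3,i0) of $x.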
1260
1261       Example 4:
1262
1263        # calculate a couple of bounding boxes
1264        # $bb will hold BB as [xmin,xmax],[ymin,ymax],[zmin,zmax]
1265        # we use again thread and unthread to shuffle dimensions around
1266        perldl> $bb = zeroes(double, 2,3 );
1267        perldl> minimum($vertices->thread(0)->clump->unthread(1),
1268                        $bb->slice('(0),:'));
1269        perldl> maximum($vertices->thread(0)->clump->unthread(1),
1270                        $bb->slice('(1),:'));
1271
1272       Example 5:
1273
1274        # calculate a self-ratioed (i.e. self normalized) sequence of images
1275        # uses explicit threading and an implicitly threaded division
1276        $stack = read_image_stack();
1277        # calculate the average (per pixel average) of the first $n+1 images
1278        $aver = zeroes(($stack->dims)[0,1]);  # make the output pdl
1279        sumover($stack->slice(":,:,0:$n")->thread(0,1),$aver);
1280        $aver /= ($n+1);
1281        $stack /= $aver;  # normalize the stack by doing a threaded division
1282        # implicit versus explicit
1283        # alternatively calculate $aver with implicit threading and autocreation
1284        sumover($stack->slice(":,:,0:$n")->mv(2,0),($aver=null));
1285        $aver /= ($n+1);
1286        #
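
A runnable variant of the implicit-threading version is sketched below. The tiny 2x2x3 stack built with sequence is a stand-in (an assumption) for the read_image_stack call, and the result is kept in a new pdl $norm instead of modifying $stack in place.

```perl
use strict;
use warnings;
use PDL;

my $n = 1;                              # average over images 0..$n
my $stack = sequence(2, 2, 3) + 1;      # stand-in for read_image_stack()

# move the image index to dim 0 so that sumover sums over the images
my $aver = sumover($stack->slice(":,:,0:$n")->mv(2, 0)) / ($n + 1);

# $aver has dims (2,2) and is threaded over dim 2 of $stack
my $norm = $stack / $aver;

# the first $n+1 normalized images should average to 1 per pixel
my $check = sumover($norm->slice(":,:,0:$n")->mv(2, 0)) / ($n + 1);
print $aver, $check;
```

Because $aver is autocreated by sumover, no output pdl has to be allocated beforehand.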
1287
1288       Implicit versus explicit threading
1289
1290       In this paragraph we are going to illustrate when explicit threading is
1291       preferable to implicit threading and vice versa. But then again,
1292       this is probably not the best way of putting the case since you already
1293       know: the two flavours do mix. So, it's more about how to get the best
1294       of both worlds and, anyway, in the best of perl traditions: TIMTOWTDI !
1295
1296       [ Sorry, this still has to be filled in in a later release; either
1297       refer to above examples or choose some new ones ]
1298
1299       Finally, this may be a good place to justify all the technical detail
1300       we have been going on about for a couple of pages: why threading ?
1301
1302       Well, code that uses threading should be (considerably) faster than
1303       code that uses explicit for-loops (or similar perl constructs) to
1304       achieve the same functionality. Especially on supercomputers (with vec‐
1305       tor computing facilities/parallel processing) PDL threading will be
1306       implemented in a way that takes advantage of the additional facilities
1307       of these machines. Furthermore, it is a conceptually simple construct
1308       (though technical details might get involved at times) and can greatly
1309       reduce the syntactical complexity of PDL code (but keep the admonition
1310       for documentation in mind). Once you are comfortable with the threading
1311       way of thinking (and coding) it shouldn't be too difficult to
1312       understand code that somebody else has written (provided they gave you an
1313       idea what the expected input dimensions are, etc.). As a general tip to
1314       increase the performance of your code: if you have to introduce a loop
1315       into your code try to reformulate the problem so that you can use
1316       threading to perform the loop (as with anything there are exceptions to
1317       this rule of thumb; but the authors of this document tend to think that
1318       these are rare cases ;).
1319

PDL::PP

1321       An easy way to define functions that are aware of indexing and thread‐
1322       ing (and the universe and everything)
1323
1324       PDL::PP is part of the PDL distribution. It is used to generate
1325       functions that are aware of indexing and threading rules from very
1326       concise descriptions. It can be useful for you if you want to write your
1327       own functions or if you want to interface functions from an external
1328       library so that they support indexing and threading (and maybe dataflow
1329       as well, see PDL::Dataflow). For further details check PDL::PP.
1330

Appendix A

1332       Affine transformations - a special class of simple and powerful trans‐
1333       formations
1334
1335       [ This is also something to be added in future releases. Do we already
1336       have the general make_affine routine in PDL ? It is possible that we
1337       will reference another appropriate manpage from here ]
1338

Appendix B

1340       signatures of standard PDL::PP compiled functions
1341
1342       A selection of signatures of PDL primitives to show how many dimensions
1343       PP compiled functions gobble up (and therefore you can figure out what
1344       will be threaded over). Most of those functions are the basic ones
1345       defined in "primitive.pd"
1346
1347        # functions in primitive.pd
1348        #
1349        sumover        ((n),[o]())
1350        prodover       ((n),[o]())
1351        axisvalues     ((n))                                   inplace
1352        inner          ((n),(n),[o]())
1353        outer          ((n),(m),[o](n,m))
1354        innerwt        ((n),(n),(n),[o]())
1355        inner2         ((m),(m,n),(n),[o]())
1356        inner2t        ((j,n),(n,m),(m,k),[o]())
1357        index          (1D,0D,[o])
1358        minimum        (1D,[o])
1359        maximum        (1D,[o])
1360        wstat          ((n),(n),(),[o],())
1361        assgn          ((),())
1362
1363        # basic operations
1364        binary operations ((),(),[o]())
1365        unary operations  ((),[o]())
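
How a signature translates into "gobbled" and threaded dimensions can be checked by looking at the dims of the result. The sketch below uses made-up inputs and three of the primitives listed above.

```perl
use strict;
use warnings;
use PDL;

my $m = sequence(3, 4);   # dims (3,4)
my $v = sequence(3);      # dims (3)
my $w = sequence(2);      # dims (2)

# sumover ((n),[o]()): eats dim 0, threads over the remaining dim of size 4
my $sum = sumover($m);

# inner ((n),(n),[o]()): eats dim 0 of both, threads $v over the rows of $m
my $in = inner($m, $v);

# outer ((n),(m),[o](n,m)): core output (3,2), threaded over dim 1 of $m
my $out = outer($m, $w);

print "sumover: ", join(',', $sum->dims), "\n";
print "inner:   ", join(',', $in->dims),  "\n";
print "outer:   ", join(',', $out->dims), "\n";
```

The core dims named in the signature are consumed from the front of each argument; whatever is left over becomes a thread dimension of the output.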
1366

AUTHOR & COPYRIGHT

1368       Copyright (C) 1997 Christian Soeller (c.soeller@auckland.ac.nz) & Tuo‐
1369       mas J. Lukka (lukka@fas.harvard.edu). All rights reserved. Although
1370       destined for release as a man page with the standard PDL distribution,
1371       it is not public domain. Permission is granted to freely distribute
1372       verbatim copies of this document provided that no modifications outside
1373       of formatting be made, and that this notice remain intact.  You are
1374       permitted and encouraged to use its code and derivatives thereof in
1375       your own source code for fun or for profit as you see fit.
1376
1377
1378
1379perl v5.8.8                       2003-05-21                       INDEXING(1)