INDEXING(1)          User Contributed Perl Documentation          INDEXING(1)

NAME
    PDL::Indexing - Introduction to indexing and slicing ndarrays.

OVERVIEW
    This man page should serve as a first tutorial on the indexing and
    threading features of PDL.

    Like all vectorized languages, PDL automates looping over multi-
    dimensional data structures ("ndarrays") using a variant of
    mathematical vector notation. The automatic looping is called
    "threading", in part because ultimately PDL will implement parallel
    processing to speed up the loops.

    A lot of the flexibility and power of PDL relies on the indexing and
    threading features of the Perl extension. Indexing allows access to
    the data of an ndarray in a very flexible way. Threading provides
    efficient vectorization of simple operations.

    The values of an ndarray are stored compactly as typed values in a
    single block of memory, not (as in a normal Perl list-of-lists) as
    individual Perl scalars.

    In the sections that follow many "methods" are called out -- these
    are Perl operators that apply to ndarrays. From the perldl (or pdl2)
    shell, you can find out more about each method by typing "?" followed
    by the method name.

  Dimension lists
    An ndarray (PDL variable) is, in general, an N-dimensional array
    where N can be 0 (for a scalar), 1 (e.g. for a sound sample), or
    higher values for images and more complex structures. Each dimension
    of the ndarray has a positive integer size. The "perl" interpreter
    treats each ndarray as a special type of Perl scalar (a blessed Perl
    object, actually -- but you don't have to know that to use them) that
    can be used anywhere you can put a normal scalar.

    You can access the dimensions of an ndarray as a Perl list and
    otherwise determine the size of an ndarray with several methods. The
    important ones are:

       nelem - the total number of elements in an ndarray
       ndims - returns the number of dimensions in an ndarray
       dims  - returns the dimension list of an ndarray as a Perl list
       dim   - returns the size of a particular dimension of an ndarray

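    A quick illustration in the perldl shell (a sketch; the exact output
    formatting may vary between PDL versions):

       pdl> $x = zeroes(4,7,2)
       pdl> p $x->nelem      # 4*7*2 elements in total
       56
       pdl> p $x->ndims
       3
       pdl> p join ',', $x->dims
       4,7,2
       pdl> p $x->dim(1)     # size of dimension 1 alone
       7
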
  Indexing and Dataflow
    PDL maintains a notion of "dataflow" between an ndarray and indexed
    subfields of that ndarray. When you produce an indexed subfield or
    single element of a parent ndarray, the child and parent remain
    attached until you manually disconnect them. This lets you represent
    the same data different ways within your code -- for example, you can
    consider an RGB image simultaneously as a collection of (R,G,B)
    values in a 3 x 1000 x 1000 image, and as three separate 1000 x 1000
    color planes stored in different variables. Modifying any of the
    variables changes the underlying memory, and the changes are
    reflected in all representations of the data.

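    For instance (a sketch, using the slice syntax described later in
    this document):

       $rgb   = zeroes(3,1000,1000);      # packed RGB image
       $red   = $rgb->slice('(0),:,:');   # three views, one per plane
       $green = $rgb->slice('(1),:,:');
       $blue  = $rgb->slice('(2),:,:');
       $red .= 1;   # changes $rgb too: all views share the same memory
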
    There are two important methods that let you control dataflow
    connections between a child and parent ndarray:

       copy  - forces an explicit copy of an ndarray
       sever - breaks the dataflow connection between an ndarray and its
               parents (if any)

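    A minimal sketch of the difference, assuming the behaviour described
    above:

       pdl> $x = sequence(5)
       pdl> $y = $x->slice('0:2')          # child, attached to $x
       pdl> $y++
       pdl> p $x                           # change propagated to parent
       [1 2 3 3 4]
       pdl> $z = $x->slice('0:2')->sever   # detached child
       pdl> $z++
       pdl> p $x                           # parent unchanged this time
       [1 2 3 3 4]
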
  Threading and Dimension Order
    Most PDL operations act on the first few dimensions of their ndarray
    arguments. For example, "sumover" sums all elements along the first
    dimension in the list (dimension 0). If you feed in a three-
    dimensional ndarray, then the first dimension is considered the
    "active" dimension and the later dimensions are "thread" dimensions
    because they are simply looped over. There are several ways to
    transpose or re-order the dimension list of an ndarray. Those
    techniques are very fast since they don't touch the underlying data,
    only change the way that PDL accesses the data. The main dimension
    ordering functions are:

       mv      - moves a particular dimension somewhere else in the
                 dimension list
       xchg    - exchanges two dimensions in the dimension list, leaving
                 the rest alone
       reorder - allows wholesale mixing of the dimensions
       clump   - clumps together two or more small dimensions into one
                 larger one
       squeeze - eliminates any dimensions of size 1

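    For example (a sketch; only the dimension bookkeeping changes, the
    data are not touched):

       pdl> $x = zeroes(2,3,4,1)
       pdl> p join ',', $x->xchg(0,1)->dims
       3,2,4,1
       pdl> p join ',', $x->mv(2,0)->dims
       4,2,3,1
       pdl> p join ',', $x->reorder(3,2,1,0)->dims
       1,4,3,2
       pdl> p join ',', $x->clump(2)->dims
       6,4,1
       pdl> p join ',', $x->squeeze->dims
       2,3,4
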
  Physical and Dummy Dimensions
    •   document Perl level threading

    •   threadids

    •   update and correct description of slice

    •   new functions in slice.pd (affine, lag, splitdim)

    •   reworking of paragraph on explicit threading

Indexing and threading with ndarrays
    A lot of the flexibility and power of PDL relies on the indexing and
    looping features of the Perl extension. Indexing allows access to the
    data of an ndarray in a very flexible way. Threading provides
    efficient implicit looping functionality (since the loops are
    implemented as optimized C code).

    ndarrays are Perl objects that represent multidimensional arrays and
    operations on those. In contrast to simple Perl @x style lists, the
    array data is stored compactly in a single block of memory, thus
    taking up a lot less memory and enabling the use of fast C code to
    implement operations (e.g. addition, etc.) on ndarrays.

  ndarrays can have children
    Central to many of the indexing capabilities of PDL is the relation
    of "parent" and "child" between ndarrays. Many of the indexing
    commands create a new ndarray from an existing ndarray. The new
    ndarray is the "child" and the old one is the "parent". The data of
    the new ndarray is defined by a transformation that specifies how to
    generate (compute) its data from the parent's data. The relation
    between the child ndarray and its parent is often bidirectional,
    meaning that changes in the child's data are propagated back to the
    parent. (Note: you can see that our terminology is already aiming
    towards the new dataflow features. The kind of dataflow that is used
    by the indexing commands (about which you will learn in a minute) is
    always in operation, not only when you have explicitly switched on
    dataflow in your ndarray by saying "$x->doflow". For further
    information about dataflow check the dataflow man page.)

    Another way to interpret the ndarrays created by our indexing
    commands is to view them as a kind of intelligent pointer that points
    back to some portion or all of its parent's data. Therefore, it is
    not surprising that the parent's data (or a portion of it) changes
    when manipulated through this "pointer". After these introductory
    remarks that hopefully prepared you for what is coming (rather than
    confusing you too much) we are going to dive right in and start with
    a description of the indexing commands and some typical examples of
    how they might be used in PDL programs. We will further illustrate
    the pointer/dataflow analogies in the context of some of the examples
    later on.

    There are two different implementations of this "smart pointer"
    relationship: the first one, which is a little slower but works for
    any transformation, is simply to do the transformation forwards and
    backwards as necessary. The other is to consider the child ndarray a
    "virtual" ndarray, which only stores a pointer to the parent and
    access information so that routines which use the child ndarray
    actually access the data in the parent directly. If the virtual
    ndarray is given to a routine which cannot use it, PDL transparently
    physicalizes the virtual ndarray before letting the routine use it.

    Currently (1.94_01) all transformations which are "affine", i.e.
    those where the indices of the data item in the parent ndarray are
    determined by a linear transformation (plus a constant) of the
    indices of the child ndarray, result in virtual ndarrays. All other
    indexing routines (e.g. "->index(...)") result in physical ndarrays.
    All routines compiled by PP can accept affine ndarrays (except those
    routines that pass pointers to external library functions).

    Note that whether something is affine or not does not affect the
    semantics of what you do in any way: both

       $x->index(...) .= 5;
       $x->slice(...) .= 5;

    change the data in $x. The affinity does, however, have a significant
    impact on memory usage and performance.

  Slicing ndarrays
    Probably the most important application of the concept of
    parent/child ndarrays is the representation of rectangular slices of
    a physical ndarray by a virtual ndarray. Having talked long enough
    about concepts let's get more specific. Suppose we are working with a
    2D ndarray representing a 5x5 image (it's unusually small so that we
    can print it without filling several screens full of digits).

       pdl> $im = sequence(5,5)
       pdl> p $im

       [
        [ 0  1  2  3  4]
        [ 5  6  7  8  9]
        [10 11 12 13 14]
        [15 16 17 18 19]
        [20 21 22 23 24]
       ]

       pdl> help vars
       PDL variables in package main::

       Name         Type   Dimension       Flow  State          Mem
       ----------------------------------------------------------------
       $im          Double D [5,5]                P            0.20Kb

    [ here it might be appropriate to quickly talk about the "help vars"
    command that provides information about ndarrays in the interactive
    "perldl" or "pdl2" shell that comes with PDL. ]

    Now suppose we want to create a 1-D ndarray that just references one
    line of the image, say line 2; or an ndarray that represents all even
    lines of the image (imagine we have to deal with even and odd frames
    of an interlaced image due to some peculiar behaviour of our frame
    grabber). As another frequent application of slices we might want to
    create an ndarray that represents a rectangular region of the image
    with top and bottom reversed. All these effects (and many more) can
    be easily achieved with the powerful slice function:

       pdl> $line = $im->slice(':,(2)')
       pdl> $even = $im->slice(':,1:-1:2')
       pdl> $area = $im->slice('3:4,3:1')
       pdl> help vars  # or just PDL->vars
       PDL variables in package main::

       Name         Type   Dimension       Flow  State          Mem
       ----------------------------------------------------------------
       $even        Double D [5,2]                -C           0.00Kb
       $im          Double D [5,5]                P            0.20Kb
       $line        Double D [5]                  -C           0.00Kb
       $area        Double D [2,3]                -C           0.00Kb

    All three "child" ndarrays are children of $im or, in the other
    (largely equivalent) interpretation, pointers to data of $im.
    Operations on those virtual ndarrays access only those portions of
    the data as specified by the argument to slice. So we can just print
    line 2:

       pdl> p $line
       [10 11 12 13 14]

    Also note the difference in the "Flow State" of $area above and
    below:

       pdl> p $area
       pdl> help $area
       This variable is Double D [2,3]                VC           0.00Kb

    The following demonstrates that $im and $line really behave as you
    would expect from a pointer-like object (or in the dataflow picture:
    the changes in $line's data are propagated back to $im):

       pdl> $im++
       pdl> p $line
       [11 12 13 14 15]
       pdl> $line += 2
       pdl> p $im

       [
        [ 1  2  3  4  5]
        [ 6  7  8  9 10]
        [13 14 15 16 17]
        [16 17 18 19 20]
        [21 22 23 24 25]
       ]

    Note how assignment operations on the child virtual ndarrays change
    the parent physical ndarray and vice versa (however, the basic "="
    assignment doesn't; use ".=" to obtain that effect -- see below for
    the reasons). The virtual child ndarrays are something like "live
    links" to the "original" parent ndarray. As previously said, they can
    be thought of as working similarly to a C pointer. But in contrast to
    a C pointer they carry a lot more information. Firstly, they specify
    the structure of the data they represent (the dimensionality of the
    new ndarray) and secondly, they specify how to create this structure
    from its parent's data (the way this works is buried in the internals
    of PDL and not important for you to know anyway, unless you want to
    hack the core in the future or would like to become a PDL guru in
    general -- for a definition of this strange creature see
    PDL::Internals).

    The previous examples have demonstrated typical usage of the slice
    function. Since the slicing functionality is so important, here is an
    explanation of the syntax for the string argument to slice:

       $vpdl = $x->slice('ind0,ind1...')

    where "ind0" specifies what to do with index No 0 of the ndarray $x,
    etc. Each element of the comma separated list can have one of the
    following forms:

    ':'   Use the whole dimension

    'n'   Use only index "n". The dimension of this index in the
          resulting virtual ndarray is 1. An example involving those
          first two index formats:

             pdl> $column = $im->slice('2,:')
             pdl> $row = $im->slice(':,0')
             pdl> p $column

             [
              [ 3]
              [ 8]
              [15]
              [18]
              [23]
             ]

             pdl> p $row

             [
              [1 2 3 4 5]
             ]

             pdl> help $column
             This variable is Double D [1,5]        VC           0.00Kb

             pdl> help $row
             This variable is Double D [5,1]        VC           0.00Kb

    '(n)' Use only index "n". This dimension is removed from the
          resulting ndarray (relying on the fact that a dimension of size
          1 can always be removed). The distinction between this case and
          the previous one becomes important in assignments, where the
          left and right hand side have to have appropriate dimensions.

             pdl> $line = $im->slice(':,(0)')
             pdl> help $line
             This variable is Double D [5]          -C           0.00Kb

             pdl> p $line
             [1 2 3 4 5]

          Spot the difference from the previous example?

    'n1:n2' or 'n1:n2:n3'
          Take the range of indices from "n1" to "n2" or (second form)
          take the range of indices from "n1" to "n2" with step "n3". An
          example of the use of this format is the previous definition of
          the sub-image composed of even lines.

             pdl> $even = $im->slice(':,1:-1:2')

          This example also demonstrates that negative indices work like
          they do for normal Perl style arrays, by counting backwards
          from the end of the dimension. If "n2" is smaller than "n1" (in
          the example -1 is equivalent to index 4) the elements in the
          virtual ndarray are effectively reversed with respect to its
          parent.

    '*[n]'
          Add a dummy dimension. The size of this dimension will be 1 by
          default, or equal to "n" if the optional numerical argument is
          given.

          Now, this is really something a bit strange on first sight.
          What is a dummy dimension? A dummy dimension inserts a
          dimension where there wasn't one before. How is that done?
          Well, in the case of the new dimension having size 1 it can be
          easily explained by the way in which you can identify a vector
          (with "m" elements) with a "(1,m)" or "(m,1)" matrix. The same
          holds obviously for higher dimensional objects. More
          interesting is the case of a dummy dimension of size greater
          than one (e.g. "slice('*5,:')"). This works in the same way as
          a call to the dummy function creating a new dummy dimension. So
          read on and check its explanation below.

    '([n1:n2[:n3]]=i)'
          [Not yet implemented ??????] With an argument like this you
          make generalised diagonals. The diagonal will be dimension no.
          "i" of the new output ndarray and (if the optional part in
          brackets is specified) will extend along the range of indices
          specified of the respective parent ndarray's dimension. In
          general an argument like this only makes sense if there are
          other arguments like this in the same call to slice. The part
          in brackets is optional for this type of argument. All
          arguments of this type that specify the same target dimension
          "i" have to relate to the same number of indices in their
          parent dimension. The best way to explain it is probably to
          give an example; here we make an ndarray that refers to the
          elements along the space diagonal of its parent ndarray (a
          cube):

             $cube = zeroes(5,5,5);
             $sdiag = $cube->slice('(=0),(=0),(=0)');

          The above command creates a virtual ndarray that represents the
          diagonal along the parent's dimensions no. 0, 1 and 2 and makes
          it dimension 0 (the only dimension) of the new ndarray. You use
          the extended syntax if the parent dimensions you want to build
          the diagonal from have different sizes or you want to reverse
          the sequence of elements in the diagonal, e.g.

             $rect = zeroes(12,3,5,6,2);
             $vpdl = $rect->slice('2:7,(0:1=1),(4),(5:4=1),(=1)');

          So the elements of $vpdl will then be related to those of its
          parent in a way we can express as:

             vpdl(i,j) = rect(i+2,j,4,5-j,j)    0<=i<6, 0<=j<2

    [ work in the new index function: "$y = $x->index($c);" ???? ]

  There are different kinds of assignments in PDL
    The previous examples have already shown that virtual ndarrays can be
    used to operate on or access portions of data of a parent ndarray.
    They can also be used as lvalues in assignments (as the use of "++"
    in some of the examples above has already demonstrated). For explicit
    assignments to the data represented by a virtual ndarray you have to
    use the overloaded ".=" operator (which in this context we call
    propagated assignment). Why can't you use the normal assignment
    operator "="?

    Well, you definitely can still use the '=' operator, but it wouldn't
    do what you want. This is due to the fact that the '=' operator
    cannot be overloaded in the same way as other assignment operators.
    If we tried to use '=' to assign data to a portion of a physical
    ndarray through a virtual ndarray we wouldn't achieve the desired
    effect: instead, the variable representing the virtual ndarray (a
    reference to a blessed thingy) would after the assignment just
    contain the reference to another blessed thingy, which would behave
    to future assignments as a "physical" copy of the original rvalue
    [this is actually not yet clear and subject of discussions in the PDL
    developers mailing list]. In that sense it would break the connection
    of the ndarray to the parent [ isn't this behaviour in a sense the
    opposite of what happens in dataflow, where ".=" breaks the
    connection to the parent? ].

    E.g.

       pdl> $line = $im->slice(':,(2)')
       pdl> $line = zeroes(5);
       pdl> $line++;
       pdl> p $im

       [
        [ 1  2  3  4  5]
        [ 6  7  8  9 10]
        [13 14 15 16 17]
        [16 17 18 19 20]
        [21 22 23 24 25]
       ]

       pdl> p $line
       [1 1 1 1 1]

    But using ".="

       pdl> $line = $im->slice(':,(2)')
       pdl> $line .= zeroes(5)
       pdl> $line++
       pdl> p $im

       [
        [ 1  2  3  4  5]
        [ 6  7  8  9 10]
        [ 1  1  1  1  1]
        [16 17 18 19 20]
        [21 22 23 24 25]
       ]

       pdl> print $line
       [1 1 1 1 1]

    Also, you can substitute

       pdl> $line .= 0;

    for the assignment above (the zero is converted to a scalar ndarray,
    with no dimensions, so it can be assigned to any ndarray).

    A nice feature in recent perl versions is lvalue subroutines (i.e.,
    versions 5.6.x and higher, including all perls currently supported by
    PDL). That allows one to use the slicing syntax on both sides of the
    assignment:

       pdl> $im->slice(':,(2)') .= zeroes(5)->xvals->float

    Related to the lvalue sub assignment feature is a little trap for the
    unwary: recent perls introduced a "feature" which breaks PDL's use of
    lvalue subs for slice assignments when running under the perl
    debugger, "perl -d". Under the debugger, the above usage gives an
    error like "Can't return a temporary from lvalue subroutine...", so
    you must use syntax like this:

       pdl> ($pdl = $im->slice(':,(2)')) .= zeroes(5)->xvals->float

    which works both with and without the debugger but is arguably clumsy
    and awkward to read.

    Note that there can be a problem with assignments like this when
    lvalue and rvalue ndarrays refer to overlapping portions of data in
    the parent ndarray:

       # reverse the order of the elements in line 1 of $x
       ($tmp = $x->slice(':,(1)')) .= $x->slice('-1:0,(1)');

    Currently, the parent data on the right side of the assignment is not
    copied before the (internal) assignment loop proceeds. Therefore, the
    outcome of this assignment will depend on the sequence in which
    elements are assigned and almost certainly not do what you wanted.
    The semantics are therefore currently undefined and liable to change
    at any time. To obtain the desired behaviour, use

       ($tmp = $x->slice(':,(1)')) .= $x->slice('-1:0,(1)')->copy;

    which makes a physical copy of the slice, or

       ($tmp = $x->slice(':,(1)')) .= $x->slice('-1:0,(1)')->sever;

    which returns the same slice but severs the connection of the slice
    to its parent.

  Other functions that manipulate dimensions
    Having talked extensively about the slice function it should be noted
    that this is not the only PDL indexing function. There are additional
    indexing functions which are also useful (especially in the context
    of threading, which we will talk about later). Here is a list and
    some examples of how to use them.

501 "dummy"
502 inserts a dummy dimension of the size you specify (default 1) at
503 the chosen location. You can't wait to hear how that is achieved?
504 Well, all elements with index "(X,x,Y)" ("0<=x<size_of_dummy_dim")
505 just map to the element with index "(X,Y)" of the parent ndarray
506 (where "X" and "Y" refer to the group of indices before and after
507 the location where the dummy dimension was inserted.)
508
509 This example calculates the x coordinate of the centroid of an
510 image (later we will learn that we didn't actually need the dummy
511 dimension thanks to the magic of implicit threading; but using
512 dummy dimensions the code would also work in a thread-less world;
513 though once you have worked with PDL threads you wouldn't want to
514 live without them again).
515
516 # centroid
517 ($xd,$yd) = $im->dims;
518 $xc = sum($im*xvals(zeroes($xd))->dummy(1,$yd))/sum($im);
519
520 Let's explain how that works in a little more detail. First, the
521 product:
522
523 $xvs = xvals(zeroes($xd));
524 print $xvs->dummy(1,$yd); # repeat the line $yd times
525 $prod = $im*xvs->dummy(1,$yd); # form the pixel-wise product with
526 # the repeated line of x-values
527
528 The rest is then summing the results of the pixel-wise product
529 together and normalizing with the sum of all pixel values in the
530 original image thereby calculating the x-coordinate of the "center
531 of mass" of the image (interpreting pixel values as local mass)
532 which is known as the centroid of an image.
533
        Next is a (from the point of view of memory consumption) very
        cheap conversion from grey-scale to RGB, i.e. every pixel now
        holds a triple of values instead of a scalar. The three values in
        the triple are, fortunately, all the same for a grey image, so
        that our trick works well in that it maps all three members of
        the triple to the same source element:

           # a cheap grey-scale to RGB conversion
           $rgb = $grey->dummy(0,3)

        Unfortunately this trick cannot be used to convert your old B/W
        photos to color ones in the way you'd like. :(

        Note that the memory usage of ndarrays with dummy dimensions is
        especially sensitive to the internal representation. If the
        ndarray can be represented as a virtual affine ("vaffine")
        ndarray, only the control structures are stored. But if $y in

           $x = zeroes(10000);
           $y = $x->dummy(1,10000);

        is made physical by some routine, you will find that the memory
        usage of your program has suddenly grown by some 800Mb (10000 x
        10000 double values of 8 bytes each).

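        One way to watch this is the "info" method (a sketch; the exact
        format string support and the reported figures depend on your PDL
        version):

           pdl> $x = zeroes(10000)
           pdl> $y = $x->dummy(1,10000)
           pdl> p $y->info("%T %D (%M)")  # still vaffine: almost no
                                          # extra memory
           pdl> $y->make_physical
           pdl> p $y->info("%T %D (%M)")  # now a real 10000x10000 block
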
558 "diagonal"
559 replaces two dimensions (which have to be of equal size) by one
560 dimension that references all the elements along the "diagonal"
561 along those two dimensions. Here, we have two examples which should
562 appear familiar to anyone who has ever done some linear algebra.
563 Firstly, make a unity matrix:
564
565 # unity matrix
566 $e = zeroes(float, 3, 3); # make everything zero
567 ($tmp = $e->diagonal(0,1)) .= 1; # set the elements along the diagonal to 1
568 print $e;
569
570 Or the other diagonal:
571
572 ($tmp = $e->slice(':-1:0')->diagonal(0,1)) .= 2;
573 print $e;
574
575 (Did you notice how we used the slice function to revert the
576 sequence of lines before setting the diagonal of the new child,
577 thereby setting the cross diagonal of the parent ?) Or a mapping
578 from the space of diagonal matrices to the field over which the
579 matrices are defined, the trace of a matrix:
580
581 # trace of a matrix
582 $trace = sum($mat->diagonal(0,1)); # sum all the diagonal elements
583
584 "xchg" and "mv"
585 xchg exchanges or "transposes" the two specified dimensions. A
586 straightforward example:
587
588 # transpose a matrix (without explicitly reshuffling data and
589 # making a copy)
590 $prod = $x x $x->xchg(0,1);
591
592 $prod should now be pretty close to the unity matrix if $x is an
593 orthogonal matrix. Often "xchg" will be used in the context of
594 threading but more about that later.
595
596 mv works in a similar fashion. It moves a dimension (specified by
597 its number in the parent) to a new position in the new child
598 ndarray:
599
600 $y = $x->mv(4,0); # make the 5th dimension of $x the first in the
601 # new child $y
602
603 The difference between "xchg" and "mv" is that "xchg" only changes
604 the position of two dimensions with each other, whereas "mv"
605 inserts the first dimension to the place of second, moving the
606 other dimensions around accordingly.
607
608 "clump"
609 collapses several dimensions into one. Its only argument specifies
610 how many dimensions of the source ndarray should be collapsed
611 (starting from the first). An (admittedly unrealistic) example is a
612 3D ndarray which holds data from a stack of image files that you
613 have just read in. However, the data from each image really
614 represents a 1D time series and has only been arranged that way
615 because it was digitized with a frame grabber. So to have it again
616 as an array of time sequences you say
617
618 pdl> $seqs = $stack->clump(2)
619 pdl> help vars
620 PDL variables in package main::
621
622 Name Type Dimension Flow State Mem
623 ----------------------------------------------------------------
624 $seqs Double D [8000,50] -C 0.00Kb
625 $stack Double D [100,80,50] P 3.05Mb
626
627 Unrealistic as it may seem, our confocal microscope software writes
628 data (sometimes) this way. But more often you use clump to achieve
629 a certain effect when using implicit or explicit threading.
630
  Calls to indexing functions can be chained
    As you might have noticed in some of the examples above, calls to the
    indexing functions can be nicely chained since all of these functions
    return a newly created child object. However, when doing extensive
    index manipulations in a chain, be sure to keep track of what you are
    doing, e.g.

       $x->xchg(0,1)->mv(0,4)

    moves dimension 1 of $x to position 4, since when the second command
    is executed the original dimension 1 has already been moved to
    position 0 of the new child that calls the "mv" function. I think you
    get the idea (in spite of my convoluted explanations).
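
    A quick way to convince yourself is to check the dimension lists (a
    sketch with made-up sizes):

       pdl> $x = zeroes(2,3,4,5,6)
       pdl> p join ',', $x->xchg(0,1)->dims
       3,2,4,5,6
       pdl> p join ',', $x->xchg(0,1)->mv(0,4)->dims
       2,4,5,6,3

    The size-3 dimension (dimension 1 of $x) has indeed ended up in
    position 4.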

  Propagated assignments ('.=') and dummy dimensions
    A subtlety related to indexing is the assignment to ndarrays
    containing dummy dimensions of size greater than 1. These assignments
    (using ".=") are forbidden since several elements of the lvalue
    ndarray point to the same element of the parent. As a consequence the
    values of those parent elements would be potentially ambiguous and
    would depend on the sequence in which the implementation makes the
    assignments to elements. Therefore, an assignment like this:

       $x = pdl [1,2,3];
       $y = $x->dummy(1,4);
       $y .= yvals(zeroes(3,4));

    can produce unexpected results, and the results are explicitly
    undefined by PDL, because when PDL gets parallel computing features
    the current result may well change.

    From the point of view of dataflow the introduction of greater-size-
    than-one dummy dimensions is regarded as an irreversible
    transformation (similar to the terminology in thermodynamics) which
    precludes backward propagation of assignments to a parent (which you
    had explicitly requested using the ".=" assignment). A similar
    problem to watch out for occurs in the context of threading, where
    sometimes dummy dimensions are created implicitly during the thread
    loop (see below).

  Reasons for the parent/child (or "pointer") concept
    [ this will have to wait a bit ]

       XXXXX being memory efficient
       XXXXX in the context of threading
       XXXXX very flexible and powerful way of accessing portions of
             ndarray data (in a much more general way than sec, etc
             allow)
       XXXXX efficient implementation
       XXXXX difference to section/at, etc.

  How to make things physical again
    [ XXXXX fill in later when everything has settled a bit more ]

       ** When needed (xsub routine interfacing C lib function)
       ** How achieved (->physical)
       ** How to test (isphysical (explain how it works currently))
       ** ->copy and ->sever

Threading
    In the previous paragraphs on indexing we have already mentioned the
    term occasionally, but now it's really time to talk explicitly about
    "threading" with ndarrays. The term threading has many different
    meanings in different fields of computing. Within the framework of
    PDL it could probably be loosely defined as an implicit looping
    facility. It is implicit because you don't specify anything like
    enclosing for-loops; rather, the loops are automatically (or
    'magically') generated by PDL based on the dimensions of the ndarrays
    involved. This should give you a first idea why the index/dimension
    manipulating functions you have met in the previous paragraphs are
    especially important and useful in the context of threading. The
    other ingredient for threading (apart from the ndarrays involved) is
    a function that is threading aware (generally, these are PDL::PP
    compiled functions) and that the ndarrays are "threaded" over. So
    much for the terminology; now let's try to shed some light on what it
    all means.

  Implicit threading - a first example
    There are two slightly different variants of threading. We start with
    what we call "implicit threading". Let's pick a practical example
    that involves looping of a function over many elements of an ndarray.
    Suppose we have an RGB image that we want to convert to grey-scale.
    The RGB image is represented by a 3-dim ndarray "im(3,x,y)" where the
    first dimension contains the three color components of each pixel and
    "x" and "y" are width and height of the image, respectively. Next we
    need to specify how to convert a color-triple at a given pixel into a
    grey-value (to be a realistic example it should represent the
    relative intensity with which our color insensitive eye cells would
    detect that color, to achieve what we would call a natural conversion
    from color to grey-scale). An approximation that works quite well is
    to compute the grey intensity from each RGB triplet (r,g,b) as a
    weighted sum

       grey-value = 77/256*r + 150/256*g + 29/256*b =
                    inner([77,150,29]/256, [r,g,b])

    where the last form indicates that we can write this as an inner
    product of the 3-vector comprising the weights for red, green and
    blue components with the 3-vector containing the color components.
    Traditionally, we might have written a function like the following to
    process the whole image:

       my @dims=$im->dims;
       # here normally check that first dim has correct size (3), etc
       $grey=zeroes(@dims[1,2]); # make the ndarray for the resulting
                                 # grey image
       $w = pdl [77,150,29] / 256; # the vector of weights
       for ($j=0;$j<$dims[2];$j++) {
          for ($i=0;$i<$dims[1];$i++) {
              # compute the pixel value
              $tmp = inner($w,$im->slice(":,($i),($j)"));
              set($grey,$i,$j,$tmp); # and set it in the grey-scale image
          }
       }

    Now we write the same using threading (noting that "inner" is a
    threading aware function defined in the PDL::Primitive package)

       $grey = inner($im, pdl(77,150,29)/256);

    We have ended up with a one-liner that automatically creates the
    ndarray $grey with the right number and size of dimensions and
    performs the loops automatically (these loops are implemented as fast
    C code in the internals of PDL). Well, we still owe you an
    explanation of how this 'magic' is achieved.

  How does the example work?
    The first thing to note is that every function that is threading
    aware (these are without exception functions compiled from concise
    descriptions by PDL::PP, later just called PP-functions) expects a
    defined (minimum) number of dimensions (we call them core dimensions)
    from each of its ndarray arguments. The inner function expects two
    one-dimensional (input) parameters from which it calculates a zero-
    dimensional (output) parameter. We write that symbolically as
    "inner((n),(n),[o]())" and call it "inner"'s signature, where n
    represents the size of that dimension. n being equal in the first and
    second parameter means that those dimensions have to be of equal size
    in any call. As a different example take the outer product, which
    takes two 1D vectors to generate a 2D matrix, symbolically written as
    "outer((n),(m),[o](n,m))". The "[o]" in both examples indicates that
    this (here third) argument is an output argument. In the latter
    example the dimensions of the first and second argument don't have to
    agree, but you see how they determine the size of the two dimensions
    of the output ndarray.

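    A quick check of both signatures in the perldl shell (a sketch; the
    output formatting may differ slightly):

       pdl> p inner(pdl(1,2,3), pdl(4,5,6))   # (n),(n) -> ()
       32
       pdl> p outer(pdl(1,2), pdl(10,20,30))  # (n),(m) -> (n,m)
       [
        [10 20]
        [20 40]
        [30 60]
       ]
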
    Here is the point where threading finally enters the game. If you
    call PP-functions with ndarrays that have more than the required core
    dimensions, the first dimensions of the ndarray arguments are used as
    the core dimensions and the additional extra dimensions are threaded
    over. Let us demonstrate this first with our example above

       $grey = inner($im,$w); # w is the weight vector from above

    In this case $w is 1D and so supplies just the core dimension; $im is
    3D, more specifically "(3,x,y)". The first dimension (of size 3) is
    the required core dimension that matches (as required by inner) the
    first (and only) dimension of $w. The second dimension is the first
    thread dimension (of size "x") and the third is here the second
    thread dimension (of size "y"). The output ndarray is automatically
    created (the functional form used here is equivalent to calling
    "inner($im,$w,($grey=null))", see below). The output dimensions are
    obtained by appending the loop dimensions (here "(x,y)") to the core
    output dimensions (here 0D) to yield the final dimensions of the
    auto-created ndarray (here "0D+2D=2D", yielding a 2D output of size
    "(x,y)").

    So the above command calls the core functionality that computes the
    inner product of two 1D vectors "x*y" times with $w and all 1D slices
    of the form "(':,(i),(j)')" of $im and sets the respective elements
    of the output ndarray "$grey(i,j)" to the result of each computation.
    We could write that symbolically as

       $grey(0,0) = f($w,$im(:,(0),(0)))
       $grey(1,0) = f($w,$im(:,(1),(0)))
       .
       .
       .
       $grey(x-2,y-1) = f($w,$im(:,(x-2),(y-1)))
       $grey(x-1,y-1) = f($w,$im(:,(x-1),(y-1)))

    But this is done automatically by PDL without writing any explicit
    Perl loops. We see that the command really creates an output ndarray
    with the right dimensions and sets the elements indeed to the result
    of the computation for each pixel of the input image.

    When even more ndarrays and extra dimensions are involved, things get
    a bit more complicated. We will first give the general rules for how
    the thread dimensions depend on the dimensions of the input ndarrays,
    enabling you to figure out the dimensionality of an auto-created
    output ndarray (for any given set of input ndarrays and core
    dimensions of the PP-function in question). The general rules will
    most likely appear a bit confusing on first sight, so we'll set out
    to illustrate the usage with a set of further examples (which will
    hopefully also demonstrate that there are indeed many practical
    situations where threading comes in extremely handy).

  A call for coding discipline
    Before we point out the other technical details of threading, please
    note this call for programming discipline when using threading:

    In order to preserve human readability, PLEASE comment any nontrivial
    expression in your code involving threading. Most importantly, for
    any subroutine, include information at the beginning about what you
    expect the dimensions to represent (or ranges of dimensions).

    As a warning, look at this undocumented function and try to guess
    what might be going on:

       sub lookup {
         my ($im,$palette) = @_;
         my $res;
         index($palette->xchg(0,1),
               $im->long->dummy(0,$palette->dim(0)),
               ($res=null));
         return $res;
       }

    Would you agree that it might be difficult to figure out the expected
    dimensions, the purpose of the routine, etc.? (If you want to find
    out what this piece of code does, see below.)

  How to figure out the loop dimensions
    There are a couple of rules that allow you to figure out the number
    and size of loop dimensions (and whether the sizes of your input
    ndarrays comply with the threading rules). Dimensions of any ndarray
    argument are broken down into two groups in the following: core
    dimensions (as defined by the PP-function, see Appendix B for a list
    of PDL primitives) and extra dimensions, which comprise all remaining
    dimensions of that ndarray. For example, calling a function "func"
    with the signature "func((n,m),[o](n))" with an ndarray
    "$x(2,4,7,1,3)" as "func($x,($o = null))" results in the semantic
    splitting of $x's dimensions into: core dimensions "(2,4)" and extra
    dimensions "(7,1,3)".

    R0    Core dimensions are identified with the first N dimensions of
          the respective ndarray argument (and are required). Any further
          dimensions are extra dimensions and are used to determine the
          loop dimensions.

    R1    The number of (implicit) loop dimensions is equal to the
          maximal number of extra dimensions taken over the set of
          ndarray arguments.

    R2    The size of each of the loop dimensions is derived from the
          size of the respective dimensions of the ndarray arguments. The
          size of a loop dimension is given by the maximal size found in
          any of the ndarrays having this extra dimension.

    R3    For all ndarrays that have a given extra dimension the size
          must be equal to the size of the loop dimension (as determined
          by the previous rule) or 1; otherwise a runtime exception is
          raised. If the size of the extra dimension in an ndarray is
          one, it is implicitly treated as a dummy dimension of size
          equal to that loop dim size when performing the thread loop.

    R4    If an ndarray doesn't have a loop dimension, in the thread loop
          this ndarray is treated as if having a dummy dimension of size
          equal to the size of that loop dimension.

    R5    If output auto-creation is used (by setting the relevant
          ndarray to "PDL->null" before invocation) the number of
          dimensions of the created ndarray is equal to the sum of the
          number of core output dimensions and the number of loop
          dimensions. The sizes of the core output dimensions are derived
          from the relevant dimensions of the input ndarrays (as
          specified in the function definition); each of the remaining
          dimensions is equal in size to the loop dimension it is derived
          from. The automatically created ndarray will be physical
          (unless dataflow is in operation).

    In this context, note that you can run into the problem with
    assignment to ndarrays containing greater-than-one dummy dimensions
    (see above). Although your output ndarray(s) didn't contain any dummy
    dimensions in the first place, they may end up with implicitly
    created dummy dimensions according to R4.

    As an example, suppose we have a (here unspecified) PP-function with
    the signature:

       func((m,n),(m,n,o),(m),[o](m,o))

    and you call it with 3 ndarrays "$x(5,3,10,11)", "$y(5,3,2,10,1,12)",
    and "$z(5,1,11,12)" as

       func($x,$y,$z,($d=null))

    then the number of loop dimensions is 3 (by "R0+R1" from $y and $z)
    with sizes "(10,11,12)" (by R2); the two output core dimensions are
    "(5,2)" (from the signature of func), resulting in a 5-dimensional
    output ndarray $d of size "(5,2,10,11,12)" (see R5). The
    automatically created $d is derived from "($x,$y,$z)" in a way that
    can be expressed in pdl pseudo-code as

       $d(:,:,i,j,k) .= func($x(:,:,i,j),$y(:,:,:,i,0,k),$z(:,0,j,k))
          with 0<=i<10, 0<=j<11, 0<=k<12

    If we analyze the color to grey-scale conversion again with these
    rules in mind we note another great advantage of implicit threading.
    We can call the conversion with an ndarray representing a pixel
    ("im(3)"), a line of rgb pixels ("im(3,x)"), a proper color image
    ("im(3,x,y)") or a whole stack of RGB images ("im(3,x,y,z)"). As long
    as $im is of the form "(3,...)" the automatically created output
    ndarray will contain the right number of dimensions and contain the
    intensity data as we expect it, since the loops have been implicitly
    performed thanks to implicit threading. You can easily convince
    yourself that calling with a color pixel gives a 0D $grey; with a
    line it turns out 1D, "grey(x)"; with an image we get "grey(x,y)";
    and finally we get a converted image stack "grey(x,y,z)".

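    In code, this flexibility might look like the following sketch (image
    sizes chosen arbitrarily):

       $w = pdl(77,150,29)/256;
       print inner(random(3), $w)->ndims;                   # 0, a pixel
       print join ',', inner(random(3,100), $w)->dims;      # 100
       print join ',', inner(random(3,100,50), $w)->dims;   # 100,50
       print join ',', inner(random(3,100,50,7), $w)->dims; # 100,50,7
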
    Let's fill these general rules with some more life by going through a
    couple of further examples. The reader may try to figure out
    equivalent formulations with explicit for-looping and compare the
    flexibility of those routines using implicit threading to the
    explicit formulation. Furthermore, especially when using several
    thread dimensions it is a useful exercise to check the relative speed
    by doing some benchmark tests (which we still have to do).

    First in the row is a slightly reworked centroid example, now coded
    with threading in mind.

       # threaded mult to calculate centroid coords, works for stacks as
       # well
       $xc = sumover(($im*xvals(($im->dims)[0]))->clump(2)) /
             sumover($im->clump(2));

    Let's analyze what's going on step by step. First the product:

       $prod = $im*xvals(zeroes(($im->dims)[0]))

    This will actually work for $im being one, two, three, and higher
    dimensional. If $im is one-dimensional it's just an ordinary product
    (in the sense that every element of $im is multiplied with the
    respective element of "xvals(...)"); if $im has more dimensions,
    further threading is done by adding appropriate dummy dimensions to
    "xvals(...)" according to R4. More importantly, the two sumover
    operations show a first example of how to make use of the dimension
    manipulating commands. A quick look at sumover's signature will
    remind you that it will only "gobble up" the first dimension of a
    given input ndarray. But what if we really want to compute the sum
    over all elements of the first two dimensions? Well, nothing keeps us
    from passing a virtual ndarray into sumover, which in this case is
    formed by clumping the first two dimensions of the "parent ndarray"
    into one. From the point of view of the parent ndarray the sum is now
    computed over the first two dimensions, just as we wanted, though
    sumover has just done the job as specified by its signature. Got it?

    Another little finesse of writing the code like that: we
    intentionally used "sumover($pdl->clump(2))" instead of "sum($pdl)"
    so that we can either pass just an image "(x,y)" or a stack of images
    "(x,y,t)" into this routine and get either just one x-coordinate or a
    vector of x-coordinates (of size t) in return.

    Another set of common operations are what one could call "projection
    operations". These operations take an N-D ndarray as input and return
    an (N-1)-D "projected" ndarray. These operations are often performed
    with functions like sumover, prodover, minimum and maximum. Using
    again images as examples we might want to calculate the maximum pixel
    value for each line of an image or image stack. We know how to do
    that:

       # maxima of lines (as function of line number and time)
       maximum($stack,($ret=null));

    But what if you want to calculate maxima per column, when implicit
    threading always applies the core functionality to the first
    dimension and threads over all others? How can we arrange instead for
    the core functionality to be applied to the second dimension and the
    threading to be done over the others? Can you guess it? Yes, we make
    a virtual ndarray that has the second dimension of the "parent
    ndarray" as its first dimension, using the "mv" command.

       # maxima of columns (as function of column number and time)
       maximum($stack->mv(1,0),($ret=null));

    and calculating all the sums of sub-slices over the third dimension
    is now almost too easy

       # sums of pixels in time (assuming time is the third dim)
       sumover($stack->mv(2,0),($ret=null));

    Finally, if you want to apply the operation to all elements (like max
    over all elements or sum over all elements) regardless of the
    dimensions of the ndarray in question, "clump" comes in handy. As an
    example look at the definition of "sum" (adapted from "Ufunc.pm"):

       sub sum {
         my ($x) = @_;
         my $tmp;
         PDL::Ufunc::sumover($x->clump(-1),($tmp=null));
         return $tmp->at(); # return a Perl number, not a 0D ndarray
       }

    We have already mentioned that all basic operations support threading
    and assignment is no exception. So here are a couple of threaded
    assignments

       pdl> $im = zeroes(byte, 10,20)
       pdl> $line = exp(-rvals(10)**2/9)
       # threaded assignment
       pdl> $im .= $line  # set every line of $im to $line
       pdl> $im2 .= 5     # set every element of $im2 to 5

    By now you probably see how it works and what it does, don't you?

    To finish the examples in this paragraph, here is a function to
    create an RGB image from what is called a palette image. The palette
    image consists of two parts: an image of indices into a color lookup
    table and the color lookup table itself. [ describe how it works ] We
    are going to use a PP-function we haven't encountered yet in the
    previous examples. It is the aptly named index function, signature
    "((n),(),[o]())" (see Appendix B) with the core functionality that
    "index(pdl (0,2,4,5),2,($ret=null))" will return the element with
    index 2 of the first input ndarray. In this case, $ret will contain
    the value 4. So here is the example:

       # a threaded index lookup to generate an RGB, or RGBA or YMCK
       # image from a palette image (represented by a lookup table
       # $palette and a color-index image $im)
       # you can say just dummy(0) since the rules of threading make it
       # fit
       pdl> index($palette->xchg(0,1),
                  $im->long->dummy(0,$palette->dim(0)),
                  ($res=null));

    Let's go through it and explain the steps involved. Assuming we are
    dealing with an RGB lookup-table, $palette is of size "(3,x)". First
    we exchange the dimensions of the palette so that looping is done
    over the first dimension of $palette (of size 3, which represents the
    r, g, and b components). Now looking at $im, we add a dummy dimension
    of size equal to the number of color components (in the case we are
    discussing here we could have just used the number 3 since we have 3
    color components). We can use a dummy dimension since for the red,
    green and blue color components we use the same index from the
    original image; e.g. assuming a certain pixel of $im had the value 4,
    the lookup should produce the triple

       [palette(0,4),palette(1,4),palette(2,4)]

    for the new red, green and blue components of the output image.
    Hopefully by now you have some sort of idea what the above piece of
    code is supposed to do (it is often actually quite complicated to
    describe in detail how a piece of threading code works; just go ahead
    and experiment a bit to get a better feeling for it).

    If you have read the threading rules carefully, then you might have
    noticed that we didn't have to explicitly state the size of the dummy
    dimension that we created for $im; when we create it with size 1 (the
    default) the rules of threading make it automatically fit to the
    desired size (by rule R3; in our example the size would be 3,
    assuming a palette of size "(3,x)"). Since situations like this occur
    often in practice, this is actually why rule R3 has been introduced
    (the part that makes dimensions of size 1 fit to the thread loop dim
    size). So we can just say

       pdl> index($palette->xchg(0,1),$im->long->dummy(0),($res=null));

    Again, you can convince yourself that this routine will create the
    right output if called with a pixel ($im is 0D), a line ($im is 1D),
    an image ($im is 2D), ..., an RGB lookup table (palette is "(3,x)")
    or an RGBA lookup table (palette is "(4,x)", see e.g. OpenGL). This
    flexibility is achieved by the rules of threading, which are made to
    do the right thing in most situations.

    To wrap it all up once again, the general idea is as follows. If you
    want to achieve looping over certain dimensions and have the core
    functionality applied to another specified set of dimensions, you use
    the dimension manipulating commands to create one (or several)
    virtual ndarrays so that from the point of view of the parent
    ndarray(s) you get what you want (always having the signature of the
    function in question and R1-R5 in mind!). Easy, isn't it?

  Output auto-creation and PP-function calling conventions
    At this point we have to divert to some technical detail that has to
    do with the general calling conventions of PP-functions and the
    automatic creation of output arguments. Basically, there are two ways
    of invoking PDL routines, namely

       $result = func($x,$y);

    and

       func($x,$y,$result);

    If you are only using implicit threading then the output variable can
    be automatically created by PDL. You flag that to the PP-function by
    setting the output argument to a special kind of ndarray that is
    returned from a call to the function "PDL->null", which returns an
    essentially "empty" ndarray (for those interested in details there is
    a flag in the C pdl structure for this). The dimensions of the
    created ndarray are determined by the rules of implicit threading:
    the first dimensions are the core output dimensions, to which the
    threading dimensions are appended (which are in turn determined by
    the dimensions of the input ndarrays as described above). So you can
    say

       func($x,$y,($result=PDL->null));

    or

       $result = func($x,$y)

    which are exactly equivalent.

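    For example, with "sumover" the two equivalent styles read (a
    sketch):

       sumover($x, ($sums = null));  # output argument, auto-created
       $sums = sumover($x);          # functional style, same effect
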
    Be warned that you cannot use output auto-creation when using
    explicit threading (for reasons explained in the following section on
    explicit threading, the second variant of threading).

    In "tight" loops you probably want to avoid the implicit creation of
    a temporary ndarray in each step of the loop that comes along with
    the "functional" style, but rather say

       # create output ndarray of appropriate size only at first
       # invocation
       $result = null;
       for (0..$n) {
           func($x,$y,$result); # in all but the first invocation $result
                                # is defined and has the right size to
                                # take the output, provided $y's dims
                                # don't change
           func2($y);
           twiddle($result,$x); # do something with $result and $x for
                                # the next iteration
       }

    The take-home message of this section once more: be aware of the
    limitation on output creation when using explicit threading.

  Explicit threading
    Having so far only talked about the first flavour of threading, it is
    now about time to introduce the second variant. Instead of shuffling
    around dimensions all the time and relying on the rules of implicit
    threading to get it all right, you sometimes might want to specify in
    a more explicit way how to perform the thread loop. It is probably
    not too surprising that this variant of the game is called explicit
    threading. Now, before we create the wrong impression: it is not
    either implicit or explicit; the two flavours do mix. But more about
    that later.

    The two most used functions with explicit threading are thread and
    unthread. We start with an example that illustrates typical usage of
    the former:

       [ # ** this is the worst possible example to start with ]
       # but can be used to show that $mat += $line is different from
       # $mat->thread(0) += $line
       # explicit threading to add a vector to each column of a matrix
       pdl> $mat = zeroes(4,3)
       pdl> $line = pdl (3.1416,2,-2)
       pdl> ($tmp = $mat->thread(0)) += $line

    In this example, "$mat->thread(0)" tells PDL that you want dimension
    0 (the first dimension) of this ndarray to be threaded over first,
    leading to a thread loop that can be expressed as

       for (j=0; j<3; j++) {
         for (i=0; i<4; i++) {
             mat(i,j) += src(j);
         }
       }

1173 "thread" takes a list of numbers as arguments which explicitly specify
1174 which dimensions to thread over first. With the introduction of
1175 explicit threading the dimensions of an ndarray are conceptually split
1176 into three different groups the latter two of which we have already
1177 encountered: thread dimensions, core dimensions and extra dimensions.
1178
1179 Conceptually, it is best to think of those dimensions of an ndarray
1180 that have been specified in a call to "thread" as being taken away from
1181 the set of normal dimensions and put on a separate stack. So assuming
1182 we have an ndarray "x(4,7,2,8)" saying
1183
1184 $y = $x->thread(2,1)
1185
1186 creates a new virtual ndarray of dimension "y(4,8)" (which we call the
1187 remaining dims) that also has 2 thread dimensions of size "(2,7)". For
1188 the purposes of this document we write that symbolically as
1189 "y(4,8){2,7}". An important difference to the previous examples where
1190 only implicit threading was used is the fact that the core dimensions
1191 are matched against the remaining dimensions which are not necessarily
1192 the first dimensions of the ndarray. We will now specify how the
1193 presence of thread dimensions changes the rules R1-R5 for thread loops
1194 (which apply to the special case where none of the ndarray arguments
1195 has any thread dimensions).
1196
T0  Core dimensions are matched against the first n remaining
    dimensions of the ndarray argument (note the difference from R1).
    Any further remaining dimensions are extra dimensions and are used
    to determine the implicit loop dimensions.

T1a The number of implicit loop dimensions is equal to the maximal
    number of extra dimensions taken over the set of ndarray arguments.

T1b The number of explicit loop dimensions is equal to the maximal
    number of thread dimensions taken over the set of ndarray
    arguments.

T1c The total number of loop dimensions is equal to the sum of explicit
    loop dimensions and implicit loop dimensions. In the thread loop,
    explicit loop dimensions are threaded over first, followed by
    implicit loop dimensions.

T2  The size of each of the loop dimensions is derived from the size of
    the respective dimensions of the ndarray arguments. It is given by
    the maximal size found in any of the ndarrays having this thread
    dimension (for explicit loop dimensions) or extra dimension (for
    implicit loop dimensions).

T3  This rule applies to any explicit loop dimension as well as any
    implicit loop dimension. For all ndarrays that have a given
    thread/extra dimension, the size must be equal to the size of the
    respective explicit/implicit loop dimension or 1; otherwise a
    runtime exception is raised. If the size of a thread/extra
    dimension of an ndarray is one, it is implicitly treated as a dummy
    dimension of size equal to the explicit/implicit loop dimension.

T4  If an ndarray doesn't have a thread/extra dimension that
    corresponds to an explicit/implicit loop dimension, in the thread
    loop this ndarray is treated as if having a dummy dimension of size
    equal to the size of that loop dimension.

T4a All ndarrays that do have thread dimensions must have the same
    number of thread dimensions.

T5  Output auto-creation cannot be used if any of the ndarray arguments
    has any thread dimensions. Otherwise R5 applies.

The same restrictions apply with regard to implicit dummy dimensions
(created by application of T4) as already mentioned in the section on
implicit threading: if any of the output ndarrays has an (explicitly or
implicitly created) greater-than-one dummy dimension, a runtime
exception will be raised.

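As a small concrete illustration of T3 and T5 (a sketch; the ndarrays
are made up for this purpose): a size-1 thread dimension is silently
expanded to match, and the output has to be supplied and threaded by
hand because auto-creation is not available here.

    pdl> $big   = sequence(5,3)   # thread dim of size 3 after thread(1)
    pdl> $small = sequence(5,1)   # thread dim of size 1: dummy by T3
    pdl> $out   = zeroes(3)       # supplied explicitly because of T5
    pdl> inner($big->thread(1), $small->thread(1), $out->thread(0))
    # out(j) = sum_i big(i,j)*small(i,0) for j=0..2
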
Let us demonstrate these rules at work in a generic case. Suppose we
have a (here unspecified) PP-function with the signature:

    func((m,n),(m),(),[o](m))

and you call it with 3 ndarrays "a(5,3,10,11)", "b(3,5,10,1,12)" and
"c(10)", and an output ndarray "d(3,11,5,10,12)" (which here cannot be
automatically created) as

    func($a->thread(1,3),$b->thread(0,3),$c,$d->thread(0,1))

From the signature of func and the above call, the ndarrays split into
the following groups of core, extra and thread dimensions (written in
the form "pdl(core dims){thread dims}[extra dims]"):

    a(5,10){3,11}[]  b(5){3,1}[10,12]  c(){}[10]  d(5){3,11}[10,12]

With this to help us along (it is in general helpful to write the
arguments down like this when you start playing with threading and want
to keep track of what is going on), we further deduce that the number
of explicit loop dimensions is 2 (by T1b, from $a and $b) with sizes
"(3,11)" (by T2); that there are 2 implicit loop dimensions (by T1a,
from $b and $d) of size "(10,12)" (by T2); and that the elements of $d
are computed from the input ndarrays in a way that can be expressed in
pdl pseudo-code as

    for (l=0; l<12; l++)
      for (k=0; k<10; k++)
        for (j=0; j<11; j++)
          for (i=0; i<3; i++)
            d(i,j,:,k,l) = func(a(:,i,:,j),b(i,:,k,0,l),c(k))
            /* the fixed 0 in b's fourth index is the effect of
               treating its size-1 thread dim as a dummy dim (index j) */

Ugh, this example was really not easy in terms of bookkeeping. It
serves mostly as an example of how to figure out what's going on when
you encounter a complicated-looking expression. But now it is really
time to show that threading is useful by giving some more of our
so-called "practical" examples.

[ The following examples will need some additional explanations in the
future. For the moment please try to live with the comments in the code
fragments. ]

Example 1:

    *** inverse of matrix represented by eigvecs and eigvals
    ** given a symmetric matrix M = A^T x diag(lambda_i) x A
    ** => inverse M^-1 = A^T x diag(1/lambda_i) x A
    ** first $tmp = diag(1/lambda_i)*A
    ** then A^T * $tmp by threaded inner product
    # index handling so that matrices print correctly under pdl
    $inv .= $evecs*0;  # just copy to get appropriately sized output
    $tmp .= $evecs;    # initialise, no back-propagation
    ($tmp2 = $tmp->thread(0)) /= $evals;  # threaded division
    # and now a matrix multiplication in disguise
    PDL::Primitive::inner($evecs->xchg(0,1)->thread(-1,1),
                          $tmp->thread(0,-1),
                          $inv->thread(0,1));
    # alternative for matrix mult using implicit threading,
    # first xchg only for transpose
    PDL::Primitive::inner($evecs->xchg(0,1)->dummy(1),
                          $tmp->xchg(0,1)->dummy(2),
                          ($inv=null));

Example 2:

    # outer product by threaded multiplication
    # stress that we need to do it with an explicit call to my_biop1
    # when using explicit threading
    $res = zeroes(($x->dims)[0], ($y->dims)[0]);
    my_biop1($x->thread(0,-1), $y->thread(-1,0), $res->thread(0,1), "*");
    # similar thing by implicit threading with auto-created ndarray
    $res = $x->dummy(1) * $y->dummy(0);

Example 3:

    # different use of thread and unthread to shuffle a number of
    # dimensions in one go without lots of calls to ->xchg and ->mv

    # use thread/unthread to shuffle dimensions around
    # just try it out and compare the child ndarray with its parent
    $trans = $x->thread(4,1,0,3,2)->unthread;

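For a plain dimension shuffle like this, the same effect can be had
with the "reorder" method (a sketch, assuming a 5-dimensional $x):

    pdl> $x = sequence(2,3,4,5,6)
    pdl> print join(',', $x->thread(4,1,0,3,2)->unthread->dims)
    # expect 6,3,2,5,4
    pdl> print join(',', $x->reorder(4,1,0,3,2)->dims)
    # same dimension order as above
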
Example 4:

    # calculate a couple of bounding boxes
    # $bb will hold BB as [xmin,xmax],[ymin,ymax],[zmin,zmax]
    # we use again thread and unthread to shuffle dimensions around
    pdl> $bb = zeroes(double, 2, 3);
    pdl> minimum($vertices->thread(0)->clump->unthread(1), $bb->slice('(0),:'));
    pdl> maximum($vertices->thread(0)->clump->unthread(1), $bb->slice('(1),:'));

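If $vertices is just a flat 2-D list of points, i.e. of shape
(3,nvertices), the same bounds can be obtained with implicit threading
only (a sketch using auto-created outputs):

    pdl> $mins = $vertices->xchg(0,1)->minimum   # (3): xmin,ymin,zmin
    pdl> $maxs = $vertices->xchg(0,1)->maximum   # (3): xmax,ymax,zmax
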
Example 5:

    # calculate a self-ratioed (i.e. self-normalized) sequence of images
    # uses explicit threading and an implicitly threaded division
    $stack = read_image_stack();
    # calculate the average (per pixel average) of the first $n+1 images
    $aver = zeroes(($stack->dims)[0,1]);  # make the output ndarray
    sumover($stack->slice(":,:,0:$n")->thread(0,1), $aver->thread(0,1));
    $aver /= ($n+1);
    $stack /= $aver;  # normalize the stack by doing a threaded division
    # alternatively calculate $aver with implicit threading and
    # auto-creation of the output ndarray
    sumover($stack->slice(":,:,0:$n")->mv(2,0), ($aver=null));
    $aver /= ($n+1);

Implicit versus explicit threading
In this paragraph we are going to illustrate when explicit threading is
preferable to implicit threading and vice versa. But then again, this
is probably not the best way of putting the case since you already
know: the two flavours do mix. So, it's more about how to get the best
of both worlds and, anyway, in the best of Perl traditions: TIMTOWTDI!

[ Sorry, this still has to be filled in in a later release; either
refer to the above examples or choose some new ones ]

Finally, this may be a good place to justify all the technical detail
we have been going on about for a couple of pages: why threading?

Well, code that uses threading should be (considerably) faster than
code that uses explicit for-loops (or similar Perl constructs) to
achieve the same functionality. Especially on supercomputers (with
vector computing facilities/parallel processing), PDL threading will be
implemented in a way that takes advantage of the additional facilities
of these machines. Furthermore, it is a conceptually simple construct
(though technical details might get involved at times) and can greatly
reduce the syntactical complexity of PDL code (but keep the admonition
for documentation in mind). Once you are comfortable with the threading
way of thinking (and coding), it shouldn't be too difficult to
understand code that somebody else has written (provided they gave you
an idea of what the expected input dimensions are, etc.). As a general
tip to increase the performance of your code: if you have to introduce
a loop into your code, try to reformulate the problem so that you can
use threading to perform the loop (as with anything there are
exceptions to this rule of thumb; but the authors of this document tend
to think that these are rare cases ;).

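To make that tip concrete, here is a sketch of the same per-row sum
computed with a Perl-level loop and with threading (the image $img and
its dimensions are made up for the example):

    # Perl-level loop over the second dimension: slow
    $sums = zeroes(($img->dims)[1]);
    for my $j (0 .. ($img->dims)[1] - 1) {
        $sums->set($j, $img->slice(":,($j)")->sum);
    }

    # threaded equivalent: one call, the loop happens inside PDL
    sumover($img, ($sums = null));
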
PDL::PP
An easy way to define functions that are aware of indexing and
threading (and the universe and everything)

PDL::PP is part of the PDL distribution. It is used to generate
functions that are aware of indexing and threading rules from very
concise descriptions. It can be useful for you if you want to write
your own functions or if you want to interface functions from an
external library so that they support indexing and threading (and
maybe dataflow as well, see PDL::Dataflow). For further details check
PDL::PP.

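To give a flavour of how concise these descriptions are, here is a
sketch of a PDL::PP definition of a sumover-like function as it might
appear in a .pd file (the function name is made up; consult PDL::PP
for the authoritative syntax):

    # mysum.pd -- run through PDL::PP at module build time
    pp_def('mysumover',
           Pars => 'a(n); [o]b();',   # the signature: threading is free
           Code => '$GENERIC() tmp = 0;
                    loop(n) %{ tmp += $a(); %}
                    $b() = tmp;');
    pp_done();
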
Appendix A
Affine transformations - a special class of simple and powerful
transformations

[ This is also something to be added in future releases. Do we already
have the general make_affine routine in PDL? It is possible that we
will reference another appropriate man page from here ]

Appendix B
signatures of standard PDL::PP compiled functions

A selection of signatures of PDL primitives to show how many dimensions
PP compiled functions gobble up (and therefore you can figure out what
will be threaded over). Most of those functions are the basic ones
defined in "primitive.pd".

    # functions in primitive.pd
    #
    sumover       ((n),[o]())
    prodover      ((n),[o]())
    axisvalues    ((n))                        inplace
    inner         ((n),(n),[o]())
    outer         ((n),(m),[o](n,m))
    innerwt       ((n),(n),(n),[o]())
    inner2        ((m),(m,n),(n),[o]())
    inner2t       ((j,n),(n,m),(m,k),[o]())
    index         (1D,0D,[o])
    minimum       (1D,[o])
    maximum       (1D,[o])
    wstat         ((n),(n),(),[o],())
    assgn         ((),())

    # basic operations
    binary operations  ((),(),[o]())
    unary operations   ((),[o]())

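Reading these signatures tells you directly how implicit threading will
behave. For instance, "outer" consumes one dimension from each input
and any further dimensions are threaded over (a sketch):

    pdl> $x = sequence(4,2)   # core (n=4) plus one extra dim [2]
    pdl> $y = sequence(5)     # core (m=5), no extra dims
    pdl> $r = outer($x, $y)   # auto-created output
    pdl> print join(',', $r->dims)
    # expect 4,5,2: core dims (n,m) first, then the thread dim
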
COPYRIGHT
Copyright (C) 1997 Christian Soeller (c.soeller@auckland.ac.nz) &
Tuomas J. Lukka (lukka@fas.harvard.edu). All rights reserved. Although
destined for release as a man page with the standard PDL distribution,
it is not public domain. Permission is granted to freely distribute
verbatim copies of this document provided that no modifications outside
of formatting be made, and that this notice remain intact. You are
permitted and encouraged to use its code and derivatives thereof in
your own source code for fun or for profit as you see fit.


perl v5.34.0                      2021-08-16                     INDEXING(1)