1INDEXING(1) User Contributed Perl Documentation INDEXING(1)
2
3
4
6 PDL::Indexing - Introduction to indexing and slicing ndarrays.
7
9 This man page should serve as a first tutorial on the indexing and
10 broadcasting features of PDL.
11
12 Like all vectorized languages, PDL automates looping over multi-
13 dimensional data structures ("ndarrays") using a variant of
14 mathematical vector notation. The automatic looping is called
15 "broadcasting", similar to NumPy and Julia. PDL also automatically runs
16 broadcasting computation in parallel - see PDL::ParallelCPU.
17
18 A lot of the flexibility and power of PDL relies on the indexing and
19 broadcasting features of the Perl extension. Indexing allows access to
20 the data of an ndarray in a very flexible way. Broadcasting provides
21 efficient vectorization of simple operations.
22
23 The values of an ndarray are stored compactly as typed values in a
24 single block of memory, not (as in a normal Perl list-of-lists) as
25 individual Perl scalars.
26
27 In the sections that follow many "methods" are called out -- these are
28 Perl operators that apply to ndarrays. From the perldl (or pdl2)
29 shell, you can find out more about each method by typing "?" followed
30 by the method name.
31
32 Dimension lists
33 A ndarray (PDL variable), in general, is an N-dimensional array where N
34 can be 0 (for a scalar), 1 (e.g. for a sound sample), or higher values
35 for images and more complex structures. Each dimension of the ndarray
36 has a positive integer size. The "perl" interpreter treats each
37 ndarray as a special type of Perl scalar (a blessed Perl object,
38 actually -- but you don't have to know that to use them) that can be
39 used anywhere you can put a normal scalar.
40
41 You can access the dimensions of an ndarray as a Perl list and
42 otherwise determine the size of an ndarray with several methods. The
43 important ones are:
44
45 nelem - the total number of elements in an ndarray
46 ndims - returns the number of dimensions in an ndarray
47 dims - returns the dimension list of an ndarray as a Perl list
48 dim - returns the size of a particular dimension of an ndarray
49
50 Indexing and Dataflow
51 PDL maintains a notion of "dataflow" between an ndarray and indexed
52 subfields of that ndarray. When you produce an indexed subfield or
53 single element of a parent ndarray, the child and parent remain
54 attached until you manually disconnect them. This lets you represent
55 the same data different ways within your code -- for example, you can
56 consider an RGB image simultaneously as a collection of (R,G,B) values
57 in a 3 x 1000 x 1000 image, and as three separate 1000 x 1000 color
58 planes stored in different variables. Modifying any of the variables
59 changes the underlying memory, and the changes are reflected in all
60 representations of the data.
61
62 There are two important methods that let you control dataflow
63 connections between a child and parent ndarray:
64
65 copy - forces an explicit copy of an ndarray
66 sever - breaks the dataflow connection between an ndarray and its
67 parents (if any)
68
69 Broadcasting and Dimension Order
70 Most PDL operations act on the first few dimensions of their ndarray
71 arguments. For example, "sumover" sums all elements along the first
72 dimension in the list (dimension 0). If you feed in a three-
73 dimensional ndarray, then the first dimension is considered the
74 "active" dimension and the later dimensions are "broadcast" dimensions
75 because they are simply looped over. There are several ways to
76 transpose or re-order the dimension list of an ndarray. Those
77 techniques are very fast since they don't touch the underlying data,
78 only change the way that PDL accesses the data. The main dimension
79 ordering functions are:
80
81 mv - moves a particular dimension somewhere else in the dimension list
82 xchg - exchanges two dimensions in the dimension list, leaving the rest
83 alone
84 reorder - allows wholesale mixing of the dimensions
85 clump - clumps together two or more small dimensions into one larger
86 one
87 squeeze - eliminates any dimensions of size 1
88
89 Physical and Dummy Dimensions
90 • document Perl level broadcasting
91
92 • broadcastids
93
94 • update and correct description of slice
95
96 • new functions in slice.pd (affine, lag, splitdim)
97
98 • reworking of paragraph on explicit broadcasting
99
101 A lot of the flexibility and power of PDL relies on the indexing and
102 looping features of the Perl extension. Indexing allows access to the
103 data of an ndarray in a very flexible way. Broadcasting provides
104 efficient implicit looping functionality (since the loops are
105 implemented as optimized C code).
106
107 ndarrays are Perl objects that represent multidimensional arrays and
108 operations on those. In contrast to simple Perl @x style lists the
109 array data is compactly stored in a single block of memory thus taking
110 up a lot less memory and enabling use of fast C code to implement
111 operations (e.g. addition, etc) on ndarrays.
112
113 ndarrays can have children
114 Central to many of the indexing capabilities of PDL are the relation of
115 "parent" and "child" between ndarrays. Many of the indexing commands
116 create a new ndarray from an existing ndarray. The new ndarray is the
117 "child" and the old one is the "parent". The data of the new ndarray is
118 defined by a transformation that specifies how to generate (compute)
119 its data from the parent's data. The relation between the child ndarray
120 and its parent are often bidirectional, meaning that changes in the
121 child's data are propagated back to the parent. (Note: You see, we are
122 aiming in our terminology already towards the new dataflow features.
123 The kind of dataflow that is used by the indexing commands (about which
124 you will learn in a minute) is always in operation, not only when you
125 have explicitly switched on dataflow in your ndarray by saying
126 "$x->doflow". For further information about data flow check the
127 dataflow man page.)
128
129 Another way to interpret the ndarrays created by our indexing commands
130 is to view them as a kind of intelligent pointer that points back to
131 some portion or all of its parent's data. Therefore, it is not
132 surprising that the parent's data (or a portion of it) changes when
133 manipulated through this "pointer". After these introductory remarks
134 that hopefully prepared you for what is coming (rather than confuse you
135 too much) we are going to dive right in and start with a description of
136 the indexing commands and some typical examples how they might be used
137 in PDL programs. We will further illustrate the pointer/dataflow
138 analogies in the context of some of the examples later on.
139
140 There are two different implementations of this ``smart pointer''
141 relationship: the first one, which is a little slower but works for any
142 transformation is simply to do the transformation forwards and
143 backwards as necessary. The other is to consider the child ndarray a
144 ``virtual'' ndarray, which only stores a pointer to the parent and
145 access information so that routines which use the child ndarray
146 actually directly access the data in the parent. If the virtual
147 ndarray is given to a routine which cannot use it, PDL transparently
148 physicalizes the virtual ndarray before letting the routine use it.
149
150 Currently (1.94_01) all transformations which are ``affine'', i.e. the
151 indices of the data item in the parent ndarray are determined by a
152 linear transformation (+ constant) from the indices of the child
153 ndarray result in virtual ndarrays. All other indexing routines (e.g.
154 "->index(...)") result in physical ndarrays. All routines compiled by
155 PP can accept affine ndarrays (except those routines that pass pointers
156 to external library functions).
157
158 Note that whether something is affine or not does not affect the
159 semantics of what you do in any way: both
160
161 $x->index(...) .= 5;
162 $x->slice(...) .= 5;
163
164 change the data in $x. The affinity does, however, have a significant
165 impact on memory usage and performance.
166
167 Slicing ndarrays
168 Probably the most important application of the concept of parent/child
169 ndarrays is the representation of rectangular slices of a physical
170 ndarray by a virtual ndarray. Having talked long enough about concepts
171 let's get more specific. Suppose we are working with a 2D ndarray
172 representing a 5x5 image (it's unusually small so that we can print it
173 without filling several screens full of digits).
174
175 pdl> $im = sequence(5,5)
176 pdl> p $im
177
178 [
179 [ 0 1 2 3 4]
180 [ 5 6 7 8 9]
181 [10 11 12 13 14]
182 [15 16 17 18 19]
183 [20 21 22 23 24]
184 ]
185
186 pdl> help vars
187 PDL variables in package main::
188
189 Name Type Dimension Flow State Mem
190 ----------------------------------------------------------------
191 $im Double D [5,5] P 0.20Kb
192
193 [ here it might be appropriate to quickly talk about the "help vars"
194 command that provides information about ndarrays in the interactive
195 "perldl" or "pdl2" shell that comes with PDL. ]
196
197 Now suppose we want to create a 1-D ndarray that just references one
198 line of the image, say line 2; or an ndarray that represents all even
199 lines of the image (imagine we have to deal with even and odd frames of
200 an interlaced image due to some peculiar behaviour of our frame
201 grabber). As another frequent application of slices we might want to
202 create an ndarray that represents a rectangular region of the image
203 with top and bottom reversed. All these effects (and many more) can be
204 easily achieved with the powerful slice function:
205
206 pdl> $line = $im->slice(':,(2)')
207 pdl> $even = $im->slice(':,1:-1:2')
208 pdl> $area = $im->slice('3:4,3:1')
209 pdl> help vars # or just PDL->vars
210 PDL variables in package main::
211
212 Name Type Dimension Flow State Mem
213 ----------------------------------------------------------------
214 $even Double D [5,2] -C 0.00Kb
215 $im Double D [5,5] P 0.20Kb
216 $line Double D [5] -C 0.00Kb
217 $area Double D [2,3] -C 0.00Kb
218
219 All three "child" ndarrays are children of $im or in the other (largely
220 equivalent) interpretation pointers to data of $im. Operations on
221 those virtual ndarrays access only those portions of the data as
222 specified by the argument to slice. So we can just print line 2:
223
224 pdl> p $line
225 [10 11 12 13 14]
226
227 Also note the difference in the "Flow State" of $area above and below:
228
229 pdl> p $area
230 pdl> help $area
231 This variable is Double D [2,3] VC 0.00Kb
232
233 The following demonstrates that $im and $line really behave as you
234 would expect from a pointer-like object (or in the dataflow picture:
235 the changes in $line's data are propagated back to $im):
236
237 pdl> $im++
238 pdl> p $line
239 [11 12 13 14 15]
240 pdl> $line += 2
241 pdl> p $im
242
243 [
244 [ 1 2 3 4 5]
245 [ 6 7 8 9 10]
246 [13 14 15 16 17]
247 [16 17 18 19 20]
248 [21 22 23 24 25]
249 ]
250
251 Note how assignment operations on the child virtual ndarrays change the
252 parent physical ndarray and vice versa (however, the basic "="
253 assignment doesn't, use ".=" to obtain that effect. See below for the
254 reasons). The virtual child ndarrays are something like "live links"
255 to the "original" parent ndarray. As previously said, they can be
256 thought of to work similar to a C-pointer. But in contrast to a
257 C-pointer they carry a lot more information. Firstly, they specify the
258 structure of the data they represent (the dimensionality of the new
259 ndarray) and secondly, specify how to create this structure from its
260 parents data (the way this works is buried in the internals of PDL and
261 not important for you to know anyway (unless you want to hack the core
262 in the future or would like to become a PDL guru in general (for a
263 definition of this strange creature see PDL::Internals)).
264
265 The previous examples have demonstrated typical usage of the slice
266 function. Since the slicing functionality is so important here is an
267 explanation of the syntax for the string argument to slice:
268
269 $vpdl = $x->slice('ind0,ind1...')
270
271 where "ind0" specifies what to do with index No 0 of the ndarray $x,
272 etc. Each element of the comma separated list can have one of the
273 following forms:
274
275 ':' Use the whole dimension
276
277 'n' Use only index "n". The dimension of this index in the resulting
278 virtual ndarray is 1. An example involving those first two index
279 formats:
280
281 pdl> $column = $im->slice('2,:')
282 pdl> $row = $im->slice(':,0')
283 pdl> p $column
284
285 [
286 [ 3]
287 [ 8]
288 [15]
289 [18]
290 [23]
291 ]
292
293 pdl> p $row
294
295 [
296 [1 2 3 4 5]
297 ]
298
299 pdl> help $column
300 This variable is Double D [1,5] VC 0.00Kb
301
302 pdl> help $row
303 This variable is Double D [5,1] VC 0.00Kb
304
305 '(n)' Use only index "n". This dimension is removed from the resulting
306 ndarray (relying on the fact that a dimension of size 1 can
307 always be removed). The distinction between this case and the
308 previous one becomes important in assignments where left and
309 right hand side have to have appropriate dimensions.
310
311 pdl> $line = $im->slice(':,(0)')
312 pdl> help $line
313 This variable is Double D [5] -C 0.00Kb
314
315 pdl> p $line
316 [1 2 3 4 5]
317
318 Spot the difference to the previous example?
319
320 'n1:n2' or 'n1:n2:n3'
321 Take the range of indices from "n1" to "n2" or (second form) take
322 the range of indices from "n1" to "n2" with step "n3". An example
323 for the use of this format is the previous definition of the sub-
324 image composed of even lines.
325
326 pdl> $even = $im->slice(':,1:-1:2')
327
328 This example also demonstrates that negative indices work like
329 they do for normal Perl style arrays by counting backwards from
330 the end of the dimension. If "n2" is smaller than "n1" (in the
331 example -1 is equivalent to index 4) the elements in the virtual
332 ndarray are effectively reverted with respect to its parent.
333
334 '*[n]'
335 Add a dummy dimension. The size of this dimension will be 1 by
336 default or equal to "n" if the optional numerical argument is
337 given.
338
339 Now, this is really something a bit strange on first sight. What
340 is a dummy dimension? A dummy dimension inserts a dimension where
341 there wasn't one before. How is that done ? Well, in the case of
342 the new dimension having size 1 it can be easily explained by the
343 way in which you can identify a vector (with "m" elements) with
344 an "(1,m)" or "(m,1)" matrix. The same holds obviously for higher
345 dimensional objects. More interesting is the case of a dummy
346 dimensions of size greater than one (e.g. "slice('*5,:')"). This
347 works in the same way as a call to the dummy function creates a
348 new dummy dimension. So read on and check its explanation below.
349
350 '([n1:n2[:n3]]=i)'
351 [Not yet implemented ??????] With an argument like this you make
352 generalised diagonals. The diagonal will be dimension no. "i" of
353 the new output ndarray and (if optional part in brackets
354 specified) will extend along the range of indices specified of
355 the respective parent ndarray's dimension. In general an argument
356 like this only makes sense if there are other arguments like this
357 in the same call to slice. The part in brackets is optional for
358 this type of argument. All arguments of this type that specify
359 the same target dimension "i" have to relate to the same number
360 of indices in their parent dimension. The best way to explain it
361 is probably to give an example, here we make an ndarray that
362 refers to the elements along the space diagonal of its parent
363 ndarray (a cube):
364
365 $cube = zeroes(5,5,5);
366 $sdiag = $cube->slice('(=0),(=0),(=0)');
367
368 The above command creates a virtual ndarray that represents the
369 diagonal along the parents' dimension no. 0, 1 and 2 and makes
370 its dimension 0 (the only dimension) of it. You use the extended
371 syntax if the dimension sizes of the parent dimensions you want
372 to build the diagonal from have different sizes or you want to
373 reverse the sequence of elements in the diagonal, e.g.
374
375 $rect = zeroes(12,3,5,6,2);
376 $vpdl = $rect->slice('2:7,(0:1=1),(4),(5:4=1),(=1)');
377
378 So the elements of $vpdl will then be related to those of its
379 parent in way we can express as:
380
381 vpdl(i,j) = rect(i+2,j,4,5-j,j) 0<=i<5, 0<=j<2
382
383 [ work in the new index function: "$y = $x->index($c);" ???? ]
384
385 There are different kinds of assignments in PDL
386 The previous examples have already shown that virtual ndarrays can be
387 used to operate on or access portions of data of a parent ndarray. They
388 can also be used as lvalues in assignments (as the use of "++" in some
389 of the examples above has already demonstrated). For explicit
390 assignments to the data represented by a virtual ndarray you have to
391 use the overloaded ".=" operator (which in this context we call
392 propagated assignment). Why can't you use the normal assignment
393 operator "="?
394
395 Well, you definitely still can use the '=' operator but it wouldn't do
396 what you want. This is due to the fact that the '=' operator cannot be
397 overloaded in the same way as other assignment operators. If we tried
398 to use '=' to try to assign data to a portion of a physical ndarray
399 through a virtual ndarray we wouldn't achieve the desired effect
400 (instead the variable representing the virtual ndarray (a reference to
401 a blessed thingy) would after the assignment just contain the reference
402 to another blessed thingy which would behave to future assignments as a
403 "physical" copy of the original rvalue [this is actually not yet clear
404 and subject of discussions in the PDL developers mailing list]. In that
405 sense it would break the connection of the ndarray to the parent [
406 isn't this behaviour in a sense the opposite of what happens in
407 dataflow, where ".=" breaks the connection to the parent? ].
408
409 E.g.
410
411 pdl> $line = $im->slice(':,(2)')
412 pdl> $line = zeroes(5);
413 pdl> $line++;
414 pdl> p $im
415
416 [
417 [ 1 2 3 4 5]
418 [ 6 7 8 9 10]
419 [13 14 15 16 17]
420 [16 17 18 19 20]
421 [21 22 23 24 25]
422 ]
423
424 pdl> p $line
425 [1 1 1 1 1]
426
427 But using ".="
428
429 pdl> $line = $im->slice(':,(2)')
430 pdl> $line .= zeroes(5)
431 pdl> $line++
432 pdl> p $im
433
434 [
435 [ 1 2 3 4 5]
436 [ 6 7 8 9 10]
437 [ 1 1 1 1 1]
438 [16 17 18 19 20]
439 [21 22 23 24 25]
440 ]
441
442 pdl> print $line
443 [1 1 1 1 1]
444
445 Also, you can substitute
446
447 pdl> $line .= 0;
448
449 for the assignment above (the zero is converted to a scalar ndarray,
450 with no dimensions so it can be assigned to any ndarray).
451
452 A nice feature in recent perl versions is lvalue subroutines (i.e.,
453 versions 5.6.x and higher including all perls currently supported by
454 PDL). That allows one to use the slicing syntax on both sides of the
455 assignment:
456
457 pdl> $im->slice(':,(2)') .= zeroes(5)->xvals->float
458
459 Related to the lvalue sub assignment feature is a little trap for the
460 unwary: recent perls introduced a "feature" which breaks PDL's use of
461 lvalue subs for slice assignments when running under the perl debugger,
462 "perl -d". Under the debugger, the above usage gives an error like: "
463 Can't return a temporary from lvalue subroutine... " So you must use
464 syntax like this:
465
466 pdl> ($pdl = $im->slice(':,(2)')) .= zeroes(5)->xvals->float
467
468 which works both with and without the debugger but is arguably clumsy
469 and awkward to read.
470
471 Note that there can be a problem with assignments like this when lvalue
472 and rvalue ndarrays refer to overlapping portions of data in the parent
473 ndarray:
474
475 # revert the elements of the first line of $x
476 ($tmp = $x->slice(':,(1)')) .= $x->slice('-1:0,(1)');
477
478 Currently, the parent data on the right side of the assignments is not
479 copied before the (internal) assignment loop proceeds. Therefore, the
480 outcome of this assignment will depend on the sequence in which
481 elements are assigned and almost certainly not do what you wanted. So
482 the semantics are currently undefined for now and liable to change
483 anytime. To obtain the desired behaviour, use
484
485 ($tmp = $x->slice(':,(1)')) .= $x->slice('-1:0,(1)')->copy;
486
487 which makes a physical copy of the slice or
488
489 ($tmp = $x->slice(':,(1)')) .= $x->slice('-1:0,(1)')->sever;
490
491 which returns the same slice but severs the connection of the slice to
492 its parent.
493
494 Other functions that manipulate dimensions
495 Having talked extensively about the slice function it should be noted
496 that this is not the only PDL indexing function. There are additional
497 indexing functions which are also useful (especially in the context of
498 broadcasting which we will talk about later). Here are a list and some
499 examples how to use them.
500
501 "dummy"
502 inserts a dummy dimension of the size you specify (default 1) at
503 the chosen location. You can't wait to hear how that is achieved?
504 Well, all elements with index "(X,x,Y)" ("0<=x<size_of_dummy_dim")
505 just map to the element with index "(X,Y)" of the parent ndarray
506 (where "X" and "Y" refer to the group of indices before and after
507 the location where the dummy dimension was inserted.)
508
509 This example calculates the x coordinate of the centroid of an
510 image (later we will learn that we didn't actually need the dummy
511 dimension thanks to the magic of implicit broadcasting; but using
512 dummy dimensions the code would also work in a broadcast-less
513 world; though once you have worked with PDL broadcasting you
514 wouldn't want to live without them again).
515
516 # centroid
517 ($xd,$yd) = $im->dims;
518 $xc = sum($im*xvals(zeroes($xd))->dummy(1,$yd))/sum($im);
519
520 Let's explain how that works in a little more detail. First, the
521 product:
522
523 $xvs = xvals(zeroes($xd));
524 print $xvs->dummy(1,$yd); # repeat the line $yd times
525 $prod = $im*xvs->dummy(1,$yd); # form the pixel-wise product with
526 # the repeated line of x-values
527
528 The rest is then summing the results of the pixel-wise product
529 together and normalizing with the sum of all pixel values in the
530 original image thereby calculating the x-coordinate of the "center
531 of mass" of the image (interpreting pixel values as local mass)
532 which is known as the centroid of an image.
533
534 Next is a (from the point of view of memory consumption) very cheap
535 conversion from grey-scale to RGB, i.e. every pixel holds now a
536 triple of values instead of a scalar. The three values in the
537 triple are, fortunately, all the same for a grey image, so that our
538 trick works well in that it maps all the three members of the
539 triple to the same source element:
540
541 # a cheap grey-scale to RGB conversion
542 $rgb = $grey->dummy(0,3)
543
544 Unfortunately this trick cannot be used to convert your old B/W
545 photos to color ones in the way you'd like. :(
546
547 Note that the memory usage of ndarrays with dummy dimensions is
548 especially sensitive to the internal representation. If the ndarray
549 can be represented as a virtual affine (``vaffine'') ndarray, only
550 the control structures are stored. But if $y in
551
552 $x = zeroes(10000);
553 $y = $x->dummy(1,10000);
554
555 is made physical by some routine, you will find that the memory
556 usage of your program has suddenly grown by 100Mb.
557
558 "diagonal"
559 replaces two dimensions (which have to be of equal size) by one
560 dimension that references all the elements along the "diagonal"
561 along those two dimensions. Here, we have two examples which should
562 appear familiar to anyone who has ever done some linear algebra.
563 Firstly, make a unity matrix:
564
565 # unity matrix
566 $e = zeroes(float, 3, 3); # make everything zero
567 ($tmp = $e->diagonal(0,1)) .= 1; # set the elements along the diagonal to 1
568 print $e;
569
570 Or the other diagonal:
571
572 ($tmp = $e->slice(':-1:0')->diagonal(0,1)) .= 2;
573 print $e;
574
575 (Did you notice how we used the slice function to revert the
576 sequence of lines before setting the diagonal of the new child,
577 thereby setting the cross diagonal of the parent ?) Or a mapping
578 from the space of diagonal matrices to the field over which the
579 matrices are defined, the trace of a matrix:
580
581 # trace of a matrix
582 $trace = sum($mat->diagonal(0,1)); # sum all the diagonal elements
583
584 "xchg" and "mv"
585 xchg exchanges or "transposes" the two specified dimensions. A
586 straightforward example:
587
588 # transpose a matrix (without explicitly reshuffling data and
589 # making a copy)
590 $prod = $x x $x->xchg(0,1);
591
592 $prod should now be pretty close to the unity matrix if $x is an
593 orthogonal matrix. Often "xchg" will be used in the context of
594 broadcasting but more about that later.
595
596 mv works in a similar fashion. It moves a dimension (specified by
597 its number in the parent) to a new position in the new child
598 ndarray:
599
600 $y = $x->mv(4,0); # make the 5th dimension of $x the first in the
601 # new child $y
602
603 The difference between "xchg" and "mv" is that "xchg" only changes
604 the position of two dimensions with each other, whereas "mv"
605 inserts the first dimension to the place of second, moving the
606 other dimensions around accordingly.
607
608 "clump"
609 collapses several dimensions into one. Its only argument specifies
610 how many dimensions of the source ndarray should be collapsed
611 (starting from the first). An (admittedly unrealistic) example is a
612 3D ndarray which holds data from a stack of image files that you
613 have just read in. However, the data from each image really
614 represents a 1D time series and has only been arranged that way
615 because it was digitized with a frame grabber. So to have it again
616 as an array of time sequences you say
617
618 pdl> $seqs = $stack->clump(2)
619 pdl> help vars
620 PDL variables in package main::
621
622 Name Type Dimension Flow State Mem
623 ----------------------------------------------------------------
624 $seqs Double D [8000,50] -C 0.00Kb
625 $stack Double D [100,80,50] P 3.05Mb
626
627 Unrealistic as it may seem, our confocal microscope software writes
628 data (sometimes) this way. But more often you use clump to achieve
629 a certain effect when using implicit or explicit broadcasting.
630
631 Calls to indexing functions can be chained
632 As you might have noticed in some of the examples above calls to the
633 indexing functions can be nicely chained since all of these functions
634 return a newly created child object. However, when doing extensive
635 index manipulations in a chain be sure to keep track of what you are
636 doing, e.g.
637
638 $x->xchg(0,1)->mv(0,4)
639
640 moves the dimension 1 of $x to position 4 since when the second command
641 is executed the original dimension 1 has been moved to position 0 of
642 the new child that calls the "mv" function. I think you get the idea
643 (in spite of my convoluted explanations).
644
645 Propagated assignments ('.=') and dummy dimensions
646 A subtlety related to indexing is the assignment to ndarrays containing
647 dummy dimensions of size greater than 1. These assignments (using ".=")
648 are forbidden since several elements of the lvalue ndarray point to the
649 same element of the parent. As a consequence the value of those parent
650 elements are potentially ambiguous and would depend on the sequence in
651 which the implementation makes the assignments to elements. Therefore,
652 an assignment like this:
653
654 $x = pdl [1,2,3];
655 $y = $x->dummy(1,4);
656 $y .= yvals(zeroes(3,4));
657
658 can produce unexpected results and the results are explicitly undefined
659 by PDL because when PDL gets parallel computing features, the current
660 result may well change.
661
662 From the point of view of dataflow the introduction of greater-size-
663 than-one dummy dimensions is regarded as an irreversible transformation
664 (similar to the terminology in thermodynamics) which precludes backward
665 propagation of assignment to a parent (which you had explicitly
666 requested using the ".=" assignment). A similar problem to watch out
667 for occurs in the context of broadcasting where sometimes dummy
668 dimensions are created implicitly during the broadcast loop (see
669 below).
670
671 Reasons for the parent/child (or "pointer") concept
672 [ this will have to wait a bit ]
673
674 XXXXX being memory efficient
675 XXXXX in the context of broadcasting
676 XXXXX very flexible and powerful way of accessing portions of ndarray data
677 (in much more general way than sec, etc allow)
678 XXXXX efficient implementation
679 XXXXX difference to section/at, etc.
680
681 How to make things physical again
682 [ XXXXX fill in later when everything has settled a bit more ]
683
684 ** When needed (xsub routine interfacing C lib function)
685 ** How achieved (->physical)
686 ** How to test (isphysical (explain how it works currently))
687 ** ->copy and ->sever
688
690 In the previous paragraph on indexing we have already mentioned the
691 term occasionally but now its really time to talk explicitly about
692 "broadcasting" with ndarrays: within the framework of PDL it could
693 probably be loosely defined as an implicit looping facility. It is
694 implicit because you don't specify anything like enclosing for-loops
695 but rather the loops are automatically (or 'magically') generated by
696 PDL based on the dimensions of the ndarrays involved. This should give
697 you a first idea why the index/dimension manipulating functions you
698 have met in the previous paragraphs are especially important and useful
699 in the context of broadcasting. The other ingredient for broadcasting
700 (apart from the ndarrays involved) is a function that is broadcasting
701 aware (generally, these are PDL::PP compiled functions) and that the
702 ndarrays are "broadcast" over. So much about the terminology and now
703 let's try to shed some light on what it all means.
704
705 Implicit broadcasting - a first example
706 There are two slightly different variants of broadcasting. We start
707 with what we call "implicit broadcasting". Let's pick a practical
708 example that involves looping of a function over many elements of a
709 ndarray. Suppose we have an RGB image that we want to convert to grey-
710 scale. The RGB image is represented by a 3-dim ndarray "im(3,x,y)"
711 where the first dimension contains the three color components of each
712 pixel and "x" and "y" are width and height of the image, respectively.
713 Next we need to specify how to convert a color-triple at a given pixel
714 into a grey-value (to be a realistic example it should represent the
715 relative intensity with which our color insensitive eye cells would
716 detect that color to achieve what we would call a natural conversion
717 from color to grey-scale). An approximation that works quite well is to
718 compute the grey intensity from each RGB triplet (r,g,b) as a weighted
719 sum
720
721 grey-value = 77/256*r + 150/256*g + 29/256*b =
722 inner([77,150,29]/256, [r,g,b])
723
724 where the last form indicates that we can write this as an inner
725 product of the 3-vector comprising the weights for red, green and blue
726 components with the 3-vector containing the color components.
727 Traditionally, we might have written a function like the following to
728 process the whole image:
729
730 my @dims=$im->dims;
731 # here normally check that first dim has correct size (3), etc
732 $grey=zeroes(@dims[1,2]); # make the ndarray for the resulting grey image
733 $w = pdl [77,150,29] / 256; # the vector of weights
734 for ($j=0;$j<dims[2];$j++) {
735 for ($i=0;$i<dims[1];$i++) {
736 # compute the pixel value
737 $tmp = inner($w,$im->slice(':,(i),(j)'));
738 set($grey,$i,$j,$tmp); # and set it in the grey-scale image
739 }
740 }
741
742 Now we write the same using broadcasting (noting that "inner" is a
743 broadcasting aware function defined in the PDL::Primitive package)
744
745 $grey = inner($im,pdl([77,150,29]/256));
746
747 We have ended up with a one-liner that automatically creates the
748 ndarray $grey with the right number and size of dimensions and performs
749 the loops automatically (these loops are implemented as fast C code in
750 the internals of PDL). Well, we still owe you an explanation how this
751 'magic' is achieved.
752
753 How does the example work ?
754 The first thing to note is that every function that is broadcasting
755 aware (these are without exception functions compiled from concise
756 descriptions by PDL::PP, later just called PP-functions) expects a
757 defined (minimum) number of dimensions (we call them core dimensions)
758 from each of its ndarray arguments. The inner function expects two one-
759 dimensional (input) parameters from which it calculates a zero-
760 dimensional (output) parameter. We write that symbolically as
761 "inner((n),(n),[o]())" and call it "inner"'s signature, where n
762 represents the size of that dimension. n being equal in the first and
763 second parameter means that those dimensions have to be of equal size
764 in any call. As a different example take the outer product which takes
765 two 1D vectors to generate a 2D matrix, symbolically written as
766 "outer((n),(m),[o](n,m))". The "[o]" in both examples indicates that
767 this (here third) argument is an output argument. In the latter example
768 the dimensions of first and second argument don't have to agree but you
769 see how they determine the size of the two dimensions of the output
770 ndarray.
771
772 Here is the point when broadcasting finally enters the game. If you
773 call PP-functions with ndarrays that have more than the required core
774 dimensions the first dimensions of the ndarray arguments are used as
775 the core dimensions and the additional extra dimensions are broadcast
776 over. Let us demonstrate this first with our example above
777
778 $grey = inner($im,$w); # w is the weight vector from above
779
780 In this case $w is 1D and so supplied just the core dimension, $im is
781 3D, more specifically "(3,x,y)". The first dimension (of size 3) is the
782 required core dimension that matches (as required by inner) the first
783 (and only) dimension of $w. The second dimension is the first broadcast
784 dimension (of size "x") and the third is here the second broadcast
785 dimension (of size "y"). The output ndarray is automatically created
786 (as requested by setting $grey to "null" prior to invocation). The
787 output dimensions are obtained by appending the loop dimensions (here
788 "(x,y)") to the core output dimensions (here 0D) to yield the final
789 dimensions of the auto-created ndarray (here "0D+2D=2D" to yield a 2D
790 output of size "(x,y)").
791
792 So the above command calls the core functionality that computes the
793 inner product of two 1D vectors "x*y" times with $w and all 1D slices
794 of the form "(':,(i),(j)')" of $im and sets the respective elements of
795 the output ndarray "$grey(i,j)" to the result of each computation. We
796 could write that symbolically as
797
798 $grey(0,0) = f($w,$im(:,(0),(0)))
799 $grey(1,0) = f($w,$im(:,(1),(0)))
800 .
801 .
802 .
803 $grey(x-2,y-1) = f($w,$im(:,(x-2),(y-1)))
804 $grey(x-1,y-1) = f($w,$im(:,(x-1),(y-1)))
805
806 But this is done automatically by PDL without writing any explicit Perl
807 loops. We see that the command really creates an output ndarray with
808 the right dimensions and sets the elements indeed to the result of the
809 computation for each pixel of the input image.
810
811 When even more ndarrays and extra dimensions are involved things get a
812 bit more complicated. We will first give the general rules how the
813 broadcast dimensions depend on the dimensions of input ndarrays
814 enabling you to figure out the dimensionality of an auto-created output
815 ndarray (for any given set of input ndarrays and core dimensions of the
816 PP-function in question). The general rules will most likely appear a
817 bit confusing on first sight so that we'll set out to illustrate the
818 usage with a set of further examples (which will hopefully also
819 demonstrate that there are indeed many practical situations where
820 broadcasting comes in extremely handy).
821
822 A call for coding discipline
823 Before we point out the other technical details of broadcasting, please
824 note this call for programming discipline when using broadcasting:
825
826 In order to preserve human readability, PLEASE comment any nontrivial
827 expression in your code involving broadcasting. Most importantly, for
828 any subroutine, include information at the beginning about what you
829 expect the dimensions to represent (or ranges of dimensions).
830
831 As a warning, look at this undocumented function and try to guess what
832 might be going on:
833
834 sub lookup {
835 my ($im,$palette) = @_;
836 my $res;
837 index($palette->xchg(0,1),
838 $im->long->dummy(0,($palette->dim)[0]),
839 ($res=null));
840 return $res;
841 }
842
843 Would you agree that it might be difficult to figure out expected
844 dimensions, purpose of the routine, etc ? (If you want to find out
845 what this piece of code does, see below)
846
847 How to figure out the loop dimensions
848 There are a couple of rules that allow you to figure out number and
849 size of loop dimensions (and if the size of your input ndarrays comply
850 with the broadcasting rules). Dimensions of any ndarray argument are
851 broken down into two groups in the following: Core dimensions (as
852 defined by the PP-function, see Appendix B for a list of PDL
853 primitives) and extra dimensions which comprises all remaining
854 dimensions of that ndarray. For example calling a function "func" with
855 the signature "func((n,m),[o](n))" with an ndarray "$x(2,4,7,1,3)" as
856 "f($x,($o = null))" results in the semantic splitting of x's dimensions
857 into: core dimensions "(2,4)" and extra dimensions "(7,1,3)".
858
859 R0 Core dimensions are identified with the first N dimensions of the
860 respective ndarray argument (and are required). Any further
861 dimensions are extra dimensions and used to determine the loop
862 dimensions.
863
864 R1 The number of (implicit) loop dimensions is equal to the maximal
865 number of extra dimensions taken over the set of ndarray
866 arguments.
867
868 R2 The size of each of the loop dimensions is derived from the size
869 of the respective dimensions of the ndarray arguments. The size
870 of a loop dimension is given by the maximal size found in any of
871 the ndarrays having this extra dimension.
872
873 R3 For all ndarrays that have a given extra dimension the size must
874 be equal to the size of the loop dimension (as determined by the
875 previous rule) or 1; otherwise you raise a runtime exception. If
876 the size of the extra dimension in an ndarray is one it is
877 implicitly treated as a dummy dimension of size equal to that
878 loop dim size when performing the broadcast loop.
879
880 R4 If an ndarray doesn't have a loop dimension, in the broadcast
881 loop this ndarray is treated as if having a dummy dimension of
882 size equal to the size of that loop dimension.
883
884 R5 If output auto-creation is used (by setting the relevant ndarray
885 to "PDL->null" before invocation) the number of dimensions of the
886 created ndarray is equal to the sum of the number of core output
887 dimensions + number of loop dimensions. The size of the core
888 output dimensions is derived from the relevant dimension of input
889 ndarrays (as specified in the function definition) and the sizes
890 of the other dimensions are equal to the size of the loop
891 dimension it is derived from. The automatically created ndarray
892 will be physical (unless dataflow is in operation).
893
894 In this context, note that you can run into the problem with assignment
895 to ndarrays containing greater-than-one dummy dimensions (see above).
896 Although your output ndarray(s) didn't contain any dummy dimensions in
897 the first place they may end up with implicitly created dummy
898 dimensions according to R4.
899
900 As an example, suppose we have a (here unspecified) PP-function with
901 the signature:
902
903 func((m,n),(m,n,o),(m),[o](m,o))
904
905 and you call it with 3 ndarrays "$x(5,3,10,11)", "$y(5,3,2,10,1,12)",
906 and "$z(5,1,11,12)" as
907
908 func($x,$y,$z,($d=null))
909
910 then the number of loop dimensions is 3 (by "R0+R1" from $y and $z)
911 with sizes "(10,11,12)" (by R2); the two output core dimensions are
912 "(5,2)" (from the signature of func) resulting in a 5-dimensional
913 output ndarray $c of size "(5,2,10,11,12)" (see R5) and (the
914 automatically created) $d is derived from "($x,$y,$z)" in a way that
915 can be expressed in pdl pseudo-code as
916
917 $d(:,:,i,j,k) .= func($x(:,:,i,j),$y(:,:,:,i,0,k),$z(:,0,j,k))
918 with 0<=i<10, 0<=j<=11, 0<=k<12
919
920 If we analyze the color to grey-scale conversion again with these rules
921 in mind we note another great advantage of implicit broadcasting. We
922 can call the conversion with an ndarray representing a pixel (im(3)), a
923 line of rgb pixels ("im(3,x)"), a proper color image ("im(3,x,y)") or a
924 whole stack of RGB images ("im(3,x,y,z)"). As long as $im is of the
925 form "(3,...)" the automatically created output ndarray will contain
926 the right number of dimensions and contain the intensity data as we
927 expect it since the loops have been implicitly performed thanks to
928 implicit broadcasting. You can easily convince yourself that calling
929 with a color pixel $grey is 0D, with a line it turns out 1D grey(x),
930 with an image we get "grey(x,y)" and finally we get a converted image
931 stack "grey(x,y,z)".
932
933 Let's fill these general rules with some more life by going through a
934 couple of further examples. The reader may try to figure out equivalent
935 formulations with explicit for-looping and compare the flexibility of
936 those routines using implicit broadcasting to the explicit formulation.
937 Furthermore, especially when using several broadcast dimensions it is a
938 useful exercise to check the relative speed by doing some benchmark
939 tests (which we still have to do).
940
941 First in the row is a slightly reworked centroid example, now coded
942 with broadcasting in mind.
943
944 # broadcast mult to calculate centroid coords, works for stacks as well
945 $xc = sumover(($im*xvals(($im->dims)[0]))->clump(2)) /
946 sumover($im->clump(2));
947
948 Let's analyze what's going on step by step. First the product:
949
950 $prod = $im*xvals(zeroes(($im->dims)[0]))
951
952 This will actually work for $im being one, two, three, and higher
953 dimensional. If $im is one-dimensional it's just an ordinary product
954 (in the sense that every element of $im is multiplied with the
955 respective element of xvals(...)), if $im has more dimensions further
956 broadcasting is done by adding appropriate dummy dimensions to
957 xvals(...) according to R4. More importantly, the two sumover
958 operations show a first example of how to make use of the dimension
959 manipulating commands. A quick look at sumover's signature will remind
960 you that it will only "gobble up" the first dimension of a given input
961 ndarray. But what if we want to really compute the sum over all
962 elements of the first two dimensions? Well, nothing keeps us from
963 passing a virtual ndarray into sumover which in this case is formed by
964 clumping the first two dimensions of the "parent ndarray" into one.
965 From the point of view of the parent ndarray the sum is now computed
966 over the first two dimensions, just as we wanted, though sumover has
967 just done the job as specified by its signature. Got it ?
968
969 Another little finesse of writing the code like that: we intentionally
970 used "sumover($pdl->clump(2))" instead of sum($pdl) so that we can
971 either pass just an image "(x,y)" or a stack of images "(x,y,t)" into
972 this routine and get either just one x-coordinate or a vector of
973 x-coordinates (of size t) in return.
974
975 Another set of common operations are what one could call "projection
976 operations". These operations take a N-D ndarray as input and return a
977 (N-1)-D "projected" ndarray. These operations are often performed with
978 functions like sumover, prodover, minimum and maximum. Using again
979 images as examples we might want to calculate the maximum pixel value
980 for each line of an image or image stack. We know how to do that
981
982 # maxima of lines (as function of line number and time)
983 maximum($stack,($ret=null));
984
985 But what if you want to calculate maxima per column when implicit
986 broadcasting always applies the core functionality to the first
987 dimension and broadcasts over all others? How can we achieve that
988 instead the core functionality is applied to the second dimension and
989 broadcasting is done over the others. Can you guess it? Yes, we make a
990 virtual ndarray that has the second dimension of the "parent ndarray"
991 as its first dimension using the "mv" command.
992
993 # maxima of columns (as function of column number and time)
994 maximum($stack->mv(1,0),($ret=null));
995
996 and calculating all the sums of sub-slices over the third dimension is
997 now almost too easy
998
999 # sums of pixels in time (assuming time is the third dim)
1000 sumover($stack->mv(2,0),($ret=null));
1001
1002 Finally, if you want to apply the operation to all elements (like max
1003 over all elements or sum over all elements) regardless of the
1004 dimensions of the ndarray in question "clump" comes in handy. As an
1005 example look at a definition of "sum" (summarised from
1006 Basic/Ufunc/ufunc.pd):
1007
1008 sub sum {
1009 PDL::Ufunc::sumover($name->clump(-1),($tmp=null));
1010 return $tmp; # return a 0D ndarray
1011 }
1012
1013 We have already mentioned that all basic operations support
1014 broadcasting and assignment is no exception. So here are a couple of
1015 broadcasted assignments
1016
1017 pdl> $im = zeroes(byte, 10,20)
1018 pdl> $line = exp(-rvals(10)**2/9)
1019 # broadcasted assignment
1020 pdl> $im .= $line # set every line of $im to $line
1021 pdl> $im2 .= 5 # set every element of $im2 to 5
1022
1023 By now you probably see how it works and what it does, don't you?
1024
1025 To finish the examples in this paragraph here is a function to create
1026 an RGB image from what is called a palette image. The palette image
1027 consists of two parts: an image of indices into a color lookup table
1028 and the color lookup table itself. [ describe how it works ] We are
1029 going to use a PP-function we haven't encoutered yet in the previous
1030 examples. It is the aptly named index function, signature
1031 "((n),(),[o]())" (see Appendix B) with the core functionality that
1032 "index(pdl (0,2,4,5),2,($ret=null))" will return the element with index
1033 2 of the first input ndarray. In this case, $ret will contain the value
1034 4. So here is the example:
1035
1036 # a broadcasted index lookup to generate an RGB, or RGBA or YMCK image
1037 # from a palette image (represented by a lookup table $palette and
1038 # an color-index image $im)
1039 # you can say just dummy(0) since the rules of broadcasting make it fit
1040 pdl> index($palette->xchg(0,1),
1041 $im->long->dummy(0,($palette->dim)[0]),
1042 ($res=null));
1043
1044 Let's go through it and explain the steps involved. Assuming we are
1045 dealing with an RGB lookup-table $palette is of size "(3,x)". First we
1046 exchange the dimensions of the palette so that looping is done over the
1047 first dimension of $palette (of size 3 that represent r, g, and b
1048 components). Now looking at $im, we add a dummy dimension of size equal
1049 to the length of the number of components (in the case we are
1050 discussing here we could have just used the number 3 since we have 3
1051 color components). We can use a dummy dimension since for red, green
1052 and blue color components we use the same index from the original
1053 image, e.g. assuming a certain pixel of $im had the value 4 then the
1054 lookup should produce the triple
1055
1056 [palette(0,4),palette(1,4),palette(2,4)]
1057
1058 for the new red, green and blue components of the output image.
1059 Hopefully by now you have some sort of idea what the above piece of
1060 code is supposed to do (it is often actually quite complicated to
1061 describe in detail how a piece of broadcasting code works; just go
1062 ahead and experiment a bit to get a better feeling for it).
1063
1064 If you have read the broadcasting rules carefully, then you might have
1065 noticed that we didn't have to explicitly state the size of the dummy
1066 dimension that we created for $im; when we create it with size 1 (the
1067 default) the rules of broadcasting make it automatically fit to the
1068 desired size (by rule R3, in our example the size would be 3 assuming a
1069 palette of size "(3,x)"). Since situations like this do occur often in
1070 practice this is actually why rule R3 has been introduced (the part
1071 that makes dimensions of size 1 fit to the broadcast loop dim size). So
1072 we can just say
1073
1074 pdl> index($palette->xchg(0,1),$im->long->dummy(0),($res=null));
1075
1076 Again, you can convince yourself that this routine will create the
1077 right output if called with a pixel ($im is 0D), a line ($im is 1D), an
1078 image ($im is 2D), ..., an RGB lookup table (palette is "(3,x)") and
1079 RGBA lookup table (palette is "(4,x)", see e.g. OpenGL). This
1080 flexibility is achieved by the rules of broadcasting which are made to
1081 do the right thing in most situations.
1082
1083 To wrap it all up once again, the general idea is as follows. If you
1084 want to achieve looping over certain dimensions and have the core
1085 functionality applied to another specified set of dimensions you use
1086 the dimension manipulating commands to create a (or several) virtual
1087 ndarray(s) so that from the point of view of the parent ndarray(s) you
1088 get what you want (always having the signature of the function in
1089 question and R1-R5 in mind!). Easy, isn't it ?
1090
1091 Output auto-creation and PP-function calling conventions
1092 At this point we have to divert to some technical detail that has to do
1093 with the general calling conventions of PP-functions and the automatic
1094 creation of output arguments. Basically, there are two ways of
1095 invoking PDL routines, namely
1096
1097 $result = func($x,$y);
1098
1099 and
1100
1101 func($x,$y,$result);
1102
1103 If you are only using implicit broadcasting then the output variable
1104 can be automatically created by PDL. You flag that to the PP-function
1105 by setting the output argument to a special kind of ndarray that is
1106 returned from a call to the function "PDL->null" that returns an
1107 essentially "empty" ndarray (for those interested in details there is a
1108 flag in the C pdl structure for this). The dimensions of the created
1109 ndarray are determined by the rules of implicit broadcasting: the first
1110 dimensions are the core output dimensions to which the broadcasting
1111 dimensions are appended (which are in turn determined by the dimensions
1112 of the input ndarrays as described above). So you can say
1113
1114 func($x,$y,($result=PDL->null));
1115
1116 or
1117
1118 $result = func($x,$y)
1119
1120 which are exactly equivalent.
1121
1122 Be warned that you can not use output auto-creation when using explicit
1123 broadcasting (for reasons explained in the following section on
1124 explicit broadcasting, the second variant of broadcasting).
1125
1126 In "tight" loops you probably want to avoid the implicit creation of a
1127 temporary ndarray in each step of the loop that comes along with the
1128 "functional" style but rather say
1129
1130 # create output ndarray of appropriate size only at first invocation
1131 $result = null;
1132 for (0...$n) {
1133 func($x,$y,$result); # in all but the first invocation $result
1134 func2($y); # is defined and has the right size to
1135 # take the output provided $y's dims don't change
1136 twiddle($result,$x); # do something from $result to $x for iteration
1137 }
1138
1139 The take-home message of this section once more: be aware of the
1140 limitation on output creation when using explicit broadcasting.
1141
1142 Explicit broadcasting
1143 Having so far only talked about the first flavour of broadcasting it is
1144 now about time to introduce the second variant. Instead of shuffling
1145 around dimensions all the time and relying on the rules of implicit
1146 broadcasting to get it all right you sometimes might want to specify in
1147 a more explicit way how to perform the broadcast loop. It is probably
1148 not too surprising that this variant of the game is called explicit
1149 broadcasting. Now, before we create the wrong impression: it is not
1150 either implicit or explicit; the two flavours do mix. But more about
1151 that later.
1152
1153 The two most used functions with explicit broadcasting are broadcast
1154 and unbroadcast. We start with an example that illustrates typical
1155 usage of the former:
1156
1157 [ # ** this is the worst possible example to start with ]
1158 # but can be used to show that $mat += $line is different from
1159 # $mat->broadcast(0) += $line
1160 # explicit broadcasting to add a vector to each column of a matrix
1161 pdl> $mat = zeroes(4,3)
1162 pdl> $line = pdl (3.1416,2,-2)
1163 pdl> ($tmp = $mat->broadcast(0)) += $line
1164
1165 In this example, "$mat->broadcast(0)" tells PDL that you want the
1166 second dimension of this ndarray to be broadcast over first leading to
1167 a broadcast loop that can be expressed as
1168
1169 for (j=0; j<3; j++) {
1170 for (i=0; i<4; i++) {
1171 mat(i,j) += src(j);
1172 }
1173 }
1174
1175 "broadcast" takes a list of numbers as arguments which explicitly
1176 specify which dimensions to broadcast over first. With the introduction
1177 of explicit broadcasting the dimensions of an ndarray are conceptually
1178 split into three different groups the latter two of which we have
1179 already encountered: broadcast dimensions, core dimensions and extra
1180 dimensions.
1181
1182 Conceptually, it is best to think of those dimensions of an ndarray
1183 that have been specified in a call to "broadcast" as being taken away
1184 from the set of normal dimensions and put on a separate stack. So
1185 assuming we have an ndarray "x(4,7,2,8)" saying
1186
1187 $y = $x->broadcast(2,1)
1188
1189 creates a new virtual ndarray of dimension "y(4,8)" (which we call the
1190 remaining dims) that also has 2 broadcast dimensions of size "(2,7)".
1191 For the purposes of this document we write that symbolically as
1192 "y(4,8){2,7}". An important difference to the previous examples where
1193 only implicit broadcasting was used is the fact that the core
1194 dimensions are matched against the remaining dimensions which are not
1195 necessarily the first dimensions of the ndarray. We will now specify
1196 how the presence of broadcast dimensions changes the rules R1-R5 for
1197 broadcast loops (which apply to the special case where none of the
1198 ndarray arguments has any broadcast dimensions).
1199
1200 T0 Core dimensions are matched against the first n remaining
1201 dimensions of the ndarray argument (note the difference to R1). Any
1202 further remaining dimensions are extra dimensions and are used to
1203 determine the implicit loop dimensions.
1204
1205 T1a The number of implicit loop dimensions is equal to the maximal
1206 number of extra dimensions taken over the set of ndarray arguments.
1207
1208 T1b The number of explicit loop dimensions is equal to the maximal
1209 number of broadcast dimensions taken over the set of ndarray
1210 arguments.
1211
1212 T1c The total number of loop dimensions is equal to the sum of explicit
1213 loop dimensions and implicit loop dimensions. In the broadcast
1214 loop, explicit loop dimensions are broadcasted over first followed
1215 by implicit loop dimensions.
1216
1217 T2 The size of each of the loop dimensions is derived from the size of
1218 the respective dimensions of the ndarray arguments. It is given by
1219 the maximal size found in any ndarrays having this broadcast
1220 dimension (for explicit loop dimensions) or extra dimension (for
1221 implicit loop dimensions).
1222
1223 T3 This rule applies to any explicit loop dimension as well as any
1224 implicit loop dimension. For all ndarrays that have a given
1225 broadcast/extra dimension the size must be equal to the size of the
1226 respective explicit/implicit loop dimension or 1; otherwise you
1227 raise a runtime exception. If the size of a broadcast/extra
1228 dimension of an ndarray is one it is implicitly treated as a dummy
1229 dimension of size equal to the explicit/implicit loop dimension.
1230
1231 T4 If an ndarray doesn't have a broadcast/extra dimension that
1232 corresponds to an explicit/implicit loop dimension, in the
1233 broadcast loop this ndarray is treated as if having a dummy
1234 dimension of size equal to the size of that loop dimension.
1235
1236 T4a All ndarrays that do have broadcast dimensions must have the same
1237 number of broadcast dimensions.
1238
1239 T5 Output auto-creation cannot be used if any of the ndarray arguments
1240 has any broadcast dimensions. Otherwise R5 applies.
1241
1242 The same restrictions apply with regard to implicit dummy dimensions
1243 (created by application of T4) as already mentioned in the section on
1244 implicit broadcasting: if any of the output ndarrays has an (explicit
1245 or implicitly created) greater-than-one dummy dimension a runtime
1246 exception will be raised.
1247
1248 Let us demonstrate these rules at work in a generic case. Suppose we
1249 have a (here unspecified) PP-function with the signature:
1250
1251 func((m,n),(m),(),[o](m))
1252
1253 and you call it with 3 ndarrays "a(5,3,10,11)", "b(3,5,10,1,12)", c(10)
1254 and an output ndarray "d(3,11,5,10,12)" (which can here not be
1255 automatically created) as
1256
1257 func($x->broadcast(1,3),$y->broadcast(0,3),$c,$d->broadcast(0,1))
1258
1259 From the signature of func and the above call the ndarrays split into
1260 the following groups of core, extra and broadcast dimensions (written
1261 in the form "pdl(core dims){broadcast dims}[extra dims]"):
1262
1263 a(5,10){3,11}[] b(5){3,1}[10,12] c(){}[10] d(5){3,11}[10,12]
1264
1265 With this to help us along (it is in general helpful to write the
1266 arguments down like this when you start playing with broadcasting and
1267 want to keep track of what is going on) we further deduce that the
1268 number of explicit loop dimensions is 2 (by T1b from $a and $b) with
1269 sizes "(3,11)" (by T2); 2 implicit loop dimensions (by T1a from $b and
1270 $d) of size "(10,12)" (by T2) and the elements of are computed from the
1271 input ndarrays in a way that can be expressed in pdl pseudo-code as
1272
1273 for (l=0;l<12;l++)
1274 for (k=0;k<10;k++)
1275 for (j=0;j<11;j++) effect of treating it as dummy dim (index j)
1276 for (i=0;i<3;i++) |
1277 d(i,j,:,k,l) = func(a(:,i,:,j),b(i,:,k,0,l),c(k))
1278
1279 Ugh, this example was really not easy in terms of bookkeeping. It
1280 serves mostly as an example how to figure out what's going on when you
1281 encounter a complicated looking expression. But now it is really time
1282 to show that broadcasting is useful by giving some more of our so
1283 called "practical" examples.
1284
1285 [ The following examples will need some additional explanations in the
1286 future. For the moment please try to live with the comments in the code
1287 fragments. ]
1288
1289 Example 1:
1290
1291 *** inverse of matrix represented by eigvecs and eigvals
1292 ** given a symmetrical matrix M = A^T x diag(lambda_i) x A
1293 ** => inverse M^-1 = A^T x diag(1/lambda_i) x A
1294 ** first $tmp = diag(1/lambda_i)*A
1295 ** then A^T * $tmp by broadcasted inner product
1296 # index handling so that matrices print correct under pdl
1297 $inv .= $evecs*0; # just copy to get appropriately sized output
1298 $tmp .= $evecs; # initialise, no back-propagation
1299 ($tmp2 = $tmp->broadcast(0)) /= $evals; # broadcasted division
1300 # and now a matrix multiplication in disguise
1301 PDL::Primitive::inner($evecs->xchg(0,1)->broadcast(-1,1),
1302 $tmp->broadcast(0,-1),
1303 $inv->broadcast(0,1));
1304 # alternative for matrix mult using implicit broadcasting,
1305 # first xchg only for transpose
1306 PDL::Primitive::inner($evecs->xchg(0,1)->dummy(1),
1307 $tmp->xchg(0,1)->dummy(2),
1308 ($inv=null));
1309
1310 Example 2:
1311
1312 # outer product by broadcasted multiplication
1313 # stress that we need to do it with explicit call to my_biop1
1314 # when using explicit broadcasting
1315 $res=zeroes(($x->dims)[0],($y->dims)[0]);
1316 my_biop1($x->broadcast(0,-1),$y->broadcast(-1,0),$res->(0,1),"*");
1317 # similar thing by implicit broadcasting with auto-created ndarray
1318 $res = $x->dummy(1) * $y->dummy(0);
1319
1320 Example 3:
1321
1322 # different use of broadcast and unbroadcast to shuffle a number of
1323 # dimensions in one go without lots of calls to ->xchg and ->mv
1324
1325
1326 # use broadcast/unbroadcast to shuffle dimensions around
1327 # just try it out and compare the child ndarray with its parent
1328 $trans = $x->broadcast(4,1,0,3,2)->unbroadcast;
1329
1330 Example 4:
1331
1332 # calculate a couple of bounding boxes
1333 # $bb will hold BB as [xmin,xmax],[ymin,ymax],[zmin,zmax]
1334 # we use again broadcast and unbroadcast to shuffle dimensions around
1335 pdl> $bb = zeroes(double, 2,3 );
1336 pdl> minimum($vertices->broadcast(0)->clump->unbroadcast(1), $bb->slice('(0),:'));
1337 pdl> maximum($vertices->broadcast(0)->clump->unbroadcast(1), $bb->slice('(1),:'));
1338
1339 Example 5:
1340
1341 # calculate a self-rationed (i.e. self normalized) sequence of images
1342 # uses explicit broadcasting and an implicitly broadcasted division
1343 $stack = read_image_stack();
1344 # calculate the average (per pixel average) of the first $n+1 images
1345 $aver = zeroes([stack->dims]->[0,1]); # make the output ndarray
1346 sumover($stack->slice(":,:,0:$n")->broadcast(0,1),$aver);
1347 $aver /= ($n+1);
1348 $stack /= $aver; # normalize the stack by doing a broadcasted division
1349 # implicit versus explicit
1350 # alternatively calculate $aver with implicit broadcasting and auto-creation
1351 sumover($stack->slice(":,:,0:$n")->mv(2,0),($aver=null));
1352 $aver /= ($n+1);
1353 #
1354
1355 Implicit versus explicit broadcasting
1356 In this paragraph we are going to illustrate when explicit broadcasting
1357 is preferable over implicit broadcasting and vice versa. But then
1358 again, this is probably not the best way of putting the case since you
1359 already know: the two flavours do mix. So, it's more about how to get
1360 the best of both worlds and, anyway, in the best of Perl traditions:
1361 TIMTOWTDI !
1362
1363 [ Sorry, this still has to be filled in in a later release; either
1364 refer to above examples or choose some new ones ]
1365
1366 Finally, this may be a good place to justify all the technical detail
1367 we have been going on about for a couple of pages: why broadcasting ?
1368
1369 Well, code that uses broadcasting should be (considerably) faster than
1370 code that uses explicit for-loops (or similar Perl constructs) to
1371 achieve the same functionality. Especially on supercomputers (with
1372 vector computing facilities/parallel processing) PDL broadcasting is
1373 implemented in a way that takes advantage of the additional facilities
1374 of these machines. Furthermore, it is a conceptually simple construct
1375 (though technical details might get involved at times) and can greatly
1376 reduce the syntactical complexity of PDL code (but keep the admonition
1377 for documentation in mind). Once you are comfortable with the
1378 broadcasting way of thinking (and coding) it shouldn't be too difficult
1379 to understand code that somebody else has written than (provided they
1380 gave you an idea what expected input dimensions are, etc.). As a
1381 general tip to increase the performance of your code: if you have to
1382 introduce a loop into your code try to reformulate the problem so that
1383 you can use broadcasting to perform the loop (as with anything there
1384 are exceptions to this rule of thumb; but the authors of this document
1385 tend to think that these are rare cases ;).
1386
1388 An easy way to define functions that are aware of indexing and broadcasting
1389 (and the universe and everything)
1390 PDL:PP is part of the PDL distribution. It is used to generate
1391 functions that are aware of indexing and broadcasting rules from very
1392 concise descriptions. It can be useful for you if you want to write
1393 your own functions or if you want to interface functions from an
1394 external library so that they support indexing and broadcasting (and
1395 maybe dataflow as well, see PDL::Dataflow). For further details check
1396 PDL::PP.
1397
1399 Affine transformations - a special class of simple and powerful
1400 transformations
1401 [ This is also something to be added in future releases. Do we already
1402 have the general make_affine routine in PDL ? It is possible that we
1403 will reference another appropriate man page from here ]
1404
1406 signatures of standard PDL::PP compiled functions
1407 A selection of signatures of PDL primitives to show how many dimensions
1408 PP compiled functions gobble up (and therefore you can figure out what
1409 will be broadcasted over). Most of those functions are the basic ones
1410 defined in "primitive.pd"
1411
1412 # functions in primitive.pd
1413 #
1414 sumover ((n),[o]())
1415 prodover ((n),[o]())
1416 axisvalues ((n)) inplace
1417 inner ((n),(n),[o]())
1418 outer ((n),(m),[o](n,m))
1419 innerwt ((n),(n),(n),[o]())
1420 inner2 ((m),(m,n),(n),[o]())
1421 inner2t ((j,n),(n,m),(m,k),[o]())
1422 index (1D,0D,[o])
1423 minimum (1D,[o])
1424 maximum (1D,[o])
1425 wstat ((n),(n),(),[o],())
1426 assgn ((),())
1427
1428 # basic operations
1429 binary operations ((),(),[o]())
1430 unary operations ((),[o]())
1431
1433 Copyright (C) 1997 Christian Soeller (c.soeller@auckland.ac.nz) &
1434 Tuomas J. Lukka (lukka@fas.harvard.edu). All rights reserved. Although
1435 destined for release as a man page with the standard PDL distribution,
1436 it is not public domain. Permission is granted to freely distribute
1437 verbatim copies of this document provided that no modifications outside
1438 of formatting be made, and that this notice remain intact. You are
1439 permitted and encouraged to use its code and derivatives thereof in
1440 your own source code for fun or for profit as you see fit.
1441
1442
1443
1444perl v5.38.0 2023-07-21 INDEXING(1)