1PP(1) User Contributed Perl Documentation PP(1)
2
3
4
6 PDL::PP - Generate PDL routines from concise descriptions
7
9 e.g.
10
11 pp_def(
12 'sumover',
13 Pars => 'a(n); [o]b();',
14 Code => 'double tmp=0;
15 loop(n) %{ tmp += $a(); %}
16 $b() = tmp;
17 '
18 );
19
20 pp_done();
21
23 In much of what follows we will assume familiarity of the reader with
24 the concepts of implicit and explicit threading and index manipulations
25 within PDL. If you have not yet heard of these concepts or are not very
26 comfortable with them it is time to check PDL::Indexing.
27
28 As you may appreciate from its name PDL::PP is a Pre-Processor, i.e.
29 it expands code via substitutions to make real C-code (well, actually
30 it outputs XS code (See perlxs) but that is very close to C).
31
33 Why do we need PP? Several reasons: firstly, we want to be able to
34 generate subroutine code for each of the PDL datatypes (PDL_Byte,
35 PDL_Short,. etc). AUTOMATICALLY. Secondly, when referring to slices
36 of PDL arrays in Perl (e.g. "$a->slice('0:10:2,:')" or other things
37 such as transposes) it is nice to be able to do this transparently and
38 to be able to do this 'in-place' - i.e, not to have to make a memory
39 copy of the section. PP handles all the necessary element and offset
40 arithmetic for you. There are also the notions of threading (repeated
41 calling of the same routine for multiple slices, see PDL::Indexing) and
42 dataflow (see PDL::Dataflow) which use of PP allows.
43
44 So how do you use PP? Well for the most part you just write ordinary C
45 code except for special PP constructs which take the form:
46
47 $something(something else)
48
49 or:
50
51 PPfunction %{
52 <stuff>
53 %}
54
55 The most important PP construct is the form "$array()". Consider the
56 very simple PP function to sum the elements of a 1D vector (in fact
57 this is very similar to the actual code used by 'sumover'):
58
59 pp_def('sumit',
60 Pars => 'a(n); [o]b();',
61 Code => '
62 double tmp;
63 tmp = 0;
64 loop(n) %{
65 tmp += $a();
66 %}
67 $b() = tmp;
68 ');
69
70 What's going on? The "Pars =>" line is very important for PP - it
71 specifies all the arguments and their dimensionality. We call this the
72 signature of the PP function (compare also the explanations in
73 PDL::Indexing). In this case the routine takes a 1-D function as input
74 and returns a 0-D scalar as output. The "$a()" PP construct is used to
75 access elements of the array a(n) for you - PP fills in all the
76 required C code.
77
78 [Aside: since PP used "$var()" for its parsing you must single-quote
79 all Code=> arguments since you don't want perl to interpolate "$var()"
80 into another string - i.e. don't use "" unless you know what you are
81 doing! Tjl: it's usually easiest to use single quotes and
82 'something'.$interpolatable.'somethingelse']
83
84 In the simple case here where all elements are accessed the PP
85 construct "loop(n) %{ ... %}" is used to loop over all elements in
86 dimension "n". Note this feature of PP: ALL DIMENSIONS ARE SPECIFIED
87 BY NAME.
88
89 This is made clearer if we avoid the PP loop() construct and write the
90 loop explicitly using conventional C:
91
92 pp_def('sumit',
93 Pars => 'a(n); [o]b();',
94 Code => '
95 int i,n_size;
96 double tmp;
97 n_size = $SIZE(n);
98 tmp = 0;
99 for(i=0; i<n_size; i++) {
100 tmp += $a(n=>i);
101 }
102 $b() = tmp;
103 ');
104
105 which does the same as before, except more long-windedly. You can see
106 to get element "i" of a() we say "$a(n=>i)" - we are specifying the
107 dimension by name "n". In 2D we might say:
108
109 Pars=>'a(m,n);',
110 ...
111 tmp += $a(m=>i,n=>j);
112 ...
113
114 The syntax 'm=>i' borrows from Perl hashes (which are in fact used in
115 the implementation of PP). One could also say "$a(n=>j,m=>i)" as order
116 is not important.
117
118 You can also see in the above example the use of another PP construct -
119 $SIZE(n) to get the length of the dimension "n".
120
121 It should, however, be noted that you shouldn't write an explicit
122 C-loop when you could have used the PP "loop" construct since PDL::PP
123 checks automatically the loop limits for you, usage of "loop" makes the
124 code more concise, etc. But there are certainly situations where you
125 need explicit control of the loop and now you know how to do it ;).
126
127 To revisit 'Why PP?' - the above code for sumit() will be generated for
128 each data-type. It will operate on slices of arrays 'in-place'. It will
129 thread automatically - e.g. if a 2D array is given it will be called
130 repeatedly for each 1D row (again check PDL::Indexing for the details
131 of threading). And then b() will be a 1D array of sums of each row.
132 We could call it with $a->xchg(0,1) to sum the colums instead. And
133 Dataflow tracing etc. will be available.
134
135 You can see PP saves the programmer from writing a lot of needlessly
136 repetitive C-code -- in our opinion this is one of the best features of
137 PDL making writing new C subroutines for PDL an amazingly concise
138 exercise. A second reason is the ability to make PP expand your concise
139 code definitions into different C code based on the needs of the
140 computer architecture in question. Imagine for example you are lucky to
141 have a supercomputer at your hands; in that case you want PDL::PP
142 certainly to generate code that takes advantage of the
143 vectorising/parallel computing features of your machine (this a project
144 for the future). In any case, the bottom line is that your unchanged
145 code should still expand to working XS code even if the internals of
146 PDL changed.
147
148 Also, because you are generating the code in an actual Perl script,
149 there are many fun things that you can do. Let's say that you need to
150 write both sumit (as above) and multit. With a little bit of
151 inventivity, we can do
152
153 for({Name => 'sumit', Init => '0', Op => '+='},
154 {Name => 'multit', Init => '1', Op => '*='}) {
155 pp_def($_->{Name},
156 Pars => 'a(n); [o]b();',
157 Code => '
158 double tmp;
159 tmp = '.$_->{Init}.';
160 loop(n) %{
161 tmp '.$_->{Op}.' $a();
162 %}
163 $b() = tmp;
164 ');
165 }
166
167 which defines both the functions easily. Now, if you later need to
168 change the signature or dimensionality or whatever, you only need to
169 change one place in your code. Yeah, sure, your editor does have 'cut
170 and paste' and 'search and replace' but it's still less bothersome and
171 definitely more difficult to forget just one place and have strange
172 bugs creep in. Also, adding 'orit' (bitwise or) later is a one-liner.
173
174 And remember, you really have perl's full abilities with you - you can
175 very easily read any input file and make routines from the information
176 in that file. For simple cases like the above, the author (Tjl)
177 currently favors the hash syntax like the above - it's not too much
178 more characters than the corresponding array syntax but much easier to
179 understand and change.
180
181 We should mention here also the ability to get the pointer to the
182 beginning of the data in memory - a prerequisite for interfacing PDL to
183 some libraries. This is handled with the "$P(var)" directive, see
184 below.
185
186 So, after this quick overview of the general flavour of programming PDL
187 routines using PDL::PP let's summarise in which circumstances you
188 should actually use this preprocessor/precompiler. You should use
189 PDL::PP if you want to
190
191 · interface PDL to some external library
192
193 · write some algorithm that would be slow if coded in perl (this is
194 not as often as you think; take a look at threading and dataflow
195 first).
196
197 · be a PDL developer (and even then it's not obligatory)
198
200 Because of its architecture, PDL::PP can be both flexible and easy to
201 use (yet exuberantly complicated) at the same time. Currently, part of
202 the problem is that error messages are not very informative and if
203 something goes wrong, you'd better know what you are doing and be able
204 to hack your way through the internals (or be able to figure out by
205 trial and error what is wrong with your args to "pp_def").
206
207 An alternative, of course, is to ask someone about it (e.g., through
208 the mailing lists).
209
211 Now that you have some idea how to use "pp_def" to define new PDL
212 functions it is time to explain the general syntax of "pp_def".
213 "pp_def" takes as arguments first the name of the function you are
214 defining and then a hash list that can contain various keys.
215
216 Based on these keys PP generates XS code and a .pm file. The function
217 "pp_done" (see example in the SYNOPSIS) is used to tell PDL::PP that
218 there are no more definitions in this file and it is time to generate
219 the .xs and
220 .pm file.
221
222 As a consequence, there may be several pp_def() calls inside a file (by
223 convention files with PP code have the extension .pd or .pp) but
224 generally only one pp_done().
225
226 There are two main different types of usage of pp_def(), the 'data
227 operation' and 'slice operation' prototypes.
228
229 The 'data operation' is used to take some data, mangle it and output
230 some other data; this includes for example the '+' operation, matrix
231 inverse, sumover etc and all the examples we have talked about in this
232 document so far. Implicit and explicit threading and the creation of
233 the result are taken care of automatically in those opeartions. You can
234 even do dataflow with "sumit", "sumover", etc (don't be dismayed if you
235 don't understand the concept of dataflow in PDL very well yet; it is
236 still very much experimental).
237
238 The 'slice operation' is a different kind of operation: in a slice
239 operation, you are not changing any data, you are defining
240 correspondences between different elements of two piddles (examples
241 include the index manipulation/slicing function definitions in the file
242 slices.pd that is part of the PDL distribution; but beware, this is not
243 introductory level stuff).
244
245 If PDL was compiled with support for bad values (ie "WITH_BADVAL =>
246 1"), then additional keys are required for "pp_def", as explained
247 below.
248
249 If you are just interested in communicating with some external library
250 (for example some linear algebra/matrix library), you'll usually want
251 the 'data operation' so we are going to discuss that first.
252
254 A simple example
255 In the data operation, you must know what dimensions of data you need.
256 First, an example with scalars:
257
258 pp_def('add',
259 Pars => 'a(); b(); [o]c();',
260 Code => '$c() = $a() + $b();'
261 );
262
263 That looks a little strange but let's dissect it. The first line is
264 easy: we're defining a routine with the name 'add'. The second line
265 simply declares our parameters and the parentheses mean that they are
266 scalars. We call the string that defines our parameters and their
267 dimensionality the signature of that function. For its relevance with
268 regard to threading and index manipulations check the PDL::Indexing
269 manpage.
270
271 The third line is the actual operation. You need to use the dollar
272 signs and parentheses to refer to your parameters (this will probably
273 change at some point in the future, once a good syntax is found).
274
275 These lines are all that is necessary to actually define the function
276 for PDL (well, actually it isn't; you aditionally need to write a
277 Makefile.PL (see below) and build the module (something like 'perl
278 Makefile.PL; make'); but let's ignore that for the moment). So now you
279 can do
280
281 use MyModule;
282 $a = pdl 2,3,4;
283 $b = pdl 5;
284
285 $c = add($a,$b);
286 # or
287 add($a,$b,($c=null)); # Alternative form, useful if $c has been
288 # preset to something big, not useful here.
289
290 and have threading work correctly (the result is $c == [7 8 9]).
291
292 The Pars section: the signature of a PP function
293 Seeing the above example code you will most probably ask: what is this
294 strange "$c=null" syntax in the second call to our new "add" function?
295 If you take another look at the definition of "add" you will notice
296 that the third argument "c" is flagged with the qualifier "[o]" which
297 tells PDL::PP that this is an output argument. So the above call to add
298 means 'create a new $c from scratch with correct dimensions' - "null"
299 is a special token for 'empty piddle' (you might ask why we haven't
300 used the value "undef" to flag this instead of the PDL specific "null";
301 we are currently thinking about it ;).
302
303 [This should be explained in some other section of the manual as
304 well!!] The reason for having this syntax as an alternative is that if
305 you have really huge piddles, you can do
306
307 $c = PDL->null;
308 for(some long loop) {
309 # munge a,b
310 add($a,$b,$c);
311 # munge c, put something back to a,b
312 }
313
314 and avoid allocating and deallocating $c each time. It is allocated
315 once at the first add() and thereafter the memory stays until $c is
316 destroyed.
317
318 If you just say
319
320 $c = add($a,$b);
321
322 the code generated by PP will automatically fill in "$c=null" and
323 return the result. If you want to learn more about the reasons why
324 PDL::PP supports this style where output arguments are given as last
325 arguments check the PDL::Indexing manpage.
326
327 "[o]" is not the only qualifier a pdl argument can have in the
328 signature. Another important qualifier is the "[t]" option which flags
329 a pdl as temporary. What does that mean? You tell PDL::PP that this
330 pdl is only used for temporary results in the course of the calculation
331 and you are not interested in its value after the computation has been
332 completed. But why should PDL::PP want to know about this in the first
333 place? The reason is closely related to the concepts of pdl auto
334 creation (you heard about that above) and implicit threading. If you
335 use implicit threading the dimensionality of automatically created pdls
336 is actually larger than that specified in the signature. With "[o]"
337 flagged pdls will be created so that they have the additional
338 dimensions as required by the number of implicit thread dimensions.
339 When creating a temporary pdl, however, it will always only be made big
340 enough so that it can hold the result for one iteration in a
341 threadloop, i.e. as large as required by the signature. So less memory
342 is wasted when you flag a pdl as temporary. Secondly, you can use
343 output auto creation with temporary pdls even when you are using
344 explicit threading which is forbidden for normal output pdls flagged
345 with "[o]" (see PDL::Indexing).
346
347 Here is an example where we use the [t] qualifier. We define the
348 function "callf" that calls a C routine "f" which needs a temporary
349 array of the same size and type as the array "a" (sorry about the
350 forward reference for $P; it's a pointer access, see below) :
351
352 pp_def('callf',
353 Pars => 'a(n); [t] tmp(n); [o] b()',
354 Code => 'int ns = $SIZE(n);
355 f($P(a),$P(b),$P(tmp),ns);
356 '
357 );
358
359 Argument dimensions and the signature
360 Now we have just talked about dimensions of pdls and the signature. How
361 are they related? Let's say that we want to add a scalar + the index
362 number to a vector:
363
364 pp_def('add2',
365 Pars => 'a(n); b(); [o]c(n);',
366 Code => 'loop(n) %{
367 $c() = $a() + $b() + n;
368 %}'
369 );
370
371 There are several points to notice here: first, the "Pars" argument now
372 contains the n arguments to show that we have a single dimensions in a
373 and c. It is important to note that dimensions are actual entities that
374 are accessed by name so this declares a and c to have the same first
375 dimensions. In most PP definitions the size of named dimensions will be
376 set from the respective dimensions of non-output pdls (those with no
377 "[o]" flag) but sometimes you might want to set the size of a named
378 dimension explicitly through an integer parameter. See below in the
379 description of the "OtherPars" section how that works.
380
381 Type conversions and the signature
382 The signature also determines the type conversions that will be
383 performed when a PP function is invoked. So what happens when we invoke
384 one of our previously defined functions with pdls of different type,
385 e.g.
386
387 add2($a,$b,($ret=null));
388
389 where $a is of type "PDL_Float" and $b of type "PDL_Short"? With the
390 signature as shown in the definition of "add2" above the datatype of
391 the operation (as determined at runtime) is that of the pdl with the
392 'highest' type (sequence is byte < short < ushort < long < float <
393 double). In the add2 example the datatype of the operation is float ($a
394 has that datatype). All pdl arguments are then type converted to that
395 datatype (they are not converted inplace but a copy with the right type
396 is created if a pdl argument doesn't have the type of the operation).
397 Null pdls don't contribute a type in the determination of the type of
398 the operation. However, they will be created with the datatype of the
399 operation; here, for example, $ret will be of type float. You should be
400 aware of these rules when calling PP functions with pdls of different
401 types to take the additional storage and runtime requirements into
402 account.
403
404 These type conversions are correct for most functions you normally
405 define with "pp_def". However, there are certain cases where slightly
406 modified type conversion behaviour is desired. For these cases
407 additional qualifiers in the signature can be used to specify the
408 desired properties with regard to type conversion. These qualifiers can
409 be combined with those we have encountered already (the creation
410 qualifiers "[o]" and "[t]"). Let's go through the list of qualifiers
411 that change type conversion behaviour.
412
413 The most important is the "int" qualifier which comes in handy when a
414 pdl argument represents indices into another pdl. Let's take a look at
415 an example from "PDL::Ufunc":
416
417 pp_def('maximum_ind',
418 Pars => 'a(n); int [o] b()',
419 Code => '$GENERIC() cur;
420 int curind;
421 loop(n) %{
422 if (!n || $a() > cur) {cur = $a(); curind = n;}
423 %}
424 $b() = curind;',
425 );
426
427 The function "maximum_ind" finds the index of the largest element of a
428 vector. If you look at the signature you notice that the output
429 argument "b" has been declared with the additional "int" qualifier.
430 This has the following consequences for type conversions: regardless of
431 the type of the input pdl "a" the output pdl "b" will be of type
432 "PDL_Long" which makes sense since "b" will represent an index into
433 "a". Furthermore, if you call the function with an existing output pdl
434 "b" its type will not influence the datatype of the operation (see
435 above). Hence, even if "a" is of a smaller type than "b" it will not be
436 converted to match the type of "b" but stays untouched, which saves
437 memory and CPU cycles and is the right thing to do when "b" represents
438 indices. Also note that you can use the 'int' qualifier together with
439 other qualifiers (the "[o]" and "[t]" qualifiers). Order is significant
440 -- type qualifiers precede creation qualifiers ("[o]" and "[t]").
441
442 The above example also demonstrates typical usage of the "$GENERIC()"
443 macro. It expands to the current type in a so called generic loop.
444 What is a generic loop? As you already heard a PP function has a
445 runtime datatype as determined by the type of the pdl arguments it has
446 been invoked with. The PP generated XS code for this function
447 therefore contains a switch like "switch (type) {case PDL_Byte: ...
448 case PDL_Double: ...}" that selects a case based on the runtime
449 datatype of the function (it's called a type ``loop'' because there is
450 a loop in PP code that generates the cases). In any case your code is
451 inserted once for each PDL type into this switch statement. The
452 "$GENERIC()" macro just expands to the respective type in each copy of
453 your parsed code in this "switch" statement, e.g., in the "case
454 PDL_Byte" section "cur" will expand to "PDL_Byte" and so on for the
455 other case statements. I guess you realise that this is a useful macro
456 to hold values of pdls in some code.
457
458 There are a couple of other qualifiers with similar effects as "int".
459 For your convenience there are the "float" and "double" qualifiers with
460 analogous consequences on type conversions as "int". Let's assume you
461 have a very large array for which you want to compute row and column
462 sums with an equivalent of the "sumover" function. However, with the
463 normal definition of "sumover" you might run into problems when your
464 data is, e.g. of type short. A call like
465
466 sumover($large_pdl,($sums = null));
467
468 will result in $sums be of type short and is therefore prone to
469 overflow errors if $large_pdl is a very large array. On the other hand
470 calling
471
472 @dims = $large_pdl->dims; shift @dims;
473 sumover($large_pdl,($sums = zeroes(double,@dims)));
474
475 is not a good alternative either. Now we don't have overflow problems
476 with $sums but at the expense of a type conversion of $large_pdl to
477 double, something bad if this is really a large pdl. That's where
478 "double" comes in handy:
479
480 pp_def('sumoverd',
481 Pars => 'a(n); double [o] b()',
482 Code => 'double tmp=0;
483 loop(n) %{ tmp += a(); %}
484 $b() = tmp;',
485 );
486
487 This gets us around the type conversion and overflow problems. Again,
488 analogous to the "int" qualifier "double" results in "b" always being
489 of type double regardless of the type of "a" without leading to a
490 typeconversion of "a" as a side effect.
491
492 Finally, there are the "type+" qualifiers where type is one of "int" or
493 "float". What shall that mean. Let's illustrate the "int+" qualifier
494 with the actual definition of sumover:
495
496 pp_def('sumover',
497 Pars => 'a(n); int+ [o] b()',
498 Code => '$GENERIC(b) tmp=0;
499 loop(n) %{ tmp += a(); %}
500 $b() = tmp;',
501 );
502
503 As we had already seen for the "int", "float" and "double" qualifiers,
504 a pdl marked with a "type+" qualifier does not influence the datatype
505 of the pdl operation. Its meaning is "make this pdl at least of type
506 "type" or higher, as required by the type of the operation". In the
507 sumover example this means that when you call the function with an "a"
508 of type PDL_Short the output pdl will be of type PDL_Long (just as
509 would have been the case with the "int" qualifier). This again tries to
510 avoid overflow problems when using small datatypes (e.g. byte images).
511 However, when the datatype of the operation is higher than the type
512 specified in the "type+" qualifier "b" will be created with the
513 datatype of the operation, e.g. when "a" is of type double then "b"
514 will be double as well. We hope you agree that this is sensible
515 behaviour for "sumover". It should be obvious how the "float+"
516 qualifier works by analogy. It may become necessary to be able to
517 specify a set of alternative types for the parameters. However, this
518 will probably not be implemented until someone comes up with a
519 reasonable use for it.
520
521 Note that we now had to specify the $GENERIC macro with the name of the
522 pdl to derive the type from that argument. Why is that? If you
523 carefully followed our explanations you will have realised that in some
524 cases "b" will have a different type than the type of the operation.
525 Calling the '$GENERIC' macro with "b" as argument makes sure that the
526 type will always the same as that of "b" in that part of the generic
527 loop.
528
529 This is about all there is to say about the "Pars" section in a
530 "pp_def" call. You should remember that this section defines the
531 signature of a PP defined function, you can use several options to
532 qualify certain arguments as output and temporary args and all
533 dimensions that you can later refer to in the "Code" section are
534 defined by name.
535
536 It is important that you understand the meaning of the signature since
537 in the latest PDL versions you can use it to define threaded functions
538 from within perl, i.e. what we call perl level threading. Please check
539 PDL::Indexing for details.
540
541 The Code section
542 The "Code" section contains the actual XS code that will be in the
543 innermost part of a threadloop (if you don't know what a thread loop is
544 then you still haven't read PDL::Indexing; do it now ;) after any PP
545 macros (like $GENERIC) and PP functions have been expanded (like the
546 "loop" function we are going to explain next).
547
548 Let's quickly reiterate the "sumover" example:
549
550 pp_def('sumover',
551 Pars => 'a(n); int+ [o] b()',
552 Code => '$GENERIC(b) tmp=0;
553 loop(n) %{ tmp += a(); %}
554 $b() = tmp;',
555 );
556
557 The "loop" construct in the "Code" section also refers to the dimension
558 name so you don't need to specify any limits: the loop is correctly
559 sized and everything is done for you, again.
560
561 Next, there is the surprising fact that "$a()" and "$b()" do not
562 contain the index. This is not necessary because we're looping over n
563 and both variables know which dimensions they have so they
564 automatically know they're being looped over.
565
566 This feature comes in very handy in many places and makes for much
567 shorter code. Of course, there are times when you want to circumvent
568 this; here is a function which symmetrizes a matrix and serves as an
569 example of how to code explicit looping:
570
571 pp_def('symm',
572 Pars => 'a(n,n); [o]c(n,n);',
573 Code => 'loop(n) %{
574 int n2;
575 for(n2=n; n2<$SIZE(n); n2++) {
576 $c(n0 => n, n1 => n2) =
577 $c(n0 => n2, n1 => n) =
578 $a(n0 => n, n1 => n2);
579 }
580 %}
581 '
582 );
583
584 Let's dissect what is happening. Firstly, what is this function
585 supposed to do? From its signature you see that it takes a 2D matrix
586 with equal numbers of columns and rows and outputs a matrix of the same
587 size. From a given input matrix $a it computes a symmetric output
588 matrix $c (symmetric in the matrix sense that A^T = A where ^T means
589 matrix transpose, or in PDL parlance $c == $c->xchg(0,1)). It does this
590 by using only the values on and below the diagonal of $a. In the output
591 matrix $c all values on and below the diagonal are the same as those in
592 $a while those above the diagonal are a mirror image of those below the
593 diagonal (above and below are here interpreted in the way that PDL
594 prints 2D pdls). If this explanation still sounds a bit strange just go
595 ahead, make a little file into which you write this definition, build
596 the new PDL extension (see section on Makefiles for PP code) and try it
597 out with a couple of examples.
598
599 Having explained what the function is supposed to do there are a couple
600 of points worth noting from the syntactical point of view. First, we
601 get the size of the dimension named "n" again by using the $SIZE macro.
602 Second, there are suddenly these funny "n0" and "n1" index names in the
603 code though the signature defines only the dimension "n". Why this? The
604 reason becomes clear when you note that both the first and second
605 dimension of $a and $b are named "n" in the signature of "symm". This
606 tells PDL::PP that the first and second dimension of these arguments
607 should have the same size. Otherwise the generated function will raise
608 a runtime error. However, now in an access to $a and $c PDL::PP cannot
609 figure out which index "n" refers to any more just from the name of the
610 index. Therefore, the indices with equal dimension names get numbered
611 from left to right starting at 0, e.g. in the above example "n0" refers
612 to the first dimension of $a and $c, "n1" to the second and so on.
613
614 In all examples so far, we have only used the "Pars" and "Code" members
615 of the hash that was passed to "pp_def". There are certainly other keys
616 that are recognised by PDL::PP and we will hear about some of them in
617 the course of this document. Find a (non-exhaustive) list of keys in
618 Appendix A. A list of macros and PPfunctions (we have only encountered
619 some of those in the examples above yet) that are expanded in values of
620 the hash argument to "pp_def" is summarised in Appendix B.
621
622 At this point, it might be appropriate to mention that PDL::PP is not a
623 completely static, well designed set of routines (as Tuomas puts it:
624 "stop thinking of PP as a set of routines carved in stone") but rather
625 a collection of things that the PDL::PP author (Tuomas J. Lukka)
626 considered he would have to write often into his PDL extension
627 routines. PP tries to be expandable so that in the future, as new needs
628 arise, new common code can be abstracted back into it. If you want to
629 learn more on why you might want to change PDL::PP and how to do it
630 check the section on PDL::PP internals.
631
632 Handling bad values
633 If you do not have bad-value support compiled into PDL you can ignore
634 this section and the related keys: "BadCode", "HandleBad", ... (try
635 printing out the value of $PDL::Bad::Status - if it equals 0 then move
636 straight on).
637
638 There are several keys and macros used when writing code to handle bad
639 values. The first one is the "HandleBad" key:
640
641 HandleBad => 0
642 This flags a pp-routine as NOT handling bad values. If this routine
643 is sent piddles with their "badflag" set, then a warning message is
644 printed to STDOUT and the piddles are processed as if the value
645 used to represent bad values is a valid number. The "badflag" value
646 is not propogated to the output piddles.
647
648 An example of when this is used is for FFT routines, which
649 generally do not have a way of ignoring part of the data.
650
651 HandleBad => 1
652 This causes PDL::PP to write extra code that ensures the BadCode
653 section is used, and that the "$ISBAD()" macro (and its brethren)
654 work.
655
656 HandleBad is not given
657 If any of the input piddles have their "badflag" set, then the
658 output piddles will have their "badflag" set, but any supplied
659 BadCode is ignored.
660
661 The value of "HandleBad" is used to define the contents of the "BadDoc"
662 key, if it is not given.
663
664 To handle bad values, code must be written somewhat differently; for
665 instance,
666
667 $c() = $a() + $b();
668
669 becomes something like
670
671 if ( $a() != BADVAL && $b() != BADVAL ) {
672 $c() = $a() + $b();
673 } else {
674 $c() = BADVAL;
675 }
676
677 However, we only want the second version if bad values are present in
678 the input piddles (and that bad-value support is wanted!) - otherwise
679 we actually want the original code. This is where the "BadCode" key
680 comes in; you use it to specify the code to execute if bad values may
681 be present, and PP uses both it and the "Code" section to create
682 something like:
683
684 if ( bad_values_are_present ) {
685 fancy_threadloop_stuff {
686 BadCode
687 }
688 } else {
689 fancy_threadloop_stuff {
690 Code
691 }
692 }
693
694 This approach means that there is virtually no overhead when bad values
695 are not present (ie the badflag routine returns 0).
696
697 The BadCode section can use the same macros and looping constructs as
698 the Code section. However, it wouldn't be much use without the
699 following additional macros:
700
701 $ISBAD(var)
702 To check whether a piddle's value is bad, use the $ISBAD macro:
703
704 if ( $ISBAD(a()) ) { printf("a() is bad\n"); }
705
706 You can also access given elements of a piddle:
707
708 if ( $ISBAD(a(n=>l)) ) { printf("element %d of a() is bad\n", l); }
709
710 $ISGOOD(var)
711 This is the opposite of the $ISBAD macro.
712
713 $SETBAD(var)
714 For when you want to set an element of a piddle bad.
715
716 $ISBADVAR(c_var,pdl)
717 If you have cached the value of a piddle "$a()" into a c-variable
718 ("foo" say), then to check whether it is bad, use
719 "$ISBADVAR(foo,a)".
720
721 $ISGOODVAR(c_var,pdl)
722 As above, but this time checking that the cached value isn't bad.
723
724 $SETBADVAR(c_var,pdl)
725 To copy the bad value for a piddle into a c variable, use
726 "$SETBADVAR(foo,a)".
727
728 TODO: mention "$PPISBAD()" etc macros.
729
730 Using these macros, the above code could be specified as:
731
732 Code => '$c() = $a() + $b();',
733 BadCode => '
734 if ( $ISBAD(a()) || $ISBAD(b()) ) {
735 $SETBAD(c());
736 } else {
737 $c() = $a() + $b();
738 }',
739
740 Since this is perl, TMTOWTDI, so you could also write:
741
742 BadCode => '
743 if ( $ISGOOD(a()) && $ISGOOD(b()) ) {
744 $c() = $a() + $b();
745 } else {
746 $SETBAD(c());
747 }',
748
749 If you want access to the value of the badflag for a given piddle, you
750 can use the "$PDLSTATExxxx()" macros:
751
752 $PDLSTATEISBAD(pdl)
753 $PDLSTATEISGOOD(pdl)
754 $PDLSTATESETBAD(pdl)
755 $PDLSTATESETGOOD(pdl)
756
757 TODO: mention the "FindBadStatusCode" and "CopyBadStatusCode" options
758 to "pp_def", as well as the "BadDoc" key.
759
760 Interfacing your own/library functions using PP
761 Now, consider the following: you have your own C function (that may in
762 fact be part of some library you want to interface to PDL) which takes
763 as arguments two pointers to vectors of double:
764
765 void myfunc(int n,double *v1,double *v2);
766
767 The correct way of defining the PDL function is
768
769 pp_def('myfunc',
770 Pars => 'a(n); [o]b(n);',
771 GenericTypes => [D],
772 Code => 'myfunc($SIZE(n),$P(a),$P(b));'
773 );
774
775 The "$P("par")" syntax returns a pointer to the first element and the
776 other elements are guaranteed to lie after that.
777
778 Notice that here it is possible to make many mistakes. First, $SIZE(n)
779 must be used instead of "n". Second, you shouldn't put any loops in
780 this code. Third, here we encounter a new hash key recognised by
781 PDL::PP : the "GenericTypes" declaration tells PDL::PP to ONLY GENERATE
782 THE TYPELOOP FOP THE LIST OF TYPES SPECIFIED. In this case "double".
783 This has two advantages. Firstly the size of the compiled code is
784 reduced vastly, secondly if non-double arguments are passed to
785 "myfunc()" PDL will automatically convert them to double before passing
786 to the external C routine and convert them back afterwards.
787
788 One can also use "Pars" to qualify the types of individual arguments.
789 Thus one could also write this as:
790
791 pp_def('myfunc',
792 Pars => 'double a(n); double [o]b(n);',
793 Code => 'myfunc($SIZE(n),$P(a),$P(b));'
794 );
795
796 The type specification in "Pars" exempts the argument from variation in
797 the typeloop - rather it is automatically converted too and from the
798 type specified. This is obviously useful in a more general example,
799 e.g.:
800
801 void myfunc(int n,float *v1,long *v2);
802
803 pp_def('myfunc',
804 Pars => 'float a(n); long [o]b(n);',
805 GenericTypes => [F],
806 Code => 'myfunc($SIZE(n),$P(a),$P(b));'
807 );
808
809 Note we still use "GenericTypes" to reduce the size of the type loop,
810 obviously PP could in principle spot this and do it automatically
811 though the code has yet to attain that level of sophistication!
812
813 Finally note when types are converted automatically one MUST use the
814 "[o]" qualifier for output variables or you hard one changes will get
815 optimised away by PP!
816
817 If you interface a large library you can automate the interfacing even
818 further. Perl can help you again(!) in doing this. In many libraries
819 you have certain calling conventions. This can be exploited. In short,
820 you can write a little parser (which is really not difficult in perl)
821 that then generates the calls to "pp_def" from parsed descriptions of
822 the functions in that library. For an example, please check the Slatec
823 interface in the "Lib" tree of the PDL distribution. If you want to
824 check (during debugging) which calls to PP functions your perl code
825 generated a little helper package comes in handy which replaces the PP
826 functions by identically named ones that dump their arguments to
827 stdout.
828
829 Just say
830
831 perl -MPDL::PP::Dump myfile.pd
832
833 to see the calls to "pp_def" and friends. Try it with ops.pd and
834 slatec.pd. If you're interested (or want to enhance it), the source is
835 in Basic/Gen/PP/Dump.pm
836
837 Other macros and functions in the Code section
838 Macros: So far we have encountered the $SIZE, $GENERIC and $P macros.
839 Now we are going to quickly explain the other macros that are expanded
840 in the "Code" section of PDL::PP along with examples of their usage.
841
842 $T The $T macro is used for type switches. This is very useful when you
843 have to use different external (e.g. library) functions depending on
844 the input type of arguments. The general syntax is
845
846 $Ttypeletters(type_alternatives)
847
848 where "typeletters" is a permutation of a subset of the letters
849 "BSULFD" which stand for Byte, Short, Ushort, etc. and
850 "type_alternatives" are the expansions when the type of the PP
851 operation is equal to that indicated by the respective letter. Let's
852 illustrate this incomprehensible description by an example. Assuming
853 you have two C functions with prototypes
854
855 void float_func(float *in, float *out);
856 void double_func(double *in, double *out);
857
858 which do basically the same thing but one accepts float and the
859 other double pointers. You could interface them to PDL by defining a
860 generic function "foofunc" (which will call the correct function
861 depending on the type of the transformation):
862
863 pp_def('foofunc',
864 Pars => ' a(n); [o] b();',
865 Code => ' $TFD(float_func,double_func) ($P(a),$P(b));'
866 GenericTypes => [F,D],
867 );
868
869 Please note that you can't say
870
871 Code => ' $TFD(float,double)_func ($P(a),$P(b));'
872
873 since the $T macro expands with trailing spaces, analogously to C
874 preprocessor macros. The slightly longer form illustrated above is
875 correct. If you really want brevity, you can of course do
876
877 '$TBSULFD('.(join ',',map {"long_identifier_name_$_"}
878 qw/byt short unseigned lounge flotte dubble/).');'
879
880 $PP
881 The $PP macro is used for a so called physical pointer access. The
882 physical refers to some internal optimisations of PDL (for those who
883 are familiar with the PDL core we are talking about the vaffine
884 optimisations). This macro is mainly for internal use and you
885 shouldn't need to use it in any of your normal code.
886
887 $COMP (and the "OtherPars" section)
888 The $COMP macro is used to access non-pdl values in the code
889 section. Its name is derived from the implementation of
890 transformations in PDL. The variables you can refer to using $COMP
891 are members of the ``compiled'' structure that represents the PDL
892 transformation in question but does not yet contain any information
893 about dimensions (for further details check PDL::Internals).
894 However, you can treat $COMP just as a black box without knowing
895 anything about the implementation of transformations in PDL. So when
896 would you use this macro? Its main usage is to access values of
897 arguments that are declared in the "OtherPars" section of a "pp_def"
898 definition. But then you haven't heard about the "OtherPars" key
899 yet?! Let's have another example that illustrates typical usage of
900 both new features:
901
902 pp_def('pnmout',
903 Pars => 'a(m)',
904 OtherPars => "char* fd",
905 GenericTypes => [B,U,S,L],
906 Code => 'PerlIO *fp;
907 IO *io;
908
909 io = GvIO(gv_fetchpv($COMP(fd),FALSE,SVt_PVIO));
910 if (!io || !(fp = IoIFP(io)))
911 croak("Can\'t figure out FP");
912
913 if (PerlIO_write(fp,$P(a),len) != len)
914 croak("Error writing pnm file");
915 ');
916
917 This function is used to write data from a pdl to a file. The file
918 descriptor is passed as a string into this function. This parameter
919 does not go into the "Pars" section since it cannot be usefully
920 treated like a pdl but rather into the aptly named "OtherPars"
921 section. Parameters in the "OtherPars" section follow those in the
922 "Pars" section when invoking the function, i.e.
923
924 open FILE,">out.dat" or die "couldn't open out.dat";
925 pnmout($pdl,'FILE');
926
927 When you want to access this parameter inside the code section you
928 have to tell PP by using the $COMP macro, i.e. you write "$COMP(fd)"
929 as in the example. Otherwise PP wouldn't know that the "fd" you are
930 referring to is the same as that specified in the "OtherPars"
931 section.
932
933 Another use for the "OtherPars" section is to set a named dimension
934 in the signature. Let's have an example how that is done:
935
936 pp_def('setdim',
937 Pars => '[o] a(n)',
938 OtherPars => 'int ns => n',
939 Code => 'loop(n) %{ $a() = n; %}',
940 );
941
942 This says that the named dimension "n" will be initialised from the
943 value of the other parameter "ns" which is of integer type (I guess
944 you have realised that we use the "CType From => named_dim" syntax).
945 Now you can call this function in the usual way:
946
947 setdim(($a=null),5);
948 print $a;
949 [ 0 1 2 3 4 ]
950
951 Admittedly this function is not very useful but it demonstrates how
952 it works. If you call the function with an existing pdl and you
953 don't need to explicitly specify the size of "n" since PDL::PP can
954 figure it out from the dimensions of the non-null pdl. In that case
955 you just give the dimension parameter as "-1":
956
957 $a = hist($b);
958 setdim($a,-1);
959
960 That should do it.
961
962 The only PP function that we have used in the examples so far is
963 "loop". Additionally, there are currently two other functions which
964 are recognised in the "Code" section:
965
966 threadloop
967 As we heard above the signature of a PP defined function defines the
968 dimensions of all the pdl arguments involved in a primitive
969 operation. However, you often call the functions that you defined
970 with PP with pdls that have more dimensions than those specified in
971 the signature. In this case the primitive operation is performed on
972 all subslices of appropriate dimensionality in what is called a
973 threadloop (see also overview above and PDL::Indexing). Assuming you
974 have some notion of this concept you will probably appreciate that
975 the operation specified in the code section should be optimised since
976 this is the tightest loop inside a threadloop. However, if you
977 revisit the example where we define the "pnmout" function, you will
978 quickly realise that looking up the "IO" file descriptor in the inner
979 threadloop is not very efficient when writing a pdl with many rows. A
980 better approach would be to look up the "IO" descriptor once outside
981 the threadloop and use its value then inside the tightest threadloop.
982 This is exactly where the "threadloop" function comes in handy. Here
983 is an improved definition of "pnmout" which uses this function:
984
985 pp_def('pnmout',
986 Pars => 'a(m)',
987 OtherPars => "char* fd",
988 GenericTypes => [B,U,S,L],
989 Code => 'PerlIO *fp;
990 IO *io;
991 int len;
992
993 io = GvIO(gv_fetchpv($COMP(fd),FALSE,SVt_PVIO));
994 if (!io || !(fp = IoIFP(io)))
995 croak("Can\'t figure out FP");
996
997 len = $SIZE(m) * sizeof($GENERIC());
998
999 threadloop %{
1000 if (PerlIO_write(fp,$P(a),len) != len)
1001 croak("Error writing pnm file");
1002 %}
1003 ');
1004
1005 This works as follows. Normally the C code you write inside the
1006 "Code" section is placed inside a threadloop (i.e., PP generates the
1007 appropriate wrapping XS code around it). However, when you explicitly
1008 use the "threadloop" function, PDL::PP recognises this and doesn't
1009 wrap your code with an additional threadloop. This has the effect
1010 that code you write outside the threadloop is only executed once per
1011 transformation and just the code with in the surrounding "%{ ... %}"
1012 pair is placed within the tightest threadloop. This also comes in
1013 handy when you want to perform a decision (or any other code,
1014 especially CPU intensive code) only once per thread, i.e.
1015
1016 pp_addhdr('
1017 #define RAW 0
1018 #define ASCII 1
1019 ');
1020 pp_def('do_raworascii',
1021 Pars => 'a(); b(); [o]c()',
1022 OtherPars => 'int mode',
1023 Code => ' switch ($COMP(mode)) {
1024 case RAW:
1025 threadloop %{
1026 /* do raw stuff */
1027 %}
1028 break;
1029 case ASCII:
1030 threadloop %{
1031 /* do ASCII stuff */
1032 %}
1033 break;
1034 default:
1035 croak("unknown mode");
1036 }'
1037 );
1038
1039 types
1040 The types function works similar to the $T macro. However, with the
1041 "types" function the code in the following block (delimited by "%{"
1042 and "%}" as usual) is executed for all those cases in which the
1043 datatype of the operation is any of the types represented by the
1044 letters in the argument to "type", e.g.
1045
1046 Code => '...
1047
1048 types(BSUL) %{
1049 /* do integer type operation */
1050 %}
1051 types(FD) %{
1052 /* do floating point operation */
1053 %}
1054 ...'
1055
1056 Typemap handling in the "OtherPars" section
1057 The "OtherPars" section discussed above is very often absolutely
1058 crucial when you interface external libraries with PDL. However in many
1059 cases the external libraries either use derived types or pointers of
1060 various types.
1061
1062 The standard way to handle this in Perl is to use a "typemap" file.
1063 This is discussed in some detail in perlxs in the standard Perl
1064 documentation. In PP the functionality is very similar, so you can
1065 create a "typemap" file in the directory where your PP file resides and
1066 when it is built it is automatically read in to figure out the
1067 appropriate translation between the C type and Perl's built-in type.
1068
1069 That said, there are a couple of important differences from the general
1070 handling of types in XS. The first, and probably most important, is
1071 that at the moment pointers to types are not allowed in the "OtherPars"
1072 section. To get around this limitation you must use the "IV" type
1073 (thanks to Judd Taylor for pointing out that this is necessary for
1074 portability).
1075
1076 It is probably best to illustrate this with a couple of code-snippets:
1077
1078 For instance the "gsl_spline_init" function has the following C
1079 declaration:
1080
1081 int gsl_spline_init(gsl_spline * spline,
1082 const double xa[], const double ya[], size_t size);
1083
1084 Clearly the "xa" and "ya" arrays are candidates for being passed in as
1085 piddles and the "size" argument is just the length of these piddles so
1086 that can be handled by the "$SIZE()" macro in PP. The problem is the
1087 pointer to the "gsl_spline" type. The natural solution would be to
1088 write an "OtherPars" declaration of the form
1089
1090 OtherPars => 'gsl_spline *spl'
1091
1092 and write a short "typemap" file which handled this type. This does not
1093 work at present however! So what you have to do is to go around the
1094 problem slightly (and in some ways this is easier too!):
1095
1096 The solution is to declare "spline" in the "OtherPars" section using an
1097 "Integer Value", "IV". This hides the nature of the variable from PP
1098 and you then need to (well to avoid compiler warnings at least!)
1099 perform a type cast when you use the variable in your code. Thus
1100 "OtherPars" should take the form:
1101
1102 OtherPars => 'IV spl'
1103
1104 and when you use it in the code you will write
1105
1106 (gsl_spline *) $COMP(spl)
1107
1108 so putting this together as Andres Jordan has done (with the
1109 modification using "IV" by Judd Taylor) in the "gsl_interp.pd" in the
1110 distribution source you get:
1111
1112 pp_def('init_meat',
1113 Pars => 'double x(n); double y(n);',
1114 OtherPars => 'IV spl',
1115 Code =>'
1116 gsl_spline_init( (gsl_spline *) $COMP(spl), $P(x),$P(y),$SIZE(n));'
1117 );
1118
1119 where I have removed a macro wrapper call, but that would obscure the
1120 discussion.
1121
1122 The other minor difference as compared to the standard typemap handling
1123 in Perl, is that the user cannot specify non-standard typemap locations
1124 or typemap filenames using the "TYPEMAPS" option in MakeMaker... Thus
1125 you can only use a file called "typemap" and/or the "IV" trick above.
1126
1127 Other useful PP keys in data operation definitions
1128 You have already heard about the "OtherPars" key. Currently, there are
1129 not many other keys for a data operation that will be useful in normal
1130 (whatever that is) PP programming. In fact, it would be interesting to
1131 hear about a case where you think you need more than what is provided
1132 at the moment. Please speak up on one of the PDL mailing lists. Most
1133 other keys recognised by "pp_def" are only really useful for what we
1134 call slice operations (see also above).
1135
1136 One thing that is strongly being planned is variable number of
1137 arguments, which will be a little tricky.
1138
1139 An incomplete list of the available keys:
1140
1141 Inplace
1142 Setting this key marks the routine as working inplace - ie the
1143 input and output piddles are the same. An example is
1144 "$a->inplace->sqrt()" (or "sqrt(inplace($a))").
1145
1146 Inplace => 1
1147 Use when the routine is a unary function, such as "sqrt".
1148
1149 Inplace => ['a']
1150 If there are more than one input piddles, specify the name of
1151 the one that can be changed inplace using an array reference.
1152
1153 Inplace => ['a','b']
1154 If there are more than one output piddle, specify the name of
1155 the input piddle and output piddle in a 2-element array
1156 reference. This probably isn't needed, but left in for
1157 completeness.
1158
1159 If bad values are being used, care must be taken to ensure the
1160 propogation of the badflag when inplace is being used; consider
1161 this excerpt from Basic/Bad/bad.pd:
1162
1163 pp_def('replacebad',HandleBad => 1,
1164 Pars => 'a(); [o]b();',
1165 OtherPars => 'double newval',
1166 Inplace => 1,
1167 CopyBadStatusCode =>
1168 '/* propogate badflag if inplace AND it has changed */
1169 if ( a == b && $ISPDLSTATEBAD(a) )
1170 PDL->propogate_badflag( b, 0 );
1171
1172 /* always make sure the output is "good" */
1173 $SETPDLSTATEGOOD(b);
1174 ',
1175 ...
1176
1177 Since this routine removes all bad values, then the output piddle
1178 had its bad flag cleared. If run inplace (so "a == b"), then we
1179 have to tell all the children of "a" that the bad flag has been
1180 cleared (to save time we make sure that we call
1181 "PDL->propogate_badgflag" only if the input piddle had its bad flag
1182 set).
1183
1184 NOTE: one idea is that the documentation for the routine could be
1185 automatically flagged to indicate that it can be executed inplace,
1186 ie something similar to how "HandleBad" sets "BadDoc" if it's not
1187 supplied (it's not an ideal solution).
1188
1189 Other PDL::PP functions to support concise package definition
1190 So far, we have described the "pp_def" and "pp_done" functions. PDL::PP
1191 exports a few other functions to aid you in writing concise PDL
1192 extension package definitions.
1193
1194 Often when you interface library functions as in the above example you
1195 have to include additional C include files. Since the XS file is
1196 generated by PP we need some means to make PP insert the appropriate
1197 include directives in the right place into the generated XS file. To
1198 this end there is the "pp_addhdr" function. This is also the function
1199 to use when you want to define some C functions for internal use by
1200 some of the XS functions (which are mostly functions defined by
1201 "pp_def"). By including these functions here you make sure that
1202 PDL::PP inserts your code before the point where the actual XS module
1203 section begins and will therefore be left untouched by xsubpp (cf.
1204 perlxs and perlxstut manpages).
1205
1206 A typical call would be
1207
1208 pp_addhdr('
1209 #include <unistd.h> /* we need defs of XXXX */
1210 #include "libprotos.h" /* prototypes of library functions */
1211 #include "mylocaldecs.h" /* Local decs */
1212
1213 static void do_the real_work(PDL_Byte * in, PDL_Byte * out, int n)
1214 {
1215 /* do some calculations with the data */
1216 }
1217 ');
1218
1219 This ensures that all the constants and prototypes you need will be
1220 properly included and that you can use the internal functions defined
1221 here in the "pp_def"s, e.g.:
1222
1223 pp_def('barfoo',
1224 Pars => ' a(n); [o] b(n)',
1225 GenericTypes => '[B]',
1226 Code => ' int ns = $SIZE(n);
1227 do_the_real_work($P(a),$P(b),ns);
1228 ',
1229 );
1230
1231 In many cases the actual PP code (meaning the arguments to "pp_def"
1232 calls) is only part of the package you are currently implementing.
1233 Often there is additional perl code and XS code you would normally have
1234 written into the pm and XS files which are now automatically generated
1235 by PP. So how to get this stuff into those dynamically generated files?
1236 Fortunately, there are a couple of functions, generally called
1237 "pp_addXXX" that assist you in doing this.
1238
1239 Let's assume you have additional perl code that should go into the
1240 generated pm-file. This is easily achieved with the "pp_addpm" command:
1241
1242 pp_addpm(<<'EOD');
1243
1244 =head1 NAME
1245
1246 PDL::Lib::Mylib -- a PDL interface to the Mylib library
1247
1248 =head1 DESCRIPTION
1249
1250 This package implements an interface to the Mylib package with full
1251 threading and indexing support (see L<PDL::Indexing>).
1252
1253 =cut
1254
1255 use PGPLOT;
1256
1257 =head2 use_myfunc
1258 this function applies the myfunc operation to all the
1259 elements of the input pdl regardless of dimensions
1260 and returns the sum of the result
1261 =cut
1262
1263 sub use_myfunc {
1264 my $pdl = shift;
1265
1266 myfunc($pdl->clump(-1),($res=null));
1267
1268 return $res->sum;
1269 }
1270
1271 EOD
1272
1273 You have probably got the idea. In some cases you also want to export
1274 your additional functions. To avoid getting into trouble with PP which
1275 also messes around with the @EXPORT array you just tell PP to add your
1276 functions to the list of exported functions:
1277
1278 pp_add_exported('', 'use_myfunc gethynx');
1279
1280 Note the initial empty string argument (reason for it?).
1281
1282 The "pp_add_isa" command works like the the "pp_add_exported" function.
1283 The arguments to "pp_add_isa" are added the @ISA list, e.g.
1284
1285 pp_add_isa(' Some::Other::Class ');
1286
1287 Sometimes you want to add extra XS code of your own (that is generally
1288 not involved with any threading/indexing issues but supplies some other
1289 functionality you want to access from the perl side) to the generated
1290 XS file, for example
1291
1292 pp_addxs('','
1293
1294 # Determine endianness of machine
1295
1296 int
1297 isbigendian()
1298 CODE:
1299 unsigned short i;
1300 PDL_Byte *b;
1301
1302 i = 42; b = (PDL_Byte*) (void*) &i;
1303
1304 if (*b == 42)
1305 RETVAL = 0;
1306 else if (*(b+1) == 42)
1307 RETVAL = 1;
1308 else
1309 croak("Impossible - machine is neither big nor little endian!!\n");
1310 OUTPUT:
1311 RETVAL
1312 ');
1313
1314 Especially "pp_add_exported" and "pp_addxs" should be used with care.
1315 PP uses PDL::Exporter, hence letting PP export your function means that
1316 they get added to the standard list of function exported by default
1317 (the list defined by the export tag ``:Func''). If you use "pp_addxs"
1318 you shouldn't try to do anything that involves threading or indexing
1319 directly. PP is much better at generating the appropriate code from
1320 your definitions.
1321
1322 Finally, you may want to add some code to the BOOT section of the XS
1323 file (if you don't know what that is check perlxs). This is easily done
1324 with the "pp_add_boot" command:
1325
1326 pp_add_boot(<<EOB);
1327 descrip = mylib_initialize(KEEP_OPEN);
1328
1329 if (descrip == NULL)
1330 croak("Can't initialize library");
1331
1332 GlobalStruc->descrip = descrip;
1333 GlobalStruc->maxfiles = 200;
1334 EOB
1335
1336 By default, PP.pm puts all subs defined using the pp_def function into
1337 the output .pm file's EXPORT list. This can create problems if you are
1338 creating a subclassed object where you don't want any methods exported.
1339 (i.e. the methods will only be called using the $object->method
1340 syntax).
1341
1342 For these cases you can call pp_export_nothing() to clear out the
1343 export list. Example (At the end of the .pd file):
1344
1345 pp_export_nothing();
1346 pp_done();
1347
1348 By default, PP.pm puts the 'use Core;' line into the output .pm file.
1349 This imports Core's exported names into the current namespace, which
1350 can create problems if you are over-riding one of Core's methods in the
1351 current file. You end up getting messages like "Warning: sub sumover
1352 redefined in file subclass.pm" when running the program.
1353
1354 For these cases the pp_core_importList can be used to change what is
1355 imported from Core.pm. For example:
1356
1357 pp_core_importList('()')
1358
1359 This would result in
1360
1361 use Core();
1362
1363 being generated in the output .pm file. This would result in no names
1364 being imported from Core.pm. Similarly, calling
1365
1366 pp_core_importList(' qw/ barf /')
1367
1368 would result in
1369
1370 use Core qw/ barf/;
1371
1372 being generated in the output .pm file. This would result in just
1373 'barf' being imported from Core.pm.
1374
1376 The slice operation section of this manual is provided using dataflow
1377 and lazy evaluation: when you need it, ask Tjl to write it. a delivery
1378 in a week from when I receive the email is 95% probable and two week
1379 delivery is 99% probable.
1380
1381 And anyway, the slice operations require a much more intimate knowledge
1382 of PDL internals than the data operations. Furthermore, the complexity
1383 of the issues involved is considerably higher than that in the average
1384 data operation. If you would like to convince yourself of this fact
1385 take a look at the Basic/Slices/slices.pd file in the PDL distribution
1386 :-). Nevertheless, functions generated using the slice operations are
1387 at the heart of the index manipulation and dataflow capabilities of
1388 PDL.
1389
1390 Also, there are a lot of dirty issues with virtual piddles and vaffines
1391 which we shall entirely skip here.
1392
1393 Slices and bad values
1394 Slice operations need to be able to handle bad values (if support is
1395 compiled into PDL). The easiest thing to do is look at
1396 Basic/Slices/slices.pd to see how this works.
1397
1398 Along with "BadCode", there are also the "BadBackCode" and
1399 "BadRedoDimsCode" keys for "pp_def". However, any "EquivCPOffsCode"
1400 should not need changing, since any changes are absorbed into the
1401 definition of the "$EQUIVCPOFFS()" macro (ie it is handled
1402 automatically by PDL::PP>.
1403
1404 A few notes on writing a slicing routine...
1405 The following few paragraphs describe writing of a new slicing routine
1406 ('range'); any errors are CED's. (--CED 26-Aug-2002)
1407
1409 The PDL "Core" structure, defined in Basic/Core/pdlcore.h.PL, contains
1410 pointers to a number of routines that may be useful to you. The
1411 majority of these routines deal with manipulating piddles, but some are
1412 more general:
1413
1414 PDL->qsort_B( PDL_Byte *xx, int a, int b )
1415 Sort the array "xx" between the indices "a" and "b". There are
1416 also versions for the other PDL datatypes, with postfix "_S", "_U",
1417 "_L", "_F", and "_D". Any module using this must ensure that
1418 "PDL::Ufunc" is loaded.
1419
1420 PDL->qsort_ind_B( PDL_Byte *xx, int *ix, int a, int b )
1421 As for "PDL->qsort_B", but this time sorting the indices rather
1422 than the data.
1423
1424 The routine "med2d" in Lib/Image2D/image2d.pd shows how such routines
1425 are used.
1426
1428 If you are going to generate a package from your PP file (typical file
1429 extensions are ".pd" or ".pp" for the files containing PP code) it is
1430 easiest and safest to leave generation of the appropriate commands to
1431 the Makefile. In the following we will outline the typical format of a
1432 perl Makefile to automatically build and install your package from a
1433 description in a PP file. Most of the rules to build the xs, pm and
1434 other required files from the PP file are already predefined in the
1435 PDL::Core::Dev package. We just have to tell MakeMaker to use it.
1436
1437 In most cases you can define your Makefile like
1438
1439 # Makefile.PL for a package defined by PP code.
1440
1441 use PDL::Core::Dev; # Pick up development utilities
1442 use ExtUtils::MakeMaker;
1443
1444 $package = ["mylib.pd",Mylib,PDL::Lib::Mylib];
1445 %hash = pdlpp_stdargs($package);
1446 $hash{OBJECT} .= ' additional_Ccode$(OBJ_EXT) ';
1447 $hash{clean}->{FILES} .= ' todelete_Ccode$(OBJ_EXT) ';
1448 $hash{'VERSION_FROM'} = 'mylib.pd';
1449 WriteMakefile(%hash);
1450
1451 sub MY::postamble { pdlpp_postamble($package); }
1452
1453 Here, the list in $package is: first: PP source file name, then the
1454 prefix for the produced files and finally the whole package name. You
1455 can modify the hash in whatever way you like but it would be reasonable
1456 to stay within some limits so that your package will continue to work
1457 with later versions of PDL.
1458
1459 If you don't want to use prepackaged arguments, here is a generic
1460 Makefile.PL that you can adapt for your own needs:
1461
1462 # Makefile.PL for a package defined by PP code.
1463
1464 use PDL::Core::Dev; # Pick up development utilities
1465 use ExtUtils::MakeMaker;
1466
1467 WriteMakefile(
1468 'NAME' => 'PDL::Lib::Mylib',
1469 'VERSION_FROM' => 'mylib.pd',
1470 'TYPEMAPS' => [&PDL_TYPEMAP()],
1471 'OBJECT' => 'mylib$(OBJ_EXT) additional_Ccode$(OBJ_EXT)',
1472 'PM' => { 'Mylib.pm' => '$(INST_LIBDIR)/Mylib.pm'},
1473 'INC' => &PDL_INCLUDE(), # add include dirs as required by your lib
1474 'LIBS' => [''], # add link directives as necessary
1475 'clean' => {'FILES' =>
1476 'Mylib.pm Mylib.xs Mylib$(OBJ_EXT)
1477 additional_Ccode$(OBJ_EXT)'},
1478 );
1479
1480 # Add genpp rule; this will invoke PDL::PP on our PP file
1481 # the argument is an array reference where the array has three string elements:
1482 # arg1: name of the source file that contains the PP code
1483 # arg2: basename of the xs and pm files to be generated
1484 # arg3: name of the package that is to be generated
1485 sub MY::postamble { pdlpp_postamble(["mylib.pd",Mylib,PDL::Lib::Mylib]); }
1486
1487 To make life even easier PDL::Core::Dev defines the function
1488 "pdlpp_stdargs" that returns a hash with default values that can be
1489 passed (either directly or after appropriate modification) to a call to
1490 WriteMakefile. Currently, "pdlpp_stdargs" returns a hash where the
1491 keys are filled in as follows:
1492
1493 (
1494 'NAME' => $mod,
1495 'TYPEMAPS' => [&PDL_TYPEMAP()],
1496 'OBJECT' => "$pref\$(OBJ_EXT)",
1497 PM => {"$pref.pm" => "\$(INST_LIBDIR)/$pref.pm"},
1498 MAN3PODS => {"$src" => "\$(INST_MAN3DIR)/$mod.\$(MAN3EXT)"},
1499 'INC' => &PDL_INCLUDE(),
1500 'LIBS' => [''],
1501 'clean' => {'FILES' => "$pref.xs $pref.pm $pref\$(OBJ_EXT)"},
1502 )
1503
1504 Here, $src is the name of the source file with PP code, $pref the
1505 prefix for the generated .pm and .xs files and $mod the name of the
1506 exntension module to generate.
1507
1509 The internals of the current version consist of a large table which
1510 gives the rules according to which things are translated and the subs
1511 which implement these rules.
1512
1513 Later on, it would be good to make the table modifiable by the user so
1514 that different things may be tried.
1515
1516 [Meta comment: here will hopefully be more in the future; currently,
1517 your best bet will be to read the source code :-( or ask on the list
1518 (try the latter first) ]
1519
1521 Unless otherwise specified, the arguments are strings. Keys marked with
1522 (bad) are only used if bad-value support is compiled into PDL.
1523
1524 Pars
1525 define the signature of your function
1526
1527 OtherPars
1528 arguments which are not pdls. Default: nothing.
1529
1530 Code
1531 the actual code that implements the functionality; several PP
1532 macros and PP functions are recognised in the string value
1533
1534 HandleBad (bad)
1535 If set to 1, the routine is assumed to support bad values and the
1536 code in the BadCode key is used if bad values are present; it also
1537 sets things up so that the "$ISBAD()" etc macros can be used. If
1538 set to 0, cause the routine to print a warning if any of the input
1539 piddles have their bad flag set.
1540
1541 BadCode (bad)
1542 Give the code to be used if bad values may be present in the input
1543 piddles. Only used if "HandleBad => 1".
1544
1545 GenericTypes
1546 An array reference. The array may contain any subset of the strings
1547 `B', `S', `U', `L', `F' and `D', which specify which types your
1548 operation will accept. This is very useful (and important!) when
1549 interfacing an external library. Default: [qw/B S U L F D/]
1550
1551 Inplace
1552 Mark a function as being able to work inplace.
1553
1554 Inplace => 1 if Pars => 'a(); [o]b();'
1555 Inplace => ['a'] if Pars => 'a(); b(); [o]c();'
1556 Inplace => ['a','b'] if Pars => 'a(); b(); [o]c(); [o]d();'
1557
1558 If bad values are being used, care must be taken to ensure the
1559 propogation of the badflag when inplace is being used; for instance
1560 see the code for "replacebad" in Basic/Bad/bad.pd.
1561
1562 Doc Used to specify a documentation string in Pod format. See PDL::Doc
1563 for information on PDL documentation conventions. Note: in the
1564 special case where the PP 'Doc' string is one line this is
1565 implicitly used for the quick reference AND the documentation!
1566
1567 If the Doc field is omitted PP will generate default documentation
1568 (after all it knows about the Signature).
1569
1570 If you really want the function NOT to be documented in any way at
1571 this point (e.g. for an internal routine, or because youu are doing
1572 it elsewhere in the code) explictly specify "Doc=>undef".
1573
1574 BadDoc (bad)
1575 Contains the text returned by the "badinfo" command (in "perldl")
1576 or the "-b" switch to the "pdldoc" shell script. In many cases, you
1577 will not need to specify this, since the information can be
1578 automatically created by PDL::PP. However, as befits computer-
1579 generated text, it's rather stilted; it may be much better to do it
1580 yourself!
1581
1583 Macros
1584 Macros labelled by (bad) are only used if bad-value support is compiled
1585 into PDL.
1586
1587 $variablename_from_sig()
1588 access a pdl (by its name) that was specified in the signature
1589
1590 $COMP(x)
1591 access a value in the private data structure of this
1592 transformation (mainly used to use an argument that is specified
1593 in the "OtherPar" section)
1594
1595 $SIZE(n)
1596 replaced at runtime by the actual size of a named dimension (as
1597 specified in the signature)
1598
1599 $GENERIC()
1600 replaced by the C type that is equal to the runtime type of the
1601 operation
1602
1603 $P(a) a pointer access to the PDL named "a" in the signature. Useful
1604 for interfacing to C functions
1605
1606 $PP(a) a physical pointer access to pdl "a"; mainly for internal use
1607
1608 $TXXX(Alternative,Alternative)
1609 expansion alternatives according to runtime type of operation,
1610 where XXX is some string that is matched by "/[BSULFD+]/".
1611
1612 $PDL(a)
1613 return a pointer to the pdl data structure (pdl *) of piddle "a"
1614
1615 $ISBAD(a()) (bad)
1616 returns true if the value stored in "a()" equals the bad value
1617 for this piddle. Requires "HandleBad" being set to 1.
1618
1619 $ISGOOD(a()) (bad)
1620 returns true if the value stored in "a()" does not equal the bad
1621 value for this piddle. Requires "HandleBad" being set to 1.
1622
1623 $SETBAD(a()) (bad)
1624 Sets "a()" to equal the bad value for this piddle. Requires
1625 "HandleBad" being set to 1.
1626
1627 functions
1628 "loop(DIMS) %{ ... %}"
1629 loop over named dimensions; limits are generated automatically by PP
1630
1631 "threadloop %{ ... %}"
1632 enclose following code in a threadloop
1633
1634 "types(TYPES) %{ ... %}"
1635 execute following code if type of operation is any of "TYPES"
1636
1638 PDL
1639
1640 For the concepts of threading and slicing check PDL::Indexing.
1641
1642 PDL::Internals
1643
1644 PDL::BadValues for information on bad values
1645
1646 perlxs, perlxstut
1647
1649 RedoDimsCode, $RESIZE()
1650
1652 PDL::PP is still, even in its rewritten form, too complicated. It
1653 needs to be rethought a little as well as deconvoluted and modularized
1654 some more (e.g. all the NS things).
1655
1656 After the rewrite, this can happen a little by little, though.
1657
1658 Undocumented functions
1659 The following functions have been added since this manual was written
1660 and are as yet undocumented
1661
1662 pp_export_nothing
1663 pp_core_importList
1664 pp_beginwrap
1665 pp_setversion
1666 pp_addbegin
1667
1669 Copyright(C) 1997 Tuomas J. Lukka (lukka@fas.harvard.edu), Karl
1670 Glaazebrook (kgb@aaocbn1.aao.GOV.AU) and Christian Soeller
1671 (c.soeller@auckland.ac.nz). All rights reserved. Although destined for
1672 release as a man page with the standard PDL distribution, it is not
1673 public domain. Permission is granted to freely distribute verbatim
1674 copies of this document provided that no modifications outside of
1675 formatting be made, and that this notice remain intact. You are
1676 permitted and encouraged to use its code and derivatives thereof in
1677 your own source code for fun or for profit as you see fit.
1678
1679
1680
1681perl v5.12.3 2009-10-17 PP(1)