1PP(1) User Contributed Perl Documentation PP(1)
2
3
4
6 PDL::PP - Generate PDL routines from concise descriptions
7
9 e.g.
10
11 pp_def(
12 'sumover',
13 Pars => 'a(n); [o]b();',
14 Code => 'double tmp=0;
15 loop(n) %{ tmp += $a(); %}
16 $b() = tmp;
17 '
18 );
19
20 pp_done();
21
23 In much of what follows we will assume familiarity of the reader with
24 the concepts of implicit and explicit threading and index manipulations
25 within PDL. If you have not yet heard of these concepts or are not very
26 comfortable with them it is time to check PDL::Indexing.
27
28 As you may appreciate from its name PDL::PP is a Pre-Processor, i.e.
29 it expands code via substitutions to make real C-code (well, actually
30 it outputs XS code (See perlxs) but that is very close to C).
31
33 Why do we need PP? Several reasons: firstly, we want to be able to gen‐
34 erate subroutine code for each of the PDL datatypes (PDL_Byte,
35 PDL_Short,. etc). AUTOMATICALLY. Secondly, when referring to slices
36 of PDL arrays in Perl (e.g. "$a->slice('0:10:2,:')" or other things
37 such as transposes) it is nice to be able to do this transparently and
38 to be able to do this 'in-place' - i.e, not to have to make a memory
39 copy of the section. PP handles all the necessary element and offset
40 arithmetic for you. There are also the notions of threading (repeated
41 calling of the same routine for multiple slices, see PDL::Indexing) and
42 dataflow (see PDL::Dataflow) which use of PP allows.
43
44 So how do you use PP? Well for the most part you just write ordinary C
45 code except for special PP constructs which take the form:
46
47 $something(something else)
48
49 or:
50
51 PPfunction %{
52 <stuff>
53 %}
54
55 The most important PP construct is the form "$array()". Consider the
56 very simple PP function to sum the elements of a 1D vector (in fact
57 this is very similar to the actual code used by 'sumover'):
58
59 pp_def('sumit',
60 Pars => 'a(n); [o]b();',
61 Code => '
62 double tmp;
63 tmp = 0;
64 loop(n) %{
65 tmp += $a();
66 %}
67 $b() = tmp;
68 ');
69
70 What's going on? The "Pars =>" line is very important for PP - it spec‐
71 ifies all the arguments and their dimensionality. We call this the sig‐
72 nature of the PP function (compare also the explanations in PDL::Index‐
73 ing). In this case the routine takes a 1-D function as input and
74 returns a 0-D scalar as output. The "$a()" PP construct is used to
75 access elements of the array a(n) for you - PP fills in all the
76 required C code.
77
78 [Aside: since PP used "$var()" for its parsing you must single-quote
79 all Code=> arguments since you don't want perl to interpolate "$var()"
80 into another string - i.e. don't use "" unless you know what you are
81 doing! Tjl: it's usually easiest to use single quotes and 'some‐
82 thing'.$interpolatable.'somethingelse']
83
84 In the simple case here where all elements are accessed the PP con‐
85 struct "loop(n) %{ ... %}" is used to loop over all elements in dimen‐
86 sion "n". Note this feature of PP: ALL DIMENSIONS ARE SPECIFIED BY
87 NAME.
88
89 This is made clearer if we avoid the PP loop() construct and write the
90 loop explicitly using conventional C:
91
92 pp_def('sumit',
93 Pars => 'a(n); [o]b();',
94 Code => '
95 int i,n_size;
96 double tmp;
97 n_size = $SIZE(n);
98 tmp = 0;
99 for(i=0; i<n_size; i++) {
100 tmp += $a(n=>i);
101 }
102 $b() = tmp;
103 ');
104
105 which does the same as before, except more long-windedly. You can see
106 to get element "i" of a() we say "$a(n=>i)" - we are specifying the
107 dimension by name "n". In 2D we might say:
108
109 Pars=>'a(m,n);',
110 ...
111 tmp += $a(m=>i,n=>j);
112 ...
113
114 The syntax 'm=>i' borrows from Perl hashes (which are in fact used in
115 the implementation of PP). One could also say "$a(n=>j,m=>i)" as order
116 is not important.
117
118 You can also see in the above example the use of another PP construct -
119 $SIZE(n) to get the length of the dimension "n".
120
121 It should, however, be noted that you shouldn't write an explicit
122 C-loop when you could have used the PP "loop" construct since PDL::PP
123 checks automatically the loop limits for you, usage of "loop" makes the
124 code more concise, etc. But there are certainly situations where you
125 need explicit control of the loop and now you know how to do it ;).
126
127 To revisit 'Why PP?' - the above code for sumit() will be generated for
128 each data-type. It will operate on slices of arrays 'in-place'. It will
129 thread automatically - e.g. if a 2D array is given it will be called
130 repeatedly for each 1D row (again check PDL::Indexing for the details
131 of threading). And then b() will be a 1D array of sums of each row.
132 We could call it with $a->xchg(0,1) to sum the colums instead. And
133 Dataflow tracing etc. will be available.
134
135 You can see PP saves the programmer from writing a lot of needlessly
136 repetitive C-code -- in our opinion this is one of the best features of
137 PDL making writing new C subroutines for PDL an amazingly concise exer‐
138 cise. A second reason is the ability to make PP expand your concise
139 code definitions into different C code based on the needs of the com‐
140 puter architecture in question. Imagine for example you are lucky to
141 have a supercomputer at your hands; in that case you want PDL::PP cer‐
142 tainly to generate code that takes advantage of the vectorising/paral‐
143 lel computing features of your machine (this a project for the future).
144 In any case, the bottom line is that your unchanged code should still
145 expand to working XS code even if the internals of PDL changed.
146
147 Also, because you are generating the code in an actual Perl script,
148 there are many fun things that you can do. Let's say that you need to
149 write both sumit (as above) and multit. With a little bit of inventiv‐
150 ity, we can do
151
152 for({Name => 'sumit', Init => '0', Op => '+='},
153 {Name => 'multit', Init => '1', Op => '*='}) {
154 pp_def($_->{Name},
155 Pars => 'a(n); [o]b();',
156 Code => '
157 double tmp;
158 tmp = '.$_->{Init}.';
159 loop(n) %{
160 tmp '.$_->{Op}.' $a();
161 %}
162 $b() = tmp;
163 ');
164 }
165
166 which defines both the functions easily. Now, if you later need to
167 change the signature or dimensionality or whatever, you only need to
168 change one place in your code. Yeah, sure, your editor does have 'cut
169 and paste' and 'search and replace' but it's still less bothersome and
170 definitely more difficult to forget just one place and have strange
171 bugs creep in. Also, adding 'orit' (bitwise or) later is a one-liner.
172
173 And remember, you really have perl's full abilities with you - you can
174 very easily read any input file and make routines from the information
175 in that file. For simple cases like the above, the author (Tjl) cur‐
176 rently favors the hash syntax like the above - it's not too much more
177 characters than the corresponding array syntax but much easier to
178 understand and change.
179
180 We should mention here also the ability to get the pointer to the
181 beginning of the data in memory - a prerequisite for interfacing PDL to
182 some libraries. This is handled with the "$P(var)" directive, see
183 below.
184
185 So, after this quick overview of the general flavour of programming PDL
186 routines using PDL::PP let's summarise in which circumstances you
187 should actually use this preprocessor/precompiler. You should use
188 PDL::PP if you want to
189
190 · interface PDL to some external library
191
192 · write some algorithm that would be slow if coded in perl (this is
193 not as often as you think; take a look at threading and dataflow
194 first).
195
196 · be a PDL developer (and even then it's not obligatory)
197
199 Because of its architecture, PDL::PP can be both flexible and easy to
200 use (yet exuberantly complicated) at the same time. Currently, part of
201 the problem is that error messages are not very informative and if
202 something goes wrong, you'd better know what you are doing and be able
203 to hack your way through the internals (or be able to figure out by
204 trial and error what is wrong with your args to "pp_def").
205
206 An alternative, of course, is to ask someone about it (e.g., through
207 the mailing lists).
208
210 Now that you have some idea how to use "pp_def" to define new PDL func‐
211 tions it is time to explain the general syntax of "pp_def". "pp_def"
212 takes as arguments first the name of the function you are defining and
213 then a hash list that can contain various keys.
214
215 Based on these keys PP generates XS code and a .pm file. The function
216 "pp_done" (see example in the SYNOPSIS) is used to tell PDL::PP that
217 there are no more definitions in this file and it is time to generate
218 the .xs and
219 .pm file.
220
221 As a consequence, there may be several pp_def() calls inside a file (by
222 convention files with PP code have the extension .pd or .pp) but gener‐
223 ally only one pp_done().
224
225 There are two main different types of usage of pp_def(), the 'data
226 operation' and 'slice operation' prototypes.
227
228 The 'data operation' is used to take some data, mangle it and output
229 some other data; this includes for example the '+' operation, matrix
230 inverse, sumover etc and all the examples we have talked about in this
231 document so far. Implicit and explicit threading and the creation of
232 the result are taken care of automatically in those opeartions. You can
233 even do dataflow with "sumit", "sumover", etc (don't be dismayed if you
234 don't understand the concept of dataflow in PDL very well yet; it is
235 still very much experimental).
236
237 The 'slice operation' is a different kind of operation: in a slice
238 operation, you are not changing any data, you are defining correspon‐
239 dences between different elements of two piddles (examples include the
240 index manipulation/slicing function definitions in the file slices.pd
241 that is part of the PDL distribution; but beware, this is not introduc‐
242 tory level stuff).
243
244 If PDL was compiled with support for bad values (ie "WITH_BADVAL =>
245 1"), then additional keys are required for "pp_def", as explained
246 below.
247
248 If you are just interested in communicating with some external library
249 (for example some linear algebra/matrix library), you'll usually want
250 the 'data operation' so we are going to discuss that first.
251
253 A simple example
254
255 In the data operation, you must know what dimensions of data you need.
256 First, an example with scalars:
257
258 pp_def('add',
259 Pars => 'a(); b(); [o]c();',
260 Code => '$c() = $a() + $b();'
261 );
262
263 That looks a little strange but let's dissect it. The first line is
264 easy: we're defining a routine with the name 'add'. The second line
265 simply declares our parameters and the parentheses mean that they are
266 scalars. We call the string that defines our parameters and their
267 dimensionality the signature of that function. For its relevance with
268 regard to threading and index manipulations check the PDL::Indexing
269 manpage.
270
271 The third line is the actual operation. You need to use the dollar
272 signs and parentheses to refer to your parameters (this will probably
273 change at some point in the future, once a good syntax is found).
274
275 These lines are all that is necessary to actually define the function
276 for PDL (well, actually it isn't; you aditionally need to write a Make‐
277 file.PL (see below) and build the module (something like 'perl Make‐
278 file.PL; make'); but let's ignore that for the moment). So now you can
279 do
280
281 use MyModule;
282 $a = pdl 2,3,4;
283 $b = pdl 5;
284
285 $c = add($a,$b);
286 # or
287 add($a,$b,($c=null)); # Alternative form, useful if $c has been
288 # preset to something big, not useful here.
289
290 and have threading work correctly (the result is $c == [7 8 9]).
291
292 The Pars section: the signature of a PP function
293
294 Seeing the above example code you will most probably ask: what is this
295 strange "$c=null" syntax in the second call to our new "add" function?
296 If you take another look at the definition of "add" you will notice
297 that the third argument "c" is flagged with the qualifier "[o]" which
298 tells PDL::PP that this is an output argument. So the above call to add
299 means 'create a new $c from scratch with correct dimensions' - "null"
300 is a special token for 'empty piddle' (you might ask why we haven't
301 used the value "undef" to flag this instead of the PDL specific "null";
302 we are currently thinking about it ;).
303
304 [This should be explained in some other section of the manual as
305 well!!] The reason for having this syntax as an alternative is that if
306 you have really huge piddles, you can do
307
308 $c = PDL->null;
309 for(some long loop) {
310 # munge a,b
311 add($a,$b,$c);
312 # munge c, put something back to a,b
313 }
314
315 and avoid allocating and deallocating $c each time. It is allocated
316 once at the first add() and thereafter the memory stays until $c is
317 destroyed.
318
319 If you just say
320
321 $c = add($a,$b);
322
323 the code generated by PP will automatically fill in "$c=null" and
324 return the result. If you want to learn more about the reasons why
325 PDL::PP supports this style where output arguments are given as last
326 arguments check the PDL::Indexing manpage.
327
328 "[o]" is not the only qualifier a pdl argument can have in the signa‐
329 ture. Another important qualifier is the "[t]" option which flags a
330 pdl as temporary. What does that mean? You tell PDL::PP that this pdl
331 is only used for temporary results in the course of the calculation and
332 you are not interested in its value after the computation has been com‐
333 pleted. But why should PDL::PP want to know about this in the first
334 place? The reason is closely related to the concepts of pdl auto cre‐
335 ation (you heard about that above) and implicit threading. If you use
336 implicit threading the dimensionality of automatically created pdls is
337 actually larger than that specified in the signature. With "[o]"
338 flagged pdls will be created so that they have the additional dimen‐
339 sions as required by the number of implicit thread dimensions. When
340 creating a temporary pdl, however, it will always only be made big
341 enough so that it can hold the result for one iteration in a thread‐
342 loop, i.e. as large as required by the signature. So less memory is
343 wasted when you flag a pdl as temporary. Secondly, you can use output
344 auto creation with temporary pdls even when you are using explicit
345 threading which is forbidden for normal output pdls flagged with "[o]"
346 (see PDL::Indexing).
347
348 Here is an example where we use the [t] qualifier. We define the func‐
349 tion "callf" that calls a C routine "f" which needs a temporary array
350 of the same size and type as the array "a" (sorry about the forward
351 reference for $P; it's a pointer access, see below) :
352
353 pp_def('callf',
354 Pars => 'a(n); [t] tmp(n); [o] b()',
355 Code => 'int ns = $SIZE(n);
356 f($P(a),$P(b),$P(tmp),ns);
357 '
358 );
359
360 Argument dimensions and the signature
361
362 Now we have just talked about dimensions of pdls and the signature. How
363 are they related? Let's say that we want to add a scalar + the index
364 number to a vector:
365
366 pp_def('add2',
367 Pars => 'a(n); b(); [o]c(n);',
368 Code => 'loop(n) %{
369 $c() = $a() + $b() + n;
370 %}'
371 );
372
373 There are several points to notice here: first, the "Pars" argument now
374 contains the n arguments to show that we have a single dimensions in a
375 and c. It is important to note that dimensions are actual entities that
376 are accessed by name so this declares a and c to have the same first
377 dimensions. In most PP definitions the size of named dimensions will be
378 set from the respective dimensions of non-output pdls (those with no
379 "[o]" flag) but sometimes you might want to set the size of a named
380 dimension explicitly through an integer parameter. See below in the
381 description of the "OtherPars" section how that works.
382
383 Type conversions and the signature
384
385 The signature also determines the type conversions that will be per‐
386 formed when a PP function is invoked. So what happens when we invoke
387 one of our previously defined functions with pdls of different type,
388 e.g.
389
390 add2($a,$b,($ret=null));
391
392 where $a is of type "PDL_Float" and $b of type "PDL_Short"? With the
393 signature as shown in the definition of "add2" above the datatype of
394 the operation (as determined at runtime) is that of the pdl with the
395 'highest' type (sequence is byte < short < ushort < long < float < dou‐
396 ble). In the add2 example the datatype of the operation is float ($a
397 has that datatype). All pdl arguments are then type converted to that
398 datatype (they are not converted inplace but a copy with the right type
399 is created if a pdl argument doesn't have the type of the operation).
400 Null pdls don't contribute a type in the determination of the type of
401 the operation. However, they will be created with the datatype of the
402 operation; here, for example, $ret will be of type float. You should be
403 aware of these rules when calling PP functions with pdls of different
404 types to take the additional storage and runtime requirements into
405 account.
406
407 These type conversions are correct for most functions you normally
408 define with "pp_def". However, there are certain cases where slightly
409 modified type conversion behaviour is desired. For these cases addi‐
410 tional qualifiers in the signature can be used to specify the desired
411 properties with regard to type conversion. These qualifiers can be com‐
412 bined with those we have encountered already (the creation qualifiers
413 "[o]" and "[t]"). Let's go through the list of qualifiers that change
414 type conversion behaviour.
415
416 The most important is the "int" qualifier which comes in handy when a
417 pdl argument represents indices into another pdl. Let's take a look at
418 an example from "PDL::Ufunc":
419
420 pp_def('maximum_ind',
421 Pars => 'a(n); int [o] b()',
422 Code => '$GENERIC() cur;
423 int curind;
424 loop(n) %{
425 if (!n ⎪⎪ $a() > cur) {cur = $a(); curind = n;}
426 %}
427 $b() = curind;',
428 );
429
430 The function "maximum_ind" finds the index of the largest element of a
431 vector. If you look at the signature you notice that the output argu‐
432 ment "b" has been declared with the additional "int" qualifier. This
433 has the following consequences for type conversions: regardless of the
434 type of the input pdl "a" the output pdl "b" will be of type "PDL_Long"
435 which makes sense since "b" will represent an index into "a". Further‐
436 more, if you call the function with an existing output pdl "b" its type
437 will not influence the datatype of the operation (see above). Hence,
438 even if "a" is of a smaller type than "b" it will not be converted to
439 match the type of "b" but stays untouched, which saves memory and CPU
440 cycles and is the right thing to do when "b" represents indices. Also
441 note that you can use the 'int' qualifier together with other quali‐
442 fiers (the "[o]" and "[t]" qualifiers). Order is significant -- type
443 qualifiers precede creation qualifiers ("[o]" and "[t]").
444
445 The above example also demonstrates typical usage of the "$GENERIC()"
446 macro. It expands to the current type in a so called generic loop.
447 What is a generic loop? As you already heard a PP function has a run‐
448 time datatype as determined by the type of the pdl arguments it has
449 been invoked with. The PP generated XS code for this function there‐
450 fore contains a switch like "switch (type) {case PDL_Byte: ... case
451 PDL_Double: ...}" that selects a case based on the runtime datatype of
452 the function (it's called a type ``loop'' because there is a loop in PP
453 code that generates the cases). In any case your code is inserted once
454 for each PDL type into this switch statement. The "$GENERIC()" macro
455 just expands to the respective type in each copy of your parsed code in
456 this "switch" statement, e.g., in the "case PDL_Byte" section "cur"
457 will expand to "PDL_Byte" and so on for the other case statements. I
458 guess you realise that this is a useful macro to hold values of pdls in
459 some code.
460
461 There are a couple of other qualifiers with similar effects as "int".
462 For your convenience there are the "float" and "double" qualifiers with
463 analogous consequences on type conversions as "int". Let's assume you
464 have a very large array for which you want to compute row and column
465 sums with an equivalent of the "sumover" function. However, with the
466 normal definition of "sumover" you might run into problems when your
467 data is, e.g. of type short. A call like
468
469 sumover($large_pdl,($sums = null));
470
471 will result in $sums be of type short and is therefore prone to over‐
472 flow errors if $large_pdl is a very large array. On the other hand
473 calling
474
475 @dims = $large_pdl->dims; shift @dims;
476 sumover($large_pdl,($sums = zeroes(double,@dims)));
477
478 is not a good alternative either. Now we don't have overflow problems
479 with $sums but at the expense of a type conversion of $large_pdl to
480 double, something bad if this is really a large pdl. That's where "dou‐
481 ble" comes in handy:
482
483 pp_def('sumoverd',
484 Pars => 'a(n); double [o] b()',
485 Code => 'double tmp=0;
486 loop(n) %{ tmp += a(); %}
487 $b() = tmp;',
488 );
489
490 This gets us around the type conversion and overflow problems. Again,
491 analogous to the "int" qualifier "double" results in "b" always being
492 of type double regardless of the type of "a" without leading to a type‐
493 conversion of "a" as a side effect.
494
495 Finally, there are the "type+" qualifiers where type is one of "int" or
496 "float". What shall that mean. Let's illustrate the "int+" qualifier
497 with the actual definition of sumover:
498
499 pp_def('sumover',
500 Pars => 'a(n); int+ [o] b()',
501 Code => '$GENERIC(b) tmp=0;
502 loop(n) %{ tmp += a(); %}
503 $b() = tmp;',
504 );
505
506 As we had already seen for the "int", "float" and "double" qualifiers,
507 a pdl marked with a "type+" qualifier does not influence the datatype
508 of the pdl operation. Its meaning is "make this pdl at least of type
509 "type" or higher, as required by the type of the operation". In the
510 sumover example this means that when you call the function with an "a"
511 of type PDL_Short the output pdl will be of type PDL_Long (just as
512 would have been the case with the "int" qualifier). This again tries to
513 avoid overflow problems when using small datatypes (e.g. byte images).
514 However, when the datatype of the operation is higher than the type
515 specified in the "type+" qualifier "b" will be created with the
516 datatype of the operation, e.g. when "a" is of type double then "b"
517 will be double as well. We hope you agree that this is sensible behav‐
518 iour for "sumover". It should be obvious how the "float+" qualifier
519 works by analogy. It may become necessary to be able to specify a set
520 of alternative types for the parameters. However, this will probably
521 not be implemented until someone comes up with a reasonable use for it.
522
523 Note that we now had to specify the $GENERIC macro with the name of the
524 pdl to derive the type from that argument. Why is that? If you care‐
525 fully followed our explanations you will have realised that in some
526 cases "b" will have a different type than the type of the operation.
527 Calling the '$GENERIC' macro with "b" as argument makes sure that the
528 type will always the same as that of "b" in that part of the generic
529 loop.
530
531 This is about all there is to say about the "Pars" section in a
532 "pp_def" call. You should remember that this section defines the signa‐
533 ture of a PP defined function, you can use several options to qualify
534 certain arguments as output and temporary args and all dimensions that
535 you can later refer to in the "Code" section are defined by name.
536
537 It is important that you understand the meaning of the signature since
538 in the latest PDL versions you can use it to define threaded functions
539 from within perl, i.e. what we call perl level threading. Please check
540 PDL::Indexing for details.
541
542 The Code section
543
544 The "Code" section contains the actual XS code that will be in the
545 innermost part of a threadloop (if you don't know what a thread loop is
546 then you still haven't read PDL::Indexing; do it now ;) after any PP
547 macros (like $GENERIC) and PP functions have been expanded (like the
548 "loop" function we are going to explain next).
549
550 Let's quickly reiterate the "sumover" example:
551
552 pp_def('sumover',
553 Pars => 'a(n); int+ [o] b()',
554 Code => '$GENERIC(b) tmp=0;
555 loop(n) %{ tmp += a(); %}
556 $b() = tmp;',
557 );
558
559 The "loop" construct in the "Code" section also refers to the dimension
560 name so you don't need to specify any limits: the loop is correctly
561 sized and everything is done for you, again.
562
563 Next, there is the surprising fact that "$a()" and "$b()" do not con‐
564 tain the index. This is not necessary because we're looping over n and
565 both variables know which dimensions they have so they automatically
566 know they're being looped over.
567
568 This feature comes in very handy in many places and makes for much
569 shorter code. Of course, there are times when you want to circumvent
570 this; here is a function which symmetrizes a matrix and serves as an
571 example of how to code explicit looping:
572
573 pp_def('symm',
574 Pars => 'a(n,n); [o]c(n,n);',
575 Code => 'loop(n) %{
576 int n2;
577 for(n2=n; n2<$SIZE(n); n2++) {
578 $c(n0 => n, n1 => n2) =
579 $c(n0 => n2, n1 => n) =
580 $a(n0 => n, n1 => n2);
581 }
582 %}
583 '
584 );
585
586 Let's dissect what is happening. Firstly, what is this function sup‐
587 posed to do? From its signature you see that it takes a 2D matrix with
588 equal numbers of columns and rows and outputs a matrix of the same
589 size. From a given input matrix $a it computes a symmetric output
590 matrix $c (symmetric in the matrix sense that A^T = A where ^T means
591 matrix transpose, or in PDL parlance $c == $c->xchg(0,1)). It does this
592 by using only the values on and below the diagonal of $a. In the output
593 matrix $c all values on and below the diagonal are the same as those in
594 $a while those above the diagonal are a mirror image of those below the
595 diagonal (above and below are here interpreted in the way that PDL
596 prints 2D pdls). If this explanation still sounds a bit strange just go
597 ahead, make a little file into which you write this definition, build
598 the new PDL extension (see section on Makefiles for PP code) and try it
599 out with a couple of examples.
600
601 Having explained what the function is supposed to do there are a couple
602 of points worth noting from the syntactical point of view. First, we
603 get the size of the dimension named "n" again by using the $SIZE macro.
604 Second, there are suddenly these funny "n0" and "n1" index names in the
605 code though the signature defines only the dimension "n". Why this? The
606 reason becomes clear when you note that both the first and second
607 dimension of $a and $b are named "n" in the signature of "symm". This
608 tells PDL::PP that the first and second dimension of these arguments
609 should have the same size. Otherwise the generated function will raise
610 a runtime error. However, now in an access to $a and $c PDL::PP cannot
611 figure out which index "n" refers to any more just from the name of the
612 index. Therefore, the indices with equal dimension names get numbered
613 from left to right starting at 0, e.g. in the above example "n0" refers
614 to the first dimension of $a and $c, "n1" to the second and so on.
615
616 In all examples so far, we have only used the "Pars" and "Code" members
617 of the hash that was passed to "pp_def". There are certainly other keys
618 that are recognised by PDL::PP and we will hear about some of them in
619 the course of this document. Find a (non-exhaustive) list of keys in
620 Appendix A. A list of macros and PPfunctions (we have only encountered
621 some of those in the examples above yet) that are expanded in values of
622 the hash argument to "pp_def" is summarised in Appendix B.
623
624 At this point, it might be appropriate to mention that PDL::PP is not a
625 completely static, well designed set of routines (as Tuomas puts it:
626 "stop thinking of PP as a set of routines carved in stone") but rather
627 a collection of things that the PDL::PP author (Tuomas J. Lukka) con‐
628 sidered he would have to write often into his PDL extension routines.
629 PP tries to be expandable so that in the future, as new needs arise,
630 new common code can be abstracted back into it. If you want to learn
631 more on why you might want to change PDL::PP and how to do it check the
632 section on PDL::PP internals.
633
634 Handling bad values
635
636 If you do not have bad-value support compiled into PDL you can ignore
637 this section and the related keys: "BadCode", "HandleBad", ... (try
638 printing out the value of $PDL::Bad::Status - if it equals 0 then move
639 straight on).
640
641 There are several keys and macros used when writing code to handle bad
642 values. The first one is the "HandleBad" key:
643
644 HandleBad => 0
645 This flags a pp-routine as NOT handling bad values. If this routine
646 is sent piddles with their "badflag" set, then a warning message is
647 printed to STDOUT and the piddles are processed as if the value
648 used to represent bad values is a valid number. The "badflag" value
649 is not propogated to the output piddles.
650
651 An example of when this is used is for FFT routines, which gener‐
652 ally do not have a way of ignoring part of the data.
653
654 HandleBad => 1
655 This causes PDL::PP to write extra code that ensures the BadCode
656 section is used, and that the "$ISBAD()" macro (and its brethren)
657 work.
658
659 HandleBad is not given
660 If any of the input piddles have their "badflag" set, then the out‐
661 put piddles will have their "badflag" set, but any supplied BadCode
662 is ignored.
663
664 The value of "HandleBad" is used to define the contents of the "BadDoc"
665 key, if it is not given.
666
667 To handle bad values, code must be written somewhat differently; for
668 instance,
669
670 $c() = $a() + $b();
671
672 becomes something like
673
674 if ( $a() != BADVAL && $b() != BADVAL ) {
675 $c() = $a() + $b();
676 } else {
677 $c() = BADVAL;
678 }
679
680 However, we only want the second version if bad values are present in
681 the input piddles (and that bad-value support is wanted!) - otherwise
682 we actually want the original code. This is where the "BadCode" key
683 comes in; you use it to specify the code to execute if bad values may
684 be present, and PP uses both it and the "Code" section to create some‐
685 thing like:
686
687 if ( bad_values_are_present ) {
688 fancy_threadloop_stuff {
689 BadCode
690 }
691 } else {
692 fancy_threadloop_stuff {
693 Code
694 }
695 }
696
697 This approach means that there is virtually no overhead when bad values
698 are not present (ie the badflag routine returns 0).
699
700 The BadCode section can use the same macros and looping constructs as
701 the Code section. However, it wouldn't be much use without the follow‐
702 ing additional macros:
703
704 $ISBAD(var)
705 To check whether a piddle's value is bad, use the $ISBAD macro:
706
707 if ( $ISBAD(a()) ) { printf("a() is bad\n"); }
708
709 You can also access given elements of a piddle:
710
711 if ( $ISBAD(a(n=>l)) ) { printf("element %d of a() is bad\n", l); }
712
713 $ISGOOD(var)
714 This is the opposite of the $ISBAD macro.
715
716 $SETBAD(var)
717 For when you want to set an element of a piddle bad.
718
719 $ISBADVAR(c_var,pdl)
720 If you have cached the value of a piddle "$a()" into a c-variable
721 ("foo" say), then to check whether it is bad, use "$ISBAD‐
722 VAR(foo,a)".
723
724 $ISGOODVAR(c_var,pdl)
725 As above, but this time checking that the cached value isn't bad.
726
727 $SETBADVAR(c_var,pdl)
728 To copy the bad value for a piddle into a c variable, use "$SETBAD‐
729 VAR(foo,a)".
730
731 TODO: mention "$PPISBAD()" etc macros.
732
733 Using these macros, the above code could be specified as:
734
735 Code => '$c() = $a() + $b();',
736 BadCode => '
737 if ( $ISBAD(a()) ⎪⎪ $ISBAD(b()) ) {
738 $SETBAD(c());
739 } else {
740 $c() = $a() + $b();
741 }',
742
743 Since this is perl, TMTOWTDI, so you could also write:
744
745 BadCode => '
746 if ( $ISGOOD(a()) && $ISGOOD(b()) ) {
747 $c() = $a() + $b();
748 } else {
749 $SETBAD(c());
750 }',
751
752 If you want access to the value of the badflag for a given piddle, you
753 can use the "$PDLSTATExxxx()" macros:
754
755 $PDLSTATEISBAD(pdl)
756 $PDLSTATEISGOOD(pdl)
757 $PDLSTATESETBAD(pdl)
758 $PDLSTATESETGOOD(pdl)
759
760 TODO: mention the "FindBadStatusCode" and "CopyBadStatusCode" options
761 to "pp_def", as well as the "BadDoc" key.
762
763 Interfacing your own/library functions using PP
764
765 Now, consider the following: you have your own C function (that may in
766 fact be part of some library you want to interface to PDL) which takes
767 as arguments two pointers to vectors of double:
768
769 void myfunc(int n,double *v1,double *v2);
770
771 The correct way of defining the PDL function is
772
773 pp_def('myfunc',
774 Pars => 'a(n); [o]b(n);',
775 GenericTypes => [D],
776 Code => 'myfunc($SIZE(n),$P(a),$P(b));'
777 );
778
779 The "$P("par")" syntax returns a pointer to the first element and the
780 other elements are guaranteed to lie after that.
781
782 Notice that here it is possible to make many mistakes. First, $SIZE(n)
783 must be used instead of "n". Second, you shouldn't put any loops in
784 this code. Third, here we encounter a new hash key recognised by
785 PDL::PP : the "GenericTypes" declaration tells PDL::PP to ONLY GENERATE
786 THE TYPELOOP FOP THE LIST OF TYPES SPECIFIED. In this case "double".
787 This has two advantages. Firstly the size of the compiled code is
788 reduced vastly, secondly if non-double arguments are passed to
789 "myfunc()" PDL will automatically convert them to double before passing
790 to the external C routine and convert them back afterwards.
791
792 One can also use "Pars" to qualify the types of individual arguments.
793 Thus one could also write this as:
794
795 pp_def('myfunc',
796 Pars => 'double a(n); double [o]b(n);',
797 Code => 'myfunc($SIZE(n),$P(a),$P(b));'
798 );
799
800 The type specification in "Pars" exempts the argument from variation in
801 the typeloop - rather it is automatically converted too and from the
802 type specified. This is obviously useful in a more general example,
803 e.g.:
804
805 void myfunc(int n,float *v1,long *v2);
806
807 pp_def('myfunc',
808 Pars => 'float a(n); long [o]b(n);',
809 GenericTypes => [F],
810 Code => 'myfunc($SIZE(n),$P(a),$P(b));'
811 );
812
813 Note we still use "GenericTypes" to reduce the size of the type loop,
814 obviously PP could in principle spot this and do it automatically
815 though the code has yet to attain that level of sophistication!
816
817 Finally note when types are converted automatically one MUST use the
818 "[o]" qualifier for output variables or you hard one changes will get
819 optimised away by PP!
820
821 If you interface a large library you can automate the interfacing even
822 further. Perl can help you again(!) in doing this. In many libraries
823 you have certain calling conventions. This can be exploited. In short,
824 you can write a little parser (which is really not difficult in perl)
825 that then generates the calls to "pp_def" from parsed descriptions of
826 the functions in that library. For an example, please check the Slatec
827 interface in the "Lib" tree of the PDL distribution. If you want to
828 check (during debugging) which calls to PP functions your perl code
829 generated a little helper package comes in handy which replaces the PP
830 functions by identically named ones that dump their arguments to std‐
831 out.
832
833 Just say
834
835 perl -MPDL::PP::Dump myfile.pd
836
837 to see the calls to "pp_def" and friends. Try it with ops.pd and
838 slatec.pd. If you're interested (or want to enhance it), the source is
839 in Basic/Gen/PP/Dump.pm
840
841 Other macros and functions in the Code section
842
843 Macros: So far we have encountered the $SIZE, $GENERIC and $P macros.
844 Now we are going to quickly explain the other macros that are expanded
845 in the "Code" section of PDL::PP along with examples of their usage.
846
847 $T The $T macro is used for type switches. This is very useful when you
848 have to use different external (e.g. library) functions depending on
849 the input type of arguments. The general syntax is
850
851 $Ttypeletters(type_alternatives)
852
853 where "typeletters" is a permutation of a subset of the letters
854 "BSULFD" which stand for Byte, Short, Ushort, etc. and "type_alter‐
855 natives" are the expansions when the type of the PP operation is
856 equal to that indicated by the respective letter. Let's illustrate
857 this incomprehensible description by an example. Assuming you have
858 two C functions with prototypes
859
860 void float_func(float *in, float *out);
861 void double_func(double *in, double *out);
862
863 which do basically the same thing but one accepts float and the
864 other double pointers. You could interface them to PDL by defining a
865 generic function "foofunc" (which will call the correct function
866 depending on the type of the transformation):
867
868 pp_def('foofunc',
869 Pars => ' a(n); [o] b();',
870 Code => ' $TFD(float_func,double_func) ($P(a),$P(b));'
871 GenericTypes => [F,D],
872 );
873
874 Please note that you can't say
875
876 Code => ' $TFD(float,double)_func ($P(a),$P(b));'
877
878 since the $T macro expands with trailing spaces, analogously to C
879 preprocessor macros. The slightly longer form illustrated above is
880 correct. If you really want brevity, you can of course do
881
882 '$TBSULFD('.(join ',',map {"long_identifier_name_$_"}
883 qw/byt short unseigned lounge flotte dubble/).');'
884
885 $PP
886 The $PP macro is used for a so called physical pointer access. The
887 physical refers to some internal optimisations of PDL (for those who
888 are familiar with the PDL core we are talking about the vaffine
889 optimisations). This macro is mainly for internal use and you
890 shouldn't need to use it in any of your normal code.
891
892 $COMP (and the "OtherPars" section)
893 The $COMP macro is used to access non-pdl values in the code sec‐
894 tion. Its name is derived from the implementation of transformations
895 in PDL. The variables you can refer to using $COMP are members of
896 the ``compiled'' structure that represents the PDL transformation in
897 question but does not yet contain any information about dimensions
898 (for further details check PDL::Internals). However, you can treat
899 $COMP just as a black box without knowing anything about the imple‐
900 mentation of transformations in PDL. So when would you use this
901 macro? Its main usage is to access values of arguments that are
902 declared in the "OtherPars" section of a "pp_def" definition. But
903 then you haven't heard about the "OtherPars" key yet?! Let's have
904 another example that illustrates typical usage of both new features:
905
906 pp_def('pnmout',
907 Pars => 'a(m)',
908 OtherPars => "char* fd",
909 GenericTypes => [B,U,S,L],
910 Code => 'PerlIO *fp;
911 IO *io;
912
913 io = GvIO(gv_fetchpv($COMP(fd),FALSE,SVt_PVIO));
914 if (!io ⎪⎪ !(fp = IoIFP(io)))
915 croak("Can\'t figure out FP");
916
917 if (PerlIO_write(fp,$P(a),len) != len)
918 croak("Error writing pnm file");
919 ');
920
921 This function is used to write data from a pdl to a file. The file
922 descriptor is passed as a string into this function. This parameter
923 does not go into the "Pars" section since it cannot be usefully
924 treated like a pdl but rather into the aptly named "OtherPars" sec‐
925 tion. Parameters in the "OtherPars" section follow those in the
926 "Pars" section when invoking the function, i.e.
927
928 open FILE,">out.dat" or die "couldn't open out.dat";
929 pnmout($pdl,'FILE');
930
931 When you want to access this parameter inside the code section you
932 have to tell PP by using the $COMP macro, i.e. you write "$COMP(fd)"
933 as in the example. Otherwise PP wouldn't know that the "fd" you are
934 referring to is the same as that specified in the "OtherPars" sec‐
935 tion.
936
937 Another use for the "OtherPars" section is to set a named dimension
938 in the signature. Let's have an example how that is done:
939
940 pp_def('setdim',
941 Pars => '[o] a(n)',
942 OtherPars => 'int ns => n',
943 Code => 'loop(n) %{ $a() = n; %}',
944 );
945
946 This says that the named dimension "n" will be initialised from the
947 value of the other parameter "ns" which is of integer type (I guess
948 you have realised that we use the "CType From => named_dim" syntax).
949 Now you can call this function in the usual way:
950
951 setdim(($a=null),5);
952 print $a;
953 [ 0 1 2 3 4 ]
954
955 Admittedly this function is not very useful but it demonstrates how
956 it works. If you call the function with an existing pdl and you
957 don't need to explicitly specify the size of "n" since PDL::PP can
958 figure it out from the dimensions of the non-null pdl. In that case
959 you just give the dimension parameter as "-1":
960
961 $a = hist($b);
962 setdim($a,-1);
963
964 That should do it.
965
966 The only PP function that we have used in the examples so far is
967 "loop". Additionally, there are currently two other functions which
968 are recognised in the "Code" section:
969
970 threadloop
971 As we heard above the signature of a PP defined function defines the
972 dimensions of all the pdl arguments involved in a primitive opera‐
973 tion. However, you often call the functions that you defined with PP
974 with pdls that have more dimensions than those specified in the sig‐
975 nature. In this case the primitive operation is performed on all sub‐
976 slices of appropriate dimensionality in what is called a threadloop
977 (see also overview above and PDL::Indexing). Assuming you have some
978 notion of this concept you will probably appreciate that the opera‐
979 tion specified in the code section should be optimised since this is
980 the tightest loop inside a threadloop. However, if you revisit the
981 example where we define the "pnmout" function, you will quickly
982 realise that looking up the "IO" file descriptor in the inner thread‐
983 loop is not very efficient when writing a pdl with many rows. A bet‐
984 ter approach would be to look up the "IO" descriptor once outside the
985 threadloop and use its value then inside the tightest threadloop.
986 This is exactly where the "threadloop" function comes in handy. Here
987 is an improved definition of "pnmout" which uses this function:
988
989 pp_def('pnmout',
990 Pars => 'a(m)',
991 OtherPars => "char* fd",
992 GenericTypes => [B,U,S,L],
993 Code => 'PerlIO *fp;
994 IO *io;
995 int len;
996
997 io = GvIO(gv_fetchpv($COMP(fd),FALSE,SVt_PVIO));
998 if (!io ⎪⎪ !(fp = IoIFP(io)))
999 croak("Can\'t figure out FP");
1000
1001 len = $SIZE(m) * sizeof($GENERIC());
1002
1003 threadloop %{
1004 if (PerlIO_write(fp,$P(a),len) != len)
1005 croak("Error writing pnm file");
1006 %}
1007 ');
1008
1009 This works as follows. Normally the C code you write inside the
1010 "Code" section is placed inside a threadloop (i.e., PP generates the
1011 appropriate wrapping XS code around it). However, when you explicitly
1012 use the "threadloop" function, PDL::PP recognises this and doesn't
1013 wrap your code with an additional threadloop. This has the effect
1014 that code you write outside the threadloop is only executed once per
1015 transformation and just the code with in the surrounding "%{ ... %}"
1016 pair is placed within the tightest threadloop. This also comes in
1017 handy when you want to perform a decision (or any other code, espe‐
1018 cially CPU intensive code) only once per thread, i.e.
1019
1020 pp_addhdr('
1021 #define RAW 0
1022 #define ASCII 1
1023 ');
1024 pp_def('do_raworascii',
1025 Pars => 'a(); b(); [o]c()',
1026 OtherPars => 'int mode',
1027 Code => ' switch ($COMP(mode)) {
1028 case RAW:
1029 threadloop %{
1030 /* do raw stuff */
1031 %}
1032 break;
1033 case ASCII:
1034 threadloop %{
1035 /* do ASCII stuff */
1036 %}
1037 break;
1038 default:
1039 croak("unknown mode");
1040 }'
1041 );
1042
1043 types
1044 The types function works similar to the $T macro. However, with the
1045 "types" function the code in the following block (delimited by "%{"
1046 and "%}" as usual) is executed for all those cases in which the
1047 datatype of the operation is any of the types represented by the let‐
1048 ters in the argument to "type", e.g.
1049
1050 Code => '...
1051
1052 types(BSUL) %{
1053 /* do integer type operation */
1054 %}
1055 types(FD) %{
1056 /* do floating point operation */
1057 %}
1058 ...'
1059
1060 Typemap handling in the "OtherPars" section
1061
1062 The "OtherPars" section discussed above is very often absolutely cru‐
1063 cial when you interface external libraries with PDL. However in many
1064 cases the external libraries either use derived types or pointers of
1065 various types.
1066
1067 The standard way to handle this in Perl is to use a "typemap" file.
1068 This is discussed in some detail in perlxs in the standard Perl docu‐
1069 mentation. In PP the functionality is very similar, so you can create a
1070 "typemap" file in the directory where your PP file resides and when it
1071 is built it is automatically read in to figure out the appropriate
1072 translation between the C type and Perl's built-in type.
1073
1074 That said, there are a couple of important differences from the general
1075 handling of types in XS. The first, and probably most important, is
1076 that at the moment pointers to types are not allowed in the "OtherPars"
1077 section. To get around this limitation you must use the "IV" type
1078 (thanks to Judd Taylor for pointing out that this is necessary for
1079 portability).
1080
1081 It is probably best to illustrate this with a couple of code-snippets:
1082
1083 For instance the "gsl_spline_init" function has the following C decla‐
1084 ration:
1085
1086 int gsl_spline_init(gsl_spline * spline,
1087 const double xa[], const double ya[], size_t size);
1088
1089 Clearly the "xa" and "ya" arrays are candidates for being passed in as
1090 piddles and the "size" argument is just the length of these piddles so
1091 that can be handled by the "$SIZE()" macro in PP. The problem is the
1092 pointer to the "gsl_spline" type. The natural solution would be to
1093 write an "OtherPars" declaration of the form
1094
1095 OtherPars => 'gsl_spline *spl'
1096
1097 and write a short "typemap" file which handled this type. This does not
1098 work at present however! So what you have to do is to go around the
1099 problem slightly (and in some ways this is easier too!):
1100
1101 The solution is to declare "spline" in the "OtherPars" section using an
1102 "Integer Value", "IV". This hides the nature of the variable from PP
1103 and you then need to (well to avoid compiler warnings at least!) per‐
1104 form a type cast when you use the variable in your code. Thus "Other‐
1105 Pars" should take the form:
1106
1107 OtherPars => 'IV spl'
1108
1109 and when you use it in the code you will write
1110
1111 (gsl_spline *) $COMP(spl)
1112
1113 so putting this together as Andres Jordan has done (with the modifica‐
1114 tion using "IV" by Judd Taylor) in the "gsl_interp.pd" in the distribu‐
1115 tion source you get:
1116
1117 pp_def('init_meat',
1118 Pars => 'double x(n); double y(n);',
1119 OtherPars => 'IV spl',
1120 Code =>'
1121 gsl_spline_init( (gsl_spline *) $COMP(spl), $P(x),$P(y),$SIZE(n));'
1122 );
1123
1124 where I have removed a macro wrapper call, but that would obscure the
1125 discussion.
1126
1127 The other minor difference as compared to the standard typemap handling
1128 in Perl, is that the user cannot specify non-standard typemap locations
1129 or typemap filenames using the "TYPEMAPS" option in MakeMaker... Thus
1130 you can only use a file called "typemap" and/or the "IV" trick above.
1131
1132 Other useful PP keys in data operation definitions
1133
1134 You have already heard about the "OtherPars" key. Currently, there are
1135 not many other keys for a data operation that will be useful in normal
1136 (whatever that is) PP programming. In fact, it would be interesting to
1137 hear about a case where you think you need more than what is provided
1138 at the moment. Please speak up on one of the PDL mailing lists. Most
1139 other keys recognised by "pp_def" are only really useful for what we
1140 call slice operations (see also above).
1141
1142 One thing that is strongly being planned is variable number of argu‐
1143 ments, which will be a little tricky.
1144
1145 An incomplete list of the available keys:
1146
1147 Inplace
1148 Setting this key marks the routine as working inplace - ie the
1149 input and output piddles are the same. An example is
1150 "$a->inplace->sqrt()" (or "sqrt(inplace($a))").
1151
1152 Inplace => 1
1153 Use when the routine is a unary function, such as "sqrt".
1154
1155 Inplace => ['a']
1156 If there are more than one input piddles, specify the name of
1157 the one that can be changed inplace using an array reference.
1158
1159 Inplace => ['a','b']
1160 If there are more than one output piddle, specify the name of
1161 the input piddle and output piddle in a 2-element array refer‐
1162 ence. This probably isn't needed, but left in for completeness.
1163
1164 If bad values are being used, care must be taken to ensure the pro‐
1165 pogation of the badflag when inplace is being used; consider this
1166 excerpt from Basic/Bad/bad.pd:
1167
1168 pp_def('replacebad',HandleBad => 1,
1169 Pars => 'a(); [o]b();',
1170 OtherPars => 'double newval',
1171 Inplace => 1,
1172 CopyBadStatusCode =>
1173 '/* propogate badflag if inplace AND it has changed */
1174 if ( a == b && $ISPDLSTATEBAD(a) )
1175 PDL->propogate_badflag( b, 0 );
1176
1177 /* always make sure the output is "good" */
1178 $SETPDLSTATEGOOD(b);
1179 ',
1180 ...
1181
1182 Since this routine removes all bad values, then the output piddle
1183 had its bad flag cleared. If run inplace (so "a == b"), then we
1184 have to tell all the children of "a" that the bad flag has been
1185 cleared (to save time we make sure that we call "PDL->pro‐
1186 pogate_badgflag" only if the input piddle had its bad flag set).
1187
1188 NOTE: one idea is that the documentation for the routine could be
1189 automatically flagged to indicate that it can be executed inplace,
1190 ie something similar to how "HandleBad" sets "BadDoc" if it's not
1191 supplied (it's not an ideal solution).
1192
1193 Other PDL::PP functions to support concise package definition
1194
1195 So far, we have described the "pp_def" and "pp_done" functions. PDL::PP
1196 exports a few other functions to aid you in writing concise PDL exten‐
1197 sion package definitions.
1198
1199 Often when you interface library functions as in the above example you
1200 have to include additional C include files. Since the XS file is gener‐
1201 ated by PP we need some means to make PP insert the appropriate include
1202 directives in the right place into the generated XS file. To this end
1203 there is the "pp_addhdr" function. This is also the function to use
1204 when you want to define some C functions for internal use by some of
1205 the XS functions (which are mostly functions defined by "pp_def"). By
1206 including these functions here you make sure that PDL::PP inserts your
1207 code before the point where the actual XS module section begins and
1208 will therefore be left untouched by xsubpp (cf. perlxs and perlxstut
1209 manpages).
1210
1211 A typical call would be
1212
1213 pp_addhdr('
1214 #include <unistd.h> /* we need defs of XXXX */
1215 #include "libprotos.h" /* prototypes of library functions */
1216 #include "mylocaldecs.h" /* Local decs */
1217
1218 static void do_the real_work(PDL_Byte * in, PDL_Byte * out, int n)
1219 {
1220 /* do some calculations with the data */
1221 }
1222 ');
1223
1224 This ensures that all the constants and prototypes you need will be
1225 properly included and that you can use the internal functions defined
1226 here in the "pp_def"s, e.g.:
1227
1228 pp_def('barfoo',
1229 Pars => ' a(n); [o] b(n)',
1230 GenericTypes => '[B]',
1231 Code => ' int ns = $SIZE(n);
1232 do_the_real_work($P(a),$P(b),ns);
1233 ',
1234 );
1235
1236 In many cases the actual PP code (meaning the arguments to "pp_def"
1237 calls) is only part of the package you are currently implementing.
1238 Often there is additional perl code and XS code you would normally have
1239 written into the pm and XS files which are now automatically generated
1240 by PP. So how to get this stuff into those dynamically generated files?
1241 Fortunately, there are a couple of functions, generally called
1242 "pp_addXXX" that assist you in doing this.
1243
1244 Let's assume you have additional perl code that should go into the gen‐
1245 erated pm-file. This is easily achieved with the "pp_addpm" command:
1246
1247 pp_addpm(<<'EOD');
1248
1249 =head1 NAME
1250
1251 PDL::Lib::Mylib -- a PDL interface to the Mylib library
1252
1253 =head1 DESCRIPTION
1254
1255 This package implements an interface to the Mylib package with full
1256 threading and indexing support (see L<PDL::Indexing>).
1257
1258 =cut
1259
1260 use PGPLOT;
1261
1262 =head2 use_myfunc
1263 this function applies the myfunc operation to all the
1264 elements of the input pdl regardless of dimensions
1265 and returns the sum of the result
1266 =cut
1267
1268 sub use_myfunc {
1269 my $pdl = shift;
1270
1271 myfunc($pdl->clump(-1),($res=null));
1272
1273 return $res->sum;
1274 }
1275
1276 EOD
1277
1278 You have probably got the idea. In some cases you also want to export
1279 your additional functions. To avoid getting into trouble with PP which
1280 also messes around with the @EXPORT array you just tell PP to add your
1281 functions to the list of exported functions:
1282
1283 pp_add_exported('', 'use_myfunc gethynx');
1284
1285 Note the initial empty string argument (reason for it?).
1286
1287 The "pp_add_isa" command works like the the "pp_add_exported" function.
1288 The arguments to "pp_add_isa" are added the @ISA list, e.g.
1289
1290 pp_add_isa(' Some::Other::Class ');
1291
1292 Sometimes you want to add extra XS code of your own (that is generally
1293 not involved with any threading/indexing issues but supplies some other
1294 functionality you want to access from the perl side) to the generated
1295 XS file, for example
1296
1297 pp_addxs('','
1298
1299 # Determine endianness of machine
1300
1301 int
1302 isbigendian()
1303 CODE:
1304 unsigned short i;
1305 PDL_Byte *b;
1306
1307 i = 42; b = (PDL_Byte*) (void*) &i;
1308
1309 if (*b == 42)
1310 RETVAL = 0;
1311 else if (*(b+1) == 42)
1312 RETVAL = 1;
1313 else
1314 croak("Impossible - machine is neither big nor little endian!!\n");
1315 OUTPUT:
1316 RETVAL
1317 ');
1318
1319 Especially "pp_add_exported" and "pp_addxs" should be used with care.
1320 PP uses PDL::Exporter, hence letting PP export your function means that
1321 they get added to the standard list of function exported by default
1322 (the list defined by the export tag ``:Func''). If you use "pp_addxs"
1323 you shouldn't try to do anything that involves threading or indexing
1324 directly. PP is much better at generating the appropriate code from
1325 your definitions.
1326
1327 Finally, you may want to add some code to the BOOT section of the XS
1328 file (if you don't know what that is check perlxs). This is easily done
1329 with the "pp_add_boot" command:
1330
1331 pp_add_boot(<<EOB);
1332 descrip = mylib_initialize(KEEP_OPEN);
1333
1334 if (descrip == NULL)
1335 croak("Can't initialize library");
1336
1337 GlobalStruc->descrip = descrip;
1338 GlobalStruc->maxfiles = 200;
1339 EOB
1340
1341 By default, PP.pm puts all subs defined using the pp_def function into
1342 the output .pm file's EXPORT list. This can create problems if you are
1343 creating a subclassed object where you don't want any methods exported.
1344 (i.e. the methods will only be called using the $object->method syn‐
1345 tax).
1346
1347 For these cases you can call pp_export_nothing() to clear out the
1348 export list. Example (At the end of the .pd file):
1349
1350 pp_export_nothing();
1351 pp_done();
1352
1353 By default, PP.pm puts the 'use Core;' line into the output .pm file.
1354 This imports Core's exported names into the current namespace, which
1355 can create problems if you are over-riding one of Core's methods in the
1356 current file. You end up getting messages like "Warning: sub sumover
1357 redefined in file subclass.pm" when running the program.
1358
1359 For these cases the pp_core_importList can be used to change what is
1360 imported from Core.pm. For example:
1361
1362 pp_core_importList('()')
1363
1364 This would result in
1365
1366 use Core();
1367
1368 being generated in the output .pm file. This would result in no names
1369 being imported from Core.pm. Similarly, calling
1370
1371 pp_core_importList(' qw/ barf /')
1372
1373 would result in
1374
1375 use Core qw/ barf/;
1376
1377 being generated in the output .pm file. This would result in just
1378 'barf' being imported from Core.pm.
1379
1381 The slice operation section of this manual is provided using dataflow
1382 and lazy evaluation: when you need it, ask Tjl to write it. a delivery
1383 in a week from when I receive the email is 95% probable and two week
1384 delivery is 99% probable.
1385
1386 And anyway, the slice operations require a much more intimate knowledge
1387 of PDL internals than the data operations. Furthermore, the complexity
1388 of the issues involved is considerably higher than that in the average
1389 data operation. If you would like to convince yourself of this fact
1390 take a look at the Basic/Slices/slices.pd file in the PDL distribution
1391 :-). Nevertheless, functions generated using the slice operations are
1392 at the heart of the index manipulation and dataflow capabilities of
1393 PDL.
1394
1395 Also, there are a lot of dirty issues with virtual piddles and vaffines
1396 which we shall entirely skip here.
1397
1398 Slices and bad values
1399
1400 Slice operations need to be able to handle bad values (if support is
1401 compiled into PDL). The easiest thing to do is look at
1402 Basic/Slices/slices.pd to see how this works.
1403
1404 Along with "BadCode", there are also the "BadBackCode" and "BadRedoDim‐
1405 sCode" keys for "pp_def". However, any "EquivCPOffsCode" should not
1406 need changing, since any changes are absorbed into the definition of
1407 the "$EQUIVCPOFFS()" macro (ie it is handled automatically by PDL::PP>.
1408
1409 A few notes on writing a slicing routine...
1410
1411 The following few paragraphs describe writing of a new slicing routine
1412 ('range'); any errors are CED's. (--CED 26-Aug-2002)
1413
1415 The PDL "Core" structure, defined in Basic/Core/pdlcore.h.PL, contains
1416 pointers to a number of routines that may be useful to you. The major‐
1417 ity of these routines deal with manipulating piddles, but some are more
1418 general:
1419
1420 PDL->qsort_B( PDL_Byte *xx, int a, int b )
1421 Sort the array "xx" between the indices "a" and "b". There are
1422 also versions for the other PDL datatypes, with postfix "_S", "_U",
1423 "_L", "_F", and "_D". Any module using this must ensure that
1424 "PDL::Ufunc" is loaded.
1425
1426 PDL->qsort_ind_B( PDL_Byte *xx, int *ix, int a, int b )
1427 As for "PDL->qsort_B", but this time sorting the indices rather
1428 than the data.
1429
1430 The routine "med2d" in Lib/Image2D/image2d.pd shows how such routines
1431 are used.
1432
1434 If you are going to generate a package from your PP file (typical file
1435 extensions are ".pd" or ".pp" for the files containing PP code) it is
1436 easiest and safest to leave generation of the appropriate commands to
1437 the Makefile. In the following we will outline the typical format of a
1438 perl Makefile to automatically build and install your package from a
1439 description in a PP file. Most of the rules to build the xs, pm and
1440 other required files from the PP file are already predefined in the
1441 PDL::Core::Dev package. We just have to tell MakeMaker to use it.
1442
1443 In most cases you can define your Makefile like
1444
1445 # Makefile.PL for a package defined by PP code.
1446
1447 use PDL::Core::Dev; # Pick up development utilities
1448 use ExtUtils::MakeMaker;
1449
1450 $package = ["mylib.pd",Mylib,PDL::Lib::Mylib];
1451 %hash = pdlpp_stdargs($package);
1452 $hash{OBJECT} .= ' additional_Ccode$(OBJ_EXT) ';
1453 $hash{clean}->{FILES} .= ' todelete_Ccode$(OBJ_EXT) ';
1454 $hash{'VERSION_FROM'} = 'mylib.pd';
1455 WriteMakefile(%hash);
1456
1457 sub MY::postamble { pdlpp_postamble($package); }
1458
1459 Here, the list in $package is: first: PP source file name, then the
1460 prefix for the produced files and finally the whole package name. You
1461 can modify the hash in whatever way you like but it would be reasonable
1462 to stay within some limits so that your package will continue to work
1463 with later versions of PDL.
1464
1465 If you don't want to use prepackaged arguments, here is a generic Make‐
1466 file.PL that you can adapt for your own needs:
1467
1468 # Makefile.PL for a package defined by PP code.
1469
1470 use PDL::Core::Dev; # Pick up development utilities
1471 use ExtUtils::MakeMaker;
1472
1473 WriteMakefile(
1474 'NAME' => 'PDL::Lib::Mylib',
1475 'VERSION_FROM' => 'mylib.pd',
1476 'TYPEMAPS' => [&PDL_TYPEMAP()],
1477 'OBJECT' => 'mylib$(OBJ_EXT) additional_Ccode$(OBJ_EXT)',
1478 'PM' => { 'Mylib.pm' => '$(INST_LIBDIR)/Mylib.pm'},
1479 'INC' => &PDL_INCLUDE(), # add include dirs as required by your lib
1480 'LIBS' => [''], # add link directives as necessary
1481 'clean' => {'FILES' =>
1482 'Mylib.pm Mylib.xs Mylib$(OBJ_EXT)
1483 additional_Ccode$(OBJ_EXT)'},
1484 );
1485
1486 # Add genpp rule; this will invoke PDL::PP on our PP file
1487 # the argument is an array reference where the array has three string elements:
1488 # arg1: name of the source file that contains the PP code
1489 # arg2: basename of the xs and pm files to be generated
1490 # arg3: name of the package that is to be generated
1491 sub MY::postamble { pdlpp_postamble(["mylib.pd",Mylib,PDL::Lib::Mylib]); }
1492
1493 To make life even easier PDL::Core::Dev defines the function
1494 "pdlpp_stdargs" that returns a hash with default values that can be
1495 passed (either directly or after appropriate modification) to a call to
1496 WriteMakefile. Currently, "pdlpp_stdargs" returns a hash where the
1497 keys are filled in as follows:
1498
1499 (
1500 'NAME' => $mod,
1501 'TYPEMAPS' => [&PDL_TYPEMAP()],
1502 'OBJECT' => "$pref\$(OBJ_EXT)",
1503 PM => {"$pref.pm" => "\$(INST_LIBDIR)/$pref.pm"},
1504 MAN3PODS => {"$src" => "\$(INST_MAN3DIR)/$mod.\$(MAN3EXT)"},
1505 'INC' => &PDL_INCLUDE(),
1506 'LIBS' => [''],
1507 'clean' => {'FILES' => "$pref.xs $pref.pm $pref\$(OBJ_EXT)"},
1508 )
1509
1510 Here, $src is the name of the source file with PP code, $pref the pre‐
1511 fix for the generated .pm and .xs files and $mod the name of the exn‐
1512 tension module to generate.
1513
1515 The internals of the current version consist of a large table which
1516 gives the rules according to which things are translated and the subs
1517 which implement these rules.
1518
1519 Later on, it would be good to make the table modifiable by the user so
1520 that different things may be tried.
1521
1522 [Meta comment: here will hopefully be more in the future; currently,
1523 your best bet will be to read the source code :-( or ask on the list
1524 (try the latter first) ]
1525
1527 Unless otherwise specified, the arguments are strings. Keys marked with
1528 (bad) are only used if bad-value support is compiled into PDL.
1529
1530 Pars
1531 define the signature of your function
1532
1533 OtherPars
1534 arguments which are not pdls. Default: nothing.
1535
1536 Code
1537 the actual code that implements the functionality; several PP
1538 macros and PP functions are recognised in the string value
1539
1540 HandleBad (bad)
1541 If set to 1, the routine is assumed to support bad values and the
1542 code in the BadCode key is used if bad values are present; it also
1543 sets things up so that the "$ISBAD()" etc macros can be used. If
1544 set to 0, cause the routine to print a warning if any of the input
1545 piddles have their bad flag set.
1546
1547 BadCode (bad)
1548 Give the code to be used if bad values may be present in the input
1549 piddles. Only used if "HandleBad => 1".
1550
1551 GenericTypes
1552 An array reference. The array may contain any subset of the strings
1553 `B', `S', `U', `L', `F' and `D', which specify which types your
1554 operation will accept. This is very useful (and important!) when
1555 interfacing an external library. Default: [qw/B S U L F D/]
1556
1557 Inplace
1558 Mark a function as being able to work inplace.
1559
1560 Inplace => 1 if Pars => 'a(); [o]b();'
1561 Inplace => ['a'] if Pars => 'a(); b(); [o]c();'
1562 Inplace => ['a','b'] if Pars => 'a(); b(); [o]c(); [o]d();'
1563
1564 If bad values are being used, care must be taken to ensure the pro‐
1565 pogation of the badflag when inplace is being used; for instance
1566 see the code for "replacebad" in Basic/Bad/bad.pd.
1567
1568 Doc Used to specify a documentation string in Pod format. See PDL::Doc
1569 for information on PDL documentation conventions. Note: in the spe‐
1570 cial case where the PP 'Doc' string is one line this is implicitly
1571 used for the quick reference AND the documentation!
1572
1573 If the Doc field is omitted PP will generate default documentation
1574 (after all it knows about the Signature).
1575
1576 If you really want the function NOT to be documented in any way at
1577 this point (e.g. for an internal routine, or because youu are doing
1578 it elsewhere in the code) explictly specify "Doc=>undef".
1579
1580 BadDoc (bad)
1581 Contains the text returned by the "badinfo" command (in "perldl")
1582 or the "-b" switch to the "pdldoc" shell script. In many cases, you
1583 will not need to specify this, since the information can be auto‐
1584 matically created by PDL::PP. However, as befits computer-generated
1585 text, it's rather stilted; it may be much better to do it yourself!
1586
1588 Macros
1589
1590 Macros labelled by (bad) are only used if bad-value support is compiled
1591 into PDL.
1592
1593 $variablename_from_sig()
1594 access a pdl (by its name) that was specified in the signature
1595
1596 $COMP(x)
1597 access a value in the private data structure of this transforma‐
1598 tion (mainly used to use an argument that is specified in the
1599 "OtherPar" section)
1600
1601 $SIZE(n)
1602 replaced at runtime by the actual size of a named dimension (as
1603 specified in the signature)
1604
1605 $GENERIC()
1606 replaced by the C type that is equal to the runtime type of the
1607 operation
1608
1609 $P(a) a pointer access to the PDL named "a" in the signature. Useful
1610 for interfacing to C functions
1611
1612 $PP(a) a physical pointer access to pdl "a"; mainly for internal use
1613
1614 $TXXX(Alternative,Alternative)
1615 expansion alternatives according to runtime type of operation,
1616 where XXX is some string that is matched by "/[BSULFD+]/".
1617
1618 $PDL(a)
1619 return a pointer to the pdl data structure (pdl *) of piddle "a"
1620
1621 $ISBAD(a()) (bad)
1622 returns true if the value stored in "a()" equals the bad value
1623 for this piddle. Requires "HandleBad" being set to 1.
1624
1625 $ISGOOD(a()) (bad)
1626 returns true if the value stored in "a()" does not equal the bad
1627 value for this piddle. Requires "HandleBad" being set to 1.
1628
1629 $SETBAD(a()) (bad)
1630 Sets "a()" to equal the bad value for this piddle. Requires
1631 "HandleBad" being set to 1.
1632
1633 functions
1634
1635 "loop(DIMS) %{ ... %}"
1636 loop over named dimensions; limits are generated automatically by PP
1637
1638 "threadloop %{ ... %}"
1639 enclose following code in a threadloop
1640
1641 "types(TYPES) %{ ... %}"
1642 execute following code if type of operation is any of "TYPES"
1643
1645 PDL
1646
1647 For the concepts of threading and slicing check PDL::Indexing.
1648
1649 PDL::Internals
1650
1651 PDL::BadValues for information on bad values
1652
1653 perlxs, perlxstut
1654
1656 RedoDimsCode, $RESIZE()
1657
1659 PDL::PP is still, even in its rewritten form, too complicated. It
1660 needs to be rethought a little as well as deconvoluted and modularized
1661 some more (e.g. all the NS things).
1662
1663 After the rewrite, this can happen a little by little, though.
1664
1665 Undocumented functions
1666
1667 The following functions have been added since this manual was written
1668 and are as yet undocumented
1669
1670 pp_export_nothing
1671 pp_core_importList
1672 pp_beginwrap
1673 pp_setversion
1674 pp_addbegin
1675
1677 Copyright(C) 1997 Tuomas J. Lukka (lukka@fas.harvard.edu), Karl Glaaze‐
1678 brook (kgb@aaocbn1.aao.GOV.AU) and Christian Soeller (c.soeller@auck‐
1679 land.ac.nz). All rights reserved. Although destined for release as a
1680 man page with the standard PDL distribution, it is not public domain.
1681 Permission is granted to freely distribute verbatim copies of this doc‐
1682 ument provided that no modifications outside of formatting be made, and
1683 that this notice remain intact. You are permitted and encouraged to
1684 use its code and derivatives thereof in your own source code for fun or
1685 for profit as you see fit.
1686
1687
1688
1689perl v5.8.8 2003-05-21 PP(1)