PDL::PP(1)

1PP(1)                 User Contributed Perl Documentation                PP(1)
2
3
4

NAME

6       PDL::PP - Generate PDL routines from concise descriptions
7

SYNOPSIS

9       e.g.
10
11               pp_def(
12                       'sumover',
13                       Pars => 'a(n); [o]b();',
14                       Code => 'double tmp=0;
15                               loop(n) %{ tmp += $a(); %}
16                               $b() = tmp;
17                               '
18               );
19
20               pp_done();
21

DESCRIPTION

23       In much of what follows we will assume familiarity of the reader with
24       the concepts of implicit and explicit threading and index manipulations
25       within PDL. If you have not yet heard of these concepts or are not very
26       comfortable with them it is time to check PDL::Indexing.
27
28       As you may appreciate from its name PDL::PP is a Pre-Processor, i.e.
29       it expands code via substitutions to make real C-code (well, actually
30       it outputs XS code (See perlxs) but that is very close to C).
31

Overview

33       Why do we need PP? Several reasons: firstly, we want to be able to gen‐
34       erate subroutine code for each of the PDL datatypes (PDL_Byte,
35       PDL_Short,. etc).  AUTOMATICALLY.  Secondly, when referring to slices
36       of PDL arrays in Perl (e.g. "$a->slice('0:10:2,:')" or other things
37       such as transposes) it is nice to be able to do this transparently and
38       to be able to do this 'in-place' - i.e, not to have to make a memory
39       copy of the section. PP handles all the necessary element and offset
40       arithmetic for you. There are also the notions of threading (repeated
41       calling of the same routine for multiple slices, see PDL::Indexing) and
42       dataflow (see PDL::Dataflow) which use of PP allows.
43
44       So how do you use PP? Well for the most part you just write ordinary C
45       code except for special PP constructs which take the form:
46
47          $something(something else)
48
49       or:
50
51          PPfunction %{
52            <stuff>
53          %}
54
55       The most important PP construct is the form "$array()". Consider the
56       very simple PP function to sum the elements of a 1D vector (in fact
57       this is very similar to the actual code used by 'sumover'):
58
59          pp_def('sumit',
60                  Pars => 'a(n);  [o]b();',
61                  Code => '
62                       double tmp;
63                       tmp = 0;
64                       loop(n) %{
65                         tmp += $a();
66                       %}
67                       $b() = tmp;
68          ');
69
70       What's going on? The "Pars =>" line is very important for PP - it spec‐
71       ifies all the arguments and their dimensionality. We call this the sig‐
72       nature of the PP function (compare also the explanations in PDL::Index‐
73       ing).  In this case the routine takes a 1-D function as input and
74       returns a 0-D scalar as output.  The "$a()" PP construct is used to
75       access elements of the array a(n) for you - PP fills in all the
76       required C code.
77
78       [Aside: since PP used "$var()" for its parsing you must single-quote
79       all Code=> arguments since you don't want perl to interpolate "$var()"
80       into another string - i.e. don't use "" unless you know what you are
81       doing! Tjl: it's usually easiest to use single quotes and 'some‐
82       thing'.$interpolatable.'somethingelse']
83
84       In the simple case here where all elements are accessed the PP con‐
85       struct "loop(n) %{ ... %}" is used to loop over all elements in dimen‐
86       sion "n".  Note this feature of PP: ALL DIMENSIONS ARE SPECIFIED BY
87       NAME.
88
89       This is made clearer if we avoid the PP loop() construct and write the
90       loop explicitly using conventional C:
91
92          pp_def('sumit',
93                  Pars => 'a(n);  [o]b();',
94                  Code => '
95                       int i,n_size;
96                       double tmp;
97                       n_size = $SIZE(n);
98                       tmp = 0;
99                       for(i=0; i<n_size; i++) {
100                         tmp += $a(n=>i);
101                       }
102                       $b() = tmp;
103          ');
104
105       which does the same as before, except more long-windedly.  You can see
106       to get element "i" of a() we say "$a(n=>i)" - we are specifying the
107       dimension by name "n". In 2D we might say:
108
109          Pars=>'a(m,n);',
110             ...
111             tmp += $a(m=>i,n=>j);
112             ...
113
114       The syntax 'm=>i' borrows from Perl hashes (which are in fact used in
115       the implementation of PP). One could also say "$a(n=>j,m=>i)" as order
116       is not important.
117
118       You can also see in the above example the use of another PP construct -
119       $SIZE(n) to get the length of the dimension "n".
120
121       It should, however, be noted that you shouldn't write an explicit
122       C-loop when you could have used the PP "loop" construct since PDL::PP
123       checks automatically the loop limits for you, usage of "loop" makes the
124       code more concise, etc. But there are certainly situations where you
125       need explicit control of the loop and now you know how to do it ;).
126
127       To revisit 'Why PP?' - the above code for sumit() will be generated for
128       each data-type. It will operate on slices of arrays 'in-place'. It will
129       thread automatically - e.g. if a 2D array is given it will be called
130       repeatedly for each 1D row (again check PDL::Indexing for the details
131       of threading).  And then b() will be a 1D array of sums of each row.
132       We could call it with $a->xchg(0,1) to sum the colums instead.  And
133       Dataflow tracing etc. will be available.
134
135       You can see PP saves the programmer from writing a lot of needlessly
136       repetitive C-code -- in our opinion this is one of the best features of
137       PDL making writing new C subroutines for PDL an amazingly concise exer‐
138       cise. A second reason is the ability to make PP expand your concise
139       code definitions into different C code based on the needs of the com‐
140       puter architecture in question. Imagine for example you are lucky to
141       have a supercomputer at your hands; in that case you want PDL::PP cer‐
142       tainly to generate code that takes advantage of the vectorising/paral‐
143       lel computing features of your machine (this a project for the future).
144       In any case, the bottom line is that your unchanged code should still
145       expand to working XS code even if the internals of PDL changed.
146
147       Also, because you are generating the code in an actual Perl script,
148       there are many fun things that you can do. Let's say that you need to
149       write both sumit (as above) and multit. With a little bit of inventiv‐
150       ity, we can do
151
152          for({Name => 'sumit', Init => '0', Op => '+='},
153              {Name => 'multit', Init => '1', Op => '*='}) {
154                  pp_def($_->{Name},
155                          Pars => 'a(n);  [o]b();',
156                          Code => '
157                               double tmp;
158                               tmp = '.$_->{Init}.';
159                               loop(n) %{
160                                 tmp '.$_->{Op}.' $a();
161                               %}
162                               $b() = tmp;
163                  ');
164          }
165
166       which defines both the functions easily. Now, if you later need to
167       change the signature or dimensionality or whatever, you only need to
168       change one place in your code.  Yeah, sure, your editor does have 'cut
169       and paste' and 'search and replace' but it's still less bothersome and
170       definitely more difficult to forget just one place and have strange
171       bugs creep in.  Also, adding 'orit' (bitwise or) later is a one-liner.
172
173       And remember, you really have perl's full abilities with you - you can
174       very easily read any input file and make routines from the information
175       in that file. For simple cases like the above, the author (Tjl) cur‐
176       rently favors the hash syntax like the above - it's not too much more
177       characters than the corresponding array syntax but much easier to
178       understand and change.
179
180       We should mention here also the ability to get the pointer to the
181       beginning of the data in memory - a prerequisite for interfacing PDL to
182       some libraries. This is handled with the "$P(var)" directive, see
183       below.
184
185       So, after this quick overview of the general flavour of programming PDL
186       routines using PDL::PP let's summarise in which circumstances you
187       should actually use this preprocessor/precompiler. You should use
188       PDL::PP if you want to
189
190       ·  interface PDL to some external library
191
192       ·  write some algorithm that would be slow if coded in perl (this is
193          not as often as you think; take a look at threading and dataflow
194          first).
195
196       ·  be a PDL developer (and even then it's not obligatory)
197

WARNING

199       Because of its architecture, PDL::PP can be both flexible and easy to
200       use (yet exuberantly complicated) at the same time. Currently, part of
201       the problem is that error messages are not very informative and if
202       something goes wrong, you'd better know what you are doing and be able
203       to hack your way through the internals (or be able to figure out by
204       trial and error what is wrong with your args to "pp_def").
205
206       An alternative, of course, is to ask someone about it (e.g., through
207       the mailing lists).
208

ABANDON ALL HOPE, YE WHO ENTER HERE (DESCRIPTION)

210       Now that you have some idea how to use "pp_def" to define new PDL func‐
211       tions it is time to explain the general syntax of "pp_def".  "pp_def"
212       takes as arguments first the name of the function you are defining and
213       then a hash list that can contain various keys.
214
215       Based on these keys PP generates XS code and a .pm file. The function
216       "pp_done" (see example in the SYNOPSIS) is used to tell PDL::PP that
217       there are no more definitions in this file and it is time to generate
218       the .xs and
219        .pm file.
220
221       As a consequence, there may be several pp_def() calls inside a file (by
222       convention files with PP code have the extension .pd or .pp) but gener‐
223       ally only one pp_done().
224
225       There are two main different types of usage of pp_def(), the 'data
226       operation' and 'slice operation' prototypes.
227
228       The 'data operation' is used to take some data, mangle it and output
229       some other data; this includes for example the '+' operation, matrix
230       inverse, sumover etc and all the examples we have talked about in this
231       document so far. Implicit and explicit threading and the creation of
232       the result are taken care of automatically in those opeartions. You can
233       even do dataflow with "sumit", "sumover", etc (don't be dismayed if you
234       don't understand the concept of dataflow in PDL very well yet; it is
235       still very much experimental).
236
237       The 'slice operation' is a different kind of operation: in a slice
238       operation, you are not changing any data, you are defining correspon‐
239       dences between different elements of two piddles (examples include the
240       index manipulation/slicing function definitions in the file slices.pd
241       that is part of the PDL distribution; but beware, this is not introduc‐
242       tory level stuff).
243
244       If PDL was compiled with support for bad values (ie "WITH_BADVAL =>
245       1"), then additional keys are required for "pp_def", as explained
246       below.
247
248       If you are just interested in communicating with some external library
249       (for example some linear algebra/matrix library), you'll usually want
250       the 'data operation' so we are going to discuss that first.
251

Data operation

253       A simple example
254
255       In the data operation, you must know what dimensions of data you need.
256       First, an example with scalars:
257
258               pp_def('add',
259                       Pars => 'a(); b(); [o]c();',
260                       Code => '$c() = $a() + $b();'
261               );
262
263       That looks a little strange but let's dissect it. The first line is
264       easy: we're defining a routine with the name 'add'.  The second line
265       simply declares our parameters and the parentheses mean that they are
266       scalars. We call the string that defines our parameters and their
267       dimensionality the signature of that function. For its relevance with
268       regard to threading and index manipulations check the PDL::Indexing
269       manpage.
270
271       The third line is the actual operation. You need to use the dollar
272       signs and parentheses to refer to your parameters (this will probably
273       change at some point in the future, once a good syntax is found).
274
275       These lines are all that is necessary to actually define the function
276       for PDL (well, actually it isn't; you aditionally need to write a Make‐
277       file.PL (see below) and build the module (something like 'perl Make‐
278       file.PL; make'); but let's ignore that for the moment). So now you can
279       do
280
281               use MyModule;
282               $a = pdl 2,3,4;
283               $b = pdl 5;
284
285               $c = add($a,$b);
286               # or
287               add($a,$b,($c=null)); # Alternative form, useful if $c has been
288                                     # preset to something big, not useful here.
289
290       and have threading work correctly (the result is $c == [7 8 9]).
291
292       The Pars section: the signature of a PP function
293
294       Seeing the above example code you will most probably ask: what is this
295       strange "$c=null" syntax in the second call to our new "add" function?
296       If you take another look at the definition of "add" you will notice
297       that the third argument "c" is flagged with the qualifier "[o]" which
298       tells PDL::PP that this is an output argument. So the above call to add
299       means 'create a new $c from scratch with correct dimensions' - "null"
300       is a special token for 'empty piddle' (you might ask why we haven't
301       used the value "undef" to flag this instead of the PDL specific "null";
302       we are currently thinking about it ;).
303
304       [This should be explained in some other section of the manual as
305       well!!]  The reason for having this syntax as an alternative is that if
306       you have really huge piddles, you can do
307
308               $c = PDL->null;
309               for(some long loop) {
310                       # munge a,b
311                       add($a,$b,$c);
312                       # munge c, put something back to a,b
313               }
314
315       and avoid allocating and deallocating $c each time. It is allocated
316       once at the first add() and thereafter the memory stays until $c is
317       destroyed.
318
319       If you just say
320
321         $c =  add($a,$b);
322
323       the code generated by PP will automatically fill in "$c=null" and
324       return the result. If you want to learn more about the reasons why
325       PDL::PP supports this style where output arguments are given as last
326       arguments check the PDL::Indexing manpage.
327
328       "[o]" is not the only qualifier a pdl argument can have in the signa‐
329       ture.  Another important qualifier is the "[t]" option which flags a
330       pdl as temporary.  What does that mean? You tell PDL::PP that this pdl
331       is only used for temporary results in the course of the calculation and
332       you are not interested in its value after the computation has been com‐
333       pleted. But why should PDL::PP want to know about this in the first
334       place?  The reason is closely related to the concepts of pdl auto cre‐
335       ation (you heard about that above) and implicit threading. If you use
336       implicit threading the dimensionality of automatically created pdls is
337       actually larger than that specified in the signature. With "[o]"
338       flagged pdls will be created so that they have the additional dimen‐
339       sions as required by the number of implicit thread dimensions. When
340       creating a temporary pdl, however, it will always only be made big
341       enough so that it can hold the result for one iteration in a thread‐
342       loop, i.e. as large as required by the signature.  So less memory is
343       wasted when you flag a pdl as temporary. Secondly, you can use output
344       auto creation with temporary pdls even when you are using explicit
345       threading which is forbidden for normal output pdls flagged with "[o]"
346       (see PDL::Indexing).
347
348       Here is an example where we use the [t] qualifier. We define the func‐
349       tion "callf" that calls a C routine "f" which needs a temporary array
350       of the same size and type as the array "a" (sorry about the forward
351       reference for $P; it's a pointer access, see below) :
352
353         pp_def('callf',
354               Pars => 'a(n); [t] tmp(n); [o] b()',
355               Code => 'int ns = $SIZE(n);
356                        f($P(a),$P(b),$P(tmp),ns);
357                       '
358         );
359
360       Argument dimensions and the signature
361
362       Now we have just talked about dimensions of pdls and the signature. How
363       are they related? Let's say that we want to add a scalar + the index
364       number to a vector:
365
366               pp_def('add2',
367                       Pars => 'a(n); b(); [o]c(n);',
368                       Code => 'loop(n) %{
369                                       $c() = $a() + $b() + n;
370                                %}'
371               );
372
373       There are several points to notice here: first, the "Pars" argument now
374       contains the n arguments to show that we have a single dimensions in a
375       and c. It is important to note that dimensions are actual entities that
376       are accessed by name so this declares a and c to have the same first
377       dimensions. In most PP definitions the size of named dimensions will be
378       set from the respective dimensions of non-output pdls (those with no
379       "[o]" flag) but sometimes you might want to set the size of a named
380       dimension explicitly through an integer parameter. See below in the
381       description of the "OtherPars" section how that works.
382
383       Type conversions and the signature
384
385       The signature also determines the type conversions that will be per‐
386       formed when a PP function is invoked. So what happens when we invoke
387       one of our previously defined functions with pdls of different type,
388       e.g.
389
390         add2($a,$b,($ret=null));
391
392       where $a is of type "PDL_Float" and $b of type "PDL_Short"? With the
393       signature as shown in the definition of "add2" above the datatype of
394       the operation (as determined at runtime) is that of the pdl with the
395       'highest' type (sequence is byte < short < ushort < long < float < dou‐
396       ble). In the add2 example the datatype of the operation is float ($a
397       has that datatype). All pdl arguments are then type converted to that
398       datatype (they are not converted inplace but a copy with the right type
399       is created if a pdl argument doesn't have the type of the operation).
400       Null pdls don't contribute a type in the determination of the type of
401       the operation.  However, they will be created with the datatype of the
402       operation; here, for example, $ret will be of type float. You should be
403       aware of these rules when calling PP functions with pdls of different
404       types to take the additional storage and runtime requirements into
405       account.
406
407       These type conversions are correct for most functions you normally
408       define with "pp_def". However, there are certain cases where slightly
409       modified type conversion behaviour is desired. For these cases addi‐
410       tional qualifiers in the signature can be used to specify the desired
411       properties with regard to type conversion. These qualifiers can be com‐
412       bined with those we have encountered already (the creation qualifiers
413       "[o]" and "[t]"). Let's go through the list of qualifiers that change
414       type conversion behaviour.
415
416       The most important is the "int" qualifier which comes in handy when a
417       pdl argument represents indices into another pdl. Let's take a look at
418       an example from "PDL::Ufunc":
419
420          pp_def('maximum_ind',
421                 Pars => 'a(n); int [o] b()',
422                 Code => '$GENERIC() cur;
423                          int curind;
424                          loop(n) %{
425                           if (!n ⎪⎪ $a() > cur) {cur = $a(); curind = n;}
426                          %}
427                          $b() = curind;',
428          );
429
430       The function "maximum_ind" finds the index of the largest element of a
431       vector. If you look at the signature you notice that the output argu‐
432       ment "b" has been declared with the additional "int" qualifier.  This
433       has the following consequences for type conversions: regardless of the
434       type of the input pdl "a" the output pdl "b" will be of type "PDL_Long"
435       which makes sense since "b" will represent an index into "a". Further‐
436       more, if you call the function with an existing output pdl "b" its type
437       will not influence the datatype of the operation (see above). Hence,
438       even if "a" is of a smaller type than "b" it will not be converted to
439       match the type of "b" but stays untouched, which saves memory and CPU
440       cycles and is the right thing to do when "b" represents indices. Also
441       note that you can use the 'int' qualifier together with other quali‐
442       fiers (the "[o]" and "[t]" qualifiers). Order is significant -- type
443       qualifiers precede creation qualifiers ("[o]" and "[t]").
444
445       The above example also demonstrates typical usage of the "$GENERIC()"
446       macro.  It expands to the current type in a so called generic loop.
447       What is a generic loop? As you already heard a PP function has a run‐
448       time datatype as determined by the type of the pdl arguments it has
449       been invoked with.  The PP generated XS code for this function there‐
450       fore contains a switch like "switch (type) {case PDL_Byte: ... case
451       PDL_Double: ...}" that selects a case based on the runtime datatype of
452       the function (it's called a type ``loop'' because there is a loop in PP
453       code that generates the cases).  In any case your code is inserted once
454       for each PDL type into this switch statement. The "$GENERIC()" macro
455       just expands to the respective type in each copy of your parsed code in
456       this "switch" statement, e.g., in the "case PDL_Byte" section "cur"
457       will expand to "PDL_Byte" and so on for the other case statements. I
458       guess you realise that this is a useful macro to hold values of pdls in
459       some code.
460
461       There are a couple of other qualifiers with similar effects as "int".
462       For your convenience there are the "float" and "double" qualifiers with
463       analogous consequences on type conversions as "int". Let's assume you
464       have a very large array for which you want to compute row and column
465       sums with an equivalent of the "sumover" function.  However, with the
466       normal definition of "sumover" you might run into problems when your
467       data is, e.g. of type short. A call like
468
469         sumover($large_pdl,($sums = null));
470
471       will result in $sums be of type short and is therefore prone to over‐
472       flow errors if $large_pdl is a very large array. On the other hand
473       calling
474
475         @dims = $large_pdl->dims; shift @dims;
476         sumover($large_pdl,($sums = zeroes(double,@dims)));
477
478       is not a good alternative either. Now we don't have overflow problems
479       with $sums but at the expense of a type conversion of $large_pdl to
480       double, something bad if this is really a large pdl. That's where "dou‐
481       ble" comes in handy:
482
483         pp_def('sumoverd',
484                Pars => 'a(n); double [o] b()',
485                Code => 'double tmp=0;
486                         loop(n) %{ tmp += a(); %}
487                         $b() = tmp;',
488         );
489
490       This gets us around the type conversion and overflow problems. Again,
491       analogous to the "int" qualifier "double" results in "b" always being
492       of type double regardless of the type of "a" without leading to a type‐
493       conversion of "a" as a side effect.
494
495       Finally, there are the "type+" qualifiers where type is one of "int" or
496       "float". What shall that mean. Let's illustrate the "int+" qualifier
497       with the actual definition of sumover:
498
499         pp_def('sumover',
500                Pars => 'a(n); int+ [o] b()',
501                Code => '$GENERIC(b) tmp=0;
502                         loop(n) %{ tmp += a(); %}
503                         $b() = tmp;',
504         );
505
506       As we had already seen for the "int", "float" and "double" qualifiers,
507       a pdl marked with a "type+" qualifier does not influence the datatype
508       of the pdl operation. Its meaning is "make this pdl at least of type
509       "type" or higher, as required by the type of the operation". In the
510       sumover example this means that when you call the function with an "a"
511       of type PDL_Short the output pdl will be of type PDL_Long (just as
512       would have been the case with the "int" qualifier). This again tries to
513       avoid overflow problems when using small datatypes (e.g. byte images).
514       However, when the datatype of the operation is higher than the type
515       specified in the "type+" qualifier "b" will be created with the
516       datatype of the operation, e.g. when "a" is of type double then "b"
517       will be double as well. We hope you agree that this is sensible behav‐
518       iour for "sumover". It should be obvious how the "float+" qualifier
519       works by analogy.  It may become necessary to be able to specify a set
520       of alternative types for the parameters. However, this will probably
521       not be implemented until someone comes up with a reasonable use for it.
522
523       Note that we now had to specify the $GENERIC macro with the name of the
524       pdl to derive the type from that argument. Why is that? If you care‐
525       fully followed our explanations you will have realised that in some
526       cases "b" will have a different type than the type of the operation.
527       Calling the '$GENERIC' macro with "b" as argument makes sure that the
528       type will always the same as that of "b" in that part of the generic
529       loop.
530
531       This is about all there is to say about the "Pars" section in a
532       "pp_def" call. You should remember that this section defines the signa‐
533       ture of a PP defined function, you can use several options to qualify
534       certain arguments as output and temporary args and all dimensions that
535       you can later refer to in the "Code" section are defined by name.
536
537       It is important that you understand the meaning of the signature since
538       in the latest PDL versions you can use it to define threaded functions
539       from within perl, i.e. what we call perl level threading. Please check
540       PDL::Indexing for details.
541
542       The Code section
543
544       The "Code" section contains the actual XS code that will be in the
545       innermost part of a threadloop (if you don't know what a thread loop is
546       then you still haven't read PDL::Indexing; do it now ;) after any PP
547       macros (like $GENERIC) and PP functions have been expanded (like the
548       "loop" function we are going to explain next).
549
550       Let's quickly reiterate the "sumover" example:
551
552         pp_def('sumover',
553                Pars => 'a(n); int+ [o] b()',
554                Code => '$GENERIC(b) tmp=0;
555                         loop(n) %{ tmp += a(); %}
556                         $b() = tmp;',
557         );
558
559       The "loop" construct in the "Code" section also refers to the dimension
560       name so you don't need to specify any limits: the loop is correctly
561       sized and everything is done for you, again.
562
563       Next, there is the surprising fact that "$a()" and "$b()" do not con‐
564       tain the index. This is not necessary because we're looping over n and
565       both variables know which dimensions they have so they automatically
566       know they're being looped over.
567
568       This feature comes in very handy in many places and makes for much
569       shorter code. Of course, there are times when you want to circumvent
570       this; here is a function which symmetrizes a matrix and serves as an
571       example of how to code explicit looping:
572
573               pp_def('symm',
574                       Pars => 'a(n,n); [o]c(n,n);',
575                       Code => 'loop(n) %{
576                                       int n2;
577                                       for(n2=n; n2<$SIZE(n); n2++) {
578                                               $c(n0 => n, n1 => n2) =
579                                               $c(n0 => n2, n1 => n) =
580                                                $a(n0 => n, n1 => n2);
581                                       }
582                               %}
583                       '
584               );
585
586       Let's dissect what is happening. Firstly, what is this function sup‐
587       posed to do? From its signature you see that it takes a 2D matrix with
588       equal numbers of columns and rows and outputs a matrix of the same
589       size. From a given input matrix $a it computes a symmetric output
590       matrix $c (symmetric in the matrix sense that A^T = A where ^T means
591       matrix transpose, or in PDL parlance $c == $c->xchg(0,1)). It does this
592       by using only the values on and below the diagonal of $a. In the output
593       matrix $c all values on and below the diagonal are the same as those in
594       $a while those above the diagonal are a mirror image of those below the
595       diagonal (above and below are here interpreted in the way that PDL
596       prints 2D pdls). If this explanation still sounds a bit strange just go
597       ahead, make a little file into which you write this definition, build
598       the new PDL extension (see section on Makefiles for PP code) and try it
599       out with a couple of examples.
600
601       Having explained what the function is supposed to do there are a couple
602       of points worth noting from the syntactical point of view. First, we
603       get the size of the dimension named "n" again by using the $SIZE macro.
604       Second, there are suddenly these funny "n0" and "n1" index names in the
605       code though the signature defines only the dimension "n". Why this? The
606       reason becomes clear when you note that both the first and second
607       dimension of $a and $b are named "n" in the signature of "symm". This
608       tells PDL::PP that the first and second dimension of these arguments
609       should have the same size. Otherwise the generated function will raise
610       a runtime error.  However, now in an access to $a and $c PDL::PP cannot
611       figure out which index "n" refers to any more just from the name of the
612       index.  Therefore, the indices with equal dimension names get numbered
613       from left to right starting at 0, e.g. in the above example "n0" refers
614       to the first dimension of $a and $c, "n1" to the second and so on.
615
616       In all examples so far, we have only used the "Pars" and "Code" members
617       of the hash that was passed to "pp_def". There are certainly other keys
618       that are recognised by PDL::PP and we will hear about some of them in
619       the course of this document. Find a (non-exhaustive) list of keys in
620       Appendix A.  A list of macros and PPfunctions (we have only encountered
621       some of those in the examples above yet) that are expanded in values of
622       the hash argument to "pp_def" is summarised in Appendix B.
623
624       At this point, it might be appropriate to mention that PDL::PP is not a
625       completely static, well designed set of routines (as Tuomas puts it:
626       "stop thinking of PP as a set of routines carved in stone") but rather
627       a collection of things that the PDL::PP author (Tuomas J. Lukka) con‐
628       sidered he would have to write often into his PDL extension routines.
629       PP tries to be expandable so that in the future, as new needs arise,
630       new common code can be abstracted back into it. If you want to learn
631       more on why you might want to change PDL::PP and how to do it check the
632       section on PDL::PP internals.
633
634       Handling bad values
635
636       If you do not have bad-value support compiled into PDL you can ignore
637       this section and the related keys: "BadCode", "HandleBad", ...  (try
638       printing out the value of $PDL::Bad::Status - if it equals 0 then move
639       straight on).
640
641       There are several keys and macros used when writing code to handle bad
642       values. The first one is the "HandleBad" key:
643
644       HandleBad => 0
645           This flags a pp-routine as NOT handling bad values. If this routine
646           is sent piddles with their "badflag" set, then a warning message is
647           printed to STDOUT and the piddles are processed as if the value
648           used to represent bad values is a valid number. The "badflag" value
649           is not propogated to the output piddles.
650
651           An example of when this is used is for FFT routines, which gener‐
652           ally do not have a way of ignoring part of the data.
653
654       HandleBad => 1
655           This causes PDL::PP to write extra code that ensures the BadCode
656           section is used, and that the "$ISBAD()" macro (and its brethren)
657           work.
658
659       HandleBad is not given
660           If any of the input piddles have their "badflag" set, then the out‐
661           put piddles will have their "badflag" set, but any supplied BadCode
662           is ignored.
663
664       The value of "HandleBad" is used to define the contents of the "BadDoc"
665       key, if it is not given.
666
667       To handle bad values, code must be written somewhat differently; for
668       instance,
669
670        $c() = $a() + $b();
671
672       becomes something like
673
674        if ( $a() != BADVAL && $b() != BADVAL ) {
675           $c() = $a() + $b();
676        } else {
677           $c() = BADVAL;
678        }
679
680       However, we only want the second version if bad values are present in
681       the input piddles (and that bad-value support is wanted!) - otherwise
682       we actually want the original code. This is where the "BadCode" key
683       comes in; you use it to specify the code to execute if bad values may
684       be present, and PP uses both it and the "Code" section to create some‐
685       thing like:
686
687        if ( bad_values_are_present ) {
688           fancy_threadloop_stuff {
689              BadCode
690           }
691        } else {
692           fancy_threadloop_stuff {
693              Code
694           }
695        }
696
697       This approach means that there is virtually no overhead when bad values
698       are not present (ie the badflag routine returns 0).
699
700       The BadCode section can use the same macros and looping constructs as
701       the Code section.  However, it wouldn't be much use without the follow‐
702       ing additional macros:
703
704       $ISBAD(var)
705           To check whether a piddle's value is bad, use the $ISBAD macro:
706
707            if ( $ISBAD(a()) ) { printf("a() is bad\n"); }
708
709           You can also access given elements of a piddle:
710
711            if ( $ISBAD(a(n=>l)) ) { printf("element %d of a() is bad\n", l); }
712
713       $ISGOOD(var)
714           This is the opposite of the $ISBAD macro.
715
716       $SETBAD(var)
717           For when you want to set an element of a piddle bad.
718
719       $ISBADVAR(c_var,pdl)
720           If you have cached the value of a piddle "$a()" into a c-variable
721           ("foo" say), then to check whether it is bad, use "$ISBAD‐
722           VAR(foo,a)".
723
724       $ISGOODVAR(c_var,pdl)
725           As above, but this time checking that the cached value isn't bad.
726
727       $SETBADVAR(c_var,pdl)
728           To copy the bad value for a piddle into a c variable, use "$SETBAD‐
729           VAR(foo,a)".
730
731       TODO: mention "$PPISBAD()" etc macros.
732
733       Using these macros, the above code could be specified as:
734
735        Code => '$c() = $a() + $b();',
736        BadCode => '
737           if ( $ISBAD(a()) ⎪⎪ $ISBAD(b()) ) {
738              $SETBAD(c());
739           } else {
740              $c() = $a() + $b();
741           }',
742
743       Since this is perl, TMTOWTDI, so you could also write:
744
745        BadCode => '
746           if ( $ISGOOD(a()) && $ISGOOD(b()) ) {
747              $c() = $a() + $b();
748           } else {
749              $SETBAD(c());
750           }',
751
752       If you want access to the value of the badflag for a given piddle, you
753       can use the "$PDLSTATExxxx()" macros:
754
755       $PDLSTATEISBAD(pdl)
756       $PDLSTATEISGOOD(pdl)
757       $PDLSTATESETBAD(pdl)
758       $PDLSTATESETGOOD(pdl)
759
760       TODO: mention the "FindBadStatusCode" and "CopyBadStatusCode" options
761       to "pp_def", as well as the "BadDoc" key.
762
763       Interfacing your own/library functions using PP
764
765       Now, consider the following: you have your own C function (that may in
766       fact be part of some library you want to interface to PDL) which takes
767       as arguments two pointers to vectors of double:
768
769               void myfunc(int n,double *v1,double *v2);
770
771       The correct way of defining the PDL function is
772
773               pp_def('myfunc',
774                       Pars => 'a(n); [o]b(n);',
775                       GenericTypes => [D],
776                       Code => 'myfunc($SIZE(n),$P(a),$P(b));'
777               );
778
779       The "$P("par")" syntax returns a pointer to the first element and the
780       other elements are guaranteed to lie after that.
781
782       Notice that here it is possible to make many mistakes. First, $SIZE(n)
783       must be used instead of "n". Second, you shouldn't put any loops in
784       this code. Third, here we encounter a new hash key recognised by
785       PDL::PP : the "GenericTypes" declaration tells PDL::PP to ONLY GENERATE
786       THE TYPELOOP FOP THE LIST OF TYPES SPECIFIED. In this case "double".
787       This has two advantages. Firstly the size of the compiled code is
788       reduced vastly, secondly if non-double arguments are passed to
789       "myfunc()" PDL will automatically convert them to double before passing
790       to the external C routine and convert them back afterwards.
791
792       One can also use "Pars" to qualify the types of individual arguments.
793       Thus one could also write this as:
794
795               pp_def('myfunc',
796                       Pars => 'double a(n); double [o]b(n);',
797                       Code => 'myfunc($SIZE(n),$P(a),$P(b));'
798               );
799
800       The type specification in "Pars" exempts the argument from variation in
801       the typeloop - rather it is automatically converted too and from the
802       type specified. This is obviously useful in a more general example,
803       e.g.:
804
805               void myfunc(int n,float *v1,long *v2);
806
807               pp_def('myfunc',
808                       Pars => 'float a(n); long [o]b(n);',
809                       GenericTypes => [F],
810                       Code => 'myfunc($SIZE(n),$P(a),$P(b));'
811               );
812
813       Note we still use "GenericTypes" to reduce the size of the type loop,
814       obviously PP could in principle spot this and do it automatically
815       though the code has yet to attain that level of sophistication!
816
817       Finally note when types are converted automatically one MUST use the
818       "[o]" qualifier for output variables or you hard one changes will get
819       optimised away by PP!
820
821       If you interface a large library you can automate the interfacing even
822       further. Perl can help you again(!) in doing this. In many libraries
823       you have certain calling conventions. This can be exploited. In short,
824       you can write a little parser (which is really not difficult in perl)
825       that then generates the calls to "pp_def" from parsed descriptions of
826       the functions in that library. For an example, please check the Slatec
827       interface in the "Lib" tree of the PDL distribution. If you want to
828       check (during debugging) which calls to PP functions your perl code
829       generated a little helper package comes in handy which replaces the PP
830       functions by identically named ones that dump their arguments to std‐
831       out.
832
833       Just say
834
835          perl -MPDL::PP::Dump myfile.pd
836
837       to see the calls to "pp_def" and friends. Try it with ops.pd and
838       slatec.pd. If you're interested (or want to enhance it), the source is
839       in Basic/Gen/PP/Dump.pm
840
841       Other macros and functions in the Code section
842
843       Macros: So far we have encountered the $SIZE, $GENERIC and $P macros.
844       Now we are going to quickly explain the other macros that are expanded
845       in the "Code" section of PDL::PP along with examples of their usage.
846
847       $T The $T macro is used for type switches. This is very useful when you
848          have to use different external (e.g. library) functions depending on
849          the input type of arguments. The general syntax is
850
851                  $Ttypeletters(type_alternatives)
852
853          where "typeletters" is a permutation of a subset of the letters
854          "BSULFD" which stand for Byte, Short, Ushort, etc. and "type_alter‐
855          natives" are the expansions when the type of the PP operation is
856          equal to that indicated by the respective letter. Let's illustrate
857          this incomprehensible description by an example. Assuming you have
858          two C functions with prototypes
859
860            void float_func(float *in, float *out);
861            void double_func(double *in, double *out);
862
863          which do basically the same thing but one accepts float and the
864          other double pointers. You could interface them to PDL by defining a
865          generic function "foofunc" (which will call the correct function
866          depending on the type of the transformation):
867
868            pp_def('foofunc',
869                  Pars => ' a(n); [o] b();',
870                  Code => ' $TFD(float_func,double_func) ($P(a),$P(b));'
871                  GenericTypes => [F,D],
872            );
873
874          Please note that you can't say
875
876                 Code => ' $TFD(float,double)_func ($P(a),$P(b));'
877
878          since the $T macro expands with trailing spaces, analogously to C
879          preprocessor macros.  The slightly longer form illustrated above is
880          correct.  If you really want brevity, you can of course do
881
882                  '$TBSULFD('.(join ',',map {"long_identifier_name_$_"}
883                          qw/byt short unseigned lounge flotte dubble/).');'
884
885       $PP
886          The $PP macro is used for a so called physical pointer access. The
887          physical refers to some internal optimisations of PDL (for those who
888          are familiar with the PDL core we are talking about the vaffine
889          optimisations). This macro is mainly for internal use and you
890          shouldn't need to use it in any of your normal code.
891
892       $COMP (and the "OtherPars" section)
893          The $COMP macro is used to access non-pdl values in the code sec‐
894          tion. Its name is derived from the implementation of transformations
895          in PDL. The variables you can refer to using $COMP are members of
896          the ``compiled'' structure that represents the PDL transformation in
897          question but does not yet contain any information about dimensions
898          (for further details check PDL::Internals). However, you can treat
899          $COMP just as a black box without knowing anything about the imple‐
900          mentation of transformations in PDL. So when would you use this
901          macro? Its main usage is to access values of arguments that are
902          declared in the "OtherPars" section of a "pp_def" definition. But
903          then you haven't heard about the "OtherPars" key yet?!  Let's have
904          another example that illustrates typical usage of both new features:
905
906            pp_def('pnmout',
907                  Pars => 'a(m)',
908                  OtherPars => "char* fd",
909                  GenericTypes => [B,U,S,L],
910                  Code => 'PerlIO *fp;
911                           IO *io;
912
913                         io = GvIO(gv_fetchpv($COMP(fd),FALSE,SVt_PVIO));
914                           if (!io ⎪⎪ !(fp = IoIFP(io)))
915                                  croak("Can\'t figure out FP");
916
917                           if (PerlIO_write(fp,$P(a),len) != len)
918                                          croak("Error writing pnm file");
919            ');
920
921          This function is used to write data from a pdl to a file. The file
922          descriptor is passed as a string into this function. This parameter
923          does not go into the "Pars" section since it cannot be usefully
924          treated like a pdl but rather into the aptly named "OtherPars" sec‐
925          tion. Parameters in the "OtherPars" section follow those in the
926          "Pars" section when invoking the function, i.e.
927
928             open FILE,">out.dat" or die "couldn't open out.dat";
929             pnmout($pdl,'FILE');
930
931          When you want to access this parameter inside the code section you
932          have to tell PP by using the $COMP macro, i.e. you write "$COMP(fd)"
933          as in the example. Otherwise PP wouldn't know that the "fd" you are
934          referring to is the same as that specified in the "OtherPars" sec‐
935          tion.
936
937          Another use for the "OtherPars" section is to set a named dimension
938          in the signature. Let's have an example how that is done:
939
940            pp_def('setdim',
941                  Pars => '[o] a(n)',
942                  OtherPars => 'int ns => n',
943                  Code => 'loop(n) %{ $a() = n; %}',
944            );
945
946          This says that the named dimension "n" will be initialised from the
947          value of the other parameter "ns" which is of integer type (I guess
948          you have realised that we use the "CType From => named_dim" syntax).
949          Now you can call this function in the usual way:
950
951            setdim(($a=null),5);
952            print $a;
953              [ 0 1 2 3 4 ]
954
955          Admittedly this function is not very useful but it demonstrates how
956          it works. If you call the function with an existing pdl and you
957          don't need to explicitly specify the size of "n" since PDL::PP can
958          figure it out from the dimensions of the non-null pdl. In that case
959          you just give the dimension parameter as "-1":
960
961            $a = hist($b);
962            setdim($a,-1);
963
964          That should do it.
965
966       The only PP function that we have used in the examples so far is
967       "loop".  Additionally, there are currently two other functions which
968       are recognised in the "Code" section:
969
970       threadloop
971         As we heard above the signature of a PP defined function defines the
972         dimensions of all the pdl arguments involved in a primitive opera‐
973         tion.  However, you often call the functions that you defined with PP
974         with pdls that have more dimensions than those specified in the sig‐
975         nature. In this case the primitive operation is performed on all sub‐
976         slices of appropriate dimensionality in what is called a threadloop
977         (see also overview above and PDL::Indexing). Assuming you have some
978         notion of this concept you will probably appreciate that the opera‐
979         tion specified in the code section should be optimised since this is
980         the tightest loop inside a threadloop.  However, if you revisit the
981         example where we define the "pnmout" function, you will quickly
982         realise that looking up the "IO" file descriptor in the inner thread‐
983         loop is not very efficient when writing a pdl with many rows. A bet‐
984         ter approach would be to look up the "IO" descriptor once outside the
985         threadloop and use its value then inside the tightest threadloop.
986         This is exactly where the "threadloop" function comes in handy. Here
987         is an improved definition of "pnmout" which uses this function:
988
989           pp_def('pnmout',
990                 Pars => 'a(m)',
991                 OtherPars => "char* fd",
992                 GenericTypes => [B,U,S,L],
993                 Code => 'PerlIO *fp;
994                          IO *io;
995                          int len;
996
997                        io = GvIO(gv_fetchpv($COMP(fd),FALSE,SVt_PVIO));
998                          if (!io ⎪⎪ !(fp = IoIFP(io)))
999                                 croak("Can\'t figure out FP");
1000
1001                          len = $SIZE(m) * sizeof($GENERIC());
1002
1003                          threadloop %{
1004                             if (PerlIO_write(fp,$P(a),len) != len)
1005                                         croak("Error writing pnm file");
1006                          %}
1007           ');
1008
1009         This works as follows. Normally the C code you write inside the
1010         "Code" section is placed inside a threadloop (i.e., PP generates the
1011         appropriate wrapping XS code around it). However, when you explicitly
1012         use the "threadloop" function, PDL::PP recognises this and doesn't
1013         wrap your code with an additional threadloop. This has the effect
1014         that code you write outside the threadloop is only executed once per
1015         transformation and just the code with in the surrounding "%{ ... %}"
1016         pair is placed within the tightest threadloop. This also comes in
1017         handy when you want to perform a decision (or any other code, espe‐
1018         cially CPU intensive code) only once per thread, i.e.
1019
1020           pp_addhdr('
1021             #define RAW 0
1022             #define ASCII 1
1023           ');
1024           pp_def('do_raworascii',
1025                  Pars => 'a(); b(); [o]c()',
1026                  OtherPars => 'int mode',
1027                Code => ' switch ($COMP(mode)) {
1028                             case RAW:
1029                                 threadloop %{
1030                                     /* do raw stuff */
1031                                 %}
1032                                 break;
1033                             case ASCII:
1034                                 threadloop %{
1035                                     /* do ASCII stuff */
1036                                 %}
1037                                 break;
1038                             default:
1039                                 croak("unknown mode");
1040                            }'
1041            );
1042
1043       types
1044         The types function works similar to the $T macro. However, with the
1045         "types" function the code in the following block (delimited by "%{"
1046         and "%}" as usual) is executed for all those cases in which the
1047         datatype of the operation is any of the types represented by the let‐
1048         ters in the argument to "type", e.g.
1049
1050              Code => '...
1051
1052                      types(BSUL) %{
1053                          /* do integer type operation */
1054                      %}
1055                      types(FD) %{
1056                          /* do floating point operation */
1057                      %}
1058                      ...'
1059
1060       Typemap handling in the "OtherPars" section
1061
1062       The "OtherPars" section discussed above is very often absolutely cru‐
1063       cial when you interface external libraries with PDL. However in many
1064       cases the external libraries either use derived types or pointers of
1065       various types.
1066
1067       The standard way to handle this in Perl is to use a "typemap" file.
1068       This is discussed in some detail in perlxs in the standard Perl docu‐
1069       mentation. In PP the functionality is very similar, so you can create a
1070       "typemap" file in the directory where your PP file resides and when it
1071       is built it is automatically read in to figure out the appropriate
1072       translation between the C type and Perl's built-in type.
1073
1074       That said, there are a couple of important differences from the general
1075       handling of types in XS. The first, and probably most important, is
1076       that at the moment pointers to types are not allowed in the "OtherPars"
1077       section. To get around this limitation you must use the "IV" type
1078       (thanks to Judd Taylor for pointing out that this is necessary for
1079       portability).
1080
1081       It is probably best to illustrate this with a couple of code-snippets:
1082
1083       For instance the "gsl_spline_init" function has the following C decla‐
1084       ration:
1085
1086           int  gsl_spline_init(gsl_spline * spline,
1087                 const double xa[], const double ya[], size_t size);
1088
1089       Clearly the "xa" and "ya" arrays are candidates for being passed in as
1090       piddles and the "size" argument is just the length of these piddles so
1091       that can be handled by the "$SIZE()" macro in PP. The problem is the
1092       pointer to the "gsl_spline" type. The natural solution would be to
1093       write an "OtherPars" declaration of the form
1094
1095           OtherPars => 'gsl_spline *spl'
1096
1097       and write a short "typemap" file which handled this type. This does not
1098       work at present however! So what you have to do is to go around the
1099       problem slightly (and in some ways this is easier too!):
1100
1101       The solution is to declare "spline" in the "OtherPars" section using an
1102       "Integer Value", "IV". This hides the nature of the variable from PP
1103       and you then need to (well to avoid compiler warnings at least!)  per‐
1104       form a type cast when you use the variable in your code. Thus "Other‐
1105       Pars" should take the form:
1106
1107           OtherPars => 'IV spl'
1108
1109       and when you use it in the code you will write
1110
1111           (gsl_spline *) $COMP(spl)
1112
1113       so putting this together as Andres Jordan has done (with the modifica‐
1114       tion using "IV" by Judd Taylor) in the "gsl_interp.pd" in the distribu‐
1115       tion source you get:
1116
1117            pp_def('init_meat',
1118                   Pars => 'double x(n); double y(n);',
1119                   OtherPars => 'IV spl',
1120                   Code =>'
1121                gsl_spline_init( (gsl_spline *) $COMP(spl), $P(x),$P(y),$SIZE(n));'
1122           );
1123
1124       where I have removed a macro wrapper call, but that would obscure the
1125       discussion.
1126
1127       The other minor difference as compared to the standard typemap handling
1128       in Perl, is that the user cannot specify non-standard typemap locations
1129       or typemap filenames using the "TYPEMAPS" option in MakeMaker... Thus
1130       you can only use a file called "typemap" and/or the "IV" trick above.
1131
1132       Other useful PP keys in data operation definitions
1133
1134       You have already heard about the "OtherPars" key. Currently, there are
1135       not many other keys for a data operation that will be useful in normal
1136       (whatever that is) PP programming. In fact, it would be interesting to
1137       hear about a case where you think you need more than what is provided
1138       at the moment.  Please speak up on one of the PDL mailing lists. Most
1139       other keys recognised by "pp_def" are only really useful for what we
1140       call slice operations (see also above).
1141
1142       One thing that is strongly being planned is variable number of argu‐
1143       ments, which will be a little tricky.
1144
1145       An incomplete list of the available keys:
1146
1147       Inplace
1148           Setting this key marks the routine as working inplace - ie the
1149           input and output piddles are the same. An example is
1150           "$a->inplace->sqrt()" (or "sqrt(inplace($a))").
1151
1152           Inplace => 1
1153               Use when the routine is a unary function, such as "sqrt".
1154
1155           Inplace => ['a']
1156               If there are more than one input piddles, specify the name of
1157               the one that can be changed inplace using an array reference.
1158
1159           Inplace => ['a','b']
1160               If there are more than one output piddle, specify the name of
1161               the input piddle and output piddle in a 2-element array refer‐
1162               ence. This probably isn't needed, but left in for completeness.
1163
1164           If bad values are being used, care must be taken to ensure the pro‐
1165           pogation of the badflag when inplace is being used; consider this
1166           excerpt from Basic/Bad/bad.pd:
1167
1168             pp_def('replacebad',HandleBad => 1,
1169               Pars => 'a(); [o]b();',
1170               OtherPars => 'double newval',
1171               Inplace => 1,
1172               CopyBadStatusCode =>
1173               '/* propogate badflag if inplace AND it has changed */
1174                if ( a == b && $ISPDLSTATEBAD(a) )
1175                  PDL->propogate_badflag( b, 0 );
1176
1177                /* always make sure the output is "good" */
1178                $SETPDLSTATEGOOD(b);
1179               ',
1180               ...
1181
1182           Since this routine removes all bad values, then the output piddle
1183           had its bad flag cleared. If run inplace (so "a == b"), then we
1184           have to tell all the children of "a" that the bad flag has been
1185           cleared (to save time we make sure that we call "PDL->pro‐
1186           pogate_badgflag" only if the input piddle had its bad flag set).
1187
1188           NOTE: one idea is that the documentation for the routine could be
1189           automatically flagged to indicate that it can be executed inplace,
1190           ie something similar to how "HandleBad" sets "BadDoc" if it's not
1191           supplied (it's not an ideal solution).
1192
1193       Other PDL::PP functions to support concise package definition
1194
1195       So far, we have described the "pp_def" and "pp_done" functions. PDL::PP
1196       exports a few other functions to aid you in writing concise PDL exten‐
1197       sion package definitions.
1198
1199       Often when you interface library functions as in the above example you
1200       have to include additional C include files. Since the XS file is gener‐
1201       ated by PP we need some means to make PP insert the appropriate include
1202       directives in the right place into the generated XS file.  To this end
1203       there is the "pp_addhdr" function. This is also the function to use
1204       when you want to define some C functions for internal use by some of
1205       the XS functions (which are mostly functions defined by "pp_def").  By
1206       including these functions here you make sure that PDL::PP inserts your
1207       code before the point where the actual XS module section begins and
1208       will therefore be left untouched by xsubpp (cf. perlxs and perlxstut
1209       manpages).
1210
1211       A typical call would be
1212
1213         pp_addhdr('
1214         #include <unistd.h>       /* we need defs of XXXX */
1215         #include "libprotos.h"    /* prototypes of library functions */
1216         #include "mylocaldecs.h"  /* Local decs */
1217
1218         static void do_the real_work(PDL_Byte * in, PDL_Byte * out, int n)
1219         {
1220               /* do some calculations with the data */
1221         }
1222         ');
1223
1224       This ensures that all the constants and prototypes you need will be
1225       properly included and that you can use the internal functions defined
1226       here in the "pp_def"s, e.g.:
1227
1228         pp_def('barfoo',
1229                Pars => ' a(n); [o] b(n)',
1230                GenericTypes => '[B]',
1231                Code => ' int ns = $SIZE(n);
1232                          do_the_real_work($P(a),$P(b),ns);
1233                        ',
1234         );
1235
1236       In many cases the actual PP code (meaning the arguments to "pp_def"
1237       calls) is only part of the package you are currently implementing.
1238       Often there is additional perl code and XS code you would normally have
1239       written into the pm and XS files which are now automatically generated
1240       by PP. So how to get this stuff into those dynamically generated files?
1241       Fortunately, there are a couple of functions, generally called
1242       "pp_addXXX" that assist you in doing this.
1243
1244       Let's assume you have additional perl code that should go into the gen‐
1245       erated pm-file. This is easily achieved with the "pp_addpm" command:
1246
1247          pp_addpm(<<'EOD');
1248
1249          =head1 NAME
1250
1251          PDL::Lib::Mylib -- a PDL interface to the Mylib library
1252
1253          =head1 DESCRIPTION
1254
1255          This package implements an interface to the Mylib package with full
1256          threading and indexing support (see L<PDL::Indexing>).
1257
1258          =cut
1259
1260          use PGPLOT;
1261
1262          =head2 use_myfunc
1263               this function applies the myfunc operation to all the
1264               elements of the input pdl regardless of dimensions
1265               and returns the sum of the result
1266          =cut
1267
1268          sub use_myfunc {
1269               my $pdl = shift;
1270
1271               myfunc($pdl->clump(-1),($res=null));
1272
1273               return $res->sum;
1274          }
1275
1276          EOD
1277
1278       You have probably got the idea. In some cases you also want to export
1279       your additional functions. To avoid getting into trouble with PP which
1280       also messes around with the @EXPORT array you just tell PP to add your
1281       functions to the list of exported functions:
1282
1283         pp_add_exported('', 'use_myfunc gethynx');
1284
1285       Note the initial empty string argument (reason for it?).
1286
1287       The "pp_add_isa" command works like the the "pp_add_exported" function.
1288       The arguments to "pp_add_isa" are added the @ISA list, e.g.
1289
1290         pp_add_isa(' Some::Other::Class ');
1291
1292       Sometimes you want to add extra XS code of your own (that is generally
1293       not involved with any threading/indexing issues but supplies some other
1294       functionality you want to access from the perl side) to the generated
1295       XS file, for example
1296
1297         pp_addxs('','
1298
1299         # Determine endianness of machine
1300
1301         int
1302         isbigendian()
1303            CODE:
1304              unsigned short i;
1305              PDL_Byte *b;
1306
1307              i = 42; b = (PDL_Byte*) (void*) &i;
1308
1309              if (*b == 42)
1310                 RETVAL = 0;
1311              else if (*(b+1) == 42)
1312                 RETVAL = 1;
1313              else
1314                 croak("Impossible - machine is neither big nor little endian!!\n");
1315              OUTPUT:
1316                RETVAL
1317         ');
1318
1319       Especially "pp_add_exported" and "pp_addxs" should be used with care.
1320       PP uses PDL::Exporter, hence letting PP export your function means that
1321       they get added to the standard list of function exported by default
1322       (the list defined by the export tag ``:Func''). If you use "pp_addxs"
1323       you shouldn't try to do anything that involves threading or indexing
1324       directly. PP is much better at generating the appropriate code from
1325       your definitions.
1326
1327       Finally, you may want to add some code to the BOOT section of the XS
1328       file (if you don't know what that is check perlxs). This is easily done
1329       with the "pp_add_boot" command:
1330
1331         pp_add_boot(<<EOB);
1332               descrip = mylib_initialize(KEEP_OPEN);
1333
1334               if (descrip == NULL)
1335                  croak("Can't initialize library");
1336
1337               GlobalStruc->descrip = descrip;
1338               GlobalStruc->maxfiles = 200;
1339         EOB
1340
1341       By default, PP.pm puts all subs defined using the pp_def function into
1342       the output .pm file's EXPORT list. This can create problems if you are
1343       creating a subclassed object where you don't want any methods exported.
1344       (i.e. the methods will only be called using the $object->method syn‐
1345       tax).
1346
1347       For these cases you can call pp_export_nothing() to clear out the
1348       export list. Example (At the end of the .pd file):
1349
1350         pp_export_nothing();
1351         pp_done();
1352
1353       By default, PP.pm puts the 'use Core;' line into the output .pm file.
1354       This imports Core's exported names into the current namespace, which
1355       can create problems if you are over-riding one of Core's methods in the
1356       current file.  You end up getting messages like "Warning: sub sumover
1357       redefined in file subclass.pm" when running the program.
1358
1359       For these cases the pp_core_importList can be used to change what is
1360       imported from Core.pm.  For example:
1361
1362         pp_core_importList('()')
1363
1364       This would result in
1365
1366         use Core();
1367
1368       being generated in the output .pm file. This would result in no names
1369       being imported from Core.pm. Similarly, calling
1370
1371         pp_core_importList(' qw/ barf /')
1372
1373       would result in
1374
1375         use Core qw/ barf/;
1376
1377       being generated in the output .pm file. This would result in just
1378       'barf' being imported from Core.pm.
1379

Slice operation

1381       The slice operation section of this manual is provided using dataflow
1382       and lazy evaluation: when you need it, ask Tjl to write it.  a delivery
1383       in a week from when I receive the email is 95% probable and two week
1384       delivery is 99% probable.
1385
1386       And anyway, the slice operations require a much more intimate knowledge
1387       of PDL internals than the data operations. Furthermore, the complexity
1388       of the issues involved is considerably higher than that in the average
1389       data operation. If you would like to convince yourself of this fact
1390       take a look at the Basic/Slices/slices.pd file in the PDL distribution
1391       :-). Nevertheless, functions generated using the slice operations are
1392       at the heart of the index manipulation and dataflow capabilities of
1393       PDL.
1394
1395       Also, there are a lot of dirty issues with virtual piddles and vaffines
1396       which we shall entirely skip here.
1397
1398       Slices and bad values
1399
1400       Slice operations need to be able to handle bad values (if support is
1401       compiled into PDL). The easiest thing to do is look at
1402       Basic/Slices/slices.pd to see how this works.
1403
1404       Along with "BadCode", there are also the "BadBackCode" and "BadRedoDim‐
1405       sCode" keys for "pp_def". However, any "EquivCPOffsCode" should not
1406       need changing, since any changes are absorbed into the definition of
1407       the "$EQUIVCPOFFS()" macro (ie it is handled automatically by PDL::PP>.
1408
1409       A few notes on writing a slicing routine...
1410
1411       The following few paragraphs describe writing of a new slicing routine
1412       ('range'); any errors are CED's. (--CED 26-Aug-2002)
1413

USEFUL ROUTINES

1415       The PDL "Core" structure, defined in Basic/Core/pdlcore.h.PL, contains
1416       pointers to a number of routines that may be useful to you.  The major‐
1417       ity of these routines deal with manipulating piddles, but some are more
1418       general:
1419
1420       PDL->qsort_B( PDL_Byte *xx, int a, int b )
1421           Sort the array "xx" between the indices "a" and "b".  There are
1422           also versions for the other PDL datatypes, with postfix "_S", "_U",
1423           "_L", "_F", and "_D".  Any module using this must ensure that
1424           "PDL::Ufunc" is loaded.
1425
1426       PDL->qsort_ind_B( PDL_Byte *xx, int *ix, int a, int b )
1427           As for "PDL->qsort_B", but this time sorting the indices rather
1428           than the data.
1429
1430       The routine "med2d" in Lib/Image2D/image2d.pd shows how such routines
1431       are used.
1432

MAKEFILES FOR PP FILES

1434       If you are going to generate a package from your PP file (typical file
1435       extensions are ".pd" or ".pp" for the files containing PP code) it is
1436       easiest and safest to leave generation of the appropriate commands to
1437       the Makefile. In the following we will outline the typical format of a
1438       perl Makefile to automatically build and install your package from a
1439       description in a PP file. Most of the rules to build the xs, pm and
1440       other required files from the PP file are already predefined in the
1441       PDL::Core::Dev package. We just have to tell MakeMaker to use it.
1442
1443       In most cases you can define your Makefile like
1444
1445         # Makefile.PL for a package defined by PP code.
1446
1447         use PDL::Core::Dev;            # Pick up development utilities
1448         use ExtUtils::MakeMaker;
1449
1450         $package = ["mylib.pd",Mylib,PDL::Lib::Mylib];
1451         %hash = pdlpp_stdargs($package);
1452         $hash{OBJECT} .= ' additional_Ccode$(OBJ_EXT) ';
1453         $hash{clean}->{FILES} .= ' todelete_Ccode$(OBJ_EXT) ';
1454         $hash{'VERSION_FROM'} = 'mylib.pd';
1455         WriteMakefile(%hash);
1456
1457         sub MY::postamble { pdlpp_postamble($package); }
1458
1459       Here, the list in $package is: first: PP source file name, then the
1460       prefix for the produced files and finally the whole package name.  You
1461       can modify the hash in whatever way you like but it would be reasonable
1462       to stay within some limits so that your package will continue to work
1463       with later versions of PDL.
1464
1465       If you don't want to use prepackaged arguments, here is a generic Make‐
1466       file.PL that you can adapt for your own needs:
1467
1468         # Makefile.PL for a package defined by PP code.
1469
1470         use PDL::Core::Dev;            # Pick up development utilities
1471         use ExtUtils::MakeMaker;
1472
1473         WriteMakefile(
1474          'NAME'       => 'PDL::Lib::Mylib',
1475          'VERSION_FROM'       => 'mylib.pd',
1476          'TYPEMAPS'     => [&PDL_TYPEMAP()],
1477          'OBJECT'       => 'mylib$(OBJ_EXT) additional_Ccode$(OBJ_EXT)',
1478          'PM'         => { 'Mylib.pm'            => '$(INST_LIBDIR)/Mylib.pm'},
1479          'INC'          => &PDL_INCLUDE(), # add include dirs as required by your lib
1480          'LIBS'         => [''],   # add link directives as necessary
1481          'clean'        => {'FILES'  =>
1482                                 'Mylib.pm Mylib.xs Mylib$(OBJ_EXT)
1483                                 additional_Ccode$(OBJ_EXT)'},
1484         );
1485
1486         # Add genpp rule; this will invoke PDL::PP on our PP file
1487         # the argument is an array reference where the array has three string elements:
1488         #   arg1: name of the source file that contains the PP code
1489         #   arg2: basename of the xs and pm files to be generated
1490         #   arg3: name of the package that is to be generated
1491         sub MY::postamble { pdlpp_postamble(["mylib.pd",Mylib,PDL::Lib::Mylib]); }
1492
1493       To make life even easier PDL::Core::Dev defines the function
1494       "pdlpp_stdargs" that returns a hash with default values that can be
1495       passed (either directly or after appropriate modification) to a call to
1496       WriteMakefile.  Currently, "pdlpp_stdargs" returns a hash where the
1497       keys are filled in as follows:
1498
1499               (
1500                'NAME'         => $mod,
1501                'TYPEMAPS'     => [&PDL_TYPEMAP()],
1502                'OBJECT'       => "$pref\$(OBJ_EXT)",
1503                PM     => {"$pref.pm" => "\$(INST_LIBDIR)/$pref.pm"},
1504                MAN3PODS => {"$src" => "\$(INST_MAN3DIR)/$mod.\$(MAN3EXT)"},
1505                'INC'          => &PDL_INCLUDE(),
1506                'LIBS'         => [''],
1507                'clean'        => {'FILES'  => "$pref.xs $pref.pm $pref\$(OBJ_EXT)"},
1508               )
1509
1510       Here, $src is the name of the source file with PP code, $pref the pre‐
1511       fix for the generated .pm and .xs files and $mod the name of the exn‐
1512       tension module to generate.
1513

INTERNALS

1515       The internals of the current version consist of a large table which
1516       gives the rules according to which things are translated and the subs
1517       which implement these rules.
1518
1519       Later on, it would be good to make the table modifiable by the user so
1520       that different things may be tried.
1521
1522       [Meta comment: here will hopefully be more in the future; currently,
1523       your best bet will be to read the source code :-( or ask on the list
1524       (try the latter first) ]
1525

Appendix A: Some keys recognised by PDL::PP

1527       Unless otherwise specified, the arguments are strings. Keys marked with
1528       (bad) are only used if bad-value support is compiled into PDL.
1529
1530       Pars
1531           define the signature of your function
1532
1533       OtherPars
1534           arguments which are not pdls. Default: nothing.
1535
1536       Code
1537           the actual code that implements the functionality; several PP
1538           macros and PP functions are recognised in the string value
1539
1540       HandleBad (bad)
1541           If set to 1, the routine is assumed to support bad values and the
1542           code in the BadCode key is used if bad values are present; it also
1543           sets things up so that the "$ISBAD()" etc macros can be used.  If
1544           set to 0, cause the routine to print a warning if any of the input
1545           piddles have their bad flag set.
1546
1547       BadCode (bad)
1548           Give the code to be used if bad values may be present in the input
1549           piddles.  Only used if "HandleBad => 1".
1550
1551       GenericTypes
1552           An array reference. The array may contain any subset of the strings
1553           `B', `S', `U', `L', `F' and `D', which specify which types your
1554           operation will accept.  This is very useful (and important!) when
1555           interfacing an external library.  Default: [qw/B S U L F D/]
1556
1557       Inplace
1558           Mark a function as being able to work inplace.
1559
1560            Inplace => 1          if  Pars => 'a(); [o]b();'
1561            Inplace => ['a']      if  Pars => 'a(); b(); [o]c();'
1562            Inplace => ['a','b']  if  Pars => 'a(); b(); [o]c(); [o]d();'
1563
1564           If bad values are being used, care must be taken to ensure the pro‐
1565           pogation of the badflag when inplace is being used; for instance
1566           see the code for "replacebad" in Basic/Bad/bad.pd.
1567
1568       Doc Used to specify a documentation string in Pod format. See PDL::Doc
1569           for information on PDL documentation conventions. Note: in the spe‐
1570           cial case where the PP 'Doc' string is one line this is implicitly
1571           used for the quick reference AND the documentation!
1572
1573           If the Doc field is omitted PP will generate default documentation
1574           (after all it knows about the Signature).
1575
1576           If you really want the function NOT to be documented in any way at
1577           this point (e.g. for an internal routine, or because youu are doing
1578           it elsewhere in the code) explictly specify "Doc=>undef".
1579
1580       BadDoc (bad)
1581           Contains the text returned by the "badinfo" command (in "perldl")
1582           or the "-b" switch to the "pdldoc" shell script. In many cases, you
1583           will not need to specify this, since the information can be auto‐
1584           matically created by PDL::PP. However, as befits computer-generated
1585           text, it's rather stilted; it may be much better to do it yourself!
1586

Appendix B: PP macros and functions

1588       Macros
1589
1590       Macros labelled by (bad) are only used if bad-value support is compiled
1591       into PDL.
1592
1593       $variablename_from_sig()
1594              access a pdl (by its name) that was specified in the signature
1595
1596       $COMP(x)
1597              access a value in the private data structure of this transforma‐
1598              tion (mainly used to use an argument that is specified in the
1599              "OtherPar" section)
1600
1601       $SIZE(n)
1602              replaced at runtime by the actual size of a named dimension (as
1603              specified in the signature)
1604
1605       $GENERIC()
1606              replaced by the C type that is equal to the runtime type of the
1607              operation
1608
1609       $P(a)  a pointer access to the PDL named "a" in the signature. Useful
1610              for interfacing to C functions
1611
1612       $PP(a) a physical pointer access to pdl "a"; mainly for internal use
1613
1614       $TXXX(Alternative,Alternative)
1615              expansion alternatives according to runtime type of operation,
1616              where XXX is some string that is matched by "/[BSULFD+]/".
1617
1618       $PDL(a)
1619              return a pointer to the pdl data structure (pdl *) of piddle "a"
1620
1621       $ISBAD(a()) (bad)
1622              returns true if the value stored in "a()" equals the bad value
1623              for this piddle.  Requires "HandleBad" being set to 1.
1624
1625       $ISGOOD(a()) (bad)
1626              returns true if the value stored in "a()" does not equal the bad
1627              value for this piddle.  Requires "HandleBad" being set to 1.
1628
1629       $SETBAD(a()) (bad)
1630              Sets "a()" to equal the bad value for this piddle.  Requires
1631              "HandleBad" being set to 1.
1632
1633       functions
1634
1635       "loop(DIMS) %{ ... %}"
1636          loop over named dimensions; limits are generated automatically by PP
1637
1638       "threadloop %{ ... %}"
1639          enclose following code in a threadloop
1640
1641       "types(TYPES) %{ ... %}"
1642          execute following code if type of operation is any of "TYPES"
1643

CURRENTLY UNDOCUMENTED

1656       RedoDimsCode, $RESIZE()
1657

BUGS

1659       PDL::PP is still, even in its rewritten form, too complicated.  It
1660       needs to be rethought a little as well as deconvoluted and modularized
1661       some more (e.g. all the NS things).
1662
1663       After the rewrite, this can happen a little by little, though.
1664
1665       Undocumented functions
1666
1667       The following functions have been added since this manual was written
1668       and are as yet undocumented
1669
1670       pp_export_nothing
1671       pp_core_importList
1672       pp_beginwrap
1673       pp_setversion
1674       pp_addbegin
1675

AUTHOR

1677       Copyright(C) 1997 Tuomas J. Lukka (lukka@fas.harvard.edu), Karl Glaaze‐
1678       brook (kgb@aaocbn1.aao.GOV.AU) and Christian Soeller (c.soeller@auck‐
1679       land.ac.nz). All rights reserved. Although destined for release as a
1680       man page with the standard PDL distribution, it is not public domain.
1681       Permission is granted to freely distribute verbatim copies of this doc‐
1682       ument provided that no modifications outside of formatting be made, and
1683       that this notice remain intact.  You are permitted and encouraged to
1684       use its code and derivatives thereof in your own source code for fun or
1685       for profit as you see fit.
1686
1687
1688
1689perl v5.8.8                       2003-05-21                             PP(1)