PDL::PP(1)

1PP(1)                 User Contributed Perl Documentation                PP(1)
2
3
4

NAME

6       PDL::PP - Generate PDL routines from concise descriptions
7

SYNOPSIS

9       e.g.
10
11               pp_def(
12                       'sumover',
13                       Pars => 'a(n); [o]b();',
14                       Code => 'double tmp=0;
15                               loop(n) %{ tmp += $a(); %}
16                               $b() = tmp;
17                               '
18               );
19
20               pp_done();
21

DESCRIPTION

23       In much of what follows we will assume familiarity of the reader with
24       the concepts of implicit and explicit threading and index manipulations
25       within PDL. If you have not yet heard of these concepts or are not very
26       comfortable with them it is time to check PDL::Indexing.
27
28       As you may appreciate from its name PDL::PP is a Pre-Processor, i.e.
29       it expands code via substitutions to make real C-code (well, actually
30       it outputs XS code (See perlxs) but that is very close to C).
31

Overview

33       Why do we need PP? Several reasons: firstly, we want to be able to
34       generate subroutine code for each of the PDL datatypes (PDL_Byte,
35       PDL_Short,. etc).  AUTOMATICALLY.  Secondly, when referring to slices
36       of PDL arrays in Perl (e.g. "$a->slice('0:10:2,:')" or other things
37       such as transposes) it is nice to be able to do this transparently and
38       to be able to do this 'in-place' - i.e, not to have to make a memory
39       copy of the section. PP handles all the necessary element and offset
40       arithmetic for you. There are also the notions of threading (repeated
41       calling of the same routine for multiple slices, see PDL::Indexing) and
42       dataflow (see PDL::Dataflow) which use of PP allows.
43
44       So how do you use PP? Well for the most part you just write ordinary C
45       code except for special PP constructs which take the form:
46
47          $something(something else)
48
49       or:
50
51          PPfunction %{
52            <stuff>
53          %}
54
55       The most important PP construct is the form "$array()". Consider the
56       very simple PP function to sum the elements of a 1D vector (in fact
57       this is very similar to the actual code used by 'sumover'):
58
59          pp_def('sumit',
60                  Pars => 'a(n);  [o]b();',
61                  Code => '
62                       double tmp;
63                       tmp = 0;
64                       loop(n) %{
65                         tmp += $a();
66                       %}
67                       $b() = tmp;
68          ');
69
70       What's going on? The "Pars =>" line is very important for PP - it
71       specifies all the arguments and their dimensionality. We call this the
72       signature of the PP function (compare also the explanations in
73       PDL::Indexing).  In this case the routine takes a 1-D function as input
74       and returns a 0-D scalar as output.  The "$a()" PP construct is used to
75       access elements of the array a(n) for you - PP fills in all the
76       required C code.
77
78       [Aside: since PP used "$var()" for its parsing you must single-quote
79       all Code=> arguments since you don't want perl to interpolate "$var()"
80       into another string - i.e. don't use "" unless you know what you are
81       doing! Tjl: it's usually easiest to use single quotes and
82       'something'.$interpolatable.'somethingelse']
83
84       In the simple case here where all elements are accessed the PP
85       construct "loop(n) %{ ... %}" is used to loop over all elements in
86       dimension "n".  Note this feature of PP: ALL DIMENSIONS ARE SPECIFIED
87       BY NAME.
88
89       This is made clearer if we avoid the PP loop() construct and write the
90       loop explicitly using conventional C:
91
92          pp_def('sumit',
93                  Pars => 'a(n);  [o]b();',
94                  Code => '
95                       int i,n_size;
96                       double tmp;
97                       n_size = $SIZE(n);
98                       tmp = 0;
99                       for(i=0; i<n_size; i++) {
100                         tmp += $a(n=>i);
101                       }
102                       $b() = tmp;
103          ');
104
105       which does the same as before, except more long-windedly.  You can see
106       to get element "i" of a() we say "$a(n=>i)" - we are specifying the
107       dimension by name "n". In 2D we might say:
108
109          Pars=>'a(m,n);',
110             ...
111             tmp += $a(m=>i,n=>j);
112             ...
113
114       The syntax 'm=>i' borrows from Perl hashes (which are in fact used in
115       the implementation of PP). One could also say "$a(n=>j,m=>i)" as order
116       is not important.
117
118       You can also see in the above example the use of another PP construct -
119       $SIZE(n) to get the length of the dimension "n".
120
121       It should, however, be noted that you shouldn't write an explicit
122       C-loop when you could have used the PP "loop" construct since PDL::PP
123       checks automatically the loop limits for you, usage of "loop" makes the
124       code more concise, etc. But there are certainly situations where you
125       need explicit control of the loop and now you know how to do it ;).
126
127       To revisit 'Why PP?' - the above code for sumit() will be generated for
128       each data-type. It will operate on slices of arrays 'in-place'. It will
129       thread automatically - e.g. if a 2D array is given it will be called
130       repeatedly for each 1D row (again check PDL::Indexing for the details
131       of threading).  And then b() will be a 1D array of sums of each row.
132       We could call it with $a->xchg(0,1) to sum the colums instead.  And
133       Dataflow tracing etc. will be available.
134
135       You can see PP saves the programmer from writing a lot of needlessly
136       repetitive C-code -- in our opinion this is one of the best features of
137       PDL making writing new C subroutines for PDL an amazingly concise
138       exercise. A second reason is the ability to make PP expand your concise
139       code definitions into different C code based on the needs of the
140       computer architecture in question. Imagine for example you are lucky to
141       have a supercomputer at your hands; in that case you want PDL::PP
142       certainly to generate code that takes advantage of the
143       vectorising/parallel computing features of your machine (this a project
144       for the future). In any case, the bottom line is that your unchanged
145       code should still expand to working XS code even if the internals of
146       PDL changed.
147
148       Also, because you are generating the code in an actual Perl script,
149       there are many fun things that you can do. Let's say that you need to
150       write both sumit (as above) and multit. With a little bit of
151       inventivity, we can do
152
153          for({Name => 'sumit', Init => '0', Op => '+='},
154              {Name => 'multit', Init => '1', Op => '*='}) {
155                  pp_def($_->{Name},
156                          Pars => 'a(n);  [o]b();',
157                          Code => '
158                               double tmp;
159                               tmp = '.$_->{Init}.';
160                               loop(n) %{
161                                 tmp '.$_->{Op}.' $a();
162                               %}
163                               $b() = tmp;
164                  ');
165          }
166
167       which defines both the functions easily. Now, if you later need to
168       change the signature or dimensionality or whatever, you only need to
169       change one place in your code.  Yeah, sure, your editor does have 'cut
170       and paste' and 'search and replace' but it's still less bothersome and
171       definitely more difficult to forget just one place and have strange
172       bugs creep in.  Also, adding 'orit' (bitwise or) later is a one-liner.
173
174       And remember, you really have perl's full abilities with you - you can
175       very easily read any input file and make routines from the information
176       in that file. For simple cases like the above, the author (Tjl)
177       currently favors the hash syntax like the above - it's not too much
178       more characters than the corresponding array syntax but much easier to
179       understand and change.
180
181       We should mention here also the ability to get the pointer to the
182       beginning of the data in memory - a prerequisite for interfacing PDL to
183       some libraries. This is handled with the "$P(var)" directive, see
184       below.
185
186       So, after this quick overview of the general flavour of programming PDL
187       routines using PDL::PP let's summarise in which circumstances you
188       should actually use this preprocessor/precompiler. You should use
189       PDL::PP if you want to
190
191       ·  interface PDL to some external library
192
193       ·  write some algorithm that would be slow if coded in perl (this is
194          not as often as you think; take a look at threading and dataflow
195          first).
196
197       ·  be a PDL developer (and even then it's not obligatory)
198

WARNING

200       Because of its architecture, PDL::PP can be both flexible and easy to
201       use (yet exuberantly complicated) at the same time. Currently, part of
202       the problem is that error messages are not very informative and if
203       something goes wrong, you'd better know what you are doing and be able
204       to hack your way through the internals (or be able to figure out by
205       trial and error what is wrong with your args to "pp_def").
206
207       An alternative, of course, is to ask someone about it (e.g., through
208       the mailing lists).
209

ABANDON ALL HOPE, YE WHO ENTER HERE (DESCRIPTION)

211       Now that you have some idea how to use "pp_def" to define new PDL
212       functions it is time to explain the general syntax of "pp_def".
213       "pp_def" takes as arguments first the name of the function you are
214       defining and then a hash list that can contain various keys.
215
216       Based on these keys PP generates XS code and a .pm file. The function
217       "pp_done" (see example in the SYNOPSIS) is used to tell PDL::PP that
218       there are no more definitions in this file and it is time to generate
219       the .xs and
220        .pm file.
221
222       As a consequence, there may be several pp_def() calls inside a file (by
223       convention files with PP code have the extension .pd or .pp) but
224       generally only one pp_done().
225
226       There are two main different types of usage of pp_def(), the 'data
227       operation' and 'slice operation' prototypes.
228
229       The 'data operation' is used to take some data, mangle it and output
230       some other data; this includes for example the '+' operation, matrix
231       inverse, sumover etc and all the examples we have talked about in this
232       document so far. Implicit and explicit threading and the creation of
233       the result are taken care of automatically in those opeartions. You can
234       even do dataflow with "sumit", "sumover", etc (don't be dismayed if you
235       don't understand the concept of dataflow in PDL very well yet; it is
236       still very much experimental).
237
238       The 'slice operation' is a different kind of operation: in a slice
239       operation, you are not changing any data, you are defining
240       correspondences between different elements of two piddles (examples
241       include the index manipulation/slicing function definitions in the file
242       slices.pd that is part of the PDL distribution; but beware, this is not
243       introductory level stuff).
244
245       If PDL was compiled with support for bad values (ie "WITH_BADVAL =>
246       1"), then additional keys are required for "pp_def", as explained
247       below.
248
249       If you are just interested in communicating with some external library
250       (for example some linear algebra/matrix library), you'll usually want
251       the 'data operation' so we are going to discuss that first.
252

Data operation

254   A simple example
255       In the data operation, you must know what dimensions of data you need.
256       First, an example with scalars:
257
258               pp_def('add',
259                       Pars => 'a(); b(); [o]c();',
260                       Code => '$c() = $a() + $b();'
261               );
262
263       That looks a little strange but let's dissect it. The first line is
264       easy: we're defining a routine with the name 'add'.  The second line
265       simply declares our parameters and the parentheses mean that they are
266       scalars. We call the string that defines our parameters and their
267       dimensionality the signature of that function. For its relevance with
268       regard to threading and index manipulations check the PDL::Indexing
269       manpage.
270
271       The third line is the actual operation. You need to use the dollar
272       signs and parentheses to refer to your parameters (this will probably
273       change at some point in the future, once a good syntax is found).
274
275       These lines are all that is necessary to actually define the function
276       for PDL (well, actually it isn't; you aditionally need to write a
277       Makefile.PL (see below) and build the module (something like 'perl
278       Makefile.PL; make'); but let's ignore that for the moment). So now you
279       can do
280
281               use MyModule;
282               $a = pdl 2,3,4;
283               $b = pdl 5;
284
285               $c = add($a,$b);
286               # or
287               add($a,$b,($c=null)); # Alternative form, useful if $c has been
288                                     # preset to something big, not useful here.
289
290       and have threading work correctly (the result is $c == [7 8 9]).
291
292   The Pars section: the signature of a PP function
293       Seeing the above example code you will most probably ask: what is this
294       strange "$c=null" syntax in the second call to our new "add" function?
295       If you take another look at the definition of "add" you will notice
296       that the third argument "c" is flagged with the qualifier "[o]" which
297       tells PDL::PP that this is an output argument. So the above call to add
298       means 'create a new $c from scratch with correct dimensions' - "null"
299       is a special token for 'empty piddle' (you might ask why we haven't
300       used the value "undef" to flag this instead of the PDL specific "null";
301       we are currently thinking about it ;).
302
303       [This should be explained in some other section of the manual as
304       well!!]  The reason for having this syntax as an alternative is that if
305       you have really huge piddles, you can do
306
307               $c = PDL->null;
308               for(some long loop) {
309                       # munge a,b
310                       add($a,$b,$c);
311                       # munge c, put something back to a,b
312               }
313
314       and avoid allocating and deallocating $c each time. It is allocated
315       once at the first add() and thereafter the memory stays until $c is
316       destroyed.
317
318       If you just say
319
320         $c =  add($a,$b);
321
322       the code generated by PP will automatically fill in "$c=null" and
323       return the result. If you want to learn more about the reasons why
324       PDL::PP supports this style where output arguments are given as last
325       arguments check the PDL::Indexing manpage.
326
327       "[o]" is not the only qualifier a pdl argument can have in the
328       signature.  Another important qualifier is the "[t]" option which flags
329       a pdl as temporary.  What does that mean? You tell PDL::PP that this
330       pdl is only used for temporary results in the course of the calculation
331       and you are not interested in its value after the computation has been
332       completed. But why should PDL::PP want to know about this in the first
333       place?  The reason is closely related to the concepts of pdl auto
334       creation (you heard about that above) and implicit threading. If you
335       use implicit threading the dimensionality of automatically created pdls
336       is actually larger than that specified in the signature. With "[o]"
337       flagged pdls will be created so that they have the additional
338       dimensions as required by the number of implicit thread dimensions.
339       When creating a temporary pdl, however, it will always only be made big
340       enough so that it can hold the result for one iteration in a
341       threadloop, i.e. as large as required by the signature.  So less memory
342       is wasted when you flag a pdl as temporary. Secondly, you can use
343       output auto creation with temporary pdls even when you are using
344       explicit threading which is forbidden for normal output pdls flagged
345       with "[o]" (see PDL::Indexing).
346
347       Here is an example where we use the [t] qualifier. We define the
348       function "callf" that calls a C routine "f" which needs a temporary
349       array of the same size and type as the array "a" (sorry about the
350       forward reference for $P; it's a pointer access, see below) :
351
352         pp_def('callf',
353               Pars => 'a(n); [t] tmp(n); [o] b()',
354               Code => 'int ns = $SIZE(n);
355                        f($P(a),$P(b),$P(tmp),ns);
356                       '
357         );
358
359   Argument dimensions and the signature
360       Now we have just talked about dimensions of pdls and the signature. How
361       are they related? Let's say that we want to add a scalar + the index
362       number to a vector:
363
364               pp_def('add2',
365                       Pars => 'a(n); b(); [o]c(n);',
366                       Code => 'loop(n) %{
367                                       $c() = $a() + $b() + n;
368                                %}'
369               );
370
371       There are several points to notice here: first, the "Pars" argument now
372       contains the n arguments to show that we have a single dimensions in a
373       and c. It is important to note that dimensions are actual entities that
374       are accessed by name so this declares a and c to have the same first
375       dimensions. In most PP definitions the size of named dimensions will be
376       set from the respective dimensions of non-output pdls (those with no
377       "[o]" flag) but sometimes you might want to set the size of a named
378       dimension explicitly through an integer parameter. See below in the
379       description of the "OtherPars" section how that works.
380
381   Type conversions and the signature
382       The signature also determines the type conversions that will be
383       performed when a PP function is invoked. So what happens when we invoke
384       one of our previously defined functions with pdls of different type,
385       e.g.
386
387         add2($a,$b,($ret=null));
388
389       where $a is of type "PDL_Float" and $b of type "PDL_Short"? With the
390       signature as shown in the definition of "add2" above the datatype of
391       the operation (as determined at runtime) is that of the pdl with the
392       'highest' type (sequence is byte < short < ushort < long < float <
393       double). In the add2 example the datatype of the operation is float ($a
394       has that datatype). All pdl arguments are then type converted to that
395       datatype (they are not converted inplace but a copy with the right type
396       is created if a pdl argument doesn't have the type of the operation).
397       Null pdls don't contribute a type in the determination of the type of
398       the operation.  However, they will be created with the datatype of the
399       operation; here, for example, $ret will be of type float. You should be
400       aware of these rules when calling PP functions with pdls of different
401       types to take the additional storage and runtime requirements into
402       account.
403
404       These type conversions are correct for most functions you normally
405       define with "pp_def". However, there are certain cases where slightly
406       modified type conversion behaviour is desired. For these cases
407       additional qualifiers in the signature can be used to specify the
408       desired properties with regard to type conversion. These qualifiers can
409       be combined with those we have encountered already (the creation
410       qualifiers "[o]" and "[t]"). Let's go through the list of qualifiers
411       that change type conversion behaviour.
412
413       The most important is the "int" qualifier which comes in handy when a
414       pdl argument represents indices into another pdl. Let's take a look at
415       an example from "PDL::Ufunc":
416
417          pp_def('maximum_ind',
418                 Pars => 'a(n); int [o] b()',
419                 Code => '$GENERIC() cur;
420                          int curind;
421                          loop(n) %{
422                           if (!n || $a() > cur) {cur = $a(); curind = n;}
423                          %}
424                          $b() = curind;',
425          );
426
427       The function "maximum_ind" finds the index of the largest element of a
428       vector. If you look at the signature you notice that the output
429       argument "b" has been declared with the additional "int" qualifier.
430       This has the following consequences for type conversions: regardless of
431       the type of the input pdl "a" the output pdl "b" will be of type
432       "PDL_Long" which makes sense since "b" will represent an index into
433       "a". Furthermore, if you call the function with an existing output pdl
434       "b" its type will not influence the datatype of the operation (see
435       above). Hence, even if "a" is of a smaller type than "b" it will not be
436       converted to match the type of "b" but stays untouched, which saves
437       memory and CPU cycles and is the right thing to do when "b" represents
438       indices. Also note that you can use the 'int' qualifier together with
439       other qualifiers (the "[o]" and "[t]" qualifiers). Order is significant
440       -- type qualifiers precede creation qualifiers ("[o]" and "[t]").
441
442       The above example also demonstrates typical usage of the "$GENERIC()"
443       macro.  It expands to the current type in a so called generic loop.
444       What is a generic loop? As you already heard a PP function has a
445       runtime datatype as determined by the type of the pdl arguments it has
446       been invoked with.  The PP generated XS code for this function
447       therefore contains a switch like "switch (type) {case PDL_Byte: ...
448       case PDL_Double: ...}" that selects a case based on the runtime
449       datatype of the function (it's called a type ``loop'' because there is
450       a loop in PP code that generates the cases).  In any case your code is
451       inserted once for each PDL type into this switch statement. The
452       "$GENERIC()" macro just expands to the respective type in each copy of
453       your parsed code in this "switch" statement, e.g., in the "case
454       PDL_Byte" section "cur" will expand to "PDL_Byte" and so on for the
455       other case statements. I guess you realise that this is a useful macro
456       to hold values of pdls in some code.
457
458       There are a couple of other qualifiers with similar effects as "int".
459       For your convenience there are the "float" and "double" qualifiers with
460       analogous consequences on type conversions as "int". Let's assume you
461       have a very large array for which you want to compute row and column
462       sums with an equivalent of the "sumover" function.  However, with the
463       normal definition of "sumover" you might run into problems when your
464       data is, e.g. of type short. A call like
465
466         sumover($large_pdl,($sums = null));
467
468       will result in $sums be of type short and is therefore prone to
469       overflow errors if $large_pdl is a very large array. On the other hand
470       calling
471
472         @dims = $large_pdl->dims; shift @dims;
473         sumover($large_pdl,($sums = zeroes(double,@dims)));
474
475       is not a good alternative either. Now we don't have overflow problems
476       with $sums but at the expense of a type conversion of $large_pdl to
477       double, something bad if this is really a large pdl. That's where
478       "double" comes in handy:
479
480         pp_def('sumoverd',
481                Pars => 'a(n); double [o] b()',
482                Code => 'double tmp=0;
483                         loop(n) %{ tmp += a(); %}
484                         $b() = tmp;',
485         );
486
487       This gets us around the type conversion and overflow problems. Again,
488       analogous to the "int" qualifier "double" results in "b" always being
489       of type double regardless of the type of "a" without leading to a
490       typeconversion of "a" as a side effect.
491
492       Finally, there are the "type+" qualifiers where type is one of "int" or
493       "float". What shall that mean. Let's illustrate the "int+" qualifier
494       with the actual definition of sumover:
495
496         pp_def('sumover',
497                Pars => 'a(n); int+ [o] b()',
498                Code => '$GENERIC(b) tmp=0;
499                         loop(n) %{ tmp += a(); %}
500                         $b() = tmp;',
501         );
502
503       As we had already seen for the "int", "float" and "double" qualifiers,
504       a pdl marked with a "type+" qualifier does not influence the datatype
505       of the pdl operation. Its meaning is "make this pdl at least of type
506       "type" or higher, as required by the type of the operation". In the
507       sumover example this means that when you call the function with an "a"
508       of type PDL_Short the output pdl will be of type PDL_Long (just as
509       would have been the case with the "int" qualifier). This again tries to
510       avoid overflow problems when using small datatypes (e.g. byte images).
511       However, when the datatype of the operation is higher than the type
512       specified in the "type+" qualifier "b" will be created with the
513       datatype of the operation, e.g. when "a" is of type double then "b"
514       will be double as well. We hope you agree that this is sensible
515       behaviour for "sumover". It should be obvious how the "float+"
516       qualifier works by analogy.  It may become necessary to be able to
517       specify a set of alternative types for the parameters. However, this
518       will probably not be implemented until someone comes up with a
519       reasonable use for it.
520
521       Note that we now had to specify the $GENERIC macro with the name of the
522       pdl to derive the type from that argument. Why is that? If you
523       carefully followed our explanations you will have realised that in some
524       cases "b" will have a different type than the type of the operation.
525       Calling the '$GENERIC' macro with "b" as argument makes sure that the
526       type will always the same as that of "b" in that part of the generic
527       loop.
528
529       This is about all there is to say about the "Pars" section in a
530       "pp_def" call. You should remember that this section defines the
531       signature of a PP defined function, you can use several options to
532       qualify certain arguments as output and temporary args and all
533       dimensions that you can later refer to in the "Code" section are
534       defined by name.
535
536       It is important that you understand the meaning of the signature since
537       in the latest PDL versions you can use it to define threaded functions
538       from within perl, i.e. what we call perl level threading. Please check
539       PDL::Indexing for details.
540
541   The Code section
542       The "Code" section contains the actual XS code that will be in the
543       innermost part of a threadloop (if you don't know what a thread loop is
544       then you still haven't read PDL::Indexing; do it now ;) after any PP
545       macros (like $GENERIC) and PP functions have been expanded (like the
546       "loop" function we are going to explain next).
547
548       Let's quickly reiterate the "sumover" example:
549
550         pp_def('sumover',
551                Pars => 'a(n); int+ [o] b()',
552                Code => '$GENERIC(b) tmp=0;
553                         loop(n) %{ tmp += a(); %}
554                         $b() = tmp;',
555         );
556
557       The "loop" construct in the "Code" section also refers to the dimension
558       name so you don't need to specify any limits: the loop is correctly
559       sized and everything is done for you, again.
560
561       Next, there is the surprising fact that "$a()" and "$b()" do not
562       contain the index. This is not necessary because we're looping over n
563       and both variables know which dimensions they have so they
564       automatically know they're being looped over.
565
566       This feature comes in very handy in many places and makes for much
567       shorter code. Of course, there are times when you want to circumvent
568       this; here is a function which symmetrizes a matrix and serves as an
569       example of how to code explicit looping:
570
571               pp_def('symm',
572                       Pars => 'a(n,n); [o]c(n,n);',
573                       Code => 'loop(n) %{
574                                       int n2;
575                                       for(n2=n; n2<$SIZE(n); n2++) {
576                                               $c(n0 => n, n1 => n2) =
577                                               $c(n0 => n2, n1 => n) =
578                                                $a(n0 => n, n1 => n2);
579                                       }
580                               %}
581                       '
582               );
583
584       Let's dissect what is happening. Firstly, what is this function
585       supposed to do? From its signature you see that it takes a 2D matrix
586       with equal numbers of columns and rows and outputs a matrix of the same
587       size. From a given input matrix $a it computes a symmetric output
588       matrix $c (symmetric in the matrix sense that A^T = A where ^T means
589       matrix transpose, or in PDL parlance $c == $c->xchg(0,1)). It does this
590       by using only the values on and below the diagonal of $a. In the output
591       matrix $c all values on and below the diagonal are the same as those in
592       $a while those above the diagonal are a mirror image of those below the
593       diagonal (above and below are here interpreted in the way that PDL
594       prints 2D pdls). If this explanation still sounds a bit strange just go
595       ahead, make a little file into which you write this definition, build
596       the new PDL extension (see section on Makefiles for PP code) and try it
597       out with a couple of examples.
598
599       Having explained what the function is supposed to do there are a couple
600       of points worth noting from the syntactical point of view. First, we
601       get the size of the dimension named "n" again by using the $SIZE macro.
602       Second, there are suddenly these funny "n0" and "n1" index names in the
603       code though the signature defines only the dimension "n". Why this? The
604       reason becomes clear when you note that both the first and second
605       dimension of $a and $b are named "n" in the signature of "symm". This
606       tells PDL::PP that the first and second dimension of these arguments
607       should have the same size. Otherwise the generated function will raise
608       a runtime error.  However, now in an access to $a and $c PDL::PP cannot
609       figure out which index "n" refers to any more just from the name of the
610       index.  Therefore, the indices with equal dimension names get numbered
611       from left to right starting at 0, e.g. in the above example "n0" refers
612       to the first dimension of $a and $c, "n1" to the second and so on.
613
614       In all examples so far, we have only used the "Pars" and "Code" members
615       of the hash that was passed to "pp_def". There are certainly other keys
616       that are recognised by PDL::PP and we will hear about some of them in
617       the course of this document. Find a (non-exhaustive) list of keys in
618       Appendix A.  A list of macros and PPfunctions (we have only encountered
619       some of those in the examples above yet) that are expanded in values of
620       the hash argument to "pp_def" is summarised in Appendix B.
621
622       At this point, it might be appropriate to mention that PDL::PP is not a
623       completely static, well designed set of routines (as Tuomas puts it:
624       "stop thinking of PP as a set of routines carved in stone") but rather
625       a collection of things that the PDL::PP author (Tuomas J. Lukka)
626       considered he would have to write often into his PDL extension
627       routines. PP tries to be expandable so that in the future, as new needs
628       arise, new common code can be abstracted back into it. If you want to
629       learn more on why you might want to change PDL::PP and how to do it
630       check the section on PDL::PP internals.
631
632   Handling bad values
633       If you do not have bad-value support compiled into PDL you can ignore
634       this section and the related keys: "BadCode", "HandleBad", ...  (try
635       printing out the value of $PDL::Bad::Status - if it equals 0 then move
636       straight on).
637
638       There are several keys and macros used when writing code to handle bad
639       values. The first one is the "HandleBad" key:
640
641       HandleBad => 0
642           This flags a pp-routine as NOT handling bad values. If this routine
643           is sent piddles with their "badflag" set, then a warning message is
644           printed to STDOUT and the piddles are processed as if the value
645           used to represent bad values is a valid number. The "badflag" value
646           is not propogated to the output piddles.
647
648           An example of when this is used is for FFT routines, which
649           generally do not have a way of ignoring part of the data.
650
651       HandleBad => 1
652           This causes PDL::PP to write extra code that ensures the BadCode
653           section is used, and that the "$ISBAD()" macro (and its brethren)
654           work.
655
656       HandleBad is not given
657           If any of the input piddles have their "badflag" set, then the
658           output piddles will have their "badflag" set, but any supplied
659           BadCode is ignored.
660
661       The value of "HandleBad" is used to define the contents of the "BadDoc"
662       key, if it is not given.
663
664       To handle bad values, code must be written somewhat differently; for
665       instance,
666
667        $c() = $a() + $b();
668
669       becomes something like
670
671        if ( $a() != BADVAL && $b() != BADVAL ) {
672           $c() = $a() + $b();
673        } else {
674           $c() = BADVAL;
675        }
676
677       However, we only want the second version if bad values are present in
678       the input piddles (and that bad-value support is wanted!) - otherwise
679       we actually want the original code. This is where the "BadCode" key
680       comes in; you use it to specify the code to execute if bad values may
681       be present, and PP uses both it and the "Code" section to create
682       something like:
683
684        if ( bad_values_are_present ) {
685           fancy_threadloop_stuff {
686              BadCode
687           }
688        } else {
689           fancy_threadloop_stuff {
690              Code
691           }
692        }
693
694       This approach means that there is virtually no overhead when bad values
695       are not present (ie the badflag routine returns 0).
696
697       The BadCode section can use the same macros and looping constructs as
698       the Code section.  However, it wouldn't be much use without the
699       following additional macros:
700
701       $ISBAD(var)
702           To check whether a piddle's value is bad, use the $ISBAD macro:
703
704            if ( $ISBAD(a()) ) { printf("a() is bad\n"); }
705
706           You can also access given elements of a piddle:
707
708            if ( $ISBAD(a(n=>l)) ) { printf("element %d of a() is bad\n", l); }
709
710       $ISGOOD(var)
711           This is the opposite of the $ISBAD macro.
712
713       $SETBAD(var)
714           For when you want to set an element of a piddle bad.
715
716       $ISBADVAR(c_var,pdl)
717           If you have cached the value of a piddle "$a()" into a c-variable
718           ("foo" say), then to check whether it is bad, use
719           "$ISBADVAR(foo,a)".
720
721       $ISGOODVAR(c_var,pdl)
722           As above, but this time checking that the cached value isn't bad.
723
724       $SETBADVAR(c_var,pdl)
725           To copy the bad value for a piddle into a c variable, use
726           "$SETBADVAR(foo,a)".
727
728       TODO: mention "$PPISBAD()" etc macros.
729
730       Using these macros, the above code could be specified as:
731
732        Code => '$c() = $a() + $b();',
733        BadCode => '
734           if ( $ISBAD(a()) || $ISBAD(b()) ) {
735              $SETBAD(c());
736           } else {
737              $c() = $a() + $b();
738           }',
739
740       Since this is perl, TMTOWTDI, so you could also write:
741
742        BadCode => '
743           if ( $ISGOOD(a()) && $ISGOOD(b()) ) {
744              $c() = $a() + $b();
745           } else {
746              $SETBAD(c());
747           }',
748
749       If you want access to the value of the badflag for a given piddle, you
750       can use the "$PDLSTATExxxx()" macros:
751
752       $PDLSTATEISBAD(pdl)
753       $PDLSTATEISGOOD(pdl)
754       $PDLSTATESETBAD(pdl)
755       $PDLSTATESETGOOD(pdl)
756
757       TODO: mention the "FindBadStatusCode" and "CopyBadStatusCode" options
758       to "pp_def", as well as the "BadDoc" key.
759
760   Interfacing your own/library functions using PP
761       Now, consider the following: you have your own C function (that may in
762       fact be part of some library you want to interface to PDL) which takes
763       as arguments two pointers to vectors of double:
764
765               void myfunc(int n,double *v1,double *v2);
766
767       The correct way of defining the PDL function is
768
769               pp_def('myfunc',
770                       Pars => 'a(n); [o]b(n);',
771                       GenericTypes => [D],
772                       Code => 'myfunc($SIZE(n),$P(a),$P(b));'
773               );
774
775       The "$P("par")" syntax returns a pointer to the first element and the
776       other elements are guaranteed to lie after that.
777
778       Notice that here it is possible to make many mistakes. First, $SIZE(n)
779       must be used instead of "n". Second, you shouldn't put any loops in
780       this code. Third, here we encounter a new hash key recognised by
781       PDL::PP : the "GenericTypes" declaration tells PDL::PP to ONLY GENERATE
782       THE TYPELOOP FOP THE LIST OF TYPES SPECIFIED. In this case "double".
783       This has two advantages. Firstly the size of the compiled code is
784       reduced vastly, secondly if non-double arguments are passed to
785       "myfunc()" PDL will automatically convert them to double before passing
786       to the external C routine and convert them back afterwards.
787
788       One can also use "Pars" to qualify the types of individual arguments.
789       Thus one could also write this as:
790
791               pp_def('myfunc',
792                       Pars => 'double a(n); double [o]b(n);',
793                       Code => 'myfunc($SIZE(n),$P(a),$P(b));'
794               );
795
796       The type specification in "Pars" exempts the argument from variation in
797       the typeloop - rather it is automatically converted too and from the
798       type specified. This is obviously useful in a more general example,
799       e.g.:
800
801               void myfunc(int n,float *v1,long *v2);
802
803               pp_def('myfunc',
804                       Pars => 'float a(n); long [o]b(n);',
805                       GenericTypes => [F],
806                       Code => 'myfunc($SIZE(n),$P(a),$P(b));'
807               );
808
809       Note we still use "GenericTypes" to reduce the size of the type loop,
810       obviously PP could in principle spot this and do it automatically
811       though the code has yet to attain that level of sophistication!
812
813       Finally note when types are converted automatically one MUST use the
814       "[o]" qualifier for output variables or you hard one changes will get
815       optimised away by PP!
816
817       If you interface a large library you can automate the interfacing even
818       further. Perl can help you again(!) in doing this. In many libraries
819       you have certain calling conventions. This can be exploited. In short,
820       you can write a little parser (which is really not difficult in perl)
821       that then generates the calls to "pp_def" from parsed descriptions of
822       the functions in that library. For an example, please check the Slatec
823       interface in the "Lib" tree of the PDL distribution. If you want to
824       check (during debugging) which calls to PP functions your perl code
825       generated a little helper package comes in handy which replaces the PP
826       functions by identically named ones that dump their arguments to
827       stdout.
828
829       Just say
830
831          perl -MPDL::PP::Dump myfile.pd
832
833       to see the calls to "pp_def" and friends. Try it with ops.pd and
834       slatec.pd. If you're interested (or want to enhance it), the source is
835       in Basic/Gen/PP/Dump.pm
836
837   Other macros and functions in the Code section
838       Macros: So far we have encountered the $SIZE, $GENERIC and $P macros.
839       Now we are going to quickly explain the other macros that are expanded
840       in the "Code" section of PDL::PP along with examples of their usage.
841
842       $T The $T macro is used for type switches. This is very useful when you
843          have to use different external (e.g. library) functions depending on
844          the input type of arguments. The general syntax is
845
846                  $Ttypeletters(type_alternatives)
847
848          where "typeletters" is a permutation of a subset of the letters
849          "BSULFD" which stand for Byte, Short, Ushort, etc. and
850          "type_alternatives" are the expansions when the type of the PP
851          operation is equal to that indicated by the respective letter. Let's
852          illustrate this incomprehensible description by an example. Assuming
853          you have two C functions with prototypes
854
855            void float_func(float *in, float *out);
856            void double_func(double *in, double *out);
857
858          which do basically the same thing but one accepts float and the
859          other double pointers. You could interface them to PDL by defining a
860          generic function "foofunc" (which will call the correct function
861          depending on the type of the transformation):
862
863            pp_def('foofunc',
864                  Pars => ' a(n); [o] b();',
865                  Code => ' $TFD(float_func,double_func) ($P(a),$P(b));'
866                  GenericTypes => [F,D],
867            );
868
869          Please note that you can't say
870
871                 Code => ' $TFD(float,double)_func ($P(a),$P(b));'
872
873          since the $T macro expands with trailing spaces, analogously to C
874          preprocessor macros.  The slightly longer form illustrated above is
875          correct.  If you really want brevity, you can of course do
876
877                  '$TBSULFD('.(join ',',map {"long_identifier_name_$_"}
878                          qw/byt short unseigned lounge flotte dubble/).');'
879
880       $PP
881          The $PP macro is used for a so called physical pointer access. The
882          physical refers to some internal optimisations of PDL (for those who
883          are familiar with the PDL core we are talking about the vaffine
884          optimisations). This macro is mainly for internal use and you
885          shouldn't need to use it in any of your normal code.
886
887       $COMP (and the "OtherPars" section)
888          The $COMP macro is used to access non-pdl values in the code
889          section. Its name is derived from the implementation of
890          transformations in PDL. The variables you can refer to using $COMP
891          are members of the ``compiled'' structure that represents the PDL
892          transformation in question but does not yet contain any information
893          about dimensions (for further details check PDL::Internals).
894          However, you can treat $COMP just as a black box without knowing
895          anything about the implementation of transformations in PDL. So when
896          would you use this macro? Its main usage is to access values of
897          arguments that are declared in the "OtherPars" section of a "pp_def"
898          definition. But then you haven't heard about the "OtherPars" key
899          yet?!  Let's have another example that illustrates typical usage of
900          both new features:
901
902            pp_def('pnmout',
903                  Pars => 'a(m)',
904                  OtherPars => "char* fd",
905                  GenericTypes => [B,U,S,L],
906                  Code => 'PerlIO *fp;
907                           IO *io;
908
909                         io = GvIO(gv_fetchpv($COMP(fd),FALSE,SVt_PVIO));
910                           if (!io || !(fp = IoIFP(io)))
911                                  croak("Can\'t figure out FP");
912
913                           if (PerlIO_write(fp,$P(a),len) != len)
914                                          croak("Error writing pnm file");
915            ');
916
917          This function is used to write data from a pdl to a file. The file
918          descriptor is passed as a string into this function. This parameter
919          does not go into the "Pars" section since it cannot be usefully
920          treated like a pdl but rather into the aptly named "OtherPars"
921          section. Parameters in the "OtherPars" section follow those in the
922          "Pars" section when invoking the function, i.e.
923
924             open FILE,">out.dat" or die "couldn't open out.dat";
925             pnmout($pdl,'FILE');
926
927          When you want to access this parameter inside the code section you
928          have to tell PP by using the $COMP macro, i.e. you write "$COMP(fd)"
929          as in the example. Otherwise PP wouldn't know that the "fd" you are
930          referring to is the same as that specified in the "OtherPars"
931          section.
932
933          Another use for the "OtherPars" section is to set a named dimension
934          in the signature. Let's have an example how that is done:
935
936            pp_def('setdim',
937                  Pars => '[o] a(n)',
938                  OtherPars => 'int ns => n',
939                  Code => 'loop(n) %{ $a() = n; %}',
940            );
941
942          This says that the named dimension "n" will be initialised from the
943          value of the other parameter "ns" which is of integer type (I guess
944          you have realised that we use the "CType From => named_dim" syntax).
945          Now you can call this function in the usual way:
946
947            setdim(($a=null),5);
948            print $a;
949              [ 0 1 2 3 4 ]
950
951          Admittedly this function is not very useful but it demonstrates how
952          it works. If you call the function with an existing pdl and you
953          don't need to explicitly specify the size of "n" since PDL::PP can
954          figure it out from the dimensions of the non-null pdl. In that case
955          you just give the dimension parameter as "-1":
956
957            $a = hist($b);
958            setdim($a,-1);
959
960          That should do it.
961
962       The only PP function that we have used in the examples so far is
963       "loop".  Additionally, there are currently two other functions which
964       are recognised in the "Code" section:
965
966       threadloop
967         As we heard above the signature of a PP defined function defines the
968         dimensions of all the pdl arguments involved in a primitive
969         operation.  However, you often call the functions that you defined
970         with PP with pdls that have more dimensions than those specified in
971         the signature. In this case the primitive operation is performed on
972         all subslices of appropriate dimensionality in what is called a
973         threadloop (see also overview above and PDL::Indexing). Assuming you
974         have some notion of this concept you will probably appreciate that
975         the operation specified in the code section should be optimised since
976         this is the tightest loop inside a threadloop.  However, if you
977         revisit the example where we define the "pnmout" function, you will
978         quickly realise that looking up the "IO" file descriptor in the inner
979         threadloop is not very efficient when writing a pdl with many rows. A
980         better approach would be to look up the "IO" descriptor once outside
981         the threadloop and use its value then inside the tightest threadloop.
982         This is exactly where the "threadloop" function comes in handy. Here
983         is an improved definition of "pnmout" which uses this function:
984
985           pp_def('pnmout',
986                 Pars => 'a(m)',
987                 OtherPars => "char* fd",
988                 GenericTypes => [B,U,S,L],
989                 Code => 'PerlIO *fp;
990                          IO *io;
991                          int len;
992
993                        io = GvIO(gv_fetchpv($COMP(fd),FALSE,SVt_PVIO));
994                          if (!io || !(fp = IoIFP(io)))
995                                 croak("Can\'t figure out FP");
996
997                          len = $SIZE(m) * sizeof($GENERIC());
998
999                          threadloop %{
1000                             if (PerlIO_write(fp,$P(a),len) != len)
1001                                         croak("Error writing pnm file");
1002                          %}
1003           ');
1004
1005         This works as follows. Normally the C code you write inside the
1006         "Code" section is placed inside a threadloop (i.e., PP generates the
1007         appropriate wrapping XS code around it). However, when you explicitly
1008         use the "threadloop" function, PDL::PP recognises this and doesn't
1009         wrap your code with an additional threadloop. This has the effect
1010         that code you write outside the threadloop is only executed once per
1011         transformation and just the code with in the surrounding "%{ ... %}"
1012         pair is placed within the tightest threadloop. This also comes in
1013         handy when you want to perform a decision (or any other code,
1014         especially CPU intensive code) only once per thread, i.e.
1015
1016           pp_addhdr('
1017             #define RAW 0
1018             #define ASCII 1
1019           ');
1020           pp_def('do_raworascii',
1021                  Pars => 'a(); b(); [o]c()',
1022                  OtherPars => 'int mode',
1023                Code => ' switch ($COMP(mode)) {
1024                             case RAW:
1025                                 threadloop %{
1026                                     /* do raw stuff */
1027                                 %}
1028                                 break;
1029                             case ASCII:
1030                                 threadloop %{
1031                                     /* do ASCII stuff */
1032                                 %}
1033                                 break;
1034                             default:
1035                                 croak("unknown mode");
1036                            }'
1037            );
1038
1039       types
1040         The types function works similar to the $T macro. However, with the
1041         "types" function the code in the following block (delimited by "%{"
1042         and "%}" as usual) is executed for all those cases in which the
1043         datatype of the operation is any of the types represented by the
1044         letters in the argument to "type", e.g.
1045
1046              Code => '...
1047
1048                      types(BSUL) %{
1049                          /* do integer type operation */
1050                      %}
1051                      types(FD) %{
1052                          /* do floating point operation */
1053                      %}
1054                      ...'
1055
1056   Typemap handling in the "OtherPars" section
1057       The "OtherPars" section discussed above is very often absolutely
1058       crucial when you interface external libraries with PDL. However in many
1059       cases the external libraries either use derived types or pointers of
1060       various types.
1061
1062       The standard way to handle this in Perl is to use a "typemap" file.
1063       This is discussed in some detail in perlxs in the standard Perl
1064       documentation. In PP the functionality is very similar, so you can
1065       create a "typemap" file in the directory where your PP file resides and
1066       when it is built it is automatically read in to figure out the
1067       appropriate translation between the C type and Perl's built-in type.
1068
1069       That said, there are a couple of important differences from the general
1070       handling of types in XS. The first, and probably most important, is
1071       that at the moment pointers to types are not allowed in the "OtherPars"
1072       section. To get around this limitation you must use the "IV" type
1073       (thanks to Judd Taylor for pointing out that this is necessary for
1074       portability).
1075
1076       It is probably best to illustrate this with a couple of code-snippets:
1077
1078       For instance the "gsl_spline_init" function has the following C
1079       declaration:
1080
1081           int  gsl_spline_init(gsl_spline * spline,
1082                 const double xa[], const double ya[], size_t size);
1083
1084       Clearly the "xa" and "ya" arrays are candidates for being passed in as
1085       piddles and the "size" argument is just the length of these piddles so
1086       that can be handled by the "$SIZE()" macro in PP. The problem is the
1087       pointer to the "gsl_spline" type. The natural solution would be to
1088       write an "OtherPars" declaration of the form
1089
1090           OtherPars => 'gsl_spline *spl'
1091
1092       and write a short "typemap" file which handled this type. This does not
1093       work at present however! So what you have to do is to go around the
1094       problem slightly (and in some ways this is easier too!):
1095
1096       The solution is to declare "spline" in the "OtherPars" section using an
1097       "Integer Value", "IV". This hides the nature of the variable from PP
1098       and you then need to (well to avoid compiler warnings at least!)
1099       perform a type cast when you use the variable in your code. Thus
1100       "OtherPars" should take the form:
1101
1102           OtherPars => 'IV spl'
1103
1104       and when you use it in the code you will write
1105
1106           (gsl_spline *) $COMP(spl)
1107
1108       so putting this together as Andres Jordan has done (with the
1109       modification using "IV" by Judd Taylor) in the "gsl_interp.pd" in the
1110       distribution source you get:
1111
1112            pp_def('init_meat',
1113                   Pars => 'double x(n); double y(n);',
1114                   OtherPars => 'IV spl',
1115                   Code =>'
1116                gsl_spline_init( (gsl_spline *) $COMP(spl), $P(x),$P(y),$SIZE(n));'
1117           );
1118
1119       where I have removed a macro wrapper call, but that would obscure the
1120       discussion.
1121
1122       The other minor difference as compared to the standard typemap handling
1123       in Perl, is that the user cannot specify non-standard typemap locations
1124       or typemap filenames using the "TYPEMAPS" option in MakeMaker... Thus
1125       you can only use a file called "typemap" and/or the "IV" trick above.
1126
1127   Other useful PP keys in data operation definitions
1128       You have already heard about the "OtherPars" key. Currently, there are
1129       not many other keys for a data operation that will be useful in normal
1130       (whatever that is) PP programming. In fact, it would be interesting to
1131       hear about a case where you think you need more than what is provided
1132       at the moment.  Please speak up on one of the PDL mailing lists. Most
1133       other keys recognised by "pp_def" are only really useful for what we
1134       call slice operations (see also above).
1135
1136       One thing that is strongly being planned is variable number of
1137       arguments, which will be a little tricky.
1138
1139       An incomplete list of the available keys:
1140
1141       Inplace
1142           Setting this key marks the routine as working inplace - ie the
1143           input and output piddles are the same. An example is
1144           "$a->inplace->sqrt()" (or "sqrt(inplace($a))").
1145
1146           Inplace => 1
1147               Use when the routine is a unary function, such as "sqrt".
1148
1149           Inplace => ['a']
1150               If there are more than one input piddles, specify the name of
1151               the one that can be changed inplace using an array reference.
1152
1153           Inplace => ['a','b']
1154               If there are more than one output piddle, specify the name of
1155               the input piddle and output piddle in a 2-element array
1156               reference. This probably isn't needed, but left in for
1157               completeness.
1158
1159           If bad values are being used, care must be taken to ensure the
1160           propogation of the badflag when inplace is being used; consider
1161           this excerpt from Basic/Bad/bad.pd:
1162
1163             pp_def('replacebad',HandleBad => 1,
1164               Pars => 'a(); [o]b();',
1165               OtherPars => 'double newval',
1166               Inplace => 1,
1167               CopyBadStatusCode =>
1168               '/* propogate badflag if inplace AND it has changed */
1169                if ( a == b && $ISPDLSTATEBAD(a) )
1170                  PDL->propogate_badflag( b, 0 );
1171
1172                /* always make sure the output is "good" */
1173                $SETPDLSTATEGOOD(b);
1174               ',
1175               ...
1176
1177           Since this routine removes all bad values, then the output piddle
1178           had its bad flag cleared. If run inplace (so "a == b"), then we
1179           have to tell all the children of "a" that the bad flag has been
1180           cleared (to save time we make sure that we call
1181           "PDL->propogate_badgflag" only if the input piddle had its bad flag
1182           set).
1183
1184           NOTE: one idea is that the documentation for the routine could be
1185           automatically flagged to indicate that it can be executed inplace,
1186           ie something similar to how "HandleBad" sets "BadDoc" if it's not
1187           supplied (it's not an ideal solution).
1188
1189   Other PDL::PP functions to support concise package definition
1190       So far, we have described the "pp_def" and "pp_done" functions. PDL::PP
1191       exports a few other functions to aid you in writing concise PDL
1192       extension package definitions.
1193
1194       Often when you interface library functions as in the above example you
1195       have to include additional C include files. Since the XS file is
1196       generated by PP we need some means to make PP insert the appropriate
1197       include directives in the right place into the generated XS file.  To
1198       this end there is the "pp_addhdr" function. This is also the function
1199       to use when you want to define some C functions for internal use by
1200       some of the XS functions (which are mostly functions defined by
1201       "pp_def").  By including these functions here you make sure that
1202       PDL::PP inserts your code before the point where the actual XS module
1203       section begins and will therefore be left untouched by xsubpp (cf.
1204       perlxs and perlxstut manpages).
1205
1206       A typical call would be
1207
1208         pp_addhdr('
1209         #include <unistd.h>       /* we need defs of XXXX */
1210         #include "libprotos.h"    /* prototypes of library functions */
1211         #include "mylocaldecs.h"  /* Local decs */
1212
1213         static void do_the real_work(PDL_Byte * in, PDL_Byte * out, int n)
1214         {
1215               /* do some calculations with the data */
1216         }
1217         ');
1218
1219       This ensures that all the constants and prototypes you need will be
1220       properly included and that you can use the internal functions defined
1221       here in the "pp_def"s, e.g.:
1222
1223         pp_def('barfoo',
1224                Pars => ' a(n); [o] b(n)',
1225                GenericTypes => '[B]',
1226                Code => ' int ns = $SIZE(n);
1227                          do_the_real_work($P(a),$P(b),ns);
1228                        ',
1229         );
1230
1231       In many cases the actual PP code (meaning the arguments to "pp_def"
1232       calls) is only part of the package you are currently implementing.
1233       Often there is additional perl code and XS code you would normally have
1234       written into the pm and XS files which are now automatically generated
1235       by PP. So how to get this stuff into those dynamically generated files?
1236       Fortunately, there are a couple of functions, generally called
1237       "pp_addXXX" that assist you in doing this.
1238
1239       Let's assume you have additional perl code that should go into the
1240       generated pm-file. This is easily achieved with the "pp_addpm" command:
1241
1242          pp_addpm(<<'EOD');
1243
1244          =head1 NAME
1245
1246          PDL::Lib::Mylib -- a PDL interface to the Mylib library
1247
1248          =head1 DESCRIPTION
1249
1250          This package implements an interface to the Mylib package with full
1251          threading and indexing support (see L<PDL::Indexing>).
1252
1253          =cut
1254
1255          use PGPLOT;
1256
1257          =head2 use_myfunc
1258               this function applies the myfunc operation to all the
1259               elements of the input pdl regardless of dimensions
1260               and returns the sum of the result
1261          =cut
1262
1263          sub use_myfunc {
1264               my $pdl = shift;
1265
1266               myfunc($pdl->clump(-1),($res=null));
1267
1268               return $res->sum;
1269          }
1270
1271          EOD
1272
1273       You have probably got the idea. In some cases you also want to export
1274       your additional functions. To avoid getting into trouble with PP which
1275       also messes around with the @EXPORT array you just tell PP to add your
1276       functions to the list of exported functions:
1277
1278         pp_add_exported('', 'use_myfunc gethynx');
1279
1280       Note the initial empty string argument (reason for it?).
1281
1282       The "pp_add_isa" command works like the the "pp_add_exported" function.
1283       The arguments to "pp_add_isa" are added the @ISA list, e.g.
1284
1285         pp_add_isa(' Some::Other::Class ');
1286
1287       Sometimes you want to add extra XS code of your own (that is generally
1288       not involved with any threading/indexing issues but supplies some other
1289       functionality you want to access from the perl side) to the generated
1290       XS file, for example
1291
1292         pp_addxs('','
1293
1294         # Determine endianness of machine
1295
1296         int
1297         isbigendian()
1298            CODE:
1299              unsigned short i;
1300              PDL_Byte *b;
1301
1302              i = 42; b = (PDL_Byte*) (void*) &i;
1303
1304              if (*b == 42)
1305                 RETVAL = 0;
1306              else if (*(b+1) == 42)
1307                 RETVAL = 1;
1308              else
1309                 croak("Impossible - machine is neither big nor little endian!!\n");
1310              OUTPUT:
1311                RETVAL
1312         ');
1313
1314       Especially "pp_add_exported" and "pp_addxs" should be used with care.
1315       PP uses PDL::Exporter, hence letting PP export your function means that
1316       they get added to the standard list of function exported by default
1317       (the list defined by the export tag ``:Func''). If you use "pp_addxs"
1318       you shouldn't try to do anything that involves threading or indexing
1319       directly. PP is much better at generating the appropriate code from
1320       your definitions.
1321
1322       Finally, you may want to add some code to the BOOT section of the XS
1323       file (if you don't know what that is check perlxs). This is easily done
1324       with the "pp_add_boot" command:
1325
1326         pp_add_boot(<<EOB);
1327               descrip = mylib_initialize(KEEP_OPEN);
1328
1329               if (descrip == NULL)
1330                  croak("Can't initialize library");
1331
1332               GlobalStruc->descrip = descrip;
1333               GlobalStruc->maxfiles = 200;
1334         EOB
1335
1336       By default, PP.pm puts all subs defined using the pp_def function into
1337       the output .pm file's EXPORT list. This can create problems if you are
1338       creating a subclassed object where you don't want any methods exported.
1339       (i.e. the methods will only be called using the $object->method
1340       syntax).
1341
1342       For these cases you can call pp_export_nothing() to clear out the
1343       export list. Example (At the end of the .pd file):
1344
1345         pp_export_nothing();
1346         pp_done();
1347
1348       By default, PP.pm puts the 'use Core;' line into the output .pm file.
1349       This imports Core's exported names into the current namespace, which
1350       can create problems if you are over-riding one of Core's methods in the
1351       current file.  You end up getting messages like "Warning: sub sumover
1352       redefined in file subclass.pm" when running the program.
1353
1354       For these cases the pp_core_importList can be used to change what is
1355       imported from Core.pm.  For example:
1356
1357         pp_core_importList('()')
1358
1359       This would result in
1360
1361         use Core();
1362
1363       being generated in the output .pm file. This would result in no names
1364       being imported from Core.pm. Similarly, calling
1365
1366         pp_core_importList(' qw/ barf /')
1367
1368       would result in
1369
1370         use Core qw/ barf/;
1371
1372       being generated in the output .pm file. This would result in just
1373       'barf' being imported from Core.pm.
1374

Slice operation

1376       The slice operation section of this manual is provided using dataflow
1377       and lazy evaluation: when you need it, ask Tjl to write it.  a delivery
1378       in a week from when I receive the email is 95% probable and two week
1379       delivery is 99% probable.
1380
1381       And anyway, the slice operations require a much more intimate knowledge
1382       of PDL internals than the data operations. Furthermore, the complexity
1383       of the issues involved is considerably higher than that in the average
1384       data operation. If you would like to convince yourself of this fact
1385       take a look at the Basic/Slices/slices.pd file in the PDL distribution
1386       :-). Nevertheless, functions generated using the slice operations are
1387       at the heart of the index manipulation and dataflow capabilities of
1388       PDL.
1389
1390       Also, there are a lot of dirty issues with virtual piddles and vaffines
1391       which we shall entirely skip here.
1392
1393   Slices and bad values
1394       Slice operations need to be able to handle bad values (if support is
1395       compiled into PDL). The easiest thing to do is look at
1396       Basic/Slices/slices.pd to see how this works.
1397
1398       Along with "BadCode", there are also the "BadBackCode" and
1399       "BadRedoDimsCode" keys for "pp_def". However, any "EquivCPOffsCode"
1400       should not need changing, since any changes are absorbed into the
1401       definition of the "$EQUIVCPOFFS()" macro (ie it is handled
1402       automatically by PDL::PP>.
1403
1404   A few notes on writing a slicing routine...
1405       The following few paragraphs describe writing of a new slicing routine
1406       ('range'); any errors are CED's. (--CED 26-Aug-2002)
1407

USEFUL ROUTINES

1409       The PDL "Core" structure, defined in Basic/Core/pdlcore.h.PL, contains
1410       pointers to a number of routines that may be useful to you.  The
1411       majority of these routines deal with manipulating piddles, but some are
1412       more general:
1413
1414       PDL->qsort_B( PDL_Byte *xx, int a, int b )
1415           Sort the array "xx" between the indices "a" and "b".  There are
1416           also versions for the other PDL datatypes, with postfix "_S", "_U",
1417           "_L", "_F", and "_D".  Any module using this must ensure that
1418           "PDL::Ufunc" is loaded.
1419
1420       PDL->qsort_ind_B( PDL_Byte *xx, int *ix, int a, int b )
1421           As for "PDL->qsort_B", but this time sorting the indices rather
1422           than the data.
1423
1424       The routine "med2d" in Lib/Image2D/image2d.pd shows how such routines
1425       are used.
1426

MAKEFILES FOR PP FILES

1428       If you are going to generate a package from your PP file (typical file
1429       extensions are ".pd" or ".pp" for the files containing PP code) it is
1430       easiest and safest to leave generation of the appropriate commands to
1431       the Makefile. In the following we will outline the typical format of a
1432       perl Makefile to automatically build and install your package from a
1433       description in a PP file. Most of the rules to build the xs, pm and
1434       other required files from the PP file are already predefined in the
1435       PDL::Core::Dev package. We just have to tell MakeMaker to use it.
1436
1437       In most cases you can define your Makefile like
1438
1439         # Makefile.PL for a package defined by PP code.
1440
1441         use PDL::Core::Dev;            # Pick up development utilities
1442         use ExtUtils::MakeMaker;
1443
1444         $package = ["mylib.pd",Mylib,PDL::Lib::Mylib];
1445         %hash = pdlpp_stdargs($package);
1446         $hash{OBJECT} .= ' additional_Ccode$(OBJ_EXT) ';
1447         $hash{clean}->{FILES} .= ' todelete_Ccode$(OBJ_EXT) ';
1448         $hash{'VERSION_FROM'} = 'mylib.pd';
1449         WriteMakefile(%hash);
1450
1451         sub MY::postamble { pdlpp_postamble($package); }
1452
1453       Here, the list in $package is: first: PP source file name, then the
1454       prefix for the produced files and finally the whole package name.  You
1455       can modify the hash in whatever way you like but it would be reasonable
1456       to stay within some limits so that your package will continue to work
1457       with later versions of PDL.
1458
1459       If you don't want to use prepackaged arguments, here is a generic
1460       Makefile.PL that you can adapt for your own needs:
1461
1462         # Makefile.PL for a package defined by PP code.
1463
1464         use PDL::Core::Dev;            # Pick up development utilities
1465         use ExtUtils::MakeMaker;
1466
1467         WriteMakefile(
1468          'NAME'       => 'PDL::Lib::Mylib',
1469          'VERSION_FROM'       => 'mylib.pd',
1470          'TYPEMAPS'     => [&PDL_TYPEMAP()],
1471          'OBJECT'       => 'mylib$(OBJ_EXT) additional_Ccode$(OBJ_EXT)',
1472          'PM'         => { 'Mylib.pm'            => '$(INST_LIBDIR)/Mylib.pm'},
1473          'INC'          => &PDL_INCLUDE(), # add include dirs as required by your lib
1474          'LIBS'         => [''],   # add link directives as necessary
1475          'clean'        => {'FILES'  =>
1476                                 'Mylib.pm Mylib.xs Mylib$(OBJ_EXT)
1477                                 additional_Ccode$(OBJ_EXT)'},
1478         );
1479
1480         # Add genpp rule; this will invoke PDL::PP on our PP file
1481         # the argument is an array reference where the array has three string elements:
1482         #   arg1: name of the source file that contains the PP code
1483         #   arg2: basename of the xs and pm files to be generated
1484         #   arg3: name of the package that is to be generated
1485         sub MY::postamble { pdlpp_postamble(["mylib.pd",Mylib,PDL::Lib::Mylib]); }
1486
1487       To make life even easier PDL::Core::Dev defines the function
1488       "pdlpp_stdargs" that returns a hash with default values that can be
1489       passed (either directly or after appropriate modification) to a call to
1490       WriteMakefile.  Currently, "pdlpp_stdargs" returns a hash where the
1491       keys are filled in as follows:
1492
1493               (
1494                'NAME'         => $mod,
1495                'TYPEMAPS'     => [&PDL_TYPEMAP()],
1496                'OBJECT'       => "$pref\$(OBJ_EXT)",
1497                PM     => {"$pref.pm" => "\$(INST_LIBDIR)/$pref.pm"},
1498                MAN3PODS => {"$src" => "\$(INST_MAN3DIR)/$mod.\$(MAN3EXT)"},
1499                'INC'          => &PDL_INCLUDE(),
1500                'LIBS'         => [''],
1501                'clean'        => {'FILES'  => "$pref.xs $pref.pm $pref\$(OBJ_EXT)"},
1502               )
1503
1504       Here, $src is the name of the source file with PP code, $pref the
1505       prefix for the generated .pm and .xs files and $mod the name of the
1506       exntension module to generate.
1507

INTERNALS

1509       The internals of the current version consist of a large table which
1510       gives the rules according to which things are translated and the subs
1511       which implement these rules.
1512
1513       Later on, it would be good to make the table modifiable by the user so
1514       that different things may be tried.
1515
1516       [Meta comment: here will hopefully be more in the future; currently,
1517       your best bet will be to read the source code :-( or ask on the list
1518       (try the latter first) ]
1519

Appendix A: Some keys recognised by PDL::PP

1521       Unless otherwise specified, the arguments are strings. Keys marked with
1522       (bad) are only used if bad-value support is compiled into PDL.
1523
1524       Pars
1525           define the signature of your function
1526
1527       OtherPars
1528           arguments which are not pdls. Default: nothing.
1529
1530       Code
1531           the actual code that implements the functionality; several PP
1532           macros and PP functions are recognised in the string value
1533
1534       HandleBad (bad)
1535           If set to 1, the routine is assumed to support bad values and the
1536           code in the BadCode key is used if bad values are present; it also
1537           sets things up so that the "$ISBAD()" etc macros can be used.  If
1538           set to 0, cause the routine to print a warning if any of the input
1539           piddles have their bad flag set.
1540
1541       BadCode (bad)
1542           Give the code to be used if bad values may be present in the input
1543           piddles.  Only used if "HandleBad => 1".
1544
1545       GenericTypes
1546           An array reference. The array may contain any subset of the strings
1547           `B', `S', `U', `L', `F' and `D', which specify which types your
1548           operation will accept.  This is very useful (and important!) when
1549           interfacing an external library.  Default: [qw/B S U L F D/]
1550
1551       Inplace
1552           Mark a function as being able to work inplace.
1553
1554            Inplace => 1          if  Pars => 'a(); [o]b();'
1555            Inplace => ['a']      if  Pars => 'a(); b(); [o]c();'
1556            Inplace => ['a','b']  if  Pars => 'a(); b(); [o]c(); [o]d();'
1557
1558           If bad values are being used, care must be taken to ensure the
1559           propogation of the badflag when inplace is being used; for instance
1560           see the code for "replacebad" in Basic/Bad/bad.pd.
1561
1562       Doc Used to specify a documentation string in Pod format. See PDL::Doc
1563           for information on PDL documentation conventions. Note: in the
1564           special case where the PP 'Doc' string is one line this is
1565           implicitly used for the quick reference AND the documentation!
1566
1567           If the Doc field is omitted PP will generate default documentation
1568           (after all it knows about the Signature).
1569
1570           If you really want the function NOT to be documented in any way at
1571           this point (e.g. for an internal routine, or because youu are doing
1572           it elsewhere in the code) explictly specify "Doc=>undef".
1573
1574       BadDoc (bad)
1575           Contains the text returned by the "badinfo" command (in "perldl")
1576           or the "-b" switch to the "pdldoc" shell script. In many cases, you
1577           will not need to specify this, since the information can be
1578           automatically created by PDL::PP. However, as befits computer-
1579           generated text, it's rather stilted; it may be much better to do it
1580           yourself!
1581

Appendix B: PP macros and functions

1583   Macros
1584       Macros labelled by (bad) are only used if bad-value support is compiled
1585       into PDL.
1586
1587       $variablename_from_sig()
1588              access a pdl (by its name) that was specified in the signature
1589
1590       $COMP(x)
1591              access a value in the private data structure of this
1592              transformation (mainly used to use an argument that is specified
1593              in the "OtherPar" section)
1594
1595       $SIZE(n)
1596              replaced at runtime by the actual size of a named dimension (as
1597              specified in the signature)
1598
1599       $GENERIC()
1600              replaced by the C type that is equal to the runtime type of the
1601              operation
1602
1603       $P(a)  a pointer access to the PDL named "a" in the signature. Useful
1604              for interfacing to C functions
1605
1606       $PP(a) a physical pointer access to pdl "a"; mainly for internal use
1607
1608       $TXXX(Alternative,Alternative)
1609              expansion alternatives according to runtime type of operation,
1610              where XXX is some string that is matched by "/[BSULFD+]/".
1611
1612       $PDL(a)
1613              return a pointer to the pdl data structure (pdl *) of piddle "a"
1614
1615       $ISBAD(a()) (bad)
1616              returns true if the value stored in "a()" equals the bad value
1617              for this piddle.  Requires "HandleBad" being set to 1.
1618
1619       $ISGOOD(a()) (bad)
1620              returns true if the value stored in "a()" does not equal the bad
1621              value for this piddle.  Requires "HandleBad" being set to 1.
1622
1623       $SETBAD(a()) (bad)
1624              Sets "a()" to equal the bad value for this piddle.  Requires
1625              "HandleBad" being set to 1.
1626
1627   functions
1628       "loop(DIMS) %{ ... %}"
1629          loop over named dimensions; limits are generated automatically by PP
1630
1631       "threadloop %{ ... %}"
1632          enclose following code in a threadloop
1633
1634       "types(TYPES) %{ ... %}"
1635          execute following code if type of operation is any of "TYPES"
1636

CURRENTLY UNDOCUMENTED

1649       RedoDimsCode, $RESIZE()
1650

BUGS

1652       PDL::PP is still, even in its rewritten form, too complicated.  It
1653       needs to be rethought a little as well as deconvoluted and modularized
1654       some more (e.g. all the NS things).
1655
1656       After the rewrite, this can happen a little by little, though.
1657
1658   Undocumented functions
1659       The following functions have been added since this manual was written
1660       and are as yet undocumented
1661
1662       pp_export_nothing
1663       pp_core_importList
1664       pp_beginwrap
1665       pp_setversion
1666       pp_addbegin
1667

AUTHOR

1669       Copyright(C) 1997 Tuomas J. Lukka (lukka@fas.harvard.edu), Karl
1670       Glaazebrook (kgb@aaocbn1.aao.GOV.AU) and Christian Soeller
1671       (c.soeller@auckland.ac.nz). All rights reserved. Although destined for
1672       release as a man page with the standard PDL distribution, it is not
1673       public domain. Permission is granted to freely distribute verbatim
1674       copies of this document provided that no modifications outside of
1675       formatting be made, and that this notice remain intact.  You are
1676       permitted and encouraged to use its code and derivatives thereof in
1677       your own source code for fun or for profit as you see fit.
1678
1679
1680
1681perl v5.12.3                      2009-10-17                             PP(1)