BADVALUES(1)          User Contributed Perl Documentation         BADVALUES(1)

NAME

       PDL::BadValues - Discussion of bad value support in PDL

DESCRIPTION

   What are bad values and why should I bother with them?
       Sometimes it's useful to be able to specify that a certain value is
       'bad' or 'missing'; for example, CCDs used in astronomy produce 2D
       images which are not perfect, since certain areas contain invalid data
       due to imperfections in the detector.  Although PDL's powerful index
       routines, together with dataflow, slices and so on, mean that these
       regions can be ignored during processing, doing so is awkward.  It
       would be much easier to be able to say "$c = $x + $y" and leave all the
       hassle to the computer.

       If you're not interested in this, then you may (rightly) be concerned
       with how it affects the speed of PDL, since the overhead of checking
       for a bad value at each operation can be large.  Because of this, the
       code has been written to be as fast as possible, particularly when
       operating on ndarrays which do not contain bad values.  In fact, you
       should notice essentially no speed difference when working with such
       ndarrays.

       You may also ask 'well, my computer supports IEEE NaN, so I already
       have this'.  Well, yes and no: many routines, such as "y=sin(x)", will
       propagate NaNs without the user having to code differently, but
       routines such as "qsort", or finding the median of an array, need to be
       re-coded to handle bad values.  For floating-point datatypes, "NaN" and
       "Inf" can be used to flag bad values, but by default special values are
       used (see "Default bad values" below).  I do not have any benchmarks to
       see which option is faster.

       As of PDL 2.040, you can have different bad values for separate
       ndarrays of the same type.

   A quick overview
        pdl> $x = sequence(4,3);
        pdl> p $x
        [
         [ 0  1  2  3]
         [ 4  5  6  7]
         [ 8  9 10 11]
        ]
        pdl> $x = $x->setbadif( $x % 3 == 2 )
        pdl> p $x
        [
         [  0   1 BAD   3]
         [  4 BAD   6   7]
         [BAD   9  10 BAD]
        ]
        pdl> $x *= 3
        pdl> p $x
        [
         [  0   3 BAD   9]
         [ 12 BAD  18  21]
         [BAD  27  30 BAD]
        ]
        pdl> p $x->sum
        120

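       To make the session above concrete, here is a minimal C sketch of the
       idea.  It assumes a plain array of doubles and a sentinel bad value of
       -DBL_MAX (the default listed for PDL_Double).  The function names are
       hypothetical and this is not PDL's implementation.  It only shows how
       marking, scaling and summing can all skip bad elements.

```c
#include <float.h>
#include <stddef.h>

/* Hypothetical sentinel; -DBL_MAX matches the default PDL_Double bad value. */
#define BAD_DOUBLE (-DBL_MAX)

/* Like setbadif( $x % 3 == 2 ): mark matching good elements as bad. */
void setbad_if_mod3_is_2(double *x, size_t n, double badval)
{
    for (size_t i = 0; i < n; i++)
        if (x[i] != badval && (long)x[i] % 3 == 2)
            x[i] = badval;
}

/* Like $x *= 3: bad elements stay bad, good elements are scaled. */
void scale_skip_bad(double *x, size_t n, double factor, double badval)
{
    for (size_t i = 0; i < n; i++)
        if (x[i] != badval)
            x[i] *= factor;
}

/* Like $x->sum: bad elements are simply left out of the total. */
double sum_skip_bad(const double *x, size_t n, double badval)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        if (x[i] != badval)
            total += x[i];
    return total;
}
```

       Fill a 12-element array with 0..11 and apply the three functions in
       order.  The good values left are 0, 3, 9, 12, 18, 21, 27 and 30, which
       sum to 120, matching the pdl session.
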
       "demo bad" and "demo bad2" within perldl or pdl2 give a demonstration
       of some of the things possible with bad values.  These are also
       available on PDL's web-site, at http://pdl.perl.org/demos/.  See
       PDL::Bad for useful routines for working with bad values and t/bad.t to
       see them in action.

       To find out if a routine supports bad values, use the "badinfo" command
       in perldl or pdl2 or the "-b" option to pdldoc.  This facility is
       currently a 'proof of concept' (or, more realistically, a quick hack),
       so expect it to be rough around the edges.

       Each ndarray contains a flag, accessible via "$pdl->badflag", to say
       whether there's any bad data present:

       •   If false (0), meaning there's no bad data, the code supplied by
           the "Code" option to "pp_def()" is executed.

       •   If true (1), there MAY be bad data in the ndarray, so the code in
           the "BadCode" option is used (assuming that the "pp_def()" for this
           routine has been updated to have a BadCode key).  You get all the
           advantages of threading, as with the "Code" option, but it will run
           slower since the presence of bad values has to be handled.

       If you create an ndarray, it will have its bad-value flag set to 0.  To
       change this, use "$pdl->badflag($new_bad_status)", where
       $new_bad_status can be 0 or 1.  When a routine creates an ndarray, its
       bad-value flag will depend on the input ndarrays.  Unless overridden
       (see the "CopyBadStatusCode" option to "pp_def"), the bad-value flag
       will be set true if any of the input ndarrays has its bad-value flag
       set.  To check that an ndarray really contains bad data, use the
       "check_badflag" method.

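       The two rules in the paragraph above can be sketched in C.  The
       "ndarray" struct (a flat double buffer with its own bad value and a
       boolean flag) and both function names are hypothetical, not PDL's
       internal layout.  The check_badflag stand-in assumes that the method
       clears the flag when no element actually holds the bad value.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified ndarray: data, its own bad value, and a
   flag meaning "bad values MAY be present". */
typedef struct {
    double *data;
    size_t  n;
    double  badval;
    bool    badflag;
} ndarray;

/* Default rule for a new output: flagged if any input is flagged. */
bool output_badflag(const ndarray *const inputs[], size_t ninputs)
{
    for (size_t i = 0; i < ninputs; i++)
        if (inputs[i]->badflag)
            return true;
    return false;
}

/* In the spirit of check_badflag: scan the data and clear the flag
   if no element actually holds the bad value; return the final flag. */
bool check_badflag(ndarray *p)
{
    if (p->badflag) {
        bool found = false;
        for (size_t i = 0; i < p->n && !found; i++)
            found = (p->data[i] == p->badval);
        p->badflag = found;
    }
    return p->badflag;
}
```

       Only a scan of the data turns 'MAY contain bad values' into a definite
       answer.  That is why the flag on its own is cheap but conservative.
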
       NOTE: propagation of the badflag

       If you change the badflag of an ndarray, this change is propagated to
       all the children of an ndarray, so

          pdl> $x = zeroes(20,30);
          pdl> $y = $x->slice('0:10,0:10');
          pdl> $c = $y->slice(',(2)');
          pdl> print ">>c: ", $c->badflag, "\n";
          >>c: 0
          pdl> $x->badflag(1);
          pdl> print ">>c: ", $c->badflag, "\n";
          >>c: 1

       No change is made to the parents of an ndarray, so

          pdl> print ">>a: ", $x->badflag, "\n";
          >>a: 1
          pdl> $c->badflag(0);
          pdl> print ">>a: ", $x->badflag, "\n";
          >>a: 1

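       The one-way propagation shown above (down to children, never up to
       parents) amounts to a recursive walk over the child links.  This is a
       hypothetical model with a fixed-size child array, not PDL's
       "propagate_badflag".  It replays the $x / $y / $c example.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical node in a parent/child (slice) tree. */
typedef struct node {
    bool badflag;
    struct node *children[MAX_CHILDREN];
    size_t nchildren;
} node;

/* Record child as a slice of parent (silently ignores overflow). */
void add_child(node *parent, node *child)
{
    if (parent->nchildren < MAX_CHILDREN)
        parent->children[parent->nchildren++] = child;
}

/* Set or clear the flag on p and, recursively, on every descendant.
   Parents are deliberately never visited. */
void set_badflag(node *p, bool flag)
{
    p->badflag = flag;
    for (size_t i = 0; i < p->nchildren; i++)
        set_badflag(p->children[i], flag);
}
```

       Build x -> y -> c with add_child.  Setting the flag on x then marks y
       and c, while clearing it on c leaves x flagged, as in the session
       above.
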
       Thoughts:

       •   the badflag could ONLY be cleared IF an ndarray has NO parents,
           and this change would then propagate to all the children of that
           ndarray.  I am not so keen on this anymore (it is too awkward to
           code, for one).

       •   "$x->badflag(1)" should propagate the badflag to BOTH parents and
           children.

       This shouldn't be hard to implement (although an initial attempt
       failed!).  Does it make sense though?  There's also the question of
       what happens if you change the bad value of an ndarray.  Should the
       change propagate to children and parents (yes), or should you only be
       able to change the bad value at the 'top' level, i.e. for those
       ndarrays which do not have parents?

       The "orig_badvalue()" method returns the compile-time value for a given
       datatype.  It works on ndarrays, PDL::Type objects, and numbers, e.g.

         $pdl->orig_badvalue(), byte->orig_badvalue(), and orig_badvalue(4).

       It also has a horrible name...

       To get the current bad value, use the "badvalue()" method; it has the
       same syntax as "orig_badvalue()".

       To change the current bad value, supply the new number to badvalue, e.g.

         $pdl->badvalue(2.3), byte->badvalue(2), badvalue(5,-3e34).

       Note: the value is silently converted to the correct C type, and
       returned.  For example, "byte->badvalue(-26)" returns 230 on my Linux
       machine.

       Note that changes to the bad value are NOT propagated to previously
       created ndarrays.  They will still have their bad flag set, but the
       elements that were bad suddenly become 'good', while still containing
       the old bad value.  See discussion below.  This is not a problem for
       floating-point types which use NaN, since you cannot change their bad
       value.

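       Both notes come down to ordinary C behaviour for an 8-bit unsigned
       type.  The sketch assumes an unsigned char element type and uses
       hypothetical function names.  It shows -26 wrapping to 230, and an
       element that holds the old default (255) no longer being recognised as
       bad once the bad value has been changed.

```c
#include <stdbool.h>

/* Converting to an unsigned type wraps modulo 2^N, so -26 becomes
   256 - 26 = 230 for an 8-bit unsigned char. */
unsigned char byte_badvalue_from(int requested)
{
    return (unsigned char)requested;
}

/* A value is bad only if it equals the bad value currently in force. */
bool byte_isbad(unsigned char v, unsigned char badval)
{
    return v == badval;
}
```

       The C standard defines how an out-of-range value converts to an
       unsigned type (here it wraps modulo 256).  The 230 result is therefore
       not specific to Linux on machines with 8-bit bytes.
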
   Bad values and boolean operators
       For those boolean operators in PDL::Ops, evaluation on a bad value
       returns the bad value.  Whilst this means that

        $mask = $img > $thresh;

       correctly propagates bad values, it will cause problems for checks such
       as

        do_something() if any( $img > $thresh );

       which need to be rewritten as something like

        do_something() if any( setbadtoval( ($img > $thresh), 0 ) );

       When using one of the 'projection' functions in PDL::Ufunc, such as
       orover, bad values are skipped over.  See the documentation of these
       functions for the current (poor) handling of the case when all elements
       are bad.

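       The sketch below illustrates the any() problem under two assumptions.
       First, the comparison stores a byte sentinel (a hypothetical 255) in
       the mask wherever the input is bad.  Second, the any-style test knows
       nothing about bad values, so it treats that sentinel as true.
       Replacing bad entries with 0, as setbadtoval does in the Perl line
       above, restores the intended answer.  All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MASK_BAD 255u  /* hypothetical byte bad value (the UCHAR_MAX default) */

/* Elementwise img > thresh; a bad input gives a bad output. */
void gt_mask(const double *img, size_t n, double thresh, double badval,
             unsigned char *mask)
{
    for (size_t i = 0; i < n; i++)
        mask[i] = (img[i] == badval) ? MASK_BAD
                                     : (unsigned char)(img[i] > thresh);
}

/* In the spirit of setbadtoval: replace bad mask entries with val. */
void mask_setbadtoval(unsigned char *mask, size_t n, unsigned char val)
{
    for (size_t i = 0; i < n; i++)
        if (mask[i] == MASK_BAD)
            mask[i] = val;
}

/* A naive "any" that knows nothing about bad values. */
bool naive_any(const unsigned char *mask, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (mask[i] != 0)
            return true;
    return false;
}
```

       Take img = (1, BAD, 2) and a threshold of 5.  Nothing exceeds the
       threshold, yet the naive test reports a hit until the bad entry is
       replaced with 0.
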
   A bad value for each ndarray, and related issues
       There is one default bad value for each datatype, but you can have a
       separate bad value for each ndarray as of PDL 2.040.

IMPLEMENTATION DETAILS

       PDL code just needs to access the %PDL::Config hash (e.g.
       Basic/Bad/bad.pd) to find out whether bad-value support is required.

       A new flag has been added to the state of an ndarray: "PDL_BADVAL".  If
       it is unset, the ndarray does not contain bad values, so all the
       support code can be ignored.  If it is set, that does not guarantee
       that bad values are present, just that they should be checked for.
       Thanks to Christian, "badflag()", which sets/clears this flag (see
       Basic/Bad/bad.pd), will update ALL the children/grandchildren/etc of
       an ndarray if its state changes (see "badflag" in Basic/Bad/bad.pd and
       "propagate_badflag" in Basic/Core/Core.xs.PL).  It's not clear what to
       do with parents.  I can see the reason for propagating a 'set badflag'
       request to parents, but I think a child should NOT be able to clear the
       badflag of a parent.  There's also the issue of what happens when you
       change the bad value for an ndarray.

       The "pdl_trans" structure has been extended to include an integer
       value, "bvalflag", which acts as a switch to tell the code whether to
       handle bad values or not.  This value is set if any of the input
       ndarrays have their "PDL_BADVAL" flag set (although this code can be
       replaced by setting "FindBadStatusCode" in pp_def).  The logic of the
       check is going to get a tad more complicated if I allow routines to
       fall back to using the "Code" section for floating-point types.

       The default bad values are now stored in a structure within the Core
       PDL structure, "PDL.bvals" (e.g. Basic/Core/pdlcore.h.PL).  See also
       "typedef badvals" in Basic/Core/pdl.h.PL and the BOOT code of
       Basic/Core/Core.xs.PL, where the values are initialised to (hopefully)
       sensible values.  See Basic/Bad/bad.pd for read/write routines to the
       values.

   Why not make a PDL subclass?
       The support for bad values could have been done as a PDL sub-class.
       The advantage of this approach would be that you only load in the code
       to handle bad values if you actually want to use them.  The downside is
       that the code then gets separated: any bug fixes/improvements have to
       be made to the code in two different files.  With the present approach
       the code is in the same "pp_def" function (although there is still the
       problem that both the "Code" and "BadCode" sections need updating).

   Default bad values
       The default/original bad values are set to (taken from the Starlink
       distribution):

         #include <limits.h>
         #include <float.h>

         PDL_Byte    ==  UCHAR_MAX
         PDL_Short   ==   SHRT_MIN
         PDL_Ushort  ==  USHRT_MAX
         PDL_Long    ==    INT_MIN
         PDL_Float   ==   -FLT_MAX
         PDL_Double  ==   -DBL_MAX

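       As a sketch, the defaults can be written as typed C constants, with a
       C11 _Generic helper that picks the default matching a value's type.
       The constant names and IS_DEFAULT_BAD are hypothetical.  Plain C types
       stand in for PDL_Byte, PDL_Short and the rest, assuming PDL_Long maps
       to int, as the INT_MIN default suggests.

```c
#include <float.h>
#include <limits.h>

/* Plain C types stand in for the PDL_* typedefs (an assumption). */
static const unsigned char  BAD_BYTE   = UCHAR_MAX;
static const short          BAD_SHORT  = SHRT_MIN;
static const unsigned short BAD_USHORT = USHRT_MAX;
static const int            BAD_LONG   = INT_MIN;
static const float          BAD_FLOAT  = -FLT_MAX;
static const double         BAD_DOUBLE = -DBL_MAX;

/* Select the default bad value by the static type of v (C11 _Generic);
   types without a listed default never compare as bad. */
#define IS_DEFAULT_BAD(v) _Generic((v),            \
    unsigned char:  (v) == BAD_BYTE,               \
    short:          (v) == BAD_SHORT,              \
    unsigned short: (v) == BAD_USHORT,             \
    int:            (v) == BAD_LONG,               \
    float:          (v) == BAD_FLOAT,              \
    double:         (v) == BAD_DOUBLE,             \
    default:        0)
```

       Each default sits at an extreme of its type's range, where genuine data
       is unlikely to fall.
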
   How do I change a routine to handle bad values?
       Examples can be found in most of the *.pd files in Basic/ (and
       hopefully many more places soon!).  Some of the logic might appear a
       bit unclear; that's probably because it is!  Comments appreciated.

       All routines should automatically propagate the bad status flag to
       output ndarrays, unless you declare otherwise.

       If a routine explicitly deals with bad values, you must provide this
       option to pp_def:

          HandleBad => 1

       This ensures that the correct variables are initialised for the $ISBAD
       etc macros.  It is also used by the automatic document-creation
       routines to provide default information on the bad value support of a
       routine without the user having to type it themselves (this is in its
       early stages).

       To flag a routine as NOT handling bad values, use

          HandleBad => 0

       This should cause the routine to print a warning if it's sent any
       ndarrays with the bad flag set.  Primitive's "intover" has had this
       set, since it would be awkward to convert, but I've not tried it out
       to see if it works.

       If you want to handle bad values but not set the state of all the
       output ndarrays, or if it's only one input ndarray that's important,
       then look at the PP rules "NewXSFindBadStatus" and "NewXSCopyBadStatus"
       and the corresponding "pp_def" options:

       FindBadStatusCode
           By default, "FindBadStatusCode" creates code which sets
           "$PRIV(bvalflag)" depending on the state of the bad flag of the
           input ndarrays: see "findbadstatus" in Basic/Gen/PP.pm.
           User-defined code should also store the value of "bvalflag" in the
           "$BADFLAGCACHE()" variable.

       CopyBadStatusCode
           The default code here is a bit simpler than for
           "FindBadStatusCode": the bad flag of the output ndarrays is set if
           "$BADFLAGCACHE()" is true after the code has been evaluated.
           Sometimes "CopyBadStatusCode" is set to an empty string, with the
           responsibility of setting the badflag of the output ndarray left to
           the "BadCode" section (e.g. the "xxxover" routines in
           Basic/Primitive/primitive.pd).

           Prior to PDL 2.4.3 we used "$PRIV(bvalflag)" instead of
           "$BADFLAGCACHE()".  This is dangerous since the "$PRIV()" structure
           is not guaranteed to be valid at this point in the code.

       If you have a routine that you want to be able to use in-place, look
       at the routines in bad.pd (or ops.pd) which use the "in-place" option
       to see how the bad flag is propagated to children using the
       "xxxBadStatusCode" options.  I decided not to automate this, as the
       rules would be a little complex: not every in-place op will need to
       propagate the badflag (e.g. unary functions).

       If the option

          HandleBad => 1

       is given, then many things happen.  For integer types, the readdata
       code automatically creates a variable called "<pdl name>_badval", which
       contains the bad value for that ndarray (see "get_xsdatapdecl()" in
       Basic/Gen/PP/PdlParObjs.pm).  However, do not hard code this name into
       your code!  Instead use macros (thanks to Tuomas for the suggestion):

         '$ISBAD(a(n=>1))'  expands to '$a(n=>1) == a_badval'
         '$ISGOOD(a())'                '$a()     != a_badval'
         '$SETBAD(bob())'              '$bob()    = bob_badval'

       The "$a(...)" is expanded as well.  You can also use a "$" before the
       pdl name, if you so wish, but it begins to look like line noise, e.g.
       "$ISGOOD($a())".

       If you cache an ndarray value in a variable (e.g. "index" in
       slices.pd), the following routines are useful:

          '$ISBADVAR(c_var,pdl)'       'c_var == pdl_badval'
          '$ISGOODVAR(c_var,pdl)'      'c_var != pdl_badval'
          '$SETBADVAR(c_var,pdl)'      'c_var  = pdl_badval'

       The following have been introduced; they may need playing around with
       to improve their use.

         '$PPISBAD(CHILD,[i])'         'CHILD_physdatap[i] == CHILD_badval'
         '$PPISGOOD(CHILD,[i])'        'CHILD_physdatap[i] != CHILD_badval'
         '$PPSETBAD(CHILD,[i])'        'CHILD_physdatap[i]  = CHILD_badval'

       You can use "NaN" as the bad value for any floating-point type,
       including complex.

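       In plain C, the $ISBAD family boils down to comparisons against the
       ndarray's bad value.  The macros below are hypothetical stand-ins:
       they take the bad value as an explicit argument instead of deriving
       "<pdl name>_badval".  The helper shows one caveat for the NaN case
       mentioned above.  NaN never compares equal to anything, itself
       included, so a NaN bad value has to be detected with isnan() rather
       than ==.

```c
#include <math.h>
#include <stdbool.h>

/* Sentinel tests in the style of $ISBAD / $ISGOOD / $SETBAD. */
#define ISBAD(v, badval)   ((v) == (badval))
#define ISGOOD(v, badval)  ((v) != (badval))
#define SETBAD(v, badval)  ((v) = (badval))

/* NaN != NaN, so fall back to isnan() when the bad value is NaN. */
bool isbad_double(double v, double badval)
{
    return isnan(badval) ? isnan(v) : v == badval;
}
```
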
       This all means that you can change

          Code => '$a() = $b() + $c();'

       to

          BadCode => 'if ( $ISBAD(b()) || $ISBAD(c()) ) {
                        $SETBAD(a());
                      } else {
                        $a() = $b() + $c();
                      }'

       leaving Code as it is.  PP::PDLCode will then create a loop something
       like

          if ( __trans->bvalflag ) {
               threadloop over BadCode
          } else {
               threadloop over Code
          }

       (it's probably easier to just look at the .xs file to see what goes
       on).

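       The dispatch above can be modelled by hand in C.  The flag is tested
       once, outside the loops, so clean data runs a loop with no per-element
       checks.  The function below is a hypothetical hand-written equivalent
       of the Code/BadCode pair for "$a() = $b() + $c()", not the code that
       PP actually generates.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical equivalent of Code/BadCode for a = b + c: the bvalflag-like
   switch is tested once, then one of two loops runs. */
void add_arrays(const double *b, const double *c, double *a, size_t n,
                bool bvalflag, double badval)
{
    if (bvalflag) {
        /* "BadCode": check every element */
        for (size_t i = 0; i < n; i++) {
            if (b[i] == badval || c[i] == badval)
                a[i] = badval;
            else
                a[i] = b[i] + c[i];
        }
    } else {
        /* "Code": no checks, so clean data pays no per-element cost */
        for (size_t i = 0; i < n; i++)
            a[i] = b[i] + c[i];
    }
}
```

       With the flag off, a bad value is treated as ordinary data (-99 + 20
       gives -79).  That is why the flag must be set whenever bad values may
       be present.
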
   Going beyond the Code section
       Similar to "BadCode", there are "BadBackCode" and "BadRedoDimsCode".

       Handling "EquivCPOffsCode" is a bit different.  Assuming that the only
       access to data is via the "$EQUIVCPOFFS(i,j)" macro, we can
       automatically create the 'bad' version of it; see the
       "[EquivCPOffsCode]" and "[Code]" rules in PDL::PP.

   Macro access to the bad flag of an ndarray
       Macros have been provided to give access to the bad-flag status of a
       pdl:

         '$PDLSTATEISBAD(a)'    -> '($PDL(a)->state & PDL_BADVAL) > 0'
         '$PDLSTATEISGOOD(a)'   -> '($PDL(a)->state & PDL_BADVAL) == 0'

         '$PDLSTATESETBAD(a)'   -> '$PDL(a)->state |= PDL_BADVAL'
         '$PDLSTATESETGOOD(a)'  -> '$PDL(a)->state &= ~PDL_BADVAL'

       For use in "xxxxBadStatusCode" (and other code that goes into the
       INIT: section) there are:

         '$SETPDLSTATEBAD(a)'       -> 'a->state |= PDL_BADVAL'
         '$SETPDLSTATEGOOD(a)'      -> 'a->state &= ~PDL_BADVAL'

         '$ISPDLSTATEBAD(a)'        -> '((a->state & PDL_BADVAL) > 0)'
         '$ISPDLSTATEGOOD(a)'       -> '((a->state & PDL_BADVAL) == 0)'

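       All of these macros reduce to setting, clearing and testing a single
       bit in the state word.  The sketch uses a made-up bit value
       (MY_BADVAL_FLAG) and a minimal struct in place of the real PDL_BADVAL
       constant and pdl structure.  Note that the other state bits are left
       untouched.

```c
/* Made-up bit for this sketch; the real PDL_BADVAL is defined by PDL. */
#define MY_BADVAL_FLAG 0x0400u

/* Minimal stand-in for a pdl with a state word. */
typedef struct { unsigned int state; } pdl_like;

#define SETPDLSTATEBAD(p)   ((p)->state |= MY_BADVAL_FLAG)
#define SETPDLSTATEGOOD(p)  ((p)->state &= ~MY_BADVAL_FLAG)
#define ISPDLSTATEBAD(p)    (((p)->state & MY_BADVAL_FLAG) != 0)
#define ISPDLSTATEGOOD(p)   (((p)->state & MY_BADVAL_FLAG) == 0)
```
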
       In PDL 2.4.3 the "$BADFLAGCACHE()" macro was introduced for use in
       "FindBadStatusCode" and "CopyBadStatusCode".

WHAT ABOUT DOCUMENTATION?

       One of the strengths of PDL is its on-line documentation.  The aim is to
       use this system to provide information on how/if a routine supports bad
       values.  In many cases "pp_def()" contains all the information anyway,
       so the function-writer doesn't need to do anything at all!  For the
       cases when this is not sufficient, there's the "BadDoc" option.  For
       code written at the Perl level, i.e. in a .pm file, use the "=for bad"
       pod directive.

       This information will be available via man/pod2man/html documentation.
       It's also accessible from the "perldl" or "pdl2" shells, using the
       "badinfo" command, and from the "pdldoc" shell command, using the "-b"
       option.

CURRENT ISSUES

       There are a number of areas that need work, user input, or both!  They
       are mentioned elsewhere in this document, but this is just to make sure
       they don't get lost.

   Trapping invalid mathematical operations
       Should we add exceptions to the functions in "PDL::Ops" to set the
       output bad for out-of-range input values?

        pdl> p log10(pdl(10,100,-1))

       I would like the above to produce "[1 2 BAD]", but this would slow down
       operations on all ndarrays.  We could check for "NaN"/"Inf" values
       after the operation, but I doubt that would be any faster.

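       The 'check afterwards' option can be sketched as a separate pass over
       the results.  Any non-finite value (NaN from the log10 of a negative
       number, -Inf from log10(0)) is replaced by the bad value.  The function
       name and calling convention are hypothetical.  The isfinite() test on
       every element is the extra cost referred to above.

```c
#include <math.h>
#include <stddef.h>

/* Replace every non-finite result (NaN or +/-Inf) with badval and
   return how many elements were flagged. */
size_t flag_nonfinite(double *out, size_t n, double badval)
{
    size_t flagged = 0;
    for (size_t i = 0; i < n; i++) {
        if (!isfinite(out[i])) {
            out[i] = badval;
            flagged++;
        }
    }
    return flagged;
}
```

       For the example input (10, 100, -1), log10 yields 1, 2 and NaN, and
       the pass turns these into 1, 2 and the bad value.
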
   Dataflow of the badflag
       Currently changes to the bad flag are propagated to the children of an
       ndarray, but perhaps they should also be passed on to the parents.
       With the advent of per-ndarray bad values we also need to consider how
       to handle changes to the value used to represent bad items.

EVERYTHING ELSE

       The build process has been affected.  The following files are now
       created during the build:

         Basic/Core/pdlcore.h      pdlcore.h.PL
                    pdlcore.c      pdlcore.c.PL
                    pdlapi.c       pdlapi.c.PL
                    Core.xs        Core.xs.PL
                    Core.pm        Core.pm.PL

       Several new files have been added:

         Basic/Pod/BadValues.pod (i.e. this file)

         t/bad.t

         Basic/Bad/
         Basic/Bad/Makefile.PL
                   bad.pd

       etc

TODO/SUGGESTIONS

       •   what to do about "$y = pdl(-2); $x = log10($y)": $x should be set
           bad, but it currently isn't.

       •   Allow the operations in PDL::Ops to skip the check for bad values
           when using NaN as a bad value and processing a floating-point
           ndarray.  Needs a fair bit of work to PDL::PP::PDLCode.

       •   "$pdl->baddata()" now updates all the children of this ndarray as
           well.  However, it's not clear what to do with parents, since:

             $y = $x->slice();
             $y->baddata(0)

           doesn't mean that $x should have its bad flag cleared.  However,
           after

             $y->baddata(1)

           it's sensible to assume that the parents now get flagged as
           containing bad values.

           PERHAPS you can only clear the bad value flag if you are NOT a
           child of another ndarray, whereas if you set the flag then all
           children AND parents should be set as well?

           Similarly, if you change the bad value in an ndarray, should this
           be propagated to parent & children?  Or should you only be able to
           do this on the 'top-level' ndarray?  Nasty...

       •   some of the names aren't appealing; I'm thinking of
           "orig_badvalue()" in Basic/Bad/bad.pd in particular.  Any
           suggestions appreciated.

AUTHOR

       Copyright (C) Doug Burke (djburke@cpan.org), 2000, 2006.

       The per-ndarray bad value support is by Heiko Klein (2006).

       Commercial reproduction of this documentation in a different format is
       forbidden.

perl v5.34.0                      2021-08-16                      BADVALUES(1)