BADVALUES(1) User Contributed Perl Documentation BADVALUES(1)

PDL::BadValues - Discussion of bad value support in PDL

What are bad values and why should I bother with them?
Sometimes it's useful to be able to specify that a certain value is 'bad'
or 'missing'; for example, CCDs used in astronomy produce 2D images which
are not perfect, since certain areas contain invalid data due to
imperfections in the detector. Whilst PDL's powerful index routines
and all the complicated business with dataflow, slices, etc. mean
that these regions can be ignored in processing, it's awkward to do. It
would be much easier to be able to say "$c = $x + $y" and leave all the
hassle to the computer.

If you're not interested in this, then you may (rightly) be concerned
with how this affects the speed of PDL, since the overhead of checking
for a bad value at each operation can be large. Because of this, the
code has been written to be as fast as possible - particularly when
operating on ndarrays which do not contain bad values. In fact, you
should notice essentially no speed difference when working with
ndarrays which do not contain bad values.

You may also ask 'well, my computer supports IEEE NaN, so I already
have this'. Well, yes and no - many routines, such as "y=sin(x)", will
propagate NaNs without the user having to code differently, but
routines such as "qsort", or finding the median of an array, need to be
re-coded to handle bad values. For floating-point datatypes, "NaN" and
"Inf" can be used to flag bad values, but by default special values are
used (see "Default bad values"). I do not have any benchmarks to see
which option is faster.

As of PDL 2.040, you can have different bad values for separate
ndarrays of the same type.

A quick overview
    pdl> $x = sequence(4,3);
    pdl> p $x
    [
     [ 0  1  2  3]
     [ 4  5  6  7]
     [ 8  9 10 11]
    ]
    pdl> $x = $x->setbadif( $x % 3 == 2 )
    pdl> p $x
    [
     [  0   1 BAD   3]
     [  4 BAD   6   7]
     [BAD   9  10 BAD]
    ]
    pdl> $x *= 3
    pdl> p $x
    [
     [  0   3 BAD   9]
     [ 12 BAD  18  21]
     [BAD  27  30 BAD]
    ]
    pdl> p $x->sum
    120
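
As a rough follow-up to the session above, the sketch below repeats the first steps as a standalone script and counts the flagged elements. It assumes only routines documented in PDL::Bad ("setbadif", "nbad", "ngood", "setbadtoval"); the variable names and the fill value -1 are illustrative choices, not part of the original session.

```perl
use strict;
use warnings;
use PDL;

# Rebuild the ndarray from the session above: every element whose value
# satisfies value % 3 == 2 (i.e. 2, 5, 8 and 11) is flagged as bad.
my $x = sequence(4,3);
$x = $x->setbadif( $x % 3 == 2 );

my $nbad  = $x->nbad;     # number of bad elements
my $ngood = $x->ngood;    # number of good elements
my $sum   = $x->sum;      # bad elements are skipped by sum

# Replace the bad elements with an ordinary value (-1 is an arbitrary,
# illustrative choice); the copy no longer contains bad values.
my $filled = $x->setbadtoval(-1);

print "bad: $nbad, good: $ngood, sum of good elements: $sum\n";
print "filled copy: $filled\n";
```

Counting with "nbad" and "ngood" before a reduction is a cheap way to see how many elements actually contributed to a result such as the sum.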

"demo bad" and "demo bad2" within perldl or pdl2 give a demonstration
of some of the things possible with bad values. These are also
available on PDL's web-site, at http://pdl.perl.org/demos/. See
PDL::Bad for useful routines for working with bad values and t/bad.t to
see them in action.

To find out if a routine supports bad values, use the "badinfo" command
in perldl or pdl2 or the "-b" option to pdldoc. This facility is
currently a 'proof of concept' (or, more realistically, a quick hack)
so expect it to be rough around the edges.

Each ndarray contains a flag - accessible via "$pdl->badflag" - to say
whether there's any bad data present:

• If false/0, which means there's no bad data here, the code supplied
  by the "Code" option to "pp_def()" is executed.

• If true/1, then this says there MAY be bad data in the ndarray, so
  the code in the "BadCode" option is used (assuming that the
  "pp_def()" for this routine has been updated to have a BadCode key).
  You get all the advantages of threading, as with the "Code" option,
  but it will run slower since you are going to have to handle the
  presence of bad values.

If you create an ndarray, it will have its bad-value flag set to 0. To
change this, use "$pdl->badflag($new_bad_status)", where
$new_bad_status can be 0 or 1. When a routine creates an ndarray, its
bad-value flag will depend on the input ndarrays: unless over-ridden
(see the "CopyBadStatusCode" option to "pp_def"), the bad-value flag
will be set true if any of the input ndarrays have their bad-value flag
set. To check that an ndarray really contains bad data, use the
"check_badflag" method.
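
The difference between "the flag is set" and "bad data is present" can be sketched as follows. This is a minimal illustration that assumes the "badflag", "nbad" and "check_badflag" methods behave as described in PDL::Bad; the variable names are made up for the example.

```perl
use strict;
use warnings;
use PDL;

my $x = sequence(5);
my $fresh = $x->badflag;        # new ndarrays start with the flag unset

$x->badflag(1);                 # the flag only says bad values MAY be present
my $flagged = $x->badflag;
my $nbad    = $x->nbad;         # nothing has actually been marked bad

my $y = $x + 1;                 # outputs inherit the flag from their inputs
my $inherited = $y->badflag;

$x->check_badflag;              # scans the data and clears the flag if
my $checked = $x->badflag;      # no bad values are found

print "fresh=$fresh flagged=$flagged nbad=$nbad ",
      "inherited=$inherited checked=$checked\n";
```

Because "check_badflag" has to scan the data, it is best kept for places where a stale flag would otherwise force many later operations down the slower bad-value path.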

NOTE: propagation of the badflag

If you change the badflag of an ndarray, this change is propagated to
all the children of that ndarray, so

    pdl> $x = zeroes(20,30);
    pdl> $y = $x->slice('0:10,0:10');
    pdl> $c = $y->slice(',(2)');
    pdl> print ">>c: ", $c->badflag, "\n";
    >>c: 0
    pdl> $x->badflag(1);
    pdl> print ">>c: ", $c->badflag, "\n";
    >>c: 1

No change is made to the parents of an ndarray, so

    pdl> print ">>x: ", $x->badflag, "\n";
    >>x: 1
    pdl> $c->badflag(0);
    pdl> print ">>x: ", $x->badflag, "\n";
    >>x: 1

Thoughts:

• Perhaps the badflag should ONLY be cleared IF an ndarray has NO
  parents, with this change propagating to all the children of that
  ndarray. I am not so keen on this anymore (too awkward to code, for
  one).

• "$x->badflag(1)" should propagate the badflag to BOTH parents and
  children.

This shouldn't be hard to implement (although an initial attempt
failed!). Does it make sense though? There's also the issue of what
happens if you change the badvalue of an ndarray: should the change
propagate to children/parents (yes), or should you only be able to
change the badvalue at the 'top' level, i.e. on those ndarrays which
do not have parents?

The "orig_badvalue()" method returns the compile-time value for a given
datatype. It works on ndarrays, PDL::Type objects, and numbers - e.g.

    $pdl->orig_badvalue(), byte->orig_badvalue(), and orig_badvalue(4).

It also has a horrible name...

To get the current bad value, use the "badvalue()" method - it has the
same syntax as "orig_badvalue()".

To change the current bad value, supply the new number to badvalue - e.g.

    $pdl->badvalue(2.3), byte->badvalue(2), badvalue(5,-3e34).

Note: the value is silently converted to the correct C type, and
returned - e.g. "byte->badvalue(-26)" returns 230 on my Linux machine.

Note that changes to the bad value are NOT propagated to
previously-created ndarrays - they will still have the bad flag set,
but suddenly the elements that were bad will become 'good', but
containing the old bad value. See discussion below. It's not a problem
for floating-point types which use NaN, since you cannot change their
badvalue.
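
The read-only forms of these calls can be tried without changing any state. The sketch below assumes a PDL session in which the bad value for "short" has not been altered, so the current value still equals the compile-time one; the variable names are illustrative.

```perl
use strict;
use warnings;
use PDL;

my $x = sequence(short, 4);

# Compile-time default for the ndarray's type, and the value currently
# in use for this ndarray.
my $default = $x->orig_badvalue;
my $current = $x->badvalue;

# The same call works on a PDL::Type object.
my $type_default = short->orig_badvalue;

print "short: default bad value $default, current bad value $current\n";
```

Comparing "badvalue" against "orig_badvalue" in this way is one simple check for whether the bad value has been changed from its default.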

Bad values and boolean operators
For those boolean operators in PDL::Ops, evaluation on a bad value
returns the bad value. Whilst this means that

    $mask = $img > $thresh;

correctly propagates bad values, it will cause problems for checks such
as

    do_something() if any( $img > $thresh );

which need to be re-written as something like

    do_something() if any( setbadtoval( ($img > $thresh), 0 ) );

When using one of the 'projection' functions in PDL::Ufunc - such as
orover - bad values are skipped over (see the documentation of these
functions for the current (poor) handling of the case when all elements
are bad).
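
A small worked version of the rewritten check might look like the sketch below. The data, the threshold and the variable names are invented for illustration; only "setbadat", "nbad", "setbadtoval" and "any" are taken from PDL itself.

```perl
use strict;
use warnings;
use PDL;

my $img = pdl(1, 5, 9);
$img = $img->setbadat(1);          # mark the middle element as bad
my $thresh = 4;

# The comparison result is bad wherever $img is bad.
my $mask = $img > $thresh;
my $n_bad_in_mask = $mask->nbad;

# Treat "unknown" as false before reducing the mask to a single answer.
my $clean = setbadtoval($mask, 0);
my $hit   = any($clean) ? 1 : 0;

print "bad entries in mask: $n_bad_in_mask, any above threshold: $hit\n";
```

Choosing 0 as the replacement means a bad pixel never triggers the action; pass 1 instead if an unknown pixel should count as a hit.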

A bad value for each ndarray, and related issues
There is one default bad value for each datatype, but you can have a
separate bad value for each ndarray as of PDL 2.040.

PDL code just needs to access the %PDL::Config hash (e.g.
Basic/Bad/bad.pd) to find out whether bad-value support is required.

A new flag has been added to the state of an ndarray - "PDL_BADVAL". If
unset, then the ndarray does not contain bad values, and so all the
support code can be ignored. If set, it does not guarantee that bad
values are present, just that they should be checked for. Thanks to
Christian, "badflag()" - which sets/clears this flag (see
Basic/Bad/bad.pd) - will update ALL the children/grandchildren/etc. of
an ndarray if its state changes (see "badflag" in Basic/Bad/bad.pd and
"propagate_badflag" in Basic/Core/Core.xs.PL). It's not clear what to
do with parents: I can see the reason for propagating a 'set badflag'
request to parents, but I think a child should NOT be able to clear the
badflag of a parent. There's also the issue of what happens when you
change the bad value for an ndarray.

The "pdl_trans" structure has been extended to include an integer
value, "bvalflag", which acts as a switch to tell the code whether to
handle bad values or not. This value is set if any of the input
ndarrays have their "PDL_BADVAL" flag set (although this code can be
replaced by setting "FindBadStatusCode" in pp_def). The logic of the
check is going to get a tad more complicated if I allow routines to
fall back to using the "Code" section for floating-point types.

The default bad values are now stored in a structure within the Core
PDL structure - "PDL.bvals" (e.g. Basic/Core/pdlcore.h.PL); see also
"typedef badvals" in Basic/Core/pdl.h.PL and the BOOT code of
Basic/Core/Core.xs.PL where the values are initialised to (hopefully)
sensible values. See Basic/Bad/bad.pd for read/write routines to the
values.

Why not make a PDL subclass?
The support for bad values could have been done as a PDL sub-class.
The advantage of this approach would be that you only load in the code
to handle bad values if you actually want to use them. The downside is
that the code then gets separated: any bug fixes/improvements have to
be done to the code in two different files. With the present approach
the code is in the same "pp_def" function (although there is still the
problem that both "Code" and "BadCode" sections need updating).

Default bad values
The default/original bad values are set to (taken from the Starlink
distribution):

    #include <limits.h>
    #include <float.h>

    PDL_Byte    == UCHAR_MAX
    PDL_Short   == SHRT_MIN
    PDL_Ushort  == USHRT_MAX
    PDL_Long    == INT_MIN
    PDL_Float   == -FLT_MAX
    PDL_Double  == -DBL_MAX
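
One way to check the integer entries of this table from Perl is to compare "orig_badvalue()" with the corresponding constants exported by the core POSIX module. This is only a sketch: it assumes a PDL build that uses the defaults listed above, and the %expected and %got hashes are names invented for the example.

```perl
use strict;
use warnings;
use PDL;
use POSIX qw(UCHAR_MAX SHRT_MIN USHRT_MAX INT_MIN);

# Limits taken from the C headers, via POSIX.
my %expected = (
    byte   => UCHAR_MAX,
    short  => SHRT_MIN,
    ushort => USHRT_MAX,
    long   => INT_MIN,
);

# Compile-time bad values reported by PDL for the same types.
my %got = (
    byte   => byte->orig_badvalue,
    short  => short->orig_badvalue,
    ushort => ushort->orig_badvalue,
    long   => long->orig_badvalue,
);

for my $name (sort keys %expected) {
    my $status = $got{$name} == $expected{$name} ? "matches" : "differs";
    print "$name: PDL default $got{$name} $status C limit $expected{$name}\n";
}
```

The floating-point entries are left out here, since a build may use NaN rather than -FLT_MAX or -DBL_MAX for those types.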

How do I change a routine to handle bad values?
Examples can be found in most of the *.pd files in Basic/ (and
hopefully many more places soon!). Some of the logic might appear a
bit unclear - that's probably because it is! Comments appreciated.

All routines should automatically propagate the bad status flag to
output ndarrays, unless you declare otherwise.

If a routine explicitly deals with bad values, you must provide this
option to pp_def:

    HandleBad => 1

This ensures that the correct variables are initialised for the $ISBAD
etc. macros. It is also used by the automatic document-creation routines
to provide default information on the bad value support of a routine
without the user having to type it themselves (this is in its early
stages).

To flag a routine as NOT handling bad values, use

    HandleBad => 0

This should cause the routine to print a warning if it's sent any
ndarrays with the bad flag set. Primitive's "intover" has had this set
- since it would be awkward to convert - but I've not tried it out to
see if it works.

If you want to handle bad values but not set the state of all the
output ndarrays, or if it's only one input ndarray that's important,
then look at the PP rules "NewXSFindBadStatus" and "NewXSCopyBadStatus"
and the corresponding "pp_def" options:

FindBadStatusCode
    By default, "FindBadStatusCode" creates code which sets
    "$PRIV(bvalflag)" depending on the state of the bad flag of the
    input ndarrays: see "findbadstatus" in Basic/Gen/PP.pm.
    User-defined code should also store the value of "bvalflag" in the
    "$BADFLAGCACHE()" variable.

CopyBadStatusCode
    The default code here is a bit simpler than for
    "FindBadStatusCode": the bad flag of the output ndarrays is set if
    "$BADFLAGCACHE()" is true after the code has been evaluated.
    Sometimes "CopyBadStatusCode" is set to an empty string, with the
    responsibility of setting the badflag of the output ndarray left to
    the "BadCode" section (e.g. the "xxxover" routines in
    Basic/Primitive/primitive.pd).

    Prior to PDL 2.4.3 we used "$PRIV(bvalflag)" instead of
    "$BADFLAGCACHE()". This is dangerous since the "$PRIV()" structure
    is not guaranteed to be valid at this point in the code.

If you have a routine that you want to be able to use as in-place, look
at the routines in bad.pd (or ops.pd) which use the "in-place" option
to see how the bad flag is propagated to children using the
"xxxBadStatusCode" options. I decided not to automate this as rules
would be a little complex, since not every in-place op will need to
propagate the badflag (e.g. unary functions).

If the option

    HandleBad => 1

is given, then many things happen. For integer types, the readdata
code automatically creates a variable called "<pdl name>_badval", which
contains the bad value for that ndarray (see "get_xsdatapdecl()" in
Basic/Gen/PP/PdlParObjs.pm). However, do not hard code this name into
your code! Instead use macros (thanks to Tuomas for the suggestion):

    '$ISBAD(a(n=>1))'  expands to  '$a(n=>1) == a_badval'
    '$ISGOOD(a())'                 '$a() != a_badval'
    '$SETBAD(bob())'               '$bob() = bob_badval'

well, the "$a(...)" is expanded as well. Also, you can use a "$" before
the pdl name, if you so wish, but it begins to look like line noise -
e.g. "$ISGOOD($a())".

If you cache an ndarray value in a variable -- e.g. "index" in slices.pd
-- the following routines are useful:

    '$ISBADVAR(c_var,pdl)'   'c_var == pdl_badval'
    '$ISGOODVAR(c_var,pdl)'  'c_var != pdl_badval'
    '$SETBADVAR(c_var,pdl)'  'c_var = pdl_badval'

The following have been introduced. They may need playing around with
to improve their use.

    '$PPISBAD(CHILD,[i])'    'CHILD_physdatap[i] == CHILD_badval'
    '$PPISGOOD(CHILD,[i])'   'CHILD_physdatap[i] != CHILD_badval'
    '$PPSETBAD(CHILD,[i])'   'CHILD_physdatap[i] = CHILD_badval'

You can use "NaN" as the bad value for any floating-point type,
including complex.

This all means that you can change

    Code => '$a() = $b() + $c();'

to

    BadCode => 'if ( $ISBAD(b()) || $ISBAD(c()) ) {
                  $SETBAD(a());
                } else {
                  $a() = $b() + $c();
                }'

leaving Code as it is. PP::PDLCode will then create a loop something
like

    if ( __trans->bvalflag ) {
        threadloop over BadCode
    } else {
        threadloop over Code
    }

(it's probably easier to just look at the .xs file to see what goes
on).

Going beyond the Code section
Similar to "BadCode", there's "BadBackCode" and "BadRedoDimsCode".

Handling "EquivCPOffsCode" is a bit different: under the assumption
that the only access to data is via the "$EQUIVCPOFFS(i,j)" macro,
we can automatically create the 'bad' version of it; see the
"[EquivCPOffsCode]" and "[Code]" rules in PDL::PP.

Macro access to the bad flag of an ndarray
Macros are provided to give access to the bad-flag status of a pdl:

    '$PDLSTATEISBAD(a)'    -> '($PDL(a)->state & PDL_BADVAL) > 0'
    '$PDLSTATEISGOOD(a)'   -> '($PDL(a)->state & PDL_BADVAL) == 0'

    '$PDLSTATESETBAD(a)'   -> '$PDL(a)->state |= PDL_BADVAL'
    '$PDLSTATESETGOOD(a)'  -> '$PDL(a)->state &= ~PDL_BADVAL'

For use in "xxxxBadStatusCode" (+ other stuff that goes into the INIT:
section) there are:

    '$SETPDLSTATEBAD(a)'   -> 'a->state |= PDL_BADVAL'
    '$SETPDLSTATEGOOD(a)'  -> 'a->state &= ~PDL_BADVAL'

    '$ISPDLSTATEBAD(a)'    -> '((a->state & PDL_BADVAL) > 0)'
    '$ISPDLSTATEGOOD(a)'   -> '((a->state & PDL_BADVAL) == 0)'

In PDL 2.4.3 the "$BADFLAGCACHE()" macro was introduced for use in
"FindBadStatusCode" and "CopyBadStatusCode".

One of the strengths of PDL is its on-line documentation. The aim is to
use this system to provide information on how/if a routine supports bad
values: in many cases "pp_def()" contains all the information anyway,
so the function-writer doesn't need to do anything at all! For the
cases when this is not sufficient, there's the "BadDoc" option. For
code written at the Perl level - i.e. in a .pm file - use the "=for
bad" pod directive.

This information will be available via man/pod2man/html documentation.
It's also accessible from the "perldl" or "pdl2" shells - using the
"badinfo" command - and the "pdldoc" shell command - using the "-b"
option.

There are a number of areas that need work, user input, or both! They
are mentioned elsewhere in this document, but this is just to make sure
they don't get lost.

Trapping invalid mathematical operations
Should we add exceptions to the functions in "PDL::Ops" to set the
output bad for out-of-range input values?

    pdl> p log10(pdl(10,100,-1))

I would like the above to produce "[1 2 BAD]", but this would slow down
operations on all ndarrays. We could check for "NaN"/"Inf" values
after the operation, but I doubt that would be any faster.
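
Until something like this is built in, the check after the operation can be made explicitly. The sketch below assumes the "setnantobad" routine from PDL::Bad; it is a workaround applied by the caller, not a change to "log10" itself, and the variable names are illustrative.

```perl
use strict;
use warnings;
use PDL;

my $in  = pdl(10, 100, -1);
my $raw = log10($in);             # the last element is NaN
my $out = $raw->setnantobad;      # NaN elements are converted to bad

my $nbad = $out->nbad;            # number of elements flagged as bad
my $sum  = $out->sum;             # only the good elements are summed

print "result: $out\n";
print "bad elements: $nbad, sum of good elements: $sum\n";
```

This keeps the cost of the extra pass confined to the call sites that need it, rather than slowing down every operation.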

Dataflow of the badflag
Currently changes to the bad flag are propagated to the children of an
ndarray, but perhaps they should also be passed on to the parents.
With the advent of per-ndarray bad values we need to consider how
to handle changes to the value used to represent bad items too.

The build process has been affected. The following files are now
created during the build:

    File                    Generated from
    Basic/Core/pdlcore.h    pdlcore.h.PL
    Basic/Core/pdlcore.c    pdlcore.c.PL
    Basic/Core/pdlapi.c     pdlapi.c.PL
    Basic/Core/Core.xs      Core.xs.PL
    Basic/Core/Core.pm      Core.pm.PL

Several new files have been added:

    Basic/Pod/BadValues.pod (i.e. this file)

    t/bad.t

    Basic/Bad/
    Basic/Bad/Makefile.PL
    Basic/Bad/bad.pd

etc.

• What to do about "$y = pdl(-2); $x = log10($y)" - $x should be set
  bad, but it currently isn't.

• Allow the operations in PDL::Ops to skip the check for bad values
  when using NaN as a bad value and processing a floating-point
  ndarray. Needs a fair bit of work to PDL::PP::PDLCode.

• "$pdl->baddata()" now updates all the children of this ndarray as
  well. However, not sure what to do with parents, since:

      $y = $x->slice();
      $y->baddata(0)

  doesn't mean that $x should have its bad flag cleared. However,
  after

      $y->baddata(1)

  it's sensible to assume that the parents now get flagged as
  containing bad values.

  PERHAPS you can only clear the bad value flag if you are NOT a
  child of another ndarray, whereas if you set the flag then all
  children AND parents should be set as well?

  Similarly, if you change the bad value in an ndarray, should this
  be propagated to parent & children? Or should you only be able to
  do this on the 'top-level' ndarray? Nasty...

• Some of the names aren't appealing - I'm thinking of
  "orig_badvalue()" in Basic/Bad/bad.pd in particular. Any
  suggestions appreciated.

Copyright (C) Doug Burke (djburke@cpan.org), 2000, 2006.

The per-ndarray bad value support is by Heiko Klein (2006).

Commercial reproduction of this documentation in a different format is
forbidden.

perl v5.34.0 2021-08-16 BADVALUES(1)