funtext(n) SAORD Documentation funtext(n)
2
3
4
6 Funtext: Support for Column-based Text Files
7
This document contains a summary of the options for processing
column-based text files.
11
13 Funtools will automatically sense and process "standard" column-based
14 text files as if they were FITS binary tables without any change in
15 Funtools syntax. In particular, you can filter text files using the
16 same syntax as FITS binary tables:
17
18 fundisp foo.txt'[cir 512 512 .1]'
19 fundisp -T foo.txt > foo.rdb
20 funtable foo.txt'[pha=1:10,cir 512 512 10]' foo.fits
21
22 The first example displays a filtered selection of a text file. The
23 second example converts a text file to an RDB file. The third example
24 converts a filtered selection of a text file to a FITS binary table.
25
26 Text files can also be used in Funtools image programs. In this case,
27 you must provide binning parameters (as with raw event files), using
28 the bincols keyword specifier:
29
bincols=([xname[:tlmin[:tlmax[:binsiz]]]],[yname[:tlmin[:tlmax[:binsiz]]]])
31
32 For example:
33
34 funcnts foo'[bincols=(x:1024,y:1024)]' "ann 512 512 0 10 n=10"
35
36 Standard Text Files
37
38 Standard text files have the following characteristics:
39
40 · Optional comment lines start with #
41
42 · Optional blank lines are considered comments
43
44 · An optional table header consists of the following (in order):
45
46 · a single line of alpha-numeric column names
47
48 · an optional line of unit strings containing the same number of
49 cols
50
51 · an optional line of dashes containing the same number of cols
52
· Data lines follow the optional header and (for the present) consist of
the same number of columns as the header.
56
57 · Standard delimiters such as space, tab, comma, semi-colon, and bar.
58
59 Examples:
60
61 # rdb file
62 foo1 foo2 foo3 foos
63 ---- ---- ---- ----
64 1 2.2 3 xxxx
65 10 20.2 30 yyyy
66
67 # multiple consecutive whitespace and dashes
68 foo1 foo2 foo3 foos
69 --- ---- ---- ----
70 1 2.2 3 xxxx
71 10 20.2 30 yyyy
72
73 # comma delims and blank lines
74 foo1,foo2,foo3,foos
75
76 1,2.2,3,xxxx
77 10,20.2,30,yyyy
78
79 # bar delims with null values
foo1|foo2|foo3|foos
1||3|xxxx
10|20.2||yyyy
83
84 # header-less data
85 1 2.2 3 xxxx
86 10 20.2 30 yyyy
87
The default set of token delimiters consists of spaces, tabs, commas,
semi-colons, and vertical bars. Several parsers are used simultaneously
to analyze a line of text in different ways. One parser allows any
combination of spaces, tabs, and commas to be squashed into a single
delimiter, so consecutive delimiters never produce a null value. Another
treats tab, semi-colon, and vertical bar as delimiters that support null
values: two consecutive delimiters imply a null value (as in an RDB
file). A successful parser is one which returns a consistent number of
columns for all rows, with each column having a consistent data type.
More than one parser can be successful. For now, it is assumed that
successful parsers all return the same tokens for a given line.
(Theoretically, there are pathological cases, which will be taken care
of as needed.) Bad parsers are discarded on the fly.
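
The selection rule described above can be sketched in a few lines of Python. This is an illustration only, not the Funtools implementation: the names PARSERS, classify and successful_parsers are made up for this example, and the sketch does not handle the case where a parser's delimiters never occur in the data, nor does it compare the tokens returned by surviving parsers.

```python
import re


def classify(token):
    """Return 'int', 'double', or 'string' for one text token."""
    for type_name, convert in (("int", int), ("double", float)):
        try:
            convert(token)
            return type_name
        except ValueError:
            continue
    return "string"


# Two hypothetical parsers modelled on the description in the text.
PARSERS = {
    # spaces, tabs, commas: runs collapse into one delimiter, so no nulls
    "squash": lambda line: re.split(r"[ \t,]+", line.strip()),
    # tab, semi-colon, bar: every delimiter counts, so "" marks a null
    "nulls": lambda line: [tok.strip() for tok in re.split(r"[\t;|]", line)],
}


def successful_parsers(lines):
    """Keep parsers with a consistent column count and consistent types."""
    good = {}
    for name, split in PARSERS.items():
        rows = [split(line) for line in lines]
        if len({len(row) for row in rows}) != 1:
            continue  # column count changes from row to row
        consistent = True
        for column in zip(*rows):
            # null tokens ("") are compatible with any data type
            types = {classify(tok) for tok in column if tok != ""}
            if len(types) > 1:
                consistent = False
                break
        if consistent:
            good[name] = rows
    return good


rdb_lines = ["1\t\t3\txxxx", "10\t20.2\t\tyyyy"]
print(list(successful_parsers(rdb_lines)))  # ['nulls']
```

On this tab-delimited input the "squash" parser does return three tokens per row, but its second column mixes int and double, so it is discarded and only the null-aware parser survives.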
102
If the header does not exist, then names "col1", "col2", etc. are
assigned to the columns to allow filtering. Furthermore, the data type
of each column is determined by the token found in that column of the
first data line, and is one of the following: string, int, or double.
Thus, all of the above examples return the following display:
109
110 fundisp foo'[foo1>5]'
111 FOO1 FOO2 FOO3 FOOS
112 ---------- --------------------- ---------- ------------
113 10 20.20000000 30 yyyy
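
As a rough illustration of the naming and typing rule above, the following Python sketch assigns default column names and infers a type for each token of the first data row. The function names are hypothetical, and Python's own int() and float() conversions stand in for whatever tests the Funtools parsers actually apply.

```python
def classify(token):
    """Return 'int', 'double', or 'string' for one text token."""
    for type_name, convert in (("int", int), ("double", float)):
        try:
            convert(token)
            return type_name
        except ValueError:
            continue
    return "string"


def describe_columns(first_data_row, header=None):
    """Pair each column name with the type found in the first data row."""
    if header is None:
        # header-less table: fall back to col1, col2, ...
        header = [f"col{i}" for i in range(1, len(first_data_row) + 1)]
    return [(name, classify(tok)) for name, tok in zip(header, first_data_row)]


print(describe_columns(["1", "2.2", "3", "xxxx"]))
# [('col1', 'int'), ('col2', 'double'), ('col3', 'int'), ('col4', 'string')]
```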
114
115 Comments Convert to Header Params
116
117 Comments which precede data rows are converted into header parameters
118 and will be written out as such using funimage or funhead. Two styles
119 of comments are recognized:
120
1. FITS-style comments have an equal sign "=" between the keyword and
value and an optional slash "/" to signify a comment. The strict FITS
rules on column positions are not enforced. In addition, strings only
need to be quoted if they contain whitespace. For example, the following
are valid FITS-style comments:
126
127 # fits0 = 100
128 # fits1 = /usr/local/bin
129 # fits2 = "/usr/local/bin /opt/local/bin"
130 # fits3c = /usr/local/bin /opt/local/bin /usr/bin
131 # fits4c = "/usr/local/bin /opt/local/bin" / path dir
132
133 Note that the fits3c comment is not quoted and therefore its value is
134 the single token "/usr/local/bin" and the comment is "opt/local/bin
135 /usr/bin". This is different from the quoted comment in fits4c.
136
2. Free-form comments can have an optional colon separator between the
keyword and value. In the absence of quotes, all tokens after the
keyword are part of the value, i.e. no comment is allowed. If a string is
quoted, then a slash "/" after the string will signify a comment. For
example:
142
143 # com1 /usr/local/bin
144 # com2 "/usr/local/bin /opt/local/bin"
145 # com3 /usr/local/bin /opt/local/bin /usr/bin
146 # com4c "/usr/local/bin /opt/local/bin" / path dir
147
148 # com11: /usr/local/bin
149 # com12: "/usr/local/bin /opt/local/bin"
150 # com13: /usr/local/bin /opt/local/bin /usr/bin
151 # com14c: "/usr/local/bin /opt/local/bin" / path dir
152
Note that com3 and com13 are not quoted, so the whole string is part of
the value, while com4c and com14c are quoted and have comments
following the values.
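
The two comment styles can be summarized in a short Python sketch. It is a reading of the rules stated above, checked against the examples in this section, and not the Funtools parser itself; parse_comment is a hypothetical name.

```python
import re


def parse_comment(line):
    """Return (keyword, value, comment) for one '#' line, or None."""
    body = line.lstrip("#").strip()
    match = re.match(r"(\w+)\s*(=|:)?\s*(.*)$", body)
    if match is None:
        return None
    key, sep, rest = match.group(1), match.group(2), match.group(3)
    if rest.startswith('"'):
        # quoted value (either style): runs up to the closing quote
        value, _, tail = rest[1:].partition('"')
    elif sep == "=":
        # FITS style, unquoted: the value is the first token only
        parts = rest.split(None, 1)
        value = parts[0] if parts else ""
        tail = parts[1] if len(parts) > 1 else ""
    else:
        # free form, unquoted: every remaining token belongs to the value
        return key, rest, ""
    tail = tail.strip()
    # a slash after the value introduces the comment
    comment = tail[1:].strip() if tail.startswith("/") else ""
    return key, value, comment


print(parse_comment("# fits3c = /usr/local/bin /opt/local/bin /usr/bin"))
# ('fits3c', '/usr/local/bin', 'opt/local/bin /usr/bin')
print(parse_comment('# com14c: "/usr/local/bin /opt/local/bin" / path dir'))
# ('com14c', '/usr/local/bin /opt/local/bin', 'path dir')
```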
156
157 Some text files have column name and data type information in the
158 header. You can specify the format of column information contained in
159 the header using the "hcolfmt=" specification. See below for a detailed
160 description.
161
162 Multiple Tables in a Single File
163
Multiple tables are supported in a single file. If an RDB-style file is
sensed, then a ^L (form feed) will signify end of table. Otherwise,
an end of table is sensed when a new header (i.e., all alphanumeric
columns) is found. (Note that this heuristic does not work for single
column tables where the column type is ASCII and the table that follows
also has only one column.) You can also specify characters that signal
an end of table condition using the eot= keyword. See below for
details.
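
The new-header heuristic is spelled out in more detail under the lazyeot keyword later in this document. A minimal Python sketch of that test, applied to rows that have already been split into tokens, might look as follows. The function names are hypothetical and "string token" is approximated here as "anything Python's float() rejects".

```python
def is_string(token):
    """True if the token does not parse as a number."""
    try:
        float(token)
        return False
    except ValueError:
        return True


def starts_new_table(prev_row, row):
    """Apply the lazy end-of-table test to two consecutive token rows."""
    if not row or not all(is_string(tok) for tok in row):
        return False  # the check only begins on an all-string row
    prev_strings = sum(is_string(tok) for tok in prev_row)
    # new table if the token count changed, or if the previous row
    # held a different number of string tokens than this row
    return len(prev_row) != len(row) or prev_strings != len(row)


print(starts_new_table(["3", "four"], ["jval1", "jval2", "tval3"]))      # True
print(starts_new_table(["4", "5", "six"], ["jval1", "jval2", "tval3"]))  # True
print(starts_new_table(["10", "20", "thirty"], ["40", "50", "sixty"]))   # False
```

The same sketch also shows why the heuristic fails for consecutive single-column ASCII tables: both the token count and the string-token count are 1 in every row, so no transition is detected.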
172
173 You can access the nth table (starting from 1) in a multi-table file by
174 enclosing the table number in brackets, as with a FITS extension:
175
176 fundisp foo'[2]'
177
The above example will display the second table in the file. (Index
values start at 1 in order to maintain logical compatibility with FITS
files, where extension numbers also start at 1.)
181
182 TEXT() Specifier
183
As with ARRAY() and EVENTS() specifiers for raw image arrays and raw
event lists respectively, you can use TEXT() on text files to pass
key=value options to the parsers. An empty set of keywords is equivalent
to not having TEXT() at all, that is:
188
189 fundisp foo
190 fundisp foo'[TEXT()]'
191
are equivalent. When indexing into a multi-table file, the table index
number is placed before the TEXT() specifier as the first token:
194
195 fundisp foo'[2,TEXT(...)]'
196
The filter specification is placed after the TEXT() specifier,
separated by a comma, or in an entirely separate bracket:
199
200 fundisp foo'[TEXT(...),circle 512 512 .1]'
201 fundisp foo'[2,TEXT(...)][circle 512 512 .1]'
202
TEXT() Keyword Options
204
205 The following is a list of keywords that can be used within the TEXT()
206 specifier (the first three are the most important):
207
208 · delims="[delims]"
209
210 Specify token delimiters for this file. Only a single parser having
211 these delimiters will be used to process the file.
212
213 fundisp foo.fits'[TEXT(delims="!")]'
214 fundisp foo.fits'[TEXT(delims="\t%")]'
215
216 · comchars="[comchars]"
217
218 Specify comment characters. You must include "\n" to allow blank
219 lines. These comment characters will be used for all standard
220 parsers (unless delims are also specified).
221
222 fundisp foo.fits'[TEXT(comchars="!\n")]'
223
224 · cols="[name1:type1 ...]"
225
226 Specify names and data type of columns. This overrides header names
227 and/or data types in the first data row or default names and data
228 types for header-less tables.
229
230 fundisp foo.fits'[TEXT(cols="x:I,y:I,pha:I,pi:I,time:D,dx:E,dy:e")]'
231
232 If the column specifier is the only keyword, then the cols= is not
233 required (in analogy with EVENTS()):
234
235 fundisp foo.fits'[TEXT(x:I,y:I,pha:I,pi:I,time:D,dx:E,dy:e)]'
236
237 Of course, an index is allowed in this case:
238
239 fundisp foo.fits'[2,TEXT(x:I,y:I,pha:I,pi:I,time:D,dx:E,dy:e)]'
240
241 · eot="[eot delim]"
242
243 Specify end of table string specifier for multi-table files. RDB
244 files support ^L. The end of table specifier is a string and the
245 whole string must be found alone on a line to signify EOT. For
246 example:
247
248 fundisp foo.fits'[TEXT(eot="END")]'
249
will end the table when a line containing "END" is found. Multiple
lines are supported, so that:
252
253 fundisp foo.fits'[TEXT(eot="END\nGAME")]'
254
will end the table when a line containing "END" is followed by a line
containing "GAME".
257
258 In the absence of an EOT delimiter, a new table will be sensed when
259 a new header (all alphanumeric columns) is found.
260
261 · null1="[datatype]"
262
Specify data type of a single null value in row 1. Since column
data types are determined by the first row, a null value in that
row will result in an error and a request to specify names and data
types using cols=. If you have only one null in row 1, you don't
need to specify all names and data types. Instead, use null1="type" to
specify its data type.
269
270 · alen=[n]
271
272 Specify size in bytes for ASCII type columns. FITS binary tables
273 only support fixed length ASCII columns, so a size value must be
274 specified. The default is 16 bytes.
275
· nullvalues=["true"|"false"]
277
278 Specify whether to expect null values. Give the parsers a hint as
279 to whether null values should be allowed. The default is to try to
280 determine this from the data.
281
· whitespace=["true"|"false"]
283
284 Specify whether surrounding white space should be kept as part of
285 string tokens. By default surrounding white space is removed from
286 tokens.
287
· header=["true"|"false"]

Specify whether to require a header. This is needed by tables
containing all string columns (and with no row containing dashes), in
order to be able to tell whether the first row is a header or part
of the data. The default is false, meaning that the first row will
be data. If a row of dashes is present, the previous row is
considered the column name row.
296
· units=["true"|"false"]
298
299 Specify whether to require a units line. Give the parsers a hint
300 as to whether a row specifying units should be allowed. The default
301 is to try to determine this from the data.
302
· i2f=["true"|"false"]
304
305 Specify whether to allow int to float conversions. If a column in
306 row 1 contains an integer value, the data type for that column will
307 be set to int. If a subsequent row contains a float in that same
308 column, an error will be signaled. This flag specifies that,
309 instead of an error, the float should be silently truncated to int.
310 Usually, you will want an error to be signaled, so that you can
311 specify the data type using cols= (or by changing the value of the
312 column in row 1).
313
· comeot=["true"|"false"|0|1|2]

Specify whether a comment signifies end of table. If comeot is 0 or
false, then comments do not signify end of table and can be
interspersed with data rows. If the value is true or 1 (the default for
standard parsers), then non-blank comment lines (e.g. lines beginning
with '#') signify end of table but blank lines are allowed between rows.
If the value is 2, then all comments, including blank lines, signify
end of table.
323
· lazyeot=["true"|"false"]
325
326 Specify whether "lazy" end of table should be permitted (default is
327 true for standard formats, except rdb format where explicit ^L is
328 required between tables). A lazy EOT can occur when a new table
329 starts directly after an old one, with no special EOT delimiter. A
330 check for this EOT condition is begun when a given row contains all
331 string tokens. If, in addition, there is a mismatch between the
332 number of tokens in the previous row and this row, or a mismatch
333 between the number of string tokens in the prev row and this row, a
334 new table is assumed to have been started. For example:
335
336 ival1 sval3
337 ----- -----
338 1 two
339 3 four
340
341 jval1 jval2 tval3
342 ----- ----- ------
343 10 20 thirty
344 40 50 sixty
345
346 Here the line "jval1 ..." contains all string tokens. In addition,
347 the number of tokens in this line (3) differs from the number of
348 tokens in the previous line (2). Therefore a new table is assumed
349 to have started. Similarly:
350
351 ival1 ival2 sval3
352 ----- ----- -----
353 1 2 three
354 4 5 six
355
356 jval1 jval2 tval3
357 ----- ----- ------
358 10 20 thirty
359 40 50 sixty
360
Again, the line "jval1 ..." contains all string tokens. The number
of string tokens in the previous row (1) differs from the number of
tokens in the current row (3). We therefore assume a new table has
been started. This lazy EOT test is not performed if lazyeot is
explicitly set to false.
366
367 · hcolfmt=[header column format]
368
369 Some text files have column name and data type information in the
370 header. For example, VizieR catalogs have headers containing both
371 column names and data types:
372
373 #Column e_Kmag (F6.3) ?(k_msigcom) K total magnitude uncertainty (4) [ucd=ERROR]
374 #Column Rflg (A3) (rd_flg) Source of JHK default mag (6) [ucd=REFER_CODE]
375 #Column Xflg (I1) [0,2] (gal_contam) Extended source contamination (10) [ucd=CODE_MISC]
376
377 while Sextractor files have headers containing column names alone:
378
379 # 1 X_IMAGE Object position along x [pixel]
380 # 2 Y_IMAGE Object position along y [pixel]
381 # 3 ALPHA_J2000 Right ascension of barycenter (J2000) [deg]
382 # 4 DELTA_J2000 Declination of barycenter (J2000) [deg]
383
The hcolfmt specification allows you to describe which header lines
contain column name and data type information. It consists of a
string defining the format of the column line, using "$col" (or
"$name") to specify placement of the column name, "$fmt" to specify
placement of the data format, and "$skip" to specify tokens to
ignore. You can also specify tokens explicitly (or, for those users
familiar with how sscanf works, you can specify scanf skip
specifiers using "%*"). For example, the VizieR hcolfmt above might be
specified in several ways:
393
394 Column $col ($fmt) # explicit specification of "Column" string
395 $skip $col ($fmt) # skip one token
396 %*s $col ($fmt) # skip one string (using scanf format)
397
398 while the Sextractor format might be specified using:
399
400 $skip $col # skip one token
401 %*d $col # skip one int (using scanf format)
402
You must ensure that the hcolfmt statement only senses actual
column definitions, with no false positives or negatives. For example,
the first Sextractor specification, "$skip $col", will consider
any header line containing two tokens to be a column name
specifier, while the second one, "%*d $col", requires an integer to
be the first token. In general, it is preferable to specify formats
as explicitly as possible.
410
411 Note that the VizieR-style header info is sensed automatically by
412 the funtools standard VizieR-like parser, using the hcolfmt "Column
413 $col ($fmt)". There is no need for explicit use of hcolfmt in this
414 case.
415
· debug=["true"|"false"]
417
418 Display debugging information during parsing.
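
To make the hcolfmt description above more concrete, here is a small Python sketch that turns an hcolfmt string into a regular expression and applies it to a header line. It is an approximation under stated assumptions, not the Funtools matcher: the function names and the regular expressions chosen for "$col", "$fmt", "$skip", "%*s" and "%*d" are invented for this example, and the sketch matches a prefix of the line rather than counting tokens.

```python
import re

# Hypothetical translations of hcolfmt pieces into regex fragments.
_PIECES = {
    "$col": r"(?P<col>\w+)",
    "$name": r"(?P<col>\w+)",
    "$fmt": r"(?P<fmt>[\w.]+)",
    "$skip": r"\S+",
    "%*s": r"\S+",
    "%*d": r"[-+]?\d+",
}


def hcolfmt_to_regex(fmt):
    """Build a compiled regex from an hcolfmt-style string."""
    out = []
    for piece in re.findall(r"\$\w+|%\*[sd]|\s+|.", fmt):
        if piece in _PIECES:
            out.append(_PIECES[piece])
        elif piece.isspace():
            out.append(r"\s+")  # any run of white space between tokens
        else:
            out.append(re.escape(piece))  # explicit literal text
    return re.compile(r"\s*" + "".join(out))


def parse_header_line(fmt, line):
    """Return the captured column info as a dict, or None if no match."""
    match = hcolfmt_to_regex(fmt).match(line.lstrip("#"))
    return match.groupdict() if match else None


print(parse_header_line("Column $col ($fmt)", "#Column e_Kmag (F6.3) ?(k_msigcom)"))
# {'col': 'e_Kmag', 'fmt': 'F6.3'}
print(parse_header_line("%*d $col", "#   1 X_IMAGE  Object position along x"))
# {'col': 'X_IMAGE'}
print(parse_header_line("%*d $col", "# some text here"))
# None
```

The last call shows the benefit of the more explicit format: an ordinary comment is rejected by "%*d $col", whereas "$skip $col" would accept it in this sketch and report "text" as a column name.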
419
420 Environment Variables
421
422 Environment variables are defined to allow many of these TEXT() values
423 to be set without having to include them in TEXT() every time a file is
424 processed:
425
426 keyword environment variable
427 ------- --------------------
428 delims TEXT_DELIMS
429 comchars TEXT_COMCHARS
430 cols TEXT_COLUMNS
431 eot TEXT_EOT
432 null1 TEXT_NULL1
433 alen TEXT_ALEN
434 bincols TEXT_BINCOLS
435 hcolfmt TEXT_HCOLFMT
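
If Funtools programs are launched from a script, the table above can be used to translate keyword options into environment variables. The Python sketch below only builds the environment dictionary; the helper name text_environment is hypothetical, and how a value given in TEXT() interacts with one given in the environment is not covered here.

```python
import os

# Keyword to environment variable mapping, copied from the table above.
ENV_NAMES = {
    "delims": "TEXT_DELIMS",
    "comchars": "TEXT_COMCHARS",
    "cols": "TEXT_COLUMNS",
    "eot": "TEXT_EOT",
    "null1": "TEXT_NULL1",
    "alen": "TEXT_ALEN",
    "bincols": "TEXT_BINCOLS",
    "hcolfmt": "TEXT_HCOLFMT",
}


def text_environment(options, base=None):
    """Return a copy of the environment with TEXT_* variables set."""
    env = dict(os.environ if base is None else base)
    for keyword, value in options.items():
        env[ENV_NAMES[keyword]] = str(value)  # KeyError on unknown keywords
    return env


env = text_environment({"delims": "|", "alen": 32}, base={})
print(env)  # {'TEXT_DELIMS': '|', 'TEXT_ALEN': '32'}
```

The resulting dictionary could then be passed as the env argument of subprocess.run() when invoking a program such as fundisp.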
436
437 Restrictions and Problems
438
439 As with raw event files, the '+' (copy extensions) specifier is not
440 supported for programs such as funtable.
441
String to int and int to string data conversions are allowed by the
text parsers. This is done more by force of circumstance than by
conviction: these transitions often happen with VizieR catalogs, which we
want to support fully. One consequence of allowing these transitions is
that the text parsers can get confused by columns which contain a valid
integer in the first row and then switch to a string. Consider the
following table:
449
450 xxx yyy zzz
451 ---- ---- ----
452 111 aaa bbb
453 ccc 222 ddd
454
The xxx column has an integer value in row one and a string in row two,
while the yyy column has the reverse. The parser will erroneously treat
the first column as having data type int:
458
459 fundisp foo.tab
460 XXX YYY ZZZ
461 ---------- ------------ ------------
462 111 'aaa' 'bbb'
463 1667457792 '222' 'ddd'
464
465 while the second column is processed correctly. This situation can be
466 avoided in any number of ways, all of which force the data type of the
467 first column to be a string. For example, you can edit the file and
468 explicitly quote the first row of the column:
469
470 xxx yyy zzz
471 ---- ---- ----
472 "111" aaa bbb
473 ccc 222 ddd
474
475 [sh] fundisp foo.tab
476 XXX YYY ZZZ
477 ------------ ------------ ------------
478 '111' 'aaa' 'bbb'
479 'ccc' '222' 'ddd'
480
481 You can edit the file and explicitly set the data type of the first
482 column:
483
484 xxx:3A yyy zzz
485 ------ ---- ----
486 111 aaa bbb
487 ccc 222 ddd
488
489 [sh] fundisp foo.tab
490 XXX YYY ZZZ
491 ------------ ------------ ------------
492 '111' 'aaa' 'bbb'
493 'ccc' '222' 'ddd'
494
You can also explicitly set the column names and data types of all
columns, without editing the file:
497
498 [sh] fundisp foo.tab'[TEXT(xxx:3A,yyy:3A,zzz:3a)]'
499 XXX YYY ZZZ
500 ------------ ------------ ------------
501 '111' 'aaa' 'bbb'
502 'ccc' '222' 'ddd'
503
The issue of data type transitions (which to allow and which to
disallow) is still under discussion.
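
The value 1667457792 in the erroneous display above has a simple numeric reading: it equals the three bytes of "ccc" followed by a NUL byte, taken as a 32-bit big-endian integer. That this is how the value arises inside the parser is an inference from the number itself, not something this document states. The arithmetic can be checked in Python:

```python
# The ASCII code of "c" is 0x63, so "ccc" plus a NUL byte is 0x63636300.
raw = b"ccc\x00"
value = int.from_bytes(raw, byteorder="big")
print(hex(value), value)  # 0x63636300 1667457792
```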
506
508 See funtools(n) for a list of Funtools help pages
509
510
511
version 1.4.0 August 15, 2007 funtext(n)