funtext(n)                    SAORD Documentation                   funtext(n)

NAME
       Funtext: Support for Column-based Text Files

SYNOPSIS
       This document contains a summary of the options for processing
       column-based text files.

DESCRIPTION
       Funtools will automatically sense and process "standard" column-based
       text files as if they were FITS binary tables without any change in
       Funtools syntax. In particular, you can filter text files using the
       same syntax as FITS binary tables:

         fundisp foo.txt'[cir 512 512 .1]'
         fundisp -T foo.txt > foo.rdb
         funtable foo.txt'[pha=1:10,cir 512 512 10]' foo.fits

       The first example displays a filtered selection of a text file.  The
       second example converts a text file to an RDB file.  The third example
       converts a filtered selection of a text file to a FITS binary table.
       
       Text files can also be used in Funtools image programs. In this case,
       you must provide binning parameters (as with raw event files), using
       the bincols keyword specifier:

         bincols=([xname[:tlmin[:tlmax[:binsiz]]]],[yname[:tlmin[:tlmax[:binsiz]]]])

       For example:

         funcnts foo'[bincols=(x:1024,y:1024)]' "ann 512 512 0 10 n=10"
       
       Standard Text Files

       Standard text files have the following characteristics:

       ·   Optional comment lines start with #

       ·   Optional blank lines are considered comments

       ·   An optional table header consists of the following (in order):

           ·   a single line of alpha-numeric column names

           ·   an optional line of unit strings containing the same number of
               cols

           ·   an optional line of dashes containing the same number of cols

       ·   Data lines follow the optional header and (for the present) consist
           of the same number of columns as the header.

       ·   Standard delimiters such as space, tab, comma, semi-colon, and bar.

       Examples:

         # rdb file
         foo1  foo2    foo3    foos
         ----  ----    ----    ----
         1     2.2     3       xxxx
         10    20.2    30      yyyy

         # multiple consecutive whitespace and dashes
         foo1   foo2    foo3 foos
         ---    ----    ---- ----
            1    2.2    3    xxxx
           10   20.2    30   yyyy

         # comma delims and blank lines
         foo1,foo2,foo3,foos

         1,2.2,3,xxxx
         10,20.2,30,yyyy

         # bar delims with null values
         foo1|foo2|foo3|foos
         1||3|xxxx
         10|20.2||yyyy

         # header-less data
         1     2.2   3 xxxx
         10    20.2 30 yyyy
       
       The default set of token delimiters consists of spaces, tabs, commas,
       semi-colons, and vertical bars. Several parsers are used simultaneously
       to analyze a line of text in different ways.  One way of analyzing a
       line is to allow a combination of spaces, tabs, and commas to be
       squashed into a single delimiter (no null values between consecutive
       delimiters). Another way is to allow tab, semi-colon, and vertical bar
       delimiters to support null values, i.e. two consecutive delimiters
       imply a null value (e.g. RDB file).  A successful parser is one which
       returns a consistent number of columns for all rows, with each column
       having a consistent data type.  More than one parser can be successful.
       For now, it is assumed that successful parsers all return the same
       tokens for a given line. (Theoretically, there are pathological cases,
       which will be taken care of as needed.) Bad parsers are discarded on
       the fly.
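       
       The two tokenizing strategies described above can be illustrated with
       a short Python sketch. This is not the Funtools implementation: the
       function names are hypothetical, and the success test below checks
       only the column count, not the per-column data types.

```python
import re


def split_squashed(line):
    """Treat any run of spaces, tabs, and commas as ONE delimiter (no nulls)."""
    return [tok for tok in re.split(r"[ \t,]+", line.strip()) if tok]


def split_with_nulls(line):
    """Treat each tab, semi-colon, or bar as a delimiter; empty fields are nulls."""
    return [tok.strip() for tok in re.split(r"[\t;|]", line.rstrip("\n"))]


def consistent(parser, lines):
    """Simplified success test: every line yields the same number of tokens."""
    return len({len(parser(line)) for line in lines}) == 1


print(split_squashed("  10   20.2    30   yyyy"))  # ['10', '20.2', '30', 'yyyy']
print(split_with_nulls("1||3|xxxx"))               # ['1', '', '3', 'xxxx']
print(consistent(split_with_nulls, ["1||3|xxxx", "10|20.2||yyyy"]))  # True
```

       In this sketch an empty string stands in for a null value; a parser
       whose column count varies from row to row would be the one discarded.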
       
       If the header does not exist, then names "col1", "col2", etc.  are
       assigned to the columns to allow filtering.  Furthermore, data types
       for each column are determined by the data types found in the columns
       of the first data line, and can be one of the following: string, int,
       and double. Thus, all of the above examples return the following
       display:

         fundisp foo'[foo1>5]'
               FOO1                  FOO2       FOO3         FOOS
         ---------- --------------------- ---------- ------------
                 10           20.20000000         30         yyyy
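       
       The naming and typing rule for header-less files can be sketched in
       Python as follows. The helper names are hypothetical, and Python's
       own int() and float() stand in for whatever numeric tests Funtools
       applies, so edge cases (e.g. "nan" or "1_000") may be classified
       differently by the real parsers.

```python
def infer_type(token):
    """Classify a token as 'int', 'double', or 'string'."""
    for name, cast in (("int", int), ("double", float)):
        try:
            cast(token)
            return name
        except ValueError:
            continue
    return "string"


def describe_columns(first_row, header=None):
    """Pair column names with the types inferred from the first data row."""
    # Without a header, fall back to col1, col2, ... as described above.
    names = header or [f"col{i}" for i in range(1, len(first_row) + 1)]
    return list(zip(names, map(infer_type, first_row)))


print(describe_columns(["1", "2.2", "3", "xxxx"]))
# [('col1', 'int'), ('col2', 'double'), ('col3', 'int'), ('col4', 'string')]
```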
       
       Comments Convert to Header Params

       Comments which precede data rows are converted into header parameters
       and will be written out as such using funimage or funhead. Two styles
       of comments are recognized:

       1. FITS-style comments have an equal sign "=" between the keyword and
       value and an optional slash "/" to signify a comment. The strict FITS
       rules on column positions are not enforced. In addition, strings only
       need to be quoted if they contain whitespace. For example, the
       following are valid FITS-style comments:

         # fits0 = 100
         # fits1 = /usr/local/bin
         # fits2 = "/usr/local/bin /opt/local/bin"
         # fits3c = /usr/local/bin /opt/local/bin /usr/bin
         # fits4c = "/usr/local/bin /opt/local/bin" / path dir

       Note that the fits3c comment is not quoted and therefore its value is
       the single token "/usr/local/bin" and the comment is "opt/local/bin
       /usr/bin".  This is different from the quoted comment in fits4c.
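       
       As a rough illustration of the FITS-style rule, the following Python
       sketch splits such a comment line into keyword, value, and comment.
       The regular expression and the function name are assumptions made
       for this example; they only mirror the behavior described above.

```python
import re

FITS_COMMENT = re.compile(
    r'^#\s*(?P<key>\w+)\s*=\s*'               # keyword and equal sign
    r'(?:"(?P<quoted>[^"]*)"|(?P<bare>\S+))'  # quoted string or one bare token
    r'\s*(?:/\s*(?P<comment>.*))?$'           # optional slash and comment
)


def parse_fits_comment(line):
    """Return (keyword, value, comment) or None if the line does not match."""
    m = FITS_COMMENT.match(line.strip())
    if m is None:
        return None
    value = m.group("quoted") if m.group("quoted") is not None else m.group("bare")
    return m.group("key"), value, m.group("comment")


print(parse_fits_comment("# fits3c = /usr/local/bin /opt/local/bin /usr/bin"))
# ('fits3c', '/usr/local/bin', 'opt/local/bin /usr/bin')
```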
       
       2. Free-form comments can have an optional colon separator between the
       keyword and value. In the absence of quotes, all tokens after the
       keyword are part of the value, i.e. no comment is allowed. If a string
       is quoted, then a slash "/" after the string will signify a comment.
       For example:

         # com1 /usr/local/bin
         # com2 "/usr/local/bin /opt/local/bin"
         # com3 /usr/local/bin /opt/local/bin /usr/bin
         # com4c "/usr/local/bin /opt/local/bin" / path dir

         # com11: /usr/local/bin
         # com12: "/usr/local/bin /opt/local/bin"
         # com13: /usr/local/bin /opt/local/bin /usr/bin
         # com14c: "/usr/local/bin /opt/local/bin" / path dir

       Note that com3 and com13 are not quoted, so the whole string is part of
       the value, while com4c and com14c are quoted and have comments
       following the values.
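       
       The free-form rule can be sketched the same way. Again, the pattern
       and the function name are hypothetical and only approximate the
       description above: an unquoted value swallows the rest of the line,
       while a quoted value may be followed by a slash and a comment.

```python
import re

FREE_COMMENT = re.compile(
    r'^#\s*(?P<key>\w+):?\s+'                             # keyword, optional colon
    r'(?:"(?P<quoted>[^"]*)"\s*(?:/\s*(?P<comment>.*))?'  # quoted value, optional comment
    r'|(?P<rest>.*))$'                                    # otherwise: rest is the value
)


def parse_free_comment(line):
    """Return (keyword, value, comment) or None if the line does not match."""
    m = FREE_COMMENT.match(line.strip())
    if m is None:
        return None
    if m.group("quoted") is not None:
        return m.group("key"), m.group("quoted"), m.group("comment")
    return m.group("key"), m.group("rest"), None


print(parse_free_comment("# com13: /usr/local/bin /opt/local/bin /usr/bin"))
# ('com13', '/usr/local/bin /opt/local/bin /usr/bin', None)
```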
       
       Some text files have column name and data type information in the
       header.  You can specify the format of column information contained in
       the header using the "hcolfmt=" specification. See below for a detailed
       description.

       Multiple Tables in a Single File

       Multiple tables are supported in a single file. If an RDB-style file is
       sensed, then a ^L (form feed) will signify end of table. Otherwise,
       an end of table is sensed when a new header (i.e., all alphanumeric
       columns) is found. (Note that this heuristic does not work for single
       column tables where the column type is ASCII and the table that follows
       also has only one column.) You also can specify characters that signal
       an end of table condition using the eot= keyword. See below for
       details.

       You can access the nth table (starting from 1) in a multi-table file by
       enclosing the table number in brackets, as with a FITS extension:

         fundisp foo'[2]'

       The above example will display the second table in the file.  (Index
       values start at 1 in order to maintain logical compatibility with FITS
       files, where extension numbers also start at 1.)
       
       TEXT() Specifier

       As with ARRAY() and EVENTS() specifiers for raw image arrays and raw
       event lists respectively, you can use TEXT() on text files to pass
       key=value options to the parsers. An empty set of keywords is
       equivalent to not having TEXT() at all, that is:

         fundisp foo
         fundisp foo'[TEXT()]'

       are equivalent. A multi-table index number is placed before the TEXT()
       specifier as the first token, when indexing into a multi-table:

         fundisp foo'[2,TEXT(...)]'

       The filter specification is placed after the TEXT() specifier,
       separated by a comma, or in an entirely separate bracket:

         fundisp foo'[TEXT(...),circle 512 512 .1]'
         fundisp foo'[2,TEXT(...)][circle 512 512 .1]'
       
       TEXT() Keyword Options

       The following is a list of keywords that can be used within the TEXT()
       specifier (the first three are the most important):

       ·   delims="[delims]"

           Specify token delimiters for this file. Only a single parser having
           these delimiters will be used to process the file.

             fundisp foo.fits'[TEXT(delims="!")]'
             fundisp foo.fits'[TEXT(delims="\t%")]'

       ·   comchars="[comchars]"

           Specify comment characters. You must include "\n" to allow blank
           lines.  These comment characters will be used for all standard
           parsers (unless delims are also specified).

             fundisp foo.fits'[TEXT(comchars="!\n")]'

       ·   cols="[name1:type1 ...]"

           Specify names and data type of columns. This overrides header names
           and/or data types in the first data row or default names and data
           types for header-less tables.

             fundisp foo.fits'[TEXT(cols="x:I,y:I,pha:I,pi:I,time:D,dx:E,dy:e")]'

           If the column specifier is the only keyword, then the cols= is not
           required (in analogy with EVENTS()):

             fundisp foo.fits'[TEXT(x:I,y:I,pha:I,pi:I,time:D,dx:E,dy:e)]'

           Of course, an index is allowed in this case:

             fundisp foo.fits'[2,TEXT(x:I,y:I,pha:I,pi:I,time:D,dx:E,dy:e)]'
       
       ·   eot="[eot delim]"

           Specify end of table string specifier for multi-table files. RDB
           files support ^L. The end of table specifier is a string and the
           whole string must be found alone on a line to signify EOT. For
           example:

             fundisp foo.fits'[TEXT(eot="END")]'

           will end the table when a line containing "END" is found. Multiple
           lines are supported, so that:

             fundisp foo.fits'[TEXT(eot="END\nGAME")]'

           will end the table when a line containing "END" is followed by a
           line containing "GAME".

           In the absence of an EOT delimiter, a new table will be sensed when
           a new header (all alphanumeric columns) is found.
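           
           A minimal Python sketch of the multi-line EOT match described
           above, assuming the specifier has already been unescaped so that
           "\n" is a real newline. The function name is hypothetical and
           the sketch is not taken from the Funtools source.

```python
def split_at_eot(lines, eot):
    """Split lines at the first EOT match; return (table, remainder)."""
    marker = eot.split("\n")  # one entry per line of the specifier
    n = len(marker)
    for i in range(len(lines) - n + 1):
        # Each marker string must stand alone on its own line.
        if [ln.strip() for ln in lines[i:i + n]] == marker:
            return lines[:i], lines[i + n:]
    return lines, []


lines = ["x y", "1 2", "END", "GAME", "a b", "3 4"]
print(split_at_eot(lines, "END\nGAME"))
# (['x y', '1 2'], ['a b', '3 4'])
```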
           
       ·   null1="[datatype]"

           Specify data type of a single null value in row 1.  Since column
           data types are determined by the first row, a null value in that
           row will result in an error and a request to specify names and data
           types using cols=. If you have only one null in row 1, you don't
           need to specify all names and data types. Instead, use
           null1="type" to specify its data type.

       ·   alen=[n]

           Specify size in bytes for ASCII type columns.  FITS binary tables
           only support fixed length ASCII columns, so a size value must be
           specified. The default is 16 bytes.

       ·   nullvalues=["true"|"false"]

           Specify whether to expect null values.  Give the parsers a hint as
           to whether null values should be allowed. The default is to try to
           determine this from the data.

       ·   whitespace=["true"|"false"]

           Specify whether surrounding white space should be kept as part of
           string tokens.  By default surrounding white space is removed from
           tokens.

       ·   header=["true"|"false"]

           Specify whether to require a header.  This is needed by tables
           containing all string columns (and with no row containing dashes),
           in order to be able to tell whether the first row is a header or
           part of the data. The default is false, meaning that the first row
           will be data. If a row of dashes is present, the previous row is
           considered the column name row.

       ·   units=["true"|"false"]

           Specify whether to require a units line.  Give the parsers a hint
           as to whether a row specifying units should be allowed. The default
           is to try to determine this from the data.

       ·   i2f=["true"|"false"]

           Specify whether to allow int to float conversions.  If a column in
           row 1 contains an integer value, the data type for that column will
           be set to int. If a subsequent row contains a float in that same
           column, an error will be signaled. This flag specifies that,
           instead of an error, the float should be silently truncated to int.
           Usually, you will want an error to be signaled, so that you can
           specify the data type using cols= (or by changing the value of the
           column in row 1).
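           
           The effect of the flag can be pictured with the Python sketch
           below. The function is hypothetical, and truncation toward zero
           is an assumption; the text above says only that the float is
           truncated to int.

```python
def read_int_column(tokens, i2f=False):
    """Convert tokens for a column whose type was set to int by row 1."""
    values = []
    for tok in tokens:
        try:
            values.append(int(tok))
        except ValueError:
            if not i2f:
                raise  # default: a float in an int column is an error
            values.append(int(float(tok)))  # i2f: truncate toward zero
    return values


print(read_int_column(["3", "30.7"], i2f=True))  # [3, 30]
```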
           
       ·   comeot=["true"|"false"|0|1|2]

           Specify whether comment signifies end of table.  If comeot is 0 or
           false, then comments do not signify end of table and can be
           interspersed with data rows. If the value is true or 1 (the default
           for standard parsers), then non-blank comment lines (e.g. lines
           beginning with '#') signify end of table, but blank lines are
           allowed between rows. If the value is 2, then all comments,
           including blank lines, signify end of table.
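           
           The three comeot levels amount to a small decision table, shown
           here as a Python sketch. The function name and the comchars
           argument are assumptions made for the example.

```python
def ends_table(line, comeot=1, comchars="#"):
    """Return True if this comment or blank line should end the table."""
    stripped = line.strip()
    is_blank = stripped == ""
    is_comment = stripped.startswith(tuple(comchars))
    if comeot in (0, "false"):   # 0/False: comments never end a table
        return False
    if comeot in (1, "true"):    # 1/True: only non-blank comments end it
        return is_comment
    return is_blank or is_comment  # 2: blank lines end it as well


print(ends_table("# note", comeot=1))  # True
print(ends_table("", comeot=1))        # False
print(ends_table("", comeot=2))        # True
```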
           
       ·   lazyeot=["true"|"false"]

           Specify whether "lazy" end of table should be permitted (default is
           true for standard formats, except rdb format where explicit ^L is
           required between tables). A lazy EOT can occur when a new table
           starts directly after an old one, with no special EOT delimiter. A
           check for this EOT condition is begun when a given row contains all
           string tokens. If, in addition, there is a mismatch between the
           number of tokens in the previous row and this row, or a mismatch
           between the number of string tokens in the previous row and this
           row, a new table is assumed to have been started. For example:

             ival1 sval3
             ----- -----
             1     two
             3     four

             jval1 jval2 tval3
             ----- ----- ------
             10    20    thirty
             40    50    sixty

           Here the line "jval1 ..." contains all string tokens.  In addition,
           the number of tokens in this line (3) differs from the number of
           tokens in the previous line (2). Therefore a new table is assumed
           to have started. Similarly:

             ival1 ival2 sval3
             ----- ----- -----
             1     2     three
             4     5     six

             jval1 jval2 tval3
             ----- ----- ------
             10    20    thirty
             40    50    sixty

           Again, the line "jval1 ..." contains all string tokens. The number
           of string tokens in the previous row (1) differs from the number of
           string tokens in the current row (3). We therefore assume a new
           table has been started. This lazy EOT test is not performed if
           lazyeot is explicitly set to false.
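           
           The lazy EOT test can be written out as a short Python sketch.
           It follows the two conditions stated above; the function names
           are hypothetical, and "string" here simply means "does not
           parse as a number in Python".

```python
def is_string(token):
    """A token counts as a string if it does not parse as a number."""
    try:
        float(token)
        return False
    except ValueError:
        return True


def starts_new_table(prev_tokens, tokens):
    """Apply the lazy EOT test to the previous data row and the current row."""
    if not tokens or not all(is_string(t) for t in tokens):
        return False  # the check only begins on an all-string row
    n_str_prev = sum(is_string(t) for t in prev_tokens)
    return len(prev_tokens) != len(tokens) or n_str_prev != len(tokens)


print(starts_new_table(["3", "four"], ["jval1", "jval2", "tval3"]))     # True
print(starts_new_table(["4", "5", "six"], ["jval1", "jval2", "tval3"]))  # True
print(starts_new_table(["a", "b"], ["c", "d"]))                          # False
```

           The last call shows the limitation of the heuristic: when both
           tables consist entirely of string columns and have the same
           width, no mismatch is detected.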
           
       ·   hcolfmt=[header column format]

           Some text files have column name and data type information in the
           header.  For example, VizieR catalogs have headers containing both
           column names and data types:

             #Column e_Kmag  (F6.3)  ?(k_msigcom) K total magnitude uncertainty (4)  [ucd=ERROR]
             #Column Rflg    (A3)    (rd_flg) Source of JHK default mag (6)  [ucd=REFER_CODE]
             #Column Xflg    (I1)    [0,2] (gal_contam) Extended source contamination (10) [ucd=CODE_MISC]

           while Sextractor files have headers containing column names alone:

             #   1 X_IMAGE         Object position along x                         [pixel]
             #   2 Y_IMAGE         Object position along y                         [pixel]
             #   3 ALPHA_J2000     Right ascension of barycenter (J2000)           [deg]
             #   4 DELTA_J2000     Declination of barycenter (J2000)               [deg]

           The hcolfmt specification allows you to describe which header lines
           contain column name and data type information. It consists of a
           string defining the format of the column line, using "$col" (or
           "$name") to specify placement of the column name, "$fmt" to specify
           placement of the data format, and "$skip" to specify tokens to
           ignore. You also can specify tokens explicitly (or, for those users
           familiar with how sscanf works, you can specify scanf skip
           specifiers using "%*").  For example, the VizieR hcolfmt above
           might be specified in several ways:

             Column $col ($fmt)    # explicit specification of "Column" string
             $skip  $col ($fmt)    # skip one token
             %*s $col  ($fmt)      # skip one string (using scanf format)

           while the Sextractor format might be specified using:

             $skip $col            # skip one token
             %*d $col              # skip one int (using scanf format)

           You must ensure that the hcolfmt statement only senses actual
           column definitions, with no false positives or negatives.  For
           example, the first Sextractor specification, "$skip $col", will
           consider any header line containing two tokens to be a column name
           specifier, while the second one, "%*d $col", requires an integer to
           be the first token. In general, it is preferable to specify formats
           as explicitly as possible.

           Note that the VizieR-style header info is sensed automatically by
           the funtools standard VizieR-like parser, using the hcolfmt "Column
           $col ($fmt)".  There is no need for explicit use of hcolfmt in this
           case.
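           
           To make the false-positive warning concrete, the Python sketch
           below translates an hcolfmt string into a regular expression.
           This is an approximation written for illustration only: the
           placeholder table and function name are assumptions, and
           Funtools' own matching (based on sscanf-style parsing) may
           differ in detail.

```python
import re

# Hypothetical mapping from hcolfmt placeholders to regex fragments.
PLACEHOLDERS = {
    "$col": r"(?P<name>\w+)",
    "$name": r"(?P<name>\w+)",
    "$fmt": r"(?P<fmt>[^\s()]+)",
    "$skip": r"\S+",
    "%*s": r"\S+",
    "%*d": r"[-+]?\d+",
}


def hcolfmt_to_regex(hcolfmt):
    """Build a regex matching '#'-prefixed header lines for this hcolfmt."""
    parts = []
    for token in hcolfmt.split():
        piece = re.escape(token)  # literal text such as "Column" or "("
        for key, regex in PLACEHOLDERS.items():
            piece = piece.replace(re.escape(key), regex)
        parts.append(piece)
    return re.compile(r"^#\s*" + r"\s+".join(parts))


strict = hcolfmt_to_regex("%*d $col")
loose = hcolfmt_to_regex("$skip $col")
print(strict.match("#   1 X_IMAGE         Object position along x").group("name"))
# X_IMAGE
print(strict.match("# some comment"))                # None
print(loose.match("# some comment").group("name"))   # comment
```

           The looser "$skip $col" pattern accepts an ordinary comment line
           as a column definition, which is the kind of false positive the
           text above warns about.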
           
       ·   debug=["true"|"false"]

           Display debugging information during parsing.

       Environment Variables

       Environment variables are defined to allow many of these TEXT() values
       to be set without having to include them in TEXT() every time a file is
       processed:

         keyword       environment variable
         -------       --------------------
         delims        TEXT_DELIMS
         comchars      TEXT_COMCHARS
         cols          TEXT_COLUMNS
         eot           TEXT_EOT
         null1         TEXT_NULL1
         alen          TEXT_ALEN
         bincols       TEXT_BINCOLS
         hcolfmt       TEXT_HCOLFMT
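       
       When Funtools programs are launched from a script, the table above
       can be applied programmatically. The Python sketch below builds an
       environment dictionary from keyword options; the helper name is
       hypothetical, and it assumes each variable takes the same value you
       would otherwise write inside TEXT().

```python
import os

# Keyword-to-variable mapping copied from the table above.
TEXT_ENV = {
    "delims": "TEXT_DELIMS",
    "comchars": "TEXT_COMCHARS",
    "cols": "TEXT_COLUMNS",
    "eot": "TEXT_EOT",
    "null1": "TEXT_NULL1",
    "alen": "TEXT_ALEN",
    "bincols": "TEXT_BINCOLS",
    "hcolfmt": "TEXT_HCOLFMT",
}


def text_environment(**options):
    """Return a copy of os.environ with the given TEXT() keywords set."""
    env = dict(os.environ)
    for key, value in options.items():
        env[TEXT_ENV[key]] = str(value)  # KeyError for unlisted keywords
    return env


env = text_environment(delims="|", alen=32)
print(env["TEXT_DELIMS"], env["TEXT_ALEN"])  # | 32
# The result could then be passed to a child process, for example:
#   subprocess.run(["fundisp", "foo.txt"], env=env)
```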
       
       Restrictions and Problems

       As with raw event files, the '+' (copy extensions) specifier is not
       supported for programs such as funtable.

       String to int and int to string data conversions are allowed by the
       text parsers. This is done more by force of circumstance than by
       conviction: these transitions often happen with VizieR catalogs, which
       we want to support fully. One consequence of allowing these transitions
       is that the text parsers can get confused by columns which contain a
       valid integer in the first row and then switch to a string. Consider
       the following table:

         xxx   yyy     zzz
         ----  ----    ----
         111   aaa     bbb
         ccc   222     ddd

       The xxx column has an integer value in row one and a string in row two,
       while the yyy column has the reverse. The parser will erroneously treat
       the first column as having data type int:

         fundisp foo.tab
                XXX          YYY          ZZZ
         ---------- ------------ ------------
                111        'aaa'        'bbb'
         1667457792        '222'        'ddd'

       while the second column is processed correctly. This situation can be
       avoided in any number of ways, all of which force the data type of the
       first column to be a string. For example, you can edit the file and
       explicitly quote the first row of the column:

         xxx   yyy     zzz
         ----  ----    ----
         "111" aaa     bbb
         ccc   222     ddd

         [sh] fundisp foo.tab
                  XXX          YYY          ZZZ
         ------------ ------------ ------------
                '111'        'aaa'        'bbb'
                'ccc'        '222'        'ddd'

       You can edit the file and explicitly set the data type of the first
       column:

         xxx:3A   yyy  zzz
         ------   ---- ----
         111      aaa  bbb
         ccc      222  ddd

         [sh] fundisp foo.tab
                  XXX          YYY          ZZZ
         ------------ ------------ ------------
                '111'        'aaa'        'bbb'
                'ccc'        '222'        'ddd'

       You also can explicitly set the column names and data types of all
       columns, without editing the file:

         [sh] fundisp foo.tab'[TEXT(xxx:3A,yyy:3A,zzz:3a)]'
                  XXX          YYY          ZZZ
         ------------ ------------ ------------
                '111'        'aaa'        'bbb'
                'ccc'        '222'        'ddd'

       The issue of data type transitions (which to allow and which to
       disallow) is still under discussion.
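       
       As a side note on the erroneous display above, the value 1667457792
       shown for the string "ccc" is numerically what results when the three
       ASCII bytes of "ccc" plus a terminating NUL are read as one 32-bit
       big-endian integer. The Python check below only verifies that
       arithmetic; that Funtools arrives at the number this way is an
       inference, not something stated in this document.

```python
# Interpret the bytes 'c', 'c', 'c', NUL as a 32-bit big-endian integer.
value = int.from_bytes(b"ccc\x00", byteorder="big")
print(value)       # 1667457792
print(hex(value))  # 0x63636300
```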