dirfile-format(5)             DATA FORMATS             dirfile-format(5)


NAME
       dirfile-format - the dirfile database format specification file

DESCRIPTION
       The dirfile format file fully specifies the raw and derived time
       streams and auxiliary information for a dirfile(5) database.

       The format file is a case-sensitive text file called format located
       in the dirfile directory. The explicit text encoding of the file is
       not specified by these Standards, but it must be 7-bit ASCII
       compatible. Examples of acceptable character encodings include all
       the ISO 8859 character sets (e.g. Latin-1 through Latin-10, among
       others), as well as the UTF-8 encoding of Unicode and UCS.

SYNTAX
       The format file is composed of field specification lines and
       directive lines, optionally separated by blank lines or lines
       containing only whitespace. Lines are separated by the line-feed
       character (0x0A). Unless escaped (see below), the hash mark (#) is
       the comment delimiter; the comment delimiter, and any text following
       it to the end of the line, is ignored.

   Tokens
       Both field specification lines and directive lines consist of
       several tokens separated by whitespace. Whitespace consists of one
       or more whitespace characters. These are: space (0x20), horizontal
       tab (0x09), vertical tab (0x0B), form-feed (0x0C), and carriage
       return (0x0D). The first token of a directive line is always a
       reserved word. The first token of a field specification line is
       never a reserved word. Any amount of whitespace may precede the
       first token on a line.

       Since tokens are separated by whitespace, to include a whitespace
       character in a token, it must either be escaped by preceding it with
       a backslash character (\), or be replaced by a character escape
       sequence (see below), or else the token must be enclosed in
       quotation marks ("). The quotation marks themselves will be stripped
       from the token. The null-token (that is, the token consisting of zero
       characters) may be specified by a pair of quotation marks with
       nothing between them (""). To include a literal quotation mark in a
       token, it must be escaped (\"). Similarly, a hash mark may be
       included in a token by including it in a quoted token or else by
       escaping it (\#); otherwise the hash mark will be understood as the
       comment delimiter.

       It is a syntax error to have a line which contains unmatched
       quotation marks, or in which the last character is the backslash
       character.
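
       As an illustration only, the splitting rules above might be
       sketched in Python as follows. The function name split_tokens is
       hypothetical. The sketch strips quotation marks and comments and
       reports the two syntax errors just described. It deliberately
       leaves every backslash escape in place, as a two-character pair, so
       that a separate pass can decode it. It also accepts a quoted section
       in the middle of a token, a case the text above does not address.

```python
from typing import List

# Token separators listed above (line-feed already ends the line).
WHITESPACE = " \t\v\f\r"


def split_tokens(line: str) -> List[str]:
    """Split one format-file line (without its line-feed) into raw tokens."""
    tokens: List[str] = []
    current: List[str] = []
    in_token = False   # True once a token (possibly the null token) has begun
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\":
            if i + 1 == len(line):
                raise ValueError("line ends with a backslash")
            current.append(line[i:i + 2])   # keep the escape pair undecoded
            in_token = True
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes
            in_token = True                 # "" still yields a (null) token
        elif in_quotes:
            current.append(ch)              # quoted whitespace and # are literal
        elif ch == "#":
            break                           # the rest of the line is a comment
        elif ch in WHITESPACE:
            if in_token:
                tokens.append("".join(current))
                current, in_token = [], False
        else:
            current.append(ch)
            in_token = True
        i += 1
    if in_quotes:
        raise ValueError("unmatched quotation mark")
    if in_token:
        tokens.append("".join(current))
    return tokens
```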

       Several characters, when escaped by a preceding backslash character,
       are interpreted as special characters in tokens. The character
       escape sequences are:

       \a     an alert (bell) character (ASCII 0x07 / U+0007)

       \b     a backspace character (ASCII 0x08 / U+0008)

       \e     an escape character (ASCII 0x1B / U+001B)

       \f     a form-feed character (ASCII 0x0C / U+000C)

       \n     a line-feed character (ASCII 0x0A / U+000A)

       \r     a carriage return character (ASCII 0x0D / U+000D)

       \t     a horizontal tab character (ASCII 0x09 / U+0009)

       \v     a vertical tab character (ASCII 0x0B / U+000B)

       \\     a backslash character (ASCII 0x5C / U+005C)

       \ooo   the single byte given by the octal number ooo.

       \xhh   the single byte given by the hexadecimal number hh.

       \uhhhhhhh
              the UTF-8 byte sequence encoding the Unicode code point given
              by the hexadecimal number hhhhhhh.

       Any other character which is escaped is interpreted as the character
       itself (i.e. \c is interpreted as c).

       No token may contain the NULL character (ASCII 0x00 / U+0000).
       Furthermore, although support is present to create UTF-8 byte
       sequences, tokens are not required to be valid UTF-8 sequences. Any
       byte sequence not containing the NULL character forms a valid token.
       However, there may be further restrictions on allowed characters for
       a token in a particular situation (for example, when used as a field
       name).
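
       The escape sequences can be decoded with a sketch along these lines.
       The names are hypothetical. Two behaviours are assumptions that the
       text above does not fix. First, each numeric escape greedily
       consumes up to as many digits as its template shows (three octal,
       two or seven hexadecimal). Second, characters outside escapes are
       encoded as UTF-8; decoding the format file as Latin-1 and encoding
       with "latin-1" instead would preserve arbitrary bytes. The input is
       a raw token whose quotation marks have been removed but whose
       escapes are still present.

```python
import re

# Escapes with a fixed meaning, taken from the list above.
SIMPLE_ESCAPES = {"a": 0x07, "b": 0x08, "e": 0x1B, "f": 0x0C, "n": 0x0A,
                  "r": 0x0D, "t": 0x09, "v": 0x0B, "\\": 0x5C}
# Assumption: numeric escapes take up to as many digits as their template.
OCTAL = re.compile(r"[0-7]{1,3}")
HEX2 = re.compile(r"[0-9A-Fa-f]{1,2}")
HEX7 = re.compile(r"[0-9A-Fa-f]{1,7}")


def decode_token(raw: str) -> bytes:
    """Turn a raw token (escapes still present) into the token's bytes."""
    out = bytearray()
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch != "\\":
            out += ch.encode("utf-8")
            i += 1
            continue
        if i + 1 == len(raw):
            raise ValueError("token ends with a lone backslash")
        nxt = raw[i + 1]
        if nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 2
        elif (m := OCTAL.match(raw, i + 1)) is not None:
            out.append(int(m.group(), 8))    # ValueError if above 0o377
            i = m.end()
        elif nxt == "x" and (m := HEX2.match(raw, i + 2)) is not None:
            out.append(int(m.group(), 16))
            i = m.end()
        elif nxt == "u" and (m := HEX7.match(raw, i + 2)) is not None:
            # ValueError for code points chr() or UTF-8 cannot represent.
            out += chr(int(m.group(), 16)).encode("utf-8")
            i = m.end()
        else:
            out += nxt.encode("utf-8")       # any other \c means c
            i += 2
    if 0 in out:
        raise ValueError("tokens may not contain the NULL character")
    return bytes(out)
```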

DIRECTIVES
       There are eight reserved words, which cannot be used as field names
       in the dirfile. Instead, these specify directives. Any reserved word
       may omit its initial forward slash (/) without change in meaning.
       Future versions of the Standards may require the slash to
       distinguish a reserved word from a field name. Like the rest of the
       format file, directives are case sensitive.

       A number of the directives have fragment scope. A directive with
       fragment scope applies only to the fragment in which it is present,
       plus any sub-fragments indicated by the /INCLUDE directive, but only
       if those sub-fragments don't have their own corresponding directive.
       Directives which have fragment scope are: /ENCODING, /ENDIAN,
       /FRAMEOFFSET, and /PROTECT. Because of these scoping rules, different
       portions of the dirfile may have different encodings, endiannesses,
       frame offsets, or protection levels.

       If a directive with fragment scope appears more than once in a
       fragment, only the last such directive will be honoured, with the
       exception that the effect of a directive will not be propagated to
       sub-fragments if the directive line appears after the sub-fragment is
       included. The scoping rules of the remaining directives are discussed
       below.
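
       The fragment-scope rules can be modelled with a short recursive
       walk. In this hypothetical sketch each fragment is reduced to a list
       of (keyword, argument) pairs. Keywords are written with their
       leading slash. None means that no directive is in force, and the
       meaning of that case is implementation dependent.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical pre-parsed form: fragment name -> [(keyword, argument), ...]
Fragments = Dict[str, List[Tuple[str, str]]]


def resolve_fragment_scope(fragments: Fragments, name: str, directive: str,
                           inherited: Optional[str] = None,
                           result: Optional[Dict[str, Optional[str]]] = None
                           ) -> Dict[str, Optional[str]]:
    """Return {fragment: effective value} for one fragment-scope directive.

    `inherited` is the value in force in the parent at the point where this
    fragment was included.
    """
    if result is None:
        result = {}
    current = inherited
    for keyword, argument in fragments[name]:
        if keyword == directive:
            current = argument            # later lines override earlier ones
        elif keyword == "/INCLUDE":
            # A sub-fragment only sees directives preceding its /INCLUDE;
            # nothing it sets flows back up to this fragment.
            resolve_fragment_scope(fragments, argument, directive,
                                   current, result)
    result[name] = current                # the last directive here wins
    return result
```

       For example, with the fragments used in the check below, the
       directive after the first /INCLUDE changes the main fragment and the
       later sub-fragment b. It does not change a, which was included
       earlier.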

       /ENCODING
              The ENCODING directive specifies the encoding scheme used to
              encode binary files in the dirfile. The encoding scheme may be
              one of the predefined names listed below, which are described
              in more detail in dirfile-encoding(5), or any other
              site-specific encoding scheme. The predefined scheme names are:

              none   The dirfile is unencoded.

              bzip2  The dirfile is compressed using the bzip2 compression
                     scheme.

              gzip   The dirfile is compressed using the gzip compression
                     scheme.

              lzma   The dirfile is compressed using the LZMA compression
                     scheme.

              slim   The dirfile is compressed using the slim compression
                     scheme.

              text   The dirfile is text encoded.

              Implementations should fail gracefully when encountering an
              unknown encoding scheme. If no encoding scheme is specified,
              behaviour is implementation dependent. Syntax is:

                     /ENCODING <scheme>

              The ENCODING directive has fragment scope.

       /ENDIAN
              The ENDIAN directive specifies the endianness of the raw data
              in the database. In previous versions of the Dirfile
              Standards, raw data was always assumed to be little-endian.
              This assumption has been removed. The assumed endianness of
              raw data in dirfiles which omit this directive is
              implementation dependent. Syntax is:

                     /ENDIAN ( big | little )

              The ENDIAN directive has fragment scope.

       /FRAMEOFFSET
              The FRAMEOFFSET directive specifies the frame number of the
              first frame for which data exists in binary files associated
              with RAW fields. Syntax is:

                     /FRAMEOFFSET <integer>

              The FRAMEOFFSET directive has fragment scope.
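
              The arithmetic implied by this directive might be sketched as
              follows. The function names are hypothetical. The sketch
              assumes that sample 0 of a RAW field's binary file is the
              first sample of frame FRAMEOFFSET, with sample-rate samples
              per frame. The byte offset additionally assumes an unencoded
              (/ENCODING none) file.

```python
def file_sample_index(frame: int, frame_offset: int,
                      samples_per_frame: int) -> int:
    """Index, within a RAW binary file, of the first sample of `frame`.

    Assumes sample 0 of the file belongs to frame `frame_offset`; a
    negative result means the frame precedes the data in the file.
    """
    return (frame - frame_offset) * samples_per_frame


def file_byte_offset(frame: int, frame_offset: int, samples_per_frame: int,
                     bytes_per_sample: int) -> int:
    """Byte position of that sample, assuming an unencoded file."""
    return (file_sample_index(frame, frame_offset, samples_per_frame)
            * bytes_per_sample)
```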

       /INCLUDE
              The INCLUDE directive specifies another file (called a format
              file fragment) to parse for additional format specification
              for the dirfile. The inclusion is treated as if the lines of
              the fragment were pasted verbatim in place of the INCLUDE
              directive line. The exceptions to this are that RAW fields
              specified in the fragment are located in the directory
              containing the fragment, and not in the directory containing
              the parent format file, and that the binary file encoding may
              be different for each fragment. The fragment may be specified
              either with an absolute path, or else with a path relative to
              the directory containing the current file. Syntax is:

                     /INCLUDE <file>

              The INCLUDE directive has no scope: it is processed
              immediately and has no long-term effect.
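
              A minimal sketch of the path handling described above, using
              Python's posixpath module. It assumes "/"-separated paths, and
              the function names are hypothetical. It resolves an /INCLUDE
              target against the including file's directory, and places a
              RAW field's data file beside the fragment that defines it.
              Any file-name suffix an encoding might add is ignored here.

```python
import posixpath


def resolve_include(current_fragment: str, target: str) -> str:
    """Path of the fragment named by `/INCLUDE target`.

    `current_fragment` is the path of the file containing the directive.
    """
    if posixpath.isabs(target):
        return posixpath.normpath(target)
    base = posixpath.dirname(current_fragment)
    return posixpath.normpath(posixpath.join(base, target))


def raw_data_path(fragment_path: str, field_name: str) -> str:
    """RAW data lives in the directory of the fragment defining the field."""
    return posixpath.join(posixpath.dirname(fragment_path), field_name)
```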

       /META  The META directive specifies a metafield attached to a
              particular parent field. The metafield may be of any field
              type except RAW. Metafields are retrieved in exactly the same
              way as regular field data, but the field code specified
              consists of the parent and metafield names joined with a
              forward slash:

                     <parent-field>/<meta-field>

              A metafield may not be specified before its parent field has
              been. Syntax is:

                     /META <parent-field> {field specification line}

              As an illustration of this concept,

                     /META pfield meta CONST FLOAT64 3.291882

              provides a scalar metadatum called meta with value 3.291882
              attached to the field pfield. This particular metafield may be
              referred to by the field code "pfield/meta". Note that
              different parent fields may have metafields with the same
              name, since all references to metafields must include the
              parent field name. Metafields may not themselves have further
              sub-metafields.

              As an alternative to the META directive, a metafield may be
              specified by a standard field specification line, using

                     <parent-field>/<meta-field>

              as the field name. That is, the above example metafield could
              also have been specified as:

                     pfield/meta CONST FLOAT64 3.291882

              The META directive has no scope: it is processed immediately
              and has no long-term effect.
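
              The equivalence between the two forms can be shown with a
              hypothetical helper. It rewrites a tokenized /META line as the
              directive-less field specification, and enforces the rules
              stated above: the parent must already be defined, no
              sub-metafields, and no RAW metafields. The set of defined
              field names is assumed to be tracked by the caller.

```python
from typing import List, Set


def meta_to_field_spec(tokens: List[str], defined_fields: Set[str]) -> List[str]:
    """Rewrite a tokenized /META line as the equivalent field specification."""
    if tokens[0] not in ("/META", "META"):       # the slash is optional
        raise ValueError("not a META directive")
    if len(tokens) < 4:
        raise ValueError("META needs a parent field and a field specification")
    parent, meta, field_type = tokens[1], tokens[2], tokens[3]
    if parent not in defined_fields:
        raise ValueError(f"parent field {parent!r} has not been defined yet")
    if "/" in parent or "/" in meta:
        raise ValueError("metafields may not have sub-metafields")
    if field_type == "RAW":
        raise ValueError("a metafield may not be of type RAW")
    return [f"{parent}/{meta}"] + tokens[3:]
```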

       /PROTECT
              The PROTECT directive specifies the advisory protection level
              of the current fragment and of the RAW fields defined therein.
              The protection level indicates whether writing to the format
              file fragment, or to the binary data on disk, is permitted.
              Syntax is:

                     /PROTECT <level>

              Four advisory protection levels are defined:

              none   No protection at all: data and metadata may be freely
                     changed. This is the default, if no PROTECT directive
                     is present.

              format The dirfile metadata is protected from change, but RAW
                     data on disk may be modified.

              data   The RAW data on disk is protected from change, but
                     metadata may be modified.

              all    Both metadata and data on disk are protected from
                     change.

              The PROTECT directive has fragment scope.
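
              Because the levels are advisory, an implementation simply
              consults them before writing. A hypothetical helper that
              answers that question for the level in force in a fragment
              might look like this. None stands for "no PROTECT directive
              applies", which the text above defines as equivalent to none.

```python
from typing import Optional

# (metadata protected, RAW data protected) for each advisory level.
PROTECTION = {
    "none":   (False, False),
    "format": (True, False),
    "data":   (False, True),
    "all":    (True, True),
}


def may_modify(level: Optional[str], target: str) -> bool:
    """True if `target` ('metadata' or 'data') may be written."""
    format_locked, data_locked = PROTECTION[level or "none"]
    if target == "metadata":
        return not format_locked
    if target == "data":
        return not data_locked
    raise ValueError(f"unknown target {target!r}")
```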

       /REFERENCE
              The REFERENCE directive specifies the name of the field to use
              as the dirfile's reference field (see dirfile(5)). If no
              REFERENCE directive is specified, the first RAW field
              encountered is used as the reference field. The REFERENCE
              directive must specify a RAW field. Syntax is:

                     /REFERENCE <field-code>

              The REFERENCE directive has global scope: if multiple
              REFERENCE directives appear in the dirfile metadata, only the
              last such will be honoured.
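
              The selection rule can be sketched as follows. The function
              names are hypothetical. The input is assumed to be the
              tokenized lines of the whole dirfile in parse order, with
              every /INCLUDE already expanded, so that "first" and "last"
              refer to the dirfile as a whole.

```python
from typing import Dict, Iterable, List, Optional

RESERVED = {"ENCODING", "ENDIAN", "FRAMEOFFSET", "INCLUDE",
            "META", "PROTECT", "REFERENCE", "VERSION"}


def is_directive(word: str) -> bool:
    """True for a reserved word, with or without its leading slash."""
    return (word[1:] if word.startswith("/") else word) in RESERVED


def choose_reference(lines: Iterable[List[str]]) -> Optional[str]:
    """Reference field of a dirfile, or None if it has no RAW field."""
    field_types: Dict[str, str] = {}
    first_raw: Optional[str] = None
    last_reference: Optional[str] = None
    for tokens in lines:
        if is_directive(tokens[0]):
            if tokens[0].lstrip("/") == "REFERENCE":
                last_reference = tokens[1]   # global scope: the last one wins
        elif len(tokens) >= 2:
            field_types[tokens[0]] = tokens[1]
            if tokens[1] == "RAW" and first_raw is None:
                first_raw = tokens[0]
    if last_reference is None:
        return first_raw
    if field_types.get(last_reference) != "RAW":
        raise ValueError(f"{last_reference!r} is not a RAW field")
    return last_reference
```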

       /VERSION
              The VERSION directive specifies the particular version of the
              Dirfile Standards to which the dirfile format file conforms.
              This directive should occur before any version-dependent
              syntax is encountered. As of Standards Version 6, no such
              syntax exists, and this directive is provided primarily to
              ease forward compatibility. Syntax is:

                     /VERSION <integer>

              The VERSION directive has immediate scope: its effect is
              immediate, and it applies only to metadata below it,
              including sub-fragments included after the directive. Its
              effect also propagates upwards back to the parent fragment,
              where it affects subsequent metadata.
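
              In contrast to fragment scope, immediate scope behaves like a
              single piece of parser state threaded through the whole parse
              in include order. The following sketch records the version in
              force for each field. The pre-parsed fragment form is
              hypothetical: (keyword, argument) pairs whose keyword is
              "/VERSION", "/INCLUDE", or "FIELD".

```python
from typing import Dict, List, Optional, Tuple

Fragments = Dict[str, List[Tuple[str, str]]]


def versions_in_effect(fragments: Fragments, name: str,
                       state: Optional[dict] = None,
                       out: Optional[Dict[str, Optional[int]]] = None
                       ) -> Dict[str, Optional[int]]:
    """Map each field name to the Standards version in force at its line."""
    if state is None:
        state = {"version": None}   # None: no /VERSION seen yet
    if out is None:
        out = {}
    for keyword, argument in fragments[name]:
        if keyword == "/VERSION":
            state["version"] = int(argument)
        elif keyword == "/INCLUDE":
            # The same state object is shared, so a /VERSION inside the
            # sub-fragment stays in force after returning to this one.
            versions_in_effect(fragments, argument, state, out)
        elif keyword == "FIELD":
            out[argument] = state["version"]
    return out
```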

FIELD SPECIFICATION LINES
       Any line which does not start with a reserved word is assumed to be
       a field specification line. A field specification line consists of
       at least two tokens. The first token is the field name. The second
       token is the field type. Subsequent tokens are field parameters. The
       meaning and number of these parameters depend on the field type
       specified.

   Field Names
       The first token in a field specification line is the field name. The
       field name consists of one or more characters, excluding both ASCII
       control characters (the bytes 0x01 through 0x1F), and the characters

              & / ; < > | .

       which are reserved (but see below for the use of / to specify
       metafields). The field name may not be INDEX, which is a special,
       implicit field which contains the integer frame index. Field names
       are case sensitive.

       If the field name beginning a field specification line does contain
       a / character, the line is assumed to specify a metafield. See the
       META directive above for further details.
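
       The naming rules can be collected into a validation sketch. The
       names are hypothetical. One assumption goes beyond the text above:
       both halves of a parent/meta name are checked with the same rules
       as a simple field name. The NULL character needs no check here,
       since no token may contain it.

```python
RESERVED_WORDS = {"ENCODING", "ENDIAN", "FRAMEOFFSET", "INCLUDE",
                  "META", "PROTECT", "REFERENCE", "VERSION"}
FORBIDDEN = set("&/;<>|.")


def check_name(name: str) -> None:
    """Raise ValueError unless `name` is a valid simple field name."""
    if not name:
        raise ValueError("field names must contain at least one character")
    for ch in name:
        if "\x01" <= ch <= "\x1f":
            raise ValueError(f"control character {ch!r} in field name")
        if ch in FORBIDDEN:
            raise ValueError(f"reserved character {ch!r} in field name")
    if name == "INDEX":
        raise ValueError("INDEX is the implicit frame-index field")
    if name in RESERVED_WORDS:
        raise ValueError(f"{name!r} is a reserved word")


def check_field_spec_name(name: str) -> None:
    """Validate the first token of a field specification line.

    One / marks a metafield (parent/meta); more than one would be a
    sub-metafield, which is not allowed.
    """
    parts = name.split("/")
    if len(parts) > 2:
        raise ValueError("metafields may not have sub-metafields")
    for part in parts:
        check_name(part)
```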

   Field Types
       There are ten field types. Of these, eight are of vector type (BIT,
       LINCOM, LINTERP, MULTIPLY, PHASE, POLYNOM, RAW, and SBIT) and two are
       of scalar type (CONST and STRING). The possible field types are:

       BIT    The BIT vector field type extracts one or more bits out of an
              input vector field as an unsigned number. Syntax is:

                     <field-name> BIT <input> <first-bit> [<bits>]

              which specifies field-name to be the value of bits first-bit
              through first-bit+bits-1 of the input vector field input, when
              input is converted from its native type to an
              (endianness-corrected) unsigned 64-bit integer. If bits is
              omitted, it is assumed to be 1. Both first-bit and bits may be
              either literal numbers, or else the field code of a CONST
              field type containing their values. The SBIT field type is a
              signed version of this field type.
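
              Per sample, the extraction amounts to a shift and a mask. The
              sketch below is hypothetical. It takes an input already
              converted to a Python integer and reduces it to 64 bits, so
              negative inputs are seen in two's complement form. Rejecting
              bit ranges that run past bit 63 is an assumption of the
              sketch.

```python
MASK64 = (1 << 64) - 1


def bit_field(sample: int, first_bit: int, bits: int = 1) -> int:
    """Unsigned value of bits first_bit .. first_bit+bits-1 of `sample`."""
    if bits < 1 or first_bit < 0 or first_bit + bits > 64:
        raise ValueError("bit range must lie within a 64-bit integer")
    word = sample & MASK64                 # unsigned 64-bit view of the input
    return (word >> first_bit) & ((1 << bits) - 1)
```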

       CONST  The CONST scalar field type is a constant fully specified in
              the format file metadata. Syntax is:

                     <field-name> CONST <type> <value>

              where type may be any supported native data type (see the
              description of the RAW field type below), and value is the
              numerical value of the constant, interpreted as indicated by
              type.

       LINCOM The LINCOM vector field type is the linear combination of
              one, two or three input vector fields. Syntax is:

                     <field-name> LINCOM [<n>] <field1> <a1> <b1>
                            [<field2> <a2> <b2> [<field3> <a3> <b3>]]

              where n, if present, indicates the number of input vector
              fields (1, 2, or 3). The derived field will be computed as:

                     field-name[i] = (a1 * field1[i] + b1)
                                   + (a2 * field2[i2] + b2)
                                   + (a3 * field3[i3] + b3)

              with the field2 and field3 terms included only if specified,
              and the sample indices i2 and i3 computed appropriately for
              the (potentially differing) sample rates of the input fields.
              The resultant field will have the same sample rate as field1.
              Each supplied co-efficient (a1, b1, a2, &c.) may be either a
              literal number, or else the field code of a CONST field type
              containing its value.

              If n is not specified, the number of fields is determined by
              looking at the supplied parameters. Since it is possible to
              create a field code which is identical to a literal number,
              the third token on the line is assumed to be n if the entire
              token can be parsed as a literal number using the rules
              outlined in strtod(3). That is, if the field code specifying
              field1 could be mistaken for a literal number, n must be
              specified to prevent ambiguity.
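
              A sketch of the computation, with hypothetical names. The
              Standard only says the indices are "computed appropriately".
              This sketch assumes i_k = floor(i * spf_k / spf_1), where spf
              is the number of samples per frame, i.e. the sample of each
              later input that falls in the same position within the frame.
              Inputs are plain Python lists, and co-efficients may be
              complex.

```python
from typing import List, Sequence, Tuple, Union

Number = Union[int, float, complex]
# One LINCOM term: (input samples, samples per frame, a, b).
Term = Tuple[Sequence[Number], int, Number, Number]


def lincom(terms: Sequence[Term], length: int) -> List[Number]:
    """Compute `length` samples of a LINCOM field with one to three terms.

    The first term fixes the output sample rate; other inputs are indexed
    with the assumed mapping i * spf_k // spf_1.
    """
    if not 1 <= len(terms) <= 3:
        raise ValueError("LINCOM takes one, two or three input fields")
    spf1 = terms[0][1]
    out: List[Number] = []
    for i in range(length):
        total: Number = 0
        for samples, spf, a, b in terms:
            total += a * samples[i * spf // spf1] + b
        out.append(total)
    return out
```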

       LINTERP
              The LINTERP vector field type specifies a table lookup based
              on another vector field. Syntax is:

                     <field-name> LINTERP <input> <table>

              where input is the input vector field for the table lookup,
              and table is the path to the lookup table file for the field.
              If this path is relative, it is assumed to be relative to the
              directory containing the format file fragment defining this
              field. The lookup table file is an ASCII text file with two
              whitespace-separated columns of x and y values. Values are
              linearly interpolated between the points specified in the
              lookup table.
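
              A hypothetical sketch of the table handling. It makes three
              assumptions that the text above leaves open: lines with fewer
              than two columns are skipped, rows are sorted by x, and inputs
              outside the table's x range yield NaN rather than an
              extrapolated value.

```python
import bisect
import math
from typing import List, Tuple


def load_table(text: str) -> Tuple[List[float], List[float]]:
    """Parse a two-column x/y lookup table, sorted by x."""
    pairs = []
    for line in text.splitlines():
        cols = line.split()
        if len(cols) >= 2:              # assumption: skip blank/short lines
            pairs.append((float(cols[0]), float(cols[1])))
    pairs.sort()
    return [p[0] for p in pairs], [p[1] for p in pairs]


def linterp(x: float, xs: List[float], ys: List[float]) -> float:
    """Linearly interpolate y at x; NaN outside the table (an assumption)."""
    if not xs or x < xs[0] or x > xs[-1]:
        return math.nan
    j = bisect.bisect_left(xs, x)       # first index with xs[j] >= x
    if xs[j] == x:
        return ys[j]
    x0, x1, y0, y1 = xs[j - 1], xs[j], ys[j - 1], ys[j]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
```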

       MULTIPLY
              The MULTIPLY vector field type is the product of two vector
              fields. Syntax is:

                     <field-name> MULTIPLY <field1> <field2>

              The derived field will be computed as:

                     field-name[n] = field1[n] * field2[n2]

              with the index n2 computed appropriately for the (potentially
              differing) sample rates of the input fields. The resultant
              field will have the same sample rate as field1.

       PHASE  The PHASE vector field type shifts an input vector field by
              the specified number of samples. Syntax is:

                     <field-name> PHASE <input> <shift>

              which specifies field-name to be the input vector field,
              input, shifted by shift samples. A positive shift indicates a
              shift forward in time. The results of shifting past the
              beginning- or end-of-file are implementation dependent. The
              shift parameter may be either a literal number, or else the
              field code of a CONST field type containing its value.

       POLYNOM
              The POLYNOM vector field type specifies a polynomial function
              of a single input vector field. Syntax is:

                     <field-name> POLYNOM <input> <a0> <a1>
                            [<a2> [<a3> [<a4> [<a5>]]]]

              where input is the input field code, and the order of the
              computed polynomial is determined by how many co-efficients
              are present in the specification. The derived field is
              computed as:

                     field-name[n] = a0 + a1 * input[n]
                                   + a2 * input[n]**2
                                   + a3 * input[n]**3
                                   + a4 * input[n]**4
                                   + a5 * input[n]**5

              where ** is the exponentiation operator, and the higher order
              terms are computed only if the corresponding co-efficients ai
              are specified. The co-efficients, if specified, may be either
              literal numbers, or else the field code of a CONST field type
              containing the value.
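
              One way to evaluate the sum above for a single sample is
              Horner's rule. It gives the same polynomial with fewer
              multiplications, although floating-point rounding may differ
              slightly from summing the terms one by one. The function name
              is hypothetical, and co-efficients are passed in order a0, a1,
              and so on.

```python
from typing import Sequence, Union

Number = Union[int, float, complex]


def polynom(x: Number, coefficients: Sequence[Number]) -> Number:
    """Evaluate a0 + a1*x + ... + ak*x**k using Horner's rule."""
    if not 2 <= len(coefficients) <= 6:
        raise ValueError("POLYNOM takes between two and six co-efficients")
    result: Number = 0
    for a in reversed(coefficients):    # ((ak*x + ak-1)*x + ...)*x + a0
        result = result * x + a
    return result
```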

       RAW    The RAW vector field type specifies raw time streams on disk.
              In this case, the field name should correspond to the name of
              the file containing the time stream. Syntax is:

                     <field-name> RAW <type> <sample-rate>

              where sample-rate is the number of samples per dirfile frame
              for the time stream and type is a token specifying the native
              data format type:

              UINT8  unsigned 8-bit integer

              INT8   signed (two's complement) 8-bit integer

              UINT16 unsigned 16-bit integer

              INT16  signed (two's complement) 16-bit integer

              UINT32 unsigned 32-bit integer

              INT32  signed (two's complement) 32-bit integer

              UINT64 unsigned 64-bit integer

              INT64  signed (two's complement) 64-bit integer

              FLOAT32 or FLOAT
                     IEEE-754 standard 32-bit single precision floating
                     point number

              FLOAT64 or DOUBLE
                     IEEE-754 standard 64-bit double precision floating
                     point number

              COMPLEX64
                     a 64-bit complex number consisting of two IEEE-754
                     standard 32-bit single precision floating point
                     numbers representing the real and imaginary parts of
                     the complex number.

              COMPLEX128
                     a 128-bit complex number consisting of two IEEE-754
                     standard 64-bit double precision floating point
                     numbers representing the real and imaginary parts of
                     the complex number.

              For more information on the storage of complex valued data,
              see dirfile(5).

              For backwards compatibility, implementations should also
              recognise the following single character type aliases in use
              prior to Standards Version 5:

              c      UINT8

              u      UINT16

              s      INT16

              U      UINT32

              i, S   INT32

              f      FLOAT32

              d      FLOAT64

              Types INT8, UINT64, INT64, COMPLEX64, and COMPLEX128 are not
              supported before Standards Version 5, so no single character
              type aliases exist for these types.

              The sample-rate parameter may be either a literal number, or
              else the field code of a CONST field type containing its
              value.
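
              To illustrate how type, aliases, and /ENDIAN combine, here is
              a sketch that decodes the samples in an unencoded RAW file
              with Python's struct module. The names are hypothetical. The
              sketch assumes each complex sample stores its real part first,
              then its imaginary part; dirfile(5) is the authority on that
              layout.

```python
import struct
from typing import List, Union

# struct format characters for each RAW type name (standard sizes).
TYPE_CODES = {
    "UINT8": "B", "INT8": "b", "UINT16": "H", "INT16": "h",
    "UINT32": "I", "INT32": "i", "UINT64": "Q", "INT64": "q",
    "FLOAT32": "f", "FLOAT": "f", "FLOAT64": "d", "DOUBLE": "d",
    # Assumption: real part stored first, then imaginary part.
    "COMPLEX64": "ff", "COMPLEX128": "dd",
}
# Single character aliases in use before Standards Version 5.
ALIASES = {"c": "UINT8", "u": "UINT16", "s": "INT16", "U": "UINT32",
           "i": "INT32", "S": "INT32", "f": "FLOAT32", "d": "FLOAT64"}

Sample = Union[int, float, complex]


def decode_raw(data: bytes, type_name: str, endian: str) -> List[Sample]:
    """Decode every sample in the contents of an unencoded RAW file.

    `endian` is the effective /ENDIAN value: 'big' or 'little'.
    """
    code = TYPE_CODES[ALIASES.get(type_name, type_name)]
    fmt = {"big": ">", "little": "<"}[endian] + code
    if len(data) % struct.calcsize(fmt):
        raise ValueError("data length is not a whole number of samples")
    samples: List[Sample] = []
    for fields in struct.iter_unpack(fmt, data):
        samples.append(complex(*fields) if len(fields) == 2 else fields[0])
    return samples
```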

       SBIT   The SBIT vector field type extracts one or more bits out of an
              input vector field as a signed number. Syntax is:

                     <field-name> SBIT <input> <first-bit> [<bits>]

              which specifies field-name to be the value of bits first-bit
              through first-bit+bits-1 of the input vector field input, when
              input is converted from its native type to an
              (endianness-corrected) signed 64-bit integer. If bits is
              omitted, it is assumed to be 1. Both first-bit and bits may be
              either literal numbers, or else the field code of a CONST
              field type containing their values. The BIT field type is an
              unsigned version of this field type.
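
              The signed variant differs only in sign-extending the
              extracted bits. In two's complement, the highest extracted
              bit is the sign, so a one-bit SBIT field takes the values 0
              and -1. As before, the function name is hypothetical, and
              rejecting ranges past bit 63 is an assumption.

```python
def sbit_field(sample: int, first_bit: int, bits: int = 1) -> int:
    """Signed (two's complement) value of bits first_bit .. first_bit+bits-1."""
    if bits < 1 or first_bit < 0 or first_bit + bits > 64:
        raise ValueError("bit range must lie within a 64-bit integer")
    word = sample & ((1 << 64) - 1)          # 64-bit two's complement view
    value = (word >> first_bit) & ((1 << bits) - 1)
    if value & (1 << (bits - 1)):            # top extracted bit is the sign
        value -= 1 << bits
    return value
```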

       STRING The STRING scalar field type is a character string fully
              specified in the format file metadata. Syntax is:

                     <field-name> STRING <value>

              where value is the string value of the field. Note that value
              is a single token. To include whitespace in the string,
              enclose value in quotation marks ("), or else escape the
              whitespace with the backslash character (\).

   Field Parameters
       All input vector field parameters should be field codes (see below).
       Additionally, some of the numerical field parameters may be either
       literal numbers or else the field code of a CONST field containing
       the value. Parameters for which this is possible are indicated
       above. Since it is possible to create a field code which is
       identical to a literal number, a parameter is assumed to be the
       field code of a CONST field only if the entire token cannot be
       parsed as a literal number using the rules outlined in strtod(3).
       (So, for example, a CONST field whose field code consists solely of
       digits can never be used as a parameter in a field specification
       line.)
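
       The "entire token parses as a number" test can be approximated in
       Python with a regular expression. The pattern below is an
       approximation of what strtod(3) accepts in the "C" locale: decimal
       and hexadecimal floating point, INF/INFINITY, and NAN(...). It is
       not a transcription of any particular C library. The function names
       are hypothetical. The same test decides whether LINCOM's optional n
       is present.

```python
import re

# Approximation of the full strings strtod(3) accepts in the "C" locale.
_STRTOD = re.compile(
    r"[ \t\n\v\f\r]*[+-]?(?:"
    r"0[xX](?:[0-9A-Fa-f]+\.?[0-9A-Fa-f]*|\.[0-9A-Fa-f]+)(?:[pP][+-]?[0-9]+)?"
    r"|(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?"
    r"|[iI][nN][fF](?:[iI][nN][iI][tT][yY])?"
    r"|[nN][aA][nN](?:\([0-9A-Za-z_]*\))?"
    r")"
)


def is_literal_number(token: str) -> bool:
    """True if strtod(3) would (approximately) consume the entire token."""
    return _STRTOD.fullmatch(token) is not None


def classify_parameter(token: str) -> str:
    """'literal' or 'field code', following the rule described above."""
    return "literal" if is_literal_number(token) else "field code"
```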

       A literal complex number is specified as two real (floating point)
       numbers separated by a semicolon (;) with no intervening whitespace.
       So, for example, the tokens

              1;0   0;1   4;0   0;5   9.313e2;74.1

       represent, respectively, the real unit, the imaginary unit, the real
       number four, the imaginary number 5i, and the complex number
       931.3 + 74.1i. Because the semicolon character cannot be used in
       field names, a complex valued literal can never be mistaken for a
       field code. This allows, among other things, the composition of
       complex valued fields from purely real input fields. For example, a
       complex valued field, z, may be created from a real valued field, re,
       representing the real part of the complex number, and the real
       valued field, im, representing the imaginary part of the complex
       number, with the following LINCOM specification:

              z LINCOM re 1 0 im 0;1 0
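
       Parsing such a literal is a matter of splitting on the single
       semicolon. In this hypothetical sketch, Python's float() stands in
       for strtod(3), and the two differ in edge cases: float() accepts
       "1_0", while strtod(3) accepts hexadecimal input. The check at the
       end applies the z LINCOM example above to one sample.

```python
def parse_complex_literal(token: str) -> complex:
    """Parse 're;im' (no whitespace) into a complex number."""
    real, sep, imag = token.partition(";")
    if not sep or ";" in imag:
        raise ValueError(f"{token!r} is not a complex literal")
    # float() would silently accept surrounding whitespace; the format does not.
    if real != real.strip() or imag != imag.strip():
        raise ValueError("no whitespace is allowed in a complex literal")
    return complex(float(real), float(imag))
```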

   Field Codes
       When specifying the input to a field, either as a CONST scalar
       parameter, or as an input vector field to a non-RAW vector field,
       field codes are used. A field code is one of:

       ·  a simple field name, indicating a vector or scalar field

       ·  a parent field name, followed by a forward slash, followed by a
          metafield name, indicating a metafield. See the description of the
          META directive above for further details.

       ·  either of the above, followed by a period, followed by a
          representation suffix, but only if the field or metafield
          specified is not a STRING type field.

       A representation suffix may be used to extract a real number from a
       complex value. The available suffixes and their meanings are:

       .a     This representation indicates the angle (in radians) between
              the positive real axis and the value (i.e. the complex
              argument). The argument is in the range [-pi, pi], and a
              branch cut exists along the negative real axis. At the branch
              cut, -pi is returned if the imaginary part is -0, and pi is
              returned if the imaginary part is +0. If z=0, zero is
              returned.

       .i     This representation indicates the projection of the value onto
              the imaginary axis (i.e. the imaginary part of the number).

       .m     This representation indicates the modulus of the value (i.e.
              its absolute value).

       .r     This representation indicates the projection of the value onto
              the real axis (i.e. the real part of the number).

       If the specified field is purely real, the representations are
       calculated as if the imaginary part were equal to +0. For example,
       given a complex valued vector, z, a vector containing the real part
       of z, re_z, could be produced with:

              re_z PHASE z.r 0

       and similarly for the complex field's imaginary part, argument, and
       absolute value. (It should be pointed out, though, that such a
       simplistic example isn't strictly necessary, since z.r could be used
       wherever re_z would be.)
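
       The following hypothetical sketch splits a field code into its parts
       and applies a representation suffix to one sample. Splitting on "/"
       and "." is safe because neither character may appear in a field
       name. Python's cmath.phase is atan2-based, so it already returns -pi
       or pi on the branch cut according to the sign of a zero imaginary
       part. The sketch special-cases z=0, because cmath.phase(-0+0j) would
       return pi rather than zero. It does not check the rule that STRING
       fields take no suffix.

```python
import cmath
from typing import NamedTuple, Optional, Union


class FieldCode(NamedTuple):
    parent: str                     # field name, or parent of a metafield
    meta: Optional[str]             # metafield name, if any
    representation: Optional[str]   # 'a', 'i', 'm' or 'r', if any


def parse_field_code(code: str) -> FieldCode:
    """Split 'field[/meta][.x]' into its parts."""
    base, dot, suffix = code.partition(".")
    if dot and suffix not in ("a", "i", "m", "r"):
        raise ValueError(f"unknown representation suffix {suffix!r}")
    parent, slash, meta = base.partition("/")
    return FieldCode(parent, meta if slash else None, suffix if dot else None)


def represent(value: Union[int, float, complex], suffix: Optional[str]):
    """Apply a representation suffix to one sample."""
    if suffix is None:
        return value
    z = complex(value)              # purely real input: imaginary part is +0
    if suffix == "r":
        return z.real
    if suffix == "i":
        return z.imag
    if suffix == "m":
        return abs(z)
    if suffix == "a":
        return 0.0 if z == 0 else cmath.phase(z)   # range [-pi, pi]
    raise ValueError(f"unknown representation suffix {suffix!r}")
```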

VERSION HISTORY
       This document describes Version 7 of the Dirfile Standards.

       Version 7 of the Standards (October 2009) added the SBIT and POLYNOM
       field types, and the directive-less method of specifying metafields.
       It also introduced the data types COMPLEX128 and COMPLEX64, along
       with the notion of representations. Finally, it made the number of
       fields parameter for LINCOM optional.

       Version 6 of the Standards (October 2008) added the /ENCODING,
       /META, /PROTECT, and /REFERENCE directives, and the CONST and STRING
       field types. It permitted whitespace in tokens and introduced the
       character escape sequences. It allowed CONST fields to be used as
       parameters in field specification lines. It also removed FILEFRAM as
       an alias for INDEX, and allowed # and \ in field names.

       Version 5 of the Standards (August 2008) added /VERSION and /ENDIAN,
       slash demarcation of reserved words, and removed the restriction on
       field name length. It introduced the data types INT8, INT64, and
       UINT64, the new-style type specifiers, and increased the range of the
       BIT field type from 32 to 64 bits. It also prohibited the characters
       #&/;<>\.| in field names.

       Version 4 of the Standards (October 2006) added the PHASE field type.

       Version 3 of the Standards (January 2006) added INCLUDE and increased
       the allowed length of a field name from 16 to 50 characters.

       Version 2 of the Standards (September 2005) added the MULTIPLY field
       type.

       Version 1 of the Standards (November 2004) added FRAMEOFFSET and the
       optional fourth argument to the BIT field type.

       Version 0 of the Standards (before March 2003) refers to the dirfile
       standards supported by the getdata(3) library originally introduced
       into the kst(1) sources, which contained support for all other
       features covered by this document.

AUTHORS
       The dirfile specification was developed by C. B. Netterfield
       <netterfield@astro.utoronto.ca>.

       Since Standards Version 3, the dirfile specification has been
       maintained by D. V. Wiebe <dwiebe@physics.utoronto.ca>.

SEE ALSO
       dirfile(5), dirfile-encoding(5)



Standards Version 7            19 October 2009            dirfile-format(5)