dirfile-format(5) DATA FORMATS dirfile-format(5)
2
3
4
6 dirfile-format — the dirfile database format specification file
7
9 The dirfile format specification fully specifies the raw and derived
10 time streams and auxiliary information for a dirfile(5) database.
11
12 The format specification is contained in one or more case-sensitive
13 text files located in the dirfile tree. Each file is known as a frag‐
14 ment. The primary fragment is the file called format located in the
15 base dirfile directory. This file may contain only part of the format
16 specification, and may reference other fragments (using the /INCLUDE
17 directive) containing further format specification. This inclusion
18 mechanism may be nested arbitrarily deep.
19
20 The explicit text encoding of these files is not specified by these
21 Standards, but it must be 7-bit ASCII compatible. Examples of accept‐
22 able character encodings include all the ISO 8859 character sets (i.e.
23 Latin-1 through Latin-10, among others), as well as the UTF-8 encoding
24 of Unicode and UCS.
25
26 This document primarily describes the latest version of the Standards
27 (Version 10); differences with previous versions are noted where rele‐
28 vant. A complete list of changes between versions is given in the HIS‐
29 TORY section below.
30
31
33 The format specification is composed of field specification lines and
34 directive lines, optionally separated by blank lines or lines contain‐
35 ing only whitespace. Lines are separated by the line-feed character
36 (0x0A). Unless escaped (see below), the hash mark (#) is the comment
37 delimiter; the comment delimiter, and any text following it to the end
38 of the line, is ignored.
39
40
41 Tokens
42 Both field specification lines and directive lines consist of several
43 tokens separated by whitespace. Whitespace consists of one or more
44 whitespace characters. These are: space (0x20), horizontal tab (0x09),
45 vertical tab (0x0B), form-feed (0x0C), and carriage return (0x0D). The
46 first token of a directive line is always a reserved word. The first
47 token of a field specification line is never a reserved word. Any
48 amount of whitespace may precede the first token on a line.
49
Since tokens are separated by whitespace, to include a whitespace
character in a token, it must either be escaped by preceding it with a
backslash character (\), or be replaced by a character escape sequence
(see below), or else the token must be enclosed in quotation marks (").
The
54 quotation marks themselves are stripped from the token. The null-token
55 (that is, the token consisting of zero characters) may be specified by
56 a pair of quotation marks with nothing between them (""). To include a
57 literal quotation mark in a token, it must be escaped (\"). Similarly,
58 a hash mark may be included in a token by including it in a quoted
59 token or else by escaping it (\#), otherwise the hash mark is under‐
60 stood as the comment delimiter.
61
62 It is a syntax error to have a line which contains unmatched quotation
63 marks, or in which the last character is the backslash character.
64
65 Several characters when escaped by a preceding backslash character are
66 interpreted as special characters in tokens. The character escape
67 sequences are:
68
69 \a an alert (bell) character (ASCII 0x07 / U+0007)
70
71 \b a backspace character (ASCII 0x08 / U+0008)
72
73 \e an escape character (ASCII 0x1B / U+001B)
74
75 \f a form-feed character (ASCII 0x0C / U+000C)
76
77 \n a line-feed character (ASCII 0x0A / U+000A)
78
79 \r a carriage return character (ASCII 0x0D / U+000D)
80
81 \t a horizontal tab character (ASCII 0x09 / U+0009)
82
83 \v a vertical tab character (ASCII 0x0B / U+000B)
84
85 \\ a backslash character (ASCII 0x5C / U+005C)
86
87 \ooo the single byte given by the octal number ooo (1 to 3
88 octal digits).
89
90 \xhh the single byte given by the hexadecimal number hh (1 or
91 2 hexadecimal digits).
92
93 \uhhhhhhh
94 the UTF-8 byte sequence encoding the Unicode code point
95 given by the hexadecimal number hhhhhhh (1 to 7 hexadeci‐
96 mal digits).
97
98 Any other character which is escaped is interpreted as the character
99 itself. (i.e. \c is interpreted as c; also, as pointed out above, \"
100 and \# are interpreted as simply " and #, without their special mean‐
101 ings).
102
103 No token may contain the NULL character (ASCII 0x00 / U+0000). Fur‐
104 thermore, although support is present to create UTF-8 byte sequences,
105 tokens are not required to be valid UTF-8 sequences. Any byte sequence
106 not containing the NULL character forms a valid token. However, there
107 may be further restrictions on allowed characters for a token in a par‐
108 ticular situation, (for example, when used as a field name).
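
The tokenising rules above can be sketched as follows. This is an illustrative reading of the rules, not a reference parser: the function name tokenize and its helpers are hypothetical, the line is assumed to be passed as a string without its terminating line-feed, and the handling of cases the text leaves open (an \x or \u with no digits following, an octal value above 0xFF, a code point beyond U+10FFFF) is an assumption of this sketch.

```python
HEX = "0123456789abcdefABCDEF"
OCT = "01234567"
SIMPLE = {"a": 0x07, "b": 0x08, "e": 0x1B, "f": 0x0C, "n": 0x0A,
          "r": 0x0D, "t": 0x09, "v": 0x0B, "\\": 0x5C}


def _run(text, start, allowed, most):
    """Return the longest run (at most `most` long) of `allowed` at `start`."""
    end = start
    while end < len(text) and end - start < most and text[end] in allowed:
        end += 1
    return text[start:end]


def _byte(value):
    """Validate a numeric escape: it must give one non-NUL byte (assumption)."""
    if not 0 < value <= 0xFF:
        raise ValueError("escape must encode a single non-NUL byte")
    return bytes([value])


def tokenize(line):
    """Split one line (without its line-feed) into byte-string tokens."""
    tokens, cur = [], bytearray()
    in_token = in_quotes = False
    i = 0
    while i < len(line):
        c = line[i]
        if c == "\\":
            if i + 1 >= len(line):
                raise ValueError("line ends with a backslash")
            e, i, in_token = line[i + 1], i + 2, True
            if e in SIMPLE:
                cur.append(SIMPLE[e])
            elif e in OCT:
                digits = e + _run(line, i, OCT, 2)      # 1 to 3 octal digits
                cur += _byte(int(digits, 8))
                i += len(digits) - 1
            elif e == "x" and _run(line, i, HEX, 1):
                digits = _run(line, i, HEX, 2)          # 1 or 2 hex digits
                cur += _byte(int(digits, 16))
                i += len(digits)
            elif e == "u" and _run(line, i, HEX, 1):
                digits = _run(line, i, HEX, 7)          # 1 to 7 hex digits
                value = int(digits, 16)
                if value == 0 or value > 0x10FFFF:
                    raise ValueError("unsupported code point")
                cur += chr(value).encode("utf-8", "surrogatepass")
                i += len(digits)
            else:
                cur += e.encode("utf-8")  # any other escaped character is itself
            continue
        if c == '"':
            in_quotes, in_token = not in_quotes, True  # "" yields the null-token
        elif not in_quotes and c == "#":
            break                                       # comment: ignore the rest
        elif not in_quotes and c in " \t\v\f\r":
            if in_token:
                tokens.append(bytes(cur))
            cur, in_token = bytearray(), False
        else:
            cur += c.encode("utf-8")
            in_token = True
        i += 1
    if in_quotes:
        raise ValueError("unmatched quotation mark")
    if in_token:
        tokens.append(bytes(cur))
    return tokens
```

Tokens are returned as byte strings because a token need not be valid UTF-8. Further restrictions, such as those on field names, would be applied by the caller.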
109
110 Standards Version 5 and earlier do not recognise the character escape
111 sequences, nor allow quoting of tokens. As a result, they prohibit both
112 whitespace and the comment delimiter from being used in tokens.
113
114
There are eleven directives, each specified by a different reserved word,
117 which cannot be used as field names in the dirfile. As of Standards
118 Version 8, all reserved words start with an initial forward slash (/),
119 to distinguish them from field names. Standards Versions 5, 6, and 7
120 permitted the omission of the initial forward slash, while in Standards
121 Version 4 and earlier, reserved words may not have an initial forward
122 slash. Like the rest of the format specification, directives are case
123 sensitive.
124
125 A number of the directives have fragment scope. A directive with frag‐
126 ment scope only applies to the fragment in which it is present, plus
127 any sub-fragments indicated by the /INCLUDE directive, but only if
128 those sub-fragments don't have their own corresponding directive.
129 Directives which have fragment scope are: /ENCODING, /ENDIAN, /FRAME‐
130 OFFSET, and /PROTECT. Because of these scoping rules, different por‐
131 tions of the dirfile may have different encodings, endiannesses, frame
132 offsets, or protection levels.
133
134 If a directive with fragment scope appears more than once in a frag‐
135 ment, only the last such directive is honoured, with the exception that
136 the effect of a directive is not propagated to sub-fragments if the
137 directive line appears after the sub-fragment is included. The scoping
138 rules of the remaining directives are discussed below.
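
The fragment scoping rules can be illustrated with a small model. The sketch below is hypothetical: fragments are represented as a dictionary of line lists rather than files, lines are split naively on whitespace, only the first parameter of the directive is tracked, and None stands for "no directive in effect" (where behaviour is implementation dependent).

```python
def effective_setting(fragments, name, directive, inherited=None, out=None):
    """Map each fragment to the value of `directive` that applies to it."""
    out = {} if out is None else out
    current = inherited                  # value inherited from the parent
    for line in fragments[name]:
        words = line.split()
        if not words:
            continue
        if words[0] == directive:
            current = words[1]           # a later directive overrides an earlier one
        elif words[0] == "/INCLUDE":
            # The sub-fragment sees only what is in effect at this point.
            effective_setting(fragments, words[1], directive, current, out)
    out[name] = current                  # the last directive wins for the fragment
    return out


fragments = {
    "format": ["/ENDIAN big", "/INCLUDE a", "/ENDIAN little",
               "/INCLUDE b", "/INCLUDE c"],
    "a": ["x RAW UINT8 1"],
    "b": ["y RAW UINT8 1"],
    "c": ["/ENDIAN big", "z RAW UINT8 1"],
}
endianness = effective_setting(fragments, "format", "/ENDIAN")
```

Here fragment a is included before the second /ENDIAN line and so keeps big, fragment b inherits little, fragment c overrides the inherited value with its own directive, and the primary fragment itself ends up little.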
139
140
141 /ALIAS The /ALIAS directive defines an alternate name for a field
142 defined elsewhere in the format specification (called the "tar‐
143 get"). Aliases may not be used as the parent field in a /META
144 directive, but are in most other ways indistinguishable from the
145 target's original, canonical name. Aliases may be chained (that
146 is, the target name appearing in an /ALIAS directive may itself
147 be an alias). In this case, the new alias is another name for
148 the target's own target. Just as there is no requirement that
149 the input fields of a derived field exist, it is not an error
150 for the target of an alias to not exist. Syntax is:
151
152 /ALIAS <name> <target>
153
A metafield alias may be defined using the <parent-field>/<alias-name>
syntax for name in the /ALIAS directive. No restriction is placed on
target; specifically, a metafield alias may target a top-level field,
or a metafield with a different parent; conversely, a top-level alias
may target a metafield.
159
160 A metafield alias may never appear as the parent part of a
161 metafield field code, even if it refers to a top-level field.
162 That is, given the valid format:
163
164 aaaa RAW UINT8 1
165 aaaa/bbbb CONST FLOAT64 0.0
166 cccc RAW UINT8 1
167 /ALIAS cccc/dddd aaaa
168
169 the metafield aaaa/bbbb may not be referred to as
170 cccc/dddd/bbbb, even though cccc/dddd is a valid field code
171 referring to aaaa.
172
173 This is not true of top-level aliases: if eeee is an alias of
174 ffff, then ffff/gggg, a metafield of ffff, may be referred to as
175 eeee/gggg as well.
176
177 The /ALIAS directive has no scope: it is processed immediately.
178 It appeared in Standards Version 9.
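
A minimal sketch of alias resolution under the rules above. The names resolve_alias and canonical_code are hypothetical, aliases are assumed to be stored as a dictionary mapping each alias name to its target, and the loop check is a safeguard added by this sketch (the text does not say how alias loops are treated).

```python
def resolve_alias(aliases, name):
    """Follow a chain of aliases to the final target name."""
    seen = set()
    while name in aliases:
        if name in seen:
            raise ValueError("alias loop")      # assumption: loops are rejected
        seen.add(name)
        name = aliases[name]
    return name                                  # the target need not exist


def canonical_code(aliases, code):
    """Rewrite a field code in terms of canonical (non-alias) names."""
    if code.count("/") > 1:
        raise ValueError("a field code may contain at most one '/'")
    if code in aliases:
        return resolve_alias(aliases, code)
    parent, slash, meta = code.partition("/")
    if slash and parent in aliases:              # top-level alias used as parent
        target = resolve_alias(aliases, parent)
        if "/" in target:
            raise ValueError("metafields may not have sub-metafields")
        return target + "/" + meta
    return code


aliases = {"cccc/dddd": "aaaa", "eeee": "ffff", "hhhh": "eeee"}
```

With these definitions, canonical_code(aliases, "eeee/gggg") gives "ffff/gggg", while "cccc/dddd/bbbb" is rejected because a metafield alias cannot serve as a parent.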
179
180 /ENCODING
181 The /ENCODING directive specifies the encoding scheme used to
182 encode binary files in the dirfile. The encoding scheme may be
183 one of the predefined names listed below, which are described in
184 more detail in dirfile-encoding(5), or any other site-specific
185 encoding scheme. The predefined scheme names are:
186
187 none The dirfile is unencoded.
188
189 bzip2 The dirfile is compressed using the bzip2 compression
190 scheme.
191
192 flac The dirfile is compressed using the flac compression
193 scheme.
194
195 gzip The dirfile is compressed using the gzip compression
196 scheme.
197
198 lzma The dirfile is compressed using the LZMA compression
199 scheme.
200
201 slim The dirfile is compressed using the slim compression
202 scheme.
203
204 sie The dirfile is sample-index encoded (a variant of run-
205 length encoding).
206
207 text The dirfile is text encoded.
208
209 zzip The dirfile is compressed and encapsulated using the zzip
210 compression scheme.
211
212 zzslim The dirfile is compressed and encapsulated using a combi‐
213 nation of the zzip and slim compression schemes.
214
215 Implementations should fail gracefully when encountering an
216 unknown encoding scheme. If no encoding scheme is specified,
217 behaviour is implementation dependent. Syntax is:
218
219 /ENCODING <scheme> [<enc-datum>]
220
221 The enc-datum token provides additional data for certain encod‐
222 ing schemes; see dirfile-encoding(5) for details. The form of
223 enc-datum is not specified.
224
225 The /ENCODING directive has fragment scope. It appeared in
226 Standards Version 6. The predefined schemes sie, zzip, and
227 zzslim, and the optional enc-datum token, appeared in Standards
228 Version 9; the predefined scheme lzma appeared in Standards Ver‐
229 sion 7; all other predefined schemes appeared in Standards Ver‐
230 sion 6.
231
232 /ENDIAN
233 The /ENDIAN directive specifies the endianness of the raw data
234 in the database. The assumed endianness of raw data in dirfiles
235 which omit this directive is implementation dependent. Syntax
236 is:
237
238 /ENDIAN ( big | little ) [ arm ]
239
240 where the "arm" token should be included if double precision
241 floating point data are stored in the ARM middle-endian format.
242 The /ENDIAN directive has fragment scope. It appeared in Stan‐
243 dards Version 5. The optional arm token appeared in Standards
244 Version 8.
245
246 /FRAMEOFFSET
247 The /FRAMEOFFSET directive specifies the frame number of the
248 first frame for which data exists in binary files associated
249 with RAW fields. Syntax is:
250
251 /FRAMEOFFSET <integer>
252
253 The /FRAMEOFFSET directive has fragment scope. It appeared in
254 Standards Version 1.
255
256 /HIDDEN
257 The /HIDDEN directive indicates that the specified field name is
258 hidden. The difference (if any) between a field name which is
259 hidden and one that is not is implementation dependent. Hidden‐
260 ness is not inherited by metafields of the specified field.
Hiddenness applies to the name, not the field itself: hiding field-name
does not hide aliases of field-name, and if field-name is an
alias, the alias is hidden, not its target. Syntax is:
264
265 /HIDDEN <field-name>
266
A /HIDDEN directive must appear after the specification of field-name
(which occurs in a field specification line, an /ALIAS directive, or
a /META directive) in the same fragment.
271
272 The /HIDDEN directive has no scope: it is processed immediately.
273 It appeared in Standards Version 9.
274
275 /INCLUDE
276 The /INCLUDE directive specifies another file (called a frag‐
277 ment) to parse for additional format specification for the
278 dirfile. The inclusion is processed immediately, before the
279 fragment containing the /INCLUDE directive (the parent fragment)
280 is parsed further. RAW fields specified in the included frag‐
281 ment are located in the directory containing the fragment file,
282 and not in the directory containing the parent fragment, and the
283 binary file encoding may be different for each fragment. The
284 fragment may be specified either with an absolute path, or else
285 a path relative to the directory containing the parent fragment.
286
287 The /INCLUDE directive may optionally specify a prefix and/or
288 suffix to apply to field names defined in the included fragment.
289 If present, affixes are applied to all field-names (including
290 aliases) defined in the included fragment and any fragments it
291 further includes. Affixes nest, with the affixes of the deepest
292 inclusion innermost. Affixes are not applied to the names of
293 binary files associated with RAW fields. Syntax is:
294
295 /INCLUDE <file> [<namespace>.][<prefix>] [<suffix>]
296
297 To specify only suffix, the null-token ("") may be used as pre‐
298 fix.
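
The nesting of affixes can be sketched as below. The function name apply_affixes is hypothetical, and the sketch covers only plain top-level field names; how affixes combine with metafield codes or namespaces is not modelled here.

```python
def apply_affixes(name, inclusions):
    """Apply nested /INCLUDE affixes to a field name.

    `inclusions` lists (prefix, suffix) pairs, outermost /INCLUDE first.
    The deepest inclusion is applied first, so its affixes end up innermost.
    """
    for prefix, suffix in reversed(inclusions):
        name = prefix + name + suffix
    return name
```

For example, a field temp defined in a fragment included with affixes B_ and _y, from a fragment itself included with affixes A_ and _x, becomes A_B_temp_y_x. A suffix-only inclusion uses the empty string as the prefix, mirroring the null-token.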
299
300 A namespace may also be specified in an /INCLUDE directive by
301 prepending it to prefix. The namespace and prefix are separated
302 by a dot (.). The dot is required whenever a namespace is spec‐
303 ified: if the prefix is empty, the third token should be just
304 the namespace followed by a trailing dot. If a namespace is
305 specified, that namespace, relative to the including fragment's
306 root namespace, becomes the root namespace of the included frag‐
307 ment. If no namespace is specified in the /INCLUDE directive,
308 then the current namespace (specified by a previous /NAMESPACE
309 directive) is used as the root namespace of the included frag‐
310 ment. That is, if the current namespace is current_space, then
311 the statement:
312
313 /INCLUDE file newspace.
314
315 is equivalent to
316
317 /NAMESPACE newspace
318 /INCLUDE file
319 /NAMESPACE current_space
320
321 As a result, if no namespace is provided, and there has been no
322 previous /NAMESPACE directive, the included fragment will have
323 the same root namespace as the including fragment.
324
325 The /INCLUDE directive has no scope: it is processed immediate‐
326 ly. It appeared in Standards Version 3. The optional prefix
327 and suffix appeared in Standards Version 9. The optional name‐
328 space appeared in Standards Version 10.
329
330 /META The /META directive specifies a metafield attached to a particu‐
331 lar parent field. The field metadata may be of any allowed type
332 except RAW. Metafields are retrieved in exactly the same way as
333 regular field data, but the field code specified consists of the
334 parent and metafield names joined with a forward slash:
335
336 <parent-field>/<meta-field>
337
338 META fields may not be specified before their parent field has
339 been. Syntax is:
340
341 /META <parent-field> {field specification line}
342
343 The <parent-field> code may not be an alias. As an illustration
344 of this concept,
345
346 /META pfield meta CONST FLOAT64 3.291882
347
348 provides a scalar metadatum called meta with value 3.291882 at‐
349 tached to the field pfield. This particular metafield may be
350 referred to by the field code "pfield/meta". Note that differ‐
351 ent parent fields may have metafields with the same name, since
352 all references to metafields must include the parent field name.
353 Metafields may not themselves have further sub-metafields.
354
355 As an alternative to the /META directive, starting with Stan‐
356 dards Version 7, a metafield may be specified by a standard
357 field specification line, using
358
359 <parent-field>/<meta-field>
360
361 as the field name. That is, the above example metafield could
362 have also been specified as:
363
364 pfield/meta CONST FLOAT64 3.291882
365
366 The /META directive has no scope: it is processed immediately.
367 It appeared in Standards Version 6.
368
369 /NAMESPACE
The /NAMESPACE directive changes the current namespace for subsequent
field specification lines. Syntax is:
372
373 /NAMESPACE <subspace>
374
375 The subspace specified is relative to the current fragment's
376 root namespace. If subspace is the null-token ("") the current
377 namespace will be set back to the root namespace. Otherwise,
378 the current namespace will be changed to the concatenation of
379 the root namespace with subspace, with the two parts separated
380 by a dot:
381
382 rootspace.subspace
383
384 If rootspace is empty, the intervening dot is omitted, and the
385 current namespace is simply subspace.
386
387 By default, all field codes, both field names for newly speci‐
388 fied fields, and field codes used as inputs to fields or targets
389 for aliases, are placed in the current namespace, unless they
390 start with an initial dot, in which case the current namespace
391 is ignored, and they're placed instead in the fragment's root
392 namespace. See the Namespaces section for further details.
393
394 The /NAMESPACE directive has no scope: it is processed immedi‐
395 ately. For the effects of changing the current namespace on in‐
396 cluded fragments, see the /INCLUDE directive above. The effects
397 of a /NAMESPACE directive never propagate upwards to parent
398 fragments. It appeared in Standards Version 10.
399
400 /PROTECT
401 The /PROTECT directive specifies the advisory protection level
402 of the current fragment and of the RAW fields defined therein.
403 The protection level indicates whether writing to the fragment,
404 or the binary data on disk is permitted. Syntax is:
405
406 /PROTECT <level>
407
408 Four advisory protection levels are defined:
409
410 none No protection at all: data and metadata may be freely
411 changed. This is the default, if no /PROTECT directive
412 is present.
413
414 format The dirfile metadata is protected from change, but RAW
415 data on disk may be modified.
416
417 data The RAW data on disk is protected from change, but meta‐
418 data may be modified.
419
420 all Both metadata and data on disk are protected from change.
421
422 The /PROTECT directive has fragment scope. It appeared in Stan‐
423 dards Version 6.
424
425 /REFERENCE
426 The /REFERENCE directive specifies the name of the field to use
427 as the dirfile's reference field (see dirfile(5)). If no /REF‐
428 ERENCE directive is specified, the first RAW field encountered
429 is used as the reference field. The /REFERENCE directive must
430 specify a RAW field. Syntax is:
431
432 /REFERENCE <field-code>
433
434 The /REFERENCE directive has global scope: if multiple /REFER‐
435 ENCE directives appear in the dirfile metadata, only the last
436 such is honoured. It appeared in Standards Version 6.
437
438 /VERSION
439 The /VERSION directive specifies the particular version of the
440 Dirfile Standards to which the dirfile format specification con‐
441 forms. This directive should occur before any version dependent
442 syntax is encountered. As of Standards Version 6, no such syn‐
443 tax exists, and this directive is provided primarily to ease
444 forward compatibility. Syntax is:
445
446 /VERSION <integer>
447
448 The /VERSION directive has immediate scope: its effect is imme‐
449 diate, and it applies only to metadata below it, including and
450 propagating downwards to sub-fragments after the directive.
451
452 In Standards Version 8 and earlier, its effect also propagates
453 upwards back to the parent fragment, and affects subsequent
454 metadata. Starting with Standards Version 9, this no longer
455 happens. As a result, a /VERSION directive which indicates a
456 version of 9 or later never propagates upwards; additionally,
457 /VERSION directives found in subfragments included in a Version
458 9 or later fragment aren't propagated upwards into that frag‐
459 ment, regardless of the Version of the subfragments. The /VER‐
460 SION directive appeared in Standards Version 5.
461
462
464 Any line which does not start with a reserved word is assumed to be a
465 field specification line. A field specification line consists of at
466 least two tokens. The first token is the field name. The second token
is the field type. Subsequent tokens are field parameters. The meaning
and number of these parameters depend on the field type specified.
469
470
471 Field Names
472 The first token in a field specification line is the field name. The
473 field name consists of one or more characters, excluding both ASCII
474 control characters (the bytes 0x01 through 0x1F), and the characters
475
476 & / ; < > | .
477
478 which are reserved (but see below for the use of / to specify
479 metafields). The dot (.) is allowed in Standards Version 5 and earli‐
480 er. The ampersand, semicolon, less-than sign, greater-than sign, and
481 vertical line (& ; < > |) are allowed in Standards Version 4 and earli‐
482 er. Furthermore, due to the lack of an escape or quoting mechanism
483 (see Tokens above), Standards Version 5 and earlier also prohibit
484 whitespace and the comment delimiter (#) in field names.
485
486 The field name may not be INDEX, which is a special, implicit field
487 which contains the integer frame index. Standards Version 5 and earli‐
488 er also prohibit FILEFRAM, which was an alias for INDEX. Field names
are case sensitive. Standards Versions 3 and 4 restrict field names to
490 50 characters. Standards Version 2 and earlier restrict field names to
491 16 characters. Additionally, the filesystem may put restrictions on the
492 length and acceptable characters of a RAW field name, regardless of
493 Standards Version.
494
495 Starting in Standards Version 7, if the field name beginning a field
496 specification line contains exactly one forward slash character (/),
497 the line is assumed to specify a metafield. See the /META directive
498 above for further details. A field name may not contain more than one
499 forward slash.
500
501 Starting in Standards Version 10, any field name may be preceded by a
502 namespace tag. The namespace tag and the field name are separated by a
503 dot (.). See the Namespaces section, following, for details.
504
505
506 Namespaces
507 Beginning with Standards Version 10, every field in a Dirfile is con‐
508 tained in a namespace. Every namespace is identified by a namespace
tag which consists of the same restricted set of characters used for
510 field names. Namespaces nest arbitrarily deep. Subnamespaces are
511 identified by concatenating all namespace tags, separating tags by dots
512 (.), with the outermost namespace leftmost:
513
514 topspace.subspace.subsubspace
515
516 Each fragment has an immutable root namespace. The root namespace of
517 the primary format file is the null namespace, identified by the null-
518 token (""). The root namespace of other fragments is specified when
519 they are introduced (see the /INCLUDE directive). Each fragment also
520 has a current namespace which may be changed as often as needed using
521 the /NAMESPACE directive, and defaults to the root namespace. The cur‐
522 rent namespace is always either the root namespace or else a subspace
523 under the root namespace.
524
525 If a field name or field code starts with a leading dot, then that name
526 or code is taken to be relative to the fragment's root space. If it
527 does not start with a dot, it is taken to be relative to the current
528 namespace.
529
For example, if both the root namespace and the current namespace of a
531 fragment start off as rootspace, then:
532
533 aaaa RAW UINT8 1
534 .bbbb RAW UINT8 1
535 cccc.dddd RAW UINT8 1
536 .eeee.ffff RAW UINT8 1
537 /NAMESPACE newspace
538 gggg RAW UINT8 1
539 .hhhh RAW UINT8 1
540 iiii.jjjj RAW UINT8 1
541 .kkkk.llll RAW UINT8 1
542
543 specifies, respectively, the fields:
544
545 rootspace.aaaa,
546 rootspace.bbbb,
547 rootspace.cccc.dddd,
548 rootspace.eeee.ffff,
549 rootspace.newspace.gggg,
550 rootspace.hhhh,
551 rootspace.newspace.iiii.jjjj, and
552 rootspace.kkkk.llll.
553
554 Note that a field may specify deeper subspaces under either the root
555 namespace or the current namespace (meaning it is never necessary to
556 use the /NAMESPACE directive). Note also that there is no way for meta‐
557 data in a given fragment to refer to fields outside the fragment's root
558 space.
559
560 There is one exception to this namespace scoping rule: the implicit IN‐
561 DEX vector is always in the null (top-level) namespace, and namespace
562 tags specified with it, either explicitly or implicitly, even a frag‐
563 ment root namespace, are ignored. So, in a fragment with root name‐
564 space rootspace, and current namespace rootspace.subspace,
565
566 INDEX,
567 .INDEX,
568 namespace.INDEX, and
569 .namespace.INDEX,
570
571 all refer to the same INDEX field.
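
The namespace rules of this section can be summarised in a short sketch. The function names enter_namespace and qualify are hypothetical, and namespaces are represented as plain dot-separated strings with the empty string standing for the null namespace.

```python
def enter_namespace(root, subspace):
    """Current namespace after '/NAMESPACE <subspace>' in a fragment."""
    if subspace == "":
        return root                      # null-token: back to the root namespace
    return root + "." + subspace if root else subspace


def qualify(code, root, current):
    """Full name of a field name or field code appearing in a fragment."""
    from_root = code.startswith(".")
    body = code[1:] if from_root else code
    if body.split(".")[-1] == "INDEX":
        return "INDEX"                   # INDEX ignores every namespace tag
    base = root if from_root else current
    return base + "." + body if base else body


root = "rootspace"
newspace = enter_namespace(root, "newspace")   # "rootspace.newspace"
```

Applied to the example above, qualify("cccc.dddd", root, root) gives rootspace.cccc.dddd, and after the /NAMESPACE line qualify(".hhhh", root, newspace) gives rootspace.hhhh.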
572
573
574 Field Types
575 There are eighteen field types. Of these, fourteen are of vector type
576 (BIT, DIVIDE, INDIR, LINCOM, LINTERP, MPLEX, MULTIPLY, PHASE, POLYNOM,
577 RAW, RECIP, SBIT, SINDIR, and WINDOW) and four are of scalar type (CAR‐
578 RAY, CONST, SARRAY, and STRING). The thirteen vector field types other
579 than RAW fields are also called derived fields, since they derive their
580 value from one or more input vector fields. Any other vector field may
581 be used as an input vector, including the implicit INDEX field, but ex‐
582 cluding SINDIR string vectors.
583
584 Five of these derived fields (DIVIDE, LINCOM, MPLEX, MULTIPLY, and WIN‐
585 DOW) have more than one vector input field. In situations where these
586 input fields have differing sample rates, the sample rate of the de‐
587 rived field is the same as the sample rate of the first (left-most) in‐
588 put field specified. Furthermore, the input fields are synchronised by
589 aligning them on frame boundaries, assuming equally-spaced sampling
590 throughout a frame, and using the last sample of each input field which
591 did not occur after the sample of the derived field being computed.
592 That is, if the first and second input fields have sample rates s1 and
593 s2, the derived field also has sample rate s1 and, for every sample of
594 the derived field, n, the n'th sample of the first field is used (since
595 they have the same sample rate by definition), and the sample number
596 used of the second field, m, is computed as:
597
598 m = floor((n * s2) / s1).
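
As a worked illustration of this formula, the sketch below computes which sample of the second input is used for each sample of the derived field. The function name is hypothetical, and sample rates are assumed to be positive integers (samples per frame), so that integer floor division matches the floor in the formula exactly.

```python
def second_field_sample(n, s1, s2):
    """Sample number m of the second input used for derived sample n."""
    return (n * s2) // s1


# First input at 5 samples per frame, second input at 2 samples per frame.
pairs = [second_field_sample(n, 5, 2) for n in range(10)]
```

Over two frames this pairs derived samples 0 to 9 with second-field samples 0, 0, 0, 1, 1, 2, 2, 2, 3, 3: each derived sample uses the latest sample of the second field that does not occur after it.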
599
600 Starting in Standards Version 6, certain scalar field parameters in the
601 field specifications may be specified using CONST or CARRAY fields, in‐
602 stead of literal values. A list of parameters for which this is al‐
603 lowed is given below in the Field Parameters section.
604
The possible field types are:
606
607 BIT The BIT vector field type extracts one or more bits out of an
608 input vector field as an unsigned number. Syntax is:
609
610 <fieldname> BIT <input> <first-bit> [<num-bits>]
611
612 which specifies fieldname to be num-bits bits extracted from the
613 input vector field input starting with bit number first-bit
614 (counting from the least-significant bit, which is numbered ze‐
615 ro), after input has been converted from its native type to an
616 (endianness corrected) unsigned 64-bit integer. If num-bits is
617 omitted, it is assumed to be one.
618
619 The extracted bits are interpreted as an unsigned integer; the
620 SBIT field type is a signed version of this field type. The op‐
621 tional num-bits parameter appeared in Standards Version 1.
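
A minimal sketch of the bit extraction, assuming the input sample has already been converted to a non-negative integer (the conversion from other native types is not modelled). The function name bit_field is hypothetical.

```python
def bit_field(value, first_bit, num_bits=1):
    """Extract num_bits bits of value, starting at first_bit (LSB is bit 0)."""
    u64 = value & 0xFFFFFFFFFFFFFFFF          # keep to an unsigned 64-bit range
    return (u64 >> first_bit) & ((1 << num_bits) - 1)
```

For the input value 0b10110100, extracting three bits starting at bit 2 yields 0b101, that is, 5.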
622
623 CARRAY The CARRAY scalar field type is a list of constants fully speci‐
624 fied in the format specification metadata. Syntax is:
625
626 <fieldname> CARRAY <type> <value0> <value1> <value2> ...
627
628 where type may be any supported native data type (see the de‐
629 scription of the RAW field type below), and value0, value1, &c.
630 are the values of successive elements in the scalar list inter‐
631 preted as indicated by type. No limit is placed on the number
632 of elements in a CARRAY. (Note: despite being multivalued, this
633 is not considered a vector field since the elements of the CAR‐
634 RAY are not indexed by frames.) CARRAY appeared in Standards
635 Version 8.
636
637 CONST The CONST scalar field type is a constant fully specified in the
638 format specification metadata. Syntax is:
639
640 <fieldname> CONST <type> <value>
641
642 where type may be any supported native data type (see the de‐
643 scription of the RAW field type below), and value is the numeri‐
644 cal value of the constant interpreted as indicated by type.
645 CONST appeared in Standards Version 6.
646
647 DIVIDE The DIVIDE vector field type is the quotient of two vector
648 fields. Syntax is:
649
<fieldname> DIVIDE <field1> <field2>
651
652 The derived field is computed as:
653
654 fieldname = field1 / field2.
655
656 It was introduced in Standards Version 8.
657
658 INDIR The INDIR vector field type performs an indirect translation of
659 a CARRAY scalar field to a derived vector field based on a vec‐
660 tor index field. Syntax is:
661
662 <fieldname> INDIR <index> <array>
663
664 where index is the vector field, which is converted to an inte‐
665 ger type, if necessary, and array is the CARRAY field. The nth
666 sample of the INDIR field is the value of the mth element of ar‐
667 ray (counting from zero), where m is the value of the nth sample
668 of index. When index is not a valid element number of array,
669 the corresponding value of the INDIR is implementation depen‐
670 dent. INDIR appeared in Standards Version 10.
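
The lookup performed by INDIR can be sketched as follows. The function name indir and the fill parameter are hypothetical: since the value for an invalid index is implementation dependent, this sketch simply substitutes a caller-supplied placeholder. Converting the index with int(), which truncates towards zero, is likewise an assumption.

```python
def indir(index, array, fill=None):
    """Translate each sample of index into the matching element of array."""
    out = []
    for sample in index:
        m = int(sample)                        # assumed integer conversion
        out.append(array[m] if 0 <= m < len(array) else fill)
    return out
```

For example, with array holding 10.0, 20.0 and 30.0, the index samples 0, 2, 1, 5 produce 10.0, 30.0, 20.0 and the placeholder.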
671
672 LINCOM The LINCOM vector field type is the linear combination of one,
673 two or three input vector fields. Syntax is:
674
675 <fieldname> LINCOM [<n>] <field1> <a1> <b1> [<field2>
676 <a2> <b2> [<field3> <a3> <b3>]]
677
678 where n, if present, indicates the number of input vector fields
679 (1, 2, or 3). The derived field is computed as:
680
681 fieldname = (a1 * field1 + b1) + (a2 * field2 + b2) + (a3
682 * field3 + b3)
683
684 with the field2 and field3 terms included only if specified.
685
686 If n is not specified, the number of fields is determined by
687 looking at the supplied parameters. Since it is possible to
688 create a field code which is identical to a literal number, the
third token on the line is assumed to be n if the entire token can be
parsed as a literal number using the rules outlined in strtod(3). That
is, if the field code specifying field1 could be mistaken for a literal
number, n must be specified to prevent ambiguity. In Standards Version 6
and earlier, n is
694 mandatory.
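
The parameter rules for LINCOM can be sketched as below. The function names are hypothetical. Python's float() is used as a stand-in for strtod(3); the two do not accept exactly the same set of strings (for instance, hexadecimal floating-point literals), so this is only an approximation. The a and b parameters are assumed to be literal numbers rather than CONST or CARRAY field codes.

```python
def is_number(token):
    """Approximate the strtod(3) test with Python's float()."""
    try:
        float(token)
    except ValueError:
        return False
    return True


def parse_lincom(params):
    """Parse the tokens following LINCOM into (field, a, b) terms."""
    if is_number(params[0]):
        n, rest = int(params[0]), params[1:]   # explicit n
    else:
        n, rest = len(params) // 3, params     # n inferred from the parameters
    if n not in (1, 2, 3) or len(rest) != 3 * n:
        raise ValueError("malformed LINCOM specification")
    return [(rest[i], float(rest[i + 1]), float(rest[i + 2]))
            for i in range(0, 3 * n, 3)]


def lincom(terms, samples):
    """Evaluate the linear combination for one sample of each input."""
    return sum(a * samples[field] + b for field, a, b in terms)
```

A field code such as 1e3 shows why n is sometimes needed: the parameters "1 1e3 2 0" describe one input field named 1e3, whereas "1e3 2 0" alone would be read as starting with n.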
695
696 LINTERP
697 The LINTERP vector field type specifies a table look up based on
698 another vector field. Syntax is:
699
700 <fieldname> LINTERP <input> <table>
701
702 where input is the input vector field for the table lookup, and
703 table is the path to the lookup table file for the field. If
704 this path is relative, it is assumed to be relative to the di‐
705 rectory containing the fragment defining this field. The lookup
706 table file is an ASCII text file with two whitespace separated
707 columns of x and y values. Values are linearly interpolated be‐
708 tween the points specified in the lookup table.
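
A minimal sketch of the table lookup. The function names are hypothetical, the table is assumed to contain at least two points with distinct x values, and values outside the table are extrapolated from the nearest segment, which is an assumption of this sketch since the text above only describes interpolation between points.

```python
from bisect import bisect_right


def parse_table(text):
    """Read two whitespace-separated columns into sorted (x, y) pairs."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    return sorted((float(x), float(y)) for x, y in rows)


def linterp(x, table):
    """Linearly interpolate y at x from sorted (x, y) pairs."""
    xs = [point[0] for point in table]
    i = min(max(bisect_right(xs, x), 1), len(table) - 1)
    (x0, y0), (x1, y1) = table[i - 1], table[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


table = parse_table("0 0\n10 100\n20 300\n")
```

With this table, an input value of 5 maps to 50.0 and an input value of 15 maps to 200.0.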
709
710 MPLEX The MPLEX vector field type permits the multiplexing of several
711 low sample rate fields into a single data field of higher sample
712 rate. Syntax is:
713
714 <fieldname> MPLEX <input> <index> <count> [<period>]
715
716 where input is the input vector containing the multiplexed
fields, index is the vector containing the multiplex index,
count is the value of the multiplex index when the computed
field is stored in input, and period, if present and non-zero,
is the number of samples between successive occurrences of the
value count in the index vector. A period of zero (or, equivalently,
its omission) indicates that either the value count is
723 not equally spaced in the index vector, or else that the spacing
724 is unknown. Both count and period are integers, and period may
725 not be negative.
726
727 At every sample n, the derived field is computed as:
728
729                      fieldname[n] = (index[n] == count) ? input[n]
730                                                         : fieldname[n - 1]
731
732 The index vector is converted to an integer type for comparison.
733 The value of the derived field before the first sample where in‐
734 dex equals count is implementation dependent.
735
736 The values of count and period place no restrictions on values
737 contained in index. Specifically, particular values of index
738 (including count) need not be equally spaced (neither by period
739 nor any other spacing); index need not ever take on the value
740 count (in which case the value of the entirety of the derived
741 field is implementation dependent). Different MPLEX field defi‐
742 nitions which use the same index vector may specify different
743 periods. MPLEX appeared in Standards Version 9.
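
              A minimal Python sketch of the sample-by-sample rule above.
              The function name mplex and the initial argument are
              hypothetical; initial stands in for the implementation
              dependent value used before the first match.

```python
def mplex(input_vec, index_vec, count, initial=None):
    """Hold the last input sample seen while index equalled count."""
    out = []
    last = initial  # implementation dependent before the first match
    for value, idx in zip(input_vec, index_vec):
        if int(idx) == count:  # index is compared as an integer
            last = value
        out.append(last)
    return out
```

              For example, with an index vector cycling through 0, 1, 2 and
              count equal to 1, every third input sample is picked up and
              held until the next match.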
744
745
746 MULTIPLY
747 The MULTIPLY vector field type is the product of two vector
748 fields. Syntax is:
749
750 <fieldname> MULTIPLY <field1> <field2>
751
752 The derived field is computed as:
753
754 fieldname = field1 * field2.
755
756 MULTIPLY appeared in Standards Version 2.
757
758 PHASE The PHASE vector field type shifts an input vector field by the
759 specified number of samples. Syntax is:
760
761 <fieldname> PHASE <input> <shift>
762
763 which specifies fieldname to be the input vector field, input,
764 shifted by shift samples. A positive shift indicates a forward
765 shift, towards the end-of-field. Results of shifting past the
766               beginning- or end-of-field are implementation dependent.  PHASE
767 appeared in Standards Version 4.
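
              The shift can be sketched in Python as below.  The sketch reads
              "forward shift" as fieldname[n] = input[n + shift], which is an
              interpretation rather than a quotation of these Standards, and
              the fill argument is a hypothetical placeholder for the
              implementation dependent samples shifted in from outside the
              field.

```python
def phase(input_vec, shift, fill=None):
    """Return input_vec shifted so that out[n] = input_vec[n + shift]."""
    n_samples = len(input_vec)
    out = []
    for n in range(n_samples):
        src = n + shift
        # Samples from before the beginning or past the end of the field
        # are implementation dependent; this sketch substitutes fill.
        out.append(input_vec[src] if 0 <= src < n_samples else fill)
    return out
```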
768
769 POLYNOM
770 The POLYNOM vector field type specifies a polynomial function of
771 a single input vector field. Syntax is:
772
773                      <fieldname> POLYNOM <input> <a0> <a1> [<a2> [<a3> [<a4>
774                      [<a5>]]]]
775
776 where <input> is the input field code, and the order of the com‐
777 puted polynomial is determined by how many co-efficients are
778 present in the specification. The derived field is computed as:
779
780 fieldname = a0 + a1 * input + a2 * input**2 + a3 * in‐
781 put**3 + a4 * input**4 + a5 * input**5
782
783 where ** is the element-wise exponentiation operator, and the
784 higher order terms are computed only if the corresponding co-ef‐
785 ficients ai are specified. POLYNOM appeared in Standards Ver‐
786 sion 7.
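
              The polynomial above can be evaluated with Horner's rule, as in
              the following Python sketch.  The function name polynom and the
              list-of-coefficients calling convention are assumptions made
              for illustration; Horner's rule is one way to compute the sum,
              not necessarily the way any implementation does.

```python
def polynom(input_vec, coeffs):
    """Evaluate a0 + a1*x + ... for each sample x, given coeffs [a0, a1, ...]."""
    if not 2 <= len(coeffs) <= 6:
        raise ValueError("POLYNOM takes between two and six co-efficients")
    out = []
    for x in input_vec:
        acc = 0
        # Horner's rule: start from the highest-order co-efficient supplied.
        for a in reversed(coeffs):
            acc = acc * x + a
        out.append(acc)
    return out
```

              Only the co-efficients supplied contribute, so a list of three
              values gives a quadratic.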
787
788 RAW The RAW vector field type specifies raw time streams on disk.
789 In this case, the field name should correspond to the name of
790 the file containing the time stream. Syntax is:
791
792 <fieldname> RAW <type> <sample-rate>
793
794 where sample-rate is the number of samples per dirfile frame for
795 the time stream and type is a token specifying the native data
796 type:
797
798 UINT8 unsigned 8-bit integer
799
800 INT8 two's complement signed 8-bit integer
801
802 UINT16 unsigned 16-bit integer
803
804 INT16 two's complement signed 16-bit integer
805
806 UINT32 unsigned 32-bit integer
807
808 INT32 two's complement signed 32-bit integer
809
810 UINT64 unsigned 64-bit integer
811
812 INT64 two's complement signed 64-bit integer
813
814 FLOAT32
815 IEEE-754 standard 32-bit single precision floating
816 point number
817
818 FLOAT64
819 IEEE-754 standard 64-bit double precision floating
820 point number
821
822 COMPLEX64
823 a 64-bit complex number consisting of two IEEE-754
824 standard 32-bit single precision floating point
825 numbers representing the real and imaginary parts
826 of the complex number (Standards Version 7 and
827 later)
828
829 COMPLEX128
830 a 128-bit complex number consisting of two
831 IEEE-754 standard 64-bit double precision floating
832 point numbers representing the real and imaginary
833 parts of the complex number (Standards Version 7
834 and later).
835
836 For more information on the storage of complex valued data, see
837 dirfile(5). Two additional type names exist: FLOAT is equiva‐
838 lent to FLOAT32, and DOUBLE is equivalent to FLOAT64. Standards
839 Version 9 deprecates these two aliases, but still allows them.
840
841 All these type names (except those for complex data, which came
842 later) were introduced in Standards Version 5. Earlier Stan‐
843 dards Versions specified data types with single-character type
844 aliases:
845
846
847 c UINT8
848
849 u UINT16
850
851 s INT16
852
853 U UINT32
854
855 i, S INT32
856
857 f FLOAT32
858
859 d FLOAT64
860
861 Types INT8, UINT64, INT64, COMPLEX64, and COMPLEX128 are not
862 supported before Standards Version 5, so no single-character
863 type aliases exist for these types. These single-character type
864 aliases were deprecated in Standards Version 5 and removed in
865 Standards Version 8.
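
              The type names and aliases above can be summarised in a small
              Python lookup.  The names resolve_type and bytes_per_frame are
              hypothetical, and the per-frame size assumes unencoded storage
              of the native type.

```python
SAMPLE_BYTES = {
    "UINT8": 1, "INT8": 1, "UINT16": 2, "INT16": 2,
    "UINT32": 4, "INT32": 4, "UINT64": 8, "INT64": 8,
    "FLOAT32": 4, "FLOAT64": 8, "COMPLEX64": 8, "COMPLEX128": 16,
}
# Still allowed, but deprecated since Standards Version 9.
DEPRECATED_ALIASES = {"FLOAT": "FLOAT32", "DOUBLE": "FLOAT64"}
# Single-character aliases, removed in Standards Version 8.
LEGACY_ALIASES = {
    "c": "UINT8", "u": "UINT16", "s": "INT16", "U": "UINT32",
    "i": "INT32", "S": "INT32", "f": "FLOAT32", "d": "FLOAT64",
}


def resolve_type(token, version=10):
    """Map a RAW type token to its canonical type name."""
    if token in LEGACY_ALIASES:
        if version >= 8:
            raise ValueError("single-character aliases were removed in Version 8")
        return LEGACY_ALIASES[token]
    name = DEPRECATED_ALIASES.get(token, token)
    if name not in SAMPLE_BYTES:
        raise ValueError("unknown type: " + token)
    return name


def bytes_per_frame(token, sample_rate, version=10):
    """Size in bytes of one frame of a RAW field (unencoded)."""
    return SAMPLE_BYTES[resolve_type(token, version)] * sample_rate
```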
866
867 RECIP The RECIP vector field type computes the reciprocal of a single
868 input vector field. Syntax is:
869
870                      <fieldname> RECIP <input> <dividend>
871
872 where <input> is the input field code and <dividend> is a scalar
873 quantity. The derived field is computed as:
874
875 fieldname = dividend / input.
876
877 RECIP appeared in Standards Version 8.
878
879 SARRAY The SARRAY scalar field type is a list of strings fully speci‐
880 fied in the format file metadata. Syntax is:
881
882 <fieldname> SARRAY <string0> <string1> <string2> ...
883
884 Each string is a single token. To include whitespace in a
885 string, enclose it in quotation marks ("), or else escape the
886 whitespace with the backslash character (\). No limit is placed
887 on the number of elements in a SARRAY. SARRAY appeared in Stan‐
888 dards Version 10.
889
890 SBIT The SBIT vector field type extracts one or more bits out of an
891 input vector field as a (two's-complement) signed number. Syn‐
892 tax is:
893
894 <fieldname> SBIT <input> <first-bit> [<num-bits>]
895
896 which specifies fieldname to be num-bits bits extracted from the
897 input vector field input starting with bit number first-bit
898 (counting from the least-significant bit, which is numbered ze‐
899 ro), after input has been converted from its native type to an
900 (endianness corrected) two's complement signed 64-bit integer.
901 If num-bits is omitted, it is assumed to be one.
902
903 The extracted bits are interpreted as a two's complement signed
904 integer of the specified width. (So, if num-bits is, for exam‐
905 ple, one, then the field can take on the value zero or negative
906 one.) The BIT field type is an unsigned version of this field
907 type. SBIT appeared in Standards Version 7.
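
              A Python sketch of the extraction and sign interpretation
              described above.  The function name sbit is hypothetical, and
              Python integers stand in for the signed 64-bit intermediate
              value.

```python
def sbit(value, first_bit, num_bits=1):
    """Extract num_bits bits starting at first_bit as a signed integer."""
    raw = (value >> first_bit) & ((1 << num_bits) - 1)
    sign = 1 << (num_bits - 1)
    # If the top extracted bit is set, the value is negative in two's
    # complement of the specified width.
    return raw - (1 << num_bits) if raw & sign else raw
```

              With num_bits equal to one, the result is either zero or
              negative one, as noted above.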
908
909 SINDIR The SINDIR vector field type performs an indirect translation of
910 a SARRAY scalar field to a derived vector field of strings based
911 on a vector index field. Syntax is:
912
913 <fieldname> SINDIR <index> <array>
914
915 where index is the vector field, which is converted to an inte‐
916 ger type, if necessary, and array is the SARRAY field. The nth
917 sample of the SINDIR field is the string value of the mth ele‐
918 ment of array (counting from zero), where m is the value of the
919 nth sample of index. When index is not a valid element number
920 of array, the corresponding value of the SINDIR is implementa‐
921 tion dependent. SINDIR appeared in Standards Version 10.
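
              The indirection can be sketched in Python as follows.  The
              function name sindir is hypothetical, and the invalid argument
              stands in for the implementation dependent value returned for
              an out-of-range index.

```python
def sindir(index_vec, sarray, invalid=None):
    """Translate each index sample into the matching SARRAY element."""
    out = []
    for sample in index_vec:
        m = int(sample)  # index is converted to an integer type
        out.append(sarray[m] if 0 <= m < len(sarray) else invalid)
    return out
```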
922
923 STRING The STRING scalar field type is a character string fully speci‐
924 fied in the format file metadata. Syntax is:
925
926 <fieldname> STRING <string>
927
928 where string is the string value of the field. Note that string
929 is a single token. To include whitespace in the string, enclose
930 string in quotation marks ("), or else escape the whitespace
931 with the backslash character (\). STRING appeared in Standards
932 Version 6.
933
934 WINDOW The WINDOW vector field type isolates a portion of an input vec‐
935 tor based on a comparison. Syntax is:
936
937 <fieldname> WINDOW <input> <check> <op> <threshold>
938
939 where input is the vector containing the data to extract, check
940 is the vector on which to test the comparison, threshold is the
941 value against which check is compared, and op is one of the fol‐
942 lowing tokens indicating the particular comparison performed:
943
944 EQ data are extracted where check, converted to a
945 64-bit signed integer, equals threshold,
946
947 GE data are extracted where check, converted to a
948 64-bit floating-point number, is greater than or
949 equal to threshold,
950
951 GT data are extracted where check, converted to a
952 64-bit floating-point number, is strictly greater
953 than threshold,
954
955 LE data are extracted where check, converted to a
956 64-bit floating-point number, is less than or
957 equal to threshold,
958
959 LT data are extracted where check, converted to a
960 64-bit floating-point number, is strictly less
961 than threshold,
962
963 NE data are extracted where check, converted to a
964 64-bit signed integer, is not equal to threshold,
965
966 SET data are extracted where at least one bit set in
967 threshold is also set in check, when converted to
968 a 64-bit unsigned integer,
969
970 CLR data are extracted where at least one bit set in
971 threshold is not set in check, when converted to a
972                      64-bit unsigned integer.
973
974 The storage type of threshold depends on the operator, and fol‐
975 lows the interpretation of check. It may never be complex val‐
976 ued.
977
978 Outside the region extracted, the value of the derived field is
979 implementation dependent.
980
981 Note: with the EQ operator, this derived field type is very sim‐
982 ilar to the MPLEX field type above. The primary difference is
983 that MPLEX mandates the value of the derived field outside the
984 extracted region, while WINDOW does not. WINDOW appeared in
985 Standards Version 9.
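
              The eight operators can be sketched in Python as below.  The
              names OPS and window are hypothetical, the outside argument
              stands in for the implementation dependent value outside the
              extracted region, and Python's int and float only approximate
              the 64-bit conversions described above.

```python
MASK64 = (1 << 64) - 1  # emulate a 64-bit unsigned integer

OPS = {
    "EQ": lambda c, t: int(c) == int(t),
    "NE": lambda c, t: int(c) != int(t),
    "GE": lambda c, t: float(c) >= float(t),
    "GT": lambda c, t: float(c) > float(t),
    "LE": lambda c, t: float(c) <= float(t),
    "LT": lambda c, t: float(c) < float(t),
    # at least one bit set in threshold is also set in check
    "SET": lambda c, t: (int(c) & int(t) & MASK64) != 0,
    # at least one bit set in threshold is not set in check
    "CLR": lambda c, t: (~int(c) & int(t) & MASK64) != 0,
}


def window(input_vec, check_vec, op, threshold, outside=None):
    """Keep input samples where the comparison on check holds."""
    test = OPS[op]
    return [v if test(c, threshold) else outside
            for v, c in zip(input_vec, check_vec)]
```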
986
987
988 Field Parameters
989 All input vector field parameters should be field codes (see below).
990 Additionally, the scalar field parameters listed may be either literal
991 numbers or else the field code of a CONST field containing the value,
992 or the field code of a CARRAY followed by a left angle bracket (<),
993        then a non-negative integer used as the CARRAY element index, then a
994 right angle bracket (>), that is:
995
996 fieldcode<n>
997
998 If the angle brackets and element index are omitted from a CARRAY field
999 code used as a parameter, the first element in the field (index zero)
1000 is assumed.
1001
1002 Field parameters which may be specified using a scalar field code are:
1003
1004        BIT, SBIT
1005               first-bit, num-bits
1006
1007 LINCOM any of the mi, or bi
1008
1009        MPLEX  count, period
1010
1011 PHASE shift
1012
1013 POLYNOM
1014 any of the ai
1015
1016        RAW    sample-rate
1017
1018 RECIP dividend
1019
1020 WINDOW threshold
1021
1022 Since it is possible to create a field code which is identical to a
1023 literal number, a parameter is assumed to be the field code of a scalar
1024 field only if the entire token cannot be parsed as a literal number us‐
1025 ing the rules outlined in strtod(3). For example, a CONST field whose
1026 field code consists solely of digits can never be used as a parameter
1027 in a field specification line.
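
       The precedence of literal numbers over scalar field codes can be
       sketched in Python as below.  The function name classify_parameter is
       hypothetical, and Python's float() is only a stand-in for strtod(3):
       the two differ in corner cases (float() does not accept hexadecimal
       input, for instance).

```python
import re

# A CARRAY field code followed by <n>, where n is a non-negative integer.
CARRAY_ELEMENT = re.compile(r"^(?P<code>.+)<(?P<index>[0-9]+)>$")


def classify_parameter(token):
    """Decide whether a scalar parameter is a literal or a field code."""
    try:
        # A token that parses as a number is always taken as a literal.
        return ("literal", float(token))
    except ValueError:
        pass
    match = CARRAY_ELEMENT.match(token)
    if match:
        return ("scalar_field", match.group("code"), int(match.group("index")))
    # CONST field, or a CARRAY with the element index omitted (index zero).
    return ("scalar_field", token, 0)
```

       A token such as 1234 is therefore reported as a literal even if a
       CONST field with that name exists.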
1028
1029        Starting in Standards Version 7, a literal complex number is specified as
1030 two real (floating point) numbers separated by a semicolon (;) with no
1031 intervening whitespace. So, for example, the tokens
1032
1033 1;0 0;1 4;0 0;5 9.313e2;74.1
1034
1035 represent, respectively, the real unit, the imaginary unit, the real
1036 number four, the imaginary number 5i, and the complex number 931.3 +
1037 74.1i. Because the semicolon character cannot be used in field names,
1038 a complex valued literal can never be mistaken for a field code. This
1039 allows, among other things, the composition of complex valued fields
1040 from purely real input fields. For example, a complex valued field, z,
1041 may be created from a real valued field re, representing the real part
1042 of the complex number, and the real valued field im, representing the
1043 imaginary part of the complex number, with the following LINCOM speci‐
1044 fication:
1045
1046 z LINCOM re 1 0 im 0;1 0
1047
1048        Starting in Standards Version 9, in addition to decimal notation,
1049 literal integer parameters may be specified as hexadecimal numbers, by
1050 prefixing the number (after an optional '+' or '-' sign) with 0x or 0X,
1051 or as octal numbers, by prefixing the number with 0, as described in
1052 strtol(3). Similarly, floating point literal numbers (both purely real
1053 ones and components of complex literals) may be specified in hexadeci‐
1054 mal by prefixing them with 0x or 0X, and using p or P as the binary ex‐
1055 ponent prefix, as described in the C99 standard. Both uppercase and
1056 lowercase hexadecimal digits may be used. In cases where a literal
1057        floating point number may appear, the tokens INF or INFINITY, optionally
1058 preceded by a '+' or '-' sign, and NAN, optionally immediately followed
1059 by '(', then a sequence of characters, then ')', and all disregarding
1060 case, will be interpreted as the special floating point values ex‐
1061 plained in strtod(3).
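
       The literal forms described in this section can be parsed with a short
       Python sketch.  The function names parse_real, parse_literal, and
       parse_integer are hypothetical, and Python's number parsing only
       approximates strtod(3) and strtol(3); in particular, the NAN(...) form
       is not handled here.

```python
def parse_real(text):
    """Parse a decimal or hexadecimal floating point literal."""
    body = text[1:] if text[:1] in ("+", "-") else text
    if body[:2].lower() == "0x":
        return float.fromhex(text)  # e.g. 0x1.8p3
    return float(text)  # decimal, INF, INFINITY, NAN (any case)


def parse_literal(token):
    """Parse a real literal, or a complex literal written as re;im."""
    if ";" in token:
        re_part, im_part = token.split(";", 1)
        return complex(parse_real(re_part), parse_real(im_part))
    return parse_real(token)


def parse_integer(token):
    """Parse a decimal, octal (0...), or hexadecimal (0x...) integer."""
    sign = -1 if token.startswith("-") else 1
    digits = token[1:] if token[:1] in ("+", "-") else token
    if digits[:2].lower() == "0x":
        return sign * int(digits[2:], 16)
    if len(digits) > 1 and digits.startswith("0"):
        return sign * int(digits, 8)
    return sign * int(digits, 10)
```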
1062
1063
1064 Field Codes
1065 When specifying the input to a field, either as a scalar parameter, or
1066 as an input vector field to a non-RAW vector field, field codes are
1067 used. A field code consists of, in order:
1068
1069        ·   (since Standards Version 10:) optionally, a leading dot (.),
1070            indicating this field code is relative to the fragment's root
1071            namespace.  Without the leading dot, the field code is taken to be
1072            relative to the current namespace.  (See the discussion in the
1073            Namespaces section above for details.)
1074
1075 · (since Standards Version 10:) optionally, a non-null subnamespace
1076 followed by a dot (.) indicating a subspace under the current or
1077 root namespace. The subnamespace may be made up of any number of
1078 namespace tags separated by dots, to nest deeper in the namespace
1079 tree.
1080
1081 · (since Standards Version 6:) if the field in question is a
1082 metafield (see the /META directive above), the field name of the
1083 metafield's parent (which may be an alias) followed by a forward
1084 slash (/).
1085
1086
1087 · a simple field name, possibly an alias, indicating a vector or
1088 scalar field
1089
1090 · (since Standards Version 7:) optionally, a dot (.) followed by a
1091 representation suffix.
1092
1093        A representation suffix may be used to extract a real number from
1094 a complex value. The available suffixes (listed here with their pre‐
1095 ceding dot) and their meanings are:
1096
1097 .a the argument of the input, that is, the angle (in radians) be‐
1098 tween the positive real axis and the input. The argument is in
1099 the range [-pi, pi], and a branch cut exists along the negative
1100 real axis. At the branch cut, -pi is returned if the imaginary
1101 part is -0, and pi is returned if the imaginary part is +0. If
1102 the input is zero, zero is returned.
1103
1104 .i the imaginary part of the input (i.e. the projection of the in‐
1105 put onto the imaginary axis)
1106
1107        .m     the modulus of the input (i.e. its absolute value).
1108
1109 .r the real part of the input (i.e. the projection of the input on‐
1110 to the real axis)
1111
1112 .z (since Standards Version 10:) the identity representation: it
1113 returns the full complex value, equivalent to simply omitting
1114 the suffix completely. It is only needed in certain cases to
1115 force the correct interpretation of a field code in the presence
1116 of a namespace tag. To wit, the field code
1117
1118 name.r
1119
1120 may be interpreted as the real-part (via the .r representation
1121               suffix) of the field called name (if such a field exists).  To
1122 refer to a field called r in the name namespace, the field code
1123 must be written:
1124
1125 name.r.z
1126
1127 NB: The first interpretation only occurs with valid representa‐
1128 tion suffixes; the field code:
1129
1130 name.q
1131
1132 is interpreted as the field q in the name namespace because .q
1133 is not a valid representation suffix. Furthermore, ambiguity
1134               arises only if both fields "name" and "name.r" are defined.  If
1135 the field "name" does not exist, but the field "name.r" does,
1136 then the original field code is not ambiguous. This is the only
1137 representation suffix allowed on SARRAY, SINDIR, and STRING
1138 field codes.
1139
1140 If the specified field is purely real, representations are calculated
1141 as if the imaginary part were equal to +0.
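
       The numerical meaning of the representation suffixes can be sketched
       with Python's cmath module.  The function name represent is
       hypothetical.  A purely real input is promoted to a complex number
       with imaginary part +0, matching the rule above, and cmath.phase()
       follows the same branch cut convention along the negative real axis.

```python
import cmath


def represent(value, suffix=""):
    """Apply a representation suffix to a real or complex sample."""
    z = complex(value)  # a real value gains an imaginary part of +0
    if suffix == ".r":
        return z.real
    if suffix == ".i":
        return z.imag
    if suffix == ".m":
        return abs(z)
    if suffix == ".a":
        return cmath.phase(z)  # in [-pi, pi]; zero input gives zero
    if suffix in ("", ".z"):
        return value  # identity representation
    raise ValueError("not a representation suffix: " + suffix)
```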
1142
1143
1145 This document describes Versions 10 and earlier of the Dirfile Stan‐
1146 dards.
1147
1148 Version 10 of the Standards (January 2017) added the INDIR, SARRAY, and
1149 SINDIR field types, namespaces, the /NAMESPACE directive, the flac en‐
1150 coding scheme, and the .z representation suffix.
1151
1152 Version 9 of the Standards (April 2012) added the MPLEX and WINDOW
1153 field types, the /ALIAS and /HIDDEN directives, the affixes to /IN‐
1154 CLUDE, the sie, zzip, and zzslim encoding schemes, along with the op‐
1155 tional enc_datum token to /ENCODING. It permitted specification of in‐
1156 teger literals in octal and hexadecimal. Finally, it deprecated the
1157 type aliases FLOAT and DOUBLE.
1158
1159 Version 8 of the Standards (November 2010) added the DIVIDE, RECIP, and
1160 CARRAY field types, made the forward slash on reserved words mandatory,
1161 and prohibited using the single-character type aliases in the specifi‐
1162 cation of RAW fields. It also introduced the optional second (arm) to‐
1163 ken to the /ENDIAN directive.
1164
1165 Version 7 of the Standards (October 2009) added the SBIT and POLYNOM
1166 field types, and the directive-less method of specifying metafields.
1167 It also introduced the data types COMPLEX128 and COMPLEX64, along with
1168 the notion of representations, and the lzma encoding scheme. Finally,
1169 it made the number of fields parameter for LINCOM optional.
1170
1171 Version 6 of the Standards (October 2008) added the /ENCODING, /META,
1172 /PROTECT, and /REFERENCE directives, and the CONST and STRING field
1173 types. It permitted whitespace in tokens and introduced the character
1174 escape sequences. It allowed CONST fields to be used as parameters in
1175 field specification lines. It also removed FILEFRAM as an alias for
1176 INDEX, and prohibited . but allowed # and \ in field names.
1177
1178 Version 5 of the Standards (August 2008) added VERSION and ENDIAN,
1179 slash demarcation of reserved words, and removed the restriction on
1180 field name length. It introduced the data types INT8, INT64, and
1181 UINT64, the new-style type specifiers, and increased the range of the
1182 BIT field type from 32 to 64 bits. It also prohibited the characters
1183 &;<>\| in field names.
1184
1185 Version 4 of the Standards (October 2006) added the PHASE field type.
1186
1187 Version 3 of the Standards (January 2006) added INCLUDE and increased
1188 the allowed length of a field name from 16 to 50 characters.
1189
1190 Version 2 of the Standards (September 2005) added the MULTIPLY field
1191 type.
1192
1193 Version 1 of the Standards (November 2004) added FRAMEOFFSET and the
1194 optional fourth argument to the BIT field type.
1195
1196 Version 0 of the Standards (before March 2003) refers to the dirfile
1197 standards supported by the getdata(3) library originally introduced in‐
1198 to the kst(1) sources, which contained support for all other features
1199 covered by this document.
1200
1201
1203 The dirfile specification was developed by C. B. Netterfield
1204 <netterfield@astro.utoronto.ca>.
1205
1206 Since Standards Version 3, the dirfile specification has been main‐
1207 tained by D. V. Wiebe <getdata@ketiltrout.net>.
1208
1209
1211 dirfile(5), dirfile-encoding(5)
1212
1213
1214
1215Standards Version 10 19 January 2017 dirfile-format(5)