gd_open(3)                          GETDATA                          gd_open(3)
10
11
12
NAME
14 gd_open, gd_cbopen — open or create a Dirfile
15
16
SYNOPSIS
#include <getdata.h>

DIRFILE* gd_open(const char *dirfilename, unsigned long flags);

DIRFILE* gd_cbopen(const char *dirfilename, unsigned long flags,
       gd_parser_callback_t sehandler, void *extra);
24
25
DESCRIPTION
27 The gd_cbopen() function opens or creates the dirfile specified by
28 dirfilename, returning a DIRFILE object associated with it. Opening a
29 dirfile will cause the library to read and parse the dirfile's format
30 specification (see dirfile-format(5)).
31
32 If not NULL, sehandler should be a pointer to a function which will be
called whenever a syntax error is encountered while parsing the format
34 specification. Specify NULL for this parameter if no callback function
35 is to be used. The caller may use this function to correct the error
36 or modify the error handling of the format specification parser. See
37 The Callback Function section below for details on this function. The
38 extra argument allows the caller to pass data to the callback function.
39 The pointer will be passed to the callback function verbatim.
40
41 The gd_open() function is equivalent to gd_cbopen(), with sehandler and
42 extra set to NULL.
43
44 The flags argument should include one of the access modes: GD_RDONLY
45 (read-only) or GD_RDWR (read-write), and may also contain zero or more
46 of the following flags, bitwise-or'd together:
47
48 GD_ARM_ENDIAN
49 GD_NOT_ARM_ENDIAN
50 Specifies that double precision floating point raw data on disk
51 are, or are not, stored in the middle-endian format used by
52 older ARM processors.
53
These flags only set the default endianness, and will be overridden
when an /ENDIAN directive specifies the byte sex of RAW
fields, unless GD_FORCE_ENDIAN is also specified.
57
58 On every platform, one of these flags (GD_NOT_ARM_ENDIAN on all
59 but middle-ended ARM systems) indicates the native behaviour of
60 the platform. That symbol will equal zero, and may be omitted.
61
62 GD_BIG_ENDIAN
63 GD_LITTLE_ENDIAN
64 Specifies the default byte sex of raw data stored on disk to be
65 either big-endian (most significant byte first) or little-endi‐
66 an (least significant byte first). Omitting both flags indi‐
67 cates the default should be the native endianness of the plat‐
68 form.
69
70 Unlike the ARM endianness flags above, neither of these symbols
71 is ever zero. Specifying both these flags together will cause
72 the library to assume that the endianness of the data is oppo‐
73 site to that of the native architecture, whatever that might
74 be.
75
These flags only set the default endianness, and will be overridden
when an /ENDIAN directive specifies the byte sex of RAW
fields, unless GD_FORCE_ENDIAN is also specified.
79
80 GD_CREAT
81 An empty dirfile will be created, if one does not already ex‐
82 ist. This will create both the dirfile directory and an empty
83 format specification file called format. If the call creates a
84 dirfile, then the specified access mode is ignored: a newly-
85 created DIRFILE is always opened with access mode GD_RDWR, even
86 if GD_RDONLY had been specified.
87
The directory will have mode S_IRWXU | S_IRWXG | S_IRWXO
89 (0777), modified by the caller's umask value (see umask(2)).
90 The format file will have mode S_IRUSR | S_IWUSR | S_IRGRP |
91 S_IWGRP | S_IROTH | S_IWOTH (0666), also modified by the call‐
92 er's umask. The owner of the dirfile directory and format file
93 will be the effective user ID of the caller. Group ownership
94 follows the rules outlined in mkdir(2).
95
96 GD_EXCL Ensure that this call creates a dirfile: when specified along
97 with GD_CREAT, the call will fail if the dirfile specified by
98 dirfilename already exists. If GD_CREAT is not specified, this
99 flag is ignored. This flag suffers from all the limitations of
100 the O_EXCL flag as indicated in open(2).
101
102 GD_FORCE_ENCODING
103 Specifies that /ENCODING directives (see dirfile-format(5))
104 found in the dirfile format specification should be ignored.
105 The encoding scheme specified in flags will be used instead
106 (see below).
107
108 GD_FORCE_ENDIAN
109 Specifies that /ENDIAN directives (see dirfile-format(5)) found
110 in the dirfile format specification should be ignored. All raw
111 data will be assumed to have the byte sex indicated through the
112 presence or absence of the GD_ARM_ENDIAN, GD_BIG_ENDIAN,
113 GD_LITTLE_ENDIAN, and GD_NOT_ARM_ENDIAN flags.
114
115 GD_IGNORE_DUPS
116 If the dirfile format metadata specifies more than one field
117 with the same name, all but one of them will be ignored by the
118 parser. Without this flag, parsing would fail with the
119 GD_E_FORMAT error, possibly resulting in invocation of the reg‐
120 istered callback function. Which of the duplicate fields is
121 kept is not specified. As a result, this flag is typically on‐
122 ly useful in the case where identical copies of a field speci‐
123 fication line are present.
124
No indication is given of whether a duplicate field has been
discarded. If finer-grained control is required, the
caller should handle GD_E_FORMAT_DUPLICATE suberrors itself
with an appropriate callback function.
129
130 GD_PEDANTIC
131 Reject dirfiles which don't conform to the Dirfile Standards.
132 See the Standards Compliance section below for full details.
133
134 GD_PERMISSIVE
135 Allow non-compliant format specification syntax, even when giv‐
136 en along with a conflicting /VERSION directive. See the Stan‐
137 dards Compliance section below for full details.
138
139 GD_PRETTY_PRINT
140 When dirfile metadata are flushed to disk (either explicitly
141 via gd_metaflush(3), gd_rewrite_fragment(3), or gd_flush(3) or
142 implicitly by closing the dirfile), an attempt will be made to
143 create a nicer looking format specification (from a human-read‐
144 able standpoint). What this explicitly means is not part of
145 the API, and any particular behaviour should not be relied on.
146 If the dirfile is opened read-only, this flag is ignored.
147
148 GD_TRUNC
149 If dirfilename specifies an already existing dirfile, it will
150 be truncated before opening. Since gd_cbopen() decides whether
151 dirfilename specifies an existing dirfile before attempting to
152 parse the dirfile, dirfilename is considered to specify an ex‐
153 isting dirfile if it refers to a directory containing a regular
154 file called format, regardless of the content or form of that
155 file.
156
157 Truncation occurs by deleting every regular file and symlink in
158 the specified directory, whether the files were referred to by
159 the dirfile before truncation or not. Accordingly, this flag
160 should be used with caution. Unless GD_TRUNCSUB is also speci‐
161 fied, subdirectories are left untouched. Notably, this opera‐
162 tion does not consider directories used in /INCLUDE directives.
163 If the dirfile does not exist, this flag is ignored.
164
165 GD_TRUNCSUB
166 If specified along with GD_TRUNC, truncation will descend into
167 subdirectories, deleting all regular files and symlinks recur‐
168 sively. It does not descend into directories pointed to by
169 symbolic links: in these cases, just the symlink itself is
170 deleted. If specified without an accompanying GD_TRUNC, this
171 flag is ignored.
172
173 GD_VERBOSE
174 Specifies that whenever an error is triggered by the library
175 when working on this dirfile, the corresponding error string,
176 which can be retrieved by calling gd_error_string(3), should be
177 written on the caller's standard error stream (stderr(3)) by
178 GetData. The error string may be prefixed by a string speci‐
179 fied by the caller; see gd_verbose_prefix(3). Without this
180 flag, GetData writes nothing to standard error. (GetData never
181 writes to standard output.)
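
The byte orders named by the endianness flags above can be made concrete with a few lines of plain C. The following is only an illustrative sketch, not code from GetData: the function names are hypothetical, and the middle-endian layout is modelled on the assumption that it is the little-endian layout of an 8-byte double with its two 32-bit words exchanged.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value most significant byte first (big-endian). */
void store_be32(uint32_t v, unsigned char out[4])
{
  assert(out != NULL);
  out[0] = (unsigned char)(v >> 24);
  out[1] = (unsigned char)(v >> 16);
  out[2] = (unsigned char)(v >> 8);
  out[3] = (unsigned char)v;
}

/* Store a 32-bit value least significant byte first (little-endian). */
void store_le32(uint32_t v, unsigned char out[4])
{
  assert(out != NULL);
  out[0] = (unsigned char)v;
  out[1] = (unsigned char)(v >> 8);
  out[2] = (unsigned char)(v >> 16);
  out[3] = (unsigned char)(v >> 24);
}

/* Return 1 if this machine stores integers little-endian, 0 otherwise. */
int native_is_little_endian(void)
{
  const uint32_t probe = 1;
  unsigned char first;

  memcpy(&first, &probe, 1);
  return first == 1;
}

/* Exchange the two 32-bit words of an 8-byte value, leaving the bytes
 * inside each word alone.  Under the assumption stated above, this maps
 * a little-endian double to the middle-endian layout and back again. */
void swap_words64(unsigned char b[8])
{
  unsigned char tmp[4];

  assert(b != NULL);
  memcpy(tmp, b, 4);
  memcpy(b, b + 4, 4);
  memcpy(b + 4, tmp, 4);
}
```

For instance, the double 1.0 is the byte sequence 00 00 00 00 00 00 F0 3F in little-endian order; after swap_words64() it reads 00 00 F0 3F 00 00 00 00.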
182
183 Those flags which affect the operation of the library beyond this call
184 itself may be modified later using the gd_flags(3) function.
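
The permission rule given for GD_CREAT above (a requested mode "modified by the caller's umask") amounts to clearing the bits set in the mask. The sketch below shows that arithmetic only; the macro and function names are hypothetical, and it does not call GetData or change the process umask.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Mode requested for a newly created dirfile directory (0777). */
#define DIR_REQUEST_MODE ((mode_t)(S_IRWXU | S_IRWXG | S_IRWXO))

/* Mode requested for the newly created format file (0666). */
#define FILE_REQUEST_MODE \
  ((mode_t)(S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH))

/* Permission bits left after applying a umask: every bit set in the
 * mask is cleared from the requested mode. */
mode_t effective_mode(mode_t requested, mode_t mask)
{
  assert((requested & ~(mode_t)0777) == 0);
  return (mode_t)(requested & ~mask);
}
```

With the common umask 022, effective_mode(DIR_REQUEST_MODE, 022) is 0755 and effective_mode(FILE_REQUEST_MODE, 022) is 0644.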
185
186 The flags argument may also be bitwise or'd with one of the following
187 symbols indicating the default encoding scheme of the dirfile. Like
the endianness flags, the choice of encoding here is ignored if the
encoding is specified in the dirfile itself, unless GD_FORCE_ENCODING is
also specified. If none of these symbols is present, GD_AUTO_ENCODED
191 is assumed, unless the gd_cbopen() call results in creation or trunca‐
192 tion of the dirfile. In that case, GD_UNENCODED is assumed. See
193 dirfile-encoding(5) for details on dirfile encoding schemes.
194
195 GD_AUTO_ENCODED
Specifies that the encoding type is not known in advance, but
should be detected by the GetData library. Detection is accomplished
by searching for raw data files with extensions appropriate to the
encoding scheme. This method will notably fail if the library is
called via gd_putdata(3) to create a previously non-existent raw
field unless a read is first successfully performed on the dirfile.
Once the library has determined the encoding scheme for the first
time, it remembers it for subsequent calls.
205
206 GD_BZIP2_ENCODED
207 Specifies that raw data files are compressed using the Burrows-
208 Wheeler block sorting text compression algorithm and Huffman
209 coding, as implemented in the bzip2 format.
210
211 GD_FLAC_ENCODED
Specifies that raw data files are compressed using the Free
Lossless Audio Codec (FLAC).
214
215 GD_GZIP_ENCODED
216 Specifies that raw data files are compressed using Lempel-Ziv
217 coding (LZ77) as implemented in the gzip format.
218
219 GD_LZMA_ENCODED
220 Specifies that raw data files are compressed using the Lempel-
221 Ziv Markov Chain Algorithm (LZMA) as implemented in the xz con‐
222 tainer format.
223
224 GD_SLIM_ENCODED
225 Specifies that raw data files are compressed using the slimlib
226 library.
227
228 GD_SIE_ENCODED
Specifies that raw data files are sample-index encoded, similar
230 to run-length encoding, suitable for data that change rarely.
231
232 GD_TEXT_ENCODED
233 Specifies that raw data files are encoded as text files con‐
234 taining one data sample per line.
235
236 GD_UNENCODED
Specifies that raw data files are not encoded, but written to
disk as plain binary data.
239
240 GD_ZZIP_ENCODED
241 Specifies that raw data files are compressed using the DEFLATE
242 algorithm. All raw data files for a given fragment are col‐
243 lected together and stored in a PKZIP archive called raw.zip.
244
245 GD_ZZSLIM_ENCODED
Specifies that raw data files are compressed using a combination
of compression schemes: first files are slim-compressed,
248 as with the GD_SLIM_ENCODED scheme, and then they are collected
249 together and compressed (again) into a PKZIP archive called
250 raw.zip, as in the GD_ZZIP_ENCODED scheme.
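
Of the schemes listed above, GD_TEXT_ENCODED is the simplest to picture: one data sample per line. The following sketch reads such text with the standard strtod(3). It is an illustration of that one-sample-per-line idea only; the function name is hypothetical, and the exact text format GetData reads and writes is defined in dirfile-encoding(5), not here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Parse up to max samples from text holding one number per line.
 * Returns the number of samples stored in out.  Parsing stops at the
 * end of the string or at the first text strtod() cannot convert. */
size_t parse_text_samples(const char *text, double *out, size_t max)
{
  size_t n = 0;

  assert(text != NULL && out != NULL);
  while (n < max && *text != '\0') {
    char *end;
    double v = strtod(text, &end);

    if (end == text)
      break;                    /* nothing numeric here */
    out[n++] = v;

    /* Skip whatever remains of this line, including its newline. */
    while (*end != '\0' && *end != '\n')
      end++;
    if (*end == '\n')
      end++;
    text = end;
  }
  return n;
}
```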
251
252
253 Standards Compliance
254 The latest Dirfile Standards Version which this release of GetData un‐
255 derstands is provided in the preprocessor macro GD_DIRFILE_STAN‐
256 DARDS_VERSION defined in getdata.h. GetData is able to open and parse
257 any dirfile which conforms to this Standards Version, or to any earlier
258 Version. The dirfile-format(5) manual page lists the changes between
259 Standards Versions.
260
261 The GetData parser can operate in two modes: a permissive mode, in
262 which much non-Standards-compliant syntax is allowed, and a pedantic
263 mode, in which the parser adheres strictly to the Standards. The mode
may change during the parsing of a dirfile. If GD_PEDANTIC is passed
to gd_cbopen(), the parser will start parsing the format specification
in pedantic mode; otherwise it will start in permissive mode.
267
268 Permissive mode is provided primarily to allow GetData to be used on
269 dirfiles which conform to no single Standard, but which were accepted
by the GetData parser in previous versions. It is notably lax regarding
reserved field names, field name characters, and the mixing of old
and new data type specifiers, and generally ignores the presence of
273 /VERSION directives. In read-write mode, permissive mode should be
274 used with caution, as it can cause unintentional corruption of dirfile
275 metadata on write, if the heuristics in the parser incorrectly guessed
276 the intention of non-compliant syntax. In permissive mode, actual syn‐
277 tax errors are still reported as such.
278
279 In pedantic mode, the parser conforms to one specific Standards Ver‐
280 sion. This target version may change any number of times in the course
281 of scanning a single format specification. If invoked using the
282 GD_PEDANTIC flag, the parser will start in pedantic mode with a target
283 version equal to GD_DIRFILE_STANDARDS_VERSION. Whenever a /VERSION di‐
284 rective is encountered in the format specification, the target version
285 is changed to the Standards Version specified. When encountering a
286 /VERSION directive in permissive mode, the parser will switch to pedan‐
287 tic mode, unless the GD_PERMISSIVE flag was passed to gd_cbopen(), in
288 which case no mode switch will take place.
289
Independent of the mode of the parser when parsing the format specification,
GetData will calculate a list of Standards Versions to which
the parsed metadata conform. The gd_dirfile_standards(3) function
293 can provide this information, and also specify the desired Standards
294 Version for writing format metadata back to disk.
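
The mode switching described in this section can be summarised as a small state machine. The sketch below is a model of the prose only, not GetData's parser: all names are hypothetical, and the text does not say what happens when both GD_PEDANTIC and GD_PERMISSIVE are given, so that combination is simply not special-cased here.

```c
#include <assert.h>
#include <stddef.h>

enum parser_mode { MODE_PERMISSIVE, MODE_PEDANTIC };

struct parser_state {
  enum parser_mode mode;
  int target_version;    /* only meaningful in pedantic mode */
  int permissive_flag;   /* nonzero if GD_PERMISSIVE was passed */
};

/* State before the first line is parsed.  latest_version plays the role
 * of GD_DIRFILE_STANDARDS_VERSION. */
struct parser_state parser_start(int pedantic_flag, int permissive_flag,
                                 int latest_version)
{
  struct parser_state s;

  s.mode = pedantic_flag ? MODE_PEDANTIC : MODE_PERMISSIVE;
  s.target_version = latest_version;
  s.permissive_flag = permissive_flag;
  return s;
}

/* Effect of a /VERSION directive naming the given version: the target
 * changes, and a permissive parser turns pedantic unless GD_PERMISSIVE
 * was passed at open time. */
void parser_see_version(struct parser_state *s, int version)
{
  assert(s != NULL);
  s->target_version = version;
  if (s->mode == MODE_PERMISSIVE && !s->permissive_flag)
    s->mode = MODE_PEDANTIC;
}
```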
295
296
297 The Callback Function
298 The caller-supplied sehandler function is called whenever the format
299 specification parser encounters a syntax error (i.e. whenever it would
300 return the GD_E_FORMAT error). This callback may be used to correct
301 the error, or to tell the parser how to recover from it.
302
303 This function should take two pointers as arguments, and return an int:
304
    int sehandler(gd_parser_data_t *pdata, void *extra);
306
307 The extra parameter is the pointer supplied to gd_cbopen(), passed ver‐
308 batim to this function. It can be used to pass caller data to the
309 callback. GetData does not inspect this pointer, not even to check its
310 validity. If the caller needs to pass no data to the callback, it may
311 be NULL.
312
313 The gd_parser_data_t type is a structure with at least the following
314 members:
315
    typedef struct {
      const DIRFILE* dirfile;
      int suberror;
      int linenum;
      const char* filename;
      char* line;
      size_t buflen;

      ...
    } gd_parser_data_t;
326
327 The pdata->dirfile member will be a pointer to a DIRFILE object suit‐
328 able only for passing to gd_error_string(). Notably, the caller should
329 not assume this pointer will be the same as the pointer eventually re‐
330 turned by gd_cbopen(), nor that it will be valid after the callback
331 function returns.
332
333 The pdata->suberror parameter will be one of the following symbols in‐
334 dicating the type of syntax error encountered:
335
336 GD_E_FORMAT_ALIAS
337 The parent specified for a meta field was an alias.
338
339 GD_E_FORMAT_BAD_LINE
340 The line was indecipherable. Typically this means that the
341 line contained neither a reserved word, nor a field type.
342
343 GD_E_FORMAT_BAD_NAME
344 The specified field name was invalid.
345
346 GD_E_FORMAT_BAD_SPF
347 The samples-per-frame of a RAW field was out-of-range.
348
349 GD_E_FORMAT_BAD_TYPE
350 The data type of a RAW field was unrecognised.
351
352 GD_E_FORMAT_BITNUM
353 The first bit of a BIT field was out-of-range.
354
355 GD_E_FORMAT_BITSIZE
356 The last bit of a BIT field was out-of-range.
357
358 GD_E_FORMAT_CHARACTER
359 An invalid character was found in the line, or a character es‐
360 cape sequence was malformed.
361
362 GD_E_FORMAT_DUPLICATE
363 The specified field name already exists.
364
365 GD_E_FORMAT_ENDIAN
366 The byte sex specified by an /ENDIAN directive was unrecog‐
367 nised.
368
369 GD_E_FORMAT_LITERAL
370 An unexpected character was encountered in a complex literal.
371
372 GD_E_FORMAT_LOCATION
373 The parent of a metafield was defined in another fragment.
374
375 GD_E_FORMAT_META_META
376 An attempt was made to use a metafield as the parent to a new
377 metafield.
378
379 GD_E_FORMAT_METARAW
380 An attempt was made to add a RAW metafield.
381
382 GD_E_FORMAT_MPLEXVAL
383 A MPLEX specification has a negative period.
384
385 GD_E_FORMAT_N_FIELDS
386 The number of fields of a LINCOM field was out-of-range.
387
388 GD_E_FORMAT_N_TOK
389 An insufficient number of tokens was found on the line.
390
391 GD_E_FORMAT_NO_FIELD
392 The parent of a metafield was not found.
393
394 GD_E_FORMAT_NUMBITS
395 The number of bits of a BIT field was out-of-range.
396
397 GD_E_FORMAT_PROTECT
398 The protection level specified by a /PROTECT directive was un‐
399 recognised.
400
401 GD_E_FORMAT_RES_NAME
402 A field was specified with the reserved name INDEX (or with the
403 reserved name FILEFRAM in a dirfile conforming to Standards
404 Version 5 or earlier).
405
406 GD_E_FORMAT_UNTERM
407 The last token of the line was unterminated.
408
409 GD_E_FORMAT_WINDOP
410 The operation in a WINDOW field was not recognised.
411
The pdata->filename and pdata->linenum members contain the pathname of the
413 fragment and line number where the syntax error was encountered. The
414 first line in a fragment is line one.
415
416 The pdata->line member contains a copy of the line containing the syn‐
417 tax error. This line may be freely modified by the callback function.
418 It will then be reparsed if the callback function returns the symbol
419 GD_SYNTAX_RESCAN (see below). The size of the memory buffer, which may
be greater than the length of the actual string, is provided in
pdata->buflen, and space is available for at least GD_MAX_LINE_LENGTH bytes.
422
423 If the callback function returns GD_SYNTAX_RESCAN, then a different
424 buffer, which may be larger, may be used to hold the new string, by as‐
425 signing a pointer to the new buffer to pdata->line. This buffer will
426 be deallocated by the library using the free function specified through
427 gd_alloc_funcs(3), or else free(3) by default. Do not deallocate the
428 original buffer passed to the callback through pdata->line: it, too,
429 will be deallocated by the library.
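
The two ways of supplying a corrected line (editing the buffer in place, or pointing pdata->line at a new buffer) might be wrapped in one helper, sketched below. The helper name is hypothetical. It assumes the default allocator is in use, that is, that no custom functions were registered through gd_alloc_funcs(3), so that a malloc(3) buffer is appropriate. The text does not say whether pdata->buflen should be updated after replacing the buffer, so the sketch leaves it alone.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Put text where the parser will rescan it.  A callback would pass
 * &pdata->line and pdata->buflen.  Returns 0 on success, or -1 if a
 * larger buffer was needed and could not be allocated. */
int set_rescan_line(char **line, size_t buflen, const char *text)
{
  size_t need;
  char *bigger;

  assert(line != NULL && *line != NULL && text != NULL);
  need = strlen(text) + 1;      /* include the terminating NUL */

  if (need <= buflen) {
    memcpy(*line, text, need);  /* fits: edit the supplied buffer */
    return 0;
  }

  bigger = malloc(need);
  if (bigger == NULL)
    return -1;
  memcpy(bigger, text, need);

  /* The original buffer is deliberately not freed: as described above,
   * the library deallocates it, along with the new one. */
  *line = bigger;
  return 0;
}
```

A callback using this helper would return GD_SYNTAX_RESCAN on success and could fall back to GD_SYNTAX_ABORT if the helper returns -1.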
430
431 The callback function should return one of the following symbols, which
432 tells the parser how to subsequently handle the error:
433
434 GD_SYNTAX_ABORT
435 The parser should immediately abort parsing the format specifi‐
436 cation and fail with the error GD_E_FORMAT. This is the de‐
437 fault behaviour, if no callback function is provided (or if the
438 parser is invoked by calling gd_open()).
439
440 GD_SYNTAX_CONTINUE
441 The parser should continue parsing the format specification.
442 However, once parsing has finished, the parser will fail with
443 the error GD_E_FORMAT, even if no further syntax errors are en‐
444 countered. This behaviour may be used by the caller to identi‐
445 fy all lines containing syntax errors in the format specifica‐
446 tion, instead of just the first one.
447
448 GD_SYNTAX_IGNORE
449 The parser should ignore the line containing the syntax error
450 completely, and carry on parsing the format specification. If
451 no further errors are encountered, the dirfile will be success‐
452 fully opened.
453
454 GD_SYNTAX_RESCAN
The parser should rescan pdata->line, which replaces the
line which originally contained the syntax error. The line is
457 assumed to have been corrected by the callback function. If
458 the line still contains a syntax error, the callback function
459 will be called again.
460
461 Note: the line is not corrected on disk; however, the caller
462 may subsequently correct the fragment on disk by calling
463 gd_rewrite_fragment(3).
464
465 The callback function handles only syntax errors. The parser may still
466 abort early, if a different kind of library error is encountered. Fur‐
467 thermore, although a line may contain more than one syntax error, the
468 parser will only ever report one syntax error per line, even if the
469 callback function returns GD_SYNTAX_CONTINUE.
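
As an illustration of this section, a callback that reports every syntax error and lets parsing continue might look like the sketch below. The function name is hypothetical. So that the sketch also compiles on a machine without GetData installed, it falls back to local stand-ins for the documented structure and return symbols; the stand-in numeric values are placeholders, not the library's, and real code would simply include getdata.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#if defined(__has_include)
#  if __has_include(<getdata.h>)
#    include <getdata.h>
#    define HAVE_GETDATA_H 1
#  endif
#endif

#ifndef HAVE_GETDATA_H
/* Stand-ins mirroring the members documented above.  The values of the
 * GD_SYNTAX_* symbols here are placeholders. */
typedef struct standin_dirfile DIRFILE;
typedef struct {
  const DIRFILE *dirfile;
  int suberror;
  int linenum;
  const char *filename;
  char *line;
  size_t buflen;
} gd_parser_data_t;
#define GD_SYNTAX_ABORT    0
#define GD_SYNTAX_RESCAN   1
#define GD_SYNTAX_IGNORE   2
#define GD_SYNTAX_CONTINUE 3
#endif

/* Syntax error callback: print where the error occurred, count it, and
 * ask the parser to keep going so that every bad line is visited.  If
 * extra is not NULL it must point to an int used as the counter. */
int report_and_continue(gd_parser_data_t *pdata, void *extra)
{
  int *count = extra;

  assert(pdata != NULL);
  fprintf(stderr, "%s:%d: syntax error (suberror %d)\n",
          pdata->filename ? pdata->filename : "(unknown)",
          pdata->linenum, pdata->suberror);
  if (count != NULL)
    ++*count;
  return GD_SYNTAX_CONTINUE;
}
```

Such a handler would be registered with a call of the form gd_cbopen(dirfilename, GD_RDONLY, report_and_continue, &count). Because it returns GD_SYNTAX_CONTINUE, the open still fails with GD_E_FORMAT once parsing has finished; returning GD_SYNTAX_IGNORE instead would let the open succeed with the offending lines skipped.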
470
471
RETURN VALUE
473 A call to gd_cbopen() or gd_open() always returns a pointer to a newly
474 allocated DIRFILE object, except in instances when it is unable to al‐
475 locate memory for the DIRFILE object itself, in which case it will re‐
476 turn NULL. The DIRFILE object is an opaque structure containing the
477 parsed dirfile metadata.
478
479 If an error occurred, these functions will store a negative-valued er‐
480 ror code in the returned DIRFILE, which may be retrieved by a subse‐
481 quent call to gd_error(3). Possible error codes are:
482
483 GD_E_ACCMODE
484 The library was asked to truncate a dirfile opened read-only
485 (i.e. GD_TRUNC was specified in flags along with GD_RDONLY).
486
487 GD_E_ALLOC
488 The library was unable to allocate memory.
489
490 GD_E_BAD_REFERENCE
491 The reference field specified by a /REFERENCE directive in the
492 format specification (see dirfile-format(5)) was not found, or
493 was not a RAW field.
494
495 GD_E_CALLBACK
496 The registered callback function, sehandler, returned an un‐
497 recognised response.
498
499 GD_E_CREAT
500 The library was unable to create the dirfile.
501
502 GD_E_EXISTS
503 The dirfile already exists and both GD_CREAT and GD_EXCL were
504 specified.
505
506 GD_E_FORMAT
507 A syntax error occurred in the format specification. See also
508 The Callback Function section above.
509
510 GD_E_IO The dirfile format file, or another file that it includes,
511 could not be read, or dirfilename does not specify a valid
512 dirfile.
513
514 GD_E_LINE_TOO_LONG
515 The parser encountered a line in the format specification
516 longer than it was able to deal with. Lines are limited by the
517 storage size of ssize_t. On 32-bit systems, this limits format
518 specification lines to 2**31 bytes. The limit is larger on
519 64-bit systems.
520
521 A DIRFILE which is returned from a failed open is flagged as invalid,
meaning most functions it is passed to will fail with the error
523 GD_E_BAD_DIRFILE. A descriptive error string for the error may be ob‐
524 tained by calling gd_error_string(3).
525
526 When no longer needed, the caller should de-allocate any returned
527 DIRFILE object by calling gd_close(3), or gd_discard(3), even if the
528 open failed.
529
530
NOTES
532 When working with dirfiles conforming to Standards Versions 4 and ear‐
533 lier (before the introduction of the /ENDIAN directive), GetData as‐
534 sumes the dirfile has native byte sex, even though, officially, these
535 early Standards stipulated data to be little-endian. This is necessary
536 since, in the absence of an explicit /VERSION directive, it is often
537 impossible to determine the intended Standards Version of a dirfile,
538 and the current behaviour is to assume native byte sex for modern
539 dirfiles lacking /ENDIAN. To read an old, little-ended dirfile on a
540 big-ended platform, an /ENDIAN directive should be added to the format
541 specification, or else GD_LITTLE_ENDIAN should be specified by the
542 caller.
543
544 GetData's parser assumes it is running on an ASCII-compatible platform.
545 Format specification parsing will fail gloriously on an EBCDIC plat‐
546 form.
547
548
HISTORY
550 The dirfile_open() function appeared in GetData-0.3.0. The only sup‐
551 ported flags were GD_BIG_ENDIAN, GD_CREAT, GD_EXCL, GD_FORCE_ENDIAN,
552 GD_LITTLE_ENDIAN, GD_PEDANTIC, GD_RDONLY, GD_RDWR, and GD_TRUNC.
553
The GD_AUTO_ENCODED, GD_FORCE_ENCODING, GD_SLIM_ENCODED, GD_TEXT_ENCODED,
GD_UNENCODED, and GD_VERBOSE flags appeared in GetData-0.4.0.
556
557 The dirfile_cbopen() function and the GD_BZIP2_ENCODED, GD_GZIP_ENCOD‐
558 ED, and GD_IGNORE_DUPS flags appeared in GetData-0.5.0.
559
560 The GD_PRETTY_PRINT and GD_LZMA_ENCODED flags appeared in GetDa‐
561 ta-0.6.0.
562
563 In GetData-0.7.0 these functions were renamed to gd_open() and
564 gd_cbopen(). The GD_ARM_ENDIAN, GD_NOT_ARM_ENDIAN, and GD_PERMISSIVE
565 flags also appeared in this release.
566
567 The GD_SIE_ENCODED, GD_TRUNCSUB, GD_ZZIP_ENCODED, and GD_ZZSLIM_ENCODED
568 flags appeared in GetData-0.8.0.
569
570 The GD_FLAC_ENCODED flag appeared in GetData-0.9.0.
571
572
SEE ALSO
574 gd_alloc_funcs(3), gd_close(3), gd_dirfile_standards(3), gd_discard(3),
575 gd_error(3), gd_error_string(3), gd_flags(3), gd_getdata(3),
576 gd_include(3), gd_parser_callback(3), gd_verbose_prefix(3), dirfile(5),
577 dirfile-encoding(5), dirfile-format(5)
578
579
580
Version 0.10.0                  25 December 2016                  gd_open(3)