dirfile_cbopen(3)                   GETDATA                  dirfile_cbopen(3)

NAME
       dirfile_cbopen, dirfile_open - open or create a dirfile

SYNOPSIS
       #include <getdata.h>

       DIRFILE* dirfile_cbopen(const char *dirfilename, unsigned long flags,
              gd_parser_callback_t sehandler, void *extra);

       DIRFILE* dirfile_open(const char *dirfilename, unsigned long flags);

DESCRIPTION
       The dirfile_cbopen() function opens or creates the dirfile specified
       by dirfilename, returning a DIRFILE object associated with it.
       Opening a dirfile causes the library to read and parse the dirfile's
       format file (see dirfile-format(5)).

       If not NULL, sehandler should be a pointer to a function which will
       be called whenever a syntax error is encountered while parsing the
       format file. Specify NULL for this parameter if no callback function
       is to be used. The caller may use this function to correct the error
       or to modify the error handling of the format file parser. See The
       Callback Function section below for details on this function. The
       extra argument allows the caller to pass data to the callback
       function. The pointer will be passed to the callback function
       verbatim.

       The dirfile_open() function is equivalent to dirfile_cbopen() with
       sehandler and extra set to NULL.

       The flags argument should include one of the access modes: GD_RDONLY
       (read-only) or GD_RDWR (read-write), and may also contain zero or
       more of the following flags, bitwise-or'd together:

       GD_BIG_ENDIAN
              Specifies that raw data on disk is stored as big-endian data
              (most significant byte first). Specifying this flag along
              with the contradictory GD_LITTLE_ENDIAN will cause the library
              to assume that the endianness of the data is opposite to that
              of the native architecture.

              This flag is ignored completely if an ENDIAN directive occurs
              in the dirfile format file, unless GD_FORCE_ENDIAN is also
              specified.

       GD_CREAT
              An empty dirfile will be created if one does not already
              exist. This will create both the dirfile directory and an
              empty format file. The directory will have mode S_IRWXU |
              S_IRWXG | S_IRWXO (0777), modified by the caller's umask value
              (see umask(2)). The format file will have mode S_IRUSR |
              S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH (0666), also
              modified by the caller's umask.

              The owner of the dirfile directory and format file will be the
              effective user ID of the caller. Group ownership follows the
              rules outlined in mkdir(2).

       GD_EXCL
              Ensure that this call creates a dirfile: when specified along
              with GD_CREAT, the call will fail if the dirfile specified by
              dirfilename already exists. The behaviour of this flag is
              undefined if GD_CREAT is not specified. This flag suffers
              from all the limitations of the O_EXCL flag as indicated in
              open(2).

       GD_FORCE_ENCODING
              Specifies that ENCODING directives (see dirfile-format(5))
              found in the dirfile format file should be ignored. The
              encoding scheme specified in flags will be used instead (see
              below).

       GD_FORCE_ENDIAN
              Specifies that ENDIAN directives (see dirfile-format(5)) found
              in the dirfile format file should be ignored. When specified
              with one of GD_BIG_ENDIAN or GD_LITTLE_ENDIAN, the endianness
              specified will be assumed. If this flag is specified with
              neither of those flags, the dirfile will be assumed to have
              the endianness of the native architecture.

       GD_IGNORE_DUPS
              If the dirfile format metadata specifies more than one field
              with the same name, all but one of them will be ignored by the
              parser. Without this flag, parsing would fail with the
              GD_E_FORMAT error, possibly resulting in invocation of the
              registered callback function. Which of the duplicate fields
              is kept is not specified. As a result, this flag is typically
              only useful in the case where identical copies of a field
              specification line are present.

              No indication is given of whether a duplicate field has been
              discarded. If finer-grained control is required, the caller
              should handle GD_E_FORMAT_DUPLICATE suberrors itself with an
              appropriate callback function.

       GD_LITTLE_ENDIAN
              Specifies that raw data on disk is stored as little-endian
              data (least significant byte first). Specifying this flag
              along with the contradictory GD_BIG_ENDIAN will cause the
              library to assume that the endianness of the data is opposite
              to that of the native architecture.

              This flag is ignored completely if an ENDIAN directive occurs
              in the dirfile format file, unless GD_FORCE_ENDIAN is also
              specified.

       GD_PEDANTIC
              Specifies that unrecognised lines found during the parsing of
              the format file should always cause a fatal error. Without
              this flag, if a VERSION directive (see dirfile-format(5))
              indicates that the dirfile being opened conforms to a
              Standards Version newer than the version understood by the
              library, unrecognised lines will be silently ignored.

       GD_PRETTY_PRINT
              When dirfile metadata is flushed to disk (either explicitly
              via dirfile_metaflush() or dirfile_flush(), or implicitly by
              closing the dirfile), an attempt will be made to create a
              nicer looking format file (from a human-readable standpoint).
              What this explicitly means is not part of the API, and any
              particular behaviour should not be relied on. If the dirfile
              is opened read-only, this flag is ignored.

       GD_TRUNC
              If dirfilename specifies an already existing dirfile, it will
              be truncated before opening. Since dirfile_cbopen() decides
              whether dirfilename specifies an existing dirfile before
              attempting to parse the dirfile, dirfilename is considered to
              specify an existing dirfile if it refers to a directory
              containing a regular file called format, regardless of the
              content or form of that file.

              Truncation occurs by deleting every regular file in the
              specified directory, whether the files were referred to by the
              dirfile before truncation or not. Accordingly, this flag
              should be used with caution. Subdirectories are left
              untouched. Notably, this operation does not consider the
              presence of subdirfiles declared by INCLUDE directives. If
              the dirfile does not exist, this flag is ignored.

       GD_VERBOSE
              Specifies that whenever an error is triggered by the library
              when working on this dirfile, the corresponding error string,
              which can be retrieved by calling get_error_string(3), should
              be written to standard error by the library. Without this
              flag, GetData writes nothing to standard error. (GetData
              never writes to standard output.)
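
The interaction of GD_BIG_ENDIAN, GD_LITTLE_ENDIAN, GD_FORCE_ENDIAN and an ENDIAN directive described above can be summarised as a small decision function. The following is only a sketch of the documented rules, not library code: the SKETCH_* bits and both function names are hypothetical stand-ins (the real flag values come from <getdata.h>), and the case where neither a flag nor a directive is present is not spelled out above, so the sketch assumes native byte order there.

```c
#include <stdbool.h>

/* Hypothetical stand-in bits for illustration only; the real GD_* flag
 * values are defined by <getdata.h> and are not reproduced here. */
enum {
    SKETCH_BIG_ENDIAN    = 1 << 0,
    SKETCH_LITTLE_ENDIAN = 1 << 1,
    SKETCH_FORCE_ENDIAN  = 1 << 2
};

/* True if this machine stores the least significant byte first. */
static bool native_is_little(void)
{
    const unsigned short probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Returns true if raw data would be treated as little-endian.
 *   has_directive    - the format file contains an ENDIAN directive
 *   directive_little - that directive asks for little-endian data */
static bool data_is_little(unsigned flags, bool has_directive,
                           bool directive_little)
{
    bool big    = (flags & SKETCH_BIG_ENDIAN) != 0;
    bool little = (flags & SKETCH_LITTLE_ENDIAN) != 0;

    /* The directive wins unless the caller forces the flags. */
    if (has_directive && !(flags & SKETCH_FORCE_ENDIAN))
        return directive_little;

    /* Contradictory flags: opposite of the native byte order. */
    if (big && little)
        return !native_is_little();
    if (big)
        return false;
    if (little)
        return true;

    /* No endianness flag: native order (assumed when unforced). */
    return native_is_little();
}
```

For example, data_is_little(SKETCH_BIG_ENDIAN, true, true) yields true because the unforced flag loses to the directive, while adding SKETCH_FORCE_ENDIAN to the same call yields false.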


       The flags argument may also be bitwise-or'd with one of the following
       symbols indicating the default encoding scheme of the dirfile. Like
       the endianness flags, the choice of encoding here is ignored if the
       encoding is specified in the dirfile itself, unless GD_FORCE_ENCODING
       is also specified. If none of these symbols is present,
       GD_AUTO_ENCODED is assumed, unless the dirfile_cbopen() call results
       in creation or truncation of the dirfile. In that case, GD_UNENCODED
       is assumed. See dirfile-encoding(5) for details on dirfile encoding
       schemes.

       GD_AUTO_ENCODED
              Specifies that the encoding type is not known in advance, but
              should be detected by the GetData library. Detection is
              accomplished by searching for raw data files with extensions
              appropriate to the encoding scheme. This method will notably
              fail if the library is called via putdata(3) to create a
              previously non-existent raw field unless a read is first
              successfully performed on the dirfile. Once the library has
              determined the encoding scheme for the first time, it
              remembers it for subsequent calls.

       GD_BZIP2_ENCODED
              Specifies that raw data files are compressed using the
              Burrows-Wheeler block sorting text compression algorithm and
              Huffman coding, as implemented in the bzip2 format.

       GD_GZIP_ENCODED
              Specifies that raw data files are compressed using Lempel-Ziv
              coding (LZ77) as implemented in the gzip format.

       GD_LZMA_ENCODED
              Specifies that raw data files are compressed using the
              Lempel-Ziv Markov Chain Algorithm (LZMA) as implemented in the
              xz container format.

       GD_SLIM_ENCODED
              Specifies that raw data files are compressed using the slimlib
              library.

       GD_TEXT_ENCODED
              Specifies that raw data files are encoded as text files
              containing one data sample per line.

       GD_UNENCODED
              Specifies that raw data files are not encoded, but written
              verbatim to disk.
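
The default-encoding rules above amount to a short precedence order. The sketch below models that order under stated assumptions: the sketch_encoding enum and effective_encoding() are hypothetical names invented for illustration, SKETCH_NONE stands for "no encoding symbol given", and only a few encodings are listed. It is a reading of the text, not the library's implementation.

```c
/* Hypothetical stand-ins; real encoding symbols come from <getdata.h>. */
enum sketch_encoding {
    SKETCH_NONE = 0,      /* nothing specified */
    SKETCH_AUTO,
    SKETCH_UNENCODED,
    SKETCH_TEXT,
    SKETCH_GZIP
};

/* Returns the encoding that would apply to the dirfile.
 *   from_flags           - encoding symbol passed in flags, if any
 *   from_format          - encoding named in the dirfile itself, if any
 *   force                - non-zero if encoding is forced by the caller
 *   created_or_truncated - non-zero if the call created or truncated
 *                          the dirfile */
static enum sketch_encoding
effective_encoding(enum sketch_encoding from_flags,
                   enum sketch_encoding from_format,
                   int force, int created_or_truncated)
{
    /* The dirfile's own choice wins unless the caller forces flags. */
    if (from_format != SKETCH_NONE && !force)
        return from_format;

    /* Otherwise an explicit symbol in flags is used. */
    if (from_flags != SKETCH_NONE)
        return from_flags;

    /* No symbol given: unencoded for new or truncated dirfiles,
     * automatic detection for existing ones. */
    return created_or_truncated ? SKETCH_UNENCODED : SKETCH_AUTO;
}
```

Under this reading, forcing the encoding without naming one in flags falls through to the default (automatic detection for an existing dirfile).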


   The Callback Function
       The caller-supplied sehandler function is called whenever the format
       file parser encounters a syntax error (i.e. whenever it would return
       the GD_E_FORMAT error). This callback may be used to correct the
       error, or to tell the parser how to recover from it.

       This function should take two pointers as arguments, and return an
       int:

              int sehandler(gd_parser_data_t *pdata, void *extra);

       The extra parameter is the pointer supplied to dirfile_cbopen(),
       passed verbatim to this function. It can be used to pass caller data
       to the callback. GetData does not inspect this pointer, not even to
       check its validity. If the caller needs to pass no data to the
       callback, it may be NULL.

       The gd_parser_data_t type is a structure with at least the following
       members:

              typedef struct {
                const DIRFILE* dirfile;
                int suberror;
                int linenum;
                const char* filename;
                char* line;

                ...
              } gd_parser_data_t;

       The pdata->dirfile member will be a pointer to a DIRFILE object
       suitable only for passing to get_error_string(). Notably, the caller
       should not assume this pointer will be the same as the pointer
       eventually returned by dirfile_cbopen(), nor that it will be valid
       after the callback function returns.

       The pdata->suberror member will be one of the following symbols
       indicating the type of syntax error encountered:

       GD_E_FORMAT_BAD_LINE
              The line was indecipherable. Typically this means that the
              line contained neither a reserved word, nor a field type.

       GD_E_FORMAT_BAD_NAME
              The specified field name was invalid.

       GD_E_FORMAT_BAD_SPF
              The samples-per-frame of a RAW field was out-of-range.

       GD_E_FORMAT_BAD_TYPE
              The data type of a RAW field was unrecognised.

       GD_E_FORMAT_BITNUM
              The first bit of a BIT field was out-of-range.

       GD_E_FORMAT_BITSIZE
              The last bit of a BIT field was out-of-range.

       GD_E_FORMAT_CHARACTER
              An invalid character was found in the line, or a character
              escape sequence was malformed.

       GD_E_FORMAT_DUPLICATE
              The specified field name already exists.

       GD_E_FORMAT_ENDIAN
              The byte order specified by an ENDIAN directive was
              unrecognised.

       GD_E_FORMAT_LITERAL
              An unexpected character was encountered in a complex literal.

       GD_E_FORMAT_LOCATION
              The parent of a metafield was defined in another fragment.

       GD_E_FORMAT_METARAW
              An attempt was made to add a RAW metafield.

       GD_E_FORMAT_N_FIELDS
              The number of fields of a LINCOM field was out-of-range.

       GD_E_FORMAT_N_TOK
              An insufficient number of tokens was found on the line.

       GD_E_FORMAT_NO_PARENT
              The parent of a metafield was not found.

       GD_E_FORMAT_NUMBITS
              The number of bits of a BIT field was out-of-range.

       GD_E_FORMAT_PROTECT
              The protection level specified by a PROTECT directive was
              unrecognised.

       GD_E_FORMAT_RES_NAME
              A field was specified with the reserved name INDEX.

       GD_E_FORMAT_UNTERM
              The last token of the line was unterminated.

       The pdata->filename and pdata->linenum members contain the name of
       the fragment and the line number where the syntax error was
       encountered. The first line in a fragment is line one.

       The pdata->line member contains a copy of the line containing the
       syntax error. This line may be freely modified by the callback
       function. It will then be reparsed if the callback function returns
       the symbol GD_SYNTAX_RESCAN (see below). Space is available for at
       least GD_MAX_LINE_LENGTH characters, including the terminating NUL.
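
As an illustration of editing pdata->line in place, the sketch below turns an indecipherable line into a comment and asks for it to be reparsed. Several things here are assumptions rather than facts from this page: HAVE_GETDATA is a hypothetical build macro, the stand-in definitions exist only so the sketch compiles without GetData installed (their numeric values are placeholders, not the library's), comment_out_bad_line() is an invented name, and the sketch assumes that a line beginning with '#' is a comment as described in dirfile-format(5).

```c
#include <stddef.h>
#include <string.h>

#ifdef HAVE_GETDATA              /* hypothetical build macro */
#include <getdata.h>
#else
/* Stand-ins so the sketch compiles without GetData installed.
 * All numeric values below are placeholders, NOT the library's. */
typedef struct stand_in_dirfile DIRFILE;
typedef struct {
    const DIRFILE *dirfile;
    int suberror;
    int linenum;
    const char *filename;
    char *line;
} gd_parser_data_t;
enum { GD_SYNTAX_ABORT, GD_SYNTAX_CONTINUE, GD_SYNTAX_IGNORE,
       GD_SYNTAX_RESCAN };
enum { GD_E_FORMAT_BAD_LINE = 1, GD_E_FORMAT_DUPLICATE = 2 };
#define GD_MAX_LINE_LENGTH 4096
#endif

/* Prefix an indecipherable line with '#' and request a rescan.
 * Any other kind of syntax error aborts parsing. */
static int comment_out_bad_line(gd_parser_data_t *pdata, void *extra)
{
    size_t len;

    (void)extra;                 /* no caller data needed */

    if (pdata->suberror != GD_E_FORMAT_BAD_LINE)
        return GD_SYNTAX_ABORT;

    len = strlen(pdata->line);
    if (len + 2 > GD_MAX_LINE_LENGTH)    /* '#' + text + NUL must fit */
        return GD_SYNTAX_ABORT;

    /* Shift the text (and its NUL) right by one, then add the '#'. */
    memmove(pdata->line + 1, pdata->line, len + 1);
    pdata->line[0] = '#';

    return GD_SYNTAX_RESCAN;
}
```

The length check relies only on the guarantee stated above, that the buffer holds at least GD_MAX_LINE_LENGTH characters including the terminating NUL. The net effect is the same as skipping the line; the point of the sketch is the in-place edit followed by a rescan.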

       The callback function should return one of the following symbols,
       which tells the parser how to subsequently handle the error:

       GD_SYNTAX_ABORT
              The parser should immediately abort parsing the format file
              and fail with the error GD_E_FORMAT. This is the default
              behaviour if no callback function is provided (or if the
              parser is invoked by calling dirfile_open()).

       GD_SYNTAX_CONTINUE
              The parser should continue parsing the format file. However,
              once parsing has finished, the parser will fail with the error
              GD_E_FORMAT, even if no further syntax errors are encountered.
              This behaviour may be used by the caller to identify all lines
              containing syntax errors in the format file, instead of just
              the first one.

       GD_SYNTAX_IGNORE
              The parser should ignore the line containing the syntax error
              completely, and carry on parsing the format file. If no
              further errors are encountered, the dirfile will be
              successfully opened.

       GD_SYNTAX_RESCAN
              The parser should rescan pdata->line, which replaces the line
              that originally contained the syntax error. The line is
              assumed to have been corrected by the callback function. If
              the line still contains a syntax error, the callback function
              will be called again.

       The callback function handles only syntax errors. The parser may
       still abort early if a different kind of library error is
       encountered. Furthermore, although a line may contain more than one
       syntax error, the parser will only ever report one syntax error per
       line, even if the callback function returns GD_SYNTAX_CONTINUE.
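
A callback can combine these return symbols, for instance skipping duplicate field definitions while reporting every other faulty line. The following is a minimal sketch of that idea, not a recommended policy. HAVE_GETDATA is a hypothetical build macro, the stand-in definitions only let the sketch compile without GetData (their values are placeholders, not the library's), and struct syntax_report and report_all_errors() are names invented here.

```c
#include <stdio.h>

#ifdef HAVE_GETDATA              /* hypothetical build macro */
#include <getdata.h>
#else
/* Stand-ins so the sketch compiles without GetData installed.
 * All numeric values below are placeholders, NOT the library's. */
typedef struct stand_in_dirfile DIRFILE;
typedef struct {
    const DIRFILE *dirfile;
    int suberror;
    int linenum;
    const char *filename;
    char *line;
} gd_parser_data_t;
enum { GD_SYNTAX_ABORT, GD_SYNTAX_CONTINUE, GD_SYNTAX_IGNORE,
       GD_SYNTAX_RESCAN };
enum { GD_E_FORMAT_BAD_LINE = 1, GD_E_FORMAT_DUPLICATE = 2 };
#endif

/* Caller data handed to the callback through the extra pointer. */
struct syntax_report {
    int ignored;   /* duplicate definitions that were skipped */
    int errors;    /* lines reported as syntax errors */
};

/* Skip duplicate fields; report everything else and keep parsing. */
static int report_all_errors(gd_parser_data_t *pdata, void *extra)
{
    struct syntax_report *report = extra;   /* may be NULL */

    if (pdata->suberror == GD_E_FORMAT_DUPLICATE) {
        if (report != NULL)
            report->ignored++;
        return GD_SYNTAX_IGNORE;
    }

    fprintf(stderr, "%s:%d: syntax error (suberror %d)\n",
            pdata->filename, pdata->linenum, pdata->suberror);
    if (report != NULL)
        report->errors++;

    /* The open will still fail with GD_E_FORMAT once parsing ends. */
    return GD_SYNTAX_CONTINUE;
}
```

Such a function would be passed as the sehandler argument of dirfile_cbopen(), with the address of a zero-initialised struct syntax_report as extra. Because at most one syntax error is reported per line, errors counts faulty lines rather than individual mistakes.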


RETURN VALUE
       A call to dirfile_cbopen() or dirfile_open() always returns a pointer
       to a newly allocated DIRFILE object. The DIRFILE object is an opaque
       structure containing the parsed dirfile metadata. If an error
       occurred, the dirfile error will be set to a non-zero error value.
       The DIRFILE object will also be internally flagged as invalid.
       Possible error values are:

       GD_E_ACCMODE
              The library was asked to create or truncate a dirfile opened
              read-only (i.e. GD_CREAT or GD_TRUNC was specified in flags
              along with GD_RDONLY).

       GD_E_ALLOC
              The library was unable to allocate memory.

       GD_E_BAD_REFERENCE
              The reference field specified by a /REFERENCE directive in the
              format file (see dirfile-format(5)) was not found, or was not
              a RAW field.

       GD_E_CALLBACK
              The registered callback function, sehandler, returned an
              unrecognised response.

       GD_E_CREAT
              The library was unable to create the dirfile, or the dirfile
              exists and both GD_CREAT and GD_EXCL were specified.

       GD_E_FORMAT
              A syntax error occurred in the format file. See also The
              Callback Function section above.

       GD_E_INTERNAL_ERROR
              An internal error occurred in the library while trying to
              perform the task. This indicates a bug in the library.
              Please report the incident to the GetData developers.

       GD_E_OPEN
              The dirfile format file could not be opened, or dirfilename
              does not specify a valid dirfile.

       GD_E_OPEN_INCLUDE
              A file specified in an /INCLUDE directive could not be opened.

       GD_E_TRUNC
              The library was unable to truncate the dirfile.

       The dirfile error may be retrieved by calling get_error(3). A
       descriptive error string for the last error encountered can be
       obtained from a call to get_error_string(3). When finished with it,
       the DIRFILE object should be deallocated with a call to
       dirfile_close(3), even if the open failed.
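
Because a DIRFILE pointer is returned even on failure, success has to be tested with get_error(3) instead of a NULL check, and the object must be closed on both paths. The sketch below shows one way to arrange this. It rests on assumptions: HAVE_GETDATA is a hypothetical build macro, the stub functions exist only so the sketch compiles and runs without GetData and do not reflect the library's behaviour or flag values, the prototypes assumed for get_error(), get_error_string() and dirfile_close() should be checked against their own manual pages, and check_dirfile() is an invented helper name.

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef HAVE_GETDATA              /* hypothetical build macro */
#include <getdata.h>
#else
/* Stubs so the sketch runs without GetData installed. They mimic only
 * the calling pattern; values and behaviour are placeholders. */
typedef struct { int error; } DIRFILE;
#define GD_RDONLY 0x1UL
static DIRFILE *dirfile_open(const char *name, unsigned long flags)
{
    DIRFILE *d = calloc(1, sizeof *d);
    (void)flags;
    if (d == NULL)
        abort();
    d->error = (name[0] == '\0');    /* stub rule: empty name fails */
    return d;
}
static int get_error(const DIRFILE *d) { return d->error; }
static char *get_error_string(const DIRFILE *d, char *buf, size_t len)
{
    snprintf(buf, len, "%s", d->error ? "stub: open failed" : "stub: ok");
    return buf;
}
static int dirfile_close(DIRFILE *d) { free(d); return 0; }
#endif

/* Try to open a dirfile read-only. Returns 0 on success, otherwise the
 * non-zero dirfile error, with a description copied into msg. */
static int check_dirfile(const char *name, char *msg, size_t msglen)
{
    DIRFILE *d = dirfile_open(name, GD_RDONLY);
    int err = get_error(d);          /* non-zero means the open failed */

    if (err != 0)
        get_error_string(d, msg, msglen);
    else if (msglen > 0)
        msg[0] = '\0';

    dirfile_close(d);                /* required even after a failure */
    return err;
}
```

The error string is copied out before the DIRFILE is closed, since the object should not be used after dirfile_close(3).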

BUGS
       GetData's parser assumes it is running on an ASCII-compatible
       platform. Format file parsing will fail gloriously on an EBCDIC
       platform.

SEE ALSO
       dirfile(5), dirfile-encoding(5), dirfile-format(5), dirfile_close(3),
       dirfile_include(3), dirfile_parser_callback(3), getdata(3),
       get_error(3), get_error_string(3)

Version 0.6.0                   16 October 2009              dirfile_cbopen(3)