dirfile_cbopen(3)                   GETDATA                  dirfile_cbopen(3)

NAME

       dirfile_cbopen, dirfile_open - open or create a dirfile

SYNOPSIS

       #include <getdata.h>

       DIRFILE* dirfile_cbopen(const char *dirfilename, unsigned long flags,
              gd_parser_callback_t sehandler, void *extra);

       DIRFILE* dirfile_open(const char *dirfilename, unsigned long flags);

DESCRIPTION

       The dirfile_cbopen() function opens or creates the dirfile specified by
       dirfilename, returning a DIRFILE object associated with it.  Opening a
       dirfile will cause the library to read and parse the dirfile's format
       file (see dirfile-format(5)).

       If not NULL, sehandler should be a pointer to a function which will be
       called whenever a syntax error is encountered while parsing the format
       file.  Specify NULL for this parameter if no callback function is to be
       used.  The caller may use this function to correct the error or modify
       the error handling of the format file parser.  See The Callback
       Function section below for details on this function.  The extra
       argument allows the caller to pass data to the callback function.  The
       pointer will be passed to the callback function verbatim.

       The dirfile_open() function is equivalent to dirfile_cbopen(), with
       sehandler and extra set to NULL.

       The flags argument should include one of the access modes: GD_RDONLY
       (read-only) or GD_RDWR (read-write), and may also contain zero or more
       of the following flags, bitwise-or'd together:

       GD_BIG_ENDIAN
              Specifies that raw data on disk is stored as big-endian data
              (most significant byte first).  Specifying this flag along with
              the contradictory GD_LITTLE_ENDIAN will cause the library to
              assume that the endianness of the data is opposite to that of
              the native architecture.

              This flag is ignored completely if an ENDIAN directive occurs in
              the dirfile format file, unless GD_FORCE_ENDIAN is also
              specified.

       GD_CREAT
              An empty dirfile will be created, if one does not already exist.
              This will create both the dirfile directory and an empty format
              file.  The directory will have mode S_IRWXU | S_IRWXG | S_IRWXO
              (0777), modified by the caller's umask value (see umask(2)).
              The format file will have mode S_IRUSR | S_IWUSR | S_IRGRP |
              S_IWGRP | S_IROTH | S_IWOTH (0666), also modified by the
              caller's umask.

              The owner of the dirfile directory and format file will be the
              effective user ID of the caller.  Group ownership follows the
              rules outlined in mkdir(2).

       GD_EXCL
              Ensure that this call creates a dirfile: when specified along
              with GD_CREAT, the call will fail if the dirfile specified by
              dirfilename already exists.  Behaviour of this flag is undefined
              if GD_CREAT is not specified.  This flag suffers from all the
              limitations of the O_EXCL flag as indicated in open(2).

       GD_FORCE_ENCODING
              Specifies that ENCODING directives (see dirfile-format(5)) found
              in the dirfile format file should be ignored.  The encoding
              scheme specified in flags will be used instead (see below).

       GD_FORCE_ENDIAN
              Specifies that ENDIAN directives (see dirfile-format(5)) found
              in the dirfile format file should be ignored.  When specified
              with one of GD_BIG_ENDIAN or GD_LITTLE_ENDIAN, the endianness
              specified will be assumed.  If this flag is specified with
              neither of those flags, the dirfile will be assumed to have the
              endianness of the native architecture.

       GD_IGNORE_DUPS
              If the dirfile format metadata specifies more than one field
              with the same name, all but one of them will be ignored by the
              parser.  Without this flag, parsing would fail with the
              GD_E_FORMAT error, possibly resulting in invocation of the
              registered callback function.  Which of the duplicate fields is
              kept is not specified.  As a result, this flag is typically only
              useful in the case where identical copies of a field
              specification line are present.

              No indication is given of whether a duplicate field has been
              discarded.  If finer grained control is required, the caller
              should handle GD_E_FORMAT_DUPLICATE suberrors itself with an
              appropriate callback function.

       GD_LITTLE_ENDIAN
              Specifies that raw data on disk is stored as little-endian data
              (least significant byte first).  Specifying this flag along with
              the contradictory GD_BIG_ENDIAN will cause the library to assume
              that the endianness of the data is opposite to that of the
              native architecture.

              This flag is ignored completely if an ENDIAN directive occurs in
              the dirfile format file, unless GD_FORCE_ENDIAN is also
              specified.

       GD_PEDANTIC
              Specifies that unrecognised lines found during the parsing of
              the format file should always cause a fatal error.  Without this
              flag, if a VERSION directive (see dirfile-format(5)) indicates
              that the dirfile being opened conforms to a Standards Version
              newer than the version understood by the library, unrecognised
              lines will be silently ignored.

       GD_PRETTY_PRINT
              When dirfile metadata is flushed to disk (either explicitly via
              dirfile_metaflush() or dirfile_flush() or implicitly by closing
              the dirfile), an attempt will be made to create a nicer looking
              format file (from a human-readable standpoint).  What this
              explicitly means is not part of the API, and any particular
              behaviour should not be relied on.  If the dirfile is opened
              read-only, this flag is ignored.

       GD_TRUNC
              If dirfilename specifies an already existing dirfile, it will be
              truncated before opening.  Since dirfile_cbopen() decides
              whether dirfilename specifies an existing dirfile before
              attempting to parse the dirfile, dirfilename is considered to
              specify an existing dirfile if it refers to a directory
              containing a regular file called format, regardless of the
              content or form of that file.

              Truncation occurs by deleting every regular file in the
              specified directory, whether the files were referred to by the
              dirfile before truncation or not.  Accordingly, this flag should
              be used with caution.  Subdirectories are left untouched.
              Notably, this operation does not consider the presence of
              subdirfiles declared by INCLUDE directives.  If the dirfile does
              not exist, this flag is ignored.

       GD_VERBOSE
              Specifies that whenever an error is triggered by the library
              when working on this dirfile, the corresponding error string,
              which can be retrieved by calling get_error_string(3), should be
              written on standard error by the library.  Without this flag,
              GetData writes nothing to standard error.  (GetData never writes
              to standard output.)

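       The permission arithmetic described under GD_CREAT can be checked
       with a few lines of C.  This is only an illustrative sketch, not
       GetData code: effective_mode() is a hypothetical helper, and the
       umask is passed in as a parameter rather than read from the process.

```c
#include <assert.h>
#include <sys/types.h>

/* Permission bits left after a umask is applied: requested & ~mask.
 * Hypothetical helper for illustration; not part of GetData. */
mode_t effective_mode(mode_t requested, mode_t mask)
{
    /* Only the nine rwx bits are meaningful in this illustration. */
    assert((requested & ~(mode_t)0777) == 0);
    return requested & ~mask;
}
```

       With the common umask of 022, effective_mode(0777, 022) gives 0755
       for the dirfile directory and effective_mode(0666, 022) gives 0644
       for the format file.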
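       The interaction of GD_BIG_ENDIAN, GD_LITTLE_ENDIAN, GD_FORCE_ENDIAN
       and an ENDIAN directive can be summarised as a small decision
       function.  The sketch below is a reading of the flag descriptions
       above, not the library's implementation.  The DEMO_ constants are
       stand-ins with arbitrary values (real code would use the symbols from
       getdata.h), and all type and function names are hypothetical.  The
       final case (neither flag given, no directive, no GD_FORCE_ENDIAN) is
       assumed to fall back to native order, which this page does not state
       explicitly.

```c
#include <assert.h>

/* Stand-in flag bits so the sketch compiles without <getdata.h>.
 * The values are arbitrary and are NOT the library's. */
#define DEMO_BIG_ENDIAN    0x1UL
#define DEMO_LITTLE_ENDIAN 0x2UL
#define DEMO_FORCE_ENDIAN  0x4UL

typedef enum { ORDER_BIG, ORDER_LITTLE } byte_order;
typedef enum { DIRECTIVE_NONE, DIRECTIVE_BIG, DIRECTIVE_LITTLE } endian_directive;

/* Detect the byte order of the machine running this code. */
byte_order native_order(void)
{
    const unsigned int probe = 1;
    return *(const unsigned char *)&probe == 1 ? ORDER_LITTLE : ORDER_BIG;
}

/* Decide which byte order applies to the raw data. */
byte_order resolve_order(unsigned long flags, endian_directive directive,
                         byte_order native)
{
    int big = (flags & DEMO_BIG_ENDIAN) != 0;
    int little = (flags & DEMO_LITTLE_ENDIAN) != 0;

    assert(native == ORDER_BIG || native == ORDER_LITTLE);

    /* An ENDIAN directive wins unless the caller forces the flags. */
    if (directive != DIRECTIVE_NONE && !(flags & DEMO_FORCE_ENDIAN))
        return directive == DIRECTIVE_BIG ? ORDER_BIG : ORDER_LITTLE;

    /* Contradictory flags: opposite of the native order. */
    if (big && little)
        return native == ORDER_BIG ? ORDER_LITTLE : ORDER_BIG;
    if (big)
        return ORDER_BIG;
    if (little)
        return ORDER_LITTLE;

    /* Neither flag given: native order (an assumption when not forced). */
    return native;
}
```

       Passing the native order as a parameter, instead of calling
       native_order() inside resolve_order(), keeps the decision logic
       testable on any machine.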
       The flags argument may also be bitwise or'd with one of the following
       symbols indicating the default encoding scheme of the dirfile.  Like
       the endianness flags, the choice of encoding here is ignored if the
       encoding is specified in the dirfile itself, unless GD_FORCE_ENCODING
       is also specified.  If none of these symbols is present,
       GD_AUTO_ENCODED is assumed, unless the dirfile_cbopen() call results in
       creation or truncation of the dirfile.  In that case, GD_UNENCODED is
       assumed.  See dirfile-encoding(5) for details on dirfile encoding
       schemes.

       GD_AUTO_ENCODED
              Specifies that the encoding type is not known in advance, but
              should be detected by the GetData library.  Detection is
              accomplished by searching for raw data files with extensions
              appropriate to the encoding scheme.  This method will notably
              fail if the library is called via putdata(3) to create a
              previously non-existent raw field unless a read is first
              successfully performed on the dirfile.  Once the library has
              determined the encoding scheme for the first time, it remembers
              it for subsequent calls.

       GD_BZIP2_ENCODED
              Specifies that raw data files are compressed using the
              Burrows-Wheeler block sorting text compression algorithm and
              Huffman coding, as implemented in the bzip2 format.

       GD_GZIP_ENCODED
              Specifies that raw data files are compressed using Lempel-Ziv
              coding (LZ77) as implemented in the gzip format.

       GD_LZMA_ENCODED
              Specifies that raw data files are compressed using the
              Lempel-Ziv Markov Chain Algorithm (LZMA) as implemented in the
              xz container format.

       GD_SLIM_ENCODED
              Specifies that raw data files are compressed using the slimlib
              library.

       GD_TEXT_ENCODED
              Specifies that raw data files are encoded as text files
              containing one data sample per line.

       GD_UNENCODED
              Specifies that raw data files are not encoded, but written
              verbatim to disk.

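       The rules for choosing the effective encoding scheme can be written
       as a short decision function.  This is a sketch of the behaviour
       described above under stated assumptions, not the library's code: the
       encoding enum and choose_encoding() are hypothetical names, only a
       few schemes are listed, and the force argument stands in for the
       presence of GD_FORCE_ENCODING in flags.

```c
#include <assert.h>

/* Hypothetical labels for this sketch; these are not GetData symbols. */
typedef enum {
    ENC_NONE,       /* nothing was specified */
    ENC_AUTO,
    ENC_UNENCODED,
    ENC_TEXT,
    ENC_GZIP
} encoding;

/* from_flags:     scheme requested in flags, or ENC_NONE
 * from_directive: scheme named in the dirfile itself, or ENC_NONE
 * force:          non-zero if encoding directives are to be ignored
 * created_or_truncated: non-zero if the call created or truncated
 *                 the dirfile */
encoding choose_encoding(encoding from_flags, encoding from_directive,
                         int force, int created_or_truncated)
{
    encoding chosen;

    if (from_directive != ENC_NONE && !force)
        chosen = from_directive;      /* the dirfile's own choice wins */
    else if (from_flags != ENC_NONE)
        chosen = from_flags;          /* the caller's choice */
    else
        chosen = created_or_truncated ? ENC_UNENCODED : ENC_AUTO;

    assert(chosen != ENC_NONE);       /* a scheme is always selected */
    return chosen;
}
```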
   The Callback Function
       The caller-supplied sehandler function is called whenever the format
       file parser encounters a syntax error (i.e. whenever it would return
       the GD_E_FORMAT error).  This callback may be used to correct the
       error, or to tell the parser how to recover from it.

       This function should take two pointers as arguments, and return an int:

              int sehandler(gd_parser_data_t *pdata, void *extra);

       The extra parameter is the pointer supplied to dirfile_cbopen(), passed
       verbatim to this function.  It can be used to pass caller data to the
       callback.  GetData does not inspect this pointer, not even to check its
       validity.  If the caller needs to pass no data to the callback, it may
       be NULL.

       The gd_parser_data_t type is a structure with at least the following
       members:

           typedef struct {
             const DIRFILE* dirfile;
             int suberror;
             int linenum;
             const char* filename;
             char* line;

             ...
           } gd_parser_data_t;

       The pdata->dirfile member will be a pointer to a DIRFILE object
       suitable only for passing to get_error_string().  Notably, the caller
       should not assume this pointer will be the same as the pointer
       eventually returned by dirfile_cbopen(), nor that it will be valid
       after the callback function returns.

       The pdata->suberror member will be one of the following symbols
       indicating the type of syntax error encountered:

       GD_E_FORMAT_BAD_LINE
              The line was indecipherable.  Typically this means that the line
              contained neither a reserved word, nor a field type.

       GD_E_FORMAT_BAD_NAME
              The specified field name was invalid.

       GD_E_FORMAT_BAD_SPF
              The samples-per-frame of a RAW field was out-of-range.

       GD_E_FORMAT_BAD_TYPE
              The data type of a RAW field was unrecognised.

       GD_E_FORMAT_BITNUM
              The first bit of a BIT field was out-of-range.

       GD_E_FORMAT_BITSIZE
              The last bit of a BIT field was out-of-range.

       GD_E_FORMAT_CHARACTER
              An invalid character was found in the line, or a character
              escape sequence was malformed.

       GD_E_FORMAT_DUPLICATE
              The specified field name already exists.

       GD_E_FORMAT_ENDIAN
              The byte order specified by an ENDIAN directive was
              unrecognised.

       GD_E_FORMAT_LITERAL
              An unexpected character was encountered in a complex literal.

       GD_E_FORMAT_LOCATION
              The parent of a metafield was defined in another fragment.

       GD_E_FORMAT_METARAW
              An attempt was made to add a RAW metafield.

       GD_E_FORMAT_N_FIELDS
              The number of fields of a LINCOM field was out-of-range.

       GD_E_FORMAT_N_TOK
              An insufficient number of tokens was found on the line.

       GD_E_FORMAT_NO_PARENT
              The parent of a metafield was not found.

       GD_E_FORMAT_NUMBITS
              The number of bits of a BIT field was out-of-range.

       GD_E_FORMAT_PROTECT
              The protection level specified by a PROTECT directive was
              unrecognised.

       GD_E_FORMAT_RES_NAME
              A field was specified with the reserved name INDEX.

       GD_E_FORMAT_UNTERM
              The last token of the line was unterminated.

       The pdata->filename and pdata->linenum members contain the name of the
       fragment and the line number where the syntax error was encountered.
       The first line in a fragment is line one.

       The pdata->line member contains a copy of the line containing the
       syntax error.  This line may be freely modified by the callback
       function.  It will then be reparsed if the callback function returns
       the symbol GD_SYNTAX_RESCAN (see below).  Space is available for at
       least GD_MAX_LINE_LENGTH characters, including the terminating NUL.

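       Because the line buffer has a fixed capacity, a callback that
       rewrites it should check that the corrected text fits, including the
       terminating NUL.  The helper below is a minimal sketch of such a
       bounded rewrite.  replace_line() is a hypothetical name, and
       DEMO_MAX_LINE_LENGTH is a stand-in with an arbitrary value; real code
       would pass GD_MAX_LINE_LENGTH from getdata.h as the capacity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for GD_MAX_LINE_LENGTH; the value is arbitrary and is NOT
 * taken from the library. */
#define DEMO_MAX_LINE_LENGTH 4096

/* Overwrite the parser's line buffer with a corrected line.
 * Returns 1 on success.  Returns 0, leaving the buffer untouched, if
 * the replacement plus its terminating NUL does not fit. */
int replace_line(char *line, const char *fixed, size_t capacity)
{
    size_t needed;

    assert(line != NULL && fixed != NULL);

    needed = strlen(fixed) + 1;     /* count the terminating NUL */
    if (needed > capacity)
        return 0;

    /* memmove tolerates `fixed` pointing into `line` itself. */
    memmove(line, fixed, needed);
    return 1;
}
```

       A callback might call replace_line(pdata->line, corrected,
       GD_MAX_LINE_LENGTH) and return GD_SYNTAX_RESCAN when it succeeds, or
       GD_SYNTAX_ABORT when the corrected text does not fit.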
       The callback function should return one of the following symbols, which
       tells the parser how to subsequently handle the error:

       GD_SYNTAX_ABORT
              The parser should immediately abort parsing the format file and
              fail with the error GD_E_FORMAT.  This is the default behaviour,
              if no callback function is provided (or if the parser is invoked
              by calling dirfile_open()).

       GD_SYNTAX_CONTINUE
              The parser should continue parsing the format file.  However,
              once parsing has finished, the parser will fail with the error
              GD_E_FORMAT, even if no further syntax errors are encountered.
              This behaviour may be used by the caller to identify all lines
              containing syntax errors in the format file, instead of just the
              first one.

       GD_SYNTAX_IGNORE
              The parser should ignore the line containing the syntax error
              completely, and carry on parsing the format file.  If no further
              errors are encountered, the dirfile will be successfully opened.

       GD_SYNTAX_RESCAN
              The parser should rescan the pdata->line member, which replaces
              the line which originally contained the syntax error.  The line
              is assumed to have been corrected by the callback function.  If
              the line still contains a syntax error, the callback function
              will be called again.

       The callback function handles only syntax errors.  The parser may still
       abort early, if a different kind of library error is encountered.
       Furthermore, although a line may contain more than one syntax error,
       the parser will only ever report one syntax error per line, even if the
       callback function returns GD_SYNTAX_CONTINUE.
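       As an illustration of the callback contract, the sketch below reports
       every syntax error it is given and uses the extra pointer to stop
       after a caller-chosen number of errors.  It is not GetData code: so
       that it compiles without getdata.h, it declares demo_parser_data_t
       (mirroring the documented members of gd_parser_data_t, minus dirfile)
       and DEMO_SYNTAX_ constants with arbitrary values.  error_budget and
       report_all_errors() are hypothetical names.  Real code would use
       gd_parser_data_t, GD_SYNTAX_CONTINUE and GD_SYNTAX_ABORT instead.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-ins so the sketch compiles without <getdata.h>.
 * The values are arbitrary and are NOT the library's. */
enum { DEMO_SYNTAX_ABORT, DEMO_SYNTAX_CONTINUE };

/* Mirrors the documented members of gd_parser_data_t (dirfile omitted). */
typedef struct {
    int suberror;
    int linenum;
    const char *filename;
    char *line;
} demo_parser_data_t;

/* Caller-owned state handed to the callback through `extra`. */
typedef struct {
    int errors_seen;
    int max_errors;
} error_budget;

/* Report each syntax error; keep going until the budget is spent. */
int report_all_errors(demo_parser_data_t *pdata, void *extra)
{
    error_budget *budget = extra;

    assert(pdata != NULL && budget != NULL);

    budget->errors_seen++;
    fprintf(stderr, "%s:%d: syntax error (suberror %d)\n",
            pdata->filename, pdata->linenum, pdata->suberror);

    return budget->errors_seen < budget->max_errors
        ? DEMO_SYNTAX_CONTINUE
        : DEMO_SYNTAX_ABORT;
}
```

       Keeping the counter in a caller-owned structure, rather than in a
       static variable, lets several opens run with independent budgets.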

RETURN VALUE

       A call to dirfile_cbopen() or dirfile_open() always returns a pointer
       to a newly allocated DIRFILE object.  The DIRFILE object is an opaque
       structure containing the parsed dirfile metadata.  If an error
       occurred, the dirfile error will be set to a non-zero error value.  The
       DIRFILE object will also be internally flagged as invalid.  Possible
       error values are:

       GD_E_ACCMODE
               The library was asked to create or truncate a dirfile opened
               read-only (i.e. GD_CREAT or GD_TRUNC was specified in flags
               along with GD_RDONLY).

       GD_E_ALLOC
               The library was unable to allocate memory.

       GD_E_BAD_REFERENCE
               The reference field specified by a /REFERENCE directive in the
               format file (see dirfile-format(5)) was not found, or was not a
               RAW field.

       GD_E_CALLBACK
               The registered callback function, sehandler, returned an
               unrecognised response.

       GD_E_CREAT
               The library was unable to create the dirfile, or the dirfile
               exists and both GD_CREAT and GD_EXCL were specified.

       GD_E_FORMAT
               A syntax error occurred in the format file.  See also The
               Callback Function section above.

       GD_E_INTERNAL_ERROR
               An internal error occurred in the library while trying to
               perform the task.  This indicates a bug in the library.  Please
               report the incident to the GetData developers.

       GD_E_OPEN
               The dirfile format file could not be opened, or dirfilename
               does not specify a valid dirfile.

       GD_E_OPEN_INCLUDE
               A file specified in an /INCLUDE directive could not be opened.

       GD_E_TRUNC
               The library was unable to truncate the dirfile.

       The dirfile error may be retrieved by calling get_error(3).  A
       descriptive error string for the last error encountered can be obtained
       from a call to get_error_string(3).  When finished with it, the DIRFILE
       object should be deallocated with a call to dirfile_close(3), even if
       the open failed.

BUGS

       GetData's parser assumes it is running on an ASCII-compatible platform.
       Format file parsing will fail gloriously on an EBCDIC platform.

SEE ALSO

       dirfile(5), dirfile-encoding(5), dirfile-format(5), dirfile_close(3),
       dirfile_include(3), dirfile_parser_callback(3), getdata(3),
       get_error(3), get_error_string(3)

Version 0.6.0                   16 October 2009              dirfile_cbopen(3)