BT_POST_PROCESSING(1)               btparse              BT_POST_PROCESSING(1)

NAME

       bt_post_processing - post-processing of BibTeX strings, values, and
       entries

SYNOPSIS

          void bt_postprocess_string (char * s,
                                      btshort options);

          char * bt_postprocess_value (AST *   value,
                                       btshort  options,
                                       boolean replace);

          char * bt_postprocess_field (AST *   field,
                                       btshort  options,
                                       boolean replace);

          void bt_postprocess_entry (AST *  entry,
                                     btshort options);

DESCRIPTION

       When btparse parses a BibTeX entry, it initially stores the results in
       an abstract syntax tree (AST), in a form exactly mirroring the parsed
       data.  For example, the entry

          @Article{Jones:1997a,
            AuThOr = "Bob   Jones" # and # "Jim Smith ",
            TITLE = "Feeding Habits of
                     the Common Cockroach",
            JoUrNaL = j_ent,
            YEAR = 1997
          }

       would parse to an AST that could be represented as follows:

          (entry,"Article")
            (key,"Jones:1997a")
            (field,"AuThOr")
              (string,"Bob   Jones")
              (macro,"and")
              (string,"Jim Smith ")
            (field,"TITLE")
              (string,"Feeding Habits of               the Common Cockroach")
            (field,"JoUrNaL")
              (macro,"j_ent")
            (field,"YEAR")
              (number,"1997")

       The advantage of this form is that all the important information in the
       entry is readily available by traversing the tree using the functions
       described in bt_traversal.  The obvious problem is that the data is a
       little too raw to be immediately useful: entry types and field names
       are inconsistently capitalized, strings are full of unwanted
       whitespace, field values are not reduced to single strings, and so
       forth.

       All of these problems are addressed by btparse's post-processing
       functions, described here.  Normally, you won't have to call these
       functions: the library does the Right Thing for you after parsing each
       entry, and you can customize what exactly the Right Thing is for your
       application.  (For instance, you can tell it to expand macros, but not
       to concatenate substrings together.)  However, you might conceivably
       wish to move the post-processing out of the library's control and into
       your own code.  More likely, you could have strings that come from
       something other than BibTeX files that you would like to have treated
       as BibTeX strings; for that situation, the post-processing functions
       are essential.  Finally, you might just be curious about what exactly
       happens to your data after it's parsed.  If so, you've come to the
       right place for excruciatingly detailed explanations.

FUNCTIONS

       btparse offers four points of entry to its post-processing code.  Of
       these, probably only the first and last (for processing individual
       strings and whole entries) will be commonly used.

   Post-processing entry points
       To understand why four entry points are offered, an explanation of the
       sample AST shown above will help.  First of all, the whole entry is
       represented by the "(entry,"Article")" node; this node has the entry
       key and all its field/value pairs as children.  Entry nodes are
       returned by bt_parse_entry() and bt_parse_entry_s() (see bt_input) as
       well as bt_next_entry() (which traverses a list of entries returned
       from bt_parse_file(); see bt_traversal).  Whole entries may be
       post-processed with bt_postprocess_entry().

       You may also need to post-process a single field, or just the value
       associated with it.  (The difference is that processing the field can
       change the field name, e.g. to lowercase, in addition to the field
       value.)  The "(field,"AuThOr")" node above is an example of a field
       sub-AST, and "(string,"Bob   Jones")" is the first node in the list of
       simple values representing that field's value.  (Recall that a field
       value is, in general, a list of simple values.)  Field nodes are
       returned by bt_next_field(), value nodes by bt_next_value().  The
       former may be passed to bt_postprocess_field() for post-processing, the
       latter to bt_postprocess_value().

       Finally, individual strings may wander into your program from many
       places other than a btparse AST.  For that reason,
       bt_postprocess_string() is available for post-processing arbitrary
       strings.

   Post-processing options
       All of the post-processing routines have an "options" parameter, which
       you can use to fine-tune the post-processing.  (This is just like the
       per-metatype string-processing options that you can set before parsing
       entries; see bt_set_stringopts() in bt_input.)  As elsewhere in the
       library, "options" is a bitmap constructed by or'ing together various
       predefined constants.  These constants and their effects are documented
       in "String processing option macros" in btparse.
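
       The following sketch shows how such a bitmap is built and tested.  It
       guards stand-in definitions of the BTO_* constants behind #ifndef.  The
       numeric values are assumptions, chosen only so that the snippet
       compiles on its own; real code should include btparse.h and rely on
       its definitions.  The helper has_options() is hypothetical, and
       unsigned short stands in for btparse's btshort type.

```c
/* Stand-in option flags, used only when btparse.h is not included.
   The numeric values are assumptions for illustration. */
#ifndef BTO_CONVERT
#define BTO_CONVERT  1
#define BTO_EXPAND   2
#define BTO_PASTE    4
#define BTO_COLLAPSE 8
#define BTO_FULL     (BTO_CONVERT | BTO_EXPAND | BTO_PASTE | BTO_COLLAPSE)
#endif

/* Hypothetical helper: nonzero if every flag in `wanted` is set in
   `options`.  Options words are built by or'ing flags together, e.g.
   BTO_EXPAND | BTO_COLLAPSE to expand macros without pasting. */
int has_options(unsigned short options, unsigned short wanted)
{
    return (options & wanted) == wanted;
}
```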

       bt_postprocess_string ()
              void bt_postprocess_string (char * s,
                                          btshort options);

           Post-processes an individual string, "s", which is modified in
           place.  The only post-processing option that makes sense on
           individual strings is whether to collapse whitespace according to
           the BibTeX rules; thus, if "options & BTO_COLLAPSE" is false, this
           function has no effect.  (It still makes a complete pass over the
           string, though; this is for future expansion.)

           The exact rules for collapsing whitespace are simple: non-space
           whitespace characters (mainly tabs and newlines) are converted to
           spaces, any run of more than one space is collapsed to a single
           space, and any leading or trailing spaces are deleted.  (Ensuring
           that all whitespace is spaces is actually done by btparse's lexical
           scanner, so strings in btparse ASTs will never contain whitespace
           other than spaces.  Likewise, any strings passed to
           bt_postprocess_string() should not contain non-space whitespace
           characters.)
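
           These rules can be sketched as a single in-place pass.  This is an
           illustrative reimplementation, not btparse's own source, and the
           name collapse_ws() is hypothetical.  The sketch also maps any
           non-space whitespace to a space, even though strings taken from a
           btparse AST will never contain any.

```c
#include <ctype.h>
#include <string.h>

/* Collapse whitespace in place: every whitespace run becomes a single
   space, and leading and trailing whitespace is dropped. */
void collapse_ws(char *s)
{
    char *src = s;          /* next character to read */
    char *dst = s;          /* next position to write (never ahead of src) */
    int pending_space = 0;  /* whitespace seen since the last copied char */

    for (; *src != '\0'; src++) {
        if (isspace((unsigned char) *src)) {
            /* Whitespace before the first copied character is ignored. */
            if (dst != s)
                pending_space = 1;
        } else {
            if (pending_space) {
                *dst++ = ' ';
                pending_space = 0;
            }
            *dst++ = *src;
        }
    }
    *dst = '\0';  /* a pending trailing space is never written */
}
```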

       bt_postprocess_value ()
              char * bt_postprocess_value (AST *   value,
                                           btshort  options,
                                           boolean replace);

           Post-processes a single field value, which is the head of a list of
           simple values as returned by bt_next_value().  All of the relevant
           string-processing options come into play here: conversion of
           numbers to strings ("BTO_CONVERT"), macro expansion ("BTO_EXPAND"),
           collapsing of whitespace ("BTO_COLLAPSE"), and string pasting
           ("BTO_PASTE").  Since pasting substrings together without first
           expanding macros and converting numbers would be nonsensical,
           attempting to do so is a fatal error.
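
           The sketch below makes that rule concrete with a hypothetical
           check.  It accepts an options word only if pasting is either off
           or accompanied by both macro expansion and number conversion.  The
           real function reports a fatal error rather than returning a flag.
           The stand-in BTO_* values are assumptions; use btparse.h in real
           code.

```c
/* Stand-in flags; the numeric values are assumptions (see btparse.h). */
#ifndef BTO_CONVERT
#define BTO_CONVERT  1
#define BTO_EXPAND   2
#define BTO_PASTE    4
#define BTO_COLLAPSE 8
#define BTO_FULL     (BTO_CONVERT | BTO_EXPAND | BTO_PASTE | BTO_COLLAPSE)
#endif

/* Hypothetical validity check: pasting only makes sense once macros are
   expanded and numbers converted, so that every piece is a plain string. */
int paste_options_valid(unsigned short options)
{
    if (!(options & BTO_PASTE))
        return 1;   /* no pasting requested: any combination is fine */
    return (options & BTO_EXPAND) && (options & BTO_CONVERT);
}
```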

           If "replace" is true, then the list headed by "value" will be
           replaced by a list representing the processed value.  That is, if
           string pasting is turned on ("options & BTO_PASTE" is true), then
           this list will be collapsed to a single node containing the single
           string that results from pasting together all the substrings.  If
           string pasting is not on, then each node in the list will be left
           intact, but will have its text replaced by processed text.
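
           The pasting case of this list surgery can be modelled with a plain
           singly-linked list.  The node type and the helpers make_node() and
           paste_in_place() are hypothetical stand-ins, not btparse's AST API.
           The sketch assumes that macros have already been expanded, so
           " and " stands in for the expanded "and" macro, and it leaves
           whitespace collapsing out.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical value node: one simple value in a field's value list. */
typedef struct node {
    char        *text;   /* heap-allocated string */
    struct node *next;
} node;

/* Allocate a node holding a copy of `text`; NULL on allocation failure. */
node *make_node(const char *text, node *next)
{
    node  *n = malloc(sizeof *n);
    size_t len = strlen(text) + 1;

    if (n == NULL)
        return NULL;
    n->text = malloc(len);
    if (n->text == NULL) {
        free(n);
        return NULL;
    }
    memcpy(n->text, text, len);
    n->next = next;
    return n;
}

/* Paste all strings into the head node and free the rest, leaving a
   single-element list.  Returns the pasted text, or NULL if out of
   memory (the list is then left unchanged). */
char *paste_in_place(node *head)
{
    size_t len = 0;
    for (node *n = head; n != NULL; n = n->next)
        len += strlen(n->text);

    char *joined = malloc(len + 1);
    if (joined == NULL)
        return NULL;

    char *p = joined;
    for (node *n = head; n != NULL; n = n->next) {
        size_t piece = strlen(n->text);
        memcpy(p, n->text, piece);
        p += piece;
    }
    *p = '\0';

    node *n = head->next;   /* free the now-redundant tail nodes */
    while (n != NULL) {
        node *next = n->next;
        free(n->text);
        free(n);
        n = next;
    }
    free(head->text);
    head->text = joined;
    head->next = NULL;
    return joined;
}
```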

           If "replace" is false, then a new string will be built on the fly
           and returned by the function.  Note that if pasting is not on in
           this case, you will only get the last string in the list.  (It
           doesn't really make a lot of sense to post-process a value without
           pasting unless you're replacing it with the new value, though.)

           Returns the string that resulted from processing the whole value,
           which only makes sense if pasting was on or there was only one
           value in the list.  If a multiple-value list was processed without
           pasting, the last string in the list is returned (after
           processing).

           Consider what might be done to the value of the "author" field in
           the above example, which is the concatenation of a string, a macro,
           and another string.  Assume that the macro "and" expands to
           " and ", and that the variable "value" points to the sub-AST for
           this value.  The original sub-AST corresponding to this value is

              (string,"Bob   Jones")
              (macro,"and")
              (string,"Jim Smith ")

           To fully process this value in place, you would call

              bt_postprocess_value (value, BTO_FULL, TRUE);

           ("BTO_FULL" is just the combination of all possible
           string-processing options:
           "BTO_CONVERT|BTO_EXPAND|BTO_PASTE|BTO_COLLAPSE".)  This would
           convert the value to a single-element list,

              (string,"Bob Jones and Jim Smith")

           and return the fully processed string "Bob Jones and Jim Smith".
           Note that the "and" macro has been expanded, interpolated between
           the two literal strings, everything pasted together, and finally
           whitespace collapsed.  (Collapsing whitespace before concatenating
           the strings would be a bad idea: the expanded " and " would be
           trimmed to "and", and pasting would then produce
           "Bob JonesandJim Smith".)
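
           The sketch below illustrates the order of operations in that
           example: expand, paste, and only then collapse.  Everything in it
           is hypothetical.  This includes the simple_value type and the
           one-entry macro table in lookup_macro().  It also assumes that
           undefined macros expand to an empty string.  The result is
           returned as a newly allocated string, loosely mirroring the case
           where "replace" is false.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical kinds mirroring the (string,...), (macro,...) and
   (number,...) nodes in the AST notation used above. */
typedef enum { SV_STRING, SV_MACRO, SV_NUMBER } sv_kind;

typedef struct {
    sv_kind     kind;
    const char *text;   /* literal text, macro name, or digit string */
} simple_value;

/* Hypothetical macro table.  Assumption: undefined macros expand to "". */
static const char *lookup_macro(const char *name)
{
    return strcmp(name, "and") == 0 ? " and " : "";
}

/* Text contributed by one simple value.  Numbers are already stored as
   digit strings, so converting them needs no work here. */
static const char *piece_text(const simple_value *v)
{
    return v->kind == SV_MACRO ? lookup_macro(v->text) : v->text;
}

/* Collapse whitespace runs to one space and trim both ends, in place. */
static void collapse_ws(char *s)
{
    char *dst = s;
    int   pending = 0;   /* whitespace seen after some copied text */

    for (const char *src = s; *src != '\0'; src++) {
        if (isspace((unsigned char) *src)) {
            pending = (dst != s);
        } else {
            if (pending)
                *dst++ = ' ';
            pending = 0;
            *dst++ = *src;
        }
    }
    *dst = '\0';
}

/* Expand and convert every piece, paste them, and only then collapse
   whitespace.  Returns a new string the caller must free, or NULL if
   out of memory. */
char *process_value(const simple_value *vals, size_t count)
{
    size_t len = 0;
    for (size_t i = 0; i < count; i++)
        len += strlen(piece_text(&vals[i]));

    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (size_t i = 0; i < count; i++) {
        const char *t = piece_text(&vals[i]);
        size_t      n = strlen(t);
        memcpy(p, t, n);
        p += n;
    }
    *p = '\0';

    collapse_ws(out);   /* collapsing last keeps the spaces around "and" */
    return out;
}
```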

           Let's say you'd rather preserve the list nature of the value, while
           expanding macros, converting any numbers to strings, and collapsing
           whitespace.  (The number conversion is trivial: it just changes the
           type of the node from "BTAST_NUMBER" to "BTAST_STRING".  "Number"
           values are always stored as a string of digits, just as they appear
           in the file.)  This would be done with the call

              bt_postprocess_value
                 (value, BTO_CONVERT|BTO_EXPAND|BTO_COLLAPSE, TRUE);

           which would change the list to

              (string,"Bob Jones")
              (string,"and")
              (string,"Jim Smith")

           Note that whitespace is collapsed here before any concatenation can
           be done; this is probably a bad idea.  But you can do it if you
           wish.  (If you get any ideas about cooking up your own value
           post-processing scheme by doing it in little steps like this, take
           a look at the source to bt_postprocess_value(); it should dissuade
           you from such a venture.)
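
           The sketch below models the non-pasting semantics only.  It is not
           a substitute for bt_postprocess_value(), in line with the advice
           above.  It processes each element of a value on its own and
           returns the last processed string, matching the documented return
           value for lists processed without pasting.  The types, the macro
           table, and the helper names are all hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef enum { SV_STRING, SV_MACRO, SV_NUMBER } sv_kind;

/* Hypothetical list element; `text` is heap-allocated and replaced in place. */
typedef struct {
    sv_kind kind;
    char   *text;
} simple_value;

/* Hypothetical macro table: "and" -> " and ", anything else -> "". */
static const char *lookup_macro(const char *name)
{
    return strcmp(name, "and") == 0 ? " and " : "";
}

/* Heap copy of a string; NULL on allocation failure. */
char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char  *c = malloc(len);
    if (c != NULL)
        memcpy(c, s, len);
    return c;
}

/* Collapse whitespace runs to one space and trim both ends, in place. */
static void collapse_ws(char *s)
{
    char *dst = s;
    int   pending = 0;

    for (const char *src = s; *src != '\0'; src++) {
        if (isspace((unsigned char) *src)) {
            pending = (dst != s);
        } else {
            if (pending)
                *dst++ = ' ';
            pending = 0;
            *dst++ = *src;
        }
    }
    *dst = '\0';
}

/* Expand, convert and collapse each element separately (no pasting).
   Returns the last element's text, or NULL on allocation failure. */
const char *process_each(simple_value *vals, size_t count)
{
    const char *last = NULL;

    for (size_t i = 0; i < count; i++) {
        if (vals[i].kind == SV_MACRO) {
            char *expanded = copy_string(lookup_macro(vals[i].text));
            if (expanded == NULL)
                return NULL;
            free(vals[i].text);
            vals[i].text = expanded;
        }
        vals[i].kind = SV_STRING;    /* numbers are already digit strings */
        collapse_ws(vals[i].text);   /* so " and " becomes "and" */
        last = vals[i].text;
    }
    return last;
}
```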

       bt_postprocess_field ()
              char * bt_postprocess_field (AST *   field,
                                           btshort  options,
                                           boolean replace);

           This is little more than a front-end to bt_postprocess_value(); the
           only difference is that you pass it a "field" AST node (e.g. the
           "(field,"AuThOr")" node in the above example), and that it
           transforms the field name in addition to its value.  In particular,
           the field name is forced to lowercase; this behaviour is
           (currently) not optional.

           Returns the string returned by bt_postprocess_value().
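
           Here is a minimal sketch of the field-name lowercasing step.  It
           assumes plain ASCII field names, and lowercase_name() is a
           hypothetical helper, not part of btparse.

```c
#include <ctype.h>
#include <string.h>

/* Lowercase a field name in place, e.g. "AuThOr" -> "author". */
void lowercase_name(char *name)
{
    for (; *name != '\0'; name++)
        *name = (char) tolower((unsigned char) *name);
}
```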

       bt_postprocess_entry ()
              void bt_postprocess_entry (AST *  entry,
                                         btshort options);

           Post-processes all values in an entry.  If "entry" points to the
           AST for a "regular" or "macro definition" entry, then the values
           are just what you'd expect: everything on the right-hand side of a
           field or macro "assignment."  You can also post-process comment and
           preamble entries, though.  Comment entries are essentially one big
           string, so only whitespace collapsing makes sense on them.
           Preambles may have multiple strings pasted together, so all the
           string-processing options apply to them.  (And there's nothing to
           prevent you from using macros in a preamble.)
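
           The per-metatype reasoning above can be sketched as an option
           filter.  The metatype enum, the function name, and the stand-in
           BTO_* values are all assumptions for illustration; btparse defines
           its own metatype enumeration and constants.  The sketch only
           restates the rule given here and assumes the extra flags are
           simply dropped for comments: comments keep nothing but whitespace
           collapsing, while every other metatype keeps all options.

```c
/* Stand-in flags; the numeric values are assumptions (see btparse.h). */
#ifndef BTO_CONVERT
#define BTO_CONVERT  1
#define BTO_EXPAND   2
#define BTO_PASTE    4
#define BTO_COLLAPSE 8
#define BTO_FULL     (BTO_CONVERT | BTO_EXPAND | BTO_PASTE | BTO_COLLAPSE)
#endif

/* Hypothetical stand-in for btparse's entry metatypes. */
typedef enum { META_REGULAR, META_MACRODEF, META_COMMENT, META_PREAMBLE } metatype;

/* Options that make sense for a given kind of entry. */
unsigned short options_for(metatype type, unsigned short options)
{
    if (type == META_COMMENT)
        return options & BTO_COLLAPSE;   /* one big string: collapse only */
    return options;   /* regular, macro definition, preamble: all apply */
}
```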

btparse, version 0.89             2023-07-21             BT_POST_PROCESSING(1)