BT_POSTPROCESS(1)                   btparse                  BT_POSTPROCESS(1)

NAME

       bt_postprocess - post-processing of BibTeX strings, values, and entries

SYNOPSIS

          void bt_postprocess_string (char * s,
                                      btshort options);

          char * bt_postprocess_value (AST *   value,
                                       btshort  options,
                                       boolean replace);

          char * bt_postprocess_field (AST *   field,
                                       btshort  options,
                                       boolean replace);

          void bt_postprocess_entry (AST *  entry,
                                     btshort options);

DESCRIPTION

       When btparse parses a BibTeX entry, it initially stores the results in
       an abstract syntax tree (AST), in a form exactly mirroring the parsed
       data.  For example, the entry

          @Article{Jones:1997a,
            AuThOr = "Bob   Jones" # and # "Jim Smith ",
            TITLE = "Feeding Habits of
                     the Common Cockroach",
            JoUrNaL = j_ent,
            YEAR = 1997
          }

       would parse to an AST that could be represented as follows:

          (entry,"Article")
            (key,"Jones:1997a")
            (field,"AuThOr")
              (string,"Bob   Jones")
              (macro,"and")
              (string,"Jim Smith ")
            (field,"TITLE")
              (string,"Feeding Habits of               the Common Cockroach")
            (field,"JoUrNaL")
              (macro,"j_ent")
            (field,"YEAR")
              (number,"1997")

       The advantage of this form is that all the important information in the
       entry is readily available by traversing the tree using the functions
       described in bt_traversal.  The obvious problem is that the data is a
       little too raw to be immediately useful: entry types and field names
       are inconsistently capitalized, strings are full of unwanted
       whitespace, field values are not reduced to single strings, and so
       forth.

       All of these problems are addressed by btparse's post-processing
       functions, described here.  Normally, you won't have to call these
       functions: the library does the Right Thing for you after parsing each
       entry, and you can customize what exactly the Right Thing is for your
       application.  (For instance, you can tell it to expand macros, but not
       to concatenate substrings together.)  However, it's conceivable that
       you might wish to move the post-processing into your own code and out
       of the library's control.  More likely, you could have strings that
       come from something other than BibTeX files that you would like to have
       treated as BibTeX strings; for that situation, the post-processing
       functions are essential.  Finally, you might just be curious about what
       exactly happens to your data after it's parsed.  If so, you've come to
       the right place for excruciatingly detailed explanations.

FUNCTIONS

       btparse offers four points of entry to its post-processing code.  Of
       these, probably only the first and last (for processing individual
       strings and whole entries) will be commonly used.

   Post-processing entry points
       To understand why four entry points are offered, an explanation of the
       sample AST shown above will help.  First of all, the whole entry is
       represented by the "(entry,"Article")" node; this node has the entry
       key and all its field/value pairs as children.  Entry nodes are
       returned by bt_parse_entry() and bt_parse_entry_s() (see bt_input) as
       well as bt_next_entry() (which traverses a list of entries returned
       from bt_parse_file(); see bt_traversal).  Whole entries may be
       post-processed with bt_postprocess_entry().

       You may also need to post-process a single field, or just the value
       associated with it.  (The difference is that processing the field can
       change the field name, e.g. to lowercase, in addition to the field
       value.)  The "(field,"AuThOr")" node above is an example of a field
       sub-AST, and "(string,"Bob   Jones")" is the first node in the list of
       simple values representing that field's value.  (Recall that a field
       value is, in general, a list of simple values.)  Field nodes are
       returned by bt_next_field(), value nodes by bt_next_value().  The
       former may be passed to bt_postprocess_field() for post-processing, the
       latter to bt_postprocess_value().

       Finally, individual strings may wander into your program from many
       places other than a btparse AST.  For that reason,
       bt_postprocess_string() is available for post-processing arbitrary
       strings.

   Post-processing options
       All of the post-processing routines have an "options" parameter, which
       you can use to fine-tune the post-processing.  (This is just like the
       per-metatype string-processing options that you can set before parsing
       entries; see bt_set_stringopts() in bt_input.)  As elsewhere in the
       library, "options" is a bitmap constructed by or'ing together various
       predefined constants.  These constants and their effects are documented
       in "String processing option macros" in btparse.

       bt_postprocess_string ()
              void bt_postprocess_string (char * s,
                                          btshort options);

           Post-processes an individual string, "s", which is modified in
           place.  The only post-processing option that makes sense on
           individual strings is whether to collapse whitespace according to
           the BibTeX rules; thus, if "options & BTO_COLLAPSE" is false, this
           function has no effect.  (It still makes a complete pass over the
           string, though; this is for future expansion.)

           The exact rules for collapsing whitespace are simple: non-space
           whitespace characters (tabs and newlines mainly) are converted to
           space, any run of more than one space within the string is
           collapsed to a single space, and any leading or trailing spaces
           are deleted.  (Ensuring that all whitespace is spaces is actually
           done by btparse's lexical scanner, so strings in btparse ASTs will
           never have whitespace apart from space.  Likewise, any strings
           passed to bt_postprocess_string() should not contain non-space
           whitespace characters.)

       bt_postprocess_value ()
              char * bt_postprocess_value (AST *   value,
                                           btshort  options,
                                           boolean replace);

           Post-processes a single field value, which is the head of a list of
           simple values as returned by bt_next_value().  All of the relevant
           string-processing options come into play here: conversion of
           numbers to strings ("BTO_CONVERT"), macro expansion ("BTO_EXPAND"),
           collapsing of whitespace ("BTO_COLLAPSE"), and string pasting
           ("BTO_PASTE").  Since pasting substrings together without first
           expanding macros and converting numbers would be nonsensical,
           attempting to do so is a fatal error.

           If "replace" is true, then the list headed by "value" will be
           replaced by a list representing the processed value.  That is, if
           string pasting is turned on ("options & BTO_PASTE" is true), then
           this list will be collapsed to a single node containing the single
           string that results from pasting together all the substrings.  If
           string pasting is not on, then each node in the list will be left
           intact, but will have its text replaced by processed text.

           If "replace" is false, then a new string will be built on the fly
           and returned by the function.  Note that if pasting is not on in
           this case, you will only get the last string in the list.  (It
           doesn't really make a lot of sense to post-process a value without
           pasting unless you're replacing it with the new value, though.)

           Returns the string that resulted from processing the whole value,
           which only makes sense if pasting was on or there was only one
           value in the list.  If a multiple-value list was processed without
           pasting, the last string in the list is returned (after
           processing).

           Consider what might be done to the value of the "author" field in
           the above example, which is the concatenation of a string, a macro,
           and another string.  Assume that the macro "and" expands to
           " and ", and that the variable "value" points to the sub-AST for
           this value.  The original sub-AST corresponding to this value is

              (string,"Bob   Jones")
              (macro,"and")
              (string,"Jim Smith ")

           To fully process this value in-place, you would call

              bt_postprocess_value (value, BTO_FULL, TRUE);

           This would convert the value to a single-element list,

              (string,"Bob Jones and Jim Smith")

           and return the fully processed string "Bob Jones and Jim Smith".
           Note that the "and" macro has been expanded, interpolated between
           the two literal strings, everything pasted together, and finally
           whitespace collapsed.  (Collapsing whitespace before concatenating
           the strings would be a bad idea.)

           (Incidentally, "BTO_FULL" is just a macro for the combination of
           all possible string-processing options, currently:

              BTO_CONVERT | BTO_EXPAND | BTO_PASTE | BTO_COLLAPSE

           There are two other similar shortcut macros: "BTO_MACRO" to express
           the special string-processing done on macro values, which is the
           same as "BTO_FULL" except for the absence of "BTO_COLLAPSE"; and
           "BTO_MINIMAL", which means no string-processing is to be done.)

           Let's say you'd rather preserve the list nature of the value, while
           expanding macros and converting any numbers to strings.  (This
           conversion is trivial: it just changes the type of the node from
           "BTAST_NUMBER" to "BTAST_STRING".  "Number" values are always
           stored as a string of digits, just as they appear in the file.)
           This would be done with the call

              bt_postprocess_value
                 (value, BTO_CONVERT | BTO_EXPAND | BTO_COLLAPSE, TRUE);

           which would change the list to

              (string,"Bob Jones")
              (string,"and")
              (string,"Jim Smith")

           Note that whitespace is collapsed here before any concatenation can
           be done; this is probably a bad idea.  But you can do it if you
           wish.  (If you get any ideas about cooking up your own value
           post-processing scheme by doing it in little steps like this, take
           a look at the source to bt_postprocess_value(); it should dissuade
           you from such a venture.)

       bt_postprocess_field ()
              char * bt_postprocess_field (AST *   field,
                                           btshort  options,
                                           boolean replace);

           This is little more than a front-end to bt_postprocess_value(); the
           only difference is that you pass it a "field" AST node (e.g. the
           "(field,"AuThOr")" in the above example), and that it transforms
           the field name in addition to its value.  In particular, the field
           name is forced to lowercase; this behaviour is (currently) not
           optional.

           Returns the string returned by bt_postprocess_value().

       bt_postprocess_entry ()
              void bt_postprocess_entry (AST *  entry,
                                         btshort options);

           Post-processes all values in an entry.  If "entry" points to the
           AST for a "regular" or "macro definition" entry, then the values
           are just what you'd expect: everything on the right-hand side of a
           field or macro "assignment."  You can also post-process comment and
           preamble entries, though.  Comment entries are essentially one big
           string, so only whitespace collapsing makes sense on them.
           Preambles may have multiple strings pasted together, so all the
           string-processing options apply to them.  (And there's nothing to
           prevent you from using macros in a preamble.)

SEE ALSO

       btparse, bt_input, bt_traversal

AUTHOR

       Greg Ward <gward@python.net>

btparse, version 0.89             2023-07-21                 BT_POSTPROCESS(1)