BT_FORMAT_NAMES(1)                  btparse                 BT_FORMAT_NAMES(1)

NAME
       bt_format_names - formatting BibTeX names for consistent output

SYNOPSIS
          bt_name_format * bt_create_name_format (char * parts,
                                                  boolean abbrev_first);
          void bt_free_name_format (bt_name_format * format);
          void bt_set_format_text (bt_name_format * format,
                                   bt_namepart part,
                                   char * pre_part,
                                   char * post_part,
                                   char * pre_token,
                                   char * post_token);
          void bt_set_format_options (bt_name_format * format,
                                      bt_namepart part,
                                      boolean abbrev,
                                      bt_joinmethod join_tokens,
                                      bt_joinmethod join_part);
          char * bt_format_name (bt_name * name, bt_name_format * format);

DESCRIPTION
       After splitting a name into its component parts (represented as a
       "bt_name" structure), you often want to put it back together again as
       a single string in a consistent way. btparse provides a very flexible
       way to do this, generally in two stages: first, you create a "name
       format" which describes how to put the tokens and parts of any name
       back together, and then you apply the format to a particular name.

       The "name format" is encapsulated in a "bt_name_format" structure,
       which is created with bt_create_name_format(). This function includes
       some clever trickery that means you can usually get away with calling
       it alone, and not need to do any customization of the format. If you
       do need to customize the format, though, bt_set_format_text() and
       bt_set_format_options() provide that capability.

       The format controls the following:

       •   which name parts are printed, and in what order (e.g. "first von
           last jr", or "von last jr first")

       •   the text that precedes and follows each part (e.g. if the first
           name follows the last name, you probably want a comma before the
           `first' part: "Smith, John" rather than "Smith John")

       •   the text that precedes and follows each token (e.g. if the first
           name is abbreviated, you may want a period after each token:
           "J. R. Smith" rather than "J R Smith")

       •   the method used to join the tokens of each part together

       •   the method used to join each part to the following part

       All of these except the list of parts to format are kept in arrays
       indexed by name part: for example, the structure has a field

          char * post_token[BT_MAX_NAMEPARTS]

       and "post_token[BTN_FIRST]" ("BTN_FIRST" is from the "bt_namepart"
       "enum") is the string to be added after each token in the first
       name---for example, "." if the first name is to be abbreviated in the
       conventional way.

       Yet another "enum", "bt_joinmethod", describes the available methods
       for joining tokens together. Note that there are two sets of join
       methods in a name format: one for joining tokens within a single
       part, and one for joining the last token of a part to the first token
       of the part that follows. The first allows you, for example, to
       change "J R Smith" (first name abbreviated with no post-token text
       but tokens joined by a space) to "JR Smith" (the same, but first-name
       tokens jammed together). The second is mainly used to ensure that
       "von" and "last" name-parts may be joined with a tie: "de~Roche"
       rather than "de Roche".

       The token join methods are:

       BTJ_MAYTIE
           Insert a "discretionary tie" between tokens. That is, either a
           space or a "tie" is inserted, depending on context. (A "tie",
           otherwise known as an unbreakable space, is currently hard-coded
           as "~", as in TeX.)

       BTJ_SPACE
           Always insert a space between tokens.

       BTJ_FORCETIE
           Always insert a "tie" ("~") between tokens.

       BTJ_NOTHING
           Insert nothing between tokens---just jam them together.

       Tokens are joined together, and thus the choice of whether to insert
       a "discretionary tie" is made, at two places: within a part and
       between two parts. Naturally, this only applies when "BTJ_MAYTIE" was
       supplied as the token-join method; "BTJ_SPACE" and "BTJ_FORCETIE"
       always insert a space or a tie respectively, and "BTJ_NOTHING" always
       adds nothing between tokens. Within a part, a tie is added after the
       first token if it is less than three characters long, and before the
       last token. Between parts, a tie is added only if the preceding part
       consisted of a single token that was less than three characters long.
       In all other cases, spaces are inserted. (This implementation
       slavishly follows BibTeX.)

       The finished format is applied to a particular name by
       bt_format_name(), which returns a new string.

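       The join rules described above can be illustrated with a small,
       self-contained sketch in C. It is not btparse's implementation: the
       "tj_" names and the enum are hypothetical stand-ins for
       "bt_joinmethod", and the tie test is simply a literal reading of the
       rules as stated in this section.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the four join methods (not btparse's enum). */
typedef enum { TJ_MAYTIE, TJ_SPACE, TJ_FORCETIE, TJ_NOTHING } tj_join;

/* Separator placed after token i (0-based) of an n-token part. */
static const char *tj_token_sep(tj_join method, const char **tokens,
                                int n, int i)
{
    switch (method) {
    case TJ_SPACE:    return " ";
    case TJ_FORCETIE: return "~";
    case TJ_NOTHING:  return "";
    case TJ_MAYTIE:
        /* Tie after a short first token, and before the last token. */
        if (i == 0 && strlen(tokens[0]) < 3) return "~";
        if (i + 1 == n - 1) return "~";
        return " ";
    }
    return " ";
}

/* Separator placed after a whole part when another part follows it. */
static const char *tj_part_sep(tj_join method, const char **tokens, int n)
{
    switch (method) {
    case TJ_SPACE:    return " ";
    case TJ_FORCETIE: return "~";
    case TJ_NOTHING:  return "";
    case TJ_MAYTIE:
        /* Tie only after a part made of one token shorter than three chars. */
        return (n == 1 && strlen(tokens[0]) < 3) ? "~" : " ";
    }
    return " ";
}

/* Join the tokens of one part into out (outsz must be > 0); returns out. */
static char *tj_join_part(tj_join method, const char **tokens, int n,
                          char *out, size_t outsz)
{
    size_t used = 0;
    out[0] = '\0';
    for (int i = 0; i < n; i++) {
        const char *sep = (i + 1 < n) ? tj_token_sep(method, tokens, n, i) : "";
        int w = snprintf(out + used, outsz - used, "%s%s", tokens[i], sep);
        if (w < 0 || (size_t) w >= outsz - used) break;  /* truncated: stop */
        used += (size_t) w;
    }
    return out;
}
```

       Under these assumptions, the tokens "Jean", "Paul", "Marie" joined
       with the discretionary method get a tie only before the last token,
       while a one-token part such as "de" is tied to the part that follows
       it.
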
FUNCTIONS
       bt_create_name_format()
              bt_name_format * bt_create_name_format (char * parts,
                                                      boolean abbrev_first)

           Creates a name format for a given set of parts, with variations
           for the most common forms of customization---the order of parts
           and whether to abbreviate the first name.

           The "parts" parameter specifies which parts to include in a
           formatted name, as well as the order in which to format them.
           "parts" must be a string of four or fewer characters, each of
           which denotes one of the four name parts: for instance, "vljf"
           means to format all four parts in "von last jr first" order. No
           characters outside of the set "fvlj" are allowed, and no
           characters may be repeated. "abbrev_first" controls whether the
           `first' part will be abbreviated (i.e., only the first letter
           from each token will be printed).

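           The constraints on "parts" can be expressed as a short check. The
           sketch below is only an illustration: "pv_parts_valid" is a
           hypothetical helper, not part of btparse, and it assumes that the
           empty string is acceptable because the text gives only an upper
           bound on the length.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if parts has at most four characters, all drawn from
   "fvlj", with none repeated. Hypothetical helper, not a btparse call. */
static bool pv_parts_valid(const char *parts)
{
    bool seen[256] = { false };
    size_t len = strlen(parts);

    if (len > 4) return false;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char) parts[i];
        if (strchr("fvlj", c) == NULL) return false;  /* unknown part letter */
        if (seen[c]) return false;                    /* repeated part */
        seen[c] = true;
    }
    return true;
}
```

           For example, "vljf" and "fl" pass this check, while "ffl"
           (repeated letter) and "fvlx" (unknown letter) do not.
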
           In addition to simply setting the list of parts to format and the
           "abbreviate" flag for the first name, bt_create_name_format()
           initializes the entire format structure so as to minimize the
           need for further customizations:

           •   The "token join method"---what to insert between tokens of
               the same part---is set to "BTJ_MAYTIE" (discretionary tie)
               for all parts.

           •   The "part join method"---what to insert after the final token
               of a particular part, assuming there are more parts to
               come---is set to "BTJ_SPACE" for the `first', `last', and
               `jr' parts. If the `von' part is present and immediately
               precedes the `last' part (which will almost always be the
               case), "BTJ_MAYTIE" is used to join `von' to `last';
               otherwise, `von' also gets "BTJ_SPACE" for the inter-part
               join method.

           •   The abbreviation flag is set to "FALSE" for the `von',
               `last', and `jr' parts; for `first', the abbreviation flag is
               set to whatever you pass in as "abbrev_first".

           •   Initially, all "surrounding text" (pre-part, post-part,
               pre-token, and post-token) for all parts is set to the empty
               string. Then a few tweaks are done, depending on the
               "abbrev_first" flag and the order of parts. First, if
               "abbrev_first" is "TRUE", the post-token text for the first
               name is set to "."---this changes "J R Smith" to
               "J. R. Smith", which is usually the desired form. (If you
               don't want the periods, you'll have to set the post-token
               text yourself with bt_set_format_text().)

               Then, if `jr' is present and immediately after `last' (almost
               always the case), the pre-part text for `jr' is set to ", ",
               and the inter-part join method for `last' is set to
               "BTJ_NOTHING". This changes "John Smith Jr" (where the space
               following "Smith" comes from formatting the last name with a
               "BTJ_SPACE" inter-part join method) to "John Smith, Jr"
               (where the ", " is now associated with "Jr"---that way, if
               there is no `jr' part, the ", " will not be printed).

               Finally, if `first' is present and immediately follows either
               `jr' or `last' (which will usually be the case in "last-name
               first" formats), the same sort of trickery is applied: the
               pre-part text for `first' is set to ", ", and the part join
               method for the preceding part (either `jr' or `last') is set
               to "BTJ_NOTHING".

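           The defaulting rules listed above can be summarized in code. What
           follows is a sketch written from the description only, not taken
           from the btparse sources: the "df_" types and the "df_defaults"
           function are hypothetical, and the structure holds just the
           fields that the defaults touch.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical mirrors of the name-part and join-method enums. */
typedef enum { DF_FIRST, DF_VON, DF_LAST, DF_JR, DF_NPARTS } df_part;
typedef enum { DF_MAYTIE, DF_SPACE, DF_FORCETIE, DF_NOTHING } df_join;

/* Only the fields affected by the defaults described in the text. */
typedef struct {
    bool        abbrev[DF_NPARTS];
    df_join     join_tokens[DF_NPARTS];
    df_join     join_part[DF_NPARTS];
    const char *pre_part[DF_NPARTS];
    const char *post_token[DF_NPARTS];
} df_format;

/* True if part letter b comes immediately after part letter a in parts. */
static bool df_follows(const char *parts, char a, char b)
{
    const char *p = strchr(parts, a);
    return p != NULL && p[1] == b;
}

static void df_defaults(df_format *f, const char *parts, bool abbrev_first)
{
    /* Baseline: discretionary ties within parts, spaces between parts,
       no abbreviation, and empty surrounding text. */
    for (int i = 0; i < DF_NPARTS; i++) {
        f->abbrev[i]      = false;
        f->join_tokens[i] = DF_MAYTIE;
        f->join_part[i]   = DF_SPACE;
        f->pre_part[i]    = "";
        f->post_token[i]  = "";
    }

    /* Abbreviated first names get a period after each token. */
    f->abbrev[DF_FIRST] = abbrev_first;
    if (abbrev_first) f->post_token[DF_FIRST] = ".";

    /* `von' directly before `last': allow a tie between them. */
    if (df_follows(parts, 'v', 'l')) f->join_part[DF_VON] = DF_MAYTIE;

    /* `jr' directly after `last': the comma travels with `jr'. */
    if (df_follows(parts, 'l', 'j')) {
        f->pre_part[DF_JR]    = ", ";
        f->join_part[DF_LAST] = DF_NOTHING;
    }

    /* `first' directly after `jr' or `last': the comma travels with `first'. */
    if (df_follows(parts, 'j', 'f')) {
        f->pre_part[DF_FIRST] = ", ";
        f->join_part[DF_JR]   = DF_NOTHING;
    } else if (df_follows(parts, 'l', 'f')) {
        f->pre_part[DF_FIRST] = ", ";
        f->join_part[DF_LAST] = DF_NOTHING;
    }
}
```

           Attaching the ", " to the part that follows, rather than to the
           part that precedes, is what lets the comma disappear when a name
           has no `jr' (or no `first') part.
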
           While all these rules are rather complicated, they mean that you
           are usually freed from having to do any customization of the name
           format. Certainly this is the case if you only need "fvlj" and
           "vljf" part orders, only want to abbreviate the first name, want
           periods after abbreviated tokens, non-breaking spaces in the
           "right" places, and commas in the conventional places.

           If you want something out of the ordinary---for instance,
           abbreviated tokens jammed together with no punctuation, or
           abbreviated last names---you'll need to customize the name format
           a bit with bt_set_format_text() and bt_set_format_options().

       bt_free_name_format()
              void bt_free_name_format (bt_name_format * format)

           Frees a name format created by bt_create_name_format().

       bt_set_format_text()
              void bt_set_format_text (bt_name_format * format,
                                       bt_namepart part,
                                       char * pre_part,
                                       char * post_part,
                                       char * pre_token,
                                       char * post_token)

           Allows you to customize some or all of the surrounding text for a
           single name part. Supply "NULL" for any chunk of text that you
           don't want to change.

           For instance, say you want a name format that will abbreviate
           first names, but without any punctuation after the abbreviated
           tokens. You could create and customize the format as follows:

              format = bt_create_name_format ("fvlj", TRUE);
              bt_set_format_text (format,
                                  BTN_FIRST,       /* name-part to customize */
                                  NULL, NULL,      /* pre- and post- part text */
                                  NULL, "");       /* empty string for post-token */

           Without the bt_set_format_text() call, "format" would result in
           names formatted like "J. R. Smith". After setting the post-token
           text for first names to "", this name would become "J R Smith".

       bt_set_format_options()
              void bt_set_format_options (bt_name_format * format,
                                          bt_namepart part,
                                          boolean abbrev,
                                          bt_joinmethod join_tokens,
                                          bt_joinmethod join_part)

           Allows further customization of a name format: you can set the
           abbreviation flag and the two token-join methods. Alas, there is
           no mechanism for leaving a value unchanged; you must set
           everything with bt_set_format_options().

           For example, let's say that just dropping periods from
           abbreviated tokens in the first name isn't enough; you really
           want to save space by jamming the abbreviated tokens together:
           "JR Smith" rather than "J R Smith". Assuming the two calls in the
           above example have been done, the following will finish the job:

              bt_set_format_options (format, BTN_FIRST,
                                     TRUE,         /* keep same value for abbrev flag */
                                     BTJ_NOTHING,  /* jam tokens together */
                                     BTJ_SPACE);   /* space after final token of part */

           Note that we unfortunately had to know (and supply) the current
           values for the abbreviation flag and post-part join method, even
           though we were only setting the intra-part join method.

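           To make the effect of the post-token text and the intra-part join
           method concrete, here is a small stand-alone sketch. It does not
           call btparse: "ab_abbrev" is a hypothetical helper that keeps
           only the first byte of each token, so it assumes non-empty ASCII
           tokens and ignores discretionary ties.

```c
#include <stdio.h>
#include <string.h>

/* Abbreviate each token to its first byte, append post_token after it,
   and put sep between tokens. Writes into out (outsz > 0); returns out. */
static char *ab_abbrev(const char **tokens, int n, const char *post_token,
                       const char *sep, char *out, size_t outsz)
{
    size_t used = 0;
    out[0] = '\0';
    for (int i = 0; i < n; i++) {
        int w = snprintf(out + used, outsz - used, "%c%s%s",
                         tokens[i][0], post_token, (i + 1 < n) ? sep : "");
        if (w < 0 || (size_t) w >= outsz - used) break;  /* truncated: stop */
        used += (size_t) w;
    }
    return out;
}
```

           With the tokens "John" and "Robert", a post-token of "." and a
           space separator give "J. R."; an empty post-token gives "J R";
           and an empty post-token with an empty separator gives "JR". These
           correspond to the three stages of customization described in the
           two examples above.
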
       bt_format_name()
              char * bt_format_name (bt_name * name, bt_name_format * format)

           Once a name format has been created and customized to your
           heart's content, you can use it to format any number of names
           that have been split with "bt_split_name" (see bt_split_names).
           Simply pass the name structure and name format structure, and a
           newly-allocated string containing the formatted name will be
           returned to you. It is your responsibility to free() this string.

SEE ALSO
       btparse, bt_split_names

AUTHOR
       Greg Ward <gward@python.net>

btparse, version 0.89              2023-07-21              BT_FORMAT_NAMES(1)