BT_FORMAT_NAMES(1)                  btparse                 BT_FORMAT_NAMES(1)

NAME

       bt_format_names - formatting BibTeX names for consistent output

SYNOPSIS

          bt_name_format * bt_create_name_format (char * parts,
                                                  boolean abbrev_first);
          void bt_free_name_format (bt_name_format * format);
          void bt_set_format_text (bt_name_format * format,
                                   bt_namepart part,
                                   char * pre_part,
                                   char * post_part,
                                   char * pre_token,
                                   char * post_token);
          void bt_set_format_options (bt_name_format * format,
                                      bt_namepart part,
                                      boolean abbrev,
                                      bt_joinmethod join_tokens,
                                      bt_joinmethod join_part);
          char * bt_format_name (bt_name * name, bt_name_format * format);

DESCRIPTION

       After splitting a name into its component parts (represented as a
       "bt_name" structure), you often want to put it back together again as a
       single string in a consistent way.  btparse provides a very flexible
       way to do this, generally in two stages: first, you create a "name
       format" which describes how to put the tokens and parts of any name
       back together, and then you apply the format to a particular name.

       The "name format" is encapsulated in a "bt_name_format" structure,
       which is created with bt_create_name_format().  This function includes
       some clever trickery that means you can usually get away with calling
       it alone, and not need to do any customization of the format.  If you
       do need to customize the format, though, bt_set_format_text() and
       bt_set_format_options() provide that capability.

       The format is then applied to a particular name by bt_format_name(),
       which returns a new string.

       The format controls the following:

       •   which name parts are printed, and in what order (e.g. "first von
           last jr", or "von last jr first")

       •   the text that precedes and follows each part (e.g. if the first
           name follows the last name, you probably want a comma before the
           `first' part: "Smith, John" rather than "Smith John")

       •   the text that precedes and follows each token (e.g. if the first
           name is abbreviated, you may want a period after each token: "J. R.
           Smith" rather than "J R Smith")

       •   the method used to join the tokens of each part together

       •   the method used to join each part to the following part

       All of these except the list of parts to format are kept in arrays
       indexed by name part: for example, the structure has a field

          char * post_token[BT_MAX_NAMEPARTS]

       and "post_token[BTN_FIRST]" ("BTN_FIRST" is from the "bt_namepart"
       "enum") is the string to be added after each token in the first
       name---for example, "." if the first name is to be abbreviated in the
       conventional way.

       Yet another "enum", "bt_joinmethod", describes the available methods
       for joining tokens together.  Note that there are two sets of join
       methods in a name format: between tokens within a single part, and
       between the tokens of two different parts.  The first allows you, for
       example, to change "J R Smith" (first name abbreviated with no
       post-token text but tokens joined by a space) to "JR Smith" (the same,
       but first-name tokens jammed together).  The second is mainly used to
       ensure that "von" and "last" name-parts may be joined with a tie:
       "de~Roche" rather than "de Roche".

       The token join methods are:

       BTJ_MAYTIE
           Insert a "discretionary tie" between tokens.  That is, either a
           space or a "tie" is inserted, depending on context.  (A "tie,"
           otherwise known as unbreakable space, is currently hard-coded as
           "~"---from TeX.)

       BTJ_SPACE
           Always insert a space between tokens.

       BTJ_FORCETIE
           Always insert a "tie" ("~") between tokens.

       BTJ_NOTHING
           Insert nothing between tokens---just jam them together.

       Tokens are joined together, and thus the choice of whether to insert a
       "discretionary tie" is made, at two places: within a part and between
       two parts.  Naturally, this only applies when "BTJ_MAYTIE" was supplied
       as the token-join method; "BTJ_SPACE" and "BTJ_FORCETIE" always insert
       either a space or tie, and "BTJ_NOTHING" always adds nothing between
       tokens.  Within a part, ties are added after the first token if it is
       less than three characters long, and before the last token.  Between
       parts, a tie is added only if the preceding part consisted of a single
       token that was less than three characters long.  In all other cases,
       spaces are inserted.  (This implementation slavishly follows BibTeX.)

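       The discretionary-tie rule above can be sketched in plain C.  This is
       only an illustration of the rule as worded here, not btparse's own
       code: the helper names are hypothetical, and token length is measured
       with strlen(), which is an assumption (the library may count
       characters differently for TeX markup).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of btparse.  Decides whether the gap after
 * tokens[i] (inside one part) gets a tie: after a first token shorter than
 * three characters, and before the last token. */
bool tie_within_part(const char *const *tokens, size_t ntokens, size_t i)
{
    bool after_short_first = (i == 0 && strlen(tokens[0]) < 3);
    bool before_last = (i + 2 == ntokens);
    return after_short_first || before_last;
}

/* Hypothetical helper: between two parts, tie only if the preceding part
 * is a single token shorter than three characters. */
bool tie_between_parts(const char *const *tokens, size_t ntokens)
{
    return ntokens == 1 && strlen(tokens[0]) < 3;
}

/* Joins the tokens of one part into out, using "~" or " " per the rule.
 * The result is truncated (but still NUL-terminated) if out is too small. */
void join_part_maytie(const char *const *tokens, size_t ntokens,
                      char *out, size_t outsize)
{
    size_t used = 0;

    if (outsize == 0)
        return;
    out[0] = '\0';
    for (size_t i = 0; i < ntokens; i++) {
        const char *sep = "";
        if (i + 1 < ntokens)
            sep = tie_within_part(tokens, ntokens, i) ? "~" : " ";
        int n = snprintf(out + used, outsize - used, "%s%s", tokens[i], sep);
        if (n < 0 || (size_t)n >= outsize - used)
            return;
        used += (size_t)n;
    }
}
```

       Under these assumptions, the tokens "John", "Ronald", "Reuel" join as
       "John Ronald~Reuel", and a `von' part consisting only of "de" is tied
       to the part that follows it.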

FUNCTIONS

       bt_create_name_format()
              bt_name_format * bt_create_name_format (char * parts,
                                                      boolean abbrev_first)

           Creates a name format for a given set of parts, with variations for
           the most common forms of customization---the order of parts and
           whether to abbreviate the first name.

           The "parts" parameter specifies which parts to include in a
           formatted name, as well as the order in which to format them.
           "parts" must be a string of four or fewer characters, each of which
           denotes one of the four name parts: for instance, "vljf" means to
           format all four parts in "von last jr first" order.  No characters
           outside of the set "fvlj" are allowed, and no characters may be
           repeated.  "abbrev_first" controls whether the `first' part will be
           abbreviated (i.e., only the first letter from each token will be
           printed).

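           If you build the "parts" string at run time, it may be worth
           checking it against the constraints above before passing it on.
           The following is a minimal sketch; parts_string_ok() is a
           hypothetical helper, not a btparse function.  It accepts the empty
           string under a literal reading of "four or fewer characters";
           whether the library itself accepts that is not stated here.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical check, not part of btparse: returns true if parts has at
 * most four characters, all drawn from "fvlj", with none repeated. */
bool parts_string_ok(const char *parts)
{
    bool seen[256] = { false };
    size_t len;

    if (parts == NULL)
        return false;
    len = strlen(parts);
    if (len > 4)
        return false;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)parts[i];
        if (strchr("fvlj", c) == NULL)
            return false;       /* character outside the allowed set */
        if (seen[c])
            return false;       /* repeated part letter */
        seen[c] = true;
    }
    return true;
}
```

           With this check, "vljf" and "vl" pass, while "ff" (repeated
           letter) and "fxl" (letter outside the set) are rejected.
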
           In addition to simply setting the list of parts to format and the
           "abbreviate" flag for the first name, bt_create_name_format()
           initializes the entire format structure so as to minimize the need
           for further customizations:

           •   The "token join method"---what to insert between tokens of the
               same part---is set to "BTJ_MAYTIE" (discretionary tie) for all
               parts

           •   The "part join method"---what to insert after the final token
               of a particular part, assuming there are more parts to
               come---is set to "BTJ_SPACE" for the `first', `last', and `jr'
               parts.  If the `von' part is present and immediately precedes
               the `last' part (which will almost always be the case),
               "BTJ_MAYTIE" is used to join `von' to `last'; otherwise, `von'
               also gets "BTJ_SPACE" for the inter-part join method.

           •   The abbreviation flag is set to "FALSE" for the `von', `last',
               and `jr' parts; for `first', the abbreviation flag is set to
               whatever you pass in as "abbrev_first".

           •   Initially, all "surrounding text" (pre-part, post-part,
               pre-token, and post-token) for all parts is set to the empty
               string.  Then a few tweaks are done, depending on the
               "abbrev_first" flag and the order of tokens.  First, if
               "abbrev_first" is "TRUE", the post-token text for first name is
               set to "."---this changes "J R Smith" to "J. R. Smith", which
               is usually the desired form.  (If you don't want the periods,
               you'll have to set the post-token text yourself with
               bt_set_format_text().)

               Then, if `jr' is present and immediately after `last' (almost
               always the case), the pre-part text for `jr' is set to ", ",
               and the inter-part join method for `last' is set to
               "BTJ_NOTHING".  This changes "John Smith Jr" (where the space
               following "Smith" comes from formatting the last name with a
               "BTJ_SPACE" inter-part join method) to "John Smith, Jr" (where
               the ", " is now associated with "Jr"---that way, if there is no
               `jr' part, the ", " will not be printed.)

               Finally, if `first' is present and immediately follows either
               `jr' or `last' (which will usually be the case in "last-name
               first" formats), the same sort of trickery is applied: the
               pre-part text for `first' is set to ", ", and the part join
               method for the preceding part (either `jr' or `last') is set to
               "BTJ_NOTHING".

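           The defaulting rules in the list above can be restated as a small
           self-contained sketch.  Everything in it is a hypothetical
           stand-in (the enums, the struct, and fill_defaults() are not the
           btparse types or functions), and it records only the fields the
           rules actually change; post-part and pre-token text stay empty
           and are left out.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for this sketch; not the btparse definitions. */
enum part { P_FIRST, P_VON, P_LAST, P_JR, P_COUNT };
enum join { J_MAYTIE, J_SPACE, J_FORCETIE, J_NOTHING };

struct fmt_defaults {
    bool        abbrev[P_COUNT];
    enum join   join_tokens[P_COUNT];
    enum join   join_part[P_COUNT];
    const char *pre_part[P_COUNT];
    const char *post_token[P_COUNT];
};

/* True if letter b comes directly after letter a in the parts string. */
bool follows(const char *parts, char a, char b)
{
    const char *p = strchr(parts, a);
    return p != NULL && p[1] == b;
}

/* Fills d with the defaults described above for a given parts string. */
void fill_defaults(struct fmt_defaults *d, const char *parts,
                   bool abbrev_first)
{
    for (int i = 0; i < P_COUNT; i++) {
        d->abbrev[i] = false;
        d->join_tokens[i] = J_MAYTIE;
        d->join_part[i] = J_SPACE;
        d->pre_part[i] = "";
        d->post_token[i] = "";
    }
    d->abbrev[P_FIRST] = abbrev_first;
    if (abbrev_first)
        d->post_token[P_FIRST] = ".";

    /* von directly before last: discretionary tie between the parts */
    if (follows(parts, 'v', 'l'))
        d->join_part[P_VON] = J_MAYTIE;

    /* jr directly after last: the comma belongs to jr */
    if (follows(parts, 'l', 'j')) {
        d->pre_part[P_JR] = ", ";
        d->join_part[P_LAST] = J_NOTHING;
    }

    /* first directly after jr or last: same trick for "last, first" */
    if (follows(parts, 'j', 'f')) {
        d->pre_part[P_FIRST] = ", ";
        d->join_part[P_JR] = J_NOTHING;
    } else if (follows(parts, 'l', 'f')) {
        d->pre_part[P_FIRST] = ", ";
        d->join_part[P_LAST] = J_NOTHING;
    }
}
```

           For "vljf" with abbreviation on, this gives a discretionary tie
           after `von', nothing after `last' and `jr', ", " before both `jr'
           and `first', and "." after each first-name token.
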
           While all these rules are rather complicated, they mean that you
           are usually freed from having to do any customization of the name
           format.  Certainly this is the case if you only need "fvlj" and
           "vljf" part orders, only want to abbreviate the first name, want
           periods after abbreviated tokens, non-breaking spaces in the
           "right" places, and commas in the conventional places.

           If you want something out of the ordinary---for instance,
           abbreviated tokens jammed together with no punctuation, or
           abbreviated last names---you'll need to customize the name format a
           bit with bt_set_format_text() and bt_set_format_options().

       bt_free_name_format()
              void bt_free_name_format (bt_name_format * format)

           Frees a name format created by bt_create_name_format().

       bt_set_format_text()
              void bt_set_format_text (bt_name_format * format,
                                       bt_namepart part,
                                       char * pre_part,
                                       char * post_part,
                                       char * pre_token,
                                       char * post_token)

           Allows you to customize some or all of the surrounding text for a
           single name part.  Supply "NULL" for any chunk of text that you
           don't want to change.

           For instance, say you want a name format that will abbreviate first
           names, but without any punctuation after the abbreviated tokens.
           You could create and customize the format as follows:

              format = bt_create_name_format ("fvlj", TRUE);
              bt_set_format_text (format,
                                  BTN_FIRST,       /* name-part to customize */
                                  NULL, NULL,      /* pre- and post- part text */
                                  NULL, "");       /* empty string for post-token */

           Without the bt_set_format_text() call, "format" would result in
           names formatted like "J. R. Smith".  After setting the post-token
           text for first names to "", this name would become "J R Smith".

       bt_set_format_options()
              void bt_set_format_options (bt_name_format * format,
                                          bt_namepart part,
                                          boolean abbrev,
                                          bt_joinmethod join_tokens,
                                          bt_joinmethod join_part)

           Allows further customization of a name format: you can set the
           abbreviation flag and the two token-join methods.  Alas, there is
           no mechanism for leaving a value unchanged; you must set everything
           with bt_set_format_options().

           For example, let's say that just dropping periods from abbreviated
           tokens in the first name isn't enough; you really want to save
           space by jamming the abbreviated tokens together: "JR Smith" rather
           than "J R Smith".  Assuming the two calls in the above example have
           been done, the following will finish the job:

              bt_set_format_options (format, BTN_FIRST,
                                     TRUE,         /* keep same value for abbrev flag */
                                     BTJ_NOTHING,  /* jam tokens together */
                                     BTJ_SPACE);   /* space after final token of part */

           Note that we unfortunately had to know (and supply) the current
           values for the abbreviation flag and post-part join method, even
           though we were only setting the intra-part join method.

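           To see how the post-token text and the token join method combine,
           here is a standalone sketch that produces the three first-name
           forms discussed above.  abbreviate_tokens() is a hypothetical
           helper, not a btparse function, and it simply takes the first
           byte of each token; that is an assumption, since real names may
           contain braces, TeX accents, or multi-byte characters.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of btparse: abbreviates each token to its
 * first byte, appends post_token after it, and puts sep between tokens.
 * The result is truncated (but NUL-terminated) if out is too small. */
void abbreviate_tokens(const char *const *tokens, size_t ntokens,
                       const char *post_token, const char *sep,
                       char *out, size_t outsize)
{
    size_t used = 0;

    if (outsize == 0)
        return;
    out[0] = '\0';
    for (size_t i = 0; i < ntokens; i++) {
        int n = snprintf(out + used, outsize - used, "%.1s%s%s",
                         tokens[i], post_token,
                         i + 1 < ntokens ? sep : "");
        if (n < 0 || (size_t)n >= outsize - used)
            return;
        used += (size_t)n;
    }
}
```

           For the tokens "John" and "Ronald", a post-token of "." with a
           space separator gives "J. R."; an empty post-token gives "J R";
           an empty post-token with an empty separator gives "JR".
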
       bt_format_name()
              char * bt_format_name (bt_name * name, bt_name_format * format)

           Once a name format has been created and customized to your heart's
           content, you can use it to format any number of names that have
           been split with "bt_split_name" (see bt_split_names).  Simply pass
           the name structure and name format structure, and a newly-allocated
           string containing the formatted name will be returned to you.  It
           is your responsibility to free() this string.

SEE ALSO

       btparse, bt_split_names

AUTHOR

       Greg Ward <gward@python.net>

btparse, version 0.89             2023-01-29                BT_FORMAT_NAMES(1)