Text::BibTeX::Name(3pm)

Text::BibTeX::Name(3)    User Contributed Perl Documentation    Text::BibTeX::Name(3)

NAME

       Text::BibTeX::Name - interface to BibTeX-style author names

SYNOPSIS

          use Text::BibTeX::Name;
          use Text::BibTeX::NameFormat;

          $name = Text::BibTeX::Name->new();
          $name->split('J. Random Hacker');
          # or:
          $name = Text::BibTeX::Name->new('J. Random Hacker');

          @firstname_tokens = $name->part ('first');
          $lastname = join (' ', $name->part ('last'));

          $format = Text::BibTeX::NameFormat->new();
          # ...customize $format...
          $formatted = $name->format ($format);

DESCRIPTION

       "Text::BibTeX::Name" provides an abstraction for BibTeX-style names and
       some basic operations on them.  A name, in the BibTeX world, consists
       of a list of tokens which are divided amongst four parts: `first',
       `von', `last', and `jr'.

       Tokens are separated by whitespace or commas at brace-level zero.  Thus
       the name

          van der Graaf, Horace Q.

       has five tokens, whereas the name

          {Foo, Bar, and Sons}

       consists of a single token.  Skip down to "EXAMPLES" for more examples,
       or read on if you want to know the exact details of how names are split
       into tokens and parts.
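
       The tokenizing rule can be illustrated in plain Perl.  The following
       is only a sketch of the rule as described here, not the library's own
       C code; the function name "tokenize_name" is hypothetical and does not
       exist in Text::BibTeX.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::BibTeX: split a name string into
# tokens on whitespace or commas that occur at brace-level zero.
sub tokenize_name {
    my ($string) = @_;
    my @tokens;
    my $current = '';
    my $depth   = 0;

    for my $char (split //, $string) {
        if ($char eq '{') {
            $depth++;
        }
        elsif ($char eq '}') {
            $depth-- if $depth > 0;
        }

        if ($depth == 0 && $char =~ /[\s,]/) {
            # A separator outside braces ends the current token, if any.
            push @tokens, $current if length $current;
            $current = '';
        }
        else {
            # Everything else, including separators inside braces,
            # belongs to the current token.
            $current .= $char;
        }
    }
    push @tokens, $current if length $current;
    return @tokens;
}

print join('|', tokenize_name('van der Graaf, Horace Q.')), "\n";
print join('|', tokenize_name('{Foo, Bar, and Sons}')), "\n";
```

       The first call prints five tokens and the second prints one, matching
       the two names above.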

       How tokens are divided into parts depends on the form of the name.  If
       the name has no commas at brace-level zero (as in the second example),
       then it is assumed to be in either "first last" or "first von last"
       form.  If there are no tokens that start with a lower-case letter, then
       "first last" form is assumed: the final token is the last name, and all
       other tokens form the first name.  Otherwise, the earliest contiguous
       sequence of tokens with initial lower-case letters is taken as the
       `von' part, with the tokens before it forming the `first' part and the
       tokens after it the `last' part; if this sequence includes the final
       token, then a warning is printed and the final token is forced to be
       the `last' part.
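
       As a rough illustration of the comma-less case, the sketch below
       divides a list of tokens into parts using only the rules given above.
       The name "split_no_comma" is hypothetical, the lower-case test is
       ASCII-only for simplicity, and the real splitting code may differ in
       details not described here.

```perl
use strict;
use warnings;

# Hypothetical helper: divide the tokens of a comma-less name into parts,
# following only the rules described in the text.
sub split_no_comma {
    my @tokens = @_;
    my %part = (first => [], von => [], last => [], jr => []);
    return \%part unless @tokens;

    # Index of the first token that starts with a lower-case letter.
    my ($von_start) = grep { $tokens[$_] =~ /^[a-z]/ } 0 .. $#tokens;

    if (!defined $von_start) {
        # "first last" form: the final token is the last name.
        $part{last}  = [ $tokens[-1] ];
        $part{first} = [ @tokens[0 .. $#tokens - 1] ];
        return \%part;
    }

    # Extend the von part over the contiguous run of lower-case tokens.
    my $von_end = $von_start;
    $von_end++
        while $von_end < $#tokens && $tokens[$von_end + 1] =~ /^[a-z]/;

    if ($von_end == $#tokens) {
        # The run reached the final token: keep that token as the last name.
        warn "no last name after von part; forcing final token to be last\n";
        $von_end--;
    }

    $part{first} = [ @tokens[0 .. $von_start - 1] ];
    $part{von}   = [ @tokens[$von_start .. $von_end] ];
    $part{last}  = [ @tokens[$von_end + 1 .. $#tokens] ];
    return \%part;
}
```

       Called with the tokens of "Ludwig van Beethoven", this sketch puts
       "van" in the `von' part and "Beethoven" in the `last' part.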

       If a name has a single comma, then it is assumed to be in "von last,
       first" form.  A leading sequence of tokens with initial lower-case
       letters, if any, forms the `von' part; tokens between the `von' and the
       comma form the `last' part; tokens following the comma form the `first'
       part.  Again, if there are no tokens following a leading sequence of
       lowercase tokens, a warning is printed and the token immediately
       preceding the comma is taken to be the `last' part.

       If a name has two commas, it is assumed to be in "von last, jr, first"
       form.  (This is the only way to represent a name with a `jr' part.)
       The parsing of the name is the same as for a one-comma name, except
       that tokens between the two commas are taken to be the `jr' part.

       Finally, if a name has more than two commas, a warning is printed and
       the name is treated as though only the first two commas were present.
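
       The choice of form depends only on the number of commas at brace-level
       zero.  The sketch below shows that decision in isolation; "name_form"
       is a hypothetical name, and the strings it returns are just the labels
       used in this description.

```perl
use strict;
use warnings;

# Hypothetical helper: count commas at brace-level zero and report which
# form the name would be assumed to have.
sub name_form {
    my ($string) = @_;
    my ($depth, $commas) = (0, 0);

    for my $char (split //, $string) {
        if    ($char eq '{') { $depth++ }
        elsif ($char eq '}') { $depth-- if $depth > 0 }
        elsif ($char eq ',' && $depth == 0) { $commas++ }
    }

    if ($commas > 2) {
        # Mirror the documented behaviour: warn and ignore the extra commas.
        warn "too many commas; treating name as if it had only two\n";
        $commas = 2;
    }
    return ('first von last', 'von last, first', 'von last, jr, first')[$commas];
}
```

       For example, name_form('Doe, Jr., John') returns the two-comma label,
       while a fully braced name such as '{Foo, Bar, and Sons}' counts as
       having no commas at all.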

CAVEAT

       The C code that does the actual work of splitting up names takes a
       shortcut and makes a few assumptions about whitespace.  In particular,
       there must be no leading whitespace, no trailing whitespace, no
       consecutive whitespace characters in the string, and no whitespace
       characters other than space.  In other words, all whitespace must
       consist of lone internal spaces.
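
       One way to satisfy these assumptions is to clean up a string before
       splitting it.  The helper below is a sketch; "normalize_whitespace" is
       a hypothetical name, and this step may be redundant if the Perl layer
       already normalizes its input, which is not stated here.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce all whitespace to lone internal spaces.
sub normalize_whitespace {
    my ($string) = @_;
    $string =~ s/\s+/ /g;    # tabs, newlines and runs of blanks -> one space
    $string =~ s/^ //;       # drop a leading space
    $string =~ s/ $//;       # drop a trailing space
    return $string;
}
```

       The result can then be passed to "new" or "split" as usual.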

EXAMPLES

       The strings "John Smith" and "Smith, John" are different
       representations of the same name, so split into parts and tokens the
       same way, namely as:

          first => ('John')
          von   => ()
          last  => ('Smith')
          jr    => ()

       Note that every part is a list of tokens, even if there is only one
       token in that part; empty parts get empty token lists.  Every token is
       just a string.  Writing this example in actual code is simple:

          $name = Text::BibTeX::Name->new("John Smith");  # or "Smith, John"
          $name->part ('first');       # returns list ("John")
          $name->part ('last');        # returns list ("Smith")
          $name->part ('von');         # returns list ()
          $name->part ('jr');          # returns list ()

       (We'll omit the empty parts in the rest of the examples: just assume
       that any unmentioned part is an empty list.)  If more than two tokens
       are included and there's no comma, the extra tokens go to the first
       name: thus "John Q. Smith" splits into

          first => ("John", "Q.")
          last  => ("Smith")

       and "J. R. R. Tolkien" into

          first => ("J.", "R.", "R.")
          last  => ("Tolkien")

       The ambiguous name "Kevin Philips Bong" splits into

          first => ("Kevin", "Philips")
          last  => ("Bong")

       which may or may not be the right thing, depending on the particular
       person.  There's no way to know though, so if this fellow's last name
       is "Philips Bong" and not "Bong", the string representation of his name
       must disambiguate.  One possibility is "Philips Bong, Kevin" which
       splits into

          first => ("Kevin")
          last  => ("Philips", "Bong")

       Alternatively, "Kevin {Philips Bong}" takes advantage of the fact that
       tokens are only split on whitespace at brace-level zero, and becomes

          first => ("Kevin")
          last  => ("{Philips Bong}")

       which is fine if your names are destined to be processed by TeX, but
       might be problematic in other contexts.  Similarly, "St John-Mollusc,
       Oliver" becomes

          first => ("Oliver")
          last  => ("St", "John-Mollusc")

       which can also be written as "Oliver {St John-Mollusc}":

          first => ("Oliver")
          last  => ("{St John-Mollusc}")

       Since tokens are separated only by whitespace and commas, hyphenated
       names will work either way: both "Nigel Incubator-Jones" and
       "Incubator-Jones, Nigel" come out as

          first => ("Nigel")
          last  => ("Incubator-Jones")

       Multi-token last names with lowercase components -- the "von part" --
       work fine: both "Ludwig van Beethoven" and "van Beethoven, Ludwig"
       parse (correctly) into

          first => ("Ludwig")
          von   => ("van")
          last  => ("Beethoven")

       This allows these European aristocratic names to sort properly, i.e.
       van Beethoven under B rather than v.  Speaking of aristocratic European
       names, "Charles Louis Xavier Joseph de la Vall{\'e}e Poussin" is
       handled just fine, and splits into

          first => ("Charles", "Louis", "Xavier", "Joseph")
          von   => ("de", "la")
          last  => ("Vall{\'e}e", "Poussin")

       so could be sorted under V rather than d.  (Note that the sorting
       algorithm in Text::BibTeX::BibSort is a slavish imitation of BibTeX
       0.99, and therefore does the wrong thing with these names: the sort key
       starts with the "von" part.)
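
       If sorting under the last name is what you want, a sort key can be
       assembled from the parts yourself.  The following is a minimal sketch
       under that assumption, not what Text::BibTeX::BibSort does;
       "last_first_sort_key" is a hypothetical name, and it relies only on
       the "part" method documented here.  Braces and TeX accent commands in
       tokens are left untouched.

```perl
use strict;
use warnings;

# Hypothetical helper: build a lower-cased sort key that starts with the
# `last' part, followed by `von', `first' and `jr'.  $name is any object
# with a part() method that returns a list of tokens.
sub last_first_sort_key {
    my ($name) = @_;
    my @tokens = map { $name->part($_) } qw(last von first jr);
    return lc join ' ', @tokens;
}
```

       A list of names could then be ordered with something like

          @sorted = sort { last_first_sort_key($a) cmp last_first_sort_key($b) } @names;

       where @names holds "Text::BibTeX::Name" objects.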

       However, capitalized "von parts" don't work so well: "R. J. Van de
       Graaff" splits into

          first => ("R.", "J.", "Van")
          von   => ("de")
          last  => ("Graaff")

       which is clearly wrong.  This name should be represented as "Van de
       Graaff, R. J."

          first => ("R.", "J.")
          last  => ("Van", "de", "Graaff")

       which is probably right.  (This particular Van de Graaff was an
       American, so he probably belongs under V -- which is where my (British)
       dictionary puts him.  Other Van de Graaff's mileages may vary.)

       Many names also include a suffix: "Jr.", "III", "fils", and so forth.
       These are handled, but with some limitations.  If there's a comma
       before the suffix (the usual U.S. convention for "Jr."), then the
       name should be in last, jr, first form, e.g. "Doe, Jr., John" comes out
       (correctly) as

          first => ("John")
          last  => ("Doe")
          jr    => ("Jr.")

       but "John Doe, Jr." is ambiguous and is parsed as

          first => ("Jr.")
          last  => ("John", "Doe")

       (so don't do it that way).  If there's no comma before the suffix --
       the usual convention for Roman numerals, and occasionally seen with
       "Jr." -- then you're stuck and have to make the suffix part of the last
       name.  Thus, "Gates III, William H." comes out

          first => ("William", "H.")
          last  => ("Gates", "III")

       but "William H. Gates III" is ambiguous, and becomes

          first => ("William", "H.", "Gates")
          last  => ("III")

       -- not what you want.  Again, the curly-brace trick comes in handy, so
       "William H. {Gates III}" splits into

          first => ("William", "H.")
          last  => ("{Gates III}")

       There is no way to make a comma-less suffix the "jr" part.  (This is an
       unfortunate consequence of slavishly imitating BibTeX 0.99.)

       Finally, names that aren't really names of people but rather are
       organization or company names should be forced into a single token by
       wrapping them in curly braces.  For example, "Foo, Bar and Sons" should
       be written "{Foo, Bar and Sons}", which will split as

          last  => ("{Foo, Bar and Sons}")

       Of course, if this is one name in a BibTeX "author" or "editor" field,
       this name has to be wrapped in braces anyway (because of the " and "),
       but that's another story.

FORMATTING NAMES

       Putting a split-up name back together again in a flexible, customizable
       way is the job of another module: see Text::BibTeX::NameFormat.

METHODS

       new([ [OPTS,] NAME [, FILENAME, LINE, NAME_NUM]])
           Creates a new "Text::BibTeX::Name" object.  If NAME is supplied, it
           must be a string containing a single name, and it will be passed
           to the "split" method for further processing.  FILENAME, LINE, and
           NAME_NUM, if present, are all also passed to "split" to allow
           better error messages.

           If the first argument is a hash reference, it is used to define
           configuration values. At the moment the available values are:

           BINMODE
               Set the way Text::BibTeX deals with strings. By default it
               manages strings as bytes. You can set BINMODE to 'utf-8' to get
               NFC normalized UTF-8 strings and you can customise the
               normalization with the NORMALIZATION option.

                  Text::BibTeX::Name->new(
                     { binmode => 'utf-8', normalization => 'NFD' },
                     "Alberto Simões");

       split (NAME [, FILENAME, LINE, NAME_NUM])
           Splits NAME (a string containing a single name) into tokens and
           subsequently into the four parts of a BibTeX-style name (first,
           von, last, and jr).  (Each part is a list of tokens, and tokens are
           separated by whitespace or commas at brace-depth zero.  See above
           for full details on how a name is split into its component parts.)

           The token-lists that make up each part of the name are then stored
           in the "Text::BibTeX::Name" object for later retrieval or
           formatting with the "part" and "format" methods.

       part (PARTNAME)
           Returns the list of tokens in part PARTNAME of a name previously
           split with "split".  For example, suppose a "Text::BibTeX::Name"
           object is created and initialized like this:

              $name = Text::BibTeX::Name->new();
              # q{} keeps the backslash; in '...' the sequence \' would
              # collapse to a bare quote
              $name->split (q{Charles Louis Xavier Joseph de la Vall{\'e}e Poussin});

           Then this code:

              $name->part ('von');

           would return the list "('de','la')".
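
           To look at all four parts at once, "part" can be called once per
           part name.  The helper below is a small sketch; "name_parts" is a
           hypothetical name, and it assumes only that "part" returns a list
           of tokens in list context, as described above.

```perl
use strict;
use warnings;

# Hypothetical helper: collect all four parts of a split name into a hash
# of array references.  $name is any object with the part() method.
sub name_parts {
    my ($name) = @_;
    my %parts = map { $_ => [ $name->part($_) ] } qw(first von last jr);
    return \%parts;
}
```

           With the name above, the `von' entry of the returned hash would
           hold the two tokens "de" and "la", and the `jr' entry would be an
           empty list.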

       format (FORMAT)
           Formats a name according to the specifications encoded in FORMAT,
           which should be a "Text::BibTeX::NameFormat" (or descendant)
           object.  (In short, it must supply a method "apply" which takes a
           "Text::BibTeX::Name" object as its only argument.)  Returns the
           formatted name as a string.

           See Text::BibTeX::NameFormat for full details on formatting names.

AUTHOR

       Greg Ward <gward@python.net>

COPYRIGHT

       Copyright (c) 1997-2000 by Gregory P. Ward.  All rights reserved.  This
       file is part of the Text::BibTeX library.  This library is free
       software; you may redistribute it and/or modify it under the same terms
       as Perl itself.

perl v5.32.0                      2020-07-28             Text::BibTeX::Name(3)