Text::BibTeX(3)       User Contributed Perl Documentation      Text::BibTeX(3)

NAME
Text::BibTeX - interface to read and parse BibTeX files

SYNOPSIS
    use Text::BibTeX;

    my $bibfile = Text::BibTeX::File->new("foo.bib");
    my $newfile = Text::BibTeX::File->new(">newfoo.bib");

    while (my $entry = Text::BibTeX::Entry->new($bibfile))
    {
        next unless $entry->parse_ok;

        # hack on $entry contents, using various
        # Text::BibTeX::Entry methods

        $entry->write($newfile);
    }

DESCRIPTION
The "Text::BibTeX" module serves mainly as a high-level introduction to
the "Text::BibTeX" library, for both code and documentation purposes.
The code loads the two fundamental modules for processing BibTeX files
("Text::BibTeX::File" and "Text::BibTeX::Entry"), and this
documentation gives a broad overview of the whole library that isn't
available in the documentation for the individual modules that comprise
it.

In addition, the "Text::BibTeX" module provides a number of
miscellaneous functions that are useful in processing BibTeX data
(especially the kind that comes from bibliographies as defined by
BibTeX 0.99, rather than generic database files). These functions
don't generally fit in the object-oriented class hierarchy centred
around the "Text::BibTeX::Entry" class, mainly because they are
specific to bibliographic data and operate on generic strings (rather
than being tied to a particular BibTeX entry). These are also
documented here, in "MISCELLANEOUS FUNCTIONS".

Note that every module described here begins with the "Text::BibTeX"
prefix. For brevity, I have dropped this prefix from most class and
module names in the rest of this manual page (and in most of the other
manual pages in the library).

MODULES AND CLASSES
The "Text::BibTeX" library includes a number of modules, many of which
provide classes. Usually, the relationship is simple and obvious: a
module provides a class of the same name---for instance, the
"Text::BibTeX::Entry" module provides the "Text::BibTeX::Entry" class.
There are a few exceptions, though: most obviously, the "Text::BibTeX"
module doesn't provide any classes itself; it merely loads two modules
("Text::BibTeX::Entry" and "Text::BibTeX::File") that do. The other
exceptions are mentioned in the descriptions below, and discussed in
detail in the documentation for the respective modules.

The modules are presented roughly in order of increasing
specialization: the first three are essential for any program that
processes BibTeX data files, regardless of what kind of data they hold.
The later modules are specialized for use with bibliographic databases,
and serve both to emulate BibTeX 0.99's standard styles and to provide
an example of how to define a database structure through such
specialized modules. Each module is fully documented in its respective
manual page.

"Text::BibTeX"
    Loads the two fundamental modules ("Entry" and "File"), and
    provides a number of miscellaneous functions that don't fit
    anywhere in the class hierarchy.

"Text::BibTeX::File"
    Provides an object-oriented interface to BibTeX database files. In
    addition to the obvious attributes of filename and filehandle, the
    "file" abstraction manages properties such as the database
    structure and options for it.

"Text::BibTeX::Entry"
    Provides an object-oriented interface to BibTeX entries, which can
    be parsed from "File" objects, arbitrary filehandles, or strings.
    Manages all the properties of a single entry: type, key, fields,
    and values. Also serves as the base class for the structured entry
    classes (described in detail in Text::BibTeX::Structure).

"Text::BibTeX::Value"
    Provides an object-oriented interface to values and simple values,
    high-level constructs that can be used to represent the strings
    associated with each field in an entry. Normally, field values are
    returned simply as Perl strings, with macros expanded and multiple
    strings "pasted" together. If desired, you can instruct
    "Text::BibTeX" to return "Text::BibTeX::Value" objects, which give
    you access to the original form of the data.

"Text::BibTeX::Structure"
    Provides the "Structure" and "StructuredEntry" classes, which serve
    primarily as base classes for the two kinds of classes that define
    database structures. Read this man page for a comprehensive
    description of the mechanism for implementing Perl classes
    analogous to BibTeX "style files".

"Text::BibTeX::Bib"
    Provides the "BibStructure" and "BibEntry" classes, which serve two
    purposes: they fulfill the same role as the standard style files of
    BibTeX 0.99, and they give an example of how to write new database
    structures. These ultimately derive from, respectively, the
    "Structure" and "StructuredEntry" classes provided by the
    "Structure" module.

"Text::BibTeX::BibSort"
    One of the "BibEntry" class's base classes: handles the generation
    of sort keys for sorting prior to output formatting.

"Text::BibTeX::BibFormat"
    One of the "BibEntry" class's base classes: handles the formatting
    of bibliographic data for output in a markup language such as
    LaTeX.

"Text::BibTeX::Name"
    A class used by the "Bib" structure and specific to bibliographic
    data as defined by BibTeX itself: parses individual author names
    into "first", "von", "last", and "jr" parts.

"Text::BibTeX::NameFormat"
    Also specific to bibliographic data: puts split-up names (as parsed
    by the "Name" class) back together in a custom way.

On a first pass through the library, you'll probably want to confine
your reading to Text::BibTeX::File and Text::BibTeX::Entry. The other
modules will come in handy eventually, especially if you need to
emulate BibTeX in a fairly fine-grained way (e.g. parsing names,
generating sort keys). But for the simple database hacks that are the
bread and butter of the "Text::BibTeX" library, the "File" and "Entry"
classes are the bulk of what you'll need. You may also find some of
the material in this manual page useful, namely "CONSTANT VALUES" and
"UTILITY FUNCTIONS".

EXPORTS
The "Text::BibTeX" module has a number of optional exports, most of
them constant values described in "CONSTANT VALUES" below. The default
exports are a subset of these constant values that are used
particularly often, the "entry metatypes" (also accessible via the
export tag "metatypes"). Thus, the following two lines are equivalent:

    use Text::BibTeX;
    use Text::BibTeX qw(:metatypes);

Some of the various subroutines provided by the module are also
exportable. "bibloop", "split_list", "purify_string", and
"change_case" are all useful in everyday processing of BibTeX data, but
don't really fit anywhere in the class hierarchy. They may be imported
from "Text::BibTeX" using the "subs" export tag. "check_class" and
"display_list" are also exportable, but only by name; they are not
included in any export tag. (These two mainly exist for use by other
modules in the library.) For instance, to use "Text::BibTeX" and
import the entry metatype constants and the common subroutines:

    use Text::BibTeX qw(:metatypes :subs);

Another group of subroutines exists for direct manipulation of the
macro table maintained by the underlying C library. These functions
(see "Macro table functions", below) allow you to define, delete, and
query the value of BibTeX macros (or "abbreviations"). They may be
imported en masse using the "macrosubs" export tag:

    use Text::BibTeX qw(:macrosubs);

CONSTANT VALUES
The "Text::BibTeX" module makes a number of constant values available.
These correspond to the values of various enumerated types in the
underlying C library, btparse, and their meanings are more fully
explained in the btparse documentation.

Each group of constants is optionally exportable using an export tag
given in the descriptions below.

Entry metatypes
    "BTE_UNKNOWN", "BTE_REGULAR", "BTE_COMMENT", "BTE_PREAMBLE",
    "BTE_MACRODEF". The "metatype" method in the "Entry" class always
    returns one of these values. The latter three describe,
    respectively, "comment", "preamble", and "string" entries;
    "BTE_REGULAR" describes all other entry types. "BTE_UNKNOWN"
    should never be seen (it's mainly useful for C code that might have
    to detect half-baked data structures). See also btparse. Export
    tag: "metatypes".

AST node types
    "BTAST_STRING", "BTAST_MACRO", "BTAST_NUMBER". Used to distinguish
    the three kinds of simple values---strings, macros, and numbers.
    The "SimpleValue" class's "type" method always returns one of these
    three values. See also Text::BibTeX::Value, btparse. Export tag:
    "nodetypes".

Name parts
    "BTN_FIRST", "BTN_VON", "BTN_LAST", "BTN_JR", "BTN_NONE". Used to
    specify the various parts of a name after it has been split up.
    These are mainly useful when using the "NameFormat" class. See
    also bt_split_names and bt_format_names. Export tag: "nameparts".

Join methods
    "BTJ_MAYTIE", "BTJ_SPACE", "BTJ_FORCETIE", "BTJ_NOTHING". Used to
    tell the "NameFormat" class how to join adjacent tokens together;
    see Text::BibTeX::NameFormat and bt_format_names. Export tag:
    "joinmethods".

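As a minimal sketch of how the entry metatype constants might be used,
the following parses a few in-memory entries with "parse_s" and labels
each one by its metatype. The sample entries and the %label table are
illustrative assumptions, not part of the library.

```perl
use strict;
use warnings;
use Text::BibTeX;    # the default import brings in the BTE_* constants

# Map each metatype constant to a human-readable label (illustrative).
my %label = (
    BTE_REGULAR()  => 'regular',
    BTE_COMMENT()  => 'comment',
    BTE_PREAMBLE() => 'preamble',
    BTE_MACRODEF() => 'macro definition',
);

# Hypothetical sample entries, one entry per string.
my @samples = (
    '@string{jan = "January"}',
    '@comment{ignored text}',
    '@article{knuth84, author = {Donald E. Knuth}, year = 1984}',
);

my @kinds;
for my $text (@samples) {
    my $entry = Text::BibTeX::Entry->new();
    $entry->parse_s($text);
    next unless $entry->parse_ok;    # skip anything with syntax errors
    push @kinds, $label{ $entry->metatype } // 'unknown';
}
print join(', ', @kinds), "\n";
```

The parentheses after each constant name keep Perl from treating it as
a quoted hash key on the left of "=>".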
UTILITY FUNCTIONS
"Text::BibTeX" provides several functions that operate outside of the
normal class hierarchy. Of these, only "bibloop" is likely to be of
much use to you in writing everyday BibTeX-hacking programs; the other
two ("check_class" and "display_list") are mainly provided for the use
of other modules in the library. They are documented here mainly for
completeness, but also because they might conceivably be useful in
other circumstances.

bibloop (ACTION, FILES [, DEST])
    Loops over all entries in a set of BibTeX files, performing some
    caller-supplied action on each entry. FILES should be a reference
    to the list of filenames to process, and ACTION a reference to a
    subroutine that will be called on each entry. DEST, if given,
    should be a "Text::BibTeX::File" object (opened for output) to
    which entries might be printed.

    The subroutine referenced by ACTION is called with exactly one
    argument: the "Text::BibTeX::Entry" object representing the entry
    currently being processed. Information about both the entry itself
    and the file where it originated is available through this object;
    see Text::BibTeX::Entry. The ACTION subroutine is only called if
    the entry was successfully parsed; any syntax errors will result in
    a warning message being printed, and that entry being skipped.
    Note that all successfully parsed entries are passed to the ACTION
    subroutine, even "preamble", "string", and "comment" entries. To
    skip these pseudo-entries and process only "regular" entries, your
    action subroutine should look something like this:

        sub action {
            my $entry = shift;
            return unless $entry->metatype == BTE_REGULAR;
            # process $entry ...
        }

    If your action subroutine needs any more arguments, you can just
    create a closure (anonymous subroutine) as a wrapper, and pass it
    to "bibloop":

        sub action {
            my ($entry, $extra_stuff) = @_;
            # ...
        }

        my $extra = ...;
        Text::BibTeX::bibloop (sub { action ($_[0], $extra) }, \@files);

    If the ACTION subroutine returns a true value and DEST was given,
    then the processed entry will be written to DEST.

check_class (PACKAGE, DESCRIPTION, SUPERCLASS, METHODS)
    Ensures that a PACKAGE implements a class meeting certain
    requirements. First, it inspects Perl's symbol tables to ensure
    that a package named PACKAGE actually exists. Then, it ensures
    that the class named by PACKAGE derives from SUPERCLASS (using the
    universal method "isa"). This derivation might be through multiple
    inheritance, or through several generations of a class hierarchy;
    the only requirement is that SUPERCLASS is somewhere in PACKAGE's
    tree of base classes. Finally, it checks that PACKAGE provides
    each method listed in METHODS (a reference to a list of method
    names). This is done with the universal method "can", so the
    methods might actually come from one of PACKAGE's base classes.

    DESCRIPTION should be a brief string describing the class that was
    expected to be provided by PACKAGE. It is used for generating
    warning messages if any of the class requirements are not met.

    This is mainly used by the supervisory code in
    "Text::BibTeX::Structure", to ensure that user-supplied structure
    modules meet the rules required of them.

display_list (LIST, QUOTE)
    Converts a list of strings to the grammatical conventions of a
    human language (currently, only English rules are supported). LIST
    must be a reference to a list of strings. If this list is empty,
    the empty string is returned. If it has one element, then just
    that element is returned. If it has two elements, then they are
    joined with the string " and " and the resulting string is
    returned. Otherwise, the list has N elements for N >= 3; elements
    1..N-1 are joined with commas, and the final element is tacked on
    with an intervening ", and ".

    If QUOTE is true, then each string is encased in single quotes
    before anything else is done.

    This is used elsewhere in the library for two very distinct
    purposes: for generating warning messages describing lists of
    fields that should be present or are conflicting in an entry, and
    for generating lists of author names in formatted bibliographies.

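The joining rules given for "display_list" can be sketched in plain
Perl as follows. The function name sketch_display_list is hypothetical,
and this is a reading of the rules as described above rather than the
library's own code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for display_list, written only to illustrate
# the rules described in the text.
sub sketch_display_list {
    my ($list, $quote) = @_;
    my @items = @$list;
    @items = map { "'$_'" } @items if $quote;    # quote first, then join

    return ''                    if @items == 0;
    return $items[0]             if @items == 1;
    return join(' and ', @items) if @items == 2;

    # Three or more: commas between all but the last, then ", and ".
    my $last = pop @items;
    return join(', ', @items) . ', and ' . $last;
}

print sketch_display_list([qw(author editor)], 1), "\n";
print sketch_display_list([qw(title year publisher)], 0), "\n";
```

The first call prints "'author' and 'editor'" and the second prints
"title, year, and publisher".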
MISCELLANEOUS FUNCTIONS
In addition to loading the "File" and "Entry" modules, "Text::BibTeX"
loads the XSUB code which bridges the Perl modules to the underlying C
library, btparse. This XSUB code provides a number of miscellaneous
utility functions, most of which are put into other packages in the
"Text::BibTeX" family for use by the corresponding classes. (For
instance, the XSUB code loaded by "Text::BibTeX" provides a function
"Text::BibTeX::Entry::parse", which is actually documented as the
"parse" method of the "Text::BibTeX::Entry" class; see
Text::BibTeX::Entry.) However, for completeness, this function and all
the other functions that become available when you "use Text::BibTeX"
are at least mentioned here. The only functions from this group that
you're ever likely to use are described in "Generic string-processing
functions".

Startup/shutdown functions
These just initialize and shut down the underlying C library. Don't
call either one of them; the "Text::BibTeX" startup/shutdown code takes
care of it as appropriate. They're just mentioned here for
completeness.

    initialize ()
    cleanup ()

Generic string-processing functions
split_list (STRING, DELIM [, FILENAME [, LINE [, DESCRIPTION [, OPTS]]]])
    Splits a string on a fixed delimiter according to the BibTeX rules
    for splitting up lists of names. With BibTeX, the delimiter is
    hard-coded as "and"; here, you can supply any string. Instances of
    DELIM in STRING are considered delimiters if they are at
    brace-depth zero, surrounded by whitespace, and not at the
    beginning or end of STRING; the comparison is case-insensitive. See
    bt_split_names for full details of how splitting is done (it's not
    the same as Perl's "split" function). OPTS is a hash ref of the
    same binmode and normalization arguments as with, e.g.,
    Text::BibTeX::File->open(). split_list calls isplit_list()
    internally but handles UTF-8 conversion and normalization, if
    requested.

    Returns the list of strings resulting from splitting STRING on
    DELIM.

isplit_list (STRING, DELIM [, FILENAME [, LINE [, DESCRIPTION]]])
    Splits a string on a fixed delimiter according to the BibTeX rules
    for splitting up lists of names. With BibTeX, the delimiter is
    hard-coded as "and"; here, you can supply any string. Instances of
    DELIM in STRING are considered delimiters if they are at
    brace-depth zero, surrounded by whitespace, and not at the
    beginning or end of STRING; the comparison is case-insensitive. See
    bt_split_names for full details of how splitting is done (it's not
    the same as Perl's "split" function). This function returns bytes.
    Use Text::BibTeX::split_list to specify the same binmode and
    normalization arguments as with, e.g., Text::BibTeX::File->open().

    Returns the list of strings resulting from splitting STRING on
    DELIM.

purify_string (STRING [, OPTIONS])
    "Purifies" STRING in the BibTeX way (usually for generation of sort
    keys). See bt_misc for details; note that, unlike the C interface,
    "purify_string" does not modify STRING in-place. A purified copy
    of the input string is returned.

    OPTIONS is currently unused.

change_case (TRANSFORM, STRING [, OPTIONS])
    Transforms the case of STRING according to TRANSFORM (a single
    character, one of 'u', 'l', or 't'). See bt_misc for details;
    again, "change_case" differs from the C interface in that STRING is
    not modified in-place---the input string is copied, and the
    transformed copy is returned.

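A short sketch of how these functions might be combined, assuming they
are imported with the "subs" export tag described in "EXPORTS". The
author string and title are made-up sample data.

```perl
use strict;
use warnings;
use Text::BibTeX qw(:subs);

# Sample data: the braced "and" sits at brace depth one, so it should
# not be treated as a delimiter.
my $authors = 'Donald E. Knuth and {Barnes and Noble} and Leslie Lamport';

my @names = split_list($authors, 'and');
print scalar(@names), " names\n";
print "  $_\n" for @names;

# A rough sort key: purify the first name, then lowercase the result.
# Neither call modifies its argument; each returns a new string.
my $key = change_case('l', purify_string($names[0]));
print "sort key: $key\n";

# Title-style casing ('t'), as applied to a title field.
print change_case('t', 'The {TeX}book: A Gentle Introduction'), "\n";
```

Because both functions return copies, they can be nested freely
without disturbing the original field values.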
Entry-parsing functions
Although these functions are provided by the "Text::BibTeX" module,
they are actually in the "Text::BibTeX::Entry" package. That's because
they are implemented in C, and thus loaded with the XSUB code that
"Text::BibTeX" loads; however, they are actually methods in the
"Text::BibTeX::Entry" class. Thus, they are documented as methods in
Text::BibTeX::Entry.

    parse (ENTRY_STRUCT, FILENAME, FILEHANDLE)
    parse_s (ENTRY_STRUCT, TEXT)

Macro table functions
These functions allow direct access to the macro table maintained by
btparse, the C library underlying "Text::BibTeX". In the normal course
of events, macro definitions always accumulate, and are only defined as
a result of parsing a macro definition (@string) entry. btparse never
deletes old macro definitions for you, and doesn't have any built-in
default macros. If, for example, you wish to start fresh with new
macros for every file, use "delete_all_macros". If you wish to
pre-define certain macros, use "add_macro_text". (But note that the
"Bib" structure, as part of its mission to emulate BibTeX 0.99, defines
the standard "month name" macros for you.)

See also bt_macros in the btparse documentation for a description of
the C interface to these functions.

add_macro_text (MACRO, TEXT [, FILENAME [, LINE]])
    Defines a new macro, or redefines an old one. MACRO is the name of
    the macro, and TEXT is the text it should expand to. FILENAME and
    LINE are just used to generate any warnings about the macro
    definition. The only such warning occurs when you redefine an old
    macro: its value is overridden, and add_macro_text() issues a
    warning saying so.

delete_macro (MACRO)
    Deletes a macro from the macro table. If MACRO isn't defined,
    takes no action.

delete_all_macros ()
    Deletes all macros from the macro table, even the predefined month
    names.

macro_length (MACRO)
    Returns the length of a macro's expansion text. If the macro is
    undefined, returns 0; no warning is issued.

macro_text (MACRO [, FILENAME [, LINE]])
    Returns the expansion text of a macro. If the macro is not
    defined, issues a warning and returns "undef". FILENAME and LINE,
    if supplied, are used for generating this warning; they should be
    supplied if you're looking up the macro as a result of finding it
    in a file.

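The following is a small sketch of the macro table functions in use,
assuming they are imported with the "macrosubs" export tag. The macro
name "acm" and the @misc entry are made-up sample data.

```perl
use strict;
use warnings;
use Text::BibTeX qw(:macrosubs);

# Start from a clean slate, then pre-define one macro.
delete_all_macros();
add_macro_text('acm', 'Association for Computing Machinery');

# macro_length is a quiet way to test for a definition: it returns 0
# for an undefined macro, whereas macro_text would issue a warning.
if (macro_length('acm') > 0) {
    print 'acm expands to: ', macro_text('acm'), "\n";
}

# An entry parsed afterwards can refer to the macro by name.
my $entry = Text::BibTeX::Entry->new();
$entry->parse_s('@misc{example, organization = acm}');
print $entry->get('organization'), "\n" if $entry->parse_ok;

# Remove the macro again; a second delete is a harmless no-op.
delete_macro('acm');
delete_macro('acm');
print "acm is gone\n" if macro_length('acm') == 0;
```

Checking "macro_length" before calling "macro_text" avoids the warning
that "macro_text" issues for an undefined macro.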
Name-parsing functions
These are both private functions for the use of the "Name" class, and
therefore are put in the "Text::BibTeX::Name" package. You should use
the interface provided by that class for parsing names in the BibTeX
style.

    _split (NAME_STRUCT, NAME, FILENAME, LINE, NAME_NUM, KEEP_CSTRUCT)
    free (NAME_STRUCT)

Name-formatting functions
These are private functions for the use of the "NameFormat" class, and
therefore are put in the "Text::BibTeX::NameFormat" package. You
should use the interface provided by that class for formatting names in
the BibTeX style.

    create ([PARTS [, ABBREV_FIRST]])
    free (FORMAT_STRUCT)
    _set_text (FORMAT_STRUCT, PART, PRE_PART, POST_PART, PRE_TOKEN,
               POST_TOKEN)
    _set_options (FORMAT_STRUCT, PART, ABBREV, JOIN_TOKENS, JOIN_PART)
    format_name (NAME_STRUCT, FORMAT_STRUCT)

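For comparison, here is a rough sketch of the class-level route that
the text recommends over the private functions. It relies on the "new"
and "part" methods of the "Name" class and the "new" and "apply"
methods of the "NameFormat" class, as I understand them from those
modules' own manual pages; check there for the authoritative
signatures. The name used is sample data.

```perl
use strict;
use warnings;
use Text::BibTeX;
use Text::BibTeX::Name;
use Text::BibTeX::NameFormat;

# Parse one name into its parts.
my $name = Text::BibTeX::Name->new('Ludwig van Beethoven');
for my $part (qw(first von last)) {
    my @tokens = $name->part($part);    # list of tokens for this part
    print "$part: @tokens\n" if @tokens;
}

# Reassemble in "von last, jr, first" order, abbreviating first names.
my $format = Text::BibTeX::NameFormat->new('vljf', 1);
print $format->apply($name), "\n";
```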
BUGS AND LIMITATIONS
"Text::BibTeX" inherits several limitations from its base C library,
btparse; see "BUGS AND LIMITATIONS" in btparse for details. In
addition, "Text::BibTeX" will not work with a Perl binary built using
the "sfio" library. This is because Perl's I/O abstraction layer does
not extend to third-party C libraries that use stdio, and btparse most
certainly does use stdio.

SEE ALSO
btool_faq, Text::BibTeX::File, Text::BibTeX::Entry, Text::BibTeX::Value

AUTHOR
Greg Ward <gward@python.net>

COPYRIGHT
Copyright (c) 1997-2000 by Gregory P. Ward. All rights reserved. This
file is part of the Text::BibTeX library. This library is free
software; you may redistribute it and/or modify it under the same terms
as Perl itself.

perl v5.36.0                      2023-01-29                   Text::BibTeX(3)