BT_MISC(1)                          btparse                         BT_MISC(1)

NAME

       bt_misc - miscellaneous BibTeX-like string-processing utilities

SYNOPSIS

          void bt_purify_string (char * string, btshort options);
          void bt_change_case (char transform, char * string, btshort options);

DESCRIPTION

       bt_purify_string()
              void bt_purify_string (char * string, btshort options);

           "Purifies" a "string" in the BibTeX way (usually used for
           generating sort keys).  "string" is modified in-place.  "options"
           is currently unused; just set it to zero for future compatibility.
           Purification consists of copying alphanumeric characters,
           converting hyphens and ties to space, copying spaces, and skipping
           (almost) everything else.

           "Almost" because "special characters" (used for accented and
           non-English letters) are handled specially.  Recall that a BibTeX
           special character is any brace-group that starts at brace-depth
           zero and whose first character after the opening brace is a
           backslash.  For instance, the string

              {\foo bar}Herr M\"uller went from {P{\r r}erov} to {\AA}rhus

           contains two special characters: "{\foo bar}" and "{\AA}".
           Neither "\"u" nor "{\r r}" is a special character: "\"u" is not
           inside a brace group at all, and "{\r r}" starts at brace-depth
           one rather than zero.
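
           The definition above can be checked mechanically.  The following
           is a minimal sketch, not btparse code: the function name
           "special_char_starts" and its interface are hypothetical, and it
           assumes that every "{" and "}" counts toward brace depth (escaped
           braces such as "\{" are not treated specially).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of btparse): records the offsets of
 * brace groups that open at brace-depth zero and whose first character
 * after the '{' is a backslash.  At most `max` offsets are stored in
 * `starts` (which may be NULL); the total number found is returned.
 */
size_t special_char_starts(const char *s, size_t *starts, size_t max)
{
    size_t i, count = 0;
    int depth = 0;

    assert(s != NULL);
    for (i = 0; s[i] != '\0'; i++) {
        if (s[i] == '{') {
            /* Only a group opened at depth zero can be special. */
            if (depth == 0 && s[i + 1] == '\\') {
                if (starts != NULL && count < max)
                    starts[count] = i;
                count++;
            }
            depth++;
        } else if (s[i] == '}' && depth > 0) {
            depth--;
        }
    }
    return count;
}
```

           Applied to the example string, this sketch reports two groups,
           "{\foo bar}" and "{\AA}"; the "{\r r}" group is skipped because it
           opens at depth one.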

           Special characters are handled as follows: if the control sequence
           (the TeX command that follows the backslash) is recognized as one
           of LaTeX's "foreign letters" ("\oe", "\ae", "\o", "\l", "\aa",
           "\ss", plus uppercase versions), then it is converted to a
           reasonable English approximation by stripping the backslash and
           converting the second character (if any) to lowercase; thus,
           "{\AA}" in the above example would become simply "Aa".  All other
           control sequences in a special character are stripped, as are all
           non-alphabetic characters.

           For example, the above string, after "purification," becomes

              barHerr Muller went from Pr rerov to Aarhus

           Obviously, something has gone wrong with the word "{P{\r r}erov}"
           (a town in the Czech Republic).  The accented "r" should be a
           special character, starting at brace-depth zero, but the enclosing
           braces put it at depth one.  If the original string were instead

              {\foo bar}Herr M\"uller went from P{\r r}erov to {\AA}rhus

           then the purified result would be more sensible:

              barHerr Muller went from Prerov to Aarhus
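
           The purification rules can be sketched in a few dozen lines of C.
           This is an illustrative reimplementation written from the
           description in this page, not the btparse source; the names
           "purify_sketch" and "is_foreign_letter" are hypothetical.  It
           assumes that a tie is the "~" character, that only the plain
           space counts as a space, and that a foreign letter must be
           written entirely in lowercase or entirely in uppercase.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if the len-byte control-sequence name is one of the foreign
 * letters listed in the text, all lowercase or all uppercase. */
static int is_foreign_letter(const char *cs, size_t len)
{
    static const char *const names[] = { "oe", "ae", "o", "l", "aa", "ss" };
    char lower[3];
    size_t i;

    if (len < 1 || len > 2)
        return 0;
    for (i = 0; i < len; i++) {
        /* Reject mixed case such as \Oe. */
        if ((isupper((unsigned char)cs[i]) != 0) !=
            (isupper((unsigned char)cs[0]) != 0))
            return 0;
        lower[i] = (char)tolower((unsigned char)cs[i]);
    }
    lower[len] = '\0';
    for (i = 0; i < sizeof names / sizeof names[0]; i++)
        if (strcmp(lower, names[i]) == 0)
            return 1;
    return 0;
}

/* Hypothetical stand-in for bt_purify_string(): rewrites s in place. */
void purify_sketch(char *s)
{
    size_t src = 0, dst = 0;   /* dst never overtakes src */
    int depth = 0;

    assert(s != NULL);
    while (s[src] != '\0') {
        unsigned char c = (unsigned char)s[src];

        if (c == '{' && depth == 0 && s[src + 1] == '\\') {
            /* Special character: process up to its matching '}'. */
            int inner = 1;
            src++;
            while (s[src] != '\0' && inner > 0) {
                c = (unsigned char)s[src];
                if (c == '\\') {
                    size_t start = ++src, len;
                    while (isalpha((unsigned char)s[src]))
                        src++;
                    len = src - start;
                    if (len == 0) {
                        /* One-character control symbol such as \" */
                        if (s[src] != '\0')
                            src++;
                    } else if (is_foreign_letter(s + start, len)) {
                        /* Keep the first letter, lowercase the second. */
                        s[dst++] = s[start];
                        if (len == 2)
                            s[dst++] = (char)tolower((unsigned char)s[start + 1]);
                    }
                    /* Any other control sequence is dropped. */
                } else {
                    if (c == '{')
                        inner++;
                    else if (c == '}')
                        inner--;
                    else if (isalpha(c))
                        s[dst++] = (char)c;
                    src++;
                }
            }
        } else {
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                if (depth > 0)
                    depth--;
            } else if (isalnum(c) || c == ' ') {
                s[dst++] = (char)c;
            } else if (c == '-' || c == '~') {
                s[dst++] = ' ';
            }
            src++;
        }
    }
    s[dst] = '\0';
}
```

           Run on the two example strings above, this sketch yields the two
           purified results shown, including the unwanted "Pr rerov" for the
           over-braced form.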

           Note the use of a "nonsense" special character "{\foo bar}": this
           trick is often used to put certain text in a string solely for
           generating sort keys; the text is then ignored when the document is
           processed by TeX (as long as "\foo" is defined as a no-op TeX
           macro).  This assumes, of course, that the output is eventually
           processed by TeX; if not, then this trick will backfire on you.

           Also, "bt_purify_string()" is adequate for generating sort keys
           when you want to sort according to English-language conventions.
           To follow the conventions of other languages, though, a more
           sophisticated approach will be needed; hopefully, future versions
           of btparse will address this deficiency.

       bt_change_case()
              void bt_change_case (char transform, char * string, btshort options);

           Converts a string to lowercase, uppercase, or "non-book title
           capitalization", with special attention paid to BibTeX special
           characters and other brace-groups.  The form of conversion is
           selected by the single character "transform": 'u' to convert to
           uppercase, 'l' for lowercase, and 't' for "title capitalization".
           "string" is modified in-place, and "options" is currently unused;
           set it to zero for future compatibility.

           Lowercase and uppercase conversion are obvious, with the proviso
           that text in braces is treated differently (explained below).
           Title capitalization simply means that everything is converted to
           lowercase, except the first letter of the first word and the first
           letter of any word immediately following a colon or
           sentence-ending punctuation.  For instance,

              Flying Squirrels: Their Peculiar Habits. Part One

           would be converted to

              Flying squirrels: Their peculiar habits. Part one
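
           For text without braces, the title-capitalization rule can be
           sketched as a single pass with one flag.  This is a hedged
           illustration, not btparse code: the name "title_case_sketch" is
           hypothetical, the sketch ignores braces entirely, and it assumes
           that "sentence-ending punctuation" means ".", "!" or "?" and that
           only whitespace may separate the punctuation from the word whose
           first letter is kept.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch of 't' (title capitalization) for brace-free
 * text: lowercase every letter except the first letter of the string
 * and the first letter after ':', '.', '!' or '?'.  Kept letters are
 * left exactly as written; they are not forced to uppercase.
 */
void title_case_sketch(char *s)
{
    int keep_next = 1;   /* first letter of the first word is kept */
    size_t i;

    assert(s != NULL);
    for (i = 0; s[i] != '\0'; i++) {
        unsigned char c = (unsigned char)s[i];

        if (isalpha(c)) {
            if (!keep_next)
                s[i] = (char)tolower(c);
            keep_next = 0;
        } else if (c == ':' || c == '.' || c == '!' || c == '?') {
            keep_next = 1;
        } else if (!isspace(c)) {
            /* Anything else between punctuation and word cancels it. */
            keep_next = 0;
        }
    }
}
```

           On the example above, the sketch keeps "F", "T" and "P" and
           lowercases every other letter.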

           Text within braces is handled as follows.  First, in a "special
           character" (see above for definition), control sequences that
           constitute one of LaTeX's non-English letters are converted
           appropriately (e.g., when converting to lowercase, "\AE" becomes
           "\ae").  Any other control sequence in a special character
           (including accents) is preserved, and all text in a special
           character, regardless of depth and punctuation, is converted to
           lowercase or uppercase.  (For "title capitalization," all text in a
           special character is converted to lowercase.)

           Brace groups that are not special characters are left completely
           untouched: neither text nor control sequences inside them are
           changed.
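
           The brace rules for plain uppercase and lowercase conversion can
           be sketched as follows.  This is an illustration written from the
           description above, not the btparse implementation: the names
           "change_case_sketch", "conv" and "is_foreign_cs" are hypothetical,
           only the 'u' and 'l' transforms are covered, and the set of
           non-English letters is assumed to be the one listed for
           "bt_purify_string()".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Convert one character according to transform ('u' or 'l'). */
static char conv(char transform, char c)
{
    unsigned char uc = (unsigned char)c;
    return (char)(transform == 'u' ? toupper(uc) : tolower(uc));
}

/* Nonzero if the control-sequence name is a non-English letter. */
static int is_foreign_cs(const char *cs, size_t len)
{
    static const char *const names[] = { "oe", "ae", "o", "l", "aa", "ss" };
    char lower[3];
    size_t i;

    if (len < 1 || len > 2)
        return 0;
    for (i = 0; i < len; i++)
        lower[i] = (char)tolower((unsigned char)cs[i]);
    lower[len] = '\0';
    for (i = 0; i < sizeof names / sizeof names[0]; i++)
        if (strcmp(lower, names[i]) == 0)
            return 1;
    return 0;
}

/* Hypothetical sketch of bt_change_case() for 'u' and 'l' only. */
void change_case_sketch(char transform, char *s)
{
    size_t i = 0;
    int depth = 0;

    assert(s != NULL);
    assert(transform == 'u' || transform == 'l');
    while (s[i] != '\0') {
        if (s[i] == '{' && depth == 0 && s[i + 1] == '\\') {
            /* Special character: convert its text at any depth. */
            int inner = 1;
            i++;
            while (s[i] != '\0' && inner > 0) {
                if (s[i] == '\\') {
                    size_t start = ++i, k;
                    while (isalpha((unsigned char)s[i]))
                        i++;
                    if (i == start) {
                        /* Control symbol such as \' is preserved. */
                        if (s[i] != '\0')
                            i++;
                    } else if (is_foreign_cs(s + start, i - start)) {
                        for (k = start; k < i; k++)
                            s[k] = conv(transform, s[k]);
                    }
                    /* Other control sequences are left as written. */
                } else {
                    if (s[i] == '{')
                        inner++;
                    else if (s[i] == '}')
                        inner--;
                    else
                        s[i] = conv(transform, s[i]);
                    i++;
                }
            }
        } else {
            if (s[i] == '{') {
                depth++;
            } else if (s[i] == '}') {
                if (depth > 0)
                    depth--;
            } else if (depth == 0) {
                s[i] = conv(transform, s[i]);
            }
            /* Text inside a non-special brace group is untouched. */
            i++;
        }
    }
}
```

           With 'l', this sketch turns "{\AE}sop's {NASA} Fables" into
           "{\ae}sop's {NASA} fables": the special character is converted,
           while the ordinary brace group "{NASA}" is left alone.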

           For example, the string

              A Guide to \LaTeXe: Document Preparation ...

           would, when "transform" is 't' (title capitalization), be converted
           to

              A guide to \latexe: Document preparation ...

           which is probably not the desired result.  A better attempt is

              A Guide to {\LaTeXe}: Document Preparation ...

           which becomes

              A guide to {\LaTeXe}: Document preparation ...

           However, if you go back and re-read the description of
           "bt_purify_string()", you'll discover that "{\LaTeXe}" here is a
           special character, but not a non-English letter, so the control
           sequence is stripped.  A sort key generated from this title would
           therefore be

              A Guide to  Document Preparation

           ...oops!  The right solution (and this applies to any title with a
           TeX command that becomes actual text) is to bury the control
           sequence at brace-depth two:

              A Guide to {{\LaTeXe}}: Document Preparation ...

SEE ALSO

       btparse

AUTHOR

       Greg Ward <gward@python.net>



btparse, version 0.88             2020-01-30                        BT_MISC(1)