BT_MISC(1) btparse BT_MISC(1)

NAME
bt_misc - miscellaneous BibTeX-like string-processing utilities

SYNOPSIS
    void bt_purify_string (char * string, btshort options);
    void bt_change_case (char transform, char * string, btshort options);

DESCRIPTION
bt_purify_string()
    void bt_purify_string (char * string, btshort options);

"Purifies" a "string" in the BibTeX way (usually used for
generating sort keys). "string" is modified in-place. "options"
is currently unused; just set it to zero for future compatibility.
Purification consists of copying alphanumeric characters,
converting hyphens and ties to space, copying spaces, and skipping
(almost) everything else.
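
The basic rules can be sketched in a few lines of C. This is a minimal illustration, not btparse's implementation: the helper name purify_plain is hypothetical, a "tie" is assumed to be the TeX tie character '~', and special characters are deliberately ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of btparse. Applies only the basic rules:
 * keep alphanumerics and spaces, turn '-' and '~' into a space, drop the
 * rest. Special characters are not recognized here. */
void purify_plain(char *s)
{
    assert(s != NULL);
    char *out = s;                    /* write position; never passes the read position */
    for (const char *in = s; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        if (isalnum(c) || c == ' ')
            *out++ = (char)c;         /* copied unchanged */
        else if (strchr("-~", c) != NULL)
            *out++ = ' ';             /* hyphen or tie becomes a space */
        /* anything else (punctuation, braces, backslashes) is skipped */
    }
    *out = '\0';
}
```

Called on a writable buffer holding "Smith-Jones, J.~R. (1999)", this sketch leaves "Smith Jones J R 1999" in the buffer. With btparse itself the corresponding call would be bt_purify_string (buffer, 0).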

"Almost" because "special characters" (used for accented and
non-English letters) are handled specially. Recall that a BibTeX
special character is any brace-group that starts at brace-depth
zero whose first character is a backslash. For instance, the
string

    {\foo bar}Herr M\"uller went from {P{\r r}erov} to {\AA}rhus

contains two special characters: "{\foo bar}" and "{\AA}". Neither
"\"u" nor "{\r r}" is a special character: the former is not in a
brace-group at all, and the latter starts at brace-depth one rather
than zero.
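
The definition amounts to a single scan that tracks brace depth. The sketch below is illustrative only: count_special_chars is a hypothetical helper, not a btparse function, and it assumes balanced braces and ignores escaped braces such as "\{".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of btparse. Counts brace-groups that open
 * at brace-depth zero and whose first character is a backslash. */
int count_special_chars(const char *s)
{
    assert(s != NULL);
    int depth = 0;
    int count = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] == '{') {
            if (depth == 0 && s[i + 1] == '\\')
                count++;              /* "{\..." opening at depth zero */
            depth++;
        } else if (s[i] == '}' && depth > 0) {
            depth--;
        }
    }
    return count;
}
```

For the example string above the function returns 2. If the outer braces around "P{\r r}erov" are removed, "{\r r}" opens at depth zero and the count rises to 3.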

Special characters are handled as follows: if the control sequence
(the TeX command that follows the backslash) is recognized as one
of LaTeX's "foreign letters" ("\oe", "\ae", "\o", "\l", "\aa",
"\ss", plus uppercase versions), then it is converted to a
reasonable English approximation by stripping the backslash and
converting the second character (if any) to lowercase; thus,
"{\AA}" in the above example would become simply "Aa". All other
control sequences in a special character are stripped, as are all
non-alphabetic characters.

For example, the above string, after "purification", becomes

    barHerr Muller went from Pr rerov to Aarhus

Obviously, something has gone wrong with the word "{P{\r r}erov}"
(a town in the Czech Republic). Because the outer braces put
"{\r r}" at brace-depth one, it was treated as ordinary text: the
braces and the backslash were skipped, but both letters and the
space were copied. The accented "r" should be a special character,
starting at brace-depth zero. If the original string were instead

    {\foo bar}Herr M\"uller went from P{\r r}erov to {\AA}rhus

then the purified result would be more sensible:

    barHerr Muller went from Prerov to Aarhus
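
The two results above can be reproduced with a small stand-alone sketch of the documented rules. It is a hypothetical reimplementation written for illustration, not btparse's code: the names purify_sketch and is_foreign_letter are invented here, a tie is assumed to be '~', and "uppercase versions" is assumed to mean the fully uppercased control sequence names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the control sequence name cs[0..len) is one of the foreign
 * letters listed in the text (uppercase forms are an assumption). */
static int is_foreign_letter(const char *cs, size_t len)
{
    static const char *const names[] = {
        "oe", "ae", "o", "l", "aa", "ss", "OE", "AE", "O", "L", "AA", "SS"
    };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (strlen(names[i]) == len && strncmp(names[i], cs, len) == 0)
            return 1;
    return 0;
}

/* Hypothetical reimplementation of the documented rules; not btparse code. */
void purify_sketch(char *s)
{
    assert(s != NULL);
    const char *in = s;
    char *out = s;                    /* output never overtakes input */
    int depth = 0;                    /* brace depth outside special characters */

    while (*in != '\0') {
        if (*in == '{' && depth == 0 && in[1] == '\\') {
            int d = 1;                /* depth inside the special character */
            in++;                     /* skip the opening brace */
            while (*in != '\0' && d > 0) {
                if (*in == '\\') {
                    const char *cs = ++in;
                    while (isalpha((unsigned char)*in))
                        in++;
                    size_t len = (size_t)(in - cs);
                    if (len == 0) {
                        if (*in != '\0')
                            in++;     /* one-character control symbol such as \" */
                    } else if (is_foreign_letter(cs, len)) {
                        char first = cs[0];
                        char second = len > 1 ? (char)tolower((unsigned char)cs[1]) : '\0';
                        *out++ = first;
                        if (second != '\0')
                            *out++ = second;
                    }                 /* any other control sequence is dropped */
                } else {
                    if (*in == '{')
                        d++;
                    else if (*in == '}')
                        d--;
                    else if (isalpha((unsigned char)*in))
                        *out++ = *in; /* only letters survive inside a special character */
                    in++;
                }
            }
        } else {
            unsigned char c = (unsigned char)*in++;
            if (c == '{')
                depth++;
            else if (c == '}') {
                if (depth > 0)
                    depth--;
            } else if (isalnum(c) || c == ' ')
                *out++ = (char)c;
            else if (c == '-' || c == '~')
                *out++ = ' ';
        }
    }
    *out = '\0';
}
```

Applied to the first example string, the sketch yields "barHerr Muller went from Pr rerov to Aarhus"; applied to the corrected string, it yields "barHerr Muller went from Prerov to Aarhus".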

Note the use of a "nonsense" special character "{\foo bar}": this
trick is often used to put certain text in a string solely for
generating sort keys; the text is then ignored when the document is
processed by TeX (as long as "\foo" is defined as a no-op TeX
macro). This assumes, of course, that the output is eventually
processed by TeX; if not, then this trick will backfire on you.

Also, bt_purify_string() is adequate for generating sort keys when
you want to sort according to English-language conventions. To
follow the conventions of other languages, though, a more
sophisticated approach will be needed; hopefully, future versions
of btparse will address this deficiency.

bt_change_case()
    void bt_change_case (char transform, char * string, btshort options);

Converts a string to lowercase, uppercase, or "non-book title
capitalization", with special attention paid to BibTeX special
characters and other brace-groups. The form of conversion is
selected by the single character "transform": 'u' for uppercase,
'l' for lowercase, and 't' for "title capitalization".
"string" is modified in-place, and "options" is currently unused;
set it to zero for future compatibility.

Lowercase and uppercase conversion are obvious, with the proviso
that text in braces is treated differently (explained below).
Title capitalization simply means that everything is converted to
lowercase except the first letter of the first word and the first
letter of any word immediately following a colon or
sentence-ending punctuation. For instance,

    Flying Squirrels: Their Peculiar Habits. Part One

would be converted to

    Flying squirrels: Their peculiar habits. Part one
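
For brace-free text, the title-capitalization rule can be sketched as follows. The helper title_caps_plain is hypothetical and is not how btparse implements the 't' transform; it assumes that "sentence-ending punctuation" means '.', '!' and '?', and it does no brace handling at all.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of btparse. Lowercases every letter except
 * the first letter of the string and the first letter after ':', '.', '!'
 * or '?' (optionally followed by whitespace). Those letters keep whatever
 * case they already have. */
void title_caps_plain(char *s)
{
    assert(s != NULL);
    int keep_next = 1;                /* the next letter starts a sentence */
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if (isalpha(c)) {
            if (!keep_next)
                *s = (char)tolower(c);
            keep_next = 0;
        } else if (strchr(":.!?", c) != NULL) {
            keep_next = 1;            /* the following word is left alone */
        } else if (!isspace(c)) {
            keep_next = 0;            /* other characters end the "immediately following" window */
        }
    }
}
```

On a writable copy of "Flying Squirrels: Their Peculiar Habits. Part One" this produces "Flying squirrels: Their peculiar habits. Part one", matching the example above. With btparse the corresponding call would be bt_change_case ('t', buffer, 0).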

Text within braces is handled as follows. First, in a "special
character" (see above for definition), control sequences that
constitute one of LaTeX's non-English letters are converted
appropriately (e.g., when converting to lowercase, "\AE" becomes
"\ae"). Any other control sequence in a special character
(including accents) is preserved, and all text in a special
character, regardless of depth and punctuation, is converted to
lowercase or uppercase. (For "title capitalization," all text in a
special character is converted to lowercase.)

Brace groups that are not special characters are left completely
untouched: neither text nor control sequences inside them are
changed.
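
The brace rules can be illustrated for the lowercase case alone. The sketch below is a hypothetical reimplementation, not btparse's code: lowercase_sketch and is_upper_foreign are invented names, only the 'l' transform is covered, braces are assumed to be balanced, and the list of uppercase foreign letters is an assumption based on the letters named earlier.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if cs[0..len) is an uppercase foreign-letter name (assumed list). */
static int is_upper_foreign(const char *cs, size_t len)
{
    static const char *const names[] = { "OE", "AE", "O", "L", "AA", "SS" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (strlen(names[i]) == len && strncmp(names[i], cs, len) == 0)
            return 1;
    return 0;
}

/* Hypothetical sketch of the 'l' transform with brace handling. */
void lowercase_sketch(char *s)
{
    assert(s != NULL);
    while (*s != '\0') {
        if (*s != '{') {              /* depth zero: lowercase everything */
            *s = (char)tolower((unsigned char)*s);
            s++;
        } else if (s[1] == '\\') {    /* special character */
            int depth = 0;
            do {
                if (*s == '{') {
                    depth++;
                    s++;
                } else if (*s == '}') {
                    depth--;
                    s++;
                } else if (*s == '\\') {
                    char *cs = ++s;
                    while (isalpha((unsigned char)*s))
                        s++;
                    size_t len = (size_t)(s - cs);
                    if (len == 0) {
                        if (*s != '\0')
                            s++;      /* accent such as \' is preserved as-is */
                    } else if (is_upper_foreign(cs, len)) {
                        for (size_t i = 0; i < len; i++)
                            cs[i] = (char)tolower((unsigned char)cs[i]);
                    }                 /* any other control sequence is preserved */
                } else {
                    *s = (char)tolower((unsigned char)*s);
                    s++;
                }
            } while (*s != '\0' && depth > 0);
        } else {                      /* ordinary brace group: leave untouched */
            int depth = 0;
            do {
                if (*s == '{')
                    depth++;
                else if (*s == '}')
                    depth--;
                s++;
            } while (*s != '\0' && depth > 0);
        }
    }
}
```

With this sketch, "{\AE}sop Met {\'E}mile" becomes "{\ae}sop met {\'e}mile": the foreign letter is converted, the accent is kept, and the accented text is lowercased. A non-special group such as "{NASA}" passes through unchanged.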

For example, the string

    A Guide to \LaTeXe: Document Preparation ...

would, when "transform" is 't' (title capitalization), be converted
to

    A guide to \latexe: Document preparation ...

which is probably not the desired result. A better attempt is

    A Guide to {\LaTeXe}: Document Preparation ...

which becomes

    A guide to {\LaTeXe}: Document preparation ...

However, if you go back and re-read the description of
bt_purify_string(), you'll discover that "{\LaTeXe}" here is a
special character, but not a non-English letter, so the control
sequence is stripped. A sort key generated from this title would
therefore be

    A Guide to Document Preparation

...oops! The right solution (and this applies to any title with a
TeX command that becomes actual text) is to bury the control
sequence at brace-depth two:

    A Guide to {{\LaTeXe}}: Document Preparation ...

At that depth the group is not a special character, so
bt_change_case() leaves it untouched and bt_purify_string() keeps
its letters ("LaTeXe") in the sort key.

SEE ALSO
btparse

AUTHOR
Greg Ward <gward@python.net>

btparse, version 0.89 2023-07-21 BT_MISC(1)