MANDOC_ESCAPE(3)         BSD Library Functions Manual         MANDOC_ESCAPE(3)

NAME
mandoc_escape - parse roff escape sequences

SYNOPSIS
#include <sys/types.h>
#include <mandoc.h>

enum mandoc_esc
mandoc_escape(const char **end, const char **start, int *sz);

DESCRIPTION
This function scans a roff(7) escape sequence.

An escape sequence consists of
- an initial backslash character (‘\’),
- a single ASCII character called the escape sequence identifier,
- and, with only a few exceptions, an argument.

Arguments can be given in the following forms; some escape sequence
identifiers only accept some of these forms as specified below. The
first three forms are called the standard forms.

In brackets: [argument]
    The argument starts after the initial ‘[’, ends before the final ‘]’,
    and the escape sequence ends with the final ‘]’.

Two-character argument short form: (ar
    This form can only be used for arguments consisting of exactly two
    characters. It has the same effect as [ar].

One-character argument short form: a
    This form can only be used for arguments consisting of exactly one
    character. It has the same effect as [a].

Delimited form: CargumentC
    The argument starts after the initial delimiter character C, ends
    before the next occurrence of the delimiter character C, and the
    escape sequence ends with that second C. Some escape sequences
    allow arbitrary characters C as quoting characters; others restrict
    the range of characters that can be used as quoting characters.
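The argument forms above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the code in mandoc.c: the helper names std_arg() and delim_arg() are hypothetical, and the sketch handles neither escape sequences nested inside an argument nor the per-identifier restrictions on delimiter characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of mandoc: locate an argument given
 * in one of the three standard forms.  p points to the first
 * character after the escape sequence identifier.  On success, the
 * argument is reported in *start and *sz and a pointer to the
 * character after the end of the sequence is returned.  NULL is
 * returned if the input ends before the argument is complete.
 */
static const char *
std_arg(const char *p, const char **start, int *sz)
{
	const char *close;

	assert(p != NULL && start != NULL && sz != NULL);

	switch (*p) {
	case '\0':
		return NULL;
	case '[':			/* in brackets: [argument] */
		*start = p + 1;
		if ((close = strchr(*start, ']')) == NULL)
			return NULL;
		*sz = (int)(close - *start);
		return close + 1;
	case '(':			/* two-character short form: (ar */
		if (p[1] == '\0' || p[2] == '\0')
			return NULL;
		*start = p + 1;
		*sz = 2;
		return p + 3;
	default:			/* one-character short form: a */
		*start = p;
		*sz = 1;
		return p + 1;
	}
}

/*
 * Hypothetical helper for the delimited form CargumentC, accepting
 * any character as the delimiter C.
 */
static const char *
delim_arg(const char *p, const char **start, int *sz)
{
	const char *close;

	assert(p != NULL && start != NULL && sz != NULL);

	if (*p == '\0')
		return NULL;
	*start = p + 1;
	if ((close = strchr(*start, *p)) == NULL)
		return NULL;		/* closing delimiter missing */
	*sz = (int)(close - *start);
	return close + 1;
}
```

For the input (emx, std_arg() reports the two-character argument em and returns a pointer to the x; for 'abc'x, delim_arg() reports abc with a length of 3 and likewise returns a pointer to the x.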

Upon function entry, end is expected to point to the escape sequence
identifier. The values passed in as start and sz are ignored and
overwritten.

By design, this function cannot handle those roff(7) escape sequences
that require in-place expansion, in particular user-defined strings \*,
number registers \n, width measurements \w, and numerical expression
control \B. These are handled by roff_res(), a private preprocessor
function called from roff_parseln(), see the file roff.c.

The function mandoc_escape() is used
- recursively by itself, because some escape sequence arguments can in
  turn contain other escape sequences,
- for error detection internally by the roff(7) parser part of the
  mandoc(3) library, see the file roff.c,
- above all externally by the mandoc(1) formatting modules, in
  particular -Tascii and -Thtml, for formatting purposes, see the files
  term.c and html.c,
- and rarely externally by high-level utilities using the mandoc
  library, for example makewhatis(8), to purge escape sequences from
  text.

RETURN VALUES
Upon function return, the pointer end is set to the character after the
end of the escape sequence, so that the calling higher-level parser can
easily continue.

For escape sequences taking an argument, the pointer start is set to the
beginning of the argument and sz is set to the length of the argument.
For escape sequences not taking an argument, start is set to the
character after the end of the sequence and sz is set to 0. Both start
and sz may be NULL; in that case, the argument and the length are not
returned.
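The calling convention described above can be illustrated with a small loop that purges escape sequences from a string. To stay self-contained, the sketch assumes several things that are not part of mandoc: escape_fn is a hypothetical function pointer type mirroring the prototype in the SYNOPSIS (with int in place of enum mandoc_esc), toy_escape() is a stand-in parser that treats every identifier as taking no argument, and purge_escapes() is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed type: same parameters as mandoc_escape(), int return value. */
typedef int (*escape_fn)(const char **end, const char **start, int *sz);

/*
 * Hypothetical stand-in parser: every escape sequence is taken to be
 * a bare identifier without an argument.  As documented for the real
 * function, start and sz may be NULL.
 */
static int
toy_escape(const char **end, const char **start, int *sz)
{
	if (**end != '\0')
		(*end)++;		/* step over the identifier */
	if (start != NULL)
		*start = *end;
	if (sz != NULL)
		*sz = 0;
	return 0;
}

/*
 * Copy in to out, dropping all escape sequences.  Returns the number
 * of characters stored, not counting the terminating NUL.
 */
static size_t
purge_escapes(const char *in, char *out, size_t outsz, escape_fn parse)
{
	size_t n = 0;

	assert(in != NULL && out != NULL && outsz > 0 && parse != NULL);

	while (*in != '\0' && n + 1 < outsz) {
		if (*in != '\\') {
			out[n++] = *in++;
			continue;
		}
		in++;			/* end must point to the identifier */
		parse(&in, NULL, NULL);	/* in now points past the sequence */
	}
	out[n] = '\0';
	return n;
}
```

A real caller would invoke mandoc_escape() directly instead of going through the callback, and would probably inspect the return value, for example to stop at ESCAPE_ERROR.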

For sequences taking an argument, the function mandoc_escape() returns
one of the following values:

ESCAPE_FONT
    The escape sequence \f taking an argument in standard form: \f[, \f(,
    \fa. Two-character arguments starting with the character ‘C’ are
    reduced to one-character arguments by skipping the ‘C’. More
    specific values are returned for the most commonly used arguments:

    argument    return value
    R or 1      ESCAPE_FONTROMAN
    I or 2      ESCAPE_FONTITALIC
    B or 3      ESCAPE_FONTBOLD
    P           ESCAPE_FONTPREV
    BI          ESCAPE_FONTBI

ESCAPE_SPECIAL
    The escape sequence \C taking an argument delimited with the single
    quote character and, as a special exception, the escape sequences
    not having an identifier, that is, those where the argument, in
    standard form, directly follows the initial backslash: \C', \[, \(,
    \a. Note that the one-character argument short form can only be
    used for argument characters that do not clash with escape sequence
    identifiers.

    If the argument matches one of the forms described below under
    ESCAPE_UNICODE, that value is returned instead.

    The ESCAPE_SPECIAL special character escape sequences can be
    rendered using the functions mchars_spec2cp() and mchars_spec2str()
    described in the mchars_alloc(3) manual.

ESCAPE_UNICODE
    Escape sequences of the same format as described above under
    ESCAPE_SPECIAL, but with an argument of the forms uXXXX, uYXXXX, or
    u10XXXX where X and Y are hexadecimal digits and Y is not zero:
    \C'u, \[u. As a special exception, start is set to the character
    after the u, and the sz return value does not include the u either.

    Such Unicode character escape sequences can be rendered using the
    function mchars_num2uc() described in the mchars_alloc(3) manual.

ESCAPE_NUMBERED
    The escape sequence \N followed by a delimited argument. The
    delimiter character is arbitrary except that digits cannot be used.
    If a digit is encountered instead of the opening delimiter, that
    digit is considered to be the argument and the end of the sequence,
    and ESCAPE_IGNORE is returned.

    Such ASCII character escape sequences can be rendered using the
    function mchars_num2char() described in the mchars_alloc(3) manual.

ESCAPE_OVERSTRIKE
    The escape sequence \o followed by an argument delimited by an
    arbitrary character.

ESCAPE_IGNORE

    · The escape sequence \s followed by an argument in standard form
      or by an argument delimited by the single quote character: \s',
      \s[, \s(, \sa. As a special exception, an optional ‘+’ or ‘-’
      character is allowed after the ‘s’ for all forms.

    · The escape sequences \F, \g, \k, \M, \m, \n, \V, and \Y followed
      by an argument in standard form.

    · The escape sequences \A, \b, \D, \R, \X, and \Z followed by an
      argument delimited by an arbitrary character.

    · The escape sequences \H, \h, \L, \l, \S, \v, and \x followed by
      an argument delimited by a character that cannot occur in
      numerical expressions. However, if any character that can occur
      in numerical expressions is found instead of a delimiter, the
      sequence is considered to end with that character, and
      ESCAPE_ERROR is returned.

ESCAPE_ERROR
    Escape sequences taking an argument but not matching any of the
    above patterns. In particular, that happens if the end of the
    logical input line is reached before the end of the argument.
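Two of the argument rules in the list above, the font table under ESCAPE_FONT and the argument forms under ESCAPE_UNICODE, can be written down as small C predicates. The following is only a sketch of the rules as stated in this manual, not the code in mandoc.c: enum font_class, classify_font(), and is_unicode_arg() are hypothetical names, and is_unicode_arg() expects the full argument including the leading u.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical mirror of the more specific ESCAPE_FONT* values. */
enum font_class {
	FONT_OTHER,		/* plain ESCAPE_FONT */
	FONT_ROMAN,
	FONT_ITALIC,
	FONT_BOLD,
	FONT_PREV,
	FONT_BI
};

/* Classify the argument of \f according to the table above. */
static enum font_class
classify_font(const char *arg, int sz)
{
	assert(arg != NULL);

	if (sz == 2 && arg[0] == 'C') {
		arg++;		/* reduce "Cx" to "x" by skipping the 'C' */
		sz--;
	}
	if (sz == 2 && arg[0] == 'B' && arg[1] == 'I')
		return FONT_BI;
	if (sz != 1)
		return FONT_OTHER;

	switch (arg[0]) {
	case 'R': case '1':
		return FONT_ROMAN;
	case 'I': case '2':
		return FONT_ITALIC;
	case 'B': case '3':
		return FONT_BOLD;
	case 'P':
		return FONT_PREV;
	default:
		return FONT_OTHER;
	}
}

/* Return 1 if arg has the form uXXXX, uYXXXX, or u10XXXX, else 0. */
static int
is_unicode_arg(const char *arg, int sz)
{
	int i;

	assert(arg != NULL);

	if (sz < 5 || sz > 7 || arg[0] != 'u')
		return 0;
	if (sz == 6 && arg[1] == '0')	/* Y must not be zero */
		return 0;
	if (sz == 7 && (arg[1] != '1' || arg[2] != '0'))
		return 0;
	for (i = 1; i < sz; i++)	/* the rest must be hex digits */
		if (!isxdigit((unsigned char)arg[i]))
			return 0;
	return 1;
}
```

With these definitions, the argument CB is classified like B, and u02014 is rejected because the digit in the Y position is zero.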

For sequences that do not take an argument, the function mandoc_escape()
returns one of the following values:

ESCAPE_SKIPCHAR
    The escape sequence "\z".

ESCAPE_NOSPACE
    The escape sequence "\c".

ESCAPE_IGNORE
    The escape sequences "\d" and "\u".

FILES
This function is implemented in mandoc.c.

SEE ALSO
mchars_alloc(3), mandoc_char(7), roff(7)

HISTORY
This function has been available since mandoc 1.11.2.

AUTHORS
Kristaps Dzonsons <kristaps@bsd.lv>
Ingo Schwarze <schwarze@openbsd.org>

BUGS
The function doesn't cleanly distinguish between sequences that are valid
and supported, valid and ignored, valid and unsupported, syntactically
invalid, or undefined. For sequences that are ignored or unsupported, it
doesn't tell whether that deficiency is likely to cause major formatting
problems and/or loss of document content. The function is already rather
complicated and still parses some sequences incorrectly.

BSD                              June 20, 2019                             BSD