MANDOC_ESCAPE(3)         BSD Library Functions Manual         MANDOC_ESCAPE(3)

NAME

     mandoc_escape — parse roff escape sequences

SYNOPSIS

     #include <sys/types.h>
     #include <mandoc.h>

     enum mandoc_esc
     mandoc_escape(const char **end, const char **start, int *sz);

DESCRIPTION

     This function scans a roff(7) escape sequence.

     An escape sequence consists of
     -   an initial backslash character (‘\’),
     -   a single ASCII character called the escape sequence identifier,
     -   and, with only a few exceptions, an argument.

     Arguments can be given in the following forms; some escape sequence
     identifiers only accept some of these forms, as specified below.  The
     first three forms are called the standard forms.

     In brackets: [argument]
         The argument starts after the initial ‘[’, ends before the final ‘]’,
         and the escape sequence ends with the final ‘]’.

     Two-character argument short form: (ar
         This form can only be used for arguments consisting of exactly two
         characters.  It has the same effect as [ar].

     One-character argument short form: a
         This form can only be used for arguments consisting of exactly one
         character.  It has the same effect as [a].

     Delimited form: CargumentC
         The argument starts after the initial delimiter character C, ends
         before the next occurrence of the delimiter character C, and the
         escape sequence ends with that second C.  Some escape sequences allow
         arbitrary characters C as quoting characters; others restrict the
         range of characters that can be used as quoting characters.
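
     As an illustration only, the four argument forms can be scanned as in
     the following sketch.  The function scan_arg() and its delimited flag
     are hypothetical and not part of mandoc.  Unlike mandoc_escape(), the
     sketch neither restricts the delimiter character nor handles escape
     sequences nested inside an argument.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical sketch, not taken from mandoc.c.
 * On entry, *end points just past the escape sequence identifier.
 * With delimited == 0, one of the three standard forms is parsed;
 * otherwise, the delimited form CargumentC is parsed.
 * On success, *start and *sz describe the argument, *end points just
 * past the whole sequence, and 0 is returned.  If the argument is
 * unterminated, -1 is returned and nothing is modified.
 */
static int
scan_arg(const char **end, const char **start, int *sz, int delimited)
{
	const char	*p, *q;

	assert(end != NULL && *end != NULL);
	p = *end;
	if (*p == '\0')
		return -1;

	if (delimited) {
		/* Delimited form: the argument runs to the next *p. */
		if ((q = strchr(p + 1, *p)) == NULL)
			return -1;
		*start = p + 1;
		*sz = (int)(q - (p + 1));
		*end = q + 1;
		return 0;
	}

	switch (*p) {
	case '[':
		/* Bracket form: [argument] */
		if ((q = strchr(p + 1, ']')) == NULL)
			return -1;
		*start = p + 1;
		*sz = (int)(q - (p + 1));
		*end = q + 1;
		break;
	case '(':
		/* Two-character short form: (ar */
		if (p[1] == '\0' || p[2] == '\0')
			return -1;
		*start = p + 1;
		*sz = 2;
		*end = p + 3;
		break;
	default:
		/* One-character short form: a */
		*start = p;
		*sz = 1;
		*end = p + 1;
		break;
	}
	return 0;
}
```

     For example, with end pointing to ‘[BI]x’, scan_arg() sets start to the
     ‘B’ and sz to 2, and it leaves end pointing to the ‘x’.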

     Upon function entry, end is expected to point to the escape sequence
     identifier.  Any values stored in *start and *sz on entry are ignored
     and overwritten.

     By design, this function cannot handle those roff(7) escape sequences
     that require in-place expansion, in particular user-defined strings \*,
     number registers \n, width measurements \w, and numerical expression
     control \B.  These are handled by roff_res(), a private preprocessor
     function called from roff_parseln(); see the file roff.c.

     The function mandoc_escape() is used
     -   recursively by itself, because some escape sequence arguments can in
         turn contain other escape sequences,
     -   for error detection internally by the roff(7) parser part of the
         mandoc(3) library, see the file roff.c,
     -   above all externally by the mandoc(1) formatting modules, in
         particular -Tascii and -Thtml, for formatting purposes, see the files
         term.c and html.c,
     -   and rarely externally by high-level utilities using the mandoc
         library, for example makewhatis(8), to purge escape sequences from
         text.

RETURN VALUES

     Upon function return, the pointer end is set to the character after the
     end of the escape sequence, such that the calling higher-level parser can
     easily continue.

     For escape sequences taking an argument, the pointer start is set to the
     beginning of the argument and sz is set to the length of the argument.
     For escape sequences not taking an argument, start is set to the
     character after the end of the sequence and sz is set to 0.  Both start
     and sz may be NULL; in that case, the argument and the length are not
     returned.
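
     A caller can rely on this contract to walk through text.  The following
     sketch shows an escape-stripping loop that uses it.  Both functions are
     hypothetical.  skip_escape() mimics the pointer semantics described
     above but only understands \(xx and \[name], and it treats every other
     identifier as taking no argument.  purge_escapes() only loosely
     resembles what utilities such as makewhatis(8) need to do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for mandoc_escape() with the same pointer
 * contract.  It only understands \(xx and \[name]; every other
 * identifier is treated as taking no argument.  Returns 0 on
 * success or -1 for an unterminated sequence.
 */
static int
skip_escape(const char **end, const char **start, int *sz)
{
	const char	*local_start, *p, *q;
	int		 local_sz;

	/* NULL means the caller does not want these values. */
	if (start == NULL)
		start = &local_start;
	if (sz == NULL)
		sz = &local_sz;

	p = *end;
	if (*p == '(') {
		if (p[1] == '\0' || p[2] == '\0')
			return -1;
		*start = p + 1;
		*sz = 2;
		*end = p + 3;
	} else if (*p == '[') {
		if ((q = strchr(p + 1, ']')) == NULL)
			return -1;
		*start = p + 1;
		*sz = (int)(q - (p + 1));
		*end = q + 1;
	} else if (*p != '\0') {
		/* No argument: start is just past the sequence, sz is 0. */
		*end = p + 1;
		*start = *end;
		*sz = 0;
	} else
		return -1;
	return 0;
}

/*
 * Hypothetical: copy text into buf, which holds bufsz bytes,
 * dropping escape sequences; stop at the first malformed one.
 */
static void
purge_escapes(const char *text, char *buf, size_t bufsz)
{
	const char	*p = text;
	size_t		 len = 0;

	assert(text != NULL && buf != NULL && bufsz > 0);
	while (*p != '\0' && len + 1 < bufsz) {
		if (*p != '\\') {
			buf[len++] = *p++;
			continue;
		}
		p++;	/* now at the identifier, as mandoc_escape() expects */
		if (skip_escape(&p, NULL, NULL) == -1)
			break;
		/* p already points past the sequence: just continue. */
	}
	buf[len] = '\0';
}
```

     Because end already points past each sequence on return, the loop needs
     no knowledge of sequence lengths.  Passing NULL for start and sz avoids
     dummy variables.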

     For sequences taking an argument, the function mandoc_escape() returns
     one of the following values:

     ESCAPE_FONT
         The escape sequence \f taking an argument in standard form: \f[, \f(,
         \fa.  Two-character arguments starting with the character ‘C’ are
         reduced to one-character arguments by skipping the ‘C’.  More
         specific values are returned for the most commonly used arguments:

         argument    return value
         R or 1      ESCAPE_FONTROMAN
         I or 2      ESCAPE_FONTITALIC
         B or 3      ESCAPE_FONTBOLD
         P           ESCAPE_FONTPREV
         BI          ESCAPE_FONTBI
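
         The following sketch summarizes the reduction of ‘C’-prefixed
         arguments and the table above.  The function classify_font() and
         the FE_* constants are hypothetical local stand-ins; real callers
         receive the ESCAPE_FONT* values from mandoc_escape() itself.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical local stand-ins for the mandoc_esc font values. */
enum font_esc {
	FE_FONT,	/* ESCAPE_FONT: any other argument */
	FE_ROMAN,	/* ESCAPE_FONTROMAN */
	FE_ITALIC,	/* ESCAPE_FONTITALIC */
	FE_BOLD,	/* ESCAPE_FONTBOLD */
	FE_PREV,	/* ESCAPE_FONTPREV */
	FE_BI		/* ESCAPE_FONTBI */
};

/* Classify a \f argument given as pointer and length, like start/sz. */
static enum font_esc
classify_font(const char *arg, int sz)
{
	assert(arg != NULL && sz >= 0);

	/* Two-character arguments starting with 'C' lose the 'C'. */
	if (sz == 2 && arg[0] == 'C') {
		arg++;
		sz = 1;
	}

	if (sz == 1) {
		switch (arg[0]) {
		case 'R':
		case '1':
			return FE_ROMAN;
		case 'I':
		case '2':
			return FE_ITALIC;
		case 'B':
		case '3':
			return FE_BOLD;
		case 'P':
			return FE_PREV;
		default:
			return FE_FONT;
		}
	}
	if (sz == 2 && strncmp(arg, "BI", 2) == 0)
		return FE_BI;
	return FE_FONT;
}
```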

     ESCAPE_SPECIAL
         The escape sequence \C taking an argument delimited with the single
         quote character and, as a special exception, the escape sequences not
         having an identifier, that is, those where the argument, in standard
         form, directly follows the initial backslash: \C', \[, \(, \a.  Note
         that the one-character argument short form can only be used for
         argument characters that do not clash with escape sequence
         identifiers.

         If the argument matches one of the forms described below under
         ESCAPE_UNICODE, that value is returned instead.

         The ESCAPE_SPECIAL special character escape sequences can be rendered
         using the functions mchars_spec2cp() and mchars_spec2str() described
         in the mchars_alloc(3) manual.

     ESCAPE_UNICODE
         Escape sequences of the same format as described above under
         ESCAPE_SPECIAL, but with an argument of the forms uXXXX, uYXXXX, or
         u10XXXX where X and Y are hexadecimal digits and Y is not zero: \C'u,
         \[u.  As a special exception, start is set to the character after the
         u, and the sz return value does not include the u either.

         Such Unicode character escape sequences can be rendered using the
         function mchars_num2uc() described in the mchars_alloc(3) manual.
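
         The following sketch shows the accepted argument forms and the
         adjusted start and sz values.  The function unicode_arg() is
         hypothetical.  It takes the argument including the leading u and
         returns the code point, or -1 if the argument does not match.  It
         performs no checks beyond the forms listed above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch: if arg (sz bytes, including the 'u') has the
 * form uXXXX, uYXXXX with Y != 0, or u10XXXX, report the digits
 * without the 'u' in *digits and *ndigits and return the code point.
 * Otherwise, return -1.
 */
static long
unicode_arg(const char *arg, int sz, const char **digits, int *ndigits)
{
	char	 buf[8];
	int	 i;

	assert(arg != NULL && digits != NULL && ndigits != NULL);

	/* uXXXX, uYXXXX and u10XXXX have 4, 5 and 6 digits. */
	if (sz < 5 || sz > 7 || arg[0] != 'u')
		return -1;
	for (i = 1; i < sz; i++)
		if (!isxdigit((unsigned char)arg[i]))
			return -1;
	if (sz == 6 && arg[1] == '0')
		return -1;		/* uYXXXX requires Y != 0 */
	if (sz == 7 && (arg[1] != '1' || arg[2] != '0'))
		return -1;		/* six digits must start with 10 */

	/* Like ESCAPE_UNICODE: report the argument without the 'u'. */
	*digits = arg + 1;
	*ndigits = sz - 1;

	/* Copy the digits so that strtol() stops at the argument end. */
	memcpy(buf, *digits, (size_t)*ndigits);
	buf[*ndigits] = '\0';
	return strtol(buf, NULL, 16);
}
```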

     ESCAPE_NUMBERED
         The escape sequence \N followed by a delimited argument.  The
         delimiter character is arbitrary except that digits cannot be used.
         If a digit is encountered instead of the opening delimiter, that digit
         is considered to be the argument and the end of the sequence, and
         ESCAPE_IGNORE is returned.

         Such ASCII character escape sequences can be rendered using the
         function mchars_num2char() described in the mchars_alloc(3) manual.
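
         A minimal sketch of these rules follows.  The function
         scan_numbered() and the NUM_* constants are hypothetical.  The
         sketch does not check that the argument consists of digits.
         Leaving end at the terminating NUL byte after an unterminated
         argument is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical local stand-ins for the mandoc_esc values. */
enum num_esc { NUM_NUMBERED, NUM_IGNORE, NUM_ERROR };

/*
 * On entry, *end points just past the 'N' identifier.
 * *start and *sz are only set when NUM_ERROR is not returned.
 */
static enum num_esc
scan_numbered(const char **end, const char **start, int *sz)
{
	const char	*p, *q;

	assert(end != NULL && *end != NULL);
	p = *end;
	if (*p == '\0')
		return NUM_ERROR;

	if (isdigit((unsigned char)*p)) {
		/* A digit instead of a delimiter is the whole argument. */
		*start = p;
		*sz = 1;
		*end = p + 1;
		return NUM_IGNORE;
	}

	/* Any other character delimits the argument. */
	if ((q = strchr(p + 1, *p)) == NULL) {
		*end = p + strlen(p);	/* assumption: stop at the NUL */
		return NUM_ERROR;
	}
	*start = p + 1;
	*sz = (int)(q - (p + 1));
	*end = q + 1;
	return NUM_NUMBERED;
}
```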

     ESCAPE_OVERSTRIKE
         The escape sequence \o followed by an argument delimited by an
         arbitrary character.

     ESCAPE_IGNORE

         ·   The escape sequence \s followed by an argument in standard form
             or by an argument delimited by the single quote character: \s',
             \s[, \s(, \sa.  As a special exception, an optional ‘+’ or ‘-’
             character is allowed after the ‘s’ for all forms.

         ·   The escape sequences \F, \g, \k, \M, \m, \n, \V, and \Y followed
             by an argument in standard form.

         ·   The escape sequences \A, \b, \D, \R, \X, and \Z followed by an
             argument delimited by an arbitrary character.

         ·   The escape sequences \H, \h, \L, \l, \S, \v, and \x followed by
             an argument delimited by a character that cannot occur in
             numerical expressions.  However, if any character that can occur
             in numerical expressions is found instead of a delimiter, the
             sequence is considered to end with that character, and
             ESCAPE_ERROR is returned.

     ESCAPE_ERROR
         Escape sequences taking an argument but not matching any of the above
         patterns.  In particular, that happens if the end of the logical
         input line is reached before the end of the argument.
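
     The following sketch covers the rule for \H, \h, \L, \l, \S, \v, and
     \x, together with the end-of-line case of ESCAPE_ERROR.  The function
     scan_numexpr_arg() and the ND_* constants are assumptions made for
     illustration.  So is the exact set of characters that the sketch
     treats as able to occur in numerical expressions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical local stand-ins for ESCAPE_IGNORE and ESCAPE_ERROR. */
enum numexpr_esc { ND_IGNORE, ND_ERROR };

/*
 * Assumed, for illustration only: characters that can occur in
 * numerical expressions and therefore cannot serve as delimiters.
 */
static const char numexpr_chars[] = " %&()*+-./0123456789:<=>";

/*
 * Hypothetical sketch for \H, \h, \L, \l, \S, \v and \x.  On entry,
 * *end points just past the identifier.  *start and *sz are only
 * set on success.
 */
static enum numexpr_esc
scan_numexpr_arg(const char **end, const char **start, int *sz)
{
	const char	*p, *q;

	assert(end != NULL && *end != NULL);
	p = *end;

	if (*p == '\0')
		return ND_ERROR;	/* line ends before any delimiter */

	if (strchr(numexpr_chars, *p) != NULL) {
		/* Not a usable delimiter: the sequence ends right here. */
		*end = p + 1;
		return ND_ERROR;
	}

	if ((q = strchr(p + 1, *p)) == NULL) {
		/* Line ends inside the argument; leave *end at the NUL. */
		*end = p + strlen(p);
		return ND_ERROR;
	}

	*start = p + 1;
	*sz = (int)(q - (p + 1));
	*end = q + 1;
	return ND_IGNORE;
}
```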

     For sequences that do not take an argument, the function mandoc_escape()
     returns one of the following values:

     ESCAPE_SKIPCHAR
         The escape sequence "\z".

     ESCAPE_NOSPACE
         The escape sequence "\c".

     ESCAPE_IGNORE
         The escape sequences "\d" and "\u".

FILES

     This function is implemented in mandoc.c.

SEE ALSO

     mchars_alloc(3), mandoc_char(7), roff(7)

HISTORY

     This function has been available since mandoc 1.11.2.

AUTHORS

     Kristaps Dzonsons <kristaps@bsd.lv>
     Ingo Schwarze <schwarze@openbsd.org>

BUGS

     The function doesn't cleanly distinguish between sequences that are valid
     and supported, valid and ignored, valid and unsupported, syntactically
     invalid, or undefined.  For sequences that are ignored or unsupported, it
     doesn't tell whether that deficiency is likely to cause major formatting
     problems and/or loss of document content.  The function is already rather
     complicated and still parses some sequences incorrectly.

BSD                              June 20, 2019                             BSD