MANDOC(3) BSD Library Functions Manual MANDOC(3)

NAME
mandoc, deroff, mparse_alloc, mparse_copy, mparse_free, mparse_open,
mparse_readfd, mparse_reset, mparse_result — mandoc macro compiler
library

SYNOPSIS
#include <sys/types.h>
#include <stdio.h>
#include <mandoc.h>

#define ASCII_NBRSP
#define ASCII_HYPH
#define ASCII_BREAK

struct mparse *
mparse_alloc(int options, enum mandoc_os os_e, char *os_s);

void
mparse_free(struct mparse *parse);

void
mparse_copy(const struct mparse *parse);

int
mparse_open(struct mparse *parse, const char *fname);

void
mparse_readfd(struct mparse *parse, int fd, const char *fname);

void
mparse_reset(struct mparse *parse);

struct roff_meta *
mparse_result(struct mparse *parse);

#include <roff.h>

void
deroff(char **dest, const struct roff_node *node);

#include <sys/types.h>
#include <mandoc.h>
#include <mdoc.h>

extern const char * const * mdoc_argnames;
extern const char * const * mdoc_macronames;

#include <sys/types.h>
#include <mandoc.h>
#include <man.h>

extern const char * const * man_macronames;

DESCRIPTION
The mandoc library parses a UNIX manual into an abstract syntax tree
(AST). UNIX manuals are written in mdoc(7) or man(7), and may be mixed
with roff(7), tbl(7), and eqn(7) invocations.

The following describes a general parse sequence:

1. initiate a parsing sequence with mchars_alloc(3) and mparse_alloc();

2. open a file with open(2) or mparse_open();

3. parse it with mparse_readfd();

4. close it with close(2);

5. retrieve the syntax tree with mparse_result();

6. if information about the validity of the input is needed, fetch it
   with mparse_updaterc();

7. iterate over parse nodes starting from the first member of the
   returned struct roff_meta;

8. free all allocated memory with mparse_free() and mchars_free(3), or
   invoke mparse_reset() and go back to step 2 to parse new files.

REFERENCE
This section documents the functions, types, and variables available via
<mandoc.h>, with the exception of those documented in mandoc_escape(3)
and mchars_alloc(3).

Types
enum mandocerr
An error or warning message during parsing.

enum mandoclevel
A classification of an enum mandocerr according to its severity. See the
DIAGNOSTICS section in mandoc(1) regarding the meanings of the levels.

struct mparse
An opaque structure representing a running parse sequence. Created with
mparse_alloc() and freed with mparse_free(). It may be reused across
parsed input if mparse_reset() is called between parses.

Functions
deroff()
Obtain a text-only representation of a struct roff_node, including text
contained in its child nodes. It is intended for use on the children of
the first member of struct roff_meta. When it is no longer needed, the
string that deroff() stores in *dest can be passed to free(3).
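
The sketch below illustrates that contract with a hypothetical, cut-down node type (only type, string, child and next, loosely modelled on struct roff_node) and a hypothetical helper, text_only(), that appends all text below a node to one heap-allocated string, separating pieces with single spaces and skipping empty strings. It is an assumption-based illustration, not mandoc's deroff(): it neither interprets roff(7) escapes nor trims whitespace.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct roff_node. */
enum node_type { NODE_ELEM, NODE_TEXT };

struct node {
	enum node_type	 type;
	const char	*string;	/* only used for NODE_TEXT */
	struct node	*child;		/* first child */
	struct node	*next;		/* next sibling */
};

/*
 * Append the text of n and all of its descendants to *dest, separating
 * pieces with a single space.  *dest must be NULL or a string built by
 * earlier calls.  Returns 0 on success, -1 if allocation fails (in which
 * case *dest is left unchanged and must still be freed by the caller).
 */
static int
text_only(char **dest, const struct node *n)
{
	const struct node	*c;
	size_t			 oldsz, addsz;
	char			*p;

	assert(dest != NULL && n != NULL);

	if (n->type != NODE_TEXT) {
		for (c = n->child; c != NULL; c = c->next)
			if (text_only(dest, c) == -1)
				return -1;
		return 0;
	}

	addsz = strlen(n->string);
	if (addsz == 0)
		return 0;			/* skip empty strings */

	oldsz = *dest == NULL ? 0 : strlen(*dest);
	/* Room for old text, an optional separator, new text and NUL. */
	p = realloc(*dest, oldsz + (oldsz > 0) + addsz + 1);
	if (p == NULL)
		return -1;
	if (oldsz > 0)
		p[oldsz++] = ' ';
	memcpy(p + oldsz, n->string, addsz + 1);
	*dest = p;
	return 0;
}
```

As with deroff(), start from a NULL pointer, call the helper on the node of interest, and release the result with free(3).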

mparse_alloc()
Allocate a parser. The arguments have the following effect:

options When the MPARSE_MDOC or MPARSE_MAN bit is set, only that
        parser is used. Otherwise, the document type is automatically
        detected.

        When the MPARSE_SO bit is set, roff(7) so file inclusion
        requests are always honoured. Otherwise, if the request is
        the only content in an input file, only the file name is
        remembered, to be returned in the sodest field of struct
        roff_meta.

        When the MPARSE_QUICK bit is set, parsing is aborted after
        the NAME section. This is useful, for example, in
        makewhatis(8) -Q to quickly build minimal databases.

        When the MPARSE_VALIDATE bit is set, mparse_result() runs
        the validation functions before returning the syntax tree.
        This is almost always required, except in certain debugging
        scenarios, for example to dump unvalidated syntax trees.

os_e    Operating system to check base system conventions for. If
        MANDOC_OS_OTHER, the system is automatically detected from
        Os, -Ios, or uname(3).

os_s    A default string for the mdoc(7) Os macro, overriding the
        OSNAME preprocessor definition and the results of uname(3).
        Passing NULL sets no default.
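
The automatic detection mentioned under options can be pictured with the following standalone sketch. It assumes, for illustration only, that the first request line beginning with .Dd selects mdoc(7), the first beginning with .TH selects man(7), and man(7) is the fallback when neither appears. guess_format() and the FMT_* names are hypothetical and are not the MPARSE_* bits.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result codes; these are not the MPARSE_* bits. */
enum fmt { FMT_MDOC, FMT_MAN };

/*
 * Guess the macro set of a NUL-terminated roff(7) buffer by looking at
 * request lines (starting with '.' or '\'') until one begins with Dd
 * (mdoc) or TH (man).  Fall back to man(7) if neither is found.
 */
static enum fmt
guess_format(const char *buf)
{
	const char	*line;

	assert(buf != NULL);
	line = buf;
	while (line != NULL && *line != '\0') {
		if (line[0] == '.' || line[0] == '\'') {
			if (strncmp(line + 1, "Dd", 2) == 0)
				return FMT_MDOC;
			if (strncmp(line + 1, "TH", 2) == 0)
				return FMT_MAN;
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line != NULL)
			line++;
	}
	return FMT_MAN;
}
```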

The same parser may be used for multiple files so long as mparse_reset()
is called between parses. mparse_free() must be called to free the memory
allocated by this function. Declared in <mandoc.h>, implemented in
read.c.
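
When MPARSE_SO is not set, the parser has to recognise an input file whose only content is an so request. A hypothetical helper for that test might look as follows. It accepts only a buffer holding exactly one ‘.so name’ line, optionally ending in a newline, with trailing blanks ignored. This is a simplifying assumption rather than a description of read.c.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * If buf consists of nothing but one ".so name" request line, copy
 * name into out (of size outsz) and return 1; otherwise return 0.
 */
static int
so_only_target(const char *buf, char *out, size_t outsz)
{
	const char	*name, *end;
	size_t		 len;

	assert(buf != NULL && out != NULL && outsz > 0);
	if (strncmp(buf, ".so", 3) != 0 || (buf[3] != ' ' && buf[3] != '\t'))
		return 0;

	name = buf + 3;
	while (*name == ' ' || *name == '\t')
		name++;

	/* The request must be the only line of the file. */
	end = strchr(name, '\n');
	if (end == NULL)
		end = name + strlen(name);
	else if (end[1] != '\0')
		return 0;

	/* Ignore trailing blanks on the request line. */
	while (end > name && isspace((unsigned char)end[-1]))
		end--;
	len = (size_t)(end - name);
	if (len == 0 || len >= outsz)
		return 0;
	memcpy(out, name, len);
	out[len] = '\0';
	return 1;
}
```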

mparse_free()
Free all memory allocated by mparse_alloc(). Declared in <mandoc.h>,
implemented in read.c.

mparse_copy()
Dump a copy of the input to the standard output; used for -man -Tman.
Declared in <mandoc.h>, implemented in read.c.

mparse_open()
Open the file for reading. If that fails and fname does not already end
in ‘.gz’, try again after appending ‘.gz’. Record whether the file is
gzip-compressed. Return a file descriptor open for reading, or -1 on
failure. The descriptor can be passed to mparse_readfd() or used
directly. Declared in <mandoc.h>, implemented in read.c.
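
The fallback described above can be sketched with plain POSIX calls. The helper open_maybe_gz() is hypothetical and not part of the mandoc API. It only reports through *gzipped whether the name that was opened ends in ‘.gz’, and it does no decompression of its own.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>	/* open(2) */
#include <stdlib.h>
#include <string.h>
#include <unistd.h>	/* close(2), for callers of the helper */

/*
 * Open fname read-only.  If that fails and fname does not already end
 * in ".gz", retry with ".gz" appended.  On success, *gzipped tells
 * whether the name that was opened ends in ".gz".
 * Returns a file descriptor or -1.
 */
static int
open_maybe_gz(const char *fname, int *gzipped)
{
	size_t	 len;
	char	*alt;
	int	 fd;

	assert(fname != NULL && gzipped != NULL);
	len = strlen(fname);
	*gzipped = len >= 3 && strcmp(fname + len - 3, ".gz") == 0;

	/* First attempt: the name exactly as given. */
	if ((fd = open(fname, O_RDONLY)) != -1 || *gzipped)
		return fd;

	/* Second attempt: the same name with ".gz" appended. */
	if ((alt = malloc(len + sizeof(".gz"))) == NULL)
		return -1;
	memcpy(alt, fname, len);
	memcpy(alt + len, ".gz", sizeof(".gz"));	/* copies the NUL too */
	fd = open(alt, O_RDONLY);
	free(alt);
	if (fd != -1)
		*gzipped = 1;
	return fd;
}
```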

mparse_readfd()
Parse a file descriptor opened with open(2) or mparse_open(). Pass the
associated filename in fname. This function may be called multiple times
with different parameters; however, close(2) and mparse_reset() should be
invoked between parses. Declared in <mandoc.h>, implemented in read.c.

mparse_reset()
Reset a parser so that mparse_readfd() may be used again. Declared in
<mandoc.h>, implemented in read.c.

mparse_result()
Obtain the result of a parse. Declared in <mandoc.h>, implemented in
read.c.

Variables
man_macronames
The string representation of a man(7) macro as indexed by enum mant.

mdoc_argnames
The string representation of an mdoc(7) macro argument as indexed by enum
mdocargt.

mdoc_macronames
The string representation of an mdoc(7) macro as indexed by enum mdoct.

IMPLEMENTATION NOTES
This section consists of structural documentation for mdoc(7) and man(7)
syntax trees and strings.

Man and Mdoc Strings
Strings may be extracted from mdoc and man meta-data, or from text nodes
(MDOC_TEXT and MAN_TEXT, respectively). These strings have special
non-printing formatting cues embedded in the text itself, as well as
roff(7) escapes preserved from input. Implementing systems will need to
handle both to produce human-readable text. In general, strings may be
assumed to consist of 7-bit ASCII characters.

The following non-printing characters may be embedded in text strings:

ASCII_NBRSP
A non-breaking space character.

ASCII_HYPH
A soft hyphen.

ASCII_BREAK
A breakable zero-width space.

Escape sequences are also passed verbatim into text strings. An escape
sequence is a run of characters beginning with the backslash (‘\’).
To construct human-readable text, these should be intercepted with
mandoc_escape(3) and converted with one of the functions described in
mchars_alloc(3).

Man Abstract Syntax Tree
This AST is governed by the ontological rules dictated in man(7) and
derives its terminology accordingly.

The AST is composed of struct roff_node nodes with block, head, body,
element, root and text types as declared by the type field. Each node
also provides its parse point (the line, pos, and sec fields), its
position in the tree (the parent, child, next and prev fields) and some
type-specific data.

The tree itself is arranged according to the following normal form, where
capitalised non-terminals represent nodes.

      ROOT    ← mnode+
      mnode   ← ELEMENT | TEXT | BLOCK
      BLOCK   ← HEAD BODY
      HEAD    ← mnode*
      BODY    ← mnode*
      ELEMENT ← ELEMENT | TEXT*
      TEXT    ← [[:ascii:]]*

The only elements capable of nesting other elements are those with
next-line scope as documented in man(7).
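
Because every node carries parent, child and next links, a consumer can walk the whole tree in document order without recursion. The sketch below uses a hypothetical cut-down node (a type from the normal form above plus the four link fields, not the real struct roff_node) and counts TEXT nodes; the same loop works for any per-node action.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node kinds mirroring the normal form above. */
enum mtype { M_ROOT, M_ELEMENT, M_BLOCK, M_HEAD, M_BODY, M_TEXT };

/* Cut-down stand-in for struct roff_node: a type plus tree links. */
struct mnode {
	enum mtype	 type;
	struct mnode	*parent, *child, *next, *prev;
};

/*
 * Visit every node below root in pre-order without recursion, using
 * only the parent, child and next links, and count the TEXT nodes.
 */
static size_t
count_text(const struct mnode *root)
{
	const struct mnode	*n;
	size_t			 count = 0;

	assert(root != NULL);
	n = root->child;
	while (n != NULL) {
		if (n->type == M_TEXT)
			count++;
		if (n->child != NULL) {
			n = n->child;		/* descend first */
			continue;
		}
		/* Climb until a node with a next sibling is found. */
		while (n != root && n->next == NULL)
			n = n->parent;
		n = (n == root) ? NULL : n->next;
	}
	return count;
}
```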

Mdoc Abstract Syntax Tree
This AST is governed by the ontological rules dictated in mdoc(7) and
derives its terminology accordingly. "In-line" elements described in
mdoc(7) are described simply as "elements".

The AST is composed of struct roff_node nodes with block, head, body,
element, root and text types as declared by the type field. Each node
also provides its parse point (the line, pos, and sec fields), its
position in the tree (the parent, child, last, next and prev fields) and
some type-specific data; in particular, nodes generated from macros
record the generating macro in the tok field.

The tree itself is arranged according to the following normal form, where
capitalised non-terminals represent nodes.

      ROOT    ← mnode+
      mnode   ← BLOCK | ELEMENT | TEXT
      BLOCK   ← HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
      ELEMENT ← TEXT*
      HEAD    ← mnode*
      BODY    ← mnode* [ENDBODY mnode*]
      TAIL    ← mnode*
      TEXT    ← [[:ascii:]]*

Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of the
BLOCK production: these represent punctuation marks. Furthermore,
although a TEXT node will generally have a non-zero-length string, in the
specific case of ‘.Bd -literal’, an empty line will produce a zero-length
string. Multiple body parts are only found in invocations of ‘.Bl
-column’, where a new body introduces a new phrase.
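
The BLOCK production can be checked mechanically. The following sketch uses hypothetical enum names in place of the node types. It tests whether the types of a BLOCK's children, in order, match HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]. It illustrates the grammar and is not taken from mandoc's validation code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node kinds for the children of an mdoc BLOCK. */
enum ctype { C_HEAD, C_BODY, C_TAIL, C_TEXT };

/*
 * Return 1 if the child types kids[0..n-1] match
 *     HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
 * and 0 otherwise.  TEXT here stands for trailing punctuation.
 */
static int
block_shape_ok(const enum ctype *kids, size_t n)
{
	size_t	i = 0, bodies = 0;

	assert(kids != NULL || n == 0);
	if (i == n || kids[i++] != C_HEAD)
		return 0;
	if (i < n && kids[i] == C_TEXT)
		i++;
	while (i < n && kids[i] == C_BODY) {	/* one or more bodies */
		bodies++;
		i++;
		if (i < n && kids[i] == C_TEXT)
			i++;
	}
	if (bodies == 0)
		return 0;
	if (i < n && kids[i] == C_TAIL) {	/* optional tail */
		i++;
		if (i < n && kids[i] == C_TEXT)
			i++;
	}
	return i == n;				/* nothing may follow */
}
```

A ‘.Bl -column’ row, with its several body parts, is accepted, while a block lacking a body is rejected.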

The mdoc(7) syntax tree accommodates broken block structures as well.
The ENDBODY node is available to end the formatting associated with a
given block before the physical end of that block. It has a non-null end
field, is of the BODY type, has the same tok as the BLOCK it is ending,
and has a pending field pointing to that BLOCK's BODY node. It is an
indirect child of that BODY node and has no children of its own.

An ENDBODY node is generated when a block ends while one of its child
blocks is still open, like in the following example:

      .Ao ao
      .Bo bo ac
      .Ac bc
      .Bc end

This example results in the following block structure:

      BLOCK Ao
          HEAD Ao
          BODY Ao
              TEXT ao
              BLOCK Bo, pending -> Ao
                  HEAD Bo
                  BODY Bo
                      TEXT bo
                      TEXT ac
                      ENDBODY Ao, pending -> Ao
                      TEXT bc
      TEXT end

Here, the formatting of the Ao block extends from TEXT ao to TEXT ac,
while the formatting of the Bo block extends from TEXT bo to TEXT bc. It
renders as follows in -Tascii mode:

      <ao [bo ac> bc] end
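
The properties listed above for ENDBODY nodes can be expressed as a check. The sketch assumes a hypothetical cut-down node holding only the fields named in this section (type, tok, end, parent, child and pending); it does not use the real declarations in <roff.h>. Applied to the example, ENDBODY Ao passes. Its pending field points at BODY Ao, which is reached from it upward through BODY Bo and BLOCK Bo, so it is an indirect child as required.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node kinds and a cut-down node with the fields named above. */
enum ntype { N_ROOT, N_BLOCK, N_HEAD, N_BODY, N_TEXT };

struct enode {
	enum ntype	 type;
	int		 tok;		/* generating macro, e.g. Ao or Bo */
	int		 end;		/* non-zero for an ENDBODY node */
	struct enode	*parent;
	struct enode	*child;		/* first child */
	struct enode	*pending;	/* for ENDBODY: the BODY it ends */
};

/*
 * Check the ENDBODY invariants: n is a BODY with a non-zero end field
 * and no children, its pending BODY carries the same tok, and that BODY
 * is an ancestor of n but not its direct parent.
 */
static int
endbody_ok(const struct enode *n)
{
	const struct enode	*p;

	assert(n != NULL);
	if (n->type != N_BODY || !n->end || n->child != NULL)
		return 0;
	if (n->pending == NULL || n->pending->type != N_BODY ||
	    n->pending->tok != n->tok)
		return 0;
	if (n->parent == n->pending)
		return 0;		/* must be an indirect child */
	for (p = n->parent; p != NULL; p = p->parent)
		if (p == n->pending)
			return 1;
	return 0;
}
```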

Support for badly-nested blocks is only provided for backward
compatibility with some older mdoc(7) implementations. Using
badly-nested blocks is strongly discouraged; for example, the -Thtml
front-end to mandoc(1) is unable to render them in any meaningful way.
Furthermore, behaviour when encountering badly-nested blocks is not
consistent across troff implementations, especially when using multiple
levels of badly-nested blocks.

SEE ALSO
mandoc(1), man.cgi(3), mandoc_escape(3), mandoc_headers(3),
mandoc_malloc(3), mansearch(3), mchars_alloc(3), tbl(3), eqn(7), man(7),
mandoc_char(7), mdoc(7), roff(7), tbl(7)

AUTHORS
The mandoc library was written by Kristaps Dzonsons <kristaps@bsd.lv> and
is maintained by Ingo Schwarze <schwarze@openbsd.org>.

BSD May 10, 2020 BSD