MANDOC(3)                BSD Library Functions Manual                MANDOC(3)

NAME

     mandoc, deroff, mparse_alloc, mparse_copy, mparse_free, mparse_open,
     mparse_readfd, mparse_reset, mparse_result - mandoc macro compiler
     library

SYNOPSIS

     #include <sys/types.h>
     #include <stdio.h>
     #include <mandoc.h>

     #define ASCII_NBRSP
     #define ASCII_HYPH
     #define ASCII_BREAK

     struct mparse *
     mparse_alloc(int options, enum mandoc_os os_e, char *os_s);

     void
     mparse_free(struct mparse *parse);

     void
     mparse_copy(const struct mparse *parse);

     int
     mparse_open(struct mparse *parse, const char *fname);

     void
     mparse_readfd(struct mparse *parse, int fd, const char *fname);

     void
     mparse_reset(struct mparse *parse);

     struct roff_meta *
     mparse_result(struct mparse *parse);

     #include <roff.h>

     void
     deroff(char **dest, const struct roff_node *node);

     #include <sys/types.h>
     #include <mandoc.h>
     #include <mdoc.h>

     extern const char * const * mdoc_argnames;
     extern const char * const * mdoc_macronames;

     #include <sys/types.h>
     #include <mandoc.h>
     #include <man.h>

     extern const char * const * man_macronames;

DESCRIPTION

     The mandoc library parses a UNIX manual into an abstract syntax tree
     (AST).  UNIX manuals are composed of mdoc(7) or man(7), and may be mixed
     with roff(7), tbl(7), and eqn(7) invocations.

     The following describes a general parse sequence:

     1.   initiate a parsing sequence with mchars_alloc(3) and mparse_alloc();

     2.   open a file with open(2) or mparse_open();

     3.   parse it with mparse_readfd();

     4.   close it with close(2);

     5.   retrieve the syntax tree with mparse_result();

     6.   if information about the validity of the input is needed, fetch it
          with mparse_updaterc();

     7.   iterate over parse nodes, starting from the first member of the
          returned struct roff_meta;

     8.   free all allocated memory with mparse_free() and mchars_free(3), or
          invoke mparse_reset() and go back to step 2 to parse new files.

REFERENCE

     This section documents the functions, types, and variables available via
     <mandoc.h>, with the exception of those documented in mandoc_escape(3)
     and mchars_alloc(3).

   Types
     enum mandocerr
     An error or warning message during parsing.

     enum mandoclevel
     A classification of an enum mandocerr as regards system operation.  See
     the DIAGNOSTICS section in mandoc(1) regarding the meanings of the
     levels.

     struct mparse
     An opaque pointer to a running parse sequence.  Created with
     mparse_alloc() and freed with mparse_free().  This may be used across
     parsed input if mparse_reset() is called between parses.

   Functions
     deroff()
     Obtain a text-only representation of a struct roff_node, including text
     contained in its child nodes.  To be used on children of the first member
     of struct roff_meta.  When it is no longer needed, the string stored in
     *dest by deroff() can be passed to free(3).

     mparse_alloc()
     Allocate a parser.  The arguments have the following effect:

          options  When the MPARSE_MDOC or MPARSE_MAN bit is set, only that
                   parser is used.  Otherwise, the document type is
                   automatically detected.

                   When the MPARSE_SO bit is set, roff(7) so file inclusion
                   requests are always honoured.  Otherwise, if the request is
                   the only content in an input file, only the file name is
                   remembered, to be returned in the sodest field of struct
                   roff_meta.

                   When the MPARSE_QUICK bit is set, parsing is aborted after
                   the NAME section.  This is for example useful in
                   makewhatis(8) -Q to quickly build minimal databases.

                   When the MPARSE_VALIDATE bit is set, mparse_result() runs
                   the validation functions before returning the syntax tree.
                   This is almost always required, except in certain debugging
                   scenarios, for example to dump unvalidated syntax trees.

          os_e     Operating system to check base system conventions for.  If
                   MANDOC_OS_OTHER, the system is automatically detected from
                   Os, -Ios, or uname(3).

          os_s     A default string for the mdoc(7) Os macro, overriding the
                   OSNAME preprocessor definition and the results of uname(3).
                   Passing NULL sets no default.

     The same parser may be used for multiple files so long as mparse_reset()
     is called between parses.  mparse_free() must be called to free the
     memory allocated by this function.  Declared in <mandoc.h>, implemented
     in read.c.

     mparse_free()
     Free all memory allocated by mparse_alloc().  Declared in <mandoc.h>,
     implemented in read.c.

     mparse_copy()
     Dump a copy of the input to the standard output; used for -man -Tman.
     Declared in <mandoc.h>, implemented in read.c.

     mparse_open()
     Open the file for reading.  If that fails and fname does not already end
     in ‘.gz’, try again after appending ‘.gz’.  Record whether the file is
     compressed or not.  Return a file descriptor open for reading or -1 on
     failure.  It can be passed to mparse_readfd() or used directly.
     Declared in <mandoc.h>, implemented in read.c.

     mparse_readfd()
     Parse a file descriptor opened with open(2) or mparse_open().  Pass the
     associated filename in fname.  This function may be called multiple times
     with different parameters; however, close(2) and mparse_reset() should be
     invoked between parses.  Declared in <mandoc.h>, implemented in read.c.

     mparse_reset()
     Reset a parser so that mparse_readfd() may be used again.  Declared in
     <mandoc.h>, implemented in read.c.

     mparse_result()
     Obtain the result of a parse.  Declared in <mandoc.h>, implemented in
     read.c.

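     The retry behaviour described for mparse_open() can be sketched without
     the library.  The following is a minimal illustration only, not the code
     in read.c: open_with_gz_fallback() and its gzipped out-parameter are
     hypothetical names, and the fixed-size name buffer is an assumption of
     the sketch.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the mandoc API: open fname for
 * reading; if that fails and fname does not already end in ".gz",
 * retry with ".gz" appended.  *gzipped is set to 1 if the name that
 * was opened (or last tried) ends in ".gz", and to 0 otherwise.
 * Returns a file descriptor, or -1 on failure.
 */
int
open_with_gz_fallback(const char *fname, int *gzipped)
{
	char	 buf[4096];	/* assumed large enough for the sketch */
	size_t	 len;
	int	 fd;

	len = strlen(fname);
	*gzipped = len >= 3 && strcmp(fname + len - 3, ".gz") == 0;

	if ((fd = open(fname, O_RDONLY)) != -1)
		return fd;
	if (*gzipped)
		return -1;	/* the name already ended in .gz */

	/* Build "fname.gz"; give up if it does not fit the buffer. */
	if ((size_t)snprintf(buf, sizeof(buf), "%s.gz", fname) >= sizeof(buf))
		return -1;
	if ((fd = open(buf, O_RDONLY)) != -1)
		*gzipped = 1;
	return fd;
}
```

     A caller would hand the returned descriptor to its reader and close(2)
     it afterwards; the gzipped flag tells the reader whether the data needs
     to be decompressed first.
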
   Variables
     man_macronames
     The string representation of a man(7) macro as indexed by enum mant.

     mdoc_argnames
     The string representation of an mdoc(7) macro argument as indexed by enum
     mdocargt.

     mdoc_macronames
     The string representation of an mdoc(7) macro as indexed by enum mdoct.

IMPLEMENTATION NOTES

     This section consists of structural documentation for mdoc(7) and man(7)
     syntax trees and strings.

   Man and Mdoc Strings
     Strings may be extracted from mdoc and man meta-data, or from text nodes
     (MDOC_TEXT and MAN_TEXT, respectively).  These strings have special
     non-printing formatting cues embedded in the text itself, as well as
     roff(7) escapes preserved from input.  Implementing systems will need to
     handle both situations to produce human-readable text.  In general,
     strings may be assumed to consist of 7-bit ASCII characters.

     The following non-printing characters may be embedded in text strings:

     ASCII_NBRSP
             A non-breaking space character.

     ASCII_HYPH
             A soft hyphen.

     ASCII_BREAK
             A breakable zero-width space.

     Escape sequences are also passed verbatim into text strings.  An escape
     sequence is a sequence of characters beginning with the backslash (‘\’).
     To construct human-readable text, these should be intercepted with
     mandoc_escape(3) and converted with one of the functions described in
     mchars_alloc(3).

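     The three non-printing cues listed above can be mapped to plain
     characters in a single pass.  The sketch below is an illustration under
     stated assumptions: plain_text() is a hypothetical helper, the numeric
     values are stand-ins so that the block compiles without <mandoc.h>, and
     rendering ASCII_HYPH as ‘-’ is a choice of the sketch.  roff(7) escapes
     are left untouched; those still need mandoc_escape(3).

```c
#include <stddef.h>

/*
 * Stand-in values, assumed for illustration only.  Real code should
 * include <mandoc.h> and use the definitions found there.
 */
#ifndef ASCII_NBRSP
#define ASCII_NBRSP	31
#define ASCII_HYPH	30
#define ASCII_BREAK	29
#endif

/*
 * Hypothetical helper: rewrite the non-printing cues in s in place.
 * Non-breaking spaces become ' ', hyphenation cues become '-', and
 * breakable zero-width spaces are dropped.  Returns the new length.
 */
size_t
plain_text(char *s)
{
	char	*src, *dst;

	for (src = dst = s; *src != '\0'; src++) {
		switch (*src) {
		case ASCII_NBRSP:
			*dst++ = ' ';
			break;
		case ASCII_HYPH:
			*dst++ = '-';
			break;
		case ASCII_BREAK:
			break;		/* zero width: emit nothing */
		default:
			*dst++ = *src;
			break;
		}
	}
	*dst = '\0';
	return (size_t)(dst - s);
}
```

     Because the output is never longer than the input, the rewrite can be
     done in place; a formatter that wants to honour the break hints would
     instead inspect the cues before discarding them.
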
   Man Abstract Syntax Tree
     This AST is governed by the ontological rules dictated in man(7) and
     derives its terminology accordingly.

     The AST is composed of struct roff_node nodes with element, root and text
     types as declared by the type field.  Each node also provides its parse
     point (the line, pos, and sec fields), its position in the tree (the
     parent, child, next and prev fields) and some type-specific data.

     The tree itself is arranged according to the following normal form, where
     capitalised non-terminals represent nodes.

     ROOT       ← mnode+
     mnode      ← ELEMENT | TEXT | BLOCK
     BLOCK      ← HEAD BODY
     HEAD       ← mnode*
     BODY       ← mnode*
     ELEMENT    ← ELEMENT | TEXT*
     TEXT       ← [[:ascii:]]*

     The only elements capable of nesting other elements are those with
     next-line scope as documented in man(7).

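     Walking a tree of this shape amounts to following the child and next
     pointers.  The sketch below uses a simplified stand-in, struct node,
     rather than the real struct roff_node, and collect_text() is a
     hypothetical function that is similar in spirit to deroff() but is not
     its implementation.  All names in it are assumptions made for
     illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for the node types; the real enum is declared in <roff.h>. */
enum node_type { NODE_ROOT, NODE_ELEMENT, NODE_TEXT };

/*
 * Simplified stand-in for struct roff_node, holding only the fields
 * this sketch needs.
 */
struct node {
	enum node_type	 type;
	const char	*string;	/* text nodes only */
	struct node	*child;		/* first child */
	struct node	*next;		/* next sibling */
};

/*
 * Append the text of n and of all its descendants to *dest,
 * separating the pieces with single blanks.  *dest must be NULL or
 * point to malloc(3)ed storage; it is reallocated as needed.
 * On allocation failure, *dest is left unchanged.
 */
void
collect_text(char **dest, const struct node *n)
{
	const struct node	*c;
	char			*cp;
	size_t			 old, add;

	if (n->type == NODE_TEXT && n->string != NULL) {
		old = *dest == NULL ? 0 : strlen(*dest);
		add = strlen(n->string);
		/* Room for the old text, a blank, the new text, the NUL. */
		if ((cp = realloc(*dest, old + add + 2)) == NULL)
			return;
		if (old > 0)
			cp[old++] = ' ';
		memcpy(cp + old, n->string, add + 1);
		*dest = cp;
		return;
	}
	for (c = n->child; c != NULL; c = c->next)
		collect_text(dest, c);
}
```

     Starting with a NULL pointer and passing the root node collects the text
     of the whole tree in document order; the caller releases the result with
     free(3).
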
   Mdoc Abstract Syntax Tree
     This AST is governed by the ontological rules dictated in mdoc(7) and
     derives its terminology accordingly.  "In-line" elements described in
     mdoc(7) are described simply as "elements".

     The AST is composed of struct roff_node nodes with block, head, body,
     element, root and text types as declared by the type field.  Each node
     also provides its parse point (the line, pos, and sec fields), its
     position in the tree (the parent, child, last, next and prev fields) and
     some type-specific data, in particular, for nodes generated from macros,
     the generating macro in the tok field.

     The tree itself is arranged according to the following normal form, where
     capitalised non-terminals represent nodes.

     ROOT       ← mnode+
     mnode      ← BLOCK | ELEMENT | TEXT
     BLOCK      ← HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
     ELEMENT    ← TEXT*
     HEAD       ← mnode*
     BODY       ← mnode* [ENDBODY mnode*]
     TAIL       ← mnode*
     TEXT       ← [[:ascii:]]*

     Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of the
     BLOCK production: these refer to punctuation marks.  Furthermore,
     although a TEXT node will generally have a non-zero-length string, in the
     specific case of ‘.Bd -literal’, an empty line will produce a zero-length
     string.  Multiple body parts are only found in invocations of ‘Bl
     -column’, where a new body introduces a new phrase.

     The mdoc(7) syntax tree accommodates broken block structures as well.
     The ENDBODY node is available to end the formatting associated with a
     given block before the physical end of that block.  It has a non-null end
     field, is of the BODY type, has the same tok as the BLOCK it is ending,
     and has a pending field pointing to that BLOCK's BODY node.  It is an
     indirect child of that BODY node and has no children of its own.

     An ENDBODY node is generated when a block ends while one of its child
     blocks is still open, as in the following example:

           .Ao ao
           .Bo bo ac
           .Ac bc
           .Bc end

     This example results in the following block structure:

           BLOCK Ao
               HEAD Ao
               BODY Ao
                   TEXT ao
                   BLOCK Bo, pending -> Ao
                       HEAD Bo
                       BODY Bo
                           TEXT bo
                           TEXT ac
                           ENDBODY Ao, pending -> Ao
                           TEXT bc
           TEXT end

     Here, the formatting of the Ao block extends from TEXT ao to TEXT ac,
     while the formatting of the Bo block extends from TEXT bo to TEXT bc.  It
     renders as follows in -Tascii mode:

           <ao [bo ac> bc] end

     Support for badly-nested blocks is only provided for backward
     compatibility with some older mdoc(7) implementations.  Using
     badly-nested blocks is strongly discouraged; for example, the -Thtml
     front-end to mandoc(1) is unable to render them in any meaningful way.
     Furthermore, behaviour when encountering badly-nested blocks is not
     consistent across troff implementations, especially when using multiple
     levels of badly-nested blocks.

SEE ALSO

     mandoc(1), man.cgi(3), mandoc_escape(3), mandoc_headers(3),
     mandoc_malloc(3), mansearch(3), mchars_alloc(3), tbl(3), eqn(7), man(7),
     mandoc_char(7), mdoc(7), roff(7), tbl(7)

AUTHORS

     The mandoc library was written by Kristaps Dzonsons <kristaps@bsd.lv> and
     is maintained by Ingo Schwarze <schwarze@openbsd.org>.

BSD                              May 10, 2020                              BSD