roff(7)              Miscellaneous Information Manual              roff(7)

roff - concepts and history of roff typesetting

The term roff denotes a family of document formatting systems known by
names like troff, nroff, and ditroff. A roff system consists of an
interpreter for an extensible text formatting language and a set of
programs for preparing output for various devices and file formats.
Unix-like operating systems often distribute a roff system. The manual
pages on Unix systems (“man pages”) and bestselling books on software
engineering, including Brian Kernighan and Dennis Ritchie's The C
Programming Language and W. Richard Stevens's Advanced Programming in
the Unix Environment, have been written using roff systems. GNU
roff—groff—is arguably the most widespread roff implementation.

Below we present typographical concepts that form the background of all
roff implementations, narrate the development history of some roff
systems, detail the command pipeline managed by groff(1), survey the
formatting language, suggest tips for editing roff input, and recommend
further reading materials.

roff input files contain text interspersed with instructions to control
the formatter. Even in the absence of such instructions, a roff
formatter still processes its input in several ways, by filling,
hyphenating, breaking, and adjusting it, and supplementing it with
inter-sentence space. These processes are basic to typesetting, and can
be controlled at the input document's discretion.

When a device-independent roff formatter starts up, it obtains
information about the device for which it is preparing output from the
latter's description file (see groff_font(5)). An essential property is
the length of the output line, such as “6.5 inches”.

The formatter interprets plain text files employing the Unix
line-ending convention. It reads input a character at a time,
collecting words as it goes, and fits as many words together on an
output line as it can—this is known as filling. To a roff system, a
word is any sequence of one or more characters that aren't spaces or
newlines. Spaces and newlines themselves separate words.
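
As an illustration only, the following Python sketch models filling
under simplifying assumptions: widths are counted in characters rather
than device units, any whitespace separates words, and hyphenation and
adjustment are ignored. The name fill and its parameters are
hypothetical and do not correspond to anything in a roff
implementation.

```python
def fill(text, line_length):
    """Greedy filling: put as many words on each output line as will fit."""
    lines = []
    current = ""
    for word in text.split():  # whitespace separates words
        candidate = word if not current else current + " " + word
        if not current or len(candidate) <= line_length:
            current = candidate      # the word fits, or stands alone
        else:
            lines.append(current)    # automatic break
            current = word
    if current:
        lines.append(current)        # the end of input causes a break
    return lines


print(fill("a roff formatter fills\ntext", 10))
# prints ['a roff', 'formatter', 'fills text']
```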

A roff formatter attempts to detect boundaries between sentences, and
supplies additional inter-sentence space between them. It flags certain
characters (normally “!”, “?”, and “.”) as potentially ending a
sentence. When the formatter encounters one of these end-of-sentence
characters at the end of an input line, or one of them is followed by
two (unescaped) spaces on the same input line, it appends an inter-word
space followed by an inter-sentence space in the output. The dummy
character escape sequence \& can be used after an end-of-sentence
character to defeat end-of-sentence detection on a per-instance basis.
Normally, the occurrence of a visible non-end-of-sentence character (as
opposed to a space or tab) immediately after an end-of-sentence
character cancels detection of the end of a sentence. However, several
characters are treated transparently after the occurrence of an
end-of-sentence character. That is, a roff does not cancel
end-of-sentence detection when it processes them. This is because such
characters are often used as footnote markers or to close quotations
and parentheticals. The default set is ", ', ), ], *, \[dg], \[dd],
\[rq], and \[cq]. The last four are examples of special characters,
escape sequences whose purpose is to obtain glyphs that are not easily
typed at the keyboard, or which have special meaning to the formatter
(like \).
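
The rules above can be approximated with a regular expression. The
Python sketch below is a simplified model, not the formatter's actual
logic: it handles only the five ASCII transparent characters, ignores
escaped spaces and the special character escape sequences, and the name
has_sentence_end is hypothetical.

```python
import re

# An end-of-sentence character, then any run of transparent characters,
# then either two spaces or the end of the input line.
_END_OF_SENTENCE = re.compile(r"""[.?!]["')\]*]*(?:  |$)""")


def has_sentence_end(line):
    """Return True if the input line contains a detectable sentence end."""
    return _END_OF_SENTENCE.search(line) is not None


print(has_sentence_end('He shouted "Stop!"'))   # prints True
print(has_sentence_end(r"as in etc.\&"))        # prints False
```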

When an output line is nearly full, it is uncommon for the next word
collected from the input to exactly fill it—typically, there is room
left over only for part of the next word. The process of splitting a
word so that it appears partially on one line (with a hyphen to
indicate to the reader that the word has been broken) with its
remainder on the next is hyphenation. Hyphenation points can be
manually specified; groff also uses a hyphenation algorithm and
language-specific pattern files to decide which words can be hyphenated
and where. Hyphenation does not always occur even when the hyphenation
rules for a word allow it; it can be disabled, and when not disabled
there are several parameters that can prevent it in certain
circumstances.

Once an output line is full, the next word (or remainder of a
hyphenated one) is placed on a different output line; this is called a
break. In this document and in roff discussions generally, a “break” if
not further qualified always refers to the termination of an output
line. When the formatter is filling text, it introduces breaks
automatically to keep output lines from exceeding the configured line
length. After an automatic break, a roff formatter adjusts the line if
applicable (see below), and then resumes collecting and filling text on
the next output line.

Sometimes, a line cannot be broken automatically. This usually does not
happen with natural language text unless the output line length has
been manipulated to be extremely short, but it can with specialized
text like program source code. groff provides a means of telling the
formatter where the line may be broken without hyphens. This is done
with the non-printing break point escape sequence \:.

There are several ways to cause a break at a predictable location. A
blank input line not only causes a break, but by default it also
outputs a one-line vertical space (effectively a blank output line).
Macro packages may discourage or disable this “blank line method” of
paragraphing in favor of their own macros. A line that begins with one
or more spaces causes a break. The spaces are output at the beginning
of the next line without being adjusted (see below). Again, macro
packages may provide other methods of producing indented paragraphs.
Trailing spaces on text lines (see below) are discarded. The end of
input causes a break.

After the formatter performs an automatic break, it may then adjust the
line, widening inter-word spaces until the text reaches the right
margin. Extra spaces between words are preserved. Leading and trailing
spaces are handled as noted above. Text can be aligned to the left or
right margin only, or centered, using requests.
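
Adjustment to both margins can be pictured with the following Python
sketch. It assumes fixed-width characters and hands any leftover spaces
to the leftmost gaps; real formatters work in device units and may
distribute the extra space differently. The name adjust is
hypothetical.

```python
def adjust(words, line_length):
    """Widen inter-word spaces so the words span line_length columns."""
    gaps = len(words) - 1
    natural = sum(len(word) for word in words) + gaps
    extra = line_length - natural
    if gaps < 1 or extra < 0:
        return " ".join(words)       # nothing to widen, or line too long
    share, leftover = divmod(extra, gaps)
    line = words[0]
    for index, word in enumerate(words[1:]):
        # Every gap gets its share; the first `leftover` gaps get one more.
        width = 1 + share + (1 if index < leftover else 0)
        line += " " * width + word
    return line


print(repr(adjust(["fits", "as", "many"], 14)))
# prints 'fits  as  many'
```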

A roff formatter translates horizontal tab characters, also called
simply “tabs”, in the input into movements to the next tab stop. These
tab stops are by default located every half inch measured from the
current position on the input line. With them, simple tables can be
made. However, this method can be deceptive, as the appearance (and
width) of the text in an editor and the results from the formatter can
vary greatly, particularly when proportional typefaces are used. A tab
character does not cause a break and therefore does not interrupt
filling. The formatter provides facilities for sophisticated table
composition; there are many details to track when using the “tab” and
“field” low-level features, so most users turn to the tbl(1)
preprocessor to lay out tables.
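
The arithmetic of evenly spaced tab stops is simple. This Python sketch
assumes positions are measured in points (72 to the inch) from a fixed
origin at zero, so that the default half-inch interval is 36; the name
next_tab_stop is hypothetical.

```python
def next_tab_stop(position, interval=36):
    """Return the first tab stop strictly to the right of position."""
    return (position // interval + 1) * interval


print(next_tab_stop(50))   # prints 72
```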

Requests and macros
A request is an instruction to the formatter that occurs after a
control character, which is recognized at the beginning of an input
line. The regular control character is a dot “.”. Its counterpart, the
no-break control character, a neutral apostrophe “'”, suppresses the
break implied by some requests. These characters were chosen because it
is uncommon for lines of text in natural languages to begin with them.
If you require a formatted period or apostrophe (closing single
quotation mark) where the formatter is expecting a control character,
prefix the dot or neutral apostrophe with the dummy character escape
sequence, “\&”.

An input line beginning with a control character is called a control
line. Every line of input that is not a control line is a text line.

Requests often take arguments, words (separated from the request name
and each other by spaces) that specify details of the action the
formatter is expected to perform. If a request is meaningless without
arguments, it is typically ignored. Of key importance are the requests
that define macros. Macros are invoked like requests, enabling the
request repertoire to be extended or overridden.

A macro can be thought of as an abbreviation you can define for a
collection of control and text lines. When the macro is called by
giving its name after a control character, it is replaced with what it
stands for. The process of textual replacement is known as
interpolation. Interpolations are handled as soon as they are
recognized, and once performed, a roff formatter scans the replacement
for further requests, macro calls, and escape sequences.

In roff systems, the “de” request defines a macro.
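
Interpolation can be modeled in a few lines of Python. The sketch below
is a toy, not a roff interpreter: it assumes a definition opened by a
“.de name” line runs until a line consisting of “..”, ignores macro
arguments and escape sequences, and leaves all other control lines
untouched. The name expand is hypothetical.

```python
def expand(lines, macros=None):
    """Replace macro calls with their definitions, rescanning the result."""
    macros = {} if macros is None else macros
    output = []
    source = iter(lines)
    for line in source:
        words = line[1:].split() if line.startswith(".") else []
        if len(words) > 1 and words[0] == "de":
            # Collect the macro body up to the terminating ".." line.
            body = []
            for body_line in source:
                if body_line == "..":
                    break
                body.append(body_line)
            macros[words[1]] = body
        elif words and words[0] in macros:
            # Interpolate, then scan the replacement for further calls.
            output.extend(expand(macros[words[0]], macros))
        else:
            output.append(line)
    return output


print(expand([".de HI", "Hello,", ".br", "..", ".HI", "world."]))
# prints ['Hello,', '.br', 'world.']
```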

Page geometry
roff systems format text under certain assumptions about the size of
the output medium, or page. For the formatter to correctly break a line
it is filling, it must know the line length, which it derives from the
page width. For it to decide whether to write an output line to the
current page or wait until the next one, it must know the page length.
A device's resolution converts practical units like inches or
centimeters to basic units, a convenient length measure for the output
device or file format. The formatter and output driver use basic units
to reckon page measurements. The device description file defines its
resolution and page dimensions (see groff_font(5)).
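
The conversion from practical units to basic units is a multiplication
by the device resolution. In this Python sketch the unit letters i, c,
and p stand for inches, centimeters, and points (1/72 inch); the
resolution of 72000 basic units per inch is only an assumed example
value, and the name to_basic_units is hypothetical.

```python
from fractions import Fraction

# Inches per unit: i (inch), c (centimeter), p (point).
INCHES_PER_UNIT = {
    "i": Fraction(1),
    "c": Fraction(100, 254),
    "p": Fraction(1, 72),
}


def to_basic_units(value, unit, resolution):
    """Convert a measurement to basic units at the given resolution."""
    return round(Fraction(value) * INCHES_PER_UNIT[unit] * resolution)


# Assumed example: a 6.5 inch line at 72000 basic units per inch.
print(to_basic_units("6.5", "i", 72000))   # prints 468000
```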

A page is a two-dimensional structure upon which a roff system imposes
a rectangular coordinate system with its upper left corner as the
origin. Coordinate values are in basic units and increase down and to
the right. Useful ones are therefore always positive and within numeric
ranges corresponding to the page boundaries.

While the formatter (and, later, output driver) is processing a page,
it keeps track of its drawing position, which is the location at which
the next glyph will be written, from which the next motion will be
measured, or where a geometric object will commence rendering.
Notionally, glyphs are drawn from the text baseline upward and to the
right. (groff does not yet support right-to-left scripts.) The text
baseline is a (usually invisible) line upon which the glyphs of a
typeface are aligned. A glyph therefore “starts” at its bottom-left
corner. If drawn at the origin, a typical letter glyph would lie
partially or wholly off the page, depending on whether, like “g”, it
features a descender below the baseline.

Such a situation is nearly always undesirable. It is furthermore
conventional not to write or draw at the extreme edges of the page.
Therefore the initial drawing position of a roff formatter is not at
the origin, but below and to the right of it. This rightward shift from
the left edge is known as the page offset. (groff's terminal output
devices have page offsets of zero.) The downward shift leaves room for
a text output line.

Text is arranged on a one-dimensional lattice of text baselines from
the top to the bottom of the page. Vertical spacing is the distance
between adjacent text baselines. Typographic tradition sets this
quantity to 120% of the type size. The initial vertical drawing
position is one unit of vertical spacing below the page top.
Typographers term this unit a vee.

Vertical spacing has an impact on page-breaking decisions. Generally,
when a break occurs, the formatter moves the drawing position to the
next text baseline automatically. If the formatter were already writing
to the last line that would fit on the page, advancing by one vee would
place the next text baseline off the page. Rather than let that happen,
roff formatters instruct the output driver to eject the page, start a
new one, and again set the drawing position to one vee below the page
top; this is a page break.
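
The lattice of text baselines can be enumerated directly. The Python
sketch below works in points and assumes that a baseline lying exactly
at the page bottom still fits; headers, footers, and traps are ignored,
and the name text_baselines is hypothetical. The example uses an 11 inch
page (792 points) and a vee of 12 points, which is 120% of 10 point
type.

```python
def text_baselines(page_length, vee):
    """Yield the vertical position of each text baseline on one page."""
    position = vee                 # initial position: one vee below the top
    while position <= page_length:
        yield position
        position += vee            # each break advances by one vee
    # The next advance would leave the page, so a page break occurs here.


baselines = list(text_baselines(792, 12))
print(len(baselines), baselines[0], baselines[-1])   # prints 66 12 792
```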

When the last line of input text corresponds to the last output line
that fits on the page, the break caused by the end of input will also
break the page, producing a useless blank one. Macro packages keep
users from having to confront this difficulty by setting “traps”;
moreover, all but the simplest page layouts tend to have headers and
footers, or at least bear vertical margins larger than one vee.

Other language elements
Escape sequences start with the escape character, a backslash \, and
are followed by at least one additional character. They can appear
anywhere in the input.

With requests, the escape and control characters can be changed;
further, escape sequence recognition can be turned off and back on.

Strings store character sequences. In groff, they can be parameterized
as macros can.

Registers store numerical values, including measurements. The latter
are generally in basic units; scaling units can be appended to numeric
expressions to clarify their meaning when stored or interpolated. Some
read-only predefined registers interpolate text.

Fonts are identified either by a name or by a mounting position (a
non-negative number). Four styles are available on all devices. R is
“roman”: normal, upright text. B is bold, an upright typeface with a
heavier weight. I is italic, a face that is oblique on typesetter
output devices and usually underlined instead on terminal devices. BI
is bold-italic, combining both of the foregoing style variations.
Typesetting devices group these four styles into families of text
fonts; they also typically offer one or more special fonts that provide
unstyled glyphs; see groff_char(7).

groff supports named colors for glyph rendering and drawing of
geometric objects. Stroke and fill colors are distinct; the stroke
color is used for glyphs.

Glyphs are visual representation forms of characters. In groff, the
distinction between those two elements is not always obvious (and a
full discussion is beyond our scope). In brief, “A” is a character when
we consider it in the abstract: to make it a glyph, we must select a
typeface with which to render it, and determine its type size and
color. The formatting process turns input characters into output
glyphs. A few characters commonly seen on keyboards are treated
specially by the roff language and may not look correct in output if
used unthinkingly; they are the (double) quotation mark ("), the
neutral apostrophe ('), the minus sign (-), the backslash (\), the
caret or circumflex accent (^), the grave accent (`), and the tilde
(~). All of these and more can be produced with special character
escape sequences; see groff_char(7).

groff offers streams, identifiers for writable files, but for security
reasons this feature is disabled by default.

A further few language elements arise as page layouts become more
sophisticated and demanding. Environments collect formatting parameters
like line length and typeface. A diversion stores formatted output for
later use. A trap is a condition on the input or output, tested
automatically by the formatter, that is associated with a macro,
calling it when that condition is fulfilled.

Footnote support often exercises all three of the foregoing features. A
simple implementation might work as follows. A pair of macros is
defined: one starts a footnote and the other ends it. The author calls
the first macro where a footnote marker is desired. The macro
establishes a diversion so that the footnote text is collected at the
place in the body text where its corresponding marker appears. An
environment is created for the footnote so that it is set in a smaller
type size. The footnote text is formatted in the diversion using that
environment, but it does not yet appear in the output. The document
author calls the footnote end macro, which returns to the previous
environment and ends the diversion. Later, after much more body text in
the document, a trap, set a small distance above the page bottom, is
sprung. The macro called by the trap draws a line across the page and
emits the stored diversion. Thus, the footnote is rendered.
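
The interplay of diversion and trap in that description can be mimicked
with a toy Python class. Everything here is a hypothetical model, not
roff: the trap is expressed as an output line count, the footnote
"environment" is reduced to a marker prefix, and the class and method
names are invented for illustration.

```python
class ToyFormatter:
    """Collect footnotes in a diversion and emit them when a trap springs."""

    def __init__(self, trap_line):
        self.trap_line = trap_line   # output line count that springs the trap
        self.output = []             # lines written to the page
        self.diversion = []          # formatted footnote text, held back

    def footnote(self, text):
        # Format the note now (here, only a prefix) but do not output it.
        self.diversion.append("* " + text)

    def text_line(self, text):
        self.output.append(text)
        if len(self.output) == self.trap_line:
            self._spring_trap()

    def _spring_trap(self):
        self.output.append("-" * 10)         # rule across the page
        self.output.extend(self.diversion)   # emit the stored diversion
        self.diversion.clear()


page = ToyFormatter(trap_line=3)
page.text_line("one*")
page.footnote("a note")
page.text_line("two")
page.text_line("three")
print(page.output)
# prints ['one*', 'two', 'three', '----------', '* a note']
```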

Computer-driven document formatting dates back to the 1960s. The roff
system is intimately connected with Unix, but its origins lie with the
earlier operating systems CTSS, GECOS, and Multics.

The predecessor—RUNOFF
roff's ancestor RUNOFF was written in the MAD language by Jerry Saltzer
to prepare his Ph.D. thesis on the Compatible Time Sharing System
(CTSS), a project of the Massachusetts Institute of Technology (MIT).
This program is referred to in full capitals, both to distinguish it
from its many descendants, and because bits were expensive in those
days; five- and six-bit character encodings were still in widespread
usage, and mixed-case alphabetics in file names seen as a luxury.
RUNOFF introduced a syntax of inlining formatting directives amid
document text, by beginning a line with a period (an unlikely
occurrence in human-readable material) followed by a “control word”.
Control words with obvious meaning like “.line length n” were supported
as well as an abbreviation system; the latter came to overwhelm the
former in popular usage and later derivatives of the program. A sample
of control words from a RUNOFF manual of December 1966
⟨http://web.mit.edu/Saltzer/www/publications/ctss/AH.9.01.html⟩ was
documented as follows (with the parameter notation slightly altered).
The abbreviations will be familiar to roff veterans.

Abbreviation   Control word
.ad            .adjust
.bp            .begin page
.br            .break
.ce            .center
.in            .indent n
.ll            .line length n
.nf            .nofill
.pl            .paper length n
.sp            .space [n]

In 1965, MIT's Project MAC teamed with Bell Telephone Laboratories and
General Electric (GE) to inaugurate the Multics
⟨http://www.multicians.org⟩ project. After a few years, Bell Labs
discontinued its participation in Multics, famously prompting the
development of Unix. Meanwhile, Saltzer's RUNOFF proved influential,
seeing many ports and derivations elsewhere.

In 1969, Doug McIlroy wrote one such reimplementation, adding
extensions, in the BCPL language for a GE 645 running GECOS at the Bell
Labs location in Murray Hill, New Jersey. In its manual, the control
commands were termed “requests”, their two-letter names were canonical,
and the control character was configurable with a .cc request. Other
familiar requests emerged at this time: no-adjust (.na), need (.ne),
page offset (.po), tab configuration (.ta, though it worked
differently), temporary indent (.ti), character translation (.tr), and
automatic underlining (.ul; on RUNOFF you had to backspace and
underscore in the input yourself). The .fi request to enable filling of
output lines got the name it retains to this day. McIlroy's program
also featured a heuristic system for automatically placing hyphenation
points, designed and implemented by Molly Wagner. It furthermore
introduced numeric variables, termed registers. By 1971, this program
had been ported to Multics and was known as roff, a name McIlroy
attributes to Bob Morris, to distinguish it from CTSS RUNOFF.

Unix and roff
McIlroy's roff was one of the first Unix programs. In Ritchie's term,
it was “transliterated” from BCPL to DEC PDP-7 assembly language for
the fledgling Unix operating system. Automatic hyphenation was managed
with .hc and .hy requests, line spacing control was generalized with
the .ls request, and what later roffs would call diversions were
available via “footnote” requests. This roff indirectly funded
operating systems research at Murray Hill; AT&T prepared patent
applications to the U.S. government with it. This arrangement enabled
the group to acquire a PDP-11; roff promptly proved equal to the task
of formatting the manual for what would become known as “First Edition
Unix”, dated November 1971.

Output from all of the foregoing programs was limited to line printers
and paper terminals such as the IBM 2741 (based on the Selectric line
of typewriters) and the Teletype Corporation Model 37. Proportionally
spaced type was unavailable.

New roff and Typesetter roff
The first years of Unix were spent in rapid evolution. The
practicalities of preparing standardized documents like patent
applications (and Unix manual pages), combined with McIlroy's
enthusiasm for macro languages, perhaps created an irresistible
pressure to make roff extensible. Joe Ossanna's nroff, literally a “new
roff”, was the outlet for this pressure. By the time of Unix Version 3
(February 1973)—and still in PDP-11 assembly language—it sported a
swath of features now considered essential to roff systems: definition
of macros (.de), diversion of text thither (.di), and removal thereof
(.rm); trap planting (.wh; “when”) and relocation (.ch; “change”);
conditional processing (.if); and environments (.ev). Incremental
improvements included assignment of the next page number (.pn);
no-space mode (.ns) and restoration of vertical spacing (.rs); the
saving (.sv) and output (.os) of vertical space; specification of
replacement characters for tabs (.tc) and leaders (.lc); configuration
of the no-break control character (.c2); shorthand to disable automatic
hyphenation (.nh); a condensation of what were formerly six different
requests for configuration of page “titles” (headers and footers) into
one (.tl) with a length controlled separately from the line length
(.lt); automatic line numbering (.nm); interactive input (.rd), which
necessitated buffer-flushing (.fl), and was made convenient with early
program cessation (.ex); source file inclusion in its modern form (.so;
though RUNOFF had an “.append” control word for a similar purpose) and
early advance to the next file argument (.nx); ignorable content (.ig);
and programmable abort (.ab).

Third Edition Unix also brought the pipe(2) system call, the explosive
growth of a componentized system based around it, and a “filter model”
that remains perceptible today. Equally importantly, the Bell Labs site
in Murray Hill acquired a Graphic Systems C/A/T phototypesetter, and
with it came the necessity of expanding the capabilities of a roff
system to cope with a variety of proportionally spaced typefaces at
multiple sizes. Ossanna wrote a parallel implementation of nroff for
the C/A/T, dubbing it troff (for “typesetter roff”). Unfortunately,
surviving documentation does not illustrate what requests were
implemented at this time for C/A/T support; the troff(1) man page in
Fourth Edition Unix (November 1973) does not feature a request list,
unlike nroff(1). Apart from typesetter-driven features, Unix Version 4
roffs added string definitions (.ds); made the escape character
configurable (.ec); and enabled the user to write diagnostics to the
standard error stream (.tm). Around 1974, empowered with multiple type
sizes, italics, and a symbol font specially commissioned by Bell Labs
from Graphic Systems, Kernighan and Lorinda Cherry implemented eqn for
typesetting mathematics. In the same year, for Fifth Edition Unix,
Ossanna combined and reimplemented the two roffs in C, using that
language's preprocessor to generate both from a single source tree.

Ossanna documented the syntax of the input language to the nroff and
troff programs in the “Troff User's Manual”, first published in 1976,
with further revisions as late as 1992 by Kernighan. (The original
version was entitled “Nroff/Troff User's Manual”, which may partially
explain why roff practitioners have tended to refer to it by its AT&T
document identifier, “CSTR #54”.) Its final revision serves as the de
facto specification of AT&T troff, and all subsequent implementors of
roff systems have worked in its shadow.

A small and simple set of roff macros was first used for the manual
pages of Unix Version 4 and persisted for two further releases, but the
first macro package to be formally described and installed was ms by
Michael Lesk in Version 6. He also wrote a manual, “Typing Documents on
the Unix System”, describing ms and basic nroff/troff usage, updating
it as the package accrued features. Sixth Edition additionally saw the
debut of the tbl preprocessor for formatting tables, also by Lesk.

For Unix Version 7 (January 1979), McIlroy designed, implemented, and
documented the man macro package, introducing most of the macros
described in groff_man(7) today, and edited volume 1 of the Version 7
manual using it. Documents composed using ms featured in volume 2,
edited by Kernighan.

Meanwhile, troff proved popular even at Unix sites that lacked a C/A/T
device. Tom Ferrin of the University of California at San Francisco
combined it with Allen Hershey's popular vector fonts to produce
vtroff, which translated troff's output to the command language used by
Versatec and Benson-Varian plotters.

Ossanna had passed away unexpectedly in 1977, and after the release of
Version 7, with the C/A/T typesetter becoming supplanted by alternative
devices such as the Mergenthaler Linotron 202, Kernighan undertook a
revision and rewrite of troff to generalize its design. To implement
this revised architecture, he developed the font and device description
file formats and the page description language that remain in use
today. He described these novelties in the article “A
Typesetter-independent TROFF”, last revised in 1982, and like the troff
manual itself, it is widely known by a shorthand, “CSTR #97”.

Kernighan's innovations prepared troff well for the introduction of the
Adobe PostScript language in 1984 and a vibrant market in laser
printers with built-in interpreters for it. An output driver for
PostScript, dpost, was swiftly developed. However, AT&T's software
licensing practices kept Ossanna's troff, with its tight coupling to
the C/A/T's capabilities, in parallel distribution with
device-independent troff throughout the 1980s. Today, however, all
actively maintained troffs follow Kernighan's device-independent
design.

groff—a free roff from GNU
The most important free roff project historically has been groff, the
GNU implementation of troff, developed by James Clark starting in 1989
and distributed under copyleft ⟨http://www.gnu.org/copyleft⟩ licenses,
ensuring to all the availability of source code and the freedom to
modify and redistribute it, properties unprecedented in roff systems to
that point. groff rapidly attracted contributors, and has served as a
replacement for almost all applications of AT&T troff (exceptions
include mv, a macro package for preparation of viewgraphs and slides,
and the ideal preprocessor, which produces diagrams from mathematical
constraints). Beyond that, it has added numerous features; see
groff_diff(7). Since its inception and for at least the following three
decades, it has been used by practically all GNU/Linux and BSD
operating systems.

groff continues to be developed, is available for almost all operating
systems in common use (along with several obscure ones), and is free.
These factors make groff the de facto roff standard today.

Other free roffs
In 2007, Caldera/SCO and Sun Microsystems, having acquired rights to
AT&T Documenter's Workbench (DWB) troff (a descendant of the Bell Labs
code), released it under a free but GPL-incompatible license. This
implementation ⟨https://github.com/n-t-roff/DWB3.3⟩ was made portable
to modern POSIX systems, and adopted and enhanced first by Gunnar
Ritter and then Carsten Kunze to produce Heirloom Doctools troff
⟨https://github.com/n-t-roff/heirloom-doctools⟩.

In July 2013, Ali Gholami Rudi announced neatroff
⟨https://github.com/aligrudi/neatroff⟩, a permissively licensed new
implementation.

Another descendant of DWB troff is part of Plan 9 from User Space
⟨https://9fans.github.io/plan9port/⟩. Since 2021, this troff has been
available under permissive terms.
494
496 When you read a man page, often a roff is the program rendering it.
497 Some roff implementations provide wrapper programs that make it easy to
498 use the roff system from the shell's command line. These can be spe‐
499 cific to a macro package, like mmroff(1), or more general. groff(1)
500 provides command-line options sparing the user from constructing the
501 long, order-dependent pipelines familiar to AT&T troff users. Further,
502 a heuristic program, grog(1), is available to infer from a document's
503 contents which groff arguments should be used to process it.
504
505 The roff pipeline
506 A typical roff document is prepared by running one or more processors
507 in series, followed by a a formatter program and then an output driver
508 (or “device postprocessor”). Commonly, these programs are structured
509 into a pipeline; that is, each is run in sequence such that the output
510 of one is taken as the input to the next, without passing through sec‐
511 ondary storage. (On non-Unix systems, pipelines may have to be simu‐
512 lated with temporary files.)
513
514 $ preproc1 < input-file | preproc2 | ... | troff [option] ... \
515 | output-driver
516
517 Once all preprocessors have run, they deliver pure roff language input
518 to the formatter, which in turn generates a document in a page descrip‐
519 tion language that is then interpreted by a postprocessor for viewing,
520 printing, or further processing.
521
522 Each program interprets input in a language that is independent of the
523 others; some are purely descriptive, as with tbl(1) and roff output,
524 and some permit the definition of macros, as with eqn(1) and roff in‐
525 put. Most roff input files employ the macros of a document formatting
526 package, intermixed with instructions for one or more preprocessors,
527 and seasoned with escape sequences and requests from the roff language.
528 Some documents are simpler still, since their formatting packages dis‐
529 courage direct use of roff requests; man pages are a prominent example.
530 Many features of the roff language are seldom needed by users; only au‐
531 thors of macro packages require a substantial command of them.
532
533 Preprocessors
534 A roff preprocessor is a program that, directly or ultimately, gener‐
535 ates output in the roff language. Typically, each preprocessor defines
536 a language of its own that transforms its input into that for roff or
537 another preprocessor. As an example of the latter, chem produces pic
538 input. Preprocessors must consequently be run in an appropriate order;
539 groff(1) handles this automatically for all preprocessors supplied by
540 the GNU roff system.
541
542 Portions of the document written in preprocessor languages are usually
543 bracketed by tokens that look like roff macro calls. roff preprocessor
544 programs transform only the regions of the document intended for them.
545 When a preprocessor language is used by a document, its corresponding
546 program must process it before the input is seen by the formatter, or
547 incorrect rendering is almost guaranteed.
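
As a minimal sketch of such bracketing, the following invented document mixes ordinary text with a tbl region (between .TS and .TE) and an eqn region (between .EQ and .EN). The table contents and the equation are illustrative only; the input would presumably be formatted with the tbl and eqn preprocessors enabled, for instance through the -t and -e options of groff(1).

```roff
.\" Text outside the bracketed regions passes through unchanged.
Ordinary text is left alone by every preprocessor.
.\" tbl transforms only what lies between .TS and .TE.
.TS
tab(;);
l r.
Driver;Units per inch
dpost;720
grops;72000
.TE
The identity
.\" eqn transforms only what lies between .EQ and .EN.
.EQ
e sup {i pi} + 1 = 0
.EN
is typeset by eqn.
```

If either preprocessor is skipped, the formatter sees the table description or the equation source as plain text, which is the incorrect rendering the paragraph above warns about.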
548
549 GNU roff provides several preprocessors, including eqn, grn, pic, tbl,
550 refer, and soelim. See groff(1) for a complete list. Other preproces‐
551 sors for roff systems are known.
552
553 dformat depicts data structures;
554 grap constructs statistical charts; and
555 ideal draws diagrams using a constraint-based language.
556
557 Formatter programs
558 A roff formatter transforms roff language input into a single file in a
559 page description language, described in groff_out(5), intended for pro‐
560 cessing by a selected device. This page description language is spe‐
561 cialized in its parameters, but not its syntax, for the selected de‐
562 vice; the format is device-independent, but not device-agnostic. The
563 parameters the formatter uses to arrange the document are stored in de‐
564 vice and font description files; see groff_font(5).
565
566 AT&T Unix had two formatters—nroff for terminals, and troff for type‐
567 setters. Often, the name troff is used loosely to refer to both. When
568 generalizing thus, groff documentation prefers the term “roff”. In GNU
569 roff, the formatter program is always troff(1).
570
571 Devices and output drivers
572 To a roff system, a device is a hardware interface like a printer, a
573 text or graphical terminal, or a standardized file format that unre‐
574 lated software can interpret. An output driver is a program that
575 parses the output of troff and produces instructions specific to the
576 device or file format it supports. An output driver might support mul‐
577 tiple devices, particularly if they are similar.
578
579 The names of the devices and their driver programs are not standard‐
580 ized. Technological fashions evolve; the devices used for document
581 preparation when AT&T troff was first written in the 1970s are no
582 longer used in production environments. Device capabilities have
583 tended to increase, improving resolution and font repertoire, and
584 adding color output and hyperlinking. Further, to reduce file size and
585 processing time, AT&T troff's page description language placed low lim‐
586 its on the magnitudes of some quantities it could represent. Its Post‐
587 Script output driver, dpost(1), had a resolution of 720 units per inch;
588 groff's grops(1) uses 72,000.
589
591 Documents using roff are normal text files interleaved with roff for‐
592 matting elements. The roff language is powerful enough to support ar‐
593 bitrary computation and it supplies facilities that encourage exten‐
594 sion. The primary such facility is macro definition; with this fea‐
595 ture, macro packages have been developed that are tailored for particu‐
596 lar applications.
597
598 Macro packages
599 Macro packages can have a much smaller vocabulary than roff itself;
600 this trait combined with their domain-specific nature can make them
601 easy to acquire and master. The macro definitions of a package are
602 typically kept in a file called name.tmac (historically, tmac.name).
603 Find details on the naming and placement of macro packages in
604 groff_tmac(5).
605
606 A macro package anticipated for use in a document can be declared to
607 the formatter by the command-line option -m; see troff(1). It can al‐
608 ternatively be specified within a document using the mso request of the
609 groff language; see groff(7).
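
A small sketch of the second approach follows. It assumes the ms package is installed under the file name s.tmac, which may differ on a given system; the title and body text are invented.

```roff
.\" Load the ms macros from inside the document.
.\" The file name s.tmac is an assumption about the installation.
.mso s.tmac
.TL
A Short Report
.LP
The body text begins here.
```

With this line present, the document should format much as it would if the package had been requested on the command line with -ms.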
610
611 Well-known macro packages include man for traditional man pages and
612 mdoc for BSD-style manual pages. Macro packages for typesetting books,
613 articles, and letters include ms (from “manuscript macros”), me (named
614 by a system administrator from the first name of its creator, Eric All‐
615 man), mm (from “memorandum macros”), and mom, a punningly named package
616 exercising many groff extensions. See groff_tmac(5) for more.
617
618 The roff formatting language
619 The roff language provides requests, escape sequences, macro definition
620 facilities, string variables, registers for storage of numbers or di‐
621 mensions, and control of execution flow. The theoretically minded will
622 observe that the roff language is not a mere markup language, but is Turing-complete.
623 It has storage (registers), it can perform tests (as in conditional ex‐
624 pressions like “(\n[i] >= 1)”), its “if” and related requests alter the
625 flow of control, and macro definition permits unbounded recursion.
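
These ingredients can be seen together in a short sketch. The register name i and the macro name countdown are hypothetical, chosen for this illustration; long names like these are a groff extension.

```roff
.\" Storage: a register holding the loop counter.
.nr i 3
.\" A macro that calls itself until the test fails.
.de countdown
.  if (\\n[i] >= 1) \{\
\\n[i]
.    nr i -1
.    countdown
.  \}
..
.countdown
```

The doubled backslashes defer interpolation of the register until the macro is called. When formatted, this input should set the numbers 3, 2, and 1 as filled text.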
626
627 Requests and escape sequences are instructions, predefined parts of the
628 language, that perform formatting operations, interpolate stored mate‐
629 rial, or otherwise change the state of the parser. The user can define
630 their own request-like elements by composing together text, requests,
631 and escape sequences ad libitum. A document writer will not (usually)
632 note any difference in usage for requests or macros; both are found on
633 control lines. However, there is a distinction; requests take either a
634 fixed number of arguments (sometimes zero), silently ignoring any ex‐
635 cess, or consume the rest of the input line, whereas macros can take a
636 variable number of arguments. Since arguments are separated by spaces,
637 macros require a means of embedding a space in an argument; in other
638 words, of quoting it. This then demands a mechanism of embedding the
639 quoting character itself, in case it is needed literally in a macro ar‐
640 gument. AT&T troff had complex rules involving the placement and repe‐
641 tition of the double quote to achieve both aims. groff cuts this knot
642 by supporting a special character escape sequence for the neutral dou‐
643 ble quote, “\[dq]”, which never performs quoting in the typesetting
644 language, but is simply a glyph, ‘"’.
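
A brief sketch of both needs, using a hypothetical macro named greet:

```roff
.de greet
Hello, \\$1!
..
.\" Double quotes keep the embedded space inside one argument.
.greet "wide world"
.\" \[dq] yields a literal double quote without ending the argument.
.greet "the \[dq]real\[dq] world"
```

Without the quotes in the first call, the macro would receive two arguments and only "wide" would be interpolated by \$1.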
645
646 Escape sequences start with a backslash, “\”. They can appear almost
647 anywhere, even in the midst of text on a line, and implement various
648 features, including the insertion of special characters with “\(xx” or
649 “\[xxx]”, break suppression at input line endings with “\c”, font
650 changes with “\f”, type size changes with “\s”, in-line comments with
651 “\"”, and many others.
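
Several of these escape sequences appear in the following sketch; the sentences are invented for illustration.

```roff
.\" Font changes and a special character (an em dash).
Press \fBEnter\fP to continue\[em]or \fIq\fP to quit.
.\" A relative type size change, then a return to the previous size.
This word is set \s+2larger\s0 than its neighbors.
.\" \c joins the next input line to this one without a space.
Un\c
broken words can span input lines. \" an in-line comment
```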
652
653 Strings store text. They are populated with the ds request and inter‐
654 polated using the \* escape sequence.
655
656 Registers store numbers and measurements. A register can be set with
657 the request nr and its value can be retrieved by the escape sequence
658 \n.
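
For instance, under the assumption of a string named project and a register named year (both hypothetical, and both relying on the groff extension that permits long names):

```roff
.\" Define a string and a register.
.ds project groff
.nr year 2023
.\" Interpolate them in running text.
The \*[project] manual was revised in \n[year].
```

Two-character names would be written in the traditional syntax as \*(xx and \n(xx.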
659
661 The structure or content of a file name, beyond its location in the
662 file system, is not significant to roff tools. roff documents employ‐
663 ing “full-service” macro packages (see groff_tmac(5)) tend to be named
664 with a suffix identifying the package; we thus see file names ending in
665 .man, .ms, .me, .mm, and .mom, for instance. When installed, man pages
666 tend to be named with the manual's section number as the suffix. For
667 example, the file name for this document is roff.7. Practice for “raw”
668 roff documents is less consistent; they are sometimes seen with a .t
669 suffix.
670
672 Since troff fills text automatically, it is common practice in the roff
673 language to avoid visual composition of text in input files: the es‐
674 thetic appeal of the formatted output is what matters. Therefore, roff
675 input should be arranged such that it is easy for authors and maintain‐
676 ers to compose and develop the document, understand the syntax of roff
677 requests, macro calls, and preprocessor languages used, and predict the
678 behavior of the formatter. Several traditions have accrued in service
679 of these goals.
680
681 • Follow sentence endings in the input with newlines to ease their
682 recognition. It is frequently convenient to end text lines after
683 colons and semicolons as well, as these typically precede independent
684 clauses. Consider doing so after commas; they often occur in lists
685 that become easy to scan when itemized by line, or constitute supple‐
686 ments to the sentence that are added, deleted, or updated to clarify
687 it. Parenthetical and quoted phrases are also good candidates for
688 placement on text lines by themselves.
689
690 • Set your text editor's line length to 72 characters or fewer; see the
691 subsections below. This limit, combined with the previous item of
692 advice, makes it less common that an input line will wrap in your
693 text editor, and thus will help you perceive excessively long con‐
694 structions in your text. Recall that natural languages originate in
695 speech, not writing, and that punctuation is correlated with pauses
696 for breathing and changes in prosody.
697
698 • Use \& after “!”, “?”, and “.” if they are followed by space, tab, or
699 newline characters and don't end a sentence.
700
701 • In filled text lines, use \& before “.” and “'” if they are preceded
702 by space, so that reflowing the input doesn't turn them into control
703 lines.
704
705 • Do not use spaces to perform indentation or align columns of a table.
706 Leading spaces are reliable only when text is not being filled.
707
708 • Comment your document. It is never too soon to apply comments to
709 record information of use to future document maintainers (including
710 your future self). The \" escape sequence causes troff to ignore the
711 remainder of the input line.
712
713 • Use the empty request—a control character followed immediately by a
714 newline—to visually manage separation of material in input files.
715 Many of the groff project's own documents use an empty request be‐
716 tween sentences, after macro definitions, and where a break is ex‐
717 pected, and two empty requests between paragraphs or other requests
718 or macro calls that will introduce vertical space into the document.
719 You can combine the empty request with the comment escape sequence to
720 include whole-line comments in your document, and even “comment out”
721 sections of it.
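
The two suggestions about \& can be sketched briefly; the sentences below are invented for illustration.

```roff
.\" "etc." ends an input line but not a sentence.
Bring pens, paper, etc.\&
to the meeting.
.\" These periods follow a space; \& keeps them harmless if the
.\" input is reflowed and one of them lands at the start of a line.
Startup commands are often stored in \&.profile or \&.login.
```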
722
723 An example sufficiently long to illustrate most of the above sugges‐
724 tions in practice follows. An arrow → indicates a tab character.
725
726 .\" nroff this_file.roff | less
727 .\" groff -T ps this_file.roff > this_file.ps
728 →The theory of relativity is intimately connected with
729 the theory of space and time.
730 .
731 I shall therefore begin with a brief investigation of
732 the origin of our ideas of space and time,
733 although in doing so I know that I introduce a
734 controversial subject. \" remainder of paragraph elided
735 .
736 .
737
738 →The experiences of an individual appear to us arranged
739 in a series of events;
740 in this series the single events which we remember
741 appear to be ordered according to the criterion of
742 \[lq]earlier\[rq] and \[lq]later\[rq], \" punct swapped
743 which cannot be analysed further.
744 .
745 There exists,
746 therefore,
747 for the individual,
748 an I-time,
749 or subjective time.
750 .
751 This itself is not measurable.
752 .
753 I can,
754 indeed,
755 associate numbers with the events,
756 in such a way that the greater number is associated with
757 the later event than with an earlier one;
758 but the nature of this association may be quite
759 arbitrary.
760 .
761 This association I can define by means of a clock by
762 comparing the order of events furnished by the clock
763 with the order of a given series of events.
764 .
765 We understand by a clock something which provides a
766 series of events which can be counted,
767 and which has other properties of which we shall speak
768 later.
769 .\" Albert Einstein, _The Meaning of Relativity_, 1922
770
771 Editing with Emacs
772 Official GNU doctrine holds that the best program for editing a roff
773 document is Emacs; see emacs(1). It provides an nroff major mode that
774 is suitable for all kinds of roff dialects. This mode can be activated
775 by the following methods.
776
777 When editing a file within Emacs the mode can be changed by typing “M-x
778 nroff-mode”, where M-x means to hold down the meta key (often labelled
779 “Alt”) while pressing and releasing the “x” key.
780
781 It is also possible to have the mode automatically selected when a roff
782 file is loaded into the editor.
783
784 • The most general method is to include file-local variables at the end
785 of the file; we can also configure the fill column this way.
786
787 .\" Local Variables:
788 .\" fill-column: 72
789 .\" mode: nroff
790 .\" End:
791
792 • Certain file name extensions, such as those commonly used by man
793 pages, trigger the automatic activation of the nroff mode.
794
795 • Technically, having the sequence
796
797 .\" -*- nroff -*-
798
799 in the first line of a file will cause Emacs to enter the nroff major
800 mode when it is loaded into the buffer. Unfortunately, some imple‐
801 mentations of the man(1) program are confused by this practice, so we
802 discourage it.
803
804 Editing with Vim
805 Other editors provide support for roff-style files too, such as vim(1),
806 an extension of the vi(1) program. Vim's highlighting can be made to
807 recognize roff files by setting the filetype option in a Vim modeline.
808 For this feature to work, your copy of vim must be built with support
809 for, and configured to enable, several features; consult the editor's
810 online help topics “auto-setting”, “filetype”, and “syntax”. Then put
811 the following at the end of your roff files, after any Emacs configura‐
812 tion:
813
814 .\" vim: set filetype=groff textwidth=72:
815
816 Replace “groff” in the above with “nroff” if you want highlighting that
817 does not recognize many of the GNU extensions to roff, such as request,
818 register, and string names longer than two characters.
819
821 This document was written by Bernd Warken ⟨groff-bernd.warken-72@web.de⟩
822 and G. Branden Robinson ⟨g.branden.robinson@gmail.com⟩.
823
825 Much roff documentation is available. The Bell Labs papers describing
826 AT&T troff remain available, and groff is documented comprehensively.
827
828 Internet sites
829 Unix Text Processing ⟨https://github.com/larrykollar/Unix-Text-Processing⟩,
830 by Dale Dougherty and Tim O'Reilly, 1987, Hayden
831 Books. This well-regarded text brings the reader from a state of no
832 knowledge of Unix or text editing (if necessary) to sophisticated com‐
833 puter-aided typesetting. It has been placed under a free software li‐
834 cense by its authors and updated by a team of groff contributors and
835 enthusiasts.
836
837 “History of Unix Manpages” ⟨http://manpages.bsd.lv/history.html⟩, an
838 online article maintained by the mdocml project, provides an overview
839 of roff development from Saltzer's RUNOFF to 2008, with links to origi‐
840 nal documentation and recollections of the authors and their contempo‐
841 raries.
842
843 troff.org ⟨http://www.troff.org/⟩, Ralph Corderoy's troff site, pro‐
844 vides an overview and pointers to much historical roff information.
845
846 Multicians ⟨http://www.multicians.org/⟩, a site by Multics enthusiasts,
847 contains a lot of information on the MIT projects CTSS and Multics, in‐
848 cluding RUNOFF; it is especially useful for its glossary and the many
849 links to historical documents.
850
851 The Unix Archive ⟨http://www.tuhs.org/Archive/⟩, curated by the Unix
852 Heritage Society, provides the source code and some binaries of histor‐
853 ical Unices (including the source code of some versions of troff and
854 its documentation) contributed by their copyright holders.
855
856 Jerry Saltzer's home page
857 ⟨http://web.mit.edu/Saltzer/www/publications/pubs.html⟩ stores some documents using the original RUNOFF formatting
858 language.
859
860 groff ⟨http://www.gnu.org/software/groff⟩, GNU roff's web site, pro‐
861 vides convenient access to groff's source code repository, bug tracker,
862 and mailing lists (including archives and the subscription interface).
863
864 Historical roff documentation
865 Many AT&T troff documents are available online, and can be found at
866 Ralph Corderoy's site (see above) or via Internet search.
867
868 Of foremost significance are two mentioned in section “History” above,
869 describing the language and its device-independent implementation, re‐
870 spectively.
871
872 “Troff User's Manual” by Joseph F. Ossanna, 1976 (revised by Brian W.
873 Kernighan, 1992), AT&T Bell Laboratories Computing Science Technical
874 Report No. 54.
875
876 “A Typesetter-independent TROFF” by Brian W. Kernighan, 1982, AT&T Bell
877 Laboratories Computing Science Technical Report No. 97.
878
879 You can obtain many relevant Bell Labs papers in PDF from Bernd
880 Warken's “roff classical” GitHub repository
881 ⟨https://github.com/bwarken/roff_classical.git⟩.
882
883 Manual pages
884 As a system of multiple components, a roff system potentially has many
885 man pages, each describing an aspect of it. Unfortunately, there is no
886 consistent naming scheme for these pages among the different roff im‐
887 plementations.
888
889 For GNU roff, the groff(1) man page enumerates all man pages distrib‐
890 uted with the system, and individual pages frequently refer to external
891 resources as well as manuals distributed with groff on a variety of
892 topics.
893
894 With other roffs, you are on your own, but troff(1) might be a good
895 starting point.
896
897
898
899groff 1.23.0 2 November 2023 roff(7)