MH-FORMAT(5)                File Formats Manual               MH-FORMAT(5)

NAME
mh-format - formatting language for nmh message system

DESCRIPTION
Several nmh commands utilize either a format string or a format file
during their execution. For example, scan uses a format string to
generate its listing of messages; repl uses a format file to generate
message replies, and so on.

There are a number of scan listing formats available, including
nmh/etc/scan.time, nmh/etc/scan.size, and nmh/etc/scan.timely. Look in
/etc/nmh for other scan and repl format files which may have been
written at your site.

You can have your local nmh expert write new format commands or modify
existing ones, or you can try your hand at it yourself. This manual
section explains how to do that. Note: some familiarity with the C
printf routine is assumed.

A format string consists of ordinary text combined with special,
multi-character escape sequences which begin with `%'. When specifying
a format string, the usual C backslash characters are honored: `\b',
`\f', `\n', `\r', and `\t'. Continuation lines in format files end
with `\' followed by the newline character. A literal `%' can be
inserted into a format file by using the sequence `%%'.

SYNTAX
Format strings are built around escape sequences. There are three
types of escape sequence: header components, built-in functions, and
flow control. Comments may be inserted in most places where a function
argument is not expected. A comment begins with `%;' and ends with a
(non-escaped) newline.

Component escapes
A component escape is specified as `%{component}', and exists for each
header in the message being processed. For example, `%{date}' refers
to the “Date:” field of the message. All component escapes have a
string value. Such values are usually compressed by converting any
control characters (tab and newline included) to spaces, then eliding
any leading or multiple spaces. Some commands, however, may interpret
some component escapes differently; be sure to refer to each command's
manual entry for details. Some commands (such as ap(8) and mhl(1)) use
a special component `%{text}' to refer to the text being processed; see
their respective man pages for details and examples.
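The compression rule above can be modelled in a few lines of Python.
This is only a sketch of the rule as worded here, not nmh's code; the
function name compress_component is hypothetical, and "control
character" is assumed to mean the ASCII range 0-31 plus DEL.

```python
import re

def compress_component(value: str) -> str:
    """Model of the default compression of a component value."""
    # Convert every control character (tab and newline included) to a space.
    spaced = re.sub(r"[\x00-\x1f\x7f]", " ", value)
    # Collapse runs of spaces into one space, then drop leading spaces.
    # Trailing spaces are left alone: the rule only names leading and
    # multiple spaces.
    return re.sub(r" {2,}", " ", spaced).lstrip(" ")

print(repr(compress_component("  Re:\tweekly\n  report ")))
```

A folded header value with tabs and continuation lines therefore comes
out as a single line of space-separated words.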

Function escapes
A function escape is specified as `%(function)'. All functions are
built-in, and most have a string or integer value. A function escape
may take an argument. The argument follows the function escape (and
any separating whitespace is discarded) as in the following example:

     %(function argument)

In addition to literal numbers or strings, the argument to a function
escape can be another function, or a component, or a control escape.
When the argument is a function or a component, the argument is
specified without a leading `%'. When the argument is a control escape,
it is specified with a leading `%'.

Control escapes
A control escape is one of: `%<', `%?', `%|', or `%>'. These are
combined into the conditional execution construct:

     %< condition format-text
     %? condition format-text
     ...
     %| format-text
     %>

(Extra white space is shown here only for clarity.) These constructs,
which may be nested without ambiguity, form a general
if-elseif-else-endif block where only one of the format-texts is
interpreted. In other words, `%<' is like the "if", `%?' is like the
"elseif", `%|' is like "else", and `%>' is like "endif".

A `%<' or `%?' control escape causes its condition to be evaluated.
This condition is a component or function. For components and
functions whose value is an integer, the condition is true if it is
non-zero, and false if zero. For components and functions whose value
is a string, the condition is true if it is a non-empty string, and
false if it is an empty string.

The `%?' control escape is optional, and can be used multiple times in
a conditional block. The `%|' control escape is also optional, but may
only be used once.
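As a rough illustration of these rules, the following Python sketch
models the truth test and the if-elseif-else selection. The helper
names (is_true, conditional, status_flag) and the dictionary of headers
are hypothetical; status_flag mirrors the fragment
`%<{replied}-%?{encrypted}E%| %>' from the default scan format.

```python
def is_true(value):
    """Integers are true when non-zero, strings when non-empty."""
    if isinstance(value, int):
        return value != 0
    return value != ""

def conditional(branches, else_text=""):
    """Pick the text of the first true branch.

    branches is a list of (condition_value, format_text) pairs: one for
    `%<' followed by one per `%?'. else_text models the optional `%|'.
    """
    for condition, text in branches:
        if is_true(condition):
            return text
    return else_text

def status_flag(headers):
    """Model of %<{replied}-%?{encrypted}E%| %>."""
    return conditional(
        [(headers.get("replied", ""), "-"),
         (headers.get("encrypted", ""), "E")],
        " ",
    )

print(repr(status_flag({"encrypted": "yes"})))
```

Only one branch is ever chosen, and a missing `%|' simply yields no
text when every condition is false.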

Function escapes
Functions expecting an argument generally require an argument of a
particular type. In addition to the integer and string types, these
include:

     Argument  Description           Example Syntax
     literal   A literal number      %(func 1234)
               or string             %(func text string)
     comp      Any component         %(func{in-reply-to})
     date      A date component      %(func{date})
     addr      An address component  %(func{from})
     expr      Nothing               %(func)
               or a subexpression    %(func(func2))
               or control escape     %(func %<{reply-to}%|%{from}%>)

The date and addr types have the same syntax as the component type,
comp, but require a header component which is a date, or address,
string, respectively.

Most arguments not of type expr are required. When escapes are nested
(via expr arguments), evaluation is done from innermost to outermost.
As noted above, for the expr argument type, functions and components
are written without a leading `%'. Control escape arguments must use a
leading `%', preceded by a space.

For example,

     %<(mymbox{from}) To: %{to}%>

writes the value of the header component “From:” to the internal
register named str; then (mymbox) reads str and writes its result to
the internal register named num; then the control escape, `%<',
evaluates num. If num is non-zero, the string “To:” is printed followed
by the value of the header component “To:”.

Evaluation
The evaluation of format strings is performed by a small virtual
machine. The machine is capable of evaluating nested expressions (as
described above) and, in addition, has an integer register num, and a
text string register str. When a function escape that accepts an
optional argument is processed, and the argument is not present, the
current value of either num or str is substituted as the argument: the
register used depends on the function, as listed below.

Component escapes write the value of their message header in str.
Function escapes write their return value in num for functions
returning integer or boolean values, and in str for functions returning
string values. (The boolean type is a subset of integers, with usual
values 0=false and 1=true.) Control escapes return a boolean value,
setting num to 1 if the last explicit condition evaluated by a `%<' or
`%?' control escape succeeded, and 0 otherwise.

All component escapes, and those function escapes which return an
integer or string value, evaluate to their value as well as setting str
or num. Outermost escape expressions in these forms will print their
value, but outermost escapes which return a boolean value do not result
in printed output.
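The register and printing rules above can be summarised with a small
Python model. It is a deliberately simplified sketch: the Registers
class and the evaluate function are hypothetical names, and the caller
supplies the value and its kind instead of the model parsing a format
string.

```python
class Registers:
    """The two registers of the format machine."""
    def __init__(self):
        self.num = 0
        self.str = ""

def evaluate(regs, kind, value, outermost=True):
    """Store value in the matching register; return the text printed.

    kind is "component", "string", "integer" or "boolean" and names the
    type of value the escape produces.
    """
    if kind in ("component", "string"):
        regs.str = value
    else:
        regs.num = int(value)
    # Boolean results and nested (non-outermost) escapes print nothing.
    if kind == "boolean" or not outermost:
        return ""
    return str(value)

regs = Registers()
print(repr(evaluate(regs, "integer", 42)))    # outermost integer: printed
print(repr(evaluate(regs, "boolean", True)))  # boolean: sets num only
```

The second call prints nothing but still leaves 1 in num, which is what
a following `%<' or function escape would then see.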

Functions
The function escapes may be roughly grouped into a few categories.

     Function    Argument  Return   Description
     msg                   integer  message number
     cur                   integer  message is current (0 or 1)
     unseen                integer  message is unseen (0 or 1)
     size                  integer  size of message
     strlen                integer  length of str
     width                 integer  column width of terminal
     charleft              integer  bytes left in output buffer
     timenow               integer  seconds since the Unix epoch
     me                    string   the user's mailbox (username)
     myhost                string   the user's local hostname
     myname                string   the user's name
     localmbox             string   the complete local mailbox
     eq          literal   boolean  num == arg
     ne          literal   boolean  num != arg
     gt          literal   boolean  num > arg
     match       literal   boolean  str contains arg
     amatch      literal   boolean  str starts with arg
     plus        literal   integer  arg plus num
     minus       literal   integer  arg minus num
     multiply    literal   integer  num multiplied by arg
     divide      literal   integer  num divided by arg
     modulo      literal   integer  num modulo arg
     num         literal   integer  Set num to arg.
     num                   integer  Set num to zero.
     lit         literal   string   Set str to arg.
     lit                   string   Clear str.
     getenv      literal   string   Set str to environment value of arg
     profile     literal   string   Set str to profile component arg
                                    value
     nonzero     expr      boolean  num is non-zero
     zero        expr      boolean  num is zero
     null        expr      boolean  str is empty
     nonnull     expr      boolean  str is non-empty
     void        expr               Set str or num
     comp        comp      string   Set str to component text
     compval     comp      integer  Set num to “atoi(comp)”
     decode      expr      string   decode str as RFC 2047 (MIME-encoded)
                                    component
     unquote     expr      string   remove RFC 2822 quotes from str
     trim        expr               trim trailing whitespace from str
     kilo        expr      string   express in SI units: 15.9K, 2.3M, etc.
                                    %(kilo) scales by factors of 1000.
     kibi        expr      string   express in IEC units: 15.5Ki, 2.2Mi.
                                    %(kibi) scales by factors of 1024.
     putstr      expr               print str
     putstrf     expr               print str in a fixed width
     putnum      expr               print num
     putnumf     expr               print num in a fixed width
     putlit      expr               print str without space compression
     zputlit     expr               print str without space compression;
                                    str must occupy no width on display
     bold                  string   set terminal bold mode
     underline             string   set terminal underlined mode
     standout              string   set terminal standout mode
     resetterm             string   reset all terminal attributes
     hascolor              boolean  terminal supports color
     fgcolor     literal   string   set terminal foreground color
     bgcolor     literal   string   set terminal background color
     formataddr  expr               append arg to str as a
                                    (comma separated) address list
     concataddr  expr               append arg to str as a
                                    (comma separated) address list,
                                    including duplicates,
                                    see Special Handling
     putaddr     literal            print str address list with
                                    arg as optional label;
                                    get line width from num

The (me) function returns the username of the current user. The
(myhost) function returns the localname entry in mts.conf, or the local
hostname if localname is not configured. The (myname) function will
return the value of the SIGNATURE environment variable if set;
otherwise it will return the passwd GECOS field (truncated at the first
comma if it contains one) for the current user. The (localmbox)
function will return the complete form of the local mailbox, suitable
for use in a “From” header. It will return the “Local-Mailbox” profile
entry if there is one; if not, it will be equivalent to:

     %(myname) <%(me)@%(myhost)>
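The fallback rules for (myname) and (localmbox) can be sketched in
Python as follows. The function signatures and the use of plain
dictionaries for the environment and the profile are assumptions made
for illustration, and treating an empty SIGNATURE as unset is also an
assumption.

```python
def myname(environ, gecos):
    """SIGNATURE if set, else the GECOS field up to the first comma."""
    signature = environ.get("SIGNATURE")
    if signature:  # assumption: an empty value counts as unset
        return signature
    return gecos.split(",", 1)[0]

def localmbox(profile, name, user, host):
    """Local-Mailbox profile entry if present, else name <user@host>."""
    entry = profile.get("Local-Mailbox")
    if entry:
        return entry
    return f"{name} <{user}@{host}>"

name = myname({}, "Ann Example,Room 1,555-0100")
print(localmbox({}, name, "ann", "host.example.com"))
```

With no SIGNATURE and no “Local-Mailbox” entry, the result is assembled
from the three simpler functions, exactly as the format string above
would do.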

The following functions require a date component as an argument:

     Function    Argument  Return   Description
     sec         date      integer  seconds of the minute
     min         date      integer  minutes of the hour
     hour        date      integer  hours of the day (0-23)
     wday        date      integer  day of the week (Sun=0)
     day         date      string   day of the week (abbrev.)
     weekday     date      string   day of the week
     sday        date      integer  day of the week known?
                                    (1=explicit,0=implicit,-1=unknown)
     mday        date      integer  day of the month
     yday        date      integer  day of the year
     mon         date      integer  month of the year
     month       date      string   month of the year (abbrev.)
     lmonth      date      string   month of the year
     year        date      integer  year (may be > 100)
     zone        date      integer  timezone in minutes
     tzone       date      string   timezone string
     szone       date      integer  timezone explicit?
                                    (1=explicit,0=implicit,-1=unknown)
     date2local  date               coerce date to local timezone
     date2gmt    date               coerce date to GMT
     dst         date      integer  daylight savings in effect? (0 or 1)
     clock       date      integer  seconds since the Unix epoch
     rclock      date      integer  seconds prior to current time
     tws         date      string   official RFC 822 rendering
     pretty      date      string   user-friendly rendering
     nodate      date      integer  returns 1 if date is invalid

The following functions require an address component as an argument.
The return value of functions noted with `*' is computed from the first
address present in the header component.

     Function    Argument  Return   Description
     proper      addr      string   official RFC 822 rendering
     friendly    addr      string   user-friendly rendering
     addr        addr      string   mbox@host or host!mbox rendering*
     pers        addr      string   the personal name*
     note        addr      string   commentary text*
     mbox        addr      string   the local mailbox*
     mymbox      addr      integer  list has the user's address? (0 or 1)
     getmymbox   addr      string   the user's (first) address,
                                    with personal name
     getmyaddr   addr      string   the user's (first) address,
                                    without personal name
     host        addr      string   the host domain*
     nohost      addr      integer  no host was present (0 or 1)*
     type        addr      integer  host type* (0=local,1=network,
                                    -1=uucp,2=unknown)
     path        addr      string   any leading host route*
     ingrp       addr      integer  address was inside a group (0 or 1)*
     gname       addr      string   name of group*

(A clarification on (mymbox{comp}) is in order. This function checks
each of the addresses in the header component “comp” against the user's
mailbox name and any “Alternate-Mailboxes”. It returns true if any
address matches. However, it also returns true if the “comp” header is
not present in the message. If needed, the (null) function can be used
to explicitly test for this case.)
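The surprising case is easier to see in a small Python model. This is a
sketch only: the names mymbox and from_me are used here for hypothetical
Python helpers, addresses are assumed to be already parsed into a list
of strings, and exact string comparison stands in for nmh's real
address matching.

```python
def mymbox(header_value, my_addresses):
    """Return 1 if any address is the user's, or if the header is absent.

    header_value is None when the header is missing from the message.
    """
    if header_value is None:
        return 1
    return 1 if any(a in my_addresses for a in header_value) else 0

def from_me(headers, my_addresses):
    """Stricter test: the header must be present and must match."""
    value = headers.get("from")
    return value is not None and mymbox(value, my_addresses) == 1

mine = {"ann@example.com", "ann@example.org"}
print(mymbox(None, mine), from_me({}, mine))
```

A message without the header satisfies mymbox but not from_me, which is
the distinction an explicit (null) test lets a format string make.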

Formatting
When a function or component escape is interpreted and the result will
be printed immediately, an optional field width can be specified to
print the field in exactly a given number of characters. For example,
a numeric escape like %4(size) will print at most 4 digits of the
message size; overflow will be indicated by a `?' in the first position
(like `?234'). A string escape like %4(me) will print the first 4
characters and truncate at the end. Short fields are padded at the
right with the fill character (normally, a blank). If the field width
argument begins with a leading zero, then the fill character is set to
a zero.
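The overflow and truncation rules can be illustrated with a short
Python sketch. The function names are hypothetical, only non-negative
numbers are considered, and padding of short numeric fields is
deliberately left out of the model.

```python
def overflow_digits(value, width):
    """Digits shown by a numeric escape such as %4(size)."""
    digits = str(value)
    if len(digits) <= width:
        return digits  # fits; padding is not modelled here
    # Overflow: `?' in the first position, then the low-order digits.
    return "?" + digits[len(digits) - width + 1:]

def string_field(value, width, fill=" "):
    """A string escape such as %4(me): truncate at the end, pad right."""
    return value[:width].ljust(width, fill)

print(overflow_digits(51234, 4))
print(repr(string_field("al", 4)))
```

A five-digit size in a four-column field keeps its three low-order
digits behind the `?' marker, so the column width never changes.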

The functions (putnumf) and (putstrf) print their result in exactly the
number of characters specified by their leading field width argument.
For example, %06(putnumf(size)) will print the message size in a field
six characters wide filled with leading zeros; %14(putstrf{from}) will
print the “From:” header component in fourteen characters with trailing
spaces added as needed. Using a negative value for the field width
causes right-justification within the field, with padding on the left
up to the field width. Padding is with spaces except for a left-padded
putnumf when the width starts with zero. The functions (putnum) and
(putstr) are somewhat special: they print their result in the minimum
number of characters required, and ignore any leading field width
argument. The (putlit) function outputs the exact contents of the str
register without any changes such as duplicate space removal or control
character conversion. Similarly, the (zputlit) function outputs the
exact contents of the str register, but requires that those contents
not occupy any output width. It can therefore be used for outputting
terminal escape sequences.

There are a limited number of function escapes to output terminal
escape sequences. These sequences are retrieved from the terminfo(5)
database according to the current terminal setting. The (bold),
(underline), and (standout) escapes set bold mode, underline mode, and
standout mode respectively. (hascolor) can be used to determine if the
current terminal supports color. (fgcolor) and (bgcolor) set the
foreground and background colors respectively. Both of these escapes
take one literal argument, the color name, which can be one of: black,
red, green, yellow, blue, magenta, cyan, white. (resetterm) resets all
terminal attributes to their default setting. These terminal escapes
should be used in conjunction with (zputlit) (preferred) or (putlit),
as the normal (putstr) function will strip out control characters.

The available output width is kept in an internal register; any output
exceeding this width will be truncated. The one exception to this is
that (zputlit) functions will still be executed if a terminal reset
code is being placed at the end of a line.

Special Handling
Some functions have different behavior depending on the command they
are invoked from.

In repl the (formataddr) function stores all email addresses
encountered into an internal cache and will use this cache to suppress
duplicate addresses. If you need to create an address list that
includes previously-seen addresses you may use the (concataddr)
function, which is identical to (formataddr) in all other respects.
Note that (concataddr) does not add addresses to the
duplicate-suppression cache.
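The difference between the two functions can be pictured with a toy
Python model. The AddressList class is hypothetical, addresses are
compared as plain strings, and a set stands in for repl's internal
cache, whose real structure is not described here.

```python
class AddressList:
    """Toy model of repl's duplicate-suppression cache."""

    def __init__(self):
        self.seen = set()  # cache consulted by every formataddr call
        self.str = ""      # the str register holding the address list

    def _append(self, address):
        self.str = f"{self.str}, {address}" if self.str else address

    def formataddr(self, addresses):
        for address in addresses:
            if address not in self.seen:  # suppress duplicates
                self.seen.add(address)
                self._append(address)

    def concataddr(self, addresses):
        for address in addresses:  # keeps duplicates, cache untouched
            self._append(address)

al = AddressList()
al.formataddr(["a@example.com", "b@example.com"])
al.formataddr(["a@example.com"])   # already cached, so suppressed
al.concataddr(["a@example.com"])   # appended regardless
print(al.str)
```

Because (concataddr) leaves the cache alone, an address it appends can
still be added once more by a later (formataddr) call.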

Other Hints and Tips
Sometimes, the writer of a format string is confused because output is
duplicated. The general rule to remember is simple: If a function or
component escape begins with a `%', it will generate text in the output
file. Otherwise, it will not.

A good example is a simple attempt to generate a To: header based on
the From: and Reply-To: headers:

     %(formataddr %<{reply-to}%|%{from}%>)%(putaddr To: )

Unfortunately, if the Reply-To: header is not present, the output line
will be something like:

     My From User <from@example.com>To: My From User <from@example.com>

What went wrong? When performing the test for the if clause (%<), the
component is not output because it is considered an argument to the if
statement (so the rule about not starting with % applies). But the
component escape in our else statement (everything after the `%|') is
not an argument to anything; it begins with a %, and thus the value of
that component is output. This also has the side effect of setting the
str register, which is later picked up by the (formataddr) function and
then output by (putaddr). The example format string above has another
bug: there should always be a valid width value in the num register
when (putaddr) is called, otherwise bad formatting can take place.

The solution is to use the (void) function; this will prevent the
function or component from outputting any text. With this in place (and
using (width) to set the num register for the width) a better
implementation would look like:

     %(formataddr %<{reply-to}%|%(void{from})%>)%(void(width))%(putaddr To: )

It should be noted here that the side effects of function and component
escapes are still in force and, as a result, each component test in the
if-elseif-else-endif clause sets the str register.

As an additional note, the (formataddr) and (concataddr) functions have
special behavior when it comes to the str register. The starting point
of the register is saved and is used to build up entries in the address
list.

You will find the fmttest(1) utility invaluable when debugging problems
with format strings.

Examples
With all the above in mind, here is a breakdown of the default format
string for scan. The first part is:

     %4(msg)%<(cur)+%| %>%<{replied}-%?{encrypted}E%| %>

which says that the message number should be printed in four digits.
If the message is the current message then a `+', else a space, should
be printed; if a “Replied:” field is present then a `-', else if an
“Encrypted:” field is present then an `E', otherwise a space, should be
printed. Next:

     %02(mon{date})/%02(mday{date})

the month and the day of the month are each printed in two digits (zero
filled), separated by a slash. Next,

     %<{date} %|*%>

If a “Date:” field is present, a space is printed; otherwise a `*' is
printed. (The component is only the condition here, so its value is not
output.) Next,

     %<(mymbox{from})%<{to}To:%14(decode(friendly{to}))%>%>

if the message is from me, and there is a “To:” header, print “To:”
followed by a “user-friendly” rendering of the first address in the
“To:” field; any MIME-encoded characters are decoded into the actual
characters. Continuing,

     %<(zero)%17(decode(friendly{from}))%>

if either of the above two tests failed, then the “From:” address is
printed in a MIME-decoded, “user-friendly” format. And finally,

     %(decode{subject})%<{body}<<%{body}>>%>

the MIME-decoded subject and initial body (if any) are printed.

For a more complicated example, consider a possible replcomps format
file.

     %(lit)%(formataddr %<{reply-to}

This clears str and formats the “Reply-To:” header if present. If not
present, the else-if clause is executed.

     %?{from}%?{sender}%?{return-path}%>)\

This formats the “From:”, “Sender:” and “Return-Path:” headers,
stopping as soon as one of them is present. Next:

     %<(nonnull)%(void(width))%(putaddr To: )\n%>\

If the formataddr result is non-null, it is printed as an address (with
line folding if needed) in a field width wide, with a leading label of
“To:”.

     %(lit)%(formataddr{to})%(formataddr{cc})%(formataddr(me))\

str is cleared, and the “To:” and “Cc:” headers, along with the user's
address (depending on what was specified with the “-cc” switch to repl)
are formatted.

     %<(nonnull)%(void(width))%(putaddr cc: )\n%>\

If the result is non-null, it is printed as above with a leading label
of “cc:”.

     %<{fcc}Fcc: %{fcc}\n%>\

If a -fcc folder switch was given to repl (see repl(1) for more details
about %{fcc}), an “Fcc:” header is output.

     %<{subject}Subject: Re: %{subject}\n%>\

If a subject component was present, a suitable reply subject is output.

     %<{message-id}In-Reply-To: %{message-id}\n%>\
     %<{message-id}References: %<{references} %{references}%>\
     %{message-id}\n%>
     --------

If a message-id component was present, an “In-Reply-To:” header is
output including the message-id, followed by a “References:” header
with references, if present, and the message-id. As with all plain
text, the row of dashes is output as-is.

This last part is a good example for a little more elaboration. Here's
that part again in pseudo-code:

     if (comp_exists(message-id)) then
         print (“In-Reply-To: ”)
         print (message-id.value)
         print (“\n”)
     endif
     if (comp_exists(message-id)) then
         print (“References: ”)
         if (comp_exists(references)) then
             print (references.value)
         endif
         print (message-id.value)
         print (“\n”)
     endif

One more example: Currently, nmh supports very large message numbers,
and it is not uncommon for a folder to have far more than 10000
messages. Nonetheless (as noted above) the various scan format strings,
inherited from older MH versions, are generally hard-coded to 4 digits
for the message number. Beyond 9999, formatting problems occur. The nmh
format strings can be modified to behave more sensibly with larger
message numbers:

     %(void(msg))%<(gt 9999)%(msg)%|%4(msg)%>

The current message number is placed in num. (Note that (msg) is a
function escape which returns an integer; it is not a component.) The
(gt) conditional is used to test whether the message number has 5 or
more digits. If so, it is printed at full width, otherwise at 4 digits.
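The logic of that format string corresponds to the following Python
sketch. The function name is hypothetical, and right-aligning the
number within the four-column field is an assumption of this model, not
something the format string itself states.

```python
def message_number_field(msg):
    """Model of %(void(msg))%<(gt 9999)%(msg)%|%4(msg)%>."""
    num = msg                 # %(void(msg)): load num without printing
    if num > 9999:            # %<(gt 9999): five or more digits?
        return str(msg)       # %(msg): print at full width
    # %4(msg): four columns; right alignment is assumed here.
    return str(msg).rjust(4)

for number in (42, 9999, 12345):
    print(repr(message_number_field(number)))
```

Small message numbers keep the traditional four-column layout, while
larger ones simply widen the field instead of overflowing it.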

SEE ALSO
scan(1), repl(1), fmttest(1)

CONTEXT
None

nmh-1.7.1                       2015-01-10                    MH-FORMAT(5)