PERLPODSPEC(1) Perl Programmers Reference Guide PERLPODSPEC(1)

NAME
perlpodspec - Plain Old Documentation: format specification and notes

DESCRIPTION
This document is detailed notes on the Pod markup language. Most
people will only have to read perlpod to know how to write in Pod, but
this document may answer some incidental questions to do with parsing
and rendering Pod.

In this document, "must" / "must not", "should" / "should not", and
"may" have their conventional (cf. RFC 2119) meanings: "X must do Y"
means that if X doesn't do Y, it's against this specification, and
should really be fixed. "X should do Y" means that it's recommended,
but X may fail to do Y, if there's a good reason. "X may do Y" is
merely a note that X can do Y at will (although it is up to the reader
to detect any connotation of "and I think it would be nice if X did Y"
versus "it wouldn't really bother me if X did Y").

Notably, when I say "the parser should do Y", the parser may fail to do
Y, if the calling application explicitly requests that the parser not
do Y. I often phrase this as "the parser should, by default, do Y."
This doesn't require the parser to provide an option for turning off
whatever feature Y is (like expanding tabs in verbatim paragraphs),
although it implies that such an option may be provided.

Pod Definitions
Pod is embedded in files, typically Perl source files, although you can
write a file that's nothing but Pod.

A line in a file consists of zero or more non-newline characters,
terminated by either a newline or the end of the file.

A newline sequence is usually a platform-dependent concept, but Pod
parsers should understand it to mean any of CR (ASCII 13), LF (ASCII
10), or a CRLF (ASCII 13 followed immediately by ASCII 10), in addition
to any other system-specific meaning. The first CR/CRLF/LF sequence in
the file may be used as the basis for identifying the newline sequence
for parsing the rest of the file.

A blank line is a line consisting entirely of zero or more spaces
(ASCII 32) or tabs (ASCII 9), and terminated by a newline or
end-of-file. A non-blank line is a line containing one or more
characters other than space or tab (and terminated by a newline or
end-of-file).

(Note: Many older Pod parsers did not accept a line consisting of
spaces/tabs and then a newline as a blank line. The only lines they
considered blank were lines consisting of no characters at all,
terminated by a newline.)
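
As an illustration only, the line and blank-line definitions above
might be sketched in Perl as follows. The names "split_lines" and
"is_blank_line" are hypothetical helpers, not part of any existing Pod
parser, and the sketch accepts every CR, LF, or CRLF as a newline
rather than settling on the first sequence seen in the file.

```perl
use strict;
use warnings;

# Split raw file content into lines, accepting CR, LF, or CRLF as a
# newline sequence. CRLF is listed first so it is not read as two newlines.
sub split_lines {
    my ($content) = @_;
    my @lines = split /\x0D\x0A|\x0D|\x0A/, $content, -1;
    # A trailing newline leaves one empty field at the end; drop it.
    pop @lines if @lines && $lines[-1] eq '';
    return @lines;
}

# A blank line consists entirely of zero or more spaces or tabs.
sub is_blank_line {
    my ($line) = @_;
    return $line =~ /\A[ \t]*\z/ ? 1 : 0;
}

# Mixed newline styles, with a spaces-and-tab line in the middle.
my @lines = split_lines("=head1 Foo\x0D\x0A \t\x0D\x0AStuff\x0D=cut\x0A");
```

Here the second line, which holds only a space and a tab, counts as a
blank line, which is the behavior that the older parsers mentioned
above lacked.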

Whitespace is used in this document as a blanket term for spaces, tabs,
and newline sequences. (By itself, this term usually refers to literal
whitespace. That is, sequences of whitespace characters in Pod source,
as opposed to "E<32>", which is a formatting code that denotes a
whitespace character.)

A Pod parser is a module meant for parsing Pod (regardless of whether
this involves calling callbacks or building a parse tree or directly
formatting it). A Pod formatter (or Pod translator) is a module or
program that converts Pod to some other format (HTML, plaintext, TeX,
PostScript, RTF). A Pod processor might be a formatter or translator,
or might be a program that does something else with the Pod (like
counting words, scanning for index points, etc.).

Pod content is contained in Pod blocks. A Pod block starts with a line
that matches "m/\A=[a-zA-Z]/", and continues up to the next line that
matches "m/\A=cut/" or up to the end of the file if there is no
"m/\A=cut/" line.

Note that a parser is not expected to recognize that something that
looks like Pod is actually inside a quoted string, such as a here
document.
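
A minimal sketch of that block rule, assuming the input has already
been split into lines. The name "pod_blocks" is hypothetical, and
leaving the "=cut" line itself out of the returned block is a choice
made for this illustration, not something this specification requires.

```perl
use strict;
use warnings;

# Return a list of Pod blocks, each an array reference of lines.
# A block starts at a line matching /\A=[a-zA-Z]/ and runs up to the
# next line matching /\A=cut/, or to the end of the input.
sub pod_blocks {
    my (@lines) = @_;
    my (@blocks, $current);
    for my $line (@lines) {
        if (defined $current) {
            if ($line =~ /\A=cut/) {
                push @blocks, $current;   # "=cut" closes the open block
                undef $current;
            }
            else {
                push @$current, $line;
            }
        }
        elsif ($line =~ /\A=cut/) {
            # Starting a block with "=cut" is an error: warn and halt.
            warn "=cut found outside a Pod block\n";
            last;
        }
        elsif ($line =~ /\A=[a-zA-Z]/) {
            $current = [$line];           # a command line opens a block
        }
    }
    push @blocks, $current if defined $current;   # no "=cut" before EOF
    return @blocks;
}

my @blocks = pod_blocks(
    'my $x = 1;', '=head1 NAME', '', 'Foo', '=cut',
    'print $x;',  '=pod',        '', 'Tail',
);
```

The sample input yields two blocks: one opened by "=head1 NAME" and
closed by "=cut", and one opened by "=pod" that runs to the end of the
input. As noted above, the sketch makes no attempt to notice Pod-like
lines inside quoted strings.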

Within a Pod block, there are Pod paragraphs. A Pod paragraph consists
of non-blank lines of text, separated by one or more blank lines.

For purposes of Pod processing, there are four types of paragraphs in a
Pod block:

• A command paragraph (also called a "directive"). The first line of
this paragraph must match "m/\A=[a-zA-Z]/". Command paragraphs are
typically one line, as in:

    =head1 NOTES

    =item *

But they may span several (non-blank) lines:

    =for comment
    Hm, I wonder what it would look like if
    you tried to write a BNF for Pod from this.

    =head3 Dr. Strangelove, or: How I Learned to
    Stop Worrying and Love the Bomb

Some command paragraphs allow formatting codes in their content
(i.e., after the part that matches "m/\A=[a-zA-Z]\S*\s*/"), as in:

    =head1 Did You Remember to C<use strict;>?

In other words, the Pod processing handler for "head1" will apply
the same processing to "Did You Remember to C<use strict;>?" that
it would to an ordinary paragraph (i.e., formatting codes like
"C<...>" are parsed and presumably formatted appropriately, and
whitespace in the form of literal spaces and/or tabs is not
significant).

• A verbatim paragraph. The first line of this paragraph must begin
with a literal space or tab, and this paragraph must not be inside a
"=begin identifier", ... "=end identifier" sequence unless
"identifier" begins with a colon (":"). That is, if a paragraph
starts with a literal space or tab, but is inside a "=begin
identifier", ... "=end identifier" region, then it's a data
paragraph, unless "identifier" begins with a colon.

Whitespace is significant in verbatim paragraphs (although, in
processing, tabs are probably expanded).

• An ordinary paragraph. A paragraph is an ordinary paragraph if its
first line matches neither "m/\A=[a-zA-Z]/" nor "m/\A[ \t]/", and
if it's not inside a "=begin identifier", ... "=end identifier"
sequence unless "identifier" begins with a colon (":").

• A data paragraph. This is a paragraph that is inside a "=begin
identifier" ... "=end identifier" sequence where "identifier" does
not begin with a literal colon (":"). In some sense, a data
paragraph is not part of Pod at all (i.e., effectively it's
"out-of-band"), since it's not subject to most kinds of Pod parsing;
but it is specified here, since Pod parsers need to be able to call an
event for it, or store it in some form in a parse tree, or at least
just parse around it.

For example: consider the following paragraphs:

    # <- that's the 0th column

    =head1 Foo

    Stuff

      $foo->bar

    =cut

Here, "=head1 Foo" and "=cut" are command paragraphs because the first
line of each matches "m/\A=[a-zA-Z]/". "Stuff" is an ordinary
paragraph, because its first line matches neither that pattern nor
"m/\A[ \t]/". "[space][space]$foo->bar" is a verbatim paragraph,
because its first line starts with a literal whitespace character (and
there's no "=begin"..."=end" region around).

The "=begin identifier" ... "=end identifier" commands stop paragraphs
that they surround from being parsed as ordinary or verbatim
paragraphs, if identifier doesn't begin with a colon. This is
discussed in detail in the section "About Data Paragraphs and
"=begin/=end" Regions".
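
The four paragraph types can be told apart with a few pattern tests.
The following is a rough sketch, not a reference implementation: the
name "classify_paragraphs" is hypothetical, and it looks only at the
innermost open "=begin" region, which glosses over the nesting rules
discussed in the section on data paragraphs.

```perl
use strict;
use warnings;

# Classify paragraphs (each a string of one or more non-blank lines).
# Returns a list of [type, text] pairs.
sub classify_paragraphs {
    my (@paragraphs) = @_;
    my (@out, @regions);    # @regions: stack of open "=begin" identifiers
    for my $para (@paragraphs) {
        my $type;
        if ($para =~ /\A=([a-zA-Z]\S*)\s*(\S*)/) {
            my ($command, $identifier) = ($1, $2);
            push @regions, $identifier if $command eq 'begin';
            pop @regions               if $command eq 'end';
            $type = 'command';
        }
        elsif (@regions && $regions[-1] !~ /\A:/) {
            $type = 'data';         # innermost region lacks a leading colon
        }
        elsif ($para =~ /\A[ \t]/) {
            $type = 'verbatim';     # starts with a literal space or tab
        }
        else {
            $type = 'ordinary';
        }
        push @out, [$type, $para];
    }
    return @out;
}

my @result = classify_paragraphs(
    '=head1 Foo', 'Stuff', '  $foo->bar',
    '=begin html', '  <hr>', '=end html', '=cut',
);
my @types = map { $_->[0] } @result;
```

With this input, "  $foo->bar" is classified as verbatim, while
"  <hr>" is classified as data even though it starts with spaces,
because it sits inside a "=begin html" region.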

Pod Commands
This section is intended to supplement and clarify the discussion in
"Command Paragraph" in perlpod. These are the currently recognized Pod
commands:

"=head1", "=head2", "=head3", "=head4", "=head5", "=head6"
This command indicates that the text in the remainder of the
paragraph is a heading. That text may contain formatting codes.
Examples:

    =head1 Object Attributes

    =head3 What B<Not> to Do!

Both "=head5" and "=head6" were added in 2020 and might not be
supported on all Pod parsers. Pod::Simple 3.41, released in October
2020, supports both, which provides support in all Pod::Simple-based
Pod parsers.

"=pod"
This command indicates that this paragraph begins a Pod block. (If
we are already in the middle of a Pod block, this command has no
effect at all.) If there is any text in this command paragraph
after "=pod", it must be ignored. Examples:

    =pod

    This is a plain Pod paragraph.

    =pod This text is ignored.

"=cut"
This command indicates that this line is the end of this previously
started Pod block. If there is any text after "=cut" on the line,
it must be ignored. Examples:

    =cut

    =cut The documentation ends here.

    =cut
    # This is the first line of program text.
    sub foo { # This is the second.

It is an error to try to start a Pod block with a "=cut" command.
In that case, the Pod processor must halt parsing of the input
file, and must by default emit a warning.

"=over"
This command indicates that this is the start of a list/indent
region. If there is any text following the "=over", it must
consist of only a nonzero positive numeral. The semantics of this
numeral are explained in the "About =over...=back Regions" section,
further below. Formatting codes are not expanded. Examples:

    =over 3

    =over 3.5

    =over

"=item"
This command indicates that an item in a list begins here.
Formatting codes are processed. The semantics of the (optional)
text in the remainder of this paragraph are explained in the "About
=over...=back Regions" section, further below. Examples:

    =item

    =item *

    =item      *

    =item 14

    =item 3.

    =item C<< $thing->stuff(I<dodad>) >>

    =item For transporting us beyond seas to be tried for pretended
    offenses

    =item He is at this time transporting large armies of foreign
    mercenaries to complete the works of death, desolation and
    tyranny, already begun with circumstances of cruelty and perfidy
    scarcely paralleled in the most barbarous ages, and totally
    unworthy the head of a civilized nation.

"=back"
This command indicates that this is the end of the region begun by
the most recent "=over" command. It permits no text after the
"=back" command.

"=begin formatname"
"=begin formatname parameter"
This marks the following paragraphs (until the matching "=end
formatname") as being for some special kind of processing. Unless
"formatname" begins with a colon, the contained non-command
paragraphs are data paragraphs. But if "formatname" does begin
with a colon, then non-command paragraphs are ordinary paragraphs
or verbatim paragraphs. This is discussed in detail in the section
"About Data Paragraphs and "=begin/=end" Regions".

It is advised that formatnames match the regexp
"m/\A:?[-a-zA-Z0-9_]+\z/". Everything following whitespace after
the formatname is a parameter that may be used by the formatter
when dealing with this region. This parameter must not be repeated
in the "=end" paragraph. Implementors should anticipate future
expansion in the semantics and syntax of the first parameter to
"=begin"/"=end"/"=for".

"=end formatname"
This marks the end of the region opened by the matching "=begin
formatname" command. If "formatname" is not the formatname of the
most recent open "=begin formatname" region, then this is an error,
and must generate an error message. This is discussed in detail in
the section "About Data Paragraphs and "=begin/=end" Regions".

"=for formatname text..."
This is synonymous with:

    =begin formatname

    text...

    =end formatname

That is, it creates a region consisting of a single paragraph; that
paragraph is to be treated as an ordinary paragraph if "formatname"
begins with a ":"; if "formatname" doesn't begin with a colon, then
"text..." will constitute a data paragraph. There is no way to use
"=for formatname text..." to express "text..." as a verbatim
paragraph.
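
One way a parser might conflate the two constructs is to rewrite a
"=for" paragraph into its three-paragraph equivalent before further
processing. This is only a sketch under that assumption; the name
"expand_for" is hypothetical.

```perl
use strict;
use warnings;

# Expand a "=for formatname text..." paragraph into the paragraphs it
# is synonymous with. Any other paragraph is returned unchanged.
sub expand_for {
    my ($para) = @_;
    return ($para) unless $para =~ /\A=for\s+(\S+)\s*(.*)\z/s;
    my ($formatname, $text) = ($1, $2);
    return (
        "=begin $formatname",
        (length $text ? $text : ()),    # omit the body if there is none
        "=end $formatname",
    );
}

my @paras = expand_for("=for html <hr>\n<p>rule above</p>");
```

The resulting list holds "=begin html", the two-line text, and
"=end html". Whether the text then counts as a data paragraph or an
ordinary paragraph depends on the leading colon, exactly as for an
explicit "=begin" region.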

"=encoding encodingname"
This command, which should occur early in the document (at least
before any non-US-ASCII data!), declares that this document is
encoded in the encoding encodingname, which must be an encoding
name that Encode recognizes. (Encode's list of supported
encodings, in Encode::Supported, is useful here.) If the Pod
parser cannot decode the declared encoding, it should emit a
warning and may abort parsing the document altogether.

A document having more than one "=encoding" line should be
considered an error. Pod processors may silently tolerate this if
the not-first "=encoding" lines are just duplicates of the first
one (e.g., if there's a "=encoding utf8" line, and later on another
"=encoding utf8" line). But Pod processors should complain if
there are contradictory "=encoding" lines in the same document
(e.g., if there is a "=encoding utf8" early in the document and
"=encoding big5" later). Pod processors that recognize BOMs may
also complain if they see an "=encoding" line that contradicts the
BOM (e.g., if a document with a UTF-16LE BOM has an "=encoding
shiftjis" line).
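
The duplicate-versus-contradiction distinction could be checked along
these lines. This is a sketch only: "check_encodings" is a
hypothetical name, and comparing names case-insensitively is an
assumption of this example (it does not consult Encode's alias
tables, so differently spelled aliases of one encoding would be
reported as a conflict).

```perl
use strict;
use warnings;

# Given the encoding names from every "=encoding" line in a document,
# in order, return 'ok', 'duplicate' (tolerable repeats), or 'conflict'.
sub check_encodings {
    my (@names) = @_;
    return 'ok' if @names <= 1;
    my ($first, @rest) = map { lc $_ } @names;
    my $conflicts = grep { $_ ne $first } @rest;   # count of differing names
    return $conflicts ? 'conflict' : 'duplicate';
}
```

A processor might stay silent on 'duplicate' and complain on
'conflict', as described above.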

If a Pod processor sees any command other than the ones listed above
(like "=head", or "=haed1", or "=stuff", or "=cuttlefish", or "=w123"),
that processor must by default treat this as an error. It must not
process the paragraph beginning with that command, must by default warn
of this as an error, and may abort the parse. A Pod parser may allow a
way for particular applications to add to the above list of known
commands, and to stipulate, for each additional command, whether
formatting codes should be processed.

Future versions of this specification may add additional commands.

Pod Formatting Codes
(Note that in previous drafts of this document and of perlpod,
formatting codes were referred to as "interior sequences", and this
term may still be found in the documentation for Pod parsers, and in
error messages from Pod processors.)

There are two syntaxes for formatting codes:

• A formatting code starts with a capital letter (just US-ASCII
[A-Z]) followed by a "<", any number of characters, and ending with
the first matching ">". Examples:

    That's what I<you> think!

    What's C<CORE::dump()> for?

    X<C<chmod> and C<unlink()> Under Different Operating Systems>

• A formatting code starts with a capital letter (just US-ASCII
[A-Z]) followed by two or more "<"'s, one or more whitespace
characters, any number of characters, one or more whitespace
characters, and ending with the first matching sequence of two or
more ">"'s, where the number of ">"'s equals the number of "<"'s in
the opening of this formatting code. Examples:

    That's what I<< you >> think!

    C<<< open(X, ">>thing.dat") || die $! >>>

    B<< $foo->bar(); >>

With this syntax, the whitespace character(s) after the "C<<<" and
before the ">>>" (or whatever letter) are not renderable. They do
not signify whitespace, but are merely part of the formatting codes
themselves. That is, these are all synonymous:

    C<thing>
    C<< thing >>
    C<<           thing >>
    C<<<   thing >>>
    C<<<<
    thing
           >>>>

and so on.

Finally, the multiple-angle-bracket form does not alter the
interpretation of nested formatting codes, meaning that the
following four example lines are identical in meaning:

    B<example: C<$a E<lt>=E<gt> $b>>

    B<example: C<< $a <=> $b >>>

    B<example: C<< $a E<lt>=E<gt> $b >>>

    B<<< example: C<< $a E<lt>=E<gt> $b >> >>>

In parsing Pod, a notably tricky part is the correct parsing of
(potentially nested!) formatting codes. Implementors should consult
the code in the "parse_text" routine in Pod::Parser as an example of a
correct implementation.
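
To make the two syntaxes concrete, here is a small stack-based sketch
of a formatting-code parser. It is not the Pod::Parser algorithm and
is not meant as a compliant implementation; "parse_codes" and
"flatten" are hypothetical names, the bracketed output notation is
invented for this example, and only "E<lt>" and "E<gt>" are resolved.

```perl
use strict;
use warnings;

# Parse one paragraph's text into a tree. Each node is
# { code => 'C', width => N, kids => [ strings and nodes ] }, where
# width is the number of "<" in the opener; the root has code ''.
sub parse_codes {
    my ($text) = @_;
    my $root   = { code => '', width => 0, kids => [] };
    my @stack  = ($root);                        # open nodes, innermost last
    my @tokens = grep { length $_ }
        split /([A-Z]<(?:<+\s+)?|\s+>{2,}|>)/, $text;

    while (@tokens) {
        my $tok = shift @tokens;
        my $top = $stack[-1];
        if ($tok =~ /\A([A-Z])(<+)\s*\z/) {          # opening delimiter
            my $node = { code => $1, width => length $2, kids => [] };
            push @{ $top->{kids} }, $node;
            push @stack, $node;
        }
        elsif ($tok eq '>' && $top->{width} == 1) {  # closes X<...>
            pop @stack;
        }
        elsif ($tok =~ /\A(\s+)(>{2,})\z/) {
            my ($space, $closers) = ($1, $2);
            my $width = $top->{width};
            if ($width >= 2 && length($closers) >= $width) {
                pop @stack;                          # closes X<< ... >>
                substr($closers, 0, $width, '');     # drop the consumed ">"s
            }
            else {
                push @{ $top->{kids} }, $space;      # ordinary whitespace
            }
            unshift @tokens, split //, $closers;     # recheck leftover ">"s singly
        }
        else {
            push @{ $top->{kids} }, $tok;            # literal text
        }
    }
    return $root;   # codes still open at the end are implicitly closed
}

my %escape = (lt => '<', gt => '>');

# Render a tree in a bracketed notation, e.g. "B[bold C[code]]".
sub flatten {
    my ($node) = @_;
    my $inner = join '', map { ref $_ ? flatten($_) : $_ } @{ $node->{kids} };
    return $inner if $node->{code} eq '';
    return $escape{$inner} // "E<$inner>" if $node->{code} eq 'E';
    return $node->{code} . "[$inner]";
}

my @forms = (
    'B<example: C<$a E<lt>=E<gt> $b>>',
    'B<example: C<< $a <=> $b >>>',
    'B<example: C<< $a E<lt>=E<gt> $b >>>',
    'B<<< example: C<< $a E<lt>=E<gt> $b >> >>>',
);
my @flat = map { flatten(parse_codes($_)) } @forms;
```

Under these assumptions all four example lines flatten to the same
string, which is the equivalence the text above describes. A real
parser would also report unterminated codes and handle "L<...>"
specially.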

"I<text>" -- italic text
See the brief discussion in "Formatting Codes" in perlpod.

"B<text>" -- bold text
See the brief discussion in "Formatting Codes" in perlpod.

"C<code>" -- code text
See the brief discussion in "Formatting Codes" in perlpod.

"F<filename>" -- style for filenames
See the brief discussion in "Formatting Codes" in perlpod.

"X<topic name>" -- an index entry
See the brief discussion in "Formatting Codes" in perlpod.

This code is unusual in that most formatters completely discard
this code and its content. Other formatters will render it with
invisible codes that can be used in building an index of the
current document.

"Z<>" -- a null (zero-effect) formatting code
Discussed briefly in "Formatting Codes" in perlpod.

This code is unusual in that it should have no content. That is, a
processor may complain if it sees "Z<potatoes>". Whether or not it
complains, the potatoes text should be ignored.

"L<name>" -- a hyperlink
The complicated syntaxes of this code are discussed at length in
"Formatting Codes" in perlpod, and implementation details are
discussed below, in "About L<...> Codes". Parsing the contents of
L<content> is tricky. Notably, the content has to be checked for
whether it looks like a URL, or whether it has to be split on
literal "|" and/or "/" (in the right order!), and so on, before
E<...> codes are resolved.

"E<escape>" -- a character escape
See "Formatting Codes" in perlpod, and several points in "Notes on
Implementing Pod Processors".

"S<text>" -- text contains non-breaking spaces
This formatting code is syntactically simple, but semantically
complex. What it means is that each space in the printable content
of this code signifies a non-breaking space.

Consider:

    C<$x ? $y : $z>

    S<C<$x ? $y : $z>>

Both signify the monospace (c[ode] style) text consisting of "$x",
one space, "?", one space, "$y", one space, ":", one space, "$z".
The difference is that in the latter, with the S code, those spaces
are not "normal" spaces, but instead are non-breaking spaces.

If a Pod processor sees any formatting code other than the ones listed
above (as in "N<...>", or "Q<...>", etc.), that processor must by
default treat this as an error. A Pod parser may allow a way for
particular applications to add to the above list of known formatting
codes; a Pod parser might even allow a way to stipulate, for each
additional formatting code, whether it requires some form of special
processing, as L<...> does.

Future versions of this specification may add additional formatting
codes.

Historical note: A few older Pod processors would not see a ">" as
closing a "C<" code, if the ">" was immediately preceded by a "-".
This was so that this:

    C<$foo->bar>

would parse as equivalent to this:

    C<$foo-E<gt>bar>

instead of as equivalent to a "C" formatting code containing only
"$foo-", and then a "bar>" outside the "C" formatting code. This
problem has since been solved by the addition of syntaxes like this:

    C<< $foo->bar >>

Compliant parsers must not treat "->" as special.

Formatting codes absolutely cannot span paragraphs. If a code is
opened in one paragraph, and no closing code is found by the end of
that paragraph, the Pod parser must close that formatting code, and
should complain (as in "Unterminated I code in the paragraph starting
at line 123: 'Time objects are not...'"). So these two paragraphs:

    I<I told you not to do this!

    Don't make me say it again!>

...must not be parsed as two paragraphs in italics (with the I code
starting in one paragraph and ending in another). Instead, the first
paragraph should generate a warning, but that aside, the above code
must parse as if it were:

    I<I told you not to do this!>

    Don't make me say it again!E<gt>

(In SGMLish jargon, all Pod commands are like block-level elements,
whereas all Pod formatting codes are like inline-level elements.)

Notes on Implementing Pod Processors
The following is a long section of miscellaneous requirements and
suggestions to do with Pod processing.

• Pod formatters should tolerate lines in verbatim blocks that are of
any length, even if that means having to break them (possibly
several times, for very long lines) to avoid text running off the
side of the page. Pod formatters may warn of such line-breaking.
Such warnings are particularly appropriate for lines that are over
100 characters long, which are usually not intentional.

• Pod parsers must recognize all of the three well-known newline
formats: CR, LF, and CRLF. See perlport.

• Pod parsers should accept input lines that are of any length.

• Since Perl recognizes a Unicode Byte Order Mark at the start of
files as signaling that the file is Unicode encoded as in UTF-16
(whether big-endian or little-endian) or UTF-8, Pod parsers should
do the same. Otherwise, the character encoding should be
understood as being UTF-8 if the first highbit byte sequence in the
file seems valid as a UTF-8 sequence, or otherwise as CP-1252
(earlier versions of this specification used Latin-1 instead of
CP-1252).

Future versions of this specification may specify how Pod can
accept other encodings. Presumably treatment of other encodings in
Pod parsing would be as in XML parsing: whatever the encoding
declared by a particular Pod file, content is to be stored in
memory as Unicode characters.

• The well known Unicode Byte Order Marks are as follows: if the
file begins with the two literal byte values 0xFE 0xFF, this is the
BOM for big-endian UTF-16. If the file begins with the two literal
byte values 0xFF 0xFE, this is the BOM for little-endian UTF-16. On
an ASCII platform, if the file begins with the three literal byte
values 0xEF 0xBB 0xBF, this is the BOM for UTF-8. A mechanism
portable to EBCDIC platforms is to:

    my $utf8_bom = "\x{FEFF}";
    utf8::encode($utf8_bom);
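
Putting the three BOMs together, a detection routine might look like
the following sketch. The name "bom_encoding" and the returned labels
are choices made for this example; it expects the raw, undecoded
bytes from the start of the file.

```perl
use strict;
use warnings;

# Guess the encoding from a Byte Order Mark at the start of raw bytes.
# Returns an encoding label, or nothing when no known BOM is present.
sub bom_encoding {
    my ($bytes) = @_;
    my $utf8_bom = "\x{FEFF}";
    utf8::encode($utf8_bom);       # the portable way to get the UTF-8 BOM bytes
    return 'UTF-16BE' if $bytes =~ /\A\xFE\xFF/;
    return 'UTF-16LE' if $bytes =~ /\A\xFF\xFE/;
    return 'UTF-8'    if index($bytes, $utf8_bom) == 0;
    return;
}
```

When no BOM is found, the parser falls back to examining the first
highbit byte sequence, as described in the notes that follow.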

• A naive, but often sufficient heuristic on ASCII platforms, for
testing the first highbit byte-sequence in a BOM-less file (whether
in code or in Pod!), to see whether that sequence is valid as UTF-8
(RFC 2279) is to check whether the first byte in the sequence is in
the range 0xC2 - 0xFD and whether the next byte is in the range
0x80 - 0xBF. If so, the parser may conclude that this file is in
UTF-8, and all highbit sequences in the file should be assumed to be
UTF-8. Otherwise the parser should treat the file as being in
CP-1252. (A better check, which also works on EBCDIC platforms, is
to pass a copy of the sequence to utf8::decode() which performs a
full validity check on the sequence and returns TRUE if it is valid
UTF-8, FALSE otherwise. This function is always pre-loaded, is fast
because it is written in C, and will only get called at most once,
so you don't need to avoid it out of performance concerns.) In the
unlikely circumstance that the first highbit sequence in a truly
non-UTF-8 file happens to appear to be UTF-8, one can cater to our
heuristic (as well as any more intelligent heuristic) by prefacing
that line with a comment line containing a highbit sequence that is
clearly not valid as UTF-8. A line consisting of simply "#", an
e-acute, and any non-highbit byte, is sufficient to establish this
file's encoding.
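
The utf8::decode() check suggested above might be wrapped up like
this. It is a sketch rather than a definitive implementation:
"guess_encoding" is a hypothetical name, and returning 'US-ASCII' for
input with no highbit bytes at all is an assumption of this example,
since the rule above says nothing about that case.

```perl
use strict;
use warnings;

# Guess the encoding of BOM-less raw bytes from the first run of
# bytes that have the high bit set.
sub guess_encoding {
    my ($bytes) = @_;
    return 'US-ASCII' unless $bytes =~ /([\x80-\xFF]+)/;
    my $sequence = $1;    # work on a copy: utf8::decode changes its argument
    return utf8::decode($sequence) ? 'UTF-8' : 'CP-1252';
}
```

For example, the bytes 0xC3 0xA9 between ASCII letters are taken as
UTF-8, while a lone 0xE9 (e-acute in CP-1252) followed by an ASCII
byte is not valid UTF-8 and so selects CP-1252.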

• Pod processors must treat a "=for [label] [content...]" paragraph
as meaning the same thing as a "=begin [label]" paragraph, content,
and an "=end [label]" paragraph. (The parser may conflate these
two constructs, or may leave them distinct, in the expectation that
the formatter will nevertheless treat them the same.)

• When rendering Pod to a format that allows comments (i.e., to
nearly any format other than plaintext), a Pod formatter must
insert comment text identifying its name and version number, and
the name and version numbers of any modules it might be using to
process the Pod. Minimal examples:

    %% POD::Pod2PS v3.14159, using POD::Parser v1.92

    <!-- Pod::HTML v3.14159, using POD::Parser v1.92 -->

    {\doccomm generated by Pod::Tree::RTF 3.14159 using Pod::Tree 1.08}

    .\" Pod::Man version 3.14159, using POD::Parser version 1.92

Formatters may also insert additional comments, including: the
release date of the Pod formatter program, the contact address for
the author(s) of the formatter, the current time, the name of input
file, the formatting options in effect, version of Perl used, etc.

Formatters may also choose to note errors/warnings as comments,
besides or instead of emitting them otherwise (as in messages to
STDERR, or "die"ing).

• Pod parsers may emit warnings or error messages ("Unknown E code
E<zslig>!") to STDERR (whether through printing to STDERR, or
"warn"ing/"carp"ing, or "die"ing/"croak"ing), but must allow
suppressing all such STDERR output, and instead allow an option for
reporting errors/warnings in some other way, whether by triggering
a callback, or noting errors in some attribute of the document
object, or some similarly unobtrusive mechanism -- or even by
appending a "Pod Errors" section to the end of the parsed form of
the document.

• In cases of exceptionally aberrant documents, Pod parsers may abort
the parse. Even then, using "die"ing/"croak"ing is to be avoided;
where possible, the parser library may simply close the input file
and add text like "*** Formatting Aborted ***" to the end of the
(partial) in-memory document.

• In paragraphs where formatting codes (like E<...>, B<...>) are
understood (i.e., not verbatim paragraphs, but including ordinary
paragraphs, and command paragraphs that produce renderable text,
like "=head1"), literal whitespace should generally be considered
"insignificant", in that one literal space has the same meaning as
any (nonzero) number of literal spaces, literal newlines, and
literal tabs (as long as this produces no blank lines, since those
would terminate the paragraph). Pod parsers should compact literal
whitespace in each processed paragraph, but may provide an option
for overriding this (since some processing tasks do not require
it), or may follow additional special rules (for example, specially
treating period-space-space or period-newline sequences).
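
In its simplest form, the whitespace compaction described above is a
single substitution. The sketch below assumes no special rules for
sentence endings; "compact_whitespace" is a hypothetical name.

```perl
use strict;
use warnings;

# Collapse every run of literal spaces, tabs, and newline characters
# in an ordinary or command paragraph into one space.
sub compact_whitespace {
    my ($text) = @_;
    $text =~ s/[ \t\x0D\x0A]+/ /g;
    return $text;
}
```

This must not be applied to verbatim paragraphs, where whitespace is
significant, and it leaves codes such as "E<32>" untouched because
they are not literal whitespace.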

• Pod parsers should not, by default, try to coerce apostrophe (')
and quote (") into smart quotes (little 9's, 66's, 99's, etc), nor
try to turn backtick (`) into anything else but a single backtick
character (distinct from an open quote character!), nor "--" into
anything but two minus signs. They must never do any of those
things to text in C<...> formatting codes, and never ever to text
in verbatim paragraphs.

• When rendering Pod to a format that has two kinds of hyphens (-),
one that's a non-breaking hyphen, and another that's a breakable
hyphen (as in "object-oriented", which can be split across lines as
"object-", newline, "oriented"), formatters are encouraged to
generally translate "-" to non-breaking hyphen, but may apply
heuristics to convert some of these to breaking hyphens.

• Pod formatters should make reasonable efforts to keep words of Perl
code from being broken across lines. For example, "Foo::Bar" in
some formatting systems is seen as eligible for being broken across
lines as "Foo::" newline "Bar" or even "Foo::-" newline "Bar".
This should be avoided where possible, either by disabling all
line-breaking in mid-word, or by wrapping particular words with
internal punctuation in "don't break this across lines" codes
(which in some formats may not be a single code, but might be a
matter of inserting non-breaking zero-width spaces between every
pair of characters in a word.)

• Pod parsers should, by default, expand tabs in verbatim paragraphs
as they are processed, before passing them to the formatter or
other processor. Parsers may also allow an option for overriding
this.
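
Tab expansion could be done as in the sketch below. The default tab
stop of 8 columns is an assumption of this example (this
specification does not fix a width), and "expand_tabs" is a
hypothetical name.

```perl
use strict;
use warnings;

# Expand each tab in a verbatim paragraph to the next tab stop,
# with stops every $width columns (8 if no width is given).
sub expand_tabs {
    my ($para, $width) = @_;
    $width ||= 8;
    my @lines = split /\n/, $para, -1;
    for my $line (@lines) {
        # Replace the first remaining tab with enough spaces to reach
        # the next tab stop, and repeat until no tabs are left.
        1 while $line =~ s/\A([^\t]*)\t/$1 . ' ' x ($width - length($1) % $width)/e;
    }
    return join "\n", @lines;
}
```

A tab in the middle of a line is padded only as far as the next stop,
so "ab" followed by a tab and "c" becomes "ab", two spaces, and "c"
when the width is 4.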

• Pod parsers should, by default, remove newlines from the end of
ordinary and verbatim paragraphs before passing them to the
formatter. For example, while the paragraph you're reading now
could be considered, in Pod source, to end with (and contain) the
newline(s) that end it, it should be processed as ending with (and
containing) the period character that ends this sentence.

• Pod parsers, when reporting errors, should make some effort to
report an approximate line number ("Nested E<>'s in Paragraph #52,
near line 633 of Thing/Foo.pm!"), instead of merely noting the
paragraph number ("Nested E<>'s in Paragraph #52 of
Thing/Foo.pm!"). Where this is problematic, the paragraph number
should at least be accompanied by an excerpt from the paragraph
("Nested E<>'s in Paragraph #52 of Thing/Foo.pm, which begins
'Read/write accessor for the C<interest rate> attribute...'").

• Pod parsers, when processing a series of verbatim paragraphs one
after another, should consider them to be one large verbatim
paragraph that happens to contain blank lines. I.e., these two
lines, which have a blank line between them:

    use Foo;

    print Foo->VERSION

should be unified into one paragraph ("\tuse Foo;\n\n\tprint
Foo->VERSION") before being passed to the formatter or other
processor. Parsers may also allow an option for overriding this.

While this might be too cumbersome to implement in event-based Pod
parsers, it is straightforward for parsers that return parse trees.
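
For a parser that already holds a list of classified paragraphs, the
unification might be sketched as follows. The [type, text] pair
representation and the name "merge_verbatim" are assumptions of this
example, not an interface defined by this specification.

```perl
use strict;
use warnings;

# Merge each run of adjacent verbatim paragraphs into one paragraph,
# joining the parts with a blank line. Input and output are lists of
# [type, text] pairs.
sub merge_verbatim {
    my (@paras) = @_;
    my @out;
    for my $para (@paras) {
        my ($type, $text) = @$para;
        if ($type eq 'verbatim' && @out && $out[-1][0] eq 'verbatim') {
            $out[-1][1] .= "\n\n" . $text;    # extend the previous paragraph
        }
        else {
            push @out, [$type, $text];
        }
    }
    return @out;
}

my @merged = merge_verbatim(
    ['ordinary', 'Intro'],
    ['verbatim', "\tuse Foo;"],
    ['verbatim', "\tprint Foo->VERSION"],
    ['ordinary', 'End'],
);
```

The two verbatim paragraphs come out as the single paragraph
"\tuse Foo;\n\n\tprint Foo->VERSION", matching the example above.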

• Pod formatters, where feasible, are advised to avoid splitting
short verbatim paragraphs (under twelve lines, say) across pages.

• Pod parsers must treat a line with only spaces and/or tabs on it as
a "blank line" such as separates paragraphs. (Some older parsers
recognized only two adjacent newlines as a "blank line" but would
not recognize a newline, a space, and a newline, as a blank line.
This is noncompliant behavior.)

• Authors of Pod formatters/processors should make every effort to
avoid writing their own Pod parser. There are already several on
CPAN, with a wide range of interface styles -- and one of them,
Pod::Simple, comes with modern versions of Perl.

• Characters in Pod documents may be conveyed either as literals, or
by number in E<n> codes, or by an equivalent mnemonic, as in
E<eacute> which is exactly equivalent to E<233>. The numbers are
the Latin1/Unicode values, even on EBCDIC platforms.

When referring to characters by using a E<n> numeric code, numbers
in the range 32-126 refer to those well known US-ASCII characters
(also defined there by Unicode, with the same meaning), which all
Pod formatters must render faithfully. Characters whose E<>
numbers are in the ranges 0-31 and 127-159 should not be used
(neither as literals, nor as E<number> codes), except for the
literal byte-sequences for newline (ASCII 13, ASCII 13 10, or ASCII
10), and tab (ASCII 9).

Numbers in the range 160-255 refer to Latin-1 characters (also
defined there by Unicode, with the same meaning). Numbers above
255 should be understood to refer to Unicode characters.

• Be warned that some formatters cannot reliably render characters
outside 32-126; and many are able to handle 32-126 and 160-255, but
nothing above 255.

• Besides the well-known "E<lt>" and "E<gt>" codes for less-than and
greater-than, Pod parsers must understand "E<sol>" for "/"
(solidus, slash), and "E<verbar>" for "|" (vertical bar, pipe).
Pod parsers should also understand "E<lchevron>" and "E<rchevron>"
as legacy codes for characters 171 and 187, i.e., "left-pointing
double angle quotation mark" = "left pointing guillemet" and
"right-pointing double angle quotation mark" = "right pointing
guillemet". (These look like little "<<" and ">>", and they are
now preferably expressed with the HTML/XHTML codes "E<laquo>" and
"E<raquo>".)

• Pod parsers should understand all "E<html>" codes as defined in the
entity declarations in the most recent XHTML specification at
"www.W3.org". Pod parsers must understand at least the entities
that define characters in the range 160-255 (Latin-1). Pod
parsers, when faced with some unknown "E<identifier>" code,
shouldn't simply replace it with nullstring (by default, at least),
but may pass it through as a string consisting of the literal
characters E, less-than, identifier, greater-than. Or Pod parsers
may offer the alternative option of processing such unknown
"E<identifier>" codes by firing an event especially for such codes,
or by adding a special node-type to the in-memory document tree.
Such "E<identifier>" may have special meaning to some processors,
or some processors may choose to add them to a special error
report.

• Pod parsers must also support the XHTML codes "E<quot>" for
character 34 (doublequote, "), "E<amp>" for character 38
(ampersand, &), and "E<apos>" for character 39 (apostrophe, ').
746
747 • Note that in all cases of "E<whatever>", whatever (whether an
748 htmlname, or a number in any base) must consist only of
749 alphanumeric characters -- that is, whatever must match
750 "m/\A\w+\z/". So "E< 0 1 2 3 >" is invalid, because it contains
751 spaces, which aren't alphanumeric characters. This presumably does
752 not need special treatment by a Pod processor; " 0 1 2 3 " doesn't
753 look like a number in any base, so it would presumably be looked up
754 in the table of HTML-like names. Since there isn't (and cannot be)
755 an HTML-like entity called " 0 1 2 3 ", this will be treated as an
756 error. However, Pod processors may treat "E< 0 1 2 3 >" or
757 "E<e-acute>" as syntactically invalid, potentially earning a
758 different error message than the error message (or warning, or
759 event) generated by a merely unknown (but theoretically valid)
760 htmlname, as in "E<qacute>" [sic]. However, Pod parsers are not
761 required to make this distinction.
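
The handling of E<...> content described in the last few points might be sketched in Python as follows. The function resolve_escape, its status strings, and the POD_EXTRA_NAMES table are hypothetical. The sketch assumes that a leading "0x" marks a hexadecimal number and a leading "0" an octal one, and it uses Python's html.entities table (the HTML 4 entity set) as a stand-in for the XHTML entity declarations. Unknown or invalid codes are passed through as literal text rather than replaced with nullstring.

```python
import re
from html.entities import name2codepoint

# Names discussed above that are not in the HTML 4 entity table.
POD_EXTRA_NAMES = {
    "sol": 0x2F,       # "/"
    "verbar": 0x7C,    # "|"
    "lchevron": 0xAB,  # legacy name for laquo
    "rchevron": 0xBB,  # legacy name for raquo
    "apos": 0x27,      # "'"
}

def resolve_escape(content):
    """Return (status, text) for the content of one E<...> code."""
    literal = "E<" + content + ">"
    if not re.fullmatch(r"\w+", content):
        return ("invalid", literal)          # e.g. spaces or hyphens
    if re.fullmatch(r"0[xX][0-9A-Fa-f]+", content):
        number = int(content, 16)            # assumed: "0x" means hex
    elif re.fullmatch(r"0[0-7]+", content):
        number = int(content, 8)             # assumed: leading "0" means octal
    elif re.fullmatch(r"[0-9]+", content):
        number = int(content)
    else:
        number = POD_EXTRA_NAMES.get(content, name2codepoint.get(content))
    if number is None or number > 0x10FFFF:
        return ("unknown", literal)          # valid syntax, no such character
    return ("ok", chr(number))
```

Keeping "invalid" and "unknown" as separate statuses lets a processor issue different messages for each, although, as noted above, parsers are not required to make that distinction.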
762
763 • Note that E<number> must not be interpreted as simply "codepoint
764 number in the current/native character set". It always means only
765 "the character represented by codepoint number in Unicode." (This
766 is identical to the semantics of &#number; in XML.)
767
768 This will likely require many formatters to have tables mapping
769 from treatable Unicode codepoints (such as the "\xE9" for the
770 e-acute character) to the escape sequences or codes necessary for
771 conveying such sequences in the target output format. A converter
772 to *roff would, for example, know that "\xE9" (whether conveyed
773 literally, or via a E<...> sequence) is to be conveyed as "e\\*'".
774 Similarly, a program rendering Pod in a Mac OS application window,
775 would presumably need to know that "\xE9" maps to codepoint 142 in
776 MacRoman encoding that (at time of writing) is native for Mac OS.
777 Such Unicode2whatever mappings are presumably already widely
778 available for common output formats. (Such mappings may be
779 incomplete! Implementers are not expected to bend over backwards
780 in an attempt to render Cherokee syllabics, Etruscan runes,
781 Byzantine musical symbols, or any of the other weird things that
782 Unicode can encode.) And if a Pod document uses a character not
783 found in such a mapping, the formatter should consider it an
784 unrenderable character.
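
To make the distinction between Unicode codepoints and native codepoints concrete, here is a minimal Python sketch that converts already-resolved characters to a target encoding and records the ones that cannot be conveyed. The function name is hypothetical, the "?" substitute is just one possible choice, and Python's "mac_roman" codec stands in for whatever mapping a real formatter would use.

```python
def to_target_bytes(text, encoding="mac_roman"):
    """Encode text for an output format; return (bytes, unrenderable)."""
    out = bytearray()
    unrenderable = []
    for ch in text:
        try:
            out += ch.encode(encoding)
        except UnicodeEncodeError:
            # No mapping for this character in the target encoding.
            unrenderable.append(ch)
            out += b"?"
    return bytes(out), unrenderable
```

With the default encoding, the character "\xE9" comes out as the single byte 142, matching the MacRoman example above, while a Cherokee syllable such as U+13A0 is reported as unrenderable.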
785
786 • If, surprisingly, the implementor of a Pod formatter can't find a
787 satisfactory pre-existing table mapping from Unicode characters to
788 escapes in the target format (e.g., a decent table of Unicode
789 characters to *roff escapes), it will be necessary to build such a
790 table. If you are in this circumstance, you should begin with the
791 characters in the range 0x00A0 - 0x00FF, which is mostly the
792 heavily used accented characters. Then proceed (as patience
793 permits and fastidiousness compels) through the characters that the
794 (X)HTML standards groups judged important enough to merit mnemonics
795 for. These are declared in the (X)HTML specifications at the
796 www.W3.org site. At time of writing (September 2001), the most
797 recent entity declaration files are:
798
799 http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
800 http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
801 http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent
802
803 Then you can progress through any remaining notable Unicode
804 characters in the range 0x2000-0x204D (consult the character tables
805 at www.unicode.org), and whatever else strikes your fancy. For
806 example, in xhtml-symbol.ent, there is the entry:
807
808 <!ENTITY infin "&#8734;"> <!-- infinity, U+221E ISOtech -->
809
810 While the mapping "infin" to the character "\x{221E}" will
811 (hopefully) have been already handled by the Pod parser, the
812 presence of the character in this file means that it's reasonably
813 important enough to include in a formatter's table that maps from
814 notable Unicode characters to the codes necessary for rendering
815 them. So for a Unicode-to-*roff mapping, for example, this would
816 merit the entry:
817
818 "\x{221E}" => '\(in',
819
820 It is eagerly hoped that in the future, increasing numbers of
821 formats (and formatters) will support Unicode characters directly
822 (as (X)HTML does with "&infin;", "&#8734;", or "&#x221E;"),
823 reducing the need for idiosyncratic mappings of
824 Unicode-to-my_escapes.
825
826 • It is up to individual Pod formatters to display good judgement when
827 confronted with an unrenderable character (which is distinct from
828 an unknown E<thing> sequence that the parser couldn't resolve to
829 anything, renderable or not). It is good practice to map Latin
830 letters with diacritics (like "E<eacute>"/"E<233>") to the
831 corresponding unaccented US-ASCII letters (like a simple character
832 101, "e"), but clearly this is often not feasible, and an
833 unrenderable character may be represented as "?", or the like. In
834 attempting a sane fallback (as from E<233> to "e"), Pod formatters
835 may use the %Latin1Code_to_fallback table in Pod::Escapes, or
836 Text::Unidecode, if available.
837
838 For example, this Pod text:
839
840 magic is enabled if you set C<$Currency> to 'E<euro>'.
841
842 may be rendered as: "magic is enabled if you set $Currency to '?'"
843 or as "magic is enabled if you set $Currency to '[euro]'", or as
844 "magic is enabled if you set $Currency to '[x20AC]'", etc.
845
846 A Pod formatter may also note, in a comment or warning, a list of
847 what unrenderable characters were encountered.
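
One possible fallback chain for a *roff-like target is sketched below in Python. The two table entries are the ones given in this document; everything else (the function names, the use of Unicode decomposition to find an unaccented letter, and the "[xHHHH]" placeholder) is an illustrative assumption. A real formatter would also need to escape characters that are special in its output format, which this sketch does not attempt.

```python
import unicodedata

ROFF_ESCAPES = {
    "\xe9": "e\\*'",     # e-acute
    "\u221e": "\\(in",   # infinity
}

def render_char(ch, table=ROFF_ESCAPES):
    """Return output text for one character, or None if unrenderable."""
    if ch in "\t\n" or " " <= ch <= "~":
        return ch                      # tab, newline, and US-ASCII 32-126
    if ch in table:
        return table[ch]
    # Fallback: try the unaccented base letter (e.g. e-grave becomes "e").
    base = unicodedata.normalize("NFD", ch)[0]
    if "A" <= base <= "Z" or "a" <= base <= "z":
        return base
    return None

def render_text(text, table=ROFF_ESCAPES):
    """Render text; also return the list of unrenderable characters."""
    out, unrenderable = [], []
    for ch in text:
        rendered = render_char(ch, table)
        if rendered is None:
            unrenderable.append(ch)
            rendered = "[x%04X]" % ord(ch)
        out.append(rendered)
    return "".join(out), unrenderable
```

The second return value is the list a formatter could report in a comment or warning.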
848
849 • E<...> may freely appear in any formatting code (other than in
850 another E<...> or in a Z<>). That is, "X<The E<euro>1,000,000
851 Solution>" is valid, as is "L<The E<euro>1,000,000
852 Solution|Million::Euros>".
853
854 • Some Pod formatters output to formats that implement non-breaking
855 spaces as an individual character (which I'll call "NBSP"), and
856 others output to formats that implement non-breaking spaces just as
857 spaces wrapped in a "don't break this across lines" code. Note
858 that at the level of Pod, both sorts of codes can occur: Pod can
859 contain a NBSP character (whether as a literal, or as a "E<160>" or
860 "E<nbsp>" code); and Pod can contain "S<foo I<bar> baz>" codes,
861 where "mere spaces" (character 32) in such codes are taken to
862 represent non-breaking spaces. Pod parsers should consider
863 supporting the optional parsing of "S<foo I<bar> baz>" as if it
864 were "fooNBSPI<bar>NBSPbaz", and, going the other way, the optional
865 parsing of groups of words joined by NBSP's as if each group were
866 in a S<...> code, so that formatters may use the representation
867 that maps best to what the output format demands.
868
869 • Some processors may find that the "S<...>" code is easiest to
870 implement by replacing each space in the parse tree under the
871 content of the S, with an NBSP. But note: the replacement should
872 apply not to spaces in all text, but only to spaces in printable
873 text. (This distinction may or may not be evident in the
874 particular tree/event model implemented by the Pod parser.) For
875 example, consider this unusual case:
876
877 S<L</Autoloaded Functions>>
878
879 This means that the space in the middle of the visible link text
880 must not be broken across lines. In other words, it's the same as
881 this:
882
883 L<"AutoloadedE<160>Functions"/Autoloaded Functions>
884
885 However, a misapplied space-to-NBSP replacement could (wrongly)
886 produce something equivalent to this:
887
888 L<"AutoloadedE<160>Functions"/AutoloadedE<160>Functions>
889
890 ...which is almost definitely not going to work as a hyperlink
891 (assuming this formatter outputs a format supporting hypertext).
892
893 Formatters may choose to just not support the S format code,
894 especially in cases where the output format simply has no NBSP
895 character/code and no code for "don't break this stuff across
896 lines".
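
The difference between printable text and non-printable link attributes can be shown on a toy parse tree. The Python sketch below is only an illustration: the NoBreak and Link classes and the apply_nobreak function are hypothetical and do not correspond to the tree model of any particular Pod parser.

```python
from dataclasses import dataclass
from typing import Optional

NBSP = "\xa0"

@dataclass
class NoBreak:            # stands for an S<...> code
    children: list

@dataclass
class Link:               # stands for an L<...> code
    children: list        # printable link text
    name: Optional[str] = None
    section: Optional[str] = None

def apply_nobreak(nodes, inside_s=False):
    """Replace spaces with NBSP in printable text under S<...> only."""
    result = []
    for node in nodes:
        if isinstance(node, str):
            result.append(node.replace(" ", NBSP) if inside_s else node)
        elif isinstance(node, NoBreak):
            # Drop the S wrapper; its printable descendants get NBSPs.
            result.extend(apply_nobreak(node.children, inside_s=True))
        else:
            # Rewrite only the visible text; name and section stay as-is.
            result.append(Link(apply_nobreak(node.children, inside_s),
                               node.name, node.section))
    return result
```

Applied to a tree for the example above, the visible text becomes "Autoloaded", NBSP, "Functions", while the section attribute keeps its ordinary space and so can still be resolved.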
897
898 • Besides the NBSP character discussed above, implementors are
899 reminded of the existence of the other "special" character in
900 Latin-1, the "soft hyphen" character (also known as "discretionary
901 hyphen", i.e. "E<173>" = "E<0xAD>" = "E<shy>"). This character
902 expresses an optional hyphenation point. That is, it normally
903 renders as nothing, but may render as a "-" if a formatter breaks
904 the word at that point. Pod formatters should, as appropriate, do
905 one of the following: 1) render this with a code with the same
906 meaning (e.g., "\-" in RTF), 2) pass it through in the expectation
907 that the formatter understands this character as such, or 3) delete
908 it.
909
910 For example:
911
912 sigE<shy>action
913 manuE<shy>script
914 JarkE<shy>ko HieE<shy>taE<shy>nieE<shy>mi
915
916 These signal to a formatter that if it is to hyphenate "sigaction"
917 or "manuscript", then it should be done as "sig-[linebreak]action"
918 or "manu-[linebreak]script" (and if it doesn't hyphenate it, then
919 the "E<shy>" doesn't show up at all). And if it is to hyphenate
920 "Jarkko" and/or "Hietaniemi", it can do so only at the points where
921 there is a "E<shy>" code.
922
923 In practice, it is anticipated that this character will not be used
924 often, but formatters should either support it, or delete it.
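
For formatters that do their own line breaking, the two simplest treatments of the soft hyphen are sketched below in Python: deleting it, or breaking a word only at a point it marks. The function names and the column-counting rule are illustrative assumptions.

```python
SHY = "\xad"

def strip_soft_hyphens(text):
    """Option 3 above: delete every soft hyphen."""
    return text.replace(SHY, "")

def break_word(word, width):
    """Break word at the last soft hyphen whose head fits in width columns.

    Returns (head, tail); tail is "" when no permitted break fits.
    """
    pieces = word.split(SHY)
    best = 0                       # number of leading pieces kept in head
    for i in range(1, len(pieces)):
        head_len = sum(len(p) for p in pieces[:i]) + 1   # +1 for the "-"
        if head_len <= width:
            best = i
    if best == 0:
        return ("".join(pieces), "")
    return ("".join(pieces[:best]) + "-", "".join(pieces[best:]))
```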
925
926 • If you think that you want to add a new command to Pod (like, say,
927 a "=biblio" command), consider whether you could get the same
928 effect with a for or begin/end sequence: "=for biblio ..." or
929 "=begin biblio" ... "=end biblio". Pod processors that don't
930 understand "=for biblio", etc, will simply ignore it, whereas they
931 may complain loudly if they see "=biblio".
932
933 • Throughout this document, "Pod" has been the preferred spelling for
934 the name of the documentation format. One may also use "POD" or
935 "pod". For the documentation that is (typically) in the Pod
936 format, you may use "pod", or "Pod", or "POD". Understanding these
937 distinctions is useful; but obsessing over how to spell them,
938 usually is not.
939
941 As you can tell from a glance at perlpod, the L<...> code is the most
942 complex of the Pod formatting codes. The points below will hopefully
943 clarify what it means and how processors should deal with it.
944
945 • In parsing an L<...> code, Pod parsers must distinguish at least
946 four attributes:
947
948 First:
949 The link-text. If there is none, this must be "undef". (E.g.,
950 in "L<Perl Functions|perlfunc>", the link-text is "Perl
951 Functions". In "L<Time::HiRes>" and even "L<|Time::HiRes>",
952 there is no link text. Note that link text may contain
953 formatting.)
954
955 Second:
956 The possibly inferred link-text; i.e., if there was no real
957 link text, then this is the text that we'll infer in its place.
958 (E.g., for "L<Getopt::Std>", the inferred link text is
959 "Getopt::Std".)
960
961 Third:
962 The name or URL, or "undef" if none. (E.g., in "L<Perl
963 Functions|perlfunc>", the name (also sometimes called the page)
964 is "perlfunc". In "L</CAVEATS>", the name is "undef".)
965
966 Fourth:
967 The section (AKA "item" in older perlpods), or "undef" if none.
968 E.g., in "L<Getopt::Std/DESCRIPTION>", "DESCRIPTION" is the
969 section. (Note that this is not the same as a manpage section
970 like the "5" in "man 5 crontab". "Section Foo" in the Pod
971 sense means the part of the text that's introduced by the
972 heading or item whose text is "Foo".)
973
974 Pod parsers may also note additional attributes including:
975
976 Fifth:
977 A flag for whether item 3 (if present) is a URL (like
978 "http://lists.perl.org" is), in which case there should be no
979 section attribute; a Pod name (like "perldoc" and "Getopt::Std"
980 are); or possibly a man page name (like "crontab(5)" is).
981
982 Sixth:
983 The raw original L<...> content, before text is split on "|",
984 "/", etc, and before E<...> codes are expanded.
985
986 (The above were numbered only for concise reference below. It is
987 not a requirement that these be passed as an actual list or array.)
988
989 For example:
990
991 L<Foo::Bar>
992 => undef, # link text
993 "Foo::Bar", # possibly inferred link text
994 "Foo::Bar", # name
995 undef, # section
996 'pod', # what sort of link
997 "Foo::Bar" # original content
998
999 L<Perlport's section on NL's|perlport/Newlines>
1000 => "Perlport's section on NL's", # link text
1001 "Perlport's section on NL's", # possibly inferred link text
1002 "perlport", # name
1003 "Newlines", # section
1004 'pod', # what sort of link
1005 "Perlport's section on NL's|perlport/Newlines"
1006 # original content
1007
1008 L<perlport/Newlines>
1009 => undef, # link text
1010 '"Newlines" in perlport', # possibly inferred link text
1011 "perlport", # name
1012 "Newlines", # section
1013 'pod', # what sort of link
1014 "perlport/Newlines" # original content
1015
1016 L<crontab(5)/"DESCRIPTION">
1017 => undef, # link text
1018 '"DESCRIPTION" in crontab(5)', # possibly inferred link text
1019 "crontab(5)", # name
1020 "DESCRIPTION", # section
1021 'man', # what sort of link
1022 'crontab(5)/"DESCRIPTION"' # original content
1023
1024 L</Object Attributes>
1025 => undef, # link text
1026 '"Object Attributes"', # possibly inferred link text
1027 undef, # name
1028 "Object Attributes", # section
1029 'pod', # what sort of link
1030 "/Object Attributes" # original content
1031
1032 L<https://www.perl.org/>
1033 => undef, # link text
1034 "https://www.perl.org/", # possibly inferred link text
1035 "https://www.perl.org/", # name
1036 undef, # section
1037 'url', # what sort of link
1038 "https://www.perl.org/" # original content
1039
1040 L<Perl.org|https://www.perl.org/>
1041 => "Perl.org", # link text
1042 "Perl.org", # possibly inferred link text
1043 "https://www.perl.org/", # name
1044 undef, # section
1045 'url', # what sort of link
1046 "Perl.org|https://www.perl.org/" # original content
1047
1048 Note that you can distinguish URL-links from anything else by the
1049 fact that they match "m/\A\w+:[^:\s]\S*\z/". So
1050 "L<http://www.perl.com>" is a URL, but "L<HTTP::Response>" isn't.
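
A minimal Python sketch of the attribute extraction described in this point follows. It handles only plain-text L<...> content (no nested formatting codes, and E<...> codes are not expanded), and the function name and tuple layout are hypothetical. When explicit link text is present, the sketch reuses it as the "possibly inferred" text.

```python
import re

URL_RE = re.compile(r"\w+:[^:\s]\S*")

def parse_link(raw):
    """Return (text, inferred, name, section, kind, raw) for L<raw>."""
    text, sep, target = raw.partition("|")
    if not sep:
        text, target = None, raw      # no "text|" part at all
    elif text == "":
        text = None                   # "L<|Time::HiRes>" has no link text
    if URL_RE.fullmatch(target):
        name, section, kind = target, None, "url"
    else:
        name, _, section = target.partition("/")
        if len(section) >= 2 and section[0] == section[-1] == '"':
            section = section[1:-1]   # drop optional surrounding quotes
        name = name or None
        section = section or None
        # A parenthesized suffix, as in crontab(5), signals a man page.
        kind = "man" if name and re.search(r"\(\S*\)$", name) else "pod"
    if text is not None:
        inferred = text
    elif section is None:
        inferred = name
    elif name is None:
        inferred = '"%s"' % section
    else:
        inferred = '"%s" in %s' % (section, name)
    return (text, inferred, name, section, kind, raw)
```

The URL test must run before the split on "/", since URLs normally contain slashes of their own.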
1051
1052 • In case of L<...> codes with no "text|" part in them, older
1053 formatters have exhibited great variation in actually displaying
1054 the link or cross reference. For example, L<crontab(5)> would
1055 render as "the crontab(5) manpage", or "in the crontab(5) manpage"
1056 or just "crontab(5)".
1057
1058 Pod processors must now treat "text|"-less links as follows:
1059
1060 L<name> => L<name|name>
1061 L</section> => L<"section"|/section>
1062 L<name/section> => L<"section" in name|name/section>
1063
1064 • Note that section names might contain markup. I.e., if a section
1065 starts with:
1066
1067 =head2 About the C<-M> Operator
1068
1069 or with:
1070
1071 =item About the C<-M> Operator
1072
1073 then a link to it would look like this:
1074
1075 L<somedoc/About the C<-M> Operator>
1076
1077 Formatters may choose to ignore the markup for purposes of
1078 resolving the link and use only the renderable characters in the
1079 section name, as in:
1080
1081 <h1><a name="About_the_-M_Operator">About the <code>-M</code>
1082 Operator</a></h1>
1083
1084 ...
1085
1086 <a href="somedoc#About_the_-M_Operator">"About the <code>-M</code>
1087 Operator" in somedoc</a>
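
One way to derive such an anchor name is sketched below in Python. It is an assumption-laden illustration: the function names are hypothetical, only single-bracket codes such as C<...> are stripped, E<...> codes are not resolved to characters, and the underscore convention simply mirrors the HTML example above.

```python
import re

def strip_formatting(text):
    """Remove simple formatting codes like C<...>, keeping their content."""
    previous = None
    while previous != text:
        previous = text
        # Innermost codes are unwrapped first; repeat until nothing changes.
        text = re.sub(r"[A-Z]<([^<>]*)>", r"\1", text)
    return text

def section_anchor(heading):
    """Build an anchor name from the renderable characters of a heading."""
    return re.sub(r"\s+", "_", strip_formatting(heading).strip())
```

Using the same function for both the heading and the link's section part keeps the two sides in agreement, whatever convention is chosen.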
1088
1089 • Previous versions of perlpod distinguished "L<name/"section">"
1090 links from "L<name/item>" links (and their targets). These have
1091 been merged syntactically and semantically in the current
1092 specification, and section can refer either to a "=headn Heading
1093 Content" command or to a "=item Item Content" command. This
1094 specification does not specify what behavior should be in the case
1095 of a given document having several things all seeming to produce
1096 the same section identifier (e.g., in HTML, several things all
1097 producing the same anchorname in <a name="anchorname">...</a>
1098 elements). Where Pod processors can control this behavior, they
1099 should use the first such anchor. That is, "L<Foo/Bar>" refers to
1100 the first "Bar" section in Foo.
1101
1102 But for some processors/formats this cannot be easily controlled;
1103 as with the HTML example, the behavior of multiple ambiguous <a
1104 name="anchorname">...</a> is most easily just left up to browsers
1105 to decide.
1106
1107 • In a "L<text|...>" code, text may contain formatting codes for
1108 formatting or for E<...> escapes, as in:
1109
1110 L<B<ummE<234>stuff>|...>
1111
1112 For "L<...>" codes without a "text|" part, only "E<...>" and "Z<>"
1113 codes may occur. That is, authors should not use
1114 "L<B<Foo::Bar>>".
1115
1116 Note, however, that formatting codes and Z<>'s can occur in any and
1117 all parts of an L<...> (i.e., in name, section, text, and url).
1118
1119 Authors must not nest L<...> codes. For example, "L<The
1120 L<Foo::Bar> man page>" should be treated as an error.
1121
1122 • Note that Pod authors may use formatting codes inside the "text"
1123 part of "L<text|name>" (and so on for L<text|/"sec">).
1124
1125 In other words, this is valid:
1126
1127 Go read L<the docs on C<$.>|perlvar/"$.">
1128
1129 Some output formats that do allow rendering "L<...>" codes as
1130 hypertext, might not allow the link-text to be formatted; in that
1131 case, formatters will have to just ignore that formatting.
1132
1133 • At time of writing, "L<name>" values are of two types: either the
1134 name of a Pod page like "L<Foo::Bar>" (which might be a real Perl
1135 module or program in an @INC / PATH directory, or a .pod file in
1136 those places); or the name of a Unix man page, like
1137 "L<crontab(5)>". In theory, "L<chmod>" is ambiguous between a Pod
1138 page called "chmod", or the Unix man page "chmod" (in whatever man-
1139 section). However, the presence of a string in parens, as in
1140 "crontab(5)", is sufficient to signal that what is being discussed
1141 is not a Pod page, and so is presumably a Unix man page. The
1142 distinction is of no importance to many Pod processors, but some
1143 processors that render to hypertext formats may need to distinguish
1144 them in order to know how to render a given "L<foo>" code.
1145
1146 • Previous versions of perlpod allowed for a "L<section>" syntax (as
1147 in "L<Object Attributes>"), which was not easily distinguishable
1148 from "L<name>" syntax and for "L<"section">" which was only
1149 slightly less ambiguous. This syntax is no longer in the
1150 specification, and has been replaced by the "L</section>" syntax
1151 (where the slash was formerly optional). Pod parsers should
1152 tolerate the "L<"section">" syntax, for a while at least. The
1153 suggested heuristic for distinguishing "L<section>" from "L<name>"
1154 is that if it contains any whitespace, it's a section. Pod
1155 processors should warn about this being deprecated syntax.
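
The suggested heuristic might be sketched in Python as follows, for L<...> content that contains neither "|" nor "/". The function name and the returned tuple are hypothetical; the third element marks cases where a processor would issue a deprecation warning.

```python
def classify_bare_link(content):
    """Classify L<...> content that has no "|" and no "/"."""
    if len(content) >= 2 and content[0] == content[-1] == '"':
        # Old L<"section"> syntax: tolerated, but deprecated.
        return ("section", content[1:-1], "deprecated")
    if any(ch.isspace() for ch in content):
        # Old L<section> syntax, recognized by its whitespace.
        return ("section", content, "deprecated")
    return ("name", content, None)
```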
1156
1158 "=over"..."=back" regions are used for various kinds of list-like
1159 structures. (I use the term "region" here simply as a collective term
1160 for everything from the "=over" to the matching "=back".)
1161
1162 • The non-zero numeric indentlevel in "=over indentlevel" ...
1163 "=back" is used for giving the formatter a clue as to how many
1164 "spaces" (ems, or roughly equivalent units) it should tab over,
1165 although many formatters will have to convert this to an absolute
1166 measurement that may not exactly match with the size of spaces (or
1167 M's) in the document's base font. Other formatters may have to
1168 completely ignore the number. The lack of any explicit indentlevel
1169 parameter is equivalent to an indentlevel value of 4. Pod
1170 processors may complain if indentlevel is present but is not a
1171 positive number matching "m/\A(\d*\.)?\d+\z/".
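
A Python sketch of the indentlevel handling described above follows. The function name is hypothetical, and falling back to the default of 4 when the value is malformed is an assumption; this specification only says that processors may complain in that case.

```python
import re

DEFAULT_INDENT = 4

def parse_indentlevel(argument):
    """Return (indent, warning) for the text that follows "=over"."""
    arg = argument.strip()
    if arg == "":
        return (DEFAULT_INDENT, None)      # no explicit indentlevel
    # Same shape as m/\A(\d*\.)?\d+\z/, restricted to ASCII digits.
    if re.fullmatch(r"([0-9]*\.)?[0-9]+", arg) and float(arg) > 0:
        return (float(arg), None)
    return (DEFAULT_INDENT, "bad indentlevel %r, using %d" % (arg, DEFAULT_INDENT))
```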
1172
1173 • Authors of Pod formatters are reminded that "=over" ... "=back" may
1174 map to several different constructs in your output format. For
1175 example, in converting Pod to (X)HTML, it can map to any of
1176 <ul>...</ul>, <ol>...</ol>, <dl>...</dl>, or
1177 <blockquote>...</blockquote>. Similarly, "=item" can map to <li>
1178 or <dt>.
1179
1180 • Each "=over" ... "=back" region should be one of the following:
1181
1182 • An "=over" ... "=back" region containing only "=item *"
1183 commands, each followed by some number of ordinary/verbatim
1184 paragraphs, other nested "=over" ... "=back" regions, "=for..."
1185 paragraphs, and "=begin"..."=end" regions.
1186
1187 (Pod processors must tolerate a bare "=item" as if it were
1188 "=item *".) Whether "*" is rendered as a literal asterisk, an
1189 "o", or as some kind of real bullet character, is left up to
1190 the Pod formatter, and may depend on the level of nesting.
1191
1192 • An "=over" ... "=back" region containing only
1193 "m/\A=item\s+\d+\.?\s*\z/" paragraphs, each one (or each group
1194 of them) followed by some number of ordinary/verbatim
1195 paragraphs, other nested "=over" ... "=back" regions, "=for..."
1196 paragraphs, and/or "=begin"..."=end" regions. Note that the
1197 numbers must start at 1 in each section, and must proceed in
1198 order and without skipping numbers.
1199
1200 (Pod processors must tolerate lines like "=item 1" as if they
1201 were "=item 1.", with the period.)
1202
1203 • An "=over" ... "=back" region containing only "=item [text]"
1204 commands, each one (or each group of them) followed by some
1205 number of ordinary/verbatim paragraphs, other nested "=over"
1206 ... "=back" regions, "=for..." paragraphs, and
1207 "=begin"..."=end" regions.
1208
1209 The "=item [text]" paragraph should not match
1210 "m/\A=item\s+\d+\.?\s*\z/" or "m/\A=item\s+\*\s*\z/", nor
1211 should it match just "m/\A=item\s*\z/".
1212
1213 • An "=over" ... "=back" region containing no "=item" paragraphs
1214 at all, and containing only some number of ordinary/verbatim
1215 paragraphs, and possibly also some nested "=over" ... "=back"
1216 regions, "=for..." paragraphs, and "=begin"..."=end" regions.
1217 Such an itemless "=over" ... "=back" region in Pod is
1218 equivalent in meaning to a "<blockquote>...</blockquote>"
1219 element in HTML.
1220
1221 Note that with all the above cases, you can determine which type of
1222 "=over" ... "=back" you have, by examining the first (non-"=cut",
1223 non-"=pod") Pod paragraph after the "=over" command.
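
The four cases can be told apart with the patterns given above. In this Python sketch the function name and the returned labels are hypothetical, and the caller is assumed to have already skipped any "=cut" and "=pod" paragraphs.

```python
import re

def classify_over_region(first_paragraph):
    """Guess the kind of "=over" region from its first paragraph."""
    if re.fullmatch(r"=item\s*", first_paragraph):
        return "bullet"        # bare "=item" is tolerated as "=item *"
    if re.fullmatch(r"=item\s+\*\s*", first_paragraph):
        return "bullet"
    if re.fullmatch(r"=item\s+\d+\.?\s*", first_paragraph):
        return "number"
    if re.match(r"=item\s", first_paragraph):
        return "text"
    return "block"             # itemless region, like <blockquote>
```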
1224
1225 • Pod formatters must tolerate arbitrarily large amounts of text in
1226 the "=item text..." paragraph. In practice, most such paragraphs
1227 are short, as in:
1228
1229 =item For cutting off our trade with all parts of the world
1230
1231 But they may be arbitrarily long:
1232
1233 =item For transporting us beyond seas to be tried for pretended
1234 offenses
1235
1236 =item He is at this time transporting large armies of foreign
1237 mercenaries to complete the works of death, desolation and
1238 tyranny, already begun with circumstances of cruelty and perfidy
1239 scarcely paralleled in the most barbarous ages, and totally
1240 unworthy the head of a civilized nation.
1241
1242 • Pod processors should tolerate "=item *" / "=item number" commands
1243 with no accompanying paragraph. The middle item is an example:
1244
1245 =over
1246
1247 =item 1
1248
1249 Pick up dry cleaning.
1250
1251 =item 2
1252
1253 =item 3
1254
1255 Stop by the store. Get Abba Zabas, Stoli, and cheap lawn chairs.
1256
1257 =back
1258
1259 • No "=over" ... "=back" region can contain headings. Processors may
1260 treat such a heading as an error.
1261
1262 • Note that an "=over" ... "=back" region should have some content.
1263 That is, authors should not have an empty region like this:
1264
1265 =over
1266
1267 =back
1268
1269 Pod processors seeing such a contentless "=over" ... "=back"
1270 region, may ignore it, or may report it as an error.
1271
1272 • Processors must tolerate an "=over" list that goes off the end of
1273 the document (i.e., which has no matching "=back"), but they may
1274 warn about such a list.
1275
1276 • Authors of Pod formatters should note that this construct:
1277
1278 =item Neque
1279
1280 =item Porro
1281
1282 =item Quisquam Est
1283
1284 Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
1285 velit, sed quia non numquam eius modi tempora incidunt ut
1286 labore et dolore magnam aliquam quaerat voluptatem.
1287
1288 =item Ut Enim
1289
1290 is semantically ambiguous, in a way that makes formatting decisions
1291 a bit difficult. On the one hand, it could be mention of an item
1292 "Neque", mention of another item "Porro", and mention of another
1293 item "Quisquam Est", with just the last one requiring the
1294 explanatory paragraph "Qui dolorem ipsum quia dolor..."; and then
1295 an item "Ut Enim". In that case, you'd want to format it like so:
1296
1297 Neque
1298
1299 Porro
1300
1301 Quisquam Est
1302 Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
1303 velit, sed quia non numquam eius modi tempora incidunt ut
1304 labore et dolore magnam aliquam quaerat voluptatem.
1305
1306 Ut Enim
1307
1308 But it could equally well be a discussion of three (related or
1309 equivalent) items, "Neque", "Porro", and "Quisquam Est", followed
1310 by a paragraph explaining them all, and then a new item "Ut Enim".
1311 In that case, you'd probably want to format it like so:
1312
1313 Neque
1314 Porro
1315 Quisquam Est
1316 Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
1317 velit, sed quia non numquam eius modi tempora incidunt ut
1318 labore et dolore magnam aliquam quaerat voluptatem.
1319
1320 Ut Enim
1321
1322 But (for the foreseeable future), Pod does not provide any way for
1323 Pod authors to distinguish which grouping is meant by the above
1324 "=item"-cluster structure. So formatters should format it like so:
1325
1326 Neque
1327
1328 Porro
1329
1330 Quisquam Est
1331
1332 Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
1333 velit, sed quia non numquam eius modi tempora incidunt ut
1334 labore et dolore magnam aliquam quaerat voluptatem.
1335
1336 Ut Enim
1337
1338 That is, there should be (at least roughly) equal spacing between
1339 items as between paragraphs (although that spacing may well be less
1340 than the full height of a line of text). This leaves it to the
1341 reader to use (con)textual cues to figure out whether the "Qui
1342 dolorem ipsum..." paragraph applies to the "Quisquam Est" item or
1343 to all three items "Neque", "Porro", and "Quisquam Est". While not
1344 an ideal situation, this is preferable to providing formatting cues
1345 that may be actually contrary to the author's intent.
1346
1348 Data paragraphs are typically used for inlining non-Pod data that is to
1349 be used (typically passed through) when rendering the document to a
1350 specific format:
1351
1352 =begin rtf
1353
1354 \par{\pard\qr\sa4500{\i Printed\~\chdate\~\chtime}\par}
1355
1356 =end rtf
1357
1358 The exact same effect could, incidentally, be achieved with a single
1359 "=for" paragraph:
1360
1361 =for rtf \par{\pard\qr\sa4500{\i Printed\~\chdate\~\chtime}\par}
1362
1363 (Although that is not formally a data paragraph, it has the same
1364 meaning as one, and Pod parsers may parse it as one.)
1365
1366 Another example of a data paragraph:
1367
1368 =begin html
1369
1370 I like <em>PIE</em>!
1371
1372 <hr>Especially pecan pie!
1373
1374 =end html
1375
1376 If these were ordinary paragraphs, the Pod parser would try to expand
1377 the "E</em>" (in the first paragraph) as a formatting code, just like
1378 "E<lt>" or "E<eacute>". But since this is in a "=begin
1379 identifier"..."=end identifier" region and the identifier "html"
1380 doesn't have a ":" prefix, the contents of this region are stored
1381 as data paragraphs, instead of being processed as ordinary paragraphs
1382 (or if they began with spaces and/or tabs, as verbatim paragraphs).
1383
1384 As a further example: At time of writing, no "biblio" identifier is
1385 supported, but suppose some processor were written to recognize it as a
1386 way of (say) denoting a bibliographic reference (necessarily containing
1387 formatting codes in ordinary paragraphs). The fact that "biblio"
1388 paragraphs were meant for ordinary processing would be indicated by
1389 prefacing each "biblio" identifier with a colon:
1390
1391 =begin :biblio
1392
1393 Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
1394 Programs.> Prentice-Hall, Englewood Cliffs, NJ.
1395
1396 =end :biblio
1397
1398 This would signal to the parser that paragraphs in this begin...end
1399 region are subject to normal handling as ordinary/verbatim paragraphs
1400 (while still tagged as meant only for processors that understand the
1401 "biblio" identifier). The same effect could be had with:
1402
1403 =for :biblio
1404 Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
1405 Programs.> Prentice-Hall, Englewood Cliffs, NJ.
1406
1407 The ":" on these identifiers means simply "process this stuff normally,
1408 even though the result will be for some special target". I suggest
1409 that parser APIs report "biblio" as the target identifier, but also
1410 report that it had a ":" prefix. (And similarly, with the above
1411 "html", report "html" as the target identifier, and note the lack of a
1412 ":" prefix.)
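
A parser API following that suggestion might report the pieces of a "=for" paragraph as in this Python sketch. The function name and the shape of the result are hypothetical.

```python
def parse_for(paragraph):
    """Split a "=for" paragraph into (target, has_colon, content)."""
    # Pad so that a paragraph with no identifier or content still unpacks.
    _, identifier, content = (paragraph.split(None, 2) + ["", ""])[:3]
    has_colon = identifier.startswith(":")
    target = identifier[1:] if has_colon else identifier
    return (target, has_colon, content)
```

A True has_colon means the content is to be processed as ordinary Pod text; False means it is to be kept as data for the named target.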
1413
1414 Note that a "=begin identifier"..."=end identifier" region where
1415 identifier begins with a colon, can contain commands. For example:
1416
1417 =begin :biblio
1418
1419 Wirth's classic is available in several editions, including:
1420
1421 =for comment
1422 hm, check abebooks.com for how much used copies cost.
1423
1424 =over
1425
1426 =item
1427
1428 Wirth, Niklaus. 1975. I<Algorithmen und Datenstrukturen.>
1429 Teubner, Stuttgart. [Yes, it's in German.]
1430
1431 =item
1432
1433 Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
1434 Programs.> Prentice-Hall, Englewood Cliffs, NJ.
1435
1436 =back
1437
1438 =end :biblio
1439
1440 Note, however, that a "=begin identifier"..."=end identifier" region where
1441 identifier does not begin with a colon, should not directly contain
1442 "=head1" ... "=head4" commands, nor "=over", nor "=back", nor "=item".
1443 For example, this may be considered invalid:
1444
1445 =begin somedata
1446
1447 This is a data paragraph.
1448
1449 =head1 Don't do this!
1450
1451 This is a data paragraph too.
1452
1453 =end somedata
1454
1455 A Pod processor may signal that the above (specifically the "=head1"
1456 paragraph) is an error. Note, however, that the following should not
1457 be treated as an error:
1458
1459 =begin somedata
1460
1461 This is a data paragraph.
1462
1463 =cut
1464
1465 # Yup, this isn't Pod anymore.
1466 sub excl { (rand() > .5) ? "hoo!" : "hah!" }
1467
1468 =pod
1469
1470 This is a data paragraph too.
1471
1472 =end somedata
1473
And this too is valid:

    =begin someformat

    This is a data paragraph.

    And this is a data paragraph.

    =begin someotherformat

    This is a data paragraph too.

    And this is a data paragraph too.

    =begin :yetanotherformat

    =head2 This is a command paragraph!

    This is an ordinary paragraph!

      And this is a verbatim paragraph!

    =end :yetanotherformat

    =end someotherformat

    Another data paragraph!

    =end someformat

The contents of the above "=begin :yetanotherformat" ... "=end
:yetanotherformat" region aren't data paragraphs, because the
immediately containing region's identifier (":yetanotherformat") begins
with a colon. In practice, most regions that contain data paragraphs
will contain only data paragraphs; however, the above nesting is
syntactically valid as Pod, even if it is rare. Note, though, that the
handlers for some formats, like "html", will accept only data
paragraphs, not nested regions; they may complain if they see
(targeted for them) nested regions, or commands other than "=end",
"=pod", and "=cut".
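
The rule at work here (the innermost open region alone decides whether
a paragraph is data) can be sketched with a stack of open identifiers.
This is a simplified illustration in Python, not a real Pod parser: the
function name classify is hypothetical, the input is assumed to be
already split into paragraphs, "=cut"/"=pod" handling is omitted, and
"=end" is not checked against the open "=begin".

```python
import re

# A command paragraph starts with "=" followed by a letter.
COMMAND = re.compile(r"=([a-zA-Z]\S*)\s*(.*)", re.S)


def classify(paragraphs):
    """Label each paragraph as command, data, verbatim, or ordinary."""
    stack = []   # identifiers of the currently open =begin regions
    out = []
    for para in paragraphs:
        m = COMMAND.match(para)
        if m:
            name, arg = m.group(1), m.group(2).strip()
            if name == "begin":
                stack.append(arg.split()[0] if arg else "")
            elif name == "end" and stack:
                stack.pop()
            out.append(("command", para))
        elif stack and not stack[-1].startswith(":"):
            # Innermost region has no colon: everything in it is data.
            out.append(("data", para))
        elif para[:1] in (" ", "\t"):
            out.append(("verbatim", para))
        else:
            out.append(("ordinary", para))
    return out
```

Fed the paragraphs of the "someformat" example above, this sketch
labels the text inside ":yetanotherformat" as ordinary and verbatim
paragraphs, and the text directly inside "someformat" and
"someotherformat" as data paragraphs.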

Also consider this valid structure:

    =begin :biblio

    Wirth's classic is available in several editions, including:

    =over

    =item

    Wirth, Niklaus. 1975. I<Algorithmen und Datenstrukturen.>
    Teubner, Stuttgart. [Yes, it's in German.]

    =item

    Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
    Programs.> Prentice-Hall, Englewood Cliffs, NJ.

    =back

    Buy buy buy!

    =begin html

    <img src='wirth_spokesmodeling_book.png'>

    <hr>

    =end html

    Now now now!

    =end :biblio

There, the "=begin html"..."=end html" region is nested inside the
larger "=begin :biblio"..."=end :biblio" region. Note that the content
of the "=begin html"..."=end html" region is data paragraph(s), because
the immediately containing region's identifier ("html") doesn't begin
with a colon.

Pod parsers, when processing a series of data paragraphs one after
another (within a single region), should consider them to be one large
data paragraph that happens to contain blank lines. So the content of
the above "=begin html"..."=end html" may be stored as two data
paragraphs (one consisting of "<img
src='wirth_spokesmodeling_book.png'>\n" and another consisting of
"<hr>\n"), but should be stored as a single data paragraph (consisting
of "<img src='wirth_spokesmodeling_book.png'>\n\n<hr>\n").
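
The merging described above could look roughly like the following
Python sketch. It assumes paragraphs arrive as (kind, text) pairs in
document order, with each text ending in a single newline; that
representation and the name merge_data_paragraphs are illustrative
assumptions, not part of any Pod API.

```python
def merge_data_paragraphs(paragraphs):
    """Join runs of adjacent data paragraphs, keeping a blank line between."""
    merged = []
    for kind, text in paragraphs:
        if kind == "data" and merged and merged[-1][0] == "data":
            # Re-insert the blank line that separated the two paragraphs.
            merged[-1] = ("data", merged[-1][1] + "\n" + text)
        else:
            merged.append((kind, text))
    return merged
```

Any non-data paragraph (a command, for instance) ends the current run,
so data paragraphs on either side of it stay separate.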

Pod processors should tolerate empty "=begin something"..."=end
something" regions, empty "=begin :something"..."=end :something"
regions, and contentless "=for something" and "=for :something"
paragraphs. I.e., these should be tolerated:

    =for html

    =begin html

    =end html

    =begin :biblio

    =end :biblio

Incidentally, note that there's no easy way to express a data paragraph
starting with something that looks like a command. Consider:

    =begin stuff

    =shazbot

    =end stuff

There, "=shazbot" will be parsed as a Pod command "shazbot", not as a
data paragraph "=shazbot\n". However, you can express a data paragraph
consisting of "=shazbot\n" using this code:

    =for stuff =shazbot

The situation where this is necessary is presumably quite rare.
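
To illustrate why the "=for" form works: everything after the target
on a "=for" paragraph is content, so a leading "=" there is never seen
at the start of a paragraph. The Python sketch below shows one way to
split such a paragraph; parse_for and its regular expression are
assumptions made for this example, not an existing interface.

```python
import re

# "=for", a target, then the content (on the same line or the next).
FOR_PARAGRAPH = re.compile(r"=for[ \t]+(\S+)[ \t]*\n?(.*)", re.S)


def parse_for(paragraph):
    """Return (target, content) for a "=for target ..." paragraph."""
    m = FOR_PARAGRAPH.match(paragraph)
    if m is None:
        raise ValueError("not a =for paragraph with a target")
    return m.group(1), m.group(2)
```

With this sketch, parse_for("=for stuff =shazbot\n") gives the target
"stuff" and the content "=shazbot\n", and a contentless
parse_for("=for html\n") gives the target "html" and empty content.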

Note that =end commands must match the currently open =begin command.
That is, they must properly nest. For example, this is valid:

    =begin outer

    X

    =begin inner

    Y

    =end inner

    Z

    =end outer

while this is invalid:

    =begin outer

    X

    =begin inner

    Y

    =end outer

    Z

    =end inner

This latter is improper because when the "=end outer" command is seen,
the currently open region has the formatname "inner", not "outer". (It
just happens that "outer" is the formatname of a higher-up region.)
This is an error. Processors must by default report this as an error,
and may halt processing the document containing that error. A
corollary of this is that regions cannot "overlap". That is, the latter
block above does not represent a region called "outer" which contains X
and Y, overlapping a region called "inner" which contains Y and Z.
Because it is invalid (as all apparently overlapping regions would be),
it doesn't represent that, or anything at all.

Similarly, this is invalid:

    =begin thing

    =end hting

This is an error because the region is opened by "thing", and the
"=end" tries to close "hting" [sic].

This is also invalid:

    =begin thing

    =end

This is invalid because every "=end" command must have a formatname
parameter.
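
The three error cases above can be checked with a simple stack, as in
this Python sketch. It is an illustration under stated assumptions:
commands are given as (name, argument) pairs, identifiers are compared
exactly as written (including any leading colon), and only problems
with "=end" are reported. The name check_regions is hypothetical.

```python
def check_regions(commands):
    """Return a list of error messages for mismatched =end commands."""
    errors = []
    stack = []  # formatnames of the currently open =begin regions
    for name, arg in commands:
        if name == "begin":
            stack.append(arg)
        elif name == "end":
            if not arg:
                errors.append("=end without a formatname")
            elif not stack:
                errors.append(f"=end {arg} without an open =begin")
            elif stack[-1] != arg:
                errors.append(
                    f"=end {arg} does not match open =begin {stack[-1]}"
                )
            else:
                stack.pop()
    return errors
```

For the overlapping "outer"/"inner" example, the sketch reports the
mismatch at "=end outer"; a processor following this specification may
equally choose to halt at that point.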

SEE ALSO
perlpod, "PODs: Embedded Documentation" in perlsyn, podchecker

AUTHOR
Sean M. Burke

perl v5.36.0 2022-08-30 PERLPODSPEC(1)