perlpodspec(1)

PERLPODSPEC(1)         Perl Programmers Reference Guide         PERLPODSPEC(1)

NAME

       perlpodspec - Plain Old Documentation: format specification and notes

DESCRIPTION

       This document contains detailed notes on the Pod markup language.
       Most people will only have to read perlpod to know how to write in
       Pod, but this document may answer some incidental questions about
       parsing and rendering Pod.

       In this document, "must" / "must not", "should" / "should not", and
       "may" have their conventional (cf. RFC 2119) meanings: "X must do Y"
       means that if X doesn't do Y, it's against this specification, and
       should really be fixed.  "X should do Y" means that it's recommended,
       but X may fail to do Y, if there's a good reason.  "X may do Y" is
       merely a note that X can do Y at will (although it is up to the reader
       to detect any connotation of "and I think it would be nice if X did Y"
       versus "it wouldn't really bother me if X did Y").

       Notably, when I say "the parser should do Y", the parser may fail to do
       Y, if the calling application explicitly requests that the parser not
       do Y.  I often phrase this as "the parser should, by default, do Y."
       This doesn't require the parser to provide an option for turning off
       whatever feature Y is (like expanding tabs in verbatim paragraphs),
       although it implies that such an option may be provided.

Pod Definitions

       Pod is embedded in files, typically Perl source files, although you can
       write a file that's nothing but Pod.

       A line in a file consists of zero or more non-newline characters,
       terminated by either a newline or the end of the file.

       A newline sequence is usually a platform-dependent concept, but Pod
       parsers should understand it to mean any of CR (ASCII 13), LF (ASCII
       10), or a CRLF (ASCII 13 followed immediately by ASCII 10), in addition
       to any other system-specific meaning.  The first CR/CRLF/LF sequence in
       the file may be used as the basis for identifying the newline sequence
       for parsing the rest of the file.

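       As an illustration of the newline rule above, here is a minimal Perl
       sketch.  It finds the first CR, CRLF, or LF sequence in a file's
       contents and splits the whole file on that sequence.  The subroutine
       name split_pod_lines is hypothetical and not part of any Pod module.
       The sketch assumes the file has already been read into a string.

```perl
use strict;
use warnings;

# Hypothetical helper: split raw file contents into lines, using the first
# CR, CRLF, or LF sequence found as the newline sequence for the whole file.
sub split_pod_lines {
    my ($text) = @_;
    # Try "\r\n" before "\r" so a CRLF file is not mistaken for a CR file.
    my ($nl) = $text =~ /(\r\n|\r|\n)/;
    return ($text) unless defined $nl;    # no newline: one unterminated line
    my @lines = split /\Q$nl\E/, $text, -1;
    # A final newline terminates the last line; it does not start a new one.
    pop @lines if $lines[-1] eq '';
    return @lines;
}

my @lines = split_pod_lines("=head1 NAME\r\n\r\nFoo\r\n");
print scalar(@lines), " lines\n";    # prints "3 lines"
```

       A parser that instead accepts all three sequences anywhere in the file
       could simply split on /\r\n|\r|\n/.  The specification permits either
       approach.
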
       A blank line is a line consisting entirely of zero or more spaces
       (ASCII 32) or tabs (ASCII 9), and terminated by a newline or
       end-of-file.  A non-blank line is a line containing one or more
       characters other than space or tab (and terminated by a newline or
       end-of-file).

       (Note: Many older Pod parsers did not accept a line consisting of
       spaces/tabs and then a newline as a blank line.  The only lines they
       considered blank were lines consisting of no characters at all,
       terminated by a newline.)

       Whitespace is used in this document as a blanket term for spaces, tabs,
       and newline sequences.  (By itself, this term usually refers to literal
       whitespace, that is, sequences of whitespace characters in Pod source,
       as opposed to "E<32>", which is a formatting code that denotes a
       whitespace character.)

       A Pod parser is a module meant for parsing Pod (regardless of whether
       this involves calling callbacks or building a parse tree or directly
       formatting it).  A Pod formatter (or Pod translator) is a module or
       program that converts Pod to some other format (HTML, plaintext, TeX,
       PostScript, RTF).  A Pod processor might be a formatter or translator,
       or might be a program that does something else with the Pod (like
       counting words, scanning for index points, etc.).

       Pod content is contained in Pod blocks.  A Pod block starts with a line
       that matches "m/\A=[a-zA-Z]/", and continues up to the next line that
       matches "m/\A=cut/" or up to the end of the file if there is no
       "m/\A=cut/" line.

       Note that a parser is not expected to recognize when something that
       looks like Pod is actually inside a quoted string, such as a here
       document.

       Within a Pod block, there are Pod paragraphs.  A Pod paragraph consists
       of non-blank lines of text, separated by one or more blank lines.

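       The block and paragraph rules above can be sketched in a few lines of
       Perl.  pod_paragraphs is a hypothetical helper, not part of any Pod
       module.  It assumes the input has already been split into lines
       without their newlines.  If "=cut" appears where no Pod block is open,
       it warns and stops, following the "=cut" rule given under "Pod
       Commands".

```perl
use strict;
use warnings;

# Hypothetical helper: given a file's lines (newlines already removed),
# return the Pod paragraphs found in its Pod blocks.
sub pod_paragraphs {
    my @lines = @_;
    my (@paras, @current);
    my $in_pod = 0;
    my $flush = sub {                        # finish the paragraph in progress
        push @paras, join("\n", @current) if @current;
        @current = ();
    };
    for my $line (@lines) {
        if (!$in_pod) {
            next unless $line =~ /\A=[a-zA-Z]/;    # still program text
            if ($line =~ /\A=cut/) {               # "=cut" cannot open a block
                warn "=cut outside a Pod block; parsing halted\n";
                last;
            }
            $in_pod = 1;                           # a Pod block starts here
        }
        if ($line =~ /\A=cut/) {                   # "=cut" ends the block
            $flush->();
            push @paras, $line;
            $in_pod = 0;
        }
        elsif ($line =~ /\A[ \t]*\z/) {            # blank: only spaces/tabs
            $flush->();
        }
        else {
            push @current, $line;
        }
    }
    $flush->();                                    # end of file ends a block
    return @paras;
}
```

       Lines made only of spaces and tabs end a paragraph here.  This follows
       the definition of a blank line given above, rather than the stricter
       rule used by many older parsers.
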
       For purposes of Pod processing, there are four types of paragraphs in a
       Pod block:

       •   A command paragraph (also called a "directive").  The first line of
           this paragraph must match "m/\A=[a-zA-Z]/".  Command paragraphs are
           typically one line, as in:

             =head1 NOTES

             =item *

           But they may span several (non-blank) lines:

             =for comment
             Hm, I wonder what it would look like if
             you tried to write a BNF for Pod from this.

             =head3 Dr. Strangelove, or: How I Learned to
             Stop Worrying and Love the Bomb

           Some command paragraphs allow formatting codes in their content
           (i.e., after the part that matches "m/\A=[a-zA-Z]\S*\s*/"), as in:

             =head1 Did You Remember to C<use strict;>?

           In other words, the Pod processing handler for "head1" will apply
           the same processing to "Did You Remember to C<use strict;>?" that
           it would to an ordinary paragraph (i.e., formatting codes like
           "C<...>" are parsed and presumably formatted appropriately, and
           whitespace in the form of literal spaces and/or tabs is not
           significant).

       •   A verbatim paragraph.  The first line of this paragraph must begin
           with a literal space or tab, and this paragraph must not be inside
           a "=begin identifier", ... "=end identifier" sequence unless
           "identifier" begins with a colon (":").  That is, if a paragraph
           starts with a literal space or tab, but is inside a "=begin
           identifier", ... "=end identifier" region, then it's a data
           paragraph, unless "identifier" begins with a colon.

           Whitespace is significant in verbatim paragraphs (although, in
           processing, tabs are probably expanded).

       •   An ordinary paragraph.  A paragraph is an ordinary paragraph if its
           first line matches neither "m/\A=[a-zA-Z]/" nor "m/\A[ \t]/", and
           if it's not inside a "=begin identifier", ... "=end identifier"
           sequence unless "identifier" begins with a colon (":").

       •   A data paragraph.  This is a paragraph that is inside a "=begin
           identifier" ... "=end identifier" sequence where "identifier" does
           not begin with a literal colon (":").  In some sense, a data
           paragraph is not part of Pod at all (i.e., effectively it's
           "out-of-band"), since it's not subject to most kinds of Pod
           parsing; but it is specified here, since Pod parsers need to be
           able to call an event for it, or store it in some form in a parse
           tree, or at least just parse around it.

       For example, consider the following paragraphs:

         # <- that's the 0th column

         =head1 Foo

         Stuff

           $foo->bar

         =cut

       Here, "=head1 Foo" and "=cut" are command paragraphs because the first
       line of each matches "m/\A=[a-zA-Z]/".  "[space][space]$foo->bar" is a
       verbatim paragraph, because its first line starts with a literal
       whitespace character (and there's no "=begin"..."=end" region around).

       The "=begin identifier" ... "=end identifier" commands stop paragraphs
       that they surround from being parsed as ordinary or verbatim
       paragraphs, if identifier doesn't begin with a colon.  This is
       discussed in detail in the section "About Data Paragraphs and
       "=begin/=end" Regions".

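       The four-way classification can be sketched as follows.

       - classify_paragraphs is a hypothetical name.
       - The sketch assumes paragraphs have already been split out of a Pod
         block.
       - It takes a region's identifier to be the first word after
         "=begin"/"=end".
       - It does not expand "=for" into a "=begin"/"=end" pair.

```perl
use strict;
use warnings;

# Hypothetical classifier: label each Pod paragraph as 'command', 'verbatim',
# 'ordinary', or 'data', tracking =begin/=end regions on a stack.
sub classify_paragraphs {
    my @paras = @_;
    my (@out, @regions);        # @regions: identifiers of open =begin regions
    for my $p (@paras) {
        if ($p =~ /\A=([a-zA-Z]\S*)/) {
            my $cmd = $1;
            if ($cmd eq 'begin' && $p =~ /\A=begin\s+(\S+)/) {
                push @regions, $1;
            }
            elsif ($cmd eq 'end' && $p =~ /\A=end\s+(\S+)/) {
                my $id = $1;
                if (@regions && $regions[-1] eq $id) { pop @regions }
                else { warn "=end $id does not match the open =begin region\n" }
            }
            push @out, [ command => $p ];      # commands stay commands everywhere
        }
        elsif (@regions && $regions[-1] !~ /\A:/) {
            push @out, [ data => $p ];         # region identifier lacks a colon
        }
        elsif ($p =~ /\A[ \t]/) {
            push @out, [ verbatim => $p ];
        }
        else {
            push @out, [ ordinary => $p ];
        }
    }
    return @out;
}
```

       Commands inside a data region are still classified as commands.  Only
       non-command paragraphs there become data paragraphs.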

Pod Commands

       This section is intended to supplement and clarify the discussion in
       "Command Paragraph" in perlpod.  These are the currently recognized Pod
       commands:

       "=head1", "=head2", "=head3", "=head4", "=head5", "=head6"
           This command indicates that the text in the remainder of the
           paragraph is a heading.  That text may contain formatting codes.
           Examples:

             =head1 Object Attributes

             =head3 What B<Not> to Do!

           Both "=head5" and "=head6" were added in 2020 and might not be
           supported by all Pod parsers.  Pod::Simple 3.41, released in
           October 2020, supports both of them, providing support for them in
           all Pod::Simple-based Pod parsers.

       "=pod"
           This command indicates that this paragraph begins a Pod block.  (If
           we are already in the middle of a Pod block, this command has no
           effect at all.)  If there is any text in this command paragraph
           after "=pod", it must be ignored.  Examples:

             =pod

             This is a plain Pod paragraph.

             =pod This text is ignored.

       "=cut"
           This command indicates that this line is the end of this previously
           started Pod block.  If there is any text after "=cut" on the line,
           it must be ignored.  Examples:

             =cut

             =cut The documentation ends here.

             =cut
             # This is the first line of program text.
             sub foo { # This is the second.

           It is an error to try to start a Pod block with a "=cut" command.
           In that case, the Pod processor must halt parsing of the input
           file, and must by default emit a warning.

       "=over"
           This command indicates that this is the start of a list/indent
           region.  If there is any text following the "=over", it must
           consist of only a positive numeral.  The semantics of this numeral
           are explained in the "About =over...=back Regions" section, further
           below.  Formatting codes are not expanded.  Examples:

             =over 3

             =over 3.5

             =over

       "=item"
           This command indicates that an item in a list begins here.
           Formatting codes are processed.  The semantics of the (optional)
           text in the remainder of this paragraph are explained in the "About
           =over...=back Regions" section, further below.  Examples:

             =item

             =item *

             =item      *

             =item 14

             =item   3.

             =item C<< $thing->stuff(I<dodad>) >>

             =item For transporting us beyond seas to be tried for pretended
             offenses

             =item He is at this time transporting large armies of foreign
             mercenaries to complete the works of death, desolation and
             tyranny, already begun with circumstances of cruelty and perfidy
             scarcely paralleled in the most barbarous ages, and totally
             unworthy the head of a civilized nation.

       "=back"
           This command indicates that this is the end of the region begun by
           the most recent "=over" command.  It permits no text after the
           "=back" command.

       "=begin formatname"
       "=begin formatname parameter"
           This marks the following paragraphs (until the matching "=end
           formatname") as being for some special kind of processing.  Unless
           "formatname" begins with a colon, the contained non-command
           paragraphs are data paragraphs.  But if "formatname" does begin
           with a colon, then non-command paragraphs are ordinary paragraphs
           or verbatim paragraphs.  This is discussed in detail in the section
           "About Data Paragraphs and "=begin/=end" Regions".

           It is advised that formatnames match the regexp
           "m/\A:?[-a-zA-Z0-9_]+\z/".  Everything following whitespace after
           the formatname is a parameter that may be used by the formatter
           when dealing with this region.  This parameter must not be repeated
           in the "=end" paragraph.  Implementors should anticipate future
           expansion in the semantics and syntax of the first parameter to
           "=begin"/"=end"/"=for".

       "=end formatname"
           This marks the end of the region opened by the matching "=begin
           formatname" command.  If "formatname" is not the formatname of the
           most recent open "=begin formatname" region, then this is an error,
           and must generate an error message.  This is discussed in detail in
           the section "About Data Paragraphs and "=begin/=end" Regions".

       "=for formatname text..."
           This is synonymous with:

                =begin formatname

                text...

                =end formatname

           That is, it creates a region consisting of a single paragraph; that
           paragraph is to be treated as an ordinary paragraph if "formatname"
           begins with a ":"; if "formatname" doesn't begin with a colon, then
           "text..." will constitute a data paragraph.  There is no way to use
           "=for formatname text..." to express "text..." as a verbatim
           paragraph.

       "=encoding encodingname"
           This command, which should occur early in the document (at least
           before any non-US-ASCII data!), declares that this document is
           encoded in the encoding encodingname, which must be an encoding
           name that Encode recognizes.  (Encode's list of supported
           encodings, in Encode::Supported, is useful here.)  If the Pod
           parser cannot decode the declared encoding, it should emit a
           warning and may abort parsing the document altogether.

           A document having more than one "=encoding" line should be
           considered an error.  Pod processors may silently tolerate this if
           the subsequent "=encoding" lines are just duplicates of the first
           one (e.g., if there's a "=encoding utf8" line, and later on another
           "=encoding utf8" line).  But Pod processors should complain if
           there are contradictory "=encoding" lines in the same document
           (e.g., if there is a "=encoding utf8" early in the document and
           "=encoding big5" later).  Pod processors that recognize BOMs may
           also complain if they see an "=encoding" line that contradicts the
           BOM (e.g., if a document with a UTF-16LE BOM has an "=encoding
           shiftjis" line).

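       The "=encoding" rules above might be checked as in the following
       sketch.  It uses Encode's find_encoding function to test whether a
       name is recognized at all.  check_encodings is a hypothetical helper.
       This sketch compares names case-insensitively rather than resolving
       aliases, which is an assumption of the sketch, not of this
       specification.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical check: given a document's command paragraphs, return a list of
# complaints about its "=encoding" lines.  Names are compared
# case-insensitively (an assumption; aliases such as "utf8" and "UTF-8" are
# not unified here).
sub check_encodings {
    my @commands = @_;
    my ($first, @problems, %seen);
    for my $cmd (@commands) {
        next unless $cmd =~ /\A=encoding\s+(\S+)/;
        my $name = $1;
        # Ask Encode whether it recognizes the name at all (once per name).
        push @problems, "unknown encoding '$name'"
            if !$seen{lc $name}++ && !Encode::find_encoding($name);
        if (!defined $first) {
            $first = $name;                  # the first declaration wins
        }
        elsif (lc $name ne lc $first) {      # exact duplicates are tolerated
            push @problems, "=encoding $name contradicts earlier =encoding $first";
        }
    }
    return @problems;
}
```

       Returning the complaints, rather than printing them, leaves the caller
       free to report them through whatever channel it prefers.
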
       If a Pod processor sees any command other than the ones listed above
       (like "=head", or "=haed1", or "=stuff", or "=cuttlefish", or "=w123"),
       that processor must by default treat this as an error.  It must not
       process the paragraph beginning with that command, must by default warn
       of this as an error, and may abort the parse.  A Pod parser may allow a
       way for particular applications to add to the above list of known
       commands, and to stipulate, for each additional command, whether
       formatting codes should be processed.

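       One way to implement this default handling of unknown commands,
       together with an extension hook, is sketched below.  The names %KNOWN,
       add_known_command, and recognize_command are hypothetical.  The flag
       stored for each command records whether formatting codes in its own
       text are processed.  The content of "=begin"/"=for" is governed by the
       region rules instead.

```perl
use strict;
use warnings;

# Commands recognized by this specification.  The value says whether the
# command's own text goes through formatting-code processing.
my %KNOWN = (
    (map { ("head$_" => 1) } 1 .. 6),
    item => 1,
    map { ($_ => 0) } qw(pod cut over back begin end for encoding),
);

# Hypothetical extension hook: let an application register another command.
sub add_known_command {
    my ($name, $expands_codes) = @_;
    $KNOWN{$name} = $expands_codes ? 1 : 0;
}

# Return the command name if it is known; otherwise warn and return nothing,
# so the caller can skip the paragraph.
sub recognize_command {
    my ($para) = @_;
    my ($name) = $para =~ /\A=([a-zA-Z]\S*)/ or return;
    return $name if exists $KNOWN{$name};
    warn "Unknown Pod command '=$name'; paragraph skipped\n";
    return;
}
```
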
       Future versions of this specification may add additional commands.

Pod Formatting Codes

       (Note that in previous drafts of this document and of perlpod,
       formatting codes were referred to as "interior sequences", and this
       term may still be found in the documentation for Pod parsers, and in
       error messages from Pod processors.)

       There are two syntaxes for formatting codes:

       •   A formatting code starts with a capital letter (just US-ASCII
           [A-Z]) followed by a "<", any number of characters, and ending with
           the first matching ">".  Examples:

               That's what I<you> think!

               What's C<CORE::dump()> for?

               X<C<chmod> and C<unlink()> Under Different Operating Systems>

       •   A formatting code starts with a capital letter (just US-ASCII
           [A-Z]) followed by two or more "<"'s, one or more whitespace
           characters, any number of characters, one or more whitespace
           characters, and ending with the first matching sequence of two or
           more ">"'s, where the number of ">"'s equals the number of "<"'s in
           the opening of this formatting code.  Examples:

               That's what I<< you >> think!

               C<<< open(X, ">>thing.dat") || die $! >>>

               B<< $foo->bar(); >>

           With this syntax, the whitespace character(s) after the "C<<<" and
           before the ">>>" (or whatever letter) are not renderable.  They do
           not signify whitespace; they are merely part of the formatting code
           itself.  That is, these are all synonymous:

               C<thing>
               C<< thing >>
               C<<           thing     >>
               C<<<   thing >>>
               C<<<<
               thing
                          >>>>

           and so on.

           Finally, the multiple-angle-bracket form does not alter the
           interpretation of nested formatting codes, meaning that the
           following four example lines are identical in meaning:

             B<example: C<$a E<lt>=E<gt> $b>>

             B<example: C<< $a <=> $b >>>

             B<example: C<< $a E<lt>=E<gt> $b >>>

             B<<< example: C<< $a E<lt>=E<gt> $b >> >>>

       In parsing Pod, a notably tricky part is the correct parsing of
       (potentially nested!) formatting codes.  Implementors should consult
       the code in the "parse_text" routine in Pod::Parser as an example of a
       correct implementation.

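       To show what handling both angle-bracket syntaxes and nesting
       involves, here is a small stack-based parser for the formatting codes
       in one paragraph.  It is a sketch, not the parse_text routine from
       Pod::Parser, and it has these limits:

       - parse_codes and tree_to_string are hypothetical names.
       - E<...> escapes are not resolved.
       - L<...> gets no special treatment.
       - Unknown code letters are not rejected.

```perl
use strict;
use warnings;

# Sketch of a formatting-code parser for one paragraph.  Returns a list of
# plain strings and [$letter, @children] array refs.
sub parse_codes {
    my ($text) = @_;
    my @stack = ({ code => undef, close => undef, kids => [] });    # root
    my $add_text = sub {                      # append text, merging runs
        my $kids = $stack[-1]{kids};
        if (@$kids && !ref $kids->[-1]) { $kids->[-1] .= $_[0] }
        else                            { push @$kids, $_[0] }
    };
    pos($text) = 0;
    while (pos($text) < length $text) {
        my $close = $stack[-1]{close};
        if (defined $close && $text =~ /\G$close/gc) {
            my $done = pop @stack;                     # innermost code ends
            push @{ $stack[-1]{kids} }, [ $done->{code}, @{ $done->{kids} } ];
        }
        elsif ($text =~ /\G([A-Z])(<{2,})[ \t\r\n]+/gc) {  # "C<< " form
            my ($letter, $gts) = ($1, '>' x length $2);
            push @stack,
                { code => $letter, close => qr/[ \t\r\n]+\Q$gts\E/, kids => [] };
        }
        elsif ($text =~ /\G([A-Z])</gc) {                   # "C<" form
            push @stack, { code => $1, close => qr/>/, kids => [] };
        }
        else {
            $text =~ /\G(.)/gcs;                       # any other character
            $add_text->($1);
        }
    }
    # Codes cannot span paragraphs: close any still open, with a warning.
    while (@stack > 1) {
        my $done = pop @stack;
        warn "Unterminated $done->{code}<...> code\n";
        push @{ $stack[-1]{kids} }, [ $done->{code}, @{ $done->{kids} } ];
    }
    return @{ $stack[0]{kids} };
}

# Render a tree back into single-angle-bracket form, for inspection only.
sub tree_to_string {
    my @out;
    for my $node (@_) {
        if (ref $node) {
            my ($letter, @kids) = @$node;
            push @out, "$letter<" . tree_to_string(@kids) . '>';
        }
        else {
            push @out, $node;
        }
    }
    return join '', @out;
}

print tree_to_string(parse_codes('That is I<< you >> here')), "\n";
# prints "That is I<you> here"
```

       A multi-bracket closer must be preceded by whitespace, so the ">>" in
       "open(X, ">>thing.dat")" stays text.  Any code still open at the end of
       the paragraph is closed with a warning, as required later in this
       section.
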
       "I<text>" -- italic text
           See the brief discussion in "Formatting Codes" in perlpod.

       "B<text>" -- bold text
           See the brief discussion in "Formatting Codes" in perlpod.

       "C<code>" -- code text
           See the brief discussion in "Formatting Codes" in perlpod.

       "F<filename>" -- style for filenames
           See the brief discussion in "Formatting Codes" in perlpod.

       "X<topic name>" -- an index entry
           See the brief discussion in "Formatting Codes" in perlpod.

           This code is unusual in that most formatters completely discard
           this code and its content.  Other formatters will render it with
           invisible codes that can be used in building an index of the
           current document.

       "Z<>" -- a null (zero-effect) formatting code
           Discussed briefly in "Formatting Codes" in perlpod.

           This code is unusual in that it should have no content.  That is, a
           processor may complain if it sees "Z<potatoes>".  Whether or not it
           complains, the potatoes text should be ignored.

       "L<name>" -- a hyperlink
           The complicated syntaxes of this code are discussed at length in
           "Formatting Codes" in perlpod, and implementation details are
           discussed below, in "About L<...> Codes".  Parsing the contents of
           L<content> is tricky.  Notably, the content has to be checked for
           whether it looks like a URL, or whether it has to be split on
           literal "|" and/or "/" (in the right order!), and so on, before
           E<...> codes are resolved.

       "E<escape>" -- a character escape
           See "Formatting Codes" in perlpod, and several points in "Notes on
           Implementing Pod Processors".

       "S<text>" -- text contains non-breaking spaces
           This formatting code is syntactically simple, but semantically
           complex.  What it means is that each space in the printable content
           of this code signifies a non-breaking space.

           Consider:

               C<$x ? $y    :  $z>

               S<C<$x ? $y     :  $z>>

           Both signify the monospace (c[ode] style) text consisting of "$x",
           one space, "?", one space, "$y", one space, ":", one space, "$z".
           The difference is that in the latter, with the S code, those spaces
           are not "normal" spaces, but instead are non-breaking spaces.

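       The S<...> rule can be sketched as two small steps.  The sketch
       assumes the printable content of the code has already been extracted.
       compact_ws and render_s_content are hypothetical names.  Emitting
       U+00A0 directly is also an assumption; a real formatter would likely
       emit its output format's own non-breaking space (for example "&nbsp;"
       in HTML).

```perl
use strict;
use warnings;

# Collapse each run of literal whitespace (spaces, tabs, newlines) to a
# single space, as for text in ordinary paragraphs.
sub compact_ws {
    my ($s) = @_;
    $s =~ s/[ \t\r\n]+/ /g;
    return $s;
}

# Hypothetical rendering step for the printable content of an S<...> code:
# compact the whitespace, then make every remaining space a non-breaking
# space (U+00A0).
sub render_s_content {
    my ($content) = @_;
    (my $out = compact_ws($content)) =~ tr/ /\x{A0}/;
    return $out;
}
```

       Compacting before translating is what makes both examples above yield
       exactly one space (breaking or not) between each pair of tokens.
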
       If a Pod processor sees any formatting code other than the ones listed
       above (as in "N<...>", or "Q<...>", etc.), that processor must by
       default treat this as an error.  A Pod parser may allow a way for
       particular applications to add to the above list of known formatting
       codes; a Pod parser might even allow a way to stipulate, for each
       additional formatting code, whether it requires some form of special
       processing, as L<...> does.

       Future versions of this specification may add additional formatting
       codes.

       Historical note:  A few older Pod processors would not see a ">" as
       closing a "C<" code, if the ">" was immediately preceded by a "-".
       This was so that this:

           C<$foo->bar>

       would parse as equivalent to this:

           C<$foo-E<gt>bar>

       instead of as equivalent to a "C" formatting code containing only
       "$foo-", and then a "bar>" outside the "C" formatting code.  This
       problem has since been solved by the addition of syntaxes like this:

           C<< $foo->bar >>

       Compliant parsers must not treat "->" as special.

       Formatting codes absolutely cannot span paragraphs.  If a code is
       opened in one paragraph, and no closing code is found by the end of
       that paragraph, the Pod parser must close that formatting code, and
       should complain (as in "Unterminated I code in the paragraph starting
       at line 123: 'Time objects are not...'").  So these two paragraphs:

         I<I told you not to do this!

         Don't make me say it again!>

       ...must not be parsed as two paragraphs in italics (with the I code
       starting in one paragraph and ending in another).  Instead, the first
       paragraph should generate a warning, but that aside, the above code
       must parse as if it were:

         I<I told you not to do this!>

         Don't make me say it again!E<gt>

       (In SGMLish jargon, all Pod commands are like block-level elements,
       whereas all Pod formatting codes are like inline-level elements.)

Notes on Implementing Pod Processors

       The following is a long section of miscellaneous requirements and
       suggestions to do with Pod processing.

       •   Pod formatters should tolerate lines in verbatim blocks that are of
           any length, even if that means having to break them (possibly
           several times, for very long lines) to avoid text running off the
           side of the page.  Pod formatters may warn of such line-breaking.
           Such warnings are particularly appropriate for lines that are over
           100 characters long, which are usually not intentional.

       •   Pod parsers must recognize all of the three well-known newline
           formats: CR, LF, and CRLF.  See perlport.

       •   Pod parsers should accept input lines that are of any length.

       •   Since Perl recognizes a Unicode Byte Order Mark at the start of
           files as signaling that the file is Unicode encoded as in UTF-16
           (whether big-endian or little-endian) or UTF-8, Pod parsers should
           do the same.  Otherwise, the character encoding should be
           understood as being UTF-8 if the first highbit byte sequence in the
           file seems valid as a UTF-8 sequence, or otherwise as CP-1252
           (earlier versions of this specification used Latin-1 instead of
           CP-1252).

           Future versions of this specification may specify how Pod can
           accept other encodings.  Presumably treatment of other encodings in
           Pod parsing would be as in XML parsing: whatever the encoding
           declared by a particular Pod file, content is to be stored in
           memory as Unicode characters.

       •   The well-known Unicode Byte Order Marks are as follows:  if the
           file begins with the two literal byte values 0xFE 0xFF, this is the
           BOM for big-endian UTF-16.  If the file begins with the two literal
           byte values 0xFF 0xFE, this is the BOM for little-endian UTF-16.
           On an ASCII platform, if the file begins with the three literal
           byte values 0xEF 0xBB 0xBF, this is the BOM for UTF-8.  A mechanism
           portable to EBCDIC platforms is to:

             my $utf8_bom = "\x{FEFF}";
             utf8::encode($utf8_bom);    # now holds this platform's BOM bytes

       •   A naive but often sufficient heuristic on ASCII platforms, for
           testing the first highbit byte-sequence in a BOM-less file (whether
           in code or in Pod!), to see whether that sequence is valid as UTF-8
           (RFC 2279) is to check whether the first byte in the sequence is in
           the range 0xC2 - 0xFD and whether the next byte is in the range
           0x80 - 0xBF.  If so, the parser may conclude that this file is in
           UTF-8, and all highbit sequences in the file should be assumed to
           be UTF-8.  Otherwise the parser should treat the file as being in
           CP-1252.  (A better check, which also works on EBCDIC platforms, is
           to pass a copy of the sequence to utf8::decode(), which performs a
           full validity check on the sequence and returns TRUE if it is valid
           UTF-8, FALSE otherwise.  This function is always pre-loaded, is
           fast because it is written in C, and will only get called at most
           once, so you don't need to avoid it out of performance concerns.)
           In the unlikely circumstance that the first highbit sequence in a
           truly non-UTF-8 file happens to appear to be UTF-8, one can cater
           to our heuristic (as well as any more intelligent heuristic) by
           prefacing that line with a comment line containing a highbit
           sequence that is clearly not valid as UTF-8.  A line consisting of
           simply "#", an e-acute, and any non-highbit byte, is sufficient to
           establish this file's encoding.

       •   Pod processors must treat a "=for [label] [content...]" paragraph
           as meaning the same thing as a "=begin [label]" paragraph, content,
           and an "=end [label]" paragraph.  (The parser may conflate these
           two constructs, or may leave them distinct, in the expectation that
           the formatter will nevertheless treat them the same.)

       •   When rendering Pod to a format that allows comments (i.e., to
           nearly any format other than plaintext), a Pod formatter must
           insert comment text identifying its name and version number, and
           the name and version numbers of any modules it might be using to
           process the Pod.  Minimal examples:

            %% POD::Pod2PS v3.14159, using POD::Parser v1.92

            <!-- Pod::HTML v3.14159, using POD::Parser v1.92 -->

            {\doccomm generated by Pod::Tree::RTF 3.14159 using Pod::Tree 1.08}

            .\" Pod::Man version 3.14159, using POD::Parser version 1.92

           Formatters may also insert additional comments, including: the
           release date of the Pod formatter program, the contact address for
           the author(s) of the formatter, the current time, the name of the
           input file, the formatting options in effect, the version of Perl
           used, etc.

           Formatters may also choose to note errors/warnings as comments,
           besides or instead of emitting them otherwise (as in messages to
           STDERR, or "die"ing).

       •   Pod parsers may emit warnings or error messages ("Unknown E code
           E<zslig>!") to STDERR (whether through printing to STDERR, or
           "warn"ing/"carp"ing, or "die"ing/"croak"ing), but must allow
           suppressing all such STDERR output, and instead allow an option for
           reporting errors/warnings in some other way, whether by triggering
           a callback, or noting errors in some attribute of the document
           object, or some similarly unobtrusive mechanism -- or even by
           appending a "Pod Errors" section to the end of the parsed form of
           the document.

       •   In cases of exceptionally aberrant documents, Pod parsers may abort
           the parse.  Even then, using "die"ing/"croak"ing is to be avoided;
           where possible, the parser library may simply close the input file
           and add text like "*** Formatting Aborted ***" to the end of the
           (partial) in-memory document.

       •   In paragraphs where formatting codes (like E<...>, B<...>) are
           understood (i.e., not verbatim paragraphs, but including ordinary
           paragraphs, and command paragraphs that produce renderable text,
           like "=head1"), literal whitespace should generally be considered
           "insignificant", in that one literal space has the same meaning as
           any (nonzero) number of literal spaces, literal newlines, and
           literal tabs (as long as this produces no blank lines, since those
           would terminate the paragraph).  Pod parsers should compact literal
           whitespace in each processed paragraph, but may provide an option
           for overriding this (since some processing tasks do not require
           it), or may follow additional special rules (for example, specially
           treating period-space-space or period-newline sequences).

       •   Pod parsers should not, by default, try to coerce apostrophe (')
           and quote (") into smart quotes (little 9's, 66's, 99's, etc.), nor
           try to turn backtick (`) into anything else but a single backtick
           character (distinct from an open quote character!), nor "--" into
           anything but two minus signs.  They must never do any of those
           things to text in C<...> formatting codes, and never ever to text
           in verbatim paragraphs.

       •   When rendering Pod to a format that has two kinds of hyphens (-),
           one that's a non-breaking hyphen, and another that's a breakable
           hyphen (as in "object-oriented", which can be split across lines as
           "object-", newline, "oriented"), formatters are encouraged to
           generally translate "-" to non-breaking hyphen, but may apply
           heuristics to convert some of these to breaking hyphens.

633       •   Pod formatters should make reasonable efforts to keep words of Perl
634           code from being broken across lines.  For example, "Foo::Bar" in
635           some formatting systems is seen as eligible for being broken across
636           lines as "Foo::" newline "Bar" or even "Foo::-" newline "Bar".
637           This should be avoided where possible, either by disabling all
638           line-breaking in mid-word, or by wrapping particular words with
639           internal punctuation in "don't break this across lines" codes
640           (which in some formats may not be a single code, but might be a
641           matter of inserting non-breaking zero-width spaces between every
642           pair of characters in a word.)
643
644       •   Pod parsers should, by default, expand tabs in verbatim paragraphs
645           as they are processed, before passing them to the formatter or
646           other processor.  Parsers may also allow an option for overriding
647           this.
648
649       •   Pod parsers should, by default, remove newlines from the end of
650           ordinary and verbatim paragraphs before passing them to the
651           formatter.  For example, while the paragraph you're reading now
652           could be considered, in Pod source, to end with (and contain) the
653           newline(s) that end it, it should be processed as ending with (and
654           containing) the period character that ends this sentence.
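
           Both defaults can be sketched for a single verbatim paragraph
           using the core Text::Tabs module.  Its expand function assumes a
           tab stop every 8 columns unless $Text::Tabs::tabstop is changed.
           cook_verbatim is a hypothetical name.  The sketch also normalizes
           CR and CRLF to LF before splitting the paragraph into lines.

```perl
use strict;
use warnings;
use Text::Tabs qw(expand);    # core module; tab stops every 8 columns by default

# Prepare one verbatim paragraph for the formatter: normalize newline
# sequences, drop the newline(s) that end the paragraph, expand tabs.
sub cook_verbatim {
    my ($para) = @_;
    $para =~ s/\r\n?/\n/g;                  # CR and CRLF become LF
    $para =~ s/\n+\z//;                     # remove trailing newlines
    my @lines = split /\n/, $para, -1;      # -1 keeps interior empty lines
    return join "\n", map { expand($_) } @lines;
}
```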

       •   Pod parsers, when reporting errors, should make some effort to
           report an approximate line number ("Nested E<>'s in Paragraph #52,
           near line 633 of Thing/Foo.pm!"), instead of merely noting the
           paragraph number ("Nested E<>'s in Paragraph #52 of
           Thing/Foo.pm!").  Where this is problematic, the paragraph number
           should at least be accompanied by an excerpt from the paragraph
           ("Nested E<>'s in Paragraph #52 of Thing/Foo.pm, which begins
           'Read/write accessor for the C<interest rate> attribute...'").

       •   Pod parsers, when processing a series of verbatim paragraphs one
           after another, should consider them to be one large verbatim
           paragraph that happens to contain blank lines.  I.e., these two
           lines, which have a blank line between them:

                   use Foo;

                   print Foo->VERSION

           should be unified into one paragraph ("\tuse Foo;\n\n\tprint
           Foo->VERSION") before being passed to the formatter or other
           processor.  Parsers may also allow an option for overriding this.

           While this might be too cumbersome to implement in event-based Pod
           parsers, it is straightforward for parsers that return parse trees.
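
           For a parser that builds a list of paragraphs, the unification is
           a single pass over that list.  The sketch below assumes a
           hypothetical representation in which each paragraph is an array
           reference [ $type, $text ] with its trailing newlines already
           removed.  It rejoins neighbors with one blank line.  A parser
           that records how many blank lines it saw could restore that
           count instead.

```perl
use strict;
use warnings;

# Merge runs of adjacent verbatim paragraphs into one, rejoining them with
# a single blank line. Input and output are lists of [ $type, $text ].
sub merge_verbatim {
    my @out;
    for my $p (@_) {
        if ($p->[0] eq 'verbatim' && @out && $out[-1][0] eq 'verbatim') {
            $out[-1][1] .= "\n\n" . $p->[1];
        }
        else {
            push @out, [ @$p ];    # copy, so the caller's data is not modified
        }
    }
    return @out;
}
```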

       •   Pod formatters, where feasible, are advised to avoid splitting
           short verbatim paragraphs (under twelve lines, say) across pages.

       •   Pod parsers must treat a line with only spaces and/or tabs on it as
           a "blank line" such as separates paragraphs.  (Some older parsers
           recognized only two adjacent newlines as a "blank line" but would
           not recognize a newline, a space, and a newline, as a blank line.
           This is noncompliant behavior.)
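
           In practice, this means testing each line against m/\A[ \t]*\z/
           instead of searching for two adjacent newlines.  The sketch
           below uses split_paragraphs, a hypothetical name.  It normalizes
           CR and CRLF to LF and returns each paragraph without its trailing
           newline.

```perl
use strict;
use warnings;

# Split Pod source into paragraphs. A line holding only spaces and/or tabs
# is blank, exactly like an empty line.
sub split_paragraphs {
    my ($pod) = @_;
    $pod =~ s/\r\n?/\n/g;                        # CR and CRLF become LF
    my (@paras, @current);
    for my $line (split /\n/, $pod) {
        if ($line =~ /\A[ \t]*\z/) {             # blank, even with spaces/tabs
            push @paras, join("\n", @current) if @current;
            @current = ();
        }
        else {
            push @current, $line;
        }
    }
    push @paras, join("\n", @current) if @current;
    return @paras;
}
```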

       •   Authors of Pod formatters/processors should make every effort to
           avoid writing their own Pod parser.  There are already several on
           CPAN, with a wide range of interface styles -- and one of them,
           Pod::Simple, comes with modern versions of Perl.

       •   Characters in Pod documents may be conveyed either as literals, or
           by number in E<n> codes, or by an equivalent mnemonic, as in
           E<eacute>, which is exactly equivalent to E<233>.  The numbers are
           the Latin-1/Unicode values, even on EBCDIC platforms.

           When referring to characters by using an E<n> numeric code, numbers
           in the range 32-126 refer to those well-known US-ASCII characters
           (also defined there by Unicode, with the same meaning), which all
           Pod formatters must render faithfully.  Characters whose E<>
           numbers are in the ranges 0-31 and 127-159 should not be used
           (neither as literals, nor as E<number> codes), except for the
           literal byte-sequences for newline (ASCII 13, ASCII 13 10, or ASCII
           10), and tab (ASCII 9).

           Numbers in the range 160-255 refer to Latin-1 characters (also
           defined there by Unicode, with the same meaning).  Numbers above
           255 should be understood to refer to Unicode characters.
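
           The sketch below decodes the inside of a numeric E<...> code and
           labels the result with the ranges above.  It accepts the decimal,
           "0x"-prefixed hexadecimal, and leading-zero octal forms described
           in perlpod.  The function names and range labels are
           hypothetical.  Tab and newline are not exempted, because the
           exception above covers only literal characters, not E<number>
           codes.

```perl
use strict;
use warnings;

# Decode the inside of a numeric E<...> code: "0x" prefix = hex, any other
# leading "0" = octal, otherwise decimal. Returns undef for non-numbers.
sub escape_number {
    my ($inside) = @_;
    return hex $1      if $inside =~ /\A0[xX]([0-9A-Fa-f]+)\z/;
    return oct $inside if $inside =~ /\A0[0-7]+\z/;
    return 0 + $inside if $inside =~ /\A[0-9]+\z/;
    return undef;
}

# Label a code point with this section's ranges. Tab and newline are only
# allowed as literal characters, so E<9>, E<10>, and E<13> count as 'avoid'.
sub escape_range {
    my ($cp) = @_;
    return 'avoid'  if $cp <= 31 || ($cp >= 127 && $cp <= 159);
    return 'ascii'  if $cp <= 126;
    return 'latin1' if $cp <= 255;
    return 'unicode';
}
```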
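
       •   Be warned that some formatters cannot reliably render characters
           outside 32-126, and many are able to handle 32-126 and 160-255, but
           nothing above 255.

       •   Besides the well-known "E<lt>" and "E<gt>" codes for less-than and
           greater-than, Pod parsers must understand "E<sol>" for "/"
           (solidus, slash), and "E<verbar>" for "|" (vertical bar, pipe).
           Pod parsers should also understand "E<lchevron>" and "E<rchevron>"
           as legacy codes for characters 171 and 187, i.e., "left-pointing
           double angle quotation mark" = "left pointing guillemet" and
           "right-pointing double angle quotation mark" = "right pointing
           guillemet".  (These look like little "<<" and ">>", and they are
           now preferably expressed with the HTML/XHTML codes "E<laquo>" and
           "E<raquo>".)

       •   Pod parsers should understand all "E<html>" codes as defined in the
           entity declarations in the most recent XHTML specification at
           "www.W3.org".  Pod parsers must understand at least the entities
           that define characters in the range 160-255 (Latin-1).  Pod
           parsers, when faced with some unknown "E<identifier>" code,
           shouldn't simply replace it with an empty string (by default, at
           least), but may pass it through as a string consisting of the
           literal characters E, less-than, identifier, greater-than.  Or Pod
           parsers may offer the alternative option of processing such
           unknown "E<identifier>" codes by firing an event especially for
           such codes, or by adding a special node-type to the in-memory
           document tree.  Such "E<identifier>" codes may have special
           meaning to some processors, or some processors may choose to add
           them to a special error report.

       •   Pod parsers must also support the XHTML codes "E<quot>" for
           character 34 (doublequote, "), "E<amp>" for character 38
           (ampersand, &), and "E<apos>" for character 39 (apostrophe, ').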
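
           A name table can start from the codes this section requires.  The
           sketch below uses a deliberately tiny, hypothetical table.  A real
           parser needs every XHTML entity name, and Pod::Escapes provides a
           complete table.  Unknown names are passed through as literal text
           rather than dropped.

```perl
use strict;
use warnings;

# Hypothetical, deliberately tiny name table (character numbers).
my %ESCAPE_NAME = (
    lt => 60, gt => 62, sol => 47, verbar => 124,       # required
    quot => 34, amp => 38, apos => 39,                  # required XHTML codes
    lchevron => 171, rchevron => 187,                   # legacy names
    laquo => 171, raquo => 187, eacute => 233, nbsp => 160,
);

# Resolve the inside of a named E<...> code; unknown names come back as
# the literal text "E<name>" instead of an empty string.
sub resolve_named_escape {
    my ($name) = @_;
    return chr $ESCAPE_NAME{$name} if exists $ESCAPE_NAME{$name};
    return "E<$name>";
}
```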

       •   Note that in all cases of "E<whatever>", whatever (whether an
           htmlname, or a number in any base) must consist only of
           alphanumeric characters -- that is, whatever must match
           "m/\A\w+\z/".  So "E< 0 1 2 3 >" is invalid, because it contains
           spaces, which aren't alphanumeric characters.  This presumably does
           not need special treatment by a Pod processor; " 0 1 2 3 " doesn't
           look like a number in any base, so it would presumably be looked up
           in the table of HTML-like names.  Since there isn't (and cannot be)
           an HTML-like entity called " 0 1 2 3 ", this will be treated as an
           error.  However, Pod processors may treat "E< 0 1 2 3 >" or
           "E<e-acute>" as syntactically invalid, potentially earning a
           different error message than the error message (or warning, or
           event) generated by a merely unknown (but theoretically valid)
           htmlname, as in "E<qacute>" [sic].  But Pod parsers are not
           required to make this distinction.
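
           A processor that does choose to make the distinction can order
           its checks as in this sketch.  The m/\A\w+\z/ syntax test comes
           first, then the numeric forms, then the name table.  check_escape
           and its return labels are hypothetical.

```perl
use strict;
use warnings;

# Classify the inside of an E<...> code. $known is a hash ref standing in
# for the parser's full name table.
sub check_escape {
    my ($inside, $known) = @_;
    return 'invalid' unless $inside =~ m/\A\w+\z/;          # " 0 1 2 3 ", "e-acute"
    return 'number'  if $inside =~ m/\A(?:0[xX][0-9A-Fa-f]+|[0-9]+)\z/;
    return exists $known->{$inside} ? 'name' : 'unknown';  # "qacute" is unknown
}
```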

       •   Note that E<number> must not be interpreted as simply "codepoint
           number in the current/native character set".  It always means only
           "the character represented by codepoint number in Unicode."  (This
           is identical to the semantics of &#number; in XML.)

           This will likely require many formatters to have tables mapping
           from treatable Unicode codepoints (such as the "\xE9" for the
           e-acute character) to the escape sequences or codes necessary for
           conveying such sequences in the target output format.  A converter
           to *roff would, for example, know that "\xE9" (whether conveyed
           literally, or via an E<...> sequence) is to be conveyed as "e\\*'".
           Similarly, a program rendering Pod in a Mac OS application window
           would presumably need to know that "\xE9" maps to codepoint 142 in
           the MacRoman encoding that (at time of writing) is native for Mac
           OS.  Such Unicode2whatever mappings are presumably already widely
           available for common output formats.  (Such mappings may be
           incomplete!  Implementers are not expected to bend over backwards
           in an attempt to render Cherokee syllabics, Etruscan runes,
           Byzantine musical symbols, or any of the other weird things that
           Unicode can encode.)  And if a Pod document uses a character not
           found in such a mapping, the formatter should consider it an
           unrenderable character.

       •   If, surprisingly, the implementor of a Pod formatter can't find a
           satisfactory pre-existing table mapping from Unicode characters to
           escapes in the target format (e.g., a decent table of Unicode
           characters to *roff escapes), it will be necessary to build such a
           table.  If you are in this circumstance, you should begin with the
           characters in the range 0x00A0 - 0x00FF, which consists mostly of
           heavily used accented characters.  Then proceed (as patience
           permits and fastidiousness compels) through the characters that the
           (X)HTML standards groups judged important enough to merit mnemonics
           for.  These are declared in the (X)HTML specifications at the
           www.W3.org site.  At time of writing (September 2001), the most
           recent entity declaration files are:

             http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
             http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
             http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent

           Then you can progress through any remaining notable Unicode
           characters in the range 0x2000-0x204D (consult the character tables
           at www.unicode.org), and whatever else strikes your fancy.  For
           example, in xhtml-symbol.ent, there is the entry:

             <!ENTITY infin    "&#8734;"> <!-- infinity, U+221E ISOtech -->

           While the mapping from "infin" to the character "\x{221E}" will
           (hopefully) have been already handled by the Pod parser, the
           presence of the character in this file means that it's important
           enough to include in a formatter's table that maps from notable
           Unicode characters to the codes necessary for rendering them.  So
           for a Unicode-to-*roff mapping, for example, this would merit the
           entry:

             "\x{221E}" => '\(if',

           It is eagerly hoped that in the future, increasing numbers of
           formats (and formatters) will support Unicode characters directly
           (as (X)HTML does with "&infin;", "&#8734;", or "&#x221E;"),
           reducing the need for idiosyncratic mappings of
           Unicode-to-my_escapes.
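
           A starter table and its lookup might look like the sketch below.
           The hash, the to_roff function, and the choice of "?" as a
           placeholder are hypothetical.  The entries use classic troff
           special-character names.  The sketch handles only the mapping
           step and ignores *roff's own escaping of backslashes and of lines
           that begin with a period.

```perl
use strict;
use warnings;

# Hypothetical starter table: Unicode characters to classic troff escapes.
my %UNICODE_TO_ROFF = (
    "\x{A9}"   => '\(co',    # copyright sign
    "\x{AE}"   => '\(rg',    # registered sign
    "\x{B0}"   => '\(de',    # degree sign
    "\x{B1}"   => '\(+-',    # plus-minus sign
    "\x{D7}"   => '\(mu',    # multiplication sign
    "\x{221E}" => '\(if',    # infinity
);

# Map already-decoded text (E<...> codes resolved) character by character.
# ASCII passes through, table hits become escapes, and anything else is
# replaced by "?" and collected so the caller can report it.
sub to_roff {
    my ($text) = @_;
    my $out = '';
    my @unrenderable;
    for my $ch (split //, $text) {
        if    (ord($ch) < 128)                { $out .= $ch }
        elsif (exists $UNICODE_TO_ROFF{$ch})  { $out .= $UNICODE_TO_ROFF{$ch} }
        else                                  { $out .= '?'; push @unrenderable, $ch }
    }
    return ($out, \@unrenderable);
}
```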

       •   It is up to each individual Pod formatter to display good judgement
           when confronted with an unrenderable character (which is distinct
           from an unknown E<thing> sequence that the parser couldn't resolve
           to anything, renderable or not).  It is good practice to map Latin
           letters with diacritics (like "E<eacute>"/"E<233>") to the
           corresponding unaccented US-ASCII letters (like a simple character
           101, "e"), but clearly this is often not feasible, and an
           unrenderable character may be represented as "?", or the like.  In
           attempting a sane fallback (as from E<233> to "e"), Pod formatters
           may use the %Latin1Code_to_fallback table in Pod::Escapes, or
           Text::Unidecode, if available.

           For example, this Pod text:

             magic is enabled if you set C<$Currency> to 'E<euro>'.

           may be rendered as: "magic is enabled if you set $Currency to '?'"
           or as "magic is enabled if you set $Currency to '[euro]'", or as
           "magic is enabled if you set $Currency to '[x20AC]'", etc.

           A Pod formatter may also note, in a comment or warning, a list of
           the unrenderable characters it encountered.
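
           As an alternative to the tables named above, the sketch below
           builds a fallback for an ASCII-only output format from the core
           Unicode::Normalize module.  It decomposes the character, drops
           combining marks, and keeps the result if it is printable ASCII.
           Otherwise it emits the "[x20AC]" style from the example.  The
           function names are hypothetical.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);    # core module

# Fallback for one character in an ASCII-only output format.
sub ascii_fallback {
    my ($ch) = @_;
    return $ch if ord($ch) < 128;
    (my $base = NFD($ch)) =~ s/\p{Mn}+//g;        # e.g. "\x{E9}" -> "e"
    return $base if $base =~ /\A[\x20-\x7E]+\z/;  # a usable ASCII stand-in
    return sprintf '[x%X]', ord $ch;              # e.g. "\x{20AC}" -> "[x20AC]"
}

sub render_ascii {
    my ($text) = @_;
    return join '', map { ascii_fallback($_) } split //, $text;
}
```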

       •   E<...> may freely appear in any formatting code (other than in
           another E<...> or in a Z<>).  That is, "X<The E<euro>1,000,000
           Solution>" is valid, as is "L<The E<euro>1,000,000
           Solution|Million::Euros>".

       •   Some Pod formatters output to formats that implement non-breaking
           spaces as an individual character (which I'll call "NBSP"), and
           others output to formats that implement non-breaking spaces just as
           spaces wrapped in a "don't break this across lines" code.  Note
           that at the level of Pod, both sorts of codes can occur: Pod can
           contain an NBSP character (whether as a literal, or as an "E<160>"
           or "E<nbsp>" code); and Pod can contain "S<foo I<bar> baz>" codes,
           where "mere spaces" (character 32) in such codes are taken to
           represent non-breaking spaces.  Pod parsers should consider
           supporting the optional parsing of "S<foo I<bar> baz>" as if it
           were "fooNBSPI<bar>NBSPbaz", and, going the other way, the optional
           parsing of groups of words joined by NBSP's as if each group were
           in an S<...> code, so that formatters may use the representation
           that maps best to what the output format demands.

       •   Some processors may find that the "S<...>" code is easiest to
           implement by replacing each space in the parse tree under the
           content of the S with an NBSP.  But note: the replacement should
           apply not to spaces in all text, but only to spaces in printable
           text.  (This distinction may or may not be evident in the
           particular tree/event model implemented by the Pod parser.)  For
           example, consider this unusual case:

              S<L</Autoloaded Functions>>

           This means that the space in the middle of the visible link text
           must not be broken across lines.  In other words, it's the same as
           this:

              L<"AutoloadedE<160>Functions"/Autoloaded Functions>

           However, a misapplied space-to-NBSP replacement could (wrongly)
           produce something equivalent to this:

              L<"AutoloadedE<160>Functions"/AutoloadedE<160>Functions>

           ...which is almost definitely not going to work as a hyperlink
           (assuming this formatter outputs a format supporting hypertext).

           Formatters may choose to just not support the S format code,
           especially in cases where the output format simply has no NBSP
           character/code and no code for "don't break this stuff across
           lines".
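
           The "printable text only" rule is easy to honor when link targets
           are stored apart from link text.  The sketch below assumes a
           hypothetical tree in which a node is either a plain string
           (printable text) or [ $code, \%attributes, @children ].  An
           L<...> code's section is kept in %attributes.  nbsp_printable is
           also a hypothetical name.

```perl
use strict;
use warnings;

# Replace spaces with NBSP in the printable text under a node, leaving
# attributes such as a link's section untouched. Returns a new tree.
sub nbsp_printable {
    my ($node) = @_;
    if (!ref $node) {                            # printable text
        (my $text = $node) =~ tr/ /\x{A0}/;
        return $text;
    }
    my ($code, $attr, @kids) = @$node;
    return [ $code, { %$attr }, map { nbsp_printable($_) } @kids ];
}
```

           A formatter would call nbsp_printable on each child of an S node.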

       •   Besides the NBSP character discussed above, implementors are
           reminded of the existence of the other "special" character in
           Latin-1, the "soft hyphen" character, also known as "discretionary
           hyphen", i.e., "E<173>" = "E<0xAD>" = "E<shy>".  This character
           expresses an optional hyphenation point.  That is, it normally
           renders as nothing, but may render as a "-" if a formatter breaks
           the word at that point.  Pod formatters should, as appropriate, do
           one of the following:  1) render this with a code with the same
           meaning (e.g., "\-" in RTF), 2) pass it through in the expectation
           that the formatter understands this character as such, or 3) delete
           it.

           For example:

             sigE<shy>action
             manuE<shy>script
             JarkE<shy>ko HieE<shy>taE<shy>nieE<shy>mi

           These signal to a formatter that if it is to hyphenate "sigaction"
           or "manuscript", then it should be done as "sig-[linebreak]action"
           or "manu-[linebreak]script" (and if it doesn't hyphenate it, then
           the "E<shy>" doesn't show up at all).  And if it is to hyphenate
           "Jarkko" and/or "Hietaniemi", it can do so only at the points where
           there is an "E<shy>" code.

           In practice, it is anticipated that this character will not be used
           often, but formatters should either support it, or delete it.
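
           The three options can be sketched as one hypothetical helper.
           Its 'escape' mode uses troff's \% hyphenation-point escape as an
           example target code.  Another output format would substitute its
           own equivalent, such as "\-" in RTF.

```perl
use strict;
use warnings;

# Handle U+00AD (E<shy>) in one of the three ways listed above.
sub handle_soft_hyphens {
    my ($text, $mode) = @_;
    if    ($mode eq 'escape') { $text =~ s/\x{AD}/\\%/g }   # troff: \% marks a hyphenation point
    elsif ($mode eq 'delete') { $text =~ s/\x{AD}//g }
    elsif ($mode ne 'pass')   { die "unknown mode '$mode'\n" }
    return $text;
}
```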

       •   If you think that you want to add a new command to Pod (like, say,
           a "=biblio" command), consider whether you could get the same
           effect with a for or begin/end sequence: "=for biblio ..." or
           "=begin biblio" ... "=end biblio".  Pod processors that don't
           understand "=for biblio", etc., will simply ignore it, whereas they
           may complain loudly if they see "=biblio".

       •   Throughout this document, "Pod" has been the preferred spelling for
           the name of the documentation format.  One may also use "POD" or
           "pod".  For the documentation that is (typically) in the Pod
           format, you may use "pod", or "Pod", or "POD".  Understanding these
           distinctions is useful, but obsessing over how to spell them
           usually is not.


About L<...> Codes

       As you can tell from a glance at perlpod, the L<...> code is the most
       complex of the Pod formatting codes.  The points below will hopefully
       clarify what it means and how processors should deal with it.

       •   In parsing an L<...> code, Pod parsers must distinguish at least
           four attributes:

           First:
               The link-text.  If there is none, this must be "undef".  (E.g.,
               in "L<Perl Functions|perlfunc>", the link-text is "Perl
               Functions".  In "L<Time::HiRes>" and even "L<|Time::HiRes>",
               there is no link text.  Note that link text may contain
               formatting.)

           Second:
               The possibly inferred link-text; i.e., if there was no real
               link text, then this is the text that we'll infer in its place.
               (E.g., for "L<Getopt::Std>", the inferred link text is
               "Getopt::Std".)

           Third:
               The name or URL, or "undef" if none.  (E.g., in "L<Perl
               Functions|perlfunc>", the name (also sometimes called the page)
               is "perlfunc".  In "L</CAVEATS>", the name is "undef".)

           Fourth:
               The section (AKA "item" in older perlpods), or "undef" if none.
               E.g., in "L<Getopt::Std/DESCRIPTION>", "DESCRIPTION" is the
               section.  (Note that this is not the same as a manpage section
               like the "5" in "man 5 crontab".  "Section Foo" in the Pod
               sense means the part of the text that's introduced by the
               heading or item whose text is "Foo".)

           Pod parsers may also note additional attributes including:

           Fifth:
               A flag for whether item 3 (if present) is a URL (like
               "http://lists.perl.org" is), in which case there should be no
               section attribute; a Pod name (like "perldoc" and "Getopt::Std"
               are); or possibly a man page name (like "crontab(5)" is).

           Sixth:
               The raw original L<...> content, before text is split on "|",
               "/", etc., and before E<...> codes are expanded.

           (The above were numbered only for concise reference below.  It is
           not a requirement that these be passed as an actual list or array.)

           For example:

             L<Foo::Bar>
               =>  undef,                         # link text
                   "Foo::Bar",                    # possibly inferred link text
                   "Foo::Bar",                    # name
                   undef,                         # section
                   'pod',                         # what sort of link
                   "Foo::Bar"                     # original content

             L<Perlport's section on NL's|perlport/Newlines>
               =>  "Perlport's section on NL's",  # link text
                   "Perlport's section on NL's",  # possibly inferred link text
                   "perlport",                    # name
                   "Newlines",                    # section
                   'pod',                         # what sort of link
                   "Perlport's section on NL's|perlport/Newlines"
                                                  # original content

             L<perlport/Newlines>
               =>  undef,                         # link text
                   '"Newlines" in perlport',      # possibly inferred link text
                   "perlport",                    # name
                   "Newlines",                    # section
                   'pod',                         # what sort of link
                   "perlport/Newlines"            # original content

             L<crontab(5)/"DESCRIPTION">
               =>  undef,                         # link text
                   '"DESCRIPTION" in crontab(5)', # possibly inferred link text
                   "crontab(5)",                  # name
                   "DESCRIPTION",                 # section
                   'man',                         # what sort of link
                   'crontab(5)/"DESCRIPTION"'     # original content

             L</Object Attributes>
               =>  undef,                         # link text
                   '"Object Attributes"',         # possibly inferred link text
                   undef,                         # name
                   "Object Attributes",           # section
                   'pod',                         # what sort of link
                   "/Object Attributes"           # original content

             L<https://www.perl.org/>
               =>  undef,                         # link text
                   "https://www.perl.org/",       # possibly inferred link text
                   "https://www.perl.org/",       # name
                   undef,                         # section
                   'url',                         # what sort of link
                   "https://www.perl.org/"        # original content

             L<Perl.org|https://www.perl.org/>
               =>  "Perl.org",                    # link text
                   "Perl.org",                    # possibly inferred link text
                   "https://www.perl.org/",       # name
                   undef,                         # section
                   'url',                         # what sort of link
                   "Perl.org|https://www.perl.org/" # original content

           Note that you can distinguish URL-links from anything else by the
           fact that they match "m/\A\w+:[^:\s]\S*\z/".  So
           "L<http://www.perl.com>" is a URL, but "L<HTTP::Response>" isn't.
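
           The attribute split can be sketched as follows.  parse_link is a
           hypothetical function with several simplifying assumptions:
           E<...> codes are still unexpanded, the link text contains no "|",
           a name never contains "/", and a name ending in a parenthesized
           string is treated as a man page.  Following the definition of the
           second attribute, the inferred text is the real link text
           whenever there is one.

```perl
use strict;
use warnings;

# Split the raw content of an L<...> code into the six attributes above:
# (link text, inferred text, name, section, kind, original content).
sub parse_link {
    my ($raw) = @_;
    my ($text, $target) = $raw =~ /\A([^|]*)\|(.*)\z/s ? ($1, $2) : (undef, $raw);
    $text = undef if defined $text && $text eq '';           # L<|Time::HiRes>

    if ($target =~ m/\A\w+:[^:\s]\S*\z/) {                   # URL test from the text
        return ($text, $text // $target, $target, undef, 'url', $raw);
    }

    my ($name, $section) = split m{/}, $target, 2;
    $name = undef if defined $name && $name eq '';            # L</section>
    $section =~ s/\A"(.*)"\z/$1/s if defined $section;        # L<name/"section">
    my $kind = (defined $name && $name =~ /\(\S*\)\z/) ? 'man' : 'pod';

    my $inferred = $text // (
          !defined $section ? $name
        : defined $name     ? qq{"$section" in $name}
        :                     qq{"$section"}
    );
    return ($text, $inferred, $name, $section, $kind, $raw);
}
```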

       •   In the case of L<...> codes with no "text|" part in them, older
           formatters have exhibited great variation in actually displaying
           the link or cross reference.  For example, L<crontab(5)> might
           render as "the crontab(5) manpage", or "in the crontab(5) manpage",
           or just "crontab(5)".

           Pod processors must now treat "text|"-less links as follows:

             L<name>         =>  L<name|name>
             L</section>     =>  L<"section"|/section>
             L<name/section> =>  L<"section" in name|name/section>

       •   Note that section names might contain markup.  I.e., if a section
           starts with:

             =head2 About the C<-M> Operator

           or with:

             =item About the C<-M> Operator

           then a link to it would look like this:

             L<somedoc/About the C<-M> Operator>

           Formatters may choose to ignore the markup for purposes of
           resolving the link and use only the renderable characters in the
           section name, as in:

             <h2><a name="About_the_-M_Operator">About the <code>-M</code>
             Operator</a></h2>

             ...

             <a href="somedoc#About_the_-M_Operator">"About the <code>-M</code>
             Operator" in somedoc</a>
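
           One way to derive such an anchor is sketched below.
           section_anchor is a hypothetical name.  Its scheme strips the
           codes, turns whitespace into "_", and drops other characters that
           are awkward in a URL fragment.  That scheme merely reproduces the
           anchor above; it is not a rule that every formatter follows.  The
           sketch assumes E<...> codes are already resolved and that only
           single-bracket codes such as C<...> occur.

```perl
use strict;
use warnings;

# Build an anchor from a heading's Pod text, keeping only renderable text.
sub section_anchor {
    my ($heading) = @_;
    $heading =~ s/X<[^<>]*>//g;                    # index entries are not rendered
    1 while $heading =~ s/[A-Z]<([^<>]*)>/$1/g;    # unwrap codes, innermost first
    $heading =~ s/\A\s+|\s+\z//g;                  # trim
    $heading =~ s/\s+/_/g;                         # whitespace becomes "_"
    $heading =~ s/[^\w.-]//g;                      # drop fragment-unfriendly characters
    return $heading;
}
```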

       •   Previous versions of perlpod distinguished "L<name/"section">"
           links from "L<name/item>" links (and their targets).  These have
           been merged syntactically and semantically in the current
           specification, and section can refer either to a "=headn Heading
           Content" command or to a "=item Item Content" command.  This
           specification does not specify what behavior should be in the case
           of a given document having several things all seeming to produce
           the same section identifier (e.g., in HTML, several things all
           producing the same anchorname in <a name="anchorname">...</a>
           elements).  Where Pod processors can control this behavior, they
           should use the first such anchor.  That is, "L<Foo/Bar>" refers to
           the first "Bar" section in Foo.

           But for some processors/formats this cannot be easily controlled;
           as with the HTML example, the behavior of multiple ambiguous <a
           name="anchorname">...</a> elements is most easily just left up to
           browsers to decide.
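
           Where a processor does control resolution, "the first one wins"
           amounts to recording each identifier only the first time it is
           seen, as in this sketch.  The function name, and the use of
           paragraph indexes as positions, are hypothetical.

```perl
use strict;
use warnings;

# Map each section identifier to the position of its first occurrence;
# later duplicates are ignored, so L<Foo/Bar> resolves to the first "Bar".
sub first_anchor_index {
    my @ids = @_;
    my %first;
    for my $pos (0 .. $#ids) {
        $first{ $ids[$pos] } //= $pos;
    }
    return \%first;
}
```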

       •   In a "L<text|...>" code, text may contain formatting codes for
           formatting or for E<...> escapes, as in:

             L<B<ummE<234>stuff>|...>

           For "L<...>" codes without a "text|" part, only "E<...>" and "Z<>"
           codes may occur.  That is, authors should not use "L<B<Foo::Bar>>".

           Note, however, that formatting codes and Z<>'s can occur in any and
           all parts of an L<...> (i.e., in name, section, text, and url).

           Authors must not nest L<...> codes.  For example, "L<The
           L<Foo::Bar> man page>" should be treated as an error.

       •   Note that Pod authors may use formatting codes inside the "text"
           part of "L<text|name>" (and so on for L<text|/"sec">).

           In other words, this is valid:

             Go read L<the docs on C<$.>|perlvar/"$.">

           Some output formats that do allow rendering "L<...>" codes as
           hypertext might not allow the link-text to be formatted; in that
           case, formatters will have to just ignore that formatting.

       •   At time of writing, "L<name>" values are of two types: either the
           name of a Pod page like "L<Foo::Bar>" (which might be a real Perl
           module or program in an @INC / PATH directory, or a .pod file in
           those places); or the name of a Unix man page, like
           "L<crontab(5)>".  In theory, "L<chmod>" is ambiguous between a Pod
           page called "chmod" and the Unix man page "chmod" (in whatever
           man-section).  However, the presence of a string in parens, as in
           "crontab(5)", is sufficient to signal that what is being discussed
           is not a Pod page, and so is presumably a Unix man page.  The
           distinction is of no importance to many Pod processors, but some
           processors that render to hypertext formats may need to distinguish
           them in order to know how to render a given "L<foo>" code.

       •   Previous versions of perlpod allowed for a "L<section>" syntax (as
           in "L<Object Attributes>"), which was not easily distinguishable
           from "L<name>" syntax, and for "L<"section">", which was only
           slightly less ambiguous.  This syntax is no longer in the
           specification, and has been replaced by the "L</section>" syntax
           (where the slash was formerly optional).  Pod parsers should
           tolerate the "L<"section">" syntax, for a while at least.  The
           suggested heuristic for distinguishing "L<section>" from "L<name>"
           is that if it contains any whitespace, it's a section.  Pod
           processors should warn about this being deprecated syntax.
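
           A processor that tolerates the old forms might rewrite them
           before parsing, as in this sketch.  normalize_legacy_link is a
           hypothetical name.  It treats quoted content, or content
           containing whitespace, as a section.  It leaves anything that
           already contains "/" or "|" alone, and reports each rewrite
           through a caller-supplied warning callback.

```perl
use strict;
use warnings;

# Rewrite legacy L<"section"> and whitespace-containing L<section> contents
# to the current "/section" form, warning each time; leave the rest alone.
sub normalize_legacy_link {
    my ($content, $warn) = @_;
    return $content if $content =~ m{[/|]};                 # already current syntax
    my $section;
    if    ($content =~ /\A"(.*)"\z/s) { $section = $1 }       # L<"section">
    elsif ($content =~ /\s/)          { $section = $content } # heuristic: whitespace
    else                              { return $content }     # plain L<name>
    $warn->(qq{deprecated L<$content>, use L</$section>}) if $warn;
    return "/$section";
}
```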


About =over...=back Regions

       "=over"..."=back" regions are used for various kinds of list-like
       structures.  (I use the term "region" here simply as a collective term
       for everything from the "=over" to the matching "=back".)

       •   The non-zero numeric indentlevel in "=over indentlevel" ...
           "=back" is used for giving the formatter a clue as to how many
           "spaces" (ems, or roughly equivalent units) it should tab over,
           although many formatters will have to convert this to an absolute
           measurement that may not exactly match the size of spaces (or M's)
           in the document's base font.  Other formatters may have to
           completely ignore the number.  The lack of any explicit indentlevel
           parameter is equivalent to an indentlevel value of 4.  Pod
           processors may complain if indentlevel is present but is not a
           positive number matching "m/\A(\d*\.)?\d+\z/".
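
           Reading the indentlevel might look like this sketch.  over_indent
           is a hypothetical name, and the caller supplies a callback for
           complaints.  The sketch returns 4 when no indentlevel is given.
           It also falls back to 4 after complaining about a value that is
           zero or does not match the pattern above.

```perl
use strict;
use warnings;

# Return the indentlevel of an "=over" command paragraph.
sub over_indent {
    my ($command, $complain) = @_;
    my ($arg) = $command =~ /\A=over\s*(.*?)\s*\z/s;
    return 4 if !defined $arg || $arg eq '';                     # no indentlevel given
    return 0 + $arg if $arg =~ m/\A(\d*\.)?\d+\z/ && $arg > 0;   # valid and positive
    $complain->("bad =over indentlevel '$arg', using 4") if $complain;
    return 4;
}
```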
1172
1173       •   Authors of Pod formatters are reminded that "=over" ... "=back" may
1174           map to several different constructs in your output format.  For
1175           example, in converting Pod to (X)HTML, it can map to any of
1176           <ul>...</ul>, <ol>...</ol>, <dl>...</dl>, or
1177           <blockquote>...</blockquote>.  Similarly, "=item" can map to <li>
1178           or <dt>.
1179
       •   Each "=over" ... "=back" region should be one of the following:

           •   An "=over" ... "=back" region containing only "=item *"
               commands, each followed by some number of ordinary/verbatim
               paragraphs, other nested "=over" ... "=back" regions, "=for..."
               paragraphs, and/or "=begin"..."=end" regions.

               (Pod processors must tolerate a bare "=item" as if it were
               "=item *".)  Whether "*" is rendered as a literal asterisk, an
               "o", or some kind of real bullet character is left up to the
               Pod formatter, and may depend on the level of nesting.

           •   An "=over" ... "=back" region containing only
               "m/\A=item\s+\d+\.?\s*\z/" paragraphs, each one (or each group
               of them) followed by some number of ordinary/verbatim
               paragraphs, other nested "=over" ... "=back" regions, "=for..."
               paragraphs, and/or "=begin"..."=end" regions.  Note that the
               numbers must start at 1 in each such region, and must proceed
               in order and without skipping numbers.

               (Pod processors must tolerate lines like "=item 1" as if they
               were "=item 1.", with the period.)

           •   An "=over" ... "=back" region containing only "=item [text]"
               commands, each one (or each group of them) followed by some
               number of ordinary/verbatim paragraphs, other nested "=over"
               ... "=back" regions, "=for..." paragraphs, and/or
               "=begin"..."=end" regions.

               The "=item [text]" paragraph should not match
               "m/\A=item\s+\d+\.?\s*\z/" or "m/\A=item\s+\*\s*\z/", nor
               should it match just "m/\A=item\s*\z/".

           •   An "=over" ... "=back" region containing no "=item" paragraphs
               at all, and containing only some number of ordinary/verbatim
               paragraphs, and possibly also some nested "=over" ... "=back"
               regions, "=for..." paragraphs, and "=begin"..."=end" regions.
               Such an itemless "=over" ... "=back" region in Pod is
               equivalent in meaning to a "<blockquote>...</blockquote>"
               element in HTML.

           Note that in all the above cases, you can determine which type of
           "=over" ... "=back" you have by examining the first (non-"=cut",
           non-"=pod") Pod paragraph after the "=over" command.

       •   Pod formatters must tolerate arbitrarily large amounts of text in
           the "=item text..." paragraph.  In practice, most such paragraphs
           are short, as in:

             =item For cutting off our trade with all parts of the world

           But they may be arbitrarily long:

             =item For transporting us beyond seas to be tried for pretended
             offenses

             =item He is at this time transporting large armies of foreign
             mercenaries to complete the works of death, desolation and
             tyranny, already begun with circumstances of cruelty and perfidy
             scarcely paralleled in the most barbarous ages, and totally
             unworthy the head of a civilized nation.

       •   Pod processors should tolerate "=item *" / "=item number" commands
           with no accompanying paragraph.  The middle item is an example:

             =over

             =item 1

             Pick up dry cleaning.

             =item 2

             =item 3

             Stop by the store.  Get Abba Zabas, Stoli, and cheap lawn chairs.

             =back

       •   No "=over" ... "=back" region can contain headings.  Processors may
           treat such a heading as an error.

       •   Note that an "=over" ... "=back" region should have some content.
           That is, authors should not have an empty region like this:

             =over

             =back

           Pod processors seeing such a contentless "=over" ... "=back"
           region may ignore it, or may report it as an error.

       •   Processors must tolerate an "=over" list that goes off the end of
           the document (i.e., which has no matching "=back"), but they may
           warn about such a list.

       •   Authors of Pod formatters should note that this construct:

             =item Neque

             =item Porro

             =item Quisquam Est

             Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
             velit, sed quia non numquam eius modi tempora incidunt ut
             labore et dolore magnam aliquam quaerat voluptatem.

             =item Ut Enim

           is semantically ambiguous, in a way that makes formatting decisions
           a bit difficult.  On the one hand, it could be a mention of an item
           "Neque", a mention of another item "Porro", and a mention of
           another item "Quisquam Est", with just the last one requiring the
           explanatory paragraph "Qui dolorem ipsum quia dolor..."; and then
           an item "Ut Enim".  In that case, you'd want to format it like so:

             Neque

             Porro

             Quisquam Est
               Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
               velit, sed quia non numquam eius modi tempora incidunt ut
               labore et dolore magnam aliquam quaerat voluptatem.

             Ut Enim

           But it could equally well be a discussion of three (related or
           equivalent) items, "Neque", "Porro", and "Quisquam Est", followed
           by a paragraph explaining them all, and then a new item "Ut Enim".
           In that case, you'd probably want to format it like so:

             Neque
             Porro
             Quisquam Est
               Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
               velit, sed quia non numquam eius modi tempora incidunt ut
               labore et dolore magnam aliquam quaerat voluptatem.

             Ut Enim

           But (for the foreseeable future), Pod does not provide any way for
           Pod authors to distinguish which grouping is meant by the above
           "=item"-cluster structure.  So formatters should format it like so:

             Neque

             Porro

             Quisquam Est

               Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
               velit, sed quia non numquam eius modi tempora incidunt ut
               labore et dolore magnam aliquam quaerat voluptatem.

             Ut Enim

           That is, there should be (at least roughly) equal spacing between
           items as between paragraphs (although that spacing may well be less
           than the full height of a line of text).  This leaves it to the
           reader to use (con)textual cues to figure out whether the "Qui
           dolorem ipsum..." paragraph applies to the "Quisquam Est" item or
           to all three items "Neque", "Porro", and "Quisquam Est".  While not
           an ideal situation, this is preferable to providing formatting cues
           that may actually be contrary to the author's intent.

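       Several of the rules above are mechanical enough to sketch in Perl:
       the indentlevel default and pattern, the classification of a region by
       its first non-"=cut", non-"=pod" paragraph, and the requirement that
       numbered items start at 1 without gaps.  This is a hypothetical
       illustration, not the API of any real Pod module.  The sub names
       parse_indentlevel, over_region_type, and check_numbering are invented
       here, and each one assumes the region has already been split into one
       string per paragraph.

```perl
use strict;
use warnings;

# Hypothetical: indentlevel of an "=over" command.  Returns 4 when no
# indentlevel is given, and undef for a value that processors may complain
# about (anything that is not a positive number of the required form).
sub parse_indentlevel {
    my ($over) = @_;                         # e.g. "=over" or "=over 8"
    my ($arg) = $over =~ /\A=over\s*(.*?)\s*\z/s;
    return 4 if !defined $arg || $arg eq '';
    return undef unless $arg =~ m/\A(\d*\.)?\d+\z/ && $arg > 0;
    return $arg;
}

# Hypothetical: kind of "=over" region, judged by its first paragraph that
# is not "=cut" or "=pod".  Takes the region's paragraphs as strings.
sub over_region_type {
    my ($first) = grep { !/\A=(?:cut|pod)\b/ } @_;
    return 'empty' unless defined $first;
    return 'bullet'                          # "=item *" or a bare "=item"
        if $first =~ m/\A=item\s+\*\s*\z/ || $first =~ m/\A=item\s*\z/;
    return 'number' if $first =~ m/\A=item\s+\d+\.?\s*\z/;
    return 'text'   if $first =~ m/\A=item\b/;
    return 'blockquote';                     # no "=item" at all
}

# Hypothetical: numbered items must start at 1 and go up by one.
sub check_numbering {
    my @nums;
    for (@_) {
        push @nums, $1 if /\A=item\s+(\d+)\.?\s*\z/;
    }
    for my $i (0 .. $#nums) {
        return 0 unless $nums[$i] == $i + 1;
    }
    return 1;
}

my @region = ('=item 1', 'Pick up dry cleaning.', '=item 2', '=item 3',
              'Stop by the store.');
print over_region_type(@region), "\n";                        # number
print check_numbering(@region) ? "ok\n" : "bad numbering\n";  # ok
print parse_indentlevel('=over'), "\n";                       # 4
```

       Because the type is fixed by the first relevant paragraph, a processor
       can pick its output construct (for example <ul>, <ol>, <dl>, or
       <blockquote> in HTML) as soon as it sees that paragraph.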

About Data Paragraphs and "=begin/=end" Regions

       Data paragraphs are typically used for inlining non-Pod data that is to
       be used (usually passed through) when rendering the document to a
       specific format:

         =begin rtf

         \par{\pard\qr\sa4500{\i Printed\~\chdate\~\chtime}\par}

         =end rtf

       The exact same effect could, incidentally, be achieved with a single
       "=for" paragraph:

         =for rtf \par{\pard\qr\sa4500{\i Printed\~\chdate\~\chtime}\par}

       (Although that is not formally a data paragraph, it has the same
       meaning as one, and Pod parsers may parse it as one.)

       Another example of a data paragraph:

         =begin html

         I like <em>PIE</em>!

         <hr>Especially pecan pie!

         =end html

       If these were ordinary paragraphs, the Pod parser would try to expand
       the "E</em>" (in the first paragraph) as a formatting code, just like
       "E<lt>" or "E<eacute>".  But since this is in a "=begin
       identifier"..."=end identifier" region and the identifier "html"
       doesn't have a ":" prefix, the contents of this region are stored as
       data paragraphs, instead of being processed as ordinary paragraphs (or,
       if they began with spaces and/or tabs, as verbatim paragraphs).

       As a further example: at the time of writing, no "biblio" identifier
       is supported, but suppose some processor were written to recognize it
       as a way of (say) denoting a bibliographic reference (necessarily
       containing formatting codes in ordinary paragraphs).  The fact that
       "biblio" paragraphs were meant for ordinary processing would be
       indicated by prefacing each "biblio" identifier with a colon:

         =begin :biblio

         Wirth, Niklaus.  1976.  I<Algorithms + Data Structures =
         Programs.>  Prentice-Hall, Englewood Cliffs, NJ.

         =end :biblio

       This would signal to the parser that paragraphs in this begin...end
       region are subject to normal handling as ordinary/verbatim paragraphs
       (while still tagged as meant only for processors that understand the
       "biblio" identifier).  The same effect could be had with:

         =for :biblio
         Wirth, Niklaus.  1976.  I<Algorithms + Data Structures =
         Programs.>  Prentice-Hall, Englewood Cliffs, NJ.

       The ":" on these identifiers means simply "process this stuff normally,
       even though the result will be for some special target".  I suggest
       that parser APIs report "biblio" as the target identifier, but also
       report that it had a ":" prefix.  (And similarly, with the above
       "html", report "html" as the target identifier, and note the lack of a
       ":" prefix.)

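       A parser following that suggestion might split a "=begin", "=end", or
       "=for" paragraph roughly as sketched below.  The sub parse_target is
       hypothetical and not part of any existing parser API.  It returns the
       command name, the identifier with any ":" removed, a flag recording
       whether the ":" was present, and whatever text follows the identifier
       (the content, in the case of "=for").

```perl
use strict;
use warnings;

# Hypothetical: split a "=begin", "=end", or "=for" paragraph into
# (command, identifier, has_colon, rest).  The identifier is returned
# without its ":" prefix; has_colon records whether the prefix was there.
# Returns an empty list if the paragraph has no identifier at all.
sub parse_target {
    my ($para) = @_;
    my ($cmd, $id, $rest) = $para =~ /\A=(begin|end|for)\s+(\S+)\s*(.*)\z/s
        or return;
    my $has_colon = ($id =~ s/\A://) ? 1 : 0;    # strip ":" and remember it
    return ($cmd, $id, $has_colon, $rest);
}

print join('|', parse_target('=begin :biblio')), "\n";   # begin|biblio|1|
print join('|', parse_target('=for html <hr>')), "\n";   # for|html|0|<hr>
```

       Keeping the colon as a separate flag lets a caller route both
       "=begin biblio" and "=begin :biblio" to the same "biblio" handler,
       while still deciding from the flag whether the region's paragraphs are
       data or are to be processed normally.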
       Note that a "=begin identifier"..."=end identifier" region whose
       identifier begins with a colon can contain commands.  For example:

         =begin :biblio

         Wirth's classic is available in several editions, including:

         =for comment
          hm, check abebooks.com for how much used copies cost.

         =over

         =item

         Wirth, Niklaus.  1975.  I<Algorithmen und Datenstrukturen.>
         Teubner, Stuttgart.  [Yes, it's in German.]

         =item

         Wirth, Niklaus.  1976.  I<Algorithms + Data Structures =
         Programs.>  Prentice-Hall, Englewood Cliffs, NJ.

         =back

         =end :biblio

       Note, however, that a "=begin identifier"..."=end identifier" region
       whose identifier does not begin with a colon should not directly
       contain "=head1" ... "=head4" commands, nor "=over", nor "=back", nor
       "=item".  For example, this may be considered invalid:

         =begin somedata

         This is a data paragraph.

         =head1 Don't do this!

         This is a data paragraph too.

         =end somedata

       A Pod processor may signal that the above (specifically the "=head1"
       paragraph) is an error.  Note, however, that the following should not
       be treated as an error:

         =begin somedata

         This is a data paragraph.

         =cut

         # Yup, this isn't Pod anymore.
         sub excl { (rand() > .5) ? "hoo!" : "hah!" }

         =pod

         This is a data paragraph too.

         =end somedata

       And this too is valid:

         =begin someformat

         This is a data paragraph.

           And this is a data paragraph.

         =begin someotherformat

         This is a data paragraph too.

           And this is a data paragraph too.

         =begin :yetanotherformat

         =head2 This is a command paragraph!

         This is an ordinary paragraph!

           And this is a verbatim paragraph!

         =end :yetanotherformat

         =end someotherformat

         Another data paragraph!

         =end someformat

       The contents of the above "=begin :yetanotherformat" ... "=end
       :yetanotherformat" region aren't data paragraphs, because the
       immediately containing region's identifier (":yetanotherformat") begins
       with a colon.  In practice, most regions that contain data paragraphs
       will contain only data paragraphs; however, the above nesting is
       syntactically valid as Pod, even if it is rare.  Note, though, that the
       handlers for some formats, like "html", will accept only data
       paragraphs, not nested regions; and within a region targeted at them,
       they may complain if they see nested regions, or commands other than
       "=end", "=pod", and "=cut".

       Also consider this valid structure:

         =begin :biblio

         Wirth's classic is available in several editions, including:

         =over

         =item

         Wirth, Niklaus.  1975.  I<Algorithmen und Datenstrukturen.>
         Teubner, Stuttgart.  [Yes, it's in German.]

         =item

         Wirth, Niklaus.  1976.  I<Algorithms + Data Structures =
         Programs.>  Prentice-Hall, Englewood Cliffs, NJ.

         =back

         Buy buy buy!

         =begin html

         <img src='wirth_spokesmodeling_book.png'>

         <hr>

         =end html

         Now now now!

         =end :biblio

       There, the "=begin html"..."=end html" region is nested inside the
       larger "=begin :biblio"..."=end :biblio" region.  Note that the content
       of the "=begin html"..."=end html" region consists of data
       paragraph(s), because the immediately containing region's identifier
       ("html") doesn't begin with a colon.

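       The rule that only the immediately containing region's identifier
       matters can be sketched with a stack of open "=begin" identifiers.
       This is a hypothetical sketch, not any real parser's code.  It treats
       a paragraph as a command if it matches m/\A=[a-zA-Z]/, which is
       assumed here as the test for a command paragraph.  Otherwise the
       paragraph is data when the innermost identifier lacks a ":", and
       ordinary or verbatim when it has one.  The helper
       forbidden_in_data_region, also hypothetical, flags the commands that
       such a data region should not directly contain.

```perl
use strict;
use warnings;

# Hypothetical: how a paragraph is handled, given the stack of currently
# open "=begin" identifiers (outermost first, innermost last).
sub classify_paragraph {
    my ($open, $para) = @_;

    # Assumed command test; it applies even inside a data region, which is
    # why a data paragraph cannot start with something like "=shazbot".
    return 'command' if $para =~ /\A=[a-zA-Z]/;

    # Only the immediately containing region's identifier matters.
    my $inner = @$open ? $open->[-1] : undef;
    return 'data' if defined $inner && $inner !~ /\A:/;

    return $para =~ /\A[ \t]/ ? 'verbatim' : 'ordinary';
}

# Hypothetical: commands that a region whose identifier lacks a ":"
# should not directly contain ("=cut", "=pod", and "=end" are fine).
sub forbidden_in_data_region {
    my ($open, $cmd) = @_;
    return 0 unless @$open && $open->[-1] !~ /\A:/;
    return $cmd =~ /\A=(?:head[1-4]|over|back|item)\b/ ? 1 : 0;
}

my @open = ('someformat', 'someotherformat', ':yetanotherformat');
print classify_paragraph(\@open, '  And this is a verbatim paragraph!'), "\n";  # verbatim
pop @open for 1 .. 2;    # after "=end :yetanotherformat" and "=end someotherformat"
print classify_paragraph(\@open, 'Another data paragraph!'), "\n";  # data
```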
       Pod parsers, when processing a series of data paragraphs one after
       another (within a single region), should consider them to be one large
       data paragraph that happens to contain blank lines.  So the content of
       the above "=begin html"..."=end html" region may be stored as two data
       paragraphs (one consisting of "<img
       src='wirth_spokesmodeling_book.png'>\n" and another consisting of
       "<hr>\n"), but should be stored as a single data paragraph (consisting
       of "<img src='wirth_spokesmodeling_book.png'>\n\n<hr>\n").

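       One way to meet this recommendation is to merge each run of adjacent
       data paragraphs, putting back the blank line that separated them.
       Below is a minimal sketch under some assumptions.  Each paragraph of one
       region is represented as a hypothetical [type, text] pair, each text
       already ends in "\n", and any non-data paragraph is assumed to end a
       run.  merge_data_runs is an invented name.

```perl
use strict;
use warnings;

# Hypothetical: @_ holds one region's paragraphs as [type, text] pairs.
# Each run of consecutive 'data' paragraphs becomes a single paragraph,
# with "\n" restoring the blank line between the original pieces.
sub merge_data_runs {
    my @out;
    for my $p (@_) {
        my ($type, $text) = @$p;
        if ($type eq 'data' && @out && $out[-1][0] eq 'data') {
            $out[-1][1] .= "\n" . $text;     # extend the current run
        }
        else {
            push @out, [$type, $text];       # copy, so input is untouched
        }
    }
    return @out;
}

my @merged = merge_data_runs(
    ['data', "<img src='wirth_spokesmodeling_book.png'>\n"],
    ['data', "<hr>\n"],
);
print scalar(@merged), " data paragraph:\n$merged[0][1]";
```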
       Pod processors should tolerate empty "=begin something"..."=end
       something" regions, empty "=begin :something"..."=end :something"
       regions, and contentless "=for something" and "=for :something"
       paragraphs.  I.e., these should be tolerated:

         =for html

         =begin html

         =end html

         =begin :biblio

         =end :biblio

       Incidentally, note that there's no easy way to express a data paragraph
       starting with something that looks like a command.  Consider:

         =begin stuff

         =shazbot

         =end stuff

       There, "=shazbot" will be parsed as a Pod command "shazbot", not as a
       data paragraph "=shazbot\n".  However, you can express a data paragraph
       consisting of "=shazbot\n" using this code:

         =for stuff =shazbot

       The situation where this is necessary is presumably quite rare.

       Note that "=end" commands must match the currently open "=begin"
       command.  That is, they must properly nest.  For example, this is
       valid:

         =begin outer

         X

         =begin inner

         Y

         =end inner

         Z

         =end outer

       while this is invalid:

         =begin outer

         X

         =begin inner

         Y

         =end outer

         Z

         =end inner

       The latter is improper because when the "=end outer" command is seen,
       the currently open region has the formatname "inner", not "outer".  (It
       just happens that "outer" is the format name of a higher-up region.)
       This is an error.  Processors must by default report this as an error,
       and may halt processing the document containing that error.  A
       corollary of this is that regions cannot "overlap".  That is, the latter
       block above does not represent a region called "outer" which contains X
       and Y, overlapping a region called "inner" which contains Y and Z.  But
       because it is invalid (as all apparently overlapping regions would be),
       it doesn't represent that, or anything at all.

       Similarly, this is invalid:

         =begin thing

         =end hting

       This is an error because the region is opened by "thing", and the
       "=end" tries to close "hting" [sic].

       This is also invalid:

         =begin thing

         =end

       This is invalid because every "=end" command must have a formatname
       parameter.

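       Because "=end" must close the innermost open region, a stack is enough
       to check nesting.  The sketch below is hypothetical (check_nesting is
       not from any real module).  It looks only at "=begin" and "=end"
       paragraphs, compares formatnames literally (so ":biblio" must be
       closed by ":biblio"), stops at the first error, and does not try to
       judge regions still open at the end of input, which this section does
       not address.

```perl
use strict;
use warnings;

# Hypothetical: verify that each "=end" closes the innermost open
# "=begin" and names a formatname.  Takes the document's paragraphs in
# order; returns undef if no error is found, else an error message.
sub check_nesting {
    my @open;                        # open formatnames, innermost last
    for my $para (@_) {
        if ($para =~ /\A=begin\s+(\S+)/) {
            push @open, $1;
        }
        elsif ($para =~ /\A=end\b\s*(\S*)/) {
            my $name = $1;
            return '"=end" without a formatname' if $name eq '';
            return qq{"=end $name" with no open region} unless @open;
            return qq{"=end $name" but the open region is "$open[-1]"}
                if $open[-1] ne $name;
            pop @open;
        }
    }
    return;                          # no error found
}

my $err = check_nesting('=begin outer', 'X', '=begin inner', 'Y',
                        '=end outer', 'Z', '=end inner');
print defined $err ? "error: $err\n" : "ok\n";
```

       Stopping at the first mismatch matches the permission to halt
       processing.  A processor that prefers to continue could instead record
       the message and keep scanning, but as noted above, the remainder of
       such a document has no defined meaning.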

AUTHOR

       Sean M. Burke



perl v5.36.0                      2022-08-30                    PERLPODSPEC(1)