PrettyPrinter(3)      User Contributed Perl Documentation     PrettyPrinter(3)

NAME

       HTML::PrettyPrinter - generate nice HTML files from HTML syntax trees

SYNOPSIS

         use HTML::TreeBuilder;
         # generate an HTML syntax tree
         my $tree = HTML::TreeBuilder->new;
         $tree->parse_file($file_name);
         # modify the tree if you want

         use HTML::PrettyPrinter;
         my $hpp = HTML::PrettyPrinter->new('linelength' => 130,
                                            'quote_attr' => 1);
         # configure
         $tree->address("0.1.0")->attr('_hpp_indent',0);  # for an individual element
         $hpp->set_force_nl(1,qw(body head));             # for tags
         $hpp->set_force_nl(1,qw(@SECTIONS));             # for a tag group
         $hpp->set_nl_inside(0,'default!');               # for all tags

         # format the source
         my $linearray_ref = $hpp->format($tree);
         print @$linearray_ref;

         # alternative: print directly to filehandle
         use FileHandle;
         my $fh = FileHandle->new(">$file_name2");
         if (defined $fh) {
           $hpp->select($fh);
           $hpp->format($tree);
           undef $fh;
           $hpp->select(undef);
         }

DESCRIPTION

       HTML::PrettyPrinter produces nicely formatted HTML code from an HTML
       syntax tree. It is especially useful if the produced HTML file is to
       be read or edited manually afterwards. Various parameters let you adapt
       the output to different styles and requirements.

       If you don't care what the HTML source looks like as long as it is
       valid and readable by browsers, you should use the as_HTML() method of
       HTML::Element instead of the pretty printer. It is about five times
       faster.

       The pretty printer handles line wrapping and indentation, and
       structures the output through the way the whitespace in the tree is
       represented. Furthermore, upper/lowercase markup, markup minimization,
       quoting of attribute values, the encoding of entities and the presence
       of optional end tags are configurable.

       There are two types of parameters that influence the output:
       individual parameters, which are set on a per-element and per-tag
       basis, and common parameters, which are set only once for each instance
       of a pretty printer.

       To facilitate the configuration, a mechanism to handle tag groups is
       provided. Thus, it is possible to modify a parameter for a group of
       tags (e.g. all known block elements) without writing each tag name
       explicitly. Perhaps the code for tag groups will move to another Perl
       module in the future.

       For HTML::Elements that require special treatment, like <PRE>, <XMP>,
       <SCRIPT>, comments and declarations, the pretty printer falls back to
       the "as_HTML()" method of the HTML elements.

INDIVIDUAL PARAMETERS

       The following individual parameters exist:

       indent n
           The indent of new lines inside the element is increased by n
           columns. Default is 2 for all tags.

       skip bool
           If true, the element and its content are omitted from the output.
           Default is false.

       nl_before n
           Number of newlines before the start tag. Default is 0 for inline
           elements and 1 for other elements.

       nl_inside n
           Number of newlines between the tags and the contents of an element.
           Default is 0.

       nl_after n
           Number of newlines after an element. Default is 0 for inline
           elements and 1 for other elements.

       force_nl bool
           Force line breaks before and after an element even if the HTML tree
           does not contain whitespace at this place. Default is false for
           inline elements and true for all other elements. This parameter has
           no effect if the common parameter allow_forced_nl is set to false.

       endtag bool
           Print an optional end tag. Default is true.

   Access Methods
       The following access methods exist for each individual parameter.
       Replace parameter by the respective name.

       $hpp->parameter($element)
           Takes a reference to an HTML element as argument. Returns the value
           of the parameter for that element. The value is looked up in the
           following order of priority:

           1.  The value of the element's internal attribute "_hpp_parameter".

           2.  The value specified inside the pretty printer for the tag of
               the element.

           3.  The value specified inside the pretty printer for 'default!'.

       $hpp->parameter('tag')
           Like "parameter($element)", except that only priorities 2 and 3 are
           evaluated.

       $hpp->set_parameter($value,'tag1','tag2',...)
           Sets the parameter for each tag in the list to $value.

           If $value is undefined, the entries for the tags are deleted.

           Besides individual tags, the list may include tag groups like
           '@BLOCK' (see below) and 'default!'. Individual tag names are
           written in lower case; the names of tag groups start with an '@'
           and are written in upper case letters. Tag groups are expanded
           during the call of "set_parameter()". 'default!' sets the default
           value, which is retrieved if no value is defined for the individual
           element or tag.

       $hpp->set_parameter($value,'all!')
           Deletes all existing settings for parameter inside the pretty
           printer and sets the default to $value.

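       The lookup order described for "$hpp->parameter($element)" can be
       pictured with a small stand-alone sketch. It does not use the module
       itself: the sub name lookup_parameter, the per-tag hash and the plain
       hashes standing in for HTML elements are all assumptions made for
       illustration, not the module's internals.

```perl
use strict;
use warnings;

# Resolve one individual parameter for an element, following the three
# priorities listed above. $element is a plain hash standing in for an
# HTML::Element; $per_tag is a hypothetical table of per-tag settings.
sub lookup_parameter {
    my ($per_tag, $name, $element) = @_;

    # 1. the element's own "_hpp_<name>" attribute
    my $own = $element->{"_hpp_$name"};
    return $own if defined $own;

    # 2. the setting stored for the element's tag
    my $by_tag = $per_tag->{ $element->{tag} };
    return $by_tag if defined $by_tag;

    # 3. the 'default!' entry
    return $per_tag->{'default!'};
}

my %indent = ('default!' => 2, 'pre' => 0);

print lookup_parameter(\%indent, 'indent', { tag => 'p' }), "\n";                      # 2
print lookup_parameter(\%indent, 'indent', { tag => 'pre' }), "\n";                    # 0
print lookup_parameter(\%indent, 'indent', { tag => 'pre', _hpp_indent => 4 }), "\n";  # 4
```

       Passing only a tag name, as in "$hpp->parameter('tag')", would
       correspond to skipping step 1 in this sketch.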

COMMON PARAMETERS

       tabify n
           If non-zero, each n spaces at the beginning of a line are converted
           into one TAB. Default is 8.

       linelength n
           The maximum number of characters a line should have. Default is 80.

           The linelength may be exceeded if there is no proper way to break a
           line without modifying the content, e.g. inside <PRE> and other
           special elements or if there is no whitespace.

       min_bool_attr bool
           Minimize boolean attributes, e.g. print <UL COMPACT> instead of <UL
           COMPACT=COMPACT>. Default is true.

       quote_attr bool
           Always quote attribute values. If false, attribute values
           consisting entirely of letters, digits, periods and hyphens are not
           put into quotes. Default is false.

       entities string
           The string contains all characters that are escaped to their entity
           names. Default is the bare minimum of "&<>" plus the non-breaking
           space 'nbsp' (because otherwise it is difficult for the human eye
           to distinguish it from a normal space in most editors).

       wrap_at_tagend NEVER|AFTER_ATTR|ALWAYS
           May the pretty printer wrap lines before the closing angle bracket
           of a start tag? Supported values are the predefined constants NEVER
           (allow line wraps at white space only), AFTER_ATTR (allow line
           wraps only at the end of tags that contain attributes) and ALWAYS
           (allow line wraps at the end of every start tag). Default is
           AFTER_ATTR.

       allow_forced_nl bool
           Allow the addition of white space that is not in the HTML tree.
           If set to false (the default), the force_nl parameter is ignored.
           It is recommended to set this parameter to true if the HTML tree
           was generated with ignore_ignorable_whitespace set to true.

       uppercase bool
           Use uppercase letters for markup. Default is the value of
           $HTML::Element::html_uc at the time the constructor is called.

   Access Method
       $hpp->parameter([value])
           Retrieves and optionally sets the parameter.

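       As a rough illustration of two of the common parameters, the
       following stand-alone sketch mimics what tabify and quote_attr are
       described to do. The subs tabify_line and attr_value are hypothetical
       helpers written for this example; they are not part of the module and
       ignore details such as quotes inside attribute values.

```perl
use strict;
use warnings;

# tabify: turn every full run of $n leading spaces into one TAB.
sub tabify_line {
    my ($line, $n) = @_;
    return $line unless $n;                       # 0 switches tabify off
    my ($lead, $rest) = $line =~ /\A( *)(.*)\z/s;
    my $tabs = int(length($lead) / $n);
    my $left = length($lead) - $tabs * $n;        # spaces that do not fill a TAB
    return join '', "\t" x $tabs, ' ' x $left, $rest;
}

# quote_attr: quote always if set, otherwise only when the value holds
# characters other than letters, digits, periods and hyphens.
sub attr_value {
    my ($value, $quote_attr) = @_;
    return $value if !$quote_attr && $value =~ /\A[A-Za-z0-9.\-]+\z/;
    return qq{"$value"};
}

print tabify_line((' ' x 10) . "<TD>\n", 8);            # TAB, two spaces, <TD>
print attr_value('LEFT', 0), "\n";                      # LEFT
print attr_value('mailto:schotten@gmx.de', 0), "\n";    # "mailto:schotten@gmx.de"
print attr_value('5', 1), "\n";                         # "5"
```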

OTHER METHODS

       $hpp = HTML::PrettyPrinter->new(%common_parameters)
           This class method creates a new HTML::PrettyPrinter and returns it.
           Key/value pair arguments may be provided to override the default
           settings of common parameters. There is currently no mechanism to
           override the default values for individual parameters at
           construction. Use the "$hpp->set_parameter()" methods instead.

       $hpp->select($fh)
           Select a FileHandle object for output.

           If a FileHandle is selected, the generated HTML is printed directly
           to that file. With $hpp->select(undef) you can switch back to the
           default behaviour.

       $line_array_ref = $hpp->format($tree,[$indent],[$line_array_ref])
           Format the HTML syntax (sub-) tree.

           $tree is not restricted to the root of the HTML syntax tree. A
           reference to any HTML::Element will do.

           The optional $indent indents the first element by $indent
           characters.

           The return value is a reference to an array with the generated
           lines. If such a reference is provided as third argument, the lines
           will be appended to that array. Otherwise a new array will be
           created.

           If a FileHandle was selected by a previous call of the
           "$hpp->select($fh)" method, the lines are printed to the FileHandle
           object directly. The array of lines is not changed in this case.

TAG GROUPS

       Tag groups are lists that contain the names of tags and other tag
       groups, which are considered subsets. This reflects the way allowed
       content is specified in HTML DTDs, where e.g. %flow consists of all
       %block and %inline elements and %inline covers several subsets like
       %phrase.

       If you add a tag name to a group A, it will be seen in any group that
       contains group A. Thus, it is easy to maintain groups of tags with
       similar properties (and to configure the pretty printer for these
       tags).

       The names of tag groups are written in upper case letters with a
       leading '@' (e.g. '@BLOCK'). The names of simple tags are written all
       lower case.

   Functions
       All the functions to handle and modify tag groups are included in the
       @EXPORT_OK list of "HTML::PrettyPrinter".

       @tag_groups = list_groups()
           Returns a list with the names of all defined tag groups.

       @tags = group_expand('tag_or_tag_group0',['tag_or_tag_group1',...])
           Returns a list of every tag in the tag groups and their subgroups.
           Each tag is listed once only. The order of the list is not
           specified.

       @tag_groups = sub_group('tag_group0',['tag_group1',...])
           Returns a list of every tag group and sub group in the list.  Each
           group is listed once only. The order of the list is not specified.

       group_get('@NAME')
           Return the (unexpanded) contents of a tag group.

       group_set('@NAME',['tag_or_tag_group0',...])
           Set a tag group.

       group_add('@NAME','tag_or_tag_group0',['tag_or_tag_group1',...])
           Add tags and tag groups to a group.

       group_remove('@NAME','tag_or_tag_group0',['tag_or_tag_group1',...])
           Remove tags or tag groups from a group. Subgroups are not expanded.
           Thus, "group_remove('@A','@B')" will remove '@B' from '@A' if it is
           included directly. Tags included in '@B' will not be removed from
           '@A'. Nor will '@A' be changed if '@B' is included in a subgroup
           of '@A' but not in '@A' directly.

   Predefined Tag Groups
       There are a couple of predefined tag groups. Use

         foreach my $tg (list_groups()) {
             print "'$tg' => qw(".join(',',group_get($tg)).")\n";
         }

       to get a list.

   Examples for tag groups
       1. create some groups

             group_set('@A',qw(a1 a2 a3));
             group_set('@B',qw(b1 b2));
             group_set('@C',qw(@A @B c1 @D));
             # @D needs to be defined when @C is expanded
             group_set('@D',qw(d1 @B));
             group_set('@E',qw(e1 @D));
             group_set('@F',qw(f1 @A));

       2. add tags

             group_add('@A',qw(a4 a5)); # @A contains (a1 a2 a3 a4 a5)
             group_add('@D',qw(d1));    # @D contains (d1 @B d1)
             group_add('@F',group_expand('@B'),'@F');
             # @F contains (f1 @A b1 b2 f1 @F)

       3. evaluate

             group_expand('@E');     # returns e1, d1, b1, b2
             sub_groups('@E');       # returns @B, @D
             sub_groups(qw(@E @F));  # returns @A, @B, @D
             group_get('@F');        # returns f1, @A, b1, b2, f1, @F

       4. remove tags

             group_remove('@E','@C');  # @E not changed, because it doesn't
                                       # contain @C
             group_remove('@E','@D');  # @D removed from @E
             group_remove('@D','d1');  # all d1's are removed. Now @D contains
                                       # @B only
             group_remove('@C','@B');  # @C now contains (@A c1 @D). Thus
             sub_groups('@C');         # still returns @A, @B, @D,
                                       # because @B is included in @D, too

       5. application

             # set the indent for tags b1, b2, e1, g1 to 0
             $hpp->set_indent(0,qw(@D @E g1));

           If the groups @D or @E are modified afterwards, the configuration
           of the pretty printer is not affected, because "set_indent()" will
           expand the tag groups.

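       How nested groups resolve to plain tags can be sketched without the
       module. The table below copies the state of the groups after step 2 of
       the examples above; the sub expand_groups and the hash layout are
       assumptions made for this illustration and say nothing about how
       group_expand() is actually implemented.

```perl
use strict;
use warnings;

# Hypothetical group table holding the state after step 2 of the examples.
my %groups = (
    '@A' => [qw(a1 a2 a3 a4 a5)],
    '@B' => [qw(b1 b2)],
    '@D' => [qw(d1 @B d1)],
    '@E' => [qw(e1 @D)],
    '@F' => [qw(f1 @A b1 b2 f1 @F)],   # contains itself
);

# Collect every plain tag reachable from the given names. Each group is
# visited once, so a self-referencing group cannot loop forever.
sub expand_groups {
    my ($table, @names) = @_;
    my (%tags, %visited);
    my @todo = @names;
    while (defined(my $name = shift @todo)) {
        if ($name =~ /\A\@/) {
            next if $visited{$name}++;
            push @todo, @{ $table->{$name} || [] };
        }
        else {
            $tags{$name} = 1;
        }
    }
    return sort keys %tags;   # sorted only to make the output stable
}

print join(' ', expand_groups(\%groups, '@E')), "\n";   # b1 b2 d1 e1
print join(' ', expand_groups(\%groups, '@F')), "\n";   # a1 a2 a3 a4 a5 b1 b2 f1
```

       The first line lists the same tags as the group_expand('@E') example,
       only in sorted order.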

EXAMPLE

       Consider the following HTML tree

           <html> @0
             <head> @0.0
               <title> @0.0.0
                 "Demonstrate HTML::PrettyPrinter"
             <body> @0.1
               <h1> @0.1.0
                 "Headline"
               <p align="JUSTIFY"> @0.1.1
                 "Some text in "
                 <b> @0.1.1.1
                   "bold"
                 " and "
                 <i> @0.1.1.3
                   "italics"
                 " and with 'ä' & 'ü'."
               <table align="LEFT" border=0> @0.1.2
                 <tr> @0.1.2.0
                   <td align="RIGHT"> @0.1.2.0.0
                     "top right"
                 <tr> @0.1.2.1
                   <td align="LEFT"> @0.1.2.1.0
                     "bottom left"
               <hr noshade="NOSHADE" size=5> @0.1.3
               <address> @0.1.4
                 <a href="mailto:schotten@gmx.de"> @0.1.4.0
                   "Claus Schotten"

       and

         $hpp = HTML::PrettyPrinter->new('uppercase' => 1);
         print @{$hpp->format($tree)};

       will print

         <HTML><HEAD><TITLE>Demonstrate
               HTML::PrettyPrinter</TITLE></HEAD><BODY><H1>Headline</H1><P
               ALIGN=JUSTIFY>Some text in <B>bold</B> and
               <I>italics</I> and with 'ä' &amp; 'ü'.</P><TABLE
               ALIGN=LEFT BORDER=0><TR><TD ALIGN=RIGHT>top
                   right</TD></TR><TR><TD ALIGN=LEFT>bottom
                   left</TD></TR></TABLE><HR NOSHADE SIZE=5
               ><ADDRESS><A HREF="mailto:schotten@gmx.de"
                 >Claus&nbsp;Schotten</A></ADDRESS></BODY></HTML>

       That doesn't look very nice. What went wrong? By default
       HTML::PrettyPrinter takes a conservative approach to whitespace. It
       will enlarge existing whitespace, but it will not introduce new
       whitespace outside of tags, because that might change the way a browser
       renders the HTML document. However, the HTML tree was constructed with
       "ignore_ignorable_whitespace" turned on. Thus, there is no whitespace
       between block elements that the pretty printer could format. So the
       pretty printer does line wrapping and indentation only. E.g. the title
       is in the third level of the tree. Thus, the second line is indented
       six characters. The table cells in the fifth level are indented by ten
       characters. Furthermore, you see that whitespace is inserted after the
       last attribute of the <A> tag.

       Let's set $hpp->allow_forced_nl(1). Now the force_nl parameters are
       enabled. By default, they are set for all non-inline tags. That creates

        <HTML>
          <HEAD>
            <TITLE>Demonstrate HTML::PrettyPrinter</TITLE>
          </HEAD>
          <BODY>
            <H1>Headline</H1>
            <P ALIGN=JUSTIFY>Some text in <B>bold</B> and
              <I>italics</I> and with 'ä' &amp; 'ü'.</P>
            <TABLE ALIGN=LEFT BORDER=0>
              <TR>
                <TD ALIGN=RIGHT>top right</TD>
              </TR>
              <TR>
                <TD ALIGN=LEFT>bottom left</TD>
              </TR>
            </TABLE>
            <HR NOSHADE SIZE=5>
            <ADDRESS><A HREF="mailto:schotten@gmx.de"
                >Claus&nbsp;Schotten</A></ADDRESS>
          </BODY>
        </HTML>

       Much better, isn't it? Now let's improve the structuring.

         $hpp->set_nl_before(2,qw(body table));
         $hpp->set_nl_after(2,qw(table));

       will require two new lines in front of <body> and <table> tags and
       after <table> tags.

        <HTML>
          <HEAD>
            <TITLE>Demonstrate HTML::PrettyPrinter</TITLE>
          </HEAD>

          <BODY>
            <H1>Headline</H1>
            <P ALIGN=JUSTIFY>Some text in <B>bold</B> and
              <I>italics</I> and with 'ä' &amp; 'ü'.</P>

            <TABLE ALIGN=LEFT BORDER=0>
              <TR>
                <TD ALIGN=RIGHT>top right</TD>
              </TR>
              <TR>
                <TD ALIGN=LEFT>bottom left</TD>
              </TR>
            </TABLE>

            <HR NOSHADE SIZE=5>
            <ADDRESS><A HREF="mailto:schotten@gmx.de"
                >Claus&nbsp;Schotten</A></ADDRESS>
          </BODY>
        </HTML>

       Currently the mail address is the only attribute value which is quoted.
       Here the quotes are required by the '@' character. For all other
       attribute values quotes are optional and thus omitted by default.
       $hpp->quote_attr(1) will turn the quotes on.

       $hpp->set_endtag(0,'all!') turns all optional end tags off. This
       affects the </p> (and should affect </tr> and </td>, see below).
       Alternatively, we could use $hpp->set_endtag(0,'default!'). That would
       turn the default off, too. But it wouldn't delete settings for
       individual tags that supersede the default.

       $hpp->set_nl_after(3,'head') requires three new lines after the <head>
       element. Because there are already two new lines required by the start
       of <body>, only one additional line is added.

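       The remark about <head> and <body> suggests that adjacent newline
       requests are merged rather than added up. A minimal sketch of that
       reading follows; the sub newlines_between is hypothetical and the
       "larger request wins" rule is an assumption derived from the sentence
       above, not taken from the module's code.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Newlines printed between two adjacent elements: the larger of the two
# requests wins, they are not added up (an assumption read off the text).
sub newlines_between {
    my ($nl_after_prev, $nl_before_next) = @_;
    return max($nl_after_prev, $nl_before_next);
}

print newlines_between(3, 2), "\n";   # 3: </head> asks for 3, <body> for 2
print newlines_between(1, 2), "\n";   # 2: the default of 1 against <body>'s 2
```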
       $hpp->set_force_nl(0,'td') will inhibit the introduction of whitespace
       around <td>. Thus, the table cells are now on the same line as the
       table rows.

         <HTML>
           <HEAD>
             <TITLE>Demonstrate HTML::PrettyPrinter</TITLE>
           </HEAD>


           <BODY>
             <H1>Headline</H1>
             <P ALIGN="JUSTIFY">Some text in <B>bold</B> and
               <I>italics</I> and with 'ä' &amp; 'ü'.

             <TABLE ALIGN="LEFT" BORDER="0">
               <TR><TD ALIGN="RIGHT">top right</TD></TR>
               <TR><TD ALIGN="LEFT">bottom left</TD></TR>
             </TABLE>

             <HR NOSHADE SIZE="5">
             <ADDRESS><A HREF="mailto:schotten@gmx.de"
                 >Claus&nbsp;Schotten</A></ADDRESS>
           </BODY>
         </HTML>

       The end tags </td> and </tr> are printed because HTML::Tagset says they
       are mandatory.

         $HTML::Tagset::optionalEndTag{$_} = 1 for qw(td tr th);

       will fix that.

       The additional new line after </head> doesn't look nice. With
       $hpp->set_nl_after(undef,'head') we will reset the parameter for the
       <head> tag.

       $hpp->entities($hpp->entities().'ä') will enforce the entity encoding
       of 'ä'.

       $hpp->min_bool_attr(0) will inhibit the minimization of the NOSHADE
       attribute of <hr>.

       Let's fiddle with the indentation:

         $hpp->set_indent(8,'@TEXTBLOCK');
         $hpp->set_indent(0,'html');

       New lines inside text blocks (here inside <h1>, <p> and <address>) will
       be indented by 8 characters instead of two, whereas the code directly
       under <html> will not be indented.

        <HTML>
        <HEAD>
          <TITLE>Demonstrate HTML::PrettyPrinter</TITLE>
        </HEAD>

        <BODY>
          <H1>Headline</H1>
          <P ALIGN="JUSTIFY">Some text in <B>bold</B> and
                  <I>italics</I> and with '&auml;' &amp; 'ü'.

          <TABLE ALIGN="LEFT" BORDER="0">
            <TR><TD ALIGN="RIGHT">top right
            <TR><TD ALIGN="LEFT">bottom left
          </TABLE>

          <HR NOSHADE="NOSHADE" SIZE="5">
          <ADDRESS><A HREF="mailto:schotten@gmx.de"
                    >Claus&nbsp;Schotten</A></ADDRESS>
        </BODY>
        </HTML>

       $hpp->wrap_at_tagend(HTML::PrettyPrinter::NEVER) will disable the line
       wrap between the attribute and the '>' of the <a> tag. The resulting
       line exceeds the target line length by far, but there is no point left
       where the pretty printer could legally break this line.

       $hpp->set_endtag(1,'tr') will override the default. Thus, the </tr>
       appears in the code whereas the other optional end tags are still
       omitted.

       Finally, we customize some individual elements:

       $tree->address('0.1.1')->attr('_hpp_skip',1)
           will omit the <p> and its content from the output

       $tree->address('0.1.2.1.0')->attr('_hpp_force_nl',1)
           will force new lines around the second <td>, but will not affect
           the first <td>.

        <HTML>
        <HEAD>
          <TITLE>Demonstrate HTML::PrettyPrinter</TITLE>
        </HEAD>

        <BODY>
          <H1>Headline</H1>

          <TABLE ALIGN="LEFT" BORDER="0">
            <TR><TD ALIGN="RIGHT">top right</TR>
            <TR>
              <TD ALIGN="LEFT">bottom left
            </TR>
          </TABLE>

          <HR NOSHADE="NOSHADE" SIZE="5">
          <ADDRESS><A
                    HREF="mailto:schotten@gmx.de">Claus&nbsp;Schotten</A></ADDRESS>
        </BODY>
        </HTML>

KNOWN BUGS

       •   This is early alpha code. The interfaces are subject to change.

       •   The module is tested with perl 5.005_03 only. It should work with
           perl 5.004 though.

       •   The predefined tag groups are incomplete. Several tags need to be
           added.

       •   Attribute values from a fixed set given in the DTD (e.g.
           ALIGN=LEFT|RIGHT etc.) should be converted to upper or lower case
           depending on the value of the uppercase parameter. Currently, they
           are printed as given in the HTML tree.

       •   No optimization for performance was done.

SEE ALSO

       HTML::TreeBuilder, HTML::Element, HTML::Tagset

COPYRIGHT

       Copyright 2000 Claus Schotten  schotten@gmx.de

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

AUTHOR

       Claus Schotten <schotten@gmx.de>

POD ERRORS

       Hey! The above document had some coding errors, which are explained
       below:

       Around line 954:
           Non-ASCII character seen before =encoding in 'print "'$tg''.
           Assuming UTF-8

perl v5.34.0                      2021-07-22                  PrettyPrinter(3)