PrettyPrinter(3)      User Contributed Perl Documentation      PrettyPrinter(3)

NAME
    HTML::PrettyPrinter - generate nice HTML files from HTML syntax trees

SYNOPSIS
        use HTML::TreeBuilder;
        # generate an HTML syntax tree
        my $tree = HTML::TreeBuilder->new;
        $tree->parse_file($file_name);
        # modify the tree if you want

        use HTML::PrettyPrinter;
        my $hpp = HTML::PrettyPrinter->new('linelength' => 130,
                                           'quote_attr' => 1);
        # configure
        $tree->address("0.1.0")->attr('_hpp_indent',0); # for an individual element
        $hpp->set_force_nl(1,qw(body head));            # for tags
        $hpp->set_force_nl(1,qw(@SECTIONS));            # for a tag group
        $hpp->set_nl_inside(0,'default!');              # for all tags

        # format the source
        my $linearray_ref = $hpp->format($tree);
        print @$linearray_ref;

        # alternative: print directly to filehandle
        use FileHandle;
        my $fh = FileHandle->new(">$file_name2");
        if (defined $fh) {
            $hpp->select($fh);
            $hpp->format($tree);
            undef $fh;
            $hpp->select(undef);
        }

DESCRIPTION
    HTML::PrettyPrinter produces nicely formatted HTML code from an HTML
    syntax tree. It is especially useful if the produced HTML file will
    be read or edited manually afterwards. Various parameters let you
    adapt the output to different styles and requirements.

    If you don't care what the HTML source looks like as long as it is
    valid and readable by browsers, you should use the as_HTML() method
    of HTML::Element instead of the pretty printer. It is about five
    times faster.

    The pretty printer handles line wrapping, indentation and
    structuring by the way the whitespace in the tree is represented in
    the output. Furthermore, upper/lowercase markup, markup
    minimization, quoting of attribute values, the encoding of entities
    and the presence of optional end tags are configurable.

    There are two types of parameters that influence the output:
    individual parameters that are set on a per-element and per-tag
    basis, and common parameters that are set only once for each
    instance of a pretty printer.

    To facilitate configuration, a mechanism to handle tag groups is
    provided. Thus, it is possible to modify a parameter for a group of
    tags (e.g. all known block elements) without writing each tag name
    explicitly. Perhaps the code for tag groups will move to another
    Perl module in the future.

    For HTML::Elements that require special treatment, like <PRE>,
    <XMP>, <SCRIPT>, comments and declarations, the pretty printer falls
    back to the "as_HTML()" method of the HTML elements.

INDIVIDUAL PARAMETERS
    The following individual parameters exist:

    indent n
        The indent of new lines inside the element is increased by n
        columns. Default is 2 for all tags.

    skip bool
        If true, the element and its content are omitted from the
        output. Default is false.

    nl_before n
        Number of newlines before the start tag. Default is 0 for inline
        elements and 1 for other elements.

    nl_inside n
        Number of newlines between the tags and the contents of an
        element. Default is 0.

    nl_after n
        Number of newlines after an element. Default is 0 for inline
        elements and 1 for other elements.

    force_nl bool
        Force line breaks before and after an element even if the HTML
        tree does not contain whitespace at this place. Default is false
        for inline elements and true for all other elements. This
        parameter is overridden if the common parameter allow_forced_nl
        is set to false.

    endtag bool
        Print an optional end tag. Default is true.

  Access Methods
    The following access methods exist for each individual parameter.
    Replace parameter by the respective name.

    $hpp->parameter($element)
        Takes a reference to an HTML element as argument. Returns the
        value of the parameter for that element. The value is looked up
        in the following order of priority:

        1. The value of the element's internal attribute
           "_hpp_parameter".

        2. The value specified inside the pretty printer for the tag of
           the element.

        3. The value specified inside the pretty printer for 'default!'.

    $hpp->parameter('tag')
        Like "parameter($element)", except that only priorities 2 and 3
        are evaluated.

    $hpp->set_parameter($value,'tag1','tag2',...)
        Sets the parameter for each tag in the list to $value.

        If $value is undefined, the entries for the tags are deleted.

        Besides individual tags, the list may include tag groups like
        '@BLOCK' (see below) and 'default!'. Individual tag names are
        written in lower case; the names of tag groups start with an '@'
        and are written in upper case letters. Tag groups are expanded
        during the call of "set_parameter()". 'default!' sets the
        default value, which is retrieved if no value is defined for the
        individual element or tag.

    $hpp->set_parameter($value,'all!')
        Deletes all existing settings for parameter inside the pretty
        printer and sets the default to $value.
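
    The lookup priority and the effect of undef and 'all!' described
    above can be modelled in a few lines of plain Perl. This is only a
    sketch under stated assumptions, not the module's implementation:
    the hash %indent, the helpers set_param() and get_param(), and the
    use of plain hashes in place of HTML::Element objects are all
    hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the per-tag settings of one individual
# parameter (here: indent). Not the module's real data layout.
my %indent = ('default!' => 2);

# set_param($store, $value, @tags): an undefined $value deletes the
# entries; 'all!' wipes every setting and installs $value as the default.
sub set_param {
    my ($store, $value, @tags) = @_;
    for my $tag (@tags) {
        if ($tag eq 'all!') {
            %$store = ('default!' => $value);
        }
        elsif (defined $value) {
            $store->{$tag} = $value;
        }
        else {
            delete $store->{$tag};
        }
    }
    return;
}

# get_param($store, $name, $element): $element is a plain hash with a
# 'tag' key and an optional "_hpp_<name>" key, mimicking an element.
sub get_param {
    my ($store, $name, $element) = @_;
    my $own = $element->{"_hpp_$name"};
    return $own if defined $own;                  # priority 1: element
    my $by_tag = $store->{ $element->{tag} };
    return $by_tag if defined $by_tag;            # priority 2: tag
    return $store->{'default!'};                  # priority 3: default
}

set_param(\%indent, 0, 'html');
my $html  = { tag => 'html' };
my $p     = { tag => 'p' };
my $p_own = { tag => 'p', _hpp_indent => 8 };

printf "%s: %d\n", $_->[0], get_param(\%indent, 'indent', $_->[1])
    for ['html', $html], ['p', $p], ['p with own value', $p_own];
```

    Testing with defined() rather than for truth matters here, because
    0 is a legitimate value for parameters such as indent.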

COMMON PARAMETERS
    tabify n
        If non-zero, each n spaces at the beginning of a line are
        converted into one TAB. Default is 8.

    linelength n
        The maximum number of characters a line should have. Default is
        80.

        The linelength may be exceeded if there is no proper way to
        break a line without modifying the content, e.g. inside <PRE>
        and other special elements or if there is no whitespace.

    min_bool_attr bool
        Minimize boolean attributes, e.g. print <UL COMPACT> instead of
        <UL COMPACT=COMPACT>. Default is true.

    quote_attr bool
        Always quote attribute values. If false, attribute values
        consisting entirely of letters, digits, periods and hyphens are
        not put into quotes. Default is false.

    entities string
        The string contains all characters that are escaped to their
        entity names. Default is the bare minimum of "&<>" plus the
        non-breaking space 'nbsp' (because otherwise it is difficult for
        the human eye to distinguish it from a normal space in most
        editors).

    wrap_at_tagend NEVER|AFTER_ATTR|ALWAYS
        May the pretty printer wrap lines before the closing angle
        bracket of a start tag? Supported values are the predefined
        constants NEVER (allow line wraps at white space only),
        AFTER_ATTR (allow line wraps only at the end of tags that
        contain attributes) and ALWAYS (allow line wraps at the end of
        every start tag). Default is AFTER_ATTR.

    allow_forced_nl bool
        Allow the addition of white space that is not in the HTML tree.
        If set to false (the default), the force_nl parameter is
        ignored. It is recommended to set this parameter to true if the
        HTML tree was generated with ignore_ignorable_whitespace set to
        true.

    uppercase bool
        Use uppercase letters for markup. Default is the value of
        $HTML::Element::html_uc at the time the constructor is called.

  Access Method
    $hpp->parameter([value])
        Retrieves and optionally sets the parameter.
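
    Two of the rules above, tabify and quote_attr, are simple enough to
    illustrate in plain Perl. The helpers tabify_line() and attr_value()
    are hypothetical names for this sketch and are not part of the
    module. It is also an assumption that leftover spaces (fewer than n)
    stay in place after the TABs, and escaping of quote characters
    inside a value is ignored.

```perl
use strict;
use warnings;

# tabify: replace each full run of $n leading spaces by one TAB.
# Assumption: a remainder of fewer than $n spaces is kept as spaces.
sub tabify_line {
    my ($line, $n) = @_;
    return $line unless $n;                     # 0 disables tabify
    my ($lead, $rest) = $line =~ /^( *)(.*)$/s;
    my $tabs = int(length($lead) / $n);
    return ("\t" x $tabs) . (' ' x (length($lead) % $n)) . $rest;
}

# quote_attr: with quote_attr off, a value stays unquoted only if it
# consists entirely of letters, digits, periods and hyphens.
sub attr_value {
    my ($value, $quote_attr) = @_;
    return $value if !$quote_attr && $value =~ /\A[A-Za-z0-9.-]+\z/;
    return qq{"$value"};
}

print attr_value('LEFT', 0), "\n";
print attr_value('LEFT', 1), "\n";
print attr_value('mailto:schotten@gmx.de', 0), "\n";
```

    The last call shows why a mailto: address is quoted even when
    quote_attr is false: ':' and '@' are outside the permitted set.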

METHODS
    $hpp = HTML::PrettyPrinter->new(%common_parameters)
        This class method creates a new HTML::PrettyPrinter and returns
        it. Key/value pair arguments may be provided to override the
        default settings of common parameters. There is currently no
        mechanism to override the default values for individual
        parameters at construction. Use the "$hpp->set_parameter()"
        methods instead.

    $hpp->select($fh)
        Select a FileHandle object for output.

        If a FileHandle is selected, the generated HTML is printed
        directly to that file. With $hpp->select(undef) you can switch
        back to the default behaviour.

    $line_array_ref = $hpp->format($tree,[$indent],[$line_array_ref])
        Format the HTML syntax (sub-)tree.

        $tree is not restricted to the root of the HTML syntax tree. A
        reference to any HTML::Element will do.

        The optional $indent indents the first element by $indent
        characters.

        The return value is a reference to an array with the generated
        lines. If such a reference is provided as third argument, the
        lines will be appended to that array. Otherwise a new array will
        be created.

        If a FileHandle was selected by a previous call of the
        "$hpp->select($fh)" method, the lines are printed to the
        FileHandle object directly. The array of lines is not changed in
        this case.

TAG GROUPS
    Tag groups are lists that contain the names of tags and other tag
    groups, which are treated as subsets. This reflects the way allowed
    content is specified in HTML DTDs, where e.g. %flow consists of all
    %block and %inline elements and %inline covers several subsets like
    %phrase.

    If you add a tag name to a group A, it will be seen in any group
    that contains group A. Thus, it is easy to maintain groups of tags
    with similar properties (and to configure the HTML pretty printer
    for these tags).

    The names of tag groups are written in upper case letters with a
    leading '@' (e.g. '@BLOCK'). The names of simple tags are written
    all lower case.

  Functions
    All the functions to handle and modify tag groups are included in
    the @EXPORT_OK list of "HTML::PrettyPrinter".

    @tag_groups = list_groups()
        Returns a list with the names of all defined tag groups.

    @tags = group_expand('tag_or_tag_group0',['tag_or_tag_group1',...])
        Returns a list of every tag in the tag groups and their
        subgroups. Each tag is listed only once. The order of the list
        is not specified.

    @tag_groups = sub_group('tag_group0',['tag_group1',...])
        Returns a list of every tag group and subgroup in the list. Each
        group is listed only once. The order of the list is not
        specified.

    group_get('@NAME')
        Returns the (unexpanded) contents of a tag group.

    group_set('@NAME',['tag_or_tag_group0',...])
        Sets a tag group.

    group_add('@NAME','tag_or_tag_group0',['tag_or_tag_group1',...])
        Adds tags and tag groups to a group.

    group_remove('@NAME','tag_or_tag_group0',['tag_or_tag_group1',...])
        Removes tags or tag groups from a group. Subgroups are not
        expanded. Thus, "group_remove('@A','@B')" will remove '@B' from
        '@A' if it is included directly. Tags included in '@B' will not
        be removed from '@A'. Nor will '@A' be changed if '@B' is
        included in a subgroup of '@A' but not in '@A' directly.

  Predefined Tag Groups
    There are a couple of predefined tag groups. Use

        foreach my $tg (list_groups()) {
            print "'$tg' => qw(".join(',',group_get($tg)).")\n";
        }

    to get a list.

  Examples for tag groups
    1. create some groups

        group_set('@A',qw(a1 a2 a3));
        group_set('@B',qw(b1 b2));
        group_set('@C',qw(@A @B c1 @D));
        # @D needs to be defined when @C is expanded
        group_set('@D',qw(d1 @B));
        group_set('@E',qw(e1 @D));
        group_set('@F',qw(f1 @A));

    2. add tags

        group_add('@A',qw(a4 a5)); # @A contains (a1 a2 a3 a4 a5)
        group_add('@D',qw(d1));    # @D contains (d1 @B d1)
        group_add('@F',group_expand('@B'),'@F');
                                   # @F contains (f1 @A b1 b2 f1 @F)

    3. evaluate

        group_expand('@E');        # returns e1, d1, b1, b2
        sub_groups('@E');          # returns @B, @D
        sub_groups(qw(@E @F));     # returns @A, @B, @D
        group_get('@F');           # returns f1, @A, b1, b2, f1, @F

    4. remove tags

        group_remove('@E','@C');   # @E not changed, because it doesn't
                                   # contain @C
        group_remove('@E','@D');   # @D removed from @E
        group_remove('@D','d1');   # all d1's are removed. Now @D
                                   # contains @B only
        group_remove('@C','@B');   # @C now contains (@A c1 @D). Thus
        sub_groups('@C');          # still returns @A, @B, @D,
                                   # because @B is included in @D, too

    5. application

        # set the indent for tags b1, b2, e1, g1 to 0
        $hpp->set_indent(0,qw(@D @E g1));

       If the groups @D or @E are modified afterwards, the configuration
       of the pretty printer is not affected, because "set_indent()"
       expands the tag groups at the time of the call.
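
    The expansion of nested groups and the snapshot behaviour described
    in step 5 can be modelled without the module. The following is a
    minimal sketch in plain Perl, not the module's code: %group holds
    the state of the example groups after step 4, and expand() and
    %indent_for are hypothetical names. The recursion keeps a "seen"
    table so that a group containing itself, like @F, does not loop
    forever.

```perl
use strict;
use warnings;

# Hypothetical group table: the example groups as they stand after step 4.
my %group = (
    '@A' => [qw(a1 a2 a3 a4 a5)],
    '@B' => [qw(b1 b2)],
    '@C' => [qw(@A c1 @D)],
    '@D' => [qw(@B)],
    '@E' => [qw(e1)],
    '@F' => [qw(f1 @A b1 b2 f1 @F)],    # contains itself
);

# Expand tags and groups into a sorted, duplicate-free list of tag names.
# %$seen records visited groups and so guards against cyclic definitions.
sub expand {
    my ($groups, $seen, @names) = @_;
    my %tags;
    for my $name (@names) {
        if ($name =~ /^\@/) {
            next if $seen->{$name}++;
            $tags{$_} = 1
                for expand($groups, $seen, @{ $groups->{$name} || [] });
        }
        else {
            $tags{$name} = 1;
        }
    }
    return sort keys %tags;
}

# Snapshot semantics: groups are expanded when the setting is stored,
# so later changes to a group do not reach the stored configuration.
my %indent_for;
$indent_for{$_} = 0 for expand(\%group, {}, qw(@D @E g1));
push @{ $group{'@D'} }, 'd2';           # later change to @D

print join(' ', sort keys %indent_for), "\n";    # b1 b2 e1 g1
print join(' ', expand(\%group, {}, '@F')), "\n";
```

    The sort is only there to make the output stable; the documented
    functions leave the order of their results unspecified.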

EXAMPLE
    Consider the following HTML tree

        <html> @0
          <head> @0.0
            <title> @0.0.0
              "Demonstrate HTML::PrettyPrinter"
          <body> @0.1
            <h1> @0.1.0
              "Headline"
            <p align="JUSTIFY"> @0.1.1
              "Some text in "
              <b> @0.1.1.1
                "bold"
              " and "
              <i> @0.1.1.3
                "italics"
              " and with 'ä' & 'ü'."
            <table align="LEFT" border=0> @0.1.2
              <tr> @0.1.2.0
                <td align="RIGHT"> @0.1.2.0.0
                  "top right"
              <tr> @0.1.2.1
                <td align="LEFT"> @0.1.2.1.0
                  "bottom left"
            <hr noshade="NOSHADE" size=5> @0.1.3
            <address> @0.1.4
              <a href="mailto:schotten@gmx.de"> @0.1.4.0
                "Claus Schotten"

    and

        $hpp = HTML::PrettyPrinter->new('uppercase' => 1);
        print @{$hpp->format($tree)};

    will print

        <HTML><HEAD><TITLE>Demonstrate
        HTML::PrettyPrinter</TITLE></HEAD><BODY><H1>Headline</H1><P
        ALIGN=JUSTIFY>Some text in <B>bold</B> and
        <I>italics</I> and with 'ä' &amp; 'ü'.</P><TABLE
        ALIGN=LEFT BORDER=0><TR><TD ALIGN=RIGHT>top
        right</TD></TR><TR><TD ALIGN=LEFT>bottom
        left</TD></TR></TABLE><HR NOSHADE SIZE=5
        ><ADDRESS><A HREF="mailto:schotten@gmx.de"
        >Claus Schotten</A></ADDRESS></BODY></HTML>

    That doesn't look very nice. What went wrong? By default,
    HTML::PrettyPrinter takes a conservative approach to whitespace. It
    will enlarge existing whitespace, but it will not introduce new
    whitespace outside of tags, because that might change the way a
    browser renders the HTML document. However, the HTML tree was
    constructed with ignore_ignorable_whitespace turned on. Thus, there
    is no whitespace between block elements that the pretty printer
    could format. So the pretty printer does line wrapping and
    indentation only. E.g. the title is in the third level of the tree.
    Thus, the second line is indented six characters. The table cells in
    the fifth level are indented by ten characters. Furthermore, you see
    that whitespace is inserted after the last attribute of the <A> tag.

    Let's set $hpp->allow_forced_nl(1);. Now the force_nl parameters are
    enabled. By default, they are set for all non-inline tags. That
    creates

        <HTML>
        <HEAD>
        <TITLE>Demonstrate HTML::PrettyPrinter</TITLE>
        </HEAD>
        <BODY>
        <H1>Headline</H1>
        <P ALIGN=JUSTIFY>Some text in <B>bold</B> and
        <I>italics</I> and with 'ä' &amp; 'ü'.</P>
        <TABLE ALIGN=LEFT BORDER=0>
        <TR>
        <TD ALIGN=RIGHT>top right</TD>
        </TR>
        <TR>
        <TD ALIGN=LEFT>bottom left</TD>
        </TR>
        </TABLE>
        <HR NOSHADE SIZE=5>
        <ADDRESS><A HREF="mailto:schotten@gmx.de"
        >Claus Schotten</A></ADDRESS>
        </BODY>
        </HTML>

    Much better, isn't it? Now let's improve the structuring.
    $hpp->set_nl_before(2,qw(body table));
    $hpp->set_nl_after(2,qw(table)); will require two new lines in front
    of <body> and <table> tags and after <table> tags.

        <HTML>
        <HEAD>
        <TITLE>Demonstrate HTML::PrettyPrinter</TITLE>
        </HEAD>

        <BODY>
        <H1>Headline</H1>
        <P ALIGN=JUSTIFY>Some text in <B>bold</B> and
        <I>italics</I> and with 'ä' &amp; 'ü'.</P>

        <TABLE ALIGN=LEFT BORDER=0>
        <TR>
        <TD ALIGN=RIGHT>top right</TD>
        </TR>
        <TR>
        <TD ALIGN=LEFT>bottom left</TD>
        </TR>
        </TABLE>

        <HR NOSHADE SIZE=5>
        <ADDRESS><A HREF="mailto:schotten@gmx.de"
        >Claus Schotten</A></ADDRESS>
        </BODY>
        </HTML>

    Currently, the mail address is the only attribute value that is
    quoted. Here the quotes are required by the '@' character. For all
    other attribute values quotes are optional and thus omitted by
    default. $hpp->quote_attr(1); will turn the quotes on.

    $hpp->set_endtag(0,'all!') turns all optional end tags off. This
    affects the </p> (and should affect </tr> and </td>, see below).
    Alternatively, we could use $hpp->set_endtag(0,'default!'). That
    would turn the default off, too. But it wouldn't delete settings for
    individual tags that override the default.

    $hpp->set_nl_after(3,'head') requires three new lines after the
    <head> element. Because two new lines are already required by the
    start of <body>, only one additional line is added.
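
    The arithmetic in the last paragraph suggests that the newline
    requests of two adjacent elements are not added up; the larger one
    wins. A minimal sketch of that reading in plain Perl follows. The
    helper newlines_between() is hypothetical and only models the
    behaviour described here, it is not taken from the module.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Assumed model: the number of newlines between two adjacent elements is
# the larger of the first element's nl_after and the second element's
# nl_before, not their sum.
sub newlines_between {
    my ($nl_after_prev, $nl_before_next) = @_;
    return max($nl_after_prev, $nl_before_next);
}

# <head> with nl_after 3 followed by <body> with nl_before 2
print newlines_between(3, 2), "\n";    # 3: one more than <body> alone
# <table> with nl_after 2 followed by <hr> with the default nl_before 1
print newlines_between(2, 1), "\n";    # 2
```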

    $hpp->set_force_nl(0,'td') will inhibit the introduction of
    whitespace around <td>. Thus, the table cells are now on the same
    line as the table rows.

        <HTML>
        <HEAD>
        <TITLE>Demonstrate HTML::PrettyPrinter</TITLE>
        </HEAD>


        <BODY>
        <H1>Headline</H1>
        <P ALIGN="JUSTIFY">Some text in <B>bold</B> and
        <I>italics</I> and with 'ä' &amp; 'ü'.

        <TABLE ALIGN="LEFT" BORDER="0">
        <TR><TD ALIGN="RIGHT">top right</TD></TR>
        <TR><TD ALIGN="LEFT">bottom left</TD></TR>
        </TABLE>

        <HR NOSHADE SIZE="5">
        <ADDRESS><A HREF="mailto:schotten@gmx.de"
        >Claus Schotten</A></ADDRESS>
        </BODY>
        </HTML>

    The end tags </td> and </tr> are printed because HTML::Tagset says
    they are mandatory.
    map {$HTML::Tagset::optionalEndTag{$_}=1} qw(td tr th); will fix
    that.

    The additional new line after </head> doesn't look nice. With
    $hpp->set_nl_after(undef,'head') we reset the parameter for the
    <head> tag.

    $hpp->entities($hpp->entities().'ä'); will enforce the entity
    encoding of 'ä'.

    $hpp->min_bool_attr(0); will inhibit the minimization of the NOSHADE
    attribute of <hr>.

    Let's fiddle with the indentation:
    $hpp->set_indent(8,'@TEXTBLOCK');
    $hpp->set_indent(0,'html');

    New lines inside text blocks (here inside <h1>, <p> and <address>)
    will be indented by 8 characters instead of two, whereas the code
    directly under <html> will not be indented.

        <HTML>
        <HEAD>
        <TITLE>Demonstrate HTML::PrettyPrinter</TITLE>
        </HEAD>

        <BODY>
        <H1>Headline</H1>
        <P ALIGN="JUSTIFY">Some text in <B>bold</B> and
        <I>italics</I> and with '&auml;' &amp; 'ü'.

        <TABLE ALIGN="LEFT" BORDER="0">
        <TR><TD ALIGN="RIGHT">top right
        <TR><TD ALIGN="LEFT">bottom left
        </TABLE>

        <HR NOSHADE="NOSHADE" SIZE="5">
        <ADDRESS><A HREF="mailto:schotten@gmx.de"
        >Claus Schotten</A></ADDRESS>
        </BODY>
        </HTML>

    $hpp->wrap_at_tagend(HTML::PrettyPrinter::NEVER); will disable the
    line wrap between the attribute and the '>' of the <a> tag. The
    resulting line exceeds the target line length by far, but there is
    no point left where the pretty printer could legally break this
    line.

    $hpp->set_endtag(1,'tr') will override the default. Thus, the </tr>
    appears in the code whereas the other optional end tags are still
    omitted.

    Finally, we customize some individual elements:

    $tree->address('0.1.1')->attr('_hpp_skip',1)
        will omit the <p> and its content from the output.

    $tree->address('0.1.2.1.0')->attr('_hpp_force_nl',1)
        will force new lines around the second <td>, but will not affect
        the first <td>.

        <HTML>
        <HEAD>
        <TITLE>Demonstrate HTML::PrettyPrinter</TITLE>
        </HEAD>

        <BODY>
        <H1>Headline</H1>

        <TABLE ALIGN="LEFT" BORDER="0">
        <TR><TD ALIGN="RIGHT">top right</TR>
        <TR>
        <TD ALIGN="LEFT">bottom left
        </TR>
        </TABLE>

        <HR NOSHADE="NOSHADE" SIZE="5">
        <ADDRESS><A
        HREF="mailto:schotten@gmx.de">Claus Schotten</A></ADDRESS>
        </BODY>
        </HTML>

KNOWN BUGS
    • This is early alpha code. The interfaces are subject to change.

    • The module has been tested with perl 5.005_03 only. It should
      work with perl 5.004, though.

    • The predefined tag groups are incomplete. Several tags need to be
      added.

    • Attribute values from a fixed set given in the DTD (e.g.
      ALIGN=LEFT|RIGHT etc.) should be converted to upper or lower case
      depending on the value of the uppercase parameter. Currently,
      they are printed as given in the HTML tree.

    • No optimization for performance was done.

SEE ALSO
    HTML::TreeBuilder, HTML::Element, HTML::Tagset

COPYRIGHT
    Copyright 2000 Claus Schotten schotten@gmx.de

    This library is free software; you can redistribute it and/or modify
    it under the same terms as Perl itself.

AUTHOR
    Claus Schotten <schotten@gmx.de>

perl v5.34.0                      2021-07-22                  PrettyPrinter(3)