XML::LibXML::PrettyPrint(3)  User Contributed Perl Documentation  XML::LibXML::PrettyPrint(3)

XML::LibXML::PrettyPrint - add pleasant whitespace to a DOM tree

    use XML::LibXML;
    use XML::LibXML::PrettyPrint;

    my $document = XML::LibXML->new->parse_file('in.xml');
    my $pp = XML::LibXML::PrettyPrint->new(indent_string => " ");
    $pp->pretty_print($document); # modified in-place
    print $document->toString;

Long XML files can be daunting for humans to read. Of course, XML is
really designed for computers to read, not people, but there are times
when mere mortals do need to read and edit XML by hand: for example,
when an application stores its configuration in XML, or when you need
to dump some XML to STDOUT for debugging purposes.

Syntax highlighting helps, but to really make sense of some XML, proper
indentation can be vital. Hence "XML::LibXML::PrettyPrint": it can be
applied to an XML::LibXML DOM tree to reformat it into a more readable
result.

Pretty-printing XML is not as CPU-efficient as dumping it out sloppily,
so unless you're pretty sure that a human is going to need to make
sense of your XML, you should probably not use this module.

Constructors
new(%options)
Constructs a pretty-printer object.

Options:

• indent_string - The string to use to indent each line. Defaults
  to a single tab character. Setting it to non-whitespace characters
  is allowed, but emits a warning (via carp).

• new_line - The string to use to begin a new line. Defaults to
  "\n".

• element - A hashref of element categorisations. Each
  categorisation is a reference to an array of element names or
  callback functions. Element names may use Clark notation
  ("{namespace-uri}local-name").

    my $callback = sub {
        my $node = shift;
        # Callbacks may also receive comments and processing
        # instructions, which have no attributes.
        return undef unless $node->isa('XML::LibXML::Element');
        return 1 if $node->hasAttribute('is_block');
        return undef;
    };
    my $pp = XML::LibXML::PrettyPrint->new(
        element => {
            inline               => [qw/span strong em b i a/],
            block                => [qw/p div body html head/, $callback],
            compact              => [qw/title caption li dd dt th td/],
            preserves_whitespace => [qw/pre script style/],
        }
    );

Callbacks should return 1 (true), 0 (false) or undef (dunno).
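
A minimal sketch of a categorisation callback, assuming XML::LibXML and XML::LibXML::PrettyPrint are installed from CPAN. The "note" element and its inline="yes" attribute are hypothetical, chosen only for illustration. The callback answers undef (dunno) for anything it has no opinion about, including comments and PIs, so the remaining rules still get a say; element_category is then used to check the outcome.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::PrettyPrint qw(:constants);

# Hypothetical rule: an element carrying inline="yes" is inline.
my $inline_if_flagged = sub {
    my $node = shift;
    # Comments and PIs can also reach callbacks; they have no attributes.
    return undef unless $node->isa('XML::LibXML::Element');
    return 1 if ($node->getAttribute('inline') // '') eq 'yes';
    return undef;    # dunno: let the other rules decide
};

my $pp = XML::LibXML::PrettyPrint->new(
    element => { inline => ['span', $inline_if_flagged] },
);

# Hypothetical sample document.
my $doc = XML::LibXML->new->parse_string(
    '<root><note inline="yes">x</note><note>y</note></root>'
);
my ($flagged, $plain) = $doc->documentElement->childNodes;

my $flagged_category = $pp->element_category($flagged);
my $plain_category   = $pp->element_category($plain);
printf "flagged: %s, plain: %s\n",
    $flagged_category // 'undef', $plain_category // 'undef';
```

Returning undef rather than 0 from the fall-through branch matters: 0 would be a definite "no", while undef leaves the decision to later entries and categories.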

new_for_html(%options)
Constructs a pretty-printer object pre-configured to be suitable
for HTML and XHTML. The indent_string and new_line options are
supported.
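
A sketch of new_for_html applied to a small XHTML fragment, assuming both modules are installed. The two-space indent_string is an arbitrary choice. Exact whitespace may differ between module versions, so the sketch relies only on two things: the inline "<b>" staying on the same line as its text, and "<p>" starting its own indented line.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::PrettyPrint;

my $doc = XML::LibXML->new->parse_string(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
  . '<body><p>Hello <b>world</b></p></body></html>'
);

# Two-space indentation instead of the default tab.
my $pp = XML::LibXML::PrettyPrint->new_for_html(indent_string => '  ');
$pp->pretty_print($doc);    # modifies $doc in place

my $out = $doc->toString;
print $out;
```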

Methods
If you just need to use a default configuration (no options passed to
the constructor), then you can call these as class methods, unless
otherwise stated.

strip_whitespace($node)
Strips superfluous whitespace from an "XML::LibXML::Document" or
"XML::LibXML::Element".

Whitespace just before, just after or leading/trailing within an
inline element is not considered superfluous. Runs of multiple
whitespace characters are replaced with a single space. Whitespace
is not changed within an element that preserves whitespace.

The node is modified in place.
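
A sketch of strip_whitespace on its own, called as a class method so the default configuration applies (every element is block). The element names foo and bar are placeholders.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::PrettyPrint;

# Placeholder document with whitespace between and inside elements.
my $doc = XML::LibXML->new->parse_string(
    "<foo>\n   <bar>a    b</bar>\n   <bar>c</bar>\n</foo>"
);

# Class-method call: default configuration, so foo and bar are block.
XML::LibXML::PrettyPrint->strip_whitespace($doc);

my $xml = $doc->documentElement->toString;
print "$xml\n";
```

Whitespace between the block elements is dropped entirely, and the run of spaces inside the first bar collapses to a single space.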

indent($node, $level)
Indents the node to a certain indentation level, its direct children
to "$level + 1", its grandchildren to "$level + 2", and so on.
Typically you'd just want to indent the root node to level 0.

The node is modified in place.

Elements that preserve whitespace are not changed.
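
A sketch of indent on a tree built through the DOM API. Because such a tree has no whitespace at all, there is nothing to strip first. The config/database/host element names are hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::PrettyPrint;

# Build a tree through the DOM API, so it contains no whitespace at all.
# Element names are hypothetical.
my $doc  = XML::LibXML::Document->new('1.0', 'UTF-8');
my $root = $doc->createElement('config');
$doc->setDocumentElement($root);
my $db = $root->appendChild($doc->createElement('database'));
$db->appendChild($doc->createElement('host'))->appendTextNode('localhost');

# Indent the root to level 0: children go to level 1, grandchildren to 2.
my $pp = XML::LibXML::PrettyPrint->new(indent_string => '  ');
$pp->indent($root, 0);

my $xml = $root->toString;
print "$xml\n";
```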

pretty_print($node, $level)
Strips whitespace and indents. The node is modified in place and
returned.

Example use as a class method:

    print XML::LibXML::PrettyPrint
        ->pretty_print(XML::LibXML->new->parse_string($XML))
        ->toString;

indent_string($level)
Returns the string that would be used to indent something to a
particular level. Descendant classes could override this method to
do funky indentation, such as having varying levels of indentation.
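
A sketch of such a subclass, assuming that new blesses into the calling class. The package name My::StepPrettyPrint and the "four spaces, then two more per level" scheme are hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::PrettyPrint;

# Hypothetical subclass: four spaces for level 1, two more per level after.
package My::StepPrettyPrint {
    use parent -norequire, 'XML::LibXML::PrettyPrint';

    sub indent_string {
        my ($self, $level) = @_;
        return '' if $level < 1;
        return ' ' x (4 + 2 * ($level - 1));
    }
}

my $pp  = My::StepPrettyPrint->new;
my $doc = XML::LibXML->new->parse_string('<a><b><c/></b></a>');
$pp->pretty_print($doc);
print $doc->toString;
```

Because the override ignores $self, it also works as a class method, e.g. My::StepPrettyPrint->indent_string(2).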

new_line
Returns the string that would be used to begin a new line.

element_category($node)
Returns EL_INLINE, EL_BLOCK, EL_COMPACT or undef.
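
A sketch of a small debugging helper that reports the category of every element, using the EL_* constants exported via the :constants tag. The helper itself and the sample list are hypothetical. Because element_category may also return undef, the sketch maps that case to a separate label.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::PrettyPrint qw(:constants);

# Human-readable labels for the exported constants.
my %label = (
    EL_INLINE()  => 'inline',
    EL_BLOCK()   => 'block',
    EL_COMPACT() => 'compact',
);

my $pp = XML::LibXML::PrettyPrint->new(
    element => { inline => [qw/em/], compact => [qw/li/] },
);

my $doc = XML::LibXML->new->parse_string(
    '<ul><li>one <em>two</em></li><li>three</li></ul>'
);

# Hypothetical helper: record and print each element's category.
my %category_of;
for my $el ($doc->findnodes('//*')) {
    my $cat = $pp->element_category($el);
    $category_of{ $el->nodeName } = defined $cat ? $label{$cat} : 'undef';
    print $el->nodeName, ': ', $category_of{ $el->nodeName }, "\n";
}
```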

element_preserves_whitespace($node)
Boolean indicating whether the contents of the element have
significant whitespace that needs preserving.

Returns undef if $node is not an "XML::LibXML::Element".
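
A sketch showing the three cases described in this document: an element listed under preserves_whitespace, an element carrying xml:space="preserve", and a non-element node, for which undef is expected. The sample elements are hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::PrettyPrint;

my $pp = XML::LibXML::PrettyPrint->new(
    element => { preserves_whitespace => [qw/pre/] },
);

# Hypothetical sample: pre is listed, code uses xml:space, p is neither.
my $doc = XML::LibXML->new->parse_string(
    '<doc><pre>  keep   me  </pre>'
  . '<code xml:space="preserve"> x </code><p>text</p></doc>'
);
my ($pre, $code, $p) = $doc->documentElement->childNodes;
my $text = $p->firstChild;    # an XML::LibXML::Text node, not an element

for my $node ($pre, $code, $p, $text) {
    my $r = $pp->element_preserves_whitespace($node);
    printf "%-6s => %s\n", $node->nodeName, $r // 'undef';
}
```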

Functions
print_xml $xml
Given an XML string or an XML::LibXML::Node object, prints it
nicely.

This function is not exported by default, but can be requested:

    use XML::LibXML::PrettyPrint 0.001 qw(print_xml);

Use like this:

    print_xml '<foo> <bar> </bar> </foo>';

IO::Handle::print_xml($handle, $xml)
Partly experimental, partly mental. You can enable this feature
like this:

    use XML::LibXML::PrettyPrint 0.001 qw(-io);

And that will allow stuff like this to work:

    open LOG, '>', 'mylog.xml' or die "Cannot open mylog.xml: $!";
    print_xml LOG '<foo> <bar> </bar> </foo>';
    close LOG;

    open my $log, '>', 'otherlog.xml' or die "Cannot open otherlog.xml: $!";
    print_xml $log '<foo> <bar> </bar> </foo>';
    close $log;

    print_xml STDERR '<foo> <bar> </bar> </foo>';
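
A sketch of the -io feature writing to an in-memory filehandle instead of a file, assuming the module is installed. It uses method-call syntax, $fh->print_xml(...). This is equivalent to the indirect-object form above, but it also works where indirect object syntax is disabled, for example under "use v5.36". The exact whitespace written is not checked, only that bar ends up on its own indented line.

```perl
use strict;
use warnings;
use XML::LibXML::PrettyPrint 0.001 qw(-io);

# An in-memory filehandle stands in for a log file.
open my $fh, '>', \my $buffer or die "Cannot open in-memory handle: $!";

# Method-call form of: print_xml $fh '<foo> <bar> </bar> </foo>';
$fh->print_xml('<foo> <bar> </bar> </foo>');
close $fh;

print $buffer;
```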

Constants
These can be exported:

    use XML::LibXML::PrettyPrint 0.001 qw(:constants);

EL_BLOCK
EL_COMPACT
EL_INLINE

There are three categories of element: inline, block and compact.

For inline elements, the presence of whitespace (though not the amount
of whitespace) is considered significant just before the element, just
after the element, or just within the element.

In XHTML, consider the difference between the block element "<div>":

    <div>Will</div><div>Carlton</div> <div>Ashley</div>

and the inline element "<span>":

    <span>Spider</span>-<span>Man</span> <span>lives</span>

The space, or lack thereof, between "<div>" elements does not matter one
whit. The lack of spaces between the first two "<span>" elements allows
them to be read as a single (in this case, hyphenated) word, whereas
the space before the third "<span>" separates out the word "lives".
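
A sketch that runs the two snippets above through pretty_print, with span configured as inline and div left at the default (block). Both modules are assumed to be installed. The output is only checked loosely: the spacing between the spans must survive, and the space between the divs must not.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::PrettyPrint;

# span is inline; div and everything else fall back to block.
my $pp = XML::LibXML::PrettyPrint->new(element => { inline => [qw/span/] });

my $doc = XML::LibXML->new->parse_string(
    '<body><div>Will</div><div>Carlton</div> <div>Ashley</div>'
  . '<p><span>Spider</span>-<span>Man</span> <span>lives</span></p></body>'
);
$pp->pretty_print($doc);

my $out = $doc->documentElement->toString;
print "$out\n";
```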

In terms of indentation, inline elements do not start a new indented
line, unless they are the first element within their block, or are
preceded by a block or compact element.

Block elements always start a new line, and cause their child nodes to
be indented to the next level.

Compact elements are somewhere in between. When it comes to whitespace
stripping, they're treated as block elements. In terms of indentation,
they always start a new line, but they only cause their child nodes to
be indented to the next level if they have block descendants. Imagine
that in HTML, "<ul>" is a block element, "<i>" is an inline element,
and "<li>" is a compact element:

    <ul>
      <li>Will Smith - Will Smith</li>
      <li>Carlton Banks - Alfonso Ribeiro</li>
      <li>
        Vivian Banks:
        <ul>
          <li>Janet Hubert-Whitten <i>(seasons 1-3)</i></li>
          <li>Daphne Maxwell Reid <i>(seasons 3-6)</i></li>
        </ul>
      </li>
    </ul>

The third "<li>" element is indented like a block element because it
contains a block "<ul>" element. The other "<li>" elements do not have
their contents indented, because they contain only inline content.
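
A sketch of a configuration that should produce the layout above from an unindented copy of the list, assuming both modules are installed. The two-space indent_string is chosen to match the example. Only the single-line compact items are checked, not the full output.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::PrettyPrint;

my $pp = XML::LibXML::PrettyPrint->new(
    indent_string => '  ',
    element => {
        inline  => [qw/i/],
        compact => [qw/li/],
        # ul is not listed, so it falls back to the default: block.
    },
);

# The example list with all whitespace between tags removed.
my $doc = XML::LibXML->new->parse_string(join '',
    '<ul><li>Will Smith - Will Smith</li>',
    '<li>Carlton Banks - Alfonso Ribeiro</li>',
    '<li>Vivian Banks:<ul>',
    '<li>Janet Hubert-Whitten <i>(seasons 1-3)</i></li>',
    '<li>Daphne Maxwell Reid <i>(seasons 3-6)</i></li>',
    '</ul></li></ul>',
);
$pp->pretty_print($doc);

my $out = $doc->documentElement->toString;
print "$out\n";
```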

Elements default to being block, but you can specify particular
elements as inline or compact by passing node names or callbacks to the
constructor. Elements default to not preserving whitespace unless they
have an xml:space="preserve" attribute, but again you can use the
constructor to change this.

Comments and processing instructions default to being compact, but you
can make particular comments or PIs inline or block by passing
appropriate callbacks to the constructor. Whitespace within comments
and PIs is always preserved. (There is rarely any reason to make
comments and processing instructions block, but making them inline can
occasionally be useful, as it means that the presence of whitespace
just before or just after the comment is treated as significant.)
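
A sketch of a callback that makes every comment inline, so that the single spaces on either side of it are kept. Both modules are assumed to be installed. The sample paragraph is hypothetical. The callback returns undef for all other nodes so that their default categories still apply.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::PrettyPrint;

# Treat every comment as inline; return undef ("dunno") for everything else.
my $comment_is_inline = sub {
    my $node = shift;
    return 1 if $node->isa('XML::LibXML::Comment');
    return undef;
};

my $pp = XML::LibXML::PrettyPrint->new(
    element => { inline => [$comment_is_inline] },
);

# Hypothetical sample paragraph with a comment between two words.
my $doc = XML::LibXML->new->parse_string('<p>before <!-- note --> after</p>');
$pp->pretty_print($doc);

my $out = $doc->documentElement->toString;
print "$out\n";
```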

Text nodes are always inline.

Please report any bugs to
<http://rt.cpan.org/Dist/Display.html?Queue=XML-LibXML-PrettyPrint>.

Related: XML::LibXML, HTML::HTML5::Writer.

XML::Tidy - similar, but based on XML::XPath. Doesn't differentiate
between inline and block elements.

XML::Filter::Reindent - similar again, based on XML::Parser. Doesn't
differentiate between inline and block elements.

Sermon: <http://www.derkarl.org/why_to_tabs.html>. Read it.

Toby Inkster <tobyink@cpan.org>.

This software is copyright (c) 2011-2014 by Toby Inkster.

This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.

THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

perl v5.36.0                     2023-01-20      XML::LibXML::PrettyPrint(3)