XML::Tidy(3)          User Contributed Perl Documentation         XML::Tidy(3)

NAME

       XML::Tidy - tidy indenting of XML documents

VERSION

       This documentation refers to version 1.20 of XML::Tidy, which was
       released on Sun Jul  9 09:43:30:08 -0500 2017.

SYNOPSIS

         #!/usr/bin/perl
         use strict;
         use warnings;
         use utf8;
         use XML::Tidy;

         # create a new XML::Tidy object by loading MainFile.xml
         my $tidy_obj = XML::Tidy->new('filename' => 'MainFile.xml');

         # tidy up the indenting
         $tidy_obj->tidy();

         # write the changes back to MainFile.xml
         $tidy_obj->write();

DESCRIPTION

       This module creates XML document objects (inheriting from XML::XPath)
       and tidies the indenting of their mixed-content (i.e., non-data) text
       nodes. Other member functions compress and expand your XML document
       object, into either a compact XML representation or a binary one.

USAGE

   new()
       This is the standard Tidy object constructor. Except for the added
       'binary' option, it can take the same parameters as an XML::XPath
       object constructor to initialize the XML document object. These can be
       any one of:

         'filename' => 'SomeFile.xml'
         'binary'   => 'SomeBinaryFile.xtb'
         'xml'      => $variable_which_holds_a_bunch_of_XML_data
         'ioref'    => $file_InputOutput_reference
         'context'  => $existing_node_at_specified_context_to_become_new_obj
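
       As a minimal sketch of the 'xml' option, the following builds a Tidy
       object from an in-memory string rather than a file. The sample
       document and variable names are illustrative assumptions; only new(),
       tidy(), and toString() come from this documentation.

```perl
use strict;
use warnings;
use XML::Tidy;

# Sample document held in a scalar (an assumption for illustration).
my $xml = '<root><a>1</a><b><c/></b></root>';

# Construct the object from the string instead of a filename.
my $tidy_obj = XML::Tidy->new('xml' => $xml);

# Re-indent with the default indent string, then fetch the result.
$tidy_obj->tidy();
my $pretty = $tidy_obj->toString();

print $pretty, "\n";
```

       Because no 'filename' was given, the result is read back with
       toString() rather than write().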

   reload()
       The reload() member function causes the latest data contained in a Tidy
       object to be re-parsed (which re-indexes all nodes).

       This can be necessary after modifications have been made to nodes which
       impact the tree node hierarchy because XML::XPath's find() member
       preserves state information which can get out-of-sync.

       reload() is probably rarely useful by itself but it is needed by
       strip() and prune() so it is exposed as a method in case it comes in
       handy for other uses.

   strip()
       The strip() member function searches the Tidy object for all
       mixed-content (i.e., non-data) text nodes and empties them out. This
       effectively removes any markup indenting.

       strip() is used by compress() and tidy() but it is exposed because it
       is also worthwhile by itself.
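
       A small sketch of strip() on an indented document follows. The input
       string is an illustrative assumption; the expectation is only that the
       whitespace-only text nodes between elements are emptied while data
       text such as "two" is kept.

```perl
use strict;
use warnings;
use XML::Tidy;

# An indented sample document (illustrative only).
my $tidy_obj = XML::Tidy->new(
    'xml' => "<root>\n  <a>1</a>\n  <b>two</b>\n</root>",
);

# Empty out the whitespace-only text nodes that make up the indenting.
$tidy_obj->strip();
my $flat = $tidy_obj->toString();

print $flat, "\n";
```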

   tidy()
       The tidy() member function can take a single optional parameter as the
       string that should be inserted for each indent level. Some examples:

         # Tidy up indenting with the default two (2) spaces per indent level
         $tidy_obj->tidy();

         # Tidy up indenting with four (4) spaces per indent level
         $tidy_obj->tidy('    ');

         # Tidy up indenting with one (1) tab per indent level
         $tidy_obj->tidy('tab');

         # Tidy up indenting with two (2) tabs per indent level
         $tidy_obj->tidy("\t\t");

       The default behavior is to use two (2) spaces for each indent level.
       The Tidy object gets all mixed-content (i.e., non-data) text nodes
       reformatted to appropriate indent levels according to tree nesting
       depth.

       NOTE: tidy() disturbs some XML escapes in whatever ways XML::XPath
       does. It has been brought to my attention that these modules also strip
       CDATA tags from XML files / data they operate on. Even though CDATA
       tags don't seem very common, I would very much like for them to work
       smoothly too. Hopefully the vast majority of files will work fine and
       future support for any of the more rare types can be added later.

       Additionally, please take notice that every call to tidy() (as well as
       reload(), strip(), and most other XML::Tidy functions) leaks some
       memory due to its usage of XPath's findnodes command. This issue was
       described helpfully at
       <HTTPS://RT.CPAN.Org/Ticket/Display.html?id=120296>. Thanks to Jozef!

   compress()
       The compress() member function calls strip() on the Tidy object then
       creates an encoded comment which contains the names of elements and
       attributes as they occurred in the original document. Their respective
       element and attribute names are replaced with just the appropriate
       index throughout the document.

       compress() can accept a parameter describing which node types to
       attempt to shrink down as abbreviations. This parameter should be a
       string of just the first letters of each node type you wish to include
       as in the following mapping:

         e = elements
         a = attribute keys
         v = attribute values *EXPERIMENTAL*
         t = text nodes       *EXPERIMENTAL*
         c = comment nodes    *EXPERIMENTAL*
         n = namespace nodes  *not-yet-implemented*

       Attribute values ('v') and text nodes ('t') both seem to work fine with
       current tokenization. I've still labeled them EXPERIMENTAL because they
       seem more likely to cause problems than valid element or attribute key
       names. I have some bugs in the comment node compression which I haven't
       been able to find yet so that one should be avoided for now. Since
       these three node types ('vtc') all require tokenization, they are not
       included in default compression ('ea'). An example call which includes
       values and text would be:

         $tidy_obj->compress('eavt');

       The original document structure (i.e., node hierarchy) is preserved.
       compress() significantly reduces the file size of most XML documents
       for when size matters more than immediate human readability.  expand()
       performs the opposite conversion.

   expand()
       The expand() member function reads any XML::Tidy::compress comments
       from the Tidy object and uses them to reconstruct the document that was
       passed to compress().
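
       The compress() and expand() pair can be exercised as a round trip, as
       sketched below. The sample document is an illustrative assumption, and
       the sketch relies only on the default ('ea') node types described
       above. Because compress() calls strip() first, any original indenting
       is not expected to come back.

```perl
use strict;
use warnings;
use XML::Tidy;

# Sample document with repeated element names (illustrative only).
my $tidy_obj = XML::Tidy->new(
    'xml' => '<friend name="goodguy"><hitpoints>31.255</hitpoints>'
           . '<hitpoints>12</hitpoints></friend>',
);

# Replace element and attribute names with indices (default 'ea').
$tidy_obj->compress();
my $compact = $tidy_obj->toString();

# Reverse the conversion using the comment that compress() stored.
$tidy_obj->expand();
my $restored = $tidy_obj->toString();

print "compressed:\n$compact\nexpanded:\n$restored\n";
```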

   bcompress('BinaryOutputFilename.xtb')
       The bcompress() member function stores a binary representation of any
       Tidy object. The format consists of:

         0) a null-terminated version string
         1) a byte specifying how many bytes later indices will be
         2) the number of bytes from 1 above to designate the total string count
         3) the number of null-terminated strings from 2 above
         4) the number of bytes from 1 above to designate the total integer count
         5) the number of 4-byte integers from 4 above
         6) the number of bytes from 1 above to designate the total float count
         7) the number of 8-byte (double-precision) floats from 6 above
         8) node index sets until the end of the file

       Normal node index sets consist of two values. The first is an index
       (again the number of bytes long comes from 1) into the three lists as
       if they were all linear. The second is a single-byte integer
       identifying the node type (using standard DOM node type enumerations).

       A few special cases exist in node index sets though. If the index is
       zero (null), it is interpreted as a close-element tag, so no
       accompanying type value is read. When the index is non-zero, the type
       value is always read. In the event that the type corresponds to an
       attribute or a processing instruction, the next index is read (without
       another accompanying type value) in order to complete the data fields
       required by those node types.

       NOTE: Please bear in mind that the encoding of binary integers and
       floats only works properly if the values are not surrounded by spaces
       or other delimiters and each is contained in its own single node. This
       is necessary to enable thorough reconstruction of whitespace from the
       original document. I recommend storing every numerical value as an
       isolated attribute value or text node without any surrounding
       whitespace.

         # Examples which encode all numbers as binary:
         <friend name="goodguy" category="15">
           <hitpoints>31.255</hitpoints>
           <location>
             <x>-15.65535</x>
             <y>16383.7</y>
             <z>-1023.63</z>
           </location>
         </friend>

         # Examples which encode all numbers as strings:
         <enemy name="badguy" category=" 666 ">
           <hitpoints> 2.0 </hitpoints>
           <location> 4.0 -2.0 4.0 </location>
         </enemy>
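
       The isolation rule in the NOTE above can be approximated with a quick
       pre-check, sketched here in plain Perl. The helper guess_storage() and
       its regular expressions are hypothetical and are not XML::Tidy's
       internal test; they only mirror the stated rule that a value must be a
       lone number with no surrounding whitespace (the 4-byte integer range
       is not checked).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Tidy): guess how a single
# attribute value or text node would be stored by bcompress().
sub guess_storage {
    my ($value) = @_;
    return 'integer' if $value =~ /\A-?\d+\z/;         # lone integer
    return 'float'   if $value =~ /\A-?\d+\.\d+\z/;    # lone decimal
    return 'string';                                   # anything else
}

# Values taken from the friend and enemy examples.
my @values = ('15', '31.255', '-15.65535', ' 666 ', ' 2.0 ', '4.0 -2.0 4.0');
my %guess  = map { $_ => guess_storage($_) } @values;

printf "%-16s => %s\n", "'$_'", $guess{$_} for @values;
```

       Under these assumptions the friend values classify as numbers while
       the padded or space-separated enemy values fall back to strings.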

       The default file extension is .xtb (for XML::Tidy Binary).

   bexpand('BinaryInputFilename.xtb')
       The bexpand() member function reads a binary file which was previously
       written from bcompress(). bexpand() is an XML::Tidy object constructor
       like new() so it can be called like:

         my $xtbo = XML::Tidy->bexpand('BinaryInputFilename.xtb');
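
       A binary round trip might look like the following sketch. The
       temporary directory, file name, and sample document are assumptions
       for illustration; only the bcompress(), bexpand(), and toString()
       calls are taken from this documentation.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use XML::Tidy;

# Work in a throwaway directory so no real file is overwritten.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/friend.xtb";

# Numbers are isolated in their own nodes, as recommended above.
my $tidy_obj = XML::Tidy->new(
    'xml' => '<friend name="goodguy" category="15">'
           . '<hitpoints>31.255</hitpoints></friend>',
);

# Store the binary representation, then construct a new object from it.
$tidy_obj->bcompress($file);
my $restored = XML::Tidy->bexpand($file);
my $xml      = $restored->toString();

print $xml, "\n";
```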

   prune()
       The prune() member function takes an XPath location to remove (along
       with all attributes and child nodes) from the Tidy object. For example,
       to remove all comments:

         $tidy_obj->prune('//comment()');

       or to remove the third baz (XPath indexing is 1-based):

         $tidy_obj->prune('/foo/bar/baz[3]');

       Pruning your XML tree is a form of tidying too so it snuck in here. =)
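
       Both prune() calls shown above can be tried on a small document, as in
       this sketch. The sample XML is an illustrative assumption chosen so
       that each XPath location matches exactly one node.

```perl
use strict;
use warnings;
use XML::Tidy;

# One comment and three baz elements (illustrative only).
my $tidy_obj = XML::Tidy->new(
    'xml' => '<foo><!-- draft --><bar><baz>one</baz><baz>two</baz>'
           . '<baz>three</baz></bar></foo>',
);

# Remove every comment, then the third baz (XPath positions start at 1).
$tidy_obj->prune('//comment()');
$tidy_obj->prune('/foo/bar/baz[3]');

my $pruned = $tidy_obj->toString();
print $pruned, "\n";
```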

   write()
       The write() member function can take an optional filename parameter to
       write out any changes to the Tidy object. If no parameters are given,
       write() overwrites the original XML document file (if a 'filename'
       parameter was given to the constructor).

       write() will croak() if no filename can be found to write to.

       write() can also take a secondary parameter which specifies an XPath
       location to be written out as the new root element instead of the Tidy
       object's root. Only the first matching element is written.
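
       The sketch below writes a tidied document to a new file and reads it
       back. The temporary path and sample document are assumptions, and
       passing the filename as a plain first argument is inferred from the
       description above rather than confirmed against the source.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use XML::Tidy;

# Write into a throwaway directory (an assumption for illustration).
my $dir = tempdir(CLEANUP => 1);
my $out = "$dir/copy.xml";

my $tidy_obj = XML::Tidy->new('xml' => '<root><a>1</a></root>');
$tidy_obj->tidy();

# No 'filename' was given to new(), so an explicit target is required.
$tidy_obj->write($out);

# Read the file back to inspect what was written.
open(my $fh, '<', $out) or die "Cannot open $out: $!";
my $written = do { local $/; <$fh> };
close($fh);

print $written;
```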

   toString()
       The toString() member function is almost identical to write() except
       that it takes no parameters and simply returns the equivalent XML
       string as a scalar. It is a little weird because normally only
       XML::XPath::Node objects have a toString() member but I figure it makes
       sense to extend the same syntax to the parent object as well, since it
       is a useful option.

createNode Wrappers

       The following are just aliases to Node constructors. They'll work with
       just the unique portion of the node type as the member function name.

   e() or el() or elem() or createElement()
       wrapper for XML::XPath::Node::Element->new()

   a() or at() or attr() or createAttribute()
       wrapper for XML::XPath::Node::Attribute->new()

   c() or cm() or cmnt() or createComment()
       wrapper for XML::XPath::Node::Comment->new()

   t() or tx() or text() or createTextNode()
       wrapper for XML::XPath::Node::Text->new()

   p() or pi() or proc() or createProcessingInstruction()
       wrapper for XML::XPath::Node::PI->new()

   n() or ns() or nspc() or createNamespace()
       wrapper for XML::XPath::Node::Namespace->new()

EXPORTED CONSTANTS

       Since they are sometimes needed to compare against, XML::Tidy also
       exports the same node constants as XML::XPath::Node (which correspond
       to DOM values). These include:

   UNKNOWN_NODE
   ELEMENT_NODE
   ATTRIBUTE_NODE
   TEXT_NODE
   CDATA_SECTION_NODE
   ENTITY_REFERENCE_NODE
   ENTITY_NODE
   PROCESSING_INSTRUCTION_NODE
   COMMENT_NODE
   DOCUMENT_NODE
   DOCUMENT_TYPE_NODE
   DOCUMENT_FRAGMENT_NODE
   NOTATION_NODE
   ELEMENT_DECL_NODE
   ATT_DEF_NODE
   XML_DECL_NODE
   ATTLIST_DECL_NODE
   NAMESPACE_NODE

       XML::Tidy also exports:

   STANDARD_XML_DECL
       which returns a reasonable default XML declaration string (assuming
       typical "utf-8" encoding).
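
       As a sketch of comparing against these constants, the following
       counts the child nodes of a root element by type. It assumes the
       constants are exported by default by "use XML::Tidy" and uses the
       findnodes() and getNodeType() methods that come from XML::XPath; the
       sample document is illustrative.

```perl
use strict;
use warnings;
use XML::Tidy;

# Two elements and one comment under the root (illustrative only).
my $tidy_obj = XML::Tidy->new(
    'xml' => '<root><a>1</a><!-- note --><b/></root>',
);

my %count = (element => 0, comment => 0, other => 0);

# Classify each direct child of <root> by its node type constant.
for my $node ($tidy_obj->findnodes('/root/node()')) {
    my $type = $node->getNodeType();
    if    ($type == ELEMENT_NODE) { $count{element}++ }
    elsif ($type == COMMENT_NODE) { $count{comment}++ }
    else                          { $count{other}++   }
}

print "$_: $count{$_}\n" for sort keys %count;
```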

TODO

       - stop reload() from messing up Unicode escaped &XYZ; components like
       Copyright &#xA9; and Registered &#xAE; (probably needs pre and post
       processing)
       - write more and better UTF-8 tests
       - support namespaces
       - handle CDATA

CHANGES

       Revision history for Perl extension XML::Tidy:

       - 1.20 H79M9hU8  Sun Jul  9 09:43:30:08 -0500 2017
         * removed broken Build.PL to resolve
         <HTTPS://RT.CPAN.Org/Ticket/Display.html?id=122406>. (Thank you,
         Slaven.)

       - 1.18 H78M5qm1  Sat Jul  8 05:52:48:01 -0500 2017
         * fixed new() to check file or xml to detect standalone in
         declaration, from <HTTPS://RT.CPAN.Org/Ticket/Display.html?id=122389>
         (Thanks Alex!)

         * traced tidy() memory leak from
         <HTTPS://RT.CPAN.Org/Ticket/Display.html?id=120296> (Thanks Jozef!)
         which seems to come from every XPath->findnodes() call

         * aligned synopsis comments

         * updated write() to use output encoding UTF-8 since that's what
         almost all XML should rely on (with thanks to RJBS for teaching me
         much from his great talk at
         <HTTPS://YouTube.Com/watch?v=TmTeXcEixEg>)

         * collapsed trailing curly braces on code blocks

         * added croak for any failed file open attempt

       - 1.16 G6LM4EST  Tue Jun 21 04:14:28:29 -0500 2016
         * stopped using my old fragile package generation and manually
         updated all distribution files (though Dist::Zilla should let me
         generate much again)

         * updated license to GPLv3+

         * fixed 00pod.t and 01podc.t to eval the Test modules from issue and
         patch: <HTTPS://RT.CPAN.Org/Public/Bug/Display.html?id=85592> (Thanks
         again MichielB.)

         * replaced all old '&&' with 'and' in POD

       - 1.14 G6JMERCY  Sun Jun 19 14:27:12:34 -0500 2016
         * separated old PT from VERSION to fix non-numeric issue:
         <HTTPS://RT.CPAN.Org/Public/Bug/Display.html?id=56073> (Thanks to
         Slaven.)

         * removed Unicode from POD but added encoding utf8 anyway to pass
         tests and resolve issues:
         <HTTPS://RT.CPAN.Org/Public/Bug/Display.html?id=92434> and
         <HTTPS://RT.CPAN.Org/Public/Bug/Display.html?id=85592> (Thanks to
         Sudhanshu and MichielB.)

       - 1.12.B55J2qn  Thu May  5 19:02:52:49 2011
         * made "1.0" float binarize as float again, rather than just "1" int

         * cleaned up POD and fixed EXPORTED CONSTANTS heads blocking together

       - 1.10.B52FpLx  Mon May  2 15:51:21:59 2011
         * added tests for undefined non-standard XML declaration to suppress
         warnings

       - 1.8.B2AMvdl  Thu Feb 10 22:57:39:47 2011
         * aligned .t code

         * added test for newline before -r to try to resolve:
         <HTTPS://RT.CPAN.Org/Ticket/Display.html?id=65471> (Thanks, Leandro.)

         * fixed off-by-one error when new gets a readable (non-newline)
         filename (that's not "filename" without a pre-'filename' param) to
         resolve: <HTTPS://RT.CPAN.Org/Ticket/Display.html?id=65151> (Thanks,
         Simone.)

       - 1.6.A7RJKwl  Tue Jul 27 19:20:58:47 2010
         * added head2 POD for EXPORTED CONSTANTS to try to pass t/00podc.t

       - 1.4.A7QCvHw  Mon Jul 26 12:57:17:58 2010
         * hacked a little test for non-UTF-8 decl str to resolve FrankGoss'
         need for ISO-8859-1 decl encoding to persist through tidying

         * made sure META.yml is being generated correctly for the CPAN

         * updated license to GPLv3

       - 1.2.75BACCB  Fri May 11 10:12:12:11 2007
         * made "1.0" float binarize as just "1" int

         * made ints signed and bounds checked

         * added new('binary' => 'BinFilename.xtb') option

       - 1.2.54HJnFa  Sun Apr 17 19:49:15:36 2005
         * fixed tidy() processing instruction stripping problem

         * added support for binary ints and floats in bcompress()

         * tightened up binary format and added pod

       - 1.2.54HDR1G  Sun Apr 17 13:27:01:16 2005
         * added bcompress() and bexpand()

         * added  compress() and  expand()

         * added toString()

       - 1.2.4CKBHxt  Mon Dec 20 11:17:59:55 2004
         * added exporting of XML::XPath::Node (DOM) constants

         * added node object creation wrappers (like LibXML)

       - 1.2.4CCJW4G  Sun Dec 12 19:32:04:16 2004
         * added optional 'xpath_loc' => to prune()

       - 1.0.4CAJna1  Fri Dec 10 19:49:36:01 2004
         * added optional 'filename'  => to write()

       - 1.0.4CAAf5B  Fri Dec 10 10:41:05:11 2004
         * removed 2nd param from tidy() so that 1st param is just indent
         string

         * fixed pod errors

       - 1.0.4C9JpoP  Thu Dec  9 19:51:50:25 2004
         * added xplc option to write()

         * added prune()

       - 1.0.4C8K1Ah  Wed Dec  8 20:01:10:43 2004
         * inherited from XPath so that those methods can be called directly

         * original version (separating Tidy.pm from Merge.pm)

INSTALL

       From the command shell, please run:

         `perl -MCPAN -e "install XML::Tidy"`

       or uncompress the package and run the standard:

         `perl Makefile.PL; make; make test; make install`

FILES

       XML::Tidy requires:

       Carp                  to allow errors to croak() from calling sub

       XML::XPath            to use XPath statements to query and update XML

       XML::XPath::XMLParser to parse XML documents into XPath objects

       Math::BaseCnv         to handle base-64 indexing for compress() and
                             expand()

BUGS

       Please report any bugs or feature requests to bug-XML-Tidy at
       RT.CPAN.Org, or through the web interface at
       <HTTPS://RT.CPAN.Org/NoAuth/ReportBug.html?Queue=XML-Tidy>. I will be
       notified, and you will then be updated on progress as I address your
       bug.

SUPPORT

       You can find documentation for this module (after it is installed) with
       the perldoc command.

         `perldoc XML::Tidy`

       You can also look for information at:

           RT: CPAN's Request Tracker

         HTTPS://RT.CPAN.Org/NoAuth/Bugs.html?Dist=XML-Tidy

           AnnoCPAN: Annotated CPAN documentation

         HTTP://AnnoCPAN.Org/dist/XML-Tidy

           CPAN Ratings

         HTTPS://CPANRatings.Perl.Org/d/XML-Tidy

           Search CPAN

         HTTP://Search.CPAN.Org/dist/XML-Tidy

LICENSE

       Most source code should be Free! Code I have lawful authority over is
       and shall be!  Copyright: (c) 2004-2017, Pip Stuart.  Copyleft: This
       software is licensed under the GNU General Public License (version 3
       or later). Please consult <HTTPS://GNU.Org/licenses/gpl-3.0.txt> for
       important information about your freedom. This is Free Software: you
       are free to change and redistribute it. There is NO WARRANTY, to the
       extent permitted by law. See <HTTPS://FSF.Org> for further
       information.

AUTHOR

       Pip Stuart <Pip@CPAN.Org>

perl v5.28.1                      2017-07-09                      XML::Tidy(3)