Twig(3)               User Contributed Perl Documentation              Twig(3)

NAME

       XML::Twig - A perl module for processing huge XML documents in tree
       mode.


SYNOPSIS

       Note that this documentation is intended as a reference to the module.

       Complete docs, including a tutorial, examples, an easier to use HTML
       version, a quick reference card and a FAQ are available at
       <http://www.xmltwig.org/xmltwig>

       Small documents (loaded in memory as a tree):

         my $twig=XML::Twig->new();    # create the twig
         $twig->parsefile( 'doc.xml'); # build it
         my_process( $twig);           # use twig methods to process it
         $twig->print;                 # output the twig

       Huge documents (processed in combined stream/tree mode):

         # at most one div will be loaded in memory
         my $twig=XML::Twig->new(
           twig_handlers =>
             { title   => sub { $_->set_tag( 'h2') }, # change title tags to h2
               para    => sub { $_->set_tag( 'p')  }, # change para to p
               hidden  => sub { $_->delete;       },  # remove hidden elements
               list    => \&my_list_process,          # process list elements
               div     => sub { $_[0]->flush;     },  # output and free memory
             },
           pretty_print => 'indented',                # output will be nicely formatted
           empty_tags   => 'html',                    # outputs <empty_tag />
         );
         $twig->parsefile( 'my_big.xml');

       See XML::Twig 101 for other ways to use the module, as a filter for
       example.


DESCRIPTION

       This module provides a way to process XML documents. It is built on top
       of "XML::Parser".

       The module offers a tree interface to the document, while allowing you
       to output the parts of it that have been completely processed.

       It allows minimal resource (CPU and memory) usage by building the tree
       only for the parts of the documents that need actual processing,
       through the use of the "twig_roots" and "twig_print_outside_roots"
       options. The "finish" and "finish_print" methods also help to
       increase performance.

       XML::Twig tries to make simple things easy, so it tries its best to
       take care of a lot of the (usually) annoying (but sometimes necessary)
       features that come with XML and XML::Parser.


XML::Twig 101

       XML::Twig can be used either on "small" XML documents (that fit in
       memory) or on huge ones, by processing parts of the document and
       outputting or discarding them once they are processed.

   Loading an XML document and processing it
         my $t= XML::Twig->new();
         $t->parse( '<d><title>title</title><para>p 1</para><para>p 2</para></d>');
         my $root= $t->root;
         $root->set_tag( 'html');                 # change doc to html
         my $title= $root->first_child( 'title'); # get the title
         $title->set_tag( 'h1');                  # turn it into h1
         my @para= $root->children( 'para');      # get the para children
         foreach my $para (@para)
           { $para->set_tag( 'p'); }              # turn them into p
         $t->print;                               # output the document

       Other useful methods include:

       att: "$elt->att( 'foo')" returns the "foo" attribute for an element,

       set_att: "$elt->set_att( foo => "bar")" sets the "foo" attribute to
       the "bar" value,

       next_sibling: "$elt->next_sibling" returns the next sibling in the
       document (in the example "$title->next_sibling" is the first "para");
       you can also (and actually should) use "$elt->next_sibling( 'para')"
       to get it.

       The document can also be transformed through the use of the cut, copy,
       paste and move methods: "$title->cut; $title->paste( after => $p);" for
       example.

       And much, much more, see XML::Twig::Elt.
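
       The methods above can be combined in a short sketch. It parses a small
       inline document (the "nb" attribute and the element contents are made
       up for the example), reads and sets attributes with "att" and
       "set_att", finds a sibling with "next_sibling", and moves the title
       after the last "para" with "cut" and "paste":

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new();
$t->parse('<d><title nb="1">T</title><para>p 1</para><para>p 2</para></d>');
my $root  = $t->root;
my $title = $root->first_child('title');

# read an existing attribute, then add a new one
my $nb = $title->att('nb');
$title->set_att(class => 'main');

# the first "para" element following the title
my $first_para = $title->next_sibling('para');

# move the title after the last para: detach it, then paste it back
my $last_para = $root->last_child('para');
$title->cut;
$title->paste(after => $last_para);

my $xml = $t->sprint;
```

       After the move, the output of "$t->sprint" should end with the title,
       now placed after the second "para".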

   Processing an XML document chunk by chunk
       One of the strengths of XML::Twig is that it lets you work with files
       that do not fit in memory (BTW storing an XML document in memory as a
       tree is quite memory-expensive, the expansion factor being often around
       10).

       To do this you can define handlers that will be called once a specific
       element has been completely parsed. In these handlers you can access
       the element and process it as you see fit, using the navigation and the
       cut-n-paste methods, plus lots of convenient ones like "prefix". Once
       the element is completely processed you can then "flush" it, which
       will output it and free the memory. You can also "purge" it if you
       don't need to output it (if you are just extracting some data from the
       document for example). The handler will be called again once the next
       relevant element has been parsed.

         my $t= XML::Twig->new( twig_handlers =>
                                 { section => \&section,
                                   para    => sub { $_->set_tag( 'p'); }
                                 },
                              );
         $t->parsefile( 'doc.xml');

         # the handler is called once a section is completely parsed, ie when
         # the end tag for section is found, it receives the twig itself and
         # the element (including all its sub-elements) as arguments
         sub section
           { my( $t, $section)= @_;      # arguments for all twig_handlers
             $section->set_tag( 'div');  # change the tag name (my favourite method...)
             # let's use the attribute nb as a prefix to the title
             my $title= $section->first_child( 'title'); # find the title
             my $nb= $title->att( 'nb'); # get the attribute
             $title->prefix( "$nb - ");  # easy isn't it?
             $section->flush;            # outputs the section and frees memory
           }
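
       The same kind of handler can be tried without a file. The sketch below
       parses an inline string (its content is invented for the example) and
       passes an in-memory filehandle to the twig's "flush" method, so the
       flushed output is collected in a Perl scalar instead of going to
       STDOUT:

```perl
use strict;
use warnings;
use XML::Twig;

# everything flush() prints is collected in $out
my $out = '';
open(my $fh, '>', \$out) or die "cannot open in-memory handle: $!";

my $t = XML::Twig->new(
  twig_handlers => {
    section => sub {
      my ($twig, $section) = @_;
      $section->set_tag('div');
      my $title = $section->first_child('title');
      $title->prefix($title->att('nb') . ' - ');
      $twig->flush($fh);    # output what has been parsed so far, free it
      return 1;
    },
    para => sub { $_->set_tag('p'); return 1; },
  },
);
$t->parse('<doc><section><title nb="1">Intro</title><para>a</para></section>'
        . '<section><title nb="2">More</title><para>b</para></section></doc>');
$t->flush($fh);             # output the end of the document
close $fh;
```

       Each section is printed as soon as its handler has run, so only the
       section being parsed is kept in memory.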

       There is of course more to it: you can trigger handlers on more
       elaborate conditions than just the name of the element, "section/title"
       for example.

         my $t= XML::Twig->new( twig_handlers =>
                                  { 'section/title' => sub { $_->print } }
                              )
                         ->parsefile( 'doc.xml');

       Here "sub { $_->print }" simply prints the current element ($_ is
       aliased to the element in the handler).

       You can also trigger a handler on a test on an attribute:

         my $t= XML::Twig->new( twig_handlers =>
                             { 'section[@level="1"]' => sub { $_->print } }
                              )
                         ->parsefile( 'doc.xml');

       You can also use "start_tag_handlers" to process an element as soon as
       the start tag is found. Besides "prefix" you can also use "suffix".
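
       As a check of the path and attribute conditions above, the sketch
       below collects the text of every "title" whose parent is a "section",
       and the "id" of every "section" whose "level" attribute is "1". The
       "id" attribute and the sample document are assumptions made for the
       example:

```perl
use strict;
use warnings;
use XML::Twig;

my (@titles, @level1);
my $t = XML::Twig->new(
  twig_handlers => {
    # only titles whose parent is a section
    'section/title'       => sub { push @titles, $_->text; return 1; },
    # only sections whose level attribute is "1"
    'section[@level="1"]' => sub { push @level1, $_->att('id'); return 1; },
  },
);
$t->parse('<doc><title>Doc</title>'
        . '<section level="1" id="s1"><title>A</title>'
        . '<section level="2" id="s2"><title>B</title></section>'
        . '</section></doc>');
```

       The top-level "title" is not collected, since its parent is not a
       "section".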

   Processing just parts of an XML document
       The twig_roots mode builds only the required sub-trees from the
       document. Anything outside of the twig roots will just be ignored:

         my $t= XML::Twig->new(
              # the twig will include just the root and selected titles
                  twig_roots   => { 'section/title' => \&print_n_purge,
                                    'annex/title'   => \&print_n_purge
                                  }
                             );
         $t->parsefile( 'doc.xml');

         sub print_n_purge
           { my( $t, $elt)= @_;
             print $elt->text;    # print the text (including sub-element texts)
             $t->purge;           # frees the memory
           }

       You can use that mode when you want to process parts of a document but
       are not interested in the rest and you don't want to pay the price,
       either in time or memory, to build the tree for it.

   Building an XML filter
       You can combine the "twig_roots" and the "twig_print_outside_roots"
       options to build filters, which let you modify selected elements and
       will output the rest of the document as is.

       This would convert prices in $ to prices in Euro in a document:

         my $t= XML::Twig->new(
                  twig_roots   => { 'price' => \&convert, },   # process prices
                  twig_print_outside_roots => 1,               # print the rest
                             );
         $t->parsefile( 'doc.xml');

         sub convert
           { my( $t, $price)= @_;
             my $currency= $price->att( 'currency');          # get the currency
             if( $currency eq 'USD')
               { my $usd_price= $price->text;                 # get the price
                 # %rate is just a conversion table
                 my $euro_price= $usd_price * $rate{usd2euro};
                 $price->set_text( $euro_price);               # set the new price
                 $price->set_att( currency => 'EUR');          # don't forget this!
               }
             $price->print;                                    # output the price
           }
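
       A self-contained variant of this filter can be run on a string. It
       assumes a made-up conversion rate of 0.5 and passes an in-memory
       filehandle as the value of "twig_print_outside_roots", so that both
       the untouched parts of the document and the converted prices end up
       in the same scalar:

```perl
use strict;
use warnings;
use XML::Twig;

my %rate = (usd2euro => 0.5);   # hypothetical rate, for the example only

my $out = '';
open(my $fh, '>', \$out) or die "cannot open in-memory handle: $!";

my $t = XML::Twig->new(
  twig_roots => {
    price => sub {
      my ($twig, $price) = @_;
      my $currency = $price->att('currency') // '';
      if ($currency eq 'USD') {
        $price->set_text($price->text * $rate{usd2euro});
        $price->set_att(currency => 'EUR');
      }
      $price->print($fh);          # twig roots must be printed explicitly
      return 1;
    },
  },
  twig_print_outside_roots => $fh, # everything else is copied to $fh
);
$t->parse('<list><item>Book <price currency="USD">10</price></item></list>');
close $fh;
```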

   XML::Twig and various versions of Perl, XML::Parser and expat:
       XML::Twig is a lot more sensitive to variations in versions of perl,
       XML::Parser and expat than to the OS, so this should cover some
       reasonable configurations.

       The "recommended configuration" is perl 5.8.3+ (for good Unicode
       support), XML::Parser 2.31+ and expat 1.95.5+.

       See <http://testers.cpan.org/search?request=dist&dist=XML-Twig> for the
       CPAN testers reports on XML::Twig, which list all tested
       configurations.

       An Atom feed of the CPAN Testers results is available at
       <http://xmltwig.org/rss/twig_testers.rss>

       Finally:

       XML::Twig does NOT work with expat 1.95.4
       XML::Twig only works with XML::Parser 2.27 in perl 5.6.*
           Note that I can't compile XML::Parser 2.27 anymore, so I can't
           guarantee that it still works.

       XML::Parser 2.28 does not really work

       When in doubt, upgrade expat, XML::Parser and Scalar::Util

       In addition, for some optional features, XML::Twig depends on some
       additional modules. The complete list, which depends somewhat on the
       version of Perl that you are running, is given by running
       "t/zz_dump_config.t"


Simplifying XML processing

       Whitespaces
           Whitespaces that look non-significant are discarded; this behaviour
           can be controlled using the "keep_spaces", "keep_spaces_in" and
           "discard_spaces_in" options.

       Encoding
           You can specify that you want the output in the same encoding as
           the input (provided you have valid XML, which means you have to
           specify the encoding either in the document or when you create the
           Twig object) using the "keep_encoding" option.

           You can also use "output_encoding" to convert the internal UTF-8
           format to the required encoding.

       Comments and Processing Instructions (PI)
           Comments and PIs can be hidden from the processing, but still
           appear in the output (they are carried by the "real" element
           closest to them).

       Pretty Printing
           XML::Twig can output the document pretty printed so it is easier to
           read for us humans.

       Surviving an untimely death
           XML parsers are supposed to react violently when fed improper XML.
           XML::Parser just dies.

           XML::Twig provides the "safe_parse" and the "safe_parsefile"
           methods which wrap the parse in an eval and return either the
           parsed twig or 0 in case of failure.

       Private attributes
           Attributes with a name starting with # (illegal in XML) will not be
           output, so you can safely use them to store temporary values during
           processing. Note that you can store anything in a private
           attribute, not just text: it's just a regular Perl variable, so a
           reference to an object or a huge data structure is perfectly fine.

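       Three of these features appear in the sketch below: a private
       attribute holding an array reference that never reaches the output,
       the "indented" value of the "pretty_print" option, and "safe_parse"
       returning a false value instead of dying on malformed input. The
       documents are invented for the example:

```perl
use strict;
use warnings;
use XML::Twig;

# a private attribute (name starting with '#') storing a Perl data structure
my $t = XML::Twig->new(
  pretty_print  => 'indented',
  twig_handlers => { item => sub { $_->set_att('#seen' => [1, 2]); return 1; } },
);
$t->parse('<list><item id="a">x</item><item id="b">y</item></list>');
my $xml = $t->sprint;   # indented, and without the '#seen' attributes

# safe_parse returns the twig on success and a false value on failure
my $good = XML::Twig->new->safe_parse('<d><a>fine</a></d>');
my $bad  = XML::Twig->new->safe_parse('<d><a>broken</d>');
```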

CLASSES

       XML::Twig uses a very limited number of classes. The ones you are most
       likely to use are "XML::Twig" of course, which represents a complete
       XML document, including the document itself (the root of the document
       itself is "root"), its handlers, its input or output filters... The
       other main class is "XML::Twig::Elt", which models an XML element.
       Element here has a very wide definition: it can be a regular element,
       but also text, with an element "tag" of "#PCDATA" (or "#CDATA"), an
       entity (tag is "#ENT"), a Processing Instruction ("#PI"), or a comment
       ("#COMMENT").

       Those are the 2 commonly used classes.

       You might want to look at the "elt_class" option if you want to
       subclass "XML::Twig::Elt".

       Attributes are just attached to their parent element, they are not
       objects per se. (Please use the provided methods "att" and "set_att" to
       access them; if you access them as a hash, then your code becomes
       implementation dependent and might break in the future.)

       Other classes that are seldom used are "XML::Twig::Entity_list" and
       "XML::Twig::Entity".

       If you use "XML::Twig::XPath" instead of "XML::Twig", elements are then
       created as "XML::Twig::XPath::Elt".


METHODS

   XML::Twig
       A twig is a subclass of XML::Parser, so all XML::Parser methods can be
       called on a twig object, including parse and parsefile. "setHandlers"
       on the other hand cannot be used, see "BUGS".

       new This is a class method, the constructor for XML::Twig. Options are
           passed as keyword value pairs. Recognized options are the same as
           XML::Parser, plus some (in fact a lot!) XML::Twig specifics.

           New Options:

           twig_handlers
               This argument consists of a hash "{ expression => \&handler}"
               where expression is an XPath-like expression (+ some others).

               XPath expressions are limited to using the child and descendant
               axis (indeed you can't specify an axis), and predicates cannot
               be nested. You can use the "string", or "string(<tag>)"
               function (except in "twig_roots" triggers).

               Additionally you can use regexps (/ delimited) to match
               attribute and string values.

               Examples:

                 foo
                 foo/bar
                 foo//bar
                 /foo/bar
                 /foo//bar
                 /foo/bar[@att1 = "val1" and @att2 = "val2"]/baz[@a >= 1]
                 foo[string()=~ /^duh!+/]
                 /foo[string(bar)=~ /\d+/]/baz[@att != 3]

               #CDATA can be used to call a handler for a CDATA section.
               #COMMENT can be used to call a handler for comments.
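
               A small sketch of two of the forms listed above: a regexp
               applied to "string()" and a numeric test on an attribute. The
               element names and the sample document are made up:

```perl
use strict;
use warnings;
use XML::Twig;

my (@duh, @big);
my $t = XML::Twig->new(
  twig_handlers => {
    # string value of the element matched against a regexp
    'foo[string()=~ /^duh!+/]' => sub { push @duh, $_->text; return 1; },
    # numeric comparison on an attribute
    'item[@n >= 2]'            => sub { push @big, $_->att('n'); return 1; },
  },
);
$t->parse('<r><foo>duh!!</foo><foo>meh</foo>'
        . '<item n="1"/><item n="2"/><item n="3"/></r>');
```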

               Some additional (non-XPath) expressions are also provided for
               convenience:

               processing instructions
                   '?' or '#PI' triggers the handler for any processing
                   instruction, and '?<target>' or '#PI <target>' triggers a
                   handler for processing instructions with the given target
                   (ex: '#PI xml-stylesheet').

               level(<level>)
                   Triggers the handler on any element at that level in the
                   tree (root is level 1).

               _all_
                   Triggers the handler for all elements in the tree.

               _default_
                   Triggers the handler for each element that does NOT have
                   any other handler.

               Expressions are evaluated against the input document, which
               means that even if you have changed the tag of an element
               (changing the tag of a parent element from a handler for
               example) the change will not impact the expression evaluation.
               There is an exception to this: "private" attributes (whose
               names start with a '#', and can only be created during the
               parsing, as they are not valid XML) are checked against the
               current twig.

               Handlers are triggered in fixed order, sorted by their type
               (xpath expressions first, then regexps, then level), then by
               whether they specify a full path (starting at the root element)
               or not, then by number of steps in the expression, then
               number of predicates, then number of tests in predicates.
               Handlers where the last step does not specify an element name
               ("foo/bar/*") are triggered after other XPath handlers. Finally
               "_all_" handlers are triggered last.

               Important: once a handler has been triggered, if it returns 0
               then no other handler is called, except an "_all_" handler
               which will be called anyway.

               If a handler returns a true value and other handlers apply,
               then the next applicable handler will be called. Repeat, rinse,
               lather... The exception to that rule is when the
               "do_not_chain_handlers" option is set, in which case only the
               first handler will be called.

               Note that it might be a good idea to explicitly return a short
               true value (like 1) from handlers: this ensures that other
               applicable handlers are called even if the last statement for
               the handler happens to evaluate to false. This might also
               speed up the code by avoiding the result of the last statement
               of the code to be copied and passed to the code managing
               handlers. It can really pay to have 1 instead of a long string
               returned.
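
               Return values can be observed with two handlers that match the
               same element and both return 0: only the first one triggered
               should run, while the "_all_" handler is still called. The
               sketch below records the calls rather than assuming which of
               the two specific handlers comes first (the document is made
               up):

```perl
use strict;
use warnings;
use XML::Twig;

my @calls;
my $t = XML::Twig->new(
  twig_handlers => {
    'd/a' => sub { push @calls, 'd/a'; return 0; },   # stops the chain
    'a'   => sub { push @calls, 'a';   return 0; },   # stops the chain
    _all_ => sub { push @calls, '_all_:' . $_->tag; return 1; },
  },
);
$t->parse('<d><a/></d>');

# number of calls to the two element-specific handlers
my $specific = grep { $_ eq 'a' || $_ eq 'd/a' } @calls;
```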

               When the closing tag for an element is parsed the corresponding
               handler is called, with 2 arguments: the twig and the
               "Element". The twig includes the document tree that has been
               built so far, the element is the complete sub-tree for the
               element. The fact that the handler is called only when the
               closing tag for the element is found means that handlers for
               inner elements are called before handlers for outer elements.

               $_ is also set to the element, so it is easy to write inline
               handlers like

                 para => sub { $_->set_tag( 'p'); }

               Text is stored in elements whose tag name is #PCDATA (due to
               mixed content, text and sub-elements in an element, there is no
               way to store the text as just an attribute of the enclosing
               element).

               Warning: if you have used purge or flush on the twig the
               element might not be complete, some of its children might have
               been entirely flushed or purged, and the start tag might even
               have been printed (by "flush") already, so changing its tag
               might not give the expected result.

           twig_roots
               This argument lets you build the tree only for those elements
               you are interested in.

                 Example: my $t= XML::Twig->new( twig_roots => { title => 1, subtitle => 1});
                          $t->parsefile( 'file.xml');
                          my $t2= XML::Twig->new( twig_roots => { 'section/title' => 1});
                          $t2->parsefile( 'file.xml');

               The first example returns a twig containing a document
               including only "title" and "subtitle" elements, as children of
               the root element.
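
               To see what ends up in the twig, the sketch below parses an
               inline string (made up for the example) with "title" as the
               only twig root, then inspects the resulting tree:

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new(twig_roots => { title => 1 });
$t->parse('<doc><section><title>S1</title><para>skipped</para></section>'
        . '<annex><title>A1</title></annex></doc>');

# only the root and the title elements are in the tree
my @kids = $t->root->children;
my $xml  = $t->sprint;
```

               The "section", "para" and "annex" elements are never built,
               so the titles end up as direct children of the root.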

               You can use generic_attribute_condition, attribute_condition,
               full_path, partial_path, tag, tag_regexp, _default_ and _all_
               to trigger the building of the twig. string_condition and
               regexp_condition cannot be used, as the content of the element,
               and the string, have not yet been parsed when the condition is
               checked.

               WARNING: paths are checked against the document. Even if the
               "twig_roots" option is used they will be checked against the
               full document tree, not the virtual tree created by XML::Twig.

               WARNING: twig_roots elements should NOT be nested, that would
               hopelessly confuse XML::Twig ;--(

               Note: you can set handlers (twig_handlers) using twig_roots
                 Example: my $t= XML::Twig->new( twig_roots =>
                                                  { title    => sub { $_[1]->print; },
                                                    subtitle => \&process_subtitle
                                                  }
                                              );
                          $t->parsefile( 'file.xml');

           twig_print_outside_roots
               To be used in conjunction with the "twig_roots" argument. When
               set to a true value this will print the document outside of the
               "twig_roots" elements.

                Example: my $t= XML::Twig->new( twig_roots => { title => \&number_title },
                                               twig_print_outside_roots => 1,
                                              );
                         $t->parsefile( 'file.xml');
                         { my $nb;
                           sub number_title
                             { my( $twig, $title)= @_;
                               $nb++;
                               $title->prefix( "$nb ");
                               $title->print;
                             }
                         }

               This example prints the document outside of the title element,
               calls "number_title" for each "title" element, prints it, and
               then resumes printing the document. The twig is built only for
               the "title" elements.

               If the value is a reference to a file handle then the document
               outside the "twig_roots" elements will be output to this file
               handle:

                 open( my $out, '>', 'out_file.xml') or die "cannot open out_file.xml: $!";
                 my $t= XML::Twig->new( twig_roots => { title => \&number_title },
                                        # default output to $out
                                        twig_print_outside_roots => $out,
                                      );
                 $t->parsefile( 'file.xml');

                 { my $nb;
                   sub number_title
                     { my( $twig, $title)= @_;
                       $nb++;
                       $title->prefix( "$nb ");
                       $title->print( $out);    # you have to print to $out here
                     }
                 }

           start_tag_handlers
               A hash "{ expression => \&handler}". Sets element handlers that
               are called when the element is open (at the end of the
               XML::Parser "Start" handler). The handlers are called with 2
               params: the twig and the element. The element is empty at that
               point, its attributes are created though.

               You can use generic_attribute_condition, attribute_condition,
               full_path, partial_path, tag, tag_regexp, _default_ and _all_
               to trigger the handler.

               string_condition and regexp_condition cannot be used, as the
               content of the element, and the string, have not yet been
               parsed when the condition is checked.

               The main uses for those handlers are to change the tag name
               (you might have to do it as soon as you find the open tag if
               you plan to "flush" the twig at some point in the element), and
               to create temporary attributes that will be used when
               processing sub-elements with "twig_handlers".

               You should also use it to change tags if you use "flush". If
               you change the tag in a regular "twig_handler" then the start
               tag might already have been flushed.

               Note: "start_tag" handlers can be called outside of
               "twig_roots" if this argument is used, in this case handlers
               are called with the following arguments: $t (the twig), $tag
               (the tag of the element) and %att (a hash of the attributes of
               the element).

               If the "twig_print_outside_roots" argument is also used, if the
               last handler called returns a "true" value, then the start
               tag will be output as it appeared in the original document; if
               the handler returns a "false" value then the start tag will
               not be printed (so you can print a modified string yourself for
               example).

               Note that you can use the ignore method in "start_tag_handlers"
               (and only there).
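
               A minimal sketch of the main use described above: renaming an
               element as soon as its start tag is parsed. The handler also
               reads an attribute, which is already available even though the
               element has no content yet. The document is invented for the
               example:

```perl
use strict;
use warnings;
use XML::Twig;

my @ids;
my $t = XML::Twig->new(
  start_tag_handlers => {
    section => sub {
      my ($twig, $elt) = @_;
      push @ids, $elt->att('id');   # attributes are already set
      $elt->set_tag('div');         # rename before any content is parsed
      return 1;
    },
  },
);
$t->parse('<doc><section id="s1"><p>x</p></section></doc>');
my $xml = $t->sprint;
```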

           end_tag_handlers
               A hash "{ expression => \&handler}". Sets element handlers that
               are called when the element is closed (at the end of the
               XML::Parser "End" handler). The handlers are called with 2
               params: the twig and the tag of the element.

               twig_handlers are called when an element is completely parsed,
               so why have this redundant option? There is only one use for
               "end_tag_handlers": when using the "twig_roots" option, to
               trigger a handler for an element outside the roots. It is for
               example very useful to number titles in a document using nested
               sections:

                 my @no= (0);
                 my $no;
                 my $t= XML::Twig->new(
                         start_tag_handlers =>
                          { section => sub { $no[$#no]++; $no= join '.', @no; push @no, 0; } },
                         twig_roots         =>
                          { title   => sub { $_[1]->prefix( $no); $_[1]->print; } },
                         end_tag_handlers   => { section => sub { pop @no;  } },
                         twig_print_outside_roots => 1
                        );
                 $t->parsefile( $file);

               Using the "end_tag_handlers" argument without "twig_roots" will
               result in an error.

           do_not_chain_handlers
               If this option is set to a true value, then only one handler
               will be called for each element, even if several satisfy the
               condition.

               Note that the "_all_" handler will still be called regardless.
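
               A sketch of the option, with two handlers that match the same
               element and both return a true value (the document is made up).
               With "do_not_chain_handlers" set, each "a" element should
               trigger only one of them, so two elements give two calls:

```perl
use strict;
use warnings;
use XML::Twig;

my @calls;
my $t = XML::Twig->new(
  do_not_chain_handlers => 1,
  twig_handlers => {
    'd/a' => sub { push @calls, 'd/a'; return 1; },
    'a'   => sub { push @calls, 'a';   return 1; },
  },
);
$t->parse('<d><a/><a/></d>');
```

               Without the option, both handlers would run for each "a"
               element, giving four calls.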

           ignore_elts
               This option lets you ignore elements when building the twig.
               This is useful in cases where you cannot use "twig_roots" to
               ignore elements, for example if the element to ignore is a
               sibling of elements you are interested in.

               Example:

                 my $twig= XML::Twig->new( ignore_elts => { elt => 'discard' });
                 $twig->parsefile( 'doc.xml');

               This will build the complete twig for the document, except that
               all "elt" elements (and their children) will be left out.

               The keys in the hash are triggers, limited to the same subset
               as "start_tag_handlers". The values can be "discard", to
               discard the element, "print", to output the element as-is,
               "string" to store the text of the ignored element(s), including
               markup, in a field of the twig: "$t->{twig_buffered_string}", or
               a reference to a scalar, in which case the text of the ignored
               element(s), including markup, will be stored in the scalar. Any
               other value will be treated as "discard".
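
               A sketch of the scalar-reference form, using a made-up "secret"
               element: it is left out of the twig, and its text, including
               markup, is stored in a variable instead:

```perl
use strict;
use warnings;
use XML::Twig;

my $ignored = '';
my $t = XML::Twig->new(ignore_elts => { secret => \$ignored });
$t->parse('<d><a>keep</a><secret>hidden <b>x</b></secret><a>too</a></d>');

my @kids = $t->root->children;   # only the two "a" elements
my $xml  = $t->sprint;
```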

           char_handler
               A reference to a subroutine that will be called every time
               "PCDATA" is found.

               The subroutine receives the string as argument, and returns the
               modified string:

                 # we want all strings in upper case
                 sub my_char_handler
                   { my( $text)= @_;
                     $text= uc( $text);
                     return $text;
                   }
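
               Passing such a subroutine to "new" might look like the sketch
               below (the sample document is made up). Only the text content
               is changed, the element names are left alone:

```perl
use strict;
use warnings;
use XML::Twig;

# upper-case every piece of text, like my_char_handler above
my $t = XML::Twig->new(char_handler => sub { my ($text) = @_; return uc $text; });
$t->parse('<d><a>hello</a><b>world</b></d>');
my $xml = $t->sprint;
```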

           elt_class
               The name of a class used to store elements. This class should
               inherit from "XML::Twig::Elt" (and by default it is
               "XML::Twig::Elt"). This option is used to subclass the element
               class and extend it with new methods.

               This option is needed because during the parsing of the XML,
               elements are created by "XML::Twig", without any control from
               the user code.
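
               A sketch of a subclass, using a hypothetical class name
               "My::Elt" and a hypothetical "shout" method. Once "elt_class"
               is set, the elements created by the parser are "My::Elt"
               objects, so the new method can be called on any of them:

```perl
use strict;
use warnings;
use XML::Twig;

# hypothetical element class adding a single method
package My::Elt;
our @ISA = ('XML::Twig::Elt');
sub shout { my ($self) = @_; return uc $self->text; }

package main;

my $t = XML::Twig->new(elt_class => 'My::Elt');
$t->parse('<d><a>hi</a></d>');
my $a_elt = $t->root->first_child('a');
my $shout = $a_elt->shout;
```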
617
618           keep_atts_order
619               Setting this option to a true value causes the attribute hash
620               to be tied to a "Tie::IxHash" object.  This means that
621               "Tie::IxHash" needs to be installed for this option to be
622               available. It also means that the hash keeps its order, so you
623               will get the attributes in order. This allows outputting the
624               attributes in the same order as they were in the original
625               document.
626
627           keep_encoding
628               This is a (slightly?) evil option: if the XML document is not
629               UTF-8 encoded and you want to keep it that way, then setting
630               keep_encoding will use the"Expat" original_string method for
631               character, thus keeping the original encoding, as well as the
632               original entities in the strings.
633
634               See the "t/test6.t" test file to see what results you can
635               expect from the various encoding options.
636
637               WARNING: if the original encoding is multi-byte then attribute
638               parsing will be EXTREMELY unsafe under any Perl before 5.6, as
639               it uses regular expressions which do not deal properly with
640               multi-byte characters. You can specify an alternate function to
               parse the start tags with the "parse_start_tag" option (see
               below).

               WARNING: this option is NOT used when parsing with the
               non-blocking parser ("parse_start", "parse_more" and
               "parse_done" methods), which you probably should not use with
               XML::Twig anyway as they are totally untested!
648
649           output_encoding
               This option generates an output_filter using "Encode",
               "Text::Iconv" or "Unicode::Map8" and "Unicode::String", and
               sets the encoding in the XML declaration. This is the easiest
               way to deal with encodings; if you need more sophisticated
               features, look at "output_filter" below.
655
656           output_filter
657               This option is used to convert the character encoding of the
658               output document.  It is passed either a string corresponding to
659               a predefined filter or a subroutine reference. The filter will
660               be called every time a document or element is processed by the
661               "print" functions ("print", "sprint", "flush").
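
               Below is a minimal sketch with a made-up subroutine filter that
               upper-cases whatever it is given. Because "output_filter" also
               applies to the markup, the tag names come out upper-cased too.

```perl
use strict;
use warnings;
use XML::Twig;

# a toy filter: receives a chunk of output, returns the converted chunk
my $twig = XML::Twig->new( output_filter => sub { return uc $_[0] } );
$twig->parse('<doc>hello</doc>');

my $out = $twig->sprint;
print "$out\n";
```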
662
663               Pre-defined filters:
664
665               latin1
                   Uses either "Encode", "Text::Iconv" or "Unicode::Map8" and
                   "Unicode::String" or a regexp (which works only with
                   XML::Parser 2.27), in this order, to convert all characters
                   to ISO-8859-15 (latin1 is usually a synonym for ISO-8859-1,
                   but in practice ISO-8859-15, which includes the euro sign,
                   seems more useful and is probably what most people want).
673
674               html
                   Does the same conversion as "latin1", plus encodes entities
                   using "HTML::Entities" (oddly enough you will need to have
                   HTML::Entities installed for it to be available). This
                   should only be used if the tags and attribute names
                   themselves are in US-ASCII, or they will be converted and
                   the output will not be valid XML any more.
681
682               safe
                   Converts the output to US-ASCII only, plus character
                   entities ("&#nnn;"). This should be used only if the tags
                   and attribute names themselves are in US-ASCII, or they
                   will be converted and the output will not be valid XML any
                   more.
688
689               safe_hex
                   Same as "safe" except that the character entities are in
                   hexadecimal ("&#xnnn;").
692
693               encode_convert ($encoding)
                   Returns a subref that can be used to convert utf8 strings to
                   $encoding. Uses "Encode".
696
697                      my $conv = XML::Twig::encode_convert( 'latin1');
698                      my $t = XML::Twig->new(output_filter => $conv);
699
700               iconv_convert ($encoding)
                   This function is used to create a filter subroutine that
                   will be used to convert the characters to the target
                   encoding using "Text::Iconv" (which needs to be installed;
                   look at the documentation for the module and for the
                   "iconv" library to find out which encodings are available
                   on your system).
707
708                      my $conv = XML::Twig::iconv_convert( 'latin1');
709                      my $t = XML::Twig->new(output_filter => $conv);
710
711               unicode_convert ($encoding)
                   This function is used to create a filter subroutine that
                   will be used to convert the characters to the target
                   encoding using "Unicode::String" and "Unicode::Map8"
                   (which need to be installed; look at the documentation for
                   the modules to find out which encodings are available on
                   your system).
718
719                      my $conv = XML::Twig::unicode_convert( 'latin1');
720                      my $t = XML::Twig->new(output_filter => $conv);
721
               The "text" and "att" methods do not use the filter, so their
               results are always in Unicode.

               Those predefined filters are based on subroutines that can be
               used by themselves (as "XML::Twig::foo").
727
728               html_encode ($string)
                   Uses "HTML::Entities" to encode a utf8 string.

               safe_encode ($string)
                   Uses either a regexp (perl < 5.8) or "Encode" to encode
                   non-ASCII characters in the string in "&#<nnnn>;" format.

               safe_encode_hex ($string)
                   Uses either a regexp (perl < 5.8) or "Encode" to encode
                   non-ASCII characters in the string in "&#x<nnnn>;" format.

               regexp2latin1 ($string)
                   Uses a regexp to encode a utf8 string into latin 1
741                   (ISO-8859-1). Does not work with Perl 5.8.0!
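
               For example, two of these encoders can be called directly on a
               string. This is a minimal sketch for perl 5.8 or later, where
               "Encode" is used. The sample string is made up.

```perl
use strict;
use warnings;
use XML::Twig;

my $word = "caf\x{e9}";                        # "café" as a Perl character string
my $dec  = XML::Twig::safe_encode($word);      # decimal character reference
my $hex  = XML::Twig::safe_encode_hex($word);  # hexadecimal character reference
print "$dec $hex\n";
```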
742
743           output_text_filter
               Same as "output_filter", except it doesn't apply to the brackets
745               and quotes around attribute values. This is useful for all
746               filters that could change the tagging, basically anything that
747               does not just change the encoding of the output. "html", "safe"
748               and "safe_hex" are better used with this option.
749
750           input_filter
751               This option is similar to "output_filter" except the filter is
752               applied to the characters before they are stored in the twig,
753               at parsing time.
754
755           remove_cdata
756               Setting this option to a true value will force the twig to
               output CDATA sections as regular (escaped) PCDATA.
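
               Below is a minimal sketch that compares the default output with
               the "remove_cdata" output, using a made-up one-element document.

```perl
use strict;
use warnings;
use XML::Twig;

my $xml = '<doc><![CDATA[a<b]]></doc>';

# default: the CDATA section is output as a CDATA section
my $kept    = XML::Twig->new->parse($xml)->sprint;
# remove_cdata: the same content is output as escaped PCDATA
my $escaped = XML::Twig->new( remove_cdata => 1 )->parse($xml)->sprint;

print "$kept\n$escaped\n";
```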
758
759           parse_start_tag
               If you use the "keep_encoding" option then this option can be
               used to replace the default parsing function. You should
               provide a coderef (a reference to a subroutine) as the
               argument. This subroutine takes the original tag (given by the
               XML::Parser::Expat "original_string()" method) and returns a
               tag and the attributes in a hash (or in a list of
               attribute_name/attribute_value pairs).
767
768           expand_external_ents
               When this option is used, external entities (that are defined)
               are expanded when the document is output using "print"
               functions such as "print", "sprint", "flush" and
               "xml_string". Note that in the twig the entity will be stored
               as an element with a tag "#ENT"; the entity will not be
               expanded there, so you might want to process the entities
               before outputting them.
776
777               If an external entity is not available, then the parse will
778               fail.
779
780               A special case is when the value of this option is -1. In that
781               case a missing entity will not cause the parser to die, but its
782               "name", "sysid" and "pubid" will be stored in the twig as
783               "$twig->{twig_missing_system_entities}" (a reference to an
784               array of hashes { name => <name>, sysid => <sysid>, pubid =>
785               <pubid> }). Yes, this is a bit of a hack, but it's useful in
786               some cases.
787
788           load_DTD
789               If this argument is set to a true value, "parse" or "parsefile"
               on the twig will load the DTD information. This information
791               can then be accessed through the twig, in a "DTD_handler" for
792               example. This will load even an external DTD.
793
794               Default and fixed values for attributes will also be filled,
795               based on the DTD.
796
797               Note that to do this the module will generate a temporary file
798               in the current directory. If this is a problem let me know and
799               I will add an option to specify an alternate directory.
800
               See "DTD Handling" for more information.
802
803           DTD_handler
804               Set a handler that will be called once the doctype (and the
805               DTD) have been loaded, with 2 arguments, the twig and the DTD.
806
807           no_prolog
               Does not output a prolog (XML declaration and DTD).
809
810           id  This optional argument gives the name of an attribute that can
               be used as an ID in the document. Elements whose ID is known
               can be accessed through the "elt_id" method. id defaults to
               'id'. See "BUGS".
814
815           discard_spaces
816               If this optional argument is set to a true value then spaces
817               are discarded when they look non-significant: strings
818               containing only spaces and at least one line feed are
819               discarded. This argument is set to true by default.
820
821               The exact algorithm to drop spaces is: strings including only
822               spaces (perl \s) and at least one \n right before an open or
823               close tag are dropped.
824
825           discard_all_spaces
826               If this argument is set to a true value, spaces are discarded
827               more aggressively than with "discard_spaces": strings not
828               including a \n are also dropped. This option is appropriate for
829               data-oriented XML.
830
831           keep_spaces
832               If this optional argument is set to a true value then all
833               spaces in the document are kept, and stored as "PCDATA".
834
835               Warning: adding this option can result in changes in the twig
836               generated: space that was previously discarded might end up in
               a new text element. See the difference by calling the following
838               code with 0 and 1 as arguments:
839
840                 perl -MXML::Twig -e'print XML::Twig->new( keep_spaces => shift)->parse( "<d> \n<e/></d>")->_dump'
841
               "keep_spaces" and "discard_spaces" cannot both be set.
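
               Below is the same comparison as the one-liner above, written as a
               small script that captures both outputs.

```perl
use strict;
use warnings;
use XML::Twig;

my $xml = "<d> \n<e/></d>";

# default (discard_spaces): the " \n" before <e/> is dropped
my $discarded = XML::Twig->new->parse($xml)->sprint;
# keep_spaces: the same string is stored as PCDATA and output again
my $kept = XML::Twig->new( keep_spaces => 1 )->parse($xml)->sprint;

print "$discarded\n$kept\n";
```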
843
844           discard_spaces_in
845               This argument sets "keep_spaces" to true but will cause the
846               twig builder to discard spaces in the elements listed.
847
848               The syntax for using this argument is:
849
850                 XML::Twig->new( discard_spaces_in => [ 'elt1', 'elt2']);
851
852           keep_spaces_in
853               This argument sets "discard_spaces" to true but will cause the
854               twig builder to keep spaces in the elements listed.
855
856               The syntax for using this argument is:
857
858                 XML::Twig->new( keep_spaces_in => [ 'elt1', 'elt2']);
859
860               Warning: adding this option can result in changes in the twig
861               generated: space that was previously discarded might end up in
862               a new text element.
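
               Below is a minimal sketch. It assumes a made-up document in which
               only the "pre" element should keep its whitespace-only text.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new( keep_spaces_in => ['pre'] );
$twig->parse("<doc><pre> \n</pre><p> \n</p></doc>");

# whitespace-only text is kept in <pre> but discarded in <p>
my $pre_text = $twig->first_elt('pre')->text;
my $p_text   = $twig->first_elt('p')->text;
printf "pre: %d chars, p: %d chars\n", length $pre_text, length $p_text;
```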
863
864           pretty_print
               Set the pretty print method, amongst "none" (default),
               "nsgmls", "nice", "indented", "indented_c", "indented_a",
               "indented_close_tag", "cvs", "wrapped", "record" and
               "record_c".
869
870               pretty_print formats:
871
872               none
                   The document is output as one long string, with no line
                   breaks except those found within text elements.
875
876               nsgmls
877                   Line breaks are inserted in safe places: that is within
878                   tags, between a tag and an attribute, between attributes
879                   and before the > at the end of a tag.
880
                   This is quite ugly but better than "none", and it is very
                   safe: the document will still be valid (conforming to its
                   DTD).
884
885                   This is how the SGML parser "sgmls" splits documents, hence
886                   the name.
887
888               nice
                   This option inserts line breaks before any tag that does
                   not contain text (so elements with textual content are not
                   broken, as the \n is significant there).
892
893                   WARNING: this option leaves the document well-formed but
894                   might make it invalid (not conformant to its DTD). If you
895                   have elements declared as
896
897                     <!ELEMENT foo (#PCDATA|bar)>
898
899                   then a "foo" element including a "bar" one will be printed
900                   as
901
902                     <foo>
903                     <bar>bar is just pcdata</bar>
904                     </foo>
905
                   This is invalid, as the parser will take the line break
                   after the "foo" tag as a sign that the element contains
                   PCDATA, and it will then die when it finds the "bar" tag.
                   This may or may not be important for you, but be aware of
                   it!
910
911               indented
912                   Same as "nice" (and with the same warning) but indents
913                   elements according to their level
914
915               indented_c
916                   Same as "indented" but a little more compact: the closing
917                   tags are on the same line as the preceding text
918
919               indented_close_tag
920                   Same as "indented" except that the closing tag is also
921                   indented, to line up with the tags within the element
922
               indented_a
924                   This formats XML files in a line-oriented version control
925                   friendly way.  The format is described in
926                   <http://tinyurl.com/2kwscq> (that's an Oracle document with
927                   an insanely long URL).
928
                   Note that to be totally conformant to the "spec", the order
930                   of attributes should not be changed, so if they are not
931                   already in alphabetical order you will need to use the
932                   "keep_atts_order" option.
933
               cvs Same as "indented_a".
935
936               wrapped
937                   Same as "indented_c" but lines are wrapped using
938                   Text::Wrap::wrap. The default length for lines is the
939                   default for $Text::Wrap::columns, and can be changed by
940                   changing that variable.
941
942               record
                   This is a record-oriented pretty print that displays data
944                   in records, one field per line (which looks a LOT like
945                   "indented")
946
947               record_c
                   Stands for "record compact": one record per line.
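
               For instance, below is a minimal sketch of the "indented" style
               on a made-up document. Each child element ends up on its own
               line, indented below its parent.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new( pretty_print => 'indented' );
$twig->parse('<doc><a>x</a><b/></doc>');

# <a> and <b/> each go on their own line, indented under <doc>
my $out = $twig->sprint;
print $out, "\n";
```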
949
950           empty_tags
               Set the empty tag display style ("normal", "html" or
               "expand").

               "normal" outputs an empty tag as "<tag/>", "html" adds a space
               ("<tag />") for elements that can be empty in XHTML, and
               "expand" outputs "<tag></tag>".
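
               Below is a minimal sketch of the "expand" and "html" styles on a
               made-up document.

```perl
use strict;
use warnings;
use XML::Twig;

my $xml = '<doc><br/></doc>';

# sprint each twig before creating the next, in case the style is a global setting
my $expand = XML::Twig->new( empty_tags => 'expand' )->parse($xml)->sprint;
my $html   = XML::Twig->new( empty_tags => 'html' )->parse($xml)->sprint;

print "$expand\n";    # <doc><br></br></doc>
print "$html\n";      # <doc><br /></doc>
```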
957
958           quote
               Set the quote character for attributes ("single" or
               "double").
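
               Below is a minimal sketch using a made-up document.

```perl
use strict;
use warnings;
use XML::Twig;

my $out = XML::Twig->new( quote => 'single' )->parse('<doc a="1"/>')->sprint;
print "$out\n";    # <doc a='1'/>
```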
961
962           escape_gt
963               By default XML::Twig does not escape the character > in its
964               output, as it is not mandated by the XML spec. With this option
               on, > will be replaced by "&gt;".
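
               Below is a minimal sketch that compares both settings on a made-up
               document.

```perl
use strict;
use warnings;
use XML::Twig;

my $xml = '<doc>a &gt; b</doc>';

# sprint each twig before creating the next, in case the setting is global
my $plain   = XML::Twig->new->parse($xml)->sprint;                    # '>' left as is
my $escaped = XML::Twig->new( escape_gt => 1 )->parse($xml)->sprint;  # '>' written as &gt;
print "$plain\n$escaped\n";
```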
966
967           comments
               Set the way comments are processed: "drop" (default), "keep"
               or "process".
970
971               Comments processing options:
972
973               drop
                   Drops the comments: they are not read, nor printed to the
                   output.

               keep
                   Comments are loaded and will appear on the output; they are
                   not accessible within the twig and will not interfere with
                   processing though.
981
982                   Note: comments in the middle of a text element such as
983
                     <p>text <!-- comment --> more text</p>

                   are kept at their original position in the text. Using
                   "print" methods like "print" or "sprint" will return the
                   comments in the text. Using "text" or "field" on the other
                   hand will not.
990
991                   Any use of "set_pcdata" on the "#PCDATA" element (directly
992                   or through other methods like "set_content") will delete
993                   the comment(s).
994
995               process
                   Comments are loaded in the twig and will be treated as
                   regular elements (their "tag" is "#COMMENT"). This can
998                   interfere with processing if you expect
999                   "$elt->{first_child}" to be an element but find a comment
1000                   there.  Validation will not protect you from this as
1001                   comments can happen anywhere.  You can use
1002                   "$elt->first_child( 'tag')" (which is a good habit anyway)
1003                   to get where you want.
1004
1005                   Consider using "process" if you are outputting SAX events
1006                   from XML::Twig.
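
               Below is a minimal sketch of the "keep" behavior described above.
               It puts a comment in the middle of a made-up text element.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new( comments => 'keep' );
$twig->parse('<doc><p>a <!-- c --> b</p></doc>');

my $printed = $twig->sprint;                 # print methods include the comment
my $text    = $twig->first_elt('p')->text;   # text does not
print "$printed\n[$text]\n";
```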
1007
           pi  Set the way processing instructions are processed: "drop",
               "keep" (default) or "process".
1010
1011               Note that you can also set PI handlers in the "twig_handlers"
1012               option:
1013
1014                 '?'       => \&handler
                 '?target' => \&handler_2
1016
1017               The handlers will be called with 2 parameters, the twig and the
1018               PI element if "pi" is set to "process", and with 3, the twig,
1019               the target and the data if "pi" is set to "keep". Of course
1020               they will not be called if "pi" is set to "drop".
1021
               If "pi" is set to "keep" the handler should return a string
               that will be used as-is as the PI text (it should look like
               "<?target data?>", or '' if you want to remove the PI).
1025
1026               Only one handler will be called, "?target" or "?" if no
1027               specific handler for that target is available.
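
               Below is a minimal sketch of a "keep"-mode handler for a made-up
               "app" target. The handler records the target and data it
               receives, and removes the PI by returning ''.

```perl
use strict;
use warnings;
use XML::Twig;

my ( $seen_target, $seen_data );
my $twig = XML::Twig->new(
    pi            => 'keep',
    twig_handlers => {
        # in "keep" mode the handler gets the twig, the target and the data,
        # and returns the text to output in place of the PI
        '?app' => sub {
            my ( $t, $target, $data ) = @_;
            ( $seen_target, $seen_data ) = ( $target, $data );
            return '';    # '' removes the PI from the output
        },
    },
);
$twig->parse('<doc><?app go?><a/></doc>');

my $out = $twig->sprint;
print "$out\n";
```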
1028
1029           map_xmlns
               This option is passed a hashref that maps URIs to prefixes.
1031               The prefixes in the document will be replaced by the ones in
1032               the map. The mapped prefixes can (actually have to) be used to
1033               trigger handlers, navigate or query the document.
1034
1035               Here is an example:
1036
1037                 my $t= XML::Twig->new( map_xmlns => {'http://www.w3.org/2000/svg' => "svg"},
1038                                        twig_handlers =>
1039                                          { 'svg:circle' => sub { $_->set_att( r => 20) } },
1040                                        pretty_print => 'indented',
1041                                      )
1042                                 ->parse( '<doc xmlns:gr="http://www.w3.org/2000/svg">
1043                                             <gr:circle cx="10" cy="90" r="10"/>
1044                                          </doc>'
1045                                        )
1046                                 ->print;
1047
1048               This will output:
1049
1050                 <doc xmlns:svg="http://www.w3.org/2000/svg">
1051                    <svg:circle cx="10" cy="90" r="20"/>
1052                 </doc>
1053
1054           keep_original_prefix
1055               When used with "map_xmlns" this option will make "XML::Twig"
1056               use the original namespace prefixes when outputting a document.
1057               The mapped prefix will still be used for triggering handlers
1058               and in navigation and query methods.
1059
1060                 my $t= XML::Twig->new( map_xmlns => {'http://www.w3.org/2000/svg' => "svg"},
1061                                        twig_handlers =>
1062                                          { 'svg:circle' => sub { $_->set_att( r => 20) } },
1063                                        keep_original_prefix => 1,
1064                                        pretty_print => 'indented',
1065                                      )
1066                                 ->parse( '<doc xmlns:gr="http://www.w3.org/2000/svg">
1067                                             <gr:circle cx="10" cy="90" r="10"/>
1068                                          </doc>'
1069                                        )
1070                                 ->print;
1071
1072               This will output:
1073
1074                 <doc xmlns:gr="http://www.w3.org/2000/svg">
1075                    <gr:circle cx="10" cy="90" r="20"/>
1076                 </doc>
1077
1078           original_uri ($prefix)
               Called within a handler, this will return the URI bound to the
1080               namespace prefix in the original document.
1081
1082           index ($arrayref or $hashref)
1083               This option creates lists of specific elements during the
1084               parsing of the XML.  It takes a reference to either a list of
1085               triggering expressions or to a hash name => expression, and for
1086               each one generates the list of elements that match the
1087               expression. The list can be accessed through the "index"
1088               method.
1089
1090               example:
1091
1092                 # using an array ref
1093                 my $t= XML::Twig->new( index => [ 'div', 'table' ])
1094                                 ->parsefile( "foo.xml");
1095                 my $divs= $t->index( 'div');
1096                 my $first_div= $divs->[0];
1097                 my $last_table= $t->index( table => -1);
1098
1099                 # using a hashref to name the indexes
                 my $t= XML::Twig->new( index => { email => 'a[@href=~/^\s*mailto:/]'})
                                 ->parsefile( "foo.xml");
                 my $last_email= $t->index( email => -1);
1103
               Note that the index is not maintained after the parsing. If
               elements are deleted, renamed or otherwise hurt during
               processing, the index is NOT updated. (Changing the id of an
               element, on the other hand, will update the index.)
1108
1109           att_accessors <list of attribute names>
               Creates methods that give direct access to attributes:

                 my $t= XML::Twig->new( att_accessors => [ 'href', 'src'])
                                 ->parsefile( $file);
                 my $first_src= $t->first_elt( 'img')->src; # same as ->att( 'src')
                 $t->first_elt( 'img')->src( 'new_logo.png'); # changes the attribute value
1116
1117           elt_accessors
               Creates methods that give direct access to the first child
               element (in scalar context) or the list of elements (in list
               context).

               The list of accessors to create can be given in two different
               ways: in an array, or in a hash alias => expression.

                 my $t=  XML::Twig->new( elt_accessors => [ 'head'])
                                 ->parsefile( $file);
                 my $title_text= $t->root->head->field( 'title');
                 # same as $title_text= $t->root->first_child( 'head')->field( 'title');
1129
1130                 my $t=  XML::Twig->new( elt_accessors => { warnings => 'p[@class="warning"]', d2 => 'div[2]'}, )
1131                                 ->parsefile( $file);
1132                 my $body= $t->first_elt( 'body');
1133                 my @warnings= $body->warnings; # same as $body->children( 'p[@class="warning"]');
1134                 my $s2= $body->d2;             # same as $body->first_child( 'div[2]')
1135
1136           field_accessors
               Creates methods that give direct access to the text of the
               first child element with that name:

                 my $t=  XML::Twig->new( field_accessors => [ 'title'])
                                 ->parsefile( $file);
                 my $div_title_text= $t->first_elt( 'div')->title;
                 # same as $div_title_text= $t->first_elt( 'div')->field( 'title');
1144
1145           use_tidy
               Set this option to use HTML::Tidy instead of HTML::TreeBuilder
               to convert HTML to XML. HTML, especially real (real "crap")
               HTML found in the wild, is hard to convert reliably, so
               depending on the data, one module or the other does a better
               job at the conversion. Also, HTML::Tidy can be a bit difficult
               to install, so XML::Twig offers both options. TIMTOWTDI
1152
1153           output_html_doctype
               When using HTML::TreeBuilder to convert HTML, this option
               causes the DOCTYPE declaration to be output, which may be
               important for some legacy browsers.  Without that option the
               DOCTYPE definition is NOT output. Also if the definition is
               completely wrong (i.e. not easily parsable), it is not output
               either.
1160
           Note: I _HATE_ the Java-like names of arguments used by most XML
1162           modules.  So in pure TIMTOWTDI fashion all arguments can be written
1163           either as "UglyJavaLikeName" or as "readable_perl_name":
1164           "twig_print_outside_roots" or "TwigPrintOutsideRoots" (or even
1165           "twigPrintOutsideRoots" {shudder}).  XML::Twig normalizes them
1166           before processing them.
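
           For example, both spellings below configure the same handler. This
           is a minimal sketch. The document and the handler are made up.

```perl
use strict;
use warnings;
use XML::Twig;

my $xml = '<doc><a>x</a></doc>';

# the same handler, passed once under each spelling of the option name
my $readable = XML::Twig->new( twig_handlers => { a => sub { $_->set_tag('b') } } )
                        ->parse($xml)->sprint;
my $java     = XML::Twig->new( TwigHandlers  => { a => sub { $_->set_tag('b') } } )
                        ->parse($xml)->sprint;

print "$readable\n$java\n";
```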
1167
1168       parse ( $source)
1169           The $source parameter should either be a string containing the
1170           whole XML document, or it should be an open "IO::Handle" (aka a
1171           filehandle).
1172
           The method dies if a parse error occurs. Otherwise it returns the
           twig built by the parse. Use "safe_parse" if you want the parsing
           to return even when an error occurs.

           If this method is called as a class method ("XML::Twig->parse(
           $some_xml_or_html)") then an XML::Twig object is created, using the
           parameters except the last one (e.g. "XML::Twig->parse(
           pretty_print => 'indented', $some_xml_or_html)"), and "xparse" is
           called on it.
1181
           Note that when parsing a filehandle, the handle should NOT be
           opened with an encoding (i.e. open it with "open( my $in, '<',
           $filename)"). The file will be parsed by "expat", so specifying the
           encoding actually causes problems for the parser (as in: it can
           crash it, see https://rt.cpan.org/Ticket/Display.html?id=78877).
           For parsing a file it is actually recommended to use "parsefile" on
           the file name, instead of "parse" on the open file.
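
           Below is a minimal sketch of both outcomes, using made-up
           documents. A well-formed string returns the twig. A malformed one
           dies, and "eval" catches the error here.

```perl
use strict;
use warnings;
use XML::Twig;

# well-formed input: parse returns the twig
my $twig = XML::Twig->new->parse('<doc><a/></doc>');
print ref($twig), "\n";

# malformed input (<a> is never closed): parse dies
my $ok    = eval { XML::Twig->new->parse('<doc><a></doc>'); 1 };
my $error = $@;
print $ok ? "parsed\n" : "parse failed: $error";
```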
1189
1190       parsestring
1191           This is just an alias for "parse" for backwards compatibility.
1192
1193       parsefile (FILE [, OPT => OPT_VALUE [...]])
1194           Open "FILE" for reading, then call "parse" with the open handle.
1195           The file is closed no matter how "parse" returns.
1196
           The method dies if a parse error occurs. Otherwise it returns the
           twig built by the parse. Use "safe_parsefile" if you want the
           parsing to return even when an error occurs.
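
           Below is a minimal sketch. It first writes a made-up document to a
           temporary file. File::Temp is used only to keep the example
           self-contained.

```perl
use strict;
use warnings;
use XML::Twig;
use File::Temp qw(tempfile);

# write a small document to a temporary file (removed at exit)
my ( $fh, $file ) = tempfile( SUFFIX => '.xml', UNLINK => 1 );
print {$fh} '<doc><title>Hello</title></doc>';
close $fh or die "cannot close $file: $!";

my $twig  = XML::Twig->new->parsefile($file);
my $title = $twig->root->first_child('title')->text;
print "$title\n";    # Hello
```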
1200
1201       parsefile_inplace ( $file, $optional_extension)
1202           Parse and update a file "in place". It does this by creating a temp
1203           file, selecting it as the default for print() statements (and
1204           methods), then parsing the input file. If the parsing is
1205           successful, then the temp file is moved to replace the input file.
1206
           If an extension is given then the original file is backed up (the
           rules for the extension are the same as the rules for the -i
           option in perl).
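
           Below is a minimal sketch that applies a made-up change to a
           made-up file. Output only reaches the temp file through print
           statements issued during the parse. For that reason, a handler on
           the root element prints the whole twig once it is complete.

```perl
use strict;
use warnings;
use XML::Twig;
use File::Temp qw(tempfile);

my ( $fh, $file ) = tempfile( SUFFIX => '.xml', UNLINK => 1 );
print {$fh} '<doc><para>x</para></doc>';
close $fh or die "cannot close $file: $!";

XML::Twig->new(
    twig_handlers => {
        para => sub { $_->set_tag('p') },    # the change to make
        doc  => sub { $_[0]->print },        # root done: print to the selected temp file
    },
)->parsefile_inplace($file);

open my $in, '<', $file or die "cannot open $file: $!";
my $updated = do { local $/; <$in> };
close $in;
print $updated, "\n";
```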
1210
1211       parsefile_html_inplace ( $file, $optional_extension)
           Same as "parsefile_inplace", except that it parses HTML instead of
           XML.
1214
       parseurl ($url, $optional_user_agent)
           Gets the data from $url and parses it. The data is piped to the
1217           parser in chunks the size of the XML::Parser::Expat buffer, so
1218           memory consumption and hopefully speed are optimal.
1219
1220           For most (read "small") XML it is probably as efficient (and easier
1221           to debug) to just "get" the XML file and then parse it as a string.
1222
1223             use XML::Twig;
1224             use LWP::Simple;
1225             my $twig= XML::Twig->new();
1226             $twig->parse( LWP::Simple::get( $URL ));
1227
1228           or
1229
1230             use XML::Twig;
1231             my $twig= XML::Twig->nparse( $URL);
1232
           If $optional_user_agent is given, it is used to fetch the URL;
           otherwise a new one is created.
1235
1236       safe_parse ( SOURCE [, OPT => OPT_VALUE [...]])
1237           This method is similar to "parse" except that it wraps the parsing
1238           in an "eval" block. It returns the twig on success and 0 on failure
1239           (the twig object also contains the parsed twig). $@ contains the
1240           error message on failure.
1241
           Note that the parsing still stops as soon as an error is detected;
           there is no way to keep going after an error.
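
           Below is a minimal sketch of "safe_parse" on a made-up malformed
           document. The code copies $@ right away, before anything else can
           reset it.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig   = XML::Twig->new;
my $result = $twig->safe_parse('<doc><a></doc>');   # <a> is never closed
my $error  = $@;                                     # copy it before it can be reset

if ($result) { print "parsed\n" }
else         { print "not well-formed: $error" }
```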
1244
1245       safe_parsefile (FILE [, OPT => OPT_VALUE [...]])
1246           This method is similar to "parsefile" except that it wraps the
1247           parsing in an "eval" block. It returns the twig on success and 0 on
           failure (the twig object also contains the parsed twig). $@
           contains the error message on failure.
1250
           Note that the parsing still stops as soon as an error is detected;
           there is no way to keep going after an error.
1253
       safe_parseurl ($url, $optional_user_agent)
           Same as "parseurl" except that it wraps the parsing in an "eval"
           block. It returns the twig on success and 0 on failure (the twig
           object also contains the parsed twig). $@ contains the error
           message on failure.
1259
1260       parse_html ($string_or_fh)
1261           parse an HTML string or file handle (by converting it to XML using
1262           HTML::TreeBuilder, which needs to be available).
1263
1264           This works nicely, but some information gets lost in the process:
1265           newlines are removed, and (at least on the version I use) comments
1266           get an extra CDATA section inside (<!-- foo --> becomes
1267           <!-- <![CDATA[ foo ]]> -->).
1268
1269       parsefile_html ($file)
1270           parse an HTML file (by converting it to XML using
1271           HTML::TreeBuilder, which needs to be available, or HTML::Tidy if
1272           the "use_tidy" option was used).  The file is loaded completely in
1273           memory and converted to XML before being parsed.
1274
1275           This method should be used with caution though: as it doesn't know
1276           about the file encoding, it is usually better to use "parse_html",
1277           which gives you a chance to open the file with the proper encoding
1278           layer.
1279
1280       parseurl_html ($url, $optional_user_agent)
1281           parse a URL as HTML the same way "parse_html" does
1282
1283       safe_parseurl_html ($url, $optional_user_agent)
1284           Same as "parseurl_html" except that it wraps the parsing in an
1285           "eval" block.  It returns the twig on success and 0 on failure (the
1286           twig object also contains the parsed twig). $@ contains the error
1287           message on failure.
1288
1289       safe_parsefile_html ($file $optional_user_agent)
1290           Same as "parsefile_html" except that it wraps the parsing in an
1291           "eval" block.  It returns the twig on success and 0 on failure (the
1292           twig object also contains the parsed twig). $@ contains the error
1293           message on failure.
1294
1295       safe_parse_html ($string_or_fh)
1296           Same as "parse_html" except that it wraps the parsing in an "eval"
1297           block.  It returns the twig on success and 0 on failure (the twig
1298           object also contains the parsed twig). $@ contains the error
1299           message on failure.
1300
1301       xparse ($thing_to_parse)
1302           parse the $thing_to_parse, whether it is a filehandle, a string, an
1303           HTML file, an HTML URL, a URL or a file.
1304
1305           Note that this is mostly a convenience method for one-off scripts.
1306           For example, files that end in '.htm' or '.html' are parsed first as
1307           XML, and if this fails, as HTML. This is certainly not the most
1308           efficient way to do this in general.
1309
1310       nparse ($optional_twig_options, $thing_to_parse)
1311           create a twig with the $optional_twig_options, and parse the
1312           $thing_to_parse, whether it is a filehandle, a string, an HTML
1313           file, an HTML URL, a URL or a file.
1314
1315           Examples:
1316
1317              XML::Twig->nparse( "file.xml");
1318              XML::Twig->nparse( error_context => 1, "file://file.xml");
1319
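           For instance, a sketch passing an XML string directly (the string
           is made up for illustration):

```perl
             use strict;
             use warnings;
             use XML::Twig;

             # options first, the thing to parse (here an XML string) last
             my $t = XML::Twig->nparse( error_context => 1, '<doc><p>hello</p></doc>');

             print $t->root->first_child('p')->text, "\n";
```
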
1320       nparse_pp ($optional_twig_options, $thing_to_parse)
1321           same as "nparse" but also sets the "pretty_print" option to
1322           "indented".
1323
1324       nparse_e ($optional_twig_options, $thing_to_parse)
1325           same as "nparse" but also sets the "error_context" option to 1.
1326
1327       nparse_ppe ($optional_twig_options, $thing_to_parse)
1328           same as "nparse" but also sets the "pretty_print" option to
1329           "indented" and the "error_context" option to 1.
1330
1331       parser
1332           This method returns the "expat" object (actually the
1333           XML::Parser::Expat object) used during parsing. It is useful for
1334           example to call XML::Parser::Expat methods on it. To get the line
1335           of a tag for example use "$t->parser->current_line".
1336
1337       setTwigHandlers ($handlers)
1338           Set the twig_handlers. $handlers is a reference to a hash similar
1339           to the one in the "twig_handlers" option of new. All previous
1340           handlers are unset.  The method returns the reference to the
1341           previous handlers.
1342
1343       setTwigHandler ($exp, $handler)
1344           Set a single twig_handler for elements matching $exp. $handler is a
1345           reference to a subroutine. If the handler was previously set then
1346           the reference to the previous handler is returned.
1347
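           A short sketch, assuming a document with "title" elements (the
           handler and the @titles array are illustrative only):

```perl
             use strict;
             use warnings;
             use XML::Twig;

             my @titles;
             my $t = XML::Twig->new;

             # install a handler after the twig has been created; like all
             # twig handlers it receives the twig and the element
             $t->setTwigHandler( title => sub {
                 my ($twig, $elt) = @_;
                 push @titles, $elt->text;
                 return 1;
             });

             $t->parse('<doc><title>First</title><title>Second</title></doc>');
             print join(', ', @titles), "\n";
```
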
1348       setStartTagHandlers ($handlers)
1349           Set the start_tag handlers. $handlers is a reference to a hash
1350           similar to the one in the "start_tag_handlers" option of new. All
1351           previous handlers are unset.  The method returns the reference to
1352           the previous handlers.
1353
1354       setStartTagHandler ($exp, $handler)
1355           Set a single start_tag handler for elements matching $exp.
1356           $handler is a reference to a subroutine. If the handler was
1357           previously set then the reference to the previous handler is
1358           returned.
1359
1360       setEndTagHandlers ($handlers)
1361           Set the end_tag handlers. $handlers is a reference to a hash
1362           similar to the one in the "end_tag_handlers" option of new. All
1363           previous handlers are unset.  The method returns the reference to
1364           the previous handlers.
1365
1366       setEndTagHandler ($exp, $handler)
1367           Set a single end_tag handler for elements matching $exp. $handler
1368           is a reference to a subroutine. If the handler was previously set
1369           then the reference to the previous handler is returned.
1370
1371       setTwigRoots ($handlers)
1372           Same as using the "twig_roots" option when creating the twig
1373
1374       setCharHandler ($exp, $handler)
1375           Set a "char_handler"
1376
1377       setIgnoreEltsHandler ($exp)
1378           Set an "ignore_elt" handler (elements that match $exp will be
1379           ignored).
1380
1381       setIgnoreEltsHandlers ($exp)
1382           Set all "ignore_elt" handlers (previous handlers are replaced)
1383
1384       dtd Return the dtd (an XML::Twig::DTD object) of a twig
1385
1386       xmldecl
1387           Return the XML declaration for the document, or a default one if it
1388           doesn't have one
1389
1390       doctype
1391           Return the doctype for the document
1392
1393       doctype_name
1394           returns the doctype name, from the document's doctype declaration
1395
1396       system_id
1397           returns the system identifier of the DTD of the document, from the
1398           doctype declaration
1399
1400       public_id
1401           returns the public identifier of the document, from the doctype
1402           declaration
1403
1404       internal_subset
1405           returns the internal subset of the DTD
1406
1407       dtd_text
1408           Return the DTD text
1409
1410       dtd_print
1411           Print the DTD
1412
1413       model ($tag)
1414           Return the model (in the DTD) for the element $tag
1415
1416       root
1417           Return the root element of a twig
1418
1419       set_root ($elt)
1420           Set the root of a twig
1421
1422       first_elt ($optional_condition)
1423           Return the first element of a twig matching $optional_condition;
1424           if no condition is given, the root is returned.
1425
1426       last_elt ($optional_condition)
1427           Return the last element of a twig matching $optional_condition;
1428           if no condition is given, the last element of the twig is returned.
1429
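           For example (an illustrative document; the conditions here are
           plain tag names):

```perl
             use strict;
             use warnings;
             use XML::Twig;

             my $t = XML::Twig->new->parse(
                 '<doc><p>one</p><note>aside</note><p>two</p></doc>');

             my $first_p = $t->first_elt('p');   # first <p> in document order
             my $last_p  = $t->last_elt('p');    # last <p> in document order
             my $root    = $t->first_elt;        # no condition: the root

             print $first_p->text, ' ', $last_p->text, ' ', $root->tag, "\n";
```
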
1430       elt_id        ($id)
1431           Return the element whose "id" attribute is $id
1432
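           A minimal sketch, assuming the IDs are stored in the default "id"
           attribute:

```perl
             use strict;
             use warnings;
             use XML::Twig;

             my $t = XML::Twig->new->parse(
                 '<doc><sec id="intro">Hello</sec><sec id="body">World</sec></doc>');

             # look the element up by the value of its "id" attribute
             my $sec = $t->elt_id('body');
             print $sec->text, "\n";
```
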
1433       getEltById
1434           Same as "elt_id"
1435
1436       index ($index_name, $optional_index)
1437           If the $optional_index argument is present, return the
1438           corresponding element in the index (created using the "index"
1439           option of "XML::Twig->new")
1440
1441           If the argument is not present, return an arrayref to the index
1442
1443       normalize
1444           merge together all consecutive pcdata elements in the document (if
1445           for example you have turned some elements into pcdata using
1446           "erase", this will give you a "clean" document in which all
1447           text elements are as long as possible).
1448
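           A sketch of the situation described above, using "erase" on a
           made-up document:

```perl
             use strict;
             use warnings;
             use XML::Twig;

             my $t = XML::Twig->new->parse('<doc>foo <b>bar</b> baz</doc>');

             # erase removes the <b> tags but keeps their content, leaving
             # several consecutive text (#PCDATA) children in <doc>
             $t->first_elt('b')->erase;

             # normalize merges those consecutive text children into one
             $t->normalize;

             my @children = $t->root->children;
             print scalar(@children), " child: '", $t->root->text, "'\n";
```
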
1449       encoding
1450           This method returns the encoding of the XML document, as defined by
1451           the "encoding" attribute in the XML declaration (i.e. it is "undef"
1452           if the attribute is not defined)
1453
1454       set_encoding
1455           This method sets the value of the "encoding" attribute in the XML
1456           declaration.  Note that if the document did not have a declaration
1457           it is generated (with an XML version of 1.0)
1458
1459       xml_version
1460           This method returns the XML version, as defined by the "version"
1461           attribute in the XML declaration (i.e. it is "undef" if the attribute
1462           is not defined)
1463
1464       set_xml_version
1465           This method sets the value of the "version" attribute in the XML
1466           declaration.  If the declaration did not exist it is created.
1467
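           A minimal sketch, assuming a document without an XML declaration:

```perl
             use strict;
             use warnings;
             use XML::Twig;

             my $t = XML::Twig->new->parse('<doc/>');

             # no XML declaration in the input, so no encoding is defined
             my $before = $t->encoding;

             # a declaration (XML version 1.0) is generated when there is none
             $t->set_encoding('UTF-8');

             my $xml = $t->sprint;
             print "$xml\n";
```
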
1468       standalone
1469           This method returns the value of the "standalone" declaration for
1470           the document
1471
1472       set_standalone
1473           This method sets the value of the "standalone" attribute in the XML
1474           declaration.  Note that if the document did not have a declaration
1475           it is generated (with an XML version of 1.0)
1476
1477       set_output_encoding
1478           Set the "encoding" attribute in the XML declaration
1479
1480       set_doctype ($name, $system, $public, $internal)
1481           Set the doctype of the document. If an argument is "undef" (or not
1482           present) then its former value is retained; if a false ('' or 0)
1483           value is passed then the former value is deleted.
1484
1485       entity_list
1486           Return the entity list of a twig
1487
1488       entity_names
1489           Return the list of all defined entities
1490
1491       entity ($entity_name)
1492           Return the entity
1493
1494       change_gi      ($old_gi, $new_gi)
1495           Performs a (very fast) global change: all elements named $old_gi
1496           are renamed $new_gi. This is a bit dangerous though and should be
1497           avoided if possible, as the new tag might be ignored in subsequent processing.
1498
1499           See "BUGS".
1500
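           For example, renaming every "para" element to "p" in an
           illustrative document:

```perl
             use strict;
             use warnings;
             use XML::Twig;

             my $t = XML::Twig->new->parse('<doc><para>a</para><para>b</para></doc>');

             # every element named "para" is now named "p"
             $t->change_gi( para => 'p');

             print $t->sprint, "\n";
```
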
1501       flush            ($optional_filehandle, %options)
1502           Flushes a twig up to (and including) the current element, then
1503           deletes all unnecessary elements from the tree that's kept in
1504           memory.  "flush" keeps track of which elements need to be
1505           open/closed, so if you flush from handlers you don't have to worry
1506           about anything. Just keep flushing the twig every time you're done
1507           with a sub-tree and it will come out well-formed. After the whole
1508           parsing, don't forget to "flush" one more time to print the end of
1509           the document.  The doctype and entity declarations are also
1510           printed.
1511
1512           "flush" takes an optional filehandle as an argument.
1513
1514           If you use "flush" at any point during parsing, the document will
1515           be flushed one last time at the end of the parsing, to the proper
1516           filehandle.
1517
1518           options: use the "update_DTD" option if you have updated the
1519           (internal) DTD and/or the entity list and you want the updated DTD
1520           to be output
1521
1522           The "pretty_print" option sets the pretty printing of the document.
1523
1524              Example: $t->flush( Update_DTD => 1);
1525                       $t->flush( $filehandle, pretty_print => 'indented');
1526                       $t->flush( \*FILE);
1527
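           A minimal streaming sketch, assuming the output should go to an
           in-memory filehandle rather than STDOUT (the "list"/"item"
           document is made up):

```perl
             use strict;
             use warnings;
             use XML::Twig;

             # collect the output in a string instead of printing to STDOUT
             open my $out, '>', \my $buffer or die "cannot open in-memory file: $!";

             my $t = XML::Twig->new(
                 twig_handlers => {
                     # each <item> is output, then freed, as soon as it is complete
                     item => sub { my ($twig, $item) = @_; $twig->flush($out); },
                 },
             );
             $t->parse('<list><item>1</item><item>2</item></list>');
             $t->flush($out);   # final flush (also done automatically, see above)
             close $out;

             print $buffer, "\n";
```
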
1528       flush_up_to ($elt, $optional_filehandle, %options)
1529           Flushes up to the $elt element. This allows you to keep part of the
1530           tree in memory when you "flush".
1531
1532           options: see flush.
1533
1534       purge
1535           Does the same as a "flush" except it does not print the twig. It
1536           just deletes all elements that have been completely parsed so far.
1537
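           A sketch of the typical use: extract data from each record, then
           purge it (the "item" records are illustrative):

```perl
             use strict;
             use warnings;
             use XML::Twig;

             my $total = 0;
             my $t = XML::Twig->new(
                 twig_handlers => {
                     item => sub {
                         my ($twig, $item) = @_;
                         $total += $item->text;   # use the data...
                         $twig->purge;            # ...then free what has been parsed
                     },
                 },
             );
             $t->parse('<list><item>1</item><item>2</item><item>3</item></list>');
             print "total: $total\n";
```
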
1538       purge_up_to ($elt)
1539           Purges up to the $elt element. This allows you to keep part of the
1540           tree in memory when you "purge".
1541
1542       print            ($optional_filehandle, %options)
1543           Prints the whole document associated with the twig. To be used only
1544           AFTER the parse.
1545
1546           options: see "flush".
1547
1548       print_to_file    ($filename, %options)
1549           Prints the whole document associated with the twig to file
1550           $filename.  To be used only AFTER the parse.
1551
1552           options: see "flush".
1553
1554       sprint
1555           Return the text of the whole document associated with the twig. To
1556           be used only AFTER the parse.
1557
1558           options: see "flush".
1559
1560       trim
1561           Trim the document: gets rid of initial and trailing spaces, and
1562           replaces multiple spaces by a single one.
1563
1564       toSAX1 ($handler)
1565           Send SAX events for the twig to the SAX1 handler $handler
1566
1567       toSAX2 ($handler)
1568           Send SAX events for the twig to the SAX2 handler $handler
1569
1570       flush_toSAX1 ($handler)
1571           Same as flush, except that SAX events are sent to the SAX1 handler
1572           $handler instead of the twig being printed
1573
1574       flush_toSAX2 ($handler)
1575           Same as flush, except that SAX events are sent to the SAX2 handler
1576           $handler instead of the twig being printed
1577
1578       ignore
1579           This method should be called during parsing, usually in
1580           "start_tag_handlers".  It causes the element to be skipped during
1581           the parsing: the twig is not built for this element, it will not be
1582           accessible during parsing or after it. The element will not take up
1583           any memory and parsing will be faster.
1584
1585           Note that this method can also be called on an element. If the
1586           element is a parent of the current element then this element will
1587           be ignored (the twig will not be built any more for it and what has
1588           already been built will be deleted).
1589
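           A minimal sketch that skips a (hypothetical) "secret" element as
           soon as its start tag is seen:

```perl
             use strict;
             use warnings;
             use XML::Twig;

             my $t = XML::Twig->new(
                 start_tag_handlers => {
                     # no tree is built for <secret> or anything inside it
                     secret => sub { my ($twig, $elt) = @_; $twig->ignore; },
                 },
             );
             $t->parse('<doc><secret>hidden <b>stuff</b></secret><p>visible</p></doc>');
             print $t->sprint, "\n";
```
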
1590       set_pretty_print  ($style)
1591           Set the pretty print method, amongst '"none"' (default),
1592           '"nsgmls"', '"nice"', '"indented"', "indented_c", '"wrapped"',
1593           '"record"' and '"record_c"'
1594
1595           WARNING: the pretty print style is a GLOBAL variable, so once set
1596           it's applied to ALL "print"'s (and "sprint"'s). Same goes if you
1597           use XML::Twig with "mod_perl" . This should not be a problem as the
1598           XML that's generated is valid anyway, and XML processors (as well
1599           as HTML processors, including browsers) should not care. Let me
1600           know if this is a big problem, but at the moment the
1601           performance/cleanliness trade-off clearly favors the global
1602           approach.
1603
1604       set_empty_tag_style  ($style)
1605           Set the empty tag display style ('"normal"', '"html"' or
1606           '"expand"'). As with "set_pretty_print" this sets a global flag.
1607
1608           "normal" outputs an empty tag '"<tag/>"', "html" adds a space
1609           '"<tag />"' for elements that can be empty in XHTML and "expand"
1610           outputs '"<tag></tag>"'
1611
1612       set_remove_cdata  ($flag)
1613           set (or unset) the flag that forces the twig to output CDATA
1614           sections as regular (escaped) PCDATA
1615
1616       print_prolog     ($optional_filehandle, %options)
1617           Prints the prolog (XML declaration + DTD + entity declarations) of
1618           a document.
1619
1620           options: see "flush".
1621
1622       prolog     ($optional_filehandle, %options)
1623           Return the prolog (XML declaration + DTD + entity declarations) of
1624           a document.
1625
1626           options: see "flush".
1627
1628       finish
1629           Call Expat "finish" method.  Unsets all handlers (including
1630           internal ones that set context), but expat continues parsing to the
1631           end of the document or until it finds an error.  It should finish
1632           up a lot faster than with the handlers set.
1633
1634       finish_print
1635           Stops twig processing, flushes the twig and proceeds to finish
1636           printing the document as fast as possible. Use this method when
1637           modifying a document and the modification is done.
1638
1639       finish_now
1640           Stops twig processing, does not finish parsing the document (which
1641           might not be well-formed after the point where
1642           "finish_now" is called).  Execution resumes after the "parse" or
1643           "parsefile" call. The content of the twig is what has been parsed
1644           so far (all open elements at the time "finish_now" is called are
1645           considered closed).
1646
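           For example, to stop as soon as the first matching element has
           been found (the "target" tag is illustrative):

```perl
             use strict;
             use warnings;
             use XML::Twig;

             my $first;
             my $t = XML::Twig->new(
                 twig_handlers => {
                     target => sub {
                         my ($twig, $elt) = @_;
                         $first = $elt->text;
                         $twig->finish_now;   # stop; parse() then returns normally
                     },
                 },
             );
             $t->parse('<doc><target>first</target><target>second</target></doc>');
             print "found: $first\n";
```
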
1647       set_expand_external_entities
1648           Same as using the "expand_external_ents" option when creating the
1649           twig
1650
1651       set_input_filter
1652           Same as using the "input_filter" option when creating the twig
1653
1654       set_keep_atts_order
1655           Same as using the "keep_atts_order" option when creating the twig
1656
1657       set_keep_encoding
1658           Same as using the "keep_encoding" option when creating the twig
1659
1660       escape_gt
1661           usually XML::Twig does not escape > in its output. Using this
1662           method makes it replace > by &gt;
1663
1664       do_not_escape_gt
1665           reverts XML::Twig behavior to its default of not escaping > in its
1666           output.
1667
1668       set_output_filter
1669           Same as using the "output_filter" option when creating the twig
1670
1671       set_output_text_filter
1672           Same as using the "output_text_filter" option when creating the
1673           twig
1674
1675       add_stylesheet ($type, @options)
1676           Adds an external stylesheet to an XML document.
1677
1678           Supported types and options:
1679
1680           xsl option: the url of the stylesheet
1681
1682               Example:
1683
1684                 $t->add_stylesheet( xsl => "xsl_style.xsl");
1685
1686               will generate the following PI at the beginning of the
1687               document:
1688
1689                 <?xml-stylesheet type="text/xsl" href="xsl_style.xsl"?>
1690
1691           css option: the url of the stylesheet
1692
1693       active_twig
1694           a class method that returns the last processed twig, so you
1695           don't necessarily need the object to call methods on it.
1696
1697       Methods inherited from XML::Parser::Expat
1698           A twig inherits all the relevant methods from XML::Parser::Expat.
1699           These methods can only be used during the parsing phase (they will
1700           generate a fatal error otherwise).
1701
1702           Inherited methods are:
1703
1704           depth
1705               Returns the size of the context list.
1706
1707           in_element (NAME)
1708               Returns true if NAME is equal to the name of the innermost
1709               currently opened element. If namespace processing is being used
1710               and you want to check against a name that may be in a
1711               namespace, then use the generate_ns_name method to create the
1712               NAME argument.
1713
1714           within_element (NAME)
1715               Returns the number of times the given name appears in the
1716               context list.  If namespace processing is being used and you
1717               want to check against a name that may be in a namespace, then
1718               use the generate_ns_name method to create the NAME argument.
1719
1720           context
1721               Returns a list of element names that represent open elements,
1722               with the last one being the innermost. Inside start and end tag
1723               handlers, this will be the tag of the parent element.
1724
1725           current_line
1726               Returns the line number of the current position of the parse.
1727
1728           current_column
1729               Returns the column number of the current position of the parse.
1730
1731           current_byte
1732               Returns the current position of the parse.
1733
1734           position_in_context (LINES)
1735               Returns a string that shows the current parse position. LINES
1736               should be an integer >= 0 that represents the number of lines
1737               on either side of the current parse line to place into the
1738               returned string.
1739
1740           base ([NEWBASE])
1741               Returns the current value of the base for resolving relative
1742               URIs.  If NEWBASE is supplied, changes the base to that value.
1743
1744           current_element
1745               Returns the name of the innermost currently opened element.
1746               Inside start or end handlers, returns the parent of the element
1747               associated with those tags.
1748
1749           element_index
1750               Returns an integer that is the depth-first visit order of the
1751               current element. This will be zero outside of the root
1752               element. For example, this will return 1 when called from the
1753               start handler for the root element start tag.
1754
1755           recognized_string
1756               Returns the string from the document that was recognized in
1757               order to call the current handler. For instance, when called
1758               from a start handler, it will give us the start-tag string.
1759               The string is encoded in UTF-8.  This method doesn't return a
1760               meaningful string inside declaration handlers.
1761
1762           original_string
1763               Returns the verbatim string from the document that was
1764               recognized in order to call the current handler. The string is
1765               in the original document encoding. This method doesn't return a
1766               meaningful string inside declaration handlers.
1767
1768           xpcroak
1769               Concatenate onto the given message the current line number
1770               within the XML document plus the message implied by
1771               ErrorContext. Then croak with the formed message.
1772
1773           xpcarp
1774               Concatenate onto the given message the current line number
1775               within the XML document plus the message implied by
1776               ErrorContext. Then carp with the formed message.
1777
1778           xml_escape(TEXT [, CHAR [, CHAR ...]])
1779               Returns TEXT with markup characters turned into character
1780               entities.  Any additional characters provided as arguments are
1781               also turned into character references where found in TEXT.
1782
1783               (this method is broken on some versions of expat/XML::Parser)
1784
1785       path ( $optional_tag)
1786           Return the element context in a form similar to XPath's short form:
1787           '"/root/tag1/../tag"'
1788
1789       get_xpath  ( $optional_array_ref, $xpath, $optional_offset)
1790           Performs a "get_xpath" on the document root (see "XML::Twig::Elt")
1791
1792           If the $optional_array_ref argument is used the array must contain
1793           elements. The $xpath expression is applied to each element in turn
1794           and the result is the union of all results. This way a first query can
1795           be refined in further steps.
1796
1797       find_nodes ( $optional_array_ref, $xpath, $optional_offset)
1798           same as "get_xpath"
1799
1800       findnodes ( $optional_array_ref, $xpath, $optional_offset)
1801           same as "get_xpath" (similar to the XML::LibXML method)
1802
1803       findvalue ( $optional_array_ref, $xpath, $optional_offset)
1804           Return the "join" of all texts of the results of applying
1805           "get_xpath" to the node (similar to the XML::LibXML method)
1806
1807       findvalues ( $optional_array_ref, $xpath, $optional_offset)
1808           Return an array of all texts of the results of applying "get_xpath"
1809           to the node
1810
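           A small sketch of these query methods on an invented document;
           XML::Twig supports only a subset of XPath, so the expressions are
           kept simple:

```perl
             use strict;
             use warnings;
             use XML::Twig;

             my $t = XML::Twig->new->parse(
                 '<doc><p n="1">a</p><p n="2">b</p><q>c</q></doc>');

             my @paras    = $t->get_xpath('//p');          # both <p> elements
             my ($second) = $t->get_xpath('//p[@n="2"]');  # filter on an attribute
             my $first    = $t->get_xpath('//p', 0);       # offset: one element only
             my $text     = $t->findvalue('//p');          # texts joined together

             print scalar(@paras), ' ', $second->text, ' ', $text, "\n";
```
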
1811       subs_text ($regexp, $replace)
1812           subs_text does text substitution on the whole document, similar to
1813           perl's "s///" operator.
1814
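           For example, a sketch that changes a spelling throughout an
           illustrative document:

```perl
             use strict;
             use warnings;
             use XML::Twig;

             my $t = XML::Twig->new->parse('<doc><p>colour</p><p>more colour</p></doc>');

             # substitute in the text of the whole document
             $t->subs_text( qr/colour/, 'color');

             print $t->sprint, "\n";
```
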
1815       dispose
1816           Useful only if you don't have "Scalar::Util" or "WeakRef"
1817           installed.
1818
1819           Properly reclaims the memory used by an XML::Twig object. As the
1820           object has circular references, it is never freed, so if you
1821           want to parse lots of XML documents the memory leak becomes a
1822           problem. Use "$twig->dispose" to solve this problem.
1823
1824       att_accessors (list_of_attribute_names)
1825           A convenience method that creates l-valued accessors for
1826           attributes.  So "$twig->att_accessors( 'foo')" will create a
1827           "foo" method that can be called on elements:
1828
1829             $elt->foo;         # equivalent to $elt->{'att'}->{'foo'};
1830             $elt->foo( 'bar'); # equivalent to $elt->set_att( foo => 'bar');
1831
1832           The methods are l-valued only under those perls that support this
1833           feature (5.6 and above).
1834
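           A short sketch, assuming an "href" attribute (chosen only for
           illustration):

```perl
             use strict;
             use warnings;
             use XML::Twig;

             my $t = XML::Twig->new->parse('<doc><a href="intro.html">Intro</a></doc>');

             # create an "href" method on elements
             $t->att_accessors('href');

             my $link = $t->first_elt('a');
             my $old  = $link->href;       # reads the attribute
             $link->href('body.html');     # sets it, like set_att( href => ...)

             print "$old -> ", $link->att('href'), "\n";
```
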
1835       create_accessors (list_of_attribute_names)
1836           Same as att_accessors
1837
1838       elt_accessors (list_of_element_names)
1839           A convenience method that creates accessors for elements.  So
1840           "$twig->elt_accessors( 'foo')" will create a "foo" method that
1841           can be called on elements:
1842
1843             $elt->foo;         # equivalent to $elt->first_child( 'foo');
1844
1845       field_accessors (list_of_element_names)
1846           A convenience method that creates accessors for element values
1847           ("field").  So "$twig->field_accessors( 'foo')" will create a
1848           "foo" method that can be called on elements:
1849
1850             $elt->foo;         # equivalent to $elt->field( 'foo');
1851
1852       set_do_not_escape_amp_in_atts
1853           An evil method, that I only document because Test::Pod::Coverage
1854           complains otherwise, but really, you don't want to know about it.
1855
1856   XML::Twig::Elt
1857       new          ($optional_tag, $optional_atts, @optional_content)
1858           The "tag" is optional (but then you can't have a content), the
1859           $optional_atts argument is a reference to a hash of attributes, the
1860           content can be just a string or a list of strings and elements. A
1861           content of '"#EMPTY"' creates an empty element.
1862
1863            Examples: my $elt= XML::Twig::Elt->new();
1864                      my $elt= XML::Twig::Elt->new( para => { align => 'center' });
1865                      my $elt= XML::Twig::Elt->new( para => { align => 'center' }, 'foo');
1866                      my $elt= XML::Twig::Elt->new( br   => '#EMPTY');
1867                      my $elt= XML::Twig::Elt->new( 'para');
1868                      my $elt= XML::Twig::Elt->new( para => 'this is a para');
1869                      my $elt= XML::Twig::Elt->new( para => $elt3, 'another para');
1870
1871           The strings are not parsed, the element is not attached to any
1872           twig.
1873
1874           WARNING: if you rely on IDs then you will have to set the id
1875           yourself. At this point the element does not belong to a twig yet,
1876           so the ID attribute is not known and it won't be stored in the ID
1877           list.
1878
1879           Note that "#COMMENT", "#PCDATA" or "#CDATA" are valid tag names,
1880           that will create text elements.
1881
1882           To create an element "foo" containing a CDATA section:
1883
1884                      my $foo= XML::Twig::Elt->new( '#CDATA' => "content of the CDATA section")
1885                                             ->wrap_in( 'foo');
1886
1887           An attribute of '#CDATA' will create the content of the element as
1888           CDATA:
1889
1890             my $elt= XML::Twig::Elt->new( 'p' => { '#CDATA' => 1}, 'foo < bar');
1891
1892           creates an element
1893
1894             <p><![CDATA[foo < bar]]></p>
1895
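           A minimal sketch that builds two elements and attaches them to an
           existing twig with "paste" (the element names are illustrative):

```perl
             use strict;
             use warnings;
             use XML::Twig;

             my $t = XML::Twig->new->parse('<doc><title>T</title></doc>');

             # tag, optional attribute hash, optional content
             my $para = XML::Twig::Elt->new( para => { align => 'center' }, 'hello');
             my $br   = XML::Twig::Elt->new( br   => '#EMPTY');

             # new elements belong to no twig until they are pasted into one
             $para->paste( last_child => $t->root);
             $br->paste( last_child => $t->root);

             print $t->sprint, "\n";
```
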
1896       parse         ($string, %args)
1897           Creates an element from an XML string. The string is actually
1898           parsed as a new twig, then the root of that twig is returned.  The
1899           arguments in %args are passed to the twig.  As always if the parse
1900           fails the parser will die, so use an eval if you want to trap
1901           syntax errors.
1902
1903           As the element obviously does not exist beforehand, this method
1904           has to be called on the class:
1905
1906             my $elt= XML::Twig::Elt->parse( "<a> string to parse, with <sub/>
1907                                             <elements>, actually tons of </elements>
1908                             h</a>");
1909
1910       set_inner_xml ($string)
1911           Sets the content of the element to be the tree created from the
1912           string
1913
1914       set_inner_html ($string)
1915           Sets the content of the element, after parsing the string with an
1916           HTML parser (HTML::Parser)
1917
1918       set_outer_xml ($string)
1919           Replaces the element with the tree created from the string
1920
1921       print         ($optional_filehandle, $optional_pretty_print_style)
1922           Prints an entire element, including the tags, optionally to a
1923           $optional_filehandle, optionally with a $pretty_print_style.
1924
1925           The print outputs XML data so base entities are escaped.
1926
1927       print_to_file    ($filename, %options)
1928           Prints the element to file $filename.
1929
1930           options: see "flush".

1932       sprint       ($elt, $optional_no_enclosing_tag)
1933           Return the xml string for an entire element, including the tags.
1934           If the optional second argument is true then only the string inside
1935           the element is returned (the start and end tag for $elt are not).
1936           The text is XML-escaped: base entities (& and < in text, & < and "
1937           in attribute values) are turned into entities.
1938
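           For example, on an element containing an escaped ampersand:

```perl
             use strict;
             use warnings;
             use XML::Twig;

             my $t = XML::Twig->new->parse('<doc><p class="x">a &amp; b</p></doc>');
             my $p = $t->first_elt('p');

             my $outer = $p->sprint;     # the element, tags included
             my $inner = $p->sprint(1);  # true second argument: content only

             print "$outer\n$inner\n";
```
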
1939       gi  Return the gi of the element (the gi is the "generic identifier",
1940           the tag name in SGML parlance).
1941
1942           "tag" and "name" are synonyms of "gi".
1943
1944       tag Same as "gi"
1945
1946       name
1947           Same as "tag"
1948
1949       set_gi         ($tag)
1950           Set the gi (tag) of an element
1951
1952       set_tag        ($tag)
1953           Set the tag (="tag") of an element
1954
1955       set_name       ($name)
1956           Set the name (="tag") of an element
1957
1958       root
1959           Return the root of the twig in which the element is contained.
1960
1961       twig
1962           Return the twig containing the element.
1963
1964       parent        ($optional_condition)
1965           Return the parent of the element, or the first ancestor matching
1966           the $optional_condition
1967
1968       first_child   ($optional_condition)
1969           Return the first child of the element, or the first child matching
1970           the $optional_condition
1971
1972       has_child ($optional_condition)
1973           Return the first child of the element, or the first child matching
1974           the $optional_condition (same as first_child)
1975
1976       has_children ($optional_condition)
1977           Return the first child of the element, or the first child matching
1978           the $optional_condition (same as first_child)
1979
1980       first_child_text   ($optional_condition)
           Return the text of the first child of the element, or of the first
           child matching the $optional_condition. If there is no such child
           then it returns ''. This avoids getting the child, checking for its
           existence, then getting the text for trivial cases.
1986
1987           Similar methods are available for the other navigation methods:
1988
1989           last_child_text
1990           prev_sibling_text
1991           next_sibling_text
1992           prev_elt_text
1993           next_elt_text
1994           child_text
1995           parent_text
1996
           All these methods also exist in a "trimmed" variant:
1998
1999           first_child_trimmed_text
2000           last_child_trimmed_text
2001           prev_sibling_trimmed_text
2002           next_sibling_trimmed_text
2003           prev_elt_trimmed_text
2004           next_elt_trimmed_text
2005           child_trimmed_text
           parent_trimmed_text

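           A short sketch of the plain and trimmed variants, assuming XML::Twig
           is installed; the <book>, <title> and <author> names are made up
           for the example:

```perl
use strict;
use warnings;
use XML::Twig;

# Illustrative document (hypothetical element names)
my $t = XML::Twig->new;
$t->parse('<book><title>  Perl   and XML </title></book>');
my $book = $t->root;

my $raw     = $book->first_child_text('title');          # spaces kept as-is
my $trimmed = $book->first_child_trimmed_text('title');  # 'Perl and XML'
my $missing = $book->first_child_text('author');         # '' since no <author>
print "[$raw] [$trimmed] [$missing]\n";
```
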
2007       field         ($condition)
2008           Same method as "first_child_text" with a different name
2009
2010       fields         ($condition_list)
           Return the list of fields (the text of the first child matching
           each condition); missing fields are returned as the empty string.
2015
2016       trimmed_field         ($optional_condition)
2017           Same method as "first_child_trimmed_text" with a different name
2018
2019       set_field ($condition, $optional_atts, @list_of_elt_and_strings)
           Set the content of the first child of the element that matches
           $condition; the remaining arguments are the same as for
           "set_content".
2023
2024           If no child matches $condition _and_ if $condition is a valid XML
2025           element name, then a new element by that name is created and
2026           inserted as the last child.
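
           A minimal sketch of both cases (existing child updated, missing
           child created), assuming XML::Twig is installed; <item>, <name>
           and <price> are hypothetical names:

```perl
use strict;
use warnings;
use XML::Twig;

# Illustrative document (hypothetical element names)
my $t = XML::Twig->new;
$t->parse('<item><name>old</name></item>');
my $item = $t->root;

$item->set_field(name  => 'new');    # <name> exists: its content is replaced
$item->set_field(price => '9.99');   # no <price>: created as the last child
print $item->sprint, "\n";           # <item><name>new</name><price>9.99</price></item>
```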
2027
2028       first_child_matches   ($optional_condition)
           Return the first child of the element if it exists and passes the
           $optional_condition, "undef" otherwise
2031
2032             if( $elt->first_child_matches( 'title')) ...
2033
2034           is equivalent to
2035
             if( $elt->first_child && $elt->first_child->passes( 'title'))
2037
           "first_child_is" is another name for this method
2039
2040           Similar methods are available for the other navigation methods:
2041
2042           last_child_matches
2043           prev_sibling_matches
2044           next_sibling_matches
2045           prev_elt_matches
2046           next_elt_matches
2047           child_matches
           parent_matches

2049       is_first_child ($optional_condition)
2050           returns true (the element) if the element is the first child of its
2051           parent (optionally that satisfies the $optional_condition)
2052
2053       is_last_child ($optional_condition)
2054           returns true (the element) if the element is the last child of its
2055           parent (optionally that satisfies the $optional_condition)
2056
2057       prev_sibling  ($optional_condition)
2058           Return the previous sibling of the element, or the previous sibling
2059           matching $optional_condition
2060
2061       next_sibling  ($optional_condition)
2062           Return the next sibling of the element, or the first one matching
2063           $optional_condition.
2064
2065       next_elt     ($optional_elt, $optional_condition)
2066           Return the next elt (optionally matching $optional_condition) of
           the element. This is defined as the next element which opens after
           the current element opens, which usually means the first child of
           the element. Counter-intuitive as it might look, this allows you to
           loop through the whole document by starting from the root.
2071
2072           The $optional_elt is the root of a subtree. When the "next_elt" is
2073           out of the subtree then the method returns undef. You can then walk
2074           a sub-tree with:
2075
2076             my $elt= $subtree_root;
2077             while( $elt= $elt->next_elt( $subtree_root))
2078               { # insert processing code here
2079               }
2080
2081       prev_elt     ($optional_condition)
2082           Return the previous elt (optionally matching $optional_condition)
2083           of the element. This is the first element which opens before the
2084           current one.  It is usually either the last descendant of the
2085           previous sibling or simply the parent
2086
2087       next_n_elt   ($offset, $optional_condition)
2088           Return the $offset-th element that matches the $optional_condition
2089
2090       following_elt
2091           Return the following element (as per the XPath following axis)
2092
2093       preceding_elt
2094           Return the preceding element (as per the XPath preceding axis)
2095
2096       following_elts
2097           Return the list of following elements (as per the XPath following
2098           axis)
2099
2100       preceding_elts
           Return the list of preceding elements (as per the XPath preceding
2102           axis)
2103
2104       children     ($optional_condition)
2105           Return the list of children (optionally which matches
2106           $optional_condition) of the element. The list is in document order.
2107
2108       children_count ($optional_condition)
2109           Return the number of children of the element (optionally which
2110           matches $optional_condition)
2111
2112       children_text ($optional_condition)
           In array context, returns an array containing the text of children
2114           of the element (optionally which matches $optional_condition)
2115
2116           In scalar context, returns the concatenation of the text of
2117           children of the element
2118
2119       children_trimmed_text ($optional_condition)
2120           In array context, returns an array containing the trimmed text of
2121           children of the element (optionally which matches
2122           $optional_condition)
2123
2124           In scalar context, returns the concatenation of the trimmed text of
2125           children of the element
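
           A short sketch of the "children" family, assuming XML::Twig is
           installed; the <list>, <li> and <note> names are made up. Note how
           the context changes what "children_text" returns:

```perl
use strict;
use warnings;
use XML::Twig;

# Illustrative document (hypothetical element names)
my $t = XML::Twig->new;
$t->parse('<list><li>one</li><li>two</li><note>n</note></list>');
my $list = $t->root;

my @items = $list->children('li');        # the two <li> elements, in order
my $count = $list->children_count('li');  # 2
my @texts = $list->children_text('li');   # ('one', 'two') in list context
my $all   = $list->children_text;         # all children, concatenated in scalar context
```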
2126
2127       children_copy ($optional_condition)
2128           Return a list of elements that are copies of the children of the
2129           element, optionally which matches $optional_condition
2130
2131       descendants     ($optional_condition)
2132           Return the list of all descendants (optionally which matches
2133           $optional_condition) of the element. This is the equivalent of the
2134           "getElementsByTagName" of the DOM (by the way, if you are really a
2135           DOM addict, you can use "getElementsByTagName" instead)
2136
2137       getElementsByTagName ($optional_condition)
2138           Same as "descendants"
2139
2140       find_by_tag_name ($optional_condition)
2141           Same as "descendants"
2142
2143       descendants_or_self ($optional_condition)
2144           Same as "descendants" except that the element itself is included in
2145           the list if it matches the $optional_condition
2146
2147       first_descendant  ($optional_condition)
2148           Return the first descendant of the element that matches the
2149           condition
2150
2151       last_descendant  ($optional_condition)
2152           Return the last descendant of the element that matches the
2153           condition
2154
2155       ancestors    ($optional_condition)
2156           Return the list of ancestors (optionally matching
2157           $optional_condition) of the element.  The list is ordered from the
2158           innermost ancestor to the outermost one
2159
           NOTE: the element itself is not part of the list; to include it,
           use ancestors_or_self
2162
2163       ancestors_or_self     ($optional_condition)
2164           Return the list of ancestors (optionally matching
2165           $optional_condition) of the element, including the element (if it
           matches the condition).  The list is ordered from the innermost
2167           ancestor to the outermost one
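
           A minimal sketch contrasting "descendants", "ancestors" and
           "ancestors_or_self", assuming XML::Twig is installed; the
           document is a made-up example:

```perl
use strict;
use warnings;
use XML::Twig;

# Illustrative document (hypothetical element names)
my $t = XML::Twig->new;
$t->parse('<doc><p>a</p><sec><p>b</p></sec></doc>');

my @paras = $t->root->descendants('p');                    # both <p>, document order
my @texts = map { $_->text } @paras;                       # ('a', 'b')
my @up    = map { $_->tag } $paras[1]->ancestors;          # ('sec', 'doc'), innermost first
my @self  = map { $_->tag } $paras[1]->ancestors_or_self;  # ('p', 'sec', 'doc')
```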
2168
2169       passes ($condition)
2170           Return the element if it passes the $condition
2171
2172       att          ($att)
2173           Return the value of attribute $att or "undef"
2174
2175       latt          ($att)
2176           Return the value of attribute $att or "undef"
2177
2178           this method is an lvalue, so you can do "$elt->latt( 'foo')= 'bar'"
2179           or "$elt->latt( 'foo')++;"
2180
2181       set_att      ($att, $att_value)
2182           Set the attribute of the element to the given value
2183
2184           You can actually set several attributes this way:
2185
2186             $elt->set_att( att1 => "val1", att2 => "val2");
2187
2188       del_att      ($att)
2189           Delete the attribute for the element
2190
2191           You can actually delete several attributes at once:
2192
2193             $elt->del_att( 'att1', 'att2', 'att3');
2194
2195       att_exists ($att)
2196           Returns true if the attribute $att exists for the element, false
2197           otherwise
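
           A short sketch of the attribute methods, including the multi-value
           forms of "set_att" and "del_att", assuming XML::Twig is installed;
           the <img> element and its attributes are illustrative only:

```perl
use strict;
use warnings;
use XML::Twig;

# Illustrative document (hypothetical attribute names)
my $t = XML::Twig->new;
$t->parse('<img src="a.png" alt="A" width="10"/>');
my $img = $t->root;

my $src = $img->att('src');                # 'a.png'
$img->set_att(height => 20, border => 0);  # several attributes at once
$img->del_att('alt', 'width');             # several deletions at once
my $has_alt  = $img->att_exists('alt');    # false: alt was deleted
my $no_title = $img->att('title');         # undef: no such attribute
```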
2198
2199       cut Cut the element from the tree. The element still exists, it can be
2200           copied or pasted somewhere else, it is just not attached to the
2201           tree anymore.
2202
2203           Note that the "old" links to the parent, previous and next siblings
2204           can still be accessed using the former_* methods
2205
2206       former_next_sibling
2207           Returns the former next sibling of a cut node (or undef if the node
2208           has not been cut)
2209
2210           This makes it easier to write loops where you cut elements:
2211
2212               my $child= $parent->first_child( 'achild');
               while( $child && $child->att( 'cut'))
                 { $child->cut; $child= $child->former_next_sibling; }
2215
2216       former_prev_sibling
2217           Returns the former previous sibling of a cut node (or undef if the
2218           node has not been cut)
2219
2220       former_parent
2221           Returns the former parent of a cut node (or undef if the node has
2222           not been cut)
2223
2224       cut_children ($optional_condition)
2225           Cut all the children of the element (or all of those which satisfy
2226           the $optional_condition).
2227
2228           Return the list of children
2229
2230       cut_descendants ($optional_condition)
2231           Cut all the descendants of the element (or all of those which
2232           satisfy the $optional_condition).
2233
2234           Return the list of descendants
2235
2236       copy        ($elt)
2237           Return a copy of the element. The copy is a "deep" copy: all sub-
2238           elements of the element are duplicated.
2239
2240       paste       ($optional_position, $ref)
2241           Paste a (previously "cut" or newly generated) element. Die if the
2242           element already belongs to a tree.
2243
2244           Note that the calling element is pasted:
2245
2246             $child->paste( first_child => $existing_parent);
2247             $new_sibling->paste( after => $this_sibling_is_already_in_the_tree);
2248
2249           or
2250
2251             my $new_elt= XML::Twig::Elt->new( tag => $content);
2252             $new_elt->paste( $position => $existing_elt);
2253
2254           Example:
2255
             my $t= XML::Twig->new->parsefile( 'doc.xml');
             my $toc= $t->root->new( 'toc');
             $toc->paste( $t->root); # $toc is pasted as first child of the root
             foreach my $title ($t->findnodes( '/doc/section/title'))
               { my $title_toc= $title->copy;
                 # paste $title_toc as the last child of toc
                 $title_toc->paste( last_child => $toc);
2263               }
2264
2265           Position options:
2266
2267           first_child (default)
2268               The element is pasted as the first child of $ref
2269
2270           last_child
2271               The element is pasted as the last child of $ref
2272
2273           before
2274               The element is pasted before $ref, as its previous sibling.
2275
2276           after
2277               The element is pasted after $ref, as its next sibling.
2278
2279           within
2280               In this case an extra argument, $offset, should be supplied.
2281               The element will be pasted in the reference element (or in its
2282               first text child) at the given offset. To achieve this the
2283               reference element will be split at the offset.
2284
           Note that you can call the underlying methods directly:
2286
2287           paste_before
2288           paste_after
2289           paste_first_child
2290           paste_last_child
           paste_within

2292       move       ($optional_position, $ref)
2293           Move an element in the tree.  This is just a "cut" then a "paste".
2294           The syntax is the same as "paste".
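
           A minimal sketch of "cut", "paste" and "move" on a made-up
           document, assuming XML::Twig is installed:

```perl
use strict;
use warnings;
use XML::Twig;

# Illustrative document (hypothetical element names)
my $t = XML::Twig->new;
$t->parse('<doc><a/><b/><c/></doc>');
my $root = $t->root;

my $c = $root->last_child('c');
$c->cut;                              # detached from the tree, still usable
$c->paste(first_child => $root);      # now <doc><c/><a/><b/></doc>

my $elt_a = $root->first_child('a');
$elt_a->move(last_child => $root);    # cut + paste in one call
print $root->sprint, "\n";            # <doc><c/><b/><a/></doc>
```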
2295
2296       replace       ($ref)
           Replaces an element in the tree. Sometimes it is just not possible
           to "cut" an element then "paste" another in its place, so "replace"
           comes in handy.  The calling element replaces $ref.
2300
2301       replace_with   (@elts)
2302           Replaces the calling element with one or more elements
2303
2304       delete
           Cut the element and free the memory.
2306
2307       prefix       ($text, $optional_option)
2308           Add a prefix to an element. If the element is a "PCDATA" element
           the text is added to the pcdata, if the element's first child is a
           "PCDATA" then the text is added to its pcdata, otherwise a new
2311           "PCDATA" element is created and pasted as the first child of the
2312           element.
2313
2314           If the option is "asis" then the prefix is added asis: it is
2315           created in a separate "PCDATA" element with an "asis" property. You
2316           can then write:
2317
2318             $elt1->prefix( '<b>', 'asis');
2319
2320           to create a "<b>" in the output of "print".
2321
2322       suffix       ($text, $optional_option)
2323           Add a suffix to an element. If the element is a "PCDATA" element
           the text is added to the pcdata, if the element's last child is a
           "PCDATA" then the text is added to its pcdata, otherwise a new
2326           PCDATA element is created and pasted as the last child of the
2327           element.
2328
2329           If the option is "asis" then the suffix is added asis: it is
2330           created in a separate "PCDATA" element with an "asis" property. You
2331           can then write:
2332
2333             $elt2->suffix( '</b>', 'asis');
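
           Combining the two "asis" calls, a minimal sketch assuming
           XML::Twig is installed (the document is made up); the markup is
           output without escaping:

```perl
use strict;
use warnings;
use XML::Twig;

# Illustrative document (hypothetical element names)
my $t = XML::Twig->new;
$t->parse('<doc><p>text</p></doc>');
my $p = $t->root->first_child('p');

$p->prefix('<b>', 'asis');    # separate PCDATA child, printed as-is
$p->suffix('</b>', 'asis');
print $p->sprint, "\n";       # <p><b>text</b></p>
```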
2334
2335       trim
2336           Trim the element in-place: spaces at the beginning and at the end
2337           of the element are discarded and multiple spaces within the element
2338           (or its descendants) are replaced by a single space.
2339
2340           Note that in some cases you can still end up with multiple spaces,
2341           if they are split between several elements:
2342
2343             <doc>  text <b>  hah! </b>  yep</doc>
2344
2345           gets trimmed to
2346
2347             <doc>text <b> hah! </b> yep</doc>
2348
2349           This is somewhere in between a bug and a feature.
2350
2351       normalize
           Merge together all consecutive pcdata elements in the element (if
           for example you have turned some elements into pcdata using
           "erase", this will give you a "clean" element in which all text
           fragments are as long as possible).
2356
2357       simplify (%options)
2358           Return a data structure suspiciously similar to XML::Simple's.
2359           Options are identical to XMLin options, see XML::Simple doc for
           more details (or use Data::Dumper or YAML to dump the data
           structure)
2362
2363           Note: there is no magic here, if you write "$twig->parsefile( $file
2364           )->simplify();" then it will load the entire document in memory. I
2365           am afraid you will have to put some work into it to get just the
           bits you want and discard the rest. Look at the SYNOPSIS or the
2367           XML::Twig 101 section at the top of the docs for more information.
2368
2369           content_key
2370           forcearray
2371           keyattr
2372           noattr
2373           normalize_space
2374               aka normalise_space
2375
2376           variables (%var_hash)
2377               %var_hash is a hash { name => value }
2378
2379               This option allows variables in the XML to be expanded when the
2380               file is read. (there is no facility for putting the variable
2381               names back if you regenerate XML using XMLout).
2382
2383               A 'variable' is any text of the form ${name} (or $name) which
2384               occurs in an attribute value or in the text content of an
2385               element. If 'name' matches a key in the supplied hashref,
2386               ${name} will be replaced with the corresponding value from the
2387               hashref. If no matching key is found, the variable will not be
2388               replaced.
2389
2390           var_att ($attribute_name)
2391               This option gives the name of an attribute that will be used to
2392               create variables in the XML:
2393
2394                 <dirs>
2395                   <dir name="prefix">/usr/local</dir>
2396                   <dir name="exec_prefix">$prefix/bin</dir>
2397                 </dirs>
2398
2399               use "var => 'name'" to get $prefix replaced by /usr/local in
2400               the generated data structure
2401
               By default variables are captured by the following regexp:
               /\$(\w+)/
2404
2405           var_regexp (regexp)
2406               This option changes the regexp used to capture variables. The
2407               variable name should be in $1
2408
2409           group_tags { grouping tag => grouped tag, grouping tag 2 => grouped
2410           tag 2...}
               Option used to simplify the structure: the elements listed
               are not used themselves. Their children are used instead,
               and are considered children of the element's parent.
2414
2415               If the element is:
2416
2417                 <config host="laptop.xmltwig.org">
2418                   <server>localhost</server>
2419                   <dirs>
2420                     <dir name="base">/home/mrodrigu/standards</dir>
2421                     <dir name="tools">$base/tools</dir>
2422                   </dirs>
2423                   <templates>
2424                     <template name="std_def">std_def.templ</template>
2425                     <template name="dummy">dummy</template>
2426                   </templates>
2427                 </config>
2428
2429               Then calling simplify with "group_tags => { dirs => 'dir',
2430               templates => 'template'}" makes the data structure be exactly
2431               as if the start and end tags for "dirs" and "templates" were
2432               not there.
2433
2434               A YAML dump of the structure
2435
2436                 base: '/home/mrodrigu/standards'
2437                 host: laptop.xmltwig.org
2438                 server: localhost
2439                 template:
2440                   - std_def.templ
                   - dummy
2442                 tools: '$base/tools'
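
           A minimal sketch of the default behaviour, assuming XML::Twig is
           installed; as with XML::Simple's "XMLin", attributes and text-only
           children should become keys of a hash reference (the document is
           made up):

```perl
use strict;
use warnings;
use XML::Twig;

# Illustrative document (hypothetical element and attribute names)
my $t = XML::Twig->new;
$t->parse('<config host="laptop"><server>localhost</server><port>8080</port></config>');
my $data = $t->root->simplify;   # plain hash reference, root tag dropped

print "$data->{host} $data->{server}:$data->{port}\n";
```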
2443
2444       split_at        ($offset)
2445           Split a text ("PCDATA" or "CDATA") element in 2 at $offset, the
2446           original element now holds the first part of the string and a new
2447           element holds the right part. The new element is returned
2448
2449           If the element is not a text element then the first text child of
2450           the element is split
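
           A short sketch of "split_at" followed by "normalize" (described
           above), which merges the two text pieces back together; it assumes
           XML::Twig is installed and uses a made-up document:

```perl
use strict;
use warnings;
use XML::Twig;

# Illustrative document (hypothetical element names)
my $t = XML::Twig->new;
$t->parse('<p>HelloWorld</p>');
my $p = $t->root;

my $right    = $p->split_at(5);        # splits the first text child of <p>
my $left     = $p->first_child;        # original text node, now 'Hello'
my $n_split  = $p->children_count;     # 2 text children
$p->normalize;                         # merge consecutive text children
my $n_merged = $p->children_count;     # 1
```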
2451
2452       split        ( $optional_regexp, $tag1, $atts1, $tag2, $atts2...)
2453           Split the text descendants of an element in place, the text is
2454           split using the $regexp, if the regexp includes () then the matched
2455           separators will be wrapped in elements.  $1 is wrapped in $tag1,
2456           with attributes $atts1 if $atts1 is given (as a hashref), $2 is
2457           wrapped in $tag2...
2458
2459           if $elt is "<p>tati tata <b>tutu tati titi</b> tata tati tata</p>"
2460
2461             $elt->split( qr/(ta)ti/, 'foo', {type => 'toto'} )
2462
2463           will change $elt to
2464
2465             <p><foo type="toto">ta</foo> tata <b>tutu <foo type="toto">ta</foo>
2466                 titi</b> tata <foo type="toto">ta</foo> tata</p>
2467
2468           The regexp can be passed either as a string or as "qr//" (perl
2469           5.005 and later), it defaults to \s+ just as the "split" built-in
2470           (but this would be quite a useless behaviour without the
2471           $optional_tag parameter)
2472
2473           $optional_tag defaults to PCDATA or CDATA, depending on the initial
2474           element type
2475
2476           The list of descendants is returned (including un-touched original
2477           elements and newly created ones)
2478
2479       mark        ( $regexp, $optional_tag, $optional_attribute_ref)
2480           This method behaves exactly as split, except only the newly created
2481           elements are returned
2482
2483       wrap_children ( $regexp_string, $tag, $optional_attribute_hashref)
2484           Wrap the children of the element that match the regexp in an
2485           element $tag.  If $optional_attribute_hashref is passed then the
2486           new element will have these attributes.
2487
2488           The $regexp_string includes tags, within pointy brackets, as in
2489           "<title><para>+" and the usual Perl modifiers (+*?...).  Tags can
2490           be further qualified with attributes: "<para type="warning"
2491           classif="cosmic_secret">+". The values for attributes should be
2492           xml-escaped: "<candy type="M&amp;Ms">*" ("<", "&" ">" and """
2493           should be escaped).
2494
2495           Note that elements might get extra "id" attributes in the process.
           See add_id.  Use strip_att to remove unwanted ids.
2497
2498           Here is an example:
2499
2500           If the element $elt has the following content:
2501
2502             <elt>
2503              <p>para 1</p>
2504              <l_l1_1>list 1 item 1 para 1</l_l1_1>
2505                <l_l1>list 1 item 1 para 2</l_l1>
2506              <l_l1_n>list 1 item 2 para 1 (only para)</l_l1_n>
2507              <l_l1_n>list 1 item 3 para 1</l_l1_n>
2508                <l_l1>list 1 item 3 para 2</l_l1>
2509                <l_l1>list 1 item 3 para 3</l_l1>
2510              <l_l1_1>list 2 item 1 para 1</l_l1_1>
2511                <l_l1>list 2 item 1 para 2</l_l1>
2512              <l_l1_n>list 2 item 2 para 1 (only para)</l_l1_n>
2513              <l_l1_n>list 2 item 3 para 1</l_l1_n>
2514                <l_l1>list 2 item 3 para 2</l_l1>
2515                <l_l1>list 2 item 3 para 3</l_l1>
2516             </elt>
2517
2518           Then the code
2519
2520             $elt->wrap_children( q{<l_l1_1><l_l1>*} , li => { type => "ul1" });
2521             $elt->wrap_children( q{<l_l1_n><l_l1>*} , li => { type => "ul" });
2522
2523             $elt->wrap_children( q{<li type="ul1"><li type="ul">+}, "ul");
2524             $elt->strip_att( 'id');
2525             $elt->strip_att( 'type');
2526             $elt->print;
2527
2528           will output:
2529
2530             <elt>
2531                <p>para 1</p>
2532                <ul>
2533                  <li>
2534                    <l_l1_1>list 1 item 1 para 1</l_l1_1>
2535                    <l_l1>list 1 item 1 para 2</l_l1>
2536                  </li>
2537                  <li>
2538                    <l_l1_n>list 1 item 2 para 1 (only para)</l_l1_n>
2539                  </li>
2540                  <li>
2541                    <l_l1_n>list 1 item 3 para 1</l_l1_n>
2542                    <l_l1>list 1 item 3 para 2</l_l1>
2543                    <l_l1>list 1 item 3 para 3</l_l1>
2544                  </li>
2545                </ul>
2546                <ul>
2547                  <li>
2548                    <l_l1_1>list 2 item 1 para 1</l_l1_1>
2549                    <l_l1>list 2 item 1 para 2</l_l1>
2550                  </li>
2551                  <li>
2552                    <l_l1_n>list 2 item 2 para 1 (only para)</l_l1_n>
2553                  </li>
2554                  <li>
2555                    <l_l1_n>list 2 item 3 para 1</l_l1_n>
2556                    <l_l1>list 2 item 3 para 2</l_l1>
2557                    <l_l1>list 2 item 3 para 3</l_l1>
2558                  </li>
2559                </ul>
2560             </elt>
2561
2562       subs_text ($regexp, $replace)
           subs_text does text substitution, similar to perl's "s///"
2564           operator.
2565
2566           $regexp must be a perl regexp, created with the "qr" operator.
2567
2568           $replace can include "$1, $2"... from the $regexp. It can also be
           used to create elements and entities, by using "&elt( tag => { att
2570           => val }, text)" (similar syntax as "new") and "&ent( name)".
2571
2572           Here is a rather complex example:
2573
2574             $elt->subs_text( qr{(?<!do not )link to (http://([^\s,]*))},
2575                              'see &elt( a =>{ href => $1 }, $2)'
2576                            );
2577
2578           This will replace text like link to http://www.xmltwig.org by see
           <a href="http://www.xmltwig.org">www.xmltwig.org</a>, but not do not link
2580           to...
2581
2582           Generating entities (here replacing spaces with &nbsp;):
2583
2584             $elt->subs_text( qr{ }, '&ent( "&nbsp;")');
2585
2586           or, using a variable:
2587
2588             my $ent="&nbsp;";
2589             $elt->subs_text( qr{ }, "&ent( '$ent')");
2590
2591           Note that the substitution is always global, as in using the "g"
2592           modifier in a perl substitution, and that it is performed on all
2593           text descendants of the element.
2594
2595           Bug: in the $regexp, you can only use "\1", "\2"... if the
           replacement expression does not include elements or attributes, e.g.:
2597
2598             $t->subs_text( qr/((t[aiou])\2)/, '$2');             # ok, replaces toto, tata, titi, tutu by to, ta, ti, tu
2599             $t->subs_text( qr/((t[aiou])\2)/, '&elt(p => $1)' ); # NOK, does not find toto...
2600
2601       add_id ($optional_coderef)
2602           Add an id to the element.
2603
2604           The id is an attribute, "id" by default, see the "id" option for
2605           XML::Twig "new" to change it. Use an id starting with "#" to get an
2606           id that's not output by print, flush or sprint, yet that allows you
2607           to use the elt_id method to get the element easily.
2608
2609           If the element already has an id, no new id is generated.
2610
           By default the method creates an id of the form "twig_id_<nnnn>",
2612           where "<nnnn>" is a number, incremented each time the method is
2613           called successfully.
2614
2615       set_id_seed ($prefix)
           By default the id generated by "add_id" is "twig_id_<nnnn>",
2617           "set_id_seed" changes the prefix to $prefix and resets the number
2618           to 1
2619
2620       strip_att ($att)
2621           Remove the attribute $att from all descendants of the element
2622           (including the element)
2623
2624           Return the element
2625
2626       change_att_name ($old_name, $new_name)
2627           Change the name of the attribute from $old_name to $new_name. If
2628           there is no attribute $old_name nothing happens.
2629
2630       lc_attnames
           Lowercases the names of all the attributes of the element.
2632
2633       sort_children_on_value( %options)
2634           Sort the children of the element in place according to their text.
2635           All children are sorted.
2636
2637           Return the element, with its children sorted.
2638
2639           %options are
2640
2641             type  : numeric |  alpha     (default: alpha)
2642             order : normal  |  reverse   (default: normal)
2643
2646       sort_children_on_att ($att, %options)
           Sort the children of the element in place according to attribute
2648           $att.  %options are the same as for "sort_children_on_value"
2649
2650           Return the element.
2651
2652       sort_children_on_field ($tag, %options)
2653           Sort the children of the element in place, according to the field
2654           $tag (the text of the first child of the child with this tag).
2655           %options are the same as for "sort_children_on_value".
2656
2657           Return the element, with its children sorted
2658
2659       sort_children( $get_key, %options)
2660           Sort the children of the element in place. The $get_key argument is
2661           a reference to a function that returns the sort key when passed an
2662           element.
2663
2664           For example:
2665
             $elt->sort_children( sub { $_[0]->att( 'nb') + $_[0]->text },
2667                                  type => 'numeric', order => 'reverse'
2668                                );
2669
2670       field_to_att ($cond, $att)
2671           Turn the text of the first sub-element matched by $cond into the
2672           value of attribute $att of the element. If $att is omitted then
2673           $cond is used as the name of the attribute, which makes sense only
2674           if $cond is a valid element (and attribute) name.
2675
2676           The sub-element is then cut.
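
           A minimal sketch of "field_to_att", assuming XML::Twig is
           installed; <item>, <id> and <name> are hypothetical names:

```perl
use strict;
use warnings;
use XML::Twig;

# Illustrative document (hypothetical element names)
my $t = XML::Twig->new;
$t->parse('<item><id>42</id><name>x</name></item>');
my $item = $t->root;

$item->field_to_att('id');   # text of <id> becomes the id attribute; <id> is cut
print $item->sprint, "\n";   # <item id="42"><name>x</name></item>
```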
2677
2678       att_to_field ($att, $tag)
2679           Take the value of attribute $att and create a sub-element $tag as
2680           first child of the element. If $tag is omitted then $att is used as
2681           the name of the sub-element.
2682
2683       get_xpath  ($xpath, $optional_offset)
2684           Return a list of elements satisfying the $xpath. $xpath is an
2685           XPATH-like expression.
2686
2687           A subset of the XPATH abbreviated syntax is covered:
2688
2689             tag
2690             tag[1] (or any other positive number)
2691             tag[last()]
2692             tag[@att] (the attribute exists for the element)
2693             tag[@att="val"]
2694             tag[@att=~ /regexp/]
2695             tag[att1="val1" and att2="val2"]
2696             tag[att1="val1" or att2="val2"]
2697             tag[string()="toto"] (returns tag elements whose text (as per the text method)
2698                                  is toto)
2699             tag[string()=~/regexp/] (returns tag elements whose text (as per the text
2700                                     method) matches regexp)
2701             expressions can start with / (search starts at the document root)
2702             expressions can start with . (search starts at the current element)
2703             // can be used to get all descendants instead of just direct children
2704             * matches any tag
2705
2706           So the following examples from the XPath
2707           recommendation <http://www.w3.org/TR/xpath.html#path-abbrev> work:
2708
2709             para selects the para element children of the context node
2710             * selects all element children of the context node
2711             para[1] selects the first para child of the context node
2712             para[last()] selects the last para child of the context node
2713             */para selects all para grandchildren of the context node
2714             /doc/chapter[5]/section[2] selects the second section of the fifth chapter
2715                of the doc
2716             chapter//para selects the para element descendants of the chapter element
2717                children of the context node
2718             //para selects all the para descendants of the document root and thus selects
2719                all para elements in the same document as the context node
2720             //olist/item selects all the item elements in the same document as the
2721                context node that have an olist parent
2722             .//para selects the para element descendants of the context node
2723             .. selects the parent of the context node
2724             para[@type="warning"] selects all para children of the context node that have
2725                a type attribute with value warning
2726             employee[@secretary and @assistant] selects all the employee children of the
2727                context node that have both a secretary attribute and an assistant
2728                attribute
2729
2730           The elements are returned in document order.
2731
2732           If $optional_offset is used then only one element is returned:
2733           the one at that offset in the list, starting at 0.
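           A minimal sketch of "get_xpath" on a made-up document, assuming
           XML::Twig is installed. It shows a search from the root, a search
           relative to an element, and the offset argument. Single quotes keep
           Perl from interpolating "@type".

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse('<doc><p type="note">a</p><p type="warning">b</p>'
        . '<sec><p type="warning">c</p></sec></doc>');

my @warnings = $t->get_xpath('//p[@type="warning"]');   # at any depth
my @top      = $t->root->get_xpath('p');                 # direct children only
my $second   = $t->get_xpath('//p', 1);                  # offset 1: the second <p>

print join(',', map { $_->text } @warnings), "\n";
```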
2734
2735           Quoting and interpolating variables can be a pain when the Perl
2736           syntax and the XPATH syntax collide, so use alternate quoting
2737           mechanisms like q or qq (I like q{} and qq{} myself).
2738
2739           Here are some more examples to get you started:
2740
2741             my $p1= "p1";
2742             my $p2= "p2";
2743             my @res= $t->get_xpath( qq{p[string()="$p1" or string()="$p2"]});
2744
2745             my $att_val= "a1";
2746             @res= $t->get_xpath( qq{//*[\@att="$att_val"]});
2747
2748             my $val= "a1";
2749             my $exp= qq{//p[ \@att='$val']}; # you need \@, or Perl tries to interpolate an @att array
2750             @res= $t->get_xpath( $exp);
2751
2752           Note that the only supported regexp delimiter is /, and that you
2753           must backslash all / in regexps AND in regular strings.
2754
2755           XML::Twig does not natively provide full XPath support, but you can
2756           use "XML::Twig::XPath" to get "findnodes" to use "XML::XPath" as
2757           the XPath engine, with full coverage of the spec.
2758
2762       find_nodes
2763           same as "get_xpath"
2764
2765       findnodes
2766           same as "get_xpath"
2767
2768       text @optional_options
2769           Return a string consisting of all the "PCDATA" and "CDATA" in an
2770           element, without any tags. The text is not XML-escaped: base
2771           entities such as "&" and "<" are not escaped.
2772
2773           The '"no_recurse"' option will only return the text of the element,
2774           not of any included sub-elements (same as "text_only").
2775
2776       text_only
2777           Same as "text" except that the text returned doesn't include the
2778           text of sub-elements.
2779
2780       trimmed_text
2781           Same as "text" except that the text is trimmed: leading and
2782           trailing spaces are discarded, consecutive spaces are collapsed
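           A minimal sketch comparing "text", "text_only" and "trimmed_text" on
           a made-up paragraph, assuming XML::Twig is installed. The brackets in
           the printed output only make the surrounding spaces visible.

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse('<p>  Hello   <b>big</b>  world &amp; more  </p>');
my $p = $t->root;

my $all     = $p->text;          # text of all descendants, "&" not re-escaped
my $own     = $p->text_only;     # skips the text inside <b>
my $trimmed = $p->trimmed_text;  # ends trimmed, inner runs of spaces collapsed

print "[$all]\n[$own]\n[$trimmed]\n";
```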
2783
2784       set_text        ($string)
2785           Set the text for the element: if the element is a "PCDATA", just
2786           set its text, otherwise cut all the children of the element and
2787           create a single "PCDATA" child for it, which holds the text.
2788
2789       merge ($elt2)
2790           Move the content of $elt2 into the element.
2791
2792       insert         ($tag1, [$optional_atts1], $tag2, [$optional_atts2],...)
2793           For each tag in the list, inserts an element $tag as the only child
2794           of the element.  The element gets the optional attributes
2795           in "$optional_atts<n>".  All children of the element are set as
2796           children of the new element.  The upper level element is returned.
2797
2798             $p->insert( table => { border=> 1}, 'tr', 'td')
2799
2800           puts the content of $p in a table with a visible border, a single
2801           "tr" and a single "td", and returns the "table" element:
2802
2803             <p><table border="1"><tr><td>original content of p</td></tr></table></p>
2804
2805       wrap_in        (@tag)
2806           Wrap elements in @tag as the successive ancestors of the element,
2807           returns the new element.  "$elt->wrap_in( 'td', 'tr', 'table')"
2808           wraps the element as a single cell in a table for example.
2809
2810           Optionally each tag can be followed by a hashref of attributes,
2811           that will be set on the wrapping element:
2812
2813             $elt->wrap_in( p => { class => "advisory" }, div => { class => "intro", id => "div_intro" });
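           A minimal sketch contrasting "insert" and "wrap_in" on a made-up
           document, assuming XML::Twig is installed: "insert" wraps the content
           of an element, while "wrap_in" wraps the element itself.

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse('<doc><p>hi</p><x>1</x></doc>');
my ($p, $x) = $t->root->children;

$p->insert('b');            # the content of <p> moves into a new <b>
$x->wrap_in('td', 'tr');    # <x> itself is wrapped; 'td' ends up innermost
my $out = $t->root->sprint;

print "$out\n";
```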
2814
2815       insert_new_elt ($opt_position, $tag, $opt_atts_hashref, @opt_content)
2816           Combines a "new" and a "paste": creates a new element using $tag,
2817           $opt_atts_hashref and @opt_content, which are arguments similar to
2818           those for "new", then pastes it, using $opt_position (default
2819           'first_child') relative to $elt.
2820
2821           Return the newly created element.
2822
2823       erase
2824           Erase the element: the element is deleted and all of its children
2825           are pasted in its place.
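           A minimal sketch of "insert_new_elt" followed by "erase", assuming
           XML::Twig is installed; the <div>/<note> document is made up for the
           illustration.

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse('<doc><div><p>a</p><p>b</p></div></doc>');
my $div = $t->root->first_child('div');

# new <note lang="en">see also</note>, pasted as the last child of <div>
my $note = $div->insert_new_elt(last_child => 'note', { lang => 'en' }, 'see also');

# drop the <div> wrapper, keeping everything it contained
$div->erase;
my $out = $t->root->sprint;

print "$out\n";
```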
2826
2827       set_content    ( $optional_atts, @list_of_elt_and_strings)
2828       set_content    ( $optional_atts, '#EMPTY')
2829           Set the content for the element, from a list of strings and
2830           elements.  Cuts all the element children, then pastes the list
2831           elements as the children.  This method will create a "PCDATA"
2832           element for any strings in the list.
2833
2834           The $optional_atts argument is the ref of a hash of attributes. If
2835           this argument is used then the previous attributes are deleted,
2836           otherwise they are left untouched.
2837
2838           WARNING: if you rely on IDs then you will have to set the id
2839           yourself. At this point the element does not belong to a twig yet,
2840           so the ID attribute is not known and the element won't be stored in
2841           the ID list.
2842
2843           A content of '"#EMPTY"' creates an empty element.
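           A minimal sketch of "set_content" with and without the attribute
           hashref, assuming XML::Twig is installed; the sample document is made
           up for the illustration.

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse('<doc><p class="old">old text</p></doc>');
my $p = $t->root->first_child('p');

# strings become PCDATA children, elements are pasted as they are;
# with no leading hashref the attributes are left untouched
$p->set_content('plain ', XML::Twig::Elt->new(b => 'bold'), ' text');
my $kept = $p->sprint;

# a leading hashref replaces the attributes as well
$p->set_content({ class => 'new' }, 'new text');
my $replaced = $p->sprint;

print "$kept\n$replaced\n";
```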
2844
2845       namespace ($optional_prefix)
2846           Return the URI of the namespace that $optional_prefix or the
2847           element name belongs to. If the name doesn't belong to any
2848           namespace, "undef" is returned.
2849
2850       local_name
2851           Return the local name (without the prefix) for the element
2852
2853       ns_prefix
2854           Return the namespace prefix for the element
2855
2856       current_ns_prefixes
2857           Return a list of namespace prefixes valid for the element. The
2858           order of the prefixes in the list has no meaning. If the default
2859           namespace is currently bound, '' appears in the list.
2860
2861       inherit_att  ($att, @optional_tag_list)
2862           Return the value of an attribute inherited from parent tags. The
2863           value returned is found by looking for the attribute in the element
2864           then in turn in each of its ancestors. If the @optional_tag_list is
2865           supplied only those ancestors whose tag is in the list will be
2866           checked.
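           A minimal sketch of "inherit_att", assuming XML::Twig is installed;
           the document and its "lang" attributes are made up for the
           illustration.

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse('<doc lang="en"><sec lang="fr"><p>x</p></sec><sec><p>y</p></sec></doc>');
my ($p1, $p2) = $t->get_xpath('//p');

my $lang1    = $p1->inherit_att('lang');          # nearest ancestor wins: fr
my $lang2    = $p2->inherit_att('lang');          # this <sec> has no lang: en
my $doc_only = $p1->inherit_att('lang', 'doc');   # <sec> is not checked: en

print "$lang1 $lang2 $doc_only\n";
```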
2867
2868       all_children_are ($optional_condition)
2869           return 1 if all children of the element pass the
2870           $optional_condition, 0 otherwise
2871
2872       level       ($optional_condition)
2873           Return the depth of the element in the twig (root is 0).  If
2874           $optional_condition is given then only ancestors that match the
2875           condition are counted.
2876
2877           WARNING: in a tree created using the "twig_roots" option this will
2878           not return the level in the document tree: level 0 will be the
2879           document root and level 1 the "twig_roots" elements. During
2880           the parsing (in a "twig_handler") you can use the "depth" method on
2881           the twig object to get the real parsing depth.
2882
2883       in           ($potential_parent)
2884           Return true if the element is in the potential_parent
2885           ($potential_parent is an element)
2886
2887       in_context   ($cond, $optional_level)
2888           Return true if the element is included in an element which passes
2889           $cond optionally within $optional_level levels. The returned value
2890           is the including element.
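           A minimal sketch of "level", "in" and "in_context" on a made-up
           nested document, assuming XML::Twig is installed.

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse('<doc><sec><p><i>x</i></p></sec></doc>');
my $sec = $t->root->first_child('sec');
my ($i) = $t->get_xpath('//i');

my $depth  = $i->level;                # doc is 0, sec 1, p 2, i 3
my $inside = $i->in($sec) ? 1 : 0;     # <i> is somewhere inside <sec>
my $ctx    = $i->in_context('sec');    # the enclosing <sec> element

print "$depth $inside ", $ctx->tag, "\n";
```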
2891
2892       pcdata
2893           Return the text of a "PCDATA" element or "undef" if the element is
2894           not "PCDATA".
2895
2896       pcdata_xml_string
2897           Return the text of a "PCDATA" element or undef if the element is
2898           not "PCDATA".  The text is "XML-escaped" ('&' and '<' are replaced
2899           by '&amp;' and '&lt;')
2900
2901       set_pcdata     ($text)
2902           Set the text of a "PCDATA" element. This method does not check that
2903           the element is indeed a "PCDATA" so usually you should use
2904           "set_text" instead.
2905
2906       append_pcdata  ($text)
2907           Add the text at the end of a "PCDATA" element.
2908
2909       is_cdata
2910           Return 1 if the element is a "CDATA" element, returns 0 otherwise.
2911
2912       is_text
2913           Return 1 if the element is a "CDATA" or "PCDATA" element, returns 0
2914           otherwise.
2915
2916       cdata
2917           Return the text of a "CDATA" element or "undef" if the element is
2918           not "CDATA".
2919
2920       cdata_string
2921           Return the XML string of a "CDATA" element, including the opening
2922           and closing markers.
2923
2924       set_cdata     ($text)
2925           Set the text of a "CDATA" element.
2926
2927       append_cdata  ($text)
2928           Add the text at the end of a "CDATA" element.
2929
2930       remove_cdata
2931           Turns all "CDATA" sections in the element into regular "PCDATA"
2932           elements. This is useful when converting XML to HTML, as browsers
2933           do not support CDATA sections.
2934
2935       extra_data
2936           Return the extra_data (comments and PI's) attached to an element
2937
2938       set_extra_data     ($extra_data)
2939           Set the extra_data (comments and PI's) attached to an element
2940
2941       append_extra_data  ($extra_data)
2942           Append extra_data to the existing extra_data before the element (if
2943           no previous extra_data exists then it is created)
2944
2945       set_asis
2946           Set a property of the element that causes it to be output without
2947           being XML escaped by the print functions: if it contains "a < b" it
2948           will be output as such and not as "a &lt; b". This can be useful to
2949           create text elements that will be output as markup. Note that all
2950           "PCDATA" descendants of the element are also marked as having the
2951           property (they are the ones that are actually impacted by the
2952           change).
2953
2954           If the element is a "CDATA" element it will also be output asis,
2955           without the "CDATA" markers. The same goes for any "CDATA"
2956           descendant of the element
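           A minimal sketch of the effect of "set_asis", assuming XML::Twig is
           installed; the sample document is made up. The same string is set as
           the text of two elements, and only the "asis" one prints as markup.

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse('<doc><p>old</p><q>old</q></doc>');
my ($p, $q) = $t->root->children;

$p->set_text('<i>x</i>');   # stored as text: '<' is escaped on output
$q->set_text('<i>x</i>');
$q->set_asis;               # printed verbatim, so it reads as markup

my $escaped  = $p->sprint;
my $verbatim = $q->sprint;

print "$escaped\n$verbatim\n";
```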
2957
2958       set_not_asis
2959           Unsets the "asis" property for the element and its text
2960           descendants.
2961
2962       is_asis
2963           Return the "asis" property status of the element ( 1 or "undef")
2964
2965       closed
2966           Return true if the element has been closed. Might be useful if you
2967           are somewhere in the tree, during the parse, and have no idea
2968           whether a parent element is completely loaded or not.
2969
2970       get_type
2971           Return the type of the element: '"#ELT"' for "real" elements, or
2972           '"#PCDATA"', '"#CDATA"', '"#COMMENT"', '"#ENT"', '"#PI"'
2973
2974       is_elt
2975           Return the tag if the element is a "real" element, or 0 if it is
2976           "PCDATA", "CDATA"...
2977
2978       contains_only_text
2979           Return 1 if the element does not contain any other "real" element
2980
2981       contains_only ($exp)
2982           Return the list of children if all children of the element match
2983           the expression $exp
2984
2985             if( $para->contains_only( 'tt')) { ... }
2986
2987       contains_a_single ($exp)
2988           If the element contains a single child that matches the expression
2989           $exp returns that element. Otherwise returns 0.
2990
2991       is_field
2992           same as "contains_only_text"
2993
2994       is_pcdata
2995           Return 1 if the element is a "PCDATA" element, returns 0 otherwise.
2996
2997       is_ent
2998           Return 1 if the element is an entity (an unexpanded entity)
2999           element, return 0 otherwise.
3000
3001       is_empty
3002           Return 1 if the element is empty, 0 otherwise
3003
3004       set_empty
3005           Flags the element as empty. No further check is made, so if the
3006           element is actually not empty the output will be broken. The only
3007           effect of this method is that the element will be output as
3008           "<tag att="value"/>".
3009
3010       set_not_empty
3011           Flags the element as not empty. If it is actually empty then the
3012           element will be output as "<tag att="value"></tag>".
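           A minimal sketch of "is_empty" and "set_not_empty", assuming
           XML::Twig is installed and the default empty tag style; the <br/>
           document is made up for the illustration.

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse('<doc><br/></doc>');
my $br = $t->root->first_child('br');

my $was_empty = $br->is_empty ? 1 : 0;   # parsed without content
$br->set_not_empty;                      # now printed with an end tag
my $expanded = $br->sprint;

print "$was_empty $expanded\n";
```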
3013
3014       is_pi
3015           Return 1 if the element is a processing instruction ("#PI")
3016           element, return 0 otherwise.
3017
3018       target
3019           Return the target of a processing instruction
3020
3021       set_target ($target)
3022           Set the target of a processing instruction
3023
3024       data
3025           Return the data part of a processing instruction
3026
3027       set_data ($data)
3028           Set the data of a processing instruction
3029
3030       set_pi ($target, $data)
3031           Set the target and data of a processing instruction
3032
3033       pi_string
3034           Return the string form of a processing instruction ("<?target
3035           data?>")
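           A minimal sketch of the processing instruction accessors, assuming
           XML::Twig is installed. It also assumes the twig is created with the
           "pi => 'process'" option, so that PIs become "#PI" elements you can
           navigate to; the PI itself is made up.

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new(pi => 'process');   # keep PIs as #PI elements
$t->parse('<doc><?style href="s.css"?><p/></doc>');
my $pi = $t->root->first_child;            # the PI is the first child of <doc>

my $is_pi  = $pi->is_pi ? 1 : 0;
my $target = $pi->target;
my $data   = $pi->data;
my $string = $pi->pi_string;

print "$string\n";
```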
3036
3037       is_comment
3038           Return 1 if the element is a comment ("#COMMENT") element, return 0
3039           otherwise.
3040
3041       set_comment ($comment_text)
3042           Set the text for a comment
3043
3044       comment
3045           Return the content of a comment (just the text, not the "<!--" and
3046           "-->")
3047
3048       comment_string
3049           Return the XML string for a comment ("<!-- comment -->")
3050
3051           Note that an XML comment cannot start or end with a '-', or include
3052           '--' (http://www.w3.org/TR/2008/REC-xml-20081126/#sec-comments). If
3053           that is the case (presumably because you created the comment
3054           yourself, as it could not be in the input XML), then a space will
3055           be inserted before an initial '-', after a trailing one or between
3056           two '-' in the comment (which could mangle JavaScript "hidden" in
3057           an XHTML comment).
3058
3059       set_ent ($entity)
3060           Set a (non-expanded) entity ("#ENT"). $entity is the entity text
3061           ("&ent;").
3062
3063       ent Return the entity for an entity ("#ENT") element ("&ent;")
3064
3065       ent_name
3066           Return the entity name for an entity ("#ENT") element ("ent")
3067
3068       ent_string
3069           Return the entity, either expanded if the expanded version is
3070           available, or non-expanded ("&ent;") otherwise
3071
3072       child ($offset, $optional_condition)
3073           Return the $offset-th child of the element, optionally the
3074           $offset-th child that matches $optional_condition. The children are
3075           treated as a list, so "$elt->child( 0)" is the first child, while
3076           "$elt->child( -1)" is the last child.
3077
3078       child_text ($offset, $optional_condition)
3079           Return the text of a child or "undef" if the child does not
3080           exist. Arguments are the same as child.
3081
3082       last_child    ($optional_condition)
3083           Return the last child of the element, or the last child matching
3084           $optional_condition (ie the last of the element children matching
3085           the condition).
3086
3087       last_child_text   ($optional_condition)
3088           Same as "first_child_text" but for the last child.
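           A minimal sketch of "child", "last_child" and "child_text" with and
           without a condition, assuming XML::Twig is installed; the list is made
           up for the illustration.

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse('<ul><li>a</li><li class="x">b</li><li>c</li><hr/></ul>');
my $ul = $t->root;

my $first   = $ul->child(0)->text;                    # first child
my $last    = $ul->child(-1)->tag;                    # last child, whatever its tag
my $last_li = $ul->last_child('li')->text;            # last <li> only
my $x_text  = $ul->child_text(0, 'li[@class="x"]');   # first child matching the condition

print "$first $last $last_li $x_text\n";
```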
3089
3090       sibling  ($offset, $optional_condition)
3091           Return the next or previous $offset-th sibling of the element, or
3092           the $offset-th one matching $optional_condition. If $offset is
3093           negative then a previous sibling is returned, if $offset is
3094           positive then a next sibling is returned. "$offset=0" returns the
3095           element if there is no condition or if the element matches the
3096           condition, "undef" otherwise.
3097
3098       sibling_text ($offset, $optional_condition)
3099           Return the text of a sibling or "undef" if the sibling does not
3100           exist.  Arguments are the same as "sibling".
3101
3102       prev_siblings ($optional_condition)
3103           Return the list of previous siblings (optionally matching
3104           $optional_condition) for the element. The elements are ordered in
3105           document order.
3106
3107       next_siblings ($optional_condition)
3108           Return the list of siblings (optionally matching
3109           $optional_condition) following the element. The elements are
3110           ordered in document order.
3111
3112       siblings ($optional_condition)
3113           Return the list of siblings (optionally matching
3114           $optional_condition) of the element (excluding the element itself).
3115           The elements are ordered in document order.
3116
3117       pos ($optional_condition)
3118           Return the position of the element in the children list. The first
3119           child has a position of 1 (as in XPath).
3120
3121           If the $optional_condition is given then only siblings that match
3122           the condition are counted. If the element itself does not match the
3123           condition then 0 is returned.
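           A minimal sketch of the sibling methods and "pos", assuming XML::Twig
           is installed; the document is made up for the illustration.

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse('<r><a/><b/><c/><b/></r>');
my $c = $t->root->child(2);                                # the <c> element

my $next   = $c->sibling(1)->tag;                          # next sibling
my $prev2  = $c->sibling(-2)->tag;                         # second previous sibling
my $before = join ',', map { $_->tag } $c->prev_siblings;  # in document order
my $pos    = $c->pos;                                      # positions start at 1
my $pos_b  = $t->root->last_child->pos('b');               # only <b> siblings counted

print "$next $prev2 $before $pos $pos_b\n";
```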
3124
3125       atts
3126           Return a hash ref containing the element attributes
3127
3128       set_atts      ({ att1=>$att1_val, att2=> $att2_val... })
3129           Set the element attributes with the hash ref supplied as the
3130           argument. The previous attributes are lost (ie the attributes set
3131           by "set_atts" replace all of the attributes of the element).
3132
3133           You can also pass a list instead of a hashref: "$elt->set_atts(
3134           att1 => 'val1',...)"
3135
3136       del_atts
3137           Deletes all the element attributes.
3138
3139       att_nb
3140           Return the number of attributes for the element
3141
3142       has_atts
3143           Return true if the element has attributes (in fact return the
3144           number of attributes, thus being an alias to "att_nb").
3145
3146       has_no_atts
3147           Return true if the element has no attributes, false (0) otherwise
3148
3149       att_names
3150           return a list of the attribute names for the element
3151
3152       att_xml_string ($att, $options)
3153           Return the attribute value, where '&', '<' and quote (" or the
3154           value of the quote option at twig creation) are XML-escaped.
3155
3156           The options are passed as a hashref; setting "escape_gt" to a true
3157           value will also escape '>' ("$elt->att_xml_string( 'myatt', { escape_gt => 1 })").
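           A minimal sketch of the attribute methods, assuming XML::Twig is
           installed and the default double quote for attributes; the element
           and its "title" value are made up for the illustration.

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse('<doc><e a="1" b="2"/></doc>');
my $e = $t->root->first_child;

$e->set_atts({ title => 'a < b & "c"' });   # replaces a and b entirely
my @names = $e->att_names;
my $nb    = $e->att_nb;
my $raw   = $e->att('title');               # value as stored, unescaped
my $xml   = $e->att_xml_string('title');    # escaped for use inside "..."

print "@names $nb $xml\n";
```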
3158
3159       set_id       ($id)
3160           Set the "id" attribute of the element to the value.  See "elt_id"
3161           to change the id attribute name.
3162
3163       id  Gets the id attribute value
3164
3165       del_id       ($id)
3166           Deletes the "id" attribute of the element and remove it from the id
3167           list for the document
3168
3169       class
3170           Return the "class" attribute for the element (methods on the
3171           "class" attribute are quite convenient when dealing with XHTML, or
3172           plain XML that will eventually be displayed using CSS)
3173
3174       lclass
3175           same as class, except that this method is an lvalue, so you can do
3176           "$elt->lclass= "foo""
3177
3178       set_class ($class)
3179           Set the "class" attribute for the element to $class
3180
3181       add_class ($class)
3182           Add $class to the element "class" attribute: the new class is added
3183           only if it is not already present.
3184
3185           Note that classes are then sorted alphabetically, so the "class"
3186           attribute can be changed even if the class is already there
3187
3188       remove_class ($class)
3189           Remove $class from the element "class" attribute.
3190
3191           Note that classes are then sorted alphabetically, so the "class"
3192           attribute can be changed even if $class was not present.
3193
3194       add_to_class ($class)
3195           alias for add_class
3196
3197       att_to_class ($att)
3198           Set the "class" attribute to the value of attribute $att
3199
3200       add_att_to_class ($att)
3201           Add the value of attribute $att to the "class" attribute of the
3202           element
3203
3204       move_att_to_class ($att)
3205           Add the value of attribute $att to the "class" attribute of the
3206           element and delete the attribute
3207
3208       tag_to_class
3209           Set the "class" attribute of the element to the element tag
3210
3211       add_tag_to_class
3212           Add the element tag to its "class" attribute
3213
3214       set_tag_class ($new_tag)
3215           Add the element tag to its "class" attribute and sets the tag to
3216           $new_tag
3217
3218       in_class ($class)
3219           Return true (1) if the element is in the class $class (if $class is
3220           one of the tokens in the element "class" attribute)
3221
3222       tag_to_span
3223           Change the element tag to "span" and set its class to the old tag
3224
3225       tag_to_div
3226           Change the element tag to "div" and set its class to the old tag
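           A minimal sketch of the "class" helpers, assuming XML::Twig is
           installed; the document is made up. Note how "add_class" re-sorts the
           existing classes.

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse('<doc><p class="b a">x</p><note>y</note></doc>');
my ($p, $note) = $t->root->children;

$p->add_class('c');                      # classes are re-sorted alphabetically
my $added = $p->class;
$p->remove_class('a');
my $removed = $p->class;
my $in_b = $p->in_class('b') ? 1 : 0;

$note->tag_to_span;                      # <note> becomes <span class="note">
my $span = $note->sprint;

print "$added | $removed | $in_b | $span\n";
```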
3227
3228       DESTROY
3229           Frees the element from memory.
3230
3231       start_tag
3232           Return the string for the start tag for the element, including the
3233           "/>" at the end of an empty element tag
3234
3235       end_tag
3236           Return the string for the end tag of an element.  For an empty
3237           element, this returns the empty string ('').
3238
3239       xml_string @optional_options
3240           Equivalent to "$elt->sprint( 1)", returns the string for the entire
3241           element, excluding the element's tags (but nested element tags are
3242           present)
3243
3244           The '"no_recurse"' option will only return the text of the element,
3245           not of any included sub-elements (same as "xml_text_only").
3246
3247       inner_xml
3248           Another synonym for xml_string
3249
3250       outer_xml
3251           Another synonym for sprint
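           A minimal sketch comparing the string methods on one made-up element,
           assuming XML::Twig is installed with its default output options.

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse('<doc><p class="c">a &lt; <b>b</b></p></doc>');
my $p = $t->root->first_child;

my $inner = $p->xml_string;   # content only, still XML-escaped
my $outer = $p->sprint;       # the whole element, tags included
my $start = $p->start_tag;
my $end   = $p->end_tag;
my $text  = $p->text;         # no tags, no escaping

print "$inner\n$outer\n$text\n";
```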
3252
3253       xml_text
3254           Return the text of the element, encoded (and processed by the
3255           current "output_filter" or "output_encoding" options), without any
3256           tag.
3257
3258       xml_text_only
3259           Same as "xml_text" except that the text returned doesn't include
3260           the text of sub-elements.
3261
3262       set_pretty_print ($style)
3263           Set the pretty print method, amongst '"none"' (default),
3264           '"nsgmls"', '"nice"', '"indented"', '"record"' and '"record_c"'
3265
3266           pretty_print styles:
3267
3268           none
3269               the default, no "\n" is used
3270
3271           nsgmls
3272               nsgmls style, with "\n" added within tags
3273
3274           nice
3275               adds "\n" wherever possible (NOT SAFE, can lead to invalid XML)
3276
3277           indented
3278               same as "nice" plus indents elements (NOT SAFE, can lead to
3279               invalid XML)
3280
3281           record
3282               table-oriented pretty print, one field per line
3283
3284           record_c
3285               table-oriented pretty print, more compact than "record", one
3286               record per line
3287
3288       set_empty_tag_style ($style)
3289           Set the method to output empty tags, amongst '"normal"' (default),
3290           '"html"', and '"expand"'.
3291
3292           "normal" outputs an empty tag '"<tag/>"', "html" adds a space
3293           '"<tag />"' for elements that can be empty in XHTML and "expand"
3294           outputs '"<tag></tag>"'
3295
3296       set_remove_cdata  ($flag)
3297           set (or unset) the flag that forces the twig to output CDATA
3298           sections as regular (escaped) PCDATA
3299
3300       set_indent ($string)
3301           Set the indentation for the indented pretty print style (default is
3302           2 spaces)
3303
3304       set_quote ($quote)
3305           Set the quotes used for attributes. It can be '"double"' (default) or
3306           '"single"'
3307
3308       cmp       ($elt)
3309           Compare the order of the 2 elements in a twig.
3310
3311           $a is the <A>..</A> element, $b is the <B>...</B> element:
3312
3313             document                        $a->cmp( $b)
3314             <A> ... </A> ... <B>  ... </B>     -1
3315             <A> ... <B>  ... </B> ... </A>     -1
3316             <B> ... </B> ... <A>  ... </A>      1
3317             <B> ... <A>  ... </A> ... </B>      1
3318              $a == $b                           0
3319              $a and $b not in the same tree   undef
3320
3321       before       ($elt)
3322           Return 1 if the element starts before $elt, 0 otherwise. If the 2
3323           elements are not in the same twig then return "undef".
3324
3325               if( $a->cmp( $b) == -1) { return 1; } else { return 0; }
3326
3327       after       ($elt)
3328           Return 1 if the element starts after $elt, 0 otherwise. If the 2
3329           elements are not in the same twig then return "undef".
3330
3331               if( $a->cmp( $b) == 1) { return 1; } else { return 0; }
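           A minimal sketch of "cmp" and "before" on a made-up document,
           assuming XML::Twig is installed. The values follow the "cmp" table:
           an element that contains another one starts before it.

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse('<doc><x><y/></x><z/></doc>');
my ($x, $y, $z) = map { ($t->get_xpath("//$_"))[0] } qw(x y z);

my $x_z    = $x->cmp($z);      # <x> ends before <z> starts
my $z_x    = $z->cmp($x);
my $x_y    = $x->cmp($y);      # <x> contains <y>, so it starts first
my $same   = $x->cmp($x);
my $before = $x->before($z);

print "$x_z $z_x $x_y $same $before\n";
```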
3332
3333       other comparison methods
3334           lt
3335           le
3336           gt
3337           ge
3338       path
3339           Return the element context in a form similar to XPath's short form:
3340           '"/root/tag1/../tag"'
3341
3342       xpath
3343           Return a unique XPath expression that can be used to find the
3344           element again.
3345
3346           It looks like "/doc/sect[3]/title": unique elements do not have an
3347           index, the others do.
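           A minimal sketch contrasting "path" and "xpath", assuming XML::Twig
           is installed; the document is made up. The "xpath" result is then fed
           back to "get_xpath" to find the same element again.

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse('<doc><sect><title>A</title></sect><sect><title>B</title></sect></doc>');
my $title = ($t->get_xpath('//title'))[1];   # the second <title>

my $path    = $title->path;                  # no indexes at all
my $xpath   = $title->xpath;                 # indexes only where needed
my ($again) = $t->get_xpath($xpath);         # the expression finds it again

print "$path\n$xpath\n", $again->text, "\n";
```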
3348
3349       flush
3350           flushes the twig up to the current element (strictly equivalent to
3351           "$elt->root->flush")
3352
3353       private methods
3354           Low-level methods on the twig:
3355
3356           set_parent        ($parent)
3357           set_first_child   ($first_child)
3358           set_last_child    ($last_child)
3359           set_prev_sibling  ($prev_sibling)
3360           set_next_sibling  ($next_sibling)
3361           set_twig_current
3362           del_twig_current
3363           twig_current
3364           contains_text
3365
3366           Those methods should not be used, unless of course you find some
3367           creative and interesting, not to mention useful, ways to do it.
3368
3369   cond
3370       Most of the navigation functions accept a condition as an optional
3371       argument. The first element (or all elements for "children" or
3372       "ancestors") that passes the condition is returned.
3373
3374       The condition is a single step of an XPath expression using the XPath
3375       subset defined by "get_xpath". Additional conditions are:
3376
3379       #ELT
3380           return a "real" element (not a PCDATA, CDATA, comment or pi
3381           element)
3382
3383       #TEXT
3384           return a PCDATA or CDATA element
3385
3386       regular expression
3387           return an element whose tag matches the regexp. The regexp has to
3388           be created with "qr//" (hence this is available only on perl 5.005
3389           and above)
3390
3391       code reference
3392           applies the code, passing the current element as argument, if the
3393           code returns true then the element is returned, if it returns false
3394           then the code is applied to the next candidate.
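       A minimal sketch of the different kinds of condition, assuming
       XML::Twig is installed; the mixed-content document is made up for the
       illustration.

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse('<doc><h1>A</h1><p>x</p><h2 n="2">B</h2>tail</doc>');
my $doc = $t->root;

my @elts  = $doc->children('#ELT');      # h1, p, h2 (not the text)
my @texts = $doc->children('#TEXT');     # the trailing "tail" PCDATA
my @heads = $doc->children(qr/^h\d$/);   # tag matched by a regexp
# code reference: called with each candidate until it returns true
my $numbered = $doc->first_child(sub { my $e = shift; $e->is_elt && defined $e->att('n') });

print scalar(@elts), ' ', scalar(@texts), ' ', scalar(@heads), ' ', $numbered->tag, "\n";
```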
3395
3396   XML::Twig::XPath
3397       XML::Twig implements a subset of XPath through the "get_xpath" method.
3398
3399       If you want to use the whole XPath power, then you can use
3400       "XML::Twig::XPath" instead. In this case "XML::Twig" uses "XML::XPath"
3401       to execute XPath queries.  You will of course need "XML::XPath"
3402       installed to be able to use "XML::Twig::XPath".
3403
3404       See XML::XPath for more information.
3405
3406       The methods you can use are:
3407
3408       findnodes              ($path)
3409           return a list of nodes found by $path.
3410
3411       findnodes_as_string    ($path)
3412           return the nodes found reproduced as XML. The result is not
3413           guaranteed to be valid XML though.
3414
3415       findvalue              ($path)
3416           return the concatenation of the text content of the result nodes
3417
3418       In order for "XML::XPath" to be used as the XPath engine the following
3419       methods are included in "XML::Twig":
3420
3421       in XML::Twig
3422
3423       getRootNode
3424       getParentNode
3425       getChildNodes
3426
3427       in XML::Twig::Elt
3428
3429       string_value
3430       toString
3431       getName
3432       getRootNode
3433       getNextSibling
3434       getPreviousSibling
3435       isElementNode
3436       isTextNode
3437       isPI
3438       isPINode
3439       isProcessingInstructionNode
3440       isComment
3441       isCommentNode
3442       getTarget
3443       getChildNodes
3444       getElementById
3445
3446   XML::Twig::XPath::Elt
       The methods you can use are the same as on "XML::Twig::XPath" objects:
3448
3449       findnodes              ($path)
3450           return a list of nodes found by $path.
3451
3452       findnodes_as_string    ($path)
3453           return the nodes found reproduced as XML. The result is not
3454           guaranteed to be valid XML though.
3455
3456       findvalue              ($path)
3457           return the concatenation of the text content of the result nodes
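       On an element, paths are evaluated relative to that element. The
       sketch below, which assumes an XPath engine is installed and uses a
       made-up document, gets an element through the twig, then queries it
       with relative paths.

```perl
use strict;
use warnings;
use XML::Twig::XPath;

my $t = XML::Twig::XPath->new;
$t->parse('<orders><order id="1"><line qty="2">pen</line><line qty="5">ink</line></order>'
        . '<order id="2"><line qty="1">pad</line></order></orders>');

# the twig returns XML::Twig::XPath::Elt objects...
my ($first_order) = $t->findnodes('/orders/order[@id="1"]');

# ...on which relative paths only see that element's subtree
my @lines = $first_order->findnodes('line');               # pen and ink, not pad
my $total = $first_order->findvalue('sum(line/@qty)');     # 2 + 5
```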
3458
3459   XML::Twig::Entity_list
3460       new Create an entity list.
3461
3462       add         ($ent)
3463           Add an entity to an entity list.
3464
3465       add_new_ent ($name, $val, $sysid, $pubid, $ndata, $param)
3466           Create a new entity and add it to the entity list
3467
       delete     ($ent or $name)
3469           Delete an entity (defined by its name or by the Entity object) from
3470           the list.
3471
3472       print      ($optional_filehandle)
3473           Print the entity list.
3474
3475       list
3476           Return the list as an array
3477
3478   XML::Twig::Entity
3479       new        ($name, $val, $sysid, $pubid, $ndata, $param)
3480           Same arguments as the Entity handler for XML::Parser.
3481
3482       print       ($optional_filehandle)
3483           Print an entity declaration.
3484
3485       name
3486           Return the name of the entity
3487
3488       val Return the value of the entity
3489
3490       sysid
           Return the system id for the entity (for external entities,
           including NDATA ones)
3492
3493       pubid
3494           Return the public id for the entity (for NDATA entities)
3495
3496       ndata
3497           Return true if the entity is an NDATA entity
3498
3499       param
3500           Return true if the entity is a parameter entity
3501
3502       text
3503           Return the entity declaration text.
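       Both classes can be exercised without parsing anything. This sketch
       builds an entity list by hand with "add_new_ent", removes one entry by
       name, and reads back the remaining entities; the entity names and
       values are invented for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my $list = XML::Twig::Entity_list->new;

# arguments: ($name, $val, $sysid, $pubid, $ndata, $param)
$list->add_new_ent('copy', '(c)');                              # internal entity
$list->add_new_ent('logo', undef, 'logo.gif', undef, 'gif');    # NDATA entity
$list->add_new_ent('draft', 'INCLUDE');                         # removed below

$list->delete('draft');                                         # delete by name

my @names  = sort map { $_->name } $list->list;
my ($logo) = grep { $_->name eq 'logo' } $list->list;
my $decl   = $logo->text;    # the <!ENTITY ...> declaration for 'logo'
```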
3504

EXAMPLES

       Additional examples (and a complete tutorial) can be found on the
       XML::Twig page <http://www.xmltwig.org/xmltwig/>
3508
       To see what "flush" does, run the following script with an XML file
       and an element name as arguments:
3511
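         use strict;
         use warnings;
         use XML::Twig;

         my ($file, $elt)= @ARGV;
         die "usage: $0 <file.xml> <element_name>\n" unless defined $elt;

         # flush the twig each time an $elt element has been completely parsed
         my $t= XML::Twig->new( twig_handlers =>
             { $elt => sub { $_[0]->flush; print "\n[flushed here]\n"; } });
         $t->parsefile( $file, ErrorContext => 2);
         $t->flush;   # output whatever is left after the last flush
         print "\n";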
3520

NOTES

3522   Subclassing XML::Twig
3523       Useful methods:
3524
3525       elt_class
           To subclass "XML::Twig" you will probably also need to subclass
           "XML::Twig::Elt". Use the "elt_class" option when you create the
           "XML::Twig" object to have the elements created in a different
           class (which should be a subclass of "XML::Twig::Elt").
3530
3531       add_options
           If you inherit the "XML::Twig" "new" method but want to add more
           options to it, you can use this method to prevent XML::Twig from
           issuing warnings for those additional options.
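       Here is a minimal sketch of a subclass that uses both. The class names
       "My::Twig" and "My::Elt", the "greeting" option, and the "shout" and
       "greeting" methods are all hypothetical.

```perl
use strict;
use warnings;
use XML::Twig;

{ package My::Elt;                       # hypothetical element class
  use parent -norequire, 'XML::Twig::Elt';
  sub shout { my $elt = shift; return uc $elt->text; }
}

{ package My::Twig;                      # hypothetical twig class
  use parent -norequire, 'XML::Twig';

  # declare the extra option so that the inherited new does not warn about it
  XML::Twig::add_options('greeting');

  sub new {
    my ($class, %opts) = @_;
    # elt_class: elements are created as My::Elt objects
    my $self = $class->SUPER::new(elt_class => 'My::Elt', %opts);
    $self->{my_greeting} = $opts{greeting};    # hypothetical storage key
    return $self;
  }
  sub greeting { return $_[0]->{my_greeting}; }
}

my $t = My::Twig->new(greeting => 'hi');
$t->parse('<doc>hello</doc>');
```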
3535
3536   DTD Handling
       There are 3 possibilities here:
3538
3539       No DTD
3540           No doctype, no DTD information, no entity information, the world is
3541           simple...
3542
3543       Internal DTD
3544           The XML document includes an internal DTD, and maybe entity
3545           declarations.
3546
           If you use the "load_DTD" option when creating the twig, the DTD
           information and the entity declarations can be accessed.
3549
           The DTD and the entity declarations will be "flush"'ed (or
           "print"'ed) either as is, if they have not been modified, or
           reconstructed, if they have been. The reconstruction is poor:
           comments are lost and order is not kept (given its content, this
           reconstructed DTD should not be viewed by anyone). You can also
           modify them directly by changing the
           "$twig->{twig_doctype}->{internal}" field (straight from
           XML::Parser, see the "Doctype" handler doc).
3557
3558       External DTD
3559           The XML document includes a reference to an external DTD, and maybe
3560           entity declarations.
3561
           If you use the "load_DTD" option when creating the twig, the DTD
           information and the entity declarations can be accessed. The
           entity declarations will be "flush"'ed (or "print"'ed) either as
           is, if they have not been modified, or reconstructed (badly:
           comments are lost and order is not kept).
3567
           You can change the doctype through the "$twig->set_doctype" method
           and get or print the DTD through the "$twig->dtd_text" or
           "$twig->dtd_print" methods.
3572
3573           If you need to modify the entity list this is probably the easiest
3574           way to do it.
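       As a sketch of reading entity declarations, assuming that the
       declarations found in an internal DTD are collected in the twig's
       entity list, available through the "entity_list" method. The document
       and the entity name are hypothetical.

```perl
use strict;
use warnings;
use XML::Twig;

my $xml = <<'XML';
<!DOCTYPE doc [
  <!ELEMENT doc (#PCDATA)>
  <!ENTITY company "ACME Inc.">
]>
<doc>&company;</doc>
XML

my $t = XML::Twig->new(load_DTD => 1);
$t->parse($xml);

# names of the entities declared in the internal DTD
my @ent_names = map { $_->name } $t->entity_list->list;
```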
3575
3576   Flush
       Remember that element handlers are called when the element is CLOSED,
       so if you have handlers for nested elements the inner handlers will be
       called first. This makes it trickier than it would seem to, for
       example, number nested sections (or clauses, or divs), as the titles
       in the inner sections are handled before those of the outer sections.
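       The sketch below, on a made-up document, records the order in which a
       "section" handler is called, then shows one possible way (not the only
       one) to number sections in document order: counting them in a
       "start_tag_handlers" handler, which is called when the start tag is
       parsed.

```perl
use strict;
use warnings;
use XML::Twig;

my $xml = '<doc><section><title>A</title><section><title>A.1</title></section></section>'
        . '<section><title>B</title></section></doc>';

# twig_handlers fire when an element is CLOSED: the inner section comes first
my @closed;
XML::Twig->new(
  twig_handlers => { section => sub { push @closed, $_->first_child_text('title') } },
)->parse($xml);
# @closed is ('A.1', 'A', 'B')

# start_tag_handlers fire when the start tag is parsed, in document order,
# so a counter incremented there numbers the outer section first
my $count = 0;
my %num_of;
XML::Twig->new(
  start_tag_handlers => { section => sub { my ($t, $elt) = @_; $elt->set_att(num => ++$count) } },
  twig_handlers      => { section => sub { $num_of{ $_->first_child_text('title') } = $_->att('num') } },
)->parse($xml);
# %num_of is (A => 1, 'A.1' => 2, B => 3)
```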
3582

BUGS

3584       segfault during parsing
3585           This happens when parsing huge documents, or lots of small ones,
3586           with a version of Perl before 5.16.
3587
3588           This is due to a bug in the way weak references are handled in Perl
3589           itself.
3590
           The fix is to upgrade to Perl 5.16 or later ("perlbrew" is a great
           tool to manage several installations of perl on the same machine).

           Another, NOT RECOMMENDED, way of fixing the problem is to switch
           off weak references by writing "XML::Twig::_set_weakrefs( 0);" at
           the top of the code. This is totally unsupported though, and may
           lead to other problems.
3599
3600       entity handling
           Due to XML::Parser behaviour, non-base entities in attribute values
           disappear if they are not declared in the document:
           "att="val&ent;"" will be turned into "att => val", unless you use
           the "keep_encoding" argument to "XML::Twig->new".
3605
3606       DTD handling
           The DTD handling methods are quite buggy. No one uses them and it
3608           seems very difficult to get them to work in all cases, including
3609           with several slightly incompatible versions of XML::Parser and of
3610           libexpat.
3611
3612           Basically you can read the DTD, output it back properly, and update
3613           entities, but not much more.
3614
           So use XML::Twig with standalone documents, or with documents
           referring to an external DTD, but don't expect it to properly
           parse, or even output back, the DTD.
3618
3619       memory leak
           If you use a REALLY old Perl (5.005!) and a lot of twigs you might
           find that you leak quite a lot of memory (about 2 KB per twig). You
           can use the "dispose" method to free that memory after you are
           done.

           If you create elements, the same thing might happen; use the
           "delete" method to get rid of them.

           Alternatively, installing the "Scalar::Util" (or "WeakRef") module
           on a version of Perl that supports it (>5.6.0) will get rid of the
           memory leaks automagically.
3631
3632       ID list
3633           The ID list is NOT updated when elements are cut or deleted.
3634
3635       change_gi
3636           This method will not function properly if you do:
3637
3638                $twig->change_gi( $old1, $new);
3639                $twig->change_gi( $old2, $new);
3640                $twig->change_gi( $new, $even_newer);
3641
3642       sanity check on XML::Parser method calls
3643           XML::Twig should really prevent calls to some XML::Parser methods,
3644           especially the "setHandlers" method.
3645
3646       pretty printing
           Pretty printing (at least using the "indented" style) is hard to
3648           get right!  Only elements that belong to the document will be
3649           properly indented. Printing elements that do not belong to the twig
3650           makes it impossible for XML::Twig to figure out their depth, and
3651           thus their indentation level.
3652
3653           Also there is an unavoidable bug when using "flush" and pretty
3654           printing for elements with mixed content that start with an
3655           embedded element:
3656
3657             <elt><b>b</b>toto<b>bold</b></elt>
3658
3659             will be output as
3660
3661             <elt>
3662               <b>b</b>toto<b>bold</b></elt>
3663
3664           if you flush the twig when you find the "<b>" element
3665

Globals

       These are the things that can mess up calling code, especially if
       threaded. They might also cause problems under mod_perl.
3669
3670       Exported constants
           Whether you want them or not you get them! These are subroutines to
           use as constants when creating or testing elements:
3673
3674             PCDATA  return '#PCDATA'
3675             CDATA   return '#CDATA'
3676             PI      return '#PI', I had the choice between PROC and PI :--(
3677
3678       Module scoped values: constants
3679           these should cause no trouble:
3680
3681             %base_ent= ( '>' => '&gt;',
3682                          '<' => '&lt;',
3683                          '&' => '&amp;',
3684                          "'" => '&apos;',
3685                          '"' => '&quot;',
3686                        );
3687             CDATA_START   = "<![CDATA[";
3688             CDATA_END     = "]]>";
3689             PI_START      = "<?";
3690             PI_END        = "?>";
3691             COMMENT_START = "<!--";
3692             COMMENT_END   = "-->";
3693
3694           pretty print styles
3695
3696             ( $NSGMLS, $NICE, $INDENTED, $INDENTED_C, $WRAPPED, $RECORD1, $RECORD2)= (1..7);
3697
3698           empty tag output style
3699
3700             ( $HTML, $EXPAND)= (1..2);
3701
3702       Module scoped values: might be changed
3703           Most of these deal with pretty printing, so the worst that can
3704           happen is probably that XML output does not look right, but is
3705           still valid and processed identically by XML processors.
3706
           $empty_tag_style can mess up HTML browsers though, and changing $ID
           would most likely create problems.
3709
3710             $pretty=0;           # pretty print style
3711             $quote='"';          # quote for attributes
3712             $INDENT= '  ';       # indent for indented pretty print
3713             $empty_tag_style= 0; # how to display empty tags
3714             $ID                  # attribute used as an id ('id' by default)
3715
3716       Module scoped values: definitely changed
3717           These 2 variables are used to replace tags by an index, thus saving
3718           some space when creating a twig. If they really cause you too much
3719           trouble, let me know, it is probably possible to create either a
3720           switch or at least a version of XML::Twig that does not perform
3721           this optimization.
3722
3723             %gi2index;     # tag => index
3724             @index2gi;     # list of tags
3725
3726       If you need to manipulate all those values, you can use the following
3727       methods on the XML::Twig object:
3728
3729       global_state
3730           Return a hashref with all the global variables used by XML::Twig
3731
3732           The hash has the following fields:  "pretty", "quote", "indent",
3733           "empty_tag_style", "keep_encoding", "expand_external_entities",
3734           "output_filter", "output_text_filter", "keep_atts_order"
3735
3736       set_global_state ($state)
3737           Set the global state, $state is a hashref
3738
3739       save_global_state
3740           Save the current global state
3741
3742       restore_global_state
           Restore the state previously saved using "save_global_state".
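       For instance, code that temporarily needs a different pretty print
       style can save and restore the global state around it. A minimal
       sketch, with a made-up document:

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse('<doc><p>x</p></doc>');

my $before = $t->global_state->{pretty};   # pretty print style in effect now

$t->save_global_state;                     # save the current global state
$t->set_pretty_print('indented');          # changes a module-wide setting
my $during   = $t->global_state->{pretty};
my $indented = $t->sprint;                 # output uses the 'indented' style

$t->restore_global_state;                  # back to the saved state
my $after = $t->global_state->{pretty};
```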
3744

TODO

3746       SAX handlers
3747           Allowing XML::Twig to work on top of any SAX parser
3748
3749       multiple twigs are not well supported
3750           A number of twig features are just global at the moment. These
3751           include the ID list and the "tag pool" (if you use "change_gi" then
3752           you change the tag for ALL twigs).
3753
           A future version will try to support this while trying not to be
           too hard on performance (at least when a single twig is used!).
3756

AUTHOR

3758       Michel Rodriguez <mirod@cpan.org>
3759

LICENSE

3761       This library is free software; you can redistribute it and/or modify it
3762       under the same terms as Perl itself.
3763
3764       Bug reports should be sent using: RT
3765       <http://rt.cpan.org/NoAuth/Bugs.html?Dist=XML-Twig>
3766
3767       Comments can be sent to mirod@cpan.org
3768
       The XML::Twig page is at <http://www.xmltwig.org/xmltwig/>. It
       includes the development version of the module, a slightly better
       version of the documentation, examples, and a tutorial, "Processing XML
       efficiently with Perl and XML::Twig":
       <http://www.xmltwig.org/xmltwig/tutorial/index.html>
3774

SEE ALSO

3776       Complete docs, including a tutorial, examples, an easier to use HTML
3777       version of the docs, a quick reference card and a FAQ are available at
3778       <http://www.xmltwig.org/xmltwig/>
3779
3780       git repository at <http://github.com/mirod/xmltwig>
3781
       XML::Parser, XML::Parser::Expat, XML::XPath, Encode, Text::Iconv,
       Scalar::Util
3784
3785   Alternative Modules
       XML::Twig is not the only XML-processing module available on CPAN (far
       from it!).
3788
3789       The main alternative I would recommend is XML::LibXML.
3790
3791       Here is a quick comparison of the 2 modules:
3792
3793       XML::LibXML, actually "libxml2" on which it is based, sticks to the
3794       standards, and implements a good number of them in a rather strict way:
3795       XML, XPath, DOM, RelaxNG, I must be forgetting a couple (XInclude?). It
3796       is fast and rather frugal memory-wise.
3797
       XML::Twig is older: when I started writing it XML::Parser/expat was the
       only game in town. It implements XML and that's about it (plus a subset
       of XPath, and you can use XML::Twig::XPath if you have XML::XPathEngine
       installed for full support). It is slower and requires more memory for
       a full tree than XML::LibXML. On the plus side (yes, there is a plus
       side!) it lets you process a big document in chunks, and thus lets you
       tackle documents that couldn't be loaded in memory by XML::LibXML, and
       it offers a lot (and I mean a LOT!) of higher-level methods, for
       everything, from adding structure to "low-level" XML, to shortcuts for
       XHTML conversions and more. It also DWIMs quite a bit, getting comments
       and non-significant whitespace out of the way but preserving them in
       the output, for example. As it does not stick to the DOM, it also
       usually leads to shorter code than XML::LibXML.
3811
3812       Beyond the pure features of the 2 modules, XML::LibXML seems to be
       preferred by "XML-purists", while XML::Twig seems to be more used by
3814       Perl Hackers who have to deal with XML. As you have noted, XML::Twig
3815       also comes with quite a lot of docs, but I am sure if you ask for help
3816       about XML::LibXML here or on Perlmonks you will get answers.
3817
3818       Note that it is actually quite hard for me to compare the 2 modules: on
3819       one hand I know XML::Twig inside-out and I can get it to do pretty much
3820       anything I need to (or I improve it ;--), while I have a very basic
3821       knowledge of XML::LibXML.  So feature-wise, I'd rather use XML::Twig
3822       ;--). On the other hand, I am painfully aware of some of the
3823       deficiencies, potential bugs and plain ugly code that lurk in
3824       XML::Twig, even though you are unlikely to be affected by them (unless
       for example you need to change the DTD of a document programmatically),
       while I haven't looked much into XML::LibXML so it still looks shiny
       and clean to me.
3828
       That said, if you need to process a document that is too big to fit in
       memory and XML::Twig is too slow for you, my reluctant advice would be
       to use "bare" XML::Parser. It won't be as easy to use as XML::Twig:
       basically with XML::Twig you trade some speed (depending on what you do
       from a factor of 3 to... none) for ease of use, but it will be easier
       IMHO than using SAX (albeit not standard), and at this point a LOT
       faster (see the last test in
       <http://www.xmltwig.org/article/simple_benchmark/>).
3837

POD ERRORS

3839       Hey! The above document had some coding errors, which are explained
3840       below:
3841
3842       Around line 9528:
3843           Invalid =encoding syntax: utf8   # > perl 5.10.0
3844
3845       Around line 10517:
3846           Non-ASCII character seen before =encoding in 'X"print"'. Assuming
3847           UTF-8
3848
3849
3850
3851perl v5.16.3                      2014-06-09                           Twig(3)