Twig(3)               User Contributed Perl Documentation              Twig(3)

NAME

       XML::Twig - A perl module for processing huge XML documents in tree
       mode.

SYNOPSIS

       Note that this documentation is intended as a reference to the module.

       Complete docs, including a tutorial, examples, an easier to use HTML
       version, a quick reference card and a FAQ are available at
       <http://www.xmltwig.org/xmltwig>

       Small documents (loaded in memory as a tree):

         my $twig=XML::Twig->new();    # create the twig
         $twig->parsefile( 'doc.xml'); # build it
         my_process( $twig);           # use twig methods to process it
         $twig->print;                 # output the twig

       Huge documents (processed in combined stream/tree mode):

         # at most one div will be loaded in memory
         my $twig=XML::Twig->new(
           twig_handlers =>
             { title   => sub { $_->set_tag( 'h2') }, # change title tags to h2
                                                      # $_ is the current element
               para    => sub { $_->set_tag( 'p')  }, # change para to p
               hidden  => sub { $_->delete;       },  # remove hidden elements
               list    => \&my_list_process,          # process list elements
               div     => sub { $_[0]->flush;     },  # output and free memory
             },
           pretty_print => 'indented',                # output will be nicely formatted
           empty_tags   => 'html',                    # outputs <empty_tag />
                                );
         $twig->parsefile( 'my_big.xml');

         sub my_list_process
           { my( $twig, $list)= @_;
             # ...
           }

       See XML::Twig 101 for other ways to use the module, as a filter for
       example.

DESCRIPTION

       This module provides a way to process XML documents. It is built on top
       of "XML::Parser".

       The module offers a tree interface to the document, while allowing you
       to output the parts of it that have been completely processed.

       It allows minimal resource (CPU and memory) usage by building the tree
       only for the parts of the documents that need actual processing,
       through the use of the "twig_roots" and "twig_print_outside_roots"
       options. The "finish" and "finish_print" methods also help to
       improve performance.

       XML::Twig tries to make simple things easy, so it does its best to
       take care of a lot of the (usually) annoying (but sometimes necessary)
       features that come with XML and XML::Parser.

TOOLS

       XML::Twig comes with a few command-line utilities:

   xml_pp - xml pretty-printer
       XML pretty printer using XML::Twig

   xml_grep - grep XML files looking for specific elements
       "xml_grep" does a grep on XML files. Instead of using regular
       expressions it uses XPath expressions (in fact the subset of XPath
       supported by XML::Twig).

   xml_split - cut a big XML file into smaller chunks
       "xml_split" takes a (presumably big) XML file and splits it into
       several smaller files, based on various criteria (level in the tree,
       size or an XPath expression).

   xml_merge - merge back XML files split with xml_split
       "xml_merge" takes several XML files that have been split using
       "xml_split" and recreates a single file.

   xml_spellcheck - spellcheck XML files
       "xml_spellcheck" lets you spell check the content of an XML file. It
       extracts the text (the content of elements and optionally of
       attributes), calls a spell checker on it and then recreates the XML
       document.

XML::Twig 101

       XML::Twig can be used either on "small" XML documents (that fit in
       memory) or on huge ones, by processing parts of the document and
       outputting or discarding them once they are processed.

   Loading an XML document and processing it
         my $t= XML::Twig->new();
         $t->parse( '<d><title>title</title><para>p 1</para><para>p 2</para></d>');
         my $root= $t->root;
         $root->set_tag( 'html');                 # change doc to html
         my $title= $root->first_child( 'title'); # get the title
         $title->set_tag( 'h1');                  # turn it into h1
         my @para= $root->children( 'para');      # get the para children
         foreach my $para (@para)
           { $para->set_tag( 'p'); }              # turn them into p
         $t->print;                               # output the document

       Other useful methods include:

       att: "$elt->att( 'foo')" returns the "foo" attribute for an element,

       set_att: "$elt->set_att( foo => "bar")" sets the "foo" attribute to
       the "bar" value,

       next_sibling: "$elt->next_sibling" returns the next sibling in the
       document (in the example "$title->next_sibling" is the first "para";
       you can also (and actually should) use "$elt->next_sibling( 'para')" to
       get it).

       The document can also be transformed through the use of the cut, copy,
       paste and move methods: "$title->cut; $title->paste( after => $p);" for
       example.

       And much, much more, see XML::Twig::Elt.

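       As a minimal sketch of the accessor and cut-n-paste methods mentioned
       above, the following moves the title after the first para. The
       variable names ($p, $nb, $result) and the "nb" attribute are
       illustrative assumptions, not part of the module.

```perl
use strict;
use warnings;
use XML::Twig;

my $t= XML::Twig->new();
$t->parse( '<d><title>title</title><para>p 1</para><para>p 2</para></d>');
my $root=  $t->root;
my $title= $root->first_child( 'title');
my $p=     $root->first_child( 'para');

$title->set_att( nb => 1);        # add an attribute
my $nb= $title->att( 'nb');       # read it back through the accessor

$title->cut;                      # detach the title from the tree...
$title->paste( after => $p);      # ...and re-attach it after the first para

my $result= $root->sprint;        # the element and its content, as a string
print "$result\n";
```

       Using "sprint" rather than "print" returns the string, which makes the
       result easy to inspect or test.
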
   Processing an XML document chunk by chunk
       One of the strengths of XML::Twig is that it lets you work with files
       that do not fit in memory (BTW storing an XML document in memory as a
       tree is quite memory-expensive, the expansion factor being often around
       10).

       To do this you can define handlers, that will be called once a specific
       element has been completely parsed. In these handlers you can access
       the element and process it as you see fit, using the navigation and the
       cut-n-paste methods, plus lots of convenient ones like "prefix". Once
       the element is completely processed you can then "flush" it, which
       will output it and free the memory. You can also "purge" it if you
       don't need to output it (if you are just extracting some data from the
       document for example). The handler will be called again once the next
       relevant element has been parsed.

         my $t= XML::Twig->new( twig_handlers =>
                                 { section => \&section,
                                   para   => sub { $_->set_tag( 'p'); }
                                 },
                              );
         $t->parsefile( 'doc.xml');

         # the handler is called once a section is completely parsed, ie when
         # the end tag for section is found, it receives the twig itself and
         # the element (including all its sub-elements) as arguments
         sub section
           { my( $t, $section)= @_;      # arguments for all twig_handlers
             $section->set_tag( 'div');  # change the tag name
             # let's use the attribute nb as a prefix to the title
             my $title= $section->first_child( 'title'); # find the title
             my $nb= $title->att( 'nb'); # get the attribute
             $title->prefix( "$nb - ");  # easy isn't it?
             $section->flush;            # outputs the section and frees memory
           }

       There is of course more to it: you can trigger handlers on more
       elaborate conditions than just the name of the element, "section/title"
       for example.

         my $t= XML::Twig->new( twig_handlers =>
                                  { 'section/title' => sub { $_->print } }
                              )
                         ->parsefile( 'doc.xml');

       Here "sub { $_->print }" simply prints the current element ($_ is
       aliased to the element in the handler).

       You can also trigger a handler on a test on an attribute:

         my $t= XML::Twig->new( twig_handlers =>
                             { 'section[@level="1"]' => sub { $_->print } }
                              )
                         ->parsefile( 'doc.xml');

       You can also use "start_tag_handlers" to process an element as soon as
       the start tag is found. Besides "prefix" you can also use "suffix".

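       When you only extract data, "purge" can replace "flush". The sketch
       below collects section titles and frees each section once it has been
       handled; the @titles array and the sample document are assumptions
       made for illustration.

```perl
use strict;
use warnings;
use XML::Twig;

my @titles;                       # collected data (hypothetical variable)
my $t= XML::Twig->new(
  twig_handlers =>
    { section => sub { my( $twig, $section)= @_;
                       push @titles, $section->first_child_text( 'title');
                       $twig->purge;   # free the memory, nothing is output
                       return 1;
                     },
    },
);
$t->parse( '<doc><section><title>A</title><para>p 1</para></section>'
         . '<section><title>B</title><para>p 2</para></section></doc>');
print join( ', ', @titles), "\n";
```
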
   Processing just parts of an XML document
       The twig_roots mode builds only the required sub-trees from the
       document. Anything outside of the twig roots will just be ignored:

         my $t= XML::Twig->new(
              # the twig will include just the root and selected titles
                  twig_roots   => { 'section/title' => \&print_n_purge,
                                    'annex/title'   => \&print_n_purge
                  }
                             );
         $t->parsefile( 'doc.xml');

         sub print_n_purge
           { my( $t, $elt)= @_;
             print $elt->text;    # print the text (including sub-element texts)
             $t->purge;           # frees the memory
           }

       You can use that mode when you want to process parts of a document but
       are not interested in the rest and you don't want to pay the price,
       either in time or memory, to build the tree for it.

   Building an XML filter
       You can combine the "twig_roots" and the "twig_print_outside_roots"
       options to build filters, which let you modify selected elements and
       will output the rest of the document as is.

       This would convert prices in $ to prices in Euro in a document:

         my $t= XML::Twig->new(
                  twig_roots   => { 'price' => \&convert, },   # process prices
                  twig_print_outside_roots => 1,               # print the rest
                             );
         $t->parsefile( 'doc.xml');

         sub convert
           { my( $t, $price)= @_;
             my $currency= $price->att( 'currency');           # get the currency
             if( $currency eq 'USD')
               { my $usd_price= $price->text;                  # get the price
                 # %rate is just a conversion table
                 my $euro_price= $usd_price * $rate{usd2euro};
                 $price->set_text( $euro_price);               # set the new price
                 $price->set_att( currency => 'EUR');          # don't forget this!
               }
             $price->print;                                    # output the price
           }

   XML::Twig and various versions of Perl, XML::Parser and expat:
       XML::Twig is a lot more sensitive to variations in versions of perl,
       XML::Parser and expat than to the OS, so this should cover some
       reasonable configurations.

       The "recommended configuration" is perl 5.8.3+ (for good Unicode
       support), XML::Parser 2.31+ and expat 1.95.5+

       See <http://testers.cpan.org/search?request=dist&dist=XML-Twig> for the
       CPAN testers reports on XML::Twig, which list all tested
       configurations.

       An Atom feed of the CPAN Testers results is available at
       <http://xmltwig.org/rss/twig_testers.rss>

       Finally:

       XML::Twig does NOT work with expat 1.95.4
       XML::Twig only works with XML::Parser 2.27 in perl 5.6.*
           Note that I can't compile XML::Parser 2.27 anymore, so I can't
           guarantee that it still works

       XML::Parser 2.28 does not really work

       When in doubt, upgrade expat, XML::Parser and Scalar::Util

       Finally, for some optional features, XML::Twig depends on some
       additional modules. The complete list, which depends somewhat on the
       version of Perl that you are running, is given by running
       "t/zz_dump_config.t"

Simplifying XML processing

       Whitespaces
           Whitespace that looks non-significant is discarded; this behaviour
           can be controlled using the "keep_spaces", "keep_spaces_in" and
           "discard_spaces_in" options.

       Encoding
           You can specify that you want the output in the same encoding as
           the input (provided you have valid XML, which means you have to
           specify the encoding either in the document or when you create the
           Twig object) using the "keep_encoding" option.

           You can also use "output_encoding" to convert the internal UTF-8
           format to the required encoding.

       Comments and Processing Instructions (PI)
           Comments and PIs can be hidden from the processing, but still
           appear in the output (they are carried by the "real" element
           closest to them).

       Pretty Printing
           XML::Twig can output the document pretty printed so it is easier to
           read for us humans.

       Surviving an untimely death
           XML parsers are supposed to react violently when fed improper XML.
           XML::Parser just dies.

           XML::Twig provides the "safe_parse" and the "safe_parsefile"
           methods which wrap the parse in an eval and return either the
           parsed twig or 0 in case of failure.

       Private attributes
           Attributes with a name starting with # (illegal in XML) will not be
           output, so you can safely use them to store temporary values during
           processing. Note that you can store anything in a private
           attribute, not just text: it's just a regular Perl variable, so a
           reference to an object or a huge data structure is perfectly fine.

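       Two of the features above, "safe_parse" and private attributes, can be
       sketched as follows. The sample documents, the "#cache" attribute name
       and the variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;
use XML::Twig;

# safe_parse returns a false value instead of dying on malformed XML
my $bad= XML::Twig->new->safe_parse( '<doc><a></doc>');
print "parse failed\n" unless $bad;

# a private attribute can hold any Perl value and is left out of the output
my $t= XML::Twig->new;
$t->parse( '<doc><a/></doc>');
my $a_elt= $t->root->first_child( 'a');
$a_elt->set_att( '#cache' => { seen => 1 });   # a hash reference, not text
my $seen=   $a_elt->att( '#cache')->{seen};
my $output= $t->root->sprint;
print "$output\n";
```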

CLASSES

       XML::Twig uses a very limited number of classes. The ones you are most
       likely to use are "XML::Twig" of course, which represents a complete
       XML document, including the document itself (the root of the document
       itself is "root"), its handlers, its input or output filters... The
       other main class is "XML::Twig::Elt", which models an XML element.
       Element here has a very wide definition: it can be a regular element,
       but also text, with an element "tag" of "#PCDATA" (or "#CDATA"), an
       entity (tag is "#ENT"), a Processing Instruction ("#PI") or a comment
       ("#COMMENT").

       Those are the 2 commonly used classes.

       You might want to look at the "elt_class" option if you want to
       subclass "XML::Twig::Elt".

       Attributes are just attached to their parent element, they are not
       objects per se. (Please use the provided methods "att" and "set_att" to
       access them: if you access them as a hash, then your code becomes
       implementation dependent and might break in the future).

       Other classes that are seldom used are "XML::Twig::Entity_list" and
       "XML::Twig::Entity".

       If you use "XML::Twig::XPath" instead of "XML::Twig", elements are then
       created as "XML::Twig::XPath::Elt".

METHODS

   XML::Twig
       A twig is a subclass of XML::Parser, so all XML::Parser methods can be
       called on a twig object, including parse and parsefile. "setHandlers"
       on the other hand cannot be used, see "BUGS".

       new This is a class method, the constructor for XML::Twig. Options are
           passed as keyword value pairs. Recognized options are the same as
           XML::Parser, plus some (in fact a lot!) XML::Twig specifics.

           New Options:

           twig_handlers
               This argument consists of a hash "{ expression => \&handler }"
               where expression is an XPath-like expression (+ some others).

               XPath expressions are limited to using the child and descendant
               axis (indeed you can't specify an axis), and predicates cannot
               be nested.  You can use the "string", or string(<tag>) function
               (except in "twig_roots" triggers).

               Additionally you can use regexps (/ delimited) to match
               attribute and string values.

               Examples:

                 foo
                 foo/bar
                 foo//bar
                 /foo/bar
                 /foo//bar
                 /foo/bar[@att1 = "val1" and @att2 = "val2"]/baz[@a >= 1]
                 foo[string()=~ /^duh!+/]
                 /foo[string(bar)=~ /\d+/]/baz[@att != 3]

               #CDATA can be used to call a handler for a CDATA section.
               #COMMENT can be used to call a handler for comments.

               Some additional (non-XPath) expressions are also provided for
               convenience:

               processing instructions
                   '?' or '#PI' triggers the handler for any processing
                   instruction, and '?<target>' or '#PI <target>' triggers a
                   handler for processing instructions with the given target
                   (ex: '#PI xml-stylesheet').

               level(<level>)
                   Triggers the handler on any element at that level in the
                   tree (root is level 1)

               _all_
                   Triggers the handler for all elements in the tree

               _default_
                   Triggers the handler for each element that does NOT have
                   any other handler.

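               A minimal sketch of two of the trigger forms listed above, an
               attribute condition and a regexp on the string value. The
               sample document, the "id" attribute and the collector arrays
               are assumptions made for illustration.

```perl
use strict;
use warnings;
use XML::Twig;

my( @level1, @intro);             # hypothetical collectors
my $t= XML::Twig->new(
  twig_handlers =>
    { # attribute condition
      'section[@level="1"]'        => sub { push @level1, $_->att( 'id'); 1; },
      # regexp on the string value of the element
      'title[string()=~ /^Intro/]' => sub { push @intro, $_->text; 1; },
    },
);
$t->parse( '<doc><section level="1" id="s1"><title>Introduction</title></section>'
         . '<section level="2" id="s2"><title>Details</title></section></doc>');
print "level 1 sections: @level1\n";
print "intro titles: @intro\n";
```
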
               Expressions are evaluated against the input document, which
               means that even if you have changed the tag of an element
               (changing the tag of a parent element from a handler for
               example) the change will not impact the expression evaluation.
               There is an exception to this: "private" attributes (whose
               names start with a '#', and which can only be created during
               the parsing, as they are not valid XML) are checked against
               the current twig.

               Handlers are triggered in fixed order, sorted by their type
               (xpath expressions first, then regexps, then level), then by
               whether they specify a full path (starting at the root element)
               or not, then by number of steps in the expression, then number
               of predicates, then number of tests in predicates. Handlers
               where the last step does not name an element ("foo/bar/*") are
               triggered after other XPath handlers. Finally "_all_" handlers
               are triggered last.

               Important: once a handler has been triggered, if it returns 0
               then no other handler is called, except an "_all_" handler
               which will be called anyway.

               If a handler returns a true value and other handlers apply,
               then the next applicable handler will be called. Repeat, rinse,
               lather... The exception to that rule is when the
               "do_not_chain_handlers" option is set, in which case only the
               first handler will be called.

               Note that it might be a good idea to explicitly return a short
               true value (like 1) from handlers: this ensures that other
               applicable handlers are called even if the last statement for
               the handler happens to evaluate to false. This might also
               speed up the code by avoiding the result of the last statement
               of the code being copied and passed to the code managing
               handlers.  It can really pay to have 1 instead of a long string
               returned.

               When the closing tag for an element is parsed the corresponding
               handler is called, with 2 arguments: the twig and the
               "Element". The twig includes the document tree that has been
               built so far, the element is the complete sub-tree for the
               element. The fact that the handler is called only when the
               closing tag for the element is found means that handlers for
               inner elements are called before handlers for outer elements.

               $_ is also set to the element, so it is easy to write inline
               handlers like

                 para => sub { $_->set_tag( 'p'); }

               Text is stored in elements whose tag name is #PCDATA (due to
               mixed content, text and sub-element in an element there is no
               way to store the text as just an attribute of the enclosing
               element, this is similar to the DOM model).

               Warning: if you have used purge or flush on the twig the
               element might not be complete, some of its children might have
               been entirely flushed or purged, and the start tag might even
               have been printed (by "flush") already, so changing its tag
               might not give the expected result.

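               The chaining rules above can be sketched by counting how many
               handlers fire for a single element, with and without
               "do_not_chain_handlers". The helper count_para_handlers and
               the sample document are hypothetical, introduced only for this
               illustration.

```perl
use strict;
use warnings;
use XML::Twig;

my $xml= '<doc><section><para>p 1</para></section></doc>';

# count how many handlers fire for the single para element
sub count_para_handlers
  { my( %options)= @_;
    my $calls= 0;
    my $t= XML::Twig->new(
      twig_handlers => { 'para'         => sub { $calls++; return 1; },
                         'section/para' => sub { $calls++; return 1; },
                       },
      %options,
    );
    $t->parse( $xml);
    return $calls;
  }

my $chained=     count_para_handlers();
my $not_chained= count_para_handlers( do_not_chain_handlers => 1);
print "chained: $chained, not chained: $not_chained\n";
```

               Both handlers return 1 explicitly, so chaining does not depend
               on the value of their last statement.
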
           twig_roots
               This argument lets you build the tree only for those elements
               you are interested in.

                 Example: my $t= XML::Twig->new( twig_roots => { title => 1, subtitle => 1});
                          $t->parsefile( $file);
                          # or, to select only the titles of sections:
                          my $t= XML::Twig->new( twig_roots => { 'section/title' => 1});
                          $t->parsefile( $file);

               The first call returns a twig containing a document including
               only "title" and "subtitle" elements, as children of the root
               element.

               You can use generic_attribute_condition, attribute_condition,
               full_path, partial_path, tag, tag_regexp, _default_ and _all_
               to trigger the building of the twig.  string_condition and
               regexp_condition cannot be used, as the content of the element,
               and thus the string, has not yet been parsed when the condition
               is checked.

               WARNING: paths are checked against the full document. Even if
               the "twig_roots" option is used they will be checked against
               the full document tree, not the virtual tree created by
               XML::Twig.

               WARNING: twig_roots elements should NOT be nested, that would
               hopelessly confuse XML::Twig ;--(

               Note: you can set handlers (twig_handlers) using twig_roots

                 Example: my $t= XML::Twig->new( twig_roots =>
                                                  { title    => sub { $_[1]->print; },
                                                    subtitle => \&process_subtitle
                                                  }
                                              );
                          $t->parsefile( $file);

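               To see what the resulting twig looks like, this sketch parses a
               small in-memory document (an assumption made for illustration)
               and lists the children of the root.

```perl
use strict;
use warnings;
use XML::Twig;

my $t= XML::Twig->new( twig_roots => { title => 1 });
$t->parse( '<doc><section><title>A</title><para>p 1</para></section>'
         . '<section><title>B</title><para>p 2</para></section></doc>');

# only the title elements were built, directly under the root
my @tags=  map { $_->tag }  $t->root->children;
my @texts= map { $_->text } $t->root->children;
print "tags: @tags\n";
print "texts: @texts\n";
```
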
           twig_print_outside_roots
               To be used in conjunction with the "twig_roots" argument. When
               set to a true value this will print the document outside of the
               "twig_roots" elements.

                Example: my $t= XML::Twig->new( twig_roots => { title => \&number_title },
                                               twig_print_outside_roots => 1,
                                              );
                         $t->parsefile( $file);
                         { my $nb;
                           sub number_title
                             { my( $twig, $title)= @_;
                               $nb++;
                               $title->prefix( "$nb ");
                               $title->print;
                             }
                         }

               This example prints the document outside of the title element,
               calls "number_title" for each "title" element, prints it, and
               then resumes printing the document. The twig is built only for
               the "title" elements.

               If the value is a reference to a file handle then the document
               outside the "twig_roots" elements will be output to this file
               handle:

                 open( my $out, '>', 'out_file.xml') or die "cannot open out_file.xml: $!";
                 my $t= XML::Twig->new( twig_roots => { title => \&number_title },
                                        # default output to $out
                                        twig_print_outside_roots => $out,
                                      );

                 { my $nb;
                   sub number_title
                     { my( $twig, $title)= @_;
                       $nb++;
                       $title->prefix( "$nb ");
                       $title->print( $out);    # you have to print to $out here
                     }
                 }

           start_tag_handlers
               A hash "{ expression => \&handler }". Sets element handlers
               that are called when the element is open (at the end of the
               XML::Parser "Start" handler). The handlers are called with 2
               params: the twig and the element. The element is empty at that
               point, its attributes are created though.

               You can use generic_attribute_condition, attribute_condition,
               full_path, partial_path, tag, tag_regexp, _default_  and _all_
               to trigger the handler.

               string_condition and regexp_condition cannot be used, as the
               content of the element, and thus the string, has not yet been
               parsed when the condition is checked.

               The main uses for those handlers are to change the tag name
               (you might have to do it as soon as you find the open tag if
               you plan to "flush" the twig at some point in the element), and
               to create temporary attributes that will be used when
               processing sub-elements with "twig_handlers".

               Note: "start_tag" handlers can be called outside of
               "twig_roots" if this argument is used. Since the element object
               is not built, in this case handlers are called with the
               following arguments: $t (the twig), $tag (the tag of the
               element) and %att (a hash of the attributes of the element).

               If the "twig_print_outside_roots" argument is also used, if the
               last handler called returns a "true" value, then the start tag
               will be output as it appeared in the original document, if the
               handler returns a "false" value then the start tag will not be
               printed (so you can print a modified string yourself for
               example).

               Note that you can use the ignore method in "start_tag_handlers"
               (and only there).

           end_tag_handlers
               A hash "{ expression => \&handler }". Sets element handlers
               that are called when the element is closed (at the end of the
               XML::Parser "End" handler). The handlers are called with 2
               params: the twig and the tag of the element.

               twig_handlers are called when an element is completely parsed,
               so why have this redundant option? There is only one use for
               "end_tag_handlers": when using the "twig_roots" option, to
               trigger a handler for an element outside the roots.  It is for
               example very useful to number titles in a document using nested
               sections:

                 my @no= (0);
                 my $no;
                 my $t= XML::Twig->new(
                         start_tag_handlers =>
                          { section => sub { $no[$#no]++; $no= join '.', @no; push @no, 0; } },
                         twig_roots         =>
                          { title   => sub { $_->prefix( $no); $_->print; } },
                         end_tag_handlers   => { section => sub { pop @no;  } },
                         twig_print_outside_roots => 1
                                     );
                 $t->parsefile( $file);

               Using the "end_tag_handlers" argument without "twig_roots" will
               result in an error.

           do_not_chain_handlers
               If this option is set to a true value, then only one handler
               will be called for each element, even if several satisfy the
               condition.

               Note that the "_all_" handler will still be called regardless.

           ignore_elts
               This option lets you ignore elements when building the twig.
               This is useful in cases where you cannot use "twig_roots" to
               ignore elements, for example if the element to ignore is a
               sibling of elements you are interested in.

               Example:

                 my $twig= XML::Twig->new( ignore_elts => { elt => 'discard' });
                 $twig->parsefile( 'doc.xml');

               This will build the complete twig for the document, except that
               all "elt" elements (and their children) will be left out.

               The keys in the hash are triggers, limited to the same subset
               as "start_tag_handlers". The values can be "discard", to
               discard the element, "print", to output the element as-is,
               "string" to store the text of the ignored element(s), including
               markup, in a field of the twig: "$t->{twig_buffered_string}" or
               a reference to a scalar, in which case the text of the ignored
               element(s), including markup, will be stored in the scalar. Any
               other value will be treated as "discard".

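               A small sketch of the sibling case described above: "note"
               elements sit between the "item" elements of interest and are
               discarded. The element names and the sample document are
               assumptions made for illustration.

```perl
use strict;
use warnings;
use XML::Twig;

my $t= XML::Twig->new( ignore_elts => { note => 'discard' });
$t->parse( '<doc><item>1</item><note>skip <b>me</b></note><item>2</item></doc>');

# the note element and its children never made it into the twig
my $output= $t->root->sprint;
print "$output\n";
```
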
           char_handler
               A reference to a subroutine that will be called every time
               "PCDATA" is found.

               The subroutine receives the string as argument, and returns the
               modified string:

                 # WE WANT ALL STRINGS IN UPPER CASE
                 sub my_char_handler
                   { my( $text)= @_;
                     $text= uc( $text);
                     return $text;
                   }

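               The handler still has to be registered with the twig. A minimal
               sketch, assuming the "char_handler" option is passed to "new"
               like the other options in this list (the sample document is
               also an assumption):

```perl
use strict;
use warnings;
use XML::Twig;

# upper-case every piece of text found in the document
sub my_char_handler
  { my( $text)= @_;
    return uc( $text);
  }

my $t= XML::Twig->new( char_handler => \&my_char_handler);
$t->parse( '<doc><p>hello</p><p>world</p></doc>');
my $text= join ' ', map { $_->text } $t->root->children( 'p');
print "$text\n";
```
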
636           elt_class
637               The name of a class used to store elements. this class should
638               inherit from "XML::Twig::Elt" (and by default it is
639               "XML::Twig::Elt"). This option is used to subclass the element
640               class and extend it with new methods.
641
642               This option is needed because during the parsing of the XML,
643               elements are created by "XML::Twig", without any control from
644               the user code.
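
               As a sketch of how this might be used, the following defines a
               hypothetical subclass "My::Elt" with one extra method
               ("shout", also made up for this example) and asks the twig to
               build its elements from it:

```perl
use strict;
use warnings;
use XML::Twig;

# hypothetical subclass adding one method to every element
package My::Elt;
our @ISA = ('XML::Twig::Elt');
sub shout { my ($elt) = @_; return uc $elt->text; }

package main;

my $twig = XML::Twig->new( elt_class => 'My::Elt' );
$twig->parse('<doc><p>hi there</p></doc>');

# elements created during the parse are blessed into My::Elt
my $p = $twig->root->first_child('p');
print ref($p), ': ', $p->shout, "\n";
```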
645
646           keep_atts_order
647               Setting this option to a true value causes the attribute hash
648               to be tied to a "Tie::IxHash" object.  This means that
649               "Tie::IxHash" needs to be installed for this option to be
650               available. It also means that the hash keeps its order, so you
651               will get the attributes in order. This allows outputting the
652               attributes in the same order as they were in the original
653               document.
654
655           keep_encoding
656               This is a (slightly?) evil option: if the XML document is not
657               UTF-8 encoded and you want to keep it that way, then setting
658               keep_encoding will use the "Expat" original_string method for
659               characters, thus keeping the original encoding, as well as the
660               original entities in the strings.
661
662               See the "t/test6.t" test file to see what results you can
663               expect from the various encoding options.
664
665               WARNING: if the original encoding is multi-byte then attribute
666               parsing will be EXTREMELY unsafe under any Perl before 5.6, as
667               it uses regular expressions which do not deal properly with
668               multi-byte characters. You can specify an alternate function to
669               parse the start tags with the "parse_start_tag" option (see
670               below).
671
672               WARNING: this option is NOT used when parsing with the
673               XML::Parser non-blocking parser ("parse_start", "parse_more",
674               "parse_done" methods), which you probably should not use with
675               XML::Twig anyway as they are totally untested!
676
677           output_encoding
678               This option generates an output_filter using "Encode",
679               "Text::Iconv" or "Unicode::Map8" and "Unicode::String", and
680               sets the encoding in the XML declaration. This is the easiest
681               way to deal with encodings. If you need more sophisticated
682               features, look at "output_filter" below.
683
684           output_filter
685               This option is used to convert the character encoding of the
686               output document.  It is passed either a string corresponding to
687               a predefined filter or a subroutine reference. The filter will
688               be called every time a document or element is processed by the
689               "print" functions ("print", "sprint", "flush").
690
691               Pre-defined filters:
692
693               latin1
694                   uses either "Encode", "Text::Iconv" or "Unicode::Map8" and
695                   "Unicode::String" or a regexp (which works only with
696                   XML::Parser 2.27), in this order, to convert all characters
697                   to ISO-8859-15 (usually latin1 is a synonym for ISO-8859-1,
698                   but in practice it seems that ISO-8859-15, which includes
699                   the euro sign, is more useful and probably what most people
700                   want).
701
702               html
703                   does the same conversion as "latin1", plus encodes entities
704                   using "HTML::Entities" (oddly enough you will need to have
705                   HTML::Entities installed for it to be available). This
706                   should only be used if the tags and attribute names
707                   themselves are in US-ASCII, or they will be converted and
708                   the output will not be valid XML any more.
709
710               safe
711                   converts the output to US-ASCII only, plus character
712                   entities ("&#nnn;"). This should be used only if the tags
713                   and attribute names themselves are in US-ASCII, or they
714                   will be converted and the output will not be valid XML any
715                   more.
716
717               safe_hex
718                   same as "safe" except that the character entities are in
719                   hex ("&#xnnn;")
720
721               encode_convert ($encoding)
722                   Returns a subref that can be used to convert utf8 strings
723                   to $encoding. Uses "Encode".
724
725                      my $conv = XML::Twig::encode_convert( 'latin1');
726                      my $t = XML::Twig->new(output_filter => $conv);
727
728               iconv_convert ($encoding)
729                   This function is used to create a filter subroutine that
730                   will be used to convert the characters to the target
731                   encoding using "Text::Iconv" (which needs to be installed).
732                   Look at the documentation for the module and for the
733                   "iconv" library to find out which encodings are available
734                   on your system; "iconv -l" should give you a list of
735                   available encodings.
736
737                      my $conv = XML::Twig::iconv_convert( 'latin1');
738                      my $t = XML::Twig->new(output_filter => $conv);
739
740               unicode_convert ($encoding)
741                   This function is used to create a filter subroutine that
742                   will be used to convert the characters to the target
743                   encoding using "Unicode::String" and "Unicode::Map8"
744                   (which need to be installed). Look at the documentation for
745                   the modules to find out which encodings are available on
746                   your system.
747
748                      my $conv = XML::Twig::unicode_convert( 'latin1');
749                      my $t = XML::Twig->new(output_filter => $conv);
750
751               The "text" and "att" methods do not use the filter, so their
752               results are always in Unicode.
753
754               These predefined filters are based on subroutines that can be
755               used by themselves (as "XML::Twig::foo").
756
757               html_encode ($string)
758                   Use "HTML::Entities" to encode a utf8 string
759
760               safe_encode ($string)
761                   Use either a regexp (perl < 5.8) or "Encode" to encode non-
762                   ascii characters in the string in "&#<nnnn>;" format
763
764               safe_encode_hex ($string)
765                   Use either a regexp (perl < 5.8) or "Encode" to encode non-
766                   ascii characters in the string in "&#x<nnnn>;" format
767
768               regexp2latin1 ($string)
769                   Use a regexp to encode a utf8 string into latin 1
770                   (ISO-8859-1). Does not work with Perl 5.8.0!
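
               A short sketch of the predefined "safe" filter, assuming a
               Perl recent enough to ship "Encode" (5.8 or later). The sample
               document is made up for illustration. It also shows that
               "text" is not affected by the filter:

```perl
use strict;
use warnings;
use XML::Twig;

# 'safe' turns non-ASCII characters into numeric character references
my $twig = XML::Twig->new( output_filter => 'safe' );
$twig->parse('<doc>caf&#233;</doc>');

my $filtered = $twig->sprint;        # goes through the output filter
my $raw      = $twig->root->text;    # "text" bypasses the filter

print "$filtered\n";
printf "text is %d characters long\n", length $raw;
```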
771
772           output_text_filter
773               Same as output_filter, except it doesn't apply to the brackets
774               and quotes around attribute values. This is useful for all
775               filters that could change the tagging, basically anything that
776               does not just change the encoding of the output. "html", "safe"
777               and "safe_hex" are better used with this option.
778
779           input_filter
780               This option is similar to "output_filter" except the filter is
781               applied to the characters before they are stored in the twig,
782               at parsing time.
783
784           remove_cdata
785               Setting this option to a true value will force the twig to
786               output CDATA sections as regular (escaped) PCDATA.
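
               A small sketch of the effect, using a made-up one-line
               document whose CDATA section contains a character that needs
               escaping:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new( remove_cdata => 1 );
$twig->parse('<doc><![CDATA[a < b]]></doc>');

# the CDATA markers are gone and the content is escaped like regular text
my $out = $twig->sprint;
print "$out\n";
```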
787
788           parse_start_tag
789               If you use the "keep_encoding" option then this option can be
790               used to replace the default parsing function. You should
791               provide a coderef (a reference to a subroutine) as the
792               argument. This subroutine takes the original tag (given by the
793               XML::Parser::Expat original_string() method) and returns a tag
794               and the attributes in a hash (or in a list of attribute
795               name/attribute value pairs).
796
797           no_xxe
798               Prevents external entities from being parsed.
799
800               This is a security feature, in case the input XML cannot be
801               trusted. With this option set to a true value defining external
802               entities in the document will cause the parse to fail.
803
804               This prevents an entity like "<!ENTITY xxe PUBLIC "bar"
805               "/etc/passwd">" from making the password file available in the
806               document.
807
808           expand_external_ents
809               When this option is used external entities (that are defined)
810               are expanded when the document is output using "print"
811               functions such as "print", "sprint", "flush" and "xml_string".
812               Note that in the twig the entity will be stored as an
813               element with a tag '"#ENT"'. The entity will not be expanded
814               there, so you might want to process the entities before
815               outputting the document.
816
817               If an external entity is not available, then the parse will
818               fail.
819
820               A special case is when the value of this option is -1. In that
821               case a missing entity will not cause the parser to die, but its
822               "name", "sysid" and "pubid" will be stored in the twig as
823               "$twig->{twig_missing_system_entities}" (a reference to an
824               array of hashes { name => <name>, sysid => <sysid>, pubid =>
825               <pubid> }). Yes, this is a bit of a hack, but it's useful in
826               some cases.
827
828           load_DTD
829               If this argument is set to a true value, "parse" or "parsefile"
830               on the twig will load the DTD information. This information
831               can then be accessed through the twig, in a "DTD_handler" for
832               example. This will load even an external DTD.
833
834               Default and fixed values for attributes will also be filled,
835               based on the DTD.
836
837               Note that to do this the module will generate a temporary file
838               in the current directory. If this is a problem let me know and
839               I will add an option to specify an alternate directory.
840
841               See "DTD Handling" for more information.
842
843           DTD_base <path_to_DTD_directory>
844               If the DTD is in a different directory, XML::Twig looks for it
845               there. This is useful to make up somewhat for the lack of
846               catalog support in "expat". You still need a SYSTEM declaration.
847
848           DTD_handler
849               Set a handler that will be called once the doctype (and the
850               DTD) have been loaded, with 2 arguments, the twig and the DTD.
851
852           no_prolog
853               Does not output a prolog (XML declaration and DTD)
854
855           id  This optional argument gives the name of an attribute that can
856               be used as an ID in the document. Elements whose ID is known
857               can be accessed through the elt_id method. id defaults to 'id'.
858               See "BUGS".
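
               A minimal sketch, assuming a made-up document that uses a
               "ref" attribute (a hypothetical name chosen for this example)
               as its identifier instead of the default "id":

```perl
use strict;
use warnings;
use XML::Twig;

# treat the "ref" attribute as the ID attribute for this document
my $twig = XML::Twig->new( id => 'ref' );
$twig->parse('<doc><p ref="intro">first</p><p ref="body">second</p></doc>');

# elt_id looks an element up by the value of its ID attribute
my $elt = $twig->elt_id('body');
print $elt->text, "\n";
```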
859
860           discard_spaces
861               If this optional argument is set to a true value then spaces
862               are discarded when they look non-significant: strings
863               containing only spaces and at least one line feed are
864               discarded. This argument is set to true by default.
865
866               The exact algorithm to drop spaces is: strings including only
867               spaces (perl \s) and at least one \n right before an open or
868               close tag are dropped.
869
870           discard_all_spaces
871               If this argument is set to a true value, spaces are discarded
872               more aggressively than with "discard_spaces": strings not
873               including a \n are also dropped. This option is appropriate for
874               data-oriented XML.
875
876           keep_spaces
877               If this optional argument is set to a true value then all
878               spaces in the document are kept, and stored as "PCDATA".
879
880               Warning: adding this option can result in changes in the twig
881               generated: space that was previously discarded might end up in
882               a new text element. See the difference by calling the following
883               code with 0 and 1 as arguments:
884
885                 perl -MXML::Twig -e'print XML::Twig->new( keep_spaces => shift)->parse( "<d> \n<e/></d>")->_dump'
886
887               "keep_spaces" and "discard_spaces" cannot both be set.
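
               The same difference can be observed without reading a dump, by
               counting the children of the root element. This is only a
               sketch, reusing the small document from the one-liner above:

```perl
use strict;
use warnings;
use XML::Twig;

my $xml = "<d> \n<e/></d>";

my $default = XML::Twig->new->parse($xml);
my $kept    = XML::Twig->new( keep_spaces => 1 )->parse($xml);

# count the children of the root in each twig
my $n_default = () = $default->root->children;
my $n_kept    = () = $kept->root->children;

print "default: $n_default child(ren), keep_spaces: $n_kept child(ren)\n";
```

               With "keep_spaces" the whitespace before the "e" element is
               stored as an extra text child of the root.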
888
889           discard_spaces_in
890               This argument sets "keep_spaces" to true but will cause the
891               twig builder to discard spaces in the elements listed.
892
893               The syntax for using this argument is:
894
895                 XML::Twig->new( discard_spaces_in => [ 'elt1', 'elt2']);
896
897           keep_spaces_in
898               This argument sets "discard_spaces" to true but will cause the
899               twig builder to keep spaces in the elements listed.
900
901               The syntax for using this argument is:
902
903                 XML::Twig->new( keep_spaces_in => [ 'elt1', 'elt2']);
904
905               Warning: adding this option can result in changes in the twig
906               generated: space that was previously discarded might end up in
907               a new text element.
908
909           pretty_print
910               Set the pretty print method, amongst '"none"' (default),
911               '"nsgmls"', '"nice"', '"indented"', '"indented_c"',
912               '"indented_a"', '"indented_close_tag"', '"cvs"', '"wrapped"',
913               '"record"' and '"record_c"'.
914
915               pretty_print formats:
916
917               none
918                   The document is output as one long string, with no line
919                   breaks except those found within text elements.
920
921               nsgmls
922                   Line breaks are inserted in safe places: that is within
923                   tags, between a tag and an attribute, between attributes
924                   and before the > at the end of a tag.
925
926                   This is quite ugly but better than "none", and it is very
927                   safe, the document will still be valid (conforming to its
928                   DTD).
929
930                   This is how the SGML parser "sgmls" splits documents, hence
931                   the name.
932
933               nice
934                   This option inserts line breaks before any tag that does
935                   not contain text (so elements with textual content are not
936                   broken, as the \n is significant there).
937
938                   WARNING: this option leaves the document well-formed but
939                   might make it invalid (not conformant to its DTD). If you
940                   have elements declared as
941
942                     <!ELEMENT foo (#PCDATA|bar)>
943
944                   then a "foo" element including a "bar" one will be printed
945                   as
946
947                     <foo>
948                     <bar>bar is just pcdata</bar>
949                     </foo>
950
951                   This is invalid, as the parser will take the line break
952                   after the "foo" tag as a sign that the element contains
953                   PCDATA, it will then die when it finds the "bar" tag. This
954                   may or may not be important for you, but be aware of it!
955
956               indented
957                   Same as "nice" (and with the same warning) but indents
958                   elements according to their level
959
960               indented_c
961                   Same as "indented" but a little more compact: the closing
962                   tags are on the same line as the preceding text
963
964               indented_close_tag
965                   Same as "indented" except that the closing tag is also
966                   indented, to line up with the tags within the element
967
968               indented_a
969                   This formats XML files in a line-oriented version control
970                   friendly way.  The format is described in
971                   <http://tinyurl.com/2kwscq> (that's an Oracle document with
972                   an insanely long URL).
973
974                   Note that to be totally conformant to the "spec", the order
975                   of attributes should not be changed, so if they are not
976                   already in alphabetical order you will need to use the
977                   "keep_atts_order" option.
978
979               cvs Same as "indented_a".
980
981               wrapped
982                   Same as "indented_c" but lines are wrapped using
983                   Text::Wrap::wrap. The default length for lines is the
984                   default for $Text::Wrap::columns, and can be changed by
985                   changing that variable.
986
987               record
988                   This is a record-oriented pretty print, that displays data
989                   in records, one field per line (which looks a LOT like
990                   "indented")
991
992               record_c
993                   Stands for record compact, one record per line
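
               A quick way to compare the styles is to serialize the same
               small document with each of them. The sketch below does it for
               "indented" only, on a made-up document; swapping in another
               style name should be enough to try the others:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new( pretty_print => 'indented' );
$twig->parse('<doc><section><p>text</p></section></doc>');

# elements without text get their own line, indented by depth
my $out = $twig->sprint;
print $out;
```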
994
995           empty_tags
996               Set the empty tag display style ('"normal"', '"html"' or
997               '"expand"').
998
999               "normal" outputs an empty tag '"<tag/>"', "html" adds a space
1000               '"<tag />"' for elements that can be empty in XHTML and
1001               "expand" outputs '"<tag></tag>"'
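
               A sketch comparing the three styles on a made-up document
               containing an empty "br" element (which is one of the elements
               that can be empty in XHTML):

```perl
use strict;
use warnings;
use XML::Twig;

my $xml = '<doc><br/></doc>';
my %out;

# serialize the same document once per style
for my $style (qw(normal html expand)) {
    my $twig = XML::Twig->new( empty_tags => $style )->parse($xml);
    $out{$style} = $twig->sprint;
    print "$style: $out{$style}\n";
}
```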
1002
1003           quote
1004               Set the quote character for attributes ('"single"' or
1005               '"double"').
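
               For instance, a minimal sketch on a made-up one-element
               document:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new( quote => 'single' );
$twig->parse('<doc a="1"/>');

# attribute values are now output between single quotes
my $out = $twig->sprint;
print "$out\n";
```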
1006
1007           escape_gt
1008               By default XML::Twig does not escape the character > in its
1009               output, as it is not mandated by the XML spec. With this option
1010               on, > will be replaced by "&gt;"
1011
1012           comments
1013               Set the way comments are processed: '"drop"' (default),
1014               '"keep"' or '"process"'
1015
1016               Comments processing options:
1017
1018               drop
1019                   drops the comments, they are not read, nor printed to the
1020                   output
1021
1022               keep
1023                   comments are loaded and will appear on the output, they are
1024                   not accessible within the twig and will not interfere with
1025                   processing though
1026
1027                   Note: comments in the middle of a text element such as
1028
1029                     <p>text <!-- comment --> more text</p>
1030
1031                   are kept at their original position in the text. Using
1032                   "print" methods like "print" or "sprint" will return the
1033                   comments in the text. Using "text" or "field" on the other
1034                   hand will not.
1035
1036                   Any use of "set_pcdata" on the "#PCDATA" element (directly
1037                   or through other methods like "set_content") will delete
1038                   the comment(s).
1039
1040               process
1041                   comments are loaded in the twig and will be treated as
1042                   regular elements (their "tag" is "#COMMENT"). This can
1043                   interfere with processing if you expect
1044                   "$elt->{first_child}" to be an element but find a comment
1045                   there.  Validation will not protect you from this as
1046                   comments can happen anywhere.  You can use
1047                   "$elt->first_child( 'tag')" (which is a good habit anyway)
1048                   to get where you want.
1049
1050                   Consider using "process" if you are outputting SAX events
1051                   from XML::Twig.
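
               The three modes can be compared on a small made-up document.
               In this sketch the option is always passed explicitly, so the
               result does not depend on the default:

```perl
use strict;
use warnings;
use XML::Twig;

my $xml = '<doc><!-- note --><p>text</p></doc>';

# drop: the comment never reaches the output
my $dropped = XML::Twig->new( comments => 'drop' )->parse($xml)->sprint;

# keep: the comment is output again, but is not an element in the twig
my $kept = XML::Twig->new( comments => 'keep' )->parse($xml)->sprint;

# process: the comment becomes a regular element of the twig
my $twig  = XML::Twig->new( comments => 'process' )->parse($xml);
my $first = $twig->root->first_child;

print "drop:    $dropped\n";
print "keep:    $kept\n";
print "process: first child is ", $first->tag, "\n";
```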
1052
1053           pi  Set the way processing instructions are processed: '"drop"',
1054               '"keep"' (default) or '"process"'
1055
1056               Note that you can also set PI handlers in the "twig_handlers"
1057               option:
1058
1059                 '?'       => \&handler
1060                 '?target' => \&handler2
1061
1062               The handlers will be called with 2 parameters, the twig and the
1063               PI element if "pi" is set to "process", and with 3, the twig,
1064               the target and the data if "pi" is set to "keep". Of course
1065               they will not be called if "pi" is set to "drop".
1066
1067               If "pi" is set to "keep" the handler should return a string
1068               that will be used as-is as the PI text (it should look like
1069               "<?target data?>", or '' if you want to remove the PI).
1070
1071               Only one handler will be called, "?target" or "?" if no
1072               specific handler for that target is available.
1073
1074           map_xmlns
1075               This option is passed a hashref that maps URIs to prefixes.
1076               The prefixes in the document will be replaced by the ones in
1077               the map. The mapped prefixes can (actually have to) be used to
1078               trigger handlers, navigate or query the document.
1079
1080               Here is an example:
1081
1082                 my $t= XML::Twig->new( map_xmlns => {'http://www.w3.org/2000/svg' => "svg"},
1083                                        twig_handlers =>
1084                                          { 'svg:circle' => sub { $_->set_att( r => 20) } },
1085                                        pretty_print => 'indented',
1086                                      )
1087                                 ->parse( '<doc xmlns:gr="http://www.w3.org/2000/svg">
1088                                             <gr:circle cx="10" cy="90" r="10"/>
1089                                          </doc>'
1090                                        )
1091                                 ->print;
1092
1093               This will output:
1094
1095                 <doc xmlns:svg="http://www.w3.org/2000/svg">
1096                    <svg:circle cx="10" cy="90" r="20"/>
1097                 </doc>
1098
1099           keep_original_prefix
1100               When used with "map_xmlns" this option will make "XML::Twig"
1101               use the original namespace prefixes when outputting a document.
1102               The mapped prefix will still be used for triggering handlers
1103               and in navigation and query methods.
1104
1105                 my $t= XML::Twig->new( map_xmlns => {'http://www.w3.org/2000/svg' => "svg"},
1106                                        twig_handlers =>
1107                                          { 'svg:circle' => sub { $_->set_att( r => 20) } },
1108                                        keep_original_prefix => 1,
1109                                        pretty_print => 'indented',
1110                                      )
1111                                 ->parse( '<doc xmlns:gr="http://www.w3.org/2000/svg">
1112                                             <gr:circle cx="10" cy="90" r="10"/>
1113                                          </doc>'
1114                                        )
1115                                 ->print;
1116
1117               This will output:
1118
1119                 <doc xmlns:gr="http://www.w3.org/2000/svg">
1120                    <gr:circle cx="10" cy="90" r="20"/>
1121                 </doc>
1122
1123           original_uri ($prefix)
1124               called within a handler, this will return the uri bound to the
1125               namespace prefix in the original document.
1126
1127           index ($arrayref or $hashref)
1128               This option creates lists of specific elements during the
1129               parsing of the XML.  It takes a reference to either a list of
1130               triggering expressions or to a hash name => expression, and for
1131               each one generates the list of elements that match the
1132               expression. The list can be accessed through the "index"
1133               method.
1134
1135               example:
1136
1137                 # using an array ref
1138                 my $t= XML::Twig->new( index => [ 'div', 'table' ])
1139                                 ->parsefile( "foo.xml");
1140                 my $divs= $t->index( 'div');
1141                 my $first_div= $divs->[0];
1142                 my $last_table= $t->index( table => -1);
1143
1144                 # using a hashref to name the indexes
1145                 my $t= XML::Twig->new( index => { email => 'a[@href=~/^\s*mailto:/]'})
1146                                 ->parsefile( "foo.xml");
1147                 my $last_emails= $t->index( email => -1);
1148
1149               Note that the index is not maintained after the parsing. If
1150               elements are deleted, renamed or otherwise hurt during
1151               processing, the index is NOT updated.  (changing the id element
1152               OTOH will update the index)
1153
1154           att_accessors <list of attribute names>
1155               Creates methods that give direct access to attributes:
1156
1157                 my $t= XML::Twig->new( att_accessors => [ 'href', 'src'])
1158                                 ->parsefile( $file);
1159                 my $first_src= $t->first_elt( 'img')->src; # same as ->att( 'src')
1160                 $t->first_elt( 'img')->src( 'new_logo.png'); # changes the attribute value
1161
1162           elt_accessors
1163               creates methods that give direct access to the first child
1164               element (in scalar context) or the list of elements (in list
1165               context):
1166
1167               The list of accessors to create can be given in 2 different
1168               ways: in an array, or in a hash alias => expression:
1169
1170                 my $t=  XML::Twig->new( elt_accessors => [ 'head'])
1171                                 ->parsefile( $file);
1172                 my $title_text= $t->root->head->field( 'title');
1173                 # same as $title_text= $t->root->first_child( 'head')->field( 'title');
1174
1175                 my $t=  XML::Twig->new( elt_accessors => { warnings => 'p[@class="warning"]', d2 => 'div[2]'}, )
1176                                 ->parsefile( $file);
1177                 my $body= $t->first_elt( 'body');
1178                 my @warnings= $body->warnings; # same as $body->children( 'p[@class="warning"]');
1179                 my $s2= $body->d2;             # same as $body->first_child( 'div[2]')
1180
1181           field_accessors
1182               creates methods that give direct access to the first child
1183               element text:
1184
1185                 my $t=  XML::Twig->new( field_accessors => [ 'title'])
1186                                 ->parsefile( $file);
1187                 my $div_title_text= $t->first_elt( 'div')->title;
1188                 # same as $div_title_text= $t->first_elt( 'div')->field( 'title');
1189
1190           use_tidy
1191               Set this option to use HTML::Tidy instead of HTML::TreeBuilder
1192               to convert HTML to XML. HTML, especially real (real "crap")
1193               HTML found in the wild, is messy, so depending on the data, one
1194               module or the other does a better job at the conversion. Also,
1195               HTML::Tidy can be a bit difficult to install, so XML::Twig
1196               offers both options. TIMTOWTDI
1197
1198           output_html_doctype
1199               when using HTML::TreeBuilder to convert HTML, this option
1200               causes the DOCTYPE declaration to be output, which may be
1201               important for some legacy browsers.  Without that option the
1202               DOCTYPE definition is NOT output. Also if the definition is
1203               completely wrong (ie not easily parsable), it is not output
1204               either.
1205
1206           Note: I _HATE_ the Java-like name of arguments used by most XML
1207           modules.  So in pure TIMTOWTDI fashion all arguments can be written
1208           either as "UglyJavaLikeName" or as "readable_perl_name":
1209           "twig_print_outside_roots" or "TwigPrintOutsideRoots" (or even
1210           "twigPrintOutsideRoots" {shudder}).  XML::Twig normalizes them
1211           before processing them.
1212
1213       parse ( $source)
1214           The $source parameter should either be a string containing the
1215           whole XML document, or it should be an open "IO::Handle" (aka a
1216           filehandle).
1217
1218           A die call is thrown if a parse error occurs. Otherwise it will
1219           return the twig built by the parse. Use "safe_parse" if you want
1220           the parsing to return even when an error occurs.
1221
1222           If this method is called as a class method ("XML::Twig->parse(
1223           $some_xml_or_html)") then an XML::Twig object is created, using the
1224           parameters except the last one (eg "XML::Twig->parse( pretty_print
1225           => 'indented', $some_xml_or_html)") and "xparse" is called on it.
1226
1227           Note that when parsing a filehandle, the handle should NOT be open
1228           with an encoding (ie open with "open( my $in, '<', $filename)"). The
1229           file will be parsed by "expat", so specifying the encoding actually
1230           causes problems for the parser (as in: it can crash it, see
1231           https://rt.cpan.org/Ticket/Display.html?id=78877). For parsing a
1232           file it is actually recommended to use "parsefile" on the file
1233           name, instead of "parse" on the open file.
1234
1235       parsestring
1236           This is just an alias for "parse" for backwards compatibility.
1237
1238       parsefile (FILE [, OPT => OPT_VALUE [...]])
1239           Open "FILE" for reading, then call "parse" with the open handle.
1240           The file is closed no matter how "parse" returns.
1241
1242           A "die" call is thrown if a parse error occurs. Otherwise it will
1243           return the twig built by the parse. Use "safe_parsefile" if you
1244           want the parsing to return even when an error occurs.
1245
1246       parsefile_inplace ( $file, $optional_extension)
1247           Parse and update a file "in place". It does this by creating a temp
1248           file, selecting it as the default for print() statements (and
1249           methods), then parsing the input file. If the parsing is
1250           successful, then the temp file is moved to replace the input file.
1251
1252           If an extension is given then the original file is backed-up (the
1253           rules for the extension are the same as the rule for the -i option
1254           in perl).
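
           A sketch of an in-place update, under a few assumptions: the file
           is a scratch file created in a temporary directory just for this
           example, and the handler calls "flush" so that the processed
           content is printed (and therefore ends up in the rewritten file):

```perl
use strict;
use warnings;
use XML::Twig;
use File::Temp qw(tempdir);

# scratch file in a temporary directory (made up for this example)
my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/doc.xml";
open my $fh, '>', $file or die "cannot write $file: $!";
print {$fh} '<doc><para>one</para><para>two</para></doc>';
close $fh;

# whatever the handlers print or flush ends up in the rewritten file
my $twig = XML::Twig->new(
    twig_handlers => {
        para => sub { $_->set_tag('p'); $_[0]->flush; },
    },
);
$twig->parsefile_inplace( $file, '.bak' );    # original kept as doc.xml.bak

# read the updated file back
open my $in, '<', $file or die "cannot read $file: $!";
my $updated = do { local $/; <$in> };
close $in;
print "$updated\n";
```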
1255
1256       parsefile_html_inplace ( $file, $optional_extension)
1257           Same as parsefile_inplace, except that it parses HTML instead of
1258           XML
1259
1260       parseurl ($url $optional_user_agent)
1261           Gets the data from $url and parses it. The data is piped to the
1262           parser in chunks the size of the XML::Parser::Expat buffer, so
1263           memory consumption and hopefully speed are optimal.
1264
1265           For most (read "small") XML it is probably as efficient (and easier
1266           to debug) to just "get" the XML file and then parse it as a string.
1267
1268             use XML::Twig;
1269             use LWP::Simple;
1270             my $twig= XML::Twig->new();
1271             $twig->parse( LWP::Simple::get( $URL ));
1272
1273           or
1274
1275             use XML::Twig;
1276             my $twig= XML::Twig->nparse( $URL);
1277
           If $optional_user_agent is given, that user agent is used for the
           request; otherwise a new one is created.
1280
1281       safe_parse ( SOURCE [, OPT => OPT_VALUE [...]])
1282           This method is similar to "parse" except that it wraps the parsing
1283           in an "eval" block. It returns the twig on success and 0 on failure
1284           (the twig object also contains the parsed twig). $@ contains the
1285           error message on failure.
1286
1287           Note that the parsing still stops as soon as an error is detected,
1288           there is no way to keep going after an error.
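
           A minimal sketch of the error handling described above. The
           malformed input string is made up for the example. The error is
           copied out of $@ right after the call, before anything else can
           overwrite it.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig= XML::Twig->new();

# the closing tag does not match, so the document is not well-formed
my $ok= $twig->safe_parse( '<doc><p>unclosed</doc>');
my $error= $@;

if( $ok) { print "parsed, root is ", $twig->root->tag, "\n"; }
else     { print "parse failed: $error\n"; }
```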
1289
1290       safe_parsefile (FILE [, OPT => OPT_VALUE [...]])
           This method is similar to "parsefile" except that it wraps the
           parsing in an "eval" block. It returns the twig on success and 0 on
           failure (the twig object also contains the parsed twig). $@
           contains the error message on failure.
1295
1296           Note that the parsing still stops as soon as an error is detected,
1297           there is no way to keep going after an error.
1298
       safe_parseurl ($url, $optional_user_agent)
           Same as "parseurl" except that it wraps the parsing in an "eval"
           block. It returns the twig on success and 0 on failure (the twig
           object also contains the parsed twig). $@ contains the error
           message on failure.
1304
1305       parse_html ($string_or_fh)
           Parse an HTML string or file handle (by converting it to XML using
           HTML::TreeBuilder, which needs to be available).

           This works nicely, but some information gets lost in the process:
           newlines are removed, and (at least on the version I use) comments
           get an extra CDATA section inside ( <!-- foo --> becomes <!--
           <![CDATA[ foo ]]> -->).
1313
1314       parsefile_html ($file)
           Parse an HTML file (by converting it to XML using
           HTML::TreeBuilder, which needs to be available, or HTML::Tidy if
           the "use_tidy" option was used).  The file is loaded completely in
           memory and converted to XML before being parsed.

           Use this method with caution: it doesn't know about the file
           encoding, so it is usually better to use "parse_html", which gives
           you a chance to open the file with the proper encoding layer.
1324
       parseurl_html ($url, $optional_user_agent)
           Parse a URL as HTML the same way "parse_html" does.

       safe_parseurl_html ($url, $optional_user_agent)
           Same as "parseurl_html" except that it wraps the parsing in an
           "eval" block.  It returns the twig on success and 0 on failure (the
           twig object also contains the parsed twig). $@ contains the error
           message on failure.

       safe_parsefile_html ($file, $optional_user_agent)
           Same as "parsefile_html" except that it wraps the parsing in an
           "eval" block.  It returns the twig on success and 0 on failure (the
           twig object also contains the parsed twig). $@ contains the error
           message on failure.

       safe_parse_html ($string_or_fh)
           Same as "parse_html" except that it wraps the parsing in an "eval"
           block.  It returns the twig on success and 0 on failure (the twig
           object also contains the parsed twig). $@ contains the error
           message on failure.
1345
1346       xparse ($thing_to_parse)
           Parse the $thing_to_parse, whether it is a filehandle, a string, an
           HTML file, an HTML URL, a URL or a file.
1349
1350           Note that this is mostly a convenience method for one-off scripts.
1351           For example files that end in '.htm' or '.html' are parsed first as
1352           XML, and if this fails as HTML. This is certainly not the most
1353           efficient way to do this in general.
1354
1355       nparse ($optional_twig_options, $thing_to_parse)
           Create a twig with the $optional_twig_options, and parse the
           $thing_to_parse, whether it is a filehandle, a string, an HTML
           file, an HTML URL, a URL or a file.
1359
1360           Examples:
1361
1362              XML::Twig->nparse( "file.xml");
1363              XML::Twig->nparse( error_context => 1, "file://file.xml");
1364
1365       nparse_pp ($optional_twig_options, $thing_to_parse)
1366           same as "nparse" but also sets the "pretty_print" option to
1367           "indented".
1368
1369       nparse_e ($optional_twig_options, $thing_to_parse)
1370           same as "nparse" but also sets the "error_context" option to 1.
1371
1372       nparse_ppe ($optional_twig_options, $thing_to_parse)
1373           same as "nparse" but also sets the "pretty_print" option to
1374           "indented" and the "error_context" option to 1.
1375
1376       parser
1377           This method returns the "expat" object (actually the
1378           XML::Parser::Expat object) used during parsing. It is useful for
1379           example to call XML::Parser::Expat methods on it. To get the line
1380           of a tag for example use "$t->parser->current_line".
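
           As a small illustration, the sketch below records the line on
           which each element is completed. The "item" tag and the inline
           document are assumptions made for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my @lines;
my $twig= XML::Twig->new(
  twig_handlers =>
    { item => sub { my( $t, $item)= @_;
                    # ask the underlying XML::Parser::Expat object
                    push @lines, $t->parser->current_line;
                  },
    },
);

$twig->parse( "<doc>\n<item/>\n<item/>\n</doc>");
print "item completed at line $_\n" foreach @lines;
```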
1381
1382       setTwigHandlers ($handlers)
1383           Set the twig_handlers. $handlers is a reference to a hash similar
1384           to the one in the "twig_handlers" option of new. All previous
1385           handlers are unset.  The method returns the reference to the
1386           previous handlers.
1387
       setTwigHandler ($exp, $handler)
1389           Set a single twig_handler for elements matching $exp. $handler is a
1390           reference to a subroutine. If the handler was previously set then
1391           the reference to the previous handler is returned.
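
           A minimal sketch of setting a handler after the twig has been
           created. The "para" tag and the counter are made up for the
           example.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig= XML::Twig->new();   # no handlers given at creation time

my $count= 0;
# no handler was set for para before, so there is no previous handler here
$twig->setTwigHandler( para => sub { $count++; 1; });

$twig->parse( '<doc><para>a</para><para>b</para><note/></doc>');
print "number of para elements: $count\n";
```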
1392
1393       setStartTagHandlers ($handlers)
1394           Set the start_tag handlers. $handlers is a reference to a hash
1395           similar to the one in the "start_tag_handlers" option of new. All
1396           previous handlers are unset.  The method returns the reference to
1397           the previous handlers.
1398
       setStartTagHandler ($exp, $handler)
           Set a single start_tag handler for elements matching $exp.
           $handler is a reference to a subroutine. If the handler was
           previously set then the reference to the previous handler is
           returned.
1404
1405       setEndTagHandlers ($handlers)
1406           Set the end_tag handlers. $handlers is a reference to a hash
1407           similar to the one in the "end_tag_handlers" option of new. All
1408           previous handlers are unset.  The method returns the reference to
1409           the previous handlers.
1410
       setEndTagHandler ($exp, $handler)
           Set a single end_tag handler for elements matching $exp. $handler
           is a reference to a subroutine. If the handler was previously set
           then the reference to the previous handler is returned.
1415
1416       setTwigRoots ($handlers)
           Same as using the "twig_roots" option when creating the twig.

       setCharHandler ($exp, $handler)
           Set a "char_handler".

       setIgnoreEltsHandler ($exp)
           Set an "ignore_elt" handler (elements that match $exp will be
           ignored).
1425
1426       setIgnoreEltsHandlers ($exp)
1427           Set all "ignore_elt" handlers (previous handlers are replaced)
1428
1429       dtd Return the dtd (an XML::Twig::DTD object) of a twig
1430
1431       xmldecl
1432           Return the XML declaration for the document, or a default one if it
1433           doesn't have one
1434
1435       doctype
1436           Return the doctype for the document
1437
1438       doctype_name
1439           returns the doctype of the document from the doctype declaration
1440
1441       system_id
1442           returns the system value of the DTD of the document from the
1443           doctype declaration
1444
1445       public_id
1446           returns the public doctype of the document from the doctype
1447           declaration
1448
1449       internal_subset
1450           returns the internal subset of the DTD
1451
1452       dtd_text
1453           Return the DTD text
1454
1455       dtd_print
1456           Print the DTD
1457
1458       model ($tag)
1459           Return the model (in the DTD) for the element $tag
1460
1461       root
1462           Return the root element of a twig
1463
1464       set_root ($elt)
1465           Set the root of a twig
1466
1467       first_elt ($optional_condition)
1468           Return the first element matching $optional_condition of a twig, if
1469           no condition is given then the root is returned
1470
1471       last_elt ($optional_condition)
1472           Return the last element matching $optional_condition of a twig, if
1473           no condition is given then the last element of the twig is returned
1474
1475       elt_id        ($id)
1476           Return the element whose "id" attribute is $id
1477
1478       getEltById
1479           Same as "elt_id"
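
           The navigation methods above can be tried on a small document.
           This is only a sketch: the tags and "id" values are made up, and
           it relies on the default ID attribute being named "id".

```perl
use strict;
use warnings;
use XML::Twig;

my $twig= XML::Twig->new();
$twig->parse( '<doc><sec id="s1"><title>A</title></sec>'
            . '<sec id="s2"><title>B</title></sec></doc>');

my $first= $twig->first_elt( 'title');   # first title in document order
my $last = $twig->last_elt( 'title');    # last title in document order
my $sec  = $twig->elt_id( 's2');         # element whose id attribute is s2

print "first title: ", $first->text, "\n";
print "last title: ",  $last->text,  "\n";
print "title of s2: ", $sec->first_child( 'title')->text, "\n";
```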
1480
1481       index ($index_name, $optional_index)
           If the $optional_index argument is present, return the
           corresponding element in the index (created using the "index"
           option for "XML::Twig->new").

           If the argument is not present, return an arrayref to the index.
1487
1488       normalize
           Merge together all consecutive pcdata elements in the document (if
           for example you have turned some elements into pcdata using
           "erase", this will give you a "clean" document in which all text
           elements are as long as possible).
1493
1494       encoding
1495           This method returns the encoding of the XML document, as defined by
1496           the "encoding" attribute in the XML declaration (ie it is "undef"
1497           if the attribute is not defined)
1498
1499       set_encoding
1500           This method sets the value of the "encoding" attribute in the XML
1501           declaration.  Note that if the document did not have a declaration
1502           it is generated (with an XML version of 1.0)
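
           A short sketch of the behavior described for "encoding" and
           "set_encoding", using a document that has no XML declaration (the
           inline document is an assumption made for the example).

```perl
use strict;
use warnings;
use XML::Twig;

my $twig= XML::Twig->new();
$twig->parse( '<doc/>');            # no XML declaration in the input

my $before= $twig->encoding;        # nothing declared yet
$twig->set_encoding( 'UTF-8');      # generates the declaration if needed
my $after  = $twig->encoding;
my $version= $twig->xml_version;

print "encoding is now $after, XML version $version\n";
```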
1503
1504       xml_version
1505           This method returns the XML version, as defined by the "version"
1506           attribute in the XML declaration (ie it is "undef" if the attribute
1507           is not defined)
1508
1509       set_xml_version
1510           This method sets the value of the "version" attribute in the XML
1511           declaration.  If the declaration did not exist it is created.
1512
1513       standalone
1514           This method returns the value of the "standalone" declaration for
1515           the document
1516
1517       set_standalone
1518           This method sets the value of the "standalone" attribute in the XML
1519           declaration.  Note that if the document did not have a declaration
1520           it is generated (with an XML version of 1.0)
1521
1522       set_output_encoding
1523           Set the "encoding" "attribute" in the XML declaration
1524
1525       set_doctype ($name, $system, $public, $internal)
           Set the doctype of the document. If an argument is "undef" (or not
           present) then its former value is retained; if a false ('' or 0)
           value is passed then the former value is deleted.
1529
1530       entity_list
1531           Return the entity list of a twig
1532
1533       entity_names
1534           Return the list of all defined entities
1535
1536       entity ($entity_name)
1537           Return the entity
1538
1539       notation_list
1540           Return the notation list of a twig
1541
1542       notation_names
1543           Return the list of all defined notations
1544
1545       notation ($notation_name)
1546           Return the notation
1547
1548       change_gi      ($old_gi, $new_gi)
           Performs a (very fast) global change. All elements $old_gi are now
           $new_gi. This is a bit dangerous though and should be avoided if
           possible, as the new tag might be ignored in subsequent processing.

           See "BUGS".
1554
1555       flush            ($optional_filehandle, %options)
1556           Flushes a twig up to (and including) the current element, then
1557           deletes all unnecessary elements from the tree that's kept in
1558           memory.  "flush" keeps track of which elements need to be
1559           open/closed, so if you flush from handlers you don't have to worry
1560           about anything. Just keep flushing the twig every time you're done
           with a sub-tree and it will come out well-formed. After the whole
           parsing don't forget to "flush" one more time to print the end of
           the document.  The doctype and entity declarations are also
           printed.

           "flush" takes an optional filehandle as an argument.
1567
1568           If you use "flush" at any point during parsing, the document will
1569           be flushed one last time at the end of the parsing, to the proper
1570           filehandle.
1571
1572           options: use the "update_DTD" option if you have updated the
1573           (internal) DTD and/or the entity list and you want the updated DTD
1574           to be output
1575
1576           The "pretty_print" option sets the pretty printing of the document.
1577
1578              Example: $t->flush( Update_DTD => 1);
1579                       $t->flush( $filehandle, pretty_print => 'indented');
1580                       $t->flush( \*FILE);
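
           A slightly fuller sketch of flushing from a handler. It assumes
           an in-memory filehandle as the output and a made-up document with
           "section" elements; in a real script the handle would usually be
           a file or STDOUT.

```perl
use strict;
use warnings;
use XML::Twig;

my $buffer= '';
open( my $out, '>', \$buffer) or die "cannot open in-memory handle: $!";

my $twig= XML::Twig->new(
  twig_handlers =>
    { section => sub { my( $t, $section)= @_;
                       $section->set_att( done => 1);  # process the sub-tree
                       $t->flush( $out);               # output it, free memory
                     },
    },
);

$twig->parse( '<doc><section>one</section><section>two</section></doc>');
close $out;

print $buffer, "\n";
```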
1581
1582       flush_up_to ($elt, $optional_filehandle, %options)
1583           Flushes up to the $elt element. This allows you to keep part of the
1584           tree in memory when you "flush".
1585
1586           options: see flush.
1587
1588       purge
1589           Does the same as a "flush" except it does not print the twig. It
1590           just deletes all elements that have been completely parsed so far.
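
           A minimal sketch of "purge" for the case where the document is
           only read, not output. The "record" elements and their "amount"
           attribute are made up for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my $total= 0;
my $twig= XML::Twig->new(
  twig_handlers =>
    { record => sub { my( $t, $record)= @_;
                      $total += $record->att( 'amount');
                      $t->purge;    # free the memory, print nothing
                    },
    },
);

$twig->parse( '<data><record amount="2"/><record amount="3"/>'
            . '<record amount="5"/></data>');
print "total: $total\n";
```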
1591
1592       purge_up_to ($elt)
1593           Purges up to the $elt element. This allows you to keep part of the
1594           tree in memory when you "purge".
1595
1596       print            ($optional_filehandle, %options)
1597           Prints the whole document associated with the twig. To be used only
1598           AFTER the parse.
1599
1600           options: see "flush".
1601
1602       print_to_file    ($filename, %options)
1603           Prints the whole document associated with the twig to file
1604           $filename.  To be used only AFTER the parse.
1605
1606           options: see "flush".
1607
1608       safe_print_to_file    ($filename, %options)
           Prints the whole document associated with the twig to file
           $filename.  This variant, which probably only works on *nix, prints
           to a temp file, then moves the temp file to overwrite the original
           file.

           This is a bit safer when 2 processes can potentially write the same
           file: only the last one will succeed, but the file won't be
           corrupted. I often use this for cron jobs, so testing the code
           doesn't interfere with the cron job running at the same time.
1618
1619           options: see "flush".
1620
1621       sprint
1622           Return the text of the whole document associated with the twig. To
1623           be used only AFTER the parse.
1624
1625           options: see "flush".
1626
1627       trim
1628           Trim the document: gets rid of initial and trailing spaces, and
1629           replaces multiple spaces by a single one.
1630
1631       toSAX1 ($handler)
1632           Send SAX events for the twig to the SAX1 handler $handler
1633
1634       toSAX2 ($handler)
1635           Send SAX events for the twig to the SAX2 handler $handler
1636
1637       flush_toSAX1 ($handler)
1638           Same as flush, except that SAX events are sent to the SAX1 handler
1639           $handler instead of the twig being printed
1640
1641       flush_toSAX2 ($handler)
1642           Same as flush, except that SAX events are sent to the SAX2 handler
1643           $handler instead of the twig being printed
1644
1645       ignore
1646           This method should be called during parsing, usually in
1647           "start_tag_handlers".  It causes the element to be skipped during
1648           the parsing: the twig is not built for this element, it will not be
1649           accessible during parsing or after it. The element will not take up
1650           any memory and parsing will be faster.
1651
1652           Note that this method can also be called on an element. If the
1653           element is a parent of the current element then this element will
1654           be ignored (the twig will not be built any more for it and what has
1655           already been built will be deleted).
1656
1657       set_pretty_print  ($style)
1658           Set the pretty print method, amongst '"none"' (default),
1659           '"nsgmls"', '"nice"', '"indented"', "indented_c", '"wrapped"',
1660           '"record"' and '"record_c"'
1661
1662           WARNING: the pretty print style is a GLOBAL variable, so once set
1663           it's applied to ALL "print"'s (and "sprint"'s). Same goes if you
1664           use XML::Twig with "mod_perl" . This should not be a problem as the
1665           XML that's generated is valid anyway, and XML processors (as well
1666           as HTML processors, including browsers) should not care. Let me
1667           know if this is a big problem, but at the moment the
1668           performance/cleanliness trade-off clearly favors the global
1669           approach.
1670
1671       set_empty_tag_style  ($style)
1672           Set the empty tag display style ('"normal"', '"html"' or
1673           '"expand"'). As with "set_pretty_print" this sets a global flag.
1674
1675           "normal" outputs an empty tag '"<tag/>"', "html" adds a space
1676           '"<tag />"' for elements that can be empty in XHTML and "expand"
1677           outputs '"<tag></tag>"'
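
           The three styles can be compared on a one-element document. This
           is a sketch with a made-up document; since the style is a global
           flag, the example sets it back to "normal" at the end.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig= XML::Twig->new();
$twig->parse( '<doc><br/></doc>');

my %output;
foreach my $style ( qw( normal html expand))
  { $twig->set_empty_tag_style( $style);
    $output{$style}= $twig->sprint;
  }
$twig->set_empty_tag_style( 'normal');   # the flag is global: restore it

print "$_: $output{$_}\n" foreach qw( normal html expand);
```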
1678
1679       set_remove_cdata  ($flag)
1680           set (or unset) the flag that forces the twig to output CDATA
1681           sections as regular (escaped) PCDATA
1682
1683       print_prolog     ($optional_filehandle, %options)
1684           Prints the prolog (XML declaration + DTD + entity declarations) of
1685           a document.
1686
1687           options: see "flush".
1688
1689       prolog     ($optional_filehandle, %options)
1690           Return the prolog (XML declaration + DTD + entity declarations) of
1691           a document.
1692
1693           options: see "flush".
1694
1695       finish
1696           Call Expat "finish" method.  Unsets all handlers (including
1697           internal ones that set context), but expat continues parsing to the
1698           end of the document or until it finds an error.  It should finish
1699           up a lot faster than with the handlers set.
1700
1701       finish_print
           Stops twig processing, flushes the twig and proceeds to finish
           printing the document as fast as possible. Use this method when
           modifying a document and the modification is done.
1705
1706       finish_now
           Stops twig processing, does not finish parsing the document (which
           could actually be not well-formed after the point where
           "finish_now" is called).  Execution resumes after the "parse" or
           "parsefile" call. The content of the twig is what has been parsed
           so far (all open elements at the time "finish_now" is called are
           considered closed).
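
           A typical use is to stop as soon as the needed data has been
           found. The sketch below is illustrative only (the "title" tag and
           the document are made up): it grabs the first title and gives up
           on the rest of the document.

```perl
use strict;
use warnings;
use XML::Twig;

my $first_title;
my $twig= XML::Twig->new(
  twig_handlers =>
    { title => sub { my( $t, $title)= @_;
                     $first_title= $title->text;
                     $t->finish_now;   # the rest of the document is not parsed
                   },
    },
);

$twig->parse( '<doc><title>First</title><title>Second</title></doc>');
print "first title: $first_title\n";
```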
1713
1714       set_expand_external_entities
1715           Same as using the "expand_external_ents" option when creating the
1716           twig
1717
1718       set_input_filter
1719           Same as using the "input_filter" option when creating the twig
1720
1721       set_keep_atts_order
1722           Same as using the "keep_atts_order" option when creating the twig
1723
1724       set_keep_encoding
1725           Same as using the "keep_encoding" option when creating the twig
1726
1727       escape_gt
           Usually XML::Twig does not escape > in its output. Calling this
           method makes it replace > by &gt;
1730
1731       do_not_escape_gt
1732           reverts XML::Twig behavior to its default of not escaping > in its
1733           output.
1734
1735       set_output_filter
1736           Same as using the "output_filter" option when creating the twig
1737
1738       set_output_text_filter
1739           Same as using the "output_text_filter" option when creating the
1740           twig
1741
1742       add_stylesheet ($type, @options)
1743           Adds an external stylesheet to an XML document.
1744
1745           Supported types and options:
1746
1747           xsl option: the url of the stylesheet
1748
1749               Example:
1750
1751                 $t->add_stylesheet( xsl => "xsl_style.xsl");
1752
1753               will generate the following PI at the beginning of the
1754               document:
1755
1756                 <?xml-stylesheet type="text/xsl" href="xsl_style.xsl"?>
1757
1758           css option: the url of the stylesheet
1759
       active_twig
           A class method that returns the last processed twig, so you
           don't necessarily need the object to call methods on it.
1763
1764       Methods inherited from XML::Parser::Expat
1765           A twig inherits all the relevant methods from XML::Parser::Expat.
1766           These methods can only be used during the parsing phase (they will
1767           generate a fatal error otherwise).
1768
1769           Inherited methods are:
1770
1771           depth
1772               Returns the size of the context list.
1773
1774           in_element
               Returns true if NAME is equal to the name of the innermost
               currently opened element. If namespace processing is being used
               and you want to check against a name that may be in a
               namespace, then use the generate_ns_name method to create the
               NAME argument.
1780
1781           within_element
               Returns the number of times the given name appears in the
               context list.  If namespace processing is being used and you
               want to check against a name that may be in a namespace, then
               use the generate_ns_name method to create the NAME argument.
1786
1787           context
               Returns a list of element names that represent open elements,
               with the last one being the innermost. Inside start and end tag
               handlers, this will be the tag of the parent element.
1791
1792           current_line
1793               Returns the line number of the current position of the parse.
1794
1795           current_column
1796               Returns the column number of the current position of the parse.
1797
1798           current_byte
1799               Returns the current position of the parse.
1800
1801           position_in_context
1802               Returns a string that shows the current parse position. LINES
1803               should be an integer >= 0 that represents the number of lines
1804               on either side of the current parse line to place into the
1805               returned string.
1806
1807           base ([NEWBASE])
1808               Returns the current value of the base for resolving relative
1809               URIs.  If NEWBASE is supplied, changes the base to that value.
1810
1811           current_element
1812               Returns the name of the innermost currently opened element.
1813               Inside start or end handlers, returns the parent of the element
1814               associated with those tags.
1815
1816           element_index
               Returns an integer that is the depth-first visit order of the
               current element. This will be zero outside of the root
               element. For example, this will return 1 when called from the
               start handler for the root element start tag.
1821
1822           recognized_string
1823               Returns the string from the document that was recognized in
1824               order to call the current handler. For instance, when called
1825               from a start handler, it will give us the start-tag string. The
1826               string is encoded in UTF-8.  This method doesn't return a
1827               meaningful string inside declaration handlers.
1828
1829           original_string
1830               Returns the verbatim string from the document that was
1831               recognized in order to call the current handler. The string is
1832               in the original document encoding. This method doesn't return a
1833               meaningful string inside declaration handlers.
1834
1835           xpcroak
1836               Concatenate onto the given message the current line number
1837               within the XML document plus the message implied by
1838               ErrorContext. Then croak with the formed message.
1839
1840           xpcarp
1841               Concatenate onto the given message the current line number
1842               within the XML document plus the message implied by
1843               ErrorContext. Then carp with the formed message.
1844
1845           xml_escape(TEXT [, CHAR [, CHAR ...]])
1846               Returns TEXT with markup characters turned into character
1847               entities.  Any additional characters provided as arguments are
1848               also turned into character references where found in TEXT.
1849
1850               (this method is broken on some versions of expat/XML::Parser)
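
           As an illustration of calling an inherited method during parsing,
           the sketch below uses "context" in a handler to build the path to
           each "title" element. The tags and the document are assumptions
           made for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my @paths;
my $twig= XML::Twig->new(
  twig_handlers =>
    { title => sub { my( $t, $title)= @_;
                     # context lists the elements that are still open
                     push @paths, join( '/', $t->context, $title->tag);
                   },
    },
);

$twig->parse( '<doc><sec><title>A</title></sec></doc>');
print "$_\n" foreach @paths;
```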
1851
1852       path ( $optional_tag)
1853           Return the element context in a form similar to XPath's short form:
1854           '"/root/tag1/../tag"'
1855
1856       get_xpath  ( $optional_array_ref, $xpath, $optional_offset)
           Performs a "get_xpath" on the document root (see "XML::Twig::Elt").

           If the $optional_array_ref argument is used the array must contain
           elements. The $xpath expression is applied to each element in turn
           and the result is the union of all results. This way a first query
           can be refined in further steps.
1863
1864       find_nodes ( $optional_array_ref, $xpath, $optional_offset)
1865           same as "get_xpath"
1866
1867       findnodes ( $optional_array_ref, $xpath, $optional_offset)
1868           same as "get_xpath" (similar to the XML::LibXML method)
1869
1870       findvalue ( $optional_array_ref, $xpath, $optional_offset)
1871           Return the "join" of all texts of the results of applying
1872           "get_xpath" to the node (similar to the XML::LibXML method)
1873
1874       findvalues ( $optional_array_ref, $xpath, $optional_offset)
1875           Return an array of all texts of the results of applying "get_xpath"
1876           to the node
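
           A minimal sketch of the XPath-style queries on a twig. The
           document and the "type" attribute are made up for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig= XML::Twig->new();
$twig->parse( '<doc><item type="a">one</item><item type="b">two</item></doc>');

# elements matching the expression, starting from the document root
my @b_items= $twig->get_xpath( '//item[@type="b"]');

# the texts of all matching elements, joined together
my $all_text= $twig->findvalue( '//item');

print "type b: ", $b_items[0]->text, "\n";
print "all text: $all_text\n";
```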
1877
1878       subs_text ($regexp, $replace)
           subs_text does text substitution on the whole document, similar to
           perl's "s///" operator.
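
           For instance, a spelling can be changed throughout a document. A
           sketch with a made-up document and a plain replacement string:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig= XML::Twig->new();
$twig->parse( '<doc><p>colour</p><p>my favourite colour</p></doc>');

# replace every match of the regexp in the text of the document
$twig->subs_text( qr/colour/, 'color');

my $xml= $twig->sprint;
print "$xml\n";
```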
1881
1882       dispose
1883           Useful only if you don't have "Scalar::Util" or "WeakRef"
1884           installed.
1885
1886           Reclaims properly the memory used by an XML::Twig object. As the
1887           object has circular references it never goes out of scope, so if
1888           you want to parse lots of XML documents then the memory leak
1889           becomes a problem. Use "$twig->dispose" to clear this problem.
1890
1891       att_accessors (list_of_attribute_names)
1892           A convenience method that creates l-valued accessors for
1893           attributes.  So "$twig->create_accessors( 'foo')" will create a
1894           "foo" method that can be called on elements:
1895
1896             $elt->foo;         # equivalent to $elt->{'att'}->{'foo'};
1897             $elt->foo( 'bar'); # equivalent to $elt->set_att( foo => 'bar');
1898
1899           The methods are l-valued only under those perl's that support this
1900           feature (5.6 and above)
1901
1902       create_accessors (list_of_attribute_names)
1903           Same as att_accessors
1904
1905       elt_accessors (list_of_attribute_names)
           A convenience method that creates accessors for elements.  So
           "$twig->elt_accessors( 'foo')" will create a "foo" method that
           can be called on elements:
1909
1910             $elt->foo;         # equivalent to $elt->first_child( 'foo');
1911
1912       field_accessors (list_of_attribute_names)
           A convenience method that creates accessors for element values
           ("field").  So "$twig->field_accessors( 'foo')" will create a
           "foo" method that can be called on elements:
1916
1917             $elt->foo;         # equivalent to $elt->field( 'foo');
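
           The accessor methods can be combined, as in the sketch below. The
           names "sku" and "price" are hypothetical, picked so that they do
           not clash with existing element methods.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig= XML::Twig->new();
$twig->att_accessors( 'sku');        # $elt->sku reads the sku attribute
$twig->field_accessors( 'price');    # $elt->price reads the price field

$twig->parse( '<catalog><item sku="A1"><price>9.50</price></item></catalog>');

my $item = $twig->first_elt( 'item');
my $sku  = $item->sku;
my $price= $item->price;
$item->sku( 'B2');                   # same as $item->set_att( sku => 'B2')

print "item $sku costs $price, sku is now ", $item->att( 'sku'), "\n";
```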
1918
1919       set_do_not_escape_amp_in_atts
           An evil method, that I only document because Test::Pod::Coverage
           complains otherwise, but really, you don't want to know about it.
1922
1923   XML::Twig::Elt
1924       new          ($optional_tag, $optional_atts, @optional_content)
           The "tag" is optional (but then you can't have a content), the
           $optional_atts argument is a reference to a hash of attributes, the
           content can be just a string or a list of strings and elements. A
           content of '"#EMPTY"' creates an empty element.
1929
1930            Examples: my $elt= XML::Twig::Elt->new();
1931                      my $elt= XML::Twig::Elt->new( para => { align => 'center' });
1932                      my $elt= XML::Twig::Elt->new( para => { align => 'center' }, 'foo');
1933                      my $elt= XML::Twig::Elt->new( br   => '#EMPTY');
1934                      my $elt= XML::Twig::Elt->new( 'para');
1935                      my $elt= XML::Twig::Elt->new( para => 'this is a para');
1936                      my $elt= XML::Twig::Elt->new( para => $elt3, 'another para');
1937
1938           The strings are not parsed, the element is not attached to any
1939           twig.
1940
1941           WARNING: if you rely on ID's then you will have to set the id
1942           yourself. At this point the element does not belong to a twig yet,
1943           so the ID attribute is not known so it won't be stored in the ID
1944           list.
1945
1946           Note that "#COMMENT", "#PCDATA" or "#CDATA" are valid tag names,
1947           that will create text elements.
1948
1949           To create an element "foo" containing a CDATA section:
1950
1951                      my $foo= XML::Twig::Elt->new( '#CDATA' => "content of the CDATA section")
1952                                             ->wrap_in( 'foo');
1953
1954           An attribute of '#CDATA', will create the content of the element as
1955           CDATA:
1956
1957             my $elt= XML::Twig::Elt->new( 'p' => { '#CDATA' => 1}, 'foo < bar');
1958
1959           creates an element
1960
             <p><![CDATA[foo < bar]]></p>
1962
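           The constructor forms above can be tried on their own. This is a
           minimal sketch, assuming default output settings (no pretty
           printing, default empty tag style); the variable names and sample
           content are illustrative only.

```perl
use strict;
use warnings;
use XML::Twig;

# An element with attributes and text content.
my $para  = XML::Twig::Elt->new( para => { align => 'center' }, 'some text');

# An empty element.
my $br    = XML::Twig::Elt->new( br => '#EMPTY');

# An element whose content is stored as a CDATA section.
my $cdata = XML::Twig::Elt->new( p => { '#CDATA' => 1 }, 'foo < bar');

# None of these elements is attached to a twig yet.
print $_->sprint, "\n" for $para, $br, $cdata;
```

           The elements stay detached until they are pasted into a tree.
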
1963       parse         ($string, %args)
1964           Creates an element from an XML string. The string is actually
1965           parsed as a new twig, then the root of that twig is returned.  The
1966           arguments in %args are passed to the twig.  As always if the parse
1967           fails the parser will die, so use an eval if you want to trap
1968           syntax errors.
1969
1970           As obviously the element does not exist beforehand this method has
1971           to be called on the class:
1972
             my $elt= XML::Twig::Elt->parse( "<a> string to parse, with <sub/>
                                             <elements>, actually tons of </elements>
                             h</a>");
1976
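           A short sketch of trapping a parse failure with eval, as suggested
           above. The two input strings are made up for the illustration.

```perl
use strict;
use warnings;
use XML::Twig;

# Well-formed input: parse returns the root element of a throwaway twig.
my $elt = XML::Twig::Elt->parse( '<a>some <b>bold</b> text</a>');
print $elt->tag, ': ', $elt->text, "\n";

# Malformed input: the parser dies, so eval returns undef and sets $@.
my $bad = eval { XML::Twig::Elt->parse( '<a><b></a>') };
print "could not parse the second string\n" unless defined $bad;
```
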
1977       set_inner_xml ($string)
1978           Sets the content of the element to be the tree created from the
1979           string
1980
1981       set_inner_html ($string)
1982           Sets the content of the element, after parsing the string with an
1983           HTML parser (HTML::Parser)
1984
1985       set_outer_xml ($string)
1986           Replaces the element with the tree created from the string
1987
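           As a rough illustration of the difference between the two XML
           setters: "set_inner_xml" keeps the element and replaces what is
           inside it, while "set_outer_xml" replaces the element itself. The
           document and tag names below are invented for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse( '<doc><p>old</p><note>n</note></doc>');
my $root = $twig->root;

# Keep <p>, replace its content with a parsed fragment.
$root->first_child( 'p')->set_inner_xml( 'new <b>bold</b> text');

# Replace <note> itself with the parsed fragment.
$root->first_child( 'note')->set_outer_xml( '<aside>replaced</aside>');

print $root->sprint, "\n";
```
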
1988       print         ($optional_filehandle, $optional_pretty_print_style)
1989           Prints an entire element, including the tags, optionally to a
1990           $optional_filehandle, optionally with a $pretty_print_style.
1991
1992           The print outputs XML data so base entities are escaped.
1993
1994       print_to_file    ($filename, %options)
1995           Prints the element to file $filename.
1996
           options: see "flush".

       sprint       ($elt, $optional_no_enclosing_tag)
1999
2000           Return the xml string for an entire element, including the tags.
2001           If the optional second argument is true then only the string inside
2002           the element is returned (the start and end tag for $elt are not).
2003           The text is XML-escaped: base entities (& and < in text, & < and "
2004           in attribute values) are turned into entities.
2005
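           The escaping rules can be checked with a small sketch. It assumes
           the default settings (double quotes around attribute values); the
           element and its content are illustrative.

```perl
use strict;
use warnings;
use XML::Twig;

my $elt = XML::Twig::Elt->new( p => { title => 'a "b"' }, 'x < y & z');

# Whole element, tags included.
my $with_tags = $elt->sprint;

# A true argument drops the start and end tags of the element.
my $inner = $elt->sprint( 1);

print "$with_tags\n$inner\n";
```
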
2006       gi  Return the gi of the element (the gi is the "generic identifier"
2007           the tag name in SGML parlance).
2008
2009           "tag" and "name" are synonyms of "gi".
2010
2011       tag Same as "gi"
2012
2013       name
2014           Same as "tag"
2015
2016       set_gi         ($tag)
2017           Set the gi (tag) of an element
2018
2019       set_tag        ($tag)
           Set the tag (="gi") of an element
2021
2022       set_name       ($name)
2023           Set the name (="tag") of an element
2024
2025       root
2026           Return the root of the twig in which the element is contained.
2027
2028       twig
2029           Return the twig containing the element.
2030
2031       parent        ($optional_condition)
2032           Return the parent of the element, or the first ancestor matching
2033           the $optional_condition
2034
2035       first_child   ($optional_condition)
2036           Return the first child of the element, or the first child matching
2037           the $optional_condition
2038
2039       has_child ($optional_condition)
2040           Return the first child of the element, or the first child matching
2041           the $optional_condition (same as first_child)
2042
2043       has_children ($optional_condition)
2044           Return the first child of the element, or the first child matching
2045           the $optional_condition (same as first_child)
2046
2047       first_child_text   ($optional_condition)
           Return the text of the first child of the element, or of the first
           child matching the $optional_condition. If there is no such child,
           returns ''. This avoids getting the child, checking for its
           existence, then getting the text for trivial cases.
2053
2054           Similar methods are available for the other navigation methods:
2055
2056           last_child_text
2057           prev_sibling_text
2058           next_sibling_text
2059           prev_elt_text
2060           next_elt_text
2061           child_text
2062           parent_text
2063
           All these methods also exist in a "trimmed" variant:

           first_child_trimmed_text
           last_child_trimmed_text
           prev_sibling_trimmed_text
           next_sibling_trimmed_text
           prev_elt_trimmed_text
           next_elt_trimmed_text
           child_trimmed_text
           parent_trimmed_text

       field         ($condition)
2075           Same method as "first_child_text" with a different name
2076
2077       fields         ($condition_list)
           Return the list of fields (the text of the first child matching
           each condition). Missing fields are returned as the empty string.

           Equivalent to calling "field" once for each condition in the list.
2082
2083       trimmed_field         ($optional_condition)
2084           Same method as "first_child_trimmed_text" with a different name
2085
2086       set_field ($condition, $optional_atts, @list_of_elt_and_strings)
2087           Set the content of the first child of the element that matches
2088           $condition, the rest of the arguments is the same as for
2089           "set_content"
2090
2091           If no child matches $condition _and_ if $condition is a valid XML
2092           element name, then a new element by that name is created and
2093           inserted as the last child.
2094
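           The field accessors are easiest to compare side by side. A minimal
           sketch, using a made-up <person> record; the tag names and values
           are assumptions for the illustration.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse( '<person><name> Ann  Lee </name><age>42</age></person>');
my $person = $twig->root;

my $raw     = $person->first_child_text( 'name');          # text as stored
my $trimmed = $person->first_child_trimmed_text( 'name');  # spaces normalized
my $missing = $person->field( 'email');                    # '' if no match
my @fields  = $person->fields( 'age', 'email');            # one entry per condition

# No child matches 'email', so set_field creates one as the last child.
$person->set_field( email => 'ann@example.org');

print "[$raw] [$trimmed] [$missing]\n";
print $person->field( 'email'), "\n";
```
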
2095       first_child_matches   ($optional_condition)
           Return the element if the first child of the element (if it exists)
           passes the $optional_condition, "undef" otherwise

             if( $elt->first_child_matches( 'title')) ...

           is equivalent to

             if( $elt->first_child && $elt->first_child->passes( 'title'))

           "first_child_is" is another name for this method
2106
2107           Similar methods are available for the other navigation methods:
2108
2109           last_child_matches
2110           prev_sibling_matches
2111           next_sibling_matches
2112           prev_elt_matches
2113           next_elt_matches
2114           child_matches
           parent_matches

       is_first_child ($optional_condition)
           Returns true (the element) if the element is the first child of its
           parent (or, when $optional_condition is given, the first child
           that satisfies it)

       is_last_child ($optional_condition)
           Returns true (the element) if the element is the last child of its
           parent (or, when $optional_condition is given, the last child that
           satisfies it)
2123
2124       prev_sibling  ($optional_condition)
2125           Return the previous sibling of the element, or the previous sibling
2126           matching $optional_condition
2127
2128       next_sibling  ($optional_condition)
2129           Return the next sibling of the element, or the first one matching
2130           $optional_condition.
2131
2132       next_elt     ($optional_elt, $optional_condition)
           Return the next elt (optionally matching $optional_condition) of
           the element. This is defined as the next element that opens after
           the current element opens, which usually means the first child of
           the element. Counter-intuitive as it might look, this allows you
           to loop through the whole document by starting from the root.
2138
2139           The $optional_elt is the root of a subtree. When the "next_elt" is
2140           out of the subtree then the method returns undef. You can then walk
2141           a sub-tree with:
2142
2143             my $elt= $subtree_root;
2144             while( $elt= $elt->next_elt( $subtree_root))
2145               { # insert processing code here
2146               }
2147
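           To make the traversal order concrete, here is the same loop run on
           a tiny document (invented for the example) that contains no text,
           so that only element tags are collected.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse( '<doc><a><b/></a><c/></doc>');
my $root = $twig->root;

# Walk the subtree under $root in document order.
my @tags;
my $elt = $root;
while( $elt = $elt->next_elt( $root))
  { push @tags, $elt->tag; }

print "@tags\n";
```

           Elements are visited in the order in which their start tags
           appear in the document.
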
2148       prev_elt     ($optional_condition)
2149           Return the previous elt (optionally matching $optional_condition)
2150           of the element. This is the first element which opens before the
2151           current one.  It is usually either the last descendant of the
2152           previous sibling or simply the parent
2153
2154       next_n_elt   ($offset, $optional_condition)
2155           Return the $offset-th element that matches the $optional_condition
2156
2157       following_elt
2158           Return the following element (as per the XPath following axis)
2159
2160       preceding_elt
2161           Return the preceding element (as per the XPath preceding axis)
2162
2163       following_elts
2164           Return the list of following elements (as per the XPath following
2165           axis)
2166
2167       preceding_elts
2168           Return the list of preceding elements (as per the XPath preceding
2169           axis)
2170
2171       children     ($optional_condition)
           Return the list of children of the element (optionally only those
           matching $optional_condition). The list is in document order.

       children_count ($optional_condition)
           Return the number of children of the element (optionally only
           those matching $optional_condition)

       children_text ($optional_condition)
           In array context, returns an array containing the text of the
           children of the element (optionally only those matching
           $optional_condition)

           In scalar context, returns the concatenation of the text of the
           children of the element

       children_trimmed_text ($optional_condition)
           In array context, returns an array containing the trimmed text of
           the children of the element (optionally only those matching
           $optional_condition)

           In scalar context, returns the concatenation of the trimmed text of
           the children of the element

       children_copy ($optional_condition)
           Return a list of elements that are copies of the children of the
           element (optionally only those matching $optional_condition)

       descendants     ($optional_condition)
           Return the list of all descendants of the element (optionally only
           those matching $optional_condition). This is the equivalent of the
           "getElementsByTagName" of the DOM (by the way, if you are really a
           DOM addict, you can use "getElementsByTagName" instead)
2203
2204       getElementsByTagName ($optional_condition)
2205           Same as "descendants"
2206
2207       find_by_tag_name ($optional_condition)
2208           Same as "descendants"
2209
2210       descendants_or_self ($optional_condition)
2211           Same as "descendants" except that the element itself is included in
2212           the list if it matches the $optional_condition
2213
2214       first_descendant  ($optional_condition)
2215           Return the first descendant of the element that matches the
2216           condition
2217
2218       last_descendant  ($optional_condition)
2219           Return the last descendant of the element that matches the
2220           condition
2221
2222       ancestors    ($optional_condition)
2223           Return the list of ancestors (optionally matching
2224           $optional_condition) of the element.  The list is ordered from the
2225           innermost ancestor to the outermost one
2226
2227           NOTE: the element itself is not part of the list, in order to
2228           include it you will have to use ancestors_or_self
2229
2230       ancestors_or_self     ($optional_condition)
           Return the list of ancestors (optionally matching
           $optional_condition) of the element, including the element (if it
           matches the condition).  The list is ordered from the innermost
           ancestor to the outermost one
2235
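           The list-returning navigation methods can be compared on one small
           tree. This is only a sketch; the document structure is an
           assumption chosen so that "children" and "descendants" differ.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse( '<doc><sec><p>one</p><p>two</p></sec><p>three</p></doc>');
my $root = $twig->root;

my @children  = $root->children;           # direct children only
my @direct_p  = $root->children( 'p');     # direct children named p
my @all_p     = $root->descendants( 'p');  # every p, in document order
my @p_text    = map { $_->text } @all_p;

# Ancestors are listed from the innermost to the outermost.
my @ancestors = map { $_->tag } $all_p[0]->ancestors;
my @with_self = map { $_->tag } $all_p[0]->ancestors_or_self;

print scalar( @children), " children, ", scalar( @all_p), " p descendants\n";
print "@ancestors | @with_self\n";
```
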
2236       passes ($condition)
2237           Return the element if it passes the $condition
2238
2239       att          ($att)
2240           Return the value of attribute $att or "undef"
2241
2242       latt          ($att)
2243           Return the value of attribute $att or "undef"
2244
           This method is an lvalue, so you can do "$elt->latt( 'foo')= 'bar'"
           or "$elt->latt( 'foo')++;"
2247
2248       set_att      ($att, $att_value)
2249           Set the attribute of the element to the given value
2250
2251           You can actually set several attributes this way:
2252
2253             $elt->set_att( att1 => "val1", att2 => "val2");
2254
2255       del_att      ($att)
2256           Delete the attribute for the element
2257
2258           You can actually delete several attributes at once:
2259
2260             $elt->del_att( 'att1', 'att2', 'att3');
2261
2262       att_exists ($att)
2263           Returns true if the attribute $att exists for the element, false
2264           otherwise
2265
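           A brief sketch of the attribute accessors working together. The
           <img> element and its attribute names are illustrative.

```perl
use strict;
use warnings;
use XML::Twig;

my $img = XML::Twig::Elt->new( img => { src => 'a.png' });

$img->set_att( width => 10, height => 20);   # several attributes at once
$img->del_att( 'height');

my $width      = $img->att( 'width');
my $has_height = $img->att_exists( 'height') ? 'yes' : 'no';
my $alt        = $img->att( 'alt');          # undef: never set

print "width=$width, height present: $has_height\n";
```
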
2266       cut Cut the element from the tree. The element still exists, it can be
2267           copied or pasted somewhere else, it is just not attached to the
2268           tree anymore.
2269
2270           Note that the "old" links to the parent, previous and next siblings
2271           can still be accessed using the former_* methods
2272
2273       former_next_sibling
2274           Returns the former next sibling of a cut node (or undef if the node
2275           has not been cut)
2276
2277           This makes it easier to write loops where you cut elements:
2278
               my $child= $parent->first_child( 'achild');
               while( $child && $child->att( 'cut'))
                 { $child->cut; $child= $child->former_next_sibling; }
2282
2283       former_prev_sibling
2284           Returns the former previous sibling of a cut node (or undef if the
2285           node has not been cut)
2286
2287       former_parent
2288           Returns the former parent of a cut node (or undef if the node has
2289           not been cut)
2290
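           The following sketch shows what a cut element still remembers
           about its old position. The three-element document is made up for
           the example.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse( '<doc><a/><b/><c/></doc>');
my $root   = $twig->root;
my $middle = $root->first_child( 'b');

$middle->cut;                                   # detach <b/> from the tree
my $after_cut   = $root->sprint;
my $former_next = $middle->former_next_sibling->tag;
my $former_par  = $middle->former_parent->tag;

$middle->paste( last_child => $root);           # re-attach it elsewhere
print "$after_cut -> ", $root->sprint, "\n";
```
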
2291       cut_children ($optional_condition)
2292           Cut all the children of the element (or all of those which satisfy
2293           the $optional_condition).
2294
2295           Return the list of children
2296
2297       cut_descendants ($optional_condition)
2298           Cut all the descendants of the element (or all of those which
2299           satisfy the $optional_condition).
2300
2301           Return the list of descendants
2302
2303       copy        ($elt)
2304           Return a copy of the element. The copy is a "deep" copy: all sub-
2305           elements of the element are duplicated.
2306
2307       paste       ($optional_position, $ref)
2308           Paste a (previously "cut" or newly generated) element. Die if the
2309           element already belongs to a tree.
2310
2311           Note that the calling element is pasted:
2312
2313             $child->paste( first_child => $existing_parent);
2314             $new_sibling->paste( after => $this_sibling_is_already_in_the_tree);
2315
2316           or
2317
2318             my $new_elt= XML::Twig::Elt->new( tag => $content);
2319             $new_elt->paste( $position => $existing_elt);
2320
2321           Example:
2322
             my $t= XML::Twig->new->parsefile( 'doc.xml');
             my $toc= $t->root->new( 'toc');
             $toc->paste( $t->root); # $toc is pasted as first child of the root
             foreach my $title ($t->findnodes( '/doc/section/title'))
               { my $title_toc= $title->copy;
                 # paste $title_toc as the last child of toc
                 $title_toc->paste( last_child => $toc);
2330               }
2331
2332           Position options:
2333
2334           first_child (default)
2335               The element is pasted as the first child of $ref
2336
2337           last_child
2338               The element is pasted as the last child of $ref
2339
2340           before
2341               The element is pasted before $ref, as its previous sibling.
2342
2343           after
2344               The element is pasted after $ref, as its next sibling.
2345
2346           within
2347               In this case an extra argument, $offset, should be supplied.
2348               The element will be pasted in the reference element (or in its
2349               first text child) at the given offset. To achieve this the
2350               reference element will be split at the offset.
2351
           Note that you can call the underlying methods directly:

           paste_before
           paste_after
           paste_first_child
           paste_last_child
           paste_within

       move       ($optional_position, $ref)
2360           Move an element in the tree.  This is just a "cut" then a "paste".
2361           The syntax is the same as "paste".
2362
2363       replace       ($ref)
           Replaces an element in the tree. Sometimes it is just not possible
           to "cut" an element then "paste" another in its place, so "replace"
           comes in handy.  The calling element replaces $ref.
2367
2368       replace_with   (@elts)
2369           Replaces the calling element with one or more elements
2370
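           A small sketch combining "paste", "move" and "replace". In each
           call the calling element is the one that ends up at the given
           position. The tag names are assumptions for the illustration.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse( '<doc><intro/><body/></doc>');
my $root = $twig->root;

# Paste a new element after an existing one: intro, toc, body
my $toc = XML::Twig::Elt->new( toc => 'contents');
$toc->paste( after => $root->first_child( 'intro'));

# Move an element already in the tree: toc, body, intro
$root->first_child( 'intro')->move( last_child => $root);

# The calling element takes the place of its argument: toc, main, intro
my $main = XML::Twig::Elt->new( main => 'text');
$main->replace( $root->first_child( 'body'));

my $order = join ' ', map { $_->tag } $root->children;
print "$order\n";
```
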
2371       delete
           Cut the element and free its memory.
2373
2374       prefix       ($text, $optional_option)
           Add a prefix to an element. If the element is a "PCDATA" element
           the text is added to the pcdata, if the element's first child is a
           "PCDATA" then the text is added to its pcdata, otherwise a new
           "PCDATA" element is created and pasted as the first child of the
           element.

           If the option is "asis" then the prefix is added as is: it is
           created in a separate "PCDATA" element with an "asis" property. You
           can then write:
2384
2385             $elt1->prefix( '<b>', 'asis');
2386
2387           to create a "<b>" in the output of "print".
2388
2389       suffix       ($text, $optional_option)
           Add a suffix to an element. If the element is a "PCDATA" element
           the text is added to the pcdata, if the element's last child is a
           "PCDATA" then the text is added to its pcdata, otherwise a new
           "PCDATA" element is created and pasted as the last child of the
           element.

           If the option is "asis" then the suffix is added as is: it is
           created in a separate "PCDATA" element with an "asis" property. You
           can then write:
2399
2400             $elt2->suffix( '</b>', 'asis');
2401
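           Put together, "prefix" and "suffix" behave as in this sketch. The
           element and the strings are illustrative; the second pair of
           calls uses "asis" so that the markup is output unescaped.

```perl
use strict;
use warnings;
use XML::Twig;

my $p = XML::Twig::Elt->new( p => 'text');

# Plain prefix and suffix are merged into the existing text child.
$p->prefix( 'Note: ');
$p->suffix( '!');
my $plain = $p->sprint;

# "asis" text goes into separate text elements and is not escaped.
$p->prefix( '<b>', 'asis');
$p->suffix( '</b>', 'asis');
my $marked_up = $p->sprint;

print "$plain\n$marked_up\n";
```
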
2402       trim
2403           Trim the element in-place: spaces at the beginning and at the end
2404           of the element are discarded and multiple spaces within the element
2405           (or its descendants) are replaced by a single space.
2406
2407           Note that in some cases you can still end up with multiple spaces,
2408           if they are split between several elements:
2409
2410             <doc>  text <b>  hah! </b>  yep</doc>
2411
2412           gets trimmed to
2413
2414             <doc>text <b> hah! </b> yep</doc>
2415
2416           This is somewhere in between a bug and a feature.
2417
2418       normalize
           Merge together all consecutive pcdata elements in the element (if
           for example you have turned some elements into pcdata using
           "erase", this will give you a "clean" element in which all text
           fragments are as long as possible).
2423
2424       simplify (%options)
           Return a data structure suspiciously similar to XML::Simple's.
           Options are identical to XMLin options, see XML::Simple doc for
           more details (or use Data::Dumper or YAML to dump the data
           structure)
2429
2430           Note: there is no magic here, if you write "$twig->parsefile( $file
2431           )->simplify();" then it will load the entire document in memory. I
2432           am afraid you will have to put some work into it to get just the
2433           bits you want and discard the rest. Look at the synopsis or the
2434           XML::Twig 101 section at the top of the docs for more information.
2435
2436           content_key
2437           forcearray
2438           keyattr
2439           noattr
2440           normalize_space
2441               aka normalise_space
2442
2443           variables (%var_hash)
2444               %var_hash is a hash { name => value }
2445
2446               This option allows variables in the XML to be expanded when the
2447               file is read. (there is no facility for putting the variable
2448               names back if you regenerate XML using XMLout).
2449
2450               A 'variable' is any text of the form ${name} (or $name) which
2451               occurs in an attribute value or in the text content of an
2452               element. If 'name' matches a key in the supplied hashref,
2453               ${name} will be replaced with the corresponding value from the
2454               hashref. If no matching key is found, the variable will not be
2455               replaced.
2456
2457           var_att ($attribute_name)
2458               This option gives the name of an attribute that will be used to
2459               create variables in the XML:
2460
2461                 <dirs>
2462                   <dir name="prefix">/usr/local</dir>
2463                   <dir name="exec_prefix">$prefix/bin</dir>
2464                 </dirs>
2465
2466               use "var => 'name'" to get $prefix replaced by /usr/local in
2467               the generated data structure
2468
               By default variables are captured by the following regexp:
               /\$(\w+)/
2471
2472           var_regexp (regexp)
2473               This option changes the regexp used to capture variables. The
2474               variable name should be in $1
2475
2476           group_tags { grouping tag => grouped tag, grouping tag 2 => grouped
2477           tag 2...}
               Option used to simplify the structure: the elements listed are
               left out of the data structure, and their children are treated
               as children of the grouping element's parent.
2481
2482               If the element is:
2483
2484                 <config host="laptop.xmltwig.org">
2485                   <server>localhost</server>
2486                   <dirs>
2487                     <dir name="base">/home/mrodrigu/standards</dir>
2488                     <dir name="tools">$base/tools</dir>
2489                   </dirs>
2490                   <templates>
2491                     <template name="std_def">std_def.templ</template>
2492                     <template name="dummy">dummy</template>
2493                   </templates>
2494                 </config>
2495
2496               Then calling simplify with "group_tags => { dirs => 'dir',
2497               templates => 'template'}" makes the data structure be exactly
2498               as if the start and end tags for "dirs" and "templates" were
2499               not there.
2500
               A YAML dump of the structure:

                 base: '/home/mrodrigu/standards'
                 host: laptop.xmltwig.org
                 server: localhost
                 template:
                   - std_def.templ
                   - dummy
2509                 tools: '$base/tools'
2510
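           For the simplest case, with no options, "simplify" can be tried as
           follows. This is a sketch on a small made-up configuration
           document; it assumes that attributes and single child elements
           both end up as plain hash entries, as they do with XMLin.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse( '<config host="laptop.xmltwig.org">'
            . '<server>localhost</server><port>8080</port>'
            . '</config>');

# The whole document is in memory at this point.
my $config = $twig->root->simplify;

print "$config->{server}:$config->{port} on $config->{host}\n";
```
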
2511       split_at        ($offset)
2512           Split a text ("PCDATA" or "CDATA") element in 2 at $offset, the
2513           original element now holds the first part of the string and a new
2514           element holds the right part. The new element is returned
2515
2516           If the element is not a text element then the first text child of
2517           the element is split
2518
2519       split        ( $optional_regexp, $tag1, $atts1, $tag2, $atts2...)
2520           Split the text descendants of an element in place, the text is
2521           split using the $regexp, if the regexp includes () then the matched
2522           separators will be wrapped in elements.  $1 is wrapped in $tag1,
2523           with attributes $atts1 if $atts1 is given (as a hashref), $2 is
2524           wrapped in $tag2...
2525
2526           if $elt is "<p>tati tata <b>tutu tati titi</b> tata tati tata</p>"
2527
2528             $elt->split( qr/(ta)ti/, 'foo', {type => 'toto'} )
2529
2530           will change $elt to
2531
2532             <p><foo type="toto">ta</foo> tata <b>tutu <foo type="toto">ta</foo>
2533                 titi</b> tata <foo type="toto">ta</foo> tata</p>
2534
2535           The regexp can be passed either as a string or as "qr//" (perl
2536           5.005 and later), it defaults to \s+ just as the "split" built-in
2537           (but this would be quite a useless behaviour without the
2538           $optional_tag parameter)
2539
2540           $optional_tag defaults to PCDATA or CDATA, depending on the initial
2541           element type
2542
2543           The list of descendants is returned (including un-touched original
2544           elements and newly created ones)
2545
2546       mark        ( $regexp, $optional_tag, $optional_attribute_ref)
2547           This method behaves exactly as split, except only the newly created
2548           elements are returned
2549
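           A sketch of "mark" on invented data: every phone-like number in
           the text descendants is wrapped in a <phone> element (a
           hypothetical tag name), and the wrapped elements are returned.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse( '<p>call 555-1234 or <b>fax 555-9876</b> today</p>');
my $p = $twig->root;

# The capturing parentheses decide what gets wrapped in the new element.
my @marked = $p->mark( qr/(\d{3}-\d{4})/, 'phone');
my @phones = grep { $_->tag eq 'phone' } @marked;

print $p->sprint, "\n";
print join( ', ', map { $_->text } @phones), "\n";
```
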
2550       wrap_children ( $regexp_string, $tag, $optional_attribute_hashref)
2551           Wrap the children of the element that match the regexp in an
2552           element $tag.  If $optional_attribute_hashref is passed then the
2553           new element will have these attributes.
2554
2555           The $regexp_string includes tags, within pointy brackets, as in
2556           "<title><para>+" and the usual Perl modifiers (+*?...).  Tags can
2557           be further qualified with attributes: "<para type="warning"
2558           classif="cosmic_secret">+". The values for attributes should be
2559           xml-escaped: "<candy type="M&amp;Ms">*" ("<", "&" ">" and """
2560           should be escaped).
2561
2562           Note that elements might get extra "id" attributes in the process.
2563           See add_id.  Use strip_att to remove unwanted id's.
2564
2565           Here is an example:
2566
2567           If the element $elt has the following content:
2568
2569             <elt>
2570              <p>para 1</p>
2571              <l_l1_1>list 1 item 1 para 1</l_l1_1>
2572                <l_l1>list 1 item 1 para 2</l_l1>
2573              <l_l1_n>list 1 item 2 para 1 (only para)</l_l1_n>
2574              <l_l1_n>list 1 item 3 para 1</l_l1_n>
2575                <l_l1>list 1 item 3 para 2</l_l1>
2576                <l_l1>list 1 item 3 para 3</l_l1>
2577              <l_l1_1>list 2 item 1 para 1</l_l1_1>
2578                <l_l1>list 2 item 1 para 2</l_l1>
2579              <l_l1_n>list 2 item 2 para 1 (only para)</l_l1_n>
2580              <l_l1_n>list 2 item 3 para 1</l_l1_n>
2581                <l_l1>list 2 item 3 para 2</l_l1>
2582                <l_l1>list 2 item 3 para 3</l_l1>
2583             </elt>
2584
2585           Then the code
2586
2587             $elt->wrap_children( q{<l_l1_1><l_l1>*} , li => { type => "ul1" });
2588             $elt->wrap_children( q{<l_l1_n><l_l1>*} , li => { type => "ul" });
2589
2590             $elt->wrap_children( q{<li type="ul1"><li type="ul">+}, "ul");
2591             $elt->strip_att( 'id');
2592             $elt->strip_att( 'type');
2593             $elt->print;
2594
2595           will output:
2596
2597             <elt>
2598                <p>para 1</p>
2599                <ul>
2600                  <li>
2601                    <l_l1_1>list 1 item 1 para 1</l_l1_1>
2602                    <l_l1>list 1 item 1 para 2</l_l1>
2603                  </li>
2604                  <li>
2605                    <l_l1_n>list 1 item 2 para 1 (only para)</l_l1_n>
2606                  </li>
2607                  <li>
2608                    <l_l1_n>list 1 item 3 para 1</l_l1_n>
2609                    <l_l1>list 1 item 3 para 2</l_l1>
2610                    <l_l1>list 1 item 3 para 3</l_l1>
2611                  </li>
2612                </ul>
2613                <ul>
2614                  <li>
2615                    <l_l1_1>list 2 item 1 para 1</l_l1_1>
2616                    <l_l1>list 2 item 1 para 2</l_l1>
2617                  </li>
2618                  <li>
2619                    <l_l1_n>list 2 item 2 para 1 (only para)</l_l1_n>
2620                  </li>
2621                  <li>
2622                    <l_l1_n>list 2 item 3 para 1</l_l1_n>
2623                    <l_l1>list 2 item 3 para 2</l_l1>
2624                    <l_l1>list 2 item 3 para 3</l_l1>
2625                  </li>
2626                </ul>
2627             </elt>
2628
2629       subs_text ($regexp, $replace)
           subs_text does text substitution, similar to perl's "s///"
           operator.
2632
2633           $regexp must be a perl regexp, created with the "qr" operator.
2634
2635           $replace can include "$1, $2"... from the $regexp. It can also be
2636           used to create element and entities, by using "&elt( tag => { att
2637           => val }, text)" (similar syntax as "new") and &ent( name).
2638
2639           Here is a rather complex example:
2640
2641             $elt->subs_text( qr{(?<!do not )link to (http://([^\s,]*))},
2642                              'see &elt( a =>{ href => $1 }, $2)'
2643                            );
2644
           This will replace text like "link to http://www.xmltwig.org" by
           "see <a href="http://www.xmltwig.org">www.xmltwig.org</a>", but
           not "do not link to...".
2648
2649           Generating entities (here replacing spaces with &nbsp;):
2650
2651             $elt->subs_text( qr{ }, '&ent( "&nbsp;")');
2652
2653           or, using a variable:
2654
2655             my $ent="&nbsp;";
2656             $elt->subs_text( qr{ }, "&ent( '$ent')");
2657
2658           Note that the substitution is always global, as in using the "g"
2659           modifier in a perl substitution, and that it is performed on all
2660           text descendants of the element.
2661
2662           Bug: in the $regexp, you can only use "\1", "\2"... if the
2663           replacement expression does not include elements or attributes. eg
2664
2665             $t->subs_text( qr/((t[aiou])\2)/, '$2');             # ok, replaces toto, tata, titi, tutu by to, ta, ti, tu
2666             $t->subs_text( qr/((t[aiou])\2)/, '&elt(p => $1)' ); # NOK, does not find toto...
2667
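           Both kinds of replacement can be tried with the sketch below: a
           plain string using $1, and the link example from above. The
           sample documents are invented, and each contains a single match
           per text element to keep the illustration simple.

```perl
use strict;
use warnings;
use XML::Twig;

# Plain text replacement, applied to all text descendants.
my $prices = XML::Twig->new;
$prices->parse( '<p>price: 10 USD, <b>20 USD</b></p>');
$prices->root->subs_text( qr/(\d+) USD/, '$1 dollars');
print $prices->root->sprint, "\n";

# Replacement that creates an element with &elt.
my $links = XML::Twig->new;
$links->parse( '<p>please link to http://www.xmltwig.org, thanks</p>');
$links->root->subs_text( qr{(?<!do not )link to (http://([^\s,]*))},
                         'see &elt( a =>{ href => $1 }, $2)'
                       );
print $links->root->sprint, "\n";
```
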
2668       add_id ($optional_coderef)
2669           Add an id to the element.
2670
2671           The id is an attribute, "id" by default, see the "id" option for
2672           XML::Twig "new" to change it. Use an id starting with "#" to get an
2673           id that's not output by print, flush or sprint, yet that allows you
2674           to use the elt_id method to get the element easily.
2675
2676           If the element already has an id, no new id is generated.
2677
           By default the method creates an id of the form "twig_id_<nnnn>",
           where "<nnnn>" is a number, incremented each time the method is
           called successfully.
2681
2682       set_id_seed ($prefix)
           By default the id generated by "add_id" is "twig_id_<nnnn>";
           "set_id_seed" changes the prefix to $prefix and resets the number
           to 1
2686
2687       strip_att ($att)
2688           Remove the attribute $att from all descendants of the element
2689           (including the element)
2690
2691           Return the element
2692
2693       change_att_name ($old_name, $new_name)
2694           Change the name of the attribute from $old_name to $new_name. If
2695           there is no attribute $old_name nothing happens.
2696
2697       lc_attnames
2698           Lowercase the names of all the attributes of the element.
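
           The three attribute methods above can be sketched together as
           follows. The document and the attribute names ("tmp", "TYPE",
           "ref") are assumptions chosen for the example:

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical document with a mixed-case attribute and a scratch attribute
my $twig = XML::Twig->new->parse(
  '<doc TYPE="a" tmp="1"><p tmp="2" ref="x"/></doc>');
my $root = $twig->root;
my $p    = $root->first_child( 'p');

$root->strip_att( 'tmp');                 # removed from doc and from p
$root->lc_attnames;                       # TYPE becomes type, on doc only
$p->change_att_name( ref => 'href');      # renamed, value kept

print $root->sprint, "\n";
```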
2699
2700       sort_children_on_value( %options)
2701           Sort the children of the element in place according to their text.
2702           All children are sorted.
2703
2704           Return the element, with its children sorted.
2705
2706           %options are
2707
2708             type  : numeric |  alpha     (default: alpha)
2709             order : normal  |  reverse   (default: normal)
2712
2713       sort_children_on_att ($att, %options)
2714           Sort the children of the element in place according to attribute
2715           $att. %options are the same as for "sort_children_on_value".
2716
2717           Return the element.
2718
2719       sort_children_on_field ($tag, %options)
2720           Sort the children of the element in place, according to the field
2721           $tag (the text of the first child of the child with this tag).
2722           %options are the same as for "sort_children_on_value".
2723
2724           Return the element, with its children sorted
2725
2726       sort_children( $get_key, %options)
2727           Sort the children of the element in place. The $get_key argument is
2728           a reference to a function that returns the sort key when passed an
2729           element.
2730
2731           For example:
2732
2733             $elt->sort_children( sub { $_[0]->att( 'nb') + $_[0]->text },
2734                                  type => 'numeric', order => 'reverse'
2735                                );
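
           A minimal sketch of the sorting methods, assuming a hypothetical
           list whose items carry a numeric "nb" attribute:

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical list: attribute order and text order differ on purpose
my $twig = XML::Twig->new->parse(
  '<list><item nb="10">b</item><item nb="9">c</item><item nb="100">a</item></list>');
my $list = $twig->root;

# numeric sort on the attribute (an alpha sort would put "10" before "9")
$list->sort_children_on_att( 'nb', type => 'numeric');
my @by_nb = map { $_->att( 'nb') } $list->children;

# reverse alphabetical sort on the text of each child
$list->sort_children_on_value( order => 'reverse');
my @by_text = map { $_->text } $list->children;

print "@by_nb\n@by_text\n";
```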
2736
2737       field_to_att ($cond, $att)
2738           Turn the text of the first sub-element matched by $cond into the
2739           value of attribute $att of the element. If $att is omitted then
2740           $cond is used as the name of the attribute, which makes sense only
2741           if $cond is a valid element (and attribute) name.
2742
2743           The sub-element is then cut.
2744
2745       att_to_field ($att, $tag)
2746           Take the value of attribute $att and create a sub-element $tag as
2747           first child of the element. If $tag is omitted then $att is used as
2748           the name of the sub-element.
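
           The round trip between a sub-element and an attribute can be
           sketched like this (the "person" document and the "fullname" tag
           are hypothetical):

```perl
use strict;
use warnings;
use XML::Twig;

my $twig   = XML::Twig->new->parse(
  '<person><name>Bob</name><age>30</age></person>');
my $person = $twig->root;

# the text of the first name child becomes the name attribute
$person->field_to_att( 'name');
my $name_att      = $person->att( 'name');
my $name_children = () = $person->children( 'name');   # count of children left

# and back again, this time under a different tag
$person->att_to_field( name => 'fullname');

print $person->sprint, "\n";
```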
2749
2750       get_xpath  ($xpath, $optional_offset)
2751           Return a list of elements satisfying the $xpath. $xpath is an
2752           XPATH-like expression.
2753
2754           A subset of the XPATH abbreviated syntax is covered:
2755
2756             tag
2757             tag[1] (or any other positive number)
2758             tag[last()]
2759             tag[@att] (the attribute exists for the element)
2760             tag[@att="val"]
2761             tag[@att=~ /regexp/]
2762             tag[@att1="val1" and @att2="val2"]
2763             tag[@att1="val1" or @att2="val2"]
2764             tag[string()="toto"] (returns tag elements whose text (as per the text method)
2765                                  is toto)
2766             tag[string()=~/regexp/] (returns tag elements whose text (as per the text
2767                                     method) matches regexp)
2768             expressions can start with / (search starts at the document root)
2769             expressions can start with . (search starts at the current element)
2770             // can be used to get all descendants instead of just direct children
2771             * matches any tag
2772
2773           So the following examples from the XPath recommendation
2774           <http://www.w3.org/TR/xpath.html#path-abbrev> work:
2775
2776             para selects the para element children of the context node
2777             * selects all element children of the context node
2778             para[1] selects the first para child of the context node
2779             para[last()] selects the last para child of the context node
2780             */para selects all para grandchildren of the context node
2781             /doc/chapter[5]/section[2] selects the second section of the fifth chapter
2782                of the doc
2783             chapter//para selects the para element descendants of the chapter element
2784                children of the context node
2785             //para selects all the para descendants of the document root and thus selects
2786                all para elements in the same document as the context node
2787             //olist/item selects all the item elements in the same document as the
2788                context node that have an olist parent
2789             .//para selects the para element descendants of the context node
2790             .. selects the parent of the context node
2791             para[@type="warning"] selects all para children of the context node that have
2792                a type attribute with value warning
2793             employee[@secretary and @assistant] selects all the employee children of the
2794                context node that have both a secretary attribute and an assistant
2795                attribute
2796
2797           The elements will be returned in the document order.
2798
2799           If $optional_offset is used then only one element will be returned,
2800           the one with the appropriate offset in the list, starting at 0
2801
2802           Quoting and interpolating variables can be a pain when the Perl
2803           syntax and the XPATH syntax collide, so use alternate quoting
2804           mechanisms like q or qq (I like q{} and qq{} myself).
2805
2806           Here are some more examples to get you started:
2807
2808             my $p1= "p1";
2809             my $p2= "p2";
2810             my @res= $t->get_xpath( qq{p[string( "$p1") or string( "$p2")]});
2811
2812             my $att_val= "a1";
2813             my @res= $t->get_xpath( qq{//*[\@att="$att_val"]});
2814
2815             my $val= "a1";
2816             my $exp= qq{//p[ \@att='$val']}; # you need to use \@ or you will get a warning
2817             my @res= $t->get_xpath( $exp);
2818
2819           Note that the only supported regexp delimiter is / and that you
2820           must backslash all / in regexps AND in regular strings.
2821
2822           XML::Twig does not provide natively full XPATH support, but you can
2823           use "XML::Twig::XPath" to get "findnodes" to use "XML::XPath" as
2824           the XPath engine, with full coverage of the spec.
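
           As a self-contained sketch of the subset described above, the
           following runs a few queries against a small, hypothetical
           document. Single quotes are used so that Perl interpolates neither
           "@type" nor the regexp:

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical document used only for illustration
my $twig = XML::Twig->new->parse(
  '<doc><p type="warning">w1</p><p>plain</p>'
  . '<sect><p type="warning">w2</p></sect></doc>');
my $root = $twig->root;

my @warnings = $twig->get_xpath( '//p[@type="warning"]');  # whole document
my @children = $root->get_xpath( 'p');                     # direct children only
my $last     = $root->get_xpath( 'p[last()]', 0);          # offset: one element
my @by_text  = $twig->get_xpath( '//p[string()=~ /^w/]');  # text matches regexp

print join( ' ', map { $_->text } @warnings), "\n";
```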
2825
2829       find_nodes
2830           Same as "get_xpath"
2831
2832       findnodes
2833           Same as "get_xpath"
2834
2835       text @optional_options
2836           Return a string consisting of all the "PCDATA" and "CDATA" in an
2837           element, without any tags. The text is not XML-escaped: base
2838           entities such as "&" and "<" are not escaped.
2839
2840           The '"no_recurse"' option will only return the text of the element,
2841           not of any included sub-elements (same as "text_only").
2842
2843       text_only
2844           Same as "text" except that the text returned doesn't include the
2845           text of sub-elements.
2846
2847       trimmed_text
2848           Same as "text" except that the text is trimmed: leading and
2849           trailing spaces are discarded, consecutive spaces are collapsed
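
           The difference between the three text methods can be sketched on a
           hypothetical paragraph with mixed content and extra spaces:

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical paragraph: text, a sub-element, and irregular spacing
my $twig = XML::Twig->new->parse( '<p> Tom and <b>Jerry</b>   run </p>');
my $p    = $twig->root;

my $text       = $p->text;                 # includes the text of <b>
my $text_only  = $p->text_only;            # direct text children only
my $no_recurse = $p->text( 'no_recurse');  # same result as text_only
my $trimmed    = $p->trimmed_text;         # spaces trimmed and collapsed

print "[$text]\n[$text_only]\n[$trimmed]\n";
```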
2850
2851       set_text        ($string)
2852           Set the text for the element: if the element is a "PCDATA", just
2853           set its text, otherwise cut all the children of the element and
2854           create a single "PCDATA" child for it, which holds the text.
2855
2856       merge ($elt2)
2857           Move the content of $elt2 within the element
2858
2859       insert         ($tag1, [$optional_atts1], $tag2, [$optional_atts2],...)
2860           For each tag in the list inserts an element $tag as the only child
2861           of the element.  The element gets the optional attributes in
2862           "$optional_atts<n>".  All children of the element are set as
2863           children of the new element.  The upper level element is returned.
2864
2865             $p->insert( table => { border=> 1}, 'tr', 'td')
2866
2867           puts the content of $p in a table with a visible border, a single
2868           "tr" and a single "td", and returns the "table" element:
2869
2870             <p><table border="1"><tr><td>original content of p</td></tr></table></p>
2871
2872       wrap_in        (@tag)
2873           Wrap elements in @tag as the successive ancestors of the element,
2874           returns the new element.  "$elt->wrap_in( 'td', 'tr', 'table')"
2875           wraps the element as a single cell in a table for example.
2876
2877           Optionally each tag can be followed by a hashref of attributes,
2878           that will be set on the wrapping element:
2879
2880             $elt->wrap_in( p => { class => "advisory" }, div => { class => "intro", id => "div_intro" });
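
           "insert" and "wrap_in" work in opposite directions: the first adds
           elements inside the element, the second around it. A minimal
           sketch on a hypothetical document (return values are not used
           here, only the resulting tree is inspected):

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new->parse( '<doc><p>content</p><note>n</note></doc>');
my $p    = $twig->root->first_child( 'p');
my $note = $twig->root->first_child( 'note');

# new elements are nested inside p, around its former content
$p->insert( table => { border => 1 }, 'tr', 'td');

# new elements are nested around note, innermost tag first
$note->wrap_in( 'em', div => { class => 'aside' });

print $twig->root->sprint, "\n";
```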
2881
2882       insert_new_elt ($opt_position, $tag, $opt_atts_hashref, @opt_content)
2883           Combines a "new" and a "paste": creates a new element using $tag,
2884           $opt_atts_hashref and @opt_content which are arguments similar to
2885           those for "new", then paste it, using $opt_position or
2886           'first_child', relative to $elt.
2887
2888           Return the newly created element
2889
2890       erase
2891           Erase the element: the element is deleted and all of its children
2892           are pasted in its place.
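
           A short sketch of "insert_new_elt" and "erase" on a hypothetical
           list (tags, ids and text are made up for the example):

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new->parse(
  '<list><item>one</item><item>t<b>w</b>o</item></list>');
my $list = $twig->root;

# explicit position, attributes and content
my $new = $list->insert_new_elt( last_child => 'item', { id => 'i3' }, 'three');

# no position given: the new element is pasted as first child
my $title = $list->insert_new_elt( title => 'Numbers');

# remove the b markup but keep its content in place
$_->erase foreach $list->descendants( 'b');

print $list->sprint, "\n";
```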
2893
2894       set_content    ( $optional_atts, @list_of_elt_and_strings)
2895       set_content    ( $optional_atts, '#EMPTY')
2896           Set the content for the element, from a list of strings and
2897           elements.  Cuts all the element children, then pastes the list
2898           elements as the children.  This method will create a "PCDATA"
2899           element for any strings in the list.
2900
2901           The $optional_atts argument is the ref of a hash of attributes. If
2902           this argument is used then the previous attributes are deleted,
2903           otherwise they are left untouched.
2904
2905           WARNING: if you rely on ID's then you will have to set the id
2906           yourself. At this point the element does not belong to a twig yet,
2907           so the ID attribute is not known so it won't be stored in the ID
2908           list.
2909
2910           A content of '"#EMPTY"' creates an empty element.
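
           A minimal sketch of "set_content" with an attribute hashref and a
           mix of strings and elements. The document, the "em" element and
           the class names are assumptions:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new->parse(
  '<doc><p class="old" lang="en">previous <i>content</i></p></doc>');
my $p    = $twig->root->first_child( 'p');

# a free-standing element, to be pasted as part of the new content
my $em = XML::Twig::Elt->new( em => 'new');

# the hashref replaces all previous attributes, the list replaces the children
$p->set_content( { class => 'fresh' }, 'some ', $em, ' content');

print $p->sprint, "\n";
```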
2911
2912       namespace ($optional_prefix)
2913           Return the URI of the namespace that $optional_prefix or the
2914           element name belongs to. If the name doesn't belong to any
2915           namespace, "undef" is returned.
2916
2917       local_name
2918           Return the local name (without the prefix) for the element
2919
2920       ns_prefix
2921           Return the namespace prefix for the element
2922
2923       current_ns_prefixes
2924           Return a list of namespace prefixes valid for the element. The
2925           order of the prefixes in the list has no meaning. If the default
2926           namespace is currently bound, '' appears in the list.
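
           The namespace accessors can be sketched as follows, assuming a
           twig created without the "map_xmlns" option, so that tags keep the
           prefix used in the document. The document itself is hypothetical:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new->parse(
  '<doc xmlns:svg="http://www.w3.org/2000/svg"><svg:rect/></doc>');
my $rect = $twig->root->first_child;

my $prefix = $rect->ns_prefix;    # prefix part of the tag
my $local  = $rect->local_name;   # tag without the prefix
my $uri    = $rect->namespace;    # URI declared on an ancestor

print "$prefix | $local | $uri\n";
```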
2927
2928       inherit_att  ($att, @optional_tag_list)
2929           Return the value of an attribute inherited from parent tags. The
2930           value returned is found by looking for the attribute in the element
2931           then in turn in each of its ancestors. If the @optional_tag_list is
2932           supplied only those ancestors whose tag is in the list will be
2933           checked.
2934
2935       all_children_are ($optional_condition)
2936           return 1 if all children of the element pass the
2937           $optional_condition, 0 otherwise
2938
2939       level       ($optional_condition)
2940           Return the depth of the element in the twig (root is 0).  If
2941           $optional_condition is given then only ancestors that match the
2942           condition are counted.
2943
2944           WARNING: in a tree created using the "twig_roots" option this will
2945           not return the level in the document tree, level 0 will be the
2946           document root, level 1 will be the "twig_roots" elements. During
2947           the parsing (in a "twig_handler") you can use the "depth" method on
2948           the twig object to get the real parsing depth.
2949
2950       in           ($potential_parent)
2951           Return true if the element is in the potential_parent
2952           ($potential_parent is an element)
2953
2954       in_context   ($cond, $optional_level)
2955           Return true if the element is included in an element which passes
2956           $cond optionally within $optional_level levels. The returned value
2957           is the including element.
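
           The ancestor-related methods above ("inherit_att", "level", "in"
           and "in_context") can be sketched together on a hypothetical
           document:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new->parse(
  '<doc lang="en"><sect lang="fr"><p><b>x</b></p></sect></doc>');
my $sect = $twig->root->first_child( 'sect');
my $bold = $sect->first_descendant( 'b');

my $level    = $bold->level;                       # doc is 0, sect 1, p 2
my $lang     = $bold->inherit_att( 'lang');        # nearest ancestor wins
my $doc_lang = $bold->inherit_att( lang => 'doc'); # only doc is checked
my $in_sect  = $bold->in( $sect);
my $context  = $bold->in_context( 'sect');         # the including element

print "$level $lang $doc_lang\n";
```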
2958
2959       pcdata
2960           Return the text of a "PCDATA" element or "undef" if the element is
2961           not "PCDATA".
2962
2963       pcdata_xml_string
2964           Return the text of a "PCDATA" element or undef if the element is
2965           not "PCDATA".  The text is "XML-escaped" ('&' and '<' are replaced
2966           by '&amp;' and '&lt;')
2967
2968       set_pcdata     ($text)
2969           Set the text of a "PCDATA" element. This method does not check that
2970           the element is indeed a "PCDATA" so usually you should use
2971           "set_text" instead.
2972
2973       append_pcdata  ($text)
2974           Add the text at the end of a "PCDATA" element.
2975
2976       is_cdata
2977           Return 1 if the element is a "CDATA" element, returns 0 otherwise.
2978
2979       is_text
2980           Return 1 if the element is a "CDATA" or "PCDATA" element, returns 0
2981           otherwise.
2982
2983       cdata
2984           Return the text of a "CDATA" element or "undef" if the element is
2985           not "CDATA".
2986
2987       cdata_string
2988           Return the XML string of a "CDATA" element, including the opening
2989           and closing markers.
2990
2991       set_cdata     ($text)
2992           Set the text of a "CDATA" element.
2993
2994       append_cdata  ($text)
2995           Add the text at the end of a "CDATA" element.
2996
2997       remove_cdata
2998           Turns all "CDATA" sections in the element into regular "PCDATA"
2999           elements. This is useful when converting XML to HTML, as browsers
3000           do not support CDATA sections.
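
           A minimal sketch of the "CDATA" methods, assuming a hypothetical
           document whose only content is a CDATA section:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig  = XML::Twig->new->parse( '<doc><![CDATA[a < b]]></doc>');
my $cdata = $twig->root->first_child;

my $is_cdata = $cdata->is_cdata;       # true for a CDATA element
my $content  = $cdata->cdata;          # text only
my $string   = $cdata->cdata_string;   # text with the CDATA markers

# turn the section into regular text
$twig->root->remove_cdata;
my $still_cdata = $twig->root->first_child->is_cdata;

print "$content\n$string\n";
```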
3001
3002       extra_data
3003           Return the extra_data (comments and PI's) attached to an element
3004
3005       set_extra_data     ($extra_data)
3006           Set the extra_data (comments and PI's) attached to an element
3007
3008       append_extra_data  ($extra_data)
3009           Append extra_data to the existing extra_data before the element (if
3010           no previous extra_data exists then it is created)
3011
3012       set_asis
3013           Set a property of the element that causes it to be output without
3014           being XML escaped by the print functions: if it contains "a < b" it
3015           will be output as such and not as "a &lt; b". This can be useful to
3016           create text elements that will be output as markup. Note that all
3017           "PCDATA" descendants of the element are also marked as having the
3018           property (they are the ones that are actually impacted by the
3019           change).
3020
3021           If the element is a "CDATA" element it will also be output asis,
3022           without the "CDATA" markers. The same goes for any "CDATA"
3023           descendant of the element
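
           The effect of the "asis" property on output can be sketched like
           this (hypothetical document and text):

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new->parse( '<doc><p>placeholder</p></doc>');
my $p    = $twig->root->first_child( 'p');

$p->set_text( 'a < b');
my $escaped = $p->sprint;    # the text is XML-escaped on output

$p->set_asis;
my $raw = $p->sprint;        # the same text, now output unchanged

print "$escaped\n$raw\n";
```

           The second string is not well-formed XML here, which is why the
           property is best reserved for text that is known to be markup.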
3024
3025       set_not_asis
3026           Unsets the "asis" property for the element and its text
3027           descendants.
3028
3029       is_asis
3030           Return the "asis" property status of the element ( 1 or "undef")
3031
3032       closed
3033           Return true if the element has been closed. Might be useful if you
3034           are somewhere in the tree, during the parse, and have no idea
3035           whether a parent element is completely loaded or not.
3036
3037       get_type
3038           Return the type of the element: '"#ELT"' for "real" elements, or
3039           '"#PCDATA"', '"#CDATA"', '"#COMMENT"', '"#ENT"', '"#PI"'
3040
3041       is_elt
3042           Return the tag if the element is a "real" element, or 0 if it is
3043           "PCDATA", "CDATA"...
3044
3045       contains_only_text
3046           Return 1 if the element does not contain any other "real" element
3047
3048       contains_only ($exp)
3049           Return the list of children if all children of the element match
3050           the expression $exp
3051
3052             if( $para->contains_only( 'tt')) { ... }
3053
3054       contains_a_single ($exp)
3055           If the element contains a single child that matches the expression
3056           $exp returns that element. Otherwise returns 0.
3057
3058       is_field
3059           same as "contains_only_text"
3060
3061       is_pcdata
3062           Return 1 if the element is a "PCDATA" element, returns 0 otherwise.
3063
3064       is_ent
3065           Return 1 if the element is an entity (an unexpanded entity)
3066           element, return 0 otherwise.
3067
3068       is_empty
3069           Return 1 if the element is empty, 0 otherwise
3070
3071       set_empty
3072           Flags the element as empty. No further check is made, so if the
3073           element is actually not empty the output will be messed up. The
3074           only effect of this method is that the output will be "<tag
3075           att="value"/>".
3076
3077       set_not_empty
3078           Flags the element as not empty. If it is actually empty then the
3079           element will be output as "<tag att="value"></tag>".
3080
3081       is_pi
3082           Return 1 if the element is a processing instruction ("#PI")
3083           element, return 0 otherwise.
3084
3085       target
3086           Return the target of a processing instruction
3087
3088       set_target ($target)
3089           Set the target of a processing instruction
3090
3091       data
3092           Return the data part of a processing instruction
3093
3094       set_data ($data)
3095           Set the data of a processing instruction
3096
3097       set_pi ($target, $data)
3098           Set the target and data of a processing instruction
3099
3100       pi_string
3101           Return the string form of a processing instruction ("<?target
3102           data?>")
3103
3104       is_comment
3105           Return 1 if the element is a comment ("#COMMENT") element, return 0
3106           otherwise.
3107
3108       set_comment ($comment_text)
3109           Set the text for a comment
3110
3111       comment
3112           Return the content of a comment (just the text, not the "<!--" and
3113           "-->")
3114
3115       comment_string
3116           Return the XML string for a comment ("<!-- comment -->")
3117
3118           Note that an XML comment cannot start or end with a '-', or include
3119           '--' (http://www.w3.org/TR/2008/REC-xml-20081126/#sec-comments). If
3120           that is the case (presumably because you have created the comment
3121           yourself, as it could not be in the input XML), then a space will
3122           be inserted before an initial '-', after a trailing one or between
3123           two '-' in the comment (which could presumably mangle javascript
3124           "hidden" in an XHTML comment).
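
           The processing instruction and comment accessors apply to "#PI"
           and "#COMMENT" elements. The sketch below assumes the twig is
           created with the "pi" and "comments" options set to 'process', so
           that both end up as elements in the tree. The document is
           hypothetical:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new( pi => 'process', comments => 'process')
                    ->parse( '<doc><?app do-it now?><!-- note --><p/></doc>');
my $pi      = $twig->root->first_child( '#PI');
my $comment = $twig->root->first_child( '#COMMENT');

my $target = $pi->target;         # first word of the instruction
my $data   = $pi->data;           # the rest of the instruction
my $text   = $comment->comment;   # text between the comment markers

print "$target | $data | $text\n";
```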
3125
3126       set_ent ($entity)
3127           Set a (non-expanded) entity ("#ENT"). $entity is the entity text
3128           ("&ent;")
3129
3130       ent Return the entity for an entity ("#ENT") element ("&ent;")
3131
3132       ent_name
3133           Return the entity name for an entity ("#ENT") element ("ent")
3134
3135       ent_string
3136           Return the entity, either expanded if the expanded version is
3137           available, or non-expanded ("&ent;") otherwise
3138
3139       child ($offset, $optional_condition)
3140           Return the $offset-th child of the element, optionally the
3141           $offset-th child that matches $optional_condition. The children are
3142           treated as a list, so "$elt->child( 0)" is the first child, while
3143           "$elt->child( -1)" is the last child.
3144
3145       child_text ($offset, $optional_condition)
3146           Return the text of a child or "undef" if the child does not
3147           exist. Arguments are the same as for "child".
3148
3149       last_child    ($optional_condition)
3150           Return the last child of the element, or the last child matching
3151           $optional_condition (ie the last of the element children matching
3152           the condition).
3153
3154       last_child_text   ($optional_condition)
3155           Same as "first_child_text" but for the last child.
3156
3157       sibling  ($offset, $optional_condition)
3158           Return the next or previous $offset-th sibling of the element, or
3159           the $offset-th one matching $optional_condition. If $offset is
3160           negative then a previous sibling is returned, if $offset is
3161           positive then a next sibling is returned. "$offset=0" returns the
3162           element if there is no condition or if the element matches the
3163           condition, "undef" otherwise.
3164
3165       sibling_text ($offset, $optional_condition)
3166           Return the text of a sibling or "undef" if the sibling does not
3167           exist.  Arguments are the same as "sibling".
3168
3169       prev_siblings ($optional_condition)
3170           Return the list of previous siblings (optionally matching
3171           $optional_condition) for the element. The elements are ordered in
3172           document order.
3173
3174       next_siblings ($optional_condition)
3175           Return the list of siblings (optionally matching
3176           $optional_condition) following the element. The elements are
3177           ordered in document order.
3178
3179       siblings ($optional_condition)
3180           Return the list of siblings (optionally matching
3181           $optional_condition) of the element (excluding the element itself).
3182           The elements are ordered in document order.
3183
3184       pos ($optional_condition)
3185           Return the position of the element in the children list. The first
3186           child has a position of 1 (as in XPath).
3187
3188           If the $optional_condition is given then only siblings that match
3189           the condition are counted. If the element itself does not match the
3190           condition then 0 is returned.
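
           A minimal sketch of offset-based navigation. Note that "child"
           counts from 0 while "pos" counts from 1. The document is
           hypothetical:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new->parse(
  '<doc><h>t</h><p>a</p><p>b</p><n>x</n><p>c</p></doc>');
my $root = $twig->root;

my $first    = $root->child( 0);             # the h element
my $last     = $root->child( -1);            # the last p
my $second_p = $root->child( 1, 'p');        # second child matching p
my $next_p   = $second_p->sibling( 1, 'p');  # skips the n element
my $prev     = $second_p->sibling( -1);      # the p just before

my $pos      = $second_p->pos;               # among all children
my $pos_p    = $second_p->pos( 'p');         # among p children only
my @before   = $second_p->prev_siblings;     # in document order

print "$pos $pos_p\n";
```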
3191
3192       atts
3193           Return a hash ref containing the element attributes
3194
3195       set_atts      ({ att1=>$att1_val, att2=> $att2_val... })
3196           Set the element attributes with the hash ref supplied as the
3197           argument. The previous attributes are lost (ie the attributes set
3198           by "set_atts" replace all of the attributes of the element).
3199
3200           You can also pass a list instead of a hashref: "$elt->set_atts(
3201           att1 => 'val1',...)"
3202
3203       del_atts
3204           Deletes all the element attributes.
3205
3206       att_nb
3207           Return the number of attributes for the element
3208
3209       has_atts
3210           Return true if the element has attributes (in fact return the
3211           number of attributes, thus being an alias to "att_nb")
3212
3213       has_no_atts
3214           Return true if the element has no attributes, false (0) otherwise
3215
3216       att_names
3217           Return a list of the attribute names for the element
3218
3219       att_xml_string ($att, $options)
3220           Return the attribute value, where '&', '<' and quote (" or the
3221           value of the quote option at twig creation) are XML-escaped.
3222
3223           The options are passed as a hashref. Setting "escape_gt" to a true
3224           value also escapes '>': $elt->att_xml_string( 'myatt', { escape_gt => 1 })
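
           The attribute accessors can be sketched as follows, on a
           hypothetical link element. Attribute names are sorted because
           their order is not relied upon here:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new->parse(
  '<doc><a href="x" title="1 &lt; 2 &amp; 3"/></doc>');
my $link = $twig->root->first_child( 'a');

my $nb    = $link->att_nb;
my @names = sort $link->att_names;
my $raw   = $link->att( 'title');              # unescaped value
my $xml   = $link->att_xml_string( 'title');   # value as it appears in XML

# set_atts replaces every existing attribute
$link->set_atts( { href => 'y' });
my $nb_after = $link->att_nb;

print "$raw\n$xml\n";
```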
3225
3226       set_id       ($id)
3227           Set the "id" attribute of the element to the value.  See "elt_id"
3228           to change the id attribute name
3229
3230       id  Gets the id attribute value
3231
3232       del_id       ($id)
3233           Deletes the "id" attribute of the element and remove it from the id
3234           list for the document
3235
3236       class
3237           Return the "class" attribute for the element (methods on the
3238           "class" attribute are quite convenient when dealing with XHTML, or
3239           plain XML that will eventually be displayed using CSS)
3240
3241       lclass
3242           same as class, except that this method is an lvalue, so you can do
3243           "$elt->lclass= "foo""
3244
3245       set_class ($class)
3246           Set the "class" attribute for the element to $class
3247
3248       add_class ($class)
3249           Add $class to the element "class" attribute: the new class is added
3250           only if it is not already present.
3251
3252           Note that classes are then sorted alphabetically, so the "class"
3253           attribute can be changed even if the class is already there
3254
3255       remove_class ($class)
3256           Remove $class from the element "class" attribute.
3257
3258           Note that classes are then sorted alphabetically, so the "class"
3259           attribute can be changed even if the class is already there
3260
3261       add_to_class ($class)
3262           alias for add_class
3263
3264       att_to_class ($att)
3265           Set the "class" attribute to the value of attribute $att
3266
3267       add_att_to_class ($att)
3268           Add the value of attribute $att to the "class" attribute of the
3269           element
3270
3271       move_att_to_class ($att)
3272           Add the value of attribute $att to the "class" attribute of the
3273           element and delete the attribute
3274
3275       tag_to_class
3276           Set the "class" attribute of the element to the element tag
3277
3278       add_tag_to_class
3279           Add the element tag to its "class" attribute
3280
3281       set_tag_class ($new_tag)
3282           Add the element tag to its "class" attribute and sets the tag to
3283           $new_tag
3284
3285       in_class ($class)
3286           Return true (1) if the element is in the class $class (if $class is
3287           one of the tokens in the element "class" attribute)
3288
3289       tag_to_span
3290           Change the element tag to "span" and set its class to the old tag
3291
3292       tag_to_div
3293           Change the element tag to "div" and set its class to the old tag
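
           A short sketch of the "class" helpers, assuming a hypothetical
           document with a "warning" element that should become a "span":

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new->parse(
  '<doc><p class="note"/><warning>w</warning></doc>');
my $p    = $twig->root->first_child( 'p');
my $warn = $twig->root->first_child( 'warning');

$p->add_class( 'important');
$p->add_class( 'note');            # already present, not added twice
my $classes  = $p->class;          # classes are sorted alphabetically
my $in_class = $p->in_class( 'important');

$p->remove_class( 'note');
my $remaining = $p->class;

$warn->tag_to_span;                # old tag becomes the class

print "$classes | $remaining | ", $warn->sprint, "\n";
```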
3294
3295       DESTROY
3296           Frees the element from memory.
3297
3298       start_tag
3299           Return the string for the start tag for the element, including the
3300           "/>" at the end of an empty element tag
3301
3302       end_tag
3303           Return the string for the end tag of an element.  For an empty
3304           element, this returns the empty string ('').
3305
3306       xml_string @optional_options
3307           Equivalent to "$elt->sprint( 1)", returns the string for the entire
3308           element, excluding the element's tags (but nested element tags are
3309           present)
3310
3311           The '"no_recurse"' option will only return the text of the element,
3312           not of any included sub-elements (same as "xml_text_only").
3313
3314       inner_xml
3315           Another synonym for xml_string
3316
3317       outer_xml
3318           Another synonym for sprint
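
           The string methods above differ only in which tags they include. A
           minimal sketch, assuming the default output styles and a
           hypothetical document:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new->parse(
  '<doc><p class="x">a <b>c</b></p><br/></doc>');
my $p    = $twig->root->first_child( 'p');
my $br   = $twig->root->first_child( 'br');

my $start = $p->start_tag;    # opening tag with attributes
my $end   = $p->end_tag;
my $inner = $p->inner_xml;    # content only, same as xml_string
my $outer = $p->outer_xml;    # whole element, same as sprint

my $br_end = $br->end_tag;    # empty string for an empty element

print "$start$inner$end\n";
```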
3319
3320       xml_text
3321           Return the text of the element, encoded (and processed by the
3322           current "output_filter" or "output_encoding" options), without any
3323           tag.
3324
3325       xml_text_only
3326           Same as "xml_text" except that the text returned doesn't include
3327           the text of sub-elements.
3328
3329       set_pretty_print ($style)
3330           Set the pretty print method, amongst '"none"' (default),
3331           '"nsgmls"', '"nice"', '"indented"', '"record"' and '"record_c"'
3332
3333           pretty_print styles:
3334
3335           none
3336               the default, no "\n" is used
3337
3338           nsgmls
3339               nsgmls style, with "\n" added within tags
3340
3341           nice
3342               adds "\n" wherever possible (NOT SAFE, can lead to invalid XML)
3343
3344           indented
3345               same as "nice" plus indents elements (NOT SAFE, can lead to
3346               invalid XML)
3347
3348           record
3349               table-oriented pretty print, one field per line
3350
3351           record_c
3352               table-oriented pretty print, more compact than "record", one
3353               record per line
3354
3355       set_empty_tag_style ($style)
3356           Set the method to output empty tags, amongst '"normal"' (default),
3357           '"html"', and '"expand"'.
3358
3359           "normal" outputs an empty tag '"<tag/>"', "html" adds a space
3360           '"<tag />"' for elements that can be empty in XHTML and "expand"
3361           outputs '"<tag></tag>"'
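
           The three empty tag styles can be compared on a hypothetical
           document containing a single "br". The style is reset to 'normal'
           at the end of the sketch so that later output is not affected:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new->parse( '<doc><br/></doc>');
my $root = $twig->root;

my $normal = $root->sprint;             # default style

$root->set_empty_tag_style( 'expand');
my $expanded = $root->sprint;

$root->set_empty_tag_style( 'html');
my $html = $root->sprint;

$root->set_empty_tag_style( 'normal');  # restore the default

print "$normal\n$expanded\n$html\n";
```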
3362
3363       set_remove_cdata  ($flag)
3364           set (or unset) the flag that forces the twig to output CDATA
3365           sections as regular (escaped) PCDATA
3366
3367       set_indent ($string)
3368           Set the indentation for the indented pretty print style (default is
3369           2 spaces)
3370
3371       set_quote ($quote)
3372           Set the quotes used for attributes. can be '"double"' (default) or
3373           '"single"'
3374
3375       cmp       ($elt)
3376             Compare the order of the 2 elements in a twig.
3377
3378             $a is the <A>...</A> element, $b is the <B>...</B> element
3379
3380             document                        $a->cmp( $b)
3381             <A> ... </A> ... <B>  ... </B>     -1
3382             <A> ... <B>  ... </B> ... </A>     -1
3383             <B> ... </B> ... <A>  ... </A>      1
3384             <B> ... <A>  ... </A> ... </B>      1
3385              $a == $b                           0
3386              $a and $b not in the same tree   undef
3387
3388       before       ($elt)
3389           Return 1 if the element starts before $elt, 0 otherwise. If the 2
3390           elements are not in the same twig then return "undef".
3391
3392               if( $a->cmp( $b) == -1) { return 1; } else { return 0; }
3393
3394       after       ($elt)
3395           Return 1 if the element starts after $elt, 0 otherwise. If the 2
3396           elements are not in the same twig then return "undef".
3397
3398               if( $a->cmp( $b) == 1) { return 1; } else { return 0; }
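
           A minimal sketch of the comparison methods, following the "cmp"
           table above: "before" corresponds to a result of -1 and "after" to
           a result of 1. The document and variable names are hypothetical:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig   = XML::Twig->new->parse( '<doc><A><C/></A><B/></doc>');
my $first  = $twig->root->first_child( 'A');
my $inner  = $first->first_child( 'C');
my $second = $twig->root->first_child( 'B');

my $a_vs_b = $first->cmp( $second);   # A starts before B
my $b_vs_a = $second->cmp( $first);   # B starts after A
my $a_vs_c = $first->cmp( $inner);    # A starts before the C it contains
my $same   = $first->cmp( $first);

my $is_before = $first->before( $second);
my $is_after  = $second->after( $first);

print "$a_vs_b $b_vs_a $a_vs_c $same\n";
```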
3399
3400       other comparison methods
3401           lt
3402           le
3403           gt
3404           ge
3405       path
3406           Return the element context in a form similar to XPath's short form:
3407           '"/root/tag1/../tag"'
3408
3409       xpath
3410           Return a unique XPath expression that can be used to find the
3411           element again.
3412
3413           It looks like "/doc/sect[3]/title": unique elements do not have an
3414           index, the others do.
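
           The difference between "path" and "xpath" can be sketched on a
           hypothetical document with two sections:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig  = XML::Twig->new->parse(
  '<doc><sect><title>1</title></sect><sect><title>2</title></sect></doc>');
my $title = ($twig->root->children( 'sect'))[1]->first_child( 'title');

my $path  = $title->path;    # tags only, shared by both titles
my $xpath = $title->xpath;   # indexed where needed, unique to this title

# the xpath can be fed back to get_xpath to find the element again
my( $found) = $twig->get_xpath( $xpath);

print "$path\n$xpath\n";
```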
3415
3416       flush
3417           flushes the twig up to the current element (strictly equivalent to
3418           "$elt->root->flush")
3419
3420       private methods
3421           Low-level methods on the twig:
3422
3423           set_parent        ($parent)
3424           set_first_child   ($first_child)
3425           set_last_child    ($last_child)
3426           set_prev_sibling  ($prev_sibling)
3427           set_next_sibling  ($next_sibling)
3428           set_twig_current
3429           del_twig_current
3430           twig_current
3431           contains_text
3432
3433           Those methods should not be used, unless of course you find some
3434           creative and interesting, not to mention useful, ways to do it.
3435
3436   cond
       Most of the navigation functions accept a condition as an optional
       argument. The first element (or all elements for "children" or
       "ancestors") that passes the condition is returned.

       The condition is a single step of an XPath expression using the XPath
       subset defined by "get_xpath". Additional conditions are:
3445
3446       #ELT
3447           return a "real" element (not a PCDATA, CDATA, comment or pi
3448           element)
3449
3450       #TEXT
3451           return a PCDATA or CDATA element
3452
3453       regular expression
3454           return an element whose tag matches the regexp. The regexp has to
3455           be created with "qr//" (hence this is available only on perl 5.005
3456           and above)
3457
3458       code reference
3459           applies the code, passing the current element as argument, if the
3460           code returns true then the element is returned, if it returns false
3461           then the code is applied to the next candidate.
3462
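       The different kinds of condition can be sketched as follows. The
       document, tag names and attribute values are invented for the
       example; "first_child" and "children" are used here only as two
       representative navigation methods:

```perl
use strict;
use warnings;
use XML::Twig;

# hypothetical sample document with mixed content
my $twig = XML::Twig->new;
$twig->parse('<doc>intro<para id="p1">one</para><note>n</note><pre>code</pre><para id="p2">two</para></doc>');
my $root = $twig->root;

my $first_elt  = $root->first_child('#ELT');    # skips the leading text
my $first_text = $root->first_child('#TEXT');   # the PCDATA child
my @p_tags     = $root->children(qr/^p/);       # tags starting with "p"
my @has_p2     = $root->children(               # code ref, element is $_[0]
    sub { ($_[0]->att('id') || '') eq 'p2' }
);
my @xpath_step = $root->children('para[@id="p1"]');   # single XPath step

print "first element: ", $first_elt->tag, "\n";
print "first text:    ", $first_text->text, "\n";
print "p* children:   ", scalar @p_tags, "\n";
print "id=p2:         ", $has_p2[0]->text, "\n";
print "para p1:       ", $xpath_step[0]->text, "\n";
```

       Note that the code reference is called for every candidate,
       including text children, so it should cope with elements that
       have no attributes.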
3463   XML::Twig::XPath
3464       XML::Twig implements a subset of XPath through the "get_xpath" method.
3465
3466       If you want to use the whole XPath power, then you can use
3467       "XML::Twig::XPath" instead. In this case "XML::Twig" uses "XML::XPath"
3468       to execute XPath queries.  You will of course need "XML::XPath"
3469       installed to be able to use "XML::Twig::XPath".
3470
3471       See XML::XPath for more information.
3472
3473       The methods you can use are:
3474
3475       findnodes              ($path)
3476           return a list of nodes found by $path.
3477
3478       findnodes_as_string    ($path)
3479           return the nodes found reproduced as XML. The result is not
3480           guaranteed to be valid XML though.
3481
3482       findvalue              ($path)
3483           return the concatenation of the text content of the result nodes
3484
3485       In order for "XML::XPath" to be used as the XPath engine the following
3486       methods are included in "XML::Twig":
3487
3488       in XML::Twig
3489
3490       getRootNode
3491       getParentNode
3492       getChildNodes
3493
3494       in XML::Twig::Elt
3495
3496       string_value
3497       toString
3498       getName
3499       getRootNode
3500       getNextSibling
3501       getPreviousSibling
3502       isElementNode
3503       isTextNode
3504       isPI
3505       isPINode
3506       isProcessingInstructionNode
3507       isComment
3508       isCommentNode
3509       getTarget
3510       getChildNodes
3511       getElementById
3512
3513   XML::Twig::XPath::Elt
3514       The methods you can use are the same as on "XML::Twig::XPath" elements:
3515
3516       findnodes              ($path)
3517           return a list of nodes found by $path.
3518
3519       findnodes_as_string    ($path)
3520           return the nodes found reproduced as XML. The result is not
3521           guaranteed to be valid XML though.
3522
3523       findvalue              ($path)
3524           return the concatenation of the text content of the result nodes
3525
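       A minimal sketch of these methods, on both the twig and an element.
       The sample document is made up, and the module is loaded inside an
       "eval" because it cannot be used without an XPath engine installed:

```perl
use strict;
use warnings;

# XML::Twig::XPath needs an XPath engine to be installed;
# load it defensively so the script still runs without one
my $have_xpath = eval { require XML::Twig::XPath; 1 };

my ($count, $value);
if ($have_xpath) {
    my $twig = XML::Twig::XPath->new;
    $twig->parse('<doc><sect><title>One</title></sect><sect><title>Two</title></sect></doc>');

    # XPath query on the whole twig
    my @sects = $twig->findnodes('//sect[title="Two"]');
    $count = scalar @sects;

    # the same methods work on elements, relative to that element
    $value = $sects[0]->findvalue('title');
    print "matched $count sect, title: $value\n";
}
else {
    print "no XPath engine available, skipping\n";
}
```

       The value returned by "findvalue" is used as a string here; the
       underlying engine may actually return an object that stringifies.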
3526   XML::Twig::Entity_list
3527       new Create an entity list.
3528
3529       add         ($ent)
3530           Add an entity to an entity list.
3531
3532       add_new_ent ($name, $val, $sysid, $pubid, $ndata, $param)
3533           Create a new entity and add it to the entity list
3534
       delete     ($ent or $tag)
3536           Delete an entity (defined by its name or by the Entity object) from
3537           the list.
3538
3539       print      ($optional_filehandle)
3540           Print the entity list.
3541
3542       list
3543           Return the list as an array
3544
3545   XML::Twig::Entity
3546       new        ($name, $val, $sysid, $pubid, $ndata, $param)
3547           Same arguments as the Entity handler for XML::Parser.
3548
3549       print       ($optional_filehandle)
3550           Print an entity declaration.
3551
3552       name
3553           Return the name of the entity
3554
3555       val Return the value of the entity
3556
3557       sysid
3558           Return the system id for the entity (for NDATA entities)
3559
3560       pubid
3561           Return the public id for the entity (for NDATA entities)
3562
3563       ndata
3564           Return true if the entity is an NDATA entity
3565
3566       param
3567           Return true if the entity is a parameter entity
3568
3569       text
3570           Return the entity declaration text.
3571
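       As an illustration of the two classes above, the following sketch
       builds an entity list by hand instead of getting it from a parsed
       document. The entity names and values are invented:

```perl
use strict;
use warnings;
use XML::Twig;

# build an entity list by hand; names and values are made up
my $list = XML::Twig::Entity_list->new;
$list->add_new_ent('copyright', '(c) Example Corp');       # name, value
$list->add(XML::Twig::Entity->new('product', 'Widget'));   # ready-made object

# list returns the XML::Twig::Entity objects; sort them for stable output
my @entities = sort { $a->name cmp $b->name } $list->list;

foreach my $ent (@entities) {
    printf "%-10s => %s\n", $ent->name, $ent->val;
    print "  declared as: ", $ent->text, "\n";
}
```

       Only the name and value are given, so these are plain internal
       entities; the remaining arguments ($sysid, $pubid, $ndata, $param)
       are left undefined.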
3572   XML::Twig::Notation_list
       new Create a notation list.

       add         ($notation)
           Add a notation to a notation list.

       add_new_notation ($name, $base, $sysid, $pubid)
           Create a new notation and add it to the notation list

       delete     ($notation or $tag)
           Delete a notation (defined by its name or by the Notation object)
           from the list.
3584
3585       print      ($optional_filehandle)
3586           Print the notation list.
3587
3588       list
3589           Return the list as an array
3590
3591   XML::Twig::Notation
3592       new        ($name, $base, $sysid, $pubid)
           Same arguments as the Notation handler for XML::Parser.

       print       ($optional_filehandle)
           Print a notation declaration.
3597
3598       name
3599           Return the name of the notation
3600
3601       base
3602           Return the base to be used for resolving a relative URI
3603
3604       sysid
3605           Return the system id for the notation
3606
3607       pubid
3608           Return the public id for the notation
3609
3610       text
3611           Return the notation declaration text.
3612

EXAMPLES

       Additional examples (and a complete tutorial) can be found on the
       XML::Twig Page <http://www.xmltwig.org/xmltwig/>

       To figure out what flush does, call the following script with an XML
       file and an element name as arguments:
3619
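         use strict;
         use warnings;
         use XML::Twig;

         # arguments: an XML file and the name of the element to flush on
         my ($file, $elt)= @ARGV;

         # flush each time an $elt element is closed, and mark the spot
         my $t= XML::Twig->new( twig_handlers =>
             { $elt => sub {$_[0]->flush; print "\n[flushed here]\n";} });
         $t->parsefile( $file, ErrorContext => 2);
         $t->flush;    # output whatever is left after the last handler call
         print "\n";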
3628

NOTES

3630   Subclassing XML::Twig
3631       Useful methods:
3632
3633       elt_class
           In order to subclass "XML::Twig" you will probably also need to
           subclass "XML::Twig::Elt". Use the "elt_class" option when you
           create the "XML::Twig" object to get the elements created in a
           different class (which should be a subclass of "XML::Twig::Elt").

       add_options
           If you inherit the "XML::Twig" "new" method but want to add more
           options to it, you can use this method to prevent XML::Twig from
           issuing warnings for those additional options.
3643
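       A minimal sketch of the "elt_class" option. The "My::Elt" class and
       its "shout" method are hypothetical, made up for this example:

```perl
use strict;
use warnings;
use XML::Twig;

# hypothetical element subclass adding one convenience method;
# XML::Twig::Elt lives in XML/Twig.pm, hence -norequire
package My::Elt;
use parent -norequire, 'XML::Twig::Elt';

sub shout { my $elt = shift; return uc $elt->text; }

package main;

# elements of this twig are created as My::Elt objects
my $twig = XML::Twig->new(elt_class => 'My::Elt');
$twig->parse('<doc><p>hello</p></doc>');

my $para  = $twig->root->first_child('p');
my $class = ref $para;
my $loud  = $para->shout;
print "$class says $loud\n";
```

       All the usual "XML::Twig::Elt" methods remain available on the
       elements, since "My::Elt" inherits from it.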
3644   DTD Handling
3645       There are 3 possibilities here.  They are:
3646
3647       No DTD
3648           No doctype, no DTD information, no entity information, the world is
3649           simple...
3650
3651       Internal DTD
3652           The XML document includes an internal DTD, and maybe entity
3653           declarations.
3654
           If you use the "load_DTD" option when creating the twig the DTD
           information and the entity declarations can be accessed.

           The DTD and the entity declarations will be "flush"'ed (or
           "print"'ed) either as is (if they have not been modified) or as
           reconstructed (poorly, comments are lost, order is not kept, due to
           its content this DTD should not be viewed by anyone) if they have
           been modified. You can also modify them directly by changing the
           "$twig->{twig_doctype}->{internal}" field (straight from
           XML::Parser, see the "Doctype" handler doc).
3665
3666       External DTD
3667           The XML document includes a reference to an external DTD, and maybe
3668           entity declarations.
3669
           If you use the "load_DTD" option when creating the twig the DTD
           information and the entity declarations can be accessed. The entity
           declarations will be "flush"'ed (or "print"'ed) either as is (if
           they have not been modified) or as reconstructed (badly, comments
           are lost, order is not kept).

           You can change the doctype through the "$twig->set_doctype" method
           and print the DTD through the "$twig->dtd_text" or
           "$twig->dtd_print" methods.
3680
3681           If you need to modify the entity list this is probably the easiest
3682           way to do it.
3683
3684   Flush
       Remember that element handlers are called when the element is CLOSED,
       so if you have handlers for nested elements the inner handlers will be
       called first. This makes it trickier than it would seem to, for
       example, number nested sections (or clauses, or divs), as the titles
       in the inner sections are handled before the outer sections.
3690
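       The calling order can be observed with a short sketch. The document
       and the "id" attributes are made up; "start_tag_handlers" is the
       XML::Twig option for handlers called when an element is opened, shown
       here as one possible way to see the outer element first:

```perl
use strict;
use warnings;
use XML::Twig;

my (@closed, @opened);

my $twig = XML::Twig->new(
    # called when a section is closed: inner sections come first
    twig_handlers      => { section => sub { push @closed, $_[1]->att('id') } },
    # called when a section is opened: outer sections come first
    start_tag_handlers => { section => sub { push @opened, $_[1]->att('id') } },
);
$twig->parse('<doc><section id="outer"><section id="inner"/></section></doc>');

print "closed: @closed\n";
print "opened: @opened\n";
```

       When a start tag handler runs, only the start tag has been parsed,
       so the attributes are available but the content of the element is
       not.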

BUGS

3692       segfault during parsing
3693           This happens when parsing huge documents, or lots of small ones,
3694           with a version of Perl before 5.16.
3695
3696           This is due to a bug in the way weak references are handled in Perl
3697           itself.
3698
           The fix is to upgrade to Perl 5.16 or later ("perlbrew" is a
           great tool to manage several installations of perl on the same
           machine).

           Another, NOT RECOMMENDED, way of fixing the problem is to switch
           off weak references by writing "XML::Twig::_set_weakrefs( 0);" at
           the top of the code. This is totally unsupported, and may lead to
           other problems though.
3707
3708       entity handling
3709           Due to XML::Parser behaviour, non-base entities in attribute values
3710           disappear if they are not declared in the document:
3711           "att="val&ent;"" will be turned into "att => val", unless you use
3712           the "keep_encoding" argument to "XML::Twig->new"
3713
3714       DTD handling
3715           The DTD handling methods are quite bugged. No one uses them and it
3716           seems very difficult to get them to work in all cases, including
3717           with several slightly incompatible versions of XML::Parser and of
3718           libexpat.
3719
3720           Basically you can read the DTD, output it back properly, and update
3721           entities, but not much more.
3722
           So use XML::Twig with standalone documents, or with documents
           referring to an external DTD, but don't expect it to properly
           parse, or even output back, the DTD.
3726
3727       memory leak
           If you use a REALLY old Perl (5.005!) and a lot of twigs you might
           find that you leak quite a lot of memory (about 2K per twig). You
           can use the "dispose" method to free that memory after you are
           done.
3732
3733           If you create elements the same thing might happen, use the
3734           "delete" method to get rid of them.
3735
3736           Alternatively installing the "Scalar::Util" (or "WeakRef") module
3737           on a version of Perl that supports it (>5.6.0) will get rid of the
3738           memory leaks automagically.
3739
3740       ID list
3741           The ID list is NOT updated when elements are cut or deleted.
3742
3743       change_gi
3744           This method will not function properly if you do:
3745
3746                $twig->change_gi( $old1, $new);
3747                $twig->change_gi( $old2, $new);
3748                $twig->change_gi( $new, $even_newer);
3749
3750       sanity check on XML::Parser method calls
3751           XML::Twig should really prevent calls to some XML::Parser methods,
3752           especially the "setHandlers" method.
3753
3754       pretty printing
3755           Pretty printing (at least using the '"indented"' style) is hard to
3756           get right!  Only elements that belong to the document will be
3757           properly indented. Printing elements that do not belong to the twig
3758           makes it impossible for XML::Twig to figure out their depth, and
3759           thus their indentation level.
3760
3761           Also there is an unavoidable bug when using "flush" and pretty
3762           printing for elements with mixed content that start with an
3763           embedded element:
3764
3765             <elt><b>b</b>toto<b>bold</b></elt>
3766
3767             will be output as
3768
3769             <elt>
3770               <b>b</b>toto<b>bold</b></elt>
3771
3772           if you flush the twig when you find the "<b>" element
3773

Globals

       These are the things that can mess up calling code, especially if
       threaded.  They might also cause problems under mod_perl.

       Exported constants
           Whether you want them or not you get them! These are subroutines to
           use as constants when creating or testing elements
3781
3782             PCDATA  return '#PCDATA'
3783             CDATA   return '#CDATA'
3784             PI      return '#PI', I had the choice between PROC and PI :--(
3785
3786       Module scoped values: constants
3787           these should cause no trouble:
3788
3789             %base_ent= ( '>' => '&gt;',
3790                          '<' => '&lt;',
3791                          '&' => '&amp;',
3792                          "'" => '&apos;',
3793                          '"' => '&quot;',
3794                        );
3795             CDATA_START   = "<![CDATA[";
3796             CDATA_END     = "]]>";
3797             PI_START      = "<?";
3798             PI_END        = "?>";
3799             COMMENT_START = "<!--";
3800             COMMENT_END   = "-->";
3801
3802           pretty print styles
3803
3804             ( $NSGMLS, $NICE, $INDENTED, $INDENTED_C, $WRAPPED, $RECORD1, $RECORD2)= (1..7);
3805
3806           empty tag output style
3807
3808             ( $HTML, $EXPAND)= (1..2);
3809
3810       Module scoped values: might be changed
3811           Most of these deal with pretty printing, so the worst that can
3812           happen is probably that XML output does not look right, but is
3813           still valid and processed identically by XML processors.
3814
           $empty_tag_style can mess up HTML browsers though and changing $ID
           would most likely create problems.
3817
3818             $pretty=0;           # pretty print style
3819             $quote='"';          # quote for attributes
3820             $INDENT= '  ';       # indent for indented pretty print
3821             $empty_tag_style= 0; # how to display empty tags
3822             $ID                  # attribute used as an id ('id' by default)
3823
3824       Module scoped values: definitely changed
3825           These 2 variables are used to replace tags by an index, thus saving
3826           some space when creating a twig. If they really cause you too much
3827           trouble, let me know, it is probably possible to create either a
3828           switch or at least a version of XML::Twig that does not perform
3829           this optimization.
3830
3831             %gi2index;     # tag => index
3832             @index2gi;     # list of tags
3833
3834       If you need to manipulate all those values, you can use the following
3835       methods on the XML::Twig object:
3836
3837       global_state
3838           Return a hashref with all the global variables used by XML::Twig
3839
3840           The hash has the following fields:  "pretty", "quote", "indent",
3841           "empty_tag_style", "keep_encoding", "expand_external_entities",
3842           "output_filter", "output_text_filter", "keep_atts_order"
3843
3844       set_global_state ($state)
3845           Set the global state, $state is a hashref
3846
3847       save_global_state
3848           Save the current global state
3849
3850       restore_global_state
           Restore the previously saved (using "save_global_state") state
3852
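       A sketch of how these methods can protect calling code from a
       temporary change to the global settings. It assumes that
       "set_pretty_print" on the twig changes the module-wide "pretty"
       value described above:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;

# remember the current module-wide settings
$twig->save_global_state;
my $pretty_before = $twig->global_state->{pretty};

# change a global setting for some local output
$twig->set_pretty_print('indented');
my $pretty_during = $twig->global_state->{pretty};

# put everything back the way it was
$twig->restore_global_state;
my $pretty_after = $twig->global_state->{pretty};

print "pretty: before=$pretty_before during=$pretty_during after=$pretty_after\n";
```

       The values printed are the internal numeric codes for the pretty
       print styles, not the style names.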

TODO

3854       SAX handlers
3855           Allowing XML::Twig to work on top of any SAX parser
3856
3857       multiple twigs are not well supported
3858           A number of twig features are just global at the moment. These
3859           include the ID list and the "tag pool" (if you use "change_gi" then
3860           you change the tag for ALL twigs).
3861
           A future version will try to support this while trying not to be too
           hard on performance (at least when a single twig is used!).
3864

AUTHOR

3866       Michel Rodriguez <mirod@cpan.org>
3867

LICENSE

3869       This library is free software; you can redistribute it and/or modify it
3870       under the same terms as Perl itself.
3871
3872       Bug reports should be sent using: RT
3873       <http://rt.cpan.org/NoAuth/Bugs.html?Dist=XML-Twig>
3874
3875       Comments can be sent to mirod@cpan.org
3876
       The XML::Twig page is at <http://www.xmltwig.org/xmltwig/>. It includes
       the development version of the module, a slightly better version of the
       documentation, examples, and a tutorial, "Processing XML efficiently
       with Perl and XML::Twig":
       <http://www.xmltwig.org/xmltwig/tutorial/index.html>
3882

SEE ALSO

3884       Complete docs, including a tutorial, examples, an easier to use HTML
3885       version of the docs, a quick reference card and a FAQ are available at
3886       <http://www.xmltwig.org/xmltwig/>
3887
3888       git repository at <http://github.com/mirod/xmltwig>
3889
       XML::Parser, XML::Parser::Expat, XML::XPath, Encode, Text::Iconv,
       Scalar::Util
3892
3893   Alternative Modules
       XML::Twig is not the only XML processing module available on CPAN (far
       from it!).
3896
3897       The main alternative I would recommend is XML::LibXML.
3898
3899       Here is a quick comparison of the 2 modules:
3900
3901       XML::LibXML, actually "libxml2" on which it is based, sticks to the
3902       standards, and implements a good number of them in a rather strict way:
3903       XML, XPath, DOM, RelaxNG, I must be forgetting a couple (XInclude?). It
3904       is fast and rather frugal memory-wise.
3905
3906       XML::Twig is older: when I started writing it XML::Parser/expat was the
3907       only game in town. It implements XML and that's about it (plus a subset
3908       of XPath, and you can use XML::Twig::XPath if you have XML::XPathEngine
3909       installed for full support). It is slower and requires more memory for
       a full tree than XML::LibXML. On the plus side (yes, there is a plus
       side!) it lets you process a big document in chunks, and thus lets you
       tackle documents that couldn't be loaded in memory by XML::LibXML, and
       it offers a lot (and I mean a LOT!) of higher-level methods, for
       everything, from adding structure to "low-level" XML, to shortcuts for
       XHTML conversions and more. It also DWIMs quite a bit, getting comments
       and non-significant whitespace out of the way but preserving them in
       the output for example. As it does not stick to the DOM, it also
       usually leads to shorter code than XML::LibXML.
3919
3920       Beyond the pure features of the 2 modules, XML::LibXML seems to be
3921       preferred by "XML-purists", while XML::Twig seems to be more used by
3922       Perl Hackers who have to deal with XML. As you have noted, XML::Twig
3923       also comes with quite a lot of docs, but I am sure if you ask for help
3924       about XML::LibXML here or on Perlmonks you will get answers.
3925
3926       Note that it is actually quite hard for me to compare the 2 modules: on
3927       one hand I know XML::Twig inside-out and I can get it to do pretty much
3928       anything I need to (or I improve it ;--), while I have a very basic
3929       knowledge of XML::LibXML.  So feature-wise, I'd rather use XML::Twig
3930       ;--). On the other hand, I am painfully aware of some of the
3931       deficiencies, potential bugs and plain ugly code that lurk in
3932       XML::Twig, even though you are unlikely to be affected by them (unless
3933       for example you need to change the DTD of a document programmatically),
       while I haven't looked much into XML::LibXML so it still looks shiny
       and clean to me.
3936
3937       That said, if you need to process a document that is too big to fit
3938       memory and XML::Twig is too slow for you, my reluctant advice would be
3939       to use "bare" XML::Parser.  It won't be as easy to use as XML::Twig:
3940       basically with XML::Twig you trade some speed (depending on what you do
3941       from a factor 3 to... none) for ease-of-use, but it will be easier IMHO
3942       than using SAX (albeit not standard), and at this point a LOT faster
3943       (see the last test in
3944       <http://www.xmltwig.org/article/simple_benchmark/>).
3945
3946
3947
3948perl v5.38.0                      2023-07-21                           Twig(3)