Twig(3)               User Contributed Perl Documentation              Twig(3)

NAME

       XML::Twig - A perl module for processing huge XML documents in tree
       mode.

SYNOPSIS

       Note that this documentation is intended as a reference to the module.

       Complete docs, including a tutorial, examples, an easier to use HTML
       version, a quick reference card and a FAQ are available at
       <http://www.xmltwig.com/xmltwig>

       Small documents (loaded in memory as a tree):

         my $twig=XML::Twig->new();    # create the twig
         $twig->parsefile( 'doc.xml'); # build it
         my_process( $twig);           # use twig methods to process it
         $twig->print;                 # output the twig

       Huge documents (processed in combined stream/tree mode):

         # at most one div will be loaded in memory
         my $twig=XML::Twig->new(
           twig_handlers =>
             { title   => sub { $_->set_tag( 'h2') }, # change title tags to h2
               para    => sub { $_->set_tag( 'p')  }, # change para to p
               hidden  => sub { $_->delete;       },  # remove hidden elements
               list    => \&my_list_process,          # process list elements
               div     => sub { $_[0]->flush;     },  # output and free memory
             },
           pretty_print => 'indented',                # output will be nicely formatted
           empty_tags   => 'html',                    # outputs <empty_tag />
                                );
         $twig->parsefile( 'doc.xml');                # parse, triggering the handlers
         $twig->flush;                                # flush the end of the document

       See XML::Twig 101 for other ways to use the module, as a filter for
       example.

DESCRIPTION

       This module provides a way to process XML documents. It is built on top
       of "XML::Parser".

       The module offers a tree interface to the document, while allowing you
       to output the parts of it that have been completely processed.

       It allows minimal resource (CPU and memory) usage by building the tree
       only for the parts of the documents that need actual processing,
       through the use of the "twig_roots" and "twig_print_outside_roots"
       options. The "finish" and "finish_print" methods also help to
       improve performance.

       XML::Twig tries to make simple things easy, so it does its best to
       take care of a lot of the (usually) annoying (but sometimes necessary)
       features that come with XML and XML::Parser.

XML::Twig 101

       XML::Twig can be used either on "small" XML documents (that fit in
       memory) or on huge ones, by processing parts of the document and
       outputting or discarding them once they are processed.

   Loading an XML document and processing it
         my $t= XML::Twig->new();
         $t->parse( '<d><title>title</title><para>p 1</para><para>p 2</para></d>');
         my $root= $t->root;
         $root->set_tag( 'html');                 # change doc to html
         my $title= $root->first_child( 'title'); # get the title
         $title->set_tag( 'h1');                  # turn it into h1
         my @para= $root->children( 'para');      # get the para children
         foreach my $para (@para)
           { $para->set_tag( 'p'); }              # turn them into p
         $t->print;                               # output the document

       Other useful methods include:

       att: "$elt->att( 'foo')" returns the "foo" attribute of an element,

       set_att: "$elt->set_att( foo => "bar")" sets the "foo" attribute to
       the "bar" value,

       next_sibling: "$elt->next_sibling" returns the next sibling in the
       document (in the example "$title->next_sibling" is the first "para";
       you can also (and actually should) use "$elt->next_sibling( 'para')" to
       get it).

       The document can also be transformed through the use of the cut, copy,
       paste and move methods: "$title->cut; $title->paste( after => $p);" for
       example.

       And much, much more, see XML::Twig::Elt.
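
       The methods listed above can be combined as in the following minimal
       sketch. The "nb" and "status" attributes and the sample document are
       made up for illustration; only the method names come from this page.

```perl
use strict;
use warnings;
use XML::Twig;

my $t= XML::Twig->new();
$t->parse( '<d><title nb="1">title</title><para>p 1</para><para>p 2</para></d>');

my $root=  $t->root;
my $title= $root->first_child( 'title');

my $nb= $title->att( 'nb');                     # read the (hypothetical) "nb" attribute
$title->set_att( status => 'draft');            # add a (hypothetical) "status" attribute

my $first_para= $title->next_sibling( 'para');  # next sibling that is a para

$title->cut;                                    # detach the title from the tree
$title->paste( after => $first_para);           # re-attach it after the first para

my $result= $t->sprint;                         # serialize the modified document
print "$result\n";
```

       Using "sprint" instead of "print" returns the document as a string,
       which is convenient when the result needs further processing.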

   Processing an XML document chunk by chunk
       One of the strengths of XML::Twig is that it lets you work with files
       that do not fit in memory (BTW storing an XML document in memory as a
       tree is quite memory-expensive, the expansion factor being often around
       10).

       To do this you can define handlers that will be called once a specific
       element has been completely parsed. In these handlers you can access
       the element and process it as you see fit, using the navigation and the
       cut-n-paste methods, plus lots of convenient ones like "prefix". Once
       the element is completely processed you can then "flush" it, which
       will output it and free the memory. You can also "purge" it if you
       don't need to output it (if you are just extracting some data from the
       document for example). The handler will be called again once the next
       relevant element has been parsed.

         my $t= XML::Twig->new( twig_handlers =>
                                 { section => \&section,
                                   para   => sub { $_->set_tag( 'p'); }
                                 },
                              );
         $t->parsefile( 'doc.xml');
         $t->flush; # don't forget to flush one last time in the end or anything
                    # after the last </section> tag will not be output

         # the handler is called once a section is completely parsed, ie when
         # the end tag for section is found, it receives the twig itself and
         # the element (including all its sub-elements) as arguments
         sub section
           { my( $t, $section)= @_;      # arguments for all twig_handlers
             $section->set_tag( 'div');  # change the tag name, my favourite method...
             # let's use the attribute nb as a prefix to the title
             my $title= $section->first_child( 'title'); # find the title
             my $nb= $title->att( 'nb'); # get the attribute
             $title->prefix( "$nb - ");  # easy isn't it?
             $section->flush;            # outputs the section and frees memory
           }

       There is of course more to it: you can trigger handlers on more
       elaborate conditions than just the name of the element, "section/title"
       for example.

         my $t= XML::Twig->new( twig_handlers =>
                                  { 'section/title' => sub { $_->print } }
                              )
                         ->parsefile( 'doc.xml');

       Here "sub { $_->print }" simply prints the current element ($_ is
       aliased to the element in the handler).

       You can also trigger a handler on a test on an attribute:

         my $t= XML::Twig->new( twig_handlers =>
                             { 'section[@level="1"]' => sub { $_->print } }
                              )
                         ->parsefile( 'doc.xml');

       You can also use "start_tag_handlers" to process an element as soon as
       the start tag is found. Besides "prefix" you can also use "suffix".

   Processing just parts of an XML document
       The twig_roots mode builds only the required sub-trees from the
       document. Anything outside of the twig roots will just be ignored:

         my $t= XML::Twig->new(
              # the twig will include just the root and selected titles
                  twig_roots   => { 'section/title' => \&print_n_purge,
                                    'annex/title'   => \&print_n_purge
                  }
                             );
         $t->parsefile( 'doc.xml');

         sub print_n_purge
           { my( $t, $elt)= @_;
             print $elt->text;    # print the text (including sub-element texts)
             $t->purge;           # frees the memory
           }

       You can use that mode when you want to process parts of a document but
       are not interested in the rest and you don't want to pay the price,
       either in time or memory, to build the tree for it.

   Building an XML filter
       You can combine the "twig_roots" and the "twig_print_outside_roots"
       options to build filters, which let you modify selected elements and
       will output the rest of the document as is.

       This would convert prices in $ to prices in Euro in a document:

         my $t= XML::Twig->new(
                  twig_roots   => { 'price' => \&convert, },   # process prices
                  twig_print_outside_roots => 1,               # print the rest
                            );
         $t->parsefile( 'doc.xml');

         sub convert
           { my( $t, $price)= @_;
             my $currency=  $price->att( 'currency');          # get the currency
             if( $currency eq 'USD')
               { my $usd_price= $price->text;                  # get the price
                 # %rate is just a conversion table (defined elsewhere)
                 my $euro_price= $usd_price * $rate{usd2euro};
                 $price->set_text( $euro_price);               # set the new price
                 $price->set_att( currency => 'EUR');          # don't forget this!
               }
             $price->print;                                    # output the price
           }

   XML::Twig and various versions of Perl, XML::Parser and expat:
       Before being uploaded to CPAN, XML::Twig 3.22 was tested under the
       following environments:

       linux-x86
           perl 5.6.2, expat 1.95.8, XML::Parser 2.34
           perl 5.8.0, expat 1.95.8, XML::Parser 2.34
           perl 5.8.7, expat 1.95.8, XML::Parser 2.34

       Solaris
           perl 5.6.1, expat 1.95.2, XML::Parser 2.31

       XML::Twig is a lot more sensitive to variations in versions of perl,
       XML::Parser and expat than to the OS, so this should cover some
       reasonable configurations.

       The "recommended configuration" is perl 5.8.3+ (for good Unicode
       support), XML::Parser 2.31+ and expat 1.95.5+

       See <http://testers.cpan.org/search?request=dist&dist=XML-Twig> for the
       CPAN testers reports on XML::Twig, which list all tested
       configurations.

       An Atom feed of the CPAN Testers results is available at
       <http://xmltwig.com/rss/twig_testers.rss>

       Finally:

       XML::Twig does NOT work with expat 1.95.4

       XML::Twig only works with XML::Parser 2.27 in perl 5.6.*
           Note that I can't compile XML::Parser 2.27 anymore, so I can't
           guarantee that it still works

       XML::Parser 2.28 does not really work

       When in doubt, upgrade expat, XML::Parser and Scalar::Util

       For some optional features, XML::Twig also depends on some
       additional modules. The complete list, which depends somewhat on the
       version of Perl that you are running, is given by running
       "t/zz_dump_config.t"

Simplifying XML processing

       Whitespaces
           Whitespaces that look non-significant are discarded. This behaviour
           can be controlled using the "keep_spaces", "keep_spaces_in" and
           "discard_spaces_in" options.

       Encoding
           You can specify that you want the output in the same encoding as
           the input (provided you have valid XML, which means you have to
           specify the encoding either in the document or when you create the
           Twig object) using the "keep_encoding" option.

           You can also use "output_encoding" to convert the internal UTF-8
           format to the required encoding.

       Comments and Processing Instructions (PI)
           Comments and PIs can be hidden from the processing, but still
           appear in the output (they are carried by the "real" element
           closest to them).

       Pretty Printing
           XML::Twig can output the document pretty printed so it is easier to
           read for us humans.

       Surviving an untimely death
           XML parsers are supposed to react violently when fed improper XML.
           XML::Parser just dies.

           XML::Twig provides the "safe_parse" and the "safe_parsefile"
           methods which wrap the parse in an eval and return either the
           parsed twig or 0 in case of failure.

       Private attributes
           Attributes with a name starting with # (illegal in XML) will not be
           output, so you can safely use them to store temporary values during
           processing. Note that you can store anything in a private
           attribute, not just text: it's just a regular Perl variable, so a
           reference to an object or a huge data structure is perfectly fine.
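
       The last two items can be illustrated with a short sketch. The sample
       documents and the "#cache" attribute name are assumptions chosen for
       the example; it relies only on "safe_parse" returning a false value on
       failure and on private attributes being left out of the output, as
       described above.

```perl
use strict;
use warnings;
use XML::Twig;

# safe_parse returns a false value instead of dying on malformed XML
my $bad= XML::Twig->new()->safe_parse( '<doc><p>not closed</doc>');
print "parse failed\n" unless $bad;

# a well-formed document parses as usual and returns the twig
my $t= XML::Twig->new();
$t->safe_parse( '<doc><p>text</p></doc>') or die "unexpected parse error: $@";

# '#cache' is a (hypothetical) private attribute: it can hold any Perl value
my $p= $t->root->first_child( 'p');
$p->set_att( '#cache' => { visited => 1 });

my $out= $t->sprint;   # the private attribute is not part of the output
print "$out\n";
```

       Keeping per-element state in a private attribute avoids maintaining a
       separate lookup table keyed on elements.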

CLASSES

       XML::Twig uses a very limited number of classes. The ones you are most
       likely to use are "XML::Twig" of course, which represents a complete
       XML document, including the document itself (the root of the document
       itself is "root"), its handlers, its input or output filters... The
       other main class is "XML::Twig::Elt", which models an XML element.
       Element here has a very wide definition: it can be a regular element,
       but also text, with an element "tag" of "#PCDATA" (or "#CDATA"), an
       entity (tag is "#ENT"), a Processing Instruction ("#PI"), or a comment
       ("#COMMENT").

       Those are the 2 commonly used classes.

       You might want to look at the "elt_class" option if you want to
       subclass "XML::Twig::Elt".

       Attributes are just attached to their parent element, they are not
       objects per se. (Please use the provided methods "att" and "set_att" to
       access them. If you access them as a hash, your code becomes
       implementation dependent and might break in the future.)

       Other classes that are seldom used are "XML::Twig::Entity_list" and
       "XML::Twig::Entity".

       If you use "XML::Twig::XPath" instead of "XML::Twig", elements are then
       created as "XML::Twig::XPath::Elt".

METHODS

   XML::Twig
       A twig is a subclass of XML::Parser, so all XML::Parser methods can be
       called on a twig object, including parse and parsefile. "setHandlers"
       on the other hand cannot be used, see "BUGS".

       new This is a class method, the constructor for XML::Twig. Options are
           passed as keyword value pairs. Recognized options are the same as
           XML::Parser, plus some XML::Twig specifics.

           New Options:

           twig_handlers
               This argument consists of a hash "{ expression => \&handler}"
               where expression is an XPath-like expression (+ some others).

               XPath expressions are limited to using the child and descendant
               axes (indeed you can't specify an axis), and predicates cannot
               be nested.  You can use the "string", or "string(<tag>)"
               function (except in "twig_roots" triggers).

               Additionally you can use regexps (/ delimited) to match
               attribute and string values.

               Examples:

                 foo
                 foo/bar
                 foo//bar
                 /foo/bar
                 /foo//bar
                 /foo/bar[@att1 = "val1" and @att2 = "val2"]/baz[@a >= 1]
                 foo[string()=~ /^duh!+/]
                 /foo[string(bar)=~ /\d+/]/baz[@att != 3]

               #CDATA can be used to call a handler for a CDATA section.
               #COMMENT can be used to call a handler for comments.

               Some additional (non-XPath) expressions are also provided for
               convenience:

               processing instructions
                   '?' or '#PI' triggers the handler for any processing
                   instruction, and '?<target>' or '#PI <target>' triggers a
                   handler for processing instructions with the given target
                   (ex: '#PI xml-stylesheet').

               level(<level>)
                   Triggers the handler on any element at that level in the
                   tree (root is level 1)

               _all_
                   Triggers the handler for all elements in the tree

               _default_
                   Triggers the handler for each element that does NOT have
                   any other handler.

               Expressions are evaluated against the input document, which
               means that even if you have changed the tag of an element
               (changing the tag of a parent element from a handler for
               example) the change will not impact the expression evaluation.
               There is an exception to this: "private" attributes (whose
               names start with a '#', and which can only be created during
               the parsing, as they are not valid XML) are checked against the
               current twig.

               Handlers are triggered in fixed order, sorted by their type
               (xpath expressions first, then regexps, then level), then by
               whether they specify a full path (starting at the root element)
               or not, then by number of steps in the expression, then
               number of predicates, then number of tests in predicates.
               Handlers where the last step does not specify a tag
               ("foo/bar/*") are triggered after other XPath handlers.
               Finally "_all_" handlers are triggered last.

               Important: once a handler has been triggered, if it returns 0
               then no other handler is called, except an "_all_" handler
               which will be called anyway.

               If a handler returns a true value and other handlers apply,
               then the next applicable handler will be called. Lather,
               rinse, repeat... The exception to that rule is when the
               "do_not_chain_handlers" option is set, in which case only the
               first handler will be called.

               Note that it might be a good idea to explicitly return a short
               true value (like 1) from handlers: this ensures that other
               applicable handlers are called even if the last statement for
               the handler happens to evaluate to false. This might also
               speed up the code by avoiding a copy of the result of the last
               statement being passed to the code managing handlers. It can
               really pay to have 1 instead of a long string returned.

               When an element is CLOSED the corresponding handler is called,
               with 2 arguments: the twig and the "Element". The twig
               includes the document tree that has been built so far, the
               element is the complete sub-tree for the element. This means
               that handlers for inner elements are called before handlers for
               outer elements.

               $_ is also set to the element, so it is easy to write inline
               handlers like

                 para => sub { $_->set_tag( 'p'); }

               Text is stored in elements whose tag is #PCDATA (due to mixed
               content, text and sub-element in an element there is no way to
               store the text as just an attribute of the enclosing element).

               Warning: if you have used purge or flush on the twig the
               element might not be complete, some of its children might have
               been entirely flushed or purged, and the start tag might even
               have been printed (by "flush") already, so changing its tag
               might not give the expected result.
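
               The chaining rules described above can be observed with a small
               sketch. The "triggered" helper and the sample document are
               made up for the example; the order in which the two "para"
               handlers fire is deliberately not assumed here.

```perl
use strict;
use warnings;
use XML::Twig;

my $xml= '<doc><para>text</para></doc>';

# hypothetical helper: parse $xml with extra options and
# return the list of handlers that fired, in order
sub triggered
  { my( %options)= @_;
    my @calls;
    my $t= XML::Twig->new(
      twig_handlers =>
        { 'doc/para' => sub { push @calls, 'doc/para'; return 1; }, # true: keep chaining
          'para'     => sub { push @calls, 'para';     return 1; },
          '_all_'    => sub { push @calls, '_all_';    return 1; },
        },
      %options,
                         );
    $t->parse( $xml);
    return @calls;
  }

my @chained=   triggered();                            # both para handlers run
my @unchained= triggered( do_not_chain_handlers => 1); # only the first one runs

print "chained:   @chained\n";
print "unchained: @unchained\n";
```

               Each handler returns 1 explicitly, so chaining never depends on
               the value of the last statement in the handler.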

           twig_roots
               This argument lets you build the tree only for those elements
               you are interested in.

                 Examples:

                   my $t= XML::Twig->new( twig_roots => { title => 1, subtitle => 1});
                   $t->parsefile( 'doc.xml');

                   my $t= XML::Twig->new( twig_roots => { 'section/title' => 1});
                   $t->parsefile( 'doc.xml');

               The first example returns a twig containing a document
               including only "title" and "subtitle" elements, as children of
               the root element.

               You can use generic_attribute_condition, attribute_condition,
               full_path, partial_path, tag, tag_regexp, _default_ and _all_
               to trigger the building of the twig.  string_condition and
               regexp_condition cannot be used, as the content of the element,
               and the string, have not yet been parsed when the condition is
               checked.

               WARNING: paths are checked against the document. Even if the
               "twig_roots" option is used they will be checked against the
               full document tree, not the virtual tree created by XML::Twig

               WARNING: twig_roots elements should NOT be nested, that would
               hopelessly confuse XML::Twig ;--(

               Note: you can set handlers (twig_handlers) using twig_roots

                 Example: my $t= XML::Twig->new( twig_roots =>
                                                  { title    => sub { $_[1]->print;},
                                                    subtitle => \&process_subtitle
                                                  }
                                              );
                          $t->parsefile( 'doc.xml');

           twig_print_outside_roots
               To be used in conjunction with the "twig_roots" argument. When
               set to a true value this will print the document outside of the
               "twig_roots" elements.

                Example: my $t= XML::Twig->new( twig_roots => { title => \&number_title },
                                               twig_print_outside_roots => 1,
                                              );
                          $t->parsefile( 'doc.xml');
                          { my $nb;
                          sub number_title
                            { my( $twig, $title)= @_;
                              $nb++;
                              $title->prefix( "$nb ");
                              $title->print;
                            }
                          }

               This example prints the document outside of the title element,
               calls "number_title" for each "title" element, prints it, and
               then resumes printing the document. The twig is built only for
               the "title" elements.

               If the value is a reference to a file handle then the document
               outside the "twig_roots" elements will be output to this file
               handle:

                 open( OUT, ">out_file") or die "cannot open out file out_file:$!";
                 my $t= XML::Twig->new( twig_roots => { title => \&number_title },
                                        # default output to OUT
                                        twig_print_outside_roots => \*OUT,
                                      );

                        { my $nb;
                          sub number_title
                            { my( $twig, $title)= @_;
                              $nb++;
                              $title->prefix( "$nb ");
                              $title->print( \*OUT);    # you have to print to \*OUT here
                            }
                          }

           start_tag_handlers
               A hash "{ expression => \&handler}". Sets element handlers that
               are called when the element is open (at the end of the
               XML::Parser "Start" handler). The handlers are called with 2
               params: the twig and the element. The element is empty at that
               point, its attributes are created though.

               You can use generic_attribute_condition, attribute_condition,
               full_path, partial_path, tag, tag_regexp, _default_  and _all_
               to trigger the handler.

               string_condition and regexp_condition cannot be used, as the
               content of the element, and the string, have not yet been
               parsed when the condition is checked.

               The main uses for those handlers are to change the tag name
               (you might have to do it as soon as you find the open tag if
               you plan to "flush" the twig at some point in the element), and
               to create temporary attributes that will be used when
               processing sub-elements with "twig_handlers".

               You should also use it to change tags if you use "flush". If
               you change the tag in a regular "twig_handler" then the start
               tag might already have been flushed.

               Note: "start_tag" handlers can be called outside of
               "twig_roots" if this argument is used. In this case handlers
               are called with the following arguments: $t (the twig), $tag
               (the tag of the element) and %att (a hash of the attributes of
               the element).

               If the "twig_print_outside_roots" argument is also used and the
               last handler called returns a "true" value, then the start
               tag will be output as it appeared in the original document. If
               the handler returns a "false" value then the start tag will
               not be printed (so you can print a modified string yourself for
               example).

               Note that you can use the ignore method in "start_tag_handlers"
               (and only there).
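
               The two main uses mentioned above (renaming an element as soon
               as it opens, and creating a temporary attribute for use by
               "twig_handlers" on sub-elements) might look like the following
               sketch. The sample document and the "#status" attribute name
               are assumptions made for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my @seen;
my $t= XML::Twig->new(
  start_tag_handlers =>
    { section => sub { my( $twig, $section)= @_;
                       $section->set_tag( 'div');               # rename as soon as the tag opens
                       $section->set_att( '#status' => 'open'); # temporary (private) attribute
                     },
    },
  twig_handlers =>
    { title => sub { my( $twig, $title)= @_;
                     my $parent= $title->parent;
                     # the parent is still open, but already renamed and tagged
                     push @seen, join ':', $parent->tag, $parent->att( '#status');
                     return 1;
                   },
    },
                     );
$t->parse( '<doc><section><title>One</title><para>text</para></section></doc>');

my $out= $t->sprint;   # the private attribute does not appear in the output
print "@seen\n$out\n";
```

               Because the attribute name starts with '#', it is available to
               the "title" handler without ending up in the serialized
               document.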

           end_tag_handlers
               A hash "{ expression => \&handler}". Sets element handlers that
               are called when the element is closed (at the end of the
               XML::Parser "End" handler). The handlers are called with 2
               params: the twig and the tag of the element.

               twig_handlers are called when an element is completely parsed,
               so why have this redundant option? There is only one use for
               "end_tag_handlers": when using the "twig_roots" option, to
               trigger a handler for an element outside the roots.  It is for
               example very useful to number titles in a document using nested
               sections:

                 my @no= (0);
                 my $no;
                 my $t= XML::Twig->new(
                         start_tag_handlers =>
                          { section => sub { $no[$#no]++; $no= join '.', @no; push @no, 0; } },
                         twig_roots         =>
                          { title   => sub { $_[1]->prefix( $no); $_[1]->print; } },
                         end_tag_handlers   => { section => sub { pop @no;  } },
                         twig_print_outside_roots => 1
                                     );
                  $t->parsefile( $file);

               Using the "end_tag_handlers" argument without "twig_roots" will
               result in an error.

           do_not_chain_handlers
               If this option is set to a true value, then only one handler
               will be called for each element, even if several satisfy the
               condition.

               Note that the "_all_" handler will still be called regardless.

           ignore_elts
               This option lets you ignore elements when building the twig.
               This is useful in cases where you cannot use "twig_roots" to
               ignore elements, for example if the element to ignore is a
               sibling of elements you are interested in.

               Example:

                 my $twig= XML::Twig->new( ignore_elts => { elt => 1 });
                 $twig->parsefile( 'doc.xml');

               This will build the complete twig for the document, except that
               all "elt" elements (and their children) will be left out.

           char_handler
               A reference to a subroutine that will be called every time
               "PCDATA" is found.

               The subroutine receives the string as argument, and returns the
               modified string:

                 # we want all strings in upper case
                 sub my_char_handler
                   { my( $text)= @_;
                     $text= uc( $text);
                     return $text;
                   }
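
               A sketch of how such a subroutine could be registered and what
               it does to the stored text follows. The sample document is an
               assumption made for the example.

```perl
use strict;
use warnings;
use XML::Twig;

# upper-case every piece of text as it is parsed
sub my_char_handler
  { my( $text)= @_;
    return uc( $text);
  }

# register the subroutine through the char_handler option
my $t= XML::Twig->new( char_handler => \&my_char_handler);
$t->parse( '<doc><p>hello</p></doc>');

my $text= $t->root->first_child( 'p')->text;  # text as stored in the twig
print "$text\n";
```

               The handler only sees character data, so tags and attribute
               values are left untouched.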
609
610           elt_class
611               The name of a class used to store elements. this class should
612               inherit from "XML::Twig::Elt" (and by default it is
613               "XML::Twig::Elt"). This option is used to subclass the element
614               class and extend it with new methods.
615
616               This option is needed because during the parsing of the XML,
617               elements are created by "XML::Twig", without any control from
618               the user code.
619
620           keep_atts_order
621               Setting this option to a true value causes the attribute hash
622               to be tied to a "Tie::IxHash" object.  This means that
623               "Tie::IxHash" needs to be installed for this option to be
624               available. It also means that the hash keeps its order, so you
625               will get the attributes in order. This allows outputting the
626               attributes in the same order as they were in the original
627               document.
628
629           keep_encoding
630               This is a (slightly?) evil option: if the XML document is not
631               UTF-8 encoded and you want to keep it that way, then setting
632               keep_encoding will use the"Expat" original_string method for
633               character, thus keeping the original encoding, as well as the
634               original entities in the strings.
635
636               See the "t/test6.t" test file to see what results you can
637               expect from the various encoding options.
638
639               WARNING: if the original encoding is multi-byte then attribute
640               parsing will be EXTREMELY unsafe under any Perl before 5.6, as
641               it uses regular expressions which do not deal properly with
642               multi-byte characters. You can specify an alternate function to
643               parse the start tags with the "parse_start_tag" option (see
644               below).
645
646               WARNING: this option is NOT used when parsing with the non-
647               blocking parser ("parse_start", "parse_more", "parse_done"
648               methods), which you probably should not use with XML::Twig
649               anyway as they are totally untested!
650
651           output_encoding
652               This option generates an output_filter using "Encode",
653               "Text::Iconv" or "Unicode::Map8" and "Unicode::String", and
654               sets the encoding in the XML declaration. This is the easiest
655               way to deal with encodings. If you need more sophisticated
656               features, look at "output_filter" below.
657
658           output_filter
659               This option is used to convert the character encoding of the
660               output document.  It is passed either a string corresponding to
661               a predefined filter or a subroutine reference. The filter will
662               be called every time a document or element is processed by the
663               "print" functions ("print", "sprint", "flush").
664
665               Pre-defined filters:
666
667               latin1
668                   uses either "Encode", "Text::Iconv" or "Unicode::Map8" and
669                   "Unicode::String" or a regexp (which works only with
670                   XML::Parser 2.27), in this order, to convert all characters
671                   to ISO-8859-1 (aka latin1)
672
673               html
674                   does the same conversion as "latin1", plus encodes entities
675                   using "HTML::Entities" (oddly enough you will need to have
676                   HTML::Entities installed for it to be available). This
677                   should only be used if the tags and attribute names
678                   themselves are in US-ASCII, or they will be converted and
679                   the output will not be valid XML any more
680
681               safe
682                   converts the output to US-ASCII only, plus character
683                   entities ("&#nnn;"). This should be used only if the tags
684                   and attribute names themselves are in US-ASCII, or they
685                   will be converted and the output will not be valid XML any
686                   more
687
688               safe_hex
689                   same as "safe" except that the character entities are in
690                   hexadecimal ("&#xnnn;")
691
692               encode_convert ($encoding)
693                   Return a subref that can be used to convert utf8 strings to
694                   $encoding. Uses "Encode".
695
696                      my $conv = XML::Twig::encode_convert( 'latin1');
697                      my $t = XML::Twig->new(output_filter => $conv);
698
699               iconv_convert ($encoding)
700                   this function is used to create a filter subroutine that
701                   will be used to convert the characters to the target
702                   encoding using "Text::Iconv" (which needs to be installed,
703                   look at the documentation for the module and for the
704                   "iconv" library to find out which encodings are available
705                   on your system)
706
707                      my $conv = XML::Twig::iconv_convert( 'latin1');
708                      my $t = XML::Twig->new(output_filter => $conv);
709
710               unicode_convert ($encoding)
711                   this function is used to create a filter subroutine that
712                   will be used to convert the characters to the target
713                   encoding using "Unicode::String" and "Unicode::Map8"
714                   (which need to be installed, look at the documentation for
715                   the modules to find out which encodings are available on
716                   your system)
717
718                      my $conv = XML::Twig::unicode_convert( 'latin1');
719                      my $t = XML::Twig->new(output_filter => $conv);
720
721               The "text" and "att" methods do not use the filter, so their
722               results are always in Unicode.
723
724               Those predeclared filters are based on subroutines that can be
725               used by themselves (as "XML::Twig::foo").
726
727               html_encode ($string)
728                   Use "HTML::Entities" to encode a utf8 string
729
730               safe_encode ($string)
731                   Use either a regexp (perl < 5.8) or "Encode" to encode non-
732                   ascii characters in the string in "&#<nnnn>;" format
733
734               safe_encode_hex ($string)
735                   Use either a regexp (perl < 5.8) or "Encode" to encode non-
736                   ascii characters in the string in "&#x<nnnn>;" format
737
738               regexp2latin1 ($string)
739                   Use a regexp to encode a utf8 string into latin 1
740                   (ISO-8859-1). Does not work with Perl 5.8.0!
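
               A short sketch of the "safe" predefined filter, assuming a
               Perl recent enough (5.8 or later) for "Encode" to be
               available. The input uses a character reference only so that
               the example source stays pure ASCII:

```perl
use strict;
use warnings;
use XML::Twig;

# &#233; is e-acute, a non-ASCII character once the document is parsed
my $xml= '<doc>caf&#233;</doc>';

my $twig= XML::Twig->new( output_filter => 'safe');
$twig->parse( $xml);

my $filtered= $twig->sprint;       # goes through the output filter
my $text    = $twig->root->text;   # does not: plain Perl characters

# the subroutine behind the filter can also be called by itself
my $encoded= XML::Twig::safe_encode( "caf\x{e9}");

print "$filtered\n$encoded\n";
```

               "safe" rewrites the whole output, tags included, so it is
               only appropriate when tag and attribute names are US-ASCII.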
741
742           output_text_filter
743               same as output_filter, except it doesn't apply to the brackets
744               and quotes around attribute values. This is useful for all
745               filters that could change the tagging, basically anything that
746               does not just change the encoding of the output. "html", "safe"
747               and "safe_hex" are better used with this option.
748
749           input_filter
750               This option is similar to "output_filter" except the filter is
751               applied to the characters before they are stored in the twig,
752               at parsing time.
753
754           remove_cdata
755               Setting this option to a true value will force the twig to
756               output CDATA sections as regular (escaped) PCDATA.
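
               For illustration, a minimal sketch of the effect on output
               (the sample document is made up for this example):

```perl
use strict;
use warnings;
use XML::Twig;

my $xml= '<doc><![CDATA[a < b]]></doc>';

my $twig= XML::Twig->new( remove_cdata => 1);
$twig->parse( $xml);

# the CDATA markers are gone and the < is escaped instead
my $out= $twig->sprint;
print "$out\n";
```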
757
758           parse_start_tag
759               If you use the "keep_encoding" option then this option can be
760               used to replace the default parsing function. You should
761               provide a coderef (a reference to a subroutine) as the
762               argument. This subroutine takes the original tag (given by the
763               XML::Parser::Expat "original_string()" method) and returns a
764               tag and the attributes, either in a hash or as a list of
765               attribute name/attribute value pairs.
766
767           expand_external_ents
768               When this option is used external entities (that are defined)
769               are expanded when the document is output using "print"
770               functions such as "print", "sprint", "flush" and
771               "xml_string". Note that in the twig the entity will be stored
772               as an element with a tag "#ENT". The entity will not be
773               expanded there, so you might want to process the entities
774               before outputting the twig.
775
776               If an external entity is not available, then the parse will
777               fail.
778
779               A special case is when the value of this option is -1. In that
780               case a missing entity will not cause the parser to die, but its
781               "name", "sysid" and "pubid" will be stored in the twig as
782               "$twig->{twig_missing_system_entities}" (a reference to an
783               array of hashes { name => <name>, sysid => <sysid>, pubid =>
784               <pubid> }). Yes, this is a bit of a hack, but it's useful in
785               some cases.
786
787           load_DTD
788               If this argument is set to a true value, "parse" or "parsefile"
789               on the twig will load the DTD information. This information
790               can then be accessed through the twig, in a "DTD_handler" for
791               example. This will load even an external DTD.
792
793               Default and fixed values for attributes will also be filled,
794               based on the DTD.
795
796               Note that to do this the module will generate a temporary file
797               in the current directory. If this is a problem let me know and
798               I will add an option to specify an alternate directory.
799
800               See "DTD Handling" for more information.
801
802           DTD_handler
803               Set a handler that will be called once the doctype (and the
804               DTD) have been loaded, with 2 arguments, the twig and the DTD.
805
806           no_prolog
807               Does not output a prolog (XML declaration and DTD)
808
809           id  This optional argument gives the name of an attribute that can
810               be used as an ID in the document. Elements whose ID is known
811               can be accessed through the elt_id method. id defaults to 'id'.
812               See "BUGS".
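
               A small sketch of the option in use. The attribute name
               "name" and the sample document are assumptions made for the
               example:

```perl
use strict;
use warnings;
use XML::Twig;

# use the "name" attribute as the ID instead of the default "id"
my $twig= XML::Twig->new( id => 'name');
$twig->parse( '<doc><item name="first">one</item>'
            . '<item name="second">two</item></doc>');

# look an element up directly by its ID
my $item= $twig->elt_id( 'second');
print $item->text, "\n";
```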
813
814           discard_spaces
815               If this optional argument is set to a true value then spaces
816               are discarded when they look non-significant: strings
817               containing only spaces are discarded.  This argument is set to
818               true by default.
819
820           keep_spaces
821               If this optional argument is set to a true value then all
822               spaces in the document are kept, and stored as "PCDATA".
823
824               Warning: adding this option can result in changes in the twig
825               generated: space that was previously discarded might end up in
826               a new text element. See the difference by calling the following
827               code with 0 and 1 as arguments:
828
829                 perl -MXML::Twig -e'print XML::Twig->new( keep_spaces => shift)->parse( "<d> \n<e/></d>")->_dump'
830
831               "keep_spaces" and "discard_spaces" cannot both be set.
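
               The same comparison can be sketched programmatically by
               counting the children of the root, using the document from
               the one-liner above:

```perl
use strict;
use warnings;
use XML::Twig;

my $xml= "<d> \n<e/></d>";

# default behaviour: the whitespace-only string is discarded
my $default= XML::Twig->new;
$default->parse( $xml);
my @default_children= $default->root->children;

# keep_spaces: the whitespace is stored as a text element
my $keep= XML::Twig->new( keep_spaces => 1);
$keep->parse( $xml);
my @kept_children= $keep->root->children;

print scalar( @default_children), " child(ren) by default, ",
      scalar( @kept_children), " with keep_spaces\n";
```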
832
833           discard_spaces_in
834               This argument sets "keep_spaces" to true but will cause the
835               twig builder to discard spaces in the elements listed.
836
837               The syntax for using this argument is:
838
839                 XML::Twig->new( discard_spaces_in => [ 'elt1', 'elt2']);
840
841           keep_spaces_in
842               This argument sets "discard_spaces" to true but will cause the
843               twig builder to keep spaces in the elements listed.
844
845               The syntax for using this argument is:
846
847                 XML::Twig->new( keep_spaces_in => [ 'elt1', 'elt2']);
848
849               Warning: adding this option can result in changes in the twig
850               generated: space that was previously discarded might end up in
851               a new text element.
852
853           pretty_print
854               Set the pretty print method, amongst '"none"' (default),
855               '"nsgmls"', '"nice"', '"indented"', '"indented_c"',
856               '"indented_a"', '"indented_close_tag"', '"cvs"', '"wrapped"',
857               '"record"' and '"record_c"'
858
859               pretty_print formats:
860
861               none
862                   The document is output as one long string, with no line
863                   breaks except those found within text elements
864
865               nsgmls
866                   Line breaks are inserted in safe places: that is within
867                   tags, between a tag and an attribute, between attributes
868                   and before the > at the end of a tag.
869
870                   This is quite ugly but better than "none", and it is very
871                   safe, the document will still be valid (conforming to its
872                   DTD).
873
874                   This is how the SGML parser "sgmls" splits documents, hence
875                   the name.
876
877               nice
878                   This option inserts line breaks before any tag that does
879                   not contain text (so elements with textual content are not
880                   broken, as the \n there is significant).
881
882                   WARNING: this option leaves the document well-formed but
883                   might make it invalid (not conformant to its DTD). If you
884                   have elements declared as
885
886                     <!ELEMENT foo (#PCDATA|bar)>
887
888                   then a "foo" element including a "bar" one will be printed
889                   as
890
891                     <foo>
892                     <bar>bar is just pcdata</bar>
893                     </foo>
894
895                   This is invalid, as the parser will take the line break
896                   after the "foo" tag as a sign that the element contains
897                   PCDATA; it will then die when it finds the "bar" tag. This
898                   may or may not be important for you, but be aware of it!
899
900               indented
901                   Same as "nice" (and with the same warning) but indents
902                   elements according to their level
903
904               indented_c
905                   Same as "indented" but a little more compact: the closing
906                   tags are on the same line as the preceding text
907
908               indented_close_tag
909                   Same as "indented" except that the closing tag is also
910                   indented, to line up with the tags within the element
911
912               indented_a
913                   This formats XML files in a line-oriented version control
914                   friendly way.  The format is described in
915                   <http://tinyurl.com/2kwscq> (that's an Oracle document with
916                   an insanely long URL).
917
918                   Note that to be totally conformant to the "spec", the order
919                   of attributes should not be changed, so if they are not
920                   already in alphabetical order you will need to use the
921                   "keep_atts_order" option.
922
923               cvs Same as "indented_a".
924
925               wrapped
926                   Same as "indented_c" but lines are wrapped using
927                   Text::Wrap::wrap. The default length for lines is the
928                   default for $Text::Wrap::columns, and can be changed by
929                   changing that variable.
930
931               record
932                   This is a record-oriented pretty print that displays data
933                   in records, one field per line (which looks a LOT like
934                   "indented")
935
936               record_c
937                   Stands for record compact, one record per line
938
939           empty_tags
940               Set the empty tag display style ('"normal"', '"html"' or
941               '"expand"').
942
943               "normal" outputs an empty tag '"<tag/>"', "html" adds a space
944               '"<tag />"' for elements that can be empty in XHTML and
945               "expand" outputs '"<tag></tag>"'
946
947           quote
948               Set the quote character for attributes ('"single"' or
949               '"double"').
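
               The "empty_tags" and "quote" options can be combined; a
               small sketch with a made-up document:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig= XML::Twig->new( empty_tags => 'expand', quote => 'single');
$twig->parse( '<doc><e att="v"/></doc>');

# the empty element gets a separate end tag,
# and the attribute value is single-quoted
my $out= $twig->sprint;
print "$out\n";
```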
950
951           escape_gt
952               By default XML::Twig does not escape the character > in its
953               output, as it is not mandated by the XML spec. With this option
954               on, > will be replaced by "&gt;"
955
956           comments
957               Set the way comments are processed: '"drop"' (default),
958               '"keep"' or '"process"'
959
960               Comments processing options:
961
962               drop
963                   drops the comments, they are not read, nor printed to the
964                   output
965
966               keep
967                   comments are loaded and will appear on the output, they are
968                   not accessible within the twig and will not interfere with
969                   processing though
970
971                   Note: comments in the middle of a text element such as
972
973                     <p>text <!-- comment --> more text</p>
974
975                   are kept at their original position in the text. Using
976                   "print" methods like "print" or "sprint" will return the
977                   comments in the text. Using "text" or "field" on the other
978                   hand will not.
979
980                   Any use of "set_pcdata" on the "#PCDATA" element (directly
981                   or through other methods like "set_content") will delete
982                   the comment(s).
983
984               process
985                   comments are loaded in the twig and will be treated as
986                   regular elements (their "tag" is "#COMMENT"). This can
987                   interfere with processing if you expect
988                   "$elt->{first_child}" to be an element but find a comment
989                   there.  Validation will not protect you from this as
990                   comments can happen anywhere.  You can use
991                   "$elt->first_child( 'tag')" (which is a good habit anyway)
992                   to get where you want.
993
994                   Consider using "process" if you are outputting SAX events
995                   from XML::Twig.
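
               A minimal sketch of the "process" behaviour, with a made-up
               document, showing why naming the tag in "first_child" is the
               safer habit:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig= XML::Twig->new( comments => 'process');
$twig->parse( '<doc><!-- a note --><para>text</para></doc>');

my $root = $twig->root;
my $first= $root->first_child;           # the comment, not the para
my $para = $root->first_child( 'para');  # skips over the comment

print $first->tag, ': ', $first->comment, "\n";
print $para->tag,  ': ', $para->text,     "\n";
```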
996
997           pi  Set the way processing instructions are processed: '"drop"',
998               '"keep"' (default) or '"process"'
999
1000               Note that you can also set PI handlers in the "twig_handlers"
1001               option:
1002
1003                 '?'       => \&handler
1004                 '?target' => \&handler2
1005
1006               The handlers will be called with 2 parameters, the twig and the
1007               PI element if "pi" is set to "process", and with 3, the twig,
1008               the target and the data if "pi" is set to "keep". Of course
1009               they will not be called if "pi" is set to "drop".
1010
1011               If "pi" is set to "keep" the handler should return a string
1012               that will be used as-is as the PI text (it should look like
1013               "<?target data?>", or '' if you want to remove the PI).
1014
1015               Only one handler will be called, "?target" or "?" if no
1016               specific handler for that target is available.
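
               As a sketch of a target-specific handler with "pi" set to
               "process" (the "render" target and the document are
               hypothetical, used only for this example):

```perl
use strict;
use warnings;
use XML::Twig;

my @seen;
my $twig= XML::Twig->new(
    pi            => 'process',
    twig_handlers =>
      { # called with the twig and the PI element
        '?render' => sub { my( $t, $pi)= @_;
                           push @seen, [ $pi->target, $pi->data ];
                         },
      },
);
$twig->parse( '<doc><?render mode="fast"?><para>text</para></doc>');

print "target: $_->[0], data: $_->[1]\n" foreach @seen;
```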
1017
1018           map_xmlns
1019               This option is passed a hashref that maps URIs to prefixes.
1020               The prefixes in the document will be replaced by the ones in
1021               the map. The mapped prefixes can (actually have to) be used to
1022               trigger handlers, navigate or query the document.
1023
1024               Here is an example:
1025
1026                 my $t= XML::Twig->new( map_xmlns => {'http://www.w3.org/2000/svg' => "svg"},
1027                                        twig_handlers =>
1028                                          { 'svg:circle' => sub { $_->set_att( r => 20) } },
1029                                        pretty_print => 'indented',
1030                                      )
1031                                 ->parse( '<doc xmlns:gr="http://www.w3.org/2000/svg">
1032                                             <gr:circle cx="10" cy="90" r="10"/>
1033                                          </doc>'
1034                                        )
1035                                 ->print;
1036
1037               This will output:
1038
1039                 <doc xmlns:svg="http://www.w3.org/2000/svg">
1040                    <svg:circle cx="10" cy="90" r="20"/>
1041                 </doc>
1042
1043           keep_original_prefix
1044               When used with "map_xmlns" this option will make "XML::Twig"
1045               use the original namespace prefixes when outputting a document.
1046               The mapped prefix will still be used for triggering handlers
1047               and in navigation and query methods.
1048
1049                 my $t= XML::Twig->new( map_xmlns => {'http://www.w3.org/2000/svg' => "svg"},
1050                                        twig_handlers =>
1051                                          { 'svg:circle' => sub { $_->set_att( r => 20) } },
1052                                        keep_original_prefix => 1,
1053                                        pretty_print => 'indented',
1054                                      )
1055                                 ->parse( '<doc xmlns:gr="http://www.w3.org/2000/svg">
1056                                             <gr:circle cx="10" cy="90" r="10"/>
1057                                          </doc>'
1058                                        )
1059                                 ->print;
1060
1061               This will output:
1062
1063                 <doc xmlns:gr="http://www.w3.org/2000/svg">
1064                    <gr:circle cx="10" cy="90" r="20"/>
1065                 </doc>
1066
1067           index ($arrayref or $hashref)
1068               This option creates lists of specific elements during the
1069               parsing of the XML.  It takes a reference to either a list of
1070               triggering expressions or to a hash name => expression, and for
1071               each one generates the list of elements that match the
1072               expression. The list can be accessed through the "index"
1073               method.
1074
1075               example:
1076
1077                 # using an array ref
1078                 my $t= XML::Twig->new( index => [ 'div', 'table' ])
1079                                 ->parsefile( 'foo.xml');
1080                 my $divs= $t->index( 'div');
1081                 my $first_div= $divs->[0];
1082                 my $last_table= $t->index( table => -1);
1083
1084                 # using a hashref to name the indexes
1085                 my $t= XML::Twig->new( index => { email => 'a[@href=~/^\s*mailto:/]'})
1086                                 ->parsefile( 'foo.xml');
1087                 my $last_emails= $t->index( email => -1);
1088
1089               Note that the index is not maintained after the parsing. If
1090               elements are deleted, renamed or otherwise hurt during
1091               processing, the index is NOT updated.
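
               A self-contained variant of the array ref form, parsing a
               string instead of a file (the document is made up for the
               example):

```perl
use strict;
use warnings;
use XML::Twig;

my $twig= XML::Twig->new( index => [ 'item' ]);
$twig->parse( '<list><item>a</item><item>b</item><item>c</item></list>');

my $items= $twig->index( 'item');        # array ref of all indexed items
my $last = $twig->index( item => -1);    # a single element

print scalar( @$items), " items, last one: ", $last->text, "\n";
```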
1092
1093           Note: I _HATE_ the Java-like name of arguments used by most XML
1094           modules.  So in pure TIMTOWTDI fashion all arguments can be written
1095           either as "UglyJavaLikeName" or as "readable_perl_name":
1096           "twig_print_outside_roots" or "TwigPrintOutsideRoots" (or even
1097           "twigPrintOutsideRoots" {shudder}).  XML::Twig normalizes them
1098           before processing them.
1099
1100       parse ( $source)
1101           The $source parameter should either be a string containing the
1102           whole XML document, or it should be an open "IO::Handle".
1103           Constructor options to "XML::Parser::Expat" given as keyword-value
1104           pairs may follow the $source parameter. These override, for this
1105           call, any options or attributes passed through from the XML::Parser
1106           instance.
1107
1108           A die call is thrown if a parse error occurs. Otherwise it will
1109           return the twig built by the parse. Use "safe_parse" if you want
1110           the parsing to return even when an error occurs.
1111
1112           If this method is called as a class method ("XML::Twig->parse(
1113           $some_xml_or_html)") then an XML::Twig object is created, using the
1114           parameters except the last one (eg "XML::Twig->parse( pretty_print
1115           => 'indented', $some_xml_or_html)") and "xparse" is called on it.
1116
1117       parsestring
1118           This is just an alias for "parse" for backwards compatibility.
1119
1120       parsefile (FILE [, OPT => OPT_VALUE [...]])
1121           Open "FILE" for reading, then call "parse" with the open handle.
1122           The file is closed no matter how "parse" returns.
1123
1124           A "die" call is thrown if a parse error occurs. Otherwise it will
1125           return the twig built by the parse. Use "safe_parsefile" if you
1126           want the parsing to return even when an error occurs.
1127
1128       parsefile_inplace ( $file, $optional_extension)
1129           Parse and update a file "in place". It does this by creating a temp
1130           file, selecting it as the default for print() statements (and
1131           methods), then parsing the input file. If the parsing is
1132           successful, then the temp file is moved to replace the input file.
1133
1134           If an extension is given then the original file is backed-up (the
1135           rules for the extension are the same as the rule for the -i option
1136           in perl).
1137
1138       parsefile_html_inplace ( $file, $optional_extension)
1139           Same as parsefile_inplace, except that it parses HTML instead of
1140           XML
1141
1142       parseurl ($url $optional_user_agent)
1143           Gets the data from $url and parses it. The data is piped to the
1144           parser in chunks the size of the XML::Parser::Expat buffer, so
1145           memory consumption and hopefully speed are optimal.
1146
1147           For most (read "small") XML it is probably as efficient (and easier
1148           to debug) to just "get" the XML file and then parse it as a string.
1149
1150             use XML::Twig;
1151             use LWP::Simple;
1152             my $twig= XML::Twig->new();
1153             $twig->parse( LWP::Simple::get( $URL ));
1154
1155           or
1156
1157             use XML::Twig;
1158             my $twig= XML::Twig->nparse( $URL);
1159
1160           If the $optional_user_agent argument is used then it is used,
1161           otherwise a new one is created.
1162
1163       safe_parse ( SOURCE [, OPT => OPT_VALUE [...]])
1164           This method is similar to "parse" except that it wraps the parsing
1165           in an "eval" block. It returns the twig on success and 0 on failure
1166           (the twig object also contains the parsed twig). $@ contains the
1167           error message on failure.
1168
1169           Note that the parsing still stops as soon as an error is detected,
1170           there is no way to keep going after an error.
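
           A minimal sketch of checking the return value, using one
           well-formed and one deliberately broken document:

```perl
use strict;
use warnings;
use XML::Twig;

# well-formed input: safe_parse returns the twig
my $twig= XML::Twig->new;
my $good= $twig->safe_parse( '<doc><item>ok</item></doc>');
print "parsed: ", $good->root->first_child( 'item')->text, "\n" if $good;

# malformed input (item is never closed): a false value comes back
# and the error message is left in $@
my $bad_twig= XML::Twig->new;
my $bad= $bad_twig->safe_parse( '<doc><item>oops</doc>');
my $error= $@;
print "parse failed: $error\n" unless $bad;
```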
1171
1172       safe_parsefile (FILE [, OPT => OPT_VALUE [...]])
1173           This method is similar to "parsefile" except that it wraps the
1174           parsing in an "eval" block. It returns the twig on success and 0 on
1175           failure (the twig object also contains the parsed twig). $@
1176           contains the error message on failure.
1177
1178           Note that the parsing still stops as soon as an error is detected,
1179           there is no way to keep going after an error.
1180
1181       safe_parseurl ($url $optional_user_agent)
1182           Same as "parseurl" except that it wraps the parsing in an "eval"
1183           block. It returns the twig on success and 0 on failure (the twig
1184           object also contains the parsed twig). $@ contains the error
1185           message on failure.
1186
1187       parse_html ($string_or_fh)
1188           parse an HTML string or file handle (by converting it to XML using
1189           HTML::TreeBuilder, which needs to be available).
1190
1191           This works nicely, but some information gets lost in the process:
1192           newlines are removed, and (at least on the version I use) comments
1193           get an extra CDATA section inside (<!-- foo --> becomes <!--
1194           <![CDATA[ foo ]]> -->).
1195
1196       parsefile_html
1197           parse an HTML file (by converting it to XML using
1198           HTML::TreeBuilder, which needs to be available). The file is loaded
1199           completely in memory and converted to XML before being parsed.
1200
1201           Alpha: implementation, and thus generated XML could change.
1202
1203       safe_parseurl_html ($url $optional_user_agent)
1204           Same as "parseurl_html" except that it wraps the parsing in an
1205           "eval" block.  It returns the twig on success and 0 on failure (the
1206           twig object also contains the parsed twig). $@ contains the error
1207           message on failure.
1208
1209       safe_parsefile_html ($file $optional_user_agent)
1210           Same as "parsefile_html" except that it wraps the parsing in an
1211           "eval" block.  It returns the twig on success and 0 on failure (the
1212           twig object also contains the parsed twig). $@ contains the error
1213           message on failure.
1214
1215       safe_parse_html ($string_or_fh)
1216           Same as "parse_html" except that it wraps the parsing in an "eval"
1217           block.  It returns the twig on success and 0 on failure (the twig
1218           object also contains the parsed twig). $@ contains the error
1219           message on failure.
1220
1221       xparse ($thing_to_parse)
1222           parse the $thing_to_parse, whether it is a filehandle, a string, an
1223           HTML file, an HTML URL, a URL or a file.
1224
1225           Note that this is mostly a convenience method for one-off scripts.
1226           For example files that end in '.htm' or '.html' are parsed first as
1227           XML, and if this fails as HTML. This is certainly not the most
1228           efficient way to do this in general.
1229
1230       nparse ($optional_twig_options, $thing_to_parse)
1231           create a twig with the $optional_twig_options, and parse the
1232           $thing_to_parse, whether it is a filehandle, a string, an HTML
1233           file, an HTML URL, a URL or a file.
1234
1235           Examples:
1236
1237              XML::Twig->nparse( "file.xml");
1238              XML::Twig->nparse( error_context => 1, "file://file.xml");
1239
1240       nparse_pp ($optional_twig_options, $thing_to_parse)
1241           same as "nparse" but also sets the "pretty_print" option to
1242           "indented".
1243
1244       nparse_e ($optional_twig_options, $thing_to_parse)
1245           same as "nparse" but also sets the "error_context" option to 1.
1246
1247       nparse_ppe ($optional_twig_options, $thing_to_parse)
1248           same as "nparse" but also sets the "pretty_print" option to
1249           "indented" and the "error_context" option to 1.
1250
1251       parser
1252           This method returns the "expat" object (actually the
1253           XML::Parser::Expat object) used during parsing. It is useful for
1254           example to call XML::Parser::Expat methods on it. To get the line
1255           of a tag for example use "$t->parser->current_line".
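
           For instance, a sketch that records the line of each start tag
           from a "start_tag_handlers" handler (the "item" tag and the
           document are made up for the example):

```perl
use strict;
use warnings;
use XML::Twig;

my @lines;
my $twig= XML::Twig->new(
    start_tag_handlers =>
      { # the parser object is available while the parse is running
        item => sub { my( $t, $elt)= @_;
                      push @lines, $t->parser->current_line;
                    },
      },
);
$twig->parse( "<doc>\n<item/>\n<item/>\n</doc>");

print "item tags found on lines: @lines\n";
```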
1256
1257       setTwigHandlers ($handlers)
1258           Set the twig_handlers. $handlers is a reference to a hash similar
1259           to the one in the "twig_handlers" option of new. All previous
1260           handlers are unset.  The method returns the reference to the
1261           previous handlers.
1262
1263       setTwigHandler ($exp $handler)
1264           Set a single twig_handler for elements matching $exp. $handler is a
1265           reference to a subroutine. If the handler was previously set then
1266           the reference to the previous handler is returned.
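
           A minimal sketch of installing a handler after the twig has been
           created (the "item" tag and the counter are assumptions made for
           the example):

```perl
use strict;
use warnings;
use XML::Twig;

my $count= 0;

my $twig= XML::Twig->new;
# install the handler before parsing starts
$twig->setTwigHandler( item => sub { $count++ });
$twig->parse( '<list><item/><item/></list>');

print "handler called $count times\n";
```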
1267
1268       setStartTagHandlers ($handlers)
1269           Set the start_tag handlers. $handlers is a reference to a hash
1270           similar to the one in the "start_tag_handlers" option of new. All
1271           previous handlers are unset.  The method returns the reference to
1272           the previous handlers.
1273
1274       setStartTagHandler ($exp $handler)
1275           Set a single start_tag handler for elements matching $exp.
1276           $handler is a reference to a subroutine. If the handler was
1277           previously set then the reference to the previous handler is
1278           returned.
1279
1280       setEndTagHandlers ($handlers)
1281           Set the end_tag handlers. $handlers is a reference to a hash
1282           similar to the one in the "end_tag_handlers" option of new. All
1283           previous handlers are unset.  The method returns the reference to
1284           the previous handlers.
1285
1286       setEndTagHandler ($exp, $handler)
1287           Set a single end_tag handler for elements matching $exp. $handler
1288           is a reference to a subroutine. If the handler was previously set
1289           then the reference to the previous handler is returned.
1290
1291       setTwigRoots ($handlers)
1292           Same as using the "twig_roots" option when creating the twig
1293
1294       setCharHandler ($exp, $handler)
1295           Set a "char_handler"
1296
1297       setIgnoreEltsHandler ($exp)
1298           Set an "ignore_elt" handler (elements that match $exp will be
1299           ignored)
1300
1301       setIgnoreEltsHandlers ($exp)
1302           Set all "ignore_elt" handlers (previous handlers are replaced)
1303
1304       dtd Return the dtd (an XML::Twig::DTD object) of a twig
1305
1306       xmldecl
1307           Return the XML declaration for the document, or a default one if it
1308           doesn't have one
1309
1310       doctype
1311           Return the doctype for the document
1312
1313       doctype_name
1314           returns the doctype of the document from the doctype declaration
1315
1316       system_id
1317           returns the system value of the DTD of the document from the
1318           doctype declaration
1319
1320       public_id
1321           returns the public doctype of the document from the doctype
1322           declaration
1323
1324       internal_subset
1325           returns the internal subset of the DTD
1326
1327       dtd_text
1328           Return the DTD text
1329
1330       dtd_print
1331           Print the DTD
1332
1333       model ($tag)
1334           Return the model (in the DTD) for the element $tag
1335
1336       root
1337           Return the root element of a twig
1338
1339       set_root ($elt)
1340           Set the root of a twig
1341
1342       first_elt ($optional_condition)
1343           Return the first element matching $optional_condition of a twig, if
1344           no condition is given then the root is returned
1345
1346       last_elt ($optional_condition)
1347           Return the last element matching $optional_condition of a twig, if
1348           no condition is given then the last element of the twig is returned
1349
1350       elt_id        ($id)
1351           Return the element whose "id" attribute is $id
1352
1353       getEltById
1354           Same as "elt_id"
1355
1356       index ($index_name, $optional_index)
1357           If the $optional_index argument is present, return the
1358           corresponding element in the index (created using the "index"
1359           option for "XML::Twig->new")
1360
1361           If the argument is not present, return an arrayref to the index
1362
1363       normalize
1364           Merge together all consecutive pcdata elements in the document (if
1365           for example you have turned some elements into pcdata using
1366           "erase", this will give you a "clean" document in which all
1367           text elements are as long as possible).
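
           A short sketch of the "erase" case described above, using a made-up
           fragment. After the b element is erased its text sits next to the
           surrounding text, and "normalize" merges the pieces:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<p>foo <b>bar</b> baz</p>');

my $p = $twig->root;
$p->first_child('b')->erase;    # the text of b becomes a direct child of p
$twig->normalize;               # merge the adjacent text elements

my $count = $p->children_count;
my $text  = $p->text;
print "p now has $count child: '$text'\n";
```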
1368
1369       encoding
1370           This method returns the encoding of the XML document, as defined by
1371           the "encoding" attribute in the XML declaration (ie it is "undef"
1372           if the attribute is not defined)
1373
1374       set_encoding
1375           This method sets the value of the "encoding" attribute in the XML
1376           declaration.  Note that if the document did not have a declaration
1377           it is generated (with an XML version of 1.0)
1378
1379       xml_version
1380           This method returns the XML version, as defined by the "version"
1381           attribute in the XML declaration (ie it is "undef" if the attribute
1382           is not defined)
1383
1384       set_xml_version
1385           This method sets the value of the "version" attribute in the XML
1386           declaration.  If the declaration did not exist it is created.
1387
1388       standalone
1389           This method returns the value of the "standalone" declaration for
1390           the document
1391
1392       set_standalone
1393           This method sets the value of the "standalone" attribute in the XML
1394           declaration.  Note that if the document did not have a declaration
1395           it is generated (with an XML version of 1.0)
1396
1397       set_output_encoding
1398           Set the "encoding" "attribute" in the XML declaration
1399
1400       set_doctype ($name, $system, $public, $internal)
1401           Set the doctype of the document. If an argument is "undef" (or not
1402           present) then its former value is retained, if a false ('' or 0)
1403           value is passed then the former value is deleted.
1404
1405       entity_list
1406           Return the entity list of a twig
1407
1408       entity_names
1409           Return the list of all defined entities
1410
1411       entity ($entity_name)
1412           Return the entity
1413
1414       change_gi      ($old_gi, $new_gi)
1415           Performs a (very fast) global change. All elements $old_gi are now
1416           $new_gi. This is a bit dangerous though and should be avoided if
1417           possible, as the new tag might be ignored in subsequent processing.
1418
1419           See "BUGS"
1420
1421       flush            ($optional_filehandle, %options)
1422           Flushes a twig up to (and including) the current element, then
1423           deletes all unnecessary elements from the tree that's kept in
1424           memory.  "flush" keeps track of which elements need to be
1425           open/closed, so if you flush from handlers you don't have to worry
1426           about anything. Just keep flushing the twig every time you're done
1427           with a sub-tree and it will come out well-formed. After the whole
1428           parsing don't forget to "flush" one more time to print the end of
1429           the document.  The doctype and entity declarations are also
1430           printed.
1431
1432           "flush" takes an optional filehandle as an argument.
1433
1434           options: use the "update_DTD" option if you have updated the
1435           (internal) DTD and/or the entity list and you want the updated DTD
1436           to be output
1437
1438           The "pretty_print" option sets the pretty printing of the document.
1439
1440              Example: $t->flush( Update_DTD => 1);
1441                       $t->flush( $filehandle, pretty_print => 'indented');
1442                       $t->flush( \*FILE);
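
           The flush-per-subtree pattern described above could look like the
           following sketch. The document, the tag names and the in-memory
           filehandle are assumptions made to keep the example self-contained:

```perl
use strict;
use warnings;
use XML::Twig;

my $xml = '<list><item>a</item><item>b</item></list>';

# an in-memory filehandle stands in for a real output file
open my $out, '>', \my $buffer or die "cannot open in-memory file: $!";

my $twig = XML::Twig->new(
    twig_handlers => {
        item => sub {
            my ($t, $item) = @_;
            $item->set_tag('entry');    # process the element...
            $t->flush($out);            # ...then output it and free its memory
        },
    },
);

$twig->parse($xml);
$twig->flush($out);    # one last flush to output the end of the document
close $out;

print "$buffer\n";
```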
1443
1444       flush_up_to ($elt, $optional_filehandle, %options)
1445           Flushes up to the $elt element. This allows you to keep part of the
1446           tree in memory when you "flush".
1447
1448           options: see flush.
1449
1450       purge
1451           Does the same as a "flush" except it does not print the twig. It
1452           just deletes all elements that have been completely parsed so far.
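
           For example, when only a summary of the data is needed, each
           element can be purged as soon as it has been read. This is a
           sketch; the record structure is invented for the illustration:

```perl
use strict;
use warnings;
use XML::Twig;

my $total = 0;

my $twig = XML::Twig->new(
    twig_handlers => {
        record => sub {
            my ($t, $record) = @_;
            $total += $record->first_child_text('amount');
            $t->purge;    # nothing is printed, the parsed elements are freed
        },
    },
);

$twig->parse(
    '<records>'
  . '<record><amount>10</amount></record>'
  . '<record><amount>32</amount></record>'
  . '</records>'
);

print "total: $total\n";
```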
1453
1454       purge_up_to ($elt)
1455           Purges up to the $elt element. This allows you to keep part of the
1456           tree in memory when you "purge".
1457
1458       print            ($optional_filehandle, %options)
1459           Prints the whole document associated with the twig. To be used only
1460           AFTER the parse.
1461
1462           options: see "flush".
1463
1464       print_to_file    ($filename, %options)
1465           Prints the whole document associated with the twig to file
1466           $filename.  To be used only AFTER the parse.
1467
1468           options: see "flush".
1469
1470       sprint
1471           Return the text of the whole document associated with the twig. To
1472           be used only AFTER the parse.
1473
1474           options: see "flush".
1475
1476       trim
1477           Trim the document: gets rid of initial and trailing spaces, and
1478           replaces multiple spaces by a single one.
1479
1480       toSAX1 ($handler)
1481           Send SAX events for the twig to the SAX1 handler $handler
1482
1483       toSAX2 ($handler)
1484           Send SAX events for the twig to the SAX2 handler $handler
1485
1486       flush_toSAX1 ($handler)
1487           Same as flush, except that SAX events are sent to the SAX1 handler
1488           $handler instead of the twig being printed
1489
1490       flush_toSAX2 ($handler)
1491           Same as flush, except that SAX events are sent to the SAX2 handler
1492           $handler instead of the twig being printed
1493
1494       ignore
1495           This method should be called during parsing, usually in
1496           "start_tag_handlers".  It causes the element to be skipped during
1497           the parsing: the twig is not built for this element, it will not be
1498           accessible during parsing or after it. The element will not take up
1499           any memory and parsing will be faster.
1500
1501           Note that this method can also be called on an element. If the
1502           element is a parent of the current element then this element will
1503           be ignored (the twig will not be built any more for it and what has
1504           already been built will be deleted).
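
           A possible use, sketched with a hypothetical "secret" element that
           should never be loaded. The handler is set in "start_tag_handlers"
           so the element is skipped as soon as its start tag is seen:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new(
    start_tag_handlers => {
        # skip the element and everything it contains
        secret => sub { my ($t) = @_; $t->ignore; },
    },
);

$twig->parse('<doc><public>hello</public><secret><key>42</key></secret></doc>');

# tags of the elements that were actually built
my @tags = map { $_->tag } $twig->root->children;
print "children of the root: @tags\n";
```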
1505
1506       set_pretty_print  ($style)
1507           Set the pretty print method, amongst '"none"' (default),
1508           '"nsgmls"', '"nice"', '"indented"', '"indented_c"', '"wrapped"',
1509           '"record"' and '"record_c"'
1510
1511           WARNING: the pretty print style is a GLOBAL variable, so once set
1512           it's applied to ALL "print"'s (and "sprint"'s). Same goes if you
1513           use XML::Twig with "mod_perl". This should not be a problem as the
1514           XML that's generated is valid anyway, and XML processors (as well
1515           as HTML processors, including browsers) should not care. Let me
1516           know if this is a big problem, but at the moment the
1517           performance/cleanliness trade-off clearly favors the global
1518           approach.
1519
1520       set_empty_tag_style  ($style)
1521           Set the empty tag display style ('"normal"', '"html"' or
1522           '"expand"'). As with "set_pretty_print" this sets a global flag.
1523
1524           "normal" outputs an empty tag '"<tag/>"', "html" adds a space
1525           '"<tag />"' for elements that can be empty in XHTML and "expand"
1526           outputs '"<tag></tag>"'
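
           The following sketch shows two of the styles on a one-element
           document. Because the flag is global, the example restores
           "normal" before going on:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<doc><br/></doc>');

$twig->set_empty_tag_style('expand');
my $expanded = $twig->root->sprint;

$twig->set_empty_tag_style('normal');    # the flag is global: restore it
my $normal = $twig->root->sprint;

print "expand: $expanded\n";
print "normal: $normal\n";
```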
1527
1528       set_remove_cdata  ($flag)
1529           set (or unset) the flag that forces the twig to output CDATA
1530           sections as regular (escaped) PCDATA
1531
1532       print_prolog     ($optional_filehandle, %options)
1533           Prints the prolog (XML declaration + DTD + entity declarations) of
1534           a document.
1535
1536           options: see "flush".
1537
1538       prolog     ($optional_filehandle, %options)
1539           Return the prolog (XML declaration + DTD + entity declarations) of
1540           a document.
1541
1542           options: see "flush".
1543
1544       finish
1545           Call Expat "finish" method.  Unsets all handlers (including
1546           internal ones that set context), but expat continues parsing to the
1547           end of the document or until it finds an error.  It should finish
1548           up a lot faster than with the handlers set.
1549
1550       finish_print
1551           Stops twig processing, flushes the twig and proceeds to finish
1552           printing the document as fast as possible. Use this method when
1553           modifying a document and the modification is done.
1554
1555       finish_now
1556           Stops twig processing, does not finish parsing the document (which
1557           could actually be not well-formed after the point where
1558           "finish_now" is called).  Execution resumes after the "parse" or
1559           "parsefile" call. The content of the twig is what has been parsed
1560           so far (all open elements at the time "finish_now" is called are
1561           considered closed).
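
           A sketch of a typical use: stop reading as soon as the first piece
           of wanted data has been found. The document and the "title" tag are
           illustrative only:

```perl
use strict;
use warnings;
use XML::Twig;

my $title;

my $twig = XML::Twig->new(
    twig_handlers => {
        title => sub {
            my ($t, $elt) = @_;
            $title = $elt->text;
            $t->finish_now;    # stop here, the rest is never parsed
        },
    },
);

$twig->parse('<doc><title>First</title><title>Second</title></doc>');

# execution resumes here, right after the parse call
print "first title: $title\n";
```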
1562
1563       set_expand_external_entities
1564           Same as using the "expand_external_ents" option when creating the
1565           twig
1566
1567       set_input_filter
1568           Same as using the "input_filter" option when creating the twig
1569
1570       set_keep_atts_order
1571           Same as using the "keep_atts_order" option when creating the twig
1572
1573       set_keep_encoding
1574           Same as using the "keep_encoding" option when creating the twig
1575
1576       escape_gt
1577           usually XML::Twig does not escape > in its output. Using this
1578           option makes it replace > by &gt;
1579
1580       do_not_escape_gt
1581           reverts XML::Twig behavior to its default of not escaping > in its
1582           output.
1583
1584       set_output_filter
1585           Same as using the "output_filter" option when creating the twig
1586
1587       set_output_text_filter
1588           Same as using the "output_text_filter" option when creating the
1589           twig
1590
1591       add_stylesheet ($type, @options)
1592           Adds an external stylesheet to an XML document.
1593
1594           Supported types and options:
1595
1596           xsl option: the url of the stylesheet
1597
1598               Example:
1599
1600                 $t->add_stylesheet( xsl => "xsl_style.xsl");
1601
1602               will generate the following PI at the beginning of the
1603               document:
1604
1605                 <?xml-stylesheet type="text/xsl" href="xsl_style.xsl"?>
1606
1607           css option: the url of the stylesheet
1608
1609       Methods inherited from XML::Parser::Expat
1610           A twig inherits all the relevant methods from XML::Parser::Expat.
1611           These methods can only be used during the parsing phase (they will
1612           generate a fatal error otherwise).
1613
1614           Inherited methods are:
1615
1616           depth
1617               Returns the size of the context list.
1618
1619           in_element (NAME)
1620               Returns true if NAME is equal to the name of the innermost
1621               currently opened element. If namespace processing is being
1622               used and you want to check against a name that may be in a
1623               namespace, then use the generate_ns_name method to create the
1624               NAME argument.
1625
1626           within_element (NAME)
1627               Returns the number of times the given name appears in the
1628               context list.  If namespace processing is being used and you
1629               want to check against a name that may be in a namespace, then
1630               use the generate_ns_name method to create the NAME
1631               argument.
1632
1633           context
1634               Returns a list of element names that represent open elements,
1635               with the last one being the innermost. Inside start and end tag
1636               handlers, this will be the tag of the parent element.
1637
1638           current_line
1639               Returns the line number of the current position of the parse.
1640
1641           current_column
1642               Returns the column number of the current position of the parse.
1643
1644           current_byte
1645               Returns the current position of the parse.
1646
1647           position_in_context (LINES)
1648               Returns a string that shows the current parse position. LINES
1649               should be an integer >= 0 that represents the number of lines
1650               on either side of the current parse line to place into the
1651               returned string.
1652
1653           base ([NEWBASE])
1654               Returns the current value of the base for resolving relative
1655               URIs.  If NEWBASE is supplied, changes the base to that value.
1656
1657           current_element
1658               Returns the name of the innermost currently opened element.
1659               Inside start or end handlers, returns the parent of the element
1660               associated with those tags.
1661
1662           element_index
1663               Returns an integer that is the depth-first visit order of the
1664               current element. This will be zero outside of the root
1665               element. For example, this will return 1 when called from the
1666               start handler for the root element start tag.
1667
1668           recognized_string
1669               Returns the string from the document that was recognized in
1670               order to call the current handler. For instance, when called
1671               from a start handler, it will give us the start-tag string.
1672               The string is encoded in UTF-8.  This method doesn't return a
1673               meaningful string inside declaration handlers.
1674
1675           original_string
1676               Returns the verbatim string from the document that was
1677               recognized in order to call the current handler. The string is
1678               in the original document encoding. This method doesn't return a
1679               meaningful string inside declaration handlers.
1680
1681           xpcroak
1682               Concatenate onto the given message the current line number
1683               within the XML document plus the message implied by
1684               ErrorContext. Then croak with the formed message.
1685
1686           xpcarp
1687               Concatenate onto the given message the current line number
1688               within the XML document plus the message implied by
1689               ErrorContext. Then carp with the formed message.
1690
1691           xml_escape(TEXT [, CHAR [, CHAR ...]])
1692               Returns TEXT with markup characters turned into character
1693               entities.  Any additional characters provided as arguments are
1694               also turned into character references where found in TEXT.
1695
1696               (this method is broken on some versions of expat/XML::Parser)
1697
1698       path ( $optional_tag)
1699           Return the element context in a form similar to XPath's short form:
1700           '"/root/tag1/../tag"'
1701
1702       get_xpath  ( $optional_array_ref, $xpath, $optional_offset)
1703           Performs a "get_xpath" on the document root (see "XML::Twig::Elt")
1704
1705           If the $optional_array_ref argument is used the array must contain
1706           elements. The $xpath expression is applied to each element in turn
1707           and the result is the union of all results. This way a first query can
1708           be refined in further steps.
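
           As a sketch of the two-step refinement, a first query selects some
           elements and a second one is applied to that result only. The
           document structure and attribute names are made up:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse(
    '<doc>'
  . '<list><item type="fruit">apple</item><item type="veg">leek</item></list>'
  . '<list><item type="fruit">pear</item></list>'
  . '</doc>'
);

# first query: all the lists in the document
my @lists = $twig->get_xpath('//list');

# second query: applied to each list in turn, results are combined
my @fruits = map { $_->text } $twig->get_xpath( \@lists, 'item[@type="fruit"]' );

print "fruits: @fruits\n";
```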
1709
1710       find_nodes ( $optional_array_ref, $xpath, $optional_offset)
1711           same as "get_xpath"
1712
1713       findnodes ( $optional_array_ref, $xpath, $optional_offset)
1714           same as "get_xpath" (similar to the XML::LibXML method)
1715
1716       findvalue ( $optional_array_ref, $xpath, $optional_offset)
1717           Return the "join" of all texts of the results of applying
1718           "get_xpath" to the node (similar to the XML::LibXML method)
1719
1720       subs_text ($regexp, $replace)
1721           subs_text does text substitution on the whole document, similar to
1722           perl's "s///" operator.
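
           A small illustration with a made-up document, replacing a word
           everywhere it appears in the text:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<doc><p>colour</p><p>my favourite colour</p></doc>');

# replace every match of the regexp in the text of the document
$twig->subs_text( qr/colour/, 'color' );

my $result = $twig->root->sprint;
print "$result\n";
```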
1723
1724       dispose
1725           Useful only if you don't have "Scalar::Util" or "WeakRef"
1726           installed.
1727
1728           Reclaims properly the memory used by an XML::Twig object. As the
1729           object has circular references it never goes out of scope, so if
1730           you want to parse lots of XML documents then the memory leak
1731           becomes a problem. Use "$twig->dispose" to clear this problem.
1732
1733       create_accessors (list_of_attribute_names)
1734           A convenience method that creates l-valued accessors for
1735           attributes.  So "$twig->create_accessors( 'foo')" will create a
1736           "foo" method that can be called on elements:
1737
1738             $elt->foo;         # equivalent to $elt->{'att'}->{'foo'};
1739             $elt->foo( 'bar'); # equivalent to $elt->set_att( foo => 'bar');
1740
1741       set_do_not_escape_amp_in_atts
1742           An evil method, that I only document because Test::Pod::Coverage
1743           complains otherwise, but really, you don't want to know about it.
1744
1745   XML::Twig::Elt
1746       new          ($optional_tag, $optional_atts, @optional_content)
1747           The "tag" is optional (but then you can't have any content), the
1748           $optional_atts argument is a reference to a hash of attributes, the
1749           content can be just a string or a list of strings and elements. A
1750           content of '"#EMPTY"' creates an empty element.
1751
1752            Examples: my $elt= XML::Twig::Elt->new();
1753                      my $elt= XML::Twig::Elt->new( para => { align => 'center' });
1754                      my $elt= XML::Twig::Elt->new( para => { align => 'center' }, 'foo');
1755                      my $elt= XML::Twig::Elt->new( br   => '#EMPTY');
1756                      my $elt= XML::Twig::Elt->new( 'para');
1757                      my $elt= XML::Twig::Elt->new( para => 'this is a para');
1758                      my $elt= XML::Twig::Elt->new( para => $elt3, 'another para');
1759
1760           The strings are not parsed, the element is not attached to any
1761           twig.
1762
1763           WARNING: if you rely on ID's then you will have to set the id
1764           yourself. At this point the element does not belong to a twig yet,
1765           so the ID attribute is not known so it won't be stored in the ID
1766           list.
1767
1768           Note that "#COMMENT", "#PCDATA" or "#CDATA" are valid tag names,
1769           that will create text elements.
1770
1771           To create an element "foo" containing a CDATA section:
1772
1773                      my $foo= XML::Twig::Elt->new( '#CDATA' => "content of the CDATA section")
1774                                             ->wrap_in( 'foo');
1775
1776           An attribute of '#CDATA', will create the content of the element as
1777           CDATA:
1778
1779             my $elt= XML::Twig::Elt->new( 'p' => { '#CDATA' => 1}, 'foo < bar');
1780
1781           creates an element
1782
1783             <p><![CDATA[foo < bar]]></p>
1784
1785       parse         ($string, %args)
1786           Creates an element from an XML string. The string is actually
1787           parsed as a new twig, then the root of that twig is returned.  The
1788           arguments in %args are passed to the twig.  As always if the parse
1789           fails the parser will die, so use an eval if you want to trap
1790           syntax errors.
1791
1792           As obviously the element does not exist beforehand this method has
1793           to be called on the class:
1794
1795             my $elt= XML::Twig::Elt->parse( "<a> string to parse, with <sub/>
1796                                             <elements>, actually tons of </elements>
1797                             </a>");
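
           Trapping a syntax error with an eval could be done along these
           lines. This is only a sketch, and both strings are invented for
           the example:

```perl
use strict;
use warnings;
use XML::Twig;

# a well-formed string gives back an element
my $good = XML::Twig::Elt->parse('<p>some <b>bold</b> text</p>');

# a string that is not well-formed makes the parser die: trap it
my $bad   = eval { XML::Twig::Elt->parse('<p>unclosed <b>tag</p>') };
my $error = $@;

print "parsed: ", $good->sprint, "\n";
print "second string could not be parsed\n" unless defined $bad;
```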
1798
1799       set_inner_xml ($string)
1800           Sets the content of the element to be the tree created from the
1801           string
1802
1803       set_inner_html ($string)
1804           Sets the content of the element, after parsing the string with an
1805           HTML parser (HTML::Parser)
1806
1807       print         ($optional_filehandle, $optional_pretty_print_style)
1808           Prints an entire element, including the tags, optionally to a
1809           $optional_filehandle, optionally with a $optional_pretty_print_style.
1810
1811           The print outputs XML data so base entities are escaped.
1812
1813       sprint       ($elt, $optional_no_enclosing_tag)
1814           Return the xml string for an entire element, including the tags.
1815           If the optional second argument is true then only the string inside
1816           the element is returned (the start and end tag for $elt are not).
1817           The text is XML-escaped: base entities (& and < in text, & < and "
1818           in attribute values) are turned into entities.
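
           The difference between the two forms, sketched on a one-line
           element made up for the purpose:

```perl
use strict;
use warnings;
use XML::Twig;

my $elt = XML::Twig::Elt->parse('<p class="note">fish &amp; chips</p>');

my $with_tags = $elt->sprint;       # the element with its start and end tags
my $inner     = $elt->sprint(1);    # only what is inside the element
my $text      = $elt->text;         # plain text, not XML-escaped

print "$with_tags\n$inner\n$text\n";
```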
1819
1820       gi  Return the gi of the element (the gi is the "generic identifier",
1821           the tag name in SGML parlance).
1822
1823           "tag" and "name" are synonyms of "gi".
1824
1825       tag Same as "gi"
1826
1827       name
1828           Same as "tag"
1829
1830       set_gi         ($tag)
1831           Set the gi (tag) of an element
1832
1833       set_tag        ($tag)
1834           Set the tag (="gi") of an element
1835
1836       set_name       ($name)
1837           Set the name (="tag") of an element
1838
1839       root
1840           Return the root of the twig in which the element is contained.
1841
1842       twig
1843           Return the twig containing the element.
1844
1845       parent        ($optional_condition)
1846           Return the parent of the element, or the first ancestor matching
1847           the $optional_condition
1848
1849       first_child   ($optional_condition)
1850           Return the first child of the element, or the first child matching
1851           the $optional_condition
1852
1853       has_child ($optional_condition)
1854           Return the first child of the element, or the first child matching
1855           the $optional_condition (same as first_child)
1856
1857       has_children ($optional_condition)
1858           Return the first child of the element, or the first child matching
1859           the $optional_condition (same as first_child)
1860
1861       first_child_text   ($optional_condition)
1862           Return the text of the first child of the element, or the first
1863           child matching the $optional_condition. If there is no
1864           first_child then returns ''. This avoids getting the child,
1865           checking for its existence then getting the text for trivial
1866           cases.
1867
1868           Similar methods are available for the other navigation methods:
1869
1870           last_child_text
1871           prev_sibling_text
1872           next_sibling_text
1873           prev_elt_text
1874           next_elt_text
1875           child_text
1876           parent_text
1877
1878           All these methods also exist in a "trimmed" variant:
1879
1880           first_child_trimmed_text
1881           last_child_trimmed_text
1882           prev_sibling_trimmed_text
1883           next_sibling_trimmed_text
1884           prev_elt_trimmed_text
1885           next_elt_trimmed_text
1886           child_trimmed_text
1887           parent_trimmed_text
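
           A sketch of the text shortcuts on a hypothetical record. The
           missing "phone" child gives an empty string instead of an error:

```perl
use strict;
use warnings;
use XML::Twig;

my $person = XML::Twig::Elt->parse(
    '<person><name>  Ann   Lee </name><city>Oslo</city></person>'
);

my $name  = $person->first_child_trimmed_text('name');    # spaces cleaned up
my $city  = $person->first_child_text('city');
my $phone = $person->first_child_text('phone');           # no such child

print "name: '$name', city: '$city', phone: '$phone'\n";
```
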
1888       field         ($condition)
1889           Same method as "first_child_text" with a different name
1890
1891       fields         ($condition_list)
1892           Return the list of fields (text of first child matching the
1893           conditions), missing fields are returned as the empty string.
1894
1895           Same method as "first_child_text" with a different name
1896
1897       trimmed_field         ($optional_condition)
1898           Same method as "first_child_trimmed_text" with a different name
1899
1900       set_field ($condition, $optional_atts, @list_of_elt_and_strings)
1901           Set the content of the first child of the element that matches
1902           $condition, the rest of the arguments is the same as for
1903           "set_content"
1904
1905           If no child matches $condition _and_ if $condition is a valid XML
1906           element name, then a new element by that name is created and
1907           inserted as the last child.
1908
1909       first_child_matches   ($optional_condition)
1910           Return the element if the first child of the element (if it exists)
1911           passes the $optional_condition, "undef" otherwise
1912
1913             if( $elt->first_child_matches( 'title')) ...
1914
1915           is equivalent to
1916
1917             if( $elt->{first_child} && $elt->{first_child}->passes( 'title'))
1918
1919           "first_child_is" is another name for this method
1920
1921           Similar methods are available for the other navigation methods:
1922
1923           last_child_matches
1924           prev_sibling_matches
1925           next_sibling_matches
1926           prev_elt_matches
1927           next_elt_matches
1928           child_matches
1929           parent_matches
1930       is_first_child ($optional_condition)
1931           returns true (the element) if the element is the first child of its
1932           parent (optionally that satisfies the $optional_condition)
1933
1934       is_last_child ($optional_condition)
1935           returns true (the element) if the element is the last child of its
1936           parent (optionally that satisfies the $optional_condition)
1937
1938       prev_sibling  ($optional_condition)
1939           Return the previous sibling of the element, or the previous sibling
1940           matching $optional_condition
1941
1942       next_sibling  ($optional_condition)
1943           Return the next sibling of the element, or the first one matching
1944           $optional_condition.
1945
1946       next_elt     ($optional_elt, $optional_condition)
1947           Return the next elt (optionally matching $optional_condition) of
1948           the element. This is defined as the next element which opens after
1949           the current element opens.  Which usually means the first child of
1950           the element.  Counter-intuitive as it might look this allows you to
1951           loop through the whole document by starting from the root.
1952
1953           The $optional_elt is the root of a subtree. When the "next_elt" is
1954           out of the subtree then the method returns undef. You can then walk
1955           a sub tree with:
1956
1957             my $elt= $subtree_root;
1958             while( $elt= $elt->next_elt( $subtree_root))
1959               { # insert processing code here
1960               }
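
           A complete version of this loop, given as a sketch on a small
           document invented for the example. The d element is outside the
           subtree, so the walk stops before reaching it:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<doc><sec><a><b/></a><c/></sec><d/></doc>');

my $subtree_root = $twig->root->first_child('sec');

my @tags;
my $elt = $subtree_root;
while ( $elt = $elt->next_elt($subtree_root) ) {
    push @tags, $elt->tag;    # visited in document order
}

print "elements in the subtree: @tags\n";
```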
1961
1962       prev_elt     ($optional_condition)
1963           Return the previous elt (optionally matching $optional_condition)
1964           of the element. This is the first element which opens before the
1965           current one.  It is usually either the last descendant of the
1966           previous sibling or simply the parent
1967
1968       next_n_elt   ($offset, $optional_condition)
1969           Return the $offset-th element that matches the $optional_condition
1970
1971       following_elt
1972           Return the following element (as per the XPath following axis)
1973
1974       preceding_elt
1975           Return the preceding element (as per the XPath preceding axis)
1976
1977       following_elts
1978           Return the list of following elements (as per the XPath following
1979           axis)
1980
1981       preceding_elts
1982           Return the list of preceding elements (as per the XPath preceding
1983           axis)
1984
1985       children     ($optional_condition)
1986           Return the list of children (optionally which matches
1987           $optional_condition) of the element. The list is in document order.
1988
1989       children_count ($optional_condition)
1990           Return the number of children of the element (optionally which
1991           matches $optional_condition)
1992
1993       children_text ($optional_condition)
1994           In array context, returns an array containing the text of children
1995           of the element (optionally which matches $optional_condition)
1996
1997           In scalar context, returns the concatenation of the text of
1998           children of the element
1999
2000       children_trimmed_text ($optional_condition)
2001           In array context, returns an array containing the trimmed text of
2002           children of the element (optionally which matches
2003           $optional_condition)
2004
2005           In scalar context, returns the concatenation of the trimmed text of
2006           children of the element
2007
2008       children_copy ($optional_condition)
2009           Return a list of elements that are copies of the children of the
2010           element, optionally which matches $optional_condition
2011
2012       descendants     ($optional_condition)
2013           Return the list of all descendants (optionally which matches
2014           $optional_condition) of the element. This is the equivalent of the
2015           "getElementsByTagName" of the DOM (by the way, if you are really a
2016           DOM addict, you can use "getElementsByTagName" instead)
2017
2018       getElementsByTagName ($optional_condition)
2019           Same as "descendants"
2020
2021       find_by_tag_name ($optional_condition)
2022           Same as "descendants"
2023
2024       descendants_or_self ($optional_condition)
2025           Same as "descendants" except that the element itself is included in
2026           the list if it matches the $optional_condition
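
           The difference between the children and descendants methods can
           be sketched as follows. This is only an illustration, assuming
           XML::Twig is installed; the document and variable names are
           invented for the example:

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample document.
my $twig= XML::Twig->new;
$twig->parse( '<doc><t>T</t><p>one</p><p>two <b>x</b></p></doc>');
my $root= $twig->root;

my @paras     = $root->children( 'p');       # direct children only
my @para_text = $root->children_text( 'p');  # list context: one string per child
my @bold      = $root->descendants( 'b');    # elements at any depth

printf "%d p children, %d b descendants\n", scalar @paras, scalar @bold;
print "$_\n" for @para_text;
```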
2027
2028       first_descendant  ($optional_condition)
2029           Return the first descendant of the element that matches the
2030           condition
2031
2032       last_descendant  ($optional_condition)
2033           Return the last descendant of the element that matches the
2034           condition
2035
2036       ancestors    ($optional_condition)
2037           Return the list of ancestors (optionally matching
2038           $optional_condition) of the element.  The list is ordered from the
2039           innermost ancestor to the outermost one
2040
2041           NOTE: the element itself is not part of the list, in order to
2042           include it you will have to use ancestors_or_self
2043
2044       ancestors_or_self     ($optional_condition)
2045           Return the list of ancestors (optionally matching
2046           $optional_condition) of the element, including the element (if it
2047           matches the condition).  The list is ordered from the innermost
2048           ancestor to the outermost one
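
           A short sketch of the ordering, assuming XML::Twig is installed
           (the three-level document is made up for the example):

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical document: a contains b, b contains c.
my $twig= XML::Twig->new;
$twig->parse( '<a><b><c/></b></a>');
my $c= $twig->root->first_descendant( 'c');

my @up      = map { $_->tag } $c->ancestors;          # innermost first
my @up_self = map { $_->tag } $c->ancestors_or_self;  # starts with the element itself

print "ancestors: @up\n";
print "ancestors_or_self: @up_self\n";
```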
2049
2050       passes ($condition)
2051           Return the element if it passes the $condition
2052
2053       att          ($att)
2054           Return the value of attribute $att or "undef"
2055
2056       set_att      ($att, $att_value)
2057           Set the attribute of the element to the given value
2058
2059           You can actually set several attributes this way:
2060
2061             $elt->set_att( att1 => "val1", att2 => "val2");
2062
2063       del_att      ($att)
2064           Delete the attribute for the element
2065
2066           You can actually delete several attributes at once:
2067
2068             $elt->del_att( 'att1', 'att2', 'att3');
2069
2070       att_exists ($att)
2071           Returns true if the attribute $att exists for the element, false
2072           otherwise
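
           The attribute methods above can be combined as in this minimal
           sketch, which assumes XML::Twig is installed. The element and
           attribute names are invented for the example:

```perl
use strict;
use warnings;
use XML::Twig;

# A stand-alone element with one attribute and some text.
my $elt= XML::Twig::Elt->new( 'para', { type => 'note' }, 'some text');

$elt->set_att( lang => 'en', rev => '2');   # set several attributes at once
$elt->del_att( 'rev');                      # then remove one of them

print $elt->att( 'type'), ' ', $elt->att( 'lang'), "\n";
print $elt->att_exists( 'rev') ? "rev is still set\n" : "rev was removed\n";
```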
2073
2074       cut Cut the element from the tree. The element still exists, it can be
2075           copied or pasted somewhere else, it is just not attached to the
2076           tree anymore.
2077
2078           Note that the "old" links to the parent, previous and next siblings
2079           can still be accessed using the former_* methods
2080
2081       former_next_sibling
2082           Returns the former next sibling of a cut node (or undef if the node
2083           has not been cut)
2084
2085           This makes it easier to write loops where you cut elements:
2086
2087               my $child= $parent->first_child( 'achild');
2088               while( $child && $child->att( 'cut'))
2089                 { $child->cut; $child= $child->former_next_sibling; }
2090
2091       former_prev_sibling
2092           Returns the former previous sibling of a cut node (or undef if the
2093           node has not been cut)
2094
2095       former_parent
2096           Returns the former parent of a cut node (or undef if the node has
2097           not been cut)
2098
2099       cut_children ($optional_condition)
2100           Cut all the children of the element (or all of those which satisfy
2101           the $optional_condition).
2102
2103           Return the list of children
2104
2105       copy        ($elt)
2106           Return a copy of the element. The copy is a "deep" copy: all sub
2107           elements of the element are duplicated.
2108
2109       paste       ($optional_position, $ref)
2110           Paste a (previously "cut" or newly generated) element. Die if the
2111           element already belongs to a tree.
2112
2113           Note that the calling element is pasted:
2114
2115             $child->paste( first_child => $existing_parent);
2116             $new_sibling->paste( after => $this_sibling_is_already_in_the_tree);
2117
2118           or
2119
2120             my $new_elt= XML::Twig::Elt->new( tag => $content);
2121             $new_elt->paste( $position => $existing_elt);
2122
2123           Example:
2124
2125             my $t= XML::Twig->new->parsefile( 'doc.xml');
2126             my $toc= $t->root->new( 'toc');
2127             $toc->paste( $t->root); # $toc is pasted as first child of the root
2128             foreach my $title ($t->findnodes( '/doc/section/title'))
2129               { my $title_toc= $title->copy;
2130                 # paste $title_toc as the last child of toc
2131                 $title_toc->paste( last_child => $toc)
2132               }
2133
2134           Position options:
2135
2136           first_child (default)
2137               The element is pasted as the first child of $ref
2138
2139           last_child
2140               The element is pasted as the last child of $ref
2141
2142           before
2143               The element is pasted before $ref, as its previous sibling.
2144
2145           after
2146               The element is pasted after $ref, as its next sibling.
2147
2148           within
2149               In this case an extra argument, $offset, should be supplied.
2150               The element will be pasted in the reference element (or in its
2151               first text child) at the given offset. To achieve this the
2152               reference element will be split at the offset.
2153
2154           Note that you can call the underlying methods directly:
2155
2156           paste_before
2157           paste_after
2158           paste_first_child
2159           paste_last_child
2160           paste_within
2161       move       ($optional_position, $ref)
2162           Move an element in the tree.  This is just a "cut" then a "paste".
2163           The syntax is the same as "paste".
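
           A minimal sketch of "move", assuming XML::Twig is installed (the
           document and the @order array are made up for the example):

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical document with three empty children: a, b, c.
my $twig= XML::Twig->new;
$twig->parse( '<doc><a/><b/><c/></doc>');
my $root= $twig->root;

# Same position keywords as for paste.
$root->first_child( 'a')->move( last_child => $root);               # b c a
$root->first_child( 'c')->move( before => $root->first_child( 'b')); # c b a

my @order= map { $_->tag } $root->children;
print "@order\n";
```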
2164
2165       replace       ($ref)
2166           Replaces an element in the tree. Sometimes it is just not possible
2167           to "cut" an element then "paste" another in its place, so "replace"
2168           comes in handy.  The calling element replaces $ref.
2169
2170       replace_with   (@elts)
2171           Replaces the calling element with one or more elements
2172
2173       delete
2174           Cut the element and free its memory.
2175
2176       prefix       ($text, $optional_option)
2177           Add a prefix to an element. If the element is a "PCDATA" element
2178           the text is added to the pcdata, if the element's first child is a
2179           "PCDATA" then the text is added to its pcdata, otherwise a new
2180           "PCDATA" element is created and pasted as the first child of the
2181           element.
2182
2183           If the option is "asis" then the prefix is added asis: it is
2184           created in a separate "PCDATA" element with an "asis" property. You
2185           can then write:
2186
2187             $elt1->prefix( '<b>', 'asis');
2188
2189           to create a "<b>" in the output of "print".
2190
2191       suffix       ($text, $optional_option)
2192           Add a suffix to an element. If the element is a "PCDATA" element
2193           the text is added to the pcdata, if the element's last child is a
2194           "PCDATA" then the text is added to its pcdata, otherwise a new
2195           PCDATA element is created and pasted as the last child of the
2196           element.
2197
2198           If the option is "asis" then the suffix is added asis: it is
2199           created in a separate "PCDATA" element with an "asis" property. You
2200           can then write:
2201
2202             $elt2->suffix( '</b>', 'asis');
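
           Without the "asis" option, "prefix" and "suffix" simply extend
           the text of the element. A small sketch, assuming XML::Twig is
           installed (the element is made up for the example):

```perl
use strict;
use warnings;
use XML::Twig;

# A stand-alone <p> element whose only child is a PCDATA holding "world".
my $elt= XML::Twig::Elt->new( p => 'world');

$elt->prefix( 'hello ');   # added to the existing first PCDATA child
$elt->suffix( '!');        # added to the existing last PCDATA child

print $elt->text, "\n";
```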
2203
2204       trim
2205           Trim the element in-place: spaces at the beginning and at the end
2206           of the element are discarded and multiple spaces within the element
2207           (or its descendants) are replaced by a single space.
2208
2209           Note that in some cases you can still end up with multiple spaces,
2210           if they are split between several elements:
2211
2212             <doc>  text <b>  hah! </b>  yep</doc>
2213
2214           gets trimmed to
2215
2216             <doc>text <b> hah! </b> yep</doc>
2217
2218           This is somewhere in between a bug and a feature.
2219
2220       normalize
2221           Merge together all consecutive pcdata elements in the element (if
2222           for example you have turned some elements into pcdata using
2223           "erase", this will give you a "clean" element in which all text
2224           fragments are as long as possible).
2225
2226       simplify (%options)
2227           Return a data structure suspiciously similar to XML::Simple's.
2228           Options are identical to XMLin options, see XML::Simple doc for
2229           more details (or use Data::Dumper or YAML to dump the data
2230           structure)
2231
2232           content_key
2233           forcearray
2234           keyattr
2235           noattr
2236           normalize_space
2237               aka normalise_space
2238
2239           variables (%var_hash)
2240               %var_hash is a hash { name => value }
2241
2242               This option allows variables in the XML to be expanded when the
2243               file is read. (there is no facility for putting the variable
2244               names back if you regenerate XML using XMLout).
2245
2246               A 'variable' is any text of the form ${name} (or $name) which
2247               occurs in an attribute value or in the text content of an
2248               element. If 'name' matches a key in the supplied hashref,
2249               ${name} will be replaced with the corresponding value from the
2250               hashref. If no matching key is found, the variable will not be
2251               replaced.
2252
2253           var_att ($attribute_name)
2254               This option gives the name of an attribute that will be used to
2255               create variables in the XML:
2256
2257                 <dirs>
2258                   <dir name="prefix">/usr/local</dir>
2259                   <dir name="exec_prefix">$prefix/bin</dir>
2260                 </dirs>
2261
2262               use "var => 'name'" to get $prefix replaced by /usr/local in
2263               the generated data structure
2264
2265               By default variables are captured by the following regexp:
2266               /\$(\w+)/
2267
2268           var_regexp (regexp)
2269               This option changes the regexp used to capture variables. The
2270               variable name should be in $1
2271
2272           group_tags { grouping tag => grouped tag, grouping tag 2 => grouped
2273           tag 2...}
2274               Option used to simplify the structure: elements listed will not
2275               be used.  Their children will be, they will be considered
2276               children of the element parent.
2277
2278               If the element is:
2279
2280                 <config host="laptop.xmltwig.com">
2281                   <server>localhost</server>
2282                   <dirs>
2283                     <dir name="base">/home/mrodrigu/standards</dir>
2284                     <dir name="tools">$base/tools</dir>
2285                   </dirs>
2286                   <templates>
2287                     <template name="std_def">std_def.templ</template>
2288                     <template name="dummy">dummy</template>
2289                   </templates>
2290                 </config>
2291
2292               Then calling simplify with "group_tags => { dirs => 'dir',
2293               templates => 'template'}" makes the data structure be exactly
2294               as if the start and end tags for "dirs" and "templates" were
2295               not there.
2296
2297               A YAML dump of the structure
2298
2299                 base: '/home/mrodrigu/standards'
2300                 host: laptop.xmltwig.com
2301                 server: localhost
2302                 template:
2303                   - std_def.templ
2304                   - dummy
2305                 tools: '$base/tools'
2306
2307       split_at        ($offset)
2308           Split a text ("PCDATA" or "CDATA") element in 2 at $offset, the
2309           original element now holds the first part of the string and a new
2310           element holds the right part. The new element is returned
2311
2312           If the element is not a text element then the first text child of
2313           the element is split
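
           A minimal sketch of "split_at", assuming XML::Twig is installed.
           The element and the offset are chosen for illustration; here the
           element is not a text element, so its first text child is split:

```perl
use strict;
use warnings;
use XML::Twig;

# A <p> element holding a single PCDATA child.
my $elt= XML::Twig::Elt->new( p => 'hello world');

# Split after the 6th character; the new (right-hand) element is returned.
my $right= $elt->split_at( 6);

print "left:  '", $elt->first_child->text, "'\n";
print "right: '", $right->text, "'\n";
```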
2314
2315       split        ( $optional_regexp, $tag1, $atts1, $tag2, $atts2...)
2316           Split the text descendants of an element in place, the text is
2317           split using the $regexp, if the regexp includes () then the matched
2318           separators will be wrapped in elements.  $1 is wrapped in $tag1,
2319           with attributes $atts1 if $atts1 is given (as a hashref), $2 is
2320           wrapped in $tag2...
2321
2322           if $elt is "<p>tati tata <b>tutu tati titi</b> tata tati tata</p>"
2323
2324             $elt->split( qr/(ta)ti/, 'foo', {type => 'toto'} )
2325
2326           will change $elt to
2327
2328             <p><foo type="toto">ta</foo> tata <b>tutu <foo type="toto">ta</foo>
2329                 titi</b> tata <foo type="toto">ta</foo> tata</p>
2330
2331           The regexp can be passed either as a string or as "qr//" (perl
2332           5.005 and later), it defaults to \s+ just as the "split" built-in
2333           (but this would be quite a useless behaviour without the
2334           $optional_tag parameter)
2335
2336           $optional_tag defaults to PCDATA or CDATA, depending on the initial
2337           element type
2338
2339           The list of descendants is returned (including un-touched original
2340           elements and newly created ones)
2341
2342       mark        ( $regexp, $optional_tag, $optional_attribute_ref)
2343           This method behaves exactly as split, except only the newly created
2344           elements are returned
2345
2346       wrap_children ( $regexp_string, $tag, $optional_attribute_hashref)
2347           Wrap the children of the element that match the regexp in an
2348           element $tag.  If $optional_attribute_hashref is passed then the
2349           new element will have these attributes.
2350
2351           The $regexp_string includes tags, within pointy brackets, as in
2352           "<title><para>+" and the usual Perl modifiers (+*?...).  Tags can
2353           be further qualified with attributes: "<para type="warning"
2354           classif="cosmic_secret">+". The values for attributes should be
2355           xml-escaped: "<candy type="M&amp;Ms">*" ("<", "&", ">" and """
2356           should be escaped).
2357
2358           Note that elements might get extra "id" attributes in the process.
2359           See add_id.  Use strip_att to remove unwanted id's.
2360
2361           Here is an example:
2362
2363           If the element $elt has the following content:
2364
2365             <elt>
2366              <p>para 1</p>
2367              <l_l1_1>list 1 item 1 para 1</l_l1_1>
2368                <l_l1>list 1 item 1 para 2</l_l1>
2369              <l_l1_n>list 1 item 2 para 1 (only para)</l_l1_n>
2370              <l_l1_n>list 1 item 3 para 1</l_l1_n>
2371                <l_l1>list 1 item 3 para 2</l_l1>
2372                <l_l1>list 1 item 3 para 3</l_l1>
2373              <l_l1_1>list 2 item 1 para 1</l_l1_1>
2374                <l_l1>list 2 item 1 para 2</l_l1>
2375              <l_l1_n>list 2 item 2 para 1 (only para)</l_l1_n>
2376              <l_l1_n>list 2 item 3 para 1</l_l1_n>
2377                <l_l1>list 2 item 3 para 2</l_l1>
2378                <l_l1>list 2 item 3 para 3</l_l1>
2379             </elt>
2380
2381           Then the code
2382
2383             $elt->wrap_children( q{<l_l1_1><l_l1>*} , li => { type => "ul1" });
2384             $elt->wrap_children( q{<l_l1_n><l_l1>*} , li => { type => "ul" });
2385
2386             $elt->wrap_children( q{<li type="ul1"><li type="ul">+}, "ul");
2387             $elt->strip_att( 'id');
2388             $elt->strip_att( 'type');
2389             $elt->print;
2390
2391           will output:
2392
2393             <elt>
2394                <p>para 1</p>
2395                <ul>
2396                  <li>
2397                    <l_l1_1>list 1 item 1 para 1</l_l1_1>
2398                    <l_l1>list 1 item 1 para 2</l_l1>
2399                  </li>
2400                  <li>
2401                    <l_l1_n>list 1 item 2 para 1 (only para)</l_l1_n>
2402                  </li>
2403                  <li>
2404                    <l_l1_n>list 1 item 3 para 1</l_l1_n>
2405                    <l_l1>list 1 item 3 para 2</l_l1>
2406                    <l_l1>list 1 item 3 para 3</l_l1>
2407                  </li>
2408                </ul>
2409                <ul>
2410                  <li>
2411                    <l_l1_1>list 2 item 1 para 1</l_l1_1>
2412                    <l_l1>list 2 item 1 para 2</l_l1>
2413                  </li>
2414                  <li>
2415                    <l_l1_n>list 2 item 2 para 1 (only para)</l_l1_n>
2416                  </li>
2417                  <li>
2418                    <l_l1_n>list 2 item 3 para 1</l_l1_n>
2419                    <l_l1>list 2 item 3 para 2</l_l1>
2420                    <l_l1>list 2 item 3 para 3</l_l1>
2421                  </li>
2422                </ul>
2423             </elt>
2424
2425       subs_text ($regexp, $replace)
2426           subs_text does text substitution, similar to perl's "s///"
2427           operator.
2428
2429           $regexp must be a perl regexp, created with the "qr" operator.
2430
2431           $replace can include "$1, $2"... from the $regexp. It can also be
2432           used to create element and entities, by using "&elt( tag => { att
2433           => val }, text)" (similar syntax as "new") and "&ent( name)".
2434
2435           Here is a rather complex example:
2436
2437             $elt->subs_text( qr{(?<!do not )link to (http://([^\s,]*))},
2438                              'see &elt( a =>{ href => $1 }, $2)'
2439                            );
2440
2441           This will replace text like "link to http://www.xmltwig.com" by "see
2442           <a href="http://www.xmltwig.com">www.xmltwig.com</a>", but not "do
2443           not link to...".
2444
2445           Generating entities (here replacing spaces with &nbsp;):
2446
2447             $elt->subs_text( qr{ }, '&ent( "&nbsp;")');
2448
2449           or, using a variable:
2450
2451             my $ent="&nbsp;";
2452             $elt->subs_text( qr{ }, "&ent( '$ent')");
2453
2454           Note that the substitution is always global, as in using the "g"
2455           modifier in a perl substitution, and that it is performed on all
2456           text descendants of the element.
2457
2458           Bug: in the $regexp, you can only use "\1", "\2"... if the
2459           replacement expression does not include elements or attributes. eg
2460
2461             $t->subs_text( qr/((t[aiou])\2)/, '$2');             # ok, replaces toto, tata, titi, tutu by to, ta, ti, tu
2462             $t->subs_text( qr/((t[aiou])\2)/, '&elt(p => $1)' ); # NOK, does not find toto...
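
           The simplest use of "subs_text" is a plain text replacement with
           a captured group. A hedged sketch, assuming XML::Twig is
           installed (the element content is made up for the example):

```perl
use strict;
use warnings;
use XML::Twig;

# A stand-alone <p> element with some text to rewrite.
my $elt= XML::Twig::Elt->new( p => 'price: 10 USD, was 12 USD');

# The regexp is built with qr; $1 in the replacement is the captured number.
# The substitution is global, so both amounts are rewritten.
$elt->subs_text( qr/(\d+) USD/, '$1 dollars');

print $elt->text, "\n";
```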
2463
2464       add_id ($optional_coderef)
2465           Add an id to the element.
2466
2467           The id is an attribute, "id" by default, see the "id" option for
2468           XML::Twig "new" to change it. Use an id starting with "#" to get an
2469           id that's not output by print, flush or sprint, yet that allows you
2470           to use the elt_id method to get the element easily.
2471
2472           If the element already has an id, no new id is generated.
2473
2474           By default the method creates an id of the form "twig_id_<nnnn>",
2475           where "<nnnn>" is a number, incremented each time the method is
2476           called successfully.
2477
2478       set_id_seed ($prefix)
2479           By default the id generated by "add_id" is "twig_id_<nnnn>",
2480           "set_id_seed" changes the prefix to $prefix and resets the number
2481           to 1
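
           A small sketch of "add_id", assuming XML::Twig is installed and
           the default "id" attribute name is in use (the document is made
           up for the example):

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical document: one section without an id, one with an id.
my $twig= XML::Twig->new;
$twig->parse( '<doc><sec/><sec id="intro"/></doc>');
my( $s1, $s2)= $twig->root->children( 'sec');

$s1->add_id;   # no id yet: one is generated
$s2->add_id;   # already has an id: left unchanged

print $s1->att( 'id'), "\n";
print $s2->att( 'id'), "\n";
```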
2482
2483       strip_att ($att)
2484           Remove the attribute $att from all descendants of the element
2485           (including the element)
2486
2487           Return the element
2488
2489       change_att_name ($old_name, $new_name)
2490           Change the name of the attribute from $old_name to $new_name. If
2491           there is no attribute $old_name nothing happens.
2492
2493       lc_attnames
2494           Lowercase the names of all the attributes of the element.
2495
2496       sort_children_on_value( %options)
2497           Sort the children of the element in place according to their text.
2498           All children are sorted.
2499
2500           Return the element, with its children sorted.
2501
2502           %options are
2503
2504             type  : numeric |  alpha     (default: alpha)
2505             order : normal  |  reverse   (default: normal)
2506
2509       sort_children_on_att ($att, %options)
2510           Sort the children of the  element in place according to attribute
2511           $att.  %options are the same as for "sort_children_on_value"
2512
2513           Return the element.
2514
2515       sort_children_on_field ($tag, %options)
2516           Sort the children of the element in place, according to the field
2517           $tag (the text of the first child of the child with this tag).
2518           %options are the same as for "sort_children_on_value".
2519
2520           Return the element, with its children sorted
2521
2522       sort_children( $get_key, %options)
2523           Sort the children of the element in place. The $get_key argument is
2524           a reference to a function that returns the sort key when passed an
2525           element.
2526
2527           For example:
2528
2529             $elt->sort_children( sub { $_[0]->{'att'}->{"nb"} + $_[0]->text },
2530                                  type => 'numeric', order => 'reverse'
2531                                );
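
           The attribute and value variants can be sketched as follows,
           assuming XML::Twig is installed. The list document and the
           @by_att and @by_text arrays are made up for the example:

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical list whose items are out of order.
my $twig= XML::Twig->new;
$twig->parse( '<list><item n="3">c</item><item n="1">a</item><item n="2">b</item></list>');
my $root= $twig->root;

# Sort on the numeric value of the "n" attribute.
$root->sort_children_on_att( 'n', type => 'numeric');
my @by_att= map { $_->text } $root->children;

# Sort on the text of the children, in reverse alphabetical order.
$root->sort_children_on_value( order => 'reverse');
my @by_text= map { $_->text } $root->children;

print "@by_att\n";
print "@by_text\n";
```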
2532
2533       field_to_att ($cond, $att)
2534           Turn the text of the first sub-element matched by $cond into the
2535           value of attribute $att of the element. If $att is omitted then
2536           $cond is used as the name of the attribute, which makes sense only
2537           if $cond is a valid element (and attribute) name.
2538
2539           The sub-element is then cut.
2540
2541       att_to_field ($att, $tag)
2542           Take the value of attribute $att and create a sub-element $tag as
2543           first child of the element. If $tag is omitted then $att is used as
2544           the name of the sub-element.
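
           A minimal sketch of "field_to_att", assuming XML::Twig is
           installed (the person document is made up for the example):

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical record with two fields stored as sub-elements.
my $twig= XML::Twig->new;
$twig->parse( '<person><name>Ann</name><age>30</age></person>');
my $person= $twig->root;

# The text of <name> becomes the "name" attribute; <name> itself is cut.
$person->field_to_att( 'name');

print $person->att( 'name'), "\n";
print $person->first_child( 'name') ? "name is still a child\n" : "name was cut\n";
```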
2545
2546       get_xpath  ($xpath, $optional_offset)
2547           Return a list of elements satisfying the $xpath. $xpath is an
2548           XPATH-like expression.
2549
2550           A subset of the XPATH abbreviated syntax is covered:
2551
2552             tag
2553             tag[1] (or any other positive number)
2554             tag[last()]
2555             tag[@att] (the attribute exists for the element)
2556             tag[@att="val"]
2557             tag[@att=~ /regexp/]
2558             tag[@att1="val1" and @att2="val2"]
2559             tag[@att1="val1" or @att2="val2"]
2560             tag[string()="toto"] (returns tag elements whose text (as per the text method)
2561                                  is toto)
2562             tag[string()=~/regexp/] (returns tag elements whose text (as per the text
2563                                     method) matches regexp)
2564             expressions can start with / (search starts at the document root)
2565             expressions can start with . (search starts at the current element)
2566             // can be used to get all descendants instead of just direct children
2567             * matches any tag
2568
2569           So the following examples from the XPath
2570           recommendation
2571           <http://www.w3.org/TR/xpath.html#path-abbrev> work:
2572
2573             para selects the para element children of the context node
2574             * selects all element children of the context node
2575             para[1] selects the first para child of the context node
2576             para[last()] selects the last para child of the context node
2577             */para selects all para grandchildren of the context node
2578             /doc/chapter[5]/section[2] selects the second section of the fifth chapter
2579                of the doc
2580             chapter//para selects the para element descendants of the chapter element
2581                children of the context node
2582             //para selects all the para descendants of the document root and thus selects
2583                all para elements in the same document as the context node
2584             //olist/item selects all the item elements in the same document as the
2585                context node that have an olist parent
2586             .//para selects the para element descendants of the context node
2587             .. selects the parent of the context node
2588             para[@type="warning"] selects all para children of the context node that have
2589                a type attribute with value warning
2590             employee[@secretary and @assistant] selects all the employee children of the
2591                context node that have both a secretary attribute and an assistant
2592                attribute
2593
2594           The elements will be returned in the document order.
2595
2596           If $optional_offset is used then only one element will be returned,
2597           the one with the appropriate offset in the list, starting at 0
2598
2599           Quoting and interpolating variables can be a pain when the Perl
2600           syntax and the XPATH syntax collide, so use alternate quoting
2601           mechanisms like q or qq (I like q{} and qq{} myself).
2602
2603           Here are some more examples to get you started:
2604
2605             my $p1= "p1";
2606             my $p2= "p2";
2607             my @res= $t->get_xpath( qq{p[string( "$p1") or string( "$p2")]});
2608
2609             my $a= "a1";
2610             my @res= $t->get_xpath( qq{//*[@att="$a"]});
2611
2612             my $val= "a1";
2613             my $exp= qq{//p[ \@att='$val']}; # you need to use \@ or you will get a warning
2614             my @res= $t->get_xpath( $exp);
2615
2616           Note that the only supported regexp delimiter is / and that you
2617           must backslash all / in regexps AND in regular strings.
2618
2619           XML::Twig does not provide natively full XPATH support, but you can
2620           use "XML::Twig::XPath" to get "findnodes" to use "XML::XPath" as
2621           the XPath engine, with full coverage of the spec.
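
           A short sketch of the native "get_xpath", including the optional
           offset. It assumes XML::Twig is installed; the document and the
           variable names are made up for the example:

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical document with two chapters.
my $twig= XML::Twig->new;
$twig->parse(   '<doc><chapter><para type="warning">w1</para><para>p2</para></chapter>'
              . '<chapter><para>p3</para></chapter></doc>');
my $root= $twig->root;

# q{} avoids clashes between Perl quoting and the XPath quotes.
my @warnings = $root->get_xpath( q{//para[@type="warning"]});
my @all      = $root->get_xpath( 'chapter/para');      # in document order
my $second   = $root->get_xpath( 'chapter/para', 1);   # offsets start at 0

print scalar( @warnings), " warning(s), ", scalar( @all), " para(s)\n";
print "second para: ", $second->text, "\n";
```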
2622
2626       find_nodes
2627           same as "get_xpath"
2628
2629       findnodes
2630           same as "get_xpath"
2631
2632       text @optional_options
2633           Return a string consisting of all the "PCDATA" and "CDATA" in an
2634           element, without any tags. The text is not XML-escaped: base
2635           entities such as "&" and "<" are not escaped.
2636
2637           The '"no_recurse"' option will only return the text of the element,
2638           not of any included sub-elements (same as "text_only").
2639
2640       text_only
2641           Same as "text" except that the text returned doesn't include the
2642           text of sub-elements.
2643
2644       trimmed_text
2645           Same as "text" except that the text is trimmed: leading and
2646           trailing spaces are discarded, consecutive spaces are collapsed
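
           The three text methods can be compared on one element. A sketch,
           assuming XML::Twig is installed (the paragraph is made up for the
           example):

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical paragraph with extra spaces and an inline element.
my $twig= XML::Twig->new;
$twig->parse( '<p>  Hello <b>big</b>   world  </p>');
my $p= $twig->root;

my $all     = $p->text;           # all text, sub-elements included
my $own     = $p->text_only;      # text of <p> itself, without <b>
my $trimmed = $p->trimmed_text;   # trimmed, spaces collapsed

print "text:         '$all'\n";
print "text_only:    '$own'\n";
print "trimmed_text: '$trimmed'\n";
```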
2647
2648       set_text        ($string)
2649           Set the text for the element: if the element is a "PCDATA", just
2650           set its text, otherwise cut all the children of the element and
2651           create a single "PCDATA" child for it, which holds the text.
2652
2653       merge ($elt2)
2654           Move the content of $elt2 within the element
2655
2656       insert         ($tag1, [$optional_atts1], $tag2, [$optional_atts2],...)
2657           For each tag in the list inserts an element $tag as the only child
2658           of the element.  The element gets the optional attributes
2659           in "$optional_atts<n>".  All children of the element are set as
2660           children of the new element.  The upper level element is returned.
2661
2662             $p->insert( table => { border=> 1}, 'tr', 'td')
2663
2664           puts the content of $p in a table with a visible border, a single
2665           "tr" and a single "td" and returns the "table" element:
2666
2667             <p><table border="1"><tr><td>original content of p</td></tr></table></p>
2668
2669       wrap_in        (@tag)
2670           Wrap elements in @tag as the successive ancestors of the element,
2671           returns the new element.  "$elt->wrap_in( 'td', 'tr', 'table')"
2672           wraps the element as a single cell in a table for example.
2673
2674           Optionally each tag can be followed by a hashref of attributes,
2675           that will be set on the wrapping element:
2676
2677             $elt->wrap_in( p => { class => "advisory" }, div => { class => "intro", id => "div_intro" });
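
           A minimal sketch of "wrap_in", assuming XML::Twig is installed
           (the document is made up for the example). The first tag in the
           list becomes the parent of the element, the last one the
           outermost ancestor:

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical document with a single paragraph.
my $twig= XML::Twig->new;
$twig->parse( '<doc><p>content</p></doc>');
my $p= $twig->root->first_child( 'p');

# Wrap the paragraph as a single cell in a table.
$p->wrap_in( 'td', 'tr', 'table');

print $twig->root->sprint, "\n";
```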
2678
2679       insert_new_elt ($opt_position, $tag, $opt_atts_hashref, @opt_content)
2680           Combines a "new" and a "paste": creates a new element using $tag,
2681           $opt_atts_hashref and @opt_content which are arguments similar to
2682           those for "new", then paste it, using $opt_position or
2683           'first_child', relative to $elt.
2684
2685           Return the newly created element
2686
2687       erase
2688           Erase the element: the element is deleted and all of its children
2689           are pasted in its place.
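
           A small sketch of "erase" followed by "normalize", assuming
           XML::Twig is installed (the document is made up for the
           example):

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical document with an inline element to get rid of.
my $twig= XML::Twig->new;
$twig->parse( '<doc><b>bold</b> text</doc>');
my $root= $twig->root;

$root->first_child( 'b')->erase;   # the text of <b> takes its place
$root->normalize;                  # merge adjacent text elements

print $root->sprint, "\n";
```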
2690
2691       set_content    ( $optional_atts, @list_of_elt_and_strings) (
2692       $optional_atts, '#EMPTY')
2693           Set the content for the element, from a list of strings and
2694           elements.  Cuts all the element children, then pastes the list
2695           elements as the children.  This method will create a "PCDATA"
2696           element for any strings in the list.
2697
2698           The $optional_atts argument is the ref of a hash of attributes. If
2699           this argument is used then the previous attributes are deleted,
2700           otherwise they are left untouched.
2701
2702           WARNING: if you rely on ID's then you will have to set the id
2703           yourself. At this point the element does not belong to a twig yet,
2704           so the ID attribute is not known so it won't be stored in the ID
2705           list.
2706
2707           A content of '"#EMPTY"' creates an empty element.
2708
2709       namespace ($optional_prefix)
2710           Return the URI of the namespace that $optional_prefix or the
2711           element name belongs to. If the name doesn't belong to any
2712           namespace, "undef" is returned.
2713
2714       local_name
2715           Return the local name (without the prefix) for the element
2716
2717       ns_prefix
2718           Return the namespace prefix for the element
2719
2720       current_ns_prefixes
2721           Return a list of namespace prefixes valid for the element. The
2722           order of the prefixes in the list has no meaning. If the default
2723           namespace is currently bound, '' appears in the list.
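
           As an illustration, the namespace accessors can be tried on a
           prefixed element. The namespace URI and the tag names below are
           made up for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<x:doc xmlns:x="http://example.com/x"><x:item/></x:doc>');

# The only child of the root is the x:item element.
my $item = $twig->root->first_child;

print "prefix:     ", $item->ns_prefix,  "\n";
print "local name: ", $item->local_name, "\n";
print "namespace:  ", $item->namespace,  "\n";
```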
2724
2725       inherit_att  ($att, @optional_tag_list)
2726           Return the value of an attribute inherited from parent tags. The
2727           value returned is found by looking for the attribute in the element
2728           then in turn in each of its ancestors. If the @optional_tag_list is
2729           supplied only those ancestors whose tag is in the list will be
2730           checked.
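
           A short sketch of attribute inheritance, assuming a hypothetical
           document where "lang" is set on the root and overridden on one
           section:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse(
    '<doc lang="en"><sect><p>text</p></sect><sect lang="fr"><p>texte</p></sect></doc>'
);

my @p = $twig->root->descendants('p');

# Nearest ancestor carrying the attribute wins.
my $first  = $p[0]->inherit_att('lang');
my $second = $p[1]->inherit_att('lang');

# Restrict the lookup to elements whose tag is "doc".
my $from_doc = $p[1]->inherit_att( 'lang', 'doc' );

print "$first $second $from_doc\n";
```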
2731
2732       all_children_are ($optional_condition)
2733           Return 1 if all children of the element pass the
2734           $optional_condition, 0 otherwise.
2735
2736       level       ($optional_condition)
2737           Return the depth of the element in the twig (root is 0).  If
2738           $optional_condition is given then only ancestors that match the
2739           condition are counted.
2740
2741           WARNING: in a tree created using the "twig_roots" option this will
2742           not return the level in the document tree, level 0 will be the
2743           document root, level 1 will be the "twig_roots" elements. During
2744           the parsing (in a "twig_handler") you can use the "depth" method on
2745           the twig object to get the real parsing depth.
2746
2747       in           ($potential_parent)
2748           Return true if the element is in the potential_parent
2749           ($potential_parent is an element)
2750
2751       in_context   ($cond, $optional_level)
2752           Return true if the element is included in an element which passes
2753           $cond optionally within $optional_level levels. The returned value
2754           is the including element.
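
           The three methods above ("level", "in" and "in_context") can be
           compared on one element. This is only a sketch, with an invented
           document structure:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<doc><sect><list><item>x</item></list></sect></doc>');

my $root = $twig->root;
my $sect = $root->first_child('sect');
my $item = $root->first_descendant('item');

my $depth     = $item->level;            # counts every ancestor
my $in_sect   = $item->in($sect);        # true: $sect is an ancestor
my $container = $item->in_context('sect');        # the enclosing sect
my $too_far   = $item->in_context( 'sect', 1 );   # only looks 1 level up

print "depth: $depth\n";
print "container: ", $container->tag, "\n";
```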
2755
2756       pcdata
2757           Return the text of a "PCDATA" element or "undef" if the element is
2758           not "PCDATA".
2759
2760       pcdata_xml_string
2761           Return the text of a "PCDATA" element or undef if the element is
2762           not "PCDATA".  The text is "XML-escaped" ('&' and '<' are replaced
2763           by '&amp;' and '&lt;')
2764
2765       set_pcdata     ($text)
2766           Set the text of a "PCDATA" element. This method does not check that
2767           the element is indeed a "PCDATA" so usually you should use
2768           "set_text" instead.
2769
2770       append_pcdata  ($text)
2771           Add the text at the end of a "PCDATA" element.
2772
2773       is_cdata
2774           Return 1 if the element is a "CDATA" element, returns 0 otherwise.
2775
2776       is_text
2777           Return 1 if the element is a "CDATA" or "PCDATA" element, returns 0
2778           otherwise.
2779
2780       cdata
2781           Return the text of a "CDATA" element or "undef" if the element is
2782           not "CDATA".
2783
2784       cdata_string
2785           Return the XML string of a "CDATA" element, including the opening
2786           and closing markers.
2787
2788       set_cdata     ($text)
2789           Set the text of a "CDATA" element.
2790
2791       append_cdata  ($text)
2792           Add the text at the end of a "CDATA" element.
2793
2794       remove_cdata
2795           Turns all "CDATA" sections in the element into regular "PCDATA"
2796           elements. This is useful when converting XML to HTML, as browsers
2797           do not support CDATA sections.
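
           A minimal sketch of the "CDATA" accessors, assuming a document
           whose root contains a single CDATA section:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<doc><![CDATA[a < b]]></doc>');

# The CDATA section is stored as its own element under the root.
my $cdata = $twig->root->first_child;

my $is_cdata = $cdata->is_cdata;
my $is_text  = $cdata->is_text;
my $text     = $cdata->cdata;          # raw text, no markers
my $string   = $cdata->cdata_string;   # text with the CDATA markers

print "$text\n$string\n";
```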
2798
2799       extra_data
2800           Return the extra_data (comments and PI's) attached to an element
2801
2802       set_extra_data     ($extra_data)
2803           Set the extra_data (comments and PI's) attached to an element
2804
2805       append_extra_data  ($extra_data)
2806           Append extra_data to the existing extra_data before the element (if
2807           no previous extra_data exists then it is created)
2808
2809       set_asis
2810           Set a property of the element that causes it to be output without
2811           being XML escaped by the print functions: if it contains "a < b" it
2812           will be output as such and not as "a &lt; b". This can be useful to
2813           create text elements that will be output as markup. Note that all
2814           "PCDATA" descendants of the element are also marked as having the
2815           property (they are the ones that are actually impacted by the
2816           change).
2817
2818           If the element is a "CDATA" element it will also be output asis,
2819           without the "CDATA" markers. The same goes for any "CDATA"
2820           descendant of the element
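
           To illustrate the effect of the "asis" property, here is a small
           sketch that prints the same element before and after "set_asis".
           The element is created standalone, purely for the example.

```perl
use strict;
use warnings;
use XML::Twig;

# A standalone element with a text child containing a '<'.
my $elt = XML::Twig::Elt->new( p => 'a < b' );

my $escaped = $elt->sprint;   # text is XML-escaped on output

$elt->set_asis;
my $raw = $elt->sprint;       # text is output untouched

print "$escaped\n$raw\n";
```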
2821
2822       set_not_asis
2823           Unsets the "asis" property for the element and its text
2824           descendants.
2825
2826       is_asis
2827           Return the "asis" property status of the element ( 1 or "undef")
2828
2829       closed
2830           Return true if the element has been closed. Might be useful if you
2831           are somewhere in the tree, during the parse, and have no idea
2832           whether a parent element is completely loaded or not.
2833
2834       get_type
2835           Return the type of the element: '"#ELT"' for "real" elements, or
2836           '"#PCDATA"', '"#CDATA"', '"#COMMENT"', '"#ENT"', '"#PI"'
2837
2838       is_elt
2839           Return the tag if the element is a "real" element, or 0 if it is
2840           "PCDATA", "CDATA"...
2841
2842       contains_only_text
2843           Return 1 if the element does not contain any other "real" element
2844
2845       contains_only ($exp)
2846           Return the list of children if all children of the element match
2847           the expression $exp
2848
2849             if( $para->contains_only( 'tt')) { ... }
2850
2851       contains_a_single ($exp)
2852           If the element contains a single child that matches the expression
2853           $exp returns that element. Otherwise returns 0.
2854
2855       is_field
2856           same as "contains_only_text"
2857
2858       is_pcdata
2859           Return 1 if the element is a "PCDATA" element, returns 0 otherwise.
2860
2861       is_ent
2862           Return 1 if the element is an entity (an unexpanded entity)
2863           element, return 0 otherwise.
2864
2865       is_empty
2866           Return 1 if the element is empty, 0 otherwise
2867
2868       set_empty
2869           Flags the element as empty. No further check is made, so if the
2870           element is actually not empty the output will be malformed. The
2871           only effect of this method is that the output will be "<tag
2872           att="value"/>".
2873
2874       set_not_empty
2875           Flags the element as not empty. If it is actually empty then the
2876           element will be output as "<tag att="value"></tag>".
2877
2878       is_pi
2879           Return 1 if the element is a processing instruction ("#PI")
2880           element, return 0 otherwise.
2881
2882       target
2883           Return the target of a processing instruction
2884
2885       set_target ($target)
2886           Set the target of a processing instruction
2887
2888       data
2889           Return the data part of a processing instruction
2890
2891       set_data ($data)
2892           Set the data of a processing instruction
2893
2894       set_pi ($target, $data)
2895           Set the target and data of a processing instruction
2896
2897       pi_string
2898           Return the string form of a processing instruction ("<?target
2899           data?>")
2900
2901       is_comment
2902           Return 1 if the element is a comment ("#COMMENT") element, return 0
2903           otherwise.
2904
2905       set_comment ($comment_text)
2906           Set the text for a comment
2907
2908       comment
2909           Return the content of a comment (just the text, not the "<!--" and
2910           "-->")
2911
2912       comment_string
2913           Return the XML string for a comment ("<!-- comment -->")
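
           The processing instruction and comment accessors only apply when
           comments and PIs are stored as elements. The sketch below assumes
           the twig is created with the "comments" and "pi" options set to
           'process'; the document content is invented.

```perl
use strict;
use warnings;
use XML::Twig;

# Ask the twig to build #COMMENT and #PI elements.
my $twig = XML::Twig->new( comments => 'process', pi => 'process' );
$twig->parse('<doc><!-- note --><?app do this?><p>text</p></doc>');

my @children  = $twig->root->children;
my ($comment) = grep { $_->is_comment } @children;
my ($pi)      = grep { $_->is_pi } @children;

print $comment->comment,        "\n";   # text only
print $comment->comment_string, "\n";   # with the comment markers
print $pi->target, " / ", $pi->data, "\n";
print $pi->pi_string, "\n";
```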
2914
2915       set_ent ($entity)
2916           Set a (non-expanded) entity ("#ENT"). $entity is the entity text
2917           ("&ent;")
2918
2919       ent Return the entity for an entity ("#ENT") element ("&ent;")
2920
2921       ent_name
2922           Return the entity name for an entity ("#ENT") element ("ent")
2923
2924       ent_string
2925           Return the entity, either expanded if the expanded version is
2926           available, or non-expanded ("&ent;") otherwise
2927
2928       child ($offset, $optional_condition)
2929           Return the $offset-th child of the element, optionally the
2930           $offset-th child that matches $optional_condition. The children are
2931           treated as a list, so "$elt->child( 0)" is the first child, while
2932           "$elt->child( -1)" is the last child.
2933
2934       child_text ($offset, $optional_condition)
2935           Return the text of a child or "undef" if the child does not
2936           exist. Arguments are the same as "child".
2937
2938       last_child    ($optional_condition)
2939           Return the last child of the element, or the last child matching
2940           $optional_condition (ie the last of the element children matching
2941           the condition).
2942
2943       last_child_text   ($optional_condition)
2944           Same as "first_child_text" but for the last child.
2945
2946       sibling  ($offset, $optional_condition)
2947           Return the next or previous $offset-th sibling of the element, or
2948           the $offset-th one matching $optional_condition. If $offset is
2949           negative then a previous sibling is returned, if $offset is
2950           positive then a next sibling is returned. "$offset=0" returns the
2951           element if there is no condition or if the element matches the
2952           condition, "undef" otherwise.
2953
2954       sibling_text ($offset, $optional_condition)
2955           Return the text of a sibling or "undef" if the sibling does not
2956           exist.  Arguments are the same as "sibling".
2957
2958       prev_siblings ($optional_condition)
2959           Return the list of previous siblings (optionally matching
2960           $optional_condition) for the element. The elements are ordered in
2961           document order.
2962
2963       next_siblings ($optional_condition)
2964           Return the list of siblings (optionally matching
2965           $optional_condition) following the element. The elements are
2966           ordered in document order.
2967
2968       pos ($optional_condition)
2969           Return the position of the element in the children list. The first
2970           child has a position of 1 (as in XPath).
2971
2972           If the $optional_condition is given then only siblings that match
2973           the condition are counted. If the element itself does not match the
2974           condition then 0 is returned.
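
           A sketch of offset-based navigation with "child", "sibling" and
           "pos", using a hypothetical list that mixes "item" and "note"
           children:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<list><item>a</item><note>n</note><item>b</item><item>c</item></list>');
my $list = $twig->root;

# child() offsets start at 0; negative offsets count from the end.
my $first_text  = $list->child(0)->text;
my $last_text   = $list->child(-1)->text;
my $second_item = $list->child( 1, 'item' )->text;

# sibling() moves forward (positive) or backward (negative).
my $first     = $list->first_child;
my $next_tag  = $first->sibling(1)->tag;
my $next_item = $first->sibling( 1, 'item' )->text;

# pos() starts at 1; with a condition only matching siblings count.
my $last     = $list->last_child;
my $pos_all  = $last->pos;
my $pos_item = $last->pos('item');

print "$first_text $last_text $second_item $next_tag $next_item $pos_all $pos_item\n";
```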
2975
2976       atts
2977           Return a hash ref containing the element attributes
2978
2979       set_atts      ({ att1=>$att1_val, att2=> $att2_val... })
2980           Set the element attributes with the hash ref supplied as the
2981           argument. The previous attributes are lost (ie the attributes set
2982           by "set_atts" replace all of the attributes of the element).
2983
2984           You can also pass a list instead of a hashref: "$elt->set_atts(
2985           att1 => 'val1',...)"
2986
2987       del_atts
2988           Deletes all the element attributes.
2989
2990       att_nb
2991           Return the number of attributes for the element
2992
2993       has_atts
2994           Return true if the element has attributes (in fact return the
2995           number of attributes, thus being an alias to "att_nb").
2996
2997       has_no_atts
2998           Return true if the element has no attributes, false (0) otherwise
2999
3000       att_names
3001           Return a list of the attribute names for the element
3002
3003       att_xml_string ($att, $options)
3004           Return the attribute value, where '&', '<' and quote (" or the
3005           value of the quote option at twig creation) are XML-escaped.
3006
3007           The options are passed as a hashref: setting "escape_gt" to a true
3008           value will also escape '>' ("$elt->att_xml_string( 'myatt', { escape_gt => 1 })").
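
           The attribute methods above can be exercised on a standalone
           element. This is a sketch only; the attribute names and values
           are invented.

```perl
use strict;
use warnings;
use XML::Twig;

my $elt = XML::Twig::Elt->new('a');

# set_atts replaces every attribute of the element.
$elt->set_atts( { href => 'page.html?x=1&y=2', title => 'a < b' } );

my @names = sort $elt->att_names;
my $count = $elt->att_nb;

# The stored values are raw; att_xml_string returns them escaped.
my $href  = $elt->att_xml_string('href');
my $title = $elt->att_xml_string('title');

print "@names ($count)\n$href\n$title\n";

$elt->del_atts;
print "no attributes left\n" if $elt->has_no_atts;
```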
3009
3010       set_id       ($id)
3011           Set the "id" attribute of the element to the value.  See "elt_id"
3012           to change the id attribute name
3013
3014       id  Gets the id attribute value
3015
3016       del_id       ($id)
3017           Deletes the "id" attribute of the element and remove it from the id
3018           list for the document
3019
3020       class
3021           Return the "class" attribute for the element (methods on the
3022           "class" attribute are quite convenient when dealing with XHTML, or
3023           plain XML that will eventually be displayed using CSS)
3024
3025       set_class ($class)
3026           Set the "class" attribute for the element to $class
3027
3028       add_to_class ($class)
3029           Add $class to the element "class" attribute: the new class is added
3030           only if it is not already present. Note that classes are sorted
3031           alphabetically, so the "class" attribute can be changed even if the
3032           class is already there
3033
3034       att_to_class ($att)
3035           Set the "class" attribute to the value of attribute $att
3036
3037       add_att_to_class ($att)
3038           Add the value of attribute $att to the "class" attribute of the
3039           element
3040
3041       move_att_to_class ($att)
3042           Add the value of attribute $att to the "class" attribute of the
3043           element and delete the attribute
3044
3045       tag_to_class
3046           Set the "class" attribute of the element to the element tag
3047
3048       add_tag_to_class
3049           Add the element tag to its "class" attribute
3050
3051       set_tag_class ($new_tag)
3052           Add the element tag to its "class" attribute and sets the tag to
3053           $new_tag
3054
3055       in_class ($class)
3056           Return true (1) if the element is in the class $class (if $class is
3057           one of the tokens in the element "class" attribute)
3058
3059       tag_to_span
3060           Change the element tag to "span" and set its class to the old tag
3061
3062       tag_to_div
3063           Change the element tag to "div" and set its class to the old tag
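
           A brief sketch of the "class" helpers, using standalone elements
           with made-up tags and class names:

```perl
use strict;
use warnings;
use XML::Twig;

my $para = XML::Twig::Elt->new( para => { class => 'intro' }, 'text' );

# Classes are kept sorted alphabetically in the attribute value.
$para->add_to_class('lead');
my $classes = $para->class;
my $is_lead = $para->in_class('lead');

# tag_to_div renames the element and records the old tag as its class.
my $warning = XML::Twig::Elt->new( warning => 'careful' );
$warning->tag_to_div;

print "$classes\n";
print $warning->sprint, "\n";
```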
3064
3065       DESTROY
3066           Frees the element from memory.
3067
3068       start_tag
3069           Return the string for the start tag for the element, including the
3070           "/>" at the end of an empty element tag
3071
3072       end_tag
3073           Return the string for the end tag of an element.  For an empty
3074           element, this returns the empty string ('').
3075
3076       xml_string @optional_options
3077           Equivalent to "$elt->sprint( 1)", returns the string for the entire
3078           element, excluding the element's tags (but nested element tags are
3079           present)
3080
3081           The '"no_recurse"' option will only return the text of the element,
3082           not of any included sub-elements (same as "xml_text_only").
3083
3084       inner_xml
3085           Another synonym for xml_string
3086
3087       outer_xml
3088           Another synonym for sprint
3089
3090       xml_text
3091           Return the text of the element, encoded (and processed by the
3092           current "output_filter" or "output_encoding" options), without any
3093           tag.
3094
3095       xml_text_only
3096           Same as "xml_text" except that the text returned doesn't include
3097           the text of sub-elements.
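
           The string methods differ mainly in whether tags are included and
           whether the text is XML-escaped. The sketch below compares them
           on one hypothetical paragraph.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<p id="p1">a <b>bold</b> &amp; c</p>');
my $p = $twig->root;

my $start    = $p->start_tag;    # opening tag with attributes
my $end      = $p->end_tag;      # closing tag
my $inner    = $p->xml_string;   # content with nested tags, escaped
my $xml_text = $p->xml_text;     # text only, escaped
my $text     = $p->text;         # text only, not escaped

print join( "\n", $start, $end, $inner, $xml_text, $text ), "\n";
```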
3098
3099       set_pretty_print ($style)
3100           Set the pretty print method, amongst '"none"' (default),
3101           '"nsgmls"', '"nice"', '"indented"', '"record"' and '"record_c"'
3102
3103           pretty_print styles:
3104
3105           none
3106               the default, no "\n" is used
3107
3108           nsgmls
3109               nsgmls style, with "\n" added within tags
3110
3111           nice
3112               adds "\n" wherever possible (NOT SAFE, can lead to invalid XML)
3113
3114           indented
3115               same as "nice" plus indents elements (NOT SAFE, can lead to
3116               invalid XML)
3117
3118           record
3119               table-oriented pretty print, one field per line
3120
3121           record_c
3122               table-oriented pretty print, more compact than "record", one
3123               record per line
3124
3125       set_empty_tag_style ($style)
3126           Set the method to output empty tags, amongst '"normal"' (default),
3127           '"html"', and '"expand"'.
3128
3129           "normal" outputs an empty tag '"<tag/>"', "html" adds a space
3130           '"<tag />"' for elements that can be empty in XHTML and "expand"
3131           outputs '"<tag></tag>"'
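
           A sketch of the output style setters on a small made-up document.
           Note that these settings are module-wide, so they affect every
           later print in the same program.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<doc><br/><p>text</p></doc>');
my $root = $twig->root;

# "expand" writes empty elements as a start tag plus an end tag.
$root->set_empty_tag_style('expand');
my $expanded = $root->sprint;
print "$expanded\n";

# Back to the default style, then switch on indentation for printing.
$root->set_empty_tag_style('normal');
$root->set_pretty_print('indented');
print $root->sprint, "\n";
```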
3132
3133       set_remove_cdata  ($flag)
3134           Set (or unset) the flag that forces the twig to output CDATA
3135           sections as regular (escaped) PCDATA
3136
3137       set_indent ($string)
3138           Set the indentation for the indented pretty print style (default is
3139           2 spaces)
3140
3141       set_quote ($quote)
3142           Set the quotes used for attributes. can be '"double"' (default) or
3143           '"single"'
3144
3145       cmp       ($elt)
3146             Compare the order of the 2 elements in a twig.
3147
3148             $a is the <A>...</A> element, $b is the <B>...</B> element
3149
3150             document                        $a->cmp( $b)
3151             <A> ... </A> ... <B>  ... </B>     -1
3152             <A> ... <B>  ... </B> ... </A>     -1
3153             <B> ... </B> ... <A>  ... </A>      1
3154             <B> ... <A>  ... </A> ... </B>      1
3155              $a == $b                           0
3156              $a and $b not in the same tree   undef
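
             Because "cmp" returns -1, 0 or 1, it can be used directly in a
             sort block to put elements back in document order. A minimal
             sketch, with invented tag names:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<doc><a><b/></a><c/></doc>');

my ( $elt_a, $elt_b, $elt_c ) =
    map { $twig->root->first_descendant($_) } qw( a b c );

my $a_vs_b = $elt_a->cmp($elt_b);   # a starts before b
my $c_vs_a = $elt_c->cmp($elt_a);   # c starts after a
my $a_vs_a = $elt_a->cmp($elt_a);   # same element

# Sort a shuffled list of elements into document order.
my @sorted = sort { $a->cmp($b) } ( $elt_c, $elt_b, $elt_a );
my $order  = join ' ', map { $_->tag } @sorted;

print "$a_vs_b $c_vs_a $a_vs_a\n$order\n";
```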
3157
3158       before       ($elt)
3159           Return 1 if the element starts before $elt, 0 otherwise. If the 2
3160           elements are not in the same twig then return "undef".
3161
3162               if( $a->cmp( $b) == -1) { return 1; } else { return 0; }
3163
3164       after       ($elt)
3165           Return 1 if the element starts after $elt, 0 otherwise. If the 2
3166           elements are not in the same twig then return "undef".
3167
3168               if( $a->cmp( $b) == 1) { return 1; } else { return 0; }
3169
3170       other comparison methods
3171           lt
3172           le
3173           gt
3174           ge
3175       path
3176           Return the element context in a form similar to XPath's short form:
3177           '"/root/tag1/../tag"'
3178
3179       xpath
3180           Return a unique XPath expression that can be used to find the
3181           element again.
3182
3183           It looks like "/doc/sect[3]/title": unique elements do not have an
3184           index, the others do.
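
           The difference between "path" and "xpath" shows up as soon as an
           element has same-named siblings. A small sketch, with a
           hypothetical two-section document:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse(
    '<doc><sect><title>one</title></sect><sect><title>two</title></sect></doc>'
);

# Take the title of the second section.
my $title = ( $twig->root->descendants('title') )[1];

my $path  = $title->path;    # tags only, no index
my $xpath = $title->xpath;   # indexed where needed to be unique

print "$path\n$xpath\n";
```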
3185
3186       private methods
3187           Low-level methods on the twig:
3188
3189           set_parent        ($parent)
3190           set_first_child   ($first_child)
3191           set_last_child    ($last_child)
3192           set_prev_sibling  ($prev_sibling)
3193           set_next_sibling  ($next_sibling)
3194           set_twig_current
3195           del_twig_current
3196           twig_current
3197           flush
3198               This method should NOT be used, always flush the twig, not an
3199               element.
3200
3201           contains_text
3202
3203           Those methods should not be used, unless of course you find some
3204           creative and interesting, not to mention useful, ways to do it.
3205
3206   cond
3207       Most of the navigation functions accept a condition as an optional
3208       argument. The first element (or all elements for "children" or
3209       "ancestors") that passes the condition is returned.
3210
3211       The condition is a single step of an XPath expression using the XPath
3212       subset defined by "get_xpath". In addition, the condition can be:
3215
3216       #ELT
3217           return a "real" element (not a PCDATA, CDATA, comment or pi
3218           element)
3219
3220       #TEXT
3221           return a PCDATA or CDATA element
3222
3223       regular expression
3224           return an element whose tag matches the regexp. The regexp has to
3225           be created with "qr//" (hence this is available only on perl 5.005
3226           and above)
3227
3228       code reference
3229           applies the code, passing the current element as argument, if the
3230           code returns true then the element is returned, if it returns false
3231           then the code is applied to the next candidate.
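
       The four kinds of condition listed above can be sketched as follows.
       The document and the "id" attribute are assumptions made for the
       example.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<doc>intro<h1>t</h1><p id="x">a</p><h2>s</h2></doc>');
my $root = $twig->root;

# #ELT skips text and returns the first "real" element.
my $first_elt = $root->first_child('#ELT')->tag;

# #TEXT returns the first PCDATA or CDATA child.
my $first_text = $root->first_child('#TEXT')->text;

# A qr// regexp is matched against the tag.
my @headings = map { $_->tag } $root->children(qr/^h\d$/);

# A code reference receives each candidate element as its argument.
my $with_id = $root->first_child( sub { $_[0]->att('id') } )->tag;

print "$first_elt $first_text @headings $with_id\n";
```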
3232
3233   XML::Twig::XPath
3234       XML::Twig implements a subset of XPath through the "get_xpath" method.
3235
3236       If you want to use the whole XPath power, then you can use
3237       "XML::Twig::XPath" instead. In this case "XML::Twig" uses "XML::XPath"
3238       to execute XPath queries.  You will of course need "XML::XPath"
3239       installed to be able to use "XML::Twig::XPath".
3240
3241       See XML::XPath for more information.
3242
3243       The methods you can use are:
3244
3245       findnodes              ($path)
3246           return a list of nodes found by $path.
3247
3248       findnodes_as_string    ($path)
3249           return the nodes found reproduced as XML. The result is not
3250           guaranteed to be valid XML though.
3251
3252       findvalue              ($path)
3253           return the concatenation of the text content of the result nodes
3254
3255       In order for "XML::XPath" to be used as the XPath engine the following
3256       methods are included in "XML::Twig":
3257
3258       in XML::Twig
3259
3260       getRootNode
3261       getParentNode
3262       getChildNodes
3263
3264       in XML::Twig::Elt
3265
3266       string_value
3267       toString
3268       getName
3269       getRootNode
3270       getNextSibling
3271       getPreviousSibling
3272       isElementNode
3273       isTextNode
3274       isPI
3275       isPINode
3276       isProcessingInstructionNode
3277       isComment
3278       isCommentNode
3279       getTarget
3280       getChildNodes
3281       getElementById
3282
3283   XML::Twig::XPath::Elt
3284       The methods you can use are the same as on "XML::Twig::XPath" elements:
3285
3286       findnodes              ($path)
3287           return a list of nodes found by $path.
3288
3289       findnodes_as_string    ($path)
3290           return the nodes found reproduced as XML. The result is not
3291           guaranteed to be valid XML though.
3292
3293       findvalue              ($path)
3294           return the concatenation of the text content of the result nodes
3295
3296   XML::Twig::Entity_list
3297       new Create an entity list.
3298
3299       add         ($ent)
3300           Add an entity to an entity list.
3301
3302       add_new_ent ($name, $val, $sysid, $pubid, $ndata, $param)
3303           Create a new entity and add it to the entity list
3304
3305       delete     ($ent or $tag)
3306           Delete an entity (defined by its name or by the Entity object) from
3307           the list.
3308
3309       print      ($optional_filehandle)
3310           Print the entity list.
3311
3312       list
3313           Return the list as an array
3314
3315   XML::Twig::Entity
3316       new        ($name, $val, $sysid, $pubid, $ndata, $param)
3317           Same arguments as the Entity handler for XML::Parser.
3318
3319       print       ($optional_filehandle)
3320           Print an entity declaration.
3321
3322       name
3323           Return the name of the entity
3324
3325       val Return the value of the entity
3326
3327       sysid
3328           Return the system id for the entity (for NDATA entities)
3329
3330       pubid
3331           Return the public id for the entity (for NDATA entities)
3332
3333       ndata
3334           Return true if the entity is an NDATA entity
3335
3336       param
3337           Return true if the entity is a parameter entity
3338
3339       text
3340           Return the entity declaration text.
3341

EXAMPLES

3343       Additional examples (and a complete tutorial) can be found on the
3344       XML::Twig Page <http://www.xmltwig.com/xmltwig/>
3345
3346       To figure out what flush does, call the following script with an XML
3347       file and an element name as arguments:
3348
3349         use XML::Twig;
3350
3351         my ($file, $elt)= @ARGV;
3352         my $t= XML::Twig->new( twig_handlers =>
3353             { $elt => sub {$_[0]->flush; print "\n[flushed here]\n";} });
3354         $t->parsefile( $file, ErrorContext => 2);
3355         $t->flush;
3356         print "\n";
3357

NOTES

3359   Subclassing XML::Twig
3360       Useful methods:
3361
3362       elt_class
3363           In order to subclass "XML::Twig" you will probably need to subclass
3364           also "XML::Twig::Elt". Use the "elt_class" option when you create
3365           the "XML::Twig" object to get the elements created in a different
3366           class (which should be a subclass of "XML::Twig::Elt").
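
           A minimal sketch of the "elt_class" option. The "My::Elt" package
           and its "shout" method are hypothetical names used only for
           illustration.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical element subclass adding one convenience method.
package My::Elt;
our @ISA = ('XML::Twig::Elt');

sub shout {
    my $elt = shift;
    return uc $elt->text;
}

package main;

# Elements of this twig are created in My::Elt instead of XML::Twig::Elt.
my $twig = XML::Twig->new( elt_class => 'My::Elt' );
$twig->parse('<doc><p>hello</p></doc>');

my $p = $twig->root->first_child('p');
print ref($p), ": ", $p->shout, "\n";
```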
3367
3368       add_options
3369           If you inherit the "XML::Twig" "new" method but want to add more
3370           options to it, you can use this method to prevent XML::Twig from
3371           issuing warnings for those additional options.
3372
3373   DTD Handling
3374       There are 3 possibilities here.  They are:
3375
3376       No DTD
3377           No doctype, no DTD information, no entity information, the world is
3378           simple...
3379
3380       Internal DTD
3381           The XML document includes an internal DTD, and maybe entity
3382           declarations.
3383
3384           If you use the load_DTD option when creating the twig the DTD
3385           information and the entity declarations can be accessed.
3386
3387           The DTD and the entity declarations will be "flush"'ed (or
3388           "print"'ed) either as is (if they have not been modified) or as
3389           reconstructed (poorly, comments are lost, order is not kept, due to
3390           its content this DTD should not be viewed by anyone) if they have
3391           been modified. You can also modify them directly by changing the
3392           "$twig->{twig_doctype}->{internal}" field (straight from
3393           XML::Parser, see the "Doctype" handler doc)
3394
3395       External DTD
3396           The XML document includes a reference to an external DTD, and maybe
3397           entity declarations.
3398
3399           If you use the "load_DTD" option when creating the twig the DTD
3400           information and the entity declarations can be accessed. The entity
3401           declarations will be "flush"'ed (or "print"'ed) either as is (if
3402           they have not been modified) or as reconstructed (badly, comments
3403           are lost, order is not kept).
3404
3405           You can change the doctype through the "$twig->set_doctype" method
3406           and print the dtd through the "$twig->dtd_text" or
3407           "$twig->dtd_print" methods.
3409
3410           If you need to modify the entity list this is probably the easiest
3411           way to do it.
3412
3413   Flush
3414       If you set handlers and use "flush", do not forget to flush the twig
3415       one last time AFTER the parsing, or you might be missing the end of the
3416       document.
3417
3418       Remember that element handlers are called when the element is CLOSED,
3419       so if you have handlers for nested elements the inner handlers will be
3420       called first. This makes it trickier than it would seem to number
3421       nested clauses, for example.
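
       The calling order described above can be observed with a single
       handler on a nested element. This is only a sketch; the "clause" tag
       and its "id" attribute are invented.

```perl
use strict;
use warnings;
use XML::Twig;

my @closed;

# The handler runs when each clause is closed, so the inner one comes first.
my $twig = XML::Twig->new(
    twig_handlers => {
        clause => sub { push @closed, $_->att('id') },
    },
);
$twig->parse('<doc><clause id="outer"><clause id="inner"/></clause></doc>');

print "handlers called for: @closed\n";
```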
3422

BUGS

3424       entity handling
3425           Due to XML::Parser behaviour, non-base entities in attribute values
3426           disappear: "att="val&ent;"" will be turned into "att => val",
3427           unless you use the "keep_encoding" argument to "XML::Twig->new"
3428
3429       DTD handling
3430           The DTD handling methods are quite buggy. No one uses them and it
3431           seems very difficult to get them to work in all cases, including
3432           with several slightly incompatible versions of XML::Parser and of
3433           libexpat.
3434
3435           Basically you can read the DTD, output it back properly, and update
3436           entities, but not much more.
3437
3438           So use XML::Twig with standalone documents, or with documents
3439           referring to an external DTD, but don't expect it to properly parse
3440           and even output back the DTD.
3441
3442       memory leak
3443           If you use a lot of twigs you might find that you leak quite a lot
3444           of memory (about 2K per twig). You can use the "dispose" method
3445           to free that memory after you are done.
3446
3447           If you create elements the same thing might happen, use the
3448           "delete" method to get rid of them.
3449
3450           Alternatively installing the "Scalar::Util" (or "WeakRef") module
3451           on a version of Perl that supports it (>5.6.0) will get rid of the
3452           memory leaks automagically.
3453
3454       ID list
3455           The ID list is NOT updated when elements are cut or deleted.
3456
3457       change_gi
3458           This method will not function properly if you do:
3459
3460                $twig->change_gi( $old1, $new);
3461                $twig->change_gi( $old2, $new);
3462                $twig->change_gi( $new, $even_newer);
3463
3464       sanity check on XML::Parser method calls
3465           XML::Twig should really prevent calls to some XML::Parser methods,
3466           especially the "setHandlers" method.
3467
3468       pretty printing
3469           Pretty printing (at least using the '"indented"' style) is hard to
3470           get right!  Only elements that belong to the document will be
3471           properly indented. Printing elements that do not belong to the twig
3472           makes it impossible for XML::Twig to figure out their depth, and
3473           thus their indentation level.
3474
3475           Also there is an unavoidable bug when using "flush" and pretty
3476           printing for elements with mixed content that start with an
3477           embedded element:
3478
3479             <elt><b>b</b>toto<b>bold</b></elt>
3480
3481             will be output as
3482
3483             <elt>
3484               <b>b</b>toto<b>bold</b></elt>
3485
3486           if you flush the twig when you find the "<b>" element
3487

Globals

3489       These are the things that can mess up calling code, especially if
3490       threaded.  They might also cause problems under mod_perl.
3491
3492       Exported constants
3493           Whether you want them or not you get them! These are subroutines to
3494           use as constants when creating or testing elements:
3495
3496             PCDATA  return '#PCDATA'
3497             CDATA   return '#CDATA'
3498             PI      return '#PI', I had the choice between PROC and PI :--(
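
               As a rough illustration of testing elements against these
               values, the sketch below compares each child's tag with the
               literal string '#PCDATA' (the value PCDATA returns). The
               sample document is hypothetical, and the literal is used
               rather than the exported subroutine so the example does not
               depend on what is imported into the calling package:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<p>some <b>bold</b> text</p>');

# Text children carry the special '#PCDATA' tag; real elements carry
# their own tag name.
my @kinds = map { $_->tag eq '#PCDATA' ? 'text' : $_->tag }
            $twig->root->children;

print join(' ', @kinds), "\n";
```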
3499
3500       Module scoped values: constants
3501           these should cause no trouble:
3502
3503             %base_ent= ( '>' => '&gt;',
3504                          '<' => '&lt;',
3505                          '&' => '&amp;',
3506                          "'" => '&apos;',
3507                          '"' => '&quot;',
3508                        );
3509             CDATA_START   = "<![CDATA[";
3510             CDATA_END     = "]]>";
3511             PI_START      = "<?";
3512             PI_END        = "?>";
3513             COMMENT_START = "<!--";
3514             COMMENT_END   = "-->";
3515
3516           pretty print styles
3517
3518             ( $NSGMLS, $NICE, $INDENTED, $INDENTED_C, $WRAPPED, $RECORD1, $RECORD2)= (1..7);
3519
3520           empty tag output style
3521
3522             ( $HTML, $EXPAND)= (1..2);
3523
3524       Module scoped values: might be changed
3525           Most of these deal with pretty printing, so the worst that can
3526           happen is probably that XML output does not look right, but is
3527           still valid and processed identically by XML processors.
3528
3529           $empty_tag_style can mess up HTML browsers though and changing $ID
3530           would most likely create problems.
3531
3532             $pretty=0;           # pretty print style
3533             $quote='"';          # quote for attributes
3534             $INDENT= '  ';       # indent for indented pretty print
3535             $empty_tag_style= 0; # how to display empty tags
3536             $ID                  # attribute used as an id ('id' by default)
3537
3538       Module scoped values: definitely changed
3539           These 2 variables are used to replace tags by an index, thus saving
3540           some space when creating a twig. If they really cause you too much
3541           trouble, let me know, it is probably possible to create either a
3542           switch or at least a version of XML::Twig that does not perform
3543           this optimization.
3544
3545             %gi2index;     # tag => index
3546             @index2gi;     # list of tags
3547
3548       If you need to manipulate all those values, you can use the following
3549       methods on the XML::Twig object:
3550
3551       global_state
3552           Return a hashref with all the global variables used by XML::Twig
3553
3554           The hash has the following fields:  "pretty", "quote", "indent",
3555           "empty_tag_style", "keep_encoding", "expand_external_entities",
3556           "output_filter", "output_text_filter", "keep_atts_order"
3557
3558       set_global_state ($state)
3559           Set the global state, $state is a hashref
3560
3561       save_global_state
3562           Save the current global state
3563
3564       restore_global_state
3565           Restore the previously saved (using "save_global_state") state
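
               A small sketch of how the save and restore methods might be
               combined to keep a temporary pretty print change from leaking
               into other code. It assumes the "pretty" field of the state
               hash holds a number, as the module scoped values listed
               above suggest:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;

# Remember the pretty print setting currently in effect.
my $before = $twig->global_state->{pretty};
$twig->save_global_state;

# This changes a module-wide value, not just this twig.
$twig->set_pretty_print('indented');
# ... produce some indented output here ...

# Put the module-wide values back the way they were.
$twig->restore_global_state;
my $after = $twig->global_state->{pretty};

print "pretty print style restored\n" if $after == $before;
```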
3566

TODO

3568       SAX handlers
3569           Allowing XML::Twig to work on top of any SAX parser
3570
3571       multiple twigs are not well supported
3572           A number of twig features are just global at the moment. These
3573           include the ID list and the "tag pool" (if you use "change_gi" then
3574           you change the tag for ALL twigs).
3575
3576           A future version will try to support this while trying not to be too
3577           hard on performance (at least when a single twig is used!).
3578

AUTHOR

3580       Michel Rodriguez <mirod@xmltwig.com>
3581

LICENSE

3583       This library is free software; you can redistribute it and/or modify it
3584       under the same terms as Perl itself.
3585
3586       Bug reports should be sent using RT:
3587       <http://rt.cpan.org/NoAuth/Bugs.html?Dist=XML-Twig>
3589
3590       Comments can be sent to mirod@xmltwig.com
3591
3592       The XML::Twig page is at <http://www.xmltwig.com/xmltwig/>. It includes
3593       the development version of the module, a slightly better version of the
3594       documentation, examples, and a tutorial, "Processing XML efficiently
3595       with Perl and XML::Twig":
3596       <http://www.xmltwig.com/xmltwig/tutorial/index.html>
3597

SEE ALSO

3599       Complete docs, including a tutorial, examples, an easier to use HTML
3600       version of the docs, a quick reference card and a FAQ are available at
3601       <http://www.xmltwig.com/xmltwig/>
3602
3603       git repository at <http://github.com/mirod/xmltwig>
3604
3605       XML::Parser, XML::Parser::Expat, XML::XPath, Encode, Text::Iconv,
3606       Scalar::Util
3607
3608   Alternative Modules
3609       XML::Twig is not the only XML processing module available on CPAN (far
3610       from it!).
3611
3612       The main alternative I would recommend is XML::LibXML.
3613
3614       Here is a quick comparison of the 2 modules:
3615
3616       XML::LibXML, actually "libxml2" on which it is based, sticks to the
3617       standards, and implements a good number of them in a rather strict way:
3618       XML, XPath, DOM, RelaxNG, I must be forgetting a couple (XInclude?). It
3619       is fast and rather frugal memory-wise.
3620
3621       XML::Twig is older: when I started writing it XML::Parser/expat was the
3622       only game in town. It implements XML and that's about it (plus a subset
3623       of XPath, and you can use XML::Twig::XPath if you have XML::XPathEngine
3624       installed for full support). It is slower and requires more memory for
3625       a full tree than XML::LibXML. On the plus side (yes, there is a plus
3626       side!) it lets you process a big document in chunks, and thus lets you
3627       tackle documents that couldn't be loaded in memory by XML::LibXML, and
3628       it offers a lot (and I mean a LOT!) of higher-level methods, for
3629       everything, from adding structure to "low-level" XML, to shortcuts for
3630       XHTML conversions and more. It also DWIMs quite a bit, getting comments
3631       and non-significant whitespaces out of the way but preserving them in
3632       the output for example. As it does not stick to the DOM, it also
3633       usually leads to shorter code than in XML::LibXML.
3634
3635       Beyond the pure features of the 2 modules, XML::LibXML seems to be
3636       preferred by "XML-purists", while XML::Twig seems to be more used by
3637       Perl Hackers who have to deal with XML. As you have noted, XML::Twig
3638       also comes with quite a lot of docs, but I am sure if you ask for help
3639       about XML::LibXML here or on Perlmonks you will get answers.
3640
3641       Note that it is actually quite hard for me to compare the 2 modules: on
3642       one hand I know XML::Twig inside-out and I can get it to do pretty much
3643       anything I need to (or I improve it ;--), while I have a very basic
3644       knowledge of XML::LibXML.  So feature-wise, I'd rather use XML::Twig
3645       ;--). On the other hand, I am painfully aware of some of the
3646       deficiencies, potential bugs and plain ugly code that lurk in
3647       XML::Twig, even though you are unlikely to be affected by them (unless
3648       for example you need to change the DTD of a document programmatically),
3649       while I haven't looked much into XML::LibXML so it still looks shiny
3650       and clean to me.
3651
3652       That said, if you need to process a document that is too big to fit in
3653       memory and XML::Twig is too slow for you, my reluctant advice would be
3654       to use "bare" XML::Parser.  It won't be as easy to use as XML::Twig:
3655       basically with XML::Twig you trade some speed (depending on what you do
3656       from a factor 3 to... none) for ease-of-use, but it will be easier IMHO
3657       than using SAX (albeit not standard), and at this point a LOT faster
3658       (see the last test in
3659       <http://www.xmltwig.com/article/simple_benchmark/>).
3660
3661
3662
3663perl v5.12.0                      2010-05-07                           Twig(3)