Twig(3) User Contributed Perl Documentation Twig(3)

6 XML::Twig - A perl module for processing huge XML documents in tree
7 mode.
8
10 Note that this documentation is intended as a reference to the module.
11
12 Complete docs, including a tutorial, examples, an easier to use HTML
13 version, a quick reference card and a FAQ are available at
14 <http://www.xmltwig.com/xmltwig>
15
16 Small documents (loaded in memory as a tree):
17
18 my $twig=XML::Twig->new(); # create the twig
19 $twig->parsefile( 'doc.xml'); # build it
20 my_process( $twig); # use twig methods to process it
21 $twig->print; # output the twig
22
23 Huge documents (processed in combined stream/tree mode):
24
25 # at most one div will be loaded in memory
26 my $twig=XML::Twig->new(
27 twig_handlers =>
28 { title => sub { $_->set_tag( 'h2') }, # change title tags to h2
29 para => sub { $_->set_tag( 'p') }, # change para to p
30 hidden => sub { $_->delete; }, # remove hidden elements
31 list => \&my_list_process, # process list elements
32 div => sub { $_[0]->flush; }, # output and free memory
33 },
34 pretty_print => 'indented', # output will be nicely formatted
35 empty_tags => 'html', # outputs <empty_tag />
36 );
$twig->parsefile( 'doc.xml');            # parse the document, calling the handlers
$twig->flush;                            # flush the end of the document
38
39 See XML::Twig 101 for other ways to use the module, as a filter for
40 example.
41
This module provides a way to process XML documents. It is built on top
of "XML::Parser".
45
46 The module offers a tree interface to the document, while allowing you
47 to output the parts of it that have been completely processed.
48
It allows minimal resource (CPU and memory) usage by building the tree
only for the parts of the document that need actual processing,
through the use of the "twig_roots" and "twig_print_outside_roots"
options. The "finish" and "finish_print" methods also help to
improve performance.

XML::Twig tries to make simple things easy, so it does its best to
take care of a lot of the (usually) annoying (but sometimes necessary)
features that come with XML and XML::Parser.
58
60 XML::Twig can be used either on "small" XML documents (that fit in
61 memory) or on huge ones, by processing parts of the document and
62 outputting or discarding them once they are processed.
63
64 Loading an XML document and processing it
65 my $t= XML::Twig->new();
66 $t->parse( '<d><title>title</title><para>p 1</para><para>p 2</para></d>');
67 my $root= $t->root;
68 $root->set_tag( 'html'); # change doc to html
my $title= $root->first_child( 'title'); # get the title
70 $title->set_tag( 'h1'); # turn it into h1
71 my @para= $root->children( 'para'); # get the para children
72 foreach my $para (@para)
73 { $para->set_tag( 'p'); } # turn them into p
74 $t->print; # output the document
75
76 Other useful methods include:
77
att: "$elt->att( 'foo')" returns the "foo" attribute of an element,

set_att: "$elt->set_att( foo => "bar")" sets the "foo" attribute to
the "bar" value,

next_sibling: "$elt->next_sibling" returns the next sibling in the
document (in the example "$title->next_sibling" is the first "para";
you can also, and actually should, use "$elt->next_sibling( 'para')" to
get it).

The document can also be transformed through the use of the cut, copy,
paste and move methods: "$title->cut; $title->paste( after => $p);" for
example.
92
93 And much, much more, see XML::Twig::Elt.
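
As a minimal sketch of the methods listed above, applied to the small
document from the previous example (the variable names are only
illustrative, and the output is collected with "sprint" rather than
printed directly):

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new();
$t->parse( '<d><title>title</title><para>p 1</para><para>p 2</para></d>');

my $root  = $t->root;
my $title = $root->first_child( 'title');

$title->set_att( foo => 'bar');            # add an attribute
my $foo = $title->att( 'foo');             # read it back

my $p = $title->next_sibling( 'para');     # the first para after the title
$title->cut;                               # detach the title from the tree...
$title->paste( after => $p);               # ...and re-attach it after that para

my $result = $t->sprint;                   # the document as a string
print "$result\n";
```

The title (with its new attribute) should now sit between the two
"para" elements.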
94
95 Processing an XML document chunk by chunk
One of the strengths of XML::Twig is that it lets you work with files
that do not fit in memory (BTW storing an XML document in memory as a
tree is quite memory-expensive, the expansion factor often being around
10).
100
101 To do this you can define handlers, that will be called once a specific
102 element has been completely parsed. In these handlers you can access
103 the element and process it as you see fit, using the navigation and the
104 cut-n-paste methods, plus lots of convenient ones like "prefix ". Once
105 the element is completely processed you can then "flush " it, which
106 will output it and free the memory. You can also "purge " it if you
107 don't need to output it (if you are just extracting some data from the
108 document for example). The handler will be called again once the next
109 relevant element has been parsed.
110
111 my $t= XML::Twig->new( twig_handlers =>
{ section => \&section,
113 para => sub { $_->set_tag( 'p'); }
114 },
115 );
116 $t->parsefile( 'doc.xml');
117 $t->flush; # don't forget to flush one last time in the end or anything
118 # after the last </section> tag will not be output
119
120 # the handler is called once a section is completely parsed, ie when
121 # the end tag for section is found, it receives the twig itself and
122 # the element (including all its sub-elements) as arguments
123 sub section
124 { my( $t, $section)= @_; # arguments for all twig_handlers
  $section->set_tag( 'div');                   # change the tag name, my favourite method...
  # let's use the attribute nb as a prefix to the title
  my $title= $section->first_child( 'title');  # find the title
  my $nb= $title->att( 'nb');                  # get the attribute
129 $title->prefix( "$nb - "); # easy isn't it?
130 $section->flush; # outputs the section and frees memory
131 }
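
When nothing needs to be output, "purge" can replace "flush". The
following is a small sketch, assuming a document made of "section"
elements that each hold a "title"; the document string and the @titles
array are made up for the example:

```perl
use strict;
use warnings;
use XML::Twig;

my @titles;    # data extracted from the document

my $t = XML::Twig->new(
    twig_handlers => {
        section => sub {
            my( $twig, $section) = @_;
            my $title = $section->first_child( 'title');
            push @titles, $title->text if $title;
            $twig->purge;      # free the memory, nothing is output
            return 1;
        },
    },
);

$t->parse( '<doc><section><title>One</title><para>a</para></section>'
         . '<section><title>Two</title><para>b</para></section></doc>');

print "$_\n" for @titles;
```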
132
133 There is of course more to it: you can trigger handlers on more
134 elaborate conditions than just the name of the element, "section/title"
135 for example.
136
137 my $t= XML::Twig->new( twig_handlers =>
138 { 'section/title' => sub { $_->print } }
139 )
140 ->parsefile( 'doc.xml');
141
142 Here "sub { $_->print }" simply prints the current element ($_ is
143 aliased to the element in the handler).
144
145 You can also trigger a handler on a test on an attribute:
146
my $t= XML::Twig->new( twig_handlers =>
                         { 'section[@level="1"]' => sub { $_->print } }
                     )
               ->parsefile( 'doc.xml');
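
A self-contained variant of the attribute test, as a sketch: the inline
document and the @top array are invented for the example, and the
handler collects the section titles instead of printing them.

```perl
use strict;
use warnings;
use XML::Twig;

my @top;   # titles of the level 1 sections

my $t = XML::Twig->new(
    twig_handlers => {
        # only triggered for sections whose level attribute is "1"
        'section[@level="1"]' => sub {
            push @top, $_->first_child( 'title')->text;
            return 1;
        },
    },
);

$t->parse( '<doc><section level="1"><title>Intro</title></section>'
         . '<section level="2"><title>Details</title></section></doc>');

print "level 1 sections: @top\n";
```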
151
You can also use "start_tag_handlers" to process an element as soon as
the start tag is found. Besides "prefix" you can also use "suffix".
154
155 Processing just parts of an XML document
The twig_roots mode builds only the required sub-trees from the
document. Anything outside of the twig roots will just be ignored:
158
159 my $t= XML::Twig->new(
160 # the twig will include just the root and selected titles
161 twig_roots => { 'section/title' => \&print_n_purge,
162 'annex/title' => \&print_n_purge
163 }
164 );
165 $t->parsefile( 'doc.xml');
166
167 sub print_n_purge
168 { my( $t, $elt)= @_;
169 print $elt->text; # print the text (including sub-element texts)
170 $t->purge; # frees the memory
171 }
172
You can use that mode when you want to process parts of a document but
are not interested in the rest, and you don't want to pay the price,
either in time or memory, to build the tree for it.
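
The same idea on an inline document, as a sketch (the document and the
@titles array are assumptions made for the example). Only titles that
are children of a "section" become twig roots, so the document title
and the annex title are never built:

```perl
use strict;
use warnings;
use XML::Twig;

my @titles;

my $t = XML::Twig->new(
    twig_roots => {
        'section/title' => sub {
            my( $twig, $elt) = @_;
            push @titles, $elt->text;   # keep the text
            $twig->purge;               # and free the memory
        },
    },
);

$t->parse( '<doc><title>Doc</title>'
         . '<section><title>S1</title><para>x</para></section>'
         . '<annex><title>A1</title></annex></doc>');

print "section titles: @titles\n";
```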
176
177 Building an XML filter
178 You can combine the "twig_roots" and the "twig_print_outside_roots"
179 options to build filters, which let you modify selected elements and
180 will output the rest of the document as is.
181
182 This would convert prices in $ to prices in Euro in a document:
183
184 my $t= XML::Twig->new(
185 twig_roots => { 'price' => \&convert, }, # process prices
186 twig_print_outside_roots => 1, # print the rest
187 );
188 $t->parsefile( 'doc.xml');
189
190 sub convert
191 { my( $t, $price)= @_;
my $currency= $price->att( 'currency');          # get the currency
if( $currency eq 'USD')
  { my $usd_price= $price->text;                 # get the price
195 # %rate is just a conversion table
196 my $euro_price= $usd_price * $rate{usd2euro};
197 $price->set_text( $euro_price); # set the new price
198 $price->set_att( currency => 'EUR'); # don't forget this!
199 }
200 $price->print; # output the price
201 }
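
A runnable sketch of such a filter follows. It assumes a made-up
conversion rate and a tiny inline catalog, and it sends the output to an
in-memory file handle (passed as the value of
"twig_print_outside_roots") so that the result can be inspected
afterwards; a real filter would normally just print to STDOUT.

```perl
use strict;
use warnings;
use XML::Twig;

my %rate = ( usd2euro => 0.5);   # invented rate, for illustration only
my $xml  = '<catalog><item><name>Book</name>'
         . '<price currency="USD">10</price></item></catalog>';

my $filtered = '';
open( my $out, '>', \$filtered) or die "cannot open in-memory file: $!";

my $t = XML::Twig->new(
    twig_roots => {
        price => sub {
            my( $twig, $price) = @_;
            if( ($price->att( 'currency') || '') eq 'USD') {
                $price->set_text( $price->text * $rate{usd2euro});
                $price->set_att( currency => 'EUR');
            }
            $price->print( $out);          # output the (converted) price
        },
    },
    twig_print_outside_roots => $out,      # the rest is copied to $out
);
$t->parse( $xml);
close $out;

print "$filtered\n";
```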
202
203 XML::Twig and various versions of Perl, XML::Parser and expat:
Before being uploaded to CPAN, XML::Twig 3.22 was tested under the
following environments:

linux-x86
    perl 5.6.2, expat 1.95.8, XML::Parser 2.34
    perl 5.8.0, expat 1.95.8, XML::Parser 2.34
    perl 5.8.7, expat 1.95.8, XML::Parser 2.34
210
211 Solaris
212 perl 5.6.1, expat 1.95.2, XML::Parser 2.31
213
214 XML::Twig is a lot more sensitive to variations in versions of perl,
215 XML::Parser and expat than to the OS, so this should cover some
216 reasonable configurations.
217
218 The "recommended configuration" is perl 5.8.3+ (for good Unicode
219 support), XML::Parser 2.31+ and expat 1.95.5+
220
See <http://testers.cpan.org/search?request=dist&dist=XML-Twig> for the
CPAN testers reports on XML::Twig, which list all tested
configurations.
225
226 An Atom feed of the CPAN Testers results is available at
227 <http://xmltwig.com/rss/twig_testers.rss>
228
229 Finally:
230
231 XML::Twig does NOT work with expat 1.95.4
232 XML::Twig only works with XML::Parser 2.27 in perl 5.6.*
233 Note that I can't compile XML::Parser 2.27 anymore, so I can't
234 guarantee that it still works
235
236 XML::Parser 2.28 does not really work
237
238 When in doubt, upgrade expat, XML::Parser and Scalar::Util
239
240 Finally, for some optional features, XML::Twig depends on some
241 additional modules. The complete list, which depends somewhat on the
242 version of Perl that you are running, is given by running
243 "t/zz_dump_config.t"
244
246 Whitespaces
Whitespaces that look non-significant are discarded. This behaviour
can be controlled using the "keep_spaces", "keep_spaces_in" and
"discard_spaces_in" options.
250
251 Encoding
252 You can specify that you want the output in the same encoding as
253 the input (provided you have valid XML, which means you have to
254 specify the encoding either in the document or when you create the
255 Twig object) using the "keep_encoding " option
256
257 You can also use "output_encoding" to convert the internal UTF-8
258 format to the required encoding.
259
260 Comments and Processing Instructions (PI)
Comments and PI's can be hidden from the processing, but still
appear in the output (they are carried by the "real" element closest
to them).
264
265 Pretty Printing
266 XML::Twig can output the document pretty printed so it is easier to
267 read for us humans.
268
269 Surviving an untimely death
270 XML parsers are supposed to react violently when fed improper XML.
271 XML::Parser just dies.
272
273 XML::Twig provides the "safe_parse " and the "safe_parsefile "
274 methods which wrap the parse in an eval and return either the
275 parsed twig or 0 in case of failure.
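
A minimal sketch of "safe_parse", using two inline strings invented for
the example (one well-formed, one not):

```perl
use strict;
use warnings;
use XML::Twig;

# well-formed: safe_parse returns the twig
my $good = XML::Twig->new->safe_parse( '<doc><a/></doc>');

# not well-formed (<a> is never closed): safe_parse returns a false value
my $bad  = XML::Twig->new->safe_parse( '<doc><a></doc>');

print "parsed, root is ", $good->root->tag, "\n" if $good;
print "second document rejected\n"               unless $bad;
```

After a failed parse the error message from the parser is available in
$@, as with any eval.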
276
277 Private attributes
278 Attributes with a name starting with # (illegal in XML) will not be
279 output, so you can safely use them to store temporary values during
280 processing. Note that you can store anything in a private
281 attribute, not just text, it's just a regular Perl variable, so a
282 reference to an object or a huge data structure is perfectly fine.
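
As a sketch, a private attribute holding a hash reference (the attribute
name "#seen" and its content are made up for the example) is usable
during processing but absent from the output:

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new();
$t->parse( '<d><p>x</p></d>');

my $p = $t->root->first_child( 'p');
$p->set_att( '#seen' => { count => 3 });    # any Perl value can be stored

my $count  = $p->att( '#seen')->{count};    # still available during processing
my $output = $t->sprint;                    # but not part of the output

print "count: $count\n";
print "$output\n";
```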
283
285 XML::Twig uses a very limited number of classes. The ones you are most
286 likely to use are "XML::Twig" of course, which represents a complete
287 XML document, including the document itself (the root of the document
288 itself is "root"), its handlers, its input or output filters... The
289 other main class is "XML::Twig::Elt", which models an XML element.
Element here has a very wide definition: it can be a regular element,
but also text, with an element "tag" of "#PCDATA" (or "#CDATA"), an
entity (tag is "#ENT"), a Processing Instruction ("#PI") or a comment
("#COMMENT").
294
295 Those are the 2 commonly used classes.
296
You might want to look at the "elt_class" option if you want to subclass
"XML::Twig::Elt".

Attributes are just attached to their parent element, they are not
objects per se. (Please use the provided methods "att" and "set_att" to
access them: if you access them as a hash, your code becomes
implementation dependent and might break in the future.)
304
305 Other classes that are seldom used are "XML::Twig::Entity_list" and
306 "XML::Twig::Entity".
307
308 If you use "XML::Twig::XPath" instead of "XML::Twig", elements are then
309 created as "XML::Twig::XPath::Elt"
310
312 XML::Twig
313 A twig is a subclass of XML::Parser, so all XML::Parser methods can be
314 called on a twig object, including parse and parsefile. "setHandlers"
315 on the other hand cannot be used, see "BUGS "
316
317 new This is a class method, the constructor for XML::Twig. Options are
318 passed as keyword value pairs. Recognized options are the same as
319 XML::Parser, plus some XML::Twig specifics.
320
321 New Options:
322
323 twig_handlers
This argument consists of a hash "{ expression => \&handler}"
where expression is an XPath-like expression (+ some others).
326
327 XPath expressions are limited to using the child and descendant
328 axis (indeed you can't specify an axis), and predicates cannot
329 be nested. You can use the "string", or "string(<tag>)"
330 function (except in "twig_roots" triggers).
331
332 Additionally you can use regexps (/ delimited) to match
333 attribute and string values.
334
335 Examples:
336
337 foo
338 foo/bar
339 foo//bar
340 /foo/bar
341 /foo//bar
342 /foo/bar[@att1 = "val1" and @att2 = "val2"]/baz[@a >= 1]
343 foo[string()=~ /^duh!+/]
344 /foo[string(bar)=~ /\d+/]/baz[@att != 3]
345
346 #CDATA can be used to call a handler for a CDATA. #COMMENT can
347 be used to call a handler for comments
348
349 Some additional (non-XPath) expressions are also provided for
350 convenience:
351
352 processing instructions
353 '?' or '#PI' triggers the handler for any processing
354 instruction, and '?<target>' or '#PI <target>' triggers a
355 handler for processing instruction with the given target(
356 ex: '#PI xml-stylesheet').
357
358 level(<level>)
359 Triggers the handler on any element at that level in the
360 tree (root is level 1)
361
362 _all_
363 Triggers the handler for all elements in the tree
364
365 _default_
366 Triggers the handler for each element that does NOT have
367 any other handler.
368
369 Expressions are evaluated against the input document. Which
370 means that even if you have changed the tag of an element
371 (changing the tag of a parent element from a handler for
372 example) the change will not impact the expression evaluation.
373 There is an exception to this: "private" attributes (which name
374 start with a '#', and can only be created during the parsing,
375 as they are not valid XML) are checked against the current
376 twig.
377
Handlers are triggered in a fixed order, sorted by their type
(XPath expressions first, then regexps, then level), then by
whether they specify a full path (starting at the root element)
or not, then by the number of steps in the expression, then by the
number of predicates, then by the number of tests in predicates.
Handlers where the last step does not specify a tag
("foo/bar/*") are triggered after other XPath handlers.
Finally "_all_" handlers are triggered last.
386
387 Important: once a handler has been triggered if it returns 0
388 then no other handler is called, except a "_all_" handler which
389 will be called anyway.
390
If a handler returns a true value and other handlers apply,
then the next applicable handler will be called. Repeat, rinse,
lather... The exception to that rule is when the
"do_not_chain_handlers" option is set, in which case only the
first handler will be called.
396
Note that it might be a good idea to explicitly return a short
true value (like 1) from handlers: this ensures that other
applicable handlers are called even if the last statement of
the handler happens to evaluate to false. It might also
speed up the code, by avoiding copying the result of the last
statement and passing it to the code managing
handlers. It can really pay to have 1 instead of a long string
returned.
405
406 When an element is CLOSED the corresponding handler is called,
407 with 2 arguments: the twig and the "Element ". The twig
408 includes the document tree that has been built so far, the
409 element is the complete sub-tree for the element. This means
410 that handlers for inner elements are called before handlers for
411 outer elements.
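
The calling order can be observed with a small sketch like the one
below, which records each call in an array (the @calls array and the
inline document are assumptions made for the example). The "para"
handler returns a true value, so the "_all_" handler is then called for
the same element, and the inner "para" is handled before the outer
"doc":

```perl
use strict;
use warnings;
use XML::Twig;

my @calls;   # one entry per handler call, in calling order

my $t = XML::Twig->new(
    twig_handlers => {
        para  => sub { push @calls, 'para:' . $_->tag; return 1; },
        _all_ => sub { push @calls, 'all:'  . $_->tag; return 1; },
    },
);

$t->parse( '<doc><para>text</para></doc>');

print join( ', ', @calls), "\n";
```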
412
413 $_ is also set to the element, so it is easy to write inline
414 handlers like
415
416 para => sub { $_->set_tag( 'p'); }
417
418 Text is stored in elements whose tag is #PCDATA (due to mixed
419 content, text and sub-element in an element there is no way to
420 store the text as just an attribute of the enclosing element).
421
422 Warning: if you have used purge or flush on the twig the
423 element might not be complete, some of its children might have
424 been entirely flushed or purged, and the start tag might even
425 have been printed (by "flush") already, so changing its tag
426 might not give the expected result.
427
428 twig_roots
This argument lets you build the tree only for those elements
you are interested in.

Example:

    my $t= XML::Twig->new( twig_roots => { title => 1, subtitle => 1});
    $t->parsefile( $file);

builds a twig containing a document that includes only the "title" and
"subtitle" elements, as children of the root element. Paths can be
used too:

    my $t= XML::Twig->new( twig_roots => { 'section/title' => 1});
    $t->parsefile( $file);
439
440 You can use generic_attribute_condition, attribute_condition,
441 full_path, partial_path, tag, tag_regexp, _default_ and _all_
442 to trigger the building of the twig. string_condition and
443 regexp_condition cannot be used as the content of the element,
444 and the string, have not yet been parsed when the condition is
445 checked.
446
WARNING: paths are checked against the document. Even if the
"twig_roots" option is used they will be checked against the
full document tree, not the virtual tree created by XML::Twig.
450
451 WARNING: twig_roots elements should NOT be nested, that would
452 hopelessly confuse XML::Twig ;--(
453
Note: you can set handlers (twig_handlers) using twig_roots

Example:

    my $t= XML::Twig->new( twig_roots =>
                             { title    => sub { $_[1]->print; },
                               subtitle => \&process_subtitle,
                             }
                         );
    $t->parsefile( $file);
463
464 twig_print_outside_roots
465 To be used in conjunction with the "twig_roots" argument. When
466 set to a true value this will print the document outside of the
467 "twig_roots" elements.
468
Example:

    my $t= XML::Twig->new( twig_roots => { title => \&number_title },
                           twig_print_outside_roots => 1,
                         );
    $t->parsefile( $file);

    { my $nb;
      sub number_title
        { my( $twig, $title)= @_;
          $nb++;
          $title->prefix( "$nb ");
          $title->print;
        }
    }
481
482 This example prints the document outside of the title element,
483 calls "number_title" for each "title" element, prints it, and
484 then resumes printing the document. The twig is built only for
485 the "title" elements.
486
487 If the value is a reference to a file handle then the document
488 outside the "twig_roots" elements will be output to this file
489 handle:
490
    open( OUT, ">out_file") or die "cannot open out file out_file:$!";
    my $t= XML::Twig->new( twig_roots => { title => \&number_title },
                           # default output to OUT
                           twig_print_outside_roots => \*OUT,
                         );

    { my $nb;
      sub number_title
        { my( $twig, $title)= @_;
          $nb++;
          $title->prefix( "$nb ");
          $title->print( \*OUT);    # you have to print to \*OUT here
        }
    }
505
506 start_tag_handlers
A hash "{ expression => \&handler}". Sets element handlers that
508 are called when the element is open (at the end of the
509 XML::Parser "Start" handler). The handlers are called with 2
510 params: the twig and the element. The element is empty at that
511 point, its attributes are created though.
512
513 You can use generic_attribute_condition, attribute_condition,
514 full_path, partial_path, tag, tag_regexp, _default_ and _all_
515 to trigger the handler.
516
517 string_condition and regexp_condition cannot be used as the
518 content of the element, and the string, have not yet been
519 parsed when the condition is checked.
520
The main uses for those handlers are to change the tag name
(you might have to do it as soon as you find the open tag if
you plan to "flush" the twig at some point in the element), and
to create temporary attributes that will be used when
processing sub-elements with "twig_handlers".
526
527 You should also use it to change tags if you use "flush". If
528 you change the tag in a regular "twig_handler" then the start
529 tag might already have been flushed.
530
531 Note: "start_tag" handlers can be called outside of
532 "twig_roots" if this argument is used, in this case handlers
533 are called with the following arguments: $t (the twig), $tag
534 (the tag of the element) and %att (a hash of the attributes of
535 the element).
536
If the "twig_print_outside_roots" argument is also used and the
last handler called returns a true value, then the start
tag will be output as it appeared in the original document; if
the handler returns a false value then the start tag will
not be printed (so you can print a modified string yourself for
example).
543
544 Note that you can use the ignore method in "start_tag_handlers"
545 (and only there).
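
A small sketch of a start tag handler, assuming an inline document
invented for the example: the handler reads an attribute (already
available when the start tag is found) and renames the element before
any of its content has been parsed.

```perl
use strict;
use warnings;
use XML::Twig;

my @ids;   # ids seen when the start tags are found

my $t = XML::Twig->new(
    start_tag_handlers => {
        section => sub {
            my( $twig, $elt) = @_;
            push @ids, $elt->att( 'id');   # attributes are already there
            $elt->set_tag( 'div');         # rename before any flush can happen
        },
    },
);

$t->parse( '<doc><section id="s1"><para>x</para></section></doc>');

my $output = $t->sprint;
print "ids: @ids\n";
print "$output\n";
```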
546
547 end_tag_handlers
A hash "{ expression => \&handler}". Sets element handlers that
549 are called when the element is closed (at the end of the
550 XML::Parser "End" handler). The handlers are called with 2
551 params: the twig and the tag of the element.
552
553 twig_handlers are called when an element is completely parsed,
554 so why have this redundant option? There is only one use for
555 "end_tag_handlers": when using the "twig_roots" option, to
556 trigger a handler for an element outside the roots. It is for
557 example very useful to number titles in a document using nested
558 sections:
559
560 my @no= (0);
561 my $no;
562 my $t= XML::Twig->new(
563 start_tag_handlers =>
564 { section => sub { $no[$#no]++; $no= join '.', @no; push @no, 0; } },
565 twig_roots =>
566 { title => sub { $_[1]->prefix( $no); $_[1]->print; } },
567 end_tag_handlers => { section => sub { pop @no; } },
568 twig_print_outside_roots => 1
569 );
570 $t->parsefile( $file);
571
572 Using the "end_tag_handlers" argument without "twig_roots" will
573 result in an error.
574
575 do_not_chain_handlers
576 If this option is set to a true value, then only one handler
577 will be called for each element, even if several satisfy the
578 condition
579
580 Note that the "_all_" handler will still be called regardless
581
582 ignore_elts
583 This option lets you ignore elements when building the twig.
584 This is useful in cases where you cannot use "twig_roots" to
585 ignore elements, for example if the element to ignore is a
586 sibling of elements you are interested in.
587
588 Example:
589
590 my $twig= XML::Twig->new( ignore_elts => { elt => 1 });
591 $twig->parsefile( 'doc.xml');
592
593 This will build the complete twig for the document, except that
594 all "elt" elements (and their children) will be left out.
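
The same option on an inline document, as a sketch (the "note" element
and the document are made up for the example):

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new( ignore_elts => { note => 1 });
$twig->parse( '<doc><para>keep</para>'
            . '<note>drop <b>me</b></note>'
            . '<para>too</para></doc>');

my $output = $twig->sprint;   # the note and its children are not in the twig
print "$output\n";
```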
595
596 char_handler
597 A reference to a subroutine that will be called every time
598 "PCDATA" is found.
599
600 The subroutine receives the string as argument, and returns the
601 modified string:
602
603 # we want all strings in upper case
604 sub my_char_handler
605 { my( $text)= @_;
606 $text= uc( $text);
607 return $text;
608 }
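
A sketch of how such a handler might be plugged in, assuming it is
passed to "new" as the value of the "char_handler" option and using an
inline document invented for the example:

```perl
use strict;
use warnings;
use XML::Twig;

# we want all strings in upper case
sub my_char_handler {
    my( $text) = @_;
    return uc( $text);
}

my $t = XML::Twig->new( char_handler => \&my_char_handler);
$t->parse( '<d><p>hello</p></d>');

my $text = $t->root->first_child( 'p')->text;
print "$text\n";
```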
609
610 elt_class
The name of a class used to store elements. This class should
612 inherit from "XML::Twig::Elt" (and by default it is
613 "XML::Twig::Elt"). This option is used to subclass the element
614 class and extend it with new methods.
615
616 This option is needed because during the parsing of the XML,
617 elements are created by "XML::Twig", without any control from
618 the user code.
619
620 keep_atts_order
621 Setting this option to a true value causes the attribute hash
622 to be tied to a "Tie::IxHash" object. This means that
623 "Tie::IxHash" needs to be installed for this option to be
624 available. It also means that the hash keeps its order, so you
625 will get the attributes in order. This allows outputting the
626 attributes in the same order as they were in the original
627 document.
628
629 keep_encoding
This is a (slightly?) evil option: if the XML document is not
UTF-8 encoded and you want to keep it that way, then setting
keep_encoding will use the "Expat" original_string method for
character data, thus keeping the original encoding, as well as the
original entities in the strings.
635
636 See the "t/test6.t" test file to see what results you can
637 expect from the various encoding options.
638
639 WARNING: if the original encoding is multi-byte then attribute
640 parsing will be EXTREMELY unsafe under any Perl before 5.6, as
641 it uses regular expressions which do not deal properly with
642 multi-byte characters. You can specify an alternate function to
643 parse the start tags with the "parse_start_tag" option (see
644 below)
645
WARNING: this option is NOT used when parsing with the
non-blocking parser ("parse_start", "parse_more", "parse_done"
methods) which you probably should not use with XML::Twig
anyway as they are totally untested!
650
651 output_encoding
This option generates an output_filter using "Encode",
"Text::Iconv" or "Unicode::Map8" and "Unicode::String", and
sets the encoding in the XML declaration. This is the easiest
way to deal with encodings; if you need more sophisticated
features, look at "output_filter" below.
657
658 output_filter
659 This option is used to convert the character encoding of the
660 output document. It is passed either a string corresponding to
661 a predefined filter or a subroutine reference. The filter will
662 be called every time a document or element is processed by the
663 "print" functions ("print", "sprint", "flush").
664
665 Pre-defined filters:
666
667 latin1
668 uses either "Encode", "Text::Iconv" or "Unicode::Map8" and
669 "Unicode::String" or a regexp (which works only with
670 XML::Parser 2.27), in this order, to convert all characters
671 to ISO-8859-1 (aka latin1)
672
673 html
674 does the same conversion as "latin1", plus encodes entities
675 using "HTML::Entities" (oddly enough you will need to have
676 HTML::Entities installed for it to be available). This
677 should only be used if the tags and attribute names
678 themselves are in US-ASCII, or they will be converted and
679 the output will not be valid XML any more
680
681 safe
682 converts the output to ASCII (US) only plus character
683 entities ("&#nnn;") this should be used only if the tags
684 and attribute names themselves are in US-ASCII, or they
685 will be converted and the output will not be valid XML any
686 more
687
688 safe_hex
same as "safe" except that the character entities are in
hexadecimal ("&#xnnn;")
691
692 encode_convert ($encoding)
Return a subref that can be used to convert utf8 strings to
$encoding. Uses "Encode".
695
696 my $conv = XML::Twig::encode_convert( 'latin1');
697 my $t = XML::Twig->new(output_filter => $conv);
698
699 iconv_convert ($encoding)
700 this function is used to create a filter subroutine that
701 will be used to convert the characters to the target
702 encoding using "Text::Iconv" (which needs to be installed,
703 look at the documentation for the module and for the
704 "iconv" library to find out which encodings are available
705 on your system)
706
707 my $conv = XML::Twig::iconv_convert( 'latin1');
708 my $t = XML::Twig->new(output_filter => $conv);
709
710 unicode_convert ($encoding)
711 this function is used to create a filter subroutine that
712 will be used to convert the characters to the target
encoding using "Unicode::String" and "Unicode::Map8"
714 (which need to be installed, look at the documentation for
715 the modules to find out which encodings are available on
716 your system)
717
718 my $conv = XML::Twig::unicode_convert( 'latin1');
719 my $t = XML::Twig->new(output_filter => $conv);
720
The "text" and "att" methods do not use the filter, so their
results are always in unicode.
723
724 Those predeclared filters are based on subroutines that can be
725 used by themselves (as "XML::Twig::foo").
726
727 html_encode ($string)
728 Use "HTML::Entities" to encode a utf8 string
729
730 safe_encode ($string)
731 Use either a regexp (perl < 5.8) or "Encode" to encode non-
732 ascii characters in the string in "&#<nnnn>;" format
733
734 safe_encode_hex ($string)
735 Use either a regexp (perl < 5.8) or "Encode" to encode non-
736 ascii characters in the string in "&#x<nnnn>;" format
737
738 regexp2latin1 ($string)
739 Use a regexp to encode a utf8 string into latin 1
740 (ISO-8859-1). Does not work with Perl 5.8.0!
741
742 output_text_filter
743 same as output_filter, except it doesn't apply to the brackets
744 and quotes around attribute values. This is useful for all
745 filters that could change the tagging, basically anything that
746 does not just change the encoding of the output. "html", "safe"
747 and "safe_hex" are better used with this option.
748
749 input_filter
750 This option is similar to "output_filter" except the filter is
751 applied to the characters before they are stored in the twig,
752 at parsing time.
753
754 remove_cdata
755 Setting this option to a true value will force the twig to
756 output CDATA sections as regular (escaped) PCDATA
757
758 parse_start_tag
759 If you use the "keep_encoding" option then this option can be
760 used to replace the default parsing function. You should
761 provide a coderef (a reference to a subroutine) as the
762 argument, this subroutine takes the original tag (given by
763 XML::Parser::Expat "original_string()" method) and returns a
764 tag and the attributes in a hash (or in a list
765 attribute_name/attribute value).
766
767 expand_external_ents
768 When this option is used external entities (that are defined)
769 are expanded when the document is output using "print"
770 functions such as "print ", "sprint ", "flush " and "xml_string
771 ". Note that in the twig the entity will be stored as an
772 element with a tag '"#ENT"', the entity will not be expanded
773 there, so you might want to process the entities before
774 outputting it.
775
776 If an external entity is not available, then the parse will
777 fail.
778
779 A special case is when the value of this option is -1. In that
780 case a missing entity will not cause the parser to die, but its
781 "name", "sysid" and "pubid" will be stored in the twig as
782 "$twig->{twig_missing_system_entities}" (a reference to an
783 array of hashes { name => <name>, sysid => <sysid>, pubid =>
784 <pubid> }). Yes, this is a bit of a hack, but it's useful in
785 some cases.
786
787 load_DTD
788 If this argument is set to a true value, "parse" or "parsefile"
789 on the twig will load the DTD information. This information
790 can then be accessed through the twig, in a "DTD_handler" for
791 example. This will load even an external DTD.
792
793 Default and fixed values for attributes will also be filled,
794 based on the DTD.
795
796 Note that to do this the module will generate a temporary file
797 in the current directory. If this is a problem let me know and
798 I will add an option to specify an alternate directory.
799
800 See "DTD Handling" for more information
801
802 DTD_handler
803 Set a handler that will be called once the doctype (and the
804 DTD) have been loaded, with 2 arguments, the twig and the DTD.
805
806 no_prolog
807 Does not output a prolog (XML declaration and DTD)
808
809 id This optional argument gives the name of an attribute that can
810 be used as an ID in the document. Elements whose ID is known
811 can be accessed through the elt_id method. id defaults to 'id'.
812 See "BUGS".
813
814 discard_spaces
815 If this optional argument is set to a true value then spaces
816 are discarded when they look non-significant: strings
817 containing only spaces are discarded. This argument is set to
818 true by default.
819
820 keep_spaces
821 If this optional argument is set to a true value then all
822 spaces in the document are kept, and stored as "PCDATA".
823
824 Warning: adding this option can result in changes in the twig
825 generated: space that was previously discarded might end up in
826 a new text element. See the difference by calling the following
827 code with 0 and 1 as arguments:
828
829 perl -MXML::Twig -e'print XML::Twig->new( keep_spaces => shift)->parse( "<d> \n<e/></d>")->_dump'
830
831 "keep_spaces" and "discard_spaces" cannot both be set.
832
833 discard_spaces_in
834 This argument sets "keep_spaces" to true but will cause the
835 twig builder to discard spaces in the elements listed.
836
837 The syntax for using this argument is:
838
839 XML::Twig->new( discard_spaces_in => [ 'elt1', 'elt2']);
840
841 keep_spaces_in
842 This argument sets "discard_spaces" to true but will cause the
843 twig builder to keep spaces in the elements listed.
844
845 The syntax for using this argument is:
846
847 XML::Twig->new( keep_spaces_in => [ 'elt1', 'elt2']);
848
849 Warning: adding this option can result in changes in the twig
850 generated: space that was previously discarded might end up in
851 a new text element.
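
A minimal sketch of the effect of "keep_spaces_in", assuming a made-up document in which "pre" is listed and "p" is not. It only counts the children of each element, so the extra text element created from the kept whitespace shows up as an additional child:

```perl
use strict;
use warnings;
use XML::Twig;

# "doc", "pre", "p" and "e" are hypothetical tag names used for illustration
my $xml = "<doc><pre> \n<e/></pre><p> \n<e/></p></doc>";

my $twig = XML::Twig->new( keep_spaces_in => [ 'pre' ])->parse( $xml);

# children returns all children, text (#PCDATA) elements included
my @pre_children = $twig->first_elt( 'pre')->children;
my @p_children   = $twig->first_elt( 'p')->children;

printf "pre has %d children, p has %d\n",
       scalar @pre_children, scalar @p_children;
```

The whitespace in the test string contains a line feed on purpose, since that is the kind of whitespace the default policy treats as non-significant.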
852
853 pretty_print
854 Set the pretty print method, amongst '"none"' (default),
855 '"nsgmls"', '"nice"', '"indented"', '"indented_c"',
856 '"indented_a"', '"indented_close_tag"', '"cvs"', '"wrapped"',
857 '"record"' and '"record_c"'
858
859 pretty_print formats:
860
861 none
862 The document is output as one long string, with no line
863 breaks except those found within text elements
864
865 nsgmls
866 Line breaks are inserted in safe places: that is within
867 tags, between a tag and an attribute, between attributes
868 and before the > at the end of a tag.
869
870 This is quite ugly but better than "none", and it is very
871 safe, the document will still be valid (conforming to its
872 DTD).
873
874 This is how the SGML parser "sgmls" splits documents, hence
875 the name.
876
877 nice
878 This option inserts line breaks before any tag that does
879 not contain text (so elements with textual content are not
880 broken, as the \n is significant there).
881
882 WARNING: this option leaves the document well-formed but
883 might make it invalid (not conformant to its DTD). If you
884 have elements declared as
885
886 <!ELEMENT foo (#PCDATA|bar)>
887
888 then a "foo" element including a "bar" one will be printed
889 as
890
891 <foo>
892 <bar>bar is just pcdata</bar>
893 </foo>
894
895 This is invalid, as the parser will take the line break
896 after the "foo" tag as a sign that the element contains
897 PCDATA, it will then die when it finds the "bar" tag. This
898 may or may not be important for you, but be aware of it!
899
900 indented
901 Same as "nice" (and with the same warning) but indents
902 elements according to their level
903
904 indented_c
905 Same as "indented" but a little more compact: the closing
906 tags are on the same line as the preceding text
907
908 indented_close_tag
909 Same as "indented" except that the closing tag is also
910 indented, to line up with the tags within the element
911
912 indented_a
913 This formats XML files in a line-oriented version control
914 friendly way. The format is described in
915 <http://tinyurl.com/2kwscq> (that's an Oracle document with
916 an insanely long URL).
917
918 Note that to be totally conformant to the "spec", the order
919 of attributes should not be changed, so if they are not
920 already in alphabetical order you will need to use the
921 "keep_atts_order" option.
922
923 cvs Same as "indented_a".
924
925 wrapped
926 Same as "indented_c" but lines are wrapped using
927 Text::Wrap::wrap. The default length for lines is the
928 default for $Text::Wrap::columns, and can be changed by
929 changing that variable.
930
931 record
932 This is a record-oriented pretty print that displays data
933 in records, one field per line (which looks a LOT like
934 "indented")
935
936 record_c
937 Stands for record compact, one record per line
938
939 empty_tags
940 Set the empty tag display style ('"normal"', '"html"' or
941 '"expand"').
942
943 "normal" outputs an empty tag '"<tag/>"', "html" adds a space
944 '"<tag />"' for elements that can be empty in XHTML and
945 "expand" outputs '"<tag></tag>"'
946
947 quote
948 Set the quote character for attributes ('"single"' or
949 '"double"').
950
951 escape_gt
952 By default XML::Twig does not escape the character > in its
953 output, as it is not mandated by the XML spec. With this option
954 on, > will be replaced by "&gt;".
955
956 comments
957 Set the way comments are processed: '"drop"' (default),
958 '"keep"' or '"process"'
959
960 Comments processing options:
961
962 drop
963 drops the comments, they are not read, nor printed to the
964 output
965
966 keep
967 comments are loaded and will appear on the output, they are
968 not accessible within the twig and will not interfere with
969 processing though
970
971 Note: comments in the middle of a text element such as
972
973 <p>text <!-- comment --> more text</p>
974
975 are kept at their original position in the text. Using
976 "print" methods like "print" or "sprint" will return the
977 comments in the text. Using "text" or "field" on the other
978 hand will not.
979
980 Any use of "set_pcdata" on the "#PCDATA" element (directly
981 or through other methods like "set_content") will delete
982 the comment(s).
983
984 process
985 comments are loaded in the twig and will be treated as
986 regular elements (their "tag" is "#COMMENT"). This can
987 interfere with processing if you expect
988 "$elt->{first_child}" to be an element but find a comment
989 there. Validation will not protect you from this as
990 comments can happen anywhere. You can use
991 "$elt->first_child( 'tag')" (which is a good habit anyway)
992 to get where you want.
993
994 Consider using "process" if you are outputting SAX events
995 from XML::Twig.
996
997 pi Set the way processing instructions are processed: '"drop"',
998 '"keep"' (default) or '"process"'
999
1000 Note that you can also set PI handlers in the "twig_handlers"
1001 option:
1002
1003 '?' => \&handler
1004 '?target' => \&handler_2
1005
1006 The handlers will be called with 2 parameters, the twig and the
1007 PI element if "pi" is set to "process", and with 3, the twig,
1008 the target and the data if "pi" is set to "keep". Of course
1009 they will not be called if "pi" is set to "drop".
1010
1011 If "pi" is set to "keep" the handler should return a string
1012 that will be used as-is as the PI text (it should look like
1013 "<?target data?>", or '' if you want to remove the PI).
1014
1015 Only one handler will be called, "?target" or "?" if no
1016 specific handler for that target is available.
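
The following sketch illustrates PI handlers when "pi" is set to "process", so each handler receives the twig and the PI element. The targets "app" and "other" and the document itself are invented for the example; it relies on the "target" and "data" methods of PI elements:

```perl
use strict;
use warnings;
use XML::Twig;

my @seen;
my $twig = XML::Twig->new(
    pi            => 'process',
    twig_handlers => {
        # specific handler: only called for PIs whose target is "app"
        '?app' => sub { my( $t, $pi)= @_; push @seen, 'app: ' . $pi->data; },
        # generic handler: called for PIs that have no specific handler
        '?'    => sub { my( $t, $pi)= @_; push @seen, 'other: ' . $pi->target; },
    },
);
$twig->parse( '<doc><?app do-this?><?other do-that?><p/></doc>');

print "$_\n" for @seen;
```

Only one of the two handlers runs for each PI, which is why the list ends up with one entry per instruction.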
1017
1018 map_xmlns
1019 This option is passed a hashref that maps URIs to prefixes.
1020 The prefixes in the document will be replaced by the ones in
1021 the map. The mapped prefixes can (actually have to) be used to
1022 trigger handlers, navigate or query the document.
1023
1024 Here is an example:
1025
1026 my $t= XML::Twig->new( map_xmlns => {'http://www.w3.org/2000/svg' => "svg"},
1027 twig_handlers =>
1028 { 'svg:circle' => sub { $_->set_att( r => 20) } },
1029 pretty_print => 'indented',
1030 )
1031 ->parse( '<doc xmlns:gr="http://www.w3.org/2000/svg">
1032 <gr:circle cx="10" cy="90" r="10"/>
1033 </doc>'
1034 )
1035 ->print;
1036
1037 This will output:
1038
1039 <doc xmlns:svg="http://www.w3.org/2000/svg">
1040 <svg:circle cx="10" cy="90" r="20"/>
1041 </doc>
1042
1043 keep_original_prefix
1044 When used with "map_xmlns" this option will make "XML::Twig"
1045 use the original namespace prefixes when outputting a document.
1046 The mapped prefix will still be used for triggering handlers
1047 and in navigation and query methods.
1048
1049 my $t= XML::Twig->new( map_xmlns => {'http://www.w3.org/2000/svg' => "svg"},
1050 twig_handlers =>
1051 { 'svg:circle' => sub { $_->set_att( r => 20) } },
1052 keep_original_prefix => 1,
1053 pretty_print => 'indented',
1054 )
1055 ->parse( '<doc xmlns:gr="http://www.w3.org/2000/svg">
1056 <gr:circle cx="10" cy="90" r="10"/>
1057 </doc>'
1058 )
1059 ->print;
1060
1061 This will output:
1062
1063 <doc xmlns:gr="http://www.w3.org/2000/svg">
1064 <gr:circle cx="10" cy="90" r="20"/>
1065 </doc>
1066
1067 index ($arrayref or $hashref)
1068 This option creates lists of specific elements during the
1069 parsing of the XML. It takes a reference to either a list of
1070 triggering expressions or to a hash name => expression, and for
1071 each one generates the list of elements that match the
1072 expression. The list can be accessed through the "index"
1073 method.
1074
1075 example:
1076
1077 # using an array ref
1078 my $t= XML::Twig->new( index => [ 'div', 'table' ])
1079 ->parsefile( 'foo.xml');
1080 my $divs= $t->index( 'div');
1081 my $first_div= $divs->[0];
1082 my $last_table= $t->index( table => -1);
1083
1084 # using a hashref to name the indexes
1085 my $t= XML::Twig->new( index => { email => 'a[@href=~/^\s*mailto:/]'})
1086 ->parsefile( 'foo.xml');
1087 my $last_emails= $t->index( email => -1);
1088
1089 Note that the index is not maintained after the parsing. If
1090 elements are deleted, renamed or otherwise hurt during
1091 processing, the index is NOT updated.
1092
1093 Note: I _HATE_ the Java-like name of arguments used by most XML
1094 modules. So in pure TIMTOWTDI fashion all arguments can be written
1095 either as "UglyJavaLikeName" or as "readable_perl_name":
1096 "twig_print_outside_roots" or "TwigPrintOutsideRoots" (or even
1097 "twigPrintOutsideRoots" {shudder}). XML::Twig normalizes them
1098 before processing them.
1099
1100 parse ( $source)
1101 The $source parameter should either be a string containing the
1102 whole XML document, or it should be an open "IO::Handle".
1103 Constructor options to "XML::Parser::Expat" given as keyword-value
1104 pairs may follow the $source parameter. These override, for this
1105 call, any options or attributes passed through from the XML::Parser
1106 instance.
1107
1108 A die call is thrown if a parse error occurs. Otherwise it will
1109 return the twig built by the parse. Use "safe_parse" if you want
1110 the parsing to return even when an error occurs.
1111
1112 If this method is called as a class method ("XML::Twig->parse(
1113 $some_xml_or_html)") then an XML::Twig object is created, using the
1114 parameters except the last one (eg "XML::Twig->parse( pretty_print
1115 => 'indented', $some_xml_or_html)") and "xparse" is called on it.
1116
1117 parsestring
1118 This is just an alias for "parse" for backwards compatibility.
1119
1120 parsefile (FILE [, OPT => OPT_VALUE [...]])
1121 Open "FILE" for reading, then call "parse" with the open handle.
1122 The file is closed no matter how "parse" returns.
1123
1124 A "die" call is thrown if a parse error occurs. Otherwise it will
1125 return the twig built by the parse. Use "safe_parsefile" if you
1126 want the parsing to return even when an error occurs.
1127
1128 parsefile_inplace ( $file, $optional_extension)
1129 Parse and update a file "in place". It does this by creating a temp
1130 file, selecting it as the default for print() statements (and
1131 methods), then parsing the input file. If the parsing is
1132 successful, then the temp file is moved to replace the input file.
1133
1134 If an extension is given then the original file is backed-up (the
1135 rules for the extension are the same as the rule for the -i option
1136 in perl).
1137
1138 parsefile_html_inplace ( $file, $optional_extension)
1139 Same as parsefile_inplace, except that it parses HTML instead of
1140 XML
1141
1142 parseurl ($url, $optional_user_agent)
1143 Gets the data from $url and parses it. The data is piped to the
1144 parser in chunks the size of the XML::Parser::Expat buffer, so
1145 memory consumption and hopefully speed are optimal.
1146
1147 For most (read "small") XML it is probably as efficient (and easier
1148 to debug) to just "get" the XML file and then parse it as a string.
1149
1150 use XML::Twig;
1151 use LWP::Simple;
1152 my $twig= XML::Twig->new();
1153 $twig->parse( LWP::Simple::get( $URL ));
1154
1155 or
1156
1157 use XML::Twig;
1158 my $twig= XML::Twig->nparse( $URL);
1159
1160 If the $optional_user_agent argument is given, that user agent is
1161 used; otherwise a new one is created.
1162
1163 safe_parse ( SOURCE [, OPT => OPT_VALUE [...]])
1164 This method is similar to "parse" except that it wraps the parsing
1165 in an "eval" block. It returns the twig on success and 0 on failure
1166 (the twig object also contains the parsed twig). $@ contains the
1167 error message on failure.
1168
1169 Note that the parsing still stops as soon as an error is detected,
1170 there is no way to keep going after an error.
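
As an illustration of the return value described above, here is a small sketch that parses one well-formed and one deliberately broken string. The documents are invented, and the code only checks whether the call returned a true value:

```perl
use strict;
use warnings;
use XML::Twig;

# a well-formed document: safe_parse returns the twig
my $ok = XML::Twig->new->safe_parse( '<doc><p>fine</p></doc>');
print "first document parsed\n" if $ok;

# deliberately not well-formed (the p element is never closed)
my $bad = XML::Twig->new->safe_parse( '<doc><p>broken</doc>');
print "second document rejected: $@\n" unless $bad;
```

Testing the return value rather than calling methods on it directly avoids a second error when the parse has failed.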
1171
1172 safe_parsefile (FILE [, OPT => OPT_VALUE [...]])
1173 This method is similar to "parsefile" except that it wraps the
1174 parsing in an "eval" block. It returns the twig on success and 0 on
1175 failure (the twig object also contains the parsed twig). $@
1176 contains the error message on failure.
1177
1178 Note that the parsing still stops as soon as an error is detected,
1179 there is no way to keep going after an error.
1180
1181 safe_parseurl ($url, $optional_user_agent)
1182 Same as "parseurl" except that it wraps the parsing in an "eval"
1183 block. It returns the twig on success and 0 on failure (the twig
1184 object also contains the parsed twig). $@ contains the error
1185 message on failure.
1186
1187 parse_html ($string_or_fh)
1188 parse an HTML string or file handle (by converting it to XML using
1189 HTML::TreeBuilder, which needs to be available).
1190
1191 This works nicely, but some information gets lost in the process:
1192 newlines are removed, and (at least on the version I use) comments
1193 get an extra CDATA section inside (<!-- foo --> becomes <!--
1194 <![CDATA[ foo ]]> -->).
1195
1196 parsefile_html
1197 parse an HTML file (by converting it to XML using
1198 HTML::TreeBuilder, which needs to be available). The file is loaded
1199 completely in memory and converted to XML before being parsed.
1200
1201 Alpha: implementation, and thus generated XML could change.
1202
1203 safe_parseurl_html ($url, $optional_user_agent)
1204 Same as "parseurl_html" except that it wraps the parsing in an
1205 "eval" block. It returns the twig on success and 0 on failure (the
1206 twig object also contains the parsed twig). $@ contains the error
1207 message on failure.
1208
1209 safe_parsefile_html ($file, $optional_user_agent)
1210 Same as "parsefile_html" except that it wraps the parsing in an
1211 "eval" block. It returns the twig on success and 0 on failure (the
1212 twig object also contains the parsed twig). $@ contains the error
1213 message on failure.
1214
1215 safe_parse_html ($string_or_fh)
1216 Same as "parse_html" except that it wraps the parsing in an "eval"
1217 block. It returns the twig on success and 0 on failure (the twig
1218 object also contains the parsed twig). $@ contains the error
1219 message on failure.
1220
1221 xparse ($thing_to_parse)
1222 Parse the $thing_to_parse, whether it is a filehandle, a string, an
1223 HTML file, an HTML URL, a URL or a file.
1224
1225 Note that this is mostly a convenience method for one-off scripts.
1226 For example files that end in '.htm' or '.html' are parsed first as
1227 XML, and if this fails as HTML. This is certainly not the most
1228 efficient way to do this in general.
1229
1230 nparse ($optional_twig_options, $thing_to_parse)
1231 Create a twig with the $optional_twig_options, and parse the
1232 $thing_to_parse, whether it is a filehandle, a string, an HTML
1233 file, an HTML URL, a URL or a file.
1234
1235 Examples:
1236
1237 XML::Twig->nparse( "file.xml");
1238 XML::Twig->nparse( error_context => 1, "file://file.xml");
1239
1240 nparse_pp ($optional_twig_options, $thing_to_parse)
1241 same as "nparse" but also sets the "pretty_print" option to
1242 "indented".
1243
1244 nparse_e ($optional_twig_options, $thing_to_parse)
1245 same as "nparse" but also sets the "error_context" option to 1.
1246
1247 nparse_ppe ($optional_twig_options, $thing_to_parse)
1248 same as "nparse" but also sets the "pretty_print" option to
1249 "indented" and the "error_context" option to 1.
1250
1251 parser
1252 This method returns the "expat" object (actually the
1253 XML::Parser::Expat object) used during parsing. It is useful for
1254 example to call XML::Parser::Expat methods on it. To get the line
1255 of a tag for example use "$t->parser->current_line".
1256
1257 setTwigHandlers ($handlers)
1258 Set the twig_handlers. $handlers is a reference to a hash similar
1259 to the one in the "twig_handlers" option of new. All previous
1260 handlers are unset. The method returns the reference to the
1261 previous handlers.
1262
1263 setTwigHandler ($exp, $handler)
1264 Set a single twig_handler for elements matching $exp. $handler is a
1265 reference to a subroutine. If the handler was previously set then
1266 the reference to the previous handler is returned.
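
A short sketch of setting a handler after the twig has been created, assuming a made-up document where "item" is the tag of interest. The handler only counts the elements it is called for:

```perl
use strict;
use warnings;
use XML::Twig;

my $count = 0;
my $twig  = XML::Twig->new;

# "item" is a hypothetical tag name; the handler is set before parsing starts
$twig->setTwigHandler( item => sub { $count++; });
$twig->parse( '<list><item/><item/><other/></list>');

print "saw $count item elements\n";
```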
1267
1268 setStartTagHandlers ($handlers)
1269 Set the start_tag handlers. $handlers is a reference to a hash
1270 similar to the one in the "start_tag_handlers" option of new. All
1271 previous handlers are unset. The method returns the reference to
1272 the previous handlers.
1273
1274 setStartTagHandler ($exp, $handler)
1275 Set a single start_tag handler for elements matching $exp.
1276 $handler is a reference to a subroutine. If the handler was
1277 previously set then the reference to the previous handler is
1278 returned.
1279
1280 setEndTagHandlers ($handlers)
1281 Set the end_tag handlers. $handlers is a reference to a hash
1282 similar to the one in the "end_tag_handlers" option of new. All
1283 previous handlers are unset. The method returns the reference to
1284 the previous handlers.
1285
1286 setEndTagHandler ($exp, $handler)
1287 Set a single end_tag handler for elements matching $exp. $handler
1288 is a reference to a subroutine. If the handler was previously set
1289 then the reference to the previous handler is returned.
1290
1291 setTwigRoots ($handlers)
1292 Same as using the "twig_roots" option when creating the twig
1293
1294 setCharHandler ($exp, $handler)
1295 Set a "char_handler"
1296
1297 setIgnoreEltsHandler ($exp)
1298 Set an "ignore_elt" handler (elements that match $exp will be
1299 ignored).
1300
1301 setIgnoreEltsHandlers ($exp)
1302 Set all "ignore_elt" handlers (previous handlers are replaced)
1303
1304 dtd Return the dtd (an XML::Twig::DTD object) of a twig
1305
1306 xmldecl
1307 Return the XML declaration for the document, or a default one if it
1308 doesn't have one
1309
1310 doctype
1311 Return the doctype for the document
1312
1313 doctype_name
1314 returns the doctype of the document from the doctype declaration
1315
1316 system_id
1317 returns the system value of the DTD of the document from the
1318 doctype declaration
1319
1320 public_id
1321 returns the public doctype of the document from the doctype
1322 declaration
1323
1324 internal_subset
1325 returns the internal subset of the DTD
1326
1327 dtd_text
1328 Return the DTD text
1329
1330 dtd_print
1331 Print the DTD
1332
1333 model ($tag)
1334 Return the model (in the DTD) for the element $tag
1335
1336 root
1337 Return the root element of a twig
1338
1339 set_root ($elt)
1340 Set the root of a twig
1341
1342 first_elt ($optional_condition)
1343 Return the first element matching $optional_condition of a twig, if
1344 no condition is given then the root is returned
1345
1346 last_elt ($optional_condition)
1347 Return the last element matching $optional_condition of a twig, if
1348 no condition is given then the last element of the twig is returned
1349
1350 elt_id ($id)
1351 Return the element whose "id" attribute is $id
1352
1353 getEltById
1354 Same as "elt_id"
1355
1356 index ($index_name, $optional_index)
1357 If the $optional_index argument is present, return the
1358 corresponding element in the index (created using the "index"
1359 option for "XML::Twig->new").
1360
1361 If the argument is not present, return an arrayref to the index
1362
1363 normalize
1364 merge together all consecutive pcdata elements in the document (if
1365 for example you have turned some elements into pcdata using
1366 "erase", this will give you a "clean" document in which all
1367 text elements are as long as possible).
1368
1369 encoding
1370 This method returns the encoding of the XML document, as defined by
1371 the "encoding" attribute in the XML declaration (ie it is "undef"
1372 if the attribute is not defined)
1373
1374 set_encoding
1375 This method sets the value of the "encoding" attribute in the XML
1376 declaration. Note that if the document did not have a declaration
1377 it is generated (with an XML version of 1.0)
1378
1379 xml_version
1380 This method returns the XML version, as defined by the "version"
1381 attribute in the XML declaration (ie it is "undef" if the attribute
1382 is not defined)
1383
1384 set_xml_version
1385 This method sets the value of the "version" attribute in the XML
1386 declaration. If the declaration did not exist it is created.
1387
1388 standalone
1389 This method returns the value of the "standalone" declaration for
1390 the document
1391
1392 set_standalone
1393 This method sets the value of the "standalone" attribute in the XML
1394 declaration. Note that if the document did not have a declaration
1395 it is generated (with an XML version of 1.0)
1396
1397 set_output_encoding
1398 Set the "encoding" "attribute" in the XML declaration
1399
1400 set_doctype ($name, $system, $public, $internal)
1401 Set the doctype of the document. If an argument is "undef" (or not
1402 present) then its former value is retained; if a false ('' or 0)
1403 value is passed then the former value is deleted.
1404
1405 entity_list
1406 Return the entity list of a twig
1407
1408 entity_names
1409 Return the list of all defined entities
1410
1411 entity ($entity_name)
1412 Return the entity
1413
1414 change_gi ($old_gi, $new_gi)
1415 Performs a (very fast) global change. All elements $old_gi are now
1416 $new_gi. This is a bit dangerous though and should be avoided if
1417 possible, as the new tag might be ignored in subsequent processing.
1418
1419 See "BUGS".
1420
1421 flush ($optional_filehandle, %options)
1422 Flushes a twig up to (and including) the current element, then
1423 deletes all unnecessary elements from the tree that's kept in
1424 memory. "flush" keeps track of which elements need to be
1425 open/closed, so if you flush from handlers you don't have to worry
1426 about anything. Just keep flushing the twig every time you're done
1427 with a sub-tree and it will come out well-formed. After the whole
1428 parsing don't forget to "flush" one more time to print the end of
1429 the document. The doctype and entity declarations are also
1430 printed.
1431
1432 "flush" takes an optional filehandle as an argument.
1433
1434 options: use the "update_DTD" option if you have updated the
1435 (internal) DTD and/or the entity list and you want the updated DTD
1436 to be output
1437
1438 The "pretty_print" option sets the pretty printing of the document.
1439
1440 Example: $t->flush( Update_DTD => 1);
1441 $t->flush( $filehandle, pretty_print => 'indented');
1442 $t->flush( \*FILE);
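
The flush-from-a-handler pattern described above can be sketched as follows. The tag names are invented, and an in-memory filehandle stands in for a real output file so the example is self-contained:

```perl
use strict;
use warnings;
use XML::Twig;

# in-memory filehandle, used here instead of a real file
open my $out, '>', \my $buffer or die "cannot open in-memory handle: $!";

my $twig = XML::Twig->new(
    twig_handlers => {
        # "item" is a hypothetical tag: output each one once it is complete
        item => sub { my( $t, $elt)= @_; $t->flush( $out); },
    },
);
$twig->parse( '<list><item>a</item><item>b</item></list>');
$twig->flush( $out);    # one last flush to print the end of the document
close $out;

print "$buffer\n";
```

Passing the same filehandle to every call keeps the whole document in one place, including the closing tag written by the final flush.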
1443
1444 flush_up_to ($elt, $optional_filehandle, %options)
1445 Flushes up to the $elt element. This allows you to keep part of the
1446 tree in memory when you "flush".
1447
1448 options: see flush.
1449
1450 purge
1451 Does the same as a "flush" except it does not print the twig. It
1452 just deletes all elements that have been completely parsed so far.
1453
1454 purge_up_to ($elt)
1455 Purges up to the $elt element. This allows you to keep part of the
1456 tree in memory when you "purge".
1457
1458 print ($optional_filehandle, %options)
1459 Prints the whole document associated with the twig. To be used only
1460 AFTER the parse.
1461
1462 options: see "flush".
1463
1464 print_to_file ($filename, %options)
1465 Prints the whole document associated with the twig to file
1466 $filename. To be used only AFTER the parse.
1467
1468 options: see "flush".
1469
1470 sprint
1471 Return the text of the whole document associated with the twig. To
1472 be used only AFTER the parse.
1473
1474 options: see "flush".
1475
1476 trim
1477 Trim the document: gets rid of initial and trailing spaces, and
1478 replaces multiple spaces by a single one.
1479
1480 toSAX1 ($handler)
1481 Send SAX events for the twig to the SAX1 handler $handler
1482
1483 toSAX2 ($handler)
1484 Send SAX events for the twig to the SAX2 handler $handler
1485
1486 flush_toSAX1 ($handler)
1487 Same as flush, except that SAX events are sent to the SAX1 handler
1488 $handler instead of the twig being printed
1489
1490 flush_toSAX2 ($handler)
1491 Same as flush, except that SAX events are sent to the SAX2 handler
1492 $handler instead of the twig being printed
1493
1494 ignore
1495 This method should be called during parsing, usually in
1496 "start_tag_handlers". It causes the element to be skipped during
1497 the parsing: the twig is not built for this element, it will not be
1498 accessible during parsing or after it. The element will not take up
1499 any memory and parsing will be faster.
1500
1501 Note that this method can also be called on an element. If the
1502 element is a parent of the current element then this element will
1503 be ignored (the twig will not be built any more for it and what has
1504 already been built will be deleted).
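
A minimal sketch of calling "ignore" from a start tag handler, assuming a made-up "secret" element whose content should never reach the twig:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new(
    start_tag_handlers => {
        # "secret" is a hypothetical tag name: skip it and everything inside
        secret => sub { $_[0]->ignore; },
    },
);
$twig->parse(
    '<doc><p>keep</p><secret><p>drop</p></secret><p>keep too</p></doc>');

# only the p elements outside of "secret" are in the tree
my @paras = $twig->root->descendants( 'p');
print scalar( @paras), " p elements left\n";
```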
1505
1506 set_pretty_print ($style)
1507 Set the pretty print method, amongst '"none"' (default),
1508 '"nsgmls"', '"nice"', '"indented"', "indented_c", '"wrapped"',
1509 '"record"' and '"record_c"'
1510
1511 WARNING: the pretty print style is a GLOBAL variable, so once set
1512 it's applied to ALL "print"'s (and "sprint"'s). Same goes if you
1513 use XML::Twig with "mod_perl". This should not be a problem as the
1514 XML that's generated is valid anyway, and XML processors (as well
1515 as HTML processors, including browsers) should not care. Let me
1516 know if this is a big problem, but at the moment the
1517 performance/cleanliness trade-off clearly favors the global
1518 approach.
1519
1520 set_empty_tag_style ($style)
1521 Set the empty tag display style ('"normal"', '"html"' or
1522 '"expand"'). As with "set_pretty_print" this sets a global flag.
1523
1524 "normal" outputs an empty tag '"<tag/>"', "html" adds a space
1525 '"<tag />"' for elements that can be empty in XHTML and "expand"
1526 outputs '"<tag></tag>"'
1527
1528 set_remove_cdata ($flag)
1529 set (or unset) the flag that forces the twig to output CDATA
1530 sections as regular (escaped) PCDATA
1531
1532 print_prolog ($optional_filehandle, %options)
1533 Prints the prolog (XML declaration + DTD + entity declarations) of
1534 a document.
1535
1536 options: see "flush".
1537
1538 prolog ($optional_filehandle, %options)
1539 Return the prolog (XML declaration + DTD + entity declarations) of
1540 a document.
1541
1542 options: see "flush".
1543
1544 finish
1545 Call Expat "finish" method. Unsets all handlers (including
1546 internal ones that set context), but expat continues parsing to the
1547 end of the document or until it finds an error. It should finish
1548 up a lot faster than with the handlers set.
1549
1550 finish_print
1551 Stops twig processing, flushes the twig and proceeds to finish
1552 printing the document as fast as possible. Use this method when
1553 modifying a document and the modification is done.
1554
1555 finish_now
1556 Stops twig processing, does not finish parsing the document (which
1557 could actually be not well-formed after the point where
1558 "finish_now" is called). Execution resumes after the "parse" or
1559 "parsefile" call. The content of the twig is what has been parsed
1560 so far (all open elements at the time "finish_now" is called are
1561 considered closed).
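
As an illustration, this sketch stops parsing as soon as the first "title" element has been seen. The document and tag names are invented for the example:

```perl
use strict;
use warnings;
use XML::Twig;

my $first_title;
my $twig = XML::Twig->new(
    twig_handlers => {
        # "title" is a hypothetical tag: record its text, then stop parsing
        title => sub { my( $t, $elt)= @_;
                       $first_title = $elt->text;
                       $t->finish_now;
                     },
    },
);
$twig->parse( '<doc><title>One</title><title>Two</title></doc>');

# execution resumes here, the second title was never processed
print "first title: $first_title\n";
```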
1562
1563 set_expand_external_entities
1564 Same as using the "expand_external_ents" option when creating the
1565 twig
1566
1567 set_input_filter
1568 Same as using the "input_filter" option when creating the twig
1569
1570 set_keep_atts_order
1571 Same as using the "keep_atts_order" option when creating the twig
1572
1573 set_keep_encoding
1574 Same as using the "keep_encoding" option when creating the twig
1575
1576 escape_gt
1577 Usually XML::Twig does not escape > in its output. Using this
1578 option makes it replace > by &gt;
1579
1580 do_not_escape_gt
1581 reverts XML::Twig behavior to its default of not escaping > in its
1582 output.
1583
1584 set_output_filter
1585 Same as using the "output_filter" option when creating the twig
1586
1587 set_output_text_filter
1588 Same as using the "output_text_filter" option when creating the
1589 twig
1590
1591 add_stylesheet ($type, @options)
1592 Adds an external stylesheet to an XML document.
1593
1594 Supported types and options:
1595
1596 xsl option: the url of the stylesheet
1597
1598 Example:
1599
1600 $t->add_stylesheet( xsl => "xsl_style.xsl");
1601
1602 will generate the following PI at the beginning of the
1603 document:
1604
1605 <?xml-stylesheet type="text/xsl" href="xsl_style.xsl"?>
1606
1607 css option: the url of the stylesheet
1608
1609 Methods inherited from XML::Parser::Expat
1610 A twig inherits all the relevant methods from XML::Parser::Expat.
1611 These methods can only be used during the parsing phase (they will
1612 generate a fatal error otherwise).
1613
1614 Inherited methods are:
1615
1616 depth
1617 Returns the size of the context list.
1618
1619 in_element
1620 Returns true if NAME is equal to the name of the innermost
1621 currently opened element. If namespace processing is being
1622 used and you want to check against a name that may be in a
1623 namespace, then use the generate_ns_name method to create the
1624 NAME argument.
1625
1626 within_element
1627 Returns the number of times the given name appears in the
1628 context list. If namespace processing is being used and you
1629 want to check against a name that may be in a namespace, then
1630 use the generate_ns_name method to create the NAME
1631 argument.
1632
1633 context
1634 Returns a list of element names that represent open elements,
1635 with the last one being the innermost. Inside start and end tag
1636 handlers, this will be the tag of the parent element.
1637
1638 current_line
1639 Returns the line number of the current position of the parse.
1640
1641 current_column
1642 Returns the column number of the current position of the parse.
1643
1644 current_byte
1645 Returns the current position of the parse.
1646
1647 position_in_context
1648 Returns a string that shows the current parse position. LINES
1649 should be an integer >= 0 that represents the number of lines
1650 on either side of the current parse line to place into the
1651 returned string.
1652
1653 base ([NEWBASE])
1654 Returns the current value of the base for resolving relative
1655 URIs. If NEWBASE is supplied, changes the base to that value.
1656
1657 current_element
1658 Returns the name of the innermost currently opened element.
1659 Inside start or end handlers, returns the parent of the element
1660 associated with those tags.
1661
1662 element_index
1663 Returns an integer that is the depth-first visit order of the
1664 current element. This will be zero outside of the root
1665 element. For example, this will return 1 when called from the
1666 start handler for the root element start tag.
1667
1668 recognized_string
1669 Returns the string from the document that was recognized in
1670 order to call the current handler. For instance, when called
1671 from a start handler, it will give us the start-tag string.
1672 The string is encoded in UTF-8. This method doesn't return a
1673 meaningful string inside declaration handlers.
1674
1675 original_string
1676 Returns the verbatim string from the document that was
1677 recognized in order to call the current handler. The string is
1678 in the original document encoding. This method doesn't return a
1679 meaningful string inside declaration handlers.
1680
1681 xpcroak
1682 Concatenate onto the given message the current line number
1683 within the XML document plus the message implied by
1684 ErrorContext. Then croak with the formed message.
1685
1686 xpcarp
1687 Concatenate onto the given message the current line number
1688 within the XML document plus the message implied by
1689 ErrorContext. Then carp with the formed message.
1690
1691 xml_escape(TEXT [, CHAR [, CHAR ...]])
1692 Returns TEXT with markup characters turned into character
1693 entities. Any additional characters provided as arguments are
1694 also turned into character references where found in TEXT.
1695
1696 (this method is broken on some versions of expat/XML::Parser)
1697
1698 path ( $optional_tag)
1699 Return the element context in a form similar to XPath's short form:
1700 '"/root/tag1/../tag"'
1701
1702 get_xpath ( $optional_array_ref, $xpath, $optional_offset)
1703 Performs a "get_xpath" on the document root (see XML::Twig::Elt)
1704
1705 If the $optional_array_ref argument is used the array must contain
1706 elements. The $xpath expression is applied to each element in turn
1707 and the result is the union of all results. This way a first query can
1708 be refined in further steps.
1709
1710 find_nodes ( $optional_array_ref, $xpath, $optional_offset)
1711 same as "get_xpath"
1712
1713 findnodes ( $optional_array_ref, $xpath, $optional_offset)
1714 same as "get_xpath" (similar to the XML::LibXML method)
1715
1716 findvalue ( $optional_array_ref, $xpath, $optional_offset)
1717 Return the "join" of all texts of the results of applying
1718 "get_xpath" to the node (similar to the XML::LibXML method)
1719
1720 subs_text ($regexp, $replace)
1721 subs_text does text substitution on the whole document, similar to
1722 perl's "s///" operator.
1723
1724 dispose
1725 Useful only if you don't have "Scalar::Util" or "WeakRef"
1726 installed.
1727
1728 Reclaims properly the memory used by an XML::Twig object. As the
1729 object has circular references it never goes out of scope, so if
1730 you want to parse lots of XML documents then the memory leak
1731 becomes a problem. Use "$twig->dispose" to clear this problem.
1732
1733 create_accessors (list_of_attribute_names)
1734 A convenience method that creates l-valued accessors for
1735 attributes. So "$twig->create_accessors( 'foo')" will create a
1736 "foo" method that can be called on elements:
1737
1738 $elt->foo; # equivalent to $elt->{'att'}->{'foo'};
1739 $elt->foo( 'bar'); # equivalent to $elt->set_att( foo => 'bar');
1740
1741 set_do_not_escape_amp_in_atts
1742 An evil method, that I only document because Test::Pod::Coverage
1743 complains otherwise, but really, you don't want to know about it.
1744
1745 XML::Twig::Elt
1746 new ($optional_tag, $optional_atts, @optional_content)
1747 The "tag" is optional (but then you can't have any content), the
1748 $optional_atts argument is a reference to a hash of attributes, the
1749 content can be just a string or a list of strings and elements. A
1750 content of '"#EMPTY"' creates an empty element.
1751
1752 Examples: my $elt= XML::Twig::Elt->new();
1753 my $elt= XML::Twig::Elt->new( para => { align => 'center' });
1754 my $elt= XML::Twig::Elt->new( para => { align => 'center' }, 'foo');
1755 my $elt= XML::Twig::Elt->new( br => '#EMPTY');
1756 my $elt= XML::Twig::Elt->new( 'para');
1757 my $elt= XML::Twig::Elt->new( para => 'this is a para');
1758 my $elt= XML::Twig::Elt->new( para => $elt3, 'another para');
1759
1760 The strings are not parsed, the element is not attached to any
1761 twig.
1762
1763 WARNING: if you rely on ID's then you will have to set the id
1764 yourself. At this point the element does not belong to a twig yet,
1765 so the ID attribute is not known so it won't be stored in the ID
1766 list.
1767
1768 Note that "#COMMENT", "#PCDATA" or "#CDATA" are valid tag names,
1769 that will create text elements.
1770
1771 To create an element "foo" containing a CDATA section:
1772
1773 my $foo= XML::Twig::Elt->new( '#CDATA' => "content of the CDATA section")
1774 ->wrap_in( 'foo');
1775
1776 An attribute of '#CDATA', will create the content of the element as
1777 CDATA:
1778
1779 my $elt= XML::Twig::Elt->new( 'p' => { '#CDATA' => 1}, 'foo < bar');
1780
1781 creates an element
1782
1783 <p><![CDATA[foo < bar]]></p>
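
As an illustration of the constructor forms above, the following sketch builds a small fragment by hand and serializes it. The tag names and variable names are made up for the example, and it assumes the default (not pretty-printed) output settings.

```perl
use strict;
use warnings;
use XML::Twig;

# Elements created with new() are free-standing: they belong to no twig yet.
my $para = XML::Twig::Elt->new( para => { align => 'center' }, 'some text');
my $br   = XML::Twig::Elt->new( br => '#EMPTY');

# Attach the empty element as the last child of the paragraph.
$br->paste( last_child => $para);

# Serialize the fragment, tags included.
my $xml = $para->sprint;
print "$xml\n";
```

Because the strings passed to "new" are not parsed, any markup characters in them end up escaped in the output.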
1784
1785 parse ($string, %args)
1786 Creates an element from an XML string. The string is actually
1787 parsed as a new twig, then the root of that twig is returned. The
1788 arguments in %args are passed to the twig. As always if the parse
1789 fails the parser will die, so use an eval if you want to trap
1790 syntax errors.
1791
1792 As obviously the element does not exist beforehand this method has
1793 to be called on the class:
1794
1795 my $elt= parse XML::Twig::Elt( "<a> string to parse, with <sub/>
1796 <elements>, actually tons of </elements>
1797 h</a>");
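
A minimal sketch of the error trapping mentioned above, assuming two hypothetical input strings (one well-formed, one not). It uses the arrow form of the class method call, which is equivalent to the indirect-object form shown above.

```perl
use strict;
use warnings;
use XML::Twig;

# A well-formed string: parse returns the root element of the new twig.
my $good = eval { XML::Twig::Elt->parse( '<a>text <sub/></a>') };

# A malformed string: the parser dies, eval returns undef and sets $@.
my $bad   = eval { XML::Twig::Elt->parse( '<a>unclosed') };
my $error = $@;

print "parsed element: ", $good->tag, "\n";
print "second parse failed\n" unless defined $bad;
```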
1798
1799 set_inner_xml ($string)
1800 Sets the content of the element to be the tree created from the
1801 string
1802
1803 set_inner_html ($string)
1804 Sets the content of the element, after parsing the string with an
1805 HTML parser (HTML::Parser)
1806
1807 print ($optional_filehandle, $optional_pretty_print_style)
1808 Prints an entire element, including the tags, optionally to a
1809 $optional_filehandle, optionally with a $pretty_print_style.
1810
1811 The print outputs XML data so base entities are escaped.
1812
1813 sprint ($elt, $optional_no_enclosing_tag)
1814 Return the xml string for an entire element, including the tags.
1815 If the optional second argument is true then only the string inside
1816 the element is returned (the start and end tag for $elt are not).
1817 The text is XML-escaped: base entities (& and < in text, & < and "
1818 in attribute values) are turned into entities.
1819
1820 gi Return the gi of the element (the gi is the "generic identifier",
1821 the tag name in SGML parlance).
1822
1823 "tag" and "name" are synonyms of "gi".
1824
1825 tag Same as "gi"
1826
1827 name
1828 Same as "tag"
1829
1830 set_gi ($tag)
1831 Set the gi (tag) of an element
1832
1833 set_tag ($tag)
1834 Set the tag (="gi") of an element
1835
1836 set_name ($name)
1837 Set the name (="tag") of an element
1838
1839 root
1840 Return the root of the twig in which the element is contained.
1841
1842 twig
1843 Return the twig containing the element.
1844
1845 parent ($optional_condition)
1846 Return the parent of the element, or the first ancestor matching
1847 the $optional_condition
1848
1849 first_child ($optional_condition)
1850 Return the first child of the element, or the first child matching
1851 the $optional_condition
1852
1853 has_child ($optional_condition)
1854 Return the first child of the element, or the first child matching
1855 the $optional_condition (same as first_child)
1856
1857 has_children ($optional_condition)
1858 Return the first child of the element, or the first child matching
1859 the $optional_condition (same as first_child)
1860
1861 first_child_text ($optional_condition)
1862 Return the text of the first child of the element, or of the first
1863 child matching the $optional_condition. If there is no such child,
1864 returns ''. This avoids getting the child, checking for its
1865 existence, then getting the text for trivial cases.
1867
1868 Similar methods are available for the other navigation methods:
1869
1870 last_child_text
1871 prev_sibling_text
1872 next_sibling_text
1873 prev_elt_text
1874 next_elt_text
1875 child_text
1876 parent_text
1877
1878 All these methods also exist in a "trimmed" variant:
1879
1880 first_child_trimmed_text
1881 last_child_trimmed_text
1882 prev_sibling_trimmed_text
1883 next_sibling_trimmed_text
1884 prev_elt_trimmed_text
1885 next_elt_trimmed_text
1886 child_trimmed_text
1887 parent_trimmed_text
1888 field ($condition)
1889 Same method as "first_child_text" with a different name
1890
1891 fields ($condition_list)
1892 Return the list of fields (text of the first child matching each
1893 condition); missing fields are returned as the empty string.
1894
1895 Same method as "first_child_text" with a different name
1896
1897 trimmed_field ($optional_condition)
1898 Same method as "first_child_trimmed_text" with a different name
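
The text and field accessors above can be sketched as follows. The document and the tag names ("person", "name", "email", "phone") are invented for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse( '<person><name>Ada</name><email> ada@example.org </email></person>');
my $person = $twig->root;

# Text of the first matching child.
my $name  = $person->first_child_text( 'name');

# No 'phone' child exists, so the empty string comes back instead of undef.
my $phone = $person->field( 'phone');

# Trimmed variant: leading and trailing whitespace is removed.
my $email = $person->trimmed_field( 'email');

# Several fields at once, in the order requested.
my @fields = $person->fields( 'name', 'phone');

print "name=[$name] phone=[$phone] email=[$email]\n";
```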
1899
1900 set_field ($condition, $optional_atts, @list_of_elt_and_strings)
1901 Set the content of the first child of the element that matches
1902 $condition, the rest of the arguments is the same as for
1903 "set_content"
1904
1905 If no child matches $condition _and_ if $condition is a valid XML
1906 element name, then a new element by that name is created and
1907 inserted as the last child.
1908
1909 first_child_matches ($optional_condition)
1910 Return the element if the first child of the element (if it exists)
1911 passes the $optional_condition, "undef" otherwise
1912
1913 if( $elt->first_child_matches( 'title')) ...
1914
1915 is equivalent to
1916
1917 if( $elt->{first_child} && $elt->{first_child}->passes( 'title'))
1918
1919 "first_child_is" is another name for this method
1920
1921 Similar methods are available for the other navigation methods:
1922
1923 last_child_matches
1924 prev_sibling_matches
1925 next_sibling_matches
1926 prev_elt_matches
1927 next_elt_matches
1928 child_matches
1929 parent_matches
1930 is_first_child ($optional_condition)
1931 returns true (the element) if the element is the first child of its
1932 parent (optionally that satisfies the $optional_condition)
1933
1934 is_last_child ($optional_condition)
1935 returns true (the element) if the element is the last child of its
1936 parent (optionally that satisfies the $optional_condition)
1937
1938 prev_sibling ($optional_condition)
1939 Return the previous sibling of the element, or the previous sibling
1940 matching $optional_condition
1941
1942 next_sibling ($optional_condition)
1943 Return the next sibling of the element, or the first one matching
1944 $optional_condition.
1945
1946 next_elt ($optional_elt, $optional_condition)
1947 Return the next elt (optionally matching $optional_condition) of
1948 the element. This is defined as the next element which opens after
1949 the current element opens. Which usually means the first child of
1950 the element. Counter-intuitive as it might look this allows you to
1951 loop through the whole document by starting from the root.
1952
1953 The $optional_elt is the root of a subtree. When the "next_elt" is
1954 out of the subtree then the method returns undef. You can then walk
1955 a sub tree with:
1956
1957 my $elt= $subtree_root;
1958 while( $elt= $elt->next_elt( $subtree_root))
1959 { # insert processing code here
1960 }
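
A self-contained version of this subtree walk might look like the sketch below. The sample document is hypothetical; text nodes are skipped with "is_text" so that only element tags are collected.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse( '<doc><sec><title>T</title><p>one</p></sec><p>outside</p></doc>');

# Walk only the <sec> subtree; the trailing <p> is outside of it.
my $subtree_root = $twig->root->first_child( 'sec');
my @tags;
my $elt = $subtree_root;
while( $elt = $elt->next_elt( $subtree_root)) {
    next if $elt->is_text;      # skip text nodes
    push @tags, $elt->tag;
}

print join( ' ', @tags), "\n";
```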
1961
1962 prev_elt ($optional_condition)
1963 Return the previous elt (optionally matching $optional_condition)
1964 of the element. This is the first element which opens before the
1965 current one. It is usually either the last descendant of the
1966 previous sibling or simply the parent
1967
1968 next_n_elt ($offset, $optional_condition)
1969 Return the $offset-th element that matches the $optional_condition
1970
1971 following_elt
1972 Return the following element (as per the XPath following axis)
1973
1974 preceding_elt
1975 Return the preceding element (as per the XPath preceding axis)
1976
1977 following_elts
1978 Return the list of following elements (as per the XPath following
1979 axis)
1980
1981 preceding_elts
1982 Return the list of preceding elements (as per the XPath preceding
1983 axis)
1984
1985 children ($optional_condition)
1986 Return the list of children (optionally which matches
1987 $optional_condition) of the element. The list is in document order.
1988
1989 children_count ($optional_condition)
1990 Return the number of children of the element (optionally which
1991 matches $optional_condition)
1992
1993 children_text ($optional_condition)
1994 In array context, returns an array containing the text of children
1995 of the element (optionally which matches $optional_condition)
1996
1997 In scalar context, returns the concatenation of the text of
1998 children of the element
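
The children methods above can be compared side by side in a short sketch. The list markup is an assumption made for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse( '<ul><li>a</li><li class="x">b</li><note>n</note></ul>');
my $ul = $twig->root;

my @items     = $ul->children( 'li');        # only the li children
my $all_count = $ul->children_count;         # every child
my $li_count  = $ul->children_count( 'li');  # children matching the condition

# List context: one string per child. Scalar context: the concatenation.
my @texts  = $ul->children_text( 'li');
my $joined = $ul->children_text( 'li');

print "$li_count of $all_count children are li: @texts\n";
```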
1999
2000 children_trimmed_text ($optional_condition)
2001 In array context, returns an array containing the trimmed text of
2002 children of the element (optionally which matches
2003 $optional_condition)
2004
2005 In scalar context, returns the concatenation of the trimmed text of
2006 children of the element
2007
2008 children_copy ($optional_condition)
2009 Return a list of elements that are copies of the children of the
2010 element, optionally which matches $optional_condition
2011
2012 descendants ($optional_condition)
2013 Return the list of all descendants (optionally which matches
2014 $optional_condition) of the element. This is the equivalent of the
2015 "getElementsByTagName" of the DOM (by the way, if you are really a
2016 DOM addict, you can use "getElementsByTagName" instead)
2017
2018 getElementsByTagName ($optional_condition)
2019 Same as "descendants"
2020
2021 find_by_tag_name ($optional_condition)
2022 Same as "descendants"
2023
2024 descendants_or_self ($optional_condition)
2025 Same as "descendants" except that the element itself is included in
2026 the list if it matches the $optional_condition
2027
2028 first_descendant ($optional_condition)
2029 Return the first descendant of the element that matches the
2030 condition
2031
2032 last_descendant ($optional_condition)
2033 Return the last descendant of the element that matches the
2034 condition
2035
2036 ancestors ($optional_condition)
2037 Return the list of ancestors (optionally matching
2038 $optional_condition) of the element. The list is ordered from the
2039 innermost ancestor to the outermost one
2040
2041 NOTE: the element itself is not part of the list, in order to
2042 include it you will have to use ancestors_or_self
2043
2044 ancestors_or_self ($optional_condition)
2045 Return the list of ancestors (optionally matching
2046 $optional_condition) of the element, including the element (if it
2047 matches the condition). The list is ordered from the innermost
2048 ancestor to the outermost one
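
The ordering of the ancestor lists can be checked with a small sketch like this one, which assumes a made-up four-level document.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse( '<doc><sec><p><b>x</b></p></sec></doc>');
my $b = $twig->root->first_descendant( 'b');

# Innermost ancestor first; the element itself is not included.
my @ancestors = map { $_->tag } $b->ancestors;

# Same list, but starting with the element itself.
my @with_self = map { $_->tag } $b->ancestors_or_self;

# parent() with a condition returns the first matching ancestor.
my $sec = $b->parent( 'sec');

print "@ancestors\n@with_self\n";
```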
2049
2050 passes ($condition)
2051 Return the element if it passes the $condition
2052
2053 att ($att)
2054 Return the value of attribute $att or "undef"
2055
2056 set_att ($att, $att_value)
2057 Set the attribute of the element to the given value
2058
2059 You can actually set several attributes this way:
2060
2061 $elt->set_att( att1 => "val1", att2 => "val2");
2062
2063 del_att ($att)
2064 Delete the attribute for the element
2065
2066 You can actually delete several attributes at once:
2067
2068 $elt->del_att( 'att1', 'att2', 'att3');
2069
2070 att_exists ($att)
2071 Returns true if the attribute $att exists for the element, false
2072 otherwise
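
The attribute methods above fit together as in the following sketch. The element and attribute names are hypothetical.

```perl
use strict;
use warnings;
use XML::Twig;

my $elt = XML::Twig::Elt->new( 'img');

# Set several attributes in one call, then delete two of them.
$elt->set_att( src => 'a.png', alt => 'A', width => 10);
$elt->del_att( 'alt', 'width');

my $src     = $elt->att( 'src');                 # value of an existing attribute
my $height  = $elt->att( 'height');              # undef: no such attribute
my $has_alt = $elt->att_exists( 'alt') ? 1 : 0;  # 0: it was deleted

print "src=$src has_alt=$has_alt\n";
```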
2073
2074 cut Cut the element from the tree. The element still exists, it can be
2075 copied or pasted somewhere else, it is just not attached to the
2076 tree anymore.
2077
2078 Note that the "old" links to the parent, previous and next siblings
2079 can still be accessed using the former_* methods
2080
2081 former_next_sibling
2082 Returns the former next sibling of a cut node (or undef if the node
2083 has not been cut)
2084
2085 This makes it easier to write loops where you cut elements:
2086
2087 my $child= $parent->first_child( 'achild');
2088 while( $child->{'att'}->{'cut'})
2089 { $child->cut; $child= $child->former_next_sibling; }
2090
2091 former_prev_sibling
2092 Returns the former previous sibling of a cut node (or undef if the
2093 node has not been cut)
2094
2095 former_parent
2096 Returns the former parent of a cut node (or undef if the node has
2097 not been cut)
2098
2099 cut_children ($optional_condition)
2100 Cut all the children of the element (or all of those which satisfy
2101 the $optional_condition).
2102
2103 Return the list of children
2104
2105 copy ($elt)
2106 Return a copy of the element. The copy is a "deep" copy: all sub
2107 elements of the element are duplicated.
2108
2109 paste ($optional_position, $ref)
2110 Paste a (previously "cut" or newly generated) element. Die if the
2111 element already belongs to a tree.
2112
2113 Note that the calling element is pasted:
2114
2115 $child->paste( first_child => $existing_parent);
2116 $new_sibling->paste( after => $this_sibling_is_already_in_the_tree);
2117
2118 or
2119
2120 my $new_elt= XML::Twig::Elt->new( tag => $content);
2121 $new_elt->paste( $position => $existing_elt);
2122
2123 Example:
2124
2125 my $t= XML::Twig->new->parsefile( 'doc.xml');
2126 my $toc= $t->root->new( 'toc');
2127 $toc->paste( $t->root); # $toc is pasted as first child of the root
2128 foreach my $title ($t->findnodes( '/doc/section/title'))
2129 { my $title_toc= $title->copy;
2130 # paste $title_toc as the last child of toc
2131 $title_toc->paste( last_child => $toc);
2132 }
2133
2134 Position options:
2135
2136 first_child (default)
2137 The element is pasted as the first child of $ref
2138
2139 last_child
2140 The element is pasted as the last child of $ref
2141
2142 before
2143 The element is pasted before $ref, as its previous sibling.
2144
2145 after
2146 The element is pasted after $ref, as its next sibling.
2147
2148 within
2149 In this case an extra argument, $offset, should be supplied.
2150 The element will be pasted in the reference element (or in its
2151 first text child) at the given offset. To achieve this the
2152 reference element will be split at the offset.
2153
2154 Note that you can call directly the underlying method:
2155
2156 paste_before
2157 paste_after
2158 paste_first_child
2159 paste_last_child
2160 paste_within
2161 move ($optional_position, $ref)
2162 Move an element in the tree. This is just a "cut" then a "paste".
2163 The syntax is the same as "paste".
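
A short sketch of "move" and "paste" working together, assuming a hypothetical three-item list and the default output settings:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse( '<list><item>b</item><item>c</item><item>a</item></list>');
my $list = $twig->root;

# move is a cut followed by a paste: bring the last item to the front.
my $last = $list->last_child( 'item');
$last->move( first_child => $list);

# A newly created element can be pasted directly.
my $new = XML::Twig::Elt->new( item => 'd');
$new->paste( last_child => $list);

my $xml = $list->sprint;
print "$xml\n";
```

Note that it is always the calling element that gets moved or pasted; the second argument is only the reference point.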
2164
2165 replace ($ref)
2166 Replaces an element in the tree. Sometimes it is just not possible
2167 to "cut" an element then "paste" another in its place, so "replace"
2168 comes in handy. The calling element replaces $ref.
2169
2170 replace_with (@elts)
2171 Replaces the calling element with one or more elements
2172
2173 delete
2174 Cuts the element and frees the memory.
2175
2176 prefix ($text, $optional_option)
2177 Add a prefix to an element. If the element is a "PCDATA" element
2178 the text is added to the pcdata, if the element's first child is a
2179 "PCDATA" then the text is added to its pcdata, otherwise a new
2180 "PCDATA" element is created and pasted as the first child of the
2181 element.
2182
2183 If the option is "asis" then the prefix is added asis: it is
2184 created in a separate "PCDATA" element with an "asis" property. You
2185 can then write:
2186
2187 $elt1->prefix( '<b>', 'asis');
2188
2189 to create a "<b>" in the output of "print".
2190
2191 suffix ($text, $optional_option)
2192 Add a suffix to an element. If the element is a "PCDATA" element
2193 the text is added to the pcdata, if the element's last child is a
2194 "PCDATA" then the text is added to its pcdata, otherwise a new
2195 PCDATA element is created and pasted as the last child of the
2196 element.
2197
2198 If the option is "asis" then the suffix is added asis: it is
2199 created in a separate "PCDATA" element with an "asis" property. You
2200 can then write:
2201
2202 $elt2->suffix( '</b>', 'asis');
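
The difference between plain and "asis" prefixes and suffixes can be seen in a sketch like the one below. The paragraph content is invented, and the example assumes the default output settings.

```perl
use strict;
use warnings;
use XML::Twig;

my $p = XML::Twig::Elt->parse( '<p>world</p>');

# Plain prefix and suffix are merged into the existing text.
$p->prefix( 'hello ');
$p->suffix( '!');
my $text = $p->text;

# With 'asis' the strings are output unescaped, so they act as raw markup.
$p->prefix( '<b>', 'asis');
$p->suffix( '</b>', 'asis');
my $xml = $p->sprint;

print "$text\n$xml\n";
```

Since "asis" content is not checked, it is up to the caller to keep the resulting output well-formed.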
2203
2204 trim
2205 Trim the element in-place: spaces at the beginning and at the end
2206 of the element are discarded and multiple spaces within the element
2207 (or its descendants) are replaced by a single space.
2208
2209 Note that in some cases you can still end up with multiple spaces,
2210 if they are split between several elements:
2211
2212 <doc> text <b> hah! </b> yep</doc>
2213
2214 gets trimmed to
2215
2216 <doc>text <b> hah! </b> yep</doc>
2217
2218 This is somewhere in between a bug and a feature.
2219
2220 normalize
2221 merge together all consecutive pcdata elements in the element (if
2222 for example you have turned some elements into pcdata using
2223 "erase", this will give you a "clean" element in which all
2224 text fragments are as long as possible).
2225
2226 simplify (%options)
2227 Return a data structure suspiciously similar to XML::Simple's.
2228 Options are identical to XMLin options, see XML::Simple doc for
2229 more details (or use Data::Dumper or YAML to dump the data
2230 structure)
2231
2232 content_key
2233 forcearray
2234 keyattr
2235 noattr
2236 normalize_space
2237 aka normalise_space
2238
2239 variables (%var_hash)
2240 %var_hash is a hash { name => value }
2241
2242 This option allows variables in the XML to be expanded when the
2243 file is read. (there is no facility for putting the variable
2244 names back if you regenerate XML using XMLout).
2245
2246 A 'variable' is any text of the form ${name} (or $name) which
2247 occurs in an attribute value or in the text content of an
2248 element. If 'name' matches a key in the supplied hashref,
2249 ${name} will be replaced with the corresponding value from the
2250 hashref. If no matching key is found, the variable will not be
2251 replaced.
2252
2253 var_att ($attribute_name)
2254 This option gives the name of an attribute that will be used to
2255 create variables in the XML:
2256
2257 <dirs>
2258 <dir name="prefix">/usr/local</dir>
2259 <dir name="exec_prefix">$prefix/bin</dir>
2260 </dirs>
2261
2262 use "var_att => 'name'" to get $prefix replaced by /usr/local in
2263 the generated data structure
2264
2265 By default variables are captured by the following regexp:
2266 /\$(\w+)/
2267
2268 var_regexp (regexp)
2269 This option changes the regexp used to capture variables. The
2270 variable name should be in $1
2271
2272 group_tags { grouping tag => grouped tag, grouping tag 2 => grouped
2273 tag 2...}
2274 Option used to simplify the structure: the grouping elements
2275 listed are dropped from the result and their children are
2276 treated as children of the grouping element's parent.
2277
2278 If the element is:
2279
2280 <config host="laptop.xmltwig.com">
2281 <server>localhost</server>
2282 <dirs>
2283 <dir name="base">/home/mrodrigu/standards</dir>
2284 <dir name="tools">$base/tools</dir>
2285 </dirs>
2286 <templates>
2287 <template name="std_def">std_def.templ</template>
2288 <template name="dummy">dummy</template>
2289 </templates>
2290 </config>
2291
2292 Then calling simplify with "group_tags => { dirs => 'dir',
2293 templates => 'template'}" makes the data structure be exactly
2294 as if the start and end tags for "dirs" and "templates" were
2295 not there.
2296
2297 A YAML dump of the structure:
2298
2299 base: '/home/mrodrigu/standards'
2300 host: laptop.xmltwig.com
2301 server: localhost
2302 template:
2303 - std_def.templ
2304 - dummy
2305 tools: '$base/tools'
2306
2307 split_at ($offset)
2308 Split a text ("PCDATA" or "CDATA") element in 2 at $offset, the
2309 original element now holds the first part of the string and a new
2310 element holds the right part. The new element is returned
2311
2312 If the element is not a text element then the first text child of
2313 the element is split
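
A minimal sketch of "split_at", called here directly on a text element so that the new element is simply the next text sibling. The sample text and the offset are arbitrary.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse( '<p>hello world</p>');

# The only child of <p> is a text (PCDATA) element.
my $text_elt = $twig->root->first_child;

# Split after 5 characters: the original keeps the left part,
# the returned element holds the right part.
my $right = $text_elt->split_at( 5);

my $left_text  = $text_elt->text;
my $right_text = $right->text;
my $count      = $twig->root->children_count;

print "[$left_text] [$right_text] ($count children)\n";
```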
2314
2315 split ( $optional_regexp, $tag1, $atts1, $tag2, $atts2...)
2316 Split the text descendants of an element in place, the text is
2317 split using the $regexp, if the regexp includes () then the matched
2318 separators will be wrapped in elements. $1 is wrapped in $tag1,
2319 with attributes $atts1 if $atts1 is given (as a hashref), $2 is
2320 wrapped in $tag2...
2321
2322 if $elt is "<p>tati tata <b>tutu tati titi</b> tata tati tata</p>"
2323
2324 $elt->split( qr/(ta)ti/, 'foo', {type => 'toto'} )
2325
2326 will change $elt to
2327
2328 <p><foo type="toto">ta</foo> tata <b>tutu <foo type="toto">ta</foo>
2329 titi</b> tata <foo type="toto">ta</foo> tata</p>
2330
2331 The regexp can be passed either as a string or as "qr//" (perl
2332 5.005 and later), it defaults to \s+ just as the "split" built-in
2333 (but this would be quite a useless behaviour without the
2334 $optional_tag parameter)
2335
2336 $optional_tag defaults to PCDATA or CDATA, depending on the initial
2337 element type
2338
2339 The list of descendants is returned (including un-touched original
2340 elements and newly created ones)
2341
2342 mark ( $regexp, $optional_tag, $optional_attribute_ref)
2343 This method behaves exactly as split, except only the newly created
2344 elements are returned
2345
2346 wrap_children ( $regexp_string, $tag, $optional_attribute_hashref)
2347 Wrap the children of the element that match the regexp in an
2348 element $tag. If $optional_attribute_hashref is passed then the
2349 new element will have these attributes.
2350
2351 The $regexp_string includes tags, within pointy brackets, as in
2352 "<title><para>+" and the usual Perl modifiers (+*?...). Tags can
2353 be further qualified with attributes: "<para type="warning"
2354 classif="cosmic_secret">+". The values for attributes should be
2355 xml-escaped: "<candy type="M&amp;Ms">*" ("<", "&", ">" and """
2356 should be escaped).
2357
2358 Note that elements might get extra "id" attributes in the process.
2359 See add_id. Use strip_att to remove unwanted id's.
2360
2361 Here is an example:
2362
2363 If the element $elt has the following content:
2364
2365 <elt>
2366 <p>para 1</p>
2367 <l_l1_1>list 1 item 1 para 1</l_l1_1>
2368 <l_l1>list 1 item 1 para 2</l_l1>
2369 <l_l1_n>list 1 item 2 para 1 (only para)</l_l1_n>
2370 <l_l1_n>list 1 item 3 para 1</l_l1_n>
2371 <l_l1>list 1 item 3 para 2</l_l1>
2372 <l_l1>list 1 item 3 para 3</l_l1>
2373 <l_l1_1>list 2 item 1 para 1</l_l1_1>
2374 <l_l1>list 2 item 1 para 2</l_l1>
2375 <l_l1_n>list 2 item 2 para 1 (only para)</l_l1_n>
2376 <l_l1_n>list 2 item 3 para 1</l_l1_n>
2377 <l_l1>list 2 item 3 para 2</l_l1>
2378 <l_l1>list 2 item 3 para 3</l_l1>
2379 </elt>
2380
2381 Then the code
2382
2383 $elt->wrap_children( q{<l_l1_1><l_l1>*} , li => { type => "ul1" });
2384 $elt->wrap_children( q{<l_l1_n><l_l1>*} , li => { type => "ul" });
2385
2386 $elt->wrap_children( q{<li type="ul1"><li type="ul">+}, "ul");
2387 $elt->strip_att( 'id');
2388 $elt->strip_att( 'type');
2389 $elt->print;
2390
2391 will output:
2392
2393 <elt>
2394 <p>para 1</p>
2395 <ul>
2396 <li>
2397 <l_l1_1>list 1 item 1 para 1</l_l1_1>
2398 <l_l1>list 1 item 1 para 2</l_l1>
2399 </li>
2400 <li>
2401 <l_l1_n>list 1 item 2 para 1 (only para)</l_l1_n>
2402 </li>
2403 <li>
2404 <l_l1_n>list 1 item 3 para 1</l_l1_n>
2405 <l_l1>list 1 item 3 para 2</l_l1>
2406 <l_l1>list 1 item 3 para 3</l_l1>
2407 </li>
2408 </ul>
2409 <ul>
2410 <li>
2411 <l_l1_1>list 2 item 1 para 1</l_l1_1>
2412 <l_l1>list 2 item 1 para 2</l_l1>
2413 </li>
2414 <li>
2415 <l_l1_n>list 2 item 2 para 1 (only para)</l_l1_n>
2416 </li>
2417 <li>
2418 <l_l1_n>list 2 item 3 para 1</l_l1_n>
2419 <l_l1>list 2 item 3 para 2</l_l1>
2420 <l_l1>list 2 item 3 para 3</l_l1>
2421 </li>
2422 </ul>
2423 </elt>
2424
2425 subs_text ($regexp, $replace)
2426 subs_text does text substitution, similar to perl's " s///"
2427 operator.
2428
2429 $regexp must be a perl regexp, created with the "qr" operator.
2430
2431 $replace can include "$1, $2"... from the $regexp. It can also be
2432 used to create element and entities, by using "&elt( tag => { att
2433 => val }, text)" (similar syntax as "new") and "&ent( name)".
2434
2435 Here is a rather complex example:
2436
2437 $elt->subs_text( qr{(?<!do not )link to (http://([^\s,]*))},
2438 'see &elt( a =>{ href => $1 }, $2)'
2439 );
2440
2441 This will replace text like link to http://www.xmltwig.com by see
2442 <a href="http://www.xmltwig.com">www.xmltwig.com</a>, but not do not link
2443 to...
2444
2445 Generating entities (here replacing spaces with &nbsp;):
2446
2447 $elt->subs_text( qr{ }, '&ent( "&nbsp;")');
2448
2449 or, using a variable:
2450
2451 my $ent="&nbsp;";
2452 $elt->subs_text( qr{ }, "&ent( '$ent')");
2453
2454 Note that the substitution is always global, as in using the "g"
2455 modifier in a perl substitution, and that it is performed on all
2456 text descendants of the element.
2457
2458 Bug: in the $regexp, you can only use "\1", "\2"... if the
2459 replacement expression does not include elements or attributes. eg
2460
2461 $t->subs_text( qr/((t[aiou])\2)/, '$2'); # ok, replaces toto, tata, titi, tutu by to, ta, ti, tu
2462 $t->subs_text( qr/((t[aiou])\2)/, '&elt(p => $1)' ); # NOK, does not find toto...
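
For the simplest case, a plain text replacement with a captured group, a sketch could look like this. The sentence and the pattern are made up; note the single quotes around the replacement so that "$1" is not interpolated by perl before "subs_text" sees it.

```perl
use strict;
use warnings;
use XML::Twig;

my $p = XML::Twig::Elt->parse( '<p>costs 5 USD and <b>7 USD</b></p>');

# The substitution is global and applies to all text descendants,
# including the text inside <b>.
$p->subs_text( qr/(\d+) USD/, '$1 dollars');

my $text = $p->text;
print "$text\n";
```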
2463
2464 add_id ($optional_coderef)
2465 Add an id to the element.
2466
2467 The id is an attribute, "id" by default, see the "id" option for
2468 XML::Twig "new" to change it. Use an id starting with "#" to get an
2469 id that's not output by print, flush or sprint, yet that allows you
2470 to use the elt_id method to get the element easily.
2471
2472 If the element already has an id, no new id is generated.
2473
2474 By default the method creates an id of the form "twig_id_<nnnn>",
2475 where "<nnnn>" is a number, incremented each time the method is
2476 called successfully.
2477
2478 set_id_seed ($prefix)
2479 by default the id generated by "add_id" is "twig_id_<nnnn>",
2480 "set_id_seed" changes the prefix to $prefix and resets the number
2481 to 1
2482
2483 strip_att ($att)
2484 Remove the attribute $att from all descendants of the element
2485 (including the element)
2486
2487 Return the element
2488
2489 change_att_name ($old_name, $new_name)
2490 Change the name of the attribute from $old_name to $new_name. If
2491 there is no attribute $old_name nothing happens.
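
The two attribute clean-up methods above can be combined as in this sketch, which uses invented element and attribute names and assumes the default output settings.

```perl
use strict;
use warnings;
use XML::Twig;

my $doc = XML::Twig::Elt->parse( '<doc id="d1"><p id="p1" class="x">t</p></doc>');

# Remove every id attribute in the subtree, the root included.
$doc->strip_att( 'id');

# Rename an attribute on a single element.
$doc->first_child( 'p')->change_att_name( class => 'role');

my $xml = $doc->sprint;
print "$xml\n";
```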
2492
2493 lc_attnames
2494 Lowercases the names of all the attributes of the element.
2495
2496 sort_children_on_value( %options)
2497 Sort the children of the element in place according to their text.
2498 All children are sorted.
2499
2500 Return the element, with its children sorted.
2501
2502 %options are
2503
2504 type : numeric | alpha (default: alpha)
2505 order : normal | reverse (default: normal)
2508
2509 sort_children_on_att ($att, %options)
2510 Sort the children of the element in place according to attribute
2511 $att. %options are the same as for "sort_children_on_value"
2512
2513 Return the element.
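
The effect of the "type" and "order" options can be sketched as follows. The list content and the "n" attribute are assumptions made for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse( '<list><item n="2">pear</item><item n="10">apple</item><item n="1">fig</item></list>');
my $list = $twig->root;

# Default options: alphabetical sort on the text of each child.
$list->sort_children_on_value;
my @by_text = $list->children_text;

# Numeric, descending sort on the n attribute.
$list->sort_children_on_att( 'n', type => 'numeric', order => 'reverse');
my @by_att = map { $_->att( 'n') } $list->children;

print "@by_text\n@by_att\n";
```

With the default "alpha" type the values 10, 2 and 1 would be compared as strings, which is why "numeric" is requested for the attribute sort.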
2514
2515 sort_children_on_field ($tag, %options)
2516 Sort the children of the element in place, according to the field
2517 $tag (the text of the first child of the child with this tag).
2518 %options are the same as for "sort_children_on_value".
2519
2520 Return the element, with its children sorted
2521
2522 sort_children( $get_key, %options)
2523 Sort the children of the element in place. The $get_key argument is
2524 a reference to a function that returns the sort key when passed an
2525 element.
2526
2527 For example:
2528
2529 $elt->sort_children( sub { $_[0]->{'att'}->{"nb"} + $_[0]->text },
2530 type => 'numeric', order => 'reverse'
2531 );
2532
2533 field_to_att ($cond, $att)
2534 Turn the text of the first sub-element matched by $cond into the
2535 value of attribute $att of the element. If $att is omitted then
2536 $cond is used as the name of the attribute, which makes sense only
2537 if $cond is a valid element (and attribute) name.
2538
2539 The sub-element is then cut.
2540
2541 att_to_field ($att, $tag)
2542 Take the value of attribute $att and create a sub-element $tag as
2543 first child of the element. If $tag is omitted then $att is used as
2544 the name of the sub-element.
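
A round trip between a sub-element and an attribute might look like the sketch below. The "book", "isbn" and "id" names are hypothetical.

```perl
use strict;
use warnings;
use XML::Twig;

my $book = XML::Twig::Elt->parse( '<book><isbn>123</isbn><title>T</title></book>');

# Turn the <isbn> child into an isbn attribute; the child is cut.
$book->field_to_att( 'isbn');
my $isbn_att       = $book->att( 'isbn');
my $has_isbn_child = $book->first_child( 'isbn') ? 1 : 0;

# Go the other way, this time naming the new sub-element explicitly.
$book->att_to_field( isbn => 'id');
my $id_text = $book->first_child_text( 'id');

print "attribute: $isbn_att, new field: $id_text\n";
```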
2545
2546 get_xpath ($xpath, $optional_offset)
2547 Return a list of elements satisfying the $xpath. $xpath is an
2548 XPATH-like expression.
2549
2550 A subset of the XPATH abbreviated syntax is covered:
2551
2552 tag
2553 tag[1] (or any other positive number)
2554 tag[last()]
2555 tag[@att] (the attribute exists for the element)
2556 tag[@att="val"]
2557 tag[@att=~ /regexp/]
2558 tag[att1="val1" and att2="val2"]
2559 tag[att1="val1" or att2="val2"]
2560 tag[string()="toto"] (returns tag elements whose text (as per the text method)
2561 is toto)
2562 tag[string()=~/regexp/] (returns tag elements whose text (as per the text
2563 method) matches regexp)
2564 expressions can start with / (search starts at the document root)
2565 expressions can start with . (search starts at the current element)
2566 // can be used to get all descendants instead of just direct children
2567 * matches any tag
2568
2569 So the following examples from the XPath recommendation
2570 <http://www.w3.org/TR/xpath.html#path-abbrev> work:
2572
2573 para selects the para element children of the context node
2574 * selects all element children of the context node
2575 para[1] selects the first para child of the context node
2576 para[last()] selects the last para child of the context node
2577 */para selects all para grandchildren of the context node
2578 /doc/chapter[5]/section[2] selects the second section of the fifth chapter
2579 of the doc
2580 chapter//para selects the para element descendants of the chapter element
2581 children of the context node
2582 //para selects all the para descendants of the document root and thus selects
2583 all para elements in the same document as the context node
2584 //olist/item selects all the item elements in the same document as the
2585 context node that have an olist parent
2586 .//para selects the para element descendants of the context node
2587 .. selects the parent of the context node
2588 para[@type="warning"] selects all para children of the context node that have
2589 a type attribute with value warning
2590 employee[@secretary and @assistant] selects all the employee children of the
2591 context node that have both a secretary attribute and an assistant
2592 attribute
2593
2594 The elements will be returned in the document order.
2595
2596 If $optional_offset is used then only one element will be returned,
2597 the one with the appropriate offset in the list, starting at 0
2598
2599 Quoting and interpolating variables can be a pain when the Perl
2600 syntax and the XPATH syntax collide, so use alternate quoting
2601 mechanisms like q or qq (I like q{} and qq{} myself).
2602
2603 Here are some more examples to get you started:
2604
2605 my $p1= "p1";
2606 my $p2= "p2";
2607 my @res= $t->get_xpath( qq{p[string( "$p1") or string( "$p2")]});
2608
2609 my $a= "a1";
2610 my @res= $t->get_xpath( qq{//*[@att="$a"]});
2611
2612 my $val= "a1";
2613 my $exp= qq{//p[ \@att='$val']}; # you need to use \@ or you will get a warning
2614 my @res= $t->get_xpath( $exp);
2615
2616 Note that the only supported regexp delimiter is / and that you
2617 must backslash all / in regexps AND in regular strings.
2618
2619 XML::Twig does not natively provide full XPath support, but you can
2620 use "XML::Twig::XPath" to get "findnodes" to use "XML::XPath" as
2621 the XPath engine, with full coverage of the spec.
2625
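
A short sketch of "get_xpath" in use, including the quoting advice given above. The document and the "type" attribute are invented for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse(<<'XML');
<doc>
  <chapter><para type="warning">mind the gap</para><para>plain</para></chapter>
  <chapter><para type="warning">wet floor</para></chapter>
</doc>
XML
my $root = $twig->root;

# q{} stops Perl from interpolating @type inside the expression.
my @warnings = $root->get_xpath(q{//para[@type="warning"]});

# With qq{} variables are interpolated, so the @ must be escaped.
my $value = 'warning';
my @same  = $root->get_xpath(qq{//para[\@type="$value"]});

# An offset returns a single element; offsets start at 0.
my $first = $root->get_xpath('//para', 0);

print scalar(@warnings), " warnings, first para: ", $first->text, "\n";
```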
2626 find_nodes
2627 same as "get_xpath"
2628
2629 findnodes
2630 same as "get_xpath"
2631
2632 text @optional_options
2633 Return a string consisting of all the "PCDATA" and "CDATA" in an
2634 element, without any tags. The text is not XML-escaped: base
2635 entities such as "&" and "<" are not escaped.
2636
2637 The '"no_recurse"' option will only return the text of the element,
2638 not of any included sub-elements (same as "text_only").
2639
2640 text_only
2641 Same as "text" except that the text returned doesn't include the
2642 text of sub-elements.
2643
2644 trimmed_text
2645 Same as "text" except that the text is trimmed: leading and
2646 trailing spaces are discarded, consecutive spaces are collapsed
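
The difference between the three text methods above is easiest to see on mixed content. A minimal sketch on a made-up paragraph:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<p>  Hello <b>big</b> world  </p>');
my $p = $twig->root;

my $all     = $p->text;          # text of the element and its sub-elements
my $own     = $p->text_only;     # text of the element only, <b> is skipped
my $trimmed = $p->trimmed_text;  # like text, with spaces trimmed and collapsed

# Brackets make the leading and trailing spaces visible.
print "[$_]\n" for $all, $own, $trimmed;
```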
2647
2648 set_text ($string)
2649 Set the text for the element: if the element is a "PCDATA", just
2650 set its text, otherwise cut all the children of the element and
2651 create a single "PCDATA" child for it, which holds the text.
2652
2653 merge ($elt2)
2654 Move the content of $elt2 within the element
2655
2656 insert ($tag1, [$optional_atts1], $tag2, [$optional_atts2],...)
2657 For each tag in the list, insert an element $tag as the only child
2658 of the element. The element gets the optional attributes in
2659 "$optional_atts<n>". All children of the element are set as
2660 children of the new element. The upper level element is returned.
2661
2662 $p->insert( table => { border=> 1}, 'tr', 'td')
2663
2664 puts the content of $p in a table with a visible border, a single
2665 "tr" and a single "td" and return the "table" element:
2666
2667 <p><table border="1"><tr><td>original content of p</td></tr></table></p>
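
The call above can be tried on a small document. This is only a sketch; the "doc" wrapper is an assumption made so that the paragraph has a parent.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<doc><p>original content of p</p></doc>');
my $p = $twig->root->first_child('p');

# The new elements are nested inside <p>; the former content of <p>
# ends up inside the innermost one (<td>).
$p->insert(table => { border => 1 }, 'tr', 'td');

my $result = $p->sprint;
print "$result\n";
```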
2668
2669 wrap_in (@tag)
2670 Wrap the element in the tags of @tag, which become its successive
2671 ancestors, and return the new element. "$elt->wrap_in( 'td', 'tr', 'table')"
2672 wraps the element as a single cell in a table for example.
2673
2674 Optionally each tag can be followed by a hashref of attributes,
2675 that will be set on the wrapping element:
2676
2677 $elt->wrap_in( p => { class => "advisory" }, div => { class => "intro", id => "div_intro" });
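
Where "insert" adds elements below the element, "wrap_in" adds them above it. A minimal sketch with a made-up "cell" element:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<doc><cell>42</cell></doc>');
my $cell = $twig->root->first_child('cell');

# Tags are listed from the innermost wrapper to the outermost one;
# the hashref applies to the tag just before it.
my $table = $cell->wrap_in('td', 'tr', table => { border => 1 });

my $result = $twig->root->sprint;
print $table->tag, "\n";
print "$result\n";
```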
2678
2679 insert_new_elt ($opt_position, $tag, $opt_atts_hashref, @opt_content)
2680 Combines a "new" and a "paste": creates a new element using $tag,
2681 $opt_atts_hashref and @opt_content which are arguments similar to
2682 those for "new", then paste it, using $opt_position or
2683 'first_child', relative to $elt.
2684
2685 Return the newly created element
2686
2687 erase
2688 Erase the element: the element is deleted and all of its children
2689 are pasted in its place.
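
A short sketch of "insert_new_elt" and "erase" on an invented document (the "note" element and its "type" attribute are assumptions):

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<doc><p>some <b>bold</b> text</p></doc>');
my $p = $twig->root->first_child('p');

# Create a <note> element and paste it right after the paragraph.
my $note = $p->insert_new_elt(after => 'note', { type => 'info' }, 'added');

# Remove the <b> markup but keep its content in place.
$p->first_child('b')->erase;

my $result = $twig->root->sprint;
print "$result\n";
```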
2690
2691 set_content ( $optional_atts, @list_of_elt_and_strings) (
2692 $optional_atts, '#EMPTY')
2693 Set the content for the element, from a list of strings and
2694 elements. Cuts all the element children, then pastes the list
2695 elements as the children. This method will create a "PCDATA"
2696 element for any strings in the list.
2697
2698 The $optional_atts argument is the ref of a hash of attributes. If
2699 this argument is used then the previous attributes are deleted,
2700 otherwise they are left untouched.
2701
2702 WARNING: if you rely on ID's then you will have to set the id
2703 yourself. At this point the element does not belong to a twig yet,
2704 so the ID attribute is not known so it won't be stored in the ID
2705 list.
2706
2707 A content of '"#EMPTY"' creates an empty element;
2708
2709 namespace ($optional_prefix)
2710 Return the URI of the namespace that $optional_prefix or the
2711 element name belongs to. If the name doesn't belong to any
2712 namespace, "undef" is returned.
2713
2714 local_name
2715 Return the local name (without the prefix) for the element
2716
2717 ns_prefix
2718 Return the namespace prefix for the element
2719
2720 current_ns_prefixes
2721 Return a list of namespace prefixes valid for the element. The
2722 order of the prefixes in the list has no meaning. If the default
2723 namespace is currently bound, '' appears in the list.
2724
2725 inherit_att ($att, @optional_tag_list)
2726 Return the value of an attribute inherited from parent tags. The
2727 value returned is found by looking for the attribute in the element
2728 then in turn in each of its ancestors. If the @optional_tag_list is
2729 supplied only those ancestors whose tag is in the list will be
2730 checked.
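
For instance, a "lang" attribute set on an ancestor can be looked up from any descendant. A sketch on a made-up document:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<doc lang="en"><sect><p>hi</p></sect><sect lang="fr"><p>salut</p></sect></doc>');
my @paras = $twig->root->descendants('p');

my $from_doc  = $paras[0]->inherit_att('lang');         # found on <doc>
my $from_sect = $paras[1]->inherit_att('lang');         # found on the enclosing <sect>
my $doc_only  = $paras[1]->inherit_att('lang', 'doc');  # only <doc> is checked

print "$from_doc $from_sect $doc_only\n";
```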
2731
2732 all_children_are ($optional_condition)
2733 return 1 if all children of the element pass the
2734 $optional_condition, 0 otherwise
2735
2736 level ($optional_condition)
2737 Return the depth of the element in the twig (root is 0). If
2738 $optional_condition is given then only ancestors that match the
2739 condition are counted.
2740
2741 WARNING: in a tree created using the "twig_roots" option this will
2742 not return the level in the document tree, level 0 will be the
2743 document root, level 1 will be the "twig_roots" elements. During
2744 the parsing (in a "twig_handler") you can use the "depth" method on
2745 the twig object to get the real parsing depth.
2746
2747 in ($potential_parent)
2748 Return true if the element is in the potential_parent
2749 ($potential_parent is an element)
2750
2751 in_context ($cond, $optional_level)
2752 Return true if the element is included in an element which passes
2753 $cond optionally within $optional_level levels. The returned value
2754 is the including element.
2755
2756 pcdata
2757 Return the text of a "PCDATA" element or "undef" if the element is
2758 not "PCDATA".
2759
2760 pcdata_xml_string
2761 Return the text of a "PCDATA" element or undef if the element is
2762 not "PCDATA". The text is "XML-escaped" ('&' and '<' are replaced
2763 by '&amp;' and '&lt;')
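
A minimal sketch contrasting "pcdata" and "pcdata_xml_string" on a made-up paragraph:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<p>a &lt; b &amp; c</p>');

# The text of <p> is held in a PCDATA child element.
my $pcdata = $twig->root->first_child('#PCDATA');

my $raw     = $pcdata->pcdata;             # entities are expanded
my $escaped = $pcdata->pcdata_xml_string;  # ready to be written back as XML

print "$raw\n$escaped\n";
```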
2764
2765 set_pcdata ($text)
2766 Set the text of a "PCDATA" element. This method does not check that
2767 the element is indeed a "PCDATA" so usually you should use
2768 "set_text" instead.
2769
2770 append_pcdata ($text)
2771 Add the text at the end of a "PCDATA" element.
2772
2773 is_cdata
2774 Return 1 if the element is a "CDATA" element, returns 0 otherwise.
2775
2776 is_text
2777 Return 1 if the element is a "CDATA" or "PCDATA" element, returns 0
2778 otherwise.
2779
2780 cdata
2781 Return the text of a "CDATA" element or "undef" if the element is
2782 not "CDATA".
2783
2784 cdata_string
2785 Return the XML string of a "CDATA" element, including the opening
2786 and closing markers.
2787
2788 set_cdata ($text)
2789 Set the text of a "CDATA" element.
2790
2791 append_cdata ($text)
2792 Add the text at the end of a "CDATA" element.
2793
2794 remove_cdata
2795 Turns all "CDATA" sections in the element into regular "PCDATA"
2796 elements. This is useful when converting XML to HTML, as browsers
2797 do not support CDATA sections.
2798
2799 extra_data
2800 Return the extra_data (comments and PI's) attached to an element
2801
2802 set_extra_data ($extra_data)
2803 Set the extra_data (comments and PI's) attached to an element
2804
2805 append_extra_data ($extra_data)
2806 Append extra_data to the existing extra_data before the element (if
2807 no previous extra_data exists then it is created)
2808
2809 set_asis
2810 Set a property of the element that causes it to be output without
2811 being XML escaped by the print functions: if it contains "a < b" it
2812 will be output as such and not as "a &lt; b". This can be useful to
2813 create text elements that will be output as markup. Note that all
2814 "PCDATA" descendants of the element are also marked as having the
2815 property (they are the ones that are actually impacted by the
2816 change).
2817
2818 If the element is a "CDATA" element it will also be output asis,
2819 without the "CDATA" markers. The same goes for any "CDATA"
2820 descendant of the element
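
The effect of "set_asis" can be seen by printing the same element before and after the call. A sketch on a standalone element built for the example:

```perl
use strict;
use warnings;
use XML::Twig;

# A standalone element whose text contains a character that needs escaping.
my $p = XML::Twig::Elt->new(p => 'a < b');

my $escaped = $p->sprint;   # normal output, the text is XML-escaped

$p->set_asis;
my $raw = $p->sprint;       # same element, text output untouched

print "$escaped\n$raw\n";
```

Note that the second form is no longer well-formed XML, so the property is best kept for text that is meant to be read as markup.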
2821
2822 set_not_asis
2823 Unsets the "asis" property for the element and its text
2824 descendants.
2825
2826 is_asis
2827 Return the "asis" property status of the element ( 1 or "undef")
2828
2829 closed
2830 Return true if the element has been closed. Might be useful if you
2831 are somewhere in the tree, during the parse, and have no idea
2832 whether a parent element is completely loaded or not.
2833
2834 get_type
2835 Return the type of the element: '"#ELT"' for "real" elements, or
2836 '"#PCDATA"', '"#CDATA"', '"#COMMENT"', '"#ENT"', '"#PI"'
2837
2838 is_elt
2839 Return the tag if the element is a "real" element, or 0 if it is
2840 "PCDATA", "CDATA"...
2841
2842 contains_only_text
2843 Return 1 if the element does not contain any other "real" element
2844
2845 contains_only ($exp)
2846 Return the list of children if all children of the element match
2847 the expression $exp
2848
2849 if( $para->contains_only( 'tt')) { ... }
2850
2851 contains_a_single ($exp)
2852 If the element contains a single child that matches the expression
2853 $exp returns that element. Otherwise returns 0.
2854
2855 is_field
2856 same as "contains_only_text"
2857
2858 is_pcdata
2859 Return 1 if the element is a "PCDATA" element, returns 0 otherwise.
2860
2861 is_ent
2862 Return 1 if the element is an entity (an unexpanded entity)
2863 element, return 0 otherwise.
2864
2865 is_empty
2866 Return 1 if the element is empty, 0 otherwise
2867
2868 set_empty
2869 Flags the element as empty. No further check is made, so if the
2870 element is actually not empty the output will be wrong. The only
2871 effect of this method is that the output will be
2872 "<tag att="value"/>".
2873
2874 set_not_empty
2875 Flags the element as not empty. If it is actually empty then the
2876 element will be output as "<tag att="value"></tag>".
2877
2878 is_pi
2879 Return 1 if the element is a processing instruction ("#PI")
2880 element, return 0 otherwise.
2881
2882 target
2883 Return the target of a processing instruction
2884
2885 set_target ($target)
2886 Set the target of a processing instruction
2887
2888 data
2889 Return the data part of a processing instruction
2890
2891 set_data ($data)
2892 Set the data of a processing instruction
2893
2894 set_pi ($target, $data)
2895 Set the target and data of a processing instruction
2896
2897 pi_string
2898 Return the string form of a processing instruction ("<?target
2899 data?>")
2900
2901 is_comment
2902 Return 1 if the element is a comment ("#COMMENT") element, return 0
2903 otherwise.
2904
2905 set_comment ($comment_text)
2906 Set the text for a comment
2907
2908 comment
2909 Return the content of a comment (just the text, not the "<!--" and
2910 "-->")
2911
2912 comment_string
2913 Return the XML string for a comment ("<!-- comment -->")
2914
2915 set_ent ($entity)
2916 Set a (non-expanded) entity ("#ENT"). $entity is the entity text
2917 ("&ent;")
2918
2919 ent Return the entity for an entity ("#ENT") element ("&ent;")
2920
2921 ent_name
2922 Return the entity name for an entity ("#ENT") element ("ent")
2923
2924 ent_string
2925 Return the entity, either expanded if the expanded version is
2926 available, or non-expanded ("&ent;") otherwise
2927
2928 child ($offset, $optional_condition)
2929 Return the $offset-th child of the element, optionally the
2930 $offset-th child that matches $optional_condition. The children are
2931 treated as a list, so "$elt->child( 0)" is the first child, while
2932 "$elt->child( -1)" is the last child.
2933
2934 child_text ($offset, $optional_condition)
2935 Return the text of a child or "undef" if the child does not
2936 exist. Arguments are the same as "child".
2937
2938 last_child ($optional_condition)
2939 Return the last child of the element, or the last child matching
2940 $optional_condition (ie the last of the element children matching
2941 the condition).
2942
2943 last_child_text ($optional_condition)
2944 Same as "first_child_text" but for the last child.
2945
2946 sibling ($offset, $optional_condition)
2947 Return the next or previous $offset-th sibling of the element, or
2948 the $offset-th one matching $optional_condition. If $offset is
2949 negative then a previous sibling is returned, if $offset is
2950 positive then a next sibling is returned. "$offset=0" returns the
2951 element if there is no condition or if the element matches the
2952 condition, "undef" otherwise.
2953
2954 sibling_text ($offset, $optional_condition)
2955 Return the text of a sibling or "undef" if the sibling does not
2956 exist. Arguments are the same as "sibling".
2957
2958 prev_siblings ($optional_condition)
2959 Return the list of previous siblings (optionally matching
2960 $optional_condition) for the element. The elements are ordered in
2961 document order.
2962
2963 next_siblings ($optional_condition)
2964 Return the list of siblings (optionally matching
2965 $optional_condition) following the element. The elements are
2966 ordered in document order.
2967
2968 pos ($optional_condition)
2969 Return the position of the element in the children list. The first
2970 child has a position of 1 (as in XPath).
2971
2972 If the $optional_condition is given then only siblings that match
2973 the condition are counted. If the element itself does not match the
2974 condition then 0 is returned.
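
The offsets used by "child", "sibling" and "pos" can be confusing at first, since "child" counts from 0 while "pos" counts from 1. A sketch on a made-up list:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<list><item>a</item><note>n</note><item>b</item><item>c</item></list>');
my $list = $twig->root;

my $first  = $list->child(0);          # first child, whatever its tag
my $last   = $list->child(-1);         # last child
my $second = $list->child(1, 'item');  # second child matching 'item'

my $next      = $second->sibling(1);           # next sibling
my $prev      = $second->sibling(-1);          # previous sibling, here the <note>
my $prev_item = $second->sibling(-1, 'item');  # previous sibling that is an item

my $pos      = $second->pos;           # position among all children
my $item_pos = $second->pos('item');   # position among item children only

print join(' ', $first->text, $last->text, $second->text), "\n";
print join(' ', $next->text, $prev->tag, $prev_item->text), "\n";
print "$pos $item_pos\n";
```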
2975
2976 atts
2977 Return a hash ref containing the element attributes
2978
2979 set_atts ({ att1=>$att1_val, att2=> $att2_val... })
2980 Set the element attributes with the hash ref supplied as the
2981 argument. The previous attributes are lost (ie the attributes set
2982 by "set_atts" replace all of the attributes of the element).
2983
2984 You can also pass a list instead of a hashref: "$elt->set_atts(
2985 att1 => 'val1',...)"
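
A short sketch of the attribute methods, on a standalone "img" element created for the example:

```perl
use strict;
use warnings;
use XML::Twig;

my $img = XML::Twig::Elt->new('img');

$img->set_atts({ src => 'a.png', alt => 'A' });
my $count = $img->att_nb;
my @names = sort $img->att_names;

# set_atts replaces the whole attribute set: src and alt are gone.
$img->set_atts({ width => 10 });
my @after = sort keys %{ $img->atts };

$img->del_atts;
my $none = $img->has_no_atts;

print "$count: @names\n";
print "@after\n";
```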
2986
2987 del_atts
2988 Deletes all the element attributes.
2989
2990 att_nb
2991 Return the number of attributes for the element
2992
2993 has_atts
2994 Return true if the element has attributes (in fact return the
2995 number of attributes, thus being an alias to "att_nb").
2996
2997 has_no_atts
2998 Return true if the element has no attributes, false (0) otherwise
2999
3000 att_names
3001 return a list of the attribute names for the element
3002
3003 att_xml_string ($att, $options)
3004 Return the attribute value, where '&', '<' and quote (" or the
3005 value of the quote option at twig creation) are XML-escaped.
3006
3007 The options are passed as a hashref, setting "escape_gt" to a true
3008 value will also escape '>': "$elt->att_xml_string( 'myatt', { escape_gt => 1 })".
3009
3010 set_id ($id)
3011 Set the "id" attribute of the element to the value. See "elt_id "
3012 to change the id attribute name
3013
3014 id Gets the id attribute value
3015
3016 del_id ($id)
3017 Deletes the "id" attribute of the element and remove it from the id
3018 list for the document
3019
3020 class
3021 Return the "class" attribute for the element (methods on the
3022 "class" attribute are quite convenient when dealing with XHTML, or
3023 plain XML that will eventually be displayed using CSS)
3024
3025 set_class ($class)
3026 Set the "class" attribute for the element to $class
3027
3028 add_to_class ($class)
3029 Add $class to the element "class" attribute: the new class is added
3030 only if it is not already present. Note that classes are sorted
3031 alphabetically, so the "class" attribute can be changed even if the
3032 class is already there
3033
3034 att_to_class ($att)
3035 Set the "class" attribute to the value of attribute $att
3036
3037 add_att_to_class ($att)
3038 Add the value of attribute $att to the "class" attribute of the
3039 element
3040
3041 move_att_to_class ($att)
3042 Add the value of attribute $att to the "class" attribute of the
3043 element and delete the attribute
3044
3045 tag_to_class
3046 Set the "class" attribute of the element to the element tag
3047
3048 add_tag_to_class
3049 Add the element tag to its "class" attribute
3050
3051 set_tag_class ($new_tag)
3052 Add the element tag to its "class" attribute and sets the tag to
3053 $new_tag
3054
3055 in_class ($class)
3056 Return true (1) if the element is in the class $class (if $class is
3057 one of the tokens in the element "class" attribute)
3058
3059 tag_to_span
3060 Change the element tag to "span" and set its class to the old tag
3061
3062 tag_to_div
3063 Change the element tag to "div" and set its class to the old tag
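
A minimal sketch of the "class" helpers above. The elements and class names are invented for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my $para = XML::Twig::Elt->new(para => 'some text');
$para->set_class('note');
$para->add_to_class('alert');
$para->add_to_class('note');    # already present, not added twice

my $classes  = $para->class;
my $is_alert = $para->in_class('alert');
my $is_warn  = $para->in_class('warn');

# Turn a semantic tag into a <span> carrying the old tag as its class.
my $kw = XML::Twig::Elt->new(keyword => 'twig');
$kw->tag_to_span;

print "$classes\n";
print $kw->sprint, "\n";
```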
3064
3065 DESTROY
3066 Frees the element from memory.
3067
3068 start_tag
3069 Return the string for the start tag for the element, including the
3070 "/>" at the end of an empty element tag
3071
3072 end_tag
3073 Return the string for the end tag of an element. For an empty
3074 element, this returns the empty string ('').
3075
3076 xml_string @optional_options
3077 Equivalent to "$elt->sprint( 1)", returns the string for the entire
3078 element, excluding the element's tags (but nested element tags are
3079 present)
3080
3081 The '"no_recurse"' option will only return the text of the element,
3082 not of any included sub-elements (same as "xml_text_only").
3083
3084 inner_xml
3085 Another synonym for xml_string
3086
3087 outer_xml
3088 Another synonym for sprint
3089
3090 xml_text
3091 Return the text of the element, encoded (and processed by the
3092 current "output_filter" or "output_encoding" options), without any
3093 tag.
3094
3095 xml_text_only
3096 Same as "xml_text" except that the text returned doesn't include
3097 the text of sub-elements.
3098
3099 set_pretty_print ($style)
3100 Set the pretty print method, amongst '"none"' (default),
3101 '"nsgmls"', '"nice"', '"indented"', '"record"' and '"record_c"'
3102
3103 pretty_print styles:
3104
3105 none
3106 the default, no "\n" is used
3107
3108 nsgmls
3109 nsgmls style, with "\n" added within tags
3110
3111 nice
3112 adds "\n" wherever possible (NOT SAFE, can lead to invalid XML)
3113
3114 indented
3115 same as "nice" plus indents elements (NOT SAFE, can lead to
3116 invalid XML)
3117
3118 record
3119 table-oriented pretty print, one field per line
3120
3121 record_c
3122 table-oriented pretty print, more compact than "record", one
3123 record per line
3124
3125 set_empty_tag_style ($style)
3126 Set the method to output empty tags, amongst '"normal"' (default),
3127 '"html"', and '"expand"',
3128
3129 "normal" outputs an empty tag '"<tag/>"', "html" adds a space
3130 '"<tag />"' for elements that can be empty in XHTML and "expand"
3131 outputs '"<tag></tag>"'
3132
3133 set_remove_cdata ($flag)
3134 set (or unset) the flag that forces the twig to output CDATA
3135 sections as regular (escaped) PCDATA
3136
3137 set_indent ($string)
3138 Set the indentation for the indented pretty print style (default is
3139 2 spaces)
3140
3141 set_quote ($quote)
3142 Set the quotes used for attributes. can be '"double"' (default) or
3143 '"single"'
3144
3145 cmp ($elt)
3146 Compare the order of the 2 elements in a twig.
3147
3148 $a is the <A>...</A> element, $b is the <B>...</B> element
3149
3150 document                          $a->cmp( $b)
3151 <A> ... </A> ... <B> ... </B>     -1
3152 <A> ... <B> ... </B> ... </A>     -1
3153 <B> ... </B> ... <A> ... </A>      1
3154 <B> ... <A> ... </A> ... </B>      1
3155 $a == $b                           0
3156 $a and $b not in the same tree     undef
3157
3158 before ($elt)
3159 Return 1 if the element starts before $elt, 0 otherwise. If the 2
3160 elements are not in the same twig then return "undef".
3161
3162 if( $a->cmp( $b) == -1) { return 1; } else { return 0; }
3163
3164 after ($elt)
3165 Return 1 if the element starts after $elt, 0 otherwise. If the 2
3166 elements are not in the same twig then return "undef".
3167
3168 if( $a->cmp( $b) == 1) { return 1; } else { return 0; }
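
The comparison methods can be checked on a tiny document. In this sketch the tags "a", "b" and "c" are made up, and "b" is nested inside "a".

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<doc><a><b/></a><c/></doc>');
my $elt_a = $twig->root->first_child('a');
my $elt_b = $elt_a->first_child('b');
my $elt_c = $twig->root->first_child('c');

my $nested  = $elt_a->cmp($elt_b);   # <a> starts before the <b> it contains
my $later   = $elt_c->cmp($elt_a);   # <c> starts after <a>
my $same    = $elt_a->cmp($elt_a);
my $before  = $elt_a->before($elt_c);
my $after   = $elt_a->after($elt_c);

print "$nested $later $same $before $after\n";
```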
3169
3170 other comparison methods
3171 lt
3172 le
3173 gt
3174 ge
3175 path
3176 Return the element context in a form similar to XPath's short form:
3177 '"/root/tag1/../tag"'
3178
3179 xpath
3180 Return a unique XPath expression that can be used to find the
3181 element again.
3182
3183 It looks like "/doc/sect[3]/title": unique elements do not have an
3184 index, the others do.
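
To see the difference between "path" and "xpath", consider a sketch with two sections (an invented document):

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<doc><sect><title>one</title></sect><sect><title>two</title></sect></doc>');

# The title of the second section.
my $title = ($twig->root->children('sect'))[1]->first_child('title');

my $path  = $title->path;    # context only, shared by both titles
my $xpath = $title->xpath;   # indexed where needed, so it is unique

# The xpath can be fed back to get_xpath to find the element again.
my @found = $twig->get_xpath($xpath);

print "$path\n$xpath\n", $found[0]->text, "\n";
```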
3185
3186 private methods
3187 Low-level methods on the twig:
3188
3189 set_parent ($parent)
3190 set_first_child ($first_child)
3191 set_last_child ($last_child)
3192 set_prev_sibling ($prev_sibling)
3193 set_next_sibling ($next_sibling)
3194 set_twig_current
3195 del_twig_current
3196 twig_current
3197 flush
3198 This method should NOT be used, always flush the twig, not an
3199 element.
3200
3201 contains_text
3202
3203 Those methods should not be used, unless of course you find some
3204 creative and interesting, not to mention useful, ways to do it.
3205
3206 cond
3207 Most of the navigation functions accept a condition as an optional
3208 argument. The first element (or all elements for "children" or
3209 "ancestors") that passes the condition is returned.
3210
3211 The condition is a single step of an XPath expression using the XPath
3212 subset defined by "get_xpath". In addition, the condition can be:
3215
3216 #ELT
3217 return a "real" element (not a PCDATA, CDATA, comment or pi
3218 element)
3219
3220 #TEXT
3221 return a PCDATA or CDATA element
3222
3223 regular expression
3224 return an element whose tag matches the regexp. The regexp has to
3225 be created with "qr//" (hence this is available only on perl 5.005
3226 and above)
3227
3228 code reference
3229 applies the code, passing the current element as argument, if the
3230 code returns true then the element is returned, if it returns false
3231 then the code is applied to the next candidate.
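
The kinds of condition listed above can all be passed to the same navigation methods. A sketch on a made-up document with mixed content:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<doc><h1>T</h1>intro<p class="x">a</p><h2>S</h2><p>b</p></doc>');
my $root = $twig->root;

my $text     = $root->first_child('#TEXT');          # first text child
my @elts     = $root->children('#ELT');              # "real" elements only
my @headings = $root->children(qr/^h\d$/);           # tag matched by a regexp
my @marked   = $root->children('p[@class="x"]');     # single XPath step
my $by_code  = $root->first_child(sub { $_[0]->text eq 'b' });  # code reference

print $text->text, " ", scalar(@elts), " ", scalar(@headings), " ",
      scalar(@marked), " ", $by_code->tag, "\n";
```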
3232
3233 XML::Twig::XPath
3234 XML::Twig implements a subset of XPath through the "get_xpath" method.
3235
3236 If you want to use the whole XPath power, then you can use
3237 "XML::Twig::XPath" instead. In this case "XML::Twig" uses "XML::XPath"
3238 to execute XPath queries. You will of course need "XML::XPath"
3239 installed to be able to use "XML::Twig::XPath".
3240
3241 See XML::XPath for more information.
3242
3243 The methods you can use are:
3244
3245 findnodes ($path)
3246 return a list of nodes found by $path.
3247
3248 findnodes_as_string ($path)
3249 return the nodes found reproduced as XML. The result is not
3250 guaranteed to be valid XML though.
3251
3252 findvalue ($path)
3253 return the concatenation of the text content of the result nodes
3254
3255 In order for "XML::XPath" to be used as the XPath engine the following
3256 methods are included in "XML::Twig":
3257
3258 in XML::Twig
3259
3260 getRootNode
3261 getParentNode
3262 getChildNodes
3263
3264 in XML::Twig::Elt
3265
3266 string_value
3267 toString
3268 getName
3269 getRootNode
3270 getNextSibling
3271 getPreviousSibling
3272 isElementNode
3273 isTextNode
3274 isPI
3275 isPINode
3276 isProcessingInstructionNode
3277 isComment
3278 isCommentNode
3279 getTarget
3280 getChildNodes
3281 getElementById
3282
3283 XML::Twig::XPath::Elt
3284 The methods you can use are the same as on "XML::Twig::XPath" elements:
3285
3286 findnodes ($path)
3287 return a list of nodes found by $path.
3288
3289 findnodes_as_string ($path)
3290 return the nodes found reproduced as XML. The result is not
3291 guaranteed to be valid XML though.
3292
3293 findvalue ($path)
3294 return the concatenation of the text content of the result nodes
3295
3296 XML::Twig::Entity_list
3297 new Create an entity list.
3298
3299 add ($ent)
3300 Add an entity to an entity list.
3301
3302 add_new_ent ($name, $val, $sysid, $pubid, $ndata, $param)
3303 Create a new entity and add it to the entity list
3304
3305 delete ($ent or $tag).
3306 Delete an entity (defined by its name or by the Entity object) from
3307 the list.
3308
3309 print ($optional_filehandle)
3310 Print the entity list.
3311
3312 list
3313 Return the list as an array
3314
3315 XML::Twig::Entity
3316 new ($name, $val, $sysid, $pubid, $ndata, $param)
3317 Same arguments as the Entity handler for XML::Parser.
3318
3319 print ($optional_filehandle)
3320 Print an entity declaration.
3321
3322 name
3323 Return the name of the entity
3324
3325 val Return the value of the entity
3326
3327 sysid
3328 Return the system id for the entity (for NDATA entities)
3329
3330 pubid
3331 Return the public id for the entity (for NDATA entities)
3332
3333 ndata
3334 Return true if the entity is an NDATA entity
3335
3336 param
3337 Return true if the entity is a parameter entity
3338
3339 text
3340 Return the entity declaration text.
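
As a rough sketch of how these two classes fit together, the entities declared in an internal DTD can be listed through the twig's "entity_list" method. The entity name and value are invented, and the example assumes the list is populated during a plain parse.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<!DOCTYPE doc [ <!ENTITY notice "(c) Example Corp"> ]><doc>&notice;</doc>');

# Each item of the list is an XML::Twig::Entity object.
my @entities = $twig->entity_list->list;

foreach my $ent (@entities) {
    printf "%s = %s\n", $ent->name, $ent->val;
}
```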
3341
3343 Additional examples (and a complete tutorial) can be found on the
3344 XML::Twig Page <http://www.xmltwig.com/xmltwig/>
3345
3346 To figure out what flush does, call the following script with an XML
3347 file and an element name as arguments:
3348
3349 use XML::Twig;
3350
3351 my ($file, $elt)= @ARGV;
3352 my $t= XML::Twig->new( twig_handlers =>
3353 { $elt => sub {$_[0]->flush; print "\n[flushed here]\n";} });
3354 $t->parsefile( $file, ErrorContext => 2);
3355 $t->flush;
3356 print "\n";
3357
3359 Subclassing XML::Twig
3360 Useful methods:
3361
3362 elt_class
3363 In order to subclass "XML::Twig" you will probably need to subclass
3364 also "XML::Twig::Elt". Use the "elt_class" option when you create
3365 the "XML::Twig" object to get the elements created in a different
3366 class (which should be a subclass of "XML::Twig::Elt").
3367
3368 add_options
3369 If you inherit the "XML::Twig" new method but want to add more
3370 options to it, you can use this method to prevent XML::Twig from
3371 issuing warnings for those additional options.
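
A minimal sketch of the "elt_class" option. The "My::Elt" package and its "shout" method are hypothetical names used only for illustration.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical element subclass adding one convenience method.
package My::Elt;
our @ISA = ('XML::Twig::Elt');
sub shout { my $elt = shift; return uc $elt->text; }

package main;

# Elements of this twig are created in My::Elt instead of XML::Twig::Elt.
my $twig = XML::Twig->new(elt_class => 'My::Elt');
$twig->parse('<doc><p>hi</p></doc>');

my $class = ref $twig->root;
my $loud  = $twig->root->first_child('p')->shout;

print "$class $loud\n";
```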
3372
3373 DTD Handling
3374 There are 3 possibilities here. They are:
3375
3376 No DTD
3377 No doctype, no DTD information, no entity information, the world is
3378 simple...
3379
3380 Internal DTD
3381 The XML document includes an internal DTD, and maybe entity
3382 declarations.
3383
3384 If you use the load_DTD option when creating the twig the DTD
3385 information and the entity declarations can be accessed.
3386
3387 The DTD and the entity declarations will be "flush"'ed (or
3388 "print"'ed) either as is (if they have not been modified) or as
3389 reconstructed (poorly, comments are lost, order is not kept, due to
3390 its content this DTD should not be viewed by anyone) if they have
3391 been modified. You can also modify them directly by changing the
3392 "$twig->{twig_doctype}->{internal}" field (straight from
3393 XML::Parser, see the "Doctype" handler doc)
3394
3395 External DTD
3396 The XML document includes a reference to an external DTD, and maybe
3397 entity declarations.
3398
3399 If you use the "load_DTD" option when creating the twig the DTD
3400 information and the entity declarations can be accessed. The entity
3401 declarations will be "flush"'ed (or "print"'ed) either as is (if
3402 they have not been modified) or as reconstructed (badly, comments
3403 are lost, order is not kept).
3404
3405 You can change the doctype through the "$twig->set_doctype" method
3406 and print the dtd through the "$twig->dtd_text" or
3407 "$twig->dtd_print" methods.
3409
3410 If you need to modify the entity list this is probably the easiest
3411 way to do it.
3412
3413 Flush
3414 If you set handlers and use "flush", do not forget to flush the twig
3415 one last time AFTER the parsing, or you might be missing the end of the
3416 document.
3417
3418 Remember that element handlers are called when the element is CLOSED,
3419 so if you have handlers for nested elements the inner handlers will be
3420 called first. This makes it trickier than it would seem to number
3421 nested clauses, for example.
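
The calling order described above can be observed by recording each handler call. A sketch, with made-up "outer" and "inner" tags:

```perl
use strict;
use warnings;
use XML::Twig;

my @order;
my $twig = XML::Twig->new(
    twig_handlers => {
        # Handlers fire when the element is closed, so each inner
        # element is reported before the outer element containing it.
        outer => sub { push @order, 'outer' },
        inner => sub { push @order, 'inner:' . $_->text },
    },
);
$twig->parse('<doc><outer><inner>1</inner><inner>2</inner></outer></doc>');

print "@order\n";
```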
3422
3424 entity handling
3425 Due to XML::Parser behaviour, non-base entities in attribute values
3426 disappear: "att="val&ent;"" will be turned into "att => val",
3427 unless you use the "keep_encoding" argument to "XML::Twig->new"
3428
3429 DTD handling
3430 The DTD handling methods are quite buggy. No one uses them and it
3431 seems very difficult to get them to work in all cases, including
3432 with several slightly incompatible versions of XML::Parser and of
3433 libexpat.
3434
3435 Basically you can read the DTD, output it back properly, and update
3436 entities, but not much more.
3437
3438 So use XML::Twig with standalone documents, or with documents
3439 referring to an external DTD, but don't expect it to properly parse
3440 and even output back the DTD.
3441
3442 memory leak
3443 If you use a lot of twigs you might find that you leak quite a lot
3444 of memory (about 2KB per twig). You can use the "dispose" method
3445 to free that memory after you are done.
3446
3447 If you create elements the same thing might happen, use the
3448 "delete" method to get rid of them.
3449
3450 Alternatively installing the "Scalar::Util" (or "WeakRef") module
3451 on a version of Perl that supports it (>5.6.0) will get rid of the
3452 memory leaks automagically.
3453
3454 ID list
3455 The ID list is NOT updated when elements are cut or deleted.
3456
3457 change_gi
3458 This method will not function properly if you do:
3459
3460 $twig->change_gi( $old1, $new);
3461 $twig->change_gi( $old2, $new);
3462 $twig->change_gi( $new, $even_newer);
3463
3464 sanity check on XML::Parser method calls
3465 XML::Twig should really prevent calls to some XML::Parser methods,
3466 especially the "setHandlers" method.
3467
3468 pretty printing
3469 Pretty printing (at least using the '"indented"' style) is hard to
3470 get right! Only elements that belong to the document will be
3471 properly indented. Printing elements that do not belong to the twig
3472 makes it impossible for XML::Twig to figure out their depth, and
3473 thus their indentation level.
3474
3475 Also there is an unavoidable bug when using "flush" and pretty
3476 printing for elements with mixed content that start with an
3477 embedded element:
3478
3479 <elt><b>b</b>toto<b>bold</b></elt>
3480
3481 will be output as
3482
3483 <elt>
3484 <b>b</b>toto<b>bold</b></elt>
3485
3486 if you flush the twig when you find the "<b>" element.
3487
3489 These are the things that can mess up calling code, especially if
3490 threaded. They might also cause problems under mod_perl.
3491
3492 Exported constants
3493 Whether you want them or not you get them! These are subroutines to
3494 use as constants when creating or testing elements:
3495
3496 PCDATA return '#PCDATA'
3497 CDATA return '#CDATA'
3498 PI return '#PI', I had the choice between PROC and PI :--(
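
As an illustration of what these values are for, the sketch below
tests the children of an element against the '#PCDATA' tag. It
deliberately compares against the literal string rather than relying
on the exported subroutine, and the sample document is made up:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;

# Hypothetical mixed content: text, an element, then more text.
$twig->parse('<doc>some <b>bold</b> text</doc>');

# Text children are elements too; their tag is the string '#PCDATA'.
my @tags = map { $_->tag } $twig->root->children;
print join(' ', @tags), "\n";      # prints "#PCDATA b #PCDATA"

# is_pcdata gives the same answer without comparing strings.
my $text_children = grep { $_->is_pcdata } $twig->root->children;
print "$text_children\n";          # prints "2"
```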
3499
3500 Module scoped values: constants
3501 these should cause no trouble:
3502
3503 %base_ent= ( '>' => '&gt;',
3504 '<' => '&lt;',
3505 '&' => '&amp;',
3506 "'" => '&apos;',
3507 '"' => '&quot;',
3508 );
3509 CDATA_START = "<![CDATA[";
3510 CDATA_END = "]]>";
3511 PI_START = "<?";
3512 PI_END = "?>";
3513 COMMENT_START = "<!--";
3514 COMMENT_END = "-->";
3515
3516 pretty print styles
3517
3518 ( $NSGMLS, $NICE, $INDENTED, $INDENTED_C, $WRAPPED, $RECORD1, $RECORD2)= (1..7);
3519
3520 empty tag output style
3521
3522 ( $HTML, $EXPAND)= (1..2);
3523
3524 Module scoped values: might be changed
3525 Most of these deal with pretty printing, so the worst that can
3526 happen is probably that XML output does not look right, but is
3527 still valid and processed identically by XML processors.
3528
3529 $empty_tag_style can mess up HTML browsers though and changing $ID
3530 would most likely create problems.
3531
3532 $pretty=0; # pretty print style
3533 $quote='"'; # quote for attributes
3534 $INDENT= ' '; # indent for indented pretty print
3535 $empty_tag_style= 0; # how to display empty tags
3536 $ID # attribute used as an id ('id' by default)
3537
3538 Module scoped values: definitely changed
3539 These 2 variables are used to replace tags by an index, thus saving
3540 some space when creating a twig. If they really cause you too much
3541 trouble, let me know, it is probably possible to create either a
3542 switch or at least a version of XML::Twig that does not perform
3543 this optimization.
3544
3545 %gi2index; # tag => index
3546 @index2gi; # list of tags
3547
3548 If you need to manipulate all those values, you can use the following
3549 methods on the XML::Twig object:
3550
3551 global_state
3552 Return a hashref with all the global variables used by XML::Twig
3553
3554 The hash has the following fields: "pretty", "quote", "indent",
3555 "empty_tag_style", "keep_encoding", "expand_external_entities",
3556 "output_filter", "output_text_filter", "keep_atts_order"
3557
3558 set_global_state ($state)
3559 Set the global state, $state is a hashref
3560
3561 save_global_state
3562 Save the current global state
3563
3564 restore_global_state
3565 Restore the state previously saved using "save_global_state"
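
A minimal sketch of how these methods might be combined, assuming you
want to change the pretty print style temporarily and then put things
back as they were (the variable names are illustrative):

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<doc><elt>text</elt></doc>');

# Remember the global settings before touching them.
my $before = $twig->global_state;
$twig->save_global_state;

# Change a global setting: this affects every twig, not just this one.
$twig->set_pretty_print('indented');
my $during = $twig->global_state;

# Put the saved settings back.
$twig->restore_global_state;
my $after = $twig->global_state;

printf "pretty: before=%s during=%s after=%s\n",
    map { defined $_->{pretty} ? $_->{pretty} : 'undef' }
    $before, $during, $after;
```

Wrapping changes this way keeps the module-scoped values from leaking
into other code that shares the same interpreter, for example under
mod_perl.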
3566
3568 SAX handlers
3569 Allowing XML::Twig to work on top of any SAX parser
3570
3571 multiple twigs are not well supported
3572 A number of twig features are just global at the moment. These
3573 include the ID list and the "tag pool" (if you use "change_gi" then
3574 you change the tag for ALL twigs).
3575
3576 A future version will try to support this while trying not to be too
3577 hard on performance (at least when a single twig is used!).
3578
3580 Michel Rodriguez <mirod@xmltwig.com>
3581
3583 This library is free software; you can redistribute it and/or modify it
3584 under the same terms as Perl itself.
3585
3586 Bug reports should be sent using RT:
3587 <http://rt.cpan.org/NoAuth/Bugs.html?Dist=XML-Twig>
3589
3590 Comments can be sent to mirod@xmltwig.com
3591
3592 The XML::Twig page is at <http://www.xmltwig.com/xmltwig/>. It includes
3593 the development version of the module, a slightly better version of the
3594 documentation, examples, and a tutorial, "Processing XML efficiently
3595 with Perl and XML::Twig":
3596 <http://www.xmltwig.com/xmltwig/tutorial/index.html>
3597
3599 Complete docs, including a tutorial, examples, an easier to use HTML
3600 version of the docs, a quick reference card and a FAQ are available at
3601 <http://www.xmltwig.com/xmltwig/>
3602
3603 git repository at <http://github.com/mirod/xmltwig>
3604
3605 XML::Parser, XML::Parser::Expat, XML::XPath, Encode, Text::Iconv,
3606 Scalar::Util
3607
3608 Alternative Modules
3609 XML::Twig is not the only XML processing module available on CPAN (far
3610 from it!).
3611
3612 The main alternative I would recommend is XML::LibXML.
3613
3614 Here is a quick comparison of the 2 modules:
3615
3616 XML::LibXML, actually "libxml2" on which it is based, sticks to the
3617 standards, and implements a good number of them in a rather strict way:
3618 XML, XPath, DOM, RelaxNG, I must be forgetting a couple (XInclude?). It
3619 is fast and rather frugal memory-wise.
3620
3621 XML::Twig is older: when I started writing it XML::Parser/expat was the
3622 only game in town. It implements XML and that's about it (plus a subset
3623 of XPath, and you can use XML::Twig::XPath if you have XML::XPathEngine
3624 installed for full support). It is slower and requires more memory for
3625 a full tree than XML::LibXML. On the plus side (yes, there is a plus
3626 side!) it lets you process a big document in chunks, and thus lets you
3627 tackle documents that couldn't be loaded in memory by XML::LibXML, and
3628 it offers a lot (and I mean a LOT!) of higher-level methods, for
3629 everything, from adding structure to "low-level" XML, to shortcuts for
3630 XHTML conversions and more. It also DWIMs quite a bit, getting comments
3631 and non-significant whitespaces out of the way but preserving them in
3632 the output for example. As it does not stick to the DOM, it also
3633 usually leads to shorter code than in XML::LibXML.
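
To make the "chunks" point concrete, here is a minimal sketch that
handles one element at a time and then releases it, so the full tree
is never held in memory. The "item" element and "price" attribute are
made up for the example:

```perl
use strict;
use warnings;
use XML::Twig;

my $total = 0;
my $twig  = XML::Twig->new(
    twig_handlers => {
        # Called each time an <item> has been completely parsed.
        item => sub {
            my ($t, $item) = @_;
            $total += $item->att('price');
            $t->purge;    # discard what has been parsed so far
        },
    },
);

# Hypothetical input; a real use would call parsefile on a large file.
$twig->parse('<items><item price="2"/><item price="3"/></items>');

print "$total\n";    # prints "5"
```

Use "flush" instead of "purge" in the handler if the processed part of
the document should also be output.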
3634
3635 Beyond the pure features of the 2 modules, XML::LibXML seems to be
3636 preferred by "XML-purists", while XML::Twig seems to be more used by
3637 Perl Hackers who have to deal with XML. As you have noted, XML::Twig
3638 also comes with quite a lot of docs, but I am sure if you ask for help
3639 about XML::LibXML here or on Perlmonks you will get answers.
3640
3641 Note that it is actually quite hard for me to compare the 2 modules: on
3642 one hand I know XML::Twig inside-out and I can get it to do pretty much
3643 anything I need to (or I improve it ;--), while I have a very basic
3644 knowledge of XML::LibXML. So feature-wise, I'd rather use XML::Twig
3645 ;--). On the other hand, I am painfully aware of some of the
3646 deficiencies, potential bugs and plain ugly code that lurk in
3647 XML::Twig, even though you are unlikely to be affected by them (unless
3648 for example you need to change the DTD of a document programmatically),
3649 while I haven't looked much into XML::LibXML so it still looks shiny
3650 and clean to me.
3651
3652 That said, if you need to process a document that is too big to fit in
3653 memory and XML::Twig is too slow for you, my reluctant advice would be
3654 to use "bare" XML::Parser. It won't be as easy to use as XML::Twig:
3655 basically with XML::Twig you trade some speed (depending on what you do
3656 from a factor 3 to... none) for ease-of-use, but it will be easier IMHO
3657 than using SAX (albeit not standard), and at this point a LOT faster
3658 (see the last test in
3659 <http://www.xmltwig.com/article/simple_benchmark/>).
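
For comparison, a minimal sketch of the "bare" XML::Parser style,
assuming all that is needed is a count of start tags (the document and
the %count hash are illustrative only):

```perl
use strict;
use warnings;
use XML::Parser;

# Count how many times each element name is opened.
my %count;
my $parser = XML::Parser->new(
    Handlers => {
        # Start handlers receive the expat object, the tag name,
        # then the attributes as a flat name/value list.
        Start => sub {
            my ($expat, $tag, %atts) = @_;
            $count{$tag}++;
        },
    },
);

$parser->parse('<doc><p>one</p><p>two</p></doc>');

print "$_: $count{$_}\n" for sort keys %count;    # prints "doc: 1" then "p: 2"
```

No tree is built at all here: any state has to be tracked by hand in
the handlers, which is the ease-of-use cost mentioned above.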
3660
3661
3662
3663perl v5.12.0 2010-05-07 Twig(3)