Twig(3)               User Contributed Perl Documentation              Twig(3)

NAME

XML::Twig - A perl module for processing huge XML documents in tree
mode.

SYNOPSIS

Note that this documentation is intended as a reference to the module.
11
12 Complete docs, including a tutorial, examples, an easier to use HTML
13 version, a quick reference card and a FAQ are available at
14 http://www.xmltwig.com/xmltwig
15
Small documents (loaded in memory as a tree):

    my $twig=XML::Twig->new();    # create the twig
    $twig->parsefile( 'doc.xml'); # build it
    my_process( $twig);           # use twig methods to process it
    $twig->print;                 # output the twig

Huge documents (processed in combined stream/tree mode):

    # at most one div will be loaded in memory
    my $twig=XML::Twig->new(
      twig_handlers =>
        { title   => sub { $_->set_tag( 'h2') }, # change title tags to h2
          para    => sub { $_->set_tag( 'p')  }, # change para to p
          hidden  => sub { $_->delete;        }, # remove hidden elements
          list    => \&my_list_process,          # process list elements
          div     => sub { $_[0]->flush;      }, # output and free memory
        },
      pretty_print => 'indented',                # output will be nicely formatted
      empty_tags   => 'html',                    # outputs <empty_tag />
    );
    $twig->parsefile( 'doc.xml');                # parse, calling the handlers
    $twig->flush;                                # flush the end of the document

See XML::Twig 101 for other ways to use the module, as a filter for
example.

DESCRIPTION

This module provides a way to process XML documents. It is built on top
of "XML::Parser".
45
46 The module offers a tree interface to the document, while allowing you
47 to output the parts of it that have been completely processed.
48
It allows minimal resource (CPU and memory) usage by building the tree
only for the parts of the documents that need actual processing,
through the use of the "twig_roots" and "twig_print_outside_roots"
options. The "finish" and "finish_print" methods also help to
improve performance.

XML::Twig tries to make simple things easy, so it does its best to
take care of a lot of the (usually) annoying (but sometimes necessary)
features that come with XML and XML::Parser.

XML::Twig 101

XML::Twig can be used either on "small" XML documents (that fit in
memory) or on huge ones, by processing parts of the document and
outputting or discarding them once they are processed.
63
64 Loading an XML document and processing it
65
    my $t= XML::Twig->new();
    $t->parse( '<d><title>title</title><para>p 1</para><para>p 2</para></d>');
    my $root= $t->root;
    $root->set_tag( 'html');                  # change doc to html
    my $title= $root->first_child( 'title');  # get the title
    $title->set_tag( 'h1');                   # turn it into h1
    my @para= $root->children( 'para');       # get the para children
    foreach my $para (@para)
      { $para->set_tag( 'p'); }               # turn them into p
    $t->print;                                # output the document
76
77 Other useful methods include:
78
att: "$elt->att( 'foo')" returns the "foo" attribute for an element,

set_att: "$elt->set_att( foo => "bar")" sets the "foo" attribute to
the "bar" value,

next_sibling: "$elt->next_sibling" returns the next sibling in the
document (in the example "$title->next_sibling" is the first "para");
you can also (and actually should) use "$elt->next_sibling( 'para')" to
get it.

The document can also be transformed through the use of the cut, copy,
paste and move methods: "$title->cut; $title->paste( after => $p);" for
example.
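
As a minimal sketch of the cut and paste calls mentioned above, the snippet below moves the title of the small document used earlier so that it sits after the first para. The variable names ($p, $moved) are chosen for illustration only.

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new();
$t->parse( '<d><title>title</title><para>p 1</para><para>p 2</para></d>');

my $root  = $t->root;
my $title = $root->first_child( 'title');
my $p     = $root->first_child( 'para');   # the first para

$title->cut;                     # detach the title from the tree
$title->paste( after => $p);     # re-attach it right after the first para

my $moved = $root->sprint;       # serialize the modified tree to a string
print "$moved\n";
```

The title element keeps its content while it is detached, so nothing is lost between the cut and the paste.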
93
94 And much, much more, see Elt.
95
96 Processing an XML document chunk by chunk
97
One of the strengths of XML::Twig is that it lets you work with files
that do not fit in memory (BTW storing an XML document in memory as a
tree is quite memory-expensive, the expansion factor being often around
10).

To do this you can define handlers, that will be called once a specific
element has been completely parsed. In these handlers you can access
the element and process it as you see fit, using the navigation and the
cut-n-paste methods, plus lots of convenient ones like "prefix". Once
the element is completely processed you can then "flush" it, which
will output it and free the memory. You can also "purge" it if you
don't need to output it (if you are just extracting some data from the
document for example). The handler will be called again once the next
relevant element has been parsed.
112
    my $t= XML::Twig->new( twig_handlers =>
                             { section => \&section,
                               para    => sub { $_->set_tag( 'p'); },
                             },
                         );
    $t->parsefile( 'doc.xml');
    $t->flush; # don't forget to flush one last time in the end or anything
               # after the last </section> tag will not be output

    # the handler is called once a section is completely parsed, ie when
    # the end tag for section is found, it receives the twig itself and
    # the element (including all its sub-elements) as arguments
    sub section
      { my( $t, $section)= @_;        # arguments for all twig_handlers
        $section->set_tag( 'div');    # change the tag name, my favourite method...
        # let's use the attribute nb as a prefix to the title
        my $title= $section->first_child( 'title'); # find the title
        my $nb= $title->att( 'nb');   # get the attribute
        $title->prefix( "$nb - ");    # easy isn't it?
        $section->flush;              # outputs the section and frees memory
      }
134
There is of course more to it: you can trigger handlers on more
elaborate conditions than just the name of the element, "section/title"
for example.

    my $t= XML::Twig->new( twig_handlers =>
                             { 'section/title' => sub { $_->print } }
                         )
                    ->parsefile( 'doc.xml');

Here "sub { $_->print }" simply prints the current element ($_ is
aliased to the element in the handler).

You can also trigger a handler on a test on an attribute:

    my $t= XML::Twig->new( twig_handlers =>
                             { 'section[@level="1"]' => sub { $_->print } }
                         )
                    ->parsefile( 'doc.xml');

You can also use "start_tag_handlers" to process an element as soon as
the start tag is found. Besides "prefix" you can also use "suffix".
156
157 Processing just parts of an XML document
158
The twig_roots mode builds only the required sub-trees from the
document. Anything outside of the twig roots will just be ignored:

    my $t= XML::Twig->new(
             # the twig will include just the root and selected titles
             twig_roots => { 'section/title' => \&print_n_purge,
                             'annex/title'   => \&print_n_purge
                           }
           );
    $t->parsefile( 'doc.xml');

    sub print_n_purge
      { my( $t, $elt)= @_;
        print $elt->text;    # print the text (including sub-element texts)
        $t->purge;           # frees the memory
      }

You can use that mode when you want to process parts of a document but
are not interested in the rest and you don't want to pay the price,
either in time or memory, to build the tree for it.
179
180 Building an XML filter
181
182 You can combine the "twig_roots" and the "twig_print_outside_roots"
183 options to build filters, which let you modify selected elements and
184 will output the rest of the document as is.
185
This would convert prices in $ to prices in Euro in a document:

    my $t= XML::Twig->new(
             twig_roots               => { 'price' => \&convert, }, # process prices
             twig_print_outside_roots => 1,                          # print the rest
           );
    $t->parsefile( 'doc.xml');

    sub convert
      { my( $t, $price)= @_;
        my $currency= $price->att( 'currency');       # get the currency
        if( $currency eq 'USD')
          { my $usd_price= $price->text;              # get the price
            # %rate is just a conversion table
            my $euro_price= $usd_price * $rate{usd2euro};
            $price->set_text( $euro_price);           # set the new price
            $price->set_att( currency => 'EUR');      # don't forget this!
          }
        $price->print;                                # output the price
      }
206
207 XML::Twig and various versions of Perl, XML::Parser and expat:
208
Before being uploaded to CPAN, XML::Twig 3.22 has been tested under the
following environments:

linux-x86
    perl 5.6.2, expat 1.95.8, XML::Parser 2.34
    perl 5.8.0, expat 1.95.8, XML::Parser 2.34
    perl 5.8.7, expat 1.95.8, XML::Parser 2.34

Solaris
    perl 5.6.1, expat 1.95.2, XML::Parser 2.31

XML::Twig is a lot more sensitive to variations in versions of perl,
XML::Parser and expat than to the OS, so this should cover some
reasonable configurations.

The "recommended configuration" is perl 5.8.3+ (for good Unicode
support), XML::Parser 2.31+ and expat 1.95.5+.

See <http://testers.cpan.org/search?request=dist&dist=XML-Twig> for the
CPAN testers reports on XML::Twig, which list all tested
configurations.

An Atom feed of the CPAN Testers results is available at
<http://xmltwig.com/rss/twig_testers.rss>

Finally:

XML::Twig does NOT work with expat 1.95.4
XML::Twig only works with XML::Parser 2.27 in perl 5.6.*
    Note that I can't compile XML::Parser 2.27 anymore, so I can't
    guarantee that it still works

XML::Parser 2.28 does not really work

When in doubt, upgrade expat, XML::Parser and Scalar::Util

Finally, for some optional features, XML::Twig depends on some
additional modules. The complete list, which depends somewhat on the
version of Perl that you are running, is given by running
"t/zz_dump_config.t"
248
Whitespaces
    Whitespaces that look non-significant are discarded. This behaviour
    can be controlled using the "keep_spaces", "keep_spaces_in" and
    "discard_spaces_in" options.

Encoding
    You can specify that you want the output in the same encoding as
    the input (provided you have valid XML, which means you have to
    specify the encoding either in the document or when you create the
    Twig object) using the "keep_encoding" option.

    You can also use "output_encoding" to convert the internal UTF-8
    format to the required encoding.

Comments and Processing Instructions (PI)
    Comments and PI's can be hidden from the processing, but still
    appear in the output (they are carried by the "real" element
    closest to them).

Pretty Printing
    XML::Twig can output the document pretty printed so it is easier to
    read for us humans.

Surviving an untimely death
    XML parsers are supposed to react violently when fed improper XML.
    XML::Parser just dies.

    XML::Twig provides the "safe_parse" and the "safe_parsefile"
    methods which wrap the parse in an eval and return either the
    parsed twig or 0 in case of failure.

Private attributes
    Attributes with a name starting with # (illegal in XML) will not be
    output, so you can safely use them to store temporary values during
    processing. Note that you can store anything in a private
    attribute, not just text: it's just a regular Perl variable, so a
    reference to an object or a huge data structure is perfectly fine.
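
Two of the features above, "safe_parse" and private attributes, can be sketched together as follows. The documents, the "#visited" attribute name and the variable names are made up for the illustration; the only behaviour assumed is what is described above (a false return value on failure, with the error left in $@ by the eval, and private attributes left out of the output).

```perl
use strict;
use warnings;
use XML::Twig;

# malformed document: safe_parse returns a false value instead of dying
my $bad = XML::Twig->new->safe_parse( '<doc><open></doc>');
print "parse failed: $@\n" unless $bad;

# well-formed document: safe_parse returns the twig
my $good = XML::Twig->new->safe_parse( '<doc><item>text</item></doc>');
my $item = $good->root->first_child( 'item');

# a private attribute can hold any Perl value, here a hash reference
$item->set_att( '#visited' => { count => 1 });
my $count = $item->att( '#visited')->{count};

# the private attribute is not part of the serialized document
my $output = $good->root->sprint;
print "$output\n";
```

A new twig is created for each parse here, so a failed parse never leaves a half-built tree behind in an object that is then reused.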
287
XML::Twig uses a very limited number of classes. The ones you are most
likely to use are "XML::Twig" of course, which represents a complete
XML document, including the document itself (the root of the document
itself is "root"), its handlers, its input or output filters... The
other main class is "XML::Twig::Elt", which models an XML element.
Element here has a very wide definition: it can be a regular element,
but also text, with an element "tag" of "#PCDATA" (or "#CDATA"), an
entity (tag is "#ENT"), a Processing Instruction ("#PI") or a comment
("#COMMENT").

Those are the 2 commonly used classes.

You might want to look at the "elt_class" option if you want to subclass
"XML::Twig::Elt".

Attributes are just attached to their parent element, they are not
objects per se. (Please use the provided methods "att" and "set_att" to
access them: if you access them as a hash, then your code becomes
implementation dependent and might break in the future).

Other classes that are seldom used are "XML::Twig::Entity_list" and
"XML::Twig::Entity".

If you use "XML::Twig::XPath" instead of "XML::Twig", elements are then
created as "XML::Twig::XPath::Elt".
314
316 XML::Twig
317
A twig is a subclass of XML::Parser, so all XML::Parser methods can be
called on a twig object, including parse and parsefile. "setHandlers"
on the other hand cannot be used, see "BUGS".
321
322 new This is a class method, the constructor for XML::Twig. Options are
323 passed as keyword value pairs. Recognized options are the same as
324 XML::Parser, plus some XML::Twig specifics.
325
326 New Options:
327
328 twig_handlers
This argument consists of a hash "{ expression => \&handler}"
where expression is an XPath-like expression (+ some others).

XPath expressions are limited to using the child and descendant
axes (indeed you can't specify an axis), and predicates cannot
be nested. You can use the "string", or "string(<tag>)"
function (except in "twig_roots" triggers).
336
337 Additionally you can use regexps (/ delimited) to match
338 attribute and string values.
339
340 Examples:
341
342 foo
343 foo/bar
344 foo//bar
345 /foo/bar
346 /foo//bar
347 /foo/bar[@att1 = "val1" and @att2 = "val2"]/baz[@a >= 1]
348 foo[string()=~ /^duh!+/]
349 /foo[string(bar)=~ /\d+/]/baz[@att != 3]
350
351 #CDATA can be used to call a handler for a CDATA. #COMMENT can
352 be used to call a handler for comments
353
354 Some additional (non-XPath) expressions are also provided for
355 convenience:
356
357 processing instructions
'?' or '#PI' triggers the handler for any processing
instruction, and '?<target>' or '#PI <target>' triggers a
handler for processing instructions with the given target
(ex: '#PI xml-stylesheet').
362
363 level(<level>)
364 Triggers the handler on any element at that level in the
365 tree (root is level 1)
366
367 _all_
368 Triggers the handler for all elements in the tree
369
370 _default_
371 Triggers the handler for each element that does NOT have
372 any other handler.
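
As a small illustration of the "_all_" expression, the sketch below counts how often each tag occurs in a document. The sample document and the %count hash are assumptions made for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my %count;
my $t = XML::Twig->new(
    twig_handlers => {
        # called for every element, once its end tag has been parsed
        _all_ => sub { $count{ $_->tag }++; return 1; },
    },
);
$t->parse( '<doc><a/><a/><b>text</b></doc>');

print "$_: $count{$_}\n" foreach sort keys %count;
```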
373
Expressions are evaluated against the input document, which
means that even if you have changed the tag of an element
(changing the tag of a parent element from a handler for
example) the change will not impact the expression evaluation.
There is an exception to this: "private" attributes (whose name
starts with a '#', and which can only be created during the
parsing, as they are not valid XML) are checked against the
current twig.

Handlers are triggered in fixed order, sorted by their type
(xpath expressions first, then regexps, then level), then by
whether they specify a full path (starting at the root element)
or not, then by number of steps in the expression, then
number of predicates, then number of tests in predicates.
Handlers where the last step does not specify a tag
("foo/bar/*") are triggered after other XPath handlers. Finally
"_all_" handlers are triggered last.

Important: once a handler has been triggered, if it returns 0
then no other handler is called, except an "_all_" handler which
will be called anyway.

If a handler returns a true value and other handlers apply,
then the next applicable handler will be called. Lather, rinse,
repeat... The exception to that rule is when the
"do_not_chain_handlers" option is set, in which case only the
first handler will be called.

Note that it might be a good idea to explicitly return a short
true value (like 1) from handlers: this ensures that other
applicable handlers are called even if the last statement for
the handler happens to evaluate to false. This might also
speed up the code by avoiding the result of the last statement
of the code being copied and passed to the code managing
handlers. It can really pay to have 1 instead of a long string
returned.
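
The effect of the return value on chaining can be checked with a short experiment. In the sketch below two handlers match the same title element; make_twig is a hypothetical helper written for this example, and the value it receives is returned by both handlers. The example only counts calls, so it does not depend on which of the two handlers is triggered first.

```perl
use strict;
use warnings;
use XML::Twig;

my @calls;    # names of the handlers that were called

# hypothetical helper: builds a twig whose handlers all return $ret
sub make_twig {
    my( $ret) = @_;
    return XML::Twig->new(
        twig_handlers => {
            'section/title' => sub { push @calls, 'section/title'; return $ret; },
            'title'         => sub { push @calls, 'title';         return $ret; },
        },
    );
}

my $xml = '<doc><section><title>t</title></section></doc>';

@calls = ();
make_twig( 1)->parse( $xml);
my $chained = scalar @calls;    # both handlers run when they return 1

@calls = ();
make_twig( 0)->parse( $xml);
my $stopped = scalar @calls;    # the chain stops after the first handler

print "returning 1: $chained call(s), returning 0: $stopped call(s)\n";
```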
410
When an element is CLOSED the corresponding handler is called,
with 2 arguments: the twig and the element. The twig
includes the document tree that has been built so far, the
element is the complete sub-tree for the element. This means that
handlers for inner elements are called before handlers for
outer elements.
417
418 $_ is also set to the element, so it is easy to write inline
419 handlers like
420
421 para => sub { $_->set_tag( 'p'); }
422
423 Text is stored in elements whose tag is #PCDATA (due to mixed
424 content, text and sub-element in an element there is no way to
425 store the text as just an attribute of the enclosing element).
426
427 Warning: if you have used purge or flush on the twig the ele‐
428 ment might not be complete, some of its children might have
429 been entirely flushed or purged, and the start tag might even
430 have been printed (by "flush") already, so changing its tag
431 might not give the expected result.
432
433 twig_roots
This argument lets you build the tree only for those elements
you are interested in.

Example:

    my $t= XML::Twig->new( twig_roots => { title => 1, subtitle => 1});
    $t->parsefile( $file);
    my $t= XML::Twig->new( twig_roots => { 'section/title' => 1});
    $t->parsefile( $file);

returns a twig containing a document including only "title" and
"subtitle" elements, as children of the root element.

You can use generic_attribute_condition, attribute_condition,
full_path, partial_path, tag, tag_regexp, _default_ and _all_
to trigger the building of the twig. string_condition and
regexp_condition cannot be used, as the content of the element,
and the string, have not yet been parsed when the condition is
checked.

WARNING: paths are checked against the document. Even if the
"twig_roots" option is used they will be checked against the
full document tree, not the virtual tree created by XML::Twig.

WARNING: twig_roots elements should NOT be nested, that would
hopelessly confuse XML::Twig ;--(

Note: you can set handlers (twig_handlers) using twig_roots

Example:

    my $t= XML::Twig->new( twig_roots =>
                             { title    => sub { $_[1]->print; },
                               subtitle => \&process_subtitle
                             }
                         );
    $t->parsefile( $file);
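
A self-contained variant of the example above, using an in-memory document instead of a file, might look like the sketch below. It collects the text of section titles only; the sample document and the @titles array are assumptions made for the illustration.

```perl
use strict;
use warnings;
use XML::Twig;

my @titles;
my $t = XML::Twig->new(
    twig_roots => {
        # only titles directly under a section are built and handled
        'section/title' => sub {
            my( $twig, $title) = @_;
            push @titles, $title->text;
            $twig->purge;            # the element is no longer needed
        },
    },
);
$t->parse( '<doc><title>Doc</title>'
         . '<section><title>One</title><p>skipped</p></section>'
         . '<section><title>Two</title></section></doc>');

print join( ', ', @titles), "\n";
```

The title of the document itself is not collected, since the path is checked against the full document tree and that title is a child of doc, not of a section.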
468
469 twig_print_outside_roots
470 To be used in conjunction with the "twig_roots" argument. When
471 set to a true value this will print the document outside of the
472 "twig_roots" elements.
473
Example:

    my $t= XML::Twig->new( twig_roots               => { title => \&number_title },
                           twig_print_outside_roots => 1,
                         );
    $t->parsefile( $file);
    { my $nb;
      sub number_title
        { my( $twig, $title)= @_;
          $nb++;
          $title->prefix( "$nb ");
          $title->print;
        }
    }
486
487 This example prints the document outside of the title element,
488 calls "number_title" for each "title" element, prints it, and
489 then resumes printing the document. The twig is built only for
490 the "title" elements.
491
492 If the value is a reference to a file handle then the document
493 outside the "twig_roots" elements will be output to this file
494 handle:
495
    open( OUT, ">out_file") or die "cannot open out file out_file:$!";
    my $t= XML::Twig->new( twig_roots => { title => \&number_title },
                           # default output to OUT
                           twig_print_outside_roots => \*OUT,
                         );

    { my $nb;
      sub number_title
        { my( $twig, $title)= @_;
          $nb++;
          $title->prefix( "$nb ");
          $title->print( \*OUT);    # you have to print to \*OUT here
        }
    }
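
The same idea can be tried without touching the file system by writing to an in-memory file handle. This is only a sketch: it assumes that a lexical file handle opened on a scalar is accepted wherever a reference to a file handle is, and the sample document and variable names are made up.

```perl
use strict;
use warnings;
use XML::Twig;

my $buffer = '';
open( my $out, '>', \$buffer) or die "cannot open in-memory handle: $!";

my $nb = 0;
my $t = XML::Twig->new(
    twig_roots => {
        title => sub {
            my( $twig, $title) = @_;
            $nb++;
            $title->prefix( "$nb ");
            $title->print( $out);    # same handle as the rest of the document
        },
    },
    twig_print_outside_roots => $out,
);
$t->parse( '<doc><title>Intro</title><p>text</p><title>Next</title></doc>');
close $out;

print "$buffer\n";
```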
510
511 start_tag_handlers
A hash "{ expression => \&handler}". Sets element handlers that
are called when the element is open (at the end of the
XML::Parser "Start" handler). The handlers are called with 2
params: the twig and the element. The element is empty at that
point, its attributes are created though.

You can use generic_attribute_condition, attribute_condition,
full_path, partial_path, tag, tag_regexp, _default_ and _all_
to trigger the handler.

string_condition and regexp_condition cannot be used, as the
content of the element, and the string, have not yet been
parsed when the condition is checked.

The main uses for those handlers are to change the tag name
(you might have to do it as soon as you find the open tag if
you plan to "flush" the twig at some point in the element), and
to create temporary attributes that will be used when
processing sub-elements with "twig_handlers".

You should also use it to change tags if you use "flush". If
you change the tag in a regular "twig_handler" then the start
tag might already have been flushed.

Note: "start_tag" handlers can be called outside of
"twig_roots" if this argument is used, in this case handlers
are called with the following arguments: $t (the twig), $tag
(the tag of the element) and %att (a hash of the attributes of
the element).

If the "twig_print_outside_roots" argument is also used, if the
last handler called returns a "true" value, then the start
tag will be output as it appeared in the original document, if
the handler returns a "false" value then the start tag will
not be printed (so you can print a modified string yourself for
example).

Note that you can use the ignore method in "start_tag_handlers"
(and only there).
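
The second use mentioned above, creating a temporary attribute when the start tag is found and reading it back while processing a sub-element, can be sketched as follows. The "#number" private attribute, the sample document and the variable names are assumptions made for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my $n = 0;
my @opened;    # what each start tag handler saw
my $t = XML::Twig->new(
    start_tag_handlers => {
        section => sub {
            my( $twig, $elt) = @_;
            # attributes are already set, but no child has been parsed yet
            push @opened, $elt->att( 'id') . ':' . $elt->children_count;
            $elt->set_att( '#number' => ++$n);    # private, never output
            return 1;
        },
    },
    twig_handlers => {
        'section/title' => sub {
            # the start tag handler of the parent has already run
            $_->prefix( $_->parent->att( '#number') . '. ');
            return 1;
        },
    },
);
$t->parse( '<doc><section id="s1"><title>One</title></section>'
         . '<section id="s2"><title>Two</title></section></doc>');

my @titles = map { $_->text } $t->root->descendants( 'title');
print join( ' | ', @titles), "\n";
```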
551
552 end_tag_handlers
A hash "{ expression => \&handler}". Sets element handlers that
are called when the element is closed (at the end of the
XML::Parser "End" handler). The handlers are called with 2
params: the twig and the tag of the element.
557
558 twig_handlers are called when an element is completely parsed,
559 so why have this redundant option? There is only one use for
560 "end_tag_handlers": when using the "twig_roots" option, to
561 trigger a handler for an element outside the roots. It is for
562 example very useful to number titles in a document using nested
563 sections:
564
565 my @no= (0);
566 my $no;
567 my $t= XML::Twig->new(
568 start_tag_handlers =>
569 { section => sub { $no[$#no]++; $no= join '.', @no; push @no, 0; } },
570 twig_roots =>
571 { title => sub { $_[1]->prefix( $no); $_[1]->print; } },
572 end_tag_handlers => { section => sub { pop @no; } },
573 twig_print_outside_roots => 1
574 );
575 $t->parsefile( $file);
576
577 Using the "end_tag_handlers" argument without "twig_roots" will
578 result in an error.
579
580 do_not_chain_handlers
If this option is set to a true value, then only one handler
will be called for each element, even if several satisfy the
condition.

Note that the "_all_" handler will still be called regardless.
586
587 ignore_elts
588 This option lets you ignore elements when building the twig.
589 This is useful in cases where you cannot use "twig_roots" to
590 ignore elements, for example if the element to ignore is a sib‐
591 ling of elements you are interested in.
592
593 Example:
594
595 my $twig= XML::Twig->new( ignore_elts => { elt => 1 });
596 $twig->parsefile( 'doc.xml');
597
598 This will build the complete twig for the document, except that
599 all "elt" elements (and their children) will be left out.
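
To see the effect on a small in-memory document, here is a sketch that ignores "note" elements; the tag names and the sample document are made up for the illustration.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new( ignore_elts => { note => 1 });
$twig->parse( '<doc><para>a</para><note>skip <b>me</b></note><para>c</para></doc>');

my @tags   = map { $_->tag } $twig->root->children;    # tags of the children kept
my $output = $twig->root->sprint;

print "@tags\n$output\n";
```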
600
601 char_handler
602 A reference to a subroutine that will be called every time
603 "PCDATA" is found.
604
605 The subroutine receives the string as argument, and returns the
606 modified string:
607
608 # we want all strings in upper case
609 sub my_char_handler
610 { my( $text)= @_;
611 $text= uc( $text);
612 return $text;
613 }
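
The handler is passed to the constructor through the "char_handler" option. A minimal sketch, using an anonymous subroutine equivalent to my_char_handler above and a made-up sample document:

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new(
    # receives each piece of character data, returns the string to store
    char_handler => sub { my( $text) = @_; return uc( $text); },
);
$t->parse( '<doc><p>hello</p></doc>');

my $text = $t->root->first_child( 'p')->text;
print "$text\n";
```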
614
615 elt_class
The name of a class used to store elements. This class should
inherit from "XML::Twig::Elt" (and by default it is
"XML::Twig::Elt"). This option is used to subclass the element
class and extend it with new methods.
620
621 This option is needed because during the parsing of the XML,
622 elements are created by "XML::Twig", without any control from
623 the user code.
624
625 keep_atts_order
Setting this option to a true value causes the attribute hash
to be tied to a "Tie::IxHash" object. This means that
"Tie::IxHash" needs to be installed for this option to be
available. It also means that the hash keeps its order, so you
will get the attributes in order. This allows outputting the
attributes in the same order as they were in the original
document.
633
634 keep_encoding
This is a (slightly?) evil option: if the XML document is not
UTF-8 encoded and you want to keep it that way, then setting
keep_encoding will use the "Expat" original_string method for
character data, thus keeping the original encoding, as well as
the original entities in the strings.

See the "t/test6.t" test file to see what results you can
expect from the various encoding options.

WARNING: if the original encoding is multi-byte then attribute
parsing will be EXTREMELY unsafe under any Perl before 5.6, as
it uses regular expressions which do not deal properly with
multi-byte characters. You can specify an alternate function to
parse the start tags with the "parse_start_tag" option (see
below).

WARNING: this option is NOT used when parsing with the
non-blocking parser ("parse_start", "parse_more", "parse_done"
methods) which you probably should not use with XML::Twig anyway
as they are totally untested!
655
656 output_encoding
This option generates an output_filter using "Encode",
"Text::Iconv" or "Unicode::Map8" and "Unicode::String", and
sets the encoding in the XML declaration. This is the easiest
way to deal with encodings. If you need more sophisticated
features, look at "output_filter" below.
662
663 output_filter
664 This option is used to convert the character encoding of the
665 output document. It is passed either a string corresponding to
666 a predefined filter or a subroutine reference. The filter will
667 be called every time a document or element is processed by the
668 "print" functions ("print", "sprint", "flush").
669
670 Pre-defined filters:
671
672 latin1
673 uses either "Encode", "Text::Iconv" or "Unicode::Map8" and
674 "Unicode::String" or a regexp (which works only with
675 XML::Parser 2.27), in this order, to convert all characters
676 to ISO-8859-1 (aka latin1)
677
678 html
does the same conversion as "latin1", plus encodes entities
using "HTML::Entities" (oddly enough you will need to have
HTML::Entities installed for it to be available). This
should only be used if the tags and attribute names
themselves are in US-ASCII, or they will be converted and the
output will not be valid XML any more
685
686 safe
converts the output to ASCII (US) only plus character
entities ("&#nnn;"). This should be used only if the tags
and attribute names themselves are in US-ASCII, or they
will be converted and the output will not be valid XML any
more

safe_hex
same as "safe" except that the character entities are in
hexadecimal ("&#xnnn;")
696
697 encode_convert ($encoding)
Returns a subref that can be used to convert utf8 strings to
$encoding. Uses "Encode".
700
701 my $conv = XML::Twig::encode_convert( 'latin1');
702 my $t = XML::Twig->new(output_filter => $conv);
703
704 iconv_convert ($encoding)
705 this function is used to create a filter subroutine that
706 will be used to convert the characters to the target encod‐
707 ing using "Text::Iconv" (which needs to be installed, look
708 at the documentation for the module and for the "iconv"
709 library to find out which encodings are available on your
710 system)
711
712 my $conv = XML::Twig::iconv_convert( 'latin1');
713 my $t = XML::Twig->new(output_filter => $conv);
714
715 unicode_convert ($encoding)
this function is used to create a filter subroutine that
will be used to convert the characters to the target
encoding using "Unicode::String" and "Unicode::Map8" (which
need to be installed, look at the documentation for the
modules to find out which encodings are available on your
system)
722
723 my $conv = XML::Twig::unicode_convert( 'latin1');
724 my $t = XML::Twig->new(output_filter => $conv);
725
The "text" and "att" methods do not use the filter, so their
results are always in unicode.
728
729 Those predeclared filters are based on subroutines that can be
730 used by themselves (as "XML::Twig::foo").
731
732 html_encode ($string)
733 Use "HTML::Entities" to encode a utf8 string
734
735 safe_encode ($string)
736 Use either a regexp (perl < 5.8) or "Encode" to encode non-
737 ascii characters in the string in "&#<nnnn>;" format
738
739 safe_encode_hex ($string)
740 Use either a regexp (perl < 5.8) or "Encode" to encode non-
741 ascii characters in the string in "&#x<nnnn>;" format
742
743 regexp2latin1 ($string)
744 Use a regexp to encode a utf8 string into latin 1
745 (ISO-8859-1). Does not work with Perl 5.8.0!
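
As an illustration of a predefined filter, the sketch below applies "safe" to a document containing one non-ASCII character (written as a character reference in the source so that the example itself stays ASCII). The sample document and variable names are assumptions; "Encode" is assumed to be available, as it is on perl 5.8 and later.

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new( output_filter => 'safe');
$t->parse( '<doc>caf&#233;</doc>');

my $filtered = $t->sprint;        # goes through the output filter
my $text     = $t->root->text;    # not filtered: a regular Perl string

print "$filtered\n";
```

The filter only affects what the print functions produce, so code that works on the result of "text" still sees the original character.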
746
747 output_text_filter
Same as output_filter, except it doesn't apply to the brackets
and quotes around attribute values. This is useful for all
filters that could change the tagging, basically anything that
does not just change the encoding of the output. "html", "safe"
and "safe_hex" are better used with this option.
753
754 input_filter
755 This option is similar to "output_filter" except the filter is
756 applied to the characters before they are stored in the twig,
757 at parsing time.
758
759 remove_cdata
760 Setting this option to a true value will force the twig to out‐
761 put CDATA sections as regular (escaped) PCDATA
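
A quick way to compare the output with and without this option is sketched below; the sample document is made up, and the chained calls rely on "parse" returning the twig.

```perl
use strict;
use warnings;
use XML::Twig;

my $xml = '<doc><![CDATA[a < b]]></doc>';

# default: the CDATA section is output as it was
my $kept    = XML::Twig->new()->parse( $xml)->sprint;

# remove_cdata: the content is output as escaped text
my $removed = XML::Twig->new( remove_cdata => 1)->parse( $xml)->sprint;

print "$kept\n$removed\n";
```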
762
763 parse_start_tag
764 If you use the "keep_encoding" option then this option can be
765 used to replace the default parsing function. You should pro‐
766 vide a coderef (a reference to a subroutine) as the argument,
767 this subroutine takes the original tag (given by
768 XML::Parser::Expat "original_string()" method) and returns a
769 tag and the attributes in a hash (or in a list
770 attribute_name/attribute value).
771
772 expand_external_ents
When this option is used external entities (that are defined)
are expanded when the document is output using "print"
functions such as "print", "sprint", "flush" and "xml_string".
Note that in the twig the entity will be stored as an element
with a tag '"#ENT"', the entity will not be expanded there, so
you might want to process the entities before outputting it.
779
780 load_DTD
781 If this argument is set to a true value, "parse" or "parsefile"
782 on the twig will load the DTD information. This information
783 can then be accessed through the twig, in a "DTD_handler" for
784 example. This will load even an external DTD.
785
786 Default and fixed values for attributes will also be filled,
787 based on the DTD.
788
789 Note that to do this the module will generate a temporary file
790 in the current directory. If this is a problem let me know and
791 I will add an option to specify an alternate directory.
792
793 See DTD Handling for more information
794
795 DTD_handler
796 Set a handler that will be called once the doctype (and the
797 DTD) have been loaded, with 2 arguments, the twig and the DTD.
798
799 no_prolog
800 Does not output a prolog (XML declaration and DTD)
801
802 id This optional argument gives the name of an attribute that can
803 be used as an ID in the document. Elements whose ID is known
804 can be accessed through the elt_id method. id defaults to 'id'.
805 See "BUGS "
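
A short sketch of both the default and a custom ID attribute follows. The documents and the "ref" attribute name are made up for the illustration; the twig with the default attribute is parsed first, before any twig changes the ID attribute.

```perl
use strict;
use warnings;
use XML::Twig;

# default: the 'id' attribute is used
my $t = XML::Twig->new();
$t->parse( '<doc><item id="i1">first</item><item id="i2">second</item></doc>');
my $found = $t->elt_id( 'i2')->text;

# custom: use the 'ref' attribute as the ID
my $t2 = XML::Twig->new( id => 'ref');
$t2->parse( '<doc><item ref="a">alpha</item></doc>');
my $by_ref = $t2->elt_id( 'a')->text;

print "$found, $by_ref\n";
```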
806
807 discard_spaces
808 If this optional argument is set to a true value then spaces
809 are discarded when they look non-significant: strings contain‐
810 ing only spaces are discarded. This argument is set to true by
811 default.
812
813 keep_spaces
814 If this optional argument is set to a true value then all spa‐
815 ces in the document are kept, and stored as "PCDATA".
816
817 Warning: adding this option can result in changes in the twig
818 generated: space that was previously discarded might end up in
819 a new text element. See the difference by calling the following
820 code with 0 and 1 as arguments:
821
822 perl -MXML::Twig -e'print XML::Twig->new( keep_spaces => shift)->parse( "<d> \n<e/></d>")->_dump'
823
824 "keep_spaces" and "discard_spaces" cannot both be set.
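
The effect of "keep_spaces" can also be checked from a script. The following is a minimal sketch, not part of the original documentation; the variable names are illustrative. It parses the same document as the one-liner above with the option off and on, and lists the tags of the children of the root.

```perl
use strict;
use warnings;
use XML::Twig;

my $xml = "<d> \n<e/></d>";
my %children_for;    # keep_spaces value => tags of the children of <d>

for my $keep (0, 1) {
    my $twig = XML::Twig->new( keep_spaces => $keep )->parse($xml);
    $children_for{$keep} = [ map { $_->tag } $twig->root->children ];
    print "keep_spaces => $keep: @{ $children_for{$keep} }\n";
}
```

With the option on, the whitespace before the "e" element should show up as an extra "#PCDATA" child.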
825
826 discard_spaces_in
827 This argument sets "keep_spaces" to true but will cause the
828 twig builder to discard spaces in the elements listed.
829
830 The syntax for using this argument is:
831
832 XML::Twig->new( discard_spaces_in => [ 'elt1', 'elt2']);
833
834 keep_spaces_in
835 This argument sets "discard_spaces" to true but will cause the
836 twig builder to keep spaces in the elements listed.
837
838 The syntax for using this argument is:
839
840 XML::Twig->new( keep_spaces_in => [ 'elt1', 'elt2']);
841
842 Warning: adding this option can result in changes in the twig
843 generated: space that was previously discarded might end up in
844 a new text element.
845
846 pretty_print
847 Set the pretty print method, amongst '"none"' (default),
848 '"nsgmls"', '"nice"', '"indented"', '"indented_c"', "wrapped",
849 '"record"' and '"record_c"'
850
851 pretty_print formats:
852
853 none
854 The document is output as one long string, with no line
855 breaks except those found within text elements
856
857 nsgmls
858 Line breaks are inserted in safe places: that is within
859 tags, between a tag and an attribute, between attributes
860 and before the > at the end of a tag.
861
862 This is quite ugly but better than "none", and it is very
863 safe, the document will still be valid (conforming to its
864 DTD).
865
866 This is how the SGML parser "sgmls" splits documents, hence
867 the name.
868
869 nice
870 This option inserts line breaks before any tag that does
871 not contain text (so elements with textual content are not
872 broken, as the \n would be significant there).
873
874 WARNING: this option leaves the document well-formed but
875 might make it invalid (not conformant to its DTD). If you
876 have elements declared as
877
878 <!ELEMENT foo (#PCDATA|bar)>
879
880 then a "foo" element including a "bar" one will be printed
881 as
882
883 <foo>
884 <bar>bar is just pcdata</bar>
885 </foo>
886
887 This is invalid, as the parser will take the line break
888 after the "foo" tag as a sign that the element contains
889 PCDATA, it will then die when it finds the "bar" tag. This
890 may or may not be important for you, but be aware of it!
891
892 indented
893 Same as "nice" (and with the same warning) but indents ele‐
894 ments according to their level
895
896 indented_c
897 Same as "indented" but a little more compact: the closing
898 tags are on the same line as the preceding text
899
900 wrapped
901 Same as "indented_c" but lines are wrapped using
902 Text::Wrap::wrap. The default length for lines is the
903 default for $Text::Wrap::columns, and can be changed by
904 changing that variable.
905
906 record
907 This is a record-oriented pretty print, that displays data
908 in records, one field per line (which looks a LOT like
909 "indented")
910
911 record_c
912 Stands for record compact, one record per line
913
914 empty_tags
915 Set the empty tag display style ('"normal"', '"html"' or
916 '"expand"').
917
918 "normal" outputs an empty tag '"<tag/>"', "html" adds a space
919 '"<tag />"' for elements that can be empty in XHTML and
920 "expand" outputs '"<tag></tag>"'
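
As an illustration, the three styles can be compared on the same small document. This is a sketch under the assumption that "br" is one of the elements the "html" style treats as empty; the %output hash is just a name chosen for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my %output;    # style name => serialized root element

for my $style (qw( normal html expand )) {
    my $twig = XML::Twig->new( empty_tags => $style )
                        ->parse('<doc><br/></doc>');
    $output{$style} = $twig->root->sprint;
    print "$style: $output{$style}\n";
}
```

Note that, like the pretty print style, the empty tag style is a global setting, so the last style set stays in effect for later output.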
921
922 quote
923 Set the quote character for attributes ('"single"' or '"dou‐
924 ble"').
925
926 comments
927 Set the way comments are processed: '"drop"' (default),
928 '"keep"' or '"process"'
929
930 Comments processing options:
931
932 drop
933 drops the comments, they are not read, nor printed to the
934 output
935
936 keep
937 comments are loaded and will appear on the output, they are
938 not accessible within the twig and will not interfere with
939 processing though
940
941 Note: comments in the middle of a text element such as
942
943 <p>text <!-- comment --> more text</p>
944
945 are kept at their original position in the text. Using
946 "print" methods like "print" or "sprint" will return the
947 comments in the text. Using "text" or "field" on the other
948 hand will not.
949
950 Any use of "set_pcdata" on the "#PCDATA" element (directly
951 or through other methods like "set_content") will delete
952 the comment(s).
953
954 process
955 comments are loaded in the twig and will be treated as reg‐
956 ular elements (their "tag" is "#COMMENT") this can inter‐
957 fere with processing if you expect "$elt->{first_child}" to
958 be an element but find a comment there. Validation will
959 not protect you from this as comments can happen anywhere.
960 You can use "$elt->first_child( 'tag')" (which is a good
961 habit anyway) to get where you want.
962
963 Consider using "process" if you are outputting SAX events
964 from XML::Twig.
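
The difference between an unconditional "first_child" and "first_child( 'tag')" when comments are processed can be sketched as follows. The document and variable names are made up for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new( comments => 'process' )
                    ->parse('<doc><!-- a note --><p>text</p></doc>');
my $root = $twig->root;

my $first_any = $root->first_child;         # whatever comes first, here the comment
my $first_p   = $root->first_child('p');    # skips anything that is not a <p>

print "first child tag: ", $first_any->tag, "\n";
print "first <p> text:  ", $first_p->text,  "\n";
```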
965
966 pi Set the way processing instructions are processed: '"drop"',
967 '"keep"' (default) or '"process"'
968
969 Note that you can also set PI handlers in the "twig_handlers"
970 option:
971
972 '?' => \&handler
973 '?target' => \&handler2
974
975 The handlers will be called with 2 parameters, the twig and the
976 PI element if "pi" is set to "process", and with 3, the twig,
977 the target and the data if "pi" is set to "keep". Of course
978 they will not be called if "pi" is set to "drop".
979
980 If "pi" is set to "keep" the handler should return a string
981 that will be used as-is as the PI text (it should look like
982 "<?target data?>", or '' if you want to remove the PI).
983
984 Only one handler will be called, "?target" or "?" if no spe‐
985 cific handler for that target is available.
986
987 map_xmlns
988 This option is passed a hashref that maps URIs to prefixes.
989 The prefixes in the document will be replaced by the ones in
990 the map. The mapped prefixes can (actually have to) be used to
991 trigger handlers, navigate or query the document.
992
993 Here is an example:
994
995 my $t= XML::Twig->new( map_xmlns => {'http://www.w3.org/2000/svg' => "svg"},
996 twig_handlers =>
997 { 'svg:circle' => sub { $_->set_att( r => 20) } },
998 pretty_print => 'indented',
999 )
1000 ->parse( '<doc xmlns:gr="http://www.w3.org/2000/svg">
1001 <gr:circle cx="10" cy="90" r="10"/>
1002 </doc>'
1003 )
1004 ->print;
1005
1006 This will output:
1007
1008 <doc xmlns:svg="http://www.w3.org/2000/svg">
1009 <svg:circle cx="10" cy="90" r="20"/>
1010 </doc>
1011
1012 keep_original_prefix
1013 When used with "map_xmlns" this option will make "XML::Twig"
1014 use the original namespace prefixes when outputting a document.
1015 The mapped prefix will still be used for triggering handlers
1016 and in navigation and query methods.
1017
1018 my $t= XML::Twig->new( map_xmlns => {'http://www.w3.org/2000/svg' => "svg"},
1019 twig_handlers =>
1020 { 'svg:circle' => sub { $_->set_att( r => 20) } },
1021 keep_original_prefix => 1,
1022 pretty_print => 'indented',
1023 )
1024 ->parse( '<doc xmlns:gr="http://www.w3.org/2000/svg">
1025 <gr:circle cx="10" cy="90" r="10"/>
1026 </doc>'
1027 )
1028 ->print;
1029
1030 This will output:
1031
1032 <doc xmlns:gr="http://www.w3.org/2000/svg">
1033 <gr:circle cx="10" cy="90" r="20"/>
1034 </doc>
1035
1036 index ($arrayref or $hashref)
1037 This option creates lists of specific elements during the pars‐
1038 ing of the XML. It takes a reference to either a list of trig‐
1039 gering expressions or to a hash name => expression, and for
1040 each one generates the list of elements that match the expres‐
1041 sion. The list can be accessed through the "index" method.
1042
1043 example:
1044
1045 # using an array ref
1046 my $t= XML::Twig->new( index => [ 'div', 'table' ])
1047 ->parsefile( 'foo.xml');
1048 my $divs= $t->index( 'div');
1049 my $first_div= $divs->[0];
1050 my $last_table= $t->index( table => -1);
1051
1052 # using a hashref to name the indexes
1053 my $t= XML::Twig->new( index => { email => 'a[@href=~/^\s*mailto:/]' })
1054 ->parsefile( 'foo.xml');
1055 my $last_email= $t->index( email => -1);
1056
1057 Note that the index is not maintained after the parsing. If
1058 elements are deleted, renamed or otherwise hurt during process‐
1059 ing, the index is NOT updated.
1060
1061 Note: I _HATE_ the Java-like name of arguments used by most XML
1062 modules. So in pure TIMTOWTDI fashion all arguments can be written
1063 either as "UglyJavaLikeName" or as "readable_perl_name":
1064 "twig_print_outside_roots" or "TwigPrintOutsideRoots" (or even
1065 "twigPrintOutsideRoots" {shudder}). XML::Twig normalizes them
1066 before processing them.
1067
1068 parse ( $source)
1069 The $source parameter should either be a string containing the
1070 whole XML document, or it should be an open "IO::Handle". Construc‐
1071 tor options to "XML::Parser::Expat" given as keyword-value pairs
1072 may follow the $source parameter. These override, for this call, any
1073 options or attributes passed through from the XML::Parser instance.
1074
1075 A die call is thrown if a parse error occurs. Otherwise it will
1076 return the twig built by the parse. Use "safe_parse" if you want
1077 the parsing to return even when an error occurs.
1078
1079 parsestring
1080 This is just an alias for "parse" for backwards compatibility.
1081
1082 parsefile (FILE [, OPT => OPT_VALUE [...]])
1083 Open "FILE" for reading, then call "parse" with the open handle.
1084 The file is closed no matter how "parse" returns.
1085
1086 A "die" call is thrown if a parse error occurs. Otherwise it will
1087 return the twig built by the parse. Use "safe_parsefile" if you
1088 want the parsing to return even when an error occurs.
1089
1090 parsefile_inplace ( $file, $optional_extension)
1091 Parse and update a file "in place". It does this by creating a temp
1092 file, selecting it as the default for print() statements (and meth‐
1093 ods), then parsing the input file. If the parsing is successful,
1094 then the temp file is moved to replace the input file.
1095
1096 If an extension is given then the original file is backed-up (the
1097 rules for the extension are the same as the rule for the -i option
1098 in perl).
1099
1100 parsefile_html_inplace ( $file, $optional_extension)
1101 Same as parsefile_inplace, except that it parses HTML instead of
1102 XML
1103
1104 parseurl ($url, $optional_user_agent)
1105 Gets the data from $url and parses it. The data is piped to the
1106 parser in chunks the size of the XML::Parser::Expat buffer, so mem‐
1107 ory consumption and hopefully speed are optimal.
1108
1109 For most (read "small") XML it is probably as efficient (and easier
1110 to debug) to just "get" the XML file and then parse it as a string.
1111
1112 use XML::Twig;
1113 use LWP::Simple;
1114 my $twig= XML::Twig->new();
1115 $twig->parse( LWP::Simple::get( $URL ));
1116
1117 or
1118
1119 use XML::Twig;
1120 my $twig= XML::Twig->nparse( $URL);
1121
1122 If the $optional_user_agent argument is given, that user agent is
1123 used; otherwise a new one is created.
1124
1125 safe_parse ( SOURCE [, OPT => OPT_VALUE [...]])
1126 This method is similar to "parse" except that it wraps the parsing
1127 in an "eval" block. It returns the twig on success and 0 on failure
1128 (the twig object also contains the parsed twig). $@ contains the
1129 error message on failure.
1130
1131 Note that the parsing still stops as soon as an error is detected,
1132 there is no way to keep going after an error.
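
A minimal sketch of error handling with "safe_parse", using a deliberately broken document. The variable names are illustrative; $@ is copied right away because later "eval" blocks may overwrite it.

```perl
use strict;
use warnings;
use XML::Twig;

my $bad_xml  = '<doc><item>unclosed</doc>';
my $good_xml = '<doc><item>closed</item></doc>';

# Not well-formed: safe_parse returns a false value and sets $@
my $bad_result = XML::Twig->new->safe_parse($bad_xml);
my $error      = $@;
print "parse failed: $error\n" unless $bad_result;

# Well-formed: safe_parse returns the twig
my $good_result = XML::Twig->new->safe_parse($good_xml);
print "parsed, root is ", $good_result->root->tag, "\n" if $good_result;
```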
1133
1134 safe_parsefile (FILE [, OPT => OPT_VALUE [...]])
1135 This method is similar to "parsefile" except that it wraps the
1136 parsing in an "eval" block. It returns the twig on success and 0 on
1137 failure (the twig object also contains the parsed twig). $@
1138 contains the error message on failure.
1139
1140 Note that the parsing still stops as soon as an error is detected,
1141 there is no way to keep going after an error.
1142
1143 safe_parseurl ($url, $optional_user_agent)
1144 Same as "parseurl" except that it wraps the parsing in an "eval"
1145 block. It returns the twig on success and 0 on failure (the twig
1146 object also contains the parsed twig). $@ contains the error
1147 message on failure.
1148
1149 parse_html
1150 parse an HTML string or file handle (by converting it to XML using
1151 HTML::TreeBuilder, which needs to be available).
1152
1153 This works nicely, but some information gets lost in the process:
1154 newlines are removed, and (at least on the version I use) comments
1155 get an extra CDATA section inside (<!-- foo --> becomes
1156 <!-- <![CDATA[ foo ]]> -->).
1157
1158 parsefile_html
1159 parse an HTML file (by converting it to XML using HTML::Tree‐
1160 Builder, which needs to be available). The file is loaded com‐
1161 pletely in memory and converted to XML before being parsed.
1162
1163 Alpha: implementation, and thus generated XML could change.
1164
1165 xparse ($thing_to_parse)
1166 parse the $thing_to_parse, whether it is a filehandle, a string, an
1167 HTML file, an HTML URL, a URL or a file.
1168
1169 Note that this is mostly a convenience method for one-off scripts.
1170 For example files that end in '.htm' or '.html' are parsed first as
1171 XML, and if this fails as HTML. This is certainly not the most
1172 efficient way to do this in general.
1173
1174 nparse ($optional_twig_options, $thing_to_parse)
1175 create a twig with the $optional_twig_options, and parse the
1176 $thing_to_parse, whether it is a filehandle, a string, an HTML
1177 file, an HTML URL, a URL or a file.
1178
1179 Examples:
1180
1181 XML::Twig->nparse( "file.xml");
1182 XML::Twig->nparse( error_context => 1, "file://file.xml");
1183
1184 nparse_pp ($optional_twig_options, $thing_to_parse)
1185 same as "nparse" but also sets the "pretty_print" option to
1186 "indented".
1187
1188 nparse_e ($optional_twig_options, $thing_to_parse)
1189 same as "nparse" but also sets the "error_context" option to 1.
1190
1191 nparse_ppe ($optional_twig_options, $thing_to_parse)
1192 same as "nparse" but also sets the "pretty_print" option to
1193 "indented" and the "error_context" option to 1.
1194
1195 parser
1196 This method returns the "expat" object (actually the
1197 XML::Parser::Expat object) used during parsing. It is useful for
1198 example to call XML::Parser::Expat methods on it. To get the line
1199 of a tag for example use "$t->parser->current_line".
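
For instance, the line of each start tag can be recorded from a start tag handler. This is a sketch; the "item" tag and the @lines array are made up for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my @lines;    # line numbers of the <item> start tags

my $twig = XML::Twig->new(
    start_tag_handlers => {
        item => sub {
            my ( $t, $elt ) = @_;
            # the expat object is only usable while parsing is in progress
            push @lines, $t->parser->current_line;
        },
    },
);
$twig->parse("<doc>\n<item/>\n<item/>\n</doc>");

print "item start tags found on lines: @lines\n";
```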
1200
1201 setTwigHandlers ($handlers)
1202 Set the twig_handlers. $handlers is a reference to a hash similar
1203 to the one in the "twig_handlers" option of new. All previous han‐
1204 dlers are unset. The method returns the reference to the previous
1205 handlers.
1206
1207 setTwigHandler ($exp, $handler)
1208 Set a single twig_handler for elements matching $exp. $handler is a
1209 reference to a subroutine. If the handler was previously set then
1210 the reference to the previous handler is returned.
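
A short sketch of setting a handler after the twig has been created, rather than through the "twig_handlers" option of new. The "title" tag and the @titles array are illustrative.

```perl
use strict;
use warnings;
use XML::Twig;

my @titles;

my $twig = XML::Twig->new;

# register the handler after construction, before parsing
$twig->setTwigHandler(
    title => sub {
        my ( $t, $elt ) = @_;
        push @titles, $elt->text;
    }
);

$twig->parse('<doc><title>One</title><title>Two</title></doc>');
print "titles: @titles\n";
```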
1211
1212 setStartTagHandlers ($handlers)
1213 Set the start_tag handlers. $handlers is a reference to a hash sim‐
1214 ilar to the one in the "start_tag_handlers" option of new. All pre‐
1215 vious handlers are unset. The method returns the reference to the
1216 previous handlers.
1217
1218 setStartTagHandler ($exp, $handler)
1219 Set a single start_tag handler for elements matching $exp.
1220 $handler is a reference to a subroutine. If the handler was previously
1221 set then the reference to the previous handler is returned.
1222
1223 setEndTagHandlers ($handlers)
1224 Set the end_tag handlers. $handlers is a reference to a hash simi‐
1225 lar to the one in the "end_tag_handlers" option of new. All previ‐
1226 ous handlers are unset. The method returns the reference to the
1227 previous handlers.
1228
1229 setEndTagHandler ($exp, $handler)
1230 Set a single end_tag handler for elements matching $exp. $handler
1231 is a reference to a subroutine. If the handler was previously set
1232 then the reference to the previous handler is returned.
1233
1234 setTwigRoots ($handlers)
1235 Same as using the "twig_roots" option when creating the twig
1236
1237 setCharHandler ($exp, $handler)
1238 Set a "char_handler"
1239
1240 setIgnoreEltsHandler ($exp)
1241 Set an "ignore_elt" handler (elements that match $exp will be
1242 ignored)
1243
1244 setIgnoreEltsHandlers ($exp)
1245 Set all "ignore_elt" handlers (previous handlers are replaced)
1246
1247 dtd Return the dtd (an XML::Twig::DTD object) of a twig
1248
1249 xmldecl
1250 Return the XML declaration for the document, or a default one if it
1251 doesn't have one
1252
1253 doctype
1254 Return the doctype for the document
1255
1256 dtd_text
1257 Return the DTD text
1258
1259 dtd_print
1260 Print the DTD
1261
1262 model ($tag)
1263 Return the model (in the DTD) for the element $tag
1264
1265 root
1266 Return the root element of a twig
1267
1268 set_root ($elt)
1269 Set the root of a twig
1270
1271 first_elt ($optional_condition)
1272 Return the first element matching $optional_condition of a twig, if
1273 no condition is given then the root is returned
1274
1275 last_elt ($optional_condition)
1276 Return the last element matching $optional_condition of a twig, if
1277 no condition is given then the last element of the twig is returned
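
A small sketch of "first_elt" and "last_elt" on a parsed document, assuming the condition is a simple tag name. The document is made up for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new->parse(
    '<doc><sec><p>first</p></sec><p>second</p><note>n</note></doc>'
);

my $first_p = $twig->first_elt('p');    # first <p> in document order
my $last_p  = $twig->last_elt('p');     # last <p> in document order
my $top     = $twig->first_elt;         # no condition: the root

print $first_p->text, "\n";
print $last_p->text,  "\n";
print $top->tag,      "\n";
```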
1278
1279 elt_id ($id)
1280 Return the element whose "id" attribute is $id
1281
1282 getEltById
1283 Same as "elt_id"
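
A minimal sketch of looking up elements by ID, assuming the default ID attribute name ('id'). The document is illustrative.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new->parse(
    '<doc><p id="intro">Hello</p><p id="body">World</p></doc>'
);

my $intro = $twig->elt_id('intro');
my $body  = $twig->getEltById('body');    # same lookup, DOM-style name

print $intro->text, " ", $body->text, "\n";
```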
1284
1285 index ($index_name, $optional_index)
1286 If the $optional_index argument is present, return the correspond‐
1287 ing element in the index (created using the "index" option for
1288 "XML::Twig->new")
1289
1290 If the argument is not present, return an arrayref to the index
1291
1292 encoding
1293 This method returns the encoding of the XML document, as defined by
1294 the "encoding" attribute in the XML declaration (i.e. it is "undef"
1295 if the attribute is not defined)
1296
1297 set_encoding
1298 This method sets the value of the "encoding" attribute in the XML
1299 declaration. Note that if the document did not have a declaration
1300 it is generated (with an XML version of 1.0)
1301
1302 xml_version
1303 This method returns the XML version, as defined by the "version"
1304 attribute in the XML declaration (i.e. it is "undef" if the attribute
1305 is not defined)
1306
1307 set_xml_version
1308 This method sets the value of the "version" attribute in the XML
1309 declaration. If the declaration did not exist it is created.
1310
1311 standalone
1312 This method returns the value of the "standalone" declaration for
1313 the document
1314
1315 set_standalone
1316 This method sets the value of the "standalone" attribute in the XML
1317 declaration. Note that if the document did not have a declaration
1318 it is generated (with an XML version of 1.0)
1319
1320 set_output_encoding
1321 Set the "encoding" "attribute" in the XML declaration
1322
1323 set_doctype ($name, $system, $public, $internal)
1324 Set the doctype of the element. If an argument is "undef" (or not
1325 present) then its former value is retained, if a false ('' or 0)
1326 value is passed then the former value is deleted.
1327
1328 entity_list
1329 Return the entity list of a twig
1330
1331 entity_names
1332 Return the list of all defined entities
1333
1334 entity ($entity_name)
1335 Return the entity
1336
1337 change_gi ($old_gi, $new_gi)
1338 Performs a (very fast) global change. All elements $old_gi are now
1339 $new_gi. This is a bit dangerous though and should be avoided if
1340 possible, as the new tag might be ignored in subsequent processing.
1341
1342 See "BUGS "
1343
1344 flush ($optional_filehandle, %options)
1345 Flushes a twig up to (and including) the current element, then
1346 deletes all unnecessary elements from the tree that's kept in mem‐
1347 ory. "flush" keeps track of which elements need to be open/closed,
1348 so if you flush from handlers you don't have to worry about any‐
1349 thing. Just keep flushing the twig every time you're done with a
1350 sub-tree and it will come out well-formed. After the whole parsing
1351 don't forget to "flush" one more time to print the end of the
1352 document. The doctype and entity declarations are also printed.
1353
1354 flush takes an optional filehandle as an argument.
1355
1356 options: use the "update_DTD" option if you have updated the
1357 (internal) DTD and/or the entity list and you want the updated DTD
1358 to be output
1359
1360 The "pretty_print" option sets the pretty printing of the document.
1361
1362 Example: $t->flush( Update_DTD => 1);
1363 $t->flush( $filehandle, pretty_print => 'indented');
1364 $t->flush( \*FILE);
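
The pattern described above (flush from a handler, then once more after the parse) can be sketched as follows. The output goes to an in-memory filehandle only so the example is self-contained; the tag names and the $out variable are made up for the illustration.

```perl
use strict;
use warnings;
use XML::Twig;

my $out = '';
open my $fh, '>', \$out or die "cannot open in-memory handle: $!";

my $twig = XML::Twig->new(
    twig_handlers => {
        para    => sub { $_->set_tag('p') },                  # rename para to p
        section => sub { my ($t) = @_; $t->flush($fh) },      # output and free memory
    },
);
$twig->parse(
    '<doc><section><para>one</para></section>'
  . '<section><para>two</para></section></doc>'
);
$twig->flush($fh);    # flush the end of the document
close $fh;

print $out, "\n";
```

Each "section" is written out and released as soon as it has been completely parsed, so only one of them needs to be in memory at a time.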
1365
1366 flush_up_to ($elt, $optional_filehandle, %options)
1367 Flushes up to the $elt element. This allows you to keep part of the
1368 tree in memory when you "flush".
1369
1370 options: see flush.
1371
1372 purge
1373 Does the same as a "flush" except it does not print the twig. It
1374 just deletes all elements that have been completely parsed so far.
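
"purge" is handy when the document is only read, not output. The sketch below counts matching elements and frees each one once it has been inspected; the "record" tag and "status" attribute are made up for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my $active = 0;

my $twig = XML::Twig->new(
    twig_handlers => {
        record => sub {
            my ( $t, $record ) = @_;
            $active++ if $record->att('status') eq 'active';
            $t->purge;    # nothing is printed, the memory is just released
        },
    },
);
$twig->parse(
    '<log><record status="active"/><record status="closed"/>'
  . '<record status="active"/></log>'
);

print "active records: $active\n";
```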
1375
1376 purge_up_to ($elt)
1377 Purges up to the $elt element. This allows you to keep part of the
1378 tree in memory when you "purge".
1379
1380 print ($optional_filehandle, %options)
1381 Prints the whole document associated with the twig. To be used only
1382 AFTER the parse.
1383
1384 options: see "flush".
1385
1386 print_to_file ($filename, %options)
1387 Prints the whole document associated with the twig to file $file‐
1388 name. To be used only AFTER the parse.
1389
1390 options: see "flush".
1391
1392 sprint
1393 Return the text of the whole document associated with the twig. To
1394 be used only AFTER the parse.
1395
1396 options: see "flush".
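
A short sketch of "sprint" after a small modification, first with the default formatting and then with a "pretty_print" option passed for that call. The document and attribute are illustrative.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new->parse('<doc><p>text</p></doc>');
$twig->root->first_child('p')->set_att( class => 'lead' );

my $compact  = $twig->sprint;                                  # default style
my $indented = $twig->sprint( pretty_print => 'indented' );    # formatted for reading

print $compact, "\n";
print $indented;
```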
1397
1398 trim
1399 Trim the document: gets rid of initial and trailing spaces, and
1400 replaces multiple spaces by a single one.
1401
1402 toSAX1 ($handler)
1403 Send SAX events for the twig to the SAX1 handler $handler
1404
1405 toSAX2 ($handler)
1406 Send SAX events for the twig to the SAX2 handler $handler
1407
1408 flush_toSAX1 ($handler)
1409 Same as flush, except that SAX events are sent to the SAX1 handler
1410 $handler instead of the twig being printed
1411
1412 flush_toSAX2 ($handler)
1413 Same as flush, except that SAX events are sent to the SAX2 handler
1414 $handler instead of the twig being printed
1415
1416 ignore
1417 This method should be called during parsing, usually in
1418 "start_tag_handlers". It causes the element to be skipped during
1419 the parsing: the twig is not built for this element, it will not be
1420 accessible during parsing or after it. The element will not take up
1421 any memory and parsing will be faster.
1422
1423 Note that this method can also be called on an element. If the ele‐
1424 ment is a parent of the current element then this element will be
1425 ignored (the twig will not be built any more for it and what has
1426 already been built will be deleted).
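
A sketch of "ignore" called from a start tag handler, so that the element and everything inside it never make it into the twig. The "secret" tag is made up for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new(
    start_tag_handlers => {
        # skip the element that is just being opened
        secret => sub { my ($t) = @_; $t->ignore },
    },
);
$twig->parse(
    '<doc><p>public</p><secret><p>hidden</p></secret><p>also public</p></doc>'
);

my $xml = $twig->sprint;
print $xml, "\n";
```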
1427
1428 set_pretty_print ($style)
1429 Set the pretty print method, amongst '"none"' (default),
1430 '"nsgmls"', '"nice"', '"indented"', "indented_c", '"wrapped"',
1431 '"record"' and '"record_c"'
1432
1433 WARNING: the pretty print style is a GLOBAL variable, so once set
1434 it's applied to ALL "print"'s (and "sprint"'s). Same goes if you
1435 use XML::Twig with "mod_perl". This should not be a problem as the
1436 XML that's generated is valid anyway, and XML processors (as well
1437 as HTML processors, including browsers) should not care. Let me
1438 know if this is a big problem, but at the moment the perfor‐
1439 mance/cleanliness trade-off clearly favors the global approach.
1440
1441 set_empty_tag_style ($style)
1442 Set the empty tag display style ('"normal"', '"html"' or
1443 '"expand"'). As with "set_pretty_print" this sets a global flag.
1444
1445 "normal" outputs an empty tag '"<tag/>"', "html" adds a space
1446 '"<tag />"' for elements that can be empty in XHTML and "expand"
1447 outputs '"<tag></tag>"'
1448
1449 set_remove_cdata ($flag)
1450 set (or unset) the flag that forces the twig to output CDATA sec‐
1451 tions as regular (escaped) PCDATA
1452
1453 print_prolog ($optional_filehandle, %options)
1454 Prints the prolog (XML declaration + DTD + entity declarations) of
1455 a document.
1456
1457 options: see "flush".
1458
1459 prolog ($optional_filehandle, %options)
1460 Return the prolog (XML declaration + DTD + entity declarations) of
1461 a document.
1462
1463 options: see "flush".
1464
1465 finish
1466 Call Expat "finish" method. Unsets all handlers (including inter‐
1467 nal ones that set context), but expat continues parsing to the end
1468 of the document or until it finds an error. It should finish up a
1469 lot faster than with the handlers set.
1470
1471 finish_print
1472 Stop twig processing, flush the twig and proceed to finish printing
1473 the document as fast as possible. Use this method when modifying a
1474 document and the modification is done.
1475
1476 set_expand_external_entities
1477 Same as using the "expand_external_ents" option when creating the
1478 twig
1479
1480 set_input_filter
1481 Same as using the "input_filter" option when creating the twig
1482
1483 set_keep_atts_order
1484 Same as using the "keep_atts_order" option when creating the twig
1485
1486 set_keep_encoding
1487 Same as using the "keep_encoding" option when creating the twig
1488
1489 set_output_filter
1490 Same as using the "output_filter" option when creating the twig
1491
1492 set_output_text_filter
1493 Same as using the "output_text_filter" option when creating the
1494 twig
1495
1496 add_stylesheet ($type, @options)
1497 Adds an external stylesheet to an XML document.
1498
1499 Supported types and options:
1500
1501 xsl option: the url of the stylesheet
1502
1503 Example:
1504
1505 $t->add_stylesheet( xsl => "xsl_style.xsl");
1506
1507 will generate the following PI at the beginning of the docu‐
1508 ment:
1509
1510 <?xml-stylesheet type="text/xsl" href="xsl_style.xsl"?>
1511
1512 css option: the url of the stylesheet
1513
1514 Methods inherited from XML::Parser::Expat
1515 A twig inherits all the relevant methods from XML::Parser::Expat.
1516 These methods can only be used during the parsing phase (they will
1517 generate a fatal error otherwise).
1518
1519 Inherited methods are:
1520
1521 depth
1522 Returns the size of the context list.
1523
1524 in_element
1525 Returns true if NAME is equal to the name of the innermost cur‐
1526 rently opened element. If namespace processing is being used
1527 and you want to check against a name that may be in a names‐
1528 pace, then use the generate_ns_name method to create the NAME
1529 argument.
1530
1531 within_element
1532 Returns the number of times the given name appears in the con‐
1533 text list. If namespace processing is being used and you want
1534 to check against a name that may be in a namespace, then use
1535 the generate_ns_name method to create the NAME argument.
1536
1537 context
1538 Returns a list of element names that represent open elements,
1539 with the last one being the innermost. Inside start and end tag
1540 handlers, this will be the tag of the parent element.
1541
1542 current_line
1543 Returns the line number of the current position of the parse.
1544
1545 current_column
1546 Returns the column number of the current position of the parse.
1547
1548 current_byte
1549 Returns the current position of the parse.
1550
1551 position_in_context
1552 Returns a string that shows the current parse position. LINES
1553 should be an integer >= 0 that represents the number of lines
1554 on either side of the current parse line to place into the
1555 returned string.
1556
1557 base ([NEWBASE])
1558 Returns the current value of the base for resolving relative
1559 URIs. If NEWBASE is supplied, changes the base to that value.
1560
1561 current_element
1562 Returns the name of the innermost currently opened element.
1563 Inside start or end handlers, returns the parent of the element
1564 associated with those tags.
1565
1566 element_index
1567 Returns an integer that is the depth-first visit order of the
1568 current element. This will be zero outside of the root
1569 element. For example, this will return 1 when called from the
1570 start handler for the root element start tag.
1571
1572 recognized_string
1573 Returns the string from the document that was recognized in
1574 order to call the current handler. For instance, when called
1575 from a start handler, it will give us the start-tag string.
1576 The string is encoded in UTF-8. This method doesn't return a
1577 meaningful string inside declaration handlers.
1578
1579 original_string
1580 Returns the verbatim string from the document that was recog‐
1581 nized in order to call the current handler. The string is in
1582 the original document encoding. This method doesn't return a
1583 meaningful string inside declaration handlers.
1584
1585 xpcroak
1586 Concatenate onto the given message the current line number
1587 within the XML document plus the message implied by ErrorCon‐
1588 text. Then croak with the formed message.
1589
1590 xpcarp
1591 Concatenate onto the given message the current line number
1592 within the XML document plus the message implied by ErrorCon‐
1593 text. Then carp with the formed message.
1594
1595 xml_escape(TEXT [, CHAR [, CHAR ...]])
1596 Returns TEXT with markup characters turned into character enti‐
1597 ties. Any additional characters provided as arguments are also
1598 turned into character references where found in TEXT.
1599
1600 (this method is broken on some versions of expat/XML::Parser)
1601
1602 path ( $optional_tag)
1603 Return the element context in a form similar to XPath's short form:
1604 '"/root/tag1/../tag"'
1605
1606 get_xpath ( $optional_array_ref, $xpath, $optional_offset)
1607 Performs a "get_xpath" on the document root (see "XML::Twig::Elt")
1608
1609 If the $optional_array_ref argument is used the array must contain
1610 elements. The $xpath expression is applied to each element in turn
1611 and the result is the union of all results. This way a first query can
1612 be refined in further steps.
1613
1614 find_nodes ( $optional_array_ref, $xpath, $optional_offset)
1615 same as "get_xpath"
1616
1617 findnodes ( $optional_array_ref, $xpath, $optional_offset)
1618 same as "get_xpath" (similar to the XML::LibXML method)
1619
1620 findvalue ( $optional_array_ref, $xpath, $optional_offset)
1621 Return the "join" of all texts of the results of applying
1622 "get_xpath" to the node (similar to the XML::LibXML method)
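
The query methods above can be sketched on a small document. The XPath expressions use only simple steps and an attribute test; the document and variable names are made up for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new->parse(
    '<doc>'
  . '<sec n="1"><title>Intro</title><p>a</p></sec>'
  . '<sec n="2"><title>Usage</title><p>b</p></sec>'
  . '</doc>'
);

my @titles    = $twig->get_xpath('//sec/title');        # list of elements
my @second_p  = $twig->findnodes('//sec[@n="2"]/p');    # same method, other name
my $all_title = $twig->findvalue('//title');            # texts joined in one string

print scalar(@titles), " titles\n";
print $second_p[0]->text, "\n";
print $all_title, "\n";
```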
1623
1624 subs_text ($regexp, $replace)
1625 subs_text does text substitution on the whole document, similar to
1626 perl's "s///" operator.
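
A minimal sketch of a document-wide substitution, assuming a plain string as the replacement. The text is made up for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new->parse(
    '<doc><p>the colour red</p><p>colour charts</p></doc>'
);

# replace every occurrence in the text of the document
$twig->subs_text( qr/colour/, 'color' );

my $xml = $twig->sprint;
print $xml, "\n";
```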
1627
1628 dispose
1629 Useful only if you don't have "Scalar::Util" or "WeakRef"
1630 installed.
1631
1632 Reclaims properly the memory used by an XML::Twig object. As the
1633 object has circular references it never goes out of scope, so if
1634 you want to parse lots of XML documents then the memory leak
1635 becomes a problem. Use "$twig->dispose" to clear this problem.
1636
1637 create_accessors (list_of_attribute_names)
1638 A convenience method that creates l-valued accessors for
1639 attributes. So "$twig->create_accessors( 'foo')" will create a
1640 "foo" method that can be called on elements:
1641
1642 $elt->foo; # equivalent to $elt->{'att'}->{'foo'};
1643 $elt->foo( 'bar'); # equivalent to $elt->set_att( foo => 'bar');
1644
1645 set_do_not_escape_amp_in_atts
1646 An evil method, that I only document because Test::Pod::Coverage
1647 complains otherwise, but really, you don't want to know about it.
1648
1649 XML::Twig::Elt
1650
1651 new ($optional_tag, $optional_atts, @optional_content)
1652 The "tag" is optional (but then you can't have any content), the
1653 $optional_atts argument is a reference to a hash of attributes,
1654 the content can be just a string or a list of strings and elements.
1655 A content of '"#EMPTY"' creates an empty element.
1656
1657 Examples: my $elt= XML::Twig::Elt->new();
1658 my $elt= XML::Twig::Elt->new( para => { align => 'center' });
1659 my $elt= XML::Twig::Elt->new( para => { align => 'center' }, 'foo');
1660 my $elt= XML::Twig::Elt->new( br => '#EMPTY');
1661 my $elt= XML::Twig::Elt->new( 'para');
1662 my $elt= XML::Twig::Elt->new( para => 'this is a para');
1663 my $elt= XML::Twig::Elt->new( para => $elt3, 'another para');
1664
1665 The strings are not parsed, the element is not attached to any
1666 twig.
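
The following sketch shows what a few of these constructor calls produce once serialized with "sprint". The tag and attribute names are only examples, and the default output settings (no pretty printing) are assumed:

```perl
use strict;
use warnings;
use XML::Twig;

# Elements created with new() are free-standing: they belong to no twig
my $para  = XML::Twig::Elt->new( para => { align => 'center' }, 'foo');
my $empty = XML::Twig::Elt->new( br => '#EMPTY');
my $raw   = XML::Twig::Elt->new( p => 'a < b & c');   # kept as text, not parsed

print $para->sprint,  "\n";   # <para align="center">foo</para>
print $empty->sprint, "\n";   # <br/>
print $raw->sprint,   "\n";   # <p>a &lt; b &amp; c</p>
```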
1667
1668 WARNING: if you rely on IDs then you will have to set the id
1669 yourself. At this point the element does not belong to a twig yet,
1670 so the ID attribute is not known and it won't be stored in the ID
1671 list.
1672
1673 Note that "#COMMENT", "#PCDATA" or "#CDATA" are valid tag names,
1674 that will create text elements.
1675
1676 To create an element "foo" containing a CDATA section:
1677
1678 my $foo= XML::Twig::Elt->new( '#CDATA' => "content of the CDATA section")
1679 ->wrap_in( 'foo');
1680
1681 An attribute of '#CDATA', will create the content of the attribute
1682 as CDATA:
1683
1684 my $elt= XML::Twig::Elt->new( 'p' => { '#CDATA' => 1}, 'foo < bar');
1685
1686 creates an element
1687
1688 <p><![CDATA[foo < bar]]></p>
1689
1690 parse ($string, %args)
1691 Creates an element from an XML string. The string is actually
1692 parsed as a new twig, then the root of that twig is returned. The
1693 arguments in %args are passed to the twig. As always if the parse
1694 fails the parser will die, so use an eval if you want to trap syn‐
1695 tax errors.
1696
1697 As obviously the element does not exist beforehand this method has
1698 to be called on the class:
1699
1700 my $elt= parse XML::Twig::Elt( "<a> string to parse, with <sub/>
1701 <elements>, actually tons of </elements>
1702 h</a>");
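
A small sketch of trapping a parse failure with "eval", as suggested above. The XML strings are invented for the example:

```perl
use strict;
use warnings;
use XML::Twig;

# Well-formed input: parse returns the root element of a temporary twig
my $good = XML::Twig::Elt->parse( '<a>text with <sub/> elements</a>');

# Malformed input: the parser dies, so wrap the call in an eval block
my $bad = eval { XML::Twig::Elt->parse( '<a><unclosed></a>') };

print "parsed element: ", $good->tag, "\n";
print "parse failed: $@\n" unless defined $bad;
```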
1703
1704 set_inner_xml ($string)
1705 Sets the content of the element to be the tree created from the
1706 string
1707
1708 set_inner_html ($string)
1709 Sets the content of the element, after parsing the string with an
1710 HTML parser (HTML::Parser)
1711
1712 print ($optional_filehandle, $optional_pretty_print_style)
1713 Prints an entire element, including the tags, optionally to a
1714 $optional_filehandle, optionally with a $optional_pretty_print_style.
1715
1716 The print outputs XML data so base entities are escaped.
1717
1718 sprint ($elt, $optional_no_enclosing_tag)
1719 Return the xml string for an entire element, including the tags.
1720 If the optional second argument is true then only the string inside
1721 the element is returned (the start and end tag for $elt are not).
1722 The text is XML-escaped: base entities (& and < in text, & < and "
1723 in attribute values) are turned into entities.
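
To illustrate the difference between the escaped XML returned by "sprint" and the unescaped text returned by "text", here is a minimal sketch on a made-up element:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse( '<p>a &amp; b <b>c</b></p>');
my $p = $twig->root;

my $with_tags = $p->sprint;      # <p>a &amp; b <b>c</b></p>
my $inner     = $p->sprint( 1);  # a &amp; b <b>c</b>
my $plain     = $p->text;        # a & b c

print "$with_tags\n$inner\n$plain\n";
```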
1724
1725 gi Return the gi of the element (the gi is the "generic identifier",
1726 the tag name in SGML parlance).
1727
1728 "tag" and "name" are synonyms of "gi".
1729
1730 tag Same as "gi"
1731
1732 name
1733 Same as "tag"
1734
1735 set_gi ($tag)
1736 Set the gi (tag) of an element
1737
1738 set_tag ($tag)
1739 Set the tag (="gi") of an element
1740
1741 set_name ($name)
1742 Set the name (="tag") of an element
1743
1744 root
1745 Return the root of the twig in which the element is contained.
1746
1747 twig
1748 Return the twig containing the element.
1749
1750 parent ($optional_condition)
1751 Return the parent of the element, or the first ancestor matching
1752 the $optional_condition
1753
1754 first_child ($optional_condition)
1755 Return the first child of the element, or the first child matching
1756 the $optional_condition
1757
1758 has_child ($optional_condition)
1759 Return the first child of the element, or the first child matching
1760 the $optional_condition (same as first_child)
1761
1762 has_children ($optional_condition)
1763 Return the first child of the element, or the first child matching
1764 the $optional_condition (same as first_child)
1765
1766 first_child_text ($optional_condition)
1767 Return the text of the first child of the element, or the first
1768 child matching the $optional_condition.
1769 If there is no first_child then this returns ''. This avoids getting
1770 the child, checking for its existence, then getting the text for
1771 trivial cases.
1772
1773 Similar methods are available for the other navigation methods:
1774
1775 last_child_text
1776 prev_sibling_text
1777 next_sibling_text
1778 prev_elt_text
1779 next_elt_text
1780 child_text
1781 parent_text
1782
1783 All these methods also exist in a "trimmed" variant:
1784
1785 first_child_trimmed_text
1786 last_child_trimmed_text
1787 prev_sibling_trimmed_text
1788 next_sibling_trimmed_text
1789 prev_elt_trimmed_text
1790 next_elt_trimmed_text
1791 child_trimmed_text
1792 parent_trimmed_text
1793 field ($optional_condition)
1794 Same method as "first_child_text" with a different name
1795
1796 trimmed_field ($optional_condition)
1797 Same method as "first_child_trimmed_text" with a different name
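
A short sketch of the "_text" shortcuts on an invented record. The "isbn" child deliberately does not exist, to show the empty-string result:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse( '<book><title>  Perl  Basics </title></book>');
my $book = $twig->root;

my $title   = $book->first_child_text( 'title');  # text as stored, spaces included
my $trimmed = $book->trimmed_field( 'title');     # 'Perl Basics'
my $isbn    = $book->field( 'isbn');              # '' since there is no such child

print "[$title] [$trimmed] [$isbn]\n";
```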
1798
1799 set_field ($condition, $optional_atts, @list_of_elt_and_strings)
1800 Set the content of the first child of the element that matches
1801 $condition, the rest of the arguments is the same as for
1802 "set_content".
1803
1804 If no child matches $condition _and_ if $condition is a valid XML
1805 element name, then a new element by that name is created and
1806 inserted as the last child.
1807
1808 first_child_matches ($optional_condition)
1809 Return the element if the first child of the element (if it exists)
1810 passes the $optional_condition, "undef" otherwise
1811
1812 if( $elt->first_child_matches( 'title')) ...
1813
1814 is equivalent to
1815
1816 if( $elt->{first_child} && $elt->{first_child}->passes( 'title'))
1817
1818 "first_child_is" is another name for this method
1819
1820 Similar methods are available for the other navigation methods:
1821
1822 last_child_matches
1823 prev_sibling_matches
1824 next_sibling_matches
1825 prev_elt_matches
1826 next_elt_matches
1827 child_matches
1828 parent_matches
1829 is_first_child ($optional_condition)
1830 returns true (the element) if the element is the first child of its
1831 parent (optionally that satisfies the $optional_condition)
1832
1833 is_last_child ($optional_condition)
1834 returns true (the element) if the element is the last child of its
1835 parent (optionally that satisfies the $optional_condition)
1836
1837 prev_sibling ($optional_condition)
1838 Return the previous sibling of the element, or the previous sibling
1839 matching $optional_condition
1840
1841 next_sibling ($optional_condition)
1842 Return the next sibling of the element, or the first one matching
1843 $optional_condition.
1844
1845 next_elt ($optional_elt, $optional_condition)
1846 Return the next elt (optionally matching $optional_condition) of
1847 the element. This is defined as the next element which opens after
1848 the current element opens, which usually means the first child of
1849 the element. Counter-intuitive as it might look, this allows you to
1850 loop through the whole document by starting from the root.
1851
1852 The $optional_elt is the root of a subtree. When the "next_elt" is
1853 out of the subtree then the method returns undef. You can then walk
1854 a sub tree with:
1855
1856 my $elt= $subtree_root;
1857 while( $elt= $elt->next_elt( $subtree_root))
1858 { # insert processing code here
1859 }
1860
1861 prev_elt ($optional_condition)
1862 Return the previous elt (optionally matching $optional_condition)
1863 of the element. This is the first element which opens before the
1864 current one. It is usually either the last descendant of the pre‐
1865 vious sibling or simply the parent
1866
1867 next_n_elt ($offset, $optional_condition)
1868 Return the $offset-th element that matches the $optional_condition
1869
1870 following_elt
1871 Return the following element (as per the XPath following axis)
1872
1873 preceding_elt
1874 Return the preceding element (as per the XPath preceding axis)
1875
1876 following_elts
1877 Return the list of following elements (as per the XPath following
1878 axis)
1879
1880 preceding_elts
1881 Return the list of preceding elements (as per the XPath preceding
1882 axis)
1883
1884 children ($optional_condition)
1885 Return the list of children (optionally which matches
1886 $optional_condition) of the element. The list is in document order.
1887
1888 children_count ($optional_condition)
1889 Return the number of children of the element (optionally which
1890 matches $optional_condition)
1891
1892 children_text ($optional_condition)
1893 Return an array containing the text of children of the element
1894 (optionally which matches $optional_condition)
1895
1896 children_trimmed_text ($optional_condition)
1897 Return an array containing the trimmed text of children of the ele‐
1898 ment (optionally which matches $optional_condition)
1899
1900 children_copy ($optional_condition)
1901 Return a list of elements that are copies of the children of the
1902 element, optionally which matches $optional_condition
1903
1904 descendants ($optional_condition)
1905 Return the list of all descendants (optionally which matches
1906 $optional_condition) of the element. This is the equivalent of the
1907 "getElementsByTagName" of the DOM (by the way, if you are really a
1908 DOM addict, you can use "getElementsByTagName" instead)
1909
1910 getElementsByTagName ($optional_condition)
1911 Same as "descendants"
1912
1913 find_by_tag_name ($optional_condition)
1914 Same as "descendants"
1915
1916 descendants_or_self ($optional_condition)
1917 Same as "descendants" except that the element itself is included in
1918 the list if it matches the $optional_condition
1919
1920 first_descendant ($optional_condition)
1921 Return the first descendant of the element that matches the condi‐
1922 tion
1923
1924 last_descendant ($optional_condition)
1925 Return the last descendant of the element that matches the condi‐
1926 tion
1927
1928 ancestors ($optional_condition)
1929 Return the list of ancestors (optionally matching
1930 $optional_condition) of the element. The list is ordered from the
1931 innermost ancestor to the outermost one
1932
1933 NOTE: the element itself is not part of the list, in order to
1934 include it you will have to use ancestors_or_self
1935
1936 ancestors_or_self ($optional_condition)
1937 Return the list of ancestors (optionally matching
1938 $optional_condition) of the element, including the element (if it
1939 matches the condition). The list is ordered from the innermost
1940 ancestor to the outermost one
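
The ordering of the two ancestor lists can be checked with a sketch like this one (the nesting and tag names are made up for the example):

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse( '<doc><section><p><b>x</b></p></section></doc>');
my $bold = $twig->root->first_descendant( 'b');

# Innermost ancestor first, the element itself excluded
my @up   = map { $_->tag } $bold->ancestors;           # p section doc
# Same order, starting with the element itself
my @self = map { $_->tag } $bold->ancestors_or_self;   # b p section doc
# With a condition only the matching ancestors are kept
my @sec  = map { $_->tag } $bold->ancestors( 'section');

print "@up\n@self\n@sec\n";
```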
1941
1942 passes ($condition)
1943 Return the element if it passes the $condition
1944
1945 att ($att)
1946 Return the value of attribute $att or "undef"
1947
1948 set_att ($att, $att_value)
1949 Set the attribute of the element to the given value
1950
1951 You can actually set several attributes this way:
1952
1953 $elt->set_att( att1 => "val1", att2 => "val2");
1954
1955 del_att ($att)
1956 Delete the attribute for the element
1957
1958 You can actually delete several attributes at once:
1959
1960 $elt->del_att( 'att1', 'att2', 'att3');
1961
1962 cut Cut the element from the tree. The element still exists, it can be
1963 copied or pasted somewhere else, it is just not attached to the
1964 tree anymore.
1965
1966 Note that the "old" links to the parent, previous and next siblings
1967 can still be accessed using the former_* methods
1968
1969 former_next_sibling
1970 Returns the former next sibling of a cut node (or undef if the node
1971 has not been cut)
1972
1973 This makes it easier to write loops where you cut elements:
1974
1975 my $child= $parent->first_child( 'achild');
1976 while( $child->{'att'}->{'cut'})
1977 { $child->cut; $child= $child->former_next_sibling; }
1978
1979 former_prev_sibling
1980 Returns the former previous sibling of a cut node (or undef if the
1981 node has not been cut)
1982
1983 former_parent
1984 Returns the former parent of a cut node (or undef if the node has
1985 not been cut)
1986
1987 cut_children ($optional_condition)
1988 Cut all the children of the element (or all of those which satisfy
1989 the $optional_condition).
1990
1991 Return the list of children
1992
1993 copy ($elt)
1994 Return a copy of the element. The copy is a "deep" copy: all sub
1995 elements of the element are duplicated.
1996
1997 paste ($optional_position, $ref)
1998 Paste a (previously "cut" or newly generated) element. Die if the
1999 element already belongs to a tree.
2000
2001 Note that the calling element is pasted:
2002
2003 $child->paste( first_child => $existing_parent);
2004 $new_sibling->paste( after => $this_sibling_is_already_in_the_tree);
2005
2006 or
2007
2008 my $new_elt= XML::Twig::Elt->new( tag => $content);
2009 $new_elt->paste( $position => $existing_elt);
2010
2011 Example:
2012
2013 my $t= XML::Twig->new; $t->parsefile( 'doc.xml');
2014 my $toc= $t->root->new( 'toc');
2015 $toc->paste( $t->root); # $toc is pasted as first child of the root
2016 foreach my $title ($t->findnodes( '/doc/section/title'))
2017 { my $title_toc= $title->copy;
2018 # paste $title_toc as the last child of toc
2019 $title_toc->paste( last_child => $toc);
2020 }
2021
2022 Position options:
2023
2024 first_child (default)
2025 The element is pasted as the first child of $ref
2026
2027 last_child
2028 The element is pasted as the last child of $ref
2029
2030 before
2031 The element is pasted before $ref, as its previous sibling.
2032
2033 after
2034 The element is pasted after $ref, as its next sibling.
2035
2036 within
2037 In this case an extra argument, $offset, should be supplied.
2038 The element will be pasted in the reference element (or in its
2039 first text child) at the given offset. To achieve this the ref‐
2040 erence element will be split at the offset.
2041
2042 Note that you can call directly the underlying method:
2043
2044 paste_before
2045 paste_after
2046 paste_first_child
2047 paste_last_child
2048 paste_within
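
The position options can be compared side by side in a sketch like the following, which builds a tiny document in memory rather than reading a file. The tags "a" to "d" are arbitrary:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse( '<doc><b>2</b></doc>');
my $root = $twig->root;
my $ref  = $root->first_child( 'b');

# In each call the new (calling) element is pasted relative to the second argument
XML::Twig::Elt->new( a => '1')->paste( before     => $ref);
XML::Twig::Elt->new( c => '3')->paste( after      => $ref);
XML::Twig::Elt->new( d => '4')->paste( last_child => $root);

print $root->sprint, "\n";   # <doc><a>1</a><b>2</b><c>3</c><d>4</d></doc>
```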
2049 move ($optional_position, $ref)
2050 Move an element in the tree. This is just a "cut" then a "paste".
2051 The syntax is the same as "paste".
2052
2053 replace ($ref)
2054 Replaces an element in the tree. Sometimes it is just not possible
2055 to "cut" an element then "paste" another in its place, so "replace"
2056 comes in handy. The calling element replaces $ref.
2057
2058 replace_with (@elts)
2059 Replaces the calling element with one or more elements
2060
2061 delete
2062 Cut the element and free the memory.
2063
2064 prefix ($text, $optional_option)
2065 Add a prefix to an element. If the element is a "PCDATA" element
2066 the text is added to the pcdata, if the element's first child is a
2067 "PCDATA" then the text is added to its pcdata, otherwise a new
2068 "PCDATA" element is created and pasted as the first child of the
2069 element.
2070
2071 If the option is "asis" then the prefix is added asis: it is cre‐
2072 ated in a separate "PCDATA" element with an "asis" property. You
2073 can then write:
2074
2075 $elt1->prefix( '<b>', 'asis');
2076
2077 to create a "<b>" in the output of "print".
2078
2079 suffix ($text, $optional_option)
2080 Add a suffix to an element. If the element is a "PCDATA" element
2081 the text is added to the pcdata, if the element's last child is a
2082 "PCDATA" then the text is added to its pcdata, otherwise a new
2083 "PCDATA" element is created and pasted as the last child of the
2084 element.
2085
2086 If the option is "asis" then the suffix is added asis: it is cre‐
2087 ated in a separate "PCDATA" element with an "asis" property. You
2088 can then write:
2089
2090 $elt2->suffix( '</b>', 'asis');
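
A minimal sketch contrasting the "asis" option with the default behaviour, on two invented elements holding the same text:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse( '<doc><p>text</p><q>text</q></doc>');
my( $p, $q) = $twig->root->children;

# With 'asis' the strings are output untouched, so they act as markup
$p->prefix( '<b>', 'asis');
$p->suffix( '</b>', 'asis');

# Without it the prefix is ordinary text and is escaped on output
$q->prefix( '<b>');

print $p->sprint, "\n";   # <p><b>text</b></p>
print $q->sprint, "\n";   # the "<" of the prefix comes out as &lt;
```

Since "asis" content is not checked in any way, it is up to the caller to keep the output well-formed.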
2091
2092 trim
2093 Trim the element in-place: spaces at the beginning and at the end
2094 of the element are discarded and multiple spaces within the element
2095 (or its descendants) are replaced by a single space.
2096
2097 Note that in some cases you can still end up with multiple spaces,
2098 if they are split between several elements:
2099
2100 <doc> text <b> hah! </b> yep</doc>
2101
2102 gets trimmed to
2103
2104 <doc>text <b> hah! </b> yep</doc>
2105
2106 This is somewhere in between a bug and a feature.
2107
2108 simplify (%options)
2109 Return a data structure suspiciously similar to XML::Simple's.
2110 Options are identical to XMLin options, see XML::Simple doc for
2111 more details (or use Data::Dumper or YAML to dump the data
2112 structure)
2113
2114 content_key
2115 forcearray
2116 keyattr
2117 noattr
2118 normalize_space
2119 aka normalise_space
2120
2121 variables (%var_hash)
2122 %var_hash is a hash { name => value }
2123
2124 This option allows variables in the XML to be expanded when the
2125 file is read. (there is no facility for putting the variable
2126 names back if you regenerate XML using XMLout).
2127
2128 A 'variable' is any text of the form ${name} (or $name) which
2129 occurs in an attribute value or in the text content of an ele‐
2130 ment. If 'name' matches a key in the supplied hashref, ${name}
2131 will be replaced with the corresponding value from the hashref.
2132 If no matching key is found, the variable will not be replaced.
2133
2134 var_att ($attribute_name)
2135 This option gives the name of an attribute that will be used to
2136 create variables in the XML:
2137
2138 <dirs>
2139 <dir name="prefix">/usr/local</dir>
2140 <dir name="exec_prefix">$prefix/bin</dir>
2141 </dirs>
2142
2143 use "var_att => 'name'" to get $prefix replaced by /usr/local in
2144 the generated data structure
2145
2146 By default variables are captured by the following regexp:
2147 /\$(\w+)/
2148
2149 var_regexp (regexp)
2150 This option changes the regexp used to capture variables. The
2151 variable name should be in $1
2152
2153 group_tags { grouping tag => grouped tag, grouping tag 2 => grouped
2154 tag 2...}
2155 Option used to simplify the structure: elements listed will not
2156 be used. Their children will be, they will be considered chil‐
2157 dren of the element parent.
2158
2159 If the element is:
2160
2161 <config host="laptop.xmltwig.com">
2162 <server>localhost</server>
2163 <dirs>
2164 <dir name="base">/home/mrodrigu/standards</dir>
2165 <dir name="tools">$base/tools</dir>
2166 </dirs>
2167 <templates>
2168 <template name="std_def">std_def.templ</template>
2169 <template name="dummy">dummy</template>
2170 </templates>
2171 </config>
2172
2173 Then calling simplify with "group_tags => { dirs => 'dir', tem‐
2174 plates => 'template'}" makes the data structure be exactly as
2175 if the start and end tags for "dirs" and "templates" were not
2176 there.
2177
2178 A YAML dump of the structure
2179
2180 base: '/home/mrodrigu/standards'
2181 host: laptop.xmltwig.com
2182 server: localhost
2183 template:
2184 - std_def.templ
2185 - dummy
2186 tools: '$base/tools'
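
For the simplest case, with no options at all, a sketch of "simplify" could look like this. The configuration document is invented, and only attributes plus text-only children are used so that no XML::Simple folding rules come into play:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse( '<config host="laptop"><server>localhost</server><port>8080</port></config>');

# simplify returns a plain Perl data structure (here a hash reference)
my $config = $twig->root->simplify;

print "$config->{server}:$config->{port} on $config->{host}\n";
```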
2187
2188 split_at ($offset)
2189 Split a text ("PCDATA" or "CDATA") element in 2 at $offset, the
2190 original element now holds the first part of the string and a new
2191 element holds the right part. The new element is returned
2192
2193 If the element is not a text element then the first text child of
2194 the element is split
2195
2196 split ( $optional_regexp, $tag1, $atts1, $tag2, $atts2...)
2197 Split the text descendants of an element in place, the text is
2198 split using the $regexp, if the regexp includes () then the matched
2199 separators will be wrapped in elements. $1 is wrapped in $tag1,
2200 with attributes $atts1 if $atts1 is given (as a hashref), $2 is
2201 wrapped in $tag2...
2202
2203 if $elt is "<p>tati tata <b>tutu tati titi</b> tata tati tata</p>"
2204
2205 $elt->split( qr/(ta)ti/, 'foo', {type => 'toto'} )
2206
2207 will change $elt to
2208
2209 <p><foo type="toto">ta</foo> tata <b>tutu <foo type="toto">ta</foo>
2210 titi</b> tata <foo type="toto">ta</foo> tata</p>
2211
2212 The regexp can be passed either as a string or as "qr//" (perl
2213 5.005 and later), it defaults to \s+ just as the "split" built-in
2214 (but this would be quite a useless behaviour without the
2215 $optional_tag parameter)
2216
2217 $optional_tag defaults to PCDATA or CDATA, depending on the initial
2218 element type
2219
2220 The list of descendants is returned (including un-touched original
2221 elements and newly created ones)
2222
2223 mark ( $regexp, $optional_tag, $optional_attribute_ref)
2224 This method behaves exactly as split, except only the newly created
2225 elements are returned
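
As a sketch of "mark" in use, the following wraps every phone-like number in a "phone" element. The text, the pattern and the tag name are all assumptions made for the example:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse( '<p>call 555-1234 or 555-9876 today</p>');
my $p = $twig->root;

# The captured group ($1) is what gets wrapped in the new element
my @marked = $p->mark( qr/(\d{3}-\d{4})/, 'phone');

print $p->sprint, "\n";
# <p>call <phone>555-1234</phone> or <phone>555-9876</phone> today</p>
```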
2226
2227 wrap_children ( $regexp_string, $tag, $optional_attribute_hashref)
2228 Wrap the children of the element that match the regexp in an ele‐
2229 ment $tag. If $optional_attribute_hashref is passed then the new
2230 element will have these attributes.
2231
2232 The $regexp_string includes tags, within pointy brackets, as in
2233 "<title><para>+" and the usual Perl modifiers (+*?...). Tags can
2234 be further qualified with attributes: "<para type="warning"
2235 classif="cosmic_secret">+". The values for attributes should be
2236 xml-escaped: "<candy type="M&amp;Ms">*" ("<", "&", ">" and """
2237 should be escaped).
2238
2239 Note that elements might get extra "id" attributes in the process.
2240 See add_id. Use strip_att to remove unwanted id's.
2241
2242 Here is an example:
2243
2244 If the element $elt has the following content:
2245
2246 <elt>
2247 <p>para 1</p>
2248 <l_l1_1>list 1 item 1 para 1</l_l1_1>
2249 <l_l1>list 1 item 1 para 2</l_l1>
2250 <l_l1_n>list 1 item 2 para 1 (only para)</l_l1_n>
2251 <l_l1_n>list 1 item 3 para 1</l_l1_n>
2252 <l_l1>list 1 item 3 para 2</l_l1>
2253 <l_l1>list 1 item 3 para 3</l_l1>
2254 <l_l1_1>list 2 item 1 para 1</l_l1_1>
2255 <l_l1>list 2 item 1 para 2</l_l1>
2256 <l_l1_n>list 2 item 2 para 1 (only para)</l_l1_n>
2257 <l_l1_n>list 2 item 3 para 1</l_l1_n>
2258 <l_l1>list 2 item 3 para 2</l_l1>
2259 <l_l1>list 2 item 3 para 3</l_l1>
2260 </elt>
2261
2262 Then the code
2263
2264 $elt->wrap_children( q{<l_l1_1><l_l1>*} , li => { type => "ul1" });
2265 $elt->wrap_children( q{<l_l1_n><l_l1>*} , li => { type => "ul" });
2266
2267 $elt->wrap_children( q{<li type="ul1"><li type="ul">+}, "ul");
2268 $elt->strip_att( 'id');
2269 $elt->strip_att( 'type');
2270 $elt->print;
2271
2272 will output:
2273
2274 <elt>
2275 <p>para 1</p>
2276 <ul>
2277 <li>
2278 <l_l1_1>list 1 item 1 para 1</l_l1_1>
2279 <l_l1>list 1 item 1 para 2</l_l1>
2280 </li>
2281 <li>
2282 <l_l1_n>list 1 item 2 para 1 (only para)</l_l1_n>
2283 </li>
2284 <li>
2285 <l_l1_n>list 1 item 3 para 1</l_l1_n>
2286 <l_l1>list 1 item 3 para 2</l_l1>
2287 <l_l1>list 1 item 3 para 3</l_l1>
2288 </li>
2289 </ul>
2290 <ul>
2291 <li>
2292 <l_l1_1>list 2 item 1 para 1</l_l1_1>
2293 <l_l1>list 2 item 1 para 2</l_l1>
2294 </li>
2295 <li>
2296 <l_l1_n>list 2 item 2 para 1 (only para)</l_l1_n>
2297 </li>
2298 <li>
2299 <l_l1_n>list 2 item 3 para 1</l_l1_n>
2300 <l_l1>list 2 item 3 para 2</l_l1>
2301 <l_l1>list 2 item 3 para 3</l_l1>
2302 </li>
2303 </ul>
2304 </elt>
2305
2306 subs_text ($regexp, $replace)
2307 subs_text does text substitution, similar to perl's " s///" opera‐
2308 tor.
2309
2310 $regexp must be a perl regexp, created with the "qr" operator.
2311
2312 $replace can include "$1, $2"... from the $regexp. It can also be
2313 used to create element and entities, by using "&elt( tag => { att
2314 => val }, text)" (similar syntax as "new") and "&ent( name)".
2315
2316 Here is a rather complex example:
2317
2318 $elt->subs_text( qr{(?<!do not )link to (http://([^\s,]*))},
2319 'see &elt( a =>{ href => $1 }, $2)'
2320 );
2321
2322 This will replace text like link to http://www.xmltwig.com by see
2323 <a href="http://www.xmltwig.com">www.xmltwig.com</a>, but not do
2324 not link to...
2325
2326 Generating entities (here replacing spaces with &nbsp;):
2327
2328 $elt->subs_text( qr{ }, '&ent( "&nbsp;")');
2329
2330 or, using a variable:
2331
2332 my $ent="&nbsp;";
2333 $elt->subs_text( qr{ }, "&ent( '$ent')");
2334
2335 Note that the substitution is always global, as in using the "g"
2336 modifier in a perl substitution, and that it is performed on all
2337 text descendants of the element.
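
A plain-text substitution, without "&elt" or "&ent", is the simplest case. The sketch below assumes an invented price list and shows that the replacement reaches nested elements too:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse( '<p>price: 10 USD, <b>sale: 8 USD</b></p>');
my $p = $twig->root;

# The replacement is a single-quoted string, so $1 is expanded by subs_text
$p->subs_text( qr/(\d+) USD/, '$1 dollars');

print $p->sprint, "\n";   # <p>price: 10 dollars, <b>sale: 8 dollars</b></p>
```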
2338
2339 Bug: in the $regexp, you can only use "\1", "\2"... if the replace‐
2340 ment expression does not include elements or attributes. eg
2341
2342 $t->subs_text( qr/((t[aiou])\2)/, '$2'); # ok, replaces toto, tata, titi, tutu by to, ta, ti, tu
2343 $t->subs_text( qr/((t[aiou])\2)/, '&elt(p => $1)' ); # NOK, does not find toto...
2344
2345 add_id ($optional_coderef)
2346 Add an id to the element.
2347
2348 The id is an attribute, "id" by default, see the "id" option for
2349 XML::Twig "new" to change it. Use an id starting with "#" to get an
2350 id that's not output by print, flush or sprint, yet that allows you
2351 to use the elt_id method to get the element easily.
2352
2353 If the element already has an id, no new id is generated.
2354
2355 By default the method creates an id of the form "twig_id_<nnnn>",
2356 where "<nnnn>" is a number, incremented each time the method is
2357 called successfully.
2358
2359 set_id_seed ($prefix)
2360 by default the id generated by "add_id" is "twig_id_<nnnn>",
2361 "set_id_seed" changes the prefix to $prefix and resets the number
2362 to 1
2363
2364 strip_att ($att)
2365 Remove the attribute $att from all descendants of the element
2366 (including the element)
2367
2368 Return the element
2369
2370 change_att_name ($old_name, $new_name)
2371 Change the name of the attribute from $old_name to $new_name. If
2372 there is no attribute $old_name nothing happens.
2373
2374 sort_children_on_value( %options)
2375 Sort the children of the element in place according to their text.
2376 All children are sorted.
2377
2378 Return the element, with its children sorted.
2379
2380 %options are
2381
2382 type : numeric ⎪ alpha (default: alpha)
2383 order : normal ⎪ reverse (default: normal)
2384
2387 sort_children_on_att ($att, %options)
2388 Sort the children of the element in place according to attribute
2389 $att. %options are the same as for "sort_children_on_value"
2390
2391 Return the element.
2392
2393 sort_children_on_field ($tag, %options)
2394 Sort the children of the element in place, according to the field
2395 $tag (the text of the first child of the child with this tag).
2396 %options are the same as for "sort_children_on_value".
2397
2398 Return the element, with its children sorted
2399
2400 sort_children( $get_key, %options)
2401 Sort the children of the element in place. The $get_key argument is
2402 a reference to a function that returns the sort key when passed an
2403 element.
2404
2405 For example:
2406
2407 $elt->sort_children( sub { $_[0]->{'att'}->{"nb"} + $_[0]->text },
2408 type => 'numeric', order => 'reverse'
2409 );
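
The two most common sorts can be sketched on a small invented list, first by a numeric attribute and then by text in reverse order:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse( '<list><i n="3">c</i><i n="1">a</i><i n="2">b</i></list>');
my $list = $twig->root;

# Numeric sort on the "n" attribute
$list->sort_children_on_att( 'n', type => 'numeric');
my $by_att = $list->sprint;    # items in the order 1, 2, 3

# Alphabetical sort on the text of each child, in reverse order
$list->sort_children_on_value( order => 'reverse');
my $by_text = $list->sprint;   # items in the order c, b, a

print "$by_att\n$by_text\n";
```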
2410
2411 field_to_att ($cond, $att)
2412 Turn the text of the first sub-element matched by $cond into the
2413 value of attribute $att of the element. If $att is omitted then
2414 $cond is used as the name of the attribute, which makes sense only
2415 if $cond is a valid element (and attribute) name.
2416
2417 The sub-element is then cut.
2418
2419 att_to_field ($att, $tag)
2420 Take the value of attribute $att and create a sub-element $tag as
2421 first child of the element. If $tag is omitted then $att is used as
2422 the name of the sub-element.
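
The two methods are inverses of each other, as this sketch on a made-up "book" element suggests:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse( '<book><title>Perl</title><author>Wall</author></book>');
my $book = $twig->root;

# The title child becomes a title attribute and is cut from the tree
$book->field_to_att( 'title');
my $as_att = $book->sprint;     # <book title="Perl"><author>Wall</author></book>

# The attribute is turned back into a first child
$book->att_to_field( 'title');
my $as_field = $book->sprint;   # <book><title>Perl</title><author>Wall</author></book>

print "$as_att\n$as_field\n";
```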
2423
2424 get_xpath ($xpath, $optional_offset)
2425 Return a list of elements satisfying the $xpath. $xpath is an
2426 XPATH-like expression.
2427
2428 A subset of the XPATH abbreviated syntax is covered:
2429
2430 tag
2431 tag[1] (or any other positive number)
2432 tag[last()]
2433 tag[@att] (the attribute exists for the element)
2434 tag[@att="val"]
2435 tag[@att=~ /regexp/]
2436 tag[@att1="val1" and @att2="val2"]
2437 tag[@att1="val1" or @att2="val2"]
2438 tag[string()="toto"] (returns tag elements whose text (as per the text method)
2439 is toto)
2440 tag[string()=~/regexp/] (returns tag elements whose text (as per the text
2441 method) matches regexp)
2442 expressions can start with / (search starts at the document root)
2443 expressions can start with . (search starts at the current element)
2444 // can be used to get all descendants instead of just direct children
2445 * matches any tag
2446
2447 So the following examples from the XPath recommenda‐
2448 tion<http://www.w3.org/TR/xpath.html#path-abbrev> work:
2449
2450 para selects the para element children of the context node
2451 * selects all element children of the context node
2452 para[1] selects the first para child of the context node
2453 para[last()] selects the last para child of the context node
2454 */para selects all para grandchildren of the context node
2455 /doc/chapter[5]/section[2] selects the second section of the fifth chapter
2456 of the doc
2457 chapter//para selects the para element descendants of the chapter element
2458 children of the context node
2459 //para selects all the para descendants of the document root and thus selects
2460 all para elements in the same document as the context node
2461 //olist/item selects all the item elements in the same document as the
2462 context node that have an olist parent
2463 .//para selects the para element descendants of the context node
2464 .. selects the parent of the context node
2465 para[@type="warning"] selects all para children of the context node that have
2466 a type attribute with value warning
2467 employee[@secretary and @assistant] selects all the employee children of the
2468 context node that have both a secretary attribute and an assistant
2469 attribute
2470
2471 The elements will be returned in the document order.
2472
2473 If $optional_offset is used then only one element will be returned,
2474 the one with the appropriate offset in the list, starting at 0
2475
2476 Quoting and interpolating variables can be a pain when the Perl
2477 syntax and the XPATH syntax collide, so use alternate quoting mech‐
2478 anisms like q or qq (I like q{} and qq{} myself).
2479
2480 Here are some more examples to get you started:
2481
2482 my $p1= "p1";
2483 my $p2= "p2";
2484 my @res= $t->get_xpath( qq{p[string( "$p1") or string( "$p2")]});
2485
2486 my $a= "a1";
2487 my @res= $t->get_xpath( qq{//*[@att="$a"]});
2488
2489 my $val= "a1";
2490 my $exp= qq{//p[ \@att='$val']}; # you need to use \@ or you will get a warning
2491 my @res= $t->get_xpath( $exp);
2492
2493 Note that the only supported regexp delimiter is / and that you
2494 must backslash all / in regexps AND in regular strings.
2495
2496 XML::Twig does not provide natively full XPATH support, but you can
2497 use "XML::Twig::XPath" to get "findnodes" to use "XML::XPath" as
2498 the XPath engine, with full coverage of the spec.
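
A self-contained sketch of the list form, the offset form and variable interpolation, on an invented two-chapter document:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse( '<doc><chapter><para type="warning">w1</para><para>p2</para></chapter>'
            . '<chapter><para type="warning">w2</para></chapter></doc>');

# All matching elements, in document order
my @warnings = $twig->get_xpath( '//para[@type="warning"]');

# With an offset only one element is returned (offsets start at 0)
my( $second) = $twig->get_xpath( '//para[@type="warning"]', 1);

# qq{} keeps the double quotes usable; the @ must be backslashed
my $type = 'warning';
my @same = $twig->get_xpath( qq{//para[\@type="$type"]});

print scalar( @warnings), " warnings, second one: ", $second->text, "\n";
```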
2499
2503 find_nodes
2504 same as "get_xpath"
2505
2506 findnodes
2507 same as "get_xpath"
2508
2509 text @optional_options
2510 Return a string consisting of all the "PCDATA" and "CDATA" in an
2511 element, without any tags. The text is not XML-escaped: base enti‐
2512 ties such as "&" and "<" are not escaped.
2513
2514 The '"no_recurse"' option will only return the text of the element,
2515 not of any included sub-elements (same as "text_only").
2516
2517 text_only
2518 Same as "text" except that the text returned doesn't include the
2519 text of sub-elements.
2520
2521 trimmed_text
2522 Same as "text" except that the text is trimmed: leading and trail‐
2523 ing spaces are discarded, consecutive spaces are collapsed
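
The three text methods differ only in what they include and how spaces are treated, as this sketch on a made-up paragraph shows:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse( '<p>  Hello <b>big</b>   world </p>');
my $p = $twig->root;

my $all     = $p->text;           # all text, spaces kept, text of <b> included
my $own     = $p->text_only;      # only the text directly inside <p>
my $trimmed = $p->trimmed_text;   # 'Hello big world'

print "[$all]\n[$own]\n[$trimmed]\n";
```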
2524
2525 set_text ($string)
2526 Set the text for the element: if the element is a "PCDATA", just
2527 set its text, otherwise cut all the children of the element and
2528 create a single "PCDATA" child for it, which holds the text.
2529
2530 merge ($elt2)
2531 Move the content of $elt2 within the element
2532
2533 insert ($tag1, [$optional_atts1], $tag2, [$optional_atts2],...)
2534 For each tag in the list inserts an element $tag as the only child
2535 of the element. The element gets the optional attributes in
2536 "$optional_atts<n>". All children of the element are set as
2537 children of the new element. The upper level element is returned.
2538
2539 $p->insert( table => { border=> 1}, 'tr', 'td')
2540
2541 puts the content of $p in a table with a visible border, a single
2542 "tr" and a single "td", and returns the "table" element:
2543
2544 <p><table border="1"><tr><td>original content of p</td></tr></table></p>
2545
2546 wrap_in (@tag)
2547 Wrap elements in @tag as the successive ancestors of the element,
2548 returns the new element. "$elt->wrap_in( 'td', 'tr', 'table')"
2549 wraps the element as a single cell in a table for example.
2550
Optionally each tag can be followed by a hashref of attributes, that
will be set on the wrapping element:

    $elt->wrap_in( p => { class => "advisory" }, div => { class => "intro", id => "div_intro" });
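
As a rough illustration of "wrap_in", the sketch below wraps a paragraph in a table cell, a row and a table. The document and the "cell" class are assumptions chosen for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse('<doc><p>some text</p></doc>');

my $p = $t->root->first_child('p');

# the first tag becomes the parent of $p, the last one the outermost wrapper
$p->wrap_in( td => { class => 'cell' }, 'tr', 'table');

my $out = $t->root->sprint;
print "$out\n";
# prints: <doc><table><tr><td class="cell"><p>some text</p></td></tr></table></doc>
```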
2555
2556 insert_new_elt ($opt_position, $tag, $opt_atts_hashref, @opt_content)
Combines a "new" and a "paste": creates a new element using $tag,
$opt_atts_hashref and @opt_content which are arguments similar to
those for "new", then paste it, using $opt_position or
'first_child', relative to $elt.
2561
2562 Return the newly created element
2563
2564 erase
2565 Erase the element: the element is deleted and all of its children
2566 are pasted in its place.
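
The two methods above can be tried together. This is only a sketch: the sample document, the "title" element and its "id" value are invented for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse('<doc><p>a <b>bold</b> word</p></doc>');
my $doc = $t->root;

# create a title element and paste it as the first child of the root
my $title = $doc->insert_new_elt( first_child => 'title', { id => 't1' }, 'Example');

# remove the b element but keep its content in place
$doc->first_child('p')->first_child('b')->erase;

my $out = $doc->sprint;
print "$out\n";
# prints: <doc><title id="t1">Example</title><p>a bold word</p></doc>
```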
2567
2568 set_content ( $optional_atts, @list_of_elt_and_strings) (
2569 $optional_atts, '#EMPTY')
2570 Set the content for the element, from a list of strings and ele‐
2571 ments. Cuts all the element children, then pastes the list ele‐
2572 ments as the children. This method will create a "PCDATA" element
2573 for any strings in the list.
2574
2575 The $optional_atts argument is the ref of a hash of attributes. If
2576 this argument is used then the previous attributes are deleted,
2577 otherwise they are left untouched.
2578
WARNING: if you rely on IDs then you will have to set the id
yourself. At this point the element does not belong to a twig yet, so
the ID attribute is not known and it won't be stored in the ID
list.

A content of '"#EMPTY"' creates an empty element.
2585
2586 namespace ($optional_prefix)
2587 Return the URI of the namespace that $optional_prefix or the ele‐
2588 ment name belongs to. If the name doesn't belong to any namespace,
2589 "undef" is returned.
2590
2591 local_name
2592 Return the local name (without the prefix) for the element
2593
2594 ns_prefix
2595 Return the namespace prefix for the element
2596
2597 current_ns_prefixes
Return a list of namespace prefixes valid for the element. The order
of the prefixes in the list has no meaning. If the default namespace
is currently bound, '' appears in the list.
2601
2602 inherit_att ($att, @optional_tag_list)
2603 Return the value of an attribute inherited from parent tags. The
2604 value returned is found by looking for the attribute in the element
2605 then in turn in each of its ancestors. If the @optional_tag_list is
2606 supplied only those ancestors whose tag is in the list will be
2607 checked.
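
A typical use of "inherit_att" is an attribute such as a language code that is set on a container and applies to everything inside it. The sketch below assumes a made-up document with a "lang" attribute.

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse( '<doc lang="en">'
         .   '<sect><p>one</p></sect>'
         .   '<sect lang="fr"><p>deux</p></sect>'
         . '</doc>');

my ($p1, $p2) = $t->root->descendants('p');

my $lang1     = $p1->inherit_att('lang');          # found on doc
my $lang2     = $p2->inherit_att('lang');          # found on the enclosing sect
my $from_root = $p2->inherit_att('lang', 'doc');   # only doc elements are checked

print "$lang1 $lang2 $from_root\n";   # prints: en fr en
```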
2608
2609 all_children_are ($optional_condition)
2610 return 1 if all children of the element pass the $optional_condi‐
2611 tion, 0 otherwise
2612
2613 level ($optional_condition)
2614 Return the depth of the element in the twig (root is 0). If
2615 $optional_condition is given then only ancestors that match the
2616 condition are counted.
2617
2618 WARNING: in a tree created using the "twig_roots" option this will
2619 not return the level in the document tree, level 0 will be the doc‐
2620 ument root, level 1 will be the "twig_roots" elements. During the
2621 parsing (in a "twig_handler") you can use the "depth" method on the
2622 twig object to get the real parsing depth.
2623
2624 in ($potential_parent)
2625 Return true if the element is in the potential_parent ($poten‐
2626 tial_parent is an element)
2627
2628 in_context ($cond, $optional_level)
2629 Return true if the element is included in an element which passes
2630 $cond optionally within $optional_level levels. The returned value
2631 is the including element.
2632
2633 pcdata
2634 Return the text of a "PCDATA" element or "undef" if the element is
2635 not "PCDATA".
2636
2637 pcdata_xml_string
Return the text of a PCDATA element or undef if the element is not
PCDATA. The text is "XML-escaped" ('&' and '<' are replaced by
'&amp;' and '&lt;')
2641
2642 set_pcdata ($text)
2643 Set the text of a "PCDATA" element.
2644
2645 append_pcdata ($text)
2646 Add the text at the end of a "PCDATA" element.
2647
2648 is_cdata
2649 Return 1 if the element is a "CDATA" element, returns 0 otherwise.
2650
2651 is_text
2652 Return 1 if the element is a "CDATA" or "PCDATA" element, returns 0
2653 otherwise.
2654
2655 cdata
2656 Return the text of a "CDATA" element or "undef" if the element is
2657 not "CDATA".
2658
2659 cdata_string
2660 Return the XML string of a "CDATA" element, including the opening
2661 and closing markers.
2662
2663 set_cdata ($text)
2664 Set the text of a "CDATA" element.
2665
2666 append_cdata ($text)
2667 Add the text at the end of a "CDATA" element.
2668
2669 remove_cdata
2670 Turns all "CDATA" sections in the element into regular "PCDATA"
2671 elements. This is useful when converting XML to HTML, as browsers
2672 do not support CDATA sections.
2673
2674 extra_data
2675 Return the extra_data (comments and PI's) attached to an element
2676
2677 set_extra_data ($extra_data)
2678 Set the extra_data (comments and PI's) attached to an element
2679
2680 append_extra_data ($extra_data)
2681 Append extra_data to the existing extra_data before the element (if
2682 no previous extra_data exists then it is created)
2683
2684 set_asis
Set a property of the element that causes it to be output without
being XML escaped by the print functions: if it contains "a < b" it
will be output as such and not as "a &lt; b". This can be useful to
create text elements that will be output as markup. Note that all
"PCDATA" descendants of the element are also marked as having the
property (they are the ones that are actually impacted by the
change).
2692
2693 If the element is a "CDATA" element it will also be output asis,
2694 without the "CDATA" markers. The same goes for any "CDATA" descen‐
2695 dant of the element
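
The effect of the "asis" property only shows when the element is output. A minimal sketch, with an invented document and text:

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse('<doc><p>placeholder</p></doc>');
my $p = $t->root->first_child('p');

# the text contains characters that look like markup
$p->set_text('<em>hi</em>');
my $escaped = $p->sprint;      # the '<' of the text are output as &lt;

# output the text as is, so it ends up as markup in the result
$p->set_asis;
my $raw = $p->sprint;          # <p><em>hi</em></p>

print "$escaped\n$raw\n";
```

No check is made on the text, so it is up to the calling code to make sure that what is output this way is well-formed.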
2696
2697 set_not_asis
2698 Unsets the "asis" property for the element and its text descen‐
2699 dants.
2700
2701 is_asis
2702 Return the "asis" property status of the element ( 1 or "undef")
2703
2704 closed
Return true if the element has been closed. Might be useful if you
are somewhere in the tree, during the parse, and have no idea
whether a parent element is completely loaded or not.
2708
2709 get_type
2710 Return the type of the element: '"#ELT"' for "real" elements, or
2711 '"#PCDATA"', '"#CDATA"', '"#COMMENT"', '"#ENT"', '"#PI"'
2712
2713 is_elt
2714 Return the tag if the element is a "real" element, or 0 if it is
2715 "PCDATA", "CDATA"...
2716
2717 contains_only_text
2718 Return 1 if the element does not contain any other "real" element
2719
2720 contains_only ($exp)
2721 Return the list of children if all children of the element match
2722 the expression $exp
2723
2724 if( $para->contains_only( 'tt')) { ... }
2725
2726 contains_a_single ($exp)
2727 If the element contains a single child that matches the expression
2728 $exp returns that element. Otherwise returns 0.
2729
2730 is_field
2731 same as "contains_only_text"
2732
2733 is_pcdata
2734 Return 1 if the element is a "PCDATA" element, returns 0 otherwise.
2735
2736 is_ent
2737 Return 1 if the element is an entity (an unexpanded entity) ele‐
2738 ment, return 0 otherwise.
2739
2740 is_empty
2741 Return 1 if the element is empty, 0 otherwise
2742
2743 set_empty
Flags the element as empty. No further check is made, so if the
element is actually not empty the output will be messed up. The only
effect of this method is that the output will be
"<tag att="value"/>".
2748
2749 set_not_empty
Flags the element as not empty. If it is actually empty then the
element will be output as "<tag att="value"></tag>"
2752
2753 is_pi
2754 Return 1 if the element is a processing instruction ("#PI") ele‐
2755 ment, return 0 otherwise.
2756
2757 target
2758 Return the target of a processing instruction
2759
2760 set_target ($target)
2761 Set the target of a processing instruction
2762
2763 data
2764 Return the data part of a processing instruction
2765
2766 set_data ($data)
2767 Set the data of a processing instruction
2768
2769 set_pi ($target, $data)
2770 Set the target and data of a processing instruction
2771
2772 pi_string
2773 Return the string form of a processing instruction ("<?target
2774 data?>")
2775
2776 is_comment
2777 Return 1 if the element is a comment ("#COMMENT") element, return 0
2778 otherwise.
2779
2780 set_comment ($comment_text)
2781 Set the text for a comment
2782
2783 comment
2784 Return the content of a comment (just the text, not the "<!--" and
2785 "-->")
2786
2787 comment_string
2788 Return the XML string for a comment ("<!-- comment -->")
2789
2790 set_ent ($entity)
Set an (non-expanded) entity ("#ENT"). $entity is the entity text
("&ent;")
2793
2794 ent Return the entity for an entity ("#ENT") element ("&ent;")
2795
2796 ent_name
2797 Return the entity name for an entity ("#ENT") element ("ent")
2798
2799 ent_string
2800 Return the entity, either expanded if the expanded version is
2801 available, or non-expanded ("&ent;") otherwise
2802
2803 child ($offset, $optional_condition)
2804 Return the $offset-th child of the element, optionally the $off‐
2805 set-th child that matches $optional_condition. The children are
2806 treated as a list, so "$elt->child( 0)" is the first child, while
2807 "$elt->child( -1)" is the last child.
2808
2809 child_text ($offset, $optional_condition)
Return the text of a child or "undef" if the child does not
exist. Arguments are the same as "child".
2812
2813 last_child ($optional_condition)
2814 Return the last child of the element, or the last child matching
2815 $optional_condition (ie the last of the element children matching
2816 the condition).
2817
2818 last_child_text ($optional_condition)
2819 Same as "first_child_text" but for the last child.
2820
2821 sibling ($offset, $optional_condition)
Return the next or previous $offset-th sibling of the element, or
the $offset-th one matching $optional_condition. If $offset is
negative then a previous sibling is returned, if $offset is positive
then a next sibling is returned. "$offset=0" returns the element
if there is no condition or if the element matches the condition,
"undef" otherwise.
2828
2829 sibling_text ($offset, $optional_condition)
2830 Return the text of a sibling or "undef" if the sibling does not
2831 exist. Arguments are the same as "sibling".
2832
2833 prev_siblings ($optional_condition)
Return the list of previous siblings (optionally matching
$optional_condition) for the element. The elements are ordered in
document order.
2837
2838 next_siblings ($optional_condition)
Return the list of siblings (optionally matching
$optional_condition) following the element. The elements are ordered
in document order.
2842
2843 pos ($optional_condition)
2844 Return the position of the element in the children list. The first
2845 child has a position of 1 (as in XPath).
2846
2847 If the $optional_condition is given then only siblings that match
2848 the condition are counted. If the element itself does not match the
2849 condition then 0 is returned.
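
The offsets and conditions used by "child", "sibling" and "pos" can be compared on a small list. The document below is an assumption made for the sake of the example.

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse('<list><item>a</item><item>b</item><note>n</note><item>c</item></list>');
my $list = $t->root;

my $first       = $list->child(0);           # first child: item "a"
my $last        = $list->child(-1);          # last child: item "c"
my $second_item = $list->child(1, 'item');   # second item child: item "b"

my $prev      = $last->sibling(-1);          # the note element
my $prev_item = $last->sibling(-1, 'item');  # item "b", the note is skipped

my $pos      = $last->pos;                   # 4: position among all children
my $item_pos = $last->pos('item');           # 3: position among item children

print join( ' ', $first->text, $last->text, $second_item->text,
                 $prev->tag, $prev_item->text, $pos, $item_pos), "\n";
# prints: a c b note b 4 3
```

Note that "child" offsets start at 0 while "pos" starts at 1.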
2850
2851 atts
2852 Return a hash ref containing the element attributes
2853
2854 set_atts ({att1=>$att1_val, att2=> $att2_val... })
2855 Set the element attributes with the hash ref supplied as the argu‐
2856 ment
2857
2858 del_atts
2859 Deletes all the element attributes.
2860
2861 att_nb
2862 Return the number of attributes for the element
2863
2864 has_atts
Return true if the element has attributes (in fact return the
number of attributes, thus being an alias to "att_nb")
2867
2868 has_no_atts
2869 Return true if the element has no attributes, false (0) otherwise
2870
2871 att_names
2872 return a list of the attribute names for the element
2873
2874 att_xml_string ($att, $optional_quote)
2875 Return the attribute value, where '&', '<' and $quote (" by
2876 default) are XML-escaped
2877
2878 if $optional_quote is passed then it is used as the quote.
2879
2880 set_id ($id)
Set the "id" attribute of the element to the value. See "elt_id"
to change the id attribute name
2883
2884 id Gets the id attribute value
2885
2886 del_id ($id)
2887 Deletes the "id" attribute of the element and remove it from the id
2888 list for the document
2889
2890 class
2891 Return the "class" attribute for the element (methods on the
2892 "class" attribute are quite convenient when dealing with XHTML, or
2893 plain XML that will eventually be displayed using CSS)
2894
2895 set_class ($class)
2896 Set the "class" attribute for the element to $class
2897
2898 add_to_class ($class)
2899 Add $class to the element "class" attribute: the new class is added
2900 only if it is not already present. Note that classes are sorted
2901 alphabetically, so the "class" attribute can be changed even if the
2902 class is already there
2903
2904 att_to_class ($att)
2905 Set the "class" attribute to the value of attribute $att
2906
2907 add_att_to_class ($att)
2908 Add the value of attribute $att to the "class" attribute of the
2909 element
2910
2911 move_att_to_class ($att)
2912 Add the value of attribute $att to the "class" attribute of the
2913 element and delete the attribute
2914
2915 tag_to_class
2916 Set the "class" attribute of the element to the element tag
2917
2918 add_tag_to_class
2919 Add the element tag to its "class" attribute
2920
2921 set_tag_class ($new_tag)
2922 Add the element tag to its "class" attribute and sets the tag to
2923 $new_tag
2924
2925 in_class ($class)
2926 Return true (1) if the element is in the class $class (if $class is
2927 one of the tokens in the element "class" attribute)
2928
2929 tag_to_span
Change the element tag to "span" and set its class to the old tag
2931
2932 tag_to_div
Change the element tag to "div" and set its class to the old tag
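
The class methods are mostly useful when converting application-specific XML to XHTML. Here is a sketch of such a conversion; the "para" and "note" tags and the class names are invented for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse('<doc><para class="intro">text</para><note>careful</note></doc>');
my $doc  = $t->root;
my $para = $doc->first_child('para');
my $note = $doc->first_child('note');

$para->add_to_class('highlight');   # classes are kept sorted: "highlight intro"
$para->set_tag_class('p');          # old tag added to the class, tag is now p
$note->tag_to_div;                  # <div class="note">

my $out = $doc->sprint;
print "$out\n";
# prints: <doc><p class="highlight intro para">text</p><div class="note">careful</div></doc>
```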
2934
2935 DESTROY
2936 Frees the element from memory.
2937
2938 start_tag
2939 Return the string for the start tag for the element, including the
2940 "/>" at the end of an empty element tag
2941
2942 end_tag
2943 Return the string for the end tag of an element. For an empty ele‐
2944 ment, this returns the empty string ('').
2945
2946 xml_string @optional_options
2947 Equivalent to "$elt->sprint( 1)", returns the string for the entire
2948 element, excluding the element's tags (but nested element tags are
2949 present)
2950
2951 The '"no_recurse"' option will only return the text of the element,
2952 not of any included sub-elements (same as "xml_text_only").
2953
2954 inner_xml
2955 Another synonym for xml_string
2956
2957 outer_xml
Another synonym for sprint
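
The relation between the various string methods can be checked on a small element. A minimal sketch, using an invented document:

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse('<doc><p id="p1">a &lt; b, <b>really</b></p></doc>');
my $p = $t->root->first_child('p');

my $start = $p->start_tag;    # <p id="p1">
my $inner = $p->xml_string;   # a &lt; b, <b>really</b>
my $end   = $p->end_tag;      # </p>
my $outer = $p->sprint;       # the 3 previous strings, concatenated
my $text  = $p->text;         # a < b, really

print "$outer\n$text\n";
```

"xml_string" and "sprint" return XML-escaped markup, ready to be output, while "text" returns the unescaped character data.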
2959
2960 xml_text
Return the text of the element, encoded (and processed by the
current "output_filter" or "output_encoding" options), without any tag.
2963
2964 xml_text_only
2965 Same as "xml_text" except that the text returned doesn't include
2966 the text of sub-elements.
2967
2968 set_pretty_print ($style)
2969 Set the pretty print method, amongst '"none"' (default),
2970 '"nsgmls"', '"nice"', '"indented"', '"record"' and '"record_c"'
2971
2972 pretty_print styles:
2973
2974 none
2975 the default, no "\n" is used
2976
2977 nsgmls
2978 nsgmls style, with "\n" added within tags
2979
2980 nice
2981 adds "\n" wherever possible (NOT SAFE, can lead to invalid XML)
2982
2983 indented
2984 same as "nice" plus indents elements (NOT SAFE, can lead to
2985 invalid XML)
2986
2987 record
2988 table-oriented pretty print, one field per line
2989
2990 record_c
2991 table-oriented pretty print, more compact than "record", one
2992 record per line
2993
2994 set_empty_tag_style ($style)
2995 Set the method to output empty tags, amongst '"normal"' (default),
2996 '"html"', and '"expand"',
2997
2998 "normal" outputs an empty tag '"<tag/>"', "html" adds a space
2999 '"<tag />"' for elements that can be empty in XHTML and "expand"
3000 outputs '"<tag></tag>"'
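
The three styles can be compared by printing the same element with each of them. This is a sketch on a made-up document; as the setting is global, it is reset to "normal" at the end.

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse('<doc><br/></doc>');
my $doc = $t->root;

my %out;
foreach my $style (qw( normal html expand)) {
    $doc->set_empty_tag_style( $style);
    $out{$style} = $doc->sprint;
}
$doc->set_empty_tag_style( 'normal');   # the style is global: restore the default

print "$_: $out{$_}\n" foreach (qw( normal html expand));
# prints:
# normal: <doc><br/></doc>
# html: <doc><br /></doc>
# expand: <doc><br></br></doc>
```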
3001
3002 set_remove_cdata ($flag)
3003 set (or unset) the flag that forces the twig to output CDATA sec‐
3004 tions as regular (escaped) PCDATA
3005
3006 set_indent ($string)
3007 Set the indentation for the indented pretty print style (default is
3008 2 spaces)
3009
3010 set_quote ($quote)
3011 Set the quotes used for attributes. can be '"double"' (default) or
3012 '"single"'
3013
3014 cmp ($elt)
3015 Compare the order of the 2 elements in a twig.
3016
$a is the <A>...</A> element, $b is the <B>...</B> element

    document                          $a->cmp( $b)
    <A> ... </A> ... <B> ... </B>      -1
    <A> ... <B> ... </B> ... </A>      -1
    <B> ... </B> ... <A> ... </A>       1
    <B> ... <A> ... </A> ... </B>       1
    $a == $b                            0
    $a and $b not in the same tree      undef
3026
3027 before ($elt)
Return 1 if the element starts before $elt, 0 otherwise. If the 2
elements are not in the same twig then return "undef".

    if( $a->cmp( $b) == -1) { return 1; } else { return 0; }
3032
3033 after ($elt)
Return 1 if the element starts after $elt, 0 otherwise. If the 2
elements are not in the same twig then return "undef".

    if( $a->cmp( $b) == 1) { return 1; } else { return 0; }
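
The comparison methods can be tried on a small tree. In this sketch the element names are invented, and the variables are not called $a and $b to stay clear of the variables used by Perl's "sort".

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse('<doc><first><inner/></first><second/></doc>');
my $first  = $t->root->first_child('first');
my $inner  = $first->first_child('inner');
my $second = $t->root->first_child('second');

my @cmp = ( $first->cmp( $second),    # -1: first starts before second
            $second->cmp( $first),    #  1
            $first->cmp( $inner),     # -1: an element starts before its content
            $first->cmp( $first),     #  0
          );

my $is_before = $first->before( $second);   # 1
my $is_after  = $second->after( $first);    # 1

print "@cmp $is_before $is_after\n";   # prints: -1 1 -1 0 1 1
```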
3038
3039 other comparison methods
3040 lt
3041 le
3042 gt
3043 ge
3044 path
3045 Return the element context in a form similar to XPath's short form:
3046 '"/root/tag1/../tag"'
3047
3048 xpath
3049 Return a unique XPath expression that can be used to find the ele‐
3050 ment again.
3051
3052 It looks like "/doc/sect[3]/title": unique elements do not have an
3053 index, the others do.
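
The following sketch shows the difference between "path" and "xpath" on an invented document with two "sect" elements. It assumes that the expression returned by "xpath" stays within the subset understood by "get_xpath", which is the case for this simple document.

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse('<doc><sect/><sect><title>second</title></sect></doc>');
my $title = $t->root->last_child('sect')->first_child('title');

my $path  = $title->path;    # /doc/sect/title     (context only)
my $xpath = $title->xpath;   # /doc/sect[2]/title  (identifies the element)

# the xpath can be used to find the element again
my ($found) = $t->get_xpath( $xpath);

print "$path\n$xpath\n", $found->text, "\n";
```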
3054
3055 private methods
3056 Low-level methods on the twig:
3057
3058 set_parent ($parent)
3059 set_first_child ($first_child)
3060 set_last_child ($last_child)
3061 set_prev_sibling ($prev_sibling)
3062 set_next_sibling ($next_sibling)
3063 set_twig_current
3064 del_twig_current
3065 twig_current
3066 flush
3067 This method should NOT be used, always flush the twig, not an
3068 element.
3069
3070 contains_text
3071
3072 Those methods should not be used, unless of course you find some
3073 creative and interesting, not to mention useful, ways to do it.
3074
3075 cond
3076
Most of the navigation functions accept a condition as an optional
argument. The first element (or all elements for "children" or
"ancestors") that passes the condition is returned.

The condition is a single step of an XPath expression using the XPath
subset defined by "get_xpath". Additional conditions are:
3085
3086 #ELT
3087 return a "real" element (not a PCDATA, CDATA, comment or pi ele‐
3088 ment)
3089
3090 #TEXT
3091 return a PCDATA or CDATA element
3092
3093 regular expression
3094 return an element whose tag matches the regexp. The regexp has to
3095 be created with "qr//" (hence this is available only on perl 5.005
3096 and above)
3097
3098 code reference
3099 applies the code, passing the current element as argument, if the
3100 code returns true then the element is returned, if it returns false
3101 then the code is applied to the next candidate.
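
The various kinds of conditions can be passed to any navigation method. A minimal sketch using "first_child" and "children" on a made-up document:

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse('<doc>intro<a/><b1/><b2 x="1"/></doc>');
my $doc = $t->root;

my $text   = $doc->first_child( '#TEXT');        # the "intro" text
my $elt    = $doc->first_child( '#ELT');         # a, the first real element
my @b      = $doc->children( qr/^b/);            # b1 and b2
my $step   = $doc->first_child( 'b2[@x="1"]');   # XPath-like step
my $by_sub = $doc->first_child(                  # code reference
                 sub { $_[0]->is_elt && defined $_[0]->att( 'x') });

print join( ' ', $text->text, $elt->tag, scalar( @b), $step->tag, $by_sub->tag), "\n";
# prints: intro a 2 b2 b2
```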
3102
3103 XML::Twig::XPath
3104
3105 XML::Twig implements a subset of XPath through the "get_xpath" method.
3106
3107 If you want to use the whole XPath power, then you can use
3108 "XML::Twig::XPath" instead. In this case "XML::Twig" uses "XML::XPath"
3109 to execute XPath queries. You will of course need "XML::XPath"
3110 installed to be able to use "XML::Twig::XPath".
3111
3112 See XML::XPath for more information.
3113
3114 The methods you can use are:
3115
3116 findnodes ($path)
3117 return a list of nodes found by $path.
3118
3119 findnodes_as_string ($path)
3120 return the nodes found reproduced as XML. The result is not guaran‐
3121 teed to be valid XML though.
3122
3123 findvalue ($path)
3124 return the concatenation of the text content of the result nodes
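
Using these methods only requires creating the twig with "XML::Twig::XPath" instead of "XML::Twig". The sketch below assumes that an XPath engine such as "XML::XPath" is installed; the document and the expressions are examples only.

```perl
use strict;
use warnings;
use XML::Twig::XPath;

my $t = XML::Twig::XPath->new;
$t->parse('<list><item id="1">one</item><item id="2">two</item></list>');

# full XPath expressions can be used here
my @nodes = $t->findnodes( '//item[@id="2"]');
my $value = $t->findvalue( '//item[@id="1"]');

print scalar( @nodes), " node found: ", $nodes[0]->text, "\n";
print "value: $value\n";
```

The nodes returned are regular twig elements, so all the "XML::Twig::Elt" methods can be called on them.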
3125
3126 In order for "XML::XPath" to be used as the XPath engine the following
3127 methods are included in "XML::Twig":
3128
3129 in XML::Twig
3130
3131 getRootNode
3132 getParentNode
3133 getChildNodes
3134
3135 in XML::Twig::Elt
3136
3137 string_value
3138 toString
3139 getName
3140 getRootNode
3141 getNextSibling
3142 getPreviousSibling
3143 isElementNode
3144 isTextNode
3145 isPI
3146 isPINode
3147 isProcessingInstructionNode
3148 isComment
3149 isCommentNode
3150 getTarget
3151 getChildNodes
3152 getElementById
3153
3154 XML::Twig::XPath::Elt
3155
3156 The methods you can use are the same as on "XML::Twig::XPath" elements:
3157
3158 findnodes ($path)
3159 return a list of nodes found by $path.
3160
3161 findnodes_as_string ($path)
3162 return the nodes found reproduced as XML. The result is not guaran‐
3163 teed to be valid XML though.
3164
3165 findvalue ($path)
3166 return the concatenation of the text content of the result nodes
3167
3168 XML::Twig::Entity_list
3169
3170 new Create an entity list.
3171
3172 add ($ent)
3173 Add an entity to an entity list.
3174
3175 add_new_ent ($name, $val, $sysid, $pubid, $ndata)
3176 Create a new entity and add it to the entity list
3177
delete ($ent or $tag)
3179 Delete an entity (defined by its name or by the Entity object) from
3180 the list.
3181
3182 print ($optional_filehandle)
3183 Print the entity list.
3184
3185 list
3186 Return the list as an array
3187
3188 XML::Twig::Entity
3189
3190 new ($name, $val, $sysid, $pubid, $ndata)
3191 Same arguments as the Entity handler for XML::Parser.
3192
3193 print ($optional_filehandle)
3194 Print an entity declaration.
3195
3196 name
3197 Return the name of the entity
3198
3199 val Return the value of the entity
3200
3201 sysid
3202 Return the system id for the entity (for NDATA entities)
3203
3204 pubid
3205 Return the public id for the entity (for NDATA entities)
3206
3207 ndata
3208 Return true if the entity is an NDATA entity
3209
3210 text
3211 Return the entity declaration text.
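
Entities declared in the internal subset can be inspected through the entity list of the twig. This is a sketch: the "pub" entity and its value are invented, and the list is filtered by name rather than relying on its order.

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse('<!DOCTYPE doc [ <!ENTITY pub "Example Press"> ]><doc>&pub;</doc>');

# entity_list returns the XML::Twig::Entity_list object for the twig
my ($ent) = grep { $_->name eq 'pub' } $t->entity_list->list;

my $name = $ent->name;
my $val  = $ent->val;

print "$name => $val\n";    # prints: pub => Example Press
print $ent->text, "\n";     # the entity declaration
```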
3212
Additional examples (and a complete tutorial) can be found on the
XML::Twig Page <http://www.xmltwig.com/xmltwig/>

To figure out what flush does, call the following script with an XML
file and an element name as arguments:
3219
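    use strict;
    use warnings;
    use XML::Twig;

    # usage: script.pl file.xml element_name
    my ($file, $elt)= @ARGV;

    # flush the twig each time an $elt element is closed, and mark the spot
    my $t= XML::Twig->new( twig_handlers =>
               { $elt => sub { $_[0]->flush; print "\n[flushed here]\n"; } });
    $t->parsefile( $file, ErrorContext => 2);
    $t->flush;      # output the end of the document
    print "\n";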
3228
3230 Subclassing XML::Twig
3231
3232 Useful methods:
3233
3234 elt_class
In order to subclass "XML::Twig" you will probably also need to
subclass "XML::Twig::Elt". Use the "elt_class" option when you create
the "XML::Twig" object to get the elements created in a different
class (which should be a subclass of "XML::Twig::Elt").
3239
3240 add_options
If you inherit the "XML::Twig" new method but want to add more options
to it, you can use this method to prevent XML::Twig from issuing
warnings for those additional options.
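
A minimal sketch of the "elt_class" option follows. The "My::Elt" class and its "shout" method are hypothetical, they only show that elements are created in the subclass.

```perl
use strict;
use warnings;
use XML::Twig;

# hypothetical element class, adding one method to XML::Twig::Elt
package My::Elt;
use parent -norequire, 'XML::Twig::Elt';

sub shout {
    my $elt = shift;
    return uc $elt->text;
}

package main;

my $t = XML::Twig->new( elt_class => 'My::Elt');
$t->parse('<doc>hello</doc>');

my $class = ref $t->root;       # My::Elt
my $loud  = $t->root->shout;    # HELLO

print "$class: $loud\n";
```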
3244
3245 DTD Handling
3246
3247 There are 3 possibilities here. They are:
3248
3249 No DTD
3250 No doctype, no DTD information, no entity information, the world is
3251 simple...
3252
3253 Internal DTD
3254 The XML document includes an internal DTD, and maybe entity decla‐
3255 rations.
3256
3257 If you use the load_DTD option when creating the twig the DTD
3258 information and the entity declarations can be accessed.
3259
The DTD and the entity declarations will be "flush"'ed (or
"print"'ed) either as is (if they have not been modified) or as
reconstructed (poorly, comments are lost, order is not kept, due to
its content this DTD should not be viewed by anyone) if they have
been modified. You can also modify them directly by changing the
"$twig->{twig_doctype}->{internal}" field (straight from
XML::Parser, see the "Doctype" handler doc)
3267
3268 External DTD
3269 The XML document includes a reference to an external DTD, and maybe
3270 entity declarations.
3271
If you use the "load_DTD" option when creating the twig the DTD
information and the entity declarations can be accessed. The entity
declarations will be "flush"'ed (or "print"'ed) either as is (if they
have not been modified) or as reconstructed (badly, comments are
lost, order is not kept).
3277
You can change the doctype through the "$twig->set_doctype" method
and print the dtd through the "$twig->dtd_text" or
"$twig->dtd_print" methods.
3282
3283 If you need to modify the entity list this is probably the easiest
3284 way to do it.
3285
3286 Flush
3287
3288 If you set handlers and use "flush", do not forget to flush the twig
3289 one last time AFTER the parsing, or you might be missing the end of the
3290 document.
3291
3292 Remember that element handlers are called when the element is CLOSED,
3293 so if you have handlers for nested elements the inner handlers will be
called first. This makes it trickier than it would seem to number
nested clauses, for example.
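
The order in which handlers are called can be checked with a small script. This is a sketch: the "clause" elements and their "id" attributes are made up, and the handler just records the order in which the elements are closed.

```perl
use strict;
use warnings;
use XML::Twig;

my @closed;
my $t = XML::Twig->new(
    twig_handlers => {
        # called each time a clause element is closed
        clause => sub { my( $twig, $elt)= @_; push @closed, $elt->att( 'id'); },
    },
);
$t->parse( '<doc>'
         .   '<clause id="1"><clause id="1.1"/><clause id="1.2"/></clause>'
         .   '<clause id="2"/>'
         . '</doc>');

print join( ', ', @closed), "\n";   # prints: 1.1, 1.2, 1, 2
```

The outer clause is seen after the clauses it contains, so a counter incremented in the handler would not follow the document order of the start tags.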
3296
3298 entity handling
3299 Due to XML::Parser behaviour, non-base entities in attribute values
3300 disappear: "att="val&ent;"" will be turned into "att => val",
3301 unless you use the "keep_encoding" argument to "XML::Twig->new"
3302
3303 DTD handling
The DTD handling methods are quite buggy. No one uses them and it
seems very difficult to get them to work in all cases, including
with several slightly incompatible versions of XML::Parser and of
libexpat.
3308
3309 Basically you can read the DTD, output it back properly, and update
3310 entities, but not much more.
3311
So use XML::Twig with standalone documents, or with documents
referring to an external DTD, but don't expect it to properly parse
and even output back the DTD.
3315
3316 memory leak
If you use a lot of twigs you might find that you leak quite a lot
of memory (about 2K per twig). You can use the "dispose" method
to free that memory after you are done.
3320
3321 If you create elements the same thing might happen, use the
3322 "delete" method to get rid of them.
3323
3324 Alternatively installing the "Scalar::Util" (or "WeakRef") module
3325 on a version of Perl that supports it (>5.6.0) will get rid of the
3326 memory leaks automagically.
3327
3328 ID list
3329 The ID list is NOT updated when elements are cut or deleted.
3330
3331 change_gi
3332 This method will not function properly if you do:
3333
3334 $twig->change_gi( $old1, $new);
3335 $twig->change_gi( $old2, $new);
3336 $twig->change_gi( $new, $even_newer);
3337
3338 sanity check on XML::Parser method calls
3339 XML::Twig should really prevent calls to some XML::Parser methods,
3340 especially the "setHandlers" method.
3341
3342 pretty printing
3343 Pretty printing (at least using the '"indented"' style) is hard to
3344 get right! Only elements that belong to the document will be prop‐
3345 erly indented. Printing elements that do not belong to the twig
3346 makes it impossible for XML::Twig to figure out their depth, and
3347 thus their indentation level.
3348
3349 Also there is an unavoidable bug when using "flush" and pretty
3350 printing for elements with mixed content that start with an embed‐
3351 ded element:
3352
3353 <elt><b>b</b>toto<b>bold</b></elt>
3354
3355 will be output as
3356
3357 <elt>
3358 <b>b</b>toto<b>bold</b></elt>
3359
3360 if you flush the twig when you find the "<b>" element
3361
These are the things that can mess up calling code, especially if
threaded. They might also cause problems under mod_perl.
3365
3366 Exported constants
Whether you want them or not you get them! These are subroutines to
use as constants when creating or testing elements
3369
3370 PCDATA return '#PCDATA'
3371 CDATA return '#CDATA'
3372 PI return '#PI', I had the choice between PROC and PI :--(
3373
3374 Module scoped values: constants
3375 these should cause no trouble:
3376
    %base_ent= ( '>' => '&gt;',
                 '<' => '&lt;',
                 '&' => '&amp;',
                 "'" => '&apos;',
                 '"' => '&quot;',
               );
    CDATA_START   = "<![CDATA[";
    CDATA_END     = "]]>";
    PI_START      = "<?";
    PI_END        = "?>";
    COMMENT_START = "<!--";
    COMMENT_END   = "-->";
3389
3390 pretty print styles
3391
3392 ( $NSGMLS, $NICE, $INDENTED, $INDENTED_C, $WRAPPED, $RECORD1, $RECORD2)= (1..7);
3393
3394 empty tag output style
3395
3396 ( $HTML, $EXPAND)= (1..2);
3397
3398 Module scoped values: might be changed
3399 Most of these deal with pretty printing, so the worst that can hap‐
3400 pen is probably that XML output does not look right, but is still
3401 valid and processed identically by XML processors.
3402
$empty_tag_style can mess up HTML browsers though and changing $ID
would most likely create problems.
3405
3406 $pretty=0; # pretty print style
3407 $quote='"'; # quote for attributes
3408 $INDENT= ' '; # indent for indented pretty print
3409 $empty_tag_style= 0; # how to display empty tags
3410 $ID # attribute used as an id ('id' by default)
3411
3412 Module scoped values: definitely changed
3413 These 2 variables are used to replace tags by an index, thus saving
3414 some space when creating a twig. If they really cause you too much
3415 trouble, let me know, it is probably possible to create either a
3416 switch or at least a version of XML::Twig that does not perform
3417 this optimisation.
3418
3419 %gi2index; # tag => index
3420 @index2gi; # list of tags
3421
3422 If you need to manipulate all those values, you can use the following
3423 methods on the XML::Twig object:
3424
3425 global_state
Return a hashref with all the global variables used by XML::Twig
3427
3428 The hash has the following fields: "pretty", "quote", "indent",
3429 "empty_tag_style", "keep_encoding", "expand_external_entities",
3430 "output_filter", "output_text_filter", "keep_atts_order"
3431
3432 set_global_state ($state)
3433 Set the global state, $state is a hashref
3434
3435 save_global_state
3436 Save the current global state
3437
3438 restore_global_state
Restore the previously saved (using "save_global_state") state
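
A possible use of these methods is to change an output setting temporarily. The sketch below changes the attribute quote, which is one of the global values, and then restores it; the document is invented for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse('<doc a="1"/>');
my $doc = $t->root;

$t->save_global_state;            # remember the current settings

$doc->set_quote( 'single');       # global: affects all twigs
my $single = $doc->sprint;        # <doc a='1'/>

$t->restore_global_state;         # back to the saved settings
my $restored = $doc->sprint;      # <doc a="1"/>

print "$single\n$restored\n";
```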
3440
3442 SAX handlers
3443 Allowing XML::Twig to work on top of any SAX parser
3444
3445 multiple twigs are not well supported
3446 A number of twig features are just global at the moment. These
3447 include the ID list and the "tag pool" (if you use "change_gi" then
3448 you change the tag for ALL twigs).
3449
A future version will try to support this while trying not to be too
hard on performance (at least when a single twig is used!).
3452
3454 Michel Rodriguez <mirod@xmltwig.com>
3455
3457 This library is free software; you can redistribute it and/or modify it
3458 under the same terms as Perl itself.
3459
3460 Bug reports should be sent using: RT
3461 <http://rt.cpan.org/NoAuth/Bugs.html?Dist=XML-Twig>
3462
3463 Comments can be sent to mirod@xmltwig.com
3464
The XML::Twig page is at <http://www.xmltwig.com/xmltwig/>. It includes
the development version of the module, a slightly better version of the
documentation, examples and a tutorial: Processing XML efficiently
with Perl and XML::Twig
<http://www.xmltwig.com/xmltwig/tutorial/index.html>
3470
3472 Complete docs, including a tutorial, examples, an easier to use HTML
3473 version of the docs, a quick reference card and a FAQ are available at
3474 <http://www.xmltwig.com/xmltwig/>
3475
XML::Parser, XML::Parser::Expat, XML::XPath, Encode, Text::Iconv,
Scalar::Util
3478
3479 Alternative Modules
3480
XML::Twig is not the only XML processing module available on CPAN (far
from it!).
3483
3484 The main alternative I would recommend is XML::LibXML.
3485
3486 Here is a quick comparison of the 2 modules:
3487
3488 XML::LibXML, actually "libxml2" on which it is based, sticks to the
3489 standards, and implements a good number of them in a rather strict way:
3490 XML, XPath, DOM, RelaxNG, I must be forgetting a couple (XInclude?). It
3491 is fast and rather frugal memory-wise.
3492
3493 XML::Twig is older: when I started writing it XML::Parser/expat was the
3494 only game in town. It implements XML and that's about it (plus a subset
3495 of XPath, and you can use XML::Twig::XPath if you have XML::XPath
3496 installed for full support). It is slower and requires more memory for
3497 a full tree than XML::LibXML. On the plus side (yes, there is a plus
3498 side!) it lets you process a big document in chunks, and thus lets you
3499 tackle documents that couldn't be loaded in memory by XML::LibXML, and
3500 it offers a lot (and I mean a LOT!) of higher-level methods, for
3501 everything from adding structure to "low-level" XML to shortcuts for XHTML
3502 conversions and more. It also DWIMs quite a bit, getting comments and
3503 non-significant whitespace out of the way but preserving them in the
3504 output, for example. As it does not stick to the DOM, it also usually
3505 leads to shorter code than XML::LibXML.
3506
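 To illustrate what processing "in chunks" looks like, here is a minimal
 sketch. It is not taken from the module's distribution: the sample
 document, the "entry" tag and the "level" attribute are made up for the
 example. It relies only on the "twig_handlers" option and on the "att",
 "text", "purge" and "parse" methods.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input; a real use case would be a file too big for memory,
# parsed with parsefile() instead of parse().
my $xml = '<log>'
        . '<entry level="info">started</entry>'
        . '<entry level="error">disk full</entry>'
        . '<entry level="info">stopped</entry>'
        . '</log>';

my @errors;

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called each time an <entry> element has been completely parsed.
        entry => sub {
            my ($t, $entry) = @_;
            my $level = $entry->att('level') || '';
            push @errors, $entry->text if $level eq 'error';
            $t->purge;    # discard what has been parsed so far
        },
    },
);

$twig->parse($xml);

print "$_\n" for @errors;    # prints "disk full"
```

 Because the handler calls "purge" after each entry, at most one entry is
 kept in memory at a time. Use "flush" instead of "purge" if the processed
 part of the document should also be output.
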
3507 Beyond the pure features of the 2 modules, XML::LibXML seems to be
3508 preferred by "XML-purists", while XML::Twig seems to be more used by
3509 Perl Hackers who have to deal with XML. As you have noted, XML::Twig
3510 also comes with quite a lot of docs, but I am sure if you ask for help
3511 about XML::LibXML here or on Perlmonks you will get answers.
3512
3513 Note that it is actually quite hard for me to compare the 2 modules: on
3514 one hand I know XML::Twig inside-out and I can get it to do pretty much
3515 anything I need to (or I improve it ;--), while I have a very basic
3516 knowledge of XML::LibXML. So feature-wise, I'd rather use XML::Twig
3517 ;--). On the other hand, I am painfully aware of some of the
3518 deficiencies, potential bugs and plain ugly code that lurk in XML::Twig, even
3519 though you are unlikely to be affected by them (unless for example you
3520 need to change the DTD of a document programmatically), while I haven't
3521 looked much into XML::LibXML so it still looks shiny and clean to me.
3522
3523 That said, if you need to process a document that is too big to fit in
3524 memory and XML::Twig is too slow for you, my reluctant advice would be to
3525 use "bare" XML::Parser. It won't be as easy to use as XML::Twig:
3526 basically with XML::Twig you trade some speed (depending on what you do,
3527 from a factor 3 to... none) for ease-of-use, but it will be easier IMHO
3528 than using SAX (albeit not standard), and at this point a LOT faster
3529 (see the last test in
3530 <http://www.xmltwig.com/article/simple_benchmark/>).
3531
3532
3533
3534perl v5.8.8 2007-02-13 Twig(3)