Twig(3) User Contributed Perl Documentation Twig(3)

XML::Twig - A perl module for processing huge XML documents in tree
mode.
8
10 Note that this documentation is intended as a reference to the module.
11
12 Complete docs, including a tutorial, examples, an easier to use HTML
13 version, a quick reference card and a FAQ are available at
14 <http://www.xmltwig.com/xmltwig>
15
16 Small documents (loaded in memory as a tree):
17
18 my $twig=XML::Twig->new(); # create the twig
19 $twig->parsefile( 'doc.xml'); # build it
20 my_process( $twig); # use twig methods to process it
21 $twig->print; # output the twig
22
23 Huge documents (processed in combined stream/tree mode):
24
25 # at most one div will be loaded in memory
    my $twig=XML::Twig->new(
      twig_handlers =>
        { title   => sub { $_->set_tag( 'h2') }, # change title tags to h2
          para    => sub { $_->set_tag( 'p')  }, # change para to p
          hidden  => sub { $_->delete;        }, # remove hidden elements
          list    => \&my_list_process,          # process list elements
          div     => sub { $_[0]->flush;      }, # output and free memory
        },
      pretty_print => 'indented',                # output will be nicely formatted
      empty_tags   => 'html',                    # outputs <empty_tag />
                           );
    $twig->parsefile( 'doc.xml');                # parse the document, calling the handlers
    $twig->flush;                                # flush the end of the document
38
39 See XML::Twig 101 for other ways to use the module, as a filter for
40 example.
41
This module provides a way to process XML documents. It is built on top
of "XML::Parser".
45
46 The module offers a tree interface to the document, while allowing you
47 to output the parts of it that have been completely processed.
48
It allows minimal resource (CPU and memory) usage by building the tree
only for the parts of the documents that need actual processing,
through the use of the "twig_roots" and "twig_print_outside_roots"
options. The "finish" and "finish_print" methods also help to
improve performance.

XML::Twig tries to make simple things easy, so it does its best to
take care of a lot of the (usually) annoying (but sometimes necessary)
features that come with XML and XML::Parser.
58
60 XML::Twig can be used either on "small" XML documents (that fit in
61 memory) or on huge ones, by processing parts of the document and
62 outputting or discarding them once they are processed.
63
64 Loading an XML document and processing it
    my $t= XML::Twig->new();
    $t->parse( '<d><title>title</title><para>p 1</para><para>p 2</para></d>');
    my $root= $t->root;
    $root->set_tag( 'html');                  # change doc to html
    my $title= $root->first_child( 'title');  # get the title
    $title->set_tag( 'h1');                   # turn it into h1
    my @para= $root->children( 'para');       # get the para children
    foreach my $para (@para)
      { $para->set_tag( 'p'); }               # turn them into p
    $t->print;                                # output the document
75
76 Other useful methods include:
77
att: "$elt->{'att'}->{'foo'}" returns the "foo" attribute for an
element,

set_att: "$elt->set_att( foo => "bar")" sets the "foo" attribute to
the "bar" value,

next_sibling: "$elt->{next_sibling}" returns the next sibling in the
document (in the example "$title->{next_sibling}" is the first "para";
you can also, and actually should, use "$elt->next_sibling( 'para')" to
get it).

The document can also be transformed through the use of the cut, copy,
paste and move methods: "$title->cut; $title->paste( after => $p);" for
example.
92
93 And much, much more, see XML::Twig::Elt.
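
The accessor and cut-and-paste methods mentioned above can be sketched together as follows. This is a minimal illustration, not part of the original examples: it reuses the small document from the previous snippet, and the "nb" attribute and the variable names are assumptions made for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new();
$t->parse( '<d><title>title</title><para>p 1</para><para>p 2</para></d>');

my $root  = $t->root;
my $title = $root->first_child( 'title');

$title->set_att( nb => 1);                       # set an attribute (hypothetical name)
my $nb = $title->att( 'nb');                     # read it back through the accessor

my $first_para = $title->next_sibling( 'para');  # next sibling that is a para
$title->cut;                                     # detach the title from the tree
$title->paste( after => $first_para);            # re-attach it after the first para

my @tags = map { $_->tag } $root->children;      # tags of the root's children, in order
```

After the move the title sits between the two "para" elements.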
94
95 Processing an XML document chunk by chunk
One of the strengths of XML::Twig is that it lets you work with files
that do not fit in memory (BTW storing an XML document in memory as a
tree is quite memory-expensive, the expansion factor being often around
10).
100
101 To do this you can define handlers, that will be called once a specific
102 element has been completely parsed. In these handlers you can access
103 the element and process it as you see fit, using the navigation and the
104 cut-n-paste methods, plus lots of convenient ones like "prefix ". Once
105 the element is completely processed you can then "flush " it, which
106 will output it and free the memory. You can also "purge " it if you
107 don't need to output it (if you are just extracting some data from the
108 document for example). The handler will be called again once the next
109 relevant element has been parsed.
110
    my $t= XML::Twig->new( twig_handlers =>
                             { section => \&section,
                               para    => sub { $_->set_tag( 'p'); }
                             },
                         );
    $t->parsefile( 'doc.xml');
    $t->flush; # don't forget to flush one last time in the end or anything
               # after the last </section> tag will not be output

    # the handler is called once a section is completely parsed, ie when
    # the end tag for section is found, it receives the twig itself and
    # the element (including all its sub-elements) as arguments
    sub section
      { my( $t, $section)= @_;      # arguments for all twig_handlers
        $section->set_tag( 'div');  # change the tag name, my favourite method...
        # let's use the attribute nb as a prefix to the title
        my $title= $section->first_child( 'title'); # find the title
        my $nb= $title->{'att'}->{'nb'}; # get the attribute
        $title->prefix( "$nb - ");  # easy isn't it?
        $section->flush;            # outputs the section and frees memory
      }
132
133 There is of course more to it: you can trigger handlers on more
134 elaborate conditions than just the name of the element, "section/title"
135 for example.
136
137 my $t= XML::Twig->new( twig_handlers =>
138 { 'section/title' => sub { $_->print } }
139 )
140 ->parsefile( 'doc.xml');
141
142 Here "sub { $_->print }" simply prints the current element ($_ is
143 aliased to the element in the handler).
144
145 You can also trigger a handler on a test on an attribute:
146
    my $t= XML::Twig->new( twig_handlers =>
                             { 'section[@level="1"]' => sub { $_->print } }
                         )
                   ->parsefile( 'doc.xml');
151
You can also use "start_tag_handlers" to process an element as soon as
the start tag is found. Besides "prefix" you can also use "suffix".
154
155 Processing just parts of an XML document
The twig_roots mode builds only the required sub-trees from the
document. Anything outside of the twig roots will just be ignored:
158
159 my $t= XML::Twig->new(
160 # the twig will include just the root and selected titles
161 twig_roots => { 'section/title' => \&print_n_purge,
162 'annex/title' => \&print_n_purge
163 }
164 );
165 $t->parsefile( 'doc.xml');
166
167 sub print_n_purge
168 { my( $t, $elt)= @_;
169 print $elt->text; # print the text (including sub-element texts)
170 $t->purge; # frees the memory
171 }
172
You can use that mode when you want to process parts of a document but
are not interested in the rest and you don't want to pay the price,
either in time or memory, to build the tree for it.
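
A self-contained variant of the example above, parsing a string instead of a file so it can be run as is. The sample document and the $collect name are assumptions made for illustration; the titles are collected in an array rather than printed.

```perl
use strict;
use warnings;
use XML::Twig;

# hypothetical sample document
my $xml = '<doc>'
        . '<section><title>Intro</title><para>long text</para></section>'
        . '<annex><title>Notes</title><para>more text</para></annex>'
        . '</doc>';

my @titles;
my $collect = sub
  { my( $t, $elt)= @_;
    push @titles, $elt->text;   # keep the text of the title
    $t->purge;                  # free the memory used so far
    return 1;
  };

my $t = XML::Twig->new(
    twig_roots => { 'section/title' => $collect,
                    'annex/title'   => $collect,
                  },
);
$t->parse( $xml);               # the para elements are never built
```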
176
177 Building an XML filter
178 You can combine the "twig_roots" and the "twig_print_outside_roots"
179 options to build filters, which let you modify selected elements and
180 will output the rest of the document as is.
181
182 This would convert prices in $ to prices in Euro in a document:
183
184 my $t= XML::Twig->new(
185 twig_roots => { 'price' => \&convert, }, # process prices
186 twig_print_outside_roots => 1, # print the rest
187 );
188 $t->parsefile( 'doc.xml');
189
    sub convert
      { my( $t, $price)= @_;
        my $currency= $price->{'att'}->{'currency'};  # get the currency
        if( $currency eq 'USD')
          { my $usd_price= $price->text;              # get the price
            # %rate is just a conversion table
            my $euro_price= $usd_price * $rate{usd2euro};
            $price->set_text( $euro_price);           # set the new price
            $price->set_att( currency => 'EUR');      # don't forget this!
          }
        $price->print;                                # output the price
      }
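
The filter above can be tried on a string with a sketch like the following. The sample document, the rate value and the in-memory file handle are assumptions made for illustration: the output is sent to a scalar through a file handle passed to "twig_print_outside_roots", so that the result can be inspected instead of going to STDOUT.

```perl
use strict;
use warnings;
use XML::Twig;

my %rate = ( usd2euro => 0.5);   # hypothetical rate, chosen to keep the arithmetic exact
my $xml  = '<items><item><name>widget</name>'
         . '<price currency="USD">10</price></item></items>';

my $out = '';
open( my $fh, '>', \$out) or die "cannot open in-memory file handle: $!";

my $t = XML::Twig->new(
    twig_roots               => { price => sub { convert( @_, $fh) } },
    twig_print_outside_roots => $fh,   # the rest of the document goes to $fh
);
$t->parse( $xml);
close $fh;

sub convert
  { my( $t, $price, $fh)= @_;
    if( $price->att( 'currency') eq 'USD')
      { $price->set_text( $price->text * $rate{usd2euro});  # convert the amount
        $price->set_att( currency => 'EUR');                # update the currency
      }
    $price->print( $fh);               # same file handle as the rest of the output
  }
```

Everything outside the "price" elements is copied to $out unchanged, while each price goes through "convert" first.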
202
203 XML::Twig and various versions of Perl, XML::Parser and expat:
204 Before being uploaded to CPAN, XML::Twig 3.22 has been tested under the
205 following environments:
206
linux-x86
perl 5.6.2, expat 1.95.8, XML::Parser 2.34
perl 5.8.0, expat 1.95.8, XML::Parser 2.34
perl 5.8.7, expat 1.95.8, XML::Parser 2.34
210
211 Solaris
212 perl 5.6.1, expat 1.95.2, XML::Parser 2.31
213
214 XML::Twig is a lot more sensitive to variations in versions of perl,
215 XML::Parser and expat than to the OS, so this should cover some
216 reasonable configurations.
217
218 The "recommended configuration" is perl 5.8.3+ (for good Unicode
219 support), XML::Parser 2.31+ and expat 1.95.5+
220
221 See <http://testers.cpan.org/search?request=dist&dist=XML-Twig> for the
222 CPAN testers reports on XML::Twig, which list all tested
223 configurations.
224
225 An Atom feed of the CPAN Testers results is available at
226 <http://xmltwig.com/rss/twig_testers.rss>
227
228 Finally:
229
230 XML::Twig does NOT work with expat 1.95.4
231 XML::Twig only works with XML::Parser 2.27 in perl 5.6.*
232 Note that I can't compile XML::Parser 2.27 anymore, so I can't
233 guarantee that it still works
234
235 XML::Parser 2.28 does not really work
236
237 When in doubt, upgrade expat, XML::Parser and Scalar::Util
238
239 Finally, for some optional features, XML::Twig depends on some
240 additional modules. The complete list, which depends somewhat on the
241 version of Perl that you are running, is given by running
242 "t/zz_dump_config.t"
243
245 Whitespaces
Whitespaces that look non-significant are discarded. This behaviour
can be controlled using the "keep_spaces", "keep_spaces_in" and
"discard_spaces_in" options.
249
250 Encoding
251 You can specify that you want the output in the same encoding as
252 the input (provided you have valid XML, which means you have to
253 specify the encoding either in the document or when you create the
254 Twig object) using the "keep_encoding " option
255
256 You can also use "output_encoding" to convert the internal UTF-8
257 format to the required encoding.
258
259 Comments and Processing Instructions (PI)
Comments and PIs can be hidden from the processing, but still
appear in the output (they are carried by the "real" element closest
to them).
263
264 Pretty Printing
265 XML::Twig can output the document pretty printed so it is easier to
266 read for us humans.
267
268 Surviving an untimely death
269 XML parsers are supposed to react violently when fed improper XML.
270 XML::Parser just dies.
271
272 XML::Twig provides the "safe_parse " and the "safe_parsefile "
273 methods which wrap the parse in an eval and return either the
274 parsed twig or 0 in case of failure.
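
A minimal sketch of "safe_parse", assuming two small hypothetical input strings, one well-formed and one not:

```perl
use strict;
use warnings;
use XML::Twig;

my $good = XML::Twig->new->safe_parse( '<doc><p>well-formed</p></doc>');
my $bad  = XML::Twig->new->safe_parse( '<doc><p>missing end tag</doc>');

# safe_parse returns a false value on error instead of dying
my $status = $bad ? 'parsed' : 'failed';

# the twig returned on success can be used as usual
my $text = $good ? $good->root->first_child( 'p')->text : undef;
```

The script keeps running after the failed parse, so the caller decides how to report the error.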
275
276 Private attributes
277 Attributes with a name starting with # (illegal in XML) will not be
278 output, so you can safely use them to store temporary values during
279 processing. Note that you can store anything in a private
280 attribute, not just text, it's just a regular Perl variable, so a
281 reference to an object or a huge data structure is perfectly fine.
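
A short sketch of a private attribute holding a data structure. The attribute names "#seen" and "status" and the sample document are assumptions made for illustration.

```perl
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new;
$t->parse( '<doc><item>a</item></doc>');

my $item = $t->root->first_child( 'item');
$item->set_att( '#seen' => { count => 3 });   # private: holds a hash reference
$item->set_att( status  => 'ok');             # regular attribute

my $count = $item->att( '#seen')->{count};    # still available during processing
my $xml   = $t->root->sprint;                 # serialized without the private attribute
```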
282
XML::Twig uses a very limited number of classes. The ones you are most
likely to use are "XML::Twig" of course, which represents a complete
XML document, including the document itself (the root of the document
itself is "root"), its handlers, its input or output filters... The
other main class is "XML::Twig::Elt", which models an XML element.
Element here has a very wide definition: it can be a regular element,
but also text, with an element "tag" of "#PCDATA" (or "#CDATA"), an
entity (tag is "#ENT"), a Processing Instruction ("#PI") or a comment
("#COMMENT").
293
294 Those are the 2 commonly used classes.
295
You might want to look at the "elt_class" option if you want to subclass
"XML::Twig::Elt".

Attributes are just attached to their parent element, they are not
objects per se. (Please use the provided methods "att" and "set_att" to
access them; if you access them as a hash, then your code becomes
implementation dependent and might break in the future).
303
304 Other classes that are seldom used are "XML::Twig::Entity_list" and
305 "XML::Twig::Entity".
306
307 If you use "XML::Twig::XPath" instead of "XML::Twig", elements are then
308 created as "XML::Twig::XPath::Elt"
309
311 XML::Twig
312 A twig is a subclass of XML::Parser, so all XML::Parser methods can be
313 called on a twig object, including parse and parsefile. "setHandlers"
314 on the other hand cannot be used, see "BUGS "
315
316 new This is a class method, the constructor for XML::Twig. Options are
317 passed as keyword value pairs. Recognized options are the same as
318 XML::Parser, plus some XML::Twig specifics.
319
320 New Options:
321
322 twig_handlers
This argument consists of a hash "{ expression => \&handler }",
where expression is an XPath-like expression (+ some others).
325
326 XPath expressions are limited to using the child and descendant
327 axis (indeed you can't specify an axis), and predicates cannot
328 be nested. You can use the "string", or "string(<tag>)"
329 function (except in "twig_roots" triggers).
330
331 Additionally you can use regexps (/ delimited) to match
332 attribute and string values.
333
334 Examples:
335
336 foo
337 foo/bar
338 foo//bar
339 /foo/bar
340 /foo//bar
341 /foo/bar[@att1 = "val1" and @att2 = "val2"]/baz[@a >= 1]
342 foo[string()=~ /^duh!+/]
343 /foo[string(bar)=~ /\d+/]/baz[@att != 3]
344
345 #CDATA can be used to call a handler for a CDATA. #COMMENT can
346 be used to call a handler for comments
347
348 Some additional (non-XPath) expressions are also provided for
349 convenience:
350
351 processing instructions
352 '?' or '#PI' triggers the handler for any processing
353 instruction, and '?<target>' or '#PI <target>' triggers a
354 handler for processing instruction with the given target(
355 ex: '#PI xml-stylesheet').
356
357 level(<level>)
358 Triggers the handler on any element at that level in the
359 tree (root is level 1)
360
361 _all_
362 Triggers the handler for all elements in the tree
363
364 _default_
365 Triggers the handler for each element that does NOT have
366 any other handler.
367
368 Expressions are evaluated against the input document. Which
369 means that even if you have changed the tag of an element
370 (changing the tag of a parent element from a handler for
371 example) the change will not impact the expression evaluation.
372 There is an exception to this: "private" attributes (which name
373 start with a '#', and can only be created during the parsing,
374 as they are not valid XML) are checked against the current
375 twig.
376
Handlers are triggered in a fixed order, sorted by their type
(xpath expressions first, then regexps, then level), then by
whether they specify a full path (starting at the root element)
or not, then by number of steps in the expression, then by
number of predicates, then by number of tests in predicates.
Handlers where the last step does not specify a tag
("foo/bar/*") are triggered after other XPath handlers.
Finally "_all_" handlers are triggered last.
385
386 Important: once a handler has been triggered if it returns 0
387 then no other handler is called, except a "_all_" handler which
388 will be called anyway.
389
If a handler returns a true value and other handlers apply,
then the next applicable handler will be called, and so on for
each remaining handler. The exception to that rule is when the
"do_not_chain_handlers" option is set, in which case only the
first handler will be called.
395
396 Note that it might be a good idea to explicitly return a short
397 true value (like 1) from handlers: this ensures that other
398 applicable handlers are called even if the last statement for
399 the handler happens to evaluate to false. This might also
400 speedup the code by avoiding the result of the last statement
401 of the code to be copied and passed to the code managing
402 handlers. It can really pay to have 1 instead of a long string
403 returned.
404
405 When an element is CLOSED the corresponding handler is called,
406 with 2 arguments: the twig and the "Element ". The twig
407 includes the document tree that has been built so far, the
408 element is the complete sub-tree for the element. This means
409 that handlers for inner elements are called before handlers for
410 outer elements.
411
412 $_ is also set to the element, so it is easy to write inline
413 handlers like
414
415 para => sub { $_->set_tag( 'p'); }
416
417 Text is stored in elements whose tag is #PCDATA (due to mixed
418 content, text and sub-element in an element there is no way to
419 store the text as just an attribute of the enclosing element).
420
421 Warning: if you have used purge or flush on the twig the
422 element might not be complete, some of its children might have
423 been entirely flushed or purged, and the start tag might even
424 have been printed (by "flush") already, so changing its tag
425 might not give the expected result.
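
The calling order described above (inner elements first) and a handler triggered on an attribute test can be sketched as follows. The sample document, the "id" attribute and the @order array are assumptions made for illustration.

```perl
use strict;
use warnings;
use XML::Twig;

my @order;   # records the handler calls as they happen

my $t = XML::Twig->new(
    twig_handlers =>
      { # $_ is the element in inline handlers
        para                  => sub { push @order, 'para:' . $_->text; 1; },
        # called only for sections with level="1", once they are closed
        'section[@level="1"]' => sub { my( $twig, $section)= @_;
                                       push @order, 'section:' . $section->att( 'id');
                                       1;   # explicit true value, see above
                                     },
      },
);

$t->parse(  '<doc>'
          . '<section level="1" id="s1"><para>a</para><para>b</para></section>'
          . '<section level="2" id="s2"><para>c</para></section>'
          . '</doc>');
```

The handler for the first section runs after the handlers for its two "para" children, and the level 2 section triggers no section handler at all.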
426
427 twig_roots
This argument lets you build the tree only for those elements
you are interested in.

Example:

    my $t= XML::Twig->new( twig_roots => { title => 1, subtitle => 1});
    $t->parsefile( file);
    my $t= XML::Twig->new( twig_roots => { 'section/title' => 1});
    $t->parsefile( file);

returns a twig containing a document including only "title" and
"subtitle" elements, as children of the root element.
438
You can use generic_attribute_condition, attribute_condition,
full_path, partial_path, tag, tag_regexp, _default_ and _all_
to trigger the building of the twig. string_condition and
regexp_condition cannot be used, as the content of the element,
and therefore its string value, has not yet been parsed when the
condition is checked.

WARNING: paths are checked against the document. Even if the
"twig_roots" option is used they will be checked against the
full document tree, not the virtual tree created by XML::Twig.
449
450 WARNING: twig_roots elements should NOT be nested, that would
451 hopelessly confuse XML::Twig ;--(
452
Note: you can set handlers (twig_handlers) using twig_roots

Example:

    my $t= XML::Twig->new( twig_roots =>
                             { title    => sub { $_[1]->print; },
                               subtitle => \&process_subtitle
                             }
                         );
    $t->parsefile( file);
462
463 twig_print_outside_roots
464 To be used in conjunction with the "twig_roots" argument. When
465 set to a true value this will print the document outside of the
466 "twig_roots" elements.
467
Example:

    my $t= XML::Twig->new( twig_roots => { title => \&number_title },
                           twig_print_outside_roots => 1,
                         );
    $t->parsefile( file);
    { my $nb;
      sub number_title
        { my( $twig, $title)= @_;
          $nb++;
          $title->prefix( "$nb ");
          $title->print;
        }
    }
480
481 This example prints the document outside of the title element,
482 calls "number_title" for each "title" element, prints it, and
483 then resumes printing the document. The twig is built only for
484 the "title" elements.
485
486 If the value is a reference to a file handle then the document
487 outside the "twig_roots" elements will be output to this file
488 handle:
489
    open( OUT, ">out_file") or die "cannot open out file out_file:$!";
    my $t= XML::Twig->new( twig_roots => { title => \&number_title },
                           # default output to OUT
                           twig_print_outside_roots => \*OUT,
                         );

    { my $nb;
      sub number_title
        { my( $twig, $title)= @_;
          $nb++;
          $title->prefix( "$nb ");
          $title->print( \*OUT);    # you have to print to \*OUT here
        }
    }
504
505 start_tag_handlers
A hash "{ expression => \&handler }". Sets element handlers that
507 are called when the element is open (at the end of the
508 XML::Parser "Start" handler). The handlers are called with 2
509 params: the twig and the element. The element is empty at that
510 point, its attributes are created though.
511
512 You can use generic_attribute_condition, attribute_condition,
513 full_path, partial_path, tag, tag_regexp, _default_ and _all_
514 to trigger the handler.
515
string_condition and regexp_condition cannot be used, as the
content of the element, and therefore its string value, has not
yet been parsed when the condition is checked.

The main uses for those handlers are to change the tag name
(you might have to do it as soon as you find the open tag if
you plan to "flush" the twig at some point in the element), and
to create temporary attributes that will be used when
processing sub-elements with "twig_handlers".
525
526 You should also use it to change tags if you use "flush". If
527 you change the tag in a regular "twig_handler" then the start
528 tag might already have been flushed.
529
530 Note: "start_tag" handlers can be called outside of
531 "twig_roots" if this argument is used, in this case handlers
532 are called with the following arguments: $t (the twig), $tag
533 (the tag of the element) and %att (a hash of the attributes of
534 the element).
535
If the "twig_print_outside_roots" argument is also used and the
last handler called returns a "true" value, then the start
tag will be output as it appeared in the original document; if
the handler returns a "false" value then the start tag will
not be printed (so you can print a modified string yourself, for
example).
542
543 Note that you can use the ignore method in "start_tag_handlers"
544 (and only there).
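
The difference between a start tag handler and a regular handler can be observed with a sketch like this one. The sample document and the @at_start and @at_end arrays are assumptions made for illustration; each handler records the "id" attribute and the number of children visible at the time it is called.

```perl
use strict;
use warnings;
use XML::Twig;

my( @at_start, @at_end);

# records "id:number_of_children" for the element in the given array
my $record = sub
  { my( $list, $elt)= @_;
    my @kids = $elt->children;
    push @$list, $elt->att( 'id') . ':' . scalar @kids;
    return 1;
  };

my $t = XML::Twig->new(
    # called when <section ...> is found: attributes are set, no content yet
    start_tag_handlers => { section => sub { $record->( \@at_start, $_[1]) } },
    # called when </section> is found: the sub-tree is complete
    twig_handlers      => { section => sub { $record->( \@at_end, $_[1]) } },
);
$t->parse( '<doc><section id="s1"><para>a</para><para>b</para></section></doc>');
```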
545
546 end_tag_handlers
A hash "{ expression => \&handler }". Sets element handlers that
548 are called when the element is closed (at the end of the
549 XML::Parser "End" handler). The handlers are called with 2
550 params: the twig and the tag of the element.
551
552 twig_handlers are called when an element is completely parsed,
553 so why have this redundant option? There is only one use for
554 "end_tag_handlers": when using the "twig_roots" option, to
555 trigger a handler for an element outside the roots. It is for
556 example very useful to number titles in a document using nested
557 sections:
558
559 my @no= (0);
560 my $no;
561 my $t= XML::Twig->new(
562 start_tag_handlers =>
563 { section => sub { $no[$#no]++; $no= join '.', @no; push @no, 0; } },
564 twig_roots =>
565 { title => sub { $_[1]->prefix( $no); $_[1]->print; } },
566 end_tag_handlers => { section => sub { pop @no; } },
567 twig_print_outside_roots => 1
568 );
569 $t->parsefile( $file);
570
571 Using the "end_tag_handlers" argument without "twig_roots" will
572 result in an error.
573
574 do_not_chain_handlers
575 If this option is set to a true value, then only one handler
576 will be called for each element, even if several satisfy the
577 condition
578
579 Note that the "_all_" handler will still be called regardless
580
581 ignore_elts
582 This option lets you ignore elements when building the twig.
583 This is useful in cases where you cannot use "twig_roots" to
584 ignore elements, for example if the element to ignore is a
585 sibling of elements you are interested in.
586
587 Example:
588
589 my $twig= XML::Twig->new( ignore_elts => { elt => 1 });
590 $twig->parsefile( 'doc.xml');
591
592 This will build the complete twig for the document, except that
593 all "elt" elements (and their children) will be left out.
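
A runnable variant of this example on a string. The "note" element and the sample document are assumptions made for illustration.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new( ignore_elts => { note => 1 });
$twig->parse( '<doc><p>keep</p><note>drop <b>this</b></note><p>too</p></doc>');

my $xml  = $twig->root->sprint;                    # the note is not part of the twig
my @tags = map { $_->tag } $twig->root->children;  # only the p elements remain
```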
594
595 char_handler
596 A reference to a subroutine that will be called every time
597 "PCDATA" is found.
598
599 The subroutine receives the string as argument, and returns the
600 modified string:
601
602 # we want all strings in upper case
603 sub my_char_handler
604 { my( $text)= @_;
605 $text= uc( $text);
606 return $text;
607 }
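
A sketch of how such a handler might be plugged in through the "char_handler" option. The handler name upper_case_text and the sample document are assumptions made for illustration.

```perl
use strict;
use warnings;
use XML::Twig;

# hypothetical handler: receives the text, returns the modified text
sub upper_case_text
  { my( $text)= @_;
    return uc( $text);
  }

my $t = XML::Twig->new( char_handler => \&upper_case_text);
$t->parse( '<doc><p>hello</p></doc>');

my $text = $t->root->first_child( 'p')->text;   # text as stored in the twig
```

The text is modified at parse time, so the twig itself holds the upper case version.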
608
609 elt_class
The name of a class used to store elements. This class should
611 inherit from "XML::Twig::Elt" (and by default it is
612 "XML::Twig::Elt"). This option is used to subclass the element
613 class and extend it with new methods.
614
615 This option is needed because during the parsing of the XML,
616 elements are created by "XML::Twig", without any control from
617 the user code.
618
619 keep_atts_order
620 Setting this option to a true value causes the attribute hash
621 to be tied to a "Tie::IxHash" object. This means that
622 "Tie::IxHash" needs to be installed for this option to be
623 available. It also means that the hash keeps its order, so you
624 will get the attributes in order. This allows outputting the
625 attributes in the same order as they were in the original
626 document.
627
628 keep_encoding
This is a (slightly?) evil option: if the XML document is not
UTF-8 encoded and you want to keep it that way, then setting
keep_encoding will use the "Expat" original_string method for
character data, thus keeping the original encoding, as well as the
original entities in the strings.
634
635 See the "t/test6.t" test file to see what results you can
636 expect from the various encoding options.
637
638 WARNING: if the original encoding is multi-byte then attribute
639 parsing will be EXTREMELY unsafe under any Perl before 5.6, as
640 it uses regular expressions which do not deal properly with
641 multi-byte characters. You can specify an alternate function to
642 parse the start tags with the "parse_start_tag" option (see
643 below)
644
645 WARNING: this option is NOT used when parsing with the non-
646 blocking parser ("parse_start", "parse_more", parse_done
647 methods) which you probably should not use with XML::Twig
648 anyway as they are totally untested!
649
650 output_encoding
This option generates an output_filter using "Encode",
"Text::Iconv" or "Unicode::Map8" and "Unicode::String", and
sets the encoding in the XML declaration. This is the easiest
way to deal with encodings; if you need more sophisticated
features, look at "output_filter" below.
656
657 output_filter
658 This option is used to convert the character encoding of the
659 output document. It is passed either a string corresponding to
660 a predefined filter or a subroutine reference. The filter will
661 be called every time a document or element is processed by the
662 "print" functions ("print", "sprint", "flush").
663
664 Pre-defined filters:
665
666 latin1
667 uses either "Encode", "Text::Iconv" or "Unicode::Map8" and
668 "Unicode::String" or a regexp (which works only with
669 XML::Parser 2.27), in this order, to convert all characters
670 to ISO-8859-1 (aka latin1)
671
672 html
673 does the same conversion as "latin1", plus encodes entities
674 using "HTML::Entities" (oddly enough you will need to have
675 HTML::Entities installed for it to be available). This
676 should only be used if the tags and attribute names
677 themselves are in US-ASCII, or they will be converted and
678 the output will not be valid XML any more
679
680 safe
681 converts the output to ASCII (US) only plus character
682 entities ("&#nnn;") this should be used only if the tags
683 and attribute names themselves are in US-ASCII, or they
684 will be converted and the output will not be valid XML any
685 more
686
687 safe_hex
same as "safe" except that the character entities are in
hexadecimal ("&#xnnn;")

encode_convert ($encoding)
Return a subref that can be used to convert utf8 strings to
$encoding. Uses "Encode".
694
695 my $conv = XML::Twig::encode_convert( 'latin1');
696 my $t = XML::Twig->new(output_filter => $conv);
697
698 iconv_convert ($encoding)
699 this function is used to create a filter subroutine that
700 will be used to convert the characters to the target
701 encoding using "Text::Iconv" (which needs to be installed,
702 look at the documentation for the module and for the
703 "iconv" library to find out which encodings are available
704 on your system)
705
706 my $conv = XML::Twig::iconv_convert( 'latin1');
707 my $t = XML::Twig->new(output_filter => $conv);
708
unicode_convert ($encoding)
this function is used to create a filter subroutine that
will be used to convert the characters to the target
encoding using "Unicode::String" and "Unicode::Map8"
(which need to be installed, look at the documentation for
the modules to find out which encodings are available on
your system)
716
717 my $conv = XML::Twig::unicode_convert( 'latin1');
718 my $t = XML::Twig->new(output_filter => $conv);
719
The "text" and "att" methods do not use the filter, so their
results are always in unicode.
722
723 Those predeclared filters are based on subroutines that can be
724 used by themselves (as "XML::Twig::foo").
725
726 html_encode ($string)
727 Use "HTML::Entities" to encode a utf8 string
728
729 safe_encode ($string)
730 Use either a regexp (perl < 5.8) or "Encode" to encode non-
731 ascii characters in the string in "&#<nnnn>;" format
732
733 safe_encode_hex ($string)
734 Use either a regexp (perl < 5.8) or "Encode" to encode non-
735 ascii characters in the string in "&#x<nnnn>;" format
736
737 regexp2latin1 ($string)
738 Use a regexp to encode a utf8 string into latin 1
739 (ISO-8859-1). Does not work with Perl 5.8.0!
740
741 output_text_filter
742 same as output_filter, except it doesn't apply to the brackets
743 and quotes around attribute values. This is useful for all
744 filters that could change the tagging, basically anything that
745 does not just change the encoding of the output. "html", "safe"
746 and "safe_hex" are better used with this option.
747
748 input_filter
749 This option is similar to "output_filter" except the filter is
750 applied to the characters before they are stored in the twig,
751 at parsing time.
752
753 remove_cdata
754 Setting this option to a true value will force the twig to
755 output CDATA sections as regular (escaped) PCDATA
756
757 parse_start_tag
758 If you use the "keep_encoding" option then this option can be
759 used to replace the default parsing function. You should
760 provide a coderef (a reference to a subroutine) as the
761 argument, this subroutine takes the original tag (given by
762 XML::Parser::Expat "original_string()" method) and returns a
763 tag and the attributes in a hash (or in a list
764 attribute_name/attribute value).
765
766 expand_external_ents
767 When this option is used external entities (that are defined)
768 are expanded when the document is output using "print"
769 functions such as "print ", "sprint ", "flush " and "xml_string
770 ". Note that in the twig the entity will be stored as an
771 element with a tag '"#ENT"', the entity will not be expanded
772 there, so you might want to process the entities before
773 outputting it.
774
775 If an external entity is not available, then the parse will
776 fail.
777
778 A special case is when the value of this option is -1. In that
779 case a missing entity will not cause the parser to die, but its
780 "name", "sysid" and "pubid" will be stored in the twig as
781 "$twig->{twig_missing_system_entities}" (a reference to an
782 array of hashes { name => <name>, sysid => <sysid>, pubid =>
783 <pubid> }). Yes, this is a bit of a hack, but it's useful in
784 some cases.
785
786 load_DTD
787 If this argument is set to a true value, "parse" or "parsefile"
788 on the twig will load the DTD information. This information
789 can then be accessed through the twig, in a "DTD_handler" for
790 example. This will load even an external DTD.
791
792 Default and fixed values for attributes will also be filled,
793 based on the DTD.
794
795 Note that to do this the module will generate a temporary file
796 in the current directory. If this is a problem let me know and
797 I will add an option to specify an alternate directory.
798
799 See "DTD Handling" for more information
800
801 DTD_handler
802 Set a handler that will be called once the doctype (and the
803 DTD) have been loaded, with 2 arguments, the twig and the DTD.
804
805 no_prolog
806 Does not output a prolog (XML declaration and DTD)
807
808 id This optional argument gives the name of an attribute that can
809 be used as an ID in the document. Elements whose ID is known
810 can be accessed through the elt_id method. id defaults to 'id'.
811 See "BUGS "
812
813 discard_spaces
814 If this optional argument is set to a true value then spaces
815 are discarded when they look non-significant: strings
816 containing only spaces are discarded. This argument is set to
817 true by default.
818
819 keep_spaces
820 If this optional argument is set to a true value then all
821 spaces in the document are kept, and stored as "PCDATA".
822
823 Warning: adding this option can result in changes in the twig
824 generated: space that was previously discarded might end up in
825 a new text element. See the difference by calling the following
826 code with 0 and 1 as arguments:
827
828 perl -MXML::Twig -e'print XML::Twig->new( keep_spaces => shift)->parse( "<d> \n<e/></d>")->_dump'
829
830 "keep_spaces" and "discard_spaces" cannot both be set.
831
832 discard_spaces_in
833 This argument sets "keep_spaces" to true but will cause the
834 twig builder to discard spaces in the elements listed.
835
836 The syntax for using this argument is:
837
838 XML::Twig->new( discard_spaces_in => [ 'elt1', 'elt2']);
839
840 keep_spaces_in
841 This argument sets "discard_spaces" to true but will cause the
842 twig builder to keep spaces in the elements listed.
843
844 The syntax for using this argument is:
845
846 XML::Twig->new( keep_spaces_in => [ 'elt1', 'elt2']);
847
848 Warning: adding this option can result in changes in the twig
849 generated: space that was previously discarded might end up in
850 a new text element.
851
852 pretty_print
853 Set the pretty print method, amongst '"none"' (default),
854 '"nsgmls"', '"nice"', '"indented"', '"indented_c"',
855 '"indented_a"', '"indented_close_tag"', '"cvs"', '"wrapped"',
856 '"record"' and '"record_c"'
857
858 pretty_print formats:
859
860 none
861 The document is output as one long string, with no line
862 breaks except those found within text elements.
863
864 nsgmls
865 Line breaks are inserted in safe places: that is within
866 tags, between a tag and an attribute, between attributes
867 and before the > at the end of a tag.
868
869 This is quite ugly but better than "none", and it is very
870 safe, the document will still be valid (conforming to its
871 DTD).
872
873 This is how the SGML parser "sgmls" splits documents, hence
874 the name.
875
876 nice
877 This option inserts line breaks before any tag that does
878 not contain text (so elements with textual content are not
879 broken, as the \n is significant there).
880
881 WARNING: this option leaves the document well-formed but
882 might make it invalid (not conformant to its DTD). If you
883 have elements declared as
884
885 <!ELEMENT foo (#PCDATA|bar)>
886
887 then a "foo" element including a "bar" one will be printed
888 as
889
890 <foo>
891 <bar>bar is just pcdata</bar>
892 </foo>
893
894 This is invalid, as the parser will take the line break
895 after the "foo" tag as a sign that the element contains
896 PCDATA, it will then die when it finds the "bar" tag. This
897 may or may not be important for you, but be aware of it!
898
899 indented
900 Same as "nice" (and with the same warning) but indents
901 elements according to their level
902
903 indented_c
904 Same as "indented" but a little more compact: the closing
905 tags are on the same line as the preceding text
906
907 indented_close_tag
908 Same as "indented" except that the closing tag is also
909 indented, to line up with the tags within the element
910
911 indented_a
912 This formats XML files in a line-oriented version control
913 friendly way. The format is described in
914 <http://tinyurl.com/2kwscq> (that's an Oracle document with
915 an insanely long URL).
916
917 Note that to be totally conformant to the "spec", the order
918 of attributes should not be changed, so if they are not
919 already in alphabetical order you will need to use the
920 "keep_atts_order" option.
921
922 cvs Same as "indented_a".
923
924 wrapped
925 Same as "indented_c" but lines are wrapped using
926 Text::Wrap::wrap. The default length for lines is the
927 default for $Text::Wrap::columns, and can be changed by
928 changing that variable.
929
930 record
931 This is a record-oriented pretty print, that displays data
932 in records, one field per line (which looks a LOT like
933 "indented")
934
935 record_c
936 Stands for record compact, one record per line
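
As a rough illustration of how the styles differ, the sketch below
serializes the same twig twice, once as "indented" and once as "none".
The sample document and the variable names are made up for this
example, and the exact whitespace you get may vary between versions.

```perl
use strict;
use warnings;
use XML::Twig;

# hypothetical sample document, used only for this illustration
my $xml = '<doc><title>Report</title><para>Some text</para></doc>';

# the style is chosen when the twig is created
my $twig = XML::Twig->new( pretty_print => 'indented');
$twig->parse( $xml);
my $indented = $twig->sprint;

# the style can be changed later (the setting is global, see set_pretty_print)
$twig->set_pretty_print( 'none');
my $compact = $twig->sprint;

print "indented:\n$indented\n";
print "none:\n$compact\n";
```

Because the style is a global setting, changing it for one twig also
affects any other twig printed afterwards in the same process.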
937
938 empty_tags
939 Set the empty tag display style ('"normal"', '"html"' or
940 '"expand"').
941
942 "normal" outputs an empty tag '"<tag/>"', "html" adds a space
943 '"<tag />"' for elements that can be empty in XHTML and
944 "expand" outputs '"<tag></tag>"'
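
A minimal sketch of the "expand" and "normal" styles, assuming a
made-up document with a single empty element ("marker" is not a real
vocabulary, just a placeholder):

```perl
use strict;
use warnings;
use XML::Twig;

# "expand" is requested when the twig is created
my $twig = XML::Twig->new( empty_tags => 'expand');
$twig->parse( '<doc><marker/></doc>');
my $expanded = $twig->root->sprint;

# switch back to the "normal" style for the second serialization
$twig->set_empty_tag_style( 'normal');
my $normal = $twig->root->sprint;

print "expand: $expanded\n";
print "normal: $normal\n";
```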
945
946 quote
947 Set the quote character for attributes ('"single"' or
948 '"double"').
949
950 escape_gt
951 By default XML::Twig does not escape the character > in its
952 output, as it is not mandated by the XML spec. With this option
953 on, > will be replaced by "&gt;".
954
955 comments
956 Set the way comments are processed: '"drop"' (default),
957 '"keep"' or '"process"'
958
959 Comments processing options:
960
961 drop
962 drops the comments, they are not read, nor printed to the
963 output
964
965 keep
966 comments are loaded and will appear on the output, they are
967 not accessible within the twig and will not interfere with
968 processing though
969
970 Note: comments in the middle of a text element such as
971
972 <p>text <!-- comment --> more text</p>
973
974 are kept at their original position in the text. Using
975 "print" methods like "print" or "sprint" will return the
976 comments in the text. Using "text" or "field" on the other
977 hand will not.
978
979 Any use of "set_pcdata" on the "#PCDATA" element (directly
980 or through other methods like "set_content") will delete
981 the comment(s).
982
983 process
984 comments are loaded in the twig and will be treated as
985 regular elements (their "tag" is "#COMMENT"). This can
986 interfere with processing if you expect
987 "$elt->{first_child}" to be an element but find a comment
988 there. Validation will not protect you from this, as
989 comments can appear anywhere. You can use
990 "$elt->first_child( 'tag')" (which is a good habit anyway)
991 to get where you want.
992
993 Consider using "process" if you are outputting SAX events
994 from XML::Twig.
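
The three settings can be compared on a small document. This is only a
sketch: the sample XML and the variable names are invented for the
example, and each mode is requested explicitly rather than relying on
the default.

```perl
use strict;
use warnings;
use XML::Twig;

# hypothetical sample document with one comment
my $xml = '<doc><!-- a note --><p>text</p></doc>';

# drop: the comment never reaches the output
my $dropped = XML::Twig->new( comments => 'drop')->parse( $xml)->sprint;

# keep: the comment is output again, but is not an element of the twig
my $kept = XML::Twig->new( comments => 'keep')->parse( $xml)->sprint;

# process: the comment becomes an element in the tree
my $twig  = XML::Twig->new( comments => 'process')->parse( $xml);
my $first = $twig->root->first_child;          # may be the comment
my $p     = $twig->root->first_child( 'p');    # asking by tag skips it

print "drop:    $dropped\n";
print "keep:    $kept\n";
print "process: first child is ", $first->tag, ", p contains '", $p->text, "'\n";
```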
995
996 pi Set the way processing instructions are processed: '"drop"',
997 '"keep"' (default) or '"process"'
998
999 Note that you can also set PI handlers in the "twig_handlers"
1000 option:
1001
1002 '?' => \&handler
1003 '?target' => \&handler_2
1004
1005 The handlers will be called with 2 parameters, the twig and the
1006 PI element if "pi" is set to "process", and with 3, the twig,
1007 the target and the data if "pi" is set to "keep". Of course
1008 they will not be called if "pi" is set to "drop".
1009
1010 If "pi" is set to "keep" the handler should return a string
1011 that will be used as-is as the PI text (it should look like
1012 "<?target data?>", or '' if you want to remove the PI).
1013
1014 Only one handler will be called, "?target" or "?" if no
1015 specific handler for that target is available.
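
A small sketch of PI handlers with "pi" set to "process". The targets
"page" and "app" and the sample document are assumptions made for the
example; the handlers receive the twig and the PI element, as described
above.

```perl
use strict;
use warnings;
use XML::Twig;

my @seen;

my $twig = XML::Twig->new(
    pi            => 'process',
    twig_handlers => {
        # specific handler, for PIs whose target is "page"
        '?page' => sub { my( $t, $pi)= @_; push @seen, 'page: ' . $pi->data; },
        # generic handler, used for any other target
        '?'     => sub { my( $t, $pi)= @_; push @seen, 'other: ' . $pi->target; },
    },
);

$twig->parse( '<doc><?page break?><p>text</p><?app note?></doc>');

print "$_\n" for @seen;
```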
1016
1017 map_xmlns
1018 This option is passed a hashref that maps URIs to prefixes.
1019 The prefixes in the document will be replaced by the ones in
1020 the map. The mapped prefixes can (actually have to) be used to
1021 trigger handlers, navigate or query the document.
1022
1023 Here is an example:
1024
1025 my $t= XML::Twig->new( map_xmlns => {'http://www.w3.org/2000/svg' => "svg"},
1026 twig_handlers =>
1027 { 'svg:circle' => sub { $_->set_att( r => 20) } },
1028 pretty_print => 'indented',
1029 )
1030 ->parse( '<doc xmlns:gr="http://www.w3.org/2000/svg">
1031 <gr:circle cx="10" cy="90" r="10"/>
1032 </doc>'
1033 )
1034 ->print;
1035
1036 This will output:
1037
1038 <doc xmlns:svg="http://www.w3.org/2000/svg">
1039 <svg:circle cx="10" cy="90" r="20"/>
1040 </doc>
1041
1042 keep_original_prefix
1043 When used with "map_xmlns" this option will make "XML::Twig"
1044 use the original namespace prefixes when outputting a document.
1045 The mapped prefix will still be used for triggering handlers
1046 and in navigation and query methods.
1047
1048 my $t= XML::Twig->new( map_xmlns => {'http://www.w3.org/2000/svg' => "svg"},
1049 twig_handlers =>
1050 { 'svg:circle' => sub { $_->set_att( r => 20) } },
1051 keep_original_prefix => 1,
1052 pretty_print => 'indented',
1053 )
1054 ->parse( '<doc xmlns:gr="http://www.w3.org/2000/svg">
1055 <gr:circle cx="10" cy="90" r="10"/>
1056 </doc>'
1057 )
1058 ->print;
1059
1060 This will output:
1061
1062 <doc xmlns:gr="http://www.w3.org/2000/svg">
1063 <gr:circle cx="10" cy="90" r="20"/>
1064 </doc>
1065
1066 index ($arrayref or $hashref)
1067 This option creates lists of specific elements during the
1068 parsing of the XML. It takes a reference to either a list of
1069 triggering expressions or to a hash name => expression, and for
1070 each one generates the list of elements that match the
1071 expression. The list can be accessed through the "index"
1072 method.
1073
1074 example:
1075
1076 # using an array ref
1077 my $t= XML::Twig->new( index => [ 'div', 'table' ])
1078 ->parsefile( 'foo.xml');
1079 my $divs= $t->index( 'div');
1080 my $first_div= $divs->[0];
1081 my $last_table= $t->index( table => -1);
1082
1083 # using a hashref to name the indexes
1084 my $t= XML::Twig->new( index => { email => 'a[@href=~/^\s*mailto:/]'})
1085 ->parsefile( 'foo.xml');
1086 my $last_email= $t->index( email => -1);
1087
1088 Note that the index is not maintained after the parsing. If
1089 elements are deleted, renamed or otherwise hurt during
1090 processing, the index is NOT updated.
1091
1092 Note: I _HATE_ the Java-like name of arguments used by most XML
1093 modules. So in pure TIMTOWTDI fashion all arguments can be written
1094 either as "UglyJavaLikeName" or as "readable_perl_name":
1095 "twig_print_outside_roots" or "TwigPrintOutsideRoots" (or even
1096 "twigPrintOutsideRoots" {shudder}). XML::Twig normalizes them
1097 before processing them.
1098
1099 parse ( $source)
1100 The $source parameter should either be a string containing the
1101 whole XML document, or it should be an open "IO::Handle".
1102 Constructor options to "XML::Parser::Expat" given as keyword-value
1103 pairs may follow the $source parameter. These override, for this
1104 call, any options or attributes passed through from the XML::Parser
1105 instance.
1106
1107 A die call is thrown if a parse error occurs. Otherwise it will
1108 return the twig built by the parse. Use "safe_parse" if you want
1109 the parsing to return even when an error occurs.
1110
1111 If this method is called as a class method ("XML::Twig->parse(
1112 $some_xml_or_html)") then an XML::Twig object is created, using the
1113 parameters except the last one (eg "XML::Twig->parse( pretty_print
1114 => 'indented', $some_xml_or_html)") and "xparse" is called on it.
1115
1116 parsestring
1117 This is just an alias for "parse" for backwards compatibility.
1118
1119 parsefile (FILE [, OPT => OPT_VALUE [...]])
1120 Open "FILE" for reading, then call "parse" with the open handle.
1121 The file is closed no matter how "parse" returns.
1122
1123 A "die" call is thrown if a parse error occurs. Otherwise it will
1124 return the twig built by the parse. Use "safe_parsefile" if you
1125 want the parsing to return even when an error occurs.
1126
1127 parsefile_inplace ( $file, $optional_extension)
1128 Parse and update a file "in place". It does this by creating a temp
1129 file, selecting it as the default for print() statements (and
1130 methods), then parsing the input file. If the parsing is
1131 successful, then the temp file is moved to replace the input file.
1132
1133 If an extension is given then the original file is backed-up (the
1134 rules for the extension are the same as the rule for the -i option
1135 in perl).
1136
1137 parsefile_html_inplace ( $file, $optional_extension)
1138 Same as parsefile_inplace, except that it parses HTML instead of
1139 XML
1140
1141 parseurl ($url, $optional_user_agent)
1142 Gets the data from $url and parses it. The data is piped to the
1143 parser in chunks the size of the XML::Parser::Expat buffer, so
1144 memory consumption and hopefully speed are optimal.
1145
1146 For most (read "small") XML it is probably as efficient (and easier
1147 to debug) to just "get" the XML file and then parse it as a string.
1148
1149 use XML::Twig;
1150 use LWP::Simple;
1151 my $twig= XML::Twig->new();
1152 $twig->parse( LWP::Simple::get( $URL ));
1153
1154 or
1155
1156 use XML::Twig;
1157 my $twig= XML::Twig->nparse( $URL);
1158
1159 If the $optional_user_agent argument is given then that user agent
1160 is used, otherwise a new one is created.
1161
1162 safe_parse ( SOURCE [, OPT => OPT_VALUE [...]])
1163 This method is similar to "parse" except that it wraps the parsing
1164 in an "eval" block. It returns the twig on success and 0 on failure
1165 (the twig object also contains the parsed twig). $@ contains the
1166 error message on failure.
1167
1168 Note that the parsing still stops as soon as an error is detected,
1169 there is no way to keep going after an error.
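
A minimal sketch of the usual pattern: test the return value and read
$@ on failure. Both sample documents are invented for the example, and
the second one is deliberately not well-formed.

```perl
use strict;
use warnings;
use XML::Twig;

# hypothetical inputs: one well-formed, one with an unclosed element
my %source = (
    good => '<doc><p>fine</p></doc>',
    bad  => '<doc><p>never closed</doc>',
);

my %result;
foreach my $name ( sort keys %source) {
    my $twig = XML::Twig->new;
    if( $twig->safe_parse( $source{$name})) {
        $result{$name} = 1;
        print "$name: parsed, root is ", $twig->root->tag, "\n";
    }
    else {
        $result{$name} = 0;
        print "$name: parse failed: $@\n";
    }
}
```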
1170
1171 safe_parsefile (FILE [, OPT => OPT_VALUE [...]])
1172 This method is similar to "parsefile" except that it wraps the
1173 parsing in an "eval" block. It returns the twig on success and 0 on
1174 failure (the twig object also contains the parsed twig). $@
1175 contains the error message on failure.
1176
1177 Note that the parsing still stops as soon as an error is detected,
1178 there is no way to keep going after an error.
1179
1180 safe_parseurl ($url, $optional_user_agent)
1181 Same as "parseurl" except that it wraps the parsing in an "eval"
1182 block. It returns the twig on success and 0 on failure (the twig
1183 object also contains the parsed twig). $@ contains the error
1184 message on failure.
1185
1186 parse_html ($string_or_fh)
1187 Parse an HTML string or file handle (by converting it to XML using
1188 HTML::TreeBuilder, which needs to be available).
1189
1190 This works nicely, but some information gets lost in the process:
1191 newlines are removed, and (at least on the version I use) comments
1192 get an extra CDATA section inside ( <!-- foo --> becomes <!--
1193 <![CDATA[ foo ]]> -->).
1194
1195 parsefile_html
1196 parse an HTML file (by converting it to XML using
1197 HTML::TreeBuilder, which needs to be available). The file is loaded
1198 completely in memory and converted to XML before being parsed.
1199
1200 Alpha: implementation, and thus generated XML could change.
1201
1202 safe_parseurl_html ($url, $optional_user_agent)
1203 Same as "parseurl_html" except that it wraps the parsing in an
1204 "eval" block. It returns the twig on success and 0 on failure (the
1205 twig object also contains the parsed twig). $@ contains the error
1206 message on failure.
1207
1208 safe_parsefile_html ($file, $optional_user_agent)
1209 Same as "parsefile_html" except that it wraps the parsing in an
1210 "eval" block. It returns the twig on success and 0 on failure (the
1211 twig object also contains the parsed twig). $@ contains the error
1212 message on failure.
1213
1214 safe_parse_html ($string_or_fh)
1215 Same as "parse_html" except that it wraps the parsing in an "eval"
1216 block. It returns the twig on success and 0 on failure (the twig
1217 object also contains the parsed twig). $@ contains the error
1218 message on failure.
1219
1220 xparse ($thing_to_parse)
1221 Parse the $thing_to_parse, whether it is a filehandle, a string, an
1222 HTML file, an HTML URL, a URL or a file.
1223
1224 Note that this is mostly a convenience method for one-off scripts.
1225 For example files that end in '.htm' or '.html' are parsed first as
1226 XML, and if this fails as HTML. This is certainly not the most
1227 efficient way to do this in general.
1228
1229 nparse ($optional_twig_options, $thing_to_parse)
1230 Create a twig with the $optional_twig_options, and parse the
1231 $thing_to_parse, whether it is a filehandle, a string, an HTML
1232 file, an HTML URL, a URL or a file.
1233
1234 Examples:
1235
1236 XML::Twig->nparse( "file.xml");
1237 XML::Twig->nparse( error_context => 1, "file://file.xml");
1238
1239 nparse_pp ($optional_twig_options, $thing_to_parse)
1240 same as "nparse" but also sets the "pretty_print" option to
1241 "indented".
1242
1243 nparse_e ($optional_twig_options, $thing_to_parse)
1244 same as "nparse" but also sets the "error_context" option to 1.
1245
1246 nparse_ppe ($optional_twig_options, $thing_to_parse)
1247 same as "nparse" but also sets the "pretty_print" option to
1248 "indented" and the "error_context" option to 1.
1249
1250 parser
1251 This method returns the "expat" object (actually the
1252 XML::Parser::Expat object) used during parsing. It is useful for
1253 example to call XML::Parser::Expat methods on it. To get the line
1254 of a tag for example use "$t->parser->current_line".
1255
1256 setTwigHandlers ($handlers)
1257 Set the twig_handlers. $handlers is a reference to a hash similar
1258 to the one in the "twig_handlers" option of new. All previous
1259 handlers are unset. The method returns the reference to the
1260 previous handlers.
1261
1262 setTwigHandler ($exp, $handler)
1263 Set a single twig_handler for elements matching $exp. $handler is a
1264 reference to a subroutine. If the handler was previously set then
1265 the reference to the previous handler is returned.
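
As a sketch, a handler can be attached after the twig has been created
and before parsing starts. The "title" trigger and the sample document
are assumptions made for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my @titles;

my $twig = XML::Twig->new;

# attach a single handler; $_ is the element that triggered it
$twig->setTwigHandler( title => sub { push @titles, $_->text; });

$twig->parse( '<doc><title>One</title><sec><title>Two</title></sec></doc>');

print join( ', ', @titles), "\n";
```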
1266
1267 setStartTagHandlers ($handlers)
1268 Set the start_tag handlers. $handlers is a reference to a hash
1269 similar to the one in the "start_tag_handlers" option of new. All
1270 previous handlers are unset. The method returns the reference to
1271 the previous handlers.
1272
1273 setStartTagHandler ($exp, $handler)
1274 Set a single start_tag handler for elements matching $exp.
1275 $handler is a reference to a subroutine. If the handler was
1276 previously set then the reference to the previous handler is
1277 returned.
1278
1279 setEndTagHandlers ($handlers)
1280 Set the end_tag handlers. $handlers is a reference to a hash
1281 similar to the one in the "end_tag_handlers" option of new. All
1282 previous handlers are unset. The method returns the reference to
1283 the previous handlers.
1284
1285 setEndTagHandler ($exp, $handler)
1286 Set a single end_tag handler for elements matching $exp. $handler
1287 is a reference to a subroutine. If the handler was previously set
1288 then the reference to the previous handler is returned.
1289
1290 setTwigRoots ($handlers)
1291 Same as using the "twig_roots" option when creating the twig
1292
1293 setCharHandler ($exp, $handler)
1294 Set a "char_handler"
1295
1296 setIgnoreEltsHandler ($exp)
1297 Set an "ignore_elt" handler (elements that match $exp will be
1298 ignored).
1299
1300 setIgnoreEltsHandlers ($exp)
1301 Set all "ignore_elt" handlers (previous handlers are replaced)
1302
1303 dtd Return the dtd (an XML::Twig::DTD object) of a twig
1304
1305 xmldecl
1306 Return the XML declaration for the document, or a default one if it
1307 doesn't have one
1308
1309 doctype
1310 Return the doctype for the document
1311
1312 doctype_name
1313 returns the doctype of the document from the doctype declaration
1314
1315 system_id
1316 returns the system value of the DTD of the document from the
1317 doctype declaration
1318
1319 public_id
1320 returns the public doctype of the document from the doctype
1321 declaration
1322
1323 internal_subset
1324 returns the internal subset of the DTD
1325
1326 dtd_text
1327 Return the DTD text
1328
1329 dtd_print
1330 Print the DTD
1331
1332 model ($tag)
1333 Return the model (in the DTD) for the element $tag
1334
1335 root
1336 Return the root element of a twig
1337
1338 set_root ($elt)
1339 Set the root of a twig
1340
1341 first_elt ($optional_condition)
1342 Return the first element matching $optional_condition of a twig, if
1343 no condition is given then the root is returned
1344
1345 last_elt ($optional_condition)
1346 Return the last element matching $optional_condition of a twig, if
1347 no condition is given then the last element of the twig is returned
1348
1349 elt_id ($id)
1350 Return the element whose "id" attribute is $id
1351
1352 getEltById
1353 Same as "elt_id"
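
The navigation methods above can be tried on a small tree. This is a
sketch only; the document and its "id" values are invented for the
example, and it relies on the default ID attribute name ("id").

```perl
use strict;
use warnings;
use XML::Twig;

# hypothetical sample document
my $twig = XML::Twig->new->parse(
    '<doc><sec id="intro"><p>Hello</p></sec><sec id="end"><p>Bye</p></sec></doc>'
);

my $root    = $twig->root;              # the doc element
my $first_p = $twig->first_elt( 'p');   # first p in document order
my $last_p  = $twig->last_elt( 'p');    # last p in document order
my $end     = $twig->elt_id( 'end');    # element whose id attribute is "end"

print "root:    ", $root->tag, "\n";
print "first p: ", $first_p->text, "\n";
print "last p:  ", $last_p->text, "\n";
print "end sec: ", $end->first_child( 'p')->text, "\n";
```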
1354
1355 index ($index_name, $optional_index)
1356 If the $optional_index argument is present, return the
1357 corresponding element in the index (created using the "index"
1358 option for "XML::Twig->new").
1359
1360 If the argument is not present, return an arrayref to the index
1361
1362 normalize
1363 Merge together all consecutive pcdata elements in the document (if
1364 for example you have turned some elements into pcdata using
1365 "erase", this will give you a "clean" document in which all
1366 text elements are as long as possible).
1367
1368 encoding
1369 This method returns the encoding of the XML document, as defined by
1370 the "encoding" attribute in the XML declaration (ie it is "undef"
1371 if the attribute is not defined)
1372
1373 set_encoding
1374 This method sets the value of the "encoding" attribute in the XML
1375 declaration. Note that if the document did not have a declaration
1376 it is generated (with an XML version of 1.0)
1377
1378 xml_version
1379 This method returns the XML version, as defined by the "version"
1380 attribute in the XML declaration (ie it is "undef" if the attribute
1381 is not defined)
1382
1383 set_xml_version
1384 This method sets the value of the "version" attribute in the XML
1385 declaration. If the declaration did not exist it is created.
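
A short sketch of reading and changing the XML declaration. The sample
document is made up; the example only looks at the values stored in
the declaration and makes no claim about how the text itself is
encoded on output.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new->parse(
    '<?xml version="1.0" encoding="ISO-8859-1"?><doc>text</doc>'
);

my $before  = $twig->encoding;      # value found in the declaration
my $version = $twig->xml_version;

$twig->set_encoding( 'UTF-8');      # changes the declaration
my $after   = $twig->encoding;

print "version: $version, encoding: $before, then $after\n";
```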
1386
1387 standalone
1388 This method returns the value of the "standalone" declaration for
1389 the document
1390
1391 set_standalone
1392 This method sets the value of the "standalone" attribute in the XML
1393 declaration. Note that if the document did not have a declaration
1394 it is generated (with an XML version of 1.0)
1395
1396 set_output_encoding
1397 Set the "encoding" attribute in the XML declaration.
1398
1399 set_doctype ($name, $system, $public, $internal)
1400 Set the doctype of the document. If an argument is "undef" (or not
1401 present) then its former value is retained, if a false ('' or 0)
1402 value is passed then the former value is deleted.
1403
1404 entity_list
1405 Return the entity list of a twig
1406
1407 entity_names
1408 Return the list of all defined entities
1409
1410 entity ($entity_name)
1411 Return the entity
1412
1413 change_gi ($old_gi, $new_gi)
1414 Performs a (very fast) global change. All elements $old_gi are now
1415 $new_gi. This is a bit dangerous though and should be avoided if
1416 possible, as the new tag might be ignored in subsequent processing.
1417
1418 See "BUGS".
1419
1420 flush ($optional_filehandle, %options)
1421 Flushes a twig up to (and including) the current element, then
1422 deletes all unnecessary elements from the tree that's kept in
1423 memory. "flush" keeps track of which elements need to be
1424 open/closed, so if you flush from handlers you don't have to worry
1425 about anything. Just keep flushing the twig every time you're done
1426 with a sub-tree and it will come out well-formed. After the whole
1427 parsing don't forget to "flush" one more time to print the end of
1428 the document. The doctype and entity declarations are also
1429 printed.
1430
1431 flush takes an optional filehandle as an argument.
1432
1433 options: use the "update_DTD" option if you have updated the
1434 (internal) DTD and/or the entity list and you want the updated DTD
1435 to be output
1436
1437 The "pretty_print" option sets the pretty printing of the document.
1438
1439 Example: $t->flush( Update_DTD => 1);
1440 $t->flush( $filehandle, pretty_print => 'indented');
1441 $t->flush( \*FILE);
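
A sketch of the flush-from-handlers pattern described above, writing
to an in-memory filehandle so the result can be inspected. The "log"
and "entry" element names, the "seen" attribute and the variable names
are assumptions made for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my $xml = '<log><entry>one</entry><entry>two</entry></log>';

# collect the output in a string instead of printing it to STDOUT
my $output = '';
open( my $out, '>', \$output) or die "cannot open in-memory handle: $!";

my $twig = XML::Twig->new(
    twig_handlers => {
        entry => sub { my( $t, $entry)= @_;
                       $entry->set_att( seen => 1);  # process the element
                       $t->flush( $out);             # output it and free memory
                     },
    },
);

$twig->parse( $xml);
$twig->flush( $out);     # one last flush to output the end of the document
close $out;

print $output, "\n";
```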
1442
1443 flush_up_to ($elt, $optional_filehandle, %options)
1444 Flushes up to the $elt element. This allows you to keep part of the
1445 tree in memory when you "flush".
1446
1447 options: see flush.
1448
1449 purge
1450 Does the same as a "flush" except it does not print the twig. It
1451 just deletes all elements that have been completely parsed so far.
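
When nothing needs to be output, "purge" keeps memory usage low while
data is extracted. A minimal sketch, with invented element names, that
counts elements and discards each one once it has been seen:

```perl
use strict;
use warnings;
use XML::Twig;

my $count = 0;

my $twig = XML::Twig->new(
    twig_handlers => {
        item => sub { my( $t, $item)= @_;
                      $count++;      # use the element...
                      $t->purge;     # ...then free what has been parsed so far
                    },
    },
);

$twig->parse( '<list><item>a</item><item>b</item><item>c</item></list>');

print "items seen: $count\n";
```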
1452
1453 purge_up_to ($elt)
1454 Purges up to the $elt element. This allows you to keep part of the
1455 tree in memory when you "purge".
1456
1457 print ($optional_filehandle, %options)
1458 Prints the whole document associated with the twig. To be used only
1459 AFTER the parse.
1460
1461 options: see "flush".
1462
1463 print_to_file ($filename, %options)
1464 Prints the whole document associated with the twig to file
1465 $filename. To be used only AFTER the parse.
1466
1467 options: see "flush".
1468
1469 sprint
1470 Return the text of the whole document associated with the twig. To
1471 be used only AFTER the parse.
1472
1473 options: see "flush".
1474
1475 trim
1476 Trim the document: gets rid of initial and trailing spaces, and
1477 replaces multiple spaces by a single one.
1478
1479 toSAX1 ($handler)
1480 Send SAX events for the twig to the SAX1 handler $handler
1481
1482 toSAX2 ($handler)
1483 Send SAX events for the twig to the SAX2 handler $handler
1484
1485 flush_toSAX1 ($handler)
1486 Same as flush, except that SAX events are sent to the SAX1 handler
1487 $handler instead of the twig being printed
1488
1489 flush_toSAX2 ($handler)
1490 Same as flush, except that SAX events are sent to the SAX2 handler
1491 $handler instead of the twig being printed
1492
1493 ignore
1494 This method should be called during parsing, usually in
1495 "start_tag_handlers". It causes the element to be skipped during
1496 the parsing: the twig is not built for this element, it will not be
1497 accessible during parsing or after it. The element will not take up
1498 any memory and parsing will be faster.
1499
1500 Note that this method can also be called on an element. If the
1501 element is a parent of the current element then this element will
1502 be ignored (the twig will not be built any more for it and what has
1503 already been built will be deleted).
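
A sketch of calling "ignore" from a start tag handler, so that an
element and its content never make it into the twig. The "secret" and
"keep" element names are made up for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new(
    start_tag_handlers => {
        # skip the element as soon as its start tag is found
        secret => sub { my( $t, $elt)= @_; $t->ignore; },
    },
);

$twig->parse( '<doc><keep>a</keep><secret><x>b</x></secret><keep>c</keep></doc>');

my $result = $twig->root->sprint;
print "$result\n";
```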
1504
1505 set_pretty_print ($style)
1506 Set the pretty print method, amongst '"none"' (default),
1507 '"nsgmls"', '"nice"', '"indented"', "indented_c", '"wrapped"',
1508 '"record"' and '"record_c"'
1509
1510 WARNING: the pretty print style is a GLOBAL variable, so once set
1511 it's applied to ALL "print"'s (and "sprint"'s). Same goes if you
1512 use XML::Twig with "mod_perl". This should not be a problem as the
1513 XML that's generated is valid anyway, and XML processors (as well
1514 as HTML processors, including browsers) should not care. Let me
1515 know if this is a big problem, but at the moment the
1516 performance/cleanliness trade-off clearly favors the global
1517 approach.
1518
1519 set_empty_tag_style ($style)
1520 Set the empty tag display style ('"normal"', '"html"' or
1521 '"expand"'). As with "set_pretty_print" this sets a global flag.
1522
1523 "normal" outputs an empty tag '"<tag/>"', "html" adds a space
1524 '"<tag />"' for elements that can be empty in XHTML and "expand"
1525 outputs '"<tag></tag>"'
1526
1527 set_remove_cdata ($flag)
1528 set (or unset) the flag that forces the twig to output CDATA
1529 sections as regular (escaped) PCDATA
1530
1531 print_prolog ($optional_filehandle, %options)
1532 Prints the prolog (XML declaration + DTD + entity declarations) of
1533 a document.
1534
1535 options: see "flush".
1536
1537 prolog ($optional_filehandle, %options)
1538 Return the prolog (XML declaration + DTD + entity declarations) of
1539 a document.
1540
1541 options: see "flush".
1542
1543 finish
1544 Call Expat "finish" method. Unsets all handlers (including
1545 internal ones that set context), but expat continues parsing to the
1546 end of the document or until it finds an error. It should finish
1547 up a lot faster than with the handlers set.
1548
1549 finish_print
1550 Stops twig processing, flushes the twig and proceeds to finish
1551 printing the document as fast as possible. Use this method when
1552 modifying a document and the modification is done.
1553
1554 finish_now
1555 Stops twig processing, does not finish parsing the document (which
1556 could actually be not well-formed after the point where
1557 "finish_now" is called). Execution resumes after the "parse" or
1558 "parsefile" call. The content of the twig is what has been parsed
1559 so far (all open elements at the time "finish_now" is called are
1560 considered closed).
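
A sketch of stopping the parse as soon as the wanted data has been
found. The "user" elements, their "id" attributes and the variable
names are assumptions made for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my @seen;
my $wanted;

my $twig = XML::Twig->new(
    twig_handlers => {
        user => sub { my( $t, $user)= @_;
                      push @seen, $user->att( 'id');
                      if( $user->att( 'id') eq '2') {
                          $wanted = $user->text;
                          $t->finish_now;    # stop here, skip the rest
                      }
                    },
    },
);

$twig->parse( '<users><user id="1">Ann</user><user id="2">Bob</user><user id="3">Cy</user></users>');

# execution resumes here, right after the parse call
print "found: $wanted (handlers called for ids @seen)\n";
```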
1561
1562 set_expand_external_entities
1563 Same as using the "expand_external_ents" option when creating the
1564 twig
1565
1566 set_input_filter
1567 Same as using the "input_filter" option when creating the twig
1568
1569 set_keep_atts_order
1570 Same as using the "keep_atts_order" option when creating the twig
1571
1572 set_keep_encoding
1573 Same as using the "keep_encoding" option when creating the twig
1574
1575 escape_gt
1576 Usually XML::Twig does not escape > in its output. Using this
1577 option makes it replace > by &gt;
1578
1579 do_not_escape_gt
1580 reverts XML::Twig behavior to its default of not escaping > in its
1581 output.
1582
1583 set_output_filter
1584 Same as using the "output_filter" option when creating the twig
1585
1586 set_output_text_filter
1587 Same as using the "output_text_filter" option when creating the
1588 twig
1589
1590 add_stylesheet ($type, @options)
1591 Adds an external stylesheet to an XML document.
1592
1593 Supported types and options:
1594
1595 xsl option: the url of the stylesheet
1596
1597 Example:
1598
1599 $t->add_stylesheet( xsl => "xsl_style.xsl");
1600
1601 will generate the following PI at the beginning of the
1602 document:
1603
1604 <?xml-stylesheet type="text/xsl" href="xsl_style.xsl"?>
1605
1606 css option: the url of the stylesheet
1607
1608 Methods inherited from XML::Parser::Expat
1609 A twig inherits all the relevant methods from XML::Parser::Expat.
1610 These methods can only be used during the parsing phase (they will
1611 generate a fatal error otherwise).
1612
1613 Inherited methods are:
1614
1615 depth
1616 Returns the size of the context list.
1617
1618 in_element
1619 Returns true if NAME is equal to the name of the innermost
1620 currently opened element. If namespace processing is being
1621 used and you want to check against a name that may be in a
1622 namespace, then use the generate_ns_name method to create the
1623 NAME argument.
1624
1625 within_element
1626 Returns the number of times the given name appears in the
1627 context list. If namespace processing is being used and you
1628 want to check against a name that may be in a namespace, then
1629 use the generate_ns_name method to create the NAME
1630 argument.
1631
1632 context
1633 Returns a list of element names that represent open elements,
1634 with the last one being the innermost. Inside start and end tag
1635 handlers, this will be the tag of the parent element.
1636
1637 current_line
1638 Returns the line number of the current position of the parse.
1639
1640 current_column
1641 Returns the column number of the current position of the parse.
1642
1643 current_byte
1644 Returns the current position of the parse.
1645
1646 position_in_context
1647 Returns a string that shows the current parse position. LINES
1648 should be an integer >= 0 that represents the number of lines
1649 on either side of the current parse line to place into the
1650 returned string.
1651
1652 base ([NEWBASE])
1653 Returns the current value of the base for resolving relative
1654 URIs. If NEWBASE is supplied, changes the base to that value.
1655
1656 current_element
1657 Returns the name of the innermost currently opened element.
1658 Inside start or end handlers, returns the parent of the element
1659 associated with those tags.
1660
1661 element_index
1662 Returns an integer that is the depth-first visit order of the
1663 current element. This will be zero outside of the root
1664 element. For example, this will return 1 when called from the
1665 start handler for the root element start tag.
1666
1667 recognized_string
1668 Returns the string from the document that was recognized in
1669 order to call the current handler. For instance, when called
1670 from a start handler, it will give us the start-tag string.
1671 The string is encoded in UTF-8. This method doesn't return a
1672 meaningful string inside declaration handlers.
1673
1674 original_string
1675 Returns the verbatim string from the document that was
1676 recognized in order to call the current handler. The string is
1677 in the original document encoding. This method doesn't return a
1678 meaningful string inside declaration handlers.
1679
1680 xpcroak
1681 Concatenate onto the given message the current line number
1682 within the XML document plus the message implied by
1683 ErrorContext. Then croak with the formed message.
1684
1685 xpcarp
1686 Concatenate onto the given message the current line number
1687 within the XML document plus the message implied by
1688 ErrorContext. Then carp with the formed message.
1689
1690 xml_escape(TEXT [, CHAR [, CHAR ...]])
1691 Returns TEXT with markup characters turned into character
1692 entities. Any additional characters provided as arguments are
1693 also turned into character references where found in TEXT.
1694
1695 (this method is broken on some versions of expat/XML::Parser)
1696
1697 path ( $optional_tag)
1698 Return the element context in a form similar to XPath's short form:
1699 '"/root/tag1/../tag"'
1700
1701 get_xpath ( $optional_array_ref, $xpath, $optional_offset)
1702 Performs a "get_xpath" on the document root (see "XML::Twig::Elt").
1703
1704 If the $optional_array_ref argument is used the array must contain
1705 elements. The $xpath expression is applied to each element in turn
1706 and the result is the union of all results. This way a first query
1707 can be refined in further steps.
1708
1709 find_nodes ( $optional_array_ref, $xpath, $optional_offset)
1710 same as "get_xpath"
1711
1712 findnodes ( $optional_array_ref, $xpath, $optional_offset)
1713 same as "get_xpath" (similar to the XML::LibXML method)
1714
1715 findvalue ( $optional_array_ref, $xpath, $optional_offset)
1716 Return the "join" of all texts of the results of applying
1717 "get_xpath" to the node (similar to the XML::LibXML method)
1718
1719 subs_text ($regexp, $replace)
1720 subs_text does text substitution on the whole document, similar to
1721 perl's "s///" operator.
1722
1723 dispose
1724 Useful only if you don't have "Scalar::Util" or "WeakRef"
1725 installed.
1726
1727 Reclaims properly the memory used by an XML::Twig object. As the
1728 object has circular references it never goes out of scope, so if
1729 you want to parse lots of XML documents then the memory leak
1730 becomes a problem. Use "$twig->dispose" to clear this problem.
1731
1732 create_accessors (list_of_attribute_names)
1733 A convenience method that creates l-valued accessors for
1734 attributes. So "$twig->create_accessors( 'foo')" will create a
1735 "foo" method that can be called on elements:
1736
1737 $elt->foo; # equivalent to $elt->{'att'}->{'foo'};
1738 $elt->foo( 'bar'); # equivalent to $elt->set_att( foo => 'bar');
1739
1740 set_do_not_escape_amp_in_atts
1741 An evil method, that I only document because Test::Pod::Coverage
1742 complains otherwise, but really, you don't want to know about it.
1743
1744 XML::Twig::Elt
1745 new ($optional_tag, $optional_atts, @optional_content)
1746 The "tag" is optional (but then you can't have any content), the
1747 $optional_atts argument is a reference to a hash of attributes, the
1748 content can be just a string or a list of strings and elements. A
1749 content of '"#EMPTY"' creates an empty element.
1750
1751 Examples: my $elt= XML::Twig::Elt->new();
1752 my $elt= XML::Twig::Elt->new( para => { align => 'center' });
1753 my $elt= XML::Twig::Elt->new( para => { align => 'center' }, 'foo');
1754 my $elt= XML::Twig::Elt->new( br => '#EMPTY');
1755 my $elt= XML::Twig::Elt->new( 'para');
1756 my $elt= XML::Twig::Elt->new( para => 'this is a para');
1757 my $elt= XML::Twig::Elt->new( para => $elt3, 'another para');
1758
1759 The strings are not parsed, the element is not attached to any
1760 twig.
1761
1762 WARNING: if you rely on ID's then you will have to set the id
1763 yourself. At this point the element does not belong to a twig yet,
1764 so the ID attribute is not known so it won't be stored in the ID
1765 list.
1766
1767 Note that "#COMMENT", "#PCDATA" or "#CDATA" are valid tag names,
1768 that will create text elements.
1769
1770 To create an element "foo" containing a CDATA section:
1771
1772 my $foo= XML::Twig::Elt->new( '#CDATA' => "content of the CDATA section")
1773 ->wrap_in( 'foo');
1774
1775 An attribute of '#CDATA', will create the content of the element as
1776 CDATA:
1777
1778 my $elt= XML::Twig::Elt->new( 'p' => { '#CDATA' => 1}, 'foo < bar');
1779
1780 creates an element
1781
1782 <p><![CDATA[foo < bar]]></p>
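
As a rough illustration of the constructor forms described above, the following sketch builds three detached elements and prints them. The tag names, attribute values and text are invented for the example.

```perl
use strict;
use warnings;
use XML::Twig;

# element with an attribute hash and a text content
my $para = XML::Twig::Elt->new( para => { align => 'center' }, 'some text');

# empty element
my $br = XML::Twig::Elt->new( br => '#EMPTY');

# element whose content is stored as a CDATA section
my $code = XML::Twig::Elt->new( p => { '#CDATA' => 1 }, 'foo < bar');

# none of these elements belongs to a twig yet
print $para->sprint, "\n";
print $br->sprint,   "\n";
print $code->sprint, "\n";
```

The elements stay detached until they are pasted somewhere, which is why any ID handling has to be done by hand at this stage.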
1783
1784 parse ($string, %args)
1785 Creates an element from an XML string. The string is actually
1786 parsed as a new twig, then the root of that twig is returned. The
1787 arguments in %args are passed to the twig. As always if the parse
1788 fails the parser will die, so use an eval if you want to trap
1789 syntax errors.
1790
1791 As obviously the element does not exist beforehand this method has
1792 to be called on the class:
1793
1794 my $elt= parse XML::Twig::Elt( "<a> string to parse, with <sub/>
1795 <elements>, actually tons of </elements>
1796 h</a>");
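
Since a failed parse dies, a call to "parse" is usually wrapped in an eval. The sketch below assumes two made-up input strings, one well-formed and one not.

```perl
use strict;
use warnings;
use XML::Twig;

# well-formed string: the root of the temporary twig is returned
my $elt = XML::Twig::Elt->parse( '<a>text with <sub/> an empty element</a>');
print $elt->tag, "\n";

# this string is not well-formed (b is never closed), so parse dies
my $bad   = eval { XML::Twig::Elt->parse( '<a><b></a>') };
my $error = $@;
print "parse failed: $error" unless defined $bad;
```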
1797
1798 set_inner_xml ($string)
1799 Sets the content of the element to be the tree created from the
1800 string
1801
1802 set_inner_html ($string)
1803 Sets the content of the element, after parsing the string with an
1804 HTML parser (HTML::Parser)
1805
1806 print ($optional_filehandle, $optional_pretty_print_style)
1807 Prints an entire element, including the tags, optionally to a
1808 $optional_filehandle, optionally with a $pretty_print_style.
1809
1810 The print outputs XML data so base entities are escaped.
1811
1812 sprint ($elt, $optional_no_enclosing_tag)
1813 Return the xml string for an entire element, including the tags.
1814 If the optional second argument is true then only the string inside
1815 the element is returned (the start and end tag for $elt are not).
1816 The text is XML-escaped: base entities (& and < in text, & < and "
1817 in attribute values) are turned into entities.
1818
1819 gi Return the gi of the element (the gi is the "generic identifier",
1820 the tag name in SGML parlance).
1821
1822 "tag" and "name" are synonyms of "gi".
1823
1824 tag Same as "gi"
1825
1826 name
1827 Same as "tag"
1828
1829 set_gi ($tag)
1830 Set the gi (tag) of an element
1831
1832 set_tag ($tag)
1833 Set the tag (="gi") of an element
1834
1835 set_name ($name)
1836 Set the name (="tag") of an element
1837
1838 root
1839 Return the root of the twig in which the element is contained.
1840
1841 twig
1842 Return the twig containing the element.
1843
1844 parent ($optional_condition)
1845 Return the parent of the element, or the first ancestor matching
1846 the $optional_condition
1847
1848 first_child ($optional_condition)
1849 Return the first child of the element, or the first child matching
1850 the $optional_condition
1851
1852 has_child ($optional_condition)
1853 Return the first child of the element, or the first child matching
1854 the $optional_condition (same as first_child)
1855
1856 has_children ($optional_condition)
1857 Return the first child of the element, or the first child matching
1858 the $optional_condition (same as first_child)
1859
1860 first_child_text ($optional_condition)
1861 Return the text of the first child of the element, or of the
1862 first child matching the $optional_condition. If there is no
1863 such child then returns ''. This avoids getting the child,
1864 checking for its existence then getting the text for trivial
1865 cases.
1866
1867 Similar methods are available for the other navigation methods:
1868
1869 last_child_text
1870 prev_sibling_text
1871 next_sibling_text
1872 prev_elt_text
1873 next_elt_text
1874 child_text
1875 parent_text
1876
1877 All these methods also exist in a "trimmed" variant:
1878
1879 first_child_trimmed_text
1880 last_child_trimmed_text
1881 prev_sibling_trimmed_text
1882 next_sibling_trimmed_text
1883 prev_elt_trimmed_text
1884 next_elt_trimmed_text
1885 child_trimmed_text
1886 parent_trimmed_text

1887 field ($condition)
1888 Same method as "first_child_text" with a different name
1889
1890 fields ($condition_list)
1891 Return the list of field (text of first child matching the
1892 conditions), missing fields are returned as the empty string.
1895
1896 trimmed_field ($optional_condition)
1897 Same method as "first_child_trimmed_text" with a different name
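
The field-style accessors can be compared side by side in a small sketch. The "person" record and its child names are hypothetical.

```perl
use strict;
use warnings;
use XML::Twig;

my $person = XML::Twig::Elt->parse(
    '<person><name>  Ann   Lee </name><city>Oslo</city></person>');

my $raw     = $person->first_child_text( 'name');  # text of the child, spaces kept
my $trimmed = $person->trimmed_field( 'name');     # same text, trimmed
my $missing = $person->field( 'zip');              # no such child: empty string
my @both    = $person->fields( 'city', 'zip');     # one entry per condition

print "[$raw] [$trimmed] [$missing] [@both]\n";
```

Because a missing child yields the empty string rather than undef, the results can be used in string operations without a prior existence check.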
1898
1899 set_field ($condition, $optional_atts, @list_of_elt_and_strings)
1900 Set the content of the first child of the element that matches
1901 $condition, the rest of the arguments is the same as for
1902 "set_content"
1903
1904 If no child matches $condition _and_ if $condition is a valid XML
1905 element name, then a new element by that name is created and
1906 inserted as the last child.
1907
1908 first_child_matches ($optional_condition)
1909 Return the element if the first child of the element (if it exists)
1910 passes the $optional_condition, "undef" otherwise.
1911
1912 if( $elt->first_child_matches( 'title')) ...
1913
1914 is equivalent to
1915
1916 if( $elt->{first_child} && $elt->{first_child}->passes( 'title'))
1917
1918 "first_child_is" is another name for this method
1919
1920 Similar methods are available for the other navigation methods:
1921
1922 last_child_matches
1923 prev_sibling_matches
1924 next_sibling_matches
1925 prev_elt_matches
1926 next_elt_matches
1927 child_matches
1928 parent_matches

1929 is_first_child ($optional_condition)
1930 returns true (the element) if the element is the first child of its
1931 parent (optionally that satisfies the $optional_condition)
1932
1933 is_last_child ($optional_condition)
1934 returns true (the element) if the element is the last child of its
1935 parent (optionally that satisfies the $optional_condition)
1936
1937 prev_sibling ($optional_condition)
1938 Return the previous sibling of the element, or the previous sibling
1939 matching $optional_condition
1940
1941 next_sibling ($optional_condition)
1942 Return the next sibling of the element, or the first one matching
1943 $optional_condition.
1944
1945 next_elt ($optional_elt, $optional_condition)
1946 Return the next elt (optionally matching $optional_condition) of
1947 the element. This is defined as the next element which opens after
1948 the current element opens. Which usually means the first child of
1949 the element. Counter-intuitive as it might look this allows you to
1950 loop through the whole document by starting from the root.
1951
1952 The $optional_elt is the root of a subtree. When the "next_elt" is
1953 out of the subtree then the method returns undef. You can then walk
1954 a sub tree with:
1955
1956 my $elt= $subtree_root;
1957 while( $elt= $elt->next_elt( $subtree_root))
1958 { # insert processing code here
1959 }
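
A complete version of such a subtree walk might look like the sketch below. The document and its tag names are made up; text elements are skipped so that only real tags are collected.

```perl
use strict;
use warnings;
use XML::Twig;

my $root = XML::Twig::Elt->parse(
    '<doc><sec><title>A</title><p>x</p></sec><sec><title>B</title></sec></doc>');

# walk only the first sec: next_elt returns undef once it leaves the subtree
my $subtree_root = $root->first_child( 'sec');
my @tags;
my $elt = $subtree_root;
while( $elt = $elt->next_elt( $subtree_root))
  { push @tags, $elt->tag unless $elt->is_text; }

print join( ' ', @tags), "\n";
```

The title of the second sec is never visited, since it opens after the subtree root has closed.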
1960
1961 prev_elt ($optional_condition)
1962 Return the previous elt (optionally matching $optional_condition)
1963 of the element. This is the first element which opens before the
1964 current one. It is usually either the last descendant of the
1965 previous sibling or simply the parent
1966
1967 next_n_elt ($offset, $optional_condition)
1968 Return the $offset-th element that matches the $optional_condition
1969
1970 following_elt
1971 Return the following element (as per the XPath following axis)
1972
1973 preceding_elt
1974 Return the preceding element (as per the XPath preceding axis)
1975
1976 following_elts
1977 Return the list of following elements (as per the XPath following
1978 axis)
1979
1980 preceding_elts
1981 Return the list of preceding elements (as per the XPath preceding
1982 axis)
1983
1984 children ($optional_condition)
1985 Return the list of children (optionally which matches
1986 $optional_condition) of the element. The list is in document order.
1987
1988 children_count ($optional_condition)
1989 Return the number of children of the element (optionally which
1990 matches $optional_condition)
1991
1992 children_text ($optional_condition)
1993 In array context, returns an array containing the text of children
1994 of the element (optionally which matches $optional_condition)
1995
1996 In scalar context, returns the concatenation of the text of
1997 children of the element
1998
1999 children_trimmed_text ($optional_condition)
2000 In array context, returns an array containing the trimmed text of
2001 children of the element (optionally which matches
2002 $optional_condition)
2003
2004 In scalar context, returns the concatenation of the trimmed text of
2005 children of the element
2006
2007 children_copy ($optional_condition)
2008 Return a list of elements that are copies of the children of the
2009 element, optionally which matches $optional_condition
2010
2011 descendants ($optional_condition)
2012 Return the list of all descendants (optionally which matches
2013 $optional_condition) of the element. This is the equivalent of the
2014 "getElementsByTagName" of the DOM (by the way, if you are really a
2015 DOM addict, you can use "getElementsByTagName" instead)
2016
2017 getElementsByTagName ($optional_condition)
2018 Same as "descendants"
2019
2020 find_by_tag_name ($optional_condition)
2021 Same as "descendants"
2022
2023 descendants_or_self ($optional_condition)
2024 Same as "descendants" except that the element itself is included in
2025 the list if it matches the $optional_condition
2026
2027 first_descendant ($optional_condition)
2028 Return the first descendant of the element that matches the
2029 condition
2030
2031 last_descendant ($optional_condition)
2032 Return the last descendant of the element that matches the
2033 condition
2034
2035 ancestors ($optional_condition)
2036 Return the list of ancestors (optionally matching
2037 $optional_condition) of the element. The list is ordered from the
2038 innermost ancestor to the outermost one
2039
2040 NOTE: the element itself is not part of the list, in order to
2041 include it you will have to use ancestors_or_self
2042
2043 ancestors_or_self ($optional_condition)
2044 Return the list of ancestors (optionally matching
2045 $optional_condition) of the element, including the element (if it
2046 matches the condition). The list is ordered from the innermost
2047 ancestor to the outermost one
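
The list-returning navigation methods above can be tried on a small tree. This is only a sketch; the document structure is invented for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my $doc = XML::Twig::Elt->parse(
    '<doc><sec><p>a</p><p>b</p></sec><p>c</p></doc>');

my @kids    = map { $_->tag } $doc->children;           # direct children only
my @all_p   = $doc->descendants( 'p');                  # every p below doc
my $n_p     = $doc->children_count( 'p');               # p children of doc
my $inner   = $doc->first_descendant( 'p');             # first p in document order
my @anc     = map { $_->tag } $inner->ancestors;        # innermost ancestor first
my @sec_txt = $doc->first_child( 'sec')->children_text( 'p');

print "@kids | ", scalar @all_p, " | $n_p | @anc | @sec_txt\n";
```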
2048
2049 passes ($condition)
2050 Return the element if it passes the $condition
2051
2052 att ($att)
2053 Return the value of attribute $att or "undef"
2054
2055 set_att ($att, $att_value)
2056 Set the attribute of the element to the given value
2057
2058 You can actually set several attributes this way:
2059
2060 $elt->set_att( att1 => "val1", att2 => "val2");
2061
2062 del_att ($att)
2063 Delete the attribute for the element
2064
2065 You can actually delete several attributes at once:
2066
2067 $elt->del_att( 'att1', 'att2', 'att3');
2068
2069 att_exists ($att)
2070 Returns true if the attribute $att exists for the element, false
2071 otherwise
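
The attribute methods work on detached elements too, as the following minimal sketch shows. The "img" element and its attributes are made up for illustration.

```perl
use strict;
use warnings;
use XML::Twig;

my $img = XML::Twig::Elt->new( img => { src => 'a.png' });

$img->set_att( alt => 'logo', width => 10);   # set several attributes at once
$img->del_att( 'width');                      # remove one again

my $alt       = $img->att( 'alt');            # value of an existing attribute
my $has_width = $img->att_exists( 'width');   # false after del_att
my $missing   = $img->att( 'height');         # undef: never set

print $img->sprint, "\n";
```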
2072
2073 cut Cut the element from the tree. The element still exists, it can be
2074 copied or pasted somewhere else, it is just not attached to the
2075 tree anymore.
2076
2077 Note that the "old" links to the parent, previous and next siblings
2078 can still be accessed using the former_* methods
2079
2080 former_next_sibling
2081 Returns the former next sibling of a cut node (or undef if the node
2082 has not been cut)
2083
2084 This makes it easier to write loops where you cut elements:
2085
2086 my $child= $parent->first_child( 'achild');
2087 while( $child && $child->{'att'}->{'cut'})
2088 { $child->cut; $child= $child->former_next_sibling; }
2089
2090 former_prev_sibling
2091 Returns the former previous sibling of a cut node (or undef if the
2092 node has not been cut)
2093
2094 former_parent
2095 Returns the former parent of a cut node (or undef if the node has
2096 not been cut)
2097
2098 cut_children ($optional_condition)
2099 Cut all the children of the element (or all of those which satisfy
2100 the $optional_condition).
2101
2102 Return the list of children
2103
2104 copy ($elt)
2105 Return a copy of the element. The copy is a "deep" copy: all sub
2106 elements of the element are duplicated.
2107
2108 paste ($optional_position, $ref)
2109 Paste a (previously "cut" or newly generated) element. Die if the
2110 element already belongs to a tree.
2111
2112 Note that the calling element is pasted:
2113
2114 $child->paste( first_child => $existing_parent);
2115 $new_sibling->paste( after => $this_sibling_is_already_in_the_tree);
2116
2117 or
2118
2119 my $new_elt= XML::Twig::Elt->new( tag => $content);
2120 $new_elt->paste( $position => $existing_elt);
2121
2122 Example:
2123
2124 my $t= XML::Twig->new->parsefile( 'doc.xml');
2125 my $toc= $t->root->new( 'toc');
2126 $toc->paste( $t->root); # $toc is pasted as first child of the root
2127 foreach my $title ($t->findnodes( '/doc/section/title'))
2128 { my $title_toc= $title->copy;
2129 # paste $title_toc as the last child of toc
2130 $title_toc->paste( last_child => $toc)
2131 }
2132
2133 Position options:
2134
2135 first_child (default)
2136 The element is pasted as the first child of $ref
2137
2138 last_child
2139 The element is pasted as the last child of $ref
2140
2141 before
2142 The element is pasted before $ref, as its previous sibling.
2143
2144 after
2145 The element is pasted after $ref, as its next sibling.
2146
2147 within
2148 In this case an extra argument, $offset, should be supplied.
2149 The element will be pasted in the reference element (or in its
2150 first text child) at the given offset. To achieve this the
2151 reference element will be split at the offset.
2152
2153 Note that you can call directly the underlying method:
2154
2155 paste_before
2156 paste_after
2157 paste_first_child
2158 paste_last_child
2159 paste_within

2160 move ($optional_position, $ref)
2161 Move an element in the tree. This is just a "cut" then a "paste".
2162 The syntax is the same as "paste".
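
The following sketch combines "cut", "paste" and "move" on a tiny tree, under the assumption of three made-up child elements a, b and c. The comments track the order of the children after each step.

```perl
use strict;
use warnings;
use XML::Twig;

my $doc = XML::Twig::Elt->parse( '<doc><a/><b/><c/></doc>');

# cut c, then paste it back as the first child      => c a b
my $c = $doc->first_child( 'c');
$c->cut;
$c->paste( first_child => $doc);

# paste a newly created element after a              => c a d b
my $d = XML::Twig::Elt->new( 'd');
$d->paste( after => $doc->first_child( 'a'));

# move is a cut followed by a paste                  => b c a d
$doc->first_child( 'b')->move( before => $c);

my $order = join ' ', map { $_->tag } $doc->children;
print "$order\n";
```

In each call it is the calling element that gets pasted or moved; the second argument is the reference element that is already in the tree.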
2163
2164 replace ($ref)
2165 Replaces an element in the tree. Sometimes it is just not possible
2166 to "cut" an element then "paste" another in its place, so "replace"
2167 comes in handy. The calling element replaces $ref.
2168
2169 replace_with (@elts)
2170 Replaces the calling element with one or more elements
2171
2172 delete
2173 Cut the element and free the memory.
2174
2175 prefix ($text, $optional_option)
2176 Add a prefix to an element. If the element is a "PCDATA" element
2177 the text is added to the pcdata, if the element's first child is a
2178 "PCDATA" then the text is added to its pcdata, otherwise a new
2179 "PCDATA" element is created and pasted as the first child of the
2180 element.
2181
2182 If the option is "asis" then the prefix is added asis: it is
2183 created in a separate "PCDATA" element with an "asis" property. You
2184 can then write:
2185
2186 $elt1->prefix( '<b>', 'asis');
2187
2188 to create a "<b>" in the output of "print".
2189
2190 suffix ($text, $optional_option)
2191 Add a suffix to an element. If the element is a "PCDATA" element
2192 the text is added to the pcdata, if the element's last child is a
2193 "PCDATA" then the text is added to its pcdata, otherwise a new
2194 "PCDATA" element is created and pasted as the last child of the
2195 element.
2196
2197 If the option is "asis" then the suffix is added asis: it is
2198 created in a separate "PCDATA" element with an "asis" property. You
2199 can then write:
2200
2201 $elt2->suffix( '</b>', 'asis');
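
Here is a small sketch of both variants of "prefix" and "suffix", first with plain text and then with the "asis" option. The element and its text are invented for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my $p = XML::Twig::Elt->new( p => 'world');

# plain text is merged into the existing PCDATA child
$p->prefix( 'hello ');
$p->suffix( '!');
my $plain = $p->text;

# asis text goes into separate PCDATA elements and is output unescaped
$p->prefix( '<b>', 'asis');
$p->suffix( '</b>', 'asis');
my $xml = $p->sprint;

print "$plain\n$xml\n";
```

The "asis" pieces are still text as far as the tree is concerned: no b element is created, the markup only appears in the output.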
2202
2203 trim
2204 Trim the element in-place: spaces at the beginning and at the end
2205 of the element are discarded and multiple spaces within the element
2206 (or its descendants) are replaced by a single space.
2207
2208 Note that in some cases you can still end up with multiple spaces,
2209 if they are split between several elements:
2210
2211 <doc> text <b> hah! </b> yep</doc>
2212
2213 gets trimmed to
2214
2215 <doc>text <b> hah! </b> yep</doc>
2216
2217 This is somewhere in between a bug and a feature.
2218
2219 normalize
2220 Merge together all consecutive pcdata elements in the element (if
2221 for example you have turned some elements into pcdata using
2222 "erase", this will give you a "clean" element in which all
2223 text fragments are as long as possible).
2224
2225 simplify (%options)
2226 Return a data structure suspiciously similar to XML::Simple's.
2227 Options are identical to XMLin options, see XML::Simple doc for
2228 more details (or use Data::Dumper or YAML to dump the data
2229 structure)
2230
2231 content_key
2232 forcearray
2233 keyattr
2234 noattr
2235 normalize_space
2236 aka normalise_space
2237
2238 variables (%var_hash)
2239 %var_hash is a hash { name => value }
2240
2241 This option allows variables in the XML to be expanded when the
2242 file is read. (there is no facility for putting the variable
2243 names back if you regenerate XML using XMLout).
2244
2245 A 'variable' is any text of the form ${name} (or $name) which
2246 occurs in an attribute value or in the text content of an
2247 element. If 'name' matches a key in the supplied hashref,
2248 ${name} will be replaced with the corresponding value from the
2249 hashref. If no matching key is found, the variable will not be
2250 replaced.
2251
2252 var_att ($attribute_name)
2253 This option gives the name of an attribute that will be used to
2254 create variables in the XML:
2255
2256 <dirs>
2257 <dir name="prefix">/usr/local</dir>
2258 <dir name="exec_prefix">$prefix/bin</dir>
2259 </dirs>
2260
2261 use "var => 'name'" to get $prefix replaced by /usr/local in
2262 the generated data structure
2263
2264 By default variables are captured by the following regexp:
2265 /\$(\w+)/
2266
2267 var_regexp (regexp)
2268 This option changes the regexp used to capture variables. The
2269 variable name should be in $1
2270
2271 group_tags { grouping tag => grouped tag, grouping tag 2 => grouped
2272 tag 2...}
2273 Option used to simplify the structure: elements listed will not
2274 be used. Their children will be, they will be considered
2275 children of the element parent.
2276
2277 If the element is:
2278
2279 <config host="laptop.xmltwig.com">
2280 <server>localhost</server>
2281 <dirs>
2282 <dir name="base">/home/mrodrigu/standards</dir>
2283 <dir name="tools">$base/tools</dir>
2284 </dirs>
2285 <templates>
2286 <template name="std_def">std_def.templ</template>
2287 <template name="dummy">dummy</template>
2288 </templates>
2289 </config>
2290
2291 Then calling simplify with "group_tags => { dirs => 'dir',
2292 templates => 'template'}" makes the data structure be exactly
2293 as if the start and end tags for "dirs" and "templates" were
2294 not there.
2295
2296 A YAML dump of the structure
2297
2298 base: '/home/mrodrigu/standards'
2299 host: laptop.xmltwig.com
2300 server: localhost
2301 template:
2302 - std_def.templ
2303 - dummy
2304 tools: '$base/tools'
2305
2306 split_at ($offset)
2307 Split a text ("PCDATA" or "CDATA") element in 2 at $offset, the
2308 original element now holds the first part of the string and a new
2309 element holds the right part. The new element is returned
2310
2311 If the element is not a text element then the first text child of
2312 the element is split
2313
2314 split ( $optional_regexp, $tag1, $atts1, $tag2, $atts2...)
2315 Split the text descendants of an element in place, the text is
2316 split using the $regexp, if the regexp includes () then the matched
2317 separators will be wrapped in elements. $1 is wrapped in $tag1,
2318 with attributes $atts1 if $atts1 is given (as a hashref), $2 is
2319 wrapped in $tag2...
2320
2321 if $elt is "<p>tati tata <b>tutu tati titi</b> tata tati tata</p>"
2322
2323 $elt->split( qr/(ta)ti/, 'foo', {type => 'toto'} )
2324
2325 will change $elt to
2326
2327 <p><foo type="toto">ta</foo> tata <b>tutu <foo type="toto">ta</foo>
2328 titi</b> tata <foo type="toto">ta</foo> tata</p>
2329
2330 The regexp can be passed either as a string or as "qr//" (perl
2331 5.005 and later), it defaults to \s+ just as the "split" built-in
2332 (but this would be quite a useless behaviour without the
2333 $optional_tag parameter)
2334
2335 $optional_tag defaults to PCDATA or CDATA, depending on the initial
2336 element type
2337
2338 The list of descendants is returned (including un-touched original
2339 elements and newly created ones)
2340
2341 mark ( $regexp, $optional_tag, $optional_attribute_ref)
2342 This method behaves exactly as split, except only the newly created
2343 elements are returned
2344
2345 wrap_children ( $regexp_string, $tag, $optional_attribute_hashref)
2346 Wrap the children of the element that match the regexp in an
2347 element $tag. If $optional_attribute_hashref is passed then the
2348 new element will have these attributes.
2349
2350 The $regexp_string includes tags, within pointy brackets, as in
2351 "<title><para>+" and the usual Perl modifiers (+*?...). Tags can
2352 be further qualified with attributes: "<para type="warning"
2353 classif="cosmic_secret">+". The values for attributes should be
2354 xml-escaped: "<candy type="M&amp;Ms">*" ("<", "&", ">" and """
2355 should be escaped).
2356
2357 Note that elements might get extra "id" attributes in the process.
2358 See add_id. Use strip_att to remove unwanted id's.
2359
2360 Here is an example:
2361
2362 If the element $elt has the following content:
2363
2364 <elt>
2365 <p>para 1</p>
2366 <l_l1_1>list 1 item 1 para 1</l_l1_1>
2367 <l_l1>list 1 item 1 para 2</l_l1>
2368 <l_l1_n>list 1 item 2 para 1 (only para)</l_l1_n>
2369 <l_l1_n>list 1 item 3 para 1</l_l1_n>
2370 <l_l1>list 1 item 3 para 2</l_l1>
2371 <l_l1>list 1 item 3 para 3</l_l1>
2372 <l_l1_1>list 2 item 1 para 1</l_l1_1>
2373 <l_l1>list 2 item 1 para 2</l_l1>
2374 <l_l1_n>list 2 item 2 para 1 (only para)</l_l1_n>
2375 <l_l1_n>list 2 item 3 para 1</l_l1_n>
2376 <l_l1>list 2 item 3 para 2</l_l1>
2377 <l_l1>list 2 item 3 para 3</l_l1>
2378 </elt>
2379
2380 Then the code
2381
2382 $elt->wrap_children( q{<l_l1_1><l_l1>*} , li => { type => "ul1" });
2383 $elt->wrap_children( q{<l_l1_n><l_l1>*} , li => { type => "ul" });
2384
2385 $elt->wrap_children( q{<li type="ul1"><li type="ul">+}, "ul");
2386 $elt->strip_att( 'id');
2387 $elt->strip_att( 'type');
2388 $elt->print;
2389
2390 will output:
2391
2392 <elt>
2393 <p>para 1</p>
2394 <ul>
2395 <li>
2396 <l_l1_1>list 1 item 1 para 1</l_l1_1>
2397 <l_l1>list 1 item 1 para 2</l_l1>
2398 </li>
2399 <li>
2400 <l_l1_n>list 1 item 2 para 1 (only para)</l_l1_n>
2401 </li>
2402 <li>
2403 <l_l1_n>list 1 item 3 para 1</l_l1_n>
2404 <l_l1>list 1 item 3 para 2</l_l1>
2405 <l_l1>list 1 item 3 para 3</l_l1>
2406 </li>
2407 </ul>
2408 <ul>
2409 <li>
2410 <l_l1_1>list 2 item 1 para 1</l_l1_1>
2411 <l_l1>list 2 item 1 para 2</l_l1>
2412 </li>
2413 <li>
2414 <l_l1_n>list 2 item 2 para 1 (only para)</l_l1_n>
2415 </li>
2416 <li>
2417 <l_l1_n>list 2 item 3 para 1</l_l1_n>
2418 <l_l1>list 2 item 3 para 2</l_l1>
2419 <l_l1>list 2 item 3 para 3</l_l1>
2420 </li>
2421 </ul>
2422 </elt>
2423
2424 subs_text ($regexp, $replace)
2425 subs_text does text substitution, similar to perl's "s///"
2426 operator.
2427
2428 $regexp must be a perl regexp, created with the "qr" operator.
2429
2430 $replace can include "$1, $2"... from the $regexp. It can also be
2431 used to create element and entities, by using "&elt( tag => { att
2432 => val }, text)" (similar syntax as "new") and "&ent( name)".
2433
2434 Here is a rather complex example:
2435
2436 $elt->subs_text( qr{(?<!do not )link to (http://([^\s,]*))},
2437 'see &elt( a =>{ href => $1 }, $2)'
2438 );
2439
2440 This will replace text like link to http://www.xmltwig.com by see
2441 <a href="http://www.xmltwig.com">www.xmltwig.com</a>, but not do not
2442 link to...
2443
2444 Generating entities (here replacing spaces with &nbsp;):
2445
2446 $elt->subs_text( qr{ }, '&ent( "&nbsp;")');
2447
2448 or, using a variable:
2449
2450 my $ent="&nbsp;";
2451 $elt->subs_text( qr{ }, "&ent( '$ent')");
2452
2453 Note that the substitution is always global, as in using the "g"
2454 modifier in a perl substitution, and that it is performed on all
2455 text descendants of the element.
2456
2457 Bug: in the $regexp, you can only use "\1", "\2"... if the
2458 replacement expression does not include elements or attributes. eg
2459
2460 $t->subs_text( qr/((t[aiou])\2)/, '$2'); # ok, replaces toto, tata, titi, tutu by to, ta, ti, tu
2461 $t->subs_text( qr/((t[aiou])\2)/, '&elt(p => $1)' ); # NOK, does not find toto...
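
For the simplest case, a plain-text replacement using captured groups, a sketch could look like this. The sample paragraph is made up; the replacement string is single-quoted so that $1 and $2 reach "subs_text" uninterpolated.

```perl
use strict;
use warnings;
use XML::Twig;

my $p = XML::Twig::Elt->parse( '<p>pages 10-20 and <b>30-40</b></p>');

# swap the two numbers of every range, in all text descendants of $p
$p->subs_text( qr/(\d+)-(\d+)/, '$2-$1');

my $text = $p->text;
print $p->sprint, "\n";
```

The text inside the nested b element is rewritten as well, since the substitution applies to every text descendant.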
2462
2463 add_id ($optional_coderef)
2464 Add an id to the element.
2465
2466 The id is an attribute, "id" by default, see the "id" option for
2467 XML::Twig "new" to change it. Use an id starting with "#" to get an
2468 id that's not output by print, flush or sprint, yet that allows you
2469 to use the elt_id method to get the element easily.
2470
2471 If the element already has an id, no new id is generated.
2472
2473 By default the method creates an id of the form "twig_id_<nnnn>",
2474 where "<nnnn>" is a number, incremented each time the method is
2475 called successfully.
2476
2477 set_id_seed ($prefix)
2478 by default the id generated by "add_id" is "twig_id_<nnnn>",
2479 "set_id_seed" changes the prefix to $prefix and resets the number
2480 to 1
2481
2482 strip_att ($att)
2483 Remove the attribute $att from all descendants of the element
2484 (including the element)
2485
2486 Return the element
2487
2488 change_att_name ($old_name, $new_name)
2489 Change the name of the attribute from $old_name to $new_name. If
2490 there is no attribute $old_name nothing happens.
2491
2492 lc_attnames
2493 Lowercases the names of all the attributes of the element.
2494
2495 sort_children_on_value( %options)
2496 Sort the children of the element in place according to their text.
2497 All children are sorted.
2498
2499 Return the element, with its children sorted.
2500
2501 %options are
2502
2503 type : numeric | alpha (default: alpha)
2504 order : normal | reverse (default: normal)
2507
2508 sort_children_on_att ($att, %options)
2509 Sort the children of the element in place according to attribute
2510 $att. %options are the same as for "sort_children_on_value"
2511
2512 Return the element.
2513
2514 sort_children_on_field ($tag, %options)
2515 Sort the children of the element in place, according to the field
2516 $tag (the text of the first child of the child with this tag).
2517 %options are the same as for "sort_children_on_value".
2518
2519 Return the element, with its children sorted
2520
2521 sort_children( $get_key, %options)
2522 Sort the children of the element in place. The $get_key argument is
2523 a reference to a function that returns the sort key when passed an
2524 element.
2525
2526 For example:
2527
2528 $elt->sort_children( sub { $_[0]->{'att'}->{"nb"} + $_[0]->text },
2529 type => 'numeric', order => 'reverse'
2530 );
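
The sorting methods can be compared on a short list. This is a sketch only; the "item" elements and their "n" attribute are hypothetical.

```perl
use strict;
use warnings;
use XML::Twig;

my $list = XML::Twig::Elt->parse(
    '<list><item n="3">c</item><item n="1">a</item><item n="2">b</item></list>');

# numeric sort on the n attribute
$list->sort_children_on_att( 'n', type => 'numeric');
my $by_att = join '', map { $_->text } $list->children;

# reverse alphabetical sort on the text of the children
$list->sort_children_on_value( order => 'reverse');
my $by_value = join '', map { $_->text } $list->children;

# custom key: a code reference that receives each child
$list->sort_children( sub { $_[0]->att( 'n') }, type => 'numeric');
my $by_key = join '', map { $_->text } $list->children;

print "$by_att $by_value $by_key\n";
```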
2531
2532 field_to_att ($cond, $att)
2533 Turn the text of the first sub-element matched by $cond into the
2534 value of attribute $att of the element. If $att is omitted then
2535 $cond is used as the name of the attribute, which makes sense only
2536 if $cond is a valid element (and attribute) name.
2537
2538 The sub-element is then cut.
2539
2540 att_to_field ($att, $tag)
2541 Take the value of attribute $att and create a sub-element $tag as
2542 first child of the element. If $tag is omitted then $att is used as
2543 the name of the sub-element.
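
The two conversions can be sketched on a small record. The "book" element, its children and the "ident" tag name are assumptions made for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my $book = XML::Twig::Elt->parse(
    '<book id="b1"><isbn>123</isbn><title>T</title></book>');

# the isbn child becomes an isbn attribute, and the child is cut
$book->field_to_att( 'isbn');

# the id attribute becomes an ident child, inserted as first child
$book->att_to_field( 'id', 'ident');

print $book->sprint, "\n";
```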
2544
2545 get_xpath ($xpath, $optional_offset)
2546 Return a list of elements satisfying the $xpath. $xpath is an
2547 XPATH-like expression.
2548
2549 A subset of the XPATH abbreviated syntax is covered:
2550
2551 tag
2552 tag[1] (or any other positive number)
2553 tag[last()]
2554 tag[@att] (the attribute exists for the element)
2555 tag[@att="val"]
2556 tag[@att=~ /regexp/]
2557 tag[att1="val1" and att2="val2"]
2558 tag[att1="val1" or att2="val2"]
tag[string()="toto"] (returns tag elements whose text (as per the text method)
is toto)
tag[string()=~/regexp/] (returns tag elements whose text (as per the text
method) matches regexp)
2563 expressions can start with / (search starts at the document root)
2564 expressions can start with . (search starts at the current element)
2565 // can be used to get all descendants instead of just direct children
2566 * matches any tag
2567
So the following examples from the XPath recommendation
<http://www.w3.org/TR/xpath.html#path-abbrev> work:
2570
2571 para selects the para element children of the context node
2572 * selects all element children of the context node
2573 para[1] selects the first para child of the context node
2574 para[last()] selects the last para child of the context node
2575 */para selects all para grandchildren of the context node
2576 /doc/chapter[5]/section[2] selects the second section of the fifth chapter
2577 of the doc
2578 chapter//para selects the para element descendants of the chapter element
2579 children of the context node
2580 //para selects all the para descendants of the document root and thus selects
2581 all para elements in the same document as the context node
2582 //olist/item selects all the item elements in the same document as the
2583 context node that have an olist parent
2584 .//para selects the para element descendants of the context node
2585 .. selects the parent of the context node
2586 para[@type="warning"] selects all para children of the context node that have
2587 a type attribute with value warning
2588 employee[@secretary and @assistant] selects all the employee children of the
2589 context node that have both a secretary attribute and an assistant
2590 attribute
2591
2592 The elements will be returned in the document order.
2593
If $optional_offset is given then only one element is returned: the
one at that offset in the result list, counting from 0.
2596
2597 Quoting and interpolating variables can be a pain when the Perl
2598 syntax and the XPATH syntax collide, so use alternate quoting
2599 mechanisms like q or qq (I like q{} and qq{} myself).
2600
2601 Here are some more examples to get you started:
2602
2603 my $p1= "p1";
2604 my $p2= "p2";
2605 my @res= $t->get_xpath( qq{p[string( "$p1") or string( "$p2")]});
2606
2607 my $a= "a1";
2608 my @res= $t->get_xpath( qq{//*[@att="$a"]});
2609
2610 my $val= "a1";
2611 my $exp= qq{//p[ \@att='$val']}; # you need to use \@ or you will get a warning
2612 my @res= $t->get_xpath( $exp);
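
A self-contained sketch of the same method, run against a made-up document (the "chapter" and "para" tags are illustrative), showing an attribute test, the optional offset and a search relative to an element:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse( '<doc><chapter><para type="warning">w1</para><para>p1</para></chapter>'
            . '<chapter><para type="warning">w2</para></chapter></doc>');

# All matching descendants of the root, in document order.
my @hits = $twig->get_xpath(q{//para[@type="warning"]});

# With an offset only one element comes back (offsets start at 0).
my $second = $twig->get_xpath(q{//para[@type="warning"]}, 1);

# A relative expression is evaluated from the element it is called on.
my $chapter  = $twig->root->first_child('chapter');
my @in_first = $chapter->get_xpath(q{para});

print scalar(@hits), " warnings, second is '", $second->text, "'\n";
print scalar(@in_first), " paras in the first chapter\n";
```

Using q{} keeps the @ in the expression from being interpolated by Perl.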
2613
Note that the only supported regexp delimiter is / and that you
must backslash all / in regexps AND in regular strings.
2616
XML::Twig does not natively provide full XPath support, but you can
use "XML::Twig::XPath" to get "findnodes" to use "XML::XPath" as
the XPath engine, with full coverage of the spec.
2620
2624 find_nodes
same as "get_xpath"
2626
2627 findnodes
2628 same as "get_xpath"
2629
2630 text @optional_options
2631 Return a string consisting of all the "PCDATA" and "CDATA" in an
2632 element, without any tags. The text is not XML-escaped: base
2633 entities such as "&" and "<" are not escaped.
2634
2635 The '"no_recurse"' option will only return the text of the element,
2636 not of any included sub-elements (same as "text_only").
2637
2638 text_only
2639 Same as "text" except that the text returned doesn't include the
2640 text of sub-elements.
2641
2642 trimmed_text
2643 Same as "text" except that the text is trimmed: leading and
2644 trailing spaces are discarded, consecutive spaces are collapsed
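
The difference between the three text methods can be sketched on a small made-up document:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<doc><p>Hello <b>big</b> world</p><p>  spaced   out  </p></doc>');
my ($first, $second) = $twig->root->children('p');

my $all     = $first->text;           # text of <p> and of its sub-elements
my $own     = $first->text_only;      # text of <p> only, <b> is skipped
my $trimmed = $second->trimmed_text;  # spaces trimmed and collapsed

print "[$all]\n[$own]\n[$trimmed]\n";
```

Note that "text_only" keeps the spaces on both sides of the skipped sub-element, so the result contains two consecutive spaces.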
2645
2646 set_text ($string)
2647 Set the text for the element: if the element is a "PCDATA", just
2648 set its text, otherwise cut all the children of the element and
2649 create a single "PCDATA" child for it, which holds the text.
2650
2651 merge ($elt2)
2652 Move the content of $elt2 within the element
2653
2654 insert ($tag1, [$optional_atts1], $tag2, [$optional_atts2],...)
For each tag in the list, insert an element $tag as the only child
of the element. The element gets the optional attributes in
$optional_atts<n>. All children of the element are set as
children of the new element. The upper level element is returned.
2659
2660 $p->insert( table => { border=> 1}, 'tr', 'td')
2661
puts a table with a visible border, a single "tr" and a single
"td" inside $p, around its content, and returns the "table" element:
2664
2665 <p><table border="1"><tr><td>original content of p</td></tr></table></p>
2666
2667 wrap_in (@tag)
2668 Wrap elements in @tag as the successive ancestors of the element,
2669 returns the new element. "$elt->wrap_in( 'td', 'tr', 'table')"
2670 wraps the element as a single cell in a table for example.
2671
2672 Optionally each tag can be followed by a hashref of attributes,
2673 that will be set on the wrapping element:
2674
$elt->wrap_in( p => { class => "advisory" }, div => { class => "intro", id => "div_intro" });
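
"insert" and "wrap_in" are easy to confuse. The following sketch, on a made-up document, applies one to each of two paragraphs so the results can be compared:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<doc><p>first</p><p>second</p></doc>');
my ($first, $second) = $twig->root->children('p');

# insert: the new elements go INSIDE the element, around its content.
$first->insert(table => { border => 1 }, 'tr', 'td');

# wrap_in: the new elements go AROUND the element itself.
my $outer = $second->wrap_in('td', 'tr', 'table');

print $first->sprint, "\n";
print $outer->sprint, "\n";
```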
2676
2677 insert_new_elt ($opt_position, $tag, $opt_atts_hashref, @opt_content)
Combines a "new" and a "paste": creates a new element using $tag,
$opt_atts_hashref and @opt_content, which are arguments similar to
those for "new", then pastes it, using $opt_position or
'first_child', relative to $elt.
2682
2683 Return the newly created element
2684
2685 erase
2686 Erase the element: the element is deleted and all of its children
2687 are pasted in its place.
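
A short sketch of "erase" and "insert_new_elt" together; the document and the "note" element are invented for the example:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<doc><p>some <b>bold</b> text</p></doc>');
my $doc = $twig->root;
my $p   = $doc->first_child('p');

# erase: the <b> tags go away, the content of <b> stays where it was.
$p->first_child('b')->erase;

# insert_new_elt: create a <note> and paste it as the last child of <doc>.
my $note = $doc->insert_new_elt(last_child => 'note',
                                { class => 'added' }, 'see above');

print $doc->sprint, "\n";
```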
2688
set_content ($optional_atts, @list_of_elt_and_strings)
set_content ($optional_atts, '#EMPTY')
2691 Set the content for the element, from a list of strings and
2692 elements. Cuts all the element children, then pastes the list
2693 elements as the children. This method will create a "PCDATA"
2694 element for any strings in the list.
2695
2696 The $optional_atts argument is the ref of a hash of attributes. If
2697 this argument is used then the previous attributes are deleted,
2698 otherwise they are left untouched.
2699
2700 WARNING: if you rely on ID's then you will have to set the id
2701 yourself. At this point the element does not belong to a twig yet,
2702 so the ID attribute is not known so it won't be stored in the ID
2703 list.
2704
A content of '"#EMPTY"' creates an empty element.
2706
2707 namespace ($optional_prefix)
2708 Return the URI of the namespace that $optional_prefix or the
2709 element name belongs to. If the name doesn't belong to any
2710 namespace, "undef" is returned.
2711
2712 local_name
2713 Return the local name (without the prefix) for the element
2714
2715 ns_prefix
2716 Return the namespace prefix for the element
2717
2718 current_ns_prefixes
2719 Return a list of namespace prefixes valid for the element. The
2720 order of the prefixes in the list has no meaning. If the default
2721 namespace is currently bound, '' appears in the list.
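
The namespace accessors can be tried on a small document. In this sketch the prefix "x" and the URI are placeholders, not anything the module defines:

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical namespace declaration on the root element.
my $twig = XML::Twig->new;
$twig->parse('<doc xmlns:x="http://example.com/x"><x:item/></doc>');
my $item = $twig->root->first_child;

my $prefix = $item->ns_prefix;    # prefix part of the tag
my $local  = $item->local_name;   # tag without the prefix
my $uri    = $item->namespace;    # URI bound to the element prefix

print "$prefix | $local | $uri\n";
```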
2722
2723 inherit_att ($att, @optional_tag_list)
2724 Return the value of an attribute inherited from parent tags. The
2725 value returned is found by looking for the attribute in the element
2726 then in turn in each of its ancestors. If the @optional_tag_list is
2727 supplied only those ancestors whose tag is in the list will be
2728 checked.
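
A minimal sketch of the lookup, assuming a made-up "lang" attribute set at two levels of the tree:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<doc lang="en"><sect lang="fr"><p>texte</p></sect></doc>');
my $p = $twig->root->first_child('sect')->first_child('p');

# The closest element carrying the attribute wins: <sect> here.
my $nearest = $p->inherit_att('lang');

# With a tag list only <doc> elements are checked.
my $from_doc = $p->inherit_att('lang', 'doc');

print "$nearest $from_doc\n";
```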
2729
2730 all_children_are ($optional_condition)
2731 return 1 if all children of the element pass the
2732 $optional_condition, 0 otherwise
2733
2734 level ($optional_condition)
2735 Return the depth of the element in the twig (root is 0). If
2736 $optional_condition is given then only ancestors that match the
2737 condition are counted.
2738
WARNING: in a tree created using the "twig_roots" option this will
not return the level in the document tree: level 0 will be the
document root and level 1 will be the "twig_roots" elements. During
the parsing (in a "twig_handler") you can use the "depth" method on
the twig object to get the real parsing depth.
2744
2745 in ($potential_parent)
2746 Return true if the element is in the potential_parent
2747 ($potential_parent is an element)
2748
2749 in_context ($cond, $optional_level)
2750 Return true if the element is included in an element which passes
2751 $cond optionally within $optional_level levels. The returned value
2752 is the including element.
2753
2754 pcdata
2755 Return the text of a "PCDATA" element or "undef" if the element is
2756 not "PCDATA".
2757
2758 pcdata_xml_string
Return the text of a "PCDATA" element or undef if the element is
not "PCDATA". The text is "XML-escaped" ('&' and '<' are replaced
by '&amp;' and '&lt;')
2762
2763 set_pcdata ($text)
2764 Set the text of a "PCDATA" element. This method does not check that
2765 the element is indeed a "PCDATA" so usually you should use
2766 "set_text" instead.
2767
2768 append_pcdata ($text)
2769 Add the text at the end of a "PCDATA" element.
2770
2771 is_cdata
2772 Return 1 if the element is a "CDATA" element, returns 0 otherwise.
2773
2774 is_text
2775 Return 1 if the element is a "CDATA" or "PCDATA" element, returns 0
2776 otherwise.
2777
2778 cdata
2779 Return the text of a "CDATA" element or "undef" if the element is
2780 not "CDATA".
2781
2782 cdata_string
2783 Return the XML string of a "CDATA" element, including the opening
2784 and closing markers.
2785
2786 set_cdata ($text)
2787 Set the text of a "CDATA" element.
2788
2789 append_cdata ($text)
2790 Add the text at the end of a "CDATA" element.
2791
2792 remove_cdata
2793 Turns all "CDATA" sections in the element into regular "PCDATA"
2794 elements. This is useful when converting XML to HTML, as browsers
2795 do not support CDATA sections.
2796
2797 extra_data
2798 Return the extra_data (comments and PI's) attached to an element
2799
2800 set_extra_data ($extra_data)
2801 Set the extra_data (comments and PI's) attached to an element
2802
2803 append_extra_data ($extra_data)
2804 Append extra_data to the existing extra_data before the element (if
2805 no previous extra_data exists then it is created)
2806
2807 set_asis
2808 Set a property of the element that causes it to be output without
being XML escaped by the print functions: if it contains "a < b" it
will be output as such and not as "a &lt; b". This can be useful to
2811 create text elements that will be output as markup. Note that all
2812 "PCDATA" descendants of the element are also marked as having the
2813 property (they are the ones that are actually impacted by the
2814 change).
2815
2816 If the element is a "CDATA" element it will also be output asis,
2817 without the "CDATA" markers. The same goes for any "CDATA"
2818 descendant of the element
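
The effect of the property on output can be sketched with a stand-alone element (created here with "XML::Twig::Elt->new", outside of any twig):

```perl
use strict;
use warnings;
use XML::Twig;

# An element holding text that would normally be escaped on output.
my $elt = XML::Twig::Elt->new(p => 'a < b');

my $escaped = $elt->sprint;    # regular output, '<' is escaped

$elt->set_asis;                # mark the element and its text descendants
my $raw = $elt->sprint;        # the text now goes out untouched

print "$escaped\n$raw\n";
```

The second string is not well-formed XML on its own, which is the point of the property: the caller takes responsibility for the markup.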
2819
2820 set_not_asis
2821 Unsets the "asis" property for the element and its text
2822 descendants.
2823
2824 is_asis
2825 Return the "asis" property status of the element ( 1 or "undef")
2826
2827 closed
2828 Return true if the element has been closed. Might be useful if you
2829 are somewhere in the tree, during the parse, and have no idea
2830 whether a parent element is completely loaded or not.
2831
2832 get_type
2833 Return the type of the element: '"#ELT"' for "real" elements, or
2834 '"#PCDATA"', '"#CDATA"', '"#COMMENT"', '"#ENT"', '"#PI"'
2835
2836 is_elt
2837 Return the tag if the element is a "real" element, or 0 if it is
2838 "PCDATA", "CDATA"...
2839
2840 contains_only_text
2841 Return 1 if the element does not contain any other "real" element
2842
2843 contains_only ($exp)
2844 Return the list of children if all children of the element match
2845 the expression $exp
2846
2847 if( $para->contains_only( 'tt')) { ... }
2848
2849 contains_a_single ($exp)
2850 If the element contains a single child that matches the expression
2851 $exp returns that element. Otherwise returns 0.
2852
2853 is_field
2854 same as "contains_only_text"
2855
2856 is_pcdata
2857 Return 1 if the element is a "PCDATA" element, returns 0 otherwise.
2858
2859 is_ent
2860 Return 1 if the element is an entity (an unexpanded entity)
2861 element, return 0 otherwise.
2862
2863 is_empty
2864 Return 1 if the element is empty, 0 otherwise
2865
2866 set_empty
Flags the element as empty. No further check is made, so if the
element is actually not empty the output will be messed up. The only
effect of this method is that the output will be "<tag
att="value"/>".
2871
2872 set_not_empty
Flags the element as not empty. If it is actually empty then the
element will be output as "<tag att="value"></tag>"
2875
2876 is_pi
2877 Return 1 if the element is a processing instruction ("#PI")
2878 element, return 0 otherwise.
2879
2880 target
2881 Return the target of a processing instruction
2882
2883 set_target ($target)
2884 Set the target of a processing instruction
2885
2886 data
2887 Return the data part of a processing instruction
2888
2889 set_data ($data)
2890 Set the data of a processing instruction
2891
2892 set_pi ($target, $data)
2893 Set the target and data of a processing instruction
2894
2895 pi_string
2896 Return the string form of a processing instruction ("<?target
2897 data?>")
2898
2899 is_comment
2900 Return 1 if the element is a comment ("#COMMENT") element, return 0
2901 otherwise.
2902
2903 set_comment ($comment_text)
2904 Set the text for a comment
2905
2906 comment
2907 Return the content of a comment (just the text, not the "<!--" and
2908 "-->")
2909
2910 comment_string
2911 Return the XML string for a comment ("<!-- comment -->")
2912
2913 set_ent ($entity)
Set a (non-expanded) entity ("#ENT"). $entity is the entity text
("&ent;")
2916
2917 ent Return the entity for an entity ("#ENT") element ("&ent;")
2918
2919 ent_name
2920 Return the entity name for an entity ("#ENT") element ("ent")
2921
2922 ent_string
2923 Return the entity, either expanded if the expanded version is
2924 available, or non-expanded ("&ent;") otherwise
2925
2926 child ($offset, $optional_condition)
2927 Return the $offset-th child of the element, optionally the
2928 $offset-th child that matches $optional_condition. The children are
2929 treated as a list, so "$elt->child( 0)" is the first child, while
2930 "$elt->child( -1)" is the last child.
2931
2932 child_text ($offset, $optional_condition)
Return the text of a child or "undef" if the child does not
exist. Arguments are the same as "child".
2935
2936 last_child ($optional_condition)
2937 Return the last child of the element, or the last child matching
2938 $optional_condition (ie the last of the element children matching
2939 the condition).
2940
2941 last_child_text ($optional_condition)
2942 Same as "first_child_text" but for the last child.
2943
2944 sibling ($offset, $optional_condition)
2945 Return the next or previous $offset-th sibling of the element, or
2946 the $offset-th one matching $optional_condition. If $offset is
2947 negative then a previous sibling is returned, if $offset is
positive then a next sibling is returned. "$offset=0" returns the
element if there is no condition or if the element matches the
condition, "undef" otherwise.
2951
2952 sibling_text ($offset, $optional_condition)
2953 Return the text of a sibling or "undef" if the sibling does not
2954 exist. Arguments are the same as "sibling".
2955
2956 prev_siblings ($optional_condition)
2957 Return the list of previous siblings (optionally matching
2958 $optional_condition) for the element. The elements are ordered in
2959 document order.
2960
2961 next_siblings ($optional_condition)
2962 Return the list of siblings (optionally matching
2963 $optional_condition) following the element. The elements are
2964 ordered in document order.
2965
2966 pos ($optional_condition)
2967 Return the position of the element in the children list. The first
2968 child has a position of 1 (as in XPath).
2969
2970 If the $optional_condition is given then only siblings that match
2971 the condition are counted. If the element itself does not match the
2972 condition then 0 is returned.
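
The offset conventions differ between these methods ("child" counts from 0, "pos" from 1), which the following sketch on a made-up document tries to make concrete:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<doc><a>1</a><b>2</b><a>3</a><c>4</c></doc>');
my $doc = $twig->root;

my $first    = $doc->child(0);         # first child
my $last     = $doc->child(-1);        # last child
my $second_a = $doc->child(1, 'a');    # second child matching 'a'

my $pos_all  = $second_a->pos;         # position among all children
my $pos_a    = $second_a->pos('a');    # position among 'a' children only

my $next     = $first->sibling(1);     # next sibling
my $none     = $first->sibling(-1);    # no previous sibling
my $text     = $first->sibling_text(2);

print join(' ', $first->tag, $last->tag, $second_a->text,
                $pos_all, $pos_a, $next->tag, $text), "\n";
```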
2973
2974 atts
2975 Return a hash ref containing the element attributes
2976
2977 set_atts ({ att1=>$att1_val, att2=> $att2_val... })
2978 Set the element attributes with the hash ref supplied as the
2979 argument. The previous attributes are lost (ie the attributes set
2980 by "set_atts" replace all of the attributes of the element).
2981
2982 You can also pass a list instead of a hashref: "$elt->set_atts(
2983 att1 => 'val1',...)"
2984
2985 del_atts
2986 Deletes all the element attributes.
2987
2988 att_nb
2989 Return the number of attributes for the element
2990
2991 has_atts
Return true if the element has attributes (in fact return the
number of attributes, thus being an alias to "att_nb")
2994
2995 has_no_atts
2996 Return true if the element has no attributes, false (0) otherwise
2997
2998 att_names
2999 return a list of the attribute names for the element
3000
3001 att_xml_string ($att, $options)
3002 Return the attribute value, where '&', '<' and quote (" or the
3003 value of the quote option at twig creation) are XML-escaped.
3004
The options are passed as a hashref. Setting "escape_gt" to a true
value will also escape '>' ($elt->att_xml_string( 'myatt', { escape_gt => 1 })).
3007
3008 set_id ($id)
Set the "id" attribute of the element to the value. See "elt_id"
to change the id attribute name
3011
3012 id Gets the id attribute value
3013
3014 del_id ($id)
3015 Deletes the "id" attribute of the element and remove it from the id
3016 list for the document
3017
3018 class
3019 Return the "class" attribute for the element (methods on the
3020 "class" attribute are quite convenient when dealing with XHTML, or
3021 plain XML that will eventually be displayed using CSS)
3022
3023 set_class ($class)
3024 Set the "class" attribute for the element to $class
3025
3026 add_to_class ($class)
3027 Add $class to the element "class" attribute: the new class is added
3028 only if it is not already present. Note that classes are sorted
3029 alphabetically, so the "class" attribute can be changed even if the
3030 class is already there
3031
3032 att_to_class ($att)
3033 Set the "class" attribute to the value of attribute $att
3034
3035 add_att_to_class ($att)
3036 Add the value of attribute $att to the "class" attribute of the
3037 element
3038
3039 move_att_to_class ($att)
3040 Add the value of attribute $att to the "class" attribute of the
3041 element and delete the attribute
3042
3043 tag_to_class
3044 Set the "class" attribute of the element to the element tag
3045
3046 add_tag_to_class
3047 Add the element tag to its "class" attribute
3048
3049 set_tag_class ($new_tag)
3050 Add the element tag to its "class" attribute and sets the tag to
3051 $new_tag
3052
3053 in_class ($class)
3054 Return true (1) if the element is in the class $class (if $class is
3055 one of the tokens in the element "class" attribute)
3056
3057 tag_to_span
Change the element tag to "span" and set its class to the old tag

tag_to_div
Change the element tag to "div" and set its class to the old tag
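
A few of the class methods in action, as a sketch on an invented document (the tags and class names are illustrative):

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<doc><note class="warning">careful</note><para>body</para></doc>');
my $doc  = $twig->root;
my $note = $doc->first_child('note');
my $para = $doc->first_child('para');

# Classes are kept sorted alphabetically, so the new one ends up first.
$note->add_to_class('important');
my $class      = $note->class;
my $is_warning = $note->in_class('warning');

# The old tag becomes the class, the element becomes a div.
$para->tag_to_div;
my $as_div = $para->sprint;

print "$class\n$as_div\n";
```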
3062
3063 DESTROY
3064 Frees the element from memory.
3065
3066 start_tag
3067 Return the string for the start tag for the element, including the
3068 "/>" at the end of an empty element tag
3069
3070 end_tag
3071 Return the string for the end tag of an element. For an empty
3072 element, this returns the empty string ('').
3073
3074 xml_string @optional_options
3075 Equivalent to "$elt->sprint( 1)", returns the string for the entire
3076 element, excluding the element's tags (but nested element tags are
3077 present)
3078
3079 The '"no_recurse"' option will only return the text of the element,
3080 not of any included sub-elements (same as "xml_text_only").
3081
3082 inner_xml
3083 Another synonym for xml_string
3084
3085 outer_xml
Another synonym for sprint
3087
3088 xml_text
Return the text of the element, encoded (and processed by the
current "output_filter" or "output_encoding" options), without any
tag.
3092
3093 xml_text_only
3094 Same as "xml_text" except that the text returned doesn't include
3095 the text of sub-elements.
3096
3097 set_pretty_print ($style)
3098 Set the pretty print method, amongst '"none"' (default),
3099 '"nsgmls"', '"nice"', '"indented"', '"record"' and '"record_c"'
3100
3101 pretty_print styles:
3102
3103 none
3104 the default, no "\n" is used
3105
3106 nsgmls
3107 nsgmls style, with "\n" added within tags
3108
3109 nice
3110 adds "\n" wherever possible (NOT SAFE, can lead to invalid XML)
3111
3112 indented
3113 same as "nice" plus indents elements (NOT SAFE, can lead to
3114 invalid XML)
3115
3116 record
3117 table-oriented pretty print, one field per line
3118
3119 record_c
3120 table-oriented pretty print, more compact than "record", one
3121 record per line
3122
3123 set_empty_tag_style ($style)
Set the method to output empty tags, amongst '"normal"' (default),
'"html"', and '"expand"'.
3126
3127 "normal" outputs an empty tag '"<tag/>"', "html" adds a space
3128 '"<tag />"' for elements that can be empty in XHTML and "expand"
3129 outputs '"<tag></tag>"'
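
A small sketch of the "expand" style. It assumes, as the "Module scoped values" notes suggest, that the setting is module-wide rather than per element, so the example restores the default at the end:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<doc><br/></doc>');
my $doc = $twig->root;

my $normal = $doc->sprint;              # default style

$doc->set_empty_tag_style('expand');
my $expanded = $doc->sprint;            # empty element written as a tag pair

$doc->set_empty_tag_style('normal');    # restore the default for later output

print "$normal\n$expanded\n";
```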
3130
3131 set_remove_cdata ($flag)
3132 set (or unset) the flag that forces the twig to output CDATA
3133 sections as regular (escaped) PCDATA
3134
3135 set_indent ($string)
3136 Set the indentation for the indented pretty print style (default is
3137 2 spaces)
3138
3139 set_quote ($quote)
3140 Set the quotes used for attributes. can be '"double"' (default) or
3141 '"single"'
3142
3143 cmp ($elt)
3144 Compare the order of the 2 elements in a twig.
3145
$a is the <A>...</A> element, $b is the <B>...</B> element
3147
3148 document $a->cmp( $b)
3149 <A> ... </A> ... <B> ... </B> -1
3150 <A> ... <B> ... </B> ... </A> -1
3151 <B> ... </B> ... <A> ... </A> 1
3152 <B> ... <A> ... </A> ... </B> 1
3153 $a == $b 0
3154 $a and $b not in the same tree undef
3155
before ($elt)
Return 1 if the element starts before $elt, 0 otherwise. If the 2
elements are not in the same twig then return "undef".

if( $a->cmp( $b) == -1) { return 1; } else { return 0; }

after ($elt)
Return 1 if the element starts after $elt, 0 otherwise. If the 2
elements are not in the same twig then return "undef".

if( $a->cmp( $b) == 1) { return 1; } else { return 0; }
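
The comparison table for "cmp" can be checked with a short sketch; the element names are made up and only the rows for elements of the same tree are exercised:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<doc><first><inner/></first><second/></doc>');
my $first  = $twig->root->first_child('first');
my $inner  = $first->first_child('inner');
my $second = $twig->root->first_child('second');

my $siblings  = $first->cmp($second);   # <first> ends before <second> starts
my $container = $first->cmp($inner);    # <first> contains <inner>
my $reversed  = $second->cmp($first);
my $same      = $first->cmp($first);

my $is_before = $first->before($second);
my $is_after  = $second->after($first);

print "$siblings $container $reversed $same $is_before $is_after\n";
```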
3167
3168 other comparison methods
3169 lt
3170 le
3171 gt
3172 ge
3173 path
3174 Return the element context in a form similar to XPath's short form:
3175 '"/root/tag1/../tag"'
3176
3177 xpath
3178 Return a unique XPath expression that can be used to find the
3179 element again.
3180
3181 It looks like "/doc/sect[3]/title": unique elements do not have an
3182 index, the others do.
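
To illustrate the difference between "path" and "xpath", here is a sketch on a made-up document with two sections; it also feeds the generated expression back to "get_xpath":

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<doc><sect><title>one</title></sect><sect><title>two</title></sect></doc>');
my $title = $twig->root->last_child('sect')->first_child('title');

my $path  = $title->path;     # context only, no indexes
my $xpath = $title->xpath;    # indexed where a step is ambiguous

# The unique expression leads back to the same element.
my $found = $twig->get_xpath($xpath, 0);

print "$path\n$xpath\n", $found->text, "\n";
```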
3183
3184 private methods
3185 Low-level methods on the twig:
3186
3187 set_parent ($parent)
3188 set_first_child ($first_child)
3189 set_last_child ($last_child)
3190 set_prev_sibling ($prev_sibling)
3191 set_next_sibling ($next_sibling)
3192 set_twig_current
3193 del_twig_current
3194 twig_current
3195 flush
3196 This method should NOT be used, always flush the twig, not an
3197 element.
3198
3199 contains_text
3200
3201 Those methods should not be used, unless of course you find some
3202 creative and interesting, not to mention useful, ways to do it.
3203
3204 cond
Most of the navigation functions accept a condition as an optional
argument. The first element (or all elements for "children" or
"ancestors") that passes the condition is returned.

The condition is a single step of an XPath expression using the XPath
subset defined by "get_xpath". Additional conditions are:
3213
3214 #ELT
3215 return a "real" element (not a PCDATA, CDATA, comment or pi
3216 element)
3217
3218 #TEXT
3219 return a PCDATA or CDATA element
3220
3221 regular expression
3222 return an element whose tag matches the regexp. The regexp has to
3223 be created with "qr//" (hence this is available only on perl 5.005
3224 and above)
3225
3226 code reference
3227 applies the code, passing the current element as argument, if the
3228 code returns true then the element is returned, if it returns false
3229 then the code is applied to the next candidate.
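
The kinds of condition listed above can be sketched side by side. The document is invented, and "first_child" and "children" merely stand in for any navigation method that takes a condition:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<doc>intro<h1>Title</h1><p class="a">one</p><p class="b">two</p></doc>');
my $doc = $twig->root;

my $text    = $doc->first_child('#TEXT');        # first PCDATA or CDATA child
my $elt     = $doc->first_child('#ELT');         # first "real" element
my $heading = $doc->first_child(qr/^h\d$/);      # tag matched by a regexp
my @class_b = $doc->children('p[@class="b"]');   # XPath-like step

# Code reference: called with each candidate, keeps those returning true.
my @long    = $doc->children(sub { length($_[0]->text) > 3 });

print join(' ', $text->text, $elt->tag, $heading->tag,
                scalar(@class_b), scalar(@long)), "\n";
```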
3230
3231 XML::Twig::XPath
3232 XML::Twig implements a subset of XPath through the "get_xpath" method.
3233
3234 If you want to use the whole XPath power, then you can use
3235 "XML::Twig::XPath" instead. In this case "XML::Twig" uses "XML::XPath"
3236 to execute XPath queries. You will of course need "XML::XPath"
3237 installed to be able to use "XML::Twig::XPath".
3238
3239 See XML::XPath for more information.
3240
3241 The methods you can use are:
3242
3243 findnodes ($path)
3244 return a list of nodes found by $path.
3245
3246 findnodes_as_string ($path)
3247 return the nodes found reproduced as XML. The result is not
3248 guaranteed to be valid XML though.
3249
3250 findvalue ($path)
3251 return the concatenation of the text content of the result nodes
3252
3253 In order for "XML::XPath" to be used as the XPath engine the following
3254 methods are included in "XML::Twig":
3255
3256 in XML::Twig
3257
3258 getRootNode
3259 getParentNode
3260 getChildNodes
3261
3262 in XML::Twig::Elt
3263
3264 string_value
3265 toString
3266 getName
3267 getRootNode
3268 getNextSibling
3269 getPreviousSibling
3270 isElementNode
3271 isTextNode
3272 isPI
3273 isPINode
3274 isProcessingInstructionNode
3275 isComment
3276 isCommentNode
3277 getTarget
3278 getChildNodes
3279 getElementById
3280
3281 XML::Twig::XPath::Elt
3282 The methods you can use are the same as on "XML::Twig::XPath" elements:
3283
3284 findnodes ($path)
3285 return a list of nodes found by $path.
3286
3287 findnodes_as_string ($path)
3288 return the nodes found reproduced as XML. The result is not
3289 guaranteed to be valid XML though.
3290
3291 findvalue ($path)
3292 return the concatenation of the text content of the result nodes
3293
3294 XML::Twig::Entity_list
3295 new Create an entity list.
3296
3297 add ($ent)
3298 Add an entity to an entity list.
3299
3300 add_new_ent ($name, $val, $sysid, $pubid, $ndata, $param)
3301 Create a new entity and add it to the entity list
3302
delete ($ent or $tag)
3304 Delete an entity (defined by its name or by the Entity object) from
3305 the list.
3306
3307 print ($optional_filehandle)
3308 Print the entity list.
3309
3310 list
3311 Return the list as an array
3312
3313 XML::Twig::Entity
3314 new ($name, $val, $sysid, $pubid, $ndata, $param)
3315 Same arguments as the Entity handler for XML::Parser.
3316
3317 print ($optional_filehandle)
3318 Print an entity declaration.
3319
3320 name
3321 Return the name of the entity
3322
3323 val Return the value of the entity
3324
3325 sysid
3326 Return the system id for the entity (for NDATA entities)
3327
3328 pubid
3329 Return the public id for the entity (for NDATA entities)
3330
3331 ndata
3332 Return true if the entity is an NDATA entity
3333
3334 param
3335 Return true if the entity is a parameter entity
3336
3337 text
3338 Return the entity declaration text.
3339
3341 Additional examples (and a complete tutorial) can be found on the
XML::Twig Page <http://www.xmltwig.com/xmltwig/>

To figure out what flush does, call the following script with an XML
file and an element name as arguments:
3346
3347 use XML::Twig;
3348
3349 my ($file, $elt)= @ARGV;
3350 my $t= XML::Twig->new( twig_handlers =>
3351 { $elt => sub {$_[0]->flush; print "\n[flushed here]\n";} });
3352 $t->parsefile( $file, ErrorContext => 2);
3353 $t->flush;
3354 print "\n";
3355
3357 Subclassing XML::Twig
3358 Useful methods:
3359
3360 elt_class
In order to subclass "XML::Twig" you will probably also need to
subclass "XML::Twig::Elt". Use the "elt_class" option when you create
the "XML::Twig" object to get the elements created in a different
class (which should be a subclass of "XML::Twig::Elt").
3365
3366 add_options
If you inherit the "XML::Twig" new method but want to add more
options to it, you can use this method to prevent XML::Twig from
issuing warnings for those additional options.
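
A minimal sketch of the "elt_class" option. The "My::Elt" package and its "shout" method are hypothetical names invented for the example:

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical element subclass adding one convenience method.
package My::Elt;
our @ISA = ('XML::Twig::Elt');
sub shout { my $self = shift; return uc $self->text }

package main;

# Ask the twig to create its elements in the subclass.
my $twig = XML::Twig->new(elt_class => 'My::Elt');
$twig->parse('<doc><p>hello</p></doc>');

my $p     = $twig->root->first_child('p');
my $class = ref $p;
my $loud  = $p->shout;

print "$class $loud\n";
```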
3370
3371 DTD Handling
3372 There are 3 possibilities here. They are:
3373
3374 No DTD
3375 No doctype, no DTD information, no entity information, the world is
3376 simple...
3377
3378 Internal DTD
3379 The XML document includes an internal DTD, and maybe entity
3380 declarations.
3381
3382 If you use the load_DTD option when creating the twig the DTD
3383 information and the entity declarations can be accessed.
3384
The DTD and the entity declarations will be "flush"'ed (or
"print"'ed) either as is (if they have not been modified) or as
reconstructed (poorly: comments are lost, order is not kept, and due
to its content this DTD should not be viewed by anyone) if they have
been modified. You can also modify them directly by changing the
"$twig->{twig_doctype}->{internal}" field (straight from
XML::Parser, see the "Doctype" handler doc).
3392
3393 External DTD
3394 The XML document includes a reference to an external DTD, and maybe
3395 entity declarations.
3396
If you use the "load_DTD" option when creating the twig the DTD
3398 information and the entity declarations can be accessed. The entity
3399 declarations will be "flush"'ed (or "print"'ed) either as is (if
3400 they have not been modified) or as reconstructed (badly, comments
3401 are lost, order is not kept).
3402
You can change the doctype through the "$twig->set_doctype" method
and print the dtd through the "$twig->dtd_text" or
"$twig->dtd_print" methods.
3407
3408 If you need to modify the entity list this is probably the easiest
3409 way to do it.
3410
3411 Flush
3412 If you set handlers and use "flush", do not forget to flush the twig
3413 one last time AFTER the parsing, or you might be missing the end of the
3414 document.
3415
Remember that element handlers are called when the element is CLOSED,
so if you have handlers for nested elements the inner handlers will be
called first. This makes it trickier than it might seem to, for
example, number nested clauses.
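
The calling order can be observed with a sketch like the following, where the "outer" and "inner" tags are made up and each handler simply records its name:

```perl
use strict;
use warnings;
use XML::Twig;

my @order;
my $twig = XML::Twig->new(
    twig_handlers => {
        outer => sub { push @order, 'outer' },
        inner => sub { push @order, 'inner' },
    },
);
$twig->parse('<doc><outer><inner/><inner/></outer></doc>');

# Handlers fire on the closing tag, so both inner elements come first.
print "@order\n";
```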
3420
3422 entity handling
3423 Due to XML::Parser behaviour, non-base entities in attribute values
3424 disappear: "att="val&ent;"" will be turned into "att => val",
3425 unless you use the "keep_encoding" argument to "XML::Twig->new"
3426
3427 DTD handling
The DTD handling methods are quite buggy. No one uses them and it
3429 seems very difficult to get them to work in all cases, including
3430 with several slightly incompatible versions of XML::Parser and of
3431 libexpat.
3432
3433 Basically you can read the DTD, output it back properly, and update
3434 entities, but not much more.
3435
So use XML::Twig with standalone documents, or with documents
referring to an external DTD, but don't expect it to properly parse
and even output back the DTD.
3439
3440 memory leak
If you use a lot of twigs you might find that you leak quite a lot
of memory (about 2 KB per twig). You can use the "dispose" method
to free that memory after you are done.
3444
3445 If you create elements the same thing might happen, use the
3446 "delete" method to get rid of them.
3447
3448 Alternatively installing the "Scalar::Util" (or "WeakRef") module
3449 on a version of Perl that supports it (>5.6.0) will get rid of the
3450 memory leaks automagically.
3451
3452 ID list
3453 The ID list is NOT updated when elements are cut or deleted.
3454
3455 change_gi
3456 This method will not function properly if you do:
3457
3458 $twig->change_gi( $old1, $new);
3459 $twig->change_gi( $old2, $new);
3460 $twig->change_gi( $new, $even_newer);
3461
3462 sanity check on XML::Parser method calls
3463 XML::Twig should really prevent calls to some XML::Parser methods,
3464 especially the "setHandlers" method.
3465
3466 pretty printing
3467 Pretty printing (at least using the '"indented"' style) is hard to
3468 get right! Only elements that belong to the document will be
3469 properly indented. Printing elements that do not belong to the twig
3470 makes it impossible for XML::Twig to figure out their depth, and
3471 thus their indentation level.
3472
3473 Also there is an unavoidable bug when using "flush" and pretty
3474 printing for elements with mixed content that start with an
3475 embedded element:
3476
3477 <elt><b>b</b>toto<b>bold</b></elt>
3478
3479 will be output as
3480
3481 <elt>
3482 <b>b</b>toto<b>bold</b></elt>
3483
3484 if you flush the twig when you find the "<b>" element.
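
The difference between elements that belong to the document and
elements that do not can be explored with a sketch like this one. The
document and the $loose element are made up for the illustration, and
no particular output layout is claimed, since it depends on the
version of the module:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new( pretty_print => 'indented' );
$twig->parse(
    '<doc><section><title>T</title><para>text</para></section></doc>');

# an element inside the twig: its depth is known
my $in_doc = $twig->root->first_child('section')->sprint;

# an element created on its own: it has no parent, so no known depth
my $loose      = XML::Twig::Elt->new( para => 'not attached to the twig' );
my $out_of_doc = $loose->sprint;

print $in_doc, "\n", $out_of_doc, "\n";
```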
3485
3487 These are the things that can mess up calling code, especially if
3488 threaded. They might also cause problems under mod_perl.
3489
3490 Exported constants
3491 Whether you want them or not you get them! These are subroutines to
3492 use as constants when creating or testing elements:
3493
3494 PCDATA return '#PCDATA'
3495 CDATA return '#CDATA'
3496 PI return '#PI', I had the choice between PROC and PI :--(
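
A minimal sketch of testing elements against these values. It uses
the literal string '#PCDATA' (the value listed above) rather than the
constant itself, so it does not depend on the constant being visible
in the calling package; the document is made up for the illustration:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<elt><b>b</b>toto<b>bold</b></elt>');

# text nodes are elements too: their tag is '#PCDATA'
my @tags = map { $_->tag } $twig->root->children;
my @text = map { $_->text }
           grep { $_->tag eq '#PCDATA' } $twig->root->children;

print "tags: @tags\n";
print "text: @text\n";
```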
3497
3498 Module scoped values: constants
3499 These should cause no trouble:
3500
3501 %base_ent= ( '>' => '&gt;',
3502 '<' => '&lt;',
3503 '&' => '&amp;',
3504 "'" => '&apos;',
3505 '"' => '&quot;',
3506 );
3507 CDATA_START = "<![CDATA[";
3508 CDATA_END = "]]>";
3509 PI_START = "<?";
3510 PI_END = "?>";
3511 COMMENT_START = "<!--";
3512 COMMENT_END = "-->";
3513
3514 pretty print styles
3515
3516 ( $NSGMLS, $NICE, $INDENTED, $INDENTED_C, $WRAPPED, $RECORD1, $RECORD2)= (1..7);
3517
3518 empty tag output style
3519
3520 ( $HTML, $EXPAND)= (1..2);
3521
3522 Module scoped values: might be changed
3523 Most of these deal with pretty printing, so the worst that can
3524 happen is probably that XML output does not look right, but is
3525 still valid and processed identically by XML processors.
3526
3527 $empty_tag_style can mess up HTML browsers though and changing $ID
3528 would most likely create problems.
3529
3530 $pretty=0; # pretty print style
3531 $quote='"'; # quote for attributes
3532 $INDENT= ' '; # indent for indented pretty print
3533 $empty_tag_style= 0; # how to display empty tags
3534 $ID # attribute used as an id ('id' by default)
3535
3536 Module scoped values: definitely changed
3537 These 2 variables are used to replace tags by an index, thus saving
3538 some space when creating a twig. If they really cause you too much
3539 trouble, let me know, it is probably possible to create either a
3540 switch or at least a version of XML::Twig that does not perform
3541 this optimization.
3542
3543 %gi2index; # tag => index
3544 @index2gi; # list of tags
3545
3546 If you need to manipulate all those values, you can use the following
3547 methods on the XML::Twig object:
3548
3549 global_state
3550 Return a hashref with all the global variables used by XML::Twig
3551
3552 The hash has the following fields: "pretty", "quote", "indent",
3553 "empty_tag_style", "keep_encoding", "expand_external_entities",
3554 "output_filter", "output_text_filter", "keep_atts_order"
3555
3556 set_global_state ($state)
3557 Set the global state, $state is a hashref
3558
3559 save_global_state
3560 Save the current global state
3561
3562 restore_global_state
3563 Restore the previously saved (using "save_global_state") state
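
A minimal sketch of how these methods might be combined, assuming you
want to change the pretty print style for one piece of output and then
put the module-wide settings back. The document is made up for the
illustration:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<doc><p>text</p></doc>');

# remember the current style, then save the whole global state
my $before = $twig->global_state->{pretty};
$twig->save_global_state;

# temporarily switch the pretty print style for this output
$twig->set_pretty_print('indented');
my $xml = $twig->sprint;

# put the globals back the way they were
$twig->restore_global_state;
my $after = $twig->global_state->{pretty};

print $xml;
```

This can help when several pieces of code in the same process (under
mod_perl for example) use XML::Twig with different output settings.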
3564
3566 SAX handlers
3567 Allowing XML::Twig to work on top of any SAX parser
3568
3569 multiple twigs are not well supported
3570 A number of twig features are just global at the moment. These
3571 include the ID list and the "tag pool" (if you use "change_gi" then
3572 you change the tag for ALL twigs).
3573
3574 A future version will try to support this while trying not to be too
3575 hard on performance (at least when a single twig is used!).
3576
3578 Michel Rodriguez <mirod@xmltwig.com>
3579
3581 This library is free software; you can redistribute it and/or modify it
3582 under the same terms as Perl itself.
3583
3584 Bug reports should be sent using: RT
3585 <http://rt.cpan.org/NoAuth/Bugs.html?Dist=XML-Twig>
3586
3587 Comments can be sent to mirod@xmltwig.com
3588
3589 The XML::Twig page is at <http://www.xmltwig.com/xmltwig/>. It includes
3590 the development version of the module, a slightly better version of the
3591 documentation, examples, and a tutorial, "Processing XML efficiently
3592 with Perl and XML::Twig":
3593 <http://www.xmltwig.com/xmltwig/tutorial/index.html>
3594
3596 Complete docs, including a tutorial, examples, an easier to use HTML
3597 version of the docs, a quick reference card and a FAQ are available at
3598 <http://www.xmltwig.com/xmltwig/>
3599
3600 git repository at <http://github.com/mirod/xmltwig>
3601
3602 XML::Parser, XML::Parser::Expat, XML::XPath, Encode, Text::Iconv,
3603 Scalar::Util
3604
3605 Alternative Modules
3606 XML::Twig is not the only XML processing module available on CPAN (far
3607 from it!).
3608
3609 The main alternative I would recommend is XML::LibXML.
3610
3611 Here is a quick comparison of the 2 modules:
3612
3613 XML::LibXML, actually "libxml2" on which it is based, sticks to the
3614 standards, and implements a good number of them in a rather strict way:
3615 XML, XPath, DOM, RelaxNG, I must be forgetting a couple (XInclude?). It
3616 is fast and rather frugal memory-wise.
3617
3618 XML::Twig is older: when I started writing it XML::Parser/expat was the
3619 only game in town. It implements XML and that's about it (plus a subset
3620 of XPath, and you can use XML::Twig::XPath if you have XML::XPathEngine
3621 installed for full support). It is slower and requires more memory for
3622 a full tree than XML::LibXML. On the plus side (yes, there is a plus
3623 side!) it lets you process a big document in chunks, and thus lets you
3624 tackle documents that couldn't be loaded in memory by XML::LibXML, and
3625 it offers a lot (and I mean a LOT!) of higher-level methods, for
3626 everything, from adding structure to "low-level" XML, to shortcuts for
3627 XHTML conversions and more. It also DWIMs quite a bit, getting comments
3628 and non-significant whitespace out of the way but preserving them in
3629 the output for example. As it does not stick to the DOM, it also
3630 usually leads to shorter code than in XML::LibXML.
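
As a rough sketch of what processing a document in chunks looks like,
assume a log made of "record" elements, each with a "status" child.
The tag names and the document are made up for the illustration; each
record is examined as a small tree and then discarded with "purge":

```perl
use strict;
use warnings;
use XML::Twig;

my $count = 0;
my $twig  = XML::Twig->new(
    twig_handlers => {
        record => sub {
            my ( $t, $record ) = @_;
            $count++ if $record->first_child_text('status') eq 'ok';
            $t->purge;    # free the records processed so far
        },
    },
);

# a real document would be parsed from a file with parsefile
$twig->parse(
    '<log>'
  . '<record><status>ok</status></record>'
  . '<record><status>failed</status></record>'
  . '<record><status>ok</status></record>'
  . '</log>'
);

print "ok records: $count\n";
```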
3631
3632 Beyond the pure features of the 2 modules, XML::LibXML seems to be
3633 preferred by "XML-purists", while XML::Twig seems to be more used by
3634 Perl Hackers who have to deal with XML. As you have noted, XML::Twig
3635 also comes with quite a lot of docs, but I am sure if you ask for help
3636 about XML::LibXML here or on Perlmonks you will get answers.
3637
3638 Note that it is actually quite hard for me to compare the 2 modules: on
3639 one hand I know XML::Twig inside-out and I can get it to do pretty much
3640 anything I need to (or I improve it ;--), while I have a very basic
3641 knowledge of XML::LibXML. So feature-wise, I'd rather use XML::Twig
3642 ;--). On the other hand, I am painfully aware of some of the
3643 deficiencies, potential bugs and plain ugly code that lurk in
3644 XML::Twig, even though you are unlikely to be affected by them (unless
3645 for example you need to change the DTD of a document programmatically),
3646 while I haven't looked much into XML::LibXML so it still looks shiny
3647 and clean to me.
3648
3649 That said, if you need to process a document that is too big to fit in
3650 memory and XML::Twig is too slow for you, my reluctant advice would be
3651 to use "bare" XML::Parser. It won't be as easy to use as XML::Twig:
3652 basically with XML::Twig you trade some speed (depending on what you do
3653 from a factor 3 to... none) for ease-of-use, but it will be easier IMHO
3654 than using SAX (albeit not standard), and at this point a LOT faster
3655 (see the last test in
3656 <http://www.xmltwig.com/article/simple_benchmark/>).
3657
3658
3659
3660perl v5.10.1 2010-08-22 Twig(3)