HTML::Element(3)      User Contributed Perl Documentation     HTML::Element(3)

NAME

       HTML::Element - Class for objects that represent HTML elements

VERSION

       This document describes version 5.07 of HTML::Element, released August
       31, 2017 as part of HTML-Tree.

SYNOPSIS

           use HTML::Element;
           $a = HTML::Element->new('a', href => 'http://www.perl.com/');
           $a->push_content("The Perl Homepage");

           $tag = $a->tag;
           print "$tag starts out as:",  $a->starttag, "\n";
           print "$tag ends as:",  $a->endtag, "\n";
           print "$tag\'s href attribute is: ", $a->attr('href'), "\n";

           $links_r = $a->extract_links();
           print "Hey, I found ", scalar(@$links_r), " links.\n";

           print "And that, as HTML, is: ", $a->as_HTML, "\n";
           $a = $a->delete;

DESCRIPTION

       (This class is part of the HTML::Tree dist.)

       Objects of the HTML::Element class can be used to represent elements of
       HTML document trees.  These objects have attributes, notably attributes
       that designate each element's parent and content.  The content is an
       array of text segments and other HTML::Element objects.  A tree with
       HTML::Element objects as nodes can represent the syntax tree for an
       HTML document.

HOW WE REPRESENT TREES

39       Consider this HTML document:
40
41         <html lang='en-US'>
42           <head>
43             <title>Stuff</title>
44             <meta name='author' content='Jojo'>
45           </head>
46           <body>
47            <h1>I like potatoes!</h1>
48           </body>
49         </html>
50
51       Building a syntax tree out of it makes a tree-structure in memory that
52       could be diagrammed as:
53
54                            html (lang='en-US')
55                             / \
56                           /     \
57                         /         \
58                       head        body
59                      /\               \
60                    /    \               \
61                  /        \               \
62                title     meta              h1
63                 |       (name='author',     |
64              "Stuff"    content='Jojo')    "I like potatoes"
65
66       This is the traditional way to diagram a tree, with the "root" at the
67       top, and it's this kind of diagram that people have in mind when they
68       say, for example, that "the meta element is under the head element
69       instead of under the body element".  (The same is also said with
70       "inside" instead of "under" -- the use of "inside" makes more sense
71       when you're looking at the HTML source.)
72
73       Another way to represent the above tree is with indenting:
74
75         html (attributes: lang='en-US')
76           head
77             title
78               "Stuff"
79             meta (attributes: name='author' content='Jojo')
80           body
81             h1
82               "I like potatoes"
83
84       Incidentally, diagramming with indenting works much better for very
85       large trees, and is easier for a program to generate.  The
86       "$tree->dump" method uses indentation just that way.
87
88       However you diagram the tree, it's stored the same in memory -- it's a
89       network of objects, each of which has attributes like so:
90
         element #1:  _tag: 'html'
                      _parent: none
                      _content: [element #2, element #5]
                      lang: 'en-US'

         element #2:  _tag: 'head'
                      _parent: element #1
                      _content: [element #3, element #4]

         element #3:  _tag: 'title'
                      _parent: element #2
                      _content: [text segment "Stuff"]

         element #4:  _tag: 'meta'
                      _parent: element #2
                      _content: none
                      name: 'author'
                      content: 'Jojo'

         element #5:  _tag: 'body'
                      _parent: element #1
                      _content: [element #6]

         element #6:  _tag: 'h1'
                      _parent: element #5
                      _content: [text segment "I like potatoes"]
117
118       The "treeness" of the tree-structure that these elements comprise is
119       not an aspect of any particular object, but is emergent from the
120       relatedness attributes (_parent and _content) of these element-objects
121       and from how you use them to get from element to element.
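
       As a minimal sketch, the network of objects above could be assembled
       by hand and then navigated as follows.  The variable names are
       illustrative only, and in practice HTML::TreeBuilder would normally
       build such a tree from the HTML source.

```perl
use strict;
use warnings;
use HTML::Element;

# One object per element in the diagram above.
my $html  = HTML::Element->new('html', lang => 'en-US');
my $head  = HTML::Element->new('head');
my $title = HTML::Element->new('title');
my $meta  = HTML::Element->new('meta', name => 'author', content => 'Jojo');
my $body  = HTML::Element->new('body');
my $h1    = HTML::Element->new('h1');

# push_content records the children on the parent and the parent on
# each child element.
$title->push_content('Stuff');
$h1->push_content('I like potatoes!');
$head->push_content($title, $meta);
$body->push_content($h1);
$html->push_content($head, $body);

# Get from element to element using only parent and content_list.
my @top_level  = map { $_->tag } $html->content_list;   # head, body
my $meta_owner = $meta->parent->tag;                    # head
my $is_root    = !$html->parent;                        # true for the root

print "children of html: @top_level\n";
print "meta is under: $meta_owner\n";
```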
122
       While you could access the content of a tree by writing code that says
       "access the 'src' attribute of the root's first child's seventh child's
       third child", you're more likely to have to scan the contents of a
       tree, looking for whatever nodes, or kinds of nodes, you want to do
       something with.  The most straightforward way to look over a tree is to
       "traverse" it.  An HTML::Element method ("$h->traverse") is provided for
       this purpose, and several other HTML::Element methods are based on it.
130
131       (For everything you ever wanted to know about trees, and then some, see
132       Niklaus Wirth's Algorithms + Data Structures = Programs or Donald
133       Knuth's The Art of Computer Programming, Volume 1.)
134
135   Weak References
136       TL;DR summary: "use HTML::TreeBuilder 5 -weak;" and forget about the
137       "delete" method (except for pruning a node from a tree).
138
139       Because HTML::Element stores a reference to the parent element, Perl's
140       reference-count garbage collection doesn't work properly with
141       HTML::Element trees.  Starting with version 5.00, HTML::Element uses
142       weak references (if available) to prevent that problem.  Weak
143       references were introduced in Perl 5.6.0, but you also need a version
144       of Scalar::Util that provides the "weaken" function.
145
146       Weak references are enabled by default.  If you want to be certain
147       they're in use, you can say "use HTML::Element 5 -weak;".  You must
148       include the version number; previous versions of HTML::Element ignored
149       the import list entirely.
150
151       To disable weak references, you can say "use HTML::Element -noweak;".
152       This is a global setting.  This feature is deprecated and is provided
153       only as a quick fix for broken code.  If your code does not work
154       properly with weak references, you should fix it immediately, as weak
155       references may become mandatory in a future version.  Generally, all
156       you need to do is keep a reference to the root of the tree until you're
157       done working with it.
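
       The following sketch illustrates why the root must be kept alive.
       It assumes weak references are available on your Perl; the variable
       names are made up for the example.

```perl
use strict;
use warnings;
use HTML::Element 5 -weak;   # insist on weak references

my $root  = HTML::Element->new('div');
my $child = HTML::Element->new('p');
$root->push_content($child);

# While $root is alive, the child can find its parent.
my $before = defined $child->parent ? 'has parent' : 'no parent';

# Dropping the only strong reference to the root frees it; the child's
# weak link to its parent then becomes undef.
undef $root;
my $after = defined $child->parent ? 'has parent' : 'no parent';

print "before: $before, after: $after\n";
```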
158
       Because HTML::TreeBuilder is a subclass of HTML::Element, you can also
       import "-weak" or "-noweak" from HTML::TreeBuilder: e.g.
       "use HTML::TreeBuilder 5 -weak;".

BASIC METHODS

164   new
165         $h = HTML::Element->new('tag', 'attrname' => 'value', ... );
166
167       This constructor method returns a new HTML::Element object.  The tag
168       name is a required argument; it will be forced to lowercase.
169       Optionally, you can specify other initial attributes at object creation
170       time.
171
172   attr
173         $value = $h->attr('attr');
174         $old_value = $h->attr('attr', $new_value);
175
176       Returns (optionally sets) the value of the given attribute of $h.  The
177       attribute name (but not the value, if provided) is forced to lowercase.
178       If trying to read the value of an attribute not present for this
179       element, the return value is undef.  If setting a new value, the old
180       value of that attribute is returned.
181
       If methods are provided for accessing an attribute (like "$h->tag" for
       "_tag", "$h->content_list", etc. below), use those instead of calling
       "$h->attr", whether for reading or setting.
185
186       Note that setting an attribute to "undef" (as opposed to "", the empty
187       string) actually deletes the attribute.
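
       A short sketch of reading, setting, and deleting an attribute with
       "attr", based on the behavior described above.  The URLs and variable
       names are only examples.

```perl
use strict;
use warnings;
use HTML::Element;

my $link = HTML::Element->new('a', href => 'http://www.perl.com/');

# Setting returns the previous value; the attribute name is lowercased.
my $old = $link->attr('HREF', 'http://www.perl.org/');
my $new = $link->attr('href');

# An empty string keeps the attribute...
$link->attr('title', '');
my %with_title = $link->all_external_attr;
my $has_title  = exists $with_title{title};

# ...whereas undef removes it.
$link->attr('title', undef);
my %without_title   = $link->all_external_attr;
my $still_has_title = exists $without_title{title};

print "old: $old\nnew: $new\n";
```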
188
189   tag
190         $tagname = $h->tag();
191         $h->tag('tagname');
192
193       Returns (optionally sets) the tag name (also known as the generic
194       identifier) for the element $h.  In setting, the tag name is always
195       converted to lower case.
196
197       There are four kinds of "pseudo-elements" that show up as HTML::Element
198       objects:
199
200       Comment pseudo-elements
201           These are element objects with a "$h->tag" value of "~comment", and
202           the content of the comment is stored in the "text" attribute
203           ("$h->attr("text")").  For example, parsing this code with
204           HTML::TreeBuilder...
205
206             <!-- I like Pie.
207                Pie is good
208             -->
209
210           produces an HTML::Element object with these attributes:
211
212             "_tag",
213             "~comment",
214             "text",
215             " I like Pie.\n     Pie is good\n  "
216
217       Declaration pseudo-elements
218           Declarations (rarely encountered) are represented as HTML::Element
219           objects with a tag name of "~declaration", and content in the
220           "text" attribute.  For example, this:
221
222             <!DOCTYPE foo>
223
224           produces an element whose attributes include:
225
226             "_tag", "~declaration", "text", "DOCTYPE foo"
227
228       Processing instruction pseudo-elements
229           PIs (rarely encountered) are represented as HTML::Element objects
230           with a tag name of "~pi", and content in the "text" attribute.  For
231           example, this:
232
233             <?stuff foo?>
234
235           produces an element whose attributes include:
236
237             "_tag", "~pi", "text", "stuff foo?"
238
239           (assuming a recent version of HTML::Parser)
240
241       ~literal pseudo-elements
242           These objects are not currently produced by HTML::TreeBuilder, but
243           can be used to represent a "super-literal" -- i.e., a literal you
244           want to be immune from escaping.  (Yes, I just made that term up.)
245
246           That is, this is useful if you want to insert code into a tree that
247           you plan to dump out with "as_HTML", where you want, for some
248           reason, to suppress "as_HTML"'s normal behavior of amp-quoting text
249           segments.
250
251           For example, this:
252
253             my $literal = HTML::Element->new('~literal',
254               'text' => 'x < 4 & y > 7'
255             );
256             my $span = HTML::Element->new('span');
257             $span->push_content($literal);
258             print $span->as_HTML;
259
260           prints this:
261
262             <span>x < 4 & y > 7</span>
263
264           Whereas this:
265
266             my $span = HTML::Element->new('span');
267             $span->push_content('x < 4 & y > 7');
268               # normal text segment
269             print $span->as_HTML;
270
271           prints this:
272
273             <span>x &lt; 4 &amp; y &gt; 7</span>
274
275           Unless you're inserting lots of pre-cooked code into existing
276           trees, and dumping them out again, it's not likely that you'll find
277           "~literal" pseudo-elements useful.
278
279   parent
280         $parent = $h->parent();
281         $h->parent($new_parent);
282
283       Returns (optionally sets) the parent (aka "container") for this
284       element.  The parent should either be undef, or should be another
285       element.
286
287       You should not use this to directly set the parent of an element.
288       Instead use any of the other methods under "Structure-Modifying
289       Methods", below.
290
291       Note that "not($h->parent)" is a simple test for whether $h is the root
292       of its subtree.
293
294   content_list
295         @content = $h->content_list();
296         $num_children = $h->content_list();
297
298       Returns a list of the child nodes of this element -- i.e., what nodes
299       (elements or text segments) are inside/under this element. (Note that
300       this may be an empty list.)
301
302       In a scalar context, this returns the count of the items, as you may
303       expect.
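
       For illustration, here is a sketch of "content_list" in list and in
       scalar context.  The element and its content are invented for the
       example.

```perl
use strict;
use warnings;
use HTML::Element;

my $p = HTML::Element->new('p');
$p->push_content('Hello, ', ['b', 'world'], '!');

my @children = $p->content_list;   # list context: the nodes themselves
my $count    = $p->content_list;   # scalar context: how many there are

# Text segments are plain strings; elements are objects.
my @kinds = map { ref $_ ? 'element <' . $_->tag . '>' : 'text' } @children;
print "$count children: @kinds\n";

# An element with no content yields an empty list, or 0 in scalar context.
my $empty = HTML::Element->new('br')->content_list;
```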
304
305   content
306         $content_array_ref = $h->content(); # may return undef
307
308       This somewhat deprecated method returns the content of this element;
309       but unlike content_list, this returns either undef (which you should
310       understand to mean no content), or a reference to the array of content
311       items, each of which is either a text segment (a string, i.e., a
312       defined non-reference scalar value), or an HTML::Element object.  Note
313       that even if an arrayref is returned, it may be a reference to an empty
314       array.
315
316       While older code should feel free to continue to use "$h->content", new
317       code should use "$h->content_list" in almost all conceivable cases.  It
318       is my experience that in most cases this leads to simpler code anyway,
319       since it means one can say:
320
321           @children = $h->content_list;
322
323       instead of the inelegant:
324
325           @children = @{$h->content || []};
326
327       If you do use "$h->content" (or "$h->content_array_ref"), you should
328       not use the reference returned by it (assuming it returned a reference,
329       and not undef) to directly set or change the content of an element or
330       text segment!  Instead use content_refs_list or any of the other
331       methods under "Structure-Modifying Methods", below.
332
333   content_array_ref
334         $content_array_ref = $h->content_array_ref(); # never undef
335
336       This is like "content" (with all its caveats and deprecations) except
337       that it is guaranteed to return an array reference.  That is, if the
338       given node has no "_content" attribute, the "content" method would
339       return that undef, but "content_array_ref" would set the given node's
340       "_content" value to "[]" (a reference to a new, empty array), and
341       return that.
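
       The difference between the two methods can be seen on a freshly
       created element, as in this small sketch (the "br" element is just an
       example of a node that has no content yet):

```perl
use strict;
use warnings;
use HTML::Element;

my $br = HTML::Element->new('br');     # freshly made, no content yet

my $before = $br->content;             # undef: there is no "_content" yet
my $ref    = $br->content_array_ref;   # creates and returns an empty array
my $after  = $br->content;             # now returns that same arrayref

print defined $before ? "had content\n" : "content was undef\n";
print "content_array_ref returned ", scalar(@$ref), " items\n";
```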
342
343   content_refs_list
344         @content_refs = $h->content_refs_list;
345
       This returns a list of scalar references to each element of $h's
       content list.  This is useful in case you want to in-place edit any
       large text segments without having to get a copy of the current value
       of that segment, modify that copy, then use "splice_content" to
       replace the old with the new.  Instead, here you can in-place edit:
351
352           foreach my $item_r ($h->content_refs_list) {
353               next if ref $$item_r;
354               $$item_r =~ s/honour/honor/g;
355           }
356
       You could currently achieve the same effect with:
358
359           foreach my $item (@{ $h->content_array_ref }) {
360               # deprecated!
361               next if ref $item;
362               $item =~ s/honour/honor/g;
363           }
364
365       ...except that using the return value of "$h->content" or
366       "$h->content_array_ref" to do that is deprecated, and just might stop
367       working in the future.
368
369   implicit
370         $is_implicit = $h->implicit();
371         $h->implicit($make_implicit);
372
373       Returns (optionally sets) the "_implicit" attribute.  This attribute is
374       a flag that's used for indicating that the element was not originally
375       present in the source, but was added to the parse tree (by
376       HTML::TreeBuilder, for example) in order to conform to the rules of
377       HTML structure.
378
379   pos
380         $pos = $h->pos();
381         $h->pos($element);
382
383       Returns (and optionally sets) the "_pos" (for "current position")
384       pointer of $h.  This attribute is a pointer used during some parsing
385       operations, whose value is whatever HTML::Element element at or under
386       $h is currently "open", where "$h->insert_element(NEW)" will actually
387       insert a new element.
388
389       (This has nothing to do with the Perl function called "pos", for
390       controlling where regular expression matching starts.)
391
392       If you set "$h->pos($element)", be sure that $element is either $h, or
393       an element under $h.
394
395       If you've been modifying the tree under $h and are no longer sure
396       "$h->pos" is valid, you can enforce validity with:
397
398           $h->pos(undef) unless $h->pos->is_inside($h);
399
400   all_attr
401         %attr = $h->all_attr();
402
403       Returns all this element's attributes and values, as key-value pairs.
404       This will include any "internal" attributes (i.e., ones not present in
405       the original element, and which will not be represented if/when you
406       call "$h->as_HTML").  Internal attributes are distinguished by the fact
407       that the first character of their key (not value! key!) is an
408       underscore ("_").
409
       Example output of "$h->all_attr()": '_parent', [object_value], '_tag',
       'em', 'lang', 'en-US', '_content', [array-ref value].
412
413   all_attr_names
414         @names = $h->all_attr_names();
415         $num_attrs = $h->all_attr_names();
416
417       Like "all_attr", but only returns the names of the attributes.  In
418       scalar context, returns the number of attributes.
419
       Example output of "$h->all_attr_names()": '_parent', '_tag', 'lang',
       '_content'.
422
423   all_external_attr
424         %attr = $h->all_external_attr();
425
426       Like "all_attr", except that internal attributes are not present.
427
428   all_external_attr_names
429         @names = $h->all_external_attr_names();
430         $num_attrs = $h->all_external_attr_names();
431
432       Like "all_attr_names", except that internal attributes' names are not
433       present (or counted).
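
       The four attribute-listing methods can be compared side by side, as
       in the sketch below.  The "em" element is an arbitrary example; note
       that the order of the returned names is not guaranteed, hence the
       sorting.

```perl
use strict;
use warnings;
use HTML::Element;

my $em = HTML::Element->new('em', lang => 'en-US');
$em->push_content('hi');

my %all      = $em->all_attr;                  # internal and external
my %external = $em->all_external_attr;         # external only
my @internal = sort grep { /^_/ } $em->all_attr_names;
my $n_ext    = $em->all_external_attr_names;   # scalar context: a count

print "internal: @internal\n";
print "external: ", join(', ', sort keys %external), " ($n_ext)\n";
```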
434
435   id
436         $id = $h->id();
437         $h->id($string);
438
439       Returns (optionally sets to $string) the "id" attribute.
440       "$h->id(undef)" deletes the "id" attribute.
441
442       "$h->id(...)" is basically equivalent to "$h->attr('id', ...)", except
443       that when setting the attribute, this method returns the new value, not
444       the old value.
445
446   idf
447         $id = $h->idf();
448         $h->idf($string);
449
450       Just like the "id" method, except that if you call "$h->idf()" and no
451       "id" attribute is defined for this element, then it's set to a likely-
452       to-be-unique value, and returned.  (The "f" is for "force".)
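
       A brief sketch of "id" and "idf" together.  The generated value is
       whatever the module chooses, so the example prints it rather than
       assuming its form.

```perl
use strict;
use warnings;
use HTML::Element;

my $div = HTML::Element->new('div');
my $set = $div->id('main');   # returns the NEW value, unlike attr
my $got = $div->idf;          # an id exists already, so it is returned as-is

my $p   = HTML::Element->new('p');
my $gen = $p->idf;            # no id yet: one is generated and stored

print "div id: $got\n";
print "generated id for p: $gen\n";
```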
453

STRUCTURE-MODIFYING METHODS

455       These methods are provided for modifying the content of trees by adding
456       or changing nodes as parents or children of other nodes.
457
458   push_content
459         $h->push_content($element_or_text, ...);
460
       Adds the specified items to the end of the content list of the element
       $h.  The items of content to be added should each be either a text
       segment (a string), an HTML::Element object, or an arrayref.  Arrayrefs
       are fed thru "$h->new_from_lol(that_arrayref)" to convert them into
       elements, before being added to the content list of $h.  This means you
       can say concise things like:
467
468         $body->push_content(
469           ['br'],
470           ['ul',
471             map ['li', $_], qw(Peaches Apples Pears Mangos)
472           ]
473         );
474
475       See the "new_from_lol" method's documentation, far below, for more
476       explanation.
477
478       Returns $h (the element itself).
479
480       The push_content method will try to consolidate adjacent text segments
481       while adding to the content list.  That's to say, if $h's
482       "content_list" is
483
484         ('foo bar ', $some_node, 'baz!')
485
486       and you call
487
488          $h->push_content('quack?');
489
490       then the resulting content list will be this:
491
492         ('foo bar ', $some_node, 'baz!quack?')
493
494       and not this:
495
496         ('foo bar ', $some_node, 'baz!', 'quack?')
497
498       If that latter is what you want, you'll have to override the feature of
499       consolidating text by using splice_content, as in:
500
501         $h->splice_content(scalar($h->content_list),0,'quack?');
502
       Similarly, if you want to add 'Skronk' to the beginning of the content
       list and you call this:
505
506          $h->unshift_content('Skronk');
507
508       then the resulting content list will be this:
509
510         ('Skronkfoo bar ', $some_node, 'baz!')
511
512       and not this:
513
514         ('Skronk', 'foo bar ', $some_node, 'baz!')
515
       What you'd do to get the latter is:
517
518         $h->splice_content(0,0,'Skronk');
519
520   unshift_content
521         $h->unshift_content($element_or_text, ...)
522
523       Just like "push_content", but adds to the beginning of the $h element's
524       content list.
525
526       The items of content to be added should each be either a text segment
527       (a string), an HTML::Element object, or an arrayref (which is fed thru
528       "new_from_lol").
529
530       The unshift_content method will try to consolidate adjacent text
531       segments while adding to the content list.  See above for a discussion
532       of this.
533
534       Returns $h (the element itself).
535
536   splice_content
537         @removed = $h->splice_content($offset, $length,
538                                       $element_or_text, ...);
539
       Detaches the elements from $h's list of content-nodes, starting at
       $offset and continuing for $length items, replacing them with the
       elements of the following list, if any.  Returns the elements (if any)
       removed from the content-list.  If $offset is negative, then it starts
       that far from the end of the array, just like Perl's normal "splice"
       function.  If $length and the following list are omitted, removes
       everything from $offset onward.
547
548       The items of content to be added (if any) should each be either a text
549       segment (a string), an arrayref (which is fed thru "new_from_lol"), or
550       an HTML::Element object that's not already a child of $h.
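
       As a sketch of the common cases, the following replaces one item and
       then removes the tail of a list.  The list items are invented for the
       example, and the tree is built with "new_from_lol" for brevity.

```perl
use strict;
use warnings;
use HTML::Element;

my $ul = HTML::Element->new_from_lol(
    ['ul', ['li', 'one'], ['li', 'two'], ['li', 'three']]
);

# Replace one item, starting at index 1, with a new <li>.
my @removed = $ul->splice_content(1, 1, ['li', 'TWO']);

# A negative offset counts from the end; with no length given,
# everything from there onward is removed.
my @tail = $ul->splice_content(-1);

print "removed: ", $removed[0]->as_text, "\n";
print "tail:    ", $tail[0]->as_text, "\n";
print "left:    ", $ul->as_text, "\n";
```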
551
552   detach
553         $old_parent = $h->detach();
554
       This unlinks $h from its parent, by setting its "_parent" attribute to
       undef, and by removing it from the content list of its parent (if it
       had one).  The return value is the parent that was detached from (or
       undef, if $h had no parent to start with).  Note that neither $h nor
       its parent is explicitly destroyed.
560
561   detach_content
562         @old_content = $h->detach_content();
563
564       This unlinks all of $h's children from $h, and returns them.  Note that
565       these are not explicitly destroyed; for that, you can just use
566       "$h->delete_content".
567
568   replace_with
569         $h->replace_with( $element_or_text, ... )
570
571       This replaces $h in its parent's content list with the nodes specified.
572       The element $h (which by then may have no parent) is returned.  This
573       causes a fatal error if $h has no parent.  The list of nodes to insert
574       may contain $h, but at most once.  Aside from that possible exception,
575       the nodes to insert should not already be children of $h's parent.
576
577       Also, note that this method does not destroy $h if weak references are
578       turned off -- use "$h->replace_with(...)->delete" if you need that.
579
580   preinsert
581         $h->preinsert($element_or_text...);
582
583       Inserts the given nodes right BEFORE $h in $h's parent's content list.
584       This causes a fatal error if $h has no parent.  None of the given nodes
585       should be $h or other children of $h.  Returns $h.
586
587   postinsert
588         $h->postinsert($element_or_text...)
589
590       Inserts the given nodes right AFTER $h in $h's parent's content list.
591       This causes a fatal error if $h has no parent.  None of the given nodes
592       should be $h or other children of $h.  Returns $h.
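
       A small sketch of "preinsert" and "postinsert", followed by "detach",
       on a hypothetical three-item list:

```perl
use strict;
use warnings;
use HTML::Element;

my $ul = HTML::Element->new_from_lol(['ul', ['li', 'b']]);
my ($li) = $ul->content_list;

# Add siblings on either side of $li in its parent's content list.
$li->preinsert(  HTML::Element->new_from_lol(['li', 'a']) );
$li->postinsert( HTML::Element->new_from_lol(['li', 'c']) );

my $order = join '', map { $_->as_text } $ul->content_list;   # "abc"

# detach unlinks $li again and hands back its former parent.
my $old_parent = $li->detach;

print "order: $order\n";
print "after detach: ", $ul->as_text, "\n";
```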
593
594   replace_with_content
595         $h->replace_with_content();
596
597       This replaces $h in its parent's content list with its own content.
598       The element $h (which by then has no parent or content of its own) is
599       returned.  This causes a fatal error if $h has no parent.  Also, note
600       that this does not destroy $h if weak references are turned off -- use
601       "$h->replace_with_content->delete" if you need that.
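
       For example, this sketch removes a "<b>" wrapper while keeping the
       text inside it.  The paragraph is made up for the illustration.

```perl
use strict;
use warnings;
use HTML::Element;

my $p = HTML::Element->new_from_lol(
    ['p', 'I like ', ['b', 'potatoes'], '!']
);
my $bold = ($p->content_list)[1];

$bold->replace_with_content;   # the <b> wrapper goes, its text stays

my $text     = $p->as_text;                     # "I like potatoes!"
my @elements = grep { ref } $p->content_list;   # no child elements left

print "$text\n";
```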
602
603   delete_content
604         $h->delete_content();
605         $h->destroy_content(); # alias
606
607       Clears the content of $h, calling "$h->delete" for each content
608       element.  Compare with "$h->detach_content".
609
610       Returns $h.
611
612       "destroy_content" is an alias for this method.
613
614   delete
615         $h->delete();
616         $h->destroy(); # alias
617
618       Detaches this element from its parent (if it has one) and explicitly
619       destroys the element and all its descendants.  The return value is the
620       empty list (or "undef" in scalar context).
621
       Before version 5.00 of HTML::Element, you had to call "delete" when you
       were finished with the tree, or your program would leak memory.  This
       is no longer necessary if weak references are enabled; see "Weak
       References".
626
627   destroy
628       An alias for "delete".
629
630   destroy_content
631       An alias for "delete_content".
632
633   clone
634         $copy = $h->clone();
635
636       Returns a copy of the element (whose children are clones (recursively)
637       of the original's children, if any).
638
       The returned element is parentless.  Any '_pos' attributes present in
       the source element/tree will be absent in the copy.  For that and other
       reasons, the clone of an HTML::TreeBuilder object that's in mid-parse
       (i.e., the head of a tree that HTML::TreeBuilder is elaborating) cannot
       (currently) be used to continue the parse.
644
645       You are free to clone HTML::TreeBuilder trees, just as long as: 1)
646       they're done being parsed, or 2) you don't expect to resume parsing
647       into the clone.  (You can continue parsing into the original; it is
648       never affected.)
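
       The independence of a clone from its original can be sketched as
       follows.  The "div" tree and its "class" values are hypothetical; the
       last statement shows "clone_list" called as a class method.

```perl
use strict;
use warnings;
use HTML::Element;

my $orig = HTML::Element->new_from_lol(
    ['div', { class => 'note' }, ['p', 'text']]
);
my $copy = $orig->clone;

# Changes to the copy leave the original alone.
$copy->attr('class', 'warning');
($copy->content_list)[0]->push_content(' and more');

# clone_list copies text segments and clones elements.
my @copies = HTML::Element->clone_list($orig, 'plain text');

print "original: ", $orig->attr('class'), " / ", $orig->as_text, "\n";
print "copy:     ", $copy->attr('class'), " / ", $copy->as_text, "\n";
```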
649
650   clone_list
651         @copies = HTML::Element->clone_list(...nodes...);
652
653       Returns a list consisting of a copy of each node given.  Text segments
654       are simply copied; elements are cloned by calling "$it->clone" on each
655       of them.
656
657       Note that this must be called as a class method, not as an instance
658       method.  "clone_list" will croak if called as an instance method.  You
659       can also call it like so:
660
661           ref($h)->clone_list(...nodes...)
662
663   normalize_content
664         $h->normalize_content
665
666       Normalizes the content of $h -- i.e., concatenates any adjacent text
667       nodes.  (Any undefined text segments are turned into empty-strings.)
668       Note that this does not recurse into $h's descendants.
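
       A sketch of when this matters: "splice_content" does not merge text,
       so it can leave adjacent text segments that "normalize_content" then
       joins.  The strings are arbitrary.

```perl
use strict;
use warnings;
use HTML::Element;

my $p = HTML::Element->new('p');

# Two adjacent text segments, inserted without consolidation.
$p->splice_content(0, 0, 'foo', 'bar');
my $before = $p->content_list;   # 2

$p->normalize_content;
my $after  = $p->content_list;   # 1
my ($text) = $p->content_list;   # "foobar"

print "$before segments before, $after after: $text\n";
```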
669
670   delete_ignorable_whitespace
671         $h->delete_ignorable_whitespace()
672
673       This traverses under $h and deletes any text segments that are
674       ignorable whitespace.  You should not use this if $h is under a "<pre>"
675       element.
676
677   insert_element
678         $h->insert_element($element, $implicit);
679
680       Inserts (via push_content) a new element under the element at
681       "$h->pos()".  Then updates "$h->pos()" to point to the inserted
682       element, unless $element is a prototypically empty element like "<br>",
683       "<hr>", "<img>", etc.  The new "$h->pos()" is returned.  This method is
684       useful only if your particular tree task involves setting "$h->pos()".
685

DUMPING METHODS

687   dump
688         $h->dump()
689         $h->dump(*FH)  ; # or *FH{IO} or $fh_obj
690
691       Prints the element and all its children to STDOUT (or to a specified
692       filehandle), in a format useful only for debugging.  The structure of
693       the document is shown by indentation (no end tags).
694
695   as_HTML
696         $s = $h->as_HTML();
697         $s = $h->as_HTML($entities);
698         $s = $h->as_HTML($entities, $indent_char);
699         $s = $h->as_HTML($entities, $indent_char, \%optional_end_tags);
700
701       Returns a string representing in HTML the element and its descendants.
702       The optional argument $entities specifies a string of the entities to
703       encode.  For compatibility with previous versions, specify '<>&' here.
704       If omitted or undef, all unsafe characters are encoded as HTML
705       entities.  See HTML::Entities for details.  If passed an empty string,
706       no entities are encoded.
707
       If $indent_char is specified and defined, the HTML to be output is
       indented, using the string you specify (which you probably should set
       to "\t", or some number of spaces, if you specify it).
711
712       If "\%optional_end_tags" is specified and defined, it should be a
713       reference to a hash that holds a true value for every tag name whose
714       end tag is optional.  Defaults to "\%HTML::Element::optionalEndTag",
715       which is an alias to %HTML::Tagset::optionalEndTag, which, at time of
716       writing, contains true values for "p, li, dt, dd".  A useful value to
717       pass is an empty hashref, "{}", which means that no end-tags are
718       optional for this dump.  Otherwise, possibly consider copying
719       %HTML::Tagset::optionalEndTag to a hash of your own, adding or deleting
720       values as you like, and passing a reference to that hash.
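
       The effect of the arguments can be sketched like this.  The paragraph
       text is only an example, and the output is printed rather than
       assumed, since details such as trailing newlines may vary.

```perl
use strict;
use warnings;
use HTML::Element;

my $p = HTML::Element->new_from_lol(['p', 'x < 4 & y > 7']);

# Default: all unsafe characters are encoded as entities.
my $default = $p->as_HTML;

# Only <, > and & encoded; an empty hashref means no end tag is optional,
# so the closing </p> is always written.
my $closed = $p->as_HTML('<>&', undef, {});

# An empty string for $entities: nothing is encoded.
my $raw = $p->as_HTML('', undef, {});

print $default, "\n", $closed, "\n", $raw, "\n";
```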
721
722   as_text
723         $s = $h->as_text();
724         $s = $h->as_text(skip_dels => 1);
725
726       Returns a string consisting of only the text parts of the element's
727       descendants.  Any whitespace inside the element is included unchanged,
728       but whitespace not in the tree is never added.  But remember that
729       whitespace may be ignored or compacted by HTML::TreeBuilder during
730       parsing (depending on the value of the "ignore_ignorable_whitespace"
731       and "no_space_compacting" attributes).  Also, since whitespace is never
732       added during parsing,
733
734         HTML::TreeBuilder->new_from_content("<p>a</p><p>b</p>")
735                          ->as_text;
736
737       returns "ab", not "a b" or "a\nb".
738
739       Text under "<script>" or "<style>" elements is never included in what's
740       returned.  If "skip_dels" is true, then text content under "<del>"
741       nodes is not included in what's returned.
742
743   as_trimmed_text
744         $s = $h->as_trimmed_text(...);
745         $s = $h->as_trimmed_text(extra_chars => '\xA0'); # remove &nbsp;
746         $s = $h->as_text_trimmed(...); # alias
747
748       This is just like "as_text(...)" except that leading and trailing
749       whitespace is deleted, and any internal whitespace is collapsed.
750
751       This will not remove non-breaking spaces, Unicode spaces, or any other
752       non-ASCII whitespace unless you supply the extra characters as a string
753       argument (e.g. "$h->as_trimmed_text(extra_chars => '\xA0')").
754       "extra_chars" may be any string that can appear inside a character
755       class, including ranges like "a-z", POSIX character classes like
756       "[:alpha:]", and character class escapes like "\p{Zs}".
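
       As a rough illustration of how "as_text" and "as_trimmed_text" differ,
       the sketch below builds a small tree by hand with "new_from_lol"
       (described later in this document) instead of parsing a file.  The
       element names and text are invented for the example.

```perl
use strict;
use warnings;
use HTML::Element;

# A small hand-built tree; the content is illustrative only.
my $p = HTML::Element->new_from_lol(
    [ 'p',
        '  Hello,  ',
        [ 'b',      'world' ],
        "\n",
        [ 'del',    ' old' ],
        [ 'script', 'x = 1;' ],    # never included in the text
    ]
);

my $raw     = $p->as_text;                           # whitespace kept unchanged
my $no_dels = $p->as_text( skip_dels => 1 );         # text under <del> dropped
my $trimmed = $p->as_trimmed_text;                   # trimmed and collapsed
my $clean   = $p->as_trimmed_text( skip_dels => 1 );

print "[$trimmed]\n";    # prints "[Hello, world old]"
print "[$clean]\n";      # prints "[Hello, world]"
```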
757
758   as_XML
759         $s = $h->as_XML()
760
761       Returns a string representing in XML the element and its descendants.
762
763       The XML is not indented.
764
765   as_Lisp_form
766         $s = $h->as_Lisp_form();
767
768       Returns a string representing the element and its descendants as a Lisp
769       form.  Unsafe characters are encoded as octal escapes.
770
771       The Lisp form is indented, and contains external ("href", etc.)  as
772       well as internal attributes ("_tag", "_content", "_implicit", etc.),
773       except for "_parent", which is omitted.
774
775       Current example output for a given element:
776
777         ("_tag" "img" "border" "0" "src" "pie.png" "usemap" "#main.map")
778
779   format
780         $s = $h->format; # use HTML::FormatText
781         $s = $h->format($formatter);
782
       Formats the element and its descendants as text, using
       HTML::FormatText by default.

       Takes an optional argument: a formatter object to use instead of the
       default.
786
787   starttag
788         $start = $h->starttag();
789         $start = $h->starttag($entities);
790
791       Returns a string representing the complete start tag for the element.
792       I.e., leading "<", tag name, attributes, and trailing ">".  All values
793       are surrounded with double-quotes, and appropriate characters are
794       encoded.  If $entities is omitted or undef, all unsafe characters are
795       encoded as HTML entities.  See HTML::Entities for details.  If you
796       specify some value for $entities, remember to include the double-quote
797       character in it.  (Previous versions of this module would basically
798       behave as if '&">' were specified for $entities.)  If $entities is an
799       empty string, no entity is escaped.
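
       The effect of the $entities argument can be seen in a short sketch.
       The URL and title below are made up for the example; the escaping
       itself is done by HTML::Entities.

```perl
use strict;
use warnings;
use HTML::Element;

# Attribute values chosen to contain characters that need escaping.
my $link = HTML::Element->new(
    'a',
    href  => 'http://www.perl.com/?x=1&y=2',
    title => 'Say "hi"',
);
$link->push_content('The Perl Homepage');

my $escaped = $link->starttag;        # default: unsafe characters become entities
my $raw     = $link->starttag('');    # empty string: nothing is escaped
my $end     = $link->endtag;

print "$escaped\n";  # <a href="http://www.perl.com/?x=1&amp;y=2" title="Say &quot;hi&quot;">
print "$raw\n";      # <a href="http://www.perl.com/?x=1&y=2" title="Say "hi"">
print "$end\n";      # </a>
```

       Note that the unescaped form is not valid HTML here, which is why an
       empty $entities is rarely what you want for output.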
800
801   starttag_XML
802         $start = $h->starttag_XML();
803
804       Returns a string representing the complete start tag for the element.
805
806   endtag
807         $end = $h->endtag();
808
809       Returns a string representing the complete end tag for this element.
810       I.e., "</", tag name, and ">".
811
812   endtag_XML
813         $end = $h->endtag_XML();
814
815       Returns a string representing the complete end tag for this element.
816       I.e., "</", tag name, and ">".
817

SECONDARY STRUCTURAL METHODS

819       These methods all involve some structural aspect of the tree; either
820       they report some aspect of the tree's structure, or they involve
821       traversal down the tree, or walking up the tree.
822
823   is_inside
824         $inside = $h->is_inside('tag', $element, ...);
825
       Returns true if the $h element is, or is contained anywhere inside, an
       element that is any of the ones listed, or whose tag name is any of the
       tag names listed.  You can use any mix of elements and tag names.
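
       A minimal sketch of mixing tag names and element objects in one call,
       using a hand-built tree whose structure is only an example:

```perl
use strict;
use warnings;
use HTML::Element;

my $tree = HTML::Element->new_from_lol(
    [ 'body',
        [ 'table', [ 'tr', [ 'td', [ 'em', 'hi' ] ] ] ],
    ]
);
my $em    = $tree->find_by_tag_name('em');
my $table = $tree->find_by_tag_name('table');

my $in_td    = $em->is_inside('td');              # true: <td> is an ancestor
my $in_self  = $em->is_inside('em');              # true: $h itself counts
my $in_list  = $em->is_inside('ul');              # false: no <ul> above $em
my $in_mixed = $em->is_inside( 'ul', $table );    # true: matches the $table object

print "inside a table cell\n" if $in_td;
```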
829
830   is_empty
831         $empty = $h->is_empty();
832
833       Returns true if $h has no content, i.e., has no elements or text
834       segments under it.  In other words, this returns true if $h is a leaf
835       node, AKA a terminal node.  Do not confuse this sense of "empty" with
836       another sense that it can have in SGML/HTML/XML terminology, which
837       means that the element in question is of the type (like HTML's "<hr>",
838       "<br>", "<img>", etc.) that can't have any content.
839
840       That is, a particular "<p>" element may happen to have no content, so
841       $that_p_element->is_empty will be true -- even though the prototypical
842       "<p>" element isn't "empty" (not in the way that the prototypical
843       "<hr>" element is).
844
845       If you think this might make for potentially confusing code, consider
846       simply using the clearer exact equivalent:  "not($h->content_list)".
847
848   pindex
849         $index = $h->pindex();
850
       Returns the index of the element in its parent's contents array, such
852       that $h would equal
853
854         $h->parent->content->[$h->pindex]
855         # or
856         ($h->parent->content_list)[$h->pindex]
857
858       assuming $h isn't root.  If the element $h is root, then "$h->pindex"
859       returns "undef".
860
861   left
862         $element = $h->left();
863         @elements = $h->left();
864
865       In scalar context: returns the node that's the immediate left sibling
866       of $h.  If $h is the leftmost (or only) child of its parent (or has no
867       parent), then this returns undef.
868
869       In list context: returns all the nodes that're the left siblings of $h
870       (starting with the leftmost).  If $h is the leftmost (or only) child of
871       its parent (or has no parent), then this returns an empty list.
872
873       (See also "$h->preinsert(LIST)".)
874
875   right
876         $element = $h->right();
877         @elements = $h->right();
878
879       In scalar context: returns the node that's the immediate right sibling
880       of $h.  If $h is the rightmost (or only) child of its parent (or has no
881       parent), then this returns "undef".
882
883       In list context: returns all the nodes that're the right siblings of
884       $h, starting with the leftmost.  If $h is the rightmost (or only) child
885       of its parent (or has no parent), then this returns an empty list.
886
887       (See also "$h->postinsert(LIST)".)
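
       The following sketch shows "pindex", "left", and "right" together on
       a small invented list.  Note that text segments count as siblings, so
       "left" and "right" can return plain strings as well as elements.

```perl
use strict;
use warnings;
use HTML::Element;

my $ul = HTML::Element->new_from_lol(
    [ 'ul', [ 'li', 'one' ], 'stray text', [ 'li', 'two' ], [ 'li', 'three' ] ]
);
my ( $first, $second, $third ) = $ul->find_by_tag_name('li');

my $index      = $second->pindex;    # 2: the text segment occupies index 1
my $left       = $second->left;      # scalar context: the string 'stray text'
my @all_left   = $second->left;      # list context: ($first, 'stray text')
my $right      = $second->right;     # the third <li>
my $none       = $third->right;      # undef: no right sibling
my $root_index = $ul->pindex;        # undef: the root has no parent

print "second <li> is at index $index\n";    # prints "second <li> is at index 2"
```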
888
889   address
890         $address = $h->address();
891         $element_or_text = $h->address($address);
892
893       The first form (with no parameter) returns a string representing the
894       location of $h in the tree it is a member of.  The address consists of
895       numbers joined by a '.', starting with '0', and followed by the
896       pindexes of the nodes in the tree that are ancestors of $h, starting
897       from the top.
898
899       So if the way to get to a node starting at the root is to go to child 2
900       of the root, then child 10 of that, and then child 0 of that, and then
901       you're there -- then that node's address is "0.2.10.0".
902
903       As a bit of a special case, the address of the root is simply "0".
904
       I foresee this being used mainly for debugging, but you may find your
906       own uses for it.
907
908         $element_or_text = $h->address($address);
909
910       This form returns the node (whether element or text-segment) at the
911       given address in the tree that $h is a part of.  (That is, the address
912       is resolved starting from "$h->root".)
913
914       If there is no node at the given address, this returns "undef".
915
916       You can specify "relative addressing" (i.e., that indexing is supposed
917       to start from $h and not from "$h->root") by having the address start
918       with a period -- e.g., "$h->address(".3.2")" will look at child 3 of
919       $h, and child 2 of that.
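
       Both forms of "address" can be sketched on the small document used
       under "HOW WE REPRESENT TREES", rebuilt here by hand with
       "new_from_lol" (so there are no whitespace-only text nodes to skew
       the indexes):

```perl
use strict;
use warnings;
use HTML::Element;

my $tree = HTML::Element->new_from_lol(
    [ 'html',
        [ 'head',
            [ 'title', 'Stuff' ],
            [ 'meta', { name => 'author', content => 'Jojo' } ],
        ],
        [ 'body', [ 'h1', 'I like potatoes!' ] ],
    ]
);
my $h1   = $tree->find_by_tag_name('h1');
my $body = $tree->find_by_tag_name('body');

my $addr     = $h1->address;                 # "0.1.0"
my $meta     = $tree->address('0.0.1');      # the <meta> element
my $text     = $tree->address('0.1.0.0');    # a text segment (plain string)
my $relative = $body->address('.0');         # child 0 of $body, i.e. the <h1>
my $missing  = $tree->address('0.5');        # undef: the root has no child 5

print "h1 lives at $addr\n";    # prints "h1 lives at 0.1.0"
```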
920
921   depth
922         $depth = $h->depth();
923
924       Returns a number expressing $h's depth within its tree, i.e., how many
925       steps away it is from the root.  If $h has no parent (i.e., is root),
926       its depth is 0.
927
928   root
929         $root = $h->root();
930
931       Returns the element that's the top of $h's tree.  If $h is root, this
932       just returns $h.  (If you want to test whether $h is the root, instead
933       of asking what its root is, just test "not($h->parent)".)
934
935   lineage
936         @lineage = $h->lineage();
937
938       Returns the list of $h's ancestors, starting with its parent, and then
939       that parent's parent, and so on, up to the root.  If $h is root, this
940       returns an empty list.
941
942       If you simply want a count of the number of elements in $h's lineage,
943       use "$h->depth".
944
945   lineage_tag_names
946         @names = $h->lineage_tag_names();
947
948       Returns the list of the tag names of $h's ancestors, starting with its
949       parent, and that parent's parent, and so on, up to the root.  If $h is
950       root, this returns an empty list.  Example output: "('em', 'td', 'tr',
951       'table', 'body', 'html')"
952
953       Equivalent to:
954
955         map { $_->tag } $h->lineage;
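
       A short sketch tying together "depth", "root", "lineage", and
       "lineage_tag_names", again on an invented hand-built tree:

```perl
use strict;
use warnings;
use HTML::Element;

my $tree = HTML::Element->new_from_lol(
    [ 'html',
        [ 'head', [ 'title', 'Stuff' ] ],
        [ 'body', [ 'h1', 'I like potatoes!' ] ],
    ]
);
my $title = $tree->find_by_tag_name('title');

my $depth   = $title->depth;                # 2: two steps below the root
my @names   = $title->lineage_tag_names;    # ('head', 'html'), parent first
my @parents = $title->lineage;              # the same ancestors, as elements
my $root    = $title->root;                 # the <html> element
my @none    = $tree->lineage;               # empty list: the root has no ancestors

print "title is under: @names\n";    # prints "title is under: head html"
```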
956
957   descendants
958         @descendants = $h->descendants();
959
960       In list context, returns the list of all $h's descendant elements,
961       listed in pre-order (i.e., an element appears before its content-
962       elements).  Text segments DO NOT appear in the list.  In scalar
963       context, returns a count of all such elements.
964
965   descendents
966       This is just an alias to the "descendants" method, for people who can't
967       spell.
968
969   find_by_tag_name
970         @elements = $h->find_by_tag_name('tag', ...);
971         $first_match = $h->find_by_tag_name('tag', ...);
972
973       In list context, returns a list of elements at or under $h that have
974       any of the specified tag names.  In scalar context, returns the first
975       (in pre-order traversal of the tree) such element found, or undef if
976       none.
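
       The difference between list and scalar context for "descendants" and
       "find_by_tag_name" can be sketched as follows (the tree is
       illustrative only):

```perl
use strict;
use warnings;
use HTML::Element;

my $tree = HTML::Element->new_from_lol(
    [ 'html',
        [ 'head',
            [ 'title', 'Stuff' ],
            [ 'meta', { name => 'author', content => 'Jojo' } ],
        ],
        [ 'body', [ 'h1', 'I like potatoes!' ] ],
    ]
);

my @tags  = map { $_->tag } $tree->descendants;    # pre-order, elements only
my $count = $tree->descendants;                    # scalar context: a count

my @found = $tree->find_by_tag_name( 'h1', 'title' );    # every match
my $first = $tree->find_by_tag_name( 'h1', 'title' );    # first in pre-order
my $none  = $tree->find_by_tag_name('table');            # undef: no such element

print "@tags\n";              # prints "head title meta body h1"
print $first->tag, "\n";      # prints "title", since it precedes <h1> in the tree
```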
977
978   find
979       This is just an alias to "find_by_tag_name".  (There was once going to
980       be a whole find_* family of methods, but then "look_down" filled that
981       niche, so there turned out not to be much reason for the verboseness of
982       the name "find_by_tag_name".)
983
984   find_by_attribute
985         @elements = $h->find_by_attribute('attribute', 'value');
986         $first_match = $h->find_by_attribute('attribute', 'value');
987
988       In a list context, returns a list of elements at or under $h that have
989       the specified attribute, and have the given value for that attribute.
990       In a scalar context, returns the first (in pre-order traversal of the
991       tree) such element found, or undef if none.
992
993       This method is deprecated in favor of the more expressive "look_down"
994       method, which new code should use instead.
995
996   look_down
997         @elements = $h->look_down( ...criteria... );
998         $first_match = $h->look_down( ...criteria... );
999
1000       This starts at $h and looks thru its element descendants (in pre-
1001       order), looking for elements matching the criteria you specify.  In
1002       list context, returns all elements that match all the given criteria;
1003       in scalar context, returns the first such element (or undef, if nothing
1004       matched).
1005
1006       There are three kinds of criteria you can specify:
1007
1008       (attr_name, attr_value)
1009           This means you're looking for an element with that value for that
1010           attribute.  Example: "alt", "pix!".  Consider that you can search
1011           on internal attribute values too: "_tag", "p".
1012
1013       (attr_name, qr/.../)
1014           This means you're looking for an element whose value for that
1015           attribute matches the specified Regexp object.
1016
1017       a coderef
1018           This means you're looking for elements where
1019           coderef->(each_element) returns true.  Example:
1020
1021             my @wide_pix_images = $h->look_down(
1022               _tag => "img",
1023               alt  => "pix!",
1024               sub { $_[0]->attr('width') > 350 }
1025             );
1026
1027       Note that "(attr_name, attr_value)" and "(attr_name, qr/.../)" criteria
1028       are almost always faster than coderef criteria, so should presumably be
1029       put before them in your list of criteria.  That is, in the example
1030       above, the sub ref is called only for elements that have already passed
1031       the criteria of having a "_tag" attribute with value "img", and an
1032       "alt" attribute with value "pix!".  If the coderef were first, it would
1033       be called on every element, and then what elements pass that criterion
1034       (i.e., elements for which the coderef returned true) would be checked
1035       for their "_tag" and "alt" attributes.
1036
1037       Note that comparison of string attribute-values against the string
1038       value in "(attr_name, attr_value)" is case-INsensitive!  A criterion of
1039       "('align', 'right')" will match an element whose "align" value is
1040       "RIGHT", or "right" or "rIGhT", etc.
1041
1042       Note also that "look_down" considers "" (empty-string) and undef to be
1043       different things, in attribute values.  So this:
1044
1045         $h->look_down("alt", "")
1046
1047       will find elements with an "alt" attribute, but where the value for the
1048       "alt" attribute is "".  But this:
1049
1050         $h->look_down("alt", undef)
1051
1052       is the same as:
1053
1054         $h->look_down(sub { !defined($_[0]->attr('alt')) } )
1055
1056       That is, it finds elements that do not have an "alt" attribute at all
1057       (or that do have an "alt" attribute, but with a value of undef -- which
1058       is not normally possible).
1059
1060       Note that when you give several criteria, this is taken to mean you're
       looking for elements that match all your criteria, not just any of
1062       them.  In other words, there is an implicit "and", not an "or".  So if
1063       you wanted to express that you wanted to find elements with a "name"
1064       attribute with the value "foo" or with an "id" attribute with the value
1065       "baz", you'd have to do it like:
1066
1067         @them = $h->look_down(
1068           sub {
1069             # the lcs are to fold case
1070             lc($_[0]->attr('name')) eq 'foo'
1071             or lc($_[0]->attr('id')) eq 'baz'
1072           }
1073         );
1074
1075       Coderef criteria are more expressive than "(attr_name, attr_value)" and
1076       "(attr_name, qr/.../)" criteria, and all "(attr_name, attr_value)" and
1077       "(attr_name, qr/.../)" criteria could be expressed in terms of
1078       coderefs.  However, "(attr_name, attr_value)" and "(attr_name,
1079       qr/.../)" criteria are a convenient shorthand.  (In fact, "look_down"
1080       itself is basically "shorthand" too, since anything you can do with
1081       "look_down" you could do by traversing the tree, either with the
1082       "traverse" method or with a routine of your own.  However, "look_down"
1083       often makes for very concise and clear code.)
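
       The kinds of criteria described above can be tried out on a small
       hand-built tree.  This is only a sketch; the file names and attribute
       values are invented, and the coderef uses the "//" operator (Perl
       5.10 or later) to avoid a warning for elements without a "width".

```perl
use strict;
use warnings;
use HTML::Element;

my $body = HTML::Element->new_from_lol(
    [ 'body',
        [ 'img', { src => 'a.png', alt => 'pix!', width => 400 } ],
        [ 'img', { src => 'b.png', alt => '',     width => 100 } ],
        [ 'img', { src => 'c.gif' } ],
        [ 'p',   { align => 'RIGHT' }, 'caption' ],
    ]
);

# (attr_name, qr/.../): regexp match on the attribute value
my @pngs = $body->look_down( _tag => 'img', src => qr/\.png$/ );

# (attr_name, attr_value): string comparison is case-insensitive
my @right = $body->look_down( align => 'right' );

# empty string and undef are different things
my @empty_alt = $body->look_down( alt => '' );
my @no_alt    = $body->look_down( _tag => 'img', alt => undef );

# cheap criteria first, coderef last
my @wide = $body->look_down(
    _tag => 'img',
    sub { ( $_[0]->attr('width') // 0 ) > 350 },
);

# scalar context: first match in pre-order
my $first = $body->look_down( _tag => 'img' );

print scalar(@pngs), " PNG images\n";    # prints "2 PNG images"
```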
1084
1085   look_up
1086         @elements = $h->look_up( ...criteria... );
1087         $first_match = $h->look_up( ...criteria... );
1088
1089       This is identical to "$h->look_down", except that whereas
1090       "$h->look_down" basically scans over the list:
1091
1092          ($h, $h->descendants)
1093
1094       "$h->look_up" instead scans over the list
1095
1096          ($h, $h->lineage)
1097
1098       So, for example, this returns all ancestors of $h (possibly including
1099       $h itself) that are "<td>" elements with an "align" attribute with a
1100       value of "right" (or "RIGHT", etc.):
1101
1102          $h->look_up("_tag", "td", "align", "right");
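
       A runnable sketch of the same idea, on an invented table.  Matches
       come back nearest first, since the scan starts at $h and works
       upward.

```perl
use strict;
use warnings;
use HTML::Element;

my $table = HTML::Element->new_from_lol(
    [ 'table', [ 'tr', [ 'td', { align => 'RIGHT' }, [ 'em', 'hi' ] ] ] ]
);
my $em = $table->find_by_tag_name('em');

my $cell  = $em->look_up( _tag => 'td', align => 'right' );    # the enclosing <td>
my @t_els = $em->look_up( _tag => qr/^t/ );                    # td, tr, table
my $list  = $em->look_up( _tag => 'ul' );                      # undef: no <ul> above

print join( ' ', map { $_->tag } @t_els ), "\n";    # prints "td tr table"
```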
1103
1104   traverse
1105         $h->traverse(...options...)
1106
1107       Lengthy discussion of HTML::Element's unnecessary and confusing
1108       "traverse" method has been moved to a separate file:
1109       HTML::Element::traverse
1110
1111   attr_get_i
1112         @values = $h->attr_get_i('attribute');
1113         $first_value = $h->attr_get_i('attribute');
1114
1115       In list context, returns a list consisting of the values of the given
1116       attribute for $h and for all its ancestors starting from $h and working
1117       its way up.  Nodes with no such attribute are skipped.  ("attr_get_i"
1118       stands for "attribute get, with inheritance".)  In scalar context,
1119       returns the first such value, or undef if none.
1120
1121       Consider a document consisting of:
1122
1123          <html lang='i-klingon'>
1124            <head><title>Pati Pata</title></head>
1125            <body>
1126              <h1 lang='la'>Stuff</h1>
1127              <p lang='es-MX' align='center'>
1128                Foo bar baz <cite>Quux</cite>.
1129              </p>
1130              <p>Hooboy.</p>
1131            </body>
1132          </html>
1133
1134       If $h is the "<cite>" element, "$h->attr_get_i("lang")" in list context
1135       will return the list "('es-MX', 'i-klingon')".  In scalar context, it
1136       will return the value 'es-MX'.
1137
1138       If you call with multiple attribute names...
1139
1140         @values = $h->attr_get_i('a1', 'a2', 'a3');
1141         $first_value = $h->attr_get_i('a1', 'a2', 'a3');
1142
1143       ...in list context, this will return a list consisting of the values of
1144       these attributes which exist in $h and its ancestors.  In scalar
1145       context, this returns the first value (i.e., the value of the first
1146       existing attribute from the first element that has any of the
1147       attributes listed).  So, in the above example,
1148
1149         $h->attr_get_i('lang', 'align');
1150
1151       will return:
1152
1153          ('es-MX', 'center', 'i-klingon') # in list context
1154         or
1155          'es-MX' # in scalar context.
1156
1157       But note that this:
1158
1159        $h->attr_get_i('align', 'lang');
1160
1161       will return:
1162
1163          ('center', 'es-MX', 'i-klingon') # in list context
1164         or
1165          'center' # in scalar context.
1166
1167   tagname_map
1168         $hash_ref = $h->tagname_map();
1169
1170       Scans across $h and all its descendants, and makes a hash (a reference
1171       to which is returned) where each entry consists of a key that's a tag
       name, and a value that's a reference to a list of all elements that
1173       have that tag name.  I.e., this method returns:
1174
1175          {
1176            # Across $h and all descendants...
1177            'a'   => [ ...list of all <a>   elements... ],
1178            'em'  => [ ...list of all <em>  elements... ],
1179            'img' => [ ...list of all <img> elements... ],
1180          }
1181
1182       (There are entries in the hash for only those tagnames that occur
1183       at/under $h -- so if there's no "<img>" elements, there'll be no "img"
1184       entry in the returned hashref.)
1185
1186       Example usage:
1187
1188           my $map_r = $h->tagname_map();
1189           my @heading_tags = sort grep m/^h\d$/s, keys %$map_r;
1190           if(@heading_tags) {
1191             print "Heading levels used: @heading_tags\n";
1192           } else {
1193             print "No headings.\n"
1194           }
1195
1196   extract_links
1197         $links_array_ref = $h->extract_links();
1198         $links_array_ref = $h->extract_links(@wantedTypes);
1199
1200       Returns links found by traversing the element and all of its children
1201       and looking for attributes (like "href" in an "<a>" element, or "src"
1202       in an "<img>" element) whose values represent links.  The return value
       is a reference to an array.  Each element of the array is a reference to
       an array with four items: the link-value, the element that has the
       attribute with that link-value, the name of that attribute, and the
       tagname of that element.  (Example: "['http://www.suck.com/',
       $elem_obj, 'href', 'a']".)  You may or may not end up using the
       element itself -- for some purposes, you may use only the link value.
1209
1210       You might specify that you want to extract links from just some kinds
1211       of elements (instead of the default, which is to extract links from all
1212       the kinds of elements known to have attributes whose values represent
1213       links).  For instance, if you want to extract links from only "<a>" and
1214       "<img>" elements, you could code it like this:
1215
1216         for (@{  $e->extract_links('a', 'img')  }) {
1217             my($link, $element, $attr, $tag) = @$_;
1218             print
1219               "Hey, there's a $tag that links to ",
1220               $link, ", in its $attr attribute, at ",
1221               $element->address(), ".\n";
1222         }
1223
1224   simplify_pres
1225         $h->simplify_pres();
1226
1227       In text bits under PRE elements that are at/under $h, this routine
1228       nativizes all newlines, and expands all tabs.
1229
1230       That is, if you read a file with lines delimited by "\cm\cj"'s, the
1231       text under PRE areas will have "\cm\cj"'s instead of "\n"'s. Calling
1232       "$h->simplify_pres" on such a tree will turn "\cm\cj"'s into "\n"'s.
1233
1234       Tabs are expanded to however many spaces it takes to get to the next
1235       8th column -- the usual way of expanding them.
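
       As a small sketch of both effects, assuming a "<pre>" whose text uses
       a CR-LF line ending and a tab (the content is made up):

```perl
use strict;
use warnings;
use HTML::Element;

# "\r\n" is the same pair of characters as "\cm\cj".
my $pre = HTML::Element->new_from_lol( [ 'pre', "a\tb\r\nc" ] );
$pre->simplify_pres;

my ($text) = $pre->content_list;

# The tab after "a" becomes 7 spaces (up to column 8),
# and the CR-LF pair becomes a single "\n".
print $text eq 'a' . ( ' ' x 7 ) . "b\nc" ? "simplified\n" : "unchanged\n";
```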
1236
1237   same_as
1238         $equal = $h->same_as($i)
1239
1240       Returns true if $h and $i are both elements representing the same tree
1241       of elements, each with the same tag name, with the same explicit
1242       attributes (i.e., not counting attributes whose names start with "_"),
1243       and with the same content (textual, comments, etc.).
1244
1245       Sameness of descendant elements is tested, recursively, with
1246       "$child1->same_as($child_2)", and sameness of text segments is tested
1247       with "$segment1 eq $segment2".
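
       A sketch of what does and does not count as "the same".  The helper
       "make_p" is hypothetical, defined here only to build comparable
       trees.

```perl
use strict;
use warnings;
use HTML::Element;

# Hypothetical helper: builds <p class="note">Hello <b>$word</b></p>.
sub make_p {
    my ($word) = @_;
    return scalar HTML::Element->new_from_lol(
        [ 'p', { class => 'note' }, 'Hello ', [ 'b', $word ] ]
    );
}

my $one   = make_p('world');
my $two   = make_p('world');
my $three = make_p('there');

$two->attr( '_implicit', 1 );    # internal attributes (leading "_") are ignored

my $same      = $one->same_as($two);      # true: same tags, attributes, content
my $diff_text = $one->same_as($three);    # false: a text segment differs

$two->attr( 'class', 'warning' );
my $diff_attr = $one->same_as($two);      # false: an explicit attribute differs

print "trees match\n" if $same;
```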
1248
1249   new_from_lol
1250         $h = HTML::Element->new_from_lol($array_ref);
1251         @elements = HTML::Element->new_from_lol($array_ref, ...);
1252
       Recursively constructs a tree of nodes, based on the (non-cyclic) data
1254       structure represented by each $array_ref, where that is a reference to
1255       an array of arrays (of arrays (of arrays (etc.))).
1256
1257       In each arrayref in that structure, different kinds of values are
1258       treated as follows:
1259
1260       •   Arrayrefs
1261
1262           Arrayrefs are considered to designate a sub-tree representing
1263           children for the node constructed from the current arrayref.
1264
1265       •   Hashrefs
1266
1267           Hashrefs are considered to contain attribute-value pairs to add to
           the element to be constructed from the current arrayref.
1269
1270       •   Text segments
1271
1272           Text segments at the start of any arrayref will be considered to
1273           specify the name of the element to be constructed from the current
1274           arrayref; all other text segments will be considered to specify
1275           text segments as children for the current arrayref.
1276
1277       •   Elements
1278
1279           Existing element objects are either inserted into the treelet
1280           constructed, or clones of them are.  That is, when the lol-tree is
           being traversed and elements constructed based on what's in it, if an
1282           existing element object is found, if it has no parent, then it is
1283           added directly to the treelet constructed; but if it has a parent,
1284           then "$that_node->clone" is added to the treelet at the appropriate
1285           place.
1286
1287       An example will hopefully make this more obvious:
1288
1289         my $h = HTML::Element->new_from_lol(
1290           ['html',
1291             ['head',
1292               [ 'title', 'I like stuff!' ],
1293             ],
1294             ['body',
1295               {'lang', 'en-JP', _implicit => 1},
1296               'stuff',
1297               ['p', 'um, p < 4!', {'class' => 'par123'}],
1298               ['div', {foo => 'bar'}, '123'],
1299             ]
1300           ]
1301         );
1302         $h->dump;
1303
1304       Will print this:
1305
1306         <html> @0
1307           <head> @0.0
1308             <title> @0.0.0
1309               "I like stuff!"
1310           <body lang="en-JP"> @0.1 (IMPLICIT)
1311             "stuff"
1312             <p class="par123"> @0.1.1
1313               "um, p < 4!"
1314             <div foo="bar"> @0.1.2
1315               "123"
1316
1317       And printing $h->as_HTML will give something like:
1318
1319         <html><head><title>I like stuff!</title></head>
1320         <body lang="en-JP">stuff<p class="par123">um, p &lt; 4!
1321         <div foo="bar">123</div></body></html>
1322
1323       You can even do fancy things with "map":
1324
1325         $body->push_content(
1326           # push_content implicitly calls new_from_lol on arrayrefs...
1327           ['br'],
1328           ['blockquote',
1329             ['h2', 'Pictures!'],
1330             map ['p', $_],
1331             $body2->look_down("_tag", "img"),
1332               # images, to be copied from that other tree.
1333           ],
1334           # and more stuff:
1335           ['ul',
1336             map ['li', ['a', {'href'=>"$_.png"}, $_ ] ],
1337             qw(Peaches Apples Pears Mangos)
1338           ],
1339         );
1340
1341       In scalar context, you must supply exactly one arrayref.  In list
1342       context, you can pass a list of arrayrefs, and new_from_lol will return
1343       a list of elements, one for each arrayref.
1344
1345         @elements = HTML::Element->new_from_lol(
1346           ['hr'],
1347           ['p', 'And there, on the door, was a hook!'],
1348         );
1349          # constructs two elements.
1350
1351   objectify_text
1352         $h->objectify_text();
1353
1354       This turns any text nodes under $h from mere text segments (strings)
1355       into real objects, pseudo-elements with a tag-name of "~text", and the
1356       actual text content in an attribute called "text".  (For a discussion
1357       of pseudo-elements, see the "tag" method, far above.)  This method is
1358       provided because, for some purposes, it is convenient or necessary to
1359       be able, for a given text node, to ask what element is its parent; and
1360       clearly this is not possible if a node is just a text string.
1361
1362       Note that these "~text" objects are not recognized as text nodes by
1363       methods like "as_text".  Presumably you will want to call
1364       "$h->objectify_text", perform whatever task that you needed that for,
1365       and then call "$h->deobjectify_text" before calling anything like
1366       "$h->as_text".
1367
1368   deobjectify_text
1369         $h->deobjectify_text();
1370
1371       This undoes the effect of "$h->objectify_text".  That is, it takes any
1372       "~text" pseudo-elements in the tree at/under $h, and deletes each one,
1373       replacing each with the content of its "text" attribute.
1374
1375       Note that if $h itself is a "~text" pseudo-element, it will be
1376       destroyed -- a condition you may need to treat specially in your
1377       calling code (since it means you can't very well do anything with $h
1378       after that).  So that you can detect that condition, if $h is itself a
1379       "~text" pseudo-element, then this method returns the value of the
1380       "text" attribute, which should be a defined value; in all other cases,
1381       it returns undef.
1382
1383       (This method assumes that no "~text" pseudo-element has any children.)
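
       The round trip through "objectify_text" and "deobjectify_text" might
       be sketched like this, using the objectified form to ask each text
       node for its parent (the paragraph content is invented):

```perl
use strict;
use warnings;
use HTML::Element;

my $p = HTML::Element->new_from_lol(
    [ 'p', 'Hello ', [ 'b', 'big' ], ' world' ]
);

$p->objectify_text;

# Each text segment is now a "~text" pseudo-element with a parent.
my @report = map { $_->parent->tag . ': ' . $_->attr('text') }
    $p->look_down( _tag => '~text' );

my $result = $p->deobjectify_text;    # undef, because $p is not itself "~text"
my $text   = $p->as_text;             # text is plain strings again

print "$_\n" for @report;    # prints "p: Hello ", "b: big", "p:  world"
print "$text\n";             # prints "Hello big world"
```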
1384
1385   number_lists
1386         $h->number_lists();
1387
1388       For every UL, OL, DIR, and MENU element at/under $h, this sets a
1389       "_bullet" attribute for every child LI element.  For LI children of an
1390       OL, the "_bullet" attribute's value will be something like "4.", "d.",
1391       "D.", "IV.", or "iv.", depending on the OL element's "type" attribute.
1392       LI children of a UL, DIR, or MENU get their "_bullet" attribute set to
1393       "*".  There should be no other LIs (i.e., except as children of OL, UL,
1394       DIR, or MENU elements), and if there are, they are unaffected.
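
       A minimal sketch, using an "<ol>" with no "type" attribute (so plain
       decimal numbering is used) next to a "<ul>"; the list items are
       invented:

```perl
use strict;
use warnings;
use HTML::Element;

my $div = HTML::Element->new_from_lol(
    [ 'div',
        [ 'ol', [ 'li', 'first' ], [ 'li', 'second' ] ],
        [ 'ul', [ 'li', 'apple' ] ],
    ]
);

$div->number_lists;

# "_bullet" is an internal attribute, readable through attr().
my @bullets = map { $_->attr('_bullet') } $div->find_by_tag_name('li');

print "@bullets\n";    # prints "1. 2. *"
```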
1395
1396   has_insane_linkage
1397         $h->has_insane_linkage
1398
1399       This method is for testing whether this element or the elements under
1400       it have linkage attributes (_parent and _content) whose values are
1401       deeply aberrant: if there are undefs in a content list; if an element
1402       appears in the content lists of more than one element; if the _parent
1403       attribute of an element doesn't match its actual parent; or if an
1404       element appears as its own descendant (i.e., if there is a cyclicity in
1405       the tree).
1406
1407       This returns empty list (or false, in scalar context) if the subtree's
       linkage attributes are sane; otherwise it returns two items (or true, in
1409       scalar context): the element where the error occurred, and a string
1410       describing the error.
1411
       This method is provided mainly for debugging and troubleshooting --
1413       it should be quite impossible for any document constructed via
1414       HTML::TreeBuilder to parse into a non-sane tree (since it's not the
1415       content of the tree per se that's in question, but whether the tree in
1416       memory was properly constructed); and it should be impossible for you
1417       to produce an insane tree just thru reasonable use of normal documented
1418       structure-modifying methods.  But if you're constructing your own
       trees, and your program is going into infinite loops during calls to
1420       traverse() or any of the secondary structural methods, as part of
1421       debugging, consider calling "has_insane_linkage" on the tree.
1422
1423   element_class
1424         $classname = $h->element_class();
1425
1426       This method returns the class which will be used for new elements.  It
1427       defaults to HTML::Element, but can be overridden by subclassing or
       esoteric means best left to those who will read the source and then
1429       not complain when those esoteric means change.  (Just subclass.)
1430

CLASS METHODS

1432   Use_Weak_Refs
1433         $enabled = HTML::Element->Use_Weak_Refs;
1434         HTML::Element->Use_Weak_Refs( $enabled );
1435
1436       This method allows you to check whether weak reference support is
1437       enabled, and to enable or disable it. For details, see "Weak
1438       References".  $enabled is true if weak references are enabled.
1439
1440       You should not switch this in the middle of your program, and you
1441       probably shouldn't use it at all.  Existing trees are not affected by
1442       this method (until you start modifying nodes in them).
1443
1444       Throws an exception if you attempt to enable weak references and your
1445       Perl or Scalar::Util does not support them.
1446
1447       Disabling weak reference support is deprecated.
1448

SUBROUTINES

1450   Version
1451       This subroutine is deprecated.  Please use the standard VERSION method
1452       (e.g. "HTML::Element->VERSION") instead.
1453
1454   ABORT OK PRUNE PRUNE_SOFTLY PRUNE_UP
1455       Constants for signalling back to the traverser
1456

BUGS

       * If you want to free the memory associated with a tree built of
       HTML::Element nodes, and you have disabled weak references, then you
       will have to delete it explicitly using the "delete" method.  See "Weak
       References".

       * There's almost nothing to stop you from making a "tree" with
       cyclicities (loops) in it, which could, for example, make the traverse
       method go into an infinite loop.  So don't make cyclicities!  (If all
       you're doing is parsing HTML files, and looking at the resulting trees,
       this will never be a problem for you.)

       * There's no way to represent comments or processing directives in a
       tree with HTML::Elements.  Not yet, at least.

       * There's (currently) nothing to stop you from using an undefined value
       as a text segment.  If you're running under "perl -w", however, this
       may make HTML::Element's code produce a slew of warnings.
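
       As a small sketch of the explicit cleanup mentioned in the first item,
       the following builds a throwaway tree and then deletes it.  The
       element names and the $list variable are illustrative only.

```perl
use strict;
use warnings;
use HTML::Element;

# Build a tiny tree: a <ul> with three empty <li> children.
my $list = HTML::Element->new('ul');
$list->push_content( HTML::Element->new('li') ) for 1 .. 3;

# In scalar context, content_list returns the number of content items.
print scalar( $list->content_list ), " items\n";

# With weak references disabled, parent and child links form reference
# cycles, so the tree is torn down explicitly.  Assigning the result of
# delete also clears the variable.
$list = $list->delete;
```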

NOTES ON SUBCLASSING

       You are welcome to derive subclasses from HTML::Element, but you should
       be aware that the code in HTML::Element makes certain assumptions about
       elements (and I'm using "element" to mean ONLY an object of class
       HTML::Element, or of a subclass of HTML::Element):

       * The value of an element's _parent attribute must either be undef or
       otherwise false, or must be an element.

       * The value of an element's _content attribute must either be undef or
       otherwise false, or a reference to an (unblessed) array.  The array may
       be empty; but if it has items, they must ALL be either mere strings
       (text segments), or elements.

       * The value of an element's _tag attribute should, at least, be a
       string of printable characters.

       Moreover, bear these rules in mind:

       * Do not break encapsulation on objects.  That is, access their
       contents only through $obj->attr or more specific methods.

       * You should think twice before completely overriding any of the
       methods that HTML::Element provides.  (Overriding with a method that
       calls the superclass method is not so bad, though.)
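
       The rules above can be sketched with a hypothetical subclass,
       My::Element, which is not part of HTML-Tree.  It overrides
       "push_content" only to filter its arguments and then defers to the
       superclass method, and it reads attributes through "attr" rather than
       reaching into the object.

```perl
use strict;
use warnings;

package My::Element;    # hypothetical subclass, for illustration only
use parent 'HTML::Element';

# Override that still calls the superclass method: undefined values are
# dropped so that they never become text segments.
sub push_content {
    my ( $self, @items ) = @_;
    my @kept = grep { defined } @items;
    return $self->SUPER::push_content(@kept);
}

package main;

my $p = My::Element->new( 'p', class => 'note' );
$p->push_content( 'Hello', undef, ' world' );

# Access contents only through the public methods.
print $p->attr('class'), "\n";
print $p->as_text, "\n";
```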

SEE ALSO

       HTML::Tree; HTML::TreeBuilder; HTML::AsSubs; HTML::Tagset; and, for the
       morbidly curious, HTML::Element::traverse.

ACKNOWLEDGEMENTS

       Thanks to Mark-Jason Dominus for a POD suggestion.

AUTHOR

       Current maintainers:

       •   Christopher J. Madsen "<perl AT cjmweb.net>"

       •   Jeff Fearn "<jfearn AT cpan.org>"

       Original HTML-Tree author:

       •   Gisle Aas

       Former maintainers:

       •   Sean M. Burke

       •   Andy Lester

       •   Pete Krawczyk "<petek AT cpan.org>"

       You can follow or contribute to HTML-Tree's development at
       <https://github.com/kentfredric/HTML-Tree>.

COPYRIGHT AND LICENSE

       Copyright 1995-1998 Gisle Aas, 1999-2004 Sean M. Burke, 2005 Andy
       Lester, 2006 Pete Krawczyk, 2010 Jeff Fearn, 2012 Christopher J.
       Madsen.

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

       The programs in this library are distributed in the hope that they will
       be useful, but without any warranty; without even the implied warranty
       of merchantability or fitness for a particular purpose.



perl v5.34.0                      2021-07-22                  HTML::Element(3)