1HTML::Element::traverseU(s3e)r Contributed Perl DocumentaHtTiMoLn::Element::traverse(3)
2
3
4

NAME

6       HTML::Element::traverse - discussion of HTML::Element's traverse method
7

VERSION

9       This document describes version 5.07 of HTML::Element::traverse,
10       released August 31, 2017 as part of HTML-Tree.
11

SYNOPSIS

13         # $element->traverse is unnecessary and obscure.
14         #   Don't use it in new code.
15

DESCRIPTION

17       "HTML::Element" provides a method "traverse" that traverses the tree
18       and calls user-specified callbacks for each node, in pre- or post-
19       order.  However, use of the method is quite superfluous: if you want to
20       recursively visit every node in the tree, it's almost always simpler to
21       write a subroutine does just that, than it is to bundle up the pre-
22       and/or post-order code in callbacks for the "traverse" method.
23

EXAMPLES

25       Suppose you want to traverse at/under a node $tree and give elements an
26       'id' attribute unless they already have one.
27
28       You can use the "traverse" method:
29
30         {
31           my $counter = 'x0000';
32           $start_node->traverse(
33             [ # Callbacks;
34               # pre-order callback:
35               sub {
36                 my $x = $_[0];
37                 $x->attr('id', $counter++) unless defined $x->attr('id');
38                 return HTML::Element::OK; # keep traversing
39               },
40               # post-order callback:
41               undef
42             ],
43             1, # don't call the callbacks for text nodes
44           );
45         }
46
47       or you can just be simple and clear (and not have to understand the
48       calling format for "traverse") by writing a sub that traverses the tree
49       by just calling itself:
50
51         {
52           my $counter = 'x0000';
53           sub give_id {
54             my $x = $_[0];
55             $x->attr('id', $counter++) unless defined $x->attr('id');
56             foreach my $c ($x->content_list) {
57               give_id($c) if ref $c; # ignore text nodes
58             }
59           };
60           give_id($start_node);
61         }
62
63       See, isn't that nice and clear?
64
65       But, if you really need to know:
66

THE TRAVERSE METHOD

68       The "traverse()" method is a general object-method for traversing a
69       tree or subtree and calling user-specified callbacks.  It accepts the
70       following syntaxes:
71
72       $h->traverse(\&callback)
73       or $h->traverse(\&callback, $ignore_text)
74       or $h->traverse( [\&pre_callback,\&post_callback] , $ignore_text)
75
76       These all mean to traverse the element and all of its children.  That
77       is, this method starts at node $h, "pre-order visits" $h, traverses its
78       children, and then will "post-order visit" $h.  "Visiting" means that
79       the callback routine is called, with these arguments:
80
81           $_[0] : the node (element or text segment),
82           $_[1] : a startflag, and
83           $_[2] : the depth
84
85       If the $ignore_text parameter is given and true, then the pre-order
86       call will not be happen for text content.
87
88       The startflag is 1 when we enter a node (i.e., in pre-order calls) and
89       0 when we leave the node (in post-order calls).
90
91       Note, however, that post-order calls don't happen for nodes that are
92       text segments or are elements that are prototypically empty (like "br",
93       "hr", etc.).
94
95       If we visit text nodes (i.e., unless $ignore_text is given and true),
96       then when text nodes are visited, we will also pass two extra arguments
97       to the callback:
98
99           $_[3] : the element that's the parent
100                    of this text node
101           $_[4] : the index of this text node
102                    in its parent's content list
103
104       Note that you can specify that the pre-order routine can be a different
105       routine from the post-order one:
106
107           $h->traverse( [\&pre_callback,\&post_callback], ...);
108
109       You can also specify that no post-order calls are to be made, by
110       providing a false value as the post-order routine:
111
112           $h->traverse([ \&pre_callback,0 ], ...);
113
114       And similarly for suppressing pre-order callbacks:
115
116           $h->traverse([ 0,\&post_callback ], ...);
117
118       Note that these two syntaxes specify the same operation:
119
120           $h->traverse([\&foo,\&foo], ...);
121           $h->traverse( \&foo       , ...);
122
123       The return values from calls to your pre- or post-order routines are
124       significant, and are used to control recursion into the tree.
125
126       These are the values you can return, listed in descending order of my
127       estimation of their usefulness:
128
129       HTML::Element::OK, 1, or any other true value
130           ...to keep on traversing.
131
132           Note that "HTML::Element::OK" et al are constants.  So if you're
133           running under "use strict" (as I hope you are), and you say:
134           "return HTML::Element::PRUEN" the compiler will flag this as an
135           error (an unallowable bareword, specifically), whereas if you spell
136           PRUNE correctly, the compiler will not complain.
137
138       undef, 0, '0', '', or HTML::Element::PRUNE
139           ...to block traversing under the current element's content.  (This
140           is ignored if received from a post-order callback, since by then
141           the recursion has already happened.)  If this is returned by a pre-
142           order callback, no post-order callback for the current node will
143           happen.  (Recall that if your callback exits with just "return;",
144           it is returning undef -- at least in scalar context, and "traverse"
145           always calls your callbacks in scalar context.)
146
147       HTML::Element::ABORT
148           ...to abort the whole traversal immediately.  This is often useful
149           when you're looking for just the first node in the tree that meets
150           some criterion of yours.
151
152       HTML::Element::PRUNE_UP
153           ...to abort continued traversal into this node and its parent node.
154           No post-order callback for the current or parent node will happen.
155
156       HTML::Element::PRUNE_SOFTLY
157           Like PRUNE, except that the post-order call for the current node is
158           not blocked.
159
160       Almost every task to do with extracting information from a tree can be
161       expressed in terms of traverse operations (usually in only one pass,
162       and usually paying attention to only pre-order, or to only post-order),
163       or operations based on traversing. (In fact, many of the other methods
164       in this class are basically calls to traverse() with particular
165       arguments.)
166
167       The source code for HTML::Element and HTML::TreeBuilder contain several
168       examples of the use of the "traverse" method to gather information
169       about the content of trees and subtrees.
170
171       (Note: you should not change the structure of a tree while you are
172       traversing it.)
173
174       [End of documentation for the "traverse()" method]
175
176   Traversing with Recursive Anonymous Routines
177       Now, if you've been reading Structure and Interpretation of Computer
178       Programs too much, maybe you even want a recursive lambda.  Go ahead:
179
180         {
181           my $counter = 'x0000';
182           my $give_id;
183           $give_id = sub {
184             my $x = $_[0];
185             $x->attr('id', $counter++) unless defined $x->attr('id');
186             foreach my $c ($x->content_list) {
187               $give_id->($c) if ref $c; # ignore text nodes
188             }
189           };
190           $give_id->($start_node);
191           undef $give_id;
192         }
193
194       It's a bit nutty, and it's still more concise than a call to the
195       "traverse" method!
196
197       It is left as an exercise to the reader to figure out how to do the
198       same thing without using a $give_id symbol at all.
199
200       It is also left as an exercise to the reader to figure out why I
201       undefine $give_id, above; and why I could achieved the same effect with
202       any of:
203
204           $give_id = 'I like pie!';
205          # or...
206           $give_id = [];
207          # or even;
208           $give_id = sub { print "Mmmm pie!\n" };
209
210       But not:
211
212           $give_id = sub { print "I'm $give_id and I like pie!\n" };
213          # nor...
214           $give_id = \$give_id;
215          # nor...
216           $give_id = { 'pie' => \$give_id, 'mode' => 'a la' };
217
218   Doing Recursive Things Iteratively
219       Note that you may at times see an iterative implementation of pre-order
220       traversal, like so:
221
222          {
223            my @to_do = ($tree); # start-node
224            while(@to_do) {
225              my $this = shift @to_do;
226
227              # "Visit" the node:
228              $this->attr('id', $counter++)
229               unless defined $this->attr('id');
230
231              unshift @to_do, grep ref $_, $this->content_list;
232               # Put children on the stack -- they'll be visited next
233            }
234          }
235
236       This can under certain circumstances be more efficient than just a
237       normal recursive routine, but at the cost of being rather obscure.  It
238       gains efficiency by avoiding the overhead of function-calling, but
239       since there are several method dispatches however you do it (to "attr"
240       and "content_list"), the overhead for a simple function call is
241       insignificant.
242
243   Pruning and Whatnot
244       The "traverse" method does have the fairly neat features of the
245       "ABORT", "PRUNE_UP" and "PRUNE_SOFTLY" signals.  None of these can be
246       implemented totally straightforwardly with recursive routines, but it
247       is quite possible.  "ABORT"-like behavior can be implemented either
248       with using non-local returning with "eval"/"die":
249
250         my $died_on; # if you need to know where...
251         sub thing {
252           ... visits $_[0]...
253           ... maybe set $died_on to $_[0] and die "ABORT_TRAV" ...
254           ... else call thing($child) for each child...
255           ...any post-order visiting $_[0]...
256         }
257         eval { thing($node) };
258         if($@) {
259           if($@ =~ m<^ABORT_TRAV>) {
260             ...it died (aborted) on $died_on...
261           } else {
262             die $@; # some REAL error happened
263           }
264         }
265
266       or you can just do it with flags:
267
268         my($abort_flag, $died_on);
269         sub thing {
270           ... visits $_[0]...
271           ... maybe set $abort_flag = 1; $died_on = $_[0]; return;
272           foreach my $c ($_[0]->content_list) {
273             thing($c);
274             return if $abort_flag;
275           }
276           ...any post-order visiting $_[0]...
277           return;
278         }
279
280         $abort_flag = $died_on = undef;
281         thing($node);
282         ...if defined $abort_flag, it died on $died_on
283

SEE ALSO

285       HTML::Element
286

AUTHOR

288       Current maintainers:
289
290       ·   Christopher J. Madsen "<perl AT cjmweb.net>"
291
292       ·   Jeff Fearn "<jfearn AT cpan.org>"
293
294       Original HTML-Tree author:
295
296       ·   Gisle Aas
297
298       Former maintainers:
299
300       ·   Sean M. Burke
301
302       ·   Andy Lester
303
304       ·   Pete Krawczyk "<petek AT cpan.org>"
305
306       You can follow or contribute to HTML-Tree's development at
307       <https://github.com/kentfredric/HTML-Tree>.
308
310       Copyright 2000,2001 Sean M. Burke
311
312
313
314perl v5.30.1                      2020-01-30        HTML::Element::traverse(3)
Impressum