1HTML::Element::traverseU(s3e)r Contributed Perl DocumentaHtTiMoLn::Element::traverse(3)
2
3
4
6 HTML::Element::traverse - discussion of HTML::Element's traverse method
7
9 This document describes version 5.07 of HTML::Element::traverse,
10 released August 31, 2017 as part of HTML-Tree.
11
13 # $element->traverse is unnecessary and obscure.
14 # Don't use it in new code.
15
17 "HTML::Element" provides a method "traverse" that traverses the tree
18 and calls user-specified callbacks for each node, in pre- or post-
19 order. However, use of the method is quite superfluous: if you want to
20 recursively visit every node in the tree, it's almost always simpler to
21 write a subroutine does just that, than it is to bundle up the pre-
22 and/or post-order code in callbacks for the "traverse" method.
23
25 Suppose you want to traverse at/under a node $tree and give elements an
26 'id' attribute unless they already have one.
27
28 You can use the "traverse" method:
29
30 {
31 my $counter = 'x0000';
32 $start_node->traverse(
33 [ # Callbacks;
34 # pre-order callback:
35 sub {
36 my $x = $_[0];
37 $x->attr('id', $counter++) unless defined $x->attr('id');
38 return HTML::Element::OK; # keep traversing
39 },
40 # post-order callback:
41 undef
42 ],
43 1, # don't call the callbacks for text nodes
44 );
45 }
46
47 or you can just be simple and clear (and not have to understand the
48 calling format for "traverse") by writing a sub that traverses the tree
49 by just calling itself:
50
51 {
52 my $counter = 'x0000';
53 sub give_id {
54 my $x = $_[0];
55 $x->attr('id', $counter++) unless defined $x->attr('id');
56 foreach my $c ($x->content_list) {
57 give_id($c) if ref $c; # ignore text nodes
58 }
59 };
60 give_id($start_node);
61 }
62
63 See, isn't that nice and clear?
64
65 But, if you really need to know:
66
68 The "traverse()" method is a general object-method for traversing a
69 tree or subtree and calling user-specified callbacks. It accepts the
70 following syntaxes:
71
72 $h->traverse(\&callback)
73 or $h->traverse(\&callback, $ignore_text)
74 or $h->traverse( [\&pre_callback,\&post_callback] , $ignore_text)
75
76 These all mean to traverse the element and all of its children. That
77 is, this method starts at node $h, "pre-order visits" $h, traverses its
78 children, and then will "post-order visit" $h. "Visiting" means that
79 the callback routine is called, with these arguments:
80
81 $_[0] : the node (element or text segment),
82 $_[1] : a startflag, and
83 $_[2] : the depth
84
85 If the $ignore_text parameter is given and true, then the pre-order
86 call will not be happen for text content.
87
88 The startflag is 1 when we enter a node (i.e., in pre-order calls) and
89 0 when we leave the node (in post-order calls).
90
91 Note, however, that post-order calls don't happen for nodes that are
92 text segments or are elements that are prototypically empty (like "br",
93 "hr", etc.).
94
95 If we visit text nodes (i.e., unless $ignore_text is given and true),
96 then when text nodes are visited, we will also pass two extra arguments
97 to the callback:
98
99 $_[3] : the element that's the parent
100 of this text node
101 $_[4] : the index of this text node
102 in its parent's content list
103
104 Note that you can specify that the pre-order routine can be a different
105 routine from the post-order one:
106
107 $h->traverse( [\&pre_callback,\&post_callback], ...);
108
109 You can also specify that no post-order calls are to be made, by
110 providing a false value as the post-order routine:
111
112 $h->traverse([ \&pre_callback,0 ], ...);
113
114 And similarly for suppressing pre-order callbacks:
115
116 $h->traverse([ 0,\&post_callback ], ...);
117
118 Note that these two syntaxes specify the same operation:
119
120 $h->traverse([\&foo,\&foo], ...);
121 $h->traverse( \&foo , ...);
122
123 The return values from calls to your pre- or post-order routines are
124 significant, and are used to control recursion into the tree.
125
126 These are the values you can return, listed in descending order of my
127 estimation of their usefulness:
128
129 HTML::Element::OK, 1, or any other true value
130 ...to keep on traversing.
131
132 Note that "HTML::Element::OK" et al are constants. So if you're
133 running under "use strict" (as I hope you are), and you say:
134 "return HTML::Element::PRUEN" the compiler will flag this as an
135 error (an unallowable bareword, specifically), whereas if you spell
136 PRUNE correctly, the compiler will not complain.
137
138 undef, 0, '0', '', or HTML::Element::PRUNE
139 ...to block traversing under the current element's content. (This
140 is ignored if received from a post-order callback, since by then
141 the recursion has already happened.) If this is returned by a pre-
142 order callback, no post-order callback for the current node will
143 happen. (Recall that if your callback exits with just "return;",
144 it is returning undef -- at least in scalar context, and "traverse"
145 always calls your callbacks in scalar context.)
146
147 HTML::Element::ABORT
148 ...to abort the whole traversal immediately. This is often useful
149 when you're looking for just the first node in the tree that meets
150 some criterion of yours.
151
152 HTML::Element::PRUNE_UP
153 ...to abort continued traversal into this node and its parent node.
154 No post-order callback for the current or parent node will happen.
155
156 HTML::Element::PRUNE_SOFTLY
157 Like PRUNE, except that the post-order call for the current node is
158 not blocked.
159
160 Almost every task to do with extracting information from a tree can be
161 expressed in terms of traverse operations (usually in only one pass,
162 and usually paying attention to only pre-order, or to only post-order),
163 or operations based on traversing. (In fact, many of the other methods
164 in this class are basically calls to traverse() with particular
165 arguments.)
166
167 The source code for HTML::Element and HTML::TreeBuilder contain several
168 examples of the use of the "traverse" method to gather information
169 about the content of trees and subtrees.
170
171 (Note: you should not change the structure of a tree while you are
172 traversing it.)
173
174 [End of documentation for the "traverse()" method]
175
176 Traversing with Recursive Anonymous Routines
177 Now, if you've been reading Structure and Interpretation of Computer
178 Programs too much, maybe you even want a recursive lambda. Go ahead:
179
180 {
181 my $counter = 'x0000';
182 my $give_id;
183 $give_id = sub {
184 my $x = $_[0];
185 $x->attr('id', $counter++) unless defined $x->attr('id');
186 foreach my $c ($x->content_list) {
187 $give_id->($c) if ref $c; # ignore text nodes
188 }
189 };
190 $give_id->($start_node);
191 undef $give_id;
192 }
193
194 It's a bit nutty, and it's still more concise than a call to the
195 "traverse" method!
196
197 It is left as an exercise to the reader to figure out how to do the
198 same thing without using a $give_id symbol at all.
199
200 It is also left as an exercise to the reader to figure out why I
201 undefine $give_id, above; and why I could achieved the same effect with
202 any of:
203
204 $give_id = 'I like pie!';
205 # or...
206 $give_id = [];
207 # or even;
208 $give_id = sub { print "Mmmm pie!\n" };
209
210 But not:
211
212 $give_id = sub { print "I'm $give_id and I like pie!\n" };
213 # nor...
214 $give_id = \$give_id;
215 # nor...
216 $give_id = { 'pie' => \$give_id, 'mode' => 'a la' };
217
218 Doing Recursive Things Iteratively
219 Note that you may at times see an iterative implementation of pre-order
220 traversal, like so:
221
222 {
223 my @to_do = ($tree); # start-node
224 while(@to_do) {
225 my $this = shift @to_do;
226
227 # "Visit" the node:
228 $this->attr('id', $counter++)
229 unless defined $this->attr('id');
230
231 unshift @to_do, grep ref $_, $this->content_list;
232 # Put children on the stack -- they'll be visited next
233 }
234 }
235
236 This can under certain circumstances be more efficient than just a
237 normal recursive routine, but at the cost of being rather obscure. It
238 gains efficiency by avoiding the overhead of function-calling, but
239 since there are several method dispatches however you do it (to "attr"
240 and "content_list"), the overhead for a simple function call is
241 insignificant.
242
243 Pruning and Whatnot
244 The "traverse" method does have the fairly neat features of the
245 "ABORT", "PRUNE_UP" and "PRUNE_SOFTLY" signals. None of these can be
246 implemented totally straightforwardly with recursive routines, but it
247 is quite possible. "ABORT"-like behavior can be implemented either
248 with using non-local returning with "eval"/"die":
249
250 my $died_on; # if you need to know where...
251 sub thing {
252 ... visits $_[0]...
253 ... maybe set $died_on to $_[0] and die "ABORT_TRAV" ...
254 ... else call thing($child) for each child...
255 ...any post-order visiting $_[0]...
256 }
257 eval { thing($node) };
258 if($@) {
259 if($@ =~ m<^ABORT_TRAV>) {
260 ...it died (aborted) on $died_on...
261 } else {
262 die $@; # some REAL error happened
263 }
264 }
265
266 or you can just do it with flags:
267
268 my($abort_flag, $died_on);
269 sub thing {
270 ... visits $_[0]...
271 ... maybe set $abort_flag = 1; $died_on = $_[0]; return;
272 foreach my $c ($_[0]->content_list) {
273 thing($c);
274 return if $abort_flag;
275 }
276 ...any post-order visiting $_[0]...
277 return;
278 }
279
280 $abort_flag = $died_on = undef;
281 thing($node);
282 ...if defined $abort_flag, it died on $died_on
283
285 HTML::Element
286
288 Current maintainers:
289
290 • Christopher J. Madsen "<perl AT cjmweb.net>"
291
292 • Jeff Fearn "<jfearn AT cpan.org>"
293
294 Original HTML-Tree author:
295
296 • Gisle Aas
297
298 Former maintainers:
299
300 • Sean M. Burke
301
302 • Andy Lester
303
304 • Pete Krawczyk "<petek AT cpan.org>"
305
306 You can follow or contribute to HTML-Tree's development at
307 <https://github.com/kentfredric/HTML-Tree>.
308
310 Copyright 2000,2001 Sean M. Burke
311
312
313
314perl v5.36.0 2022-07-22 HTML::Element::traverse(3)