1HTML::Element::traverseU(s3e)r Contributed Perl DocumentaHtTiMoLn::Element::traverse(3)
2
3
4
6 HTML::Element::traverse - discussion of HTML::Element's traverse method
7
9 # $element->traverse is unnecessary and obscure.
10 # Don't use it in new code.
11
13 "HTML::Element" provides a method "traverse" that traverses the tree
14 and calls user-specified callbacks for each node, in pre- or
15 post-order. However, use of the method is quite superfluous: if you
16 want to recursively visit every node in the tree, it's almost always
17 simpler to write a subroutine does just that, than it is to bundle up
18 the pre- and/or post-order code in callbacks for the "traverse" method.
19
21 Suppose you want to traverse at/under a node $tree and give elements an
22 'id' attribute unless they already have one.
23
24 You can use the "traverse" method:
25
26 {
27 my $counter = 'x0000';
28 $start_node->traverse(
29 [ # Callbacks;
30 # pre-order callback:
31 sub {
32 my $x = $_[0];
33 $x->attr('id', $counter++) unless defined $x->attr('id');
34 return HTML::Element::OK; # keep traversing
35 },
36 # post-order callback:
37 undef
38 ],
39 1, # don't call the callbacks for text nodes
40 );
41 }
42
43 or you can just be simple and clear (and not have to understand the
44 calling format for "traverse") by writing a sub that traverses the tree
45 by just calling itself:
46
47 {
48 my $counter = 'x0000';
49 sub give_id {
50 my $x = $_[0];
51 $x->attr('id', $counter++) unless defined $x->attr('id');
52 foreach my $c ($x->content_list) {
53 give_id($c) if ref $c; # ignore text nodes
54 }
55 };
56 give_id($start_node);
57 }
58
59 See, isn't that nice and clear?
60
61 But, if you really need to know:
62
64 The "traverse()" method is a general object-method for traversing a
65 tree or subtree and calling user-specified callbacks. It accepts the
66 following syntaxes:
67
68 $h->traverse(\&callback)
69 or $h->traverse(\&callback, $ignore_text)
70 or $h->traverse( [\&pre_callback,\&post_callback] , $ignore_text)
71
72 These all mean to traverse the element and all of its children. That
73 is, this method starts at node $h, "pre-order visits" $h, traverses its
74 children, and then will "post-order visit" $h. "Visiting" means that
75 the callback routine is called, with these arguments:
76
77 $_[0] : the node (element or text segment),
78 $_[1] : a startflag, and
79 $_[2] : the depth
80
81 If the $ignore_text parameter is given and true, then the pre-order
82 call will not be happen for text content.
83
84 The startflag is 1 when we enter a node (i.e., in pre-order calls) and
85 0 when we leave the node (in post-order calls).
86
87 Note, however, that post-order calls don't happen for nodes that are
88 text segments or are elements that are prototypically empty (like "br",
89 "hr", etc.).
90
91 If we visit text nodes (i.e., unless $ignore_text is given and true),
92 then when text nodes are visited, we will also pass two extra arguments
93 to the callback:
94
95 $_[3] : the element that's the parent
96 of this text node
97 $_[4] : the index of this text node
98 in its parent's content list
99
100 Note that you can specify that the pre-order routine can be a different
101 routine from the post-order one:
102
103 $h->traverse( [\&pre_callback,\&post_callback], ...);
104
105 You can also specify that no post-order calls are to be made, by pro‐
106 viding a false value as the post-order routine:
107
108 $h->traverse([ \&pre_callback,0 ], ...);
109
110 And similarly for suppressing pre-order callbacks:
111
112 $h->traverse([ 0,\&post_callback ], ...);
113
114 Note that these two syntaxes specify the same operation:
115
116 $h->traverse([\&foo,\&foo], ...);
117 $h->traverse( \&foo , ...);
118
119 The return values from calls to your pre- or post-order routines are
120 significant, and are used to control recursion into the tree.
121
122 These are the values you can return, listed in descending order of my
123 estimation of their usefulness:
124
125 HTML::Element::OK, 1, or any other true value
126 ...to keep on traversing.
127
128 Note that "HTML::Element::OK" et al are constants. So if you're
129 running under "use strict" (as I hope you are), and you say:
130 "return HTML::Element::PRUEN" the compiler will flag this as an
131 error (an unallowable bareword, specifically), whereas if you spell
132 PRUNE correctly, the compiler will not complain.
133
134 undef, 0, '0', '', or HTML::Element::PRUNE
135 ...to block traversing under the current element's content. (This
136 is ignored if received from a post-order callback, since by then
137 the recursion has already happened.) If this is returned by a pre-
138 order callback, no post-order callback for the current node will
139 happen. (Recall that if your callback exits with just "return;",
140 it is returning undef -- at least in scalar context, and "traverse"
141 always calls your callbacks in scalar context.)
142
143 HTML::Element::ABORT
144 ...to abort the whole traversal immediately. This is often useful
145 when you're looking for just the first node in the tree that meets
146 some criterion of yours.
147
148 HTML::Element::PRUNE_UP
149 ...to abort continued traversal into this node and its parent node.
150 No post-order callback for the current or parent node will happen.
151
152 HTML::Element::PRUNE_SOFTLY
153 Like PRUNE, except that the post-order call for the current node is
154 not blocked.
155
156 Almost every task to do with extracting information from a tree can be
157 expressed in terms of traverse operations (usually in only one pass,
158 and usually paying attention to only pre-order, or to only post-order),
159 or operations based on traversing. (In fact, many of the other methods
160 in this class are basically calls to traverse() with particular argu‐
161 ments.)
162
163 The source code for HTML::Element and HTML::TreeBuilder contain several
164 examples of the use of the "traverse" method to gather information
165 about the content of trees and subtrees.
166
167 (Note: you should not change the structure of a tree while you are
168 traversing it.)
169
170 [End of documentation for the "traverse()" method]
171
172 Traversing with Recursive Anonymous Routines
173
174 Now, if you've been reading Structure and Interpretation of Computer
175 Programs too much, maybe you even want a recursive lambda. Go ahead:
176
177 {
178 my $counter = 'x0000';
179 my $give_id;
180 $give_id = sub {
181 my $x = $_[0];
182 $x->attr('id', $counter++) unless defined $x->attr('id');
183 foreach my $c ($x->content_list) {
184 $give_id->($c) if ref $c; # ignore text nodes
185 }
186 };
187 $give_id->($start_node);
188 undef $give_id;
189 }
190
191 It's a bit nutty, and it's still more concise than a call to the "tra‐
192 verse" method!
193
194 It is left as an exercise to the reader to figure out how to do the
195 same thing without using a $give_id symbol at all.
196
197 It is also left as an exercise to the reader to figure out why I unde‐
198 fine $give_id, above; and why I could achieved the same effect with any
199 of:
200
201 $give_id = 'I like pie!';
202 # or...
203 $give_id = [];
204 # or even;
205 $give_id = sub { print "Mmmm pie!\n" };
206
207 But not:
208
209 $give_id = sub { print "I'm $give_id and I like pie!\n" };
210 # nor...
211 $give_id = \$give_id;
212 # nor...
213 $give_id = { 'pie' => \$give_id, 'mode' => 'a la' };
214
215 Doing Recursive Things Iteratively
216
217 Note that you may at times see an iterative implementation of pre-order
218 traversal, like so:
219
220 {
221 my @to_do = ($tree); # start-node
222 while(@to_do) {
223 my $this = shift @to_do;
224
225 # "Visit" the node:
226 $this->attr('id', $counter++)
227 unless defined $this->attr('id');
228
229 unshift @to_do, grep ref $_, $this->content_list;
230 # Put children on the stack -- they'll be visited next
231 }
232 }
233
234 This can under certain circumstances be more efficient than just a nor‐
235 mal recursive routine, but at the cost of being rather obscure. It
236 gains efficiency by avoiding the overhead of function-calling, but
237 since there are several method dispatches however you do it (to "attr"
238 and "content_list"), the overhead for a simple function call is
239 insignificant.
240
241 Pruning and Whatnot
242
243 The "traverse" method does have the fairly neat features of the
244 "ABORT", "PRUNE_UP" and "PRUNE_SOFTLY" signals. None of these can be
245 implemented totally straightforwardly with recursive routines, but it
246 is quite possible. "ABORT"-like behavior can be implemented either
247 with using non-local returning with "eval"/"die":
248
249 my $died_on; # if you need to know where...
250 sub thing {
251 ... visits $_[0]...
252 ... maybe set $died_on to $_[0] and die "ABORT_TRAV" ...
253 ... else call thing($child) for each child...
254 ...any post-order visiting $_[0]...
255 }
256 eval { thing($node) };
257 if($@) {
258 if($@ =~ m<^ABORT_TRAV>) {
259 ...it died (aborted) on $died_on...
260 } else {
261 die $@; # some REAL error happened
262 }
263 }
264
265 or you can just do it with flags:
266
267 my($abort_flag, $died_on);
268 sub thing {
269 ... visits $_[0]...
270 ... maybe set $abort_flag = 1; $died_on = $_[0]; return;
271 foreach my $c ($_[0]->content_list) {
272 thing($c);
273 return if $abort_flag;
274 }
275 ...any post-order visiting $_[0]...
276 return;
277 }
278
279 $abort_flag = $died_on = undef;
280 thing($node);
281 ...if defined $abort_flag, it died on $died_on
282
284 HTML::Element
285
287 Copyright 2000,2001 Sean M. Burke
288
290 Sean M. Burke, <sburke@cpan.org>
291
292
293
294perl v5.8.8 2006-08-04 HTML::Element::traverse(3)