1HTML::Element::traverseU(s3e)r Contributed Perl DocumentaHtTiMoLn::Element::traverse(3)
2
3
4
6 HTML::Element::traverse - discussion of HTML::Element's traverse method
7
9 # $element->traverse is unnecessary and obscure.
10 # Don't use it in new code.
11
13 "HTML::Element" provides a method "traverse" that traverses the tree
14 and calls user-specified callbacks for each node, in pre- or post-
15 order. However, use of the method is quite superfluous: if you want to
16 recursively visit every node in the tree, it's almost always simpler to
17 write a subroutine does just that, than it is to bundle up the pre-
18 and/or post-order code in callbacks for the "traverse" method.
19
21 Suppose you want to traverse at/under a node $tree and give elements an
22 'id' attribute unless they already have one.
23
24 You can use the "traverse" method:
25
26 {
27 my $counter = 'x0000';
28 $start_node->traverse(
29 [ # Callbacks;
30 # pre-order callback:
31 sub {
32 my $x = $_[0];
33 $x->attr('id', $counter++) unless defined $x->attr('id');
34 return HTML::Element::OK; # keep traversing
35 },
36 # post-order callback:
37 undef
38 ],
39 1, # don't call the callbacks for text nodes
40 );
41 }
42
43 or you can just be simple and clear (and not have to understand the
44 calling format for "traverse") by writing a sub that traverses the tree
45 by just calling itself:
46
47 {
48 my $counter = 'x0000';
49 sub give_id {
50 my $x = $_[0];
51 $x->attr('id', $counter++) unless defined $x->attr('id');
52 foreach my $c ($x->content_list) {
53 give_id($c) if ref $c; # ignore text nodes
54 }
55 };
56 give_id($start_node);
57 }
58
59 See, isn't that nice and clear?
60
61 But, if you really need to know:
62
64 The "traverse()" method is a general object-method for traversing a
65 tree or subtree and calling user-specified callbacks. It accepts the
66 following syntaxes:
67
68 $h->traverse(\&callback)
69 or $h->traverse(\&callback, $ignore_text)
70 or $h->traverse( [\&pre_callback,\&post_callback] , $ignore_text)
71
72 These all mean to traverse the element and all of its children. That
73 is, this method starts at node $h, "pre-order visits" $h, traverses its
74 children, and then will "post-order visit" $h. "Visiting" means that
75 the callback routine is called, with these arguments:
76
77 $_[0] : the node (element or text segment),
78 $_[1] : a startflag, and
79 $_[2] : the depth
80
81 If the $ignore_text parameter is given and true, then the pre-order
82 call will not be happen for text content.
83
84 The startflag is 1 when we enter a node (i.e., in pre-order calls) and
85 0 when we leave the node (in post-order calls).
86
87 Note, however, that post-order calls don't happen for nodes that are
88 text segments or are elements that are prototypically empty (like "br",
89 "hr", etc.).
90
91 If we visit text nodes (i.e., unless $ignore_text is given and true),
92 then when text nodes are visited, we will also pass two extra arguments
93 to the callback:
94
95 $_[3] : the element that's the parent
96 of this text node
97 $_[4] : the index of this text node
98 in its parent's content list
99
100 Note that you can specify that the pre-order routine can be a different
101 routine from the post-order one:
102
103 $h->traverse( [\&pre_callback,\&post_callback], ...);
104
105 You can also specify that no post-order calls are to be made, by
106 providing a false value as the post-order routine:
107
108 $h->traverse([ \&pre_callback,0 ], ...);
109
110 And similarly for suppressing pre-order callbacks:
111
112 $h->traverse([ 0,\&post_callback ], ...);
113
114 Note that these two syntaxes specify the same operation:
115
116 $h->traverse([\&foo,\&foo], ...);
117 $h->traverse( \&foo , ...);
118
119 The return values from calls to your pre- or post-order routines are
120 significant, and are used to control recursion into the tree.
121
122 These are the values you can return, listed in descending order of my
123 estimation of their usefulness:
124
125 HTML::Element::OK, 1, or any other true value
126 ...to keep on traversing.
127
128 Note that "HTML::Element::OK" et al are constants. So if you're
129 running under "use strict" (as I hope you are), and you say:
130 "return HTML::Element::PRUEN" the compiler will flag this as an
131 error (an unallowable bareword, specifically), whereas if you spell
132 PRUNE correctly, the compiler will not complain.
133
134 undef, 0, '0', '', or HTML::Element::PRUNE
135 ...to block traversing under the current element's content. (This
136 is ignored if received from a post-order callback, since by then
137 the recursion has already happened.) If this is returned by a pre-
138 order callback, no post-order callback for the current node will
139 happen. (Recall that if your callback exits with just "return;",
140 it is returning undef -- at least in scalar context, and "traverse"
141 always calls your callbacks in scalar context.)
142
143 HTML::Element::ABORT
144 ...to abort the whole traversal immediately. This is often useful
145 when you're looking for just the first node in the tree that meets
146 some criterion of yours.
147
148 HTML::Element::PRUNE_UP
149 ...to abort continued traversal into this node and its parent node.
150 No post-order callback for the current or parent node will happen.
151
152 HTML::Element::PRUNE_SOFTLY
153 Like PRUNE, except that the post-order call for the current node is
154 not blocked.
155
156 Almost every task to do with extracting information from a tree can be
157 expressed in terms of traverse operations (usually in only one pass,
158 and usually paying attention to only pre-order, or to only post-order),
159 or operations based on traversing. (In fact, many of the other methods
160 in this class are basically calls to traverse() with particular
161 arguments.)
162
163 The source code for HTML::Element and HTML::TreeBuilder contain several
164 examples of the use of the "traverse" method to gather information
165 about the content of trees and subtrees.
166
167 (Note: you should not change the structure of a tree while you are
168 traversing it.)
169
170 [End of documentation for the "traverse()" method]
171
172 Traversing with Recursive Anonymous Routines
173 Now, if you've been reading Structure and Interpretation of Computer
174 Programs too much, maybe you even want a recursive lambda. Go ahead:
175
176 {
177 my $counter = 'x0000';
178 my $give_id;
179 $give_id = sub {
180 my $x = $_[0];
181 $x->attr('id', $counter++) unless defined $x->attr('id');
182 foreach my $c ($x->content_list) {
183 $give_id->($c) if ref $c; # ignore text nodes
184 }
185 };
186 $give_id->($start_node);
187 undef $give_id;
188 }
189
190 It's a bit nutty, and it's still more concise than a call to the
191 "traverse" method!
192
193 It is left as an exercise to the reader to figure out how to do the
194 same thing without using a $give_id symbol at all.
195
196 It is also left as an exercise to the reader to figure out why I
197 undefine $give_id, above; and why I could achieved the same effect with
198 any of:
199
200 $give_id = 'I like pie!';
201 # or...
202 $give_id = [];
203 # or even;
204 $give_id = sub { print "Mmmm pie!\n" };
205
206 But not:
207
208 $give_id = sub { print "I'm $give_id and I like pie!\n" };
209 # nor...
210 $give_id = \$give_id;
211 # nor...
212 $give_id = { 'pie' => \$give_id, 'mode' => 'a la' };
213
214 Doing Recursive Things Iteratively
215 Note that you may at times see an iterative implementation of pre-order
216 traversal, like so:
217
218 {
219 my @to_do = ($tree); # start-node
220 while(@to_do) {
221 my $this = shift @to_do;
222
223 # "Visit" the node:
224 $this->attr('id', $counter++)
225 unless defined $this->attr('id');
226
227 unshift @to_do, grep ref $_, $this->content_list;
228 # Put children on the stack -- they'll be visited next
229 }
230 }
231
232 This can under certain circumstances be more efficient than just a
233 normal recursive routine, but at the cost of being rather obscure. It
234 gains efficiency by avoiding the overhead of function-calling, but
235 since there are several method dispatches however you do it (to "attr"
236 and "content_list"), the overhead for a simple function call is
237 insignificant.
238
239 Pruning and Whatnot
240 The "traverse" method does have the fairly neat features of the
241 "ABORT", "PRUNE_UP" and "PRUNE_SOFTLY" signals. None of these can be
242 implemented totally straightforwardly with recursive routines, but it
243 is quite possible. "ABORT"-like behavior can be implemented either
244 with using non-local returning with "eval"/"die":
245
246 my $died_on; # if you need to know where...
247 sub thing {
248 ... visits $_[0]...
249 ... maybe set $died_on to $_[0] and die "ABORT_TRAV" ...
250 ... else call thing($child) for each child...
251 ...any post-order visiting $_[0]...
252 }
253 eval { thing($node) };
254 if($@) {
255 if($@ =~ m<^ABORT_TRAV>) {
256 ...it died (aborted) on $died_on...
257 } else {
258 die $@; # some REAL error happened
259 }
260 }
261
262 or you can just do it with flags:
263
264 my($abort_flag, $died_on);
265 sub thing {
266 ... visits $_[0]...
267 ... maybe set $abort_flag = 1; $died_on = $_[0]; return;
268 foreach my $c ($_[0]->content_list) {
269 thing($c);
270 return if $abort_flag;
271 }
272 ...any post-order visiting $_[0]...
273 return;
274 }
275
276 $abort_flag = $died_on = undef;
277 thing($node);
278 ...if defined $abort_flag, it died on $died_on
279
281 HTML::Element
282
284 Copyright 2000,2001 Sean M. Burke
285
287 Sean M. Burke, <sburke@cpan.org>
288
289
290
291perl v5.10.1 2006-08-04 HTML::Element::traverse(3)