1HTML::Element::traverseU(s3e)r Contributed Perl DocumentaHtTiMoLn::Element::traverse(3)
2
3
4

NAME

6       HTML::Element::traverse - discussion of HTML::Element's traverse method
7

SYNOPSIS

9         # $element->traverse is unnecessary and obscure.
10         #   Don't use it in new code.
11

DESCRIPTION

13       "HTML::Element" provides a method "traverse" that traverses the tree
14       and calls user-specified callbacks for each node, in pre- or
15       post-order.  However, use of the method is quite superfluous: if you
16       want to recursively visit every node in the tree, it's almost always
17       simpler to write a subroutine does just that, than it is to bundle up
18       the pre- and/or post-order code in callbacks for the "traverse" method.
19

EXAMPLES

21       Suppose you want to traverse at/under a node $tree and give elements an
22       'id' attribute unless they already have one.
23
24       You can use the "traverse" method:
25
26         {
27           my $counter = 'x0000';
28           $start_node->traverse(
29             [ # Callbacks;
30               # pre-order callback:
31               sub {
32                 my $x = $_[0];
33                 $x->attr('id', $counter++) unless defined $x->attr('id');
34                 return HTML::Element::OK; # keep traversing
35               },
36               # post-order callback:
37               undef
38             ],
39             1, # don't call the callbacks for text nodes
40           );
41         }
42
43       or you can just be simple and clear (and not have to understand the
44       calling format for "traverse") by writing a sub that traverses the tree
45       by just calling itself:
46
47         {
48           my $counter = 'x0000';
49           sub give_id {
50             my $x = $_[0];
51             $x->attr('id', $counter++) unless defined $x->attr('id');
52             foreach my $c ($x->content_list) {
53               give_id($c) if ref $c; # ignore text nodes
54             }
55           };
56           give_id($start_node);
57         }
58
59       See, isn't that nice and clear?
60
61       But, if you really need to know:
62

THE TRAVERSE METHOD

64       The "traverse()" method is a general object-method for traversing a
65       tree or subtree and calling user-specified callbacks.  It accepts the
66       following syntaxes:
67
68       $h->traverse(\&callback)
69       or $h->traverse(\&callback, $ignore_text)
70       or $h->traverse( [\&pre_callback,\&post_callback] , $ignore_text)
71
72       These all mean to traverse the element and all of its children.  That
73       is, this method starts at node $h, "pre-order visits" $h, traverses its
74       children, and then will "post-order visit" $h.  "Visiting" means that
75       the callback routine is called, with these arguments:
76
77           $_[0] : the node (element or text segment),
78           $_[1] : a startflag, and
79           $_[2] : the depth
80
81       If the $ignore_text parameter is given and true, then the pre-order
82       call will not be happen for text content.
83
84       The startflag is 1 when we enter a node (i.e., in pre-order calls) and
85       0 when we leave the node (in post-order calls).
86
87       Note, however, that post-order calls don't happen for nodes that are
88       text segments or are elements that are prototypically empty (like "br",
89       "hr", etc.).
90
91       If we visit text nodes (i.e., unless $ignore_text is given and true),
92       then when text nodes are visited, we will also pass two extra arguments
93       to the callback:
94
95           $_[3] : the element that's the parent
96                    of this text node
97           $_[4] : the index of this text node
98                    in its parent's content list
99
100       Note that you can specify that the pre-order routine can be a different
101       routine from the post-order one:
102
103           $h->traverse( [\&pre_callback,\&post_callback], ...);
104
105       You can also specify that no post-order calls are to be made, by pro‐
106       viding a false value as the post-order routine:
107
108           $h->traverse([ \&pre_callback,0 ], ...);
109
110       And similarly for suppressing pre-order callbacks:
111
112           $h->traverse([ 0,\&post_callback ], ...);
113
114       Note that these two syntaxes specify the same operation:
115
116           $h->traverse([\&foo,\&foo], ...);
117           $h->traverse( \&foo       , ...);
118
119       The return values from calls to your pre- or post-order routines are
120       significant, and are used to control recursion into the tree.
121
122       These are the values you can return, listed in descending order of my
123       estimation of their usefulness:
124
125       HTML::Element::OK, 1, or any other true value
126           ...to keep on traversing.
127
128           Note that "HTML::Element::OK" et al are constants.  So if you're
129           running under "use strict" (as I hope you are), and you say:
130           "return HTML::Element::PRUEN" the compiler will flag this as an
131           error (an unallowable bareword, specifically), whereas if you spell
132           PRUNE correctly, the compiler will not complain.
133
134       undef, 0, '0', '', or HTML::Element::PRUNE
135           ...to block traversing under the current element's content.  (This
136           is ignored if received from a post-order callback, since by then
137           the recursion has already happened.)  If this is returned by a pre-
138           order callback, no post-order callback for the current node will
139           happen.  (Recall that if your callback exits with just "return;",
140           it is returning undef -- at least in scalar context, and "traverse"
141           always calls your callbacks in scalar context.)
142
143       HTML::Element::ABORT
144           ...to abort the whole traversal immediately.  This is often useful
145           when you're looking for just the first node in the tree that meets
146           some criterion of yours.
147
148       HTML::Element::PRUNE_UP
149           ...to abort continued traversal into this node and its parent node.
150           No post-order callback for the current or parent node will happen.
151
152       HTML::Element::PRUNE_SOFTLY
153           Like PRUNE, except that the post-order call for the current node is
154           not blocked.
155
156       Almost every task to do with extracting information from a tree can be
157       expressed in terms of traverse operations (usually in only one pass,
158       and usually paying attention to only pre-order, or to only post-order),
159       or operations based on traversing. (In fact, many of the other methods
160       in this class are basically calls to traverse() with particular argu‐
161       ments.)
162
163       The source code for HTML::Element and HTML::TreeBuilder contain several
164       examples of the use of the "traverse" method to gather information
165       about the content of trees and subtrees.
166
167       (Note: you should not change the structure of a tree while you are
168       traversing it.)
169
170       [End of documentation for the "traverse()" method]
171
172       Traversing with Recursive Anonymous Routines
173
174       Now, if you've been reading Structure and Interpretation of Computer
175       Programs too much, maybe you even want a recursive lambda.  Go ahead:
176
177         {
178           my $counter = 'x0000';
179           my $give_id;
180           $give_id = sub {
181             my $x = $_[0];
182             $x->attr('id', $counter++) unless defined $x->attr('id');
183             foreach my $c ($x->content_list) {
184               $give_id->($c) if ref $c; # ignore text nodes
185             }
186           };
187           $give_id->($start_node);
188           undef $give_id;
189         }
190
191       It's a bit nutty, and it's still more concise than a call to the "tra‐
192       verse" method!
193
194       It is left as an exercise to the reader to figure out how to do the
195       same thing without using a $give_id symbol at all.
196
197       It is also left as an exercise to the reader to figure out why I unde‐
198       fine $give_id, above; and why I could achieved the same effect with any
199       of:
200
201           $give_id = 'I like pie!';
202          # or...
203           $give_id = [];
204          # or even;
205           $give_id = sub { print "Mmmm pie!\n" };
206
207       But not:
208
209           $give_id = sub { print "I'm $give_id and I like pie!\n" };
210          # nor...
211           $give_id = \$give_id;
212          # nor...
213           $give_id = { 'pie' => \$give_id, 'mode' => 'a la' };
214
215       Doing Recursive Things Iteratively
216
217       Note that you may at times see an iterative implementation of pre-order
218       traversal, like so:
219
220          {
221            my @to_do = ($tree); # start-node
222            while(@to_do) {
223              my $this = shift @to_do;
224
225              # "Visit" the node:
226              $this->attr('id', $counter++)
227               unless defined $this->attr('id');
228
229              unshift @to_do, grep ref $_, $this->content_list;
230               # Put children on the stack -- they'll be visited next
231            }
232          }
233
234       This can under certain circumstances be more efficient than just a nor‐
235       mal recursive routine, but at the cost of being rather obscure.  It
236       gains efficiency by avoiding the overhead of function-calling, but
237       since there are several method dispatches however you do it (to "attr"
238       and "content_list"), the overhead for a simple function call is
239       insignificant.
240
241       Pruning and Whatnot
242
243       The "traverse" method does have the fairly neat features of the
244       "ABORT", "PRUNE_UP" and "PRUNE_SOFTLY" signals.  None of these can be
245       implemented totally straightforwardly with recursive routines, but it
246       is quite possible.  "ABORT"-like behavior can be implemented either
247       with using non-local returning with "eval"/"die":
248
249         my $died_on; # if you need to know where...
250         sub thing {
251           ... visits $_[0]...
252           ... maybe set $died_on to $_[0] and die "ABORT_TRAV" ...
253           ... else call thing($child) for each child...
254           ...any post-order visiting $_[0]...
255         }
256         eval { thing($node) };
257         if($@) {
258           if($@ =~ m<^ABORT_TRAV>) {
259             ...it died (aborted) on $died_on...
260           } else {
261             die $@; # some REAL error happened
262           }
263         }
264
265       or you can just do it with flags:
266
267         my($abort_flag, $died_on);
268         sub thing {
269           ... visits $_[0]...
270           ... maybe set $abort_flag = 1; $died_on = $_[0]; return;
271           foreach my $c ($_[0]->content_list) {
272             thing($c);
273             return if $abort_flag;
274           }
275           ...any post-order visiting $_[0]...
276           return;
277         }
278
279         $abort_flag = $died_on = undef;
280         thing($node);
281         ...if defined $abort_flag, it died on $died_on
282

SEE ALSO

284       HTML::Element
285
287       Copyright 2000,2001 Sean M. Burke
288

AUTHOR

290       Sean M. Burke, <sburke@cpan.org>
291
292
293
294perl v5.8.8                       2006-08-04        HTML::Element::traverse(3)
Impressum