1HTML::Element::traverseU(s3e)r Contributed Perl DocumentaHtTiMoLn::Element::traverse(3)
2
3
4

NAME

6       HTML::Element::traverse - discussion of HTML::Element's traverse method
7

SYNOPSIS

9         # $element->traverse is unnecessary and obscure.
10         #   Don't use it in new code.
11

DESCRIPTION

13       "HTML::Element" provides a method "traverse" that traverses the tree
14       and calls user-specified callbacks for each node, in pre- or post-
15       order.  However, use of the method is quite superfluous: if you want to
16       recursively visit every node in the tree, it's almost always simpler to
17       write a subroutine does just that, than it is to bundle up the pre-
18       and/or post-order code in callbacks for the "traverse" method.
19

EXAMPLES

21       Suppose you want to traverse at/under a node $tree and give elements an
22       'id' attribute unless they already have one.
23
24       You can use the "traverse" method:
25
26         {
27           my $counter = 'x0000';
28           $start_node->traverse(
29             [ # Callbacks;
30               # pre-order callback:
31               sub {
32                 my $x = $_[0];
33                 $x->attr('id', $counter++) unless defined $x->attr('id');
34                 return HTML::Element::OK; # keep traversing
35               },
36               # post-order callback:
37               undef
38             ],
39             1, # don't call the callbacks for text nodes
40           );
41         }
42
43       or you can just be simple and clear (and not have to understand the
44       calling format for "traverse") by writing a sub that traverses the tree
45       by just calling itself:
46
47         {
48           my $counter = 'x0000';
49           sub give_id {
50             my $x = $_[0];
51             $x->attr('id', $counter++) unless defined $x->attr('id');
52             foreach my $c ($x->content_list) {
53               give_id($c) if ref $c; # ignore text nodes
54             }
55           };
56           give_id($start_node);
57         }
58
59       See, isn't that nice and clear?
60
61       But, if you really need to know:
62

THE TRAVERSE METHOD

64       The "traverse()" method is a general object-method for traversing a
65       tree or subtree and calling user-specified callbacks.  It accepts the
66       following syntaxes:
67
68       $h->traverse(\&callback)
69       or $h->traverse(\&callback, $ignore_text)
70       or $h->traverse( [\&pre_callback,\&post_callback] , $ignore_text)
71
72       These all mean to traverse the element and all of its children.  That
73       is, this method starts at node $h, "pre-order visits" $h, traverses its
74       children, and then will "post-order visit" $h.  "Visiting" means that
75       the callback routine is called, with these arguments:
76
77           $_[0] : the node (element or text segment),
78           $_[1] : a startflag, and
79           $_[2] : the depth
80
81       If the $ignore_text parameter is given and true, then the pre-order
82       call will not be happen for text content.
83
84       The startflag is 1 when we enter a node (i.e., in pre-order calls) and
85       0 when we leave the node (in post-order calls).
86
87       Note, however, that post-order calls don't happen for nodes that are
88       text segments or are elements that are prototypically empty (like "br",
89       "hr", etc.).
90
91       If we visit text nodes (i.e., unless $ignore_text is given and true),
92       then when text nodes are visited, we will also pass two extra arguments
93       to the callback:
94
95           $_[3] : the element that's the parent
96                    of this text node
97           $_[4] : the index of this text node
98                    in its parent's content list
99
100       Note that you can specify that the pre-order routine can be a different
101       routine from the post-order one:
102
103           $h->traverse( [\&pre_callback,\&post_callback], ...);
104
105       You can also specify that no post-order calls are to be made, by
106       providing a false value as the post-order routine:
107
108           $h->traverse([ \&pre_callback,0 ], ...);
109
110       And similarly for suppressing pre-order callbacks:
111
112           $h->traverse([ 0,\&post_callback ], ...);
113
114       Note that these two syntaxes specify the same operation:
115
116           $h->traverse([\&foo,\&foo], ...);
117           $h->traverse( \&foo       , ...);
118
119       The return values from calls to your pre- or post-order routines are
120       significant, and are used to control recursion into the tree.
121
122       These are the values you can return, listed in descending order of my
123       estimation of their usefulness:
124
125       HTML::Element::OK, 1, or any other true value
126           ...to keep on traversing.
127
128           Note that "HTML::Element::OK" et al are constants.  So if you're
129           running under "use strict" (as I hope you are), and you say:
130           "return HTML::Element::PRUEN" the compiler will flag this as an
131           error (an unallowable bareword, specifically), whereas if you spell
132           PRUNE correctly, the compiler will not complain.
133
134       undef, 0, '0', '', or HTML::Element::PRUNE
135           ...to block traversing under the current element's content.  (This
136           is ignored if received from a post-order callback, since by then
137           the recursion has already happened.)  If this is returned by a pre-
138           order callback, no post-order callback for the current node will
139           happen.  (Recall that if your callback exits with just "return;",
140           it is returning undef -- at least in scalar context, and "traverse"
141           always calls your callbacks in scalar context.)
142
143       HTML::Element::ABORT
144           ...to abort the whole traversal immediately.  This is often useful
145           when you're looking for just the first node in the tree that meets
146           some criterion of yours.
147
148       HTML::Element::PRUNE_UP
149           ...to abort continued traversal into this node and its parent node.
150           No post-order callback for the current or parent node will happen.
151
152       HTML::Element::PRUNE_SOFTLY
153           Like PRUNE, except that the post-order call for the current node is
154           not blocked.
155
156       Almost every task to do with extracting information from a tree can be
157       expressed in terms of traverse operations (usually in only one pass,
158       and usually paying attention to only pre-order, or to only post-order),
159       or operations based on traversing. (In fact, many of the other methods
160       in this class are basically calls to traverse() with particular
161       arguments.)
162
163       The source code for HTML::Element and HTML::TreeBuilder contain several
164       examples of the use of the "traverse" method to gather information
165       about the content of trees and subtrees.
166
167       (Note: you should not change the structure of a tree while you are
168       traversing it.)
169
170       [End of documentation for the "traverse()" method]
171
172   Traversing with Recursive Anonymous Routines
173       Now, if you've been reading Structure and Interpretation of Computer
174       Programs too much, maybe you even want a recursive lambda.  Go ahead:
175
176         {
177           my $counter = 'x0000';
178           my $give_id;
179           $give_id = sub {
180             my $x = $_[0];
181             $x->attr('id', $counter++) unless defined $x->attr('id');
182             foreach my $c ($x->content_list) {
183               $give_id->($c) if ref $c; # ignore text nodes
184             }
185           };
186           $give_id->($start_node);
187           undef $give_id;
188         }
189
190       It's a bit nutty, and it's still more concise than a call to the
191       "traverse" method!
192
193       It is left as an exercise to the reader to figure out how to do the
194       same thing without using a $give_id symbol at all.
195
196       It is also left as an exercise to the reader to figure out why I
197       undefine $give_id, above; and why I could achieved the same effect with
198       any of:
199
200           $give_id = 'I like pie!';
201          # or...
202           $give_id = [];
203          # or even;
204           $give_id = sub { print "Mmmm pie!\n" };
205
206       But not:
207
208           $give_id = sub { print "I'm $give_id and I like pie!\n" };
209          # nor...
210           $give_id = \$give_id;
211          # nor...
212           $give_id = { 'pie' => \$give_id, 'mode' => 'a la' };
213
214   Doing Recursive Things Iteratively
215       Note that you may at times see an iterative implementation of pre-order
216       traversal, like so:
217
218          {
219            my @to_do = ($tree); # start-node
220            while(@to_do) {
221              my $this = shift @to_do;
222
223              # "Visit" the node:
224              $this->attr('id', $counter++)
225               unless defined $this->attr('id');
226
227              unshift @to_do, grep ref $_, $this->content_list;
228               # Put children on the stack -- they'll be visited next
229            }
230          }
231
232       This can under certain circumstances be more efficient than just a
233       normal recursive routine, but at the cost of being rather obscure.  It
234       gains efficiency by avoiding the overhead of function-calling, but
235       since there are several method dispatches however you do it (to "attr"
236       and "content_list"), the overhead for a simple function call is
237       insignificant.
238
239   Pruning and Whatnot
240       The "traverse" method does have the fairly neat features of the
241       "ABORT", "PRUNE_UP" and "PRUNE_SOFTLY" signals.  None of these can be
242       implemented totally straightforwardly with recursive routines, but it
243       is quite possible.  "ABORT"-like behavior can be implemented either
244       with using non-local returning with "eval"/"die":
245
246         my $died_on; # if you need to know where...
247         sub thing {
248           ... visits $_[0]...
249           ... maybe set $died_on to $_[0] and die "ABORT_TRAV" ...
250           ... else call thing($child) for each child...
251           ...any post-order visiting $_[0]...
252         }
253         eval { thing($node) };
254         if($@) {
255           if($@ =~ m<^ABORT_TRAV>) {
256             ...it died (aborted) on $died_on...
257           } else {
258             die $@; # some REAL error happened
259           }
260         }
261
262       or you can just do it with flags:
263
264         my($abort_flag, $died_on);
265         sub thing {
266           ... visits $_[0]...
267           ... maybe set $abort_flag = 1; $died_on = $_[0]; return;
268           foreach my $c ($_[0]->content_list) {
269             thing($c);
270             return if $abort_flag;
271           }
272           ...any post-order visiting $_[0]...
273           return;
274         }
275
276         $abort_flag = $died_on = undef;
277         thing($node);
278         ...if defined $abort_flag, it died on $died_on
279

SEE ALSO

281       HTML::Element
282
284       Copyright 2000,2001 Sean M. Burke
285

AUTHOR

287       Sean M. Burke, <sburke@cpan.org>
288
289
290
291perl v5.10.1                      2006-08-04        HTML::Element::traverse(3)
Impressum