HTML::Tree::Scanning(3pm)

HTML::Tree::Scanning(3)  User Contributed Perl Documentation  HTML::Tree::Scanning(3)

NAME

6       HTML::Tree::Scanning -- article: "Scanning HTML"
7

SYNOPSIS

         # This is an article, not a module.
10

DESCRIPTION

12       The following article by Sean M. Burke first appeared in The Perl
13       Journal #19 and is copyright 2000 The Perl Journal. It appears courtesy
14       of Jon Orwant and The Perl Journal.  This document may be distributed
15       under the same terms as Perl itself.
16

Scanning HTML

18       -- Sean M. Burke
19
20       In The Perl Journal issue 17, Ken MacFarlane's article "Parsing HTML
21       with HTML::Parser" describes how the HTML::Parser module scans HTML
22       source as a stream of start-tags, end-tags, text, comments, etc.  In
23       TPJ #18, my "Trees" article kicked around the idea of tree-shaped data
24       structures.  Now I'll try to tie it together, in a discussion of HTML
25       trees.
26
27       The CPAN module HTML::TreeBuilder takes the tags that HTML::Parser
28       picks out, and builds a parse tree -- a tree-shaped network of
29       objects...
30
31           Footnote: And if you need a quick explanation of objects, see my
32           TPJ17 article "A User's View of Object-Oriented Modules"; or go
33           whole hog and get Damian Conway's excellent book Object-Oriented
34           Perl, from Manning Publications.
35
36       ...representing the structured content of the HTML document.  And once
37       the document is parsed as a tree, you'll find the common tasks of
38       extracting data from that HTML document/tree to be quite
39       straightforward.
40
41   HTML::Parser, HTML::TreeBuilder, and HTML::Element
42       You use HTML::TreeBuilder to make a parse tree out of an HTML source
43       file, by simply saying:
44
         use HTML::TreeBuilder;
         my $tree = HTML::TreeBuilder->new();
         $tree->parse_file('foo.html');
48
49       and then $tree contains a parse tree built from the HTML source from
50       the file "foo.html".  The way this parse tree is represented is with a
51       network of objects -- $tree is the root, an element with tag-name
52       "html", and its children typically include a "head" and "body" element,
53       and so on.  Elements in the tree are objects of the class
54       HTML::Element.
55
56       So, if you take this source:
57
58         <html><head><title>Doc 1</title></head>
59         <body>
60         Stuff <hr> 2000-08-17
61         </body></html>
62
63       and feed it to HTML::TreeBuilder, it'll return a tree of objects that
64       looks like this:
65
66                      html
67                    /      \
68                head        body
69               /          /   |  \
70            title    "Stuff"  hr  "2000-08-17"
71              |
72           "Doc 1"
73
       This is a pretty simple document, but if it were any more complex, it'd
       be a bit hard to draw in that style, since it'd sprawl left and right.
       The same tree can be represented a bit more easily sideways, with
       indenting:
78
79         . html
80            . head
81               . title
82                  . "Doc 1"
83            . body
84               . "Stuff"
85               . hr
86               . "2000-08-17"
87
88       Either way expresses the same structure.  In that structure, the root
89       node is an object of the class HTML::Element
90
91           Footnote: Well actually, the root is of the class
92           HTML::TreeBuilder, but that's just a subclass of HTML::Element,
93           plus the few extra methods like "parse_file" that elaborate the
94           tree
95
       , with the tag name "html", and with two children: HTML::Element
       objects whose tag names are "head" and "body".  And each of those
       elements has children, and so on down.  Not all elements (as we'll
       call the objects of class HTML::Element) have children -- the "hr"
       element doesn't.  And not all nodes in the tree are elements -- the
       text nodes ("Doc 1", "Stuff", and "2000-08-17") are just strings.
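
       The difference between element nodes and plain-string text nodes can
       be checked with "ref".  The following is only a minimal sketch,
       assuming HTML::TreeBuilder is installed; it parses the sample document
       from a string (using "parse" and "eof" instead of "parse_file"), and
       the variable names are illustrative, not part of the original article.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $source = <<'END_HTML';
<html><head><title>Doc 1</title></head>
<body>
Stuff <hr> 2000-08-17
</body></html>
END_HTML

my $tree = HTML::TreeBuilder->new;
$tree->parse($source);
$tree->eof;

# Find the body element, then classify each of its children.
my $body = $tree->look_down('_tag', 'body');
my @kinds;
foreach my $node ($body->content_list) {
  if (ref $node) {
    push @kinds, 'element:' . $node->tag;   # an HTML::Element object
  } else {
    push @kinds, 'text';                    # a plain Perl string
  }
}
print join(', ', @kinds), "\n";
$tree->delete;
```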
102
103       Objects of the class HTML::Element each have three noteworthy
104       attributes:
105
       "_tag" -- (best accessed as "$e->tag") this element's tag-name,
       lowercased (e.g., "em" for an "em" element).
               Footnote: Yes, this is misnamed.  In proper SGML terminology,
               this is instead called a "GI", short for "generic identifier";
               and the term "tag" is used for a token of SGML source that
               represents either the start of an element (a start-tag like
               "<em lang='fr'>") or the end of an element (an end-tag like
               "</em>").  However, since more people claim to have been
               abducted by aliens than to have ever seen the SGML standard,
               and since both encounters typically involve a feeling of
               "missing time", it's not surprising that the terminology of the
               SGML standard is not closely followed.

       "_parent" -- (best accessed as "$e->parent") the element that is $e's
       parent, or undef if this element is the root of its tree.

       "_content" -- (best accessed as "$e->content_list") the list of nodes
       (i.e., elements or text segments) that are $e's children.
123
124       Moreover, if an element object has any attributes in the SGML sense of
125       the word, then those are readable as "$e->attr('name')" -- for example,
126       with the object built from having parsed "<a id='foo'>bar</a>",
127       "$e->attr('id')" will return the string "foo".  Moreover, "$e->tag" on
128       that object returns the string "a", "$e->content_list" returns a list
129       consisting of just the single scalar "bar", and "$e->parent" returns
130       the object that's this node's parent -- which may be, for example, a
131       "p" element.
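
       The accessors just described can be tried on that very fragment.  This
       is a small sketch, assuming the "a" element is wrapped in a "p" so
       that it has an interesting parent; the wrapping markup and variable
       names are assumptions made for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;
$tree->parse("<p><a id='foo'>bar</a></p>");
$tree->eof;

my $e = $tree->look_down('_tag', 'a');
my $id       = $e->attr('id');       # the SGML-sense attribute "id"
my $tag      = $e->tag;              # the tag-name, lowercased
my @children = $e->content_list;     # child nodes, here a single string
my $parent   = $e->parent->tag;      # tag-name of the parent element

print "id=$id tag=$tag children=@children parent=$parent\n";
$tree->delete;
```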
132
133       And that's all that there is to it -- you throw HTML source at
134       TreeBuilder, and it returns a tree built of HTML::Element objects and
135       some text strings.
136
137       However, what do you do with a tree of objects?  People code
138       information into HTML trees not for the fun of arranging elements, but
139       to represent the structure of specific text and images -- some text is
140       in this "li" element, some other text is in that heading, some images
141       are in that other table cell that has those attributes, and so on.
142
143       Now, it may happen that you're rendering that whole HTML tree into some
144       layout format.  Or you could be trying to make some systematic change
145       to the HTML tree before dumping it out as HTML source again.  But, in
146       my experience, by far the most common programming task that Perl
147       programmers face with HTML is in trying to extract some piece of
148       information from a larger document.  Since that's so common (and also
149       since it involves concepts that are basic to more complex tasks), that
150       is what the rest of this article will be about.
151
152   Scanning HTML trees
153       Suppose you have a thousand HTML documents, each of them a press
154       release.  They all start out:
155
156         [...lots of leading images and junk...]
157         <h1>ConGlomCo to Open New Corporate Office in Ougadougou</h1>
158         BAKERSFIELD, CA, 2000-04-24 -- ConGlomCo's vice president in charge
159         of world conquest, Rock Feldspar, announced today the opening of a
160         new office in Ougadougou, the capital city of Burkino Faso, gateway
161         to the bustling "Silicon Sahara" of Africa...
162         [...etc...]
163
164       ...and what you've got to do is, for each document, copy whatever text
165       is in the "h1" element, so that you can, for example, make a table of
166       contents of it.  Now, there are three ways to do this:
167
168       ·   You can just use a regexp to scan the file for a text pattern.
169
170           For many very simple tasks, this will do fine.  Many HTML documents
171           are, in practice, very consistently formatted as far as placement
172           of linebreaks and whitespace, so you could just get away with
173           scanning the file like so:
174
175             sub get_heading {
176               my $filename = $_[0];
177               local *HTML;
178               open(HTML, $filename)
179                 or die "Couldn't open $filename);
180               my $heading;
181              Line:
182               while(<HTML>) {
183                 if( m{<h1>(.*?)</h1>}i ) {  # match it!
184                   $heading = $1;
185                   last Line;
186                 }
187               }
188               close(HTML);
189               warn "No heading in $filename?"
190                unless defined $heading;
191               return $heading;
192             }
193
           This is quick, but awfully fragile -- if there's a newline in the
           middle of a heading's text, it won't match the above regexp, and
           the sub will warn and return undef.  The regexp will also fail if
           the "h1" element's start-tag has any attributes.  If you have to
           adapt your code to fit more kinds of start-tags, you'll end up
           basically reinventing part of HTML::Parser, at which point you
           should probably just stop, and use HTML::Parser itself:
201
       ·   You can use HTML::Parser to scan the file for an "h1" start-tag
           token, then capture all the text tokens until the "h1" close-tag.
           This approach is extensively covered in Ken MacFarlane's TPJ17
           article "Parsing HTML with HTML::Parser".  (A variant of this
           approach is to use HTML::TokeParser, which presents a different and
           rather handier interface to the tokens that HTML::Parser picks
           out.)
209
210           Using HTML::Parser is less fragile than our first approach, since
211           it's not sensitive to the exact internal formatting of the start-
212           tag (much less whether it's split across two lines).  However, when
213           you need more information about the context of the "h1" element, or
214           if you're having to deal with any of the tricky bits of HTML, such
           as parsing of tables, you'll find that the flat list of tokens that
           HTML::Parser returns isn't immediately useful.  To get something
           useful out of those tokens, you'll need to write code that knows
           some things about what elements take no content (as with "hr"
           elements), and that "</p>" end-tags are omissible, so a "<p>"
           will end any currently open paragraph -- and you're well on your
221           way to pointlessly reinventing much of the code in
222           HTML::TreeBuilder
223
224               Footnote: And, as the person who last rewrote that module, I
225               can attest that it wasn't terribly easy to get right!  Never
226               underestimate the perversity of people coding HTML.
227
228           , at which point you should probably just stop, and use
229           HTML::TreeBuilder itself:
230
       ·   You can use HTML::TreeBuilder, and scan the tree of element objects
           that you get back.
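
       For comparison, the token-based second approach might look roughly
       like this with HTML::TokeParser.  This is only a sketch: the sub name
       "heading_from_tokens" and the sample markup are made up for
       illustration, and it takes HTML source as a string rather than a
       filename.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Return the text of the first h1 in an HTML string, or undef if none.
sub heading_from_tokens {
  my $html = $_[0];
  my $p = HTML::TokeParser->new(\$html)
    or die "Couldn't set up a token parser";
  $p->get_tag('h1') or return undef;      # skip ahead to an h1 start-tag
  return $p->get_trimmed_text('/h1');     # text up to the h1 end-tag
}

# The start-tag has an attribute and the heading spans two lines,
# both of which would defeat the one-line regexp approach.
my $sample = qq{<img src="logo.gif">\n<h1 class="big">Local Man\nSees Blade Again</h1>\n};
my $heading = heading_from_tokens($sample);
print defined $heading ? "$heading\n" : "no heading\n";
```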
233
       The last approach, using HTML::TreeBuilder, is the diametric opposite
       of the first approach:  The first approach involves just elementary Perl
236       and one regexp, whereas the TreeBuilder approach involves being at home
237       with the concept of tree-shaped data structures and modules with
238       object-oriented interfaces, as well as with the particular interfaces
239       that HTML::TreeBuilder and HTML::Element provide.
240
241       However, what the TreeBuilder approach has going for it is that it's
242       the most robust, because it involves dealing with HTML in its "native"
243       format -- it deals with the tree structure that HTML code represents,
244       without any consideration of how the source is coded and with what tags
245       omitted.
246
247       So, to extract the text from the "h1" elements of an HTML document:
248
         use HTML::TreeBuilder;

         sub get_heading {
           my $tree = HTML::TreeBuilder->new;
           $tree->parse_file($_[0]);   # !
           my $heading;
           my $h1 = $tree->look_down('_tag', 'h1');  # !
           if($h1) {
             $heading = $h1->as_text;   # !
           } else {
             warn "No heading in $_[0]?";
           }
           $tree->delete; # clear memory!
           return $heading;
         }
262
       This uses some unfamiliar methods that need explaining.  The
       "parse_file" method, which we've seen before, builds a tree based on
       source from the file given.  The "delete" method is for marking a
266       tree's contents as available for garbage collection, when you're done
267       with the tree.  The "as_text" method returns a string that contains all
268       the text bits that are children (or otherwise descendants) of the given
269       node -- to get the text content of the $h1 object, we could just say:
270
         $heading = join '', $h1->content_list;
272
273       but that will work only if we're sure that the "h1" element's children
274       will be only text bits -- if the document contained:
275
276         <h1>Local Man Sees <cite>Blade</cite> Again</h1>
277
278       then the sub-tree would be:
279
         . h1
           . "Local Man Sees "
           . cite
             . "Blade"
           . " Again"
285
286       so "join '', $h1->content_list" will be something like:
287
288         Local Man Sees HTML::Element=HASH(0x15424040) Again
289
290       whereas "$h1->as_text" would yield:
291
292         Local Man Sees Blade Again
293
294       and depending on what you're doing with the heading text, you might
295       want the "as_HTML" method instead.  It returns the (sub)tree
296       represented as HTML source.  "$h1->as_HTML" would yield:
297
298         <h1>Local Man Sees <cite>Blade</cite> Again</h1>
299
300       However, if you wanted the contents of $h1 as HTML, but not the $h1
301       itself, you could say:
302
         join '',
           map(
             ref($_) ? $_->as_HTML : $_,
             $h1->content_list
           )
308
309       This "map" iterates over the nodes in $h1's list of children; and for
310       each node that's just a text bit (as "Local Man Sees " is), it just
       passes through that string value, and for each node that's an actual
       object (causing "ref" to be true), "as_HTML" will be used instead of the
313       string value of the object itself (which would be something quite
314       useless, as most object values are).  So that "as_HTML" for the "cite"
315       element will be the string "<cite>Blade</cite>".  And then, finally,
316       "join" just puts into one string all the strings that the "map"
317       returns.
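
       The three ways of getting at the heading's content can be compared
       side by side.  This is a minimal sketch, assuming HTML::TreeBuilder is
       installed and parsing the heading from a string; the variable names
       are illustrative, and the exact whitespace in "as_HTML" output may
       vary between versions of the module.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;
$tree->parse('<h1>Local Man Sees <cite>Blade</cite> Again</h1>');
$tree->eof;
my $h1 = $tree->look_down('_tag', 'h1');

# Naive join: the cite element is an object, so its string value
# is not the text we want.
my $naive = join '', $h1->content_list;

# All descendant text, with the markup dropped.
my $text = $h1->as_text;

# The children as HTML source, without the enclosing h1 tags.
my $inner = join '', map { ref($_) ? $_->as_HTML : $_ } $h1->content_list;

print "naive: $naive\n", "text:  $text\n", "inner: $inner\n";
$tree->delete;
```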
318
       Last but not least, the most important method in our "get_heading" sub
       is the "look_down" method.  This method looks down at the subtree
       starting at the given object ($tree, in our sub), looking for elements
       that meet criteria you provide.
323
324       The criteria are specified in the method's argument list.  Each
325       criterion can consist of two scalars, a key and a value, which express
326       that you want elements that have that attribute (like "_tag", or "src")
327       with the given value ("h1"); or the criterion can be a reference to a
328       subroutine that, when called on the given element, returns true if that
329       is a node you're looking for.  If you specify several criteria, then
330       that's taken to mean that you want all the elements that each satisfy
331       all the criteria.  (In other words, there's an "implicit AND".)
332
333       And finally, there's a bit of an optimization -- if you call the
334       "look_down" method in a scalar context, you get just the first node (or
335       undef if none) -- and, in fact, once "look_down" finds that first
336       matching element, it doesn't bother looking any further.
337
338       So the example:
339
         $h1 = $tree->look_down('_tag', 'h1');
341
342       returns the first element at-or-under $tree whose "_tag" attribute has
343       the value "h1".
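
       The difference between scalar and list context, and the "implicit
       AND" across several criteria, can be seen in a short sketch.  The
       sample markup and the "class" values here are invented for
       illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;
$tree->parse(
  '<h1 class="ad">Buy Stuff</h1>'
  . '<h1 class="main">Real News</h1>'
  . '<p class="main">Body</p>'
);
$tree->eof;

# List context: every h1 at or under $tree.
my @all_h1s = $tree->look_down('_tag', 'h1');

# Scalar context: only the first match; the search stops there.
my $first_h1 = $tree->look_down('_tag', 'h1');

# Two key/value criteria plus a code-ref criterion; all must hold.
my $main_h1 = $tree->look_down(
  '_tag',  'h1',
  'class', 'main',
  sub { $_[0]->as_text =~ m/News/ },
);

# Copy the text out before the tree is deleted.
my $first_text = $first_h1->as_text;
my $main_text  = $main_h1->as_text;

print scalar(@all_h1s), " h1 elements\n";
print "first: $first_text\n";
print "main:  $main_text\n";
$tree->delete;
```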
344
345   Complex Criteria in Tree Scanning
346       Now, the above "look_down" code looks like a lot of bother, with barely
347       more benefit than just grepping the file!  But consider if your
348       criteria were more complicated -- suppose you found that some of the
349       press releases that you were scanning had several "h1" elements,
350       possibly before or after the one you actually want.  For example:
351
352         <h1><center>Visit Our Corporate Partner
353          <br><a href="/dyna/clickthru"
354            ><img src="/dyna/vend_ad"></a>
355         </center></h1>
356         <h1><center>ConGlomCo President Schreck to Visit Regional HQ
357          <br><a href="/photos/Schreck_visit_large.jpg"
358            ><img src="/photos/Schreck_visit.jpg"></a>
359         </center></h1>
360
361       Here, you want to ignore the first "h1" element because it contains an
362       ad, and you want the text from the second "h1".  The problem is in
363       formalizing the way you know that it's an ad.  Since ad banners are
364       always entreating you to "visit" the sponsoring site, you could exclude
365       "h1" elements that contain the word "visit" under them:
366
         my $real_h1 = $tree->look_down(
           '_tag', 'h1',
           sub {
             $_[0]->as_text !~ m/\bvisit/i
           }
         );
373
374       The first criterion looks for "h1" elements, and the second criterion
375       limits those to only the ones whose text content doesn't match
376       "m/\bvisit/".  But unfortunately, that won't work for our example,
377       since the second "h1" mentions "ConGlomCo President Schreck to Visit
378       Regional HQ".
379
380       Instead you could try looking for the first "h1" element that doesn't
381       contain an image:
382
         my $real_h1 = $tree->look_down(
           '_tag', 'h1',
           sub {
             not $_[0]->look_down('_tag', 'img')
           }
         );
389
390       This criterion sub might seem a bit odd, since it calls "look_down" as
       part of a larger "look_down" operation, but that's fine.  Note that
       when considered as a boolean value, a "look_down" in a scalar context
       returns a false value (specifically, undef) if there's no matching
       element at or under the given element; and it returns the first
       matching element (which, being a reference and object, is always a true
       value), if any matches.  So, here,
397
         sub {
           not $_[0]->look_down('_tag', 'img')
         }
401
       means "return true only if this element has no 'img' elements as
       descendants (and isn't an 'img' element itself)."
404
405       This correctly filters out the first "h1" that contains the ad, but it
406       also incorrectly filters out the second "h1" that contains a non-
407       advertisement photo besides the headline text you want.
408
       There clearly are detectable differences between the first and second
       "h1" elements -- only the second one contains the string "Schreck", and
       we could just test for that:
412
         my $real_h1 = $tree->look_down(
           '_tag', 'h1',
           sub {
             $_[0]->as_text =~ m{Schreck}
           }
         );
419
420       And that works fine for this one example, but unless all thousand of
421       your press releases have "Schreck" in the headline, that's just not a
422       general solution.  However, if all the ads-in-"h1"s that you want to
423       exclude involve a link whose URL involves "/dyna/", then you can use
424       that:
425
         my $real_h1 = $tree->look_down(
           '_tag', 'h1',
           sub {
             my $link = $_[0]->look_down('_tag','a');
             return 1 unless $link;
               # no link means it's fine
             return 0 if $link->attr('href') =~ m{/dyna/};
               # a link to there is bad
             return 1; # otherwise okay
           }
         );
437
438       Or you can look at it another way and say that you want the first "h1"
439       element that either contains no images, or else whose image has a "src"
440       attribute whose value contains "/photos/":
441
         my $real_h1 = $tree->look_down(
           '_tag', 'h1',
           sub {
             my $img = $_[0]->look_down('_tag','img');
             return 1 unless $img;
               # no image means it's fine
             return 1 if $img->attr('src') =~ m{/photos/};
               # good if a photo
             return 0; # otherwise bad
           }
         );
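
       Both formulations can be checked against a cut-down version of the
       two-headline example.  This is a sketch under assumptions: the sample
       markup is simplified (no "center" elements), and the "defined" guards
       are an addition so that an element lacking the attribute doesn't
       trigger a warning.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $source = <<'END_HTML';
<h1>Visit Our Corporate Partner
 <a href="/dyna/clickthru"><img src="/dyna/vend_ad"></a></h1>
<h1>ConGlomCo President Schreck to Visit Regional HQ
 <a href="/photos/Schreck_visit_large.jpg"><img
   src="/photos/Schreck_visit.jpg"></a></h1>
END_HTML

my $tree = HTML::TreeBuilder->new;
$tree->parse($source);
$tree->eof;

# Formulation 1: reject an h1 whose link points into /dyna/.
my $by_link = $tree->look_down(
  '_tag', 'h1',
  sub {
    my $link = $_[0]->look_down('_tag', 'a');
    return 1 unless $link;                 # no link means it's fine
    my $href = $link->attr('href');
    return 0 if defined $href and $href =~ m{/dyna/};
    return 1;                              # otherwise okay
  }
);

# Formulation 2: accept an h1 with no image, or with a /photos/ image.
my $by_img = $tree->look_down(
  '_tag', 'h1',
  sub {
    my $img = $_[0]->look_down('_tag', 'img');
    return 1 unless $img;                  # no image means it's fine
    my $src = $img->attr('src');
    return 1 if defined $src and $src =~ m{/photos/};
    return 0;                              # otherwise bad
  }
);

# Both searches should settle on the same element.
my $same     = ($by_link and $by_img and $by_link == $by_img) ? 1 : 0;
my $headline = $by_link ? $by_link->as_text : '';
print "same element: $same\n", "headline: $headline\n";
$tree->delete;
```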
453
454       Recall that this use of "look_down" in a scalar context means to return
455       the first element at or under $tree that matches all the criteria.  But
456       if you notice that you can formulate criteria that'll match several
457       possible "h1" elements, some of which may be bogus but the last one of
458       which is always the one you want, then you can use "look_down" in a
459       list context, and just use the last element of that list:
460
         my @h1s = $tree->look_down(
           '_tag', 'h1',
           # ...maybe more criteria...
         );
         die "What, no h1s here?" unless @h1s;
         my $real_h1 = $h1s[-1]; # last or only
467
468   A Case Study: Scanning Yahoo News's HTML
469       The above (somewhat contrived) case involves extracting data from a
470       bunch of pre-existing HTML files.  In that sort of situation, if your
471       code works for all the files, then you know that the code works --
472       since the data it's meant to handle won't go changing or growing; and,
473       typically, once you've used the program, you'll never need to use it
474       again.
475
476       The other kind of situation faced in many data extraction tasks is
477       where the program is used recurringly to handle new data -- such as
478       from ever-changing Web pages.  As a real-world example of this,
479       consider a program that you could use (suppose it's crontabbed) to
480       extract headline-links from subsections of Yahoo News
481       ("http://dailynews.yahoo.com/").
482
483       Yahoo News has several subsections:
484
485       http://dailynews.yahoo.com/h/tc/ for technology news
486       http://dailynews.yahoo.com/h/sc/ for science news
487       http://dailynews.yahoo.com/h/hl/ for health news
488       http://dailynews.yahoo.com/h/wl/ for world news
489       http://dailynews.yahoo.com/h/en/ for entertainment news
490
491       and others.  All of them are built on the same basic HTML template --
492       and a scarily complicated template it is, especially when you look at
493       it with an eye toward making up rules that will select where the real
494       headline-links are, while screening out all the links to other parts of
495       Yahoo, other news services, etc.  You will need to puzzle over the HTML
496       source, and scrutinize the output of "$tree->dump" on the parse tree of
497       that HTML.
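
       A quick way to scrutinize that output is to capture the dump in a
       string, so it can be searched as well as read.  This is a minimal
       sketch, assuming a reasonably recent HTML::Element whose "dump" method
       accepts a filehandle; the sample markup is invented.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;
$tree->parse(
  '<table><tr><td>one</td><td><a href="/x"><b>two</b></a></td></tr></table>'
);
$tree->eof;

# Write the indented dump into an in-memory filehandle.
my $dump = '';
open(my $fh, '>', \$dump)
  or die "Couldn't open in-memory filehandle: $!";
$tree->dump($fh);
close($fh);

print $dump;
$tree->delete;
```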
498
499       Sometimes the only way to pin down what you're after is by position in
500       the tree. For example, headlines of interest may be in the third column
501       of the second row of the second table element in a page:
502
         my $table = ( $tree->look_down('_tag','table') )[1];
         my $row2  = ( $table->look_down('_tag', 'tr' ) )[1];
         my $col3  = ( $row2->look_down('_tag', 'td')   )[2];
         # ...then do things with $col3...
507
508       Or they may be all the links in a "p" element that has at least three
509       "br" elements as children:
510
511         my $p = $tree->look_down(
512           '_tag', 'p',
513           sub {
514             2 < grep { ref($_) and $_->tag eq 'br' }
515                      $_[0]->content_list
516           }
517         );
518         @links = $p->look_down('_tag', 'a');
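
       Here is the "br"-counting criterion run against a small made-up
       fragment.  The markup and variable names are assumptions for
       illustration; the point is that "grep" in scalar context yields a
       count of the matching children.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;
$tree->parse(
  '<p>Intro<br></p>'
  . '<p><a href="/a">A</a><br><a href="/b">B</a><br>more<br></p>'
);
$tree->eof;

# Select the first p that has at least three br elements as children.
my $p = $tree->look_down(
  '_tag', 'p',
  sub {
    2 < grep { ref($_) and $_->tag eq 'br' }
             $_[0]->content_list
  }
);

# Collect the href of every link under that p, if one was found.
my @hrefs = $p ? map { $_->attr('href') } $p->look_down('_tag', 'a') : ();
print "links: @hrefs\n";
$tree->delete;
```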
519
520       But almost always, you can get away with looking for properties of the
521       of the thing itself, rather than just looking for contexts.  Now, if
       you're lucky, the document you're looking through has clear semantic
523       tagging, such is as useful in CSS -- note the class="headlinelink" bit
524       here:
525
526         <a href="...long_news_url..." class="headlinelink">Elvis
527         seen in tortilla</a>
528
529       If you find anything like that, you could leap right in and select
530       links with:
531
532         @links = $tree->look_down('class','headlinelink');
533
534       Regrettably, your chances of seeing any sort of semantic markup
535       principles really being followed with actual HTML are pretty thin.
536
537           Footnote: In fact, your chances of finding a page that is simply
538           free of HTML errors are even thinner.  And surprisingly, sites like
539           Amazon or Yahoo are typically worse as far as quality of code than
540           personal sites whose entire production cycle involves simply being
541           saved and uploaded from Netscape Composer.
542
543       The code may be sort of "accidentally semantic", however -- for
544       example, in a set of pages I was scanning recently, I found that
545       looking for "td" elements with a "width" attribute value of "375" got
546       me exactly what I wanted.  No-one designing that page ever conceived of
547       "width=375" as meaning "this is a headline", but if you impute it to
548       mean that, it works.
549
550       An approach like this happens to work for the Yahoo News code, because
551       the headline-links are distinguished by the fact that they (and they
       alone) contain a "b" element:
554         <a href="...long_news_url..."><b>Elvis seen in tortilla</b></a>
555
556       or, diagrammed as a part of the parse tree:
557
558         . a  [href="...long_news_url..."]
559           . b
560             . "Elvis seen in tortilla"
561
562       A rule that matches these can be formalized as "look for any 'a'
563       element that has only one daugher node, which must be a 'b' element".
564       And this is what it looks like when cooked up as a "look_down"
565       expression and prefaced with a bit of code that retrieves the text of
566       the given Yahoo News page and feeds it to TreeBuilder:
567
         use strict;
         use HTML::TreeBuilder 2.97;
         use LWP::UserAgent;
         sub get_headlines {
           my $url = $_[0] || die "What URL?";

           my $response = LWP::UserAgent->new->request(
             HTTP::Request->new( GET => $url )
           );
           unless($response->is_success) {
             warn "Couldn't get $url: ", $response->status_line, "\n";
             return;
           }

           my $tree = HTML::TreeBuilder->new();
           $tree->parse($response->content);
           $tree->eof;

           my @out;
           foreach my $link (
             $tree->look_down(   # !
               '_tag', 'a',
               sub {
                 return unless $_[0]->attr('href');
                 my @c = $_[0]->content_list;
                 @c == 1 and ref $c[0] and $c[0]->tag eq 'b';
               }
             )
           ) {
             push @out, [ $link->attr('href'), $link->as_text ];
           }

           warn "Odd, fewer than 6 stories in $url!" if @out < 6;
           $tree->delete;
           return @out;
         }
604
605       ...and add a bit of code to actually call that routine and display the
606       results...
607
         foreach my $section (qw[tc sc hl wl en]) {
           my @links = get_headlines(
             "http://dailynews.yahoo.com/h/$section/"
           );
           print
             $section, ": ", scalar(@links), " stories\n",
             map(("  ", $_->[0], " : ", $_->[1], "\n"), @links),
             "\n";
         }
617
       And we've got our own headline-extractor service!  This in and of
       itself isn't amazingly useful (since if you want to see the
       headlines, you can just look at the Yahoo News pages), but it could
       easily be the basis for quite useful features like filtering the
       headlines for matching certain keywords of interest to you.
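
       Such a keyword filter might be sketched as follows.  The sub name
       "filter_headlines", the example URLs, and the sample headlines are all
       hypothetical; the only thing taken from the code above is the shape of
       the data, a list of [URL, headline text] pairs.

```perl
use strict;
use warnings;

# Hypothetical sample data, in the [URL, headline] shape that
# get_headlines returns.
my @links = (
  [ 'http://example.com/1', 'Elvis seen in tortilla' ],
  [ 'http://example.com/2', 'Stock markets drift sideways' ],
  [ 'http://example.com/3', 'New tortilla factory opens' ],
);

# Keep only the stories whose headline contains one of the keywords
# as a whole word, ignoring case.
sub filter_headlines {
  my($keywords, @stories) = @_;
  return () unless @$keywords;
  my $alternation = join '|', map { quotemeta($_) } @$keywords;
  my $re = qr/\b(?:$alternation)\b/i;
  return grep { $_->[1] =~ $re } @stories;
}

my @hits = filter_headlines([ 'tortilla' ], @links);
print "$_->[1] <$_->[0]>\n" foreach @hits;
```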
623
624       Now, one of these days, Yahoo News will decide to change its HTML
625       template.  When this happens, this will appear to the above program as
626       there being no links that meet the given criteria; or, less likely,
627       dozens of erroneous links will meet the criteria.  In either case, the
628       criteria will have to be changed for the new template; they may just
629       need adjustment, or you may need to scrap them and start over.
630
631   Regardez, duvet!
632       It's often quite a challenge to write criteria to match the desired
633       parts of an HTML parse tree.  Very often you can pull it off with a
634       simple "$tree->look_down('_tag', 'h1')", but sometimes you do have to
635       keep adding and refining criteria, until you might end up with complex
636       filters like what I've shown in this article.  The benefit to learning
637       how to deal with HTML parse trees is that one main search tool, the
638       "look_down" method, can do most of the work, making simple things easy,
639       while still making hard things possible.
640
641       [end body of article]
642
643   [Author Credit]
644       Sean M. Burke ("sburke@cpan.org") is the current maintainer of
645       "HTML::TreeBuilder" and "HTML::Element", both originally by Gisle Aas.
646
647       Sean adds: "I'd like to thank the folks who listened to me ramble
648       incessantly about HTML::TreeBuilder and HTML::Element at this year's
649       Yet Another Perl Conference and O'Reilly Open Source Software
650       Convention."
651

BACK

653       Return to the HTML::Tree docs.

perl v5.12.2                      2010-12-20           HTML::Tree::Scanning(3)