Tagset(3)             User Contributed Perl Documentation            Tagset(3)

NAME

       HTML::Tagset - data tables useful in parsing HTML

VERSION

       Version 3.10

SYNOPSIS

         use HTML::Tagset;
         # Then use any of the items in the HTML::Tagset package
         #  as the need arises

DESCRIPTION

       This module contains several data tables useful in various kinds of
       HTML parsing operations.

       Note that all tag names used are lowercase.

       In the following documentation, a "hashset" is a hash being used as a
       set -- the hash conveys that its keys are there, and the actual values
       associated with the keys are not significant.  (But whatever values
       are there are always true.)

VARIABLES

       Note that none of these variables are exported.

       hashset %HTML::Tagset::emptyElement

       This hashset has as keys the tag-names (GIs) of elements that cannot
       have content.  (For example, "base", "br", "hr".)  So
       $HTML::Tagset::emptyElement{'hr'} exists and is true, whereas
       $HTML::Tagset::emptyElement{'dl'} does not exist, and so is not true.

       hashset %HTML::Tagset::optionalEndTag

       This hashset lists tag-names for elements that can have content, but
       whose end-tags are generally, "safely", omissible.  Example:
       $HTML::Tagset::optionalEndTag{'li'} exists and is true.
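
       As a rough sketch of how the two hashsets above might be combined,
       the following classifies what a serializer could do about an
       element's end-tag.  The helper name end_tag_policy and its three
       return strings are made up for illustration; they are not part of
       HTML::Tagset.

```perl
use strict;
use warnings;
use HTML::Tagset;

# Hypothetical helper (not part of HTML::Tagset): classify how the
# end-tag of a lowercase tag name might be treated.
sub end_tag_policy {
    my ($tag) = @_;
    return 'forbidden' if $HTML::Tagset::emptyElement{$tag};
    return 'optional'  if $HTML::Tagset::optionalEndTag{$tag};
    return 'required';
}

print "$_: ", end_tag_policy($_), "\n" for qw(hr li dl);
```

       Because the values in a hashset are always true, a plain truth test
       is enough here; "exists" is not needed.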

       hash %HTML::Tagset::linkElements

       Keys in this hash are tagnames for elements that might contain links,
       and the value for each is a reference to an array of the names of
       attributes whose values can be links.
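
       A minimal sketch of a lookup against this hash follows.  The helper
       name link_attributes is hypothetical, and the sample tag names are
       only illustrative; consult the hash itself for the full list of
       entries.

```perl
use strict;
use warnings;
use HTML::Tagset;

# Hypothetical helper (not part of HTML::Tagset): return the names of the
# link-bearing attributes for a lowercase tag name, or an empty list if
# the tag has no entry in %linkElements.
sub link_attributes {
    my ($tag) = @_;
    my $attrs = $HTML::Tagset::linkElements{$tag} or return;
    return @$attrs;   # copy, so callers cannot alter the table by accident
}

for my $tag (qw(a img p)) {
    my @attrs = link_attributes($tag);
    print "$tag: ", (@attrs ? join(', ', @attrs) : '(none)'), "\n";
}
```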

       hash %HTML::Tagset::boolean_attr

       This hash (not hashset) lists what attributes of what elements can be
       printed without showing the value (for example, the "noshade" attribute
       of "hr" elements).  For elements with only one such attribute, the
       hash value is simply that attribute name.  For elements with many such
       attributes, the hash value is a reference to a hashset containing all
       such attributes.
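
       Since an entry can take either of two shapes, code reading this hash
       has to check for both.  The following is one way that might be done;
       the helper name is_boolean_attr is an assumption for illustration
       and is not provided by the module.

```perl
use strict;
use warnings;
use HTML::Tagset;

# Hypothetical helper (not part of HTML::Tagset): return 1 if $attr may be
# printed without a value on element $tag, else 0.  Handles both shapes of
# entry: a plain attribute name, or a reference to a hashset of names.
sub is_boolean_attr {
    my ($tag, $attr) = @_;
    my $entry = $HTML::Tagset::boolean_attr{$tag};
    return 0 unless defined $entry;
    if (ref($entry)) {
        return $entry->{$attr} ? 1 : 0;   # hashset of attribute names
    }
    return $entry eq $attr ? 1 : 0;       # single attribute name
}

print "hr noshade: ", is_boolean_attr('hr', 'noshade'), "\n";
print "hr width:   ", is_boolean_attr('hr', 'width'),   "\n";
```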

       hashset %HTML::Tagset::isPhraseMarkup

       This hashset contains all phrasal-level elements.

       hashset %HTML::Tagset::is_Possible_Strict_P_Content

       This hashset contains all phrasal-level elements that can be content
       of a P element, for a strict model of HTML.

       hashset %HTML::Tagset::isHeadElement

       This hashset contains all elements that should be present only in the
       'head' element of an HTML document.

       hashset %HTML::Tagset::isList

       This hashset contains all elements that can contain "li" elements.

       hashset %HTML::Tagset::isTableElement

       This hashset contains all elements that are to be found only in/under a
       "table" element.

       hashset %HTML::Tagset::isFormElement

       This hashset contains all elements that are to be found only in/under a
       "form" element.

       hashset %HTML::Tagset::isBodyElement

       This hashset contains all elements that are to be found only in/under
       the "body" element of an HTML document.

       hashset %HTML::Tagset::isHeadOrBodyElement

       This hashset includes all elements that I notice can fall either in the
       head or in the body.

       hashset %HTML::Tagset::isKnown

       This hashset lists all known HTML elements.
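
       One possible use of this hashset is sorting tag names into known
       and unknown ones, sketched below.  The helper name partition_known
       and the nonsense tag "blorp" are invented for the example.

```perl
use strict;
use warnings;
use HTML::Tagset;

# Hypothetical helper (not part of HTML::Tagset): split tag names into
# known and unknown ones.  Names are lowercased first, because the tables
# use lowercase tag names only.
sub partition_known {
    my (@known, @unknown);
    for my $tag (map { lc($_) } @_) {
        if ($HTML::Tagset::isKnown{$tag}) { push @known,   $tag }
        else                              { push @unknown, $tag }
    }
    return (\@known, \@unknown);
}

my ($known, $unknown) = partition_known(qw(P Table blorp));
print "known:   @$known\n";
print "unknown: @$unknown\n";
```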

       hashset %HTML::Tagset::canTighten

       This hashset lists elements that might have ignorable whitespace as
       children or siblings.

       array @HTML::Tagset::p_closure_barriers

       This array has a meaning that I have only seen a need for in
       "HTML::TreeBuilder", but I include it here on the off chance that
       someone might find it of use:

       When we see a "<p>" token, we go look up the lineage for a p element
       we might have to minimize.  At first sight, we might say that if
       there's a p anywhere in the lineage of this new p, it should be closed.
       But that's wrong.  Consider this document:

         <html>
           <head>
             <title>foo</title>
           </head>
           <body>
             <p>foo
               <table>
                 <tr>
                   <td>
                      foo
                      <p>bar
                   </td>
                 </tr>
               </table>
             </p>
           </body>
         </html>

       The second p is quite legally inside a much higher p.

       My formalization of the reason why this is legal, but this:

         <p>foo<p>bar</p></p>

       isn't, is that something about the table constitutes a "barrier" to the
       application of the rule about what p must minimize.

       So @HTML::Tagset::p_closure_barriers is the list of all such
       barrier-tags.
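
       The barrier rule can be sketched as a walk up the lineage that stops
       at the first p or the first barrier tag, whichever comes first.
       This is only an illustration of the idea, not the code that
       "HTML::TreeBuilder" actually uses; the helper name open_p_to_close
       and the representation of the lineage as a list of tag names,
       innermost first, are assumptions.

```perl
use strict;
use warnings;
use HTML::Tagset;

# Lookup table built from the array, for quick membership tests.
my %is_barrier = map { $_ => 1 } @HTML::Tagset::p_closure_barriers;

# Hypothetical helper (not part of HTML::Tagset): given the lineage of the
# insertion point as tag names, innermost first, return 1 if an open "p"
# is reached before any barrier tag (so a new "<p>" would close it).
sub open_p_to_close {
    for my $tag (@_) {
        return 1 if $tag eq 'p';
        return 0 if $is_barrier{$tag};
    }
    return 0;
}

# <p>foo<p>bar : the second p arrives with lineage (p, body, html)
print open_p_to_close(qw(p body html)) ? "close\n" : "keep\n";

# The table document: the second p arrives with lineage
# (td, tr, table, p, body, html), and td is a barrier
print open_p_to_close(qw(td tr table p body html)) ? "close\n" : "keep\n";
```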

       hashset %HTML::Tagset::isCDATA_Parent

       This hashset includes all elements whose content is CDATA.

CAVEATS

       You may find it useful to alter the behavior of modules (like
       "HTML::Element" or "HTML::TreeBuilder") that use "HTML::Tagset"'s data
       tables by altering the data tables themselves.  You are welcome to try,
       but be careful; and be aware that different modules may or may not
       react differently to the data tables being changed.
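
       If you do experiment with altering a table, one cautious approach is
       to limit the change to a dynamic scope with "local", so that it is
       undone automatically.  The sketch below assumes a made-up tag name
       "mytag" and a hypothetical wrapper with_custom_empty_tag; whether
       another module notices such a change depends on how that module
       reads the tables.

```perl
use strict;
use warnings;
use HTML::Tagset;

# Hypothetical wrapper (not part of HTML::Tagset): run $code with the
# made-up tag "mytag" temporarily registered as an empty element.
sub with_custom_empty_tag {
    my ($code) = @_;
    # "local" on a hash element restores the previous state (here:
    # removes the key again) when this sub returns.
    local $HTML::Tagset::emptyElement{'mytag'} = 1;
    return $code->();
}

my $inside = with_custom_empty_tag(
    sub { $HTML::Tagset::emptyElement{'mytag'} ? 1 : 0 }
);
my $after = exists $HTML::Tagset::emptyElement{'mytag'} ? 1 : 0;

print "inside: $inside, after: $after\n";
```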

       Note that it may be inappropriate to use these tables for producing
       HTML -- for example, %isHeadOrBodyElement lists the tagnames for all
       elements that can appear either in the head or in the body, such as
       "script".  That doesn't mean that I am saying your code that produces
       HTML should feel free to put script elements in either place!  If you
       are producing programs that spit out HTML, you should be intimately
       familiar with the DTDs for HTML or XHTML (available at
       "http://www.w3.org/"), and you should slavishly obey them, not the data
       tables in this document.

SEE ALSO

       HTML::Element, HTML::TreeBuilder, HTML::LinkExtor

COPYRIGHT & LICENSE

       Copyright 1995-2000 Gisle Aas.

       Copyright 2000-2005 Sean M. Burke.

       Copyright 2005 Andy Lester.

       This program is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

ACKNOWLEDGEMENTS

       Most of the code/data in this module was adapted from code written by
       Gisle Aas for "HTML::Element", "HTML::TreeBuilder", and
       "HTML::LinkExtor".  Then it was maintained by Sean M. Burke.

AUTHOR

       Current maintainer: Andy Lester, "<andy at petdance.com>"

BUGS

       Please report any bugs or feature requests to "bug-html-tagset at
       rt.cpan.org", or through the web interface at
       <http://rt.cpan.org/NoAuth/ReportBug.html?Queue=HTML-Tagset>.  I will
       be notified, and then you'll automatically be notified of progress on
       your bug as I make changes.



perl v5.8.8                       2005-11-08                         Tagset(3)