Tagset(3) User Contributed Perl Documentation Tagset(3)

HTML::Tagset - data tables useful in parsing HTML

Version 3.10

    use HTML::Tagset;
    # Then use any of the items in the HTML::Tagset package
    # as need arises

This module contains several data tables useful in various kinds of
HTML parsing operations.

Note that all tag names used are lowercase.

In the following documentation, a "hashset" is a hash being used as a
set -- what matters is which keys exist in the hash, and the actual
values associated with the keys are not significant. (The values that
are there, however, are always true.)

Note that none of these variables are exported.
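
Because nothing is exported, each table has to be reached through its
full package name. The following is a minimal sketch of two ways to do
that; the lexical name $is_list is an assumption made for illustration
and is not part of the module.

```perl
use strict;
use warnings;
use HTML::Tagset;

# Spell out the package name each time...
my $br_is_empty = $HTML::Tagset::emptyElement{'br'} ? 1 : 0;

# ...or take a reference once and use a shorter lexical name.
# ($is_list is a name local to this sketch.)
my $is_list    = \%HTML::Tagset::isList;
my $ul_is_list = $is_list->{'ul'} ? 1 : 0;

print "br is empty: $br_is_empty\n";
print "ul is a list: $ul_is_list\n";
```

Taking a reference does not copy the table, so later changes to the
package variable remain visible through $is_list.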

hashset %HTML::Tagset::emptyElement

This hashset has as keys the tag-names (GIs) of elements that cannot
have content. (For example, "base", "br", "hr".) So
$HTML::Tagset::emptyElement{'hr'} exists and is true.
$HTML::Tagset::emptyElement{'dl'} does not exist, and so is not true.
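
As a rough illustration of how a parser might consult this hashset, the
sketch below keeps a list of open elements while scanning start-tags
and skips those that cannot have content. The tag sequence and the
@open array are assumptions made up for this example.

```perl
use strict;
use warnings;
use HTML::Tagset;

# Start-tags in the order a hypothetical tokenizer reported them.
my @start_tags = qw(p br b hr);

my @open;    # elements that are still waiting for content or an end-tag
for my $tag (@start_tags) {
    # An empty element cannot have content, so there is nothing to leave open.
    next if $HTML::Tagset::emptyElement{$tag};
    push @open, $tag;
}

print "still open: @open\n";    # still open: p b
```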

hashset %HTML::Tagset::optionalEndTag

This hashset lists tag-names for elements that can have content, but
whose end-tags are generally, "safely", omissible. Example:
$HTML::Tagset::optionalEndTag{'li'} exists and is true.

hash %HTML::Tagset::linkElements

Keys in this hash are tagnames for elements that might contain links,
and the value for each is a reference to an array of the names of
attributes whose values can be links.
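
One possible use of this table is pulling the link-bearing attribute
values out of an already-parsed start-tag. The sketch below assumes the
attributes arrive as a plain hash reference; link_values() is a
hypothetical helper, not something the module provides.

```perl
use strict;
use warnings;
use HTML::Tagset;

# link_values() is a hypothetical helper: given a tag name and a hash
# reference of its attributes, return the values of those attributes
# that %HTML::Tagset::linkElements says can hold links.
sub link_values {
    my ($tag, $attr) = @_;
    my $link_attrs = $HTML::Tagset::linkElements{ lc($tag) };
    return unless $link_attrs;    # this element never carries links
    return map { $attr->{$_} } grep { defined $attr->{$_} } @$link_attrs;
}

my @links = link_values('img', { src => 'logo.png', alt => 'Logo', usemap => '#nav' });
print "$_\n" for @links;
```

The "alt" value is skipped because "alt" is not listed for "img", and
an element with no entry in the table yields an empty list.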

hash %HTML::Tagset::boolean_attr

This hash (not hashset) lists what attributes of what elements can be
printed without showing the value (for example, the "noshade" attribute
of "hr" elements). For elements with only one such attribute, its
value is simply that attribute name. For elements with many such
attributes, the value is a reference to a hashset containing all such
attributes.
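
Since a value in this hash is either a plain attribute name or a
reference to a hashset, code that reads it has to handle both shapes.
A minimal sketch, in which is_boolean_attr() is a hypothetical helper
written for this example:

```perl
use strict;
use warnings;
use HTML::Tagset;

# is_boolean_attr() is a hypothetical helper, not part of HTML::Tagset.
sub is_boolean_attr {
    my ($tag, $attr) = @_;
    my $entry = $HTML::Tagset::boolean_attr{ lc($tag) };
    return 0 unless defined $entry;    # element has no such attributes
    if (ref $entry) {
        # Several attributes: the entry is a reference to a hashset.
        return $entry->{ lc($attr) } ? 1 : 0;
    }
    # Exactly one attribute: the entry is that attribute's name.
    return $entry eq lc($attr) ? 1 : 0;
}

print is_boolean_attr('hr',    'noshade'), "\n";
print is_boolean_attr('input', 'checked'), "\n";
print is_boolean_attr('p',     'checked'), "\n";
```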

hashset %HTML::Tagset::isPhraseMarkup

This hashset contains all phrasal-level elements.

hashset %HTML::Tagset::is_Possible_Strict_P_Content

This hashset contains all phrasal-level elements that can be content of
a P element, for a strict model of HTML.

hashset %HTML::Tagset::isHeadElement

This hashset contains all elements that should be present only in the
'head' element of an HTML document.

hashset %HTML::Tagset::isList

This hashset contains all elements that can contain "li" elements.

hashset %HTML::Tagset::isTableElement

This hashset contains all elements that are to be found only in/under a
"table" element.

hashset %HTML::Tagset::isFormElement

This hashset contains all elements that are to be found only in/under a
"form" element.

hashset %HTML::Tagset::isBodyElement

This hashset contains all elements that are to be found only in/under
the "body" element of an HTML document.

hashset %HTML::Tagset::isHeadOrBodyElement

This hashset includes all elements that I notice can fall either in the
head or in the body.

hashset %HTML::Tagset::isKnown

This hashset lists all known HTML elements.
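
A simple use of this hashset is flagging tag names that the tables do
not recognize. In the sketch below, "frobnitz" is a made-up tag name
chosen only because it is not an HTML element.

```perl
use strict;
use warnings;
use HTML::Tagset;

# Tag names as a hypothetical parser might report them; the tables
# expect lowercase, so normalize first.
my @seen = map { lc } qw(P table FROBNITZ);

my @unknown = grep { !$HTML::Tagset::isKnown{$_} } @seen;

print "unknown: @unknown\n";    # unknown: frobnitz
```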

hashset %HTML::Tagset::canTighten

This hashset lists elements that might have ignorable whitespace as
children or siblings.

array @HTML::Tagset::p_closure_barriers

This array has a meaning that I have only seen a need for in
"HTML::TreeBuilder", but I include it here on the off chance that
someone might find it of use:

When we see a "<p>" token, we go looking up the lineage for a p element
we might have to minimize. At first sight, we might say that if
there's a p anywhere in the lineage of this new p, it should be closed.
But that's wrong. Consider this document:

    <html>
      <head>
        <title>foo</title>
      </head>
      <body>
        <p>foo
          <table>
            <tr>
              <td>
                foo
                <p>bar
              </td>
            </tr>
          </table>
        </p>
      </body>
    </html>

The second p is quite legally inside a much higher p.

My formalization of the reason why this is legal, but this:

    <p>foo<p>bar</p></p>

isn't, is that something about the table constitutes a "barrier" to the
application of the rule about what p must minimize.

So @HTML::Tagset::p_closure_barriers is the list of all such
barrier-tags.
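
The lineage walk described above can be sketched roughly as follows.
This is only an illustration of the barrier idea, not the code that
"HTML::TreeBuilder" actually uses; should_close_p() and its
innermost-first list of open tag names are assumptions of this example.

```perl
use strict;
use warnings;
use HTML::Tagset;

# should_close_p() is a hypothetical helper. It takes the names of the
# currently open elements, innermost first, and reports whether a newly
# seen <p> should close an open p.
sub should_close_p {
    my @lineage = @_;
    my %barrier = map { $_ => 1 } @HTML::Tagset::p_closure_barriers;
    for my $tag (@lineage) {
        return 1 if $tag eq 'p';       # open p with no barrier in between
        return 0 if $barrier{$tag};    # a barrier shields any p further up
    }
    return 0;                          # no open p at all
}

# <p>foo<p>bar : the new p is directly inside an open p.
print should_close_p(qw(p body html)), "\n";                # 1

# The table example: a td stands between the new p and the outer p.
print should_close_p(qw(td tr table p body html)), "\n";    # 0
```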

hashset %HTML::Tagset::isCDATA_Parent

This hashset includes all elements whose content is CDATA.

You may find it useful to alter the behavior of modules (like
"HTML::Element" or "HTML::TreeBuilder") that use "HTML::Tagset"'s data
tables by altering the data tables themselves. You are welcome to try,
but be careful; and be aware that different modules may or may not
react differently to the data tables being changed.
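
One cautious way to experiment is to confine a change to a block with
Perl's "local", so the table is restored when the block exits. The
sketch below assumes a made-up element name, "frobnitz"; whether a
given consumer module notices the change depends on how and when it
reads the tables.

```perl
use strict;
use warnings;
use HTML::Tagset;

my ($inside, $outside);
{
    # 'frobnitz' is a made-up element name used only for illustration.
    local $HTML::Tagset::isKnown{'frobnitz'} = 1;
    $inside = $HTML::Tagset::isKnown{'frobnitz'} ? 1 : 0;
    # ... call code here that consults %HTML::Tagset::isKnown ...
}
# Leaving the block removes the temporary entry again.
$outside = exists $HTML::Tagset::isKnown{'frobnitz'} ? 1 : 0;

print "inside: $inside, outside: $outside\n";    # inside: 1, outside: 0
```

A module that copied the table's contents at load time, rather than
reading the package variable on each use, would not see such a
temporary change.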

Note that it may be inappropriate to use these tables for producing
HTML -- for example, %isHeadOrBodyElement lists the tagnames for all
elements that can appear either in the head or in the body, such as
"script". That doesn't mean that I am saying your code that produces
HTML should feel free to put script elements in either place! If you
are producing programs that spit out HTML, you should be intimately
familiar with the DTDs for HTML or XHTML (available at
"http://www.w3.org/"), and you should slavishly obey them, not the data
tables in this document.

HTML::Element, HTML::TreeBuilder, HTML::LinkExtor

Copyright 1995-2000 Gisle Aas.

Copyright 2000-2005 Sean M. Burke.

Copyright 2005 Andy Lester.

This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

Most of the code/data in this module was adapted from code written by
Gisle Aas for "HTML::Element", "HTML::TreeBuilder", and
"HTML::LinkExtor". Then it was maintained by Sean M. Burke.

Current maintainer: Andy Lester, "<andy at petdance.com>"

Please report any bugs or feature requests to "bug-html-tagset at
rt.cpan.org", or through the web interface at
<http://rt.cpan.org/NoAuth/ReportBug.html?Queue=HTML-Tagset>. I will
be notified, and then you'll automatically be notified of progress on
your bug as I make changes.

perl v5.8.8 2005-11-08 Tagset(3)