Schema(3)          User Contributed Perl Documentation         Schema(3)

NAME
XML::Validator::Schema - validate XML against a subset of W3C XML Schema

SYNOPSIS
    use XML::SAX::ParserFactory;
    use XML::Validator::Schema;

    #
    # create a new validator object, using foo.xsd
    #
    my $validator = XML::Validator::Schema->new(file => 'foo.xsd');

    #
    # create a SAX parser and assign the validator as a Handler
    #
    my $parser = XML::SAX::ParserFactory->parser(Handler => $validator);

    #
    # validate foo.xml against foo.xsd
    #
    eval { $parser->parse_uri('foo.xml') };
    die "File failed validation: $@" if $@;

DESCRIPTION
This module allows you to validate XML documents against a W3C XML
Schema. It does not implement the full W3C XML Schema recommendation
(http://www.w3.org/XML/Schema), but a useful subset. See the SCHEMA
SUPPORT section below.

IMPORTANT NOTE: To get line and column numbers in the error messages
generated by this module you must install XML::Filter::ExceptionLocator
and use XML::SAX::ExpatXS as your SAX parser. This module is much more
useful if you can tell where your errors are, so using these modules is
highly recommended!
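
One way to follow this advice is sketched below. It assumes that
XML::SAX::ExpatXS and XML::Filter::ExceptionLocator are installed, and
it uses the "$XML::SAX::ParserPackage" variable documented by
XML::SAX::ParserFactory to request ExpatXS instead of the default
parser. The helper name "validate_with_locations" is hypothetical, and
the modules are loaded with "require" so that a missing module is
reported when the helper is called rather than at compile time.

```perl
use strict;
use warnings;

# Hypothetical helper: validate $xml_file against $xsd_file and return
# the error text, or an empty string if validation succeeded.
sub validate_with_locations {
    my ($xsd_file, $xml_file) = @_;

    # Loaded lazily so a missing module is reported at call time.
    require XML::SAX::ParserFactory;
    require XML::Validator::Schema;

    # Ask the factory for ExpatXS instead of the default parser.
    no warnings 'once';
    local $XML::SAX::ParserPackage = 'XML::SAX::ExpatXS';

    my $validator = XML::Validator::Schema->new(file => $xsd_file);
    my $parser    = XML::SAX::ParserFactory->parser(Handler => $validator);

    eval { $parser->parse_uri($xml_file) };
    return $@ ? "$@" : '';
}
```

A caller might write: my $err = validate_with_locations('foo.xsd',
'foo.xml'); and print $err if it is not empty.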

INTERFACE
· "XML::Validator::Schema->new(file => 'file.xsd', cache => 1)"

  Call this method to create a new XML::Validator::Schema object. The
  only required option is "file", which must provide a path to an XML
  Schema document.

  Setting the optional "cache" parameter to 1 causes
  XML::Validator::Schema to keep a copy of the schema parse tree in
  memory. The tree will be reused on subsequent calls with the same
  "file" parameter, as long as the mtime on the schema file hasn't
  changed. This can save a lot of time if you're validating many
  documents against a single schema.

  Since XML::Validator::Schema is a SAX filter you will normally pass
  this object to a SAX parser:

      $validator = XML::Validator::Schema->new(file => 'foo.xsd');
      $parser = XML::SAX::ParserFactory->parser(Handler => $validator);

  Then you can proceed to validate files using the parser:

      eval { $parser->parse_uri('foo.xml') };
      die "File failed validation: $@" if $@;

  Setting the optional "debug" parameter to 1 causes
  XML::Validator::Schema to output elements and associated attributes
  while parsing and validating the XML document. This provides useful
  information on the position where the validation failed (although
  not as useful as the line and column numbers included when
  XML::Filter::ExceptionLocator and XML::SAX::ExpatXS are used).
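
The caching rule described above (reuse the parsed schema for as long
as the file's mtime is unchanged) can be illustrated with a small
stand-alone sketch. This is not the module's own code: "cached_schema",
"%CACHE", and the "$loader" callback are hypothetical names, and the
"parse tree" is simply whatever the callback returns.

```perl
use strict;
use warnings;

my %CACHE;    # path => { mtime => ..., tree => ... }

# Return the parsed form of $file, calling $loader->($file) only when
# the file is new to the cache or its modification time has changed.
sub cached_schema {
    my ($file, $loader) = @_;

    my $mtime = (stat $file)[9];
    die "Cannot stat $file: $!" unless defined $mtime;

    my $entry = $CACHE{$file};
    if ($entry && $entry->{mtime} == $mtime) {
        return $entry->{tree};    # cache hit: file unchanged
    }

    # Cache miss or stale entry: parse again and remember the result.
    my $tree = $loader->($file);
    $CACHE{$file} = { mtime => $mtime, tree => $tree };
    return $tree;
}
```

Keying on the path and comparing the stored mtime keeps the check to a
single stat call per lookup, which is cheap next to re-parsing a schema.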

RATIONALE
I'm writing a piece of software which uses Xerces/C++
(http://xml.apache.org/xerces-c/) to validate documents against XML
Schema schemas. This works very well, but I'd like to release my
project to the world. Requiring users to install Xerces is simply too
onerous a requirement; few will have it already and the Xerces
installation system leaves much to be desired.

On CPAN, the only available XML Schema validator is XML::Schema.
Unfortunately, this module isn't ready for use as it lacks the ability
to actually parse the XML Schema document format! I looked into
enhancing XML::Schema but I must admit that I'm not smart enough to
understand the code... One day, when XML::Schema is completed I will
replace this module with a wrapper around it.

This module represents my attempt to support enough XML Schema syntax
to be useful without attempting to tackle the full standard. I'm sure
this will mean that it can't be used in all situations, but hopefully
that won't prevent it from being used at all.

SCHEMA SUPPORT
Supported Elements

The following elements are supported by the XML Schema parser. If you
don't see an element or an attribute here then you definitely can't use
it in a schema document.

You can expect that the schema document parser will produce an error if
you include elements which are not supported. However, unsupported
attributes may be silently ignored. This should not be misconstrued as
a feature and will eventually be fixed.

All of these elements must be in the http://www.w3.org/2001/XMLSchema
namespace, either using a default namespace or a prefix.

<schema>

    Supported attributes: targetNamespace, elementFormDefault,
    attributeFormDefault

    Notes: the only supported value for elementFormDefault and
    attributeFormDefault is "unqualified". As such, targetNamespace
    is essentially ignored.

<element name="foo">

    Supported attributes: name, type, minOccurs, maxOccurs, ref

<attribute>

    Supported attributes: name, type, use, ref

<sequence>

    Supported attributes: minOccurs, maxOccurs

<choice>

    Supported attributes: minOccurs, maxOccurs

<all>

    Supported attributes: minOccurs, maxOccurs

<complexType>

    Supported attributes: name

<simpleContent>

<extension>

    Supported attributes: base

    Notes: only allowed inside <simpleContent>

<simpleType>

    Supported attributes: name

<restriction>

    Supported attributes: base

<whiteSpace>

    Supported attributes: value

<pattern>

    Supported attributes: value

<enumeration>

    Supported attributes: value

<length>

    Supported attributes: value

<minLength>

    Supported attributes: value

<maxLength>

    Supported attributes: value

<minInclusive>

    Supported attributes: value

<minExclusive>

    Supported attributes: value

<maxInclusive>

    Supported attributes: value

<maxExclusive>

    Supported attributes: value

<totalDigits>

    Supported attributes: value

<fractionDigits>

    Supported attributes: value

<annotation>

<documentation>

    Supported attributes: name
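
As a hedged illustration, the sketch below writes a small schema that
uses only elements and attributes from the list above, then validates
a document string against it. The element names ("order", "item"), the
type name "sku", the helper "check_order", and the use of File::Temp
are all assumptions made for the example. It also relies on the
"parse_string" method that XML::SAX parsers provide. No output is shown
because the result depends on the installed parser and module version.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A schema restricted to elements listed as supported above. The
# simpleType is declared before the element that uses it.
my $XSD = <<'END_XSD';
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="sku">
    <xs:restriction base="xs:string">
      <xs:maxLength value="8"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="item" type="sku"
                    minOccurs="1" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:positiveInteger" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
END_XSD

# Hypothetical helper: returns '' if $xml validates, else the error text.
sub check_order {
    my ($xml) = @_;

    # Loaded lazily so a missing module is reported at call time.
    require XML::SAX::ParserFactory;
    require XML::Validator::Schema;

    # The "file" option needs a path, so write the schema to a temp file.
    my ($fh, $xsd_path) = tempfile(SUFFIX => '.xsd', UNLINK => 1);
    print {$fh} $XSD;
    close $fh or die "Cannot write $xsd_path: $!";

    my $validator = XML::Validator::Schema->new(file => $xsd_path);
    my $parser    = XML::SAX::ParserFactory->parser(Handler => $validator);

    eval { $parser->parse_string($xml) };
    return $@ ? "$@" : '';
}
```

For instance, check_order('<order id="1"><item>A1</item></order>') is
intended to return an empty string, while a document that omits the
"id" attribute is intended to return an error message.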

Simple Type Support

Supported built-in types are:

string

normalizedString

token

NMTOKEN

    Notes: the spec says NMTOKEN should only be used for attributes,
    but this rule is not enforced.

boolean

decimal

    Notes: the enumeration facet is not supported on decimal or any
    types derived from decimal.

integer

int

short

byte

unsignedInt

unsignedShort

unsignedByte

positiveInteger

negativeInteger

nonPositiveInteger

nonNegativeInteger

dateTime

    Notes: Although dateTime correctly validates the lexical format it
    does not offer comparison facets (min*, max*, enumeration).

double

    Notes: Although double correctly validates the lexical format it
    does not offer comparison facets (min*, max*, enumeration). Also,
    minimum and maximum constraints as described in the spec are not
    checked.

float

    Notes: The restrictions on double support apply to float as well.

duration

time

date

gYearMonth

gYear

gMonthDay

gDay

gMonth

hexBinary

base64Binary

anyURI

QName

NOTATION
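
To make "validates the lexical format" concrete, here is a simplified
stand-alone sketch of a lexical check for dateTime. It is not taken
from the module: the regular expression only approximates the XML
Schema lexical form (it does not range-check months, days, or hours),
and "is_lexical_datetime" is a hypothetical name. A check like this can
accept or reject a string, but on its own it gives no ordering between
two values, which is what comparison facets such as minInclusive need.

```perl
use strict;
use warnings;

# Approximate lexical form: [-]CCYY-MM-DDThh:mm:ss[.sss][Z|(+|-)hh:mm]
my $DATETIME_RE = qr{
    \A -? \d{4,} - \d{2} - \d{2}      # date part
    T \d{2} : \d{2} : \d{2}           # time part
    (?: \. \d+ )?                     # optional fractional seconds
    (?: Z | [+-] \d{2} : \d{2} )?     # optional time zone
    \z
}x;

# Returns 1 if $value has the dateTime shape above, otherwise 0.
sub is_lexical_datetime {
    my ($value) = @_;
    return $value =~ $DATETIME_RE ? 1 : 0;
}
```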

Miscellaneous Details

Other known deviations from the specification:

· Patterns specified in pattern simpleType restrictions are Perl
  regexes with none of the XML Schema extensions available.

· No effort is made to prevent the declaration of facets which
  "loosen" the restrictions on a type. This is a bug and will be
  fixed in a future release. Until then types which attempt to
  loosen restrictions on their base class will behave unpredictably.

· No attempt has been made to exclude content models which are
  ambiguous, as the spec demands. In fact, I don't see any compelling
  reason to do so, aside from strict compliance to the spec. The
  content model implementation uses regular expressions which should
  be able to handle loads of ambiguity without significant
  performance problems.

· Marking a facet "fixed" has no effect.

· SimpleTypes must come after their base types in the schema body.
  For example, this is ok:

      <xs:simpleType name="foo">
        <xs:restriction base="xs:string">
          <xs:minLength value="10"/>
        </xs:restriction>
      </xs:simpleType>
      <xs:simpleType name="foo_bar">
        <xs:restriction base="foo">
          <xs:length value="10"/>
        </xs:restriction>
      </xs:simpleType>

  But this is not:

      <xs:simpleType name="foo_bar">
        <xs:restriction base="foo">
          <xs:length value="10"/>
        </xs:restriction>
      </xs:simpleType>
      <xs:simpleType name="foo">
        <xs:restriction base="xs:string">
          <xs:minLength value="10"/>
        </xs:restriction>
      </xs:simpleType>
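
The first item in this list has a practical consequence that a short
stand-alone sketch can show. Character class subtraction such as
[a-z-[aeiou]] is an XML Schema regex extension; Perl reads the same
text as an ordinary character class followed by a literal "]". The
helper "perl_pattern_ok" is hypothetical, and anchoring the pattern to
the whole value is an assumption made for this sketch, not a statement
about the module's internals.

```perl
use strict;
use warnings;

# Hypothetical helper: treat $pattern as a Perl regex and require it to
# match the entire value (an assumption made for this sketch).
sub perl_pattern_ok {
    my ($pattern, $value) = @_;
    return $value =~ /\A(?:$pattern)\z/ ? 1 : 0;
}

# Plain constructs behave the same in both dialects.
my $digits_ok = perl_pattern_ok('\d{3}', '123');

# In XML Schema this means "an ASCII lowercase letter other than a
# vowel"; in Perl it is the class [a-z\-\[aeiou] followed by a "]".
my $consonant_ok = perl_pattern_ok('[a-z-[aeiou]]', 'b');
my $odd_match    = perl_pattern_ok('[a-z-[aeiou]]', 'a]');
```

Under these assumptions $digits_ok and $odd_match are true while
$consonant_ok is false, which is the reverse of what the schema author
meant. Patterns written for this module are safest when they stay
within syntax that Perl and XML Schema share.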

CAVEATS
Here are a few gotchas that you should know about:

· No Unicode testing has been performed, although it seems possible
  that the module will handle Unicode data correctly.

· Namespace processing is almost entirely missing from the module.

· Little work has been done to ensure that invalid schemas fail
  gracefully. Until that is done you may want to develop your
  schemas using a more mature validator (like Xerces or XML Spy)
  before using them with this module.

BUGS
Please use "rt.cpan.org" to report bugs in this module:

    http://rt.cpan.org

Please note that I will delete bugs which merely point out the lack of
support for a particular feature of XML Schema. Those are feature
requests, and believe me, I know we've got a long way to go.

SUPPORT
This module is supported on the perl-xml mailing-list. Please join the
list if you have questions, suggestions or patches:

    http://listserv.activestate.com/mailman/listinfo/perl-xml

CVS
If you'd like to help develop XML::Validator::Schema you'll want to
check out a copy of the CVS tree:

    http://sourceforge.net/cvs/?group_id=89764

CREDITS
The following people have contributed bug reports, test cases and/or
code:

    Russell B Cecala (aka Plankton)
    David Wheeler
    Toby Long-Leather
    Mathieu
    h.bridge@fasol.fujitsu.com
    michael.jacob@schering.de
    josef@clubphoto.com
    adamk@ali.as
    Jean Flouret

AUTHOR
Sam Tregar <sam@tregar.com>

COPYRIGHT AND LICENSE
Copyright (C) 2002-2003 Sam Tregar

This program is free software; you can redistribute it and/or modify it
under the same terms as Perl 5 itself.

A NOTE ON DEVELOPMENT METHODOLOGY
This module isn't just an XML Schema validator, it's also a test of the
Test Driven Development methodology. I've been writing tests while I
develop code for a while now, but TDD goes further by requiring tests
to be written before code. One consequence of this is that the module
code may seem naive; it really is just enough code to pass the current
test suite. If I'm doing it right then there shouldn't be a single
line of code that isn't directly related to passing a test. As I add
functionality (by way of writing tests) I'll refactor the code a great
deal, but I won't add code only to support future development.

For more information I recommend "Test Driven Development: By Example"
by Kent Beck.

SEE ALSO
XML::Schema

http://www.w3.org/XML/Schema

http://xml.apache.org/xerces-c/

perl v5.8.8                       2004-11-04                   Schema(3)