xmerl_sax_parser(3)        Erlang Module Definition        xmerl_sax_parser(3)

xmerl_sax_parser - XML SAX parser API

A SAX parser for XML that sends events through a callback interface.
SAX is the Simple API for XML, originally a Java-only API. SAX was the
first widely adopted API for XML in Java, and is a de facto standard
with versions for several programming language environments other than
Java.

option():
    Options used to customize the behaviour of the parser. Possible
    options are:

    {continuation_fun, ContinuationFun}:
        ContinuationFun is a callback function that decides what to do
        if the parser runs into EOF before the document is complete.

    {continuation_state, term()}:
        State that is accessible in the continuation callback function.

    {event_fun, EventFun}:
        EventFun is the callback function for parser events.

    {event_state, term()}:
        State that is accessible in the event callback function.

    {file_type, FileType}:
        Flag that tells the parser whether it is parsing a DTD or a
        normal XML file (default normal).

        * FileType = normal | dtd

    {encoding, Encoding}:
        Sets the default character set (default UTF-8). This character
        set is used only if the XML document does not specify one
        explicitly.

        * Encoding = utf8 | {utf16,big} | {utf16,little} | latin1 | list

    skip_external_dtd:
        Skips the external DTD during parsing. This option has the same
        effect as {external_entities, none} and
        {fail_undeclared_ref, false}, but applies only to the DTD.

    disallow_entities:
        Parsing fails if an ENTITY declaration is found.

    {entity_recurse_limit, N}:
        Sets how many levels of recursion are allowed for entities.
        Default is 3 levels.

    {external_entities, AllowedType}:
        Sets which types of external entities are allowed. An entity of
        a type that is not allowed is skipped.

        * AllowedType = all | file | none

    {fail_undeclared_ref, Boolean}:
        Decides how the parser behaves when an undeclared reference is
        found. Can be useful if external entities have been turned off
        so that an external DTD is not parsed. Default is true.

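As an illustration of how the options above are combined into a list, the
following is a minimal sketch that parses input with entity processing
restricted. The module and function names (sax_options_example,
parse_untrusted/1) are hypothetical, and the particular selection of
options is only an assumption about what a caller handling untrusted
input might choose.

```erlang
-module(sax_options_example).
-export([parse_untrusted/1]).

%% Hypothetical helper: parse a binary with entity handling restricted.
%% The event fun ignores every event and returns the state unchanged.
parse_untrusted(Xml) when is_binary(Xml) ->
    Options = [{event_fun, fun(_Event, _Location, State) -> State end},
               {event_state, undefined},
               {encoding, utf8},            % default if the document names none
               disallow_entities,           % fail on any ENTITY declaration
               {external_entities, none}],  % skip external entities
    xmerl_sax_parser:stream(Xml, Options).
```

Given the description of disallow_entities, a document that contains an
ENTITY declaration should come back as an error tuple rather than as
{ok, EventState, Rest}.
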
event():
    The SAX events that are sent to the user via the callback.

    startDocument:
        Receive notification of the beginning of a document. The SAX
        parser sends this event only once, before any other event
        callbacks.

    endDocument:
        Receive notification of the end of a document. The SAX parser
        sends this event only once, and it is the last event during
        the parse.

    {startPrefixMapping, Prefix, Uri}:
        Begin the scope of a prefix-URI Namespace mapping. Note that
        start/endPrefixMapping events are not guaranteed to be properly
        nested relative to each other: all startPrefixMapping events
        occur immediately before the corresponding startElement event,
        and all endPrefixMapping events occur immediately after the
        corresponding endElement event, but their order is not
        otherwise guaranteed. There are no start/endPrefixMapping
        events for the "xml" prefix, since it is predeclared and
        immutable.

        * Prefix = string()

        * Uri = string()

    {endPrefixMapping, Prefix}:
        End the scope of a prefix-URI mapping.

        * Prefix = string()

    {startElement, Uri, LocalName, QualifiedName, Attributes}:
        Receive notification of the beginning of an element. The parser
        sends this event at the beginning of every element in the XML
        document; there is a corresponding endElement event for every
        startElement event (even when the element is empty). All of the
        element's content is reported, in order, before the
        corresponding endElement event.

        * Uri = string()

        * LocalName = string()

        * QualifiedName = {Prefix, LocalName}

        * Prefix = string()

        * Attributes = [{Uri, Prefix, AttributeName, Value}]

        * AttributeName = string()

        * Value = string()

    {endElement, Uri, LocalName, QualifiedName}:
        Receive notification of the end of an element. The SAX parser
        sends this event at the end of every element in the XML
        document; there is a corresponding startElement event for every
        endElement event (even when the element is empty).

        * Uri = string()

        * LocalName = string()

        * QualifiedName = {Prefix, LocalName}

        * Prefix = string()

    {characters, string()}:
        Receive notification of character data.

    {ignorableWhitespace, string()}:
        Receive notification of ignorable whitespace in element content.

    {processingInstruction, Target, Data}:
        Receive notification of a processing instruction. The parser
        sends this event once for each processing instruction found.
        Note that processing instructions may occur before or after the
        main document element.

        * Target = string()

        * Data = string()

    {comment, string()}:
        Report an XML comment anywhere in the document (both inside and
        outside of the document element).

    startCDATA:
        Report the start of a CDATA section. The contents of the CDATA
        section are reported through the regular characters event.

    endCDATA:
        Report the end of a CDATA section.

    {startDTD, Name, PublicId, SystemId}:
        Report the start of DTD declarations, that is, the start of the
        DOCTYPE declaration. If the document has no DOCTYPE
        declaration, this event is not sent.

        * Name = string()

        * PublicId = string()

        * SystemId = string()

    endDTD:
        Report the end of DTD declarations, that is, the end of the
        DOCTYPE declaration.

    {startEntity, SysId}:
        Report the beginning of some internal and external XML
        entities. ???

    {endEntity, SysId}:
        Report the end of an entity. ???

    {elementDecl, Name, Model}:
        Report an element type declaration. The content model consists
        of the string "EMPTY", the string "ANY", or a parenthesised
        group, optionally followed by an occurrence indicator. The
        model is normalized so that all parameter entities are fully
        resolved and all whitespace is removed, and it includes the
        enclosing parentheses. Other normalization (such as removing
        redundant parentheses or simplifying occurrence indicators) is
        at the discretion of the parser.

        * Name = string()

        * Model = string()

    {attributeDecl, ElementName, AttributeName, Type, Mode, Value}:
        Report an attribute type declaration.

        * ElementName = string()

        * AttributeName = string()

        * Type = string()

        * Mode = string()

        * Value = string()

    {internalEntityDecl, Name, Value}:
        Report an internal entity declaration.

        * Name = string()

        * Value = string()

    {externalEntityDecl, Name, PublicId, SystemId}:
        Report a parsed external entity declaration.

        * Name = string()

        * PublicId = string()

        * SystemId = string()

    {unparsedEntityDecl, Name, PublicId, SystemId, Ndata}:
        Receive notification of an unparsed entity declaration event.

        * Name = string()

        * PublicId = string()

        * SystemId = string()

        * Ndata = string()

    {notationDecl, Name, PublicId, SystemId}:
        Receive notification of a notation declaration event.

        * Name = string()

        * PublicId = string()

        * SystemId = string()

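To make the shape of the event terms listed above concrete, here is a
small sketch of an event fun that records a summary of each event in the
order it arrives. The module name sax_trace_example and the helpers
trace/1 and summarize/1 are hypothetical names chosen for this example.

```erlang
-module(sax_trace_example).
-export([trace/1]).

%% Hypothetical helper: return {ok, Events}, where Events is the list of
%% summarized events in the order the parser sent them.
trace(Xml) when is_binary(Xml) ->
    EventFun = fun(Event, _Location, Acc) -> [summarize(Event) | Acc] end,
    Options = [{event_fun, EventFun}, {event_state, []}],
    case xmerl_sax_parser:stream(Xml, Options) of
        {ok, Acc, _Rest} ->
            {ok, lists:reverse(Acc)};
        {Tag, _Location, Reason, _EndTags, _Acc} ->
            {error, {Tag, Reason}}
    end.

%% Keep the local name for element events, and only the event name for
%% every other event.
summarize({startElement, _Uri, LocalName, _QName, _Attrs}) ->
    {startElement, LocalName};
summarize({endElement, _Uri, LocalName, _QName}) ->
    {endElement, LocalName};
summarize(Event) when is_tuple(Event) ->
    element(1, Event);
summarize(Event) when is_atom(Event) ->
    Event.
```

For a well-formed document the resulting list begins with startDocument
and ends with endDocument, matching the descriptions of those two events.
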
unicode_char():
    Integer representing a valid Unicode code point.

unicode_binary():
    Binary with characters encoded in UTF-8 or UTF-16.

latin1_binary():
    Binary with characters encoded in ISO Latin-1.

file(Filename, Options) -> Result

    Types:

        Filename = string()
        Options = [option()]
        Result = {ok, EventState, Rest} |
                 {Tag, Location, Reason, EndTags, EventState}
        Rest = unicode_binary() | latin1_binary()
        Tag = atom() (fatal_error, or user defined tag)
        Location = {CurrentLocation, EntityName, LineNo}
          CurrentLocation = string()
          EntityName = string()
          LineNo = integer()
        EventState = term()
        Reason = term()

    Parse a file containing an XML document. This function uses a
    default continuation function to read the file in blocks.

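A minimal sketch of calling file/2 follows. It returns the local name of
the first element in the file. The module name sax_file_example and the
function root_name/1 are hypothetical. The final catch-all clause is an
assumption made because this page does not specify what is returned when
the file cannot be opened.

```erlang
-module(sax_file_example).
-export([root_name/1]).

%% Hypothetical helper: report the local name of the document element.
root_name(Filename) ->
    %% Remember only the first startElement seen; state starts as undefined.
    EventFun = fun({startElement, _Uri, LocalName, _QName, _Attrs}, _Loc, undefined) ->
                       LocalName;
                  (_Event, _Loc, State) ->
                       State
               end,
    Options = [{event_fun, EventFun}, {event_state, undefined}],
    case xmerl_sax_parser:file(Filename, Options) of
        {ok, Root, _Rest} ->
            {ok, Root};
        {Tag, _Location, Reason, _EndTags, _State} ->
            {error, {Tag, Reason}};
        Other ->
            %% Assumption: any other shape (for example a failure to open
            %% the file) is passed through as an error.
            {error, Other}
    end.
```
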
stream(Xml, Options) -> Result

    Types:

        Xml = unicode_binary() | latin1_binary() | [unicode_char()]
        Options = [option()]
        Result = {ok, EventState, Rest} |
                 {Tag, Location, Reason, EndTags, EventState}
        Rest = unicode_binary() | latin1_binary() | [unicode_char()]
        Tag = atom() (fatal_error, or user defined tag)
        Location = {CurrentLocation, EntityName, LineNo}
          CurrentLocation = string()
          EntityName = string()
          LineNo = integer()
        EventState = term()
        Reason = term()

    Parse a stream containing an XML document.

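The following sketch shows one way stream/2 might be used on an
in-memory binary. It collects the values of every attribute named "id",
using the Attributes list carried by startElement events. The module name
sax_stream_example and the function ids/1 are hypothetical.

```erlang
-module(sax_stream_example).
-export([ids/1]).

%% Hypothetical helper: return {ok, Ids} with the "id" attribute values
%% in document order.
ids(Xml) when is_binary(Xml) ->
    EventFun = fun({startElement, _Uri, _LocalName, _QName, Attributes}, _Loc, Acc) ->
                       %% Each attribute is {Uri, Prefix, AttributeName, Value}.
                       [Value || {_AttrUri, _Prefix, "id", Value} <- Attributes] ++ Acc;
                  (_Event, _Loc, Acc) ->
                       Acc
               end,
    Options = [{event_fun, EventFun}, {event_state, []}],
    case xmerl_sax_parser:stream(Xml, Options) of
        {ok, Acc, _Rest} ->
            {ok, lists:reverse(Acc)};
        {Tag, _Location, Reason, _EndTags, _Acc} ->
            {error, {Tag, Reason}}
    end.
```

The event state is an accumulator that the parser threads through every
call to the event fun and hands back as EventState in the result.
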
The callback interface is based on funs: the user passes funs with the
correct signatures to the parser through the event_fun and
continuation_fun options.

Module:ContinuationFun(State) -> {NewBytes, NewState}

    Types:

        State = NewState = term()
        NewBytes = binary() | list() (should be the same type as the
        start input in stream/2)

    This function is called whenever the parser runs out of input data.
    If the function cannot get hold of more input, it returns an empty
    list or an empty binary (depending on the start input in stream/2).
    Other types of errors are handled through exceptions. Use throw/1
    to send the tuple {Tag = atom(), Reason = string()} if the
    continuation function encounters a fatal error. Tag is an atom
    that identifies the functional entity that sends the exception and
    Reason is a string that describes the problem.

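As a sketch of the continuation protocol described above, the example
below feeds the parser a document that has been split into several
binaries. The continuation state is the list of chunks not yet consumed.
The module name sax_chunk_example and the function parse_chunks/1 are
hypothetical.

```erlang
-module(sax_chunk_example).
-export([parse_chunks/1]).

%% Hypothetical helper: Chunks is a non-empty list of binaries that
%% together form one XML document. Counts the elements in it.
parse_chunks([First | Rest]) ->
    %% Hand out the next chunk, or an empty binary when none are left.
    %% The start input is a binary, so NewBytes is always a binary too.
    ContinuationFun = fun([]) -> {<<>>, []};
                         ([Next | More]) -> {Next, More}
                      end,
    CountElements = fun({startElement, _, _, _, _}, _Loc, N) -> N + 1;
                       (_Event, _Loc, N) -> N
                    end,
    xmerl_sax_parser:stream(First, [{continuation_fun, ContinuationFun},
                                    {continuation_state, Rest},
                                    {event_fun, CountElements},
                                    {event_state, 0}]).
```

A continuation fun that reads from a socket or port would follow the same
pattern, with the connection handle as its state.
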
Module:EventFun(Event, Location, State) -> NewState

    Types:

        Event = event()
        Location = {CurrentLocation, EntityName, LineNo}
          CurrentLocation = string()
          EntityName = string()
          LineNo = integer()
        State = NewState = term()

    This function is called for every event sent by the parser. Error
    handling is done through exceptions. Use throw/1 to send the tuple
    {Tag = atom(), Reason = string()} if the application encounters a
    fatal error. Tag is an atom that identifies the functional entity
    that sends the exception and Reason is a string that describes the
    problem.

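The error convention described above can be sketched as follows: the
event fun throws a {Tag, Reason} tuple, and the parse returns a result
whose first element is that tag. The module name sax_abort_example, the
tag my_app, and the element name "forbidden" are all hypothetical.

```erlang
-module(sax_abort_example).
-export([parse/1]).

%% Hypothetical helper: abort the parse as soon as a <forbidden>
%% element starts.
parse(Xml) when is_binary(Xml) ->
    EventFun = fun({startElement, _Uri, "forbidden", _QName, _Attrs}, _Loc, _State) ->
                       %% Tag identifies the sender, Reason describes the problem.
                       throw({my_app, "element <forbidden> is not accepted"});
                  (_Event, _Loc, State) ->
                       State
               end,
    xmerl_sax_parser:stream(Xml, [{event_fun, EventFun}, {event_state, ok}]).
```

The caller can then match on {my_app, Location, Reason, EndTags,
EventState} to tell application errors apart from fatal_error results
produced by the parser itself.
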

Ericsson AB                     xmerl 1.3.31.1             xmerl_sax_parser(3)