xmerl_sax_parser(3) Erlang Module Definition xmerl_sax_parser(3)

xmerl_sax_parser - XML SAX parser API

A SAX parser for XML that sends events through a callback interface.
SAX is the Simple API for XML, originally a Java-only API. SAX was the
first widely adopted API for XML in Java, and is now a de facto standard
with versions for several programming language environments other than
Java.

option():
    Options used to customize the behaviour of the parser. Possible
    options are:

    {continuation_fun, ContinuationFun}:
        ContinuationFun is a callback function that decides what to do
        if the parser runs into EOF before the document is complete.

    {continuation_state, term()}:
        State that is accessible in the continuation callback function.

    {event_fun, EventFun}:
        EventFun is the callback function for parser events.

    {event_state, term()}:
        State that is accessible in the event callback function.

    {file_type, FileType}:
        Flag that tells the parser whether it is parsing a DTD or a
        normal XML file (default normal).

        * FileType = normal | dtd

    {encoding, Encoding}:
        Sets the default character set (default UTF-8). This character
        set is used only if none is explicitly given by the XML
        document.

        * Encoding = utf8 | {utf16,big} | {utf16,little} | latin1 | list

    skip_external_dtd:
        Skips the external DTD during parsing.

event():
    The SAX events that are sent to the user via the callback.

    startDocument:
        Receive notification of the beginning of a document. The SAX
        parser will send this event only once before any other event
        callbacks.

    endDocument:
        Receive notification of the end of a document. The SAX parser
        will send this event only once, and it will be the last event
        during the parse.

    {startPrefixMapping, Prefix, Uri}:
        Begin the scope of a prefix-URI Namespace mapping. Note that
        start/endPrefixMapping events are not guaranteed to be properly
        nested relative to each other: all startPrefixMapping events
        will occur immediately before the corresponding startElement
        event, and all endPrefixMapping events will occur immediately
        after the corresponding endElement event, but their order is
        not otherwise guaranteed. There will not be
        start/endPrefixMapping events for the "xml" prefix, since it is
        predeclared and immutable.

        * Prefix = string()

        * Uri = string()

    {endPrefixMapping, Prefix}:
        End the scope of a prefix-URI mapping.

        * Prefix = string()

    {startElement, Uri, LocalName, QualifiedName, Attributes}:
        Receive notification of the beginning of an element. The parser
        will send this event at the beginning of every element in the
        XML document; there will be a corresponding endElement event
        for every startElement event (even when the element is empty).
        All of the element's content will be reported, in order, before
        the corresponding endElement event.

        * Uri = string()

        * LocalName = string()

        * QualifiedName = {Prefix, LocalName}

        * Prefix = string()

        * Attributes = [{Uri, Prefix, AttributeName, Value}]

        * AttributeName = string()

        * Value = string()

    {endElement, Uri, LocalName, QualifiedName}:
        Receive notification of the end of an element. The SAX parser
        will send this event at the end of every element in the XML
        document; there will be a corresponding startElement event for
        every endElement event (even when the element is empty).

        * Uri = string()

        * LocalName = string()

        * QualifiedName = {Prefix, LocalName}

        * Prefix = string()

    {characters, string()}:
        Receive notification of character data.

    {ignorableWhitespace, string()}:
        Receive notification of ignorable whitespace in element content.

    {processingInstruction, Target, Data}:
        Receive notification of a processing instruction. The parser
        will send this event once for each processing instruction
        found. Note that processing instructions may occur before or
        after the main document element.

        * Target = string()

        * Data = string()

    {comment, string()}:
        Report an XML comment anywhere in the document (both inside and
        outside of the document element).

    startCDATA:
        Report the start of a CDATA section. The contents of the CDATA
        section will be reported through the regular characters event.

    endCDATA:
        Report the end of a CDATA section.

    {startDTD, Name, PublicId, SystemId}:
        Report the start of DTD declarations, that is, the start of the
        DOCTYPE declaration. If the document has no DOCTYPE
        declaration, this event will not be sent.

        * Name = string()

        * PublicId = string()

        * SystemId = string()

    endDTD:
        Report the end of DTD declarations, that is, the end of the
        DOCTYPE declaration.

    {startEntity, SysId}:
        Report the beginning of some internal and external XML
        entities. ???

    {endEntity, SysId}:
        Report the end of an entity. ???

    {elementDecl, Name, Model}:
        Report an element type declaration. The content model will
        consist of the string "EMPTY", the string "ANY", or a
        parenthesised group, optionally followed by an occurrence
        indicator. The model will be normalized so that all parameter
        entities are fully resolved and all whitespace is removed, and
        will include the enclosing parentheses. Other normalization
        (such as removing redundant parentheses or simplifying
        occurrence indicators) is at the discretion of the parser.

        * Name = string()

        * Model = string()

    {attributeDecl, ElementName, AttributeName, Type, Mode, Value}:
        Report an attribute type declaration.

        * ElementName = string()

        * AttributeName = string()

        * Type = string()

        * Mode = string()

        * Value = string()

    {internalEntityDecl, Name, Value}:
        Report an internal entity declaration.

        * Name = string()

        * Value = string()

    {externalEntityDecl, Name, PublicId, SystemId}:
        Report a parsed external entity declaration.

        * Name = string()

        * PublicId = string()

        * SystemId = string()

    {unparsedEntityDecl, Name, PublicId, SystemId, Ndata}:
        Receive notification of an unparsed entity declaration event.

        * Name = string()

        * PublicId = string()

        * SystemId = string()

        * Ndata = string()

    {notationDecl, Name, PublicId, SystemId}:
        Receive notification of a notation declaration event.

        * Name = string()

        * PublicId = string()

        * SystemId = string()
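
As a rough illustration of how the event shapes listed above might be consumed, the following sketch defines an event function that pattern matches on three of them and ignores the rest. The module name sax_events_example, the function names, and the open/close/text tuples are hypothetical choices made for this example and are not part of xmerl.

```erlang
-module(sax_events_example).
-export([handle_event/3, parse/1]).

%% Event callback: record a simplified trace of the events seen.
%% The state is a list that is built in reverse order.
handle_event({startElement, _Uri, LocalName, _QName, Attributes}, _Loc, Acc) ->
    %% Keep only attribute names and values; drop Uri and Prefix.
    Attrs = [{Name, Value} || {_AUri, _Prefix, Name, Value} <- Attributes],
    [{open, LocalName, Attrs} | Acc];
handle_event({endElement, _Uri, LocalName, _QName}, _Loc, Acc) ->
    [{close, LocalName} | Acc];
handle_event({characters, Text}, _Loc, Acc) ->
    [{text, Text} | Acc];
handle_event(_Other, _Loc, Acc) ->
    %% startDocument, endDocument, comments and so on are ignored here.
    Acc.

%% Parse Xml and return the trace in document order.
parse(Xml) ->
    case xmerl_sax_parser:stream(Xml, [{event_fun, fun handle_event/3},
                                       {event_state, []}]) of
        {ok, Acc, _Rest} -> {ok, lists:reverse(Acc)};
        Error -> Error
    end.
```

Matching on the event tuple in the function head keeps each clause small, and the final catch-all clause means that events the application does not care about leave the state unchanged.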

unicode_char():
    Integer representing a valid Unicode code point.

unicode_binary():
    Binary with characters encoded in UTF-8 or UTF-16.

latin1_binary():
    Binary with characters encoded in ISO Latin-1.

file(Filename, Options) -> Result

    Types:

        Filename = string()
        Options = [option()]
        Result = {ok, EventState, Rest} |
                 {Tag, Location, Reason, EndTags, EventState}
        Rest = unicode_binary() | latin1_binary()
        Tag = atom() (fatal_error or a user-defined tag)
        Location = {CurrentLocation, EntityName, LineNo}
          CurrentLocation = string()
          EntityName = string()
          LineNo = integer()
        EventState = term()
        Reason = term()

    Parses a file containing an XML document. This function uses a
    default continuation function to read the file in blocks.
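
A minimal sketch of calling file/2, assuming a hypothetical module sax_file_example that counts the elements in a file. The final clause of the case expression is a defensive assumption for any result shape not listed in the types above (for example, if the file cannot be opened); it is not taken from this page.

```erlang
-module(sax_file_example).
-export([count_elements/1]).

%% Count the elements in the XML file at Filename.
count_elements(Filename) ->
    EventFun = fun({startElement, _, _, _, _}, _Loc, N) -> N + 1;
                  (_Event, _Loc, N) -> N
               end,
    case xmerl_sax_parser:file(Filename, [{event_fun, EventFun},
                                          {event_state, 0}]) of
        {ok, Count, _Rest} ->
            {ok, Count};
        {Tag, Location, Reason, _EndTags, _EventState} ->
            {error, {Tag, Location, Reason}};
        Other ->
            %% Assumption: anything else is treated as an error.
            {error, Other}
    end.
```

No continuation function is passed here, since file/2 supplies its own for reading the file in blocks.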

stream(Xml, Options) -> Result

    Types:

        Xml = unicode_binary() | latin1_binary() | [unicode_char()]
        Options = [option()]
        Result = {ok, EventState, Rest} |
                 {Tag, Location, Reason, EndTags, EventState}
        Rest = unicode_binary() | latin1_binary() | [unicode_char()]
        Tag = atom() (fatal_error or user defined tag)
        Location = {CurrentLocation, EntityName, LineNo}
          CurrentLocation = string()
          EntityName = string()
          LineNo = integer()
        EventState = term()
        Reason = term()

    Parse a stream containing an XML document.
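
The two result shapes of stream/2 can be handled as in the following sketch, which collects all character data from an in-memory document. The module name sax_stream_example and the {error, {Tag, Reason}} return value are hypothetical conventions chosen for this example.

```erlang
-module(sax_stream_example).
-export([text/1]).

%% Concatenate all character data in Xml (a binary or a list of
%% code points) into one string.
text(Xml) ->
    EventFun = fun({characters, Chars}, _Loc, Acc) -> [Chars | Acc];
                  (_Event, _Loc, Acc) -> Acc
               end,
    case xmerl_sax_parser:stream(Xml, [{event_fun, EventFun},
                                       {event_state, []}]) of
        {ok, Acc, _Rest} ->
            %% Chunks were accumulated in reverse order.
            {ok, lists:append(lists:reverse(Acc))};
        {Tag, _Location, Reason, _EndTags, _EventState} ->
            {error, {Tag, Reason}}
    end.
```

A document that is not well-formed, such as one with mismatched tags, is expected to take the second branch with Tag bound to fatal_error.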

The callback interface requires the user to pass funs with the correct
signatures to the parser.

Module:ContinuationFun(State) -> {NewBytes, NewState}

    Types:

        State = NewState = term()
        NewBytes = binary() | list() (should be the same type as the
                   start input in stream/2)

    This function is called whenever the parser runs out of input data.
    If the function cannot get hold of more input, it returns an empty
    list or binary (depending on the start input in stream/2). Other
    types of errors are handled through exceptions. Use throw/1 to send
    the tuple {Tag = atom(), Reason = string()} if the continuation
    function encounters a fatal error. Tag is an atom that identifies
    the functional entity that sends the exception and Reason is a
    string that describes the problem.
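
One way to picture the continuation contract is a document that arrives as a list of binary chunks. In the sketch below, which is an illustration under stated assumptions and not the parser's own mechanism, the continuation state is the list of chunks not yet consumed. The module sax_chunks_example and the function next_chunk/1 are hypothetical names.

```erlang
-module(sax_chunks_example).
-export([count_elements/1]).

%% Continuation callback: State is the list of chunks not yet consumed.
%% An empty binary signals that no more input is available. Binaries
%% are returned because the start input below is also a binary.
next_chunk([Chunk | Rest]) -> {Chunk, Rest};
next_chunk([]) -> {<<>>, []}.

%% Parse a non-empty list of binaries as one XML document and count
%% its elements. The first chunk is the start input; the remaining
%% chunks are handed to the parser on demand.
count_elements([First | Rest]) ->
    EventFun = fun({startElement, _, _, _, _}, _Loc, N) -> N + 1;
                  (_Event, _Loc, N) -> N
               end,
    xmerl_sax_parser:stream(First, [{continuation_fun, fun next_chunk/1},
                                    {continuation_state, Rest},
                                    {event_fun, EventFun},
                                    {event_state, 0}]).
```

The same pattern would apply to input read from a socket or port, with the continuation state holding the handle instead of a list.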

Module:EventFun(Event, Location, State) -> NewState

    Types:

        Event = event()
        Location = {CurrentLocation, EntityName, LineNo}
          CurrentLocation = string()
          EntityName = string()
          LineNo = integer()
        State = NewState = term()

    This function is called for every event sent by the parser. Error
    handling is done through exceptions. Use throw/1 to send the tuple
    {Tag = atom(), Reason = string()} if the application encounters a
    fatal error. Tag is an atom that identifies the functional entity
    that sends the exception and Reason is a string that describes the
    problem.
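
The error convention described above might be used as follows to abort a parse when an application-level limit is exceeded. This is a sketch only: the module sax_limit_example, the tag too_many_elements, and the reason text are hypothetical and chosen for illustration.

```erlang
-module(sax_limit_example).
-export([parse_with_limit/2]).

%% Parse Xml, counting elements in the event state. Once Max elements
%% have been counted, the next startElement event aborts the parse by
%% throwing a {Tag, Reason} tuple with a user-defined tag.
parse_with_limit(Xml, Max) ->
    EventFun = fun({startElement, _, _, _, _}, _Loc, N) when N >= Max ->
                       throw({too_many_elements, "element limit exceeded"});
                  ({startElement, _, _, _, _}, _Loc, N) ->
                       N + 1;
                  (_Event, _Loc, N) ->
                       N
               end,
    xmerl_sax_parser:stream(Xml, [{event_fun, EventFun}, {event_state, 0}]).
```

When the limit is hit, the caller should see the user-defined tag as the first element of the five-element result tuple, in the position where fatal_error appears for parser errors.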

Ericsson AB xmerl 1.3.28 xmerl_sax_parser(3)