XMLWF(1)                        [FIXME: manual]                       XMLWF(1)

NAME

       xmlwf - Determines if an XML document is well-formed

SYNOPSIS

       xmlwf [OPTIONS] [FILE ...]

       xmlwf -h

       xmlwf -v

DESCRIPTION

       xmlwf uses the Expat library to determine whether an XML document is
       well-formed. It is non-validating.

       If no files are specified on the command line, and your version of
       xmlwf is recent enough, the input is read from standard input.

WELL-FORMED DOCUMENTS

       A well-formed document must adhere to the following rules:

       •   The file begins with an XML declaration. For instance, <?xml
           version="1.0" standalone="yes"?>.  NOTE: The XML 1.0 specification
           recommends the declaration but does not require it, and xmlwf does
           not currently check for a valid XML declaration.

       •   Every start tag is either empty (<tag/>) or has a corresponding end
           tag.

       •   There is exactly one root element. This element must contain all
           other elements in the document. Only comments, white space, and
           processing instructions may come after the close of the root
           element.

       •   All elements nest properly.

       •   All attribute values are enclosed in quotes (either single or
           double).

       If the document has a DTD, and it strictly complies with that DTD, then
       the document is also considered valid.  xmlwf is a non-validating
       parser: it does not check the document against the DTD. However, it
       does support external entities (see the -x option).
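
       The rules above can be tried out from Python, whose standard
       xml.parsers.expat module wraps the Expat library. This is a minimal
       sketch and not part of xmlwf; the helper name is_well_formed is
       hypothetical. Note that the sample without an XML declaration is
       accepted.

```python
import xml.parsers.expat


def is_well_formed(data: bytes) -> bool:
    """Return True if Expat parses data without a well-formedness error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError:
        return False
    return True


if __name__ == "__main__":
    samples = [
        b"<doc><item/></doc>",    # properly nested, exactly one root
        b"<doc><item></doc>",     # start tag without a matching end tag
        b"<doc/><extra/>",        # second root element
        b"<doc attr=unquoted/>",  # attribute value not enclosed in quotes
    ]
    for sample in samples:
        print(sample, is_well_formed(sample))
```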

OPTIONS

       When an option includes an argument, you may specify the argument
       either separately ("-d output") or concatenated with the option
       ("-doutput").  xmlwf supports both.
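
       The two spellings follow the usual Unix option convention. As an
       illustration only (xmlwf has its own option parsing, which this does
       not reproduce), Python's getopt module treats both forms the same
       way; the file name in.xml is a placeholder.

```python
import getopt

# "d:" declares a -d option that takes an argument, as xmlwf's -d does.
separate, rest_a = getopt.getopt(["-d", "output", "in.xml"], "d:")
joined, rest_b = getopt.getopt(["-doutput", "in.xml"], "d:")

print(separate, rest_a)
print(joined, rest_b)
```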

       -a factor
           Sets the maximum tolerated amplification factor for protection
           against billion laughs attacks (default: 100.0). While parsing,
           the amplification factor is calculated as

               amplification := (direct + indirect) / direct

           where <direct> is the number of bytes read from the primary
           document and <indirect> is the number of bytes added by expanding
           entities and by reading external DTD files, combined.

           NOTE: If you ever need to increase this value for non-attack
           payload, please file a bug report.

       -b bytes
           Sets the number of output bytes (including amplification) needed to
           activate protection against billion laughs attacks (default: 8
           MiB). This can be thought of as an "activation threshold".

           NOTE: If you ever need to increase this value for non-attack
           payload, please file a bug report.
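
           The arithmetic behind -a and -b can be sketched in Python. The
           amplification function follows the formula given for -a. How the
           two settings combine in is_tolerated is an assumption made for
           illustration (protection applies only once the output size has
           reached the activation threshold); both function names are
           hypothetical, and direct is assumed to be greater than zero.

```python
def amplification(direct: int, indirect: int) -> float:
    """Amplification factor as defined for -a: (direct + indirect) / direct."""
    return (direct + indirect) / direct


def is_tolerated(direct: int, indirect: int,
                 max_factor: float = 100.0,
                 threshold_bytes: int = 8 * 1024 * 1024) -> bool:
    """Assumed combination of -a and -b (not taken from the xmlwf source)."""
    output_bytes = direct + indirect
    if output_bytes < threshold_bytes:
        return True  # below the activation threshold: protection inactive
    return amplification(direct, indirect) <= max_factor


if __name__ == "__main__":
    # 1000 bytes of document expanding to 99000 extra bytes: factor 100.0
    print(amplification(1000, 99000))
    # Small input with huge expansion, well above 8 MiB of output
    print(is_tolerated(1000, 16 * 1024 * 1024))
```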

       -c
           If the input file is well-formed and xmlwf doesn't encounter any
           errors, the input file is simply copied to the output directory
           unchanged. This implies no namespaces (turns off -n) and requires
           -d to specify an output directory.

       -d output-dir
           Specifies a directory to contain transformed representations of the
           input files. By default, -d outputs a canonical representation
           (described below). You can select different output formats using
           -c, -m and -N.

           The output filenames will be exactly the same as the input
           filenames, or "STDIN" if the input is coming from standard input.
           Therefore, you must be careful that the output file does not go
           into the same directory as the input file. Otherwise, xmlwf will
           delete the input file before it generates the output file (just
           like running cat < file > file in most shells).

           Two structurally equivalent XML documents have a byte-for-byte
           identical canonical XML representation. Note that ignorable white
           space is considered significant and is treated equivalently to
           data. More on canonical XML can be found at
           http://www.jclark.com/xml/canonxml.html .
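
           A wrapper script can guard against the overwrite hazard described
           above before calling xmlwf. The sketch below assumes that the
           output file is named after the last path component of the input
           file; the function name is hypothetical, and symbolic links are
           not resolved.

```python
import os


def output_would_clobber_input(input_path: str, output_dir: str) -> bool:
    """True if <output_dir>/<input file name> is the input file itself."""
    output_path = os.path.join(output_dir, os.path.basename(input_path))
    return os.path.abspath(output_path) == os.path.abspath(input_path)


if __name__ == "__main__":
    print(output_would_clobber_input("docs/a.xml", "docs"))  # same directory
    print(output_would_clobber_input("docs/a.xml", "out"))   # separate directory
```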

       -e encoding
           Specifies the character encoding for the document, overriding any
           document encoding declaration.  xmlwf supports four built-in
           encodings: US-ASCII, UTF-8, UTF-16, and ISO-8859-1. Also see the -w
           option.
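
           The effect of overriding the encoding can be seen with Python's
           xml.parsers.expat module, which also wraps Expat and accepts an
           encoding override when the parser is created. This is a sketch of
           the idea, not of xmlwf itself; the helper name parses is
           hypothetical.

```python
import xml.parsers.expat


def parses(data: bytes, encoding=None) -> bool:
    """Parse data with Expat, optionally overriding the document encoding."""
    parser = xml.parsers.expat.ParserCreate(encoding)
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


# An ISO-8859-1 document with no encoding declaration; the byte 0xE9
# does not form a valid UTF-8 sequence in this position.
latin1_doc = "<p>caf\u00e9</p>".encode("iso-8859-1")

if __name__ == "__main__":
    print(parses(latin1_doc))                # read as UTF-8 by default
    print(parses(latin1_doc, "ISO-8859-1"))  # encoding overridden
```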

       -k
           When processing multiple files, xmlwf by default halts after the
           first file with an error. This tells xmlwf to report the error
           but to keep processing. This can be useful, for example, when
           testing a filter that converts many files to XML and you want to
           quickly find out which conversions failed.

       -m
           Outputs an XML "meta" document that completely describes the input
           file, including character positions. Requires -d to specify an
           output directory.

       -n
           Turns on namespace processing. The -c option disables namespace
           processing.
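
           What namespace processing changes can be observed through Python's
           xml.parsers.expat module, where passing a namespace separator
           turns on Expat's namespace processing. This is an illustrative
           sketch; the helper name element_names and the URI urn:example are
           made up for the example.

```python
import xml.parsers.expat


def element_names(data: bytes, namespaces: bool) -> list:
    """Collect element names as reported by Expat."""
    sep = " " if namespaces else None
    parser = xml.parsers.expat.ParserCreate(namespace_separator=sep)
    names = []
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(data, True)
    return names


doc = b'<x:a xmlns:x="urn:example"><x:b/></x:a>'

if __name__ == "__main__":
    print(element_names(doc, namespaces=False))  # prefixed names as written
    print(element_names(doc, namespaces=True))   # URI, separator, local name
```

           With namespace processing on, a document that uses an undeclared
           prefix (for example <x:a/> on its own) is rejected, whereas
           without it the same document is accepted.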

       -N
           Adds a doctype and notation declarations to canonical XML output.
           This matches the example output used by the formal XML test cases.
           Requires -d to specify an output directory.

       -p
           Tells xmlwf to process external DTDs and parameter entities.

           Normally xmlwf never parses parameter entities.  -p tells it to
           always parse them.  -p implies -x.

       -r
           Normally xmlwf memory-maps the XML file before parsing; this can
           result in faster parsing on many platforms.  -r turns off
           memory-mapping and uses normal file I/O calls instead. Of course,
           memory-mapping is automatically turned off when reading from
           standard input.

           Use of memory-mapping can cause some platforms to report
           substantially higher memory usage for xmlwf, but this appears to be
           a matter of the operating system reporting memory in a strange way;
           there is not a leak in xmlwf.

       -s
           Prints an error if the document is not standalone. A document is
           standalone if it has no external subset and no references to
           parameter entities.

       -t
           Turns on timings. This tells Expat to parse the entire file, but
           not perform any processing. This gives a fairly accurate idea of
           the raw speed of Expat itself without client overhead.  -t turns
           off most of the output options (-d, -m, -c, ...).

       -v
           Prints the version of the Expat library being used, including some
           information on the compile-time configuration of the library, and
           then exits.

       -w
           Enables support for Windows code pages. Normally, xmlwf will report
           an error if it runs across an encoding that it is not equipped to
           handle itself. With -w, xmlwf will try to use a Windows code page.
           See also -e.

       -x
           Turns on parsing external entities.

           Non-validating parsers are not required to resolve external
           entities, or even expand entities at all. Expat always expands
           internal entities, but external entity parsing must be enabled
           explicitly.

           External entities are simply entities that obtain their data from
           outside the XML file currently being parsed.

           This is an example of an internal entity:

               <!ENTITY vers '1.0.2'>

           And here are some examples of external entities:

               <!ENTITY header SYSTEM "header-&vers;.xml">  (parsed)
               <!ENTITY logo SYSTEM "logo.png" NDATA PNG>   (unparsed)
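
           The difference between the two kinds of entity can be observed
           with Python's xml.parsers.expat module. In this sketch the
           internal entity is expanded into character data, while the parsed
           external entity is only reported to a handler that the caller
           must install. The document, the file name header.xml, and the
           helper name parse_entities are made up for the example, and
           nothing is actually fetched.

```python
import xml.parsers.expat

DOC = b"""<!DOCTYPE doc [
  <!ENTITY vers '1.0.2'>
  <!ENTITY header SYSTEM "header.xml">
]>
<doc>&vers;&header;</doc>"""


def parse_entities(data: bytes):
    """Return (expanded character data, system ids of external references)."""
    text, external_refs = [], []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = text.append

    def on_external(context, base, system_id, public_id):
        external_refs.append(system_id)  # record only; nothing is loaded
        return 1  # a nonzero result tells Expat to continue parsing

    parser.ExternalEntityRefHandler = on_external
    parser.Parse(data, True)
    return "".join(text), external_refs


if __name__ == "__main__":
    print(parse_entities(DOC))
```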

       --
           (Two hyphens.) Terminates the list of options. This is only needed
           if a filename starts with a hyphen. For example:

               xmlwf -- -myfile.xml

           will run xmlwf on the file -myfile.xml.
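
           A script that builds xmlwf command lines from arbitrary file
           names can add the terminator automatically. This is a small
           sketch; the function name build_xmlwf_argv is hypothetical.

```python
def build_xmlwf_argv(options, files):
    """Assemble an argument list, adding "--" if a file name starts with "-"."""
    argv = ["xmlwf", *options]
    if any(name.startswith("-") for name in files):
        argv.append("--")
    argv.extend(files)
    return argv


if __name__ == "__main__":
    print(build_xmlwf_argv(["-k"], ["-myfile.xml"]))
```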

       Older versions of xmlwf do not support reading from standard input.

OUTPUT

       xmlwf outputs nothing for files that are problem-free. If any input
       file is not well-formed, or if the output for any input file cannot be
       opened, xmlwf prints a single line describing the problem to standard
       output.

       If the -k option is not provided, xmlwf halts upon encountering a
       well-formedness or output-file error. If -k is provided, xmlwf
       continues processing the remaining input files, describing problems
       found with any of them.
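
       The halt-versus-continue behaviour can be modelled in a few lines of
       Python on top of xml.parsers.expat. This is a sketch of the control
       flow only; the function name check_documents and the message format
       are hypothetical and do not reproduce xmlwf's actual output.

```python
import xml.parsers.expat


def check_documents(documents, keep_going=False):
    """documents: iterable of (name, bytes). Returns one message per problem."""
    problems = []
    for name, data in documents:
        parser = xml.parsers.expat.ParserCreate()
        try:
            parser.Parse(data, True)
        except xml.parsers.expat.ExpatError as err:
            problems.append(f"{name}: {err}")
            if not keep_going:
                break  # default behaviour: halt at the first error
    return problems


if __name__ == "__main__":
    docs = [("a.xml", b"<a/>"), ("b.xml", b"<b>"), ("c.xml", b"<c></d>")]
    print(check_documents(docs))                   # stops after b.xml
    print(check_documents(docs, keep_going=True))  # also reports c.xml
```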

EXIT STATUS

       For option -v or -h, xmlwf always exits with status code 0. For other
       cases, the following exit status codes are returned:

       0
           The input files are well-formed and the output (if requested) was
           written successfully.

       1
           An internal error occurred.

       2
           One or more input files were not well-formed or could not be
           parsed.

       3
           If using the -d option, an error occurred opening an output file.

       4
           There was a command-line argument error in how xmlwf was invoked.
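
       A calling script can translate these codes into messages. The sketch
       below assumes that an xmlwf executable is available on the PATH when
       run_xmlwf is called; the names XMLWF_EXIT_MEANINGS,
       describe_exit_status and run_xmlwf are hypothetical.

```python
import subprocess

# Short summaries of the exit status codes listed above.
XMLWF_EXIT_MEANINGS = {
    0: "well-formed; output (if requested) written",
    1: "internal error",
    2: "not well-formed or could not be parsed",
    3: "error opening an output file (-d)",
    4: "command-line argument error",
}


def describe_exit_status(code: int) -> str:
    """Map an xmlwf exit status to a short description."""
    return XMLWF_EXIT_MEANINGS.get(code, f"unexpected exit status {code}")


def run_xmlwf(files):
    """Run xmlwf on files and return (status, description, standard output)."""
    result = subprocess.run(["xmlwf", "--", *files],
                            capture_output=True, text=True)
    return result.returncode, describe_exit_status(result.returncode), result.stdout
```

       Problem descriptions are captured from standard output here because
       that is where xmlwf prints them.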

BUGS

       The errors should go to standard error, not standard output.

       There should be a way to get -d to send its output to standard output
       rather than forcing the user to send it to a file.

       I have no idea why anyone would want to use the -d, -c, and -m options.
       If someone could explain it to me, I'd like to add this information to
       this manpage.

SEE ALSO

           The Expat home page:                            https://libexpat.github.io/
           The W3 XML 1.0 specification (fourth edition):  https://www.w3.org/TR/2006/REC-xml-20060816/
           Billion laughs attack:                          https://en.wikipedia.org/wiki/Billion_laughs_attack

AUTHOR

       This manual page was originally written by Scott Bronson
       <bronson@rinspin.com> in December 2001 for the Debian GNU/Linux(TM)
       system (but may be used by others). Permission is granted to copy,
       distribute and/or modify this document under the terms of the GNU Free
       Documentation License, Version 1.1.

AUTHOR

       Scott Bronson
           Author.

COPYRIGHT

       Copyright © 2001 Scott Bronson

[FIXME: source]                  March 4, 2022                        XMLWF(1)