xmlfy(1)                         User Commands                        xmlfy(1)

NAME
xmlfy - Convert to XML on the fly.

SYNOPSIS
xmlfy [OPTION]...

-h, --help
    print usage instructions

-v, --version
    print version number

--license
    print license

--debug
    print extra debugging information

Input options:

-F, --fieldseparator[<level>[b][:<scope>]] <string>
    specify a delimiter string token for the level specified

-R, --recordseparator <string>
    this is a synonym for "-F1 <string>"
    specify an alternative record separator string to the default

-C, --column[:<scope>] <c1>-<c2>[:<name>]
    create an input field from an input column range

-W, --regex[:<scope>] [E|B][i][l][r][U][n][b][e]/<pattern>/[<name>[,..]]
    create input fields from a regular expression

-e, --expelempty
    expel empty input records and fields

-E, --expel <input_records>[:<input_fields>]
    expel selected records or fields from being processed

-q, --quotedfields[2]
    treat fields that are between quotes as one field

-Q, --quotechars[2] <string>
    specify an array of quoting characters to use

-b, --blanklines
    do not ignore blank input records

-t, --trim
    trim leading and trailing white space from input fields

Output options:

-S, --schema <file>
-Sd, --schemadtd <file>
-Sr, --schemarnc <file>
-Sx, --schemaxsd <file>
    use a schema <file> for tag names and element control

-M, --matchdirect 0|<elementname>
    match directly on a specific element in the schema

-A, --attribute[<level>[:<scope>]] number|level
        |delimiter|timestamp|insert <name> <value>
    include attributes in the opening element tag

-T, --tag[<level>[:<scope>]] number|level
        |name <name>
        |[re]insert <name> <value>
        |[re]insertfile <name> <file>
        |[re]insertfilexml <indent> <file>
    modify or insert element tags

-k, --keyvaluepairs[<level>]
    generate key/value XML tag pairs

-l, --linenumbers
    this is a synonym for "-T1 number"
    include the line number in the line tag name

-f, --fieldnumbers
    this is a synonym for "-T2 number"
    include the field number in the field tag name

-L, --linetags
    include a line number tag with the record data

-X, --xmlformat [XML1.0|XML1.1]|[SOAP1.1|SOAP1.2]|[HTML table|list]
        |[UTF-8|UTF-16|UTF-16BE|UTF-16LE|UTF-32|UTF-32BE|UTF-32LE]|BOM
        |ASCIItoUTF|[noescape all|amp|lt|gt|quot|apos|brvbar]
        |trimtagclose|[newline dos|unix]
    specify an XML output format

-p, --printonly header|footer|rtagopen|rtagclose|records
    print only snippets of the XML output

-I, --identifier <system_identifier>
    specify an alternate system identifier of the doctype or SOAP URI

-s, --summary[2|c|n|f <file>]
    print a summary after the end of the processing

-U, --unxml
    undo the XML syntax leaving just plain text

--noxml
    do not XML-fy the input stream

DESCRIPTION
The xmlfy command reads stdin and writes it to stdout in XML format
using the supplied control directives.

Delimiter tokens and/or column selections are used to break the input
stream down into XML elements, which are then represented inside an XML
tree hierarchy that can span multiple depth levels. For example, command
line output was originally designed for text or CRT based processing.
The xmlfy command takes this text based output, where a new-line often
represents the end of a record and white space often represents a field
separator, and reformats it into XML output suitable for interfacing
with modern object oriented systems.

xmlfy is a powerful yet lightweight tool that primarily caters for
converting ASCII, UTF-8, UTF-16 or UTF-32 based output into XML format
on the fly, and for dealing with the common issues associated with this
kind of transformation.

The xmlfy command also supports a basic form of schema configuration,
allowing you to control the format of the XML output by supplying a
schema file as an option.

With no options supplied, xmlfy uses default values for its XML format.
The entire standard input is enclosed in an <xmlfy></xmlfy> pair, each
line of standard input is enclosed in a <line></line> pair, and each
field of each line is enclosed in a <field></field> pair.

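The default behaviour described above can be approximated with a short
Python sketch. This is only an illustration of the documented defaults,
not xmlfy's implementation: the function name xmlfy_default, the two-space
indentation and the escaping of reserved characters are assumptions made
for the example.

```python
from xml.sax.saxutils import escape


def xmlfy_default(text: str) -> str:
    """Wrap input in <xmlfy>, each line in <line>, each field in <field>."""
    out = ["<xmlfy>"]
    for record in text.split("\n"):      # default level 1 delimiter: new-line
        fields = record.split()          # default level 2 delimiter: white space
        if not fields:                   # blank records are ignored by default
            continue
        out.append("  <line>")
        out.extend(f"    <field>{escape(f)}</field>" for f in fields)
        out.append("  </line>")
    out.append("</xmlfy>")
    return "\n".join(out)


if __name__ == "__main__":
    print(xmlfy_default("alice 42\n\nbob <7>\n"))
```

The blank second line of the sample input produces no <line> element,
which mirrors the documented default of ignoring blank input records.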
OPTIONS
You can supply options to customise the behaviour of xmlfy at the
command line, or by a special token inside the schema file, or both.
NOTE: Options are resolved from left to right. If conflicting options
are specified then the last one takes precedence.

Option: -h, --help
The command line usage is printed in plain text format, not in XML
format.

Option: -v, --version
The version number is printed in plain text format, not in XML format.
If the version number is required in XML format, it is included with the
summary option.

Option: --license
Print all licenses used by xmlfy.

Option: --debug
Print extra debugging information to stderr to help debug xmlfy
behaviour.

Input options:

Option: -F, --fieldseparator[<level>[b][:<scope>]] <string>
Allows you to specify a delimiter string token for the level specified.
<level>  - The XML depth level to be delimited by <string>.
           Must be an integer value greater than or equal to 1.
           E.g. a value of 1 will split the input into records delimited
           by <string>, a value of 2 will split records into fields
           delimited by <string>, a value of 3 will split fields into
           subfields delimited by <string>, and so on.
           There is no space separating the option and the level value.
           If no level is specified then the given options will only
           apply to level 2.
b        - Use byte matching for the specified delimiter string.
           With this option the delimiter string is treated as a literal
           sequence of bytes. Normally command line arguments are
           presented to xmlfy as ASCII strings, and if a wide UTF encoding
           such as UTF-16 or UTF-32 is being used then xmlfy automatically
           converts the specified delimiter string to that encoding. With
           this option no encoding conversion takes place. In this mode
           you can also specify escaped decimal byte sequences inside the
           delimiter string. E.g. "\123\234\\"
<scope>  - A comma delimited set of sequence ranges with no spaces.
           The <scope> parameter has a sub form of <s1>[-<s2>][r][,..]
             <s1> - integer representing a start range.
             <s2> - integer or the $ token representing an end range.
             r    - restart the scope counter for this delimiter after
                    the completion of the associated range.
           Restrict the delimiter effectiveness to the occurrences
           specified in <scope>. If a delimiter <string> is encountered
           for the level specified and its sequence is not in the scope
           then it will not function as a field separator and will
           instead be treated as data.
           E.g. -F3:1-3,8 "." this is saying that level 3 fields will
           only be created for the 1st to 3rd, and 8th occurrences of
           the delimiter "." (period).
           The restart scope counter option r allows you to specify
           repeating scope sequences.
           E.g. -F1:2,5r "\n" this is saying create level 1 records out
           of every second and fifth lines and keep repeating this until
           the input is exhausted.
           When using multiple same level delimiters, restarting scope
           counters of the equivalent level and higher are reset whenever
           a delimiter match is applied.
           If a <scope> range is not specified then the delimiter
           function applies to every occurrence of <string> of the target
           level.
<string> - A sequence of characters or token to be used as a delimiter.
           Tokens specified literally as "\n", "\r", and "\t" are
           translated to their corresponding control character. If
           using wide UTF encoding then <string> is automatically
           converted to that encoding, otherwise you can use the byte
           matching option and specify escaped decimal byte sequences
           inside <string>.
o If the delimiter token is the same for a series of levels then the
  shallowest level takes precedence, unless the shallowest levels have
  been limited by scope restrictions. You can also make use of quotes in
  the input along with specifying quote options.
o The XML tree algorithm deepens in a sequential way, therefore you must
  set your delimiter levels as an unbroken sequence for them to be of
  any use; that is, you cannot split a level 2 field with a level 4
  delimiter string.
o Refer to the schema option section for information on level handling
  when a schema file is specified.
o Levels 1 and 2 are already set by default.
o The default level 1 delimiter token is NEWLINE (new-line).
o The default level 2 delimiter token is WHITESPACE (space, tab,
  new-line, carriage-return, vertical-tab and form-feed).
o The delimiters for levels 3 and above are unset.
o Only one delimiter string token can be specified per invocation,
  however this option can be invoked multiple times, allowing multiple
  delimiters to be used at the level specified. When specifying multiple
  same level delimiters, the longer delimiter strings are matched before
  the shorter ones. The delimiter string is not included in the output.

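The <scope> sub form <s1>[-<s2>][r][,..] can be modelled with a short
Python sketch. It is an interpretation of the text above rather than
xmlfy's own code: the names parse_scope, in_scope and split_scoped are
hypothetical, and the sketch assumes that the counter restarts after the
earliest range flagged with r. It handles a single delimiter only and
does not model the counter resets that apply when several same level
delimiters are combined.

```python
import math


def parse_scope(scope: str):
    """Parse '<s1>[-<s2>][r][,..]' into (start, end, restart) tuples."""
    ranges = []
    for part in scope.split(","):
        restart = part.endswith("r")
        if restart:
            part = part[:-1]
        start, _, end = part.partition("-")
        s1 = int(start)
        s2 = s1 if not end else (math.inf if end == "$" else int(end))
        ranges.append((s1, s2, restart))
    return ranges


def in_scope(ranges, occurrence: int) -> bool:
    """True if the 1-based occurrence number falls inside the scope."""
    # Assumption: the counter restarts after the earliest range flagged 'r'.
    periods = [s2 for _, s2, restart in ranges if restart and s2 != math.inf]
    if periods:
        occurrence = (occurrence - 1) % min(periods) + 1
    return any(s1 <= occurrence <= s2 for s1, s2, _ in ranges)


def split_scoped(text: str, delim: str, scope: str):
    """Split on delim, treating out-of-scope occurrences as data."""
    ranges = parse_scope(scope)
    pieces = text.split(delim)
    fields, current = [], ""
    for seen, piece in enumerate(pieces, start=1):
        current += piece
        if seen == len(pieces):          # no delimiter follows the last piece
            break
        if in_scope(ranges, seen):
            fields.append(current)
            current = ""
        else:
            current += delim             # keep the delimiter as field data
    fields.append(current)
    return fields


if __name__ == "__main__":
    print(split_scoped("1.2.3.4.5", ".", "1-2"))
```

With the scope 2,5r the sketch selects occurrences 2, 5, 7, 10, 12 and so
on, which matches the "every second and fifth" wording of the -F1:2,5r
example.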
Option: -R, --recordseparator <string>
This is a synonym for "-F1 <string>"
Allows you to specify a record separator string token that is different
from the default. The default record separator token is NEWLINE
(new-line).

Option: -C, --column[:<scope>] <c1>-<c2>[:<name>]
Use a column range of the input record to generate an input field. This
is an alternative to capturing input fields with delimiters.
<scope> - A comma delimited set of sequence ranges with no spaces.
          The <scope> parameter has a sub form of <s1>[-<s2>][r][,..]
            <s1> - integer representing a start range.
            <s2> - integer or the $ token representing an end range.
            r    - restart the scope counter for this column option after
                   the completion of the associated range.
          Restrict the column option effectiveness to the occurrences
          specified in <scope>. If the input record sequence is not in
          the scope then the column option will not be applied and
          input fields will not be generated.
          The restart scope counter option r allows the scope sequences
          to continually repeat themselves. E.g. -C:1-3,5r 1-20 this is
          saying capture column fields of 20 characters in length for
          every first to third and fifth input records, and keep
          repeating this until the input is exhausted.
          If a <scope> range is not specified then the column option
          applies to all input records.
<c1>    - Integer or the $ token representing the start column of the
          input field.
<c2>    - Integer or the $ token representing the end column of the
          input field.
<name>  - Optional string value that will be used to override the tag
          name for this input field.
          Almost anything can be specified as a tag name, including
          illegal XML, therefore user discretion is advised.
          Only applicable for changing default behaviour (i.e. when the
          --schema option is NOT specified).
o Specifying field separators of level 2 and above together with this
  option is a conflict and will produce a usage error.
o The number of times and the order in which this option is specified
  (in conjunction with the -W option) determines the number of input
  fields generated and their order.
o Column ranges represent code points (characters), meaning any multi
  byte character accounts for just one column position.
o Multiple options can use non linear ranges and can overlap e.g.
  -C 5-10:part -C 1-$:whole
o Ranges that exceed the size of the input record will not process
  beyond the end of the input record.
o You can use single or double quotes to protect the range from the
  shell interpreter e.g. -C '80-$:text'
o Only one parameter pair can be specified per invocation, however this
  option can be invoked multiple times.

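The column rules above (1-based inclusive ranges counted in code points,
the $ token, and ranges clipped at the end of the record) can be
illustrated with a small Python sketch. The helper name column_field and
the fallback tag name "field" are assumptions for the example, and the
sketch ignores <scope> entirely.

```python
def column_field(record: str, spec: str):
    """Return (tag name, text) for a spec such as '5-10:part' or '1-$'."""
    columns, _, name = spec.partition(":")
    last = len(record)                   # '$' is taken to mean the last column
    c1, c2 = (last if tok == "$" else int(tok) for tok in columns.split("-"))
    # Python strings index by code point, and slicing clips at the end of
    # the record, which matches the two bullet points on those topics.
    return (name or "field", record[c1 - 1:c2])


if __name__ == "__main__":
    record = "na\u00efve caf\u00e9 menu"
    for spec in ("1-5:first", "7-10", "12-$:tail", "12-99"):
        print(spec, column_field(record, spec))
```

Because the accented characters each count as one column, the range 7-10
selects the four characters of the second word even though they occupy
five bytes in UTF-8.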
Option: -W, --regex[:<scope>] [E|B][i][l][r][U][n][b][e]/<pattern>/[<name>[,..]]
Use a regular expression on the input record to generate input fields.
This is an alternative to capturing input fields with delimiters.
<scope>   - A comma delimited set of sequence ranges with no spaces.
            The <scope> parameter has a sub form of <s1>[-<s2>][r][,..]
              <s1> - integer representing a start range.
              <s2> - integer or the $ token representing an end range.
              r    - restart the scope counter for this regex option
                     after the completion of the associated range.
            Restrict the regex option effectiveness to the occurrences
            specified in <scope>. If the input record sequence is not in
            the scope then the regex option will not be applied and input
            fields will not be generated.
            The restart scope counter option r allows the scope sequences
            to continually repeat themselves. E.g. -W:1-3,5r
            /(^A.*).*(B.*$)/ this is saying capture two regex fields for
            every first to third and fifth input records, and keep
            repeating this until the input is exhausted.
            If a <scope> range is not specified then the regex option
            applies to all input records.
E         - flag to use Extended Regular Expressions in <pattern> (default).
B         - flag to use Basic Regular Expressions in <pattern>.
i         - flag to ignore case.
l         - flag to treat <pattern> as a literal.
r         - flag to make concatenation right associative.
U         - flag to make operators ungreedy by default.
n         - flag to give '\n' special meaning (REG_NEWLINE).
b         - flag to set '^' as not beginning-of-line (REG_NOTBOL).
e         - flag to set '$' as not end-of-line (REG_NOTEOL).
<pattern> - A POSIX 1003.2 compliant Regular Expression pattern utilising
            zero or more parenthesis pairs to capture input fields.
<name>    - Optional string value that will be used to override the tag
            name for input fields derived from pattern matches.
            A comma separated list of <name> can be specified, with the
            last entry being re-used if more input fields than names are
            generated.
            Almost anything can be specified as a tag name, including
            illegal XML, therefore user discretion is advised.
            Only applicable for changing default behaviour (i.e. when the
            --schema option is NOT specified).
o Specifying field separators of level 2 and above together with this
  option is a conflict and will produce a usage error.
o The number of times and the order in which this option is specified
  (in conjunction with the -C option) determines the number of input
  fields generated and their order.
o If matches are not made for all parenthesis pairs specified in
  <pattern> then no output will result.
o If no parenthesis pairs are specified in <pattern> then the entire
  input record will be used as the output when a pattern match occurs.
o Wide UTF encoding can be specified in <pattern> by using the \x
  literal followed by two hexadecimal digits to represent any byte inside
  the code-point e.g. \x0b.
o For further information on using regex syntax and its flags please
  consult the TRE web documentation.
o You can use single or double quotes to protect <pattern> from the
  shell interpreter e.g. -W 'iU/(^Pam .*)/pams'
o You can specify the percentage character % as an alternative separator
  to forward-slash / for <pattern> so long as it remains paired.
o Only one parameter pair can be specified per invocation, however this
  option can be invoked multiple times.

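The capture rules in the bullets above can be sketched in Python. This is
an approximation only: it uses Python's re module, whose syntax differs
from the POSIX/TRE expressions xmlfy accepts, and the function name
regex_fields and the fallback tag name "field" are assumptions for the
example.

```python
import re


def regex_fields(record: str, pattern: str, names=(), flags=0):
    """Return (tag name, text) pairs captured from one input record."""
    match = re.search(pattern, record, flags)
    if match is None:
        return []
    groups = match.groups()
    if not groups:
        values = [record]          # no parentheses: the whole record is used
    elif any(g is None for g in groups):
        return []                  # not every parenthesis pair matched
    else:
        values = list(groups)
    labels = list(names) or ["field"]
    # The last name is re-used when there are more fields than names.
    return [(labels[min(i, len(labels) - 1)], v) for i, v in enumerate(values)]


if __name__ == "__main__":
    print(regex_fields("Pam Smith 42", r"(^\w+) (\w+) (\d+)", ["first", "last"]))
```

Passing re.IGNORECASE as flags plays a role similar to the i flag; the
other xmlfy flags have no direct counterpart in this sketch.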
Option: -e, --expelempty
Expel input fields that are empty (zero bytes in length) from being
processed. The use of multi level and multiple same level delimiters
can sometimes yield plenty of empty fields, which may be undesirable.
This option expels all the empty input fields from being processed by
the output processor. All levels are examined and any input records
comprised entirely of empty fields are also expelled.
This option always runs before any expelling tasks specified with
option -E are run.
This option has no influence on levels subjected to key/value pairing,
as that process has its own way of dealing with empty fields at its
target levels.
If a schema is used then the number of input records/fields available
for element matching is reduced accordingly.

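As a rough illustration of the effect of -e on two levels (records and
fields), the following Python sketch drops empty fields and then drops
any record left with no fields. The function name expel_empty is
hypothetical, and deeper levels and key/value pairing are not modelled.

```python
def expel_empty(records):
    """Remove empty fields, then records made up entirely of empty fields."""
    kept = []
    for fields in records:
        non_empty = [f for f in fields if f != ""]
        if non_empty:
            kept.append(non_empty)
    return kept


if __name__ == "__main__":
    # Records as they might look after splitting "a,,b", ",," and "c," on ","
    parsed = [["a", "", "b"], ["", "", ""], ["c", ""]]
    print(expel_empty(parsed))
```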
Option: -E, --expel <input_records>[:<input_fields>]
Expel selected input records, or selected input fields of selected input
records, from being processed. Each input record is checked against the
expel criteria and if a match occurs then these input records or input
fields are discarded rather than being passed on to the xmlfy output
processor.
<input_records> - A comma delimited set of input record expel criteria
          with no spaces.
          The <input_records> parameter has a sub form of
          <range_type><r1>[-<r2>][/<string>/][,..]
          Where <range_type> can be 'n', 'f' or 'c'.
            n        - the associated range refers to input record
                       numbers.
            f        - the associated range refers to input field
                       numbers.
            c        - the associated range refers to input record
                       character lengths.
            <r1>     - integer representing a start range.
            <r2>     - integer or the $ token representing an end range.
            <string> - the specified <string> must also exist within the
                       range.
          Expel criteria types can be intermixed.
          E.g. -E n10-$,f7-8,f4/Mercedes/,c10-20,c1-15/SUV/
          this is saying that input records whose record number is
          greater than or equal to 10, AND input records whose total
          number of fields are between 7 and 8, AND input records whose
          4th input field contains the string "Mercedes", AND input
          records whose input record length is greater than or equal to
          10 but less than or equal to 20 characters, AND input records
          whose first 15 characters contain the string "SUV", will
          finally match the input record expel criteria.
          In this release you can only specify the $ token (last input
          record) in a paired range and not on its own.
          Generally xmlfy can figure out where the search string
          delimiters would likely occur, however you can specify the %
          character as an alternative separator to / for <string> so
          long as it remains paired.
          If an <input_fields> criteria is not specified then the entire
          input record is expelled.
<input_fields> - A comma delimited set of field number ranges with no
          spaces.
          The <input_fields> parameter has a sub form of
          <r1>[-<r2>][,..]
            <r1> - integer or the $ token representing a start range.
            <r2> - integer or the $ token representing an end range.
          Discard select input fields of the input records that match
          the expel criteria before passing on to the xmlfy output
          processor.
          E.g. -E n2-$:1,$ this is saying that input records whose
          record number is greater than or equal to 2 will have their
          first and last fields expelled.
          You can specify the $ token (last input field) in a paired
          range or on its own.
o You can use single or double quotes to protect the range from the
  shell interpreter e.g. -E 'n2-$:$'
o If a schema is used then the number of input records/fields available
  for element matching is reduced accordingly.
o Only one parameter group can be specified per invocation, however this
  option can be invoked multiple times with resolution occurring from
  left to right.

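The field-level example -E n2-$:1,$ can be pictured with the following
Python sketch. It models only that one form (an open ended record number
range combined with single field numbers); the function name expel_fields
and its parameters are hypothetical, and the f, c and /<string>/ criteria
are deliberately left out.

```python
def expel_fields(records, first_record: int, field_specs):
    """Drop the listed fields from every record numbered >= first_record."""
    result = []
    for number, fields in enumerate(records, start=1):
        if number >= first_record:
            # '$' stands for the last field of the current record.
            drop = {len(fields) if s == "$" else int(s) for s in field_specs}
            fields = [f for i, f in enumerate(fields, start=1) if i not in drop]
        result.append(fields)
    return result


if __name__ == "__main__":
    rows = [["h1", "h2", "h3"], ["a", "b", "c"], ["d", "e", "f", "g"]]
    # Equivalent in spirit to: -E 'n2-$:1,$'
    print(expel_fields(rows, 2, ["1", "$"]))
```

The first record is left untouched because its record number falls
outside the n2-$ range.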
Option: -q, --quotedfields[2]
Treat fields that are quoted as one field. Normally xmlfy parses fields
by their delimiter, e.g. WHITESPACE; this option allows text containing
several delimiters to be treated as a single field by quoting it. By
default the quoted field may only span the current input record, unless
the -q2 option is specified, in which case the quoted field can span
multiple input records.
Quotes are not included in the field, and any leading/trailing text
outside the field's quotes is truncated.
If quotes are not closed, xmlfy will extend the field until the end of
the input record or, if option -q2 is specified, until the input is
exhausted (EOF).
The default quote character is a double quote (").

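The quoting rules above can be sketched as a small tokenizer in Python.
It is an interpretation of the description, not xmlfy's parser: the name
split_quoted is hypothetical, the field delimiter is fixed to white
space, and only the single record behaviour of -q is modelled (not -q2).

```python
def split_quoted(record: str, quote: str = '"'):
    """Split on white space, keeping quoted text together as one field."""
    fields, buf = [], []
    in_quote = quoted = False
    for ch in record:
        if in_quote:
            if ch == quote:
                in_quote = False      # closing quote; later text is dropped
            else:
                buf.append(ch)
        elif ch == quote and not quoted:
            in_quote = quoted = True
            buf = []                  # text before the opening quote is dropped
        elif ch.isspace():
            if buf or quoted:
                fields.append("".join(buf))
            buf, quoted = [], False
        elif not quoted:
            buf.append(ch)            # ordinary unquoted character
    if buf or quoted:
        fields.append("".join(buf))   # an unclosed quote runs to end of record
    return fields


if __name__ == "__main__":
    print(split_quoted('say "hello world" now'))
    print(split_quoted('x"a b"y z'))
```

In the second call the leading x and trailing y are discarded, following
the statement that text outside the field's quotes is truncated.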
Option: -Q, --quotechars[2] <string>
Specify a string of characters that can be used as quoting characters.
<string> - an array of quoting characters.
o If field quoting is enabled then any input character that matches any
  character in <string> will toggle the quoting function, unless the
  -Q2 option is specified, in which case characters in <string>
  represent paired quotes, with odd numbered characters in this array
  toggling the open quote function and their corresponding pair toggling
  the close quote function. This allows parentheses, brackets, etc. to
  be used as quotes.
o When specifying this option, care must be taken to prevent the shell
  from interpreting the supplied quote characters. When using a schema
  file containing this option you can specify quote characters by
  escaping them with the backslash "\" character.

Option: -b, --blanklines
Normally xmlfy ignores blank lines or empty level 1 records in the
input stream. This option tells xmlfy not to ignore these blank lines
and to print XML line record tags with no elements inside them.
In this mode blank lines count towards line numbers.

Option: -t, --trim
Field elements are trimmed of leading and trailing white space.

Output options:

Option: -S, --schema <file>
        -Sd, --schemadtd <file>
        -Sr, --schemarnc <file>
        -Sx, --schemaxsd <file>
Specify a schema <file> for controlling the XML output.
<file> - The schema file must comply with either the Document Type
         Definition (.dtd) language, or the RELAX NG Compact (.rnc)
         language, or the XML Schema Document (.xsd) language, however
         xmlfy does not support the finer aspects of these schema
         languages at this early stage.
o When all input fields of the input record have been identified, xmlfy
  will match them against the elements inside the tree hierarchy of the
  schema file, and if a match is found then xmlfy will print an output
  record using the matching schema tree hierarchy as its XML structure.
  Option -S, --schema uses the case-insensitive file name extension
  (.dtd or .rnc or .xsd) of <file> to determine which schema
  interpreter xmlfy will apply.
  Option -Sd, --schemadtd forces xmlfy to use the DTD schema
  interpreter on <file>.
  Option -Sr, --schemarnc forces xmlfy to use the RNC schema
  interpreter on <file>.
  Option -Sx, --schemaxsd forces xmlfy to use the XSD schema
  interpreter on <file>.
o You can specify multi level delimiters when using this option, however
  any delimiters greater than level 2 are only used to identify more
  input fields and are not used at all in altering the XML tree
  hierarchy, which is dictated by the schema file. Fields with levels of
  2 and above are flattened to be just plain fields of the input record.
  This is very different to the default behaviour where field levels
  form the XML tree hierarchy.
o If a schema option is not supplied then xmlfy will use default values
  for tag names and element control.
o For further information on how to write a schema for xmlfy please
  consult the web documentation.

Option: -M, --matchdirect 0|<elementname>
Match directly on a specific element in the schema, making it the root
element.
0             - A token representing the default root element in the
                schema.
<elementname> - The name of a record element in the schema.
o This option alters the way the selected schema element is matched
  against the available input fields that were generated. In this mode
  the target element is matched in its entirety using its element
  helper and printed accordingly. This is very different to the
  default legacy mode whereby only the record elements of the root
  element get matched in a continuously sequential way.
o Regardless of what wildcard attributes exist for the target element,
  it will only be printed once as a root element.
o If a schema file is not specified then this option will be ignored.

Option: -A, --attribute[<level>[:<scope>]] number|level
            |delimiter|timestamp|insert <name> <value>
Include attributes in the opening element tag for the level specified.
<level>   - The XML depth level to be modified.
            Must be an integer value greater than or equal to 0.
            E.g. a value of 1 will apply attributes to each opening
            record element and a value of 2 will apply attributes to each
            opening field element.
            There is no space separating the option and the level value.
            If no level is specified then the given options will apply to
            all levels except level 0.
<scope>   - A comma delimited set of sequence ranges with no spaces.
            The <scope> parameter has a sub form of <s1>[-<s2>][r][,..]
              <s1> - integer representing a start range.
              <s2> - integer or the $ token representing an end range.
              r    - restart the scope counter for this attribute after
                     the completion of the associated range.
            Restrict the custom attribute effectiveness to the
            occurrences specified in <scope>. If the element sequence is
            not in the scope then the custom attribute will not be
            applied.
            The restart scope counter option r allows the scope sequences
            to continually repeat themselves. E.g. -A2:1-3,5r insert x y
            this is saying insert custom attributes x="y" for every first
            to third and fifth level 2 elements, and keep repeating this
            until the output is exhausted.
            Scope sequence counters are always reset to zero for the next
            element depth level and higher whenever a deeper XML depth
            level is entered into.
            If a <scope> range is not specified then the custom attribute
            function applies to all elements at the specified <level>.
number    - Specify the sequence number as an element attribute.
            E.g. <field> becomes <field number="1"> and the next <field>
            becomes <field number="2"> and so on.
            Scoping is not supported.
            Not supported for level 0.
level     - Specify the level as an element attribute.
            E.g. <field> becomes <field level="2">
            Scoping is not supported.
            Not supported for level 0.
delimiter - Specify the matching delimiter as an element attribute.
            E.g. <field> becomes <field delimiter="ABC">
            Delimiter string tokens that contain illegal XML characters
            are printed as their hex pair equivalent.
            When using a schema file only level 1 records and field
            elements will have their delimiter attributes printed.
            Scoping is not supported.
            Not supported for level 0.
timestamp - Include a timestamp as an element attribute.
            Two timestamps are provided, one for humans and one for
            machines. The times are stamped at element print time.
            E.g. <field> becomes <field timestamp_date="Fri May 5
            10:23:33 2008" timestamp_sec="123456790">
            Scoping is not supported.
insert <name> <value> - Insert a custom element attribute.
            The parameters <name> and <value> are combined to form an
            element attribute with <value> wrapped in double quotes.
            E.g. <field> becomes <field name="value">
            Almost anything can be specified as an attribute name and
            value, including illegal XML, therefore user discretion is
            advised.
o Only one parameter group can be specified per invocation, however this
  option can be invoked multiple times.

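The number, level and insert forms of -A can be pictured with a short
Python sketch that builds an opening tag. The function name open_tag and
its keyword parameters are hypothetical, the order in which combined
attributes appear is an assumption, and values are written verbatim
inside double quotes without validation, as the text above describes.

```python
def open_tag(name, level, number, number_attr=False, level_attr=False, inserts=()):
    """Build an opening tag carrying the requested attributes."""
    attrs = []
    if number_attr:                    # -A<level> number
        attrs.append(("number", str(number)))
    if level_attr:                     # -A<level> level
        attrs.append(("level", str(level)))
    attrs.extend(inserts)              # -A<level> insert <name> <value>
    rendered = "".join(f' {key}="{value}"' for key, value in attrs)
    return f"<{name}{rendered}>"


if __name__ == "__main__":
    for n in (1, 2):
        print(open_tag("field", 2, n, number_attr=True))
    print(open_tag("field", 2, 1, level_attr=True, inserts=[("x", "y")]))
```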
598 Option: -T, --tag[<level>[:<scope>]] number|level
599 |name <name>
600 |[re]insert <name> <value>
601 |[re]insertfile <name> <file>
602 |[re]insertfilexml <indent> <file>
603 Modify or insert element tags for the level specified.
604 <level> - The XML depth level to be modified.
605 Must be an integer value greater than or equal to 0.
606 E.g. a value of 1 will modify the tag name for each record
607 and a value of 2 will modify the tag name for each field.
608 There is no space separating the option and the level value.
609 If no level is specified then the given options will apply to
610 all levels except level 0.
611 <scope> - A comma delimited set of sequence ranges with no spaces.
612 The <scope> parameter has a sub form of <s1>[-<s2>][r][,..]
613 <s1> - integer representing a start range.
614 <s2> - integer or the $ token representing an end range.
615 r - restart the scope counter for this tag after the com‐
616 pletion of the associated range.
617 Restrict the custom tag effectiveness to the occurrences
618 specified in <scope>. If the element sequence is not in the
619 scope then the custom tag will not be applied.
620 The restart scope counter option r allows the scope sequences
621 to continually repeat themselves. E.g -T2:1-3,5r insert x y
622 this is saying insert the custom tag <x>y</x> before every
623 first to third and fifth level 2 elements, and keep repeating
624 this until the output is exhausted.
625 Scope sequence counters are always reset to zero for the next
626 element depth level and higher whenever a deeper XML depth
627 level is entered into.
628 If a <scope> range is not specified then the custom tag func‐
629 tion applies to all elements at the specified <level>.
630 number - Suffix the tag name with its sequence number.
631 E.g. <line> becomes <line1> and the next <line> becomes
632 <line2> and so on.
633 Scoping is not supported.
634 Not supported for level 0.
635 level - Prefix the tag name with its level.
636 E.g. <field> becomes <L2field>
637 Scoping is not supported.
638 Not supported for level 0.
639 name <name> - Change the tag name from the default to <name>
640 Only applicable for changing default behaviour (i.e.
641 when the --schema option is NOT specified).
642 E.g. <field> becomes <word>
643 You can pretty much specify anything as a tag name
644 including illegal XML therefore user discretion is
645 advised.
646 Scoping is not supported.
647 [re]insert <name> <value> - Insert a custom element tag.
648 The parameters <name> and <value> are com‐
649 bined to form an element tag with <value>
650 wrapped between <name> tag pairs. E.g
651 <name>value</name>
652 The inserted element appears before any
653 output elements for the level specified.
654 The reinsert feature keeps applying itself
655 at the level specified.
656 You can pretty much specify anything as an
657 element name and value including illegal
658 XML therefore user discretion is advised.
659 Not supported for level 0.
660 [re]insertfile <name> <file>
661 -
662 Insert a custom element tag containing
663 contents of a file.
664 The contents of <file> are wrapped
665 between <name> tag pairs.
666 The encoding of <file> must match the
667 output encoding being used otherwise an
668 undesirable output will result.
669 Any BOM found in <file> is removed.
670 Any reserved XML characters in <file>
671 are escaped, and newlines are corrected.
672 The inserted element appears before any
673 output elements for the level specified.
674 The reinsert feature keeps applying
675 itself at the level specified.
676 You can pretty much specify anything as
677 an element name including illegal XML
678 therefore user discretion is advised.
679 Not supported for level 0.
680 [re]insertfilexml <indent> <file> - Insert contents of an XML file.
681 The entire contents of <file> are
682 inserted before any output elements
683 for the level specified.
684 The encoding of <file> must match
685 the output encoding being used oth‐
686 erwise an undesirable output will
687 result.
688 Any BOM found in <file> is removed.
689 If the parameter <indent> is an
690 integer value greater than or equal
691 to zero then the contents of file
692 are indented by this amount, any
693 XML prologue is removed, and new‐
694 lines are corrected.
695 If the parameter <indent> is the
696 value "raw" then the XML file is
697 inserted as is without its BOM.
698 The reinsert feature keeps applying
699 itself at the level specified.
700 You can pretty much insert anything
701 as XML file content including ille‐
702 gal XML therefore user discretion
703 is advised.
704 o Only one parameter group can be specified per invocation; however,
705 this option can be invoked multiple times.
706
707 Option: -k, --keyvaluepairs[<level>]
708 Switch on the generation of key/value XML tag pairs for the output.
709 <level> - The XML depth level to be modified.
710 Must be an integer value greater than or equal to 2.
711 There is no space separating the option and the level value.
712 If no level is specified then the option will apply to all
713 levels except levels 0 and 1.
714 o In this mode the data of the first field of the current XML level
715 becomes the tag name for that level, that is, it becomes the key, and
716 any subsequent fields become its value.
717 o This key/value pairing continues down the XML tree hierarchy for all
718 the XML levels specified.
719 o You can pretty much generate anything as a tag name including illegal
720 XML therefore user discretion is advised. The new tag name is trimmed
721 of leading and trailing white space and white space between text is
722 replaced with the underscore "_" character.
723 o If a blank field becomes a tag name candidate then xmlfy will skip it
724 and search along the same level for a more suitable candidate. This
725 behaviour can be mitigated by using the -b option which will force
726 the default tag name to be substituted instead.
727 o Only applicable for changing default behaviour (i.e. when the
728 --schema option is NOT specified).
729 o This option can be invoked multiple times.
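
The tag name rules above can be illustrated with a short Python sketch. It is not xmlfy's code: the function names are hypothetical, and collapsing a run of internal white space into a single underscore is an assumption, since the text does not say how runs are handled.

```python
import re

def tag_name_from_field(text):
    """Trim the field and replace internal white space with "_"."""
    # Assumption: a run of white space becomes one underscore.
    return re.sub(r"\s+", "_", text.strip())

def pick_key(fields):
    """Return (key, remaining_fields), skipping blank key candidates."""
    for i, field in enumerate(fields):
        if field.strip():
            # The first non-blank field becomes the key; the rest are its value.
            return tag_name_from_field(field), fields[i + 1:]
    return None, []

key, rest = pick_key(["", " disk usage ", "42%"])
print(key, rest)
```

The blank first field is skipped here, which mirrors the default behaviour described above; the -b substitution of the default tag name is not modelled.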
730
731 Option: -l, --linenumbers
732 This is a synonym for "-T1 number"
733 Include the line number in the line tag name
734
735 Option: -f, --fieldnumbers
736 This is a synonym for "-T2 number"
737 Include the field number in the field tag name
738
739 Option: -L, --linetags
740 Insert a line number tag within the XML formatted output.
741 This is an alternative way of numbering your XML records. E.g. for the
742 first line record of XML output the following tag is inserted:
743 <linenumber>1</linenumber> and so on.
744
745 Option: -X, --xmlformat [XML1.0|XML1.1]|[SOAP1.1|SOAP1.2]
746 |[HTML table|list]
747 |[UTF-8|UTF-16|UTF-16BE|UTF-16LE|UTF-32|UTF-32BE|UTF-32LE]|BOM
748 |ASCIItoUTF|[noescape all|amp|lt|gt|quot|apos|brvbar]
749 |trimtagclose|[newline dos|unix]
750 Allows you to specify the XML format to be used for the output.
751 XML1.0 - Generate XML 1.0 output (this is the default).
752 XML1.1 - Generate XML 1.1 output.
753 SOAP1.1 - Generate XML SOAP 1.1 output.
754 SOAP1.2 - Generate XML SOAP 1.2 output.
755 HTML - Generate HTML output.
756 table - elements are displayed in table format.
757 list - elements are displayed in list format.
758 UTF-8 - Generate UTF-8 output encoding (default).
759 UTF-16 - Generate UTF-16 output encoding.
760 UTF-16BE - Generate UTF-16BE (big-endian) output encoding.
761 UTF-16LE - Generate UTF-16LE (little-endian) output encoding.
762 UTF-32 - Generate UTF-32 output encoding.
763 UTF-32BE - Generate UTF-32BE (big-endian) output encoding.
764 UTF-32LE - Generate UTF-32LE (little-endian) output encoding.
765 BOM - Generate and interpret a Byte-Order-Mark.
766 ASCIItoUTF - Convert ASCII input to wide UTF encoding.
767 noescape - Do not escape select reserved XML characters. By default
768 xmlfy will escape reserved XML characters that appear in the
769 input stream and this option provides an adjustment to this
770 behaviour.
771 all - do not escape any characters.
772 amp - do not escape the character & (ampersand).
773 lt - do not escape the character < (less-than).
774 gt - do not escape the character > (greater-than).
775 quot - do not escape the character " (quote).
776 apos - do not escape the character ' (apostrophe).
777 brvbar - do not escape the character | (broken vertical
778 bar).
779 trimtagclose - Truncate superfluous characters from the closing tag
780 name.
781 newline - Select the line ending format for XML meta-data.
782 dos - use carriage-return and new-line ("\r\n") for line end‐
783 ings.
784 unix - use new-line ("\n") for line endings.
785 o The XML1.1 option only changes the prologue version string to "1.1";
786 the output is otherwise the same as for XML1.0.
787 o When using the SOAP* options, the normal XML output generated by
788 xmlfy is encapsulated in a SOAP Envelope and SOAP Body, the root tag
789 defines a namespace prefix of "x" with a URI reference that can be
790 adjusted with the -I option, and all children elements (records and
791 fields) use this prefix name.
792 A non-mandatory administrative header element with a prefix name of
793 "xh" is provided containing program and execution details.
794 The SOAP* options are only a basic implementation for generating a
795 simple XML SOAP envelope containing xmlfy data. There is no further
796 scope provided for SOAP Headers, SOAP Faults, transaction or protocol
797 handling.
798 o When using the HTML option, the normal XML output generated by xmlfy
799 is displayed in either a table or list layout and encapsulated in a
800 HTML Body, of which the document title can be adjusted with the -I
801 option.
802 o The UTF-* options tell xmlfy to use the specified encoding for all
803 its XML meta-data (element tags, element attributes, prologues, etc).
804 Other than the ASCIItoUTF option, no transformation of the input
805 stream is performed and xmlfy assumes that the encoding used by the
806 input stream matches the encoding specified, otherwise an undesirable
807 output will result containing different encodings between the input
808 data and XML meta-data.
809 If specifying the UTF-16 or UTF-32 parameter and the BOM option is
810 either not specified or there is no BOM in the input stream then
811 encoding in big-endian format will be assumed.
812 o The BOM (Byte-Order-Mark) option will force xmlfy to handle the BOM
813 in the input stream if it is there, and also generate a BOM in the
814 output stream. If specifying the BOM option and a BOM is found in the
815 input stream then that will override any user specified encoding
816 option.
817 The BOM byte sequence used for UTF-8 is 0xef 0xbb 0xbf (U+FEFF).
818 The BOM byte sequence used for UTF-16BE is 0xfe 0xff (U+FEFF).
819 The BOM byte sequence used for UTF-16LE is 0xff 0xfe (U+FEFF).
820 The BOM byte sequence used for UTF-32BE is 0x00 0x00 0xfe 0xff
821 (U+FEFF).
822 The BOM byte sequence used for UTF-32LE is 0xff 0xfe 0x00 0x00
823 (U+FEFF).
824 o The ASCIItoUTF option when used in conjunction with one of the UTF-*
825 options will process ASCII input and convert it to the wide UTF
826 encoding specified.
827 o The noescape options control which reserved XML characters should not
828 be escaped.
829 o The trimtagclose option trims back the closing tag from the first
830 white space character found. Some options allow the user to define
831 anything as a tag name including tag names that have element
832 attributes (non normal approach). Using this option under these cir‐
833 cumstances will prevent these element attributes from appearing in
834 the close tag.
835 o The newline option adjusts the line ending format used for XML
836 meta-data. On Unix platforms the default is unix and on Win32 plat‐
837 forms the default is dos. Only applies to XML meta-data output and
838 does not do conversion of newline characters found in the input
839 stream.
840 o Only one parameter group can be specified per invocation; however,
841 this option can be invoked multiple times.
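
BOM handling can be pictured with the following Python sketch. It only uses the byte sequences listed above and is not xmlfy's implementation; the names BOMS and detect_bom are hypothetical.

```python
# Byte sequences as listed above. Longer sequences are tested first because
# the UTF-32LE BOM begins with the same two bytes as the UTF-16LE BOM.
BOMS = [
    (b"\x00\x00\xfe\xff", "UTF-32BE"),
    (b"\xff\xfe\x00\x00", "UTF-32LE"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
]

def detect_bom(data):
    """Return (encoding, bom_length), or (None, 0) when no BOM is present."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0

encoding, length = detect_bom(b"\xff\xfea\x00b\x00")
print(encoding, length)
```

A UTF-16LE stream whose first character after the BOM is U+0000 would be reported as UTF-32LE by this sketch; how xmlfy resolves that case is not stated here.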
842
843 Option: -p, --printonly header|footer|rtagopen|rtagclose|records
844 Allows you to just print XML snippets to the output.
845 This is useful when you want to execute xmlfy multiple times to con‐
846 struct a single XML output file.
847 header - Will only print the prologue, doctype, opened SOAP Envelope
848 and Body tags, the SOAP Header tag, HTML headers, and the BOM.
849 footer - Will only print closed SOAP Envelope and Body tags, and closed
850 HTML tags.
851 rtagopen - Will only print an opened root element tag.
852 rtagclose - Will only print a closed root element tag.
853 records - Will only print record elements and their field elements.
854 o Only one parameter can be specified per invocation; however, this
855 option can be invoked multiple times.
856
857 Option: -I, --identifier <system_identifier>
858 Allows you to specify your own system identifier of the doctype should
859 you not be content with what xmlfy has specified.
860 system_identifier - An array of characters used to override the default
861 system identifier.
862 You can pretty much specify anything as a system
863 identifier including illegal XML therefore user
864 discretion is advised.
865 o By default xmlfy will use the string "xmlfy.dtd", or if specifying a
866 schema, use the schema filename as the system identifier.
867 o You can also use this option to override the default SOAP namespace
868 URI value for the root element when using the XML SOAP format
869 options.
870 o You can also use this option to override the document title in the
871 HTML header when using the XML HTML format options.
872
873 Option: -s, --summary[2|c|n|f <file>]
874 When all input is exhausted an XML summary element is printed at the
875 bottom providing a brief summary of what xmlfy processed.
876 2 - Print the summary element to stderr instead.
877 c - Print the summary element as an XML comment.
878 n - Print the summary element without calculating any message
879 digests.
880 f <file> - Print the summary element to <file>.
881 By default MD5 and SHA512 checksum elements are provided inside the
882 summary called md5_input, md5_output, sha512_input and sha512_output.
883 The md5_input and sha512_input checksums are a digest of all the input
884 that was actually processed including any input BOM. The md5_output and
885 sha512_output checksums are a digest of all the output including any
886 output BOM that precedes the XML summary element. To correctly validate
887 the output result against the output checksum you must first remove any
888 summary element and summary comments from the output result.
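
Checking the output checksums by hand could look like the Python sketch below. It assumes the caller has already removed the summary element and any summary comments from the captured output, because the exact layout of the summary element is not described here; the function name is hypothetical and only the digest names come from the text above.

```python
import hashlib

def output_digests(data):
    """Digest the output bytes (summary already removed by the caller)."""
    return {
        "md5_output": hashlib.md5(data).hexdigest(),
        "sha512_output": hashlib.sha512(data).hexdigest(),
    }

# Hypothetical captured output with the summary element stripped.
digests = output_digests(b"<xmlfy>\n</xmlfy>\n")
print(digests["md5_output"])
print(digests["sha512_output"])
```

The printed values would then be compared with the md5_output and sha512_output values reported in the summary.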
889
890 Option: -U, --unxml
891 Read XML formatted input and remove all that bracket racket reverting
892 your XML document back to a plain format. Can be used in conjunction
893 with the -F<level> <string> option to specify the delimiter to use for
894 each XML depth level. Multiple same level -F options are meaningless
895 in this context and delimiters are only inserted if more than one field
896 is available to be delimited. Field separator scoping options are
897 ignored. The default delimiter is a space character for XML depth lev‐
898 els of 2 and above, and new-line for XML depth levels below 2. Tag
899 names and their attributes are not included in the output, and anything
900 inside XML comments is filtered out. If there is a BOM in the input
901 then xmlfy will use that for the encoding, otherwise xmlfy will look
902 for the opening XML character sequence of "<?" to determine the encod‐
903 ing being used. If neither of the previous methods found the correct
904 encoding then you can use the -X UTF-* options as a fallback. Basic
905 quoting options are also supported. Works best with XML output gener‐
906 ated by xmlfy but can also be used with caution on other foreign XML
907 documents.
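
The effect of -U with its default delimiters can be approximated in Python as follows. This is a rough sketch built on the standard library XML parser, not xmlfy's own reader: it handles only a two level record/field layout, it unescapes entities as a side effect of parsing, and the function name and parameters are hypothetical.

```python
import xml.etree.ElementTree as ET

def unxml(xml_text, field_sep=" ", record_sep="\n"):
    """Strip tags and comments, leaving delimited plain text."""
    root = ET.fromstring(xml_text)  # comments are discarded by the parser
    lines = []
    for record in root:
        # Collect the text of every field below this record.
        texts = [t.strip() for t in record.itertext() if t.strip()]
        lines.append(field_sep.join(texts))
    return record_sep.join(lines)

doc = ("<xmlfy><line><field>a</field><!-- note --><field>b</field></line>"
       "<line><field>c</field></line></xmlfy>")
print(unxml(doc))
```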
908
909 Option: --noxml
910 Do not XML-fy the input stream but do process it for reserved XML char‐
911 acters (this feature was initially written for formatting the xmlfy
912 HTML test reports that use wide encodings). Used in conjunction with
913 the -X options to control the conversion of reserved characters and/or
914 to transform the input stream to wide UTF encodings.
915 E.g. To transform an ASCII input stream to UTF-16BE encoding with a
916 BOM:
917 xmlfy --noxml -X UTF-16BE -X ASCIItoUTF -X noescape all -X BOM
918 E.g. To just escape select reserved XML characters in a UTF-32LE input
919 stream:
920 xmlfy --noxml -X UTF-32LE -X noescape amp
921
922 Important note on specifying options.
923 The way xmlfy handles options is very straightforward and can be easily
924 confused if you don't follow the syntax specified for each option. The
925 getopt library has been deliberately avoided to keep xmlfy portable.
926
927 xmlfy first evaluates options supplied on the command line, if a schema
928 file is supplied then xmlfy will also look for options in that file and
929 evaluate them too. See the schema file section below on how to specify
930 xmlfy options inside a schema file.
931
933 How it works.
934 The input processor used by xmlfy block reads unprocessed bytes from
935 standard input (stdin) and stores them in an array the size of a level
936 1 record. This level 1 record is then processed for fields and sub
937 fields etc by marking their positions in this array. Dynamic memory
938 handling is used.
939
940 The output processor used by xmlfy takes the results from the input
941 processor and re-packages it with suitably encoded XML syntax. Any
942 input characters that are reserved for XML are by default re-repre‐
943 sented in their escaped form.
944 Character & (ampersand) becomes string &amp;
945 Character < (less-than) becomes string &lt;
946 Character > (greater-than) becomes string &gt;
947 Character " (quote) becomes string &quot;
948 Character ' (apostrophe) becomes string &apos;
949 Character | (broken vertical bar) becomes string &brvbar;
950 The output processor then writes processed bytes to a block buffer for
951 printing to standard output (stdout).
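
The default escaping, together with the -X noescape adjustments, can be sketched in Python as below. This is an illustration rather than xmlfy's code: it covers the five characters that XML itself predefines entities for (the broken vertical bar is left out), and the names ESCAPES and escape_reserved are hypothetical.

```python
# (noescape name, character, escaped form); "&" must be handled first so the
# ampersands introduced by the other replacements are not escaped again.
ESCAPES = [
    ("amp", "&", "&amp;"),
    ("lt", "<", "&lt;"),
    ("gt", ">", "&gt;"),
    ("quot", '"', "&quot;"),
    ("apos", "'", "&apos;"),
]

def escape_reserved(text, noescape=()):
    """Escape reserved characters except those named in noescape."""
    skip = set(noescape)
    if "all" in skip:
        return text
    for name, char, entity in ESCAPES:
        if name not in skip:
            text = text.replace(char, entity)
    return text

print(escape_reserved('a<b & "c"'))
print(escape_reserved("a & b < c", noescape=["amp"]))
```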
952
953 Using a schema file.
954 The default schema used by xmlfy is hard coded and can be described as
955 follows:
956 In DTD schema form:
957 <!ELEMENT xmlfy (line*)>
958 <!ELEMENT line (field*)>
959 <!ELEMENT field (#PCDATA)>
960 In RNC schema form:
961 start = xmlfy
962 xmlfy = element xmlfy { line* }
963 line = element line { field* }
964 field = element field { text }
965 In XSD schema form:
966 <xs:schema>
967 <xs:element name="xmlfy">
968 <xs:sequence>
969 <xs:element name="line" type="lineType" minOccurs="0"
970 maxOccurs="unbounded" />
971 </xs:sequence>
972 </xs:element>
973 <xs:complexType name="lineType">
974 <xs:sequence>
975 <xs:element name="field" type="xs:string" minOccurs="0"
976 maxOccurs="unbounded" />
977 </xs:sequence>
978 </xs:complexType>
979 </xs:schema>
980
981 A schema file for the ls -la command that produces output like this:
982 total 73
983 drwx------+ 3 ag None 0 Apr 20 19:36 .
984 -rwxr-xr-x 1 ag None 15639 Apr 20 19:31 a.exe
985 -rwx------+ 1 ag None 6354 Apr 20 19:31 xmlfy.c
986 -rwx------+ 1 ag None 4901 Apr 19 2008 xmlfy.h
987
988 In DTD schema form will look like this:
989 <!ELEMENT ls (total?, file*)>
990 <!ELEMENT total (prompt, totalsize)>
991 <!ELEMENT file (permission?, blocks?, user?, group?, size?,
992 date_M?, date_d?, date_ty?, fname)>
993 <!ELEMENT date_ty (date_y)>
994 <!ELEMENT date_ty (date_h, date_m)>
995 <!ELEMENT prompt (#PCDATA)>
996 <!ELEMENT totalsize (#PCDATA)>
997 <!ELEMENT permission (#PCDATA)>
998 <!ELEMENT blocks (#PCDATA)>
999 <!ELEMENT user (#PCDATA)>
1000 <!ELEMENT group (#PCDATA)>
1001 <!ELEMENT size (#PCDATA)>
1002 <!ELEMENT date_y (#PCDATA)>
1003 <!ELEMENT date_M (#PCDATA)>
1004 <!ELEMENT date_d (#PCDATA)>
1005 <!ELEMENT date_h (#PCDATA)>
1006 <!ELEMENT date_m (#PCDATA)>
1007 <!ELEMENT fname (#PCDATA)>
1008
1009 and should be saved to a file as ls.dtd and invoked as:
1010 % ls -la | xmlfy --schema ls.dtd -F3 :
1011
1012 In RNC schema form will look like this:
1013 start = ls
1014 ls = element ls { total? | file* }
1015 total = element total { prompt, totalsize }
1016 file = element file { permission?, blocks?, user?, group?, size?,
1017 date_M?, date_d?, date_ty?, fname }
1018 date_ty = element date_ty { date_y }
1019 date_ty |= element date_ty { date_h, date_m }
1020 prompt = element prompt { text }
1021 totalsize = element totalsize { text }
1022 permission = element permission { text }
1023 blocks = element blocks { text }
1024 user = element user { text }
1025 group = element group { text }
1026 size = element size { text }
1027 date_y = element date_y { text }
1028 date_M = element date_M { text }
1029 date_d = element date_d { text }
1030 date_h = element date_h { text }
1031 date_m = element date_m { text }
1032 fname = element fname { text }
1033
1034 and should be saved to a file as ls.rnc and invoked as:
1035 % ls -la | xmlfy --schema ls.rnc -F3 :
1036
1037 In XSD schema form will look like this:
1038 <xs:schema>
1039 <xs:element name="ls" type="lsType" />
1040 <xs:complexType name="lsType">
1041 <xs:sequence>
1042 <xs:element name="total" type="totalType" minOccurs="0" />
1043 <xs:element name="file" type="fileType" minOccurs="0"
1044 maxOccurs="unbounded" />
1045 </xs:sequence>
1046 </xs:complexType>
1047 <xs:complexType name="totalType">
1048 <xs:sequence>
1049 <xs:element name="prompt" type="xs:string" />
1050 <xs:element name="totalsize" type="xs:string" />
1051 </xs:sequence>
1052 </xs:complexType>
1053 <xs:complexType name="fileType">
1054 <xs:sequence>
1055 <xs:element name="permission" type="xs:string" minOccurs="0"
1056 />
1057 <xs:element name="blocks" type="xs:string" minOccurs="0" />
1058 <xs:element name="user" type="xs:string" minOccurs="0" />
1059 <xs:element name="group" type="xs:string" minOccurs="0" />
1060 <xs:element name="size" type="xs:string" minOccurs="0" />
1061 <xs:element name="date_M" type="xs:string" minOccurs="0" />
1062 <xs:element name="date_d" type="xs:string" minOccurs="0" />
1063 <xs:element name="date_ty" type="datetyType" minOccurs="0" />
1064 <xs:element name="fname" type="xs:string" />
1065 </xs:sequence>
1066 </xs:complexType>
1067 <xs:complexType name="datetyType">
1068 <xs:choice>
1069 <xs:element name="date_y" type="xs:string" />
1070 <xs:sequence>
1071 <xs:element name="date_h" type="xs:string" />
1072 <xs:element name="date_m" type="xs:string" />
1073 </xs:sequence>
1074 </xs:choice>
1075 </xs:complexType>
1076 </xs:schema>
1077
1078 and should be saved to a file as ls.xsd and invoked as:
1079 % ls -la | xmlfy --schema ls.xsd -F3 :
1080
1081 Shoe-horning raw data into a structure defined by a schema is rather
1082 straightforward when the input fields have a one-to-one relationship
1083 with the fields of the schema elements, however if wildcard tokens
1084 and/or Boolean logic are employed in the schema then it becomes quite a
1085 challenge, sometimes even impossible, to be deterministic about which
1086 input field belongs to which schema field. Strictly speaking, the main
1087 function of the schema is to ensure XML is valid and to do this
1088 requires the XML document to already exist. In xmlfy's case we are
1089 doing the reverse by building an XML document on the fly while follow‐
1090 ing rules described by the schema - this is still okay and the result‐
1091 ing XML can be considered to be both valid and well formed.
1092
1093 xmlfy employs two techniques to help with this shoe-horning input data
1094 problem. The first technique xmlfy uses is recognising multiple element
1095 definitions that have the same name. This allows you to capture your
1096 schema elements under a variety of input circumstances without having
1097 to create a unique element for each circumstance - you can still do
1098 that if you want. The second technique xmlfy uses is auto-generated
1099 field match constraint helpers to assist in matching the input fields
1100 to the elements described by the schema. These helpers are useful in
1101 improving the speed of xmlfy particularly when using compound element
1102 structures and wildcard tokens in the schema hierarchy. After the
1103 schema file is loaded into memory, an array of helpers is generated for
1104 each element that describes all combinations of the schema tree traver‐
1105 sal paths that can be taken and associates each combination with the
1106 minimum, maximum and last number of fields required for a match against
1107 the number of available input fields. For example, using the above
1108 schema a match will occur for:
1109 total(min=2, max=2, last=2) when input fields = 2.
1110 file(min=1, max=9, last=1) when 1 <= input fields <= 9
1111 and date_ty is a single field (min=1, max=1, last=1).
1112 file(min=1, max=10, last=1) when 1 <= input fields <= 10
1113 and date_ty is two fields (min=2, max=2, last=2).
1114 By default xmlfy continuously iterates through just the record elements
1115 of the root element looking for element helpers that can fully satisfy
1116 the requirements of that particular element's schema tree hierarchy for
1117 the given input fields, after which the matching record element is then
1118 checked against its wildcard obligations in the root element defini‐
1119 tion, and if okay is finally printed.
1120 In match direct mode xmlfy only looks at the element helpers of the
1121 targeted element, and if that element can fully satisfy the require‐
1122 ments of its schema tree hierarchy for the given input fields, is
1123 printed in its entirety only once as the root element.
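
The min and max values quoted for the ls schema can be reproduced with a small Python sketch. It is not xmlfy's helper generator: the "last" value is not modelled, and the function name and data layout are hypothetical.

```python
def field_range(model, sizes=None):
    """Return (min, max) input fields an element can consume.

    model: list of (child_name, occurrence), occurrence in "", "?", "*", "+".
    sizes: optional {child_name: (min, max)} for compound children; children
           not listed count as one field. A max of None means unbounded.
    """
    sizes = sizes or {}
    total_min, total_max = 0, 0
    for name, occurrence in model:
        child_min, child_max = sizes.get(name, (1, 1))
        if occurrence in ("", "+"):
            total_min += child_min  # mandatory children raise the minimum
        if occurrence in ("*", "+") or child_max is None:
            total_max = None        # a repeating child removes the upper bound
        elif total_max is not None:
            total_max += child_max
    return total_min, total_max

FILE_MODEL = [("permission", "?"), ("blocks", "?"), ("user", "?"),
              ("group", "?"), ("size", "?"), ("date_M", "?"),
              ("date_d", "?"), ("date_ty", "?"), ("fname", "")]

print(field_range(FILE_MODEL, {"date_ty": (1, 1)}))
print(field_range(FILE_MODEL, {"date_ty": (2, 2)}))
```

The two calls correspond to the two date_ty definitions, one holding a single field and one holding two.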
1124
1125 To specify xmlfy options inside a schema file you encapsulate them
1126 inside a special token that is in effect a schema comment.
1127 DTD and XSD example:
1128 <!-- xmlfy-args: -F1 "\n" -F2 ABC -q -Q \"\' -->
1129 RNC example:
1130 ## xmlfy-args: -F1 "\n" -F2 ABC -q -Q \"\'
1131 This special token must exist in completed form on just one line at the
1132 left most side, spacing is important, only the first occurrence is
1133 recognised, and ideally it is placed somewhere near the top of the
1134 schema file. The schema option syntax is the same as the command line
1135 option syntax except that some options are not allowed e.g. --schema.
1136
1138 xmlfy has been successfully tested on average hardware with input
1139 records containing over 10,000,000 fields whilst using a complex schema
1140 tree structure and multi level delimiters.
1141
1142 Currently the xmlfy schema file parser is not that sophisticated and
1143 exhibits the following behaviour:
1144
1145 DTD schema
1146 - Only recognises the <!ELEMENT> directive and ignores all others.
1147 - The first valid <!ELEMENT> definition becomes the root element.
1148 - Element fields that don't have an element definition default to being
1149 (#PCDATA).
1150 - Elements defined as (#PCDATA) or (#CDATA) are ignored causing the
1151 referring field to default to (#PCDATA) however it is good practice
1152 to include these elements in order to furnish a complete DTD schema.
1153 - Only honours the +, ? and * wildcard tokens.
1154 - At this stage does not honour field group sets () and or-ing | syntax
1155 tokens.
1156
1157 RNC schema
1158 - Only recognises named directives and ignores all others.
1159 - The element named "start" becomes the root element.
1160 - Element fields that don't have an element definition default to being
1161 { text }.
1162 - Elements defined as { text } are ignored causing the referring field
1163 to default to { text } however it is good practice to include these
1164 elements in order to furnish a complete RNC schema.
1165 - Only honours the +, ? and * wildcard tokens.
1166 - At this stage does not honour field group sets () and or-ing | syntax
1167 tokens.
1168
1169 XSD schema
1170 - Only recognises the <schema>, <element>, <complexType>, <ref>,
1171 <sequence>, and <choice> directives and ignores all others.
1172 - The recognised directives are not fully implemented and their use
1173 should be kept straightforward.
1174 - The first valid <element> definition becomes the root element.
1175 - Element types that are not of matchable complexType are treated as
1176 "xs:string" regardless of what type is specified.
1177 - Only honours the minOccurs="0", maxOccurs="0" and
1178 maxOccurs="unbounded" wildcard attributes.
1179 - At this stage does not honour group sets but does do limited support
1180 with choices.
1181
1182 All schema types
1183 - The fields of the root element define all the level 1 elements (let's
1184 call the fields that have their own branch structure record ele‐
1185 ments).
1186 - By default fields of the root element that are not record elements
1187 are ignored. Use the match direct option to match targeted elements
1188 in their entirety.
1189 - The fields of the record elements simply represent other elements and
1190 unlimited element nesting is allowed.
1191 - The field names that are specified in the element definitions are
1192 read from left to right and matched against a field number calcula‐
1193 tion on the input fields, and then matched again on any wildcard
1194 tokens.
1195 - You can wildcard many fields but you should think clearly about what
1196 you are trying to achieve and whether it is at all possible. For
1197 example, the following DTD which is perfectly suitable for checking
1198 for valid XML, will however prove impossible for xmlfy to shoe-horn
1199 input data into DTD elements a, b and c reliably because more than
1200 one field has a wildcard token to match none or many input fields.
1201 <!ELEMENT parent (record)>
1202 <!ELEMENT record (a*, b, c*)>
1203 <!ELEMENT a (#PCDATA)>
1204 <!ELEMENT b (#PCDATA)>
1205 <!ELEMENT c (#PCDATA)>
1206 In the above example xmlfy will allocate ALL input fields to element
1207 <a>, which may not be the intended result.
1208
1210 0 Normal exit.
1211 -1 Invalid argument specified.
1212 -2 Error processing schema file contents.
1213 -3 Infinite loop detected when matching input against schema ele‐
1214 ments.
1215 -10 Out of memory.
1216
1218 Originally written by Arthur Gouros.
1219 This software also contains material derived from Ville Laurikari's TRE
1220 regex library.
1221 This software also contains material derived from the US Secure Hash
1222 Algorithms (RFC4634).
1223 This software also contains material derived from the RSA Data Secu‐
1224 rity, Inc. MD5 Message-Digest Algorithm.
1225
1227 BSD License for xmlfy
1228 Copyright © 2008-2020, Arthur Gouros
1229 All rights reserved.
1230
1231 Redistribution and use in source and binary forms, with or without mod‐
1232 ification, are permitted provided that the following conditions are
1233 met:
1234
1235 - Redistributions of source code must retain the above copyright
1236 notice, this list of conditions and the following disclaimer.
1237 - Redistributions in binary form must reproduce the above copyright
1238 notice, this list of conditions and the following disclaimer in the
1239 documentation and/or other materials provided with the distribution.
1240 - Neither the name of Arthur Gouros nor the names of its contributors
1241 may be used to endorse or promote products derived from this software
1242 without specific prior written permission.
1243
1244 THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
1245 IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
1246 TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTIC‐
1247 ULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
1248 CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
1249 EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
1250 PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
1251 PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
1252 LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
1253 NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
1254 SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
1255
1257 The full documentation of the xmlfy project can be found on the web at:
1258
1259 http://xmlfy.sourceforge.net
1260
1261 The website is updated more frequently than the man pages and should be
1262 considered the authoritative source of information.
1263
1264
1265
1266xmlfy 1.5.7 February 2, 2020 xmlfy(1)