xmlfy(1)                         User Commands                        xmlfy(1)

NAME

6       xmlfy - Convert to XML on the fly.
7

SYNOPSIS

9       xmlfy [OPTION]...
10
11       -h, --help
12             print usage instructions
13
14       -v, --version
15             print version number
16
17       --license
18             print license
19
20       --debug
21             print extra debugging information
22
23       Input options:
24
25       -F, --fieldseparator[<level>[b][:<scope>]] <string>
26             specify a delimiter string token for the level specified
27
28       -R, --recordseparator <string>
29             this is a synonym for "-F1 <string>"
30             specify an alternative record separator string to the default
31
       -C, --column[:<scope>] <c1>-<c2>[:<name>]
33             create an input field from an input column range
34
35       -W, --regex[:<scope>] /<pattern>/[i][l][r][u]
36             create input fields from a regular expression
37
38       -e, --expelempty
39             expel empty input records and fields
40
41       -E, --expel <input_records>[:<input_fields>]
42             expel selected records or fields from being processed
43
44       -q, --quotedfields[2]
45             treat fields that are between quotes as one field
46
47       -Q, --quotechars[2] <string>
48             specify an array of quoting characters to use
49
50       -b, --blanklines
51             do not ignore blank input records
52
53       -t, --trim
54             trim leading and trailing whitespace from input fields
55
56       Output options:
57
58       -S, --schema <file>
59       -Sd, --schemadtd <file>
60       -Sr, --schemarnc <file>
61       -Sx, --schemaxsd <file>
62             use a schema <file> for tag names and element control
63
64       -A, --attribute[<level>[:<scope>]] number|level
65               |delimiter|timestamp|insert <name> <value>
66             include attributes in the opening element tag
67
68       -T, --tag[<level>[:<scope>]] number|level
69               |tagname <name>|insert <name> <value>
70               |insertfile <name> <file>|insertfilexml <indent> <file>
71             modify the element tag
72
73       -k, --keyvaluepairs[<level>]
74             generate key/value XML tag pairs
75
76       -l, --linenumbers
77             this is a synonym for "-T1 number"
78             include the line number in the line tag name
79
80       -f, --fieldnumbers
81             this is a synonym for "-T2 number"
82             include the field number in the field tag name
83
84       -L, --linetags
85             include a line number tag with the record data
86
87       -X, --xmlformat [XML1.0|XML1.1]|[SOAP1.1|SOAP1.2]
88               |[UTF-8|UTF-16|UTF-16BE|UTF-16LE|UTF-32|UTF-32BE|UTF-32LE]|BOM
89               |ASCIItoUTF|[noescape all|amp|lt|gt|quot|apos|brvbar]
90               |[newline dos|unix]
91             specify an XML output format
92
93       -p, --printonly header|footer|rtagopen|rtagclose|records
94             print only snippets of the XML output
95
96       -I, --identifier <system_identifier>
97             specify an alternate system identifier of the doctype or SOAP URI
98
99       -s, --summary[2|c|n|f <file>]
100             print a summary after the end of the processing
101
102       -U, --unxml
103             undo the XML syntax leaving just plain text
104
105       --noxml
106             do not XML-fy the input stream
107

DESCRIPTION

109       The  xmlfy  command  reads stdin and outputs it to stdout in XML format
110       using supplied control directives.
111
112       Delimiter tokens and/or column selections are used to  break  down  the
113       input stream into XML elements which are then represented inside an XML
114       tree hierarchy that can span multiple depth levels.  For example,  com‐
115       mand line output was originally designed for text or CRT based process‐
116       ing. The xmlfy command takes this text based output  where  a  new-line
117       often  represents  an end-of-record of data and whitespace often repre‐
118       sents a field separator, and reformats it into XML output suitable  for
119       interfacing with modern object oriented systems.
120
121       xmlfy is a powerful yet lightweight tool that primarily caters for con‐
122       verting ASCII, UTF-8, UTF-16 or UTF-32 based output into XML format  on
123       the  fly  and  dealing  with common issues associated with this kind of
124       transformation.
125
126       The xmlfy command also supports a basic version of a schema  configura‐
127       tion  allowing you to control the format of the XML output by supplying
128       a schema file as an option.
129
130       With no options supplied xmlfy will use default values for its XML for‐
131       mat.  The  entire  standard  input  will be enclosed in <xmlfy></xmlfy>
132       pairs, each line of standard input will be  enclosed  in  <line></line>
133       pairs,  and each field of each line will be enclosed in <field></field>
134       pairs.
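
       The default behaviour described above can be approximated with a
       short Python sketch. This is only an illustration and not the xmlfy
       implementation: the function name xmlfy_default, the two-space
       indentation, the escaping of &, < and >, and the absence of an XML
       header are all assumptions made for the example.

```python
from xml.sax.saxutils import escape


def xmlfy_default(text):
    """Approximate the default xmlfy layout: <xmlfy>, <line>, <field>."""
    out = ["<xmlfy>"]
    for line in text.split("\n"):      # default level 1 delimiter: new-line
        fields = line.split()          # default level 2 delimiter: whitespace
        if not fields:
            continue                   # blank records are ignored by default
        out.append("  <line>")
        for field in fields:
            out.append(f"    <field>{escape(field)}</field>")
        out.append("  </line>")
    out.append("</xmlfy>")
    return "\n".join(out)


if __name__ == "__main__":
    print(xmlfy_default("alpha beta\n\ngamma"))
```

       Output produced by the real command may differ in whitespace and
       header lines.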
135

OPTIONS

137       You can supply options to customise the behaviour of xmlfy at the  com‐
138       mand  line,  or  by  a  special  token inside the schema file, or both.
139       NOTE: Options are resolved from  left  to  right.  If  any  conflicting
140       options are specified then the last one will have precedence.
141
142       Option: -h, --help
143       The  command line usage is printed in plain text format not in XML for‐
144       mat.
145
146       Option: -v, --version
147       The version number is printed in plain text format not in  XML  format.
148       If the version number is required in XML format it is included with the
149       summary option.
150
151       Option: --license
152       Print all licenses used by xmlfy.
153
154       Option: --debug
155       Print extra debugging information to stderr to help debug xmlfy  behav‐
156       iour.
157
158       Input options:
159
160       Option: -F, --fieldseparator[<level>[b][:<scope>]] <string>
161       Allows you to specify a delimiter string token for the level specified.
162       <level> - The XML depth level to be delimited by <string>.
163                 Must be an integer value greater than or equal to 1.
164                 E.g. a value of 1 will split the input into records delimited
165                 by <string>, a value of 2  will  split  records  into  fields
166                 delimited  by  <string>,  a value of 3 will split fields into
167                 subfields delimited by <string>, and so on.
168                 There is no space separating the option and the level value.
169                 If no level is specified then the  given  options  will  only
170                 apply to level 2.
171       b - Use byte matching for the specified delimiter string.
172           By specifying this option the delimiter string is treated as just a
173           literal sequence of bytes. Normally command line arguments are pre‐
174           sented  to  xmlfy  as  ASCII  strings and if wide UTF encoding like
175           UTF-16 or UTF-32 is being used then xmlfy will  automatically  con‐
176           vert  the  specified  delimiter  string to that encoding. With this
177           option no encoding conversion takes place. In  this  mode  you  can
178           also  specify  escaped  decimal byte sequences inside the delimiter
179           string. E.g. "\123\234\\"
180       <scope> - A comma delimited set of sequence ranges with no spaces.
181                 The <scope> parameter has a sub form of <s1>[-<s2>][r][,..]
182                 <s1> - integer representing a start range.
183                 <s2> - integer or the $ token representing an end range.
184                 r    - restart the scope counter for this delimiter after the
185                        completion of the associated range.
186                 Restrict the delimiter effectiveness to the occurrences spec‐
187                 ified in <scope>. If a delimiter <string> is encountered  for
188                 the level specified and its sequence is not in the scope then
189                 it will not function as a field separator and will instead be
190                 treated as data.
191                 E.g.  -F3:1-3,8  "."  this is saying that level 3 fields will
192                 only be created for the 1st to 3rd, and  8th  occurrences  of
193                 the delimiter "." (period).
194                 The  restart  scope  counter  option  r allows you to specify
195                 repeating scope sequences.
                 E.g. -F1:2,5r "\n" creates level 1 records from every second
                 and fifth line, repeating this pattern until the input is
                 exhausted.
199                 When using multiple same level delimiters,  restarting  scope
200                 counters  of  the equivalent level and higher get reset when‐
201                 ever a delimiter match is applied.
202                 If a <scope> range is not specified then the delimiter  func‐
203                 tion  applies  to  every occurrence of <string> of the target
204                 level.
205       <string> - A sequence of characters or token to be used as a delimiter.
206                  Tokens  specified  literally  as  "\n",  "\r",  and "\t" are
207                  translated to  their  corresponding  control  character.  If
208                  using  wide UTF encoding then <string> is automatically con‐
209                  verted to that encoding, otherwise  you  can  use  the  byte
210                  matching  option  and specify escaped decimal byte sequences
211                  inside <string>.
       o If the delimiter token is the same for a series of levels then the
         shallowest level takes precedence, unless the shallowest levels have
         been limited by scope restrictions. You can also make use of quotes in
         the input along with specifying quote options.
216       o The XML tree algorithm deepens in a sequential way therefore you must
217         set your delimiter levels as an unbroken sequence for them to  be  of
218         any  use,  that  is  you  cannot split a level 2 field with a level 4
219         delimiter string.
220       o Refer to the schema option section for information on level  handling
221         when a schema file is specified.
222       o Levels 1 and 2 are already set by default.
223       o The default level 1 delimiter token is NEWLINE (new-line).
224       o The  default  level  2  delimiter  token  is  WHITESPACE (space, tab,
225         new-line, carriage-return, vertical-tab and form-feed).
226       o The delimiters for levels 3 and above are unset.
227       o Only one delimiter string token can be specified however this  option
228         can  be invoked multiple times allowing for multiple delimiters to be
229         used at the level specified.  When  specifying  multiple  same  level
230         delimiters,  the  larger  delimiter  strings  are  matched before the
231         smaller ones. The delimiter string is not included in the output.
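
       The <scope> sub form <s1>[-<s2>][r][,..] can be illustrated with a
       small Python sketch. It is an interpretation of the text above, not
       the xmlfy source: the names parse_scope and scope_hits are
       hypothetical, and the handling of the r flag (reset the occurrence
       counter once the flagged range has completed) is an assumption based
       on the -F1:2,5r example.

```python
def parse_scope(scope):
    """Parse '<s1>[-<s2>][r][,..]' into (start, end, restart) tuples.

    An end of None stands for the $ token (no upper bound).
    """
    ranges = []
    for part in scope.split(","):
        restart = part.endswith("r")
        if restart:
            part = part[:-1]
        start_text, _, end_text = part.partition("-")
        start = int(start_text)
        if end_text == "":
            end = start
        elif end_text == "$":
            end = None
        else:
            end = int(end_text)
        ranges.append((start, end, restart))
    return ranges


def scope_hits(scope, occurrences):
    """Return one boolean per delimiter occurrence: is it in scope?"""
    ranges = parse_scope(scope)
    counter = 0
    hits = []
    for _ in range(occurrences):
        counter += 1
        hits.append(any(start <= counter and (end is None or counter <= end)
                        for start, end, _restart in ranges))
        # Assumed meaning of r: restart counting after the range completes.
        if any(restart and end == counter for _start, end, restart in ranges):
            counter = 0
    return hits


if __name__ == "__main__":
    print(scope_hits("1-3,8", 10))
    print(scope_hits("2,5r", 10))
```

       Occurrences reported as False would be treated as data rather than as
       a separator.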
232
233       Option: -R, --recordseparator <string>
234       This is a synonym for "-F1 <string>"
235       Allows you to specify a record separator string token that is different
236       from  the  default.  The  default  record  separator  token  is NEWLINE
237       (new-line).
238
239       Option: -C, --column[:<scope>] <c1>-<c2>[:<name>]
240       Use an input column range of the input  record  to  generate  an  input
241       field.  This  is  an  alternative method of capturing input fields from
242       using delimiters.
243       <scope> - A comma delimited set of sequence ranges with no spaces.
244                 The <scope> parameter has a sub form of <s1>[-<s2>][r][,..]
245                 <s1> - integer representing a start range.
246                 <s2> - integer or the $ token representing an end range.
247                 r    - restart the scope counter for this column option after
248                        the completion of the associated range.
249                 Restrict  the  column option effectiveness to the occurrences
250                 specified in <scope>.  If the input record sequence is not in
251                 the  scope  then  the  column  option will not be applied and
252                 input fields will not be generated.
253                 The restart scope counter option r allows the scope sequences
254                 to  continually repeat themselves. E.g -C:1-3,5r 1-20 this is
255                 saying capture column fields of 20 characters in  length  for
256                 every  first  to  third  and  fifth  input  records, and keep
257                 repeating this until the input is exhausted.
258                 If a <scope> range is not specified then  the  column  option
259                 applies to all input records.
260       <c1> - integer  or  the  $ token representing the start column range of
261              the input field.
262       <c2> - integer or the $ token representing the end column range of  the
263              input field.
       <name> - optional string value that will be used to override the tag
                name for this input field.
                Any string is accepted as a tag name, including names that are
                illegal in XML, so user discretion is advised.
                Only applicable for changing default behaviour (i.e. when the
                --schema option is NOT specified).
272       o Specifying field separators of level 2 and above with this option  is
273         conflicting and will produce a usage error.
274       o The  number  of times and order in which this option is specified (in
275         conjunction with the -W option) determines the number of input fields
276         generated and their order.
277       o Column  ranges  represent  code points (characters) meaning any multi
278         byte character will only account for just one column position.
279       o Multiple options can use non linear ranges and can  overlap  e.g.  -C
280         5-10:part -C 1-$:whole
281       o Ranges  that  exceed  the  size  of the input record will not process
282         beyond the end of the input record.
283       o You can use single or double quotes to protect  the  range  from  the
284         shell interpreter e.g. -C '80-$:text'
285       o Only  one  parameter pair can be specified however this option can be
286         invoked multiple times.
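
       A column range can be pictured as a slice over code points. The
       Python sketch below is illustrative only: the function name
       column_field is hypothetical, and it assumes that columns are
       numbered from 1, that ranges are inclusive, and that an unnamed field
       receives the default tag name "field".

```python
def column_field(record, spec):
    """Return (tag_name, text) for a '<c1>-<c2>[:<name>]' specification."""
    column_range, _, name = spec.partition(":")
    start_text, _, end_text = column_range.partition("-")
    last = len(record)                 # the $ token: last column of the record
    start = last if start_text == "$" else int(start_text)
    end = last if end_text == "$" else int(end_text)
    # Python strings index by code point, so a multi byte character counts
    # as one column. Slicing stops at the end of the record automatically.
    return (name or "field", record[start - 1:end])


if __name__ == "__main__":
    record = "abcdefghijkl"
    print(column_field(record, "5-10:part"))
    print(column_field(record, "1-$:whole"))
```

       Calling the function once per -C option, in command line order,
       mirrors the rule that option order determines field order.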
287
288       Option: -W, --regex[:<scope>] /<pattern>/[i][l][r][u]
289       Use a regular expression on the input record to generate input  fields.
290       This  is  an  alternative  method  of capturing input fields from using
291       delimiters.
292       <scope> - A comma delimited set of sequence ranges with no spaces.
293                 The <scope> parameter has a sub form of <s1>[-<s2>][r][,..]
294                 <s1> - integer representing a start range.
295                 <s2> - integer or the $ token representing an end range.
296                 r    - restart the scope counter for this regex option  after
297                        the completion of the associated range.
298                 Restrict  the  regex  option effectiveness to the occurrences
299                 specified in <scope>.  If the input record sequence is not in
300                 the scope then the regex option will not be applied and input
301                 fields will not be generated.
302                 The restart scope counter option r allows the scope sequences
303                 to    continually    repeat    themselves.    E.g   -W:1-3,5r
304                 /(^A.*).*(B.*$)/ this is saying capture two regex fields  for
305                 every  first  to  third  and  fifth  input  records, and keep
306                 repeating this until the input is exhausted.
307                 If a <scope> range is not specified  then  the  regex  option
308                 applies to all input records.
309       <pattern> - A  POSIX  1003.2 compliant Extended Regular Expression pat‐
310                   tern utilising zero or more parenthesis  pairs  to  capture
311                   input fields.
312       i - Flag to ignore case.
313       l - Flag to treat <pattern> as a literal.
314       r - Flag to make concatenation right associative.
315       u - Flag to make operators ungreedy by default.
316       o Specifying  field separators of level 2 and above with this option is
317         conflicting and will produce a usage error.
318       o The number of times and order in which this option is  specified  (in
319         conjunction with the -C option) determines the number of input fields
320         generated and their order.
321       o If matches are not made for all parenthesis pairs specified in  <pat‐
322         tern> then no output will result.
323       o If  no  parenthesis  pairs are specified in <pattern> then the entire
324         input record will be used as the output when a pattern match occurs.
325       o Wide UTF encoding can be specified in <pattern> by using the \x  lit‐
326         eral  followed by two hexadecimal digits to represent any byte inside
327         the code-point e.g. \x0b.
328       o For further information on using regex syntax and the  pattern  flags
329         please consult the TRE documentation.
       o You can use single or double quotes to protect the pattern from the
         shell interpreter e.g. -W '/(^Pam .*)/iu'
333       o You can specify the percentage character % as an alternative  separa‐
334         tor to forward-slash / for <pattern> so long as it remains paired.
335       o Only  one  parameter pair can be specified however this option can be
336         invoked multiple times.
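
       The capture rules listed above can be sketched in Python. Note the
       assumptions: Python's re module is used in place of the TRE library,
       so pattern syntax and matching semantics may differ from xmlfy; only
       the i and l flags are modelled; and the function name regex_fields is
       hypothetical.

```python
import re


def regex_fields(record, spec):
    """Return the input fields captured from record by '/<pattern>/[i][l]'."""
    delimiter = spec[0]
    if delimiter not in "/%":
        raise ValueError("pattern must be enclosed in / or %")
    pattern, closing, flags = spec[1:].rpartition(delimiter)
    if not closing:
        raise ValueError("pattern delimiter is not paired")
    if set(flags) - set("il"):
        raise ValueError("only the i and l flags are modelled in this sketch")
    if "l" in flags:
        pattern = re.escape(pattern)   # treat the pattern as a literal
    match = re.search(pattern, record, re.IGNORECASE if "i" in flags else 0)
    if match is None:
        return []                      # no match: no input fields
    if match.re.groups == 0:
        return [record]                # no parentheses: whole record is used
    groups = list(match.groups())
    if any(group is None for group in groups):
        return []                      # a parenthesis pair did not match
    return groups


if __name__ == "__main__":
    print(regex_fields("key=value", r"/(\w+)=(\w+)/"))
    print(regex_fields("PAM smith", "%(^pam .*)%i"))
```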
337
338       Option: -e, --expelempty
339       Expel input fields that are empty (zero bytes  in  length)  from  being
340       processed.  The  use  of multi level and multiple same level delimiters
341       can sometimes yield plenty of empty fields which  may  be  undesirable.
342       This  option  expels all the empty input fields from being processed by
343       the output processor.  All levels are examined and  any  input  records
344       comprised entirely out of empty fields are also expelled.
345       This  option  will always run before any expelling tasks specified with
346       option -E are run.
347       This option has no influence on levels subjected to  key/value  pairing
348       as  that  process  has  its own way of dealing with empty fields at its
349       target levels.
       If a schema is used, note that fewer input records/fields are then
       available for element matching.
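
       The effect of this option on already split input can be sketched as
       follows. The Python function below is a simplified illustration with
       a hypothetical name (expel_empty); it models a single field level
       only and ignores the key/value pairing exception described above.

```python
def expel_empty(records):
    """Drop empty fields, then drop records left with no fields at all."""
    kept_records = []
    for fields in records:
        kept = [field for field in fields if field != ""]
        if kept:                       # records made only of empty fields go
            kept_records.append(kept)
    return kept_records


if __name__ == "__main__":
    print(expel_empty([["a", "", "b"], ["", ""], ["c"]]))
```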
352
353       Option: -E, --expel <input_records>[:<input_fields>]
354       Expel selected input records or selected input fields of selected input
355       records from being processed. Each input record is checked against  the
356       expel  criteria and if a match occurs then these input records or input
357       fields are simply discarded from being passed  onto  the  xmlfy  output
358       processor.
359       <input_records> - A  comma delimited set of input record expel criteria
360                         with no spaces.
361                         The <input_records>  parameter  has  a  sub  form  of
362                         <range_type><r1>[-<r2>][/<string>/][,..]
363                         Where <range_type> can be 'n', 'f' or 'c'.
364                         n - the  associated range refers to input record num‐
365                             bers.
366                         f - the associated range refers to input  field  num‐
367                             bers.
368                         c - the associated range refers to input record char‐
369                             acter lengths.
370                         <r1> - integer representing a start range.
371                         <r2> - integer or the $  token  representing  an  end
372                                range.
373                         <string> - the  specified  <string>  must  also exist
374                                    within the range.
375                                    Expel criteria types can be intermixed.
                                    E.g. -E
                                    n10-$,f7-8,f4/Mercedes/,c10-20,c1-15/SUV/
                                    matches only input records that satisfy
                                    ALL of the following: the record number is
                                    greater than or equal to 10, AND the total
                                    number of fields is between 7 and 8, AND
                                    the 4th input field contains the string
                                    "Mercedes", AND the record length is
                                    between 10 and 20 characters inclusive,
                                    AND the first 15 characters contain the
                                    string "SUV".
390                                    In this release you can only specify the $
391                                    token  (last  input  record)  in  a paired
392                                    range and not on its own.
393                                    Generally xmlfy can figure out  where  the
394                                    search   string  delimiters  would  likely
395                                    occur however you can specify the %  char‐
396                                    acter as an alternative separator to / for
397                                    <string> so long as it remains paired.
398                                    If an <input_fields> criteria is not spec‐
399                                    ified  then  the  entire  input  record is
400                                    expelled.
401       <input_fields> - A comma delimited set of field number ranges  with  no
402                        spaces.
403                        The   <input_fields>  parameter  has  a  sub  form  of
404                        <r1>[-<r2>][,..]
405                        <r1> - integer or the $  token  representing  a  start
406                               range.
407                        <r2> - integer  or  the  $  token  representing an end
408                               range.
409                        Discard select input fields of the input records  that
410                        match the expel criteria before passing onto the xmlfy
411                        output processor.
412                        E.g. -E n2-$:1,$ this is  saying  that  input  records
413                        whose record number is greater than or equal to 2 will
414                        have their first and last fields expelled.
415                        You can specify the $ token (last input  field)  in  a
416                        paired range or on its own.
417       o You  can  use  single  or double quotes to protect the range from the
418         shell interpreter e.g. -E 'n2-$:$'
       o If a schema is used, note that fewer input records/fields are then
         available for element matching.
421       o Only  one parameter group can be specified however this option can be
422         invoked multiple times with resolution occurring from left to right.
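
       The <input_fields> part of this option can be illustrated in Python.
       The sketch covers only the field list after the colon (for example
       the 1,$ in -E n2-$:1,$) and does not model the <input_records>
       criteria. The function name expel_fields is hypothetical, and fields
       are assumed to be numbered from 1.

```python
def expel_fields(fields, spec):
    """Remove the fields selected by '<r1>[-<r2>][,..]' from a record."""
    last = len(fields)                 # the $ token: last input field
    dropped = set()
    for part in spec.split(","):
        start_text, _, end_text = part.partition("-")
        start = last if start_text == "$" else int(start_text)
        if end_text == "":
            end = start
        else:
            end = last if end_text == "$" else int(end_text)
        dropped.update(range(start, end + 1))
    return [field for number, field in enumerate(fields, start=1)
            if number not in dropped]


if __name__ == "__main__":
    print(expel_fields(["a", "b", "c", "d"], "1,$"))
```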
423
424       Option: -q, --quotedfields[2]
425       Treat fields that are quoted as one field. Normally  xmlfy  will  parse
426       fields  by  their  delimiter  e.g. WHITESPACE, this option allows multi
427       delimited fields to be specified as one by quoting them. By default the
428       quoted  field  may  only  span  the current input record unless the -q2
429       option is specified in which case the quoted field  can  span  multiple
430       input records.
       Quotes are not included in the field and any leading/trailing text
       outside the field's quotes is truncated.
       If quotes are not closed xmlfy will extend the field until the end of
       the input record, or if option -q2 is specified, until the input is
       exhausted (EOF).
436       The default quote character is a double quote (").
437
438       Option: -Q, --quotechars[2] <string>
       Specify a string of characters that can be used as quoting characters.
441       <string> - an array of quoting characters.
442       o If field quoting is enabled then any input character that matches any
443         character in <string> will toggle the quoting  function,  unless  the
444         -Q2  option  is specified in which case characters in <string> repre‐
445         sent paired quotes with odd numbered characters in  this  array  tog‐
446         gling  the  open  quote function, and its corresponding pair toggling
447         the close quote function. This allows parenthesis, brackets,  etc  to
448         be used as quotes.
       o When specifying this option care must be taken to prevent the shell
         from interpreting the supplied quote characters. When using
451         a schema file containing this option you can specify quote characters
452         by escaping them with the backslash "\" character.
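
       The difference between toggling quotes (-Q) and paired quotes (-Q2)
       can be sketched in Python. This is an illustration under stated
       assumptions, not the xmlfy parser: the names quote_pairs and
       split_quoted are hypothetical, fields are split on whitespace within
       a single record, and the truncation of text adjacent to the quotes is
       not modelled.

```python
def quote_pairs(chars, paired=False):
    """Map each opening quote character to its closing character."""
    if not paired:
        return {char: char for char in chars}      # -Q: a quote closes itself
    if len(chars) % 2:
        raise ValueError("paired quotes need an even number of characters")
    # -Q2: odd numbered characters open, the following character closes.
    return {chars[i]: chars[i + 1] for i in range(0, len(chars), 2)}


def split_quoted(record, pairs):
    """Split a record on whitespace, keeping quoted text as one field."""
    fields, current = [], []
    closer = None          # closing character while inside quotes
    in_field = False
    for char in record:
        if closer is not None:
            if char == closer:
                closer = None
            else:
                current.append(char)
        elif char in pairs:
            closer = pairs[char]
            in_field = True
        elif char.isspace():
            if in_field:
                fields.append("".join(current))
                current, in_field = [], False
        else:
            current.append(char)
            in_field = True
    if in_field:           # an unclosed quote runs to the end of the record
        fields.append("".join(current))
    return fields


if __name__ == "__main__":
    print(split_quoted('a "b c" d', quote_pairs('"')))
    print(split_quoted("x (y z) w", quote_pairs("()[]", paired=True)))
```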
453
454       Option: -b, --blanklines
455       Normally xmlfy ignores blank lines or empty  level  1  records  in  the
456       input  stream.  This option tells xmlfy to not ignore these blank lines
457       and print out XML line record tags but with no elements.
458       In this mode blank lines count as line numbers.
459
460       Option: -t, --trim
461       Field elements are trimmed of leading and trailing whitespace.
462
463       Output options:
464
465       Option: -S, --schema <file>
466               -Sd, --schemadtd <file>
467               -Sr, --schemarnc <file>
468               -Sx, --schemaxsd <file>
469       Specify a schema <file> for controlling the XML output.
470       <file> - The schema file must comply with either the Document Type Def‐
471                inition  (.dtd)  language, or the RELAX NG Compact (.rnc) lan‐
472                guage, or the XML Schema  Document  (.xsd)  language,  however
473                xmlfy  does not support the finer aspects of these schema lan‐
474                guages at this early stage.
475       o When all input fields of the input record have been identified, xmlfy
476         will match them against the elements inside the tree hierarchy of the
477         schema file, and if a match is found then xmlfy will print an  output
478         record using the matching schema tree hierarchy as its XML structure.
479         Option  -S,  --schema  uses  the case-insensitive file name extension
480         (.dtd or .rnc or .xsd) of <file> to  determine  which  schema  inter‐
481         preter xmlfy will apply.
482         Option  -Sd,  --schemadtd  forces  xmlfy to use the DTD schema inter‐
483         preter on <file>.
484         Option -Sr, --schemarnc forces xmlfy to use  the  RNC  schema  inter‐
485         preter on <file>.
486         Option  -Sx,  --schemaxsd  forces  xmlfy to use the XSD schema inter‐
487         preter on <file>.
488       o You can specify multi level delimiters when using this option however
489         any  delimiters  greater  than level 2 are only used to identify more
490         input fields and are not used at all in altering the XML tree hierar‐
491         chy  as  is  dictated by the schema file. Fields with levels of 2 and
492         above are flattened to be just plain fields of  the  input  record  -
493         this  is  very  different to the default behaviour where field levels
494         form the XML tree hierarchy.
495       o If a schema option is not supplied then xmlfy will use default values
496         for tag names and element control.
497       o For  further  information  on  how to write a schema for xmlfy please
498         consult the web documentation.
499
500       Option: -d, --dtd <file>
501       This is a synonym for "-Sd, --schemadtd <file>"
       This option has been deprecated but is maintained in this release for
       backwards compatibility.
504
505       Option: -A, --attribute[<level>[:<scope>]] number|level
506                       |delimiter|timestamp|insert <name> <value>
507       Include attributes in the opening element tag for the level specified.
508       <level> - The XML depth level to be modified.
509                 Must be an integer value greater than or equal to 0.
510                 E.g.  a  value  of  1  will  apply attributes to each opening
511                 record element and a value of 2 will apply attributes to each
512                 opening field element.
513                 There is no space separating the option and the level value.
514                 If no level is specified then the given options will apply to
515                 all levels except level 0.
516       <scope> - A comma delimited set of sequence ranges with no spaces.
517                 The <scope> parameter has a sub form of <s1>[-<s2>][r][,..]
518                 <s1> - integer representing a start range.
519                 <s2> - integer or the $ token representing an end range.
520                 r    - restart the scope counter for this attribute after the
521                        completion of the associated range.
522                 Restrict  the  custom  attribute  effectiveness to the occur‐
523                 rences specified in <scope>.  If the element sequence is  not
524                 in the scope then the custom attribute will not be applied.
525                 The restart scope counter option r allows the scope sequences
526                 to continually repeat themselves. E.g -A2:1-3,5r insert  x  y
527                 this is saying insert custom attributes x="y" for every first
528                 to third and fifth level 2 elements, and keep repeating  this
529                 until the output is exhausted.
530                 Scope sequence counters are always reset to zero for the next
531                 element depth level and higher whenever a  deeper  XML  depth
532                 level is entered into.
533                 If a <scope> range is not specified then the custom attribute
534                 function applies to all elements at the specified <level>.
535       number - Specify the sequence number as an element attribute.
536                E.g. <field> becomes <field number="1"> and the  next  <field>
537                becomes <field number="2"> and so on.
538                Scoping is not supported.
539                Not supported for level 0.
540       level - Specify the level as an element attribute.
541               E.g. <field> becomes <field level="2">
542               Scoping is not supported.
543               Not supported for level 0.
544       delimiter - Specify the matching delimiter as an element attribute.
545                   E.g. <field> becomes <field delimiter="ABC">
546                   Delimiter string tokens that contain illegal XML characters
547                   are printed as their hex pair equivalent.
548                   When using a schema file only level  1  records  and  field
549                   elements will have their delimiter attributes printed.
550                   Scoping is not supported.
551                   Not supported for level 0.
552       timestamp - Include a timestamp as an element attribute.
553                   Two  timestamps  are  provided,  one for humans and one for
554                   machines. The times are stamped at element print time.
555                   E.g.  <field>  becomes  <field  timestamp_date="Fri  May  5
556                   10:23:33 2008" timestamp_sec="123456790">
557                   Scoping is not supported.
558       insert <name> <value> - Insert a custom element attribute.
559                               The  parameters <name> and <value> are combined
560                               to  form  an  element  attribute  with  <value>
561                               wrapped in double quotes.
562                               E.g. <field> becomes <field name="value">
563                               You  can  pretty  much  specify  anything as an
564                               attribute name and value including illegal  XML
565                               therefore user discretion is advised.
566       attrname - This is a synonym for the insert parameter.
567                  This option has been deprecated but  is  maintained  in  this
568                  release for backwards compatibility.
569       o Only one parameter group can be specified however this option can  be
570         invoked multiple times.
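
       The <scope> sub form described above can be illustrated with a  short
       Python sketch. This is not xmlfy's own code: the names parse_scope and
       in_scope are hypothetical, and the sketch assumes that the r flag makes
       the sequence counter wrap around after the end of the range it is
       attached to, which is one reading of the -A2:1-3,5r example.

```python
import re

def parse_scope(scope):
    """Parse '<s1>[-<s2>][r][,..]' into (start, end, restart) tuples.

    An end of None stands for the '$' token (an open-ended range).
    """
    ranges = []
    for part in scope.split(","):
        m = re.fullmatch(r"(\d+)(?:-(\d+|\$))?(r?)", part)
        if m is None:
            raise ValueError(f"bad scope range: {part!r}")
        start = int(m.group(1))
        if m.group(2) is None:
            end = start          # single occurrence, e.g. "5"
        elif m.group(2) == "$":
            end = None           # open-ended, e.g. "2-$"
        else:
            end = int(m.group(2))
        ranges.append((start, end, m.group(3) == "r"))
    return ranges

def in_scope(ranges, seq):
    """Return True if the 1-based element sequence number is in scope."""
    # Assumption: an 'r' flag wraps the counter after that range's end.
    periods = [end for _, end, restart in ranges if restart and end is not None]
    if periods:
        seq = (seq - 1) % max(periods) + 1
    return any(start <= seq and (end is None or seq <= end)
               for start, end, _ in ranges)

ranges = parse_scope("1-3,5r")
print([n for n in range(1, 11) if in_scope(ranges, n)])
# prints [1, 2, 3, 5, 6, 7, 8, 10]
```

       Under that assumption the fourth and ninth level 2 elements are  left
       untouched while the pattern repeats every five elements.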
571
572       Option: -T, --tag[<level>[:<scope>]] number|level
573                       |tagname <name>|insert <name> <value>
574                       |insertfile <name> <file>|insertfilexml <indent> <file>
575       Modify the element tags for the level specified.
576       <level> - The XML depth level to be modified.
577                 Must be an integer value greater than or equal to 0.
578                 E.g.  a  value  of 1 will modify the tag name for each record
579                 and a value of 2 will modify the tag name for each field.
580                 There is no space separating the option and the level value.
581                 If no level is specified then the given options will apply to
582                 all levels except level 0.
583       <scope> - A comma delimited set of sequence ranges with no spaces.
584                 The <scope> parameter has a sub form of <s1>[-<s2>][r][,..]
585                 <s1> - integer representing a start range.
586                 <s2> - integer or the $ token representing an end range.
587                 r    - restart  the scope counter for this tag after the com‐
588                        pletion of the associated range.
589                 Restrict the custom  tag  effectiveness  to  the  occurrences
590                 specified  in <scope>.  If the element sequence is not in the
591                 scope then the custom tag will not be applied.
592                 The restart scope counter option r allows the scope sequences
593                 to repeat continually. E.g. -T2:1-3,5r insert x y inserts the
594                 custom tag <x>y</x> before the first to third and fifth level
595                 2 elements, and keeps repeating this pattern until the output
596                 is exhausted.
597                 Scope sequence counters are always reset to zero for the next
598                 element  depth  level  and higher whenever a deeper XML depth
599                 level is entered into.
600                 If a <scope> range is not specified then the custom tag func‐
601                 tion applies to all elements at the specified <level>.
602       number - Suffix the tag name with its sequence number.
603                E.g.  <line>  becomes  <line1>  and  the  next  <line> becomes
604                <line2> and so on.
605                Scoping is not supported.
606                Not supported for level 0.
607       level - Prefix the tag name with its level.
608               E.g. <field> becomes <L2field>
609               Scoping is not supported.
610               Not supported for level 0.
611       tagname <name> - Change the tagname from the default to <name>
612                        Only applicable for changing default  behaviour  (i.e.
613                        when the --schema option is NOT specified).
614                        E.g. <field> becomes <word>
615                        You  can  pretty  much  specify  anything as a tagname
616                        including illegal XML  therefore  user  discretion  is
617                        advised.
618                        Scoping is not supported.
619       insert <name> <value> - Insert a custom element tag.
620                               The  parameters <name> and <value> are combined
621                               to form an element  tag  with  <value>  wrapped
622                               between <name> tag pairs.
623                               E.g. <name>value</name>
624                               The inserted element appears before any  output
625                               elements for the level specified.
626                               You can pretty much specify anything as an ele‐
627                               ment  name  and  value  including  illegal  XML
628                               therefore user discretion is advised.
629                               Not supported for level 0.
630       insertfile <name> <file> - Insert  a custom element tag containing con‐
631                                  tents of a file.
632                                  The contents of <file> are  wrapped  between
633                                  <name> tag pairs.
634                                  The encoding of <file> must match the output
635                                  encoding being used otherwise an undesirable
636                                  output will result.
637                                  Any BOM found in <file> is removed.
638                                  Any  reserved  XML  characters in <file> are
639                                  escaped, and newlines are corrected.
640                                  The inserted element appears before any out‐
641                                  put elements for the level specified.
642                                  You  can  pretty much specify anything as an
643                                  element name including illegal XML therefore
644                                  user discretion is advised.
645                                  Not supported for level 0.
646       insertfilexml <indent> <file> - Insert contents of an XML file.
647                                       The   entire  contents  of  <file>  are
648                                       inserted before any output elements for
649                                       the level specified.
650                                       The  encoding  of <file> must match the
651                                       output encoding being used otherwise an
652                                       undesirable output will result.
653                                       Any BOM found in <file> is removed.
654                                       If the parameter <indent> is an integer
655                                       value greater than  or  equal  to  zero
656                                       then  the contents of file are indented
657                                       by this amount,  any  XML  prologue  is
658                                       removed, and newlines are corrected.
659                                       If  the parameter <indent> is the value
660                                       "raw" then the XML file is inserted  as
661                                       is without its BOM.
662                                       You  can pretty much insert anything as
663                                       XML file content including illegal  XML
664                                       therefore user discretion is advised.
665       o Only  one parameter group can be specified however this option can be
666         invoked multiple times.
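
       The insertfile steps listed above (BOM removal, escaping of reserved
       characters, newline correction, wrapping in <name> tags) can be pictured
       with the following Python sketch. It is an illustration only, not
       xmlfy's code: insertfile_element is a hypothetical name, "newlines are
       corrected" is assumed to mean converting "\r\n" to "\n", and only the
       characters &, < and > are escaped here.

```python
from xml.sax.saxutils import escape

def insertfile_element(name, raw, encoding="utf-8"):
    """Wrap the decoded contents of a file between <name> tag pairs."""
    text = raw.decode(encoding)
    if text.startswith("\ufeff"):
        text = text[1:]                  # drop a leading BOM
    text = text.replace("\r\n", "\n")    # assumed newline correction
    return f"<{name}>{escape(text)}</{name}>"

print(insertfile_element("note", b"\xef\xbb\xbfa < b & c\r\n"))
```

       As with xmlfy itself, nothing in the sketch checks that <name> is a
       legal XML element name.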
667
668       Option: -k, --keyvaluepairs[<level>]
669       Switch on the generation of key/value XML tag pairs for the output.
670       <level> - The XML depth level to be modified.
671                 Must be an integer value greater than or equal to 2.
672                 There is no space separating the option and the level value.
673                 If no level is specified then the option will  apply  to  all
674                 levels except levels 0 and 1.
675       o In  this  mode  the  data of the first field of the current XML level
676         becomes the tag name for that level, that is, it becomes the key, and
677         any subsequent fields become its value.
678       o This  key/value pairing continues down the XML tree hierarchy for all
679         the XML levels specified.
680       o You can pretty much generate anything as a tag name including illegal
681         XML therefore user discretion is advised. The new tag name is trimmed
682         of leading and trailing whitespace and  whitespace  between  text  is
683         replaced with the underscore "_" character.
684       o If a blank field becomes a tag name candidate then xmlfy will skip it
685         and search along the same level for a more suitable  candidate.  This
686         behaviour  can  be  mitigated by using the -b option which will force
687         the default tag name to be substituted instead.
688       o Only  applicable  for  changing  default  behaviour  (i.e.  when  the
689         --schema option is NOT specified).
690       o This option can be invoked multiple times.
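
       The key selection rules above (first non-blank field becomes the tag
       name, surrounding whitespace trimmed, inner whitespace replaced by "_")
       can be sketched in Python as follows. The function name key_value is
       hypothetical and the sketch does not model the -b option or how xmlfy
       prints the remaining values.

```python
import re

def key_value(fields):
    """Return (tag_name, values) for one level, or None if no key exists."""
    for i, field in enumerate(fields):
        name = field.strip()
        if name:  # blank candidates are skipped
            return re.sub(r"\s+", "_", name), fields[i + 1:]
    return None

print(key_value(["  first name ", "John"]))
# prints ('first_name', ['John'])
```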
691
692       Option: -l, --linenumbers
693       This is a synonym for "-T1 number"
694       Include the line number in the line tag name.
695
696       Option: -f, --fieldnumbers
697       This is a synonym for "-T2 number"
698       Include the field number in the field tag name.
699
700       Option: -L, --linetags
701       Insert a line number tag within the XML formatted output.
702       This  is an alternative way of numbering your XML records. E.g. for the
703       first line record of XML output the following tag is inserted <linenum‐
704       ber>1</linenumber> and so on.
705
706       Option: -X, --xmlformat [XML1.0|XML1.1]|[SOAP1.1|SOAP1.2]
707               |[UTF-8|UTF-16|UTF-16BE|UTF-16LE|UTF-32|UTF-32BE|UTF-32LE]|BOM
708               |ASCIItoUTF|[noescape all|amp|lt|gt|quot|apos|brvbar]
709               |[newline dos|unix]
710       Allows you to specify the XML format to be used for the output.
711       XML1.0 - Generate XML 1.0 output (this is the default).
712       XML1.1 - Generate XML 1.1 output.
713       SOAP1.1 - Generate XML SOAP 1.1 output.
714       SOAP1.2 - Generate XML SOAP 1.2 output.
715       UTF-8 - Generate XML with UTF-8 encoding (default).
716       UTF-16 - Generate XML with UTF-16 encoding.
717       UTF-16BE - Generate XML with UTF-16BE (big-endian) encoding.
718       UTF-16LE - Generate XML with UTF-16LE (little-endian) encoding.
719       UTF-32 - Generate XML with UTF-32 encoding.
720       UTF-32BE - Generate XML with UTF-32BE (big-endian) encoding.
721       UTF-32LE - Generate XML with UTF-32LE (little-endian) encoding.
722       BOM - Generate and interpret a Byte-Order-Mark.
723       ASCIItoUTF - Convert ASCII input to wide UTF encoding.
724       noescape - Do  not  escape  select reserved XML characters.  By default
725                  xmlfy will escape reserved XML characters that appear in the
726                  input  stream and this option provides an adjustment to this
727                  behaviour.
728                  all - do not escape any characters.
729                  amp - do not escape the character & (ampersand).
730                  lt - do not escape the character < (less-than).
731                  gt - do not escape the character > (greater-than).
732                  quot - do not escape the character " (quote).
733                  apos - do not escape the character ' (apostrophe).
734                  brvbar - do not escape  the  character  |  (broken  vertical
735                           bar).
736       newline - Select the line ending format for XML meta-data.
737                 dos - Use carriage-return and new-line ("\r\n") for line end‐
738                       ings.
739                 unix - Use new-line ("\n") for line endings.
740       o The XML1.1 option only changes the prologue version string to "1.1"
741         and nothing else.
742       o When using the SOAP options, the normal XML output generated by xmlfy
743         is encapsulated in a SOAP Envelope and SOAP Body, the root or  parent
744         tag  defines  a namespace prefix of "x" with a URI reference that can
745         be adjusted with the -I option, and all  children  elements  (records
746         and fields) use this prefix name.
747         A  non-mandatory  administrative header element with a prefix name of
748         "xh" is provided containing program and execution details.
749         The SOAP options are only a basic  implementation  for  generating  a
750         simple  XML  SOAP envelope containing xmlfy data. There is no further
751         scope provided for SOAP Headers, SOAP Faults, transaction or protocol
752         handling.
753       o The  UTF-*  options  tell xmlfy to use the specified encoding for all
754         its XML meta-data (element tags, element attributes, prologues, etc).
755         Other  than  the  ASCIItoUTF  option,  no transformation of the input
756         stream is performed and xmlfy assumes that the encoding used  by  the
757         input stream matches the encoding specified, otherwise an undesirable
758         output will result containing different encodings between  the  input
759         data and XML meta-data.
760         If  specifying  the  UTF-16 or UTF-32 parameter and the BOM option is
761         either not specified or there is no BOM  in  the  input  stream  then
762         encoding in big-endian format will be assumed.
763       o The  BOM  (Byte-Order-Mark) will force xmlfy to handle the BOM in the
764         input stream if it is there, and also generate a BOM  in  the  output
765         stream.  If specifying the BOM option and a BOM is found in the input
766         stream then that will override any user specified encoding option.
767         The BOM byte sequence used for UTF-8 is 0xef 0xbb 0xbf (U+FEFF).
768         The BOM byte sequence used for UTF-16BE is 0xfe 0xff (U+FEFF).
769         The BOM byte sequence used for UTF-16LE is 0xff 0xfe (U+FEFF stored
770         in little-endian byte order).
771         The BOM byte sequence used  for  UTF-32BE  is  0x00  0x00  0xfe  0xff
772         (U+FEFF).
773         The  BOM  byte  sequence  used  for  UTF-32LE  is 0xff 0xfe 0x00 0x00
774         (U+FEFF stored in little-endian byte order).
774       o The ASCIItoUTF option when used in conjunction with one of the  UTF-*
775         options  will  process  ASCII  input  and  convert it to the wide UTF
776         encoding specified.
777       o The noescape options control which reserved XML characters should not
778         be escaped.
779       o The  newline  option  adjusts  the  line  ending  format used for XML
780         meta-data. On Unix platforms the default is unix and on  Win32  plat‐
781         forms  the  default  is dos. Only applies to XML meta-data output and
782         does not do input conversion. To match DOS input lines use either the
783         explicit -F1b \013\010 delimiter option, or use the default delimiter
784         and trim out the \013 character with the -t option.
785       o Only one parameter group can be specified however this option can  be
786         invoked multiple times.
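
       The BOM byte sequences listed above can be recognised with a few lines
       of Python. This is a sketch rather than xmlfy's own logic, and
       detect_bom is a hypothetical name. The 4-byte UTF-32 marks are tested
       before the 2-byte UTF-16 marks because the UTF-32LE sequence begins
       with the UTF-16LE one.

```python
import codecs

# Longest sequences first: 0xff 0xfe 0x00 0x00 starts with 0xff 0xfe.
BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
]

def detect_bom(data):
    """Return (encoding name, BOM length), or (None, 0) if no BOM leads."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

print(detect_bom(b"\xff\xfe<\x00?\x00"))
# prints ('UTF-16LE', 2)
```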
787
788       Option: -p, --printonly header|footer|rtagopen|rtagclose|records
789       Allows you to just print XML snippets to the output.
790       This  is  useful  when you want to execute xmlfy multiple times to con‐
791       struct a single XML output file.
792       header - Will only print the prologue, doctype, opened SOAP Envelope
793                and Body tags, the SOAP Header tag, and the BOM.
796       footer - Will only print closed SOAP Envelope and Body tags.
797       rtagopen - Will only print an opened root element tag.
798       rtagclose - Will only print a closed root element tag.
799       records - Will only print record elements and their field elements.
800       o Only  one  parameter  can  be  specified  however  this option can be
801         invoked multiple times.
802
803       Option: -I, --identifier <system_identifier>
804       Allows you to specify your own system identifier of the doctype  should
805       you not be content with what xmlfy has specified.
806       system_identifier - An array of characters used to override the default
807                           system identifier.
808                           You can pretty much specify anything  as  a  system
809                           identifier  including  illegal  XML  therefore user
810                           discretion is advised.
811       o By default xmlfy will use the string "xmlfy.dtd", or if specifying  a
812         schema, use the schema filename as the system identifier.
813       o You  can  also use this option to override the default SOAP namespace
814         URI value for the root or parent element when using the XML SOAP for‐
815         mat options.
816
817       Option: -s, --summary[2|c|n|f <file>]
818       When  all  input  is exhausted an XML summary element is printed at the
819       bottom providing a brief summary of what xmlfy processed.
820       2        - Print the summary element to stderr instead.
821       c        - Print the summary element as an XML comment.
822       n        - Print the summary element without  calculating  any  message
823                  digests.
824       f <file> - Print the summary element to <file>.
825       By  default  MD5  and  SHA512 checksum elements are provided inside the
826       summary called md5_input, md5_output, sha512_input  and  sha512_output.
827       The  md5_input and sha512_input checksums are a digest of all the input
828       that was actually processed including any input BOM. The md5_output and
829       sha512_output  checksums  are  a digest of all the output including any
830       output BOM that precedes the XML summary element. To correctly validate
831       the output result against the output checksum you must first remove any
832       summary element and summary comments from the output result.
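
       To check an output file against the output checksums, the digests can
       be recomputed once the summary element and any summary comments have
       been removed. The Python sketch below covers only the digest step. It
       assumes the checksums are recorded as hexadecimal strings, which this
       page does not state, and output_digests is a hypothetical helper name.

```python
import hashlib

def output_digests(output_without_summary):
    """Digest the output bytes (any output BOM included, summary removed)."""
    return {
        "md5_output": hashlib.md5(output_without_summary).hexdigest(),
        "sha512_output": hashlib.sha512(output_without_summary).hexdigest(),
    }

digests = output_digests(b"<xmlfy></xmlfy>\n")
print(len(digests["md5_output"]), len(digests["sha512_output"]))
# prints 32 128
```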
833
834       Option: -U, --unxml
835       Read XML formatted input and remove all that bracket  racket  reverting
836       your  XML  document  back to a plain format. Can be used in conjunction
837       with the -F<level> <string> option to specify the delimiter to use  for
838       each  XML  depth level.  Multiple same level -F options are meaningless
839       in this context and delimiters are only inserted if more than one field
840       is  available  to  be  delimited.  Field  separator scoping options are
841       ignored. The default delimiter is a space character for XML depth  lev‐
842       els  of  2  and  above,  and new-line for XML depth levels below 2. Tag
843       names and their attributes are not included in the output, and anything
844       inside XML comments is filtered out. If there is a BOM in  the  input
845       then xmlfy will use that for the encoding, otherwise  xmlfy  will  look
846       for  the opening XML character sequence of "<?" to determine the encod‐
847       ing being used.  If neither of the previous methods found  the  correct
848       encoding  then  you  can use the -X UTF-* options as a fallback.  Works
849       best with XML output generated by xmlfy but can also be used with  cau‐
850       tion on other foreign XML documents.
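
       The default delimiter rules for -U can be pictured with the Python
       sketch below, which flattens a small document using a space between
       level 2 fields and a new-line between level 1 records. It is an
       approximation built on Python's ElementTree parser, not a description
       of how xmlfy reads XML, and unxml is a hypothetical function name. The
       delimiters argument stands in for -F<level> <string>.

```python
import xml.etree.ElementTree as ET

def unxml(xml_text, delimiters=None):
    """Flatten XML to plain text, joining siblings with per-level delimiters."""
    delimiters = delimiters or {}

    def flatten(elem, depth):
        children = list(elem)      # comments are dropped by the parser
        if not children:
            return elem.text or ""
        level = depth + 1
        sep = delimiters.get(level, "\n" if level < 2 else " ")
        return sep.join(flatten(child, level) for child in children)

    return flatten(ET.fromstring(xml_text), 0)

doc = ("<xmlfy><line><field>a</field><field>b</field></line>"
       "<!-- note --><line><field>c</field></line></xmlfy>")
print(unxml(doc))
# prints "a b" on the first line and "c" on the second
```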
851
852       Option: --noxml
853       Do not XML-fy the input stream but do process it for reserved XML char‐
854       acters (this feature was initially written  for  formatting  the  xmlfy
855       HTML  test  reports  that use wide encodings).  Used in conjunction with
856       the -X options to control the conversion of reserved characters  and/or
857       to transform the input stream to wide UTF encodings.
858       E.g.  To  transform  an  ASCII input stream to UTF-16BE encoding with a
859       BOM:
860       xmlfy --noxml -X UTF-16BE -X ASCIItoUTF -X noescape all -X BOM
861       E.g. To just escape select reserved XML characters in a UTF-32LE  input
862       stream:
863       xmlfy --noxml -X UTF-32LE -X noescape amp
864
865       Important note on specifying options.
866       The way xmlfy handles options is very straightforward and can be easily
867       confused if you don't follow the syntax specified for each option.  The
868       getopt library has been deliberately avoided to keep xmlfy portable.
869
870       xmlfy first evaluates options supplied on the command line; if a schema
871       file is supplied then xmlfy will also look for options in that file and
872       evaluate  them too. See the schema file section below on how to specify
873       xmlfy options inside a schema file.
874

OUTPUT

876       How it works.
877       The input processor used by xmlfy block reads  unprocessed  bytes  from
878       standard  input (stdin) and stores them in an array the size of a level
879       1 record. This level 1 record is then  processed  for  fields  and  sub
880       fields  etc  by  marking  their positions in this array. Dynamic memory
881       handling is used.
882
883       The output processor used by xmlfy takes the  results  from  the  input
884       processor  and  re-packages  it  with  suitably encoded XML syntax. Any
885       input characters that are reserved for XML  are  by  default  re-repre‐
886       sented in their escaped form.
887           Character & (ampersand) becomes string &amp;
888           Character < (less-than) becomes string &lt;
889           Character > (greater-than) becomes string &gt;
890           Character " (quote) becomes string &quot;
891           Character ' (apostrophe) becomes string &apos;
892           Character | (broken vertical bar) becomes string &brvbar;
893       The  output processor then writes processed bytes to a block buffer for
894       printing to standard output (stdout).
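
       The default escaping table above, together with the -X noescape
       adjustment, can be sketched in Python as follows. The function name
       escape_reserved is hypothetical and the sketch works on decoded text
       rather than on the raw byte blocks that xmlfy processes.

```python
# Entity name -> reserved character, as listed in the table above.
RESERVED = {
    "amp": "&",
    "lt": "<",
    "gt": ">",
    "quot": '"',
    "apos": "'",
    "brvbar": "|",
}

def escape_reserved(text, noescape=()):
    """Escape reserved characters except those named in noescape."""
    if "all" in noescape:
        return text
    table = str.maketrans({
        char: f"&{name};"
        for name, char in RESERVED.items()
        if name not in noescape
    })
    # translate() makes a single pass, so the "&" of a freshly
    # inserted entity is never escaped a second time.
    return text.translate(table)

print(escape_reserved('a & b < "c"'))
# prints a &amp; b &lt; &quot;c&quot;
```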
895
896       Using a schema file.
897       The default schema used by xmlfy is hard coded and can be described  as
898       follows:
899       In DTD schema form:
900           <!ELEMENT xmlfy (line*)>
901           <!ELEMENT line (field*)>
902           <!ELEMENT field (#PCDATA)>
903       In RNC schema form:
904           start = xmlfy
905           xmlfy = element xmlfy { line* }
906           line = element line { field* }
907           field = element field { text }
908       In XSD schema form:
909           <xs:schema>
910             <xs:element name="xmlfy">
911               <xs:sequence>
912                 <xs:element  name="line" type="lineType" minOccurs="0" maxOc‐
913           curs="unbounded" />
914               </xs:sequence>
915             </xs:element>
916             <xs:complexType name="lineType">
917               <xs:sequence>
918                 <xs:element name="field" type="xs:string" minOccurs="0"  max‐
919           Occurs="unbounded" />
920               </xs:sequence>
921             </xs:complexType>
922           </xs:schema>
923
924       A schema file for the ls -la command that produces output like this:
925           total 73
926           drwx------+  3 ag None     0 Apr 20 19:36 .
927           -rwxr-xr-x   1 ag None 15639 Apr 20 19:31 a.exe
928           -rwx------+  1 ag None  6354 Apr 20 19:31 xmlfy.c
929           -rwx------+  1 ag None  4901 Apr 19  2008 xmlfy.h
930
931       In DTD schema form will look like this:
932           <!ELEMENT ls (total?), (file*)>
933           <!ELEMENT total (prompt, totalsize)>
934           <!ELEMENT   file   (permission?,  blocks?,  user?,  group?,  size?,
935           date_M?, date_d?, date_ty?, fname)>
936           <!ELEMENT date_ty (date_y)>
937           <!ELEMENT date_ty (date_h, date_m)>
938           <!ELEMENT prompt (#PCDATA)>
939           <!ELEMENT totalsize (#PCDATA)>
940           <!ELEMENT permission (#PCDATA)>
941           <!ELEMENT blocks (#PCDATA)>
942           <!ELEMENT user (#PCDATA)>
943           <!ELEMENT group (#PCDATA)>
944           <!ELEMENT size (#PCDATA)>
945           <!ELEMENT date_y (#PCDATA)>
946           <!ELEMENT date_M (#PCDATA)>
947           <!ELEMENT date_d (#PCDATA)>
948           <!ELEMENT date_h (#PCDATA)>
949           <!ELEMENT date_m (#PCDATA)>
950           <!ELEMENT fname (#PCDATA)>
951
952       and should be saved to a file as ls.dtd and invoked as:
953           % ls -la | xmlfy --schema ls.dtd -F3 :
954
955       In RNC schema form will look like this:
956           start = ls
957           ls = element ls { total? | file* }
958           total = element total { prompt, totalsize }
959           file = element file { permission?, blocks?, user?,  group?,  size?,
960           date_M?, date_d?, date_ty?, fname }
961           date_ty = element date_ty { date_y }
962           date_ty |= element date_ty { date_h, date_m }
963           prompt = element prompt { text }
964           totalsize = element totalsize { text }
965           permission = element permission { text }
966           blocks = element blocks { text }
967           user = element user { text }
968           group = element group { text }
969           size = element size { text }
970           date_y = element date_y { text }
971           date_M = element date_M { text }
972           date_d = element date_d { text }
973           date_h = element date_h { text }
974           date_m = element date_m { text }
975           fname = element fname { text }
976
977       and should be saved to a file as ls.rnc and invoked as:
978           % ls -la | xmlfy --schema ls.rnc -F3 :
979
980       In XSD schema form will look like this:
981           <xs:schema>
982             <xs:element name="ls" type="lsType" />
983             <xs:complexType name="lsType">
984               <xs:sequence>
985                 <xs:element name="total" type="totalType" minOccurs="0" />
986                 <xs:element  name="file" type="fileType" minOccurs="0" maxOc‐
987           curs="unbounded" />
988               </xs:sequence>
989             </xs:complexType>
990             <xs:complexType name="totalType">
991               <xs:sequence>
992                 <xs:element name="prompt" type="xs:string" />
993                 <xs:element name="totalsize" type="xs:string" />
994               </xs:sequence>
995             </xs:complexType>
996             <xs:complexType name="fileType">
997               <xs:sequence>
998                 <xs:element name="permission" type="xs:string"  minOccurs="0"
999           />
1000                 <xs:element name="blocks" type="xs:string" minOccurs="0" />
1001                 <xs:element name="user" type="xs:string" minOccurs="0" />
1002                 <xs:element name="group" type="xs:string" minOccurs="0" />
1003                 <xs:element name="size" type="xs:string" minOccurs="0" />
1004                 <xs:element name="date_M" type="xs:string" minOccurs="0" />
1005                 <xs:element name="date_d" type="xs:string" minOccurs="0" />
1006                 <xs:element name="date_ty" type="datetyType" minOccurs="0" />
1007                 <xs:element name="fname" type="xs:string" />
1008               </xs:sequence>
1009             </xs:complexType>
1010             <xs:complexType name="datetyType">
1011               <xs:choice>
1012                 <xs:element name="date_y" type="xs:string" />
1013                 <xs:sequence>
1014                   <xs:element name="date_h" type="xs:string" />
1015                   <xs:element name="date_m" type="xs:string" />
1016                 </xs:sequence>
1017               </xs:choice>
1018             </xs:complexType>
1019           </xs:schema>
1020
1021       and should be saved to a file as ls.xsd and invoked as:
1022           % ls -la | xmlfy --schema ls.xsd -F3 :
1023
1024       Shoe-horning  raw  data  into a structure defined by a schema is rather
1025       straightforward when the input fields have a  one-to-one  relationship
1026       with  the  fields  of  the  schema elements, however if wildcard tokens
1027       and/or Boolean logic are employed in the schema then it becomes quite a
1028       challenge,  sometimes  even impossible, to be deterministic about which
1029       input field belongs to which schema field. Strictly speaking, the  main
1030       function  of  the  schema  is  to  ensure  XML  is valid and to do this
1031       requires the XML document to already exist. In xmlfy's  case  we  are
1032       doing  the reverse by building an XML document on the fly while follow‐
1033       ing rules described by the schema - this is still okay and the  result‐
1034       ing XML can be considered to be both valid and well formed.
1035
1036       xmlfy  employs two techniques to help with this shoe-horning input data
1037       problem. The first technique xmlfy uses is recognising multiple element
1038       definitions  that  have  the same name. This allows you to capture your
1039       schema elements under a variety of input circumstances  without  having
1040       to  create  a  unique  element for each circumstance - you can still do
1041       that if you want. The second  technique  xmlfy  uses  is  incorporating
1042       field  match  constraint helpers to assist in matching the input fields
1043       to the elements described by the schema. These helpers  are  useful  in
1044       improving  the  speed of xmlfy particularly when using compound element
1045       structures and wildcard tokens  in  the  schema  hierarchy.  After  the
1046       schema file is loaded into memory, an array of helpers is generated for
1047       each element that describes all combinations of the schema tree traver‐
1048       sal  paths  that  can be taken and associates each combination with the
1049       minimum, maximum and last number of fields required for a match against
1050       the  number  of  available  input  fields. For example, using the above
1051       schema a match will occur for:
1052           total(min=2, max=2, last=2) when input fields = 2.
1053           file(min=1, max=9, last=1) when 1 <= input fields <= 9
1054           and date_ty is a single field (min=1, max=1, last=1).
1055           file(min=1, max=10, last=1) when 1 <= input fields <= 10
1056           and date_ty is two fields (min=2, max=2, last=2).
1057       xmlfy will iterate through all the matching helpers of the target  ele‐
1058       ment and any child elements that may be encountered, until it can fully
1059       satisfy the requirements of the schema tree hierarchy after  which  the
1060       matching record is then checked against its wildcard obligations in the
1061       parent element definition and if okay is finally printed.
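
       The min and max values quoted above can be reproduced with a small
       Python sketch that sums the field counts of an element's children
       according to their wildcard tokens. It is only a model of the
       arithmetic: field_bounds is a hypothetical name, the "last" value is
       not modelled, and xmlfy's real helper arrays cover every traversal
       path rather than a single one.

```python
def field_bounds(children):
    """Return (min, max) input fields for one traversal path.

    children is a list of (min_fields, max_fields, token) tuples where
    token is '', '?', '*' or '+'. A max of None means unbounded.
    """
    lo, hi = 0, 0
    for cmin, cmax, token in children:
        if token in ("", "+"):
            lo += cmin               # mandatory at least once
        if token in ("*", "+") or cmax is None:
            hi = None                # repeatable, so no upper bound
        elif hi is not None:
            hi += cmax
    return lo, hi

optional = [(1, 1, "?")] * 7         # permission? ... date_d?
fname = [(1, 1, "")]
print(field_bounds(optional + [(1, 1, "?")] + fname))   # date_ty as date_y
print(field_bounds(optional + [(2, 2, "?")] + fname))   # date_ty as h, m
# prints (1, 9) then (1, 10)
```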

       To specify xmlfy options inside a schema file, encapsulate them in a
       special token that is in effect a schema comment.
           DTD and XSD example:
           <!-- xmlfy-args: -F1 "\n" -F2 ABC -q -Q \"\' -->
           RNC example:
           ## xmlfy-args: -F1 "\n" -F2 ABC -q -Q \"\'
       This special token must appear complete on a single line, starting at
       the leftmost column. Spacing is significant, only the first occurrence
       is recognised, and ideally the token is placed near the top of the
       schema file. The schema option syntax is the same as the command line
       option syntax, except that some options are not allowed, e.g. --schema.
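
       The placement rules for the token (one complete line, leftmost
       column, first occurrence only) can be sketched in Python as follows.
       The function name embedded_args() is hypothetical, and splitting the
       options with shlex is an assumption: xmlfy's own tokeniser may treat
       quoting and escapes differently.

```python
import re
import shlex

# Assumed exact spacing: "<!-- xmlfy-args: ... -->" for DTD and XSD,
# "## xmlfy-args: ..." for RNC, each starting in the first column.
_ARGS_LINE = re.compile(
    r"^(?:<!-- xmlfy-args: (?P<xml>.*) -->|## xmlfy-args: (?P<rnc>.*))$"
)


def embedded_args(schema_text):
    """Return the options from the first xmlfy-args line, or None."""
    for line in schema_text.splitlines():
        match = _ARGS_LINE.match(line)
        if match:
            body = match.group("xml")
            if body is None:
                body = match.group("rnc")
            # Shell-like splitting is an assumption made for this sketch.
            return shlex.split(body)
    return None
```

       An indented token, or a second token further down the file, is not
       picked up by this sketch, which mirrors the rules stated above.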

LIMITATIONS

       xmlfy has been successfully tested on average hardware with input
       records containing over 10,000,000 fields whilst using a complex schema
       tree structure and multi-level delimiters.

       Currently the xmlfy schema file parser is fairly basic and exhibits the
       following behaviour:

       DTD schema
       - Only recognises the <!ELEMENT> directive and ignores all others.
       - The first valid <!ELEMENT> definition becomes the root or parent
         element.
       - Element fields that don't have an element definition default to being
         (#PCDATA).
       - Elements defined as (#PCDATA) or (#CDATA) are ignored, causing the
         referring field to default to (#PCDATA). However, it is good practice
         to include these elements in order to furnish a complete DTD schema.
       - Only honours the +, ? and * wildcard tokens.
       - At this stage does not honour the field group set () and or-ing |
         syntax tokens.
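
       The DTD rules above can be illustrated with a small Python sketch
       that reads only flat <!ELEMENT> definitions and records the +, ? or *
       token on each field. It is a simplified stand-in and not xmlfy's
       parser; parse_elements() is a hypothetical name.

```python
import re

# Matches only flat definitions such as <!ELEMENT record (a*, b, c*)>.
# Nested groups are deliberately not handled.
_ELEMENT = re.compile(r"<!ELEMENT\s+(\S+)\s+\(([^()]*)\)\s*>")


def parse_elements(dtd_text):
    """Return [(element, [(field, wildcard), ...]), ...] in file order."""
    result = []
    for name, content in _ELEMENT.findall(dtd_text):
        fields = []
        for item in content.split(","):
            item = item.strip()
            wildcard = item[-1] if item and item[-1] in "+?*" else ""
            fields.append((item.rstrip("+?*"), wildcard))
        result.append((name, fields))
    return result
```

       Under the rules above, the first entry of the returned list would be
       the root or parent element, and any directive other than <!ELEMENT>
       is simply skipped.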

       RNC schema
       - Only recognises named directives and ignores all others.
       - The element named "start" becomes the root or parent element.
       - Element fields that don't have an element definition default to being
         { text }.
       - Elements defined as { text } are ignored, causing the referring field
         to default to { text }. However, it is good practice to include these
         elements in order to furnish a complete RNC schema.
       - Only honours the +, ? and * wildcard tokens.
       - At this stage does not honour the field group set () and or-ing |
         syntax tokens.

       XSD schema
       - Only recognises the <schema>, <element>, <complexType>, <ref>,
         <sequence>, and <choice> directives and ignores all others.
       - The recognised directives are not fully implemented and their use
         should be kept straightforward.
       - The first valid <element> definition becomes the root or parent
         element.
       - Element types that are not of matchable complexType are treated as
         "xsi:string" regardless of what type is specified.
       - Only honours the minOccurs="0", maxOccurs="0" and
         maxOccurs="unbounded" wildcard attributes.
       - At this stage does not honour group sets, but offers limited support
         for choices.

       All schema types
       - The fields of the parent element define all the level 1 elements
         (let's call these fields the record elements).
       - The fields of the record elements simply represent other elements, and
         element nesting is allowed.
       - The field names that are specified in the element definitions are
         read from left to right and matched against a field number
         calculation on the input fields, and then matched again on any
         wildcard tokens.
       - You can wildcard many fields, but you should think clearly about what
         you are trying to achieve and whether it is at all possible. For
         example, the following DTD is perfectly suitable for checking for
         valid XML, but xmlfy cannot reliably shoe-horn input data into DTD
         elements a, b and c because more than one field has a wildcard token
         that matches none or many input fields.
             <!ELEMENT parent (record)>
             <!ELEMENT record (a*, b, c*)>
             <!ELEMENT a (#PCDATA)>
             <!ELEMENT b (#PCDATA)>
             <!ELEMENT c (#PCDATA)>
         In the above example xmlfy will allocate ALL input fields to element
         <a>, which may not be what was intended.
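
       The ambiguity in the (a*, b, c*) example can be made concrete by
       listing every way a given number of input fields could be divided
       between a, b and c. The Python sketch below is an illustration only
       and does not model xmlfy's matching code; splits() is a hypothetical
       name.

```python
def splits(field_count):
    """List every (a, b, c) field count that satisfies record (a*, b, c*)."""
    # b takes exactly one field; a and c share the rest in any proportion.
    return [(a, 1, field_count - 1 - a) for a in range(field_count)]


for n in (1, 2, 3):
    print(n, splits(n))
```

       A record with n input fields has n equally valid divisions, so the
       schema alone cannot say which one was intended.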

RETURN VALUES

        0    Normal exit.
       -1    Invalid argument specified.
       -2    Error processing schema file contents.
       -3    Infinite loop detected when matching input against schema
             elements.
       -10   Out of memory.
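
       On POSIX systems a parent process sees only the low 8 bits of a
       child's exit status, so a negative return value is reported as a
       number between 0 and 255. The Python sketch below translates such an
       observed status back to the meanings listed above. It assumes that
       xmlfy passes these values directly to exit(); describe_status() is a
       hypothetical helper and not part of xmlfy.

```python
# Return values as documented above.
RETURN_VALUES = {
    0: "Normal exit.",
    -1: "Invalid argument specified.",
    -2: "Error processing schema file contents.",
    -3: "Infinite loop detected when matching input against schema elements.",
    -10: "Out of memory.",
}


def describe_status(status):
    """Translate a status observed by a POSIX parent process (0 to 255)."""
    for value, meaning in RETURN_VALUES.items():
        # Only the low 8 bits survive, so -1 is observed as 255.
        if (value & 0xFF) == status:
            return meaning
    return "Undocumented status."


for observed in (0, 255, 254, 253, 246):
    print(observed, describe_status(observed))
```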

AUTHOR

       Originally written by Arthur Gouros.
       This software also contains material derived from the US Secure Hash
       Algorithms (RFC4634).
       This software also contains material derived from the RSA Data
       Security, Inc. MD5 Message-Digest Algorithm.
       This software also contains material derived from Ville Laurikari's TRE
       regex library.

LICENSE

       BSD License for xmlfy
       Copyright © 2008-2010, Arthur Gouros
       All rights reserved.

       Redistribution and use in source and binary forms, with or without
       modification, are permitted provided that the following conditions are
       met:

       - Redistributions of source code must retain the above copyright
         notice, this list of conditions and the following disclaimer.
       - Redistributions in binary form must reproduce the above copyright
         notice, this list of conditions and the following disclaimer in the
         documentation and/or other materials provided with the distribution.
       - Neither the name of Arthur Gouros nor the names of its contributors
         may be used to endorse or promote products derived from this software
         without specific prior written permission.

       THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
       IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
       TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
       PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
       OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
       EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
       PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
       PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
       LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
       NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
       SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

SEE ALSO

       The full documentation of the xmlfy project can be found on the web at:

           http://xmlfy.sourceforge.net

       The website is updated more frequently than the man pages and should be
       considered the authoritative source of information.

xmlfy 1.5.3                        May 2010                           xmlfy(1)