xmlfy(1)                         User Commands                        xmlfy(1)

NAME

       xmlfy - Convert to XML on the fly.

SYNOPSIS

       xmlfy [OPTION]...

       -h, --help
             print usage instructions

       -v, --version
             print version number

       --license
             print license

       --debug
             print extra debugging information

       Input options:

       -F, --fieldseparator[<level>[b][:<scope>]] <string>
             specify a delimiter string token for the level specified

       -R, --recordseparator <string>
             this is a synonym for "-F1 <string>"
             specify an alternative record separator string to the default

       -C, --column[:<scope>] <c1>-<c2>[:<name>]
             create an input field from an input column range

       -W, --regex[:<scope>] [E|B][i][l][r][U][n][b][e]/<pattern>/[<name>[,..]]
             create input fields from a regular expression

       -e, --expelempty
             expel empty input records and fields

       -E, --expel <input_records>[:<input_fields>]
             expel selected records or fields from being processed

       -q, --quotedfields[2]
             treat fields that are between quotes as one field

       -Q, --quotechars[2] <string>
             specify an array of quoting characters to use

       -b, --blanklines
             do not ignore blank input records

       -t, --trim
             trim leading and trailing white space from input fields

       Output options:

       -S, --schema <file>
       -Sd, --schemadtd <file>
       -Sr, --schemarnc <file>
       -Sx, --schemaxsd <file>
             use a schema <file> for tag names and element control

       -M, --matchdirect 0|<elementname>
             match directly on a specific element in the schema

       -A, --attribute[<level>[:<scope>]] number|level
               |delimiter|timestamp|insert <name> <value>
             include attributes in the opening element tag

       -T, --tag[<level>[:<scope>]] number|level
               |name <name>
               |[re]insert <name> <value>
               |[re]insertfile <name> <file>
               |[re]insertfilexml <indent> <file>
             modify or insert element tags

       -k, --keyvaluepairs[<level>]
             generate key/value XML tag pairs

       -l, --linenumbers
             this is a synonym for "-T1 number"
             include the line number in the line tag name

       -f, --fieldnumbers
             this is a synonym for "-T2 number"
             include the field number in the field tag name

       -L, --linetags
             include a line number tag with the record data

       -X, --xmlformat [XML1.0|XML1.1]|[SOAP1.1|SOAP1.2]|[HTML table|list]
               |[UTF-8|UTF-16|UTF-16BE|UTF-16LE|UTF-32|UTF-32BE|UTF-32LE]|BOM
               |ASCIItoUTF|[noescape all|amp|lt|gt|quot|apos|brvbar]
               |trimtagclose|[newline dos|unix]
             specify an XML output format

       -p, --printonly header|footer|rtagopen|rtagclose|records
             print only snippets of the XML output

       -I, --identifier <system_identifier>
             specify an alternate system identifier of the doctype or SOAP URI

       -s, --summary[2|c|n|f <file>]
             print a summary after the end of the processing

       -U, --unxml
             undo the XML syntax leaving just plain text

       --noxml
             do not XML-fy the input stream

DESCRIPTION

       The xmlfy command reads stdin and outputs it to stdout in XML format
       using supplied control directives.

       Delimiter tokens and/or column selections are used to break down the
       input stream into XML elements, which are then represented inside an
       XML tree hierarchy that can span multiple depth levels. For example,
       command line output was originally designed for text or CRT based
       processing. The xmlfy command takes this text-based output, where a
       new-line often represents an end-of-record of data and white space
       often represents a field separator, and reformats it into XML output
       suitable for interfacing with modern object-oriented systems.

       xmlfy is a powerful yet lightweight tool that primarily caters for
       converting ASCII, UTF-8, UTF-16 or UTF-32 based output into XML format
       on the fly and dealing with common issues associated with this kind of
       transformation.

       The xmlfy command also supports a basic version of a schema
       configuration, allowing you to control the format of the XML output by
       supplying a schema file as an option.

       With no options supplied xmlfy will use default values for its XML
       format. The entire standard input will be enclosed in <xmlfy></xmlfy>
       pairs, each line of standard input will be enclosed in <line></line>
       pairs, and each field of each line will be enclosed in <field></field>
       pairs.
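       The default behaviour described above can be modelled in a few lines.
       The Python sketch below is only an illustration, not xmlfy's
       implementation: the function name xmlfy_default is hypothetical, and
       the indentation and the absence of any XML declaration are assumptions
       about the exact output layout.

```python
from xml.sax.saxutils import escape


def xmlfy_default(text: str) -> str:
    """Wrap the white space separated fields of each non-blank line in XML tags."""
    out = ["<xmlfy>"]
    for line in text.split("\n"):      # default level 1 delimiter: new-line
        fields = line.split()          # default level 2 delimiter: white space
        if not fields:                 # blank lines are ignored by default
            continue
        out.append("  <line>")
        out.extend(f"    <field>{escape(f)}</field>" for f in fields)
        out.append("  </line>")
    out.append("</xmlfy>")
    return "\n".join(out)


print(xmlfy_default("alice 42\nbob 7"))
```

       Each input line becomes one <line> element holding one <field> element
       per white space separated token.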

OPTIONS

       You can supply options to customise the behaviour of xmlfy at the
       command line, or by a special token inside the schema file, or both.
       NOTE: Options are resolved from left to right. If any conflicting
       options are specified then the last one will have precedence.

       Option: -h, --help
       The command line usage is printed in plain text format, not in XML
       format.

       Option: -v, --version
       The version number is printed in plain text format, not in XML format.
       If the version number is required in XML format it is included with the
       summary option.

       Option: --license
       Print all licenses used by xmlfy.

       Option: --debug
       Print extra debugging information to stderr to help debug xmlfy
       behaviour.

       Input options:

       Option: -F, --fieldseparator[<level>[b][:<scope>]] <string>
       Allows you to specify a delimiter string token for the level specified.
       <level> - The XML depth level to be delimited by <string>.
                 Must be an integer value greater than or equal to 1.
                 E.g. a value of 1 will split the input into records delimited
                 by <string>, a value of 2 will split records into fields
                 delimited by <string>, a value of 3 will split fields into
                 subfields delimited by <string>, and so on.
                 There is no space separating the option and the level value.
                 If no level is specified then the option applies only to
                 level 2.
       b - Use byte matching for the specified delimiter string.
           By specifying this option the delimiter string is treated as just a
           literal sequence of bytes. Normally command line arguments are
           presented to xmlfy as ASCII strings and if wide UTF encoding like
           UTF-16 or UTF-32 is being used then xmlfy will automatically
           convert the specified delimiter string to that encoding. With this
           option no encoding conversion takes place. In this mode you can
           also specify escaped decimal byte sequences inside the delimiter
           string. E.g. "\123\234\\"
       <scope> - A comma-delimited set of sequence ranges with no spaces.
                 The <scope> parameter has a sub form of <s1>[-<s2>][r][,..]
                 <s1> - integer representing a start range.
                 <s2> - integer or the $ token representing an end range.
                 r    - restart the scope counter for this delimiter after the
                        completion of the associated range.
                 Restrict the delimiter effectiveness to the occurrences
                 specified in <scope>. If a delimiter <string> is encountered
                 for the level specified and its sequence is not in the scope
                 then it will not function as a field separator and will
                 instead be treated as data.
                 E.g. -F3:1-3,8 "." means that level 3 fields will only be
                 created for the 1st to 3rd, and 8th occurrences of the
                 delimiter "." (period).
                 The restart scope counter option r allows you to specify
                 repeating scope sequences.
                 E.g. -F1:2,5r "\n" means create level 1 records out of every
                 second and fifth line, and keep repeating this until the
                 input is exhausted.
                 When using multiple same-level delimiters, restarting scope
                 counters of the equivalent level and higher get reset
                 whenever a delimiter match is applied.
                 If a <scope> range is not specified then the delimiter
                 function applies to every occurrence of <string> of the
                 target level.
       <string> - A sequence of characters or token to be used as a delimiter.
                  Tokens specified literally as "\n", "\r", and "\t" are
                  translated to their corresponding control character. If
                  using wide UTF encoding then <string> is automatically
                  converted to that encoding, otherwise you can use the byte
                  matching option and specify escaped decimal byte sequences
                  inside <string>.
       o If the delimiter token is the same for a series of levels then the
         shallowest level will take precedence, unless the shallowest levels
         have been limited by scope restrictions. You can also make use of
         quotes in the input along with specifying quote options.
       o The XML tree algorithm deepens in a sequential way, therefore you must
         set your delimiter levels as an unbroken sequence for them to be of
         any use; that is, you cannot split a level 2 field with a level 4
         delimiter string.
       o Refer to the schema option section for information on level handling
         when a schema file is specified.
       o Levels 1 and 2 are already set by default.
       o The default level 1 delimiter token is NEWLINE (new-line).
       o The default level 2 delimiter token is WHITESPACE (space, tab,
         new-line, carriage-return, vertical-tab and form-feed).
       o The delimiters for levels 3 and above are unset.
       o Only one delimiter string token can be specified, however this option
         can be invoked multiple times allowing for multiple delimiters to be
         used at the level specified. When specifying multiple same-level
         delimiters, the longer delimiter strings are matched before the
         shorter ones. The delimiter string is not included in the output.
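       The <scope> syntax <s1>[-<s2>][r][,..] can be illustrated with a small
       Python sketch. It is not taken from xmlfy; parse_scope and scope_filter
       are hypothetical names, and the handling of r (the occurrence counter
       returns to zero once the range carrying the r flag has completed) is
       one reading of the description above.

```python
def parse_scope(scope: str):
    """Parse '<s1>[-<s2>][r][,..]' into (start, end, restart) tuples.

    An end of None stands for the '$' token (no upper bound).
    """
    ranges = []
    for part in scope.split(","):
        restart = part.endswith("r")
        if restart:
            part = part[:-1]
        start, _, end = part.partition("-")
        s1 = int(start)
        if end == "$":
            s2 = None
        elif end:
            s2 = int(end)
        else:
            s2 = s1                     # a single number is a one-item range
        ranges.append((s1, s2, restart))
    return ranges


def scope_filter(ranges):
    """Return a callable that reports whether the next occurrence is in scope."""
    counter = 0

    def in_scope() -> bool:
        nonlocal counter
        counter += 1
        for s1, s2, restart in ranges:
            if counter >= s1 and (s2 is None or counter <= s2):
                if restart and counter == s2:
                    counter = 0         # assumed meaning of the r flag
                return True
        return False

    return in_scope


check = scope_filter(parse_scope("2,5r"))
print([n for n in range(1, 11) if check()])   # [2, 5, 7, 10]
```

       With the scope 2,5r the second and fifth occurrences act as
       delimiters, after which the count starts again, so occurrences 7 and
       10 are the next ones in scope.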

       Option: -R, --recordseparator <string>
       This is a synonym for "-F1 <string>".
       Allows you to specify a record separator string token that is different
       from the default. The default record separator token is NEWLINE
       (new-line).

       Option: -C, --column[:<scope>] <c1>-<c2>[:<name>]
       Use an input column range of the input record to generate an input
       field. This is an alternative to capturing input fields with
       delimiters.
       <scope> - A comma-delimited set of sequence ranges with no spaces.
                 The <scope> parameter has a sub form of <s1>[-<s2>][r][,..]
                 <s1> - integer representing a start range.
                 <s2> - integer or the $ token representing an end range.
                 r    - restart the scope counter for this column option after
                        the completion of the associated range.
                 Restrict the column option effectiveness to the occurrences
                 specified in <scope>. If the input record sequence is not in
                 the scope then the column option will not be applied and
                 input fields will not be generated.
                 The restart scope counter option r allows the scope sequences
                 to continually repeat themselves. E.g. -C:1-3,5r 1-20 means
                 capture column fields of 20 characters in length for every
                 first to third and fifth input record, and keep repeating
                 this until the input is exhausted.
                 If a <scope> range is not specified then the column option
                 applies to all input records.
       <c1> - Integer or the $ token representing the start column of the
              input field.
       <c2> - Integer or the $ token representing the end column of the
              input field.
       <name> - Optional string value that will be used to override the tag
                name for this input field.
                Almost anything can be specified as a tag name, including
                illegal XML, so user discretion is advised.
                Only applicable for changing default behaviour (i.e. when the
                --schema option is NOT specified).
       o Specifying field separators of level 2 and above together with this
         option is conflicting and will produce a usage error.
       o The number of times and order in which this option is specified (in
         conjunction with the -W option) determines the number of input fields
         generated and their order.
       o Column ranges represent code points (characters), meaning any
         multi-byte character accounts for just one column position.
       o Multiple options can use non-linear ranges and can overlap, e.g.
         -C 5-10:part -C 1-$:whole
       o Ranges that exceed the size of the input record will not process
         beyond the end of the input record.
       o You can use single or double quotes to protect the range from the
         shell interpreter, e.g. -C '80-$:text'
       o Only one parameter pair can be specified, however this option can be
         invoked multiple times.
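       The column rules above (1-based inclusive ranges, the $ token, code
       point counting and clipping at the end of the record) can be pictured
       with the following Python sketch. The helper name column_field is
       hypothetical, and this is an illustration of the documented behaviour
       rather than xmlfy's own code.

```python
def column_field(record: str, c1: str, c2: str) -> str:
    """Return columns c1..c2 (1-based, inclusive); '$' means the last column."""
    last = len(record)                       # len() counts code points, not bytes
    start = last if c1 == "$" else int(c1)
    end = last if c2 == "$" else int(c2)
    end = min(end, last)                     # never read beyond the end of the record
    return record[start - 1:end]


record = "ID0042 Zo\u00eb    Copenhagen"
print(column_field(record, "1", "6"))     # ID0042
print(column_field(record, "8", "10"))    # Zoë
print(column_field(record, "15", "$"))    # Copenhagen
```

       The accented character occupies a single column even though it takes
       two bytes in UTF-8, which matches the code point rule stated above.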

       Option: -W, --regex[:<scope>]
                   [E|B][i][l][r][U][n][b][e]/<pattern>/[<name>[,..]]
       Use a regular expression on the input record to generate input fields.
       This is an alternative to capturing input fields with delimiters.
       <scope> - A comma-delimited set of sequence ranges with no spaces.
                 The <scope> parameter has a sub form of <s1>[-<s2>][r][,..]
                 <s1> - integer representing a start range.
                 <s2> - integer or the $ token representing an end range.
                 r    - restart the scope counter for this regex option after
                        the completion of the associated range.
                 Restrict the regex option effectiveness to the occurrences
                 specified in <scope>. If the input record sequence is not in
                 the scope then the regex option will not be applied and input
                 fields will not be generated.
                 The restart scope counter option r allows the scope sequences
                 to continually repeat themselves. E.g. -W:1-3,5r
                 /(^A.*).*(B.*$)/ means capture two regex fields for every
                 first to third and fifth input record, and keep repeating
                 this until the input is exhausted.
                 If a <scope> range is not specified then the regex option
                 applies to all input records.
       E - flag to use Extended Regular Expressions in <pattern> (default).
       B - flag to use Basic Regular Expressions in <pattern>.
       i - flag to ignore case.
       l - flag to treat <pattern> as a literal.
       r - flag to make concatenation right associative.
       U - flag to make operators ungreedy by default.
       n - flag to give '\n' special meaning (REG_NEWLINE).
       b - flag to set '^' as not beginning-of-line (REG_NOTBOL).
       e - flag to set '$' as not end-of-line (REG_NOTEOL).
       <pattern> - A POSIX 1003.2 compliant Regular Expression pattern
                   utilising zero or more parenthesis pairs to capture input
                   fields.
       <name> - Optional string value that will be used to override the tag
                name for input fields derived from pattern matches.
                A comma-separated list of <name> can be specified with the
                last entry being re-used if more input fields than names are
                generated.
                Almost anything can be specified as a tag name, including
                illegal XML, so user discretion is advised.
                Only applicable for changing default behaviour (i.e. when the
                --schema option is NOT specified).
       o Specifying field separators of level 2 and above together with this
         option is conflicting and will produce a usage error.
       o The number of times and order in which this option is specified (in
         conjunction with the -C option) determines the number of input fields
         generated and their order.
       o If matches are not made for all parenthesis pairs specified in
         <pattern> then no output will result.
       o If no parenthesis pairs are specified in <pattern> then the entire
         input record will be used as the output when a pattern match occurs.
       o Wide UTF encoding can be specified in <pattern> by using the \x
         literal followed by two hexadecimal digits to represent any byte
         inside the code-point, e.g. \x0b.
       o For further information on using regex syntax and its flags please
         consult the TRE web documentation.
       o You can use single or double quotes to protect <pattern> from the
         shell interpreter, e.g. -W 'iU/(^Pam .*)/pams'
       o You can specify the percentage character % as an alternative
         separator to forward-slash / for <pattern> so long as it remains
         paired.
       o Only one parameter pair can be specified, however this option can be
         invoked multiple times.
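       The capture rules listed above (one field per parenthesis pair, the
       whole record when there are no pairs, no output when a pair fails to
       match, and re-use of the last <name>) are sketched below in Python.
       Note the assumptions: Python's re module stands in for the POSIX/TRE
       engine and its syntax differs in places, only the i flag is modelled,
       and regex_fields is a hypothetical helper name.

```python
import re


def regex_fields(record, pattern, names=(), ignore_case=False):
    """Return (tag, value) pairs for a record, or [] when it yields no output."""
    match = re.search(pattern, record, re.IGNORECASE if ignore_case else 0)
    if match is None:
        return []
    # No parenthesis pairs: the entire record becomes the single field.
    values = list(match.groups()) if match.re.groups else [record]
    if any(value is None for value in values):   # a pair did not take part
        return []
    tags = []
    for i in range(len(values)):
        if names:
            tags.append(names[min(i, len(names) - 1)])   # re-use the last name
        else:
            tags.append("field")                         # default field tag
    return list(zip(tags, values))


print(regex_fields("Pam Smith 1984", r"^(\w+) (\w+) (\d+)$",
                   names=("first", "rest")))
# [('first', 'Pam'), ('rest', 'Smith'), ('rest', '1984')]
```

       Three fields are captured but only two names are supplied, so the last
       name is applied to the remaining field.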

       Option: -e, --expelempty
       Expel input fields that are empty (zero bytes in length) from being
       processed. The use of multi-level and multiple same-level delimiters
       can sometimes yield many empty fields, which may be undesirable.
       This option expels all the empty input fields from being processed by
       the output processor. All levels are examined and any input records
       comprised entirely of empty fields are also expelled.
       This option will always run before any expelling tasks specified with
       option -E are run.
       This option has no influence on levels subjected to key/value pairing
       as that process has its own way of dealing with empty fields at its
       target levels.
       If a schema is used then the number of input records/fields available
       for element matching is reduced accordingly.
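       As a rough model of the description above, the Python sketch below
       treats the parsed input as nested lists (records containing fields
       containing subfields) and prunes empty entries at every level. The
       nested-list representation and the name expel_empty are assumptions
       made for illustration only.

```python
def expel_empty(node):
    """Recursively drop empty strings and any lists left with nothing in them."""
    if isinstance(node, str):
        return node
    children = [expel_empty(child) for child in node]
    return [child for child in children if child != "" and child != []]


rows = [line.split(",") for line in ["a,,b", ",,", "c"]]
print(expel_empty(rows))   # [['a', 'b'], ['c']]
```

       The second record consists solely of empty fields, so it disappears
       together with them.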

       Option: -E, --expel <input_records>[:<input_fields>]
       Expel selected input records or selected input fields of selected input
       records from being processed. Each input record is checked against the
       expel criteria and if a match occurs then these input records or input
       fields are discarded instead of being passed on to the xmlfy output
       processor.
       <input_records> - A comma-delimited set of input record expel criteria
                         with no spaces.
                         The <input_records> parameter has a sub form of
                         <range_type><r1>[-<r2>][/<string>/][,..]
                         Where <range_type> can be 'n', 'f' or 'c'.
                         n - the associated range refers to input record
                             numbers.
                         f - the associated range refers to input field
                             numbers.
                         c - the associated range refers to input record
                             character lengths.
                         <r1> - integer representing a start range.
                         <r2> - integer or the $ token representing an end
                                range.
                         <string> - the specified <string> must also exist
                                    within the range.
                                    Expel criteria types can be intermixed.
                                    E.g. -E
                                    n10-$,f7-8,f4/Mercedes/,c10-20,c1-15/SUV/
                                    means that input records whose record
                                    number is greater than or equal to 10,
                                    AND whose total number of fields is
                                    between 7 and 8, AND whose 4th input
                                    field contains the string "Mercedes",
                                    AND whose input record length is greater
                                    than or equal to 10 but less than or
                                    equal to 20 characters, AND whose first
                                    15 characters contain the string "SUV",
                                    match the input record expel criteria.
                                    In this release you can only specify the
                                    $ token (last input record) in a paired
                                    range and not on its own.
                                    Generally xmlfy can figure out where the
                                    search string delimiters would likely
                                    occur, however you can specify the %
                                    character as an alternative separator to
                                    / for <string> so long as it remains
                                    paired.
                                    If an <input_fields> criteria is not
                                    specified then the entire input record
                                    is expelled.
       <input_fields> - A comma-delimited set of field number ranges with no
                        spaces.
                        The <input_fields> parameter has a sub form of
                        <r1>[-<r2>][,..]
                        <r1> - integer or the $ token representing a start
                               range.
                        <r2> - integer or the $ token representing an end
                               range.
                        Discard selected input fields of the input records
                        that match the expel criteria before passing them on
                        to the xmlfy output processor.
                        E.g. -E n2-$:1,$ means that input records whose
                        record number is greater than or equal to 2 will have
                        their first and last fields expelled.
                        You can specify the $ token (last input field) in a
                        paired range or on its own.
       o You can use single or double quotes to protect the range from the
         shell interpreter, e.g. -E 'n2-$:$'
       o If a schema is used then the number of input records/fields available
         for element matching is reduced accordingly.
       o Only one parameter group can be specified, however this option can be
         invoked multiple times with resolution occurring from left to right.
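       The way <input_records> criteria combine can be sketched in Python as
       follows. This is an interpretation of the text above and not xmlfy's
       parser: parse_criteria and record_matches are hypothetical names, an f
       criterion with a <string> is assumed to mean "some field in the range
       contains the string", and a <string> attached to an n criterion is
       ignored.

```python
import re

# <range_type><r1>[-<r2>][/<string>/] with / or % as the paired separator
CRITERION = re.compile(r"([nfc])(\d+)(?:-(\d+|\$))?(?:([/%])(.*?)\4)?(?:,|$)")


def parse_criteria(spec):
    """Parse an <input_records> string into (type, start, end, string) tuples."""
    parsed = []
    for kind, r1, r2, _sep, text in CRITERION.findall(spec):
        start = int(r1)
        end = None if r2 == "$" else (int(r2) if r2 else start)
        parsed.append((kind, start, end, text or None))
    return parsed


def in_range(value, start, end):
    return value >= start and (end is None or value <= end)


def record_matches(criteria, number, fields, raw):
    """True when the record satisfies every criterion (they are ANDed)."""
    for kind, start, end, text in criteria:
        if kind == "n":
            ok = in_range(number, start, end)             # record number
        elif kind == "f" and text is None:
            ok = in_range(len(fields), start, end)        # total field count
        elif kind == "f":
            stop = len(fields) if end is None else end
            ok = any(text in field for field in fields[start - 1:stop])
        elif text is None:
            ok = in_range(len(raw), start, end)           # record length
        else:
            stop = len(raw) if end is None else end
            ok = text in raw[start - 1:stop]              # string within columns
        if not ok:
            return False
    return True


criteria = parse_criteria("n2-$,f3/Mercedes/,c1-8/SUV/")
print(record_matches(criteria, 5, ["2019", "SUV", "Mercedes"],
                     "2019 SUV Mercedes"))                # True
```

       A record that passes every test would be expelled; failing any single
       test keeps it in the output.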

       Option: -q, --quotedfields[2]
       Treat fields that are quoted as one field. Normally xmlfy will parse
       fields by their delimiter, e.g. WHITESPACE; this option allows
       multi-delimited fields to be specified as one by quoting them. By
       default the quoted field may only span the current input record unless
       the -q2 option is specified, in which case the quoted field can span
       multiple input records.
       Quotes are not included in the field and any leading/trailing text
       outside the field's quotes is truncated.
       If quotes are not closed xmlfy will extend the field until the end of
       the input record, or if option -q2 is specified, until the input is
       exhausted (EOF).
       The default quote character is a double quote (").
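       A minimal Python sketch of single-record quoting follows. It models
       only what is stated above (quotes removed, text outside the quotes
       truncated, an unclosed quote running to the end of the record); the
       -q2 mode is not modelled, the handling of two quoted runs inside one
       field is an assumption, and split_quoted is a hypothetical name.

```python
def split_quoted(record, quote='"'):
    """Split on white space, keeping quoted text together as one field."""
    fields = []
    outside, inside = [], []        # text outside / inside quotes for this field
    in_quotes = had_quotes = False

    def flush():
        nonlocal outside, inside, had_quotes
        if had_quotes:
            fields.append("".join(inside))     # text outside the quotes is dropped
        elif outside:
            fields.append("".join(outside))
        outside, inside, had_quotes = [], [], False

    for ch in record:
        if ch == quote:
            in_quotes = not in_quotes          # the quote itself is not kept
            had_quotes = True
        elif in_quotes:
            inside.append(ch)
        elif ch.isspace():
            flush()                            # delimiter outside quotes ends a field
        else:
            outside.append(ch)
    flush()                                    # an unclosed quote ends with the record
    return fields


print(split_quoted('one "two three" four'))   # ['one', 'two three', 'four']
```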

       Option: -Q, --quotechars[2] <string>
       Specify a string of characters that can be used as quoting characters.
       <string> - an array of quoting characters.
       o If field quoting is enabled then any input character that matches any
         character in <string> will toggle the quoting function, unless the
         -Q2 option is specified, in which case characters in <string>
         represent paired quotes, with odd-numbered characters in this array
         toggling the open quote function and their corresponding pair
         toggling the close quote function. This allows parentheses, brackets,
         etc. to be used as quotes.
       o When specifying this option care must be taken to prevent the shell
         from interpreting the supplied quote characters. When using a schema
         file containing this option you can specify quote characters by
         escaping them with the backslash "\" character.

       Option: -b, --blanklines
       Normally xmlfy ignores blank lines or empty level 1 records in the
       input stream. This option tells xmlfy not to ignore these blank lines
       and to print out XML line record tags with no elements.
       In this mode blank lines count towards line numbers.

       Option: -t, --trim
       Field elements are trimmed of leading and trailing white space.

       Output options:

       Option: -S, --schema <file>
               -Sd, --schemadtd <file>
               -Sr, --schemarnc <file>
               -Sx, --schemaxsd <file>
       Specify a schema <file> for controlling the XML output.
       <file> - The schema file must comply with either the Document Type
                Definition (.dtd) language, or the RELAX NG Compact (.rnc)
                language, or the XML Schema Definition (.xsd) language,
                however xmlfy does not support the finer aspects of these
                schema languages at this early stage.
       o When all input fields of the input record have been identified, xmlfy
         will match them against the elements inside the tree hierarchy of the
         schema file, and if a match is found then xmlfy will print an output
         record using the matching schema tree hierarchy as its XML structure.
         Option -S, --schema uses the case-insensitive file name extension
         (.dtd or .rnc or .xsd) of <file> to determine which schema
         interpreter xmlfy will apply.
         Option -Sd, --schemadtd forces xmlfy to use the DTD schema
         interpreter on <file>.
         Option -Sr, --schemarnc forces xmlfy to use the RNC schema
         interpreter on <file>.
         Option -Sx, --schemaxsd forces xmlfy to use the XSD schema
         interpreter on <file>.
       o You can specify multi-level delimiters when using this option,
         however any delimiters greater than level 2 are only used to identify
         more input fields and are not used at all in altering the XML tree
         hierarchy, which is dictated by the schema file. Fields with levels
         of 2 and above are flattened to be just plain fields of the input
         record. This is very different to the default behaviour where field
         levels form the XML tree hierarchy.
       o If a schema option is not supplied then xmlfy will use default values
         for tag names and element control.
       o For further information on how to write a schema for xmlfy please
         consult the web documentation.
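       The choice of schema interpreter described above amounts to a lookup
       on the file name extension unless one of the forcing options is given.
       The Python sketch below illustrates that rule; schema_interpreter is a
       hypothetical name, and raising an error for an unrecognised extension
       is an assumption, since this page does not say what xmlfy does in that
       case.

```python
import os

INTERPRETERS = {".dtd": "DTD", ".rnc": "RNC", ".xsd": "XSD"}


def schema_interpreter(path, forced=None):
    """Pick an interpreter: forced (as with -Sd/-Sr/-Sx), else by extension."""
    if forced is not None:
        return forced
    extension = os.path.splitext(path)[1].lower()   # case-insensitive match
    if extension not in INTERPRETERS:
        # Assumed behaviour: refuse rather than guess.
        raise ValueError(f"cannot infer schema type from {path!r}")
    return INTERPRETERS[extension]


print(schema_interpreter("cars.XSD"))                 # XSD
print(schema_interpreter("cars.txt", forced="DTD"))   # DTD
```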
516
517       Option: -M, --matchdirect 0|<elementname>
518       Match directly on a specific element in the schema making it  the  root
519       element.
520       0             - A  token  representing  the default root element in the
521                       schema.
522       <elementname> - The name of a record element in the schema.
523       o This option alters the way the selected  schema  element  is  matched
524         against  the available input fields that were generated. In this mode
525         the target element is matched  in  its  entirety  using  its  element
526         helper  and  printed  accordingly.   This  is  very  different to the
527         default legacy mode whereby only the record elements of the root ele‐
528         ment get matched in a continuously sequential way.
529       o Regardless  of  what wildcard attributes exist for the target element
530         it will only be printed once as a root element.
531       o If a schema file is not specified then this option will be ignored.
532
533       Option: -A, --attribute[<level>[:<scope>]] number|level
534                       |delimiter|timestamp|insert <name> <value>
535       Include attributes in the opening element tag for the level specified.
536       <level> - The XML depth level to be modified.
537                 Must be an integer value greater than or equal to 0.
538                 E.g. a value of 1  will  apply  attributes  to  each  opening
539                 record element and a value of 2 will apply attributes to each
540                 opening field element.
541                 There is no space separating the option and the level value.
542                 If no level is specified then the given options will apply to
543                 all levels except level 0.
544       <scope> - A comma delimited set of sequence ranges with no spaces.
545                 The <scope> parameter has a sub form of <s1>[-<s2>][r][,..]
546                 <s1> - integer representing a start range.
547                 <s2> - integer or the $ token representing an end range.
548                 r    - restart the scope counter for this attribute after the
549                        completion of the associated range.
550                 Restrict the custom attribute  effectiveness  to  the  occur‐
551                 rences  specified in <scope>.  If the element sequence is not
552                 in the scope then the custom attribute will not be applied.
553                 The restart scope counter option r allows the scope sequences
554                 to continually repeat themselves. E.g. -A2:1-3,5r insert x y
555                 inserts the custom attribute x="y" into the first to third and
556                 the fifth level 2 elements, and keeps repeating this pattern
557                 until the output is exhausted.
558                 Scope sequence counters are always reset to zero for the next
559                 element  depth  level  and higher whenever a deeper XML depth
560                 level is entered into.
561                 If a <scope> range is not specified then the custom attribute
562                 function applies to all elements at the specified <level>.
563       number - Specify the sequence number as an element attribute.
564                E.g.  <field>  becomes <field number="1"> and the next <field>
565                becomes <field number="2"> and so on.
566                Scoping is not supported.
567                Not supported for level 0.
568       level - Specify the level as an element attribute.
569               E.g. <field> becomes <field level="2">
570               Scoping is not supported.
571               Not supported for level 0.
572       delimiter - Specify the matching delimiter as an element attribute.
573                   E.g. <field> becomes <field delimiter="ABC">
574                   Delimiter string tokens that contain illegal XML characters
575                   are printed as their hex pair equivalent.
576                   When  using  a  schema  file only level 1 records and field
577                   elements will have their delimiter attributes printed.
578                   Scoping is not supported.
579                   Not supported for level 0.
580       timestamp - Include a timestamp as an element attribute.
581                   Two timestamps are provided, one for  humans  and  one  for
582                   machines. The times are stamped at element print time.
583                   E.g.  <field>  becomes  <field  timestamp_date="Fri  May  5
584                   10:23:33 2008" timestamp_sec="123456790">
585                   Scoping is not supported.
586       insert <name> <value> - Insert a custom element attribute.
587                               The parameters <name> and <value> are  combined
588                               to  form  an  element  attribute  with  <value>
589                               wrapped in double quotes.
590                               E.g. <field> becomes <field name="value">
591                               You can pretty  much  specify  anything  as  an
592                               attribute name and value, including illegal XML,
593                               therefore user discretion is advised.
594       o Only one parameter group can be specified, however this option can be
595         invoked multiple times.
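
       The <scope> syntax <s1>[-<s2>][r][,..] used by -A and -T can be
       modelled with a short Python sketch. This is only an illustration of
       one reading of the description above, not xmlfy's own code: the
       function names are hypothetical, and it assumes that r resets the
       sequence counter to zero once the last element of its range has been
       seen.

```python
import re


def parse_scope(scope):
    """Parse '<s1>[-<s2>][r][,..]' into (start, end, restart) tuples.

    end is None when the '$' token (open-ended range) is used.
    """
    ranges = []
    for part in scope.split(","):
        m = re.fullmatch(r"(\d+)(?:-(\d+|\$))?(r?)", part)
        if m is None:
            raise ValueError(f"bad scope range: {part!r}")
        start = int(m.group(1))
        if m.group(2) is None:
            end = start            # single occurrence, e.g. "5"
        elif m.group(2) == "$":
            end = None             # runs to the end of the output
        else:
            end = int(m.group(2))
        ranges.append((start, end, m.group(3) == "r"))
    return ranges


def in_scope_flags(scope, count):
    """Return one boolean per element: is the element inside the scope?"""
    ranges = parse_scope(scope)
    flags = []
    counter = 0
    for _ in range(count):
        counter += 1
        hit = False
        restart = False
        for start, end, r in ranges:
            if counter >= start and (end is None or counter <= end):
                hit = True
                # Assumption: 'r' restarts counting after its range ends.
                if r and end is not None and counter == end:
                    restart = True
        flags.append(hit)
        if restart:
            counter = 0
    return flags
```

       Under these assumptions, in_scope_flags("1-3,5r", 10) marks elements
       1-3 and 5, then 6-8 and 10, which matches the "first to third and
       fifth, repeating" wording of the -A2:1-3,5r example.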
597
598       Option: -T, --tag[<level>[:<scope>]] number|level
599                       |name <name>
600                       |[re]insert <name> <value>
601                       |[re]insertfile <name> <file>
602                       |[re]insertfilexml <indent> <file>
603       Modify or insert element tags for the level specified.
604       <level> - The XML depth level to be modified.
605                 Must be an integer value greater than or equal to 0.
606                 E.g.  a  value  of 1 will modify the tag name for each record
607                 and a value of 2 will modify the tag name for each field.
608                 There is no space separating the option and the level value.
609                 If no level is specified then the given options will apply to
610                 all levels except level 0.
611       <scope> - A comma delimited set of sequence ranges with no spaces.
612                 The <scope> parameter has a sub form of <s1>[-<s2>][r][,..]
613                 <s1> - integer representing a start range.
614                 <s2> - integer or the $ token representing an end range.
615                 r    - restart  the scope counter for this tag after the com‐
616                        pletion of the associated range.
617                 Restrict the custom  tag  effectiveness  to  the  occurrences
618                 specified  in <scope>.  If the element sequence is not in the
619                 scope then the custom tag will not be applied.
620                 The restart scope counter option r allows the scope sequences
621                 to continually repeat themselves. E.g. -T2:1-3,5r insert x y
622                 inserts the custom tag <x>y</x> before the first to third and
623                 the fifth level 2 elements, and keeps repeating this pattern
624                 until the output is exhausted.
625                 Scope sequence counters are always reset to zero for the next
626                 element  depth  level  and higher whenever a deeper XML depth
627                 level is entered into.
628                 If a <scope> range is not specified then the custom tag func‐
629                 tion applies to all elements at the specified <level>.
630       number - Suffix the tag name with its sequence number.
631                E.g.  <line>  becomes  <line1>  and  the  next  <line> becomes
632                <line2> and so on.
633                Scoping is not supported.
634                Not supported for level 0.
635       level - Prefix the tag name with its level.
636               E.g. <field> becomes <L2field>
637               Scoping is not supported.
638               Not supported for level 0.
639       name <name>    - Change the tag name from the default to <name>
640                        Only applicable for changing default  behaviour  (i.e.
641                        when the --schema option is NOT specified).
642                        E.g. <field> becomes <word>
643                        You  can  pretty  much  specify anything as a tag name
644                        including illegal XML  therefore  user  discretion  is
645                        advised.
646                        Scoping is not supported.
647       [re]insert <name> <value> - Insert a custom element tag.
648                                   The  parameters <name> and <value> are com‐
649                                   bined to form an element tag  with  <value>
650                                   wrapped   between  <name>  tag  pairs.  E.g
651                                   <name>value</name>
652                                   The inserted  element  appears  before  any
653                                   output elements for the level specified.
654                                   The  reinsert feature keeps applying itself
655                                   at the level specified.
656                                   You can pretty much specify anything as  an
657                                   element  name  and  value including illegal
658                                   XML therefore user discretion is advised.
659                                   Not supported for level 0.
660       [re]insertfile <name> <file> - Insert a custom element tag containing
661                                      the contents of a file.
664                                      The   contents  of  <file>  are  wrapped
665                                      between <name> tag pairs.
666                                      The encoding of <file>  must  match  the
667                                      output  encoding being used otherwise an
668                                      undesirable output will result.
669                                      Any BOM found in <file> is removed.
670                                      Any reserved XML  characters  in  <file>
671                                      are escaped, and newlines are corrected.
672                                      The  inserted element appears before any
673                                      output elements for the level specified.
674                                      The  reinsert  feature  keeps   applying
675                                      itself at the level specified.
676                                      You  can pretty much specify anything as
677                                      an element name  including  illegal  XML
678                                      therefore user discretion is advised.
679                                      Not supported for level 0.
680       [re]insertfilexml <indent> <file> - Insert contents of an XML file.
681                                           The  entire  contents of <file> are
682                                           inserted before any output elements
683                                           for the level specified.
684                                           The  encoding  of <file> must match
685                                           the output encoding being used oth‐
686                                           erwise  an  undesirable output will
687                                           result.
688                                           Any BOM found in <file> is removed.
689                                           If the  parameter  <indent>  is  an
690                                           integer value greater than or equal
691                                           to zero then the contents  of  file
692                                           are  indented  by  this amount, any
693                                           XML prologue is removed,  and  new‐
694                                           lines are corrected.
695                                           If  the  parameter  <indent> is the
696                                           value "raw" then the  XML  file  is
697                                           inserted as is without its BOM.
698                                           The reinsert feature keeps applying
699                                           itself at the level specified.
700                                           You can pretty much insert anything
701                                           as XML file content including ille‐
702                                           gal XML therefore  user  discretion
703                                           is advised.
704       o Only  one parameter group can be specified however this option can be
705         invoked multiple times.
706
707       Option: -k, --keyvaluepairs[<level>]
708       Switch on the generation of key/value XML tag pairs for the output.
709       <level> - The XML depth level to be modified.
710                 Must be an integer value greater than or equal to 2.
711                 There is no space separating the option and the level value.
712                 If no level is specified then the option will  apply  to  all
713                 levels except levels 0 and 1.
714       o In  this  mode  the  data of the first field of the current XML level
715         becomes the tag name for that level, that is, it becomes the key, and
716         any subsequent fields become its value.
717       o This  key/value pairing continues down the XML tree hierarchy for all
718         the XML levels specified.
719       o You can pretty much generate anything as a tag name including illegal
720         XML therefore user discretion is advised. The new tag name is trimmed
721         of leading and trailing white space and white space between  text  is
722         replaced with the underscore "_" character.
723       o If a blank field becomes a tag name candidate then xmlfy will skip it
724         and search along the same level for a more suitable  candidate.  This
725         behaviour  can  be  mitigated by using the -b option which will force
726         the default tag name to be substituted instead.
727       o Only  applicable  for  changing  default  behaviour  (i.e.  when  the
728         --schema option is NOT specified).
729       o This option can be invoked multiple times.
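
       The tag name clean-up and blank-key skipping described for -k can be
       sketched in Python as follows. The helper names are hypothetical and
       the sketch makes two assumptions the text does not confirm: a run of
       inner white space becomes a single underscore, and blank fields that
       are skipped while searching for a key are simply dropped.

```python
import re


def key_to_tag_name(field):
    """Trim the field and replace inner white space with underscores."""
    # Assumption: each run of white space collapses to one "_".
    return re.sub(r"\s+", "_", field.strip())


def split_key_value(fields):
    """Pick the first non-blank field as the key; the rest are values."""
    for i, field in enumerate(fields):
        if field.strip():
            return key_to_tag_name(field), fields[i + 1:]
    return None, []  # no suitable key candidate on this level
```

       For example, split_key_value(["", "disk usage", "42", "GB"]) yields
       the tag name disk_usage with the values 42 and GB.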
730
731       Option: -l, --linenumbers
732       This is a synonym for "-T1 number".
733       Include the line number in the line tag name.
734
735       Option: -f, --fieldnumbers
736       This is a synonym for "-T2 number".
737       Include the field number in the field tag name.
738
739       Option: -L, --linetags
740       Insert a line number tag within the XML formatted output.
741       This  is an alternative way of numbering your XML records. E.g. for the
742       first line record of XML output the following tag is inserted <linenum‐
743       ber>1</linenumber> and so on.
744
745       Option: -X, --xmlformat [XML1.0|XML1.1]|[SOAP1.1|SOAP1.2]
746               |[HTML table|list]
747               |[UTF-8|UTF-16|UTF-16BE|UTF-16LE|UTF-32|UTF-32BE|UTF-32LE]|BOM
748               |ASCIItoUTF|[noescape all|amp|lt|gt|quot|apos|brvbar]
749               |trimtagclose|[newline dos|unix]
750       Allows you to specify the XML format to be used for the output.
751       XML1.0 - Generate XML 1.0 output (this is the default).
752       XML1.1 - Generate XML 1.1 output.
753       SOAP1.1 - Generate XML SOAP 1.1 output.
754       SOAP1.2 - Generate XML SOAP 1.2 output.
755       HTML    - Generate HTML output.
756                 table - elements are displayed in table format.
757                 list - elements are displayed in list format.
758       UTF-8 - Generate UTF-8 output encoding (default).
759       UTF-16 - Generate UTF-16 output encoding.
760       UTF-16BE - Generate UTF-16BE (big-endian) output encoding.
761       UTF-16LE - Generate UTF-16LE (little-endian) output encoding.
762       UTF-32 - Generate UTF-32 output encoding.
763       UTF-32BE - Generate UTF-32BE (big-endian) output encoding.
764       UTF-32LE - Generate UTF-32LE (little-endian) output encoding.
765       BOM - Generate and interpret a Byte-Order-Mark.
766       ASCIItoUTF - Convert ASCII input to wide UTF encoding.
767       noescape - Do not escape select reserved XML  characters.   By  default
768                  xmlfy will escape reserved XML characters that appear in the
769                  input stream and this option provides an adjustment to  this
770                  behaviour.
771                  all - do not escape any characters.
772                  amp - do not escape the character & (ampersand).
773                  lt - do not escape the character < (less-than).
774                  gt - do not escape the character > (greater-than).
775                  quot - do not escape the character " (quote).
776                  apos - do not escape the character ' (apostrophe).
777                  brvbar - do  not  escape  the  character  | (broken vertical
778                           bar).
779       trimtagclose - Truncate superfluous characters  from  the  closing  tag
780                      name.
781       newline - Select the line ending format for XML meta-data.
782                 dos - use carriage-return and new-line ("\r\n") for line end‐
783                       ings.
784                 unix - use new-line ("\n") for line endings.
785       o The XML1.1 option only changes the prologue version string to "1.1"
786         and has no other effect.
787       o When  using  the  SOAP*  options,  the normal XML output generated by
788         xmlfy is encapsulated in a SOAP Envelope and SOAP Body, the root  tag
789         defines  a  namespace  prefix of "x" with a URI reference that can be
790         adjusted with the -I option, and all children elements  (records  and
791         fields) use this prefix name.
792         A  non-mandatory  administrative header element with a prefix name of
793         "xh" is provided containing program and execution details.
794         The SOAP* options are only a basic implementation  for  generating  a
795         simple  XML  SOAP envelope containing xmlfy data. There is no further
796         scope provided for SOAP Headers, SOAP Faults, transaction or protocol
797         handling.
798       o When  using the HTML option, the normal XML output generated by xmlfy
799         is displayed in either a table or list layout and encapsulated  in  a
800         HTML  Body,  of  which the document title can be adjusted with the -I
801         option.
802       o The UTF-* options tell xmlfy to use the specified  encoding  for  all
803         its XML meta-data (element tags, element attributes, prologues, etc).
804         Other than the ASCIItoUTF option,  no  transformation  of  the  input
805         stream  is  performed and xmlfy assumes that the encoding used by the
806         input stream matches the encoding specified, otherwise an undesirable
807         output  will  result containing different encodings between the input
808         data and XML meta-data.
809         If specifying the UTF-16 or UTF-32 parameter and the  BOM  option  is
810         either  not  specified  or  there  is no BOM in the input stream then
811         encoding in big-endian format will be assumed.
812       o The BOM (Byte-Order-Mark) option will force xmlfy to handle  the  BOM
813         in  the  input  stream if it is there, and also generate a BOM in the
814         output stream. If the BOM option is specified and a BOM is found in
815         the input stream then that BOM will override any user specified
816         encoding option.
817         The BOM is the character U+FEFF written in the byte order of the
818         encoding. The byte sequences are:
819             UTF-8:    0xef 0xbb 0xbf
820             UTF-16BE: 0xfe 0xff
821             UTF-16LE: 0xff 0xfe
822             UTF-32BE: 0x00 0x00 0xfe 0xff
823             UTF-32LE: 0xff 0xfe 0x00 0x00
824       o The ASCIItoUTF option when used in conjunction with one of the  UTF-*
825         options  will  process  ASCII  input  and  convert it to the wide UTF
826         encoding specified.
827       o The noescape options control which reserved XML characters should not
828         be escaped.
829       o The  trimtagclose  option  trims  back the closing tag from the first
830         white space character found. Some options allow the  user  to  define
831         anything  as  a  tag  name  including  tag  names  that  have element
832         attributes (non normal approach).  Using this option under these cir‐
833         cumstances  will  prevent  these element attributes from appearing in
834         the close tag.
835       o The newline option adjusts  the  line  ending  format  used  for  XML
836         meta-data.  On  Unix platforms the default is unix and on Win32 plat‐
837         forms the default is dos. Only applies to XML  meta-data  output  and
838         does  not  do  conversion  of  newline  characters found in the input
839         stream.
840       o Only one parameter group can be specified however this option can  be
841         invoked multiple times.
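
       The BOM byte sequences listed for the BOM option can be recognised
       with a few lines of Python. This is a minimal sketch using the
       constants from the standard codecs module, not a description of how
       xmlfy itself detects a BOM; detect_bom is a hypothetical name. The
       UTF-32LE sequence must be tested before UTF-16LE because it begins
       with the same two bytes.

```python
import codecs

# Longest sequences first: the UTF-32LE BOM starts with the UTF-16LE BOM.
BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),  # 00 00 fe ff
    (codecs.BOM_UTF32_LE, "UTF-32LE"),  # ff fe 00 00
    (codecs.BOM_UTF8, "UTF-8"),         # ef bb bf
    (codecs.BOM_UTF16_BE, "UTF-16BE"),  # fe ff
    (codecs.BOM_UTF16_LE, "UTF-16LE"),  # ff fe
]


def detect_bom(data):
    """Return (encoding name, BOM length) or (None, 0) if no BOM leads."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

       A UTF-16LE stream whose first character after the BOM is U+0000 would
       be reported as UTF-32LE by this sketch; that ambiguity is inherent in
       the byte sequences themselves.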
842
843       Option: -p, --printonly header|footer|rtagopen|rtagclose|records
844       Allows you to just print XML snippets to the output.
845       This  is  useful  when you want to execute xmlfy multiple times to con‐
846       struct a single XML output file.
847       header - Will only print the prologue, doctype,  opened  SOAP  Envelope
848                and Body tags, the SOAP Header tag, HTML headers, and the BOM.
849       footer - Will only print closed SOAP Envelope and Body tags, and closed
850                HTML tags.
851       rtagopen - Will only print an opened root element tag.
852       rtagclose - Will only print a closed root element tag.
853       records - Will only print record elements and their field elements.
854       o Only one parameter can  be  specified  however  this  option  can  be
855         invoked multiple times.
856
857       Option: -I, --identifier <system_identifier>
858       Allows  you to specify your own system identifier of the doctype should
859       you not be content with what xmlfy has specified.
860       system_identifier - An array of characters used to override the default
861                           system identifier.
862                           You  can  pretty  much specify anything as a system
863                           identifier including  illegal  XML  therefore  user
864                           discretion is advised.
865       o By  default xmlfy will use the string "xmlfy.dtd", or if specifying a
866         schema, use the schema filename as the system identifier.
867       o You can also use this option to override the default SOAP  namespace
868         URI  value  for  the  root  element  when  using  the XML SOAP format
869         options.
870       o You can also use this option to override the document  title  in  the
871         HTML header when using the XML HTML format options.
872
873       Option: -s, --summary[2|c|n|f <file>]
874       When  all  input  is exhausted an XML summary element is printed at the
875       bottom providing a brief summary of what xmlfy processed.
876       2        - Print the summary element to stderr instead.
877       c        - Print the summary element as an XML comment.
878       n        - Print the summary element without  calculating  any  message
879                  digests.
880       f <file> - Print the summary element to <file>.
881       By  default  MD5  and  SHA512 checksum elements are provided inside the
882       summary called md5_input, md5_output, sha512_input  and  sha512_output.
883       The  md5_input and sha512_input checksums are a digest of all the input
884       that was actually processed including any input BOM. The md5_output and
885       sha512_output  checksums  are  a digest of all the output including any
886       output BOM that precedes the XML summary element. To correctly validate
887       the output result against the output checksum you must first remove any
888       summary element and summary comments from the output result.
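
       To check an output file against the md5_output and sha512_output
       values, the digests can be recomputed independently. The Python
       sketch below only shows the hashing step with the standard hashlib
       module; stream_digests is a hypothetical helper, and removing the
       summary element and summary comments beforehand is left to the caller
       because their exact layout is not shown here.

```python
import hashlib


def stream_digests(chunks):
    """Return (md5 hex, sha512 hex) over an iterable of byte chunks."""
    md5 = hashlib.md5()
    sha512 = hashlib.sha512()
    for chunk in chunks:
        md5.update(chunk)
        sha512.update(chunk)
    return md5.hexdigest(), sha512.hexdigest()
```

       The chunks must include any output BOM, since the description above
       states that the BOM is part of the digested output.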
889
890       Option: -U, --unxml
891       Read XML formatted input and remove all that bracket  racket  reverting
892       your  XML  document  back to a plain format. Can be used in conjunction
893       with the -F<level> <string> option to specify the delimiter to use  for
894       each  XML  depth level.  Multiple same level -F options are meaningless
895       in this context and delimiters are only inserted if more than one field
896       is  available  to  be  delimited.  Field  separator scoping options are
897       ignored. The default delimiter is a space character for XML depth  lev‐
898       els  of  2  and  above,  and new-line for XML depth levels below 2. Tag
899       names and their attributes are not included in the output, and anything
900       inside XML comments is filtered out. If there is a BOM in the input
901       then xmlfy will use that for the encoding, otherwise  xmlfy  will  look
902       for  the opening XML character sequence of "<?" to determine the encod‐
903       ing being used.  If neither of the previous methods found  the  correct
904       encoding  then  you  can use the -X UTF-* options as a fallback.  Basic
905       quoting options are also supported.  Works best with XML output  gener‐
906       ated  by  xmlfy  but can also be used with caution on other foreign XML
907       documents.
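
       The default flattening performed by -U (a space between fields, a
       new-line between records, comments dropped) can be approximated in
       Python with the standard xml.etree.ElementTree module. This is a
       rough sketch of the described result only; the function name unxml
       and its parameters are hypothetical, and it ignores encodings, BOMs
       and quoting.

```python
import xml.etree.ElementTree as ET


def unxml(xml_text, field_sep=" ", record_sep="\n"):
    """Flatten <root><record><field>..</field></record></root> to text."""
    root = ET.fromstring(xml_text)  # the default parser discards comments
    records = []
    for record in root:
        # Collect the non-blank text of every element below the record.
        values = [t.strip() for t in record.itertext() if t.strip()]
        records.append(field_sep.join(values))
    return record_sep.join(records)
```

       Passing a different field_sep plays the role that -F<level> <string>
       plays in the description above.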
908
909       Option: --noxml
910       Do not XML-fy the input stream but do process it for reserved XML char‐
911       acters  (this  feature  was  initially written for formatting the xmlfy
912       HTML test reports that use wide encodings).  Used in  conjunction  with
913       the  -X options to control the conversion of reserved characters and/or
914       to transform the input stream to wide UTF encodings.
915       E.g. To transform an ASCII input stream to  UTF-16BE  encoding  with  a
916       BOM:
917       xmlfy --noxml -X UTF-16BE -X ASCIItoUTF -X noescape all -X BOM
918       E.g. To just escape select reserved XML characters in an UTF-32LE input
919       stream:
920       xmlfy --noxml -X UTF-32LE -X noescape amp
921
922       Important note on specifying options.
923       The way xmlfy handles options is very straightforward and can be easily
924       confused  if you don't follow the syntax specified for each option. The
925       getopt library has been deliberately avoided to keep xmlfy portable.
926
927       xmlfy first evaluates options supplied on the command line. If a schema
928       file is supplied then xmlfy will also look for options in that file and
929       evaluate them too. See the schema file section below on how to  specify
930       xmlfy options inside a schema file.
931

OUTPUT

933       How it works.
934       The  input  processor  used by xmlfy block reads unprocessed bytes from
935       standard input (stdin) and stores them in an array the size of a  level
936       1  record.  This  level  1  record is then processed for fields and sub
937       fields etc by marking their positions in  this  array.  Dynamic  memory
938       handling is used.
939
940       The  output  processor  used  by xmlfy takes the results from the input
941       processor and re-packages it with  suitably  encoded  XML  syntax.  Any
942       input  characters  that  are  reserved for XML are by default re-repre‐
943       sented in their escaped form.
944           Character & (ampersand) becomes string &amp;
945           Character < (less-than) becomes string &lt;
946           Character > (greater-than) becomes string &gt;
947           Character " (quote) becomes string &quot;
948           Character ' (apostrophe) becomes string &apos;
949           Character | (broken vertical bar) becomes string &brvbar;
950       The output processor then writes processed bytes to a block buffer  for
951       printing to standard output (stdout).
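
       The escaping table above, together with the -X noescape parameters,
       can be sketched in Python as a single-pass character translation.
       The names ESCAPES and escape_reserved are hypothetical, and the
       sketch assumes that the "broken vertical bar" is the character
       U+00A6 rather than the ASCII "|". Translating in one pass avoids
       escaping the ampersands of entities that were just produced.

```python
# Keys mirror the noescape parameter names.
ESCAPES = {
    "amp": ("&", "&amp;"),
    "lt": ("<", "&lt;"),
    "gt": (">", "&gt;"),
    "quot": ('"', "&quot;"),
    "apos": ("'", "&apos;"),
    "brvbar": ("\u00a6", "&brvbar;"),  # assumption: U+00A6, not "|"
}


def escape_reserved(text, noescape=()):
    """Escape reserved characters, skipping any named in noescape."""
    if "all" in noescape:
        return text
    table = {
        ord(char): entity
        for name, (char, entity) in ESCAPES.items()
        if name not in noescape
    }
    return text.translate(table)
```

       For example, escape_reserved("a & b < c", noescape=("amp",)) leaves
       the ampersand alone and escapes only the less-than sign.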
952
953       Using a schema file.
954       The  default schema used by xmlfy is hard coded and can be described as
955       follows:
956       In DTD schema form:
957           <!ELEMENT xmlfy (line*)>
958           <!ELEMENT line (field*)>
959           <!ELEMENT field (#PCDATA)>
960       In RNC schema form:
961           start = xmlfy
962           xmlfy = element xmlfy { line* }
963           line = element line { field* }
964           field = element field { text }
965       In XSD schema form:
966           <xs:schema>
967             <xs:element name="xmlfy">
968               <xs:sequence>
969                 <xs:element name="line" type="lineType" minOccurs="0"  maxOc‐
970           curs="unbounded" />
971               </xs:sequence>
972             </xs:element>
973             <xs:complexType name="lineType">
974               <xs:sequence>
975                 <xs:element  name="field" type="xs:string" minOccurs="0" max‐
976           Occurs="unbounded" />
977               </xs:sequence>
978             </xs:complexType>
979           </xs:schema>
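
       The xmlfy/line/field structure that the default schema describes can
       be pictured with a small Python sketch built on the standard
       xml.etree.ElementTree module. It is not xmlfy's implementation:
       to_default_tree is a hypothetical name, and the sketch assumes that
       records are lines, that fields are separated by white space, and that
       blank records are ignored.

```python
import xml.etree.ElementTree as ET


def to_default_tree(text):
    """Build an <xmlfy> tree with one <line> per record, <field> per value."""
    root = ET.Element("xmlfy")
    for record in text.splitlines():
        if not record.strip():
            continue  # assumption: blank input records are skipped
        line = ET.SubElement(root, "line")
        for value in record.split():
            ET.SubElement(line, "field").text = value
    return root
```

       Serialising the result with ET.tostring(root, encoding="unicode")
       gives a document whose nesting follows the three schema forms above.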
980
981       A schema file for the ls -la command that produces output like this:
982           total 73
983           drwx------+  3 ag None     0 Apr 20 19:36 .
984           -rwxr-xr-x   1 ag None 15639 Apr 20 19:31 a.exe
985           -rwx------+  1 ag None  6354 Apr 20 19:31 xmlfy.c
986           -rwx------+  1 ag None  4901 Apr 19  2008 xmlfy.h
987
988       In DTD schema form will look like this:
989           <!ELEMENT ls (total?), (file*)>
990           <!ELEMENT total (prompt, totalsize)>
991           <!ELEMENT  file  (permission?,  blocks?,  user?,   group?,   size?,
992           date_M?, date_d?, date_ty?, fname)>
993           <!ELEMENT date_ty (date_y)>
994           <!ELEMENT date_ty (date_h, date_m)>
995           <!ELEMENT prompt (#PCDATA)>
996           <!ELEMENT totalsize (#PCDATA)>
997           <!ELEMENT permission (#PCDATA)>
998           <!ELEMENT blocks (#PCDATA)>
999           <!ELEMENT user (#PCDATA)>
1000           <!ELEMENT group (#PCDATA)>
1001           <!ELEMENT size (#PCDATA)>
1002           <!ELEMENT date_y (#PCDATA)>
1003           <!ELEMENT date_M (#PCDATA)>
1004           <!ELEMENT date_d (#PCDATA)>
1005           <!ELEMENT date_h (#PCDATA)>
1006           <!ELEMENT date_m (#PCDATA)>
1007           <!ELEMENT fname (#PCDATA)>
1008
1009       and should be saved to a file as ls.dtd and invoked as:
1010           % ls -la | xmlfy --schema ls.dtd -F3 :
1011
1012       In RNC schema form will look like this:
1013           start = ls
1014           ls = element ls { total? | file* }
1015           total = element total { prompt, totalsize }
1016           file  =  element file { permission?, blocks?, user?, group?, size?,
1017           date_M?, date_d?, date_ty?, fname }
1018           date_ty = element date_ty { date_y }
1019           date_ty |= element date_ty { date_h, date_m }
1020           prompt = element prompt { text }
1021           totalsize = element totalsize { text }
1022           permission = element permission { text }
1023           blocks = element blocks { text }
1024           user = element user { text }
1025           group = element group { text }
1026           size = element size { text }
1027           date_y = element date_y { text }
1028           date_M = element date_M { text }
1029           date_d = element date_d { text }
1030           date_h = element date_h { text }
1031           date_m = element date_m { text }
1032           fname = element fname { text }
1033
1034       and should be saved to a file as ls.rnc and invoked as:
1035           % ls -la | xmlfy --schema ls.rnc -F3 :
1036
1037       In XSD schema form will look like this:
1038           <xs:schema>
1039             <xs:element name="ls" type="lsType" />
1040             <xs:complexType name="lsType">
1041               <xs:sequence>
1042                 <xs:element name="total" type="totalType" minOccurs="0" />
1043                 <xs:element name="file" type="fileType" minOccurs="0"  maxOc‐
1044           curs="unbounded" />
1045               </xs:sequence>
1046             </xs:complexType>
1047             <xs:complexType name="totalType">
1048               <xs:sequence>
1049                 <xs:element name="prompt" type="xs:string" />
1050                 <xs:element name="totalsize" type="xs:string" />
1051               </xs:sequence>
1052             </xs:complexType>
1053             <xs:complexType name="fileType">
1054               <xs:sequence>
1055                 <xs:element  name="permission" type="xs:string" minOccurs="0"
1056           />
1057                 <xs:element name="blocks" type="xs:string" minOccurs="0" />
1058                 <xs:element name="user" type="xs:string" minOccurs="0" />
1059                 <xs:element name="group" type="xs:string" minOccurs="0" />
1060                 <xs:element name="size" type="xs:string" minOccurs="0" />
1061                 <xs:element name="date_M" type="xs:string" minOccurs="0" />
1062                 <xs:element name="date_d" type="xs:string" minOccurs="0" />
1063                 <xs:element name="date_ty" type="datetyType" minOccurs="0" />
1064                 <xs:element name="fname" type="xs:string" />
1065               </xs:sequence>
1066             </xs:complexType>
1067             <xs:complexType name="datetyType">
1068               <xs:choice>
1069                 <xs:element name="date_y" type="xs:string" />
1070                 <xs:sequence>
1071                   <xs:element name="date_h" type="xs:string" />
1072                   <xs:element name="date_m" type="xs:string" />
1073                 </xs:sequence>
1074               </xs:choice>
1075             </xs:complexType>
1076           </xs:schema>
1077
1078       and should be saved to a file as ls.xsd and invoked as:
1079           % ls -la | xmlfy --schema ls.xsd -F3 :
1080
1081       Shoe-horning raw data into a structure defined by a schema is
1082       straightforward when the input fields have a one-to-one relationship
1083       with the fields of the schema elements. However, if wildcard tokens
1084       and/or Boolean logic are employed in the schema, it becomes quite a
1085       challenge, and sometimes impossible, to be deterministic about which
1086       input field belongs to which schema field. Strictly speaking, the main
1087       function of a schema is to ensure that XML is valid, which requires
1088       the XML document to already exist. xmlfy does the reverse: it builds
1089       an XML document on the fly while following the rules described by the
1090       schema. This is still acceptable, and the resulting XML can be
1091       considered both valid and well-formed.
1092
1093       xmlfy employs two techniques to help with this shoe-horning problem.
1094       The first is recognising multiple element definitions that have the
1095       same name. This allows you to capture your schema elements under a
1096       variety of input circumstances without having to create a unique
1097       element for each circumstance, although you can still do that if you
1098       want. The second is auto-generated field match constraint helpers
1099       that assist in matching the input fields to the elements described by
1100       the schema. These helpers improve the speed of xmlfy, particularly
1101       when compound element structures and wildcard tokens are used in the
1102       schema hierarchy. After the schema file is loaded into memory, an
1103       array of helpers is generated for each element. The array describes
1104       all combinations of the schema tree traversal paths that can be taken
1105       and associates each combination with the minimum, maximum and last
1106       number of fields required for a match against the number of available
1107       input fields. For example, using the above schema a match will occur
1108       for:
1109           total(min=2, max=2, last=2) when input fields = 2.
1110           file(min=1, max=9, last=1) when 1 <= input fields <= 9
1111           and date_ty is a single field (min=1, max=1, last=1).
1112           file(min=1, max=10, last=1) when 1 <= input fields <= 10
1113           and date_ty is two fields (min=2, max=2, last=2).
1114       By default xmlfy iterates through just the record elements of the
1115       root element, looking for element helpers that can fully satisfy the
1116       requirements of that element's schema tree hierarchy for the given
1117       input fields. The matching record element is then checked against
1118       its wildcard obligations in the root element definition and, if they
1119       are met, is printed.
1120       In match direct mode xmlfy only looks at the element helpers of the
1121       targeted element. If that element can fully satisfy the requirements
1122       of its schema tree hierarchy for the given input fields, it is
1123       printed in its entirety only once as the root element.
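
The helper values listed above can be reproduced with a small model. The Python below is only an illustrative sketch, not xmlfy's implementation: the SCHEMA table and the helpers() and matches() functions are hypothetical names introduced here. It walks each traversal path of the ls schema and derives the minimum and maximum field counts. The "last" value and maxOccurs="unbounded" are not modelled.

```python
from itertools import product

# Hypothetical model of the ls schema: element name -> list of alternative
# sequences (one per choice branch). Each sequence entry is
# (field name, minOccurs, maxOccurs).
SCHEMA = {
    "total": [[("prompt", 1, 1), ("totalsize", 1, 1)]],
    "file": [[
        ("permission", 0, 1), ("blocks", 0, 1), ("user", 0, 1),
        ("group", 0, 1), ("size", 0, 1), ("date_M", 0, 1),
        ("date_d", 0, 1), ("date_ty", 0, 1), ("fname", 1, 1),
    ]],
    "date_ty": [
        [("date_y", 1, 1)],
        [("date_h", 1, 1), ("date_m", 1, 1)],
    ],
}


def helpers(name):
    """Return one (min_fields, max_fields) pair per traversal path of name."""
    if name not in SCHEMA:
        return [(1, 1)]  # plain text element: exactly one input field
    paths = []
    for sequence in SCHEMA[name]:
        # Every child contributes its own list of possible (min, max) pairs.
        child_paths = [helpers(field) for field, _, _ in sequence]
        for combo in product(*child_paths):
            low = sum(c_min * occ_min
                      for (c_min, _), (_, occ_min, _) in zip(combo, sequence))
            high = sum(c_max * occ_max
                       for (_, c_max), (_, _, occ_max) in zip(combo, sequence))
            paths.append((low, high))
    return paths


def matches(name, n_fields):
    """True if any traversal path of name accepts n_fields input fields."""
    return any(low <= n_fields <= high for low, high in helpers(name))


print(helpers("total"))  # [(2, 2)]
print(helpers("file"))   # [(1, 9), (1, 10)]
```

In this model a two-field record matches total, while a record of one to ten fields matches file, in line with the ranges given above.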
1124
1125       To specify xmlfy options inside a schema file, enclose them in a
1126       special token that is in effect a schema comment.
1127           DTD and XSD example:
1128           <!-- xmlfy-args: -F1 "\n" -F2 ABC -q -Q \"\' -->
1129           RNC example:
1130           ## xmlfy-args: -F1 "\n" -F2 ABC -q -Q \"\'
1131       This special token must appear in complete form on a single line at
1132       the leftmost side. Spacing is important, only the first occurrence is
1133       recognised, and ideally it is placed somewhere near the top of the
1134       schema file. The schema option syntax is the same as the command line
1135       option syntax, except that some options are not allowed, e.g. --schema.
1136

LIMITATIONS

1138       xmlfy has been successfully  tested  on  average  hardware  with  input
1139       records containing over 10,000,000 fields whilst using a complex schema
1140       tree structure and multi-level delimiters.
1141
1142       Currently the xmlfy schema file parser is not  that  sophisticated  and
1143       exhibits the following behaviour:
1144
1145       DTD schema
1146       - Only recognises the <!ELEMENT> directive and ignores all others.
1147       - The first valid <!ELEMENT> definition becomes the root element.
1148       - Element fields that don't have an element definition default to being
1149         (#PCDATA).
1150       - Elements defined as (#PCDATA) or (#CDATA) are ignored, causing the
1151         referring field to default to (#PCDATA). However, it is good practice
1152         to include these elements in order to furnish a complete DTD schema.
1153       - Only honours the +, ? and * wildcard tokens.
1154       - At this stage does not honour field group sets () and or-ing | syntax
1155         tokens.
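
The wildcard tokens in the list above carry their usual DTD meaning: no token means exactly one, ? means none or one, * means none or many, and + means one or many. The Python below is a minimal sketch of reading a flat content model under those rules. parse_content_model() is a hypothetical name, and rejecting group sets and | outright is a choice made for this sketch, not a description of how xmlfy reacts to them.

```python
import math
import re

# Occurrence bounds (min, max) for each DTD wildcard token.
OCCURS = {"": (1, 1), "?": (0, 1), "*": (0, math.inf), "+": (1, math.inf)}

# A flat declaration: <!ELEMENT name (a, b?, c*)> with no nested () or |.
FLAT_ELEMENT = re.compile(r"\s*<!ELEMENT\s+(\S+)\s+\(([^()|]*)\)\s*>\s*")


def parse_content_model(declaration):
    """Return (element name, [(field, min, max), ...]) for a flat <!ELEMENT>."""
    found = FLAT_ELEMENT.fullmatch(declaration)
    if found is None:
        raise ValueError("unsupported declaration: " + declaration)
    name, body = found.groups()
    fields = []
    for item in body.split(","):
        item = item.strip()
        token = item[-1] if item[-1:] in ("?", "*", "+") else ""
        field = item[:-1] if token else item
        fields.append((field, *OCCURS[token]))
    return name, fields


print(parse_content_model("<!ELEMENT total (prompt, totalsize)>"))
# ('total', [('prompt', 1, 1), ('totalsize', 1, 1)])
```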
1156
1157       RNC schema
1158       - Only recognises named directives and ignores all others.
1159       - The element named "start" becomes the root element.
1160       - Element fields that don't have an element definition default to being
1161         { text }.
1162       - Elements defined as { text } are ignored, causing the referring field
1163         to default to { text }. However, it is good practice to include these
1164         elements in order to furnish a complete RNC schema.
1165       - Only honours the +, ? and * wildcard tokens.
1166       - At this stage does not honour field group sets () and or-ing | syntax
1167         tokens.
1168
1169       XSD schema
1170       - Only   recognises  the  <schema>,  <element>,  <complexType>,  <ref>,
1171         <sequence>, and <choice> directives and ignores all others.
1172       - The recognised directives are not fully  implemented  and  their  use
1173         should be kept straightforward.
1174       - The first valid <element> definition becomes the root element.
1175       - Element types that are not of matchable complexType are treated as
1176         "xs:string" regardless of what type is specified.
1177       - Only honours the minOccurs="0", maxOccurs="0" and
1178         maxOccurs="unbounded" wildcard attributes.
1179       - At this stage does not honour group sets but has limited support
1180         for choices.
1181
1182       All schema types
1183       - The fields of the root element define all the level 1 elements
1184         (let's call the fields that have their own branch structure
1185         record elements).
1186       - By default fields of the root element that are  not  record  elements
1187         are  ignored.  Use the match direct option to match targeted elements
1188         in their entirety.
1189       - The fields of the record elements simply represent other elements and
1190         unlimited element nesting is allowed.
1191       - The  field  names  that  are specified in the element definitions are
1192         read from left to right and matched against a field  number  calcula‐
1193         tion  on  the  input  fields,  and then matched again on any wildcard
1194         tokens.
1195       - You can wildcard many fields but you should think clearly about what
1196         you are trying to achieve and whether it is at all possible. For
1197         example, the following DTD is perfectly suitable for checking XML
1198         validity, but it is impossible for xmlfy to shoe-horn input data
1199         into DTD elements a, b and c reliably because more than one field
1200         has a wildcard token to match none or many input fields.
1201             <!ELEMENT parent (record)>
1202             <!ELEMENT record (a*, b, c*)>
1203             <!ELEMENT a (#PCDATA)>
1204             <!ELEMENT b (#PCDATA)>
1205             <!ELEMENT c (#PCDATA)>
1206         In the above example xmlfy will allocate ALL input fields to element
1207         <a>, which may not be the intended result.
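
The ambiguity in the (a*, b, c*) example can be made concrete by counting the splits that the DTD itself would accept. The Python below is only a sketch of that counting argument. allocations() is a hypothetical helper and does not reproduce xmlfy's allocation behaviour.

```python
def allocations(n_fields):
    """List every (a, b, c) field-count split valid for record (a*, b, c*)."""
    # b takes exactly one field; a takes any leading run, c takes the rest.
    return [(a, 1, n_fields - 1 - a) for a in range(n_fields)]


for split in allocations(3):
    print(split)
# (0, 1, 2)
# (1, 1, 1)
# (2, 1, 0)
```

With n input fields there are n equally valid splits, so the field counts alone cannot determine which input field belongs to which element.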
1208

RETURN VALUES

1210        0    Normal exit.
1211       -1    Invalid argument specified.
1212       -2    Error processing schema file contents.
1213       -3    Infinite loop detected when matching input  against  schema  ele‐
1214             ments.
1215       -10   Out of memory.
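
On POSIX systems only the low 8 bits of an exit status reach the parent process, so a negative value such as -1 is typically seen by a shell or script as 255. The Python below is a minimal sketch of mapping an observed status back to the values listed above. It assumes xmlfy passes these values directly to exit(), and describe_status() is a hypothetical helper.

```python
# Return values as documented above.
RETURN_VALUES = {
    0: "Normal exit.",
    -1: "Invalid argument specified.",
    -2: "Error processing schema file contents.",
    -3: "Infinite loop detected when matching input against schema elements.",
    -10: "Out of memory.",
}


def describe_status(status):
    """Map an 8-bit exit status (0..255) back to the documented signed value."""
    signed = status - 256 if status > 127 else status
    return RETURN_VALUES.get(signed, "Unknown status %d." % status)


print(describe_status(255))  # Invalid argument specified.
print(describe_status(246))  # Out of memory.
```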
1216

AUTHOR

1218       Originally written by Arthur Gouros.
1219       This software also contains material derived from Ville Laurikari's TRE
1220       regex library.
1221       This software also contains material derived from the  US  Secure  Hash
1222       Algorithms (RFC4634).
1223       This  software  also  contains material derived from the RSA Data Secu‐
1224       rity, Inc. MD5 Message-Digest Algorithm.
1225

LICENSE

1227       BSD License for xmlfy
1228       Copyright © 2008-2020, Arthur Gouros
1229       All rights reserved.
1230
1231       Redistribution and use in source and binary forms, with or without mod‐
1232       ification,  are  permitted  provided  that the following conditions are
1233       met:
1234
1235       - Redistributions of  source  code  must  retain  the  above  copyright
1236         notice, this list of conditions and the following disclaimer.
1237       - Redistributions  in  binary  form  must reproduce the above copyright
1238         notice, this list of conditions and the following disclaimer  in  the
1239         documentation and/or other materials provided with the distribution.
1240       - Neither  the  name of Arthur Gouros nor the names of its contributors
1241         may be used to endorse or promote products derived from this software
1242         without specific prior written permission.
1243
1244       THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
1245       IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT  NOT  LIMITED
1246       TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTIC‐
1247       ULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT  OWNER  OR
1248       CONTRIBUTORS  BE  LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
1249       EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,  BUT  NOT  LIMITED  TO,
1250       PROCUREMENT  OF  SUBSTITUTE  GOODS  OR  SERVICES; LOSS OF USE, DATA, OR
1251       PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY  OF
1252       LIABILITY,  WHETHER  IN  CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
1253       NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT  OF  THE  USE  OF  THIS
1254       SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
1255

SEE ALSO

1257       The full documentation of the xmlfy project can be found on the web at:
1258
1259           http://xmlfy.sourceforge.net
1260
1261       The website is updated more frequently than the man pages and should be
1262       considered the authoritative source of information.
1263
1264
1265
1266xmlfy 1.5.7                    February 2, 2020                       xmlfy(1)