dcm2xml(1)                        OFFIS DCMTK                       dcm2xml(1)

NAME

       dcm2xml - Convert DICOM file and data set to XML

SYNOPSIS

       dcm2xml [options] dcmfile-in [xmlfile-out]

DESCRIPTION

       The dcm2xml utility converts the contents of a DICOM file (file format
       or raw data set) to XML (Extensible Markup Language). There are two
       output formats. The first is specific to DCMTK; its DTD (Document Type
       Definition) is described in the file dcm2xml.dtd. The second is the
       'Native DICOM Model', which is specified for the DICOM Application
       Hosting service in DICOM part 19.

       If dcm2xml reads a raw data set (DICOM data without a file format
       meta-header), it attempts to guess the transfer syntax by examining the
       first few bytes of the file. The guess is not always correct, so it is
       better to convert a data set to a file format whenever possible (using
       the dcmconv utility). Alternatively, the -f and -t[ieb] options force
       dcm2xml to read a data set with a particular transfer syntax.
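
       As a minimal sketch of the forced-read case described above, the
       following Python helper assembles a dcm2xml argument list that combines
       -f with one of the -ti, -te or -tb switches. The helper name, the
       keyword names and the file names are illustrative assumptions; only the
       option switches come from this manual page.

```python
# Transfer syntax switches listed under "input transfer syntax".
XFER_FLAGS = {"implicit": "-ti", "little": "-te", "big": "-tb"}


def build_command(dcmfile_in, xmlfile_out=None, xfer=None):
    """Return an argument list for dcm2xml (hypothetical helper).

    With xfer=None no input options are added, so dcm2xml uses its
    default recognition of the file format and transfer syntax.
    """
    cmd = ["dcm2xml"]
    if xfer is not None:
        # -f reads the input as a data set without meta information;
        # the -t? switch then fixes the transfer syntax explicitly.
        cmd += ["-f", XFER_FLAGS[xfer]]
    cmd.append(dcmfile_in)
    if xmlfile_out is not None:
        cmd.append(xmlfile_out)
    return cmd


print(" ".join(build_command("raw.dcm", "raw.xml", xfer="implicit")))
```

       The resulting list can be passed to a process launcher such as
       subprocess.run() on a system where dcm2xml is installed.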

PARAMETERS

       dcmfile-in   DICOM input filename to be converted

       xmlfile-out  XML output filename (default: stdout)

OPTIONS

   general options
         -h    --help
                 print this help text and exit

               --version
                 print version information and exit

               --arguments
                 print expanded command line arguments

         -q    --quiet
                 quiet mode, print no warnings and errors

         -v    --verbose
                 verbose mode, print processing details

         -d    --debug
                 debug mode, print debug information

         -ll   --log-level  [l]evel: string constant
                 (fatal, error, warn, info, debug, trace)
                 use level l for the logger

         -lc   --log-config  [f]ilename: string
                 use config file f for the logger

   input options
       input file format:

         +f    --read-file
                 read file format or data set (default)

         +fo   --read-file-only
                 read file format only

         -f    --read-dataset
                 read data set without file meta information

       input transfer syntax:

         -t=   --read-xfer-auto
                 use TS recognition (default)

         -td   --read-xfer-detect
                 ignore TS specified in the file meta header

         -te   --read-xfer-little
                 read with explicit VR little endian TS

         -tb   --read-xfer-big
                 read with explicit VR big endian TS

         -ti   --read-xfer-implicit
                 read with implicit VR little endian TS

       long tag values:

         +M    --load-all
                 load very long tag values (e.g. pixel data)

         -M    --load-short
                 do not load very long values (default)

         +R    --max-read-length  [k]bytes: integer (4..4194302, default: 4)
                 set threshold for long values to k kbytes

   processing options
       specific character set:

         +Cr   --charset-require
                 require declaration of extended charset (default)

         +Ca   --charset-assume  [c]harset: string
                 assume charset c if no extended charset declared

         +Cc   --charset-check-all
                 check all data elements with string values
                 (default: only PN, LO, LT, SH, ST, UC and UT)

                 # this option is only used for the mapping to an appropriate
                 # XML character encoding, but not for the conversion to UTF-8

         +U8   --convert-to-utf8
                 convert all element values that are affected
                 by Specific Character Set (0008,0005) to UTF-8

                 # requires support from an underlying character encoding library
                 # (see output of --version on which one is available)

   output options
       general XML format:

         -dtk  --dcmtk-format
                 output in DCMTK-specific format (default)

         -nat  --native-format
                 output in Native DICOM Model format (part 19)

         +Xn   --use-xml-namespace
                 add XML namespace declaration to root element

       DCMTK-specific format (not with --native-format):

         +Xd   --add-dtd-reference
                 add reference to document type definition (DTD)

         +Xe   --embed-dtd-content
                 embed document type definition into XML document

         +Xf   --use-dtd-file  [f]ilename: string
                 use specified DTD file (only with +Xe)
                 (default: /usr/local/share/dcmtk/dcm2xml.dtd)

         +Wn   --write-element-name
                 write name of the DICOM data elements (default)

         -Wn   --no-element-name
                 do not write name of the DICOM data elements

         +Wb   --write-binary-data
                 write binary data of OB and OW elements
                 (default: off, be careful with --load-all)

       encoding of binary data:

         +Eh   --encode-hex
                 encode binary data as hex numbers
                 (default for DCMTK-specific format)

         +Eu   --encode-uuid
                 encode binary data as a UUID reference
                 (default for Native DICOM Model)

         +Eb   --encode-base64
                 encode binary data as Base64 (RFC 2045, MIME)

DCMTK Format

       The basic structure of the DCMTK-specific XML output created from a
       DICOM file looks like the following:

       <?xml version="1.0" encoding="ISO-8859-1"?>
       <!DOCTYPE file-format SYSTEM "dcm2xml.dtd">
       <file-format xmlns="http://dicom.offis.de/dcmtk">
         <meta-header xfer="1.2.840.10008.1.2.1" name="Little Endian Explicit">
           <element tag="0002,0000" vr="UL" vm="1" len="4"
                    name="MetaElementGroupLength">
             166
           </element>
           ...
           <element tag="0002,0013" vr="SH" vm="1" len="16"
                    name="ImplementationVersionName">
             OFFIS_DCMTK_353
           </element>
         </meta-header>
         <data-set xfer="1.2.840.10008.1.2" name="Little Endian Implicit">
           <element tag="0008,0005" vr="CS" vm="1" len="10"
                    name="SpecificCharacterSet">
             ISO_IR 100
           </element>
           ...
           <sequence tag="0028,3010" vr="SQ" card="2" name="VOILUTSequence">
             <item card="3">
               <element tag="0028,3002" vr="xs" vm="3" len="6"
                        name="LUTDescriptor">
                 256\0\8
               </element>
               ...
             </item>
             ...
           </sequence>
           ...
           <element tag="7fe0,0010" vr="OW" vm="1" len="262144"
                    name="PixelData" loaded="no" binary="hidden">
           </element>
         </data-set>
       </file-format>

       The 'file-format' and 'meta-header' tags are absent for DICOM data
       sets.

   XML Encoding
       Attributes with very large value fields (e.g. pixel data) are not
       loaded by default. They can be identified by the additional attribute
       'loaded' with a value of 'no' (see example above). The command line
       option --load-all forces all value fields to be loaded, including the
       very long ones.

       Furthermore, binary data of OB and OW attributes are not written to the
       XML output file by default. These elements can be identified by the
       additional attribute 'binary' with a value of 'hidden' (default is
       'no'). The command line option --write-binary-data causes binary value
       fields to be printed as well (the attribute value is then 'yes' or
       'base64'). Be careful when using this option together with --load-all,
       because large amounts of pixel data might be printed to the output.
       Please note that in this context element values with a VR of OD, OF, OL
       and OV are not regarded as 'binary data'.

       Multiple values (i.e. where the DICOM value multiplicity is greater
       than 1) are separated by a backslash '\' (except for Base64 encoded
       data). The 'len' attribute indicates the number of bytes for the
       particular value field as stored in the DICOM data set, i.e. it might
       deviate from the XML encoded value length, e.g. because non-significant
       padding has been removed. If this attribute is missing in 'sequence' or
       'item' start tags, the corresponding DICOM element has been stored with
       undefined length.
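
       The attributes described above can be evaluated with any XML parser.
       The following Python sketch reads a small hand-written sample in the
       DCMTK-specific format (without XML namespace), skips values that were
       not loaded and splits multiple values at the backslash. The sample
       document and the function name are assumptions made for illustration,
       not output captured from dcm2xml.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the style of the DCMTK-specific format.
SAMPLE = r"""<data-set xfer="1.2.840.10008.1.2" name="Little Endian Implicit">
  <element tag="0028,3002" vr="xs" vm="3" len="6" name="LUTDescriptor">256\0\8</element>
  <element tag="7fe0,0010" vr="OW" vm="1" len="262144"
           name="PixelData" loaded="no" binary="hidden"></element>
</data-set>"""


def summarize(xml_text):
    """Map each element tag to its list of values (None if not loaded)."""
    result = {}
    for elem in ET.fromstring(xml_text).iter("element"):
        tag = elem.get("tag")
        if elem.get("loaded") == "no":
            # Very long value that was skipped (see --load-all).
            result[tag] = None
            continue
        text = (elem.text or "").strip()
        if elem.get("binary", "no") == "base64":
            # Base64 data is a single chunk without backslash separators.
            result[tag] = [text]
        else:
            result[tag] = text.split("\\") if text else []
    return result


print(summarize(SAMPLE))
```

       If the output was created with --use-xml-namespace, the element names
       carry a namespace and the lookup has to be adapted accordingly.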

Native DICOM Model Format

       The description of the Native DICOM Model format can be found in the
       DICOM standard, part 19 ('Application Hosting').

   Bulk Data
       Binary data, i.e. DICOM element values with Value Representations (VR)
       of OB or OW, as well as OD, OF, OL, OV and UN values, are by default
       not written to the XML output because of their size. Instead, for each
       element, a new Universally Unique Identifier (UUID) is generated and
       written as an attribute of a <BulkData> XML element. So far, it is not
       possible to write an additional file holding the binary data for each
       of the binary data chunks. This is not required by the standard;
       however, it might be useful for implementing an Application Hosting
       interface, so this feature may become available in future versions of
       dcm2xml.

       In addition, Supplement 163 (Store Over the Web by Representational
       State Transfer Services) introduces a new <InlineBinary> XML element
       that allows for encoding binary data as Base64. Currently, the command
       line option --encode-base64 enables this encoding for the following
       VRs: OB, OD, OF, OL, OV, OW and UN.

   Known Issues
       In addition to what is written in the above section on 'Bulk Data',
       there are further known issues with the current implementation of the
       Native DICOM Model format. For example, large element values with a VR
       other than OB, OD, OF, OL, OV, OW or UN are currently never written as
       bulk data, although this might be useful, e.g. for very long text
       elements (especially UT) or very long numeric fields (of various VRs).
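
       To illustrate the two binary encodings from the 'Bulk Data' section, the
       following Python sketch tells <BulkData> references apart from
       <InlineBinary> content and decodes the latter. The sample document is
       hand-written; the element and attribute names (DicomAttribute, tag,
       uuid) are assumed to follow DICOM part 19, and the UUID value is made
       up.

```python
import base64
import xml.etree.ElementTree as ET

# Hand-written sample; not output captured from dcm2xml.
NATIVE_SAMPLE = """<NativeDicomModel>
  <DicomAttribute tag="7FE00010" vr="OW" keyword="PixelData">
    <BulkData uuid="6e51c6a2-9f0b-4d8e-8a4c-1f2d3e4b5a69"/>
  </DicomAttribute>
  <DicomAttribute tag="00420011" vr="OB" keyword="EncapsulatedDocument">
    <InlineBinary>SGVsbG8=</InlineBinary>
  </DicomAttribute>
</NativeDicomModel>"""


def classify_binary(xml_text):
    """Return {tag: ("bulk", uuid)} or {tag: ("inline", bytes)} entries."""
    found = {}
    for attr in ET.fromstring(xml_text).iter("DicomAttribute"):
        bulk = attr.find("BulkData")
        inline = attr.find("InlineBinary")
        if bulk is not None:
            # Only a reference is available; the data itself is not in the XML.
            found[attr.get("tag")] = ("bulk", bulk.get("uuid"))
        elif inline is not None:
            # Base64 content as written with --encode-base64.
            found[attr.get("tag")] = ("inline", base64.b64decode(inline.text or ""))
    return found


print(classify_binary(NATIVE_SAMPLE))
```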

NOTES

   Character Encoding
       The XML encoding is determined automatically from the DICOM attribute
       (0008,0005) 'Specific Character Set' using the following mapping:

       ASCII         "ISO_IR 6"    =>  "UTF-8"
       UTF-8         "ISO_IR 192"  =>  "UTF-8"
       ISO Latin 1   "ISO_IR 100"  =>  "ISO-8859-1"
       ISO Latin 2   "ISO_IR 101"  =>  "ISO-8859-2"
       ISO Latin 3   "ISO_IR 109"  =>  "ISO-8859-3"
       ISO Latin 4   "ISO_IR 110"  =>  "ISO-8859-4"
       ISO Latin 5   "ISO_IR 148"  =>  "ISO-8859-9"
       Cyrillic      "ISO_IR 144"  =>  "ISO-8859-5"
       Arabic        "ISO_IR 127"  =>  "ISO-8859-6"
       Greek         "ISO_IR 126"  =>  "ISO-8859-7"
       Hebrew        "ISO_IR 138"  =>  "ISO-8859-8"

       If this DICOM attribute is missing in the input file although it is
       needed, option --charset-assume can be used to specify an appropriate
       character set manually (using one of the DICOM defined terms). For
       backward compatibility with previous versions of this tool, the
       following terms are also supported and mapped automatically to the
       associated DICOM defined terms: latin-1, latin-2, latin-3, latin-4,
       latin-5, cyrillic, arabic, greek, hebrew.

       Multiple character sets using code extension techniques are not
       supported. If needed, option --convert-to-utf8 can be used to convert
       the DICOM file or data set to UTF-8 encoding prior to the conversion to
       XML format. This is also useful for DICOMDIR files where each directory
       record can have a different character set.

       If no mapping is defined and option --convert-to-utf8 is not used,
       non-ASCII characters and those below #32 are stored as '&#nnn;' where
       'nnn' refers to the numeric character code. This might lead to invalid
       character entity references (such as '&#27;' for ESC) and will cause
       most XML parsers to reject the document.
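
       The mapping and the caveat about invalid character references can be
       sketched in Python as follows. The table is copied from above; pairing
       the legacy terms with DICOM defined terms by name, treating a missing
       declaration as ASCII, and both function names are assumptions of this
       sketch rather than a description of the dcm2xml implementation.

```python
import re

# DICOM defined term -> XML encoding, as listed in the table above.
XML_ENCODING = {
    "ISO_IR 6": "UTF-8", "ISO_IR 192": "UTF-8",
    "ISO_IR 100": "ISO-8859-1", "ISO_IR 101": "ISO-8859-2",
    "ISO_IR 109": "ISO-8859-3", "ISO_IR 110": "ISO-8859-4",
    "ISO_IR 148": "ISO-8859-9", "ISO_IR 144": "ISO-8859-5",
    "ISO_IR 127": "ISO-8859-6", "ISO_IR 126": "ISO-8859-7",
    "ISO_IR 138": "ISO-8859-8",
}

# Legacy terms for --charset-assume (pairing by name is an assumption).
LEGACY_TERMS = {
    "latin-1": "ISO_IR 100", "latin-2": "ISO_IR 101", "latin-3": "ISO_IR 109",
    "latin-4": "ISO_IR 110", "latin-5": "ISO_IR 148", "cyrillic": "ISO_IR 144",
    "arabic": "ISO_IR 127", "greek": "ISO_IR 126", "hebrew": "ISO_IR 138",
}

CHAR_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")


def xml_encoding(specific_charset=None, assumed=None):
    """Return the XML encoding name, or None if no mapping is defined."""
    term = (specific_charset or "").strip()
    if not term and assumed:
        term = LEGACY_TERMS.get(assumed, assumed)
    if not term:
        term = "ISO_IR 6"  # assumption: no declaration means plain ASCII
    # Multi-valued terms (code extensions) are not in the table -> None.
    return XML_ENCODING.get(term)


def invalid_char_refs(xml_text):
    """List numeric character references that XML 1.0 does not allow."""
    bad = []
    for match in CHAR_REF.finditer(xml_text):
        body = match.group(1)
        code = int(body[1:], 16) if body.startswith("x") else int(body)
        allowed = (code in (0x9, 0xA, 0xD) or 0x20 <= code <= 0xD7FF
                   or 0xE000 <= code <= 0xFFFD or 0x10000 <= code <= 0x10FFFF)
        if not allowed:
            bad.append(match.group(0))
    return bad


print(xml_encoding("ISO_IR 100"), invalid_char_refs("A&#27;B&#228;C"))
```

       A check such as invalid_char_refs() could be run on the output before
       it is handed to a strict XML parser.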

LOGGING

       The level of logging output of the various command line tools and
       underlying libraries can be specified by the user. By default, only
       errors and warnings are written to the standard error stream. With
       option --verbose, informational messages such as processing details
       are also reported. Option --debug can be used to get more details on
       the internal activity, e.g. for debugging purposes. Other logging
       levels can be selected using option --log-level. In --quiet mode only
       fatal errors are reported. In such very severe error events, the
       application will usually terminate. For more details on the different
       logging levels, see the documentation of module 'oflog'.

       If the logging output should be written to a file (optionally with
       logfile rotation), to syslog (Unix) or to the event log (Windows),
       option --log-config can be used. This configuration file also allows
       for directing only certain messages to a particular output stream and
       for filtering certain messages based on the module or application where
       they are generated. An example configuration file is provided in
       <etcdir>/logger.cfg.

COMMAND LINE

       All command line tools use the following notation for parameters:
       square brackets enclose optional values (0-1), three trailing dots
       indicate that multiple values are allowed (1-n), and a combination of
       both means 0 to n values.

       Command line options are distinguished from parameters by a leading
       '+' or '-' sign. Usually, order and position of command line options
       are arbitrary (i.e. they can appear anywhere). However, if options are
       mutually exclusive, the rightmost appearance is used. This behavior
       conforms to the standard evaluation rules of common Unix shells.

       In addition, one or more command files can be specified using an '@'
       sign as a prefix to the filename (e.g. @command.txt). Such a command
       argument is replaced by the content of the corresponding text file
       (multiple whitespace characters are treated as a single separator
       unless they appear between two quotation marks) prior to any further
       evaluation. Please note that a command file cannot contain another
       command file. This simple but effective approach allows one to
       summarize common combinations of options/parameters and avoids long
       and confusing command lines (an example is provided in file
       <datadir>/dumppat.txt).
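
       The command file mechanism described above can be approximated in a
       few lines of Python. This is only a sketch: it uses shlex.split() for
       the tokenization, whose quoting and escaping rules may differ in detail
       from those of the DCMTK command line parser, and the function and file
       names are hypothetical.

```python
import shlex
from pathlib import Path


def expand_command_files(args, read_text=None):
    """Replace each '@file' argument by the tokens found in that file."""
    if read_text is None:
        read_text = lambda name: Path(name).read_text()
    expanded = []
    for arg in args:
        if arg.startswith("@") and len(arg) > 1:
            # Runs of whitespace act as one separator; quoted text stays
            # together. Tokens are not expanded again (no nested files).
            expanded.extend(shlex.split(read_text(arg[1:])))
        else:
            expanded.append(arg)
    return expanded


# Usage with an in-memory stand-in for the file system:
files = {"common.txt": '--native-format   +Eb\n"my file.dcm"'}
print(expand_command_files(["@common.txt", "out.xml"], files.__getitem__))
```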

ENVIRONMENT

       The dcm2xml utility will attempt to load DICOM data dictionaries
       specified in the DCMDICTPATH environment variable. By default, i.e. if
       the DCMDICTPATH environment variable is not set, the file
       <datadir>/dicom.dic will be loaded unless the dictionary is built into
       the application (default for Windows).

       The default behavior should be preferred, and the DCMDICTPATH
       environment variable should only be used when alternative data
       dictionaries are required. The DCMDICTPATH environment variable has the
       same format as the Unix shell PATH variable in that a colon (':')
       separates entries. On Windows systems, a semicolon (';') is used as a
       separator. The data dictionary code will attempt to load each file
       specified in the DCMDICTPATH environment variable. It is an error if no
       data dictionary can be loaded.
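
       As a small illustration of the DCMDICTPATH format, the following Python
       sketch splits the variable into its entries. The function name is
       hypothetical, and the '<datadir>' placeholder is kept unresolved, as in
       the text above; os.pathsep is ':' on Unix and ';' on Windows, which
       matches the separators described here.

```python
import os


def dictionary_paths(environ, separator=os.pathsep,
                     default="<datadir>/dicom.dic"):
    """Return the dictionary files that would be tried, in order."""
    value = environ.get("DCMDICTPATH")
    if value is None:
        # Variable not set: fall back to the default dictionary.
        return [default]
    # Ignore empty entries such as those caused by a trailing separator.
    return [entry for entry in value.split(separator) if entry]


print(dictionary_paths({"DCMDICTPATH": "/a/dicom.dic:/b/private.dic"}, ":"))
```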

FILES

       <datadir>/dcm2xml.dtd - Document Type Definition (DTD) file

SEE ALSO

       xml2dcm(1), dcmconv(1)

COPYRIGHT

       Copyright (C) 2002-2021 by OFFIS e.V., Escherweg 2, 26121 Oldenburg,
       Germany.

Version 3.6.6                   Thu Jan 14 2021                     dcm2xml(1)