dcm2xml(1)                        OFFIS DCMTK                       dcm2xml(1)

NAME
dcm2xml - Convert DICOM file and data set to XML

SYNOPSIS
dcm2xml [options] dcmfile-in [xmlfile-out]

DESCRIPTION
The dcm2xml utility converts the contents of a DICOM file (file format
or raw data set) to XML (Extensible Markup Language). Two output
formats are available. The first is specific to DCMTK; its DTD
(Document Type Definition) is described in the file dcm2xml.dtd. The
second is the 'Native DICOM Model', which is specified for the DICOM
Application Hosting service in DICOM part 19.

If dcm2xml reads a raw data set (DICOM data without a file format
meta-header), it attempts to guess the transfer syntax by examining the
first few bytes of the file. The transfer syntax cannot always be
guessed correctly, so it is better to convert a data set to a file
format whenever possible (using the dcmconv utility). Alternatively,
the -f and -t[ieb] options force dcm2xml to read a data set with a
particular transfer syntax.
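
As an illustration of forcing the transfer syntax, the following Python
sketch assembles a dcm2xml argument list for a raw data set. The helper
name build_dcm2xml_args and the file names are hypothetical; only the
flags -f, -ti, -te and -tb come from the option list of this page.

```python
def build_dcm2xml_args(infile, outfile=None, transfer_syntax=None):
    """Build an argument list that makes dcm2xml read a raw data set."""
    # -f reads a data set without file meta information; the -t? flag
    # selects the transfer syntax instead of relying on guessing.
    ts_flags = {"implicit": "-ti", "little": "-te", "big": "-tb"}
    args = ["dcm2xml", "-f"]
    if transfer_syntax is not None:
        args.append(ts_flags[transfer_syntax])
    args.append(infile)
    if outfile is not None:
        args.append(outfile)  # omitted: dcm2xml writes to stdout
    return args


print(build_dcm2xml_args("dataset.bin", "dataset.xml", "implicit"))
```

The resulting list can be passed to subprocess.run(args, check=True) on
a system where dcm2xml is installed.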

PARAMETERS
dcmfile-in   DICOM input filename to be converted

xmlfile-out  XML output filename (default: stdout)

OPTIONS
general options
  -h    --help
          print this help text and exit

        --version
          print version information and exit

        --arguments
          print expanded command line arguments

  -q    --quiet
          quiet mode, print no warnings and errors

  -v    --verbose
          verbose mode, print processing details

  -d    --debug
          debug mode, print debug information

  -ll   --log-level  [l]evel: string constant
          (fatal, error, warn, info, debug, trace)
          use level l for the logger

  -lc   --log-config  [f]ilename: string
          use config file f for the logger

input options
  input file format:

    +f    --read-file
            read file format or data set (default)

    +fo   --read-file-only
            read file format only

    -f    --read-dataset
            read data set without file meta information

  input transfer syntax:

    -t=   --read-xfer-auto
            use TS recognition (default)

    -td   --read-xfer-detect
            ignore TS specified in the file meta header

    -te   --read-xfer-little
            read with explicit VR little endian TS

    -tb   --read-xfer-big
            read with explicit VR big endian TS

    -ti   --read-xfer-implicit
            read with implicit VR little endian TS

  long tag values:

    +M    --load-all
            load very long tag values (e.g. pixel data)

    -M    --load-short
            do not load very long values (default)

    +R    --max-read-length  [k]bytes: integer (4..4194302, default: 4)
            set threshold for long values to k kbytes

processing options
  specific character set:

    +Cr   --charset-require
            require declaration of extended charset (default)

    +Ca   --charset-assume  [c]harset: string
            assume charset c if no extended charset declared

    +Cc   --charset-check-all
            check all data elements with string values
            (default: only PN, LO, LT, SH, ST, UC and UT)

            # this option is only used for the mapping to an appropriate
            # XML character encoding, but not for the conversion to UTF-8

    +U8   --convert-to-utf8
            convert all element values that are affected
            by Specific Character Set (0008,0005) to UTF-8

            # requires support from an underlying character encoding library
            # (see output of --version on which one is available)

output options
  general XML format:

    -dtk  --dcmtk-format
            output in DCMTK-specific format (default)

    -nat  --native-format
            output in Native DICOM Model format (part 19)

    +Xn   --use-xml-namespace
            add XML namespace declaration to root element

  DCMTK-specific format (not with --native-format):

    +Xd   --add-dtd-reference
            add reference to document type definition (DTD)

    +Xe   --embed-dtd-content
            embed document type definition into XML document

    +Xf   --use-dtd-file  [f]ilename: string
            use specified DTD file (only with +Xe)
            (default: /usr/local/share/dcmtk/dcm2xml.dtd)

    +Wn   --write-element-name
            write name of the DICOM data elements (default)

    -Wn   --no-element-name
            do not write name of the DICOM data elements

    +Wb   --write-binary-data
            write binary data of OB and OW elements
            (default: off, be careful with --load-all)

  encoding of binary data:

    +Eh   --encode-hex
            encode binary data as hex numbers
            (default for DCMTK-specific format)

    +Eu   --encode-uuid
            encode binary data as a UUID reference
            (default for Native DICOM Model)

    +Eb   --encode-base64
            encode binary data as Base64 (RFC 2045, MIME)

DCMTK FORMAT
The basic structure of the DCMTK-specific XML output created from a
DICOM file looks like the following:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE file-format SYSTEM "dcm2xml.dtd">
<file-format xmlns="http://dicom.offis.de/dcmtk">
  <meta-header xfer="1.2.840.10008.1.2.1" name="Little Endian Explicit">
    <element tag="0002,0000" vr="UL" vm="1" len="4"
             name="MetaElementGroupLength">
      166
    </element>
    ...
    <element tag="0002,0013" vr="SH" vm="1" len="16"
             name="ImplementationVersionName">
      OFFIS_DCMTK_353
    </element>
  </meta-header>
  <data-set xfer="1.2.840.10008.1.2" name="Little Endian Implicit">
    <element tag="0008,0005" vr="CS" vm="1" len="10"
             name="SpecificCharacterSet">
      ISO_IR 100
    </element>
    ...
    <sequence tag="0028,3010" vr="SQ" card="2" name="VOILUTSequence">
      <item card="3">
        <element tag="0028,3002" vr="xs" vm="3" len="6"
                 name="LUTDescriptor">
          256\0\8
        </element>
        ...
      </item>
      ...
    </sequence>
    ...
    <element tag="7fe0,0010" vr="OW" vm="1" len="262144"
             name="PixelData" loaded="no" binary="hidden">
    </element>
  </data-set>
</file-format>

The 'file-format' and 'meta-header' tags are absent for DICOM data
sets.
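
The structure above can be read with any XML parser. The following
Python sketch uses the standard library to list every 'element' node
together with the section it belongs to. The sample document is a
shortened, hand-written variant of the example above, and the helper
names local_name and list_elements are hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<file-format xmlns="http://dicom.offis.de/dcmtk">
  <meta-header xfer="1.2.840.10008.1.2.1" name="Little Endian Explicit">
    <element tag="0002,0013" vr="SH" vm="1" len="16"
             name="ImplementationVersionName">OFFIS_DCMTK_353</element>
  </meta-header>
  <data-set xfer="1.2.840.10008.1.2" name="Little Endian Implicit">
    <element tag="0008,0005" vr="CS" vm="1" len="10"
             name="SpecificCharacterSet">ISO_IR 100</element>
  </data-set>
</file-format>"""


def local_name(tag):
    """Drop a '{namespace}' prefix so both namespaced and plain output work."""
    return tag.rsplit("}", 1)[-1]


def list_elements(xml_bytes):
    """Return (section, tag, vr, value) for every <element> in the document."""
    root = ET.fromstring(xml_bytes)
    rows = []
    for section in root.iter():
        name = local_name(section.tag)
        if name not in ("meta-header", "data-set"):
            continue
        for node in section.iter():
            if local_name(node.tag) == "element":
                value = (node.text or "").strip()
                rows.append((name, node.get("tag"), node.get("vr"), value))
    return rows


for row in list_elements(SAMPLE):
    print(row)
```

Because the root element is included in the iteration, the same
function also handles output that starts directly with 'data-set'.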

XML Encoding
Attributes with very large value fields (e.g. pixel data) are not
loaded by default. They can be identified by the additional attribute
'loaded' with a value of 'no' (see example above). The command line
option --load-all forces all value fields to be loaded, including the
very long ones.

Furthermore, binary data of OB and OW attributes are not written to the
XML output file by default. These elements can be identified by the
additional attribute 'binary' with a value of 'hidden' (default is
'no'). The command line option --write-binary-data causes binary value
fields to be printed as well (the attribute value is then 'yes' or
'base64'). However, be careful when using this option together with
--load-all because of the large amounts of pixel data that might be
printed to the output. Please note that in this context element values
with a VR of OD, OF, OL and OV are not regarded as 'binary data'.
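
A consumer of the XML output may want to know which values were left
out. The following Python sketch collects the elements marked with
loaded="no" or binary="hidden". The function name skipped_values and
the sample data set are hypothetical; the attribute names and values
are the ones described above.

```python
import xml.etree.ElementTree as ET


def skipped_values(xml_text):
    """Return (tag, not_loaded, hidden) for elements without a usable value."""
    root = ET.fromstring(xml_text)
    skipped = []
    for node in root.iter():
        # Ignore an optional '{namespace}' prefix on the element name.
        if node.tag.rsplit("}", 1)[-1] != "element":
            continue
        not_loaded = node.get("loaded") == "no"
        hidden = node.get("binary", "no") == "hidden"
        if not_loaded or hidden:
            skipped.append((node.get("tag"), not_loaded, hidden))
    return skipped


sample = """<data-set xfer="1.2.840.10008.1.2" name="Little Endian Implicit">
  <element tag="0008,0060" vr="CS" vm="1" len="2" name="Modality">CT</element>
  <element tag="7fe0,0010" vr="OW" vm="1" len="262144"
           name="PixelData" loaded="no" binary="hidden"></element>
</data-set>"""

print(skipped_values(sample))
```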

Multiple values (i.e. where the DICOM value multiplicity is greater
than 1) are separated by a backslash '\' (except for Base64 encoded
data). The 'len' attribute indicates the number of bytes for the
particular value field as stored in the DICOM data set, i.e. it might
deviate from the XML encoded value length, e.g. because of
non-significant padding that has been removed. If this attribute is
missing in 'sequence' or 'item' start tags, the corresponding DICOM
element has been stored with undefined length.
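
The two rules above (backslash-separated values, missing 'len' for
undefined length) can be sketched in Python as follows. The helper
names and the sample fragment are hypothetical, and the sample assumes
output without an XML namespace.

```python
import xml.etree.ElementTree as ET


def element_values(node):
    """Split the text of an <element> node into its individual values."""
    text = (node.text or "").strip()
    if node.get("binary") == "base64":
        return [text]  # Base64 data is not backslash-separated
    return text.split("\\") if text else []


def has_undefined_length(node):
    """True for <sequence>/<item> nodes written without a 'len' attribute."""
    return node.tag in ("sequence", "item") and node.get("len") is None


# Raw string so that the backslashes reach the XML parser unchanged.
sample = r"""<sequence tag="0028,3010" vr="SQ" card="1" name="VOILUTSequence">
  <item card="1" len="14">
    <element tag="0028,3002" vr="xs" vm="3" len="6"
             name="LUTDescriptor">256\0\8</element>
  </item>
</sequence>"""

seq = ET.fromstring(sample)
print(element_values(seq.find("item/element")))
print(has_undefined_length(seq), has_undefined_length(seq.find("item")))
```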

NATIVE DICOM MODEL FORMAT
The description of the Native DICOM Model format can be found in the
DICOM standard, part 19 ('Application Hosting').

Bulk Data
Binary data, i.e. DICOM element values with Value Representations (VR)
of OB or OW, as well as OD, OF, OL, OV and UN values, are by default
not written to the XML output because of their size. Instead, for each
element, a new Universally Unique Identifier (UUID) is generated and
written as an attribute of a <BulkData> XML element. So far, it is not
possible to write an additional file holding the binary data for each
of the binary data chunks. This is not required by the standard;
however, it might be useful for implementing an Application Hosting
interface, so this feature may be available in future versions of
dcm2xml.

In addition, Supplement 163 (Store Over the Web by Representational
State Transfer Services) introduces a new <InlineBinary> XML element
that allows for encoding binary data as Base64. Currently, the command
line option --encode-base64 enables this encoding for the following
VRs: OB, OD, OF, OL, OV, OW and UN.
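
The following Python sketch shows how a reader of Native DICOM Model
output might tell the two binary representations apart. It is based on
assumptions that should be checked against DICOM part 19: the element
name DicomAttribute, its 'tag' attribute and the 'uuid' attribute of
<BulkData> are taken from the standard as the editor understands it,
not from this page. The sample document is hand-written (without an XML
namespace) and the function name binary_values is hypothetical.

```python
import base64
import xml.etree.ElementTree as ET

# Hand-written sample, not actual dcm2xml output.
SAMPLE = """<NativeDicomModel>
  <DicomAttribute tag="7FE00010" vr="OW" keyword="PixelData">
    <BulkData uuid="6f1b2d1e-0000-4000-8000-000000000001"/>
  </DicomAttribute>
  <DicomAttribute tag="00420011" vr="OB" keyword="EncapsulatedDocument">
    <InlineBinary>SGVsbG8=</InlineBinary>
  </DicomAttribute>
</NativeDicomModel>"""


def binary_values(xml_text):
    """Map each attribute tag to ('uuid', id) or ('bytes', decoded data)."""
    root = ET.fromstring(xml_text)
    result = {}
    for attr in root.iter("DicomAttribute"):
        bulk = attr.find("BulkData")
        inline = attr.find("InlineBinary")
        if bulk is not None:
            # Only a reference is available; the data itself is not in the XML.
            result[attr.get("tag")] = ("uuid", bulk.get("uuid"))
        elif inline is not None:
            result[attr.get("tag")] = ("bytes", base64.b64decode(inline.text))
    return result


print(binary_values(SAMPLE))
```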

Known Issues
In addition to what is written in the above section on 'Bulk Data',
there are further known issues with the current implementation of the
Native DICOM Model format. For example, large element values with a VR
other than OB, OD, OF, OL, OV, OW or UN are currently never written as
bulk data, although it might be useful, e.g. for very long text
elements (especially UT) or very long numeric fields (of various VRs).

NOTES
Character Encoding
The XML encoding is determined automatically from the DICOM attribute
(0008,0005) 'Specific Character Set' using the following mapping:

  ASCII         (ISO_IR 6)    =>  "UTF-8"
  UTF-8         "ISO_IR 192"  =>  "UTF-8"
  ISO Latin 1   "ISO_IR 100"  =>  "ISO-8859-1"
  ISO Latin 2   "ISO_IR 101"  =>  "ISO-8859-2"
  ISO Latin 3   "ISO_IR 109"  =>  "ISO-8859-3"
  ISO Latin 4   "ISO_IR 110"  =>  "ISO-8859-4"
  ISO Latin 5   "ISO_IR 148"  =>  "ISO-8859-9"
  Cyrillic      "ISO_IR 144"  =>  "ISO-8859-5"
  Arabic        "ISO_IR 127"  =>  "ISO-8859-6"
  Greek         "ISO_IR 126"  =>  "ISO-8859-7"
  Hebrew        "ISO_IR 138"  =>  "ISO-8859-8"

If this DICOM attribute is missing in the input file, although needed,
option --charset-assume can be used to specify an appropriate character
set manually (using one of the DICOM defined terms). For reasons of
backward compatibility with previous versions of this tool, the
following terms are also supported and mapped automatically to the
associated DICOM defined terms: latin-1, latin-2, latin-3, latin-4,
latin-5, cyrillic, arabic, greek, hebrew.
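
The mapping can be expressed as a simple lookup table. The Python
sketch below mirrors the table above. The pairing of the legacy terms
with DICOM defined terms is inferred from the names in that table and
is an assumption, as is the function name xml_encoding_for.

```python
# DICOM defined term -> XML encoding, as listed in the table above.
XML_ENCODING = {
    "ISO_IR 6": "UTF-8",
    "ISO_IR 192": "UTF-8",
    "ISO_IR 100": "ISO-8859-1",
    "ISO_IR 101": "ISO-8859-2",
    "ISO_IR 109": "ISO-8859-3",
    "ISO_IR 110": "ISO-8859-4",
    "ISO_IR 148": "ISO-8859-9",
    "ISO_IR 144": "ISO-8859-5",
    "ISO_IR 127": "ISO-8859-6",
    "ISO_IR 126": "ISO-8859-7",
    "ISO_IR 138": "ISO-8859-8",
}

# Legacy terms accepted by --charset-assume (pairing inferred, see lead-in).
LEGACY_TERMS = {
    "latin-1": "ISO_IR 100",
    "latin-2": "ISO_IR 101",
    "latin-3": "ISO_IR 109",
    "latin-4": "ISO_IR 110",
    "latin-5": "ISO_IR 148",
    "cyrillic": "ISO_IR 144",
    "arabic": "ISO_IR 127",
    "greek": "ISO_IR 126",
    "hebrew": "ISO_IR 138",
}


def xml_encoding_for(term):
    """Return the XML encoding for a character set term, or None if unmapped."""
    term = LEGACY_TERMS.get(term, term)
    return XML_ENCODING.get(term)


print(xml_encoding_for("ISO_IR 148"), xml_encoding_for("greek"))
```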

Multiple character sets using code extension techniques are not
supported. If needed, option --convert-to-utf8 can be used to convert
the DICOM file or data set to UTF-8 encoding prior to the conversion to
XML format. This is also useful for DICOMDIR files where each directory
record can have a different character set.

If no mapping is defined and option --convert-to-utf8 is not used,
non-ASCII characters and those below #32 are stored as '&#nnn;' where
'nnn' refers to the numeric character code. This might lead to invalid
character entity references (such as '&#27;' for ESC) and will cause
most XML parsers to reject the document.
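
The effect of such a reference can be reproduced with a conforming
parser. The following Python sketch uses the standard library parser
(based on Expat) to compare a reference to a printable character with a
reference to ESC. The function name is_well_formed and the test
fragments are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A reference to a printable non-ASCII character is fine ...
print(is_well_formed("<element>caf&#233;</element>"))
# ... but XML 1.0 does not allow most control characters, even as references.
print(is_well_formed("<element>&#27;</element>"))
```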

LOGGING
The level of logging output of the various command line tools and
underlying libraries can be specified by the user. By default, only
errors and warnings are written to the standard error stream. With
option --verbose, informational messages such as processing details are
also reported. Option --debug can be used to get more details on the
internal activity, e.g. for debugging purposes. Other logging levels
can be selected using option --log-level. In --quiet mode only fatal
errors are reported. In such very severe error events, the application
will usually terminate. For more details on the different logging
levels, see the documentation of module 'oflog'.

If the logging output should be written to a file (optionally with
logfile rotation), to syslog (Unix) or to the event log (Windows),
option --log-config can be used. This configuration file also allows
for directing only certain messages to a particular output stream and
for filtering certain messages based on the module or application where
they are generated. An example configuration file is provided in
<etcdir>/logger.cfg.

COMMAND LINE
All command line tools use the following notation for parameters:
square brackets enclose optional values (0-1), three trailing dots
indicate that multiple values are allowed (1-n), a combination of both
means 0 to n values.

Command line options are distinguished from parameters by a leading '+'
or '-' sign. Usually, order and position of command line options are
arbitrary (i.e. they can appear anywhere). However, if options are
mutually exclusive, the rightmost appearance is used. This behavior
conforms to the standard evaluation rules of common Unix shells.

In addition, one or more command files can be specified using an '@'
sign as a prefix to the filename (e.g. @command.txt). Such a command
argument is replaced by the content of the corresponding text file
(multiple whitespace characters are treated as a single separator
unless they appear between two quotation marks) prior to any further
evaluation. Please note that a command file cannot contain another
command file. This simple but effective approach allows one to
summarize common combinations of options/parameters and avoids longish
and confusing command lines (an example is provided in file
<datadir>/dumppat.txt).
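
The command file mechanism can be approximated in a few lines of
Python. This is only a sketch of the behavior described above, not the
DCMTK implementation: it uses shlex.split for the whitespace and
quotation mark handling, which may differ from DCMTK in corner cases.
The function name expand_command_files and its read_file callback are
hypothetical.

```python
import shlex


def expand_command_files(args, read_file):
    """Replace each '@file' argument by the tokens found in that file.

    read_file is a callable that returns the text of a command file.
    Tokens read from a command file are not expanded again, since a
    command file cannot contain another command file.
    """
    expanded = []
    for arg in args:
        if arg.startswith("@"):
            expanded.extend(shlex.split(read_file(arg[1:])))
        else:
            expanded.append(arg)
    return expanded


# In-memory stand-in for the file system, for illustration only.
files = {"command.txt": '+M   +Wb\n"my file.dcm"'}
print(expand_command_files(["-v", "@command.txt", "out.xml"], files.__getitem__))
```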

ENVIRONMENT
The dcm2xml utility will attempt to load DICOM data dictionaries
specified in the DCMDICTPATH environment variable. By default, i.e. if
the DCMDICTPATH environment variable is not set, the file
<datadir>/dicom.dic will be loaded unless the dictionary is built into
the application (default for Windows).

The default behavior should be preferred and the DCMDICTPATH
environment variable only used when alternative data dictionaries are
required. The DCMDICTPATH environment variable has the same format as
the Unix shell PATH variable in that a colon (':') separates entries.
On Windows systems, a semicolon (';') is used as a separator. The data
dictionary code will attempt to load each file specified in the
DCMDICTPATH environment variable. It is an error if no data dictionary
can be loaded.
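
To check which dictionary files a given environment refers to, the
variable can be split in the same way as PATH. The Python sketch below
assumes that os.pathsep (':' on Unix, ';' on Windows) matches the
separator described above. The function name dictionary_paths and the
example paths are hypothetical.

```python
import os


def dictionary_paths(environ=None, separator=os.pathsep, default=None):
    """Return the dictionary files named by DCMDICTPATH.

    If the variable is not set, fall back to the given default file
    (if any), mirroring the default behavior described above.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("DCMDICTPATH")
    if value is None:
        return [default] if default else []
    return [path for path in value.split(separator) if path]


env = {"DCMDICTPATH": "/opt/dicts/dicom.dic:/opt/dicts/private.dic"}
print(dictionary_paths(env, separator=":"))
```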

FILES
<datadir>/dcm2xml.dtd - Document Type Definition (DTD) file

SEE ALSO
xml2dcm(1), dcmconv(1)

COPYRIGHT
Copyright (C) 2002-2022 by OFFIS e.V., Escherweg 2, 26121 Oldenburg,
Germany.

Version 3.6.7                   Fri Apr 22 2022                     dcm2xml(1)