h5import(1) General Commands Manual h5import(1)

h5import - Imports data into an existing or new HDF5 file.

h5import infile -d dim_list [ -p pathname ] [ -t input_class ]
    [ -s input_size ] [infile ...] -o outfile

h5import infile -dims dim_list [ -path pathname ] [ -type input_class ]
    [ -size input_size ] [infile ...] -outfile outfile

h5import infile -c config_file [infile ...] -outfile outfile

h5import -h

h5import -help

h5import converts data from one or more ASCII or binary files, infile,
into the same number of HDF5 datasets in the existing or new HDF5 file,
outfile. Data conversion is performed in accordance with the
user-specified type and storage properties specified in in_options.

The primary objective of h5import is to import floating point or
integer data. The utility's design allows for future versions that
accept ASCII text files and store the contents as a compact array of
one-dimensional strings, but that capability is not implemented in HDF5
Release 1.6.

Input data and options

Input data can be provided in one of the following forms:

*   As an ASCII, or plain-text, file containing either floating point
    or integer data

*   As a binary file containing either 32-bit or 64-bit native floating
    point data

*   As a binary file containing native integer data, signed or unsigned
    and 8-bit, 16-bit, 32-bit, or 64-bit

*   As an ASCII, or plain-text, file containing text data (This feature
    is not implemented in HDF5 Release 1.6.)

Each input file, infile, contains a single n-dimensional array of
values of one of the above types expressed in the order of
fastest-changing dimensions first.
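
As a hedged illustration of the binary input form, the following Python
sketch writes a small array of native 32-bit floating point values to a
file that could serve as infile. The file name infile.bin, the helper
write_native_floats, and the sample values are assumptions made for
this example; none of them come from h5import itself.

```python
from array import array


def write_native_floats(path, values):
    """Write values as native 32-bit floats, in the order given."""
    data = array("f", values)  # "f" is the platform's native C float
    with open(path, "wb") as fh:
        data.tofile(fh)
    return len(data) * data.itemsize  # number of bytes written


# Hypothetical six-element array, already flattened into storage order.
nbytes = write_native_floats("infile.bin", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
```

A file produced this way holds raw values only, with no header, so the
dimensions, class (FP), and size (32) still have to be supplied through
the command line or a configuration file.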

Floating point data in an ASCII input file must be expressed in the
fixed floating form (e.g., 323.56). h5import is designed to accept
scientific notation (e.g., 3.23E+02) in an ASCII file, but that is not
implemented in HDF5 Release 1.6.
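
One way to guarantee fixed notation when generating an ASCII input file
is to format every value explicitly. The Python sketch below does so;
the helper write_fixed_ascii, the file name infile.txt, and the choice
of whitespace as the separator between values are assumptions for
illustration, since this page does not state which separators are
accepted.

```python
def write_fixed_ascii(path, values, per_line=4, decimals=2):
    """Write floats in fixed notation, separated by whitespace."""
    # The "f" presentation type never produces an exponent.
    tokens = [f"{v:.{decimals}f}" for v in values]
    with open(path, "w") as fh:
        for i in range(0, len(tokens), per_line):
            fh.write(" ".join(tokens[i:i + per_line]) + "\n")
    return tokens


tokens = write_fixed_ascii("infile.txt", [323.56, 0.000012, 1.5e10])
```

Note that a fixed number of decimals silently rounds small values
(0.000012 becomes 0.00 here), so decimals should be chosen to suit the
data.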

Each input file can be associated with options specifying the datatype
and storage properties. These options can be specified either as
command line arguments or in a configuration file. Note that exactly
one of these approaches must be used with a single input file.

Command line arguments, best used with simple input files, can be used
to specify the class, size, and dimensions of the input data and a path
identifying the output dataset.

The recommended means of specifying input data options is in a
configuration file; this is also the only means of specifying advanced
storage features. See further discussion in "The configuration file"
below.

The only required option for input data is dimension sizes; defaults
are available for all others.

h5import will accept up to 30 input files in a single call. Other
considerations, such as the maximum length of a command line, may
impose a more stringent limitation.

Output data and options

The name of the output file is specified following the -o or -outfile
option in outfile. The data from each input file is stored as a
separate dataset in this output file. outfile may be an existing file.
If it does not yet exist, h5import will create it.

Output dataset information and storage properties can be specified only
by means of a configuration file.

Dataset path
    If the groups in the path leading to the dataset do not exist,
    h5import will create them. If no group is specified, the dataset
    will be created as a member of the root group. If no dataset name
    is specified, the default name is dataset1 for the first input
    dataset, dataset2 for the second input dataset, dataset3 for the
    third input dataset, etc. h5import does not overwrite a
    pre-existing dataset of the specified or default name. When an
    existing dataset of a conflicting name is encountered, h5import
    quits with an error; the current input file and any subsequent
    input files are not processed.

Output type
    Datatype parameters for output data.

Output data class
    Signed or unsigned integer or floating point.

Output data size
    8-, 16-, 32-, or 64-bit integer; 32- or 64-bit floating point.

Output architecture
    IEEE, STD, NATIVE (Default). Other architectures are included in
    the h5import design but are not implemented in this release.

Output byte order
    Little- or big-endian. Relevant only if output architecture is
    IEEE, UNIX, or STD; fixed for other architectures.

Dataset layout and storage properties
    Denote how raw data is to be organized on the disk. If none of the
    following are specified, the default configuration is contiguous
    layout with no compression.

Layout
    Contiguous (Default), Chunked

External storage
    Allows raw data to be stored in a non-HDF5 file or in an external
    HDF5 file. Requires contiguous layout.

Compressed
    Sets the type of compression and the level to which the dataset
    must be compressed. Requires chunked layout.

Extendible
    Allows the dimensions of the dataset to increase over time and/or
    to be unlimited. Requires chunked layout.

Compressed and extendible
    Requires chunked layout.

The configuration file

A configuration file is specified with the -c config_file option:
    h5import infile -c config_file [infile -

The configuration file is an ASCII file and must be organized as
"Configuration_Keyword Value" pairs, with one pair on each line. For
example, the line indicating that the input data class (configuration
keyword INPUT-CLASS) is floating point in a text file (value TEXTFP)
would appear as follows:
    INPUT-CLASS TEXTFP

A configuration file may have the following keywords, each followed by
one of the following defined values. One entry for each of the first
two keywords, RANK and DIMENSION-SIZES, is required; all other keywords
are optional.
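
To make the "Keyword Value" layout concrete, here is a minimal Python
sketch that reads such text into a dictionary, checks that the two
required keywords are present, and checks that the number of dimension
sizes matches RANK. The function parse_config and its error messages
are hypothetical; this is not how h5import parses the file, only an
illustration of the format described here.

```python
def parse_config(text):
    """Parse 'KEYWORD value ...' lines into a dict of value lists."""
    config = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        config[parts[0]] = parts[1:]
    for required in ("RANK", "DIMENSION-SIZES"):
        if required not in config:
            raise ValueError(f"missing required keyword: {required}")
    rank = int(config["RANK"][0])
    sizes = [int(s) for s in config["DIMENSION-SIZES"]]
    if len(sizes) != rank:
        raise ValueError(f"RANK is {rank} but {len(sizes)} sizes were given")
    return config


sample = """RANK 3
DIMENSION-SIZES 5 2 4
INPUT-CLASS TEXTFP
"""
config = parse_config(sample)
```

Running a check of this kind before calling h5import can catch a
mismatched RANK early, at the cost of duplicating rules that the tool
enforces itself.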

RANK rank
    An integer specifying the number of dimensions in the dataset.
    Example: RANK 4 for a 4-dimensional dataset.

DIMENSION-SIZES dim_sizes
    Sizes of the dataset dimensions. (Required) A string of
    space-separated integers specifying the sizes of the dimensions in
    the dataset. The number of sizes in this entry must match the value
    in the RANK entry. The fastest-changing dimension must be listed
    first. Example: DIMENSION-SIZES 4 3 4 38 for a 38x4x3x4 dataset.

PATH path
    Path of the output dataset. The full HDF5 pathname identifying the
    output dataset relative to the root group within the output file.
    That is, path is a string consisting of optional group names, each
    followed by a slash, and ending with a dataset name. If the groups
    in the path do not exist, they will be created. If PATH is not
    specified, the output dataset is stored as a member of the root
    group and the default dataset name is dataset1 for the first input
    dataset, dataset2 for the second input dataset, dataset3 for the
    third input dataset, etc. Note that h5import does not overwrite a
    pre-existing dataset of the specified or default name. When an
    existing dataset of a conflicting name is encountered, h5import
    quits with an error; the current input file and any subsequent
    input files are not processed. Example: The configuration file
    entry "PATH grp1/grp2/dataset1" indicates that the output dataset
    dataset1 will be written in the group grp2/ which is in the group
    grp1/, a member of the root group in the output file.

INPUT-CLASS {TEXTIN|TEXTUIN|TEXTFP|TEXTFPE|IN|UIN|FP|STR}
    A string denoting the type of input data.
    TEXTIN   Input is signed integer data in an ASCII file.
    TEXTUIN  Input is unsigned integer data in an ASCII file.
    TEXTFP   Input is floating point data in fixed notation (e.g.,
             325.34) in an ASCII file.
    TEXTFPE  Input is floating point data in scientific notation (e.g.,
             3.2534E+02) in an ASCII file. (Not implemented in this
             release.)
    IN       Input is signed integer data in a binary file.
    UIN      Input is unsigned integer data in a binary file.
    FP       Input is floating point data in a binary file. (Default)
    STR      Input is character data in an ASCII file. With this value,
             the configuration keywords RANK, DIMENSION-SIZES,
             OUTPUT-CLASS, OUTPUT-SIZE, OUTPUT-ARCHITECTURE, and
             OUTPUT-BYTE-ORDER will be ignored. (Not implemented in
             this release.)

INPUT-SIZE {8|16|32|64}
    An integer denoting the size of the input data, in bits. For signed
    and unsigned integer data (TEXTIN, TEXTUIN, IN, or UIN) any of 8,
    16, 32, or 64 may be used. The default is 32. For floating point
    data (TEXTFP, TEXTFPE, or FP), either 32 or 64 may be specified.
    The default is 32.

OUTPUT-CLASS {IN|UIN|FP|STR}
    A string denoting the type of output data.
    IN   Output is signed integer data. (Default if INPUT-CLASS is IN
         or TEXTIN)
    UIN  Output is unsigned integer data. (Default if INPUT-CLASS is
         UIN or TEXTUIN)
    FP   Output is floating point data. (Default if INPUT-CLASS is not
         specified or is FP, TEXTFP, or TEXTFPE)
    STR  Output is character data, to be written as a 1-dimensional
         array of strings. (Default if INPUT-CLASS is STR) (Not
         implemented in this release.)

OUTPUT-SIZE {8|16|32|64}
    An integer denoting the size of the output data, in bits. For
    signed and unsigned integer data (IN or UIN), any of the four sizes
    is valid. The default is the same as INPUT-SIZE, else 32. For
    floating point data (FP), either 32 or 64 may be specified. The
    default is the same as INPUT-SIZE, else 32.

OUTPUT-ARCHITECTURE {NATIVE|STD|IEEE|INTEL|CRAY|MIPS|ALPHA|UNIX}
    A string denoting the type of output architecture. See the
    "Predefined Atomic Types" section in the "HDF5 Datatypes" chapter
    of the HDF5 User's Guide for a discussion of these architectures.
    INTEL, CRAY, MIPS, ALPHA, and UNIX are not implemented in this
    release. (Default: NATIVE)

OUTPUT-BYTE-ORDER {BE|LE}
    A string denoting the output byte order. This entry is ignored if
    the OUTPUT-ARCHITECTURE is not specified or if it is not specified
    as IEEE, UNIX, or STD.
    BE  Big-endian. (Default)
    LE  Little-endian.
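
The default rules for OUTPUT-CLASS and OUTPUT-SIZE described above can
be summarized in a short Python sketch. The function resolve_output and
the two lookup tables are hypothetical names, and the sketch reads "the
same as INPUT-SIZE, else 32" as "use INPUT-SIZE when it is given,
otherwise 32"; that reading is an assumption, not a statement of how
h5import is implemented.

```python
INTEGER_OUTPUT = {"TEXTIN": "IN", "IN": "IN", "TEXTUIN": "UIN", "UIN": "UIN"}
FLOAT_INPUT = {"TEXTFP", "TEXTFPE", "FP"}


def resolve_output(input_class="FP", input_size=None,
                   output_class=None, output_size=None):
    """Return (output_class, output_size) after applying the defaults."""
    if output_class is None:
        if input_class in INTEGER_OUTPUT:
            output_class = INTEGER_OUTPUT[input_class]
        elif input_class in FLOAT_INPUT:
            output_class = "FP"
        elif input_class == "STR":
            output_class = "STR"
        else:
            raise ValueError(f"unknown INPUT-CLASS: {input_class}")
    if output_size is None:
        # Assumed reading: inherit INPUT-SIZE when present, else 32.
        output_size = 32 if input_size is None else input_size
    allowed = (32, 64) if output_class == "FP" else (8, 16, 32, 64)
    if output_size not in allowed:
        raise ValueError(
            f"OUTPUT-SIZE {output_size} is not valid for {output_class}")
    return output_class, output_size


defaults = resolve_output()                 # nothing specified
unsigned16 = resolve_output("TEXTUIN", 16)  # sizes inherited from input
```

With no keywords given, the sketch yields 32-bit floating point output,
which matches the FP and 32 defaults listed above.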

The following options are disabled by default, making the default
storage properties no chunking, no compression, no external storage,
and no extensible dimensions.

CHUNKED-DIMENSION-SIZES chunk_dims
    Dimension sizes of the chunk for chunked output data. A string of
    space-separated integers specifying the dimension sizes of the
    chunk for chunked output data. The number of dimensions must
    correspond to the value of RANK. The presence of this field
    indicates that the output dataset is to be stored in chunked
    layout; if this configuration field is absent, the dataset will be
    stored in contiguous layout.

COMPRESSION-TYPE {GZIP}
    Type of compression to be used with chunked storage. Requires that
    CHUNKED-DIMENSION-SIZES be specified. GZIP is gzip compression.
    Other compression algorithms are not implemented in this release of
    h5import.

COMPRESSION-PARAM [1-9]
    Compression level. Required if COMPRESSION-TYPE is specified. Gzip
    compression levels: 1 will result in the fastest compression while
    9 will result in the best compression ratio. (Default: 6 for gzip;
    not all compression methods will have a default level.)

EXTERNAL-STORAGE external_file
    Name of an external file in which to create the output dataset.
    Cannot be used with CHUNKED-DIMENSION-SIZES, COMPRESSION-TYPE, or
    MAXIMUM-DIMENSIONS. A string specifying the name of an external
    file.

MAXIMUM-DIMENSIONS max_dims
    Maximum sizes of all dimensions. Requires that
    CHUNKED-DIMENSION-SIZES be specified. A string of space-separated
    integers specifying the maximum size of each dimension of the
    output dataset. A value of -1 for any dimension implies unlimited
    size for that particular dimension. The number of dimensions must
    correspond to the value of RANK.
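
The dependencies among the storage keywords are easy to get wrong, so
the following Python sketch collects them in one place. The function
check_storage is a hypothetical helper that takes a dictionary mapping
each keyword to its value string; it only restates the rules listed
above and is not part of h5import.

```python
def check_storage(config):
    """Return a list of problems found in the storage keywords."""
    problems = []
    chunked = "CHUNKED-DIMENSION-SIZES" in config
    rank = int(config["RANK"])
    # Chunk sizes and maximum sizes need one value per dimension.
    for key in ("CHUNKED-DIMENSION-SIZES", "MAXIMUM-DIMENSIONS"):
        if key in config and len(config[key].split()) != rank:
            problems.append(f"{key} must list {rank} values")
    # Compression and extendible dimensions both need chunked layout.
    for key in ("COMPRESSION-TYPE", "MAXIMUM-DIMENSIONS"):
        if key in config and not chunked:
            problems.append(f"{key} requires CHUNKED-DIMENSION-SIZES")
    if "COMPRESSION-TYPE" in config and "COMPRESSION-PARAM" not in config:
        problems.append("COMPRESSION-TYPE requires COMPRESSION-PARAM")
    # External storage excludes the chunk-based features.
    if "EXTERNAL-STORAGE" in config:
        for key in ("CHUNKED-DIMENSION-SIZES", "COMPRESSION-TYPE",
                    "MAXIMUM-DIMENSIONS"):
            if key in config:
                problems.append(
                    f"EXTERNAL-STORAGE cannot be combined with {key}")
    return problems


ok = check_storage({
    "RANK": "3",
    "DIMENSION-SIZES": "5 2 4",
    "CHUNKED-DIMENSION-SIZES": "2 2 2",
    "MAXIMUM-DIMENSIONS": "8 8 -1",
})
bad = check_storage({
    "RANK": "2",
    "DIMENSION-SIZES": "4 4",
    "COMPRESSION-TYPE": "GZIP",
    "COMPRESSION-PARAM": "6",
})
```

Here ok is an empty list, while bad reports that compression was
requested without a chunked layout.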

-h[elp]
    Prints the h5import usage summary.

infile(s)
    Name of the input file(s).

-d[ims] dim_list
    Input data dimensions. dim_list is a string of comma-separated
    numbers with no spaces describing the dimensions of the input data.
    For example, a 50 x 100 2-dimensional array would be specified as
    -dims 50,100. Required argument: if no configuration file is used,
    this command-line argument is mandatory.

-p[athname] pathname
    pathname is a string consisting of one or more strings separated by
    slashes (/) specifying the path of the dataset in the output file.
    If the groups in the path do not exist, they will be created.
    Optional argument: if not specified, the default path is dataset1
    for the first input dataset, dataset2 for the second input dataset,
    dataset3 for the third input dataset, etc. h5import does not
    overwrite a pre-existing dataset of the specified or default name.
    When an existing dataset of a conflicting name is encountered,
    h5import quits with an error; the current input file and any
    subsequent input files are not processed.

-t[ype] input_class
    input_class specifies the class of the input data and determines
    the class of the output data. Valid values are as defined in the
    Keyword/Values table in the section "The configuration file" above.
    Optional argument: if not specified, the default value is FP.

-s[ize] input_size
    input_size specifies the size in bits of the input data and
    determines the size of the output data. Valid values for signed or
    unsigned integers are 8, 16, 32, and 64. Valid values for floating
    point data are 32 and 64. Optional argument: if not specified, the
    default value is 32.

-c config_file
    config_file specifies a configuration file. This argument replaces
    all other arguments except infile and -o outfile.

outfile
    Name of the HDF5 output file.

If the -c config_file option is used with an input file, no other
argument can be used with that input file. If the -c config_file option
is not used with an input data file, the -d dim_list argument (or -dims
dim_list) must be used and any combination of the remaining options may
be used. Any arguments used must appear in exactly the order used in
the syntax declarations above.

Note that while only the -dims argument is required, arguments must be
used in the order in which they are listed above.
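
Because the argument order matters, a script that calls h5import may
find it safer to assemble the command in one place. The Python sketch
below builds an argument list in the order used by the long-form
syntax: infile, -dims, -path, -type, -size, then -outfile. The function
build_command and its parameter names are hypothetical; only the option
names themselves come from this page.

```python
def build_command(infile, dims, outfile, path=None,
                  input_class=None, input_size=None):
    """Return an h5import argument list with options in syntax order."""
    if not dims:
        raise ValueError("at least one dimension size is required")
    # dim_list is comma-separated with no spaces.
    argv = ["h5import", infile, "-dims", ",".join(str(d) for d in dims)]
    if path is not None:
        argv += ["-path", path]
    if input_class is not None:
        argv += ["-type", input_class]
    if input_size is not None:
        argv += ["-size", str(input_size)]
    argv += ["-outfile", outfile]
    return argv


argv = build_command("infile", [20, 50], "out2", path="bin1/dset1",
                     input_class="FP", input_size=64)
```

The resulting list can be handed to subprocess.run(argv, check=True) on
a system where h5import is installed.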

Using command-line arguments:

This command creates a file out1 containing a single 2x3x4 32-bit
integer dataset. Since no pathname is specified, the dataset is stored
in out1 as /dataset1.
    h5import infile -dims 2,3,4 -type TEXTIN -size 32 -o out1

This command creates a file out2 containing a single 20x50 64-bit
floating point dataset. The dataset is stored in out2 as /bin1/dset1.
    h5import infile -dims 20,50 -path bin1/dset1 -type FP -size 64 -o out2

Sample configuration files:

The first configuration file specifies the following:

o   The input data is a 5x2x4 floating point array in an ASCII file.

o   The output dataset will be saved in chunked layout, with chunk
    dimension sizes of 2x2x2.

o   The output datatype will be 64-bit floating point, little-endian,
    IEEE.

o   The output dataset will be stored in outfile at
    /work/h5/pkamat/First-set.

o   The maximum dimension sizes of the output dataset will be
    8x8x(unlimited).

    PATH work/h5/pkamat/First-set
    INPUT-CLASS TEXTFP
    RANK 3
    DIMENSION-SIZES 5 2 4
    OUTPUT-CLASS FP
    OUTPUT-SIZE 64
    OUTPUT-ARCHITECTURE IEEE
    OUTPUT-BYTE-ORDER LE
    CHUNKED-DIMENSION-SIZES 2 2 2
    MAXIMUM-DIMENSIONS 8 8 -1

The next configuration file specifies the following:

o   The input data is a 6x3x5x2x4 integer array in a binary file.

o   The output dataset will be saved in chunked layout, with chunk
    dimension sizes of 2x2x2x2x2.

o   The output datatype will be 32-bit integer in NATIVE format (as the
    output architecture is not specified).

o   The output dataset will be compressed using Gzip compression with a
    compression level of 7.

o   The output dataset will be stored in outfile at /Second-set.

    PATH Second-set
    INPUT-CLASS IN
    RANK 5
    DIMENSION-SIZES 6 3 5 2 4
    OUTPUT-CLASS IN
    OUTPUT-SIZE 32
    CHUNKED-DIMENSION-SIZES 2 2 2 2 2
    COMPRESSION-TYPE GZIP
    COMPRESSION-PARAM 7

h5dump(1), h5ls(1), h5diff(1), h5repart(1), gif2h5(1), h52gif(1),
h5perf(1)