NCCOPY(1)                    UNIDATA UTILITIES                    NCCOPY(1)



NAME
       nccopy - Copy a netCDF file, optionally changing format, compression,
       or chunking in the output.

SYNOPSIS
       nccopy [-k kind_name] [-kind_code] [-d n] [-s] [-c chunkspec] [-u]
              [-w] [-[v|V] var1,...] [-[g|G] grp1,...] [-m bufsize]
              [-h chunk_cache] [-e cache_elems] [-r] [-F filterspec]
              [-L n] [-M n] infile outfile

DESCRIPTION
       The nccopy utility copies an input netCDF file in any supported
       format variant to an output netCDF file, optionally converting the
       output to any compatible netCDF format variant, compressing the
       data, or rechunking the data.  For example, if built with the
       netCDF-3 library, a netCDF classic file may be copied to a netCDF
       64-bit offset file, permitting larger variables.  If built with the
       netCDF-4 library, a netCDF classic file may also be copied to a
       netCDF-4 file or to a netCDF-4 classic model file, permitting data
       compression, efficient schema changes, larger variable sizes, and
       use of other netCDF-4 features.

       If no output format is specified, with either -k kind_name or
       -kind_code, then the output will use the same format as the input,
       unless the input is classic or 64-bit offset and either chunking or
       compression is specified, in which case the output will be netCDF-4
       classic model format.  Attempting a format conversion that is not
       possible will result in an error.  For example, an attempt to copy
       a netCDF-4 file that uses features of the enhanced model, such as
       groups or variable-length strings, to any of the other kinds of
       netCDF formats that use the classic model will result in an error.

       nccopy also serves as an example of a generic netCDF-4 program,
       with its ability to read any valid netCDF file and handle nested
       groups, strings, and user-defined types, including arbitrarily
       nested compound types, variable-length types, and data of any valid
       netCDF-4 type.

       If DAP support was enabled when nccopy was built, the file name may
       specify a DAP URL.  This may be used to convert data on DAP servers
       to local netCDF files.

OPTIONS
       -k kind_name
              Use a format name to specify the kind of file to be created
              and, by inference, the data model (i.e. netcdf-3 (classic)
              or netcdf-4 (enhanced)).  The possible arguments are:

                  'nc3' or 'classic' => netCDF classic format

                  'nc6' or '64-bit offset' => netCDF 64-bit offset format

                  'nc4' or 'netCDF-4' => netCDF-4 format (enhanced data
                  model)

                  'nc7' or 'netCDF-4 classic model' => netCDF-4 classic
                  model format

              Note: The old format numbers '1', '2', '3', '4', equivalent
              to the format names 'nc3', 'nc6', 'nc4', or 'nc7'
              respectively, are still accepted but deprecated, due to the
              easy confusion between format numbers and format names.
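
              For example, to copy a classic input file to a netCDF-4
              output file (file names here are illustrative):

                  nccopy -k nc4 classic.nc enhanced.nc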

       [-kind_code]
              Use a format numeric code (instead of a format name) to
              specify the kind of file to be created and, by inference,
              the data model (i.e. netcdf-3 (classic) versus netcdf-4
              (enhanced)).  The numeric codes are:

                  3 => netCDF classic format

                  6 => netCDF 64-bit offset format

                  4 => netCDF-4 format (enhanced data model)

                  7 => netCDF-4 classic model format

              The numeric code "7" is used because 7 = 3 + 4: this format
              combines the netCDF-3 data model (for compatibility) with
              the netCDF-4 storage format (for performance).  Credit is
              due to NCO for use of these numeric codes instead of the old
              and confusing format numbers.
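
              For example, the following is equivalent to using '-k nc7'
              (file names are illustrative):

                  nccopy -7 input.nc nc4classic.nc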

       -d n
              For netCDF-4 output, including netCDF-4 classic model,
              specify the deflation level (level of compression) for
              variable data in the output.  0 corresponds to no
              compression and 9 to maximum compression, with higher levels
              of compression requiring marginally more time to compress or
              uncompress than lower levels.  As a side effect, specifying
              a compression level of 0 (via "-d 0") turns off deflation
              altogether.  The compression achieved may also depend on the
              output chunking parameters.  If this option is specified for
              a classic format or 64-bit offset format input file, it is
              not necessary to also specify that the output should be
              netCDF-4 classic model, as that will be the default.  If
              this option is not specified and the input file has
              compressed variables, the compression will still be
              preserved in the output, using the same chunking as in the
              input by default.

              Note that nccopy requires all variables to be compressed
              using the same compression level, but the API has no such
              restriction.  With a program you can customize compression
              for each variable independently.
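
              For example, to request the maximum deflation level for all
              variable data in the output (file names are illustrative):

                  nccopy -d 9 input.nc output.nc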

       -s     For netCDF-4 output, including netCDF-4 classic model,
              specify shuffling of variable data bytes before compression
              or after decompression.  Shuffling refers to interlacing of
              bytes in a chunk so that the first bytes of all values are
              contiguous in storage, followed by all the second bytes, and
              so on, which often improves compression.  This option is
              ignored unless a non-zero deflation level is specified.
              Using -d0 to specify no deflation on input data that has
              been compressed and shuffled turns off both compression and
              shuffling in the output.
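
              For example, to combine shuffling with level-1 deflation
              (file names are illustrative):

                  nccopy -d 1 -s input.nc output.nc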

       -u     Convert any unlimited size dimensions in the input to fixed
              size dimensions in the output.  This can speed up
              variable-at-a-time access, but slow down record-at-a-time
              access to multiple variables along an unlimited dimension.
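
              For example, to make the record dimension fixed size in the
              copy (file names are illustrative):

                  nccopy -u records.nc fixed.nc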

       -w     Keep the output in memory (as a diskless netCDF file) until
              the output is closed, at which time the output file is
              written to disk.  This can greatly speed up operations such
              as converting unlimited dimensions to fixed size (the -u
              option), chunking, rechunking, or compressing the input.  It
              requires that available memory is large enough to hold the
              output file.  This option may provide a larger speedup than
              careful tuning of the -m, -h, or -e options, and it is
              certainly a lot simpler.
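
              For example, to convert the unlimited dimension to fixed
              size while holding the output in memory (file names are
              illustrative):

                  nccopy -w -u input.nc output.nc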

       -c chunkspec
              For netCDF-4 output, including netCDF-4 classic model,
              specify chunking (multidimensional tiling) for variable data
              in the output.  This is useful to specify the units of disk
              access, compression, or other filters such as checksums.
              Changing the chunking in a netCDF file can also greatly
              speed up access, by choosing chunk shapes that are
              appropriate for the most common access patterns.

              The chunkspec argument has several forms.  The first form is
              the original, deprecated form and is a string of
              comma-separated associations, each specifying a dimension
              name, a '/' character, and optionally the corresponding
              chunk length for that dimension.  No blanks should appear in
              the chunkspec string, except possibly escaped blanks that
              are part of a dimension name.  A chunkspec names at least
              one dimension, and may omit dimensions which are not to be
              chunked or for which the default chunk length is desired.
              If a dimension name is followed by a '/' character but no
              subsequent chunk length, the actual dimension length is
              assumed.  If copying a classic model file to a netCDF-4
              output file and not naming all dimensions in the chunkspec,
              unnamed dimensions will also use the actual dimension length
              for the chunk length.  An example of a chunkspec for
              variables that use 'm' and 'n' dimensions might be
              'm/100,n/200' to specify 100 by 200 chunks.  To see the
              chunking resulting from copying with a chunkspec, use the
              '-s' option of ncdump on the output file.

              The chunkspec '/' that omits all dimension names and
              corresponding chunk lengths specifies that no chunking is to
              occur in the output, so it can be used to unchunk all the
              chunked variables.
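
              For example, to remove chunking from all chunked variables
              when copying (file names are illustrative):

                  nccopy -c / chunked.nc unchunked.nc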

              As an I/O optimization, nccopy has a threshold for the
              minimum size of non-record variables that get chunked,
              currently 8192 bytes.  The -M flag can be used to override
              this value.

              Note that nccopy requires variables that share a dimension
              to also share the chunk size associated with that dimension,
              but the programming interface has no such restriction.  If
              you need to customize chunking for variables independently,
              you will need to use the second form of chunkspec.  This
              second form has the syntax var:n1,n2,...,nn, which assumes
              that the variable named "var" has rank n.  The chunking to
              be applied to each dimension of the variable is specified by
              the values of n1 through nn.  This second form of chunking
              specification can be repeated multiple times to specify the
              exact chunking for different variables.  If a variable is
              specified but no chunk sizes are given (i.e. -c var: ), then
              chunking is disabled for that variable.  If the same
              variable is specified more than once, the second and later
              specifications are ignored.  Also, this second form,
              per-variable chunking, takes precedence over any
              per-dimension chunking except the bare "/" case.
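
              For example, assuming the input contains a 3-D variable
              named 'tas' with dimensions time, lat, and lon (names here
              are illustrative), its chunking could be set independently
              of other variables with:

                  nccopy -c 'tas:100,40,40' input.nc output.nc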

              The third form of the chunkspec has the syntax var:compact
              or var:contiguous.  This explicitly attempts to set the
              variable storage type as compact or contiguous,
              respectively.  These may be overridden if other flags
              require the variable to be chunked.
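
              For example, assuming a small variable named 'station_name'
              (name is illustrative), contiguous storage could be
              requested with:

                  nccopy -c station_name:contiguous input.nc output.nc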

       -v var1,...
              The output will include data values for the specified
              variables, in addition to the declarations of all
              dimensions, variables, and attributes.  One or more
              variables must be specified by name in the comma-delimited
              list following this option.  The list must be a single
              argument to the command, and hence cannot contain unescaped
              blanks or other white space characters.  The named variables
              must be valid netCDF variables in the input file.  A
              variable within a group in a netCDF-4 file may be specified
              with an absolute path name, such as "/GroupA/GroupA2/var".
              Use of a relative path name such as 'var' or "grp/var"
              specifies all matching variable names in the file.  The
              default, without this option, is to include data values for
              all variables in the output.
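
              For example, to copy all declarations but write data only
              for the variables time and lat (variable and file names are
              illustrative):

                  nccopy -v time,lat input.nc subset.nc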

       -V var1,...
              The output will include only the specified variables, but
              all dimensions and global or group attributes.  One or more
              variables must be specified by name in the comma-delimited
              list following this option.  The list must be a single
              argument to the command, and hence cannot contain unescaped
              blanks or other white space characters.  The named variables
              must be valid netCDF variables in the input file.  A
              variable within a group in a netCDF-4 file may be specified
              with an absolute path name, such as '/GroupA/GroupA2/var'.
              Use of a relative path name such as 'var' or 'grp/var'
              specifies all matching variable names in the file.  The
              default, without this option, is to include all variables in
              the output.

       -g grp1,...
              The output will include data values only for the specified
              groups.  One or more groups must be specified by name in the
              comma-delimited list following this option.  The list must
              be a single argument to the command.  The named groups must
              be valid netCDF groups in the input file.  The default,
              without this option, is to include data values for all
              groups in the output.

       -G grp1,...
              The output will include only the specified groups.  One or
              more groups must be specified by name in the comma-delimited
              list following this option.  The list must be a single
              argument to the command.  The named groups must be valid
              netCDF groups in the input file.  The default, without this
              option, is to include all groups in the output.
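
              For example, to copy only a group named GroupA and its
              contents (group and file names are illustrative):

                  nccopy -G GroupA input.nc groupA.nc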

       -m bufsize
              An integer or floating-point number that specifies the size,
              in bytes, of the copy buffer used to copy large variables.
              A suffix of K, M, G, or T multiplies the copy buffer size by
              one thousand, million, billion, or trillion, respectively.
              The default is 5 Mbytes, but will be increased if necessary
              to hold at least one chunk of netCDF-4 chunked variables in
              the input file.  You may want to specify a value larger than
              the default for copying large files over high latency
              networks.  Using the '-w' option may provide better
              performance, if the output fits in memory.
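
              For example, to use a 64-megabyte copy buffer (file names
              are illustrative):

                  nccopy -m 64M input.nc output.nc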

       -h chunk_cache
              For netCDF-4 output, including netCDF-4 classic model, an
              integer or floating-point number that specifies the size in
              bytes of the chunk cache allocated for each chunked
              variable.  This is not a property of the file, but merely a
              performance tuning parameter for avoiding compressing or
              decompressing the same data multiple times while copying and
              changing chunk shapes.  A suffix of K, M, G, or T multiplies
              the chunk cache size by one thousand, million, billion, or
              trillion, respectively.  The default is 4.194304 Mbytes (or
              whatever was specified for the configure-time constant
              CHUNK_CACHE_SIZE when the netCDF library was built).
              Ideally, the nccopy utility should accept only one memory
              buffer size and divide it optimally between a copy buffer
              and chunk cache, but no general algorithm for computing the
              optimum chunk cache size has been implemented yet.  Using
              the '-w' option may provide better performance, if the
              output fits in memory.
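
              For example, to allocate a 256-megabyte chunk cache per
              chunked variable while rechunking (file and dimension names
              are illustrative):

                  nccopy -h 256M -c time/1000 input.nc output.nc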

       -e cache_elems
              For netCDF-4 output, including netCDF-4 classic model,
              specifies the number of chunks that the chunk cache can
              hold.  A suffix of K, M, G, or T multiplies the number of
              chunks that can be held in the cache by one thousand,
              million, billion, or trillion, respectively.  This is not a
              property of the file, but merely a performance tuning
              parameter for avoiding compressing or decompressing the same
              data multiple times while copying and changing chunk shapes.
              The default is 1009 (or whatever was specified for the
              configure-time constant CHUNK_CACHE_NELEMS when the netCDF
              library was built).  Ideally, the nccopy utility should
              determine an optimum value for this parameter, but no
              general algorithm for computing the optimum number of chunk
              cache elements has been implemented yet.
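
              For example, to let the chunk cache hold up to four thousand
              chunks (file names are illustrative):

                  nccopy -e 4K input.nc output.nc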

       -r     Read a netCDF classic or 64-bit offset input file into a
              diskless netCDF file in memory before copying.  Requires
              that the input file be small enough to fit into memory.  For
              nccopy, this doesn't seem to provide any significant
              speedup, so it may not be a useful option.

       -L n   Set the log level; only usable if nccopy supports netCDF-4
              (enhanced).

       -M n   Set the minimum chunk size, in bytes; only usable if nccopy
              supports netCDF-4 (enhanced).

       -F filterspec
              For netCDF-4 output, including netCDF-4 classic model,
              specify a filter to apply to a specified set of variables in
              the output.  As a rule, the filter is a
              compression/decompression algorithm with a unique numeric
              identifier assigned by the HDF Group (see
              https://support.hdfgroup.org/services/filters.html).

              The filterspec argument has this general form:

                  fqn1|fqn2...,filterid,param1,param2...paramn
              or
                  *,filterid,param1,param2...paramn

              An fqn (fully qualified name) is the name of a variable
              prefixed by its containing groups, with the group names
              separated by forward slashes ('/').  An example might be
              /g1/g2/var.  Alternatively, just the variable name can be
              given if it is in the root group: e.g. var.  Backslash
              escapes may be used as needed.  A note of warning: the '|'
              separator is a shell special character, so you will probably
              need to put the filter spec in some kind of quotes or
              otherwise escape it.

              The filterid is an unsigned positive integer representing
              the id assigned by the HDF Group to the filter.  Following
              the id is a sequence of parameters defining the operation of
              the filter.  Each parameter is a 32-bit unsigned integer.

              This option may be repeated multiple times with different
              variable names.
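
              For example, assuming the bzip2 filter (HDF-registered
              filter id 307) is available and takes a single block-size
              parameter, it could be applied to a variable /g1/var with
              (file and variable names are illustrative):

                  nccopy -F '/g1/var,307,9' input.nc output.nc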


EXAMPLES
       Make a copy of foo1.nc, a netCDF file of any type, to foo2.nc, a
       netCDF file of the same type:

           nccopy foo1.nc foo2.nc

       Note that the above copy will not be as fast as use of cp or other
       simple copy utility, because the file is copied using only the
       netCDF API.  If the input file has extra bytes after the end of the
       netCDF data, those will not be copied, because they are not
       accessible through the netCDF interface.  If the original file was
       generated in "No fill" mode so that fill values are not stored for
       padding for data alignment, the output file may have different
       padding bytes.

       Convert a netCDF-4 classic model file, compressed.nc, that uses
       compression, to a netCDF-3 file, classic.nc:

           nccopy -k classic compressed.nc classic.nc

       Note that 'nc3' could be used instead of 'classic'.

       Download the variable 'time_bnds' and its associated attributes
       from an OPeNDAP server and copy the result to a netCDF file named
       'tb.nc':

           nccopy 'http://test.opendap.org/opendap/data/nc/sst.mnmean.nc.gz?time_bnds' tb.nc

       Note that URLs that name specific variables as command-line
       arguments should generally be quoted, to avoid the shell
       interpreting special characters such as '?'.

       Compress all the variables in the input file foo.nc, a netCDF file
       of any type, to the output file bar.nc:

           nccopy -d1 foo.nc bar.nc

       If foo.nc was a classic or 64-bit offset netCDF file, bar.nc will
       be a netCDF-4 classic model netCDF file, because the classic and
       64-bit offset format variants don't support compression.  If foo.nc
       was a netCDF-4 file with some variables compressed using various
       deflation levels, the output will also be a netCDF-4 file of the
       same type, but all the variables, including any uncompressed
       variables in the input, will now use deflation level 1.

       Assume the input data includes gridded variables that use time,
       lat, and lon dimensions, with 1000 times by 1000 latitudes by 1000
       longitudes, and that the time dimension varies most slowly.  Also
       assume that users want quick access to data at all times for a
       small set of lat-lon points.  Accessing data for 1000 times would
       typically require accessing 1000 disk blocks, which may be slow.

       Reorganizing the data into chunks on disk that have all the time in
       each chunk for a few lat and lon coordinates would greatly speed up
       such access.  To chunk the data in the input file slow.nc, a netCDF
       file of any type, to the output file fast.nc, you could use:

           nccopy -c time/1000,lat/40,lon/40 slow.nc fast.nc

       to specify data chunks of 1000 times, 40 latitudes, and 40
       longitudes.  If you had enough memory to contain the output file,
       you could speed up the rechunking operation significantly by
       creating the output in memory before writing it to disk on close
       (using the -w flag):

           nccopy -w -c time/1000,lat/40,lon/40 slow.nc fast.nc

       Alternatively, one could write this using the alternate,
       variable-specific chunking specification, assuming that time, lat,
       and lon are also variables:

           nccopy -c time:1000 -c lat:40 -c lon:40 slow.nc fast.nc

CHUNKING RULES
       The complete set of chunking rules is captured here.  As a rough
       summary, these rules preserve all chunking properties from the
       input file.  These rules apply only when the selected output format
       supports chunking, i.e. for the netCDF-4 variants.

       The variable-specific chunking specification is straightforward and
       translates directly to the corresponding "nc_def_var_chunking" API
       call.

       The original per-dimension chunking specification requires some
       interpretation by nccopy.  The following rules are applied in the
       given order independently for each variable to be copied from input
       to output.  The rules are written assuming we are trying to
       determine the chunking for a given output variable Vout that comes
       from an input variable Vin.

       1.  If there is no '-c' option that applies to a variable and the
           corresponding input variable is contiguous or the input is some
           netCDF-3 variant, then let the netcdf-c library make all
           chunking decisions.

       2.  For each dimension of Vout explicitly specified on the command
           line (using the '-c' option), apply the chunking value for that
           dimension regardless of input format or input properties.

       3.  For dimensions of Vout not named on the command line in a '-c'
           option, preserve chunk sizes from the corresponding input
           variable, if it is chunked.

       4.  If Vin is contiguous, and none of its dimensions are named on
           the command line, and chunking is not mandated by other
           options, then make Vout contiguous.

       5.  If the input variable is contiguous (or is some netCDF-3
           variant) and there are no options requiring chunking, or the
           '/' special case for the '-c' option is specified, then the
           output variable Vout is marked as contiguous.

       6.  Final, default case: some or all chunk sizes are not determined
           by the command line or the input variable.  This includes the
           non-chunked input cases such as netCDF-3, CDF5, and DAP.  In
           these cases, retain all chunk sizes determined by previous
           rules, and use the full dimension size as the default.  The
           exception is unlimited dimensions, where the default chunk size
           is 4 megabytes.

SEE ALSO
       ncdump(1), ncgen(1), netcdf(3)



Release 4.2                       2012-03-08                      NCCOPY(1)