NCCOPY(1)                      UNIDATA UTILITIES                     NCCOPY(1)



NAME
       nccopy - Copy a netCDF file, optionally changing format, compression,
       or chunking in the output.

SYNOPSIS
       nccopy [-k kind_name] [-kind_code] [-d n] [-s] [-c chunkspec] [-u]
              [-w] [-[v|V] var1,...] [-[g|G] grp1,...] [-m bufsize]
              [-h chunk_cache] [-e cache_elems] [-r] [-F filterspec]
              [-L n] [-M n] infile outfile

DESCRIPTION
       The nccopy utility copies an input netCDF file in any supported
       format variant to an output netCDF file, optionally converting the
       output to any compatible netCDF format variant, compressing the
       data, or rechunking the data.  For example, if built with the
       netCDF-3 library, a netCDF classic file may be copied to a netCDF
       64-bit offset file, permitting larger variables.  If built with the
       netCDF-4 library, a netCDF classic file may also be copied to a
       netCDF-4 file or to a netCDF-4 classic model file, permitting data
       compression, efficient schema changes, larger variable sizes, and
       use of other netCDF-4 features.

       If no output format is specified with either -k kind_name or
       -kind_code, the output uses the same format as the input, unless the
       input is classic or 64-bit offset and either chunking or compression
       is specified, in which case the output will be netCDF-4 classic
       model format.  Some kinds of format conversion are not possible and
       result in an error.  For example, an attempt to copy a netCDF-4 file
       that uses features of the enhanced model, such as groups or
       variable-length strings, to any of the netCDF formats that use the
       classic model will fail.

       nccopy also serves as an example of a generic netCDF-4 program: it
       can read any valid netCDF file and handle nested groups, strings,
       and user-defined types, including arbitrarily nested compound types,
       variable-length types, and data of any valid netCDF-4 type.

       If DAP support was enabled when nccopy was built, the file name may
       specify a DAP URL.  This may be used to convert data on DAP servers
       to local netCDF files.

OPTIONS
       -k kind_name
              Use a format name to specify the kind of file to be created
              and, by inference, the data model (i.e. netCDF-3 (classic)
              versus netCDF-4 (enhanced)).  The possible arguments are:

                  'nc3' or 'classic' => netCDF classic format

                  'nc6' or '64-bit offset' => netCDF 64-bit offset format

                  'nc4' or 'netCDF-4' => netCDF-4 format (enhanced data
                  model)

                  'nc7' or 'netCDF-4 classic model' => netCDF-4 classic
                  model format

              Note: the old format numbers '1', '2', '3', '4', equivalent
              to the format names 'nc3', 'nc6', 'nc4', and 'nc7'
              respectively, are still accepted but deprecated, due to easy
              confusion between format numbers and format names.
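              For example, to request netCDF-4 classic model output (the
              file names here are illustrative):

                  nccopy -k nc7 foo.nc foo_nc4classic.nc

              Multi-word kind names such as '64-bit offset' must be quoted
              so the shell passes them as a single argument.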

       [-kind_code]
              Use a numeric format code (instead of a format name) to
              specify the kind of file to be created and, by inference, the
              data model (i.e. netCDF-3 (classic) versus netCDF-4
              (enhanced)).  The numeric codes are:

                  3 => netCDF classic format

                  6 => netCDF 64-bit offset format

                  4 => netCDF-4 format (enhanced data model)

                  7 => netCDF-4 classic model format

              The numeric code "7" is used because "7 = 3 + 4": it
              specifies the format that uses the netCDF-3 data model (for
              compatibility) with the netCDF-4 storage format (for
              performance).  Credit is due to NCO for use of these numeric
              codes instead of the old and confusing format numbers.
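              For example, these two invocations request the same output
              kind, netCDF-4 classic model (file names here are
              illustrative):

                  nccopy -7 foo.nc bar.nc
                  nccopy -k nc7 foo.nc bar.nc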

       -d n
              For netCDF-4 output, including netCDF-4 classic model,
              specify the deflation level (level of compression) for
              variable data output.  0 corresponds to no compression and 9
              to maximum compression, with higher levels of compression
              requiring marginally more time to compress or uncompress than
              lower levels.  The compression achieved may also depend on
              the output chunking parameters.  If this option is specified
              for a classic format or 64-bit offset format input file, it
              is not necessary to also specify that the output should be
              netCDF-4 classic model, as that will be the default.  If this
              option is not specified and the input file has compressed
              variables, the compression will still be preserved in the
              output, using the same chunking as in the input by default.

              Note that nccopy requires all variables to be compressed
              using the same compression level, but the API has no such
              restriction.  With a program you can customize compression
              for each variable independently.

       -s     For netCDF-4 output, including netCDF-4 classic model,
              specify shuffling of variable data bytes before compression
              or after decompression.  Shuffling refers to interlacing of
              bytes in a chunk so that the first bytes of all values are
              contiguous in storage, followed by all the second bytes, and
              so on, which often improves compression.  This option is
              ignored unless a non-zero deflation level is specified.
              Using -d0 to specify no deflation on input data that has been
              compressed and shuffled turns off both compression and
              shuffling in the output.
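              For example, to compress all variable data at deflation level
              1 with byte shuffling enabled (file names here are
              illustrative):

                  nccopy -d1 -s foo.nc foo_compressed.nc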

       -u     Convert any unlimited size dimensions in the input to fixed
              size dimensions in the output.  This can speed up
              variable-at-a-time access, but slow down record-at-a-time
              access to multiple variables along an unlimited dimension.
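              For example (file names here are illustrative):

                  nccopy -u records.nc fixed.nc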

       -w     Keep the output in memory (as a diskless netCDF file) until
              the output is closed, at which time the output file is
              written to disk.  This can greatly speed up operations such
              as converting an unlimited dimension to fixed size (the -u
              option), chunking, rechunking, or compressing the input.  It
              requires that available memory be large enough to hold the
              output file.  This option may provide a larger speedup than
              careful tuning of the -m, -h, or -e options, and it's
              certainly a lot simpler.
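              For example, to convert an unlimited dimension to fixed size
              while building the output in memory (file names here are
              illustrative):

                  nccopy -w -u records.nc fixed.nc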

       -c chunkspec
              For netCDF-4 output, including netCDF-4 classic model,
              specify chunking (multidimensional tiling) for variable data
              in the output.  This is useful to specify the units of disk
              access, compression, or other filters such as checksums.
              Changing the chunking in a netCDF file can also greatly speed
              up access, by choosing chunk shapes that are appropriate for
              the most common access patterns.

              The chunkspec argument has two forms.  The first form is the
              original, deprecated form: a string of comma-separated
              associations, each specifying a dimension name, a '/'
              character, and optionally the corresponding chunk length for
              that dimension.  No blanks should appear in the chunkspec
              string, except possibly escaped blanks that are part of a
              dimension name.  A chunkspec names at least one dimension,
              and may omit dimensions which are not to be chunked or for
              which the default chunk length is desired.  If a dimension
              name is followed by a '/' character but no subsequent chunk
              length, the actual dimension length is assumed.  If copying a
              classic model file to a netCDF-4 output file and not naming
              all dimensions in the chunkspec, unnamed dimensions will also
              use the actual dimension length for the chunk length.  An
              example of a chunkspec for variables that use 'm' and 'n'
              dimensions might be 'm/100,n/200' to specify 100 by 200
              chunks.  To see the chunking resulting from copying with a
              chunkspec, use the '-s' option of ncdump on the output file.

              The chunkspec '/' that omits all dimension names and
              corresponding chunk lengths specifies that no chunking is to
              occur in the output, so it can be used to unchunk all the
              chunked variables.

              As an I/O optimization, nccopy has a threshold for the
              minimum size of non-record variables that get chunked,
              currently 8192 bytes.  The -M flag can be used to override
              this value.

              Note that nccopy requires variables that share a dimension to
              also share the chunk size associated with that dimension, but
              the programming interface has no such restriction.  If you
              need to customize chunking for variables independently, use
              the second form of chunkspec, which has this syntax:
              var:n1,n2,...,nn .  This assumes that the variable named
              "var" has rank n.  The chunking to be applied to each
              dimension of the variable is specified by the values of n1
              through nn.  This second form of chunking specification can
              be repeated multiple times to specify the exact chunking for
              different variables.  If the variable is specified but no
              chunk sizes are given (i.e. -c var: ), then chunking is
              disabled for that variable.  If the same variable is
              specified more than once, the second and later specifications
              are ignored.  Also, this second form, per-variable chunking,
              takes precedence over any per-dimension chunking except the
              bare '/' case.
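              For example, to give the (illustrative) rank-3 variable
              'prec' 100 by 16 by 16 chunks while disabling chunking for
              the variable 'station_name', and, separately, to unchunk
              every variable in a file:

                  nccopy -c prec:100,16,16 -c station_name: in.nc out.nc
                  nccopy -c / chunked.nc unchunked.nc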

       -v var1,...
              The output will include data values for the specified
              variables, in addition to the declarations of all dimensions,
              variables, and attributes.  One or more variables must be
              specified by name in the comma-delimited list following this
              option.  The list must be a single argument to the command,
              hence it cannot contain unescaped blanks or other white space
              characters.  The named variables must be valid netCDF
              variables in the input file.  A variable within a group in a
              netCDF-4 file may be specified with an absolute path name,
              such as '/GroupA/GroupA2/var'.  Use of a relative path name
              such as 'var' or 'grp/var' specifies all matching variable
              names in the file.  The default, without this option, is to
              include data values for all variables in the output.
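              For example, to copy the full schema but write data values
              only for the (illustrative) variables 'time' and 'tas':

                  nccopy -v time,tas in.nc subset.nc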

       -V var1,...
              The output will include only the specified variables, along
              with all dimensions and global or group attributes.  One or
              more variables must be specified by name in the
              comma-delimited list following this option.  The list must be
              a single argument to the command, hence it cannot contain
              unescaped blanks or other white space characters.  The named
              variables must be valid netCDF variables in the input file.
              A variable within a group in a netCDF-4 file may be specified
              with an absolute path name, such as '/GroupA/GroupA2/var'.
              Use of a relative path name such as 'var' or 'grp/var'
              specifies all matching variable names in the file.  The
              default, without this option, is to include all variables in
              the output.
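              For example, to produce an output file whose only variable is
              the (illustrative) variable 'tas'; by contrast, -v tas would
              keep the declarations of every variable but omit their data:

                  nccopy -V tas in.nc tas_only.nc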

       -g grp1,...
              The output will include data values only for the specified
              groups.  One or more groups must be specified by name in the
              comma-delimited list following this option.  The list must be
              a single argument to the command.  The named groups must be
              valid netCDF groups in the input file.  The default, without
              this option, is to include data values for all groups in the
              output.

       -G grp1,...
              The output will include only the specified groups.  One or
              more groups must be specified by name in the comma-delimited
              list following this option.  The list must be a single
              argument to the command.  The named groups must be valid
              netCDF groups in the input file.  The default, without this
              option, is to include all groups in the output.
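              For example, to keep only the (illustrative) group 'GroupA'
              in the output:

                  nccopy -G GroupA in.nc groupA_only.nc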

       -m bufsize
              An integer or floating-point number that specifies the size,
              in bytes, of the copy buffer used to copy large variables.  A
              suffix of K, M, G, or T multiplies the copy buffer size by
              one thousand, million, billion, or trillion, respectively.
              The default is 5 Mbytes, but it will be increased if
              necessary to hold at least one chunk of netCDF-4 chunked
              variables in the input file.  You may want to specify a value
              larger than the default for copying large files over
              high-latency networks.  Using the '-w' option may provide
              better performance, if the output fits in memory.
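              For example, to use a 100 Mbyte copy buffer (file names here
              are illustrative):

                  nccopy -m 100M remote_input.nc local_copy.nc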

       -h chunk_cache
              For netCDF-4 output, including netCDF-4 classic model, an
              integer or floating-point number that specifies the size in
              bytes of the chunk cache allocated for each chunked variable.
              This is not a property of the file, but merely a performance
              tuning parameter for avoiding compressing or decompressing
              the same data multiple times while copying and changing chunk
              shapes.  A suffix of K, M, G, or T multiplies the chunk cache
              size by one thousand, million, billion, or trillion,
              respectively.  The default is 4.194304 Mbytes (or whatever
              was specified for the configure-time constant
              CHUNK_CACHE_SIZE when the netCDF library was built).
              Ideally, the nccopy utility should accept only one memory
              buffer size and divide it optimally between a copy buffer and
              chunk cache, but no general algorithm for computing the
              optimum chunk cache size has been implemented yet.  Using the
              '-w' option may provide better performance, if the output
              fits in memory.
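              For example, to give each chunked variable a 128 Mbyte chunk
              cache while rechunking (dimension and file names here are
              illustrative):

                  nccopy -h 128M -c time/1000,lat/40,lon/40 slow.nc fast.nc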

       -e cache_elems
              For netCDF-4 output, including netCDF-4 classic model,
              specifies the number of chunks that the chunk cache can hold.
              A suffix of K, M, G, or T multiplies the number of chunks
              that can be held in the cache by one thousand, million,
              billion, or trillion, respectively.  This is not a property
              of the file, but merely a performance tuning parameter for
              avoiding compressing or decompressing the same data multiple
              times while copying and changing chunk shapes.  The default
              is 1009 (or whatever was specified for the configure-time
              constant CHUNK_CACHE_NELEMS when the netCDF library was
              built).  Ideally, the nccopy utility should determine an
              optimum value for this parameter, but no general algorithm
              for computing the optimum number of chunk cache elements has
              been implemented yet.
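              For example, to let the chunk cache hold up to 4000 chunks
              (the value and file names here are illustrative):

                  nccopy -e 4K in.nc out.nc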

       -r     Read a netCDF classic or 64-bit offset input file into a
              diskless netCDF file in memory before copying.  Requires that
              the input file be small enough to fit into memory.  For
              nccopy, this doesn't seem to provide any significant speedup,
              so it may not be a useful option.

       -L n   Set the log level; only usable if nccopy supports netCDF-4
              (enhanced).

       -M n   Set the minimum chunk size, in bytes; only usable if nccopy
              supports netCDF-4 (enhanced).
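              For example, setting the minimum to 0 removes the 8192-byte
              threshold mentioned under -c, so that even small non-record
              variables are chunked according to the chunkspec (a sketch;
              dimension and file names are illustrative):

                  nccopy -M 0 -c m/100,n/200 in.nc out.nc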

       -F filterspec
              For netCDF-4 output, including netCDF-4 classic model,
              specify a filter to apply to a specified set of variables in
              the output.  As a rule, the filter is a
              compression/decompression algorithm with a unique numeric
              identifier assigned by the HDF Group (see
              https://support.hdfgroup.org/services/filters.html).

              The filterspec argument has this general form:

                  fqn1|fqn2...,filterid,param1,param2,...,paramn
              or
                  *,filterid,param1,param2,...,paramn

              An fqn (fully qualified name) is the name of a variable
              prefixed by its containing groups, with the group names
              separated by forward slashes ('/').  An example might be
              /g1/g2/var.  Alternatively, just the variable name can be
              given if it is in the root group: e.g. var.  Backslash
              escapes may be used as needed.  A note of warning: the '|'
              separator is special to the shell, so you will probably need
              to put the filter spec in some kind of quotes or otherwise
              escape it.

              The filterid is an unsigned positive integer representing the
              id assigned by the HDF Group to the filter.  Following the id
              is a sequence of parameters defining the operation of the
              filter.  Each parameter is a 32-bit unsigned integer.

              This option may be repeated multiple times with different
              variable names.
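              For example, to request a registered filter for two
              (illustrative) variables, one of them inside a group; the
              filter id 307 and its single parameter are also illustrative
              and must correspond to a filter plugin that is actually
              available:

                  nccopy -F '/g1/var1|var2,307,9' in.nc out.nc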


EXAMPLES
       Make a copy of foo1.nc, a netCDF file of any type, to foo2.nc, a
       netCDF file of the same type:

           nccopy foo1.nc foo2.nc

       Note that the above copy will not be as fast as use of cp or another
       simple copy utility, because the file is copied using only the
       netCDF API.  If the input file has extra bytes after the end of the
       netCDF data, those will not be copied, because they are not
       accessible through the netCDF interface.  If the original file was
       generated in "no fill" mode, so that fill values are not stored as
       padding for data alignment, the output file may have different
       padding bytes.

       Convert a netCDF-4 classic model file, compressed.nc, that uses
       compression, to a netCDF-3 classic file, classic.nc:

           nccopy -k classic compressed.nc classic.nc

       Note that 'nc3' could be used instead of 'classic'.

       Download the variable 'time_bnds' and its associated attributes from
       an OPeNDAP server and copy the result to a netCDF file named
       'tb.nc':

           nccopy 'http://test.opendap.org/opendap/data/nc/sst.mnmean.nc.gz?time_bnds' tb.nc

       Note that URLs that name specific variables as command-line
       arguments should generally be quoted, to avoid the shell
       interpreting special characters such as '?'.

       Compress all the variables in the input file foo.nc, a netCDF file
       of any type, to the output file bar.nc:

           nccopy -d1 foo.nc bar.nc

       If foo.nc was a classic or 64-bit offset netCDF file, bar.nc will be
       a netCDF-4 classic model netCDF file, because the classic and 64-bit
       offset format variants don't support compression.  If foo.nc was a
       netCDF-4 file with some variables compressed using various deflation
       levels, the output will also be a netCDF-4 file of the same type,
       but all the variables, including any uncompressed variables in the
       input, will now use deflation level 1.

       Assume the input data includes gridded variables that use time, lat,
       and lon dimensions, with 1000 times by 1000 latitudes by 1000
       longitudes, and that the time dimension varies most slowly.  Also
       assume that users want quick access to data at all times for a small
       set of lat-lon points.  Accessing data for 1000 times would
       typically require accessing 1000 disk blocks, which may be slow.

       Reorganizing the data into chunks on disk that have all the time in
       each chunk for a few lat and lon coordinates would greatly speed up
       such access.  To chunk the data in the input file slow.nc, a netCDF
       file of any type, to the output file fast.nc, you could use:

           nccopy -c time/1000,lat/40,lon/40 slow.nc fast.nc

       to specify data chunks of 1000 times, 40 latitudes, and 40
       longitudes.  If you had enough memory to contain the output file,
       you could speed up the rechunking operation significantly by
       creating the output in memory before writing it to disk on close
       (using the -w flag):

           nccopy -w -c time/1000,lat/40,lon/40 slow.nc fast.nc

       Alternatively, one could write this using the variable-specific form
       of the chunking specification, assuming that time, lat, and lon are
       variables:

           nccopy -c time:1000 -c lat:40 -c lon:40 slow.nc fast.nc

CHUNKING RULES
       The complete set of chunking rules is captured here.  As a rough
       summary, these rules preserve all chunking properties from the input
       file.  These rules apply only when the selected output format
       supports chunking, i.e. for the netCDF-4 variants.

       The variable-specific chunking specification should be obvious and
       translates directly to the corresponding nc_def_var_chunking API
       call.

       The original per-dimension chunking specification requires some
       interpretation by nccopy.  The following rules are applied in the
       given order, independently for each variable to be copied from input
       to output.  The rules are written assuming we are trying to
       determine the chunking for a given output variable Vout that comes
       from an input variable Vin.  A short worked example follows the
       rules.

       1.  If there is no '-c' option that applies to a variable, and the
           corresponding input variable is contiguous or the input is some
           netCDF-3 variant, then let the netcdf-c library make all
           chunking decisions.

       2.  For each dimension of Vout explicitly specified on the command
           line (using the '-c' option), apply the chunking value for that
           dimension regardless of input format or input properties.

       3.  For dimensions of Vout not named on the command line in a '-c'
           option, preserve the chunk sizes from the corresponding input
           variable, if it is chunked.

       4.  If Vin is contiguous, and none of its dimensions are named on
           the command line, and chunking is not mandated by other options,
           then make Vout contiguous.

       5.  If the input variable is contiguous (or is some netCDF-3
           variant) and there are no options requiring chunking, or the '/'
           special case for the '-c' option is specified, then the output
           variable Vout is marked as contiguous.

       6.  Finally, the default case: some or all chunk sizes are not
           determined by the command line or the input variable.  This
           includes the non-chunked input cases such as netCDF-3, CDF5, and
           DAP.  In these cases, retain all chunk sizes determined by the
           previous rules, and use the full dimension size as the default.
           The exception is unlimited dimensions, where the default is 4
           megabytes.
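       As an illustration of rules 2 and 3 (the file names, dimensions, and
       chunk sizes here are hypothetical): if an input variable is chunked
       as time=100, lat=50, lon=50, then

           nccopy -c lat/10 in.nc out.nc

       produces output chunks of time=100, lat=10, lon=50, because the lat
       chunk length is taken from the command line (rule 2) while the time
       and lon chunk lengths are preserved from the input (rule 3).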

SEE ALSO
       ncdump(1), ncgen(1), netcdf(3)



Release 4.2                       2012-03-08                         NCCOPY(1)