NCCOPY(1)                      UNIDATA UTILITIES                     NCCOPY(1)



NAME
       nccopy - Copy a netCDF file, optionally changing format, compression,
       or chunking in the output.

SYNOPSIS
       nccopy [-k kind_name] [-kind_code] [-d n] [-s] [-c chunkspec] [-u]
              [-w] [-[v|V] var1,...] [-[g|G] grp1,...] [-m bufsize]
              [-h chunk_cache] [-e cache_elems] [-r] [-F filterspec]
              [-L n] [-M n] infile outfile

DESCRIPTION
       The nccopy utility copies an input netCDF file in any supported
       format variant to an output netCDF file, optionally converting the
       output to any compatible netCDF format variant, compressing the
       data, or rechunking the data.  For example, if built with the
       netCDF-3 library, a netCDF classic file may be copied to a netCDF
       64-bit offset file, permitting larger variables.  If built with the
       netCDF-4 library, a netCDF classic file may also be copied to a
       netCDF-4 file or to a netCDF-4 classic model file, permitting data
       compression, efficient schema changes, larger variable sizes, and
       use of other netCDF-4 features.

       If no output format is specified with either -k kind_name or
       -kind_code, the output uses the same format as the input, unless the
       input is classic or 64-bit offset and either chunking or compression
       is specified, in which case the output will be netCDF-4 classic
       model format.  Some kinds of format conversion are not possible and
       result in an error.  For example, an attempt to copy a netCDF-4 file
       that uses features of the enhanced model, such as groups or
       variable-length strings, to any of the netCDF formats that use the
       classic model will fail.

       nccopy also serves as an example of a generic netCDF-4 program: it
       can read any valid netCDF file and handle nested groups, strings,
       and user-defined types, including arbitrarily nested compound types,
       variable-length types, and data of any valid netCDF-4 type.

       If DAP support was enabled when nccopy was built, the file name may
       specify a DAP URL.  This may be used to convert data on DAP servers
       to local netCDF files.

OPTIONS
       -k kind_name
              Use a format name to specify the kind of file to be created
              and, by inference, the data model (i.e. netCDF-3 (classic)
              versus netCDF-4 (enhanced)).  The possible arguments are:

                  'nc3' or 'classic' => netCDF classic format

                  'nc6' or '64-bit offset' => netCDF 64-bit offset format

                  'nc4' or 'netCDF-4' => netCDF-4 format (enhanced data
                  model)

                  'nc7' or 'netCDF-4 classic model' => netCDF-4 classic
                  model format

              Note: the old format numbers '1', '2', '3', '4', equivalent
              to the format names 'nc3', 'nc6', 'nc4', and 'nc7'
              respectively, are still accepted but deprecated, due to easy
              confusion between format numbers and format names.
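              For example, to request netCDF-4 classic model output (the
              file names here are illustrative):

                  nccopy -k nc7 foo.nc foo_nc4classic.nc

              Multi-word kind names such as '64-bit offset' must be quoted
              so the shell passes them as a single argument.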

       [-kind_code]
              Use a numeric format code (instead of a format name) to
              specify the kind of file to be created and, by inference, the
              data model (i.e. netCDF-3 (classic) versus netCDF-4
              (enhanced)).  The numeric codes are:

                  3 => netCDF classic format

                  6 => netCDF 64-bit offset format

                  4 => netCDF-4 format (enhanced data model)

                  7 => netCDF-4 classic model format

              The numeric code "7" is used because "7 = 3 + 4": it
              specifies the format that uses the netCDF-3 data model (for
              compatibility) with the netCDF-4 storage format (for
              performance).  Credit is due to NCO for use of these numeric
              codes instead of the old and confusing format numbers.
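              For example, these two invocations request the same output
              kind, netCDF-4 classic model (file names here are
              illustrative):

                  nccopy -7 foo.nc bar.nc
                  nccopy -k nc7 foo.nc bar.nc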

       -d n
              For netCDF-4 output, including netCDF-4 classic model,
              specify the deflation level (level of compression) for
              variable data output.  0 corresponds to no compression and 9
              to maximum compression, with higher levels of compression
              requiring marginally more time to compress or uncompress than
              lower levels.  The compression achieved may also depend on
              the output chunking parameters.  If this option is specified
              for a classic format or 64-bit offset format input file, it
              is not necessary to also specify that the output should be
              netCDF-4 classic model, as that will be the default.  If this
              option is not specified and the input file has compressed
              variables, the compression will still be preserved in the
              output, using the same chunking as in the input by default.

              Note that nccopy requires all variables to be compressed
              using the same compression level, but the API has no such
              restriction.  With a program you can customize compression
              for each variable independently.

       -s     For netCDF-4 output, including netCDF-4 classic model,
              specify shuffling of variable data bytes before compression
              or after decompression.  Shuffling refers to interlacing of
              bytes in a chunk so that the first bytes of all values are
              contiguous in storage, followed by all the second bytes, and
              so on, which often improves compression.  This option is
              ignored unless a non-zero deflation level is specified.
              Using -d0 to specify no deflation on input data that has been
              compressed and shuffled turns off both compression and
              shuffling in the output.
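              For example, to compress all variable data at deflation level
              1 with byte shuffling enabled (file names here are
              illustrative):

                  nccopy -d1 -s foo.nc foo_compressed.nc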

       -u     Convert any unlimited size dimensions in the input to fixed
              size dimensions in the output.  This can speed up
              variable-at-a-time access, but slow down record-at-a-time
              access to multiple variables along an unlimited dimension.
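              For example (file names here are illustrative):

                  nccopy -u records.nc fixed.nc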

       -w     Keep the output in memory (as a diskless netCDF file) until
              the output is closed, at which time the output file is
              written to disk.  This can greatly speed up operations such
              as converting an unlimited dimension to fixed size (the -u
              option), chunking, rechunking, or compressing the input.  It
              requires that available memory be large enough to hold the
              output file.  This option may provide a larger speedup than
              careful tuning of the -m, -h, or -e options, and it's
              certainly a lot simpler.
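              For example, to convert an unlimited dimension to fixed size
              while building the output in memory (file names here are
              illustrative):

                  nccopy -w -u records.nc fixed.nc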

       -c chunkspec
              For netCDF-4 output, including netCDF-4 classic model,
              specify chunking (multidimensional tiling) for variable data
              in the output.  This is useful to specify the units of disk
              access, compression, or other filters such as checksums.
              Changing the chunking in a netCDF file can also greatly speed
              up access, by choosing chunk shapes that are appropriate for
              the most common access patterns.

              The chunkspec argument has two forms.  The first form is the
              original, deprecated form: a string of comma-separated
              associations, each specifying a dimension name, a '/'
              character, and optionally the corresponding chunk length for
              that dimension.  No blanks should appear in the chunkspec
              string, except possibly escaped blanks that are part of a
              dimension name.  A chunkspec names at least one dimension,
              and may omit dimensions which are not to be chunked or for
              which the default chunk length is desired.  If a dimension
              name is followed by a '/' character but no subsequent chunk
              length, the actual dimension length is assumed.  If copying a
              classic model file to a netCDF-4 output file and not naming
              all dimensions in the chunkspec, unnamed dimensions will also
              use the actual dimension length for the chunk length.  An
              example of a chunkspec for variables that use 'm' and 'n'
              dimensions might be 'm/100,n/200' to specify 100 by 200
              chunks.  To see the chunking resulting from copying with a
              chunkspec, use the '-s' option of ncdump on the output file.

              The chunkspec '/' that omits all dimension names and
              corresponding chunk lengths specifies that no chunking is to
              occur in the output, so it can be used to unchunk all the
              chunked variables.

              As an I/O optimization, nccopy has a threshold for the
              minimum size of non-record variables that get chunked,
              currently 8192 bytes.  The -M flag can be used to override
              this value.

              Note that nccopy requires variables that share a dimension to
              also share the chunk size associated with that dimension, but
              the programming interface has no such restriction.  If you
              need to customize chunking for variables independently, use
              the second form of chunkspec, which has this syntax:
              var:n1,n2,...,nn .  This assumes that the variable named
              "var" has rank n.  The chunking to be applied to each
              dimension of the variable is specified by the values of n1
              through nn.  This second form of chunking specification can
              be repeated multiple times to specify the exact chunking for
              different variables.  If the variable is specified but no
              chunk sizes are given (i.e. -c var: ), then chunking is
              disabled for that variable.  If the same variable is
              specified more than once, the second and later specifications
              are ignored.  Also, this second form, per-variable chunking,
              takes precedence over any per-dimension chunking except the
              bare '/' case.
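              For example, to give the (illustrative) rank-3 variable
              'prec' 100 by 16 by 16 chunks while disabling chunking for
              the variable 'station_name', and, separately, to unchunk
              every variable in a file:

                  nccopy -c prec:100,16,16 -c station_name: in.nc out.nc
                  nccopy -c / chunked.nc unchunked.nc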

       -v var1,...
              The output will include data values for the specified
              variables, in addition to the declarations of all dimensions,
              variables, and attributes.  One or more variables must be
              specified by name in the comma-delimited list following this
              option.  The list must be a single argument to the command,
              hence it cannot contain unescaped blanks or other white space
              characters.  The named variables must be valid netCDF
              variables in the input file.  A variable within a group in a
              netCDF-4 file may be specified with an absolute path name,
              such as '/GroupA/GroupA2/var'.  Use of a relative path name
              such as 'var' or 'grp/var' specifies all matching variable
              names in the file.  The default, without this option, is to
              include data values for all variables in the output.
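              For example, to copy the full schema but write data values
              only for the (illustrative) variables 'time' and 'tas':

                  nccopy -v time,tas in.nc subset.nc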

       -V var1,...
              The output will include only the specified variables, along
              with all dimensions and global or group attributes.  One or
              more variables must be specified by name in the
              comma-delimited list following this option.  The list must be
              a single argument to the command, hence it cannot contain
              unescaped blanks or other white space characters.  The named
              variables must be valid netCDF variables in the input file.
              A variable within a group in a netCDF-4 file may be specified
              with an absolute path name, such as '/GroupA/GroupA2/var'.
              Use of a relative path name such as 'var' or 'grp/var'
              specifies all matching variable names in the file.  The
              default, without this option, is to include all variables in
              the output.
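              For example, to produce an output file whose only variable is
              the (illustrative) variable 'tas'; by contrast, -v tas would
              keep the declarations of every variable but omit their data:

                  nccopy -V tas in.nc tas_only.nc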

       -g grp1,...
              The output will include data values only for the specified
              groups.  One or more groups must be specified by name in the
              comma-delimited list following this option.  The list must be
              a single argument to the command.  The named groups must be
              valid netCDF groups in the input file.  The default, without
              this option, is to include data values for all groups in the
              output.

       -G grp1,...
              The output will include only the specified groups.  One or
              more groups must be specified by name in the comma-delimited
              list following this option.  The list must be a single
              argument to the command.  The named groups must be valid
              netCDF groups in the input file.  The default, without this
              option, is to include all groups in the output.
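              For example, to keep only the (illustrative) group 'GroupA'
              in the output:

                  nccopy -G GroupA in.nc groupA_only.nc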

       -m bufsize
              An integer or floating-point number that specifies the size,
              in bytes, of the copy buffer used to copy large variables.  A
              suffix of K, M, G, or T multiplies the copy buffer size by
              one thousand, million, billion, or trillion, respectively.
              The default is 5 Mbytes, but it will be increased if
              necessary to hold at least one chunk of netCDF-4 chunked
              variables in the input file.  You may want to specify a value
              larger than the default for copying large files over
              high-latency networks.  Using the '-w' option may provide
              better performance, if the output fits in memory.
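              For example, to use a 100 Mbyte copy buffer (file names here
              are illustrative):

                  nccopy -m 100M remote_input.nc local_copy.nc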

       -h chunk_cache
              For netCDF-4 output, including netCDF-4 classic model, an
              integer or floating-point number that specifies the size in
              bytes of the chunk cache allocated for each chunked variable.
              This is not a property of the file, but merely a performance
              tuning parameter for avoiding compressing or decompressing
              the same data multiple times while copying and changing chunk
              shapes.  A suffix of K, M, G, or T multiplies the chunk cache
              size by one thousand, million, billion, or trillion,
              respectively.  The default is 4.194304 Mbytes (or whatever
              was specified for the configure-time constant
              CHUNK_CACHE_SIZE when the netCDF library was built).
              Ideally, the nccopy utility should accept only one memory
              buffer size and divide it optimally between a copy buffer and
              chunk cache, but no general algorithm for computing the
              optimum chunk cache size has been implemented yet.  Using the
              '-w' option may provide better performance, if the output
              fits in memory.
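              For example, to give each chunked variable a 128 Mbyte chunk
              cache while rechunking (dimension and file names here are
              illustrative):

                  nccopy -h 128M -c time/1000,lat/40,lon/40 slow.nc fast.nc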

       -e cache_elems
              For netCDF-4 output, including netCDF-4 classic model,
              specifies the number of chunks that the chunk cache can hold.
              A suffix of K, M, G, or T multiplies the number of chunks
              that can be held in the cache by one thousand, million,
              billion, or trillion, respectively.  This is not a property
              of the file, but merely a performance tuning parameter for
              avoiding compressing or decompressing the same data multiple
              times while copying and changing chunk shapes.  The default
              is 1009 (or whatever was specified for the configure-time
              constant CHUNK_CACHE_NELEMS when the netCDF library was
              built).  Ideally, the nccopy utility should determine an
              optimum value for this parameter, but no general algorithm
              for computing the optimum number of chunk cache elements has
              been implemented yet.
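              For example, to let the chunk cache hold up to 4000 chunks
              (the value and file names here are illustrative):

                  nccopy -e 4K in.nc out.nc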

       -r     Read a netCDF classic or 64-bit offset input file into a
              diskless netCDF file in memory before copying.  Requires that
              the input file be small enough to fit into memory.  For
              nccopy, this doesn't seem to provide any significant speedup,
              so it may not be a useful option.

       -L n   Set the log level; only usable if nccopy supports netCDF-4
              (enhanced).

       -M n   Set the minimum chunk size, in bytes; only usable if nccopy
              supports netCDF-4 (enhanced).
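              For example, setting the minimum to 0 removes the 8192-byte
              threshold mentioned under -c, so that even small non-record
              variables are chunked according to the chunkspec (a sketch;
              dimension and file names are illustrative):

                  nccopy -M 0 -c m/100,n/200 in.nc out.nc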

       -F filterspec
              For netCDF-4 output, including netCDF-4 classic model,
              specify a filter to apply to a specified set of variables in
              the output.  As a rule, the filter is a
              compression/decompression algorithm with a unique numeric
              identifier assigned by the HDF Group (see
              https://support.hdfgroup.org/services/filters.html).

              The filterspec argument has this general form:

                  fqn1|fqn2...,filterid,param1,param2,...,paramn
              or
                  *,filterid,param1,param2,...,paramn

              An fqn (fully qualified name) is the name of a variable
              prefixed by its containing groups, with the group names
              separated by forward slashes ('/').  An example might be
              /g1/g2/var.  Alternatively, just the variable name can be
              given if it is in the root group: e.g. var.  Backslash
              escapes may be used as needed.  A note of warning: the '|'
              separator is special to the shell, so you will probably need
              to put the filter spec in some kind of quotes or otherwise
              escape it.

              The filterid is an unsigned positive integer representing the
              id assigned by the HDF Group to the filter.  Following the id
              is a sequence of parameters defining the operation of the
              filter.  Each parameter is a 32-bit unsigned integer.

              This option may be repeated multiple times with different
              variable names.
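              For example, to request a registered filter for two
              (illustrative) variables, one of them inside a group; the
              filter id 307 and its single parameter are also illustrative
              and must correspond to a filter plugin that is actually
              available:

                  nccopy -F '/g1/var1|var2,307,9' in.nc out.nc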


EXAMPLES
       Make a copy of foo1.nc, a netCDF file of any type, to foo2.nc, a
       netCDF file of the same type:

           nccopy foo1.nc foo2.nc

       Note that the above copy will not be as fast as use of cp or another
       simple copy utility, because the file is copied using only the
       netCDF API.  If the input file has extra bytes after the end of the
       netCDF data, those will not be copied, because they are not
       accessible through the netCDF interface.  If the original file was
       generated in "no fill" mode, so that fill values are not stored as
       padding for data alignment, the output file may have different
       padding bytes.

       Convert a netCDF-4 classic model file, compressed.nc, that uses
       compression, to a netCDF-3 classic file, classic.nc:

           nccopy -k classic compressed.nc classic.nc

       Note that 'nc3' could be used instead of 'classic'.

       Download the variable 'time_bnds' and its associated attributes from
       an OPeNDAP server and copy the result to a netCDF file named
       'tb.nc':

           nccopy 'http://test.opendap.org/opendap/data/nc/sst.mnmean.nc.gz?time_bnds' tb.nc

       Note that URLs that name specific variables as command-line
       arguments should generally be quoted, to avoid the shell
       interpreting special characters such as '?'.

       Compress all the variables in the input file foo.nc, a netCDF file
       of any type, to the output file bar.nc:

           nccopy -d1 foo.nc bar.nc

       If foo.nc was a classic or 64-bit offset netCDF file, bar.nc will be
       a netCDF-4 classic model netCDF file, because the classic and 64-bit
       offset format variants don't support compression.  If foo.nc was a
       netCDF-4 file with some variables compressed using various deflation
       levels, the output will also be a netCDF-4 file of the same type,
       but all the variables, including any uncompressed variables in the
       input, will now use deflation level 1.

       Assume the input data includes gridded variables that use time, lat,
       and lon dimensions, with 1000 times by 1000 latitudes by 1000
       longitudes, and that the time dimension varies most slowly.  Also
       assume that users want quick access to data at all times for a small
       set of lat-lon points.  Accessing data for 1000 times would
       typically require accessing 1000 disk blocks, which may be slow.

       Reorganizing the data into chunks on disk that have all the time in
       each chunk for a few lat and lon coordinates would greatly speed up
       such access.  To chunk the data in the input file slow.nc, a netCDF
       file of any type, to the output file fast.nc, you could use:

           nccopy -c time/1000,lat/40,lon/40 slow.nc fast.nc

       to specify data chunks of 1000 times, 40 latitudes, and 40
       longitudes.  If you had enough memory to contain the output file,
       you could speed up the rechunking operation significantly by
       creating the output in memory before writing it to disk on close
       (using the -w flag):

           nccopy -w -c time/1000,lat/40,lon/40 slow.nc fast.nc

       Alternatively, one could write this using the variable-specific form
       of the chunking specification, assuming that time, lat, and lon are
       variables:

           nccopy -c time:1000 -c lat:40 -c lon:40 slow.nc fast.nc

CHUNKING RULES
       The complete set of chunking rules is captured here.  As a rough
       summary, these rules preserve all chunking properties from the input
       file.  These rules apply only when the selected output format
       supports chunking, i.e. for the netCDF-4 variants.

       The variable-specific chunking specification should be obvious and
       translates directly to the corresponding nc_def_var_chunking API
       call.

       The original per-dimension chunking specification requires some
       interpretation by nccopy.  The following rules are applied in the
       given order, independently for each variable to be copied from input
       to output.  The rules are written assuming we are trying to
       determine the chunking for a given output variable Vout that comes
       from an input variable Vin.  A short worked example follows the
       rules.

       1.  If there is no '-c' option that applies to a variable, and the
           corresponding input variable is contiguous or the input is some
           netCDF-3 variant, then let the netcdf-c library make all
           chunking decisions.

       2.  For each dimension of Vout explicitly specified on the command
           line (using the '-c' option), apply the chunking value for that
           dimension regardless of input format or input properties.

       3.  For dimensions of Vout not named on the command line in a '-c'
           option, preserve the chunk sizes from the corresponding input
           variable, if it is chunked.

       4.  If Vin is contiguous, and none of its dimensions are named on
           the command line, and chunking is not mandated by other options,
           then make Vout contiguous.

       5.  If the input variable is contiguous (or is some netCDF-3
           variant) and there are no options requiring chunking, or the '/'
           special case for the '-c' option is specified, then the output
           variable Vout is marked as contiguous.

       6.  Finally, the default case: some or all chunk sizes are not
           determined by the command line or the input variable.  This
           includes the non-chunked input cases such as netCDF-3, CDF5, and
           DAP.  In these cases, retain all chunk sizes determined by the
           previous rules, and use the full dimension size as the default.
           The exception is unlimited dimensions, where the default is 4
           megabytes.
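       As an illustration of rules 2 and 3 (the file names, dimensions, and
       chunk sizes here are hypothetical): if an input variable is chunked
       as time=100, lat=50, lon=50, then

           nccopy -c lat/10 in.nc out.nc

       produces output chunks of time=100, lat=10, lon=50, because the lat
       chunk length is taken from the command line (rule 2) while the time
       and lon chunk lengths are preserved from the input (rule 3).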

SEE ALSO
       ncdump(1), ncgen(1), netcdf(3)



Release 4.2                       2012-03-08                         NCCOPY(1)