NCCOPY(1)                      UNIDATA UTILITIES                     NCCOPY(1)

NAME

       nccopy - Copy a netCDF file, optionally changing format, compression,
       or chunking in the output.

SYNOPSIS

       nccopy [-k kind_name] [-kind_code] [-d n] [-s] [-c chunkspec]
              [-u] [-w] [-[v|V] var1,...] [-[g|G] grp1,...] [-m bufsize]
              [-h chunk_cache] [-e cache_elems] [-r] [-F filterspec]
              [-L n] [-M n] infile outfile

DESCRIPTION

       The nccopy utility copies an input netCDF file in any supported format
       variant to an output netCDF file, optionally converting the output to
       any compatible netCDF format variant, compressing the data, or
       rechunking the data.  For example, if built with the netCDF-3 library,
       a netCDF classic file may be copied to a netCDF 64-bit offset file,
       permitting larger variables.  If built with the netCDF-4 library, a
       netCDF classic file may be copied to a netCDF-4 file or to a netCDF-4
       classic model file as well, permitting data compression, efficient
       schema changes, larger variable sizes, and use of other netCDF-4
       features.

       If no output format is specified, with either -k kind_name or
       -kind_code, then the output will use the same format as the input,
       unless the input is classic or 64-bit offset and either chunking or
       compression is specified, in which case the output will be netCDF-4
       classic model format.  Some kinds of format conversion are not
       possible, and attempting them results in an error.  For example, an
       attempt to copy a netCDF-4 file that uses features of the enhanced
       model, such as groups or variable-length strings, to any of the other
       kinds of netCDF formats that use the classic model will result in an
       error.

       nccopy also serves as an example of a generic netCDF-4 program, with
       its ability to read any valid netCDF file and handle nested groups,
       strings, and user-defined types, including arbitrarily nested compound
       types, variable-length types, and data of any valid netCDF-4 type.

       If DAP support was enabled when nccopy was built, the file name may
       specify a DAP URL.  This may be used to convert data on DAP servers to
       local netCDF files.

OPTIONS

       -k kind_name
              Use format name to specify the kind of file to be created and,
              by inference, the data model (i.e. netcdf-3 (classic) or
              netcdf-4 (enhanced)).  The possible arguments are:

                     'nc3' or 'classic' => netCDF classic format

                     'nc6' or '64-bit offset' => netCDF 64-bit offset format

                     'nc4' or 'netCDF-4' => netCDF-4 format (enhanced data
                     model)

                     'nc7' or 'netCDF-4 classic model' => netCDF-4 classic
                     model format

              Note: The old format numbers '1', '2', '3', '4', equivalent to
              the format names 'nc3', 'nc6', 'nc4', or 'nc7' respectively, are
              also still accepted but deprecated, due to easy confusion
              between format numbers and format names.

       [-kind_code]
              Use format numeric code (instead of format name) to specify the
              kind of file to be created and, by inference, the data model
              (i.e. netcdf-3 (classic) versus netcdf-4 (enhanced)).  The
              numeric codes are:

                     3 => netCDF classic format

                     6 => netCDF 64-bit offset format

                     4 => netCDF-4 format (enhanced data model)

                     7 => netCDF-4 classic model format

              The numeric code "7" is used because "7=3+4", denoting the
              format that combines the netCDF-3 data model (for compatibility)
              with the netCDF-4 storage format (for performance).  Credit is
              due to NCO for use of these numeric codes instead of the old and
              confusing format numbers.

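              Because the deprecated format numbers overlap with the numeric
              codes, the mapping is easy to get wrong.  The Python sketch
              below is an illustration only: the table and the function name
              normalize_kind are hypothetical, not part of nccopy, and match
              the spellings exactly as listed above.  It resolves every
              accepted -k spelling to its numeric code:

```python
# Hypothetical lookup table (not nccopy's code): every -k spelling listed
# above, mapped to the corresponding numeric code 3, 6, 4, or 7.
KIND_CODES = {
    "nc3": 3, "classic": 3,
    "nc6": 6, "64-bit offset": 6,
    "nc4": 4, "netCDF-4": 4,
    "nc7": 7, "netCDF-4 classic model": 7,
}

# Deprecated format numbers '1'..'4' mean nc3, nc6, nc4, nc7 respectively.
DEPRECATED_NUMBERS = {"1": "nc3", "2": "nc6", "3": "nc4", "4": "nc7"}


def normalize_kind(arg: str) -> int:
    """Return the numeric code (3, 6, 4, or 7) for a -k argument."""
    if arg in KIND_CODES:
        return KIND_CODES[arg]
    if arg in DEPRECATED_NUMBERS:
        # The trap: deprecated '3' means nc4 (code 4), not code 3.
        return KIND_CODES[DEPRECATED_NUMBERS[arg]]
    raise ValueError(f"unknown kind: {arg!r}")


if __name__ == "__main__":
    for arg in ("classic", "2", "3", "nc7"):
        print(arg, "->", normalize_kind(arg))
```

              In particular, "-k 3" (deprecated) and "-3" (numeric code)
              select different formats.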
       -d n
              For netCDF-4 output, including netCDF-4 classic model, specify
              deflation level (level of compression) for variable data output.
              0 corresponds to no compression and 9 to maximum compression,
              with higher levels of compression requiring marginally more time
              to compress or uncompress than lower levels.  As a side effect,
              specifying a compression level of 0 (via "-d 0") actually turns
              off deflation altogether.  Compression achieved may also depend
              on output chunking parameters.  If this option is specified for
              a classic format or 64-bit offset format input file, it is not
              necessary to also specify that the output should be netCDF-4
              classic model, as that will be the default.  If this option is
              not specified and the input file has compressed variables, the
              compression will still be preserved in the output, using the
              same chunking as in the input by default.

              Note that nccopy requires all variables to be compressed using
              the same compression level, but the API has no such restriction.
              With a program you can customize compression for each variable
              independently.

       -s     For netCDF-4 output, including netCDF-4 classic model, specify
              shuffling of variable data bytes before compression or after
              decompression.  Shuffling refers to interlacing of bytes in a
              chunk so that the first bytes of all values are contiguous in
              storage, followed by all the second bytes, and so on, which
              often improves compression.  This option is ignored unless a
              non-zero deflation level is specified.  Using -d0 to specify no
              deflation on input data that has been compressed and shuffled
              turns off both compression and shuffling in the output.

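              The byte interlacing described above can be sketched in a few
              lines of Python.  This illustrates the idea only, under simple
              assumptions (one contiguous buffer of fixed-size values, with
              Python's zlib standing in for the netCDF-4 deflate filter); it
              is not the HDF5 shuffle filter itself, and the data are made
              up:

```python
import struct
import zlib


def shuffle(data: bytes, elem_size: int) -> bytes:
    """Gather byte 0 of every value, then byte 1 of every value, and so on."""
    if len(data) % elem_size:
        raise ValueError("data length must be a multiple of elem_size")
    # data[j::elem_size] picks the j-th byte of every element, in order.
    return b"".join(data[j::elem_size] for j in range(elem_size))


def unshuffle(data: bytes, elem_size: int) -> bytes:
    """Inverse of shuffle(): interleave the byte planes back into values."""
    n = len(data) // elem_size
    out = bytearray(len(data))
    for j in range(elem_size):
        out[j::elem_size] = data[j * n:(j + 1) * n]
    return bytes(out)


if __name__ == "__main__":
    # 10000 slowly increasing 32-bit little-endian integers (made-up data).
    raw = struct.pack("<10000i", *range(10000))
    shuffled = shuffle(raw, 4)
    assert unshuffle(shuffled, 4) == raw
    # Deflate level 1 on each layout, mirroring "-d1" without and with "-s".
    print("deflate only:     ", len(zlib.compress(raw, 1)))
    print("shuffle + deflate:", len(zlib.compress(shuffled, 1)))
```

              With slowly varying values the high-order bytes become long
              runs after shuffling, which deflate encodes far more compactly
              than the interleaved original.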
       -u     Convert any unlimited size dimensions in the input to fixed size
              dimensions in the output.  This can speed up variable-at-a-time
              access, but slow down record-at-a-time access to multiple
              variables along an unlimited dimension.

       -w     Keep output in memory (as a diskless netCDF file) until output
              is closed, at which time output file is written to disk.  This
              can greatly speed up operations such as converting unlimited
              dimensions to fixed size (-u option), chunking, rechunking, or
              compressing the input.  It requires that available memory is
              large enough to hold the output file.  This option may provide a
              larger speedup than careful tuning of the -m, -h, or -e options,
              and it is much simpler to use.

       -c chunkspec
              For netCDF-4 output, including netCDF-4 classic model, specify
              chunking (multidimensional tiling) for variable data in the
              output.  This is useful to specify the units of disk access,
              compression, or other filters such as checksums.  Changing the
              chunking in a netCDF file can also greatly speed up access, by
              choosing chunk shapes that are appropriate for the most common
              access patterns.

              The chunkspec argument has several forms.  The first form is the
              original, deprecated form and is a string of comma-separated
              associations, each specifying a dimension name, a '/' character,
              and optionally the corresponding chunk length for that
              dimension.  No blanks should appear in the chunkspec string,
              except possibly escaped blanks that are part of a dimension name.
              A chunkspec names at least one dimension, and may omit dimensions
              which are not to be chunked or for which the default chunk
              length is desired.  If a dimension name is followed by a '/'
              character but no subsequent chunk length, the actual dimension
              length is assumed.  If copying a classic model file to a
              netCDF-4 output file and not naming all dimensions in the
              chunkspec, unnamed dimensions will also use the actual dimension
              length for the chunk length.  An example of a chunkspec for
              variables that use 'm' and 'n' dimensions might be 'm/100,n/200'
              to specify 100 by 200 chunks.  To see the chunking resulting from
              copying with a chunkspec, use the '-s' option of ncdump on the
              output file.

              The chunkspec '/' that omits all dimension names and
              corresponding chunk lengths specifies that no chunking is to
              occur in the output, and so can be used to unchunk all the
              chunked variables.

              As an I/O optimization, nccopy has a threshold for the minimum
              size of non-record variables that get chunked, currently 8192
              bytes.  The -M flag can be used to override this value.

              Note that nccopy requires variables that share a dimension to
              also share the chunk size associated with that dimension, but
              the programming interface has no such restriction.  If you need
              to customize chunking for variables independently, you will need
              to use the second form of chunkspec.  This second form of
              chunkspec has this syntax: var:n1,n2,...,nn .  This assumes that
              the variable named "var" has rank n.  The chunking to be applied
              to each dimension of the variable is specified by the values of
              n1 through nn.  This second form of chunking specification can be
              repeated multiple times to specify the exact chunking for
              different variables.  If the variable is specified but no chunk
              sizes are specified (i.e. -c var: ) then chunking is disabled
              for that variable.  If the same variable is specified more than
              once, the second and later specifications are ignored.  Also,
              this second form, per-variable chunking, takes precedence over
              any per-dimension chunking except the bare "/" case.

              The third form of the chunkspec has the syntax: var:compact or
              var:contiguous.  This explicitly attempts to set the variable
              storage type as compact or contiguous, respectively.  These may
              be overridden if other flags require the variable to be chunked.

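              To make the three chunkspec forms concrete, here is a minimal
              Python sketch that classifies a single -c argument.  It is not
              nccopy's parser: the function name and result tuples are
              hypothetical, it assumes each -c argument uses exactly one
              form, and it leaves backslash escapes in dimension names
              untouched:

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical result shapes for this sketch (not nccopy's internals):
#   ("unchunk_all", None)                      the bare "/"
#   ("per_dimension", {dim: length or None})   None = use dimension length
#   ("per_variable", (var, [n1, ..., nn]))     [] = chunking disabled
#   ("storage", (var, "compact" or "contiguous"))
Parsed = Tuple[str, object]


def parse_chunkspec(spec: str) -> Parsed:
    """Classify one -c argument into one of the documented forms."""
    if spec == "/":
        return ("unchunk_all", None)
    if ":" in spec:
        # Second or third form: var:n1,...,nn, var:, var:compact, var:contiguous
        var, _, rest = spec.partition(":")
        if rest in ("compact", "contiguous"):
            return ("storage", (var, rest))
        sizes: List[int] = [int(n) for n in rest.split(",")] if rest else []
        return ("per_variable", (var, sizes))
    # First (deprecated) form: comma-separated dim/length associations.
    dims: Dict[str, Optional[int]] = {}
    for item in spec.split(","):
        name, sep, length = item.partition("/")
        if not sep or not name:
            raise ValueError(f"bad dimension association: {item!r}")
        # 'dim/' with no length means "use the actual dimension length".
        dims[name] = int(length) if length else None
    return ("per_dimension", dims)


if __name__ == "__main__":
    for arg in ("m/100,n/200", "time/", "/", "temp:1000,40,40",
                "temp:", "temp:contiguous"):
        print(repr(arg), "->", parse_chunkspec(arg))
```

              The variable name temp is only an example; the sketch does not
              check that a variable's rank matches the number of sizes given.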
       -v var1,...
              The output will include data values for the specified variables,
              in addition to the declarations of all dimensions, variables,
              and attributes.  One or more variables must be specified by name
              in the comma-delimited list following this option.  The list must
              be a single argument to the command, hence cannot contain
              unescaped blanks or other white space characters.  The named
              variables must be valid netCDF variables in the input-file.  A
              variable within a group in a netCDF-4 file may be specified with
              an absolute path name, such as '/GroupA/GroupA2/var'.  Use of a
              relative path name such as 'var' or 'grp/var' specifies all
              matching variable names in the file.  The default, without this
              option, is to include data values for all variables in the
              output.

       -V var1,...
              The output will include the specified variables only but all
              dimensions and global or group attributes.  One or more variables
              must be specified by name in the comma-delimited list following
              this option.  The list must be a single argument to the command,
              hence cannot contain unescaped blanks or other white space
              characters.  The named variables must be valid netCDF variables
              in the input-file.  A variable within a group in a netCDF-4 file
              may be specified with an absolute path name, such as
              '/GroupA/GroupA2/var'.  Use of a relative path name such as
              'var' or 'grp/var' specifies all matching variable names in the
              file.  The default, without this option, is to include all
              variables in the output.

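              The absolute-versus-relative naming rule shared by -v and -V
              can be modeled as follows.  This Python sketch is hypothetical:
              it assumes that a relative name matches every variable whose
              full path ends in that name on whole group and variable
              boundaries, which is one reading of "all matching variable
              names", not code taken from nccopy:

```python
from typing import Iterable, List


def matches(spec: str, fqn: str) -> bool:
    """Does one -v/-V name select the variable whose full path is fqn?

    fqn is an absolute path such as '/GroupA/GroupA2/var', or '/var' for a
    variable in the root group.
    """
    if spec.startswith("/"):
        # Absolute path name: selects exactly one variable.
        return spec == fqn
    # Relative path name (assumption): matches every variable whose path ends
    # with '/' + spec, so 'var' never matches a variable named 'myvar'.
    return fqn.endswith("/" + spec)


def select(spec_list: str, fqns: Iterable[str]) -> List[str]:
    """Variables selected by a comma-delimited -v/-V argument."""
    specs = spec_list.split(",")
    return [f for f in fqns if any(matches(s, f) for s in specs)]


if __name__ == "__main__":
    variables = ["/var", "/GroupA/var", "/GroupA/GroupA2/var", "/grp/var",
                 "/x/grp/var", "/myvar"]
    print(select("/GroupA/GroupA2/var", variables))
    print(select("var", variables))
    print(select("grp/var", variables))
```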
       -g grp1,...
              The output will include data values only for the specified
              groups.  One or more groups must be specified by name in the
              comma-delimited list following this option.  The list must be a
              single argument to the command.  The named groups must be valid
              netCDF groups in the input-file.  The default, without this
              option, is to include data values for all groups in the output.

       -G grp1,...
              The output will include only the specified groups.  One or more
              groups must be specified by name in the comma-delimited list
              following this option.  The list must be a single argument to the
              command.  The named groups must be valid netCDF groups in the
              input-file.  The default, without this option, is to include all
              groups in the output.

       -m bufsize
              An integer or floating-point number that specifies the size, in
              bytes, of the copy buffer used to copy large variables.  A
              suffix of K, M, G, or T multiplies the copy buffer size by one
              thousand, million, billion, or trillion, respectively.  The
              default is 5 Mbytes, but will be increased if necessary to hold
              at least one chunk of netCDF-4 chunked variables in the input
              file.  You may want to specify a value larger than the default
              for copying large files over high latency networks.  Using the
              '-w' option may provide better performance, if the output fits
              in memory.

       -h chunk_cache
              For netCDF-4 output, including netCDF-4 classic model, an
              integer or floating-point number that specifies the size in
              bytes of the chunk cache allocated for each chunked variable.
              This is not a property of the file, but merely a performance
              tuning parameter for avoiding compressing or decompressing the
              same data multiple times while copying and changing chunk
              shapes.  A suffix of K, M, G, or T multiplies the chunk cache
              size by one thousand, million, billion, or trillion,
              respectively.  The default is 4.194304 Mbytes (or whatever was
              specified for the configure-time constant CHUNK_CACHE_SIZE when
              the netCDF library was built).  Ideally, the nccopy utility
              should accept only one memory buffer size and divide it
              optimally between a copy buffer and chunk cache, but no general
              algorithm for computing the optimum chunk cache size has been
              implemented yet.  Using the '-w' option may provide better
              performance, if the output fits in memory.

       -e cache_elems
              For netCDF-4 output, including netCDF-4 classic model, specifies
              the number of chunks that the chunk cache can hold.  A suffix of
              K, M, G, or T multiplies the number of chunks that can be held in
              the cache by one thousand, million, billion, or trillion,
              respectively.  This is not a property of the file, but merely a
              performance tuning parameter for avoiding compressing or
              decompressing the same data multiple times while copying and
              changing chunk shapes.  The default is 1009 (or whatever was
              specified for the configure-time constant CHUNK_CACHE_NELEMS
              when the netCDF library was built).  Ideally, the nccopy utility
              should determine an optimum value for this parameter, but no
              general algorithm for computing the optimum number of chunk
              cache elements has been implemented yet.

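              The suffixes accepted by -m, -h, and -e are decimal, not powers
              of 1024.  As an illustration, the Python sketch below converts
              such arguments to plain counts; parse_size is a hypothetical
              helper, not nccopy's argument parser, and it assumes only the
              upper-case suffixes listed above.  It also shows that the
              default chunk cache of 4.194304 Mbytes is exactly
              4*1024*1024 = 4194304 bytes:

```python
from decimal import Decimal

# Decimal multipliers for the documented suffixes: one thousand, million,
# billion, and trillion (not 1024-based units).
SUFFIXES = {"K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_size(arg: str) -> int:
    """Convert an argument such as '5M' or '4.194304M' to an integer count."""
    arg = arg.strip()
    multiplier = 1
    if arg and arg[-1] in SUFFIXES:
        multiplier = SUFFIXES[arg[-1]]
        arg = arg[:-1]
    # Decimal avoids binary floating-point rounding for values like 4.194304.
    return int(Decimal(arg) * multiplier)


if __name__ == "__main__":
    print(parse_size("5M"))                            # 5000000 (default -m)
    print(parse_size("4.194304M"))                     # 4194304 (default -h)
    print(parse_size("4.194304M") == 4 * 1024 * 1024)  # True
```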
       -r     Read netCDF classic or 64-bit offset input file into a diskless
              netCDF file in memory before copying.  Requires that input file
              be small enough to fit into memory.  For nccopy, this doesn't
              seem to provide any significant speedup, so may not be a useful
              option.

       -L n   Set the log level; only usable if nccopy supports netCDF-4
              (enhanced).

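       -M n   Set the minimum chunk size, i.e. the threshold (8192 bytes by
              default) for the minimum size of non-record variables that get
              chunked, as described under -c; only usable if nccopy supports
              netCDF-4 (enhanced).
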
       -F filterspec
              For netCDF-4 output, including netCDF-4 classic model, specify a
              filter to apply to a specified set of variables in the output.
              As a rule, the filter is a compression/decompression algorithm
              with a unique numeric identifier assigned by the HDF Group (see
              https://support.hdfgroup.org/services/filters.html).

              The filterspec argument has this general form:

                     fqn1|fqn2...,filterid,param1,param2...paramn

              or

                     *,filterid,param1,param2...paramn

              An fqn (fully qualified name) is the name of a variable prefixed
              by its containing groups, with the group names separated by
              forward slashes ('/').  An example might be /g1/g2/var.
              Alternatively, just the variable name can be given if it is in
              the root group: e.g. var.  The '*' form applies the filter to
              all variables.  Backslash escapes may be used as needed.  A note
              of warning: '|' is the pipe character in bash and other shells,
              so you will probably need to put the filterspec in some kind of
              quotes or otherwise escape it.

              The filterid is an unsigned positive integer representing the
              id assigned by the HDF Group to the filter.  Following the id is
              a sequence of parameters defining the operation of the filter.
              Each parameter is a 32-bit unsigned integer.

              This option may be repeated multiple times with different
              variable names.

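              A minimal Python sketch of how a filterspec splits into its
              parts follows.  It is illustrative only: parse_filterspec is a
              hypothetical name, backslash escapes are not handled, and the
              filter id 12345 used in the example is made up rather than a
              registered HDF Group id:

```python
from typing import List, Tuple

UINT32_MAX = 2**32 - 1


def parse_filterspec(spec: str) -> Tuple[List[str], int, List[int]]:
    """Split 'fqn1|fqn2,filterid,p1,...' into (variables, filter id, params).

    A variable list of ['*'] stands for all variables.
    """
    fields = spec.split(",")
    if len(fields) < 2:
        raise ValueError("need at least a variable list and a filter id")
    variables = fields[0].split("|")
    filter_id = int(fields[1])
    if filter_id <= 0:
        raise ValueError("filter id must be a positive integer")
    params = [int(p) for p in fields[2:]]
    for p in params:
        # Each parameter must fit in a 32-bit unsigned integer.
        if not 0 <= p <= UINT32_MAX:
            raise ValueError(f"parameter out of 32-bit unsigned range: {p}")
    return variables, filter_id, params


if __name__ == "__main__":
    print(parse_filterspec("/g1/g2/var|var,12345,9"))
    print(parse_filterspec("*,12345"))
```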

EXAMPLES

       Make a copy of foo1.nc, a netCDF file of any type, to foo2.nc, a netCDF
       file of the same type:

              nccopy foo1.nc foo2.nc

       Note that the above copy will not be as fast as use of cp or another
       simple copy utility, because the file is copied using only the netCDF
       API.  If the input file has extra bytes after the end of the netCDF
       data, those will not be copied, because they are not accessible
       through the netCDF interface.  If the original file was generated in
       "No fill" mode so that fill values are not stored for padding for data
       alignment, the output file may have different padding bytes.

       Convert a netCDF-4 classic model file, compressed.nc, that uses
       compression, to a netCDF-3 file classic.nc:

              nccopy -k classic compressed.nc classic.nc

       Note that 'nc3' could be used instead of 'classic'.

       Download the variable 'time_bnds' and its associated attributes from an
       OPeNDAP server and copy the result to a netCDF file named 'tb.nc':

              nccopy 'http://test.opendap.org/opendap/data/nc/sst.mnmean.nc.gz?time_bnds' tb.nc

       Note that URLs that name specific variables as command-line arguments
       should generally be quoted, to avoid the shell interpreting special
       characters such as '?'.

       Compress all the variables in the input file foo.nc, a netCDF file of
       any type, to the output file bar.nc:

              nccopy -d1 foo.nc bar.nc

       If foo.nc was a classic or 64-bit offset netCDF file, bar.nc will be a
       netCDF-4 classic model netCDF file, because the classic and 64-bit
       offset format variants don't support compression.  If foo.nc was a
       netCDF-4 file with some variables compressed using various deflation
       levels, the output will also be a netCDF-4 file of the same type, but
       all the variables, including any uncompressed variables in the input,
       will now use deflation level 1.

       Assume the input data includes gridded variables that use time, lat,
       lon dimensions, with 1000 times by 1000 latitudes by 1000 longitudes,
       and that the time dimension varies most slowly.  Also assume that
       users want quick access to data at all times for a small set of
       lat-lon points.  Accessing data for 1000 times would typically require
       accessing 1000 disk blocks, which may be slow.

       Reorganizing the data into chunks on disk that have all the times in
       each chunk for a few lat and lon coordinates would greatly speed up
       such access.  To chunk the data in the input file slow.nc, a netCDF
       file of any type, to the output file fast.nc, you could use:

              nccopy -c time/1000,lat/40,lon/40 slow.nc fast.nc

       to specify data chunks of 1000 times, 40 latitudes, and 40 longitudes.
       If you had enough memory to contain the output file, you could speed up
       the rechunking operation significantly by creating the output in memory
       before writing it to disk on close (using the -w flag):

              nccopy -w -c time/1000,lat/40,lon/40 slow.nc fast.nc

       Alternatively, assuming that time, lat, and lon are variables, one
       could use the variable-specific chunking specification.  Note that
       this form sets the chunking only of the variables it names, not of
       other variables that use the time, lat, and lon dimensions:

              nccopy -c time:1000 -c lat:40 -c lon:40 slow.nc fast.nc

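       To put rough numbers on the rechunking example, the Python arithmetic
       below assumes that the gridded variable holds 4-byte floats (the data
       type is not stated above) and uses the dimension and chunk sizes from
       the commands shown.  It is a back-of-the-envelope sketch, not output
       from nccopy:

```python
import math

# Assumptions for this sketch: 4-byte float values, and the dimension and
# chunk sizes of the "-c time/1000,lat/40,lon/40" example above.
DIMS = {"time": 1000, "lat": 1000, "lon": 1000}
CHUNK = {"time": 1000, "lat": 40, "lon": 40}
VALUE_BYTES = 4
DEFAULT_CHUNK_CACHE = 4_194_304  # default -h value, in bytes

# Contiguous input with time varying slowest: successive time steps of one
# (lat, lon) point are a whole lat-lon slice apart in the file.
stride = DIMS["lat"] * DIMS["lon"] * VALUE_BYTES

chunk_bytes = math.prod(CHUNK.values()) * VALUE_BYTES
n_chunks = math.prod(math.ceil(DIMS[d] / CHUNK[d]) for d in DIMS)
# A single (lat, lon) point falls in one chunk along lat and along lon, and
# the time chunk length covers all 1000 times.
chunks_per_series = math.ceil(DIMS["time"] / CHUNK["time"])

print(f"stride between time steps (contiguous): {stride} bytes")
print(f"bytes per chunk: {chunk_bytes}")
print(f"chunks per variable: {n_chunks}")
print(f"chunks read for one point's time series: {chunks_per_series}")
print(f"chunk fits in default chunk cache: {chunk_bytes <= DEFAULT_CHUNK_CACHE}")
```

       Under these assumptions, consecutive time steps at one point are
       4,000,000 bytes apart in the contiguous input, whereas in fast.nc a
       point's whole time series lies in one 6,400,000-byte chunk out of 625.
       Such a chunk is larger than the default per-variable chunk cache, so
       the -h or -w options may be worth trying for this kind of rechunking.
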

Chunking Rules

       The complete set of chunking rules is captured here.  As a rough
       summary, these rules preserve all chunking properties from the input
       file.  These rules apply only when the selected output format supports
       chunking, i.e. for the netcdf-4 variants.

       The variable-specific chunking specification should be obvious and
       translates directly to the corresponding "nc_def_var_chunking" API
       call.

       The original per-dimension chunking specification requires some
       interpretation by nccopy.  The following rules are applied in the given
       order independently for each variable to be copied from input to
       output.  The rules are written assuming we are trying to determine the
       chunking for a given output variable Vout that comes from an input
       variable Vin.

       1.     If there is no '-c' option that applies to a variable and the
              corresponding input variable is contiguous or the input is some
              netcdf-3 variant, then let the netcdf-c library make all
              chunking decisions.

       2.     For each dimension of Vout explicitly specified on the command
              line (using the '-c' option), apply the chunking value for that
              dimension regardless of input format or input properties.

       3.     For dimensions of Vout not named on the command line in a '-c'
              option, preserve chunk sizes from the corresponding input
              variable, if it is chunked.

       4.     If Vin is contiguous, and none of its dimensions are named on
              the command line, and chunking is not mandated by other options,
              then make Vout be contiguous.

       5.     If the input variable is contiguous (or is some netcdf-3
              variant) and there are no options requiring chunking, or the '/'
              special case for the '-c' option is specified, then the output
              variable Vout is marked as contiguous.

       6.     Final, default case: some or all chunk sizes are not determined
              by the command line or the input variable.  This includes the
              non-chunked input cases such as netcdf-3, cdf5, and DAP.  In
              these cases retain all chunk sizes determined by previous rules,
              and use the full dimension size as the default.  The exception
              is unlimited dimensions, where the default is 4 megabytes.

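       Rules 2, 3, and 6 together fix the chunk length of each dimension once
       a variable is known to be chunked.  The Python sketch below models
       only that per-dimension step, under stated simplifications: the
       function is hypothetical, the contiguous and library-default outcomes
       of rules 1, 4, and 5 are assumed to have been decided already, and
       unlimited dimensions (which rule 6 treats differently) are not
       modeled:

```python
from typing import Dict, List, Optional, Sequence


def resolve_chunk_lengths(
    dim_names: Sequence[str],
    dim_lengths: Sequence[int],
    cli: Dict[str, Optional[int]],
    vin_chunks: Optional[Sequence[int]],
) -> List[int]:
    """Per-dimension chunk lengths for an output variable that is chunked.

    cli maps dimension names given with -c to a chunk length, or to None for
    a 'dim/' entry (use the dimension length).  vin_chunks holds the input
    variable's chunk lengths, or None if the input variable is not chunked.
    Unlimited dimensions are not handled by this sketch.
    """
    result = []
    for i, (name, length) in enumerate(zip(dim_names, dim_lengths)):
        if name in cli:
            # Rule 2: an explicit -c value wins regardless of the input.
            chunk = cli[name]
            result.append(length if chunk is None else chunk)
        elif vin_chunks is not None:
            # Rule 3: otherwise preserve the input variable's chunk length.
            result.append(vin_chunks[i])
        else:
            # Rule 6: otherwise default to the full dimension length.
            result.append(length)
    return result


if __name__ == "__main__":
    dims, lens = ["time", "lat", "lon"], [1000, 1000, 1000]
    print(resolve_chunk_lengths(dims, lens, {"time": 1000, "lat": 40}, None))
    print(resolve_chunk_lengths(dims, lens, {"lat": 40}, [1, 100, 100]))
```
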

SEE ALSO

       ncdump(1), ncgen(1), netcdf(3)


Release 4.2                       2012-03-08                         NCCOPY(1)