NCGEN(1) UNIDATA UTILITIES NCGEN(1)

6 ncgen - From a CDL file generate a netCDF-3 file, a netCDF-4 file or a
7 C program
8
ncgen [-b] [-c] [-f] [-k format_name] [-format_code]
[-l output_language] [-n] [-o netcdf_filename] [-x] [input_file]
12
14 ncgen generates either a netCDF-3 (i.e. classic) binary .nc file, a
15 netCDF-4 (i.e. enhanced) binary .nc file or a file in some source lan‐
16 guage that when executed will construct the corresponding binary .nc
17 file. The input to ncgen is a description of a netCDF file in a small
18 language known as CDL (network Common Data form Language), described
19 below. Input is read from standard input if no input_file is speci‐
20 fied. If no options are specified in invoking ncgen, it merely checks
21 the syntax of the input CDL file, producing error messages for any vio‐
22 lations of CDL syntax. Other options can be used, for example, to cre‐
23 ate the corresponding netCDF file, or to generate a C program that uses
24 the netCDF C interface to create the netCDF file.
25
26 Note that this version of ncgen was originally called ncgen4. The old‐
27 er ncgen program has been renamed to ncgen3.
28
ncgen may be used with the companion program ncdump to perform some
simple operations on netCDF files. For example, to rename a dimension
in a netCDF file, use ncdump to get a CDL version of the netCDF file,
edit the CDL file to change the name of the dimension, and use ncgen
to generate the corresponding netCDF file from the edited CDL file.
34
36 -b Create a (binary) netCDF file. If the -o option is absent, a
37 default file name will be constructed from the basename of the
38 CDL file, with any suffix replaced by the `.nc' extension. If a
39 file already exists with the specified name, it will be over‐
40 written.
41
42 -c Generate C source code that will create a netCDF file matching
43 the netCDF specification. The C source code is written to stan‐
44 dard output; equivalent to -lc.
45
46 -f Generate FORTRAN 77 source code that will create a netCDF file
47 matching the netCDF specification. The source code is written
48 to standard output; equivalent to -lf77.
49
50 -o netcdf_file
51 Name of the file to pass to calls to "nc_create()". If this op‐
52 tion is specified it implies (in the absence of any explicit -l
53 flag) the "-b" option. This option is necessary because netCDF
54 files cannot be written directly to standard output, since stan‐
55 dard output is not seekable.
56
57 -k format_name
58
59 -format_code
60 The -k flag specifies the format of the file to be created and,
61 by inference, the data model accepted by ncgen (i.e. netcdf-3
62 (classic) versus netcdf-4 vs netcdf-5). As a shortcut, a numeric
63 format_code may be specified instead. The possible format_name
64 values for the -k option are:
65
'classic' or 'nc3' => netCDF classic format

'64-bit offset' or 'nc6' => netCDF 64-bit offset format

'64-bit data' or 'nc5' => netCDF-5 (64-bit data) format

'netCDF-4' or 'nc4' => netCDF-4 format (enhanced data model)

'netCDF-4 classic model' or 'nc7' => netCDF-4 classic model format

Accepted format_code arguments, which are just shortcuts for the
format names, are:
78
79 3 => netcdf classic format
80
81 5 => netcdf 5 format
82
83 6 => netCDF 64-bit format
84
85 4 => netCDF-4 format (enhanced data model)
86
87 7 => netCDF-4 classic model format
88 The numeric code "7" is used because "7=3+4", a mnemonic for the format
89 that uses the netCDF-3 data model for compatibility with the netCDF-4
90 storage format for performance. Credit is due to NCO for use of these
91 numeric codes instead of the old and confusing format numbers.
92
93 Note: The old version format numbers '1', '2', '3', '4', equivalent to
94 the format names 'nc3', 'nc6', 'nc4', or 'nc7' respectively, are also
95 still accepted but deprecated, due to easy confusion between format
96 numbers and format names. Various old format name aliases are also ac‐
97 cepted but deprecated, e.g. 'hdf5', 'enhanced-nc3', etc. Also, note
98 that -v is accepted to mean the same thing as -k for backward compati‐
99 bility.
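
The mapping between format names and numeric codes can be summarized as a small lookup table. The following Python sketch is only an illustration built from the lists above: the names `FORMAT_CODES` and `format_code` are hypothetical, the deprecated old numbers and aliases are deliberately left out, and exact-case matching is an assumption rather than documented ncgen behavior.

```python
# Lookup table copied from the format names and codes listed above.
# FORMAT_CODES and format_code are hypothetical names, not part of ncgen.
FORMAT_CODES = {
    "classic": 3, "nc3": 3,
    "64-bit offset": 6, "nc6": 6,
    "64-bit data": 5, "nc5": 5,
    "netCDF-4": 4, "nc4": 4,
    "netCDF-4 classic model": 7, "nc7": 7,
}


def format_code(name_or_code):
    """Return the numeric format code for a -k style argument.

    Deprecated old numbers ('1', '2') and old aliases are not handled.
    """
    text = str(name_or_code)
    if text.isdigit() and int(text) in set(FORMAT_CODES.values()):
        return int(text)
    return FORMAT_CODES[text]  # raises KeyError for unknown names
```

For instance, `format_code("nc7")` and `format_code(7)` both yield 7 in this sketch.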
100
101 -x Don't initialize data with fill values. This can speed up cre‐
102 ation of large netCDF files greatly, but later attempts to read
103 unwritten data from the generated file will not be easily de‐
104 tectable.
105
106 -l output_language
107 The -l flag specifies the output language to use when generating
108 source code that will create or define a netCDF file matching
109 the netCDF specification. The output is written to standard
110 output. The currently supported languages have the following
111 flags.
112
c|C => C language output.

f77|fortran77 => FORTRAN 77 language output; note that currently
only the classic model is supported.

j|java => (experimental) Java language output; targets the existing
Unidata Java interface, which means that only the classic model is
supported.
123
The choice of output format is determined by three factors.

1. The -k flag.

2. The _Format attribute (see below).

3. The occurrence of CDF-5 (64-bit data) or netCDF-4 constructs in the
input CDL. The term "netCDF-4 constructs" means constructs from the
enhanced data model, not just special performance-related attributes
such as _ChunkSizes, _DeflateLevel, _Endianness, etc. The term "CDF-5
constructs" means the extended unsigned integer types allowed in the
64-bit data model.

Note that there is an ambiguity between the netCDF-4 case and the CDF-5
case if only an unsigned type is seen in the input.
141
142 The rules are as follows, in order of application.
143
1. If either Fortran or Java output is specified, then a -k flag
value of 3 (classic model) will be used. Conflicts with the use of
enhanced constructs in the CDL will be reported as an error.

2. If both the -k flag and the _Format attribute are specified, the
_Format attribute will be ignored. If no -k flag is specified, and a
_Format attribute value is specified, then the -k flag value will be
set to that of the _Format attribute. Otherwise the -k flag is
undefined.

3. If the -k option is defined and is consistent with the CDL, ncgen
will output a file in the requested form, else an error will be
reported.

4. If the -k flag is undefined, and if there are only CDF-5 constructs
in the CDL, a -k flag value of 5 (64-bit data model) will be used. If
there are true netCDF-4 constructs in the CDL, a -k flag value of 4
(enhanced model) will be used.

5. If special performance-related attributes are specified in the
CDL, a -k flag value of 7 (netCDF-4 classic model) will be used.

6. Otherwise ncgen will set the -k flag to 3 (classic model).
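
The order in which these rules apply may be easier to follow as code. The Python sketch below is a reading of the six rules, not ncgen's implementation: `choose_format` and `_consistent` are hypothetical names, formats are identified by name rather than by number, and the consistency test used for rule 3 is an assumption, since the text does not spell out what "consistent with the CDL" means.

```python
def _consistent(fmt, has_cdf5, has_netcdf4):
    # ASSUMPTION: enhanced constructs need the netCDF-4 format, and
    # CDF-5 constructs need either 64-bit data or netCDF-4.
    if has_netcdf4:
        return fmt == "netCDF-4"
    if has_cdf5:
        return fmt in ("64-bit data", "netCDF-4")
    return True


def choose_format(output_language=None, k_flag=None, format_attr=None,
                  has_cdf5=False, has_netcdf4=False, has_special_attrs=False):
    """Return the output format name chosen by rules 1 to 6, in order."""
    # Rule 1: Fortran and Java output force the classic model.
    if output_language in ("f77", "fortran77", "j", "java"):
        if has_netcdf4:
            raise ValueError("enhanced constructs conflict with this language")
        return "classic"
    # Rule 2: an explicit -k flag overrides the _Format attribute.
    requested = k_flag if k_flag is not None else format_attr
    # Rule 3: a requested format must be consistent with the CDL.
    if requested is not None:
        if not _consistent(requested, has_cdf5, has_netcdf4):
            raise ValueError("requested format is inconsistent with the CDL")
        return requested
    # Rule 4: infer the format from the constructs found in the CDL.
    if has_netcdf4:
        return "netCDF-4"
    if has_cdf5:
        return "64-bit data"
    # Rule 5: special performance-related attributes only.
    if has_special_attrs:
        return "netCDF-4 classic model"
    # Rule 6: default.
    return "classic"
```

Identifying formats by name sidesteps the confusion between the current numeric codes and the deprecated old format numbers.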
167
169 Check the syntax of the CDL file `foo.cdl':
170
171 ncgen foo.cdl
172
173 From the CDL file `foo.cdl', generate an equivalent binary netCDF file
174 named `x.nc':
175
176 ncgen -o x.nc foo.cdl
177
178 From the CDL file `foo.cdl', generate a C program containing the netCDF
179 function invocations necessary to create an equivalent binary netCDF
180 file named `x.nc':
181
182 ncgen -lc foo.cdl >x.c
183
185 CDL Syntax Overview
186 Below is an example of CDL syntax, describing a netCDF file with sever‐
187 al named dimensions (lat, lon, and time), variables (Z, t, p, rh, lat,
188 lon, time), variable attributes (units, long_name, valid_range, _Fill‐
189 Value), and some data. CDL keywords are in boldface. (This example is
190 intended to illustrate the syntax; a real CDL file would have a more
191 complete set of attributes so that the data would be more completely
192 self-describing.)
netcdf foo { // an example netCDF specification in CDL

types:
    ubyte enum enum_t {Clear = 0, Cumulonimbus = 1, Stratus = 2};
    opaque(11) opaque_t;
    int(*) vlen_t;

dimensions:
    lat = 10, lon = 5, time = unlimited ;

variables:
    long lat(lat), lon(lon), time(time);
    float Z(time,lat,lon), t(time,lat,lon);
    double p(time,lat,lon);
    long rh(time,lat,lon);

    string country(time,lat,lon);
    ubyte tag;

    // variable attributes
    lat:long_name = "latitude";
    lat:units = "degrees_north";
    lon:long_name = "longitude";
    lon:units = "degrees_east";
    time:units = "seconds since 1992-1-1 00:00:00";

    // typed variable attributes
    string Z:units = "geopotential meters";
    float Z:valid_range = 0., 5000.;
    double p:_FillValue = -9999.;
    long rh:_FillValue = -1;
    vlen_t :globalatt = {17, 18, 19};
data:
    lat = 0, 10, 20, 30, 40, 50, 60, 70, 80, 90;
    lon = -140, -118, -96, -84, -52;
group: g {
types:
    compound cmpd_t { vlen_t f1; enum_t f2;};
} // group g
group: h {
variables:
    /g/cmpd_t compoundvar;
data:
    compoundvar = { {3,4,5}, enum_t.Stratus } ;
} // group h
}

239
240 All CDL statements are terminated by a semicolon. Spaces, tabs, and
241 newlines can be used freely for readability. Comments may follow the
242 characters `//' on any line.
243
A CDL description consists of four optional parts: types, dimensions,
variables, and data, beginning with the keywords `types:',
`dimensions:', `variables:', and `data:', respectively. Note several
things: (1) the keyword includes the trailing colon, so there must not
be any space before the colon character, and (2) the keywords are
required to be lower case.
250
251 The variables: section may contain variable declarations and attribute
252 assignments. All sections may contain global attribute assignments.
253
254 In addition, after the data: section, the user may define a series of
255 groups (see the example above). Groups themselves can contain types,
256 dimensions, variables, data, and other (nested) groups.
257
258 The netCDF types: section declares the user defined types. These may
259 be constructed using any of the following types: enum, vlen, opaque, or
260 compound.
261
262 A netCDF dimension is used to define the shape of one or more of the
263 multidimensional variables contained in the netCDF file. A netCDF di‐
264 mension has a name and a size. A dimension can have the unlimited
265 size, which means a variable using this dimension can grow to any
266 length in that dimension.
267
268 A variable represents a multidimensional array of values of the same
269 type. A variable has a name, a data type, and a shape described by its
270 list of dimensions. Each variable may also have associated attributes
271 (see below) as well as data values. The name, data type, and shape of
272 a variable are specified by its declaration in the variable section of
273 a CDL description. A variable may have the same name as a dimension;
274 by convention such a variable is one-dimensional and contains coordi‐
275 nates of the dimension it names. Dimensions need not have correspond‐
276 ing variables.
277
278 A netCDF attribute contains information about a netCDF variable or
279 about the whole netCDF dataset. Attributes are used to specify such
280 properties as units, special values, maximum and minimum valid values,
281 scaling factors, offsets, and parameters. Attribute information is
282 represented by single values or arrays of values. For example, "units"
283 is an attribute represented by a character array such as "celsius". An
284 attribute has an associated variable, a name, a data type, a length,
285 and a value. In contrast to variables that are intended for data, at‐
286 tributes are intended for metadata (data about data). Unlike netCDF-3,
287 attribute types can be any user defined type as well as the usual
288 built-in types.
289
In CDL, an attribute is designated by a type, a variable, a ':', and
then an attribute name. The type is optional and, if missing, it will
be inferred from the values assigned to the attribute. It is possible
to assign global attributes not associated with any variable to the
netCDF as a whole by omitting the variable name in the attribute
declaration. Notice that there is a potential ambiguity in a
specification such as
    x : a = ...
In this situation, x could be either a type for a global attribute, or
the variable name for an attribute. Since there could be both a type
named x and a variable named x, there is an ambiguity. The rule is
that in this situation, x will be interpreted as a type if possible,
and otherwise as a variable.
303
304 If not specified, the data type of an attribute in CDL is derived from
305 the type of the value(s) assigned to it. The length of an attribute is
306 the number of data values assigned to it, or the number of characters
307 in the character string assigned to it. Multiple values are assigned
308 to non-character attributes by separating the values with commas. All
309 values assigned to an attribute must be of the same type.
310
The names for CDL dimensions, variables, attributes, types, and groups
may contain any non-control UTF-8 character except the forward slash
character (`/'). However, certain characters must be escaped if they
are used in a name, where the escape character is the backward slash
`\'. In particular, if the leading character of the name is a digit
(0-9), then it must be preceded by the escape character. In addition,
the characters ` !"#$%&()*,:;<=>?[]^`´{}|~\' must be escaped if they
occur anywhere in a name. Note also that attribute names that begin
with an underscore (`_') are reserved for the use of Unidata and should
not be used in user defined attributes.
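
As an illustration of these escaping rules, a helper along the following lines could prepare a name for use in a CDL file. This is a minimal sketch: `escape_cdl_name` is a hypothetical function, and the set of special characters is copied verbatim from the rendered list above, so any character lost or altered in rendering is also missing or altered here.

```python
# Characters that must be escaped anywhere in a name, copied from the
# list in the text above (including the leading space).
SPECIAL_CHARS = set(' !"#$%&()*,:;<=>?[]^`´{}|~\\')


def escape_cdl_name(name):
    """Return name with backslash escapes added where the rules require."""
    if "/" in name:
        raise ValueError("a CDL name may not contain a forward slash")
    pieces = []
    for index, ch in enumerate(name):
        leading_digit = index == 0 and ch in "0123456789"
        if leading_digit or ch in SPECIAL_CHARS:
            pieces.append("\\")
        pieces.append(ch)
    return "".join(pieces)
```

With this sketch, a name such as `2m temp` becomes `\2m\ temp`, while `lat` is returned unchanged.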
321
Note also that the words `variables', `dimensions', `data', `group',
and `types' are legal CDL names, but be careful that there is a space
between them and any following colon character when they are used as a
variable name. This is mostly an issue with attribute declarations.
For example, consider this.

    netcdf ... {
    ...
    variables:
        int dimensions;
        dimensions: attribute=0 ; // this will cause an error
        dimensions : attribute=0 ; // this is ok.
    ...
    }
337
The optional data: section of a CDL specification is where netCDF
variables may be initialized. The syntax of an initialization is
simple: a variable name, an equals sign, and a comma-delimited list of
constants (possibly separated by spaces, tabs and newlines) terminated
with a semicolon. For multi-dimensional arrays, the last dimension
varies fastest. Thus row-major order rather than column-major order is
used for matrices. If fewer values are supplied than are needed to
fill a variable, it is extended with a type-dependent `fill value',
which can be overridden by supplying a value for a distinguished
variable attribute named `_FillValue'. The types of constants need not
match the type declared for a variable; coercions are done to convert
integers to floating point, for example. The constant `_' can be used
to designate the fill value for a variable. If the type of the
variable is explicitly `string', then the special constant `NIL' can be
used to represent a nil string, which is not the same as a zero length
string.
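
To make the ordering and padding concrete, the following Python sketch lays a flat list of constants out over a variable's shape with the last dimension varying fastest, padding any missing trailing values. The function name `fill_row_major` is hypothetical and the fill value is passed in explicitly; the sketch does not model ncgen's type-dependent default fill values or type coercion.

```python
def fill_row_major(values, shape, fill):
    """Arrange values over shape, last dimension fastest, padded with fill."""
    total = 1
    for size in shape:
        total *= size
    if len(values) > total:
        raise ValueError("more values than the variable can hold")
    flat = list(values) + [fill] * (total - len(values))

    def nest(items, dims):
        # Split items into dims[0] equal slices and recurse on the rest.
        if len(dims) == 1:
            return items
        step = len(items) // dims[0]
        return [nest(items[i * step:(i + 1) * step], dims[1:])
                for i in range(dims[0])]

    return nest(flat, list(shape))
```

For a variable of shape (2, 3) given the five values 1, 2, 3, 4, 5 and a fill value of -1, the sketch produces the rows [1, 2, 3] and [4, 5, -1].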
353
354 Primitive Data Types
355 char characters
356 byte 8-bit data
357 short 16-bit signed integers
358 int 32-bit signed integers
359 long (synonymous with int)
360 int64 64-bit signed integers
361 float IEEE single precision floating point (32 bits)
362 real (synonymous with float)
363 double IEEE double precision floating point (64 bits)
364 ubyte unsigned 8-bit data
365 ushort 16-bit unsigned integers
366 uint 32-bit unsigned integers
367 uint64 64-bit unsigned integers
368 string arbitrary length strings
369
370 CDL supports a superset of the primitive data types of C. The names
371 for the primitive data types are reserved words in CDL, so the names of
372 variables, dimensions, and attributes must not be primitive type names.
373 In declarations, type names may be specified in either upper or lower
374 case.
375
Bytes are intended to hold a full eight bits of data, and the zero byte
has no special significance, as it may for character data. ncgen
converts byte declarations to char declarations in the output C code
and to the nonstandard BYTE declaration in output Fortran code.
380
381 Shorts can hold values between -32768 and 32767. ncgen converts short
382 declarations to short declarations in the output C code and to the non‐
383 standard INTEGER*2 declaration in output Fortran code.
384
385 Ints can hold values between -2147483648 and 2147483647. ncgen con‐
386 verts int declarations to int declarations in the output C code and to
387 INTEGER declarations in output Fortran code. long is accepted as a
388 synonym for int in CDL declarations, but is deprecated since there are
389 now platforms with 64-bit representations for C longs.
390
Int64 can hold values between -9223372036854775808 and
9223372036854775807. ncgen converts int64 declarations to long long
declarations in the output C code.
394
Floats can hold values between about -3.4e+38 and 3.4e+38. Their
external representation is as 32-bit IEEE normalized single-precision
floating point numbers. ncgen converts float declarations to float
declarations in the output C code and to REAL declarations in output
Fortran code. real is accepted as a synonym for float in CDL
declarations.
400
Doubles can hold values between about -1.7e+308 and 1.7e+308. Their
external representation is as 64-bit IEEE standard normalized
double-precision floating point numbers. ncgen converts double
declarations to double declarations in the output C code and to DOUBLE
PRECISION declarations in output Fortran code.
406
407 The unsigned counterparts of the above integer types are mapped to the
408 corresponding unsigned C types. Their ranges are suitably modified to
409 start at zero.
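
The integer ranges quoted above all follow from the bit widths in the table of primitive types. The short Python sketch below recomputes them; `int_range` is a hypothetical helper, and it treats byte as a signed 8-bit quantity, which is an assumption based on the byte constant examples later in this page.

```python
# Bit widths taken from the table of primitive types above.
BITS = {"byte": 8, "short": 16, "int": 32, "int64": 64}


def int_range(cdl_type):
    """Return (minimum, maximum) for a CDL integer type name."""
    unsigned = cdl_type.startswith("u")
    bits = BITS[cdl_type[1:] if unsigned else cdl_type]
    if unsigned:
        return 0, 2 ** bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
```

For example, `int_range("short")` gives (-32768, 32767) and `int_range("ubyte")` gives (0, 255).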
410
The technical interpretation of the char type is that it is an unsigned
8-bit value. The encoding of the 256 possible values is unspecified by
default. A variable of char type may be marked with an "_Encoding"
attribute to indicate the character set to be used: US-ASCII,
ISO-8859-1, etc. Note that specifying the encoding of UTF-8 is
equivalent to specifying US-ASCII. This is because multi-byte UTF-8
characters cannot be stored in an 8-bit character. The only legal
single byte UTF-8 values are by definition the 7-bit US-ASCII encoding
with the top bit set to zero.
420
421 Strings are assumed by default to be encoded using UTF-8. Note that
422 this means that multi-byte UTF-8 encodings may be present in the
423 string, so it is possible that the number of distinct UTF-8 characters
424 in a string is smaller than the number of 8-bit bytes used to store the
425 string.
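
The difference between character count and byte count is easy to check. This Python sketch (the function name `utf8_lengths` is just for illustration) compares the two for a string containing one two-byte character.

```python
def utf8_lengths(text):
    """Return (number of characters, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))


# "na\u00efve" contains U+00EF, which takes two bytes in UTF-8.
chars, nbytes = utf8_lengths("na\u00efve")
```

Here `chars` is 5 while `nbytes` is 6, so storage sized by character count alone would be one byte short.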
426
427 CDL Constants
428 Constants assigned to attributes or variables may be of any of the ba‐
429 sic netCDF types. The syntax for constants is similar to C syntax, ex‐
430 cept that type suffixes must be appended to shorts and floats to dis‐
431 tinguish them from longs and doubles.
432
A byte constant is represented by an integer constant with a `b' (or
`B') appended. In the old netCDF-2 API, byte constants could also be
represented using single characters or standard C character escape
sequences such as `a' or `\0'. This is still supported for backward
compatibility, but deprecated to make the distinction clear between the
numeric byte type and the textual char type. Example byte constants
include:
    0b // a zero byte
    -1b // -1 as an 8-bit byte
    255b // also -1 as a signed 8-bit byte
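
The last example works because 255 and -1 share the same 8-bit pattern. A minimal Python sketch of that reinterpretation follows; `as_signed_byte` is a hypothetical helper, and the accepted input range of -128 to 255 is an assumption about what a byte constant may reasonably hold.

```python
def as_signed_byte(value):
    """Reinterpret an integer in -128..255 as a signed 8-bit value."""
    if not -128 <= value <= 255:
        raise ValueError("value does not fit in 8 bits")
    return value - 256 if value > 127 else value
```

Both `as_signed_byte(255)` and `as_signed_byte(-1)` return -1.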
443
short integer constants are intended for representing 16-bit signed
quantities. The form of a short constant is an integer constant with
an `s' or `S' appended. If a short constant begins with `0', it is
interpreted as octal, except that if it begins with `0x', it is
interpreted as a hexadecimal constant. For example:
    -2s // a short -2
    0123s // octal
    0x7ffs // hexadecimal
452
int integer constants are intended for representing 32-bit signed
quantities. The form of an int constant is an ordinary integer
constant, although it is acceptable to optionally append a single `l'
or `L' (again, deprecated). Be careful, though: the L suffix is
interpreted as a 32-bit integer, and never as a 64-bit integer. This
can be confusing since the C long type can ambiguously be either 32-bit
or 64-bit.
459
460 If an int constant begins with `0', it is interpreted as octal, except
461 that if it begins with `0x', it is interpreted as a hexadecimal con‐
462 stant (but see opaque constants below). Examples of valid int con‐
463 stants include:
464 -2
465 1234567890L
466 0123 // octal
467 0x7ff // hexadecimal
468
int64 integer constants are intended for representing 64-bit signed
quantities. The form of an int64 constant is an integer constant with
an `ll' or `LL' appended. If an int64 constant begins with `0', it is
interpreted as octal, except that if it begins with `0x', it is
interpreted as a hexadecimal constant. For example:
    -2ll // an int64 -2
    0123LL // octal
    0x7ffLL // hexadecimal
477
478 Floating point constants of type float are appropriate for representing
479 floating point data with about seven significant digits of precision.
480 The form of a float constant is the same as a C floating point constant
481 with an `f' or `F' appended. For example the following are all accept‐
482 able float constants:
483 -2.0f
484 3.14159265358979f // will be truncated to less precision
485 1.f
486
487
488 Floating point constants of type double are appropriate for represent‐
489 ing floating point data with about sixteen significant digits of preci‐
490 sion. The form of a double constant is the same as a C floating point
491 constant. An optional `d' or `D' may be appended. For example the
492 following are all acceptable double constants:
493 -2.0
494 3.141592653589793
495 1.0e-20
496 1.d
497
498 Unsigned integer constants can be created by appending the character
499 'U' or 'u' between the constant and any trailing size specifier, or im‐
500 mediately at the end of the size specifier. Thus one could say 10U,
501 100su, 100000ul, or 1000000llu, for example.
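
The suffix and radix conventions for integer constants described above can be collected into one small classifier. The Python sketch below is an illustration only, not ncgen's lexer: `parse_int_constant` is a hypothetical name, no range checking is done, and a hexadecimal constant ending in `b` is read as a hex digit rather than as a byte suffix, which is an assumption.

```python
import re

# Sign, then hexadecimal or decimal/octal digits, then an optional suffix.
_INT_CONSTANT = re.compile(r"^([+-]?)(0[xX][0-9a-fA-F]+|[0-9]+)([a-zA-Z]*)$")
_SIZE_SUFFIX = {"b": "byte", "s": "short", "": "int", "l": "int", "ll": "int64"}


def parse_int_constant(text):
    """Return (CDL type name, value) for an integer constant."""
    match = _INT_CONSTANT.match(text)
    if match is None:
        raise ValueError("not an integer constant: " + text)
    sign, digits, suffix = match.groups()
    suffix = suffix.lower()
    # The unsigned marker may come before or after the size specifier.
    if suffix.startswith("u"):
        unsigned, size = True, suffix[1:]
    elif suffix.endswith("u"):
        unsigned, size = True, suffix[:-1]
    else:
        unsigned, size = False, suffix
    if size not in _SIZE_SUFFIX:
        raise ValueError("unknown suffix: " + suffix)
    if digits[:2].lower() == "0x":
        value = int(digits, 16)      # leading 0x: hexadecimal
    elif len(digits) > 1 and digits[0] == "0":
        value = int(digits, 8)       # leading 0: octal
    else:
        value = int(digits)
    if sign == "-":
        value = -value
    type_name = _SIZE_SUFFIX[size]
    return ("u" + type_name if unsigned else type_name), value
```

Under these assumptions `0123s` is classified as a short with value 83, and `1000000llu` as a uint64.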
502
Single character constants may be enclosed in single quotes. If a
sequence of one or more characters is enclosed in double quotes, then
its interpretation must be inferred from the context. If the dataset
is created using the netCDF classic model, then all such constants are
interpreted as a character array, so each character in the constant is
interpreted as if it were a single character. If the dataset uses the
netCDF enhanced model, then the constant may be interpreted as for the
classic model or as a true string (see below), depending on the type of
the attribute or variable to which the constant is assigned.
512
The interpretation of char constants is that those that are in the
printable ASCII range (' '..'~') are assumed to be encoded as the
1-byte subset of UTF-8, which is equivalent to US-ASCII. In all cases,
the usual C string escape conventions are honored for values from 0
through 127. Values greater than 127 are allowed, but their encoding
is undefined. For the netCDF enhanced model, the use of the char type
is deprecated in favor of the string type.
520
521 Some character constant examples are as follows.
522 'a' // ASCII `a'
523 "a" // equivalent to 'a'
524 "Two\nlines\n" // a 10-character string with two embedded newlines
525 "a bell:\007" // a string containing an ASCII bell
526 Note that the netCDF character array "a" would fit in a one-element
527 variable, since no terminating NULL character is assumed. However, a
528 zero byte in a character array is interpreted as the end of the signif‐
529 icant characters by the ncdump program, following the C convention.
530 Therefore, a NULL byte should not be embedded in a character string un‐
531 less at the end: use the byte data type instead for byte arrays that
532 contain the zero byte.
533
String constants are, like character constants, represented using
double quotes. This represents a potential ambiguity since a
multi-character string may also indicate a dimensioned character value.
Disambiguation usually occurs by context, but care should be taken to
specify the string type to ensure the proper choice. String constants
are assumed to always be UTF-8 encoded. This specifically means that
the string constant may actually contain multi-byte UTF-8 characters.
The special constant `NIL' can be used to represent a nil string, which
is not the same as a zero length string.
543
544 Opaque constants are represented as sequences of hexadecimal digits
545 preceded by 0X or 0x: 0xaa34ffff, for example. These constants can
546 still be used as integer constants and will be either truncated or ex‐
547 tended as necessary.
548
549 Compound Constant Expressions
In order to assign values to variables (or attributes) whose type is a
user-defined type, the constant notation has been extended to include
sequences of constants enclosed in curly brackets (e.g. "{"..."}").
Such a constant is called a compound constant, and compound constants
can be nested.
555
Given a type "T(*) vlen_t", where T is some other arbitrary base type,
constants for this should be specified as follows.
    vlen_t var[2] = {t11,t12,...t1N}, {t21,t22,...t2m};
The values tij are assumed to be constants of type T.

Given a type "compound cmpd_t {T1 f1; T2 f2...Tn fn}", where the Ti are
other arbitrary base types, constants for this should be specified as
follows.
    cmpd_t var[2] = {t11,t12,...t1n}, {t21,t22,...t2n};
The values tij are assumed to be constants of type Tj. If fields are
missing, then they will be set using any specified or default fill
value for the field's base type.
568
The general set of rules for using braces is defined in the Specifying
Datalists section below.
571
572 Scoping Rules
573 With the addition of groups, the name space for defined objects is no
574 longer flat. References (names) of any type, dimension, or variable may
575 be prefixed with the absolute path specifying a specific declaration.
576 Thus one might say
577 variables:
578 /g1/g2/t1 v1;
579 The type being referenced (t1) is the one within group g2, which in
580 turn is nested in group g1. The similarity of this notation to Unix
581 file paths is deliberate, and one can consider groups as a form of di‐
582 rectory structure.
583
When a name is not prefixed, then scope rules are applied to locate the
specified declaration. Currently, there are three rules: one for
dimensions, one for types and enumeration constants, and one for all
others.

1. When an unprefixed name of a dimension is used (as in a variable
declaration), ncgen first looks in the immediately enclosing group
for the dimension. If it is not found there, then it looks in
the group enclosing this group. This continues up the group
hierarchy until the dimension is found, or there are no more
groups to search.
594
595 2. When an unprefixed name of a type or an enumeration constant is
596 used, ncgen searches the group tree using a pre-order depth-
597 first search. This essentially means that it will find the
598 matching declaration that precedes the reference textually in
599 the cdl file and that is "highest" in the group hierarchy.
600
601 3. For all other names, only the immediately enclosing group is
602 searched.
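
The dimension rule amounts to a walk from the enclosing group up through its ancestors. The Python sketch below models only that rule, using a hypothetical `Group` class and `find_dimension` function; it does not attempt the pre-order search used for types and enumeration constants.

```python
class Group:
    """Hypothetical stand-in for a CDL group with its own dimensions."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.dimensions = {}  # dimension name -> size


def find_dimension(group, name):
    """Search group, then each enclosing group, for a dimension."""
    while group is not None:
        if name in group.dimensions:
            return group, group.dimensions[name]
        group = group.parent
    raise LookupError("dimension not found: " + name)


root = Group("/")
root.dimensions["time"] = 10
inner = Group("h", parent=Group("g", parent=root))
inner.dimensions["lat"] = 5
owner, size = find_dimension(inner, "time")  # found in the root group
```

A lookup of `lat` from `inner` stops immediately at `inner`, while `time` is only found after climbing two levels to the root.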
603
604 One final note. Forward references are not allowed. This means that
605 specifying, for example, /g1/g2/t1 will fail if this reference occurs
606 before g1 and/or g2 are defined.
607
608 Specifying Enumeration Constants
609 References to Enumeration constants (in data lists) can be ambiguous
610 since the same enumeration constant name can be defined in more than
611 one enumeration. If a cdl file specified an ambiguous constant, then
612 ncgen will signal an error. Such constants can be disambiguated in two
613 ways.
614
615 1. Prefix the enumeration constant with the name of the enumeration
616 separated by a dot: enum.econst, for example.
617
618 2. If case one is not sufficient to disambiguate the enumeration
619 constant, then one must specify the precise enumeration type us‐
620 ing a group path: /g1/g2/enum.econst, for example.
621
622 Special Attributes
623 Special, virtual, attributes can be specified to provide performance-
624 related information about the file format and about variable proper‐
625 ties. The file must be a netCDF-4 file for these to take effect.
626
These special virtual attributes are not actually part of the file;
they are merely a convenient way to set miscellaneous properties of the
data in CDL.

The special attributes currently supported are as follows: `_Format',
`_Fletcher32', `_ChunkSizes', `_Endianness', `_DeflateLevel',
`_Shuffle', and `_Storage'.
634
635 `_Format' is a global attribute specifying the netCDF format variant.
636 Its value must be a single string matching one of `classic', `64-bit
637 offset', `64-bit data', `netCDF-4', or `netCDF-4 classic model'.
638
The rest of the special attributes are all variable attributes.
Essentially all of them map to some corresponding `nc_def_var_XXX'
function as defined in the netCDF-4 API. For the attributes that are
essentially boolean (_Fletcher32, _Shuffle, and _NOFILL), the value
true can be specified by using the strings `true' or `1', or by using
the integer 1. The value false expects either `false', `0', or the
integer 0. The actions associated with these attributes are as
follows.
646
1. `_Fletcher32' sets the `fletcher32' property for a variable.

2. `_Endianness' is either `little' or `big', depending on how the
variable is stored when first written.

3. `_DeflateLevel' is an integer between 0 and 9 inclusive if
compression has been specified for the variable.

4. `_Shuffle' specifies if the shuffle filter should be used.

5. `_Storage' is `contiguous' or `chunked'.

6. `_ChunkSizes' is a list of chunk sizes for each dimension of the
variable.
661
662 Note that attributes such as "add_offset" or "scale_factor" have no
663 special meaning to ncgen. These attributes are currently conventions,
664 handled above the library layer by other utility packages, for example
665 NCO.
666
667 Specifying Datalists
668 Specifying datalists for variables in the `data:` section can be some‐
669 what complicated. There are some rules that must be followed to ensure
670 that datalists are parsed correctly by ncgen.
671
672 First, the top level is automatically assumed to be a list of items, so
673 it should not be inside {...}. That means that if the variable is a
674 scalar, there will be a single top-level element and if the variable is
675 an array, there will be N top-level elements. For each element of the
676 top level list, the following rules should be applied.
677
678 1. Instances of UNLIMITED dimensions (other than the first dimension)
679 must be surrounded by {...} in order to specify the size.
680
681 2. Compound instances must be embedded in {...}
682
683 3. Non-scalar fields of compound instances must be embedded in {...}.
684
685 4. Instances of vlens must be surrounded by {...} in order to specify
686 the size.
687
6. Datalists associated with attributes are implicitly a vector (i.e.,
a list) of values of the type of the attribute and the above rules must
apply with that in mind.
691
692 7. No other use of braces is allowed.
693
Note that one consequence of these rules is that arrays of values
cannot have subarrays within braces. Consider, for example, int
var(d1)(d2)...(dn), where none of d2...dn are unlimited. A datalist
for this variable must be a single list of integers, where the number
of integers is no more than D=d1*d2*...dn; note that the list can have
fewer than D values, in which case fill values will be used to pad the
list.
701
702 Rule 6 about attribute datalist has the following consequence. If the
703 type of the attribute is a compound (or vlen) type, and if the number
704 of entries in the list is one, then the compound instances must be en‐
705 closed in braces.
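The flat-list consequence for fixed-size arrays described above can be sketched in Python as follows. This is only an illustration: the function name pad_datalist is hypothetical, the fill value is passed in explicitly because the real value depends on the variable's type and any _FillValue attribute, and raising an error for an overlong list is an assumption of this sketch rather than documented ncgen behavior.

```python
from math import prod


def pad_datalist(values, dim_sizes, fill):
    """Pad a flat datalist with fill values up to D = d1*d2*...*dn."""
    total = prod(dim_sizes)
    if len(values) > total:
        # Assumption of this sketch: more than D values is treated as an error.
        raise ValueError(f"datalist has {len(values)} values, at most {total} allowed")
    return list(values) + [fill] * (total - len(values))


def as_rows(flat, last_dim):
    """Split a flat list into rows; the last dimension varies fastest."""
    return [flat[i:i + last_dim] for i in range(0, len(flat), last_dim)]


if __name__ == "__main__":
    flat = pad_datalist([1, 2, 3, 4], dim_sizes=[2, 3], fill=-1)
    print(as_rows(flat, 3))
```

Note that the datalist itself stays flat; the row view is only a way to see which elements end up holding fill values.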
706
707 Specifying Character Datalists
708 Specifying datalists for variables of type char also has some complica‐
709 tions. Consider, for example:
710 dimensions: u=UNLIMITED; d1=1; d2=2; d3=3;
711 d4=4; d5=5; u2=UNLIMITED;
712 variables: char var(d4,d5);
713 datalist: var="1", "two", "three";
714
715 We have twenty elements of var to fill (d5 X d4) and we have three
716 strings of length 1, 3, 5. How do we assign the characters in the
717 strings to the twenty elements?
718
719 This is challenging because it is desirable to mimic the original ncgen
720 (ncgen3). The core algorithm is notionally as follows.
721
722 1. Assume we have a set of dimensions D1..Dn, where D1 may optionally
723 be an Unlimited dimension. It is assumed that the sizes of the Di
724 are all known (including unlimited dimensions).
725
726 2. Given a sequence of string or character constants C1..Cm, our goal
727 is to construct a single string whose length is the cross product of
728 D1 thru Dn. Note that for purposes of this algorithm, character
729 constants are treated as strings of size 1.
730
731 3. Construct Dx = cross product of D1 thru D(n-1).
732
733 4. For each constant Ci, add fill characters as needed so that its
734 length is a multiple of Dn.
735
736 5. Concatenate the modified C1..Cm to produce string S.
737
738 6. Add fill characters to S to make its length be a multiple of Dn.
739
740 7. If S is longer than Dx * Dn, then truncate and generate a warn‐
741 ing.
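The steps above can be sketched in Python as follows. This is a notional rendering of the list, not ncgen's actual code: the function name build_char_data is hypothetical, the fill character defaults to NUL as an assumption, and the warning is reported through a returned flag instead of a message.

```python
from math import prod


def build_char_data(constants, dim_sizes, fill="\0"):
    """Combine string constants into one string for a char variable.

    dim_sizes holds the known sizes D1..Dn. Returns (S, truncated).
    """
    *outer, dn = dim_sizes
    dx = prod(outer)                          # cross product of D1..D(n-1)
    padded = []
    for c in constants:
        # Pad each constant so its length is a multiple of Dn.
        padded.append(c + fill * (-len(c) % dn))
    s = "".join(padded)                       # concatenate the padded constants
    # Pad S to a multiple of Dn. This is a no-op once every constant has
    # been padded; it is kept only to mirror the list of steps.
    s += fill * (-len(s) % dn)
    limit = dx * dn
    truncated = len(s) > limit
    if truncated:
        s = s[:limit]                         # ncgen would also issue a warning
    return s, truncated


if __name__ == "__main__":
    s, truncated = build_char_data(["1", "two", "three"], [4, 5])
    print(repr(s), len(s), truncated)
```

For the var(d4,d5) example, each of the three constants is padded to one row of five characters, so S covers three of the four rows; the sketch makes no claim about how the remaining row is filled.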
742
743 There are three other cases of note.
744
745 1. If there is only a single, unlimited dimension, then all of the con‐
746 stants are concatenated and fill characters are added to the end of
747 the resulting string to make its length be that of the unlimited di‐
748 mension. If the length is larger than the unlimited dimension, then
749 it is truncated with a warning.
750
751 2. For the case of a character-typed vlen, "char(*) vlen_t" for example,
752 we simply concatenate all the constants with no filling at all.
753
754 3. For the case of a character typed attribute, we simply concatenate
755 all the constants.
756
757 In netCDF-4, dimensions other than the first can be unlimited. Of
758 course by the rules above, the interior unlimited instances must be de‐
759 limited by {...}. For example:
760 variables: char var(u,u2);
761 datalist: var={"1", "two"}, {"three"};
762 In this case u will have the effective length of two. Within each in‐
763 stance of u2, the rules above will apply, leading to this.
764 datalist: var={"1","t","w","o"}, {"t","h","r","e","e"};
765 The effective size of u2 will be the max of the two instance lengths
766 (five in this case) and the shorter will be padded to produce this.
767 datalist: var={"1","t","w","o","\0"}, {"t","h","r","e","e"};
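The two-dimensional case above can be sketched in Python as follows. The function name pad_unlimited_rows is hypothetical, and the NUL fill character is taken from the padded example above; the sketch only mirrors the description here and is not ncgen's implementation.

```python
def pad_unlimited_rows(instances, fill="\0"):
    """Handle char var(u,u2) where both dimensions are unlimited.

    instances is a list with one entry per {...} group, each a list of
    string constants. Returns (rows, (size_of_u, size_of_u2)).
    """
    # Within each instance, the constants are concatenated.
    rows = ["".join(constants) for constants in instances]
    # The effective size of u2 is the longest instance.
    width = max((len(r) for r in rows), default=0)
    # Shorter instances are padded up to that size.
    padded = [r + fill * (width - len(r)) for r in rows]
    return padded, (len(rows), width)


if __name__ == "__main__":
    rows, sizes = pad_unlimited_rows([["1", "two"], ["three"]])
    print(rows, sizes)
```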
768
769 Consider an even more complicated case.
770 variables: char var(u,u2,u3);
771 datalist: var={{"1", "two"}}, {{"three"},{"four","xy"}};
772 In this case u again will have the effective length of two. The u2 di‐
773 mension will have size max(1,2) = 2. Within each instance of u2,
774 the rules above will apply, leading to this.
775 datalist: var={{"1","t","w","o"}}, {{"t","h","r","e","e"},{"f","o","u","r","x","y"}};
776 The effective size of u3 will be the max of the two instance lengths
777 (six in this case) and the shorter ones will be padded to produce this.
778 datalist: var={{"1","t","w","o"," "," "}}, {{"t","h","r","e","e"," "},{"f","o","u","r","x","y"}};
779 Note however that the first instance of u2 is less than the max length
780 of u2, so we need to add a filler for another instance of u2, producing
781 this.
782 datalist: var={{"1","t","w","o"," "," "},{" "," "," "," "," "," "}}, {{"t","h","r","e","e"," "},{"f","o","u","r","x","y"}};
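The three-dimensional case, including the extra filler instance, can be sketched in Python as follows. The function name pad_nested is hypothetical, and the blank fill character follows the padded example above; this is an illustration of the description, not ncgen's own code.

```python
def pad_nested(instances, fill=" "):
    """Handle char var(u,u2,u3) where all three dimensions are unlimited.

    instances has one entry per instance of u; each entry is a list of
    u2 instances; each u2 instance is a list of string constants.
    Returns (padded, (size_of_u, size_of_u2, size_of_u3)).
    """
    # Concatenate the constants inside each innermost {...} group.
    joined = [["".join(consts) for consts in u2_list] for u2_list in instances]
    u2 = max((len(rows) for rows in joined), default=0)
    u3 = max((len(r) for rows in joined for r in rows), default=0)
    padded = []
    for rows in joined:
        # Pad each string to the effective size of u3.
        rows = [r + fill * (u3 - len(r)) for r in rows]
        # Add filler instances where an instance of u has fewer than u2 rows.
        rows += [fill * u3] * (u2 - len(rows))
        padded.append(rows)
    return padded, (len(joined), u2, u3)


if __name__ == "__main__":
    data = [[["1", "two"]], [["three"], ["four", "xy"]]]
    padded, sizes = pad_nested(data)
    print(padded, sizes)
```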
783
784
786 The programs generated by ncgen when using the -c flag use initializa‐
787 tion statements to store data in variables, and will fail to produce
788 compilable programs if you try to use them for large datasets, since
789 the resulting statements may exceed the line length or number of con‐
790 tinuation statements permitted by the compiler.
791
792 The CDL syntax makes it easy to assign what looks like an array of
793 variable-length strings to a netCDF variable, but the strings may sim‐
794 ply be concatenated into a single array of characters. Specific use of
795 the string type specifier may solve the problem.
796
797
799 The file ncgen.y is the definitive grammar for CDL, but a stripped down
800 version is included here for completeness.
801 ncdesc: NETCDF
802 datasetid
803 rootgroup
804 ;
805
806 datasetid: DATASETID
807
808 rootgroup: '{'
809 groupbody
810 subgrouplist
811 '}';
812
813 groupbody:
814 attrdecllist
815 typesection
816 dimsection
817 vasection
818 datasection
819 ;
820
821 subgrouplist:
822 /*empty*/
823 | subgrouplist namedgroup
824 ;
825
826 namedgroup: GROUP ident '{'
827 groupbody
828 subgrouplist
829 '}'
830 attrdecllist
831 ;
832
833 typesection: /* empty */
834 | TYPES
835 | TYPES typedecls
836 ;
837
838 typedecls:
839 type_or_attr_decl
840 | typedecls type_or_attr_decl
841 ;
842
843 typename: ident ;
844
845 type_or_attr_decl:
846 typedecl
847 | attrdecl ';'
848 ;
849
850 typedecl:
851 enumdecl optsemicolon
852 | compounddecl optsemicolon
853 | vlendecl optsemicolon
854 | opaquedecl optsemicolon
855 ;
856
857 optsemicolon:
858 /*empty*/
859 | ';'
860 ;
861
862 enumdecl: primtype ENUM typename '{' enumidlist '}' ;
863
864 enumidlist: enumid
865 | enumidlist ',' enumid
866 ;
867
868 enumid: ident '=' constint ;
869
870 opaquedecl: OPAQUE '(' INT_CONST ')' typename ;
871
872 vlendecl: typeref '(' '*' ')' typename ;
873
874 compounddecl: COMPOUND typename '{' fields '}' ;
875
876 fields: field ';'
877 | fields field ';'
878 ;
879
880 field: typeref fieldlist ;
881
882 primtype: CHAR_K
883 | BYTE_K
884 | SHORT_K
885 | INT_K
886 | FLOAT_K
887 | DOUBLE_K
888 | UBYTE_K
889 | USHORT_K
890 | UINT_K
891 | INT64_K
892 | UINT64_K
893 ;
894
895 dimsection: /* empty */
896 | DIMENSIONS
897 | DIMENSIONS dimdecls
898 ;
899
900 dimdecls: dim_or_attr_decl ';'
901 | dimdecls dim_or_attr_decl ';'
902 ;
903
904 dim_or_attr_decl: dimdeclist | attrdecl ;
905
906 dimdeclist: dimdecl
907 | dimdeclist ',' dimdecl
908 ;
909
910 dimdecl:
911 dimd '=' UINT_CONST
912 | dimd '=' INT_CONST
913 | dimd '=' DOUBLE_CONST
914 | dimd '=' NC_UNLIMITED_K
915 ;
916
917 dimd: ident ;
918
919 vasection: /* empty */
920 | VARIABLES
921 | VARIABLES vadecls
922 ;
923
924 vadecls: vadecl_or_attr ';'
925 | vadecls vadecl_or_attr ';'
926 ;
927
928 vadecl_or_attr: vardecl | attrdecl ;
929
930 vardecl: typeref varlist ;
931
932 varlist: varspec
933 | varlist ',' varspec
934 ;
935
936 varspec: ident dimspec ;
937
938 dimspec: /* empty */
939 | '(' dimlist ')'
940 ;
941
942 dimlist: dimref
943 | dimlist ',' dimref
944 ;
945
946 dimref: path ;
947
948 fieldlist:
949 fieldspec
950 | fieldlist ',' fieldspec
951 ;
952
953 fieldspec: ident fielddimspec ;
954
955 fielddimspec: /* empty */
956 | '(' fielddimlist ')'
957 ;
958
959 fielddimlist:
960 fielddim
961 | fielddimlist ',' fielddim
962 ;
963
964 fielddim:
965 UINT_CONST
966 | INT_CONST
967 ;
968
969 /* Use this when referencing defined objects */
970 varref: type_var_ref ;
971
972 typeref: type_var_ref ;
973
974 type_var_ref:
975 path
976 | primtype
977 ;
978
979 /* Use this for all attribute decls */
980 /* Watch out; this is right recursive */
981 attrdecllist: /*empty*/ | attrdecl ';' attrdecllist ;
982
983 attrdecl:
984 ':' ident '=' datalist
985 | typeref type_var_ref ':' ident '=' datalist
986 | type_var_ref ':' ident '=' datalist
987 | type_var_ref ':' _FILLVALUE '=' datalist
988 | typeref type_var_ref ':' _FILLVALUE '=' datalist
989 | type_var_ref ':' _STORAGE '=' conststring
990 | type_var_ref ':' _CHUNKSIZES '=' intlist
991 | type_var_ref ':' _FLETCHER32 '=' constbool
992 | type_var_ref ':' _DEFLATELEVEL '=' constint
993 | type_var_ref ':' _SHUFFLE '=' constbool
994 | type_var_ref ':' _ENDIANNESS '=' conststring
995 | type_var_ref ':' _NOFILL '=' constbool
996 | ':' _FORMAT '=' conststring
997 ;
998
999 path:
1000 ident
1001 | PATH
1002 ;
1003
1004 datasection: /* empty */
1005 | DATA
1006 | DATA datadecls
1007 ;
1008
1009 datadecls:
1010 datadecl ';'
1011 | datadecls datadecl ';'
1012 ;
1013
1014 datadecl: varref '=' datalist ;
1015 datalist:
1016 datalist0
1017 | datalist1
1018 ;
1019
1020 datalist0:
1021 /*empty*/
1022 ;
1023
1024 /* Must have at least 1 element */
1025 datalist1:
1026 dataitem
1027 | datalist ',' dataitem
1028 ;
1029
1030 dataitem:
1031 constdata
1032 | '{' datalist '}'
1033 ;
1034
1035 constdata:
1036 simpleconstant
1037 | OPAQUESTRING
1038 | FILLMARKER
1039 | NIL
1040 | econstref
1041 | function
1042 ;
1043
1044 econstref: path ;
1045
1046 function: ident '(' arglist ')' ;
1047
1048 arglist:
1049 simpleconstant
1050 | arglist ',' simpleconstant
1051 ;
1052
1053 simpleconstant:
1054 CHAR_CONST /* never used apparently*/
1055 | BYTE_CONST
1056 | SHORT_CONST
1057 | INT_CONST
1058 | INT64_CONST
1059 | UBYTE_CONST
1060 | USHORT_CONST
1061 | UINT_CONST
1062 | UINT64_CONST
1063 | FLOAT_CONST
1064 | DOUBLE_CONST
1065 | TERMSTRING
1066 ;
1067
1068 intlist:
1069 constint
1070 | intlist ',' constint
1071 ;
1072
1073 constint:
1074 INT_CONST
1075 | UINT_CONST
1076 | INT64_CONST
1077 | UINT64_CONST
1078 ;
1079
1080 conststring: TERMSTRING ;
1081
1082 constbool:
1083 conststring
1084 | constint
1085 ;
1086
1087 /* Push all idents thru here for tracking */
1088 ident: IDENT ;
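To show how the datalist and dataitem rules nest, here is a minimal recursive-descent sketch in Python. It is deliberately simplified: constants are limited to integers and double-quoted strings, a datalist is treated as a possibly empty comma-separated sequence, and the names tokenize, parse_datalist, parse_dataitem and parse are hypothetical. The definitive parser remains the one generated from ncgen.y.

```python
import re

# One token: a quoted string, an integer, or one of { } ,
TOKEN = re.compile(r'\s*(?:(?P<str>"(?:[^"\\]|\\.)*")|(?P<num>-?\d+)|(?P<punct>[{},]))')


def tokenize(text):
    """Split text into (kind, value) pairs."""
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected input at offset {pos}")
        pos = m.end()
        if m.lastgroup == "str":
            tokens.append(("const", m.group("str")[1:-1]))
        elif m.lastgroup == "num":
            tokens.append(("const", int(m.group("num"))))
        else:
            tokens.append((m.group("punct"), None))
    return tokens


def parse_datalist(tokens, i=0):
    """datalist: empty, or dataitem followed by ',' dataitem repetitions."""
    items = []
    if i < len(tokens) and tokens[i][0] in ("const", "{"):
        item, i = parse_dataitem(tokens, i)
        items.append(item)
        while i < len(tokens) and tokens[i][0] == ",":
            item, i = parse_dataitem(tokens, i + 1)
            items.append(item)
    return items, i


def parse_dataitem(tokens, i):
    """dataitem: a constant, or '{' datalist '}'."""
    if i >= len(tokens):
        raise ValueError("expected a data item")
    kind, value = tokens[i]
    if kind == "const":
        return value, i + 1
    if kind == "{":
        inner, i = parse_datalist(tokens, i + 1)
        if i >= len(tokens) or tokens[i][0] != "}":
            raise ValueError("missing closing brace")
        return inner, i + 1
    raise ValueError(f"unexpected token {kind!r}")


def parse(text):
    """Parse a whole datalist into nested Python lists."""
    tokens = tokenize(text)
    items, i = parse_datalist(tokens)
    if i != len(tokens):
        raise ValueError("trailing input after datalist")
    return items


if __name__ == "__main__":
    print(parse('{"1", "two"}, {"three"}'))
```

Each brace pair in the input becomes one nested list in the result, which is the structure the datalist rules earlier in this page rely on.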
1089
1090
1091
1092Printed: 119-6-20 $Date: 2010/04/29 16:38:55 $ NCGEN(1)