i.cluster(1)               GRASS GIS User's Manual              i.cluster(1)

NAME
i.cluster - Generates spectral signatures for land cover types in an
image using a clustering algorithm.
The resulting signature file is used as input for i.maxlik, to generate
an unsupervised image classification.

KEYWORDS
imagery, classification, signatures

SYNOPSIS
i.cluster
i.cluster --help
i.cluster group=name subgroup=name signaturefile=name classes=integer
    [seed=name] [sample=rows,cols] [iterations=integer]
    [convergence=float] [separation=float] [min_size=integer]
    [reportfile=name] [--overwrite] [--help] [--verbose] [--quiet] [--ui]

Flags:
--overwrite
    Allow output files to overwrite existing files

--help
    Print usage summary

--verbose
    Verbose module output

--quiet
    Quiet module output

--ui
    Force launching GUI dialog

Parameters:
group=name [required]
    Name of input imagery group

subgroup=name [required]
    Name of input imagery subgroup

signaturefile=name [required]
    Name for output file containing result signatures

classes=integer [required]
    Initial number of classes
    Options: 1-255

seed=name
    Name of file containing initial signatures

sample=rows,cols
    Number of rows and columns over which a sample pixel is taken

iterations=integer
    Maximum number of iterations
    Default: 30

convergence=float
    Percent convergence
    Options: 0-100
    Default: 98.0

separation=float
    Cluster separation
    Default: 0.0

min_size=integer
    Minimum number of pixels in a class
    Default: 17

reportfile=name
    Name for output file containing final report

DESCRIPTION
i.cluster performs the first pass in the two-pass unsupervised
classification of imagery, while the GRASS module i.maxlik executes the
second pass. Both commands must be run to complete the unsupervised
classification.

i.cluster is a clustering algorithm (a modification of the k-means
clustering algorithm) that reads through the (raster) imagery data and
builds pixel clusters based on the spectral reflectances of the pixels
(see Figure). The pixel clusters are imagery categories that can be
related to land cover types on the ground. The spectral distributions
of the clusters (e.g., land cover spectral signatures) are influenced
by six parameters set by the user. One of these is the initial number
of clusters to be discriminated.

Fig.: Land use/land cover clustering of LANDSAT scene (simplified)

i.cluster starts by generating spectral signatures for this number of
clusters and "attempts" to end up with this number of clusters during
the clustering process. The resulting number of clusters and their
spectral distributions, however, are also influenced by the range of
the spectral values (category values) in the image files and by the
other parameters set by the user. These parameters are: the minimum
cluster size, the minimum cluster separation, the percent convergence,
the maximum number of iterations, and the row and column sampling
intervals.

The resulting cluster spectral signatures are composed of cluster means
and covariance matrices. These are used in the second pass (i.maxlik)
to classify the image. The resulting clusters, or spectral classes, can
be related to land cover types on the ground.

The user has to specify the name of the group, the name of the
subgroup, the name of a file to contain the resulting signatures, the
initial number of clusters to be discriminated, and optionally other
parameters (see below). The group should contain the imagery files that
the user wishes to classify. The subgroup is a subset of this group and
should contain only the imagery band files that the user wishes to
classify. Note that this subgroup must contain more than one band file.
The user must create the group and subgroup by running the GRASS
program i.group before running i.cluster. The purpose of the group and
subgroup is to collect map layers for classification or analysis.

The signaturefile is the file that will contain the resulting
signatures, which can be used as input for i.maxlik. The classes value
is the initial number of clusters to be discriminated; any parameter
values left unspecified are set to their default values.
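
As a rough illustration of what a signature holds, the following Python
sketch computes the mean vector and the covariance matrix for the pixels
of one cluster. It is not the i.cluster implementation: the function
name cluster_signature, the sample pixel values and the n - 1 divisor
are assumptions made for this example.

```python
def cluster_signature(pixels):
    """Return (means, covariance) for the pixels of one cluster.

    pixels: list of equal-length tuples, one value per band.
    Assumption: sample covariance with divisor n - 1.
    """
    n = len(pixels)
    bands = len(pixels[0])
    # Per-band mean of the cluster
    means = [sum(p[b] for p in pixels) / n for b in range(bands)]
    # Band-by-band covariance matrix
    cov = [
        [
            sum((p[i] - means[i]) * (p[j] - means[j]) for p in pixels) / (n - 1)
            for j in range(bands)
        ]
        for i in range(bands)
    ]
    return means, cov


# Hypothetical two-band cluster made of four pixels
means, cov = cluster_signature([(10, 20), (12, 24), (14, 28), (16, 32)])
print(means)
print(cov)
```

The covariance matrix is symmetric and has one row and one column per
band in the subgroup.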

For all raster maps used to generate the signature file, it is
recommended to set a semantic label. Use r.support to set the semantic
label of each member of the imagery group. Signatures generated for one
scene are suitable for classification of other scenes as long as they
consist of the same raster bands (i.e., the semantic labels match). If
semantic labels are not set, the resulting signature file can only be
used to classify the same imagery group that was used to generate the
signatures.

Parameters:
group=name
    The name of the group which contains the imagery files that the
    user wishes to classify.

subgroup=name
    The name of the subset of the group specified in the group option.
    It must contain only imagery band files, and more than one band
    file. The user must create a group and a subgroup by running the
    GRASS program i.group before running i.cluster.

signaturefile=name
    The name assigned to the output signature file, which contains the
    signatures of the classes and can be used as the input file for the
    GRASS program i.maxlik for an unsupervised classification.

classes=value
    The number of clusters that will initially be identified in the
    clustering process before the iterations begin.

seed=name
    The name of a seed signature file (optional). Seed signatures
    contain cluster means and covariance matrices that were calculated
    prior to the current run of i.cluster. They may be obtained from a
    previous run of i.cluster or from the training sites of a
    supervised classification (e.g., using the signature file output by
    g.gui.iclass). The purpose of seed signatures is to optimize the
    cluster decision boundaries (means) for the number of clusters
    specified.

sample=rows,cols
    These numbers are optional; their default values are based on the
    size of the data set such that the total number of pixels to be
    processed is approximately 10,000 (subject to rounding). The
    smaller these numbers, the larger the sample size used to generate
    the signatures for the classes defined.

iterations=value
    This parameter sets the maximum number of iterations. It should be
    greater than the number of iterations expected to be needed to
    reach the desired percent convergence. The default value is 30. If
    the number of iterations reaches the maximum designated by the
    user, the user may want to rerun i.cluster with a higher number of
    iterations (see reportfile).
    Default: 30

convergence=value
    A high percent convergence indicates the point at which cluster
    means become stable during the iteration process. The default value
    is 98.0 percent. When clusters are being created, their means
    constantly change as pixels are assigned to them and the means are
    recalculated to include the new pixel. After all clusters have been
    created, i.cluster begins iterations that change cluster means by
    maximizing the distances between them. As these means shift, a
    higher and higher convergence is approached. Because means will
    never become totally static, a percent convergence and a maximum
    number of iterations are supplied to stop the iterative process.
    The percent convergence should be reached before the maximum number
    of iterations. If the maximum number of iterations is reached, it
    is probable that the desired percent convergence was not reached.
    The number of iterations is reported in the cluster statistics in
    the report file (see reportfile).
    Default: 98.0

separation=value
    This is the minimum separation below which clusters will be merged
    in the iteration process. The default value is 0.0. This is an
    image-specific number (a "magic" number) that depends on the image
    data being classified and the number of final clusters that are
    acceptable. Its determination requires experimentation. Note that
    as the minimum class (or cluster) separation is increased, the
    maximum number of iterations should also be increased to achieve
    this separation with a high percentage of convergence (see
    convergence).
    Default: 0.0

min_size=value
    This is the minimum number of pixels that will be used to define a
    cluster, and is therefore the minimum number of pixels for which
    means and covariance matrices will be calculated.
    Default: 17

reportfile=name
    The name of an optional output file which contains the result,
    i.e., the statistics for each cluster. Also included are the
    resulting percent convergence for the clusters, the number of
    iterations that were required to achieve the convergence, and the
    separability matrix.
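
To illustrate the effect of min_size, the following Python sketch
reassigns the pixels of undersized clusters to the nearest remaining
cluster, in the spirit of the step described under "Algorithm used for
i.cluster". It is a simplified illustration, not the module's code: the
function name drop_small_clusters and the sample data are hypothetical,
and plain Euclidean distance to the cluster means is assumed.

```python
def drop_small_clusters(pixels, labels, means, min_size=17):
    """Reassign pixels of clusters with fewer than min_size members.

    Returns (new_labels, kept_cluster_ids). Illustrative only.
    """
    counts = {k: labels.count(k) for k in range(len(means))}
    keep = [k for k in range(len(means)) if counts[k] >= min_size]
    if not keep:
        # Nothing large enough to absorb the pixels: leave labels as they are
        return list(labels), keep

    def sqdist(p, m):
        # Squared Euclidean distance (ranks the same as the distance itself)
        return sum((a - b) ** 2 for a, b in zip(p, m))

    new_labels = []
    for p, label in zip(pixels, labels):
        if label in keep:
            new_labels.append(label)
        else:
            new_labels.append(min(keep, key=lambda k: sqdist(p, means[k])))
    return new_labels, keep


# Hypothetical single-band data; cluster 1 has only one pixel
pixels = [(1,), (2,), (9,), (19,), (20,), (21,)]
labels = [0, 0, 1, 2, 2, 2]
means = [[1.5], [9.0], [20.0]]
new_labels, kept = drop_small_clusters(pixels, labels, means, min_size=2)
print(new_labels, kept)
```

In this example the single pixel of cluster 1 moves to cluster 0, whose
mean is closer than that of cluster 2.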

NOTES
Sampling method
i.cluster does not cluster all pixels, but only a sample (see parameter
sample). The clustering therefore does not assign every pixel to a
cluster; it only generates signatures that are representative of each
cluster. When i.cluster is run on the same data with the same number of
classes but with different sample sizes, slightly different signatures
are likely to be obtained for each cluster at each run.
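
The relationship between the sampling interval and the sample size can
be sketched in Python as follows. The rule used here for the default
interval (about 100 sampled rows by 100 sampled columns, giving roughly
10,000 pixels) is an assumption chosen to match the figure quoted for
the sample parameter; this page does not state the exact formula. The
function names are hypothetical.

```python
def default_sample_interval(nrows, ncols, target_per_axis=100):
    """Assumed rule: aim for about 100 sampled rows and 100 sampled columns."""
    row_step = max(1, nrows // target_per_axis)
    col_step = max(1, ncols // target_per_axis)
    return row_step, col_step


def sample_positions(nrows, ncols, row_step, col_step):
    """Return the (row, col) positions taken every row_step rows and col_step columns."""
    return [
        (r, c)
        for r in range(0, nrows, row_step)
        for c in range(0, ncols, col_step)
    ]


# Hypothetical region of 1000 rows by 2000 columns
row_step, col_step = default_sample_interval(1000, 2000)
positions = sample_positions(1000, 2000, row_step, col_step)
print(row_step, col_step, len(positions))

# Halving both intervals quadruples the number of sampled pixels
denser = sample_positions(1000, 2000, row_step // 2, col_step // 2)
print(len(denser))
```

With these inputs the intervals are 10 rows and 20 columns, which yields
10,000 sampled positions; halving both intervals yields 40,000.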

Algorithm used for i.cluster
The algorithm uses input parameters set by the user: the initial number
of clusters, the minimum distance between clusters, the minimum size of
each cluster, the correspondence between iterations that is desired,
and the maximum number of iterations to be carried out. The user also
chooses whether all pixels are to be clustered or only every "x"th row
and "y"th column (sampling).

In the 1st pass, initial cluster means for each band are defined by
giving the first cluster a value equal to the band mean minus its
standard deviation, and the last cluster a value equal to the band mean
plus its standard deviation, with all other cluster means equally
spaced in between. Each pixel is then assigned to the class which it is
closest to, distance being measured as Euclidean distance. All clusters
separated by less than the user-specified minimum distance are then
merged. If a cluster has fewer than the user-specified minimum number
of pixels, all of its pixels are reassigned to the next nearest
cluster. New cluster means are calculated for each band as the average
of the raster pixel values in that band for all pixels present in that
cluster.
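
The initialization and assignment steps of the 1st pass can be sketched
in Python as follows. This is a minimal illustration under stated
assumptions, not the i.cluster source: the function names are
hypothetical, the population standard deviation (divisor n) is assumed,
and the merging and minimum-size steps are left out.

```python
import math


def initial_means(pixels, classes):
    """Spread initial cluster means from (mean - std) to (mean + std) per band."""
    n = len(pixels)
    bands = len(pixels[0])
    band_mean = [sum(p[b] for p in pixels) / n for b in range(bands)]
    # Assumption: population standard deviation (divisor n)
    band_std = [
        math.sqrt(sum((p[b] - band_mean[b]) ** 2 for p in pixels) / n)
        for b in range(bands)
    ]
    means = []
    for k in range(classes):
        # frac runs from 0 (first cluster) to 1 (last cluster)
        frac = k / (classes - 1) if classes > 1 else 0.5
        means.append(
            [band_mean[b] + (2 * frac - 1) * band_std[b] for b in range(bands)]
        )
    return means


def nearest_cluster(pixel, means):
    """Index of the cluster mean closest to the pixel (Euclidean distance)."""
    def sqdist(m):
        return sum((a - b) ** 2 for a, b in zip(pixel, m))
    return min(range(len(means)), key=lambda k: sqdist(means[k]))


# Hypothetical single-band sample with mean 5 and standard deviation 2
pixels = [(2,), (4,), (4,), (4,), (5,), (5,), (7,), (9,)]
start = initial_means(pixels, classes=3)
print(start)
print(nearest_cluster((9,), start))
```

For this sample the three initial means are 3, 5 and 7, and a pixel with
value 9 is assigned to the last cluster.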

In the 2nd pass, pixels are reassigned to clusters based on the new
cluster means. The cluster means are then recalculated. This process is
repeated until the correspondence between iterations reaches a
user-specified level, or until the specified maximum number of
iterations has been carried out, whichever comes first.
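
The iteration of the 2nd pass can be sketched in Python as follows. The
sketch assumes that the correspondence between iterations is the
percentage of pixels whose cluster assignment did not change from one
iteration to the next; this page does not define the measure precisely,
so treat that definition, the function names and the sample data as
assumptions.

```python
def assign(pixels, means):
    """Assign each pixel to the index of its nearest cluster mean."""
    def sqdist(p, m):
        return sum((a - b) ** 2 for a, b in zip(p, m))
    return [
        min(range(len(means)), key=lambda k: sqdist(p, means[k]))
        for p in pixels
    ]


def recompute_means(pixels, labels, means):
    """Per-band average of the pixels in each cluster; empty clusters keep their mean."""
    new_means = []
    for k, old in enumerate(means):
        members = [p for p, label in zip(pixels, labels) if label == k]
        if not members:
            new_means.append(list(old))
            continue
        new_means.append(
            [sum(p[b] for p in members) / len(members) for b in range(len(old))]
        )
    return new_means


def iterate(pixels, means, iterations=30, convergence=98.0):
    """Repeat reassignment until the convergence level or the iteration limit is reached."""
    labels = assign(pixels, means)
    done, percent = 0, 0.0
    for done in range(1, iterations + 1):
        means = recompute_means(pixels, labels, means)
        new_labels = assign(pixels, means)
        unchanged = sum(a == b for a, b in zip(labels, new_labels))
        percent = 100.0 * unchanged / len(pixels)
        labels = new_labels
        if percent >= convergence:
            break
    return means, labels, done, percent


# Hypothetical single-band data and starting means
pixels = [(1,), (2,), (3,), (10,), (11,), (12,)]
final_means, labels, done, percent = iterate(pixels, [[0.0], [5.0]])
print(final_means, labels, done, percent)
```

Here the loop stops after two iterations because no pixel changes
cluster in the second one. If the loop instead ends because the
iteration limit was hit, the returned percentage stays below the
requested level, which corresponds to the situation described for the
iterations parameter.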

EXAMPLE
Preparing the statistics for unsupervised classification of a LANDSAT
scene within the North Carolina location:

    # Set computational region to match the scene
    g.region raster=lsat7_2002_10 -p
    # Store VIS, NIR, MIR into group/subgroup (leaving out TIR)
    i.group group=lsat7_2002 subgroup=res_30m \
        input=lsat7_2002_10,lsat7_2002_20,lsat7_2002_30,lsat7_2002_40,lsat7_2002_50,lsat7_2002_70
    # Generate signature file and report
    i.cluster group=lsat7_2002 subgroup=res_30m \
        signaturefile=cluster_lsat2002 \
        classes=10 reportfile=rep_clust_lsat2002.txt

To complete the unsupervised classification, i.maxlik is subsequently
used. See the example in its manual page.

The signature file obtained in the example above can be used to
classify the current imagery group only (lsat7_2002). To re-use the
signature file for the classification of different imagery group(s),
set semantic labels for each group member beforehand, i.e., before
generating the signature file. Semantic labels are set by means of
r.support as shown below:

    # Define semantic labels for all LANDSAT bands
    r.support map=lsat7_2002_10 semantic_label=TM7_1
    r.support map=lsat7_2002_20 semantic_label=TM7_2
    r.support map=lsat7_2002_30 semantic_label=TM7_3
    r.support map=lsat7_2002_40 semantic_label=TM7_4
    r.support map=lsat7_2002_50 semantic_label=TM7_5
    r.support map=lsat7_2002_61 semantic_label=TM7_61
    r.support map=lsat7_2002_62 semantic_label=TM7_62
    r.support map=lsat7_2002_70 semantic_label=TM7_7
    r.support map=lsat7_2002_80 semantic_label=TM7_8

SEE ALSO
• Image classification wiki page

• Historical reference: the GRASS GIS 4 Image Processing manual (PDF)

• Wikipedia article on k-means clustering (note that i.cluster uses a
  modification of the k-means clustering algorithm)

r.support, g.gui.iclass, i.group, i.gensig, i.maxlik, i.segment,
i.smap, r.kappa

AUTHORS
Michael Shapiro, U.S. Army Construction Engineering Research Laboratory
Tao Wen, University of Illinois at Urbana-Champaign, Illinois
Semantic label support: Maris Nartiss, University of Latvia

SOURCE CODE
Available at: i.cluster source code (history)

Accessed: Saturday Oct 28 18:19:00 2023

© 2003-2023 GRASS Development Team, GRASS GIS 8.3.1 Reference Manual

GRASS 8.3.1                                                     i.cluster(1)