i.cluster(1)                GRASS GIS User's Manual               i.cluster(1)

NAME

       i.cluster  -  Generates spectral signatures for land cover types in an
       image using a clustering algorithm.
       The resulting signature file is used as input for i.maxlik to generate
       an unsupervised image classification.

KEYWORDS

       imagery, classification, signatures

SYNOPSIS

       i.cluster
       i.cluster --help
       i.cluster  group=name  subgroup=name signaturefile=name classes=integer
       [seed=name]  [sample=rows,cols]  [iterations=integer]
       [convergence=float]  [separation=float]  [min_size=integer]
       [reportfile=name]  [--overwrite]  [--help]  [--verbose]  [--quiet]
       [--ui]

   Flags:
       --overwrite
           Allow output files to overwrite existing files

       --help
           Print usage summary

       --verbose
           Verbose module output

       --quiet
           Quiet module output

       --ui
           Force launching GUI dialog

   Parameters:
       group=name [required]
           Name of input imagery group

       subgroup=name [required]
           Name of input imagery subgroup

       signaturefile=name [required]
           Name for output file containing result signatures

       classes=integer [required]
           Initial number of classes
           Options: 1-255

       seed=name
           Name of file containing initial signatures

       sample=rows,cols
           Number of rows and columns over which a sample pixel is taken

       iterations=integer
           Maximum number of iterations
           Default: 30

       convergence=float
           Percent convergence
           Options: 0-100
           Default: 98.0

       separation=float
           Cluster separation
           Default: 0.0

       min_size=integer
           Minimum number of pixels in a class
           Default: 17

       reportfile=name
           Name for output file containing final report

DESCRIPTION

       i.cluster performs the first pass in the two-pass unsupervised
       classification of imagery, while the GRASS module i.maxlik executes
       the second pass.  Both commands must be run to complete the
       unsupervised classification.

       i.cluster is a clustering algorithm (a modification of the k-means
       clustering algorithm) that reads through the (raster) imagery data and
       builds pixel clusters based on the spectral reflectances of the pixels
       (see Figure).  The pixel clusters are imagery categories that can be
       related to land cover types on the ground.  The spectral distributions
       of the clusters (e.g., land cover spectral signatures) are influenced
       by six parameters set by the user.  One of these is the initial number
       of clusters to be discriminated.

       Fig.: Land use/land cover clustering of LANDSAT scene (simplified)

       i.cluster starts by generating spectral signatures for this number of
       clusters and "attempts" to end up with this number of clusters during
       the clustering process.  The resulting number of clusters and their
       spectral distributions, however, are also influenced by the range of
       the spectral values (category values) in the image files and by the
       other parameters set by the user.  These parameters are: the minimum
       cluster size, the minimum cluster separation, the percent convergence,
       the maximum number of iterations, and the row and column sampling
       intervals.

       The resulting cluster spectral signatures are composed of cluster
       means and covariance matrices, which are used in the second pass
       (i.maxlik) to classify the image.  The resulting clusters, or spectral
       classes, can be related to land cover types on the ground.

       The user has to specify the name of the group, the name of the
       subgroup, the name of a file to contain the resulting signatures, the
       initial number of clusters to be discriminated, and optionally other
       parameters (see below).  The group should contain the imagery files
       that the user wishes to classify.  The subgroup is a subset of this
       group and should contain only the imagery band files that the user
       wishes to classify.  Note that this subgroup must contain more than
       one band file.  The user must create the group and subgroup by running
       the GRASS program i.group before running i.cluster.  The purpose of
       the group and subgroup is to collect map layers for classification or
       analysis.  The signaturefile is the file that will contain the
       resulting signatures, which can be used as input for i.maxlik.  The
       classes value is the initial number of clusters to be discriminated;
       any parameter values left unspecified are set to their default values.

       It is recommended that all raster maps used to generate the signature
       file have a semantic label set.  Use r.support to set the semantic
       label of each member of the imagery group.  Signatures generated for
       one scene are suitable for classification of other scenes as long as
       they consist of the same raster bands (semantic labels match).  If
       semantic labels are not set, the resulting signature file can be used
       to classify only the same imagery group that was used to generate the
       signatures.

   Parameters:
       group=name
           The name of the group which contains the imagery files that the
           user wishes to classify.

       subgroup=name
           The name of the subset of the group specified in the group option,
           which must contain only imagery band files and more than one band
           file.  The user must create a group and a subgroup by running the
           GRASS program i.group before running i.cluster.

       signaturefile=name
           The name assigned to the output signature file, which contains the
           signatures of the classes and can be used as the input file for
           the GRASS program i.maxlik for an unsupervised classification.

       classes=value
           The number of clusters that will initially be identified in the
           clustering process before the iterations begin.

       seed=name
           The name of a seed signature file (optional).  Seed signatures
           contain cluster means and covariance matrices which were
           calculated prior to the current run of i.cluster.  They may be
           acquired from a previous run of i.cluster or from a supervised
           classification signature training site section (e.g., using the
           signature file output by g.gui.iclass).  The purpose of seed
           signatures is to optimize the cluster decision boundaries (means)
           for the number of clusters specified.

       sample=rows,cols
           The row and column sampling intervals.  These numbers are
           optional; their default values are based on the size of the data
           set such that the total number of pixels to be processed is
           approximately 10,000 (subject to rounding).  The smaller these
           numbers, the larger the sample size used to generate the
           signatures for the classes defined.

       iterations=value
           The maximum number of iterations.  It should be greater than the
           number of iterations expected to be needed to reach the desired
           percent convergence.  The default value is 30.  If the number of
           iterations reaches the maximum designated by the user, the user
           may want to rerun i.cluster with a higher number of iterations
           (see reportfile).
           Default: 30

       convergence=value
           The percent convergence is the point at which cluster means are
           considered stable during the iteration process.  The default value
           is 98.0 percent.  When clusters are being created, their means
           constantly change as pixels are assigned to them and the means are
           recalculated to include the new pixel.  After all clusters have
           been created, i.cluster begins iterations that change cluster
           means by maximizing the distances between them.  As these means
           shift, a higher and higher convergence is approached.  Because
           means will never become totally static, a percent convergence and
           a maximum number of iterations are supplied to stop the iterative
           process.  The percent convergence should be reached before the
           maximum number of iterations.  If the maximum number of iterations
           is reached, it is probable that the desired percent convergence
           was not reached.  The number of iterations is reported in the
           cluster statistics in the report file (see reportfile).
           Default: 98.0

       separation=value
           The minimum separation below which clusters will be merged in the
           iteration process.  The default value is 0.0.  This is an
           image-specific number (a "magic" number) that depends on the image
           data being classified and the number of final clusters that are
           acceptable.  Its determination requires experimentation.  Note
           that as the minimum class (or cluster) separation is increased,
           the maximum number of iterations should also be increased to
           achieve this separation with a high percentage of convergence (see
           convergence).
           Default: 0.0

       min_size=value
           The minimum number of pixels that will be used to define a
           cluster, and therefore the minimum number of pixels for which
           means and covariance matrices will be calculated.
           Default: 17

       reportfile=name
           The name of an optional report file which contains the result,
           i.e., the statistics for each cluster.  Also included are the
           resulting percent convergence for the clusters, the number of
           iterations that were required to achieve the convergence, and the
           separability matrix.

NOTES

   Sampling method
       i.cluster does not cluster all pixels, but only a sample (see parameter
       sample).  The clustering therefore does not assign every pixel to a
       cluster; it only generates signatures that are representative of each
       cluster.  When i.cluster is run on the same data with the same number
       of classes but with different sample sizes, slightly different
       signatures are likely to be obtained for each cluster at each run.

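       The sampling described above can be sketched in Python.  This is an
       illustration only, not the module's code: the helper names
       default_intervals and sample_grid are hypothetical, and the rule of
       aiming for about 100 sampled rows and 100 sampled columns is an
       assumption chosen to reproduce the "approximately 10,000 pixels"
       mentioned for the sample parameter.

```python
def default_intervals(nrows, ncols, per_axis=100):
    """Hypothetical default: aim for about per_axis sampled rows and columns."""
    row_step = max(1, nrows // per_axis)
    col_step = max(1, ncols // per_axis)
    return row_step, col_step


def sample_grid(raster, row_step, col_step):
    """Return the pixels found at every row_step-th row and col_step-th column."""
    return [row[::col_step] for row in raster[::row_step]]


# A small synthetic 6 x 8 "band" whose pixel value encodes its position.
raster = [[10 * r + c for c in range(8)] for r in range(6)]

# Larger intervals mean fewer sampled pixels.
sampled = sample_grid(raster, row_step=2, col_step=4)
n_sampled = sum(len(row) for row in sampled)
```

       Under these assumptions a 1000 x 2000 raster would be sampled at
       every 10th row and every 20th column.  Because only the sampled
       pixels contribute to the cluster means, changing the intervals
       changes the signatures slightly.
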
   Algorithm used for i.cluster
       The algorithm uses input parameters set by the user: the initial
       number of clusters, the minimum distance between clusters, the desired
       correspondence between iterations, and the minimum size of each
       cluster.  The user also chooses whether all pixels are clustered or
       only every "x"th row and "y"th column (sampling), and sets the maximum
       number of iterations to be carried out.

       In the 1st pass, initial cluster means for each band are defined by
       giving the first cluster a value equal to the band mean minus its
       standard deviation, and the last cluster a value equal to the band
       mean plus its standard deviation, with all other cluster means equally
       spaced in between.  Each pixel is then assigned to the class to which
       it is closest, distance being measured as Euclidean distance.  All
       clusters separated by less than the user-specified minimum distance
       are then merged.  If a cluster has fewer than the user-specified
       minimum number of pixels, all of its pixels are reassigned to the next
       nearest cluster.  New cluster means are calculated for each band as
       the average of the raster pixel values in that band for all pixels
       present in that cluster.

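       The initialization and assignment steps of the 1st pass can be
       sketched as follows.  This is a minimal illustration under stated
       assumptions, not the i.cluster source: the function names are
       hypothetical, the population standard deviation is assumed, and
       cluster merging and the minimum-size check are left out.

```python
import math
import statistics


def initial_means(bands, n_clusters):
    """Spread initial means from (mean - stddev) to (mean + stddev) per band."""
    means = [[0.0] * len(bands) for _ in range(n_clusters)]
    for b, values in enumerate(bands):
        mu = statistics.fmean(values)
        sigma = statistics.pstdev(values)  # assumption: population std. dev.
        for k in range(n_clusters):
            # frac runs from 0 (first cluster) to 1 (last cluster).
            frac = k / (n_clusters - 1) if n_clusters > 1 else 0.5
            means[k][b] = mu - sigma + 2.0 * sigma * frac
    return means


def nearest_cluster(pixel, means):
    """Index of the cluster mean closest to the pixel (Euclidean distance)."""
    return min(range(len(means)), key=lambda k: math.dist(pixel, means[k]))


# Two synthetic bands with eight sampled pixels each.
bands = [
    [2, 4, 4, 4, 5, 5, 7, 9],          # band mean 5, std. dev. 2
    [12, 14, 14, 14, 15, 15, 17, 19],  # band mean 15, std. dev. 2
]
means = initial_means(bands, n_clusters=3)
pixels = list(zip(*bands))  # one (band1, band2) tuple per pixel
labels = [nearest_cluster(p, means) for p in pixels]
```

       With three clusters, the initial means of the first band are 3, 5
       and 7, that is, the band mean minus one standard deviation, the
       band mean, and the band mean plus one standard deviation.
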
       In the 2nd pass, pixels are reassigned to clusters based on the new
       cluster means, and the cluster means are then recalculated.  This
       process is repeated until the correspondence between iterations
       reaches a user-specified level or until the maximum number of
       iterations has been carried out, whichever comes first.

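       The stopping rule of the 2nd pass can be sketched as a loop.  The
       sketch assumes that "correspondence between iterations" means the
       percentage of pixels that keep their cluster from one iteration to
       the next; that reading, the function names, and the synthetic data
       are assumptions made for illustration, not taken from the i.cluster
       source.  Merging by separation and the minimum cluster size are
       omitted.

```python
import math


def assign(pixels, means):
    """Label each pixel with the index of its nearest mean (Euclidean)."""
    return [min(range(len(means)), key=lambda k: math.dist(p, means[k]))
            for p in pixels]


def update_means(pixels, labels, means):
    """Recompute each mean as the per-band average of its member pixels."""
    new_means = []
    for k, old in enumerate(means):
        members = [p for p, lab in zip(pixels, labels) if lab == k]
        if members:
            new_means.append([sum(band) / len(members) for band in zip(*members)])
        else:
            new_means.append(list(old))  # keep the mean of an empty cluster
    return new_means


def percent_stable(previous, current):
    """Percentage of pixels whose cluster label did not change."""
    same = sum(1 for a, b in zip(previous, current) if a == b)
    return 100.0 * same / len(current)


def iterate(pixels, means, convergence=98.0, iterations=30):
    """Reassign and re-average until stable enough or out of iterations."""
    labels = assign(pixels, means)
    n, stable = 0, 0.0
    for n in range(1, iterations + 1):
        means = update_means(pixels, labels, means)
        new_labels = assign(pixels, means)
        stable = percent_stable(labels, new_labels)
        labels = new_labels
        if stable >= convergence:
            break  # converged before the iteration limit
    return means, labels, n, stable


# Two well-separated synthetic groups of two-band pixels.
pixels = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
means, labels, n_iter, stable = iterate(pixels, [[0.0, 0.0], [5.0, 5.0]])
```

       If the loop ends with n_iter equal to the iteration limit and
       stable still below the requested convergence, the situation matches
       the one described for the iterations parameter, where a rerun with
       a higher limit may be worthwhile.
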

EXAMPLE

       Preparing the statistics for unsupervised classification of a LANDSAT
       scene within the North Carolina location:

           # Set computational region to match the scene
           g.region raster=lsat7_2002_10 -p
           # store VIS, NIR, MIR into group/subgroup (leaving out TIR)
           i.group group=lsat7_2002 subgroup=res_30m \
             input=lsat7_2002_10,lsat7_2002_20,lsat7_2002_30,lsat7_2002_40,lsat7_2002_50,lsat7_2002_70
           # generate signature file and report
           i.cluster group=lsat7_2002 subgroup=res_30m \
             signaturefile=cluster_lsat2002 \
             classes=10 reportfile=rep_clust_lsat2002.txt

       To complete the unsupervised classification, i.maxlik is subsequently
       used.  See the example in its manual page.

       The signature file obtained in the example above can be used to
       classify only the current imagery group (lsat7_2002).  To re-use the
       signature file for the classification of different imagery group(s),
       set semantic labels for each group member beforehand, i.e., before
       generating the signature file.  Semantic labels are set by means of
       r.support as shown below:

           # Define semantic labels for all LANDSAT bands
           r.support map=lsat7_2002_10 semantic_label=TM7_1
           r.support map=lsat7_2002_20 semantic_label=TM7_2
           r.support map=lsat7_2002_30 semantic_label=TM7_3
           r.support map=lsat7_2002_40 semantic_label=TM7_4
           r.support map=lsat7_2002_50 semantic_label=TM7_5
           r.support map=lsat7_2002_61 semantic_label=TM7_61
           r.support map=lsat7_2002_62 semantic_label=TM7_62
           r.support map=lsat7_2002_70 semantic_label=TM7_7
           r.support map=lsat7_2002_80 semantic_label=TM7_8

SEE ALSO

           •   Image classification wiki page

           •   Historical reference: the GRASS GIS 4 Image Processing manual
               (PDF)

           •   Wikipedia article on k-means clustering (note that i.cluster
               uses a modification of the k-means clustering algorithm)

       r.support, g.gui.iclass, i.group, i.gensig, i.maxlik, i.segment,
       i.smap, r.kappa

AUTHORS

       Michael Shapiro, U.S. Army Construction Engineering Research Laboratory
       Tao Wen, University of Illinois at Urbana-Champaign, Illinois
       Semantic label support: Maris Nartiss, University of Latvia

SOURCE CODE

       Available at: i.cluster source code (history)

       Accessed: Mon Jun 20 16:47:26 2022

       © 2003-2022 GRASS Development Team, GRASS GIS 8.2.0 Reference Manual

GRASS 8.2.0                                                       i.cluster(1)