i.cluster(1)                GRASS GIS User's Manual               i.cluster(1)

NAME

       i.cluster - Generates spectral signatures for land cover types in an
       image using a clustering algorithm.
       The resulting signature file is used as input for i.maxlik, to generate
       an unsupervised image classification.

KEYWORDS

       imagery, classification, signatures

SYNOPSIS

       i.cluster
       i.cluster --help
       i.cluster group=name subgroup=name signaturefile=name classes=integer
       [seed=name] [sample=rows,cols] [iterations=integer]
       [convergence=float] [separation=float] [min_size=integer]
       [reportfile=name] [--overwrite] [--help] [--verbose] [--quiet] [--ui]

   Flags:
       --overwrite
           Allow output files to overwrite existing files

       --help
           Print usage summary

       --verbose
           Verbose module output

       --quiet
           Quiet module output

       --ui
           Force launching GUI dialog

   Parameters:
       group=name [required]
           Name of input imagery group

       subgroup=name [required]
           Name of input imagery subgroup

       signaturefile=name [required]
           Name for output file containing result signatures

       classes=integer [required]
           Initial number of classes
           Options: 1-255

       seed=name
           Name of file containing initial signatures

       sample=rows,cols
           Number of rows and columns over which a sample pixel is taken

       iterations=integer
           Maximum number of iterations
           Default: 30

       convergence=float
           Percent convergence
           Options: 0-100
           Default: 98.0

       separation=float
           Cluster separation
           Default: 0.0

       min_size=integer
           Minimum number of pixels in a class
           Default: 17

       reportfile=name
           Name for output file containing final report

DESCRIPTION

       i.cluster performs the first pass in the two-pass unsupervised
       classification of imagery, while the GRASS module i.maxlik executes
       the second pass. Both commands must be run to complete the
       unsupervised classification.

       i.cluster is a clustering algorithm (a modification of the k-means
       clustering algorithm) that reads through the (raster) imagery data and
       builds pixel clusters based on the spectral reflectances of the pixels
       (see Figure). The pixel clusters are imagery categories that can be
       related to land cover types on the ground. The spectral distributions
       of the clusters (i.e., the land cover spectral signatures) are
       influenced by six parameters set by the user. The first of these is
       the initial number of clusters to be discriminated.

       Fig.: Land use/land cover clustering of LANDSAT scene (simplified)

       i.cluster starts by generating spectral signatures for the initial
       number of clusters and "attempts" to end up with this number of
       clusters during the clustering process. The resulting number of
       clusters and their spectral distributions, however, are also
       influenced by the range of the spectral values (category values) in
       the image files and the other parameters set by the user. These
       parameters are: the minimum cluster size, the minimum cluster
       separation, the percent convergence, the maximum number of iterations,
       and the row and column sampling intervals.

       The resulting cluster spectral signatures are composed of cluster
       means and covariance matrices. These cluster means and covariance
       matrices are used in the second pass (i.maxlik) to classify the image.
       The resulting clusters, or spectral classes, can be related to land
       cover types on the ground.

       The user has to specify the name of the group file, the name of the
       subgroup file, the name of a file to contain the resulting signatures,
       the initial number of clusters to be discriminated, and optionally
       other parameters (see below). The group should contain the imagery
       files that the user wishes to classify. The subgroup is a subset of
       this group. The user must create a group and subgroup by running the
       GRASS program i.group before running i.cluster. The subgroup should
       contain only the imagery band files that the user wishes to classify.
       Note that this subgroup must contain more than one band file. The
       purpose of the group and subgroup is to collect map layers for
       classification or analysis. The signaturefile is the file that will
       contain the resulting signatures, which can be used as input for
       i.maxlik. The classes value is the initial number of clusters to be
       discriminated; any parameter values left unspecified are set to their
       default values.
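
       To illustrate what a signature made of cluster means and covariance
       matrices holds, the following Python sketch computes both for a single
       cluster. It is not taken from the i.cluster source: the function name
       cluster_signature, the sample pixel values, and the division by n - 1
       (sample covariance) are assumptions made for this example.

```python
def cluster_signature(pixels):
    """Return (mean vector, covariance matrix) for one cluster.

    pixels: list of per-pixel band values, e.g. [[b1, b2], [b1, b2], ...].
    Assumption: sample covariance (division by n - 1), so n must be >= 2.
    """
    n = len(pixels)
    nbands = len(pixels[0])
    # Per-band mean over all pixels in the cluster.
    means = [sum(p[b] for p in pixels) / n for b in range(nbands)]
    # Band-by-band covariance matrix.
    cov = [
        [
            sum((p[i] - means[i]) * (p[j] - means[j]) for p in pixels) / (n - 1)
            for j in range(nbands)
        ]
        for i in range(nbands)
    ]
    return means, cov


# Hypothetical two-band cluster with three pixels.
means, cov = cluster_signature([[10, 20], [12, 24], [14, 28]])
print(means)  # [12.0, 24.0]
print(cov)    # [[4.0, 8.0], [8.0, 16.0]]
```

       With more than one band the covariance matrix has off-diagonal terms,
       which is consistent with the requirement that the subgroup contain
       more than one band file.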

   Parameters:
       group=name
           The name of the group file which contains the imagery files that
           the user wishes to classify.

       subgroup=name
           The name of the subset of the group specified in the group option,
           which must contain only imagery band files and more than one band
           file. The user must create a group and a subgroup by running the
           GRASS program i.group before running i.cluster.

       signaturefile=name
           The name assigned to the output signature file, which contains the
           signatures of the classes and can be used as the input file for the
           GRASS program i.maxlik for an unsupervised classification.

       classes=value
           The number of clusters that will initially be identified in the
           clustering process before the iterations begin.

       seed=name
           The name of a seed signature file is optional. The seed signatures
           are signatures that contain cluster means and covariance matrices
           which were calculated prior to the current run of i.cluster. They
           may be acquired from a previous run of i.cluster or from a
           supervised classification signature training site section (e.g.,
           using the signature file output by g.gui.iclass). The purpose of
           seed signatures is to optimize the cluster decision boundaries
           (means) for the number of clusters specified.

       sample=rows,cols
           These numbers are optional, with default values based on the size
           of the data set such that the total number of pixels to be
           processed is approximately 10,000 (allowing for rounding). The
           smaller these numbers, the larger the sample size used to generate
           the signatures for the classes defined.

       iterations=value
           This parameter sets the maximum number of iterations, which should
           be greater than the number of iterations predicted to achieve the
           optimum percent convergence. The default value is 30. If the number
           of iterations reaches the maximum designated by the user, the user
           may want to rerun i.cluster with a higher number of iterations (see
           reportfile).
           Default: 30

       convergence=value
           A high percent convergence is the point at which cluster means
           become stable during the iteration process. The default value is
           98.0 percent. When clusters are being created, their means
           constantly change as pixels are assigned to them and the means are
           recalculated to include the new pixel. After all clusters have
           been created, i.cluster begins iterations that change cluster means
           by maximizing the distances between them. As these means shift, a
           higher and higher convergence is approached. Because means will
           never become totally static, a percent convergence and a maximum
           number of iterations are supplied to stop the iterative process.
           The percent convergence should be reached before the maximum number
           of iterations. If the maximum number of iterations is reached, it
           is probable that the desired percent convergence was not reached.
           The number of iterations is reported in the cluster statistics in
           the report file (see reportfile).
           Default: 98.0

       separation=value
           This is the minimum separation below which clusters will be merged
           in the iteration process. The default value is 0.0. This is an
           image-specific number (a "magic" number) that depends on the image
           data being classified and the number of final clusters that are
           acceptable. Its determination requires experimentation. Note that
           as the minimum class (or cluster) separation is increased, the
           maximum number of iterations should also be increased to achieve
           this separation with a high percentage of convergence (see
           convergence).
           Default: 0.0

       min_size=value
           This is the minimum number of pixels that will be used to define a
           cluster, and is therefore the minimum number of pixels for which
           means and covariance matrices will be calculated.
           Default: 17

       reportfile=name
           The reportfile is an optional parameter naming the file which will
           contain the result, i.e., the statistics for each cluster. Also
           included are the resulting percent convergence for the clusters,
           the number of iterations that were required to achieve the
           convergence, and the separability matrix.
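
       The relation between the sample intervals and the sample size
       described for the sample option can be estimated with a short Python
       sketch. The helper name sampled_pixel_count and the rule that one
       pixel is taken per interval, starting at the first row and column,
       are assumptions for illustration; the exact sampling rule used by
       i.cluster may differ.

```python
import math


def sampled_pixel_count(region_rows, region_cols, row_interval, col_interval):
    """Estimate how many pixels are sampled for the given intervals.

    Assumption: one pixel is taken every `row_interval` rows and every
    `col_interval` columns, starting with the first row and column.
    """
    rows_sampled = math.ceil(region_rows / row_interval)
    cols_sampled = math.ceil(region_cols / col_interval)
    return rows_sampled * cols_sampled


# Hypothetical region of 1000 rows by 1500 columns with sample=10,15.
n = sampled_pixel_count(1000, 1500, 10, 15)
print(n)  # 10000
```

       Halving both intervals in this sketch roughly quadruples the number
       of sampled pixels, which matches the statement that smaller numbers
       give a larger sample.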

NOTES

   Sampling method
       i.cluster does not cluster all pixels, but only a sample (see parameter
       sample). The result of that clustering is not that all pixels are
       assigned to a given cluster; essentially, only signatures which are
       representative of a given cluster are generated. When running i.cluster
       on the same data and asking for the same number of classes, but with
       different sample sizes, slightly different signatures are likely to be
       obtained for each cluster at each run.

   Algorithm used for i.cluster
       The algorithm uses input parameters set by the user: the initial
       number of clusters, the minimum distance between clusters, the
       correspondence between iterations which is desired, and the minimum
       size for each cluster. The user also specifies whether all pixels are
       to be clustered or only every "x"th row and "y"th column (sampling),
       and the maximum number of iterations to be carried out.

       In the 1st pass, initial cluster means for each band are defined by
       giving the first cluster a value equal to the band mean minus its
       standard deviation, and the last cluster a value equal to the band
       mean plus its standard deviation, with all other cluster means
       equally spaced in between. Each pixel is then assigned to the cluster
       to which it is closest, distance being measured as Euclidean
       distance. All clusters separated by less than the user-specified
       minimum distance are then merged. If a cluster has fewer than the
       user-specified minimum number of pixels, all of its pixels are
       reassigned to the next nearest cluster. New cluster means are
       calculated for each band as the average of the raster pixel values in
       that band for all pixels present in that cluster.
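
       The seeding and assignment steps of the 1st pass can be sketched in
       Python as follows. This is an illustration of the description above,
       not the i.cluster implementation: the function names, the band
       statistics, and the handling of a single class (which the text does
       not define) are assumptions, and merging and minimum-size handling
       are left out.

```python
import math


def initial_means(band_mean, band_std, classes):
    """Evenly space `classes` initial means from mean - std to mean + std."""
    if classes == 1:
        # Assumption: with a single class, use the band mean itself.
        return [band_mean]
    step = 2.0 * band_std / (classes - 1)
    return [band_mean - band_std + k * step for k in range(classes)]


def nearest_cluster(pixel, cluster_means):
    """Index of the cluster mean closest to `pixel` (Euclidean distance)."""
    return min(
        range(len(cluster_means)),
        key=lambda c: math.dist(pixel, cluster_means[c]),
    )


# Hypothetical statistics for two bands and five classes.
band1 = initial_means(100.0, 20.0, 5)   # [80.0, 90.0, 100.0, 110.0, 120.0]
band2 = initial_means(50.0, 10.0, 5)    # [40.0, 45.0, 50.0, 55.0, 60.0]
cluster_means = list(zip(band1, band2))
nearest = nearest_cluster((108.0, 56.0), cluster_means)
print(nearest)  # 3, the cluster with mean (110.0, 55.0)
```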

       In the 2nd pass, pixels are reassigned to clusters based on the new
       cluster means, and the cluster means are then recalculated. This
       process is repeated until the correspondence between iterations
       reaches the user-specified level or until the maximum number of
       iterations is reached, whichever comes first.
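
       The stopping rule of the 2nd pass can be sketched in Python as a
       loop that ends at either the convergence level or the iteration
       limit. The sketch assumes that the correspondence between iterations
       is the percentage of pixels whose cluster assignment did not change;
       that measure, the function name iterate_clusters, and the sample
       data are assumptions for illustration only.

```python
import math


def iterate_clusters(pixels, means, max_iterations=30, convergence=98.0):
    """Reassign pixels and recompute means until stable or out of iterations.

    Returns (means, iterations_used, percent_stable).
    """
    means = [list(m) for m in means]      # work on a copy
    assignment = [None] * len(pixels)
    percent_stable = 0.0
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        # Reassign every pixel to the nearest current cluster mean.
        new_assignment = [
            min(range(len(means)), key=lambda c: math.dist(p, means[c]))
            for p in pixels
        ]
        # Assumed convergence measure: share of unchanged assignments.
        unchanged = sum(a == b for a, b in zip(assignment, new_assignment))
        percent_stable = 100.0 * unchanged / len(pixels)
        assignment = new_assignment
        # Recompute each mean as the per-band average of its member pixels.
        for c in range(len(means)):
            members = [p for p, a in zip(pixels, assignment) if a == c]
            if members:  # an empty cluster keeps its previous mean
                means[c] = [sum(band) / len(members) for band in zip(*members)]
        if percent_stable >= convergence:
            break
    return means, iteration, percent_stable


# Hypothetical two-band pixels and two starting means.
pixels = [[1, 1], [2, 2], [9, 9], [10, 10]]
final_means, used, stable = iterate_clusters(pixels, [[0, 0], [5, 5]])
print(final_means, used, stable)  # [[1.5, 1.5], [9.5, 9.5]] 2 100.0
```

       If the loop in this sketch ends because max_iterations was reached,
       percent_stable is still below the requested level, which corresponds
       to the case where rerunning with a higher iterations value is
       suggested.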

EXAMPLE

       Preparing the statistics for unsupervised classification of a LANDSAT
       subscene in North Carolina:

           g.region raster=lsat7_2002_10 -p
           # store VIZ, NIR, MIR into group/subgroup (leaving out TIR)
           i.group group=lsat7_2002 subgroup=lsat7_2002 \
             input=lsat7_2002_10,lsat7_2002_20,lsat7_2002_30,lsat7_2002_40,lsat7_2002_50,lsat7_2002_70
           # generate signature file and report
           i.cluster group=lsat7_2002 subgroup=lsat7_2002 \
             signaturefile=sig_cluster_lsat2002 \
             classes=10 reportfile=rep_clust_lsat2002.txt

       To complete the unsupervised classification, i.maxlik is subsequently
       used. See the example in its manual page.

SEE ALSO

           ·   Image classification wiki page

           ·   Historical reference: the GRASS GIS 4 Image Processing manual
               (PDF)

           ·   Wikipedia article on k-means clustering (note that i.cluster
               uses a modification of the k-means clustering algorithm)

       g.gui.iclass, i.group, i.gensig, i.maxlik, i.segment, i.smap, r.kappa

AUTHORS

       Michael Shapiro, U.S. Army Construction Engineering Research Laboratory
       Tao Wen, University of Illinois at Urbana-Champaign, Illinois

SOURCE CODE

       Available at: i.cluster source code (history)

       © 2003-2020 GRASS Development Team, GRASS GIS 7.8.5 Reference Manual

GRASS 7.8.5                                                       i.cluster(1)