i.cluster(1grass)

i.cluster(1)                  Grass User's Manual                 i.cluster(1)

NAME

       i.cluster   -  Generates spectral signatures for land cover types in an
       image using a clustering algorithm.
       The resulting signature file is used as input for i.maxlik, to generate
       an unsupervised image classification.

KEYWORDS

       imagery, classification, signatures

SYNOPSIS

       i.cluster
       i.cluster --help
       i.cluster  group=name  subgroup=name signaturefile=name classes=integer
       [seed=name]   [sample=row_interval,col_interval]   [iterations=integer]
       [convergence=float]        [separation=float]        [min_size=integer]
       [reportfile=name]    [--overwrite]   [--help]   [--verbose]   [--quiet]
       [--ui]

   Flags:
       --overwrite
           Allow output files to overwrite existing files

       --help
           Print usage summary

       --verbose
           Verbose module output

       --quiet
           Quiet module output

       --ui
           Force launching GUI dialog

   Parameters:
       group=name [required]
           Name of input imagery group

       subgroup=name [required]
           Name of input imagery subgroup

       signaturefile=name [required]
           Name for output file containing result signatures

       classes=integer [required]
           Initial number of classes
           Options: 1-255

       seed=name
           Name of file containing initial signatures

       sample=row_interval,col_interval
           Sampling intervals (by row and col); default: ~10,000 pixels

       iterations=integer
           Maximum number of iterations
           Default: 30

       convergence=float
           Percent convergence
           Options: 0-100
           Default: 98.0

       separation=float
           Cluster separation
           Default: 0.0

       min_size=integer
           Minimum number of pixels in a class
           Default: 17

       reportfile=name
           Name for output file containing final report

DESCRIPTION

       i.cluster performs the first pass in the two-pass unsupervised
       classification of imagery, while the GRASS module i.maxlik executes
       the second pass. Both commands must be run to complete the
       unsupervised classification.

       i.cluster is a clustering algorithm (a modification of the k-means
       clustering algorithm) that reads through the (raster) imagery data
       and builds pixel clusters based on the spectral reflectances of the
       pixels (see Figure). The pixel clusters are imagery categories that
       can be related to land cover types on the ground. The spectral
       distributions of the clusters (i.e., the land cover spectral
       signatures) are influenced by six parameters set by the user. One of
       these is the initial number of clusters to be discriminated.

       Fig.: Land use/land cover clustering of LANDSAT scene (simplified)

       i.cluster starts by generating spectral signatures for this number
       of clusters and "attempts" to end up with this number of clusters
       during the clustering process. The resulting number of clusters and
       their spectral distributions, however, are also influenced by the
       range of the spectral values (category values) in the image files
       and by the other parameters set by the user. These parameters are:
       the minimum cluster size, the minimum cluster separation, the
       percent convergence, the maximum number of iterations, and the row
       and column sampling intervals.

       The resulting cluster spectral signatures are composed of cluster
       means and covariance matrices. These cluster means and covariance
       matrices are used in the second pass (i.maxlik) to classify the
       image. The resulting clusters, or spectral classes, can be related
       to land cover types on the ground.

       The user has to specify the name of the group file, the name of the
       subgroup file, the name of a file to contain the resulting
       signatures, the initial number of clusters to be discriminated, and
       optionally other parameters (see below). The group should contain
       the imagery files that the user wishes to classify. The subgroup is
       a subset of this group. The user must create a group and subgroup by
       running the GRASS program i.group before running i.cluster. The
       subgroup should contain only the imagery band files that the user
       wishes to classify. Note that this subgroup must contain more than
       one band file. The purpose of the group and subgroup is to collect
       map layers for classification or analysis. The signaturefile is the
       file to contain the resulting signatures, which can be used as input
       for i.maxlik. The classes value is the initial number of clusters to
       be discriminated; any parameter values left unspecified are set to
       their default values.

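       As a rough illustration of what a signature holds, the following
       Python sketch computes the band means and the covariance matrix for
       the pixels of one cluster. It is not the i.cluster implementation:
       the function name cluster_signature is hypothetical, and the use of
       the sample covariance (dividing by n - 1) is an assumption, since
       this manual does not state which estimator is used.

```python
from typing import List, Sequence, Tuple

Pixel = Sequence[float]  # one value per band


def cluster_signature(
    pixels: Sequence[Pixel],
) -> Tuple[List[float], List[List[float]]]:
    """Return (band means, band-by-band covariance matrix) for one cluster."""
    n = len(pixels)
    if n < 2:
        raise ValueError("need at least two pixels to estimate a covariance")
    nbands = len(pixels[0])
    means = [sum(p[b] for p in pixels) / n for b in range(nbands)]
    # Assumption: sample covariance (n - 1 in the denominator).
    cov = [
        [
            sum((p[i] - means[i]) * (p[j] - means[j]) for p in pixels) / (n - 1)
            for j in range(nbands)
        ]
        for i in range(nbands)
    ]
    return means, cov


# Three pixels with two bands each.
means, cov = cluster_signature([(10, 20), (12, 24), (14, 28)])
print(means)  # [12.0, 24.0]
print(cov)    # [[4.0, 8.0], [8.0, 16.0]]
```

       One such pair of mean vector and covariance matrix per cluster is
       what the second pass needs in order to classify every pixel.
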
   Parameters:
       group=name
           The name of the group file which contains the imagery files that
           the user wishes to classify.

       subgroup=name
           The name of the subset of the group specified in the group
           option, which must contain only imagery band files and more than
           one band file. The user must create a group and a subgroup by
           running the GRASS program i.group before running i.cluster.

       signaturefile=name
           The name assigned to the output signature file, which contains
           the signatures of the classes and can be used as the input file
           for the GRASS program i.maxlik for an unsupervised
           classification.

       classes=value
           The number of clusters that will initially be identified in the
           clustering process before the iterations begin.

       seed=name
           The name of a seed signature file is optional. The seed
           signatures are signatures that contain cluster means and
           covariance matrices which were calculated prior to the current
           run of i.cluster. They may be acquired from a previous run of
           i.cluster or from a supervised classification signature training
           site section (e.g., using the signature file output by
           g.gui.iclass). The purpose of seed signatures is to optimize the
           cluster decision boundaries (means) for the number of clusters
           specified.

       sample=row_interval,col_interval
           These numbers are optional. The default values are based on the
           size of the data set, such that the total number of pixels to be
           processed is approximately 10,000 (allowing for rounding up).

       iterations=value
           This parameter sets the maximum number of iterations, which
           should be greater than the number of iterations expected to be
           needed to reach the desired percent convergence. The default
           value is 30. If the number of iterations reaches the maximum
           designated by the user, the user may want to rerun i.cluster
           with a higher number of iterations (see reportfile).
           Default: 30

       convergence=value
           A high percent convergence is the point at which cluster means
           become stable during the iteration process. The default value is
           98.0 percent. When clusters are being created, their means
           constantly change as pixels are assigned to them and the means
           are recalculated to include the new pixel. After all clusters
           have been created, i.cluster begins iterations that change
           cluster means by maximizing the distances between them. As these
           means shift, a higher and higher convergence is approached.
           Because means will never become totally static, a percent
           convergence and a maximum number of iterations are supplied to
           stop the iterative process. The percent convergence should be
           reached before the maximum number of iterations. If the maximum
           number of iterations is reached, it is probable that the desired
           percent convergence was not reached. The number of iterations is
           reported in the cluster statistics in the report file (see
           reportfile).
           Default: 98.0

       separation=value
           This is the minimum separation below which clusters will be
           merged in the iteration process. The default value is 0.0. This
           is an image-specific number (a "magic" number) that depends on
           the image data being classified and the number of final clusters
           that are acceptable. Its determination requires experimentation.
           Note that as the minimum class (or cluster) separation is
           increased, the maximum number of iterations should also be
           increased to achieve this separation with a high percentage of
           convergence (see convergence).
           Default: 0.0

       min_size=value
           This is the minimum number of pixels that will be used to define
           a cluster, and is therefore the minimum number of pixels for
           which means and covariance matrices will be calculated.
           Default: 17

       reportfile=name
           The reportfile is an optional parameter naming the file which
           contains the result, i.e., the statistics for each cluster. Also
           included are the resulting percent convergence for the clusters,
           the number of iterations that were required to achieve the
           convergence, and the separability matrix.
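
       To make the sample default more concrete, the Python sketch below
       shows one way to pick row and column intervals so that roughly
       10,000 pixels are read. The manual only states the target; splitting
       it evenly into about 100 samples per axis is an assumption, and both
       function names are hypothetical.

```python
def default_sample_intervals(rows: int, cols: int, target: int = 10_000):
    """Pick (row_interval, col_interval) so about `target` pixels are read.

    Assumption: the target is split evenly between rows and columns.
    """
    per_axis = max(1, round(target ** 0.5))  # 100 per axis for 10,000
    row_interval = max(1, rows // per_axis)
    col_interval = max(1, cols // per_axis)
    return row_interval, col_interval


def sampled_pixel_count(rows: int, cols: int,
                        row_interval: int, col_interval: int) -> int:
    """Count the pixels visited when reading every n-th row and column."""
    sampled_rows = -(-rows // row_interval)  # ceiling division
    sampled_cols = -(-cols // col_interval)
    return sampled_rows * sampled_cols


intervals = default_sample_intervals(1500, 2500)
print(intervals)                                   # (15, 25)
print(sampled_pixel_count(1500, 2500, *intervals))  # 10000
```

       For region sizes that are not multiples of 100, the count lands
       near, but not exactly on, the target.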

NOTES

   Sampling method
       i.cluster does not cluster all pixels, but only a sample (see
       parameter sample). The result of that clustering is not that all
       pixels are assigned to a given cluster; essentially, only signatures
       which are representative of a given cluster are generated. When
       i.cluster is run on the same data with the same number of classes
       but with different sample sizes, each run is likely to produce
       slightly different signatures for each cluster.

   Algorithm used for i.cluster
       The algorithm uses input parameters set by the user: the initial
       number of clusters, the minimum distance between clusters, the
       desired correspondence between iterations, and the minimum size for
       each cluster. It also asks whether all pixels are to be clustered or
       only every "x"th row and "y"th column (sampling), and for the
       maximum number of iterations to be carried out.

       In the 1st pass, initial cluster means for each band are defined by
       giving the first cluster a value equal to the band mean minus its
       standard deviation, and the last cluster a value equal to the band
       mean plus its standard deviation, with all other cluster means
       equally spaced in between. Each pixel is then assigned to the class
       to which it is closest, distance being measured as Euclidean
       distance. All clusters separated by less than the user-specified
       minimum distance are then merged. If a cluster has fewer than the
       user-specified minimum number of pixels, all those pixels are
       reassigned to the next nearest cluster. New cluster means are
       calculated for each band as the average of the raster pixel values
       in that band for all pixels present in that cluster.

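       The seeding and assignment steps of the 1st pass can be sketched in
       Python as follows. This is an illustration of the description above,
       not the module's source: the function names are hypothetical, the
       population standard deviation is an assumption, and cluster merging
       and the minimum-size rule are left out.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def initial_cluster_means(band_values: Sequence[Sequence[float]],
                          classes: int) -> List[List[float]]:
    """Spread initial means evenly from (mean - sd) to (mean + sd) per band.

    band_values holds one sequence of sampled pixel values per band.
    """
    means = [[0.0] * len(band_values) for _ in range(classes)]
    for b, values in enumerate(band_values):
        m, sd = mean(values), pstdev(values)  # assumption: population sd
        for c in range(classes):
            # Fraction of the way from the first to the last cluster.
            frac = c / (classes - 1) if classes > 1 else 0.5
            means[c][b] = (m - sd) + frac * 2 * sd
    return means


def nearest_cluster(pixel: Sequence[float],
                    means: Sequence[Sequence[float]]) -> int:
    """Return the index of the cluster mean closest in Euclidean distance."""
    return min(range(len(means)), key=lambda c: math.dist(pixel, means[c]))


# Two bands, three classes.
seeds = initial_cluster_means([[0, 10], [10, 30]], classes=3)
print(seeds)                            # [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
print(nearest_cluster((6, 22), seeds))  # 1
```
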
       In the 2nd pass, pixels are again reassigned to clusters based on
       the new cluster means. The cluster means are then recalculated. This
       process is repeated until the correspondence between iterations
       reaches a user-specified level, or until the maximum number of
       iterations specified is reached, whichever comes first.
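
       The stopping rule of the 2nd pass can be sketched in Python as shown
       below. The sketch assumes that the correspondence between iterations
       is measured as the percentage of pixels that keep their cluster from
       one iteration to the next; the manual does not define the measure
       precisely. The function name run_iterations is hypothetical, and
       merging by separation and the minimum-size rule are omitted.

```python
import math
from typing import List, Sequence, Tuple


def run_iterations(pixels: Sequence[Sequence[float]],
                   means: Sequence[Sequence[float]],
                   iterations: int = 30,
                   convergence: float = 98.0,
                   ) -> Tuple[List[List[float]], List[int], int, float]:
    """Reassign pixels and recompute means until stable or out of iterations."""
    if not pixels or iterations < 1:
        raise ValueError("need at least one pixel and one iteration")
    means = [list(m) for m in means]  # work on a copy
    assignment: List[int] = [-1] * len(pixels)  # -1 means "not yet assigned"
    percent_stable = 0.0
    for iteration in range(1, iterations + 1):
        changes = 0
        for i, p in enumerate(pixels):
            c = min(range(len(means)), key=lambda k: math.dist(p, means[k]))
            if c != assignment[i]:
                assignment[i] = c
                changes += 1
        # Recompute each cluster mean from its current members.
        for k in range(len(means)):
            members = [p for p, a in zip(pixels, assignment) if a == k]
            if members:
                means[k] = [sum(v) / len(members) for v in zip(*members)]
        # Assumption: convergence is the share of pixels that did not move.
        percent_stable = 100.0 * (len(pixels) - changes) / len(pixels)
        if percent_stable >= convergence:
            break
    return means, assignment, iteration, percent_stable


result = run_iterations([(1, 1), (2, 2), (9, 9), (10, 10)], [[0, 0], [5, 5]])
print(result)
# ([[1.5, 1.5], [9.5, 9.5]], [0, 0, 1, 1], 2, 100.0)
```

       If the loop ends because the iteration limit is hit, the returned
       percentage stays below the requested convergence, which is the
       situation in which a rerun with more iterations is suggested above.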

EXAMPLE

       Preparing the statistics for unsupervised classification of a LANDSAT
       subscene in North Carolina:

       g.region raster=lsat7_2002_10 -p
       # store VIZ, NIR, MIR into group/subgroup (leaving out TIR)
       i.group group=lsat7_2002 subgroup=lsat7_2002 \
         input=lsat7_2002_10,lsat7_2002_20,lsat7_2002_30,lsat7_2002_40,lsat7_2002_50,lsat7_2002_70
       # generate signature file and report
       i.cluster group=lsat7_2002 subgroup=lsat7_2002 \
         signaturefile=sig_cluster_lsat2002 \
         classes=10 reportfile=rep_clust_lsat2002.txt

       To complete the unsupervised classification, i.maxlik is subsequently
       used.  See example in its manual page.
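
       When the call is scripted, the argument list can be assembled and
       checked before it is run. The Python sketch below builds the
       i.cluster command line from the parameters documented in this page.
       The helper name i_cluster_command is hypothetical; the resulting
       list could be passed to subprocess.run inside an active GRASS
       session, which is assumed and not shown here.

```python
import shlex
from typing import List

# Optional parameters listed in the SYNOPSIS of this manual page.
OPTIONAL = ("seed", "sample", "iterations", "convergence",
            "separation", "min_size", "reportfile")


def i_cluster_command(group: str, subgroup: str, signaturefile: str,
                      classes: int, **optional) -> List[str]:
    """Build the argument list for an i.cluster call."""
    if not 1 <= classes <= 255:
        raise ValueError("classes must be in the range 1-255")
    unknown = set(optional) - set(OPTIONAL)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    args = ["i.cluster", f"group={group}", f"subgroup={subgroup}",
            f"signaturefile={signaturefile}", f"classes={classes}"]
    # Keep the documented order so the output is predictable.
    args += [f"{key}={optional[key]}" for key in OPTIONAL if key in optional]
    return args


cmd = i_cluster_command("lsat7_2002", "lsat7_2002", "sig_cluster_lsat2002",
                        10, reportfile="rep_clust_lsat2002.txt")
print(shlex.join(cmd))
```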

AUTHORS

       Michael Shapiro, U.S. Army Construction Engineering Research Laboratory
       Tao Wen, University of Illinois at Urbana-Champaign, Illinois

       Last changed: $Date: 2018-03-02 23:36:04 +0100 (Fri, 02 Mar 2018) $

SOURCE CODE

       Available at: i.cluster source code (history)

       © 2003-2019 GRASS Development Team, GRASS GIS 7.6.0 Reference Manual

GRASS 7.6.0                                                       i.cluster(1)