1DGEJSV(1)LAPACK routine (version 3.2) DGEJSV(1)
2
3
4
6 DGEJSV - computes the singular value decomposition (SVD) of a real M-
7 by-N matrix [A], where M >= N
8
10 SUBROUTINE DGEJSV( JOBA, JOBU, JOBV, JOBR, JOBT, JOBP,
11
12 & M, N, A, LDA, SVA, U, LDU, V, LDV,
13
14 & WORK, LWORK, IWORK, INFO )
15
16 IMPLICIT NONE
17
18 INTEGER INFO, LDA, LDU, LDV, LWORK, M, N
19
20 DOUBLE PRECISION A( LDA, * ), SVA( N ), U( LDU, * ), V(
21 LDV, * ),
22
23 & WORK( LWORK )
24
25 INTEGER IWORK( * )
26
27 CHARACTER*1 JOBA, JOBP, JOBR, JOBT, JOBU, JOBV
28
30 DGEJSV computes the singular value decomposition (SVD) of a real M-by-N
31 matrix [A], where M >= N. The SVD of [A] is written as
32 [A] = [U] * [SIGMA] * [V]^t,
33 where [SIGMA] is an N-by-N (M-by-N) matrix which is zero except for its
34 N diagonal elements, [U] is an M-by-N (or M-by-M) orthonormal matrix,
35 and [V] is an N-by-N orthogonal matrix. The diagonal elements of
36 [SIGMA] are the singular values of [A]. The columns of [U] and [V] are
37 the left and the right singular vectors of [A], respectively. The
38 matrices [U] and [V] are computed and stored in the arrays U and V,
39 respectively. The diagonal of [SIGMA] is computed and stored in the
40 array SVA.
41
43 Specifies the level of accuracy: = 'C': This option works well (high
44 relative accuracy) if A = B * D, with well-conditioned B and arbitrary
45 diagonal matrix D. The accuracy cannot be spoiled by COLUMN scaling.
46 The accuracy of the computed output depends on the condition of B, and
47 the procedure aims at the best theoretical accuracy. The relative
48 error max_{i=1:N}|d sigma_i| / sigma_i is bounded by f(M,N)*epsilon*
49 cond(B), independent of D. The input matrix is preprocessed with the
50 QRF with column pivoting. This initial preprocessing and precondition‐
51 ing by a rank revealing QR factorization is common for all values of
52 JOBA. Additional actions are specified as follows:
53 = 'E': Computation as with 'C' with an additional estimate of the con‐
54 dition number of B. It provides a realistic error bound. = 'F': If A =
55 D1 * C * D2 with ill-conditioned diagonal scalings D1, D2, and well-
56 conditioned matrix C, this option gives higher accuracy than the 'C'
57 option. If the structure of the input matrix is not known, and relative
58 accuracy is desirable, then this option is advisable. The input matrix
59 A is preprocessed with QR factorization with FULL (row and column) piv‐
60 oting. = 'G' Computation as with 'F' with an additional estimate of
61 the condition number of B, where A=D*B. If A has heavily weighted rows,
62 then using this condition number gives too pessimistic error bound. =
63 'A': Small singular values are the noise and the matrix is treated as
64 numerically rank defficient. The error in the computed singular values
65 is bounded by f(m,n)*epsilon*||A||. The computed SVD A = U * S * V^t
66 restores A up to f(m,n)*epsilon*||A||. This gives the procedure the
67 licence to discard (set to zero) all singular values below
68 N*epsilon*||A||. = 'R': Similar as in 'A'. Rank revealing property of
69 the initial QR factorization is used do reveal (using triangular fac‐
70 tor) a gap sigma_{r+1} < epsilon * sigma_r in which case the numerical
71 RANK is declared to be r. The SVD is computed with absolute error
72 bounds, but more accurately than with 'A'. Specifies whether to com‐
73 pute the columns of U: = 'U': N columns of U are returned in the array
74 U.
75 = 'F': full set of M left sing. vectors is returned in the array U.
76 = 'W': U may be used as workspace of length M*N. See the description of
77 U. = 'N': U is not computed. Specifies whether to compute the matrix
78 V:
79 = 'V': N columns of V are returned in the array V; Jacobi rotations are
80 not explicitly accumulated. = 'J': N columns of V are returned in the
81 array V, but they are computed as the product of Jacobi rotations. This
82 option is allowed only if JOBU .NE. 'N', i.e. in computing the full
83 SVD. = 'W': V may be used as workspace of length N*N. See the descrip‐
84 tion of V. = 'N': V is not computed. Specifies the RANGE for the sin‐
85 gular values. Issues the licence to set to zero small positive singular
86 values if they are outside specified range. If A .NE. 0 is scaled so
87 that the largest singular value of c*A is around DSQRT(BIG),
88 BIG=SLAMCH('O'), then JOBR issues the licence to kill columns of A
89 whose norm in c*A is less than DSQRT(SFMIN) (for JOBR.EQ.'R'), or less
90 than SMALL=SFMIN/EPSLN, where SFMIN=SLAMCH('S'), EPSLN=SLAMCH('E'). =
91 'N': Do not kill small columns of c*A. This option assumes that BLAS
92 and QR factorizations and triangular solvers are implemented to work in
93 that range. If the condition of A is greater than BIG, use DGESVJ. =
94 'R': RESTRICTED range for sigma(c*A) is [DSQRT(SFMIN), DSQRT(BIG)]
95 (roughly, as described above). This option is recommended.
96 ~~~~~~~~~~~~~~~~~~~~~~~~~~~ For computing the singular values in the
97 FULL range [SFMIN,BIG] use DGESVJ. If the matrix is square then the
98 procedure may determine to use transposed A if A^t seems to be better
99 with respect to convergence. If the matrix is not square, JOBT is
100 ignored. This is subject to changes in the future. The decision is
101 based on two values of entropy over the adjoint orbit of A^t * A. See
102 the descriptions of WORK(6) and WORK(7). = 'T': transpose if entropy
103 test indicates possibly faster convergence of Jacobi process if A^t is
104 taken as input. If A is replaced with A^t, then the row pivoting is
105 included automatically. = 'N': do not speculate. This option can be
106 used to compute only the singular values, or the full SVD (U, SIGMA and
107 V). For only one set of singular vectors (U or V), the caller should
108 provide both U and V, as one of the matrices is used as workspace if
109 the matrix A is transposed. The implementer can easily remove this
110 constraint and make the code more complicated. See the descriptions of
111 U and V. Issues the licence to introduce structured perturbations to
112 drown denormalized numbers. This licence should be active if the denor‐
113 mals are poorly implemented, causing slow computation, especially in
114 cases of fast convergence (!). For details see [1,2]. For the sake of
115 simplicity, this perturbations are included only when the full SVD or
116 only the singular values are requested. The implementer/user can easily
117 add the perturbation for the cases of computing one set of singular
118 vectors. = 'P': introduce perturbation
119 = 'N': do not perturb
120
121 M (input) INTEGER
122 The number of rows of the input matrix A. M >= 0.
123
124 N (input) INTEGER
125 The number of columns of the input matrix A. M >= N >= 0.
126
127 A (input/workspace) REAL array, dimension (LDA,N)
128 On entry, the M-by-N matrix A.
129
130 LDA (input) INTEGER
131 The leading dimension of the array A. LDA >= max(1,M).
132
133 SVA (workspace/output) REAL array, dimension (N)
134 On exit, - For WORK(1)/WORK(2) = ONE: The singular values of A.
135 During the computation SVA contains Euclidean column norms of
136 the iterated matrices in the array A. - For WORK(1) .NE.
137 WORK(2): The singular values of A are
138 (WORK(1)/WORK(2)) * SVA(1:N). This factored form is used if
139 sigma_max(A) overflows or if small singular values have been
140 saved from underflow by scaling the input matrix A. - If
141 JOBR='R' then some of the singular values may be returned as
142 exact zeros obtained by "set to zero" because they are below
143 the numerical rank threshold or are denormalized numbers.
144
145 U (workspace/output) REAL array, dimension ( LDU, N )
146 If JOBU = 'U', then U contains on exit the M-by-N matrix of the
147 left singular vectors. If JOBU = 'F', then U contains on exit
148 the M-by-M matrix of the left singular vectors, including an
149 ONB of the orthogonal complement of the Range(A). If JOBU =
150 'W' .AND. (JOBV.EQ.'V' .AND. JOBT.EQ.'T' .AND. M.EQ.N), then U
151 is used as workspace if the procedure replaces A with A^t. In
152 that case, [V] is computed in U as left singular vectors of A^t
153 and then copied back to the V array. This 'W' option is just a
154 reminder to the caller that in this case U is reserved as
155 workspace of length N*N. If JOBU = 'N' U is not referenced.
156 The leading dimension of the array U, LDU >= 1. IF JOBU =
157 'U' or 'F' or 'W', then LDU >= M.
158
159 V (workspace/output) REAL array, dimension ( LDV, N )
160 If JOBV = 'V', 'J' then V contains on exit the N-by-N matrix of
161 the right singular vectors; If JOBV = 'W', AND (JOBU.EQ.'U' AND
162 JOBT.EQ.'T' AND M.EQ.N), then V is used as workspace if the
163 pprocedure replaces A with A^t. In that case, [U] is computed
164 in V as right singular vectors of A^t and then copied back to
165 the U array. This 'W' option is just a reminder to the caller
166 that in this case V is reserved as workspace of length N*N. If
167 JOBV = 'N' V is not referenced.
168
169 LDV (input) INTEGER
170 The leading dimension of the array V, LDV >= 1. If JOBV = 'V'
171 or 'J' or 'W', then LDV >= N.
172
173 WORK (workspace/output) REAL array, dimension at least LWORK.
174 On exit, WORK(1) = SCALE = WORK(2) / WORK(1) is the scaling
175 factor such that SCALE*SVA(1:N) are the computed singular val‐
176 ues of A. (See the description of SVA().) WORK(2) = See the
177 description of WORK(1). WORK(3) = SCONDA is an estimate for
178 the condition number of column equilibrated A. (If JOBA .EQ.
179 'E' or 'G') SCONDA is an estimate of DSQRT(||(R^t *
180 R)^(-1)||_1). It is computed using DPOCON. It holds N^(-1/4) *
181 SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA where R is the tri‐
182 angular factor from the QRF of A. However, if R is truncated
183 and the numerical rank is determined to be strictly smaller
184 than N, SCONDA is returned as -1, thus indicating that the
185 smallest singular values might be lost. If full SVD is needed,
186 the following two condition numbers are useful for the analysis
187 of the algorithm. They are provied for a developer/implementer
188 who is familiar with the details of the method. WORK(4) = an
189 estimate of the scaled condition number of the triangular fac‐
190 tor in the first QR factorization. WORK(5) = an estimate of
191 the scaled condition number of the triangular factor in the
192 second QR factorization. The following two parameters are com‐
193 puted if JOBT .EQ. 'T'. They are provided for a devel‐
194 oper/implementer who is familiar with the details of the
195 method. WORK(6) = the entropy of A^t*A :: this is the Shannon
196 entropy of diag(A^t*A) / Trace(A^t*A) taken as point in the
197 probability simplex. WORK(7) = the entropy of A*A^t.
198
199 LWORK (input) INTEGER
200 Length of WORK to confirm proper allocation of work space.
201 LWORK depends on the job: If only SIGMA is needed (
202 JOBU.EQ.'N', JOBV.EQ.'N' ) and
203 -> .. no scaled condition estimate required ( JOBE.EQ.'N'):
204 LWORK >= max(2*M+N,4*N+1,7). This is the minimal requirement.
205 For optimal performance (blocked code) the optimal value is
206 LWORK >= max(2*M+N,3*N+(N+1)*NB,7). Here NB is the optimal
207 block size for xGEQP3/xGEQRF. -> .. an estimate of the scaled
208 condition number of A is required (JOBA='E', 'G'). In this
209 case, LWORK is the maximum of the above and N*N+4*N, i.e. LWORK
210 >= max(2*M+N,N*N+4N,7). If SIGMA and the right singular vec‐
211 tors are needed (JOBV.EQ.'V'), -> the minimal requirement is
212 LWORK >= max(2*N+M,7). -> For optimal performance, LWORK >=
213 max(2*N+M,2*N+N*NB,7), where NB is the optimal block size. If
214 SIGMA and the left singular vectors are needed -> the minimal
215 requirement is LWORK >= max(2*N+M,7). -> For optimal perfor‐
216 mance, LWORK >= max(2*N+M,2*N+N*NB,7), where NB is the optimal
217 block size. If full SVD is needed ( JOBU.EQ.'U' or 'F',
218 JOBV.EQ.'V' ) and -> .. the singular vectors are computed with‐
219 out explicit accumulation of the Jacobi rotations, LWORK >=
220 6*N+2*N*N -> .. in the iterative part, the Jacobi rotations are
221 explicitly accumulated (option, see the description of JOBV),
222 then the minimal requirement is LWORK >= max(M+3*N+N*N,7). For
223 better performance, if NB is the optimal block size, LWORK >=
224 max(3*N+N*N+M,3*N+N*N+N*NB,7).
225
226 IWORK (workspace/output) INTEGER array, dimension M+3*N.
227 On exit, IWORK(1) = the numerical rank determined after the
228 initial QR factorization with pivoting. See the descriptions of
229 JOBA and JOBR. IWORK(2) = the number of the computed nonzero
230 singular values IWORK(3) = if nonzero, a warning message: If
231 IWORK(3).EQ.1 then some of the column norms of A were denormal‐
232 ized floats. The requested high accuracy is not warranted by
233 the data.
234
235 INFO (output) INTEGER
236 < 0 : if INFO = -i, then the i-th argument had an illegal
237 value.
238 = 0 : successfull exit;
239 > 0 : DGEJSV did not converge in the maximal allowed number
240 of sweeps. The computed values may be inaccurate.
241
243 DGEJSV implements a preconditioned Jacobi SVD algorithm. It uses
244 SGEQP3, SGEQRF, and SGELQF as preprocessors and preconditioners.
245 Optionally, an additional row pivoting can be used as a preprocessor,
246 which in some cases results in much higher accuracy. An example is
247 matrix A with the structure A = D1 * C * D2, where D1, D2 are arbitrar‐
248 ily ill-conditioned diagonal matrices and C is well-conditioned matrix.
249 In that case, complete pivoting in the first QR factorizations provides
250 accuracy dependent on the condition number of C, and independent of D1,
251 D2. Such higher accuracy is not completely understood theoretically,
252 but it works well in practice. Further, if A can be written as A =
253 B*D, with well-conditioned B and some diagonal D, then the high accu‐
254 racy is guaranteed, both theoretically and in software, independent of
255 D. For more details see [1], [2].
256 The computational range for the singular values can be the full
257 range ( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic and
258 the BLAS & LAPACK routines called by DGEJSV are implemented to work in
259 that range. If that is not the case, then the restriction for safe
260 computation with the singular values in the range of normalized IEEE
261 numbers is that the spectral condition number
262 kappa(A)=sigma_max(A)/sigma_min(A) does not overflow. This code (DGE‐
263 JSV) is best used in this restricted range, meaning that singular val‐
264 ues of magnitude below ||A||_2 / SLAMCH('O') are returned as zeros. See
265 JOBR for details on this.
266 Further, this implementation is somewhat slower than the one
267 described in [1,2] due to replacement of some non-LAPACK components,
268 and because the choice of some tuning parameters in the iterative part
269 (DGESVJ) is left to the implementer on a particular machine.
270 The rank revealing QR factorization (in this code: SGEQP3) should be
271 implemented as in [3]. We have a new version of SGEQP3 under develop‐
272 ment that is more robust than the current one in LAPACK, with a cleaner
273 cut in rank defficient cases. It will be available in the SIGMA library
274 [4]. If M is much larger than N, it is obvious that the inital QRF
275 with column pivoting can be preprocessed by the QRF without pivoting.
276 That well known trick is not used in DGEJSV because in some cases heavy
277 row weighting can be treated with complete pivoting. The overhead in
278 cases M much larger than N is then only due to pivoting, but the bene‐
279 fits in terms of accuracy have prevailed. The implementer/user can
280 incorporate this extra QRF step easily. The implementer can also
281 improve data movement (matrix transpose, matrix copy, matrix transposed
282 copy) - this implementation of DGEJSV uses only the simplest, naive
283 data movement. Contributors
284 Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
285 References
286 SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
287 LAPACK Working note 169.
288 SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
289 LAPACK Working note 170.
290 factorization software - a case study.
291 ACM Trans. Math. Softw. Vol. 35, No 2 (2008), pp. 1-28.
292 LAPACK Working note 176.
293 QSVD, (H,K)-SVD computations.
294 Department of Mathematics, University of Zagreb, 2008. Bugs, exam‐
295 ples and comments
296 Please report all bugs and send interesting examples and/or comments to
297 drmac@math.hr. Thank you.
298
299
300
301 LAPACK routine (version 3.2) November 2008 DGEJSV(1)