1DGEJSV(1)LAPACK routine (version 3.2)                                 DGEJSV(1)
2
3
4

NAME

6       DGEJSV  -  computes the singular value decomposition (SVD) of a real M-
7       by-N matrix [A], where M >= N
8

SYNOPSIS

10       SUBROUTINE DGEJSV( JOBA, JOBU, JOBV, JOBR, JOBT, JOBP,
11
12           &              M, N, A, LDA, SVA, U, LDU, V, LDV,
13
14           &              WORK, LWORK, IWORK, INFO )
15
16           IMPLICIT       NONE
17
18           INTEGER        INFO, LDA, LDU, LDV, LWORK, M, N
19
20           DOUBLE         PRECISION A( LDA, * ), SVA( N ), U(  LDU,  *  ),  V(
21                          LDV, * ),
22
23           &              WORK( LWORK )
24
25           INTEGER        IWORK( * )
26
27           CHARACTER*1    JOBA, JOBP, JOBR, JOBT, JOBU, JOBV
28

PURPOSE

30       DGEJSV computes the singular value decomposition (SVD) of a real M-by-N
31       matrix [A], where M >= N. The SVD of [A] is written as
32                    [A] = [U] * [SIGMA] * [V]^t,
33       where [SIGMA] is an N-by-N (M-by-N) matrix which is zero except for its
34       N  diagonal  elements, [U] is an M-by-N (or M-by-M) orthonormal matrix,
35       and [V] is an  N-by-N  orthogonal  matrix.  The  diagonal  elements  of
36       [SIGMA]  are the singular values of [A]. The columns of [U] and [V] are
37       the left and the right  singular  vectors  of  [A],  respectively.  The
38       matrices  [U]  and  [V]  are computed and stored in the arrays U and V,
39       respectively. The diagonal of [SIGMA] is computed  and  stored  in  the
40       array SVA.
41

ARGUMENTS

43       Specifies  the  level  of accuracy: = 'C': This option works well (high
44       relative accuracy) if A = B * D, with well-conditioned B and  arbitrary
45       diagonal  matrix  D.  The accuracy cannot be spoiled by COLUMN scaling.
46       The accuracy of the computed output depends on the condition of B,  and
47       the  procedure  aims  at  the  best theoretical accuracy.  The relative
48       error max_{i=1:N}|d sigma_i| / sigma_i is  bounded  by  f(M,N)*epsilon*
49       cond(B),  independent  of D.  The input matrix is preprocessed with the
50       QRF with column pivoting. This initial preprocessing and  precondition‐
51       ing  by  a  rank revealing QR factorization is common for all values of
52       JOBA. Additional actions are specified as follows:
53       = 'E': Computation as with 'C' with an additional estimate of the  con‐
54       dition number of B. It provides a realistic error bound.  = 'F': If A =
55       D1 * C * D2 with ill-conditioned diagonal scalings D1,  D2,  and  well-
56       conditioned  matrix  C,  this option gives higher accuracy than the 'C'
57       option. If the structure of the input matrix is not known, and relative
58       accuracy  is desirable, then this option is advisable. The input matrix
59       A is preprocessed with QR factorization with FULL (row and column) piv‐
60       oting.   =  'G'  Computation as with 'F' with an additional estimate of
61       the condition number of B, where A=D*B. If A has heavily weighted rows,
62       then  using this condition number gives too pessimistic error bound.  =
63       'A': Small singular values are the noise and the matrix is  treated  as
64       numerically  rank defficient. The error in the computed singular values
65       is bounded by f(m,n)*epsilon*||A||.  The computed SVD A = U * S  *  V^t
66       restores  A  up  to f(m,n)*epsilon*||A||.  This gives the procedure the
67       licence  to  discard  (set  to  zero)   all   singular   values   below
68       N*epsilon*||A||.   = 'R': Similar as in 'A'. Rank revealing property of
69       the initial QR factorization is used do reveal (using  triangular  fac‐
70       tor)  a gap sigma_{r+1} < epsilon * sigma_r in which case the numerical
71       RANK is declared to be r. The  SVD  is  computed  with  absolute  error
72       bounds,  but  more accurately than with 'A'.  Specifies whether to com‐
73       pute the columns of U: = 'U': N columns of U are returned in the  array
74       U.
75       = 'F': full set of M left sing. vectors is returned in the array U.
76       = 'W': U may be used as workspace of length M*N. See the description of
77       U.  = 'N': U is not computed.  Specifies whether to compute the  matrix
78       V:
79       = 'V': N columns of V are returned in the array V; Jacobi rotations are
80       not explicitly accumulated.  = 'J': N columns of V are returned in  the
81       array V, but they are computed as the product of Jacobi rotations. This
82       option is allowed only if JOBU .NE. 'N', i.e.  in  computing  the  full
83       SVD.  = 'W': V may be used as workspace of length N*N. See the descrip‐
84       tion of V.  = 'N': V is not computed.  Specifies the RANGE for the sin‐
85       gular values. Issues the licence to set to zero small positive singular
86       values if they are outside specified range. If A .NE. 0  is  scaled  so
87       that   the   largest  singular  value  of  c*A  is  around  DSQRT(BIG),
88       BIG=SLAMCH('O'), then JOBR issues the licence  to  kill  columns  of  A
89       whose  norm in c*A is less than DSQRT(SFMIN) (for JOBR.EQ.'R'), or less
90       than SMALL=SFMIN/EPSLN, where SFMIN=SLAMCH('S'), EPSLN=SLAMCH('E').   =
91       'N':  Do  not  kill small columns of c*A. This option assumes that BLAS
92       and QR factorizations and triangular solvers are implemented to work in
93       that  range.  If the condition of A is greater than BIG, use DGESVJ.  =
94       'R': RESTRICTED range  for  sigma(c*A)  is  [DSQRT(SFMIN),  DSQRT(BIG)]
95       (roughly,   as   described   above).   This   option   is  recommended.
96       ~~~~~~~~~~~~~~~~~~~~~~~~~~~ For computing the singular  values  in  the
97       FULL  range  [SFMIN,BIG]  use DGESVJ.  If the matrix is square then the
98       procedure may determine to use transposed A if A^t seems to  be  better
99       with  respect  to  convergence.   If  the matrix is not square, JOBT is
100       ignored. This is subject to changes in the  future.   The  decision  is
101       based  on  two values of entropy over the adjoint orbit of A^t * A. See
102       the descriptions of WORK(6) and WORK(7).  = 'T': transpose  if  entropy
103       test  indicates possibly faster convergence of Jacobi process if A^t is
104       taken as input. If A is replaced with A^t, then  the  row  pivoting  is
105       included  automatically.   = 'N': do not speculate.  This option can be
106       used to compute only the singular values, or the full SVD (U, SIGMA and
107       V).  For  only  one set of singular vectors (U or V), the caller should
108       provide both U and V, as one of the matrices is used  as  workspace  if
109       the  matrix  A  is  transposed.  The implementer can easily remove this
110       constraint and make the code more complicated. See the descriptions  of
111       U  and  V.  Issues the licence to introduce structured perturbations to
112       drown denormalized numbers. This licence should be active if the denor‐
113       mals  are  poorly  implemented, causing slow computation, especially in
114       cases of fast convergence (!). For details see [1,2].  For the sake  of
115       simplicity,  this  perturbations are included only when the full SVD or
116       only the singular values are requested. The implementer/user can easily
117       add  the  perturbation  for  the cases of computing one set of singular
118       vectors.  = 'P': introduce perturbation
119       = 'N': do not perturb
120
121       M      (input) INTEGER
122              The number of rows of the input matrix A.  M >= 0.
123
124       N      (input) INTEGER
125              The number of columns of the input matrix A. M >= N >= 0.
126
127       A       (input/workspace) REAL array, dimension (LDA,N)
128               On entry, the M-by-N matrix A.
129
130       LDA     (input) INTEGER
131               The leading dimension of the array A.  LDA >= max(1,M).
132
133       SVA     (workspace/output) REAL array, dimension (N)
134               On exit, - For WORK(1)/WORK(2) = ONE: The singular values of A.
135               During  the  computation SVA contains Euclidean column norms of
136               the iterated matrices in the  array  A.   -  For  WORK(1)  .NE.
137               WORK(2): The singular values of A are
138               (WORK(1)/WORK(2))  *  SVA(1:N).  This  factored form is used if
139               sigma_max(A) overflows or if small singular  values  have  been
140               saved  from  underflow  by  scaling  the  input matrix A.  - If
141               JOBR='R' then some of the singular values may  be  returned  as
142               exact  zeros  obtained  by "set to zero" because they are below
143               the numerical rank threshold or are denormalized numbers.
144
145       U       (workspace/output) REAL array, dimension ( LDU, N )
146               If JOBU = 'U', then U contains on exit the M-by-N matrix of the
147               left  singular vectors.  If JOBU = 'F', then U contains on exit
148               the M-by-M matrix of the left singular  vectors,  including  an
149               ONB  of  the  orthogonal complement of the Range(A).  If JOBU =
150               'W'  .AND. (JOBV.EQ.'V' .AND. JOBT.EQ.'T' .AND. M.EQ.N), then U
151               is  used  as workspace if the procedure replaces A with A^t. In
152               that case, [V] is computed in U as left singular vectors of A^t
153               and  then copied back to the V array. This 'W' option is just a
154               reminder to the caller that in  this  case  U  is  reserved  as
155               workspace  of  length N*N.  If JOBU = 'N'  U is not referenced.
156               The leading dimension of the array U,  LDU >= 1.   IF   JOBU  =
157               'U' or 'F' or 'W',  then LDU >= M.
158
159       V       (workspace/output) REAL array, dimension ( LDV, N )
160               If JOBV = 'V', 'J' then V contains on exit the N-by-N matrix of
161               the right singular vectors; If JOBV = 'W', AND (JOBU.EQ.'U' AND
162               JOBT.EQ.'T'  AND  M.EQ.N),  then  V is used as workspace if the
163               pprocedure replaces A with A^t. In that case, [U]  is  computed
164               in  V  as right singular vectors of A^t and then copied back to
165               the U array. This 'W' option is just a reminder to  the  caller
166               that in this case V is reserved as workspace of length N*N.  If
167               JOBV = 'N'  V is not referenced.
168
169       LDV     (input) INTEGER
170               The leading dimension of the array V,  LDV >= 1.  If JOBV = 'V'
171               or 'J' or 'W', then LDV >= N.
172
173       WORK    (workspace/output) REAL array, dimension at least LWORK.
174               On  exit,  WORK(1)  =  SCALE = WORK(2) / WORK(1) is the scaling
175               factor such that SCALE*SVA(1:N) are the computed singular  val‐
176               ues  of  A.  (See the description of SVA().)  WORK(2) = See the
177               description of WORK(1).  WORK(3) = SCONDA is  an  estimate  for
178               the  condition  number  of column equilibrated A. (If JOBA .EQ.
179               'E'  or  'G')  SCONDA  is  an  estimate   of   DSQRT(||(R^t   *
180               R)^(-1)||_1).  It is computed using DPOCON. It holds N^(-1/4) *
181               SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA where R is the  tri‐
182               angular  factor  from the QRF of A.  However, if R is truncated
183               and the numerical rank is determined  to  be  strictly  smaller
184               than  N,  SCONDA  is  returned  as -1, thus indicating that the
185               smallest singular values might be lost.  If full SVD is needed,
186               the following two condition numbers are useful for the analysis
187               of the algorithm. They are provied for a  developer/implementer
188               who  is  familiar with the details of the method.  WORK(4) = an
189               estimate of the scaled condition number of the triangular  fac‐
190               tor  in  the  first QR factorization.  WORK(5) = an estimate of
191               the scaled condition number of the  triangular  factor  in  the
192               second QR factorization.  The following two parameters are com‐
193               puted if JOBT  .EQ.  'T'.   They  are  provided  for  a  devel‐
194               oper/implementer  who  is  familiar  with  the  details  of the
195               method.  WORK(6) = the entropy of A^t*A :: this is the  Shannon
196               entropy  of  diag(A^t*A)  /  Trace(A^t*A) taken as point in the
197               probability simplex.  WORK(7) = the entropy of A*A^t.
198
199       LWORK   (input) INTEGER
200               Length of WORK to confirm  proper  allocation  of  work  space.
201               LWORK   depends   on  the  job:  If  only  SIGMA  is  needed  (
202               JOBU.EQ.'N', JOBV.EQ.'N' ) and
203               -> .. no scaled condition  estimate  required  (  JOBE.EQ.'N'):
204               LWORK  >=  max(2*M+N,4*N+1,7). This is the minimal requirement.
205               For optimal performance (blocked code)  the  optimal  value  is
206               LWORK  >=  max(2*M+N,3*N+(N+1)*NB,7).  Here  NB  is the optimal
207               block size for xGEQP3/xGEQRF.  -> .. an estimate of the  scaled
208               condition  number  of  A  is  required (JOBA='E', 'G'). In this
209               case, LWORK is the maximum of the above and N*N+4*N, i.e. LWORK
210               >=  max(2*M+N,N*N+4N,7).   If SIGMA and the right singular vec‐
211               tors are needed (JOBV.EQ.'V'), -> the  minimal  requirement  is
212               LWORK  >=  max(2*N+M,7).   -> For optimal performance, LWORK >=
213               max(2*N+M,2*N+N*NB,7), where NB is the optimal block size.   If
214               SIGMA  and  the left singular vectors are needed -> the minimal
215               requirement is LWORK >= max(2*N+M,7).  -> For  optimal  perfor‐
216               mance,  LWORK >= max(2*N+M,2*N+N*NB,7), where NB is the optimal
217               block size.  If full  SVD  is  needed  (  JOBU.EQ.'U'  or  'F',
218               JOBV.EQ.'V' ) and -> .. the singular vectors are computed with‐
219               out explicit accumulation of the  Jacobi  rotations,  LWORK  >=
220               6*N+2*N*N -> .. in the iterative part, the Jacobi rotations are
221               explicitly accumulated (option, see the description  of  JOBV),
222               then the minimal requirement is LWORK >= max(M+3*N+N*N,7).  For
223               better performance, if NB is the optimal block size,  LWORK  >=
224               max(3*N+N*N+M,3*N+N*N+N*NB,7).
225
226       IWORK   (workspace/output) INTEGER array, dimension M+3*N.
227               On  exit,  IWORK(1)  =  the numerical rank determined after the
228               initial QR factorization with pivoting. See the descriptions of
229               JOBA  and  JOBR.  IWORK(2) = the number of the computed nonzero
230               singular values IWORK(3) = if nonzero, a  warning  message:  If
231               IWORK(3).EQ.1 then some of the column norms of A were denormal‐
232               ized floats. The requested high accuracy is  not  warranted  by
233               the data.
234
235       INFO    (output) INTEGER
236               <  0   :  if  INFO  = -i, then the i-th argument had an illegal
237               value.
238               = 0 :  successfull exit;
239               > 0 :  DGEJSV  did not converge in the maximal  allowed  number
240               of sweeps. The computed values may be inaccurate.
241

FURTHER DETAILS

243       DGEJSV  implements  a  preconditioned  Jacobi  SVD  algorithm.  It uses
244       SGEQP3,  SGEQRF,  and  SGELQF  as  preprocessors  and  preconditioners.
245       Optionally,  an  additional row pivoting can be used as a preprocessor,
246       which in some cases results in much  higher  accuracy.  An  example  is
247       matrix A with the structure A = D1 * C * D2, where D1, D2 are arbitrar‐
248       ily ill-conditioned diagonal matrices and C is well-conditioned matrix.
249       In that case, complete pivoting in the first QR factorizations provides
250       accuracy dependent on the condition number of C, and independent of D1,
251       D2.  Such  higher  accuracy is not completely understood theoretically,
252       but it works well in practice.  Further, if A can be  written  as  A  =
253       B*D,  with  well-conditioned B and some diagonal D, then the high accu‐
254       racy is guaranteed, both theoretically and in software, independent  of
255       D. For more details see [1], [2].
256          The  computational  range  for  the  singular values can be the full
257       range ( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic  and
258       the  BLAS & LAPACK routines called by DGEJSV are implemented to work in
259       that range.  If that is not the case, then  the  restriction  for  safe
260       computation  with  the  singular values in the range of normalized IEEE
261       numbers     is     that     the     spectral      condition      number
262       kappa(A)=sigma_max(A)/sigma_min(A)  does  not overflow. This code (DGE‐
263       JSV) is best used in this restricted range, meaning that singular  val‐
264       ues of magnitude below ||A||_2 / SLAMCH('O') are returned as zeros. See
265       JOBR for details on this.
266          Further,  this  implementation  is  somewhat  slower  than  the  one
267       described  in  [1,2]  due to replacement of some non-LAPACK components,
268       and because the choice of some tuning parameters in the iterative  part
269       (DGESVJ) is left to the implementer on a particular machine.
270          The rank revealing QR factorization (in this code: SGEQP3) should be
271       implemented as in [3]. We have a new version of SGEQP3  under  develop‐
272       ment that is more robust than the current one in LAPACK, with a cleaner
273       cut in rank defficient cases. It will be available in the SIGMA library
274       [4].   If  M  is  much larger than N, it is obvious that the inital QRF
275       with column pivoting can be preprocessed by the QRF  without  pivoting.
276       That well known trick is not used in DGEJSV because in some cases heavy
277       row weighting can be treated with complete pivoting.  The  overhead  in
278       cases  M much larger than N is then only due to pivoting, but the bene‐
279       fits in terms of accuracy  have  prevailed.  The  implementer/user  can
280       incorporate  this  extra  QRF  step  easily.  The  implementer can also
281       improve data movement (matrix transpose, matrix copy, matrix transposed
282       copy)  -  this  implementation  of DGEJSV uses only the simplest, naive
283       data movement.  Contributors
284       Zlatko Drmac (Zagreb, Croatia) and Kresimir  Veselic  (Hagen,  Germany)
285       References
286          SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
287          LAPACK Working note 169.
288          SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
289          LAPACK Working note 170.
290          factorization software - a case study.
291          ACM Trans. Math. Softw. Vol. 35, No 2 (2008), pp. 1-28.
292          LAPACK Working note 176.
293          QSVD, (H,K)-SVD computations.
294          Department  of Mathematics, University of Zagreb, 2008.  Bugs, exam‐
295       ples and comments
296       Please report all bugs and send interesting examples and/or comments to
297       drmac@math.hr. Thank you.
298
299
300
301 LAPACK routine (version 3.2)    November 2008                       DGEJSV(1)
Impressum