HPL_pdrpanllN(3)             HPL Library Functions             HPL_pdrpanllN(3)

NAME
HPL_pdrpanllN - Left-looking recursive panel factorization.

SYNOPSIS
#include "hpl.h"

void HPL_pdrpanllN( HPL_T_panel * PANEL, const int M, const int N,
                    const int ICOFF, double * WORK );

DESCRIPTION
HPL_pdrpanllN recursively factorizes a panel of columns using the
recursive Left-looking variant of the one-dimensional algorithm. The
lower triangular part of the upper N0-by-N0 block of the panel is
stored in no-transpose form (i.e., just like the input matrix itself).

A bi-directional exchange is used to perform the swap and broadcast
operations in a single step for each column of the panel. This results
in fewer, slightly larger messages than usual. On P processes, and
assuming bi-directional links, the running time of this function can
be approximated as follows (when N is equal to N0):

    N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
    N0^2 * ( M - N0/3 ) * gam2-3

where M is the local number of rows of the panel, lat and bdwth are
the latency and bandwidth of the network for double precision real
words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS rate
of execution. The recursive algorithm indeed allows the panel
factorization to achieve nearly Level 3 BLAS performance. On many
modern machines, however, this operation is latency bound, meaning
that its cost can be estimated by the latency portion alone,
N0 * log_2(P) * lat. Mono-directional links double this communication
cost.

ARGUMENTS
PANEL   (local input/output)          HPL_T_panel *
On entry, PANEL points to the data structure containing the
panel information.

M       (local input)                 const int
On entry, M specifies the local number of rows of sub(A).

N       (local input)                 const int
On entry, N specifies the local number of columns of sub(A).

ICOFF   (global input)                const int
On entry, ICOFF specifies the row and column offset of sub(A)
in A.

WORK    (local workspace)             double *
On entry, WORK is a work array of size at least 2*(4+2*N0).
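Since the caller supplies WORK, the stated minimum size can be computed and allocated up front. This is a minimal sketch assuming the caller already knows N0; the helpers rpanll_work_len and alloc_rpanll_work are hypothetical and do not describe how HPL's own driver manages this workspace.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimum WORK length, in doubles, stated for HPL_pdrpanllN: 2*(4+2*N0).
   Hypothetical helper, not part of HPL. */
static size_t rpanll_work_len(int N0) {
  assert(N0 >= 0);
  return 2u * (4u + 2u * (size_t)N0);
}

/* Allocate a workspace of at least that size; the caller must free it.
   Returns NULL if the allocation fails. */
static double *alloc_rpanll_work(int N0) {
  return malloc(rpanll_work_len(N0) * sizeof(double));
}
```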

SEE ALSO
HPL_dlocmax (3), HPL_dlocswpN (3), HPL_dlocswpT (3), HPL_pdmxswp (3),
HPL_pdpancrN (3), HPL_pdpancrT (3), HPL_pdpanllN (3), HPL_pdpanllT (3),
HPL_pdpanrlN (3), HPL_pdpanrlT (3), HPL_pdrpancrN (3), HPL_pdrpancrT (3),
HPL_pdrpanllT (3), HPL_pdrpanrlN (3), HPL_pdrpanrlT (3), HPL_pdfact (3).

HPL 2.2                      February 24, 2016                 HPL_pdrpanllN(3)