HPL_pdrpancrN(3)             HPL Library Functions             HPL_pdrpancrN(3)

NAME
       HPL_pdrpancrN - Crout recursive panel factorization.

SYNOPSIS
       #include "hpl.h"

       void HPL_pdrpancrN( HPL_T_panel * PANEL, const int M, const int N,
                           const int ICOFF, double * WORK );

PURPOSE
       HPL_pdrpancrN recursively factorizes a panel of columns using the
       recursive Crout variant of the usual one-dimensional algorithm. The
       lower triangular N0-by-N0 upper block of the panel is stored in
       no-transpose form (i.e. just like the input matrix itself).

       Bi-directional exchange is used to perform the swap::broadcast
       operations at once for one column in the panel. This results in
       fewer, slightly larger messages than usual. On P processes and
       assuming bi-directional links, the running time of this function
       can be approximated by (when N is equal to N0):

          N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
          N0^2 * ( M - N0/3 ) * gam2-3

       where M is the local number of rows of the panel, lat and bdwth are
       the latency and bandwidth of the network for double precision real
       words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS
       rate of execution. The recursive algorithm makes it possible to
       almost achieve Level 3 BLAS performance in the panel factorization.
       On a large number of modern machines, however, this operation is
       latency bound, meaning that its cost can be estimated by the
       latency portion alone, N0 * log_2(P) * lat. Mono-directional links
       will double this communication cost.
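
       The estimate above can be evaluated numerically. The following is a
       minimal sketch, not part of HPL: the struct and function names are
       hypothetical, gam2-3 is assumed to be a time per floating-point
       operation (so that the second term is a time), and log_2(P) is
       rounded up to a whole number of exchange steps, which is exact only
       when P is a power of two.

```c
#include <assert.h>

/* Hypothetical container for the machine parameters named in the text. */
typedef struct {
    double lat;    /* network latency, time per message                   */
    double bdwth;  /* network bandwidth, double precision words per time  */
    double gam23;  /* assumed: time per flop at the Level 2/3 BLAS rate   */
} machine_params;

/* Smallest k such that 2^k >= p (assumption: log_2(P) rounded up). */
static int ceil_log2(int p)
{
    int k = 0;
    assert(p >= 1);
    while ((1LL << k) < (long long)p)
        k++;
    return k;
}

/* Communication term: N0 * log_2(P) * (lat + (2*N0 + 4) / bdwth). */
double est_comm_time(int n0, int p, const machine_params *mp)
{
    return n0 * ceil_log2(p) * (mp->lat + (2.0 * n0 + 4.0) / mp->bdwth);
}

/* Computation term: N0^2 * (M - N0/3) * gam2-3. */
double est_comp_time(int m, int n0, const machine_params *mp)
{
    return (double)n0 * n0 * (m - n0 / 3.0) * mp->gam23;
}

/* Latency-bound approximation: N0 * log_2(P) * lat. */
double est_latency_only(int n0, int p, const machine_params *mp)
{
    return n0 * ceil_log2(p) * mp->lat;
}

/* Full estimate for bi-directional links (sum of both terms). */
double est_total_time(int m, int n0, int p, const machine_params *mp)
{
    return est_comm_time(n0, p, mp) + est_comp_time(m, n0, mp);
}
```

       Comparing est_latency_only() with est_total_time() for a given
       machine gives a rough indication of whether the panel factorization
       is latency bound there. For mono-directional links, the text
       suggests doubling the value returned by est_comm_time().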

ARGUMENTS
       PANEL   (local input/output)    HPL_T_panel *
               On entry, PANEL points to the data structure containing the
               panel information.

       M       (local input)           const int
               On entry, M specifies the local number of rows of sub(A).

       N       (local input)           const int
               On entry, N specifies the local number of columns of sub(A).

       ICOFF   (global input)          const int
               On entry, ICOFF specifies the row and column offset of sub(A)
               in A.

       WORK    (local workspace)       double *
               On entry, WORK is a work array of size at least 2*(4+2*N0).
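
       The minimum WORK size can be computed as in the sketch below. The
       helper names are hypothetical and not part of HPL, and N0 is
       assumed to be the panel width referred to in the running-time
       estimate (the value N takes at the top level).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimum WORK length in doubles, as documented: 2*(4+2*N0). */
size_t min_work_len(int n0)
{
    assert(n0 >= 0);
    return 2u * (4u + 2u * (size_t)n0);
}

/* Allocate a WORK buffer of the minimum size.
 * Returns NULL on allocation failure; the caller must free() it. */
double *alloc_work(int n0)
{
    return (double *)malloc(min_work_len(n0) * sizeof(double));
}
```

       For example, a panel of width N0 = 64 needs at least 264 doubles.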

SEE ALSO
       HPL_dlocmax (3), HPL_dlocswpN (3), HPL_dlocswpT (3), HPL_pdmxswp (3),
       HPL_pdpancrN (3), HPL_pdpancrT (3), HPL_pdpanllN (3), HPL_pdpanllT (3),
       HPL_pdpanrlN (3), HPL_pdpanrlT (3), HPL_pdrpancrT (3),
       HPL_pdrpanllN (3), HPL_pdrpanllT (3), HPL_pdrpanrlN (3),
       HPL_pdrpanrlT (3), HPL_pdfact (3).

HPL 2.1                        October 26, 2012                HPL_pdrpancrN(3)