HPL_pdrpancrN(3)             HPL Library Functions            HPL_pdrpancrN(3)

NAME

       HPL_pdrpancrN - Crout recursive panel factorization.

SYNOPSIS

       #include "hpl.h"

       void  HPL_pdrpancrN(  HPL_T_panel  *  PANEL,  const int M, const int N,
       const int ICOFF, double * WORK );

DESCRIPTION

       HPL_pdrpancrN recursively factorizes a panel of columns using the
       recursive Crout variant of the usual one-dimensional algorithm. The
       lower triangular N0-by-N0 upper block of the panel is stored in
       no-transpose form (i.e. just like the input matrix itself).

       Bi-directional exchange is used to perform the swap::broadcast
       operations at once for one column in the panel. This results in a
       lower number of slightly larger messages than usual. On P processes
       and assuming bi-directional links, the running time of this function
       can be approximated by (when N is equal to N0):

          N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
          N0^2 * ( M - N0/3 ) * gam2-3

       where M is the local number of rows of the panel, lat and bdwth are
       the latency and bandwidth of the network for double precision real
       words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS
       rate of execution. The recursive algorithm makes it possible to come
       close to Level 3 BLAS performance in the panel factorization. On a
       large number of modern machines, however, this operation is latency
       bound, meaning that its cost can be estimated by the latency portion
       alone, N0 * log_2(P) * lat. Mono-directional links will double this
       communication cost.

ARGUMENTS

       PANEL   (local input/output)    HPL_T_panel *
               On entry, PANEL points to the data structure containing the
               panel information.

       M       (local input)           const int
               On entry, M specifies the local number of rows of sub(A).

       N       (local input)           const int
               On entry, N specifies the local number of columns of sub(A).

       ICOFF   (global input)          const int
               On entry, ICOFF specifies the row and column offset of sub(A)
               in A.

       WORK    (local workspace)       double *
               On entry, WORK is a work array of size at least 2*(4+2*N0).
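
       The minimum size of WORK can be computed before the call. The sketch
       below is illustrative only: both helper names are hypothetical and
       not part of HPL, and n0 stands for the value N0 used in the size
       bound above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not an HPL routine: minimum number of doubles
 * that WORK must hold, i.e. 2*(4+2*N0). */
size_t hpl_rpancr_work_len(int n0)
{
    assert(n0 >= 0);
    return 2 * (4 + 2 * (size_t)n0);
}

/* Hypothetical helper: allocates a workspace of the minimum size.
 * Returns NULL on allocation failure; the caller frees the result. */
double *hpl_rpancr_alloc_work(int n0)
{
    return malloc(hpl_rpancr_work_len(n0) * sizeof(double));
}
```

       For example, N0 = 64 requires at least 264 doubles.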

SEE ALSO

       HPL_dlocmax (3), HPL_dlocswpN (3), HPL_dlocswpT (3), HPL_pdmxswp (3),
       HPL_pdpancrN (3), HPL_pdpancrT (3), HPL_pdpanllN (3), HPL_pdpanllT (3),
       HPL_pdpanrlN (3), HPL_pdpanrlT (3), HPL_pdrpancrT (3),
       HPL_pdrpanllN (3), HPL_pdrpanllT (3), HPL_pdrpanrlN (3),
       HPL_pdrpanrlT (3), HPL_pdfact (3).



HPL 2.1                        October 26, 2012               HPL_pdrpancrN(3)