HPL_pdtrsv(3) HPL Library Functions HPL_pdtrsv(3)

HPL_pdtrsv - Solve triu( A ) x = b.

#include "hpl.h"

void HPL_pdtrsv( HPL_T_grid * GRID, HPL_T_pmat * AMAT );

HPL_pdtrsv solves an upper triangular system of linear equations.

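As a point of reference, the serial form of this operation is plain back substitution. The sketch below is not HPL code: triu_solve is a hypothetical helper, and it assumes a column-major array with N rows and N+1 columns whose last column holds b, with a nonzero diagonal.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Serial sketch (hypothetical helper, not part of HPL):
 * solve triu( A ) x = b by back substitution.
 *
 * a   : column-major array with n rows used and n+1 columns;
 *       column n (0-based) holds the right-hand side b.
 * lda : leading dimension of a, assumed >= n.
 * x   : output vector of length n.
 * The diagonal of A is assumed to be nonzero.
 */
void triu_solve(int n, const double *a, int lda, double *x)
{
    int i, j;

    assert(n >= 0 && lda >= n);

    /* Start from x = b, the last column of the augmented matrix. */
    for (i = 0; i < n; i++)
        x[i] = a[i + (size_t)n * lda];

    /* Walk the columns from last to first. */
    for (j = n - 1; j >= 0; j--) {
        x[j] /= a[j + (size_t)j * lda];
        /* Remove the contribution of x[j] from the rows above. */
        for (i = 0; i < j; i++)
            x[i] -= x[j] * a[i + (size_t)j * lda];
    }
}
```

Only the upper triangle of A is read. The distributed routine performs the same arithmetic, but one NB by NB diagonal block at a time across the process grid.
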
The rhs b is the last column of the N by N+1 matrix A. The solve starts
in the process column owning the Nth column of A, so b may need to be
moved one process column to the left at the beginning. The routine
therefore needs a column vector in every process column but the one
owning b. The result is replicated in all process rows and returned in
XR, i.e. XR is of size nq = LOCq( N ) in all processes.

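To make the ownership statements above concrete, here is a minimal sketch of the index arithmetic for a block-cyclic column distribution. The function names owner_col and local_cols are hypothetical, and the sketch assumes 0-based column indices with the first block of NB columns stored on process column 0.

```c
#include <assert.h>

/*
 * Hypothetical helpers, not part of HPL. Assumption: columns are
 * dealt out in blocks of nb to q process columns, starting with
 * process column 0.
 */

/* Process column that owns global column j (0-based). */
int owner_col(int j, int nb, int q)
{
    assert(j >= 0 && nb > 0 && q > 0);
    return (j / nb) % q;
}

/* Number of the first n global columns held by process column mycol.
 * This plays the role of LOCq( n ) in the text. */
int local_cols(int n, int nb, int mycol, int q)
{
    int nblocks, loc, extra;

    assert(n >= 0 && nb > 0 && q > 0 && mycol >= 0 && mycol < q);

    nblocks = n / nb;            /* number of full blocks            */
    loc     = (nblocks / q) * nb; /* full rounds over all q columns   */
    extra   = nblocks % q;       /* full blocks in the partial round */

    if (mycol < extra)
        loc += nb;               /* one more full block              */
    else if (mycol == extra)
        loc += n % nb;           /* the trailing partial block       */

    return loc;
}
```

With N = 8, NB = 2 and Q = 3, the last column of the triangular part (index 7) lies in process column 0 while b (index 8) lies in process column 1, so b has to move one process column to the left. With N = 7 both lie in process column 0 and no move is needed.
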
The algorithm uses a decreasing one-ring broadcast in process rows and
columns, implemented in terms of synchronous point-to-point
communication primitives. A lookahead of depth 1 is used to shorten the
critical path. The entire operation is essentially latency bound, and
an estimate of its running time is given by:

   (move rhs)  lat + N / ( P bdwth ) +
   (solve)     ( ( N / NB ) - 1 ) 2 ( lat + NB / bdwth ) +
               gam2 N^2 / ( P Q ),

where gam2 is an estimate of the Level 2 BLAS rate of execution.
There are N / NB diagonal blocks. Computing the next NB entries of the
solution vector requires exchanging 2 messages of length NB, and the
solve performs a total of N^2 floating point operations.

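The estimate above can be evaluated directly. The following sketch transcribes the three terms as written; the function name pdtrsv_time_estimate is hypothetical. Because the formula multiplies by gam2, the sketch assumes gam2 is given as time per floating point operation, and that bdwth is given in matrix entries per unit of time so that all terms share one time unit.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of HPL: evaluate the running time
 * estimate term by term.
 *
 * Assumptions: lat is the time per message, bdwth is in matrix
 * entries per unit of time, gam2 is the time per floating point
 * operation for Level 2 BLAS.
 */
double pdtrsv_time_estimate(double n, double nb, double p, double q,
                            double lat, double bdwth, double gam2)
{
    double move_rhs, solve, flops;

    assert(n > 0.0 && nb > 0.0 && p > 0.0 && q > 0.0 && bdwth > 0.0);

    /* (move rhs) lat + N / ( P bdwth ) */
    move_rhs = lat + n / (p * bdwth);

    /* (solve) ((N / NB)-1) 2 (lat + NB / bdwth) */
    solve = ((n / nb) - 1.0) * 2.0 * (lat + nb / bdwth);

    /* gam2 N^2 / ( P Q ) */
    flops = gam2 * n * n / (p * q);

    return move_rhs + solve + flops;
}
```

For fixed N, a smaller NB increases the number of diagonal blocks and therefore the number of latency terms, which is why the operation is described as latency bound.
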
GRID    (local input)                 HPL_T_grid *
        On entry, GRID points to the data structure containing the
        process grid information.

AMAT    (local input/output)          HPL_T_pmat *
        On entry, AMAT points to the data structure containing the
        local array information.

HPL_pdgesv(3).

HPL 2.2 February 24, 2016 HPL_pdtrsv(3)