1HPL_plindx0(3) HPL Library Functions HPL_plindx0(3)
2
3
4
6 HPL_plindx0 - Compute local swapping index arrays.
7
9 #include "hpl.h"
10
11 void HPL_plindx0( HPL_T_panel * PANEL, const int K, int * IPID, int *
12 LINDXA, int * LINDXAU, int * LLEN );
13
15 HPL_plindx0 computes two local arrays LINDXA and LINDXAU containing
16 the local source and final destination position resulting from the
17 application of row interchanges.
18
19 On entry, the array IPID of length K is such that the row of global
20 index IPID(i) should be mapped onto row of global index IPID(i+1).
21 Let IA be the global index of the first row to be swapped. For k in
22 [0..K/2), the row of global index IPID(2*k) should be mapped onto the
23 row of global index IPID(2*k+1). The question then, is to determine
24 which rows should ultimately be part of U.
25
26 First, some rows of the process ICURROW may be swapped locally. One
27 of this row belongs to U, the other one belongs to my local piece of
28 A. The other rows of the current block are swapped with remote rows
29 and are thus not part of U. These rows however should be sent along,
30 and grabbed by the other processes as we progress in the exchange
31 phase.
32
33 So, assume that I am ICURROW and consider a row of index IPID(2*i)
34 that I own. If I own IPID(2*i+1) as well and IPID(2*i+1) - IA is less
35 than N, this row is locally swapped and should be copied into U at
36 the position IPID(2*i+1) - IA. No row will be exchanged for this one.
37 If IPID(2*i+1)-IA is greater than N, then the row IPID(2*i) should be
38 locally copied into my local piece of A at the position corresponding
39 to the row of global index IPID(2*i+1).
40
41 If the process ICURROW does not own IPID(2*i+1), then row IPID(2*i)
42 is to be swapped away and strictly speaking does not belong to U, but
43 to A remotely. Since this process will however send this array U,
44 this row is copied into U, exactly where the row IPID(2*i+1) should
45 go. For this, we search IPID for k1, such that IPID(2*k1) is equal to
46 IPID(2*i+1); and row IPID(2*i) is to be copied in U at the position
47 IPID(2*k1+1)-IA.
48
49 It is thus important to put the rows that go into U, i.e., such that
50 IPID(2*i+1) - IA is less than N at the begining of the array IPID. By
51 doing so, U is formed, and the local copy is performed in just one
52 sweep.
53
54 Two lists LINDXA and LINDXAU are built. LINDXA contains the local
55 index of the rows I have that should be copied. LINDXAU contains the
56 local destination information: if LINDXAU(k) >= 0, row LINDXA(k) of A
57 is to be copied in U at position LINDXAU(k). Otherwise, row LINDXA(k)
58 of A should be locally copied into A(-LINDXAU(k),:). In the process
59 ICURROW, the initial packing algorithm proceeds as follows.
60
61 for all entries in IPID,
62 if IPID(2*i) is in ICURROW,
63 if IPID(2*i+1) is in ICURROW,
64 if( IPID(2*i+1) - IA < N )
65 save corresponding local position
66 of this row (LINDXA);
67 save local position (LINDXAU) in U
68 where this row goes;
69 [copy row IPID(2*i) in U at position
70 IPID(2*i+1)-IA; ];
71 else
72 save corresponding local position of
73 this row (LINDXA);
74 save local position (-LINDXAU) in A
75 where this row goes;
76 [copy row IPID(2*i) in my piece of A
77 at IPID(2*i+1);]
78 end if
79 else
80 find k1 such that IPID(2*k1) = IPID(2*i+1);
81 copy row IPID(2*i) in U at position
82 IPID(2*k1+1)-IA;
83 save corresponding local position of this
84 row (LINDXA);
85 save local position (LINDXAU) in U where
86 this row goes;
87 end if
88 end if
89 end for
90
91 Second, if I am not the current row process ICURROW, all source rows
92 in IPID that I own are part of U. Indeed, they are swapped with one
93 row of the current block of rows, and the main factorization
94 algorithm proceeds one row after each other. The processes different
95 from ICURROW, should exchange and accumulate those rows until they
96 receive some data previously owned by the process ICURROW.
97
98 In processes different from ICURROW, the initial packing algorithm
99 proceeds as follows. Consider a row of global index IPID(2*i) that I
100 own. When I will be receiving data previously owned by ICURROW, i.e.,
101 U, row IPID(2*i) should replace the row in U at pos. IPID(2*i+1)-IA,
102 and this particular row of U should be first copied into my piece of
103 A, at A(il,:), where il is the local row index corresponding to
104 IPID(2*i). Now,initially, this row will be packed into workspace, say
105 as the kth row of that work array. The following algorithm sets
106 LINDXAU[k] to IPID(2*i+1)-IA, that is the position in U where the row
107 should be copied. LINDXA(k) stores the local index in A where this
108 row of U should be copied, i.e il.
109
110 for all entries in IPID,
111 if IPID(2*i) is not in ICURROW,
112 copy row IPID(2*i) in work array;
113 save corresponding local position
114 of this row (LINDXA);
115 save position (LINDXAU) in U where
116 this row should be copied;
117 end if
118 end for
119
120 Since we are at it, we also globally figure out how many rows every
121 process has. That is necessary, because it would rather be cumbersome
122 to figure it on the fly during the bi-directional exchange phase.
123 This information is kept in the array LLEN of size NPROW. Also note
124 that the arrays LINDXA and LINDXAU are of max length equal to 2*N.
125
127 PANEL (local input/output) HPL_T_panel *
128 On entry, PANEL points to the data structure containing the
129 panel information.
130
131 K (global input) const int
132 On entry, K specifies the number of entries in IPID. K is at
133 least 2*N, and at most 4*N.
134
135 IPID (global input) int *
136 On entry, IPID is an array of length K. The first K entries
137 of that array contain the src and final destination resulting
138 from the application of the interchanges.
139
140 LINDXA (local output) int *
141 On entry, LINDXA is an array of dimension 2*N. On exit, this
142 array contains the local indexes of the rows of A I have that
143 should be copied into U.
144
145 LINDXAU (local output) int *
146 On exit, LINDXAU is an array of dimension 2*N. On exit, this
147 array contains the local destination information encoded as
148 follows. If LINDXAU(k) >= 0, row LINDXA(k) of A is to be
149 copied in U at position LINDXAU(k). Otherwise, row LINDXA(k)
150 of A should be locally copied into A(-LINDXAU(k),:).
151
152 LLEN (global output) int *
153 On entry, LLEN is an array of length NPROW. On exit, it
154 contains how many rows every process has.
155
157 HPL_pdlaswp00N (3), HPL_pdlaswp00T (3), HPL_pdlaswp01N (3),
158 HPL_pdlaswp01T (3).
159
160
161
162HPL 2.2 February 24, 2016 HPL_plindx0(3)