tuning(3)

tuning(3)                           MEPACK                           tuning(3)

NAME

       tuning - Tuning

       MEPACK can be tuned at runtime: all computation-related parameters
       can be adjusted while the program runs; see Options and Settings to
       tune MEPACK for details. Alternatively, a configuration file can be
       provided. This configuration file is a Lua script containing a set of
       functions. Each function receives the problem dimensions as input and
       returns the optimal value of the corresponding parameter.

       By default, the configuration file src/default_config.lua is compiled
       into the library. A user-supplied configuration file can be used in
       two ways:

       • Set the environment variable MEPACK_LUA_CONFIG to the path of an
         alternative configuration file.

       • Set the option -DMEPACK_LUA_CONFIG=PATH at compile time, where PATH
         is the path of the new configuration file.
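
       As an illustration, the following sketch shows what dimension-dependent
       entries in a user-supplied configuration file could look like. The
       function names are taken from the template in the next section; the
       thresholds and returned block sizes are invented placeholders, not
       measured optima. Since the behaviour for functions missing from a user
       file is not described here, a real file would typically start from the
       complete template and change only the entries of interest.

```lua
       -- Sketch of user-supplied entries; thresholds and values are
       -- placeholders, not measured optima.

       -- Sylvester-type equations receive both dimensions of the solution:
       -- m (rows) and n (columns).
       function trsylv_double_mb(m, n)
           if m < 512 then
               return 32
           end
           return 64
       end

       function trsylv_double_nb(m, n)
           if n < 512 then
               return 32
           end
           return 64
       end

       -- 'Symmetric' equations such as the Lyapunov equation receive only m.
       function trlyap_double_mb(m)
           if m < 1000 then
               return 32
           end
           return 64
       end
```

       Every branch returns an integer, as required for all return values of
       the configuration functions.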

Configuration File Template

       The following template lists all parameters and the corresponding
       functions currently used by MEPACK. All function arguments and return
       values are integers. The input m is the number of rows of the solution
       and of the right-hand side; the input n is the number of columns of
       the solution and of the right-hand side. If the algorithm solves a
       'symmetric equation', such as the Lyapunov or the Stein equation, only
       m is used. The parameters directly affect only the solvers for
       (quasi-)triangular coefficient matrices. The solvers for equations
       with general coefficient matrices rely on these triangular solvers and
       are therefore affected indirectly.

       --
       -- Outer block size for the two stage schemes. These values are used by the
       -- _2STAGE routines.
       --

       -- TGCSYLV in double precision
       function tgcsylv_double_2stage(m,n)
           return 1024
       end

       -- TGCSYLV_DUAL in double precision
       function tgcsylv_dual_double_2stage(m,n)
           return 1024
       end

       -- TGCSYLV in single precision
       function tgcsylv_single_2stage(m,n)
           return 1024
       end

       -- TGCSYLV_DUAL in single precision
       function tgcsylv_dual_single_2stage(m,n)
           return 1024
       end

       -- TGSYLV in double precision
       function tgsylv_double_2stage(m,n)
           return 1024
       end

       -- TGSYLV in single precision
       function tgsylv_single_2stage(m,n)
           return 1024
       end

       -- TRSYLV in double precision
       function trsylv_double_2stage(m,n)
           return 1024
       end

       -- TRSYLV in single precision
       function trsylv_single_2stage(m,n)
           return 1024
       end

       -- TRSYLV2 in double precision
       function trsylv2_double_2stage(m,n)
           return 1024
       end

       -- TRSYLV2 in single precision
       function trsylv2_single_2stage(m,n)
           return 1024
       end

       -- TGLYAP in double precision
       function tglyap_double_2stage(m)
           return 1024
       end

       -- TGLYAP in single precision
       function tglyap_single_2stage(m)
           return 1024
       end

       -- TRLYAP in double precision
       function trlyap_double_2stage(m)
           return 1024
       end

       -- TRLYAP in single precision
       function trlyap_single_2stage(m)
           return 1024
       end

       -- TGSTEIN in double precision
       function tgstein_double_2stage(m)
           return 1024
       end

       -- TGSTEIN in single precision
       function tgstein_single_2stage(m)
           return 1024
       end

       --
       -- Block sizes for the level-3 and DAG schemes. These values are used by
       -- routines whose names contain '_L3' or '_DAG'. The function with the
       -- suffix '_mb' returns the block size with respect to the number of rows
       -- of the solution. The function with the suffix '_nb' returns the block
       -- size with respect to the number of columns of the solution. For
       -- symmetric equations, like the Lyapunov or the Stein equation, only the
       -- function with the suffix '_mb' is used.
       --

       -- MB for TGCSYLV in double precision
       function tgcsylv_double_mb(m,n)
           return 32
       end

       -- NB for TGCSYLV in double precision
       function tgcsylv_double_nb(m,n)
           return 32
       end

       -- MB for TGCSYLV_DUAL in double precision
       function tgcsylv_dual_double_mb(m,n)
           return 32
       end

       -- NB for TGCSYLV_DUAL in double precision
       function tgcsylv_dual_double_nb(m,n)
           return 32
       end

       -- MB for TGCSYLV in single precision
       function tgcsylv_single_mb(m,n)
           return 32
       end

       -- NB for TGCSYLV in single precision
       function tgcsylv_single_nb(m,n)
           return 32
       end

       -- MB for TGCSYLV_DUAL in single precision
       function tgcsylv_dual_single_mb(m,n)
           return 32
       end

       -- NB for TGCSYLV_DUAL in single precision
       function tgcsylv_dual_single_nb(m,n)
           return 32
       end

       -- MB for TGSYLV in double precision
       function tgsylv_double_mb(m,n)
           return 32
       end

       -- NB for TGSYLV in double precision
       function tgsylv_double_nb(m,n)
           return 32
       end

       -- MB for TGSYLV in single precision
       function tgsylv_single_mb(m,n)
           return 32
       end

       -- NB for TGSYLV in single precision
       function tgsylv_single_nb(m,n)
           return 32
       end

       -- MB for TRSYLV in double precision
       function trsylv_double_mb(m,n)
           return 32
       end

       -- NB for TRSYLV in double precision
       function trsylv_double_nb(m,n)
           return 32
       end

       -- MB for TRSYLV in single precision
       function trsylv_single_mb(m,n)
           return 32
       end

       -- NB for TRSYLV in single precision
       function trsylv_single_nb(m,n)
           return 32
       end

       -- MB for TRSYLV2 in double precision
       function trsylv2_double_mb(m,n)
           return 32
       end

       -- NB for TRSYLV2 in double precision
       function trsylv2_double_nb(m,n)
           return 32
       end

       -- MB for TRSYLV2 in single precision
       function trsylv2_single_mb(m,n)
           return 32
       end

       -- NB for TRSYLV2 in single precision
       function trsylv2_single_nb(m,n)
           return 32
       end

       -- MB for TGLYAP in double precision
       function tglyap_double_mb(m)
           return 32
       end

       -- MB for TGLYAP in single precision
       function tglyap_single_mb(m)
           return 32
       end

       -- MB for TRLYAP in double precision
       function trlyap_double_mb(m)
           return 32
       end

       -- MB for TRLYAP in single precision
       function trlyap_single_mb(m)
           return 32
       end

       -- MB for TGSTEIN in double precision
       function tgstein_double_mb(m)
           return 32
       end

       -- MB for TGSTEIN in single precision
       function tgstein_single_mb(m)
           return 32
       end

       -- MB for TRSTEIN in double precision
       function trstein_double_mb(m)
           return 32
       end

       -- MB for TRSTEIN in single precision
       function trstein_single_mb(m)
           return 32
       end
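
       Before pointing MEPACK_LUA_CONFIG at an edited copy of this template,
       it may be worth checking the file with a stand-alone Lua interpreter.
       The sketch below is not part of MEPACK; the helper name check_config
       and the file name in the comment are illustrative assumptions. It calls
       each named function with sample dimensions and collects the names of
       functions that are missing or do not return a positive integer.

```lua
       -- Hypothetical helper, not part of MEPACK: verifies that every named
       -- tuning function exists and returns a positive integer for the given
       -- sample dimensions. Load the configuration first, for example with
       -- dofile("my_config.lua") (the file name is a placeholder).
       function check_config(names, m, n)
           local problems = {}
           for _, name in ipairs(names) do
               local f = _G[name]
               if type(f) ~= "function" then
                   problems[#problems + 1] = name .. ": not defined"
               else
                   -- Functions declared with only m simply ignore n.
                   local v = f(m, n)
                   if type(v) ~= "number" or v < 1 or v ~= math.floor(v) then
                       problems[#problems + 1] = name .. ": returned " .. tostring(v)
                   end
               end
           end
           return problems
       end

       -- Example: check a few entries for a 2000 x 3000 right-hand side.
       -- local bad = check_config({"trsylv_double_mb", "trsylv_double_nb",
       --                           "trlyap_double_mb"}, 2000, 3000)
       -- for _, msg in ipairs(bad) do print(msg) end
```

       An empty result list means that every listed function returned a
       usable value for the chosen sample dimensions; it says nothing about
       whether the values are good for performance.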

Obtaining Optimal Values

       Optimal block sizes can be determined by running benchmarks, for
       example with the programs in examples/triangular. The optimal block
       size MB for the standard Lyapunov equation, for instance, can be
       obtained by executing

       $ ./examples/triangular/benchmark_trlyap --solver=1 --rows=1000:1000:5000 --mb=32:32:128
       # HDF5 Store Path: ./ (set with MEPACK_HDF_PATH)
       # Command Line: --solver=1 --rows=1000:1000:5000 --mb=32:32:128
       # RUNS:  5
       # Number of Matrices: 1
       # Rows: 1000 (1000:1000:5000)
       # TRANSA: N
       # Block Alignment: YES
       # MACHINE DEFAULT Config:
       # Solver: LEVEL3 - LEVEL2: LOCAL COPY fixed maximum size with alignment
       #
       #  M   MB  Wall-Time     CPU-Time       Ratio    Forward-Err
        1000  32  1.47723e-01  5.73458e-01  3.88198e+00  1.02451e-12
        1000  64  2.02979e-01  6.52629e-01  3.21526e+00  7.70460e-13
        1000  96  1.96380e-01  7.09709e-01  3.61397e+00  7.76942e-13
        1000 128  2.20266e-01  8.20464e-01  3.72487e+00  7.80125e-13
        2000  32  1.52258e+00  5.40316e+00  3.54869e+00  2.53205e-12
        2000  64  1.37894e+00  5.08727e+00  3.68926e+00  3.81413e-12
        2000  96  1.50218e+00  5.38654e+00  3.58582e+00  6.89304e-12
        2000 128  1.42162e+00  5.25848e+00  3.69895e+00  3.36752e-12
        3000  32  4.94669e+00  1.81289e+01  3.66485e+00  6.80014e-12
        3000  64  5.09770e+00  1.77916e+01  3.49011e+00  5.05035e-12
        3000  96  4.85457e+00  1.75745e+01  3.62020e+00  6.99595e-12
        3000 128  6.37335e+00  2.37100e+01  3.72018e+00  5.60129e-12
        4000  32  1.49584e+01  5.41659e+01  3.62110e+00  3.11764e-12
        4000  64  1.32034e+01  4.91697e+01  3.72402e+00  3.50054e-12
        4000  96  1.55753e+01  5.29380e+01  3.39884e+00  3.79209e-12
        4000 128  1.45271e+01  5.27828e+01  3.63339e+00  2.42395e-12
        5000  32  2.87573e+01  1.03002e+02  3.58176e+00  4.27677e-12
        5000  64  2.20910e+01  8.00796e+01  3.62499e+00  5.20105e-12
        5000  96  1.84365e+01  6.93392e+01  3.76097e+00  4.36991e-12
        5000 128  2.49911e+01  9.18636e+01  3.67585e+00  5.64129e-12

       Selecting, for each M, the block size with the minimal wall time and
       switching between these optima halfway between the measured sizes
       yields the following trlyap_double_mb function for an Intel Celeron
       N3450 with OpenBLAS 0.3.8 in pthread mode:

       function trlyap_double_mb(m)
           if m < 1500 then           -- fastest MB measured at M = 1000
               return 32
           elseif m < 2500 then       -- fastest MB measured at M = 2000
               return 64
           elseif m < 3500 then       -- fastest MB measured at M = 3000
               return 96
           elseif m < 4500 then       -- fastest MB measured at M = 4000
               return 64
           else                       -- fastest MB measured at M = 5000
               return 96
           end
       end
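
       The selection step can also be scripted. The following plain-Lua sketch
       is not part of MEPACK; the names best_block_sizes and make_mb_function
       are illustrative. It parses the data rows of the benchmark output
       (columns M, MB, Wall-Time, ...), keeps the MB with the smallest wall
       time for every M, and returns a function that switches block sizes
       halfway between neighbouring measured sizes.

```lua
       -- Illustrative helpers, not part of MEPACK.
       -- Parse rows of the form "M MB Wall-Time CPU-Time Ratio Forward-Err"
       -- and return a table mapping each M to the MB with minimal wall time.
       function best_block_sizes(text)
           local best = {}   -- best[m] = {mb = ..., time = ...}
           for line in text:gmatch("[^\n]+") do
               -- Lines starting with '#' or '$' do not match this pattern.
               local m, mb, wall = line:match("^%s*(%d+)%s+(%d+)%s+(%S+)")
               if m then
                   m, mb, wall = tonumber(m), tonumber(mb), tonumber(wall)
                   if best[m] == nil or wall < best[m].time then
                       best[m] = {mb = mb, time = wall}
                   end
               end
           end
           return best
       end

       -- Build a lookup function that uses the optimum of the nearest
       -- measured M, i.e. it switches halfway between neighbouring sizes.
       function make_mb_function(best)
           local sizes = {}
           for m in pairs(best) do sizes[#sizes + 1] = m end
           assert(#sizes > 0, "no benchmark rows found")
           table.sort(sizes)
           return function(m)
               for i = 1, #sizes - 1 do
                   if m < (sizes[i] + sizes[i + 1]) / 2 then
                       return best[sizes[i]].mb
                   end
               end
               return best[sizes[#sizes]].mb
           end
       end
```

       Applied to the complete benchmark output above, the midpoints are
       1500, 2500, 3500 and 4500, and the returned function gives the same
       block sizes as the hand-written trlyap_double_mb, which makes it a
       convenient cross-check when writing such functions by hand.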

Predefined Configuration Files

       We provide some preconfigured tuning files in the src/config/
       directory. Currently, they are available for the following systems:

       • Dual Socket Intel Xeon Silver Edition 4110 (2x 8 cores), Intel
         Parallel Studio 2018 (icc, ifort, mkl)
         (src/config/intel-xeon-silver-4110-parallel-studio-2018.lua)

       • Dual Socket Intel Xeon Haswell E5-2640 v3 (2x 8 cores), Intel
         Parallel Studio 2018 (icc, ifort, mkl)
         (src/config/intel-xeon-e5-2640v3-parallel-studio-2018.lua)

       • Dual Socket IBM Power8 (2x 10 cores), IBM XLC 16.1, IBM XLF 16.1, IBM
         ESSL 6.3 (src/config/ibm-power8-xlf-essl.lua)

Solver Selection

       MEPACK provides a large set of level-3, level-2, and DAG-accelerated
       solvers. By default, the level-3 solvers use the level-2 solvers with
       aligned local copies, and the DAG-accelerated solvers use the same
       selection. In the routines for equations with general coefficient
       matrices, the level-3 triangular solver is the default. All of these
       choices can be changed with the routines from Options - Level 2
       Solvers and Options - Frontend Solvers. If the GNU Fortran compiler is
       used, the level-2 solvers with aligned local copies are no longer
       beneficial. In this case, selecting the level-2 solvers with local
       copies but without alignment achieves the same or, in some cases,
       better performance. For this reason, MEPACK includes all the
       differently optimized solvers, so that different solver approaches can
       easily be evaluated on newly emerging hardware platforms to find the
       best performance.

Recommendations

       Since MEPACK offers many different ways to solve a given equation,
       the following basic rules indicate which solver type is beneficial in
       which situation. In all cases, the dimensions m and n refer to the
       size of the right-hand side.

       • If the problem is small, e.g., m,n <= 128, use a level-2 solver.

       • If the problem is small to medium-sized, e.g., m,n <= 1000, use a
         level-3 solver.

       • If the problem is large, e.g., m,n = 1000 ... 5000, and only a few
         CPU cores are available, use a level-3 solver as well.

       • If the problem is large, e.g., m,n = 1000 ... 5000, and many CPU
         cores are available, use a DAG-accelerated solver.

       • If the problem is huge, e.g., m,n >= 5000, use the 2-stage solver.

       These recommendations reflect only the author's personal experience
       on the systems available during development. The situation can change
       dramatically on different hardware.
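
       The rules of thumb above can be condensed into a small decision
       function, for example to pick candidates for one's own benchmarks. This
       is only a sketch: the name suggest_solver, the returned labels and the
       many_cores flag are illustrative assumptions, since the text does not
       define what 'few' or 'many' cores means. The thresholds are applied to
       the larger of m and n (an interpretation), and the boundary value 5000,
       which two rules mention, is assigned to the 2-stage solver.

```lua
       -- Hypothetical helper encoding the rules of thumb; not a MEPACK API.
       -- m, n: dimensions of the right-hand side (pass n = m for symmetric
       -- equations); many_cores: caller's judgement of available parallelism.
       function suggest_solver(m, n, many_cores)
           local d = math.max(m, n)
           if d <= 128 then
               return "level-2"
           elseif d <= 1000 then
               return "level-3"
           elseif d < 5000 then
               if many_cores then
                   return "DAG"
               end
               return "level-3"
           else
               return "2-stage"
           end
       end
```

       As stated above, the thresholds are hardware dependent; measurements
       on the target system should take precedence over this heuristic.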

Version 1.1.0                   Wed Oct 18 2023                      tuning(3)