tuning(3)                          MEPACK                          tuning(3)

tuning - Tuning

MEPACK can be tuned at runtime. All computation-related parameters can
be adjusted at runtime; see Options and Settings to tune MEPACK for
details. Alternatively, a configuration file can be provided. This
configuration file is a Lua script that contains a set of functions.
Each function takes the problem dimension as input and returns the
optimal value of the corresponding parameter.

By default, the configuration file src/default_config.lua is compiled
into the library. A user-supplied configuration file can be used in
two ways:

• Set the environment variable MEPACK_LUA_CONFIG to the path of an
alternative configuration file.

• Set the -DMEPACK_LUA_CONFIG=PATH option at compile time, where PATH
is the path of the new configuration file.

The following template includes all parameters and the corresponding
functions currently used by MEPACK. All function arguments and return
values are integers. The input m is the number of rows of the solution
and the right-hand side. The input n is the number of columns of the
solution and the right-hand side. If the algorithm solves a 'symmetric
equation', like the Lyapunov or the Stein equation, only m is used. The
parameters only affect the solvers with (quasi) triangular coefficient
matrices; the solvers for equations with general coefficient matrices
rely on these triangular solvers.

--
-- Outer block size for the two stage schemes. These values are used by the
-- _2STAGE routines.
--

-- TGCSYLV in double precision
function tgcsylv_double_2stage(m,n)
    return 1024
end

-- TGCSYLV_DUAL in double precision
function tgcsylv_dual_double_2stage(m,n)
    return 1024
end

-- TGCSYLV in single precision
function tgcsylv_single_2stage(m,n)
    return 1024
end

-- TGCSYLV_DUAL in single precision
function tgcsylv_dual_single_2stage(m,n)
    return 1024
end

-- TGSYLV in double precision
function tgsylv_double_2stage(m,n)
    return 1024
end

-- TGSYLV in single precision
function tgsylv_single_2stage(m,n)
    return 1024
end

-- TRSYLV in double precision
function trsylv_double_2stage(m,n)
    return 1024
end

-- TRSYLV in single precision
function trsylv_single_2stage(m,n)
    return 1024
end

-- TRSYLV2 in double precision
function trsylv2_double_2stage(m,n)
    return 1024
end

-- TRSYLV2 in single precision
function trsylv2_single_2stage(m,n)
    return 1024
end

-- TGLYAP for double precision
function tglyap_double_2stage(m)
    return 1024
end

-- TGLYAP for single precision
function tglyap_single_2stage(m)
    return 1024
end

-- TRLYAP for double precision
function trlyap_double_2stage(m)
    return 1024
end

-- TRLYAP for single precision
function trlyap_single_2stage(m)
    return 1024
end

-- TGSTEIN for double precision
function tgstein_double_2stage(m)
    return 1024
end

-- TGSTEIN for single precision
function tgstein_single_2stage(m)
    return 1024
end

--
-- Block sizes for the Level-3 and DAG schemes. These values are used by
-- routines with the names '_L3' or '_DAG'. The function with the suffix '_mb'
-- returns the block size with respect to the number of rows of the solution.
-- The function suffixed with '_nb' returns the block size with respect to the
-- number of columns of the solution. In case of symmetric equations, like
-- the Lyapunov or the Stein equation, only the function with the suffix '_mb'
-- is used.

-- MB for TGCSYLV in double precision
function tgcsylv_double_mb(m,n)
    return 32
end

-- NB for TGCSYLV in double precision
function tgcsylv_double_nb(m,n)
    return 32
end

-- MB for TGCSYLV_DUAL in double precision
function tgcsylv_dual_double_mb(m,n)
    return 32
end

-- NB for TGCSYLV_DUAL in double precision
function tgcsylv_dual_double_nb(m,n)
    return 32
end

-- MB for TGCSYLV in single precision
function tgcsylv_single_mb(m,n)
    return 32
end

-- NB for TGCSYLV in single precision
function tgcsylv_single_nb(m,n)
    return 32
end

-- MB for TGCSYLV_DUAL in single precision
function tgcsylv_dual_single_mb(m,n)
    return 32
end

-- NB for TGCSYLV_DUAL in single precision
function tgcsylv_dual_single_nb(m,n)
    return 32
end

-- MB for TGSYLV in double precision
function tgsylv_double_mb(m,n)
    return 32
end

-- NB for TGSYLV in double precision
function tgsylv_double_nb(m,n)
    return 32
end

-- MB for TGSYLV in single precision
function tgsylv_single_mb(m,n)
    return 32
end

-- NB for TGSYLV in single precision
function tgsylv_single_nb(m,n)
    return 32
end

-- MB for TRSYLV in double precision
function trsylv_double_mb(m,n)
    return 32
end

-- NB for TRSYLV in double precision
function trsylv_double_nb(m,n)
    return 32
end

-- MB for TRSYLV in single precision
function trsylv_single_mb(m,n)
    return 32
end

-- NB for TRSYLV in single precision
function trsylv_single_nb(m,n)
    return 32
end

-- MB for TRSYLV2 in double precision
function trsylv2_double_mb(m,n)
    return 32
end

-- NB for TRSYLV2 in double precision
function trsylv2_double_nb(m,n)
    return 32
end

-- MB for TRSYLV2 in single precision
function trsylv2_single_mb(m,n)
    return 32
end

-- NB for TRSYLV2 in single precision
function trsylv2_single_nb(m,n)
    return 32
end

-- MB for TGLYAP in double precision
function tglyap_double_mb(m)
    return 32
end

-- MB for TGLYAP in single precision
function tglyap_single_mb(m)
    return 32
end

-- MB for TRLYAP in double precision
function trlyap_double_mb(m)
    return 32
end

-- MB for TRLYAP in single precision
function trlyap_single_mb(m)
    return 32
end

-- MB for TGSTEIN in double precision
function tgstein_double_mb(m)
    return 32
end

-- MB for TGSTEIN in single precision
function tgstein_single_mb(m)
    return 32
end

-- MB for TRSTEIN in double precision
function trstein_double_mb(m)
    return 32
end

-- MB for TRSTEIN in single precision
function trstein_single_mb(m)
    return 32
end
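
The template returns constants, but because each function receives the
problem dimensions, its result may depend on them. The following minimal
sketch illustrates this for the two TRSYLV block-size functions. The
threshold 2000 and the block size 64 are placeholders chosen for
illustration, not measured values. Only language-level constructs are
used, so the sketch does not assume that any Lua standard library is
available to the configuration script.

```lua
-- Sketch only: the threshold 2000 and the value 64 are placeholders,
-- not measured results.

-- MB for TRSYLV in double precision: depends on the number of rows m
function trsylv_double_mb(m, n)
    if m >= 2000 then
        return 64
    end
    return 32
end

-- NB for TRSYLV in double precision: depends on the number of columns n
function trsylv_double_nb(m, n)
    if n >= 2000 then
        return 64
    end
    return 32
end
```

In a configuration file, such definitions would take the place of the
constant versions with the same names, since a later definition of a
global Lua function overrides an earlier one.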

Optimal block sizes can be obtained by running benchmarks. The
programs in examples/triangular can be used for this purpose. For
example, the optimal block size MB for the standard Lyapunov equation
can be obtained by executing

$ ./examples/triangular/benchmark_trlyap --solver=1 --rows=1000:1000:5000 --mb=32:32:128
# HDF5 Store Path: ./ (set with MEPACK_HDF_PATH)
# Command Line: --solver=1 --rows=1000:1000:5000 --mb=32:32:128
# RUNS: 5
# Number of Matrices: 1
# Rows: 1000 (1000:1000:5000)
# TRANSA: N
# Block Alignment: YES
# MACHINE DEFAULT Config:
# Solver: LEVEL3 - LEVEL2: LOCAL COPY fixed maximum size with alignment
#
#    M   MB    Wall-Time     CPU-Time        Ratio  Forward-Err
  1000   32  1.47723e-01  5.73458e-01  3.88198e+00  1.02451e-12
  1000   64  2.02979e-01  6.52629e-01  3.21526e+00  7.70460e-13
  1000   96  1.96380e-01  7.09709e-01  3.61397e+00  7.76942e-13
  1000  128  2.20266e-01  8.20464e-01  3.72487e+00  7.80125e-13
  2000   32  1.52258e+00  5.40316e+00  3.54869e+00  2.53205e-12
  2000   64  1.37894e+00  5.08727e+00  3.68926e+00  3.81413e-12
  2000   96  1.50218e+00  5.38654e+00  3.58582e+00  6.89304e-12
  2000  128  1.42162e+00  5.25848e+00  3.69895e+00  3.36752e-12
  3000   32  4.94669e+00  1.81289e+01  3.66485e+00  6.80014e-12
  3000   64  5.09770e+00  1.77916e+01  3.49011e+00  5.05035e-12
  3000   96  4.85457e+00  1.75745e+01  3.62020e+00  6.99595e-12
  3000  128  6.37335e+00  2.37100e+01  3.72018e+00  5.60129e-12
  4000   32  1.49584e+01  5.41659e+01  3.62110e+00  3.11764e-12
  4000   64  1.32034e+01  4.91697e+01  3.72402e+00  3.50054e-12
  4000   96  1.55753e+01  5.29380e+01  3.39884e+00  3.79209e-12
  4000  128  1.45271e+01  5.27828e+01  3.63339e+00  2.42395e-12
  5000   32  2.87573e+01  1.03002e+02  3.58176e+00  4.27677e-12
  5000   64  2.20910e+01  8.00796e+01  3.62499e+00  5.20105e-12
  5000   96  1.84365e+01  6.93392e+01  3.76097e+00  4.36991e-12
  5000  128  2.49911e+01  9.18636e+01  3.67585e+00  5.64129e-12

Selecting, for each m, the block size with the minimal wall-time from
the output and switching between these block sizes halfway between the
measured values of m gives the following trlyap_double_mb function on
an Intel Celeron N3450 with OpenBLAS 0.3.8 in pthread mode:

function trlyap_double_mb(m)
    if ( m < 1500 ) then
        return 32
    elseif ( 1500 <= m and m < 2500 ) then
        return 64
    elseif ( 2500 <= m and m < 3500 ) then
        return 96
    elseif ( 3500 <= m and m < 4500 ) then
        return 64
    else
        return 96
    end
end
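
The selection step can also be done mechanically. The following sketch
is meant to be run with a standalone Lua interpreter, not as part of a
MEPACK configuration file. It takes the (M, MB, Wall-Time) columns of
the benchmark output above and picks, for each m, the block size with
the smallest wall-time. The helper name best_block_sizes and the table
layout are assumptions made for this illustration.

```lua
-- Wall-times copied from the benchmark output above: { m, mb, wall_time }
local results = {
    {1000,  32, 1.47723e-01}, {1000,  64, 2.02979e-01},
    {1000,  96, 1.96380e-01}, {1000, 128, 2.20266e-01},
    {2000,  32, 1.52258e+00}, {2000,  64, 1.37894e+00},
    {2000,  96, 1.50218e+00}, {2000, 128, 1.42162e+00},
    {3000,  32, 4.94669e+00}, {3000,  64, 5.09770e+00},
    {3000,  96, 4.85457e+00}, {3000, 128, 6.37335e+00},
    {4000,  32, 1.49584e+01}, {4000,  64, 1.32034e+01},
    {4000,  96, 1.55753e+01}, {4000, 128, 1.45271e+01},
    {5000,  32, 2.87573e+01}, {5000,  64, 2.20910e+01},
    {5000,  96, 1.84365e+01}, {5000, 128, 2.49911e+01},
}

-- Hypothetical helper: map each m to the mb with the smallest wall-time.
local function best_block_sizes(rows)
    local best = {}
    for _, r in ipairs(rows) do
        local m, mb, t = r[1], r[2], r[3]
        if best[m] == nil or t < best[m].time then
            best[m] = { mb = mb, time = t }
        end
    end
    return best
end

local best = best_block_sizes(results)
for _, m in ipairs({1000, 2000, 3000, 4000, 5000}) do
    print(m, best[m].mb)
end
```

For the data above, this selects the block sizes 32, 64, 96, 64, and 96
for m = 1000, ..., 5000, which are the values returned by the
trlyap_double_mb function shown before.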

We provide some preconfigured tuning files in the src/config/
directory. At the moment, they are available for the following systems:

• Dual Socket Intel Xeon Silver Edition 4110 (2x 8 cores), Intel
Parallel Studio 2018 (icc, ifort, mkl)
(src/config/intel-xeon-silver-4110-parallel-studio-2018.lua)

• Dual Socket Intel Xeon Haswell E5-2640 v3 (2x 8 cores), Intel
Parallel Studio 2018 (icc, ifort, mkl)
(src/config/intel-xeon-e5-2640v3-parallel-studio-2018.lua)

• Dual Socket IBM Power8 (2x 10 cores), IBM XLC 16.1, IBM XLF 16.1, IBM
ESSL 6.3 (src/config/ibm-power8-xlf-essl.lua)

MEPACK provides a large set of level-3, level-2, and DAG accelerated
solvers. By default, the level-3 solvers use the level-2 solvers with
aligned local copies. The DAG accelerated solvers use the same
selection. In the routines for solving equations with general
coefficient matrices, the level-3 triangular solver is the default.
All of these defaults can be changed with the routines from Options -
Level 2 Solvers and Options - Frontend Solvers. If the GNU Fortran
compiler is used, the level-2 solvers with aligned local copies are no
longer beneficial. In this case, one can select the level-2 solvers
with local copies (but without alignment) to achieve the same or, in
some cases, better performance. For this reason, all the differently
optimized solvers are included in MEPACK, so that different solver
approaches can easily be evaluated on newly emerging hardware platforms
to find the best-performing one.

Since MEPACK offers many different ways to solve a given equation,
here are some basic rules for which solver type is beneficial in which
situation. In all cases, 'dimension' refers to the size of the
right-hand side.

• If the problem is small, e.g. m,n <= 128, use a level-2 solver.

• If the problem is small to medium sized, e.g. m,n <= 1000, use a
level-3 solver.

• If the problem is large, e.g. m,n = 1000 ... 5000, and only a few CPU
cores are available, use a level-3 solver as well.

• If the problem is large, e.g. m,n = 1000 ... 5000, and many CPU cores
are available, use a DAG accelerated solver.

• If the problem is huge, e.g. m,n >= 5000, use the 2-stage solver.

These recommendations reflect only the author's personal experience on
the systems available during development. The situation can change
dramatically on different hardware.
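
The rules of thumb above can be written down as a small decision
function. The following sketch is purely illustrative: suggest_solver is
a hypothetical helper and not part of MEPACK, the larger of m and n is
taken as the 'dimension' (an assumption), and the default of 8 cores as
the boundary for "many CPU cores" is an arbitrary placeholder. Where the
ranges in the list overlap (at 1000 and 5000), the sketch assigns 1000
to the level-3 range and 5000 to the 2-stage range.

```lua
-- Hypothetical helper, not part of MEPACK: maps the rules of thumb to a label.
-- m, n       : rows and columns of the right-hand side
-- cores      : number of available CPU cores
-- many_cores : assumed threshold for "many cores" (placeholder default: 8)
function suggest_solver(m, n, cores, many_cores)
    many_cores = many_cores or 8

    -- Assumption: use the larger dimension to classify the problem size.
    local dim = m
    if n > dim then
        dim = n
    end

    if dim <= 128 then
        return "level-2"
    elseif dim <= 1000 then
        return "level-3"
    elseif dim < 5000 then
        if cores >= many_cores then
            return "dag"
        end
        return "level-3"
    else
        return "2-stage"
    end
end
```

For example, suggest_solver(3000, 3000, 16) yields "dag", whereas
suggest_solver(3000, 3000, 4) yields "level-3". As noted above, such
thresholds should be checked by benchmarks on the target hardware.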


Version 1.1.0                   Wed Oct 18 2023                   tuning(3)