GMX-TUNE_PME(1)                     GROMACS                    GMX-TUNE_PME(1)



NAME
       gmx-tune_pme - Time mdrun as a function of PME ranks to optimize
       settings

SYNOPSIS
       gmx tune_pme [-s [<.tpr>]] [-cpi [<.cpt>]] [-table [<.xvg>]]
                    [-tablep [<.xvg>]] [-tableb [<.xvg>]]
                    [-rerun [<.xtc/.trr/...>]] [-ei [<.edi>]] [-p [<.out>]]
                    [-err [<.log>]] [-so [<.tpr>]] [-o [<.trr/.cpt/...>]]
                    [-x [<.xtc/.tng>]] [-cpo [<.cpt>]]
                    [-c [<.gro/.g96/...>]] [-e [<.edr>]] [-g [<.log>]]
                    [-dhdl [<.xvg>]] [-field [<.xvg>]] [-tpi [<.xvg>]]
                    [-tpid [<.xvg>]] [-eo [<.xvg>]] [-px [<.xvg>]]
                    [-pf [<.xvg>]] [-ro [<.xvg>]] [-ra [<.log>]]
                    [-rs [<.log>]] [-rt [<.log>]] [-mtx [<.mtx>]]
                    [-swap [<.xvg>]] [-bo [<.trr/.cpt/...>]] [-bx [<.xtc>]]
                    [-bcpo [<.cpt>]] [-bc [<.gro/.g96/...>]] [-be [<.edr>]]
                    [-bg [<.log>]] [-beo [<.xvg>]] [-bdhdl [<.xvg>]]
                    [-bfield [<.xvg>]] [-btpi [<.xvg>]] [-btpid [<.xvg>]]
                    [-bdevout [<.xvg>]] [-brunav [<.xvg>]] [-bpx [<.xvg>]]
                    [-bpf [<.xvg>]] [-bro [<.xvg>]] [-bra [<.log>]]
                    [-brs [<.log>]] [-brt [<.log>]] [-bmtx [<.mtx>]]
                    [-bdn [<.ndx>]] [-bswap [<.xvg>]] [-xvg <enum>]
                    [-mdrun <string>] [-np <int>] [-npstring <enum>]
                    [-ntmpi <int>] [-r <int>] [-max <real>] [-min <real>]
                    [-npme <enum>] [-fix <int>] [-rmax <real>]
                    [-rmin <real>] [-[no]scalevdw] [-ntpr <int>]
                    [-steps <int>] [-resetstep <int>] [-nsteps <int>]
                    [-[no]launch] [-[no]bench] [-[no]check]
                    [-gpu_id <string>] [-[no]append] [-[no]cpnum]
                    [-deffnm <string>]

DESCRIPTION
       For a given number -np or -ntmpi of ranks, gmx tune_pme systematically
       times gmx mdrun with various numbers of PME-only ranks and determines
       which setting is fastest. It will also test whether performance can
       be enhanced by shifting load from the reciprocal to the real space
       part of the Ewald sum. Simply pass your .tpr file to gmx tune_pme
       together with other options for gmx mdrun as needed.

       gmx tune_pme needs to call gmx mdrun, and so requires that you
       specify how to call mdrun via the argument to the -mdrun parameter.
       Depending on how you have built GROMACS, values such as 'gmx mdrun',
       'gmx_d mdrun', or 'gmx_mpi mdrun' might be needed.
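
       For example, with an MPI-enabled build (the rank count and input file
       name below are illustrative):

          gmx tune_pme -np 16 -s topol.tpr -mdrun 'gmx_mpi mdrun'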

       The program that runs MPI programs can be set in the environment
       variable MPIRUN (defaults to 'mpirun'). Note that for certain MPI
       frameworks, you need to provide a machine- or hostfile. This can also
       be passed via the MPIRUN variable, e.g.

          export MPIRUN="/usr/local/mpirun -machinefile hosts"

       Note that in such cases it is normally necessary to compile and/or
       run gmx tune_pme without MPI support, so that it can call the MPIRUN
       program.

       Before doing the actual benchmark runs, gmx tune_pme will do a quick
       check whether gmx mdrun works as expected with the provided parallel
       settings if the -check option is activated (the default). Please call
       gmx tune_pme with the normal options you would pass to gmx mdrun and
       add -np for the number of ranks to perform the tests on, or -ntmpi
       for the number of threads. You can also add -r to repeat each test
       several times to get better statistics.
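
       For instance, to run the tests on 32 ranks with three repeats of each
       benchmark (the rank count, repeat count, and file name are
       illustrative):

          gmx tune_pme -np 32 -r 3 -s topol.tpr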

       gmx tune_pme can test various real space / reciprocal space workloads
       for you. With -ntpr you control how many extra .tpr files will be
       written with enlarged cutoffs and smaller Fourier grids respectively.
       Typically, the first test (number 0) will be with the settings from
       the input .tpr file; the last test (number ntpr) will have the
       Coulomb cutoff specified by -rmax with a somewhat smaller PME grid at
       the same time. In this last test, the Fourier spacing is multiplied
       by rmax/rcoulomb. The remaining .tpr files will have equally-spaced
       Coulomb radii (and Fourier spacings) between these extremes. Note
       that you can set -ntpr to 1 if you just seek the optimal number of
       PME-only ranks; in that case your input .tpr file will remain
       unchanged.
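
       For example, to write and benchmark four .tpr files with Coulomb
       cutoffs scanned between 1.0 nm and 1.2 nm (all values are
       illustrative):

          gmx tune_pme -np 64 -s topol.tpr -ntpr 4 -rmin 1.0 -rmax 1.2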

       For the benchmark runs, the default of 1000 time steps should suffice
       for most MD systems. The dynamic load balancing needs about 100 time
       steps to adapt to local load imbalances, therefore the time step
       counters are by default reset after 100 steps. For large systems
       (>1M atoms), as well as for a higher accuracy of the measurements,
       you should set -resetstep to a higher value. From the 'DD' load
       imbalance entries in the md.log output file you can tell after how
       many steps the load is sufficiently balanced. Example call:

          gmx tune_pme -np 64 -s protein.tpr -launch

       After calling gmx mdrun several times, detailed performance
       information is available in the output file perf.out. Note that
       during the benchmarks, a couple of temporary files are written
       (options -b*); these will be automatically deleted after each test.

       If you want the simulation to be started automatically with the
       optimized parameters, use the command line option -launch.

       Basic support for GPU-enabled mdrun exists. Give a string containing
       the IDs of the GPUs that you wish to use in the optimization in the
       -gpu_id command-line argument. This works exactly like mdrun -gpu_id:
       it does not imply a mapping, and merely declares the eligible set of
       GPU devices. gmx tune_pme will construct calls to mdrun that use this
       set appropriately. gmx tune_pme does not support -gputasks.
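
       For example, to make the first two GPUs of a node eligible for the
       benchmarks (the device IDs and other values are illustrative):

          gmx tune_pme -np 8 -s topol.tpr -gpu_id 01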

OPTIONS
       Options to specify input files:

       -s [<.tpr>] (topol.tpr)
           Portable xdr run input file

       -cpi [<.cpt>] (state.cpt) (Optional)
           Checkpoint file

       -table [<.xvg>] (table.xvg) (Optional)
           xvgr/xmgr file

       -tablep [<.xvg>] (tablep.xvg) (Optional)
           xvgr/xmgr file

       -tableb [<.xvg>] (table.xvg) (Optional)
           xvgr/xmgr file

       -rerun [<.xtc/.trr/...>] (rerun.xtc) (Optional)
           Trajectory: xtc trr cpt gro g96 pdb tng

       -ei [<.edi>] (sam.edi) (Optional)
           ED sampling input

       Options to specify output files:

       -p [<.out>] (perf.out)
           Generic output file

       -err [<.log>] (bencherr.log)
           Log file

       -so [<.tpr>] (tuned.tpr)
           Portable xdr run input file

       -o [<.trr/.cpt/...>] (traj.trr)
           Full precision trajectory: trr cpt tng

       -x [<.xtc/.tng>] (traj_comp.xtc) (Optional)
           Compressed trajectory (tng format or portable xdr format)

       -cpo [<.cpt>] (state.cpt) (Optional)
           Checkpoint file

       -c [<.gro/.g96/...>] (confout.gro)
           Structure file: gro g96 pdb brk ent esp

       -e [<.edr>] (ener.edr)
           Energy file

       -g [<.log>] (md.log)
           Log file

       -dhdl [<.xvg>] (dhdl.xvg) (Optional)
           xvgr/xmgr file

       -field [<.xvg>] (field.xvg) (Optional)
           xvgr/xmgr file

       -tpi [<.xvg>] (tpi.xvg) (Optional)
           xvgr/xmgr file

       -tpid [<.xvg>] (tpidist.xvg) (Optional)
           xvgr/xmgr file

       -eo [<.xvg>] (edsam.xvg) (Optional)
           xvgr/xmgr file

       -px [<.xvg>] (pullx.xvg) (Optional)
           xvgr/xmgr file

       -pf [<.xvg>] (pullf.xvg) (Optional)
           xvgr/xmgr file

       -ro [<.xvg>] (rotation.xvg) (Optional)
           xvgr/xmgr file

       -ra [<.log>] (rotangles.log) (Optional)
           Log file

       -rs [<.log>] (rotslabs.log) (Optional)
           Log file

       -rt [<.log>] (rottorque.log) (Optional)
           Log file

       -mtx [<.mtx>] (nm.mtx) (Optional)
           Hessian matrix

       -swap [<.xvg>] (swapions.xvg) (Optional)
           xvgr/xmgr file

       -bo [<.trr/.cpt/...>] (bench.trr)
           Full precision trajectory: trr cpt tng

       -bx [<.xtc>] (bench.xtc)
           Compressed trajectory (portable xdr format): xtc

       -bcpo [<.cpt>] (bench.cpt)
           Checkpoint file

       -bc [<.gro/.g96/...>] (bench.gro)
           Structure file: gro g96 pdb brk ent esp

       -be [<.edr>] (bench.edr)
           Energy file

       -bg [<.log>] (bench.log)
           Log file

       -beo [<.xvg>] (benchedo.xvg) (Optional)
           xvgr/xmgr file

       -bdhdl [<.xvg>] (benchdhdl.xvg) (Optional)
           xvgr/xmgr file

       -bfield [<.xvg>] (benchfld.xvg) (Optional)
           xvgr/xmgr file

       -btpi [<.xvg>] (benchtpi.xvg) (Optional)
           xvgr/xmgr file

       -btpid [<.xvg>] (benchtpid.xvg) (Optional)
           xvgr/xmgr file

       -bdevout [<.xvg>] (benchdev.xvg) (Optional)
           xvgr/xmgr file

       -brunav [<.xvg>] (benchrnav.xvg) (Optional)
           xvgr/xmgr file

       -bpx [<.xvg>] (benchpx.xvg) (Optional)
           xvgr/xmgr file

       -bpf [<.xvg>] (benchpf.xvg) (Optional)
           xvgr/xmgr file

       -bro [<.xvg>] (benchrot.xvg) (Optional)
           xvgr/xmgr file

       -bra [<.log>] (benchrota.log) (Optional)
           Log file

       -brs [<.log>] (benchrots.log) (Optional)
           Log file

       -brt [<.log>] (benchrott.log) (Optional)
           Log file

       -bmtx [<.mtx>] (benchn.mtx) (Optional)
           Hessian matrix

       -bdn [<.ndx>] (bench.ndx) (Optional)
           Index file

       -bswap [<.xvg>] (benchswp.xvg) (Optional)
           xvgr/xmgr file

       Other options:

       -xvg <enum> (xmgrace)
           xvg plot formatting: xmgrace, xmgr, none

       -mdrun <string>
           Command line to run a simulation, e.g. 'gmx mdrun' or
           'gmx_mpi mdrun'

       -np <int> (1)
           Number of ranks to run the tests on (must be > 2 for separate
           PME ranks)

       -npstring <enum> (np)
           Name of the $MPIRUN option that specifies the number of ranks
           to use ('np', or 'n'; use 'none' if there is no such option):
           np, n, none

       -ntmpi <int> (1)
           Number of MPI-threads to run the tests on (turns MPI & mpirun
           off)

       -r <int> (2)
           Repeat each test this often

       -max <real> (0.5)
           Max fraction of PME ranks to test with

       -min <real> (0.25)
           Min fraction of PME ranks to test with

       -npme <enum> (auto)
           Within -min and -max, benchmark all possible values for -npme,
           or just a reasonable subset. Auto neglects -min and -max and
           chooses reasonable values around a guess for npme derived from
           the .tpr: auto, all, subset

       -fix <int> (-2)
           If >= -1, do not vary the number of PME-only ranks; instead use
           this fixed value and only vary rcoulomb and the PME grid
           spacing.

       -rmax <real> (0)
           If >0, maximal rcoulomb for -ntpr>1 (rcoulomb upscaling results
           in Fourier grid downscaling)

       -rmin <real> (0)
           If >0, minimal rcoulomb for -ntpr>1

       -[no]scalevdw (yes)
           Scale rvdw along with rcoulomb

       -ntpr <int> (0)
           Number of .tpr files to benchmark. Create this many files with
           different rcoulomb scaling factors depending on -rmin and
           -rmax. If < 1, automatically choose the number of .tpr files to
           test

       -steps <int> (1000)
           Take timings for this many steps in the benchmark runs

       -resetstep <int> (1500)
           Let dlb equilibrate this many steps before timings are taken
           (reset cycle counters after this many steps)

       -nsteps <int> (-1)
           If non-negative, perform this many steps in the real run
           (overwrites nsteps from .tpr, adds .cpt steps)

       -[no]launch (no)
           Launch the real simulation after optimization

       -[no]bench (yes)
           Run the benchmarks or just create the input .tpr files?

       -[no]check (yes)
           Before the benchmark runs, check whether mdrun works in
           parallel

       -gpu_id <string>
           List of unique GPU device IDs that are eligible for use

       -[no]append (yes)
           Append to previous output files when continuing from checkpoint
           instead of adding the simulation part number to all file names
           (for launch only)

       -[no]cpnum (no)
           Keep and number checkpoint files (launch only)

       -deffnm <string>
           Set the default filenames (launch only)

SEE ALSO
       gmx(1)

       More information about GROMACS is available at
       <http://www.gromacs.org/>.

COPYRIGHT
       2022, GROMACS development team




2022.2                           Jun 16, 2022                  GMX-TUNE_PME(1)