GMX-TUNE_PME(1)                    GROMACS                   GMX-TUNE_PME(1)

NAME
       gmx-tune_pme - Time mdrun as a function of PME ranks to optimize
       settings

SYNOPSIS
       gmx tune_pme [-s [<.tpr>]] [-cpi [<.cpt>]] [-table [<.xvg>]]
                    [-tablep [<.xvg>]] [-tableb [<.xvg>]]
                    [-rerun [<.xtc/.trr/...>]] [-ei [<.edi>]] [-p [<.out>]]
                    [-err [<.log>]] [-so [<.tpr>]] [-o [<.trr/.cpt/...>]]
                    [-x [<.xtc/.tng>]] [-cpo [<.cpt>]]
                    [-c [<.gro/.g96/...>]] [-e [<.edr>]] [-g [<.log>]]
                    [-dhdl [<.xvg>]] [-field [<.xvg>]] [-tpi [<.xvg>]]
                    [-tpid [<.xvg>]] [-eo [<.xvg>]] [-devout [<.xvg>]]
                    [-runav [<.xvg>]] [-px [<.xvg>]] [-pf [<.xvg>]]
                    [-ro [<.xvg>]] [-ra [<.log>]] [-rs [<.log>]]
                    [-rt [<.log>]] [-mtx [<.mtx>]] [-swap [<.xvg>]]
                    [-bo [<.trr/.cpt/...>]] [-bx [<.xtc>]] [-bcpo [<.cpt>]]
                    [-bc [<.gro/.g96/...>]] [-be [<.edr>]] [-bg [<.log>]]
                    [-beo [<.xvg>]] [-bdhdl [<.xvg>]] [-bfield [<.xvg>]]
                    [-btpi [<.xvg>]] [-btpid [<.xvg>]] [-bdevout [<.xvg>]]
                    [-brunav [<.xvg>]] [-bpx [<.xvg>]] [-bpf [<.xvg>]]
                    [-bro [<.xvg>]] [-bra [<.log>]] [-brs [<.log>]]
                    [-brt [<.log>]] [-bmtx [<.mtx>]] [-bdn [<.ndx>]]
                    [-bswap [<.xvg>]] [-xvg <enum>] [-mdrun <string>]
                    [-np <int>] [-npstring <enum>] [-ntmpi <int>] [-r <int>]
                    [-max <real>] [-min <real>] [-npme <enum>] [-fix <int>]
                    [-rmax <real>] [-rmin <real>] [-[no]scalevdw]
                    [-ntpr <int>] [-steps <int>] [-resetstep <int>]
                    [-nsteps <int>] [-[no]launch] [-[no]bench] [-[no]check]
                    [-gpu_id <string>] [-[no]append] [-[no]cpnum]
                    [-deffnm <string>]

DESCRIPTION
       For a given number -np or -ntmpi of ranks, gmx tune_pme
       systematically times gmx mdrun with various numbers of PME-only
       ranks and determines which setting is fastest. It will also test
       whether performance can be enhanced by shifting load from the
       reciprocal to the real space part of the Ewald sum. Simply pass
       your .tpr file to gmx tune_pme together with other options for
       gmx mdrun as needed.

       gmx tune_pme needs to call gmx mdrun and so requires that you
       specify how to call mdrun with the argument to the -mdrun
       parameter. Depending on how you have built GROMACS, values such
       as ‘gmx mdrun’, ‘gmx_d mdrun’, or ‘mdrun_mpi’ might be needed.

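       For example, with a double-precision build the call might look
       as follows (a minimal sketch; the binary name depends on how
       your GROMACS was built and installed):

           gmx tune_pme -mdrun 'gmx_d mdrun' -np 8 -s topol.tpr
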
       The program that runs MPI programs can be set in the environment
       variable MPIRUN (defaults to ‘mpirun’). Note that for certain
       MPI frameworks, you need to provide a machine- or hostfile. This
       can also be passed via the MPIRUN variable, e.g.

           export MPIRUN="/usr/local/mpirun -machinefile hosts"

       Note that in such cases it is normally necessary to compile
       and/or run gmx tune_pme without MPI support, so that it can call
       the MPIRUN program.

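       Under these assumptions, a complete tuning call on 16 ranks
       might then look like this (a sketch; the host file ‘hosts’ and
       the binary name ‘mdrun_mpi’ are placeholders for your own
       setup):

           export MPIRUN="/usr/local/mpirun -machinefile hosts"
           gmx tune_pme -np 16 -mdrun mdrun_mpi -s topol.tpr
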
       Before doing the actual benchmark runs, gmx tune_pme will do a
       quick check whether gmx mdrun works as expected with the
       provided parallel settings if the -check option is activated
       (the default). Please call gmx tune_pme with the normal options
       you would pass to gmx mdrun and add -np for the number of ranks
       to perform the tests on, or -ntmpi for the number of threads.
       You can also add -r to repeat each test several times to get
       better statistics.

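       For instance, to repeat each benchmark three times on 8 ranks
       (the values here are purely illustrative):

           gmx tune_pme -np 8 -r 3 -s topol.tpr
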
       gmx tune_pme can test various real space / reciprocal space
       workloads for you. With -ntpr you control how many extra .tpr
       files will be written with enlarged cutoffs and correspondingly
       smaller Fourier grids. Typically, the first test (number 0) will
       be with the settings from the input .tpr file; the last test
       (number ntpr) will have the Coulomb cutoff specified by -rmax
       with a somewhat smaller PME grid at the same time. In this last
       test, the Fourier spacing is multiplied by rmax/rcoulomb. The
       remaining .tpr files will have equally-spaced Coulomb radii (and
       Fourier spacings) between these extremes. Note that you can set
       -ntpr to 1 if you just seek the optimal number of PME-only
       ranks; in that case your input .tpr file will remain unchanged.

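       As an illustration, the following writes and benchmarks three
       .tpr files with Coulomb cutoffs between the input value and an
       upper limit of 1.2 nm (all values here are hypothetical):

           gmx tune_pme -np 8 -s topol.tpr -ntpr 3 -rmax 1.2
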
       For the benchmark runs, the default of 1000 time steps should
       suffice for most MD systems. The dynamic load balancing needs
       about 100 time steps to adapt to local load imbalances,
       therefore the time step counters are by default reset after the
       number of steps given by -resetstep. For large systems (>1M
       atoms), as well as for a higher accuracy of the measurements,
       you should set -resetstep to a higher value. From the ‘DD’ load
       imbalance entries in the md.log output file you can tell after
       how many steps the load is sufficiently balanced. Example call:

           gmx tune_pme -np 64 -s protein.tpr -launch

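       For a large system you might, for instance, allow more
       equilibration time before the counters are reset and time a
       longer interval (the step counts here are illustrative):

           gmx tune_pme -np 64 -s protein.tpr -resetstep 3000 -steps 5000
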
       After calling gmx mdrun several times, detailed performance
       information is available in the output file perf.out. Note that
       during the benchmarks, a couple of temporary files are written
       (options -b*); these will be automatically deleted after each
       test.

       If you want the simulation to be started automatically with the
       optimized parameters, use the command line option -launch.

       Basic support for GPU-enabled mdrun exists. Give a string
       containing the IDs of the GPUs that you wish to use in the
       optimization in the -gpu_id command-line argument. This works
       exactly like mdrun -gpu_id, does not imply a mapping, and merely
       declares the eligible set of GPU devices. gmx tune_pme will
       construct calls to mdrun that use this set appropriately.
       gmx tune_pme does not support -gputasks.

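       For example, to make the GPUs with device IDs 0 and 1 eligible
       during tuning (the IDs are illustrative):

           gmx tune_pme -np 4 -s topol.tpr -gpu_id 01
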
OPTIONS
       Options to specify input files:

       -s [<.tpr>] (topol.tpr)
              Portable xdr run input file

       -cpi [<.cpt>] (state.cpt) (Optional)
              Checkpoint file

       -table [<.xvg>] (table.xvg) (Optional)
              xvgr/xmgr file

       -tablep [<.xvg>] (tablep.xvg) (Optional)
              xvgr/xmgr file

       -tableb [<.xvg>] (table.xvg) (Optional)
              xvgr/xmgr file

       -rerun [<.xtc/.trr/...>] (rerun.xtc) (Optional)
              Trajectory: xtc trr cpt gro g96 pdb tng

       -ei [<.edi>] (sam.edi) (Optional)
              ED sampling input

       Options to specify output files:

       -p [<.out>] (perf.out)
              Generic output file

       -err [<.log>] (bencherr.log)
              Log file

       -so [<.tpr>] (tuned.tpr)
              Portable xdr run input file

       -o [<.trr/.cpt/...>] (traj.trr)
              Full precision trajectory: trr cpt tng

       -x [<.xtc/.tng>] (traj_comp.xtc) (Optional)
              Compressed trajectory (tng format or portable xdr format)

       -cpo [<.cpt>] (state.cpt) (Optional)
              Checkpoint file

       -c [<.gro/.g96/...>] (confout.gro)
              Structure file: gro g96 pdb brk ent esp

       -e [<.edr>] (ener.edr)
              Energy file

       -g [<.log>] (md.log)
              Log file

       -dhdl [<.xvg>] (dhdl.xvg) (Optional)
              xvgr/xmgr file

       -field [<.xvg>] (field.xvg) (Optional)
              xvgr/xmgr file

       -tpi [<.xvg>] (tpi.xvg) (Optional)
              xvgr/xmgr file

       -tpid [<.xvg>] (tpidist.xvg) (Optional)
              xvgr/xmgr file

       -eo [<.xvg>] (edsam.xvg) (Optional)
              xvgr/xmgr file

       -devout [<.xvg>] (deviatie.xvg) (Optional)
              xvgr/xmgr file

       -runav [<.xvg>] (runaver.xvg) (Optional)
              xvgr/xmgr file

       -px [<.xvg>] (pullx.xvg) (Optional)
              xvgr/xmgr file

       -pf [<.xvg>] (pullf.xvg) (Optional)
              xvgr/xmgr file

       -ro [<.xvg>] (rotation.xvg) (Optional)
              xvgr/xmgr file

       -ra [<.log>] (rotangles.log) (Optional)
              Log file

       -rs [<.log>] (rotslabs.log) (Optional)
              Log file

       -rt [<.log>] (rottorque.log) (Optional)
              Log file

       -mtx [<.mtx>] (nm.mtx) (Optional)
              Hessian matrix

       -swap [<.xvg>] (swapions.xvg) (Optional)
              xvgr/xmgr file

       -bo [<.trr/.cpt/...>] (bench.trr)
              Full precision trajectory: trr cpt tng

       -bx [<.xtc>] (bench.xtc)
              Compressed trajectory (portable xdr format): xtc

       -bcpo [<.cpt>] (bench.cpt)
              Checkpoint file

       -bc [<.gro/.g96/...>] (bench.gro)
              Structure file: gro g96 pdb brk ent esp

       -be [<.edr>] (bench.edr)
              Energy file

       -bg [<.log>] (bench.log)
              Log file

       -beo [<.xvg>] (benchedo.xvg) (Optional)
              xvgr/xmgr file

       -bdhdl [<.xvg>] (benchdhdl.xvg) (Optional)
              xvgr/xmgr file

       -bfield [<.xvg>] (benchfld.xvg) (Optional)
              xvgr/xmgr file

       -btpi [<.xvg>] (benchtpi.xvg) (Optional)
              xvgr/xmgr file

       -btpid [<.xvg>] (benchtpid.xvg) (Optional)
              xvgr/xmgr file

       -bdevout [<.xvg>] (benchdev.xvg) (Optional)
              xvgr/xmgr file

       -brunav [<.xvg>] (benchrnav.xvg) (Optional)
              xvgr/xmgr file

       -bpx [<.xvg>] (benchpx.xvg) (Optional)
              xvgr/xmgr file

       -bpf [<.xvg>] (benchpf.xvg) (Optional)
              xvgr/xmgr file

       -bro [<.xvg>] (benchrot.xvg) (Optional)
              xvgr/xmgr file

       -bra [<.log>] (benchrota.log) (Optional)
              Log file

       -brs [<.log>] (benchrots.log) (Optional)
              Log file

       -brt [<.log>] (benchrott.log) (Optional)
              Log file

       -bmtx [<.mtx>] (benchn.mtx) (Optional)
              Hessian matrix

       -bdn [<.ndx>] (bench.ndx) (Optional)
              Index file

       -bswap [<.xvg>] (benchswp.xvg) (Optional)
              xvgr/xmgr file

       Other options:

       -xvg <enum> (xmgrace)
              xvg plot formatting: xmgrace, xmgr, none

       -mdrun <string>
              Command line to run a simulation, e.g. ‘gmx mdrun’ or
              ‘mdrun_mpi’

       -np <int> (1)
              Number of ranks to run the tests on (must be > 2 for
              separate PME ranks)

       -npstring <enum> (np)
              Name of the $MPIRUN option that specifies the number of
              ranks to use (‘np’, or ‘n’; use ‘none’ if there is no
              such option): np, n, none

       -ntmpi <int> (1)
              Number of MPI-threads to run the tests on (turns MPI &
              mpirun off)

       -r <int> (2)
              Repeat each test this often

       -max <real> (0.5)
              Max fraction of PME ranks to test with

       -min <real> (0.25)
              Min fraction of PME ranks to test with

       -npme <enum> (auto)
              Within -min and -max, benchmark all possible values for
              -npme, or just a reasonable subset. Auto neglects -min
              and -max and chooses reasonable values around a guess for
              npme derived from the .tpr file (see the example after
              this list): auto, all, subset

       -fix <int> (-2)
              If >= -1, do not vary the number of PME-only ranks;
              instead, use this fixed value and vary only rcoulomb and
              the PME grid spacing.

       -rmax <real> (0)
              If >0, maximal rcoulomb for -ntpr>1 (rcoulomb upscaling
              results in fourier grid downscaling)

       -rmin <real> (0)
              If >0, minimal rcoulomb for -ntpr>1

       -[no]scalevdw (yes)
              Scale rvdw along with rcoulomb

       -ntpr <int> (0)
              Number of .tpr files to benchmark. Create this many files
              with different rcoulomb scaling factors depending on
              -rmin and -rmax. If < 1, automatically choose the number
              of .tpr files to test

       -steps <int> (1000)
              Take timings for this many steps in the benchmark runs

       -resetstep <int> (1500)
              Let dlb equilibrate this many steps before timings are
              taken (reset cycle counters after this many steps)

       -nsteps <int> (-1)
              If non-negative, perform this many steps in the real run
              (overwrites nsteps from the .tpr file; steps from a .cpt
              file are added)

       -[no]launch (no)
              Launch the real simulation after optimization

       -[no]bench (yes)
              Run the benchmarks or just create the input .tpr files?

       -[no]check (yes)
              Before the benchmark runs, check whether mdrun works in
              parallel

       -gpu_id <string>
              List of unique GPU device IDs that are eligible for use

       -[no]append (yes)
              Append to previous output files when continuing from
              checkpoint instead of adding the simulation part number
              to all file names (for launch only)

       -[no]cpnum (no)
              Keep and number checkpoint files (launch only)

       -deffnm <string>
              Set the default filenames (launch only)

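       Combining these options, one might for example benchmark every
       PME-rank count within a narrow fraction window, or fix the
       PME-rank count entirely and scan only cutoff/grid combinations
       (both calls below are sketches with illustrative values):

           gmx tune_pme -np 32 -s topol.tpr -npme all -min 0.2 -max 0.4
           gmx tune_pme -np 32 -s topol.tpr -fix 8 -ntpr 4 -rmax 1.2
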
SEE ALSO
       gmx(1)

       More information about GROMACS is available at
       <http://www.gromacs.org/>.

COPYRIGHT
       2019, GROMACS development team

2019.4                          Oct 02, 2019                 GMX-TUNE_PME(1)