GMX-TUNE_PME(1)                     GROMACS                    GMX-TUNE_PME(1)

NAME

       gmx-tune_pme - Time mdrun as a function of PME ranks to optimize settings

SYNOPSIS

          gmx tune_pme [-s [<.tpr>]] [-cpi [<.cpt>]] [-table [<.xvg>]]
                       [-tablep [<.xvg>]] [-tableb [<.xvg>]]
                       [-rerun [<.xtc/.trr/...>]] [-ei [<.edi>]] [-p [<.out>]]
                       [-err [<.log>]] [-so [<.tpr>]] [-o [<.trr/.cpt/...>]]
                       [-x [<.xtc/.tng>]] [-cpo [<.cpt>]]
                       [-c [<.gro/.g96/...>]] [-e [<.edr>]] [-g [<.log>]]
                       [-dhdl [<.xvg>]] [-field [<.xvg>]] [-tpi [<.xvg>]]
                       [-tpid [<.xvg>]] [-eo [<.xvg>]] [-devout [<.xvg>]]
                       [-runav [<.xvg>]] [-px [<.xvg>]] [-pf [<.xvg>]]
                       [-ro [<.xvg>]] [-ra [<.log>]] [-rs [<.log>]]
                       [-rt [<.log>]] [-mtx [<.mtx>]] [-swap [<.xvg>]]
                       [-bo [<.trr/.cpt/...>]] [-bx [<.xtc>]] [-bcpo [<.cpt>]]
                       [-bc [<.gro/.g96/...>]] [-be [<.edr>]] [-bg [<.log>]]
                       [-beo [<.xvg>]] [-bdhdl [<.xvg>]] [-bfield [<.xvg>]]
                       [-btpi [<.xvg>]] [-btpid [<.xvg>]] [-bdevout [<.xvg>]]
                       [-brunav [<.xvg>]] [-bpx [<.xvg>]] [-bpf [<.xvg>]]
                       [-bro [<.xvg>]] [-bra [<.log>]] [-brs [<.log>]]
                       [-brt [<.log>]] [-bmtx [<.mtx>]] [-bdn [<.ndx>]]
                       [-bswap [<.xvg>]] [-xvg <enum>] [-mdrun <string>]
                       [-np <int>] [-npstring <enum>] [-ntmpi <int>] [-r <int>]
                       [-max <real>] [-min <real>] [-npme <enum>] [-fix <int>]
                       [-rmax <real>] [-rmin <real>] [-[no]scalevdw]
                       [-ntpr <int>] [-steps <int>] [-resetstep <int>]
                       [-nsteps <int>] [-[no]launch] [-[no]bench] [-[no]check]
                       [-gpu_id <string>] [-[no]append] [-[no]cpnum]
                       [-deffnm <string>]

DESCRIPTION

       For a given number -np or -ntmpi of ranks, gmx tune_pme systematically
       times gmx mdrun with various numbers of PME-only ranks and determines
       which setting is fastest. It will also test whether performance can be
       enhanced by shifting load from the reciprocal to the real space part of
       the Ewald sum. Simply pass your .tpr file to gmx tune_pme together
       with other options for gmx mdrun as needed.

       gmx tune_pme needs to call gmx mdrun and so requires that you specify
       how to call mdrun with the argument to the -mdrun parameter. Depending
       on how you have built GROMACS, values such as ‘gmx mdrun’, ‘gmx_d
       mdrun’, or ‘mdrun_mpi’ might be needed.

       The program that runs MPI programs can be set in the environment
       variable MPIRUN (defaults to ‘mpirun’). Note that for certain MPI
       frameworks, you need to provide a machine- or hostfile. This can also
       be passed via the MPIRUN variable, e.g.

       export MPIRUN="/usr/local/mpirun -machinefile hosts"

       Note that in such cases it is normally necessary to compile and/or run
       gmx tune_pme without MPI support, so that it can call the MPIRUN
       program.

       Before doing the actual benchmark runs, gmx tune_pme will do a quick
       check whether gmx mdrun works as expected with the provided parallel
       settings if the -check option is activated (the default). Please call
       gmx tune_pme with the normal options you would pass to gmx mdrun and
       add -np for the number of ranks to perform the tests on, or -ntmpi for
       the number of threads. You can also add -r to repeat each test several
       times to get better statistics.

       gmx tune_pme can test various real space / reciprocal space workloads
       for you. With -ntpr you control how many extra .tpr files will be
       written with enlarged cutoffs and smaller Fourier grids respectively.
       Typically, the first test (number 0) will be with the settings from the
       input .tpr file; the last test (number ntpr) will have the Coulomb
       cutoff specified by -rmax with a somewhat smaller PME grid at the same
       time. In this last test, the Fourier spacing is multiplied by
       rmax/rcoulomb. The remaining .tpr files will have equally-spaced
       Coulomb radii (and Fourier spacings) between these extremes. Note that
       you can set -ntpr to 1 if you just seek the optimal number of PME-only
       ranks; in that case your input .tpr file will remain unchanged.

       For the benchmark runs, the default of 1000 time steps should suffice
       for most MD systems. The dynamic load balancing needs about 100 time
       steps to adapt to local load imbalances, therefore the time step
       counters are reset only after an initial period set by -resetstep
       (1500 steps by default). For large systems (>1M atoms), as well as
       for a higher accuracy of the measurements, you should set -resetstep
       to a higher value. From the ‘DD’ load imbalance entries in the md.log
       output file you can tell after how many steps the load is sufficiently
       balanced. Example call:

       gmx tune_pme -np 64 -s protein.tpr -launch

       After calling gmx mdrun several times, detailed performance information
       is available in the output file perf.out. Note that during the
       benchmarks, a couple of temporary files are written (options -b*);
       these are deleted automatically after each test.

       If you want the simulation to be started automatically with the
       optimized parameters, use the command line option -launch.

       Basic support for GPU-enabled mdrun exists. Give a string containing
       the IDs of the GPUs that you wish to use in the optimization in the
       -gpu_id command-line argument. This works exactly like mdrun -gpu_id,
       does not imply a mapping, and merely declares the eligible set of GPU
       devices. gmx tune_pme will construct calls to mdrun that use this set
       appropriately. gmx tune_pme does not support -gputasks.
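       The -ntpr / -rmax scaling described above can be illustrated with a
       small Python sketch. It is not taken from the GROMACS sources: the
       function name cutoff_series and its parameters are hypothetical, and
       it assumes that the Coulomb radii are spaced linearly between the
       input rcoulomb and -rmax and that each Fourier spacing is scaled by
       the same factor as its radius, as the text suggests.

```python
def cutoff_series(rcoulomb, fourier_spacing, rmax, n_files):
    """Return (rcoulomb, fourier_spacing) pairs for n_files benchmark inputs.

    Assumption: radii are spaced linearly from rcoulomb to rmax, and each
    Fourier spacing is multiplied by radius / rcoulomb.
    """
    if n_files < 1:
        raise ValueError("n_files must be at least 1")
    if n_files == 1:
        # A single file keeps the settings of the input .tpr unchanged.
        return [(rcoulomb, fourier_spacing)]
    step = (rmax - rcoulomb) / (n_files - 1)
    series = []
    for i in range(n_files):
        r = rcoulomb + i * step
        series.append((r, fourier_spacing * r / rcoulomb))
    return series


# Illustrative values only: rcoulomb 1.0 nm, Fourier spacing 0.12 nm, rmax 1.2 nm.
for r, spacing in cutoff_series(1.0, 0.12, 1.2, 3):
    print(f"rcoulomb = {r:.3f} nm, fourier spacing = {spacing:.4f} nm")
```

       The first pair reproduces the input settings and the last pair uses
       the -rmax cutoff with the spacing scaled by rmax/rcoulomb; the values
       that gmx tune_pme actually writes may differ in detail.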

OPTIONS

       Options to specify input files:

       -s [<.tpr>] (topol.tpr)
              Portable xdr run input file

       -cpi [<.cpt>] (state.cpt) (Optional)
              Checkpoint file

       -table [<.xvg>] (table.xvg) (Optional)
              xvgr/xmgr file

       -tablep [<.xvg>] (tablep.xvg) (Optional)
              xvgr/xmgr file

       -tableb [<.xvg>] (table.xvg) (Optional)
              xvgr/xmgr file

       -rerun [<.xtc/.trr/…>] (rerun.xtc) (Optional)
              Trajectory: xtc trr cpt gro g96 pdb tng

       -ei [<.edi>] (sam.edi) (Optional)
              ED sampling input

       Options to specify output files:

       -p [<.out>] (perf.out)
              Generic output file

       -err [<.log>] (bencherr.log)
              Log file

       -so [<.tpr>] (tuned.tpr)
              Portable xdr run input file

       -o [<.trr/.cpt/…>] (traj.trr)
              Full precision trajectory: trr cpt tng

       -x [<.xtc/.tng>] (traj_comp.xtc) (Optional)
              Compressed trajectory (tng format or portable xdr format)

       -cpo [<.cpt>] (state.cpt) (Optional)
              Checkpoint file

       -c [<.gro/.g96/…>] (confout.gro)
              Structure file: gro g96 pdb brk ent esp

       -e [<.edr>] (ener.edr)
              Energy file

       -g [<.log>] (md.log)
              Log file

       -dhdl [<.xvg>] (dhdl.xvg) (Optional)
              xvgr/xmgr file

       -field [<.xvg>] (field.xvg) (Optional)
              xvgr/xmgr file

       -tpi [<.xvg>] (tpi.xvg) (Optional)
              xvgr/xmgr file

       -tpid [<.xvg>] (tpidist.xvg) (Optional)
              xvgr/xmgr file

       -eo [<.xvg>] (edsam.xvg) (Optional)
              xvgr/xmgr file

       -devout [<.xvg>] (deviatie.xvg) (Optional)
              xvgr/xmgr file

       -runav [<.xvg>] (runaver.xvg) (Optional)
              xvgr/xmgr file

       -px [<.xvg>] (pullx.xvg) (Optional)
              xvgr/xmgr file

       -pf [<.xvg>] (pullf.xvg) (Optional)
              xvgr/xmgr file

       -ro [<.xvg>] (rotation.xvg) (Optional)
              xvgr/xmgr file

       -ra [<.log>] (rotangles.log) (Optional)
              Log file

       -rs [<.log>] (rotslabs.log) (Optional)
              Log file

       -rt [<.log>] (rottorque.log) (Optional)
              Log file

       -mtx [<.mtx>] (nm.mtx) (Optional)
              Hessian matrix

       -swap [<.xvg>] (swapions.xvg) (Optional)
              xvgr/xmgr file

       -bo [<.trr/.cpt/…>] (bench.trr)
              Full precision trajectory: trr cpt tng

       -bx [<.xtc>] (bench.xtc)
              Compressed trajectory (portable xdr format): xtc

       -bcpo [<.cpt>] (bench.cpt)
              Checkpoint file

       -bc [<.gro/.g96/…>] (bench.gro)
              Structure file: gro g96 pdb brk ent esp

       -be [<.edr>] (bench.edr)
              Energy file

       -bg [<.log>] (bench.log)
              Log file

       -beo [<.xvg>] (benchedo.xvg) (Optional)
              xvgr/xmgr file

       -bdhdl [<.xvg>] (benchdhdl.xvg) (Optional)
              xvgr/xmgr file

       -bfield [<.xvg>] (benchfld.xvg) (Optional)
              xvgr/xmgr file

       -btpi [<.xvg>] (benchtpi.xvg) (Optional)
              xvgr/xmgr file

       -btpid [<.xvg>] (benchtpid.xvg) (Optional)
              xvgr/xmgr file

       -bdevout [<.xvg>] (benchdev.xvg) (Optional)
              xvgr/xmgr file

       -brunav [<.xvg>] (benchrnav.xvg) (Optional)
              xvgr/xmgr file

       -bpx [<.xvg>] (benchpx.xvg) (Optional)
              xvgr/xmgr file

       -bpf [<.xvg>] (benchpf.xvg) (Optional)
              xvgr/xmgr file

       -bro [<.xvg>] (benchrot.xvg) (Optional)
              xvgr/xmgr file

       -bra [<.log>] (benchrota.log) (Optional)
              Log file

       -brs [<.log>] (benchrots.log) (Optional)
              Log file

       -brt [<.log>] (benchrott.log) (Optional)
              Log file

       -bmtx [<.mtx>] (benchn.mtx) (Optional)
              Hessian matrix

       -bdn [<.ndx>] (bench.ndx) (Optional)
              Index file

       -bswap [<.xvg>] (benchswp.xvg) (Optional)
              xvgr/xmgr file

       Other options:

       -xvg <enum> (xmgrace)
              xvg plot formatting: xmgrace, xmgr, none

       -mdrun <string>
              Command line to run a simulation, e.g. ‘gmx mdrun’ or
              ‘mdrun_mpi’

       -np <int> (1)
              Number of ranks to run the tests on (must be > 2 for separate
              PME ranks)

       -npstring <enum> (np)
              Name of the $MPIRUN option that specifies the number of ranks to
              use (‘np’, or ‘n’; use ‘none’ if there is no such option): np,
              n, none

       -ntmpi <int> (1)
              Number of MPI-threads to run the tests on (turns MPI & mpirun
              off)

       -r <int> (2)
              Repeat each test this often

       -max <real> (0.5)
              Max fraction of PME ranks to test with

       -min <real> (0.25)
              Min fraction of PME ranks to test with

       -npme <enum> (auto)
              Within -min and -max, benchmark all possible values for -npme,
              or just a reasonable subset. Auto neglects -min and -max and
              chooses reasonable values around a guess for npme derived from
              the .tpr: auto, all, subset

       -fix <int> (-2)
              If >= -1, do not vary the number of PME-only ranks, instead use
              this fixed value and only vary rcoulomb and the PME grid
              spacing.

       -rmax <real> (0)
              If >0, maximal rcoulomb for -ntpr>1 (rcoulomb upscaling results
              in fourier grid downscaling)

       -rmin <real> (0)
              If >0, minimal rcoulomb for -ntpr>1

       -[no]scalevdw (yes)
              Scale rvdw along with rcoulomb

       -ntpr <int> (0)
              Number of .tpr files to benchmark. Create this many files with
              different rcoulomb scaling factors depending on -rmin and -rmax.
              If < 1, automatically choose the number of .tpr files to test

       -steps <int> (1000)
              Take timings for this many steps in the benchmark runs

       -resetstep <int> (1500)
              Let dlb equilibrate this many steps before timings are taken
              (reset cycle counters after this many steps)

       -nsteps <int> (-1)
              If non-negative, perform this many steps in the real run
              (overwrites nsteps from .tpr, add .cpt steps)

       -[no]launch (no)
              Launch the real simulation after optimization

       -[no]bench (yes)
              Run the benchmarks or just create the input .tpr files?

       -[no]check (yes)
              Before the benchmark runs, check whether mdrun works in parallel

       -gpu_id <string>
              List of unique GPU device IDs that are eligible for use

       -[no]append (yes)
              Append to previous output files when continuing from checkpoint
              instead of adding the simulation part number to all file names
              (for launch only)

       -[no]cpnum (no)
              Keep and number checkpoint files (launch only)

       -deffnm <string>
              Set the default filenames (launch only)
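       How -np, -min, -max, -npstring and -mdrun might combine into a set of
       benchmark invocations can be sketched in Python as follows. This is
       an illustration under stated assumptions, not the logic of gmx
       tune_pme itself: the helper names are hypothetical, the rounding of
       the rank fractions is a guess, and the argument order of the
       generated command is only one plausible layout. The -npme and -s
       flags are the mdrun options named in this page.

```python
import math
import shlex


def pme_rank_candidates(np_ranks, min_fraction=0.25, max_fraction=0.5):
    """List the PME-only rank counts between the two fractions of np_ranks.

    Assumption: the lower bound is rounded up and the upper bound down.
    """
    lowest = math.ceil(min_fraction * np_ranks)
    highest = math.floor(max_fraction * np_ranks)
    return list(range(lowest, highest + 1))


def benchmark_command(mpirun, npstring, np_ranks, mdrun, npme, tpr):
    """Compose one launcher command as an argument list (hypothetical layout)."""
    cmd = shlex.split(mpirun)          # e.g. the value of $MPIRUN
    if npstring != "none":             # 'none' means the launcher has no such option
        cmd += ["-" + npstring, str(np_ranks)]
    cmd += shlex.split(mdrun)          # e.g. 'gmx mdrun' or 'mdrun_mpi'
    cmd += ["-npme", str(npme), "-s", tpr]
    return cmd


mpirun = "/usr/local/mpirun -machinefile hosts"  # example value from DESCRIPTION
for npme in pme_rank_candidates(8):
    print(" ".join(benchmark_command(mpirun, "np", 8, "mdrun_mpi", npme, "protein.tpr")))
```

       With the default fractions, 8 ranks give the candidates 2, 3 and 4
       PME-only ranks under these assumptions; with -npme auto the tool
       chooses its own values and ignores -min and -max.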

SEE ALSO

       gmx(1)

       More information about GROMACS is available at
       <http://www.gromacs.org/>.

COPYRIGHT

       2020, GROMACS development team

2019.6                           Feb 28, 2020                  GMX-TUNE_PME(1)