GMX-TUNE_PME(1)                     GROMACS                    GMX-TUNE_PME(1)

NAME

       gmx-tune_pme - Time mdrun as a function of PME ranks to optimize settings

SYNOPSIS

          gmx tune_pme [-s [<.tpr>]] [-cpi [<.cpt>]] [-table [<.xvg>]]
                       [-tablep [<.xvg>]] [-tableb [<.xvg>]]
                       [-rerun [<.xtc/.trr/...>]] [-ei [<.edi>]] [-p [<.out>]]
                       [-err [<.log>]] [-so [<.tpr>]] [-o [<.trr/.cpt/...>]]
                       [-x [<.xtc/.tng>]] [-cpo [<.cpt>]]
                       [-c [<.gro/.g96/...>]] [-e [<.edr>]] [-g [<.log>]]
                       [-dhdl [<.xvg>]] [-field [<.xvg>]] [-tpi [<.xvg>]]
                       [-tpid [<.xvg>]] [-eo [<.xvg>]] [-px [<.xvg>]]
                       [-pf [<.xvg>]] [-ro [<.xvg>]] [-ra [<.log>]]
                       [-rs [<.log>]] [-rt [<.log>]] [-mtx [<.mtx>]]
                       [-swap [<.xvg>]] [-bo [<.trr/.cpt/...>]] [-bx [<.xtc>]]
                       [-bcpo [<.cpt>]] [-bc [<.gro/.g96/...>]] [-be [<.edr>]]
                       [-bg [<.log>]] [-beo [<.xvg>]] [-bdhdl [<.xvg>]]
                       [-bfield [<.xvg>]] [-btpi [<.xvg>]] [-btpid [<.xvg>]]
                       [-bdevout [<.xvg>]] [-brunav [<.xvg>]] [-bpx [<.xvg>]]
                       [-bpf [<.xvg>]] [-bro [<.xvg>]] [-bra [<.log>]]
                       [-brs [<.log>]] [-brt [<.log>]] [-bmtx [<.mtx>]]
                       [-bdn [<.ndx>]] [-bswap [<.xvg>]] [-xvg <enum>]
                       [-mdrun <string>] [-np <int>] [-npstring <enum>]
                       [-ntmpi <int>] [-r <int>] [-max <real>] [-min <real>]
                       [-npme <enum>] [-fix <int>] [-rmax <real>]
                       [-rmin <real>] [-[no]scalevdw] [-ntpr <int>]
                       [-steps <int>] [-resetstep <int>] [-nsteps <int>]
                       [-[no]launch] [-[no]bench] [-[no]check]
                       [-gpu_id <string>] [-[no]append] [-[no]cpnum]
                       [-deffnm <string>]

DESCRIPTION

       For a given number -np or -ntmpi of ranks, gmx tune_pme systematically
       times gmx mdrun with various numbers of PME-only ranks and determines
       which setting is fastest. It will also test whether performance can be
       enhanced by shifting load from the reciprocal to the real space part of
       the Ewald sum. Simply pass your .tpr file to gmx tune_pme together with
       other options for gmx mdrun as needed.

       gmx tune_pme needs to call gmx mdrun and so requires that you specify
       how to call mdrun with the argument to the -mdrun parameter. Depending
       on how you have built GROMACS, values such as 'gmx mdrun', 'gmx_d
       mdrun', or 'gmx_mpi mdrun' might be needed.

       The program that runs MPI programs can be set in the environment
       variable MPIRUN (defaults to 'mpirun'). Note that for certain MPI
       frameworks, you need to provide a machine- or hostfile. This can also
       be passed via the MPIRUN variable, e.g.

       export MPIRUN="/usr/local/mpirun -machinefile hosts"

       Note that in such cases it is normally necessary to compile and/or run
       gmx tune_pme without MPI support, so that it can call the MPIRUN
       program.

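       As an illustration, the settings described above might be combined as
       in the following sketch. The mpirun path and hostfile repeat the example
       above; the 'gmx_mpi mdrun' value, the rank count of 16 and the file
       name topol.tpr are assumptions chosen for the example. The command is
       only printed here, not executed.

```shell
#!/bin/sh
# Launcher that gmx tune_pme should use for MPI runs
# (path and hostfile as in the example above).
export MPIRUN="/usr/local/mpirun -machinefile hosts"

# Assumed values: an MPI-enabled mdrun, 16 ranks, default input name topol.tpr.
mdrun_cmd="gmx_mpi mdrun"
ranks=16

# Build the tuning command; echo it for inspection instead of executing it.
tune_cmd="gmx tune_pme -np $ranks -mdrun '$mdrun_cmd' -s topol.tpr"
echo "$tune_cmd"
```
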
       Before the actual benchmark runs, gmx tune_pme quickly checks whether
       gmx mdrun works as expected with the provided parallel settings, as
       long as the -check option is active (the default). Please call gmx
       tune_pme with the normal options you would pass to gmx mdrun and add
       -np for the number of ranks to perform the tests on, or -ntmpi for the
       number of threads. You can also add -r to repeat each test several
       times to get better statistics.

       gmx tune_pme can test various real space / reciprocal space workloads
       for you. With -ntpr you control how many extra .tpr files will be
       written with enlarged cutoffs and smaller Fourier grids respectively.
       Typically, the first test (number 0) will be with the settings from the
       input .tpr file; the last test (number ntpr) will have the Coulomb
       cutoff specified by -rmax with a somewhat smaller PME grid at the same
       time. In this last test, the Fourier spacing is multiplied by
       rmax/rcoulomb. The remaining .tpr files will have equally spaced
       Coulomb radii (and Fourier spacings) between these extremes. Note that
       you can set -ntpr to 1 if you just seek the optimal number of PME-only
       ranks; in that case your input .tpr file will remain unchanged.

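       The scaling rule above can be worked through numerically. The sketch
       below is not the tool's own code: it assumes an input cutoff of 1.0 nm,
       an input Fourier spacing of 0.12 nm, -rmax 1.2 and three test settings
       including both extremes, and prints the test number, the Coulomb cutoff
       and the Fourier spacing for each setting. How the tool itself counts
       the unmodified input against -ntpr is not addressed here.

```shell
#!/bin/sh
# Assumed example values (not taken from any real .tpr file):
rcoulomb=1.0      # Coulomb cutoff in the input .tpr (nm)
spacing=0.12      # Fourier grid spacing in the input .tpr (nm)
rmax=1.2          # value passed as -rmax (nm)
points=3          # number of test settings, both extremes included (must be >= 2)

table=$(awk -v rc="$rcoulomb" -v fs="$spacing" -v rmax="$rmax" -v n="$points" '
BEGIN {
    for (i = 0; i < n; i++) {
        # Cutoffs are equally spaced between the input value and rmax.
        r = rc + i * (rmax - rc) / (n - 1)
        # The Fourier spacing is scaled by the same factor as the cutoff.
        printf "%d %.3f %.3f\n", i, r, fs * r / rc
    }
}')
echo "$table"
```

       With these values the middle setting uses a cutoff of 1.1 nm with a
       spacing of 0.132 nm, and the last one 1.2 nm with 0.144 nm.
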
       For the benchmark runs, the default of 1000 time steps should suffice
       for most MD systems. The dynamic load balancing needs about 100 time
       steps to adapt to local load imbalances, so the time step counters are
       reset only after an initial number of steps, set with -resetstep
       (default 1500, see OPTIONS). For large systems (>1M atoms), as well as
       for a higher accuracy of the measurements, you should set -resetstep to
       a higher value. From the 'DD' load imbalance entries in the md.log
       output file you can tell after how many steps the load is sufficiently
       balanced. Example call:

       gmx tune_pme -np 64 -s protein.tpr -launch

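       For a large system, a longer benchmark might be requested along the
       lines of the following sketch. The values 3000, 2000 and 3 are
       illustrative assumptions, not recommendations; the rank count and file
       name repeat the example call above. The command is only printed here,
       not executed.

```shell
#!/bin/sh
# Assumed example values for a large system; adjust them to your own case.
ranks=64
resetstep=3000   # steps before the counters are reset (-resetstep)
steps=2000       # timed steps per benchmark run (-steps)
repeats=3        # repetitions of each test (-r)

# Build the tuning command; echo it for inspection instead of executing it.
tune_cmd="gmx tune_pme -np $ranks -s protein.tpr -resetstep $resetstep -steps $steps -r $repeats"
echo "$tune_cmd"
```
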
       After calling gmx mdrun several times, detailed performance information
       is available in the output file perf.out. Note that during the
       benchmarks, a couple of temporary files are written (options -b*);
       these are automatically deleted after each test.

       If you want the simulation to be started automatically with the
       optimized parameters, use the command line option -launch.

       Basic support for GPU-enabled mdrun exists. Give a string containing
       the IDs of the GPUs that you wish to use in the optimization in the
       -gpu_id command-line argument. This works exactly like mdrun -gpu_id,
       does not imply a mapping, and merely declares the eligible set of GPU
       devices. gmx tune_pme will construct calls to mdrun that use this set
       appropriately. gmx tune_pme does not support -gputasks.

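       A GPU-enabled tuning call might be assembled as in the sketch below.
       It assumes a thread-MPI build called as 'gmx mdrun', four thread-MPI
       ranks, the default input name topol.tpr, and that GPUs 0 and 1 are
       the eligible devices, written as the string "01" in the way mdrun
       -gpu_id is commonly given. The command is only printed here, not
       executed.

```shell
#!/bin/sh
# Assumed example: 4 thread-MPI ranks, GPUs 0 and 1 declared eligible.
gpu_ids="01"     # assumed to follow the same string format as mdrun -gpu_id
ntmpi=4

# Build the tuning command; echo it for inspection instead of executing it.
tune_cmd="gmx tune_pme -ntmpi $ntmpi -gpu_id $gpu_ids -mdrun 'gmx mdrun' -s topol.tpr"
echo "$tune_cmd"
```
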

OPTIONS

       Options to specify input files:

       -s [<.tpr>] (topol.tpr)
              Portable xdr run input file

       -cpi [<.cpt>] (state.cpt) (Optional)
              Checkpoint file

       -table [<.xvg>] (table.xvg) (Optional)
              xvgr/xmgr file

       -tablep [<.xvg>] (tablep.xvg) (Optional)
              xvgr/xmgr file

       -tableb [<.xvg>] (table.xvg) (Optional)
              xvgr/xmgr file

       -rerun [<.xtc/.trr/...>] (rerun.xtc) (Optional)
              Trajectory: xtc trr cpt gro g96 pdb tng

       -ei [<.edi>] (sam.edi) (Optional)
              ED sampling input

       Options to specify output files:

       -p [<.out>] (perf.out)
              Generic output file

       -err [<.log>] (bencherr.log)
              Log file

       -so [<.tpr>] (tuned.tpr)
              Portable xdr run input file

       -o [<.trr/.cpt/...>] (traj.trr)
              Full precision trajectory: trr cpt tng

       -x [<.xtc/.tng>] (traj_comp.xtc) (Optional)
              Compressed trajectory (tng format or portable xdr format)

       -cpo [<.cpt>] (state.cpt) (Optional)
              Checkpoint file

       -c [<.gro/.g96/...>] (confout.gro)
              Structure file: gro g96 pdb brk ent esp

       -e [<.edr>] (ener.edr)
              Energy file

       -g [<.log>] (md.log)
              Log file

       -dhdl [<.xvg>] (dhdl.xvg) (Optional)
              xvgr/xmgr file

       -field [<.xvg>] (field.xvg) (Optional)
              xvgr/xmgr file

       -tpi [<.xvg>] (tpi.xvg) (Optional)
              xvgr/xmgr file

       -tpid [<.xvg>] (tpidist.xvg) (Optional)
              xvgr/xmgr file

       -eo [<.xvg>] (edsam.xvg) (Optional)
              xvgr/xmgr file

       -px [<.xvg>] (pullx.xvg) (Optional)
              xvgr/xmgr file

       -pf [<.xvg>] (pullf.xvg) (Optional)
              xvgr/xmgr file

       -ro [<.xvg>] (rotation.xvg) (Optional)
              xvgr/xmgr file

       -ra [<.log>] (rotangles.log) (Optional)
              Log file

       -rs [<.log>] (rotslabs.log) (Optional)
              Log file

       -rt [<.log>] (rottorque.log) (Optional)
              Log file

       -mtx [<.mtx>] (nm.mtx) (Optional)
              Hessian matrix

       -swap [<.xvg>] (swapions.xvg) (Optional)
              xvgr/xmgr file

       -bo [<.trr/.cpt/...>] (bench.trr)
              Full precision trajectory: trr cpt tng

       -bx [<.xtc>] (bench.xtc)
              Compressed trajectory (portable xdr format): xtc

       -bcpo [<.cpt>] (bench.cpt)
              Checkpoint file

       -bc [<.gro/.g96/...>] (bench.gro)
              Structure file: gro g96 pdb brk ent esp

       -be [<.edr>] (bench.edr)
              Energy file

       -bg [<.log>] (bench.log)
              Log file

       -beo [<.xvg>] (benchedo.xvg) (Optional)
              xvgr/xmgr file

       -bdhdl [<.xvg>] (benchdhdl.xvg) (Optional)
              xvgr/xmgr file

       -bfield [<.xvg>] (benchfld.xvg) (Optional)
              xvgr/xmgr file

       -btpi [<.xvg>] (benchtpi.xvg) (Optional)
              xvgr/xmgr file

       -btpid [<.xvg>] (benchtpid.xvg) (Optional)
              xvgr/xmgr file

       -bdevout [<.xvg>] (benchdev.xvg) (Optional)
              xvgr/xmgr file

       -brunav [<.xvg>] (benchrnav.xvg) (Optional)
              xvgr/xmgr file

       -bpx [<.xvg>] (benchpx.xvg) (Optional)
              xvgr/xmgr file

       -bpf [<.xvg>] (benchpf.xvg) (Optional)
              xvgr/xmgr file

       -bro [<.xvg>] (benchrot.xvg) (Optional)
              xvgr/xmgr file

       -bra [<.log>] (benchrota.log) (Optional)
              Log file

       -brs [<.log>] (benchrots.log) (Optional)
              Log file

       -brt [<.log>] (benchrott.log) (Optional)
              Log file

       -bmtx [<.mtx>] (benchn.mtx) (Optional)
              Hessian matrix

       -bdn [<.ndx>] (bench.ndx) (Optional)
              Index file

       -bswap [<.xvg>] (benchswp.xvg) (Optional)
              xvgr/xmgr file

       Other options:

       -xvg <enum> (xmgrace)
              xvg plot formatting: xmgrace, xmgr, none

       -mdrun <string>
              Command line to run a simulation, e.g. 'gmx mdrun' or 'gmx_mpi
              mdrun'

       -np <int> (1)
              Number of ranks to run the tests on (must be > 2 for separate
              PME ranks)

       -npstring <enum> (np)
              Name of the $MPIRUN option that specifies the number of ranks to
              use ('np', or 'n'; use 'none' if there is no such option): np,
              n, none

       -ntmpi <int> (1)
              Number of MPI-threads to run the tests on (turns MPI & mpirun
              off)

       -r <int> (2)
              Repeat each test this often

       -max <real> (0.5)
              Max fraction of PME ranks to test with

       -min <real> (0.25)
              Min fraction of PME ranks to test with

       -npme <enum> (auto)
              Within -min and -max, benchmark all possible values for -npme,
              or just a reasonable subset. Auto neglects -min and -max and
              chooses reasonable values around a guess for npme derived from
              the .tpr: auto, all, subset

       -fix <int> (-2)
              If >= -1, do not vary the number of PME-only ranks, instead use
              this fixed value and only vary rcoulomb and the PME grid
              spacing.

       -rmax <real> (0)
              If >0, maximal rcoulomb for -ntpr>1 (rcoulomb upscaling results
              in Fourier grid downscaling)

       -rmin <real> (0)
              If >0, minimal rcoulomb for -ntpr>1

       -[no]scalevdw (yes)
              Scale rvdw along with rcoulomb

       -ntpr <int> (0)
              Number of .tpr files to benchmark. Create this many files with
              different rcoulomb scaling factors depending on -rmin and -rmax.
              If < 1, automatically choose the number of .tpr files to test

       -steps <int> (1000)
              Take timings for this many steps in the benchmark runs

       -resetstep <int> (1500)
              Let dlb equilibrate this many steps before timings are taken
              (reset cycle counters after this many steps)

       -nsteps <int> (-1)
              If non-negative, perform this many steps in the real run
              (overwrites nsteps from .tpr, add .cpt steps)

       -[no]launch (no)
              Launch the real simulation after optimization

       -[no]bench (yes)
              Run the benchmarks or just create the input .tpr files?

       -[no]check (yes)
              Before the benchmark runs, check whether mdrun works in parallel

       -gpu_id <string>
              List of unique GPU device IDs that are eligible for use

       -[no]append (yes)
              Append to previous output files when continuing from checkpoint
              instead of adding the simulation part number to all file names
              (for launch only)

       -[no]cpnum (no)
              Keep and number checkpoint files (launch only)

       -deffnm <string>
              Set the default filenames (launch only)

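       As a rough illustration of the -min and -max options above, the
       following sketch computes the span of PME-only rank counts that the
       default fractions would cover for an assumed total of 64 ranks.
       Truncating to whole ranks is an assumption of the sketch; the exact
       rounding used by gmx tune_pme is not specified here, and with -npme
       auto the two fractions are not used at all.

```shell
#!/bin/sh
# Assumed example: 64 ranks with the default -min and -max fractions.
np=64
min_frac=0.25
max_frac=0.5

# Truncation to whole ranks is an assumption of this sketch,
# not documented behaviour of gmx tune_pme.
range=$(awk -v np="$np" -v lo="$min_frac" -v hi="$max_frac" \
    'BEGIN { printf "%d %d", int(np * lo), int(np * hi) }')
echo "PME-only ranks tested between: $range"
```
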

SEE ALSO

       gmx(1)

       More information about GROMACS is available at
       <http://www.gromacs.org/>.

COPYRIGHT

       2022, GROMACS development team

2022.2                           Jun 16, 2022                  GMX-TUNE_PME(1)