g_tune_pme(1)              GROMACS suite, VERSION 4.5              g_tune_pme(1)



NAME
       g_tune_pme - time mdrun as a function of PME nodes to optimize settings

       VERSION 4.5

SYNOPSIS
       g_tune_pme -p perf.out -err errors.log -so tuned.tpr -s topol.tpr -o
       traj.trr -x traj.xtc -cpi state.cpt -cpo state.cpt -c confout.gro -e
       ener.edr -g md.log -dhdl dhdl.xvg -field field.xvg -table table.xvg
       -tablep tablep.xvg -tableb table.xvg -rerun rerun.xtc -tpi tpi.xvg
       -tpid tpidist.xvg -ei sam.edi -eo sam.edo -j wham.gct -jo bam.gct
       -ffout gct.xvg -devout deviatie.xvg -runav runaver.xvg -px pullx.xvg
       -pf pullf.xvg -mtx nm.mtx -dn dipole.ndx -bo bench.trr -bx bench.xtc
       -bcpo bench.cpt -bc bench.gro -be bench.edr -bg bench.log -beo
       bench.edo -bdhdl benchdhdl.xvg -bfield benchfld.xvg -btpi benchtpi.xvg
       -btpid benchtpid.xvg -bjo bench.gct -bffout benchgct.xvg -bdevout
       benchdev.xvg -brunav benchrnav.xvg -bpx benchpx.xvg -bpf benchpf.xvg
       -bmtx benchn.mtx -bdn bench.ndx -[no]h -[no]version -nice int -xvg enum
       -np int -npstring enum -nt int -r int -max real -min real -npme enum
       -upfac real -downfac real -ntpr int -four real -steps step -resetstep
       int -simsteps step -[no]launch -deffnm string -ddorder enum
       -[no]ddcheck -rdd real -rcon real -dlb enum -dds real -gcom int -[no]v
       -[no]compact -[no]seppot -pforce real -[no]reprod -cpt real -[no]cpnum
       -[no]append -maxh real -multi int -replex int -reseed int -[no]ionize

DESCRIPTION
       For a given number -np or -nt of processors/threads, this program
       systematically times mdrun with various numbers of PME-only nodes and
       determines which setting is fastest. It will also test whether
       performance can be enhanced by shifting load from the reciprocal to
       the real space part of the Ewald sum. Simply pass your .tpr file to
       g_tune_pme together with other options for mdrun as needed.


       Which executables are used can be set in the environment variables
       MPIRUN and MDRUN. If these are not present, 'mpirun' and 'mdrun' will
       be used as defaults. Note that for certain MPI frameworks you need to
       provide a machine- or hostfile. This can also be passed via the MPIRUN
       variable, e.g. 'export MPIRUN="/usr/local/mpirun -machinefile hosts"'.

       Please call g_tune_pme with the normal options you would pass to mdrun
       and add -np for the number of processors to perform the tests on, or
       -nt for the number of threads. You can also add -r to repeat each test
       several times to get better statistics.
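
       As an illustration (processor and repeat counts are arbitrary), an MPI
       run on 32 processors with three repeats per test, or a thread-based
       run on 8 threads, could be started as:

          # MPI: 32 processes, repeat each benchmark three times
          g_tune_pme -np 32 -r 3 -s topol.tpr
          # threads instead of MPI: 8 threads
          g_tune_pme -nt 8 -s topol.tpr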


       g_tune_pme can test various real space / reciprocal space workloads
       for you. With -ntpr you control how many extra .tpr files will be
       written with enlarged cutoffs and smaller fourier grids respectively.
       Typically, the first test (no. 0) will be with the settings from the
       input .tpr file; the last test (no. ntpr) will have cutoffs multiplied
       by (and at the same time fourier grid dimensions divided by) the
       scaling factor -upfac (default 1.2). The remaining .tpr files will
       have equally spaced values in between these extremes. Note that you
       can set -ntpr to 1 if you just want to find the optimal number of
       PME-only nodes; in that case your input .tpr file will remain
       unchanged.
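
       For instance, the following illustrative call benchmarks three .tpr
       files with cutoff scaling factors between 1.0 and 1.2 and stores the
       tuned input in tuned.tpr:

          # 3 tpr files; cutoffs scaled up to 1.2x, fourier grids scaled down accordingly
          g_tune_pme -np 16 -s topol.tpr -ntpr 3 -upfac 1.2 -so tuned.tpr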


       For the benchmark runs, the default of 1000 time steps should suffice
       for most MD systems. The dynamic load balancing needs about 100 time
       steps to adapt to local load imbalances; therefore the time step
       counters are by default reset after 100 steps. For large systems
       (>1M atoms) you may have to set -resetstep to a higher value. From
       the 'DD' load imbalance entries in the md.log output file you can
       tell after how many steps the load is sufficiently balanced.
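
       For a very large system you might, for example, give the load
       balancing more time to settle and benchmark over more steps (the
       values and file name below are illustrative):

          # reset cycle counters after 1000 steps, then time 3000 benchmark steps
          g_tune_pme -np 128 -s bigsystem.tpr -resetstep 1000 -steps 3000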

       Example call: g_tune_pme -np 64 -s protein.tpr -launch


       After calling mdrun several times, detailed performance information is
       available in the output file perf.out. Note that during the benchmarks
       a couple of temporary files are written (options -b*); these are
       automatically deleted after each test.


       If you want the simulation to be started automatically with the
       optimized parameters, use the command line option -launch.
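
       A possible invocation that runs the benchmarks and then immediately
       launches the production run with the best settings found, using a
       common prefix for the launched run's output files (the prefix name is
       illustrative), would be:

          # benchmark, then start the production run with the optimal settings
          g_tune_pme -np 64 -s protein.tpr -launch -deffnm production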


FILES
       -p perf.out  Output
            Generic output file

       -err errors.log  Output
            Log file

       -so tuned.tpr  Output
            Run input file: tpr tpb tpa

       -s topol.tpr  Input
            Run input file: tpr tpb tpa

       -o traj.trr  Output
            Full precision trajectory: trr trj cpt

       -x traj.xtc  Output, Opt.
            Compressed trajectory (portable xdr format)

       -cpi state.cpt  Input, Opt.
            Checkpoint file

       -cpo state.cpt  Output, Opt.
            Checkpoint file

       -c confout.gro  Output
            Structure file: gro g96 pdb etc.

       -e ener.edr  Output
            Energy file

       -g md.log  Output
            Log file

       -dhdl dhdl.xvg  Output, Opt.
            xvgr/xmgr file

       -field field.xvg  Output, Opt.
            xvgr/xmgr file

       -table table.xvg  Input, Opt.
            xvgr/xmgr file

       -tablep tablep.xvg  Input, Opt.
            xvgr/xmgr file

       -tableb table.xvg  Input, Opt.
            xvgr/xmgr file

       -rerun rerun.xtc  Input, Opt.
            Trajectory: xtc trr trj gro g96 pdb cpt

       -tpi tpi.xvg  Output, Opt.
            xvgr/xmgr file

       -tpid tpidist.xvg  Output, Opt.
            xvgr/xmgr file

       -ei sam.edi  Input, Opt.
            ED sampling input

       -eo sam.edo  Output, Opt.
            ED sampling output

       -j wham.gct  Input, Opt.
            General coupling stuff

       -jo bam.gct  Output, Opt.
            General coupling stuff

       -ffout gct.xvg  Output, Opt.
            xvgr/xmgr file

       -devout deviatie.xvg  Output, Opt.
            xvgr/xmgr file

       -runav runaver.xvg  Output, Opt.
            xvgr/xmgr file

       -px pullx.xvg  Output, Opt.
            xvgr/xmgr file

       -pf pullf.xvg  Output, Opt.
            xvgr/xmgr file

       -mtx nm.mtx  Output, Opt.
            Hessian matrix

       -dn dipole.ndx  Output, Opt.
            Index file

       -bo bench.trr  Output
            Full precision trajectory: trr trj cpt

       -bx bench.xtc  Output
            Compressed trajectory (portable xdr format)

       -bcpo bench.cpt  Output
            Checkpoint file

       -bc bench.gro  Output
            Structure file: gro g96 pdb etc.

       -be bench.edr  Output
            Energy file

       -bg bench.log  Output
            Log file

       -beo bench.edo  Output, Opt.
            ED sampling output

       -bdhdl benchdhdl.xvg  Output, Opt.
            xvgr/xmgr file

       -bfield benchfld.xvg  Output, Opt.
            xvgr/xmgr file

       -btpi benchtpi.xvg  Output, Opt.
            xvgr/xmgr file

       -btpid benchtpid.xvg  Output, Opt.
            xvgr/xmgr file

       -bjo bench.gct  Output, Opt.
            General coupling stuff

       -bffout benchgct.xvg  Output, Opt.
            xvgr/xmgr file

       -bdevout benchdev.xvg  Output, Opt.
            xvgr/xmgr file

       -brunav benchrnav.xvg  Output, Opt.
            xvgr/xmgr file

       -bpx benchpx.xvg  Output, Opt.
            xvgr/xmgr file

       -bpf benchpf.xvg  Output, Opt.
            xvgr/xmgr file

       -bmtx benchn.mtx  Output, Opt.
            Hessian matrix

       -bdn bench.ndx  Output, Opt.
            Index file


OTHER OPTIONS
       -[no]h  no
            Print help info and quit

       -[no]version  no
            Print version info and quit

       -nice int  0
            Set the nicelevel

       -xvg enum  xmgrace
            xvg plot formatting: xmgrace, xmgr or none

       -np int  1
            Number of nodes to run the tests on (must be > 2 for separate
            PME nodes)

       -npstring enum  -np
            Specify the number of processors to $MPIRUN using this string:
            -np, -n or none

       -nt int  1
            Number of threads to run the tests on (turns MPI & mpirun off)

       -r int  2
            Repeat each test this often

       -max real  0.5
            Max fraction of PME nodes to test with

       -min real  0.25
            Min fraction of PME nodes to test with

       -npme enum  auto
            Benchmark all possible values for -npme or just the subset that
            is expected to perform well: auto, all or subset (see the
            combined example at the end of this list)

       -upfac real  1.2
            Upper limit for rcoulomb scaling factor (Note that rcoulomb
            upscaling results in fourier grid downscaling)

       -downfac real  1
            Lower limit for rcoulomb scaling factor

       -ntpr int  0
            Number of tpr files to benchmark. Create this many files with
            scaling factors ranging from 1.0 to the value of -upfac. If < 1,
            automatically choose the number of tpr files to test

       -four real  0
            Use this fourierspacing value instead of the grid found in the
            tpr input file. (Spacing applies to a scaling factor of 1.0 if
            multiple tpr files are written)

       -steps step  1000
            Take timings for this many steps in the benchmark runs

       -resetstep int  100
            Let dlb equilibrate this many steps before timings are taken
            (reset cycle counters after this many steps)

       -simsteps step  -1
            If non-negative, perform this many steps in the real run
            (overwrites nsteps from the tpr, adds cpt steps)

       -[no]launch  no
            Launch the real simulation after optimization

       -deffnm string
            Set the default filename for all file options at launch time

       -ddorder enum  interleave
            DD node order: interleave, pp_pme or cartesian

       -[no]ddcheck  yes
            Check for all bonded interactions with DD

       -rdd real  0
            The maximum distance for bonded interactions with DD (nm); 0
            means determine from the initial coordinates

       -rcon real  0
            Maximum distance for P-LINCS (nm); 0 means estimate

       -dlb enum  auto
            Dynamic load balancing (with DD): auto, no or yes

       -dds real  0.8
            Minimum allowed dlb scaling of the DD cell size

       -gcom int  -1
            Global communication frequency

       -[no]v  no
            Be loud and noisy

       -[no]compact  yes
            Write a compact log file

       -[no]seppot  no
            Write separate V and dVdl terms for each interaction type and
            node to the log file(s)

       -pforce real  -1
            Print all forces larger than this (kJ/mol nm)

       -[no]reprod  no
            Try to avoid optimizations that affect binary reproducibility

       -cpt real  15
            Checkpoint interval (minutes)

       -[no]cpnum  no
            Keep and number checkpoint files

       -[no]append  yes
            Append to previous output files when continuing from checkpoint
            instead of adding the simulation part number to all file names
            (for launch only)

       -maxh real  -1
            Terminate after 0.99 times this time (hours)

       -multi int  0
            Do multiple simulations in parallel

       -replex int  0
            Attempt replica exchange every # steps

       -reseed int  -1
            Seed for replica exchange; -1 means generate a seed

       -[no]ionize  no
            Do a simulation including the effect of an X-ray bombardment on
            your system

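       As referenced above, a sketch that combines several of these tuning
       options (all numbers below are illustrative, not recommendations):

          # test only the promising PME node counts between 1/8 and 1/2 of the 64 nodes,
          # benchmark 4 tpr files, and limit the launched run to 500000 steps
          g_tune_pme -np 64 -s topol.tpr -npme subset -min 0.125 -max 0.5 \
                     -ntpr 4 -simsteps 500000 -launch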

SEE ALSO
       gromacs(7)

       More information about GROMACS is available at
       <http://www.gromacs.org/>.



Thu 26 Aug 2010                                                  g_tune_pme(1)