pdsh(1)                     General Commands Manual                    pdsh(1)

NAME
       pdsh - issue commands to groups of hosts in parallel

SYNOPSIS
       pdsh [options]... command

DESCRIPTION
       pdsh is a variant of the rsh(1) command. Unlike rsh(1), which runs
       commands on a single remote host, pdsh can run multiple remote
       commands in parallel. pdsh uses a "sliding window" (or fanout) of
       threads to conserve resources on the initiating host while allowing
       some connections to time out.

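       The sliding window described above can be pictured with a bounded
       thread pool. This is only an illustrative sketch, not pdsh's actual
       implementation: run_on_hosts and fake_remote are hypothetical names,
       and the fanout of 32 is the default documented for the -f option.

```python
from concurrent.futures import ThreadPoolExecutor


def run_on_hosts(hosts, run_remote, fanout=32):
    """Call run_remote(host) for every host, at most `fanout` at a time."""
    with ThreadPoolExecutor(max_workers=fanout) as pool:
        # A new call starts as soon as a worker becomes free, so the set
        # of in-flight hosts "slides" across the target list.
        return dict(zip(hosts, pool.map(run_remote, hosts)))


def fake_remote(host):
    # Hypothetical stand-in for a remote command; returns an exit status.
    return 0


statuses = run_on_hosts([f"foo{i}" for i in range(100)], fake_remote)
```

       With 100 targets and a fanout of 32, no more than 32 calls are in
       progress at any moment, which is the resource-conserving effect the
       paragraph above refers to.
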
       When pdsh receives SIGINT (ctrl-C), it lists the status of current
       threads. A second SIGINT within one second terminates the program.
       Pending threads may be canceled by issuing ctrl-Z within one second
       of ctrl-C. Pending threads are those that have not yet been
       initiated, or are still in the process of connecting to the remote
       host.

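       The ctrl-C and ctrl-Z timing rules above can be modeled as a small
       state machine. The sketch below is an assumption-laden illustration,
       not pdsh code: InterruptTracker and its return strings are
       hypothetical, timestamps are passed in explicitly, and treating
       exactly one second as "within one second" is a guess.

```python
class InterruptTracker:
    """Model of the SIGINT / ctrl-Z rules (hypothetical helper)."""

    def __init__(self, window=1.0):
        self.window = window      # seconds; "within one second" in the text
        self.last_sigint = None   # time of the most recent ctrl-C

    def _recent(self, now):
        return (self.last_sigint is not None
                and now - self.last_sigint <= self.window)

    def on_sigint(self, now):
        # Second ctrl-C inside the window terminates the program;
        # otherwise the status of current threads is listed.
        if self._recent(now):
            return "terminate"
        self.last_sigint = now
        return "status"

    def on_sigtstp(self, now):
        # ctrl-Z shortly after ctrl-C cancels pending threads. What happens
        # outside the window is not stated above; "ignore" is an assumption.
        return "cancel-pending" if self._recent(now) else "ignore"
```

       For example, a ctrl-C at t=10.0 followed by another at t=10.5 yields
       "status" and then "terminate".
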

       If a remote command is not specified on the command line, pdsh runs
       interactively, prompting for commands and executing them when
       terminated with a carriage return. In interactive mode, target nodes
       that time out on the first command are not contacted for subsequent
       commands, and commands prefixed with an exclamation point will be
       executed on the local system.

       The core functionality of pdsh may be supplemented by dynamically
       loadable modules. The modules may provide a new connection protocol
       (replacing the standard rcmd(3) protocol used by rsh(1)), filtering
       options (e.g., removing hosts that are "down" from the target list),
       and/or host selection options (e.g., -a selects all hosts from a
       configuration file). By default, pdsh must have at least one "rcmd"
       module loaded. See the RCMD MODULES section for more information.

RCMD MODULES
       The method by which pdsh runs commands on remote hosts may be
       selected at runtime using the -R option (see OPTIONS below). This
       functionality is ultimately implemented via dynamically loadable
       modules, and so the list of available options may differ from
       installation to installation. A list of currently available rcmd
       modules is printed when using any of the -h, -V, or -L options. The
       default rcmd module is also displayed with the -h and -V options.

       A list of rcmd modules currently distributed with pdsh follows.

       rsh    Uses an internal, thread-safe implementation of BSD rcmd(3)
              to run commands using the standard rsh(1) protocol.

       ssh    Uses a variant of popen(3) to run multiple copies of the
              ssh(1) command.

       mrsh   This module uses the mrsh(1) protocol to execute jobs on
              remote hosts. The mrsh protocol uses credential-based
              authentication, forgoing the need to allocate reserved
              ports. In other aspects, it acts just like rsh. Remote nodes
              must be running mrshd(8) in order for the mrsh module to
              work.

       qsh    Allows pdsh to execute MPI jobs over QsNet. Qshell
              propagates the current working directory, pdsh environment,
              and Elan capabilities to the remote process. The following
              environment variables are also appended to the environment:
              RMS_RANK, RMS_NODEID, RMS_PROCID, RMS_NNODES, and
              RMS_NPROCS. Since pdsh needs to run setuid root for qshell
              support, qshell does not directly support propagation of
              LD_LIBRARY_PATH and LD_PREOPEN. Instead, the
              QSHELL_REMOTE_LD_LIBRARY_PATH and QSHELL_REMOTE_LD_PREOPEN
              environment variables may be used; if set, they are remapped
              to LD_LIBRARY_PATH and LD_PREOPEN by the qshell daemon.

       mqsh   Similar to qshell, but uses the mrsh protocol instead of the
              rsh protocol.

       krb4   The krb4 module allows users to execute remote commands
              after authenticating with Kerberos. The remote rshd daemons
              must be kerberized.

       xcpu   The xcpu module uses the xcpu service to execute remote
              commands.

OPTIONS
       The list of available options is determined at runtime by
       supplementing the list of standard pdsh options with any options
       provided by loaded rcmd and misc modules. In some cases, options
       provided by modules may conflict with each other. In these cases,
       the modules are incompatible and the first module loaded wins.

   Target nodelist options
       -w [rcmd_type:][user@]host,host,...
              Target the specified list of hosts. Do not use with any
              other node selection options (e.g., -a or -g, if they are
              available). No spaces are allowed in the comma-separated
              list. A list consisting of a single `-' character causes
              the target hosts to be read from stdin, one per line. The
              host list may contain hostlist expressions of the form
              ``host[1-5,7]''. For more information about the hostlist
              format, see the HOSTLIST EXPRESSIONS section below. A list
              of hosts may also be preceded by "user@" to specify a remote
              username other than the default, or "rcmd_type:" to specify
              an alternate rcmd connection type for these hosts. When used
              together, the rcmd type must be specified first, e.g.,
              "ssh:user1@host0" would use ssh to connect to host0 as user
              "user1".

       -x host,host,...
              Exclude the specified hosts. May be specified in conjunction
              with other target node list options such as -a and -g (when
              available). Hostlists may also be specified to the -x option
              (see the HOSTLIST EXPRESSIONS section below).

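       The "[rcmd_type:][user@]hostlist" form accepted by -w can be taken
       apart as sketched below. This is a simplified illustration rather
       than pdsh's own parser: split_target is a hypothetical name, the
       hostlist is returned unexpanded, and hostnames that themselves
       contain ':' or '@' are not handled.

```python
def split_target(arg):
    """Split '[rcmd_type:][user@]hostlist' into (rcmd_type, user, hostlist)."""
    rcmd_type = user = None
    rest = arg
    if "@" in rest:
        # Everything before '@' is 'user' or 'rcmd_type:user'; the rcmd
        # type, when present, must come first.
        prefix, rest = rest.split("@", 1)
        if ":" in prefix:
            rcmd_type, user = prefix.split(":", 1)
        else:
            user = prefix
    elif ":" in rest:
        rcmd_type, rest = rest.split(":", 1)
    return rcmd_type, user, rest
```

       Under these assumptions, split_target("ssh:user1@host0") returns
       ("ssh", "user1", "host0"), matching the example in the -w
       description.
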

   General options
       -S     Return the largest of the remote command return values.

       -h     Output usage menu and quit. A list of available rcmd modules
              is also printed at the end of the usage message.

       -s     Only on AIX, separate remote command stderr and stdout into
              two sockets.

       -q     List option values and the target nodelist and exit without
              action.

       -b     Disable the ctrl-C status feature so that a single ctrl-C
              kills the parallel job. (Batch Mode)

       -l user
              This option may be used to run remote commands as another
              user, subject to authorization. For BSD rcmd, this means the
              invoking user and system must be listed in the user's
              .rhosts file (even for root).

       -t seconds
              Set the connect timeout. Default is 10 seconds.

       -u seconds
              Set a limit on the amount of time a remote command is
              allowed to execute. Default is no limit. See the note in
              LIMITATIONS if using -u with ssh.

       -f number
              Set the maximum number of simultaneous remote commands to
              number. The default is 32.

       -R name
              Set rcmd module to name. This option may also be set via the
              PDSH_RCMD_TYPE environment variable. A list of available
              rcmd modules may be obtained via the -h, -V, or -L options.
              The default is listed with -h or -V.

       -L     List info on all loaded pdsh modules and quit.

       -d     Include more complete thread status when SIGINT is received,
              and display connect and command time statistics on stderr
              when done.

       -V     Output pdsh version information, along with a list of
              currently loaded modules, and exit.

   qsh/mqsh module options
       -n tasks_per_node
              Set the number of tasks spawned per node. Default is 1.

       -m block | cyclic
              Set block versus cyclic allocation of processes to nodes.
              Default is block.

       -r railmask
              Set the rail bitmask for a job on a multirail system. The
              default railmask is 1, which corresponds to rail 0 only.
              Each bit set in the argument to -r corresponds to a rail on
              the system, so a value of 2 would correspond to rail 1 only,
              and 3 would indicate to use both rail 1 and rail 0.

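       The railmask arithmetic described for -r can be checked with a few
       lines of code. The helper below is a hypothetical illustration
       (rails_from_mask is not part of pdsh); it simply lists the bit
       positions set in the mask.

```python
def rails_from_mask(railmask):
    """Return the rail numbers selected by a -r style bitmask."""
    if railmask < 0:
        raise ValueError("railmask must be non-negative")
    rails = []
    bit = 0
    while railmask >> bit:
        if (railmask >> bit) & 1:   # bit N set means rail N is used
            rails.append(bit)
        bit += 1
    return rails
```

       This reproduces the values given above: a mask of 1 selects rail 0,
       2 selects rail 1, and 3 selects rails 0 and 1.
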

   machines module options
       -a     Target all nodes from machines file.

   genders module options
       In addition to the genders options presented below, the genders
       attribute pdsh_rcmd_type may also be used in the genders database
       to specify an rcmd connect type other than the pdsh default for
       hosts with this attribute. For example, the following line in the
       genders file

              host0 pdsh_rcmd_type=ssh

       would cause pdsh to use ssh to connect to host0, even if rsh were
       the default. This can be overridden on the command line with the
       "rcmd_type:host0" syntax.

       -A     Target all nodes in genders database. The -A option will
              target every host listed in genders. To omit some hosts by
              default, see the -a option below.

       -a     Target all nodes in genders database except those with the
              "pdsh_all_skip" attribute. This is shorthand for running
              "pdsh -A -X pdsh_all_skip ..."

       -g attr[=val][,attr[=val],...]
              Target nodes that match any of the specified genders
              attributes (with optional values). Conflicts with the -a and
              -w options. This option targets the alternate hostnames in
              the genders database by default. The -i option provided by
              the genders module may be used to translate these to the
              canonical genders hostnames. If the installed version of
              genders supports it, attributes supplied to -g may also take
              the form of genders queries. Genders queries will query the
              genders database for the union, intersection, difference, or
              complement of genders attributes and values. The set
              operation union is represented by two pipe symbols ('||'),
              intersection by two ampersand symbols ('&&'), difference by
              two minus symbols ('--'), and complement by a tilde ('~').
              Parentheses may be used to change the order of operations.
              See the nodeattr(1) manpage for examples of genders queries.

       -X attr[=val][,attr[=val],...]
              Exclude nodes that match any of the specified genders
              attributes (optionally with values). This option may be used
              in combination with any other of the node selection options
              (e.g., -w, -g, -a). -X may also take the form of genders
              queries. Please see the documentation for the genders -g
              option for more information about genders queries.

       -i     Request translation between canonical and alternate
              hostnames.

       -F filename
              Read genders information from filename instead of the system
              default genders file.

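       The four set operations used by genders queries, and the relation
       between -a and -A, map directly onto ordinary set arithmetic. The
       sketch below uses a made-up four-host database (GENDERS and
       with_attr are hypothetical) and ignores attribute values; it does
       not call the genders library.

```python
# Hypothetical genders data: hostname -> set of attributes.
GENDERS = {
    "host0": {"compute", "pdsh_all_skip"},
    "host1": {"compute"},
    "host2": {"login"},
    "host3": {"login", "mgmt"},
}
ALL = set(GENDERS)


def with_attr(attr):
    """Hosts that carry the given attribute."""
    return {host for host, attrs in GENDERS.items() if attr in attrs}


union = with_attr("compute") | with_attr("login")                # compute||login
intersection = with_attr("login") & with_attr("mgmt")            # login&&mgmt
difference = with_attr("compute") - with_attr("pdsh_all_skip")   # compute--pdsh_all_skip
complement = ALL - with_attr("compute")                          # ~compute
dash_a = ALL - with_attr("pdsh_all_skip")                        # -a, i.e. -A -X pdsh_all_skip
```

       In this toy database, dash_a contains every host except host0,
       which is what the -a description above implies.
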

   nodeupdown module options
       -v     Eliminate target nodes that are considered "down" by
              libnodeupdown.

   slurm module options
       The slurm module allows pdsh to target nodes based on currently
       running SLURM jobs. The slurm module is typically called after all
       other node selection options have been processed, and if no nodes
       have been selected, the module will attempt to read a running jobid
       from the SLURM_JOBID environment variable (which is set when
       running under a SLURM allocation). If SLURM_JOBID references an
       invalid job, it will be silently ignored.

       -j jobid[,jobid,...]
              Target list of nodes allocated to the SLURM job jobid. This
              option may be used multiple times to target multiple SLURM
              jobs.

   rms module options
       The rms module allows pdsh to target nodes based on an RMS
       resource. The rms module is typically called after all other node
       selection options, and if no nodes have been selected, the module
       will examine the RMS_RESOURCEID environment variable and attempt to
       set the target list of hosts to the nodes in the RMS resource. If
       an invalid resource is denoted, the variable is silently ignored.

   sdr module options
       The SDR module supports targeting hosts via the System Data
       Repository on IBM SPs.

       -a     Target all nodes in the SDR. The list is generated from the
              "reliable hostname" in the SDR by default.

       -i     Translate hostnames between reliable and initial in the SDR,
              when applicable. If a target hostname matches either the
              initial or reliable hostname in the SDR, the alternate name
              will be substituted. Thus a list composed of initial
              hostnames will instead be replaced with a list of reliable
              hostnames. For example, when used with -a above, all initial
              hostnames in the SDR are targeted.

       -v     Do not target nodes that are marked as not responding in the
              SDR on the targeted interface. (If a hostname does not
              appear in the SDR, then that name will remain in the target
              hostlist.)

       -G     In combination with -a, include all partitions.

   nodeattr module options
       The nodeattr module supports access to the genders database via the
       nodeattr(1) command. See the genders section above for a list of
       supported options with this module. The option usage with the
       nodeattr module is the same as genders, above, with the exception
       that the -i option may only be used with -a or -g.

   dshgroup module options
       The dshgroup module allows pdsh to use dsh (or Dancer's shell)
       style group files from /etc/dsh/group/ or ~/.dsh/group/.

       -g groupname,...
              Target nodes in dsh group file "groupname" found in either
              ~/.dsh/group/groupname or /etc/dsh/group/groupname.

       -X groupname,...
              Exclude nodes in dsh group file "groupname".

   netgroup module options
       The netgroup module allows pdsh to use standard netgroup entries
       (from /etc/netgroup or NIS) to build lists of target hosts.

       -g groupname,...
              Target nodes in netgroup "groupname".

       -X groupname,...
              Exclude nodes in netgroup "groupname".

ENVIRONMENT VARIABLES
       PDSH_RCMD_TYPE
              Equivalent to the -R option, the value of this environment
              variable will be used to set the default rcmd module for
              pdsh to use (e.g., ssh, rsh).

       PDSH_SSH_ARGS
              Override the standard arguments that pdsh passes to the
              ssh(1) command ("-2 -a -x").

       PDSH_SSH_ARGS_APPEND
              Append additional options to the ssh(1) command invoked by
              pdsh. For example, PDSH_SSH_ARGS_APPEND="-q" would run ssh
              in quiet mode, or "-v" would increase the verbosity of ssh.

       WCOLL  If no other node selection option is used, the WCOLL
              environment variable may be set to a filename from which a
              list of target hosts will be read. The file should contain a
              list of hosts, one per line, though each line may contain a
              hostlist expression (see the HOSTLIST EXPRESSIONS section
              below).

       DSHPATH
              If set, the path in DSHPATH will be used as the PATH for the
              remote processes.

       FANOUT Set the pdsh fanout (see the description of -f above).

HOSTLIST EXPRESSIONS
       As noted in the sections above, pdsh accepts lists of hosts in the
       general form prefix[n-m,l-k,...], where n < m and l < k, etc., as
       an alternative to explicit lists of hosts. This form should not be
       confused with regular expression character classes (also denoted by
       ``[]''). For example, foo[19] does not represent an expression
       matching foo1 or foo9, but rather represents the degenerate
       hostlist: foo19.

       The hostlist syntax is meant only as a convenience on clusters with
       a "prefixNNN" naming convention, and specification of ranges should
       not be considered necessary. Thus foo1,foo9 could be specified as
       such, or by the hostlist foo[1,9].

       Some examples of usage follow:

       Run command on foo01,foo02,...,foo05
              pdsh -w foo[01-05] command

       Run command on foo7,foo9,foo10
              pdsh -w foo[7,9-10] command

       Run command on foo0,foo4,foo5
              pdsh -w foo[0-5] -x foo[1-3] command

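       The expansion rules above can be made concrete with a short sketch.
       It is not pdsh's hostlist parser: expand_hostlist and targets are
       hypothetical helpers, only a single prefix[...] expression is
       handled (no comma-separated list of expressions, no suffix after
       the brackets), and preserving the zero padding of the lower bound
       is an assumption based on the foo[01-05] example.

```python
import re


def expand_hostlist(expr):
    """Expand 'prefix[n-m,l-k,...]' into a list of hostnames."""
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]", expr)
    if match is None:
        return [expr]               # plain hostname, nothing to expand
    prefix, ranges = match.groups()
    hosts = []
    for part in ranges.split(","):
        low, _, high = part.partition("-")
        high = high or low          # "7" is the one-element range 7-7
        width = len(low)            # keep zero padding, e.g. 01 -> 05
        for n in range(int(low), int(high) + 1):
            hosts.append(f"{prefix}{n:0{width}d}")
    return hosts


def targets(include, exclude=None):
    """Hosts named by a -w style expression minus those named by -x."""
    skipped = set(expand_hostlist(exclude)) if exclude else set()
    return [h for h in expand_hostlist(include) if h not in skipped]
```

       Under these assumptions, targets("foo[0-5]", "foo[1-3]") yields
       foo0, foo4, and foo5, and expand_hostlist("foo[19]") yields the
       single host foo19, in line with the examples above.
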

       As a reminder to the reader, some shells will interpret brackets
       ('[' and ']') for pattern matching. Depending on your shell, it may
       be necessary to enclose ranged lists within quotes. For example, in
       tcsh, the first example above should be executed as:

              pdsh -w "foo[01-05]" command

ORIGIN
       Originally a rewrite of IBM dsh(1) by Jim Garlick
       <garlick@llnl.gov> on LLNL's ASCI Blue-Pacific IBM SP system. It is
       now used on Linux clusters at LLNL.

LIMITATIONS
       When using ssh for remote execution, expect the stderr of ssh to be
       folded in with that of the remote command. When invoked by pdsh,
       ssh cannot prompt for input, for example for a password when
       RSA/DSA keys are not configured properly. Additionally, the connect
       timeout is not adjustable when ssh is used. Finally, there is no
       reliable way for pdsh to ensure that remote commands are actually
       terminated when using a command timeout. Thus, if -u is used with
       ssh, commands may be left running on remote hosts even after the
       timeout has killed the local ssh processes.

       Output from multiple processes per node may be interspersed when
       using the qshell or mqshell rcmd modules.

       Hostlist parsing assumes the numerical part of a hostname is at the
       end only; e.g., specifying foo[0-5]bar will not work.

       The number of nodes on which pdsh can simultaneously execute remote
       jobs is limited by the maximum number of threads that can be
       created concurrently, as well as the availability of reserved ports
       in the rsh and qshell rcmd modules. On systems that implement POSIX
       threads, the limit is typically defined by the constant
       PTHREAD_THREADS_MAX.

SEE ALSO
       rsh(1), ssh(1), dshbak(1), pdcp(1)

pdsh-2.11                          linux-gnu                          pdsh(1)