pdsh(1)                   General Commands Manual                  pdsh(1)



pdsh - issue commands to groups of hosts in parallel


pdsh [options]... command


pdsh is a variant of the rsh(1) command. Unlike rsh(1), which runs
commands on a single remote host, pdsh can run multiple remote commands
in parallel. pdsh uses a "sliding window" (or fanout) of threads: at
most fanout remote commands run at once, and a new one is started as
each running command completes. This conserves resources on the
initiating host while allowing some connections to time out.
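
The sliding window can be pictured as a fixed-size worker pool. The
Python sketch below is a model, not pdsh's implementation (pdsh is a C
program). The helper names run_with_fanout and fake_remote_command are
hypothetical, and the default of 32 mirrors the -f default. A queued
host starts only when one of the running commands finishes, so no more
than fanout commands are ever in flight.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_fanout(hosts, run_one, fanout=32):
    """Run run_one(host) for every host, keeping at most `fanout` in flight.

    ThreadPoolExecutor creates at most `fanout` worker threads; each worker
    picks up the next queued host as soon as its current one finishes, which
    gives the sliding-window behaviour. Returns {host: result}.
    """
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=fanout) as pool:
        results = list(pool.map(run_one, hosts))
    return dict(zip(hosts, results))


# Hypothetical stand-in for a remote command that records peak concurrency.
_lock = threading.Lock()
_active = 0
peak = 0


def fake_remote_command(host):
    global _active, peak
    with _lock:
        _active += 1
        peak = max(peak, _active)
    time.sleep(0.01)  # simulate network latency and remote work
    with _lock:
        _active -= 1
    return 0


status = run_with_fanout([f"node{i}" for i in range(10)], fake_remote_command, fanout=3)
print(len(status), peak)  # 10 hosts ran; peak is at most 3
```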

When pdsh receives SIGINT (ctrl-C), it lists the status of current
threads. A second SIGINT within one second terminates the program.
Pending threads may be canceled by issuing ctrl-Z within one second of
ctrl-C. Pending threads are those that have not yet been initiated, or
that are still in the process of connecting to the remote host.
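
The one-second rules above amount to a small state machine. The Python
sketch below models them with an injectable clock, so it can be
exercised without real signals. The class InterruptPolicy and its
return strings are hypothetical. A ctrl-Z that arrives outside the
window is not described here, so returning "ignore" for it is an
assumption. The batch flag models -b, under which a single ctrl-C kills
the job.

```python
import time


class InterruptPolicy:
    """Hypothetical model of the ctrl-C / ctrl-Z rules described above."""

    WINDOW = 1.0  # seconds; "within one second"

    def __init__(self, clock=time.monotonic, batch=False):
        self.clock = clock          # injectable so tests need no sleeping
        self.batch = batch          # models -b: a single ctrl-C terminates
        self.last_sigint = None     # time of the most recent status ctrl-C

    def _within_window(self, now):
        return self.last_sigint is not None and now - self.last_sigint <= self.WINDOW

    def on_sigint(self):
        now = self.clock()
        if self.batch or self._within_window(now):
            return "terminate"
        self.last_sigint = now
        return "show_status"        # list the status of current threads

    def on_sigtstp(self):
        if self._within_window(self.clock()):
            return "cancel_pending"  # drop threads not yet started or connecting
        return "ignore"              # assumption: behaviour outside the window unspecified


t = [0.0]
policy = InterruptPolicy(clock=lambda: t[0])
print(policy.on_sigint())    # show_status
t[0] = 0.5
print(policy.on_sigtstp())   # cancel_pending
t[0] = 0.8
print(policy.on_sigint())    # terminate
```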


If a remote command is not specified on the command line, pdsh runs
interactively, prompting for commands and executing each one when it is
terminated with a carriage return. In interactive mode, target nodes
that time out on the first command are not contacted for subsequent
commands, and commands prefixed with an exclamation point are executed
on the local system.
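
The following Python sketch shows the two interactive-mode rules. The
InteractiveSession class and its run_remote(host, cmd) and
run_local(cmd) callables are hypothetical. The sketch counts only a
remote command as the "first command" for the timeout rule. Whether a
leading "!" command counts is not stated here, so that choice is an
assumption.

```python
class InteractiveSession:
    """Hypothetical model of pdsh's interactive-mode rules."""

    def __init__(self, targets, run_remote, run_local):
        self.targets = list(targets)
        self.run_remote = run_remote  # (host, cmd) -> "ok" or "timeout"
        self.run_local = run_local    # cmd -> None, runs on the local system
        self.first_remote = True

    def handle(self, line):
        cmd = line.strip()
        if not cmd:
            return None
        if cmd.startswith("!"):
            # Commands prefixed with "!" run locally, not on the targets.
            self.run_local(cmd[1:])
            return None
        results = {host: self.run_remote(host, cmd) for host in self.targets}
        if self.first_remote:
            # Nodes that time out on the first command are dropped for good.
            self.targets = [h for h in self.targets if results[h] != "timeout"]
            self.first_remote = False
        return results


local_log = []
slow = {"b"}
session = InteractiveSession(
    ["a", "b", "c"],
    run_remote=lambda h, c: "timeout" if h in slow else "ok",
    run_local=local_log.append,
)
session.handle("uptime")   # b times out on the first command
session.handle("!date")    # runs locally
print(session.targets, local_log)  # ['a', 'c'] ['date']
```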

The core functionality of pdsh may be supplemented by dynamically
loadable modules. The modules may provide a new connection protocol
(replacing the standard rcmd(3) protocol used by rsh(1)), filtering
options (e.g., removing hosts that are "down" from the target list),
and/or host selection options (e.g., -a selects all hosts from a
configuration file). By default, pdsh must have at least one "rcmd"
module loaded. See the RCMD MODULES section for more information.


The method by which pdsh runs commands on remote hosts may be selected
at runtime using the -R option (see OPTIONS below). This functionality
is ultimately implemented via dynamically loadable modules, so the list
of available rcmd modules may differ from installation to installation.
A list of currently available rcmd modules is printed when using any of
the -h, -V, or -L options. The default rcmd module is also displayed
with the -h and -V options.

A list of rcmd modules currently distributed with pdsh follows.

rsh    Uses an internal, thread-safe implementation of BSD rcmd(3) to
       run commands using the standard rsh(1) protocol.

exec   Executes an arbitrary command for each target host. The first
       of the pdsh remote arguments is the local command to execute,
       followed by any further arguments. Some simple parameters are
       substituted on the command line, including %h for the target
       hostname, %u for the remote username, and %n for the remote
       rank [0-n] (to get a literal %, use %%). For example, the
       following would be equivalent to using the ssh module to run
       hostname(1) across the hosts foo[0-10]:

              pdsh -R exec -w foo[0-10] ssh -x -l %u %h hostname

       and this command line would run grep(1) in parallel across the
       files console.foo[0-10]:

              pdsh -R exec -w foo[0-10] grep BUG console.%h


ssh    Uses a variant of popen(3) to run multiple copies of the ssh(1)
       command.

mrsh   This module uses the mrsh(1) protocol to execute jobs on remote
       hosts. The mrsh protocol uses credential-based authentication,
       forgoing the need to allocate reserved ports. In other aspects,
       it acts just like rsh. Remote nodes must be running mrshd(8) in
       order for the mrsh module to work.

qsh    Allows pdsh to execute MPI jobs over QsNet. Qshell propagates
       the current working directory, pdsh environment, and Elan
       capabilities to the remote process. The following environment
       variables are also appended to the environment: RMS_RANK,
       RMS_NODEID, RMS_PROCID, RMS_NNODES, and RMS_NPROCS. Since pdsh
       needs to run setuid root for qshell support, qshell does not
       directly support propagation of LD_LIBRARY_PATH and LD_PREOPEN.
       Instead, the QSHELL_REMOTE_LD_LIBRARY_PATH and
       QSHELL_REMOTE_LD_PREOPEN environment variables may be used and
       will be remapped to LD_LIBRARY_PATH and LD_PREOPEN by the qshell
       daemon if set.

mqsh   Similar to qshell, but uses the mrsh protocol instead of the
       rsh protocol.

krb4   The krb4 module allows users to execute remote commands after
       authenticating with Kerberos. The remote rshd daemons must be
       kerberized.

xcpu   The xcpu module uses the xcpu service to execute remote
       commands.
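
The exec module's parameter substitution can be modelled as a single
pass over each argument. This Python sketch uses the hypothetical
helpers substitute and exec_argv and passes the rank in explicitly. It
leaves any other %-sequence untouched, which may not match what pdsh
does with unknown sequences.

```python
import re


def substitute(arg, host, user, rank):
    """Expand %h, %u, %n and %% in one exec-module argument.

    Unknown %-sequences are left as-is (an assumption of this sketch).
    """
    values = {"h": host, "u": user, "n": str(rank), "%": "%"}
    return re.sub(r"%(.)", lambda m: values.get(m.group(1), m.group(0)), arg)


def exec_argv(args, host, user, rank):
    """Local argv the exec module would run for one target host."""
    return [substitute(a, host, user, rank) for a in args]


print(exec_argv(["ssh", "-x", "-l", "%u", "%h", "hostname"], "foo3", "alice", 3))
# ['ssh', '-x', '-l', 'alice', 'foo3', 'hostname']
print(exec_argv(["grep", "BUG", "console.%h"], "foo7", "alice", 7))
# ['grep', 'BUG', 'console.foo7']
```

Because the whole argument is scanned once, "%%h" yields a literal "%h"
rather than a host name.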


The list of available options is determined at runtime by supplementing
the list of standard pdsh options with any options provided by loaded
rcmd and misc modules. In some cases, options provided by modules may
conflict with each other. In these cases, the modules are incompatible
and the first module loaded wins.


-w TARGETS,...
       Target and/or filter the specified list of hosts. Do not use
       with any other node selection options (e.g., -a, -g, if they are
       available). No spaces are allowed in the comma-separated list.
       Arguments in the TARGETS list may include normal host names, a
       range of hosts in hostlist format (see HOSTLIST EXPRESSIONS), or
       a single `-' character to read the list of hosts on stdin.

       If a host or hostlist is preceded by a `-' character, those
       hosts are explicitly excluded. If the argument is preceded by a
       single `^' character, it is taken to be the path to a file
       containing a list of hosts, one per line. If the item begins
       with a `/' character, it is taken as a regular expression on
       which to filter the list of hosts (a regex argument may also be
       optionally trailed by another `/', e.g., /node.*/). A regex or
       file name argument may also be preceded by a minus `-' to
       exclude those hosts instead of including them.

       A list of hosts may also be preceded by "user@" to specify a
       remote username other than the default, or "rcmd_type:" to
       specify an alternate rcmd connection type for these hosts. When
       used together, the rcmd type must be specified first, e.g.,
       "ssh:user1@host0" would use ssh to connect to host0 as user
       "user1".


-x host,host,...
       Exclude the specified hosts. May be specified in conjunction
       with other target node list options such as -a and -g (when
       available). Hostlists may also be specified to the -x option
       (see the HOSTLIST EXPRESSIONS section below). Arguments to -x
       may also be preceded by the filename (`^') and regex (`/')
       characters as described above, in which case the resulting
       hosts are excluded as if they had been given to -w and preceded
       with the minus `-' character.


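The prefixes accepted by -w and -x can be summarised as a small
classifier. The Python sketch below is illustrative only.
split_targets and classify_target are hypothetical names. The sketch
does not expand hostlist ranges or read files, and it assumes host
names never contain `:' or `@' themselves. Commas inside [...] belong
to a hostlist range, so the top-level split must skip them.

```python
from typing import NamedTuple, Optional


class Target(NamedTuple):
    exclude: bool              # leading "-" (except a lone "-")
    kind: str                  # "stdin", "file", "regex" or "hosts"
    rcmd_type: Optional[str]   # from an "rcmd_type:" prefix
    user: Optional[str]        # from a "user@" prefix
    value: Optional[str]       # hosts, file path or regex pattern


def split_targets(spec):
    """Split a -w/-x list on commas that are not inside [...] ranges."""
    parts, depth, current = [], 0, []
    for ch in spec:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def classify_target(arg):
    """Classify one list item according to the prefix rules above."""
    if arg == "-":
        return Target(False, "stdin", None, None, None)
    exclude = arg.startswith("-")
    if exclude:
        arg = arg[1:]
    if arg.startswith("^"):
        return Target(exclude, "file", None, None, arg[1:])
    if arg.startswith("/"):
        pattern = arg[1:]
        if pattern.endswith("/"):  # optional trailing "/"
            pattern = pattern[:-1]
        return Target(exclude, "regex", None, None, pattern)
    rcmd_type = user = None
    if ":" in arg:                 # the rcmd type must come first
        rcmd_type, arg = arg.split(":", 1)
    if "@" in arg:
        user, arg = arg.split("@", 1)
    return Target(exclude, "hosts", rcmd_type, user, arg)


for item in split_targets("ssh:user1@host0,foo[7,9-10],-foo8,^/tmp/hosts,/node.*/"):
    print(classify_target(item))
```
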
-S     Return the largest of the remote command return values.

-h     Output usage menu and quit. A list of available rcmd modules
       will also be printed at the end of the usage message.

-s     Only on AIX, separate remote command stderr and stdout into two
       sockets.

-q     List option values and the target nodelist and exit without
       action.

-b     Disable the ctrl-C status feature so that a single ctrl-C kills
       the parallel job. (Batch Mode)

-l user
       This option may be used to run remote commands as another user,
       subject to authorization. For BSD rcmd, this means the invoking
       user and system must be listed in the user's .rhosts file (even
       for root).

-t seconds
       Set the connect timeout. Default is 10 seconds.

-u seconds
       Set a limit on the amount of time a remote command is allowed to
       execute. Default is no limit. See the note in LIMITATIONS if
       using -u with ssh.

-f number
       Set the maximum number of simultaneous remote commands to
       number. The default is 32.

-R name
       Set rcmd module to name. This option may also be set via the
       PDSH_RCMD_TYPE environment variable. A list of available rcmd
       modules may be obtained via the -h, -V, or -L options. The
       default will be listed with -h or -V.

-M name,...
       When multiple misc modules provide the same options to pdsh, the
       first module initialized "wins" and subsequent modules are not
       loaded. The -M option allows a list of modules to be specified
       that will be force-initialized before all others, in effect
       ensuring that they load without conflict (unless they conflict
       with each other). This option may also be set via the
       PDSH_MISC_MODULES environment variable.

-L     List info on all loaded pdsh modules and quit.

-N     Disable hostname: prefix on lines of output.

-d     Include more complete thread status when SIGINT is received, and
       display connect and command time statistics on stderr when done.

-V     Output pdsh version information, along with a list of currently
       loaded modules, and exit.


-n tasks_per_node
       Set the number of tasks spawned per node. Default is 1.

-m block | cyclic
       Set block versus cyclic allocation of processes to nodes.
       Default is block.

-r railmask
       Set the rail bitmask for a job on a multirail system. The
       default railmask is 1, which corresponds to rail 0 only. Each
       bit set in the argument to -r corresponds to a rail on the
       system, so a value of 2 would correspond to rail 1 only, and 3
       would indicate to use both rail 1 and rail 0.


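The railmask arithmetic is plain bit testing: bit i set means rail i is
used. The Python sketch below reproduces the three values worked
through above. The function name rails_from_mask is hypothetical, and
rejecting a zero mask is the sketch's own choice.

```python
def rails_from_mask(railmask):
    """Return the rail numbers selected by a -r railmask (bit i -> rail i)."""
    if railmask <= 0:
        raise ValueError("railmask must select at least one rail")
    return [bit for bit in range(railmask.bit_length()) if (railmask >> bit) & 1]


print(rails_from_mask(1))  # [0]
print(rails_from_mask(2))  # [1]
print(rails_from_mask(3))  # [0, 1]
```
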
-a     Target all nodes from machines file.


In addition to the genders options presented below, the genders
attribute pdsh_rcmd_type may also be used in the genders database to
specify an rcmd connection type other than the pdsh default for hosts
with this attribute. For example, the following line in the genders
file

       host0 pdsh_rcmd_type=ssh

would cause pdsh to use ssh to connect to host0, even if rsh were the
default. This can be overridden on the command line with the
"rcmd_type:host0" syntax.


-A     Target all nodes in genders database. The -A option will target
       every host listed in genders -- if you want to omit some hosts
       by default, see the -a option below.

-a     Target all nodes in genders database except those with the
       "pdsh_all_skip" attribute. This is shorthand for running "pdsh
       -A -X pdsh_all_skip ..."

-g attr[=val][,attr[=val],...]
       Target nodes that match any of the specified genders attributes
       (with optional values). Conflicts with the -a option. If used in
       combination with other node selection options like -w, the -g
       option will select from the supplied node list, instead of from
       the genders file as a whole. Otherwise, this option targets the
       alternate hostnames in the genders database by default. The -i
       option provided by the genders module may be used to translate
       these to the canonical genders hostnames. If the installed
       version of genders supports it, attributes supplied to -g may
       also take the form of genders queries. Genders queries will
       query the genders database for the union, intersection,
       difference, or complement of genders attributes and values. The
       set operation union is represented by two pipe symbols ('||'),
       intersection by two ampersand symbols ('&&'), difference by two
       minus symbols ('--'), and complement by a tilde ('~').
       Parentheses may be used to change the order of operations. See
       the nodeattr(1) manpage for examples of genders queries.

-X attr[=val][,attr[=val],...]
       Exclude nodes that match any of the specified genders attributes
       (optionally with values). This option may be used in combination
       with any other of the node selection options (e.g., -w, -g, -a).
       Arguments to -X may also take the form of genders queries. Please
       see the documentation for the genders -g option for more
       information about genders queries.

-i     Request translation between canonical and alternate hostnames.

-F filename
       Read genders information from filename instead of the system
       default genders file. If filename doesn't specify an absolute
       path, then it is taken to be relative to the directory specified
       by the PDSH_GENDERS_DIR environment variable (/etc by default).
       An alternate genders file may also be specified via the
       PDSH_GENDERS_FILE environment variable.


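The four genders query operators correspond to ordinary set operations
over the hosts that carry each attribute. The Python sketch below uses
a tiny, made-up in-memory database (GENDERS, hosts_with) rather than
libgenders. It only evaluates the operators. Parsing the query string
itself is left to genders (see nodeattr(1)), so the query in each
comment is only a rough counterpart.

```python
# Hypothetical genders data: host -> {attribute: value or None}.
GENDERS = {
    "node0": {"compute": None, "rack": "1"},
    "node1": {"compute": None, "rack": "2"},
    "node2": {"login": None, "rack": "1"},
    "node3": {"compute": None, "pdsh_all_skip": None},
}
ALL_HOSTS = set(GENDERS)


def hosts_with(attr, val=None):
    """Hosts having attr (and, if val is given, attr=val)."""
    return {h for h, attrs in GENDERS.items()
            if attr in attrs and (val is None or attrs[attr] == val)}


compute, rack1 = hosts_with("compute"), hosts_with("rack", "1")
union = compute | rack1              # roughly: compute||rack=1
intersection = compute & rack1       # roughly: compute&&rack=1
difference = compute - rack1         # roughly: compute--rack=1
complement = ALL_HOSTS - compute     # roughly: ~compute
# -a is documented as shorthand for -A -X pdsh_all_skip:
dash_a = ALL_HOSTS - hosts_with("pdsh_all_skip")

print(sorted(intersection), sorted(difference), sorted(complement))
# ['node0'] ['node1', 'node3'] ['node2']
```
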
-v     Eliminate target nodes that are considered "down" by
       libnodeupdown.


The slurm module allows pdsh to target nodes based on currently running
SLURM jobs. The slurm module is typically called after all other node
selection options have been processed, and if no nodes have been
selected, the module will attempt to read a running jobid from the
SLURM_JOBID environment variable (which is set when running under a
SLURM allocation). If SLURM_JOBID references an invalid job, it will be
silently ignored.

-j jobid[,jobid,...]
       Target the list of nodes allocated to the SLURM job jobid. This
       option may be used multiple times to target multiple SLURM jobs.
       The special argument "all" can be used to target all nodes
       running SLURM jobs, e.g., -j all.

-P partition[,partition,...]
       Target the list of nodes contained in the SLURM partition
       partition. This option may be used multiple times to target
       multiple SLURM partitions, and/or partitions may be given in a
       comma-delimited list.


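The fallback order described for the slurm module can be sketched in
Python as follows. slurm_fallback and the job_nodes lookup are
hypothetical. A real implementation would query SLURM, whereas here the
lookup is a caller-supplied function that returns None for an unknown
job.

```python
import os


def slurm_fallback(selected, job_nodes, environ=os.environ):
    """Return the final target list after other selection options ran.

    If nothing was selected, try SLURM_JOBID; an invalid job id is ignored
    silently, leaving the target list empty.
    """
    if selected:
        return list(selected)
    jobid = environ.get("SLURM_JOBID")
    if jobid is None:
        return []
    nodes = job_nodes(jobid)  # None means "no such job"
    return list(nodes) if nodes else []


jobs = {"42": ["n1", "n2"]}  # hypothetical job table
print(slurm_fallback([], jobs.get, {"SLURM_JOBID": "42"}))  # ['n1', 'n2']
print(slurm_fallback([], jobs.get, {"SLURM_JOBID": "99"}))  # []
```

The torque module follows the same pattern with PBS_JOBID.
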
The torque module allows pdsh to target nodes based on currently
running Torque/PBS jobs. Similar to the slurm module, the torque module
is typically called after all other node selection options have been
processed, and if no nodes have been selected, the module will attempt
to read a running jobid from the PBS_JOBID environment variable (which
is set when running under a Torque allocation).

-j jobid[,jobid,...]
       Target the list of nodes allocated to the Torque job jobid. This
       option may be used multiple times to target multiple Torque
       jobs.


The rms module allows pdsh to target nodes based on an RMS resource.
The rms module is typically called after all other node selection
options, and if no nodes have been selected, the module will examine
the RMS_RESOURCEID environment variable and attempt to set the target
list of hosts to the nodes in the RMS resource. If an invalid resource
is denoted, the variable is silently ignored.


The SDR module supports targeting hosts via the System Data Repository
on IBM SPs.

-a     Target all nodes in the SDR. The list is generated from the
       "reliable hostname" in the SDR by default.

-i     Translate hostnames between reliable and initial in the SDR,
       when applicable. If a target hostname matches either the
       initial or reliable hostname in the SDR, the alternate name will
       be substituted. Thus a list composed of initial hostnames will
       instead be replaced with a list of reliable hostnames. For
       example, when used with -a above, all initial hostnames in the
       SDR are targeted.

-v     Do not target nodes that are marked as not responding in the SDR
       on the targeted interface. (If a hostname does not appear in the
       SDR, then that name will remain in the target hostlist.)

-G     In combination with -a, include all partitions.


The nodeattr module supports access to the genders database via the
nodeattr(1) command. See the genders section above for a list of
supported options with this module. The option usage with the nodeattr
module is the same as genders, above, with the exception that the -i
option may only be used with -a or -g. NOTE: This module will only work
with very old releases of genders where the nodeattr(1) command
supports the -r option, and before the libgenders API was available.
Users running newer versions of genders will need to use the genders
module instead.


The dshgroup module allows pdsh to use dsh (or Dancer's shell) style
group files from /etc/dsh/group/ or ~/.dsh/group/. The default search
path may be overridden with the DSHGROUP_PATH environment variable, a
colon-separated list of directories to search. The default value for
DSHGROUP_PATH is /etc/dsh/group.

-g groupname,...
       Target nodes in dsh group file "groupname" found in either
       ~/.dsh/group/groupname or /etc/dsh/group/groupname.

-X groupname,...
       Exclude nodes in dsh group file "groupname".

As an enhancement in pdsh, dshgroup files may optionally include other
dshgroup files via a special #include STRING syntax. The argument to
#include may be either a file path or a group name. A group name is
searched for along the same path as if the group had been specified
to -g.


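The following Python sketch shows one way a dshgroup lookup with
#include might resolve. read_group and find_group are hypothetical
names, and the sketch rests on several assumptions:

- #include lines look exactly like "#include NAME".
- Other lines starting with "#" are comments.
- An argument containing a "/" is a file path; anything else is a group
  name searched along a DSHGROUP_PATH-style colon-separated list.

It also adds a guard against include cycles that the text above does
not mention.

```python
import os


def find_group(name, search_path):
    """Return the first existing file `name` in the colon-separated path."""
    for directory in search_path.split(":"):
        candidate = os.path.join(os.path.expanduser(directory), name)
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError(f"dsh group {name!r} not found in {search_path}")


def read_group(name, search_path, _seen=None):
    """Read hosts from a dsh group file, following '#include' lines."""
    seen = set() if _seen is None else _seen
    path = name if "/" in name else find_group(name, search_path)
    real = os.path.realpath(path)
    if real in seen:  # skip files already read (sketch's own cycle guard)
        return []
    seen.add(real)
    hosts = []
    with open(path) as fh:
        for raw in fh:
            line = raw.strip()
            parts = line.split(None, 1)
            if len(parts) == 2 and parts[0] == "#include":
                hosts.extend(read_group(parts[1], search_path, seen))
            elif line and not line.startswith("#"):
                hosts.append(line)
    return hosts
```

For example, read_group("all", "/etc/dsh/group") would return the hosts
listed in /etc/dsh/group/all, followed at each #include line by the
hosts of the included group.
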
The netgroup module allows pdsh to use standard netgroup entries (from
/etc/netgroup or NIS) to build lists of target hosts.

-g groupname,...
       Target nodes in netgroup "groupname".

-X groupname,...
       Exclude nodes in netgroup "groupname".


PDSH_RCMD_TYPE
       Equivalent to the -R option, the value of this environment
       variable will be used to set the default rcmd module for pdsh to
       use (e.g., ssh, rsh).

PDSH_SSH_ARGS
       Override the standard arguments that pdsh passes to the ssh(1)
       command ("-2 -a -x -l%u %h"). The use of the parameters %u, %h,
       and %n (as documented in the rcmd/exec section above) is
       optional. If these parameters are missing, pdsh will append them
       to the ssh command line because it is assumed they are mandatory.

PDSH_SSH_ARGS_APPEND
       Append additional options to the ssh(1) command invoked by pdsh.
       For example, PDSH_SSH_ARGS_APPEND="-q" would run ssh in quiet
       mode, or "-v" would increase the verbosity of ssh. (Note: these
       arguments are actually prepended to the ssh command line to
       ensure they appear before any target hostname argument to ssh.)

WCOLL  If no other node selection option is used, the WCOLL environment
       variable may be set to a filename from which a list of target
       hosts will be read. The file should contain a list of hosts, one
       per line (though each line may contain a hostlist expression;
       see the HOSTLIST EXPRESSIONS section below).

DSHPATH
       If set, the path in DSHPATH will be used as the PATH for the
       remote processes.

FANOUT Set the pdsh fanout (see the description of -f above).


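The Python sketch below is one way to picture how the two ssh variables
combine. It is a model under stated assumptions, not pdsh's code:

- build_ssh_argv is a hypothetical name.
- When %u or %h is missing from PDSH_SSH_ARGS, the sketch appends it in
  the form used by the default string ("-l%u" and "%h").
- The PDSH_SSH_ARGS_APPEND options go directly after "ssh".

```python
import re
import shlex

DEFAULT_SSH_ARGS = "-2 -a -x -l%u %h"  # default quoted under PDSH_SSH_ARGS


def _subst(arg, values):
    # One pass, so a substituted value is never re-expanded.
    return re.sub(r"%([hun%])", lambda m: values[m.group(1)], arg)


def build_ssh_argv(command, host, user, rank, environ):
    """Hypothetical model of how PDSH_SSH_ARGS and PDSH_SSH_ARGS_APPEND combine."""
    args = environ.get("PDSH_SSH_ARGS", DEFAULT_SSH_ARGS)
    # Assumption: missing mandatory parameters are appended in default form.
    if "%u" not in args:
        args += " -l%u"
    if "%h" not in args:
        args += " %h"
    values = {"h": host, "u": user, "n": str(rank), "%": "%"}
    expanded = [_subst(a, values) for a in shlex.split(args)]
    # PDSH_SSH_ARGS_APPEND options are placed before the host argument.
    extra = shlex.split(environ.get("PDSH_SSH_ARGS_APPEND", ""))
    return ["ssh"] + extra + expanded + [command]


print(build_ssh_argv("uptime", "node1", "alice", 0, {}))
# ['ssh', '-2', '-a', '-x', '-lalice', 'node1', 'uptime']
print(build_ssh_argv("uptime", "node1", "alice", 0, {"PDSH_SSH_ARGS_APPEND": "-q"}))
# ['ssh', '-q', '-2', '-a', '-x', '-lalice', 'node1', 'uptime']
```
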
As noted in sections above, pdsh accepts lists of hosts in the general
form prefix[n-m,l-k,...], where n < m and l < k, etc., as an
alternative to explicit lists of hosts. This form should not be
confused with regular expression character classes (also denoted by
``[]''). For example, foo[19] does not represent an expression matching
foo1 or foo9, but rather represents the degenerate hostlist: foo19.

The hostlist syntax is meant only as a convenience on clusters with a
"prefixNNN" naming convention, and specification of ranges should not
be considered necessary -- thus foo1,foo9 could be specified as such,
or by the hostlist foo[1,9].
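
The expansion rules can be modelled in a few lines of Python. This
sketch (expand_hostlist is a hypothetical name) handles a single
bracketed range list with an optional suffix, as in the examples in
this section. It keeps the zero padding of each lower bound. It makes
no attempt to cover every form pdsh itself may accept.

```python
import re


def expand_hostlist(expr):
    """Expand one hostlist expression such as foo[01-05] or foo[0-3]-eth0.

    Each range keeps the zero padding of its lower bound, so foo[01-05]
    yields foo01 ... foo05.
    """
    m = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", expr)
    if m is None:
        return [expr]  # plain host name, no range
    prefix, ranges, suffix = m.groups()
    hosts = []
    for part in ranges.split(","):
        lo, _, hi = part.partition("-")
        width = len(lo)
        for n in range(int(lo), int(hi or lo) + 1):
            hosts.append(f"{prefix}{n:0{width}d}{suffix}")
    return hosts


print(expand_hostlist("foo[7,9-10]"))  # ['foo7', 'foo9', 'foo10']
# Emulating "-w foo[0-5] -x foo[1-3]":
excluded = set(expand_hostlist("foo[1-3]"))
print([h for h in expand_hostlist("foo[0-5]") if h not in excluded])
# ['foo0', 'foo4', 'foo5']
```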

Some examples of usage follow:


Run command on foo01,foo02,...,foo05
       pdsh -w foo[01-05] command

Run command on foo7,foo9,foo10
       pdsh -w foo[7,9-10] command

Run command on foo0,foo4,foo5
       pdsh -w foo[0-5] -x foo[1-3] command


A suffix on the hostname is also supported:


Run command on foo0-eth0,foo1-eth0,foo2-eth0,foo3-eth0
       pdsh -w foo[0-3]-eth0 command


As a reminder to the reader, some shells will interpret brackets ('['
and ']') for pattern matching. Depending on your shell, it may be
necessary to enclose ranged lists within quotes. For example, in tcsh,
the first example above should be executed as:

       pdsh -w "foo[01-05]" command


Originally a rewrite of IBM dsh(1) by Jim Garlick <garlick@llnl.gov> on
LLNL's ASCI Blue-Pacific IBM SP system. It is now used on Linux
clusters at LLNL.


When using ssh for remote execution, expect the stderr of ssh to be
folded in with that of the remote command. When invoked by pdsh, ssh
cannot prompt for passwords, so RSA/DSA keys, etc., must be configured
properly. For ssh implementations that support a connect timeout
option, pdsh attempts to use that option to enforce the timeout (e.g.,
-oConnectTimeout=T for OpenSSH); otherwise, connect timeouts are not
supported when using ssh. Finally, there is no reliable way for pdsh to
ensure that remote commands are actually terminated when using a
command timeout. Thus, if -u is used with ssh, commands may be left
running on remote hosts even after the timeout has killed the local ssh
processes.

Output from multiple processes per node may be interspersed when using
the qshell or mqshell rcmd modules.

The number of nodes that pdsh can simultaneously execute remote jobs on
is limited by the maximum number of threads that can be created
concurrently, as well as by the availability of reserved ports in the
rsh and qshell rcmd modules. On systems that implement POSIX threads,
the limit is typically defined by the constant PTHREAD_THREADS_MAX.


rsh(1), ssh(1), dshbak(1), pdcp(1)



pdsh-2.31                          linux-gnu                          pdsh(1)