pdsh(1)                 General Commands Manual                pdsh(1)

NAME
pdsh - issue commands to groups of hosts in parallel

SYNOPSIS
pdsh [options]... command

DESCRIPTION
pdsh is a variant of the rsh(1) command. Unlike rsh(1), which runs
commands on a single remote host, pdsh can run multiple remote commands
in parallel. pdsh uses a "sliding window" (or fanout) of threads to
conserve resources on the initiating host while allowing some
connections to time out.

When pdsh receives SIGINT (ctrl-C), it lists the status of current
threads. A second SIGINT within one second terminates the program.
Pending threads may be canceled by issuing ctrl-Z within one second of
ctrl-C. Pending threads are those that have not yet been initiated, or
are still in the process of connecting to the remote host.

If a remote command is not specified on the command line, pdsh runs
interactively, prompting for commands and executing them when
terminated with a carriage return. In interactive mode, target nodes
that time out on the first command are not contacted for subsequent
commands, and commands prefixed with an exclamation point are executed
on the local system.

The core functionality of pdsh may be supplemented by dynamically
loadable modules. The modules may provide a new connection protocol
(replacing the standard rcmd(3) protocol used by rsh(1)), filtering
options (e.g., removing hosts that are "down" from the target list),
and/or host selection options (e.g., -a selects all hosts from a
configuration file). By default, pdsh must have at least one "rcmd"
module loaded. See the RCMD MODULES section for more information.

RCMD MODULES
The method by which pdsh runs commands on remote hosts may be selected
at runtime using the -R option (see OPTIONS below). This functionality
is ultimately implemented via dynamically loadable modules, and so the
list of available options may differ from installation to
installation. A list of currently available rcmd modules is printed
when using any of the -h, -V, or -L options. The default rcmd module is
also displayed with the -h and -V options.

A list of rcmd modules currently distributed with pdsh follows.

rsh    Uses an internal, thread-safe implementation of BSD rcmd(3) to
       run commands using the standard rsh(1) protocol.

exec   Executes an arbitrary command for each target host. The first
       of the pdsh remote arguments is the local command to execute,
       followed by any further arguments. Some simple parameters are
       substituted on the command line, including %h for the target
       hostname, %u for the remote username, and %n for the remote
       rank [0-n] (to get a literal % use %%). For example, the
       following would duplicate using the ssh module to run
       hostname(1) across the hosts foo[0-10]:

          pdsh -R exec -w foo[0-10] ssh -x -l %u %h hostname

       and this command line would run grep(1) in parallel across the
       files console.foo[0-10]:

          pdsh -R exec -w foo[0-10] grep BUG console.%h

ssh    Uses a variant of popen(3) to run multiple copies of the ssh(1)
       command.

mrsh   This module uses the mrsh(1) protocol to execute jobs on remote
       hosts. The mrsh protocol uses credential-based authentication,
       forgoing the need to allocate reserved ports. In other aspects,
       it acts just like rsh. Remote nodes must be running mrshd(8) in
       order for the mrsh module to work.

qsh    Allows pdsh to execute MPI jobs over QsNet. Qshell propagates
       the current working directory, pdsh environment, and Elan
       capabilities to the remote process. The following environment
       variables are also appended to the environment: RMS_RANK,
       RMS_NODEID, RMS_PROCID, RMS_NNODES, and RMS_NPROCS. Since pdsh
       needs to run setuid root for qshell support, qshell does not
       directly support propagation of LD_LIBRARY_PATH and LD_PREOPEN.
       Instead, the QSHELL_REMOTE_LD_LIBRARY_PATH and
       QSHELL_REMOTE_LD_PREOPEN environment variables may be used and
       will be remapped to LD_LIBRARY_PATH and LD_PREOPEN by the
       qshell daemon if set.

mqsh   Similar to qshell, but uses the mrsh protocol instead of the
       rsh protocol.

krb4   The krb4 module allows users to execute remote commands after
       authenticating with Kerberos. Of course, the remote rshd
       daemons must be kerberized.

xcpu   The xcpu module uses the xcpu service to execute remote
       commands.
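The substitutions performed by the exec module can be pictured with a
small shell sketch. The function name expand_template is hypothetical
and not part of pdsh; it only mimics the four documented sequences
(%h, %u, %n, %%). Passing every other character through unchanged is
an assumption about unrecognized sequences, not documented behavior.

```shell
# Hypothetical helper (not part of pdsh): mimic the exec module's
# documented substitutions for a single target host.
#   $1 = command template, $2 = host, $3 = remote user, $4 = rank
expand_template() {
    tpl=$1
    out=
    while [ -n "$tpl" ]; do
        case $tpl in
            %h*) out="${out}$2"; tpl=${tpl#"%h"} ;;
            %u*) out="${out}$3"; tpl=${tpl#"%u"} ;;
            %n*) out="${out}$4"; tpl=${tpl#"%n"} ;;
            %%*) out="${out}%";  tpl=${tpl#"%%"} ;;
            *)   rest=${tpl#?}                  # everything after the first character
                 out="${out}${tpl%"$rest"}"     # copy that first character as-is
                 tpl=$rest ;;
        esac
    done
    printf '%s\n' "$out"
}

expand_template 'ssh -x -l %u %h hostname' foo3 alice 3
# prints: ssh -x -l alice foo3 hostname
expand_template 'grep BUG console.%h' foo3 alice 3
# prints: grep BUG console.foo3
```

The host foo3, the user alice, and the rank 3 are placeholder values
chosen for the illustration.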

OPTIONS
The list of available options is determined at runtime by supplementing
the list of standard pdsh options with any options provided by loaded
rcmd and misc modules. In some cases, options provided by modules may
conflict with each other. In these cases, the modules are incompatible
and the first module loaded wins.

Standard target nodelist options
-w [rcmd_type:][user@]host,host,...
       Target the specified list of hosts. Do not use with any other
       node selection options (e.g., -a or -g, if they are available).
       No spaces are allowed in the comma-separated list. A list
       consisting of a single `-' character causes the target hosts to
       be read from stdin, one per line. The host list may contain
       hostlist expressions of the form ``host[1-5,7]''. For more
       information about the hostlist format, see the HOSTLIST
       EXPRESSIONS section below. A list of hosts may also be preceded
       by "user@" to specify a remote username other than the default,
       or "rcmd_type:" to specify an alternate rcmd connection type
       for these hosts. When used together, the rcmd type must be
       specified first, e.g., "ssh:user1@host0" would use ssh to
       connect to host0 as user "user1".

-x host,host,...
       Exclude the specified hosts. May be specified in conjunction
       with other target node list options such as -a and -g (when
       available). Hostlists may also be specified to the -x option
       (see the HOSTLIST EXPRESSIONS section below).
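As a minimal sketch of the -w syntax described above, the following
shell function joins host names into a comma-separated list without
spaces and adds the optional prefixes in the required order (rcmd type
first, then user). The name make_target_list is hypothetical and not
part of pdsh.

```shell
# Hypothetical helper (not part of pdsh): build an argument for -w.
#   stdin = host names, one per line
#   $1    = optional remote user, $2 = optional rcmd type
make_target_list() {
    hosts=$(paste -s -d, -)             # join the lines with commas, no spaces
    if [ -n "${1:-}" ]; then
        hosts="${1}@${hosts}"           # user@ sits directly before the hosts
    fi
    if [ -n "${2:-}" ]; then
        hosts="${2}:${hosts}"           # rcmd_type: must come first
    fi
    printf '%s\n' "$hosts"
}

printf 'host0\nhost1\nhost2\n' | make_target_list user1 ssh
# prints: ssh:user1@host0,host1,host2
```

When no prefix is needed, the same host file can be fed to pdsh
directly, since a list consisting of a single `-' makes pdsh read the
target hosts from stdin.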

Standard pdsh options
-S     Return the largest of the remote command return values.

-h     Output usage menu and quit. A list of available rcmd modules
       will also be printed at the end of the usage message.

-s     Only on AIX, separate remote command stderr and stdout into two
       sockets.

-q     List option values and the target nodelist and exit without
       action.

-b     Disable ctrl-C status feature so that a single ctrl-C kills the
       parallel job. (Batch Mode)

-l user
       This option may be used to run remote commands as another user,
       subject to authorization. For BSD rcmd, this means the invoking
       user and system must be listed in the user's .rhosts file (even
       for root).

-t seconds
       Set the connect timeout. Default is 10 seconds.

-u seconds
       Set a limit on the amount of time a remote command is allowed to
       execute. Default is no limit. See note in LIMITATIONS if using
       -u with ssh.

-f number
       Set the maximum number of simultaneous remote commands to
       number. The default is 32.

-R name
       Set rcmd module to name. This option may also be set via the
       PDSH_RCMD_TYPE environment variable. A list of available rcmd
       modules may be obtained via the -h, -V, or -L options. The
       default will be listed with -h or -V.

-M name,...
       When multiple misc modules provide the same options to pdsh, the
       first module initialized "wins" and subsequent modules are not
       loaded. The -M option allows a list of modules to be specified
       that will be force-initialized before all others, in effect
       ensuring that they load without conflict (unless they conflict
       with each other). This option may also be set via the
       PDSH_MISC_MODULES environment variable.

-L     List info on all loaded pdsh modules and quit.

-N     Disable hostname: prefix on lines of output.

-d     Include more complete thread status when SIGINT is received, and
       display connect and command time statistics on stderr when done.

-V     Output pdsh version information, along with list of currently
       loaded modules, and exit.
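The reduction described for -S can be sketched in a few lines of
shell. The function largest_status is hypothetical and only
illustrates the rule "keep the largest remote return value"; it says
nothing about how pdsh collects those values internally.

```shell
# Hypothetical helper (not part of pdsh): reduce per-host exit codes
# the way -S is described, by keeping the largest value.
largest_status() {
    max=0
    for rc in "$@"; do
        if [ "$rc" -gt "$max" ]; then
            max=$rc
        fi
    done
    printf '%s\n' "$max"
}

largest_status 0 0 2 1
# prints: 2
```

A consequence of this rule is that, with -S, pdsh should exit non-zero
whenever at least one remote command did, which makes the option
useful in scripts that test the pdsh exit status.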

qsh/mqsh module options
-n tasks_per_node
       Set the number of tasks spawned per node. Default is 1.

-m block | cyclic
       Set block versus cyclic allocation of processes to nodes.
       Default is block.

-r railmask
       Set the rail bitmask for a job on a multirail system. The
       default railmask is 1, which corresponds to rail 0 only. Each
       bit set in the argument to -r corresponds to a rail on the
       system, so a value of 2 selects rail 1 only, and a value of 3
       selects both rail 0 and rail 1.
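The railmask arithmetic can be checked with a short shell sketch. The
function railmask below is hypothetical (not part of pdsh); it sets
bit N for every rail number N given on its command line.

```shell
# Hypothetical helper (not part of pdsh): turn a list of rail numbers
# into the bitmask expected by -r (bit N set selects rail N).
railmask() {
    mask=0
    for rail in "$@"; do
        mask=$(( mask | (1 << rail) ))
    done
    printf '%s\n' "$mask"
}

railmask 0      # prints: 1  (rail 0 only, the default)
railmask 1      # prints: 2  (rail 1 only)
railmask 0 1    # prints: 3  (rails 0 and 1)
```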

machines module options
-a     Target all nodes from machines file.

genders module options
In addition to the genders options presented below, the genders
attribute pdsh_rcmd_type may also be used in the genders database to
specify an rcmd connect type other than the pdsh default for hosts
with this attribute. For example, the following line in the genders
file

       host0 pdsh_rcmd_type=ssh

would cause pdsh to use ssh to connect to host0, even if rsh were the
default. This can be overridden on the command line with the
"rcmd_type:host0" syntax.

-A     Target all nodes in genders database. The -A option will target
       every host listed in genders; if you want to omit some hosts
       by default, see the -a option below.

-a     Target all nodes in genders database except those with the
       "pdsh_all_skip" attribute. This is shorthand for running "pdsh
       -A -X pdsh_all_skip ..."

-g attr[=val][,attr[=val],...]
       Target nodes that match any of the specified genders attributes
       (with optional values). Conflicts with -a and -w options. This
       option targets the alternate hostnames in the genders database
       by default. The -i option provided by the genders module may be
       used to translate these to the canonical genders hostnames. If
       the installed version of genders supports it, attributes
       supplied to -g may also take the form of genders queries.
       Genders queries will query the genders database for the union,
       intersection, difference, or complement of genders attributes
       and values. The set operation union is represented by two pipe
       symbols ('||'), intersection by two ampersand symbols ('&&'),
       difference by two minus symbols ('--'), and complement by a
       tilde ('~'). Parentheses may be used to change the order of
       operations. See the nodeattr(1) manpage for examples of genders
       queries.

-X attr[=val][,attr[=val],...]
       Exclude nodes that match any of the specified genders attributes
       (optionally with values). This option may be used in
       combination with any other of the node selection options (e.g.,
       -w, -g, -a). Arguments to -X may also take the form of genders
       queries. Please see documentation for the genders -g option for
       more information about genders queries.

-i     Request translation between canonical and alternate hostnames.

-F filename
       Read genders information from filename instead of the system
       default genders file. If filename doesn't specify an absolute
       path then it is taken to be relative to the directory specified
       by the PDSH_GENDERS_DIR environment variable (/etc by default).
       An alternate genders file may also be specified via the
       PDSH_GENDERS_FILE environment variable.
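The path rule described for -F can be written out as a small shell
sketch. The function genders_path is hypothetical and not part of
pdsh, and treating an empty PDSH_GENDERS_DIR the same as an unset one
is an assumption made for the illustration.

```shell
# Hypothetical helper (not part of pdsh): resolve the argument of -F
# following the rule described above.
genders_path() {
    case $1 in
        /*) printf '%s\n' "$1" ;;                            # absolute path: used as-is
        *)  printf '%s\n' "${PDSH_GENDERS_DIR:-/etc}/$1" ;;  # relative: under PDSH_GENDERS_DIR
    esac
}

( unset PDSH_GENDERS_DIR; genders_path genders.test )
# prints: /etc/genders.test
( PDSH_GENDERS_DIR=/opt/cluster; genders_path genders.test )
# prints: /opt/cluster/genders.test
```

The file name genders.test and the directory /opt/cluster are
placeholders.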

nodeupdown module options
-v     Eliminate target nodes that are considered "down" by
       libnodeupdown.

slurm module options
The slurm module allows pdsh to target nodes based on currently running
SLURM jobs. The slurm module is typically called after all other node
selection options have been processed, and if no nodes have been
selected, the module will attempt to read a running jobid from the
SLURM_JOBID environment variable (which is set when running under a
SLURM allocation). If SLURM_JOBID references an invalid job, it will be
silently ignored.

-j jobid[,jobid,...]
       Target list of nodes allocated to the SLURM job jobid. This
       option may be used multiple times to target multiple SLURM jobs.
       The special argument "all" can be used to target all nodes
       running SLURM jobs, e.g., -j all.

rms module options
The rms module allows pdsh to target nodes based on an RMS resource.
The rms module is typically called after all other node selection
options, and if no nodes have been selected, the module will examine
the RMS_RESOURCEID environment variable and attempt to set the target
list of hosts to the nodes in the RMS resource. If an invalid resource
is denoted, the variable is silently ignored.

SDR module options
The SDR module supports targeting hosts via the System Data Repository
on IBM SPs.

-a     Target all nodes in the SDR. The list is generated from the
       "reliable hostname" in the SDR by default.

-i     Translate hostnames between reliable and initial in the SDR,
       when applicable. If a target hostname matches either the
       initial or reliable hostname in the SDR, the alternate name will
       be substituted. Thus a list composed of initial hostnames will
       instead be replaced with a list of reliable hostnames. For
       example, when used with -a above, all initial hostnames in the
       SDR are targeted.

-v     Do not target nodes that are marked as not responding in the SDR
       on the targeted interface. (If a hostname does not appear in the
       SDR, then that name will remain in the target hostlist.)

-G     In combination with -a, include all partitions.

nodeattr module options
The nodeattr module supports access to the genders database via the
nodeattr(1) command. See the genders section above for a list of
supported options with this module. The option usage with the nodeattr
module is the same as genders, above, with the exception that the -i
option may only be used with -a or -g. NOTE: This module will only work
with very old releases of genders where the nodeattr(1) command
supports the -r option, and before the libgenders API was available.
Users running newer versions of genders will need to use the genders
module instead.

dshgroup module options
The dshgroup module allows pdsh to use dsh (or Dancer's shell) style
group files from /etc/dsh/group/ or ~/.dsh/group/.

-g groupname,...
       Target nodes in dsh group file "groupname" found in either
       ~/.dsh/group/groupname or /etc/dsh/group/groupname.

-X groupname,...
       Exclude nodes in dsh group file "groupname".
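The two group file locations can be probed with a short shell sketch.
The function dsh_group_file is hypothetical and not part of pdsh.
Trying the per-user directory before /etc/dsh/group is an assumption;
the text above names both locations but does not state which one takes
precedence.

```shell
# Hypothetical helper (not part of pdsh): locate a dsh group file.
# Assumption: the per-user directory is tried before /etc/dsh/group.
dsh_group_file() {
    for dir in "${HOME}/.dsh/group" /etc/dsh/group; do
        if [ -r "${dir}/$1" ]; then
            printf '%s\n' "${dir}/$1"
            return 0
        fi
    done
    return 1    # no readable group file with that name
}

# Demo in a subshell with a throwaway HOME, so nothing real is touched.
(
    HOME=$(mktemp -d)
    mkdir -p "${HOME}/.dsh/group"
    printf 'foo1\nfoo2\n' > "${HOME}/.dsh/group/web"
    dsh_group_file web      # prints the path of the file just created
    rm -rf "${HOME}"
)
```

The group name web and the hosts foo1 and foo2 are placeholders.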

netgroup module options
The netgroup module allows pdsh to use standard netgroup entries
(/etc/netgroup or NIS) to build lists of target hosts.

-g groupname,...
       Target nodes in netgroup "groupname".

-X groupname,...
       Exclude nodes in netgroup "groupname".

ENVIRONMENT VARIABLES
PDSH_RCMD_TYPE
       Equivalent to the -R option, the value of this environment
       variable will be used to set the default rcmd module for pdsh to
       use (e.g., ssh, rsh).

PDSH_SSH_ARGS
       Override the standard arguments that pdsh passes to the ssh(1)
       command ("-2 -a -x").

PDSH_SSH_ARGS_APPEND
       Append additional options to the ssh(1) command invoked by pdsh.
       For example, PDSH_SSH_ARGS_APPEND="-q" would run ssh in quiet
       mode, or "-v" would increase the verbosity of ssh.

WCOLL  If no other node selection option is used, the WCOLL environment
       variable may be set to a filename from which a list of target
       hosts will be read. The file should contain a list of hosts, one
       per line (though each line may contain a hostlist expression;
       see the HOSTLIST EXPRESSIONS section below).

DSHPATH
       If set, the path in DSHPATH will be used as the PATH for the
       remote processes.

FANOUT Set the pdsh fanout (see description of -f above).
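As a sketch of how these variables fit together, the following shell
fragment writes a host file for WCOLL and sets the fanout and rcmd
type through the environment. The host names are placeholders, and
the fragment deliberately does not invoke pdsh itself.

```shell
# Sketch: configure pdsh through its environment instead of options.
# The host names below are placeholders.
hostfile=$(mktemp)
cat > "$hostfile" <<'EOF'
foo[01-05]
bar7
EOF

export WCOLL="$hostfile"       # read only when no other node selection option is given
export FANOUT=16               # same meaning as -f 16
export PDSH_RCMD_TYPE=ssh      # same meaning as -R ssh

printf 'WCOLL=%s FANOUT=%s PDSH_RCMD_TYPE=%s\n' \
    "$WCOLL" "$FANOUT" "$PDSH_RCMD_TYPE"
```

With this environment in place, running pdsh with the -q option
should list the resulting option values and target nodes and exit
without contacting any host.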

HOSTLIST EXPRESSIONS
As noted in sections above, pdsh accepts lists of hosts in the general
form prefix[n-m,l-k,...], where n < m and l < k, etc., as an
alternative to explicit lists of hosts. This form should not be
confused with regular expression character classes (also denoted by
``[]''). For example, foo[19] does not represent an expression
matching foo1 or foo9, but rather represents the degenerate hostlist:
foo19.

The hostlist syntax is meant only as a convenience on clusters with a
"prefixNNN" naming convention, and specification of ranges should not
be considered necessary; thus foo1,foo9 could be specified as such, or
by the hostlist foo[1,9].

Some examples of usage follow:

Run command on foo01,foo02,...,foo05
       pdsh -w foo[01-05] command

Run command on foo7,foo9,foo10
       pdsh -w foo[7,9-10] command

Run command on foo0,foo4,foo5
       pdsh -w foo[0-5] -x foo[1-3] command

A suffix on the hostname is also supported:

Run command on foo0-eth0,foo1-eth0,foo2-eth0,foo3-eth0
       pdsh -w foo[0-3]-eth0 command

As a reminder to the reader, some shells will interpret brackets ('['
and ']') for pattern matching. Depending on your shell, it may be
necessary to enclose ranged lists within quotes. For example, in tcsh,
the first example above should be executed as:

       pdsh -w "foo[01-05]" command
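To make the expansion rules concrete, the following shell sketch
expands a single hostlist expression into individual host names. The
function expand_hostlist is hypothetical and not part of pdsh; it
handles one bracketed range list per expression and keeps the zero
padding of the lower bound, which matches the examples above but is
not a complete reimplementation of the pdsh parser.

```shell
# Hypothetical helper (not part of pdsh): expand one hostlist expression
# of the form prefix[n-m,l-k,...]suffix into one host name per line.
expand_hostlist() {
    printf '%s\n' "$1" | awk '{
        lb = index($0, "[")
        rb = index($0, "]")
        if (lb == 0 || rb < lb) { print; next }    # no range list: plain host
        prefix = substr($0, 1, lb - 1)
        suffix = substr($0, rb + 1)
        n = split(substr($0, lb + 1, rb - lb - 1), parts, ",")
        for (i = 1; i <= n; i++) {
            m = split(parts[i], ends, "-")
            lo = ends[1]
            hi = (m > 1) ? ends[2] : ends[1]
            fmt = "%s%0" length(lo) "d%s\n"        # keep zero padding such as 01
            for (v = lo + 0; v <= hi + 0; v++)
                printf fmt, prefix, v, suffix
        }
    }'
}

expand_hostlist 'foo[7,9-10]'
# prints foo7, foo9 and foo10, one per line
expand_hostlist 'foo[19]'
# prints: foo19
```

The arguments are single-quoted for the reason given above: without
quotes, the shell may treat the brackets as a filename pattern.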

ORIGIN
Originally a rewrite of IBM dsh(1) by Jim Garlick <garlick@llnl.gov> on
LLNL's ASCI Blue-Pacific IBM SP system. It is now used on Linux
clusters at LLNL.

LIMITATIONS
When using ssh for remote execution, expect the stderr of ssh to be
folded in with that of the remote command. When invoked by pdsh, ssh
cannot prompt for passwords, so RSA/DSA keys must be configured
properly in advance. For ssh implementations that support a connect
timeout option, pdsh attempts to use that option to enforce the timeout
(e.g., -oConnectTimeout=T for OpenSSH); otherwise connect timeouts are
not supported when using ssh. Finally, there is no reliable way for
pdsh to ensure that remote commands are actually terminated when using
a command timeout. Thus, if -u is used with ssh, commands may be left
running on remote hosts even after the timeout has killed the local ssh
processes.

Output from multiple processes per node may be interspersed when using
qshell or mqshell rcmd modules.

The number of nodes that pdsh can simultaneously execute remote jobs on
is limited by the maximum number of threads that can be created
concurrently, as well as the availability of reserved ports in the rsh
and qshell rcmd modules. On systems that implement POSIX threads, the
limit is typically defined by the constant PTHREAD_THREADS_MAX.

SEE ALSO
rsh(1), ssh(1), dshbak(1), pdcp(1)

pdsh-2.22                       linux-gnu                       pdsh(1)