CONDOR_SSH_TO_JOB(1)            HTCondor Manual           CONDOR_SSH_TO_JOB(1)

NAME
condor_ssh_to_job - HTCondor Manual

create an ssh session to a running job

SYNOPSIS
condor_ssh_to_job [-help]

condor_ssh_to_job [-debug] [-name schedd-name] [-pool pool-name]
[-ssh ssh-command] [-keygen-options ssh-keygen-options]
[-shells shell1,shell2,...] [-auto-retry] [-remove-on-interrupt]
cluster | cluster.process | cluster.process.node [remote-command]

DESCRIPTION
condor_ssh_to_job creates an ssh session to a running job. The job is
specified by the cluster, cluster.process, or cluster.process.node
argument. If only the job cluster id is given, the job process id
defaults to 0.
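
The defaulting rule above can be sketched in shell. The helper name
normalize_job_id is hypothetical and not part of HTCondor; it only
illustrates how a bare cluster id maps to process 0.

```shell
# Hypothetical helper (not an HTCondor command): expand a job argument
# the way the text describes, treating a bare cluster id as process 0.
normalize_job_id() {
    case "$1" in
        *.*) printf '%s\n' "$1" ;;    # cluster.process or cluster.process.node
        *)   printf '%s.0\n' "$1" ;;  # bare cluster id: process defaults to 0
    esac
}

normalize_job_id 32      # prints 32.0
normalize_job_id 32.1    # prints 32.1
```

Under this rule, "condor_ssh_to_job 32" and "condor_ssh_to_job 32.0"
refer to the same job.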

condor_ssh_to_job is available in Unix HTCondor distributions, and
works with two kinds of jobs: those in the vanilla, vm, java, local, or
parallel universes, and those jobs in the grid universe which use EC2
resources. It will not work with other grid universe jobs.

For jobs in the vanilla, vm, java, local, or parallel universes, the
user must be the owner of the job or must be a queue super user, and
both the condor_schedd and condor_starter daemons must allow
condor_ssh_to_job access. If no remote-command is specified, an
interactive shell is created. An alternate ssh program such as sftp may
be specified, using the -ssh option, for uploading and downloading
files.

The remote command or shell runs with the same user id as the running
job, and it is initialized with the same working directory. The
environment is initialized to be the same as that of the job, plus any
changes made by the shell setup scripts and any environment variables
passed by the ssh client. In addition, the environment variable
_CONDOR_JOB_PIDS is defined. It is a space-separated list of PIDs
associated with the job. At a minimum, the list will contain the PID of
the process started when the job was launched, and it will be the first
item in the list. It may contain additional PIDs of other processes
that the job has created.
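
As a minimal sketch of how the list might be consumed inside a session,
the following POSIX shell functions split _CONDOR_JOB_PIDS on spaces.
The function names are hypothetical, and the fallback value is made-up
sample data so that the sketch also runs outside a job.

```shell
# Inside a condor_ssh_to_job session the variable is already set; the
# fallback below is sample data only, used when it is unset or empty.
: "${_CONDOR_JOB_PIDS:=65881 65890 65902}"

# Print the first PID in the list, which the text identifies as the
# process started when the job was launched.
first_job_pid() {
    set -- $_CONDOR_JOB_PIDS      # unquoted on purpose: split on spaces
    printf '%s\n' "$1"
}

# Print any remaining PIDs (other processes created by the job).
other_job_pids() {
    set -- $_CONDOR_JOB_PIDS
    [ "$#" -gt 0 ] && shift       # drop the first entry if there is one
    printf '%s\n' "$*"
}
```

A debugger could then be attached to the main job process with, for
example, gdb -p "$(first_job_pid)".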

The ssh session and all processes it creates are treated by HTCondor as
though they are processes belonging to the job. If the slot is
preempted or suspended, the ssh session is killed or suspended along
with the job. If the job exits before the ssh session finishes, the
slot remains in the Claimed Busy state and is treated as though not all
job processes have exited until all ssh sessions are closed. Multiple
ssh sessions may be created to the same job at the same time. Resource
consumption of the sshd process and all processes spawned by it is
monitored by the condor_starter as though these processes belong to the
job, so any policies such as PREEMPT that enforce a limit on resource
consumption also take into account resources consumed by the ssh
session.

condor_ssh_to_job stores ssh keys in temporary files within a newly
created and uniquely named directory. The newly created directory will
be within the directory defined by the environment variable TMPDIR.
When the ssh session is finished, this directory and the ssh keys
contained within it are removed.
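
Because the key directory is created under TMPDIR, the location of the
keys can be steered by setting TMPDIR for a single invocation. The
sketch below assumes mktemp -d is available; the wrapper name
with_private_tmpdir is hypothetical and not part of HTCondor.

```shell
# Hypothetical wrapper: run a command with TMPDIR pointing at a fresh
# directory readable only by the current user, then clean it up.
with_private_tmpdir() {
    scratch=$(mktemp -d) || return 1
    chmod 700 "$scratch"
    TMPDIR=$scratch "$@"          # TMPDIR applies to this command only
    rc=$?
    rm -rf "$scratch"
    return "$rc"                  # pass the command's status through
}
```

It would be used as: with_private_tmpdir condor_ssh_to_job 32.0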

See the HTCondor administrator's manual section on configuration for
details of the configuration variables related to condor_ssh_to_job.

An ssh session works by first authenticating and authorizing a secure
connection between condor_ssh_to_job and the condor_starter daemon,
using HTCondor protocols. The condor_starter generates an ssh key pair
and sends it securely to condor_ssh_to_job. Then the condor_starter
spawns sshd in inetd mode with its stdin and stdout attached to the TCP
connection from condor_ssh_to_job. condor_ssh_to_job acts as a proxy
for the ssh client to communicate with sshd, using the existing
connection authorized by HTCondor. At no point is sshd listening on the
network for connections or running with any privileges other than those
of the user identity running the job. If CCB is being used to enable
connectivity to the execute node from outside of a firewall or private
network, condor_ssh_to_job is able to make use of CCB in order to form
the ssh connection.

The login shell of the user id running the job is used to run the
requested command, sshd subsystem, or interactive shell. This is
hard-coded behavior in OpenSSH and cannot be overridden by
configuration. This means that condor_ssh_to_job access is effectively
disabled if the login shell disables access, as in the example programs
/bin/true and /sbin/nologin.
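
The login shell restriction can be checked ahead of time on an execute
machine. The following is a sketch only: both function names are
hypothetical, the list of blocking shells contains just the two
examples named above (other systems may use different paths), and
getent is assumed to be available.

```shell
# Hypothetical helper: succeed if the given login shell is one of the
# access-disabling examples named in the text.
login_shell_blocks_ssh() {
    case "$1" in
        /bin/true|/sbin/nologin) return 0 ;;
        *) return 1 ;;
    esac
}

# Look up a user's login shell (field 7 of the passwd entry).
# Assumes getent exists, as it does on most Linux systems.
login_shell_of() {
    getent passwd "$1" | cut -d: -f7
}
```

For example, login_shell_blocks_ssh "$(login_shell_of someuser)"
succeeds when the account named someuser (a placeholder) could not be
reached through condor_ssh_to_job.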

condor_ssh_to_job is intended to work with OpenSSH as installed in
typical environments. It does not work on Windows platforms. If the ssh
programs are installed in non-standard locations, then the paths to
these programs will need to be customized within the HTCondor
configuration. Versions of ssh other than OpenSSH may work, but they
will likely require additional configuration of command-line arguments,
changes to the sshd configuration template file, and possibly
modification of the $(LIBEXEC)/condor_ssh_to_job_sshd_setup script used
by the condor_starter to set up sshd.

For jobs in the grid universe which use EC2 resources, specifying
ec2_keypair_file, which requests that HTCondor have the EC2 service
create a new key pair for the job, causes condor_ssh_to_job to attempt
to connect to the corresponding instance via ssh. This attempt invokes
ssh directly, bypassing the HTCondor networking layer. It supplies ssh
with the public DNS name of the instance and the name of the file with
the new key pair's private key. For the connection to succeed, the
instance must have started an ssh server, and its security group(s)
must allow connections on port 22. Conventionally, images will allow
logins using the key pair on a single specific account. Because ssh
defaults to logging in as the current user, the -l <username> option or
its equivalent for other versions of ssh will be needed as part of the
remote-command argument. Although the -X option does not apply to EC2
jobs, adding -X or -Y to the remote-command argument can duplicate the
effect.

OPTIONS
-help  Display brief usage information and exit.

-debug Causes debugging information to be sent to stderr, based on
       the value of the configuration variable TOOL_DEBUG.

-name schedd-name
       Specify an alternate condor_schedd, if the default (local)
       one is not desired.

-pool pool-name
       Specify an alternate HTCondor pool, if the default one is not
       desired. Does not apply to EC2 jobs.

-ssh ssh-command
       Specify an alternate ssh program to run in place of ssh, for
       example sftp or scp. Additional arguments may be included in
       ssh-command. Since the arguments are delimited by spaces,
       place double quote marks around the whole command, to prevent
       the shell from splitting it into multiple arguments to
       condor_ssh_to_job. If any arguments must contain spaces,
       enclose them within single quotes. Does not apply to EC2 jobs.

-keygen-options ssh-keygen-options
       Specify additional arguments to the ssh-keygen program, for
       creating the ssh key that is used for the duration of the
       session. For example, a different number of bits could be
       used, or a different key type than the default. Does not
       apply to EC2 jobs.

-shells shell1,shell2,...
       Specify a comma-separated list of shells to attempt to
       launch. If the first shell does not exist on the remote
       machine, then the following ones in the list will be tried. If
       none of the specified shells can be found, /bin/sh is used by
       default. If this option is not specified, it defaults to the
       value of the environment variable SHELL from within the
       condor_ssh_to_job environment. Does not apply to EC2 jobs.

-auto-retry
       Specifies that if the job is not yet running,
       condor_ssh_to_job should keep trying periodically until it
       succeeds or encounters some other error.

-remove-on-interrupt
       If specified, attempt to remove the job from the queue if
       condor_ssh_to_job is interrupted via a CTRL-c or otherwise
       terminated abnormally.

-X     Enable X11 forwarding. Does not apply to EC2 jobs.

-x     Disable X11 forwarding.
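
The quoting advice for -ssh (and likewise -keygen-options) can be seen
with a small shell sketch. The helper show_args is hypothetical; it
prints each argument it receives so that the effect of the double
quotes is visible. The -v flag of sftp is used only as a sample extra
argument.

```shell
# Hypothetical helper: print each argument on its own line in brackets,
# to show how the shell splits a command line before the program runs.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Quoted: the whole ssh-command arrives as a single argument.
show_args -ssh "sftp -v" 32.0

# Unquoted: the shell splits it, so -v is no longer part of the
# value given to -ssh.
show_args -ssh sftp -v 32.0
```

The same pattern would apply to an invocation such as
condor_ssh_to_job -keygen-options "-b 4096" 32.0, where -b 4096 is an
illustrative ssh-keygen argument; whether a given value is accepted
depends on the key type in use.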

EXAMPLES
    % condor_ssh_to_job 32.0
    Welcome to slot2@tonic.cs.wisc.edu!
    Your condor job is running with pid(s) 65881.
    % gdb -p 65881
    (gdb) where
    ...
    % logout
    Connection to condor-job.tonic.cs.wisc.edu closed.

To upload or download files interactively with sftp:

    % condor_ssh_to_job -ssh sftp 32.0
    Connecting to condor-job.tonic.cs.wisc.edu...
    sftp> ls
    ...
    sftp> get outputfile.dat

This example shows downloading a file from the job with scp. The string
"remote" is used in place of a host name in this example. It is not
necessary to insert the correct remote host name, or even a valid one,
because the connection to the job is created automatically. Therefore,
the placeholder string "remote" is perfectly fine.

    % condor_ssh_to_job -ssh scp 32 remote:outputfile.dat .

This example uses condor_ssh_to_job to accomplish the task of running
rsync to synchronize a local file with a remote file in the job's
working directory. Job id 32.0 is used in place of a host name in this
example. This causes rsync to insert the expected job id in the
arguments to condor_ssh_to_job.

    % rsync -v -e "condor_ssh_to_job" 32.0:outputfile.dat .

Note that condor_ssh_to_job was added to HTCondor in version 7.3. If
one uses condor_ssh_to_job to connect to a job on an execute machine
running a version of HTCondor older than the 7.3 series, the command
will fail with the error message

    Failed to send CREATE_JOB_OWNER_SEC_SESSION to starter

EXIT STATUS
condor_ssh_to_job will exit with a non-zero status value if it fails to
set up an ssh session. If it succeeds, it will exit with the status
value of the remote command or shell.
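
A script that relies on this status might look like the following
sketch. The wrapper name run_in_job is hypothetical. Note that a
non-zero status alone does not say whether session setup failed or the
remote command itself returned non-zero.

```shell
# Hypothetical wrapper: run a remote command in a job through
# condor_ssh_to_job and report the resulting exit status.
run_in_job() {
    job_id=$1
    shift
    condor_ssh_to_job "$job_id" "$@"
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "job $job_id: remote command succeeded"
    else
        # Either the ssh session could not be set up, or the remote
        # command exited with this status.
        echo "job $job_id: exit status $rc"
    fi
    return "$rc"
}
```

For example, run_in_job 32.0 "test -f outputfile.dat" would return 0
only if the session was established and the file exists in the job's
working directory.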

AUTHOR
HTCondor Team

COPYRIGHT
1990-2021, Center for High Throughput Computing, Computer Sciences
Department, University of Wisconsin-Madison, Madison, WI, US. Licensed
under the Apache License, Version 2.0.

8.8                           Jan 26, 2021           CONDOR_SSH_TO_JOB(1)