CONDOR_DRAIN(1)                 HTCondor Manual                 CONDOR_DRAIN(1)

condor_drain - HTCondor Manual

Control draining of an execute machine

condor_drain [-help]

condor_drain [-debug] [-pool pool-name] [-graceful | -quick | -fast]
       [-resume-on-completion] [-check expr] [-start expr] machine-name

condor_drain [-debug] [-pool pool-name] -cancel [-request-id id]
       machine-name

condor_drain is an administrative command used to control the draining
of all slots on an execute machine. When a machine is draining, it will
not accept any new jobs unless the -start expression specifies
otherwise. The machine to drain is specified by the argument
machine-name, which must match the machine ClassAd attribute Machine.

How currently running jobs are treated depends on the draining schedule
that is chosen with a command-line option:

-graceful
       Initiate a graceful eviction of each job. This means all
       promises that have been made to the job are honored, including
       MaxJobRetirementTime. The eviction of jobs is coordinated to
       reduce idle time: if one slot has a job with a long retirement
       time and the other slots have jobs with shorter retirement
       times, the effective retirement time for all of the jobs is
       the longer one. If no draining schedule is specified,
       -graceful is chosen by default.

-quick MaxJobRetirementTime is not honored. Eviction of jobs is
       initiated immediately. Jobs are given time to shut down and
       produce checkpoints according to the usual policy, that is,
       the time given by MachineMaxVacateTime.

-fast  Jobs are immediately hard-killed, with no chance to shut down
       gracefully or produce a checkpoint.

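Because the three schedules are mutually exclusive, a wrapper script
only has to pick one flag. The following Python sketch shows one way to
assemble such a command line. The helper name build_drain_command and
the host name exec01.example.org are assumptions made for illustration;
only the flags themselves come from the synopsis of this page. The
sketch builds an argument list and does not contact the pool.

```python
# Sketch only: build_drain_command and the host name are illustrative
# and not part of HTCondor. The flags are taken from the synopsis.
SCHEDULES = ("graceful", "quick", "fast")


def build_drain_command(machine, schedule="graceful", resume_on_completion=False):
    """Return the argv list that drains `machine` with exactly one schedule."""
    if schedule not in SCHEDULES:
        raise ValueError(f"unknown draining schedule: {schedule!r}")
    argv = ["condor_drain", f"-{schedule}"]
    if resume_on_completion:
        argv.append("-resume-on-completion")
    # machine-name must match the machine ClassAd attribute Machine.
    argv.append(machine)
    return argv


if __name__ == "__main__":
    # Print the command instead of running it.
    print(" ".join(build_drain_command("exec01.example.org", "quick")))
```

Passing the schedule explicitly, even when it is the default -graceful,
keeps the intent visible in logs and shell history.
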
If you specify -graceful, you may also specify -start. On a
gracefully-draining machine, some jobs may finish retiring before
others. By default, the resources used by the newly-retired jobs do not
become available for use by other jobs until the machine exits the
draining state (see below). The -start expression you supply replaces
the draining machine's normal START expression for the duration of the
draining state, potentially making those resources available. See the
condor_startd Policy Configuration section for more information.

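As an illustration of -start, the Python sketch below assembles a
graceful drain whose replacement START expression admits only jobs that
carry a particular attribute. The job attribute IsBackfill, the helper
name, and the host name are all hypothetical; substitute an expression
that fits the local policy. Note that the expression is passed as a
single argument, so it must be quoted when typed in a shell.

```python
import shlex


def graceful_drain_with_start(machine, start_expr):
    """Return argv for a graceful drain that overrides START while draining."""
    # -start is documented for use together with -graceful.
    return ["condor_drain", "-graceful", "-start", start_expr, machine]


# Hypothetical policy: while draining, admit only jobs that advertise
# IsBackfill = true. IsBackfill is an illustrative name, not a standard
# HTCondor attribute.
argv = graceful_drain_with_start("exec01.example.org", "TARGET.IsBackfill =?= true")

# shlex.join renders the list as a shell-safe command line for review.
print(shlex.join(argv))
```

Building the command as a list and quoting it only for display avoids
shell-quoting mistakes in the ClassAd expression.
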
Once draining is complete, the machine will enter the Drained/Idle
state. To resume normal operation (negotiation) at that time or at any
earlier point during draining, the -cancel option may be used. The
-resume-on-completion option, given when draining is initiated, results
in automatic resumption of normal operation once draining has
completed. This is useful for forcing a machine with partitionable
slots to join all of its resources back together into one machine,
facilitating de-fragmentation and whole machine negotiation.

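Cancelling follows the third form of the synopsis. A minimal Python
sketch, assuming a hypothetical helper name build_cancel_command and
hypothetical host, pool, and request id values; it only builds the
argument list:

```python
def build_cancel_command(machine, request_id=None, pool=None):
    """Return argv that cancels draining on `machine`.

    request_id, if given, should be the value of the machine ClassAd
    attribute DrainingRequestId for the request to cancel.
    """
    argv = ["condor_drain"]
    if pool is not None:
        argv += ["-pool", pool]
    argv.append("-cancel")
    if request_id is not None:
        argv += ["-request-id", str(request_id)]
    argv.append(machine)
    return argv


if __name__ == "__main__":
    # Hypothetical values, printed rather than executed.
    print(build_cancel_command("exec01.example.org", request_id="12"))
```

Omitting the request id leaves the choice of request to condor_drain;
supplying it targets one specific draining request.
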
-help  Display brief usage information and exit.

-debug Causes debugging information to be sent to stderr, based on
       the value of the configuration variable TOOL_DEBUG.

-pool pool-name
       Specify an alternate HTCondor pool, if the default one is not
       desired.

-graceful
       (the default) Honor the maximum vacate and retirement time
       policy.

-quick Honor the maximum vacate time, but not the retirement time
       policy.

-fast  Honor neither the maximum vacate time policy nor the
       retirement time policy.

-resume-on-completion
       When done draining, resume normal operation, such that
       potentially the whole machine could be claimed.

-check expr
       Abort draining if expr is not true for all slots to be
       drained.

-start expr
       The START expression to use while the machine is draining.
       The expression cannot reference the machine's existing START
       expression.

-cancel
       Cancel a prior draining request, to permit the
       condor_negotiator to use the machine again.

-request-id id
       Identify a specific draining request to cancel, where id is
       given by the DrainingRequestId machine ClassAd attribute.

condor_drain exits with a status of zero if it succeeds and with a
non-zero status if it fails.

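A script can rely on this exit status to decide whether a drain request
was accepted. The Python sketch below is one way to do so; the helper
name run_and_report is an assumption, and the content of any error
output is not specified by this page, so the sketch simply relays it.

```python
import subprocess
import sys


def run_and_report(argv):
    """Run argv and return True on exit status 0, False otherwise.

    Raises FileNotFoundError if the program named in argv[0] is not
    installed or not on PATH.
    """
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        # Relay whatever the tool wrote to stderr, without parsing it.
        print(
            f"{argv[0]} failed with status {result.returncode}: "
            f"{result.stderr.strip()}",
            file=sys.stderr,
        )
        return False
    return True


# Example call (hypothetical host name), left commented out so that the
# sketch does not touch a pool when it is loaded:
# run_and_report(["condor_drain", "-graceful", "exec01.example.org"])
```
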
HTCondor Team

1990-2022, Center for High Throughput Computing, Computer Sciences
Department, University of Wisconsin-Madison, Madison, WI, US. Licensed
under the Apache License, Version 2.0.

8.8                               Jun 13, 2022                  CONDOR_DRAIN(1)