CONDOR_DAGMAN(1)             HTCondor Manual             CONDOR_DAGMAN(1)

NAME
       condor_dagman - HTCondor Manual

       meta scheduler of the jobs submitted as the nodes of a DAG or DAGs

SYNOPSIS
       condor_dagman -f -t -l . -help

       condor_dagman -version

       condor_dagman -f -l . -csdversion version_string [-debug level]
       [-maxidle NumberOfProcs] [-maxjobs NumberOfClusters]
       [-maxpre NumberOfPreScripts] [-maxpost NumberOfPostScripts]
       [-noeventchecks] [-allowlogerror] [-usedagdir] -lockfile filename
       [-waitfordebug] [-autorescue 0|1] [-dorescuefrom number]
       [-allowversionmismatch] [-DumpRescue] [-verbose] [-force]
       [-notification value] [-suppress_notification]
       [-dont_suppress_notification] [-dagman DagmanExecutable]
       [-outfile_dir directory] [-update_submit] [-import_env]
       [-priority number] [-dont_use_default_node_log]
       [-DontAlwaysRunPost] [-AlwaysRunPost] [-DoRecovery]
       -dag dag_file [-dag dag_file_2 ... -dag dag_file_n]

DESCRIPTION
       condor_dagman is a meta scheduler for the HTCondor jobs within a DAG
       (directed acyclic graph), or within multiple DAGs. In typical usage, a
       submitter of jobs that are organized into a DAG submits the DAG using
       condor_submit_dag. condor_submit_dag checks aspects of the DAG for
       errors and then submits condor_dagman as an HTCondor job.
       condor_dagman uses log files to coordinate the further submission of
       the jobs within the DAG.

       All command-line arguments to the DaemonCore library functions work
       for condor_dagman. When invoked from the command line, condor_dagman
       requires the arguments -f -l . to appear first, so that they are
       processed by DaemonCore. The -csdversion argument must also be
       specified; at startup, condor_dagman uses it to check for a version
       mismatch with condor_submit_dag. The -t argument must also be present
       for the -help option, so that output is sent to the terminal.

       Arguments to condor_dagman are either set automatically by
       condor_submit_dag, or specified as command-line arguments to
       condor_submit_dag and passed on to condor_dagman. The method by which
       each argument is set is given in its description below.

       condor_dagman can run multiple, independent DAGs. This is done by
       specifying multiple -dag arguments; to do so, pass multiple DAG input
       files as command-line arguments to condor_submit_dag.

       Debugging output may be obtained by using the -debug level option.
       The level values and the output each one produces are:

       • level = 0; never produce output, except for usage info

       • level = 1; very quiet, output severe errors

       • level = 2; normal output, errors and warnings

       • level = 3; output errors, as well as all warnings

       • level = 4; internal debugging output

       • level = 5; internal debugging output; outer loop debugging

       • level = 6; internal debugging output; inner loop debugging; output
         DAG input file lines as they are parsed

       • level = 7; internal debugging output; rarely used; output DAG input
         file lines as they are parsed
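
       The invocation rules above can be illustrated with a short Python
       sketch that only assembles an argument list and never runs it: the
       DaemonCore arguments -f -l . come first, -csdversion and -lockfile are
       required, each DAG input file gets its own -dag, and -debug accepts
       0-7. The helper name build_dagman_argv, the version string, and the
       file names are hypothetical. Defaulting the lock file to the first DAG
       file plus .lock is an assumption based on the .dag.lock suffix
       described under -lockfile. In normal use, condor_submit_dag builds
       this command line.

```python
from __future__ import annotations

from collections.abc import Sequence


def build_dagman_argv(csd_version: str, dag_files: Sequence[str],
                      lockfile: str | None = None,
                      debug: int | None = None) -> list[str]:
    """Assemble, but do not run, a condor_dagman command line (hypothetical)."""
    if not dag_files:
        raise ValueError("at least one DAG input file is required")
    if debug is not None and not 0 <= debug <= 7:
        raise ValueError("the debug level must be an integer from 0 to 7")

    # DaemonCore's arguments must appear first.
    argv = ["condor_dagman", "-f", "-l", "."]
    argv += ["-csdversion", csd_version]
    if debug is not None:
        argv += ["-debug", str(debug)]
    # Assumption: default the lock file to "<first DAG file>.lock".
    argv += ["-lockfile", lockfile or f"{dag_files[0]}.lock"]
    # One -dag argument per independent DAG input file.
    for dag in dag_files:
        argv += ["-dag", dag]
    return argv


if __name__ == "__main__":
    print(" ".join(build_dagman_argv("example-version",
                                     ["a.dag", "b.dag"], debug=3)))
```

       With the arguments in the usage block, the printed command line is
       condor_dagman -f -l . -csdversion example-version -debug 3 -lockfile
       a.dag.lock -dag a.dag -dag b.dag.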

OPTIONS
       -help  Display usage information and exit.

       -version
              Display version information and exit.

       -debug level
              An integer level of debugging output. level is an integer with
              values of 0-7 inclusive, where 7 is the most verbose output.
              This command-line option to condor_submit_dag is passed to
              condor_dagman; if it is not given, the level defaults to 3.

       -maxidle NumberOfProcs
              Sets the maximum number of idle procs allowed before
              condor_dagman stops submitting more node jobs. Note that for
              this argument, each individual proc within a cluster counts
              toward the limit, unlike -maxjobs, which counts each cluster as
              one. Once idle procs start to run, condor_dagman resumes
              submitting jobs when the number of idle procs falls below the
              specified limit. NumberOfProcs is a non-negative integer. If
              this option is omitted, the number of idle procs is limited by
              the configuration variable DAGMAN_MAX_JOBS_IDLE (see
              Configuration File Entries for DAGMan), which defaults to 1000.
              To disable this limit, set NumberOfProcs to 0. Note that submit
              description files that queue multiple procs can cause the
              NumberOfProcs limit to be exceeded. For example, if -maxidle is
              set to 250 and a node's submit description file contains queue
              5000, a cluster of 5000 new procs is submitted to the
              condor_schedd, not 250. In this case, condor_dagman resumes
              submitting jobs when the number of idle procs falls below 250.

       -maxjobs NumberOfClusters
              Sets the maximum number of clusters within the DAG that will be
              submitted to HTCondor at one time. Note that for this argument,
              each cluster counts as one job, no matter how many individual
              procs are in the cluster. NumberOfClusters is a non-negative
              integer. If this option is omitted, the number of clusters is
              limited by the configuration variable DAGMAN_MAX_JOBS_SUBMITTED
              (see Configuration File Entries for DAGMan), which defaults to
              0 (unlimited).
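
              The difference between the -maxidle and -maxjobs counting
              rules, and why a single queue 5000 node can overshoot -maxidle
              250, can be modeled in Python. This is an illustrative sketch,
              not condor_dagman's implementation: the NodeCluster class and
              may_submit_more function are hypothetical, a limit of 0 means
              unlimited for both options as described above, and the check is
              assumed to run before each new node is submitted, so a cluster
              that has already been submitted is never split.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NodeCluster:
    """A node job currently in the queue (hypothetical model type)."""
    procs: int        # total procs in the cluster
    idle_procs: int   # procs in the cluster that are still idle


def may_submit_more(clusters: list[NodeCluster], max_idle: int, max_jobs: int) -> bool:
    """Return True if another node job may be submitted.

    -maxjobs counts each cluster once, however many procs it has;
    -maxidle counts every idle proc. A limit of 0 disables that check.
    The check happens before submission, so one large cluster can
    exceed -maxidle on its own.
    """
    if max_jobs > 0 and len(clusters) >= max_jobs:
        return False
    idle = sum(c.idle_procs for c in clusters)
    if max_idle > 0 and idle >= max_idle:
        return False
    return True


if __name__ == "__main__":
    big = NodeCluster(procs=5000, idle_procs=5000)  # "queue 5000"
    print(may_submit_more([big], max_idle=250, max_jobs=0))  # False: 5000 idle
    big.idle_procs = 249  # most procs are now running
    print(may_submit_more([big], max_idle=250, max_jobs=0))  # True: below 250
    print(may_submit_more([big], max_idle=0, max_jobs=1))    # False: 1 cluster
```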

       -maxpre NumberOfPreScripts
              Sets the maximum number of PRE scripts within the DAG that may
              be running at one time. NumberOfPreScripts is a non-negative
              integer. If this option is omitted, the number of PRE scripts is
              limited by the configuration variable DAGMAN_MAX_PRE_SCRIPTS
              (see Configuration File Entries for DAGMan), which defaults to
              20.

       -maxpost NumberOfPostScripts
              Sets the maximum number of POST scripts within the DAG that may
              be running at one time. NumberOfPostScripts is a non-negative
              integer. If this option is omitted, the number of POST scripts
              is limited by the configuration variable DAGMAN_MAX_POST_SCRIPTS
              (see Configuration File Entries for DAGMan), which defaults to
              20.
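
              The -maxpre and -maxpost limits act as a cap on concurrently
              running scripts. The hypothetical Python class below models
              that cap for one script type under stated assumptions: deferred
              scripts start in first-in, first-out order as running ones
              finish, and a limit of 0 is not modeled because this page does
              not say what it means for these options. It does not describe
              condor_dagman's internal scheduling.

```python
from __future__ import annotations

from collections import deque


class ScriptThrottle:
    """Cap on concurrently running PRE (or POST) scripts (illustrative)."""

    def __init__(self, max_running: int) -> None:
        if max_running < 1:
            raise ValueError("this sketch only models a positive limit")
        self.max_running = max_running
        self.running: set[str] = set()
        self.waiting: deque[str] = deque()

    def request(self, node: str) -> bool:
        """Start the node's script now if under the cap; otherwise defer it."""
        if len(self.running) < self.max_running:
            self.running.add(node)
            return True
        self.waiting.append(node)
        return False

    def finished(self, node: str) -> str | None:
        """Record a finished script and start the oldest deferred one, if any."""
        self.running.discard(node)
        if self.waiting:
            next_node = self.waiting.popleft()
            self.running.add(next_node)
            return next_node
        return None


if __name__ == "__main__":
    pre = ScriptThrottle(max_running=2)   # e.g. -maxpre 2
    print([pre.request(n) for n in ("A", "B", "C")])  # [True, True, False]
    print(pre.finished("A"))                          # C
```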

       -noeventchecks
              This argument is no longer used and is ignored. Its
              functionality is now provided by the DAGMAN_ALLOW_EVENTS
              configuration variable.

       -allowlogerror
              As of version 8.5.5, this argument is no longer supported, and
              setting it generates a warning.

       -usedagdir
              This optional argument causes condor_dagman to run each
              specified DAG as if the directory containing that DAG file were
              the current working directory. This option is most useful when
              running multiple DAGs in a single condor_dagman.

       -lockfile filename
              Names the file created and used as a lock file. The lock file
              prevents two instances of the same DAG, as defined by its DAG
              input file, from running at the same time. condor_submit_dag
              passes condor_dagman a default lock file name ending with the
              suffix .dag.lock.

       -waitfordebug
              This optional argument causes condor_dagman to wait at startup
              until someone attaches to the process with a debugger and sets
              the wait_for_debug variable in main_init() to false.

       -autorescue 0|1
              Whether to automatically run the newest rescue DAG for the
              given DAG file, if one exists (0 = false, 1 = true).

       -dorescuefrom number
              Forces condor_dagman to run the specified rescue DAG number for
              the given DAG. A value of 0 is the same as not specifying this
              option. Specifying a nonexistent rescue DAG is a fatal error.
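
              The -autorescue and -dorescuefrom rules can be summarized as a
              small selection function. In this Python sketch (the name
              select_rescue_dag is hypothetical), existing rescue DAGs are
              represented only by their numbers, "newest" is assumed to mean
              the highest number, and a nonzero -dorescuefrom is assumed to
              take precedence over -autorescue. It is a model of the rules on
              this page, not condor_dagman's code or its file naming scheme.

```python
from __future__ import annotations


def select_rescue_dag(existing: set[int], autorescue: bool,
                      dorescuefrom: int = 0) -> int | None:
    """Return the rescue DAG number to run, or None to run the original DAG."""
    if dorescuefrom < 0:
        raise ValueError("rescue DAG numbers are non-negative")
    if dorescuefrom != 0:
        # Assumption: an explicit -dorescuefrom overrides -autorescue.
        if dorescuefrom not in existing:
            # A nonexistent rescue DAG is a fatal error.
            raise FileNotFoundError(f"rescue DAG {dorescuefrom} does not exist")
        return dorescuefrom
    # -dorescuefrom 0 behaves as if the option were not given.
    if autorescue and existing:
        return max(existing)  # assumption: newest = highest number
    return None


if __name__ == "__main__":
    found = {1, 2, 3}
    print(select_rescue_dag(found, autorescue=True))                  # 3
    print(select_rescue_dag(found, autorescue=False))                 # None
    print(select_rescue_dag(found, autorescue=True, dorescuefrom=2))  # 2
```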

       -allowversionmismatch
              This optional argument causes condor_dagman to allow a version
              mismatch between condor_dagman itself and the .condor.sub file
              produced by condor_submit_dag (in other words, between
              condor_submit_dag and condor_dagman). WARNING! This option
              should be used only if absolutely necessary. Allowing version
              mismatches can cause subtle problems when running DAGs. (Note
              that, starting with version 7.4.0, condor_dagman no longer
              requires an exact version match between itself and the
              .condor.sub file. Instead, a "minimum compatible version" is
              defined, and any .condor.sub file of that version or newer is
              accepted.)
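
              The "minimum compatible version" rule amounts to an ordered
              comparison of version numbers. This Python sketch compares
              dotted versions numerically, field by field (so 8.10.0 sorts
              after 8.9.11), assumes both versions have the same number of
              fields, and treats -allowversionmismatch as accepting any
              version. The function names and the version numbers in the
              example are illustrative; the actual minimum compatible version
              is not stated on this page.

```python
from __future__ import annotations


def parse_version(text: str) -> tuple[int, ...]:
    """Split a dotted version such as "8.8.17" into integers: (8, 8, 17)."""
    return tuple(int(part) for part in text.split("."))


def submit_file_accepted(submit_version: str, minimum_version: str,
                         allow_mismatch: bool = False) -> bool:
    """Model of the check on the .condor.sub file's version.

    allow_mismatch=True stands in for -allowversionmismatch.
    """
    if allow_mismatch:
        return True
    # Tuples compare element by element, so the comparison is numeric.
    return parse_version(submit_version) >= parse_version(minimum_version)


if __name__ == "__main__":
    minimum = "8.4.0"  # hypothetical minimum compatible version
    print(submit_file_accepted("8.8.17", minimum))        # True
    print(submit_file_accepted("8.2.3", minimum))         # False
    print(submit_file_accepted("8.2.3", minimum, True))   # True
```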

       -DumpRescue
              This optional argument causes condor_dagman to immediately dump
              a Rescue DAG and then exit, instead of actually running the DAG.
              This feature is mainly intended for testing. The Rescue DAG file
              is produced whether or not there are parse errors reading the
              original DAG input file. The name of the file differs if there
              was a parse error.

       -verbose
              (This argument is included only to be passed to
              condor_submit_dag if lazy submit file generation is used for
              nested DAGs.) Causes condor_submit_dag to give verbose error
              messages.

       -force (This argument is included only to be passed to
              condor_submit_dag if lazy submit file generation is used for
              nested DAGs.) Requires condor_submit_dag to overwrite the files
              that it produces, if the files already exist. Note that
              dagman.out will be appended to, not overwritten. If new-style
              rescue DAG mode is in effect, and any new-style rescue DAGs
              exist, the -force flag causes them to be renamed, and the
              original DAG is run. If old-style rescue DAG mode is in effect,
              any existing old-style rescue DAGs are deleted, and the original
              DAG is run. See the HTCondor manual section on Rescue DAGs for
              more information.

       -notification value
              (This argument is included only to be passed to
              condor_submit_dag if lazy submit file generation is used for
              nested DAGs.) Sets the e-mail notification for DAGMan itself.
              This information is used within the HTCondor submit description
              file for DAGMan, which condor_submit_dag produces. The
              notification option is described in the condor_submit manual
              page.

       -suppress_notification
              Causes jobs submitted by condor_dagman to not send email
              notification for events. The same effect can be achieved by
              setting the configuration variable DAGMAN_SUPPRESS_NOTIFICATION
              to True. This command-line option is independent of the
              -notification command-line option, which controls notification
              for the condor_dagman job itself. This flag is generally
              superfluous, as DAGMAN_SUPPRESS_NOTIFICATION defaults to True.

       -dont_suppress_notification
              Causes jobs submitted by condor_dagman to defer to the content
              of the submit description file when deciding whether to send
              email notification for events. The same effect can be achieved
              by setting the configuration variable
              DAGMAN_SUPPRESS_NOTIFICATION to False. This command-line flag is
              independent of the -notification command-line option, which
              controls notification for the condor_dagman job itself. If both
              -dont_suppress_notification and -suppress_notification are
              specified on the same command line, the last one given is used.
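
              The rule for -suppress_notification and
              -dont_suppress_notification can be sketched in Python: the last
              of the two flags on the command line wins, and
              DAGMAN_SUPPRESS_NOTIFICATION (default True) applies when neither
              is given. The function name is hypothetical, and the sketch
              assumes a command-line flag overrides the configuration
              variable. -notification is deliberately not consulted, because
              it controls only the condor_dagman job itself.

```python
from __future__ import annotations

from collections.abc import Sequence


def suppress_node_notification(argv: Sequence[str],
                               config_value: bool = True) -> bool:
    """Decide whether node jobs have their email notification suppressed.

    config_value stands for DAGMAN_SUPPRESS_NOTIFICATION (default True).
    """
    result = config_value
    for arg in argv:
        # Later flags overwrite earlier ones, so the last one given wins.
        if arg == "-suppress_notification":
            result = True
        elif arg == "-dont_suppress_notification":
            result = False
    return result


if __name__ == "__main__":
    print(suppress_node_notification([]))                                # True
    print(suppress_node_notification(["-dont_suppress_notification"]))   # False
    print(suppress_node_notification(["-dont_suppress_notification",
                                      "-suppress_notification"]))        # True
```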

       -dagman DagmanExecutable
              (This argument is included only to be passed to
              condor_submit_dag if lazy submit file generation is used for
              nested DAGs.) Allows the specification of an alternate
              condor_dagman executable to be used instead of the one found in
              the user's path. This must be a fully qualified path.

       -outfile_dir directory
              (This argument is included only to be passed to
              condor_submit_dag if lazy submit file generation is used for
              nested DAGs.) Specifies the directory in which the .dagman.out
              file will be written. The directory may be specified relative
              to the current working directory at the time condor_submit_dag
              is executed, or with an absolute path. Without this option, the
              .dagman.out file is placed in the same directory as the first
              DAG input file listed on the command line.

       -update_submit
              (This argument is included only to be passed to
              condor_submit_dag if lazy submit file generation is used for
              nested DAGs.) This optional argument causes an existing
              .condor.sub file to not be treated as an error; rather, the
              .condor.sub file is overwritten, but the existing values of
              -maxjobs, -maxidle, -maxpre, and -maxpost are preserved.

       -import_env
              (This argument is included only to be passed to
              condor_submit_dag if lazy submit file generation is used for
              nested DAGs.) This optional argument causes condor_submit_dag to
              import the current environment into the environment command of
              the .condor.sub file it generates.

       -priority number
              Sets the minimum job priority of node jobs submitted and
              running under this condor_dagman job.

       -dont_use_default_node_log
              This option is disabled as of HTCondor version 8.3.1. Tells
              condor_dagman to use the file specified by the job ClassAd
              attribute UserLog to monitor job status. If this command-line
              argument is used, then the job event log file cannot be defined
              with a macro.

       -DontAlwaysRunPost
              This option causes condor_dagman to not run the POST script of
              a node if the PRE script fails. (This was the default behavior
              prior to HTCondor version 7.7.2, and is again the default
              behavior from version 8.5.4 onwards.)

       -AlwaysRunPost
              This option causes condor_dagman to always run the POST script
              of a node, even if the PRE script fails. (This was the default
              behavior for HTCondor version 7.7.2 through version 8.5.3.)

       -DoRecovery
              Causes condor_dagman to start in recovery mode. This means that
              it reads the relevant job user log(s) and catches up to the
              given DAG's previous state before submitting any new jobs.

       -dag filename
              filename is the name of the DAG input file that is set as an
              argument to condor_submit_dag, and passed to condor_dagman.

EXIT STATUS
       condor_dagman will exit with a status value of 0 (zero) upon success,
       and it will exit with the value 1 (one) upon failure.

EXAMPLES
       condor_dagman is normally not run directly, but submitted as an
       HTCondor job by running condor_submit_dag. See the condor_submit_dag
       manual page for examples.

AUTHOR
       HTCondor Team

COPYRIGHT
       1990-2022, Center for High Throughput Computing, Computer Sciences
       Department, University of Wisconsin-Madison, Madison, WI, US. Licensed
       under the Apache License, Version 2.0.

8.8                          Jun 13, 2022                CONDOR_DAGMAN(1)