CONDOR_DAGMAN(1) HTCondor Manual CONDOR_DAGMAN(1)

NAME
       condor_dagman - HTCondor Manual

       meta scheduler of the jobs submitted as the nodes of a DAG or DAGs

SYNOPSIS
       condor_dagman -f -t -l . -help

       condor_dagman -version

       condor_dagman -f -l . -csdversion version_string [-debug level]
       [-maxidle NumberOfProcs] [-maxjobs NumberOfClusters]
       [-maxpre NumberOfPreScripts] [-maxpost NumberOfPostScripts]
       [-noeventchecks] [-allowlogerror] [-usedagdir] -lockfile filename
       [-waitfordebug] [-autorescue 0|1] [-dorescuefrom number]
       [-allowversionmismatch] [-DumpRescue] [-verbose] [-force]
       [-notification value] [-suppress_notification]
       [-dont_suppress_notification] [-dagman DagmanExecutable]
       [-outfile_dir directory] [-update_submit] [-import_env]
       [-priority number] [-dont_use_default_node_log]
       [-DontAlwaysRunPost] [-AlwaysRunPost] [-DoRecovery]
       -dag dag_file [-dag dag_file_2 ... -dag dag_file_n]

DESCRIPTION
       condor_dagman is a meta scheduler for the HTCondor jobs within a DAG
       (directed acyclic graph), or within multiple DAGs. In typical usage,
       a submitter of jobs that are organized into a DAG submits the DAG
       using condor_submit_dag. condor_submit_dag does error checking on
       aspects of the DAG and then submits condor_dagman as an HTCondor
       job. condor_dagman uses log files to coordinate the further
       submission of the jobs within the DAG.

       All command line arguments to the DaemonCore library functions work
       for condor_dagman. When invoked from the command line, condor_dagman
       requires the arguments -f -l . to appear first on the command line,
       to be processed by DaemonCore. The -csdversion argument must also be
       specified; at start up, condor_dagman checks for a version mismatch
       with the condor_submit_dag version given in this argument. The -t
       argument must also be present for the -help option, so that output
       is sent to the terminal.

       Arguments to condor_dagman are either automatically set by
       condor_submit_dag or they are specified as command-line arguments to
       condor_submit_dag and passed on to condor_dagman. The method by which
       each argument is set is given in its description below.

       condor_dagman can run multiple, independent DAGs. This is done by
       specifying multiple -dag arguments. To do so, pass multiple DAG
       input files as command-line arguments to condor_submit_dag.
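
       The ordering rules above can be illustrated with a short Python
       sketch that assembles an argument vector for a direct invocation.
       The helper build_dagman_argv and its parameters are hypothetical;
       in normal use condor_submit_dag writes these arguments itself. The
       sketch encodes only what this page states: -f -l . first, a
       -csdversion string, a -lockfile, and one -dag per DAG input file.
       The version string used below is a placeholder.

```python
from typing import List, Optional


def build_dagman_argv(csdversion: str, lockfile: str, dag_files: List[str],
                      debug: Optional[int] = None) -> List[str]:
    """Hypothetical helper: assemble a condor_dagman argument vector.

    Only flags documented on this manual page are used.
    """
    if not dag_files:
        raise ValueError("at least one DAG input file is required")
    # DaemonCore arguments must appear first on the command line.
    argv = ["condor_dagman", "-f", "-l", "."]
    # Version of condor_submit_dag, checked by condor_dagman at start up.
    argv += ["-csdversion", csdversion]
    if debug is not None:
        argv += ["-debug", str(debug)]
    argv += ["-lockfile", lockfile]
    # Multiple independent DAGs: one -dag argument per DAG input file.
    for dag in dag_files:
        argv += ["-dag", dag]
    return argv


if __name__ == "__main__":
    print(build_dagman_argv("placeholder-version", "a.dag.lock",
                            ["a.dag", "b.dag"]))
```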

       Debugging output may be obtained by using the -debug level option.
       The level values and the output they produce are as follows:

       • level = 0; never produce output, except for usage info

       • level = 1; very quiet, output severe errors

       • level = 2; normal output, errors and warnings

       • level = 3; output errors, as well as all warnings

       • level = 4; internal debugging output

       • level = 5; internal debugging output; outer loop debugging

       • level = 6; internal debugging output; inner loop debugging; output
         DAG input file lines as they are parsed

       • level = 7; internal debugging output; rarely used; output DAG input
         file lines as they are parsed

OPTIONS
       -help  Display usage information and exit.

       -version
              Display version information and exit.

       -debug level
              An integer level of debugging output. level is an integer,
              with values of 0-7 inclusive, where 7 is the most verbose
              output. This option is given to condor_submit_dag and passed
              on to condor_dagman; if it is not specified, the level
              defaults to 3.
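
              The level semantics can be summarized as a lookup table. The
              Python sketch below, with hypothetical names, returns the
              default of 3 when no level is given; rejecting values outside
              0-7 is an assumption of the sketch, not documented behavior.

```python
from typing import Optional

# Summary of the documented -debug levels (0 = quietest, 7 = most verbose).
DEBUG_LEVELS = {
    0: "no output, except for usage info",
    1: "very quiet, severe errors only",
    2: "normal output, errors and warnings",
    3: "errors, as well as all warnings",
    4: "internal debugging output",
    5: "internal debugging output; outer loop debugging",
    6: "internal debugging output; inner loop debugging; DAG lines as parsed",
    7: "internal debugging output; rarely used; DAG lines as parsed",
}

DEFAULT_DEBUG_LEVEL = 3  # used when -debug is not given to condor_submit_dag


def effective_debug_level(requested: Optional[int] = None) -> int:
    """Return the level that applies; reject values outside 0-7 (assumed)."""
    if requested is None:
        return DEFAULT_DEBUG_LEVEL
    if requested not in DEBUG_LEVELS:
        raise ValueError(f"debug level must be 0-7, got {requested}")
    return requested
```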

       -maxidle NumberOfProcs
              Sets the maximum number of idle procs allowed before
              condor_dagman stops submitting more node jobs. Note that for
              this argument, each individual proc within a cluster counts
              as one toward the limit, which is inconsistent with -maxjobs.
              As idle procs start to run, condor_dagman resumes submitting
              jobs once the number of idle procs falls below the specified
              limit. NumberOfProcs is a non-negative integer. If this option
              is omitted, the number of idle procs is limited by the
              configuration variable DAGMAN_MAX_JOBS_IDLE (see the
              Configuration File Entries for DAGMan section of the HTCondor
              Administrator's Manual), which defaults to 1000. To disable
              this limit, set NumberOfProcs to 0. Note that submit
              description files that queue multiple procs can cause the
              NumberOfProcs limit to be exceeded. Setting queue 5000 in the
              submit description file, where -maxidle is set to 250, will
              result in a cluster of 5000 new procs being submitted to the
              condor_schedd, not 250. In this case, condor_dagman will
              resume submitting jobs when the number of idle procs falls
              below 250.
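
              The interaction between -maxidle and multi-proc clusters can
              be modeled in a few lines of Python. This is a simplified
              sketch with hypothetical function names, not condor_dagman's
              implementation: it assumes the limit is checked only before
              each submission and that no job starts running meanwhile,
              which shows how a single queue 5000 cluster overshoots a
              limit of 250.

```python
from typing import List


def may_submit(idle_procs: int, max_idle: int) -> bool:
    """Return True if another node job may be submitted.

    Every idle proc counts toward the limit; max_idle == 0 disables it.
    """
    return max_idle == 0 or idle_procs < max_idle


def submit_nodes(node_proc_counts: List[int], max_idle: int) -> int:
    """Submit whole clusters while allowed; return the resulting idle procs.

    Assumes no job starts running during the loop, so every submitted
    proc stays idle.
    """
    idle = 0
    for procs in node_proc_counts:
        if not may_submit(idle, max_idle):
            break  # wait until idle procs fall below the limit
        idle += procs  # the whole cluster is submitted at once
    return idle


# One node queuing 5000 procs, followed by more nodes, with -maxidle 250:
print(submit_nodes([5000, 1, 1], max_idle=250))  # 5000
```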

       -maxjobs NumberOfClusters
              Sets the maximum number of clusters within the DAG that will
              be submitted to HTCondor at one time. Note that for this
              argument, each cluster counts as one job, no matter how many
              individual procs are in the cluster. NumberOfClusters is a
              non-negative integer. If this option is omitted, the number of
              clusters is limited by the configuration variable
              DAGMAN_MAX_JOBS_SUBMITTED (see the Configuration File Entries
              for DAGMan section of the HTCondor Administrator's Manual),
              which defaults to 0 (unlimited).
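
              For contrast, a sketch of the -maxjobs accounting under the
              same simplifying assumptions (hypothetical names, no cluster
              finishes during the loop): each cluster counts as one job
              however many procs it queues, and 0 means unlimited.

```python
from typing import List


def clusters_submitted(node_proc_counts: List[int], max_jobs: int) -> List[int]:
    """Return the proc counts of the clusters submitted before the limit hits.

    Each cluster counts as one job toward max_jobs; 0 means no limit.
    Assumes no cluster finishes while the loop runs.
    """
    submitted: List[int] = []
    for procs in node_proc_counts:
        if max_jobs != 0 and len(submitted) >= max_jobs:
            break  # limit reached, regardless of how many procs were queued
        submitted.append(procs)
    return submitted


# Clusters of 5000 and 1 procs both fit under -maxjobs 2:
print(clusters_submitted([5000, 1, 7], max_jobs=2))  # [5000, 1]
```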

       -maxpre NumberOfPreScripts
              Sets the maximum number of PRE scripts within the DAG that may
              be running at one time. NumberOfPreScripts is a non-negative
              integer. If this option is omitted, the number of PRE scripts
              is limited by the configuration variable DAGMAN_MAX_PRE_SCRIPTS
              (see the Configuration File Entries for DAGMan section of the
              HTCondor Administrator's Manual), which defaults to 20.

       -maxpost NumberOfPostScripts
              Sets the maximum number of POST scripts within the DAG that
              may be running at one time. NumberOfPostScripts is a
              non-negative integer. If this option is omitted, the number of
              POST scripts is limited by the configuration variable
              DAGMAN_MAX_POST_SCRIPTS (see the Configuration File Entries
              for DAGMan section of the HTCondor Administrator's Manual),
              which defaults to 20.
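
              Both -maxpre and -maxpost act as a cap on concurrently
              running scripts. The Python sketch below models that cap with
              a hypothetical ScriptThrottle class using the documented
              default of 20 and first-come, first-served release of
              deferred scripts; condor_dagman's actual ordering of deferred
              scripts may differ.

```python
from collections import deque
from typing import Deque, List


class ScriptThrottle:
    """Hypothetical model of the -maxpre / -maxpost concurrency limit."""

    def __init__(self, limit: int = 20) -> None:
        self.limit = limit
        self.running: List[str] = []
        self.waiting: Deque[str] = deque()

    def request(self, node: str) -> None:
        """Start the node's script now if a slot is free, else defer it."""
        if len(self.running) < self.limit:
            self.running.append(node)
        else:
            self.waiting.append(node)

    def finished(self, node: str) -> None:
        """Free the node's slot and start the oldest deferred script."""
        self.running.remove(node)
        if self.waiting:
            self.running.append(self.waiting.popleft())


pre = ScriptThrottle(limit=2)  # as if -maxpre 2 were given
for name in ["A", "B", "C"]:
    pre.request(name)
print(pre.running, list(pre.waiting))  # ['A', 'B'] ['C']
```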

       -noeventchecks
              This argument is no longer used; it is now ignored. Its
              functionality is now implemented by the DAGMAN_ALLOW_EVENTS
              configuration variable.

       -allowlogerror
              As of version 8.5.5, this argument is no longer supported, and
              setting it will generate a warning.

       -usedagdir
              This optional argument causes condor_dagman to run each
              specified DAG as if the directory containing that DAG file
              were the current working directory. This option is most useful
              when running multiple DAGs in a single condor_dagman.
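
              A minimal Python sketch of the directory each DAG would be
              processed in with -usedagdir, assuming relative DAG paths are
              resolved against the directory from which the DAGs were
              submitted. The function name is hypothetical; without
              -usedagdir, every DAG would simply use the submit directory.

```python
from pathlib import PurePosixPath
from typing import Dict, List


def dag_working_dirs(submit_dir: str, dag_files: List[str]) -> Dict[str, str]:
    """Map each DAG input file to the directory it would be processed in."""
    base = PurePosixPath(submit_dir)
    # Joining an absolute path onto base yields the absolute path unchanged.
    return {dag: str((base / dag).parent) for dag in dag_files}


print(dag_working_dirs("/home/user", ["one/a.dag", "/data/two/b.dag"]))
# {'one/a.dag': '/home/user/one', '/data/two/b.dag': '/data/two'}
```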

       -lockfile filename
              Names the file created and used as a lock file. The lock file
              prevents two instances of the same DAG, as defined by a DAG
              input file, from running at the same time. A default lock file
              ending with the suffix .dag.lock is passed to condor_dagman by
              condor_submit_dag.
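
              The purpose of the lock file can be sketched in Python with
              an atomic, exclusive file creation: the first attempt to
              create name.dag.lock succeeds and a second attempt for the
              same DAG fails. This illustrates the concept only; the real
              lock file's contents, and how condor_dagman handles a lock
              left behind by a process that has exited, are not described
              on this page. The helper names are hypothetical.

```python
import os


def default_lockfile(dag_file: str) -> str:
    """Default lock name: the DAG file name plus .lock (e.g. x.dag.lock)."""
    return dag_file + ".lock"


def try_acquire_lock(lock_path: str) -> bool:
    """Create the lock file atomically; return False if it already exists."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another instance of this DAG appears to be running
    with os.fdopen(fd, "w") as lock:
        lock.write(str(os.getpid()))  # record the owner for diagnostics
    return True
```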

       -waitfordebug
              This optional argument causes condor_dagman to wait at startup
              until someone attaches to the process with a debugger and sets
              the wait_for_debug variable in main_init() to false.

       -autorescue 0|1
              Whether to automatically run the newest rescue DAG for the
              given DAG file, if one exists (0 = false, 1 = true).

       -dorescuefrom number
              Forces condor_dagman to run the specified rescue DAG number for
              the given DAG. A value of 0 is the same as not specifying this
              option. Specifying a nonexistent rescue DAG is a fatal error.
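
              How -autorescue and -dorescuefrom could select a rescue DAG
              is sketched below in Python. The sketch assumes rescue DAGs
              are named by appending .rescue001, .rescue002, and so on to
              the DAG file name, and that a nonzero -dorescuefrom takes
              precedence over -autorescue; both points, and the function
              name, are assumptions rather than statements from this page.

```python
import re
from typing import Dict, Iterable, Optional


def select_rescue_dag(dag_file: str, existing: Iterable[str],
                      autorescue: bool, dorescuefrom: int = 0) -> Optional[str]:
    """Return the rescue DAG to run, or None to run the original DAG."""
    # Assumed naming scheme: <dag_file>.rescueNNN
    pattern = re.compile(re.escape(dag_file) + r"\.rescue(\d{3})$")
    rescues: Dict[int, str] = {}
    for name in existing:
        match = pattern.match(name)
        if match:
            rescues[int(match.group(1))] = name
    if dorescuefrom != 0:  # a value of 0 is the same as not specifying it
        if dorescuefrom not in rescues:
            # Specifying a nonexistent rescue DAG is a fatal error.
            raise FileNotFoundError(f"no rescue DAG number {dorescuefrom}")
        return rescues[dorescuefrom]
    if autorescue and rescues:
        return rescues[max(rescues)]  # the newest rescue DAG
    return None
```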

       -allowversionmismatch
              This optional argument causes condor_dagman to allow a version
              mismatch between condor_dagman itself and the .condor.sub file
              produced by condor_submit_dag (in other words, between
              condor_submit_dag and condor_dagman). WARNING! This option
              should be used only if absolutely necessary. Allowing version
              mismatches can cause subtle problems when running DAGs. (Note
              that, starting with version 7.4.0, condor_dagman no longer
              requires an exact version match between itself and the
              .condor.sub file. Instead, a "minimum compatible version" is
              defined, and any .condor.sub file of that version or newer is
              accepted.)
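
              The minimum-compatible-version rule amounts to a version
              comparison. This page does not state the actual minimum, so
              MIN_COMPATIBLE below is a hypothetical placeholder, and the
              function names are illustrative only.

```python
from typing import Tuple

# Hypothetical placeholder; the real minimum is defined inside condor_dagman.
MIN_COMPATIBLE = "8.8.0"


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a version such as '8.8.12' into (8, 8, 12) for comparison."""
    return tuple(int(part) for part in text.split("."))


def submit_file_accepted(submit_version: str, allow_mismatch: bool = False,
                         minimum: str = MIN_COMPATIBLE) -> bool:
    """Accept a .condor.sub file of the minimum version or newer.

    allow_mismatch models -allowversionmismatch, which skips the check.
    """
    if allow_mismatch:
        return True
    return parse_version(submit_version) >= parse_version(minimum)
```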

       -DumpRescue
              This optional argument causes condor_dagman to immediately dump
              a Rescue DAG and then exit, as opposed to actually running the
              DAG. This feature is mainly intended for testing. The Rescue
              DAG file is produced whether or not there are parse errors
              reading the original DAG input file. The name of the file
              differs if there was a parse error.

       -verbose
              (This argument is included only to be passed to
              condor_submit_dag if lazy submit file generation is used for
              nested DAGs.) Causes condor_submit_dag to give verbose error
              messages.

       -force (This argument is included only to be passed to
              condor_submit_dag if lazy submit file generation is used for
              nested DAGs.) Requires condor_submit_dag to overwrite the files
              that it produces, if the files already exist. Note that
              dagman.out will be appended to, not overwritten. If new-style
              rescue DAG mode is in effect, and any new-style rescue DAGs
              exist, the -force flag will cause them to be renamed, and the
              original DAG will be run. If old-style rescue DAG mode is in
              effect, any existing old-style rescue DAGs will be deleted, and
              the original DAG will be run. See the HTCondor manual section
              on Rescue DAGs for more information.

       -notification value
              This argument is only included to be passed to
              condor_submit_dag if lazy submit file generation is used for
              nested DAGs. Sets the e-mail notification for DAGMan itself.
              This information will be used within the HTCondor submit
              description file for DAGMan. This file is produced by
              condor_submit_dag. The notification option is described in the
              condor_submit manual page.

       -suppress_notification
              Causes jobs submitted by condor_dagman to not send email
              notification for events. The same effect can be achieved by
              setting the configuration variable DAGMAN_SUPPRESS_NOTIFICATION
              to True. This command line option is independent of the
              -notification command line option, which controls notification
              for the condor_dagman job itself. This flag is generally
              superfluous, as DAGMAN_SUPPRESS_NOTIFICATION defaults to True.

       -dont_suppress_notification
              Causes jobs submitted by condor_dagman to defer to the content
              of the submit description file when deciding whether to send
              email notification for events. The same effect can be achieved
              by setting the configuration variable
              DAGMAN_SUPPRESS_NOTIFICATION to False. This command line flag
              is independent of the -notification command line option, which
              controls notification for the condor_dagman job itself. If both
              -dont_suppress_notification and -suppress_notification are
              specified within the same command line, the last one given
              takes effect.
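
              The precedence between the two notification flags can be
              expressed as a short Python sketch: the last flag on the
              command line wins, and otherwise DAGMAN_SUPPRESS_NOTIFICATION
              (default True) applies. That a flag overrides the
              configuration value is an assumption, as is the function
              name.

```python
from typing import List


def suppress_node_notification(argv: List[str],
                               config_default: bool = True) -> bool:
    """Decide whether node jobs have their email notification suppressed.

    The last of -suppress_notification / -dont_suppress_notification wins;
    without either flag, the DAGMAN_SUPPRESS_NOTIFICATION value applies.
    The -notification option is ignored: it only affects the DAGMan job.
    """
    result = config_default
    for arg in argv:
        if arg == "-suppress_notification":
            result = True
        elif arg == "-dont_suppress_notification":
            result = False
    return result
```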

       -dagman DagmanExecutable
              (This argument is included only to be passed to
              condor_submit_dag if lazy submit file generation is used for
              nested DAGs.) Allows the specification of an alternate
              condor_dagman executable to be used instead of the one found in
              the user's path. This must be a fully qualified path.

       -outfile_dir directory
              (This argument is included only to be passed to
              condor_submit_dag if lazy submit file generation is used for
              nested DAGs.) Specifies the directory in which the .dagman.out
              file will be written. The directory may be specified relative
              to the current working directory as condor_submit_dag is
              executed, or specified with an absolute path. Without this
              option, the .dagman.out file is placed in the same directory as
              the first DAG input file listed on the command line.
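
              A Python sketch of where the .dagman.out file ends up. It
              assumes the file is named after the first DAG input file
              with .dagman.out appended (for example,
              diamond.dag.dagman.out); the function name is hypothetical.

```python
from pathlib import PurePosixPath
from typing import List, Optional


def dagman_out_path(dag_files: List[str], outfile_dir: Optional[str] = None,
                    cwd: str = ".") -> str:
    """Return the location of the .dagman.out file.

    Without -outfile_dir, it sits next to the first DAG input file; a
    relative -outfile_dir is taken relative to condor_submit_dag's cwd.
    """
    first = PurePosixPath(dag_files[0])
    name = first.name + ".dagman.out"  # assumed naming scheme
    if outfile_dir is None:
        return str(first.parent / name)
    # An absolute outfile_dir replaces cwd when joined.
    return str(PurePosixPath(cwd) / outfile_dir / name)
```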

       -update_submit
              (This argument is included only to be passed to
              condor_submit_dag if lazy submit file generation is used for
              nested DAGs.) This optional argument causes an existing
              .condor.sub file to not be treated as an error; rather, the
              .condor.sub file will be overwritten, but the existing values
              of -maxjobs, -maxidle, -maxpre, and -maxpost will be preserved.

       -import_env
              (This argument is included only to be passed to
              condor_submit_dag if lazy submit file generation is used for
              nested DAGs.) This optional argument causes condor_submit_dag
              to import the current environment into the environment command
              of the .condor.sub file it generates.

       -priority number
              Sets the minimum job priority of node jobs submitted and
              running under this condor_dagman job.

       -dont_use_default_node_log
              This option is disabled as of HTCondor version 8.3.1. Tells
              condor_dagman to use the file specified by the job ClassAd
              attribute UserLog to monitor job status. If this command line
              argument is used, then the job event log file cannot be defined
              with a macro.

       -DontAlwaysRunPost
              This option causes condor_dagman to not run the POST script of
              a node if the PRE script fails. (This was the default behavior
              prior to HTCondor version 7.7.2, and is again the default
              behavior from version 8.5.4 onwards.)

       -AlwaysRunPost
              This option causes condor_dagman to always run the POST script
              of a node, even if the PRE script fails. (This was the default
              behavior for HTCondor version 7.7.2 through version 8.5.3.)
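
              The effect of these two flags on a node can be summarized as
              the sequence of steps DAGMan carries out. The Python sketch
              below is a simplification with a hypothetical function name:
              it assumes a node with a PRE script, a job, and a POST script,
              and shows that a failed PRE script skips the job and, by
              default (8.5.4 onwards), the POST script as well.

```python
from typing import List


def node_steps(pre_ok: bool, always_run_post: bool = False) -> List[str]:
    """Return the steps run for a node with PRE script, job, and POST script.

    always_run_post models -AlwaysRunPost; False models -DontAlwaysRunPost,
    the default from version 8.5.4 onwards.
    """
    steps = ["PRE"]
    if pre_ok:
        steps += ["JOB", "POST"]  # normal path: job, then POST script
    elif always_run_post:
        steps.append("POST")  # job skipped, POST still runs
    return steps
```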

       -DoRecovery
              Causes condor_dagman to start in recovery mode. This means that
              it reads the relevant job user log(s) and catches up to the
              given DAG's previous state before submitting any new jobs.

       -dag filename
              filename is the name of the DAG input file that is set as an
              argument to condor_submit_dag, and passed to condor_dagman.

EXIT STATUS
       condor_dagman will exit with a status value of 0 (zero) upon
       success, and it will exit with the value 1 (one) upon failure.

EXAMPLES
       condor_dagman is normally not run directly, but submitted as an
       HTCondor job by running condor_submit_dag. See the condor_submit_dag
       manual page for examples.

AUTHOR
       HTCondor Team

COPYRIGHT
       1990-2021, Center for High Throughput Computing, Computer Sciences
       Department, University of Wisconsin-Madison, Madison, WI, US.
       Licensed under the Apache License, Version 2.0.

8.8 Jan 26, 2021 CONDOR_DAGMAN(1)