CONDOR_DAGMAN(1) HTCondor Manual CONDOR_DAGMAN(1)

NAME
condor_dagman - HTCondor Manual

meta scheduler of the jobs submitted as the nodes of a DAG or DAGs

SYNOPSIS
condor_dagman -f -t -l . -help

condor_dagman -version

condor_dagman -f -l . -csdversion VersionString [-debug level]
[-dryrun] [-maxidle NumberOfProcs] [-maxjobs NumberOfClusters]
[-maxpre NumberOfPreScripts] [-maxpost NumberOfPostScripts]
[-maxhold NumberOfHoldScripts] [-usedagdir] -lockfile filename
[-waitfordebug] [-autorescue 0|1] [-dorescuefrom number]
[-load_save filename] [-allowversionmismatch] [-DumpRescue] [-verbose]
[-force] [-notification value] [-suppress_notification]
[-dont_suppress_notification] [-dagman DagmanExecutable]
[-outfile_dir directory] [-update_submit] [-import_env]
[-include_env Variables] [-insert_env Key=Value] [-priority number]
[-DontAlwaysRunPost] [-AlwaysRunPost] [-DoRecovery] [-dot]
-dag dag_file [-dag dag_file_2 ... -dag dag_file_n]

DESCRIPTION
condor_dagman is a meta scheduler for the HTCondor jobs within a DAG
(directed acyclic graph) or multiple DAGs. In typical usage, a
submitter of jobs that are organized into a DAG submits the DAG using
condor_submit_dag. condor_submit_dag does error checking on aspects of
the DAG and then submits condor_dagman as an HTCondor job.
condor_dagman uses log files to coordinate the further submission of
the jobs within the DAG.

All command line arguments to the DaemonCore library functions work for
condor_dagman. When invoked from the command line, condor_dagman
requires the arguments -f -l . to appear first on the command line, to
be processed by DaemonCore. The -csdversion argument must also be
specified; at start up, condor_dagman checks for a version mismatch
with the condor_submit_dag version in this argument. The -t argument
must also be present for the -help option, so that output is sent to
the terminal.

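The ordering rules above can be sketched as a small helper that
assembles an argument list. This is only an illustration: the function
name build_dagman_argv, the placement of the optional arguments, and the
file names and version string in the usage note are hypothetical, and in
normal use condor_submit_dag generates this command line itself.

```python
def build_dagman_argv(csd_version, dag_files, lockfile, extra_args=()):
    """Assemble a condor_dagman argument list (hypothetical helper)."""
    if not dag_files:
        raise ValueError("at least one DAG input file is required")
    # -f -l . must appear first so that DaemonCore processes them.
    argv = ["condor_dagman", "-f", "-l", "."]
    # The condor_submit_dag version, checked at start up for a mismatch.
    argv += ["-csdversion", csd_version]
    # Optional arguments such as -debug or -maxidle (placement is an
    # assumption of this sketch).
    argv += list(extra_args)
    argv += ["-lockfile", lockfile]
    # One -dag argument per DAG input file.
    for dag in dag_files:
        argv += ["-dag", dag]
    return argv
```

For example, build_dagman_argv("VERSION_STRING", ["a.dag", "b.dag"],
"a.dag.lock", ["-debug", "3"]) returns a list that starts with
condor_dagman -f -l . and ends with -dag a.dag -dag b.dag.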
Arguments to condor_dagman are either automatically set by
condor_submit_dag or they are specified as command-line arguments to
condor_submit_dag and passed on to condor_dagman. The method by which
the arguments are set is given in their description below.

condor_dagman can run multiple, independent DAGs. This is done by
specifying multiple -dag arguments. Pass multiple DAG input files as
command-line arguments to condor_submit_dag.

Debugging output may be obtained by using the -debug level option. The
level values and the output they produce are:

• level = 0; never produce output, except for usage info

• level = 1; very quiet, output severe errors

• level = 2; normal output, errors and warnings

• level = 3; output errors, as well as all warnings

• level = 4; internal debugging output

• level = 5; internal debugging output; outer loop debugging

• level = 6; internal debugging output; inner loop debugging; output
  DAG input file lines as they are parsed

• level = 7; internal debugging output; rarely used; output DAG input
  file lines as they are parsed

OPTIONS
-help
    Display usage information and exit.

-version
    Display version information and exit.

-csdversion VersionString
    Sets the version of the condor_submit_dag command used to submit
    the DAGMan workflow. Used to help identify version mismatches.

-debug level
    Sets the level of debugging output. level is an integer from 0 to
    7 inclusive, where 7 is the most verbose. This command-line option
    to condor_submit_dag is passed to condor_dagman; if it is not
    given, the level defaults to 3.

-dryrun
    Tells DAGMan to do a dry run, in which the DAG is run but node
    jobs are not actually submitted.

-maxidle NumberOfProcs
    Sets the maximum number of idle procs allowed before condor_dagman
    stops submitting more node jobs. If this option is omitted, the
    number of idle procs is limited by the configuration variable
    DAGMAN_MAX_JOBS_IDLE, which defaults to 1000. To disable this
    limit, set NumberOfProcs to 0. NumberOfProcs can be exceeded if a
    node's job has a queue command with more than one proc to queue.
    For example, queue 500 will submit all procs even if NumberOfProcs
    is 250. In this case DAGMan will wait for the number of idle procs
    to fall below 250 before submitting more jobs to the
    condor_schedd.

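The -maxidle behavior, including the case where one node's cluster
pushes the idle count past the limit, can be illustrated with a toy
model. This is a simplified sketch and not DAGMan's actual submission
loop: the function name submit_ready_nodes is hypothetical, and the
model assumes every newly queued proc counts as idle immediately.

```python
def submit_ready_nodes(ready_nodes, idle_procs, max_idle):
    """Toy model of -maxidle throttling (not DAGMan's real code).

    ready_nodes is a list of (node_name, procs_in_cluster) pairs in
    submission order. Returns (submitted_names, idle_procs_afterwards).
    """
    submitted = []
    for name, procs in ready_nodes:
        # A limit of 0 disables throttling; otherwise stop submitting
        # once the idle count has reached the limit.
        if max_idle != 0 and idle_procs >= max_idle:
            break
        submitted.append(name)
        # The whole cluster is queued, even if that exceeds the limit.
        idle_procs += procs
    return submitted, idle_procs
```

With a limit of 250 and no idle procs, submit_ready_nodes([("A", 500),
("B", 10)], 0, 250) submits only node A and leaves 500 idle procs; node
B is held back until the idle count falls below 250.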
-maxjobs NumberOfClusters
    Sets the maximum number of clusters within the DAG that will be
    submitted to HTCondor at one time. Each cluster is associated with
    one node job no matter how many individual procs are in the
    cluster. NumberOfClusters is a non-negative integer. If this
    option is omitted, the number of clusters is limited by the
    configuration variable DAGMAN_MAX_JOBS_SUBMITTED, which defaults
    to 0 (unlimited).

-maxpre NumberOfPreScripts
    Sets the maximum number of PRE scripts within the DAG that may be
    running at one time. NumberOfPreScripts is a non-negative integer.
    If this option is omitted, the number of PRE scripts is limited by
    the configuration variable DAGMAN_MAX_PRE_SCRIPTS, which defaults
    to 20.

-maxpost NumberOfPostScripts
    Sets the maximum number of POST scripts within the DAG that may be
    running at one time. NumberOfPostScripts is a non-negative
    integer. If this option is omitted, the number of POST scripts is
    limited by the configuration variable DAGMAN_MAX_POST_SCRIPTS,
    which defaults to 20.

-maxhold NumberOfHoldScripts
    Sets the maximum number of HOLD scripts within the DAG that may be
    running at one time. NumberOfHoldScripts is a non-negative
    integer. If this option is omitted, the number of HOLD scripts is
    limited by the configuration variable DAGMAN_MAX_HOLD_SCRIPTS,
    which defaults to 0 (unlimited).

-usedagdir
    This optional argument causes condor_dagman to run each specified
    DAG as if the directory containing that DAG file were the current
    working directory. This option is most useful when running
    multiple DAGs in a single condor_dagman.

-lockfile filename
    Names the file created and used as a lock file. The lock file
    prevents execution of two of the same DAG, as defined by a DAG
    input file. A default lock file ending with the suffix .dag.lock
    is passed to condor_dagman by condor_submit_dag.

-waitfordebug
    This optional argument causes condor_dagman to wait at startup
    until someone attaches to the process with a debugger and sets the
    wait_for_debug variable in main_init() to false.

-autorescue 0|1
    Whether to automatically run the newest rescue DAG for the given
    DAG file, if one exists (0 = false, 1 = true).

-dorescuefrom number
    Forces condor_dagman to run the specified rescue DAG number for
    the given DAG. A value of 0 is the same as not specifying this
    option. Specifying a nonexistent rescue DAG is a fatal error.

-load_save filename
    Specifies a file with saved DAG progress from which to re-run the
    DAG. If filename is given as a path, DAGMan will attempt to read
    the file at that path. Otherwise, DAGMan will check for the file
    in the DAG's save_files sub-directory.

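The lookup rule for -load_save can be sketched as follows. This is an
illustration under stated assumptions, not DAGMan's implementation: the
function name resolve_save_file is hypothetical, "given as a path" is
taken to mean that the name contains a directory component, and the
save_files sub-directory is assumed to sit inside the directory that
holds the DAG file.

```python
import os


def resolve_save_file(filename, dag_dir):
    """Return the location where a save file would be looked up."""
    # Assumption: a name with a directory component counts as a path
    # and is used exactly as given.
    if os.path.dirname(filename):
        return filename
    # Otherwise look in the DAG's save_files sub-directory (assumed to
    # be located under dag_dir).
    return os.path.join(dag_dir, "save_files", filename)
```

Here a bare name such as progress.save resolves under
dag_dir/save_files, while ./saves/progress.save is returned unchanged.
Both file names are made up for the example.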
-allowversionmismatch
    This optional argument causes condor_dagman to allow a version
    mismatch between condor_dagman itself and the .condor.sub file
    produced by condor_submit_dag (or, in other words, between
    condor_submit_dag and condor_dagman). WARNING! This option should
    be used only if absolutely necessary. Allowing version mismatches
    can cause subtle problems when running DAGs.

-DumpRescue
    This optional argument causes condor_dagman to immediately dump a
    Rescue DAG and then exit, as opposed to actually running the DAG.
    This feature is mainly intended for testing. The Rescue DAG file
    is produced whether or not there are parse errors reading the
    original DAG input file. The name of the file differs if there was
    a parse error.

-verbose
    (This argument is included only to be passed to condor_submit_dag
    if lazy submit file generation is used for nested DAGs.) Causes
    condor_submit_dag to give verbose error messages.

-force
    (This argument is included only to be passed to condor_submit_dag
    if lazy submit file generation is used for nested DAGs.) Requires
    condor_submit_dag to overwrite the files that it produces, if the
    files already exist. Note that dagman.out will be appended to, not
    overwritten. If new-style rescue DAG mode is in effect, and any
    new-style rescue DAGs exist, the -force flag will cause them to be
    renamed, and the original DAG will be run. If old-style rescue DAG
    mode is in effect, any existing old-style rescue DAGs will be
    deleted, and the original DAG will be run. See the HTCondor manual
    section on Rescue DAGs for more information.

-notification value
    (This argument is included only to be passed to condor_submit_dag
    if lazy submit file generation is used for nested DAGs.) Sets the
    e-mail notification for DAGMan itself. This information will be
    used within the HTCondor submit description file for DAGMan. This
    file is produced by condor_submit_dag. The notification option is
    described in the condor_submit manual page.

-suppress_notification
    Causes jobs submitted by condor_dagman to not send email
    notification for events. The same effect can be achieved by
    setting the configuration variable DAGMAN_SUPPRESS_NOTIFICATION to
    True. This command line option is independent of the -notification
    command line option, which controls notification for the
    condor_dagman job itself. This flag is generally superfluous, as
    DAGMAN_SUPPRESS_NOTIFICATION defaults to True.

-dont_suppress_notification
    Causes jobs submitted by condor_dagman to defer to content within
    the submit description file when deciding to send email
    notification for events. The same effect can be achieved by
    setting the configuration variable DAGMAN_SUPPRESS_NOTIFICATION to
    False. This command line flag is independent of the -notification
    command line option, which controls notification for the
    condor_dagman job itself. If both -dont_suppress_notification and
    -suppress_notification are specified within the same command line,
    the last argument is used.

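The "last argument is used" rule for the two notification flags can be
sketched in a few lines. The function name suppress_notification is
hypothetical, and the sketch assumes that a flag on the command line
takes precedence over the DAGMAN_SUPPRESS_NOTIFICATION configuration
value, which is passed in here as config_default.

```python
def suppress_notification(argv, config_default=True):
    """Decide whether node job email notification is suppressed."""
    # Start from the configuration value (the documented default for
    # DAGMAN_SUPPRESS_NOTIFICATION is True).
    suppress = config_default
    for arg in argv:
        # Each occurrence overrides earlier ones, so the last flag wins.
        if arg == "-suppress_notification":
            suppress = True
        elif arg == "-dont_suppress_notification":
            suppress = False
    return suppress
```

Scanning left to right and overwriting the result keeps the rule simple:
whichever of the two flags appears last determines the outcome.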
-dagman DagmanExecutable
    (This argument is included only to be passed to condor_submit_dag
    if lazy submit file generation is used for nested DAGs.) Allows
    the specification of an alternate condor_dagman executable to be
    used instead of the one found in the user's path. This must be a
    fully qualified path.

-outfile_dir directory
    (This argument is included only to be passed to condor_submit_dag
    if lazy submit file generation is used for nested DAGs.) Specifies
    the directory in which the .dagman.out file will be written. The
    directory may be specified relative to the current working
    directory as condor_submit_dag is executed, or specified with an
    absolute path. Without this option, the .dagman.out file is placed
    in the same directory as the first DAG input file listed on the
    command line.

-update_submit
    (This argument is included only to be passed to condor_submit_dag
    if lazy submit file generation is used for nested DAGs.) This
    optional argument causes an existing .condor.sub file to not be
    treated as an error; rather, the .condor.sub file will be
    overwritten, but the existing values of -maxjobs, -maxidle,
    -maxpre, and -maxpost will be preserved.

-import_env
    (This argument is included only to be passed to condor_submit_dag
    if lazy submit file generation is used for nested DAGs.) This
    optional argument causes condor_submit_dag to import the current
    environment into the environment command of the .condor.sub file
    it generates.

-include_env Variables
    This optional argument takes a comma separated list of environment
    variables to add to the .condor.sub file's getenv environment
    filter, which causes matching environment variables that are found
    to be added to the DAGMan manager job's environment.

-insert_env Key=Value
    This optional argument takes a delimited string of Key=Value pairs
    to explicitly set in the .condor.sub file's environment macro. The
    default delimiter is a semicolon, which can be overridden by
    setting the first character in the string to a valid delimiting
    character. If multiple -insert_env flags contain the same Key,
    then the last occurrence's Value will be set in the DAGMan job's
    environment.

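The -insert_env rules (semicolon delimiter, optional delimiter override
in the first character, last value wins) can be sketched as a small
parser. This is not the parser HTCondor uses: the function name
parse_insert_env is hypothetical, and because the set of valid
delimiting characters is not listed here, the sketch assumes that any
leading character other than a letter, digit, or underscore selects the
delimiter.

```python
def parse_insert_env(values):
    """Merge the strings given to several -insert_env flags."""
    env = {}
    for raw in values:
        delim = ";"  # default delimiter
        # Assumption: a leading character that cannot start a key is
        # treated as a delimiter override.
        if raw and not (raw[0].isalnum() or raw[0] == "_"):
            delim, raw = raw[0], raw[1:]
        for pair in raw.split(delim):
            if not pair:
                continue  # tolerate empty segments, e.g. a trailing delimiter
            key, sep, value = pair.partition("=")
            if not sep or not key:
                raise ValueError("expected Key=Value, got %r" % pair)
            # A later occurrence of the same Key replaces the earlier one.
            env[key] = value
    return env
```

For example, parse_insert_env(["A=1;B=2", "A=3"]) yields A=3 and B=2,
and a string such as "|A=x;y|B=2" uses | as the delimiter so that the
value of A can contain a semicolon.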
-priority number
    Sets the minimum job priority of node jobs submitted and running
    under this condor_dagman job.

-DontAlwaysRunPost
    This option causes condor_dagman to not run the POST script of a
    node if the PRE script fails.

-AlwaysRunPost
    This option causes condor_dagman to always run the POST script of
    a node, even if the PRE script fails.

-DoRecovery
    Causes condor_dagman to start in recovery mode. This means that it
    reads the relevant job user log(s) and catches up to the given
    DAG's previous state before submitting any new jobs.

-dot
    Run condor_dagman up until the point when a DOT file is produced.

-dag filename
    filename is the name of the DAG input file that is set as an
    argument to condor_submit_dag, and passed to condor_dagman.

EXIT STATUS
condor_dagman will exit with a status value of 0 (zero) upon success,
and it will exit with the value 1 (one) upon failure.

EXAMPLES
condor_dagman is normally not run directly, but submitted as an
HTCondor job by running condor_submit_dag. See the condor_submit_dag
manual page for examples.

AUTHOR
HTCondor Team

COPYRIGHT
1990-2023, Center for High Throughput Computing, Computer Sciences
Department, University of Wisconsin-Madison, Madison, WI, US. Licensed
under the Apache License, Version 2.0.

Oct 02, 2023 CONDOR_DAGMAN(1)