CONDOR_DAGMAN(1)                HTCondor Manual               CONDOR_DAGMAN(1)

NAME

       condor_dagman - HTCondor Manual

       meta scheduler of the jobs submitted as the nodes of a DAG or DAGs

SYNOPSIS

       condor_dagman -f -t -l . -help

       condor_dagman -version

       condor_dagman -f -l . -csdversion version_string [-debug level]
       [-maxidle NumberOfProcs] [-maxjobs NumberOfClusters]
       [-maxpre NumberOfPreScripts] [-maxpost NumberOfPostScripts]
       [-noeventchecks] [-allowlogerror] [-usedagdir] -lockfile filename
       [-waitfordebug] [-autorescue 0|1] [-dorescuefrom number]
       [-allowversionmismatch] [-DumpRescue] [-verbose] [-force]
       [-notification value] [-suppress_notification]
       [-dont_suppress_notification] [-dagman DagmanExecutable]
       [-outfile_dir directory] [-update_submit] [-import_env]
       [-priority number] [-dont_use_default_node_log] [-DontAlwaysRunPost]
       [-AlwaysRunPost] [-DoRecovery] -dag dag_file
       [-dag dag_file_2 ... -dag dag_file_n]

DESCRIPTION

       condor_dagman is a meta scheduler for the HTCondor jobs within a DAG
       (directed acyclic graph), or within multiple DAGs. In typical usage, a
       submitter of jobs that are organized into a DAG submits the DAG using
       condor_submit_dag. condor_submit_dag does error checking on aspects of
       the DAG and then submits condor_dagman as an HTCondor job.
       condor_dagman uses log files to coordinate the further submission of
       the jobs within the DAG.

       All command line arguments to the DaemonCore library functions work for
       condor_dagman. When invoked from the command line, condor_dagman
       requires the arguments -f -l . to appear first on the command line, to
       be processed by DaemonCore. The -csdversion argument must also be
       specified; at start up, condor_dagman checks for a version mismatch
       with the condor_submit_dag version given in this argument. The -t
       argument must also be present for the -help option, so that output is
       sent to the terminal.
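
       The argument ordering described above can be sketched in Python. This
       is only an illustration: the helper name build_dagman_argv, the file
       names, and the version string are hypothetical placeholders, and the
       real format of the -csdversion value is not given on this page.

```python
def build_dagman_argv(csd_version, lockfile, dag_files, debug_level=None):
    """Assemble a condor_dagman command line in the order the text requires."""
    if not dag_files:
        raise ValueError("at least one DAG input file is required")
    # -f -l . must come first so that DaemonCore can process them.
    argv = ["condor_dagman", "-f", "-l", ".", "-csdversion", csd_version]
    if debug_level is not None:
        argv += ["-debug", str(debug_level)]
    argv += ["-lockfile", lockfile]
    # One -dag argument per DAG input file.
    for dag in dag_files:
        argv += ["-dag", dag]
    return argv


# Placeholder values, chosen only for illustration.
argv = build_dagman_argv(
    "VERSION-PLACEHOLDER", "first.dag.lock", ["first.dag", "second.dag"],
    debug_level=3,
)
print(" ".join(argv))
```

       In normal use condor_submit_dag writes this command line itself, so a
       helper like this would only matter when experimenting by hand.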

       Arguments to condor_dagman are either automatically set by
       condor_submit_dag or they are specified as command-line arguments to
       condor_submit_dag and passed on to condor_dagman. The method by which
       the arguments are set is given in their description below.

       condor_dagman can run multiple, independent DAGs. This is done by
       specifying multiple -dag arguments. Pass multiple DAG input files as
       command-line arguments to condor_submit_dag.

       Debugging output may be obtained by using the -debug level option.
       Level values and the output they produce are as follows:

       • level = 0; never produce output, except for usage info

       • level = 1; very quiet, output severe errors

       • level = 2; normal output, errors and warnings

       • level = 3; output errors, as well as all warnings

       • level = 4; internal debugging output

       • level = 5; internal debugging output; outer loop debugging

       • level = 6; internal debugging output; inner loop debugging; output
         DAG input file lines as they are parsed

       • level = 7; internal debugging output; rarely used; output DAG input
         file lines as they are parsed

OPTIONS

          -help  Display usage information and exit.

          -version
                 Display version information and exit.

          -debug level
                 An integer level of debugging output. level is an integer,
                 with values of 0-7 inclusive, where 7 is the most verbose
                 output. This command-line option to condor_submit_dag is
                 passed to condor_dagman or defaults to the value 3.

          -maxidle NumberOfProcs
                 Sets the maximum number of idle procs allowed before
                 condor_dagman stops submitting more node jobs. Note that for
                 this argument, each individual proc within a cluster counts
                 as one job towards the limit, which is inconsistent with
                 -maxjobs. Once idle procs start to run, condor_dagman will
                 resume submitting jobs once the number of idle procs falls
                 below the specified limit. NumberOfProcs is a non-negative
                 integer. If this option is omitted, the number of idle procs
                 is limited by the configuration variable DAGMAN_MAX_JOBS_IDLE
                 (see Configuration File Entries for DAGMan), which defaults
                 to 1000. To disable this limit, set NumberOfProcs to 0. Note
                 that submit description files that queue multiple procs can
                 cause the NumberOfProcs limit to be exceeded. Setting queue
                 5000 in the submit description file, where -maxidle is set to
                 250, will result in a cluster of 5000 new procs being
                 submitted to the condor_schedd, not 250. In this case,
                 condor_dagman will resume submitting jobs when the number of
                 idle procs falls below 250.
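
                 The throttle described for -maxidle can be modelled with a
                 small Python simulation. This is a sketch of the behavior
                 as documented here, not of the DAGMan implementation: the
                 function name is hypothetical, and it assumes every newly
                 submitted proc starts out idle.

```python
def submit_until_throttled(idle_procs, max_idle, cluster_sizes):
    """Submit whole clusters while the idle proc count is below max_idle.

    Returns the sizes of the clusters submitted and the resulting idle count.
    """
    submitted = []
    for size in cluster_sizes:
        # A limit of 0 disables the throttle entirely.
        if max_idle != 0 and idle_procs >= max_idle:
            break
        # Clusters are never split, so one cluster can overshoot the limit.
        submitted.append(size)
        idle_procs += size  # assumption: new procs start idle
    return submitted, idle_procs


submitted, idle = submit_until_throttled(
    idle_procs=0, max_idle=250, cluster_sizes=[5000, 10]
)
print(submitted, idle)
```

                 With a limit of 250, the 5000-proc cluster is submitted in
                 full and the second cluster waits until the idle count drops
                 below 250 again.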

          -maxjobs NumberOfClusters
                 Sets the maximum number of clusters within the DAG that will
                 be submitted to HTCondor at one time. Note that for this
                 argument, each cluster counts as one job, no matter how many
                 individual procs are in the cluster. NumberOfClusters is a
                 non-negative integer. If this option is omitted, the number
                 of clusters is limited by the configuration variable
                 DAGMAN_MAX_JOBS_SUBMITTED (see Configuration File Entries for
                 DAGMan), which defaults to 0 (unlimited).
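
                 The difference in counting between -maxjobs (clusters) and
                 -maxidle (individual procs) can be made concrete with a short
                 Python sketch. Both function names are hypothetical and the
                 code only restates the counting rules given in the text.

```python
def counts_toward_limits(cluster_sizes):
    """Return (clusters, procs) for a list of per-cluster proc counts.

    The first number is what -maxjobs counts; the second is what -maxidle
    would count if every proc were idle.
    """
    clusters = len(cluster_sizes)  # -maxjobs: one per cluster
    procs = sum(cluster_sizes)     # -maxidle: one per individual proc
    return clusters, procs


def may_submit_cluster(submitted_clusters, max_jobs):
    """True if another cluster fits under -maxjobs; 0 means unlimited."""
    return max_jobs == 0 or submitted_clusters < max_jobs


clusters, procs = counts_toward_limits([5000, 1, 3])
print(clusters, procs)
```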

          -maxpre NumberOfPreScripts
                 Sets the maximum number of PRE scripts within the DAG that
                 may be running at one time. NumberOfPreScripts is a
                 non-negative integer. If this option is omitted, the number
                 of PRE scripts is limited by the configuration variable
                 DAGMAN_MAX_PRE_SCRIPTS (see Configuration File Entries for
                 DAGMan), which defaults to 20.

          -maxpost NumberOfPostScripts
                 Sets the maximum number of POST scripts within the DAG that
                 may be running at one time. NumberOfPostScripts is a
                 non-negative integer. If this option is omitted, the number
                 of POST scripts is limited by the configuration variable
                 DAGMAN_MAX_POST_SCRIPTS (see Configuration File Entries for
                 DAGMan), which defaults to 20.

          -noeventchecks
                 This argument is no longer used; it is now ignored. Its
                 functionality is now implemented by the DAGMAN_ALLOW_EVENTS
                 configuration variable.

          -allowlogerror
                 As of version 8.5.5 this argument is no longer supported, and
                 setting it will generate a warning.

          -usedagdir
                 This optional argument causes condor_dagman to run each
                 specified DAG as if the directory containing that DAG file
                 was the current working directory. This option is most useful
                 when running multiple DAGs in a single condor_dagman.

          -lockfile filename
                 Names the file created and used as a lock file. The lock file
                 prevents two instances of the same DAG, as defined by a DAG
                 input file, from executing at the same time. A default lock
                 file ending with the suffix .dag.lock is passed to
                 condor_dagman by condor_submit_dag.

          -waitfordebug
                 This optional argument causes condor_dagman to wait at
                 startup until someone attaches to the process with a debugger
                 and sets the wait_for_debug variable in main_init() to false.

          -autorescue 0|1
                 Whether to automatically run the newest rescue DAG for the
                 given DAG file, if one exists (0 = false, 1 = true).

          -dorescuefrom number
                 Forces condor_dagman to run the specified rescue DAG number
                 for the given DAG. A value of 0 is the same as not specifying
                 this option. Specifying a nonexistent rescue DAG is a fatal
                 error.
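
                 How -autorescue and -dorescuefrom interact can be sketched in
                 Python. Several things here are assumptions rather than facts
                 from this page: the function name is hypothetical, rescue
                 DAGs are assumed to be named <dag file>.rescueNNN with a
                 three-digit number, and -dorescuefrom is assumed to take
                 precedence over -autorescue.

```python
import re


def choose_dag_to_run(dag_file, existing_files, autorescue=True, dorescuefrom=0):
    """Pick the original DAG or one of its rescue DAGs.

    Assumes rescue DAGs are named '<dag_file>.rescueNNN' (three digits).
    """
    pattern = re.compile(re.escape(dag_file) + r"\.rescue(\d{3})")
    numbers = []
    for name in existing_files:
        match = pattern.fullmatch(name)
        if match:
            numbers.append(int(match.group(1)))

    # A nonzero -dorescuefrom names one specific rescue DAG.
    if dorescuefrom != 0:
        if dorescuefrom not in numbers:
            # The text calls a nonexistent rescue DAG a fatal error.
            raise FileNotFoundError(f"rescue DAG {dorescuefrom} does not exist")
        return f"{dag_file}.rescue{dorescuefrom:03d}"

    # -autorescue 1 selects the newest (highest-numbered) rescue DAG.
    if autorescue and numbers:
        return f"{dag_file}.rescue{max(numbers):03d}"
    return dag_file


files = ["my.dag", "my.dag.rescue001", "my.dag.rescue002"]
print(choose_dag_to_run("my.dag", files))
```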

          -allowversionmismatch
                 This optional argument causes condor_dagman to allow a
                 version mismatch between condor_dagman itself and the
                 .condor.sub file produced by condor_submit_dag (or, in other
                 words, between condor_submit_dag and condor_dagman). WARNING!
                 This option should be used only if absolutely necessary.
                 Allowing version mismatches can cause subtle problems when
                 running DAGs. (Note that, starting with version 7.4.0,
                 condor_dagman no longer requires an exact version match
                 between itself and the .condor.sub file. Instead, a "minimum
                 compatible version" is defined, and any .condor.sub file of
                 that version or newer is accepted.)
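
                 The "minimum compatible version" rule amounts to a numeric
                 comparison of version components. The Python sketch below
                 illustrates the idea only: the function names are
                 hypothetical, versions are assumed to be dotted integers, and
                 the minimum version values used in the example are invented,
                 since this page does not state the real one.

```python
def parse_version(text):
    """Turn a dotted version such as '8.8.17' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def sub_file_accepted(sub_file_version, min_compatible, allow_mismatch=False):
    """Accept a .condor.sub file of the minimum compatible version or newer."""
    if allow_mismatch:
        # Models -allowversionmismatch: skip the check entirely.
        return True
    return parse_version(sub_file_version) >= parse_version(min_compatible)


# Hypothetical version numbers, for illustration only.
print(sub_file_accepted("7.10.0", "7.9.0"))
```

                 Comparing tuples of integers rather than strings matters
                 here: as plain strings, "7.10.0" would sort before "7.9.0".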

          -DumpRescue
                 This optional argument causes condor_dagman to immediately
                 dump a Rescue DAG and then exit, as opposed to actually
                 running the DAG. This feature is mainly intended for testing.
                 The Rescue DAG file is produced whether or not there are
                 parse errors reading the original DAG input file. The name of
                 the file differs if there was a parse error.

          -verbose
                 (This argument is included only to be passed to
                 condor_submit_dag if lazy submit file generation is used for
                 nested DAGs.) Causes condor_submit_dag to give verbose error
                 messages.

          -force (This argument is included only to be passed to
                 condor_submit_dag if lazy submit file generation is used for
                 nested DAGs.) Requires condor_submit_dag to overwrite the
                 files that it produces, if the files already exist. Note that
                 dagman.out will be appended to, not overwritten. If new-style
                 rescue DAG mode is in effect, and any new-style rescue DAGs
                 exist, the -force flag will cause them to be renamed, and the
                 original DAG will be run. If old-style rescue DAG mode is in
                 effect, any existing old-style rescue DAGs will be deleted,
                 and the original DAG will be run. See the HTCondor manual
                 section on Rescue DAGs for more information.

          -notification value
                 (This argument is included only to be passed to
                 condor_submit_dag if lazy submit file generation is used for
                 nested DAGs.) Sets the e-mail notification for DAGMan itself.
                 This information will be used within the HTCondor submit
                 description file for DAGMan. This file is produced by
                 condor_submit_dag. The notification option is described in
                 the condor_submit manual page.

          -suppress_notification
                 Causes jobs submitted by condor_dagman to not send email
                 notification for events. The same effect can be achieved by
                 setting the configuration variable
                 DAGMAN_SUPPRESS_NOTIFICATION to True. This command line
                 option is independent of the -notification command line
                 option, which controls notification for the condor_dagman job
                 itself. This flag is generally superfluous, as
                 DAGMAN_SUPPRESS_NOTIFICATION defaults to True.

          -dont_suppress_notification
                 Causes jobs submitted by condor_dagman to defer to content
                 within the submit description file when deciding to send
                 email notification for events. The same effect can be
                 achieved by setting the configuration variable
                 DAGMAN_SUPPRESS_NOTIFICATION to False. This command line flag
                 is independent of the -notification command line option,
                 which controls notification for the condor_dagman job itself.
                 If both -dont_suppress_notification and
                 -suppress_notification are specified within the same command
                 line, the last argument is used.
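
                 The "last argument is used" rule for the two notification
                 flags can be sketched in Python as a single left-to-right
                 pass. The function name is hypothetical, and treating the
                 command line as overriding DAGMAN_SUPPRESS_NOTIFICATION is an
                 assumption, not something this page states.

```python
def resolve_suppress_notification(argv, config_default=True):
    """Return True if node job notification should be suppressed.

    config_default stands in for DAGMAN_SUPPRESS_NOTIFICATION, which the
    text says defaults to True.
    """
    suppress = config_default
    for arg in argv:
        # Later flags overwrite earlier ones, so the last one wins.
        if arg == "-suppress_notification":
            suppress = True
        elif arg == "-dont_suppress_notification":
            suppress = False
    return suppress


print(resolve_suppress_notification(
    ["-suppress_notification", "-dont_suppress_notification"]
))
```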

          -dagman DagmanExecutable
                 (This argument is included only to be passed to
                 condor_submit_dag if lazy submit file generation is used for
                 nested DAGs.) Allows the specification of an alternate
                 condor_dagman executable to be used instead of the one found
                 in the user's path. This must be a fully qualified path.

          -outfile_dir directory
                 (This argument is included only to be passed to
                 condor_submit_dag if lazy submit file generation is used for
                 nested DAGs.) Specifies the directory in which the
                 .dagman.out file will be written. The directory may be
                 specified relative to the current working directory as
                 condor_submit_dag is executed, or specified with an absolute
                 path. Without this option, the .dagman.out file is placed in
                 the same directory as the first DAG input file listed on the
                 command line.

          -update_submit
                 (This argument is included only to be passed to
                 condor_submit_dag if lazy submit file generation is used for
                 nested DAGs.) This optional argument causes an existing
                 .condor.sub file to not be treated as an error; rather, the
                 .condor.sub file will be overwritten, but the existing values
                 of -maxjobs, -maxidle, -maxpre, and -maxpost will be
                 preserved.

          -import_env
                 (This argument is included only to be passed to
                 condor_submit_dag if lazy submit file generation is used for
                 nested DAGs.) This optional argument causes condor_submit_dag
                 to import the current environment into the environment
                 command of the .condor.sub file it generates.

          -priority number
                 Sets the minimum job priority of node jobs submitted and
                 running under this condor_dagman job.

          -dont_use_default_node_log
                 This option is disabled as of HTCondor version 8.3.1. Tells
                 condor_dagman to use the file specified by the job ClassAd
                 attribute UserLog to monitor job status. If this command line
                 argument is used, then the job event log file cannot be
                 defined with a macro.

          -DontAlwaysRunPost
                 This option causes condor_dagman to not run the POST script
                 of a node if the PRE script fails. (This was the default
                 behavior prior to HTCondor version 7.7.2, and is again the
                 default behavior from version 8.5.4 onwards.)

          -AlwaysRunPost
                 This option causes condor_dagman to always run the POST
                 script of a node, even if the PRE script fails. (This was the
                 default behavior for HTCondor version 7.7.2 through version
                 8.5.3.)
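
                 The effect of -AlwaysRunPost and -DontAlwaysRunPost on a
                 single node reduces to one boolean decision, sketched here in
                 Python. The function name is hypothetical, and the default of
                 False mirrors the behavior this page describes for version
                 8.5.4 onwards.

```python
def should_run_post(pre_succeeded, always_run_post=False):
    """Decide whether a node's POST script runs, given the PRE script result.

    always_run_post=True models -AlwaysRunPost; False models
    -DontAlwaysRunPost.
    """
    return pre_succeeded or always_run_post


# PRE script failed: POST runs only under -AlwaysRunPost.
print(should_run_post(pre_succeeded=False))
print(should_run_post(pre_succeeded=False, always_run_post=True))
```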

          -DoRecovery
                 Causes condor_dagman to start in recovery mode. This means
                 that it reads the relevant job user log(s) and catches up to
                 the given DAG's previous state before submitting any new
                 jobs.

          -dag filename
                 filename is the name of the DAG input file that is set as an
                 argument to condor_submit_dag, and passed to condor_dagman.

EXIT STATUS

       condor_dagman will exit with a status value of 0 (zero) upon success,
       and it will exit with the value 1 (one) upon failure.

EXAMPLES

       condor_dagman is normally not run directly, but submitted as an
       HTCondor job by running condor_submit_dag. See the condor_submit_dag
       manual page for examples.

AUTHOR

       HTCondor Team

       1990-2022, Center for High Throughput Computing, Computer Sciences
       Department, University of Wisconsin-Madison, Madison, WI, US. Licensed
       under the Apache License, Version 2.0.
8.8                              Jun 13, 2022                 CONDOR_DAGMAN(1)