just-man-pages/condor_dagman(1)  General Commands Manual  just-man-pages/condor_dagman(1)

Name

       condor_dagman - meta scheduler of the jobs submitted as the nodes of a
       DAG or DAGs

Synopsis

       condor_dagman -f -t -l . -help

       condor_dagman -version

       condor_dagman -f -l . -csdversion version_string [ -debug level ]
       [ -maxidle NumberOfProcs ] [ -maxjobs NumberOfClusters ]
       [ -maxpre NumberOfPreScripts ] [ -maxpost NumberOfPostScripts ]
       [ -noeventchecks ] [ -allowlogerror ] [ -usedagdir ] -lockfile filename
       [ -waitfordebug ] [ -autorescue 0|1 ] [ -dorescuefrom number ]
       [ -allowversionmismatch ] [ -DumpRescue ] [ -verbose ] [ -force ]
       [ -notification value ] [ -suppress_notification ]
       [ -dont_suppress_notification ] [ -dagman DagmanExecutable ]
       [ -outfile_dir directory ] [ -update_submit ] [ -import_env ]
       [ -priority number ] [ -dont_use_default_node_log ]
       [ -DontAlwaysRunPost ] [ -AlwaysRunPost ] [ -DoRecovery ] -dag dag_file
       [ -dag dag_file_2 ... -dag dag_file_n ]

Description

       condor_dagman is a meta scheduler for the HTCondor jobs within a DAG
       (directed acyclic graph) or multiple DAGs. In typical usage, a
       submitter of jobs that are organized into a DAG submits the DAG using
       condor_submit_dag. condor_submit_dag does error checking on aspects of
       the DAG and then submits condor_dagman as an HTCondor job.
       condor_dagman uses log files to coordinate the further submission of
       the jobs within the DAG.

       All command line arguments to the DaemonCore library functions work for
       condor_dagman. When invoked from the command line, condor_dagman
       requires the arguments -f -l . to appear first on the command line, to
       be processed by DaemonCore. The -csdversion argument must also be
       specified; at startup, condor_dagman checks for a version mismatch with
       the condor_submit_dag version given in this argument. The -t argument
       must also be present for the -help option, so that output is sent to
       the terminal.

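       The ordering rules above can be sketched in Python. This is only an
       illustration: the helper name build_dagman_args and the sample file
       names are hypothetical, and in normal use condor_submit_dag builds this
       command line itself.

```python
def build_dagman_args(csd_version, lockfile, dag_files):
    """Assemble a condor_dagman argument list in the order described above.

    Hypothetical helper for illustration only; condor_submit_dag normally
    generates these arguments.
    """
    if not dag_files:
        raise ValueError("at least one DAG input file is required")
    # -f -l . must come first so that DaemonCore can process them.
    args = ["condor_dagman", "-f", "-l", "."]
    # The condor_submit_dag version, checked by condor_dagman at startup.
    args += ["-csdversion", csd_version]
    args += ["-lockfile", lockfile]
    # One -dag argument per DAG input file.
    for dag in dag_files:
        args += ["-dag", dag]
    return args


# Placeholder values; "version_string" stands in for a real version.
example = build_dagman_args("version_string", "my.dag.lock", ["my.dag"])
print(" ".join(example))
```

       The function only builds a list of strings and does not run anything.
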
       Arguments to condor_dagman are either set automatically by
       condor_submit_dag or specified as command-line arguments to
       condor_submit_dag and passed on to condor_dagman. The method by which
       each argument is set is given in its description below.

       condor_dagman can run multiple, independent DAGs. This is done by
       specifying multiple -dag arguments. Pass multiple DAG input files as
       command-line arguments to condor_submit_dag.

       Debugging output may be obtained by using the -debug level option.
       The level values and the output they produce are as follows:

          * level = 0; never produce output, except for usage info

          * level = 1; very quiet, output severe errors

          * level = 2; normal output, errors and warnings

          * level = 3; output errors, as well as all warnings

          * level = 4; internal debugging output

          * level = 5; internal debugging output; outer loop debugging

          * level = 6; internal debugging output; inner loop debugging; output
          DAG input file lines as they are parsed

          * level = 7; internal debugging output; rarely used; output DAG
          input file lines as they are parsed

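       As a quick reference, the level list above can be expressed as a
       lookup table. This is a sketch; the names DEBUG_LEVELS and
       describe_debug_level are hypothetical and are not part of HTCondor.
       The default of 3 is the value stated for the -debug option.

```python
# Descriptions copied from the level list above.
DEBUG_LEVELS = {
    0: "never produce output, except for usage info",
    1: "very quiet, output severe errors",
    2: "normal output, errors and warnings",
    3: "output errors, as well as all warnings",
    4: "internal debugging output",
    5: "internal debugging output; outer loop debugging",
    6: "internal debugging output; inner loop debugging; "
       "output DAG input file lines as they are parsed",
    7: "internal debugging output; rarely used; "
       "output DAG input file lines as they are parsed",
}

DEFAULT_DEBUG_LEVEL = 3  # default stated for the -debug option


def describe_debug_level(level=DEFAULT_DEBUG_LEVEL):
    """Return the description for a level, rejecting values outside 0-7."""
    if level not in DEBUG_LEVELS:
        raise ValueError("debug level must be an integer from 0 to 7")
    return DEBUG_LEVELS[level]


print(describe_debug_level(1))
```
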

Options

       -help

          Display usage information and exit.

       -version

          Display version information and exit.

       -debug level

          Sets the level of debugging output. level is an integer from 0 to 7
          inclusive, where 7 is the most verbose. The value is taken from the
          -debug command-line option to condor_submit_dag and passed to
          condor_dagman; if not given, it defaults to 3.

       -maxidle NumberOfProcs

          Sets the maximum number of idle procs allowed before condor_dagman
          stops submitting more node jobs. Note that for this argument, each
          individual proc within a cluster counts as one job toward the
          limit, which is inconsistent with -maxjobs. As idle procs start to
          run, condor_dagman resumes submitting jobs once the number of idle
          procs falls below the specified limit. NumberOfProcs is a
          non-negative integer. If this option is omitted, the number of idle
          procs is limited by the configuration variable DAGMAN_MAX_JOBS_IDLE,
          which defaults to 1000. To disable this limit, set NumberOfProcs to
          0. Note that submit description files that queue multiple procs can
          cause the NumberOfProcs limit to be exceeded. For example, a submit
          description file containing queue 5000, run with -maxidle set to
          250, results in a cluster of 5000 new procs being submitted to the
          condor_schedd, not 250. In this case, condor_dagman resumes
          submitting jobs when the number of idle procs falls below 250.

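          The queue 5000 case can be traced with a small model of the
          throttle. This is a simplified sketch of the rule as described
          here, not DAGMan's actual implementation; the function name
          may_submit is hypothetical.

```python
def may_submit(idle_procs, max_idle):
    """Return True if another node job may be submitted under -maxidle.

    Simplified model: the check happens before a node job is submitted,
    so a single submit that queues many procs can overshoot the limit.
    """
    # A limit of 0 disables the check entirely.
    return max_idle == 0 or idle_procs < max_idle


max_idle = 250
idle = 0

# The idle count is below the limit, so the node job is submitted.
# Its "queue 5000" then adds the whole cluster at once.
if may_submit(idle, max_idle):
    idle += 5000

print(idle)                        # idle count after the submit
print(may_submit(idle, max_idle))  # further submits are held back

# Submission resumes only once the idle count drops below the limit.
idle = 249
print(may_submit(idle, max_idle))
```
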
       -maxjobs NumberOfClusters

          Sets the maximum number of clusters within the DAG that will be
          submitted to HTCondor at one time. Note that for this argument, each
          cluster counts as one job, no matter how many individual procs are
          in the cluster. NumberOfClusters is a non-negative integer. If this
          option is omitted, the number of clusters is limited by the
          configuration variable DAGMAN_MAX_JOBS_SUBMITTED, which defaults to
          0 (unlimited).

       -maxpre NumberOfPreScripts

          Sets the maximum number of PRE scripts within the DAG that may be
          running at one time. NumberOfPreScripts is a non-negative integer.
          If this option is omitted, the number of PRE scripts is limited by
          the configuration variable DAGMAN_MAX_PRE_SCRIPTS, which defaults
          to 20.

       -maxpost NumberOfPostScripts

          Sets the maximum number of POST scripts within the DAG that may be
          running at one time. NumberOfPostScripts is a non-negative integer.
          If this option is omitted, the number of POST scripts is limited by
          the configuration variable DAGMAN_MAX_POST_SCRIPTS, which defaults
          to 20.

       -noeventchecks

          This argument is no longer used; it is now ignored. Its
          functionality is now implemented by the DAGMAN_ALLOW_EVENTS
          configuration variable.

       -allowlogerror

          As of version 8.5.5 this argument is no longer supported, and
          setting it will generate a warning.

       -usedagdir

          This optional argument causes condor_dagman to run each specified
          DAG as if the directory containing that DAG file were the current
          working directory. This option is most useful when running multiple
          DAGs in a single condor_dagman.

       -lockfile filename

          Names the file created and used as a lock file. The lock file
          prevents two instances of the same DAG, as defined by a DAG input
          file, from executing at the same time. A default lock file ending
          with the suffix .dag.lock is passed to condor_dagman by
          condor_submit_dag.

       -waitfordebug

          This optional argument causes condor_dagman to wait at startup until
          someone attaches to the process with a debugger and sets the
          wait_for_debug variable in main_init() to false.

       -autorescue 0|1

          Whether to automatically run the newest rescue DAG for the given DAG
          file, if one exists (0 = false, 1 = true).

       -dorescuefrom number

          Forces condor_dagman to run the specified rescue DAG number for the
          given DAG. A value of 0 is the same as not specifying this option.
          Specifying a nonexistent rescue DAG is a fatal error.

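          The interaction of -autorescue and -dorescuefrom can be sketched as
          a selection function. This is a sketch under two assumptions that
          the text above does not state: that the highest-numbered rescue DAG
          is the newest, and that a nonzero -dorescuefrom takes precedence
          over -autorescue. The name choose_rescue_dag is hypothetical.

```python
def choose_rescue_dag(existing, autorescue, dorescuefrom=0):
    """Return the rescue DAG number to run, or None for the original DAG.

    existing     -- set of rescue DAG numbers that currently exist
    autorescue   -- True/False, as given by -autorescue 1 or 0
    dorescuefrom -- number given by -dorescuefrom (0 means not specified)
    """
    if dorescuefrom != 0:
        # Assumption: an explicit number overrides -autorescue.
        if dorescuefrom not in existing:
            # Specifying a nonexistent rescue DAG is a fatal error.
            raise ValueError(f"rescue DAG {dorescuefrom} does not exist")
        return dorescuefrom
    if autorescue and existing:
        # Assumption: the highest number is the newest rescue DAG.
        return max(existing)
    return None


print(choose_rescue_dag({1, 2, 3}, autorescue=True))
print(choose_rescue_dag({1, 2, 3}, autorescue=True, dorescuefrom=2))
print(choose_rescue_dag(set(), autorescue=True))
```
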
       -allowversionmismatch

          This optional argument causes condor_dagman to allow a version
          mismatch between condor_dagman itself and the .condor.sub file
          produced by condor_submit_dag (or, in other words, between
          condor_submit_dag and condor_dagman). WARNING! This option should be
          used only if absolutely necessary. Allowing version mismatches can
          cause subtle problems when running DAGs. (Note that, starting with
          version 7.4.0, condor_dagman no longer requires an exact version
          match between itself and the .condor.sub file. Instead, a "minimum
          compatible version" is defined, and any .condor.sub file of that
          version or newer is accepted.)

       -DumpRescue

          This optional argument causes condor_dagman to immediately dump a
          Rescue DAG and then exit, as opposed to actually running the DAG.
          This feature is mainly intended for testing. The Rescue DAG file is
          produced whether or not there are parse errors reading the original
          DAG input file. The name of the file differs if there was a parse
          error.

       -verbose

          (This argument is included only to be passed to condor_submit_dag if
          lazy submit file generation is used for nested DAGs.) Causes
          condor_submit_dag to give verbose error messages.

       -force

          (This argument is included only to be passed to condor_submit_dag if
          lazy submit file generation is used for nested DAGs.) Requires
          condor_submit_dag to overwrite the files that it produces, if the
          files already exist. Note that the .dagman.out file will be appended
          to, not overwritten. If new-style rescue DAG mode is in effect, and
          any new-style rescue DAGs exist, the -force flag will cause them to
          be renamed, and the original DAG will be run. If old-style rescue
          DAG mode is in effect, any existing old-style rescue DAGs will be
          deleted, and the original DAG will be run. See the HTCondor manual
          section on Rescue DAGs for more information.

       -notification value

          (This argument is included only to be passed to condor_submit_dag if
          lazy submit file generation is used for nested DAGs.) Sets the
          e-mail notification for DAGMan itself. This information will be used
          within the HTCondor submit description file for DAGMan, which is
          produced by condor_submit_dag. The notification option is described
          in the condor_submit manual page.

       -suppress_notification

          Causes jobs submitted by condor_dagman to not send email
          notification for events. The same effect can be achieved by setting
          the configuration variable DAGMAN_SUPPRESS_NOTIFICATION to True.
          This command line option is independent of the -notification command
          line option, which controls notification for the condor_dagman job
          itself. This flag is generally superfluous, as
          DAGMAN_SUPPRESS_NOTIFICATION defaults to True.

       -dont_suppress_notification

          Causes jobs submitted by condor_dagman to defer to content within
          the submit description file when deciding to send email notification
          for events. The same effect can be achieved by setting the
          configuration variable DAGMAN_SUPPRESS_NOTIFICATION to False. This
          command line flag is independent of the -notification command line
          option, which controls notification for the condor_dagman job
          itself. If both -dont_suppress_notification and
          -suppress_notification are specified within the same command line,
          the last argument is used.

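          The "last argument is used" rule can be illustrated with a short
          scan over an argument list. This is a sketch only; the function name
          and the idea that the command line overrides the configured value
          are assumptions made for illustration, not a description of
          DAGMan's parser.

```python
def resolve_suppress_notification(argv, configured=True):
    """Return True if node-job email notification should be suppressed.

    configured -- value of DAGMAN_SUPPRESS_NOTIFICATION (defaults to True)
    Assumption: a flag on the command line overrides the configured value.
    """
    suppress = configured
    for arg in argv:
        # Later flags overwrite earlier ones, so the last one wins.
        if arg == "-suppress_notification":
            suppress = True
        elif arg == "-dont_suppress_notification":
            suppress = False
    return suppress


print(resolve_suppress_notification(
    ["-suppress_notification", "-dont_suppress_notification"]))
print(resolve_suppress_notification(
    ["-dont_suppress_notification", "-suppress_notification"]))
```
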
       -dagman DagmanExecutable

          (This argument is included only to be passed to condor_submit_dag if
          lazy submit file generation is used for nested DAGs.) Allows the
          specification of an alternate condor_dagman executable to be used
          instead of the one found in the user's path. This must be a fully
          qualified path.

       -outfile_dir directory

          (This argument is included only to be passed to condor_submit_dag if
          lazy submit file generation is used for nested DAGs.) Specifies the
          directory in which the .dagman.out file will be written. The
          directory may be specified relative to the current working directory
          as condor_submit_dag is executed, or specified with an absolute
          path. Without this option, the .dagman.out file is placed in the
          same directory as the first DAG input file listed on the command
          line.

       -update_submit

          (This argument is included only to be passed to condor_submit_dag if
          lazy submit file generation is used for nested DAGs.) This optional
          argument causes an existing .condor.sub file to not be treated as an
          error; rather, the .condor.sub file will be overwritten, but the
          existing values of -maxjobs, -maxidle, -maxpre, and -maxpost will be
          preserved.

       -import_env

          (This argument is included only to be passed to condor_submit_dag if
          lazy submit file generation is used for nested DAGs.) This optional
          argument causes condor_submit_dag to import the current environment
          into the environment command of the .condor.sub file it generates.

       -priority number

          Sets the minimum job priority of node jobs submitted and running
          under this condor_dagman job.

       -dont_use_default_node_log

          This option is disabled as of HTCondor version 8.3.1. Tells
          condor_dagman to use the file specified by the job ClassAd attribute
          UserLog to monitor job status. If this command line argument is
          used, then the job event log file cannot be defined with a macro.

       -DontAlwaysRunPost

          This option causes condor_dagman to not run the POST script of a
          node if the PRE script fails. (This was the default behavior prior
          to HTCondor version 7.7.2, and is again the default behavior from
          version 8.5.4 onwards.)

       -AlwaysRunPost

          This option causes condor_dagman to always run the POST script of a
          node, even if the PRE script fails. (This was the default behavior
          for HTCondor version 7.7.2 through version 8.5.3.)

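          The two options and the version-dependent default can be summarized
          in a few lines. This is a sketch based only on the version ranges
          quoted above; both function names are hypothetical and versions are
          written as (major, minor, patch) tuples for easy comparison.

```python
def default_always_run_post(version):
    """Return the default POST behavior for an HTCondor version tuple.

    True means "always run POST", which was the default from 7.7.2
    through 8.5.3; earlier and later versions default to False.
    """
    return (7, 7, 2) <= version <= (8, 5, 3)


def should_run_post(pre_failed, always_run_post):
    """Decide whether a node's POST script runs, given the PRE result."""
    # With -AlwaysRunPost the POST script runs regardless of PRE.
    # With -DontAlwaysRunPost it is skipped when the PRE script failed.
    return always_run_post or not pre_failed


print(default_always_run_post((8, 5, 3)))
print(default_always_run_post((8, 5, 4)))
print(should_run_post(pre_failed=True, always_run_post=False))
```
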
       -DoRecovery

          Causes condor_dagman to start in recovery mode. This means that it
          reads the relevant job user log(s) and catches up to the given DAG's
          previous state before submitting any new jobs.

       -dag filename

          filename is the name of the DAG input file that is set as an
          argument to condor_submit_dag, and passed to condor_dagman.

Exit Status

       condor_dagman will exit with a status value of 0 (zero) upon success,
       and it will exit with the value 1 (one) upon failure.

Examples

       condor_dagman is normally not run directly, but submitted as an
       HTCondor job by running condor_submit_dag. See the condor_submit_dag
       manual page for examples.

Author

       Center for High Throughput Computing, University of Wisconsin-Madison

       Copyright (C) 1990-2018 Center for High Throughput Computing, Computer
       Sciences Department, University of Wisconsin-Madison, Madison, WI. All
       Rights Reserved. Licensed under the Apache License, Version 2.0.