just-man-pages/condor_q(1)   General Commands Manual   just-man-pages/condor_q(1)
2
3
4
NAME
condor_q - Display information about jobs in queue
7
SYNOPSIS
condor_q [ -help [Universe | State] ]

condor_q [ -debug ] [ general options ] [ restriction list ] [ output options ] [ analyze options ]
13
DESCRIPTION
condor_q displays information about jobs in the HTCondor job queue. By default, condor_q queries the local job queue, but this behavior may be modified by specifying one of the general options.
18
19 As of version 8.5.2, condor_q defaults to querying only the current
20 user's jobs. This default is overridden when the restriction list has
21 usernames and/or job ids, when the -submitter or -allusers arguments
22 are specified, or when the current user is a queue superuser. It can
23 also be overridden by setting the CONDOR_Q_ONLY_MY_JOBS configuration
24 macro to False .
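For example, to see jobs from every user rather than only your own, or to list one specific user's jobs (the username jdoe here is illustrative):

$ condor_q -allusers
$ condor_q jdoe

To disable the only-my-jobs default for all queries, the configuration could contain:

CONDOR_Q_ONLY_MY_JOBS = False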
25
26 As of version 8.5.6, condor_q defaults to batch-mode output (see -batch
27 in the Options section below). The old behavior can be obtained by
28 specifying -nobatch on the command line. To change the default back to
29 its pre-8.5.6 value, set the new configuration variable CON‐
30 DOR_Q_DASH_BATCH_IS_DEFAULT to False .
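For instance, to get the pre-8.5.6 one-line-per-job display for a single invocation:

$ condor_q -nobatch

and to restore that display as the default, the configuration could contain:

CONDOR_Q_DASH_BATCH_IS_DEFAULT = False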
31
33 As of version 8.5.6, condor_q defaults to displaying information about
34 batches of jobs, rather than individual jobs. The intention is that
35 this will be a more useful, and user-friendly, format for users with
36 large numbers of jobs in the queue. Ideally, users will specify mean‐
37 ingful batch names for their jobs, to make it easier to keep track of
38 related jobs.
39
(For information about specifying batch names for your jobs, see the condor_submit(1) and condor_submit_dag(1) man pages.)
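As a sketch, a batch name may be given either on the condor_submit command line or in the submit description file (the name nightly_run and the file job.sub are illustrative; see the condor_submit man page for exact syntax):

$ condor_submit -batch-name nightly_run job.sub

batch_name = nightly_run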
42
43 A batch of jobs is defined as follows:
44
45 * An entire workflow (a DAG or hierarchy of nested DAGs) (note that
46 condor_dagman now specifies a default batch name for all jobs in a
47 given workflow)
48
49 * All jobs in a single cluster
50
51 * All jobs submitted by a single user that have the same executable
52 specified in their submit file (unless submitted with different
53 batch names)
54
55 * All jobs submitted by a single user that have the same batch name
56 specified in their submit file or on the condor_submit or con‐
57 dor_submit_dag command line.
58
60 There are many output options that modify the output generated by con‐
61 dor_q . The effects of these options, and the meanings of the various
62 output data, are described below.
63
64 Output options
65 If the -long option is specified, condor_q displays a long description
66 of the queried jobs by printing the entire job ClassAd for all jobs
67 matching the restrictions, if any. Individual attributes of the job
68 ClassAd can be displayed by means of the -format option, which displays
69 attributes with a printf(3) format, or with the -autoformat option.
70 Multiple -format options may be specified in the option list to display
71 several attributes of the job.
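For example, to print each matching job's owner and numeric status, one job per line:

$ condor_q -format "%s " Owner -format "%d\n" JobStatus

or, equivalently, using automatic formatting:

$ condor_q -af Owner JobStatus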
72
73 For most output options (except as specified), the last line of con‐
74 dor_q output contains a summary of the queue: the total number of jobs,
75 and the number of jobs in the completed, removed, idle, running, held
76 and suspended states.
77
78 If no output options are specified, condor_q now defaults to batch
79 mode, and displays the following columns of information, with one line
80 of output per batch of jobs:
81
82 OWNER, BATCH_NAME, SUBMITTED, DONE, RUN, IDLE, [HOLD,] TOTAL,
83 JOB_IDS
84
85 Note that the HOLD column is only shown if there are held jobs in the
86 output or if there are no jobs in the output.
87
88 If the -nobatch option is specified, condor_q displays the following
89 columns of information, with one line of output per job:
90
91 ID, OWNER, SUBMITTED, RUN_TIME, ST, PRI, SIZE, CMD
92
93 If the -dag option is specified (in conjunction with -nobatch ), con‐
94 dor_q displays the following columns of information, with one line of
95 output per job; the owner is shown only for top-level jobs, and for all
96 other jobs (including sub-DAGs) the node name is shown:
97
98 ID, OWNER/NODENAME, SUBMITTED, RUN_TIME, ST, PRI, SIZE, CMD
99
100 If the -run option is specified (in conjunction with -nobatch ), con‐
101 dor_q displays the following columns of information, with one line of
102 output per running job:
103
104 ID, OWNER, SUBMITTED, RUN_TIME, HOST(S)
105
106 Also note that the -run option disables output of the totals line.
107
108 If the -grid option is specified, condor_q displays the following col‐
109 umns of information, with one line of output per job:
110
111 ID, OWNER, STATUS, GRID->MANAGER, HOST, GRID_JOB_ID
112
113 If the -goodput option is specified, condor_q displays the following
114 columns of information, with one line of output per job:
115
116 ID, OWNER, SUBMITTED, RUN_TIME, GOODPUT, CPU_UTIL, Mb/s
117
118 If the -io option is specified, condor_q displays the following columns
119 of information, with one line of output per job:
120
121 ID, OWNER, RUNS, ST, INPUT, OUTPUT, RATE, MISC
122
123 If the -cputime option is specified (in conjunction with -nobatch ),
124 condor_q displays the following columns of information, with one line
125 of output per job:
126
127 ID, OWNER, SUBMITTED, CPU_TIME, ST, PRI, SIZE, CMD
128
129 If the -hold option is specified, condor_q displays the following col‐
130 umns of information, with one line of output per job:
131
132 ID, OWNER, HELD_SINCE, HOLD_REASON
133
134 If the -totals option is specified, condor_q displays only one line of
135 output no matter how many jobs and batches of jobs are in the queue.
136 That line of output contains the total number of jobs, and the number
137 of jobs in the completed, removed, idle, running, held and suspended
138 states.
139
140 Output data
141 The available output data are as follows:
142
143 ID
144
145 (Non-batch mode only) The cluster/process id of the HTCondor job.
146
147
148
149 OWNER
150
151 The owner of the job or batch of jobs.
152
153
154
155 OWNER/NODENAME
156
157 ( -dag only) The owner of a job or the DAG node name of the job.
158
159
160
161 BATCH_NAME
162
163 (Batch mode only) The batch name of the job or batch of jobs.
164
165
166
167 SUBMITTED
168
169 The month, day, hour, and minute the job was submitted to the queue.
170
171
172
173 DONE
174
175 (Batch mode only) The number of job procs that are done, but still
176 in the queue.
177
178
179
180 RUN
181
182 (Batch mode only) The number of job procs that are running.
183
184
185
186 IDLE
187
188 (Batch mode only) The number of job procs that are in the queue but
189 idle.
190
191
192
193 HOLD
194
195 (Batch mode only) The number of job procs that are in the queue but
196 held.
197
198
199
200 TOTAL
201
202 (Batch mode only) The total number of job procs in the queue, unless
203 the batch is a DAG, in which case this is the total number of clus‐
204 ters in the queue. Note: for non-DAG batches, the TOTAL column con‐
205 tains correct values only in version 8.5.7 and later.
206
207
208
209 JOB_IDS
210
211 (Batch mode only) The range of job IDs belonging to the batch.
212
213
214
215 RUN_TIME
216
217 (Non-batch mode only) Wall-clock time accumulated by the job to date
218 in days, hours, minutes, and seconds.
219
220
221
222 ST
223
224 (Non-batch mode only) Current status of the job, which varies some‐
225 what according to the job universe and the timing of updates. H = on
226 hold, R = running, I = idle (waiting for a machine to execute on), C
227 = completed, X = removed, S = suspended (execution of a running job
228 temporarily suspended on execute node), < = transferring input (or
229 queued to do so), and > = transferring output (or queued to do so).
230
231
232
233 PRI
234
235 (Non-batch mode only) User specified priority of the job, displayed
236 as an integer, with higher numbers corresponding to better priority.
237
238
239
240 SIZE
241
242 (Non-batch mode only) The peak amount of memory in Mbytes consumed
243 by the job; note this value is only refreshed periodically. The
244 actual value reported is taken from the job ClassAd attribute Memo‐
245 ryUsage if this attribute is defined, and from job attribute Image‐
246 Size otherwise.
247
248
249
250 CMD
251
252 (Non-batch mode only) The name of the executable.
253
254
255
256 HOST(S)
257
258 ( -run only) The host where the job is running.
259
260
261
262 STATUS
263
264 ( -grid only) The state that HTCondor believes the job is in. Possi‐
265 ble values are
266
267 PENDING
268
269 The job is waiting for resources to become available in order to
270 run.
271
272
273
274 ACTIVE
275
276 The job has received resources, and the application is executing.
277
278
279
280 FAILED
281
282 The job terminated before completion because of an error, user-
283 triggered cancel, or system-triggered cancel.
284
285
286
287 DONE
288
289 The job completed successfully.
290
291
292
293 SUSPENDED
294
295 The job has been suspended. Resources which were allocated for
296 this job may have been released due to a scheduler-specific rea‐
297 son.
298
299
300
301 UNSUBMITTED
302
303 The job has not been submitted to the scheduler yet, pending the
304 reception of the GLOBUS_GRAM_PROTOCOL_JOB_SIGNAL_COMMIT_REQUEST
305 signal from a client.
306
307
308
309 STAGE_IN
310
311 The job manager is staging in files, in order to run the job.
312
313
314
315 STAGE_OUT
316
317 The job manager is staging out files generated by the job.
318
319
320
321 UNKNOWN
322
The state of the job is unknown.
324
325
326
327
328
329 GRID->MANAGER
330
331 ( -grid only) A guess at what remote batch system is running the
332 job. It is a guess, because HTCondor looks at the Globus jobmanager
333 contact string to attempt identification. If the value is fork, the
334 job is running on the remote host without a jobmanager. Values may
335 also be condor, lsf, or pbs.
336
337
338
339 HOST
340
341 ( -grid only) The host to which the job was submitted.
342
343
344
345 GRID_JOB_ID
346
( -grid only) The identifier of the job on the remote grid resource (the value of the job ClassAd attribute GridJobId).
348
349
350
351 GOODPUT
352
353 ( -goodput only) The percentage of RUN_TIME for this job which has
354 been saved in a checkpoint. A low GOODPUT value indicates that the
355 job is failing to checkpoint. If a job has not yet attempted a
356 checkpoint, this column contains [?????] .
357
358
359
360 CPU_UTIL
361
362 ( -goodput only) The ratio of CPU_TIME to RUN_TIME for checkpointed
363 work. A low CPU_UTIL indicates that the job is not running effi‐
364 ciently, perhaps because it is I/O bound or because the job requires
365 more memory than available on the remote workstations. If the job
366 has not (yet) checkpointed, this column contains [??????] .
367
368
369
370 Mb/s
371
372 ( -goodput only) The network usage of this job, in Megabits per sec‐
373 ond of run-time.
374
375
376
377 READ The total number of bytes the application has read from files
378 and sockets.
379
380
381
382 WRITE The total number of bytes the application has written to files
383 and sockets.
384
385
386
387 SEEK The total number of seek operations the application has per‐
388 formed on files.
389
390
391
392 XPUT The effective throughput (average bytes read and written per
393 second) from the application's point of view.
394
395
396
397 BUFSIZE The maximum number of bytes to be buffered per file.
398
399
400
401 BLOCKSIZE The desired block size for large data transfers. These
402 fields are updated when a job produces a checkpoint or completes. If
403 a job has not yet produced a checkpoint, this information is not
404 available.
405
406
407
408 INPUT
409
410 ( -io only) For standard universe, FileReadBytes; otherwise, Bytes‐
411 Recvd.
412
413
414
415 OUTPUT
416
417 ( -io only) For standard universe, FileWriteBytes; otherwise,
418 BytesSent.
419
420
421
422 RATE
423
424 ( -io only) For standard universe, FileReadBytes+FileWriteBytes;
425 otherwise, BytesRecvd+BytesSent.
426
427
428
429 MISC
430
431 ( -io only) JobUniverse.
432
433
434
435 CPU_TIME
436
437 ( -cputime only) The remote CPU time accumulated by the job to date
438 (which has been stored in a checkpoint) in days, hours, minutes, and
439 seconds. (If the job is currently running, time accumulated during
440 the current run is not shown. If the job has not produced a check‐
441 point, this column contains 0+00:00:00.)
442
443
444
445 HELD_SINCE
446
447 ( -hold only) Month, day, hour and minute at which the job was held.
448
449
450
451 HOLD_REASON
452
453 ( -hold only) The hold reason for the job.
454
455
456
457 Analyze
458 The -analyze or -better-analyze options can be used to determine why
459 certain jobs are not running by performing an analysis on a per machine
460 basis for each machine in the pool. The reasons can vary among failed
461 constraints, insufficient priority, resource owner preferences and pre‐
462 vention of preemption by the PREEMPTION_REQUIREMENTS expression. If
463 the analyze option -verbose is specified along with the -analyze
464 option, the reason for failure is displayed on a per machine basis.
465 -better-analyze differs from -analyze in that it will do matchmaking
466 analysis on jobs even if they are currently running, or if the reason
467 they are not running is not due to matchmaking. -better-analyze also
468 produces more thorough analysis of complex Requirements and shows the
469 values of relevant job ClassAd attributes. When only a single machine
470 is being analyzed via -machine or -mconstraint , the values of relevant
471 attributes of the machine ClassAd are also displayed.
472
Restrictions
To restrict the display to jobs of interest, a list of zero or more restriction options may be supplied. Each restriction may be one of:
476
477 * cluster . process , which matches jobs which belong to the speci‐
478 fied cluster and have the specified process number;
479
480 * cluster (without a process ), which matches all jobs belonging to
481 the specified cluster;
482
483 * owner , which matches all jobs owned by the specified owner;
484
485 * -constraint expression , which matches all jobs that satisfy the
486 specified ClassAd expression;
487
488 * -allusers , which overrides the default restriction of only match‐
489 ing jobs submitted by the current user.
490
491 If cluster or cluster . process is specified, and the job matching
492 that restriction is a condor_dagman job, information for all jobs of
493 that DAG is displayed in batch mode (in non-batch mode, only the con‐
494 dor_dagman job itself is displayed).
495
496 If no owner restrictions are present, the job matches the restriction
497 list if it matches at least one restriction in the list. If owner
498 restrictions are present, the job matches the list if it matches one of
499 the owner restrictions and at least one non- owner restriction.
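For example (the cluster number and username here are illustrative), the command

$ condor_q 27 jdoe

combines an owner restriction with a non-owner restriction, and so displays only those jobs in cluster 27 that are also owned by jdoe.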
500
OPTIONS
-debug
503
504 Causes debugging information to be sent to stderr , based on the
505 value of the configuration variable TOOL_DEBUG .
506
507
508
509 -batch
510
511 (output option) Show a single line of progress information for a
512 batch of jobs, where a batch is defined as follows:
513
514 * An entire workflow (a DAG or hierarchy of nested DAGs)
515
516 * All jobs in a single cluster
517
518 * All jobs submitted by a single user that have the same exe‐
519 cutable specified in their submit file
520
* All jobs submitted by a single user that have the same batch name specified in their submit file or on the condor_submit or condor_submit_dag command line.

This option also changes the output columns, as noted above.
525
526 Note that, as of version 8.5.6, -batch is the default, unless the
527 CONDOR_Q_DASH_BATCH_IS_DEFAULT configuration variable is set to
528 False .
529
530
531
532 -nobatch
533
534 (output option) Show a line for each job (turn off the -batch
535 option).
536
537
538
539 -global
540
541 (general option) Queries all job queues in the pool.
542
543
544
545 -submitter submitter
546
547 (general option) List jobs of a specific submitter in the entire
548 pool, not just for a single condor_schedd .
549
550
551
552 -name name
553
554 (general option) Query only the job queue of the named condor_schedd
555 daemon.
556
557
558
559 -pool centralmanagerhostname[:portnumber]
560
561 (general option) Use the centralmanagerhostname as the central man‐
562 ager to locate condor_schedd daemons. The default is the COLLEC‐
563 TOR_HOST , as specified in the configuration.
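For example, to query a particular condor_schedd in another pool (both host names here are illustrative):

$ condor_q -pool cm.example.edu -name submit.example.edu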
564
565
566
567 -jobads file
568
569 (general option) Display jobs from a list of ClassAds from a file,
570 instead of the real ClassAds from the condor_schedd daemon. This is
571 most useful for debugging purposes. The ClassAds appear as if con‐
572 dor_q -long is used with the header stripped out.
573
574
575
576 -userlog file
577
578 (general option) Display jobs, with job information coming from a
579 job event log, instead of from the real ClassAds from the con‐
580 dor_schedd daemon. This is most useful for automated testing of the
581 status of jobs known to be in the given job event log, because it
582 reduces the load on the condor_schedd . A job event log does not
583 contain all of the job information, so some fields in the normal
584 output of condor_q will be blank.
585
586
587
588 -autocluster
589
590 (output option) Output condor_schedd daemon auto cluster informa‐
591 tion. For each auto cluster, output the unique ID of the auto clus‐
592 ter along with the number of jobs in that auto cluster. This option
593 is intended to be used together with the -long option to output the
594 ClassAds representing auto clusters. The ClassAds can then be used
595 to identify or classify the demand for sets of machine resources,
596 which will be useful in the on-demand creation of execute nodes for
597 glidein services.
598
599
600
601 -cputime
602
603 (output option) Instead of wall-clock allocation time (RUN_TIME),
604 display remote CPU time accumulated by the job to date in days,
605 hours, minutes, and seconds. If the job is currently running, time
606 accumulated during the current run is not shown. Note that this
607 option has no effect unless used in conjunction with -nobatch .
608
609
610
611 -currentrun
612
613 (output option) Normally, RUN_TIME contains all the time accumulated
614 during the current run plus all previous runs. If this option is
615 specified, RUN_TIME only displays the time accumulated so far on
616 this current run.
617
618
619
620 -dag
621
622 (output option) Display DAG node jobs under their DAGMan instance.
623 Child nodes are listed using indentation to show the structure of
624 the DAG. Note that this option has no effect unless used in conjunc‐
625 tion with -nobatch .
626
627
628
629 -expert
630
631 (output option) Display shorter error messages.
632
633
634
635 -grid
636
637 (output option) Get information only about jobs submitted to grid
638 resources described as gt2 or gt5 .
639
640
641
642 -goodput
643
644 (output option) Display job goodput statistics.
645
646
647
648 -help [Universe | State]
649
(output option) Print usage information and, optionally, also print job universes or job states.
652
653
654
655 -hold
656
657 (output option) Get information about jobs in the hold state. Also
658 displays the time the job was placed into the hold state and the
659 reason why the job was placed in the hold state.
660
661
662
663 -limit Number
664
665 (output option) Limit the number of items output to Number .
666
667
668
669 -io
670
671 (output option) Display job input/output summaries.
672
673
674
675 -long
676
677 (output option) Display entire job ClassAds in long format (one
678 attribute per line).
679
680
681
682 -run
683
684 (output option) Get information about running jobs. Note that this
685 option has no effect unless used in conjunction with -nobatch .
686
687
688
689 -stream-results
690
691 (output option) Display results as jobs are fetched from the job
692 queue rather than storing results in memory until all jobs have been
693 fetched. This can reduce memory consumption when fetching large num‐
694 bers of jobs, but if condor_q is paused while displaying results,
695 this could result in a timeout in communication with condor_schedd .
696
697
698
699 -totals
700
701 (output option) Display only the totals.
702
703
704
705 -version
706
707 (output option) Print the HTCondor version and exit.
708
709
710
711 -wide
712
713 (output option) If this option is specified, and the command portion
714 of the output would cause the output to extend beyond 80 columns,
715 display beyond the 80 columns.
716
717
718
719 -xml
720
(output option) Display entire job ClassAds in XML format. The XML format is fully defined in the reference manual, obtained from the ClassAds web page, with a link at http://htcondor.org/classad/classad.html.
725
726
727
728 -json
729
730 (output option) Display entire job ClassAds in JSON format.
731
732
733
734 -attributes Attr1[,Attr2 ...]
735
736 (output option) Explicitly list the attributes, by name in a comma
737 separated list, which should be displayed when using the -xml ,
738 -json or -long options. Limiting the number of attributes increases
739 the efficiency of the query.
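For example, to emit only a few attributes of each job in JSON form:

$ condor_q -json -attributes Owner,ClusterId,ProcId,JobStatus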
740
741
742
743 -format fmt attr
744
745 (output option) Display attribute or expression attr in format fmt .
746 To display the attribute or expression the format must contain a
747 single printf(3) -style conversion specifier. Attributes must be
748 from the job ClassAd. Expressions are ClassAd expressions and may
749 refer to attributes in the job ClassAd. If the attribute is not
750 present in a given ClassAd and cannot be parsed as an expression,
751 then the format option will be silently skipped. %r prints the
752 unevaluated, or raw values. The conversion specifier must match the
753 type of the attribute or expression. %s is suitable for strings such
754 as Owner , %d for integers such as ClusterId , and %f for floating
755 point numbers such as RemoteWallClockTime . %v identifies the type
756 of the attribute, and then prints the value in an appropriate for‐
757 mat. %V identifies the type of the attribute, and then prints the
758 value in an appropriate format as it would appear in the -long for‐
759 mat. As an example, strings used with %V will have quote marks. An
760 incorrect format will result in undefined behavior. Do not use more
761 than one conversion specifier in a given format. More than one con‐
762 version specifier will result in undefined behavior. To output mul‐
763 tiple attributes repeat the -format option once for each desired
764 attribute. Like printf(3) style formats, one may include other text
765 that will be reproduced directly. A format without any conversion
766 specifiers may be specified, but an attribute is still required.
Include \n to specify a line break.
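For example, the following prints the Owner attribute twice, once with %v and once with %V, so the second copy appears quoted as it would in -long output (the owner name in the sample output is illustrative):

$ condor_q -format "%v " Owner -format "%V\n" Owner
jdoe "jdoe"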
768
769
770
771
772
773 -autoformat[:jlhVr,tng] attr1 [attr2 ...] or -af[:jlhVr,tng] attr1
774 [attr2 ...]
775
776 (output option) Display attribute(s) or expression(s) formatted in a
777 default way according to attribute types. This option takes an arbi‐
778 trary number of attribute names as arguments, and prints out their
779 values, with a space between each value and a newline character
780 after the last value. It is like the -format option without format
781 strings. This output option does not work in conjunction with any of
782 the options -run , -currentrun , -hold , -grid , -goodput , or -io .
783
784 It is assumed that no attribute names begin with a dash character,
785 so that the next word that begins with dash is the start of the next
786 option. The autoformat option may be followed by a colon character
and formatting qualifiers that change the output formatting from the
788 default:
789
790 j print the job ID as the first field,
791
792 l label each field,
793
794 h print column headings before the first line of output,
795
796 V use %V rather than %v for formatting (string values are quoted),
797
798 r print "raw", or unevaluated values,
799
800 , add a comma character after each field,
801
802 t add a tab character before each field instead of the default space
803 character,
804
805 n add a newline character after each field,
806
807 g add a newline character between ClassAds, and suppress spaces
808 before each field.
809
810 Use -af:h to get tabular values with headings.
811
812 Use -af:lrng to get -long equivalent format.
813
814 The newline and comma characters may not be used together. The l and
815 h characters may not be used together.
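For example, to print the job ID, owner, and numeric job status as a table with headings:

$ condor_q -af:jh Owner JobStatus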
816
817
818
819 -analyze[:<qual>]
820
821 (analyze option) Perform a matchmaking analysis on why the requested
822 jobs are not running. First a simple analysis determines if the job
823 is not running due to not being in a runnable state. If the job is
824 in a runnable state, then this option is equivalent to -better-ana‐
825 lyze . <qual> is a comma separated list containing one or more of
826
827 priority to consider user priority during the analysis
828
829 summary to show a one line summary for each job or machine
830
831 reverse to analyze machines, rather than jobs
832
833
834
835 -better-analyze[:<qual>]
836
837 (analyze option) Perform a more detailed matchmaking analysis to
838 determine how many resources are available to run the requested
839 jobs. This option is never meaningful for Scheduler universe jobs
840 and only meaningful for grid universe jobs doing matchmaking.
841 <qual> is a comma separated list containing one or more of
842
843 priority to consider user priority during the analysis
844
845 summary to show a one line summary for each job or machine
846
847 reverse to analyze machines, rather than jobs
848
849
850
851 -machine name
852
853 (analyze option) When doing matchmaking analysis, analyze only
854 machine ClassAds that have slot or machine names that match the
855 given name.
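For example, to analyze a single job against the slots of one machine (the job ID and machine name here are illustrative):

$ condor_q -better-analyze 27.0 -machine exec-01.example.edu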
856
857
858
859 -mconstraint expression
860
861 (analyze option) When doing matchmaking analysis, match only machine
862 ClassAds which match the ClassAd expression constraint.
863
864
865
866 -slotads file
867
868 (analyze option) When doing matchmaking analysis, use the machine
869 ClassAds from the file instead of the ones from the condor_collector
870 daemon. This is most useful for debugging purposes. The ClassAds
871 appear as if condor_status -long is used.
872
873
874
875 -userprios file
876
877 (analyze option) When doing matchmaking analysis with priority, read
878 user priorities from the file rather than the ones from the con‐
879 dor_negotiator daemon. This is most useful for debugging purposes or
880 to speed up analysis in situations where the condor_negotiator dae‐
881 mon is slow to respond to condor_userprio requests. The file should
882 be in the format produced by condor_userprio -long .
883
884
885
886 -nouserprios
887
888 (analyze option) Do not consider user priority during the analysis.
889
890
891
892 -reverse-analyze
893
894 (analyze option) Analyze machine requirements against jobs.
895
896
897
898 -verbose
899
900 (analyze option) When doing analysis, show progress and include the
901 names of specific machines in the output.
902
903
904
GENERAL REMARKS
The default output from condor_q is formatted to be human readable, not
907 script readable. In an effort to make the output fit within 80 charac‐
908 ters, values in some fields might be truncated. Furthermore, the HTCon‐
909 dor Project can (and does) change the formatting of this default output
910 as we see fit. Therefore, any script that is attempting to parse data
911 from condor_q is strongly encouraged to use the -format option
912 (described above, examples given below).
913
Although -analyze provides a very good first approximation, the analyzer cannot diagnose all possible situations, because the analysis is based on instantaneous and local information. Therefore, some situations, such as several submitters contending for resources or a pool that is rapidly changing state, cannot be accurately diagnosed.
920
921 Options -goodput , -cputime , and -io are most useful for standard uni‐
922 verse jobs, since they rely on values computed when a job produces a
923 checkpoint.
924
It is possible to hold jobs that are in the X state. To avoid this, it is best to construct a -constraint expression that contains JobStatus != 3.
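For example, to list jobs while excluding any in the X (removed) state:

$ condor_q -constraint 'JobStatus != 3'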
928
EXAMPLES
The -format option provides a way to specify both the job attributes
931 and formatting of those attributes. There must be only one conversion
932 specification per -format option. As an example, to list only Jane
933 Doe's jobs in the queue, choosing to print and format only the owner of
934 the job, the command line arguments for the job, and the process ID of
935 the job:
936
$ condor_q -submitter jdoe -format "%s" Owner -format " %s " Args -format " ProcId = %d\n" ProcId
939 jdoe 16386 2800 ProcId = 0
940 jdoe 16386 3000 ProcId = 1
941 jdoe 16386 3200 ProcId = 2
942 jdoe 16386 3400 ProcId = 3
943 jdoe 16386 3600 ProcId = 4
944 jdoe 16386 4200 ProcId = 7
945
946 To display only the JobID's of Jane Doe's jobs you can use the follow‐
947 ing.
948
$ condor_q -submitter jdoe -format "%d." ClusterId -format "%d\n" ProcId
951 27.0
952 27.1
953 27.2
954 27.3
955 27.4
956 27.7
957
958 An example that shows the analysis in summary format:
959
960 $ condor_q -analyze:summary
961
-- Submitter: submit-1.chtc.wisc.edu : <192.168.100.43:9618?sock=11794_95bb_3> : submit-1.chtc.wisc.edu
Analyzing matches for 5979 slots
            Autocluster  Matches   Machine       Running     Serving
 JobId      Members/Idle Reqmnts   Rejects Job   Users Job   Other User  Avail  Owner
----------  ------------ --------  ------------  ----------  ----------  -----  -----
25764522.0   7/0           5910       820          7/10        5046        34   smith
25764682.0   9/0           2172       603          9/9         1531        29   smith
25765082.0  18/0           2172       603         18/9         1531        29   smith
25765900.0   1/0           2172       603          1/9         1531        29   smith
979
980 An example that shows summary information by machine:
981
982 $ condor_q -ana:sum,rev
983
-- Submitter: s-1.chtc.wisc.edu : <192.168.100.43:9618?sock=11794_95bb_3> : s-1.chtc.wisc.edu
Analyzing matches for 2885 jobs
                              Slot  Slot's Req    Job's Req    Both
Name                          Type  Matches Job   Matches Slot Match %
----------------------------  ----  ------------  ------------ ----------
slot1@INFO.wisc.edu           Stat          2729             0       0.00
slot2@INFO.wisc.edu           Stat          2729             0       0.00
slot1@aci-001.chtc.wisc.edu   Part             0          2793       0.00
slot1_1@a-001.chtc.wisc.edu   Dyn           2644          2792      91.37
slot1_2@a-001.chtc.wisc.edu   Dyn           2623          2601      85.10
slot1_3@a-001.chtc.wisc.edu   Dyn           2644          2632      85.82
slot1_4@a-001.chtc.wisc.edu   Dyn           2644          2792      91.37
slot1@a-002.chtc.wisc.edu     Part             0          2633       0.00
slot1_10@a-002.chtc.wisc.edu  Dyn           2623          2601      85.10
1010
1011 An example with two independent DAGs in the queue:
1012
1013 $ condor_q
1014
1015 -- Schedd: wenger@manta.cs.wisc.edu : <128.105.14.228:35169?...
OWNER  BATCH_NAME   SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
wenger DAG: 3696   2/12 11:55      _     10      _     10 3698.0 ... 3707.0
wenger DAG: 3697   2/12 11:55      1      1      1     10 3709.0 ... 3710.0

14 jobs; 0 completed, 0 removed, 1 idle, 13 running, 0 held, 0 suspended
1024
1025 Note that the "13 running" in the last line is two more than the total
1026 of the RUN column, because the two condor_dagman jobs themselves are
1027 counted in the last line but not the RUN column.
1028
1029 Also note that the "completed" value in the last line does not corre‐
1030 spond to the total of the DONE column, because the "completed" value in
1031 the last line only counts jobs that are completed but still in the
1032 queue, whereas the DONE column counts jobs that are no longer in the
1033 queue.
1034
1035 Here's an example with a held job, illustrating the addition of the
1036 HOLD column to the output:
1037
1038 $ condor_q
1039
1040 -- Schedd: wenger@manta.cs.wisc.edu : <128.105.14.228:9619?...
OWNER  BATCH_NAME    SUBMITTED   DONE   RUN    IDLE   HOLD  TOTAL JOB_IDS
wenger CMD: /bin/slee 9/13 16:25    _      3      _      1      4 599.0 ...
1045
1046 4 jobs; 0 completed, 0 removed, 0 idle, 3 running, 1 held, 0 suspended
1047
Here are some examples with a nested-DAG workflow in the queue, which is one of the most complicated cases. The workflow consists of a top-level DAG with nodes NodeA and NodeB, each consisting of a single two-proc cluster, and a sub-DAG SubZ with nodes NodeSA and NodeSB, each also a single two-proc cluster.
1053
1054 First of all, non-batch mode with all of the node jobs in the queue:
1055
1056 $ condor_q -nobatch
1057
1058 -- Schedd: wenger@manta.cs.wisc.edu : <128.105.14.228:9619?...
 ID     OWNER   SUBMITTED     RUN_TIME ST PRI SIZE CMD
 591.0  wenger  9/13 16:05  0+00:00:13 R   0  2.4 condor_dagman -p 0
 592.0  wenger  9/13 16:05  0+00:00:07 R   0  0.0 sleep 60
 592.1  wenger  9/13 16:05  0+00:00:07 R   0  0.0 sleep 300
 593.0  wenger  9/13 16:05  0+00:00:07 R   0  0.0 sleep 60
 593.1  wenger  9/13 16:05  0+00:00:07 R   0  0.0 sleep 300
 594.0  wenger  9/13 16:05  0+00:00:07 R   0  2.4 condor_dagman -p 0
 595.0  wenger  9/13 16:05  0+00:00:01 R   0  0.0 sleep 60
 595.1  wenger  9/13 16:05  0+00:00:01 R   0  0.0 sleep 300
 596.0  wenger  9/13 16:05  0+00:00:01 R   0  0.0 sleep 60
 596.1  wenger  9/13 16:05  0+00:00:01 R   0  0.0 sleep 300

10 jobs; 0 completed, 0 removed, 0 idle, 10 running, 0 held, 0 suspended
1075
1076 Now non-batch mode with the -dag option (unfortunately, condor_q
1077 doesn't do a good job of grouping procs in the same cluster together):
1078
1079 $ condor_q -nobatch -dag
1080
1081 -- Schedd: wenger@manta.cs.wisc.edu : <128.105.14.228:9619?...
 ID     OWNER/NODENAME  SUBMITTED     RUN_TIME ST PRI SIZE CMD
 591.0  wenger          9/13 16:05  0+00:00:27 R   0  2.4 condor_dagman -
 592.0   |-NodeA        9/13 16:05  0+00:00:21 R   0  0.0 sleep 60
 593.0   |-NodeB        9/13 16:05  0+00:00:21 R   0  0.0 sleep 60
 594.0   |-SubZ         9/13 16:05  0+00:00:21 R   0  2.4 condor_dagman -
 595.0    |-NodeSA      9/13 16:05  0+00:00:15 R   0  0.0 sleep 60
 596.0    |-NodeSB      9/13 16:05  0+00:00:15 R   0  0.0 sleep 60
 592.1   |-NodeA        9/13 16:05  0+00:00:21 R   0  0.0 sleep 300
 593.1   |-NodeB        9/13 16:05  0+00:00:21 R   0  0.0 sleep 300
 595.1    |-NodeSA      9/13 16:05  0+00:00:15 R   0  0.0 sleep 300
 596.1    |-NodeSB      9/13 16:05  0+00:00:15 R   0  0.0 sleep 300

10 jobs; 0 completed, 0 removed, 0 idle, 10 running, 0 held, 0 suspended
1106
Now, finally, the batch (default) mode:
1108
1109 $ condor_q
1110
1111 -- Schedd: wenger@manta.cs.wisc.edu : <128.105.14.228:9619?...
OWNER  BATCH_NAME    SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
wenger ex1.dag+591  9/13 16:05      _      8      _      5 592.0 ... 596.1

10 jobs; 0 completed, 0 removed, 0 idle, 10 running, 0 held, 0 suspended
1118
1119 There are several things about this output that may be slightly confus‐
1120 ing:
1121
1122 * The TOTAL column is less than the RUN column. This is because, for
1123 DAG node jobs, their contribution to the TOTAL column is the number
1124 of clusters, not the number of procs (but their contribution to the
1125 RUN column is the number of procs). So the four DAG nodes (8 procs)
1126 contribute 4, and the sub-DAG contributes 1, to the TOTAL column.
1127 (But, somewhat confusingly, the sub-DAG job is not counted in the
1128 RUN column.)
1129
1130 * The sum of the RUN and IDLE columns (8) is less than the 10 jobs
1131 listed in the totals line at the bottom. This is because the top-
1132 level DAG and sub-DAG jobs are not counted in the RUN column, but
1133 they are counted in the totals line.
1134
1135 Now here is non-batch mode after proc 0 of each node job has finished:
1136
1137 $ condor_q -nobatch
1138
1139 -- Schedd: wenger@manta.cs.wisc.edu : <128.105.14.228:9619?...
 ID     OWNER   SUBMITTED     RUN_TIME ST PRI SIZE CMD
 591.0  wenger  9/13 16:05  0+00:01:19 R   0  2.4 condor_dagman -p 0
 592.1  wenger  9/13 16:05  0+00:01:13 R   0  0.0 sleep 300
 593.1  wenger  9/13 16:05  0+00:01:13 R   0  0.0 sleep 300
 594.0  wenger  9/13 16:05  0+00:01:13 R   0  2.4 condor_dagman -p 0
 595.1  wenger  9/13 16:05  0+00:01:07 R   0  0.0 sleep 300
 596.1  wenger  9/13 16:05  0+00:01:07 R   0  0.0 sleep 300
1149
1150 6 jobs; 0 completed, 0 removed, 0 idle, 6 running, 0 held, 0 suspended
1151
1152 The same state also with the -dag option:
1153
1154 $ condor_q -nobatch -dag
1155
1156 -- Schedd: wenger@manta.cs.wisc.edu : <128.105.14.228:9619?...
 ID     OWNER/NODENAME  SUBMITTED     RUN_TIME ST PRI SIZE CMD
 591.0  wenger          9/13 16:05  0+00:01:30 R   0  2.4 condor_dagman -
 592.1   |-NodeA        9/13 16:05  0+00:01:24 R   0  0.0 sleep 300
 593.1   |-NodeB        9/13 16:05  0+00:01:24 R   0  0.0 sleep 300
 594.0   |-SubZ         9/13 16:05  0+00:01:24 R   0  2.4 condor_dagman -
 595.1    |-NodeSA      9/13 16:05  0+00:01:18 R   0  0.0 sleep 300
 596.1    |-NodeSB      9/13 16:05  0+00:01:18 R   0  0.0 sleep 300
1170
1171 6 jobs; 0 completed, 0 removed, 0 idle, 6 running, 0 held, 0 suspended
1172
1173 And, finally, that state in batch (default) mode:
1174
1175 $ condor_q
1176
1177 -- Schedd: wenger@manta.cs.wisc.edu : <128.105.14.228:9619?...
OWNER  BATCH_NAME    SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
wenger ex1.dag+591  9/13 16:05      _      4      _      5 592.1 ... 596.1
1181
1182 6 jobs; 0 completed, 0 removed, 0 idle, 6 running, 0 held, 0 suspended
1183
EXIT STATUS
condor_q will exit with a status value of 0 (zero) upon success, and it will exit with the value 1 (one) upon failure.
1187
AUTHOR
Center for High Throughput Computing, University of Wisconsin-Madison
1190
COPYRIGHT
Copyright (C) 1990-2018 Center for High Throughput Computing, Computer
1193 Sciences Department, University of Wisconsin-Madison, Madison, WI. All
1194 Rights Reserved. Licensed under the Apache License, Version 2.0.
1195
1196
1197
1198 date just-man-pages/condor_q(1)