QUEUE_CONF(5)            Grid Engine File Formats            QUEUE_CONF(5)

6 queue_conf - Grid Engine queue configuration file format
7
9 This manual page describes the format of the template file for the
10 cluster queue configuration. Via the -aq and -mq options of the
11 qconf(1) command, you can add cluster queues and modify the configura‐
12 tion of any queue in the cluster. Any of these change operations can be
13 rejected, as a result of a failed integrity verification.
14
15 The queue configuration parameters take as values strings, integer dec‐
16 imal numbers or boolean, time and memory specifiers (see time_specifier
17 and memory_specifier in sge_types(5)) as well as comma separated lists.
18
Note that Grid Engine allows a backslash (\) to be used to escape a
newline character. The backslash and the newline are replaced with a
single space (" ") character before any interpretation.
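
As an illustration of this rule, the following Python sketch joins
continued lines in the way described above. The function name
join_continuations is hypothetical and not part of Grid Engine; it only
mirrors the replacement rule stated here.

```python
def join_continuations(text: str) -> str:
    """Replace every backslash-newline pair with a single space."""
    return text.replace("\\\n", " ")


# A pe_list entry continued over two lines in a template file.
raw = "pe_list make,\\\nmpi\nslots 4\n"
print(join_continuations(raw))
```

Lines that do not end in a backslash are left untouched, so the sketch
can be applied to a whole template file before it is split into
parameters.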
22
24 The following list of parameters specifies the queue configuration file
25 content:
26
27 qname
28 The name of the cluster queue as defined for queue_name in
29 sge_types(1). As template default "template" is used.
30
31 hostlist
A list of host identifiers as defined for host_identifier in
sge_types(1). For each host, Grid Engine maintains a queue instance for
running jobs on that particular host. Large numbers of hosts are more
easily managed with host groups than with single host names. White
space and "," can be used as list separators (template default: NONE).
38
If more than one host is specified, it can be desirable to let the
settings of the parameters described below diverge for certain hosts.
Such divergences are expressed using the enhanced queue configuration
specifier syntax, which builds upon the regular parameter specifier
syntax separately for each parameter:

"["host_identifier=<parameters_specifier_syntax>"]"
[,"["host_identifier=<parameters_specifier_syntax>"]" ]

Note that even in the enhanced queue configuration specifier syntax an
entry without brackets, denoting the default setting, is required; it is
used for all queue instances for which no divergence is specified.
Tuples with a host group host_identifier override the default setting.
Tuples with a host name host_identifier override both the default and
the host group setting.
54
Note that with the enhanced queue configuration specifier syntax a
default setting is still needed for each configuration attribute;
otherwise the queue configuration is rejected. Ambiguous queue
configurations with more than one attribute setting for a particular
host are rejected. Configurations containing override values for hosts
not listed under hostlist are accepted, but are reported by -sds of
qconf(1). The cluster queue should contain an unambiguous specification
for each configuration attribute of each queue instance specified under
hostlist in the queue configuration. Ambiguous configurations with more
than one attribute setting resulting from overlapping host groups are
reported by -explain c of qstat(1) and cause the affected queue
instance to enter the c(onfiguration ambiguous) state.
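
The precedence rules above can be sketched in Python as follows. This
is a minimal illustration, not the Grid Engine implementation: the
helper names, the host names and the host group membership table are
hypothetical, host groups are assumed to be written with a leading "@",
and a host covered by more than one host group setting is simply
treated as ambiguous.

```python
import re


def parse_attribute(value):
    """Split 'default,[id=val],...' into (default, {id: val})."""
    overrides = dict(re.findall(r"\[([^=\]]+)=([^\]]*)\]", value))
    # Remove the bracketed tuples; what remains is the default setting.
    default = re.sub(r",?\s*\[[^\]]*\]", "", value).strip(" ,")
    return default, overrides


def effective_value(value, host, host_groups):
    """Return the setting that applies to one queue instance.

    host_groups maps a group name such as '@bigmem' to a set of hosts
    (an assumed data structure for this sketch).
    """
    default, overrides = parse_attribute(value)
    if host in overrides:               # host name beats everything
        return overrides[host]
    matches = [val for ident, val in overrides.items()
               if ident.startswith("@")
               and host in host_groups.get(ident, set())]
    if len(matches) > 1:                # overlapping host groups
        raise ValueError("ambiguous configuration for " + host)
    return matches[0] if matches else default


groups = {"@bigmem": {"node5", "node6"}}
slots = "2,[@bigmem=8],[node6=16]"
for name in ("node1", "node5", "node6"):
    print(name, effective_value(slots, name, groups))
```

With the values shown, node1 falls back to the default, node5 takes the
host group setting and node6 takes its own host setting.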
68
69 seq_no
In conjunction with the host's load situation at the time, this
parameter specifies this queue's position in the scheduling order among
the queues suitable for a job to be dispatched, subject to the
queue_sort_method (see sched_conf(5)).
74
75 Regardless of the queue_sort_method setting, qstat(1) reports queue
76 information in the order defined by the value of the seq_no. Set this
77 parameter to a monotonically increasing sequence. (type number; tem‐
78 plate default: 0).
79
80 load_thresholds
load_thresholds is a list of load thresholds. As soon as one of the
thresholds is exceeded, no further jobs are scheduled to the queue, and
qmon(1) signals an overload condition for this node. Arbitrary load
values defined in the "host" and "global" complexes (see complex(5) for
details) can be used.
86
87 The syntax is that of a comma separated list with each list element
88 consisting of the complex_name (see sge_types(5)) of a load value, an
89 equal sign and the threshold value being intended to trigger the over‐
90 load situation (e.g. load_avg=1.75,users_logged_in=5).
91
92 Note: Load values as well as consumable resources may be scaled differ‐
93 ently for different hosts if specified in the corresponding execution
94 host definitions (refer to host_conf(5) for more information). Load
95 thresholds are compared against the scaled load and consumable values.
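
The list syntax and the "one threshold is enough" rule can be
illustrated with a short Python sketch. It rests on several
assumptions: the function names are hypothetical, threshold values are
plain numbers (no memory or time specifiers), "exceeded" is taken to
mean that the reported load is greater than the threshold, and load
values that are not reported are skipped.

```python
def parse_thresholds(spec):
    """Turn 'name=value,name=value' into a dict of floats."""
    result = {}
    for item in spec.split(","):
        name, _, value = item.strip().partition("=")
        result[name] = float(value)
    return result


def exceeded(spec, load):
    """Return the names of all thresholds exceeded by the given load."""
    return [name for name, limit in parse_thresholds(spec).items()
            if name in load and load[name] > limit]


spec = "load_avg=1.75,users_logged_in=5"
hit = exceeded(spec, {"load_avg": 2.0, "users_logged_in": 3})
# A non-empty list means the queue instance is considered overloaded.
print(hit)
```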
96
97 suspend_thresholds
A list of load thresholds with the same semantics as that of the
load_thresholds parameter (see above), except that exceeding one of the
denoted thresholds initiates suspension of one or more jobs in the
queue. See the nsuspend parameter below for details on the number of
jobs which are suspended. There is an important relationship between
the suspend_threshold and the scheduler_interval. If, for example, you
have a suspend threshold on np_load_avg and the load exceeds the
threshold, this does not take effect immediately. Jobs continue running
until the next scheduling run, in which the scheduler detects that the
threshold has been exceeded and sends an order to qmaster to suspend
the job. The same applies to unsuspending.
109
110 nsuspend
The number of jobs which are suspended per time interval if at least
one of the load thresholds in the suspend_thresholds list is exceeded,
or enabled per time interval once no suspend_threshold is violated
anymore. nsuspend jobs are suspended in each time interval until no
suspend_thresholds are exceeded anymore or all jobs in the queue are
suspended. Jobs are enabled in the corresponding way once the
suspend_thresholds are no longer exceeded. The time interval in which
the suspensions of the jobs occur is defined in suspend_interval below.
119
120 suspend_interval
121 The time interval in which further nsuspend jobs are suspended if one
122 of the suspend_thresholds (see above for both) is exceeded by the cur‐
123 rent load on the host on which the queue is located. The time interval
124 is also used when enabling the jobs. The syntax is that of a
125 time_specifier in sge_types(5).
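
The interplay of nsuspend and suspend_interval can be pictured with a
small Python sketch. It is a simplified model under stated assumptions:
the function name is hypothetical, and the load is assumed to stay
above a suspend threshold for the whole period being simulated.

```python
def suspension_steps(running_jobs, nsuspend, intervals):
    """Number of suspended jobs after each suspend_interval."""
    suspended = 0
    history = []
    for _ in range(intervals):
        # Suspend nsuspend further jobs, but never more than exist.
        suspended = min(running_jobs, suspended + nsuspend)
        history.append(suspended)
    return history


# Five jobs in the queue, nsuspend set to 2, four intervals observed.
print(suspension_steps(5, 2, 4))
```

Enabling jobs after the load has dropped would follow the same pattern
in the opposite direction.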
126
127 priority
128 The priority parameter specifies the nice(2) value at which jobs in
129 this queue will be run. The type is number and the default is zero
130 (which means no nice value is set explicitly). Negative values (up to
131 -20) correspond to a higher scheduling priority, positive values (up to
132 +20) correspond to a lower scheduling priority.
133
134 Note, the value of priority has no effect, if Grid Engine adjusts pri‐
135 orities dynamically to implement ticket-based entitlement policy goals.
136 Dynamic priority adjustment is switched off by default due to
137 sge_conf(5) reprioritize being set to false.
138
139 min_cpu_interval
140 The time between two automatic checkpoints in case of transparently
141 checkpointing jobs. The maximum of the time requested by the user via
142 qsub(1) and the time defined by the queue configuration is used as
143 checkpoint interval. Since checkpoint files may be considerably large
144 and thus writing them to the file system may become expensive, users
145 and administrators are advised to choose sufficiently large time inter‐
146 vals. min_cpu_interval is of type time and the default is 5 minutes
147 (which usually is suitable for test purposes only). The syntax is that
148 of a time_specifier in sge_types(5).
149
150 processors
For a multiprocessor execution host, a set of processors can be
defined to which the jobs executing in this queue are bound. The value
type of this parameter is a range description like that of the -pe
option of qsub(1) (e.g. 1-4,8,10) denoting the processor numbers for
the processor group to be used. The interpretation of these values
depends on operating system specifics and is therefore performed inside
sge_execd(8) running on the queue host. Consequently, the parameter is
parsed by the execution daemon and is only passed through
sge_qmaster(8) as a string.
160
Currently, support is only provided for multiprocessor machines running
Solaris, SGI multiprocessor machines running IRIX 6.2 and Digital UNIX
multiprocessor machines. In the case of Solaris, the processor set must
already exist when this processors parameter is configured, so the
processor set has to be created manually. In the case of Digital UNIX,
only one job per processor set is allowed to execute at the same time,
i.e. slots (see below) should be set to 1 for this queue.
168
169 qtype
The type of queue. Currently batch, interactive, a combination of both
in a comma separated list, or NONE.

The formerly supported types parallel and checkpointing are no longer
allowed. A queue instance is implicitly of type parallel/checkpointing
if a parallel environment or a checkpointing interface is specified for
this queue instance in pe_list/ckpt_list. A formerly valid setting such
as

qtype PARALLEL

can be converted to

qtype NONE
pe_list pe_name

(type string; default: batch interactive).
187
188 pe_list
189 The list of administrator-defined parallel environment (see sge_pe(5))
190 names to be associated with the queue. The default is NONE.
191
192 ckpt_list
193 The list of administrator-defined checkpointing interface names (see
194 ckpt_name in sge_types(1)) to be associated with the queue. The default
195 is NONE.
196
197 rerun
Defines a default behavior for jobs which are aborted by system crashes
or by a manual "violent" shutdown (via kill(1)) of the complete Grid
Engine system (including the sge_shepherd(8) of the jobs and their
process hierarchy) on the queue host. As soon as sge_execd(8) is
restarted and detects that a job has been aborted for such reasons, the
job can be restarted if it is restartable. A job may not be
restartable, for example, if it updates databases (first reads from and
then writes to the same record of a database/file), because aborting
the job may have left the database in an inconsistent state. If the
owner of a job wants to overrule the default behavior for the jobs in
the queue, the -r option of qsub(1) can be used.
209
210 The type of this parameter is boolean, thus either TRUE or FALSE can be
211 specified. The default is FALSE, i.e. do not restart jobs automati‐
212 cally.
213
214 slots
215 The maximum number of concurrently executing jobs allowed in the queue.
216 Type is number, valid values are 0 to 9999999.
217
218 tmpdir
The tmpdir parameter specifies the absolute path to the base of the
temporary directory filesystem. When sge_execd(8) launches a job, it
creates a uniquely-named directory in this filesystem for the purpose
of holding scratch files during job execution. At job completion, this
directory and its contents are removed automatically. The environment
variables TMPDIR and TMP are set to the path of each job's scratch
directory (type string; default: /tmp).
226
227 shell
If either posix_compliant or script_from_stdin is specified as the
shell_start_mode parameter in sge_conf(5), the shell parameter specifies
the executable path of the command interpreter (e.g. sh(1) or csh(1))
to be used to process the job scripts executed in the queue. The
definition of shell can be overruled by the job owner via the qsub(1)
-S option.
234
235 The type of the parameter is string. The default is /bin/csh.
236
237 shell_start_mode
238 This parameter defines the mechanisms which are used to actually invoke
239 the job scripts on the execution hosts. The following values are recog‐
240 nized:
241
242 unix_behavior
243 If a user starts a job shell script under UNIX interactively by
244 invoking it just with the script name the operating system's
245 executable loader uses the information provided in a comment
246 such as `#!/bin/csh' in the first line of the script to detect
247 which command interpreter to start to interpret the script. This
248 mechanism is used by Grid Engine when starting jobs if
249 unix_behavior is defined as shell_start_mode.
250
251 posix_compliant
POSIX does not consider first script line comments such as
`#!/bin/csh' as being significant. The POSIX standard for batch
queuing systems (P1003.2d) therefore requires a compliant queuing
system to ignore such lines and to use user specified or configured
default command interpreters instead. Thus, if shell_start_mode is set
to posix_compliant, Grid Engine uses either the command interpreter
indicated by the -S option of the qsub(1) command or the shell
parameter of the queue (see above).
261
262 script_from_stdin
Setting the shell_start_mode parameter either to posix_compliant
or unix_behavior requires you to set the umask in use for
sge_execd(8) such that every user has read access to the
active_jobs directory in the spool directory of the corresponding
execution daemon. If you have prolog and epilog scripts configured,
they also need to be readable by any user who may execute jobs.
If this violates your site's security policies, you may want to
set shell_start_mode to script_from_stdin. This forces Grid Engine
to open the job script as well as the epilogue and prologue scripts
for reading into STDIN as root (if sge_execd(8) was started as root)
before changing to the job owner's user account. The script is then
fed into the STDIN stream of the command interpreter indicated by
the -S option of the qsub(1) command or by the shell parameter of
the queue (see above).
279 Thus setting shell_start_mode to script_from_stdin also implies
280 posix_compliant behavior. Note, however, that feeding scripts
281 into the STDIN stream of a command interpreter may cause trouble
282 if commands like rsh(1) are invoked inside a job script as they
283 also process the STDIN stream of the command interpreter. These
284 problems can usually be resolved by redirecting the STDIN chan‐
285 nel of those commands to come from /dev/null (e.g. rsh host date
286 < /dev/null). Note also, that any command-line options associ‐
287 ated with the job are passed to the executing shell. The shell
288 will only forward them to the job if they are not recognized as
289 valid shell options.
290
291 The default for shell_start_mode is posix_compliant. Note, though,
292 that the shell_start_mode can only be used for batch jobs submitted by
293 qsub(1) and can't be used for interactive jobs submitted by qrsh(1),
294 qsh(1), qlogin(1).
295
296 prolog
The executable path of a shell script that is started before execution
of Grid Engine jobs with the same environment setting as that for the
Grid Engine jobs to be started afterwards. An optional prefix "user@"
specifies the user under which this procedure is to be started. The
procedure's standard output and error output streams are written to
the same file that is used for the standard output and error output of
each job. This procedure is intended as a means for the Grid Engine
administrator to automate the execution of general site specific
tasks, such as the preparation of temporary file systems, that need
the same context information as the job. This queue configuration
entry overrides cluster global or execution host specific prolog
definitions (see sge_conf(5)).

The default for prolog is the special value NONE, which prevents the
execution of a prologue script. The special variables for constituting
a command line are the same as in the prolog definitions of the cluster
configuration (see sge_conf(5)).
314
315 Exit codes for the prolog attribute can be interpreted based on the
316 following exit values:
317 0: Success
318 99: Reschedule job
319 100: Put job in error state
320 Anything else: Put queue in error state
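
For site tools that post-process prolog results, the mapping above can
be written down as a small Python lookup. The function name and the
returned strings are hypothetical; only the exit values and their
meaning are taken from the list above.

```python
def prolog_action(exit_code: int) -> str:
    """Map a prolog exit value to the resulting action."""
    actions = {
        0: "success",
        99: "reschedule job",
        100: "put job in error state",
    }
    # Every other exit value puts the queue into the error state.
    return actions.get(exit_code, "put queue in error state")


for code in (0, 99, 100, 1):
    print(code, prolog_action(code))
```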
321
322 epilog
The executable path of a shell script that is started after execution
of Grid Engine jobs with the same environment setting as that for the
Grid Engine job that has just completed. An optional prefix "user@"
specifies the user under which this procedure is to be started. The
procedure's standard output and error output streams are written to
the same file that is used for the standard output and error output of
each job. This procedure is intended as a means for the Grid Engine
administrator to automate the execution of general site specific
tasks, such as the cleaning up of temporary file systems, that need
the same context information as the job. This queue configuration
entry overrides cluster global or execution host specific epilog
definitions (see sge_conf(5)).

The default for epilog is the special value NONE, which prevents the
execution of an epilogue script. The special variables for constituting
a command line are the same as in the prolog definitions of the cluster
configuration (see sge_conf(5)).
340
341 Exit codes for the epilog attribute can be interpreted based on the
342 following exit values:
343 0: Success
344 99: Reschedule job
345 100: Put job in error state
346 Anything else: Put queue in error state
347
348 starter_method
349 The specified executable path will be used as a job starter facility
350 responsible for starting batch jobs. The executable path will be exe‐
351 cuted instead of the configured shell to start the job. The job argu‐
352 ments will be passed as arguments to the job starter. The following
353 environment variables are used to pass information to the job starter
354 concerning the shell environment which was configured or requested to
355 start the job.
356
357
358 SGE_STARTER_SHELL_PATH
359 The name of the requested shell to start the job
360
361 SGE_STARTER_SHELL_START_MODE
362 The configured shell_start_mode
363
364 SGE_STARTER_USE_LOGIN_SHELL
Set to "true" if the shell is supposed to be used as a login
shell (see login_shells in sge_conf(5))
367
368 The starter_method will not be invoked for qsh, qlogin or qrsh acting
369 as rlogin.
370
371
372 suspend_method
373 resume_method
374 terminate_method
These parameters can be used to override the default methods used by
Grid Engine for suspension, release of a suspension and termination of
a job. By default, the signals SIGSTOP, SIGCONT and SIGKILL are
delivered to the job to perform these actions. However, for some
applications this is not appropriate.
380
381 If no executable path is given, Grid Engine takes the specified parame‐
382 ter entries as the signal to be delivered instead of the default sig‐
383 nal. A signal must be either a positive number or a signal name with
384 "SIG" as prefix and the signal name as printed by kill -l (e.g.
385 SIGTERM).
386
387 If an executable path is given (it must be an absolute path starting
388 with a "/") then this command together with its arguments is started by
389 Grid Engine to perform the appropriate action. The following special
390 variables are expanded at runtime and can be used (besides any other
391 strings which have to be interpreted by the procedures) to constitute a
392 command line:
393
394
395 $host The name of the host on which the procedure is started.
396
397 $job_owner
398 The user name of the job owner.
399
400 $job_id
401 Grid Engine's unique job identification number.
402
403 $job_name
404 The name of the job.
405
406 $queue The name of the queue.
407
408 $job_pid
409 The pid of the job.
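
How such a command line is constituted can be sketched in Python. This
is only an illustration of the substitution idea, not the code used by
Grid Engine: the function name, the script path and the sample values
are hypothetical, and names that merely start like a special variable
(for example $hostname) are deliberately left untouched.

```python
import re

# The special variables listed above.
_VARS = ("host", "job_owner", "job_id", "job_name", "queue", "job_pid")
_PATTERN = re.compile(r"\$(" + "|".join(_VARS) + r")\b")


def expand(command, values):
    """Replace each special variable with its runtime value."""
    return _PATTERN.sub(lambda m: str(values[m.group(1)]), command)


# Hypothetical suspend_method entry and runtime values.
method = "/opt/site/suspend.sh $job_pid $queue"
print(expand(method, {"job_pid": 4711, "queue": "all.q"}))
```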
410
411
412 notify
The time waited between delivery of SIGUSR1/SIGUSR2 notification
signals and suspend/kill signals if the job was submitted with the
qsub(1) -notify option.
416
417 owner_list
The owner_list is a comma separated list of the login(1) user names
(see user_name in sge_types(1)) of those users who are authorized to
disable and suspend this queue through qmod(1) (Grid Engine operators
and managers can do this by default). It is customary to set this field
for queues on interactive workstations where the computing resources
are shared between interactive sessions and Grid Engine jobs, allowing
the workstation owner to have priority access (default: NONE).
425
426 user_lists
The user_lists parameter contains a comma separated list of Grid Engine
user access list names as described in access_list(5). Each user
contained in at least one of the listed access lists has access to the
queue. If the user_lists parameter is set to NONE (the default), any
user who is not explicitly excluded via the xuser_lists parameter
described below has access. If a user is contained both in an access
list listed in xuser_lists and in one listed in user_lists, the user is
denied access to the queue.
435
436 xuser_lists
The xuser_lists parameter contains a comma separated list of Grid
Engine user access list names as described in access_list(5). Each user
contained in at least one of the listed access lists is not allowed to
access the queue. If the xuser_lists parameter is set to NONE (the
default), any user has access. If a user is contained both in an access
list listed in xuser_lists and in one listed in user_lists, the user is
denied access to the queue.
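
The combined effect of user_lists and xuser_lists can be summarized in
a few lines of Python. The sketch makes some assumptions: the function
name, the access list names and the user names are hypothetical, the
access lists are given as a plain dict of sets, and an empty list
stands for the value NONE.

```python
def user_has_access(user, user_lists, xuser_lists, acls):
    """Decide queue access from user_lists and xuser_lists."""
    # Exclusion always wins, even if the user is also in user_lists.
    if any(user in acls[name] for name in xuser_lists):
        return False
    # user_lists NONE: everybody not excluded has access.
    if not user_lists:
        return True
    return any(user in acls[name] for name in user_lists)


acls = {"staff": {"alice", "bob"}, "blocked": {"bob"}}
for name in ("alice", "bob", "carol"):
    print(name, user_has_access(name, ["staff"], ["blocked"], acls))
```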
444
445 projects
The projects parameter contains a comma separated list of Grid Engine
projects (see project(5)) that have access to the queue. Any project
not in this list is denied access to the queue. If set to NONE (the
default), any project has access that is not specifically excluded via
the xprojects parameter described below. If a project is in both the
projects and xprojects parameters, the project is denied access to the
queue.
453
454 xprojects
455 The xprojects parameter contains a comma separated list of Grid Engine
456 projects (see project(5)) that are denied access to the queue. If set
457 to NONE (the default), no projects are denied access other than those
458 denied access based on the projects parameter described above. If a
459 project is in both the projects and xprojects parameters, the project
460 is denied access to the queue.
461
462 subordinate_list
463 There are two different types of subordination:
464
465 1. Queuewise subordination
466
467 A list of Grid Engine queue names as defined for queue_name in
468 sge_types(1). Subordinate relationships are in effect only between
469 queue instances residing at the same host. The relationship does not
470 apply and is ignored when jobs are running in queue instances on other
471 hosts. Queue instances residing on the same host will be suspended
472 when a specified count of jobs is running in this queue instance. The
473 list specification is the same as that of the load_thresholds parameter
474 above, e.g. low_pri_q=5,small_q. The numbers denote the job slots of
475 the queue that have to be filled in the superordinated queue to trigger
476 the suspension of the subordinated queue. If no value is assigned a
477 suspension is triggered if all slots of the queue are filled.
478
479 On nodes which host more than one queue, you might wish to accord bet‐
480 ter service to certain classes of jobs (e.g., queues that are dedicated
481 to parallel processing might need priority over low priority production
482 queues; default: NONE).
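
The queuewise rule can be sketched in Python as follows. The function
names are hypothetical, and the sketch assumes that a threshold counts
as reached when the number of filled slots in the superordinated queue
instance is greater than or equal to it.

```python
def parse_subordinates(spec):
    """Parse 'low_pri_q=5,small_q' into {queue: threshold or None}."""
    result = {}
    for item in spec.split(","):
        name, sep, count = item.strip().partition("=")
        result[name] = int(count) if sep else None
    return result


def queues_to_suspend(spec, used_slots, total_slots):
    """Subordinated queues to suspend on this host."""
    suspended = []
    for name, threshold in parse_subordinates(spec).items():
        # Without a value, all slots of the queue must be filled.
        limit = total_slots if threshold is None else threshold
        if used_slots >= limit:
            suspended.append(name)
    return suspended


print(queues_to_suspend("low_pri_q=5,small_q", used_slots=5, total_slots=8))
```

With five of eight slots filled only low_pri_q is affected; small_q
follows once all eight slots are in use.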
483
484 2. Slotwise preemption
485
Slotwise preemption provides a means to ensure that high priority
jobs get the resources they need, while at the same time low priority
jobs on the same host are not unnecessarily preempted, maximizing the
host utilization. Slotwise preemption is designed to provide different
preemption actions, but the current implementation provides only
suspension. This means that a subordination relationship is defined
between queues, similar to queuewise subordination, but if the suspend
threshold is exceeded, the subordinated queue is not suspended as a
whole; only single tasks running in single slots are suspended.

As with queuewise subordination, the subordination relationships are
in effect only between queue instances residing at the same host. The
relationship does not apply and is ignored when jobs and tasks are
running in queue instances on other hosts.
501
502 The syntax is:
503
504 slots=<threshold>(<queue_list>)
505
where
<threshold>  = a positive integer number
<queue_list> = <queue_def>[,<queue_list>]
<queue_def>  = <queue>[:<seq_no>][:<action>]
<queue>      = a Grid Engine queue name as defined for
               queue_name in sge_types(1).
<seq_no>     = sequence number among all subordinated queues
               of the same depth in the tree. The higher the
               sequence number, the lower is the priority of
               the queue.
               Default is 0, which is the highest priority.
<action>     = the action to be taken if the threshold is
               exceeded. Supported is:
               "sr": Suspend the task with the shortest run
                     time.
               "lr": Suspend the task with the longest run
                     time.
               Default is "sr".
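
The grammar above can be read with a few lines of Python. This parser
is a minimal sketch, not the one used by Grid Engine; the function
name is hypothetical and the queue names in the example are made up.

```python
import re


def parse_slotwise(spec):
    """Parse 'slots=<threshold>(<queue_list>)'.

    Returns (threshold, [(queue, seq_no, action), ...]).
    """
    match = re.fullmatch(r"slots=(\d+)\((.*)\)", spec.strip())
    if match is None:
        raise ValueError("not a slotwise specification: " + spec)
    threshold = int(match.group(1))
    queues = []
    for item in match.group(2).split(","):
        parts = item.strip().split(":")
        seq_no, action = 0, "sr"        # defaults from the grammar
        for part in parts[1:]:
            if part.isdigit():
                seq_no = int(part)
            elif part in ("sr", "lr"):
                action = part
            else:
                raise ValueError("unknown field: " + part)
        queues.append((parts[0], seq_no, action))
    return threshold, queues


print(parse_slotwise("slots=2(B.q:1, C.q:2)"))
```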
524
525 Some examples of possible configurations and their functionalities:
526
527 a) The simplest configuration
528
529 subordinate_list slots=2(B.q)
530
531 which means the queue "B.q" is subordinated to the current queue (let's
532 call it "A.q"), the suspend threshold for all tasks running in "A.q"
533 and "B.q" on the current host is two, the sequence number of "B.q" is
534 "0" and the action is "suspend task with shortest run time first". This
535 subordination relationship looks like this:
536
537 A.q
538 |
539 B.q
540
541 This could be a typical configuration for a host with a dual core CPU.
542 This subordination configuration ensures that tasks that are scheduled
543 to "A.q" always get a CPU core for themselves, while jobs in "B.q" are
544 not preempted as long as there are no jobs running in "A.q".
545
If there is no task running in "A.q", two tasks are running in "B.q"
and a new task is scheduled to "A.q", the sum of tasks running in "A.q"
and "B.q" is three. Three is greater than two, so the defined action is
triggered: the task with the shortest run time in the subordinated
queue "B.q" is suspended. After suspension, there is one task running
in "A.q", one task running in "B.q" and one task suspended in "B.q".
553
554 b) A simple tree
555
556 subordinate_list slots=2(B.q:1, C.q:2)
557
558 This defines a small tree that looks like this:
559
     A.q
    /   \
 B.q     C.q
563
564 A use case for this configuration could be a host with a dual core CPU
565 and queue "B.q" and "C.q" for jobs with different requirements, e.g.
566 "B.q" for interactive jobs, "C.q" for batch jobs. Again, the tasks in
567 "A.q" always get a CPU core, while tasks in "B.q" and "C.q" are sus‐
568 pended only if the threshold of running tasks is exceeded. Here the
569 sequence number among the queues of the same depth comes into play.
570 Tasks scheduled to "B.q" can't directly trigger the suspension of tasks
571 in "C.q", but if there is a task to be suspended, first "C.q" will be
572 searched for a suitable task.
573
574 If there is one task running in "A.q", one in "C.q" and a new task is
575 scheduled to "B.q", the threshold of "2" in "A.q", "B.q" and "C.q" is
576 exceeded. This triggers the suspension of one task in either "B.q" or
577 "C.q". The sequence number gives "B.q" a higher priority than "C.q",
578 therefore the task in "C.q" is suspended. After suspension, there is
579 one task running in "A.q", one task running in "B.q" and one task sus‐
580 pended in "C.q".
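
The selection in this example can be retraced with a short Python
sketch. It covers only a tree with one level of subordinated queues and
is not the Grid Engine algorithm itself: the function name, the task
identifiers and the run times are hypothetical, and when several queues
share the highest sequence number the first one listed is used.

```python
def pick_task_to_suspend(threshold, superordinate_tasks, subordinates):
    """Pick one task to suspend, or None if the threshold holds.

    subordinates: list of (queue, seq_no, action, tasks), where tasks
    is a list of (task_id, run_time_in_seconds) for running tasks.
    """
    total = superordinate_tasks + sum(len(s[3]) for s in subordinates)
    if total <= threshold:
        return None
    candidates = [s for s in subordinates if s[3]]
    if not candidates:
        return None
    # The highest sequence number has the lowest priority.
    queue, _, action, tasks = max(candidates, key=lambda s: s[1])
    chooser = min if action == "sr" else max
    task_id, _ = chooser(tasks, key=lambda t: t[1])
    return queue, task_id


# One task in A.q, one in C.q, and a new task just started in B.q.
subs = [("B.q", 1, "sr", [("b1", 5)]), ("C.q", 2, "sr", [("c1", 300)])]
print(pick_task_to_suspend(2, 1, subs))
```

As in the text, the task in "C.q" is selected because "C.q" has the
higher sequence number and therefore the lower priority.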
581
582 c) More than two levels
583
584 Configuration of A.q: subordinate_list slots=2(B.q)
585 Configuration of B.q: subordinate_list slots=2(C.q)
586
587 looks like this:
588
589 A.q
590 |
591 B.q
592 |
593 C.q
594
595 These are three queues with high, medium and low priority. If a task
596 is scheduled to "C.q", first the subtree consisting of "B.q" and "C.q"
597 is checked, the number of tasks running there is counted. If the
598 threshold which is defined in "B.q" is exceeded, the job in "C.q" is
599 suspended. Then the whole tree is checked, if the number of tasks run‐
600 ning in "A.q", "B.q" and "C.q" exceeds the threshold defined in "A.q"
601 the task in "C.q" is suspended. This means, the effective threshold of
602 any subtree is not higher than the threshold of the root node of the
603 tree. If in this example a task is scheduled to "A.q", immediately the
604 number of tasks running in "A.q", "B.q" and "C.q" is checked against
605 the threshold defined in "A.q".
606
607 d) Any tree
608
          A.q
         /   \
      B.q     C.q
      /      /   \
   D.q    E.q     F.q
                    \
                     G.q
616
The computation of the tasks that are to be (un)suspended always starts
at the queue instance that is modified, i.e. the one where a task is
scheduled, a task ends, the configuration is modified, or a manual or
other automatic (un)suspend is issued, unless that queue instance is a
leaf node, like "D.q", "E.q" and "G.q" in this example. In that case
the computation starts at its parent queue instance (like "B.q", "C.q"
or "F.q" in this example). From there, first all running tasks in the
whole subtree of this queue instance are counted. If the sum exceeds
the threshold configured in the subordinate_list, a task to be
suspended is searched for in this subtree. Then the algorithm proceeds
to the parent of this queue instance, counts all running tasks in the
whole subtree below the parent and checks whether the number exceeds
the threshold configured in the parent's subordinate_list. If so, it
searches for a task to suspend in the whole subtree below the parent.
This continues until the computation has been done for the root node of
the tree.
632
633
634 complex_values
635 complex_values defines quotas for resource attributes managed via this
636 queue. The syntax is the same as for load_thresholds (see above). The
637 quotas are related to the resource consumption of all jobs in a queue
638 in the case of consumable resources (see complex(5) for details on con‐
639 sumable resources) or they are interpreted on a per queue slot (see
640 slots above) basis in the case of non-consumable resources. Consumable
641 resource attributes are commonly used to manage free memory, free disk
642 space or available floating software licenses while non-consumable
643 attributes usually define distinctive characteristics like type of
644 hardware installed.
645
646 For consumable resource attributes an available resource amount is
647 determined by subtracting the current resource consumption of all run‐
648 ning jobs in the queue from the quota in the complex_values list. Jobs
649 can only be dispatched to a queue if no resource requests exceed any
650 corresponding resource availability obtained by this scheme. The quota
651 definition in the complex_values list is automatically replaced by the
652 current load value reported for this attribute, if load is monitored
653 for this resource and if the reported load value is more stringent than
654 the quota. This effectively avoids oversubscription of resources.
655
656 Note: Load values replacing the quota specifications may have become
657 more stringent because they have been scaled (see host_conf(5)) and/or
658 load adjusted (see sched_conf(5)). The -F option of qstat(1) and the
659 load display in the qmon(1) queue control dialog (activated by clicking
660 on a queue icon while the "Shift" key is pressed) provide detailed
661 information on the actual availability of consumable resources and on
662 the origin of the values taken into account currently.
663
664 Note also: The resource consumption of running jobs (used for the
665 availability calculation) as well as the resource requests of the jobs
666 waiting to be dispatched may be derived either from explicit user
667 requests during job submission (see the -l option to qsub(1)) or from a
668 "default" value configured for an attribute by the administrator (see
669 complex(5)). The -r option to qstat(1) can be used for retrieving full
670 detail on the actual resource requests of all jobs in the system.
671
672 For non-consumable resources Grid Engine simply compares the job's
673 attribute requests with the corresponding specification in complex_val‐
674 ues taking the relation operator of the complex attribute definition
675 into account (see complex(5)). If the result of the comparison is
676 "true", the queue is suitable for the job with respect to the particu‐
677 lar attribute. For parallel jobs each queue slot to be occupied by a
678 parallel task is meant to provide the same resource attribute value.
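
As a rough illustration of the comparison for non-consumable attributes, the following Python sketch applies a relation operator to a request and a configured value. The set of operators and the operand order (request on the left, configured value on the right) are assumptions made for this example; complex(5) is the authoritative description. The function names and sample values are hypothetical.

```python
import operator

# Assumed set of relation operators; see complex(5) for the real list.
RELOPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
}


def queue_matches(requested, configured, relop):
    """Compare a job's request with the value in complex_values.

    Assumption: the request is the left operand of the relation.
    """
    return RELOPS[relop](requested, configured)


def parallel_job_fits(requested, slot_values, relop):
    """Every slot used by a parallel task must satisfy the request."""
    return all(queue_matches(requested, value, relop) for value in slot_values)


print(queue_matches("sparc", "sparc", "=="))
print(queue_matches(2, 4, "<="))
print(parallel_job_fits(2, [4, 4, 1], "<="))
```

The last call fails the comparison because one of the three slots offers a value below the request.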
679
680 Note: Only numeric complex attributes can be defined as consumable
681 resources and hence non-numeric attributes are always handled on a per
682 queue slot basis.
683
684 The default value for this parameter is NONE, i.e. no administrator
685 defined resource attribute quotas are associated with the queue.
686
687 calendar
688 specifies the calendar to be valid for this queue or contains NONE (the
689 default). A calendar defines the availability of a queue depending on
690 time of day, week and year. Please refer to calendar_conf(5) for
691 details on the Grid Engine calendar facility.
692
693 Note: Jobs can request queues with a certain calendar model via a "-l
694 c=<cal_name>" option to qsub(1).
695
696 initial_state
697 defines an initial state for the queue either when adding the queue to
698 the system for the first time or on start-up of the ge_execd(8) on the
699 host on which the queue resides. Possible values are:
700
701 default The queue is enabled when adding the queue or is reset to the
702 previous status when ge_execd(8) comes up (this corresponds
703 to the behavior in earlier Grid Engine releases not support‐
704 ing initial_state).
705
706 enabled The queue is enabled in either case. This is equivalent to a
707 manual and explicit 'qmod -e' command (see qmod(1)).
708
709 disabled The queue is disabled in either case. This is equivalent to a
710 manual and explicit 'qmod -d' command (see qmod(1)).
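
The three initial_state values can be summarized as a small decision function. The Python sketch below is only a reading aid: the function name, the event labels ("added", "execd_start") and the state strings are hypothetical and do not correspond to any Grid Engine interface.

```python
def resulting_state(initial_state, event, previous_state=None):
    """Return the queue state after `event` for a given initial_state.

    event is "added" (queue added for the first time) or "execd_start"
    (the execution daemon comes up); both labels are hypothetical.
    """
    if initial_state == "enabled":
        return "enabled"
    if initial_state == "disabled":
        return "disabled"
    if initial_state == "default":
        # New queues start enabled; otherwise the previous status is kept.
        if event == "added":
            return "enabled"
        return previous_state
    raise ValueError("unknown initial_state: %r" % (initial_state,))


print(resulting_state("default", "added"))
print(resulting_state("default", "execd_start", previous_state="disabled"))
print(resulting_state("enabled", "execd_start", previous_state="disabled"))
```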
711
713 The first two resource limit parameters, s_rt and h_rt, are implemented
714 by Grid Engine. They define the "real time" (also called "elapsed" or
715 "wall clock" time) that has passed since the start of the job. If h_rt is
716 exceeded by a job running in the queue, it is aborted via the SIGKILL
717 signal (see kill(1)). If s_rt is exceeded, the job is first "warned"
718 via the SIGUSR1 signal (which can be caught by the job) and finally
719 aborted after the notification time defined in the queue configuration
720 parameter notify (see above) has passed. In cases when s_rt is used in
721 combination with job notification it might be necessary to configure a
722 signal other than SIGUSR1 using the NOTIFY_KILL and NOTIFY_SUSP
723 execd_params (see sge_conf(5)) so that the jobs' signal-catching
724 mechanism can distinguish the cases and react accordingly.
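
A job can make use of the s_rt warning by installing a handler for SIGUSR1 and finishing its work before the notification time runs out. The Python sketch below shows one possible shape for such a job on a POSIX system. The names state, on_soft_limit and run_job are hypothetical, and the signal is sent by the script to itself purely to simulate the warning that Grid Engine would deliver.

```python
import os
import signal

# Shared flag set by the handler; the job loop polls it between steps.
state = {"warned": False}


def on_soft_limit(signum, frame):
    """Record the warning; the real clean-up happens in the job loop."""
    state["warned"] = True


signal.signal(signal.SIGUSR1, on_soft_limit)


def run_job(steps):
    """Run steps in order and stop early once a warning has arrived."""
    completed = 0
    for step in steps:
        if state["warned"]:
            break  # leave time to save results before the job is aborted
        step()
        completed += 1
    return completed


def simulate_warning():
    # Stand-in for the SIGUSR1 that would be sent when s_rt is exceeded.
    os.kill(os.getpid(), signal.SIGUSR1)


steps = [lambda: None, simulate_warning, lambda: None, lambda: None]
completed = run_job(steps)
print(completed, state["warned"])
```

Polling a flag keeps the handler itself trivial, which avoids doing complex work in signal context.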
725
726 The resource limit parameters s_cpu and h_cpu are implemented by Grid
727 Engine as a job limit. They impose a limit on the amount of combined
728 CPU time consumed by all the processes in the job. If h_cpu is
729 exceeded by a job running in the queue, it is aborted via a SIGKILL
730 signal (see kill(1)). If s_cpu is exceeded, the job is sent a SIGXCPU
731 signal which can be caught by the job. If you wish to allow a job to
732 be "warned" so it can exit gracefully before it is killed then you
733 should set the s_cpu limit to a lower value than h_cpu. For parallel
734 processes, the limit is applied per slot which means that the limit is
735 multiplied by the number of slots being used by the job before being
736 applied.
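
The per slot scaling for parallel jobs amounts to a single multiplication. The Python sketch below works it through for a time value, assuming the limit is written either as plain decimal seconds, as hours:minutes:seconds with empty fields counting as zero, or as "infinity". This follows the author's understanding of time_specifier in sge_types and does not handle hexadecimal or octal constants; the function names are hypothetical.

```python
import math


def parse_time(spec):
    """Convert a simplified time specifier to seconds."""
    if spec.lower() == "infinity":
        return math.inf
    if ":" not in spec:
        return int(spec)
    # Empty fields such as in "1::1" are treated as zero.
    hours, minutes, seconds = (int(p) if p else 0 for p in spec.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def effective_cpu_limit(spec, slots):
    """Limit applied to the whole job: per slot limit times slot count."""
    return parse_time(spec) * slots


print(effective_cpu_limit("1:30:00", 4))
print(effective_cpu_limit("600", 1))
print(effective_cpu_limit("infinity", 8))
```

A job using 4 slots in a queue with h_cpu set to 1:30:00 would therefore be allowed 21600 seconds of combined CPU time under this reading.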
737
738 The resource limit parameters s_vmem and h_vmem are implemented by Grid
739 Engine as a job limit. They impose a limit on the amount of combined
740 virtual memory consumed by all the processes in the job. If h_vmem is
741 exceeded by a job running in the queue, it is aborted via a SIGKILL
742 signal (see kill(1)). If s_vmem is exceeded, the job is sent a SIGXCPU
743 signal which can be caught by the job. If you wish to allow a job to
744 be "warned" so it can exit gracefully before it is killed then you
745 should set the s_vmem limit to a lower value than h_vmem. For parallel
746 processes, the limit is applied per slot which means that the limit is
747 multiplied by the number of slots being used by the job before being
748 applied.
749
750 The remaining parameters in the queue configuration template specify
751 per job soft and hard resource limits as implemented by the setr‐
752 limit(2) system call. See this manual page on your system for more
753 information. By default, each limit field is set to infinity (which
754 means RLIM_INFINITY as described in the setrlimit(2) manual page). The
755 value type for the CPU-time limits s_cpu and h_cpu is time. The value
756 type for the other limits is memory. Note: Not all systems support
757 setrlimit(2).
758
759 Note also: s_vmem and h_vmem (virtual memory) are only available on
760 systems supporting RLIMIT_VMEM (see setrlimit(2) on your operating sys‐
761 tem).
762
763 The UNICOS operating system supplied by SGI/Cray does not support the
764 setrlimit(2) system call, using their own resource limit-setting system
765 call instead. For UNICOS systems only, the following meanings apply:
766
767 s_cpu The per-process CPU time limit in seconds.
768
769 s_core The per-process maximum core file size in bytes.
770
771 s_data The per-process maximum memory limit in bytes.
772
773 s_vmem The same as s_data (if both are set the minimum is used).
774
775 h_cpu The per-job CPU time limit in seconds.
776
777 h_data The per-job maximum memory limit in bytes.
778
779 h_vmem The same as h_data (if both are set the minimum is used).
780
781 h_fsize The total number of disk blocks that this job can create.
782
784 ge_intro(1), sge_types(1), csh(1), qconf(1), qmon(1), qrestart(1),
785 qstat(1), qsub(1), sh(1), nice(2), setrlimit(2), access_list(5), calen‐
786 dar_conf(5), ge_conf(5), complex(5), host_conf(5), sched_conf(5),
787 ge_execd(8), ge_qmaster(8), ge_shepherd(8).
788
790 See ge_intro(1) for a full statement of rights and permissions.
791
792
793
794GE 6.2u5 $Date: 2009/12/07 19:09:27 $ QUEUE_CONF(5)