sge_queue_conf(5)

QUEUE_CONF(5)              Grid Engine File Formats              QUEUE_CONF(5)

NAME

       queue_conf - Grid Engine queue configuration file format

DESCRIPTION

       This manual page describes the format of the template file for the
       cluster queue configuration. Via the -aq and -mq options of the
       qconf(1) command, you can add cluster queues and modify the
       configuration of any queue in the cluster. Any of these change
       operations can be rejected as a result of a failed integrity
       verification.

       The queue configuration parameters take strings, decimal integers,
       booleans, time and memory specifiers (see time_specifier and
       memory_specifier in sge_types(5)) or comma separated lists as
       values.

       Note that Grid Engine allows backslashes (\) to be used to escape
       newline (\newline) characters. The backslash and the newline are
       replaced with a space (" ") character before any interpretation.

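       A minimal Python sketch of this continuation rule, assuming the
       configuration text is available as a single string. The helper name
       join_continuations is hypothetical and not part of Grid Engine.

```python
def join_continuations(text: str) -> str:
    """Replace every backslash-newline pair with one space character,
    mirroring the continuation rule described above."""
    return text.replace("\\\n", " ")


raw = "load_thresholds np_load_avg=1.75,\\\nusers_logged_in=5\n"
print(join_continuations(raw), end="")
```

       The two physical lines of raw become one logical line, with a space
       where the backslash and newline used to be.
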

FORMAT

       The following list of parameters specifies the queue configuration
       file content:

   qname
       The name of the cluster queue as defined for queue_name in
       sge_types(1). The template default is "template".

   hostlist
       A list of host identifiers as defined for host_identifier in
       sge_types(1). For each host, Grid Engine maintains a queue instance
       for running jobs on that particular host. Large numbers of hosts can
       easily be managed by using host groups rather than single host
       names. White space and "," can be used as list separators.
       (template default: NONE).

       If more than one host is specified, it can be desirable to specify
       host-specific deviations from the parameter settings described
       below. These deviations can be expressed using the enhanced queue
       configuration specifier syntax, which builds upon the regular
       parameter specifier syntax separately for each parameter:

       "["host_identifier=<parameters_specifier_syntax>"]" [,"["host_identifier=<parameters_specifier_syntax>"]" ]

       Note that even in the enhanced queue configuration specifier syntax,
       an entry without brackets denoting the default setting is required;
       it is used for all queue instances where no deviations are
       specified. Tuples with a host group host_identifier override the
       default setting. Tuples with a host name host_identifier override
       both the default and the host group setting.

       A queue configuration that lacks such a default setting for any
       configuration attribute is rejected. Ambiguous queue configurations
       with more than one attribute setting for a particular host are
       rejected as well. Configurations containing override values for
       hosts not listed under hostlist are accepted, but are reported by
       the -sds option of qconf(1). The cluster queue should contain an
       unambiguous specification for each configuration attribute of each
       queue instance specified under hostlist in the queue configuration.
       Ambiguous configurations with more than one attribute setting
       resulting from overlapping host groups are reported by -explain c of
       qstat(1) and cause the affected queue instance to enter the
       c(onfiguration ambiguous) state.

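       The precedence rules above can be illustrated with a small Python
       sketch. It assumes that an attribute value has already been read as
       one string and that override values never contain "]". Duplicate
       entries for the same host_identifier are rejected, and a host that
       matches more than one host group override is treated as ambiguous.
       The functions parse_attribute and resolve, as well as the names
       "@bigmem" and "node01", are hypothetical.

```python
from __future__ import annotations

import re

# One bracketed override, e.g. "[@bigmem=4]" or "[node01=2]".
_OVERRIDE = re.compile(r"\[([^=\]]+)=([^\]]*)\]")


def parse_attribute(value: str) -> tuple[str, dict[str, str]]:
    """Split 'default,[id=value],...' into (default, {host_identifier: value})."""
    overrides: dict[str, str] = {}
    for ident, setting in _OVERRIDE.findall(value):
        ident = ident.strip()
        if ident in overrides:
            raise ValueError(f"more than one setting for {ident}")  # rejected
        overrides[ident] = setting.strip()
    default = _OVERRIDE.sub("", value).strip(" ,")
    if not default:
        raise ValueError("a default setting without brackets is required")
    return default, overrides


def resolve(value: str, host: str, host_groups: set[str]) -> str:
    """Effective value for one queue instance.

    Precedence: host name override > host group override > default.
    """
    default, overrides = parse_attribute(value)
    if host in overrides:
        return overrides[host]
    group_hits = [overrides[g] for g in sorted(host_groups) if g in overrides]
    if len(group_hits) > 1:
        # Overlapping host groups: the queue instance would enter state c.
        raise ValueError(f"ambiguous configuration for {host}")
    return group_hits[0] if group_hits else default


print(resolve("1,[@bigmem=4],[node01=2]", "node01", {"@bigmem"}))
```

       With the value 1,[@bigmem=4],[node01=2], resolve returns "2" for
       node01, "4" for any other member of @bigmem and "1" for all
       remaining hosts.
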
   seq_no
       In conjunction with the host's load situation at a given time, this
       parameter specifies this queue's position in the scheduling order
       among the queues suitable for a job to be dispatched, taking the
       queue_sort_method into account (see sched_conf(5)).

       Regardless of the queue_sort_method setting, qstat(1) reports queue
       information in the order defined by the value of seq_no. Set this
       parameter to a monotonically increasing sequence. (type number;
       template default: 0).

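       The following Python sketch shows one way seq_no and load could be
       combined when ordering candidate queue instances. It is a simplified
       model under stated assumptions: queue_sort_method is taken to be
       either load or seqno (see sched_conf(5)), the secondary key only
       breaks ties, and each host contributes a single load number. The
       scheduler's real load computation is not modelled, and the
       QueueInstance type and dispatch_order function are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class QueueInstance:
    name: str
    seq_no: int
    load: float  # single load value of the host (illustrative)


def dispatch_order(queues: list[QueueInstance], queue_sort_method: str) -> list[str]:
    """Order candidate queue instances for dispatching (simplified model)."""
    if queue_sort_method == "seqno":
        def key(q: QueueInstance):
            return (q.seq_no, q.load)   # seq_no first, load breaks ties
    elif queue_sort_method == "load":
        def key(q: QueueInstance):
            return (q.load, q.seq_no)   # load first, seq_no breaks ties
    else:
        raise ValueError(f"unknown queue_sort_method: {queue_sort_method}")
    return [q.name for q in sorted(queues, key=key)]


queues = [QueueInstance("a.q@n1", 10, 0.2),
          QueueInstance("b.q@n2", 0, 0.9),
          QueueInstance("c.q@n3", 5, 0.2)]
print(dispatch_order(queues, "seqno"))
```
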
   load_thresholds
       load_thresholds is a list of load thresholds. As soon as even one of
       the thresholds is exceeded, no further jobs will be scheduled to the
       queues and qmon(1) will signal an overload condition for this node.
       Arbitrary load values defined in the "host" and "global" complexes
       (see complex(5) for details) can be used.

       The syntax is that of a comma separated list, with each list element
       consisting of the complex_name (see sge_types(5)) of a load value,
       an equal sign and the threshold value intended to trigger the
       overload situation (e.g. load_avg=1.75,users_logged_in=5).

       Note: Load values as well as consumable resources may be scaled
       differently for different hosts if specified in the corresponding
       execution host definitions (refer to host_conf(5) for more
       information). Load thresholds are compared against the scaled load
       and consumable values.

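       A Python sketch of the threshold syntax and the comparison against
       scaled values. Two assumptions are made here and are not taken from
       this page: a value above its threshold counts as exceeded (see
       complex(5) for how comparisons are actually defined per complex),
       and host scaling factors are multiplicative. The functions
       parse_thresholds and exceeded are hypothetical.

```python
from __future__ import annotations


def parse_thresholds(spec: str) -> dict[str, float]:
    """Parse 'load_avg=1.75,users_logged_in=5' into {complex_name: threshold}."""
    if spec.strip().upper() == "NONE":
        return {}
    thresholds: dict[str, float] = {}
    for item in spec.split(","):
        name, sep, value = item.strip().partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {item!r}")
        thresholds[name] = float(value)
    return thresholds


def exceeded(spec: str, load: dict[str, float],
             scaling: dict[str, float] | None = None) -> list[str]:
    """Names of thresholds exceeded by the (scaled) load values of a host."""
    scaling = scaling or {}
    hits = []
    for name, limit in parse_thresholds(spec).items():
        if name in load and load[name] * scaling.get(name, 1.0) > limit:
            hits.append(name)
    return hits


print(exceeded("load_avg=1.75,users_logged_in=5",
               {"load_avg": 2.0, "users_logged_in": 3}))
```

       With a scaling factor of 0.5 for load_avg, the same raw value of 2.0
       would be compared as 1.0 and no longer exceed the threshold.
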
   suspend_thresholds
       A list of load thresholds with the same semantics as that of the
       load_thresholds parameter (see above), except that exceeding one of
       the denoted thresholds initiates the suspension of one or more jobs
       in the queue. See the nsuspend parameter below for details on the
       number of jobs which are suspended. There is an important
       relationship between the suspend_thresholds and the scheduling
       interval (schedule_interval in sched_conf(5)). If, for example, you
       have a suspend threshold on np_load_avg and the load exceeds the
       threshold, this does not have an immediate effect. Jobs continue
       running until the next scheduling run, in which the scheduler
       detects that the threshold has been exceeded and sends an order to
       qmaster to suspend the job. The same applies to unsuspending.

   nsuspend
       The number of jobs which are suspended or enabled per time interval
       if at least one of the load thresholds in the suspend_thresholds
       list is exceeded, or if no suspend threshold is violated anymore,
       respectively. nsuspend jobs are suspended in each time interval
       until no suspend_thresholds are exceeded anymore or all jobs in the
       queue are suspended. Jobs are enabled in the corresponding way if
       the suspend_thresholds are no longer exceeded. The time interval in
       which the suspensions of the jobs occur is defined in
       suspend_interval below.

   suspend_interval
       The time interval in which further nsuspend jobs are suspended if
       one of the suspend_thresholds (see above for both) is exceeded by the
       current load on the host on which the queue is located. The time
       interval is also used when enabling the jobs. The syntax is that of a
       time_specifier in sge_types(5).

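       The interplay of nsuspend and suspend_interval can be simulated in a
       few lines of Python. This is only a sketch. Which jobs are picked for
       suspension or re-enabling (here: list order) is an assumption, the
       per-interval threshold check is reduced to a boolean, and
       suspend_step is a hypothetical helper.

```python
from __future__ import annotations


def suspend_step(running: list[str], suspended: list[str], nsuspend: int,
                 threshold_exceeded: bool) -> tuple[list[str], list[str]]:
    """Apply one suspend_interval: suspend or re-enable up to nsuspend jobs."""
    if threshold_exceeded:
        return running[nsuspend:], suspended + running[:nsuspend]
    return running + suspended[:nsuspend], suspended[nsuspend:]


running: list[str] = ["j1", "j2", "j3"]
suspended: list[str] = []
for over in (True, True, False):  # one entry per suspend_interval
    running, suspended = suspend_step(running, suspended, 2, over)
    print(running, suspended)
```

       With nsuspend set to 2, two intervals with an exceeded threshold
       suspend all three jobs, and the following interval re-enables two of
       them.
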
   priority
       The priority parameter specifies the nice(2) value at which jobs in
       this queue will be run. The type is number and the default is zero
       (which means no nice value is set explicitly). Negative values (down
       to -20) correspond to a higher scheduling priority, positive values
       (up to +20) correspond to a lower scheduling priority.

       Note that the value of priority has no effect if Grid Engine adjusts
       priorities dynamically to implement ticket-based entitlement policy
       goals. Dynamic priority adjustment is switched off by default
       because reprioritize in sge_conf(5) is set to false.

   min_cpu_interval
       The time between two automatic checkpoints in case of transparently
       checkpointing jobs. The maximum of the time requested by the user
       via qsub(1) and the time defined by the queue configuration is used
       as the checkpoint interval. Since checkpoint files may be
       considerably large, and writing them to the file system may thus
       become expensive, users and administrators are advised to choose
       sufficiently large time intervals. min_cpu_interval is of type time
       and the default is 5 minutes (which usually is suitable for test
       purposes only). The syntax is that of a time_specifier in
       sge_types(5).

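       A Python sketch of the "maximum of both values" rule. It handles only
       two time_specifier notations: a plain decimal number of seconds, and
       hours:minutes:seconds where an empty field counts as zero. The second
       notation reflects a reading of sge_types(5) and is an assumption
       here. See that page for the full grammar. The helper names are
       hypothetical.

```python
from __future__ import annotations


def parse_time_specifier(spec: str) -> int:
    """Seconds for '300' or 'h:m:s' (empty fields count as 0, e.g. '1::1')."""
    parts = spec.strip().split(":")
    if len(parts) == 1:
        return int(parts[0])
    if len(parts) != 3:
        raise ValueError(f"not a time specifier: {spec!r}")
    hours, minutes, seconds = (int(p) if p else 0 for p in parts)
    return hours * 3600 + minutes * 60 + seconds


def checkpoint_interval(min_cpu_interval: str, requested: str | None = None) -> int:
    """Effective interval: the maximum of the queue value and the user request."""
    queue_value = parse_time_specifier(min_cpu_interval)
    if requested is None:
        return queue_value
    return max(queue_value, parse_time_specifier(requested))


print(checkpoint_interval("0:5:0", "60"))  # user asks for less than the queue minimum
```

       A user request shorter than min_cpu_interval is therefore raised to
       the queue value, while a longer request is kept.
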
   processors
       A set of processors can be defined for a multiprocessor execution
       host, to which the jobs executing in this queue are bound. The value
       type of this parameter is a range description like that of the -pe
       option of qsub(1) (e.g. 1-4,8,10), denoting the processor numbers
       for the processor group to be used. The interpretation of these
       values relies on operating system specifics and is thus performed
       inside ge_execd(8) running on the queue host. Therefore, the parsing
       of the parameter has to be provided by the execution daemon, and the
       parameter is only passed through ge_qmaster(8) as a string.

       Currently, support is only provided for multiprocessor machines
       running Solaris, SGI multiprocessor machines running IRIX 6.2 and
       Digital UNIX multiprocessor machines. In the case of Solaris, the
       processor set must already exist when this processors parameter is
       configured, so the processor set has to be created manually. In the
       case of Digital UNIX, only one job per processor set is allowed to
       execute at the same time, i.e. slots (see below) should be set to 1
       for this queue.

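       A range description such as 1-4,8,10 can be expanded with a short
       Python function. The function parse_processor_range is hypothetical.
       As described above, the real parsing happens inside ge_execd(8).
       Open-ended ranges, which the -pe option of qsub(1) accepts, are not
       handled in this sketch.

```python
from __future__ import annotations


def parse_processor_range(spec: str) -> list[int]:
    """Expand a range description such as '1-4,8,10' into processor numbers."""
    cpus: set[int] = set()
    for item in spec.split(","):
        item = item.strip()
        if "-" in item:
            low, high = (int(x) for x in item.split("-", 1))
            if low > high:
                raise ValueError(f"empty range: {item!r}")
            cpus.update(range(low, high + 1))
        else:
            cpus.add(int(item))
    return sorted(cpus)


print(parse_processor_range("1-4,8,10"))  # [1, 2, 3, 4, 8, 10]
```
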
   qtype
       The type of queue: currently batch, interactive, a combination of
       both in a comma separated list, or NONE.

       The formerly supported types parallel and checkpointing are not
       allowed anymore. A queue instance is implicitly of type
       parallel/checkpointing if there is a parallel environment or a
       checkpointing interface specified for this queue instance in
       pe_list/ckpt_list. Formerly possible settings, e.g.

       qtype   PARALLEL

       could be transferred into

       qtype   NONE
       pe_list pe_name

       (type string; default: batch interactive).

   pe_list
       The list of administrator-defined parallel environment names (see
       sge_pe(5)) to be associated with the queue. The default is NONE.

   ckpt_list
       The list of administrator-defined checkpointing interface names (see
       ckpt_name in sge_types(1)) to be associated with the queue. The
       default is NONE.

   rerun
       Defines a default behavior for jobs which are aborted by system
       crashes or by a manual "violent" (via kill(1)) shutdown of the
       complete Grid Engine system (including the ge_shepherd(8) of the
       jobs and their process hierarchy) on the queue host. As soon as
       ge_execd(8) is restarted and detects that a job has been aborted for
       such reasons, the job can be restarted if it is restartable. A job
       may not be restartable, for example, if it updates databases (first
       reads and then writes to the same record of a database/file),
       because the abortion of the job may have left the database in an
       inconsistent state. If the owner of a job wants to overrule the
       default behavior for the jobs in the queue, the -r option of
       qsub(1) can be used.

       The type of this parameter is boolean, thus either TRUE or FALSE can
       be specified. The default is FALSE, i.e. do not restart jobs
       automatically.

   slots
       The maximum number of concurrently executing jobs allowed in the
       queue. Type is number, valid values are 0 to 9999999.

   tmpdir
       The tmpdir parameter specifies the absolute path to the base of the
       temporary directory filesystem. When ge_execd(8) launches a job, it
       creates a uniquely-named directory in this filesystem for the
       purpose of holding scratch files during job execution. At job
       completion, this directory and its contents are removed
       automatically. The environment variables TMPDIR and TMP are set to
       the path of each job's scratch directory (type string; default:
       /tmp).

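       The scratch directory life cycle described above can be approximated
       with the Python standard library. This sketch does not reproduce how
       ge_execd(8) actually implements it. The function run_with_scratch and
       the "job." name prefix are hypothetical, and only the TMPDIR/TMP
       handling and the removal at job completion follow the description.

```python
from __future__ import annotations

import os
import shutil
import subprocess
import sys
import tempfile


def run_with_scratch(cmd: list[str], tmpdir_base: str) -> tuple[str, str]:
    """Run cmd with TMPDIR/TMP pointing at a fresh directory under
    tmpdir_base, then remove that directory and everything in it."""
    scratch = tempfile.mkdtemp(prefix="job.", dir=tmpdir_base)  # unique name
    env = dict(os.environ, TMPDIR=scratch, TMP=scratch)
    try:
        result = subprocess.run(cmd, env=env, capture_output=True,
                                text=True, check=False)
        return result.stdout, scratch
    finally:
        shutil.rmtree(scratch, ignore_errors=True)


out, path = run_with_scratch(
    [sys.executable, "-c", "import os; print(os.environ['TMPDIR'])"],
    tempfile.gettempdir(),
)
print(out.strip() == path, os.path.exists(path))  # True False
```
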
   shell
       If either posix_compliant or script_from_stdin is specified as the
       shell_start_mode parameter (see below), the shell parameter
       specifies the executable path of the command interpreter (e.g. sh(1)
       or csh(1)) to be used to process the job scripts executed in the
       queue. The definition of shell can be overruled by the job owner via
       the qsub(1) -S option.

       The type of the parameter is string. The default is /bin/csh.

   shell_start_mode
       This parameter defines the mechanisms which are used to actually
       invoke the job scripts on the execution hosts. The following values
       are recognized:

       unix_behavior
              If a user starts a job shell script under UNIX interactively
              by invoking it just with the script name, the operating
              system's executable loader uses the information provided in a
              comment such as `#!/bin/csh' in the first line of the script
              to detect which command interpreter to start to interpret the
              script. This mechanism is used by Grid Engine when starting
              jobs if unix_behavior is defined as shell_start_mode.

       posix_compliant
              POSIX does not consider first script line comments such as
              `#!/bin/csh' as being significant. The POSIX standard for
              batch queuing systems (P1003.2d) therefore requires a
              compliant queuing system to ignore such lines and to use user
              specified or configured default command interpreters instead.
              Thus, if shell_start_mode is set to posix_compliant, Grid
              Engine will use either the command interpreter indicated by
              the -S option of the qsub(1) command or the shell parameter
              of the queue to be used (see above).

       script_from_stdin
              Setting the shell_start_mode parameter either to
              posix_compliant or unix_behavior requires you to set the
              umask in use for ge_execd(8) such that every user has read
              access to the active_jobs directory in the spool directory of
              the corresponding execution daemon. In case you have prolog
              and epilog scripts configured, they also need to be readable
              by any user who may execute jobs.
              If this violates your site's security policies, you may want
              to set shell_start_mode to script_from_stdin. This will force
              Grid Engine to open the job script as well as the epilog and
              prolog scripts for reading into STDIN as root (if ge_execd(8)
              was started as root) before changing to the job owner's user
              account. The script is then fed into the STDIN stream of the
              command interpreter indicated by the -S option of the qsub(1)
              command or the shell parameter of the queue to be used (see
              above).
              Thus setting shell_start_mode to script_from_stdin also
              implies posix_compliant behavior. Note, however, that feeding
              scripts into the STDIN stream of a command interpreter may
              cause trouble if commands like rsh(1) are invoked inside a
              job script, as they also process the STDIN stream of the
              command interpreter. These problems can usually be resolved
              by redirecting the STDIN channel of those commands to come
              from /dev/null (e.g. rsh host date < /dev/null). Note also
              that any command-line options associated with the job are
              passed to the executing shell. The shell will only forward
              them to the job if they are not recognized as valid shell
              options.

       The default for shell_start_mode is posix_compliant. Note, though,
       that shell_start_mode can only be used for batch jobs submitted by
       qsub(1) and can't be used for interactive jobs submitted by qrsh(1),
       qsh(1) or qlogin(1).

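       The decision described for the three modes can be summarized in a
       Python sketch. It only returns which interpreter would be used and
       whether the script is fed via STDIN; it does not start anything.
       The function pick_interpreter is hypothetical. What Grid Engine does
       under unix_behavior when the script has no '#!' line is not covered
       here, and the sketch reports None in that case.

```python
from __future__ import annotations


def pick_interpreter(mode: str, first_line: str, queue_shell: str,
                     qsub_S: str | None = None) -> tuple[str | None, bool]:
    """Return (interpreter, script_fed_via_stdin) for a batch job script."""
    if mode == "unix_behavior":
        parts = first_line[2:].split() if first_line.startswith("#!") else []
        return (parts[0] if parts else None), False
    shell = qsub_S or queue_shell    # qsub -S overrides the queue's shell
    if mode == "posix_compliant":
        return shell, False          # any '#!' line is ignored
    if mode == "script_from_stdin":
        return shell, True           # script content is fed through STDIN
    raise ValueError(f"unknown shell_start_mode: {mode!r}")


print(pick_interpreter("posix_compliant", "#!/bin/bash", "/bin/csh"))
```

       Under posix_compliant the '#!/bin/bash' line is ignored and the
       queue's shell /bin/csh is chosen unless qsub -S names another one.
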
   prolog
       The executable path of a shell script that is started before the
       execution of Grid Engine jobs, with the same environment setting as
       that for the Grid Engine jobs to be started afterwards. An optional
       prefix "user@" specifies the user under which this procedure is to
       be started. The procedure's standard output and error output streams
       are written to the same file that is used for the standard output
       and error output of each job. This procedure is intended as a means
       for the Grid Engine administrator to automate the execution of
       general site-specific tasks, like the preparation of temporary file
       systems, that need the same context information as the job. This
       queue configuration entry overwrites cluster global or execution
       host specific prolog definitions (see sge_conf(5)).

       The default for prolog is the special value NONE, which prevents the
       execution of a prolog script. The special variables for
       constituting a command line are the same as in the prolog
       definitions of the cluster configuration (see sge_conf(5)).

       Exit codes for the prolog attribute are interpreted as follows:
              0: Success
              99: Reschedule job
              100: Put job in error state
              Anything else: Put queue in error state

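       The exit code table can be expressed as a lookup in Python. The
       epilog attribute described next lists the same codes. The names
       EXIT_ACTIONS, action_for_exit_code and run_procedure are
       hypothetical. Treating a negative return code (a process killed by a
       signal) as "anything else" is an assumption of this sketch.

```python
from __future__ import annotations

import subprocess
import sys

# Documented consequences of prolog/epilog exit codes.
EXIT_ACTIONS = {
    0: "success",
    99: "reschedule job",
    100: "put job in error state",
}


def action_for_exit_code(code: int) -> str:
    """Map a prolog/epilog exit code to the documented consequence."""
    return EXIT_ACTIONS.get(code, "put queue in error state")


def run_procedure(argv: list[str]) -> str:
    """Run a prolog-like command and classify its exit status."""
    return action_for_exit_code(subprocess.run(argv, check=False).returncode)


print(run_procedure([sys.executable, "-c", "import sys; sys.exit(99)"]))  # reschedule job
```
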
   epilog
       The executable path of a shell script that is started after the
       execution of Grid Engine jobs, with the same environment setting as
       that for the Grid Engine job that has just completed. An optional
       prefix "user@" specifies the user under which this procedure is to
       be started. The procedure's standard output and error output streams
       are written to the same file that is used for the standard output
       and error output of each job. This procedure is intended as a means
       for the Grid Engine administrator to automate the execution of
       general site-specific tasks, like the cleaning up of temporary file
       systems, that need the same context information as the job. This
       queue configuration entry overwrites cluster global or execution
       host specific epilog definitions (see sge_conf(5)).

       The default for epilog is the special value NONE, which prevents the
       execution of an epilog script. The special variables for
       constituting a command line are the same as in the prolog
       definitions of the cluster configuration (see sge_conf(5)).

       Exit codes for the epilog attribute are interpreted as follows:
              0: Success
              99: Reschedule job
              100: Put job in error state
              Anything else: Put queue in error state

   starter_method
       The specified executable path will be used as a job starter facility
       responsible for starting batch jobs. The executable path will be
       executed instead of the configured shell to start the job. The job
       arguments will be passed as arguments to the job starter. The
       following environment variables are used to pass information to the
       job starter concerning the shell environment which was configured or
       requested to start the job.

       SGE_STARTER_SHELL_PATH
              The name of the requested shell to start the job.

       SGE_STARTER_SHELL_START_MODE
              The configured shell_start_mode.

       SGE_STARTER_USE_LOGIN_SHELL
              Set to "true" if the shell is supposed to be used as a login
              shell (see login_shells in sge_conf(5)).

       The starter_method will not be invoked for qsh, qlogin or qrsh acting
       as rlogin.

   suspend_method
   resume_method
   terminate_method
       These parameters can be used to overwrite the default methods used
       by Grid Engine for suspending a job, releasing a suspension and
       terminating a job. By default, the signals SIGSTOP, SIGCONT and
       SIGKILL are delivered to the job to perform these actions. However,
       for some applications this is not appropriate.

       If no executable path is given, Grid Engine takes the specified
       parameter entries as the signal to be delivered instead of the
       default signal. A signal must be either a positive number or a
       signal name consisting of the prefix "SIG" followed by the signal
       name as printed by kill -l (e.g. SIGTERM).

       If an executable path is given (it must be an absolute path starting
       with a "/"), then this command together with its arguments is
       started by Grid Engine to perform the appropriate action. The
       following special variables are expanded at runtime and can be used
       (besides any other strings which have to be interpreted by the
       procedures) to constitute a command line:

       $host  The name of the host on which the procedure is started.

       $job_owner
              The user name of the job owner.

       $job_id
              Grid Engine's unique job identification number.

       $job_name
              The name of the job.

       $queue The name of the queue.

       $job_pid
              The pid of the job.

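       A Python sketch of how such a method value could be interpreted: as
       a signal number, as a SIG-prefixed signal name, or as an absolute
       command path whose special variables are expanded. Python's
       string.Template happens to use the same $name notation. The function
       parse_method and the path /usr/local/sbin/ckpt_suspend are
       hypothetical, and this is not Grid Engine's own parser.

```python
from __future__ import annotations

import shlex
import signal
import string


def parse_method(spec: str, job: dict[str, str]) -> int | list[str]:
    """Return a signal number, or the argv of a command with $host,
    $job_owner, $job_id, $job_name, $queue and $job_pid expanded from job."""
    spec = spec.strip()
    if spec.startswith("/"):
        # Split first, then expand, so values containing spaces stay one argument.
        return [string.Template(tok).safe_substitute(job) for tok in shlex.split(spec)]
    if spec.isdigit() and int(spec) > 0:
        return int(spec)
    if spec.startswith("SIG"):
        try:
            return int(signal.Signals[spec])
        except KeyError:
            raise ValueError(f"unknown signal name: {spec}") from None
    raise ValueError(f"not a signal or absolute path: {spec!r}")


print(parse_method("/usr/local/sbin/ckpt_suspend $job_id $job_pid",
                   {"job_id": "42", "job_pid": "4711"}))
```
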
   notify
       The time waited between the delivery of the SIGUSR1/SIGUSR2
       notification signals and the suspend/kill signals if the job was
       submitted with the qsub(1) -notify option.

   owner_list
       The owner_list is a comma separated list of the login(1) user names
       (see user_name in sge_types(1)) of those users who are authorized to
       disable and suspend this queue through qmod(1) (Grid Engine
       operators and managers can do this by default). It is customary to
       set this field for queues on interactive workstations where the
       computing resources are shared between interactive sessions and Grid
       Engine jobs, allowing the workstation owner to have priority access.
       (default: NONE).

   user_lists
       The user_lists parameter contains a comma separated list of Grid
       Engine user access list names as described in access_list(5). Each
       user contained in at least one of the listed access lists has access
       to the queue. If the user_lists parameter is set to NONE (the
       default), any user who is not explicitly excluded via the
       xuser_lists parameter described below has access. If a user is
       contained both in an access list listed in xuser_lists and in one
       listed in user_lists, the user is denied access to the queue.

   xuser_lists
       The xuser_lists parameter contains a comma separated list of Grid
       Engine user access list names as described in access_list(5). Each
       user contained in at least one of the listed access lists is not
       allowed to access the queue. If the xuser_lists parameter is set to
       NONE (the default), any user has access. If a user is contained both
       in an access list listed in xuser_lists and in one listed in
       user_lists, the user is denied access to the queue.

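       The combined effect of user_lists and xuser_lists can be expressed
       as a small Python predicate, where None stands for the value NONE.
       This sketch models access lists as plain sets of user names. Real
       access lists may contain other entry types (see access_list(5)).
       The function name has_queue_access and the list names "staff" and
       "banned" are hypothetical. The projects and xprojects parameters
       below follow the same "exclusion wins" precedence.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence


def has_queue_access(user: str, access_lists: Mapping[str, set[str]],
                     user_lists: Sequence[str] | None,
                     xuser_lists: Sequence[str] | None) -> bool:
    """Evaluate user_lists/xuser_lists; None stands for the value NONE."""
    def member_of_any(names: Sequence[str] | None) -> bool:
        return any(user in access_lists.get(n, set()) for n in names or ())

    if member_of_any(xuser_lists):   # exclusion wins over inclusion
        return False
    if user_lists is None:           # NONE: everyone not excluded has access
        return True
    return member_of_any(user_lists)


acls = {"staff": {"alice", "bob"}, "banned": {"bob"}}
print(has_queue_access("bob", acls, ["staff"], ["banned"]))  # False
```
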
   projects
       The projects parameter contains a comma separated list of Grid
       Engine projects (see project(5)) that have access to the queue. Any
       project not in this list is denied access to the queue. If set to
       NONE (the default), any project that is not specifically excluded
       via the xprojects parameter described below has access. If a project
       is in both the projects and xprojects parameters, the project is
       denied access to the queue.

   xprojects
       The xprojects parameter contains a comma separated list of Grid
       Engine projects (see project(5)) that are denied access to the
       queue. If set to NONE (the default), no projects are denied access
       other than those denied access based on the projects parameter
       described above. If a project is in both the projects and xprojects
       parameters, the project is denied access to the queue.

   subordinate_list
       There are two different types of subordination:

       1. Queuewise subordination

       A list of Grid Engine queue names as defined for queue_name in
       sge_types(1). Subordinate relationships are in effect only between
       queue instances residing at the same host. The relationship does not
       apply and is ignored when jobs are running in queue instances on
       other hosts. The subordinated queue instances residing on the same
       host are suspended when a specified number of jobs is running in
       this queue instance. The list specification is the same as that of
       the load_thresholds parameter above, e.g. low_pri_q=5,small_q. The
       numbers denote the job slots of the queue that have to be filled in
       the superordinated queue to trigger the suspension of the
       subordinated queue. If no value is assigned, a suspension is
       triggered if all slots of the queue are filled.

       On nodes which host more than one queue, you might wish to accord
       better service to certain classes of jobs (e.g., queues that are
       dedicated to parallel processing might need priority over low
       priority production queues; default: NONE).

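       A Python sketch of the queuewise trigger rule for a single host. The
       function names are hypothetical. The sketch assumes that "have to be
       filled" means the number of occupied slots is at least the given
       count, and that a queue name without a count uses the superordinated
       queue's total slot count.

```python
from __future__ import annotations


def parse_subordinates(spec: str) -> dict[str, int | None]:
    """Parse 'low_pri_q=5,small_q' into {queue: slot count or None}."""
    result: dict[str, int | None] = {}
    for item in spec.split(","):
        name, sep, count = item.strip().partition("=")
        result[name] = int(count) if sep else None
    return result


def queues_to_suspend(spec: str, used_slots: int, total_slots: int) -> list[str]:
    """Subordinated queues on this host whose trigger has been reached."""
    return [
        name
        for name, threshold in parse_subordinates(spec).items()
        if used_slots >= (total_slots if threshold is None else threshold)
    ]


print(queues_to_suspend("low_pri_q=5,small_q", 5, 8))  # ['low_pri_q']
```

       With 8 slots, filling 5 of them suspends low_pri_q, and small_q is
       only suspended once all 8 slots are filled.
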
       2. Slotwise preemption

       Slotwise preemption provides a means to ensure that high priority
       jobs get the resources they need, while at the same time low
       priority jobs on the same host are not unnecessarily preempted,
       maximizing the host utilization. Slotwise preemption is designed to
       provide different preemption actions, but the current
       implementation only provides suspension. This means that a
       subordination relationship is defined between queues similar to
       queuewise subordination, but if the suspend threshold is exceeded,
       the whole subordinated queue is not suspended; instead, only single
       tasks running in single slots are suspended.

       As with queuewise subordination, the subordination relationships are
       in effect only between queue instances residing at the same host.
       The relationship does not apply and is ignored when jobs and tasks
       are running in queue instances on other hosts.

       The syntax is:

       slots=<threshold>(<queue_list>)

       where
       <threshold> =a positive integer number
       <queue_list>=<queue_def>[,<queue_list>]
       <queue_def> =<queue>[:<seq_no>][:<action>]
       <queue>     =a Grid Engine queue name as defined for
                    queue_name in sge_types(1).
       <seq_no>    =sequence number among all subordinated queues
                    of the same depth in the tree. The higher the
                    sequence number, the lower the priority of
                    the queue.
                    Default is 0, which is the highest priority.
       <action>    =the action to be taken if the threshold is
                    exceeded. Supported are:
                    "sr": Suspend the task with the shortest run
                          time.
                    "lr": Suspend the task with the longest run
                          time.
                    Default is "sr".

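       The grammar above can be parsed with a regular expression and a
       split on ":". This Python sketch is hypothetical (SubordinateQueue
       and parse_slotwise are not Grid Engine names). It accepts an action
       without a preceding sequence number, as the optional brackets in
       <queue_def> suggest, and it does not validate queue names.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

_SLOTWISE = re.compile(r"^\s*slots\s*=\s*(\d+)\s*\((.*)\)\s*$")


@dataclass
class SubordinateQueue:
    queue: str
    seq_no: int = 0      # higher number = lower priority
    action: str = "sr"   # "sr": shortest run time first, "lr": longest


def parse_slotwise(spec: str) -> tuple[int, list[SubordinateQueue]]:
    """Parse 'slots=<threshold>(<queue>[:<seq_no>][:<action>],...)'."""
    match = _SLOTWISE.match(spec)
    if not match:
        raise ValueError(f"not a slotwise specification: {spec!r}")
    threshold = int(match.group(1))
    if threshold <= 0:
        raise ValueError("threshold must be a positive integer")
    queues = []
    for item in match.group(2).split(","):
        fields = [f.strip() for f in item.split(":")]
        sub = SubordinateQueue(fields[0])
        for extra in fields[1:]:
            if extra in ("sr", "lr"):
                sub.action = extra
            else:
                sub.seq_no = int(extra)
        queues.append(sub)
    return threshold, queues


print(parse_slotwise("slots=2(B.q:1, C.q:2)"))
```
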
       Some examples of possible configurations and their functionalities:

       a) The simplest configuration

       subordinate_list   slots=2(B.q)

       which means the queue "B.q" is subordinated to the current queue
       (let's call it "A.q"), the suspend threshold for all tasks running in
       "A.q" and "B.q" on the current host is two, the sequence number of
       "B.q" is "0" and the action is "suspend task with shortest run time
       first". This subordination relationship looks like this:

             A.q
              |
             B.q

       This could be a typical configuration for a host with a dual core
       CPU. This subordination configuration ensures that tasks that are
       scheduled to "A.q" always get a CPU core for themselves, while jobs
       in "B.q" are not preempted as long as there are no jobs running in
       "A.q".

       If there is no task running in "A.q", two tasks are running in "B.q"
       and a new task is scheduled to "A.q", the sum of tasks running in
       "A.q" and "B.q" is three. Three is greater than two, which triggers
       the defined action: the task with the shortest run time in the
       subordinated queue "B.q" is suspended. After suspension, there is
       one task running in "A.q", one task running in "B.q" and one task
       suspended in "B.q".

554       b) A simple tree
555
556       subordinate_list   slots=2(B.q:1, C.q:2)
557
558       This defines a small tree that looks like this:
559
560             A.q
561            /   \
562          B.q   C.q
563
564       A use case for this configuration could be a host with a dual core  CPU
565       and  queue  "B.q"  and "C.q" for jobs with different requirements, e.g.
566       "B.q" for interactive jobs, "C.q" for batch jobs.  Again, the tasks  in
567       "A.q"  always  get  a CPU core, while tasks in "B.q" and "C.q" are sus‐
568       pended only if the threshold of running tasks is  exceeded.   Here  the
569       sequence  number  among  the  queues of the same depth comes into play.
570       Tasks scheduled to "B.q" can't directly trigger the suspension of tasks
571       in  "C.q",  but if there is a task to be suspended, first "C.q" will be
572       searched for a suitable task.

       If there is one task running in "A.q", one in "C.q" and a new task is
       scheduled to "B.q", the three tasks running in "A.q", "B.q" and "C.q"
       exceed the threshold of 2. This triggers the suspension of one task in
       either "B.q" or "C.q". The sequence numbers give "B.q" a higher
       priority than "C.q", therefore the task in "C.q" is suspended. After
       suspension, there is one task running in "A.q", one task running in
       "B.q" and one task suspended in "C.q".
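       The choice of the task to suspend in this example can be sketched as
       a small selection function. Per the text, the queue with the higher
       sequence number (lower priority) is searched first. Using the
       shortest run time as a tie-break within one queue is an assumption
       borrowed from example a). The function name and the run times are
       hypothetical.

```python
from typing import NamedTuple, Optional


class Task(NamedTuple):
    name: str
    queue: str
    run_time: int  # seconds (hypothetical values)


def pick_victim(running: list[Task], seq_no: dict[str, int]) -> Optional[Task]:
    """Choose a task to suspend among the subordinated queues in seq_no.

    A higher sequence number means a lower priority, so that queue is
    searched first; within one queue the shortest-running task is chosen
    (assumed tie-break)."""
    candidates = [t for t in running if t.queue in seq_no]
    if not candidates:
        return None
    return min(candidates, key=lambda t: (-seq_no[t.queue], t.run_time))


# Scenario b): slots=2(B.q:1, C.q:2), one task each in A.q and C.q,
# and a new task has just been scheduled to B.q.
running = [Task("a1", "A.q", 900), Task("c1", "C.q", 3000), Task("b1", "B.q", 0)]
if len(running) > 2:
    print(pick_victim(running, {"B.q": 1, "C.q": 2}).name)  # c1
```

       Note that "c1" is chosen even though "b1" has the shorter run time,
       because the sequence number is compared first.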

       c) More than two levels

       Configuration of A.q: subordinate_list   slots=2(B.q)
       Configuration of B.q: subordinate_list   slots=2(C.q)

       This configuration looks like this:

             A.q
              |
             B.q
              |
             C.q

       These are three queues with high, medium and low priority. If a task
       is scheduled to "C.q", the subtree consisting of "B.q" and "C.q" is
       checked first: the number of tasks running there is counted, and if
       it exceeds the threshold defined in "B.q", a task in "C.q" is
       suspended. Then the whole tree is checked: if the number of tasks
       running in "A.q", "B.q" and "C.q" exceeds the threshold defined in
       "A.q", a task in "C.q" is suspended. This means that the effective
       threshold of any subtree is not higher than the threshold of the root
       node of the tree. If in this example a task is scheduled to "A.q",
       the number of tasks running in "A.q", "B.q" and "C.q" is immediately
       checked against the threshold defined in "A.q".
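       The consequence stated above can be expressed as a small function.
       The number of tasks that can remain running in a subtree is bounded
       by the smallest threshold on the path from that subtree's root up to
       the root of the tree. This is a sketch: the child-to-parent
       dictionary and the function name are hypothetical, and the second
       call uses a made-up threshold of 4 for "B.q" to show the cap.

```python
from typing import Optional


def subtree_bound(queue: str, parent: dict[str, str],
                  threshold: dict[str, int]) -> Optional[int]:
    """Upper bound on the running tasks in the subtree rooted at `queue`:
    the smallest threshold found on the path from `queue` up to the root."""
    bound = None
    q: Optional[str] = queue
    while q is not None:
        t = threshold.get(q)  # leaf queues have no subordinate_list threshold
        if t is not None and (bound is None or t < bound):
            bound = t
        q = parent.get(q)
    return bound


parent = {"B.q": "A.q", "C.q": "B.q"}
print(subtree_bound("B.q", parent, {"A.q": 2, "B.q": 2}))  # 2
print(subtree_bound("B.q", parent, {"A.q": 2, "B.q": 4}))  # 2 (capped by A.q)
```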

       d) Any tree

              A.q
             /   \
           B.q   C.q
          /     /   \
        D.q    E.q  F.q
                      \
                       G.q

       The computation of the tasks that are to be (un)suspended always starts
       at the queue instance that is modified, i.e. the instance a task is
       scheduled to or ends in, whose configuration is modified, or for which
       a manual or other automatic (un)suspend is issued. The exception is a
       leaf node, like "D.q", "E.q" and "G.q" in this example; then the
       computation starts at its parent queue instance (like "B.q", "C.q" or
       "F.q" in this example). From there, all running tasks in the whole
       subtree of this queue instance are counted first. If the sum exceeds
       the threshold configured in its subordinate_list, a task to be
       suspended is searched for in this subtree. Then the algorithm
       proceeds to the parent of this queue instance, counts all running
       tasks in the whole subtree below the parent and checks whether the
       number exceeds the threshold configured in the parent's
       subordinate_list. If so, it searches for a task to suspend in the
       whole subtree below the parent. This is repeated until the
       computation has been done for the root node of the tree.
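       The walk described above can be sketched as follows. This is a
       simplified model, not Grid Engine's code. The tree is given as a
       hypothetical parent-to-children dictionary, and the thresholds are
       made-up values because the example tree does not define any. The
       text also does not say which task is chosen within a subtree, so the
       sketch uses a placeholder policy: the shortest run time among the
       tasks below the queue whose threshold is exceeded. Sequence numbers
       are ignored.

```python
from typing import NamedTuple, Optional


class Task(NamedTuple):
    name: str
    queue: str
    run_time: int  # seconds (hypothetical values)


def subtree(q: str, children: dict[str, list[str]]) -> set[str]:
    """All queues in the subtree rooted at q, including q itself."""
    result = {q}
    for child in children.get(q, []):
        result |= subtree(child, children)
    return result


def recompute(modified: str, children: dict[str, list[str]],
              threshold: dict[str, int], running: list[Task]) -> list[str]:
    """Walk from the modified queue (or its parent, for a leaf) up to the
    root, suspending tasks wherever a subtree exceeds its threshold.
    `running` is updated in place; returns the suspended task names."""
    parent = {c: p for p, cs in children.items() for c in cs}
    q: Optional[str] = modified if children.get(modified) else parent.get(modified)
    suspended = []
    while q is not None:
        members = subtree(q, children)
        limit = threshold.get(q)
        while limit is not None and sum(t.queue in members for t in running) > limit:
            # Placeholder victim policy: shortest-running task below q.
            candidates = [t for t in running if t.queue in members and t.queue != q]
            if not candidates:
                break
            victim = min(candidates, key=lambda t: t.run_time)
            running.remove(victim)
            suspended.append(victim.name)
        q = parent.get(q)
    return suspended


children = {"A.q": ["B.q", "C.q"], "B.q": ["D.q"],
            "C.q": ["E.q", "F.q"], "F.q": ["G.q"]}
threshold = {"A.q": 2, "B.q": 2, "C.q": 2, "F.q": 1}  # made-up values
running = [Task("a1", "A.q", 10), Task("e1", "E.q", 200),
           Task("g1", "G.q", 50), Task("g2", "G.q", 0)]  # g2 just scheduled
print(recompute("G.q", children, threshold, running))  # ['g2', 'g1']
```

       Here "G.q" is a leaf, so the walk starts at "F.q". The threshold of
       "F.q" suspends "g2". The subtree of "C.q" is then within its limit,
       and the root "A.q" suspends "g1".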

   complex_values
       complex_values defines quotas for resource attributes managed via this
       queue. The syntax is the same as for load_thresholds (see above). In
       the case of consumable resources (see complex(5) for details on
       consumable resources), the quotas are related to the resource
       consumption of all jobs in a queue; in the case of non-consumable
       resources, they are interpreted on a per-queue-slot basis (see slots
       above). Consumable resource attributes are commonly used to manage
       free memory, free disk space or available floating software licenses,
       while non-consumable attributes usually define distinctive
       characteristics like the type of hardware installed.

       For consumable resource attributes, the available resource amount is
       determined by subtracting the current resource consumption of all
       running jobs in the queue from the quota in the complex_values list.
       Jobs can only be dispatched to a queue if none of their resource
       requests exceeds the corresponding resource availability obtained by
       this scheme. The quota definition in the complex_values list is
       automatically replaced by the current load value reported for this
       attribute if load is monitored for this resource and the reported
       load value is more stringent than the quota. This effectively avoids
       oversubscription of resources.
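       The availability check can be sketched as follows. The sketch follows
       the paragraph literally: a more stringent load value replaces the
       quota before the consumption of running jobs is subtracted. It
       assumes that a smaller load value is the more stringent one, as for
       free memory. The function names, the attribute names "mem" and "lic"
       and all amounts are hypothetical. Scaling and load adjustment are not
       modeled.

```python
from typing import Optional


def available_amount(quota: float, consumed: float,
                     load: Optional[float] = None) -> float:
    """Available amount of one consumable on one queue.

    quota:    value from complex_values
    consumed: summed requests of the jobs running in the queue
    load:     reported load value, or None if the attribute is not monitored
    Assumes a smaller load value is the more stringent one.
    """
    effective_quota = quota if load is None else min(quota, load)
    return effective_quota - consumed


def can_dispatch(requests: dict[str, float],
                 queue: dict[str, tuple[float, float, Optional[float]]]) -> bool:
    """True if no request exceeds the availability of the same attribute.
    Attributes without a quota on this queue are not checked here."""
    return all(amount <= available_amount(*queue[attr])
               for attr, amount in requests.items() if attr in queue)


# Hypothetical queue: 16 units of "mem" with 10 in use, and 4 "lic" units
# with 1 in use, for which load monitoring currently reports 3.
queue = {"mem": (16, 10, None), "lic": (4, 1, 3)}
print(available_amount(*queue["mem"]), available_amount(*queue["lic"]))  # 6 2
print(can_dispatch({"mem": 6, "lic": 2}, queue))  # True
print(can_dispatch({"mem": 7}, queue))            # False
```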

       Note: Load values replacing the quota specifications may have become
       more stringent because they have been scaled (see host_conf(5)) and/or
       load-adjusted (see sched_conf(5)). The -F option of qstat(1) and the
       load display in the qmon(1) queue control dialog (activated by
       clicking on a queue icon while the "Shift" key is pressed) provide
       detailed information on the actual availability of consumable
       resources and on the origin of the values currently taken into
       account.

       Note also: The resource consumption of running jobs (used for the
       availability calculation) as well as the resource requests of the jobs
       waiting to be dispatched may be derived either from explicit user
       requests during job submission (see the -l option to qsub(1)) or from
       a "default" value configured for an attribute by the administrator
       (see complex(5)). The -r option to qstat(1) can be used to retrieve
       full detail on the actual resource requests of all jobs in the system.

       For non-consumable resources, Grid Engine simply compares the job's
       attribute requests with the corresponding specification in
       complex_values, taking the relational operator of the complex
       attribute definition into account (see complex(5)). If the result of
       the comparison is "true", the queue is suitable for the job with
       respect to that particular attribute. For parallel jobs, each queue
       slot to be occupied by a parallel task is meant to provide the same
       resource attribute value.

       Note: Only numeric complex attributes can be defined as consumable
       resources; hence non-numeric attributes are always handled on a
       per-queue-slot basis.

       The default value for this parameter is NONE, i.e. no
       administrator-defined resource attribute quotas are associated with
       the queue.

   calendar
       specifies the calendar that is valid for this queue, or contains NONE
       (the default). A calendar defines the availability of a queue
       depending on the time of day, week and year. Please refer to
       calendar_conf(5) for details on the Grid Engine calendar facility.

       Note: Jobs can request queues with a certain calendar model via a
       "-l c=<cal_name>" option to qsub(1).

   initial_state
       defines an initial state for the queue, either when the queue is added
       to the system for the first time or on start-up of ge_execd(8) on the
       host on which the queue resides. Possible values are:

       default   The queue is enabled when the queue is added, or is reset to
                 its previous status when ge_execd(8) comes up (this
                 corresponds to the behavior in earlier Grid Engine releases
                 not supporting initial_state).

       enabled   The queue is enabled in either case. This is equivalent to a
                 manual and explicit 'qmod -e' command (see qmod(1)).

       disabled  The queue is disabled in either case. This is equivalent to
                 a manual and explicit 'qmod -d' command (see qmod(1)).


RESOURCE LIMITS

       The first two resource limit parameters, s_rt and h_rt, are implemented
       by Grid Engine. They define the "real time", also called "elapsed" or
       "wall clock" time, that has passed since the start of the job. If h_rt
       is exceeded by a job running in the queue, the job is aborted via the
       SIGKILL signal (see kill(1)). If s_rt is exceeded, the job is first
       "warned" via the SIGUSR1 signal (which can be caught by the job) and
       finally aborted after the notification time defined in the queue
       configuration parameter notify (see above) has passed. When s_rt is
       used in combination with job notification, it might be necessary to
       configure a signal other than SIGUSR1 using the NOTIFY_KILL and
       NOTIFY_SUSP execd_params (see sge_conf(5)), so that the jobs'
       signal-catching mechanism can distinguish the cases and react
       accordingly.
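       On the job side, catching the s_rt warning can look like the
       following sketch. It uses Python's standard signal module (POSIX
       only). The SoftLimitWatcher class and the process loop are
       hypothetical. The signal number is a parameter so that a different
       signal can be used if NOTIFY_KILL or NOTIFY_SUSP have been changed.

```python
import signal


class SoftLimitWatcher:
    """Records whether the s_rt warning signal has arrived.

    SIGUSR1 is the default warning signal; pass another signal number if
    the execd_params have been configured differently."""

    def __init__(self, signum: int = signal.SIGUSR1):
        self.warned = False
        signal.signal(signum, self._handler)

    def _handler(self, signum, frame):
        # Only set a flag; the main loop decides when to stop.
        self.warned = True


def process(items, watcher):
    """Hypothetical work loop that stops early once the job has been warned,
    leaving time to save partial results before SIGKILL arrives."""
    done = []
    for item in items:
        if watcher.warned:
            break
        done.append(item * item)  # placeholder for real work
    return done


if __name__ == "__main__":
    watcher = SoftLimitWatcher()
    print(process(range(5), watcher))  # [0, 1, 4, 9, 16]
```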

       The resource limit parameters s_cpu and h_cpu are implemented by Grid
       Engine as a job limit. They impose a limit on the amount of combined
       CPU time consumed by all the processes in the job. If h_cpu is
       exceeded by a job running in the queue, the job is aborted via a
       SIGKILL signal (see kill(1)). If s_cpu is exceeded, the job is sent a
       SIGXCPU signal, which can be caught by the job. If you wish to allow
       a job to be "warned" so that it can exit gracefully before it is
       killed, set the s_cpu limit to a lower value than h_cpu. For parallel
       processes, the limit is applied per slot, which means that the limit
       is multiplied by the number of slots used by the job before it is
       applied.
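       The per-slot rule is simple arithmetic. The sketch below also parses
       the time value. It assumes the plain-seconds and hh:mm:ss forms of
       time_specifier described in sge_types(5), with an empty field
       counting as zero (so "1::1" is one hour and one second). It does not
       handle INFINITY or hexadecimal and octal constants. The function
       names and the queue values are hypothetical.

```python
def parse_time(spec: str) -> int:
    """Seconds from a plain number of seconds or an hh:mm:ss string."""
    parts = spec.split(":")
    if len(parts) == 1:
        return int(parts[0])  # plain seconds
    if len(parts) != 3:
        raise ValueError(f"unsupported time specifier: {spec!r}")
    h, m, s = (int(p) if p else 0 for p in parts)  # empty field counts as 0
    return h * 3600 + m * 60 + s


def job_cpu_limit(per_slot: str, slots: int) -> int:
    """Effective CPU-time limit in seconds for a job using `slots` slots."""
    return parse_time(per_slot) * slots


s_cpu, h_cpu = "00:50:00", "01:00:00"  # hypothetical queue settings
slots = 4
print(job_cpu_limit(s_cpu, slots), job_cpu_limit(h_cpu, slots))  # 12000 14400
```

       Keeping s_cpu below h_cpu, as here, leaves a window in which the job
       can react to SIGXCPU before it is killed.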

       The resource limit parameters s_vmem and h_vmem are implemented by
       Grid Engine as a job limit. They impose a limit on the amount of
       combined virtual memory consumed by all the processes in the job. If
       h_vmem is exceeded by a job running in the queue, the job is aborted
       via a SIGKILL signal (see kill(1)). If s_vmem is exceeded, the job is
       sent a SIGXCPU signal, which can be caught by the job. If you wish to
       allow a job to be "warned" so that it can exit gracefully before it
       is killed, set the s_vmem limit to a lower value than h_vmem. For
       parallel processes, the limit is applied per slot, which means that
       the limit is multiplied by the number of slots used by the job before
       it is applied.

       The remaining parameters in the queue configuration template specify
       per-job soft and hard resource limits as implemented by the
       setrlimit(2) system call. See that manual page on your system for
       more information. By default, each limit field is set to infinity
       (which means RLIM_INFINITY as described in the setrlimit(2) manual
       page). The value type for the CPU-time limits s_cpu and h_cpu is
       time; the value type for the other limits is memory. Note: Not all
       systems support setrlimit(2).
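       A job can inspect the limits it actually received. The following is a
       minimal sketch using Python's standard resource module (Unix only),
       which wraps getrlimit(2). The pairing of queue parameter names with
       RLIMIT_* constants in CANDIDATES is an assumption for illustration,
       as is reading s_<name> as the soft value and h_<name> as the hard
       value. RLIMIT_VMEM is skipped where the platform does not define it.

```python
import resource

# Hypothetical pairing of queue_conf limit names with setrlimit(2) resources.
CANDIDATES = {
    "cpu": "RLIMIT_CPU",
    "core": "RLIMIT_CORE",
    "data": "RLIMIT_DATA",
    "fsize": "RLIMIT_FSIZE",
    "vmem": "RLIMIT_VMEM",  # missing on systems without RLIMIT_VMEM
}


def current_limits():
    """Return {name: (soft, hard)} for the limits this platform supports,
    with None standing for RLIM_INFINITY."""
    limits = {}
    for name, const in CANDIDATES.items():
        res = getattr(resource, const, None)
        if res is None:
            continue
        soft, hard = resource.getrlimit(res)
        limits[name] = tuple(None if v == resource.RLIM_INFINITY else v
                             for v in (soft, hard))
    return limits


def fmt(value):
    return "INFINITY" if value is None else str(value)


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"s_{name}={fmt(soft)}  h_{name}={fmt(hard)}")
```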

       Note also: s_vmem and h_vmem (virtual memory) are only available on
       systems supporting RLIMIT_VMEM (see setrlimit(2) on your operating
       system).

       The UNICOS operating system supplied by SGI/Cray does not support the
       setrlimit(2) system call; it uses its own resource limit-setting
       system call instead. For UNICOS systems only, the following meanings
       apply:

       s_cpu     The per-process CPU time limit in seconds.

       s_core    The per-process maximum core file size in bytes.

       s_data    The per-process maximum memory limit in bytes.

       s_vmem    The same as s_data (if both are set, the minimum is used).

       h_cpu     The per-job CPU time limit in seconds.

       h_data    The per-job maximum memory limit in bytes.

       h_vmem    The same as h_data (if both are set, the minimum is used).

       h_fsize   The total number of disk blocks that this job can create.


COPYRIGHT

       See ge_intro(1) for a full statement of rights and permissions.



GE 6.2u5                 $Date: 2009/12/07 19:09:27 $            QUEUE_CONF(5)
