1SGE_CONF(5) Grid Engine File Formats SGE_CONF(5)
2
3
4
6 sge_conf - Grid Engine configuration files
7
9 sge_conf defines the global and local Grid Engine configurations and
10 can be shown/modified by qconf(1) using the -sconf/-mconf options. Only
11 root or the cluster administrator may modify sge_conf.
12
13 At its initial start-up, sge_qmaster(8) checks to see if a valid Grid
14 Engine configuration is available at a well known location in the Grid
15 Engine internal directory hierarchy. If so, it loads that configura‐
16 tion information and proceeds. If not, sge_qmaster(8) writes a generic
17 configuration containing default values to that same location. The
18 Grid Engine execution daemons sge_execd(8) upon start-up retrieve their
19 configuration from sge_qmaster(8).
20
21 The actual configuration for both sge_qmaster(8) and sge_execd(8) is a
22 superposition of a global configuration and a local configuration per‐
23 tinent for the host on which a master or execution daemon resides. If
24 a local configuration is available, its entries overwrite the
25 corresponding entries of the global configuration. Note: The local
26 configuration does not have to contain all valid configuration entries,
27 but only those that need to differ from the global entries.
28
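The superposition described above can be pictured with a small sketch. It assumes both configurations have been saved as plain text with one "name value" entry per line; merge_conf and the file names are hypothetical, and Grid Engine performs the real merge internally.

```sh
#!/bin/sh
# Sketch only: emulate "local entries overwrite global entries".
# Assumption: each file holds one "name value" entry per line.
merge_conf() {
    # $1 = saved global configuration, $2 = saved local configuration
    awk '
        NF == 0 || /^#/ { next }                     # skip blank lines and comments
        {
            key = $1
            if (!(key in value)) order[++n] = key    # remember first-seen order
            value[key] = $0                          # an entry from a later file wins
        }
        END { for (i = 1; i <= n; i++) print value[order[i]] }
    ' "$1" "$2"
}
```

Called as merge_conf global.conf node17.conf, it prints every global entry, with any entry that also appears in the local file replaced by the local one.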
29 Note: Grid Engine allows backslashes (\) to be used to escape newline
30 (\newline) characters. The backslash and the newline are replaced with
31 a space (" ") character before any interpretation.
32
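As an illustration of the continuation rule, the following sketch joins continued lines the way the note describes: each backslash-newline pair becomes a single space. The function name join_continuations is hypothetical; Grid Engine applies this rule itself when it reads a configuration.

```sh
#!/bin/sh
# Sketch only: replace every backslash-newline pair with one space.
join_continuations() {
    awk '{
        # A trailing backslash is turned into a space, then the next
        # input line is appended; repeat while lines keep continuing.
        while (sub(/\\$/, " ")) {
            if ((getline cont) > 0) $0 = $0 cont
            else break
        }
        print
    }'
}
```

For example, a load_sensor entry split over two lines with a trailing backslash reaches the parser as one line.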
34 The paragraphs that follow provide brief descriptions of the individual
35 parameters that compose the global and local configurations for a Grid
36 Engine cluster:
37
38 execd_spool_dir
39 The execution daemon spool directory path. A feasible spool
40 directory requires read/write access permission for root. The entry in
41 the global configuration for this parameter can be overwritten by exe‐
42 cution host local configurations, i.e. each sge_execd(8) may have a
43 private spool directory with a different path, in which case it needs
44 to provide read/write permission for the root account of the corre‐
45 sponding execution host only.
46
47 Under execd_spool_dir a directory named after the unqualified hostname
48 of the execution host is created and contains all information spooled
49 to disk. Thus, it is possible for the execd_spool_dirs of
50 all execution hosts to physically reference the same directory path
51 (the root access restrictions mentioned above need to be met, however).
52
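The per-host directory layout can be sketched as follows. The helper execd_spool_path is hypothetical and only shows how the unqualified hostname yields a distinct subdirectory under a shared execd_spool_dir.

```sh
#!/bin/sh
# Sketch only: derive the per-host spool directory from execd_spool_dir.
execd_spool_path() {
    spool_dir=$1
    host=${2:-$(hostname)}            # default to the local host name
    echo "$spool_dir/${host%%.*}"     # strip the domain part, if any
}
```

With a shared execd_spool_dir of /var/spool/sge (an example path), node17.example.com and node18.example.com would spool to /var/spool/sge/node17 and /var/spool/sge/node18.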
53 Changing the global execd_spool_dir parameter set at installation time
54 is not supported in a running system. If the change is made anyway,
55 all affected execution daemons must be restarted. Make sure running
56 jobs have finished before doing so; otherwise those jobs will be
57 lost.
58
59
60 The default location for the execution daemon spool directory is
61 $SGE_ROOT/$SGE_CELL/spool.
62
63 The global configuration entry for this value may be overwritten by the
64 execution host local configuration.
65
66 mailer
67 mailer is the absolute pathname to the electronic mail delivery agent
68 on your system. It must accept the following syntax:
69
70 mailer -s <subject-of-mail-message> <recipient>
71
72 Each sge_execd(8) may use a private mail agent. Changing mailer will
73 take immediate effect.
74
75 The default for mailer depends on the operating system of the host on
76 which the Grid Engine master installation was run. Common values are
77 /bin/mail or /usr/bin/Mail.
78
79 The global configuration entry for this value may be overwritten by the
80 execution host local configuration.
81
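A site that wants to put its own command in front of the real mail agent has to keep the calling syntax shown above. The wrapper below is a minimal sketch: site_mailer and REAL_MAILER are hypothetical names, and it assumes the message body arrives on standard input.

```sh
#!/bin/sh
# Sketch only: accept "mailer -s <subject> <recipient>" and hand off.
site_mailer() {
    [ "$#" -eq 3 ] && [ "$1" = "-s" ] || {
        echo "usage: site_mailer -s <subject> <recipient>" >&2
        return 2
    }
    subject=$2
    recipient=$3
    # Assumption: the message body is supplied on standard input.
    "${REAL_MAILER:-/bin/mail}" -s "$subject" "$recipient"
}
```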
82 xterm
83 xterm is the absolute pathname to the X Window System terminal emula‐
84 tor, xterm(1).
85
86 Each sge_execd(8) may use a private xterm path. Changing xterm will
87 take immediate effect.
88
89 The default for xterm is /usr/bin/X11/xterm.
90
91 The global configuration entry for this value may be overwritten by the
92 execution host local configuration.
93
94 load_sensor
95 A comma separated list of executable shell script paths or programs to
96 be started by sge_execd(8) and to be used in order to retrieve site
97 configurable load information (e.g. free space on a certain disk parti‐
98 tion).
99
100 Each sge_execd(8) may use a set of private load_sensor programs or
101 scripts. Changing load_sensor will take effect after two load report
102 intervals (see load_report_time). A load sensor will be restarted auto‐
103 matically if the file modification time of the load sensor executable
104 changes.
105
106 The global configuration entry for this value may be overwritten by the
107 execution host local configuration.
108
109 In addition to the load sensors configured via load_sensor, sge_execd(8)
110 searches for an executable file named qloadsensor in the execution
111 host's Grid Engine binary directory path. If such a file is found, it
112 is treated like the configurable load sensors defined in load_sensor.
113 This facility is intended for pre-installing a default load sensor.
114
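A load sensor for the "free space on a certain disk partition" case might look like the sketch below. It follows the request/report convention commonly used by Grid Engine load sensors (wait for a line on standard input, stop on "quit", otherwise print a begin/end block of host:name:value lines); that convention is not spelled out in this page, so treat it as an assumption. The load value name scratch_free_mb is hypothetical and would have to exist in the site's complex configuration.

```sh
#!/bin/sh
# Sketch only: report free megabytes of one directory as a load value.
report_free_mb() {
    # POSIX df output: column 4 is the available space in 1024-byte blocks
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

load_sensor_loop() {
    host=$1
    dir=$2
    while read -r request; do
        [ "$request" = "quit" ] && return 0      # daemon asks us to stop
        echo "begin"
        echo "$host:scratch_free_mb:$(report_free_mb "$dir")"
        echo "end"
    done
}
```

A script wrapping this would end with a call such as load_sensor_loop "$(hostname)" /scratch.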
115 prolog
116 The executable path of a shell script that is started before execution
117 of Grid Engine jobs with the same environment setting as that for the
118 Grid Engine jobs to be started afterwards. An optional prefix "user@"
119 specifies the user under which this procedure is to be started. The
120 procedure's standard output and the error output stream are written to
121 the same file used also for the standard output and error output of
122 each job. This procedure is intended as a means for the Grid Engine
123 administrator to automate the execution of general site specific tasks
124 like the preparation of temporary file systems with the need for the
125 same context information as the job. Each sge_execd(8) may use a
126 private prolog script. Correspondingly, the execution host local
127 configuration can be overwritten by the queue configuration (see
128 queue_conf(5)). Changing prolog will take immediate effect.
129
130 The default for prolog is the special value NONE, which prevents
131 execution of a prolog script.
132
133 The following special variables expanded at runtime can be used
134 (besides any other strings which have to be interpreted by the proce‐
135 dure) to constitute a command line:
136
137 $host The name of the host on which the prolog or epilog procedures
138 are started.
139
140 $job_owner
141 The user name of the job owner.
142
143 $job_id
144 Grid Engine's unique job identification number.
145
146 $job_name
147 The name of the job.
148
149 $processors
150 The processors string as contained in the queue configuration
151 (see queue_conf(5)) of the master queue (the queue in which the
152 prolog and epilog procedures are started).
153
154 $queue The cluster queue name of the master queue instance, i.e. the
155 cluster queue in which the prolog and epilog procedures are
156 started.
157
158 $stdin_path
159 The pathname of the stdin file. This is always /dev/null for
160 prolog, pe_start, pe_stop and epilog. It is the pathname of the
161 stdin file for the job in the job script. When delegated file
162 staging is enabled, this path is set to $fs_stdin_tmp_path. When
163 delegated file staging is not enabled, it is the stdin pathname
164 given via DRMAA or qsub.
165
166 $stdout_path
167
168 $stderr_path
169 The pathname of the stdout/stderr file. This always points to
170 the output/error file. When delegated file staging is enabled,
171 this path is set to $fs_stdout_tmp_path/$fs_stderr_tmp_path.
172 When delegated file staging is not enabled, it is the std‐
173 out/stderr pathname given via DRMAA or qsub.
174
175 $merge_stderr
176 If merging of stderr and stdout is requested, this flag is "1",
177 otherwise it is "0". If this flag is 1, stdout and stderr are
178 merged in one file, the stdout file. Merging of stderr and std‐
179 out can be requested via the DRMAA job template attribute
180 'drmaa_join_files' (see drmaa_attributes(3) ) or the qsub param‐
181 eter '-j y' (see qsub(1) ).
182
183 $fs_stdin_host
184 When delegated file staging is requested for the stdin file,
185 this is the name of the host where the stdin file has to be
186 copied from before the job is started.
187
188 $fs_stdout_host
189
190 $fs_stderr_host
191 When delegated file staging is requested for the stdout/stderr
192 file, this is the name of the host where the stdout/stderr file
193 has to be copied to after the job has run.
194
195 $fs_stdin_path
196 When delegated file staging is requested for the stdin file,
197 this is the pathname of the stdin file on the host
198 $fs_stdin_host.
199
200 $fs_stdout_path
201
202 $fs_stderr_path
203 When delegated file staging is requested for the stdout/stderr
204 file, this is the pathname of the stdout/stderr file on the host
205 $fs_stdout_host/$fs_stderr_host.
206
207 $fs_stdin_tmp_path
208 When delegated file staging is requested for the stdin file,
209 this is the destination pathname of the stdin file on the execu‐
210 tion host. The prolog script must copy the stdin file from
211 $fs_stdin_host:$fs_stdin_path to localhost:$fs_stdin_tmp_path to
212 establish delegated file staging of the stdin file.
213
214 $fs_stdout_tmp_path
215
216 $fs_stderr_tmp_path
217 When delegated file staging is requested for the stdout/stderr
218 file, this is the source pathname of the stdout/stderr file on
219 the execution host. The epilog script must copy the stdout file
220 from localhost:$fs_stdout_tmp_path to $fs_stdout_host:$fs_std‐
221 out_path (the stderr file from localhost:$fs_stderr_tmp_path to
222 $fs_stderr_host:$fs_stderr_path) to establish delegated file
223 staging of the stdout/stderr file.
224
225 $fs_stdin_file_staging
226
227 $fs_stdout_file_staging
228
229 $fs_stderr_file_staging
230 When delegated file staging is requested for the stdin/stdout/stderr
231 file, the flag is set to "1", otherwise it is set to
232 "0" (see delegated_file_staging for how to enable delegated file
233 staging).
234
235 These three flags correspond to the DRMAA job template attribute
236 'drmaa_transfer_files' (see drmaa_attributes(3) ).
237
238 The global configuration entry for this value may be overwritten by the
239 execution host local configuration.
240
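To show how the special variables can form a command line, here is a sketch of a prolog helper that stages the stdin file. The script path /opt/site/prolog.sh, the function stage_stdin, and the use of scp as the copy command are all assumptions; any remote copy tool available at the site would do.

```sh
#!/bin/sh
# Sketch only. Assumed configuration entry (one line):
#   prolog /opt/site/prolog.sh $fs_stdin_file_staging $fs_stdin_host $fs_stdin_path $fs_stdin_tmp_path
stage_stdin() {
    staging=$1
    src_host=$2
    src_path=$3
    tmp_path=$4
    [ "$staging" = "1" ] || return 0        # staging not requested: nothing to do
    # Copy from $fs_stdin_host:$fs_stdin_path to the local temporary path.
    ${COPY_CMD:-scp} "$src_host:$src_path" "$tmp_path"
}
```

The prolog script itself would simply call stage_stdin "$@".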
241 epilog
242 The executable path of a shell script that is started after execution
243 of Grid Engine jobs with the same environment setting as that for the
244 Grid Engine job that has just completed. An optional prefix "user@"
245 specifies the user under which this procedure is to be started. The
246 procedure's standard output and the error output stream are written to
247 the same file used also for the standard output and error output of
248 each job. This procedure is intended as a means for the Grid Engine
249 administrator to automate the execution of general site specific tasks
250 like the cleaning up of temporary file systems with the need for the
251 same context information as the job. Each sge_execd(8) may use a
252 private epilog script. Correspondingly, the execution host local
253 configuration can be overwritten by the queue configuration (see
254 queue_conf(5)). Changing epilog will take immediate effect.
255
256 The default for epilog is the special value NONE, which prevents
257 execution of an epilog script. The same special variables as for
258 prolog can be used to constitute a command line.
259
260 The global configuration entry for this value may be overwritten by the
261 execution host local configuration.
262
263 shell_start_mode
264 This parameter defines the mechanisms which are used to actually invoke
265 the job scripts on the execution hosts. The following values are recog‐
266 nized:
267
268 unix_behavior
269 If a user starts a job shell script under UNIX interactively by
270 invoking it just with the script name the operating system's
271 executable loader uses the information provided in a comment
272 such as `#!/bin/csh' in the first line of the script to detect
273 which command interpreter to start to interpret the script. This
274 mechanism is used by Grid Engine when starting jobs if
275 unix_behavior is defined as shell_start_mode.
276
277 posix_compliant
278 POSIX does not consider first script line comments such as
279 `#!/bin/csh' as significant. The POSIX standard for batch queu‐
280 ing systems (P1003.2d) therefore requires a compliant queuing
281 system to ignore such lines but to use user specified or config‐
282 ured default command interpreters instead. Thus, if
283 shell_start_mode is set to posix_compliant Grid Engine will
284 either use the command interpreter indicated by the -S option of
285 the qsub(1) command or the shell parameter of the queue to be
286 used (see queue_conf(5) for details).
287
288 script_from_stdin
289 Setting the shell_start_mode parameter either to posix_compliant
290 or unix_behavior requires you to set the umask in use for
291 sge_execd(8) such that every user has read access to the
292 active_jobs directory in the spool directory of the correspond‐
293 ing execution daemon. In case you have prolog and epilog scripts
294 configured, they also need to be readable by any user who may
295 execute jobs.
296 If this violates your site's security policies you may want to
297 set shell_start_mode to script_from_stdin. This will force Grid
298 Engine to open the job script as well as the epilog and prolog
299 scripts for reading into STDIN as root (if sge_execd(8) was
300 started as root) before changing to the job owner's user
301 account. The script is then fed into the STDIN stream of the
302 command interpreter indicated by the -S option of the qsub(1)
303 command or the shell parameter of the queue to be used (see
304 queue_conf(5) for details).
305 Thus setting shell_start_mode to script_from_stdin also implies
306 posix_compliant behavior. Note, however, that feeding scripts
307 into the STDIN stream of a command interpreter may cause trouble
308 if commands like rsh(1) are invoked inside a job script as they
309 also process the STDIN stream of the command interpreter. These
310 problems can usually be resolved by redirecting the STDIN chan‐
311 nel of those commands to come from /dev/null (e.g. rsh host date
312 < /dev/null). Note also, that any command-line options associ‐
313 ated with the job are passed to the executing shell. The shell
314 will only forward them to the job if they are not recognized as
315 valid shell options.
316
317 Changes to shell_start_mode will take immediate effect. The default
318 for shell_start_mode is posix_compliant.
319
320 This value is a global configuration parameter only. It cannot be over‐
321 written by the execution host local configuration.
322
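The interpreter selection described for the three modes can be summarized in a sketch. The function pick_interpreter is hypothetical and only mirrors the rules stated above; it does not cover what Grid Engine does when a script has no "#!" line under unix_behavior.

```sh
#!/bin/sh
# Sketch only: which command interpreter would start the job script?
# Arguments: <shell_start_mode> <script> <qsub -S value or ""> <queue shell>
pick_interpreter() {
    mode=$1
    script=$2
    qsub_shell=$3
    queue_shell=$4
    case $mode in
        unix_behavior)
            # Take the interpreter from a leading "#!" line, if present.
            sed -n '1s/^#![[:space:]]*\([^[:space:]]*\).*/\1/p' "$script"
            ;;
        posix_compliant|script_from_stdin)
            # The -S option wins; otherwise the queue's shell parameter.
            echo "${qsub_shell:-$queue_shell}"
            ;;
        *)
            echo "unknown shell_start_mode: $mode" >&2
            return 1
            ;;
    esac
}
```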
323 login_shells
324 UNIX command interpreters like the Bourne-Shell (see sh(1)) or the C-
325 Shell (see csh(1)) can be used by Grid Engine to start job scripts. The
326 command interpreters can either be started as login-shells (i.e. all
327 system and user default resource files like .login or .profile will be
328 executed when the command interpreter is started and the environment
329 for the job will be set up as if the user has just logged in) or just
330 for command execution (i.e. only shell specific resource files like
331 .cshrc will be executed and a minimal default environment is set up by
332 Grid Engine - see qsub(1)). The parameter login_shells contains a
333 comma separated list of the executable names of the command inter‐
334 preters to be started as login-shells. Shells in this list are only
335 started as login shells if the parameter shell_start_mode (see above)
336 is set to posix_compliant.
337
338 Changes to login_shells will take immediate effect. The default for
339 login_shells is sh,csh,tcsh,ksh.
340
341 This value is a global configuration parameter only. It cannot be over‐
342 written by the execution host local configuration.
343
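The login_shells rule combines two conditions: the interpreter's executable name must be in the list, and shell_start_mode must be posix_compliant. A minimal sketch, with the hypothetical helper is_login_shell:

```sh
#!/bin/sh
# Sketch only: would this interpreter be started as a login shell?
# Arguments: <interpreter path> <login_shells value> <shell_start_mode>
is_login_shell() {
    name=${1##*/}                            # executable name without directory
    [ "$3" = "posix_compliant" ] || return 1
    case ",$2," in
        *",$name,"*) return 0 ;;
        *)           return 1 ;;
    esac
}
```

With the default list, is_login_shell /bin/tcsh sh,csh,tcsh,ksh posix_compliant succeeds, while the same call for /bin/bash fails.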
344 min_uid
345 min_uid places a lower bound on user IDs that may use the cluster.
346 Users whose user ID (as returned by getpwnam(3)) is less than min_uid
347 will not be allowed to run jobs on the cluster.
348
349 Changes to min_uid will take immediate effect. The default for min_uid
350 is 0.
351
352 This value is a global configuration parameter only. It cannot be over‐
353 written by the execution host local configuration.
354
355 min_gid
356 This parameter sets the lower bound on group IDs that may use the clus‐
357 ter. Users whose default group ID (as returned by getpwnam(3)) is less
358 than min_gid will not be allowed to run jobs on the cluster.
359
360 Changes to min_gid will take immediate effect. The default for min_gid
361 is 0.
362
363 This value is a global configuration parameter only. It cannot be over‐
364 written by the execution host local configuration.
365
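The two lower bounds can be checked together before a user is told to submit jobs. The helper may_run_jobs below is a hypothetical sketch that compares numeric IDs against min_uid and min_gid; it is not how Grid Engine itself performs the check.

```sh
#!/bin/sh
# Sketch only: apply the min_uid / min_gid lower bounds.
# Arguments: <uid> <gid> [min_uid] [min_gid]   (bounds default to 0)
may_run_jobs() {
    uid=$1
    gid=$2
    min_uid=${3:-0}
    min_gid=${4:-0}
    [ "$uid" -ge "$min_uid" ] && [ "$gid" -ge "$min_gid" ]
}
```

A typical call would be may_run_jobs "$(id -u alice)" "$(id -g alice)" 100 100, where alice is an example account.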
366 user_lists
367 The user_lists parameter contains a comma separated list of user access
368 lists as described in access_list(5). Each user contained in at least
369 one of the enlisted access lists has access to the cluster. If the
370 user_lists parameter is set to NONE (the default) any user has access
371 not explicitly excluded via the xuser_lists parameter described below.
372 If a user is contained both in an access list enlisted in xuser_lists
373 and user_lists the user is denied access to the cluster.
374
375 Changes to user_lists will take immediate effect.
376
377 This value is a global configuration parameter only. It cannot be over‐
378 written by the execution host local configuration.
379
380 xuser_lists
381 The xuser_lists parameter contains a comma separated list of user
382 access lists as described in access_list(5). Each user contained in at
383 least one of the enlisted access lists is denied access to the cluster.
384 If the xuser_lists parameter is set to NONE (the default) any user has
385 access. If a user is contained both in an access list enlisted in
386 xuser_lists and user_lists (see above) the user is denied access to the
387 cluster.
388
389 Changes to xuser_lists will take immediate effect.
390
391 This value is a global configuration parameter only. It cannot be over‐
392 written by the execution host local configuration.
393
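The interplay of user_lists and xuser_lists reduces to "deny wins". The sketch below simplifies access lists to flat comma separated user names, which is an assumption made for illustration (real access lists are named objects, see access_list(5)); in_list and has_cluster_access are hypothetical helpers.

```sh
#!/bin/sh
# Sketch only: access decision with flattened user and xuser lists.
in_list() {             # in_list <name> <comma separated list>
    case ",$2," in
        *",$1,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

has_cluster_access() {  # has_cluster_access <user> <allowed|NONE> <denied|NONE>
    user=$1
    allowed=$2
    denied=$3
    # A user found in the excluded set is always denied.
    if [ "$denied" != "NONE" ] && in_list "$user" "$denied"; then
        return 1
    fi
    # NONE for the allowed set means everybody not excluded has access.
    [ "$allowed" = "NONE" ] || in_list "$user" "$allowed"
}
```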
394 administrator_mail
395 administrator_mail specifies a comma separated list of the electronic
396 mail address(es) of the cluster administrator(s) to whom internally-
397 generated problem reports are sent. The mail address format depends on
398 your electronic mail system and how it is configured; consult your sys‐
399 tem's configuration guide for more information.
400
401 Changing administrator_mail takes immediate effect. The default for
402 administrator_mail is an empty mail list.
403
404 This value is a global configuration parameter only. It cannot be over‐
405 written by the execution host local configuration.
406
407 projects
408 The projects list contains all projects which are granted access to
409 Grid Engine. Users belonging to none of these projects cannot use Grid
410 Engine. If users belong to projects in the projects list and the
411 xprojects list (see below), they also cannot use the system.
412
413 Changing projects takes immediate effect. The default for projects is
414 none.
415
416 This value is a global configuration parameter only. It cannot be over‐
417 written by the execution host local configuration.
418
419 xprojects
420 The xprojects list contains all projects which are denied access to
421 Grid Engine. Users belonging to one of these projects cannot use Grid
422 Engine. If users belong to projects in the projects list (see above)
423 and the xprojects list, they also cannot use the system.
424
425 Changing xprojects takes immediate effect. The default for xprojects
426 is none.
427
428 This value is a global configuration parameter only. It cannot be over‐
429 written by the execution host local configuration.
430
431 load_report_time
432 System load is reported periodically by the execution daemons to
433 sge_qmaster(8). The parameter load_report_time defines the time inter‐
434 val between load reports.
435
436 Each sge_execd(8) may use a different load report time. Changing
437 load_report_time will take immediate effect.
438
439 Note: Be careful when modifying load_report_time. Reporting load too
440 frequently might block sge_qmaster(8) especially if the number of exe‐
441 cution hosts is large. Moreover, since the system load typically
442 increases and decreases smoothly, frequent load reports hardly offer
443 any benefit.
444
445 The default for load_report_time is 40 seconds.
446
447 The global configuration entry for this value may be overwritten by the
448 execution host local configuration.
449
450 reschedule_unknown
451 Determines whether jobs on hosts in unknown state are rescheduled and
452 thus sent to other hosts. Hosts are registered as unknown if
453 sge_qmaster(8) cannot establish contact to the sge_execd(8) on those hosts (see
454 max_unheard ). Likely reasons are a breakdown of the host or a break‐
455 down of the network connection in between, but also sge_execd(8) may
456 not be executing on such hosts.
457
458 In any case, Grid Engine can reschedule jobs running on such hosts to
459 another system. reschedule_unknown controls the time which Grid Engine
460 will wait before jobs are rescheduled after a host became unknown. The
461 time format specification is hh:mm:ss. If the special value 00:00:00 is
462 set, then jobs will not be rescheduled from this host.
463
464 Rescheduling is only initiated for jobs which have activated the rerun
465 flag (see the -r y option of qsub(1) and the rerun option of
466 queue_conf(5)). Parallel jobs are only rescheduled if the host on
467 which their master task executes is in unknown state. Checkpointing
468 jobs will only be rescheduled if the "when" option of the corresponding
469 checkpointing environment contains an appropriate flag (see
470 checkpoint(5)). Interactive jobs (see qsh(1), qrsh(1), qtcsh(1)) are not
471 rescheduled.
472
473 The default for reschedule_unknown is 00:00:00.
474
475 The global configuration entry for this value may be overwritten by
476 the execution host local configuration.
477
478 max_unheard
479 If sge_qmaster(8) could not contact or was not contacted by the execu‐
480 tion daemon of a host for max_unheard seconds, all queues residing on
481 that particular host are set to status unknown. sge_qmaster(8), at
482 least, should be contacted by the execution daemons in order to get the
483 load reports. Thus, max_unheard should be greater than the
484 load_report_time (see above).
485
486 Changing max_unheard takes immediate effect. The default for
487 max_unheard is 2 minutes 30 seconds.
488
489 This value is a global configuration parameter only. It cannot be over‐
490 written by the execution host local configuration.
491
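The relationship between max_unheard and load_report_time can be checked mechanically. This sketch assumes both values are written in the hh:mm:ss format used elsewhere in this page; to_seconds and unheard_exceeds_report are hypothetical helpers.

```sh
#!/bin/sh
# Sketch only: verify that max_unheard is greater than load_report_time.
to_seconds() {          # to_seconds <hh:mm:ss>
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

unheard_exceeds_report() {   # <load_report_time> <max_unheard>
    [ "$(to_seconds "$2")" -gt "$(to_seconds "$1")" ]
}
```

With the defaults (40 seconds and 2 minutes 30 seconds), unheard_exceeds_report 00:00:40 00:02:30 succeeds.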
492 loglevel
493 This parameter specifies the level of detail that Grid Engine compo‐
494 nents such as sge_qmaster(8) or sge_execd(8) use to produce informa‐
495 tive, warning or error messages which are logged to the messages files
496 in the master and execution daemon spool directories (see the descrip‐
497 tion of the execd_spool_dir parameter above). The following message
498 levels are available:
499
500 log_err
501 All error events that are recognized are logged.
502
503 log_warning
504 All error events that are recognized and all detected signs of
505 potentially erroneous behavior are logged.
506
507 log_info
508 All error events that are recognized, all detected signs of
509 potentially erroneous behavior and a variety of informative messages
510 are logged.
511
512 Changing loglevel will take immediate effect.
513
514 The default for loglevel is log_info.
515
516 This value is a global configuration parameter only. It cannot be over‐
517 written by the execution host local configuration.
518
519 max_aj_instances
520 This parameter defines the maximum number of array tasks to be scheduled
521 to run simultaneously per array job. An instance of an array task will
522 be created within the master daemon when it gets a start order from the
523 scheduler. The instance will be destroyed when the array task finishes.
524 Thus the parameter provides control mainly over the memory consumption
525 of array jobs in the master and scheduler daemon. It is most useful for
526 very large clusters and very large array jobs. The default for this
527 parameter is 2000. The value 0 will deactivate this limit and will
528 allow the scheduler to start as many array job tasks as suitable
529 resources are available in the cluster.
530
531 Changing max_aj_instances will take immediate effect.
532
533 This value is a global configuration parameter only. It cannot be over‐
534 written by the execution host local configuration.
535
536 max_aj_tasks
537 This parameter defines the maximum number of array job tasks within an
538 array job. sge_qmaster(8) will reject all array job submissions which
539 request more than max_aj_tasks array job tasks. The default for this
540 parameter is 75000. The value 0 will deactivate this limit.
541
542 Changing max_aj_tasks will take immediate effect.
543
544 This value is a global configuration parameter only. It cannot be over‐
545 written by the execution host local configuration.
546
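Before submitting a large array job it can help to count its tasks and compare the result with max_aj_tasks. The sketch assumes the task range is written as n[-m[:s]], the form accepted by the -t option of qsub(1); array_task_count and within_max_aj_tasks are hypothetical helpers.

```sh
#!/bin/sh
# Sketch only: count the tasks in an array job range.
array_task_count() {    # array_task_count <n[-m[:s]]>
    echo "$1" | awk -F'[-:]' '{
        first = $1 + 0
        last  = (NF >= 2) ? $2 + 0 : first    # a single index is one task
        step  = (NF >= 3) ? $3 + 0 : 1        # default step size is 1
        print int((last - first) / step) + 1
    }'
}

within_max_aj_tasks() { # within_max_aj_tasks <range> <max_aj_tasks>
    # The value 0 deactivates the limit.
    [ "$2" -eq 0 ] || [ "$(array_task_count "$1")" -le "$2" ]
}
```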
547 max_u_jobs
548 The number of active (not finished) jobs which each Grid Engine user
549 can have in the system simultaneously is controlled by this parameter.
550 A value greater than 0 defines the limit. The default value 0 means
551 "unlimited". If the max_u_jobs limit is exceeded by a job submission
552 then the submission command exits with exit status 25 and an appropri‐
553 ate error message.
554
555 Changing max_u_jobs will take immediate effect.
556
557 This value is a global configuration parameter only. It cannot be over‐
558 written by the execution host local configuration.
559
560 max_jobs
561 The number of active (not finished) jobs simultaneously allowed in Grid
562 Engine is controlled by this parameter. A value greater than 0 defines
563 the limit. The default value 0 means "unlimited". If the max_jobs
564 limit is exceeded by a job submission then the submission command exits
565 with exit status 25 and an appropriate error message.
566
567 Changing max_jobs will take immediate effect.
568
569 This value is a global configuration parameter only. It cannot be over‐
570 written by the execution host local configuration.
571
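Because a submission rejected by max_u_jobs or max_jobs exits with status 25, a wrapper can tell "limit reached" apart from other failures and try again later. The function submit_with_retry and the RETRY_DELAY variable are hypothetical; the retry policy is only an example.

```sh
#!/bin/sh
# Sketch only: retry a submission command while it exits with status 25.
# Usage: submit_with_retry <max tries> <command> [args...]
submit_with_retry() {
    tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        status=0
        "$@" || status=$?
        # Anything other than 25 (success or another error) is final.
        [ "$status" -eq 25 ] || return "$status"
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] && sleep "${RETRY_DELAY:-60}"
    done
    return 25
}
```

For example, submit_with_retry 5 qsub job.sh would make up to five attempts, one minute apart, where job.sh is a placeholder script name.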
572 enforce_project
573 If set to true, users are required to request a project whenever
574 submitting a job. See the -P option to qsub(1) for details.
575
576 Changing enforce_project will take immediate effect. The default for
577 enforce_project is false.
578
579 This value is a global configuration parameter only. It cannot be over‐
580 written by the execution host local configuration.
581
582 enforce_user
583 If set to true, a user(5) must exist to allow for job submission. Jobs
584 are rejected if no corresponding user exists.
585
586 If set to auto, a user(5) object for the submitting user will
587 automatically be created during job submission, if one does not already exist.
588 The auto_user_oticket, auto_user_fshare, auto_user_default_project, and
589 auto_user_delete_time configuration parameters will be used as default
590 attributes of the new user(5) object.
591
592 Changing enforce_user will take immediate effect. The default for
593 enforce_user is false.
594
595 This value is a global configuration parameter only. It cannot be over‐
596 written by the execution host local configuration.
597
598 auto_user_oticket
599 The number of override tickets to assign to automatically created
600 user(5) objects. User objects are created automatically if the
601 enforce_user attribute is set to auto.
602
603 Changing auto_user_oticket will affect any newly created user objects,
604 but will not change user objects created in the past.
605
606 This value is a global configuration parameter only. It cannot be over‐
607 written by the execution host local configuration.
608
609 auto_user_fshare
610 The number of functional shares to assign to automatically created
611 user(5) objects. User objects are created automatically if the
612 enforce_user attribute is set to auto.
613
614 Changing auto_user_fshare will affect any newly created user objects,
615 but will not change user objects created in the past.
616
617 This value is a global configuration parameter only. It cannot be over‐
618 written by the execution host local configuration.
619
620 auto_user_default_project
621 The default project to assign to automatically created user(5) objects.
622 User objects are created automatically if the enforce_user attribute is
623 set to auto.
624
625 Changing auto_user_default_project will affect any newly created user
626 objects, but will not change user objects created in the past.
627
628 This value is a global configuration parameter only. It cannot be over‐
629 written by the execution host local configuration.
630
631 auto_user_delete_time
632 The number of seconds of inactivity after which automatically created
633 user(5) objects will be deleted. User objects are created automatically
634 if the enforce_user attribute is set to auto. If the user has no active
635 or pending jobs for the specified amount of time, the object will auto‐
636 matically be deleted. A value of 0 can be used to indicate that the
637 automatically created user object is permanent and should not be auto‐
638 matically deleted.
639
640 Changing auto_user_delete_time will affect the deletion time for all
641 users with active jobs.
642
643 This value is a global configuration parameter only. It cannot be over‐
644 written by the execution host local configuration.
645
646 set_token_cmd
647 This parameter is only present if your Grid Engine system is licensed
648 to support AFS.
649
650 Set_token_cmd points to a command which sets and extends AFS tokens for
651 Grid Engine jobs. In the standard Grid Engine AFS distribution, it is
652 supplied as a script which expects two command line parameters. It
653 reads the token from STDIN, extends the token's expiration time and
654 sets the token:
655
656 <set_token_cmd> <user> <token_extend_after_seconds>
657
658 As a shell script this command will call the programs:
659
660 - SetToken
661 - forge
662
663 which are provided by your distributor as source code. The script looks
664 as follows:
665
666 --------------------------------
667 #!/bin/sh
668 # set_token_cmd
669 forge -u $1 -t $2 | SetToken
670 --------------------------------
671
672 Since it is necessary for forge to read the secret AFS server key, a
673 site might wish to replace the set_token_cmd script by a command, which
674 connects to a custom daemon at the AFS server. The token must be forged
675 at the AFS server and returned to the local machine, where SetToken is
676 executed.
677
678 Changing set_token_cmd will take immediate effect. The default for
679 set_token_cmd is none.
680
681 The global configuration entry for this value may be overwritten by the
682 execution host local configuration.
683
684 pag_cmd
685 This parameter is only present if your Grid Engine system is licensed
686 to support AFS.
687
688 The path to your pagsh is specified via this parameter. The sge_shep‐
689 herd(8) process and the job run in a pagsh. Please ask your AFS admin‐
690 istrator for details.
691
692 Changing pag_cmd will take immediate effect. The default for pag_cmd
693 is none.
694
695 The global configuration entry for this value may be overwritten by the
696 execution host local configuration.
697
698 token_extend_time
699 This parameter is only present if your Grid Engine system is licensed
700 to support AFS.
701
702 The token_extend_time is the time period for which AFS tokens are peri‐
703 odically extended. Grid Engine will call the token extension 30 minutes
704 before the tokens expire until jobs have finished and the corresponding
705 tokens are no longer required.
706
707 Changing token_extend_time will take immediate effect. The default for
708 token_extend_time is 24:0:0, i.e. 24 hours.
709
710 The global configuration entry for this value may be overwritten by the
711 execution host local configuration.
712
713 gid_range
714 The gid_range is a comma-separated list of range expressions of the
715 form n-m (n as well as m are integer numbers greater than 99), where m
716 is an abbreviation for m-m. These numbers are used in sge_execd(8) to
717 identify processes belonging to the same job.
718
719 Each sge_execd(8) may use a separate set of group ids for this purpose.
720 All numbers in the group id range have to be unused supplementary group
721 ids on the system where the sge_execd(8) is started.
722
723 Changing gid_range will take immediate effect. There is no default for
724 gid_range. The administrator will have to assign a value for gid_range
725 during installation of Grid Engine.
726
727 The global configuration entry for this value may be overwritten by the
728 execution host local configuration.
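As an illustration of the syntax described above, the following POSIX shell sketch checks a candidate gid_range value. The function name check_gid_range is hypothetical and not part of Grid Engine. It only verifies the form of the value (single numbers or n-m ranges, all greater than 99); the requirement that n does not exceed m is an assumption of this sketch, and it does not check whether the ids are actually unused on the host.

```shell
# check_gid_range VALUE
# Returns 0 if VALUE is a comma-separated list of "n" or "n-m" items with
# n and m integers greater than 99, and 1 otherwise.
# Hypothetical helper for illustration; not shipped with Grid Engine.
check_gid_range() {
    [ -n "$1" ] || return 1
    old_ifs=$IFS
    IFS=,
    set -- $1            # split the value at commas
    IFS=$old_ifs
    for item in "$@"; do
        case $item in
            *-*) lo=${item%%-*}; hi=${item#*-} ;;
            *)   lo=$item; hi=$item ;;      # "m" abbreviates "m-m"
        esac
        # Reject empty items and anything that is not purely numeric.
        case $lo$hi in
            ''|*[!0-9]*) return 1 ;;
        esac
        [ -n "$lo" ] && [ -n "$hi" ] || return 1
        # n <= m is an assumption of this sketch, not stated in the text.
        [ "$lo" -gt 99 ] && [ "$hi" -gt 99 ] && [ "$lo" -le "$hi" ] || return 1
    done
    return 0
}
```

A value such as 20000-20100 passes this check, while 50-200 is rejected because 50 is not greater than 99.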
729
730 qmaster_params
731 A list of additional parameters can be passed to the Grid Engine qmas‐
732 ter. The following values are recognized:
733
734 ENABLE_FORCED_QDEL
735 If this parameter is set, non-administrative users can force
736 deletion of their own jobs via the -f option of qdel(1). With‐
737 out this parameter, forced deletion of jobs is only allowed by
738 the Grid Engine manager or operator.
739
740 Note: Forced deletion for jobs is executed differently depending
741 on whether users are Grid Engine administrators or not. In case
742 of administrative users, the jobs are removed from the internal
743 database of Grid Engine immediately. For regular users, the
744 equivalent of a normal qdel(1) is executed first, and deletion
745 is forced only if the normal cancellation was unsuccessful.
746
747 FORBID_RESCHEDULE
748 If this parameter is set, re-queuing of jobs cannot be initiated
749 by the job script which is under control of the user. Without
750 this parameter jobs returning the value 99 are rescheduled. This
751 can be used to cause the job to be restarted at a different
752 machine, for instance if there are not enough resources on the
753 current one.
754
755 FORBID_APPERROR
756 If this parameter is set, the application cannot set itself to
757 error state. Without this parameter jobs returning the value
758 100 are set to error state (and therefore can be manually
759 rescheduled by clearing the error state). This can be used to
760 set the job to error state when a starting condition of the
761 application is not fulfilled before the application itself has
762 been started, or when a clean up procedure (e.g. in the epilog)
763 decides that it is necessary to run the job again, by returning
764 100 in the prolog, pe_start, job script, pe_stop or epilog
765 script.
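The two exit values described under FORBID_RESCHEDULE and FORBID_APPERROR could be used from a prolog or job script roughly as sketched below. The function name job_precheck, its arguments and the order of its checks are hypothetical choices made for this illustration; only the values 99 and 100 come from the descriptions above.

```shell
# Exit values described above. They are honored only if FORBID_RESCHEDULE
# and FORBID_APPERROR, respectively, are not set in qmaster_params.
EXIT_RESCHEDULE=99
EXIT_APPERROR=100

# job_precheck INPUT_FILE NEEDED_KB AVAILABLE_KB
# Returns 100 if INPUT_FILE is not readable (a start condition that another
# host would not fix), 99 if this host has too little space (another host
# might do better), and 0 otherwise. Hypothetical helper for illustration.
job_precheck() {
    if [ ! -r "$1" ]; then
        return "$EXIT_APPERROR"
    fi
    if [ "$3" -lt "$2" ]; then
        return "$EXIT_RESCHEDULE"
    fi
    return 0
}

# Possible use at the top of a job script:
#   job_precheck "$input" 1048576 "$avail_kb" || exit $?
```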
766
767 DISABLE_AUTO_RESCHEDULING
768 If set to "true" or "1", the reschedule_unknown parameter is not
769 taken into account.
770
771 MAX_DYN_EC
772 Sets the max number of dynamic event clients (as used by qsub
773 -sync y and by Grid Engine DRMAA API library sessions). The
774 default is set to 99. The number of dynamic event clients
775 should not be bigger than half of the number of file descriptors
776 the system has. The file descriptors are shared among the
777 connections to all exec hosts, all event clients, and file
778 handles that the qmaster needs.
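The guideline above can be turned into a quick calculation. The sketch below assumes that the relevant figure is the file descriptor limit of the environment in which sge_qmaster is started, as reported by ulimit -n; the function name max_dyn_ec_ceiling is hypothetical.

```shell
# max_dyn_ec_ceiling FD_LIMIT
# Prints half of FD_LIMIT (integer division), the upper bound suggested
# above for MAX_DYN_EC. FD_LIMIT must be a number; a limit reported as
# "unlimited" is not handled by this sketch.
max_dyn_ec_ceiling() {
    echo $(( $1 / 2 ))
}

# Possible use on the qmaster host:
#   max_dyn_ec_ceiling "$(ulimit -n)"
```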
779
780 MONITOR_TIME
781 Specifies the time interval when the monitoring information
782 should be printed. The monitoring is disabled by default and can
783 be enabled by specifying an interval. The monitoring is per
784 thread and is written to the messages file or displayed by the
785 "qping -f" command line tool. Example: MONITOR_TIME=0:0:10 gen‐
786 erates and prints the monitoring information approximately every
787 10 seconds. The specified time is a guideline only and not a
788 fixed interval. The interval that is actually used is printed.
789 In this example, the interval could be anything between 9 sec‐
790 onds and 20 seconds.
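For illustration, a global configuration entry along the following lines would enable monitoring with the 10 second guideline interval used in the example above. The spacing between the parameter name and its value is an assumption about the typical layout shown by qconf(1):

```
qmaster_params   MONITOR_TIME=0:0:10
```

The per-thread output can then be read from the messages file or with qping -f, passing the qmaster host, port, component name and id as described in qping(1).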
791
792 LOG_MONITOR_MESSAGE
793 Monitoring information is logged into the messages files by
794 default. This information can be accessed via qping(1). If
795 monitoring is always enabled, the messages files can become
796 quite large. This switch disables logging into the messages
797 files, making qping -f the only source of monitoring data.
798
799 PROF_SIGNAL
800 Profiling provides the user with the possibility to get system
801 measurements. This can be useful for debugging or optimization
802 of the system. The profiling output will be done within the mes‐
803 sages file.
804
805 Enables the profiling for qmaster signal thread. (e.g.
806 PROF_SIGNAL=true)
807
808 PROF_MESSAGE
809 Enables the profiling for qmaster message thread. (e.g.
810 PROF_MESSAGE=true)
811
812 PROF_DELIVER
813 Enables the profiling for qmaster event deliver thread. (e.g.
814 PROF_DELIVER=true)
815
816 PROF_TEVENT
817 Enables the profiling for qmaster timed event thread. (e.g.
818 PROF_TEVENT=true)
819
820 Please note that the cpu utime and stime values contained in the
821 profiling output are not per thread cpu times. These cpu usage statistics
822 are per process statistics. So the printed profiling values for cpu
823 mean "cpu time consumed by sge_qmaster (all threads) while the reported
824 profiling level was active".
825
826 STREE_SPOOL_INTERVAL
827 Sets the time interval for spooling the sharetree usage. The
828 default is set to 00:04:00. The setting accepts a colon-separated
829 string or seconds. There is no setting to turn the sharetree
830 spooling off. (e.g. STREE_SPOOL_INTERVAL=00:02:00)
831
832 MAX_JOB_DELETION_TIME
833 Sets the value of how long the qmaster will spend deleting jobs.
834 After this time, the qmaster will continue with other tasks and
835 schedule the deletion of remaining jobs at a later time. The
836 default value is 3 seconds, and will be used if no value is
837 entered. The range of valid values is > 0 and <= 5. (e.g.
838 MAX_JOB_DELETION_TIME=1)
839
840 gdi_timeout
841 Sets how long the communication library waits for gdi send/receive
842 operations. The default value is 60 seconds, which also applies
843 if no gdi_timeout value is configured. After this time, the
844 communication library retries receiving the gdi request if
845 "gdi_retries" is configured. If "gdi_retries" is not
846 configured, the communication returns with a "gdi receive
847 failure". (e.g. gdi_timeout=120 sets the timeout to 120
848 seconds)
849
850 gdi_retries
851 Sets how often the gdi receive call will be repeated before the
852 gdi receive error is reported. The default is 0, in which case
853 the call is made once with no retry. If the value is set to -1,
854 the call is repeated indefinitely. In combination with the
855 gdi_timeout parameter this makes it possible to configure a system
856 with, e.g., slow NFS so that all jobs will be submitted.
857 (e.g. gdi_retries=4)
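To see how gdi_timeout and gdi_retries combine, the following sketch estimates the longest time a single gdi receive could block. It assumes that every attempt waits at most gdi_timeout seconds and that gdi_retries counts the attempts made after the first one; both points are a reading of the descriptions above rather than a documented formula, and the function name gdi_max_wait is hypothetical.

```shell
# gdi_max_wait TIMEOUT_SECONDS RETRIES
# Prints an estimated upper bound in seconds: one initial attempt plus
# RETRIES further attempts, each bounded by TIMEOUT_SECONDS.
# Prints "unbounded" for RETRIES=-1 (retry without limit).
gdi_max_wait() {
    if [ "$2" -eq -1 ]; then
        echo unbounded
    else
        echo $(( $1 * ($2 + 1) ))
    fi
}
```

Under these assumptions, gdi_timeout=120 together with gdi_retries=4 gives up to five attempts of 120 seconds each.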
858
859 cl_ping
860 Turns on/off a communication library ping. This parameter will
861 create additional debug output. This output shows information
862 about the error messages which are returned by communication and
863 it will give information about the application status of the
864 qmaster. E.g., if the reason for gdi timeouts is unclear,
865 this may show you some useful messages. The default value is
866 false (off). (e.g. cl_ping=false)
867
868 Changing qmaster_params will take immediate effect, except for gdi_timeout,
869 gdi_retries and cl_ping, which take effect only for new connections.
870 The default for qmaster_params is none.
871
872 This value is a global configuration parameter only. It cannot be over‐
873 written by the execution host local configuration.
874
875 execd_params
876 This is used for passing additional parameters to the Grid Engine exe‐
877 cution daemon. The following values are recognized:
878
879 ACCT_RESERVED_USAGE
880 If this parameter is set to true, the usage of reserved
881 resources is used for the accounting entries cpu, mem and io
882 instead of the measured usage.
883
884 ENABLE_WINDOMACC
885 If this parameter is set to true, Windows Domain accounts (Win‐
886 DomAcc) are used on Windows hosts. These accounts require the
887 use of sgepasswd(1) (see also sgepasswd(5)). If this parameter
888 is set to false or is not set, local Windows accounts are used.
889 On non-Windows hosts, this parameter is ignored.
890
891 KEEP_ACTIVE
892 This value should only be set for debugging purposes. If set to
893 true, the execution daemon will not remove the spool directory
894 maintained by sge_shepherd(8) for a job.
895
896 PTF_MIN_PRIORITY, PTF_MAX_PRIORITY
897 The maximum/minimum priority which Grid Engine will assign to a
898 job. Typically this is a negative/positive value in the range
899 of -20 (maximum) to 19 (minimum) for systems which allow setting
900 of priorities with the nice(2) system call. Other systems may
901 provide different ranges.
902 The default priority range (varies from system to system) is
903 installed either by removing the parameters or by setting a
904 value of -999.
905 See the "messages" file of the execution daemon for the prede‐
906 fined default value on your hosts. The values are logged during
907 the startup of the execution daemon.
908
909 PROF_EXECD
910 Enables the profiling for the execution daemon. (e.g.
911 PROF_EXECD=true)
912
913 NOTIFY_KILL
914 The parameter allows you to change the notification signal for
915 the signal SIGKILL (see -notify option of qsub(1)). The parame‐
916 ter either accepts signal names (use the -l option of kill(1))
917 or the special value none. If set to none, no notification sig‐
918 nal will be sent. If it is set to TERM, for instance, or another
919 signal name then this signal will be sent as notification sig‐
920 nal.
921
922 NOTIFY_SUSP
923 With this parameter it is possible to modify the notification
924 signal for the signal SIGSTOP (see -notify parameter of
925 qsub(1)). The parameter either accepts signal names (use the -l
926 option of kill(1)) or the special value none. If set to none, no
927 notification signal will be sent. If it is set to TSTP, for
928 instance, or another signal name then this signal will be sent
929 as notification signal.
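A job has to handle the configured notification signal itself for it to be of any use. The sketch below assumes NOTIFY_KILL=TERM and a job submitted with the -notify option of qsub(1); the function names, the NOTIFY_MARKER variable and the marker file are hypothetical and stand in for whatever state a real job would save.

```shell
# Handler run when the notification signal arrives. Writing a marker file
# is a placeholder for the job's own checkpoint or cleanup logic.
on_notify() {
    echo "notified" > "${NOTIFY_MARKER:-./notify.marker}"
}

# install_notify_trap SIGNAL_NAME
# SIGNAL_NAME must match the value configured for NOTIFY_KILL or
# NOTIFY_SUSP, given without the SIG prefix (e.g. TERM or TSTP).
install_notify_trap() {
    trap on_notify "$1"
}

# Possible use near the top of a job script:
#   install_notify_trap TERM
```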
930
931 SHARETREE_RESERVED_USAGE
932 If this parameter is set to true, the usage of reserved
933 resources is taken for the Grid Engine share tree consumption
934 instead of measured usage.
935
936 Changing execd_params will take immediate effect. The default for
937 execd_params is none.
938
939 The global configuration entry for this value may be overwritten by the
940 execution host local configuration.
941
942 USE_QSUB_GID
943 If this parameter is set to true, the primary group id active
944 when a job was submitted will be set to become the primary group
945 id for job execution. If the parameter is not set, the primary
946 group id as defined for the job owner in the execution host
947 passwd(5) file is used.
948 The feature is only available for jobs submitted via qsub(1),
949 qrsh(1), qmake(1) and qtcsh(1). Also, it only works for qrsh(1)
950 jobs (and thus also for qtcsh(1) and qmake(1)) if rsh and rshd
951 components are used which are provided with Grid Engine (i.e.,
952 the rsh_daemon and rsh_command parameters may not be changed
953 from the default).
954
955 INHERIT_ENV
956 This parameter indicates whether the shepherd should allow the
957 environment inherited by the execution daemon from the shell
958 that started it to be inherited by the job it's starting. When
959 true, any environment variable that is set in the shell which
960 starts the execution daemon at the time the execution daemon is
961 started will be set in the environment of any jobs run by that
962 execution daemon, unless the environment variable is explicitly
963 overridden, such as PATH or LOGNAME. If set to false, each job
964 starts with only the environment variables that are explicitly
965 passed on by the execution daemon, such as PATH and LOGNAME.
966 The default value is true.
967
968 SET_LIB_PATH
969 This parameter tells the execution daemon whether to add the
970 Grid Engine shared library directory to the library path of exe‐
971 cuted jobs. If set to true, and INHERIT_ENV is also set to
972 true, the Grid Engine shared library directory will be prepended
973 to the library path which is inherited from the shell which
974 started the execution daemon. If INHERIT_ENV is set to false,
975 the library path will contain only the Grid Engine shared
976 library directory. If set to false, and INHERIT_ENV is set to
977 true, the library path exported to the job will be the one
978 inherited from the shell which started the execution daemon. If
979 INHERIT_ENV is also set to false, the library path will be
980 empty. After the execution daemon has set the library path, it
981 may be further altered by the shell in which the job is exe‐
982 cuted, or by the job script itself. The default value for
983 SET_LIB_PATH is false.
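The four combinations of SET_LIB_PATH and INHERIT_ENV described above can be summarized in a small shell sketch. The function name job_library_path is hypothetical, and the colon used to join the two parts is an assumption based on the usual library path syntax of Unix systems.

```shell
# job_library_path SET_LIB_PATH INHERIT_ENV SGE_LIB_DIR INHERITED_PATH
# Prints the library path a job would start with, following the rules
# described above. Hypothetical helper for illustration only.
job_library_path() {
    set_lib=$1
    inherit=$2
    sge_lib=$3
    inherited=$4
    result=
    # INHERIT_ENV=true keeps the path inherited by the execution daemon.
    if [ "$inherit" = true ]; then
        result=$inherited
    fi
    # SET_LIB_PATH=true prepends the Grid Engine shared library directory.
    if [ "$set_lib" = true ]; then
        if [ -n "$result" ]; then
            result=$sge_lib:$result
        else
            result=$sge_lib
        fi
    fi
    echo "$result"
}
```

With both parameters set to false the function prints an empty line, matching the empty library path described above.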
984
985 ENABLE_ADDGRP_KILL
986 If this parameter is set then Grid Engine uses the supplementary
987 group ids (see gid_range) to identify all processes which are to
988 be terminated when a job is deleted, or when sge_shepherd(8)
989 cleans up after job termination.
990
991 reporting_params
992 Used to define the behavior of reporting modules in the Grid Engine
993 qmaster. Changes to the reporting_params take immediate effect. The
994 following values are recognized:
995
996 accounting
997 If this parameter is set to true, the accounting file is written.
998 The accounting file is a prerequisite for using the qacct
999 command.
1000
1001 reporting
1002 If this parameter is set to true, the reporting file is written.
1003 The reporting file contains data that can be used for monitoring
1004 and analysis, like job accounting, job log, host load and con‐
1005 sumables, queue status and consumables and sharetree configura‐
1006 tion and usage. Attention: Depending on the size and load of
1007 the cluster, the reporting file can become quite large. Only
1008 activate the reporting file if you have a process running that
1009 will consume the reporting file! See reporting(5) for further
1010 information about format and contents of the reporting file.
1011
1012 flush_time
1013 Contents of the reporting file are buffered in the Grid Engine
1014 qmaster and flushed at a fixed interval. This interval can be
1015 configured with the flush_time parameter. It is specified as a
1016 time value in the format HH:MM:SS. Sensible values range from a
1017 few seconds to one minute. Setting it too low may slow down the
1018 qmaster. Setting it too high will make the qmaster consume large
1019 amounts of memory for buffering data.
1020
1021 accounting_flush_time
1022 Contents of the accounting file are buffered in the Grid Engine
1023 qmaster and flushed at a fixed interval. This interval can be
1024 configured with the accounting_flush_time parameter. It is
1025 specified as a time value in the format HH:MM:SS. Sensible val‐
1026 ues range from a few seconds to one minute. Setting it too low
1027 may slow down the qmaster. Setting it too high will make the
1028 qmaster consume large amounts of memory for buffering data.
1029 Setting it to 00:00:00 will disable accounting data buffering;
1030 as soon as data is generated, it will be written to the account‐
1031 ing file. If this parameter is not set, the accounting data
1032 flush interval will default to the value of the flush_time
1033 parameter.
1034
1035 joblog If this parameter is set to true, the reporting file will con‐
1036 tain job logging information. See reporting(5) for more informa‐
1037 tion about job logging.
1038
1039 sharelog
1040 The Grid Engine qmaster can dump information about sharetree
1041 configuration and use to the reporting file. The parameter
1042 sharelog sets an interval in which sharetree information will be
1043 dumped. It is set in the format HH:MM:SS. A value of 00:00:00
1044 configures qmaster not to dump sharetree information. Intervals
1045 of several minutes up to hours are sensible values for this
1046 parameter. See reporting(5) for further information about
1047 sharelog.
1048
1049 log_consumables
1050 This parameter controls writing of consumable resources to the
1051 reporting file. Default (log_consumables=true) is to write
1052 information about all consumable resources (their current usage
1053 and their capacity) to the reporting file, whenever a consumable
1054 resource changes either in definition, or in capacity, or when
1055 the usage of a consumable resource changes. When log_consumables
1056 is set to false, only those variables will be written to
1057 the reporting file that are configured in the report_variables
1058 in the exec host configuration; see host_conf(5) for further
1059 information about report_variables.
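As an illustration of how the values above combine, a reporting_params entry might look as follows. The concrete values are examples chosen to stay within the ranges described above, not recommendations, and the space-separated layout is an assumption about the usual form of this entry:

```
reporting_params   accounting=true reporting=false flush_time=00:00:15 joblog=false sharelog=00:00:00
```

With such a setting the accounting file is written, the reporting file is not, and no sharetree information is dumped. Since accounting_flush_time is not set, accounting data would be flushed at the flush_time interval.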
1060
1061 finished_jobs
1062 Grid Engine stores a certain number of just finished jobs to provide
1063 post mortem status information. The finished_jobs parameter defines the
1064 number of finished jobs stored. If this maximum number is reached, the
1065 oldest finished job will be discarded for every new job added to the
1066 finished job list.
1067
1068 Changing finished_jobs will take immediate effect. The default for
1069 finished_jobs is 0.
1070
1071 This value is a global configuration parameter only. It cannot be over‐
1072 written by the execution host local configuration.
1073
1074 qlogin_daemon
1075 This parameter specifies the executable that is to be started on the
1076 server side of a qlogin(1) request. Usually this is the absolute path‐
1077 name of the system's telnet daemon.
1078
1079 Changing qlogin_daemon will take immediate effect. The default value
1080 for qlogin_daemon is the system's default telnetd.
1081
1082 The global configuration entry for this value may be overwritten by the
1083 execution host local configuration.
1084
1085 qlogin_command
1086 This is the command to be executed on the client side of a qlogin(1)
1087 request. Usually this is the absolute pathname of the system's telnet
1088 client program. Otherwise the system's default telnet client will be
1089 used. It is automatically started with the target host and port number
1090 as parameters.
1091
1092 Changing qlogin_command will take immediate effect. The default value
1093 for qlogin_command is telnet.
1094
1095 The global configuration entry for this value may be overwritten by the
1096 execution host local configuration.
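Because qlogin_command is started with the target host and port number as positional parameters, a client that expects a different option syntax needs a small wrapper. The sketch below rearranges the two arguments into the form used by ssh(1), which takes the port via its -p option. The function name qlogin_ssh_args is hypothetical, and a matching qlogin_daemon on the execution hosts is assumed but not shown.

```shell
# qlogin_ssh_args HOST PORT
# Prints the ssh(1) arguments for connecting to HOST on PORT.
# Hypothetical helper for illustration only.
qlogin_ssh_args() {
    echo "-p $2 $1"
}

# A wrapper script configured as qlogin_command could then end with:
#   exec ssh $(qlogin_ssh_args "$1" "$2")
```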
1097
1098 rlogin_daemon
1099 This parameter specifies the executable that is to be started on the
1100 server side of a qrsh(1) request without a command argument to be exe‐
1101 cuted remotely. Usually this is the absolute pathname of the system's
1102 rlogin daemon.
1103
1104 Changing rlogin_daemon will take immediate effect. The default for
1105 rlogin_daemon is the system's default rlogind.
1106
1107 The global configuration entry for this value may be overwritten by the
1108 execution host local configuration.
1109
1110 rlogin_command
1111 This is the command to be executed on the client side of a qrsh(1)
1112 request without a command argument to be executed remotely. If no
1113 value is given, a specialized Grid Engine component is used. The com‐
1114 mand is automatically started with the target host and port number as
1115 parameters. The Grid Engine rlogin client has been extended to accept
1116 and use the port number argument. You can only use clients, such as
1117 ssh, which also understand this syntax.
1118
1119 Changing rlogin_command will take immediate effect. A default value for
1120 rlogin_command is not configured.
1121
1122 The global configuration entry for this value may be overwritten by the
1123 execution host local configuration.
1124
1125 rsh_daemon
1126 This parameter specifies the executable that is to be started on the
1127 server side of a qrsh(1) request with a command argument to be executed
1128 remotely. If no value is given, a specialized Grid Engine component is
1129 used.
1130
1131 Changing rsh_daemon will take immediate effect. A default value for
1132 rsh_daemon is not configured.
1133
1134 The global configuration entry for this value may be overwritten by the
1135 execution host local configuration.
1136
1137 rsh_command
1138 This is the command to be executed on the client side of a qrsh(1)
1139 request with a command argument to be executed remotely. If no value
1140 is given, a specialized Grid Engine component is used. The command is
1141 automatically started with the target host and port number as parameters,
1142 as required for telnet(1), plus the command with its arguments to
1143 be executed remotely. The Grid Engine rsh client has been extended to
1144 accept and use the port number argument. You can only use clients, such
1145 as ssh, which also understand this syntax.
1146
1147 Changing rsh_command will take immediate effect. A default value for
1148 rsh_command is not configured.
1149
1150 The global configuration entry for this value may be overwritten by the
1151 execution host local configuration.
1152
1153 delegated_file_staging
1154 This flag must be set to "true" when the prolog and epilog are ready
1155 for delegated file staging, so that the DRMAA attribute 'drmaa_trans‐
1156 fer_files' is supported. To establish delegated file staging, use the
1157 variables beginning with "$fs_..." in prolog and epilog to move the
1158 input, output and error files from one host to the other. When this
1159 flag is set to "false", no file staging is available for the DRMAA
1160 interface. File staging is currently implemented only via the DRMAA
1161 interface. When an error occurs while moving the input, output and
1162 error files, return error code 100 so that the error handling mechanism
1163 can handle the error correctly. (See also FORBID_APPERROR).
1164
1165 reprioritize
1166 This flag enables or disables the reprioritization of jobs based on
1167 their ticket amount. The reprioritize_interval in sched_conf(5) takes
1168 effect only if reprioritize is set to true. To turn off job reprioriti‐
1169 zation, the reprioritize flag must be set to false and the repriori‐
1170 tize_interval to 0 which is the default.
1171
1172 This value is a global configuration parameter only. It cannot be over‐
1173 ridden by the execution host local configuration.
1174
1176 sge_intro(1), csh(1), qconf(1), qsub(1), rsh(1), sh(1), getpwnam(3),
1177 drmaa_attributes(3), queue_conf(5), sched_conf(5), sge_execd(8),
1178 sge_qmaster(8), sge_shepherd(8), cron(8), Grid Engine Installation and
1179 Administration Guide.
1180
1182 See sge_intro(1) for a full statement of rights and permissions.
1183
1184
1185
1186GE 6.1 $Date: 2007/10/22 08:12:35 $ SGE_CONF(5)