SGE_PE(5)                  Grid Engine File Formats                  SGE_PE(5)

NAME

       sge_pe - Grid Engine parallel environment configuration file format

DESCRIPTION

       Parallel environments are parallel programming and runtime environments
       that allow the execution of shared memory or distributed memory
       parallelized applications. Parallel environments usually require some
       kind of setup to be operational before parallel applications can be
       started. Examples of common parallel environments are shared memory
       parallel operating systems and the distributed memory environments
       Parallel Virtual Machine (PVM) and Message Passing Interface (MPI).

       sge_pe allows for the definition of interfaces to arbitrary parallel
       environments. Once a parallel environment is defined or modified with
       the -ap or -mp options to qconf(1) and linked with one or more queues
       via pe_list in queue_conf(5), the environment can be requested for a
       job via the -pe switch to qsub(1), together with a range for the number
       of parallel processes to be allocated by the job. Additional -l options
       may be used to specify the job requirements in further detail.

       Note: Grid Engine allows backslashes (\) to be used to escape newline
       (\newline) characters. The backslash and the newline are replaced with
       a space (" ") character before any interpretation.
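       The continuation rule in the note above can be sketched as follows.
       This is a minimal illustration, not Grid Engine code: the helper names
       join_continuations and parse_pe are hypothetical, the sample values
       are made up, and the sketch assumes that a PE definition consists of
       one "name value" pair per logical line.

```python
def join_continuations(text: str) -> str:
    """Replace every backslash-newline pair with a single space."""
    return text.replace("\\\n", " ")


def parse_pe(text: str) -> dict:
    """Split each logical line into a parameter name and its value.

    Assumption: one whitespace-separated "name value" pair per line.
    """
    settings = {}
    for line in join_continuations(text).splitlines():
        parts = line.split(None, 1)
        if not parts:
            continue  # skip empty lines
        value = parts[1].strip() if len(parts) > 1 else ""
        settings[parts[0]] = value
    return settings


# Hypothetical fragment in which the user_lists value spans two lines.
raw = "pe_name mpi\nuser_lists deptA,\\\ndeptB\n"
settings = parse_pe(raw)
print(settings)
```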

FORMAT

       The format of a sge_pe file is defined as follows:

   pe_name
       The name of the parallel environment as defined for pe_name in
       sge_types(1). To be used in the qsub(1) -pe switch.

   slots
       The number of parallel processes allowed to run in total under the
       parallel environment concurrently.

   user_lists
       A comma-separated list of user access list names (see access_list(5)).
       Each user contained in at least one of the listed access lists has
       access to the parallel environment. If the user_lists parameter is set
       to NONE (the default), any user not explicitly excluded via the
       xuser_lists parameter described below has access. If a user is
       contained both in an access list listed in xuser_lists and in one
       listed in user_lists, the user is denied access to the parallel
       environment.

   xuser_lists
       A comma-separated list of user access list names (see access_list(5)).
       Each user contained in at least one of the listed access lists is not
       allowed to access the parallel environment. If the xuser_lists
       parameter is set to NONE (the default), any user has access unless
       restricted by user_lists. If a user is contained both in an access
       list listed in xuser_lists and in one listed in user_lists, the user is
       denied access to the parallel environment.

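       The interplay of user_lists and xuser_lists can be sketched as a small
       decision function. This is an illustration only: has_pe_access and the
       acls dictionary are hypothetical stand-ins for the access lists of
       access_list(5), an empty list stands for NONE, and group entries in
       access lists are ignored for simplicity.

```python
def has_pe_access(user, user_lists, xuser_lists, acls):
    """Return True if `user` may use the parallel environment.

    user_lists, xuser_lists: access list names; an empty list means NONE.
    acls: hypothetical mapping of access list name -> set of user names.
    """
    def listed(names):
        return any(user in acls.get(name, set()) for name in names)

    if listed(xuser_lists):
        return False  # exclusion wins, even if the user is in user_lists
    if not user_lists:
        return True   # NONE: every user who is not excluded has access
    return listed(user_lists)


# Made-up access lists for demonstration.
acls = {"dev": {"alice", "bob"}, "banned": {"bob"}}
print(has_pe_access("alice", ["dev"], ["banned"], acls))
print(has_pe_access("bob", ["dev"], ["banned"], acls))
```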
   start_proc_args
       The invocation command line of a start-up procedure for the parallel
       environment. The start-up procedure is invoked by sge_shepherd(8) prior
       to executing the job script. Its purpose is to set up the parallel
       environment according to its needs. An optional prefix "user@"
       specifies the user under which this procedure is to be started. The
       standard output of the start-up procedure is redirected to the file
       REQNAME.poJID in the job's working directory (see qsub(1)), with
       REQNAME the name of the job as displayed by qstat(1) and JID the job's
       identification number. Likewise, the standard error output is
       redirected to REQNAME.peJID.
       The following special variables, expanded at runtime, can be used
       (besides any other strings which have to be interpreted by the start
       and stop procedures) to constitute a command line:

       $pe_hostfile
              The pathname of a file containing a detailed description of the
              layout of the parallel environment to be set up by the start-up
              procedure. Each line of the file refers to a host on which
              parallel processes are to be run. The first entry of each line
              denotes the hostname, the second entry the number of parallel
              processes to be run on the host, the third entry the name of the
              queue, and the fourth entry a processor range to be used in case
              of a multiprocessor machine.

       $host  The name of the host on which the start-up or stop procedures
              are executed.

       $job_owner
              The user name of the job owner.

       $job_id
              Grid Engine's unique job identification number.

       $job_name
              The name of the job.

       $pe    The name of the parallel environment in use.

       $pe_slots
              Number of slots granted for the job.

       $processors
              The processors string as contained in the queue configuration
              (see queue_conf(5)) of the master queue (the queue in which the
              start-up and stop procedures are executed).

       $queue The cluster queue of the master queue instance.

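       A start-up procedure typically reads the file named by $pe_hostfile.
       The following sketch parses the four entries per line described above.
       It is an assumption-laden illustration: HostEntry and
       parse_pe_hostfile are hypothetical names, the entries are assumed to
       be separated by whitespace, and the host names, queue names and the
       processor range placeholder in the sample are made up.

```python
from typing import NamedTuple


class HostEntry(NamedTuple):
    host: str        # first entry: hostname
    slots: int       # second entry: number of parallel processes
    queue: str       # third entry: queue name
    processors: str  # fourth entry: processor range, kept as raw text


def parse_pe_hostfile(text: str) -> list:
    """Parse the contents of a $pe_hostfile into HostEntry records."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 4:
            raise ValueError(f"expected 4 entries, got {len(fields)}: {line!r}")
        host, slots, queue, processors = fields
        entries.append(HostEntry(host, int(slots), queue, processors))
    return entries


# Made-up sample content; real files are written by Grid Engine.
sample = "node01 4 all.q@node01 UNDEFINED\nnode02 2 all.q@node02 UNDEFINED\n"
entries = parse_pe_hostfile(sample)
total_slots = sum(entry.slots for entry in entries)
print(total_slots)
```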
   stop_proc_args
       The invocation command line of a shutdown procedure for the parallel
       environment. The shutdown procedure is invoked by sge_shepherd(8) after
       the job script has finished. Its purpose is to stop the parallel
       environment and to remove it from all participating systems. An
       optional prefix "user@" specifies the user under which this procedure
       is to be started. The standard output of the stop procedure is also
       redirected to the file REQNAME.poJID in the job's working directory
       (see qsub(1)), with REQNAME the name of the job as displayed by
       qstat(1) and JID the job's identification number. Likewise, the
       standard error output is redirected to REQNAME.peJID.
       The same special variables as for start_proc_args can be used to
       constitute a command line.

   allocation_rule
       The allocation rule is interpreted by sge_schedd(8) and helps the
       scheduler decide how to distribute parallel processes among the
       available machines. If, for instance, a parallel environment is built
       for shared memory applications only, all parallel processes have to be
       assigned to a single machine, no matter how many suitable machines are
       available. If, however, the parallel environment follows the
       distributed memory paradigm, an even distribution of processes among
       machines may be favorable.
       The current version of the scheduler only understands the following
       allocation rules:

       <int>:    An integer number fixing the number of processes per host. If
                 the number is 1, all processes have to reside on different
                 hosts. If the special denominator $pe_slots is used in place
                 of an integer, the full range of processes as specified with
                 the qsub(1) -pe switch has to be allocated on a single host
                 (no matter which value belonging to the range is finally
                 chosen for the job to be allocated).

       $fill_up: Starting from the best suitable host/queue, all available
                 slots are allocated. Further hosts and queues are "filled up"
                 as long as a job still requires slots for parallel tasks.

       $round_robin:
                 From all suitable hosts a single slot is allocated until all
                 tasks requested by the parallel job are dispatched. If more
                 tasks are requested than suitable hosts are found, allocation
                 starts again from the first host. The allocation scheme
                 walks through suitable hosts in a best-suitable-first order.

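       The allocation rules above can be compared with a toy model. This is
       not the algorithm of sge_schedd(8): the function allocate is
       hypothetical, hosts are assumed to be given already sorted in
       best-suitable-first order with their free slot counts, and for the
       <int> rule the sketch assumes the slot count must be a multiple of the
       per-host number.

```python
def allocate(rule, slots_needed, free_slots):
    """Distribute slots over hosts according to an allocation rule.

    free_slots: list of (host, free) pairs, best suitable host first.
    Returns a dict host -> slots, or None if the request cannot be met.
    """
    if slots_needed <= 0:
        raise ValueError("slots_needed must be positive")
    alloc, remaining = {}, slots_needed
    if rule == "$pe_slots":
        # Everything on one host: pick the first host that is big enough.
        for host, free in free_slots:
            if free >= slots_needed:
                return {host: slots_needed}
        return None
    if rule == "$fill_up":
        # Take all free slots of a host before moving on to the next one.
        for host, free in free_slots:
            take = min(free, remaining)
            if take:
                alloc[host] = take
                remaining -= take
            if remaining == 0:
                return alloc
        return None
    if rule == "$round_robin":
        # One slot per host per pass; start again from the first host.
        left = dict(free_slots)
        while remaining:
            progressed = False
            for host, _ in free_slots:
                if remaining and left[host] > 0:
                    left[host] -= 1
                    alloc[host] = alloc.get(host, 0) + 1
                    remaining -= 1
                    progressed = True
            if not progressed:
                return None
        return alloc
    # <int>: a fixed number of processes on every participating host.
    per_host = int(rule)
    if per_host <= 0 or slots_needed % per_host:
        return None
    for host, free in free_slots:
        if free >= per_host:
            alloc[host] = per_host
            remaining -= per_host
            if remaining == 0:
                return alloc
    return None


hosts = [("a", 4), ("b", 2), ("c", 2)]  # made-up hosts and free slots
print(allocate("$fill_up", 5, hosts))
print(allocate("$round_robin", 5, hosts))
```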
   control_slaves
       This parameter can be set to TRUE or FALSE (the default). It indicates
       whether Grid Engine is the creator of the slave tasks of a parallel
       application via sge_execd(8) and sge_shepherd(8) and thus has full
       control over all processes in a parallel application, which enables
       capabilities such as resource limitation and correct accounting.
       However, to gain control over the slave tasks of a parallel
       application, a sophisticated PE interface is required, which works
       closely together with Grid Engine facilities. Such PE interfaces are
       available through your local Grid Engine support office.

       Please set the control_slaves parameter to FALSE for all other PE
       interfaces.

   job_is_first_task
       This parameter is only checked if control_slaves (see above) is set to
       TRUE and thus Grid Engine is the creator of the slave tasks of a
       parallel application via sge_execd(8) and sge_shepherd(8). In this
       case, a sophisticated PE interface is required closely coupling the
       parallel environment and Grid Engine. The documentation accompanying
       such PE interfaces will recommend the setting for job_is_first_task.

       The job_is_first_task parameter can be set to TRUE or FALSE. A value of
       TRUE indicates that the Grid Engine job script already contains one of
       the tasks of the parallel application, while a value of FALSE indicates
       that the job script (and its child processes) is not part of the
       parallel program.

   urgency_slots
       For pending jobs with a slot range PE request, the number of slots is
       not yet determined. This setting specifies the method to be used by
       Grid Engine to assess the number of slots such jobs might finally get.

       The assumed slot allocation is relevant when determining the
       resource-request-based priority contribution for numeric resources as
       described in sge_priority(5), and it is displayed when qstat(1) is run
       without the -g t option.

       The following methods are supported:

       <int>:    The specified integer number is directly used as prospective
                 slot amount.

       min:      The slot range minimum is used as prospective slot amount. If
                 no lower bound is specified with the range, 1 is assumed.

       max:      The slot range maximum is used as prospective slot amount. If
                 no upper bound is specified with the range, the absolute
                 maximum possible due to the PE's slots setting is assumed.

       avg:      The average of all numbers occurring within the job's PE
                 range request is assumed.
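
       The four methods can be worked through in a short sketch. It is an
       illustration under stated assumptions, not Grid Engine's computation:
       prospective_slots is a hypothetical name, a range request is modelled
       as a list of (low, high) pairs with None for a missing bound, the avg
       method is assumed to substitute open bounds the same way min and max
       do and to count each number once, and any rounding applied by Grid
       Engine is not modelled.

```python
def prospective_slots(method, ranges, pe_slots):
    """Estimate the slot amount assumed for a pending job.

    method:   "min", "max", "avg", or an integer given as a string.
    ranges:   list of (low, high) pairs; None marks an unspecified bound.
    pe_slots: the slots setting of the parallel environment.
    """
    if method not in ("min", "max", "avg"):
        return int(method)  # <int>: used directly
    # Fill in open bounds: lower bound 1, upper bound the PE's slots.
    bounds = [(1 if low is None else low, pe_slots if high is None else high)
              for low, high in ranges]
    if method == "min":
        return min(low for low, _ in bounds)
    if method == "max":
        return max(high for _, high in bounds)
    # avg: mean of every distinct number covered by the range request.
    values = {n for low, high in bounds for n in range(low, high + 1)}
    return sum(values) / len(values)


# Made-up requests against a PE with 16 slots.
print(prospective_slots("min", [(None, 8)], 16))
print(prospective_slots("max", [(4, None)], 16))
print(prospective_slots("avg", [(2, 4), (8, 8)], 16))
```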

RESTRICTIONS

       Note that the functionality of the start-up, shutdown and signaling
       procedures remains the full responsibility of the administrator
       configuring the parallel environment. Grid Engine will just invoke
       these procedures and evaluate their exit status. If the procedures do
       not perform their tasks properly, or if the parallel environment or the
       parallel application behaves unexpectedly, Grid Engine has no means to
       detect this.

SEE ALSO

       sge_intro(1), sge_types(1), qconf(1), qdel(1), qmod(1), qsub(1),
       access_list(5), sge_qmaster(8), sge_schedd(8), sge_shepherd(8).

COPYRIGHT

       See sge_intro(1) for a full statement of rights and permissions.

GE 6.1                   $Date: 2007/07/19 08:17:18 $                SGE_PE(5)