GE_PE(5)                   Grid Engine File Formats                   GE_PE(5)

NAME

       ge_pe - Grid Engine parallel environment configuration file format

DESCRIPTION

       Parallel environments are parallel programming and runtime environments
       that allow the execution of shared memory or distributed memory
       parallelized applications. Parallel environments usually require some
       kind of setup to be operational before parallel applications can be
       started. Examples of common parallel environments are shared memory
       parallel operating systems and the distributed memory environments
       Parallel Virtual Machine (PVM) and Message Passing Interface (MPI).

       ge_pe allows for the definition of interfaces to arbitrary parallel
       environments. Once a parallel environment is defined or modified with
       the -ap or -mp options to qconf(1) and linked with one or more queues
       via pe_list in queue_conf(5), the environment can be requested for a
       job via the -pe switch to qsub(1), together with a request of a range
       for the number of parallel processes to be allocated by the job.
       Additional -l options may be used to specify the job requirements in
       further detail.

       Note that Grid Engine allows backslashes (\) to be used to escape
       newline (\newline) characters. The backslash and the newline are
       replaced with a space (" ") character before any interpretation.

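       The backslash-newline rule above can be illustrated with a short Python
       sketch. It first joins continued lines and then splits each line into a
       parameter name and its value. For illustration only, it assumes that
       every parameter sits on its own line as a name followed by whitespace
       and the value, which is the layout printed by qconf(1) -sp. The helpers
       join_continuations and parse_pe are hypothetical and are not part of
       Grid Engine.

```python
from typing import Dict


def join_continuations(text: str) -> str:
    """Replace every backslash-newline pair with a single space."""
    return text.replace("\\\n", " ")


def parse_pe(text: str) -> Dict[str, str]:
    """Parse a PE definition into {parameter: value}.

    Assumption: one 'name value' pair per line once continuations are
    joined. Blank lines are skipped; a later duplicate overrides an earlier one.
    """
    params: Dict[str, str] = {}
    for line in join_continuations(text).splitlines():
        parts = line.split(None, 1)  # parameter name, then the rest of the line
        if not parts:
            continue
        params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params


if __name__ == "__main__":
    sample = (
        "pe_name mpi\n"
        "slots 16\n"
        "start_proc_args /bin/true \\\n"
        "    $pe_hostfile\n"
    )
    print(parse_pe(sample))
```
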

FORMAT

       The format of a ge_pe file is defined as follows:

   pe_name
       The name of the parallel environment as defined for pe_name in
       ge_types(1). This name is used with the qsub(1) -pe switch.

   slots
       The total number of parallel processes allowed to run concurrently
       under the parallel environment. Type is number; valid values are 0 to
       9999999.

   user_lists
       A comma-separated list of user access list names (see access_list(5)).
       Each user contained in at least one of the listed access lists has
       access to the parallel environment. If the user_lists parameter is set
       to NONE (the default), any user who is not explicitly excluded via the
       xuser_lists parameter described below has access. If a user is
       contained both in an access list listed in xuser_lists and in one
       listed in user_lists, the user is denied access to the parallel
       environment.

   xuser_lists
       The xuser_lists parameter contains a comma-separated list of so-called
       user access lists as described in access_list(5). Each user contained
       in at least one of the listed access lists is not allowed to access
       the parallel environment. If the xuser_lists parameter is set to NONE
       (the default), no user is excluded by this parameter. If a user is
       contained both in an access list listed in xuser_lists and in one
       listed in user_lists, the user is denied access to the parallel
       environment.

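       The access rule shared by user_lists and xuser_lists can be sketched in
       Python. Exclusion wins over inclusion, and NONE means no restriction.
       This is a simplified model and not Grid Engine's implementation. Each
       access list is assumed to be a plain set of user names, and NONE is
       represented by None. The function name pe_access_allowed is
       hypothetical.

```python
from typing import Dict, Iterable, Optional, Set


def pe_access_allowed(
    user: str,
    access_lists: Dict[str, Set[str]],
    user_lists: Optional[Iterable[str]],
    xuser_lists: Optional[Iterable[str]],
) -> bool:
    """Decide PE access; None stands for the value NONE.

    Assumption: access lists are modeled as plain sets of user names.
    """

    def listed_in(names: Iterable[str]) -> bool:
        return any(user in access_lists.get(name, set()) for name in names)

    # A user found in any xuser_lists entry is denied, even if user_lists
    # would also admit them.
    if xuser_lists is not None and listed_in(xuser_lists):
        return False
    # user_lists NONE: everyone not excluded above has access.
    if user_lists is None:
        return True
    return listed_in(user_lists)
```
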
   start_proc_args
       The invocation command line of a start-up procedure for the parallel
       environment. The start-up procedure is invoked by ge_shepherd(8) prior
       to executing the job script. Its purpose is to set up the parallel
       environment according to its needs. An optional prefix "user@"
       specifies the user under which this procedure is to be started. The
       standard output of the start-up procedure is redirected to the file
       REQNAME.poJID in the job's working directory (see qsub(1)), with
       REQNAME being the name of the job as displayed by qstat(1) and JID
       being the job's identification number. Likewise, the standard error
       output is redirected to REQNAME.peJID.
       The following special variables, which are expanded at runtime, can be
       used (besides any other strings which have to be interpreted by the
       start and stop procedures) to constitute a command line:

       $pe_hostfile
              The pathname of a file containing a detailed description of the
              layout of the parallel environment to be set up by the start-up
              procedure. Each line of the file refers to a host on which
              parallel processes are to be run. The first entry of each line
              denotes the hostname, the second entry the number of parallel
              processes to be run on the host, the third entry the name of the
              queue, and the fourth entry a processor range to be used in case
              of a multiprocessor machine.

       $host  The name of the host on which the start-up or stop procedures
              are started.

       $job_owner
              The user name of the job owner.

       $job_id
              Grid Engine's unique job identification number.

       $job_name
              The name of the job.

       $pe    The name of the parallel environment in use.

       $pe_slots
              Number of slots granted for the job.

       $processors
              The processors string as contained in the queue configuration
              (see queue_conf(5)) of the master queue (the queue in which the
              start-up and stop procedures are started).

       $queue The cluster queue of the master queue instance.

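       A start-up procedure may read the file named by $pe_hostfile. The
       following Python sketch parses the four fields described above. The
       HostEntry and parse_pe_hostfile names are hypothetical. Tolerating a
       missing fourth field is an assumption, not documented behavior. The
       host and queue names in the sample are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HostEntry:
    hostname: str
    slots: int                  # parallel processes to run on this host
    queue: str
    processors: Optional[str]   # processor range; None if the field is absent


def parse_pe_hostfile(text: str) -> List[HostEntry]:
    """Parse $pe_hostfile contents: one whitespace-separated line per host."""
    entries: List[HostEntry] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        entries.append(
            HostEntry(
                hostname=fields[0],
                slots=int(fields[1]),
                queue=fields[2],
                # Assumption: a line may lack the processor-range field.
                processors=fields[3] if len(fields) > 3 else None,
            )
        )
    return entries


if __name__ == "__main__":
    sample = "node01 4 all.q@node01 UNDEFINED\nnode02 2 all.q@node02 0-1\n"
    hosts = parse_pe_hostfile(sample)
    print(sum(h.slots for h in hosts), [h.hostname for h in hosts])
```

       The sum of the second fields gives the total number of parallel
       processes to be started.
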
   stop_proc_args
       The invocation command line of a shutdown procedure for the parallel
       environment. The shutdown procedure is invoked by ge_shepherd(8) after
       the job script has finished. Its purpose is to stop the parallel
       environment and to remove it from all participating systems. An
       optional prefix "user@" specifies the user under which this procedure
       is to be started. The standard output of the stop procedure is also
       redirected to the file REQNAME.poJID in the job's working directory
       (see qsub(1)), with REQNAME being the name of the job as displayed by
       qstat(1) and JID being the job's identification number. Likewise, the
       standard error output is redirected to REQNAME.peJID.
       The same special variables as for start_proc_args can be used to
       constitute a command line.

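       The Python sketch below shows how the special variables and the
       optional "user@" prefix might combine into a start or stop command
       line. It is an illustration under stated assumptions, not Grid
       Engine's parser:

       - expand_proc_args replaces only names present in the supplied
         mapping. Any other $word is left untouched for the procedure to
         interpret.
       - split_user_prefix uses a simple heuristic. The text before the first
         "@" counts as a user name only if it is a single token without "/".

       Both function names and the sample paths are hypothetical.

```python
import re
from typing import Mapping, Optional, Tuple

# A '$' followed by a lower-case name such as pe_hostfile or job_id.
_VAR = re.compile(r"\$([a-z_]+)")


def expand_proc_args(command: str, values: Mapping[str, str]) -> str:
    """Substitute known special variables; unknown $names are kept as-is."""
    return _VAR.sub(lambda m: values.get(m.group(1), m.group(0)), command)


def split_user_prefix(command: str) -> Tuple[Optional[str], str]:
    """Split an optional 'user@' prefix off a start/stop command line."""
    head, sep, tail = command.partition("@")
    # Heuristic (assumption): only a single token without '/' is a user name.
    if sep and head and not any(c.isspace() or c == "/" for c in head):
        return head, tail
    return None, command


if __name__ == "__main__":
    user, cmd = split_user_prefix("sgeadmin@/opt/pe/start.sh $pe_hostfile $pe_slots")
    print(user, expand_proc_args(cmd, {"pe_hostfile": "/tmp/pe_hostfile", "pe_slots": "8"}))
```
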
   allocation_rule
       The allocation rule is interpreted by the scheduler thread and helps
       the scheduler to decide how to distribute parallel processes among the
       available machines. If, for instance, a parallel environment is built
       for shared memory applications only, all parallel processes have to be
       assigned to a single machine, no matter how many suitable machines are
       available. If, however, the parallel environment follows the
       distributed memory paradigm, an even distribution of processes among
       machines may be favorable.
       The current version of the scheduler only understands the following
       allocation rules:

       <int>:    An integer number fixing the number of processes per host. If
                 the number is 1, all processes have to reside on different
                 hosts. If the special value $pe_slots is used instead, the
                 full range of processes as specified with the qsub(1) -pe
                 switch has to be allocated on a single host (no matter which
                 value belonging to the range is finally chosen for the job to
                 be allocated).

       $fill_up: Starting from the most suitable host/queue, all available
                 slots are allocated. Further hosts and queues are "filled up"
                 as long as a job still requires slots for parallel tasks.

       $round_robin:
                 From all suitable hosts a single slot is allocated until all
                 tasks requested by the parallel job are dispatched. If more
                 tasks are requested than suitable hosts are found, allocation
                 starts again from the first host. The allocation scheme
                 walks through suitable hosts in a best-suitable-first order.

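       The following Python sketch makes the allocation rules concrete. It
       distributes a fixed number of slots over hosts that are already sorted
       best-suitable-first. It is a deliberately simplified model, not the
       scheduler's algorithm:

       - It handles a single slot count instead of a range.
       - It ignores queues and all other resource requests.
       - It assumes that, for an integer rule, the slot count must be a
         multiple of that integer.

       The function name allocate is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (hostname, free slots), ordered best-suitable-first.
Hosts = List[Tuple[str, int]]


def allocate(hosts: Hosts, n: int, rule: str) -> Optional[Dict[str, int]]:
    """Return {hostname: slots} for n >= 1 slots, or None if impossible."""
    alloc: Dict[str, int] = {}

    if rule == "$pe_slots":
        # Everything on one host: the first host with enough free slots.
        for host, free in hosts:
            if free >= n:
                return {host: n}
        return None

    if rule == "$fill_up":
        # Fill each host completely before moving on to the next one.
        remaining = n
        for host, free in hosts:
            take = min(free, remaining)
            if take > 0:
                alloc[host] = take
                remaining -= take
            if remaining == 0:
                return alloc
        return None

    if rule == "$round_robin":
        # One slot per host per pass, wrapping around until all are placed.
        free_left = dict(hosts)
        remaining = n
        while remaining > 0:
            placed = False
            for host, _ in hosts:
                if remaining > 0 and free_left[host] > 0:
                    free_left[host] -= 1
                    alloc[host] = alloc.get(host, 0) + 1
                    remaining -= 1
                    placed = True
            if not placed:
                return None  # no free slots left anywhere
        return alloc

    # Otherwise the rule is a positive integer: a fixed number per host.
    # Assumption: n must be a multiple of that number.
    per_host = int(rule)
    if n % per_host != 0:
        return None
    chosen = [host for host, free in hosts if free >= per_host]
    needed = n // per_host
    if len(chosen) < needed:
        return None
    return {host: per_host for host in chosen[:needed]}
```

       With hosts [("a", 4), ("b", 2), ("c", 1)] and 5 requested slots, the
       sketch gives $fill_up {"a": 4, "b": 1} and $round_robin
       {"a": 2, "b": 2, "c": 1}.
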
   control_slaves
       This parameter can be set to TRUE or FALSE (the default). It indicates
       whether Grid Engine is the creator of the slave tasks of a parallel
       application via ge_execd(8) and ge_shepherd(8) and thus has full
       control over all processes in a parallel application, which enables
       capabilities such as resource limitation and correct accounting.
       However, to gain control over the slave tasks of a parallel
       application, a sophisticated PE interface is required, which works
       closely together with Grid Engine facilities. Such PE interfaces are
       available through your local Grid Engine support office.

       Please set the control_slaves parameter to FALSE for all other PE
       interfaces.

   job_is_first_task
       The job_is_first_task parameter can be set to TRUE or FALSE. A value of
       TRUE indicates that the Grid Engine job script already contains one of
       the tasks of the parallel application (the number of slots reserved
       for the job is the number of slots requested with the -pe switch),
       while a value of FALSE indicates that the job script (and its child
       processes) is not part of the parallel program (the number of slots
       reserved for the job is the number of slots requested with the -pe
       switch + 1).

       If wallclock accounting is used (execd_params ACCT_RESERVED_USAGE
       and/or SHARETREE_RESERVED_USAGE set to TRUE) and control_slaves is set
       to FALSE, the job_is_first_task parameter influences the accounting
       for the job. With a value of TRUE, accounting for CPU and requested
       memory is multiplied by the number of slots requested with the -pe
       switch. With a value of FALSE, the accounting information is
       multiplied by the number of slots + 1.

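       In the slot and accounting arithmetic described above, a FALSE value
       of job_is_first_task simply adds 1 to the requested slot count. A
       minimal Python sketch follows. The function names are hypothetical,
       and the usage figure passed in is an arbitrary example value.

```python
def reserved_slots(requested: int, job_is_first_task: bool) -> int:
    """Slots reserved for a job that requested `requested` slots via -pe."""
    return requested if job_is_first_task else requested + 1


def wallclock_accounted(usage: float, requested: int,
                        job_is_first_task: bool) -> float:
    """Multiply a usage value by the slot factor described for wallclock
    accounting with control_slaves set to FALSE (simplified model)."""
    return usage * reserved_slots(requested, job_is_first_task)


if __name__ == "__main__":
    # A job requesting 4 slots: 4 reserved if the job script is a task, else 5.
    print(reserved_slots(4, True), reserved_slots(4, False))
```
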
   urgency_slots
       For pending jobs with a slot range PE request, the number of slots is
       not yet determined. This setting specifies the method to be used by
       Grid Engine to assess the number of slots such jobs might finally get.

       The assumed slot allocation is used when determining the
       resource-request-based priority contribution for numeric resources as
       described in sge_priority(5), and is displayed when qstat(1) is run
       without the -g t option.

       The following methods are supported:

       <int>:    The specified integer number is directly used as the
                 prospective slot amount.

       min:      The slot range minimum is used as the prospective slot
                 amount. If no lower bound is specified with the range, 1 is
                 assumed.

       max:      The slot range maximum is used as the prospective slot
                 amount. If no upper bound is specified with the range, the
                 absolute maximum permitted by the PE's slots setting is
                 assumed.

       avg:      The average of all numbers occurring within the job's PE
                 range request is assumed.

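       The four methods can be sketched in Python for a job whose PE request
       is a single contiguous range. The function name assumed_slots is
       hypothetical. None stands for a missing bound, and pe_slots is the
       PE's slots value. For a contiguous range lo-hi, the average of all
       numbers in it is (lo + hi) / 2. How Grid Engine rounds a fractional
       average is not covered here.

```python
from typing import Optional


def assumed_slots(method: str, lo: Optional[int], hi: Optional[int],
                  pe_slots: int) -> float:
    """Prospective slot amount for a pending job with range request lo-hi.

    Assumption: the request is one contiguous range, not a list of ranges.
    """
    low = 1 if lo is None else lo            # missing lower bound -> 1
    high = pe_slots if hi is None else hi    # missing upper bound -> PE slots
    if method == "min":
        return low
    if method == "max":
        return high
    if method == "avg":
        return (low + high) / 2              # mean of lo, lo+1, ..., hi
    return int(method)                       # <int>: used directly
```
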
   accounting_summary
       This parameter is only checked if control_slaves (see above) is set to
       TRUE and thus Grid Engine is the creator of the slave tasks of a
       parallel application via ge_execd(8) and ge_shepherd(8). In this case,
       accounting information is available for every single slave task
       started by Grid Engine.

       The accounting_summary parameter can be set to TRUE or FALSE. A value
       of TRUE indicates that only a single accounting record is written to
       the accounting(5) file, containing the accounting summary of the whole
       job including all slave tasks, while a value of FALSE indicates that
       an individual accounting(5) record is written for every slave task, as
       well as for the master task.
       Note: When running tightly integrated jobs with
       SHARETREE_RESERVED_USAGE set and with accounting_summary enabled in
       the parallel environment, reserved usage will only be reported by the
       master task of the parallel job. No per-parallel-task usage records
       will be sent from execd to qmaster, which can significantly reduce the
       load on qmaster when running large tightly integrated parallel jobs.

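       The value constraints stated in this section can be collected into a
       configuration checker, as in the following Python sketch. It takes a
       {parameter: value} mapping and checks only what this page states: the
       slots range, the TRUE/FALSE flags, and the forms allowed for
       allocation_rule and urgency_slots. It assumes boolean values are
       case-insensitive. The validate_pe name is hypothetical.

```python
import re
from typing import Dict, List

_UINT = re.compile(r"[0-9]+")
_BOOL_PARAMS = ("control_slaves", "job_is_first_task", "accounting_summary")
_ALLOCATION_KEYWORDS = ("$pe_slots", "$fill_up", "$round_robin")
_URGENCY_KEYWORDS = ("min", "max", "avg")


def _is_uint(value: str) -> bool:
    return _UINT.fullmatch(value) is not None


def validate_pe(params: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means none were found."""
    errors: List[str] = []

    slots = params.get("slots", "")
    if not _is_uint(slots) or not 0 <= int(slots) <= 9999999:
        errors.append(f"slots must be 0..9999999, got {slots!r}")

    for name in _BOOL_PARAMS:
        value = params.get(name)
        # Assumption: TRUE/FALSE are accepted in any letter case.
        if value is not None and value.upper() not in ("TRUE", "FALSE"):
            errors.append(f"{name} must be TRUE or FALSE, got {value!r}")

    rule = params.get("allocation_rule")
    if rule is not None and rule not in _ALLOCATION_KEYWORDS and not _is_uint(rule):
        errors.append(f"unknown allocation_rule {rule!r}")

    urgency = params.get("urgency_slots")
    if urgency is not None and urgency not in _URGENCY_KEYWORDS and not _is_uint(urgency):
        errors.append(f"unknown urgency_slots {urgency!r}")

    return errors
```
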

RESTRICTIONS

       Note that the functionality of the start-up, shutdown and signaling
       procedures remains the full responsibility of the administrator
       configuring the parallel environment. Grid Engine will just invoke
       these procedures and evaluate their exit status. If the procedures do
       not perform their tasks properly, or if the parallel environment or
       the parallel application behaves unexpectedly, Grid Engine has no
       means to detect this.

SEE ALSO

       ge_intro(1), ge_types(1), qconf(1), qdel(1), qmod(1), qsub(1),
       access_list(5), ge_qmaster(8), ge_shepherd(8).

       See ge_intro(1) for a full statement of rights and permissions.


GE 6.2u5                 $Date: 2009/04/06 15:31:32 $                 GE_PE(5)