SGE_PE(5) Grid Engine File Formats SGE_PE(5)

sge_pe - Grid Engine parallel environment configuration file format

Parallel environments are parallel programming and runtime environments
that allow the execution of shared memory or distributed memory
parallelized applications. Parallel environments usually require some
kind of setup to be operational before starting parallel applications.
Examples of common parallel environments are shared memory parallel
operating systems and the distributed memory environments Parallel
Virtual Machine (PVM) and Message Passing Interface (MPI).

sge_pe allows for the definition of interfaces to arbitrary parallel
environments. Once a parallel environment is defined or modified with
the -ap or -mp options to qconf(1) and linked with one or more queues
via pe_list in queue_conf(5), the environment can be requested for a job
via the -pe switch to qsub(1), together with a request for a range of
the number of parallel processes to be allocated by the job. Additional
-l options may be used to specify the job requirements in further
detail.

Note: Grid Engine allows backslashes (\) to be used to escape newline
(\newline) characters. The backslash and the newline are replaced with
a space (" ") character before any interpretation.
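
The note above can be illustrated with a minimal Python sketch. It folds each backslash-newline pair into a space and then splits every line into a parameter name and a value. The helper name parse_pe_config, the sample values and the /path/to/start.sh path are assumptions made for illustration only, and the one-parameter-per-line layout is assumed here; this is not Grid Engine's own parser.

```python
def parse_pe_config(text):
    """Fold escaped newlines, then split each line into name and value."""
    # A backslash directly followed by a newline becomes a single space.
    folded = text.replace("\\\n", " ")
    config = {}
    for line in folded.splitlines():
        parts = line.split(None, 1)
        if not parts:
            continue  # skip blank lines
        name = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        config[name] = value
    return config


# Hypothetical sample; names and paths are illustrative only.
SAMPLE = (
    "pe_name          example_pe\n"
    "slots            16\n"
    "start_proc_args  /path/to/start.sh \\\n"
    "                 $pe_hostfile\n"
)

config = parse_pe_config(SAMPLE)
print(config["start_proc_args"].split())
```

The continued start_proc_args line ends up as a single value containing both the script path and $pe_hostfile, with no backslash left in it.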

The format of a sge_pe file is defined as follows:

pe_name
The name of the parallel environment as defined for pe_name in
sge_types(1). To be used in the qsub(1) -pe switch.

slots
The total number of parallel processes allowed to run concurrently
under the parallel environment.

user_lists
A comma-separated list of user access list names (see access_list(5)).
Each user contained in at least one of the listed access lists has
access to the parallel environment. If the user_lists parameter is set
to NONE (the default), any user not explicitly excluded via the
xuser_lists parameter described below has access. If a user is
contained both in an access list listed in xuser_lists and in one
listed in user_lists, the user is denied access to the parallel
environment.

xuser_lists
A comma-separated list of user access list names (see access_list(5)).
Each user contained in at least one of the listed access lists is not
allowed to access the parallel environment. If the xuser_lists
parameter is set to NONE (the default), any user has access, subject to
the user_lists parameter above. If a user is contained both in an
access list listed in xuser_lists and in one listed in user_lists, the
user is denied access to the parallel environment.
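
The interplay of user_lists and xuser_lists can be summarized in a short Python sketch. The function name pe_access_allowed and the acls dictionary are hypothetical stand-ins for access_list(5) data; an empty list models the value NONE, and only plain user names are considered.

```python
def pe_access_allowed(user, user_lists, xuser_lists, acls):
    """Return True if `user` may use the parallel environment.

    user_lists / xuser_lists: lists of access list names ([] models NONE).
    acls: dict mapping an access list name to a set of user names
          (a hypothetical stand-in for access_list(5) data).
    """
    def in_any(names):
        return any(user in acls.get(name, set()) for name in names)

    if in_any(xuser_lists):
        return False          # exclusion always wins
    if not user_lists:
        return True           # NONE: everyone not excluded has access
    return in_any(user_lists)


acls = {"devs": {"alice", "bob"}, "blocked": {"bob"}}
print(pe_access_allowed("alice", ["devs"], ["blocked"], acls))  # True
print(pe_access_allowed("bob", ["devs"], ["blocked"], acls))    # False
```

Checking the exclusion list first mirrors the rule that a user found in both kinds of list is denied access.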

start_proc_args
The invocation command line of a start-up procedure for the parallel
environment. The start-up procedure is invoked by sge_shepherd(8) prior
to executing the job script. Its purpose is to set up the parallel
environment according to its needs. An optional prefix "user@"
specifies the user under which this procedure is to be started. The
standard output of the start-up procedure is redirected to the file
REQNAME.poJID in the job's working directory (see qsub(1)), with REQNAME
the name of the job as displayed by qstat(1) and JID the job's
identification number. Likewise, the standard error output is
redirected to REQNAME.peJID.
The following special variables, expanded at runtime, can be used
(besides any other strings which have to be interpreted by the start
and stop procedures) to constitute a command line:

$pe_hostfile
The pathname of a file containing a detailed description of the
layout of the parallel environment to be set up by the start-up
procedure. Each line of the file refers to a host on which
parallel processes are to be run. The first entry of each line
denotes the hostname, the second entry the number of parallel
processes to be run on the host, the third entry the name of the
queue, and the fourth entry a processor range to be used in case
of a multiprocessor machine.

$host The name of the host on which the start-up or stop procedures
are executed.

$job_owner
The user name of the job owner.

$job_id
Grid Engine's unique job identification number.

$job_name
The name of the job.

$pe The name of the parallel environment in use.

$pe_slots
Number of slots granted for the job.

$processors
The processors string as contained in the queue configuration
(see queue_conf(5)) of the master queue (the queue in which the
start-up and stop procedures are executed).

$queue The cluster queue of the master queue instance.
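
The four-column layout described for $pe_hostfile can be read with a few lines of Python. This is a sketch under stated assumptions: the names HostEntry and parse_pe_hostfile are invented here, the host and queue names are made up, and UNDEFINED is assumed as a placeholder for the processor range.

```python
from typing import NamedTuple


class HostEntry(NamedTuple):
    host: str
    slots: int
    queue: str
    processors: str


def parse_pe_hostfile(text):
    """Parse lines of the form: host slots queue processor_range."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or incomplete lines
        entries.append(HostEntry(fields[0], int(fields[1]), fields[2], fields[3]))
    return entries


# Hypothetical file contents; host and queue names are made up.
SAMPLE_HOSTFILE = """\
node01 4 all.q@node01 UNDEFINED
node02 2 all.q@node02 UNDEFINED
"""

entries = parse_pe_hostfile(SAMPLE_HOSTFILE)
total_slots = sum(e.slots for e in entries)
# One host name per process, as some launchers expect.
machines = [e.host for e in entries for _ in range(e.slots)]
print(total_slots, machines)
```

A start-up procedure might use such a per-process host list to write the machine file of the parallel runtime it sets up.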

stop_proc_args
The invocation command line of a shutdown procedure for the parallel
environment. The shutdown procedure is invoked by sge_shepherd(8) after
the job script has finished. Its purpose is to stop the parallel
environment and to remove it from all participating systems. An
optional prefix "user@" specifies the user under which this procedure
is to be started. The standard output of the stop procedure is also
redirected to the file REQNAME.poJID in the job's working directory
(see qsub(1)), with REQNAME the name of the job as displayed by
qstat(1) and JID the job's identification number. Likewise, the
standard error output is redirected to REQNAME.peJID.
The same special variables as for start_proc_args can be used to
constitute a command line.
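
To make the "user@" prefix and the runtime variables more concrete, the following Python sketch splits off an optional prefix and expands $name placeholders. It only models the idea: build_command is a hypothetical helper, the paths and values are invented, and treating everything before the first "@" in the first word as the user name is an assumption of this sketch, not a statement about how sge_shepherd(8) parses the line.

```python
from string import Template


def build_command(proc_args, variables):
    """Split an optional "user@" prefix and expand $name variables."""
    run_as = None
    first, sep, rest = proc_args.partition(" ")
    if "@" in first:
        # Assumed convention for this sketch: the prefix ends at the first "@".
        run_as, _, first = first.partition("@")
    template = first + sep + rest
    # safe_substitute leaves unknown $names untouched for the procedure itself.
    return run_as, Template(template).safe_substitute(variables)


# Hypothetical values for the special variables.
variables = {
    "pe_hostfile": "/tmp/example/pe_hostfile",
    "job_id": "42",
    "pe": "example_pe",
    "pe_slots": "8",
}
run_as, command = build_command(
    "admin@/path/to/stop.sh $pe_hostfile $job_id $pe $pe_slots $unknown",
    variables,
)
print(run_as, command)
```

Strings the sketch does not know, such as $unknown, are passed through unchanged, in line with the remark that other strings are left for the start and stop procedures to interpret.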

allocation_rule
The allocation rule is interpreted by sge_schedd(8) and helps the
scheduler to decide how to distribute parallel processes among the
available machines. If, for instance, a parallel environment is built
for shared memory applications only, all parallel processes have to be
assigned to a single machine, no matter how many suitable machines are
available. If, however, the parallel environment follows the
distributed memory paradigm, an even distribution of processes among
machines may be favorable.
The current version of the scheduler only understands the following
allocation rules:

<int>: An integer number fixing the number of processes per host. If
the number is 1, all processes have to reside on different
hosts. If the special denominator $pe_slots is used, the full
range of processes as specified with the qsub(1) -pe switch
has to be allocated on a single host (no matter which value
belonging to the range is finally chosen for the job to be
allocated).

$fill_up: Starting from the best suitable host/queue, all available
slots are allocated. Further hosts and queues are "filled up"
as long as a job still requires slots for parallel tasks.

$round_robin:
From all suitable hosts a single slot is allocated until all
tasks requested by the parallel job are dispatched. If more
tasks are requested than suitable hosts are found, allocation
starts again from the first host. The allocation scheme
walks through suitable hosts in a best-suitable-first order.
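
The allocation rules above can be compared with a simplified Python model. It is a sketch, not the scheduler's algorithm: the function allocate is hypothetical, hosts are assumed to arrive already sorted best-suitable first with a known number of free slots, and for the <int> rule the sketch assumes the requested total must be a multiple of the per-host number.

```python
def allocate(rule, hosts, total):
    """Distribute `total` slots over `hosts` according to `rule`.

    hosts: list of (name, free_slots), already ordered best-suitable first.
    Returns {host: slots}, or None if the request cannot be placed.
    """
    alloc = {}
    remaining = total

    if rule == "$pe_slots":
        # Everything must fit on a single host.
        for name, slots in hosts:
            if slots >= total:
                return {name: total}
        return None

    if rule == "$fill_up":
        for name, slots in hosts:
            take = min(slots, remaining)
            if take:
                alloc[name] = take
                remaining -= take
        return alloc if remaining == 0 else None

    if rule == "$round_robin":
        while remaining:
            progressed = False
            for name, slots in hosts:
                if remaining and alloc.get(name, 0) < slots:
                    alloc[name] = alloc.get(name, 0) + 1
                    remaining -= 1
                    progressed = True
            if not progressed:
                return None  # no host has a free slot left
        return alloc

    # <int>: a fixed number of processes on every host that is used.
    per_host = int(rule)
    if total % per_host:
        return None  # assumption: total must be a multiple of the rule
    for name, slots in hosts:
        if remaining and slots >= per_host:
            alloc[name] = per_host
            remaining -= per_host
    return alloc if remaining == 0 else None


hosts = [("a", 4), ("b", 4), ("c", 2)]
print(allocate("$fill_up", hosts, 6))      # {'a': 4, 'b': 2}
print(allocate("$round_robin", hosts, 7))  # {'a': 3, 'b': 2, 'c': 2}
```

With the same three hosts, $fill_up packs the job onto as few hosts as possible, while $round_robin spreads it and only wraps around to the first host once every host has received a slot.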

control_slaves
This parameter can be set to TRUE or FALSE (the default). It indicates
whether Grid Engine is the creator of the slave tasks of a parallel
application via sge_execd(8) and sge_shepherd(8) and thus has full
control over all processes in a parallel application, which enables
capabilities such as resource limitation and correct accounting.
However, to gain control over the slave tasks of a parallel
application, a sophisticated PE interface is required, which works
closely together with Grid Engine facilities. Such PE interfaces are
available through your local Grid Engine support office.

Please set the control_slaves parameter to FALSE for all other PE
interfaces.

job_is_first_task
This parameter is only checked if control_slaves (see above) is set to
TRUE and thus Grid Engine is the creator of the slave tasks of a
parallel application via sge_execd(8) and sge_shepherd(8). In this
case, a sophisticated PE interface is required closely coupling the
parallel environment and Grid Engine. The documentation accompanying
such PE interfaces will recommend the setting for job_is_first_task.

The job_is_first_task parameter can be set to TRUE or FALSE. A value of
TRUE indicates that the Grid Engine job script already contains one of
the tasks of the parallel application, while a value of FALSE indicates
that the job script (and its child processes) is not part of the
parallel program.

urgency_slots
For pending jobs with a slot range PE request, the number of slots is
not yet determined. This setting specifies the method to be used by
Grid Engine to assess the number of slots such jobs might finally get.

The assumed slot allocation is used when determining the
resource-request-based priority contribution for numeric resources as
described in sge_priority(5), and it is displayed when qstat(1) is run
without the -g t option.

The following methods are supported:

<int>: The specified integer number is directly used as the prospective
slot amount.

min: The slot range minimum is used as the prospective slot amount. If
no lower bound is specified with the range, 1 is assumed.

max: The slot range maximum is used as the prospective slot amount. If
no upper bound is specified with the range, the absolute maximum
possible due to the PE's slots setting is assumed.

avg: The average of all numbers occurring within the job's PE
range request is assumed.
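
The four methods can be expressed as a small Python function. This is a sketch under assumptions: assumed_slots is a hypothetical name, only a single contiguous range is modeled, an omitted bound is represented by None, and for avg the sketch reuses the min and max defaults for omitted bounds and returns an unrounded value; how Grid Engine rounds the average is not stated here.

```python
def assumed_slots(setting, low, high, pe_slots):
    """Prospective slot amount for a pending job with range low-high.

    low / high may be None when the bound was omitted in the request;
    pe_slots is the PE's slots setting.
    """
    lo = 1 if low is None else low
    hi = pe_slots if high is None else high
    if setting == "min":
        return lo
    if setting == "max":
        return hi
    if setting == "avg":
        # Mean of all integers in one contiguous range lo..hi.
        values = range(lo, hi + 1)
        return sum(values) / len(values)
    return int(setting)  # <int>: used directly


print(assumed_slots("min", None, 8, 16))  # 1
print(assumed_slots("max", 2, None, 16))  # 16
print(assumed_slots("avg", 2, 8, 16))     # 5.0
```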

Note that the functionality of the start-up, shutdown and signaling
procedures remains the full responsibility of the administrator
configuring the parallel environment. Grid Engine will just invoke
these procedures and evaluate their exit status. If the procedures do
not perform their tasks properly, or if the parallel environment or the
parallel application behaves unexpectedly, Grid Engine has no means to
detect this.

sge_intro(1), sge_types(1), qconf(1), qdel(1), qmod(1), qsub(1),
access_list(5), sge_qmaster(8), sge_schedd(8), sge_shepherd(8).

See sge_intro(1) for a full statement of rights and permissions.

GE 6.1 $Date: 2007/07/19 08:17:18 $ SGE_PE(5)