pbs_scheduler_basl(8B)                PBS                pbs_scheduler_basl(8B)

pbs_sched_basl - pbs BASL scheduler

pbs_sched [-d home] [-L logfile] [-p print_file] [-a alarm] [-S port]
          [-c configfile]

The pbs_sched command starts the operation of a batch scheduler on the
local host. It runs in conjunction with the PBS server. It queries
the server about the state of PBS and communicates with pbs_mom to get
information about the status of running jobs, memory available etc. It
then makes decisions as to what jobs to run.

Typically, this command will be in a local boot file such as
/etc/rc.local.

pbs_sched must be executed with root permission.

-d home
       Specifies the name of the PBS home directory, PBS_HOME. If not
       specified, the value of $PBS_SERVER_HOME as defined at compile
       time is used. Also see the -L option.

-L logfile
       Specifies an absolute path name of the file to use as the log
       file. If not specified, the scheduler will open a file named for
       the current date in the PBS_HOME/sched_logs directory. See the
       -d option.

-p print_file
       Specifies the "print" file. Any output from the scheduler code
       that is written to standard output or standard error will be
       written to this file. If this option is not given, the file used
       will be $PBS_HOME/sched_priv/sched_out. See the -d option.

-a alarm
       Specifies the time in seconds to wait for a scheduling run to
       finish. If a scheduling iteration takes too long to finish, an
       alarm signal is sent, and the scheduler is restarted. If a core
       file does not exist in the current directory, abort() is called
       and a core file is generated. The default for alarm is 180
       seconds.

-S port
       Specifies a port on which to talk to the server. This option is
       not required. It merely overrides the default PBS scheduler
       port.

-c configfile
       Specifies a configuration file; see the description below. If
       this is a relative file name, it is taken relative to
       PBS_HOME/sched_priv; see the -d option. If the -c option is not
       supplied, pbs_sched will not attempt to open a configuration
       file. In BASL, this configuration file is almost always needed
       because it is where the administrator specifies the list of
       servers, nodes, and host resource queries.
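
The way the -d, -L, -p, and -c options interact can be sketched in
Python. This is only an illustration of the defaults described above,
not the scheduler's own code: the function name resolve_paths and its
parameters are hypothetical, and the YYYYMMDD form of the date-named
log file is an assumption that the text above does not state.

```python
from datetime import date
from pathlib import PurePosixPath


def resolve_paths(home, logfile=None, print_file=None, configfile=None, today=None):
    """Return the log, print, and config file paths implied by the options."""
    home = PurePosixPath(home)
    priv = home / "sched_priv"

    if logfile is None:
        # Default log file: named for the current date in PBS_HOME/sched_logs.
        # Assumption: the date is written as YYYYMMDD.
        today = today or date.today()
        logfile = home / "sched_logs" / today.strftime("%Y%m%d")

    if print_file is None:
        # Default print file: PBS_HOME/sched_priv/sched_out.
        print_file = priv / "sched_out"

    if configfile is not None:
        # A relative -c path is taken relative to PBS_HOME/sched_priv.
        # Joining an absolute path leaves that absolute path unchanged.
        configfile = priv / configfile

    return {
        "log": str(logfile),
        "print": str(print_file),
        # Without -c, no configuration file is opened at all.
        "config": None if configfile is None else str(configfile),
    }
```

For example, resolve_paths("/usr/spool/pbs", configfile="basl.conf")
places the configuration file under /usr/spool/pbs/sched_priv, while
omitting configfile leaves the "config" entry as None.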

This version of the scheduler requires knowledge of the BASL language.
The site must first write a function called sched_main() (and all
functions supporting it) using BASL constructs, and then translate the
functions into C using the BASL compiler basl2c, which also attaches a
main program to the resulting code. This main program performs general
initialization and housekeeping chores such as setting up a local
socket to communicate with the server running on the same machine,
changing to the priv directory, opening log files, opening the
configuration file (if any), setting up locks, forking the child to
become a daemon, initializing a scheduling cycle (i.e. getting node
attributes that are static in nature), setting up the signal handlers,
executing global initialization assignment statements specified by the
scheduler writer, and finally sitting in a loop waiting for a
scheduling command from the server. When the server sends the scheduler
an appropriate scheduling command {SCH_SCHEDULE_NEW, SCH_SCHEDULE_TERM,
SCH_SCHEDULE_TIME, SCH_SCHEDULE_RECYC, SCH_SCHEDULE_CMD,
SCH_SCHEDULE_FIRST}, information about server(s), jobs, queues, and
execution host(s) is obtained, and then sched_main() is called.

The BAtch Scheduling Language (BASL) is a C-like procedural language.
It provides a number of constructs and predefined functions that
facilitate dealing with scheduling issues. Information about a PBS
server, the queues that it owns, the jobs residing in each queue, and
the computational nodes where jobs can be run is accessed via the BASL
data types Server, Que, Job, CNode, Set Server, Set Que, Set Job, and
Set CNode.

The following simple sched_main() will cause the server to run all
queued jobs on the local server:

       sched_main()
       {
           Server s;
           Que q;
           Job j;
           Set Que queues;
           Set Job jobs;

           s = AllServersLocalHostGet();   // get local server
           queues = ServerQueuesGet(s);

           foreach( q in queues ) {
               jobs = QueJobsGet(q);
               foreach( j in jobs ) {
                   JobAction(j, SYNCRUN, NULLSTR);
               }
           }
       }

For a more complete discussion of the Batch Scheduling Language, see
basl2c(1B).

A configuration file may be specified with the -c option. This file is
used to specify (1) the hosts that are allowed to connect to pbs_sched,
(2) the list of server hosts that the scheduler writer wishes the
system to check periodically for status, queue, and job information,
(3) the list of execution hosts that the scheduler writer wants the
system to check periodically for information such as state and
properties, and (4) the various queries to send to each execution
host.

(1) specifying client hosts:
       The hosts allowed to connect to pbs_sched are specified in the
       configuration file in a manner identical to that used in pbs_mom.
       There is one line per host using the syntax:

           $clienthost hostname

       where $clienthost and hostname are separated by white space. Two
       host names are always allowed to connect to pbs_sched:
       "localhost" and the name returned to pbs_sched by the system call
       gethostname(). These names need not be specified in the
       configuration file.

(2) specifying the list of servers:
       The list of servers is specified one host per line, using the
       syntax:

           $serverhost hostname port_number

       where $serverhost, hostname, and port_number are separated by
       white space.

       If port_number is 0, then the default PBS server port will be
       used.

       Regardless of what has been specified in the file, the list of
       servers will always include the local server, that is, the one
       running on the same host as the scheduler.

       Within the BASL code, the list of servers is accessed by calling
       AllServersGet(), or AllServersLocalHostGet(), which returns the
       local server on the list.

(3) specifying the list of execution hosts:
       The list of execution hosts (nodes) whose MOMs are to be queried
       by the scheduler is specified one host per line, using the
       syntax:

           $momhost hostname port_number

       where $momhost, hostname, and port_number are separated by white
       space.

       If port_number is 0, then the default PBS MOM port will be used.

       The BASL function AllNodesGet(), or
       ServerNodesGet(AllServersLocalHostGet()), is available for
       getting the list of nodes known to the local system.

(4) specifying the list of host resources:
       To specify the list of host resource queries to send to each
       execution host's MOM, the following syntax is used:

           $node node_name CNode..Get host_resource

       node_name should be the same hostname string that was specified
       in a $momhost line. A node_name value of "*" (wildcard) means to
       match any node.

       Please consult section 9 of the PBS ERS (Resource
       Monitor/Resources) for a list of possible values of the
       host_resource parameter.

       CNode..Get refers to the actual function name that is called from
       the scheduler code to obtain the return values of host resource
       queries. The CNode..Get function names that can appear in the
       configuration file are:

           STATIC:
           ================================
           CNodePropertiesGet
           CNodeVendorGet
           CNodeNumCpusGet
           CNodeOsGet
           CNodeMemTotalGet[type]
           CNodeNetworkBwGet[type]
           CNodeSwapSpaceTotalGet[name]
           CNodeDiskSpaceTotalGet[name]
           CNodeDiskInBwGet[name]
           CNodeDiskOutBwGet[name]
           CNodeTapeSpaceTotalGet[name]
           CNodeTapeInBwGet[name]
           CNodeTapeOutBwGet[name]
           CNodeSrfsSpaceTotalGet[name]
           CNodeSrfsInBwGet[name]
           CNodeSrfsOutBwGet[name]

           DYNAMIC:
           ================================
           CNodeIdletimeGet
           CNodeLoadAveGet
           CNodeMemAvailGet[type]
           CNodeSwapSpaceAvailGet[name]
           CNodeSwapInBwGet[name]
           CNodeSwapOutBwGet[name]
           CNodeDiskSpaceReservedGet[name]
           CNodeDiskSpaceAvailGet[name]
           CNodeTapeSpaceAvailGet[name]
           CNodeSrfsSpaceReservedGet[name]
           CNodeSrfsSpaceAvailGet[name]
           CNodeCpuPercentIdleGet
           CNodeCpuPercentSysGet
           CNodeCpuPercentUserGet
           CNodeCpuPercentGuestGet

       STATIC function names return values that are obtained only during
       the first scheduling cycle, or when the scheduler is instructed
       to reconfigure, whereas DYNAMIC function names return attribute
       values that are taken at every subsequent scheduling cycle.

       name and type are arbitrarily defined. For example, you can
       choose to have name defined as "$FASTDIR" for the CNodeSrfs*
       calls, and a sample configuration file entry would look like:

           $node unicos8 CNodeSrfsSpaceAvailGet[$FASTDIR] quota[type=ares_avail,dir=$FASTDIR]

       So in BASL code, if you call CNodeSrfsSpaceAvailGet(node,
       "$FASTDIR"), it will return the value of the query
       "quota[type=ares_avail,dir=$FASTDIR]" (the 3rd parameter of the
       $node entry) as sent to the node's MOM.

       By default, the scheduler has already internally defined the
       following mappings, which can be overridden in the configuration
       file:

           keyword  node_name  CNode..Get        host_resource
           =======  =========  ================  =============
           $node    *          CNodeOsGet        arch
           $node    *          CNodeLoadAveGet   loadave
           $node    *          CNodeIdletimeGet  idletime

       The above means that for all declared nodes (via $momhost), the
       host queries arch, loadave, and idletime will be sent to each
       node's MOM. The value of arch is obtained internally by the
       system during the first scheduling cycle because it falls under
       the STATIC category, while the values of loadave and idletime are
       taken at every scheduling iteration because they fall under the
       DYNAMIC category. The return values are accessed by calling
       CNodeOsGet(node), CNodeLoadAveGet(node), and
       CNodeIdletimeGet(node), respectively. The following are some
       sample $node arguments that you may put in the configuration
       file.

       node_name            CNode..Get                        host_resource
       ===================  ================================  ===================================
       <sunos4_nodename>    CNodeIdletimeGet                  idletime
       <sunos4_nodename>    CNodeLoadAveGet                   loadave
       <sunos4_nodename>    CNodeMemTotalGet[real]            physmem
       <sunos4_nodename>    CNodeMemTotalGet[virtual]         totmem
       <sunos4_nodename>    CNodeMemAvailGet[virtual]         availmem

       <irix5_nodename>     CNodeNumCpusGet                   ncpus
       <irix5_nodename>     CNodeMemTotalGet[real]            physmem
       <irix5_nodename>     CNodeMemTotalGet[virtual]         totmem
       <irix5_nodename>     CNodeIdletimeGet                  idletime
       <irix5_nodename>     CNodeLoadAveGet                   loadave
       <irix5_nodename>     CNodeMemAvailGet[virtual]         availmem

       <linux_nodename>     CNodeNumCpusGet                   ncpus
       <linux_nodename>     CNodeMemTotalGet[real]            physmem
       <linux_nodename>     CNodeMemTotalGet[virtual]         totmem
       <linux_nodename>     CNodeIdletimeGet                  idletime
       <linux_nodename>     CNodeLoadAveGet                   loadave
       <linux_nodename>     CNodeMemAvailGet[virtual]         availmem

       <solaris5_nodename>  CNodeIdletimeGet                  idletime
       <solaris5_nodename>  CNodeLoadAveGet                   loadave
       <solaris5_nodename>  CNodeNumCpusGet                   ncpus
       <solaris5_nodename>  CNodeMemTotalGet[real]            physmem

       <aix4_nodename>      CNodeIdletimeGet                  idletime
       <aix4_nodename>      CNodeLoadAveGet                   loadave
       <aix4_nodename>      CNodeMemTotalGet[virtual]         totmem
       <aix4_nodename>      CNodeMemAvailGet[virtual]         availmem

       <unicos8_nodename>   CNodeIdletimeGet                  idletime
       <unicos8_nodename>   CNodeLoadAveGet                   loadave
       <unicos8_nodename>   CNodeNumCpusGet                   ncpus
       <unicos8_nodename>   CNodeMemTotalGet[real]            physmem
       <unicos8_nodename>   CNodeMemAvailGet[virtual]         availmem
       <unicos8_nodename>   CNodeSwapSpaceTotalGet[primary]   swaptotal
       <unicos8_nodename>   CNodeSwapSpaceAvailGet[primary]   swapavail
       <unicos8_nodename>   CNodeSwapInBwGet[primary]         swapinrate
       <unicos8_nodename>   CNodeSwapOutBwGet[primary]        swapoutrate
       <unicos8_nodename>   CNodeCpuPercentIdleGet            cpuidle
       <unicos8_nodename>   CNodeCpuPercentSysGet             cpuunix
       <unicos8_nodename>   CNodeCpuPercentGuestGet           cpuguest
       <unicos8_nodename>   CNodeCpuPercentUserGet            cpuuser
       <unicos8_nodename>   CNodeSrfsSpaceAvailGet[$FASTDIR]  quota[type=ares_avail,dir=$FASTDIR]
       <unicos8_nodename>   CNodeSrfsSpaceAvailGet[$BIGDIR]   quota[type=ares_avail,dir=$BIGDIR]
       <unicos8_nodename>   CNodeSrfsSpaceAvailGet[$WRKDIR]   quota[type=ares_avail,dir=$WRKDIR]

       <sp2_nodename>       CNodeLoadAveGet                   loadave

       Suppose you have an execution host of the irix5 OS type; then
       the <irix5_nodename> entries will be consulted by the scheduler.
       The initial scheduling cycle would involve sending the STATIC
       queries ncpus, physmem, and totmem to the execution host's MOM,
       and the return values of the queries are accessed via
       CNodeNumCpusGet(node), CNodeMemTotalGet(node, "real"), and
       CNodeMemTotalGet(node, "virtual"), respectively, where node is
       the CNode representation of the execution host. The subsequent
       scheduling cycles will send only the DYNAMIC queries idletime,
       loadave, and availmem, and the return values of the queries are
       accessed via CNodeIdletimeGet(node), CNodeLoadAveGet(node), and
       CNodeMemAvailGet(node, "virtual"), respectively.
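
The split between STATIC and DYNAMIC queries in the irix5 example can
be sketched in Python. This is a sketch under stated assumptions, not
the scheduler's implementation: the helper names base_name and
queries_for_cycle are hypothetical, and the sketch takes the wording
above literally, so the first cycle sends only STATIC queries and
later cycles send only DYNAMIC ones.

```python
# Function names taken from the STATIC and DYNAMIC lists above,
# without their optional [name] or [type] suffix.
STATIC = {
    "CNodePropertiesGet", "CNodeVendorGet", "CNodeNumCpusGet", "CNodeOsGet",
    "CNodeMemTotalGet", "CNodeNetworkBwGet", "CNodeSwapSpaceTotalGet",
    "CNodeDiskSpaceTotalGet", "CNodeDiskInBwGet", "CNodeDiskOutBwGet",
    "CNodeTapeSpaceTotalGet", "CNodeTapeInBwGet", "CNodeTapeOutBwGet",
    "CNodeSrfsSpaceTotalGet", "CNodeSrfsInBwGet", "CNodeSrfsOutBwGet",
}
DYNAMIC = {
    "CNodeIdletimeGet", "CNodeLoadAveGet", "CNodeMemAvailGet",
    "CNodeSwapSpaceAvailGet", "CNodeSwapInBwGet", "CNodeSwapOutBwGet",
    "CNodeDiskSpaceReservedGet", "CNodeDiskSpaceAvailGet",
    "CNodeTapeSpaceAvailGet", "CNodeSrfsSpaceReservedGet",
    "CNodeSrfsSpaceAvailGet", "CNodeCpuPercentIdleGet",
    "CNodeCpuPercentSysGet", "CNodeCpuPercentUserGet",
    "CNodeCpuPercentGuestGet",
}


def base_name(function):
    """Strip an optional suffix, e.g. CNodeMemTotalGet[real] -> CNodeMemTotalGet."""
    return function.split("[", 1)[0]


def queries_for_cycle(entries, first_cycle):
    """Return the host_resource queries to send to the MOM in this cycle."""
    wanted = STATIC if first_cycle else DYNAMIC
    return [resource for function, resource in entries
            if base_name(function) in wanted]


# The <irix5_nodename> entries from the sample table.
irix5 = [
    ("CNodeNumCpusGet", "ncpus"),
    ("CNodeMemTotalGet[real]", "physmem"),
    ("CNodeMemTotalGet[virtual]", "totmem"),
    ("CNodeIdletimeGet", "idletime"),
    ("CNodeLoadAveGet", "loadave"),
    ("CNodeMemAvailGet[virtual]", "availmem"),
]

print(queries_for_cycle(irix5, first_cycle=True))   # ['ncpus', 'physmem', 'totmem']
print(queries_for_cycle(irix5, first_cycle=False))  # ['idletime', 'loadave', 'availmem']
```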

When entries conflict, those appearing later in the configuration file
take precedence.
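
One way to picture this precedence rule is a scan in which the last
matching $node entry wins. The Python sketch below makes two
assumptions that the text does not spell out: the built-in default
mappings behave like entries placed before the file's own entries,
and a later entry wins regardless of whether it names a node or uses
the "*" wildcard. The function name lookup, the host names hostA and
hostB, and the query string custom_arch are hypothetical.

```python
# Built-in default mappings, as (node_name, function, host_resource).
DEFAULTS = [
    ("*", "CNodeOsGet", "arch"),
    ("*", "CNodeLoadAveGet", "loadave"),
    ("*", "CNodeIdletimeGet", "idletime"),
]


def lookup(entries, node, function):
    """Return the host_resource for (node, function); the last match wins."""
    result = None
    for node_name, func, resource in entries:
        # "*" matches any node; otherwise the node name must be equal.
        if func == function and node_name in ("*", node):
            result = resource
    return result


# Hypothetical file entry that overrides the default for one host.
entries = DEFAULTS + [("hostA", "CNodeOsGet", "custom_arch")]

print(lookup(entries, "hostA", "CNodeOsGet"))  # custom_arch
print(lookup(entries, "hostB", "CNodeOsGet"))  # arch
```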

The configuration file must be "secure": it must be owned by a user ID
and group ID less than 10 and must not be world-writable.
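
An administrator who wants to check a file against this rule before
starting the scheduler could use something like the following Python
sketch. It only restates the three conditions above; the function
names is_secure and config_is_secure are hypothetical, and the
scheduler's own check may differ in detail.

```python
import os
import stat


def is_secure(uid, gid, mode):
    """Apply the rule: owner uid < 10, group gid < 10, not world-writable."""
    return uid < 10 and gid < 10 and not (mode & stat.S_IWOTH)


def config_is_secure(path):
    """Check an existing file on disk against the same rule."""
    st = os.stat(path)
    return is_secure(st.st_uid, st.st_gid, st.st_mode)
```

For instance, a file owned by root (uid 0, gid 0) with mode 0644
passes, while the same file with mode 0646 fails because it is
world-writable.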

On receipt of a SIGHUP signal, the scheduler will close and reopen its
log file and reread its configuration file (if any).
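
To make the file format described in this section concrete, here is a
rough Python reader for the four directive types. It is a sketch under
stated assumptions rather than the scheduler's parser: the function
name parse_config and the host name hostA are hypothetical, fields are
assumed to contain no embedded white space, and anything the text does
not describe (such as comment lines) is rejected.

```python
def parse_config(text):
    """Parse $clienthost, $serverhost, $momhost, and $node lines."""
    config = {"clienthosts": [], "serverhosts": [], "momhosts": [], "nodes": []}
    for line in text.splitlines():
        fields = line.split()          # fields are separated by white space
        if not fields:
            continue                   # skip blank lines
        keyword, args = fields[0], fields[1:]
        if keyword == "$clienthost" and len(args) == 1:
            config["clienthosts"].append(args[0])
        elif keyword in ("$serverhost", "$momhost") and len(args) == 2:
            # A port_number of 0 stands for the default PBS port.
            config[keyword[1:] + "s"].append((args[0], int(args[1])))
        elif keyword == "$node" and len(args) == 3:
            # (node_name, CNode..Get function, host_resource)
            config["nodes"].append(tuple(args))
        else:
            raise ValueError("unrecognized configuration line: " + line)
    return config


sample = """\
$clienthost hostA
$serverhost hostA 0
$momhost unicos8 0
$node unicos8 CNodeSrfsSpaceAvailGet[$FASTDIR] quota[type=ares_avail,dir=$FASTDIR]
"""

parsed = parse_config(sample)
print(parsed["momhosts"])  # [('unicos8', 0)]
```

Keeping the $node entries in file order makes it straightforward to
let later entries take precedence when they are looked up.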

$PBS_SERVER_HOME/sched_priv
       the default directory for configuration files, typically
       (/usr/spool/pbs)/sched_priv.

A C-based scheduler will handle the following signals:

SIGHUP The scheduler will close and reopen its log file and reread the
       configuration file if one exists.

SIGALRM
       If the site-supplied scheduling module exceeds the time limit,
       the alarm will cause the scheduler to attempt to core dump and
       restart itself.

SIGINT and SIGTERM
       Will result in an orderly shutdown of the scheduler.

All other signals have the default action installed.
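
The signal behavior listed above can be pictured with a small Python
sketch for a POSIX system. It is illustrative only: the class
SchedulerSignals, the exception CycleTimeout, and the function
run_cycle are hypothetical names, the handlers merely set flags, and
the alarm handler raises an exception where the real scheduler would
attempt to core dump and restart itself.

```python
import signal


class CycleTimeout(Exception):
    """Raised by this sketch when a scheduling cycle outlives the alarm."""


class SchedulerSignals:
    """Hypothetical holder for the flags that the handlers set."""

    def __init__(self):
        self.reload_requested = False    # set on SIGHUP
        self.shutdown_requested = False  # set on SIGINT or SIGTERM

    def on_hup(self, signum, frame):
        # Main loop would reopen the log file and reread the config file.
        self.reload_requested = True

    def on_term(self, signum, frame):
        # Main loop would perform an orderly shutdown.
        self.shutdown_requested = True

    def on_alarm(self, signum, frame):
        # Stand-in for the core dump and restart described above.
        raise CycleTimeout("scheduling cycle exceeded the alarm limit")

    def install(self):
        signal.signal(signal.SIGHUP, self.on_hup)
        signal.signal(signal.SIGINT, self.on_term)
        signal.signal(signal.SIGTERM, self.on_term)
        signal.signal(signal.SIGALRM, self.on_alarm)


def run_cycle(cycle, alarm=180):
    """Run one scheduling cycle under an alarm of the given number of seconds."""
    signal.alarm(alarm)      # arm the timer before the cycle starts
    try:
        return cycle()
    finally:
        signal.alarm(0)      # cancel the timer once the cycle returns
```

The default of 180 seconds in run_cycle mirrors the default of the -a
option.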

Upon normal termination, an exit status of zero is returned.

basl2c(1B), pbs_sched_tcl(8B), pbs_server(8B), and pbs_mom(8B).
PBS Internal Design Specification

Local                                               pbs_scheduler_basl(8B)