pbs_sched(8B)                         PBS                         pbs_sched(8B)

NAME
pbs_sched_tcl - pbs Tcl scheduler

SYNOPSIS
pbs_sched [-a alarm] [-b file] [-d home] [-i file] [-L logfile]
          [-p file] [-S port] [-t file] [-v] [-c file]

DESCRIPTION
The pbs_sched program runs in conjunction with the PBS server. It
queries the server about the state of PBS and communicates with pbs_mom
to get information about the status of running jobs, available memory,
and so on. It then decides which jobs to run.

pbs_sched must be executed with root permission.

OPTIONS
-a alarm
    Specifies the time in seconds to wait for a schedule run to finish.
    If a script takes too long to finish, an alarm signal is sent and
    the scheduler is restarted. If a core file does not exist in the
    current directory, abort() is called and a core file is generated.
    The default for alarm is 180 seconds.

-b file
    Specifies the "body" file. The file is read into memory once at
    program start, or after the program receives a SIGHUP, and is
    executed each time the scheduler is awakened by the server. If this
    option is not given, the file "sched_tcl" in the directory
    PBS_HOME/sched_priv is read for the body code.

-d home
    Specifies the PBS home directory, PBS_HOME. The current working
    directory of the scheduler is PBS_HOME/sched_priv. If this option
    is not given, PBS_HOME defaults to $PBS_SERVER_HOME as defined
    during the PBS build procedure.

-i file
    Specifies the "initialize" file. The file is executed once before
    the main processing loop is entered. If this option is not given,
    no initialization code is executed.

-L logfile
    Specifies the absolute path name of the file to use as the log
    file. If not specified, the scheduler opens a file named for the
    current date in the PBS_HOME/sched_logs directory (see the -d
    option).

-p file
    Specifies the "print" file. Any output from the Tcl code that is
    written to standard output or standard error is written to this
    file. If this option is not given, the file used is
    PBS_HOME/sched_priv/sched_out (see the -d option).

-S port
    Specifies the port to use. If this option is not given, the default
    port for the PBS scheduler is used.

-t file
    Specifies the "terminator" file. If a QUIT command is sent from the
    server, this code is executed before the scheduler exits. If this
    option is not given, no special termination handling is done.

-v
    Puts the scheduler into "verbose" mode. Errors are always reported
    regardless of this setting; with this flag, some "uninteresting"
    events are logged as well. An example is a message each time the
    server contacts the scheduler.

-c file
    Specifies a configuration file, described below. If this is a
    relative file name, it is taken relative to PBS_HOME/sched_priv
    (see the -d option). If the -c option is not supplied, pbs_sched
    does not attempt to open a configuration file.

File names given to these options may be absolute or relative. Relative
names are resolved against PBS_HOME/sched_priv.

USAGE
This version of the scheduler requires knowledge of the Tcl language.
A set of functions for communicating with the PBS server and resource
monitor has been added to those normally available with Tcl. All of
these calls set the Tcl variable "pbs_errno" to indicate whether an
error occurred. In all cases, the value "0" means no error. If a call
to a Resource Monitor function is made, any error value comes from the
system-supplied errno variable. If the function call communicates with
the PBS Server, any error value comes from the error number returned by
the server.
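
As a minimal sketch of the error convention described above, a script
might wrap a call and inspect "pbs_errno" afterwards. The helper name
check_call is hypothetical. The stub definition is an assumption added
only so the fragment also runs under a plain tclsh, where the PBS
commands do not exist; inside pbs_sched the real command is used.

```tcl
# Stand-in used only outside pbs_sched, where pbsstatserv is undefined.
# It mimics the documented contract: set pbs_errno, return a 3-element list.
if {[llength [info commands pbsstatserv]] == 0} {
    proc pbsstatserv {} {
        global pbs_errno
        set pbs_errno 0
        return [list stub.server {} {}]
    }
}

# Hypothetical helper: run one PBS command and return {ok result},
# where ok is 1 if pbs_errno was 0 after the call and 0 otherwise.
proc check_call {args} {
    global pbs_errno
    set result [uplevel 1 $args]
    if {$pbs_errno != 0} {
        puts stderr "[lindex $args 0] failed, pbs_errno=$pbs_errno"
        return [list 0 $result]
    }
    return [list 1 $result]
}

set reply [check_call pbsstatserv]
set ok    [lindex $reply 0]
set info  [lindex $reply 1]
```

Testing "pbs_errno" rather than the return value alone is useful for
calls such as closerm that are documented to return nothing.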

openrm host ?port?
    Creates a connection to the PBS Resource Monitor on host, using
    port as the port number or the standard port for the resource
    monitor if it is not given. A connection handle is returned. If
    the open is successful, this is a non-negative integer. If not, an
    error occurred.

closerm connection
    The parameter connection is a handle to a resource monitor which
    was previously returned from openrm. This connection is closed.
    Nothing is returned.

downrm connection
    Sends a command to the connected resource monitor to shut down.
    Nothing is returned.

configrm connection filename
    Sends a command to the connected resource monitor to read the
    configuration file given by filename. If this is successful, "0"
    is returned; otherwise, "-1" is returned.

addreq connection request
    A resource request is sent to the connected resource monitor. If
    this is successful, "0" is returned; otherwise, "-1" is returned.

getreq connection
    One resource request response from the connected resource monitor
    is returned. If an error occurred or there are no more responses,
    an empty string is returned.

allreq request
    A resource request is sent to all connected resource monitors.
    The number of streams acted upon is returned.

flushreq
    All resource requests previously sent to all connected resource
    monitors are flushed out to the network. Nothing is returned.

activereq
    The connection number of the next stream with something to read
    is returned. If there is nothing to read from any of the
    connections, a negative number is returned.

fullresp flag
    Evaluates flag as a boolean value and sets the response mode used
    by getreq to full if flag evaluates to "true". The full return
    from a resource monitor includes the original request followed by
    an equal sign followed by the response. By default, only the
    response following the equal sign is returned. If a script needs
    to "see" the entire line, this function may be used.
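
The resource monitor calls above are typically used together: open a
connection, queue requests, read one response per request, and close.
The following is a sketch only. The helper query_mom is hypothetical,
the resource names "loadave" and "physmem" are assumed to be known to
the resource monitor, and the stubs with their canned reply exist only
so the fragment runs under a plain tclsh.

```tcl
# Stand-ins for use outside pbs_sched; the canned reply "0.25" is made up.
if {[llength [info commands openrm]] == 0} {
    set stub_queue {}
    proc openrm {host {port 0}} { return 0 }
    proc addreq {con req} { global stub_queue; lappend stub_queue "0.25"; return 0 }
    proc getreq {con} {
        global stub_queue
        set r [lindex $stub_queue 0]
        set stub_queue [lrange $stub_queue 1 end]
        return $r
    }
    proc closerm {con} {}
}

# Hypothetical helper: ask one resource monitor for several values and
# return a flat list of request/response pairs.
proc query_mom {host requests} {
    set con [openrm $host]
    if {$con < 0} { return {} }        ;# negative handle: open failed
    set sent {}
    foreach req $requests {
        if {[addreq $con $req] == 0} { lappend sent $req }
    }
    set answers {}
    foreach req $sent {
        # One response is read for each request that was accepted.
        lappend answers $req [getreq $con]
    }
    closerm $con
    return $answers
}

set answers [query_mom localhost {loadave physmem}]
```

Because getreq returns an empty string both on error and when no
responses remain, the sketch counts accepted requests instead of
reading until the result is empty.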

pbsstatserv
    The server is sent a status request for information about the
    server itself. If the request succeeds, a list with three elements
    is returned; otherwise an empty string is returned. The first
    element is the server's name. The second is a list of attributes.
    The third is the "text" associated with the server (usually
    blank).

pbsstatjob
    The server is sent a status request for information about all jobs
    resident within the server. If the request succeeds, a list is
    returned; otherwise an empty string is returned. The list contains
    an entry for each job. Each element is a list with three elements.
    The first is the job's jobid. The second is a list of attributes.
    The attribute names which specify resources will have a name of
    the form "Resource_List:name" where "name" is the resource name.
    The third is the "text" associated with the job (usually blank).

pbsstatque
    The server is sent a status request for information about all
    queues resident within the server. If the request succeeds, a list
    is returned; otherwise an empty string is returned. The list
    contains an entry for each queue. Each element is a list with
    three elements. The first is the queue's name. The second is a
    list of attributes similar to pbsstatjob. The third is the "text"
    associated with the queue (usually blank).

pbsstatnode
    The server is sent a status request for information about all
    nodes defined within the server. If the request succeeds, a list
    is returned; otherwise an empty string is returned. The list
    contains an entry for each node. Each element is a list with three
    elements. The first is the node's name. The second is a list of
    attributes similar to pbsstatjob. The third is the "text"
    associated with the node (usually blank).

pbsselstat
    The server is sent a status request for information about all
    runnable jobs resident within the server. If the request succeeds,
    a list similar to pbsstatjob is returned; otherwise an empty
    string is returned.
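
The nested lists returned by the status calls can be walked with
ordinary Tcl list commands. The sketch below assumes that each entry in
an attribute list is a {name value} pair; the text above does not spell
out that layout, so treat it as an assumption. The helper job_attr and
the stub data are hypothetical.

```tcl
# Stand-in for use outside pbs_sched, returning two made-up jobs in the
# documented {jobid attributes text} shape.
if {[llength [info commands pbsselstat]] == 0} {
    proc pbsselstat {} {
        return {
            {1.stub {{Job_Owner alice@stub} {Resource_List:walltime 01:00:00}} {}}
            {2.stub {{Job_Owner bob@stub} {Resource_List:walltime 00:10:00}} {}}
        }
    }
}

# Hypothetical helper: look up one attribute by name.
# ASSUMPTION: every attribute is a two-element {name value} list.
proc job_attr {attrs name} {
    foreach attr $attrs {
        if {[string compare [lindex $attr 0] $name] == 0} {
            return [lindex $attr 1]
        }
    }
    return ""
}

# Build a flat list of jobid/walltime pairs for the runnable jobs.
set summary {}
foreach job [pbsselstat] {
    set jobid [lindex $job 0]
    set attrs [lindex $job 1]
    lappend summary $jobid [job_attr $attrs Resource_List:walltime]
}
```

The same loop applies to pbsstatjob, pbsstatque and pbsstatnode, since
all of them return entries with the same three-element shape.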

pbsrunjob jobid ?location?
    Run the job given by jobid at the location given by location. If
    location is not given, the default location is used. If this is
    successful, "0" is returned; otherwise, "-1" is returned.

pbsasyrunjob jobid ?location?
    Run the job given by jobid at the location given by location
    without waiting for a positive response that the job has actually
    started. If location is not given, the default location is used.
    If this is successful, "0" is returned; otherwise, "-1" is
    returned.
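
Combining pbsselstat with pbsrunjob gives the simplest possible policy:
try to start every runnable job in the order the server lists it. This
is an illustrative sketch, not a recommended scheduler. The helper
run_fifo is hypothetical and the stubs exist only so the fragment runs
under a plain tclsh.

```tcl
# Stand-ins for use outside pbs_sched.
if {[llength [info commands pbsrunjob]] == 0} {
    proc pbsselstat {} { return {{1.stub {} {}} {2.stub {} {}}} }
    proc pbsrunjob {jobid {location ""}} { return 0 }
}

# Hypothetical helper: attempt to run each runnable job at its default
# location and return the ids of the jobs that were started.
proc run_fifo {} {
    global pbs_errno
    set started {}
    foreach job [pbsselstat] {
        set jobid [lindex $job 0]
        if {[pbsrunjob $jobid] == 0} {
            lappend started $jobid
        } else {
            puts stderr "could not run $jobid, pbs_errno=$pbs_errno"
        }
    }
    return $started
}

set started [run_fifo]
```

Substituting pbsasyrunjob would avoid waiting for each job to start, at
the cost of not learning about a failed start in the same call.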

pbsrerunjob jobid
    Re-runs the job given by jobid. If this is successful, "0" is
    returned; otherwise, "-1" is returned.

pbsdeljob jobid
    Delete the job given by jobid. If this is successful, "0" is
    returned; otherwise, "-1" is returned.

pbsholdjob jobid
    Place a hold on the job given by jobid. If this is successful,
    "0" is returned; otherwise, "-1" is returned.

pbsmovejob jobid ?location?
    Move the job given by jobid to the location given by location.
    If location is not given, the default location is used. If this
    is successful, "0" is returned; otherwise, "-1" is returned.

pbsqenable queue
    Set the "enabled" attribute for the queue given by queue to true.
    If this is successful, "0" is returned; otherwise, "-1" is
    returned.

pbsqdisable queue
    Set the "enabled" attribute for the queue given by queue to
    false. If this is successful, "0" is returned; otherwise, "-1"
    is returned.

pbsqstart queue
    Set the "started" attribute for the queue given by queue to true.
    If this is successful, "0" is returned; otherwise, "-1" is
    returned.

pbsqstop queue
    Set the "started" attribute for the queue given by queue to
    false. If this is successful, "0" is returned; otherwise, "-1"
    is returned.

pbsalterjob jobid attribute_list
    Alter the attributes for a job specified by jobid. The parameter
    attribute_list is the list of attributes to be altered. There
    can be more than one. Each attribute consists of a list of three
    elements: the first is the name, the second is the resource, and
    the third is the new value. If the alter is successful, "0" is
    returned; otherwise, "-1" is returned.
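
A sketch of how an attribute_list for pbsalterjob might be built
follows. It assumes that the resource element is left empty for an
attribute that has no resource part; the text above does not say so.
The job id, attribute values, and stub are made up for illustration.

```tcl
# Stand-in for use outside pbs_sched: accepts the call only if every
# attribute has the documented three elements.
if {[llength [info commands pbsalterjob]] == 0} {
    proc pbsalterjob {jobid attribute_list} {
        foreach attr $attribute_list {
            if {[llength $attr] != 3} { return -1 }
        }
        return 0
    }
}

# Each attribute is {name resource value}.
# ASSUMPTION: an empty resource element is used when none applies.
set changes [list \
    [list Resource_List walltime 02:00:00] \
    [list Priority "" 10]]

set rc [pbsalterjob 1.stub $changes]
```

Building the list with the list command rather than by hand keeps the
empty resource element intact as a real list element.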

pbsrescquery resource_list
    Obtain information about the resources specified by
    resource_list, which is a list of strings. If the request
    succeeds, a list with the same number of elements as
    resource_list is returned. Each element in this list is a list
    of four numbers. The numbers specify available, allocated,
    reserved, and down, in that order.

pbsrescreserve resource_id resource_list
    Make (or extend) a reservation for the resources specified by
    resource_list, which is given as a list of strings. The parameter
    resource_id is a number which provides a unique identifier for a
    reservation being tracked by the server. If resource_id is given
    as "0", a new reservation is created. In this case, a new
    identifier is generated and returned by the function. If an old
    identifier is used, that same number is returned. The Tcl
    variable "pbs_errno" is set to indicate the success or failure of
    the reservation.

pbsrescrelease resource_id
    The reservation specified by resource_id is released.
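
The reservation calls follow a create, extend, release pattern keyed on
the identifier. The sketch below illustrates that flow. The resource
strings "nodes=2" and "nodes=4" are assumed examples of the string
format, and the stubs exist only so the fragment runs under a plain
tclsh.

```tcl
# Stand-ins for use outside pbs_sched: hand out increasing identifiers
# for new reservations and echo back existing ones.
if {[llength [info commands pbsrescreserve]] == 0} {
    set stub_next_id 0
    proc pbsrescreserve {id resources} {
        global pbs_errno stub_next_id
        set pbs_errno 0
        if {$id == 0} { return [incr stub_next_id] }
        return $id
    }
    proc pbsrescrelease {id} { global pbs_errno; set pbs_errno 0 }
}

# Passing 0 asks for a new reservation; the new identifier is returned.
set id [pbsrescreserve 0 {nodes=2}]
set same ""
if {$pbs_errno == 0} {
    # Reusing the identifier extends the existing reservation and
    # returns the same number.
    set same [pbsrescreserve $id {nodes=4}]
    pbsrescrelease $id
}
```

Success is judged from "pbs_errno" here because the return value of
pbsrescreserve is the identifier, not a status code.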

The two following commands are not normally used by the scheduler.
They are included here because a scheduler may need to contact a
server other than the one with which it normally communicates. These
commands are also used by the Tcl tools.

pbsconnect ?server?
    Make a connection to the named server, or to the default server
    if a parameter is not given. Only one connection to a server is
    allowed at any one time.

pbsdisconnect
    Disconnect from the currently connected server.

The above Tcl functions use PBS interface library calls for
communication with the server and the PBS resource monitor library to
communicate with pbs_mom.

datetime ?day? ?time?
    The number of arguments determines the type of date to be
    calculated. With no arguments, the current POSIX date is
    returned. This is an integer in seconds.

    With one argument there are two possible formats. The first is a
    12 (or more) character string specifying a complete date in the
    following format:
        YYMMDDhhmmss

    All characters must be digits. The year (YY) is given by the
    first two (or more) characters and is the number of years since
    1900. The month (MM) is the number of the month [01-12]. The
    day (DD) is the day of the month [01-31]. The hour (hh) is the
    hour of the day [00-23]. The minute (mm) is minutes after the
    hour [00-59]. The second (ss) is seconds after the minute
    [00-59]. The POSIX date for the given date/time is returned.

    The second option with one argument is a relative time. The
    format for this is
        HH:MM:SS

    with hours (HH), minutes (MM) and seconds (SS) separated by
    colons ":". The number returned in this case is the number of
    seconds in the interval specified, not an absolute POSIX date.

    With two arguments a relative date is calculated. The first
    argument specifies a day of the week and must be one of the
    following strings: "Sun", "Mon", "Tue", "Wed", "Thr", "Fri", or
    "Sat". The second argument is a relative time as given above.
    The POSIX date calculated is the day of the week given which
    follows the current day, at the time given in the second
    argument. For example, if the current day is Monday and the two
    arguments are "Fri" and "04:30:00", the date calculated is the
    POSIX date for the Friday following the current Monday, at
    four-thirty in the morning. If the day specified and the current
    day are the same, the current day is used, not the day one week
    later.
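
The relative-time form can be illustrated without the scheduler. The
procedure below is a plain Tcl sketch of the conversion described
above, not the scheduler's own implementation; the name hms_to_seconds
is made up for the example.

```tcl
# Convert an "HH:MM:SS" interval to a number of seconds.
proc hms_to_seconds {hms} {
    # %d reads decimal digits, so "08" is read as 8 rather than being
    # rejected as an invalid octal number.
    if {[scan $hms "%d:%d:%d" h m s] != 3} {
        error "expected HH:MM:SS, got \"$hms\""
    }
    return [expr {$h * 3600 + $m * 60 + $s}]
}

# 4 hours and 30 minutes: 4*3600 + 30*60 = 16200 seconds.
set half_past_four [hms_to_seconds 04:30:00]
```

Inside pbs_sched, "datetime 04:30:00" is documented to return this kind
of interval, while "datetime Fri 04:30:00" returns an absolute POSIX
date.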

strftime format time
    This function calls the POSIX function strftime(). It requires
    two arguments. The first is a format string. The format
    conventions are the same as those for the POSIX function
    strftime(). The second argument is POSIX calendar time in seconds
    as returned by datetime. It returns a string based on the format
    given. This gives the ability to extract information about a
    time, or to format it for printing.
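
A short sketch of formatting the current time for a log line follows.
Outside pbs_sched neither command exists, so the fragment falls back to
stand-ins built on Tcl's own clock command. That fallback is an
assumption made for illustration: it covers only the no-argument form
of datetime and the common conversions shared with strftime().

```tcl
# Stand-ins for use outside pbs_sched, built on the Tcl clock command.
if {[llength [info commands strftime]] == 0} {
    proc strftime {format time} {
        return [clock format $time -format $format]
    }
}
if {[llength [info commands datetime]] == 0} {
    # Only the no-argument form (current POSIX time) is imitated.
    proc datetime {args} { return [clock seconds] }
}

set now   [datetime]
set stamp [strftime "%Y-%m-%d %H:%M:%S" $now]
```

The argument order is format first and time second, which is the
reverse of the clock format command used in the stand-in.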

The Tcl interpreter is started at program initialization and after a
reset (the receipt of a SIGHUP signal). It is not deleted between
scheduling runs, so variables which are set in one run can be accessed
later.
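
Because the interpreter persists, a global variable can carry state
from one run of the body to the next. The sketch below uses a
hypothetical counter named run_count; the two calls stand in for two
consecutive scheduling runs.

```tcl
# Increment a counter that is created on first use and then kept in
# the interpreter's global scope between calls.
proc count_run {} {
    global run_count
    if {![info exists run_count]} { set run_count 0 }
    return [incr run_count]
}

count_run                  ;# first scheduling run
set runs [count_run]       ;# second scheduling run sees the earlier value
```

The "info exists" guard matters because the interpreter is recreated
after a SIGHUP, at which point such variables no longer exist.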

The "initialize" and "terminator" files are run with no supplied
connection to the server. This means that none of the above functions
which talk to the server will work unless pbsconnect is called first.
The "body" file is run with a connection to the server already
established.
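
An "initialize" file that needs server information therefore has to
manage its own connection. The following is a sketch under the
assumption that pbsconnect reports failure through "pbs_errno" like the
other calls; its return value is not described above. The stubs exist
only so the fragment runs under a plain tclsh.

```tcl
# Stand-ins for use outside pbs_sched.
if {[llength [info commands pbsconnect]] == 0} {
    proc pbsconnect {{server ""}} { global pbs_errno; set pbs_errno 0 }
    proc pbsstatserv {} {
        global pbs_errno
        set pbs_errno 0
        return [list stub.server {} {}]
    }
    proc pbsdisconnect {} {}
}

# Connect to the default server, remember its name, then disconnect.
pbsconnect
set server_name ""
if {$pbs_errno == 0} {
    set server_name [lindex [pbsstatserv] 0]
    pbsdisconnect
}
```

Disconnecting at the end is a deliberate choice in this sketch, since
only one server connection is allowed at a time and the body file is
run with its own connection already in place.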
CONFIGURATION FILE
A configuration file may be specified with the -c option. This file
may be used to specify the hosts (servers) which are allowed to connect
to pbs_sched. The hosts are specified in the configuration file in a
manner identical to that used in pbs_mom. There is one line per host
with the syntax:
    $clienthost hostname
where $clienthost and hostname are separated by white space.

Two host names are always allowed to connect to pbs_sched: "localhost"
and the name returned to pbs_sched by the system call gethostname().
These names need not be specified in the configuration file.

The configuration file must be "secure". It must be owned by a user id
and group id less than 10 and must not be world writable.

FILES
$PBS_SERVER_HOME/sched_priv
    The default directory for configuration files, typically
    (/usr/spool/pbs)/sched_priv.

SIGNAL HANDLING
A C-based scheduler will handle the following signals:

SIGHUP
    The server will close and reopen its log file and reread the
    config file if one exists.

SIGALRM
    If the site-supplied scheduling module exceeds the time limit,
    the alarm will cause the scheduler to attempt to core dump and
    restart itself.

SIGINT and SIGTERM
    Will result in an orderly shutdown of the scheduler.

All other signals have the default action installed.

EXIT STATUS
Upon normal termination, an exit status of zero is returned.

SEE ALSO
pbs_scheduler_cc(8B), pbs_scheduler_rule(8B), pbs_server(8B), and
pbs_mom(8B).
PBS Internal Design Specification

Local                                                             pbs_sched(8B)