pbs_mom(8B)                          PBS                          pbs_mom(8B)

NAME
       pbs_mom - start a PBS batch execution mini-server

SYNOPSIS
       pbs_mom [-a alarm] [-A alias] [-C chkdirectory] [-c config]
       [-d directory] [-h] [-H hostname] [-L logfile] [-m] [-M MOMport]
       [-R RPPport] [-S port] [-p|-P|-q|-r] [-x]

DESCRIPTION
       The pbs_mom command starts the operation of a batch Machine Oriented
       Mini-server, MOM, on the local host. Typically, this command will be
       in a local boot file such as /etc/rc.local. To ensure that the
       pbs_mom command is not runnable by the general user community,
       pbs_mom will only execute if its real and effective uid are zero.

       One function of pbs_mom is to place jobs into execution as directed by
       the server, establish resource usage limits, monitor the job's usage,
       and notify the server when the job completes. If they exist, pbs_mom
       will execute a prologue script before executing a job and an epilogue
       script after executing the job.

       The next function of pbs_mom is to respond to resource monitor
       requests. This was done by a separate process in previous versions of
       PBS but has now been combined into one process. The resource monitor
       function is provided mainly for the PBS scheduler. It provides
       information about the status of running jobs, available memory, and
       so on.

       The next function of pbs_mom is to respond to task manager requests.
       This involves communicating with running tasks over a TCP socket as
       well as communicating with other MOMs within a job (also known as a
       "sisterhood").

       pbs_mom will record a diagnostic message in a log file for any error
       occurrence. The log files are maintained in the mom_logs directory
       below the home directory of the server. If the log file cannot be
       opened, the diagnostic message is written to the system console.

OPTIONS
       -A alias
              Used with -m (multi-mom option) to give the alias name of this
              instance of pbs_mom.

       -a alarm
              Specifies the alarm timeout in seconds for computing a
              resource. Every time a resource request is processed, an alarm
              is set for the given amount of time. If the request has not
              completed before the given time, an alarm signal is generated.
              The default is 5 seconds.

       -C chkdirectory
              Specifies the path of the directory used to hold checkpoint
              files. [Currently this is only valid on Cray systems.] The
              default directory is PBS_HOME/spool/checkpoint, see the -d
              option. The directory specified with the -C option must be
              owned by root and accessible (rwx) only by root to protect the
              security of the checkpoint files.

       -c config
              Specifies an alternative configuration file, see description
              below. If this is a relative file name it will be relative to
              PBS_HOME/mom_priv, see the -d option. If the specified file
              cannot be opened, pbs_mom will abort. If the -c option is not
              supplied, pbs_mom will attempt to open the default
              configuration file "config" in PBS_HOME/mom_priv. If this file
              is not present, pbs_mom will log the fact and continue.

       -h     Displays the help/usage message.

       -H hostname
              Sets the MOM's hostname. This can be useful on multi-homed
              networks.

       -d directory
              Specifies the path of the directory which is the home of the
              server's working files, PBS_HOME. This option is typically used
              along with -M when debugging MOM. The default directory is
              given by $PBS_SERVER_HOME which is typically /usr/spool/PBS.

       -L logfile
              Specifies an absolute path name for use as the log file. If not
              specified, MOM will open a file named for the current date in
              the PBS_HOME/mom_logs directory, see the -d option.

       -m     Directs the MOM to start in multi-mom mode. In addition to
              using -m, the -M, -R and -A options need to be used to properly
              start a MOM in multi-mom mode. For example,

                     pbs_mom -m -M 30002 -R 30003 -A alias-host

              will start pbs_mom with the service port on port 30002, the
              manager port at 30003 and with the name alias-host.

       -M port
              Specifies the port number on which the mini-server (MOM) will
              listen for batch requests.

       -R port
              Specifies the port number on which the mini-server (MOM) will
              listen for resource monitor requests, task manager requests and
              inter-MOM messages.

       -p     (Default after version 2.4.0) (Preserve running jobs) --
              Specifies the impact on jobs which were in execution when the
              mini-server shut down. The -p option tries to preserve any
              running jobs when the MOM restarts. The new mini-server will
              not be the parent of any running jobs, MOM has lost control of
              her offspring (not a new situation for a mother). The MOM will
              allow the jobs to continue to run and monitor them indirectly
              via polling. All recovered jobs will report an exit code of 0
              when they are complete. The -p option is mutually exclusive
              with the -r, -P and -q options.

       -P     (Terminate all jobs and remove them from the queue) --
              Specifies the impact on jobs which were in execution when the
              mini-server shut down. With the -P option, it is assumed that
              either the entire system has been restarted or the MOM has
              been down so long that it can no longer guarantee that the pid
              of any running process is the same as the recorded job process
              pid of a recovering job. Unlike the -p option, no attempt is
              made to preserve or recover running jobs. All jobs are
              terminated and removed from the queue. The -P option is
              mutually exclusive with the -p, -q and -r options.

       -q     (Requeue all jobs - This is the default behavior in versions
              prior to 2.4.0) -- Specifies the impact on jobs which were in
              execution when the mini-server shut down. Do not terminate
              running processes. With the -q option, it is assumed that
              either the entire system has been restarted or the MOM has
              been down so long that it can no longer guarantee that the pid
              of any running process is the same as the recorded job process
              pid of a recovering job. No attempt is made to kill job
              processes. The MOM will mark the jobs as terminated and notify
              the batch server which owns the job. Re-runnable jobs will be
              requeued. The -q option is mutually exclusive with the -p, -P
              and -r options.

       -r     (Terminate running processes and requeue all jobs) --
              Specifies the impact on jobs which were in execution when the
              mini-server shut down. With the -r option, MOM will kill any
              processes belonging to running jobs, mark the jobs as
              terminated and notify the batch server that owns the job.
              Re-runnable jobs are reset to a queued state so they can be run
              again. The -r option is mutually exclusive with the -p, -P and
              -q options.

              If the -r option is used following a reboot, process IDs (pids)
              may be reused and MOM may kill a process that is not a batch
              session.

       -S port
              Specifies the port number on which the pbs_server is listening
              for requests. If pbs_server is started with a -p option,
              pbs_mom will need to use the -S option and match the port value
              which was used to start pbs_server.

       -x     Disables the check for privileged port resource monitor
              connections. This is used mainly for testing since the
              privileged port is the only mechanism used to prevent any
              ordinary user from connecting.

CONFIGURATION FILE
       The configuration file may be specified on the command line at program
       start with the -c flag. The use of this file is to provide several
       types of run time information to pbs_mom: static resource names and
       values, external resources provided by a program to be run on request
       via a shell escape, and values to pass to internal set up functions at
       initialization (and re-initialization).

       Each item type is on a single line with the component parts separated
       by white space. If the line starts with a hash mark (pound sign, #),
       the line is considered to be a comment and is skipped.

       Static Resources
              For static resource names and values, the configuration file
              contains a list of resource name/value pairs, one pair per
              line and separated by white space. An example of static
              resource names and values could be the number of tape drives of
              different types, which could be specified by

                     tape3480 4
                     tape3420 2
                     tapedat 1
                     tape8mm 1

       Shell Commands
              If the first character of the value is an exclamation mark (!),
              the entire rest of the line is saved to be executed through the
              services of the system(3) standard library routine.

              The shell escape provides a means for the resource monitor to
              yield arbitrary information to the scheduler. Parameter
              substitution is done such that the value of any qualifier sent
              with the query, as explained below, replaces a token with a
              percent sign (%) followed by the name of the qualifier. For
              example, here is a configuration file line which gives a
              resource name of "escape":

                     escape !echo %xxx %yyy

              If a query for "escape" is sent with no qualifiers, the command
              executed would be "echo %xxx %yyy". If one qualifier is sent,
              "escape[xxx=hi there]", the command executed would be "echo hi
              there %yyy". If two qualifiers are sent,
              "escape[xxx=hi][yyy=there]", the command executed would be
              "echo hi there". If a qualifier is sent with no matching token
              in the command line, "escape[zzz=snafu]", an error is reported.

       size[fs=<FS>]
              Specifies that the available and configured disk space in the
              <FS> filesystem is to be reported to the pbs_server and
              scheduler. NOTE: To request disk space on a per job basis,
              specify the file resource as in 'qsub -l nodes=1,file=1000kb'.
              For example, the available and configured disk space in the
              /localscratch filesystem will be reported:

                     size[fs=/localscratch]

       Initialization Value
              An initialization value directive has a name which starts with
              a dollar sign ($) and must be known to MOM via an internal
              table. The entries in this table now are:

              auto_ideal_load
                     if jobs are running, sets ideal_load based on a simple
                     expression. The expression starts with the variable 't'
                     (total assigned CPUs) or 'c' (existing CPUs), followed
                     by an operator (+ - / *) and a float constant.

                            $auto_ideal_load t-0.2

              auto_max_load
                     if jobs are running, sets max_load based on a simple
                     expression. The expression starts with the variable 't'
                     (total assigned CPUs) or 'c' (existing CPUs), followed
                     by an operator (+ - / *) and a float constant.

              cputmult
                     which sets a factor used to adjust cpu time used by a
                     job. This is provided to allow adjustment of time
                     charged and limits enforced where the job might run on
                     systems with different cpu performance. If MOM's system
                     is faster than the reference system, set cputmult to a
                     decimal value greater than 1.0. If MOM's system is
                     slower, set cputmult to a value between 1.0 and 0.0.
                     For example:

                            $cputmult 1.5
                            $cputmult 0.75

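A short worked example of the arithmetic, assuming the factor is applied as
a plain multiplication of the measured CPU seconds (the text implies this
but does not spell it out). The function name adjust_cpu_time is
hypothetical.

```python
def adjust_cpu_time(raw_seconds, cputmult):
    """Scale measured CPU seconds to the reference system (assumed model)."""
    return raw_seconds * cputmult


if __name__ == "__main__":
    # 100 CPU seconds measured on a host 1.5 times as fast as the reference.
    print(adjust_cpu_time(100, 1.5))
    # The same measurement on a host that is slower than the reference.
    print(adjust_cpu_time(100, 0.75))
```

Under this assumption a job on the faster host is charged more reference
seconds than it actually used, and a job on the slower host is charged
fewer.
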
              configversion
                     specifies the version of the config file data, a string.

              check_poll_time
                     specifies the MOM interval in seconds. MOM checks each
                     job for updated resource usages, exited processes,
                     over-limit conditions, etc. once per interval. This
                     value should be equal to or lower than pbs_server's
                     job_stat_rate. High values result in stale information
                     reported to pbs_server. Low values result in increased
                     system usage by MOM. Default is 45 seconds.

              down_on_error
                     causes MOM to report itself as state "down" to
                     pbs_server in the event of a failed health check. This
                     feature is EXPERIMENTAL and likely to be removed in the
                     future. See HEALTH CHECK below.

              enablemomrestart
                     enable automatic restarts of MOM. If enabled, MOM will
                     check if its binary has been updated and restart itself
                     at a safe point when no jobs are running, thus making
                     upgrades easier. The check is made by comparing the
                     mtime of the pbs_mom executable. Command-line args, the
                     process name, and the PATH env variable are preserved
                     across restarts. It is recommended that this not be
                     enabled in the config file, but enabled when desired
                     with momctl (see RESOURCES for more information).

              ideal_load
                     ideal processor load. Represents a low water mark for
                     the load average. A node that is currently busy will
                     consider itself free after falling below ideal_load.

              igncput
                     Ignore cpu time violations on this mom, meaning jobs
                     will not be cancelled due to exceeding their limits for
                     cpu time.

              ignmem Ignore memory violations on this mom, meaning jobs will
                     not be cancelled due to exceeding their memory limits.

              ignvmem
                     If set to true, then pbs_mom will ignore vmem/pvmem
                     limit enforcement.

              ignwalltime
                     If set to true, then pbs_mom will ignore walltime limit
                     enforcement.

              job_output_file_mask
                     Specifies a mask for creating job output and error
                     files. Values can be specified in base 8, 10, or 16;
                     leading 0 implies octal and leading 0x or 0X
                     hexadecimal. A value of "userdefault" will use the
                     user's default umask. For example:

                            $job_output_file_mask 027

              log_directory
                     Changes the log directory. Default is
                     $TORQUEHOME/mom_logs/. $TORQUEHOME default is
                     /var/spool/torque/ but can be changed in the ./configure
                     script. The value is a string and should be the full
                     path to the desired mom log directory. For example:

                            $log_directory /opt/torque/mom_logs/

              logevent
                     which sets the mask that determines which event types
                     are logged by pbs_mom. For example:

                            $logevent 0x1ff
                            $logevent 255

                     The first example would set the log event mask to 0x1ff
                     (511) which enables logging of all events including
                     debug events. The second example would set the mask to
                     0x0ff (255) which enables all events except debug
                     events.

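The two mask values differ by a single bit, which the following Python
sketch makes visible. The parser follows the C convention described for
job_output_file_mask (leading 0x is hexadecimal, leading 0 is octal); that
logevent values are parsed the same way is an assumption here, and the
function name parse_mask is hypothetical.

```python
def parse_mask(text):
    """Parse an integer with C-style base prefixes (0x hex, 0 octal)."""
    text = text.strip()
    if text.lower().startswith("0x"):
        return int(text, 16)
    if len(text) > 1 and text.startswith("0"):
        return int(text, 8)
    return int(text, 10)


if __name__ == "__main__":
    all_events = parse_mask("0x1ff")
    without_debug = parse_mask("255")
    # Bits set in the first mask but not in the second.
    print(hex(all_events & ~without_debug))
    # An octal file mask written as in the job_output_file_mask example.
    print(parse_mask("027"))
```

Note that Python's own int(text, 0) rejects a bare leading zero such as
"027", which is why the octal case is handled explicitly.
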
              log_file_suffix
                     Optional suffix to append to log file names. If %h is
                     the suffix, pbs_mom appends the hostname for where the
                     log files are stored if it knows it, otherwise it will
                     append the hostname where the mom is running. For
                     example, with

                            $log_file_suffix tom

                     a log file named 20100223 becomes 20100223.tom.

              log_keep_days
                     Specifies how many days to keep log files. pbs_mom
                     deletes log files older than the specified number of
                     days. If not specified, pbs_mom won't delete log files
                     based on their age.

              loglevel
                     specifies the verbosity of logging with higher numbers
                     specifying more verbose logging. Values may range
                     between 0 and 7.

              log_file_max_size
                     If this is set to a value > 0 then pbs_mom will roll
                     the current log file to log-file-name.1 when its size
                     is greater than or equal to the value of
                     log_file_max_size. This value is interpreted as
                     kilobytes.

              log_file_roll_depth
                     If this is set to a value >= 1 and log_file_max_size is
                     set then pbs_mom will continue rolling the log files to
                     log-file-name.log_file_roll_depth.

              max_load
                     maximum processor load. Nodes over this load average
                     are considered busy (see ideal_load above).

              memory_pressure_threshold
                     This option is only available if pbs_mom is enabled to
                     use cpusets. If set to a value > 0, a job gets killed
                     if its memory pressure exceeds this value, and if
                     $memory_pressure_duration is set. The default is 0
                     (memory pressure recording is off). See cpuset(7) for
                     more information about memory pressure.

              memory_pressure_duration
                     This option is only available if pbs_mom is enabled to
                     use cpusets. Specifies the number of subsequent MOM
                     intervals a job's memory pressure must be above
                     $memory_pressure_threshold to get killed. The default
                     is 0 (jobs are never killed due to memory pressure).
                     See cpuset(7) for more information about memory
                     pressure.

              node_check_script
                     specifies the fully qualified pathname of the health
                     check script to run (see HEALTH CHECK for more
                     information).

              node_check_interval
                     specifies when to run the MOM health check. The check
                     can be either periodic, event-driven, or both. The
                     value starts with an integer specifying the number of
                     MOM intervals between subsequent executions of the
                     specified health check. After the integer is an
                     optional comma-separated list of event names. Currently
                     supported are "jobstart" and "jobend". This value
                     defaults to 1 with no events, indicating the check is
                     run every MOM interval (see HEALTH CHECK for more
                     information).

                            $node_check_interval 0 #Disabled.
                            $node_check_interval 0,jobstart #Only runs at job starts
                            $node_check_interval 10,jobstart,jobend

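The value format can be illustrated with a small Python sketch that splits
it into the interval and the event list. This only mirrors the syntax
described above; the function name parse_node_check_interval is
hypothetical and the trailing "#" comments from the examples are not
handled.

```python
SUPPORTED_EVENTS = ("jobstart", "jobend")


def parse_node_check_interval(value):
    """Split '<interval>[,event...]' into (interval, [events])."""
    fields = [field.strip() for field in value.split(",")]
    interval = int(fields[0])
    events = fields[1:]
    for event in events:
        if event not in SUPPORTED_EVENTS:
            raise ValueError("unsupported event: " + event)
    return interval, events


if __name__ == "__main__":
    for sample in ("0", "0,jobstart", "10,jobstart,jobend"):
        print(sample, "->", parse_node_check_interval(sample))
```

An interval of 0 combined with an event name therefore describes a purely
event-driven check, as in the second example above.
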
              nodefile_suffix
                     Specifies the suffix to append to a host name to denote
                     the data channel network adapter in a multihomed
                     compute node. For example:

                            $nodefile_suffix i

                     With the suffix of 'i' and the control channel adapter
                     with the name node01, the data channel would have a
                     hostname of node01i.

              nospool_dir_list
                     If the job's output file should be in one of the paths
                     specified here, then it will be spooled directly in
                     that directory instead of the normal spool directory.
                     Specified in the format path1, path2, etc. For example:

                            $nospool_dir_list /home/mike/*,/var/tmp/spool/

              pbsclient
                     which causes a host name to be added to the list of
                     hosts which will be allowed to connect to MOM as long
                     as they are using a privileged port for the purposes of
                     resource monitor requests. For example, here are two
                     configuration file lines which will allow the hosts
                     "fred" and "wilma" to connect:

                            $pbsclient fred
                            $pbsclient wilma

                     Two host names are always allowed to connect to
                     pbs_mom, "localhost" and the name returned to pbs_mom
                     by the system call gethostname(). These names need not
                     be specified in the configuration file. The hosts
                     listed as "clients" can issue Resource Monitor (RM)
                     requests. Other MOM nodes and servers do not need to be
                     listed as clients.

              pbsserver
                     which defines hostnames running pbs_server that will be
                     allowed to submit jobs, issue Resource Monitor (RM)
                     requests, and get status updates. MOM will continually
                     attempt to contact all server hosts for node status and
                     state updates. Like $PBS_SERVER_HOME/server_name, the
                     hostname may be followed by a colon and a port number.
                     This parameter replaces the oft-confused $clienthost
                     parameter from TORQUE 2.0.0p0 and earlier. Note that
                     the hostname in $PBS_SERVER_HOME/server_name is used if
                     no $pbsserver parameters are found.

              prologalarm
                     Specifies the maximum duration (in seconds) which the
                     MOM will wait for the job prolog or job epilog to
                     complete. This parameter defaults to 300 seconds (5
                     minutes).

              rcpcmd Specifies the full path and arguments to be used for
                     remote file copies. This overrides the compile-time
                     default found in configure. This must contain 2 words:
                     the full path to the command and the switches. The copy
                     command must be able to recursively copy files to the
                     remote host and accept arguments of the form
                     "user@host:files". For example:

                            $rcpcmd /usr/bin/rcp -rp
                            $rcpcmd /usr/bin/scp -rpB

              restricted
                     which causes a host name to be added to the list of
                     hosts which will be allowed to connect to MOM without
                     needing to use a privileged port. These names allow for
                     wildcard matching. For example, here is a configuration
                     file line which will allow queries from any host from
                     the domain "ibm.com".

                            $restricted *.ibm.com

                     The restriction which applies to these connections is
                     that only internal queries may be made. No resources
                     from a config file will be found. This is to prevent
                     any shell commands from being run by a non-root
                     process. This parameter is generally not required
                     except for some versions of OS X.

              remote_checkpoint_dirs
                     Specifies which server checkpoint directories are
                     remotely mounted. This directive is used to tell the
                     MOM which directories are shared with the server. Using
                     remote checkpoint directories eliminates the need to
                     copy the checkpoint files back and forth between the
                     MOM and the server. This parameter is available in
                     2.4.1 and later.

                            $remote_checkpoint_dirs /var/spool/torque/checkpoint

              remote_reconfig
                     Enables the ability to remotely reconfigure pbs_mom
                     with a new config file. Default is disabled. This
                     parameter accepts various forms of true, yes, and 1.

              source_login_batch
                     Specifies whether or not mom will source /etc/profile
                     and similar files for batch jobs. Parameter accepts
                     various forms of true, false, yes, no, 1 and 0. Default
                     is True.

              source_login_interactive
                     Specifies whether or not mom will source /etc/profile
                     and similar files for interactive jobs. Parameter
                     accepts various forms of true, false, yes, no, 1 and 0.
                     Default is True.

              spool_as_final_name
                     If set to true, jobs will spool directly as their
                     output files, with no intermediate locations or steps.
                     This is mostly useful for shared filesystems with fast
                     writing capability.

              status_update_time
                     Specifies (in seconds) how often MOM updates its status
                     information to pbs_server. This value should correlate
                     with the server's scheduling interval. Low values
                     increase the load on pbs_server and the network. High
                     values cause pbs_server to report stale information.
                     Default is 45 seconds.

              tmpdir Sets the directory basename for a per-job temporary
                     directory. Before job launch, MOM will append the jobid
                     to the tmpdir basename and create the directory. After
                     the job exits, MOM will recursively delete it. The env
                     variable TMPDIR will be set for all pro/epilog scripts,
                     the job script, and TM tasks.

                     Directory creation and removal is done as the job owner
                     and group, so the owner must have write permission to
                     create the directory. If the directory already exists
                     and is owned by the job owner, it will not be deleted
                     after the job. If the directory already exists and is
                     NOT owned by the job owner, the job start will be
                     rejected.

              timeout
                     Specifies the number of seconds before TCP messages
                     will time out. TCP messages include job obituaries, and
                     TM requests if RPP is disabled. Default is 60 seconds.

              usecp  specifies which directories should be staged with cp
                     instead of rcp/scp. If a shared filesystem is available
                     on all hosts in a cluster, this directive is used to
                     make these filesystems known to MOM. For example, if
                     /home is NFS mounted on all nodes in a cluster:

                            $usecp *:/home /home

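One way to picture a $usecp rule is as a mapping from a "host:path"
destination to a local path that cp can write to. The Python sketch below
models that idea under stated assumptions: the host part is matched as a
shell-style wildcard and the directory part as a path prefix. The actual
matching performed by MOM may differ, and the function name map_usecp, the
host node01 and the user alice are hypothetical.

```python
import fnmatch


def map_usecp(rules, destination):
    """Return a local path for 'host:path' if a rule matches, else None."""
    host, _, path = destination.partition(":")
    for pattern, local_dir in rules:
        host_pattern, _, remote_dir = pattern.partition(":")
        prefix = remote_dir.rstrip("/")
        # The path must be the shared directory itself or lie below it.
        inside = path == remote_dir or path.startswith(prefix + "/")
        if fnmatch.fnmatch(host, host_pattern) and inside:
            return local_dir.rstrip("/") + path[len(prefix):]
    return None


if __name__ == "__main__":
    rules = [("*:/home", "/home")]  # from "$usecp *:/home /home"
    print(map_usecp(rules, "node01:/home/alice/job.out"))
    print(map_usecp(rules, "node01:/tmp/job.out"))
```

A destination that no rule covers returns None in this sketch, standing in
for the case where the file would be copied with rcp or scp instead.
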
              varattr
                     This is similar to a shell escape above, but includes a
                     TTL. The command will only be run every TTL seconds. A
                     TTL of -1 will cause the command to be executed only
                     once. A TTL of 0 will cause the command to be run every
                     time varattr is requested. This parameter may be used
                     multiple times, but all output will be grouped into a
                     single "varattr" attribute in the request and status
                     output. The command should output data in the form of
                     varattrname=value1[+value2]...

                            $varattr 3600 /path/to/script [<ARGS>]...

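A script run through $varattr only has to print lines in that form. The
Python sketch below shows one way to build such a line; the function name
format_varattr, the attribute name scratchdirs and its values are made up
for illustration.

```python
def format_varattr(name, values):
    """Build 'name=value1+value2...' from a name and a list of values."""
    return name + "=" + "+".join(str(value) for value in values)


if __name__ == "__main__":
    # Hypothetical attribute listing two scratch areas on this node.
    print(format_varattr("scratchdirs", ["/scratch1", "/scratch2"]))
```

Multiple values are joined with "+" because that is the separator shown in
the output format above.
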
              use_smt
                     This option is only available if pbs_mom is enabled to
                     use cpusets. It only has an effect if there is more
                     than one logical processor per physical core in the
                     system (simultaneous multithreading or hyperthreading
                     is enabled via BIOS settings). If set to true, all
                     logical processors of allocated cores are added to the
                     cpuset of a job. If set to false, only the first
                     logical processor per allocated core is contained in
                     the cpuset of a job. The default is true.

              wallmult
                     which sets a factor used to adjust wall time usage by a
                     job to a common reference system. The factor is used
                     for walltime calculations and limits the same as
                     cputmult is used for cpu time.

       The configuration file must be executable and "secure". It must be
       owned by a user id and group id less than 10 and not be world writable.
       Output from this file must be in the format $VAR=$VAL, i.e.,

              dataset13=20070104
              dataset22=20070202
              viraltest=abdd3

       xauthpath
              Specifies the path to the xauth binary to enable X11
              forwarding.

       mom_host
              Sets the local hostname as used by pbs_mom.

LAYOUT FILE
       There is also an optional layout file for creating multiple moms on one
       box in a specified layout. In the file, each mom on the single box is
       given its own hostname, cpu indexes, memory nodes (a Linux construct),
       and memory size. This is useful for NUMA systems. Each line in the file
       specifies one mom and has the following format:

              <hostname> cpus=<X> mem=<Y> memsize=<Z>

       cpus and mem can be comma separated lists, while memsize should be a
       memory size in the format:

              <number><units>

       For example, a file could contain the following line:

              foohost-1 cpus=1,2 mem=1,2,3,4 memsize=8GB

       This would specify that foohost-1 has cpus 1 and 2, memory nodes 1-4,
       and a total of 8 GB of memory.

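The line format can be illustrated with a short Python sketch that splits
one layout line into its fields. It is a reading aid rather than the
parser pbs_mom uses; the function name parse_layout_line is hypothetical,
and memsize is kept as a (number, units) pair because the text does not
define how units are converted.

```python
import re


def parse_layout_line(line):
    """Parse '<hostname> cpus=<X> mem=<Y> memsize=<Z>' into a dict."""
    fields = line.split()
    entry = {"hostname": fields[0]}
    for field in fields[1:]:
        key, _, value = field.partition("=")
        if key in ("cpus", "mem"):
            # Comma separated lists of indexes.
            entry[key] = [int(item) for item in value.split(",")]
        elif key == "memsize":
            match = re.fullmatch(r"(\d+)([A-Za-z]+)", value)
            if match is None:
                raise ValueError("bad memsize: " + value)
            entry[key] = (int(match.group(1)), match.group(2))
        else:
            raise ValueError("unknown field: " + key)
    return entry


if __name__ == "__main__":
    print(parse_layout_line("foohost-1 cpus=1,2 mem=1,2,3,4 memsize=8GB"))
```
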
RESOURCES
       Resource Monitor queries can be made with momctl's -q option to
       retrieve and set pbs_mom options. Any configured static resource may
       be retrieved with a request of the same name. These are resource
       requests not otherwise documented in the PBS ERS.

       cycle  forces an immediate MOM cycle

       status_update_time
              retrieve or set the $status_update_time parameter

       check_poll_time
              retrieve or set the $check_poll_time parameter

       configversion
              retrieve the config version

       jobstartblocktime
              retrieve or set the $jobstartblocktime parameter

       enablemomrestart
              retrieve or set the $enablemomrestart parameter

       loglevel
              retrieve or set the $loglevel parameter

       down_on_error
              retrieve or set the EXPERIMENTAL $down_on_error parameter

       diag0 - diag4
              retrieves various diagnostic information

       rcpcmd retrieve or set the $rcpcmd parameter

       version
              retrieves the pbs_mom version

HEALTH CHECK
       The health check script is executed directly by the pbs_mom daemon
       under the root user id. It must be accessible from the compute node and
       may be a script or compiled executable program. It may make any needed
       system calls and execute any combination of system utilities but should
       not execute resource manager client commands. Also, as of TORQUE
       1.0.1, the pbs_mom daemon blocks until the health check is completed
       and does not possess a built-in timeout. Consequently, it is advisable
       to keep the health check script's execution time short and verify that
       the script will not block even under failure conditions.

       If the script detects a failure, it should write the keyword 'ERROR'
       to stdout followed by an error message. The message (up to 256
       characters) immediately following the ERROR string will be assigned to
       the node attribute 'message' of the associated node.

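A minimal health check along these lines might look like the Python sketch
below. The disk-space test, the 1 GiB threshold and the checked path are
arbitrary choices made for illustration, and the function name
health_report is hypothetical; only the "ERROR" keyword and the 256
character limit come from the text above.

```python
import shutil

MAX_MESSAGE = 256  # only this many characters reach the node 'message'


def health_report(free_bytes, min_free_bytes, path):
    """Return an 'ERROR ...' line if free space is too low, else ''."""
    if free_bytes < min_free_bytes:
        message = "low disk space on %s: %d bytes free" % (path, free_bytes)
        return "ERROR " + message[:MAX_MESSAGE]
    return ""


if __name__ == "__main__":
    path = "/"              # hypothetical filesystem to watch
    minimum = 1 << 30       # hypothetical threshold of 1 GiB
    report = health_report(shutil.disk_usage(path).free, minimum, path)
    if report:
        # pbs_mom reads the failure report from stdout.
        print(report)
```

The check does a single local system call and returns immediately, in
keeping with the advice to keep the script short and non-blocking.
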
       If the script detects a failure when run from "jobstart", then the job
       will be rejected. This should probably only be used with advanced
       schedulers like Moab so that the job can be routed to another node.

       TORQUE currently ignores ERROR messages by default, but advanced
       schedulers like Moab can be configured to react appropriately.

       If the experimental $down_on_error MOM setting is enabled, MOM will set
       itself to state down and report to pbs_server, and pbs_server will
       report the node as "down". Additionally, the experimental
       "down_on_error" server attribute can be enabled which has the same
       effect but moves the decision to pbs_server. It is redundant to have
       both MOM's $down_on_error and pbs_server's down_on_error features
       enabled. See "down_on_error" in pbs_server_attributes(7B).

FILES
       $PBS_SERVER_HOME/server_name
              contains the name of the host running pbs_server.

       $PBS_SERVER_HOME/mom_priv
              the default directory for configuration files, typically
              (/usr/spool/pbs)/mom_priv.

       $PBS_SERVER_HOME/mom_logs
              directory for log files recorded by the server.

       $PBS_SERVER_HOME/mom_priv/prologue
              the administrative script to be run before job execution.

       $PBS_SERVER_HOME/mom_priv/epilogue
              the administrative script to be run after job execution.

SIGNAL HANDLING
       pbs_mom handles the following signals:

       SIGHUP causes pbs_mom to re-read its configuration file, close and
              reopen the log file, and reinitialize resource structures.

       SIGALRM
              results in a log file entry. The signal is used to limit the
              time taken by certain child processes, such as the prologue
              and epilogue.

       SIGINT and SIGTERM
              result in pbs_mom exiting without terminating any running
              jobs. This is the action for the following signals as well:
              SIGXCPU, SIGXFSZ, SIGCPULIM, and SIGSHUTDN.

       SIGUSR1, SIGUSR2
              cause MOM to increase and decrease logging levels,
              respectively.

       SIGPIPE, SIGINFO
              are ignored.

       SIGBUS, SIGFPE, SIGILL, SIGTRAP, and SIGSYS
              cause a core dump if the PBSCOREDUMP environment variable is
              defined.

       All other signals have their default behavior installed.

EXIT STATUS
       If the mini-server command fails to begin operation, the server exits
       with a value greater than zero.

SEE ALSO
       pbs_server(8B), pbs_scheduler_basl(8B), pbs_scheduler_tcl(8B), the PBS
       External Reference Specification, and the PBS Administrator's Guide.



Local                                                             pbs_mom(8B)