pbs_mom(8B)                           PBS                          pbs_mom(8B)

NAME

       pbs_mom - start a pbs batch execution mini-server

SYNOPSIS

       pbs_mom  [-A alias]  [-a alarm]  [-C chkdirectory]  [-c config]  [-D]
       [-d directory]  [-F]  [-h help]  [-H hostname]  [-L logfile]  [-m]
       [-M MOMport]  [-R RPPport]  [-S port]  [-p|-P|-q|-r]  [-w]  [-x]

DESCRIPTION

       The pbs_mom command starts the operation of a batch Machine Oriented
       Mini-server, MOM, on the local host.  Typically, this command will be
       in a local boot file such as /etc/rc.local.  To ensure that the
       pbs_mom command is not runnable by the general user community, the
       mini-server will only execute if its real and effective uid is zero.

       One function of pbs_mom is to place jobs into execution as directed by
       the server, establish resource usage limits, monitor the job's usage,
       and notify the server when the job completes.  If they exist, pbs_mom
       will execute a prologue script before executing a job and an epilogue
       script after executing the job.

       A second function of pbs_mom is to respond to resource monitor
       requests.  This was done by a separate process in previous versions of
       PBS but has now been combined into one process.  The resource monitor
       function is provided mainly for the PBS scheduler.  It provides
       information about the status of running jobs, available memory, etc.

       A third function of pbs_mom is to respond to task manager requests.
       This involves communicating with running tasks over a TCP socket as
       well as communicating with other MOMs within a job (also known as a
       "sisterhood").

       pbs_mom records a diagnostic message in a log file for any error
       occurrence.  The log files are maintained in the mom_logs directory
       below the home directory of the server.  If the log file cannot be
       opened, the diagnostic message is written to the system console.

OPTIONS

       -A alias        Used with -m (multi-mom option) to give the alias name
                       of this instance of pbs_mom.

       -a alarm        Specifies the alarm timeout in seconds for computing a
                       resource.  Every time a resource request is processed,
                       an alarm is set for the given amount of time.  If the
                       request has not completed before the given time, an
                       alarm signal is generated.  The default is 5 seconds.

       -C chkdirectory Specifies the path of the directory used to hold
                       checkpoint files.  [Currently this is only valid on
                       Cray systems.]  The default directory is
                       PBS_HOME/spool/checkpoint, see the -d option.  The
                       directory specified with the -C option must be owned by
                       root and accessible (rwx) only by root to protect the
                       security of the checkpoint files.

       -c config       Specifies an alternative configuration file, see
                       description below.  If this is a relative file name it
                       will be relative to PBS_HOME/mom_priv, see the -d
                       option.  If the specified file cannot be opened,
                       pbs_mom will abort.  If the -c option is not supplied,
                       pbs_mom will attempt to open the default configuration
                       file "config" in PBS_HOME/mom_priv.  If this file is
                       not present, pbs_mom will log the fact and continue.

       -h help         Displays the help/usage message.

       -H hostname     Sets the MOM's hostname.  This can be useful on
                       multi-homed networks.

       -D              Debug mode.  Do not fork.

       -d directory    Specifies the path of the directory which is the home
                       of the server's working files, PBS_HOME.  This option
                       is typically used along with -M when debugging MOM.
                       The default directory is given by $PBS_SERVER_HOME
                       which is typically /usr/spool/PBS.

       -F              Do not fork.  Use when running under systemd.

       -L logfile      Specifies an absolute path name for use as the log
                       file.  If not specified, MOM will open a file named for
                       the current date in the PBS_HOME/mom_logs directory,
                       see the -d option.

       -m              Directs the MOM to start in multi-mom mode.  In
                       addition to using -m, the -M, -R and -A options need to
                       be used to properly start a MOM in multi-mom mode.  For
                       example, pbs_mom -m -M 30002 -R 30003 -A alias-host
                       will start pbs_mom with the service port on port 30002,
                       the manager port at 30003 and with the name alias-host.

       -M port         Specifies the port number on which the mini-server
                       (MOM) will listen for batch requests.

       -R port         Specifies the port number on which the mini-server
                       (MOM) will listen for resource monitor requests, task
                       manager requests and inter-MOM messages.

       -p              (Default after version 2.4.0) (Preserve running jobs)
                       -- Specifies the impact on jobs which were in execution
                       when the mini-server shut down.  The -p option tries
                       to preserve any running jobs when the MOM restarts.
                       The new mini-server will not be the parent of any
                       running jobs; MOM has lost control of her offspring
                       (not a new situation for a mother).  The MOM will allow
                       the jobs to continue to run and monitor them indirectly
                       via polling.  All recovered jobs will report an exit
                       code of 0 when they are complete.  The -p option is
                       mutually exclusive with the -r, -P and -q options.

       -P              (Terminate all jobs and remove them from the queue) --
                       Specifies the impact on jobs which were in execution
                       when the mini-server shut down.  With the -P option, it
                       is assumed that either the entire system has been
                       restarted or the MOM has been down so long that it can
                       no longer guarantee that the pid of any running process
                       is the same as the recorded job process pid of a
                       recovering job.  Unlike the -p option, no attempt is
                       made to preserve or recover running jobs.  All jobs are
                       terminated and removed from the queue.  The -P option
                       is mutually exclusive with the -p, -q and -r options.

       -q              (Requeue all jobs - This is the default behavior in
                       versions prior to 2.4.0) -- Specifies the impact on
                       jobs which were in execution when the mini-server shut
                       down.  Do not terminate running processes.  With the -q
                       option, it is assumed that either the entire system has
                       been restarted or the MOM has been down so long that it
                       can no longer guarantee that the pid of any running
                       process is the same as the recorded job process pid of
                       a recovering job.  No attempt is made to kill job
                       processes.  The MOM will mark the jobs as terminated
                       and notify the batch server which owns the job.
                       Re-runnable jobs will be requeued.  The -q option is
                       mutually exclusive with the -p, -P and -r options.

       -r              (Terminate running processes and requeue all jobs) --
                       Specifies the impact on jobs which were in execution
                       when the mini-server shut down.  With the -r option,
                       MOM will kill any processes belonging to running jobs,
                       mark the jobs as terminated and notify the batch server
                       that owns the job.  Re-runnable jobs are reset to a
                       queued state so they can be run again.  The -r option
                       is mutually exclusive with the -p, -P and -q options.

                       If the -r option is used following a reboot, process
                       IDs (pids) may be reused and MOM may kill a process
                       that is not a batch session.

       -S port         Specifies the port number on which the pbs_server is
                       listening for requests.  If pbs_server is started with
                       a -p option, pbs_mom will need to use the -S option and
                       match the port value which was used to start
                       pbs_server.

       -w              When started with -w, pbs_moms wait until they get
                       their MOM hierarchy file from pbs_server to send their
                       first update, or until 10 minutes pass.  This reduces
                       network traffic on startup and can bring up clusters
                       faster.

       -x              Disables the check for privileged port resource monitor
                       connections.  This is used mainly for testing since the
                       privileged port is the only mechanism used to prevent
                       any ordinary user from connecting.

CONFIGURATION FILE

       The configuration file may be specified on the command line at program
       start with the -c flag.  The use of this file is to provide several
       types of run time information to pbs_mom: static resource names and
       values, external resources provided by a program to be run on request
       via a shell escape, and values to pass to internal set up functions at
       initialization (and re-initialization).

       Each item type is on a single line with the component parts separated
       by white space.  If the line starts with a hash mark (pound sign, #),
       the line is considered to be a comment and is skipped.

       Static Resources
              For static resource names and values, the configuration file
              contains a list of resource name/value pairs, one pair per
              line and separated by white space.  An example of static
              resource names and values could be the number of tape drives of
              different types, which could be specified by

              tape3480      4
              tape3420      2
              tapedat       1
              tape8mm       1
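As an illustration of the line types this section describes (comments, static resources, shell escapes introduced by "!", and "$" initialization directives, the last two being covered further down), the following Python sketch classifies a single line. It is not MOM's own parser. The function name classify_config_line and the handling of blank lines and of whitespace before "#" are assumptions made for the example.

```python
def classify_config_line(line):
    """Classify one config line as blank, comment, directive, shell or static."""
    stripped = line.strip()
    if not stripped:
        # Assumption: blank lines are ignored like comments.
        return ("blank", None, None)
    if stripped.startswith("#"):
        # Assumption: whitespace before the hash mark is tolerated.
        return ("comment", None, None)
    # Component parts are separated by white space: name first, value after.
    parts = stripped.split(None, 1)
    name = parts[0]
    value = parts[1].strip() if len(parts) > 1 else ""
    if name.startswith("$"):
        return ("directive", name[1:], value)
    if value.startswith("!"):
        return ("shell", name, value[1:])
    return ("static", name, value)


sample = """# tape drives on this host
tape3480      4
escape     !echo %xxx %yyy
$cputmult 1.5
"""
for entry in map(classify_config_line, sample.splitlines()):
    print(entry)
```

A real parser would also have to check "$" names against MOM's internal table; the sketch only separates the syntactic forms.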

       Shell Commands
              If the first character of the value is an exclamation mark (!),
              the entire rest of the line is saved to be executed through the
              services of the system(3) standard library routine.

              The shell escape provides a means for the resource monitor to
              yield arbitrary information to the scheduler.  Parameter
              substitution is done such that the value of any qualifier sent
              with the query, as explained below, replaces a token with a
              percent sign (%) followed by the name of the qualifier.  For
              example, here is a configuration file line which gives a
              resource name of "escape":

              escape     !echo %xxx %yyy

              If a query for "escape" is sent with no qualifiers, the command
              executed would be "echo %xxx %yyy".  If one qualifier is sent,
              "escape[xxx=hi there]", the command executed would be "echo hi
              there %yyy".  If two qualifiers are sent,
              "escape[xxx=hi][yyy=there]", the command executed would be
              "echo hi there".  If a qualifier is sent with no matching token
              in the command line, "escape[zzz=snafu]", an error is reported.
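The substitution rules in the "escape" example can be sketched in Python as follows. This only models the behavior described above and is not MOM's implementation. The function name expand_shell_escape and the use of ValueError to stand for "an error is reported" are assumptions.

```python
import re


def expand_shell_escape(command, qualifiers):
    """Replace %name tokens in command with the matching qualifier values."""
    used = set()

    def replace(match):
        name = match.group(1)
        if name in qualifiers:
            used.add(name)
            return qualifiers[name]
        # No qualifier was sent for this token: leave it untouched.
        return match.group(0)

    expanded = re.sub(r"%(\w+)", replace, command)
    unmatched = set(qualifiers) - used
    if unmatched:
        # A qualifier with no matching token is an error.
        raise ValueError("no token for qualifier(s): " + ", ".join(sorted(unmatched)))
    return expanded


template = "echo %xxx %yyy"
print(expand_shell_escape(template, {}))
print(expand_shell_escape(template, {"xxx": "hi there"}))
print(expand_shell_escape(template, {"xxx": "hi", "yyy": "there"}))
```

The three calls correspond to the three queries in the text; passing {"zzz": "snafu"} raises the error.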

       size[fs=<FS>]
              Specifies that the available and configured disk space in the
              <FS> filesystem is to be reported to the pbs_server and
              scheduler.  NOTE: To request disk space on a per job basis,
              specify the file resource as in 'qsub -l nodes=1,file=1000kb'.
              For example, the available and configured disk space in the
              /localscratch filesystem will be reported:

              size[fs=/localscratch]

       Initialization Value
              An initialization value directive has a name which starts with a
              dollar sign ($) and must be known to MOM via an internal table.
              The entries in this table now are:

              auto_ideal_load
                     if jobs are running, sets ideal_load based on a simple
                     expression.  The expression starts with the variable 't'
                     (total assigned CPUs) or 'c' (existing CPUs), followed by
                     an operator (+ - / *) and a float constant.

                     $auto_ideal_load t-0.2

              auto_max_load
                     if jobs are running, sets max_load based on a simple
                     expression.  The expression starts with the variable 't'
                     (total assigned CPUs) or 'c' (existing CPUs), followed by
                     an operator (+ - / *) and a float constant.
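To make the expression format concrete, here is a small Python sketch that evaluates an expression such as "t-0.2" for given CPU counts. It follows the grammar described above (variable, operator, float constant). The function name eval_load_expr and its parameter names are hypothetical, and how MOM itself parses or rounds the value is not covered by this page.

```python
import re

# variable ('t' or 'c'), one operator, then a float constant
_LOAD_EXPR = re.compile(r"^\s*([tc])\s*([-+*/])\s*(\d+(?:\.\d*)?|\.\d+)\s*$")


def eval_load_expr(expr, total_assigned_cpus, existing_cpus):
    """Evaluate an auto_ideal_load / auto_max_load style expression."""
    match = _LOAD_EXPR.match(expr)
    if match is None:
        raise ValueError("not a valid load expression: %r" % expr)
    variable, operator, constant = match.groups()
    base = total_assigned_cpus if variable == "t" else existing_cpus
    value = float(constant)
    if operator == "+":
        return base + value
    if operator == "-":
        return base - value
    if operator == "*":
        return base * value
    return base / value


# With 8 assigned CPUs on a 16-CPU node, "t-0.2" gives a load target near 7.8.
print(eval_load_expr("t-0.2", total_assigned_cpus=8, existing_cpus=16))
```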

              cputmult
                     which sets a factor used to adjust cpu time used by a
                     job.  This is provided to allow adjustment of time
                     charged and limits enforced where the job might run on
                     systems with different cpu performance.  If MOM's system
                     is faster than the reference system, set cputmult to a
                     decimal value greater than 1.0.  If MOM's system is
                     slower, set cputmult to a value between 0.0 and 1.0.  For
                     example:

                     $cputmult 1.5
                     $cputmult 0.75
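The effect of the factor can be pictured with a short Python sketch. It assumes the adjustment is a plain multiplication of the measured cpu time by cputmult, which matches the guidance above (values above 1.0 for faster hosts) but is not spelled out on this page. The function name adjusted_cput is hypothetical.

```python
def adjusted_cput(measured_cpu_seconds, cputmult):
    """Scale measured cpu time to the reference system (assumed: multiply)."""
    return measured_cpu_seconds * cputmult


# 100 cpu-seconds measured on a host configured with "$cputmult 1.5"
# would be charged as 150 reference seconds; with 0.75, as 75.
print(adjusted_cput(100.0, 1.5))
print(adjusted_cput(100.0, 0.75))
```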

              configversion
                     specifies the version of the config file data, a string.

              check_poll_time
                     specifies the MOM interval in seconds.  MOM checks each
                     job for updated resource usages, exited processes,
                     over-limit conditions, etc. once per interval.  This
                     value should be equal to or lower than pbs_server's
                     job_stat_rate.  High values result in stale information
                     reported to pbs_server.  Low values result in increased
                     system usage by MOM.  Default is 45 seconds.

              down_on_error
                     causes MOM to report itself as state "down" to pbs_server
                     in the event of a failed health check.  This feature is
                     EXPERIMENTAL and likely to be removed in the future.  See
                     HEALTH CHECK below.

              enablemomrestart
                     enable automatic restarts of MOM.  If enabled, MOM will
                     check if its binary has been updated and restart itself
                     at a safe point when no jobs are running, thus making
                     upgrades easier.  The check is made by comparing the
                     mtime of the pbs_mom executable.  Command-line args, the
                     process name, and the PATH env variable are preserved
                     across restarts.  It is recommended that this not be
                     enabled in the config file, but enabled when desired with
                     momctl (see RESOURCES for more information).

              ideal_load
                     ideal processor load.  Represents a low water mark for
                     the load average.  A node that is currently busy will
                     consider itself free after falling below ideal_load.

              igncput
                     Ignore cpu time violations on this mom, meaning jobs will
                     not be cancelled due to exceeding their limits for cpu
                     time.

              ignmem Ignore memory violations on this mom, meaning jobs will
                     not be cancelled due to exceeding their memory limits.

              ignvmem
                     If set to true, then pbs_mom will ignore vmem/pvmem limit
                     enforcement.

              ignwalltime
                     If set to true, then pbs_mom will ignore walltime limit
                     enforcement.

              job_output_file_mask
                     Specifies a mask for creating job output and error files.
                     Values can be specified in base 8, 10, or 16; leading 0
                     implies octal and leading 0x or 0X hexadecimal.  A value
                     of "userdefault" will use the user's default umask.  For
                     example:

                     $job_output_file_mask 027
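The base rules for the mask value (leading 0 for octal, leading 0x or 0X for hexadecimal, otherwise decimal) can be sketched in Python as below. The sketch also assumes the mask behaves like a umask, clearing the masked permission bits from the requested file mode. That reading and both function names are assumptions, and the special value "userdefault" is not handled.

```python
def parse_c_integer(text):
    """Parse an integer using the base-8/10/16 prefix rules described above."""
    s = text.strip()
    if s.lower().startswith("0x"):
        return int(s[2:], 16)
    if len(s) > 1 and s.startswith("0"):
        return int(s[1:], 8)
    return int(s, 10)


def apply_mask(requested_mode, mask):
    """Clear the masked permission bits, as a umask would (assumption)."""
    return requested_mode & ~mask


mask = parse_c_integer("027")
# With a umask-style mask of 027, a file requested as 0666 ends up 0640.
print(oct(apply_mask(0o666, mask)))
```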

              log_directory
                     Changes the log directory.  Default is
                     $TORQUEHOME/mom_logs/.  $TORQUEHOME default is
                     /var/spool/torque/ but can be changed in the ./configure
                     script.  The value is a string and should be the full
                     path to the desired mom log directory.  For example:

                     $log_directory /opt/torque/mom_logs/

              logevent
                     which sets the mask that determines which event types are
                     logged by pbs_mom.  For example:

                     $logevent 0x1ff
                     $logevent 255

                     The first example would set the log event mask to 0x1ff
                     (511) which enables logging of all events including debug
                     events.  The second example would set the mask to 0x0ff
                     (255) which enables all events except debug events.
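The relationship between the two masks in the explanation (0x1ff and 255) can be checked with a few lines of Python. The helper name parse_event_mask is hypothetical. The sketch only compares the two values and does not assign a meaning to any individual bit beyond what the text states.

```python
def parse_event_mask(value):
    """Accept a decimal or 0x-prefixed hexadecimal mask string."""
    return int(value, 0)


all_events = parse_event_mask("0x1ff")   # 511
no_debug = parse_event_mask("255")       # 0x0ff

# The only bit set in the first mask but not in the second is 0x100,
# which is the difference the text attributes to debug events.
extra_bits = all_events & ~no_debug
print(all_events, no_debug, hex(extra_bits))
```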

              log_file_suffix
                     Optional suffix to append to log file names.  If %h is
                     the suffix, pbs_mom appends the hostname for where the
                     log files are stored if it knows it, otherwise it will
                     append the hostname where the mom is running.  For
                     example, the following line turns the log file name
                     20100223 into 20100223.tom:

                     $log_file_suffix tom

              log_keep_days
                     Specifies how many days to keep log files.  pbs_mom
                     deletes log files older than the specified number of
                     days.  If not specified, pbs_mom won't delete log files
                     based on their age.

              loglevel
                     specifies the verbosity of logging with higher numbers
                     specifying more verbose logging.  Values may range
                     between 0 and 7.

              log_file_max_size
                     If this is set to a value > 0 then pbs_mom will roll the
                     current log file to log-file-name.1 when its size is
                     greater than or equal to the value of log_file_max_size.
                     This value is interpreted as kilobytes.

              log_file_roll_depth
                     If this is set to a value >= 1 and log_file_max_size is
                     set then pbs_mom will continue rolling the log files to
                     log-file-name.log_file_roll_depth.
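How log_file_max_size and log_file_roll_depth interact can be sketched in Python as a pure planning function. This is an illustration under assumptions, not MOM's code: a kilobyte is taken as 1024 bytes, older files are shifted up by one suffix before the current file becomes ".1", and the file at the maximum depth is overwritten. The function names are hypothetical.

```python
def should_roll(size_bytes, log_file_max_size_kb):
    """Roll once the file is at least log_file_max_size kilobytes (if > 0)."""
    if log_file_max_size_kb <= 0:
        return False
    return size_bytes >= log_file_max_size_kb * 1024  # assumption: 1 KB = 1024 B


def roll_plan(log_name, existing_suffixes, roll_depth):
    """Return the (source, destination) renames for one roll, oldest first."""
    depth = max(roll_depth, 1)
    plan = []
    for n in range(depth - 1, 0, -1):
        if n in existing_suffixes:
            plan.append(("%s.%d" % (log_name, n), "%s.%d" % (log_name, n + 1)))
    plan.append((log_name, "%s.1" % log_name))
    return plan


if should_roll(size_bytes=2048, log_file_max_size_kb=2):
    for source, destination in roll_plan("20100223", {1, 2}, roll_depth=3):
        print(source, "->", destination)
```

Returning a plan instead of renaming files keeps the sketch free of side effects; the renames could be applied with os.rename in the order given.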

              max_load
                     maximum processor load.  Nodes over this load average are
                     considered busy (see ideal_load above).

              memory_pressure_threshold
                     This option is only available if pbs_mom is enabled to
                     use cpusets.  If set to a value > 0, a job gets killed if
                     its memory pressure exceeds this value and
                     $memory_pressure_duration is set.  The default is 0
                     (memory pressure recording is off).
                     See cpuset(7) for more information about memory pressure.

              memory_pressure_duration
                     This option is only available if pbs_mom is enabled to
                     use cpusets.  Specifies the number of subsequent MOM
                     intervals a job's memory pressure must be above
                     $memory_pressure_threshold to get killed.  The default is
                     0 (jobs are never killed due to memory pressure).
                     See cpuset(7) for more information about memory pressure.
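The way the two memory pressure settings combine can be sketched in Python. The sketch reads "subsequent MOM intervals" as consecutive intervals, which is an assumption, and takes one pressure sample per MOM interval. The function name should_kill_job is hypothetical.

```python
def should_kill_job(pressure_samples, threshold, duration):
    """Return True once pressure stays above threshold for `duration` samples."""
    if threshold <= 0 or duration <= 0:
        # Either default (0) disables killing on memory pressure.
        return False
    consecutive = 0
    for pressure in pressure_samples:
        consecutive = consecutive + 1 if pressure > threshold else 0
        if consecutive >= duration:
            return True
    return False


# One sample per MOM interval; threshold 10, duration 3.
print(should_kill_job([5, 12, 13, 14], threshold=10, duration=3))
print(should_kill_job([12, 13, 5, 14], threshold=10, duration=3))
```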

              node_check_script
                     specifies the fully qualified pathname of the health
                     check script to run (see HEALTH CHECK for more
                     information).

              node_check_interval
                     specifies when to run the MOM health check.  The check
                     can be either periodic, event-driven, or both.  The value
                     starts with an integer specifying the number of MOM
                     intervals between subsequent executions of the specified
                     health check.  After the integer is an optional
                     comma-separated list of event names.  Currently supported
                     are "jobstart" and "jobend".  This value defaults to 1
                     with no events, indicating the check is run every MOM
                     interval (see HEALTH CHECK for more information).

                     $node_check_interval 0  #Disabled.
                     $node_check_interval 0,jobstart  #Only runs at job starts
                     $node_check_interval 10,jobstart,jobend
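The value format (an integer, then optional comma-separated event names) can be sketched as a small Python parser. It is illustrative only. The function name is hypothetical, and stripping a trailing "#" comment is an assumption based on the examples above rather than documented behavior.

```python
SUPPORTED_EVENTS = {"jobstart", "jobend"}


def parse_node_check_interval(value):
    """Split a $node_check_interval value into (interval, set of events)."""
    value = value.split("#", 1)[0].strip()  # assumption: trailing comment allowed
    parts = [part.strip() for part in value.split(",")]
    interval = int(parts[0])
    events = {part for part in parts[1:] if part}
    unknown = events - SUPPORTED_EVENTS
    if unknown:
        raise ValueError("unsupported event(s): " + ", ".join(sorted(unknown)))
    return interval, events


for text in ("0  #Disabled.", "0,jobstart  #Only runs at job starts", "10,jobstart,jobend"):
    print(parse_node_check_interval(text))
```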
402
403              nodefile_suffix
404                     Specifies  the suffix to append to a host names to denote
405                     the data channel network adapter in a multihomed  compute
406                     node.   $nodefile_suffix i With the suffix of 'i' and the
407                     control channel adapter with the name  node01,  the  data
408                     channel would have a hostname of node01i.
409
410              nospool_dir_list
411                     If  the  job's  output file should be in one of the paths
412                     specified here, then it will be spooled directly in  that
413                     directory instead of the normal spool directory.
414                     Specified    in    the    format   path1,   path2,   etc.
415                     $nospool_dir_list/home/mike/*,/var/tmp/spool/
416
417              pbsclient
418                     which causes a host name to be added to the list of hosts
419                     which  will  be allowed to connect to MOM as long as they
420                     are using a privilaged port for the purposes of  resource
421                     monitor  requests.   For example, here are two configura‐
422                     tion file lines which will allow  the  hosts  "fred"  and
423                     "wilma" to connect:
424
425                     $pbsclient      fred
426                     $pbsclient      wilma
427
428                     Two  host  name  are  always  allowed  to  connection  to
429                     pbs_mom, "localhost" and the name returned to pbs_mom  by
430                     the  system  call gethostname().  These names need not be
431                     specified in the configuration file.  The hosts listed as
432                     "clients"  can  issue  Resource  Monitor  (RM)  requests.
433                     Other MOM nodes and servers do not need to be  listed  as
434                     clients.
435
436              pbsserver
437                     which  defines  hostnames running pbs_server that will be
438                     allowed to submit jobs, issue Resource Monitor  (RM)  re‐
439                     quests, and get status updates.  MOM will continually at‐
440                     tempt to contact all server hosts  for  node  status  and
441                     state  updates.   Like  $PBS_SERVER_HOME/server_name, the
442                     hostname may be followed by a colon and  a  port  number.
443                     This  parameter replaces the oft-confused $clienthost pa‐
444                     rameter from TORQUE 2.0.0p0 and earlier.  Note  that  the
445                     hostname  in  $PBS_SERVER_HOME/server_name  is used if no
446                     $pbsserver parameters are found
447
448              prologalarm
449                     Specifies maximum duration (in  seconds)  which  the  MOM
450                     will  wait  for  the job prolog or job job epilog to com‐
451                     plete.  This parameter default to 300 seconds (5 minutes)
452
453              rcpcmd Specify the the full path and argument to be used for re‐
454                     mote  file  copies.   This overrides the compile-time de‐
455                     fault found in configure.  This must contain 2 words: the
456                     full path to the command and the switches.  The copy com‐
457                     mand must be able to recursively copy files to the remote
458                     host  and  accept arguments of the form "user@host:files"
459                     For example:
460
461                     $rcpcmd /usr/bin/rcp -rp
462                     $rcpcmd /usr/bin/scp -rpB
463
464              restricted
465                     which causes a host name to be added to the list of hosts
466                     which  will  be allowed to connect to MOM without needing
467                     to use a privilaged port.  These names allow for wildcard
468                     matching.  For example, here is a configuration file line
469                     which will allow queries from any host  from  the  domain
470                     "ibm.com".
471
472                     $restricted      *.ibm.com
473
474                     The  restriction  which  applies  to these connections is
475                     that only internal queries may  be  made.   No  resources
476                     from a config file will be found.  This is to prevent any
477                     shell commands from being run by a non-root process.
478                     This parameter is generally not required except for  some
479                     versions of OSX.
480
481              remote_checkpoint_dirs
482                     Specifies what server checkpoint directories are remotely
483                     mounted.  This directive is used to tell  the  MOM  which
484                     directories  are  shared  with  the server.  Using remote
485                     checkpoint directories eliminates the need  to  copy  the
486                     checkpoint  files  back and forth between the MOM and the
487                     server. This parameter is available in 2.4.1 and later.
488
489                     $remote_checkpoint_dirs /var/spool/torque/checkpoint
490
491              remote_reconfig
492                     Enables the ability to remotely reconfigure pbs_mom  with
493                     a  new config file.  Default is disabled.  This parameter
494                     accepts various forms of true, yes, and 1.
495
496              source_login_batch
497                     Specifies whether or not mom will  source  the  /etc/pro‐
498                     file,  etc.  type files for batch jobs. Parameter accepts
499                     various forms of true, false, yes, no, 1 and  0.  Default
500                     is True.
501
502              source_login_interactive
503                     Specifies  whether  or  not mom will source the /etc/pro‐
504                     file, etc. type files for interactive jobs. Parameter ac‐
505                     cepts various forms of true, false, yes, no, 1 and 0. De‐
506                     fault is True.
507
508              spool_as_final_name
509                     If set to true, jobs will spool directly as their  output
510                     files,  with  no intermediate locations or steps. This is
511                     mostly useful for shared filesystems  with  fast  writing
512                     capability.
513
514              status_update_time
515                     Specifies  (in  seconds) how often MOM updates its status
516                     information to pbs_server.  This value  should  correlate
517                     with  the  server's scheduling interval.  High values in‐
518                     crease the load of pbs_server and the network.  Low  val‐
519                     ues  cause  pbs_server  to report stale information.  De‐
520                     fault is 45 seconds.
521
522              tmpdir Sets the directory basename for a per-job  temporary  di‐
523                     rectory.  Before job launch, MOM will append the jobid to
524                     the tmpdir basename and create the directory.  After  the
525                     job  exit, MOM will recursively delete it.  The env vari‐
526                     able TMPDIR will be set for all pro/epilog  scripts,  the
527                     job script, and TM tasks.
528                     Directory  creation  and removal is done as the job owner
529                     and group, so the owner must  have  write  permission  to
530                     create  the  directory.   If the directory already exists
531                     and is owned by the job owner, it will not be deleted af‐
532                     ter  the job.  If the directory already exists and is NOT
533                     owned by the job owner, the job start will be rejected.
534
535              timeout
536                     Specifies the number of seconds before TCP messages  will
537                     time  out.   TCP  messages include job obituaries, and TM
538                     requests if RPP is disabled.  Default is 60 seconds.
539
540              usecp  specifies which directories should be staged with cp  in‐
541                     stead of rcp/scp.  If a shared filesystem is available on
542                     all hosts in a cluster, this directive is  used  to  make
543                     these filesystems known to MOM.  For example, if /home is
544                     NFS mounted on all nodes in a cluster:
545
546                     $usecp *:/home  /home
547
548              varattr
549                     This is similar to a shell escape above, but  includes  a
550                     TTL.   The command will only be run every TTL seconds.  A
551                     TTL of -1 will cause the  command  to  be  executed  only
552                     once.  A TTL of 0 will cause the command to be run every
553                     time varattr is requested.  This parameter may be used
554                     multiple times, but all output will be grouped into a
555                     single "varattr" attribute in the request and status
556                     output.  The command should output data in the form of
557                     varattrname=value1[+value2]...
558
559                     $varattr 3600 /path/to/script [<ARGS>]...
560
561              use_smt
562                     This option is only available if pbs_mom is enabled to
563                     use cpusets.  It only has an effect if there is more than
564                     one logical processor per physical core in the system
565                     (simultaneous multithreading or hyperthreading is enabled
566                     via BIOS settings).  If set to true, all logical  proces‐
567                     sors of allocated cores are added to the cpuset of a job.
568                     If set to false, only the first logical processor per al‐
569                     located  core  is  contained in the cpuset of a job.  The
570                     default is true.
571
572              wallmult
573                     Sets a factor used to adjust wall time usage by a job
574                     to a common reference system.  The factor is used for
575                     walltime calculations and limits in the same way that
576                     cputmult is used for cpu time.
577
578       The  configuration  file  must  be executable and "secure".  It must be
579       owned by a user id and group id less than 10 and not be world writable.
580       Output from this file must be in the format $VAR=$VAL, i.e.,
581
582              dataset13=20070104
583              dataset22=20070202
584              viraltest=abdd3
585
586       xauthpath
587              Specifies the path to the xauth binary to enable X11 forwarding.
588
589       mom_host
590              Sets the local hostname as used by pbs_mom.
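
A command run through the $varattr directive described earlier in this section might look like the following minimal sketch. The function name format_varattr and the sample attribute values are illustrative assumptions; only the name=value1[+value2]... output form comes from the text above.

```shell
#!/bin/sh
# Hypothetical helper for a $varattr command.
# Prints one attribute line in the form name=value1[+value2]...
format_varattr() {
    name=$1
    shift
    out=
    for value in "$@"; do
        if [ -z "$out" ]; then
            out=$value          # first value: no separator
        else
            out="$out+$value"   # later values are joined with '+'
        fi
    done
    printf '%s=%s\n' "$name" "$out"
}

# Sample output lines (values are made up for illustration).
format_varattr dataset13 20070104
format_varattr dataset22 20070202 20070301
```

Such a script would be referenced from the MOM configuration with a line like "$varattr 3600 /path/to/script", so that it runs at most once per hour.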
591

LAYOUT FILE

593       There is also an optional layout file for creating multiple moms on one
594       box in a specified layout. In the file, each mom on the single  box  is
595       given  its own hostname, cpu indexes, memory nodes (a linux construct),
596       and memory size. This is useful for NUMA systems. Each line in the file
597       specifies one mom. The file uses the following format:
598
599       <hostname> cpus=<X> mem=<Y> memsize=<Z>
600              cpus  and mem can be comma separated lists, while memsize should
601              be a memory size in the format:
602
603       <number><units>
604              For example, a file could contain the following line:
605
606       foohost-1 cpus=1,2 mem=1,2,3,4 memsize=8GB
607              This would specify that foohost-1 has cpus 1 and 2, memory nodes
608              1-4, and a total of 8 GB of memory.
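
Before handing a layout file to pbs_mom, each line can be checked against the format above. The following is a rough sketch only: the function name is_layout_line is hypothetical, and the pattern assumes plain comma separated integers and a memsize of digits followed by letters. The parser in pbs_mom may accept other forms.

```shell
#!/bin/sh
# Hypothetical pre-flight check for one layout line of the form
#   <hostname> cpus=<X> mem=<Y> memsize=<Z>
# Returns 0 if the line matches the assumed pattern, non-zero otherwise.
is_layout_line() {
    printf '%s\n' "$1" |
        grep -Eq '^[^[:space:]]+[[:space:]]+cpus=[0-9]+(,[0-9]+)*[[:space:]]+mem=[0-9]+(,[0-9]+)*[[:space:]]+memsize=[0-9]+[A-Za-z]+$'
}

# The example line from the text above passes the check.
if is_layout_line 'foohost-1 cpus=1,2 mem=1,2,3,4 memsize=8GB'; then
    echo 'layout line looks well formed'
fi
```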
609

RESOURCES

611       Resource  Monitor  queries  can  be made with momctl's -q option to re‐
612       trieve and set pbs_mom options.  Any configured static resource may  be
613       retrieved with a request of the same name.  These are resource requests
614       not otherwise documented in the PBS ERS.
615
616       cycle  forces an immediate MOM cycle
617
618       status_update_time
619              retrieve or set the $status_update_time parameter
620
621       check_poll_time
622              retrieve or set the $check_poll_time parameter
623
624       configversion
625              retrieve the config version
626
627       jobstartblocktime
628              retrieve or set the $jobstartblocktime parameter
629
630       enablemomrestart
631              retrieve or set the $enablemomrestart parameter
632
633       loglevel
634              retrieve or set the $loglevel parameter
635
636       down_on_error
637              retrieve or set the EXPERIMENTAL $down_on_error parameter
638
639       diag0 - diag4
640              retrieves various diagnostic information
641
642       rcpcmd retrieve or set the $rcpcmd parameter
643
644       version
645              retrieves the pbs_mom version
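
The requests listed above could be issued from a small wrapper such as the sketch below. The function mom_query and the MOMCTL_DRY_RUN variable are assumptions made for this example; the only momctl option used is -q, as named in this section. The syntax for setting a value is not shown here; consult the momctl documentation for it.

```shell
#!/bin/sh
# Hypothetical wrapper: run one momctl query per request name.
# With MOMCTL_DRY_RUN=1 the commands are only printed, not executed.
mom_query() {
    for request in "$@"; do
        if [ "${MOMCTL_DRY_RUN:-0}" = 1 ]; then
            printf 'momctl -q %s\n' "$request"
        else
            momctl -q "$request"
        fi
    done
}

# Show what would be run for a few of the requests listed above.
MOMCTL_DRY_RUN=1
mom_query cycle loglevel version
```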
646

HEALTH CHECK

648       The health check script is executed directly by the pbs_mom daemon  un‐
649       der  the  root user id. It must be accessible from the compute node and
650       may be a script or compiled executable program.  It may make any needed
651       system calls and execute any combination of system utilities but should
652       not execute resource manager  client  commands.   Also,  as  of  TORQUE
653       1.0.1,  the  pbs_mom  daemon blocks until the health check is completed
654       and does not possess a built-in timeout.  Consequently, it is advisable
655       to  keep  the  launch  script  execution time short and verify that the
656       script will not block even under failure conditions.
657
658       If the script detects a failure, it should return the  keyword  'ERROR'
659       to stdout followed by an error message.  The message (up to 256 charac‐
660       ters) immediately following the ERROR string will be  assigned  to  the
661       node attribute 'message' of the associated node.
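
A health check following these rules might be sketched as below. The function name check_writable_dir and the choice of /tmp as the directory to test are assumptions for illustration; a real check would test whatever matters on the node. The message is kept short so that it fits within the 256 character limit.

```shell
#!/bin/sh
# Hypothetical health check: verify that a directory exists and is writable.
# On failure, print the keyword ERROR followed by a short message on stdout.
check_writable_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        printf 'ERROR %s is missing\n' "$dir"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        printf 'ERROR %s is not writable\n' "$dir"
        return 1
    fi
    return 0
}

# Uses only quick local tests, so it should not block.
check_writable_dir /tmp
```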
662
663       If  the script detects a failure when run from "jobstart", then the job
664       will be rejected.  This should probably  only  be  used  with  advanced
665       schedulers like Moab so that the job can be routed to another node.
666
667       TORQUE currently ignores ERROR messages by default, but advanced
668       schedulers like Moab can be configured to react appropriately.
669
670       If the experimental $down_on_error MOM setting is enabled, MOM will set
671       itself  to state down and report to pbs_server; and pbs_server will re‐
672       port the node as "down".  Additionally, the  experimental  "down_on_er‐
673       ror"  server  attribute  can  be  enabled which has the same effect but
674       moves the decision to  pbs_server.   It  is  redundant  to  have  MOM's
675       $down_on_error  and  pbs_server's  down_on_error features enabled.  See
676       "down_on_error" in pbs_server_attributes(7B).
677

FILES

679       $PBS_SERVER_HOME/server_name
680              contains the hostname running pbs_server.
681
682       $PBS_SERVER_HOME/mom_priv
683                 the default  directory  for  configuration  files,  typically
684                 (/usr/spool/pbs)/mom_priv.
685
686       $PBS_SERVER_HOME/mom_logs
687                 directory for log files recorded by the server.
688
689       $PBS_SERVER_HOME/mom_priv/prologue
690                 the administrative script to be run before job execution.
691
692       $PBS_SERVER_HOME/mom_priv/epilogue
693                 the administrative script to be run after job execution.
694

SIGNAL HANDLING

696       pbs_mom handles the following signals:
697
698       SIGHUP causes  pbs_mom to re-read its configuration file, close and re‐
699              open the log file, and reinitialize resource structures.
700
701       SIGALRM
702              results in a log file entry. The signal is  used  to  limit  the
703              time taken by certain child processes, such as the prologue
704              and epilogue.
705
706       SIGINT and SIGTERM
707              result in pbs_mom exiting without terminating any running jobs.
708              This  is  the action for the following signals as well: SIGXCPU,
709              SIGXFSZ, SIGCPULIM, and SIGSHUTDN.
710
711       SIGUSR1, SIGUSR2
712              cause MOM to increase and decrease the logging level,
713              respectively.
714
715       SIGPIPE, SIGINFO
716              are ignored.
717
718       SIGBUS, SIGFPE, SIGILL, SIGTRAP, and SIGSYS
719              cause a core dump if the PBSCOREDUMP environment variable is
720              defined.
721
722       All other signals have their default behavior installed.
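
The signals above can be sent with the standard kill utility. The sketch below maps a few administrative actions to those signals. The function names and the action names (reload, log-more, log-less, stop) are invented for this example, and the caller is assumed to know the process ID of pbs_mom.

```shell
#!/bin/sh
# Hypothetical mapping from an administrative action to the signal
# that pbs_mom handles for it, as described in this section.
mom_signal_for() {
    case $1 in
        reload)   echo HUP ;;    # re-read config, reopen log file
        log-more) echo USR1 ;;   # increase logging level
        log-less) echo USR2 ;;   # decrease logging level
        stop)     echo TERM ;;   # exit without terminating running jobs
        *)        echo "unknown action: $1" >&2; return 1 ;;
    esac
}

# Send the signal for an action to the given pbs_mom process ID.
signal_mom() {
    sig=$(mom_signal_for "$1") || return 1
    kill -s "$sig" "$2"
}
```

For example, "signal_mom reload 12345" would ask the pbs_mom process with ID 12345 to re-read its configuration file.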
723

EXIT STATUS

725       If the mini-server command fails to begin operation, the  server  exits
726       with a value greater than zero.
727

SEE ALSO

729       pbs_server(8B),  pbs_scheduler_basl(8B), pbs_scheduler_tcl(8B), the PBS
730       External Reference Specification, and the PBS Administrator's Guide.
731
732
733
734Local                                                              pbs_mom(8B)