pbs_mom(8B)                           PBS                          pbs_mom(8B)

NAME

       pbs_mom - start a PBS batch execution mini-server

SYNOPSIS

       pbs_mom [-a alarm] [-A alias] [-C chkdirectory] [-c config]
       [-d directory] [-h help] [-H hostname] [-L logfile] [-m] [-M MOMport]
       [-R RPPport] [-S port] [-p|-P|-q|-r] [-x]

DESCRIPTION

       The pbs_mom command starts the operation of a batch Machine Oriented
       Mini-server, MOM, on the local host.  Typically, this command will be
       in a local boot file such as /etc/rc.local.  To ensure that the
       pbs_mom command is not runnable by the general user community,
       pbs_mom will only execute if its real and effective uid are zero.

       One function of pbs_mom is to place jobs into execution as directed by
       the server, establish resource usage limits, monitor the job's usage,
       and notify the server when the job completes.  If they exist, pbs_mom
       will execute a prologue script before executing a job and an epilogue
       script after executing the job.  A second function of pbs_mom is to
       respond to resource monitor requests.  This was done by a separate
       process in previous versions of PBS but has now been combined into one
       process.  The resource monitor function is provided mainly for the PBS
       scheduler.  It provides information about the status of running jobs,
       available memory, and so on.  A third function of pbs_mom is to respond
       to task manager requests.  This involves communicating with running
       tasks over a TCP socket as well as communicating with other MOMs within
       a job (aka a "sisterhood").

       pbs_mom will record a diagnostic message in a log file for any error
       occurrence.  The log files are maintained in the mom_logs directory
       below the home directory of the server.  If the log file cannot be
       opened, the diagnostic message is written to the system console.

OPTIONS

       -A alias        Used with -m (multi-mom option) to give the alias name
                       of this instance of pbs_mom.

       -a alarm        Specifies the alarm timeout in seconds for computing a
                       resource.  Every time a resource request is processed,
                       an alarm is set for the given amount of time.  If the
                       request has not completed before the given time, an
                       alarm signal is generated.  The default is 5 seconds.

       -C chkdirectory Specifies the path of the directory used to hold
                       checkpoint files.  [Currently this is only valid on
                       Cray systems.]  The default directory is
                       PBS_HOME/spool/checkpoint, see the -d option.  The
                       directory specified with the -C option must be owned by
                       root and accessible (rwx) only by root to protect the
                       security of the checkpoint files.

       -c config       Specifies an alternative configuration file, see
                       CONFIGURATION FILE below.  If this is a relative file
                       name it will be relative to PBS_HOME/mom_priv, see the
                       -d option.  If the specified file cannot be opened,
                       pbs_mom will abort.  If the -c option is not supplied,
                       pbs_mom will attempt to open the default configuration
                       file "config" in PBS_HOME/mom_priv.  If this file is
                       not present, pbs_mom will log the fact and continue.

       -h help         Displays the help/usage message.

       -H hostname     Sets the MOM's hostname.  This can be useful on
                       multi-homed networks.

       -d directory    Specifies the path of the directory which is the home
                       of the server's working files, PBS_HOME.  This option
                       is typically used along with -M when debugging MOM.
                       The default directory is given by $PBS_SERVER_HOME
                       which is typically /usr/spool/PBS.

       -L logfile      Specifies an absolute path name for use as the log
                       file.  If not specified, MOM will open a file named for
                       the current date in the PBS_HOME/mom_logs directory,
                       see the -d option.

       -m              Directs the MOM to start in multi-mom mode.  In
                       addition to -m, the -M, -R and -A options need to be
                       used to properly start a MOM in multi-mom mode.  For
                       example,

                       pbs_mom -m -M 30002 -R 30003 -A alias-host

                       will start pbs_mom with the service port on port 30002,
                       the manager port at 30003 and with the name alias-host.

       -M port         Specifies the port number on which the mini-server
                       (MOM) will listen for batch requests.

       -R port         Specifies the port number on which the mini-server
                       (MOM) will listen for resource monitor requests, task
                       manager requests and inter-MOM messages.

       -p              (Default after version 2.4.0) (Preserve running jobs)
                       -- Specifies the impact on jobs which were in execution
                       when the mini-server shut down.  The -p option tries
                       to preserve any running jobs when the MOM restarts.
                       The new mini-server will not be the parent of any
                       running jobs; MOM has lost control of her offspring
                       (not a new situation for a mother).  The MOM will allow
                       the jobs to continue to run and monitor them indirectly
                       via polling.  All recovered jobs will report an exit
                       code of 0 when they are complete.  The -p option is
                       mutually exclusive with the -r, -P and -q options.

       -P              (Terminate all jobs and remove them from the queue) --
                       Specifies the impact on jobs which were in execution
                       when the mini-server shut down.  With the -P option, it
                       is assumed that either the entire system has been
                       restarted or the MOM has been down so long that it can
                       no longer guarantee that the pid of any running process
                       is the same as the recorded job process pid of a
                       recovering job.  Unlike the -p option, no attempt is
                       made to preserve or recover running jobs.  All jobs are
                       terminated and removed from the queue.  The -P option
                       is mutually exclusive with the -p, -q and -r options.

       -q              (Requeue all jobs - This is the default behavior in
                       versions prior to 2.4.0) -- Specifies the impact on
                       jobs which were in execution when the mini-server shut
                       down.  Running processes are not terminated.  With the
                       -q option, it is assumed that either the entire system
                       has been restarted or the MOM has been down so long
                       that it can no longer guarantee that the pid of any
                       running process is the same as the recorded job process
                       pid of a recovering job.  No attempt is made to kill
                       job processes.  The MOM will mark the jobs as
                       terminated and notify the batch server which owns the
                       job.  Re-runnable jobs will be requeued.  The -q option
                       is mutually exclusive with the -p, -P and -r options.

       -r              (Terminate running processes and requeue all jobs) --
                       Specifies the impact on jobs which were in execution
                       when the mini-server shut down.  With the -r option,
                       MOM will kill any processes belonging to running jobs,
                       mark the jobs as terminated and notify the batch server
                       that owns the job.  Re-runnable jobs are reset to a
                       queued state so they can be run again.  The -r option
                       is mutually exclusive with the -p, -P and -q options.

                       If the -r option is used following a reboot, process
                       IDs (pids) may be reused and MOM may kill a process
                       that is not a batch session.

       -S port         Specifies the port number on which the pbs_server is
                       listening for requests.  If pbs_server is started with
                       a -p option, pbs_mom will need to use the -S option and
                       match the port value which was used to start
                       pbs_server.

       -x              Disables the check for privileged port resource monitor
                       connections.  This is used mainly for testing since the
                       privileged port is the only mechanism used to prevent
                       any ordinary user from connecting.

CONFIGURATION FILE

       The configuration file may be specified on the command line at program
       start with the -c flag.  This file provides several types of run time
       information to pbs_mom: static resource names and values, external
       resources provided by a program to be run on request via a shell
       escape, and values to pass to internal set up functions at
       initialization (and re-initialization).

       Each item type is on a single line with the component parts separated
       by white space.  If the line starts with a hash mark (pound sign, #),
       the line is considered to be a comment and is skipped.
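
       The line format described above can be illustrated with a minimal
       sketch.  This is not pbs_mom's parser; the function name
       parse_config_line and the handling of blank lines are assumptions made
       for illustration only.

```python
def parse_config_line(line):
    """Split one config line into (name, value).

    Returns None for comment lines (starting with '#') and, as an
    assumption of this sketch, for blank lines.
    """
    if line.startswith("#") or not line.strip():
        return None
    parts = line.split(None, 1)  # name, then the rest of the line
    name = parts[0]
    value = parts[1].strip() if len(parts) > 1 else ""
    return name, value


sample = """# tape drives on this host
tape3480      4
escape     !echo %xxx %yyy
$cputmult 1.5
"""

entries = [e for e in map(parse_config_line, sample.splitlines()) if e]
```

       Here entries holds one (name, value) pair per non-comment line; a
       value beginning with "!" marks a shell escape and a name beginning
       with "$" marks an initialization value.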

       Static Resources
              For static resource names and values, the configuration file
              contains a list of resource name/value pairs, one pair per
              line and separated by white space.  An example of static
              resource names and values could be the number of tape drives of
              different types, which could be specified by

              tape3480      4
              tape3420      2
              tapedat       1
              tape8mm       1

       Shell Commands
              If the first character of the value is an exclamation mark (!),
              the entire rest of the line is saved to be executed through the
              services of the system(3) standard library routine.

              The shell escape provides a means for the resource monitor to
              yield arbitrary information to the scheduler.  Parameter
              substitution is done such that the value of any qualifier sent
              with the query, as explained below, replaces a token with a
              percent sign (%) followed by the name of the qualifier.  For
              example, here is a configuration file line which gives a
              resource name of "escape":

              escape     !echo %xxx %yyy

              If a query for "escape" is sent with no qualifiers, the command
              executed would be "echo %xxx %yyy".  If one qualifier is sent,
              "escape[xxx=hi there]", the command executed would be "echo hi
              there %yyy".  If two qualifiers are sent,
              "escape[xxx=hi][yyy=there]", the command executed would be "echo
              hi there".  If a qualifier is sent with no matching token in the
              command line, "escape[zzz=snafu]", an error is reported.
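
              The substitution rules above can be sketched as follows.  The
              function name expand_shell_escape and the use of ValueError to
              signal the error case are assumptions of this sketch, not the
              names used inside pbs_mom.

```python
import re

TOKEN = re.compile(r"%(\w+)")


def expand_shell_escape(command, qualifiers):
    """Replace each %name token with the value of the matching qualifier.

    Tokens without a qualifier are left untouched; a qualifier with no
    matching token is an error, as described in the text.
    """
    tokens = set(TOKEN.findall(command))
    unmatched = set(qualifiers) - tokens
    if unmatched:
        raise ValueError("no matching token for: " + ", ".join(sorted(unmatched)))
    return TOKEN.sub(lambda m: qualifiers.get(m.group(1), m.group(0)), command)


cmd = "echo %xxx %yyy"
no_qualifiers = expand_shell_escape(cmd, {})
one_qualifier = expand_shell_escape(cmd, {"xxx": "hi there"})
two_qualifiers = expand_shell_escape(cmd, {"xxx": "hi", "yyy": "there"})
```

              The three calls reproduce the three cases in the paragraph
              above; passing {"zzz": "snafu"} raises ValueError.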

       size[fs=<FS>]
              Specifies that the available and configured disk space in the
              <FS> filesystem is to be reported to the pbs_server and
              scheduler.  NOTE: To request disk space on a per job basis,
              specify the file resource as in 'qsub -l nodes=1,file=1000kb'.
              For example, the available and configured disk space in the
              /localscratch filesystem will be reported:

              size[fs=/localscratch]

       Initialization Value
              An initialization value directive has a name which starts with a
              dollar sign ($) and must be known to MOM via an internal table.
              The entries in this table now are:

              auto_ideal_load
                     if jobs are running, sets ideal_load based on a simple
                     expression.  The expression starts with the variable 't'
                     (total assigned CPUs) or 'c' (existing CPUs), followed by
                     an operator (+ - / *) and a float constant.

                     $auto_ideal_load t-0.2

              auto_max_load
                     if jobs are running, sets max_load based on a simple
                     expression.  The expression starts with the variable 't'
                     (total assigned CPUs) or 'c' (existing CPUs), followed by
                     an operator (+ - / *) and a float constant.
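
                     The expression form shared by auto_ideal_load and
                     auto_max_load can be illustrated with a small evaluator.
                     This is a sketch under assumptions: the name
                     eval_load_expr, its arguments, and the exact number
                     syntax accepted are hypothetical.

```python
import operator
import re

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

# variable ('t' or 'c'), one operator, then a float constant
EXPR = re.compile(r"\s*([tc])\s*([-+*/])\s*(\d+(?:\.\d*)?|\.\d+)\s*")


def eval_load_expr(expr, total_assigned, existing):
    """Evaluate an expression such as 't-0.2' or 'c*0.5'."""
    m = EXPR.fullmatch(expr)
    if m is None:
        raise ValueError("bad load expression: %r" % expr)
    var, op, const = m.groups()
    base = total_assigned if var == "t" else existing
    return OPS[op](base, float(const))


# 4 CPUs assigned to jobs on an 8-CPU node (illustrative numbers)
ideal = eval_load_expr("t-0.2", total_assigned=4, existing=8)
```

                     With these illustrative numbers, "t-0.2" yields a load
                     slightly below the 4 assigned CPUs.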

              cputmult
                     which sets a factor used to adjust cpu time used by a
                     job.  This is provided to allow adjustment of time
                     charged and limits enforced where the job might run on
                     systems with different cpu performance.  If MOM's system
                     is faster than the reference system, set cputmult to a
                     decimal value greater than 1.0.  If MOM's system is
                     slower, set cputmult to a value between 1.0 and 0.0.  For
                     example:

                     $cputmult 1.5
                     $cputmult 0.75

              configversion
                     specifies the version of the config file data, a string.

              check_poll_time
                     specifies the MOM interval in seconds.  MOM checks each
                     job for updated resource usages, exited processes,
                     over-limit conditions, etc. once per interval.  This
                     value should be less than or equal to pbs_server's
                     job_stat_rate.  High values result in stale information
                     reported to pbs_server.  Low values result in increased
                     system usage by MOM.  Default is 45 seconds.

              down_on_error
                     causes MOM to report itself as state "down" to pbs_server
                     in the event of a failed health check.  This feature is
                     EXPERIMENTAL and likely to be removed in the future.  See
                     HEALTH CHECK below.

              enablemomrestart
                     enable automatic restarts of MOM.  If enabled, MOM will
                     check if its binary has been updated and restart itself
                     at a safe point when no jobs are running, thus making
                     upgrades easier.  The check is made by comparing the
                     mtime of the pbs_mom executable.  Command-line args, the
                     process name, and the PATH env variable are preserved
                     across restarts.  It is recommended that this not be
                     enabled in the config file, but enabled when desired with
                     momctl (see RESOURCES for more information).

              ideal_load
                     ideal processor load.  Represents a low water mark for
                     the load average.  A node that is currently busy will
                     consider itself free after falling below ideal_load.

              igncput
                     Ignore cpu time violations on this MOM, meaning jobs will
                     not be cancelled due to exceeding their limits for cpu
                     time.

              ignmem Ignore memory violations on this MOM, meaning jobs will
                     not be cancelled due to exceeding their memory limits.

              ignvmem
                     If set to true, then pbs_mom will ignore vmem/pvmem limit
                     enforcement.

              ignwalltime
                     If set to true, then pbs_mom will ignore walltime limit
                     enforcement.

              job_output_file_mask
                     Specifies a mask for creating job output and error files.
                     Values can be specified in base 8, 10, or 16; leading 0
                     implies octal and leading 0x or 0X hexadecimal.  A value
                     of "userdefault" will use the user's default umask.

                     $job_output_file_mask 027
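
                     The base rules for job_output_file_mask can be sketched
                     as below.  The function name parse_file_mask is
                     hypothetical, and the final lines assume the mask is
                     applied like a umask to a base mode of 0666, which the
                     text suggests but does not state.

```python
def parse_file_mask(value):
    """Return the mask as an int, or None for "userdefault".

    Leading 0x/0X means hexadecimal, a leading 0 means octal,
    anything else is decimal.
    """
    if value == "userdefault":
        return None
    if value.lower().startswith("0x"):
        return int(value, 16)
    if len(value) > 1 and value.startswith("0"):
        return int(value, 8)
    return int(value, 10)


mask = parse_file_mask("027")

# Assumption: the mask clears permission bits the way a umask does.
resulting_mode = 0o666 & ~mask
```

                     Under that assumption, a mask of 027 leaves output files
                     readable and writable by the owner, readable by the
                     group, and inaccessible to others.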

              log_directory
                     Changes the log directory.  Default is
                     $TORQUEHOME/mom_logs/.  $TORQUEHOME default is
                     /var/spool/torque/ but can be changed in the ./configure
                     script.  The value is a string and should be the full
                     path to the desired MOM log directory.

                     $log_directory /opt/torque/mom_logs/

              logevent
                     which sets the mask that determines which event types are
                     logged by pbs_mom.  For example:

                     $logevent 0x1ff
                     $logevent 255

                     The first example would set the log event mask to 0x1ff
                     (511) which enables logging of all events including debug
                     events.  The second example would set the mask to 0x0ff
                     (255) which enables all events except debug events.
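
                     The relationship between the two masks can be checked
                     with a little bit arithmetic.  The constant and function
                     names below are illustrative only; the one fact taken
                     from the text is that 0x1ff includes debug events and
                     0x0ff does not.

```python
ALL_EVENTS = 0x1ff   # all events including debug events
NO_DEBUG = 0x0ff     # all events except debug events

# The bit that differs between the two masks is the one the text
# associates with debug events.
DEBUG_BIT = ALL_EVENTS ^ NO_DEBUG


def debug_enabled(mask):
    """True if the given logevent mask includes the debug bit."""
    return bool(mask & DEBUG_BIT)
```

                     For instance, debug_enabled(511) is true while
                     debug_enabled(255) is false.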

              log_file_suffix
                     Optional suffix to append to log file names.  If %h is
                     the suffix, pbs_mom appends the hostname for where the
                     log files are stored if it knows it, otherwise it will
                     append the hostname where the MOM is running.  For
                     example, with

                     $log_file_suffix tom

                     a log file for 20100223 is named 20100223.tom.

              log_keep_days
                     Specifies how many days to keep log files.  pbs_mom
                     deletes log files older than the specified number of
                     days.  If not specified, pbs_mom won't delete log files
                     based on their age.

              loglevel
                     specifies the verbosity of logging with higher numbers
                     specifying more verbose logging.  Values may range
                     between 0 and 7.

              log_file_max_size
                     If this is set to a value > 0 then pbs_mom will roll the
                     current log file to log-file-name.1 when its size is
                     greater than or equal to the value of log_file_max_size.
                     This value is interpreted as kilobytes.

              log_file_roll_depth
                     If this is set to a value >= 1 and log_file_max_size is
                     set then pbs_mom will continue rolling the log files to
                     log-file-name.log_file_roll_depth.

              max_load
                     maximum processor load.  Nodes over this load average are
                     considered busy (see ideal_load above).

              memory_pressure_threshold
                     This option is only available if pbs_mom is enabled to
                     use cpusets.  If set to a value > 0, a job gets killed if
                     its memory pressure exceeds this value, and if
                     $memory_pressure_duration is set.  The default is 0
                     (memory pressure recording is off).
                     See cpuset(7) for more information about memory pressure.

              memory_pressure_duration
                     This option is only available if pbs_mom is enabled to
                     use cpusets.  Specifies the number of consecutive MOM
                     intervals a job's memory pressure must be above
                     $memory_pressure_threshold for the job to get killed.
                     The default is 0 (jobs are never killed due to memory
                     pressure).
                     See cpuset(7) for more information about memory pressure.

              node_check_script
                     specifies the fully qualified pathname of the health
                     check script to run (see HEALTH CHECK for more
                     information).

              node_check_interval
                     specifies when to run the MOM health check.  The check
                     can be either periodic, event-driven, or both.  The value
                     starts with an integer specifying the number of MOM
                     intervals between subsequent executions of the specified
                     health check.  After the integer is an optional
                     comma-separated list of event names.  Currently supported
                     are "jobstart" and "jobend".  This value defaults to 1
                     with no events, indicating the check is run every MOM
                     interval (see HEALTH CHECK for more information).

                     $node_check_interval 0  #Disabled.
                     $node_check_interval 0,jobstart  #Only runs at job starts
                     $node_check_interval 10,jobstart,jobend
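
                     The value format of node_check_interval can be sketched
                     as a small parser.  The function name
                     parse_node_check_interval is hypothetical, and the sketch
                     assumes any trailing "#" comment has already been removed
                     from the value.

```python
SUPPORTED_EVENTS = {"jobstart", "jobend"}


def parse_node_check_interval(value="1"):
    """Return (interval, events) for a value such as '10,jobstart,jobend'.

    The default of "1" with no events mirrors the documented default.
    """
    head, *events = [part.strip() for part in value.split(",")]
    interval = int(head)
    unknown = set(events) - SUPPORTED_EVENTS
    if unknown:
        raise ValueError("unsupported event(s): " + ", ".join(sorted(unknown)))
    return interval, set(events)


disabled = parse_node_check_interval("0")
start_only = parse_node_check_interval("0,jobstart")
both = parse_node_check_interval("10,jobstart,jobend")
```

                     The three calls correspond to the three example lines
                     above: periodic checks disabled, checks at job start
                     only, and checks every 10 MOM intervals plus at job start
                     and job end.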

              nodefile_suffix
                     Specifies the suffix to append to a host name to denote
                     the data channel network adapter in a multihomed compute
                     node.

                     $nodefile_suffix i

                     With the suffix of 'i' and the control channel adapter
                     with the name node01, the data channel would have a
                     hostname of node01i.

              nospool_dir_list
                     If the job's output file should be in one of the paths
                     specified here, then it will be spooled directly in that
                     directory instead of the normal spool directory.
                     Specified in the format path1, path2, etc.

                     $nospool_dir_list /home/mike/*,/var/tmp/spool/

              pbsclient
                     which causes a host name to be added to the list of hosts
                     which will be allowed to connect to MOM as long as they
                     are using a privileged port for the purposes of resource
                     monitor requests.  For example, here are two
                     configuration file lines which will allow the hosts
                     "fred" and "wilma" to connect:

                     $pbsclient      fred
                     $pbsclient      wilma

                     Two host names are always allowed to connect to pbs_mom:
                     "localhost" and the name returned to pbs_mom by the
                     system call gethostname().  These names need not be
                     specified in the configuration file.  The hosts listed as
                     "clients" can issue Resource Monitor (RM) requests.
                     Other MOM nodes and servers do not need to be listed as
                     clients.

              pbsserver
                     which defines hostnames running pbs_server that will be
                     allowed to submit jobs, issue Resource Monitor (RM)
                     requests, and get status updates.  MOM will continually
                     attempt to contact all server hosts for node status and
                     state updates.  Like $PBS_SERVER_HOME/server_name, the
                     hostname may be followed by a colon and a port number.
                     This parameter replaces the oft-confused $clienthost
                     parameter from TORQUE 2.0.0p0 and earlier.  Note that the
                     hostname in $PBS_SERVER_HOME/server_name is used if no
                     $pbsserver parameters are found.

              prologalarm
                     Specifies maximum duration (in seconds) which the MOM
                     will wait for the job prolog or job epilog to complete.
                     This parameter defaults to 300 seconds (5 minutes).

              rcpcmd Specifies the full path and arguments to be used for
                     remote file copies.  This overrides the compile-time
                     default found in configure.  This must contain 2 words:
                     the full path to the command and the switches.  The copy
                     command must be able to recursively copy files to the
                     remote host and accept arguments of the form
                     "user@host:files".  For example:

                     $rcpcmd /usr/bin/rcp -rp
                     $rcpcmd /usr/bin/scp -rpB

              restricted
                     which causes a host name to be added to the list of hosts
                     which will be allowed to connect to MOM without needing
                     to use a privileged port.  These names allow for wildcard
                     matching.  For example, here is a configuration file line
                     which will allow queries from any host from the domain
                     "ibm.com".

                     $restricted      *.ibm.com

                     The restriction which applies to these connections is
                     that only internal queries may be made.  No resources
                     from a config file will be found.  This is to prevent any
                     shell commands from being run by a non-root process.
                     This parameter is generally not required except for some
                     versions of OSX.
471
472              remote_checkpoint_dirs
473                     Specifies what server checkpoint directories are remotely
474                     mounted.  This directive is used to tell  the  MOM  which
475                     directories  are  shared  with  the server.  Using remote
476                     checkpoint directories eliminates the need  to  copy  the
477                     checkpoint  files  back and forth between the MOM and the
478                     server. This parameter is available in 2.4.1 and later.
479
480                     $remote_checkpoint_dirs /var/spool/torque/checkpoint
481
482              remote_reconfig
483                     Enables the ability to remotely reconfigure pbs_mom  with
484                     a  new config file.  Default is disabled.  This parameter
485                     accepts various forms of true, yes, and 1.
486
487              source_login_batch
488                     Specifies whether or not mom will  source  the  /etc/pro‐
489                     file,  etc.  type files for batch jobs. Parameter accepts
490                     various forms of true, false, yes, no, 1 and  0.  Default
491                     is True.
492
493              source_login_interactive
494                     Specifies  whether  or  not mom will source the /etc/pro‐
495                     file, etc. type files  for  interactive  jobs.  Parameter
496                     accepts  various  forms of true, false, yes, no, 1 and 0.
497                     Default is True.
498
499              spool_as_final_name
500                     If set to true, jobs will spool directly as their  output
501                     files,  with  no intermediate locations or steps. This is
502                     mostly useful for shared filesystems  with  fast  writing
503                     capability.
504
505              status_update_time
506                     Specifies  (in  seconds) how often MOM updates its status
507                     information to pbs_server.  This value  should  correlate
508                     with  the  server's  scheduling  interval.   High  values
509                     increase the load of pbs_server  and  the  network.   Low
510                     values  cause  pbs_server  to  report  stale information.
511                     Default is 45 seconds.
512
513              tmpdir Sets the  directory  basename  for  a  per-job  temporary
514                     directory.   Before job launch, MOM will append the jobid
515                     to the tmpdir basename and create the  directory.   After
516                     the  job  exit,  MOM will recursively delete it.  The env
517                     variable TMPDIR will be set for all  pro/epilog  scripts,
518                     the job script, and TM tasks.
519                     Directory  creation  and removal is done as the job owner
520                     and group, so the owner must  have  write  permission  to
521                     create  the  directory.   If the directory already exists
522                     and is owned by the job owner, it  will  not  be  deleted
523                     after  the  job.   If the directory already exists and is
524                     NOT owned by  the  job  owner,  the  job  start  will  be
525                     rejected.
526
527              timeout
528                     Specifies  the number of seconds before TCP messages will
529                     time out.  TCP messages include job  obituaries,  and  TM
530                     requests if RPP is disabled.  Default is 60 seconds.
531
532              usecp  specifies  which  directories  should  be  staged with cp
533                     instead of rcp/scp.  If a shared filesystem is  available
534                     on all hosts in a cluster, this directive is used to make
535                     these filesystems known to MOM.  For example, if /home is
536                     NFS mounted on all nodes in a cluster:
537
538                     $usecp *:/home  /home
539
540              varattr
541                     This is similar to the shell escape described above, but
542                     includes a TTL (time to live) in seconds.  The command is
543                     rerun only once every TTL seconds.  A TTL of -1 causes
544                     the command to be executed only once.  A TTL of 0 causes
545                     the command to be run every time varattr is requested.
546                     This parameter may be used multiple times, but all output
547                     is grouped into a single "varattr" attribute in the
548                     request and status output.  The command should output
549                     data in the form varattrname=value1[+value2]...
550
551                     $varattr 3600 /path/to/script [<ARGS>]...
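
                     A minimal sketch of a script that could be named in a
                     $varattr line, assuming that multiple values are joined
                     with "+" as in the form shown above.  The helper
                     emit_varattr and the attribute name scratchfs are
                     hypothetical.

```shell
#!/bin/sh
# Hypothetical varattr script: prints one name=value1[+value2]... line
# per attribute on stdout.
emit_varattr() {
    name=$1
    shift
    # "$*" joins the remaining arguments with the first character of IFS.
    old_ifs=$IFS
    IFS='+'
    printf '%s=%s\n' "$name" "$*"
    IFS=$old_ifs
}

emit_varattr dataset13 20070104
emit_varattr scratchfs lustre gpfs
```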
552
553              use_smt
554                     This option is available only if pbs_mom is enabled to
555                     use cpusets.  It has an effect only if there is more than
556                     one logical processor per physical core in the system
557                     (simultaneous multithreading or hyperthreading is enabled
558                     via BIOS settings).  If set to true, all logical
559                     processors of allocated cores are added to the cpuset of
560                     a job.  If set to false, only the first logical processor
561                     per allocated core is contained in the cpuset of a job.
562                     The default is true.
563
564              wallmult
565                     Sets a factor used to adjust the wall time usage of a job
566                     to a common reference system.  The factor is used for
567                     walltime calculations and limits in the same way that
568                     cputmult is used for cpu time.
569
570       The configuration file must be executable and  "secure".   It  must  be
571       owned by a user id and group id less than 10 and not be world writable.
572       Output from this file must be in the format $VAR=$VAL, for example:
573
574              dataset13=20070104
575              dataset22=20070202
576              viraltest=abdd3
577
578       xauthpath
579              Specifies the path to the xauth binary to enable X11 forwarding.
580
581       mom_host
582              Sets the local hostname as used by pbs_mom.
583

LAYOUT FILE

585       There is also an optional layout file for creating multiple moms on one
586       box in a specified layout.  In the file, each mom on the single box is
587       given its own hostname, cpu indexes, memory nodes (a Linux construct),
588       and memory size.  This is useful for NUMA systems.  Each line in the
589       file specifies one mom and has the following format:
590
591              <hostname> cpus=<X> mem=<Y> memsize=<Z>
592
593       cpus and mem can be comma separated lists, while memsize should be a
594       memory size in the format <number><units>.  For example, a file could
595       contain the following line:
596
597              foohost-1 cpus=1,2 mem=1,2,3,4 memsize=8GB
598
599       This would specify that foohost-1 has cpus 1 and 2, memory nodes 1-4,
600       and a total of 8 GB of memory.
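
       To make the line format concrete, the following sketch splits one
       layout line into its fields.  It is not how pbs_mom reads the file;
       the function name parse_layout_line and the output format are
       hypothetical.

```shell
#!/bin/sh
# Hypothetical parser for one layout line:
#   <hostname> cpus=<X> mem=<Y> memsize=<Z>
parse_layout_line() {
    # Split the line on whitespace: first word is the hostname,
    # the remaining words are key=value pairs.
    set -- $1
    host=$1
    shift
    cpus='' mem='' memsize=''
    for kv in "$@"; do
        case $kv in
            cpus=*)    cpus=${kv#cpus=} ;;
            memsize=*) memsize=${kv#memsize=} ;;
            mem=*)     mem=${kv#mem=} ;;
        esac
    done
    printf 'host=%s cpus=%s mem=%s memsize=%s\n' "$host" "$cpus" "$mem" "$memsize"
}

parse_layout_line "foohost-1 cpus=1,2 mem=1,2,3,4 memsize=8GB"
```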
601

RESOURCES

603       Resource Monitor queries  can  be  made  with  momctl's  -q  option  to
604       retrieve  and  set pbs_mom options.  Any configured static resource may
605       be retrieved with a request of  the  same  name.   These  are  resource
606       requests not otherwise documented in the PBS ERS.
607
608       cycle  forces an immediate MOM cycle
609
610       status_update_time
611              retrieve or set the $status_update_time parameter
612
613       check_poll_time
614              retrieve or set the $check_poll_time parameter
615
616       configversion
617              retrieve the config version
618
619       jobstartblocktime
620              retrieve or set the $jobstartblocktime parameter
621
622       enablemomrestart
623              retrieve or set the $enablemomrestart parameter
624
625       loglevel
626              retrieve or set the $loglevel parameter
627
628       down_on_error
629              retrieve or set the EXPERIMENTAL $down_on_error parameter
630
631       diag0 - diag4
632              retrieves various diagnostic information
633
634       rcpcmd retrieve or set the $rcpcmd parameter
635
636       version
637              retrieves the pbs_mom version
638

HEALTH CHECK

640       The  health  check  script  is  executed directly by the pbs_mom daemon
641       under the root user id. It must be accessible from the compute node and
642       may be a script or compiled executable program.  It may make any needed
643       system calls and execute any combination of system utilities but should
644       not  execute  resource  manager  client  commands.   Also, as of TORQUE
645       1.0.1, the pbs_mom daemon blocks until the health  check  is  completed
646       and does not possess a built-in timeout.  Consequently, it is advisable
647       to keep the health check's execution time short and verify that the
648       script will not block even under failure conditions.
649
650       If the script detects a failure, it should write the keyword 'ERROR'
651       to stdout followed by an error message.  The message (up to 256
652       characters) immediately following the ERROR string will be assigned to
653       the node attribute 'message' of the associated node.
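
       A minimal sketch of such a script, assuming a free-disk-space check as
       the failure condition.  The threshold MIN_FREE_KB, the directory
       CHECK_DIR, and the function names are hypothetical; only the ERROR
       keyword on stdout comes from the description above.

```shell
#!/bin/sh
# Hypothetical health check: print "ERROR <message>" on stdout when the
# checked directory has too little free space, and print nothing otherwise.
MIN_FREE_KB=${MIN_FREE_KB:-102400}
CHECK_DIR=${CHECK_DIR:-/tmp}

# Free space in kilobytes for the filesystem holding $1 (POSIX df format).
get_free_kb() {
    df -Pk "$1" 2>/dev/null | awk 'NR == 2 { print $4 }'
}

# $1 = observed free KB, $2 = minimum KB, $3 = directory
report_health() {
    case $1 in
        ''|*[!0-9]*)
            echo "ERROR cannot read free space for $3"
            return 0 ;;
    esac
    if [ "$1" -lt "$2" ]; then
        echo "ERROR low disk space on $3: $1 KB free"
    fi
    return 0
}

report_health "$(get_free_kb "$CHECK_DIR")" "$MIN_FREE_KB" "$CHECK_DIR"
```

       The check runs one short command and treats unreadable input as a
       failure rather than waiting, which keeps it from blocking the daemon.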
654
655       If the script detects a failure when run from "jobstart", then the  job
656       will  be  rejected.   This  should  probably only be used with advanced
657       schedulers like Moab so that the job can be routed to another node.
658
659       TORQUE currently ignores ERROR messages by default, but advanced
660       schedulers like Moab can be configured to react appropriately.
661
662       If the experimental $down_on_error MOM setting is enabled, MOM will set
663       itself to state down and report  to  pbs_server;  and  pbs_server  will
664       report   the   node   as   "down".    Additionally,   the  experimental
665       "down_on_error" server attribute can be  enabled  which  has  the  same
666       effect but moves the decision to pbs_server.  It is redundant to enable
667       both MOM's $down_on_error and pbs_server's down_on_error features.
668       See "down_on_error" in pbs_server_attributes(7B).
669

FILES

671       $PBS_SERVER_HOME/server_name
672              contains the name of the host running pbs_server.
673
674       $PBS_SERVER_HOME/mom_priv
675              the default directory for configuration files, typically
676              (/usr/spool/pbs)/mom_priv.
677
678       $PBS_SERVER_HOME/mom_logs
679              directory for log files recorded by the server.
680
681       $PBS_SERVER_HOME/mom_priv/prologue
682              the administrative script to be run before job execution.
683
684       $PBS_SERVER_HOME/mom_priv/epilogue
685              the administrative script to be run after job execution.
686

SIGNAL HANDLING

688       pbs_mom handles the following signals:
689
690       SIGHUP causes pbs_mom to re-read  its  configuration  file,  close  and
691              reopen the log file, and reinitialize resource structures.
692
693       SIGALRM
694              results  in  a  log  file entry. The signal is used to limit the
695              time taken by certain child processes, such as the prologue
696              and epilogue.
697
698       SIGINT and SIGTERM
699              result in pbs_mom exiting without terminating any running jobs.
700              This is also the action for the following signals: SIGXCPU,
701              SIGXFSZ, SIGCPULIM, and SIGSHUTDN.
702
703       SIGUSR1, SIGUSR2
704              cause MOM to increase and decrease the logging level,
705              respectively.
706
707       SIGPIPE, SIGINFO
708              are ignored.
709
710       SIGBUS, SIGFPE, SIGILL, SIGTRAP, and SIGSYS
711              cause a core dump if the PBSCOREDUMP environment variable is
712              defined.
713
714       All other signals have their default behavior installed.
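
       As a sketch of how an administrator might use these signals, the
       helper below maps an action to the signal documented above and prints
       the kill command as a dry run.  The action names, the function
       mom_signal_for, and the MOM_PID placeholder are hypothetical; how to
       find the pbs_mom process id is not covered here.

```shell
#!/bin/sh
# Hypothetical helper: map an administrative action to a documented signal.
mom_signal_for() {
    case $1 in
        reload)   echo HUP  ;;  # re-read config, reopen log file
        log-more) echo USR1 ;;  # increase logging level
        log-less) echo USR2 ;;  # decrease logging level
        stop)     echo TERM ;;  # exit without terminating running jobs
        *)        echo "unknown action: $1" >&2
                  return 1 ;;
    esac
}

# Dry run: print the command instead of sending the signal.
# MOM_PID is a placeholder; supply the real pbs_mom process id.
MOM_PID=${MOM_PID:-12345}
echo "kill -s $(mom_signal_for reload) $MOM_PID"
```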
715

EXIT STATUS

717       If  the  mini-server command fails to begin operation, the server exits
718       with a value greater than zero.
719

SEE ALSO

721       pbs_server(8B), pbs_scheduler_basl(8B), pbs_scheduler_tcl(8B), the  PBS
722       External Reference Specification, and the PBS Administrator's Guide.
723
724
725
726Local                                                              pbs_mom(8B)