SGE_SHEPHERD(8)       Grid Engine Administrative Commands      SGE_SHEPHERD(8)

NAME

       sge_shepherd - Grid Engine single job controlling agent

SYNOPSIS

       sge_shepherd

DESCRIPTION

       sge_shepherd provides the parent process functionality for a single
       Grid Engine job. The parent functionality is necessary on UNIX systems
       to retrieve resource usage information (see getrusage(2)) after a job
       has finished. In addition, sge_shepherd forwards signals to the job,
       such as the signals for suspension, enabling, termination, and the
       Grid Engine checkpointing signal (see sge_ckpt(1) for details).

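       As a rough illustration of the parent role described above (not the
       actual sge_shepherd implementation), the Python sketch below forks a
       child, relays a few signals to it, and collects the child's resource
       usage with os.wait4(), which returns the same kind of record as
       getrusage(2). The function name run_and_account and the set of
       forwarded signals are assumptions made for this example only.

```python
import os
import signal
import sys

# Hypothetical choice of catchable signals for this sketch; the signals
# Grid Engine really uses are site configuration and are not modeled here.
FORWARDED = (signal.SIGTERM, signal.SIGUSR1, signal.SIGUSR2)


def run_and_account(argv, forwarded=FORWARDED):
    """Run argv as a child process and return (exit_code, rusage).

    Must be called from the main thread, because signal handlers can only
    be installed there.
    """
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the job.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed.
            os._exit(127)

    def relay(signum, _frame):
        # Parent: pass the signal on to the child instead of acting on it.
        try:
            os.kill(pid, signum)
        except ProcessLookupError:
            pass  # child already gone

    previous = {sig: signal.signal(sig, relay) for sig in forwarded}
    try:
        # wait4 reaps the child and reports its resource usage in one call.
        _, status, usage = os.wait4(pid, 0)
    finally:
        # Restore whatever handlers were installed before.
        for sig, handler in previous.items():
            signal.signal(sig, handler if handler is not None else signal.SIG_DFL)

    if os.WIFEXITED(status):
        code = os.WEXITSTATUS(status)
    else:
        code = -os.WTERMSIG(status)  # negative: killed by that signal
    return code, usage


if __name__ == "__main__":
    code, usage = run_and_account([sys.executable, "-c", "print('job ran')"])
    print("exit code:", code, "user CPU seconds:", usage.ru_utime)
```

       Only a parent that waits for the child can obtain this usage record,
       which is why a dedicated per-job parent process is needed.
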
       sge_shepherd receives information about the job to be started from
       sge_execd(8). During the execution of the job it starts up to five
       child processes. First, a prolog script is run if this feature is
       enabled by the prolog parameter in the cluster configuration (see
       sge_conf(5)). Next, a parallel environment startup procedure is run
       if the job is a parallel job (see sge_pe(5) for more information).
       After that, the job itself is run, followed by a parallel environment
       shutdown procedure for parallel jobs, and finally an epilog script if
       requested by the epilog parameter in the cluster configuration. The
       prolog and epilog scripts as well as the parallel environment startup
       and shutdown procedures are to be provided by the Grid Engine
       administrator and are intended for site-specific actions to be taken
       before and after execution of the actual user job.

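       The order of the child processes described above can be sketched as
       follows. This is a simplified model, not Grid Engine code: the
       function plan_children and its arguments are hypothetical, and a
       parallel job is represented simply by a (startup, shutdown) pair.

```python
from typing import List, Optional, Tuple


def plan_children(
    job: str,
    prolog: Optional[str] = None,
    epilog: Optional[str] = None,
    pe: Optional[Tuple[str, str]] = None,
) -> List[Tuple[str, str]]:
    """Return the ordered (step, command) list for one job.

    prolog/epilog are None when not configured; pe is None for a serial
    job, otherwise a (startup, shutdown) pair of commands.
    """
    steps: List[Tuple[str, str]] = []
    if prolog is not None:
        steps.append(("prolog", prolog))
    if pe is not None:
        steps.append(("parallel start", pe[0]))
    steps.append(("job", job))  # the job itself always runs
    if pe is not None:
        steps.append(("parallel stop", pe[1]))
    if epilog is not None:
        steps.append(("epilog", epilog))
    return steps


if __name__ == "__main__":
    for step, command in plan_children(
        "job.sh", prolog="pro.sh", epilog="epi.sh", pe=("start.sh", "stop.sh")
    ):
        print(step, "->", command)
```

       A serial job with neither prolog nor epilog configured yields a
       single step; a parallel job with both configured yields all five.
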
       After the job has finished and the epilog script has been processed,
       sge_shepherd retrieves resource usage statistics about the job, places
       them in a job specific subdirectory of the sge_execd(8) spool directory
       for reporting through sge_execd(8), and exits.

       sge_shepherd also places an exit status file in the spool directory.
       This exit status can be viewed with qacct -j JobId (see qacct(1)); it
       is not the exit status of sge_shepherd itself but of one of the
       parameters (methods) passed to sge_shepherd. This exit status can have
       several meanings, depending on where an error occurred (if any). The
       possible parameters are: prolog, parallel start, job, parallel stop,
       epilog, suspend, restart, terminate, clean, migrate, and checkpoint.

       The following exit values are returned:

       0      All methods: Operation was executed successfully.

       99     Job script, prolog and epilog: When FORBID_RESCHEDULE is not set
              in the configuration (see sge_conf(5)), the job gets re-queued.
              Otherwise see "Other".

       100    Job script, prolog and epilog: When FORBID_APPERROR is not set
              in the configuration (see sge_conf(5)), the job gets re-queued.
              Otherwise see "Other".

       Other  Job script: This is the exit status of the job itself. No action
              is taken upon this exit status because its meaning is not
              known.
              Prolog, epilog and parallel start: The queue is set to error
              state and the job is re-queued.
              Parallel stop: The queue is set to error state, but the job is
              not re-queued. It is assumed that the job itself ran
              successfully and only the cleanup script failed.
              Suspend, restart, terminate, clean, and migrate: Always
              successful.
              Checkpoint: Success, except for kernel checkpointing, where it
              means that the checkpoint was not successful and did not happen
              (migration will still be performed by Grid Engine).
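
       The table above can be read as a small decision function. The Python
       sketch below is one possible encoding of it, written only from the
       wording of this page; the names Action and interpret are hypothetical,
       and the kernel checkpointing special case is deliberately not modeled.

```python
from typing import NamedTuple


class Action(NamedTuple):
    requeue: bool      # the job is re-queued
    queue_error: bool  # the queue is set to error state


SCRIPTS = {"job", "prolog", "epilog"}
ALWAYS_OK = {"suspend", "restart", "terminate", "clean", "migrate"}
METHODS = SCRIPTS | ALWAYS_OK | {"parallel start", "parallel stop", "checkpoint"}


def interpret(method: str, status: int,
              forbid_reschedule: bool = False,
              forbid_apperror: bool = False) -> Action:
    """Map (method, exit status) to the reaction listed in the table."""
    if method not in METHODS:
        raise ValueError("unknown method: %r" % method)
    if status == 0:
        return Action(False, False)
    if method in SCRIPTS:
        if status == 99 and not forbid_reschedule:
            return Action(True, False)
        if status == 100 and not forbid_apperror:
            return Action(True, False)
    # Everything below is the "Other" row.
    if method == "job":
        return Action(False, False)  # meaning unknown, no action
    if method in ("prolog", "epilog", "parallel start"):
        return Action(True, True)
    if method == "parallel stop":
        return Action(False, True)
    # suspend, restart, terminate, clean, migrate: always successful.
    # checkpoint: treated as success here (kernel checkpointing not modeled).
    return Action(False, False)


if __name__ == "__main__":
    print(interpret("job", 99))
    print(interpret("prolog", 1))
```

       Note that 99 and 100 fall through to the "Other" row whenever the
       corresponding FORBID_ parameter is set or the method is not a job
       script, prolog, or epilog.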

RESTRICTIONS

       sge_shepherd should not be invoked manually, but only by sge_execd(8).

FILES

       sgepasswd contains a list of user names and their corresponding
       encrypted Windows passwords. If available, the password file will
       be used by sge_shepherd. To change the contents of this file, use
       the sgepasswd command. Changing the file manually is not advised.

       <execd_spool>/job_dir/<job_id>     job specific directory

SEE ALSO

       sge_intro(1), sge_conf(5), sge_execd(8).

COPYRIGHT

       See sge_intro(1) for a full statement of rights and permissions.

GE 6.1                   $Date: 2007/07/19 08:17:19 $          SGE_SHEPHERD(8)