sge_shepherd(8)

GE_SHEPHERD(8)        Grid Engine Administrative Commands       GE_SHEPHERD(8)

NAME

       ge_shepherd - Grid Engine single job controlling agent

SYNOPSIS

       ge_shepherd

DESCRIPTION

       ge_shepherd acts as the parent process of a single Grid Engine job. A
       parent process is necessary on UNIX systems because the resource usage
       of a process (see getrusage(2)) can be retrieved only by its parent,
       after the process has finished. In addition, ge_shepherd forwards
       signals to the job, such as the signals for suspending, resuming
       (enabling) and terminating the job, and the Grid Engine checkpointing
       signal (see ge_ckpt(1) for details).
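       The parent role described above can be illustrated with a minimal C
       sketch. It is not the ge_shepherd source: the function run_and_account
       and the set of relayed signals (SIGTERM, SIGCONT, SIGUSR1, SIGUSR2) are
       hypothetical choices. The sketch forks the job, relays caught signals
       to it with kill(2), reaps it with waitpid(2) and then reads
       getrusage(RUSAGE_CHILDREN), which covers only children that have
       terminated and been waited for. SIGSTOP cannot be caught, so a real
       agent would need another mechanism for suspension.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Signals this sketch relays to the job (an assumption, not Grid Engine's list). */
static const int fwd_signals[] = { SIGTERM, SIGCONT, SIGUSR1, SIGUSR2 };
#define NFWD (sizeof fwd_signals / sizeof fwd_signals[0])

/* PID of the running child; 0 while no child is being supervised. */
static volatile sig_atomic_t child_pid = 0;

/* Relay a caught signal to the child. kill(2) is async-signal-safe. */
static void forward_signal(int sig)
{
    int saved_errno = errno;
    if (child_pid > 0)
        kill((pid_t)child_pid, sig);
    errno = saved_errno;
}

/*
 * Start argv[0] as a child, forward selected signals to it while it runs,
 * reap it and collect resource usage. Returns 0 on success, -1 on error.
 */
int run_and_account(char *const argv[], int *exit_code, struct rusage *usage)
{
    assert(argv != NULL && argv[0] != NULL && exit_code != NULL && usage != NULL);

    struct sigaction sa, old[NFWD];
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = forward_signal;
    sigemptyset(&sa.sa_mask);
    for (size_t i = 0; i < NFWD; i++)
        sigaction(fwd_signals[i], &sa, &old[i]);

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: exec resets caught signals to their default action. */
        execvp(argv[0], argv);
        _exit(127);
    }

    int status = 0;
    pid_t r = -1;
    if (pid > 0) {
        child_pid = pid;
        /* A forwarded signal may interrupt the wait; retry in that case. */
        do {
            r = waitpid(pid, &status, 0);
        } while (r < 0 && errno == EINTR);
        child_pid = 0;
    }

    /* Restore the previous signal dispositions. */
    for (size_t i = 0; i < NFWD; i++)
        sigaction(fwd_signals[i], &old[i], NULL);

    if (r < 0)
        return -1;  /* fork or waitpid failed */

    /* Usage of terminated, waited-for children of this process. */
    if (getrusage(RUSAGE_CHILDREN, usage) != 0)
        return -1;

    if (WIFEXITED(status))
        *exit_code = WEXITSTATUS(status);
    else if (WIFSIGNALED(status))
        *exit_code = 128 + WTERMSIG(status);  /* shell-style convention */
    else
        *exit_code = -1;
    return 0;
}
```

       For example, calling run_and_account with {"/bin/sh", "-c", "exit 3",
       NULL} sets the exit code to 3. Because RUSAGE_CHILDREN accumulates over
       all reaped children, the sketch assumes one job per process; wait4(2),
       where available, returns the usage of a single child instead.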

       ge_shepherd receives information about the job to be started from
       ge_execd(8). While executing the job, it starts up to five child
       processes. First, a prolog script is run if this feature is enabled by
       the prolog parameter in the cluster configuration (see ge_conf(5)).
       Next, if the job is a parallel job, a parallel environment startup
       procedure is run (see sge_pe(5) for more information). Then the job
       itself is run, followed, for parallel jobs, by a parallel environment
       shutdown procedure, and finally by an epilog script if one is requested
       by the epilog parameter in the cluster configuration. The prolog and
       epilog scripts and the parallel environment startup and shutdown
       procedures are provided by the Grid Engine administrator and are
       intended for site-specific actions to be taken before and after
       execution of the actual user job.
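       The order of the five steps can be sketched in C as follows. The struct
       job_spec, the function run_job_steps and the STEP_SKIPPED marker are
       hypothetical; a step whose command is NULL is treated as not configured
       (or, for the parallel environment steps, as belonging to a sequential
       job). The sketch assumes that a failing step before the job stops the
       sequence, which is consistent with the re-queue behaviour listed under
       the exit values in this page, and it uses system(3) only to stay short.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical per-job description; NULL means "step not configured". */
struct job_spec {
    const char *prolog;    /* prolog parameter, see ge_conf(5) */
    const char *pe_start;  /* parallel environment startup (parallel jobs only) */
    const char *job;       /* the user job itself; must not be NULL */
    const char *pe_stop;   /* parallel environment shutdown (parallel jobs only) */
    const char *epilog;    /* epilog parameter, see ge_conf(5) */
};

enum step { STEP_PROLOG, STEP_PE_START, STEP_JOB, STEP_PE_STOP, STEP_EPILOG, STEP_COUNT };

#define STEP_SKIPPED (-2)  /* the step was not run */

/* Run one command through the shell; return its exit code, or -1 if it did not exit normally. */
static int run_step(const char *cmd)
{
    int status = system(cmd);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/*
 * Run the configured steps in the documented order and record each exit code.
 * Assumption: a non-zero exit before the job stops the sequence.
 * Returns the number of steps that were started (at most five).
 */
int run_job_steps(const struct job_spec *spec, int exit_codes[STEP_COUNT])
{
    assert(spec != NULL && spec->job != NULL && exit_codes != NULL);

    const char *cmds[STEP_COUNT] = {
        spec->prolog, spec->pe_start, spec->job, spec->pe_stop, spec->epilog
    };
    int started = 0;
    int stop = 0;

    for (int i = 0; i < STEP_COUNT; i++) {
        exit_codes[i] = STEP_SKIPPED;
        if (stop || cmds[i] == NULL)
            continue;
        exit_codes[i] = run_step(cmds[i]);
        started++;
        if (i < STEP_JOB && exit_codes[i] != 0)
            stop = 1;  /* e.g. the prolog failed: the job is not started */
    }
    return started;
}
```

       With prolog "true", job "exit 5", epilog "true" and no parallel
       environment, three steps run and the job's exit code 5 is recorded; if
       the prolog instead exits with 1, only the prolog runs.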

       After the job has finished and the epilog script has been processed,
       ge_shepherd retrieves resource usage statistics about the job, places
       them in a job-specific subdirectory of the ge_execd(8) spool directory,
       from which ge_execd(8) reports them, and then exits.
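       Recording usage data for ge_execd(8) to pick up can be sketched in C as
       below. The file name usage, the key=value format and the function
       write_usage_file are assumptions for illustration, not the actual spool
       format. Writing under a temporary name and renaming it with rename(2)
       means a reader never sees a half-written file.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

/*
 * Write the exit status and selected rusage fields as key=value lines to
 * <job_dir>/usage (hypothetical name and format). Returns 0 or -1.
 */
int write_usage_file(const char *job_dir, int exit_status, const struct rusage *ru)
{
    assert(job_dir != NULL && ru != NULL);

    char tmp[4096], final[4096];
    if (snprintf(tmp, sizeof tmp, "%s/usage.tmp", job_dir) >= (int)sizeof tmp ||
        snprintf(final, sizeof final, "%s/usage", job_dir) >= (int)sizeof final)
        return -1;  /* path too long */

    FILE *fp = fopen(tmp, "w");
    if (fp == NULL)
        return -1;

    fprintf(fp, "exit_status=%d\n", exit_status);
    fprintf(fp, "ru_utime=%lld.%06ld\n",
            (long long)ru->ru_utime.tv_sec, (long)ru->ru_utime.tv_usec);
    fprintf(fp, "ru_stime=%lld.%06ld\n",
            (long long)ru->ru_stime.tv_sec, (long)ru->ru_stime.tv_usec);
    fprintf(fp, "ru_maxrss=%ld\n", ru->ru_maxrss);

    /* Check for buffered write errors before publishing the file. */
    int write_err = ferror(fp);
    if (fclose(fp) != 0 || write_err) {
        remove(tmp);
        return -1;
    }
    /* Atomically replace any previous usage file. */
    return rename(tmp, final) == 0 ? 0 : -1;
}
```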

       ge_shepherd also places an exit status file in the spool directory.
       This exit status can be viewed with qacct -j JobId (see qacct(1)). It
       is not the exit status of ge_shepherd itself but of one of the methods
       that ge_shepherd executes, and its meaning depends on the method in
       which an error occurred (if any). The possible methods are: prolog,
       parallel start, job, parallel stop, epilog, suspend, restart,
       terminate, clean, migrate, and checkpoint.

       The following exit values are returned:

       0      All methods: Operation was executed successfully.

       99     Job script, prolog and epilog: When FORBID_RESCHEDULE is not set
              in the configuration (see ge_conf(5)), the job is re-queued.
              Otherwise see "Other".

       100    Job script, prolog and epilog: When FORBID_APPERROR is not set
              in the configuration (see ge_conf(5)), the job is re-queued.
              Otherwise see "Other".

       Other  Job script: This is the exit status of the job itself. No
              action is taken upon this exit status because its meaning is
              not known.
              Prolog, epilog and parallel start: The queue is set to error
              state and the job is re-queued.
              Parallel stop: The queue is set to error state, but the job is
              not re-queued. It is assumed that the job itself ran
              successfully and only the clean-up script failed.
              Suspend, restart, terminate, clean, and migrate: Always
              successful.
              Checkpoint: Success, except for kernel checkpointing, where the
              value means that the checkpoint was not successful and did not
              happen (Grid Engine will still migrate the job).
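       The exit value table can be restated as a C lookup function. The enum
       names, struct exec_config, classify_exit and its kernel_ckpt flag are
       hypothetical; the sketch only encodes the rules listed above and does
       not reflect how Grid Engine implements them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Methods named in this page. */
enum method {
    M_PROLOG, M_PE_START, M_JOB, M_PE_STOP, M_EPILOG,
    M_SUSPEND, M_RESTART, M_TERMINATE, M_CLEAN, M_MIGRATE, M_CHECKPOINT
};

/* Hypothetical names for the outcomes in the exit value table. */
enum action {
    A_SUCCESS,              /* operation treated as successful */
    A_REQUEUE,              /* job is re-queued */
    A_NONE,                 /* job's own exit status, no action taken */
    A_QUEUE_ERROR_REQUEUE,  /* queue set to error state, job re-queued */
    A_QUEUE_ERROR,          /* queue set to error state, job not re-queued */
    A_CHECKPOINT_FAILED     /* kernel checkpoint did not happen; migration still occurs */
};

/* The two ge_conf(5) settings that affect exit values 99 and 100. */
struct exec_config {
    bool forbid_reschedule;  /* FORBID_RESCHEDULE is set */
    bool forbid_apperror;    /* FORBID_APPERROR is set */
};

enum action classify_exit(enum method m, int status,
                          const struct exec_config *cfg, bool kernel_ckpt)
{
    assert(cfg != NULL);
    if (status == 0)
        return A_SUCCESS;

    /* 99 and 100 are special only for the job script, prolog and epilog. */
    bool special = (m == M_JOB || m == M_PROLOG || m == M_EPILOG);
    if (special && status == 99 && !cfg->forbid_reschedule)
        return A_REQUEUE;
    if (special && status == 100 && !cfg->forbid_apperror)
        return A_REQUEUE;

    /* Everything else falls under "Other". */
    switch (m) {
    case M_JOB:
        return A_NONE;
    case M_PROLOG:
    case M_EPILOG:
    case M_PE_START:
        return A_QUEUE_ERROR_REQUEUE;
    case M_PE_STOP:
        return A_QUEUE_ERROR;
    case M_CHECKPOINT:
        return kernel_ckpt ? A_CHECKPOINT_FAILED : A_SUCCESS;
    default:  /* suspend, restart, terminate, clean, migrate */
        return A_SUCCESS;
    }
}
```

       For example, a job script exiting with 99 is re-queued unless
       FORBID_RESCHEDULE is set, in which case it falls through to "Other"
       and no action is taken.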

RESTRICTIONS

       ge_shepherd should not be invoked manually, but only by ge_execd(8).

FILES

       sgepasswd
              contains a list of user names and their corresponding encrypted
              passwords. If the file is available, sge_shepherd uses it. To
              change the contents of this file, use the sgepasswd command;
              changing the file manually is not advised.

       <execd_spool>/job_dir/<job_id>
              job-specific directory

COPYRIGHT

       See ge_intro(1) for a full statement of rights and permissions.



GE 6.2u5                 $Date: 2007/07/19 09:04:33 $           GE_SHEPHERD(8)
