JSV(1) Grid Engine File Formats JSV(1)



JSV - Grid Engine Job Submission Verifier

JSV is an abbreviation for Job Submission Verifier. A JSV is a script
or binary that can be used to verify, modify or reject a job at the
time of job submission.

JSVs are triggered by submit clients like qsub, qrsh, qsh and qmon on
submit hosts (Client JSV), or they verify incoming jobs on the master
host (Server JSV), or both.

JSVs can be configured in various locations: a jsv_url can be provided
with the -jsv submit parameter during job submission, a corresponding
switch can be added to one of the sge_request files, or a jsv_url can
be configured in the global cluster configuration of the Grid Engine
installation.

All defined JSV instances are executed in the following order:

   1) qsub -jsv ...
   2) $cwd/.sge_request
   3) $HOME/.sge_request
   4) $SGE_ROOT/$SGE_CELL/common/sge_request
   5) Global configuration

The Client JSVs (1-3) can be defined by Grid Engine end users, whereas
the client JSV defined in the global sge_request file (4) and the
server JSV (5) can only be defined by Grid Engine administrators.

Because (4) and (5) are defined and configured by Grid Engine
administrators and are executed as the last JSV instances in the
sequence, they give an administrator an additional way to enforce
policies for a cluster.

As soon as one JSV instance rejects a job, the whole verification
process is stopped and the end user gets an error message stating that
the submission of the job has failed.

If a JSV accepts a job, with or without applying modifications, the
next JSV instance receives the job parameters, including all
modifications, as input for its verification. This continues until the
job has either been rejected or been accepted by the last JSV instance.
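
The ordering and hand-off rules above can be modelled in a few lines.
This is a minimal in-process sketch, not how Grid Engine runs JSVs:
each JSV is assumed to be a plain Python function returning a
(result_type, modified_parameters, message) tuple, whereas real JSVs
are separate processes speaking the stdin/stdout protocol. The
functions run_jsv_chain, add_default_project and require_project, and
the use of a P parameter for the project, are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical in-process model: each JSV is a function that gets the job
# parameters and returns (result_type, modified_params_or_None, message).
Params = Dict[str, str]
JsvFunc = Callable[[Params], Tuple[str, Optional[Params], str]]


def run_jsv_chain(jsvs: List[JsvFunc], params: Params) -> Tuple[bool, Params, str]:
    """Run JSV instances in configuration order and stop at the first rejection."""
    current = dict(params)
    for jsv in jsvs:
        result, modified, message = jsv(dict(current))
        if result in ("REJECT", "REJECT_WAIT"):
            # One rejection stops the whole verification; the submission fails.
            return False, current, message
        if result == "CORRECT" and modified is not None:
            # The next JSV receives the job including these modifications.
            current = modified
        # On ACCEPT the job is handed on unchanged.
    return True, current, ""


def add_default_project(p: Params) -> Tuple[str, Optional[Params], str]:
    # Hypothetical user JSV: set a project if none was requested.
    if "P" in p:
        return "ACCEPT", None, ""
    q = dict(p)
    q["P"] = "default_project"
    return "CORRECT", q, ""


def require_project(p: Params) -> Tuple[str, Optional[Params], str]:
    # Hypothetical admin JSV: reject jobs that have no project.
    if "P" not in p:
        return "REJECT", None, "no project specified"
    return "ACCEPT", None, ""


ok, job, msg = run_jsv_chain([add_default_project, require_project], {"N": "Sleeper"})
print(ok, job["P"])  # True default_project
```

Because the user JSV runs first, the administrator's JSV sees the
already corrected job; run on its own, require_project would reject it.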

More information on how to use Client JSVs can be found in qsub(1), and
on Server JSVs in sge_conf(5).

A Client or Server JSV is started as its own UNIX process. This process
communicates with either a Grid Engine client process or the master
daemon by exchanging commands, job parameters and other data via its
stdin/stdout channels.

Client JSV instances are started by client applications before a job is
sent to qmaster. Such an instance verifies the job that is about to be
submitted; after that verification the JSV process is stopped.

Server JSV instances are started for each worker thread of the qmaster
process (for version 6.2 of Grid Engine this means that two processes
are started). Each of those processes verifies the parameters of
multiple jobs for as long as the master is running, the underlying JSV
configuration is unchanged and no error occurs.
68
70 The timeout is a modifiable value that will measure the response time
71 of either the client or server JSV. In the event that the response time
72 of the JSV is longer than timeout value specified, this will result in
73 the JSV being re-started. The server JSV timeout value is specified
74 through the qmaster parameter jsv_timeout. The client JSV timeout
75 value is set through the environment variable SGE_JSV_TIMEOUT. The
76 default value is 10 seconds, and this value must be greater than 0. If
77 the timeout has been reach, the JSV will only try to re-start once, if
78 the timeout is reached again an error will occur.
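
The restart-once rule can be sketched as follows. This is an
illustration, not Grid Engine's implementation: ask_with_restart and
client_timeout_from_env are hypothetical, the caller is assumed to
supply two hypothetical callables (ask_jsv, which returns the JSV's
reply or None on timeout, and restart_jsv), and SGE_JSV_TIMEOUT is
assumed to hold a plain number of seconds.

```python
import os
from typing import Callable, Optional


class JsvTimeout(Exception):
    """The JSV missed the timeout twice in a row."""


def client_timeout_from_env(default: float = 10.0) -> float:
    # Assumption: SGE_JSV_TIMEOUT holds a number of seconds.
    timeout = float(os.environ.get("SGE_JSV_TIMEOUT", default))
    if timeout <= 0:
        raise ValueError("the JSV timeout must be greater than 0")
    return timeout


def ask_with_restart(
    ask_jsv: Callable[[float], Optional[str]],
    restart_jsv: Callable[[], None],
    timeout_s: float,
) -> str:
    """Send one request; on a timeout restart the JSV once and retry once."""
    reply = ask_jsv(timeout_s)
    if reply is not None:
        return reply
    restart_jsv()                      # first timeout: restart exactly once
    reply = ask_jsv(timeout_s)
    if reply is None:                  # second timeout: report an error
        raise JsvTimeout(f"no JSV reply within {timeout_s} s after a restart")
    return reply


# Stand-in callables: the first request times out, the second one succeeds.
replies = iter([None, "STARTED"])
restarts = []
print(ask_with_restart(lambda t: next(replies), lambda: restarts.append(1), 10.0))  # STARTED
print(len(restarts))  # 1
```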

The threshold value is defined by the qmaster parameter jsv_threshold.
It applies to the time taken by a server job verification: if this
time exceeds the defined threshold, additional logging appears in the
master message file at the INFO level. The value is specified in
milliseconds and defaults to 5000. A value of 0 means that all jobs are
logged in the message file.
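
The logging decision reduces to a single comparison. The sketch below
uses a hypothetical function name and only illustrates the rule as
described here, including the special meaning of 0.

```python
import time


def should_log_verification(elapsed_ms: float, jsv_threshold_ms: int = 5000) -> bool:
    """Return True if a server job verification gets extra INFO logging."""
    if jsv_threshold_ms == 0:
        return True                    # 0 means: log every job verification
    return elapsed_ms > jsv_threshold_ms


start = time.monotonic()
# ... the server JSV would verify one job here ...
elapsed_ms = (time.monotonic() - start) * 1000.0
print(should_log_verification(elapsed_ms))       # False: far below 5000 ms
print(should_log_verification(elapsed_ms, 0))    # True
```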

After a JSV script or binary is started, it receives commands through
its stdin stream and has to respond with certain commands on its stdout
stream. Data sent via the stderr stream of a JSV instance is ignored.
Each command sent to or by a JSV script has to be terminated by a
newline character ('\n'); newline characters are not allowed anywhere
within the command string itself.

In general, commands exchanged between a JSV and the client/qmaster
have the following format. Commands and arguments are case sensitive.
The command syntax in EBNF is:

command := command_name ' ' { argument ' ' } ;

A command starts with a command_name followed by a space character and
a space-separated list of arguments.
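
A minimal sketch of reading and writing single protocol lines,
assuming tokens are separated by exactly one space with no trailing
space. parse_command and format_command are hypothetical helper names;
the max_args parameter is a choice of this sketch so that a final
argument which may contain spaces (such as a LOG message) stays in one
piece.

```python
from typing import List, Tuple


def parse_command(line: str, max_args: int = -1) -> Tuple[str, List[str]]:
    """Split one protocol line into (command_name, arguments).

    max_args limits the number of splits; -1 splits on every space.
    """
    if line.endswith("\n"):
        line = line[:-1]               # drop the terminating newline
    if "\n" in line:
        raise ValueError("newline characters are not allowed inside a command")
    parts = line.split(" ", max_args)
    return parts[0], parts[1:]         # names are case sensitive: no case folding


def format_command(name: str, *args: str) -> str:
    """Build one protocol line, terminated by exactly one newline."""
    text = " ".join((name,) + args)
    if "\n" in text:
        raise ValueError("newline characters are not allowed inside a command")
    return text + "\n"


print(parse_command("PARAM l_hard a=1,b=5\n", max_args=2))   # ('PARAM', ['l_hard', 'a=1,b=5'])
print(parse_command("LOG INFO job looks fine", max_args=2))  # ('LOG', ['INFO', 'job looks fine'])
print(repr(format_command("PARAM", "N", "Sleeper")))         # 'PARAM N Sleeper\n'
```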

The following commands have to be implemented by a JSV script so that
it conforms to version 1.0 of the JSV protocol, which was first
implemented in Grid Engine 6.2u2:

begin_command := 'BEGIN' ;
       After a JSV instance has received all env_commands and
       param_commands of a job that should be verified, the
       client/qmaster triggers the verification process by sending one
       begin_command. After that it waits for param_commands and
       env_commands sent back from the JSV instance to modify the job
       specification. As part of the verification process, a JSV script
       or binary has to use the result_command to indicate that the
       verification of the job is finished.

env_command := 'ENV' ' ' modifier ' ' name ' ' value ;

modifier := 'ADD' | 'MOD' | 'DEL' ;
       The env_command is optional; it only has to be implemented by a
       JSV instance that sends the send_data_command before it sends
       the started_command. Only in that case does the client or master
       use one or more env_commands to pass to the JSV instance the
       environment variables (name and value) that would be exported to
       the job environment when the job is started. Client and qmaster
       only send env_commands with the modifier 'ADD'.

       JSV instances modify the set of environment variables by sending
       back env_commands using the modifiers ADD, MOD and DEL.

param_command := 'PARAM' ' ' param_parameter ' ' value ;

param_parameter := submit_parameter | pseudo_parameter ;
       The param_command has two additional arguments, separated by
       space characters. The first argument is either a
       submit_parameter as specified in qsub(1) or a pseudo_parameter
       as documented below. The second argument is the value of the
       corresponding param_parameter.

       Multiple param_commands are sent to a JSV instance after the JSV
       has sent a started_command. Together, all param_commands sent
       represent the job specification of the job that should be
       verified.

       Examples of submit_parameters are b (similar to the qsub -b
       switch) and masterq (similar to the qsub -masterq switch). A
       complete list of submit_parameters can be found in the qsub(1)
       man page. Note that the param_parameter name and the format of
       its value are not always identical to the qsub switch name and
       its argument format. For example, the qsub -pe parameters are
       available as a set of parameters named pe_name, pe_min and
       pe_max, and the switch combination -soft -l is passed to JSV
       scripts as the l_soft parameter. For details concerning these
       differences, consult the qsub(1) man page.

start_command := 'START' ;
       The start_command has no additional arguments. It indicates that
       a new job verification should be started. It is the first
       command sent to a JSV script after it has been started, and it
       initiates each new job verification. A JSV instance can discard
       any values still cached from a previous job verification. The
       application that sent the start_command waits for a
       started_command before it continues.

quit_command := 'QUIT' ;
       The quit_command has no additional arguments. If this command is
       sent to a JSV instance, the instance should terminate itself
       immediately.
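
Taken together, these commands give a JSV a simple read-dispatch loop.
The sketch below accepts every job unchanged; jsv_main and its want_env
flag are hypothetical, and the reply uses the "RESULT STATE ACCEPT"
form shown in the protocol example of this page. A real JSV would
presumably drive it with jsv_main(iter(sys.stdin.readline, ""),
sys.stdout).

```python
from typing import Dict, Iterable, TextIO, Tuple


def jsv_main(lines: Iterable[str], out: TextIO,
             want_env: bool = False) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Minimal JSV loop that accepts every job unchanged.

    Returns the PARAM and ENV values of the last job seen (useful for testing).
    """
    params: Dict[str, str] = {}
    env: Dict[str, str] = {}

    def send(text: str) -> None:
        out.write(text + "\n")
        out.flush()                    # the client/qmaster waits for each reply

    for raw in lines:
        name, _, rest = raw.rstrip("\n").partition(" ")
        if name == "START":
            params, env = {}, {}       # forget values cached from the previous job
            if want_env:
                send("SEND ENV")       # must precede STARTED to receive ENV commands
            send("STARTED")
        elif name == "PARAM":
            key, _, value = rest.partition(" ")
            params[key] = value
        elif name == "ENV":
            _modifier, _, name_value = rest.partition(" ")  # always ADD from the client
            var, _, value = name_value.partition(" ")
            env[var] = value
        elif name == "BEGIN":
            send("RESULT STATE ACCEPT")  # all PARAM/ENV data for this job has arrived
        elif name == "QUIT":
            break                      # terminate immediately
    return params, env
```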

A JSV script or binary can send a set of commands to a client/qmaster
process to indicate its state in the communication process, to change
the specification of the job being verified, and to report messages or
errors. The commands below are understood by clients and qmaster
implementing version 1.0 of the communication protocol, which was first
implemented in Grid Engine 6.2u2:

error_command := 'ERROR' ' ' message ;
       Whenever a JSV script encounters an error, it can report it to
       the client/qmaster. If the error happens during a job
       verification, the job currently being verified is rejected. The
       JSV binary or script is also restarted before it gets a new
       verification task.

log_command := 'LOG' ' ' log_level ' ' message ;

log_level := 'INFO' | 'WARNING' | 'ERROR' ;
       log_commands can be used whenever the client or qmaster expects
       input from a JSV instance. In client JSVs this command can be
       used to send information to the user submitting the job: all
       messages, independent of the log_level, are printed to the
       stdout stream of the submit client. If qmaster receives a
       log_command from a server JSV, it adds the received message to
       the message file, respecting the specified log_level. Note that
       the message may contain spaces but no newline characters.

param_command (see definition above)
       By sending param_commands, a JSV script can change the
       specification of the job being verified. If the JSV instance
       later sends a result_command indicating that the job should be
       accepted with corrections, the values provided with these
       param_commands are used to modify the job before it is accepted
       by the Grid Engine system.

result_command := 'RESULT' ' ' 'STATE' ' ' result_type [ ' ' message ] ;

result_type := 'ACCEPT' | 'CORRECT' | 'REJECT' | 'REJECT_WAIT' ;
       After the verification of a job is done, a JSV script or binary
       has to send a result_command indicating what should happen to
       the job. If the result_type is ACCEPT, the job is accepted as it
       was initially submitted by the end user; all param_commands and
       env_commands that might have been sent before the result_command
       are ignored in this case. The result_type CORRECT indicates that
       the job should be accepted after all modifications sent via
       param_commands and env_commands have been applied to it. REJECT
       and REJECT_WAIT cause the client or qmaster instance to reject
       the job.

send_data_command := 'SEND' ' ' data_name ;

data_name := 'ENV' ;
       If a client/qmaster receives a send_data_command from a JSV
       instance before the started_command is sent, it passes not only
       the job parameters with param_commands but also env_commands,
       which tell the JSV which environment variables would be exported
       to the job environment if the job is accepted and started later
       on.

       The job environment is not passed to JSV instances by default,
       because the end user's job environment might contain data that
       could be misinterpreted in the JSV context and therefore cause
       errors or security issues.

started_command := 'STARTED' ;
       By sending the started_command, a JSV instance indicates that it
       is ready to receive param_commands and env_commands for a new
       job verification. It only receives env_commands if it sent a
       send_data_command before the started_command.
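
The JSV-side commands above can be produced by a few small helpers.
This is a sketch under stated assumptions: the helper names and the
decide policy (cap the slot count at 8, default the o parameter to
/dev/null) are hypothetical, and RESULT lines use the "RESULT STATE
..." form shown in the protocol example of this page. Note that the
PARAM change only takes effect because the result is CORRECT.

```python
from typing import Dict, List


def _line(*parts: str) -> str:
    text = " ".join(parts)
    if "\n" in text:
        raise ValueError("commands must not contain newline characters")
    return text + "\n"


def error(message: str) -> str:
    return _line("ERROR", message)


def log(level: str, message: str) -> str:
    if level not in ("INFO", "WARNING", "ERROR"):
        raise ValueError(f"unknown log level: {level}")
    return _line("LOG", level, message)


def result(result_type: str, message: str = "") -> str:
    if result_type not in ("ACCEPT", "CORRECT", "REJECT", "REJECT_WAIT"):
        raise ValueError(f"unknown result type: {result_type}")
    # The STATE keyword follows the RESULT line in the protocol example.
    return _line("RESULT", "STATE", result_type, *([message] if message else []))


def decide(params: Dict[str, str]) -> List[str]:
    """Hypothetical site policy: cap the PE size, default the output file."""
    if int(params.get("pe_max", "1")) > 8:
        return [result("REJECT", "at most 8 slots are allowed")]
    if "o" not in params:
        # With ACCEPT instead of CORRECT this PARAM would be ignored.
        return [
            _line("PARAM", "o", "/dev/null"),
            log("INFO", "stdout redirected to /dev/null"),
            result("CORRECT"),
        ]
    return [result("ACCEPT")]
```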

CLIENT The value of the CLIENT parameter is either 'qmaster' or the
       name of a submit client like 'qsub', 'qsh', 'qrsh', 'qlogin' and
       so on. This parameter value can't be changed by JSV instances.
       It is always sent as part of a job verification.

CMDARGS
       Number of arguments that will be passed to the job script or
       command when the job execution is started. It is always sent as
       part of a job verification. If no arguments should be passed to
       the job script or command, it has the value 0. This parameter
       can be changed by JSV instances. If the value of CMDARGS is
       greater than the number of available CMDARG<id> parameters, the
       missing parameters are automatically passed as empty parameters
       to the job script.

CMDNAME
       Either the path to the script or the command name in case of
       binary submission. It is always sent as part of a job
       verification.

CONTEXT
       Either 'client' if the JSV that receives this param_command was
       started by a command-line client like qsub, qsh, ... or 'master'
       if it was started by the sge_qmaster process. It is always sent
       as part of a job verification. Changing the value of this
       parameter is not possible within JSV instances.

GROUP  Primary group of the user who tries to submit the job that
       should be verified. This parameter cannot be changed but is
       always sent as part of the verification process. The user name
       is passed as the parameter named USER.

JOB_ID Not available in the client context (see CONTEXT). Otherwise it
       contains the job number of the job that will be submitted to
       Grid Engine when the verification process is successful. JOB_ID
       is an optional parameter that can't be changed by JSV instances.

USER   Name of the user who tries to submit the job that should be
       verified. It cannot be changed but is always sent as part of the
       verification process. The group name is passed as the parameter
       named GROUP.

VERSION
       VERSION is always sent as part of a job verification process,
       and it is always the first parameter sent. It contains a version
       number of the format <major>.<minor>. In version 6.2u2 and
       higher the value is '1.0'. The value of this parameter can't be
       changed.
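
The pseudo parameters above can be handled with a few helpers. This is
a sketch: job_arguments, protocol_version and set_param are
hypothetical names, and READ_ONLY simply collects the pseudo parameters
documented here as unchangeable. job_arguments applies the documented
rule that missing CMDARG<id> entries become empty arguments.

```python
from typing import Dict, List, Tuple

# Pseudo parameters that, per the list above, JSV instances cannot change.
READ_ONLY = frozenset({"CLIENT", "CONTEXT", "GROUP", "JOB_ID", "USER", "VERSION"})


def job_arguments(params: Dict[str, str]) -> List[str]:
    """Rebuild the job's argument list from CMDARGS and CMDARG<id>."""
    count = int(params.get("CMDARGS", "0"))
    # Missing CMDARG<id> entries become empty arguments.
    return [params.get(f"CMDARG{i}", "") for i in range(count)]


def protocol_version(params: Dict[str, str]) -> Tuple[int, int]:
    """Parse VERSION, which has the format <major>.<minor>."""
    major, minor = params["VERSION"].split(".")
    return int(major), int(minor)


def set_param(params: Dict[str, str], name: str, value: str) -> str:
    """Record a modification and return the PARAM command to send back."""
    if name in READ_ONLY:
        raise ValueError(f"{name} cannot be changed by a JSV")
    params[name] = value
    return f"PARAM {name} {value}\n"


job = {"VERSION": "1.0", "CONTEXT": "client", "CMDARGS": "1", "CMDARG0": "12"}
print(protocol_version(job))                   # (1, 0)
print(set_param(job, "CMDARGS", "3"), end="")  # PARAM CMDARGS 3
print(job_arguments(job))                      # ['12', '', '']
```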

Here is an example of the communication between a client and a JSV
instance when the following job is submitted:

> qsub -pe pe1 3 -hard -l a=1,b=5 -soft -l q=all.q $SGE_ROOT/examples/jobs/sleeper.sh 12

Data in the first column is sent from the client/qmaster to the JSV
instance; data in the second column is sent from the JSV script to the
client/qmaster. The newline characters that terminate each line in the
communication protocol are omitted.

START
                                    SEND ENV
                                    STARTED
PARAM VERSION 1.0
PARAM CONTEXT client
PARAM CLIENT qsub
PARAM USER ernst
PARAM GROUP staff
PARAM CMDNAME /sge_root/examples/jobs/sleeper.sh
PARAM CMDARGS 1
PARAM CMDARG0 12
PARAM l_hard a=1,b=5
PARAM l_soft q=all.q
PARAM M user@hostname
PARAM N Sleeper
PARAM o /dev/null
PARAM pe_name pe1
PARAM pe_min 3
PARAM pe_max 3
PARAM S /bin/sh
BEGIN
                                    RESULT STATE ACCEPT
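
The exchange above can be replayed to check the ordering rules from
this page and to show how the pe_* parameters map back to the qsub -pe
switch. This is a sketch: collect_params and pe_switch are hypothetical
names, and the "min-max" range form used when pe_min and pe_max differ
is an assumption about qsub -pe syntax.

```python
from typing import Dict, List, Tuple

# The exchange shown above: "C" = client/qmaster -> JSV, "J" = JSV -> client/qmaster.
TRANSCRIPT: List[Tuple[str, str]] = [
    ("C", "START"), ("J", "SEND ENV"), ("J", "STARTED"),
    ("C", "PARAM VERSION 1.0"), ("C", "PARAM CONTEXT client"),
    ("C", "PARAM CLIENT qsub"), ("C", "PARAM USER ernst"),
    ("C", "PARAM GROUP staff"),
    ("C", "PARAM CMDNAME /sge_root/examples/jobs/sleeper.sh"),
    ("C", "PARAM CMDARGS 1"), ("C", "PARAM CMDARG0 12"),
    ("C", "PARAM l_hard a=1,b=5"), ("C", "PARAM l_soft q=all.q"),
    ("C", "PARAM M user@hostname"), ("C", "PARAM N Sleeper"),
    ("C", "PARAM o /dev/null"), ("C", "PARAM pe_name pe1"),
    ("C", "PARAM pe_min 3"), ("C", "PARAM pe_max 3"),
    ("C", "PARAM S /bin/sh"), ("C", "BEGIN"), ("J", "RESULT STATE ACCEPT"),
]


def _require(condition: bool, message: str) -> None:
    if not condition:
        raise ValueError(message)


def collect_params(transcript: List[Tuple[str, str]]) -> Dict[str, str]:
    """Check the ordering rules of the protocol and return the job's PARAMs."""
    sent = [line for _who, line in transcript]
    _require(bool(sent) and sent[0] == "START", "START must be the first command")
    _require("STARTED" in sent, "the JSV never sent STARTED")
    if "SEND ENV" in sent:
        _require(sent.index("SEND ENV") < sent.index("STARTED"),
                 "SEND ENV must precede STARTED")
    params: Dict[str, str] = {}
    for who, line in transcript:
        if who == "C" and line.startswith("PARAM "):
            _, name, value = line.split(" ", 2)
            _require(bool(params) or name == "VERSION", "VERSION must be the first PARAM")
            params[name] = value
    _require("BEGIN" in sent, "BEGIN was never sent")
    result_at = next((i for i, line in enumerate(sent) if line.startswith("RESULT ")), -1)
    _require(result_at > sent.index("BEGIN"), "RESULT must follow BEGIN")
    return params


def pe_switch(params: Dict[str, str]) -> str:
    """Map pe_name/pe_min/pe_max back to a qsub -pe argument."""
    low, high = params["pe_min"], params["pe_max"]
    slots = low if low == high else f"{low}-{high}"
    return f"-pe {params['pe_name']} {slots}"


params = collect_params(TRANSCRIPT)
print(pe_switch(params))  # -pe pe1 3
print(params["l_soft"])   # q=all.q
```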


ge_intro(1), qalter(1), qlogin(1), qmake(1), qrsh(1), qsh(1), qsub(1),
qtcsh(1),

See ge_intro(1) for a full statement of rights and permissions.



GE 6.2u5 $Date: 2009/08/25 19:39:34 $ JSV(1)