REPORTING(5)               Grid Engine File Formats               REPORTING(5)



NAME
       reporting - Grid Engine reporting file format

DESCRIPTION
       A Grid Engine system writes a reporting file to
       $SGE_ROOT/default/common/reporting. The reporting file contains data
       that can be used for accounting, monitoring and analysis purposes. It
       contains information about the cluster (hosts, queues, load values,
       consumables, etc.), about the jobs running in the cluster and about
       sharetree configuration and usage. All information is time related;
       events are dumped to the reporting file at a configurable interval.
       This makes it possible to monitor a "real time" status of the cluster
       as well as to perform historical analysis.
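
       As a minimal sketch, the file location given above could be assembled
       as follows. The helper name reporting_path, its cell parameter and the
       example root /opt/sge are assumptions made for illustration; only the
       path layout itself comes from this page.

```python
from pathlib import PurePosixPath


def reporting_path(sge_root, cell="default"):
    """Return the reporting file location below the given Grid Engine root.

    `cell` is a hypothetical parameter; this page only shows "default".
    """
    return str(PurePosixPath(sge_root) / cell / "common" / "reporting")


# /opt/sge is an invented example root, not a documented default.
print(reporting_path("/opt/sge"))  # prints /opt/sge/default/common/reporting
```

       In practice the root would typically be read from the SGE_ROOT
       environment variable rather than hard coded.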

FORMAT
       The reporting file is an ASCII file. Each line contains one record,
       and the fields of a record are separated by a delimiter (:). The
       reporting file contains records of different types. Each record type
       has a specific record structure.

       The first two fields are common to all reporting records:

       time   Time (GMT Unix time stamp) when the record was created.

       record type
              Type of the reporting record. The different types of records
              and their structure are described in the following text.
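
       The common record layout can be illustrated with a short sketch that
       splits a line into its time stamp, its record type and the remaining
       type specific fields. The function name split_record and the sample
       line are hypothetical; the sample was not taken from a real cluster.

```python
def split_record(line):
    """Split one reporting line into (time, record_type, other_fields)."""
    fields = line.rstrip("\n").split(":")
    if len(fields) < 2:
        raise ValueError("a reporting record needs at least two fields")
    # The first field is a Unix time stamp, the second the record type.
    return int(fields[0]), fields[1], fields[2:]


# Invented sample line for illustration only.
sample = "1184833038:queue:all.q:node01:1184833038:s"
timestamp, record_type, rest = split_record(sample)
print(timestamp, record_type, rest)
```

       A reader of the file would usually dispatch on record_type to a
       parser for the matching record structure.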

   new_job
       The new_job record is written whenever a new job enters the system
       (usually through a submit command). It has the following fields:

       submission_time
              Time (GMT Unix time stamp) when the job was submitted.

       job_number
              The job number.

       task_number
              The array task id. It has the value -1 for new_job records
              (array tasks do not exist yet at this point).

       pe_taskid
              The task id of parallel tasks. It has the value "none" for
              new_job records.

       job_name
              The job name (from the -N submission option).

       owner  The job owner.

       group  The Unix group of the job owner.

       project
              The project the job is running in.

       department
              The department the job owner is in.

       account
              The account string specified for the job (from the -A
              submission option).

       priority
              The job priority (from the -p submission option).
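
       A minimal sketch of reading a new_job record into named fields is
       shown below. It assumes that the fields follow the two common fields
       in the order listed above; the names NEW_JOB_FIELDS and parse_new_job
       as well as the sample line are hypothetical.

```python
NEW_JOB_FIELDS = (
    "submission_time", "job_number", "task_number", "pe_taskid",
    "job_name", "owner", "group", "project", "department",
    "account", "priority",
)


def parse_new_job(line):
    """Map the fields of one new_job line to their names (values stay strings)."""
    parts = line.rstrip("\n").split(":")
    if len(parts) != len(NEW_JOB_FIELDS) + 2 or parts[1] != "new_job":
        raise ValueError("not a well-formed new_job record")
    # parts[0] and parts[1] are the common time and record type fields.
    return dict(zip(NEW_JOB_FIELDS, parts[2:]))


# Invented sample line for illustration only.
sample = ("1184833038:new_job:1184833038:42:-1:none:sleep.sh:"
          "alice:staff:proj1:dept1:sge:0")
job = parse_new_job(sample)
print(job["job_number"], job["owner"], job["task_number"])  # prints 42 alice -1
```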

   job_log
       The job_log record is written whenever a job, an array task or a pe
       task changes status. A status change can be the transition from
       pending to running, but can also be triggered by user actions like
       suspension of a job. It has the following fields:

       event_time
              Time (GMT Unix time stamp) when the event was generated.

       event  A one word description of the event.

       job_number
              The job number.

       task_number
              The array task id. It has the value -1 for new_job records
              (array tasks do not exist yet at that point).

       pe_taskid
              The task id of parallel tasks. It has the value "none" for
              new_job records.

       state  The state of the job after the event was processed.

       user   The user who initiated the event (or the special usernames
              "qmaster", "scheduler" and "execd" for actions of the system
              itself like scheduling jobs, executing jobs etc.).

       host   The host from which the action was initiated (e.g. the submit
              host, the qmaster host, etc.).

       state_time
              Reserved field for later use.

       submission_time
              Time (GMT Unix time stamp) when the job was submitted.

       job_name
              The job name (from the -N submission option).

       owner  The job owner.

       group  The Unix group of the job owner.

       project
              The project the job is running in.

       department
              The department the job owner is in.

       account
              The account string specified for the job (from the -A
              submission option).

       priority
              The job priority (from the -p submission option).

       message
              A message describing the reported action.
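
       Because message is the last field of a job_log record, a reader can
       limit the number of splits so that delimiters inside the message text
       stay in that field. This is a sketch under the assumption that the
       fields appear in the order listed above and that only the message may
       contain a colon; this page does not state how such characters are
       written. The names JOB_LOG_FIELDS and parse_job_log and the sample
       line are hypothetical.

```python
JOB_LOG_FIELDS = (
    "event_time", "event", "job_number", "task_number", "pe_taskid",
    "state", "user", "host", "state_time", "submission_time", "job_name",
    "owner", "group", "project", "department", "account", "priority",
    "message",
)


def parse_job_log(line):
    """Parse one job_log line, keeping extra colons in the message field."""
    # Two common fields plus 18 job_log fields give 20 parts, i.e. 19 splits.
    parts = line.rstrip("\n").split(":", len(JOB_LOG_FIELDS) + 1)
    if len(parts) != len(JOB_LOG_FIELDS) + 2 or parts[1] != "job_log":
        raise ValueError("not a well-formed job_log record")
    return dict(zip(JOB_LOG_FIELDS, parts[2:]))


# Invented sample line; event, state and message values are made up.
sample = ("1184833100:job_log:1184833100:pending:42:-1:none:r:alice:"
          "submit01:0:1184833038:sleep.sh:alice:staff:proj1:dept1:sge:0:"
          "job submitted: ok")
entry = parse_job_log(sample)
print(entry["event"], entry["user"], repr(entry["message"]))
```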

   acct
       Records of type acct are accounting records. They are written
       whenever a job, a task of an array job or a task of a parallel job
       terminates. Accounting records comprise the following fields:

       qname  Name of the cluster queue in which the job has run.

       hostname
              Name of the execution host.

       group  The effective group id of the job owner when executing the
              job.

       owner  Owner of the Grid Engine job.

       job_name
              Job name.

       job_number
              Job identifier - job number.

       account
              An account string as specified by the qsub(1) or qalter(1) -A
              option.

       priority
              Priority value assigned to the job corresponding to the
              priority parameter in the queue configuration (see
              queue_conf(5)).

       submission_time
              Submission time (GMT Unix time stamp). For slave tasks of
              tightly integrated parallel jobs, the submission_time is set
              to 0.

       start_time
              Start time (GMT Unix time stamp).

       end_time
              End time (GMT Unix time stamp).

       failed Indicates the problem which occurred in case a job could not
              be started on the execution host (e.g. because the owner of
              the job did not have a valid account on that machine). If Grid
              Engine tries to start a job multiple times, this may lead to
              multiple entries in the accounting file corresponding to the
              same job ID.

       exit_status
              Exit status of the job script (or a Grid Engine specific
              status in case of certain error conditions).

       ru_wallclock
              Difference between end_time and start_time (see above).

       The remainder of the accounting entries follows the contents of the
       standard UNIX rusage structure as described in getrusage(2).
       Depending on the operating system where the job was executed, some of
       the fields may be 0. The following entries are provided:

              ru_utime
              ru_stime
              ru_maxrss
              ru_ixrss
              ru_ismrss
              ru_idrss
              ru_isrss
              ru_minflt
              ru_majflt
              ru_nswap
              ru_inblock
              ru_oublock
              ru_msgsnd
              ru_msgrcv
              ru_nsignals
              ru_nvcsw
              ru_nivcsw

       project
              The project which was assigned to the job.

       department
              The department which was assigned to the job.

       granted_pe
              The parallel environment which was selected for that job.

       slots  The number of slots which were dispatched to the job by the
              scheduler.

       task_number
              Array job task index number.

       cpu    The cpu time usage in seconds.

       mem    The integral memory usage in Gbyte seconds.

       io     The amount of data transferred in input/output operations.

       category
              A string specifying the job category.

       iow    The io wait time in seconds.

       pe_taskid
              If this identifier is set, the task was part of a parallel job
              and was passed to Grid Engine via the qrsh -inherit interface.

       maxvmem
              The maximum vmem size in bytes.
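
       The following sketch names the fields of an acct record and checks
       the documented relation between ru_wallclock, start_time and
       end_time. It assumes that the fields follow the two common fields in
       the order listed above and that no field contains the delimiter. The
       names ACCT_FIELDS, parse_acct and wallclock_matches are hypothetical,
       and the record is synthetic: every field not set explicitly is filled
       with "0".

```python
ACCT_FIELDS = (
    "qname hostname group owner job_name job_number account priority "
    "submission_time start_time end_time failed exit_status ru_wallclock "
    "ru_utime ru_stime ru_maxrss ru_ixrss ru_ismrss ru_idrss ru_isrss "
    "ru_minflt ru_majflt ru_nswap ru_inblock ru_oublock ru_msgsnd ru_msgrcv "
    "ru_nsignals ru_nvcsw ru_nivcsw project department granted_pe slots "
    "task_number cpu mem io category iow pe_taskid maxvmem"
).split()


def parse_acct(line):
    """Map the fields of one acct line to their names (values stay strings)."""
    parts = line.rstrip("\n").split(":")
    if len(parts) != len(ACCT_FIELDS) + 2 or parts[1] != "acct":
        raise ValueError("not a well-formed acct record")
    return dict(zip(ACCT_FIELDS, parts[2:]))


def wallclock_matches(record):
    """Check that ru_wallclock equals end_time minus start_time."""
    elapsed = float(record["end_time"]) - float(record["start_time"])
    return elapsed == float(record["ru_wallclock"])


# Build a synthetic line instead of typing 45 fields by hand.
values = dict.fromkeys(ACCT_FIELDS, "0")
values.update(qname="all.q", hostname="node01", owner="alice",
              start_time="1184833100", end_time="1184833160",
              ru_wallclock="60")
line = ":".join(["1184833160", "acct"] + [values[name] for name in ACCT_FIELDS])

record = parse_acct(line)
print(record["qname"], wallclock_matches(record))  # prints all.q True
```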

   queue
       Records of type queue contain state information for queues (queue
       instances). A queue record has the following fields:

       qname  The cluster queue name.

       hostname
              The hostname of a specific queue instance.

       report_time
              The time (GMT Unix time stamp) when a state change was
              triggered.

       state  The new queue state.

   queue_consumable
       A queue_consumable record contains information about queue consumable
       values in addition to queue state information:

       qname  The cluster queue name.

       hostname
              The hostname of a specific queue instance.

       report_time
              The time (GMT Unix time stamp) when a state change was
              triggered.

       state  The new queue state.

       consumables
              Description of consumable values. Information about multiple
              consumables is separated by space. A consumable description
              has the format <name>=<actual_value>=<configured_value>.
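
       The consumables field can be taken apart as sketched below. The
       function name parse_consumables and the sample string are
       hypothetical, and the conversion to float assumes that both values
       are plain numbers, which this page does not state.

```python
def parse_consumables(text):
    """Turn 'name=actual=configured ...' into {name: (actual, configured)}."""
    result = {}
    for item in text.split():
        name, actual, configured = item.split("=")
        # Assumption: both values are numeric.
        result[name] = (float(actual), float(configured))
    return result


# Invented sample value of a consumables field.
sample = "slots=2.000000=4.000000 license=1.000000=10.000000"
print(parse_consumables(sample))
# prints {'slots': (2.0, 4.0), 'license': (1.0, 10.0)}
```

       The same helper would apply to any record whose consumables field
       uses this format.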

   host
       A host record contains information about hosts and host load values.
       It contains the following information:

       hostname
              The name of the host.

       report_time
              The time (GMT Unix time stamp) when the reported information
              was generated.

       state  The new host state. Currently, Grid Engine doesn't track a
              host state; the field is reserved for future use. It contains
              the value X.

       load values
              Description of load values. Information about multiple load
              values is separated by space. A load value description has
              the format <name>=<actual_value>.
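
       A host record could be read as in the following sketch. It assumes
       the field order listed above after the two common fields. The split
       is limited so that everything after the state field is kept as the
       load value list, and load values are kept as strings because this
       page does not say that they are always numeric. The function name
       parse_host_record, the load value names and the sample line are
       hypothetical.

```python
def parse_host_record(line):
    """Parse one host line into a dict with a nested load_values dict."""
    # maxsplit=5 keeps the complete load value list in the last part.
    parts = line.rstrip("\n").split(":", 5)
    if len(parts) != 6 or parts[1] != "host":
        raise ValueError("not a well-formed host record")
    _, _, hostname, report_time, state, loads = parts
    load_values = {}
    for item in loads.split():
        name, _, value = item.partition("=")
        load_values[name] = value
    return {
        "hostname": hostname,
        "report_time": int(report_time),
        "state": state,
        "load_values": load_values,
    }


# Invented sample line for illustration only.
sample = "1184833038:host:node01:1184833038:X:load_avg=0.25 mem_free=1024.0M"
host = parse_host_record(sample)
print(host["hostname"], host["state"], host["load_values"])
```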

   host_consumable
       A host_consumable record contains information about hosts and host
       consumables. Host consumables can, for example, be licenses. It
       contains the following information:

       hostname
              The name of the host.

       report_time
              The time (GMT Unix time stamp) when the reported information
              was generated.

       state  The new host state. Currently, Grid Engine doesn't track a
              host state; the field is reserved for future use. It contains
              the value X.

       consumables
              Description of consumable values. Information about multiple
              consumables is separated by space. A consumable description
              has the format <name>=<actual_value>=<configured_value>.

   sharelog
       The Grid Engine qmaster can dump information about sharetree
       configuration and use to the reporting file. The parameter sharelog
       sets the interval at which sharetree information is dumped. It is set
       in the format HH:MM:SS. A value of 00:00:00 configures qmaster not to
       dump sharetree information. Intervals of several minutes up to hours
       are sensible values for this parameter. The record contains the
       following fields:

       current time
              The present time.

       usage time
              The time used so far.

       node name
              The node name.

       user name
              The user name.

       project name
              The project name.

       shares The total shares.

       job count
              The job count.

       level  The percentage of shares used.

       total  The adjusted percentage of shares used.

       long target share
              The long target percentage of resource shares used.

       short target share
              The short target percentage of resource shares used.

       actual share
              The actual percentage of resource shares used.

       usage  The combined shares used.

       cpu    The cpu used.

       mem    The memory used.

       io     The IO used.

       long target cpu
              The long target cpu used.

       long target mem
              The long target memory used.

       long target io
              The long target IO used.
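
       The HH:MM:SS interval format of the sharelog parameter described at
       the start of this section can be converted to seconds as in this
       sketch, where a result of 0 corresponds to the value 00:00:00 that
       disables the dump. The function name sharelog_seconds and the range
       checks are assumptions made for illustration.

```python
def sharelog_seconds(value):
    """Convert an HH:MM:SS interval string to a number of seconds."""
    parts = value.split(":")
    if len(parts) != 3:
        raise ValueError("expected HH:MM:SS")
    hours, minutes, seconds = (int(part) for part in parts)
    if hours < 0 or not 0 <= minutes < 60 or not 0 <= seconds < 60:
        raise ValueError("field out of range")
    return hours * 3600 + minutes * 60 + seconds


for setting in ("00:00:00", "00:10:00", "02:30:00"):
    secs = sharelog_seconds(setting)
    print(setting, "disabled" if secs == 0 else f"every {secs} s")
```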

   new_ar
       A new_ar record contains information about advance reservation
       objects. Entries of this type are added when an advance reservation
       is created. It contains the following information:

       submission_time
              The time (GMT Unix time stamp) when the advance reservation
              was created.

       ar_number
              The advance reservation number identifying the reservation.

       ar_owner
              The owner of the advance reservation.

   ar_attribute
       The ar_attribute record is written whenever a new advance reservation
       is added or an attribute of an existing advance reservation changes.
       It has the following fields:

       event_time
              The time (GMT Unix time stamp) when the event was generated.

       ar_number
              The advance reservation number identifying the reservation.

       ar_name
              Name of the advance reservation.

       ar_account
              An account string which was specified during the creation of
              the advance reservation.

       ar_start_time
              Start time.

       ar_end_time
              End time.

       ar_granted_pe
              The parallel environment which was selected for an advance
              reservation.

       ar_granted_resources
              The granted resources which were selected for an advance
              reservation.

   ar_log
       The ar_log record is written whenever an advance reservation changes
       status. A status change can be the transition from pending to active,
       but can also be triggered by system events like a host outage. It has
       the following fields:

       ar_state_change_time
              The time (GMT Unix time stamp) when the event occurred which
              caused a state change.

       ar_number
              The advance reservation number identifying the reservation.

       ar_state
              The new state.

       ar_event
              An event id identifying the event which caused the state
              change.

       ar_message
              A message describing the event which caused the state change.

   ar_acct
       The ar_acct records are accounting records which are written for
       every queue instance whenever an advance reservation terminates.
       Advance reservation accounting records comprise the following
       fields:

       ar_termination_time
              The time (GMT Unix time stamp) when the advance reservation
              terminated.

       ar_number
              The advance reservation number identifying the reservation.

       ar_qname
              Cluster queue name which the advance reservation reserved.

       ar_hostname
              The name of the execution host.

       ar_slots
              The number of slots which were reserved.
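
       Since one ar_acct record is written per queue instance, the total
       number of slots of a reservation is the sum over its records. The
       sketch below assumes the field order listed above after the two
       common fields; the function name reserved_slots_by_ar and the sample
       lines are hypothetical.

```python
from collections import defaultdict


def reserved_slots_by_ar(lines):
    """Sum ar_slots over all ar_acct records, grouped by ar_number."""
    totals = defaultdict(int)
    for line in lines:
        parts = line.rstrip("\n").split(":")
        # Layout: time, type, ar_termination_time, ar_number,
        # ar_qname, ar_hostname, ar_slots.
        if len(parts) == 7 and parts[1] == "ar_acct":
            totals[parts[3]] += int(parts[6])
    return dict(totals)


# Invented sample lines; records of other types are skipped.
sample_lines = [
    "1184840000:ar_acct:1184840000:7:all.q:node01:4",
    "1184840000:ar_acct:1184840000:7:all.q:node02:2",
    "1184843600:ar_acct:1184843600:8:all.q:node01:1",
    "1184843600:queue:all.q:node01:1184843600:s",
]
print(reserved_slots_by_ar(sample_lines))  # prints {'7': 6, '8': 1}
```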

SEE ALSO
       sge_conf(5), host_conf(5).

COPYRIGHT
       See sge_intro(1) for a full statement of rights and permissions.



GE 6.1                  $Date: 2007/07/19 08:17:18 $              REPORTING(5)