REPORTING(5)               Grid Engine File Formats               REPORTING(5)

NAME
       reporting - Grid Engine reporting file format

DESCRIPTION
       A Grid Engine system writes a reporting file to
       $SGE_ROOT/default/common/reporting. The reporting file contains data
       that can be used for accounting, monitoring and analysis purposes. It
       contains information about the cluster (hosts, queues, load values,
       consumables, etc.), about the jobs running in the cluster, and about
       sharetree configuration and usage. All information is time related:
       events are dumped to the reporting file at a configurable interval.
       This allows monitoring of the "real time" status of the cluster as
       well as historical analysis.

FORMAT
       The reporting file is an ASCII file. Each line contains one record,
       and the fields of a record are separated by a delimiter (:). The
       reporting file contains records of different types. Each record type
       has a specific record structure.

       The first two fields are common to all reporting records:

       time   Time (GMT unix time stamp) when the record was created.

       record type
              Type of the reporting record. The different types of records
              and their structure are described in the following text.
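
       The common record layout described above can be read with a few lines
       of Python. This is a minimal sketch, not part of Grid Engine: the
       function name split_record and the sample line are made up for
       illustration, and the sketch assumes that no field value contains the
       delimiter itself.

```python
from datetime import datetime, timezone


def split_record(line):
    """Split one reporting line into (creation time, record type, fields)."""
    parts = line.rstrip("\n").split(":")
    if len(parts) < 2:
        raise ValueError("expected at least a time and a record type")
    # The first field is a GMT unix time stamp.
    created = datetime.fromtimestamp(int(parts[0]), tz=timezone.utc)
    return created, parts[1], parts[2:]


# Hypothetical sample line (not taken from a real cluster).
sample = "1208879342:queue:all.q:node01:1208879342:au"
created, record_type, fields = split_record(sample)
print(created.isoformat(), record_type, fields)
```

       The remaining fields are returned as a plain list because their
       meaning depends on the record type.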

   new_job
       The new_job record is written whenever a new job enters the system
       (usually through a submit command). It has the following fields:

       submission_time
              Time (GMT unix time stamp) when the job was submitted.

       job_number
              The job number.

       task_number
              The array task id. Always has the value -1 for new_job records
              (array tasks have not been created yet at this point).

       pe_taskid
              The task id of parallel tasks. Always has the value "none" for
              new_job records.

       job_name
              The job name (from the -N submission option).

       owner  The job owner.

       group  The unix group of the job owner.

       project
              The project the job is running in.

       department
              The department the job owner is in.

       account
              The account string specified for the job (from the -A
              submission option).

       priority
              The job priority (from the -p submission option).
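
       As an illustration of the field list above, a new_job record can be
       turned into a dictionary keyed by field name. This is only a sketch:
       NEW_JOB_FIELDS and parse_new_job are hypothetical names, the sample
       values are invented, and the code assumes that the fields appear in
       the documented order directly after the two common fields.

```python
NEW_JOB_FIELDS = [
    "submission_time", "job_number", "task_number", "pe_taskid",
    "job_name", "owner", "group", "project", "department",
    "account", "priority",
]


def parse_new_job(line):
    """Map the fields of a new_job record to their documented names."""
    parts = line.rstrip("\n").split(":")
    if len(parts) < 2 or parts[1] != "new_job":
        raise ValueError("not a new_job record")
    values = parts[2:]  # skip the common time and record type fields
    if len(values) != len(NEW_JOB_FIELDS):
        raise ValueError("unexpected number of fields: %d" % len(values))
    return dict(zip(NEW_JOB_FIELDS, values))


# Hypothetical sample record.
sample = ("1208879342:new_job:1208879342:42:-1:none:sleeper:"
          "alice:users:none:defaultdepartment:sge:0")
job = parse_new_job(sample)
print(job["job_number"], job["owner"], job["task_number"])
```

       All values are kept as strings; converting them is left to the caller.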

   job_log
       The job_log record is written whenever a job, an array task or a PE
       task changes status. A status change can be the transition from
       pending to running, but can also be triggered by user actions such as
       suspension of a job. It has the following fields:

       event_time
              Time (GMT unix time stamp) when the event was generated.

       event  A one word description of the event.

       job_number
              The job number.

       task_number
              The array task id. Always has the value -1 for new_job records
              (array tasks have not been created yet at that point).

       pe_taskid
              The task id of parallel tasks. Always has the value "none" for
              new_job records.

       state  The state of the job after the event was processed.

       user   The user who initiated the event (or the special user names
              "qmaster", "scheduler" and "execd" for actions of the system
              itself, such as scheduling jobs, executing jobs, etc.).

       host   The host from which the action was initiated (e.g. the submit
              host, the qmaster host, etc.).

       state_time
              Reserved field for later use.

       submission_time
              Time (GMT unix time stamp) when the job was submitted.

       job_name
              The job name (from the -N submission option).

       owner  The job owner.

       group  The unix group of the job owner.

       project
              The project the job is running in.

       department
              The department the job owner is in.

       account
              The account string specified for the job (from the -A
              submission option).

       priority
              The job priority (from the -p submission option).

       message
              A message describing the reported action.

   acct
       Records of type acct are accounting records. Normally, they are
       written whenever a job, a task of an array job, or the task of a
       parallel job terminates. However, for long running jobs an
       intermediate acct record is created once a day, after midnight. This
       results in multiple accounting records for a particular job and
       allows fine-grained monitoring of resource usage over time.
       Accounting records comprise the following fields:

       qname  Name of the cluster queue in which the job has run.

       hostname
              Name of the execution host.

       group  The effective group id of the job owner when executing the job.

       owner  Owner of the Grid Engine job.

       job_name
              Job name.

       job_number
              Job identifier (job number).

       account
              An account string as specified by the qsub(1) or qalter(1) -A
              option.

       priority
              Priority value assigned to the job, corresponding to the
              priority parameter in the queue configuration (see
              queue_conf(5)).

       submission_time
              Submission time (GMT unix time stamp).

       start_time
              Start time (GMT unix time stamp).

       end_time
              End time (GMT unix time stamp).

       failed Indicates the problem which occurred in case a job could not be
              started on the execution host (e.g. because the owner of the
              job did not have a valid account on that machine). If Grid
              Engine tries to start a job multiple times, this may lead to
              multiple entries in the accounting file corresponding to the
              same job ID.

       exit_status
              Exit status of the job script (or a Grid Engine specific status
              in case of certain error conditions).

       ru_wallclock
              Difference between end_time and start_time (see above).

       The remainder of the accounting entries follows the contents of the
       standard UNIX rusage structure as described in getrusage(2). Depending
       on the operating system where the job was executed, some of the fields
       may be 0. The following entries are provided:

              ru_utime
              ru_stime
              ru_maxrss
              ru_ixrss
              ru_ismrss
              ru_idrss
              ru_isrss
              ru_minflt
              ru_majflt
              ru_nswap
              ru_inblock
              ru_oublock
              ru_msgsnd
              ru_msgrcv
              ru_nsignals
              ru_nvcsw
              ru_nivcsw

       project
              The project which was assigned to the job.

       department
              The department which was assigned to the job.

       granted_pe
              The parallel environment which was selected for that job.

       slots  The number of slots which were dispatched to the job by the
              scheduler.

       task_number
              Array job task index number.

       cpu    The cpu time usage in seconds.

       mem    The integral memory usage in Gbytes seconds.

       io     The amount of data transferred in input/output operations.

       category
              A string specifying the job category.

       iow    The io wait time in seconds.

       pe_taskid
              If this identifier is set, the task was part of a parallel job
              and was passed to Grid Engine via the qrsh -inherit interface.

       maxvmem
              The maximum vmem size in bytes.

       arid   Advance reservation identifier. If the job used resources of an
              advance reservation, this field contains a positive integer
              identifier; otherwise the value is "0".
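
       The acct field list above is long enough that a name table is easier
       to work with than positional indexes. The following sketch is one
       possible way to do that. ACCT_FIELDS, parse_acct and wallclock_matches
       are hypothetical names, the sample record is generated with made-up
       values, and the code assumes that the fields appear in the documented
       order and that no value (for example the category string) contains a
       colon.

```python
ACCT_FIELDS = (
    "qname hostname group owner job_name job_number account priority "
    "submission_time start_time end_time failed exit_status ru_wallclock "
    "ru_utime ru_stime ru_maxrss ru_ixrss ru_ismrss ru_idrss ru_isrss "
    "ru_minflt ru_majflt ru_nswap ru_inblock ru_oublock ru_msgsnd "
    "ru_msgrcv ru_nsignals ru_nvcsw ru_nivcsw "
    "project department granted_pe slots task_number cpu mem io "
    "category iow pe_taskid maxvmem arid"
).split()


def parse_acct(line):
    """Map the fields of an acct record to their documented names."""
    parts = line.rstrip("\n").split(":")
    if len(parts) < 2 or parts[1] != "acct":
        raise ValueError("not an acct record")
    values = parts[2:]
    if len(values) != len(ACCT_FIELDS):
        raise ValueError("unexpected number of fields: %d" % len(values))
    return dict(zip(ACCT_FIELDS, values))


def wallclock_matches(record):
    """Check that ru_wallclock equals end_time minus start_time."""
    elapsed = float(record["end_time"]) - float(record["start_time"])
    return elapsed == float(record["ru_wallclock"])


# Build a hypothetical sample record: every field "0" unless set below.
values = dict.fromkeys(ACCT_FIELDS, "0")
values.update(qname="all.q", hostname="node01", start_time="1208879400",
              end_time="1208879460", ru_wallclock="60")
sample = ":".join(["1208879461", "acct"] + [values[f] for f in ACCT_FIELDS])

record = parse_acct(sample)
print(record["qname"], record["hostname"], wallclock_matches(record))
```

       Because a long running job can produce several acct records, a
       consumer would typically group the parsed records by job_number and
       task_number before evaluating them.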

   queue
       Records of type queue contain state information for queues (queue
       instances). A queue record has the following fields:

       qname  The cluster queue name.

       hostname
              The hostname of a specific queue instance.

       report_time
              The time (GMT unix time stamp) when a state change was
              triggered.

       state  The new queue state.

   queue_consumable
       A queue_consumable record contains information about queue consumable
       values in addition to queue state information:

       qname  The cluster queue name.

       hostname
              The hostname of a specific queue instance.

       report_time
              The time (GMT unix time stamp) when a state change was
              triggered.

       state  The new queue state.

       consumables
              Description of consumable values. Information about multiple
              consumables is separated by space. A consumable description has
              the format <name>=<actual_value>=<configured value>.
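
       The consumables field can be decoded as follows. This is a sketch
       under the assumptions stated in the text above (entries separated by
       space, each in the form name=actual=configured). parse_consumables is
       a hypothetical helper, the sample values are invented, and the code
       additionally assumes that both values are numeric.

```python
def parse_consumables(field):
    """Return {name: (actual_value, configured_value)} for a consumables field."""
    result = {}
    for item in field.split():
        # Raises ValueError if an entry does not have exactly three parts.
        name, actual, configured = item.split("=")
        result[name] = (float(actual), float(configured))
    return result


# Hypothetical sample field.
usage = parse_consumables("slots=2.000000=8.000000 lic=1.000000=4.000000")
for name, (actual, configured) in usage.items():
    print(name, actual, configured)
```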

   host
       A host record contains information about hosts and host load values.
       It contains the following information:

       hostname
              The name of the host.

       report_time
              The time (GMT unix time stamp) when the reported information
              was generated.

       state  The new host state. Currently, Grid Engine does not track a
              host state; the field is reserved for future use. Always
              contains the value X.

       load values
              Description of load values. Information about multiple load
              values is separated by space. A load value description has the
              format <name>=<actual_value>.
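
       Load values are not necessarily numeric, so a reader of host records
       may want to keep non-numeric values as text. The sketch below shows
       one way to do this. parse_load_values is a hypothetical helper and
       the load value names in the sample are only examples.

```python
def parse_load_values(field):
    """Return {name: value}; values are floats where possible, else strings."""
    values = {}
    for item in field.split():
        name, _, raw = item.partition("=")
        try:
            values[name] = float(raw)
        except ValueError:
            values[name] = raw  # keep non-numeric values (e.g. "1.5G") as text
    return values


# Hypothetical sample field.
load = parse_load_values("load_avg=0.25 mem_free=1.5G arch=lx24-amd64")
print(load)
```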

   host_consumable
       A host_consumable record contains information about hosts and host
       consumables. Host consumables can, for example, be licenses. It
       contains the following information:

       hostname
              The name of the host.

       report_time
              The time (GMT unix time stamp) when the reported information
              was generated.

       state  The new host state. Currently, Grid Engine does not track a
              host state; the field is reserved for future use. Always
              contains the value X.

       consumables
              Description of consumable values. Information about multiple
              consumables is separated by space. A consumable description has
              the format <name>=<actual_value>=<configured value>.

   sharelog
       The Grid Engine qmaster can dump information about sharetree
       configuration and usage to the reporting file. The parameter sharelog
       sets the interval at which sharetree information is dumped. It is set
       in the format HH:MM:SS. A value of 00:00:00 configures qmaster not to
       dump sharetree information. Intervals of several minutes up to hours
       are sensible values for this parameter. The record contains the
       following fields:

       current time
              The present time.

       usage time
              The time used so far.

       node name
              The node name.

       user name
              The user name.

       project name
              The project name.

       shares The total shares.

       job count
              The job count.

       level  The percentage of shares used.

       total  The adjusted percentage of shares used.

       long target share
              The long target percentage of resource shares used.

       short target share
              The short target percentage of resource shares used.

       actual share
              The actual percentage of resource shares used.

       usage  The combined shares used.

       cpu    The cpu used.

       mem    The memory used.

       io     The IO used.

       long target cpu
              The long target cpu used.

       long target mem
              The long target memory used.

       long target io
              The long target IO used.
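
       The HH:MM:SS format of the sharelog parameter described at the start
       of this section can be converted to seconds as sketched below. The
       helper name sharelog_interval_seconds is made up for illustration. It
       returns None for 00:00:00 to mirror the "do not dump" meaning of that
       value.

```python
def sharelog_interval_seconds(value):
    """Convert an HH:MM:SS interval to seconds; None means dumping is off."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    total = hours * 3600 + minutes * 60 + seconds
    return total or None


print(sharelog_interval_seconds("00:10:00"))
print(sharelog_interval_seconds("00:00:00"))
```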

   new_ar
       A new_ar record contains information about advance reservation
       objects. Entries of this type are added when an advance reservation
       is created. It contains the following information:

       submission_time
              The time (GMT unix time stamp) when the advance reservation was
              created.

       ar_number
              The advance reservation number identifying the reservation.

       ar_owner
              The owner of the advance reservation.

   ar_attribute
       The ar_attribute record is written whenever a new advance reservation
       is added or an attribute of an existing advance reservation changes.
       It has the following fields:

       event_time
              The time (GMT unix time stamp) when the event was generated.

       submission_time
              The time (GMT unix time stamp) when the advance reservation was
              created.

       ar_number
              The advance reservation number identifying the reservation.

       ar_name
              Name of the advance reservation.

       ar_account
              An account string which was specified during the creation of
              the advance reservation.

       ar_start_time
              Start time.

       ar_end_time
              End time.

       ar_granted_pe
              The parallel environment which was selected for an advance
              reservation.

       ar_granted_resources
              The granted resources which were selected for an advance
              reservation.

   ar_log
       The ar_log record is written whenever an advance reservation changes
       status. A status change can be from pending to active, but can also
       be triggered by system events such as a host outage. It has the
       following fields:

       ar_state_change_time
              The time (GMT unix time stamp) when the event occurred which
              caused a state change.

       submission_time
              The time (GMT unix time stamp) when the advance reservation was
              created.

       ar_number
              The advance reservation number identifying the reservation.

       ar_state
              The new state.

       ar_event
              An event id identifying the event which caused the state
              change.

       ar_message
              A message describing the event which caused the state change.

   ar_acct
       The ar_acct records are accounting records which are written for
       every queue instance whenever an advance reservation terminates.
       Advance reservation accounting records comprise the following fields:

       ar_termination_time
              The time (GMT unix time stamp) when the advance reservation
              terminated.

       submission_time
              The time (GMT unix time stamp) when the advance reservation was
              created.

       ar_number
              The advance reservation number identifying the reservation.

       ar_qname
              Cluster queue name which the advance reservation reserved.

       ar_hostname
              The name of the execution host.

       ar_slots
              The number of slots which were reserved.
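
       The new_ar, ar_log and ar_acct records can share one table-driven
       parser. The sketch below is an illustration only: AR_FIELDS and
       parse_ar_record are hypothetical names and the sample line is
       invented. Because the last ar_log field is free text, the sketch
       limits the number of splits so that any colons inside the message
       stay in that field. Whether Grid Engine escapes such colons is not
       stated in this page, so this is an assumption.

```python
AR_FIELDS = {
    "new_ar": ["submission_time", "ar_number", "ar_owner"],
    "ar_log": ["ar_state_change_time", "submission_time", "ar_number",
               "ar_state", "ar_event", "ar_message"],
    "ar_acct": ["ar_termination_time", "submission_time", "ar_number",
                "ar_qname", "ar_hostname", "ar_slots"],
}


def parse_ar_record(line):
    """Parse a new_ar, ar_log or ar_acct line into (record type, dict)."""
    _time, record_type, rest = line.rstrip("\n").split(":", 2)
    names = AR_FIELDS[record_type]  # KeyError for other record types
    # Split at most len(names) - 1 times: the last field keeps the remainder.
    values = rest.split(":", len(names) - 1)
    if len(values) != len(names):
        raise ValueError("unexpected number of fields: %d" % len(values))
    return record_type, dict(zip(names, values))


# Hypothetical sample line with a colon inside the message.
sample = ("1208879342:ar_log:1208879342:1208870000:7:r:START:"
          "reservation became active: start time reached")
record_type, ar = parse_ar_record(sample)
print(record_type, ar["ar_number"], ar["ar_message"])
```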

SEE ALSO
       sge_conf(5), host_conf(5).

COPYRIGHT
       See ge_intro(1) for a full statement of rights and permissions.

GE 6.2u5                 $Date: 2008/04/22 15:49:02 $                 REPORTING(5)