QPING(1) Grid Engine User Commands QPING(1)

qping - check application status of Grid Engine daemons.

qping [-help] [-noalias] [-ssl|-tcp] [ [ [-i <interval>] [-info] [-f] ]
| [ [-dump_tag tag [param] ] [-dump] [-nonewline] ] ] <host> <port>
<name> <id>

Qping is used to validate the runtime status of a Grid Engine service
daemon. The current Grid Engine implementation allows one to query the
GE_QMASTER daemon and any running GE_EXECD daemon. The qping command
sends a SIM (Status Information Message) to the destination daemon. The
communication layer of the specified daemon responds with a SIRM (Status
Information Response Message), which contains status information about
the queried daemon.

The qping -dump and -dump_tag options allow an administrator to observe
the communication protocol data flow of a Grid Engine service daemon.
The qping -dump instruction must be started as root and on the same
host where the observed daemon is running.

-f
Show full status information on each ping interval.

First output line: The first output line shows the date and time of the
request.

SIRM version: Internal version number of the SIRM (Status Information
Response Message).

SIRM message id: Current message id for this connection.

start time: Start time of the daemon. The format is as follows:

MM/DD/YYYY HH:MM:SS (seconds since 01.01.1970)

run time [s]: Run time in seconds since the start time.

messages in read buffer: Number of messages held in the communication
read buffer. These messages are buffered for the application (daemon).
If this number grows too large, the daemon is not able to handle all
messages sent to it.

messages in write buffer: Number of messages held in the communication
write buffer. These messages were sent by the application (daemon) to
the connected clients, but the communication layer has not been able to
send them yet. If this number grows too large, the communication layer
is not able to send messages as fast as the application (daemon)
produces them.

nr. of connected clients: The number of clients currently connected to
this daemon. This count includes the current qping connection.

status: The status value of the daemon. This value depends on the
application that replies to the qping request. If the application does
not provide any information, the status is 99999. The possible status
values for the Grid Engine daemons are:

qmaster:

0 There is no unusual timing situation.

1 One or more threads have reached the warning timeout. This may
happen when at least one thread has not updated its timestamp for an
unusually long time. A possible reason is a high workload for this
thread.

2 One or more threads have reached the error timeout. This may happen
when at least one thread has not updated its timestamp for longer than
10 minutes.

3 The time measurement is not initialized.

execd:

0 There is no unusual timing situation.

1 The dispatcher has reached the warning timeout. This may happen when
the dispatcher has not updated its timestamp for an unusually long
time. A possible reason is a high workload.

2 The dispatcher has reached the error timeout. This may happen when
the dispatcher has not updated its timestamp for longer than 10
minutes.

3 The time measurement is not initialized.
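
The status values listed above can be turned into readable text by a
monitoring script. The following is a minimal sketch in Python; the
dictionary layout and the names QPING_STATUS, NO_STATUS and
describe_status are illustrative assumptions and are not part of qping.

```python
# Status values as listed for the qmaster and execd daemons.
# All names in this block are illustrative, not part of qping.
QPING_STATUS = {
    "qmaster": {
        0: "no unusual timing situation",
        1: "one or more threads reached the warning timeout",
        2: "one or more threads reached the error timeout",
        3: "time measurement not initialized",
    },
    "execd": {
        0: "no unusual timing situation",
        1: "dispatcher reached the warning timeout",
        2: "dispatcher reached the error timeout",
        3: "time measurement not initialized",
    },
}

# Reported when the application provides no status information.
NO_STATUS = 99999


def describe_status(daemon: str, status: int) -> str:
    """Return a short description for a status value of the given daemon."""
    if status == NO_STATUS:
        return "application provided no status information"
    return QPING_STATUS[daemon].get(status, f"unknown status {status}")


print(describe_status("qmaster", 1))
```

Values outside the documented set are reported as unknown rather than
guessed.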


info: Status message of the daemon. This value depends on the
application that replies to the qping request. If the application does
not provide any information, the info message is "not available". The
info messages of the Grid Engine daemons have the following formats:


qmaster:

The info message contains information about the qmaster threads, each
followed by a thread state and time information. Each time one of the
known threads passes through its main loop, its time information is
updated. Since the qmaster has two message threads, every message thread
updates the time. This means that the timeout for the message thread
(MT) can only occur when no message thread is active anymore:

THREAD_NAME: THREAD_STATE (THREAD_TIME)

THREAD_NAME:
    MAIN:         Main thread
    signaler:     Signal thread
    event_master: Event master thread
    timer:        Timer thread
    worker:       Worker thread
    listener:     Listener thread
    scheduler:    Scheduler thread
    jvm:          Java thread

The thread names above are followed by a three-digit number.

THREAD_STATE:
    R: Running
    W: Warning
    E: Error

THREAD_TIME:
    Time since the last timestamp update.

The thread information is followed by an additional information string
which describes the complete application status.

execd:

The info message contains information for the execd job dispatcher:

    dispatcher: STATE (TIME)

STATE:
    R: Running
    W: Warning
    E: Error

TIME:
    Time since the last timestamp update.

The dispatcher information is followed by an additional information
string which describes the application status.
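
The NAME: STATE (TIME) entries of the info message can be extracted
with a regular expression. The following Python sketch makes several
assumptions that are not stated above: that TIME is a plain decimal
number, that entries may be separated by arbitrary text, and the sample
string itself, which is invented for illustration. The names ThreadInfo
and parse_info are hypothetical.

```python
import re
from typing import List, NamedTuple


class ThreadInfo(NamedTuple):
    name: str       # e.g. "worker001" or "dispatcher"
    state: str      # "R", "W" or "E"
    seconds: float  # time since the last timestamp update


STATE_NAMES = {"R": "Running", "W": "Warning", "E": "Error"}

# Matches "NAME: STATE (TIME)"; TIME as a decimal number is an assumption.
_ENTRY = re.compile(r"(\w+):\s*([RWE])\s*\((\d+(?:\.\d+)?)\)")


def parse_info(info: str) -> List[ThreadInfo]:
    """Extract every NAME: STATE (TIME) entry found in an info message."""
    return [ThreadInfo(n, s, float(t)) for n, s, t in _ENTRY.findall(info)]


# Invented sample; real info messages may be laid out differently.
sample = "MAIN: R (0.50) | signaler000: R (1.25) | worker001: W (42.00)"
for entry in parse_info(sample):
    if entry.state != "R":
        print(entry.name, STATE_NAMES[entry.state], entry.seconds)
```

Because the expression only looks for the entry pattern, the trailing
application status string is ignored.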

Monitor: If available, displays statistics on a thread. The data for each
thread is displayed on one line. The format of this line may change at
any time. Only the qmaster implements monitoring.


-help
Prints a list of all options.


-i interval
Set the qping interval time.

The default interval time is one second. Qping sends a SIM (Status
Information Message) at each interval.


-info
Show full status information (see -f for more information) and exit.
An exit value of 0 indicates no error. On error, qping exits with 1.
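
Since -info exits after one request and reports success through its
exit value, it is convenient for scripted health checks. The following
Python sketch builds the command line from the documented arguments and
tests the exit value. The function names and the injectable run
parameter are assumptions made for illustration, and qping is assumed
to be on the PATH.

```python
import subprocess
from typing import Callable, List


def qping_info_command(host: str, port: int, name: str,
                       endpoint_id: int = 1) -> List[str]:
    """Build the argument list for: qping -info <host> <port> <name> <id>."""
    return ["qping", "-info", host, str(port), name, str(endpoint_id)]


def daemon_responds(host: str, port: int, name: str, endpoint_id: int = 1,
                    run: Callable = subprocess.run) -> bool:
    """Return True if 'qping -info' exits with 0, False otherwise.

    The run parameter exists only so the call can be replaced in tests.
    """
    result = run(qping_info_command(host, port, name, endpoint_id),
                 capture_output=True, text=True)
    return result.returncode == 0
```

A call such as daemon_responds("master_host", 31116, "qmaster") then
returns True only when qping reported no error.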


-noalias
Ignore the host_aliases file, which is located at
<ge_root>/<cell>/common/host_aliases. If this option is used, it is not
necessary to set any Grid Engine environment variable.


-ssl
This option can be used to specify an SSL (Secure Socket Layer)
configuration. Qping uses the configuration to connect to services
running SSL. If the SGE settings file is not sourced, you have to use
the -noalias option to bypass the need for the SGE_ROOT environment
variable. The following environment variables are used to specify your
certificates:
    SSL_CA_CERT_FILE - CA certificate file
    SSL_CERT_FILE    - certificate file
    SSL_KEY_FILE     - key file
    SSL_RAND_FILE    - rand file
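
When qping is started from a script, the four certificate variables can
be placed in the environment of the child process. The following Python
sketch only assembles the environment and the argument list. The file
paths are placeholders, the helper name qping_ssl_env is hypothetical,
and combining -noalias, -ssl and -info in one call is an assumption
based on the syntax line.

```python
import os
from typing import Dict, Mapping, Optional


def qping_ssl_env(ca_cert: str, cert: str, key: str, rand: str,
                  base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the qping SSL variables set."""
    env = dict(os.environ if base is None else base)
    env.update({
        "SSL_CA_CERT_FILE": ca_cert,
        "SSL_CERT_FILE": cert,
        "SSL_KEY_FILE": key,
        "SSL_RAND_FILE": rand,
    })
    return env


# Placeholder paths; substitute the files of the local installation.
env = qping_ssl_env("/path/to/cacert.pem", "/path/to/cert.pem",
                    "/path/to/key.pem", "/path/to/rand.seed", base={})
command = ["qping", "-noalias", "-ssl", "-info",
           "master_host", "31116", "qmaster", "1"]
print(sorted(env))
```

The env dictionary and command list can then be handed to
subprocess.run(command, env=env) on a host where qping is installed.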


-tcp
This option selects TCP/IP as the protocol used to connect to other
services.


-nonewline
Dump output will not have a line break within a message, and binary
messages are not unpacked.


-dump
This option allows an administrator to observe the communication
protocol data flow of a Grid Engine service daemon. The qping -dump
instruction must be started as root and on the same host where the
observed daemon is running.

The output is written to stdout. The environment variable
"SGE_QPING_OUTPUT_FORMAT" can be set to hide columns, to set a default
column width, or to set a hostname output format. The value of the
environment variable can be set to any combination of the following
specifiers, separated by a space character:
    "h:X"   -> hide column X
    "s:X"   -> show column X
    "w:X:Y" -> set width of column X to Y
    "hn:X"  -> set hostname output parameter X.
               X values are "long" or "short"

Run qping -help to see which columns are available.
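
A value for SGE_QPING_OUTPUT_FORMAT can be assembled from the four
specifier forms above. The following Python sketch treats the column
identifier X as an opaque string, because the valid identifiers are
only listed by qping -help. COL_A and COL_B are placeholders, not real
column identifiers, and the function name is hypothetical.

```python
from typing import Iterable, Mapping, Optional


def qping_output_format(hide: Iterable[str] = (),
                        show: Iterable[str] = (),
                        widths: Optional[Mapping[str, int]] = None,
                        hostname: Optional[str] = None) -> str:
    """Join h:X, s:X, w:X:Y and hn:X specifiers with single spaces."""
    parts = [f"h:{column}" for column in hide]
    parts += [f"s:{column}" for column in show]
    parts += [f"w:{column}:{width}" for column, width in (widths or {}).items()]
    if hostname is not None:
        if hostname not in ("long", "short"):
            raise ValueError("hostname format must be 'long' or 'short'")
        parts.append(f"hn:{hostname}")
    return " ".join(parts)


# COL_A and COL_B are placeholders for identifiers taken from qping -help.
value = qping_output_format(hide=["COL_A"], widths={"COL_B": 20},
                            hostname="short")
print(value)  # h:COL_A w:COL_B:20 hn:short
```

The resulting string is what would be exported as
SGE_QPING_OUTPUT_FORMAT before starting qping -dump.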



-dump_tag tag [param]
This option has the same meaning as -dump, but can provide more
information by specifying the debug level and message types qping
should print:
    -dump_tag ALL <debug level>
        This option shows all possible debug messages (APP+MSG) for the
        debug levels ERROR, WARNING, INFO, DEBUG and DPRINTF. The
        contacted service must support this kind of debugging. This
        option is not currently implemented.
    -dump_tag APP <debug level>
        This option shows only application debug messages for the debug
        levels ERROR, WARNING, INFO, DEBUG and DPRINTF. The contacted
        service must support this kind of debugging. This option is not
        currently implemented.
    -dump_tag MSG
        This option has the same behavior as the -dump option.


host
Host where the daemon is running.


port
Port to which the daemon is bound (the ge_qmaster or ge_execd port
number).


name
Name of the communication endpoint ("qmaster" or "execd"). A
communication endpoint is a triplet of hostname/endpoint name/endpoint
id (e.g. hostA/qmaster/1 or subhost/qstat/4).


id
Id of the communication endpoint ("1" for daemons).



> qping master_host 31116 qmaster 1
08/24/2004 16:41:15 endpoint master_host/qmaster/1 at port 31116 is up since 365761 seconds
08/24/2004 16:41:16 endpoint master_host/qmaster/1 at port 31116 is up since 365762 seconds
08/24/2004 16:41:17 endpoint master_host/qmaster/1 at port 31116 is up since 365763 seconds
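
Each line of the interval output above carries the endpoint triplet,
the port and the uptime. The following Python sketch splits such a line
into its parts. It assumes that the wording of the line is exactly as
shown in the example and tolerates varying whitespace; the names
PingLine and parse_ping_line are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class PingLine(NamedTuple):
    timestamp: str
    host: str
    name: str
    endpoint_id: int
    port: int
    uptime_seconds: int


# Pattern derived from the example output; the exact wording is assumed.
_PING = re.compile(
    r"^(\d{2}/\d{2}/\d{4}\s+\d{2}:\d{2}:\d{2})\s+endpoint\s+"
    r"([^/\s]+)/([^/\s]+)/(\d+)\s+at\s+port\s+(\d+)\s+"
    r"is\s+up\s+since\s+(\d+)\s+seconds$"
)


def parse_ping_line(line: str) -> Optional[PingLine]:
    """Parse one interval line; return None if the line does not match."""
    match = _PING.match(line.strip())
    if match is None:
        return None
    timestamp, host, name, endpoint_id, port, uptime = match.groups()
    return PingLine(timestamp, host, name,
                    int(endpoint_id), int(port), int(uptime))


line = ("08/24/2004 16:41:15 endpoint master_host/qmaster/1 "
        "at port 31116 is up since 365761 seconds")
print(parse_ping_line(line))
```

Lines that do not follow the pattern, such as error messages, yield
None instead of raising an exception.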

> qping -info master_host 31116 qmaster 1
08/24/2004 16:42:47:
SIRM version: 0.1
SIRM message id: 1
start time: 08/20/2004 11:05:14 (1092992714)
run time [s]: 365853
messages in read buffer: 0
messages in write buffer: 0
nr. of connected clients: 4
status: 0
info: ok

> qping -info execd_host 31117 execd 1
08/24/2004 16:43:45:
SIRM version: 0.1
SIRM message id: 1
start time: 08/20/2004 11:06:13 (1092992773)
run time [s]: 365852
messages in read buffer: 0
messages in write buffer: 0
nr. of connected clients: 2
status: 0
info: ok
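
The -info output shown above is a list of "field: value" lines and can
be read into a dictionary. The following Python sketch parses the first
example and cross-checks the run time against the two timestamps. The
function names and the "request time" key are assumptions made for
illustration; real output may align the values with additional
whitespace, which the sketch strips.

```python
import re
from datetime import datetime
from typing import Dict

SAMPLE = """\
08/24/2004 16:42:47:
SIRM version: 0.1
SIRM message id: 1
start time: 08/20/2004 11:05:14 (1092992714)
run time [s]: 365853
messages in read buffer: 0
messages in write buffer: 0
nr. of connected clients: 4
status: 0
info: ok
"""


def parse_info_block(text: str) -> Dict[str, str]:
    """Map each field name to its value; the first line is the request time."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    fields = {"request time": lines[0].rstrip(":")}
    for ln in lines[1:]:
        key, _, value = ln.partition(":")  # split at the first colon only
        fields[key.strip()] = value.strip()
    return fields


def start_epoch(fields: Dict[str, str]) -> int:
    """Return the seconds-since-1970 value given in parentheses."""
    match = re.search(r"\((\d+)\)", fields["start time"])
    if match is None:
        raise ValueError("start time has no epoch value")
    return int(match.group(1))


fields = parse_info_block(SAMPLE)
fmt = "%m/%d/%Y %H:%M:%S"
request = datetime.strptime(fields["request time"], fmt)
started = datetime.strptime(fields["start time"].split(" (")[0], fmt)
elapsed = int((request - started).total_seconds())
print(fields["status"], start_epoch(fields), elapsed)
```

For the example, the difference between request time and start time
equals the reported run time of 365853 seconds.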



GE_ROOT Specifies the location of the Grid Engine standard
configuration files.

GE_CELL If set, specifies the default Grid Engine cell.

ge_intro(1), host_aliases(5), ge_qmaster(8), ge_execd(8).

See ge_intro(1) for a full statement of rights and permissions.

GE 6.2u5 $Date: 2009/03/12 16:06:25 $ QPING(1)