QPING(1) Grid Engine User Commands QPING(1)

qping - check application status of Grid Engine daemons.

qping [-help] [-noalias] [-ssl|-tcp] [ [ [-i <interval>] [-info] [-f] ]
| [ [-dump_tag tag [param] ] [-dump] [-nonewline] ] ] <host> <port>
<name> <id>

qping is used to validate the runtime status of a Grid Engine service
daemon. The current Grid Engine implementation allows one to query the
SGE_QMASTER daemon and any running SGE_EXECD daemon. The qping command
sends a SIM (Status Information Message) to the destination daemon. The
communication layer of the specified daemon responds with a SIRM
(Status Information Response Message), which contains status
information about the consulted daemon.

The qping -dump and -dump_tag options allow an administrator to observe
the communication protocol data flow of a Grid Engine service daemon.
The qping -dump instruction must be started as root and on the same
host where the observed daemon is running.

-f
Show full status information on each ping interval.

First output line: The first output line shows the date and time of the
request.

SIRM version: Internal version number of the SIRM (Status Information
Response Message).

SIRM message id: Current message id for this connection.

start time: Start time of the daemon. The format is as follows:

MM/DD/YYYY HH:MM:SS (seconds since 01.01.1970)

run time [s]: Run time in seconds since the start time.

messages in read buffer: Number of buffered messages in the
communication buffer. The messages are buffered for the application
(daemon). When this number grows too large, the daemon is not able to
handle all messages sent to it.

messages in write buffer: Number of buffered messages in the
communication write buffer. The messages are sent from the application
(daemon) to the connected clients, but the communication layer was not
yet able to send them. If this number grows too large, the communication
layer is not able to send the messages as fast as the application
(daemon) wants them to be sent.

nr. of connected clients: The number of clients currently connected to
this daemon. This count includes the current qping connection.

status: The status value of the daemon. This value depends on the
application that replies to the qping request. If the application does
not provide any information, the status is 99999. Here are the possible
status values for the Grid Engine daemons:

qmaster:

0 There is no unusual timing situation.

1 One or more threads have reached the warning timeout. This may happen
when at least one thread does not increment its time stamp for an
unusually long time. A possible reason for this is a high workload for
this thread.

2 One or more threads have reached the error timeout. This may happen
when at least one thread has not incremented its time stamp for longer
than 10 minutes.

3 The time measurement is not initialized.

execd:

0 There is no unusual timing situation.

1 The dispatcher has reached the warning timeout. This may happen when
the dispatcher does not increment its time stamp for an unusually long
time. A possible reason for this is a high workload.

2 The dispatcher has reached the error timeout. This may happen when the
dispatcher has not incremented its time stamp for longer than 10
minutes.

3 The time measurement is not initialized.
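
The status values above can be captured in a small lookup table, for
example in a monitoring script. The following is a minimal Python
sketch; the helper name describe_status and the shortened wording of
each description are assumptions made for illustration and are not part
of qping.

```python
# Status values as listed on this page; the wording is condensed.
UNKNOWN_STATUS = 99999  # reported when the application provides no status

STATUS_MEANINGS = {
    "qmaster": {
        0: "no unusual timing situation",
        1: "one or more threads reached the warning timeout",
        2: "one or more threads reached the error timeout",
        3: "time measurement not initialized",
    },
    "execd": {
        0: "no unusual timing situation",
        1: "dispatcher reached the warning timeout",
        2: "dispatcher reached the error timeout",
        3: "time measurement not initialized",
    },
}


def describe_status(daemon: str, status: int) -> str:
    """Return a short description for a status value (hypothetical helper)."""
    if status == UNKNOWN_STATUS:
        return "application provided no status information"
    return STATUS_MEANINGS.get(daemon, {}).get(status, "undocumented status value")
```

Values that are not listed on this page are reported as undocumented
rather than guessed.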

info: Status message of the daemon. This value depends on the
application that replies to the qping request. If the application does
not provide any information, the info message is "not available". Here
are the possible info messages for the Grid Engine daemons:

qmaster:

The info message contains information about the qmaster threads,
followed by a thread state and time information. Each time one of the
known threads passes through its main loop, the time information is
updated. Since the qmaster has two message threads, every message
thread updates the time. This means the timeout for the message thread
(MT) can only occur when no message thread is active anymore:

THREAD_NAME: THREAD_STATE (THREAD_TIME)

THREAD_NAME:
EDT: Event Delivery Thread
TET: Timed Event Thread
MT: Message Thread(s)
SIGT: SIGnal Thread

In addition to these thread names, the name can contain a thread number
(for example: MT(1)) when multiple instances of this thread are running.

THREAD_STATE:
R: Running
W: Warning
E: Error

THREAD_TIME:
Time since the last timestamp update.

The thread information is followed by an additional information string
which describes the complete application status.

execd:

The info message contains information for the execd job dispatcher:
dispatcher: STATE (TIME)

STATE:
R: Running
W: Warning
E: Error

TIME:
Time since the last timestamp update.

The dispatcher information is followed by an additional information
string which describes the application status.
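
The "NAME: STATE (TIME)" entries described above for qmaster and execd
can be picked out of an info message with a regular expression. The
sketch below is illustrative only: this page does not specify how the
entries are separated or how TIME is formatted, so the "|" separators,
the decimal seconds and both sample strings are assumptions.

```python
import re

# Matches entries of the form "NAME: STATE (TIME)", e.g. "MT(1): R (0.25)".
# Assumption: TIME is a plain decimal number of seconds.
ENTRY_RE = re.compile(r"(\w+(?:\(\d+\))?):\s*([RWE])\s*\((\d+(?:\.\d+)?)\)")

STATE_NAMES = {"R": "Running", "W": "Warning", "E": "Error"}


def parse_info_entries(info):
    """Return (name, state, seconds) tuples found in an info string."""
    return [
        (name, STATE_NAMES[state], float(seconds))
        for name, state, seconds in ENTRY_RE.findall(info)
    ]


# Made-up sample strings, not captured from a real daemon.
qmaster_info = "EDT: R (0.25) | TET: R (1.50) | MT(1): W (35.00) | SIGT: R (0.10) | warning"
execd_info = "dispatcher: R (2.00) | ok"

qmaster_entries = parse_info_entries(qmaster_info)
execd_entries = parse_info_entries(execd_info)
```

Because the pattern only looks for the entry shape, the trailing
application status string is ignored and an info message such as "ok"
yields an empty list.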

Monitor: If available, displays statistics on a thread. The data for
each thread is displayed in one line. The format of this line may
change at any time. Only the qmaster implements monitoring.

-help
Prints a list of all options.

-i interval
Set qping interval time.

The default interval time is one second. qping will send a SIM (Status
Information Message) on each interval time.

-info
Show full status information (see -f for more information) and exit.
The exit value 0 indicates no error. On errors qping returns with 1.

-noalias
Ignore host_aliases file, which is located at
$SGE_ROOT/$SGE_CELL/common/host_aliases. If this option is used it is
not necessary to set any Grid Engine environment variable.

-ssl
This option can be used to specify an SSL (Secure Socket Layer)
configuration. qping will use the configuration to connect to services
running SSL. If the SGE settings file is not sourced, you have to use
the -noalias option to bypass the need for the SGE_ROOT environment
variable. The following environment variables are used to specify your
certificates:
SSL_CA_CERT_FILE - CA certificate file
SSL_CERT_FILE - certificates file
SSL_KEY_FILE - key file
SSL_RAND_FILE - rand file
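
Before calling qping -ssl from a script, it can help to check which of
the variables listed above are set. The following Python sketch does
that; the helper name unset_ssl_variables and the example paths are
hypothetical, and this page does not state whether all four variables
are mandatory.

```python
import os

SSL_VARIABLES = (
    "SSL_CA_CERT_FILE",
    "SSL_CERT_FILE",
    "SSL_KEY_FILE",
    "SSL_RAND_FILE",
)


def unset_ssl_variables(env=None):
    """Return the SSL variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in SSL_VARIABLES if not env.get(name)]


# Hypothetical paths, for illustration only.
example_env = {
    "SSL_CA_CERT_FILE": "/path/to/cacert.pem",
    "SSL_CERT_FILE": "/path/to/cert.pem",
}
still_unset = unset_ssl_variables(example_env)
```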

-tcp
This option is used to select TCP/IP as the protocol used to connect to
other services.

-nonewline
Dump output will not have a linebreak within a message and binary
messages are not unpacked.

-dump
This option allows an administrator to observe the communication
protocol data flow of a Grid Engine service daemon. The qping -dump
instruction must be started as root and on the same host where the
observed daemon is running.

The output is written to stdout. The environment variable
"SGE_QPING_OUTPUT_FORMAT" can be set to hide columns, to set a default
column width, or to set a hostname output format. The value of the
environment variable can be set to any combination of the following
specifiers, separated by a space character:
"h:X" -> hide column X
"s:X" -> show column X
"w:X:Y" -> set width of column X to Y
"hn:X" -> set hostname output parameter X.
X values are "long" or "short"

Run qping -help to see which columns are available.
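
A value for SGE_QPING_OUTPUT_FORMAT can be assembled from the specifiers
above. The following Python sketch is only an illustration: the helper
name build_output_format is hypothetical, and the column identifiers 1,
4 and 5 are placeholders, since the available columns are listed by
qping -help and not on this page.

```python
def build_output_format(hide=(), show=(), widths=None, hostname=None):
    """Compose a space-separated value for SGE_QPING_OUTPUT_FORMAT."""
    if hostname not in (None, "long", "short"):
        raise ValueError("hostname must be 'long' or 'short'")
    parts = [f"h:{col}" for col in hide]
    parts += [f"s:{col}" for col in show]
    parts += [f"w:{col}:{width}" for col, width in (widths or {}).items()]
    if hostname is not None:
        parts.append(f"hn:{hostname}")
    return " ".join(parts)


# Column identifiers 1, 4 and 5 are placeholders for illustration only.
value = build_output_format(hide=[1], show=[4], widths={5: 20}, hostname="short")
```

The resulting string would then be exported in the environment of the
qping -dump process.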

-dump_tag tag [param]
This option has the same meaning as -dump, but can provide more
information by specifying the debug level and message types qping
should print:

-dump_tag ALL <debug level>
This option shows all possible debug messages (APP+MSG) for the debug
levels ERROR, WARNING, INFO, DEBUG and DPRINTF. The contacted service
must support this kind of debugging. This option is not currently
implemented.

-dump_tag APP <debug level>
This option shows only application debug messages for the debug levels
ERROR, WARNING, INFO, DEBUG and DPRINTF. The contacted service must
support this kind of debugging. This option is not currently
implemented.

-dump_tag MSG
This option has the same behavior as the -dump option.


host
Host where the daemon is running.

port
Port to which the daemon is bound (the sge_qmaster or sge_execd port
number in use).

name
Name of the communication endpoint ("qmaster" or "execd"). A
communication endpoint is a triplet of hostname/endpoint name/endpoint
id (e.g. hostA/qmaster/1 or subhost/qstat/4).

id
Id of the communication endpoint ("1" for daemons).
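
When qping is called from a script, the argument list can be assembled
from the options and operands described above. The following Python
sketch covers only the status options (not -dump or -dump_tag); the
function name build_qping_argv and its parameter names are hypothetical.

```python
def build_qping_argv(host, port, name, endpoint_id=1, *, info=False,
                     full=False, interval=None, noalias=False, transport=None):
    """Assemble a qping argument list following the syntax of this page."""
    if transport not in (None, "ssl", "tcp"):
        raise ValueError("transport must be 'ssl', 'tcp' or None")
    argv = ["qping"]
    if noalias:
        argv.append("-noalias")
    if transport is not None:
        argv.append(f"-{transport}")
    if interval is not None:
        argv += ["-i", str(interval)]
    if info:
        argv.append("-info")
    if full:
        argv.append("-f")
    # Operands: <host> <port> <name> <id>; the id is "1" for daemons.
    argv += [host, str(port), name, str(endpoint_id)]
    return argv


info_argv = build_qping_argv("master_host", 31116, "qmaster", info=True)
```

Such a list could be passed to subprocess.run(). With -info, an exit
value of 0 indicates no error and 1 indicates an error, as described
for that option.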

> qping master_host 31116 qmaster 1
08/24/2004 16:41:15 endpoint master_host/qmaster/1 at port 31116 is up since 365761 seconds
08/24/2004 16:41:16 endpoint master_host/qmaster/1 at port 31116 is up since 365762 seconds
08/24/2004 16:41:17 endpoint master_host/qmaster/1 at port 31116 is up since 365763 seconds
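
The lines printed in the example above have a regular shape and can be
split into fields by a script. The following Python sketch assumes that
every line of interest looks exactly like those shown; any other output,
such as an error message, is returned as None. The name parse_ping_line
is hypothetical.

```python
import re

PING_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"endpoint (?P<host>[^/\s]+)/(?P<name>[^/\s]+)/(?P<id>\d+) "
    r"at port (?P<port>\d+) is up since (?P<uptime>\d+) seconds$"
)


def parse_ping_line(line):
    """Return the fields of one 'is up since' line, or None if it differs."""
    match = PING_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("id", "port", "uptime"):
        fields[key] = int(fields[key])
    return fields


ping = parse_ping_line(
    "08/24/2004 16:41:15 endpoint master_host/qmaster/1 "
    "at port 31116 is up since 365761 seconds"
)
```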

> qping -info master_host 31116 qmaster 1
08/24/2004 16:42:47:
SIRM version: 0.1
SIRM message id: 1
start time: 08/20/2004 11:05:14 (1092992714)
run time [s]: 365853
messages in read buffer: 0
messages in write buffer: 0
nr. of connected clients: 4
status: 0
info: ok
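
The -info output shown above is a timestamp line followed by
"field: value" lines, so it can be turned into a dictionary. The
following Python sketch assumes exactly that layout; the name
parse_info_block is hypothetical. Each line is split at its first colon
only, so values that themselves contain colons (the start time, or an
info message with thread entries) are kept intact.

```python
def parse_info_block(text):
    """Split qping -info output into the request timestamp and a field dict."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty qping -info output")
    timestamp = lines[0].rstrip(":")
    fields = {}
    for line in lines[1:]:
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return timestamp, fields


# Sample taken from the qmaster example on this page.
SAMPLE = """\
08/24/2004 16:42:47:
SIRM version: 0.1
SIRM message id: 1
start time: 08/20/2004 11:05:14 (1092992714)
run time [s]: 365853
messages in read buffer: 0
messages in write buffer: 0
nr. of connected clients: 4
status: 0
info: ok
"""

timestamp, fields = parse_info_block(SAMPLE)
```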

> qping -info execd_host 31117 execd 1
08/24/2004 16:43:45:
SIRM version: 0.1
SIRM message id: 1
start time: 08/20/2004 11:06:13 (1092992773)
run time [s]: 365852
messages in read buffer: 0
messages in write buffer: 0
nr. of connected clients: 2
status: 0
info: ok
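
In both -info examples, the run time equals the request time minus the
start time. The following Python sketch reproduces that arithmetic from
the printed timestamps. It assumes both timestamps are in the same local
time zone with no daylight saving change in between; the function name
run_time_seconds is hypothetical.

```python
from datetime import datetime, timezone

FMT = "%m/%d/%Y %H:%M:%S"  # MM/DD/YYYY HH:MM:SS, as printed by qping


def run_time_seconds(request_time, start_time):
    """Seconds between two local timestamps in MM/DD/YYYY HH:MM:SS form."""
    delta = datetime.strptime(request_time, FMT) - datetime.strptime(start_time, FMT)
    return int(delta.total_seconds())


qmaster_run = run_time_seconds("08/24/2004 16:42:47", "08/20/2004 11:05:14")
execd_run = run_time_seconds("08/24/2004 16:43:45", "08/20/2004 11:06:13")

# The value in parentheses is seconds since 01.01.1970, so it converts to
# UTC without knowing the time zone of the host that produced the output.
qmaster_start_utc = datetime.fromtimestamp(1092992714, tz=timezone.utc)
```

The computed values match the "run time [s]" fields of the two examples
(365853 and 365852). The epoch value of the qmaster start time
corresponds to 09:05:14 UTC, which suggests the example host was
running two hours ahead of UTC.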

SGE_ROOT Specifies the location of the Grid Engine standard
configuration files.

SGE_CELL If set, specifies the default Grid Engine cell.

sge_intro(1), SGE_H_ALIASES(5), sge_qmaster(8), sge_execd(8).

See sge_intro(1) for a full statement of rights and permissions.

GE 6.1 $Date: 2007/07/19 08:17:15 $ QPING(1)