1CHECK_OPENMANAGE(8) Nagios plugin CHECK_OPENMANAGE(8)
2
3
4
6 check_openmanage - Nagios plugin for checking the hardware status on
7 Dell servers running OpenManage
8
10 check_openmanage [option...]
11
12 check_openmanage -H hostname [option...]
13
15 check_openmanage is a plugin for Nagios which checks the hardware
16 health of Dell servers running OpenManage Server Administrator (OMSA).
17 The plugin checks the health of the storage subsystem, power supplies,
18 memory modules, temperature probes etc., and gives an alert if any of
19 the components are faulty or operate outside normal parameters.
20
21 check_openmanage is designed to be used either locally (using NRPE
22 or similar) or remotely (using SNMP). In either mode, the output is
23 (nearly) the same. Note that checking the alert log is not supported in
24 SNMP mode.
25
27 -f, --config file
28 Specify a configuration file. For reference on the config file
29 syntax and options, consult the check_openmanage.conf(5) manual
30 page.
31
32 -t, --timeout seconds
33 The number of seconds after which the plugin will abort. Default
34 timeout is 30 seconds if the option is not present.
35
36 -p, --perfdata [argument]
37 Collect performance data. Performance data collected include
38 temperatures (in Celsius) and fan speeds (in rpm). On systems that
39 support it, power consumption is also collected (in Watts). This
40 option takes one of two arguments, both of which are optional:
41
42 minimal
43 If minimal is specified as argument, the plugin will use
44 shorter names for the performance data labels, e.g. “t0”
45 instead of “temp_0_system_board_ambient”. This can be used as a
46 workaround in cases where the plugin output needs shortening,
47 for example if the 1024 character limit of NRPE is reached.
48
49 multiline
50 If multiline is specified as argument, the plugin will output
51 the performance data on multiple lines, for Nagios 3.x and
52 above.
53
54 The default behaviour should be sufficient for most users.
55
56 --legacy-perfdata
57 With version 3.7.0, performance data output changed. The new format
58 is not compatible with the old format. Users who wish to postpone
59 switching to the new performance data API may set this option.
60
61 -w, --warning string | file
62 Override the machine-default temperature warning thresholds. Syntax
63 is:
64
65 id1=max[/min],id2=max[/min],...
66
67
68 The following example sets warning limits to max 50C for probe 0,
69 and max 45C and min 10C for probe 1:
70
71 check_openmanage -w 0=50,1=45/10
72
73
74 The minimum limit can be omitted, if desired. Most often, you are
75 only interested in setting the maximum thresholds.
76
77 This parameter can be either a string with the limits, or a file
78 containing the limits string. The option can be specified multiple
79 times.
80
81 NOTE: This option should only be used to narrow the field of OK
82 temperatures wrt. the OMSA defaults. To expand the field of OK
83 temperatures, increase the OMSA thresholds. See the plugin web page
84 for more information.
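
The threshold syntax above can be illustrated with a small parser. This is only a sketch of how a string such as "0=50,1=45/10" could be interpreted, not the plugin's own Perl code; the function name parse_thresholds is hypothetical.

```python
def parse_thresholds(spec):
    """Parse 'id1=max[/min],id2=max[/min],...' into {probe_id: (max, min)}.

    The minimum is None when it is omitted. Hypothetical helper, written
    only to illustrate the documented syntax.
    """
    limits = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        probe, _, values = item.partition("=")
        if not values:
            raise ValueError(f"malformed threshold item: {item!r}")
        max_text, _, min_text = values.partition("/")
        minimum = float(min_text) if min_text else None
        limits[int(probe)] = (float(max_text), minimum)
    return limits


# The example from the text: max 50 for probe 0, max 45 and min 10 for probe 1.
limits = parse_thresholds("0=50,1=45/10")
```

Here probe 0 gets a maximum only, while probe 1 gets both a maximum and a minimum.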
85
86 -c, --critical string | file
87 Override the machine-default temperature critical thresholds.
88 Syntax and behaviour are the same as for the warning thresholds
89 described above.
90
91 -F, --fahrenheit
92 Set Fahrenheit as unit for all temperatures. This option will
93 override the --tempunit option, if used simultaneously.
94
95 --tempunit unit
96 Set temperature unit. Legal values are:
97
98 F: Fahrenheit
99
100 C: Celsius
101
102 K: Kelvin
103
104 R: Rankine
105
106 Default: C
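
For reference, the four units relate to Celsius through the standard conversion formulas. The sketch below assumes the reading is available in Celsius (the default unit); the function name convert_celsius is hypothetical and is not part of the plugin.

```python
def convert_celsius(celsius, unit="C"):
    """Convert a Celsius reading to the unit letter used by --tempunit."""
    if unit == "C":
        return celsius
    if unit == "F":
        return celsius * 9 / 5 + 32
    if unit == "K":
        return celsius + 273.15
    if unit == "R":
        # Rankine is the Fahrenheit-sized scale counted from absolute zero.
        return (celsius + 273.15) * 9 / 5
    raise ValueError(f"unknown temperature unit: {unit!r}")


boiling_f = convert_celsius(100, "F")
freezing_k = convert_celsius(0, "K")
```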
107
108 --omreport path
109 Specify full path to omreport, if it is not installed in any of the
110 regular places. Usually this option is only needed on Windows, if
111 omreport is not installed on the C: drive.
112
113 --vdisk-critical
114 Make any alerts concerning virtual disks appear as critical.
115
116 -d, --debug
117 Debug output. Will report status on everything, even if status is
118 ok. Blacklisted or unchecked components are ignored (i.e. no
119 output).
120
121 NOTE: This option is intended for diagnostics and debugging
122 purposes only. Do not use this option from within Nagios, i.e. in
123 the Nagios config.
124
125 -h, --help
126 Display help message and exit.
127
128 -V, --version
129 Print version info and exit.
130
132 -o, --ok-info level
133 This option lets you define how much output you want the plugin to
134 give when everything is OK, i.e. the verbosity level. The default
135 value is 0 (one line of output). The output levels are cumulative.
136
137 0: Only one line
138
139 1: BIOS and firmware info on a separate line
140
141 2: Storage controller and enclosure info on separate lines
142
143 3: OMSA version on separate line
144
145 Default: 0
146
147 The reason that OMSA version is separated from the rest is that
148 finding it requires running a really slow omreport command, when
149 the plugin is run locally via NRPE.
150
151 -B, --show-blacklist
152 If used together with blacklisting, this option will make the
153 plugin output all blacklistings that are being used. The output
154 will have the correct blacklisting syntax, and will make it easy to
155 maintain control over which blacklistings are used for each
156 server, as any blacklistings can be viewed from Nagios.
157
158 When blacklisting is not used, this option has no effect.
159
160 -i, --info
161 Prefix any alerts with the service tag.
162
163 -e, --extinfo
164 Display a short summary of system information (model and service
165 tag) in case of an alert.
166
167 -I, --htmlinfo [code]
168 Using this option will make the servicetag and model name into
169 clickable HTML links in the output. The model name link will point
170 to the official Dell documentation for that model, while the
171 servicetag link will point to a website containing support info for
172 that particular server.
173
174 This option takes an optional argument, which should be a country
175 or area code. If the country code is omitted, the servicetag link
176 will still work, but it will not be specific to your country or
177 area. Example for Germany:
178
179 check_openmanage --htmlinfo de
180
181
182 This option is particularly useful together with either the
183 --extinfo or --info option. Only the most common country codes
184 are supported at this time:
185
186 Europe, Middle East and Africa (EMEA)
187 at: Austria be: Belgium cz: Czech Republic
188 de: Germany dk: Denmark es: Spain
189 fi: Finland fr: France gr: Greece
190 it: Italy il: Israel me: Middle East
191 no: Norway nl: The Netherlands pl: Poland
192 pt: Portugal ru: Russia se: Sweden
193 uk: United Kingdom za: South Africa
194
195 America
196 br: Brazil ca: Canada mx: Mexico
197 us: USA
198
199 Asia / Pacific
201 au: Australia cn: China in: India
202 jp: Japan
203
204
205 --postmsg string | file
206 User specified post message. Useful for displaying arbitrary or
207 various system information at the end of alerts. The argument is
208 either a string with the message, or a file containing that string.
209 You can control the format with the following interpreted
210 sequences:
211
212 %m: System model
213
214 %s: Service tag
215
216 %b: BIOS version
217
218 %d: BIOS release date
219
220 %o: Operating system name
221
222 %r: Operating system release
223
224 %p: Number of physical drives
225
226 %l: Number of logical drives
227
228 %n: Line break. Will be a regular line break if run from a TTY,
229 else an HTML line break.
230
231 %%: A literal “%”
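
The interpreted sequences above behave like a small template language. The following sketch shows one way such a template could be expanded in a single pass; the function name expand_postmsg and the dictionary keys are hypothetical, the HTML line break is assumed to be "<br/>", and the plugin's real implementation may differ.

```python
import re

# Map each documented sequence letter to a key in the info dictionary.
# The key names are assumptions made for this illustration.
FIELDS = {
    "m": "model", "s": "servicetag", "b": "bios", "d": "biosdate",
    "o": "os", "r": "osrelease", "p": "pdisks", "l": "vdisks",
}


def expand_postmsg(template, info, tty=False):
    """Expand %-sequences in a single pass so that '%%m' yields '%m'."""
    def replace(match):
        code = match.group(1)
        if code == "%":
            return "%"
        if code == "n":
            return "\n" if tty else "<br/>"
        if code in FIELDS:
            return str(info[FIELDS[code]])
        return match.group(0)  # leave unknown sequences untouched
    return re.sub(r"%(.)", replace, template)


info = {"model": "PowerEdge R710", "servicetag": "ABC1234"}
message = expand_postmsg("%m (%s)%n100%%", info)
```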
232
233 -s, --state
234 Prefix each alert with its corresponding service state (i.e.
235 warning, critical etc.). This is useful in case of several alerts
236 from the same monitored system.
237
238 -S, --short-state
239 Same as the --state option above, except that the state is
240 abbreviated to a single letter (W=warning, C=critical etc.).
241
242 --hide-servicetag
243 This option will replace the servicetag (serial number) in the
244 output with “XXXXXXX”. Use this option to suppress or censor the
245 servicetag in the plugin output.
246
247 --linebreak string
248 check_openmanage will sometimes report more than one line, e.g. if
249 there are several alerts. If the script has a TTY, it will use
250 regular linebreaks. If not (which is the case with NRPE) it will
251 use HTML linebreaks. Sometimes it can be useful to control what the
252 plugin uses as a line separator, and this option provides that
253 control.
254
255 The argument is the exact string to be used as the line separator.
256 There are two exceptions, i.e. two keywords that translate to the
257 following:
258
259 REG: Regular linebreaks, i.e. “\n”.
260
261 HTML: HTML linebreaks, i.e. “<br/>”.
262
263 This is a rather special option that is normally not needed. The
264 default behaviour should be sufficient for most users.
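
The selection rules described for --linebreak can be summarised as follows. This is a sketch under the assumption that the rules apply exactly as stated above; resolve_linebreak is a hypothetical name.

```python
def resolve_linebreak(option=None, has_tty=False):
    """Return the line separator implied by --linebreak and the TTY state."""
    if option is None:
        # No --linebreak given: regular line breaks on a TTY, HTML otherwise.
        return "\n" if has_tty else "<br/>"
    if option == "REG":
        return "\n"
    if option == "HTML":
        return "<br/>"
    # Any other argument is used verbatim as the separator.
    return option


separator = resolve_linebreak(" | ")
```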
265
267 -H, --hostname hostname
268 The transport address of the destination SNMP device. Using this
269 option triggers SNMP mode.
270
271 -P, --protocol protocol-number
272 SNMP protocol version. This option is optional and expects either
273 of the following:
274
275 1: SNMP version 1
276
277 2, 2c: SNMP version 2c
278
279 3: SNMP version 3
280
281 Default: 2c
282
283 --port port-number
284 SNMP port of the remote (monitored) system. Defaults to the
285 well-known SNMP port 161.
286
287 -6, --ipv6
288 This option will cause the plugin to use IPv6. The default is IPv4
289 if the option is not present.
290
291 --tcp
292 This option will cause the plugin to use TCP as transport protocol.
293 The default is UDP if the option is not present.
294
295 --snmp-timeout seconds
296 This option sets the timeout for the SNMP object of the Net::SNMP
297 perl module. Legal values are between 1 and 60 seconds, and the
298 default is 5 seconds if the option is not present. Note that there
299 is one retry (with the same timeout) before the SNMP object times
300 out completely. For an unresponsive SNMP server, you'll see that
301 the plugin times out with an SNMP error after 10 seconds if the 5
302 second default is used.
303
304 This option is usually not needed. The default timeout of 5 seconds
305 is more than sufficient in most cases.
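
The 10 second figure above follows from one initial attempt plus one retry, each using the same timeout. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def worst_case_snmp_wait(timeout=5, retries=1):
    """Seconds before an unresponsive SNMP agent causes a final timeout."""
    if not 1 <= timeout <= 60:
        raise ValueError("SNMP timeout must be between 1 and 60 seconds")
    # One initial attempt plus each retry waits for the full timeout.
    return timeout * (1 + retries)


default_wait = worst_case_snmp_wait()
```

With the documented default of 5 seconds and one retry, default_wait is 10. This stays well below the plugin's own 30 second default given for --timeout.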
306
307 -U, --username securityname
308 [SNMPv3] The User-based Security Model (USM) used by SNMPv3
309 requires that a securityName be specified. This option is required
310 when using SNMP version 3, and expects a string 1 to 32 octets in
311 length.
312
313 --authpassword password, --authkey key
314 [SNMPv3] By default a securityLevel of noAuthNoPriv is assumed. If
315 the --authpassword option is specified, the securityLevel becomes
316 authNoPriv. The --authpassword option expects a string which is at
317 least 1 octet in length as argument.
318
319 Optionally, instead of the --authpassword option, the --authkey
320 option can be used so that a plain text password does not have to
321 be specified in a script. The --authkey option expects a
322 hexadecimal string produced by localizing the password with the
323 authoritativeEngineID for the specific destination device. The
324 snmpkey utility included with the Net::SNMP distribution can be
325 used to create the hexadecimal string. See snmpkey(1) for more
326 information.
327
328 --authprotocol algorithm
329 [SNMPv3] Two different hash algorithms are defined by SNMPv3 which
330 can be used by the Security Model for authentication. These
331 algorithms are HMAC-MD5-96 “MD5” (RFC 1321) and HMAC-SHA-96 “SHA-1”
332 (NIST FIPS PUB 180-1). The default algorithm used by the plugin is
333 HMAC-MD5-96. This behavior can be changed by using this option. The
334 option expects either the string md5 or sha to be passed as
335 argument to modify the hash algorithm.
336
337 --privpassword password, --privkey key
338 [SNMPv3] By specifying the options --privkey or --privpassword, the
339 securityLevel associated with the object becomes authPriv.
340 According to SNMPv3, privacy requires the use of authentication.
341 Therefore, if either of these two options is present and the
342 --authkey or --authpassword arguments are missing, the creation of
343 the object fails. The --privkey and --privpassword options expect
344 the same input as the --authkey and --authpassword options
345 respectively.
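
The way the authentication and privacy options determine the SNMPv3 securityLevel can be sketched as below. The function name is hypothetical and the code only mirrors the rules stated in the text; the actual handling is done by Net::SNMP.

```python
def security_level(has_auth=False, has_priv=False):
    """Derive the SNMPv3 securityLevel from which credentials are given.

    has_auth: --authpassword or --authkey was specified
    has_priv: --privpassword or --privkey was specified
    """
    if has_priv and not has_auth:
        # Privacy requires authentication, so this combination fails.
        raise ValueError("privacy options require an authentication option")
    if has_priv:
        return "authPriv"
    if has_auth:
        return "authNoPriv"
    return "noAuthNoPriv"


level = security_level(has_auth=True, has_priv=True)
```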
346
347 --privprotocol algorithm
348 [SNMPv3] The User-based Security Model described in RFC 3414
349 defines a single encryption protocol to be used for privacy. This
350 protocol, CBC-DES “DES” (NIST FIPS PUB 46-1), is used by default or
351 if the string des is passed to the --privprotocol option. The
352 Net::SNMP module also supports RFC 3826 which describes the use of
353 CFB128-AES-128 “AES” (NIST FIPS PUB 197) in the USM. The AES
354 encryption protocol can be selected by passing aes or aes128 to the
355 --privprotocol option.
356
357 One of the following arguments is required: des, aes, aes128,
358 3des, 3desde
359
360 --use-get_table
361 This option exists as a workaround when using check_openmanage with
362 SNMPv3 on Windows with net-snmp. Using this option will make
363 check_openmanage use the Net::SNMP function get_table() instead of
364 get_entries() while fetching values via SNMP. The latter is faster
365 and is the default.
366
368 -b, --blacklist string | file
369 Blacklist missing and/or failed components, if you do not plan to
370 fix them. The parameter is either the blacklist string, or a file
371 (that may or may not exist) containing the string. The blacklist
372 string contains component names with component IDs separated by
373 slash “/”. Blacklisted components are left unchecked.
374
375 TIP: Use the option -d or --debug to get the blacklist ID for
376 devices. The ID is listed in a separate column in the debug output.
377
378 NOTE: If blacklisting is in effect, the global health of the system
379 is not checked.
380
381 Syntax:
382
383 component1=id1[,id2,...]/component2=id1[,id2,...]/...
384
385
386 The ID part can also be “all”, in which case all components of that
387 type are blacklisted.
388
389 Example:
390
391 check_openmanage -b ps=0/fan=3,5/pdisk=1:0:0:1/ctrl_driver=all
392
393
394 In the example we blacklist power supply 0, fans 3 and 5, physical
395 disk 1:0:0:1, and warnings about out-of-date drivers for all
396 controllers. Legal component names include:
397
398 ctrl
399 Storage controller. Note that if a controller is blacklisted,
400 all components on that controller (such as physical and logical
401 drives) are blacklisted as well.
402
403 ctrl_fw
404 Suppress the special warning message about old controller
405 firmware. Use this if you cannot or will not upgrade the
406 firmware.
407
408 ctrl_driver
409 Suppress the special warning message about old controller
410 driver. Particularly useful on systems where you cannot
411 upgrade the driver.
412
413 ctrl_stdr
414 Suppress the special warning message about old Storport driver
415 on Windows.
416
417 ctrl_pdisk
418 This blacklisting keyword exists as a possible workaround for
419 physical drives with bad firmware which makes OpenManage choke.
420 It takes the controller number as argument. Use this option to
421 blacklist all physical drives on a specific controller. This
422 blacklisting keyword is only available in local mode, i.e. not
423 with SNMP.
424
425 pdisk
426 Physical disk.
427
428 pdisk_cert
429 Suppress warning message about non-certified physical disk.
430
431 pdisk_foreign
432 Suppress warning message about foreign physical disk.
433
434 vdisk
435 Logical drive (virtual disk).
436
437 bat
438 Controller cache battery.
439
440 bat_charge
441 Ignore warnings related to the controller cache battery
442 charging cycle, which happens approximately every 40-90 days on
443 Dell servers. Note that using this blacklist keyword makes
444 check_openmanage ignore non-critical cache battery errors.
445
446 conn
447 Connector (channel).
448
449 encl
450 Storage enclosure.
451
452 encl_fan
453 Enclosure fan.
454
455 encl_ps
456 Enclosure power supply.
457
458 encl_temp
459 Enclosure temperature probe.
460
461 encl_emm
462 Enclosure management module (EMM).
463
464 dimm
465 Memory module.
466
467 fan
468 Chassis fan.
469
470 ps
471 Power supply.
472
473 temp
474 Temperature sensor.
475
476 cpu
477 Processor (CPU).
478
479 volt
480 Voltage probe.
481
482 bp
483 System battery.
484
485 amp
486 Amperage probe (power consumption monitoring).
487
488 intr
489 Intrusion detection sensor.
490
491 sd
492 SD card
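
To make the blacklist syntax concrete, the sketch below splits a blacklist string into components and IDs and shows how the keyword "all" could be honoured. It is an illustration only; parse_blacklist and is_blacklisted are hypothetical names and do not reflect the plugin's internal code.

```python
def parse_blacklist(spec):
    """Parse 'component1=id1[,id2,...]/component2=...' into a dict of lists."""
    blacklist = {}
    for part in spec.split("/"):
        part = part.strip()
        if not part:
            continue
        component, sep, ids = part.partition("=")
        if not sep or not ids:
            raise ValueError(f"malformed blacklist item: {part!r}")
        # IDs stay strings because some contain colons, e.g. '1:0:0:1'.
        blacklist.setdefault(component, []).extend(ids.split(","))
    return blacklist


def is_blacklisted(blacklist, component, component_id):
    """True if the ID, or the keyword 'all', is listed for the component."""
    ids = blacklist.get(component, [])
    return "all" in ids or component_id in ids


# The example string from the text.
blacklist = parse_blacklist("ps=0/fan=3,5/pdisk=1:0:0:1/ctrl_driver=all")
```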
493
494
496 --no-storage
497 Turn off storage checking. This is an alias for “--check
498 storage=0”.
499
500 --only keyword
501 Makes check_openmanage check and/or report on a single class of
502 components or warning level. This option can be specified once and
503 expects an argument. The different arguments and the corresponding
504 behaviour are described below.
505
506 critical
507 Print only critical alerts. With this option any warning alerts
508 are suppressed.
509
510 warning
511 Print only warning alerts. With this option any critical alerts
512 are suppressed.
513
514 chassis
515 Check all chassis components and nothing else.
516
517 storage
518 Only check storage
519
520 memory
521 Only check memory modules
522
523 fans
524 Only check fans
525
526 power
527 Only check power supplies
528
529 temp
530 Only check temperatures
531
532 cpu
533 Only check processors
534
535 voltage
536 Only check voltage probes
537
538 batteries
539 Only check batteries
540
541 amperage
542 Only check power usage
543
544 intrusion
545 Only check chassis intrusion
546
547 sdcard
548 Only check SD cards
549
550 esmhealth
551 Only check ESM log overall health, i.e. fill grade
552
553 servicetag
554 Only check for sane service tag
555
556 esmlog
557 Only check the event log (ESM) content
558
559 alertlog
560 Only check the alert log content
561
562
563 --check string | file
564 This parameter allows you to adjust which components are checked
565 at all. This is a coarser approach than blacklisting, which
566 requires that you specify a component ID or index. The parameter
567 should be either a string containing the adjustments, or a file
568 containing the string. No errors are raised if the file does not
569 exist.
570
571 Example:
572
573 check_openmanage --check storage=0,intrusion=1
574
575
576 Legal values are described below, along with the default value.
577
578 storage
579 Check storage subsystem (controllers, disks etc.). Default: ON
580
581 memory
582 Check memory (DIMMs). Default: ON
583
584 fans
585 Check chassis fans. Default: ON
586
587 power
588 Check power supplies. Default: ON
589
590 temp
591 Check temperature sensors. Default: ON
592
593 cpu
594 Check CPUs. Default: ON
595
596 voltage
597 Check voltage sensors. Default: ON
598
599 batteries
600 Check system batteries. Default: ON
601
602 amperage
603 Check amperage probes. Default: ON
604
605 intrusion
606 Check chassis intrusion. Default: ON
607
608 sdcard
609 Check SD cards. Default: ON
610
611 esmhealth
612 Check the ESM log health, i.e. fill grade. Default: ON
613
614 servicetag
615 Check that the service tag (serial number) is sane and not
616 empty. Default: ON
617
618 esmlog
619 Check the ESM log content. Default: OFF
620
621 alertlog
622 Check the alert log content. Default: OFF
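
The effect of a --check string on the defaults listed above can be sketched as follows. The defaults are taken from this manual page; the function name parse_check is hypothetical, and the sketch assumes that only 0 and 1 are used as values, as in the example.

```python
# Defaults as documented above: everything ON except the two log checks.
CHECK_DEFAULTS = {
    "storage": True, "memory": True, "fans": True, "power": True,
    "temp": True, "cpu": True, "voltage": True, "batteries": True,
    "amperage": True, "intrusion": True, "sdcard": True,
    "esmhealth": True, "servicetag": True,
    "esmlog": False, "alertlog": False,
}


def parse_check(spec):
    """Apply 'keyword=0|1,...' adjustments on top of the defaults."""
    checks = dict(CHECK_DEFAULTS)
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        keyword, _, value = item.partition("=")
        if keyword not in checks or value not in ("0", "1"):
            raise ValueError(f"malformed check item: {item!r}")
        checks[keyword] = value == "1"
    return checks


checks = parse_check("storage=0,intrusion=1")
```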
623
624
626 The option -d or --debug can be specified to display all monitored
627 components.
628
630 If no errors are discovered, a value of 0 (OK) is returned. An exit
631 value of 1 (WARNING) signifies one or more non-critical errors, while 2
632 (CRITICAL) signifies one or more critical errors.
633
634 The exit value 3 (UNKNOWN) is reserved for errors within the script, or
635 errors getting values from Dell OMSA.
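
The exit values described above follow the usual Nagios plugin convention. The sketch below shows how a count of warning and critical alerts maps to an exit value; the names are hypothetical, and exit value 3 (UNKNOWN) is left out of the function because it is reserved for errors rather than alerts.

```python
STATE_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def exit_value(warnings, criticals):
    """Return the exit value for the given numbers of alerts."""
    if criticals > 0:
        return 2
    if warnings > 0:
        return 1
    return 0


# Two warnings and one critical alert: the most severe state wins.
code = exit_value(warnings=2, criticals=1)
state = STATE_NAMES[code]
```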
636
638 Storage info is not collected or checked on very old PowerEdge models
639 and/or old OMSA versions, due to limitations in OMSA. The overall
640 support on those models/versions by this plugin is not well tested.
641
643 This program is free software: you can redistribute it and/or modify it
644 under the terms of the GNU General Public License as published by the
645 Free Software Foundation, either version 3 of the License, or (at your
646 option) any later version.
647
648 This program is distributed in the hope that it will be useful, but
649 WITHOUT ANY WARRANTY; without even the implied warranty of
650 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
651 General Public License for more details.
652
653 You should have received a copy of the GNU General Public License along
654 with this program. If not, see http://www.gnu.org/licenses/.
655
657 check_openmanage.conf(5), Net::SNMP(3),
658 http://folk.uio.no/trondham/software/check_openmanage.html
659
661 Trond Hasle Amundsen <t.h.amundsen@usit.uio.no>
662
663
664
665check_openmanage 01/19/2023 CHECK_OPENMANAGE(8)