CHECK_OPENMANAGE(8)              Nagios plugin             CHECK_OPENMANAGE(8)

NAME

       check_openmanage - Nagios plugin for checking the hardware status on
       Dell servers running OpenManage

SYNOPSIS

       check_openmanage [option...]

       check_openmanage -H hostname [option...]

DESCRIPTION

       check_openmanage is a plugin for Nagios which checks the hardware
       health of Dell servers running OpenManage Server Administrator (OMSA).
       The plugin checks the health of the storage subsystem, power supplies,
       memory modules, temperature probes etc., and gives an alert if any of
       the components are faulty or operate outside normal parameters.

       check_openmanage is designed to be used either locally (using NRPE
       or similar) or remotely (using SNMP). In either mode, the output is
       (nearly) the same. Note that checking the alert log is not supported in
       SNMP mode.

GENERAL OPTIONS

       -f, --config file
           Specify a configuration file. For reference on the config file
           syntax and options, consult the check_openmanage.conf(5) manual
           page.

       -t, --timeout seconds
           The number of seconds after which the plugin will abort. Default
           timeout is 30 seconds if the option is not present.

       -p, --perfdata [argument]
           Collect performance data. Performance data collected include
           temperatures (in Celsius) and fan speeds (in rpm). On systems that
           support it, power consumption is also collected (in Watts). This
           option takes one of two arguments, both of which are optional:

           minimal
               If minimal is specified as argument, the plugin will use
               shorter names for the performance data labels, e.g. “t0”
               instead of “temp_0_system_board_ambient”. This can be used as a
               workaround in cases where the plugin output needs shortening,
               for example if the 1024 character limit of NRPE is reached.

           multiline
               If multiline is specified as argument, the plugin will output
               the performance data on multiple lines, for Nagios 3.x and
               above.

           The default behaviour should be sufficient for most users.

       --legacy-perfdata
           With version 3.7.0, performance data output changed. The new format
           is not compatible with the old format. Users who wish to postpone
           switching to the new performance data API may set this option.

       -w, --warning string | file
           Override the machine-default temperature warning thresholds. Syntax
           is:

               id1=max[/min],id2=max[/min],...

           The following example sets warning limits to max 50C for probe 0,
           and max 45C and min 10C for probe 1:

               check_openmanage -w 0=50,1=45/10

           The minimum limit can be omitted, if desired. Most often, you are
           only interested in setting the maximum thresholds.

           This parameter can be either a string with the limits, or a file
           containing the limits string. The option can be specified multiple
           times.

           NOTE: This option should only be used to narrow the field of OK
           temperatures with respect to the OMSA defaults. To expand the
           field of OK temperatures, increase the OMSA thresholds. See the
           plugin web page for more information.
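
           As an illustration of the limits syntax above, the following
           Python sketch parses a limits string into per-probe (max, min)
           pairs. It is not the plugin's own parser; the function name
           parse_thresholds is hypothetical, and the limits are assumed to
           be plain numbers.

```python
def parse_thresholds(spec):
    """Parse 'id1=max[/min],id2=max[/min],...' into {probe_id: (max, min)}.

    min is None when the optional '/min' part is omitted.
    Hypothetical helper for illustration; not part of check_openmanage.
    """
    limits = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        probe, sep, values = item.partition("=")
        if not sep or not values:
            raise ValueError(f"expected id=max[/min], got {item!r}")
        maximum, _, minimum = values.partition("/")
        limits[int(probe)] = (float(maximum), float(minimum) if minimum else None)
    return limits


if __name__ == "__main__":
    # The example string from the text: probe 0 max 50, probe 1 max 45 / min 10.
    print(parse_thresholds("0=50,1=45/10"))
```

           For the example string, probe 0 gets only a maximum limit, while
           probe 1 gets both a maximum and a minimum.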

       -c, --critical string | file
           Override the machine-default temperature critical thresholds.
           Syntax and behaviour are the same as for warning thresholds
           described above.

       -F, --fahrenheit
           Set Fahrenheit as unit for all temperatures. This option will
           override the --tempunit option, if used simultaneously.

       --tempunit unit
           Set temperature unit. Legal values are:

           F: Fahrenheit

           C: Celsius

           K: Kelvin

           R: Rankine

           Default: C
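
           The relationship between the four units can be sketched as
           follows. This assumes a reading that starts out in Celsius (the
           default unit); convert_celsius is a hypothetical helper, and any
           rounding the plugin applies is not shown.

```python
def convert_celsius(celsius, unit="C"):
    """Convert a Celsius reading to the unit letter used by --tempunit.

    Hypothetical helper for illustration; standard conversion formulas only.
    """
    unit = unit.upper()
    if unit == "C":
        return celsius
    if unit == "F":
        return celsius * 9 / 5 + 32
    if unit == "K":
        return celsius + 273.15
    if unit == "R":
        # Rankine is Kelvin scaled by 9/5.
        return (celsius + 273.15) * 9 / 5
    raise ValueError(f"unknown temperature unit: {unit!r}")


if __name__ == "__main__":
    for letter in "CFKR":
        print(letter, convert_celsius(50, letter))
```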

       --omreport path
           Specify full path to omreport, if it is not installed in any of the
           regular places. Usually this option is only needed on Windows, if
           omreport is not installed on the C: drive.

       --vdisk-critical
           Make any alerts concerning virtual disks appear as critical.

       -d, --debug
           Debug output. Will report status on everything, even if status is
           ok. Blacklisted or unchecked components are ignored (i.e. no
           output).

           NOTE: This option is intended for diagnostics and debugging
           purposes only. Do not use this option from within Nagios, i.e. in
           the Nagios config.

       -h, --help
           Display help message and exit.

       -V, --version
           Print version info and exit.

OUTPUT OPTIONS

       -o, --ok-info level
           This option lets you define how much output you want the plugin to
           give when everything is OK, i.e. the verbosity level. The default
           value is 0 (one line of output). The output levels are cumulative.

           0: Only one line

           1: BIOS and firmware info on a separate line

           2: Storage controller and enclosure info on separate lines

           3: OMSA version on separate line

           Default: 0

           The reason that OMSA version is separated from the rest is that
           finding it requires running a really slow omreport command, when
           the plugin is run locally via NRPE.

       -B, --show-blacklist
           If used together with blacklisting, this option will make the
           plugin output all blacklistings that are being used. The output
           will have the correct blacklisting syntax, and will make it easy to
           maintain control over which blacklistings are used for each
           server, as any blacklistings can be viewed from Nagios.

           When blacklisting is not used, this option has no effect.

       -i, --info
           Prefix any alerts with the service tag.

       -e, --extinfo
           Display a short summary of system information (model and service
           tag) in case of an alert.

       -I, --htmlinfo [code]
           Using this option will make the servicetag and model name into
           clickable HTML links in the output. The model name link will point
           to the official Dell documentation for that model, while the
           servicetag link will point to a website containing support info for
           that particular server.

           This option takes an optional argument, which should be a country
           or area code. If the country code is omitted the servicetag link
           will still work, but it will not be specific for your country or
           area. Example for Germany:

               check_openmanage --htmlinfo de

           This option is particularly useful together with either the
           --extinfo or --info options. Only the most common country codes
           are supported at this time:

           Europe, Middle East and Africa (EMEA)
           at: Austria             be: Belgium              cz: Czech Republic
           de: Germany             dk: Denmark              es: Spain
           fi: Finland             fr: France               gr: Greece
           it: Italy               il: Israel               me: Middle East
           no: Norway              nl: The Netherlands      pl: Poland
           pt: Portugal            ru: Russia               se: Sweden
           uk: United Kingdom      za: South Africa

           America
           br: Brazil              ca: Canada               mx: Mexico
           us: USA

           Asia / Pacific
           au: Australia           cn: China                in: India
           jp: Japan

       --postmsg string | file
           User specified post message. Useful for displaying arbitrary or
           various system information at the end of alerts. The argument is
           either a string with the message, or a file containing that string.
           You can control the format with the following interpreted
           sequences:

           %m: System model

           %s: Service tag

           %b: BIOS version

           %d: BIOS release date

           %o: Operating system name

           %r: Operating system release

           %p: Number of physical drives

           %l: Number of logical drives

           %n: Line break. Will be a regular line break if run from a TTY,
           else an HTML line break.

           %%: A literal “%”
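
           To make the interpreted sequences concrete, here is a minimal
           Python sketch of how such a template could be expanded. It is an
           illustration only, not the plugin's code: expand_postmsg and the
           sample model and service tag values are made up, and leaving
           unknown sequences untouched is an assumption.

```python
import re


def expand_postmsg(template, info, tty=False):
    """Expand %-sequences in a --postmsg style template.

    info maps sequence letters ('m', 's', 'b', ...) to their values.
    Hypothetical helper; unknown sequences are left as-is (an assumption).
    """
    values = dict(info)
    values["n"] = "\n" if tty else "<br/>"  # line break depends on TTY
    values["%"] = "%"                       # '%%' becomes a literal '%'

    def replace(match):
        key = match.group(1)
        return str(values.get(key, match.group(0)))

    # A single pass over the template, so '%%m' yields '%m' and not a model.
    return re.sub(r"%(.)", replace, template)


if __name__ == "__main__":
    sample = {"m": "PowerEdge R710", "s": "ABC1234"}  # made-up sample values
    print(expand_postmsg("%m (%s)%n100%%", sample, tty=False))
```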

       -s, --state
           Prefix each alert with its corresponding service state (i.e.
           warning, critical etc.). This is useful in case of several alerts
           from the same monitored system.

       -S, --short-state
           Same as the --state option above, except that the state is
           abbreviated to a single letter (W=warning, C=critical etc.).

       --hide-servicetag
           This option will replace the servicetag (serial number) in the
           output with “XXXXXXX”. Use this option to suppress or censor the
           servicetag in the plugin output.

       --linebreak string
           check_openmanage will sometimes report more than one line, e.g. if
           there are several alerts. If the script has a TTY, it will use
           regular linebreaks. If not (which is the case with NRPE) it will
           use HTML linebreaks. Sometimes it can be useful to control what the
           plugin uses as a line separator, and this option provides that
           control.

           The argument is the exact string to be used as the line separator.
           There are two exceptions, i.e. two keywords that translate to the
           following:

           REG: Regular linebreaks, i.e. “\n”.

           HTML: HTML linebreaks, i.e. “<br/>”.

           This is a rather special option that is normally not needed. The
           default behaviour should be sufficient for most users.
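
           The selection rules described for --linebreak can be summarised
           in a few lines of Python. This is a sketch of the documented
           behaviour, not the plugin's implementation; resolve_linebreak is
           a hypothetical name.

```python
import sys


def resolve_linebreak(option=None, tty=None):
    """Return the line separator for a given --linebreak argument.

    option is None when --linebreak was not given. Hypothetical helper.
    """
    if option == "REG":
        return "\n"
    if option == "HTML":
        return "<br/>"
    if option is not None:
        return option  # any other argument is used verbatim
    if tty is None:
        tty = sys.stdout.isatty()
    # Default: regular line breaks on a TTY, HTML line breaks otherwise.
    return "\n" if tty else "<br/>"


if __name__ == "__main__":
    print(repr(resolve_linebreak("HTML")))
    print(repr(resolve_linebreak(" | ")))
```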

SNMP OPTIONS

       -H, --hostname hostname
           The transport address of the destination SNMP device. Using this
           option triggers SNMP mode.

       -P, --protocol protocol-number
           SNMP protocol version. This option is optional and expects one
           of the following:

           1: SNMP version 1

           2, 2c: SNMP version 2c

           3: SNMP version 3

           Default: 2c

       --port port-number
           SNMP port of the remote (monitored) system. Defaults to the
           well-known SNMP port 161.

       -6, --ipv6
           This option will cause the plugin to use IPv6. The default is IPv4
           if the option is not present.

       --tcp
           This option will cause the plugin to use TCP as transport protocol.
           The default is UDP if the option is not present.

       --snmp-timeout seconds
           This option sets the timeout for the SNMP object of the Net::SNMP
           perl module. Legal values are between 1 and 60 seconds, and the
           default is 5 seconds if the option is not present. Note that there
           is one retry (with the same timeout) before the SNMP object times
           out completely. For an unresponsive SNMP server, you'll see that
           the plugin times out with an SNMP error after 10 seconds if the 5
           second default is used.

           This option is usually not needed. The default timeout of 5 seconds
           is more than sufficient in most cases.
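
           The arithmetic behind the 10 second figure is simply the timeout
           multiplied by the number of attempts (the first try plus one
           retry). A small Python sketch, using the hypothetical name
           worst_case_snmp_wait:

```python
def worst_case_snmp_wait(timeout=5, retries=1):
    """Seconds before an unresponsive SNMP agent is given up on.

    timeout mirrors --snmp-timeout (1 to 60 seconds); one retry is assumed,
    as described in the text. Hypothetical helper for illustration.
    """
    if not 1 <= timeout <= 60:
        raise ValueError("--snmp-timeout must be between 1 and 60 seconds")
    return timeout * (retries + 1)


if __name__ == "__main__":
    print(worst_case_snmp_wait())    # default 5 second timeout
    print(worst_case_snmp_wait(20))  # a larger timeout
```

           With the 5 second default this gives the 10 seconds mentioned
           above.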

       -U, --username securityname
           [SNMPv3] The User-based Security Model (USM) used by SNMPv3
           requires that a securityName be specified. This option is required
           when using SNMP version 3, and expects a string 1 to 32 octets in
           length.

       --authpassword password, --authkey key
           [SNMPv3] By default a securityLevel of noAuthNoPriv is assumed. If
           the --authpassword option is specified, the securityLevel becomes
           authNoPriv. The --authpassword option expects a string which is at
           least 1 octet in length as argument.

           Optionally, instead of the --authpassword option, the --authkey
           option can be used so that a plain text password does not have to
           be specified in a script. The --authkey option expects a
           hexadecimal string produced by localizing the password with the
           authoritativeEngineID for the specific destination device. The
           snmpkey utility included with the Net::SNMP distribution can be
           used to create the hexadecimal string. See snmpkey(1) for more
           information.

       --authprotocol algorithm
           [SNMPv3] Two different hash algorithms are defined by SNMPv3 which
           can be used by the Security Model for authentication. These
           algorithms are HMAC-MD5-96 “MD5” (RFC 1321) and HMAC-SHA-96 “SHA-1”
           (NIST FIPS PUB 180-1). The default algorithm used by the plugin is
           HMAC-MD5-96. This behavior can be changed by using this option. The
           option expects either the string md5 or sha to be passed as
           argument to modify the hash algorithm.

       --privpassword password, --privkey key
           [SNMPv3] By specifying the options --privkey or --privpassword, the
           securityLevel associated with the object becomes authPriv.
           According to SNMPv3, privacy requires the use of authentication.
           Therefore, if either of these two options is present and the
           --authkey or --authpassword arguments are missing, the creation of
           the object fails. The --privkey and --privpassword options expect
           the same input as the --authkey and --authpassword options
           respectively.
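
           The way the authentication and privacy options combine into a
           securityLevel can be sketched as below. This only restates the
           rules above in Python; snmpv3_security_level is a hypothetical
           helper, and the actual check is performed when the SNMP object
           is created.

```python
def snmpv3_security_level(auth=False, priv=False):
    """Derive the SNMPv3 securityLevel from which options were given.

    auth: --authpassword or --authkey was specified.
    priv: --privpassword or --privkey was specified.
    Hypothetical helper for illustration only.
    """
    if priv and not auth:
        # Privacy requires authentication, so this combination is rejected.
        raise ValueError("privacy options require --authpassword or --authkey")
    if priv:
        return "authPriv"
    if auth:
        return "authNoPriv"
    return "noAuthNoPriv"


if __name__ == "__main__":
    print(snmpv3_security_level())
    print(snmpv3_security_level(auth=True))
    print(snmpv3_security_level(auth=True, priv=True))
```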

       --privprotocol algorithm
           [SNMPv3] The User-based Security Model described in RFC 3414
           defines a single encryption protocol to be used for privacy. This
           protocol, CBC-DES “DES” (NIST FIPS PUB 46-1), is used by default or
           if the string des is passed to the --privprotocol option. The
           Net::SNMP module also supports RFC 3826 which describes the use of
           CFB128-AES-128 “AES” (NIST FIPS PUB 197) in the USM. The AES
           encryption protocol can be selected by passing aes or aes128 to the
           --privprotocol option.

           One of the following arguments is required: des, aes, aes128,
           3des, 3desde

       --use-get_table
           This option exists as a workaround when using check_openmanage with
           SNMPv3 on Windows with net-snmp. Using this option will make
           check_openmanage use the Net::SNMP function get_table() instead of
           get_entries() while fetching values via SNMP. The latter is faster
           and is the default.

BLACKLISTING

       -b, --blacklist string | file
           Blacklist missing and/or failed components, if you do not plan to
           fix them. The parameter is either the blacklist string, or a file
           (that may or may not exist) containing the string. The blacklist
           string contains component names with component IDs separated by
           slash “/”. Blacklisted components are left unchecked.

           TIP: Use the option -d or --debug to get the blacklist ID for
           devices. The ID is listed in a separate column in the debug output.

           NOTE: If blacklisting is in effect, the global health of the system
           is not checked.

           Syntax:

               component1=id1[,id2,...]/component2=id1[,id2,...]/...

           The ID part can also be “all”, in which case all components of
           that type are blacklisted.

           Example:

               check_openmanage -b ps=0/fan=3,5/pdisk=1:0:0:1/ctrl_driver=all

           In the example we blacklist power supply 0, fans 3 and 5, physical
           disk 1:0:0:1, and warnings about out-of-date drivers for all
           controllers. Legal component names include:

           ctrl
               Storage controller. Note that if a controller is blacklisted,
               all components on that controller (such as physical and logical
               drives) are blacklisted as well.

           ctrl_fw
               Suppress the special warning message about old controller
               firmware. Use this if you can not or will not upgrade the
               firmware.

           ctrl_driver
               Suppress the special warning message about old controller
               driver. Particularly useful on systems where you can not
               upgrade the driver.

           ctrl_stdr
               Suppress the special warning message about old Storport driver
               on Windows.

           ctrl_pdisk
               This blacklisting keyword exists as a possible workaround for
               physical drives with bad firmware which makes OpenManage choke.
               It takes the controller number as argument. Use this option to
               blacklist all physical drives on a specific controller. This
               blacklisting keyword is only available in local mode, i.e. not
               with SNMP.

           pdisk
               Physical disk.

           pdisk_cert
               Suppress warning message about non-certified physical disk.

           pdisk_foreign
               Suppress warning message about foreign physical disk.

           vdisk
               Logical drive (virtual disk).

           bat
               Controller cache battery.

           bat_charge
               Ignore warnings related to the controller cache battery
               charging cycle, which happens approximately every 40-90 days on
               Dell servers. Note that using this blacklist keyword makes
               check_openmanage ignore non-critical cache battery errors.

           conn
               Connector (channel).

           encl
               Storage enclosure.

           encl_fan
               Enclosure fan.

           encl_ps
               Enclosure power supply.

           encl_temp
               Enclosure temperature probe.

           encl_emm
               Enclosure management module (EMM).

           dimm
               Memory module.

           fan
               Chassis fan.

           ps
               Power supply.

           temp
               Temperature sensor.

           cpu
               Processor (CPU).

           volt
               Voltage probe.

           bp
               System battery.

           amp
               Amperage probe (power consumption monitoring).

           intr
               Intrusion detection sensor.

           sd
               SD card.
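
           The blacklist syntax described above can be illustrated with a
           short Python sketch that splits a blacklist string into
           components and IDs. It is not the plugin's own parser; the
           names parse_blacklist and is_blacklisted are hypothetical.

```python
def parse_blacklist(spec):
    """Parse 'comp1=id1[,id2,...]/comp2=...' into {component: [ids]}.

    IDs are kept as strings because they may contain colons (1:0:0:1)
    or be the keyword 'all'. Hypothetical helper for illustration.
    """
    blacklist = {}
    for part in spec.split("/"):
        part = part.strip()
        if not part:
            continue
        component, sep, ids = part.partition("=")
        if not sep or not ids:
            raise ValueError(f"expected component=id[,id...], got {part!r}")
        blacklist.setdefault(component, []).extend(ids.split(","))
    return blacklist


def is_blacklisted(blacklist, component, ident):
    """True if ident (or 'all') is listed for the given component."""
    ids = blacklist.get(component, [])
    return "all" in ids or ident in ids


if __name__ == "__main__":
    bl = parse_blacklist("ps=0/fan=3,5/pdisk=1:0:0:1/ctrl_driver=all")
    print(bl)
    print(is_blacklisted(bl, "fan", "5"), is_blacklisted(bl, "fan", "4"))
```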

CHECK CONTROL

       --no-storage
           Turn off storage checking. This is an alias for “--check
           storage=0”.

       --only keyword
           Makes check_openmanage check and/or report on a single class of
           components or warning level. This option can be specified once and
           expects an argument. The different arguments and the corresponding
           behaviour are described below.

           critical
               Print only critical alerts. With this option any warning alerts
               are suppressed.

           warning
               Print only warning alerts. With this option any critical alerts
               are suppressed.

           chassis
               Check all chassis components and nothing else.

           storage
               Only check storage

           memory
               Only check memory modules

           fans
               Only check fans

           power
               Only check power supplies

           temp
               Only check temperatures

           cpu
               Only check processors

           voltage
               Only check voltage probes

           batteries
               Only check batteries

           amperage
               Only check power usage

           intrusion
               Only check chassis intrusion

           sdcard
               Only check SD cards

           esmhealth
               Only check ESM log overall health, i.e. fill grade

           servicetag
               Only check for sane service tag

           esmlog
               Only check the event log (ESM) content

           alertlog
               Only check the alert log content

       --check string | file
           This parameter allows you to adjust which components should be
           checked at all. This is a rougher approach than blacklisting, which
           requires that you specify component id or index. The parameter
           should be either a string containing the adjustments, or a file
           containing the string. No errors are raised if the file does not
           exist.

           Example:

               check_openmanage --check storage=0,intrusion=1

           Legal values are described below, along with the default value.

           storage
               Check storage subsystem (controllers, disks etc.). Default: ON

           memory
               Check memory (dimms). Default: ON

           fans
               Check chassis fans. Default: ON

           power
               Check power supplies. Default: ON

           temp
               Check temperature sensors. Default: ON

           cpu
               Check CPUs. Default: ON

           voltage
               Check voltage sensors. Default: ON

           batteries
               Check system batteries. Default: ON

           amperage
               Check amperage probes. Default: ON

           intrusion
               Check chassis intrusion. Default: ON

           sdcard
               Check SD cards. Default: ON

           esmhealth
               Check the ESM log health, i.e. fill grade. Default: ON

           servicetag
               Check that the service tag (serial number) is sane and not
               empty. Default: ON

           esmlog
               Check the ESM log content. Default: OFF

           alertlog
               Check the alert log content. Default: OFF
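
           How a --check string modifies the defaults listed above can be
           sketched in Python as follows. The defaults are taken from the
           list above; apply_check_option is a hypothetical helper, and
           accepting only the values 0 and 1 is an assumption based on the
           example.

```python
# Defaults as documented above: everything ON except esmlog and alertlog.
DEFAULT_CHECKS = {
    "storage": True, "memory": True, "fans": True, "power": True,
    "temp": True, "cpu": True, "voltage": True, "batteries": True,
    "amperage": True, "intrusion": True, "sdcard": True,
    "esmhealth": True, "servicetag": True,
    "esmlog": False, "alertlog": False,
}


def apply_check_option(spec, defaults=DEFAULT_CHECKS):
    """Return a copy of the defaults adjusted by 'name=0|1,name=0|1,...'.

    Hypothetical helper; only '0' and '1' are accepted here (an assumption).
    """
    checks = dict(defaults)
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        if name not in checks:
            raise ValueError(f"unknown check keyword: {name!r}")
        if value not in ("0", "1"):
            raise ValueError(f"expected 0 or 1 for {name!r}, got {value!r}")
        checks[name] = value == "1"
    return checks


if __name__ == "__main__":
    checks = apply_check_option("storage=0,intrusion=1")
    print(sorted(name for name, enabled in checks.items() if not enabled))
```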

DIAGNOSTICS

       The option -d or --debug can be specified to display all monitored
       components.

EXIT STATUS

       If no errors are discovered, a value of 0 (OK) is returned. An exit
       value of 1 (WARNING) signifies one or more non-critical errors, while 2
       (CRITICAL) signifies one or more critical errors.

       The exit value 3 (UNKNOWN) is reserved for errors within the script, or
       errors getting values from Dell OMSA.
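
       For scripts that wrap the plugin, the exit values above can be mapped
       to state names with a small lookup table. The sketch below is an
       illustration only; state_name is a hypothetical helper, and returning
       None for values not listed on this page is a local choice.

```python
# Exit values as documented in this section.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def state_name(exit_code):
    """Return the state name for a plugin exit value, or None if undefined."""
    return STATES.get(exit_code)


if __name__ == "__main__":
    # In a wrapper, exit_code would typically be the returncode of a
    # subprocess.run([...]) call that invokes the plugin.
    for code in (0, 1, 2, 3):
        print(code, state_name(code))
```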

BUGS AND LIMITATIONS

       Storage info is not collected or checked on very old PowerEdge models
       and/or old OMSA versions, due to limitations in OMSA. The overall
       support on those models/versions by this plugin is not well tested.

LICENSE

       This program is free software: you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation, either version 3 of the License, or (at your
       option) any later version.

       This program is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
       General Public License for more details.

       You should have received a copy of the GNU General Public License along
       with this program. If not, see http://www.gnu.org/licenses/.

SEE ALSO

       check_openmanage.conf(5), Net::SNMP(3),
       http://folk.uio.no/trondham/software/check_openmanage.html

AUTHORS

       Trond Hasle Amundsen <t.h.amundsen@usit.uio.no>



check_openmanage                  01/29/2020               CHECK_OPENMANAGE(8)