1SYSUSAGE(1) User Contributed Perl Documentation SYSUSAGE(1)
2
3
4
6 SysUsage v5.7 - System Monitoring Tool
7
9 SysUsage is a tool used to continuously monitor a system and generate
10 daily/weekly/monthly/yearly graphical reports using rrdtool and sar.
11
13 SysUsage generates graphical reports on all system activity information.
14 Its periodic reports let you keep track of machine activity over the
15 machine's lifetime and are a great help for performance analysis and
16 resource management.
17
18 SysUsage can run periodically, from a 10-second cycle in daemon mode
19 to 1 minute or more using crond.
20
21 SysUsage can be run from a central server that executes the sysusage
22 Perl script remotely over ssh, so that collected data is stored in
23 this central place. rrdtool and the related Perl modules then need to
24 be installed in only one place, and sysusagegraph or sysusagejqgraph
25 need to be executed in only one place.
26
27 CPUs
28 - CPUs distribution usage (user, nice, system).
29 - CPUs global usage (total cpu used, iowait).
30 - CPUs virtualized usage (steal, guest).
31
32 Memory
33 - Memory usage (with and without cache).
34 - Swap usage (with and without cache).
35 - Amount of memory needed for current workload.
36 - POSIX shared memory.
37 - Hugepages utilisation.
38 - Active versus inactive memory.
39 - Dirty memory that needs to be written to disk.
40
41 I/O
42 - Context switches per second.
43 - Interrupts per second.
44 - Page swapping.
45 - Page I/O stats.
46 - I/O request stats.
47 - I/O block stats.
48
49 Network
50 - TCP connections per second.
51 - TCP segments per second.
52 - Number of sockets in use (Total, TCP and UDP).
53 - Number of sockets in TIME_WAIT state.
54 - Active network interface usage.
55 - Active network interface bad packets, drops and collisions.
56
57 Devices
58 - CPU time for I/O on device.
59 - Read/Write sectors on device.
60 - Disk throughput on device.
61 - I/O workload on device.
62 - Times for I/O requests issued to device.
63 - Hard drive temperature if your hardware supports it (with hddtemp).
64 - Motherboard/CPU/Remote temperature reported by sensors or sar.
65 - Fan RPM reported by sensors.
66
67 Files
68 - Number of open files.
69 - Number of files in a queue directory.
70 - Disk space used on mounted partitions.
71
72 Process
73 - Load average.
74 - Processes created per second.
75 - Number of running processes (e.g. sendmail, httpd, oracle, etc.).
76 - Number of running threads (e.g. mysqld, amarok, etc.).
77 - Number of tasks blocked waiting for I/O.
78
79 Notification
80 You can receive mail or Nagios notifications when monitored values fall
81 outside the max/min threshold values, for all types of monitoring.
82
83 Plugins
84 With SysUsage you can create your own monitoring plugins. Any script or
85 program can be embedded in SysUsage provided that it returns up to 3
86 numeric values. The graphic title and labels are defined in the
87 configuration file.
88
89 Remote call
90 SysUsage can be installed and run on a central server that stores
91 statistics data by periodically calling sysusage on remote hosts
92 using SSH. This central place is also in charge of rendering the HTML
93 pages and graphics for all hosts. This simplifies the SysUsage
94 installation on remote hosts, which then only require sysstat and
95 rsysusage.
96
98 rrdtool
99 You need to install rrdtool. Most distributions provide a dedicated
100 package for rrdtool. On CentOS/RedHat distributions, use the following
101 command:
102
103 yum install rrdtool rrdtool-perl
104
105 On Debian/Ubuntu distributions, use the command:
106
107 apt-get install rrdtool librrds-perl
108
109 The sources can be found here:
110
111 http://people.ee.ethz.ch/~oetiker/
112
113 If you compile from sources and want to use the RRDs perl module
114 embedded with it, you must use the following command to compile:
115
116 make site-perl-install
117
118 This installation is optional if sysusage is installed on a remote
119 host.
120
121 sysstat
122 You also need sar to collect statistics. Sar is part of the sysstat
123 package. For RPM-like distributions:
124
125 yum install sysstat
126
127 and for Debian-like distributions:
128
129 apt-get install sysstat
130
131 The sources can be found here:
132
133 http://freshmeat.net/projects/sysstat/
134
135 If you plan to use threshold notification you must have Net::SMTP
136 installed.
137
138 yum install perl-Net-SMTP-SSL
139
140 or
141
142 apt-get install libnet-smtp-ssl-perl
143
144 Sources can be found on CPAN (https://metacpan.org/pod/Net::SMTP)
145
146 Perl modules
147 Sysusage can be run in a central place to collect remote sysusage
148 statistics using ssh. The remote calls are processed simultaneously
149 using fork with the Proc::Queue Perl module.
150
151 If you plan to use sysusagegraph instead of sysusagejqgraph you will
152 also need the GD and GD::Graph3D Perl modules. Note that the use of GD
153 and GD::Graph is deprecated and sysusagegraph will be removed in the
154 next major release (6.0).
155
156 All these modules are available from CPAN
157 (https://metacpan.org/) and must at least be installed on the central
158 server. On remote hosts this is optional and depends on whether you
159 want to run it on each server or over ssh from a central place.
160
161 Nagios nsca client (optional)
162 If you want to send messages to Nagios you need to install
163 nsca-2.7.2.tar.gz or a more recent version. You can get it here:
164
165 http://sourceforge.net/projects/nagios/files/
166
167 hddtemp and sensors (optional)
168 If you want to monitor your hard drive temperature you must install a
169 small utility called hddtemp. You can download it from
170 http://download.savannah.gnu.org/releases/hddtemp/. Run it to see if
171 your hard drive has a temperature sensor.
172
173 You can also use sensors to monitor your cpu temperature and fan speed.
174 If your hardware supports it, run sensors-detect and load the required
175 kernel modules at boot time.
176
178 Quick install
179 Simply run the following commands:
180
181 perl Makefile.PL
182 make && make install
183
184 By default this copies the Perl programs into /usr/local/sysusage/bin
185 and HTML output is written to /var/www/htdocs/sysusage/. The
186 configuration file is /usr/local/sysusage/etc/sysusage.cfg and all RRD
187 Berkeley DB databases from rrdtool are saved under
188 /usr/local/sysusage/rrdfiles.
189
190 If you plan to run sysusage on different servers from a central place
191 you may just want to install the rsysusage Perl script on remote hosts.
192 To do so, proceed as follows:
193
194 perl Makefile.PL REMOTE=1
195 make && make install
196
197 This copies only rsysusage into /usr/local/sysusage/bin and
198 the configuration file to /usr/local/sysusage/etc/sysusage.cfg. The
199 RRD data directory is created under /usr/local/sysusage/rrdfiles,
200 but only to hold the *.cnt files that count alert attempts when a
201 threshold is exceeded.
202
203 Custom install
204 You can override all install paths with the following Makefile.PL
205 arguments. Here are the default values:
206
207 BINDIR=/usr/local/sysusage/bin
208 CONFDIR=/usr/local/sysusage/etc
209 PIDDIR=/usr/local/sysusage/etc
210 BASEDIR=/usr/local/sysusage/rrdfiles
211 PLUGINDIR=/usr/local/sysusage/plugins
212 HTMLDIR=/var/www/htdocs/sysusage
213 MANDIR=/usr/local/sysusage/doc
214 DOCDIR=/usr/local/sysusage/doc
215 REMOTE=
216
217 For example, on a RedHat system you may prefer to install SysUsage like this:
218
219 perl Makefile.PL BINDIR=/usr/bin CONFDIR=/etc PIDDIR=/var/run \
220 BASEDIR=/var/lib/sysusage HTMLDIR=/var/www/html/sysusage \
221 MANDIR=/usr/man/man1 DOCDIR=/usr/share/doc/sysusage
222
223 If you are installing sysusage on a host that will be called over ssh
224 from a central place, you may want to install only what is necessary
225 and nothing more:
226
227 perl Makefile.PL BINDIR=/usr/bin CONFDIR=/etc PIDDIR=/var/run \
228 MANDIR=/usr/man/man1 DOCDIR=/usr/share/doc/sysusage \
229 REMOTE=1
230
231 This installs only the rsysusage Perl script, the configuration
232 file and documentation, so you don't need to install extra Perl
233 modules and other graphics-related things.
234
235 Package/binary install
236 In the packaging/ directory you will find all scripts to build RPM,
237 SlackBuild and Debian packages. See the README in this directory to
238 learn how to build these packages.
239
241 SysUsage consists of two main Perl scripts, sysusage and sysusagegraph.
242 Once you have correctly installed and configured SysUsage, the best way
243 to execute them is by setting up a cron job. If you prefer JavaScript
244 graphics instead of GD::Graph images, use sysusagejqgraph, which is based
245 on the jqplot JavaScript library. This is the recommended script, as use
246 of GD::Graph through sysusagegraph is deprecated.
247
248 sysusage
249 The sysusage script is responsible for collecting system information at
250 a given interval and storing it in rrdtool database files.
251
252 As it is very fast you can set the running interval to 1 minute. This
253 is the default polling interval used in the configuration and graph
254 reports. If you change this interval you must also change it in the
255 configuration file, otherwise your graphs will be wrong. See the INTERVAL
256 configuration directive.
257
258 Here is how I use it with a default installation:
259
260 */1 * * * * /usr/local/sysusage/bin/sysusage > /dev/null 2>&1
261
262 rsysusage
263 This script does the same things as the sysusage Perl script, but instead
264 of storing collected data in files it dumps it to the standard
265 output. It is used instead of the sysusage Perl script through an
266 ssh call from a central server, where the local sysusage stores the
267 statistics retrieved from multiple servers.
268
269 /usr/local/sysusage/bin/rsysusage -r remote_hostname
270
271 Where 'remote_hostname' is the hostname given in the [REMOTE ...]
272 configuration section.
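
As a rough illustration of the remote call, the sketch below assembles the
kind of ssh command line a central server could run. It is an assumption
about how the pieces fit together, not the exact command sysusage builds
internally. The function name build_remote_cmd and the host name web01 are
hypothetical; the ssh options and script path are the defaults shown in the
sample configuration of this manual.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) an ssh command that calls
# rsysusage on a remote host.
# Arguments: ssh user, identity file, remote host name.
build_remote_cmd() {
    user=$1
    identity=$2
    host=$3
    # Defaults taken from the sample sysusage.cfg in this manual.
    ssh_bin=/usr/bin/ssh
    ssh_opts="-o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey"
    remote_bin=/usr/local/sysusage/bin/rsysusage
    echo "$ssh_bin $ssh_opts -i $identity $user@$host $remote_bin -r $host"
}

# Show the command for a hypothetical host named web01.
build_remote_cmd monitor /home/monitor/.ssh/id_rsa web01
```

Printing the command instead of running it makes it easy to check the
user, identity file and options by hand before enabling a [REMOTE ...]
section.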
273
274 sysusagegraph (deprecated) / sysusagejqgraph
275 The Perl script sysusagegraph is used to draw PNG graphs and write HTML
276 files. As it knows the polling interval given in the configuration file,
277 it can be run at any time. I usually run it every five minutes, but you
278 can run it every hour or less often with the same result.
279
280 */5 * * * * /usr/local/sysusage/bin/sysusagegraph > /dev/null 2>&1
281
282 Since release v4.0 of SysUsage there is a JQuery plotting replacement for
283 rrdGraph that only writes HTML files, with all the JavaScript code needed
284 for the client browser to draw the graphs. To enable this feature you just
285 have to use sysusagejqgraph instead.
286
287 */5 * * * * /usr/local/sysusage/bin/sysusagejqgraph > /dev/null 2>&1
288
289 There are some additional resources, JavaScript libraries and CSS files,
290 to install. The SysUsage installer will do the job for you. This removes
291 the requirement for the GD, GD::Graph and GD::Graph3D Perl modules.
292
293 sysusage.cfg
294 If you have changed the default installation path (/usr/local/sysusage)
295 you may need to give these scripts the path to the configuration file
296 as a command line argument using the -c option. To see which arguments
297 can be passed, use option -h or --help.
298
299 Note that since version 3.0 the default configuration path in these
300 scripts is set during installation, so you may no longer need to edit
301 these scripts or give the path of the configuration file as a command
302 line argument.
303
304 See the CONFIGURATION chapter for more information on how to configure
305 your system monitoring.
306
307 Daemon mode
308 Crond is good for scheduling, but not below one minute. If you want to
309 monitor your system at an interval under one minute you may want to
310 run sysusage in daemon mode. To do that, just set INTERVAL to
311 the desired value in the configuration file and the DAEMON directive to
312 1.
313
314 Debug mode
315 Sometimes things don't appear as you expected. The best way to see what's
316 going wrong is to run sysusage in debug mode. This mode lets you
317 see all values extracted from sar and other tools. Use the --debug
318 option for that; this mode prevents sysusage from storing data in the
319 rrdfiles. Command:
320
321 /usr/local/sysusage/bin/sysusage --debug
322
323 Please run this command and check the result before sending a bug
324 report.
325
326 Output
327 Once sysusage and sysusagegraph have been running for a few cycles, open
328 your favorite browser and take a look at the output directory. By default:
329
330 http://my.server.dom/sysusage/
331
332 If you use a special URI and/or port, remember to modify the URL
333 configuration directive, otherwise the web interface will not work.
334
336 During installation a default configuration file sysusage.cfg is
337 generated. The default settings are good enough to report essential
338 information of your system, but if you want to monitor some processes,
339 queue directories or some devices you must edit this file by hand.
340
341 Here is the format of the configuration file and all directives. There
342 are three sections: the first sets the general parameters of the
343 application, the second sets the parameters related to SMTP or Nagios
344 notification when a threshold is exceeded, and the last configures all
345 types of system information you may want to monitor.
346
347 Full sample of configuration file:
348
349 [GENERAL]
350 DEBUG = 0
351 DATA_DIR = /usr/local/sysusage/rrdfiles
352 PID_DIR = /usr/local/sysusage/etc
353 DEST_DIR = /var/www/htdocs/sysusage
354 SAR_BIN = /usr/bin/sar
355 UPTIME = /usr/bin/uptime
356 HOSTNAME = /bin/hostname
357 INTERVAL = 60
358 SKIP = 12:00/14:00 20:00/06:00
359 HDDTEMP_BIN = /usr/local/sbin/hddtemp
360 SENSORS_BIN = /usr/bin/sensors
361 DAEMON = 0
362 GRAPH_WIDTH = 550
363 GRAPH_HEIGHT= 200
364 FLAMING = 0
365 HIRES = 0
366 LINE_SIZE = 2
367 PROC_QSIZE = 4
368 RESRC_URL =
369 SSH_BIN = /usr/bin/ssh
370 SSH_OPTION = -o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey
371 SSH_USER =
372 SSH_IDENTITY=
373
374
375 [ALARM]
376 WARN_MODE = 0
377 ALARM_PROG = /usr/local/sysusage/bin/sysusagewarn
378 SMTP = localhost
379 FROM = root@localhost
380 TO = root@localhost
381 NAGIOS = /usr/local/nagios/bin/submit_check_result
382 UPPER_LEVEL = 1
383 LOWER_LEVEL = 2
384 URL =
385
386 [MONITOR]
387 load:threshold_max_value
388 blocked:threshold_max_value
389 cpu:threshold_max_value
390 cswch:threshold_max_value
391 intr:threshold_max_value
392 mem:threshold_max_value
393 dirty:threshold_max_value
394 swap:threshold_max_value
395 work:threshold_max_value
396 share:threshold_max_value
397 sock:threshold_max_value
398 socktw:threshold_max_value
399 io:threshold_max_value
400 file:threshold_max_value
401 page:threshold_max_value
402 pcrea:threshold_max_value
403 pswap:threshold_max_value
404 net:threshold_max_value
405 tcp:threshold_max_value
406 err:threshold_max_value
407 disk:threshold_max_value
408 proc:proc_name:threshold_max_value:threshold_min_value
409 tproc:proc_name:threshold_max_value:threshold_min_value
410 queue:path_queue_dir:threshold_max_value
411 hddtemp:device:threshold_max_value
412 dev:device(alias):threshold_max_value
413 dev:device(alias):rpm_speed:raid_type:nb_disk
414 work:threshold_max_value
415 sensors:pattern:threshold_max_value
416 temp:device:threshold_max_value
417 fan:device:threshold_max_value
418 huge:threshold_max_value
419
420 [PLUGIN testplug]
421 title:Sysage Test plugin
422 menu:Database
423 enable:no
424 program:/usr/local/sysusage/plugins/plugin-sample.pl
425 minThreshold:0
426 maxThreshold:10
427 verticalLabel:Number of seconds
428 label1:Total seconds
429 label2:
430 label3:
431 legend1:seconds
432 legend2:
433 legend3:
434 remote:yes
435
436 [REMOTE hostname1]
437 enable:no
438 ssh_user:monitor
439 ssh_identity:/home/monitor/.ssh/id_rsa
440 #ssh_options: -o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey
441 #ssh_command:
442 remote_sysusage:/usr/local/sysusage/bin/rsysusage
443
444 #[GROUP Web Servers]
445 #hostname1
446 #hostname2
447
448 Section GENERAL
449 DEBUG = 0|1
450 This option is used to set debug mode. If set to 1 then sysusage
451 and sysusagegraph just show what they do but don't create or send
452 anything.
453
454 DATA_DIR = /path/to/rrdfiles
455 This option is used to set the output directory for all RRDTOOL
456 databases.
457
458 PID_DIR = /path/to/piddir
459 sysusage and sysusagegraph use a file to store the pid of the
460 running process to prevent simultaneous run.
461
462 DEST_DIR = /path/to/html_output
463 Set the path to the directory where all HTML and graph files should
464 be created.
465
466 SAR_BIN = /path/to/sar_binary
467 sysusage uses sar, part of the sysstat distribution, to grab system
468 information, so it needs to know where sar is.
469
470 UPTIME = /path/to/uptime_binary
471 sysusagegraph reports the current uptime of the system using the
472 uptime command. This sets the path to the uptime binary.
473
474 HOSTNAME = /path/to/hostname_binary
475 All scripts of the Sysusage distribution need to know the name of the
476 host. They use the hostname command for that.
477
478 INTERVAL = pull_interval_in_second
479 All RRDTOOL input uses the given interval in seconds to store
480 monitored values. Graph construction also uses this interval to
481 render things properly. By default Sysusage uses an interval of 60
482 seconds to have a better statistic report. You can change this but
483 it's not recommended. If you change it, adjust your crontab to the
484 same value. This value must be between 10 and 300 seconds. If you want
485 to go below one minute you must use daemon mode to run
486 sysusage. See DAEMON below.
487
488 SKIP = HH:MM/HH:MM HH:MM/HH:MM ...
489 You can define here time ranges during which monitoring will not be
490 done. The value is a list of begin_time/end_time separated by spaces or
491 tabs. Say you don't want to monitor the host during the
492 night for some good reason; you can write it like this: 20:00/06:00
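
The sketch below shows one way such a range list could be evaluated. It is
an illustration, not the code sysusage runs: the function names to_minutes
and in_skip_range are hypothetical, and two behaviours are assumptions
drawn from the 20:00/06:00 example, namely that a range whose begin time is
later than its end time wraps past midnight, and that the end time itself
is excluded.

```shell
#!/bin/sh
# Convert HH:MM to minutes since midnight.
to_minutes() {
    h=${1%%:*}
    m=${1##*:}
    # Strip one leading zero so that 08 and 09 are not read as octal.
    h=${h#0}
    m=${m#0}
    echo $(( ${h:-0} * 60 + ${m:-0} ))
}

# Succeed (exit status 0) when time $1 falls inside any range of list $2.
# Assumption: begin > end means the range wraps past midnight.
in_skip_range() {
    now=$(to_minutes "$1")
    for range in $2; do
        begin=$(to_minutes "${range%%/*}")
        end=$(to_minutes "${range##*/}")
        if [ "$begin" -le "$end" ]; then
            [ "$now" -ge "$begin" ] && [ "$now" -lt "$end" ] && return 0
        else
            { [ "$now" -ge "$begin" ] || [ "$now" -lt "$end" ]; } && return 0
        fi
    done
    return 1
}

# Example with the sample SKIP value from the configuration file.
if in_skip_range 23:30 "12:00/14:00 20:00/06:00"; then
    echo "23:30 is inside a skipped range"
fi
```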
493
494 HDDTEMP_BIN = /path/to/hddtemp_binary
495 You can monitor your hard drive temperature if you have installed the
496 hddtemp utility. This sets the path to the hddtemp binary.
497
498 SENSORS_BIN = /path/to/sensors_binary
499 You can monitor your device temperature if you have installed the
500 lm_sensors utility. This sets the path to the sensors binary.
501
502 DAEMON = 0 | 1
503 You can monitor your system below the crond limit of 1 minute
504 by running sysusage in daemon mode with an INTERVAL between 10 and
505 60 seconds.
506
507 GRAPH_WIDTH and GRAPH_HEIGHT
508 These are useful if you want to resize the graph dimensions. Default is
509 a width of 550 pixels and a height of 200.
510
511 FLAMING
512 This is for fun: if you want a random flaming effect on
513 graphs with a single dataset, set this directive to 1. Disabled by
514 default. Not used with the JQuery graph renderer.
515
516 HIRES
517 Allows the addition of an hourly graph for finer granularity of the
518 data. This is disabled by default. Set it to any integer from 1
519 to 23 hours inclusive to show data from the past N hours to now. Not
520 used with the JQuery graph renderer, as the JavaScript library lets you
521 zoom into the resolution you want.
522
523 LINE_SIZE
524 By default the graph line size is 1. If you want graphs with a
525 thicker line, set it to 2. This is an rrd graph limitation (1 or 2). Not
526 used with the JQuery graph renderer.
527
528 PROC_QSIZE
529 Number of simultaneous remote sysusage call processes that should be
530 run. Default is 4 but it can be up to 15 or more depending on the
531 hardware configuration. One per core is the lowest value you should
532 consider.
533
534 RESRC_URL
535 Images, JavaScript and CSS resources are by default searched for in
536 the DEST_DIR directory, so that in the HTML view they all stay in
537 the current main directory. You may want to place those resources
538 in another directory or another location. Using this directive
539 you can set any FQDN, absolute or relative URL for these resources.
540
541 SSH_IDENTITY
542 Used to set the default identity file to connect to all remote
543 hosts without a password. If undefined, sysusage will use the ssh
544 system default value. You may want to use the default value unless
545 you know exactly what you are doing.
546
547 SSH_OPTION
548 Used to set the default ssh options, which correspond to passwordless
549 authentication:
550
551 -o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey
552
553 with a five seconds connection timeout. You may want to increase
554 this timeout on very slow network links.
555
556 Do not change this value unless you know exactly what you are
557 doing.
558
559 SSH_BIN
560 Path to the ssh command is set here at install time.
561
562 SSH_USER
563 Used to define the default ssh user that will be used to connect
564 to all remote hosts.
565
566 Section ALARM
567 WARN_MODE = 0|1
568 Used to disable/enable alert messages when a threshold is exceeded.
569
570 ALARM_PROG = /path/to/sysusagewarn
571 Used to set the path to the external program responsible for sending
572 alarm messages. You can change it to your own; just take a look at
573 the sysusagewarn usage to see what command line options are used by
574 sysusage.
575
576 SMTP = smtp.server.net
577 Name or IP address of the SMTP server to contact. Default is none
578 => No SMTP message is sent.
579
580 FROM = sender@localhost
581 Sender email address to use in the SMTP message.
582
583 TO = destination@localhost
584 Destination email address where the alarm message will be sent.
585
586 NAGIOS = /usr/local/nagios/bin/submit_check_result
587 Path to the external nsca program used to send check messages to
588 Nagios. Setting this activates Nagios check reports. See the end
589 of this file for how to configure Nagios.
590
591 UPPER_LEVEL = 1
592 Nagios check level to send when a high threshold limit is reached.
593 Default is 1 => WARNING.
594
595 LOWER_LEVEL = 2
596 Nagios check level to send when a low threshold limit is reached.
597 Default is 2 => CRITICAL.
598
599 URL = Url of Sysusage report
600 Used to overwrite the default URL of SysUsage report
601 http://host.dom/sysusage/ especially if you have a special port or
602 a different path. Example:
603 http://hostname.domain:9080/Reports/Sysusage/
604
605 SKIP = HH:MM/HH:MM HH:MM/HH:MM ...
606 You can define here time ranges during which alarm notices will not be
607 sent. The value is a list of begin_time/end_time separated by spaces or
608 tabs. Say you don't want to receive notices during the
609 night for some good reason; you can write it like this: 20:00/06:00
610
611 Section MONITOR
612 This section has two different formats. The first one is used to specify
613 most of the monitoring targets:
614
615 type:threshold_max
616
617 or
618
619 type:threshold_max(attempt)
620
621 type
622 Type of system information you may want to monitor. It can take
623 around 30 different values:
624
625 load => monitor load average
626 blocked=> monitor task blocked waiting for I/O
627 cpu => monitor each cpu(s) user/nice/system usage
628 => monitor each cpu(s) total/iowait usage
629 => monitor each cpu(s) steal/guest usage
630 cpuall => monitor global cpu(s) statistics
631 cswch => monitor context switches usage
632 intr => monitor number of interrupt per second
633 mem => monitor memory usage
634 dirty => monitor active/inactive/dirty memory
635 share => monitor POSIX shared memory usage (/dev/shm)
636 swap => monitor swap usage
637 work => monitor amount of memory needed for current workload
638 sock => monitor number of open socket
639 socktw => monitor number of socket in TIME_WAIT state
640 io => monitor I/O request and block usage
641 page => monitor I/O page usage
642 pswap => monitor I/O page swap usage
643 pcrea => monitor number of process created per second
644 proc => monitor number of running process
645 tproc => monitor number of running thread
646 file => monitor number of open file
647 queue => monitor number of files in queue
648 net => monitor I/O network bytes on all network interfaces
649 err => monitor bad packet, drop and collision on interfaces
650 tcp => monitor number of tcp connection and segment
651 disk => monitor disk space usage
652 dev => monitor percentage of CPU time per device
653 => monitor average request queue length
654 => monitor I/O sectors read and write to device
655 => monitor time spent in queue (await)
656 => monitor time spent in servicing (svctm)
657 sensors=> monitor fan and device temperature using sensors command
658 hddtemp=> monitor disk drive temperature
659 temp => monitor device temperature using sar
660 fan => monitor fan rotation using sar
661 huge => monitor size of hugepages utilisation
662
663 Note: the 'cpu' target monitoring type will report all statistics
664 per cpu. This can represent a lot of information if you have several
665 cpus. To limit statistics to total cpu only, you must replace
666 the default 'cpu' target with 'cpuall' in your configuration file.
667
668 threshold_max
669 This is the maximum threshold value. Any value equal to or greater
670 than this one will generate an SMTP and/or Nagios alert if you
671 have enabled it.
672
673 attempt
674 You can delay the call to the alarm program when a threshold is exceeded
675 by specifying the number of consecutive exceeding attempts before the
676 command is called. Just specify the number of attempts in
677 parentheses just after the min and/or max threshold value. This setting
678 is optional for both threshold values and the default is to send the
679 alarm immediately.
680
681 Special cases
682 There's a special case for 'disk' usage monitoring that allows
683 exclusion of some mount points. This is useful if you have hard
684 links or some special devices you don't need to monitor. In the format
685 below, exclusion is a semicolon (;) separated list of mount points to
686 exclude from monitoring.
687
688 disk:ThresholdMax:exclusion
689
690 Ex: disk:90:/home/mondo_image;/home/smb_mountpoint
691
692 You can use regexp in your excluded path.
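
To check an exclusion list before putting it in the configuration file, a
small helper like the one below can be used. It is only a sketch: the
function name is_excluded is hypothetical, and it assumes that each entry
is applied as an unanchored extended regexp, which may differ from the
matching sysusage performs internally.

```shell
#!/bin/sh
# Succeed when mount point $1 matches one of the ';'-separated regexps in $2.
# Assumption: each pattern is an unanchored extended regexp.
# The body runs in a subshell so the IFS and globbing changes stay local.
is_excluded() (
    mount=$1
    IFS=';'
    set -f    # no pathname expansion while splitting the pattern list
    for pattern in $2; do
        if printf '%s\n' "$mount" | grep -Eq -- "$pattern"; then
            exit 0
        fi
    done
    exit 1
)

# Example with the exclusion list shown above.
if is_excluded /home/mondo_image "/home/mondo_image;/home/smb_mountpoint"; then
    echo "/home/mondo_image would be excluded"
fi
```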
693
694 The other directive with special syntax is 'dev'. It is constructed
695 as follows:
696
697 dev:device(alias):rpm_speed:raid_type:nb_disk
698
699 where device is sda, sdb or any device name (without the /dev/), and
700 the alias in parentheses is the name displayed in
701 the user interface instead of the device name. For example:
702
703 dev:sdc(ASM disk1):
704 dev:sdb(/data):
705
706 If you plan to use the I/O workload report, SysUsage needs to know the
707 speed of the disk (RPM), the raid type (0,1,5,10) and the number of
708 disks in the raid array to calculate the IOPS. For example, if we
709 have 7200 RPM disks with 2 disks in raid 1, we write something
710 like this:
711
712 dev:sdc(ASM disk1):7200:1:2
713
714 I/O workload is the relation between TPS (transfers per second) and
715 IOPS (I/O operations per second) of a device. If the tps
716 returned by sysstat reaches the maximum theoretical IOPS, your
717 storage subsystem is saturated. Here is the equation to calculate
718 the maximum theoretical IOPS:
719
720 d = number of disks
721 dIOPS = IOPS per disk
722 %r = % of read workload
723 %w = % of write workload
724 F = raid factor
725
726 IOPS = (d * dIOPS) / (%r + (F * %w))
727
728 This gives the theoretical maximum IOPS for a RAID set (excluding
729 caching, of course). It is the product of the number of disks
730 and IOPS per disk, divided by the sum of the %read workload and the
731 product of the raid factor and %write workload. %read and
732 %write are calculated from the following equations:
733
734 %r = rd_sec / (rd_sec + wr_sec);
735 %w = wr_sec / (rd_sec + wr_sec);
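
The equations above can be tried out with a small awk-based function. This
is a sketch for checking numbers by hand, not the calculation code used by
SysUsage. The function name max_iops is hypothetical, the IOPS-per-disk
figure and RAID factor are supplied by the caller (this manual does not say
how SysUsage derives them from the RPM and raid type), and printing 0 when
there is no I/O at all is an arbitrary choice.

```shell
#!/bin/sh
# Maximum theoretical IOPS of a RAID set, following the equation above.
# Arguments: number of disks, IOPS per disk, RAID factor,
#            sectors read per second, sectors written per second.
max_iops() {
    awk -v d="$1" -v diops="$2" -v f="$3" -v rd="$4" -v wr="$5" 'BEGIN {
        total = rd + wr
        if (total == 0) { print 0; exit }   # no I/O: ratio undefined
        r = rd / total                      # %r as a fraction
        w = wr / total                      # %w as a fraction
        printf "%.0f\n", (d * diops) / (r + f * w)
    }'
}

# Illustrative figures only: 2 disks at 75 IOPS each, RAID factor 2,
# 300 sectors read and 100 sectors written per second.
max_iops 2 75 2 300 100
```

With these figures %r is 0.75 and %w is 0.25, so the result is
150 / (0.75 + 2 * 0.25) = 120.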
736
737 This IOPS monitoring is built following the excellent article by
738 Nick Anderson, Analyzing I/O performance in Linux.
739
740 The second format is used to monitor running processes, hard drive
741 temperature or queue directories. It has the following format:
742
743 type:target:threshold_max_value:threshold_min_value
744
745 or
746
747 type:target:threshold_max_value(attempt):threshold_min_value(attempt)
748
749 type
750 Type of system information you may want to monitor. It can take
751 these different values:
752
753 load, cpu, cswch, intr, mem, swap, work, share, sock, socktw, io, file,
754 page, pcrea, pswap, net, tcp, err, disk, proc, tproc, queue, hddtemp,
755 dev, work, sensors, temp, fan, huge, blocked, dirty
756
757 target
758 If type is 'proc' or 'tproc', target is the name of the
759 process to monitor. You can use a regexp as target to match exactly
760 the required process. The number of running processes is obtained with
761 the system command line:
762
763 ps -e -o command | grep -E "target" | grep -v grep | wc -l
764
765 so you can replace the word target with the regexp to match and see
766 if it returns the right number of processes.
767
768 The number of running threads is obtained with the system command line:
769
770 ps -eL -o command | grep -E "target" | grep -v grep | wc -l
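
To test a pattern before adding it to the configuration file, the two
pipelines can be wrapped in small functions. The names count_matches,
count_procs and count_threads are hypothetical helpers, not part of
SysUsage; the ps and grep invocations are the ones shown above.

```shell
#!/bin/sh
# Count lines on standard input that match extended regexp $1,
# dropping lines that contain "grep", as in the pipelines above.
count_matches() {
    grep -E -- "$1" | grep -v grep | wc -l | tr -d ' '
}

# Number of running processes whose command line matches regexp $1.
count_procs() {
    ps -e -o command | count_matches "$1"
}

# Number of running threads whose command line matches regexp $1.
count_threads() {
    ps -eL -o command | count_matches "$1"
}
```

For example, count_procs 'sendmail|httpd' prints the value that a
proc target with the same regexp would be expected to report.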
771
772 If type is 'queue', target is the full path of the directory to
773 monitor. Sysusage will find and count regular files in
774 the target directory and will not descend into sub directories.
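
A comparable count can be obtained from the shell to check a queue
threshold by hand. This is an approximation of the behaviour described
above, not the code sysusage runs; the function name count_queue_files is
hypothetical, and the -maxdepth option is assumed to be available, as it
is in GNU and BSD find.

```shell
#!/bin/sh
# Count regular files directly inside directory $1, without descending
# into sub directories.
count_queue_files() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}
```

Calling count_queue_files with the path used in a queue line of the
[MONITOR] section prints the number to compare with the threshold.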
775
776 If type is 'hddtemp', target is the hard drive device to
777 monitor, e.g. /dev/sda. You can try it with the following command
778 line:
779
780 hddtemp -n /dev/sda
781
782 This should return the current temperature detected on the hard drive.
783
784 If type is 'dev', target is the device name to monitor, e.g.
785 sda. Do not add /dev/ in front, as this will not work. You may want
786 to change the device name in the graphic menu; this is possible by
787 adding the device alias enclosed in parentheses.
788
789 For example, let's say you're monitoring some EMCpower SAN devices.
790 Using sar, the reported devices are dev120-48 and dev120-64. First
791 find which partitions are mapped to these devices (by reading
792 /proc/partitions). In this example these devices are mounted as
793 /cache1 and /cache2, so we want to see these mount points instead of
794 device numbers in the graphical menu:
795
796 dev:dev120-48(/cache1):90
797 dev:dev120-64(/cache2):97
798
799 Putting these lines in your sysusage.cfg file will do the job. The
800 threshold_max value is the max percentage of CPU used for this device
801 before an alarm is sent.
802
803 If type is 'sensors', target is the pattern to match to obtain
804 temperature or fan speed information in the sensors program output.
805 See the SENSORS chapter for more information.
806
807 If type is 'temp' or 'fan', target is the device number
808 reported by sar to obtain temperature or fan speed information. To
809 find out which device number to use, see the output of: sar -m
810 ALL 1 1
811
812 threshold_max
813 This is the maximum threshold value. Any value equal or greater will
814 generate an SMTP and/or Nagios alert if you have enabled it.
815
816 threshold_min
817 This is the minimum threshold value. Any value equal to or lower than
818 this one will generate an SMTP and/or Nagios alert if you have enabled
819 it. The min threshold should probably only be used with the 'proc' and
820 'tproc' monitoring types. If you set it to 0 you will be warned
821 if any of the monitored processes are down.
822
823 attempt
824 You can delay the call to the alarm program when a threshold is exceeded
825 by specifying the number of consecutive exceeding attempts before the
826 command is called. Just specify the number of attempts in
827 parentheses just after the min and/or max threshold value. This setting
828 is optional for both threshold values and the default is to send the
829 alarm immediately.
830
831 For example a load average monitoring defined like this
832
833 load:12(3)
834
835 will send an alarm when the system load average exceeds 12
836 on three consecutive attempts at the defined interval. If the
837 interval is 60 seconds, the alarm will be sent up to 180 seconds
838 after the first exceed.
839
840 Section PLUGIN
841 This part enables the use of custom plugins. You can call any program or
842 script provided that it returns up to 3 numbers separated by a space
843 character. See the plugins/ directory for sample scripts.
844
845 This section must include a name composed of alphanumeric characters
846 that will be used to create the target file, for example:
847
848 [PLUGIN testplug1] or [PLUGIN testplug2]
849
850 The section allows the following configuration directives. They are
851 composed of a directive name followed by ':' or '=' and a value.
852
853 enable
854 Used to temporarily disable the plugin monitoring. Default is
855 'yes' (enabled). To disable the plugin, write enable:no
856
857 program
858 Used to set the path to the program or script to execute as a
859 plugin. This program must print 1 to 3 numbers to STDOUT, separated
860 by a space character, depending on the number of values you want to
861 report. Each plugin can therefore have 1, 2 or 3 graphed data series.
862
863 title
864 Is used to set the title of the report page and the index link.
865 Default is set to "Sysusage plugin".
866
867 menu
868 Is used to store the plugin under a submenu of the plugins menu.
869 Default is to store plugin under the "Others" submenu.
870
871 maxthreshold
872 This is the maximum threshold value. Any value equal to or above
873 it will generate an SMTP and/or Nagios alert if you have enabled
874 it.
875
876 minthreshold
877 This is the minimum threshold value. Any value equal to or below
878 it will generate an SMTP and/or Nagios alert if you have enabled
879 it.
880
881 verticallabel
882 This is used to set the vertical label of the graph.
883
884 label1, label2, label3
885 Are used to show a legend for each graphed data, label1 is for the
886 first returned value, label2 for the second and label3 for the
887 last. If you just have one value returned just omit the other
888 labels.
889
890 legend1, legend2, legend3
891 These are used to set the units for the Current, Avg and Max values.
892
893 remote
894 This directive must be set to 'no' to prevent execution of the
895 plugin program by an ssh call to sysusage in a remote context. This
896 directive is enabled by default ('yes').
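
As an illustration of the contract described above, here is a minimal sketch
of a plugin script. It is not one of the shipped samples: the function name
count_entries and the default directory /var/spool/mqueue are assumptions
chosen for the example. It prints two numbers separated by a space, the
number of regular files and the number of subdirectories found directly
inside a directory.

```shell
#!/bin/sh
# Hypothetical plugin: count the regular files and subdirectories that sit
# directly inside a directory (for example a spool or queue directory).
# Output: two numbers separated by a space, as the PLUGIN section expects.

count_entries() {
    dir=$1
    if [ ! -d "$dir" ]; then
        # Missing directory: report zeros rather than printing nothing.
        echo "0 0"
        return 0
    fi
    # -mindepth/-maxdepth are find extensions available in GNU and BSD find.
    # The arithmetic expansion strips padding that some wc versions add.
    files=$(( $(find "$dir" -mindepth 1 -maxdepth 1 -type f | wc -l) ))
    dirs=$(( $(find "$dir" -mindepth 1 -maxdepth 1 -type d | wc -l) ))
    echo "$files $dirs"
}

# The directory can be given as first argument; the default is an assumption.
count_entries "${1:-/var/spool/mqueue}"
```

A section along these lines could then register it. The section name, the
script path and the labels are placeholders; only the directive names come
from the list above:

[PLUGIN spoolqueue]
enable=yes
program=/usr/local/sysusage/plugins/count_entries.sh
title=Spool directory entries
verticallabel=entries
label1=files
label2=directories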
897
898 Section REMOTE
899 This part allows you to run sysusage on remote hosts from a central server.
900 It uses ssh to execute sysusage on the destination host with the -r
901 option, which forces sysusage to write nothing to local data files
902 and to print all results to stdout. As sysusage is run from a cron job or
903 in daemon mode, it cannot authenticate interactively to the remote host, so
904 you must give an ssh user and an identity file with the corresponding
905 configuration options.
906
907 This section must include the name or the ip address of the remote host
908 that will be used to create the target data directory, for example:
909
910 [REMOTE hostname] or [REMOTE host.domain.dom] or [REMOTE 192.168.1.14]
911
912 The section allows the following configuration directives. They are
913 composed of a directive name followed by ':' or '=' and a value.
914
915 Once you have installed sysusage on all remote hosts and exchanged the
916 SSH keys between the central host and all remote hosts, most
917 of the time you just have to set the ssh_user directive to have it
918 working. Use the remote_sysusage directive if the sysusage perl script is
919 not installed in the same place as on the central server.
920
921 Section GROUP
922 This section allows you to group remote host reports under a common
923 group name in the index page. Remote hosts will be ordered by
924 their parent groups. The name of the group can be any string and the
925 values in the section must be a list of remote servers defined in the
926 REMOTE sections.
927
928 For example if you are monitoring a cluster of web and database servers
929 you can use the following declaration:
930
931 [GROUP Web Servers]
932 webhost1
933 webhost2
934 webhost3
935
936 [GROUP Database Servers]
937 dbhost1
938 dbhost2
939
940 Of course the webhostN and dbhostN hosts must be declared in REMOTE
941 sections. The directives described below belong to the REMOTE section.
942
943 enable
944 Used to enable or disable the remote host monitoring. Default is
945 'yes' (enabled). Set 'enable=no' to disable it.
946
947 ssh_user
948 Used to define the ssh user allowed to connect to the remote host. By
949 default the value of the SSH_USER configuration option in the
950 GENERAL section is used.
951
952 ssh_identity
953 Used to set the identity file used to connect to the remote host without
954 a password. By default the value of the SSH_IDENTITY configuration
955 option in the GENERAL section is used. Usually this is the
956 private key that you generated using ssh-keygen, most of the
957 time the file $HOME/.ssh/id_rsa. You should keep the default value
958 unless you know exactly what you are doing.
959
960 ssh_options
961 Used to override the default ssh options, which are:
962
963 -o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey
964
965 The default options are set in the SSH_OPTIONS configuration
966 option in the GENERAL section. You should keep the default
967 value unless you know exactly what you are doing.
968
969 ssh_command
970 You can override the complete ssh command using this directive.
971 It replaces the ssh command, the ssh options, the ssh user and
972 the host part. The sysusage remote command is not replaced.
973 You should keep the default value unless you know exactly
974 what you are doing.
975
976 remote_sysusage
977 Use it to set the path to the rsysusage command that must be used
978 on the remote host. SysUsage will automatically add the -r option
979 to trigger the remote execution mode.
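
When a remote host returns no data, it can help to run by hand a command
close to the one the central server issues. The sketch below only prints
such a command, built from the directives described above. How sysusage
assembles its command internally is not shown in this document, so treat the
layout as an approximation. The function name remote_check_cmd, the user
monitor, the key path and the remote script path are assumptions; -i and
user@host are standard ssh syntax.

```shell
#!/bin/sh
# Default ssh options quoted in the ssh_options description.
SSH_OPTIONS='-o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey'

# Print an ssh command line approximating a remote sysusage call.
# $1 = ssh user, $2 = identity file, $3 = host, $4 = remote command path
remote_check_cmd() {
    # -r is the remote execution option described in this section.
    echo "ssh $SSH_OPTIONS -i $2 $1@$3 $4 -r"
}

# Hypothetical values; replace them with your own before running the result.
remote_check_cmd monitor /home/monitor/.ssh/id_rsa 192.168.1.14 \
    /usr/local/sysusage/bin/rsysusage
```

If the printed command asks for a password when pasted into a shell on the
central server, the key exchange is not complete. The matching section for
the same placeholder values would be:

[REMOTE 192.168.1.14]
ssh_user=monitor
ssh_identity=/home/monitor/.ssh/id_rsa
remote_sysusage=/usr/local/sysusage/bin/rsysusage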
980
982 SMTP alert
983 SysUsage uses an external perl script to send SMTP alerts and/or Nagios
984 checks when a max or min threshold is reached. This program is named
985 sysusagewarn. All options in the [ALARM] section of the configuration
986 file are used by sysusage to call this program. If they are correctly set
987 you don't have to take care of the parameters given to this program. If
988 you want to use this program outside sysusage, here are the command line
989 options it understands:
990
991 Usage: sysusagewarn -t subject -c current_value -v threshold_value
992 [-s smtp_srv] [-f from] [-d to] [-b hostname_prog]
993
994 -t subject : Subject of the alarm
995 -c value : Current value monitored by sysusage
996 -v value : Threshold value used.
997 -s host : SMTP server name or ip where to send email.
998 -f from : Sender email address of the alarm message.
999 -d to : Destination address of the alarm message.
1000 -b path : Path to program hostname. Default is /bin/hostname
1001 -n path : Path to Nagios program submit_check_result. Default none.
1002 -l value : Alarm level (0=OK,1=WARNING,2=CRITICAL). Default: 1.
1003 -r service : Nagios service name to used. Must be any sysusage type of
1004 monitoring defined in the configuration file.
1005 -u url : Url to HTML sysusage output to include in email.
1006 Default: http://hostname.domain/sysusage/
1007 -h : Output this message and exit
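
As a sketch of a call made outside sysusage, the script below builds a
sysusagewarn command from the options listed above. By default it only
echoes the command (a dry run). The SYSUSAGEWARN variable, the function name
send_load_alarm, the mail addresses and the SMTP host are assumptions made
for the example; only the option letters come from the usage text.

```shell
#!/bin/sh
# Dry run by default: the command is printed instead of executed.
# Set SYSUSAGEWARN to the real script path to actually send the alert.
SYSUSAGEWARN=${SYSUSAGEWARN:-echo sysusagewarn}

# $1 = current value, $2 = threshold value
send_load_alarm() {
    # -l 2 marks the alarm as CRITICAL, as listed in the usage text.
    $SYSUSAGEWARN -t "load average" -c "$1" -v "$2" \
        -s smtp.example.com -f sysusage@example.com \
        -d admin@example.com -l 2
}

send_load_alarm 14.2 12
```

Keeping the dry run as the default makes it possible to check the generated
options before any mail is sent.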
1008
1009 NAGIOS alert
1010 SysUsage sends check messages to Nagios through an external command
1011 (submit_check_result). So you need to create the host and associate
1012 with it all the sysusage services that you want to monitor with Nagios.
1013 The service names correspond to the type of monitoring. For example, if
1014 you have enabled the alarm on memory usage the service sent is 'mem'.
1015 There are also special cases for types of monitoring with multiple
1016 instances, like network monitoring. You need to create one service per
1017 instance. For example type 'net' will have 'net_eth0' and 'net_lo', and
1018 more if you have more network interfaces. To see whether your sysusage
1019 alarm messages are understood by Nagios take a look at the nagios.log
1020 file (by default /usr/local/nagios/var/nagios.log).
1021
1022 To clear automatically an alarm reported to Nagios, SysUsage
1023 sends an OK result each time it runs if everything is correct for the
1024 monitored type.
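
To list the service names that have to exist on the Nagios side for a
multi-instance type, a small helper such as the following can be used. It
assumes the type_instance naming pattern inferred from the 'net_eth0' and
'net_lo' examples above; the function name nagios_service_names is
hypothetical.

```shell
#!/bin/sh
# Print one Nagios service name per instance of a monitoring type.
# $1 = monitoring type (for example net), remaining arguments = instances
nagios_service_names() {
    type=$1
    shift
    for instance in "$@"; do
        printf '%s_%s\n' "$type" "$instance"
    done
}

# Instances taken from the example in the text.
nagios_service_names net eth0 lo
```

On Linux the interface names can be read from /sys/class/net, for example
with: nagios_service_names net $(ls /sys/class/net)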
1025
1027 Monitoring of sensors output is based on regular expressions. To make
1028 this clear, here is an example:
1029
1030 Sensors output on my server:
1031
1032 adt7463-i2c-0-2d
1033 Adapter: SMBus I801 adapter at 1480
1034 V1.5: +3.23 V (min = +0.00 V, max = +3.32 V)
1035 VCore: +1.24 V (min = +1.10 V, max = +1.49 V)
1036 V3.3: +3.33 V (min = +2.80 V, max = +3.78 V)
1037 V5: +4.99 V (min = +4.25 V, max = +5.75 V)
1038 V12: +0.11 V (min = +0.00 V, max = +15.94 V)
1039 CPU_Fan: 0 RPM (min = 0 RPM)
1040 fan2: 10671 RPM (min = 8095 RPM)
1041 fan3: 0 RPM (min = 0 RPM)
1042 fan4: 0 RPM (min = 0 RPM)
1043 CPU Temp: +69.5 C (low = +2.0 C, high = +91.0 C)
1044 Board Temp: +32.5 C (low = +2.0 C, high = +83.0 C)
1045 Remote Temp: +31.2 C (low = +2.0 C, high = +58.0 C)
1046 cpu0_vid: +1.338 V
1047
1048 adt7463-i2c-0-2e
1049 Adapter: SMBus I801 adapter at 1480
1050 V1.5: +3.21 V (min = +0.00 V, max = +3.32 V)
1051 VCore: +1.28 V (min = +1.10 V, max = +1.49 V)
1052 V3.3: +3.32 V (min = +2.80 V, max = +3.78 V)
1053 V5: +4.95 V (min = +0.00 V, max = +6.64 V)
1054 V12: +0.11 V (min = +0.00 V, max = +15.94 V)
1055 CPU_Fan: 10843 RPM (min = 8095 RPM)
1056 fan2: 0 RPM (min = 0 RPM)
1057 fan3: 9642 RPM (min = 8095 RPM)
1058 fan4: 0 RPM (min = 0 RPM)
1059 CPU Temp: +57.2 C (low = +2.0 C, high = +91.0 C)
1060 Board Temp: +35.2 C (low = +2.0 C, high = +91.0 C)
1061 Remote Temp: +35.8 C (low = +2.0 C, high = +58.0 C)
1062 cpu0_vid: +1.338 V
1063
1064 Depending on which sensors kernel modules are loaded, you may get more
1065 or less output than this. To monitor all the temperatures reported by
1066 sensors on my server I add the following lines to sysusage.cfg:
1067
1068 sensors:CPU Temp:75
1069 sensors:Board Temp:45
1070 sensors:Remote Temp:45
1071
1072 This will create 3 graphs: one based on lines matching 'CPU Temp', another
1073 on lines matching 'Board Temp' and the last on lines matching
1074 'Remote Temp'. As I have 2 CPUs, each graph will show 2 values.
1075 You cannot report more than 3 values per graph; this is hard coded
1076 into sysusage. So if you have more CPUs you will not see more than 3
1077 values. Here an alarm will be sent when a temperature exceeds the given
1078 value (75, 45, 45).
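
Before adding a pattern to the configuration file, it can be useful to
preview which readings it would select. The helper below is an approximation
written for this purpose, not the parser used by sysusage. The function name
sensor_values is an assumption. It keeps the first number after the colon on
each matching line and stops after 3 values, mirroring the limit described
above.

```shell
#!/bin/sh
# Preview the values a 'sensors:<pattern>:...' line would pick up.
# $1 = pattern as written in sysusage.cfg; sensors output is read on stdin.
sensor_values() {
    awk -v pat="$1" '
        $0 ~ pat && n < 3 {              # at most 3 values per graph
            n++
            sub(/^[^:]*:[ \t]*/, "")     # drop the label up to the first colon
            sub(/^\+/, "")               # drop a leading plus sign
            printf "%s%s", sep, $1 + 0   # keep the numeric reading only
            sep = " "
        }
        END { print "" }
    '
}

# On a machine with lm_sensors installed:
#   sensors | sensor_values "CPU Temp"
```

Fed with the sensors listing shown earlier, the pattern 'CPU Temp' gives the
two values 69.5 and 57.2, one per chip.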
1079
1080 To monitor fan speed, I just add lines like this in the configuration
1081 file:
1082
1083 sensors:fan2:11000:8095
1084 sensors:fan3:11000:8095
1085
1086 This will create 2 graphs, for fan 2 and fan 3, with an alarm sent when
1087 the speed exceeds 11000 RPM or is lower than 8095 RPM.
1088
1089 On my personal computer (/etc/sysconfig/lm_sensors => modprobe
1090 coretemp) sensors output is:
1091
1092 coretemp-isa-0000
1093 Adapter: ISA adapter
1094 Core 0: +53.0 C (high = +78.0 C, crit = +100.0 C)
1095
1096 coretemp-isa-0001
1097 Adapter: ISA adapter
1098 Core 1: +50.0 C (high = +78.0 C, crit = +100.0 C)
1099
1100 To monitor CPU temperature, I just add this line to my sysusage.cfg:
1101
1102 sensors:Core:70
1103
1104 This will generate a graph with 2 data series, for Core 0 and Core 1.
1105
1106 Now that sysstat sar natively reports device temperature and fan
1107 speed you don't need sensors anymore. Type 'temp' can be used for
1108 temperature and type 'fan' for the fan speed. The target of these types
1109 is the device number. See sar -m TEMP or sar -m FAN to find which device
1110 number to monitor.
1111
1113 Please report any bugs, remarks and feature requests using the GitHub
1114 interface at https://github.com/darold/sysusage/ or send a mail to the
1115 author.
1116
1118 Copyright (C) 2003-2018 Gilles Darold
1119
1120 This program is free software; you can redistribute it and/or modify it
1121 under the terms of the GNU General Public License as published by the
1122 Free Software Foundation; either version 3 of the License, or any later
1123 version.
1124
1125 This program is distributed in the hope that it will be useful, but
1126 WITHOUT ANY WARRANTY; without even the implied warranty of
1127 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
1128 General Public License for more details.
1129
1130 You should have received a copy of the GNU General Public License along
1131 with this program; if not, write to the Free Software Foundation, Inc.,
1132 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
1133
1135 Gilles Darold <gilles _|_At_|_ darold _|_DoT_|_ net>
1136
1138 I want to thank all the people who helped to build this tool, with a very
1139 special thanks to Marat Dyatko for the web design contribution.
1140
1141
1142
1143perl v5.26.1 2018-08-06 SYSUSAGE(1)