SYSUSAGE(1)           User Contributed Perl Documentation          SYSUSAGE(1)

NAME

       SysUsage v5.7 - System Monitoring Tool

DESCRIPTION

       SysUsage is a tool used to continuously monitor a system and generate
       daily/weekly/monthly/yearly graphical reports using rrdtool and sar.

FEATURES

       SysUsage generates graphical reports on all system activity
       information. Its periodic reports let you keep track of the machine's
       activity over its lifetime and are a great help for performance
       analysis and resource management.

       SysUsage can be run periodically, from a 10-second cycle in daemon
       mode to 1 minute or more using crond.

       SysUsage can be run from a central server that executes the sysusage
       Perl script remotely over ssh, so that the collected data is stored in
       this central place. This also leaves just one place where rrdtool and
       the related Perl modules need to be installed, and just one place where
       sysusagegraph or sysusagejqgraph needs to be executed.

   CPUs
               - CPUs distribution usage (user, nice, system).
               - CPUs global usage (total cpu used, iowait).
               - CPUs virtualized usage (steal, guest).

   Memory
               - Memory usage (with and without cache).
               - Swap usage (with and without cache).
               - Amount of memory needed for the current workload.
               - POSIX shared memory.
               - Hugepages utilisation.
               - Active versus inactive memory.
               - Dirty memory that needs to be written to disk.

   I/O
               - Context switches per second.
               - Interrupts per second.
               - Page swapping.
               - Page I/O stats.
               - I/O request stats.
               - I/O block stats.

   Network
               - TCP connections per second.
               - TCP segments per second.
               - Number of sockets in use (Total, TCP and UDP).
               - Number of sockets in TIME_WAIT state.
               - Active network interface usage.
               - Active network interface bad packets, drops and collisions.

   Devices
               - CPU time for I/O on device.
               - Read/Write sectors on device.
               - Disk throughput on device.
               - I/O workload on device.
               - Times for I/O requests issued to device.
               - Hard drive temperature if your hardware supports it (with hddtemp).
               - Motherboard/CPU/Remote temperature reported by sensors or sar.
               - Fan RPM reported by sensors.

   Files
               - Number of open files.
               - Number of files in a queue directory.
               - Disk space used on mounted partitions.

   Process
               - Load average.
               - Processes created per second.
               - Number of running processes (e.g. sendmail, httpd, oracle).
               - Number of running threads (e.g. mysqld, amarok).
               - Number of tasks blocked waiting for I/O.

   Notification
       You can receive mail or Nagios notifications when monitored values
       fall outside the max/min threshold values, for all types of monitoring.

   Plugins
       With SysUsage you can create your own monitoring plugins. Any script or
       program can be embedded in SysUsage provided that it returns up to 3
       numeric values. The graph title and labels are defined in the
       configuration file.

   Remote call
       SysUsage can be installed and run on a central server that stores the
       statistics data by periodically calling sysusage on remote hosts using
       SSH. This central place is also in charge of rendering the HTML pages
       and graphs for all hosts. This simplifies the SysUsage installation on
       remote hosts, which only require sysstat and rsysusage.

REQUIREMENT

   rrdtool
       You need to install rrdtool. Most distributions should have a dedicated
       package for rrdtool. On CentOS/RedHat distributions, use the following
       command:

               yum install rrdtool rrdtool-perl

       On Debian/Ubuntu distributions, use:

               apt-get install rrdtool librrds-perl

       The sources can be found here:

               http://people.ee.ethz.ch/~oetiker/

       If you compile from sources and want to use the RRDs Perl module
       bundled with it, you must use the following command to compile:

               make site-perl-install

       This installation is optional if sysusage is installed on a remote
       host.

   sysstat
       You also need sar to collect statistics. Sar is part of the sysstat
       package. For RPM-based distributions:

               yum install sysstat

       and for Debian-based distributions:

               apt-get install sysstat

       The sources can be found here:

               http://freshmeat.net/projects/sysstat/

       If you plan to use threshold notification you must have Net::SMTP
       installed:

               yum install perl-Net-SMTP-SSL

       or

               apt-get install libnet-smtp-ssl-perl

       Sources can be found on CPAN (https://metacpan.org/pod/Net::SMTP).

   Perl modules
       SysUsage can be run in a central place to collect remote sysusage
       statistics using ssh. The remote calls are processed simultaneously
       using fork with the Proc::Queue Perl module.

       If you plan to use sysusagegraph instead of sysusagejqgraph you will
       also need the GD and GD::Graph3D Perl modules. Note that the use of GD
       and GD::Graph is deprecated and sysusagegraph will be removed in the
       next major release (6.0).

       All these modules are available from CPAN (https://metacpan.org/) and
       should at least be installed on the central server. On remote hosts
       this is optional and depends on whether you want to run SysUsage on
       each server or over ssh from a central place.

   Nagios nsca client (optional)
       If you want to send messages to Nagios you need to install
       nsca-2.7.2.tar.gz or a more recent version. You can get it here:

               http://sourceforge.net/projects/nagios/files/

   hddtemp and sensors (optional)
       If you want to monitor your hard drive temperature you must install a
       small utility called hddtemp. You can download it from
       http://download.savannah.gnu.org/releases/hddtemp/.  Run it to see if
       your hard drive has a temperature sensor.

       You can also use sensors to monitor your CPU temperature and fan speed.
       If your hardware supports it, run sensors-detect and load the required
       kernel modules at boot time.

INSTALLATION

   Quick install
       Simply run the following commands:

               perl Makefile.PL
               make && make install

       By default this copies the Perl programs into /usr/local/sysusage/bin
       and the HTML output is written to /var/www/htdocs/sysusage/.  The
       configuration file is /usr/local/sysusage/etc/sysusage.cfg and all RRD
       Berkeley DB databases from rrdtool are saved under
       /usr/local/sysusage/rrdfiles.

       If you plan to run sysusage on different servers from a central place
       you may just want to install the rsysusage Perl script on remote hosts.
       In that case proceed as follows:

               perl Makefile.PL REMOTE=1
               make && make install

       This copies only the rsysusage script into /usr/local/sysusage/bin and
       the configuration file to /usr/local/sysusage/etc/sysusage.cfg. The
       RRD data directory is still created under /usr/local/sysusage/rrdfiles,
       but only to hold the *.cnt files that count alert attempts when a
       threshold is exceeded.

   Custom install
       You can override all install paths with the following Makefile.PL
       arguments. Here are the default values:

               BINDIR=/usr/local/sysusage/bin
               CONFDIR=/usr/local/sysusage/etc
               PIDDIR=/usr/local/sysusage/etc
               BASEDIR=/usr/local/sysusage/rrdfiles
               PLUGINDIR=/usr/local/sysusage/plugins
               HTMLDIR=/var/www/htdocs/sysusage
               MANDIR=/usr/local/sysusage/doc
               DOCDIR=/usr/local/sysusage/doc
               REMOTE=

       For example, on a RedHat system you may prefer to install SysUsage like
       this:

               perl Makefile.PL BINDIR=/usr/bin CONFDIR=/etc PIDDIR=/var/run \
                       BASEDIR=/var/lib/sysusage HTMLDIR=/var/www/html/sysusage \
                       MANDIR=/usr/man/man1 DOCDIR=/usr/share/doc/sysusage

       If you are installing sysusage on a host that will be called over ssh
       from a central place, you may want to install only what is necessary:

               perl Makefile.PL BINDIR=/usr/bin CONFDIR=/etc PIDDIR=/var/run \
                       MANDIR=/usr/man/man1 DOCDIR=/usr/share/doc/sysusage \
                       REMOTE=1

       This installs just the rsysusage Perl script, the configuration file
       and the documentation, so you don't need to install the extra Perl
       modules and other graphics-related dependencies.

   Package/binary install
       In the packaging/ directory you will find all the scripts needed to
       build RPM, SlackBuild and Debian packages. See the README in that
       directory to learn how to build these packages.

USAGE

       SysUsage consists of two main Perl scripts, sysusage and sysusagegraph.
       Once you have correctly installed and configured SysUsage, the best way
       to execute them is by setting up a cron job. If you prefer JavaScript
       graphs instead of GD::Graph images, use sysusagejqgraph, which is based
       on the jqplot JavaScript library. This is the recommended script, as
       the use of GD::Graph through sysusagegraph is deprecated.

   sysusage
       The sysusage script is responsible for collecting system information
       at a given interval and storing it in rrdtool database files.

       As it is very fast you can set the running interval to 1 minute. This
       is the default polling interval used in the configuration and graph
       reports.  If you change this interval you must also change it in the
       configuration file, otherwise your graphs will be wrong. See the
       INTERVAL configuration directive.

       Here is how I use it with a default installation:

               */1 * * * * /usr/local/sysusage/bin/sysusage > /dev/null 2>&1

   rsysusage
       This script does the same thing as the sysusage Perl script, but
       instead of storing the collected data in files it dumps it to standard
       output.  It is used instead of the sysusage Perl script through an ssh
       call from a central server, where the local sysusage stores the
       statistics retrieved from multiple servers.

               /usr/local/sysusage/bin/rsysusage -r remote_hostname

       Where 'remote_hostname' is the hostname given in the [REMOTE ...]
       configuration section.

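       How the central server turns a [REMOTE ...] section into an ssh call
       is not spelled out in this document. The Python sketch below shows one
       plausible way to assemble such a command from the SSH_BIN, SSH_OPTION,
       ssh_user, ssh_identity and remote_sysusage settings. The function name
       build_remote_call and the argument order are assumptions made for
       illustration; only the -o and -i flags (standard ssh options) and the
       rsysusage -r option shown above are taken as given.

```python
import shlex


def build_remote_call(
    host,
    ssh_bin="/usr/bin/ssh",
    ssh_options="-o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey",
    ssh_user="",
    ssh_identity="",
    remote_sysusage="/usr/local/sysusage/bin/rsysusage",
):
    """Return an argv list for one remote rsysusage call (hypothetical helper)."""
    argv = [ssh_bin] + shlex.split(ssh_options)
    if ssh_identity:
        # -i selects the identity (private key) file.
        argv += ["-i", ssh_identity]
    target = f"{ssh_user}@{host}" if ssh_user else host
    # The remote command is passed to ssh as a single string.
    argv += [target, f"{remote_sysusage} -r {shlex.quote(host)}"]
    return argv


if __name__ == "__main__":
    print(build_remote_call("hostname1", ssh_user="monitor",
                            ssh_identity="/home/monitor/.ssh/id_rsa"))
```

       Returning a list rather than a single string means the command can be
       passed to subprocess.run() without going through a local shell.
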
   sysusagegraph (deprecated) / sysusagejqgraph
       The Perl script sysusagegraph is used to draw PNG graphs and write the
       HTML files.  As it knows the polling interval given in the
       configuration file, it can be run at any time. I used to run it every
       five minutes, but you can just as well run it every hour or less often.

               */5 * * * * /usr/local/sysusage/bin/sysusagegraph > /dev/null 2>&1

       Since release v4.0 of SysUsage there is a jQuery plotting replacement
       for rrdGraph that only writes HTML files containing all the JavaScript
       code needed to let the client browser draw the graphs. To enable this
       feature you just have to use sysusagejqgraph instead.

               */5 * * * * /usr/local/sysusage/bin/sysusagejqgraph > /dev/null 2>&1

       There are some additional JavaScript libraries and CSS files to
       install. The SysUsage installer does the job for you. This removes
       the requirement for the GD, GD::Graph and GD::Graph3D Perl modules.

   sysusage.cfg
       If you have changed the default installation path (/usr/local/sysusage)
       you may need to give these scripts the path to the configuration file
       as a command line argument using the -c option. To see which arguments
       can be passed, use the -h or --help option.

       Note that since version 3.0 the default configuration path in these
       scripts is set during installation, so you may no longer need to edit
       these scripts or give the path of the configuration file as a command
       line argument.

       See the CONFIGURATION chapter for more information on how to configure
       your system monitoring.

   Daemon mode
       Crond is good for scheduling, but not for intervals under one minute.
       If you want to monitor your system at an interval shorter than a
       minute, run sysusage in daemon mode. To do that, set INTERVAL to the
       desired number of seconds in the configuration file and set the DAEMON
       directive to 1.

   Debug mode
       Sometimes things don't appear as you expected. The best way to see
       what is going wrong is to run sysusage in debug mode. This mode lets
       you see all the values extracted from sar and other tools. Use the
       --debug option for that; in this mode sysusage does not store data in
       the rrdfiles. Command:

               /usr/local/sysusage/bin/sysusage --debug

       Please run this command and check the result before sending a bug
       report.

   Output
       Once sysusage and sysusagegraph have been running for a few cycles,
       open your favorite browser and take a look at the output directory. By
       default:

               http://my.server.dom/sysusage/

       If you use a special URI and/or port, remember to modify the URL
       configuration directive, otherwise the web interface will not work.

CONFIGURATION

       During installation a default configuration file, sysusage.cfg, is
       generated.  The default settings are good enough to report essential
       information about your system, but if you want to monitor specific
       processes, queue directories or devices you must edit this file by
       hand.

       Here is the format of the configuration file and all its directives.
       There are three main sections: the first sets the general parameters of
       the application, the second sets the parameters related to SMTP or
       Nagios notification when a threshold is exceeded, and the last
       configures all the types of system information you may want to monitor.

       Full sample configuration file:

               [GENERAL]
               DEBUG       = 0
               DATA_DIR    = /usr/local/sysusage/rrdfiles
               PID_DIR     = /usr/local/sysusage/etc
               DEST_DIR    = /var/www/htdocs/sysusage
               SAR_BIN     = /usr/bin/sar
               UPTIME      = /usr/bin/uptime
               HOSTNAME    = /bin/hostname
               INTERVAL    = 60
               SKIP        = 12:00/14:00 20:00/06:00
               HDDTEMP_BIN = /usr/local/sbin/hddtemp
               SENSORS_BIN = /usr/bin/sensors
               DAEMON      = 0
               GRAPH_WIDTH = 550
               GRAPH_HEIGHT= 200
               FLAMING     = 0
               HIRES       = 0
               LINE_SIZE   = 2
               PROC_QSIZE  = 4
               RESRC_URL   =
               SSH_BIN     = /usr/bin/ssh
               SSH_OPTION  = -o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey
               SSH_USER    =
               SSH_IDENTITY=

               [ALARM]
               WARN_MODE   = 0
               ALARM_PROG  = /usr/local/sysusage/bin/sysusagewarn
               SMTP        = localhost
               FROM        = root@localhost
               TO          = root@localhost
               NAGIOS      = /usr/local/nagios/bin/submit_check_result
               UPPER_LEVEL = 1
               LOWER_LEVEL = 2
               URL         =

               [MONITOR]
               load:threshold_max_value
               blocked:threshold_max_value
               cpu:threshold_max_value
               cswch:threshold_max_value
               intr:threshold_max_value
               mem:threshold_max_value
               dirty:threshold_max_value
               swap:threshold_max_value
               work:threshold_max_value
               share:threshold_max_value
               sock:threshold_max_value
               socktw:threshold_max_value
               io:threshold_max_value
               file:threshold_max_value
               page:threshold_max_value
               pcrea:threshold_max_value
               pswap:threshold_max_value
               net:threshold_max_value
               tcp:threshold_max_value
               err:threshold_max_value
               disk:threshold_max_value
               proc:proc_name:threshold_max_value:threshold_min_value
               tproc:proc_name:threshold_max_value:threshold_min_value
               queue:path_queue_dir:threshold_max_value
               hddtemp:device:threshold_max_value
               dev:device(alias):threshold_max_value
               dev:device(alias):rpm_speed:raid_type:nb_disk
               work:threshold_max_value
               sensors:pattern:threshold_max_value
               temp:device:threshold_max_value
               fan:device:threshold_max_value
               huge:threshold_max_value

               [PLUGIN testplug]
               title:Sysage Test plugin
               menu:Database
               enable:no
               program:/usr/local/sysusage/plugins/plugin-sample.pl
               minThreshold:0
               maxThreshold:10
               verticalLabel:Number of seconds
               label1:Total seconds
               label2:
               label3:
               legend1:seconds
               legend2:
               legend3:
               remote:yes

               [REMOTE hostname1]
               enable:no
               ssh_user:monitor
               ssh_identity:/home/monitor/.ssh/id_rsa
               #ssh_options: -o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey
               #ssh_command:
               remote_sysusage:/usr/local/sysusage/bin/rsysusage

               #[GROUP Web Servers]
               #hostname1
               #hostname2

   Section GENERAL
       DEBUG   = 0|1
           This option is used to set debug mode. If set to 1, sysusage and
           sysusagegraph just show what they do but don't create or send
           anything.

       DATA_DIR  = /path/to/rrdfiles
           This option is used to set the output directory for all RRDTOOL
           databases.

       PID_DIR   = /path/to/piddir
           sysusage and sysusagegraph use a file to store the pid of the
           running process to prevent simultaneous runs.

       DEST_DIR  = /path/to/html_output
           Set the path to the directory where all HTML and graph files should
           be created.

       SAR_BIN   = /path/to/sar_binary
           sysusage uses sar, part of the sysstat distribution, to grab system
           information, so it needs to know where it is.

       UPTIME    = /path/to/uptime_binary
           sysusagegraph reports the current uptime of the system using the
           uptime command. This sets the path to the uptime binary.

       HOSTNAME  = /path/to/hostname_binary
           All scripts of the SysUsage distribution need to know the name of
           the host.  They use the hostname command for that.

       INTERVAL  = pull_interval_in_second
           All RRDTOOL inputs use the given interval in seconds to store
           monitored values.  Graph construction also uses this interval to
           render things properly. By default SysUsage uses an interval of 60
           seconds to have a better statistic report. You can change this but
           it is not recommended. If you change it, adjust your crontab to the
           same value. This value must be between 10 and 300 seconds. If you
           want an interval under one minute you must run sysusage in daemon
           mode. See DAEMON below.

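           The two constraints on INTERVAL stated above (a 10 to 300 second
           range, and daemon mode for anything under one minute) can be
           checked before starting the collector. The following Python
           sketch is illustrative only; check_interval is a hypothetical
           helper, not part of SysUsage.

```python
def check_interval(interval, daemon):
    """Return a list of problems with an INTERVAL/DAEMON pair (empty if fine)."""
    problems = []
    if not 10 <= interval <= 300:
        problems.append("INTERVAL must be between 10 and 300 seconds")
    elif interval < 60 and not daemon:
        # crond cannot schedule below one minute, so daemon mode is required.
        problems.append("INTERVAL under 60 seconds requires DAEMON = 1")
    return problems


if __name__ == "__main__":
    for interval, daemon in [(60, False), (10, False), (10, True), (5, True)]:
        print(interval, daemon, check_interval(interval, daemon))
```
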
       SKIP      = HH:MM/HH:MM HH:MM/HH:MM ...
           Here you can define time ranges during which monitoring is not
           done. The value is a list of begin_time/end_time pairs separated by
           spaces or tabs. For example, if you don't want to monitor the host
           during the night, you can write: 20:00/06:00

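           A range such as 20:00/06:00 wraps past midnight, which is easy to
           get wrong when testing a SKIP value by hand. The Python sketch
           below parses a SKIP value and tests whether a given time falls in
           one of its ranges. The helper names are hypothetical, and treating
           the end time as exclusive is an assumption; this document does
           not say how SysUsage handles the boundaries.

```python
from datetime import time


def parse_skip(value):
    """Parse 'HH:MM/HH:MM HH:MM/HH:MM ...' into a list of (begin, end) times."""
    ranges = []
    for item in value.split():  # split() handles both spaces and tabs
        begin, end = item.split("/")
        begin_h, begin_m = map(int, begin.split(":"))
        end_h, end_m = map(int, end.split(":"))
        ranges.append((time(begin_h, begin_m), time(end_h, end_m)))
    return ranges


def is_skipped(now, ranges):
    """True if `now` falls inside any range (end exclusive: an assumption)."""
    for begin, end in ranges:
        if begin <= end:
            if begin <= now < end:
                return True
        elif now >= begin or now < end:  # range wraps past midnight
            return True
    return False


if __name__ == "__main__":
    skip = parse_skip("12:00/14:00 20:00/06:00")
    for hour in (5, 10, 13, 23):
        print(hour, is_skipped(time(hour, 0), skip))
```
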
       HDDTEMP_BIN = /path/to/hddtemp_binary
           You can monitor your hard drive temperature if you have installed
           the hddtemp utility. This sets the path to the hddtemp binary.

       SENSORS_BIN = /path/to/sensors_binary
           You can monitor your device temperature if you have installed the
           lm_sensors utility. This sets the path to the sensors binary.

       DAEMON = 0 | 1
           You can monitor your system below the crond limit of 1 minute by
           running sysusage in daemon mode with an INTERVAL between 10 and
           60 seconds.

       GRAPH_WIDTH and GRAPH_HEIGHT
           These are useful if you want to resize the graphs. The default is
           a width of 550 pixels and a height of 200.

       FLAMING
           This is for fun: if you want a random flaming effect on graphs
           with only one dataset, set this directive to 1. Disabled by
           default. Not used with the jQuery graph renderer.

       HIRES
           Allows the addition of an hourly graph for finer granularity of
           the data. This is disabled by default. Set it to any integer from
           1 to 23 (hours) to show data from the past N hours up to now. Not
           used with the jQuery graph renderer, as the JavaScript library
           lets you zoom to the resolution you want.

       LINE_SIZE
           By default the graph line size is 1; if you want graphs with a
           thicker line set it to 2. This is an rrd graph limitation (1 or
           2). Not used with the jQuery graph renderer.

       PROC_QSIZE
           Number of remote sysusage calls that should be run
           simultaneously. The default is 4 but it can be raised to 15 or
           more depending on the hardware configuration. One per core is
           the lowest value you should consider.

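           The effect of PROC_QSIZE is to cap how many remote calls are in
           flight at once. SysUsage does this with fork and the Proc::Queue
           Perl module; the Python sketch below illustrates the same idea
           with a thread pool instead, which is a substitute technique and
           not the SysUsage implementation. collect_all and the fetch
           callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def collect_all(hosts, fetch, proc_qsize=4):
    """Call fetch(host) for every host, with at most proc_qsize calls at a time."""
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=proc_qsize) as pool:
        # map() yields results in the same order as the input hosts.
        results = pool.map(fetch, hosts)
        return dict(zip(hosts, results))


if __name__ == "__main__":
    # A stand-in fetch function; a real one would run the remote call.
    print(collect_all(["web1", "web2", "db1"], lambda host: len(host), proc_qsize=2))
```
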
       RESRC_URL
           By default, images, JavaScript and CSS resources are looked up in
           the DEST_DIR directory, so that in the HTML view they all stay in
           the current main directory.  You may want to place those resources
           in another directory or another location.  Using this directive
           you can set any FQDN, absolute or relative URL for these resources.

       SSH_IDENTITY
           Used to set the default identity file for connecting to all remote
           hosts without a password. If undefined, sysusage uses the ssh
           system default value. You should keep the default value unless
           you know exactly what you are doing.

       SSH_OPTION
           Used to set the default ssh options. The default corresponds to
           passwordless authentication:

                   -o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey

           with a five-second connection timeout. You may want to increase
           this timeout on very slow network links.

           Do not change this value unless you know exactly what you are
           doing.

       SSH_BIN
           The path to the ssh command is set here at install time.

       SSH_USER
           Used to define the default ssh user for connecting to all remote
           hosts.

   Section ALARM
       WARN_MODE   = 0|1
           Used to disable/enable alert messages when a threshold is
           exceeded.

       ALARM_PROG  = /path/to/sysusagewarn
           Used to set the path to the external program responsible for
           sending alarm messages.  You can replace it with your own; just
           take a look at the sysusagewarn usage to see which command line
           options are used by sysusage.

       SMTP        = smtp.server.net
           Name or IP address of the SMTP server to contact. Default is none
           => no SMTP message is sent.

       FROM        = sender@localhost
           Sender email address to use in the SMTP message.

       TO          = destination@localhost
           Destination email address where the alarm message will be sent.

       NAGIOS      = /usr/local/nagios/bin/submit_check_result
           Path to the external nsca program used to send check messages to
           Nagios.  Setting this activates Nagios check reports. See the end
           of this file for how to configure Nagios.

       UPPER_LEVEL = 1
           Nagios check level to send when a high threshold limit is reached.
           Default is 1 => WARNING.

       LOWER_LEVEL = 2
           Nagios check level to send when a low threshold limit is reached.
           Default is 2 => CRITICAL.

       URL = Url of Sysusage report
           Used to override the default URL of the SysUsage report,
           http://host.dom/sysusage/, especially if you have a special port or
           a different path. Example:
           http://hostname.domain:9080/Reports/Sysusage/

       SKIP = HH:MM/HH:MM HH:MM/HH:MM ...
           Here you can define time ranges during which alarm notices are not
           sent.  The value is a list of begin_time/end_time pairs separated
           by spaces or tabs.  For example, if you don't want to receive
           notices during the night, you can write: 20:00/06:00

   Section MONITOR
       This section has two different formats. The first one is used to
       specify most of the monitoring targets:

               type:threshold_max

       or

               type:threshold_max(attempt)

       type
           Type of system information you want to monitor. It can take
           around 30 different values:

                   load   => monitor load average
                   blocked=> monitor tasks blocked waiting for I/O
                   cpu    => monitor each cpu(s) user/nice/system usage
                          => monitor each cpu(s) total/iowait usage
                          => monitor each cpu(s) steal/guest usage
                   cpuall => monitor global cpu(s) statistics
                   cswch  => monitor context switches usage
                   intr   => monitor number of interrupts per second
                   mem    => monitor memory usage
                   dirty  => monitor active/inactive/dirty memory
                   share  => monitor POSIX shared memory usage (/dev/shm)
                   swap   => monitor swap usage
                   work   => monitor amount of memory needed for current workload
                   sock   => monitor number of open sockets
                   socktw => monitor number of sockets in TIME_WAIT state
                   io     => monitor I/O request and block usage
                   page   => monitor I/O page usage
                   pswap  => monitor I/O page swap usage
                   pcrea  => monitor number of processes created per second
                   proc   => monitor number of running processes
                   tproc  => monitor number of running threads
                   file   => monitor number of open files
                   queue  => monitor number of files in queue
                   net    => monitor I/O network bytes on all network interfaces
                   err    => monitor bad packets, drops and collisions on interfaces
                   tcp    => monitor number of tcp connections and segments
                   disk   => monitor disk space usage
                   dev    => monitor percentage of CPU time per device
                          => monitor average request queue length
                          => monitor I/O sectors read and written to device
                          => monitor time spent in queue (await)
                          => monitor time spent in servicing (svctm)
                   sensors=> monitor fan and device temperature using sensors command
                   hddtemp=> monitor disk drive temperature
                   temp   => monitor device temperature using sar
                   fan    => monitor fan rotation using sar
                   huge   => monitor size of hugepages utilisation

           Note: the 'cpu' monitoring target reports all statistics per CPU.
           This can represent a lot of information if you have several CPUs.
           To limit statistics to the total CPU usage only, replace the
           default 'cpu' target with 'cpuall' in your configuration file.

       threshold_max
           This is the maximum threshold value. Any value greater than or
           equal to this one will generate an SMTP and/or Nagios alert if you
           have enabled it.

       attempt
           You can delay the call to the alarm program when a threshold is
           exceeded by specifying the number of consecutive exceeded attempts
           required before the command is called.  Just specify the number of
           attempts between parentheses right after the min and/or max
           threshold value. This setting is optional for both threshold
           values and the default is to send the alarm immediately.

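           As an illustration of the threshold_max(attempt) syntax, the
           Python sketch below parses one threshold field and decides
           whether an alarm is due after a series of samples. The helper
           names are hypothetical, and keeping the samples in a list is a
           simplification: as noted in the installation chapter, SysUsage
           keeps its attempt counts in *.cnt files.

```python
import re

# A number, optionally followed by an attempt count in parentheses.
_THRESHOLD = re.compile(r"^(?P<value>\d+(?:\.\d+)?)(?:\((?P<attempt>\d+)\))?$")


def parse_threshold(field):
    """Parse 'threshold' or 'threshold(attempt)' into (value, attempt)."""
    match = _THRESHOLD.match(field.strip())
    if match is None:
        raise ValueError(f"bad threshold field: {field!r}")
    # No attempt given means the alarm is sent immediately (attempt = 1).
    return float(match["value"]), int(match["attempt"] or 1)


def should_alarm(samples, threshold_max, attempt):
    """True once the last `attempt` samples all reach threshold_max."""
    recent = samples[-attempt:]
    return len(recent) == attempt and all(v >= threshold_max for v in recent)


if __name__ == "__main__":
    limit, attempt = parse_threshold("90(3)")
    print(should_alarm([80, 91, 92, 95], limit, attempt))
```
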
       Special cases
           There is a special case for 'disk' usage monitoring that allows
           the exclusion of some mount points. This is useful if you have
           hard links or special devices you don't need to monitor:

                   disk:ThresholdMax:exclusion

           where exclusion is a semicolon (;) separated list of mount points
           to exclude from monitoring.

           Ex: disk:90:/home/mondo_image;/home/smb_mountpoint

           You can use regular expressions in your excluded paths.

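           The Python sketch below shows how such a 'disk' directive could
           be split and applied to a list of mount points. It is an
           illustration under assumptions: the helper names are
           hypothetical, the threshold is taken without an attempt count,
           and patterns are matched anywhere in the path (unanchored),
           which may differ from how SysUsage applies them.

```python
import re


def parse_disk_directive(line):
    """Split 'disk:ThresholdMax:exclusion' into (threshold, [patterns])."""
    kind, threshold, *rest = line.strip().split(":", 2)
    if kind != "disk":
        raise ValueError(f"not a disk directive: {line!r}")
    patterns = [p for p in rest[0].split(";") if p] if rest else []
    return float(threshold), patterns


def monitored_mounts(mounts, patterns):
    """Keep the mount points that match none of the exclusion patterns."""
    compiled = [re.compile(p) for p in patterns]
    return [m for m in mounts if not any(c.search(m) for c in compiled)]


if __name__ == "__main__":
    limit, excluded = parse_disk_directive(
        "disk:90:/home/mondo_image;/home/smb_mountpoint")
    print(limit, monitored_mounts(
        ["/", "/home", "/home/mondo_image", "/home/smb_mountpoint"], excluded))
```
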
694           The other directive with a special syntax is 'dev'. It is
695           constructed as follows:
696
697                   dev:device(alias):rpm_speed:raid_type:nb_disk
698
699           where device is sda, sdb or any device name (without the /dev/)
700           and the alias between parentheses is the name displayed in the
701           user interface instead of the device name. For example:
702
703                   dev:sdc(ASM disk1):
704                   dev:sdb(/data):
705
706           If you plan to use the I/O workload report, SysUsage needs to know
707           the speed of the disk (RPM), the RAID type (0, 1, 5, 10) and the
708           number of disks in the RAID array to calculate the IOPS. For
709           example, with 7200 RPM disks and 2 disks in RAID 1, you would write
710           something like this:
711
712                   dev:sdc(ASM disk1):7200:1:2
713
714           I/O workload is the relation between TPS (transfers per second) and
715           IOPS (I/O operations per second) of a device. If the tps
716           returned by sysstat reaches the maximum theoretical IOPS, your
717           storage subsystem is saturated.  Here is the equation to calculate
718           the maximum theoretical IOPS:
719
720                   d = number of disks
721                   dIOPS = IOPS per disk
722                   %r = % of read workload
723                   %w = % of write workload
724                   F = raid factor
725
726                   IOPS = (d * dIOPS) / (%r + (F * %w))
727
728           This gives the theoretical maximum IOPS for a RAID set (excluding
729           caching, of course): the product of the number of disks and the
730           IOPS per disk, divided by the sum of the %read workload and the
731           product of the RAID factor and the %write workload. %read and
732           %write are calculated from the following equations:
733
734                   %r = rd_sec / (rd_sec + wr_sec);
735                   %w = wr_sec / (rd_sec + wr_sec);
736
737           This IOPS monitoring is built following the excellent article by
738           Nick Anderson, "Analyzing I/O performance in Linux".
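              The equations above can be checked with a small worked example.
              This is a sketch only: the RAID factors are the commonly cited
              write penalties and the 75 IOPS per disk figure is an
              illustrative assumption, neither is taken from the SysUsage
              source.

```python
# Commonly cited RAID write penalties (assumed, not read from SysUsage).
RAID_FACTOR = {0: 1, 1: 2, 5: 4, 10: 2}

def max_theoretical_iops(nb_disk, disk_iops, raid_type, rd_sec, wr_sec):
    """IOPS = (d * dIOPS) / (%r + (F * %w)) with %r and %w as ratios."""
    total = rd_sec + wr_sec
    if total == 0:
        # No activity at all: treat the workload as pure reads.
        read_ratio, write_ratio = 1.0, 0.0
    else:
        read_ratio = rd_sec / total
        write_ratio = wr_sec / total
    factor = RAID_FACTOR[raid_type]
    return (nb_disk * disk_iops) / (read_ratio + factor * write_ratio)

# 2 disks in RAID 1, 75 IOPS per disk (assumed), 75% reads and 25% writes.
iops = max_theoretical_iops(nb_disk=2, disk_iops=75, raid_type=1,
                            rd_sec=300, wr_sec=100)
workload = 90 / iops  # a tps of 90 reported by sysstat
print(iops, workload)  # 120.0 0.75
```

              A workload close to 1 means the tps is approaching the
              theoretical maximum, which is the saturation case described
              above.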
739
740       The second format is used to monitor running processes, hard drive
741       temperature or a queue directory. It has the following format:
742
743               type:target:threshold_max_value:threshold_min_value
744
745       or
746
747               type:target:threshold_max_value(attempt):threshold_min_value(attempt)
748
749       type
750           Type of system information you may want to monitor. It can take
751           these different values:
752
753                   load, cpu, cswch, intr, mem, swap, work, share, sock, socktw, io, file,
754                   page, pcrea, pswap, net, tcp, err, disk, proc, tproc, queue, hddtemp,
755                   dev, work, sensors, temp, fan, huge, blocked, dirty
756
757       target
758           If type is 'proc' or 'tproc', target represents the name of the
759           process to monitor. You can use a regexp as target to match exactly
760           the required process.  The number of running processes is obtained
761           with the system command line:
762
763                   ps -e -o command | grep -E "target" | grep -v grep | wc -l
764
765           so you can replace the word target with the regexp to match and see
766           if it returns the right number of processes.
767
768           The number of running threads is obtained with the command line:
769
770                   ps -eL -o command | grep -E "target" | grep -v grep | wc -l
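              To try a target regexp without touching a live system, the
              counting done by the pipelines above can be mimicked on sample
              data. This is only a sketch: count_matching is a hypothetical
              helper, and Python's re syntax is close to but not identical to
              grep -E.

```python
import re

def count_matching(ps_lines, target):
    """Count lines matching target, skipping lines that mention grep."""
    pattern = re.compile(target)
    return sum(1 for line in ps_lines
               if pattern.search(line) and "grep" not in line)

# Sample lines as 'ps -e -o command' might print them (made up).
sample = [
    "/usr/sbin/httpd -k start",
    "/usr/sbin/httpd -k start",
    "/usr/sbin/sshd -D",
    "grep -E httpd",
]
print(count_matching(sample, r"httpd"))  # 2
```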
771
772           If type is 'queue' this represents the full path of the directory
773           to monitor.  Sysusage will find and count all regular files in
774           the target directory and will not descend into subdirectories.
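              The counting rule (regular files only, no recursion) can be
              sketched as follows. The helper name is hypothetical and
              skipping symbolic links is an assumption about how "regular
              file" is interpreted.

```python
import os
import tempfile

def count_queue_files(directory):
    """Count regular files directly inside directory, without recursion."""
    with os.scandir(directory) as entries:
        return sum(1 for e in entries if e.is_file(follow_symlinks=False))

# Build a throwaway queue: two files at the top level, one in a subdirectory.
with tempfile.TemporaryDirectory() as queue:
    for name in ("msg1", "msg2"):
        open(os.path.join(queue, name), "w").close()
    os.mkdir(os.path.join(queue, "deferred"))
    open(os.path.join(queue, "deferred", "msg3"), "w").close()
    queue_size = count_queue_files(queue)

print(queue_size)  # 2, the file in the subdirectory is not counted
```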
775
776           If type is 'hddtemp' the target represents the hard drive device to
777           monitor, ex: /dev/sda. You can try it with the following command
778           line:
779
780                   hddtemp -n /dev/sda
781
782           This should return the current temperature of the hard drive.
783
784           If type is 'dev' this represents the device name to monitor. Ex:
785           sda.  Do not prefix it with /dev/, this will not work. You may want
786           to change the device name in the graphic menu; this is possible by
787           adding the device alias enclosed in parentheses.
788
789           For example, let's say you're monitoring some EMCpower SAN devices.
790           Using sar, the reported devices are dev120-48 and dev120-64. First
791           find which partitions are mapped to these devices (by reading
792           /proc/partitions). In this example these devices are mounted as
793           /cache1 and /cache2, so we want to see these mount points instead of
794           the device numbers in the graphical menu:
795
796                   dev:dev120-48(/cache1):90
797                   dev:dev120-64(/cache2):97
798
799           in your sysusage.cfg file will do the job. The threshold_max value
800           is the max percentage of CPU used for this device before sending an
801           alarm.
802
803           If type is 'sensors' this represents the pattern to match to obtain
804           temperature or fan speed information in the sensors program output.
805           See chapter SENSORS for more information.
806
807           If type is 'temp' or 'fan' this represents the device number
808           reported by sar to obtain temperature or fan speed information. To
809           find out which device number to use, see the output of the command:
810           sar -m ALL 1 1
811
812       threshold_max
813           This is the maximum threshold value. Any value equal or higher will
814           generate an SMTP and/or Nagios alert if you have enabled it.
815
816       threshold_min
817           This is the minimum threshold value. Any value equal to or lower
818           than this one will generate an SMTP and/or Nagios alert if you have
819           enabled it. The min threshold should normally only be used with the
820           'proc' and 'tproc' monitoring types. If you set it to 0 you will be
821           warned if any of the monitored processes is down.
822
823       attempt
824           You can delay the call to the alarm program when a threshold is
825           exceeded by specifying the number of consecutive attempts that
826           must exceed it before the command is called.  Specify the number of
827           attempts in parentheses just after the min and/or max threshold
828           value. This setting is optional for both threshold values and the
829           default is to send the alarm immediately.
830
831           For example a load average monitoring defined like this
832
833                   load:12(3)
834
835           will send an alarm when the system load average has exceeded 12
836           for three consecutive attempts at the defined interval. If the
837           interval is 60 seconds, the alarm will be sent up to 180 seconds
838           after the threshold is first exceeded.
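              The attempt mechanism can be pictured with the sketch below. It
              assumes that the counter restarts as soon as one sample falls
              back under the threshold and that "equal or higher" triggers;
              the function names are hypothetical and the real implementation
              may differ.

```python
import re

def parse_threshold(spec):
    """Parse '12' or '12(3)' into (value, attempts); attempts defaults to 1."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(?:\((\d+)\))?\s*", spec)
    if m is None:
        raise ValueError("bad threshold: %r" % spec)
    return float(m.group(1)), int(m.group(2) or 1)

def first_alarm_index(samples, spec):
    """Return the index of the sample that triggers the alarm, or None."""
    limit, attempts = parse_threshold(spec)
    consecutive = 0
    for i, value in enumerate(samples):
        # Count consecutive samples at or above the max threshold.
        consecutive = consecutive + 1 if value >= limit else 0
        if consecutive >= attempts:
            return i
    return None

loads = [13, 14, 9, 12, 15, 13]
print(first_alarm_index(loads, "12(3)"))  # 5: third consecutive exceed
print(first_alarm_index(loads, "12"))     # 0: alarm sent immediately
```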
839
840   Section PLUGIN
841       This part enables the use of custom plugins. You can call any program
842       or script provided that it returns up to 3 numbers separated by a space
843       character. See the plugins/ directory for sample scripts.
844
845       This section must include a name composed of alphanumeric characters
846       that will be used to create the target file, for example:
847
848               [PLUGIN testplug1] or [PLUGIN testplug2]
849
850       The section allows the following configuration directives. They are
851       composed of named directives followed by ':' or '=' and a value.
852
853       enable
854           Is used to temporarily disable the plugin monitoring. Default is
855           'yes' (enabled).  To disable it, write enable:no
856
857       program
858           Is used to set the path to the program or script to execute as
859           plugin. This program must print to STDOUT 1 to 3 numbers separated
860           by a space character, depending on the number of reports you
861           want. So each plugin can have 1, 2 or 3 graphed data series.
862
863       title
864           Is used to set the title of the report page and the index link.
865           Default is set to "Sysusage plugin".
866
867       menu
868           Is used to store the plugin under a submenu of the plugins menu.
869           Default is to store plugin under the "Others" submenu.
870
871       maxthreshold
872           This is the maximum threshold value. Any value equal to or higher
873           than this one will generate an SMTP and/or Nagios alert if you have
874           enabled it.
875
876       minthreshold
877           This is the minimum threshold value. Any value equal to or lower
878           than this one will generate an SMTP and/or Nagios alert if you have
879           enabled it.
880
881       verticallabel
882           This is used to set the vertical label of the graph.
883
884       label1, label2, label3
885           Are used to show a legend for each graphed data, label1 is for the
886           first returned value, label2 for the second and label3 for the
887           last. If you just have one value returned just omit the other
888           labels.
889
890       legend1, legend2, legend3
891           These are used to set the units for Current, Avg and Max values.
892
893       remote
894           This directive must be set to 'no' to prevent execution of the
895           plugin program by an ssh call to sysusage in a remote context. This
896           directive is activated by default ('yes').
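          As an illustration of the output contract of a plugin program (1 to
          3 numbers separated by a space on STDOUT), here is a minimal sketch
          of a plugin that reports the three load averages. The script and its
          helper name are hypothetical; it is not one of the samples shipped
          in the plugins/ directory.

```python
#!/usr/bin/env python3
import os

def format_plugin_output(values):
    """Return 1 to 3 numbers as a single space-separated line."""
    values = list(values)
    if not 1 <= len(values) <= 3:
        raise ValueError("a plugin must report 1 to 3 numbers")
    return " ".join("%.2f" % float(v) for v in values)

if __name__ == "__main__":
    # os.getloadavg() returns the 1, 5 and 15 minute load averages (Unix).
    print(format_plugin_output(os.getloadavg()))
```

          Such a script could then be declared with the directives described
          above, for example (the path is an assumption):

                  [PLUGIN loadavg]
                  program = /path/to/loadavg.py
                  title = Load average
                  label1 = 1 min
                  label2 = 5 min
                  label3 = 15 min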
897
898   Section REMOTE
899       This part allows you to run sysusage on remote hosts from a central
900       server. It uses ssh to execute sysusage on the destination host with
901       the -r option, which forces sysusage to print all results to stdout
902       instead of writing to local data files. As sysusage is run by a cron
903       job or in daemon mode, it cannot authenticate interactively to the
904       remote host, so you must give an ssh user and an identity file with
905       the corresponding configuration options.
906
907       This section must include the name or the ip address of the remote host
908       that will be used to create the target data directory, for example:
909
910               [REMOTE hostname] or [REMOTE host.domain.dom] or [REMOTE 192.168.1.14]
911
912       The section allows the following configuration directives. They are
913       composed of named directives followed by ':' or '=' and a value.
914
915       Once you have installed sysusage on all remote hosts and exchanged
916       the SSH keys between the central host and all remote hosts, most of
917       the time you just have to set the ssh_user directive to have it
918       working. Use the remote_sysusage directive if the sysusage perl script
919       is not installed in the same place as on the central server.
920
921   Section GROUP
922       This section allows you to group remote host reports under a common
923       group name in the index page. Remote hosts will be ordered following
924       their parent groups.  The name of the group can be any string and the
925       values in the section must be a list of remote servers defined in the
926       REMOTE sections.
927
928       For example if you are monitoring a cluster of web and database servers
929       you can use the following declaration:
930
931               [GROUP Web Servers]
932               webhost1
933               webhost2
934               webhost3
935
936               [GROUP Database Servers]
937               dbhost1
938               dbhost2
939
940       Of course webhostN and dbhostN hosts must be declared in the remote
941       section.
942
943       enable
944           Is used to enable/disable the remote host monitoring. Default is
945           'yes' (enabled).  Set it as 'enable=no' to disable it.
946
947       ssh_user
948           Used to define the ssh user allowed to connect to the remote host.
949           By default the value of the SSH_USER configuration option in the
950           GENERAL section will be used.
951
952       ssh_identity
953           Used to set the identity file to connect to the remote host without
954           a password.  By default the value of the SSH_IDENTITY configuration
955           option in the GENERAL section will be used. Usually this is the
956           private key that you've generated using ssh-keygen, most of the
957           time the file $HOME/.ssh/id_rsa. You may want to use the default
958           value unless you know exactly what you are doing.
959
960       ssh_options
961           Used to override the default ssh options, which are:
962
963                   -o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey
964
965           The default options are set in the SSH_OPTIONS configuration
966           option in the GENERAL section. You may want to use the default
967           value unless you know exactly what you are doing.
968
969       ssh_command
970           You can override the complete ssh command using this directive;
971           this will replace the ssh command, the ssh options, the ssh user and
972           the host part.  The sysusage remote command will not be replaced.
973           You may want to use the default value unless you know exactly
974           what you are doing.
975
976       remote_sysusage
977           Use it to set the path to the rsysusage command that must be used
978           on the remote host, SysUsage will automatically add the -r option
979           to cause the remote execution mode.
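          To picture how the REMOTE directives fit together, the sketch below
          assembles a remote command line from them. The assembly order, the
          use of ssh -i for the identity file, and the user, host and path in
          the example are assumptions made for illustration, not the actual
          SysUsage code.

```python
import shlex

# Default ssh options as documented for the ssh_options directive.
DEFAULT_SSH_OPTIONS = ("-o ConnectTimeout=5 "
                       "-o PreferredAuthentications=hostbased,publickey")

def build_remote_command(host, ssh_user, remote_sysusage,
                         ssh_identity=None,
                         ssh_options=DEFAULT_SSH_OPTIONS,
                         ssh_command=None):
    """Return the argument list used to run sysusage on a remote host."""
    if ssh_command:
        # ssh_command replaces the ssh program, options, user and host part.
        prefix = shlex.split(ssh_command)
    else:
        prefix = ["ssh"] + shlex.split(ssh_options)
        if ssh_identity:
            prefix += ["-i", ssh_identity]
        prefix.append("%s@%s" % (ssh_user, host))
    # The remote command itself is kept, with -r for remote execution mode.
    return prefix + [remote_sysusage, "-r"]

cmd = build_remote_command("192.168.1.14", "monitor", "/path/to/rsysusage")
print(" ".join(shlex.quote(part) for part in cmd))
```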
980

THRESHOLD NOTIFICATION

982   SMTP alert
983       Sysusage uses an external perl script to send SMTP alerts and/or Nagios
984       checks when a max or min threshold is reached. This program is named
985       sysusagewarn.  All options of the configuration file in section [ALARM]
986       are used by sysusage to call this program. If they are correctly set you
987       don't have to take care of the parameters given to this program. If you
988       want to use this program outside sysusage, here are the command line
989       options it understands:
990
991               Usage: sysusagewarn -t subject -c current_value -v threshold_value
992                               [-s smtp_srv] [-f from] [-d to] [-b hostname_prog]
993
994               -t subject : Subject of the alarm
995               -c value   : Current value monitored by sysusage
996               -v value   : Threshold value used.
997               -s host    : SMTP server name or ip where to send email.
998               -f from    : Sender email address of the alarm message.
999               -d to      : Destination address of the alarm message.
1000               -b path    : Path to program hostname. Default is /bin/hostname
1001               -n path    : Path to Nagios program submit_check_result. Default none.
1002               -l value   : Alarm level (0=OK,1=WARNING,2=CRITICAL). Default: 1.
1003               -r service : Nagios service name to used. Must be any sysusage type of
1004                            monitoring defined in the configuration file.
1005               -u url     : Url to HTML sysusage output to include in email.
1006                            Default: http://hostname.domain/sysusage/
1007               -h         : Output this message and exit
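           When calling sysusagewarn from another script, the argument list
           can be assembled from the options listed above. This is a sketch
           under assumptions: the helper name, the addresses and the server
           name are made up, and the command is only built here, not executed.

```python
def build_warn_command(subject, current, threshold,
                       smtp=None, sender=None, to=None, level=None):
    """Build a sysusagewarn argument list from the documented options."""
    cmd = ["sysusagewarn", "-t", subject,
           "-c", str(current), "-v", str(threshold)]
    # Optional switches are only added when a value is supplied.
    for flag, value in (("-s", smtp), ("-f", sender),
                        ("-d", to), ("-l", level)):
        if value is not None:
            cmd += [flag, str(value)]
    return cmd

warn_cmd = build_warn_command("load", 14.2, 12, smtp="smtp.example.org",
                              sender="sysusage@example.org",
                              to="admin@example.org", level=2)
print(warn_cmd)
```

           The resulting list could be passed to subprocess.run() on a host
           where sysusagewarn is installed.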
1008
1009   NAGIOS alert
1010       SysUsage sends check messages to Nagios through an external command
1011       (submit_check_result). So you need to create the host and associate all
1012       sysusage services that you want to monitor with Nagios. The service
1013       names correspond to the types of monitoring. For example, if you have
1014       enabled the alarm on memory usage the service sent is 'mem'. There are
1015       also special cases for types of monitoring with multiple instances,
1016       like network monitoring. You need to create a service per instance. For
1017       example type 'net' will have 'net_eth0' and 'net_lo' and more if you
1018       have more network interfaces. To see if your sysusage alarm messages are
1019       well understood by Nagios take a look at the nagios.log file (default
1020       to /usr/local/nagios/var/nagios.log).
1021
1022       To automatically deactivate an alarm reported to Nagios, SysUsage
1023       sends an OK request each time it runs if everything is correct for
1024       the monitored type.
1025

SENSORS

1027       Monitoring of sensors output is based on regexps. To make this
1028       clear, here is an example:
1029
1030       Sensors output on my server:
1031
1032               adt7463-i2c-0-2d
1033               Adapter: SMBus I801 adapter at 1480
1034               V1.5:        +3.23 V  (min =  +0.00 V, max =  +3.32 V)
1035               VCore:       +1.24 V  (min =  +1.10 V, max =  +1.49 V)
1036               V3.3:        +3.33 V  (min =  +2.80 V, max =  +3.78 V)
1037               V5:          +4.99 V  (min =  +4.25 V, max =  +5.75 V)
1038               V12:         +0.11 V  (min =  +0.00 V, max = +15.94 V)
1039               CPU_Fan:       0 RPM  (min =    0 RPM)
1040               fan2:       10671 RPM  (min = 8095 RPM)
1041               fan3:          0 RPM  (min =    0 RPM)
1042               fan4:          0 RPM  (min =    0 RPM)
1043               CPU Temp:    +69.5 C  (low  =  +2.0 C, high = +91.0 C)
1044               Board Temp:  +32.5 C  (low  =  +2.0 C, high = +83.0 C)
1045               Remote Temp: +31.2 C  (low  =  +2.0 C, high = +58.0 C)
1046               cpu0_vid:   +1.338 V
1047
1048               adt7463-i2c-0-2e
1049               Adapter: SMBus I801 adapter at 1480
1050               V1.5:        +3.21 V  (min =  +0.00 V, max =  +3.32 V)
1051               VCore:       +1.28 V  (min =  +1.10 V, max =  +1.49 V)
1052               V3.3:        +3.32 V  (min =  +2.80 V, max =  +3.78 V)
1053               V5:          +4.95 V  (min =  +0.00 V, max =  +6.64 V)
1054               V12:         +0.11 V  (min =  +0.00 V, max = +15.94 V)
1055               CPU_Fan:    10843 RPM  (min = 8095 RPM)
1056               fan2:          0 RPM  (min =    0 RPM)
1057               fan3:       9642 RPM  (min = 8095 RPM)
1058               fan4:          0 RPM  (min =    0 RPM)
1059               CPU Temp:    +57.2 C  (low  =  +2.0 C, high = +91.0 C)
1060               Board Temp:  +35.2 C  (low  =  +2.0 C, high = +91.0 C)
1061               Remote Temp: +35.8 C  (low  =  +2.0 C, high = +58.0 C)
1062               cpu0_vid:   +1.338 V
1063
1064       Depending on the sensors kernel modules loaded you could have more or
1065       less output than that. To monitor all temperature sensors on my server
1066       I need to add the following lines into sysusage.cfg:
1067
1068               sensors:CPU Temp:75
1069               sensors:Board Temp:45
1070               sensors:Remote Temp:45
1071
1072       This will create 3 graphs: one based on lines matching 'CPU Temp',
1073       another with lines matching 'Board Temp' and the last with lines
1074       matching 'Remote Temp'.  As I have 2 CPUs, each graph will have 2
1075       values. You cannot report more than 3 values per graph; this is hard
1076       coded into sysusage. So if you have more CPUs you will not see more
1077       than 3 values. Here it will send an alarm when the temperature
1078       exceeds the given values (75,45,45).
1079
1080       To monitor fan speed, I just add lines like this in the configuration
1081       file:
1082
1083               sensors:fan2:11000:8095
1084               sensors:fan3:11000:8095
1085
1086       This will create 2 graphs for fan 2 and fan 3, with an alarm sent when
1087       the speed exceeds 11000 RPM or is lower than 8095 RPM.
1088
1089       On my personal computer (/etc/sysconfig/lm_sensors => modprobe
1090       coretemp) sensors output is:
1091
1092               coretemp-isa-0000
1093               Adapter: ISA adapter
1094               Core 0:      +53.0 C  (high = +78.0 C, crit = +100.0 C)
1095
1096               coretemp-isa-0001
1097               Adapter: ISA adapter
1098               Core 1:      +50.0 C  (high = +78.0 C, crit = +100.0 C)
1099
1100       To monitor CPU temperature, I just add this line in my sysusage.cfg:
1101
1102               sensors:Core:70
1103
1104       This will generate a graph with 2 graphed data for Core 0 and Core 1.
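           The pattern matching described in this chapter can be tried on the
           coretemp output shown above. The sketch is an approximation only:
           taking the first number after the colon as the reading is an
           assumption, and sensor_values is a hypothetical helper, not the
           SysUsage parser.

```python
import re

SENSORS_OUTPUT = """\
coretemp-isa-0000
Adapter: ISA adapter
Core 0:      +53.0 C  (high = +78.0 C, crit = +100.0 C)

coretemp-isa-0001
Adapter: ISA adapter
Core 1:      +50.0 C  (high = +78.0 C, crit = +100.0 C)
"""

def sensor_values(output, target, max_values=3):
    """Return up to max_values readings from lines matching target."""
    reading = re.compile(r":\s*([+-]?\d+(?:\.\d+)?)")
    values = []
    for line in output.splitlines():
        if re.search(target, line):
            m = reading.search(line)
            if m:
                values.append(float(m.group(1)))
    # No more than 3 values are graphed per target.
    return values[:max_values]

values = sensor_values(SENSORS_OUTPUT, "Core")
alarms = [v for v in values if v >= 70]  # threshold from 'sensors:Core:70'
print(values, alarms)  # [53.0, 50.0] []
```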
1105
1106       Now that sysstat sar natively reports device temperature and fan
1107       speed you don't need sensors anymore. Type 'temp' can be used instead
1108       and type 'fan' for the fan speed. The target of these types is the
1109       device number; see sar -m TEMP or sar -m FAN to find which device
1110       number to monitor.
1111

BUGS / FEATURE REQUEST

1113       Please report any bugs, remarks and feature requests using the GitHub
1114       interface at https://github.com/darold/sysusage/ or send a mail to the
1115       author.
1116

LICENSE

1118       Copyright (C) 2003-2018 Gilles Darold
1119
1120       This program is free software; you can redistribute it and/or modify it
1121       under the terms of the GNU General Public License as published by the
1122       Free Software Foundation; either version 3 of the License, or any later
1123       version.
1124
1125       This program is distributed in the hope that it will be useful, but
1126       WITHOUT ANY WARRANTY; without even the implied warranty of
1127       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
1128       General Public License for more details.
1129
1130       You should have received a copy of the GNU General Public License along
1131       with this program; if not, write to the Free Software Foundation, Inc.,
1132       51 Franklin Street, Fifth Floor, Boston, MA 02110-1301  USA
1133

AUTHOR

1135       Gilles Darold <gilles _|_At_|_ darold _|_DoT_|_ net>
1136

ACKNOWLEDGMENT

1138       I want to thank all the people who helped to build this tool, with a
1139       very special thanks to Marat Dyatko for the web design contribution.
1140
1141
1142
1143perl v5.26.1                      2018-08-06                       SYSUSAGE(1)