MONIT(1) User Commands MONIT(1)

Monit - utility for monitoring services on a Unix system

monit [options] {arguments}
10
Monit is a utility for managing and monitoring processes, files,
directories and filesystems on a Unix system. Monit conducts automatic
maintenance and repair and can execute meaningful causal actions in
error situations. For example, Monit can start a process if it is not
running, restart a process if it does not respond and stop a process if
it uses too many resources. You may use Monit to monitor files,
directories and filesystems for changes, such as timestamp changes,
checksum changes or size changes.

Monit is controlled via an easy-to-configure control file based on a
free-format, token-oriented syntax. Monit logs to syslog or to its own
log file and notifies you about error conditions via customizable alert
messages. Monit can perform various TCP/IP network checks and protocol
checks, and can use SSL for such checks. Monit provides an HTTP(S)
interface and you may use a browser to access the Monit program.
27
29 The behavior of Monit is controlled by command-line options and a run
30 control file, ~/.monitrc, the syntax of which we describe in a later
31 section. Command-line options override .monitrc declarations.
32
33 The following options are recognized by monit. However, it is recom‐
34 mended that you set options (when applicable) directly in the .monitrc
35 control file.
36
37 General Options and Arguments
38
39 -c file
40 Use this control file
41
42 -d n
43 Run as a daemon once per n seconds
44
-g name
Set group name for start, stop, restart, monitor and
unmonitor.
48
49 -l logfile
50 Print log information to this file
51
52 -p pidfile
53 Use this lock file in daemon mode
54
55 -s statefile
56 Write state information to this file
57
58 -I
59 Do not run in background (needed for run from init)
60
61 -t
62 Run syntax check for the control file
63
64 -v
65 Verbose mode, work noisy (diagnostic output)
66
67 -H [filename]
68 Print MD5 and SHA1 hashes of the file or of stdin if the
69 filename is omitted; Monit will exit afterwards
70
71 -V
72 Print version number and patch level
73
74 -h
75 Print a help text
76
77 In addition to the options above, Monit can be started with one of the
78 following action arguments; Monit will then execute the action and exit
79 without transforming itself to a daemon.
80
81 start all
82 Start all services listed in the control file and
83 enable monitoring for them. If the group option is
84 set, only start and enable monitoring of services in
85 the named group (no "all" verb is required in this
86 case).
87
88 start name
89 Start the named service and enable monitoring for
90 it. The name is a service entry name from the
91 monitrc file.
92
93 stop all
94 Stop all services listed in the control file and
95 disable their monitoring. If the group option is
96 set, only stop and disable monitoring of the services
97 in the named group (no "all" verb is required in this
98 case).
99
100 stop name
101 Stop the named service and disable its monitoring.
102 The name is a service entry name from the monitrc
103 file.
104
105 restart all
106 Stop and start all services. If the group option
107 is set, only restart the services in the named group
108 (no "all" verb is required in this case).
109
110 restart name
111 Restart the named service. The name is a service entry
112 name from the monitrc file.
113
114 monitor all
115 Enable monitoring of all services listed in the
116 control file. If the group option is set, only start
117 monitoring of services in the named group (no "all"
118 verb is required in this case).
119
120 monitor name
121 Enable monitoring of the named service. The name is
122 a service entry name from the monitrc file. Monit will
123 also enable monitoring of all services this service
124 depends on.
125
126 unmonitor all
127 Disable monitoring of all services listed in the
128 control file. If the group option is set, only disable
129 monitoring of services in the named group (no "all"
130 verb is required in this case).
131
unmonitor name
Disable monitoring of the named service. The name is
a service entry name from the monitrc file. Monit
will also disable monitoring of all services that
depend on this service.
137
138 status
139 Print full status information for each service.
140
141 summary
142 Print short status information for each service.
143
reload
Reinitialize a running Monit daemon. The daemon will
reread its configuration and close and reopen its log files.
147
148 quit
149 Kill a Monit daemon process
150
151 validate
152 Check all services listed in the control file. This
153 action is also the default behavior when Monit runs
154 in daemon mode.
155
You may use Monit to monitor daemon processes or similar programs
running on localhost. Monit is particularly useful for monitoring daemon
processes, such as those started at system boot time from /etc/init.d/,
for instance sendmail, sshd, apache and mysql. Unlike many monitoring
systems, Monit can act when an error situation occurs. For example, if
sendmail is not running, Monit can start sendmail, and if apache is
using too many resources (e.g. if a DoS attack is in progress), Monit
can stop or restart apache and send you an alert message. Monit can
also monitor process characteristics, such as whether a process has
become a zombie and how much memory or how many CPU cycles a process is
using.

You may also use Monit to monitor files, directories and filesystems on
localhost. Monit can monitor these items for changes, such as timestamp
changes, checksum changes or size changes. This is also useful for
security reasons: you can monitor the MD5 checksum of files that should
not change.

You may even use Monit to monitor remote hosts. First and foremost,
Monit is a utility for monitoring and mending services on localhost,
but if a service depends on a remote service, e.g. a database server
or an application server, it might be useful to be able to test a
remote host as well.

You may also monitor general system-wide resources such as CPU usage,
memory and load average.
182
184 Monit is configured and controlled via a control file called monitrc.
185 The default location for this file is ~/.monitrc. If this file does not
186 exist, Monit will try /etc/monitrc, then @sysconfdir@/monitrc and
187 finally ./monitrc.
188
189 A Monit control file consists of a series of service entries and global
190 option statements in a free-format, token-oriented syntax. Comments
191 begin with a # and extend through the end of the line. There are three
192 kinds of tokens in the control file: grammar keywords, numbers and
193 strings.
194
195 On a semantic level, the control file consists of three types of state‐
196 ments:
197
198 1. Global set-statements
199 A global set-statement starts with the keyword set and the item to
200 configure.
201
202 2. Global include-statement
203 The include statement consists of the keyword include and a glob
204 string.
205
206 3. One or more service entry statements.
207 A service entry starts with the keyword check followed by the ser‐
208 vice type.
209
210 A Monit control file example:
211
#
# Monit control file
#

set daemon 120 # Poll at 2-minute intervals
set logfile syslog facility log_daemon
set alert foo@bar.baz
set httpd port 2812 and use address localhost
    allow localhost   # Allow localhost to connect
    allow admin:Monit # Allow Basic Auth

check system myhost.mydomain.tld
    if loadavg (1min) > 4 then alert
    if loadavg (5min) > 2 then alert
    if memory usage > 75% then alert
    if cpu usage (user) > 70% then alert
    if cpu usage (system) > 30% then alert
    if cpu usage (wait) > 20% then alert

check process apache
    with pidfile "/usr/local/apache/logs/httpd.pid"
    start program = "/etc/init.d/httpd start" with timeout 60 seconds
    stop program = "/etc/init.d/httpd stop"
    if 2 restarts within 3 cycles then timeout
    if totalmem > 100 Mb then alert
    if children > 255 for 5 cycles then stop
    if cpu usage > 95% for 3 cycles then restart
    if failed port 80 protocol http then restart
    group server
    depends on httpd.conf, httpd.bin

check file httpd.conf
    with path /usr/local/apache/conf/httpd.conf
    # Reload apache if the httpd.conf file was changed
    if changed checksum
       then exec "/usr/local/apache/bin/apachectl graceful"

check file httpd.bin
    with path /usr/local/apache/bin/httpd
    # Run /watch/dog in the case that the binary was changed
    # and alert in the case that the checksum value recovered
    # later
    if failed checksum then exec "/watch/dog"
       else if recovered then alert

include /etc/monit/mysql.monitrc
include /etc/monit/mail/*.monitrc
259
260 The above example illustrates a service entry for monitoring the apache
261 web server process as well as related files. The meaning of the various
262 statements will be explained in the following sections.
263
Monit will log status and error messages to a log file. Use the set
logfile statement in the monitrc control file. To set up Monit to log to
its own logfile, use e.g. set logfile /var/log/monit.log. If syslog is
given as a value for the -l command-line switch (or the statement set
logfile syslog is found in the control file), Monit will use the syslog
system daemon to log messages with a priority assigned to each message
based on the context. To turn off logging, simply do not set the
logfile in the control file (and, of course, do not use the -l switch).
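
As a minimal sketch, the two logging setups described above might look
like this in the control file. The log file path is only an example,
and log_daemon is the facility used in the sample control file earlier
in this page. Only one of the two statements would normally be active.

```
# Either: log to Monit's own log file (example path)
set logfile /var/log/monit.log

# Or: log through the syslog system daemon
# set logfile syslog facility log_daemon
```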
273
275 The -d interval command-line switch runs Monit in daemon mode. You must
276 specify a numeric argument which is a polling interval in seconds.
277
278 In daemon mode, Monit detaches from the console, puts itself in the
279 background and runs continuously, monitoring each specified service and
280 then goes to sleep for the given poll interval.
281
282 Simply invoking
283
monit -d 300
285
286 will poll all services described in your ~/.monitrc file every 5 min‐
287 utes.
288
289 It is strongly recommended to set the poll interval in your ~/.monitrc
290 file instead, by using set daemon n, where n is an integer number of
291 seconds. If you do this, Monit will always start in daemon mode (as
292 long as no action arguments are given). Example (check every 5 min‐
293 utes):
294
295 set daemon 300
296
If you need Monit to wait some time at startup before it starts checking
services, you can add a start delay to the set daemon statement. Example
(check every 5 minutes, wait 1 minute on start before the first
monitoring cycle):
300
301 set daemon 300 with start delay 60
302
303 Monit makes a per-instance lock-file in daemon mode. If you need more
304 Monit instances, you will need more configuration files, each pointing
305 to its own lock-file.
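
A sketch of what two such configuration files might contain is shown
below. The file names and paths are hypothetical, and set pidfile is
assumed here to be the control file counterpart of the -p command-line
option. Each instance would then be started with its own file through
the -c option.

```
# First control file, e.g. /etc/monit/instance-a.monitrc (hypothetical)
set daemon 120
set pidfile /var/run/monit-a.pid
set statefile /var/run/monit-a.state

# Second control file, e.g. /etc/monit/instance-b.monitrc (hypothetical)
set daemon 120
set pidfile /var/run/monit-b.pid
set statefile /var/run/monit-b.state
```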
306
307 Calling monit with a Monit daemon running in the background sends a
308 wake-up signal to the daemon, forcing it to check services immediately.
309
310 The quit argument will kill a running daemon process instead of waking
311 it up.
312
Monit can run and be controlled from init. If Monit should crash, init
will re-spawn a new Monit process. Using init to start Monit is probably
the best way to run Monit if you want to be certain that you always
have a running Monit daemon on your system. (It is obvious, but
nevertheless worth stressing: make sure that the control file does not
have any syntax errors before you start Monit from init. Also, make
sure that if you run Monit from init, you do not start Monit from a
startup script as well.)

To set up Monit to run from init, either use the 'set init' statement
in Monit's control file or use the -I option on the command line. Here
is what you must add to /etc/inittab:
326
327 # Run Monit in standard run-levels
328 mo:2345:respawn:/usr/local/bin/monit -Ic /etc/monitrc
329
330 After you have modified init's configuration file, you can run the fol‐
331 lowing command to re-examine /etc/inittab and start monit:
332
333 telinit q
334
335 For systems without telinit:
336
337 kill -1 1
338
If Monit is used to monitor services that are also started at boot time
(e.g. services started via SYSV init rc scripts or via inittab) then,
in some cases, a race condition could occur. That is, if a service is
slow to start, Monit can assume that the service is not running and
possibly try to start it and raise an alert while, in fact, the service
is already about to start or already in its startup sequence. Please
see the FAQ for solutions to this problem.
346
348 The Monit control file, monitrc, can include additional configuration
349 files. This feature helps to maintain a certain structure or to place
350 repeating settings into one file. Include statements can be placed at
351 virtually any spot. The syntax is the following:
352
353 INCLUDE globstring
354
The globstring is any kind of string as defined in glob(7). Thus, you
can refer to a single file or you can load several files at once. If
you want to use whitespace in your string, the globstring needs to be
enclosed in single quotes (') or double quotes ("). For example,
359
360 INCLUDE "/etc/monit/Monit configuration files/printer.*.monitrc"
361
362 loads any file matching the single globstring. If the globstring
363 matches a directory instead of a file, it is silently ignored.
364
365 INCLUDE statements in included files are parsed as in the main control
366 file.
367
If the globstring matches several files, the files are included in
unsorted order. If you need to rely on a certain order, you might
need to use single include statements.
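
For instance, a fixed include order could be sketched with one include
statement per file. The directory and file names below are
hypothetical and only illustrate the idea.

```
# Each file is named explicitly, so the files are read in this order
include /etc/monit/conf.d/10-global.monitrc
include /etc/monit/conf.d/20-mail.monitrc
include /etc/monit/conf.d/30-services.monitrc
```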
371
373 Service entries in the control file, monitrc, can be grouped together
374 by the group statement. The syntax is simply (keyword in capital):
375
376 GROUP groupname
377
378 With this statement it is possible to group similar service entries
379 together and manage them as a whole. Monit provides functions to start,
380 stop, restart, monitor and unmonitor a group of services, like so:
381
To start a group of services from the console:

monit -g <groupname> start

To stop a group of services:

monit -g <groupname> stop

To restart a group of services:

monit -g <groupname> restart
393
394 Note: the status and summary commands don't support the -g option and
395 will print the state of all services.
396
A service can be added to multiple groups by using the group statement
multiple times:
399
400 group www
401 group filesystem
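
To show the group statement in context, here is a small sketch with two
service entries. The pidfile and init script paths are hypothetical.

```
check process apache with pidfile /var/run/httpd.pid
    start program = "/etc/init.d/httpd start"
    stop program = "/etc/init.d/httpd stop"
    group www
    group server

check process mysql with pidfile /var/run/mysqld.pid
    start program = "/etc/init.d/mysql start"
    stop program = "/etc/init.d/mysql stop"
    group server
```

With these entries, monit -g server restart would restart both
services, while monit -g www restart would restart only apache.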
402
404 Monit supports three monitoring modes per service: active, passive and
405 manual. See also the example section below for usage of the mode state‐
406 ment.
407
408 In active mode, Monit will monitor a service and in case of problems
409 Monit will act and raise alerts, start, stop or restart the service.
410 Active mode is the default mode.
411
412 In passive mode, Monit will passively monitor a service and specifi‐
413 cally not try to fix a problem, but it will still raise alerts in case
414 of a problem.
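
A passive service entry might be sketched as follows. It is assumed
here that the mode statement takes the mode name as its argument; the
service name, pidfile path and address are placeholders.

```
check process myproc with pidfile /var/run/my.pid
    mode passive
    alert foo@bar
    # In passive mode Monit only reports the problem
    if failed port 80 protocol http then alert
```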
415
For use in clustered environments there is also a manual mode. In this
mode, Monit will enter active mode only if a service was brought under
Monit's control, for example by executing the following command in the
console:

monit start sybase
(Monit will call sybase's start method and enable monitoring)

If a service was not started by Monit or was stopped or disabled, for
example by:

monit stop sybase
(Monit will call sybase's stop method and disable monitoring)

Monit will then not monitor the service. This allows you to have
services configured in monitrc and start them with Monit only if they
should run. This feature can be used to build a simple failsafe
cluster. To see how, read more about how to set up a cluster with Monit
using the heartbeat system in the examples sections below.
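
A service entry for the sybase example above could be sketched like
this. The pidfile and init script paths are hypothetical, and the mode
statement is assumed to take the mode name as its argument.

```
check process sybase with pidfile /var/run/sybase.pid
    start program = "/etc/init.d/sybase start"
    stop program = "/etc/init.d/sybase stop"
    # Only monitored after being started through Monit
    mode manual
```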
435
A service's monitoring state is persistent across Monit restarts. This
means that you will probably want to make certain that services in
manual mode are stopped or in unmonitored mode at server shutdown. Do,
for instance, the following in a server shutdown script:

monit stop sybase

or

monit unmonitor sybase

If you use Monit in an HA cluster, you should place the state file in a
temporary filesystem, so that if the machine crashes and the stand-by
machine takes over services, any manual monitoring mode services that
were started on the crashed machine won't be started on reboot. Use for
example:
452
453 set statefile /tmp/monit.state
454
456 Monit will raise an email alert in the following situations:
457
458 o A service timed out
459 o A service does not exist
460 o A service related data access problem
461 o A service related program execution problem
462 o A service is of invalid object type
o An ICMP problem
464 o A port connection problem
465 o A resource statement match
466 o A file checksum problem
467 o A file size problem
468 o A file/directory timestamp problem
469 o A file/directory/filesystem permission problem
470 o A file/directory/filesystem uid problem
471 o A file/directory/filesystem gid problem
472 o An action is done per administrator's request
473
Monit will send an alert each time a monitored object changes. This
includes:
476
477 o Monit started, stopped or reloaded
478 o A file checksum changed
479 o A file size changed
480 o A file content match
481 o A file/directory timestamp changed
482 o A filesystem mount flags changed
483 o A process PID changed
484 o A process PPID changed
485
486 You use the alert statement to notify Monit that you want alert mes‐
487 sages sent to an email address. If you do not specify an alert state‐
488 ment, Monit will not send alert messages.
489
490 There are two forms of alert statement:
491
492 o Global - common for all services
493 o Local - per service
494
495 In both cases you can use more than one alert statement. In other
496 words, you can send many different emails to many different addresses.
497
498 Recipients in the global and in the local lists are alerted when a ser‐
499 vice failed, recovered or changed. If the same email address is in the
500 global and in the local list, Monit will only send one alert. Local
501 (per service) defined alert email addresses override global addresses
502 in case of a conflict. Finally, you may choose to only use a global
503 alert list (recommended), a local per service list or both.
504
505 It is also possible to disable the global alerts locally for particular
506 service(s) and recipients.
507
508 Setting a global alert statement
509
If a change occurs on a monitored service, Monit will send an alert
to all recipients in the global list who have registered interest in
the event type. Here is the syntax for the global alert statement:
513
514 SET ALERT mail-address [ [NOT] {events}] [MAIL-FORMAT {mail-format}]
515 [REMINDER number]
516
517 Simply using the following in the global section of monitrc:
518
519 set alert foo@bar
520
will send a default email to the address foo@bar whenever an event
occurs on any service. Such an event may be that a service timed out,
a service doesn't exist and so on. If you want to send alert messages
to more email addresses, add a set alert 'email' statement for each
address.
526
527 For explanations of the events, MAIL-FORMAT and REMINDER keywords
528 above, please see below.
529
530 You can also use the NOT option ahead of the events list which will
531 reverse the meaning of the list. That is, only send alerts for events
532 not in the list. This can save you some configuration bytes if you are
533 interested in most events except a few.
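
As a sketch, a global alert statement that uses the NOT option to
exclude a single event type could look like this (on is one of the
optional noise keywords):

```
# Alert foo@bar on every event except Monit instance events
set alert foo@bar not on { instance }
```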
534
535 Setting a local alert statement
536
537 Each service can also have its own recipient list.
538
539 ALERT mail-address [ [NOT] {events}] [MAIL-FORMAT {mail-format}]
540 [REMINDER number]
541
542 or
543
544 NOALERT mail-address
545
546 If you only want an alert message sent for certain events and for cer‐
547 tain service(s), for example only for timeout events or only if a ser‐
548 vice died, then postfix the alert-statement with a filter block:
549
550 check process myproc with pidfile /var/run/my.pid
551 alert foo@bar only on { timeout, nonexist }
552 ...
553
(only and on are noise keywords, ignored by Monit. As a side note,
noise keywords are used in the control file grammar to make an entry
resemble English and thus make it easier to read, or so goes the
philosophy. The full set of available noise keywords is listed below in
the Control File section.)

You can also set up alerts for all events except some by putting
the word not ahead of the list. For example, if you want to receive
alerts for all events except Monit instance events, you can write (note
that the noise words 'but' and 'on' are optional):
564
565 check system myserver
566 alert foo@bar but not on { instance }
567 ...
568
569 instead of:
570
alert foo@bar on { action
                   checksum
                   connection
                   content
                   data
                   exec
                   gid
                   icmp
                   invalid
                   fsflags
                   nonexist
                   permission
                   pid
                   ppid
                   resource
                   size
                   timeout
                   timestamp
                   uid }
587
This will send alerts for all events to foo@bar, except Monit instance
events. An instance event is an event fired whenever the Monit
program starts or stops.
591
592 Event filtering can be used to send an email to different email
593 addresses depending on the events that occurred. For instance:
594
595 alert foo@bar { nonexist, timeout, resource, icmp, connection }
596 alert security@bar on { checksum, permission, uid, gid }
597 alert manager@bar
598
This will send an alert message to foo@bar whenever a nonexist,
timeout, resource, icmp or connection problem occurs and a message to
security@bar if a checksum, permission, uid or gid problem occurs. And
finally, a message to manager@bar whenever any error event occurs.
603
604 Here is the list of events you can use in a mail-filter: uid, gid,
605 size, nonexist, data, icmp, instance, invalid, exec, content, timeout,
606 resource, checksum, fsflags, timestamp, connection, permission, pid,
607 ppid, action
608
You can also disable the alerts locally using the NOALERT statement.
This is useful if you have lots of services monitored and are using the
global alert statement, but don't want to receive alerts for some minor
subset of services:
613
614 noalert appadmin@bar
615
616 For example, if you stick the noalert statement in a 'check system'
617 entry, you won't receive system related alerts (such as Monit instance
618 started/stopped/reloaded alert, system overloaded alert, etc.) but will
619 receive alerts for all other monitored services.
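
The 'check system' case just described might be sketched as follows.
The service names and the pidfile path are placeholders.

```
set alert foo@bar

# No system related alerts are sent to foo@bar
check system myserver
    noalert foo@bar

# The global alert statement still applies to this service
check process myproc with pidfile /var/run/my.pid
```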
620
621 The following example will alert foo@bar on all events on all services
622 by default, except the service mybar which will send an alert only on
623 timeout. The trick is based on the fact that local definition of the
624 same recipient overrides the global setting (including registered
625 events and mail format):
626
627 set alert foo@bar
628
629 check process myfoo with pidfile /var/run/myfoo.pid
630 ...
631 check process mybar with pidfile /var/run/mybar.pid
632 alert foo@bar only on { timeout }
633
634 Alert message layout
635
636 Monit provides a default mail message layout that is short and to the
637 point. Here's an example of a standard alert mail sent by monit:
638
639 From: monit@tildeslash.com
640 Subject: Monit alert -- Does not exist apache
641 To: hauk@tildeslash.com
642 Date: Thu, 04 Sep 2003 02:33:03 +0200
643
644 Does not exist Service apache
645
646 Date: Thu, 04 Sep 2003 02:33:03 +0200
647 Action: restart
648 Host: www.tildeslash.com
649
650 Your faithful employee,
651 monit
652
653 If you want to, you can change the format of this message with the
654 optional mail-format statement. The syntax for this statement is as
655 follows:
656
657 mail-format {
658 from: monit@localhost
659 subject: $SERVICE $EVENT at $DATE
660 message: Monit $ACTION $SERVICE at $DATE on $HOST: $DESCRIPTION.
661 Yours sincerely,
662 monit
663 }
664
Where the keyword from: is the email address Monit should pretend it is
sending from. It does not have to be a real mail address, but it must
be a properly formatted mail address of the form name@domain. The
keyword subject: is for the email subject line. The subject must be on
only one line. The message: keyword denotes the mail body. If used,
this keyword should always be the last in a mail-format statement. The
mail body can be as long as you want, but must not contain the '}'
character.

Each of these format keywords is optional, but a mail-format statement
must contain at least one of them. Thus, if you only want to change the
from address Monit is using, you can do:
677
678 set alert foo@bar with mail-format { from: bofh@bar.baz }
679
680 From the previous example you will notice that some special $XXX vari‐
681 ables were used. If used, they will be substituted and expanded into
682 the text with these values:
683
684 * $EVENT
685 A string describing the event that occurred. The values are
686 fixed and are:
687
Event:     ⎪ Failure state:            ⎪ Success state:
-------------------------------------------------------------------
ACTION     ⎪ "Action done"             ⎪ "Action done"
CHECKSUM   ⎪ "Checksum failed"         ⎪ "Checksum succeeded"
CONNECTION ⎪ "Connection failed"       ⎪ "Connection succeeded"
CONTENT    ⎪ "Content failed"          ⎪ "Content succeeded"
DATA       ⎪ "Data access error"       ⎪ "Data access succeeded"
EXEC       ⎪ "Execution failed"        ⎪ "Execution succeeded"
FSFLAG     ⎪ "Filesystem flags failed" ⎪ "Filesystem flags succeeded"
GID        ⎪ "GID failed"              ⎪ "GID succeeded"
ICMP       ⎪ "ICMP failed"             ⎪ "ICMP succeeded"
INSTANCE   ⎪ "Monit instance changed"  ⎪ "Monit instance changed not"
INVALID    ⎪ "Invalid type"            ⎪ "Type succeeded"
NONEXIST   ⎪ "Does not exist"          ⎪ "Exists"
PERMISSION ⎪ "Permission failed"       ⎪ "Permission succeeded"
PID        ⎪ "PID failed"              ⎪ "PID succeeded"
PPID       ⎪ "PPID failed"             ⎪ "PPID succeeded"
RESOURCE   ⎪ "Resource limit matched"  ⎪ "Resource limit succeeded"
SIZE       ⎪ "Size failed"             ⎪ "Size succeeded"
TIMEOUT    ⎪ "Timeout"                 ⎪ "Timeout recovery"
TIMESTAMP  ⎪ "Timestamp failed"        ⎪ "Timestamp succeeded"
UID        ⎪ "UID failed"              ⎪ "UID succeeded"
710
711 * $SERVICE
712 The service entry name in monitrc
713
714 * $DATE
715 The current time and date (RFC 822 date style).
716
717 * $HOST
718 The name of the host Monit is running on
719
720 * $ACTION
721 The name of the action which was done. Action names are fixed
722 and are:
723
724 Action: ⎪ Name:
725 --------------------
726 ALERT ⎪ "alert"
727 EXEC ⎪ "exec"
728 MONITOR ⎪ "monitor"
729 RESTART ⎪ "restart"
730 START ⎪ "start"
731 STOP ⎪ "stop"
732 UNMONITOR⎪ "unmonitor"
733
734 * $DESCRIPTION
735 The description of the error condition
736
737 Setting a global mail format
738
739 It is possible to set a standard mail format with the following global
740 set-statement (keywords are in capital):
741
742 SET MAIL-FORMAT {mail-format}
743
744 Format set with this statement will apply to every alert statement that
745 does not have its own specified mail-format. This statement is most
746 useful for setting a default from address for messages sent by monit,
747 like so:
748
749 set mail-format { from: monit@foo.bar.no }
750
751 Setting an error reminder
752
Monit by default sends just one error notification if a service failed
and another when it recovered. If you want to be notified more than
once if a service remains in a failed state, you can use the reminder
option to the alert statement (keywords are in capital):
757
758 ALERT ... [WITH] REMINDER [ON] number [CYCLES]
759
760 For example if you want to be notified each tenth cycle if a service
761 remains in a failed state, you can use:
762
763 alert foo@bar with reminder on 10 cycles
764
765 Likewise if you want to be notified on each failed cycle, you can use:
766
767 alert foo@bar with reminder on 1 cycle
768
769 Setting a mail server for alert messages
770
771 The mail server Monit should use to send alert messages is defined with
772 a global set statement (keywords are in capital and optional statements
773 in [brackets]):
774
775 SET MAILSERVER {hostname⎪ip-address [PORT port]
776 [USERNAME username] [PASSWORD password]
777 [using SSLV2⎪SSLV3⎪TLSV1] [CERTMD5 checksum]}+
778 [with TIMEOUT X SECONDS]
779 [using HOSTNAME hostname]
780
The port statement lets you use SMTP servers other than those
listening on port 25. If omitted, port 25 is used unless ssl or tls is
used, in which case port 465 is used by default.

Monit supports plain SMTP authentication: you can set a username and a
password using the USERNAME and PASSWORD options.

To use secure communication, use the SSLV2, SSLV3 or TLSV1 options. You
can also specify the server certificate checksum using the CERTMD5
option.
790
As you can see, it is possible to set several SMTP servers. If Monit
cannot connect to the first server in the list, it will try the second
server and so on. Monit has a default connection timeout of 5 seconds,
and if the SMTP server is slow, Monit could time out when connecting or
reading from the server. If this is the case, you can use the optional
timeout statement to explicitly set the timeout to a higher value.
Here is an example for setting several mail servers:
798
799 set mailserver mail.tildeslash.com, mail.foo.bar port 10025
800 username "Rabbi" password "Loewe" using tlsv1, localhost
801 with timeout 15 seconds
802
Here Monit will first try to connect to the server
"mail.tildeslash.com"; if this server is down, Monit will try
"mail.foo.bar" on port 10025 using the given credentials via tls and
finally "localhost". We also set an explicit connect and read timeout:
if Monit cannot connect to the first SMTP server in the list within 15
seconds, it will try the next server and so on. The set mailserver ..
statement is optional and if not defined Monit will not send email
alerts. Not setting a mail server is recommended only if alert
notification is delegated to M/Monit.

Monit, by default, uses the local host name in SMTP HELO/EHLO and in the
Message-ID header. Some mail servers check this information against DNS
for spam protection and can reject the email if the DNS and the
hostname used in the transaction do not match. If this is the case, you
can override the default local host name by using the HOSTNAME option:
818
819 set mailserver mail.tildeslash.com using hostname
820 "myhost.example.org"
821
822 Event queue
823
If the MTA (mail server) for sending alerts is not available, Monit can
queue events on the local filesystem until the MTA recovers. Monit will
then post queued events in order with their original timestamp so the
events are not lost. This feature is most useful if Monit is used
together with M/Monit and when event history is important.
829
830 The event queue is persistent across monit restarts and provided that
831 the back-end filesystem is persistent too, across system restart as
832 well.
833
834 By default, the queue is disabled and if the alert handler fails, Monit
835 will simply drop the alert message. To enable the event queue, add the
836 following statement to the Monit control file:
837
838 SET EVENTQUEUE BASEDIR <path> [SLOTS <number>]
839
The <path> is the path to the directory where events will be stored.
To limit the queue size, use the optional SLOTS option to store at most
<number> event messages. If the SLOTS option is not used, Monit will
store as many events as the back-end filesystem allows.
844
845 Example:
846
847 set eventqueue
848 basedir /var/monit
849 slots 5000
850
851 Events are stored in a binary format, with one file per event. The
852 file size is ca. 130 bytes or a bit more (depending on the message
853 length). The file name is composed of the unix timestamp, underscore
854 and the service name, for example:
855
856 /var/monit/1131269471_apache
857
If you are running more than one Monit instance on the same machine,
you must use separate event queue directories to avoid sending the
wrong alerts to the wrong addresses.
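
As a rough sketch of the point above, two instances might each get
their own queue directory. The directory names are hypothetical, and
each statement belongs in the control file of a different instance:

# control file of the first Monit instance (hypothetical path)
set eventqueue basedir /var/monit/instance-a

# control file of the second Monit instance (hypothetical path)
set eventqueue basedir /var/monit/instance-b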
861
862 If you want to purge the queue by hand, that is, remove queued
863 event-files, Monit should be stopped before the removal.
864
866 monit provides a service timeout mechanism for situations where a ser‐
867 vice simply refuses to start or respond over a longer period.
868
The timeout mechanism is based on the number of service restarts and
the number of poll cycles. For example, if a service had x restarts
within y poll cycles (where x <= y), then Monit will perform an action
(for example, unmonitor the service). If a timeout occurs, Monit will
send an alert message if you have registered interest in this event.
874
875 The syntax for the timeout statement is as follows (keywords are in
876 capital):
877
878 IF <number> RESTART <number> CYCLE(S) THEN <action>
879
880 Here is an example where Monit will unmonitor the service if it was
881 restarted 2 times within 3 cycles:
882
883 if 2 restarts within 3 cycles then unmonitor
884
To have Monit check the service again after monitoring was disabled,
run 'monit monitor <servicename>' from the command line.
887
888 Example for setting custom exec on timeout:
889
890 if 5 restarts within 5 cycles then exec "/foo/bar"
891
892 Example for stopping the service:
893
894 if 7 restarts within 10 cycles then stop
895
897 Monit provides several tests you may utilize in a service entry to test
898 a service. There are two classes of tests: variable and constant tests.
899 That is, the condition we test is either constant e.g. a number or it
900 can vary.
901
902 A constant test has this general format:
903
904 IF <TEST> [[<X>] [TIMES WITHIN] <Y> CYCLES] THEN ACTION [ELSE IF SUC‐
905 CEEDED [[<X>] [TIMES WITHIN] <Y> CYCLES] THEN ACTION]
906
If the <TEST> condition evaluates to true, the selected action is
executed each cycle the test condition remains true. The comparison
value is constant. The recovery action is evaluated only once (on a
failed-to-succeeded state change only). The 'ELSE IF SUCCEEDED' part
is optional; if omitted, Monit will send an alert on recovery. The
alert is sent only once on each state change unless overridden by the
'reminder' alert option.
914
915 A variable test has this general format:
916
917 IF CHANGED <TEST> [[<X>] [TIMES WITHIN] <Y> CYCLES] THEN ACTION
918
919 If the <TEST> should evaluate to true, then the selected action is exe‐
920 cuted once. The comparison value is a variable where the last result
921 becomes the new value and is compared in future cycles. The alert is
922 delivered each time the condition becomes true.
923
You can use this test for alerts or for some automatic action, for
example to reload a monitored process after its configuration file was
changed. Variable tests are supported for 'checksum', 'size', 'pid',
'ppid' and 'timestamp' tests only.
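
To make the distinction concrete, here is a minimal sketch that puts
one constant test and one variable test in the same service entry. The
service name and path are only illustrative; the uid and timestamp
tests themselves are described in their own sections of this manual:

check file sshd_config with path /etc/ssh/sshd_config
    if failed uid root then alert
    if changed timestamp then alert

The first test compares against a fixed value (the owner must be
root). The second compares against the timestamp recorded in the
previous cycle.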
928
929 ... [[<X>] [TIMES WITHIN] <Y> CYCLES] ...
930
If a test matches, its action is executed at once. This behaviour can
optionally be changed: you can, for instance, require that a test must
match over several poll cycles before the action is executed by using
the statement above. You can use this in several ways. For example:
935
936 if failed port 80 for 3 times within 5 cycles then alert
937
938 or
939
940 if failed port 80 for 10 cycles then unmonitor
941
If you don't specify <X> times, it equals <Y> by default; thus the test
matches if it evaluates to true for <Y> consecutive cycles.
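
Following that rule, the two statements below should behave the same
way. This is a sketch based on the default described above, and the
port number is arbitrary:

if failed port 80 for 3 cycles then restart
if failed port 80 for 3 times within 3 cycles then restart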
944
945 It is possible to use this option for failed, succeeded, recovered or
946 changed rules. Here is a more complex example:
947
948 check filesystem rootfs with path /dev/hda1
949 if space usage > 80% for 5 times within 15 cycles
950 then alert else if succeeded for 10 cycles then alert
951 if space usage > 90% for 5 cycles then
952 exec '/try/to/free/the/space'
953 if space usage > 99% then exec '/stop/processess'
954
955 In each test you must select the action to be executed from this list:
956
957 · ALERT sends the user an alert event on each state change (for con‐
958 stant tests) or on each change (for variable tests).
959
960 · RESTART restarts the service and sends an alert. Restart is con‐
961 ducted by first calling the service's registered stop method and
962 then the service's start method.
963
· START starts the service by calling the service's registered start
method and sends an alert.

· STOP stops the service by calling the service's registered stop
method and sends an alert. If Monit stops a service, it will not be
checked by Monit anymore nor restarted again later. To reactivate
monitoring of the service, you must explicitly enable monitoring
from the web interface or from the console, e.g. 'monit monitor
apache'.
973
· EXEC can be used to execute an arbitrary program and send an alert.
If you choose this action, you must state the program to be executed,
and if the program requires arguments, you must enclose the program
and its arguments in a quoted string. You may optionally specify
the uid and gid the executed program should switch to upon start.
For instance:
980
981 exec "/usr/local/tomcat/bin/startup.sh"
982 as uid nobody and gid nobody
983
984 The uid and gid switch can be useful if the program to be started
985 cannot change to a lesser privileged user and group. This is typi‐
986 cally needed for Java Servers. Remember, if Monit is run by the
987 superuser, then all programs executed by Monit will be started with
988 superuser privileges unless the uid and gid extension was used.
989
990 · MONITOR will enable monitoring of the service and send an alert.
991
992 · UNMONITOR will disable monitoring of the service and send an alert.
993 The service will not be checked by Monit anymore nor restarted
994 again later. To reactivate monitoring of the service you must
995 explicitly enable monitoring from monit's web interface or from the
996 console using the monitor argument.
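
The following sketch shows how several of these actions might be
combined in one service entry, escalating from an alert to unmonitor.
The service name, pidfile, init script, port and thresholds are all
hypothetical. The start program and stop program lines register the
start and stop methods that RESTART, START and STOP rely on:

check process myapp with pidfile /var/run/myapp.pid
    start program = "/etc/init.d/myapp start"
    stop program = "/etc/init.d/myapp stop"
    if failed port 8080 then restart
    if cpu > 80% for 5 cycles then alert
    if totalmemory > 500 MB for 10 cycles then stop
    if 3 restarts within 5 cycles then unmonitor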
997
998 RESOURCE TESTING
999
Monit can examine how much system resources a service is using. This
test can only be used within a system or process service entry in the
Monit control file.

Depending on system or process characteristics, services can be stopped
or restarted and alerts can be generated. Thus it is possible to
utilize systems which are idle and to spare systems under high load.
1007
1008 The full syntax for the resource-statements used for resource testing
1009 is as follows (keywords are in capital and optional statements in
1010 [brackets]),
1011
1012 IF resource operator value [[<X>] <Y> CYCLES] THEN action [ELSE IF SUC‐
1013 CEEDED [[<X>] <Y> CYCLES] THEN action]
1014
1015 resource is a choice of "CPU", "TOTALCPU", "CPU([user⎪system⎪wait])",
1016 "MEMORY", "CHILDREN", "TOTALMEMORY", "LOADAVG([1min⎪5min⎪15min])". Some
1017 resource tests can be used inside a check system entry, some in a check
1018 process entry and some in both:
1019
1020 System only resource tests:
1021
CPU([user⎪system⎪wait]) is the percent of time the system spends in
user or system/kernel space. Some systems, such as Linux 2.6, support a
'wait' indicator as well.
1025
1026 Process only resource tests:
1027
1028 CPU is the CPU usage of the process itself (percent).
1029
TOTALCPU is the total CPU usage of the process and its children
(percent). You will typically want to use TOTALCPU for services like
the Apache web server, where one master process forks the child
processes as workers.
1034
1035 CHILDREN is the number of child processes of the process.
1036
1037 TOTALMEMORY is the memory usage of the process and its child processes
1038 in either percent or as an amount (Byte, kB, MB, GB).
1039
1040 System and process resource tests:
1041
1042 MEMORY is the memory usage of the system or of a process (without chil‐
1043 dren) in either percent (of the systems total) or as an amount (Byte,
1044 kB, MB, GB).
1045
1046 LOADAVG([1min⎪5min⎪15min]) refers to the system's load average. The
1047 load average is the number of processes in the system run queue, aver‐
1048 aged over the specified time period.
1049
1050 operator is a choice of "<", ">", "!=", "==" in C notation, "gt", "lt",
1051 "eq", "ne" in shell sh notation and "greater", "less", "equal", "note‐
1052 qual" in human readable form (if not specified, default is EQUAL).
1053
1054 value is either an integer or a real number (except for CHILDREN). For
1055 CPU, TOTALCPU, MEMORY and TOTALMEMORY you need to specify a unit. This
1056 could be "%" or if applicable "B" (Byte), "kB" (1024 Byte), "MB" (1024
1057 KiloByte) or "GB" (1024 MegaByte).
1058
1059 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC",
1060 "MONITOR" or "UNMONITOR".
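
As an illustration of the three operator notations listed above, the
following statements are intended to express the same test. The 60%
threshold is arbitrary:

if cpu > 60% then alert
if cpu gt 60% then alert
if cpu greater 60% then alert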
1061
1062 To calculate the cycles, a counter is raised whenever the expression
1063 above is true and it is lowered whenever it is false (but not below 0).
1064 All counters are reset in case of a restart.
1065
1066 The following is an example to check that the CPU usage of a service is
1067 not going beyond 50% during five poll cycles. If it does, Monit will
1068 restart the service:
1069
1070 if cpu is greater than 50% for 5 cycles then restart
1071
1072 See also the example section below.
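
As a further sketch, the entries below use the system-only tests in a
check system entry and the process-only tests in a check process
entry. The host name, pidfile and thresholds are placeholders and
should be adapted to the machine being monitored:

check system myhost.example.org
    if loadavg (1min) > 4 then alert
    if memory usage > 75% then alert
    if cpu usage (user) > 70% for 2 cycles then alert
    if cpu usage (wait) > 20% for 2 cycles then alert

check process apache with pidfile /var/run/httpd.pid
    if totalcpu > 60% for 2 cycles then alert
    if children > 250 then restart
    if totalmemory > 500 MB for 5 cycles then restart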
1073
1074 FILE CHECKSUM TESTING
1075
1076 The checksum statement may only be used in a file service entry. If
1077 specified in the control file, Monit will compute a md5 or sha1 check‐
1078 sum for a file.
1079
1080 The checksum test in constant form is used to verify that a file does
1081 not change. Syntax (keywords are in capital):
1082
1083 IF FAILED [MD5⎪SHA1] CHECKSUM [EXPECT checksum] [[<X>] <Y> CYCLES] THEN
1084 action [ELSE IF SUCCEEDED [[<X>] <Y> CYCLES] THEN action]
1085
1086 The checksum test in variable form is used to watch for file changes.
1087 Syntax (keywords are in capital):
1088
1089 IF CHANGED [MD5⎪SHA1] CHECKSUM [[<X>] <Y> CYCLES] THEN action
1090
The choice of MD5 or SHA1 is optional. MD5 produces a 128-bit checksum
(32 hexadecimal characters) and SHA1 a 160-bit checksum (40 hexadecimal
characters). If this option is omitted, Monit tries to guess the
method from the EXPECT string or uses MD5 as default.
1094
1095 expect is optional and if used it specifies a md5 or sha1 string Monit
1096 should expect when testing a file's checksum. If expect is used, Monit
1097 will not compute an initial checksum for the file, but instead use the
1098 string you submit. For example:
1099
1100 if failed checksum and
1101 expect the sum 8f7f419955cefa0b33a2ba316cba3659
1102 then alert
1103
1104 You can, for example, use the GNU utility md5sum(1) or sha1sum(1) to
1105 create a checksum string for a file and use this string in the
1106 expect-statement.
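
A sketch of the same idea with an explicit SHA1 checksum follows. The
file name and path are hypothetical, and the 40-character string is
only a placeholder (it is the SHA1 sum of an empty file); replace it
with the output of sha1sum(1) for the real file:

check file myapp.conf with path /etc/myapp.conf
    if failed sha1 checksum
       expect da39a3ee5e6b4b0d3255bfef95601890afd80709
    then alert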
1107
1108 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC",
1109 "MONITOR" or "UNMONITOR".
1110
1111 The checksum statement in variable form may be used to check a file for
1112 changes and if changed, do a specified action. For instance to reload a
1113 server if its configuration file was changed. The following illustrates
1114 this for the apache web server:
1115
1116 check file httpd.conf path /usr/local/apache/conf/httpd.conf
1117 if changed sha1 checksum
1118 then exec "/usr/local/apache/bin/apachectl graceful"
1119
1120 If you plan to use the checksum statement for security reasons, (a very
1121 good idea, by the way) and to monitor a file or files which should not
1122 change, then please use the constant form and also read the DEPENDENCY
1123 TREE section below to see a detailed example on how to do this prop‐
1124 erly.
1125
1126 Monit can also test the checksum for files on a remote host via the
1127 HTTP protocol. See the CONNECTION TESTING section below.
1128
1129 TIMESTAMP TESTING
1130
1131 The timestamp statement may only be used in a file, fifo or directory
1132 service entry.
1133
1134 The timestamp test in constant form is used to verify various timestamp
1135 conditions. Syntax (keywords are in capital):
1136
1137 IF TIMESTAMP [[operator] value [unit]] [[<X>] <Y> CYCLES] THEN action
1138 [ELSE IF SUCCEEDED [[<X>] <Y> CYCLES] THEN action]
1139
1140 The timestamp statement in variable form is simply to test an existing
1141 file or directory for timestamp changes and if changed, execute an
1142 action. Syntax (keywords are in capital):
1143
1144 IF CHANGED TIMESTAMP [[<X>] <Y> CYCLES] THEN action
1145
1146 operator is a choice of "<", ">", "!=", "==" in C notation, "GT", "LT",
1147 "EQ", "NE" in shell sh notation and "GREATER", "LESS", "EQUAL", "NOTE‐
1148 QUAL" in human readable form (if not specified, default is EQUAL).
1149
1150 value is a time watermark.
1151
1152 unit is either "SECOND", "MINUTE", "HOUR" or "DAY" (it is also possible
1153 to use "SECONDS", "MINUTES", "HOURS", or "DAYS").
1154
1155 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC",
1156 "MONITOR" or "UNMONITOR".
1157
The variable timestamp statement is useful for checking a file for
changes and then executing an action. This version was written
particularly with configuration files in mind. For instance, if you
monitor the Apache web server, you can use this statement to reload
Apache if httpd.conf (Apache's configuration file) was changed. Like so:
1163
1164 check file httpd.conf with path /usr/local/apache/conf/httpd.conf
1165 if changed timestamp
1166 then exec "/usr/local/apache/bin/apachectl graceful"
1167
The constant timestamp version is useful for monitoring systems that
report their state by changing the timestamp of certain state files.
For instance, the stored process of the iPlanet Messaging Server
updates the timestamp of the following files:
1172
1173 o stored.ckp
1174 o stored.lcu
1175 o stored.per
1176
1177 If a task should fail, the system keeps the timestamp. To report stored
1178 problems you can use the following statements:
1179
1180 check file stored.ckp with path /msg-foo/config/stored.ckp
1181 if timestamp > 1 minute then alert
1182
1183 check file stored.lcu with path /msg-foo/config/stored.lcu
1184 if timestamp > 5 minutes then alert
1185
1186 check file stored.per with path /msg-foo/config/stored.per
1187 if timestamp > 1 hour then alert
1188
1189 As mentioned above, you can also use the timestamp statement for moni‐
1190 toring directories for changes. If files are added or removed from a
1191 directory, its timestamp is changed:
1192
1193 check directory mydir path /foo/directory
1194 if timestamp > 1 hour then alert
1195
1196 or
1197
1198 check directory myotherdir path /foo/secure/directory
1199 if timestamp < 1 hour then alert
1200
The following example is a hack for restarting a process after a
certain time. Sometimes this is a necessary workaround for some
third-party applications, until the vendor fixes a problem:
1204
1205 check file server.pid path /var/run/server.pid
1206 if timestamp > 7 days
1207 then exec "/usr/local/server/restart-server"
1208
1209 FILE SIZE TESTING
1210
1211 The size statement may only be used in a file service entry. If speci‐
1212 fied in the control file, Monit will compute a size for a file.
1213
1214 The size test in constant form is used to verify various size condi‐
1215 tions. Syntax (keywords are in capital):
1216
1217 IF SIZE [[operator] value [unit]] [[<X>] <Y> CYCLES] THEN action [ELSE
1218 IF SUCCEEDED [[<X>] <Y> CYCLES] THEN action]
1219
1220 The size statement in variable form is simply to test an existing file
1221 for size changes and if changed, execute an action. Syntax (keywords
1222 are in capital):
1223
1224 IF CHANGED SIZE [[<X>] <Y> CYCLES] THEN action
1225
1226 operator is a choice of "<", ">", "!=", "==" in C notation, "GT", "LT",
1227 "EQ", "NE" in shell sh notation and "GREATER", "LESS", "EQUAL", "NOTE‐
1228 QUAL" in human readable form (if not specified, default is EQUAL).
1229
1230 value is a size watermark.
1231
1232 unit is a choice of "B","KB","MB","GB" or long alternatives "byte",
1233 "kilobyte", "megabyte", "gigabyte". If it is not specified, "byte" unit
1234 is assumed by default.
1235
1236 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC",
1237 "MONITOR" or "UNMONITOR".
1238
The variable size test form is useful for checking a file for changes
and sending an alert or executing an action. Monit will register the
size of the file at startup and monitor the file for changes. As soon
as the value changes, Monit will perform the specified action, reset
the registered value to the new value and continue monitoring to test
whether the size changes again.
1245
1246 One example of use for this statement is to conduct security checks,
1247 for instance:
1248
1249 check file su with path /bin/su
1250 if changed size then exec "/sbin/ifconfig eth0 down"
1251
1252 which will "cut the cable" and stop a possible intruder from compromis‐
1253 ing the system further. This test is just one of many you may use to
1254 increase the security awareness on a system. If you plan to use Monit
1255 for security reasons we recommend that you use this test in combination
1256 with other supported tests like checksum, timestamp, and so on.
1257
1258 The constant form of this test can be useful in similar or different
1259 contexts. It can, for instance, be used to test if a certain file size
1260 was exceeded and then alert you or Monit may execute a certain action
1261 specified by you. An example is to use this statement to rotate log
1262 files after they have reached a certain size or to check that a data‐
1263 base file does not grow beyond a specified threshold.
1264
1265 To rotate a log file:
1266
1267 check file myapp.log with path /var/log/myapp.log
1268 if size > 50 MB then
1269 exec "/usr/local/bin/rotate /var/log/myapp.log myapp"
1270
1271 where /usr/local/bin/rotate may be a simple script, such as:
1272
#!/bin/bash
# $1 is the log file to rotate, $2 the name of the process to signal
/bin/mv "$1" "$1.$(date +%y-%m-%d)"
/usr/bin/pkill -HUP "$2"
1276
Or you may use this statement to trigger the logrotate(8) program to
do an "emergency" rotate, or to send an alert if a file grows beyond a
certain size and becomes a known bottleneck because of limits in a
database engine:
1281
1282 check file mydb with path /data/mydatabase.db
1283 if size > 1 GB then alert
1284
1285 This is a more restrictive form of the first example where the size is
1286 explicitly defined (note that the real su size is system dependent):
1287
1288 check file su with path /bin/su
1289 if size != 95564 then exec "/sbin/ifconfig eth0 down"
1290
1291 FILE CONTENT TESTING
1292
The match statement allows you to test the content of a text file by
using regular expressions. This is useful if you need to periodically
test files, such as log files, for certain patterns. If a pattern
matches, Monit raises an alert by default; other actions are also
possible.
1298
1299 The syntax (keywords in capital) for using this test is:
1300
1301 IF [NOT] MATCH {regex⎪path} [[<X>] <Y> CYCLES] THEN action
1302
1303 regex is a string containing the extended regular expression. See also
1304 regex(7).
1305
1306 path is an absolute path to a file containing extended regular expres‐
1307 sion on every line. See also regex(7).
1308
1309 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC",
1310 "MONITOR" or "UNMONITOR".
1311
1312 You can use the NOT statement to invert a match.
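
A minimal sketch of an inverted match follows. The file name and the
pattern are hypothetical; as we read the syntax above, the test is
expected to trigger for lines that do not match the expression:

check file backup.log with path /var/log/backup.log
    if not match "^OK" then alert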
1313
The content is checked only once per cycle. Content that is added and
removed again between two checks goes unnoticed.

On startup the read position is set to the end of the file and Monit
continues to scan to the end of the file on each cycle. If the file
size decreases or the inode changes, the read position is reset to the
start of the file.
1321
1322 Only lines ending with a newline character are inspected. Thus, lines
1323 are being ignored until they have been completed with this character.
1324 Also note that only the first 511 characters of a line are inspected.
1325
1326 IGNORE [NOT] MATCH {regex⎪path}
1327
Lines matching an IGNORE are not inspected during later evaluations.
IGNORE MATCH always has precedence over IF MATCH.
1330
1331 All IGNORE MATCH statements are evaluated first, in the order of their
1332 appearance. Thereafter, all the IF MATCH statements are evaluated.
1333
1334 A real life example might look like this:
1335
1336 check file syslog with path /var/log/syslog
1337 ignore match
1338 "^\w{3} [ :0-9]{11} [._[:alnum:]-]+ monit\[[0-9]+\]:"
1339 ignore match /etc/monit/ignore.regex
1340 if match
1341 "^\w{3} [ :0-9]{11} [._[:alnum:]-]+ mrcoffee\[[0-9]+\]:"
1342 if match /etc/monit/active.regex then alert
1343
1344 FILESYSTEM FLAGS TESTING
1345
Monit can test the flags of a filesystem for changes. This test is
implicit and Monit will send an alert in case of failure by default.

This test is useful for detecting changes of the filesystem flags,
such as when the filesystem became read-only because of disk errors or
when the mount flags were changed (such as nosuid). Each platform
provides a different set of flags. POSIX defines the RDONLY and NOSUID
flags, which should work on all platforms. Some platforms (such as
FreeBSD) have additional flags.
1355
1356 The syntax for the fsflags statement is:
1357
1358 IF CHANGED FSFLAGS [[<X>] <Y> CYCLES] THEN action
1359
1360 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC",
1361 "MONITOR" or "UNMONITOR".
1362
1363 Example:
1364
1365 check filesystem rootfs with path /
1366 if changed fsflags then exec "/my/script"
1367 alert root@localhost
1368
1369 SPACE TESTING
1370
1371 Monit can test file systems for space usage. This test may only be used
1372 within a filesystem service entry in the Monit control file.
1373
Monit will check a filesystem's total space usage. If you only want to
check available space for non-superusers, you must set the watermark
appropriately (i.e. total space minus reserved blocks for the
superuser).

You can obtain (and set) the superuser's reserved blocks size, for
example by using the tune2fs utility on Linux. On Linux, 5% of
available blocks are reserved for the superuser by default. On Solaris,
10% of the blocks are reserved. You can also use tunefs on Solaris to
change values on a live filesystem.
1384
1385 The full syntax for the space statement is:
1386
1387 IF SPACE operator value unit [[<X>] <Y> CYCLES] THEN action [ELSE IF
1388 SUCCEEDED [[<X>] <Y> CYCLES] THEN action]
1389
1390 operator is a choice of "<",">","!=","==" in c notation, "gt", "lt",
1391 "eq", "ne" in shell sh notation and "greater", "less", "equal", "note‐
1392 qual" in human readable form (if not specified, default is EQUAL).
1393
1394 unit is a choice of "B","KB","MB","GB", "%" or long alternatives
1395 "byte", "kilobyte", "megabyte", "gigabyte", "percent".
1396
1397 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC",
1398 "MONITOR" or "UNMONITOR".
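
As a brief sketch of the syntax above, the entry below sends an alert
when usage stays high and stops the service when the filesystem is
nearly full. The service name, device path and thresholds are
placeholders:

check filesystem datafs with path /dev/sdb1
    if space usage > 80% for 5 times within 15 cycles then alert
    if space usage > 95% then stop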
1399
1400 INODE TESTING
1401
If supported by the filesystem, you can use Monit to test for inode
usage. This test may only be used within a filesystem service entry in
the Monit control file.

If the filesystem becomes unavailable, Monit will call the service's
registered start method, if it is defined and if Monit is running in
active mode. If Monit runs in passive mode or the start method is not
defined, Monit will just send an error alert.
1410
1411 The syntax for the inode statement is:
1412
1413 IF INODE(S) operator value [unit] [[<X>] <Y> CYCLES] THEN action [ELSE
1414 IF SUCCEEDED [[<X>] <Y> CYCLES] THEN action]
1415
1416 operator is a choice of "<",">","!=","==" in c notation, "gt", "lt",
1417 "eq", "ne" in shell sh notation and "greater", "less", "equal", "note‐
1418 qual" in human readable form (if not specified, default is EQUAL).
1419
1420 unit is optional. If not specified, the value is an absolute count of
1421 inodes. You can use the "%" character or the longer alternative "per‐
1422 cent" as a unit.
1423
1424 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC",
1425 "MONITOR" or "UNMONITOR".
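
A short sketch of both unit forms follows: the first test uses a
percentage, the second an absolute inode count. The service name,
device path and limits are placeholders:

check filesystem datafs with path /dev/sdb1
    if inode usage > 80% then alert
    if inode usage > 30000 then alert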
1426
1427 PERMISSION TESTING
1428
1429 Monit can monitor the permission of file objects. This test may only be
1430 used within a file, fifo, directory or filesystem service entry in the
1431 Monit control file.
1432
1433 The syntax for the permission statement is:
1434
1435 IF FAILED PERM(ISSION) octalnumber [[<X>] <Y> CYCLES] THEN action [ELSE
1436 IF SUCCEEDED [[<X>] <Y> CYCLES] THEN action]
1437
1438 octalnumber defines permissions for a file, a directory or a filesystem
1439 as four octal digits (0-7). Valid range: 0000 - 7777 (you can omit the
1440 leading zeros, Monit will add the zeros to the left thus for example
1441 "640" is valid value and matches "0640").
1442
1443 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC",
1444 "MONITOR" or "UNMONITOR".
1445
1446 The web interface will show a permission warning if the test failed.
1447
We recommend that you use the UNMONITOR action in a permission
statement. The rationale is security: it ensures that Monit does not
start a possibly compromised program or script. Example:
1451
1452 check file monit.bin with path "/usr/local/bin/monit"
1453 if failed permission 0555 then unmonitor
1454
1455 If the test fails, Monit will simply send an alert and stop monitoring
1456 the file and propagate an unmonitor action upward in a depend tree.
1457
1458 UID TESTING
1459
1460 Monit can monitor the owner user id (uid) of a file object. This test
1461 may only be used within a file, fifo, directory or filesystem service
1462 entry in the Monit control file.
1463
1464 The syntax for the uid statement is:
1465
1466 IF FAILED UID user [[<X>] <Y> CYCLES] THEN action [ELSE IF SUCCEEDED
1467 [[<X>] <Y> CYCLES] THEN action]
1468
1469 user defines a user id either in numeric or in string form.
1470
1471 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC",
1472 "MONITOR" or "UNMONITOR".
1473
1474 The web interface will show a uid warning if the test should fail.
1475
We recommend that you use the UNMONITOR action in a uid statement. The
rationale is security: it ensures that Monit does not start a possibly
compromised program or script. Example:
1479
1480 check file passwd with path /etc/passwd
1481 if failed uid root then unmonitor
1482
1483 If the test fails, Monit will simply send an alert and stop monitoring
1484 the file and propagate an unmonitor action upward in a depend tree.
1485
1486 GID TESTING
1487
1488 Monit can monitor the owner group id (gid) of file objects. This test
1489 may only be used within a file, fifo, directory or filesystem service
1490 entry in the Monit control file.
1491
1492 The syntax for the gid statement is:
1493
1494 IF FAILED GID user [[<X>] <Y> CYCLES] THEN action [ELSE IF SUCCEEDED
1495 [[<X>] <Y> CYCLES] THEN action]
1496
1497 user defines a group id either in numeric or in string form.
1498
1499 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC",
1500 "MONITOR" or "UNMONITOR".
1501
1502 The web interface will show a gid warning if the test should fail.
1503
We recommend that you use the UNMONITOR action in a gid statement. The
rationale is security: it ensures that Monit does not start a possibly
compromised program or script. Example:
1507
1508 check file shadow with path /etc/shadow
1509 if failed gid root then unmonitor
1510
1511 If the test fails, Monit will simply send an alert and stop monitoring
1512 the file and propagate an unmonitor action upward in a depend tree.
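
The permission, uid and gid tests can be combined in a single service
entry. The following is only a sketch: the program path, mode, owner
and group are assumptions and must match what is actually expected on
your system:

check file myapp.bin with path /usr/local/bin/myapp
    if failed permission 0755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor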
1513
1514 PID TESTING
1515
Monit can test the process identification number (pid) of a process for
changes. This test is implicit and Monit will send an alert in the case
of failure by default.
1519
1520 The syntax for the pid statement is:
1521
1522 IF CHANGED PID [[<X>] <Y> CYCLES] THEN action
1523
1524 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC",
1525 "MONITOR" or "UNMONITOR".
1526
This test is useful to detect possible process restarts which have
occurred in the timeframe between two Monit testing cycles. If the
restart was fast and the process provides the expected service (i.e.
all tests succeeded), you will be notified that the process was
replaced.

For example, the sshd daemon can restart very quickly. Thus, if someone
changes its configuration and restarts sshd outside of Monit's control,
you will be notified that the process was replaced by a new instance
(or you can optionally take some other action, such as preventively
stopping sshd).
1538
1539 Another example is a MySQL Cluster which has its own watchdog with
1540 process restart ability. You can use Monit for redundant monitoring.
1541
1542 Example:
1543
1544 check process sshd with pidfile /var/run/sshd.pid
1545 if changed pid then exec "/my/script"
1546
1547 PPID TESTING
1548
Monit can test the parent process identification number (ppid) of a
process for changes. This test is implicit and Monit will send an
alert in the case of failure by default.
1552
1553 The syntax for the ppid statement is:
1554
1555 IF CHANGED PPID [[<X>] <Y> CYCLES] THEN action
1556
1557 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC",
1558 "MONITOR" or "UNMONITOR".
1559
1560 This test is useful for detecting changes of a process parent.
1561
1562 Example:
1563
1564 check process myproc with pidfile /var/run/myproc.pid
1565 if changed ppid then exec "/my/script"
1566
1567 CONNECTION TESTING
1568
1569 Monit is able to perform connection testing via networked ports or via
1570 Unix sockets. A connection test may only be used within a process or
1571 within a host service entry in the Monit control file.
1572
1573 If a service listens on one or more sockets, Monit can connect to the
1574 port (using either tcp or udp) and verify that the service will accept
1575 a connection and that it is possible to write and read from the socket.
1576 If a connection is not accepted or if there is a problem with socket
1577 i/o, Monit will assume that something is wrong and execute a specified
1578 action. If Monit is compiled with openssl, then ssl based network ser‐
1579 vices can also be tested.
1580
1581 The full syntax for the statement used for connection testing is as
1582 follows (keywords are in capital and optional statements in [brack‐
1583 ets]),
1584
1585 IF FAILED [host] port [type] [protocol⎪{send/expect}+] [timeout] [[<X>]
1586 <Y> CYCLES] THEN action [ELSE IF SUCCEEDED [[<X>] <Y> CYCLES] THEN
1587 action]
1588
1589 or for Unix sockets,
1590
1591 IF FAILED [unixsocket] [type] [protocol⎪{send/expect}+] [timeout]
1592 [[<X>] <Y> CYCLES] THEN action [ELSE IF SUCCEEDED [[<X>] <Y> CYCLES]
1593 THEN action]
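
As a rough illustration of the two forms above, the first entry tests
network ports and the second a Unix socket. The host name, pidfiles,
ports and socket path are hypothetical:

check process apache with pidfile /var/run/httpd.pid
    if failed host www.example.org port 80 protocol http then restart
    if failed port 443 type tcpssl protocol http then alert

check process myapp with pidfile /var/run/myapp.pid
    if failed unixsocket /var/run/myapp.sock then restart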
1594
host:HOST hostname. Optionally specify the host to connect to. If the
host is not given, then localhost is assumed if this test is used
inside a process entry. If this test is used inside a remote host
entry, then the entry's remote host is assumed. Although host is
intended for testing name-based virtual hosts in an HTTP server running
on a local or remote host, it does allow the connection statement to be
used to test a server running on another machine. This may be useful:
for instance, if you use Apache httpd as a front-end and an application
server as the back-end running on another machine, this statement may
be used to test that the back-end server is running and, if not, raise
an alert.
1605
1606 port:PORT number. The port number to connect to.
1607
1608 unixsocket:UNIXSOCKET PATH. Specifies the path to a Unix socket.
1609 Servers based on Unix sockets always run on the local machine and
1610 do not use a port.
1611
1612 type:TYPE {TCP⎪UDP⎪TCPSSL}. Optionally specify the socket type Monit
1613 should use when trying to connect to the port. The different socket
1614 types are TCP, UDP and TCPSSL, where TCP is a regular stream-based
1615 socket, UDP is a datagram socket and TCPSSL specifies that Monit should
1616 use a TCP socket with SSL when connecting to a port. The default socket
1617 type is TCP. If TCPSSL is used you may optionally specify the SSL/TLS
1618 protocol to be used and the md5 sum of the server's certificate. The
1619 TCPSSL options are:
1620
1621 TCPSSL [SSLAUTO⎪SSLV2⎪SSLV3⎪TLSV1] [CERTMD5 md5sum]
1622
1623 proto(col):PROTO {protocols}. Optionally specify the protocol Monit
1624 should speak when a connection is established. At the moment Monit
1625 knows how to speak:
1626 APACHE-STATUS
1627 DNS
1628 DWP
1629 FTP
1630 GPS
1631 HTTP
1632 IMAP
1633 CLAMAV
1634 LDAP2
1635 LDAP3
1636 LMTP
1637 MYSQL
1638 NNTP
1639 NTP3
1640 POP
1641 POSTFIX-POLICY
1642 RADIUS
1643 RDATE
1644 RSYNC
1645 SIP
1646 SMTP
1647 SSH
1648 TNS
1649 PGSQL
1650
1651 If you have compiled Monit with ssl support, Monit can also speak the
1652 SSL variants HTTPS, FTPS, POPS and IMAPS. To use the SSL protocol
1653 support you need to define the socket as SSL and use the general
1654 protocol name, for example in the case of HTTPS:
1655
1656 TYPE TCPSSL PROTOCOL HTTP
1657
1658 If the server's protocol is not found in this list, simply do not specify
1659 the protocol and Monit will use a default test, which includes testing
1660 whether it is possible to read from and write to the port. This default test
1661 is in most cases good enough to tell whether the server behind the port is up.
1662
1663 The protocol statement is:
1664
1665 PROTO(COL) {name}
1666
1667 HTTP protocol supports additional options:
1668 o REQUEST
1669 o HOSTHEADER
1670 o CHECKSUM
1671
1672 PROTO(COL) HTTP [REQUEST {"/path"}] [with HOSTHEADER "string"] [with CHECKSUM checksum]
1673
1674 When the hostheader option is not specified for the HTTP protocol, the value of the
1675 host option, which names the target host to connect to, is used by default. The Host
1676 header can be set explicitly when you need to test a name-based virtual host by address.
1677
1678 Examples:
1679
1680 if failed host 192.168.1.100 port 8080 protocol http and request '/testing' hostheader 'example.com' with timeout 20 seconds for 2 cycles then alert
1681 if failed host 192.168.1.101 port 8080 protocol http and request '/testing' hostheader 'example.com' with timeout 20 seconds for 2 cycles then alert
1682 if failed host 192.168.1.102 port 8080 protocol http and request '/testing' hostheader 'example.com' with timeout 20 seconds for 2 cycles then alert
1683
1684 In addition to the standard protocols, the APACHE-STATUS protocol is a
1685 test of a specific server type, rather than a generic protocol. Server
1686 performance is examined using the status page generated by Apache's
1687 mod_status, which is expected to be at its default address of
1688 http://www.example.com/server-status. Currently the APACHE-STATUS pro‐
1689 tocol examines the percentage of Apache child processes which are
1690
1691 o logging (loglimit)
1692 o closing connections (closelimit)
1693 o performing DNS lookups (dnslimit)
1694 o in keepalive with a client (keepalivelimit)
1695 o replying to a client (replylimit)
1696 o receiving a request (requestlimit)
1697 o initialising (startlimit)
1698 o waiting for incoming connections (waitlimit)
1699 o gracefully closing down (gracefullimit)
1700 o performing cleanup procedures (cleanuplimit)
1701
1702 Each of these quantities can be compared against a value relative to
1703 the total number of active Apache child processes. If the comparison
1704 expression is true the chosen action is performed.
1705
1706 The apache-status protocol statement is formally defined as (keywords
1707 in uppercase):
1708
1709 PROTO(COL) {limit} OP PERCENT [OR {limit} OP PERCENT]*
1710
1711 where {limit} is one or more of: loglimit, closelimit, dnslimit,
1712 keepalivelimit, replylimit, requestlimit, startlimit, waitlimit,
1713 gracefullimit or cleanuplimit. The operator OP is one of: [<⎪=⎪>].
1714
1715 You can combine all of these tests into one expression or you can choose
1716 to test a single limit. If you combine the limits you must join them
1717 using the OR keyword.
1718
1719 Here's an example where we test for a loglimit of more than 10 percent, a
1720 dnslimit over 50 percent and a waitlimit of less than 20 percent of
1721 processes. See also more examples below in the example section.
1722
1723 protocol apache-status
1724 loglimit > 10% or
1725 dnslimit > 50% or
1726 waitlimit < 20%
1727 then alert
1728
1729 Obviously, do not use this test unless the httpd server you are testing
1730 is Apache Httpd and mod_status is activated on the server.
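
The way the OR-ed limits combine can be pictured with a short Python
sketch. This is not Monit's implementation. It assumes that each limit
corresponds to one key letter of the mod_status scoreboard (the letters
are those documented by Apache; pairing them with Monit's limit names
is an assumption made here), and that percentages are taken relative to
the active children only. The function names are hypothetical.

```python
# Scoreboard key letters as documented for Apache mod_status.
# Pairing them with Monit's limit names is an assumption of this sketch.
LIMIT_KEYS = {
    "loglimit": "L",
    "closelimit": "C",
    "dnslimit": "D",
    "keepalivelimit": "K",
    "replylimit": "W",
    "requestlimit": "R",
    "startlimit": "S",
    "waitlimit": "_",
    "gracefullimit": "G",
    "cleanuplimit": "I",
}


def percentages(scoreboard):
    """Return the share (0-100) of active children in each state."""
    active = [c for c in scoreboard if c != "."]  # "." is an open slot
    total = len(active)
    if total == 0:
        return {name: 0.0 for name in LIMIT_KEYS}
    return {
        name: 100.0 * active.count(key) / total
        for name, key in LIMIT_KEYS.items()
    }


def should_act(scoreboard, tests):
    """tests is a list of (limit, op, percent) tuples, OR-ed together."""
    ops = {
        "<": lambda a, b: a < b,
        "=": lambda a, b: a == b,
        ">": lambda a, b: a > b,
    }
    shares = percentages(scoreboard)
    return any(ops[op](shares[limit], pct) for limit, op, pct in tests)
```

With the three tests from the example above, a scoreboard in which 20%
of the active children are logging triggers the action, because a
single true comparison is enough.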
1731
1732 send/expect: {SEND⎪EXPECT} "string" .... If Monit does not support the
1733 protocol spoken by the server, you can write your own protocol-test
1734 using send and expect strings. The SEND statement sends a string to the
1735 server port and the EXPECT statement compares a string read from the
1736 server with the string given in the expect statement. If your system
1737 supports POSIX regular expressions, you can use regular expressions in
1738 the expect string, see regex(7) to learn more about the types of regu‐
1739 lar expressions you can use in an expect string. Otherwise the string
1740 is used as it is. The send/expect statement is:
1741
1742 [{SEND⎪EXPECT} "string"]+
1743
1744 Note that Monit will send a string as it is, and you must remember to
1745 include CR and LF in the string sent to the server if the protocol
1746 expects such characters to terminate a string (most text-based protocols
1747 used over the Internet do). Likewise Monit will read up to 256 bytes from
1748 the server and use this string when comparing the expect string. If the
1749 server sends strings terminated by CRLF (i.e. "\r\n"), remember
1750 to add the same terminating characters to the string you expect from
1751 the server.
1752
1753 As mentioned above, Monit limits the expect input to 256 bytes. You
1754 can override the default value by using this set statement at the top
1755 of the Monit configuration file:
1756
1757 SET EXPECTBUFFER <number> ["b"⎪"kb"⎪"mb"]
1758
1759 For example, to set the expect buffer to read 10 kilobytes:
1760
1761 set expectbuffer 10 kb
1762
1763 Note, if you want to test the number of bytes returned from the server
1764 you need to work around a bound check limit in POSIX regex. You cannot
1765 use something like expect ".{5000}" as the maximum number in a bound
1766 is usually 255. However, this should work: expect "(.{250}){20}".
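
The arithmetic behind this workaround (250 x 20 = 5000) can be
generalized. The following Python sketch uses a hypothetical helper
name and assumes the bound limit is 255. It splits a byte count into
two nested bounds that each stay within the limit. Note that Python's
re module is not POSIX regex and is used here only to confirm that the
nested pattern matches the intended length.

```python
def bounded_expect(n, limit=255):
    """Return a pattern "(.{a}){b}" with a * b == n and a, b <= limit."""
    if n < 1:
        raise ValueError("n must be positive")
    # Try the largest inner bound first so the outer bound stays small.
    for inner in range(min(n, limit), 0, -1):
        outer, remainder = divmod(n, inner)
        if remainder == 0 and outer <= limit:
            return "(.{%d}){%d}" % (inner, outer)
    raise ValueError("%d cannot be split into two bounds <= %d" % (n, limit))
```

For 5000 bytes this yields the same expression as in the text above. A
count that has no suitable pair of factors (for example a prime larger
than the limit) raises an error, and would need a different pattern.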
1767
1768 You can use non-printable characters in a send string if needed. Use
1769 the hex notation, \0xHEXHEX to send any char in the range \0x00-\0xFF,
1770 that is, 0-255 in decimal. This may be useful when testing some network
1771 protocols, particularly those over UDP. For example, to test a quake 3
1772 server you can use the following,
1773
1774 send "\0xFF\0xFF\0xFF\0xFFgetstatus"
1775 expect "sv_floodProtect⎪sv_maxPing"
1776
1777 Finally, send/expect can be used with any socket type, such as TCP
1778 sockets, UNIX sockets and UDP sockets.
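
To make the send/expect mechanics concrete, here is a minimal Python
sketch of such an exchange. It is an illustration only, not Monit's
code: the function names are hypothetical, Python's re module stands in
for POSIX regular expressions, and the read size of 256 bytes mirrors
the default expect buffer described above.

```python
import re
import socket

_HEX = re.compile(r"\\0x([0-9A-Fa-f]{2})")


def decode_send_string(text):
    """Turn \\0xHH escapes into raw bytes; other characters pass through."""
    out = bytearray()
    pos = 0
    for match in _HEX.finditer(text):
        out += text[pos:match.start()].encode("latin-1")
        out.append(int(match.group(1), 16))
        pos = match.end()
    out += text[pos:].encode("latin-1")
    return bytes(out)


def send_expect(sock, steps, bufsize=256):
    """Run ("send", text) and ("expect", pattern) steps in order.

    Returns True if every expect pattern matched what was read.
    """
    for kind, text in steps:
        if kind == "send":
            sock.sendall(decode_send_string(text))
        elif kind == "expect":
            reply = sock.recv(bufsize).decode("latin-1")
            if re.search(text, reply) is None:
                return False
        else:
            raise ValueError("unknown step: %r" % (kind,))
    return True
```

The sketch works on any connected socket object, which reflects the
statement above that send/expect is independent of the socket type.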
1779
1780 timeout:with TIMEOUT x SECONDS. Optionally specifies the connect and
1781 read timeout for the connection. If Monit cannot connect to the server
1782 within this time it will assume that the connection failed and execute
1783 the specified action. The default connect timeout is 5 seconds.
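
The effect of the connect timeout can be sketched in Python as follows.
This is a simplified illustration with a hypothetical function name: it
only checks that a TCP connection is accepted within the timeout,
whereas Monit also verifies that the socket can be written to and read
from.

```python
import socket


def port_is_up(host, port, timeout=5.0):
    """Return True if a TCP connection is accepted within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
```

A refused connection fails immediately, while an unresponsive host
fails only after the full timeout has elapsed, which is why a longer
timeout makes each failing cycle take longer.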
1784
1785 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC",
1786 "MONITOR" or "UNMONITOR".
1787
1788 Connection testing using the URL notation
1789
1790 You can test an HTTP server using the compact URL syntax. This test also
1791 allows you to use POSIX regular expressions to test the content returned
1792 by the HTTP server.
1793
1794 The full syntax for the URL statement is as follows (keywords are in
1795 capitals and optional statements in [brackets]):
1796
1797 IF FAILED URL URL-spec
1798 [CONTENT {==⎪!=} "regular-expression"]
1799 [TIMEOUT number SECONDS] [[<X>] <Y> CYCLES]
1800 THEN action
1801 [ELSE IF SUCCEEDED [[<X>] <Y> CYCLES] THEN action]
1802
1803 Where URL-spec is a URL in the standard form as specified in RFC 2396:
1804
1805 <protocol>://<authority><path>?<query>
1806
1807 Here is an example of a URL where all components are used:
1808
1809 http://user:password@www.foo.bar:8080/document/?querystring#ref
1810
1811 If a username and password are included in the URL, Monit will attempt to
1812 log in at the server using Basic Authentication.
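
To see how such a URL breaks down into its components, the following
Python sketch uses the standard library's URL parser. It only
illustrates the URL structure and says nothing about how Monit parses
the URL internally. The function name is hypothetical.

```python
from urllib.parse import urlsplit


def url_components(url):
    """Split a URL into the parts named in the URL-spec above."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }
```

In the example URL above, the authority part is
user:password@www.foo.bar:8080, which holds the credentials, the host
and the port.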
1813
1814 Testing the content returned by the server is optional. If used, you
1815 can test if the content matches or does not match a regular expression.
1816 Here's an example of how the URL statement can be used in a
1817 service entry:
1818
1819 check host FOO with address www.foo.bar
1820 if failed url
1821 http://user:password@www.foo.bar:8080/?querystring
1822 and content == 'action="j_security_check"'
1823 then ...
1824
1825 Monit will look at the content-length header returned by the server and
1826 download this amount before testing the content. However, if the
1827 content-length is more than 1 Mb or this header is not set by the server,
1828 Monit will download at most 1 Mb.
1829
1830 Only the http(s) protocol is supported in an URL statement. If the pro‐
1831 tocol is https Monit will use SSL when connecting to the server.
1832
1833 Remote host ping test
1834
1835 In addition Monit can perform ICMP Echo tests in remote host checks.
1836 The icmp test may only be used in a check host entry and Monit must run
1837 with super user privileges, that is, the root user must run monit. The
1838 reason is that the icmp test uses a raw socket to send the icmp
1839 packet and only the super user is allowed to create a raw socket.
1840
1841 The full syntax for the ICMP Echo statement used for ping testing is as
1842 follows (keywords are in capitals and optional statements in
1843 [brackets]):
1844
1845 IF FAILED ICMP TYPE ECHO
1846 [COUNT number] [WITH] [TIMEOUT number SECONDS]
1847 [[<X>] <Y> CYCLES]
1848 THEN action
1849 [ELSE IF SUCCEEDED [[<X>] <Y> CYCLES] THEN action]
1850
1851 The rules for action and timeout are the same as those mentioned above
1852 in the CONNECTION TESTING section. The count parameter specifies how
1853 many consecutive echo requests will be sent to the host in one cycle.
1854 If no reply arrives within the timeout, Monit reports an
1855 error. When at least one reply is received, the test passes. By default
1856 Monit sends three echo requests in one cycle to prevent random
1857 packet loss from generating a false alarm (i.e. up to 66% packet loss is
1858 tolerated). You can set the count option to a value between 1 and 20,
1859 which can serve as an error ratio. For example if you require 100% ping
1860 success, set the count to 1 (i.e. just one request will be sent, and if
1861 the packet is lost an error will be reported).
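
The relation between the count option and the tolerated packet loss is
simple arithmetic: the test passes as long as one reply arrives, so up
to count - 1 requests may be lost. A small Python sketch, with
hypothetical function names:

```python
def tolerated_loss(count):
    """Largest packet loss, in percent, at which the ping test still passes."""
    if not 1 <= count <= 20:
        raise ValueError("count must be between 1 and 20")
    return 100.0 * (count - 1) / count


def ping_test_passes(replies_received):
    """The test passes when at least one echo reply was received."""
    return replies_received >= 1
```

For the default count of 3 this gives two lost requests out of three,
which is the 66% mentioned above. A count of 1 tolerates no loss at
all.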
1862
1863 An icmp ping test is useful for testing if a host is up, before testing
1864 ports at the host. If an icmp ping test is used in a check host entry,
1865 this test is run first and if the ping test should fail we assume that
1866 the connection to the host is down and Monit does not continue to test
1867 any ports. Here's an example:
1868
1869 check host xyzzy with address xyzzy.org
1870 if failed icmp type echo count 5 with timeout 15 seconds
1871 then alert
1872 if failed port 80 proto http then alert
1873 if failed port 443 type TCPSSL proto http then alert
1874 alert foo@bar
1875
1876 In this case, if the icmp test should fail you will get one alert and
1877 only one alert as long as the host is down, and equally important,
1878 Monit will not test port 80 and port 443. Likewise if the icmp ping
1879 test should succeed (again) Monit will continue to test both port 80
1880 and 443.
1881
1882 Keep in mind though that some firewalls can block icmp packets and
1883 thus render the test useless.
1884
1885 Examples
1886
1887 To check a port connection and receive an alert if Monit cannot connect
1888 to the port, use the following statement:
1889
1890 if failed port 80 then alert
1891
1892 In this case the machine in question is assumed to be the default host.
1893 For a process entry it's localhost and for a remote host entry it's the
1894 address of the remote host. Monit will connect to the
1895 host at port 80 and use tcp by default. If you want to connect with
1896 udp, you can specify this after the port statement:
1897
1898 if failed port 53 type udp protocol dns then alert
1899
1900 Monit will stop trying to connect to the port after 5 seconds and
1901 assume that the server behind the port is down. You may increase or
1902 decrease the connect timeout by explicitly adding a connection timeout. In
1903 the following example the timeout is increased to 15 seconds and if
1904 Monit cannot connect to the server within 15 seconds the test will fail
1905 and an alert message is sent.
1906
1907 if failed port 80 with timeout 15 seconds then alert
1908
1909 If a server is listening to a Unix socket the following statement can
1910 be used:
1911
1912 if failed unixsocket /var/run/sophie then alert
1913
1914 A Unix socket is used by some servers for fast (interprocess) communi‐
1915 cation on localhost only. A Unix socket is specified by a path and in
1916 the example above the path, /var/run/sophie, specifies a Unix socket.
1917
1918 If your machine answers for several virtual hosts you can prefix the
1919 port statement with a host-statement like so:
1920
1921 if failed host www.sol.no port 80 then alert
1922 if failed host 80.69.226.133 port 443 then alert
1923 if failed host kvasir.sol.no port 80 then alert
1924
1925 And as mentioned above, if you do not specify a host-statement,
1926 localhost or the address of the remote host entry is assumed.
1927
1928 Monit also knows how to speak some of the more popular Internet proto‐
1929 cols. So, besides testing for connections, Monit can also speak with
1930 the server in question to verify that the server works. For example,
1931 the following is used to test a http server:
1932
1933 if failed host www.tildeslash.com port 80 proto http
1934 then restart
1935
1936 Some protocols also support a request statement. This statement can be
1937 used to ask the server for a special document entity.
1938
1939 Currently only the HTTP protocol module supports the request statement,
1940 such as:
1941
1942 if failed host www.myhost.com port 80 protocol http
1943 and request "/data/show.php?a=b&c=d"
1944 then restart
1945
1946 The request must contain an URI string specifying a document from the
1947 http server. The string will be URL encoded by Monit before it sends
1948 the request to the http server, so it's okay to use URL unsafe charac‐
1949 ters in the request. If the request statement isn't specified, the
1950 default web server page will be requested.
1951
1952 You can override the default Host header in the HTTP request:
1953
1954 if failed host 192.168.1.100 port 80 protocol http
1955 hostheader "example.com"
1956 then alert
1957
1958 You can also test the checksum for documents returned by a http server.
1959 You can use either MD5 sums:
1960
1961 if failed port 80 protocol http
1962 and request "/page.html"
1963 with checksum 8f7f419955cefa0b33a2ba316cba3659
1964 then alert
1965
1966 Or you can use SHA1 sums:
1967
1968 if failed port 80 protocol http
1969 and request "/page.html"
1970 with checksum e428302e260e0832007d82de853aa8edf19cd872
1971 then alert
1972
1973 Monit will compute a checksum (either MD5 or SHA1 is used, depending on
1974 the length of the hash) for the document (in the above case, /page.html)
1975 and compare the computed checksum with the expected checksum. If the
1976 sums do not match then the if-test's action is performed, in this case
1977 alert. Note that Monit will not test the checksum for a document if the
1978 server does not set the HTTP Content-Length header. An HTTP server
1979 should set this header when it serves a static document (i.e. a file).
1980 A server will often use chunked transfer encoding instead when serving
1981 dynamic content (e.g. a document created by a CGI-script or a Servlet),
1982 but testing the checksum for dynamic content is not very useful. There
1983 is no limit on the document size, but keep in mind that Monit
1984 needs time to download the document over the network, so it's probably
1985 smart not to ask Monit to compute a checksum for documents larger
1986 than 1 Mb or so, depending on your network connection of course. Tip: if
1987 you get a checksum error even if the document has the correct sum, the
1988 reason may be that the download timed out. In this case, explicitly set a
1989 longer timeout than the default 5 seconds.
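
To obtain a value for the checksum statement, or to reproduce the
comparison by hand, you can compute the digest of the document
yourself. The following Python sketch picks the algorithm from the
length of the expected hash (32 hex digits for MD5, 40 for SHA1), as
described above. The function name is hypothetical, and the sketch is
not Monit's own code.

```python
import hashlib


def checksum_matches(body, expected):
    """Compare a document body (bytes) against an MD5 or SHA1 hex digest."""
    expected = expected.strip().lower()
    if len(expected) == 32:
        computed = hashlib.md5(body).hexdigest()
    elif len(expected) == 40:
        computed = hashlib.sha1(body).hexdigest()
    else:
        raise ValueError("expected 32 (MD5) or 40 (SHA1) hex digits")
    return computed == expected
```

Here body would be the exact bytes of the document as served, for
example the content of /page.html read in binary mode.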
1990
1991 As mentioned above, if the server protocol is not supported by Monit
1992 you can write your own protocol test using send/expect strings. Here we
1993 show a protocol test using send/expect for an imaginary "Ali Baba and
1994 the Forty Thieves" protocol:
1995
1996 if failed host cave.persia.ir port 4040
1997 send "Open, Sesame!\r\n"
1998 expect "Please enter the cave\r\n"
1999 send "Shut, Sesame!\r\n"
2000 expect "See you later [A-Za-z ]+\r\n"
2001 then restart
2002
2003 The TCPSSL statement can optionally test the md5 sum of the server's
2004 certificate. You must state the md5 certificate string you expect the
2005 server to deliver; upon connecting to the server, the server's actual
2006 md5 certificate sum is tested. Any symbol other than [A-Fa-f0-9]
2007 in that string is ignored. Thus it is possible to copy and paste
2008 the output of e.g. openssl. If the sums do not match, the connection test
2009 fails. If the ssl version handshake does not work properly you can also
2010 force a specific ssl version, as we demonstrate in this example:
2011
2012 if failed host shop.sol.no port 443
2013 type TCPSSL SSLV3 # Force Monit to use ssl version 3
2014 # We expect the server to return this md5 certificate sum
2015 # as either 12-34-56-78-90-AB-CD-EF-12-34-56-78-90-AB-CD-EF
2016 # or e.g. 1234567890ABCDEF1234567890ABCDEF
2017 # or e.g. 1234567890abcdef1234567890abcdef
2018 # whichever is more convenient (see text above)
2019 CERTMD5 12-34-56-78-90-AB-CD-EF-12-34-56-78-90-AB-CD-EF
2020 protocol http
2021 then restart
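
The rule that every symbol outside [A-Fa-f0-9] is ignored means that
the three spellings in the comments above are equivalent. The following
Python sketch shows that normalization. The function names are
hypothetical, and the assumption that the sum is the MD5 digest of the
DER-encoded certificate reflects how openssl computes fingerprints; it
is not taken from Monit's source.

```python
import hashlib
import re


def normalize_fingerprint(text):
    """Drop everything except hex digits and fold to lower case."""
    return re.sub(r"[^A-Fa-f0-9]", "", text).lower()


def fingerprint_matches(expected, der_bytes):
    """Compare an expected md5 string with the digest of certificate bytes.

    Assumption: der_bytes is the DER encoding of the server certificate.
    """
    return normalize_fingerprint(expected) == hashlib.md5(der_bytes).hexdigest()
```

Because letters such as "D" or "F" are themselves hex digits, paste
only the digest part of the openssl output, not a label that precedes
it.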
2022
2023 Here's an example where a connection test is used inside a process
2024 entry:
2025
2026 check process apache with pidfile /var/run/apache.pid
2027 start program = "/etc/init.d/httpd start"
2028 stop program = "/etc/init.d/httpd stop"
2029 if failed host www.tildeslash.com port 80 then restart
2030
2031 Here, a connection test is used in a remote host entry:
2032
2033 check host up2date with address ftp.redhat.com
2034 if failed port 21 and protocol ftp then alert
2035
2036 Since we did not explicitly specify a host in the above test, Monit will
2037 connect to port 21 at ftp.redhat.com. Apropos, the host address can be
2038 specified as a dotted IP address string or as hostname in the DNS. The
2039 following is exactly[*] the same test, but here an ip address is used
2040 instead:
2041
2042 check host up2date with address 66.187.232.30
2043 if failed port 21 and protocol ftp then alert
2044
2045 [*] Well, not quite, since we specify an ip-address directly we will
2046 bypass any DNS round-robin setup, but that's another story.
2047
2048 Testing the SIP protocol
2049
2050 The SIP protocol is used by communication platform servers such as
2051 Asterisk and FreeSWITCH.
2052
2053 The SIP test is similar to the other protocol tests, but in addition
2054 allows extra optional parameters.
2055
2056 IF FAILED [host] [port] [type] PROTOCOL sip [AND] [TARGET valid@uri]
2057 [AND] [MAXFORWARD n] THEN action [ELSE IF SUCCEEDED [[<X>] <Y> CYCLES]
2058 THEN action]
2059
2060 TARGET : you may specify an alternative recipient for the message, by
2061 adding a valid sip uri after this keyword.
2062
2063 MAXFORWARD : Limit the number of proxies or gateways that can forward
2064 the request to the next server. Its value is an integer in the range
2065 0-255, set by default to 70. If max-forward = 0, the next server may
2066 respond 200 OK (test succeeded) or send a 483 Too Many Hops (test
2067 failed).
2068
2069 SIP examples:
2070
2071 check host openser_all with address 127.0.0.1
2072 if failed port 5060 type udp protocol sip
2073 with target "localhost:5060" and maxforward 6
2074 then alert
2075
2076 If sips is supported, that is, sip over ssl, specify tcpssl as the con‐
2077 nection type.
2078
2079 check host fwd.pulver.com with address fwd.pulver.com
2080 if failed port 5060 type tcpssl protocol SIP
2081 and target 613@fwd.pulver.com maxforward 10
2082 then alert
2083
2084 For more examples, see the example section below.
2085
2086 Testing the RADIUS protocol
2087
2088 The RADIUS test is similar to the other protocol tests, but in addition
2089 allows an extra optional parameter.
2090
2091 IF FAILED [host] [port] [type] PROTOCOL radius [SECRET string] THEN
2092 action [ELSE IF SUCCEEDED [[<X>] <Y> CYCLES] THEN action]
2093
2094 SECRET: you may specify an alternative secret, default is "testing123".
2095
2096 RADIUS example:
2097
2098 check process radiusd with pidfile /var/run/radiusd.pid
2099 start program = "/etc/init.d/freeradius start"
2100 stop program = "/etc/init.d/freeradius stop"
2101 if failed host 127.0.0.1 port 1812 type udp protocol radius secret testing123 then alert
2102 if 5 restarts within 5 cycles then timeout
2103
2105 If specified in the control file, Monit will start a Monit daemon with
2106 http support. From a Browser you can then start and stop services, dis‐
2107 able or enable service monitoring as well as view the status of each
2108 service. Also, if Monit logs to its own file, you can view the content
2109 of this logfile in a Browser.
2110
2111 The control file statement for starting a Monit daemon with http sup‐
2112 port is a global set-statement:
2113
2114 set httpd port 2812
2115
2116 And you can use this URL, http://localhost:2812/, to access the daemon
2117 from a browser. The port number, in this case 2812, can be any number
2118 that you are allowed to bind to.
2119
2120 If you have compiled Monit with openssl, you can also start the httpd
2121 server with ssl support, using the following expression:
2122
2123 set httpd port 2812
2124 ssl enable
2125 pemfile /etc/certs/monit.pem
2126
2127 And you can use this URL, https://localhost:2812/, to access the Monit
2128 web server over an ssl encrypted connection.
2129
2130 The pemfile, in the example above, holds both the server's private key
2131 and certificate. This file should be stored in a safe place on the
2132 filesystem and should have strict permissions, that is, no more than
2133 0700.
2134
2135 In addition, if you want to check for client certificates you can use
2136 the CLIENTPEMFILE statement. In this case, a connecting client has to
2137 provide a certificate known by Monit in order to connect. This file
2138 also needs to have all necessary CA certificates. A configuration could
2139 look like:
2140
2141 set httpd port 2812
2142 ssl enable
2143 pemfile /etc/certs/monit.pem
2144 clientpemfile /etc/certs/monit-client.pem
2145
2146 By default self signed client certificates are not allowed. If you want
2147 to use a self signed certificate from a client it has to be allowed
2148 explicitly with the ALLOWSELFCERTIFICATION statement.
2149
2150 For more information on how to use Monit with SSL and for more informa‐
2151 tion about certificates and generating pem files, please consult the
2152 README.SSL file accompanying the software.
2153
2154 If you only want the http server to accept connect requests on one host
2155 address you can specify the bind address either as an IP number
2156 string or as a hostname. In the following example we bind the http
2157 server to the loopback device. In other words the http server will only
2158 be reachable from localhost:
2159
2160 set httpd port 2812 and use the address 127.0.0.1
2161
2162 or
2163
2164 set httpd port 2812 and use the address localhost
2165
2166 If you do not use the ADDRESS statement the http server will accept
2167 connections on any/all local addresses.
2168
2169 It is possible to hide monit's httpd server version, which usually is
2170 available in httpd header responses and in error pages.
2171
2172 set httpd port 2812
2173 ...
2174 signature {enable⎪disable}
2175
2176 Use disable to hide the server signature - Monit will only report its
2177 name (e.g. 'monit' instead of for example 'monit 4.2'). By default the
2178 version signature is enabled. It is worth stressing that this option
2179 provides no security advantage and falls into the "security through
2180 obscurity" category.
2181
2182 If you remove the httpd statement from the config file, monit will stop
2183 the httpd server on configuration reload. Likewise if you change the
2184 port number, Monit will restart the http server using the new specified
2185 port number.
2186
2187 The status page displayed by the Monit web server is automatically
2188 refreshed with the same poll time set for the monit daemon.
2189
2190 Note:
2191
2192 We strongly recommend that you start Monit with http support (and bind
2193 the server to localhost, only, unless you are behind a firewall). The
2194 built-in web-server is small and does not use much resources, and more
2195 importantly, Monit can use the http server for interprocess communica‐
2196 tion between a Monit client and a monit daemon.
2197
2198 For instance, you must start a Monit daemon with http support if you
2199 want to be able to use most of the available console commands, e.g.
2200 'monit stop all', 'monit start all' etc.
2201
2202 If a Monit daemon is running in the background we will ask the daemon
2203 (via the HTTP protocol) to execute the above commands. That is, the
2204 daemon is requested to start and stop the services. This ensures that
2205 a daemon will not restart a service that you requested to stop and that
2206 (any) timeout lock will be removed from a service when you start it.
2207
2208 Monit HTTPD Authentication
2209
2210 Monit supports two types of authentication schemes for connecting to
2211 the httpd server (three, if you count SSL client certificate
2212 validation). Both schemes can be used together or on their own. You must
2213 choose at least one.
2214
2215 Host and network allow list
2216
2217 The http server maintains an access-control list of hosts and networks
2218 allowed to connect to the server. You can add as many hosts as you want
2219 to, but only hosts with a valid domain name or its IP address are
2220 allowed. Networks require a network IP and a netmask to be accepted.
2221
2222 The http server will query a name server to check any hosts connecting
2223 to the server. If a host (client) is trying to connect to the server,
2224 but cannot be found in the access list or cannot be resolved, the
2225 server will shutdown the connection to the client promptly.
2226
2227 Control file example:
2228
2229 set httpd port 2812
2230 allow localhost
2231 allow my.other.work.machine.com
2232 allow 10.1.1.1
2233 allow 192.168.1.0/255.255.255.0
2234 allow 10.0.0.0/8
2235
2236 Clients not mentioned in the allow list that try to connect to the
2237 server are logged with their ip-address.
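
How an address is matched against host and network entries can be
sketched with Python's ipaddress module. This is an illustration of the
matching idea only, with a hypothetical function name. Host names are
skipped here, whereas Monit resolves them through a name server.

```python
import ipaddress


def is_allowed(client_ip, allow_entries):
    """Return True if client_ip matches a host or network entry."""
    client = ipaddress.ip_address(client_ip)
    for entry in allow_entries:
        try:
            # Accepts a single address, a/b.c.d.e netmasks and /prefix forms.
            network = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            continue  # a host name: resolving it is outside this sketch
        if client in network:
            return True
    return False
```

Both notations from the control file example, 192.168.1.0/255.255.255.0
and 10.0.0.0/8, are accepted by ip_network and describe a network
address plus a netmask.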
2238
2239 Basic Authentication
2240
2241 This authentication scheme is HTTP specific and described in more
2242 detail in RFC 2617.
2243
2244 In short: a server challenges a client (e.g. a Browser) to send
2245 authentication information (username and password) and if accepted, the
2246 server will allow the client access to the requested document.
2247
2248 The biggest weakness with Basic Authentication is that the username and
2249 password are sent in clear-text (i.e. only base64 encoded) over the network.
2250 It is therefore recommended that you do not use this authentication
2251 method unless you run the Monit http server with ssl support. With ssl
2252 support it is safe to use Basic Authentication since all
2253 http data, including Basic Authentication headers, will be encrypted.
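
That base64 is an encoding and not encryption is easy to demonstrate.
The following Python sketch builds a Basic Authentication header value
and recovers the credentials from it without any key. The function
names are hypothetical. The credentials used in the note below are the
example from RFC 2617.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an Authorization header for Basic Authentication."""
    token = base64.b64encode(("%s:%s" % (username, password)).encode("utf-8"))
    return "Basic " + token.decode("ascii")


def recover_credentials(header_value):
    """Reverse the encoding: anyone who sees the header can do this."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic Authentication header")
    decoded = base64.b64decode(token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password
```

For the user Aladdin with the password "open sesame" the header value
is "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==", and recover_credentials turns
it straight back into the username and password.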
2254
2255 Monit will use Basic Authentication if an allow statement contains a
2256 username and a password separated with a single ':' character, like so:
2257 allow username:password. The username and password must be written in
2258 clear-text. Special characters can be used but the password has to be
2259 quoted.
2260
2261 PAM is supported as well on platforms which provide PAM (such as Linux,
2262 Mac OS X, FreeBSD, NetBSD). The syntax is: allow @mygroup, which
2263 provides access to the users of the group called mygroup. Monit uses the
2264 PAM service called monit for PAM authentication; see the PAM manual page
2265 for detailed instructions on how to set up the PAM service and PAM
2266 authentication plugins. Example Monit PAM for Mac OS X - /etc/pam.d/monit:
2267
2268 # monit: auth account password session
2269 auth sufficient pam_securityserver.so
2270 auth sufficient pam_unix.so
2271 auth required pam_deny.so
2272 account required pam_permit.so
2273
2274 And the configuration part for monitrc which allows only members of the
2275 admin group, authenticated via PAM, to access the http interface:
2276
2277 set httpd port 2812 allow @admin
2278
2279 Alternatively you can use files in "htpasswd" format (one user:passwd
2280 entry per line), like so: allow [cleartext⎪crypt⎪md5] /path [users]. By
2281 default cleartext passwords are read. In case the passwords are
2282 digested it is necessary to specify the cryptographic method. If you do
2283 not want all users in the password file to have access to Monit you can
2284 specify only those users that should have access, in the allow state‐
2285 ment. Otherwise all users are added.
2286
2287 Example 1:
2288
2289 set httpd port 2812
2290 allow hauk:password
2291 allow md5 /etc/httpd/htpasswd john paul ringo george
2292
2293 If you use this method together with a host list, then only clients
2294 from the listed hosts will be allowed to connect to the Monit http
2295 server and each client will be asked to provide a username and a pass‐
2296 word.
2297
2298 Example 2:
2299
2300 set httpd port 2812
2301 allow localhost
2302 allow 10.1.1.1
2303 allow hauk:"password"
2304
2305 If you only want to use Basic Authentication, then just provide allow
2306 entries with username and password or password files as in example 1
2307 above.
2308
2309 Finally it is possible to define some users as read-only. A read-only
2310 user can read the Monit web pages but will not get access to push-but‐
2311 tons and cannot change a service from the web interface.
2312
2313 set httpd port 2812
2314 allow admin:password
2315 allow hauk:password read-only
2316 allow @admins
2317 allow @users read-only
2318
2319 A user is set to read-only by using the read-only keyword after user‐
2320 name:password. In the above example the user hauk is defined as a read-
2321 only user, while the admin user has all access rights.
2322
2323 If you use Basic Authentication it is a good idea to make the control
2324 file (~/.monitrc) readable and writable only by the user running
2325 monit, because the password is written in clear-text. (Use this
2326 command: /bin/chmod 600 ~/.monitrc). In fact, since Monit version 3.0,
2327 Monit will complain and exit if the control file is readable by
2328 others.
2329
2330 Clients that try to connect to the server but supply the wrong username
2331 and/or password are logged with their IP address.
2332
2333 If the Monit command line interface is being used, at least one cleart‐
2334 ext password is necessary. Otherwise, the Monit command line interface
2335 will not be able to connect to the Monit daemon server.
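
As a minimal sketch of the point above, assuming a digested password
file at /etc/httpd/htpasswd and a hypothetical user named admin, one
cleartext entry can be kept next to the password file so that the
command line interface is still able to connect:

    set httpd port 2812
        allow md5 /etc/httpd/htpasswd    # digested passwords, web users
        allow admin:password             # cleartext entry (assumed name)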
2336
2338 If specified in the control file, Monit can do dependency checking
2339 before start, stop, monitoring or unmonitoring of services. The depend
2340 statement may be used within any service entry in the Monit control
2341 file.
2342
2343 The syntax for the depend statement is simply:
2344
2345 DEPENDS on service[, service [,...]]
2346
2347 Where service is a service entry name, for instance apache or datafs.
2348
2349 You may add more than one service name of any type or use more than one
2350 depend statement in an entry.
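
A short sketch of both forms follows. The service names httpd_bin,
httpd_conf and datafs and their paths are assumptions chosen for
illustration only:

    check process apache with pidfile /var/run/httpd.pid
        depends on httpd_bin, httpd_conf    # several names, one statement
        depends on datafs                   # a second depend statement

    check file httpd_bin with path /usr/local/apache/bin/httpd
    check file httpd_conf with path /etc/httpd/httpd.conf
    check filesystem datafs with path /dev/hda1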
2351
2352 Services specified in a depend statement will be checked during
2353 stop/start/monitor/unmonitor operations. If a service is stopped or
2354 unmonitored, Monit will also stop/unmonitor any services that depend on
2355 it. Likewise, if a service is started, Monit will first stop any
2356 services that depend on it and, after it is started, start all
2357 depending services again. If monitoring is to be enabled for a service,
2358 all services which this service depends on will be monitored before
2359 monitoring of this service is enabled.
2360
2361 Here is an example where we set up an apache service entry to depend on
2362 the underlying apache binary. If the binary should change an alert is
2363 sent and apache is not monitored anymore. The rationale is security and
2364 that Monit should not execute a possibly cracked apache binary.
2365
2366 (1) check process apache
2367 (2) with pidfile "/usr/local/apache/logs/httpd.pid"
2368 (3) ...
2369 (4) depends on httpd
2370 (5)
2371 (6) check file httpd with path /usr/local/apache/bin/httpd
2372 (7) if failed checksum then unmonitor
2373
2374 The first entry is the process entry for apache shown before
2375 (abbreviated for clarity). The fourth line sets up a dependency between
2376 this entry and the service entry named httpd in line 6. A depend tree
2377 works as follows: if an action is conducted in a lower branch it will
2378 propagate upward in the tree and execute the same action for every
2379 dependent entry. In this case, if the checksum should fail in line 7
2380 then an unmonitor action is executed and the apache binary is not
2381 checked anymore. But since the apache process entry depends on the
2382 httpd entry, this entry will also execute the unmonitor action. In
2383 short, if the checksum test for the httpd binary file should fail, both
2384 the check file httpd entry and the check process apache entry are set
2385 in un-monitoring mode.
2386
2387 A dependency tree is a general construct and can be used between all
2388 types of service entries and span many levels and propagate any sup‐
2389 ported action (except the exec action which will not propagate upward
2390 in a dependency tree for obvious reasons).
2391
2392 Here is another example. Consider the following common server
2393 setup:
2394
2395 WEB-SERVER -> APPLICATION-SERVER -> DATABASE -> FILESYSTEM
2396 (a) (b) (c) (d)
2397
2398 You can set dependencies so that the web-server depends on the applica‐
2399 tion server to run before the web-server starts and the application
2400 server depends on the database server and the database depends on the
2401 file-system to be mounted before it starts. See also the example sec‐
2402 tion below for examples using the depend statement.
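
One way such a chain could be written is sketched here. All service
names, pidfile paths and start/stop programs are hypothetical and only
serve to show where each depend statement goes:

    # (d) filesystem
    check filesystem datafs with path /dev/hda1
        start program = "/bin/mount /data"
        stop program = "/bin/umount /data"

    # (c) database, needs the filesystem
    check process database with pidfile /var/run/database.pid
        start program = "/etc/init.d/database start"
        stop program = "/etc/init.d/database stop"
        depends on datafs

    # (b) application server, needs the database
    check process appserver with pidfile /var/run/appserver.pid
        start program = "/etc/init.d/appserver start"
        stop program = "/etc/init.d/appserver stop"
        depends on database

    # (a) web-server, needs the application server
    check process webserver with pidfile /var/run/httpd.pid
        start program = "/etc/init.d/httpd start"
        stop program = "/etc/init.d/httpd stop"
        depends on appserver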
2403
2404 Here we describe how Monit will function with the above dependencies:
2405
2406 If no servers are running
2407 Monit will start the servers in the following order: d, c, b, a
2408
2409 If all servers are running
2410 When you run 'monit stop all' this is the stop order: a, b, c, d.
2411 If you run 'monit stop d' then a, b and c are also stopped because
2412 they depend on d and finally d is stopped.
2413
2414 If a does not run
2415 When Monit runs it will start a
2416
2417 If b does not run
2418 When Monit runs it will first stop a then start b and finally start
2419 a again.
2420
2421 If c does not run
2422 When Monit runs it will first stop a and b then start c and finally
2423 start b then a.
2424
2425 If d does not run
2426 When Monit runs it will first stop a, b and c then start d and
2427 finally start c, b then a.
2428
2429 If the control file contains a depend loop
2430 A depend loop is, for example, a->b and b->a, or a->b->c->a.
2431
2432 When Monit starts it will check for such loops and complain and
2433 exit if a loop was found. It will also exit with a complaint if a
2434 depend statement was used that does not point to a service in the
2435 control file.
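
For illustration, a control file fragment along these lines would be
expected to trigger both complaints. The service names a, b, c and
nosuchservice and the pidfile paths are made up for this sketch:

    # Depend loop: a depends on b and b depends on a
    check process a with pidfile /var/run/a.pid
        depends on b

    check process b with pidfile /var/run/b.pid
        depends on a

    # Dangling dependency: nosuchservice is not defined anywhere
    check process c with pidfile /var/run/c.pid
        depends on nosuchservice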
2436
2438 The preferred way to set up Monit is to write a .monitrc file in your
2439 home directory. When there is a conflict between the command-line argu‐
2440 ments and the arguments in this file, the command-line arguments take
2441 precedence. To protect the security of your control file and passwords
2442 the control file must have permissions no more than 0700 (u=xrw,g=,o=);
2443 Monit will complain and exit otherwise.
2444
2445 Run Control Syntax
2446
2447 Comments begin with a '#' and extend through the end of the line. Oth‐
2448 erwise the file consists of a series of service entries or global
2449 option statements in a free-format, token-oriented syntax.
2450
2451 There are three kinds of tokens: grammar keywords, numbers (i.e. deci‐
2452 mal digit sequences) and strings. Strings can be either quoted or
2453 unquoted. A quoted string is bounded by double quotes and may contain
2454 whitespace (and quoted digits are treated as a string). An unquoted
2455 string is any whitespace-delimited token, containing characters and/or
2456 numbers.
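
The token kinds can be seen side by side in a small sketch. The paths
are the ones used in the resin example of this manual; the poll interval
of 120 is an arbitrary assumption:

    # A comment extends through the end of the line
    set daemon 120                  # two keywords and a number
    check process resin             # resin is an unquoted string
        with pidfile "/usr/local/resin/srun.pid"    # quoted string
        # Quotes are needed here because the string contains whitespace
        start program = "/usr/local/resin/bin/srun.sh start"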
2457
2458 On a semantic level, the control file consists of two types of entries:
2459
2460 1. Global set-statements
2461 A global set-statement starts with the keyword set and the item to
2462 configure.
2463
2464 2. One or more service entry statements.
2465 Each service entry starts with the keyword `check', followed by
2466 the service type. Each entry requires a <unique> descriptive name,
2467 which may be freely chosen. This name is used by monit to refer to
2468 the service internally and in all interactions with the user.
2469
2470 Currently, seven types of check statements are supported:
2471
2472 1. CHECK PROCESS <unique name> PIDFILE <path>
2473 <path> is the absolute path to the program's pidfile. If the pidfile
2474 does not exist or does not contain the pid number of a running
2475 process, Monit will call the entry's start method if defined. If
2476 Monit runs in passive mode or the start method is not defined,
2477 Monit will just send alerts on errors.
2478
2479 2. CHECK FILE <unique name> PATH <path>
2480 <path> is the absolute path to the file. If the file does not exist
2481 or has disappeared, Monit will call the entry's start method if
2482 defined. If <path> does not point to a regular file type (for
2483 instance a directory), Monit will disable monitoring of this entry.
2484 If Monit runs in passive mode or the start method is not defined,
2485 Monit will just send alerts on errors.
2486
2487 3. CHECK FIFO <unique name> PATH <path>
2488 <path> is the absolute path to the fifo. If the fifo does not exist
2489 or has disappeared, Monit will call the entry's start method if
2490 defined. If <path> does not point to a fifo type (for instance a
2491 directory), Monit will disable monitoring of this entry. If Monit
2492 runs in passive mode or the start method is not defined, Monit
2493 will just send alerts on errors.
2494
2495 4. CHECK FILESYSTEM <unique name> PATH <path>
2496 <path> is the path to the filesystem block special device, mount
2497 point, file or a directory which is part of a filesystem. It is
2498 recommended to use a block special file directly (for example
2499 /dev/hda1 on Linux or /dev/dsk/c0t0d0s1 on Solaris, etc.). If you
2500 use a mount point (for example /data), be careful, because if the
2501 filesystem is unmounted the test will still be true because the
2502 mount point exists.
2503
2504 If the filesystem becomes unavailable, Monit will call the entry's
2505 start method if defined. If <path> does not point to a filesystem,
2506 Monit will disable monitoring of this entry. If Monit runs in passive
2507 mode or the start method is not defined, Monit will just send
2508 alerts on errors.
2509
2510 5. CHECK DIRECTORY <unique name> PATH <path>
2511 <path> is the absolute path to the directory. If the directory does
2512 not exist or has disappeared, Monit will call the entry's start method
2513 if defined. If <path> does not point to a directory, Monit will
2514 disable monitoring of this entry. If Monit runs in passive mode or
2515 the start method is not defined, Monit will just send alerts on
2516 errors.
2517
2518 6. CHECK HOST <unique name> ADDRESS <host address>
2519 The host address can be specified as a hostname string or as an IP
2520 address string in dotted decimal format, such as tildeslash.com
2521 or "64.87.72.95".
2522
2523 7. CHECK SYSTEM <unique name>
2524 The system name is usually the hostname, but any descriptive name can
2525 be used. This test lets you check general system resources such as
2526 CPU usage (percent of time spent in user, system and wait), total
2527 memory usage or load average.
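
The seven forms listed above can be sketched as bare entries, one per
type. The names and paths are assumptions picked for illustration, and
each entry would normally be followed by its own tests and actions:

    check process apache with pidfile /var/run/httpd.pid
    check file httpd_conf with path /etc/httpd/httpd.conf
    check fifo pickup with path /var/spool/postfix/public/pickup
    check filesystem datafs with path /dev/hda1
    check directory bin with path /bin
    check host www.tildeslash.com with address www.tildeslash.com
    check system myhost.example.com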
2528
2529 You can use noise keywords like 'if', `and', `with(in)', `has',
2530 `using', 'use', 'on(ly)', `usage' and `program(s)' anywhere in an entry
2531 to make it resemble English. They're ignored, but can make entries much
2532 easier to read at a glance. The punctuation characters ';' ',' and '='
2533 are also ignored. Keywords are case insensitive.
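
As a sketch of what this means in practice, the two alternative entries
below should be read identically by Monit, given that with and program
are noise keywords, '=' is ignored and keywords are case insensitive.
They are alternatives, not two entries for the same file, because
service names must be unique:

    # With noise keywords and punctuation
    check process resin with pidfile /usr/local/resin/srun.pid
        start program = "/usr/local/resin/bin/srun.sh start"

    # Without them, and with upper-case keywords
    CHECK PROCESS resin PIDFILE /usr/local/resin/srun.pid
        START "/usr/local/resin/bin/srun.sh start"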
2534
2535 Here are the legal global keywords:
2536
2537 Keyword Function
2538 ----------------------------------------------------------------
2539 set daemon Set a background poll interval in seconds.
2540 set init Set Monit to run from init. Monit will not
2541 transform itself into a daemon process.
2542 set logfile Name of a file to dump error- and status-
2543 messages to. If syslog is specified as the
2544 file, Monit will utilize the syslog daemon
2545 to log messages. This can optionally be
2546 followed by 'facility <facility>' where
2547 facility is 'log_local0' - 'log_local7' or
2548 'log_daemon'. If no facility is specified,
2549 LOG_USER is used.
2550 set mailserver The mailserver used for sending alert
2551 notifications. If the mailserver is not
2552 defined, Monit will try to use 'localhost'
2553 as the smtp-server for sending mail. You
2554 can add more mail servers, if Monit cannot
2555 connect to the first server it will try the
2556 next server and so on.
2557 set mail-format Set a global mail format for all alert
2558 messages emitted by monit.
2559 set idfile Explicitly set the location of the Monit id
2560 file. E.g. set idfile /var/monit/id.
2561 set pidfile Explicitly set the location of the Monit lock
2562 file. E.g. set pidfile /var/run/xyzmonit.pid.
2563 set statefile Explicitly set the location of the file Monit
2564 will write state data to. If not set, the
2565 default is $HOME/.monit.state.
2566 set httpd port Activates Monit http server at the given
2567 port number.
2568 ssl enable Enables ssl support for the httpd server.
2569 Requires the use of the pemfile statement.
2570 ssl disable Disables ssl support for the httpd server.
2571 It is equal to omitting any ssl statement.
2572 pemfile Set the pemfile to be used with ssl.
2573 clientpemfile Set the pemfile to be used when client
2574 certificates should be checked by monit.
2575 address If specified, the http server will only
2576 accept connection requests to this address.
2577 This statement is an optional part of the
2578 set httpd statement.
2579 allow Specifies a host or IP address allowed to
2580 connect to the http server. Can also specify
2581 a username and password allowed to connect
2582 to the server. More than one allow statement
2583 is allowed. This statement is also an
2584 optional part of the set httpd statement.
2585 read-only Set the user defined in username:password
2586 to read only. A read-only user cannot change
2587 a service from the Monit web interface.
2588 include Include a file or files matching the globstring.
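
Put together, the global part of a control file might look like the
following sketch. Every value (poll interval, host names, paths, user
names and passwords) is an assumption for illustration, not a
recommended default:

    set daemon 120
    set logfile syslog facility log_daemon
    set mailserver mail.example.com
    set idfile /var/monit/id
    set pidfile /var/run/monit.pid
    set statefile /var/monit/state
    set httpd port 2812
        address localhost               # only accept connections here
        allow localhost
        allow admin:password
        allow hauk:password read-only
    include /etc/monit.d/*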
2589
2590 Here are the legal service entry keywords:
2591
2592 Keyword Function
2593 ----------------------------------------------------------------
2594 check Starts an entry and must be followed by the type
2595 of monitored service {filesystem⎪directory⎪file⎪fifo⎪
2596 host⎪process⎪system} and a descriptive name for the
2597 service.
2598 pidfile Specify the process pidfile. Every
2599 process must create a pidfile with its
2600 current process id. This statement should only
2601 be used in a process service entry.
2602 path Must be followed by a path to the block
2603 special file for filesystem, regular
2604 file, directory or a process's pidfile.
2605 group Specify a groupname for a service entry.
2606 start The program used to start the specified
2607 service. Full path is required. This
2608 statement is optional, but recommended.
2609 stop The program used to stop the specified
2610 service. Full path is required. This
2611 statement is optional, but recommended.
2612 pid and ppid These keywords may be used as standalone
2613 statements in a process service entry to
2614 override the alert action for change of
2615 process pid and ppid.
2616 uid and gid These keywords are either 1) an optional part of
2617 a start, stop or exec statement. They may be
2618 used to specify a user id and a group id the
2619 program (process) should switch to upon start.
2620 This feature can only be used if the superuser
2621 is running monit. 2) uid and gid may also be
2622 used as standalone statements in a file service
2623 entry to test a file's uid and gid attributes.
2624 host The hostname or IP address to test the port
2625 at. This keyword can only be used together
2626 with a port statement or in the check host
2627 statement.
2628 port Specify a TCP/IP service port number which
2629 a process is listening on. This statement
2630 is also optional. If this statement is not
2631 prefixed with a host-statement, localhost is
2632 used as the hostname to test the port at.
2633 type Specifies the socket type Monit should use when
2634 testing a connection to a port. If the type
2635 keyword is omitted, tcp is used. This keyword
2636 must be followed by either tcp, udp or tcpssl.
2637 tcp Specifies that Monit should use a TCP
2638 socket type (stream) when testing a port.
2639 tcpssl Specifies that Monit should use a TCP socket
2640 type (stream) and the secure socket layer (ssl)
2641 when testing a port connection.
2642 udp Specifies that Monit should use a UDP socket
2643 type (datagram) when testing a port.
2644 certmd5 The md5 sum of the certificate an ssl
2645 server has to deliver.
2646 proto(col) This keyword specifies the type of service
2647 found at the port. See CONNECTION TESTING
2648 for list of supported protocols.
2649 You're welcome to write new protocol test
2650 modules. If no protocol is specified Monit will
2651 use a default test which in most cases is good
2652 enough.
2653 request Specifies a server request and must come
2654 after the protocol keyword mentioned above.
2655 - for http it can contain a URL and an
2656 optional query string.
2657 - other protocols do not support this
2658 statement yet
2659 send/expect These keywords specify a generic protocol.
2660 Both require a string, either to be sent or
2661 to be matched against (as extended regex if
2662 supported). Send/expect cannot be used
2663 together with the proto(col) statement.
2664 unix(socket) Specifies a Unix socket file and used like
2665 the port statement above to test a Unix
2666 domain network socket connection.
2667 URL Specify a URL string which Monit will use for
2668 connection testing.
2669 content Optional sub-statement for the URL statement.
2670 Specifies that Monit should test the content
2671 returned by the server against a regular
2672 expression.
2673 timeout x sec. Define a network port connection timeout. Must
2674 be followed by a number in seconds and the
2675 keyword, seconds.
2676 timeout Define a service timeout. Must be followed by
2677 two numbers. The first number is the max number
2678 of restarts for the service. The second number
2679 is the cycle interval to test restarts.
2680 This statement is optional.
2681 alert Specifies an email address for notification
2682 if a service event occurs. Alert can also
2683 be postfixed, to only send a message for
2684 certain events. See the examples above. More
2685 than one alert statement is allowed in an
2686 entry. This statement is also optional.
2687 noalert Specifies an email address that should not
2688 receive alerts. This statement is also
2689 optional.
2690 restart, stop, unmonitor, start and exec
2691 These keywords may be used as actions for various
2692 test statements. The exec statement is special in
2693 that it requires a following string specifying the
2694 program to be executed. You may also specify a UID
2695 and GID for the exec statement. The program executed
2696 will then run using the specified user id and group id.
2697 mail-format Specifies a mail format for an alert message
2698 This statement is an optional part of the
2699 alert statement.
2700 checksum Specify that Monit should compute and monitor a
2701 file's md5/sha1 checksum. May only be used in a
2702 check file entry.
2703 expect Specifies a md5/sha1 checksum string Monit
2704 should expect when testing the checksum. This
2705 statement is an optional part of the checksum
2706 statement.
2707 timestamp Specifies an expected timestamp for a file
2708 or directory. More than one timestamp statement
2709 is allowed. May only be used in a check file or
2710 check directory entry.
2711 changed Part of a timestamp statement and used as an
2712 operator to simply test for a timestamp change.
2713 every Validate this entry only at every nth poll cycle.
2714 Useful in daemon mode when the cycle is short
2715 and a service takes some time to start.
2716 mode Must be followed either by the keyword active,
2717 passive or manual. If active, Monit will restart
2718 the service if it is not running (this is the
2719 default behavior). If passive, Monit will not
2720 (re)start the service if it is not running - it
2721 will only monitor and send alerts (resource
2722 related restart and stop options are ignored
2723 in this mode also). If manual, Monit will enter
2724 active mode only if a service was started under
2725 monit's control otherwise the service isn't
2726 monitored.
2727 cpu Must be followed by a compare operator, a number
2728 with "%" and an action. This statement is used
2729 to check the cpu usage in percent of a process
2730 with its children over a number of cycles. If
2731 the compare expression matches then the
2732 specified action is executed.
2733 mem The equivalent to the cpu token for memory of a
2734 process (w/o children!). This token must be
2735 followed by a compare operator, a number with
2736 unit {B⎪KB⎪MB⎪GB⎪%⎪byte⎪kilobyte⎪megabyte⎪
2737 gigabyte⎪percent} and an action.
2738 loadavg Must be followed by [1min,5min,15min] in (), a
2739 compare operator, a number and an action. This
2740 statement is used to check the system load
2741 average over a number of cycles. If the compare
2742 expression matches then the specified action is
2743 executed.
2744 children This is the number of child processes spawned by a
2745 process. The syntax is the same as above.
2746 totalmem The equivalent of mem, except totalmem is an
2747 aggregation of memory, not only used by a
2748 process but also by all its child
2749 processes. The syntax is the same as above.
2750 space Must be followed by a compare operator, a
2751 number, unit {B⎪KB⎪MB⎪GB⎪%⎪byte⎪kilobyte⎪
2752 megabyte⎪gigabyte⎪percent} and an action.
2753 inode(s) Must be followed by a compare operator, integer
2754 number, optionally by percent sign (if not, the
2755 limit is absolute) and an action.
2756 perm(ission) Must be followed by an octal number describing
2757 the permissions.
2758 size Must be followed by a compare operator, a
2759 number, unit {B⎪KB⎪MB⎪GB⎪byte⎪kilobyte⎪
2760 megabyte⎪gigabyte} and an action.
2761 depends (on) Must be followed by the name of a service this
2762 service depends on.
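
The filesystem and file keywords from the table are not used in the
configuration examples that follow, so here is a hedged sketch of how
they might be combined. The names, paths and limits are assumptions, and
the exact test forms should be checked against the service test
descriptions elsewhere in this manual:

    check filesystem datafs with path /dev/hda1
        if space usage > 80% then alert
        if inode usage > 90% then alert

    check file database with path /data/mydatabase.db
        if failed permission 660 then alert
        if size > 1 GB then alert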
2763
2764 Here's the complete list of reserved keywords used by monit:
2765
2766 if, then, else, set, daemon, logfile, syslog, address, httpd, ssl,
2767 enable, disable, pemfile, allow, read-only, check, init, count, pid‐
2768 file, statefile, group, start, stop, uid, gid, connection, port(num‐
2769 ber), unix(socket), type, proto(col), tcp, tcpssl, udp, alert, noalert,
2770 mail-format, restart, timeout, checksum, resource, expect, send,
2771 mailserver, every, mode, active, passive, manual, depends, host,
2772 default, http, ftp, smtp, pop, ntp3, nntp, imap, clamav, ssh, dwp,
2773 ldap2, ldap3, tns, request, cpu, mem, totalmem, children, loadavg,
2774 timestamp, changed, second(s), minute(s), hour(s), day(s), space,
2775 inode, pid, ppid, perm(ission), icmp, process, file, directory,
2776 filesystem, size, action, unmonitor, rdate, rsync, data, invalid, exec,
2777 nonexist, policy, reminder, instance, eventqueue, basedir, slot(s),
2778 system, idfile, gps, radius, secret, target, maxforward, hostheader and
2779 failed
2780
2781 And here is a complete list of noise keywords ignored by monit:
2782
2783 is, as, are, on(ly), with(in), and, has, using, use, the, sum, pro‐
2784 gram(s), than, for, usage, was, but, of.
2785
2786 Note: If the start or stop programs are shell scripts, then the script
2787 must begin with "#!" and the remainder of the first line must specify
2788 an interpreter for the program. E.g. "#!/bin/sh"
2789
2790 It's possible to write scripts directly into the start and stop entries
2791 by using a string of shell-commands. Like so:
2792
2793 start="/bin/bash -c 'echo $$ > pidfile; exec program'"
2794 stop="/bin/bash -c 'kill -s SIGTERM `cat pidfile`'"
2795
2796 CONFIGURATION EXAMPLES
2797
2798 The simplest form is just the check statement. In this example we check
2799 to see if the server is running and log a message if not:
2800
2801 check process resin with pidfile /usr/local/resin/srun.pid
2802
2803 To have Monit start the server if it's not running, add a start state‐
2804 ment:
2805
2806 check process resin with pidfile /usr/local/resin/srun.pid
2807 start program = "/usr/local/resin/bin/srun.sh start"
2808
2809 Here's a more advanced example for monitoring an apache web-server lis‐
2810 tening on the default port number for HTTP and HTTPS. In this example
2811 Monit will restart apache if it's not accepting connections at the port
2812 numbers. The method Monit uses for a process restart is to first execute
2813 the stop-program, wait up to 30s for the process to stop and then exe‐
2814 cute the start-program and wait up to 30s for it to start. The length
2815 of start or stop timeout can be overridden using the 'timeout' option.
2816 If Monit was unable to stop or start the service a failed alert message
2817 will be sent if you have requested alert messages to be sent.
2818
2819 check process apache with pidfile /var/run/httpd.pid
2820 start program = "/etc/init.d/httpd start" with timeout 60 seconds
2821 stop program = "/etc/init.d/httpd stop"
2822 if failed port 80 then restart
2823 if failed port 443 with timeout 15 seconds then restart
2824
2825 This example demonstrates how you can run a program as a specified user
2826 (uid) and with a specified group (gid). Many daemon programs will do
2827 the uid and gid switch by themselves, but for those programs that do
2828 not (e.g. Java programs), monit's ability to start a program as a cer‐
2829 tain user can be very useful. In this example we start the Tomcat Java
2830 Servlet Engine as the standard nobody user and group. Please note that
2831 Monit will only switch uid and gid for a program if the super-user is
2832 running monit, otherwise Monit will simply ignore the request to change
2833 uid and gid.
2834
2835 check process tomcat with pidfile /var/run/tomcat.pid
2836 start program = "/etc/init.d/tomcat start"
2837 as uid nobody and gid nobody
2838 stop program = "/etc/init.d/tomcat stop"
2839 # You can also use id numbers instead and write:
2840 as uid 99 and with gid 99
2841 if failed port 8080 then alert
2842
2843 In this example we use udp for connection testing to check if the name-
2844 server is running and also use timeout and alert:
2845
2846 check process named with pidfile /var/run/named.pid
2847 start program = "/etc/init.d/named start"
2848 stop program = "/etc/init.d/named stop"
2849 if failed port 53 use type udp protocol dns then restart
2850 if 3 restarts within 5 cycles then timeout
2851
2852 The following example illustrates how to check if the service 'sophie'
2853 is answering connections on its Unix domain socket:
2854
2855 check process sophie with pidfile /var/run/sophie.pid
2856 start program = "/etc/init.d/sophie start"
2857 stop program = "/etc/init.d/sophie stop"
2858 if failed unix /var/run/sophie then restart
2859
2860 In this example we check an apache web-server running on localhost that
2861 answers for several IP-based virtual hosts or vhosts, hence the host
2862 statement before port:
2863
2864 check process apache with pidfile /var/run/httpd.pid
2865 start "/etc/init.d/httpd start"
2866 stop "/etc/init.d/httpd stop"
2867 if failed host www.sol.no port 80 then alert
2868 if failed host shop.sol.no port 443 then alert
2869 if failed host chat.sol.no port 80 then alert
2870 if failed host www.tildeslash.com port 80 then alert
2871
2872 To make sure that Monit is communicating with a http server a protocol
2873 test can be added:
2874
2875 check process apache with pidfile /var/run/httpd.pid
2876 start "/etc/init.d/httpd start"
2877 stop "/etc/init.d/httpd stop"
2878 if failed host www.sol.no port 80
2879 protocol HTTP
2880 then alert
2881
2882 This example shows a different way to check a webserver using the
2883 send/expect mechanism:
2884
2885 check process apache with pidfile /var/run/httpd.pid
2886 start "/etc/init.d/httpd start"
2887 stop "/etc/init.d/httpd stop"
2888 if failed host www.sol.no port 80
2889 send "GET / HTTP/1.0\r\nHost: www.sol.no\r\n\r\n"
2890 expect "HTTP/[0-9\.]{3} 200 .*\r\n"
2891 then alert
2892
2893 To make sure that Apache is logging successfully (i.e. no more than 60
2894 percent of child servers are logging), use its mod_status page at
2895 www.sol.no/server-status with this special protocol test:
2896
2897 check process apache with pidfile /var/run/httpd.pid
2898 start "/etc/init.d/httpd start"
2899 stop "/etc/init.d/httpd stop"
2900 if failed host www.sol.no port 80
2901 protocol apache-status loglimit > 60% then restart
2902
2903 This configuration can be used to alert you if 25 percent or more of
2904 Apache child processes are stuck performing DNS lookups:
2905
2906 check process apache with pidfile /var/run/httpd.pid
2907 start "/etc/init.d/httpd start"
2908 stop "/etc/init.d/httpd stop"
2909 if failed host www.sol.no port 80
2910 protocol apache-status dnslimit > 25% then alert
2911
2912 Here we use an icmp ping test to check if a remote host is up and if
2913 not send an alert:
2914
2915 check host www.tildeslash.com with address www.tildeslash.com
2916 if failed icmp type echo count 5 with timeout 15 seconds
2917 then alert
2918
2919 In the following example we ask Monit to compute and verify the check‐
2920 sum for the underlying apache binary used by the start and stop pro‐
2921 grams. If the checksum test should fail, monitoring will be dis‐
2922 abled to prevent possibly starting a compromised binary:
2923
2924 check process apache with pidfile /var/run/httpd.pid
2925 start program = "/etc/init.d/httpd start"
2926 stop program = "/etc/init.d/httpd stop"
2927 if failed host www.tildeslash.com port 80 then restart
2928 depends on apache_bin
2929
2930 check file apache_bin with path /usr/local/apache/bin/httpd
2931 if failed checksum then unmonitor
2932
2933 In this example we ask Monit to test the checksum for a document on a
2934 remote server. If the checksum was changed we send an alert:
2935
2936 check host tildeslash with address www.tildeslash.com
2937 if failed port 80 protocol http
2938 and request "/monit/dist/monit-4.0.tar.gz"
2939 with checksum f9d26b8393736b5dfad837bb13780786
2940 then alert
2941
2942 Here are a couple of tests for some popular communication servers,
2943 using the SIP protocol. First we test a FreeSWITCH server and then an
2944 Asterisk server:
2945
2946 check process freeswitch
2947 with pidfile /usr/local/freeswitch/log/freeswitch.pid
2948 start program = "/usr/local/freeswitch/bin/freeswitch -nc -hp"
2949 stop program = "/usr/local/freeswitch/bin/freeswitch -stop"
2950 if totalmem > 1000.0 MB for 5 cycles then alert
2951 if totalmem > 1500.0 MB for 5 cycles then alert
2952 if totalmem > 2000.0 MB for 5 cycles then restart
2953 if cpu > 60% for 5 cycles then alert
2954 if failed port 5060 type udp protocol SIP
2955 target me@foo.bar and maxforward 10
2956 then restart
2957 if 5 restarts within 5 cycles then timeout
2958
2959 check process asterisk
2960 with pidfile /var/run/asterisk/asterisk.pid
2961 start program = "/usr/sbin/asterisk"
2962 stop program = "/usr/sbin/asterisk -r -x 'shutdown now'"
2963 if totalmem > 1000.0 MB for 5 cycles then alert
2964 if totalmem > 1500.0 MB for 5 cycles then alert
2965 if totalmem > 2000.0 MB for 5 cycles then restart
2966 if cpu > 60% for 5 cycles then alert
2967 if failed port 5060 type udp protocol SIP
2968 and target me@foo.bar maxforward 10
2969 then restart
2970 if 5 restarts within 5 cycles then timeout
2971
2972 Some servers are slow starters, like for example Java based Application
2973 Servers. So if we want to keep the poll-cycle low (i.e. < 60 seconds)
2974 but allow some services to take their time to start, the every statement
2975 is handy:
2976
2977 check process dynamo with pidfile /etc/dynamo.pid
2978 start program = "/etc/init.d/dynamo start"
2979 stop program = "/etc/init.d/dynamo stop"
2980 if failed port 8840 then alert
2981 every 2 cycles
2982
2983 Here is an example where we group together two database entries so you
2984 can manage them together, e.g. 'monit -g database start all'. The mode
2985 statement is also illustrated in the first entry and has the effect
2986 that Monit will not try to (re)start this service if it is not running:
2987
2988 check process sybase with pidfile /var/run/sybase.pid
2989 start = "/etc/init.d/sybase start"
2990 stop = "/etc/init.d/sybase stop"
2991 mode passive
2992 group database
2993
2994 check process oracle with pidfile /var/run/oracle.pid
2995 start program = "/etc/init.d/oracle start"
2996 stop program = "/etc/init.d/oracle stop"
2997 mode active # Not necessary really, since it's the default
2998 if failed port 9001 then restart
2999 group database
3000
3001 Here is an example to show the usage of the resource checks. It will
3002 send an alert when the cpu usage rises above 40% or the totalcpu usage
3003 rises above 60% for two cycles. Apache is restarted if the totalcpu
3004 usage is over 80% for five cycles, and stopped if the memory usage is
3005 over 100 MB for five cycles or if the machine's 5-minute load average
3006 is more than 10 for 8 cycles:
3007
3008 check process apache with pidfile /var/run/httpd.pid
3009 start program = "/etc/init.d/httpd start"
3010 stop program = "/etc/init.d/httpd stop"
3011 if cpu > 40% for 2 cycles then alert
3012 if totalcpu > 60% for 2 cycles then alert
3013 if totalcpu > 80% for 5 cycles then restart
3014 if mem > 100 MB for 5 cycles then stop
3015 if loadavg(5min) greater than 10.0 for 8 cycles then stop

This example demonstrates the timestamp statement with exec and how you
may restart apache if its configuration file has changed:

  check file httpd.conf with path /etc/httpd/httpd.conf
      if changed timestamp
          then exec "/etc/init.d/httpd graceful"

In this example we demonstrate usage of the extended alert statement
and a file check dependency:

  check process apache with pidfile /var/run/httpd.pid
      start = "/etc/init.d/httpd start"
      stop = "/etc/init.d/httpd stop"
      alert admin@bar on {nonexist, timeout}
          with mail-format {
              from: bofh@$HOST
              subject: apache $EVENT - $ACTION
              message: This event occurred on $HOST at $DATE.
              Your faithful employee,
              monit
          }
      if failed host www.tildeslash.com port 80 then restart
      if 3 restarts within 5 cycles then timeout
      depend httpd_bin
      group apache

  check file httpd_bin with path /usr/local/apache/bin/httpd
      alert security@bar on {checksum, timestamp,
                             permission, uid, gid}
          with mail-format {subject: Alaaarrm! on $HOST}
      if failed checksum
          and expect 8f7f419955cefa0b33a2ba316cba3659
          then unmonitor
      if failed permission 755 then unmonitor
      if failed uid root then unmonitor
      if failed gid root then unmonitor
      if changed timestamp then alert
      group apache
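
The expected value in the checksum test above is 32 hexadecimal digits,
which is the length of an MD5 sum. A minimal sketch of how such a value
might be obtained for pasting into the control file, assuming md5sum(1)
from the GNU text utilities is installed. The name file_md5 is a
hypothetical helper and not part of Monit:

```shell
# file_md5 is a hypothetical helper, not a Monit command.
# It prints only the hexadecimal MD5 digest of the file given as $1.
file_md5() {
    # md5sum prints "<digest>  <filename>"; keep the first field only.
    md5sum "$1" | cut -d ' ' -f 1
}
```

For example, 'file_md5 /usr/local/apache/bin/httpd' would print the
value to use after the expect keyword.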

In this example, we demonstrate usage of the depend statement. In this
case, we want to start oracle and apache. However, we've set up apache
to use oracle as a back end, and if oracle is restarted, apache must be
restarted as well.

  check process apache with pidfile /var/run/httpd.pid
      start = "/etc/init.d/httpd start"
      stop = "/etc/init.d/httpd stop"
      depends on oracle

  check process oracle with pidfile /var/run/oracle.pid
      start = "/etc/init.d/oracle start"
      stop = "/etc/init.d/oracle stop"
      if failed port 9001 then restart

Next, we have two services, oracle-import and oracle-export, that need
to be restarted if oracle is restarted but are independent of each
other.

  check process oracle with pidfile /var/run/oracle.pid
      start = "/etc/init.d/oracle start"
      stop = "/etc/init.d/oracle stop"
      if failed port 9001 then restart

  check process oracle-import
      with pidfile /var/run/oracle-import.pid
      start = "/etc/init.d/oracle-import start"
      stop = "/etc/init.d/oracle-import stop"
      depends on oracle

  check process oracle-export
      with pidfile /var/run/oracle-export.pid
      start = "/etc/init.d/oracle-export start"
      stop = "/etc/init.d/oracle-export stop"
      depends on oracle

Finally, an example with all statements:

  check process apache with pidfile /var/run/httpd.pid
      start program = "/etc/init.d/httpd start"
      stop program = "/etc/init.d/httpd stop"
      if 3 restarts within 5 cycles then timeout
      if failed host www.sol.no port 80 protocol http
          and use the request "/login.cgi"
          then alert
      if failed host shop.sol.no port 443 type tcpssl
          protocol http and with timeout 15 seconds
          then restart
      if cpu is greater than 60% for 2 cycles then alert
      if cpu > 80% for 5 cycles then restart
      if totalmem > 100 MB then stop
      if children > 200 then alert
      alert bofh@bar with mail-format {from: monit@foo.bar.no}
      every 2 cycles
      mode active
      depends on weblogic
      depends on httpd.pid
      depends on httpd.conf
      depends on httpd_bin
      depends on datafs
      group server

  check file httpd.pid with path /usr/local/apache/logs/httpd.pid
      group server
      if timestamp > 7 days then restart
      every 2 cycles
      alert bofh@bar with mail-format {from: monit@foo.bar.no}
      depends on datafs

  check file httpd.conf with path /etc/httpd/httpd.conf
      group server
      if timestamp was changed
          then exec "/usr/local/apache/bin/apachectl graceful"
      every 2 cycles
      alert bofh@bar with mail-format {from: monit@foo.bar.no}
      depends on datafs

  check file httpd_bin with path /usr/local/apache/bin/httpd
      group server
      if failed checksum and expect the sum
          8f7f419955cefa0b33a2ba316cba3659 then unmonitor
      if failed permission 755 then unmonitor
      if failed uid root then unmonitor
      if failed gid root then unmonitor
      if changed size then alert
      if changed timestamp then alert
      every 2 cycles
      alert bofh@bar with mail-format {from: monit@foo.bar.no}
      alert foo@bar on { checksum, size, timestamp, uid, gid }
      depends on datafs

  check filesystem datafs with path /dev/sdb1
      group server
      start program = "/bin/mount /data"
      stop program = "/bin/umount /data"
      if failed permission 660 then unmonitor
      if failed uid root then unmonitor
      if failed gid disk then unmonitor
      if space usage > 80 % then alert
      if space usage > 94 % then stop
      if inode usage > 80 % then alert
      if inode usage > 94 % then stop
      alert root@localhost

  check host ftp.redhat.com with address ftp.redhat.com
      if failed icmp type echo with timeout 15 seconds
          then alert
      if failed port 21 protocol ftp
          then exec "/usr/X11R6/bin/xmessage -display :0 ftp connection failed"
      alert foo@bar.com

  check host www.gnu.org with address www.gnu.org
      if failed port 80 protocol http
          and request "/pub/gnu/bash/bash-2.05b.tar.gz"
          with checksum 8f7f419955cefa0b33a2ba316cba3659
          then alert
      alert rms@gnu.org with mail-format {
          subject: The gnu server may be hacked again! }

Note: only the check statement is mandatory. The other statements are
optional, and their order is not important.

~/.monitrc
    Default run control file.

/etc/monitrc
    If the control file is not found in the default location and /etc
    contains a monitrc file, this file is used instead.

./monitrc
    If the control file is not found in either of the previous two
    locations and the current working directory contains a monitrc
    file, this file is used instead.

~/.monit.pid
    Lock file to help prevent concurrent runs (non-root mode).

/var/run/monit.pid
    Lock file to help prevent concurrent runs (root mode, Linux
    systems).

/etc/monit.pid
    Lock file to help prevent concurrent runs (root mode, systems
    without /var/run).

~/.monit.state
    Monit saves its state to this file and uses the information found
    in it to recover from a crash. This is a binary file and its
    content is only of interest to Monit. You may set the location of
    this file in the Monit control file or by using the -s switch when
    Monit is started.

~/.monit.id
    Monit saves its unique id to this file.
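
The search order for the control file described above can be sketched
in shell. This is only an illustration of the documented order when no
-c option is given; find_monitrc is a hypothetical helper and does not
call Monit:

```shell
# find_monitrc is a hypothetical helper, not part of Monit.
# It prints the first control file found in the documented order:
# ~/.monitrc, then /etc/monitrc, then ./monitrc.
find_monitrc() {
    for f in "$HOME/.monitrc" /etc/monitrc ./monitrc; do
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    # No control file found in any of the three locations.
    return 1
}
```

A wrapper script might use it as 'monit -t -c "$(find_monitrc)"' to
run a syntax check on whichever file would be picked up.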

No environment variables are used by Monit. However, when Monit
executes a script or a program, it sets several environment variables
which can be used by the executable. The following, and only the
following, environment variables are available:

MONIT_EVENT
    The event that occurred on the service.

MONIT_DESCRIPTION
    A description of the error condition.

MONIT_SERVICE
    The name of the service (from monitrc) on which the event occurred.

MONIT_DATE
    The time and date (RFC 822 style) the event occurred.

MONIT_HOST
    The host the event occurred on.

The following environment variables are only available for process
service entries:

MONIT_PROCESS_PID
    The process pid. This may be 0 if the process was (re)started.

MONIT_PROCESS_MEMORY
    Process memory. This may be 0 if the process was (re)started.

MONIT_PROCESS_CHILDREN
    Process children. This may be 0 if the process was (re)started.

MONIT_PROCESS_CPU_PERCENT
    Process cpu%. This may be 0 if the process was (re)started.

In addition, the following spartan PATH environment variable is
available:

    PATH=/bin:/usr/bin:/sbin:/usr/sbin

Scripts or programs that depend on other environment variables or on a
more verbose PATH must provide the means to set these variables
themselves.
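
As a sketch of how an executed program might use these variables, the
following shell fragment extends the PATH and formats the event into
one line. The function name describe_event and the extra directory
/usr/local/bin are assumptions made for illustration; only the MONIT_*
variable names come from the list above:

```shell
# Monit provides only a minimal PATH, so extend it here if the script
# needs more. /usr/local/bin is an assumed example directory.
PATH="$PATH:/usr/local/bin"
export PATH

# describe_event is a hypothetical function name. It prints one line
# built from the variables Monit sets for an executed program, with
# fallbacks in case the script is run by hand outside Monit.
describe_event() {
    printf '%s: %s on %s at %s (%s)\n' \
        "${MONIT_SERVICE:-unknown}" \
        "${MONIT_EVENT:-unknown}" \
        "${MONIT_HOST:-unknown}" \
        "${MONIT_DATE:-unknown}" \
        "${MONIT_DESCRIPTION:-no description}"
}
```

A script started from an exec action could call describe_event and
append the resulting line to a file or pass it to a mailer.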

If a Monit daemon is running, SIGUSR1 wakes it up from its sleep phase
and forces a poll of all services. SIGTERM and SIGINT will gracefully
terminate a Monit daemon. The SIGTERM signal is sent to a Monit daemon
if Monit is started with the quit action argument.

Sending a SIGHUP signal to a running Monit daemon will force the daemon
to reinitialize itself. Specifically, it will reread its configuration
and close and reopen its log files.

Running Monit in the foreground while a background Monit daemon is
running will wake up the daemon.
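
The signals above can also be sent by hand with kill(1). A minimal
sketch, assuming the daemon's pid is stored in a lock file such as
/var/run/monit.pid; signal_monit is a hypothetical helper and not part
of Monit:

```shell
# signal_monit is a hypothetical helper, not a Monit command.
# $1 is the path of the lock (pid) file, $2 is a signal name without
# the SIG prefix, e.g. USR1, HUP or TERM.
signal_monit() {
    pidfile=$1
    sig=$2
    # Fail if the lock file is missing or unreadable.
    [ -r "$pidfile" ] || return 1
    kill -s "$sig" "$(cat "$pidfile")"
}
```

For example, 'signal_monit /var/run/monit.pid USR1' would force a poll
of all services, and 'signal_monit /var/run/monit.pid HUP' would make
the daemon reread its configuration.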

This is a very silent program. Use the -v switch if you want to see
what Monit is doing, and tail -f the logfile. Optionally, for testing
purposes, you can start Monit with the -Iv switch. Monit will then
print debug information to the console. To stop Monit in this mode,
simply press CTRL+C (i.e. SIGINT) in the same console.

The syntax (and parser) of the control file was inspired by the
excellent fetchmail program by Eric S. Raymond et al. Some portions of
this man page were also inspired by the same authors.

Jan-Henrik Haukeland <hauk@tildeslash.com>, Martin Pala
<martinp@tildeslash.com>, Christian Hopp <chopp@iei.tu-clausthal.de>,
Rory Toma <rory@digeo.com>

See also http://mmonit.com/monit/who/

Copyright (C) 2010 by Tildeslash Ltd. All Rights Reserved. This product
is distributed in the hope that it will be useful, but WITHOUT any
warranty; without even the implied warranty of MERCHANTABILITY or
FITNESS for a particular purpose.

GNU text utilities; md5sum(1); sha1sum(1); openssl(1); glob(7);
regex(7); http://mmonit.com/

February 23. 2010              www.tildeslash.com              MONIT(1)