MONIT(1) User Commands MONIT(1)

6 Monit - utility for monitoring services on a Unix system
7
9 monit [options] <arguments>
10
Monit is a utility for managing and monitoring processes, programs,
files, directories and filesystems on a Unix system. Monit conducts
automatic maintenance and repair and can execute meaningful causal
actions in error situations. For example, Monit can start a process if
it is not running, restart a process if it does not respond and stop a
process if it uses too many resources. You can use Monit to monitor
files, directories and filesystems for changes, such as timestamp
changes, checksum changes or size changes.
20
21 Monit is controlled via an easy to configure control file based on a
22 free-format, token-oriented syntax. Monit logs to syslog or to its own
23 log file and notifies you about error conditions via customisable alert
24 messages. Monit can perform various TCP/IP network checks, protocol
25 checks and can utilise SSL for such checks. Monit provides a HTTP(S)
26 interface and you may use a browser to access the Monit program.
27
You can use Monit to monitor daemon processes or similar programs
running on localhost. Monit is particularly useful for monitoring
daemon processes, such as those started at system boot time, for
instance sendmail, sshd, apache and mysql. In contrast to many other
monitoring systems, Monit can act if an error situation should occur.
For example, if sendmail is not running, Monit can start sendmail again
automatically, or if apache is using too many resources (e.g. if a DoS
attack is in progress) Monit can stop or restart apache and send you an
alert message. Monit can also monitor process characteristics, such as
how much memory or how many CPU cycles a process is using.
39
You can also use Monit to monitor files, directories and filesystems on
localhost. Monit can monitor these items for changes, such as timestamp
changes, checksum changes or size changes. This is also useful for
security reasons: you can monitor the MD5 or SHA1 checksum of files
that should not change and get an alert or perform an action if they
do change.
46
Monit can monitor network connections to various servers, either on
localhost or on remote hosts. TCP, UDP and Unix Domain Sockets are
supported. Network tests can be performed at the protocol level; Monit
has built-in tests for the main Internet protocols, such as HTTP and
SMTP. Even if a protocol is not supported you can still test the server
because you can configure Monit to send any data and test the response
from the server.
54
55 Monit can be used to test programs or scripts at certain times, much
56 like cron, but in addition, you can test the exit value of a program
57 and perform an action or send an alert if the exit value indicates an
58 error. This means that you can use Monit to perform any type of check
59 you can write a script for.
60
61 Finally, Monit can be used to monitor general system resources on
62 localhost such as overall CPU usage, Memory and System Load.
63
65 The behaviour of Monit is controlled by command-line options and a run
66 control file, monitrc, the syntax of which we describe in a later
67 section. Command-line options override .monitrc declarations.
68
69 The default location for monitrc is ~/.monitrc. If this file does not
70 exist, Monit will try /etc/monitrc and a few other places. See FILES
71 for details. You can also specify the control file directly by using
72 the -c command-line switch to monit. For instance,
73
74 $ monit -c /var/monit/monitrc
75
76 Before Monit is started the first time, you can test the control file
77 for syntax errors:
78
$ monit -t
Control file syntax OK
81
If there is an error, Monit will print an error message to the
console, including the line number in the control file where the
error was found.
85
86 Once you have a working Monit control file, simply start Monit from the
87 console, like so:
88
89 $ monit
90
91 You can change some configuration directives via command-line switches,
92 but for simplicity it is recommended that you put these in the control
93 file.
94
Monit will detach from the terminal and run as a background process,
i.e. as a daemon process. As a daemon, Monit runs in cycles: it
monitors services, then goes to sleep for a configured period, then
wakes up and starts monitoring again in an endless loop.
99
100 Options
101 The following options are recognized by Monit. However, it is
102 recommended that you set options (when applicable) directly in the
103 .monitrc control file.
104
105 -c file
106 Use this control file
107
108 -d n
109 Run Monit as a daemon once per n seconds. Or use "set
110 daemon" in monitrc.
111
112 -g name
113 Set group name for start, stop, restart, monitor, unmonitor,
114 status and summary action.
115
116 -l file
117 Print log information to this file. Or use "set log"
118 in monitrc.
119
120 -p pidfile
121 Use this lock file in daemon mode. Or use "set pidfile"
122 in monitrc.
123
124 -s statefile
125 Write state information to this file. Or use "set
126 statefile" in monitrc.
127
128 -B
129 Batch command line mode (no tabular output and no colors). Or
130 use "set terminal batch" in monitrc.
131
132 -I
133 Do not run in background mode (needed to run from init). Or use
134 "set init" in monitrc.
135
136 -i
137 Print Monit's unique ID
138
139 -r
140 Reset Monit's unique ID. Use with caution
141
142 -t
143 Run syntax check for the control file
144
145 -v
146 Verbose mode, work noisy (diagnostic output)
147
148 -vv
149 Very verbose mode, same as -v plus log stack-trace on error
150
151 -H [filename]
152 Print MD5 and SHA1 hashes of the file or of stdin if the
153 filename is omitted; Monit will exit afterwards
154
155 -V
156 Print version number and patch level
157
158 -h
159 Print a help text
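
The -H option above prints the MD5 and SHA1 hashes of a file. As a rough illustration of what those two digests are, and not of Monit's own code or output format, the following Python sketch computes both for a file. The function name file_digests and the chunk size are assumptions made for this example.

```python
import hashlib


def file_digests(path, chunk_size=65536):
    """Return (md5_hex, sha1_hex) for the file at path.

    Hypothetical helper for illustration; the output format of
    "monit -H" is not reproduced here.
    """
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()
```

The two hex strings can be compared with the hashes that monit -H prints for the same file.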
160
161 Arguments
Once you have Monit running as a daemon process, you can call Monit
with one of the following arguments. Monit will then connect to the
Monit daemon (on TCP port 127.0.0.1:2812 by default) and ask the Monit
daemon to perform the requested action. In other words, calling monit
without arguments starts the Monit daemon, and calling monit with
arguments enables you to communicate with the Monit daemon process.
168
169 start all
170 Start all services listed in the control file and enable monitoring
171 for them. If the group option is set (-g), only start and enable
172 monitoring of services in the named group ("all" is not required in
173 this case).
174
175 start <name|pattern>
176 Start the named service and enable monitoring for it. The name is a
177 service entry name from the monitrc file. You can use a regex
178 pattern too (note that it is case insensitive).
179
180 stop all
181 Stop all services listed in the control file and disable their
182 monitoring. If the group option is set, only stop and disable
183 monitoring of the services in the named group ("all" is not
184 required in this case).
185
186 stop <name|pattern>
187 Stop the named service and disable its monitoring. The name is a
188 service entry name from the monitrc file. You can use a regex
189 pattern too (note that it is case insensitive).
190
191 restart all
192 Stop and start all services. If the group option is set, only
193 restart the services in the named group ("all" is not required in
194 this case).
195
196 restart <name|pattern>
197 Restart the named service. The name is a service entry name from
198 the monitrc file. You can use a regex pattern too (note that it is
199 case insensitive).
200
201 monitor all
202 Enable monitoring of all services listed in the control file. If
203 the group option is set, only start monitoring of services in the
204 named group ("all" is not required in this case).
205
206 monitor <name|pattern>
207 Enable monitoring of the named service. The name is a service entry
208 name from the monitrc file. Monit will also enable monitoring of
209 all services this service depends on. You can use a regex pattern
210 too (note that it is case insensitive).
211
212 unmonitor all
213 Disable monitoring of all services listed in the control file. If
214 the group option is set, only disable monitoring of services in the
215 named group ("all" is not required in this case).
216
217 unmonitor <name|pattern>
Disable monitoring of the named service. The name is a service
entry name from the monitrc file. Monit will also disable
monitoring of all services that depend on this service. You can
use a regex pattern too (note that it is case insensitive).
222
223 status [name|pattern]
224 Print service status information.
225
226 summary [name|pattern]
227 Print a short status summary.
228
229 report [up | down | initialising | unmonitored | total]
Report services state. The output can easily be parsed by scripts.
Without options, prints a short overview of the state of all
services managed by Monit. With the option up, prints the number of
services in that state; down and the other options work likewise.
234
235 reload
Reinitialise a running Monit daemon. The daemon will reread its
configuration and close and reopen its log files.
238
239 quit
240 Kill the Monit daemon process
241
242 validate
243 Check all services listed in the control file. This action is also
244 the default behaviour when Monit runs in daemon mode.
245
246 procmatch <regex>
Allows for easy testing of a pattern for a process match check. The
command takes a regular expression as an argument and displays all
running processes matching the pattern.
250
252 Monit is configured and controlled via a control file called monitrc.
253 The default location for this file is ~/.monitrc. If this file does not
254 exist, Monit will try /etc/monitrc, then @sysconfdir@/monitrc and
255 finally ./monitrc. If you build Monit from source, the value of
256 @sysconfdir@ can be given at configure time as ./configure
257 --sysconfdir. For instance, using ./configure --sysconfdir
258 /var/monit/etc will make Monit search for monitrc in /var/monit/etc
259
260 To protect the security of your control file and passwords the control
261 file must have read-write permissions no more than 0700 (u=xrw,g=,o=);
262 Monit will complain and exit otherwise.
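
The permission rule above can be checked before starting Monit. The following Python sketch is a hypothetical pre-flight check, not Monit's own validation code; the function name control_file_mode_ok is an assumption, and the check simply tests that no permission bits outside 0700 are set.

```python
import os
import stat


def control_file_mode_ok(path):
    """Return True if path has no permission bits beyond 0700.

    Hypothetical helper mirroring the "no more than 0700" rule in the
    text; it does not reproduce Monit's own error handling.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Any bit outside the owner's read/write/execute bits is too much.
    return (mode & ~0o700) == 0
```

A file with mode 0600 or 0700 passes this check, while a group-readable file such as 0640 does not.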
263
When there is a conflict between the command-line arguments and the
arguments in this file, the command-line arguments take precedence.
266
Monit uses its own Domain Specific Language (DSL). The control file
consists of a series of service entries and global option statements.
269
270 Comments begin with a '#' and extend through the end of the line.
271 Otherwise the file consists of a series of service entries or global
272 option statements in a free-format, token-oriented syntax.
273
274 You can use noise keywords like 'if', 'and', 'with(in)', 'has',
275 'us(ing|e)', 'on(ly)', 'then', 'for', 'of' anywhere in an entry to make
276 it resemble English. They're ignored, but can make entries much easier
277 to read at a glance. Keywords are case insensitive.
278
279 There are three kinds of tokens: grammar, numbers (i.e. decimal digit
280 sequences) and strings. Strings can be either quoted or unquoted. A
281 quoted string is bounded by double quotes and may contain whitespace
282 (and quoted digits are treated as a string). An unquoted string is any
283 whitespace-delimited token, containing characters and/or numbers.
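
To make the token rules above concrete, here is a small Python sketch of a line tokenizer. It is an approximation written for this illustration and is not Monit's parser: the names NOISE, TOKEN and tokenize are hypothetical, the noise-word list is taken from the paragraph above with the parenthesised parts read as optional, and unquoted tokens are labelled "word" because telling grammar keywords from unquoted strings needs the full grammar.

```python
import re

# Noise keywords from the text; parenthesised parts are treated as optional.
NOISE = re.compile(
    r"^(if|and|with(in)?|has|us(ing|e)|on(ly)?|then|for|of)$", re.IGNORECASE
)

TOKEN = re.compile(
    r'''
    \s*(?:
        (?P<comment>\#.*)        # a comment runs to the end of the line
      | "(?P<quoted>[^"]*)"      # quoted string, may contain whitespace
      | (?P<bare>[^\s"#]+)       # unquoted, whitespace-delimited token
    )
    ''',
    re.VERBOSE,
)


def tokenize(line):
    """Split one control-file line into (kind, value) pairs."""
    tokens = []
    pos = 0
    while pos < len(line):
        match = TOKEN.match(line, pos)
        if match is None:
            break  # only whitespace (or an unterminated quote) remains
        pos = match.end()
        if match.group("comment") is not None:
            break  # ignore the rest of the line
        if match.group("quoted") is not None:
            # Quoted digits stay a string, as the text describes.
            tokens.append(("string", match.group("quoted")))
            continue
        word = match.group("bare")
        if NOISE.match(word):
            continue  # noise keywords are dropped
        if word.isdigit():
            tokens.append(("number", int(word)))
        else:
            tokens.append(("word", word))
    return tokens
```

For example, the noise word "with" and the trailing comment disappear from a line such as check process sshd with pidfile "/var/run/sshd.pid" # note, leaving four words and one quoted string.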
284
285 On a semantic level, the control file consists of three types of
286 entries:
287
288 1. Global set-statements
289 A global set-statement starts with the keyword "set" and the item
290 to configure.
291
292 2. Global include-statement
293 The include statement consists of the keyword "include" and a glob
294 string. This statement is used to include configure directives from
295 separate files.
296
297 3. One or more service entry statements.
298
299 Service checks
Each service entry consists of the keyword "check", followed by the
301 service type. Each entry requires a unique descriptive name, which may
302 be freely chosen. This name is used by Monit to refer to the service
303 internally and in all interactions with the user. The name is case
304 insensitive.
305
306 Currently, nine types of check statements are supported:
307
308 Process
309
310 CHECK PROCESS <unique name> <PIDFILE <path> | MATCHING <regex>>
311
312 <path> is the absolute path to the program's pid-file. A pid-file is a
313 file, containing a Process's unique ID. If the pid-file does not exist
314 or does not contain the PID number of a running process, Monit will
315 call the entry's start method if defined.
316
<regex> is an alternative to using PID files and uses process name
pattern matching to find the process to monitor. The top-most matching
parent with the highest uptime is selected, so this form of check is
most useful if the process name is unique. A pid-file should be used
where possible as it defines the expected PID exactly. You can test
whether a process matches a pattern from the command-line using "monit
procmatch "regex-pattern"". This will list all processes matching the
regex-pattern.
325
326 File
327
328 CHECK FILE <unique name> PATH <path>
329
330 <path> is the absolute path to the file. If the file does not exist,
Monit will call the entry's start method if defined. If <path> does not
332 point to a regular file type (for instance a directory), Monit will
333 disable monitoring of this entry. If Monit runs in passive mode or the
334 start method is not defined, Monit will just send an alert on error.
335
336 Fifo
337
338 CHECK FIFO <unique name> PATH <path>
339
340 <path> is the absolute path to the fifo. If the fifo does not exist,
Monit will call the entry's start method if defined. If <path> does not
342 point to a fifo type (for instance a directory), Monit will disable
343 monitoring of this entry. If Monit runs in passive mode or the start
344 method is not defined, Monit will just send an alert on error.
345
346 Filesystem
347
348 CHECK FILESYSTEM <unique name> PATH <string>
349
350 <path> is the path to the device/disk, mount point or NFS/CIFS/FUSE
351 connection string. If the filesystem becomes unavailable, Monit will
352 call the service's start method if defined. If Monit runs in passive
353 mode or the start method is not defined, Monit will just send an alert
354 on error.
355
356 Directory
357
358 CHECK DIRECTORY <unique name> PATH <path>
359
360 <path> is the absolute path to the directory. If the directory does not
361 exist, Monit will call the entry's start method if defined. If <path>
does not point to a directory, Monit will disable monitoring of this
entry. If Monit runs in passive mode or the start method is not
defined, Monit will just send an alert on error.
365
366 Remote host
367
368 CHECK HOST <unique name> ADDRESS <host>
369
The host address can be specified as a hostname string or as an
IP-address string in dotted decimal format, such as "tildeslash.com" or
"64.87.72.95".
373
374 System
375
376 CHECK SYSTEM <unique name>
377
378 The unique name is usually the local host name, but any descriptive
379 name can be used. If you use the variable $HOST as the name, it will
380 expand to the hostname. This check allows one to monitor general system
381 resources such as CPU usage, total memory usage or load average. The
382 unique name is used as the system hostname in mail alerts and as the
383 initial name of the host entry in M/Monit.
384
385 Program
386
387 CHECK PROGRAM <unique name> PATH <executable file> [TIMEOUT <number> SECONDS]
388
389 <path> is the absolute path to the executable program or script. The
390 status test allows one to check the program's exit status. If the
391 program does not finish executing within <number> seconds, Monit will
392 terminate it. The default program timeout is 300 seconds (5 minutes).
The output of the program is recorded and made available in the User
Interface and in alerts, by default up to 512 bytes. You can change the
output limit using the set limits statement.
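
As an illustration of the kind of script a program check could run, the Python sketch below reports low disk space through its return value. Everything in it is an assumption made for this example: the function names, the "/" path and the 10% threshold are hypothetical, and the script is not part of Monit.

```python
import shutil


def free_space_status(path, min_free_fraction):
    """Return 0 if path has at least min_free_fraction free space, else 1."""
    usage = shutil.disk_usage(path)
    free_fraction = usage.free / usage.total
    return 0 if free_fraction >= min_free_fraction else 1


def main():
    # Hypothetical threshold: require at least 10% free space on "/".
    status = free_space_status("/", 0.10)
    # Keep the output short; the text notes that only a limited amount
    # of program output is recorded by default.
    print("free space check:", "ok" if status == 0 else "low")
    return status
```

To run this as a standalone script, end the file with raise SystemExit(main()) so that the return value becomes the process exit status, which is what the status test examines.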
396
397 Network
398
399 CHECK NETWORK <unique name> <ADDRESS <ipaddress> | INTERFACE <name>>
400
401 <ipaddress> is the IPv4 or IPv6 address of the monitored network
402 interface. It is also possible to use interface name, such as "eth0" on
403 Linux.
404
406 Monit will log status and error messages to a file or via syslog. Use
407 the set log statement in the monitrc control file.
408
To set up Monit to log to its own file, use e.g. set log
/var/log/monit.log. Note that the older set logfile statement is
deprecated, but it is still accepted as an alternative.
412
413 If syslog is given as a value for the "-l" command-line switch or the
414 keyword set log syslog is found in the control file, Monit will use the
415 syslog system daemon to log messages with a priority assigned to each
416 message based on the context.
417
To turn off logging, simply do not set the log in the control file (and
of course, do not use the -l switch).
420
421 The format for an entry in the log file is:
422
423 [date] priority : message
424
425 for example:
426
427 [2020-08-12T16:35:00+0200] info : 'localhost' Monit started
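
If you need to process the log file in a script, the entry format above can be split with a regular expression. The Python sketch below is an assumption-laden example, not an official parser: the names LOG_LINE and parse_log_line are hypothetical, and the timestamp is only parsed when it has the same layout as the example entry.

```python
import re
from datetime import datetime

LOG_LINE = re.compile(
    r"^\[(?P<date>[^\]]+)\]\s+(?P<priority>\S+)\s*:\s*(?P<message>.*)$"
)


def parse_log_line(line):
    """Split a log entry into date, priority and message, or return None."""
    match = LOG_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    entry = match.groupdict()
    try:
        # Same layout as the example entry: 2020-08-12T16:35:00+0200
        entry["timestamp"] = datetime.strptime(
            entry["date"], "%Y-%m-%dT%H:%M:%S%z"
        )
    except ValueError:
        # Other date layouts are kept as the raw string only.
        entry["timestamp"] = None
    return entry
```

Lines that do not follow the [date] priority : message layout yield None, so a caller can skip them.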
428
430 Monit uses ANSI escape sequences to colorise important parts of the
431 command-line output, if the terminal supports colors, and UTF-8 box
432 characters for tabular output.
433
434 If you want to process the monit CLI output in a script, you can use
435 either the -B option or use the following statement in the monit
436 configuration file to disable tabular output and colors completely:
437
438 SET TERMINAL BATCH
439
441 Use
442
443 SET DAEMON <seconds>
444 [[WITH] START DELAY <seconds>]
445
446 to specify Monit's poll cycle length and run Monit in daemon mode. You
447 must specify a numeric argument which is a polling interval in seconds.
448
In daemon mode, Monit detaches from the console, puts itself in the
background and runs continuously: it monitors each specified service,
goes to sleep for the given poll interval, then wakes up and starts
monitoring again in an endless cycle.
453
454 Alternatively, you can use the "-d" command line switch to set the poll
455 interval, but it is strongly recommended to set the poll interval in
456 your ~/.monitrc file, by using set daemon.
457
458 Monit will then always start in daemon mode. If you do not use this
459 statement and do not start monit with the -d option, Monit will just
460 run through the service checks once and then exit. This might be useful
461 in some situations, but Monit is primarily designed to run as a daemon
462 process.
463
464 Calling "monit" with a Monit daemon running in the background sends a
465 wake-up signal to the daemon, forcing it to check services immediately.
466 Calling "monit" with the quit argument will kill a running Monit daemon
467 process instead of waking it up.
468
469 The start delay option can be used to wait (once) before Monit starts
470 checking services after system reboot. Monit will by default start
471 checking services immediately at startup.
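
The poll cycle and the one-time start delay described above can be pictured with a short loop. This Python sketch only models the behaviour as described in the text and is not how Monit is implemented; run_poll_cycles, max_cycles and the injectable sleep parameter are hypothetical names added so the loop can be exercised without waiting.

```python
import time


def run_poll_cycles(checks, interval, start_delay=0, max_cycles=None,
                    sleep=time.sleep):
    """Run every check, sleep for interval seconds, and repeat.

    max_cycles and the injectable sleep exist only to make the sketch
    testable; a real daemon would loop until told to quit.
    """
    if start_delay > 0:
        sleep(start_delay)  # waited once, before the first cycle
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        for check in checks:
            check()
        cycles += 1
        sleep(interval)
    return cycles
```

With an interval of 30 and a start delay of 120, the loop waits 120 seconds once and then 30 seconds after each pass over the checks.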
472
474 The "set init" statement prevents Monit from transforming itself into a
475 daemon process. Instead Monit will run as a foreground process. (You
476 should still use "set daemon" to specify the poll cycle).
477
478 This is required to run Monit from init. Using init to start Monit is
479 probably the best way to run Monit if you want to be certain that you
480 always have a running Monit daemon on your system. Another option is to
481 run Monit from crontab. In any case, you should make sure that the
482 control file does not have any syntax errors before you start Monit
483 from init or crontab (use "monit -t" to check).
484
To set up Monit to run from init, you can either use the "set init"
486 statement in Monit's control file or use the "-I" option from the
487 command line. Here is what you must add to "/etc/inittab":
488
489 # Run Monit in standard run-levels
490 mo:2345:respawn:/usr/local/bin/monit -Ic /etc/monitrc
491
492 After you have modified init's configuration file, you can run the
493 following command to re-examine /etc/inittab and start Monit:
494
495 telinit q
496
497 For systems without telinit:
498
499 kill -1 1
500
501 If Monit is used to monitor services that are also started at boot time
502 (e.g. services started via SYSV init rc scripts or via inittab) then,
in some cases, a race condition could occur. That is, if a service is
504 slow to start, Monit can assume that the service is not running and
505 possibly try to start it and raise an alert, while, in fact the service
506 is already about to start or already in its startup sequence. Please
507 see the FAQ for a solution to this problem. The short version is to
508 start Monit on a higher run-level after system processes.
509
511 The Monit control file, "monitrc", can include additional configuration
512 files. This feature helps one to organise configuration into separate
513 files instead of having everything in one file, if you like this kind
514 of thing. Include statements can be placed at virtually any place in
515 "monitrc" though the convention is at the bottom. The syntax is the
516 following:
517
518 INCLUDE <globstring>
519
520 The globstring is any kind of string as defined in glob(7). Thus, you
521 can refer to a single file or you can load several files at once. If
522 you want to use whitespace in your string the globstring needs to be
523 embedded into quotes (') or double quotes ("). If the globstring
524 matches a directory instead of a file, it is silently ignored.
525
526 Any include statements in an included file are parsed as in the main
527 control file.
528
If the globstring matches several files, they are included in no
particular order. If you need to rely on a certain order, you should
avoid wild-card globbing and instead specify the full path of each
included file.
533
534 An example,
535
536 include /etc/monit.d/*.cfg
537
This will load any file matching the globstring, that is, all files in
/etc/monit.d whose names end with the suffix .cfg.
540
541 Up to 1024 include files are supported. If this limit is exceeded,
542 Monit will report an error.
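
Before relying on an include statement, it can help to preview which files a globstring matches. The Python sketch below is a hypothetical helper, not part of Monit: preview_include is an assumed name, and Python's glob module is only an approximation of the glob(7) matching that the text refers to.

```python
import glob
import os


def preview_include(globstring):
    """Return the regular files a globstring matches, in sorted order.

    Directories are skipped, since the text says Monit ignores them.
    Sorting is a convenience for reading the preview; it does not
    reflect the order in which Monit would include the files.
    """
    return sorted(p for p in glob.glob(globstring) if os.path.isfile(p))
```

Calling preview_include("/etc/monit.d/*.cfg") lists the candidate files for the include example above.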
543
545 Common SSL/TLS options can be set using the following statement and
546 will apply to all SSL connections made through Monit:
547
548 SET <SSL | TLS> [OPTIONS] {
549 VERSION: <AUTO | [-]SSLV2 | [-]SSLV3 | [-]TLSV1 | [-]TLSV11 | [-]TLSV12 | [-]TLSV13>, ...
550 VERIFY: <ENABLE | DISABLE>
551 SELFSIGNED: <ALLOW | REJECT>
552 CIPHERS: <string>
553 PEMFILE: <path>
554 PEMCHAIN: <path>
555 PEMKEY: <path>
556 CLIENTPEMFILE: <path>
557 CACERTIFICATEFILE: <path>
558 CACERTIFICATEPATH: <path>
559 }
560
VERSION sets the specific SSL/TLS version to use. By default Monit uses
AUTO. In AUTO mode, only TLS 1.2 and 1.3 are allowed; all other
protocols are considered obsolete. If you want to use an obsolete
protocol you must explicitly set the version. You can exclude a
protocol using the "-" prefix. Exclude list example:

set ssl {
version: auto -sslv2 -sslv3 -tlsv1 -tlsv11
}

Example of allowed protocols list:

set ssl {
version: tlsv12 tlsv13
}
572
573 VERIFY enable SSL server certificate verification. This will verify and
574 report an error if the server certificate is not trusted, not valid or
575 has expired. By default certificate verification is disabled, though we
576 recommend enabling it, otherwise there is no guarantee that Monit
577 speaks with the server you think it speaks with.
578
SELFSIGNED self-signed certificates are rejected by default. Use this
option to allow self-signed certificates. Warning: not recommended in
production for security reasons, as in such a case the client cannot
verify that it talks to the correct server, and attacks such as
man-in-the-middle or DNS hijacking are possible.
584
585 CIPHERS override default SSL/TLS ciphers.
586
PEMFILE set the path to the SSL server certificate "database-file" in
PEM format. This option has effect only for the Monit HTTP interface.

As an alternative to setting PEMFILE with a combined chain-key file,
PEMCHAIN and PEMKEY set the paths to the SSL certificate chain and the
server private key file, respectively, in PEM format. These options
have effect only for the Monit HTTP interface.
594
CLIENTPEMFILE set the path to the PEM encoded SSL client certificates
database file. If set, client certificate authentication is enabled.
597
CACERTIFICATEFILE set the path to the PEM encoded file containing
Certificate Authority (CA) certificates. Monit uses OpenSSL's default
CA certificates if this option is not used (openssl version -d can be
used to locate the default OpenSSL directory). Many distributions come
with SSL and CA certificates already set up, so using this option is
normally not necessary.

CACERTIFICATEPATH set the path to the directory containing Certificate
Authority (CA) certificates. Monit uses OpenSSL's default CA
certificates if this option is not used. Many distributions come with
SSL and CA certificates already set up, so using this option is
normally not necessary.
610
The SSL options statement applies globally to all SSL/TLS connections
made through Monit. SSL options can also be set in a local check, in
613 mailserver settings or in the mmonit statement, and will then override
614 or extend the global settings.
615
616 To set global SSL options, put this statement near the top of your
617 .monitrc file:
618
619 set ssl options {...}
620
621 Here is an example of setting both global and local SSL options:
622
623 # Enable certificate verification for all SSL connections
624 # Self-signed certificates are not allowed by default
625 set ssl options {
626 verify: enable
627 }
628
629 # Verify certificate (via global setting)
630 # Allow self-signed certificate for this check
631 check host example with address example.com
632 if failed
633 port 443
634 protocol https
635 with ssl options {selfsigned: allow}
636 then alert
637
638 # Do not verify example2.com's certificate (override global setting)
639 check host example2 with address example2.com
640 if failed
641 port 443
642 protocol https
643 with ssl options {verify: disable}
644 then alert
645
647 To enable FIPS mode (provided your OpenSSL library supports it), add
648 this statement to Monit control file:
649
650 SET FIPS
651
653 If specified in the control file, Monit will start with HTTP support.
654 You can then use Monit CLI to start and stop services, disable or
655 enable service monitoring as well as view the status of each service.
656
657 If HTTP support is enabled over TCP rather than over a Unix Socket, you
658 can also view Monit's informative dashboard in your web browser.
659
Note that if HTTP support is disabled, the Monit CLI interface will
have reduced functionality, as most CLI commands (such as "monit
status") need to communicate with the Monit background process via the
HTTP interface. We strongly recommend having HTTP support enabled. If
security is a concern, bind the HTTP interface to localhost only or
use a Unix Socket so Monit is not accessible from the outside.
666
667 UNIX SOCKET
668 Syntax for Unix Socket:
669
670 SET HTTPD UNIXSOCKET <path>
671 [UID <uid | username>]
672 [GID <gid | groupname>]
673 [PERMISSION <octal number>]
674 ALLOW <user:password>+
675
676 Example:
677
678 set httpd unixsocket /var/run/monit.sock
679 allow username:password
680
681 UNIXSOCKET set the path to the Unix Socket Monit should bind to and
682 listen on.
683
684 UID Socket owner (optional, defaults to the user who executes Monit)
685
686 GID Socket group (optional, defaults to primary group of the user who
687 executes Monit)
688
689 PERMISSION Socket permissions - absolute octal mode (optional, process
690 UMASK is applied by default)
691
692 TCP PORT
693 Syntax for TCP port:
694
695 SET HTTPD PORT <number>
696 [ADDRESS <hostname | IP-address>]
697 [[with] SSL {pemfile: <path>}]
698 ALLOW <user:password | IP-address | IP-range>+
699
PORT set the port Monit should bind to and listen on. Monit is usually
set up on port 2812. Example:
702
703 set httpd port 2812
704 allow username:password
705
706 You can now use <http://localhost:2812/> to access Monit's web
707 interface from a browser, after you have entered username and password
708 as credentials. You might need to use double quotes around the password
709 if it contains special chars such as "p@ssw:r#".
710
ADDRESS make Monit listen on a specific interface only. For example, if
you don't want to expose Monit's web interface to the network, bind it
to localhost only. Monit will accept connections on any address if the
ADDRESS option is not used:
715
716 set httpd
717 port 2812
718 use address 127.0.0.1
719 allow username:password
720
721 Monit HTTP over TCP supports both IP version 4 and 6. Support is
722 transparent and does not require any special configuration. If the bind
723 address is not specified as in this example:
724
725 set httpd
726 port 2812
727 allow ...
728
729 Monit will bind to and listen on port 2812 on all interfaces, both IPv4
730 and IPv6 if available. To force Monit HTTP to only listen on and accept
731 connections over IP version 6, specify an IPv6 address:
732
733 set httpd
734 port 2812
735 use address "fe80::222:19ff:fe53:6c59"
736 allow ...
737
738 Likewise, to force Monit HTTP to only listen on and accept connections
739 over IP version 4, specify an IPv4 address:
740
741 set httpd
742 port 2812
743 use address 62.109.39.247
744 allow ...
745
746 SSL settings
747
748 SSL enable SSL/TLS for Monit's web interface. See options for full
749 list of SSL options.
750
751 PEMFILE sets the path to the PEM encoded file, which contains the
752 server's private key and certificate. This file should be stored in a
753 safe place on the filesystem and should have strict permissions, no
754 more than 0700.
755
As an alternative, PEMCHAIN and PEMKEY set the paths to a separate PEM
encoded certificate chain file and private key file. The key file
should be stored in a safe place on the filesystem and should have
strict permissions, no more than 0700.
760
761 Example for using pemfile:
762
763 set httpd
764 port 2812
765 with ssl {
766 pemfile: /etc/ssl/certs/monit.pem
767 }
768 allow myuser:mypassword
769
770 Example for using separate certificate chain and key:
771
772 set httpd
773 port 2812
774 with ssl {
775 pemchain: /etc/ssl/certs/monit.chain.pem
776 pemkey: /etc/ssl/certs/monit.key.pem
777 }
778 allow myuser:mypassword
779
780 You can now use <https://localhost:2812/> to access the Monit web
781 server over a TLS encrypted connection.
782
783 Self-signed server certificates note: The Monit CLI works on a client-
784 server basis and uses the Monit HTTP GUI to collect status from the
785 Monit daemon and pass commands like start/stop to it. As self-signed
786 certificates are rejected by default for security reasons, the CLI
787 won't work unless you explicitly allow it by using the SELFSIGNED:
788 ALLOW option:
789
790 set httpd
791 port 2812
792 with ssl {
793 pemfile: /etc/ssl/certs/monit.pem
794 selfsigned: allow
795 }
796 allow myuser:mypassword
797
798 CLIENTPEMFILE enables a client certificate based authentication and
799 sets the path to a PEM encoded database file, that contains a list of
800 allowed client certificates. A connecting client has to provide a
801 certificate known to Monit (listed in clientpemfile), otherwise it is
rejected. This file must also include all necessary CA certificates. By
default self-signed client certificates are rejected for security
reasons. If you want to allow self-signed client certificates
(recommended only for testing), you have to allow them explicitly using
the SELFSIGNED: ALLOW option (see the example above). See your
browser's documentation for how to import a client certificate.
808
809 Example:
810
    set httpd
        port 2812
        with SSL {
            pemfile: /etc/ssl/certs/monit.pem
            clientpemfile: /etc/ssl/certs/monit-client.pem
        }
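
If the client certificates are self-signed, both options can appear in
the same SSL block. A sketch for test setups only, reusing the paths
from the examples above:

    set httpd
        port 2812
        with ssl {
            pemfile: /etc/ssl/certs/monit.pem
            clientpemfile: /etc/ssl/certs/monit-client.pem
            selfsigned: allow
        }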
817
818 Monit version signature
SIGNATURE can be used to hide the Monit version from the HTTP response
header and error pages. For example:
821
822 set httpd
823 port 2812
824 signature disable
825 allow myuser:mypassword
826
827 Authentication
828 Access to the Monit web interface is controlled primarily via the ALLOW
829 option which is used to specify authentication and authorise only
830 specific clients to connect.
831
832 If the Monit command line interface is being used, at least one
833 cleartext password is necessary (see below), otherwise the Monit
834 command line interface will not be able to connect to the Monit web
835 interface.
836
837 Clients that try to connect to Monit, but submit a wrong username
838 and/or password are logged with their IP-address.
839
840 Client certificates
841
This is a strong authentication mechanism that employs HTTPS client
certificates to verify the authenticity of a connecting client. Clients
must possess a Public Key Certificate known by Monit. The client must
connect to Monit over SSL and Monit will ask the client to send its
certificate. Upon receiving the certificate, Monit compares it to the
certificates located in the CLIENTPEMFILE file. Access is granted if
the client certificate is in this file. See SSL settings for details.
850
851 Basic Authentication
852
853 Monit supports Basic Authentication as described in RFC 2617.
854
In short, a server challenges a client (e.g. a browser) to send
authentication information (username and password) and, if accepted,
the server will allow the client access to the requested document.
858
The biggest weakness of Basic Authentication is that the username and
password are sent in clear-text over the network (i.e. merely base64
encoded). It is therefore recommended that you do not use this
authentication method unless you run Monit with SSL support. With SSL,
it is safe to use Basic Authentication since all HTTP data, including
Basic Authentication headers, will be encrypted.
865
866 Cleartext user and password
867
868 Monit will use Basic Authentication if an allow statement contains a
869 username and a password separated with a single ':' character.
870
871 Note: Special characters can be used, but for non-alphanumerics the
872 password has to be quoted.
873
874 Syntax:
875
876 ALLOW <username>:<password>
877
878 Host and network allow list
879
Monit maintains an access-control list of hosts and networks allowed to
connect. You can add as many hosts as you want, but each host must be
given by a valid domain name or by its IP address.

Monit will query a name server to check any hosts trying to connect. If
a host (client) is trying to connect, but cannot be found in the access
list or cannot be resolved, Monit will shut down the connection to the
client promptly.
888
889 Control file example:
890
    set httpd port 2812
        allow localhost
        allow my.other.work.machine.com
        allow 10.1.1.1
        allow 192.168.1.0/255.255.255.0
        allow 10.0.0.0/8
897
Clients that are not mentioned in the allow list and try to connect to
Monit will be denied access and are logged with their IP-address.
900
901 PAM
902
903 PAM is supported on platforms which provide PAM (such as Linux, macOS,
904 FreeBSD, NetBSD).
905
906 Syntax:
907
908 ALLOW @<group>
909
910 where "group" is the group name allowed to access Monit's web
911 interface. Monit uses a PAM service called monit for PAM
912 authentication, see the PAM manual page for detailed instructions on
913 how to set the PAM service and PAM authentication plugins.
914
915 Sample PAM service for Monit on macOS (store as "/etc/pam.d/monit"
916 file):
917
    # monit: auth account password session
    auth       sufficient     pam_securityserver.so
    auth       sufficient     pam_unix.so
    auth       required       pam_deny.so
    account    required       pam_permit.so
923
924 A "monitrc" config which only allows group "admin" authenticated via
925 PAM to access the web interface:
926
927 set httpd
928 port 2812
929 allow @admin
930
931 htpasswd file
932
Alternatively, you can store credentials in a "htpasswd" formatted file
(one user:passwd entry per line), like so: allow [cleartext|crypt|md5]
/path [users]. The default is cleartext passwords. If passwords are
digested, it is necessary to specify the cryptographic method. If you do
not want all users in the password file to have access to Monit, you
can specify only those users that should have access in the allow
statement. Otherwise all users are added.
940
941 Example1:
942
943 set httpd port 2812
944 allow md5 /etc/httpd/htpasswd john paul ringo george
945
946 If you use this method together with a host list, then only clients
947 from the listed hosts will be allowed to connect to the Monit HTTP
948 server and each client will be asked to provide a username and a
949 password.
950
951 Example2:
952
    set httpd port 2812
        allow localhost
        allow 10.1.1.1
        allow hauk:"passw@rd"
957
958 If you only want to use Basic Authentication, then just provide allow
959 entries with username and password or password files as in example 1
960 above.
961
962 Read-only users
963
964 Further it is possible to define some users as read-only. A read-only
965 user can read the Monit web pages but will not get access to push-
966 buttons and cannot change a service from the web interface.
967
    set httpd port 2812
        allow admin:password
        allow hauk:password read-only
        allow @admins
        allow @users read-only
973
974 A user is set to read-only by using the read-only keyword after
975 username:password. In the above example the user hauk is defined as a
976 read-only user, while the admin user has all access rights.
977
978 Read-only http server
979
Finally, it is possible to restrict the entire web interface to
read-only. All users, regardless of whether they are defined with or
without the read-only keyword, have only the permissions described
above. When using this setting it is recommended to set up a UNIXSOCKET
as well, otherwise the monit CLI will not work.
985
    set httpd
        port 2812
        read-only
        unixsocket /run/monit.socket
        allow @users
991
993 Monit will raise an alert in the following situations:
994
    o A service does not exist (e.g. process is not running)
    o Cannot read service data (e.g. cannot get filesystem usage)
    o Execution of a service related script failed (e.g. start failed)
    o Invalid service type (e.g. if path points to directory instead of file)
    o Custom test script returned error
    o Ping test failed
    o TCP/UDP connection and/or port test failed
    o Resource usage test failed (e.g. cpu usage too high)
    o Checksum mismatch or change (e.g. file changed)
    o File size test failed (e.g. file too large)
    o Timestamp test failed (e.g. file is older than expected)
    o Permission test failed (e.g. file mode doesn't match)
    o A UID test failed (e.g. file owned by different user)
    o A GID test failed (e.g. file owned by different group)
    o A process's PID changed out of Monit's control
    o A process's PPID changed out of Monit's control
    o Too many service recovery attempts failed
    o A file content test found a match
    o Filesystem flags changed
    o A service action was performed by administrator
    o A network link went down or up
    o A network link capacity changed
    o A network link saturation test failed
    o A network link upload/download rate test failed
    o Monit was started, stopped or reloaded
1020
1021 To get an alert via e-mail, set the alert target using the global "set
1022 alert" statement (for all services) or the "alert" statement in the
1023 context of a service entry (for a single service).
1024
1025 Setting an alert recipient
1026 If an event occurs, Monit will send an alert. There are two kinds of
1027 alert statement: global and local.
1028
1029 Global syntax:
1030
1031 SET ALERT mail-address [[NOT] {event, ...}] [REMINDER cycles]
1032
1033 Example:
1034
1035 set alert foo@bar
1036
1037 will send a default email to the address foo@bar whenever any event
1038 occurs on any service.
1039
1040 If you want to send alert messages to more email addresses, add a "set
1041 alert 'email'" statement for each address.
1042
1043 It is also possible to use the local alert statement in the context of
1044 a service check to enable alert for the given service only:
1045
1046 ALERT mail-address [[NOT] {event, ...}] [REMINDER cycles]
1047
1048 Local alert example:
1049
    check host myhost with address 1.2.3.4
        if failed port 3306 protocol mysql then alert
        if failed port 80 protocol http then alert
        alert foo@baz # Local service alert
1054
1055 You can combine global and local alert statements. If there is a
1056 conflict, the local alert has precedence and overrides the global
1057 statement.
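
A sketch of such a combination; the addresses and the sshd service
entry are placeholders chosen here. With these statements, events for
sshd go to both recipients, while events for every other service go to
ops@example.org only:

    set alert ops@example.org               # global, all services

    check process sshd with pidfile /var/run/sshd.pid
        alert security@example.org          # local, sshd only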
1058
1059 Setting an event filter
1060
1061 If you only want an alert message sent for certain events, list them in
1062 an "{event, ...}" block, e.g.:
1063
1064 set alert foo@bar only on { timeout, nonexist }
1065
1066 The event list can also be negated to send alerts for all events except
1067 those which are listed, by prepending the list with the word "not". For
1068 example, to receive all alerts except notification about Monit program
1069 start and stop:
1070
1071 set alert foo@bar but not on { instance }
1072
1073 Here is a list of all possible event types emitted by Monit. Values
1074 from the first column can be used in the event filter list mentioned
1075 above:
1076
    Event:     | Failure state:              | Success state:
    -----------------------------------------------------------------------
    action     | "Action failed"             | "Action done"
    checksum   | "Checksum failed"           | "Checksum succeeded"
    bytein     | "Download bytes exceeded"   | "Download bytes ok"
    byteout    | "Upload bytes exceeded"     | "Upload bytes ok"
    connection | "Connection failed"         | "Connection succeeded"
    content    | "Content failed"            | "Content succeeded"
    data       | "Data access error"         | "Data access succeeded"
    exec       | "Execution failed"          | "Execution succeeded"
    fsflags    | "Filesystem flags failed"   | "Filesystem flags succeeded"
    gid        | "GID failed"                | "GID succeeded"
    icmp       | "Ping failed"               | "Ping succeeded"
    instance   | "Monit instance changed"    | "Monit instance changed not"
    invalid    | "Invalid type"              | "Type succeeded"
    link       | "Link down"                 | "Link up"
    nonexist   | "Does not exist"            | "Exists"
    packetin   | "Download packets exceeded" | "Download packets ok"
    packetout  | "Upload packets exceeded"   | "Upload packets ok"
    permission | "Permission failed"         | "Permission succeeded"
    pid        | "PID failed"                | "PID succeeded"
    ppid       | "PPID failed"               | "PPID succeeded"
    resource   | "Resource limit matched"    | "Resource limit succeeded"
    saturation | "Saturation exceeded"       | "Saturation ok"
    size       | "Size failed"               | "Size succeeded"
    speed      | "Speed failed"              | "Speed ok"
    status     | "Status failed"             | "Status succeeded"
    timeout    | "Timeout"                   | "Timeout recovery"
    timestamp  | "Timestamp failed"          | "Timestamp succeeded"
    uid        | "UID failed"                | "UID succeeded"
    uptime     | "Uptime failed"             | "Uptime succeeded"
1108
Each alert recipient can have its own filter, for example:
1110
1111 set alert foo@bar { nonexist, timeout, resource, icmp, connection }
1112 set alert security@bar on { checksum, permission, uid, gid }
1113 set alert admin@bar
1114
1115 Setting an error reminder
1116
1117 Monit by default sends just one notification if a service failed and
1118 another when/if it recovers. If you want to be notified that the
1119 service is still in a failed state, you can use the reminder option in
1120 the alert statement:
1121
1122 SET ALERT mail-address [WITH] REMINDER [ON] number [CYCLES]
1123
For example, if you want to be notified every tenth cycle while a
service remains in a failed state, you can use:
1126
1127 alert foo@bar with reminder on 10 cycles
1128
1129 Likewise if you want to be notified on each failed cycle, you can use:
1130
1131 alert foo@bar with reminder on 1 cycle
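
The event filter and the reminder can be given in the same statement,
as the global syntax shown earlier allows. A sketch with a placeholder
address, combining the filter and reminder forms used in the examples
above:

    set alert foo@bar only on { nonexist, timeout } with reminder on 10 cycles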
1132
1133 Disabling alerts for some service
1134 To suppress alerts for some user and service, add the "noalert"
1135 statement in the context of a service check.
1136
1137 NOALERT mail-address
1138
1139 Example (send all alerts to foo@bar except for service p3):
1140
    set alert foo@bar

    check process p1 with pidfile /var/run/p1.pid

    check process p2 with pidfile /var/run/p2.pid

    check process p3 with pidfile /var/run/p3.pid
        noalert foo@bar
1149
1150 Message format
1151 The alert message format can be modified by using the "set mail-format"
1152 statement:
1153
1154 SET MAIL-FORMAT {mail-format}
1155
1156 Example:
1157
    set mail-format {
        from: Monit Support <monit@foo.bar>
        reply-to: support@domain.com
        subject: $SERVICE $EVENT at $DATE
        message: Monit $ACTION $SERVICE at $DATE on $HOST: $DESCRIPTION.
                 Yours sincerely,
                 monit
    }
1166
1167 The from: option is the sender's email address for Monit alerts. A
1168 sender's name is optional, but if used, requires that the subsequent
1169 email-address is enclosed in angle brackets as in the example above.
1170
1171 The reply-to: option can be used to set the reply-to mail header,
1172 optionally with a name.
1173
1174 The subject: option sets the message subject and must be on only one
1175 line.
1176
1177 The message: option sets the mail body. This option should always be
1178 the last in a mail-format statement. The mail body can be as long as
1179 needed, but must not contain the block-closing '}' character.
1180
1181 You need not use all options, only the option which you want to
1182 override. For example to globally change the sender address only:
1183
1184 set mail-format { from: bofh@foo.bar }
1185
1186 The subject and body may contain $NAME variables, which are expanded by
1187 Monit. Here is a list of variables that can be used when composing an
1188 alert message.
1189
1190 • $EVENT
1191
1192 A string describing the event that occurred.
1193
1194 • $SERVICE
1195
1196 The service name
1197
1198 • $DATE
1199
1200 The current time and date (RFC 822 date style).
1201
1202 • $HOST
1203
1204 The name of the host Monit is running on
1205
1206 • $ACTION
1207
1208 The name of the action which was done by Monit.
1209
1210 • $DESCRIPTION
1211
1212 The description of the error condition
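
A sketch of a mail format that uses every variable listed above. The
sender address and the layout of the body are illustrative choices, not
a prescribed format:

    set mail-format {
        from: Monit <monit@foo.bar>
        subject: $SERVICE on $HOST - $EVENT
        message: $EVENT for service $SERVICE
                 Date:        $DATE
                 Host:        $HOST
                 Action:      $ACTION
                 Description: $DESCRIPTION
    }

Note that message: comes last and that the body contains no '}'
character, as required above.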
1213
1214 Setting a mail server for alert delivery
1215 The mail server Monit should use to send alert messages is defined with
1216 a "set mailserver" statement:
1217
    SET MAILSERVER
        <hostname|ip-address>
            [PORT number]
            [USERNAME string] [PASSWORD string]
            [using SSL [with options {...}]]
            [CERTIFICATE CHECKSUM [MD5|SHA1] <hash>],
        ...
        [with TIMEOUT X SECONDS]
        [using HOSTNAME hostname]
1227
1228 Multiple mail servers can be set by using a comma separated list. If
1229 Monit cannot connect to the first server, it will try the next in the
1230 list and so on.
1231
The port statement allows one to override the default SMTP port (465
for SSL, or 25 for TLS and non-secure connections).
1234
1235 Monit supports AUTH PLAIN and AUTH LOGIN for SMTP authentication. You
1236 can set a username and a password using the USERNAME and PASSWORD
1237 options.
1238
You can set SSL/TLS options for the connection and also check an SSL
certificate checksum.

The default connection timeout is 5 seconds. You can raise this limit
using the TIMEOUT option.
1244
1245 Example (setting two mail servers for failover):
1246
1247 set mailserver smtp.gmail.com, smtp.other.host
1248
1249 By default, Monit uses the local host name in SMTP HELO/EHLO and in the
1250 Message-ID header. You can override this using the HOSTNAME option.
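
A sketch that puts several of these options together, following the
option order of the syntax summary above. The host names and the
credentials are placeholders:

    set mailserver smtp.example.org port 465
            username "monit" password "secret"
            using ssl
        with timeout 15 seconds
        using hostname monit.example.org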
1251
1252 Event queue
1253 If no mail server is available, Monit can queue events in the local
1254 file-system for retry until the mail server recovers.
1255
1256 If Monit is used with M/Monit, the event queue provides a safe event
1257 store for M/Monit in the case of temporary problems.
1258
1259 The event queue is persistent across Monit restarts and provided that
1260 the back-end filesystem is persistent, across system restart as well.
1261
1262 By default, the queue is disabled and if the alert handler fails, Monit
1263 will simply drop the alert message.
1264
1265 To enable the event queue, add the following statement:
1266
1267 SET EVENTQUEUE BASEDIR <path> [SLOTS <number>]
1268
1269 The <path> is the path to the directory where events will be stored.
1270
1271 Optionally if you want to limit the queue size, use the slots option to
1272 only store up to number event messages.
1273
1274 Example:
1275
1276 set eventqueue basedir /var/monit slots 5000
1277
If you are running more than one Monit instance on the same machine,
you must use separate event queue directories.
1280
1282 Each service can have associated start, stop and restart methods which
1283 Monit can use to execute action on the service.
1284
1285 Syntax:
1286
    <START | STOP | RESTART> [PROGRAM] = "program"
        [[AS] UID <number | string>]
        [[AS] GID <number | string>]
        [[WITH] TIMEOUT <number> SECOND(S)]
1291
If the "program" is a shell script, it must begin with "#!" and the
remainder of the first line must specify an interpreter for the
program, e.g. "#!/bin/sh".
1295
1296 The "program" must also be executable (for example mode 0755).
1297
1298 It's possible to write scripts directly into the program this way:
1299
1300 stop = "/bin/sh -c 'kill -s SIGTERM `cat /var/run/process.pid`'"
1301
1302 By default the program is executed as the user under which Monit is
1303 running. If Monit is running as root, you may optionally specify the
1304 UID and GID the executed program should switch to.
1305
1306 Example:
1307
    check process mmonit with pidfile /usr/local/mmonit/mmonit/logs/mmonit.pid
        start program = "/usr/local/mmonit/bin/mmonit" as uid "mmonit" and gid "mmonit"
        stop program = "/usr/local/mmonit/bin/mmonit stop" as uid "mmonit" and gid "mmonit"
1311
In the case of a process check, Monit will wait up to 30 seconds for
the start/stop action to finish before giving up and reporting an
error. You can override this timeout per method using the TIMEOUT
option, or globally using the set limits statement.
1316
1317 Example:
1318
    check process foobar with pidfile /var/run/foobar.pid
        start program = "/etc/init.d/foobar start" with timeout 60 seconds
        stop program = "/etc/init.d/foobar stop"
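
The same options apply to a restart method. A sketch reusing the foobar
entry from above; that the init script accepts a restart argument is an
assumption made here:

    check process foobar with pidfile /var/run/foobar.pid
        start program = "/etc/init.d/foobar start" with timeout 60 seconds
        stop program = "/etc/init.d/foobar stop" with timeout 60 seconds
        restart program = "/etc/init.d/foobar restart" with timeout 90 seconds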
1322
Services are checked regularly at an interval defined by the "set
daemon n" statement. Checks are performed in the same order as they are
written in the ".monitrc" file, except if dependencies are set up
between services, in which case prerequisite services are tested first.
1328
1329 It is possible to modify a service check schedule by using the "every"
1330 statement.
1331
1332 There are three variants:
1333
    1. A poll cycle multiple
       EVERY [number] CYCLES

    2. Cron-style
       EVERY [cron]

    3. Negative Cron-style (do-not-check)
       NOT EVERY [cron]
1342
A cron-style string consists of 5 fields separated by white-space.
All fields are required:
1345
    Name:        | Allowed values:            | Special characters:
    ---------------------------------------------------------------
    Minutes      | 0-59                       | * - ,
    Hours        | 0-23                       | * - ,
    Day of month | 1-31                       | * - ,
    Month        | 1-12 (1=jan, 12=dec)       | * - ,
    Day of week  | 0-6 (0=sunday, 6=saturday) | * - ,
1353
1354 The special characters:
1355
    Character:   | Description:
    ---------------------------------------------------------------
    * (asterisk) | The asterisk indicates that the expression will
                 | match for all values of the field; e.g., using
                 | an asterisk in the 4th field (month) would
                 | indicate every month.
    - (hyphen)   | Hyphens are used to define ranges. For example,
                 | 8-9 in the hour field indicates between 8AM and
                 | 9AM. Note that a range runs from the start time
                 | up to and including the end time; that is, from
                 | 8AM until 10AM unless minutes are set. As
                 | another example, 1-5 in the weekday field
                 | specifies monday to friday (including friday).
    , (comma)    | Commas are used to specify a sequence. For
                 | example, 17,18 in the day field indicates the
                 | 17th and 18th day of the month. A sequence can
                 | also include ranges. For example, 1-5,0 in the
                 | weekday field indicates monday to friday and
                 | sunday.
1374
1375 Example 1: Check once per two cycles
1376
    check process nginx with pidfile /var/run/nginx.pid
        every 2 cycles
1379
Example 2: Check every workday between 8AM and 7PM

    check program checkOracleDatabase
        with path /var/monit/programs/checkoracle.pl
        every "* 8-19 * * 1-5"
1385
Example 3: Do not run the check in the backup window on Sunday between
0AM and 3AM, otherwise run the check with the regular poll cycle
frequency.

    check process mysqld with pidfile /var/run/mysqld.pid
        not every "* 0-3 * * 0"
1392
1393 Limitations:
1394
The current scheduler is poll cycle based. If a service check is
scheduled with the every cron statement, Monit will check whether the
current time matches the cron-string pattern. If it does, the check is
performed, otherwise it is skipped. The cron specification does not
guarantee when exactly the test will run; this depends on the default
poll time and the length of the check cycle. In other words, we cannot
guarantee that Monit will run at a specific time. Therefore we strongly
recommend using an asterisk in the minute field or at minimum a range,
e.g. 0-15. Never use a specific minute as Monit may not run on that
minute.
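
The recommendation can be sketched as follows. The service name and
program path are hypothetical. The minute range 0-15 gives Monit a
window after 2AM in which a poll cycle can fall, assuming a poll
interval well under 15 minutes:

    check program nightlyReport
        with path /var/monit/programs/nightlyreport.sh
        every "0-15 2 * * *"
        if status != 0 then alert

Because the pattern is evaluated on each poll cycle, the check may run
on every cycle that falls inside the window, not just once.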
1405
1406 We will address this limitation in a future release and convert the
1407 scheduler from serial polling into a parallel non-blocking scheduler
1408 where checks are guaranteed to run on time and with seconds resolution.
1409
1411 Service entries in the control file, monitrc, can be grouped together
1412 by the group statement. The syntax is simply (keyword in capital):
1413
1414 GROUP groupname
1415
1416 With this statement it is possible to group similar service entries
1417 together and manage them as a whole. Monit provides functions to start,
1418 stop, restart, monitor and unmonitor a group of services, like so:
1419
1420 To start a group of services from the console:
1421
1422 monit -g <groupname> start
1423
1424 To stop a group of services:
1425
1426 monit -g <groupname> stop
1427
1428 To restart a group of services:
1429
1430 monit -g <groupname> restart
1431
1432 A service can be added to multiple groups by using more than one group
1433 statement:
1434
1435 group www
1436 group filesystem
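
In context, group statements sit inside a service entry. A sketch in
which the nginx entry and the group names are placeholders chosen for
illustration:

    check process nginx with pidfile /var/run/nginx.pid
        group www
        group frontend

Either group name then selects this service from the console, for
example:

    monit -g www restart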
1437
1439 Monit supports two monitoring modes: active and passive.
1440
1441 Syntax:
1442
1443 MODE <ACTIVE | PASSIVE>
1444
1445 In active mode, Monit will pro-actively monitor a service and in case
1446 of problems raise alerts and restart the service. Active is the default
1447 mode.
1448
The passive mode is similar to the active mode, except that if the
service fails, Monit will not try to fix the problem by restarting the
service and will raise alerts only.
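
A sketch of a service entry in passive mode; the sshd entry and its
pidfile path are assumptions made here:

    check process sshd with pidfile /var/run/sshd.pid
        mode passive
        if failed port 22 protocol ssh then alert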
1452
1454 Monit supports three reboot modes: start, nostart and laststate.
1455
1456 Syntax:
1457
1458 ONREBOOT <START | NOSTART | LASTSTATE>
1459
1460 In start mode, Monit will always start the service automatically on
1461 reboot, even if it was stopped before restart. This is the default mode
1462 and used if onreboot is not specified.
1463
In nostart mode, the service is never started automatically after
reboot. This mode is intended for high-availability solutions with
active/passive clusters. For example, a service group HA, consisting of
e.g. a mobile IP alias and an application server, is started on host
H1, host H2 is backup and heartbeat is in place between both hosts.
The service group HA must be started on one node only. If H1 dies, H2
takes over the HA group. If H1 reboots, it is important that it does
not try to start the HA group as well, even though the group was active
on H1 before it crashed, because HA is now running on H2.
1473
1474 In laststate mode, a service's monitoring state is persistent across
1475 reboot. For instance, if a service was started before reboot, it will
1476 be started after reboot. If it was stopped before reboot, it will not
1477 be started after and so on.
1478
1479 The default ONREBOOT START mode can be overridden globally:
1480
1481 SET ONREBOOT <START | NOSTART | LASTSTATE>
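
A sketch for the HA scenario described above. The appserver entry and
its init script are hypothetical, and it is assumed here that the
per-service statement takes precedence over the global default:

    set onreboot laststate       # global default for all services

    check process appserver with pidfile /var/run/appserver.pid
        start program = "/etc/init.d/appserver start"
        stop program = "/etc/init.d/appserver stop"
        onreboot nostart         # this service is never auto-started
        group HA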
1482
1484 Monit provides a restart limit mechanism for situations where a service
1485 simply refuses to start or respond over a longer period.
1486
The restart limit mechanism is based on the number of service restarts
and the number of poll-cycles. For example, if a service had x restarts
within y poll-cycles (where x <= y) then Monit will perform an action
(for example unmonitor the service). If a timeout occurs, Monit will
send an alert message if you have registered interest for this event.
1492
1493 The syntax for the timeout statement is as follows (keywords are in
1494 capital):
1495
1496 IF <number> RESTART <number> CYCLE(S) THEN <action>
1497
The action value is either one of the common actions or TIMEOUT (kept
for backward compatibility, equivalent to the UNMONITOR action).
1500
1501 Here is an example where Monit will unmonitor the service if it was
1502 restarted 2 times within 3 cycles:
1503
1504 if 2 restarts within 3 cycles then unmonitor
1505
1506 To have Monit check the service again after monitoring was disabled,
1507 run "monit monitor servicename" from the command line.
1508
1509 Example for setting custom exec on timeout:
1510
1511 if 5 restarts within 5 cycles then exec "/foo/bar"
1512
1513 Example for stopping the service:
1514
1515 if 7 restarts within 10 cycles then stop
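
In a complete service entry the restart limit sits next to the test
that triggers the restarts. A sketch in which the foobar entry, its
init script and the port number are hypothetical:

    check process foobar with pidfile /var/run/foobar.pid
        start program = "/etc/init.d/foobar start"
        stop program = "/etc/init.d/foobar stop"
        if failed port 8080 then restart
        if 3 restarts within 5 cycles then unmonitor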
1516
1518 If specified in the control file, Monit can do dependency checking
1519 before start, stop, monitoring or unmonitoring of services. The
1520 dependency statement may be used within any service entries in the
1521 Monit control file.
1522
1523 The syntax for the depend statement is simply:
1524
1525 DEPENDS on service[, service [,...]]
1526
1527 Where service is a check service entry name used in your ".monitrc"
1528 file, for instance apache or datafs.
1529
1530 You may add more than one service name of any type or use more than one
1531 depend statement in an entry.
1532
1533 Services specified in a depend statement will be checked during
1534 stop/start/monitor/unmonitor operations.
1535
If a service is stopped or unmonitored, Monit will also stop/unmonitor
any services that depend on it.

If a service is started, all services which it depends on will be
started first. If the start of a prerequisite service fails, the
dependent service will NOT be started, but Monit will remember that it
should start and will retry in the next cycle.
1543
1544 If a service is restarted, it will first stop any active services that
1545 depend on it and after it is started, start all depending services that
1546 were active before the restart again.
1547
1548 Here is an example where we set up an apache service entry to depend on
1549 the underlying apache binary. If the binary should change an alert is
1550 sent and apache is not monitored anymore. The rationale is security and
1551 that Monit should not execute a possibly cracked apache binary.
1552
    (1) check process apache with pidfile "/var/run/httpd.pid"
    (2)     depends on httpd
    (3)     ...
    (4)
    (5) check file httpd with path /usr/bin/httpd
    (6)     if failed checksum then stop
1559
The first entry is the process entry for apache. The second line sets
up a dependency between this entry and the service entry named httpd in
line 5. A dependency tree works as follows: if an action is conducted
in a lower branch, it will propagate upward in the tree and the same
action is executed for every dependent entry. In this case, if the
checksum should fail in line 6, then a stop action is executed and the
apache binary is not checked anymore. But since the apache process
entry depends on the httpd entry, this entry will also execute the stop
action. In short, if the checksum test for the httpd binary file should
fail, both the check file httpd and the check process apache entry are
stopped.
1570
1571 A dependency tree is a general construct and can be used between all
1572 types of service entries and span many levels and propagate any
1573 supported action (except the exec action which will not propagate
1574 upward in a dependency tree for obvious reasons).
1575
Here is another example. Consider the following common server setup:
1578
    WEB-SERVER -> APPLICATION-SERVER -> DATABASE -> FILESYSTEM
       (a)               (b)              (c)          (d)
1581
1582 You can set dependencies so that the web-server depends on the
1583 application server to run before the web-server starts and the
1584 application server depends on the database server and the database
1585 depends on the filesystem to be mounted before it starts. See also the
1586 example section below for examples using the depend statement.
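
A sketch of control file entries for this chain. All service names,
device paths, pidfiles and init scripts are hypothetical; only the
depends statements matter here:

    check filesystem datafs with path /dev/sdb1                  # (d)
        start program = "/bin/mount /data"
        stop program = "/bin/umount /data"

    check process database with pidfile /var/run/mysqld.pid      # (c)
        start program = "/etc/init.d/mysqld start"
        stop program = "/etc/init.d/mysqld stop"
        depends on datafs

    check process appserver with pidfile /var/run/appserver.pid  # (b)
        start program = "/etc/init.d/appserver start"
        stop program = "/etc/init.d/appserver stop"
        depends on database

    check process webserver with pidfile /var/run/httpd.pid      # (a)
        start program = "/etc/init.d/httpd start"
        stop program = "/etc/init.d/httpd stop"
        depends on appserver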
1587
1588 Here we describe how Monit will function with the above dependencies:
1589
If no services are running
    Monit will start the servers in the following order: d, c, b, a.

If all servers are running
    When you run 'monit stop all' this is the stop order: a, b, c, d.
    If you run 'monit stop d' then a, b and c are also stopped because
    they depend on d, and finally d is stopped.

If a does not run
    Monit will start a.

If b does not run
    Monit will first stop a, then start b and finally start a if b is
    up again.

If c does not run
    Monit will first stop a and b, then start c and finally start b,
    then a.

If d does not run
    Monit will first stop a, b and c, then start d and finally start c,
    b, then a.

If the control file contains a depend loop
    A depend loop is, for example, a->b and b->a, or a->b->c->a.

    When Monit starts it will check for such loops and complain and
    exit if a loop was found. It will also exit with a complaint if a
    depend statement was used that does not point to a service in the
    control file.
1620
1622 LIMITS
1623 You can configure and set various limits to tweak buffer sizes and
1624 timeouts used by Monit. In most situations the default values are fine.
1625 If needed, below are the limits you can currently modify in Monit.
1626
1627 Syntax:
1628
    SET LIMITS {
        PROGRAMOUTPUT:     <number> <unit>,
        SENDEXPECTBUFFER:  <number> <unit>,
        FILECONTENTBUFFER: <number> <unit>,
        HTTPCONTENTBUFFER: <number> <unit>,
        NETWORKTIMEOUT:    <number> <timeunit>
        PROGRAMTIMEOUT:    <number> <timeunit>
        STOPTIMEOUT:       <number> <timeunit>
        STARTTIMEOUT:      <number> <timeunit>
        RESTARTTIMEOUT:    <number> <timeunit>
    }

Where:
    unit is "B" (byte), "kB" (kilobyte) or "MB" (megabyte)
    timeunit is "MS" (millisecond) or "S" (second)
1644
1645 Options legend:
1646
1647 ----------------------------------------------------------------------------------
1648 | Option | Description | Default |
1649 ----------------------------------------------------------------------------------
1650 | programOutput | limit for check program output (truncated after) | 512 B |
1651 | sendExpectBuffer | limit for send/expect protocol test | 256 B |
1652 | fileContentBuffer | limit for file content test (line) | 512 B |
1653 | httpContentBuffer | limit for HTTP content test (response body) | 1 MB |
1654 | networkTimeout | timeout for network I/O | 5 s |
1655 | programTimeout | timeout for check program | 300 s |
1656 | stopTimeout | timeout for service stop | 30 s |
1657 | startTimeout | timeout for service start | 30 s |
1658 | restartTimeout | timeout for service restart | 30 s |
1659 ----------------------------------------------------------------------------------
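
As an illustration of the syntax above, the following sketch raises two
buffer limits and the network timeout and restates the documented
defaults for the rest. The chosen values are assumptions for the sake of
the example, not recommendations:

    set limits {
        programOutput:     1 kB,    # raised from the 512 B default
        sendExpectBuffer:  256 B,
        fileContentBuffer: 1 kB,    # raised from the 512 B default
        httpContentBuffer: 1 MB,
        networkTimeout:    10 s     # raised from the 5 s default
        programTimeout:    300 s
        stopTimeout:       30 s
        startTimeout:      30 s
        restartTimeout:    30 s
    }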
1660
1661 GENERAL SYNTAX
1662 Monit offers several if-tests you can use in a 'check' statement to
1663 test various aspects of a service.
1664
A test can compare against a predefined value or a range, or it can
trigger an action whenever the value changes.
1667
1668 General syntax for testing a specific value or range:
1669
1670 IF <test> THEN <action> [ELSE <action>]
1671
The action is evaluated each time the <test> condition is true. The
success (ELSE) action is optional and is executed only when the state
changes from failure to success. If no success action is set, Monit
sends a recovery alert by default.
1676
1677 General syntax for a value change test:
1678
1679 IF CHANGED <test> THEN <action>
1680
The action is executed each time the value changes. Monit remembers
the new value and triggers an event if the value changes again.
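
A minimal sketch of both forms, using tests that are described later in
this manual. The service names, paths and the script
/usr/local/bin/recovered.sh are hypothetical:

    check filesystem datafs with path /dev/sdb1
        if space usage > 90% then alert
           else exec "/usr/local/bin/recovered.sh"

    check file app_conf with path /etc/myapp.conf
        if changed checksum then alert

In the first entry the script is the success action: it runs when the
state changes from failure back to success, instead of relying on the
default recovery alert.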
1683
1684 ACTION
1685 In each test you must select the action to be executed from this list:
1686
1687 • ALERT sends the user an alert event on each state change.
1688
• RESTART restarts the service and sends an alert. The restart is
  performed by calling the service's registered restart method or, if
  no restart method is set, by calling the stop method followed by
  the start method.
1693
• START starts the service by calling the service's registered start
  method and sends an alert.
1696
• STOP stops the service by calling the service's registered stop
  method and sends an alert. Once Monit has stopped a service, it no
  longer checks the service or restarts it later. To reactivate
  monitoring of the service you must explicitly enable monitoring
  from the web interface or from the console.
1702
• EXEC executes an arbitrary program and sends an alert. If you
  choose this action you must state the program to be executed; if
  the program requires arguments, enclose the program and its
  arguments in a quoted string. You may optionally specify the uid
  and gid the executed program should switch to upon start. By
  default the program is executed only once when the test fails. You
  can have the execution repeated at an interval of a given number
  of cycles while the error persists. For instance:
1711
1712 if failed <test> then exec "/usr/local/bin/sms.sh"
1713 as uid "nobody" and gid "nobody"
1714 repeat every 5 cycles
1715
1716 Remember, if Monit is run by root, then all programs executed by
1717 Monit will be started with superuser privileges unless the uid and
1718 gid extension is used.
1719
• UNMONITOR disables monitoring of the service and sends an alert.
  The service is no longer checked by Monit or restarted later. To
  reactivate monitoring of the service you must explicitly enable
  monitoring from the web interface or from the console.
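
To illustrate how RESTART falls back on the stop and start methods,
here is a minimal sketch of a process entry. The pidfile and init script
paths are assumptions and vary between systems:

    check process sshd with pidfile /var/run/sshd.pid
        start program = "/etc/init.d/sshd start"
        stop program = "/etc/init.d/sshd stop"
        if failed port 22 protocol ssh then restart

Because no restart method is registered in this entry, the restart
action calls the stop program followed by the start program.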
1725
1726 FAULT TOLERANCE
By default, as soon as a test matches, its action is executed and the
corresponding service is set to an error state. However, you can
require a test to fail more than once before the error event is
triggered and the service state is changed to failed. This is useful
to avoid alerts on spurious errors, which can happen especially with
network tests.
1732
1733 Syntax:
1734
1735 FOR <X> CYCLES ...
1736
1737 or:
1738
1739 <X> [TIMES WITHIN] <Y> CYCLES ...
1740
The condition can be used for both the failure and the success action.
1742
1743 The first, simpler and recommended format requires "X" consecutive
1744 events before switching the state:
1745
1746 if failed
1747 port 80
1748 for 3 cycles
1749 then alert
1750
1751 The second format is more advanced and allows one to tolerate
1752 intermittent issues, but still catch excessive problems, where the
1753 service is flapping between error and success states frequently.
1754
For example, if every second cycle fails (1-0-1-0-1-0-...), then a
"for 2 cycles" condition will never match, despite the service having
problems. The following statement will catch such a state:
1758
1759 if failed
1760 port 80
1761 for 3 times within 5 cycles
1762 then alert
1763
1764 Example which sets multiple error levels and actions:
1765
1766 check filesystem rootfs with path /dev/hda1
1767 if space usage > 80% for 5 times within 15 cycles then alert
1768 if space usage > 90% for 5 cycles then exec '/try/to/free/the/space'
1769
1770 Note: the maximum value for cycles is 64.
1771
1772 EXISTENCE TESTS
This test allows one to trigger an action based on the monitored
object's existence. It is supported for process, file, directory,
filesystem and fifo services.
1776
1777 If no existence test is defined, the implicit non-existence test with
1778 restart action is activated, so for example if the process stops, Monit
1779 will restart it.
1780
1781 There are two types of existence tests:
1782
1783 NON-EXIST
1784
This test will trigger an action if the object does not exist. It can
be used, for example, to make sure that apache is running or that a
data filesystem is mounted.
1788
1789 IF [DOES] NOT EXIST THEN <action>
1790
1791 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
1792 "UNMONITOR".
1793
1794 Example: Exec a script if a filesystem does NOT exist:
1795
1796 check filesystem disk1 with path /dev/sda1
1797 if does not exist then exec "/sbin/mount..."
1798
1799 EXIST
1800
1801 This test is the inverse of the non-existence test: it will trigger an
1802 action if the object DOES exist. It can be used for example to kill a
1803 process which shouldn't be running.
1804
1805 IF [DOES] EXIST THEN <action>
1806
1807 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
1808 "UNMONITOR".
1809
1810 Example: kill a process that should not run:
1811
1812 check process vmware matching "vmware"
1813 if exist then exec "/usr/bin/pkill -9 vmware"
1814
Example: alert if a file exists which shouldn't:
1816
1817 check file x with path /some/path/x
1818 if exist then alert
1819
1820 RESOURCE TESTS
Monit can examine how many resources a service is using. This test can
only be used within a system or process service entry in the Monit
control file.

Depending on system or process characteristics, services can be stopped
or restarted and alerts can be generated. Thus it is possible to make
use of systems which are idle and to spare systems under high load.
1828
1829 Syntax:
1830
1831 IF <resource> <operator> <value> THEN <action>
1832
1833 operator is a choice of "<", ">", "!=", "==" in C notation, "gt", "lt",
1834 "eq", "ne" in shell sh notation and "greater", "less", "equal",
1835 "notequal" in human readable form (if not specified, default is EQUAL).
1836
1837 value is either an integer or a real number.
1838
1839 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
1840 "UNMONITOR".
1841
1842 resource set depends on the service type:
1843
1844 System resource tests
1845
LOADAVG([1min|5min|15min]) [PER CORE] refers to the system's load
average. The load average is the number of processes in the system run
queue, averaged over the specified time period. With the PER CORE
option the value is expressed per CPU core. Example:
1849
1850 if loadavg (1min) per core > 2 for 15 cycles then alert
1851 if loadavg (5min) per core > 1.5 for 10 cycles then alert
1852 if loadavg (15min) per core > 1 for 8 cycles then alert
1853
If you omit the per core option, the test checks the total load
average regardless of the number of CPU cores.
1856
CPU([user|system|wait|nice|hardirq|softirq|steal|guest|guestnice]) is
the percentage of time the system spends in the given type of task:
1859
1860 user
1861 The CPU is running code in user space mode, which includes any
1862 process that doesn't belong to the kernel, such as webservers,
1863 databases, shells and desktop related programs.
1864
1865 system
1866 The CPU is running the kernel, which includes drivers and other
1867 kernel modules. The kernel also handles requests from user space
1868 processes like memory allocation, disk and network I/O and creating
1869 child processes.
1870
1871 wait
1872 I/O wait is when the CPU was idle while waiting for an I/O
1873 operation from disk or network to complete.
1874
1875 nice
The nice statistic accounts for user space processes that are
running with altered priority (higher or lower than normal).
1878
1879 hardirq
1880 The kernel is servicing hardware interrupt requests. Hardware
1881 interrupts come from peripherals like keyboard, network interfaces,
1882 disks, system clock, etc.
1883
1884 softirq
1885 The kernel is servicing software interrupt requests. Software
1886 interrupts come from processes running in the system.
1887
1888 steal
This applies only to virtual machines on a hypervisor. The steal
time shows the percentage of time a virtual machine had to wait for
the real CPU while the hypervisor was servicing another virtual
machine. If this number remains high, the host system is too busy
and may need more physical CPUs, or some virtual machines should be
moved to another host.
1895
1896 guest
1897 This applies only to host machines running a hypervisor. It shows
1898 time spent running a virtual CPU for guest operating systems under
1899 the control of the Linux kernel. This value is already included in
1900 "user" statistics.
1901
1902 guestnice
1903 This applies only to host machines running a hypervisor. It shows
1904 time spent running a virtual CPU for guest operating systems under
1905 the control of the Linux kernel, with altered priority. This value
1906 is already included in "nice" statistics.
1907
1908 The user/system/wait/nice/hardirq/softirq/steal/guest/guestnice
1909 modifier is optional and the support depends on platform (Linux support
1910 depends on kernel version, all statistics are available since kernel
1911 2.6.33):
1912
1913 -----------------------------------------------------------------------------------------------
1914 | Platform | user | nice | system | wait | hardirq | softirq | steal | guest | guest nice |
1915 -----------------------------------------------------------------------------------------------
1916 | AIX | X | | X | X | | | | | |
1917 | DragonFlyBSD | X | X | X | | X | | | | |
1918 | FreeBSD | X | X | X | | X | | | | |
1919 | Linux | X | X | X | X | X | X | X | X | X |
1920 | MacOS | X | X | X | | | | | | |
1921 | NetBSD | X | X | X | | X | | | | |
1922 | OpenBSD | X | X | X | | X | | | | |
1923 | Solaris | X | | X | X | | | | | |
1924 -----------------------------------------------------------------------------------------------
1925
1926 Example:
1927
1928 if cpu usage > 95% for 10 cycles then alert
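
The CPU modifiers described above can also be tested individually. A
minimal sketch, assuming a platform that reports the user, system and
wait statistics (see the table above); the service name myhost and the
thresholds are illustrative only:

    check system myhost
        if cpu usage (user) > 70% for 5 cycles then alert
        if cpu usage (system) > 30% for 5 cycles then alert
        if cpu usage (wait) > 20% for 5 cycles then alert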
1929
1930 MEMORY is the system memory usage [%] or absolute value [B, kB, MB,
1931 GB]. Example:
1932
1933 if memory usage > 75% for 5 cycles then alert
1934
1935 SWAP is the swap usage of the system [%] or absolute [B, kB, MB, GB].
1936 Example:
1937
1938 if swap usage > 20% for 10 cycles then alert
1939
1940 Process resource tests
1941
CPU is the CPU usage of the process itself [%]. Monit calculates the
CPU usage based on the number of threads vs. the available CPU cores.
If the process has one thread, 100% CPU usage equals 100% utilization
of one CPU core. If it has 2 threads, 100% CPU usage is reported when
it uses 2 CPU cores at 100%, etc. If the process has more threads than
the machine has available CPU cores, then 100% CPU usage corresponds
to utilization of all available CPU cores. Example:
1949
1950 if cpu > 10% for 5 cycles then restart
1951
TOTAL CPU is the total CPU usage of the process and its children in
percent. You will typically want to use TOTAL CPU for services like
the Apache web server, where one master process forks child processes
as workers. Example:
1956
1957 if total cpu > 50% for 10 cycles then restart
1958
THREADS is the number of threads of the process. Example:
1960
1961 if threads > 3 then alert
1962
1963 CHILDREN is the number of child processes of the process. Example:
1964
1965 if children > 10 then alert
1966
1967 MEMORY is the memory usage of the process itself, [%] or absolute value
1968 [B, kB, MB, GB]. Example:
1969
1970 if memory usage > 8 MB then alert
1971
1972 TOTAL MEMORY is the memory usage of the process and its child processes
1973 in either percent or as an amount [B, kB, MB, GB]. Example:
1974
1975 if total memory usage > 1% for 10 cycles then alert
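
The process tests above can be combined in one service entry. The
following is a sketch only; the pidfile, the start and stop programs and
all thresholds are assumptions to be adapted to the monitored service:

    check process apache with pidfile /var/run/httpd.pid
        start program = "/etc/init.d/httpd start"
        stop program = "/etc/init.d/httpd stop"
        if cpu > 40% for 2 cycles then alert
        if total cpu > 80% for 5 cycles then restart
        if children > 250 then restart
        if total memory usage > 500 MB for 5 cycles then restart

Here high CPU usage of the master process alone only raises an alert,
while sustained load across the master process and its children
triggers a restart.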
1976
1977 PROCESS I/O ACTIVITY TEST
Monit can test a process's filesystem read and write activity. This
test can only be used in the context of a process service type. Monit
will normally need to run as the root user to access these metrics.
1981
The OS usually provides per-process I/O metrics counted in bytes or in
operations.
1984
Some platforms allow one to differentiate the I/O subset that required
physical storage access from generic I/O which was handled by cache.
Note that as physical I/O is usually aligned to the filesystem page,
there may be a difference between the total and physical I/O even if
the process tried to read just 1 byte.
1990
1991 Per-process I/O activity statistics by platform:
1992
1993 ---------------------------------------------------------------
1994 | Platform | Operation | Byte (physical) | Byte (generic) |
1995 ---------------------------------------------------------------
1996 | AIX | X | | |
1997 | DragonFlyBSD | X | | |
1998 | FreeBSD | X | | |
1999 | Linux | X | X | X |
2000 | MacOS | | X | |
2001 | NetBSD | X | | |
2002 | OpenBSD | X | | |
2003 | Solaris | X | | |
2004 ---------------------------------------------------------------
2005
2006 Read: bytes per second (generic)
2007
2008 Syntax:
2009
2010 IF READ [ACTIVITY] <operator> <number> <unit>/S THEN action
2011
2012 operator is a choice of "<",">","!=","==" in c notation, "gt", "lt",
2013 "eq", "ne" in shell sh notation and "greater", "less", "equal",
2014 "notequal" in human readable form (if not specified, default is EQUAL).
2015
2016 unit is a choice of "B","KB","MB","GB" or long alternatives "byte",
2017 "kilobyte", "megabyte", "gigabyte", "percent".
2018
2019 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2020 "UNMONITOR".
2021
2022 Example:
2023
2024 check process p...
2025 if read activity > 1 MB/s then alert
2026
2027 Read: bytes per second (physical storage)
2028
2029 Syntax:
2030
2031 IF DISK READ [ACTIVITY] <operator> <number> <unit>/S THEN action
2032
2033 operator is a choice of "<",">","!=","==" in c notation, "gt", "lt",
2034 "eq", "ne" in shell sh notation and "greater", "less", "equal",
2035 "notequal" in human readable form (if not specified, default is EQUAL).
2036
2037 unit is a choice of "B","KB","MB","GB" or long alternatives "byte",
2038 "kilobyte", "megabyte", "gigabyte", "percent".
2039
2040 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2041 "UNMONITOR".
2042
2043 Example:
2044
2045 check process p...
2046 if disk read activity > 1 MB/s then alert
2047
2048 Read: operations per second
2049
2050 Syntax:
2051
2052 IF DISK READ [ACTIVITY] <operator> <number> operations/S THEN action
2053
2054 operator is a choice of "<",">","!=","==" in c notation, "gt", "lt",
2055 "eq", "ne" in shell sh notation and "greater", "less", "equal",
2056 "notequal" in human readable form (if not specified, default is EQUAL).
2057
2058 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2059 "UNMONITOR".
2060
2061 Example:
2062
2063 check process p...
2064 if disk read activity > 500 operations/s then alert
2065
2066 Write: bytes per second (generic)
2067
2068 Syntax:
2069
2070 IF WRITE [ACTIVITY] <operator> <number> <unit>/S THEN action
2071
2072 operator is a choice of "<",">","!=","==" in c notation, "gt", "lt",
2073 "eq", "ne" in shell sh notation and "greater", "less", "equal",
2074 "notequal" in human readable form (if not specified, default is EQUAL).
2075
2076 unit is a choice of "B","KB","MB","GB" or long alternatives "byte",
2077 "kilobyte", "megabyte", "gigabyte", "percent".
2078
2079 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2080 "UNMONITOR".
2081
2082 Example:
2083
2084 check process p...
2085 if write activity > 1 MB/s then alert
2086
2087 Write: bytes per second (physical storage)
2088
2089 Syntax:
2090
2091 IF DISK WRITE [ACTIVITY] <operator> <number> <unit>/S THEN action
2092
2093 operator is a choice of "<",">","!=","==" in c notation, "gt", "lt",
2094 "eq", "ne" in shell sh notation and "greater", "less", "equal",
2095 "notequal" in human readable form (if not specified, default is EQUAL).
2096
2097 unit is a choice of "B","KB","MB","GB" or long alternatives "byte",
2098 "kilobyte", "megabyte", "gigabyte", "percent".
2099
2100 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2101 "UNMONITOR".
2102
2103 Example:
2104
2105 check process p...
2106 if disk write activity > 1 MB/s then alert
2107
2108 Write: operations per second
2109
2110 Syntax:
2111
2112 IF DISK WRITE [ACTIVITY] <operator> <number> operations/S THEN action
2113
2114 operator is a choice of "<",">","!=","==" in c notation, "gt", "lt",
2115 "eq", "ne" in shell sh notation and "greater", "less", "equal",
2116 "notequal" in human readable form (if not specified, default is EQUAL).
2117
2118 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2119 "UNMONITOR".
2120
2121 Example:
2122
2123 check process p...
2124 if disk write activity > 500 operations/s then alert
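
The I/O activity tests can be combined with the fault tolerance syntax
so that short bursts are ignored. A sketch, assuming a platform that
supports both the physical byte and the operation metrics (Linux,
according to the table above); the process name, pidfile and thresholds
are hypothetical:

    check process mydb with pidfile /var/run/mydb.pid
        if disk write activity > 10 MB/s for 5 cycles then alert
        if disk read activity > 500 operations/s
           for 3 times within 5 cycles
        then alert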
2125
2126 FILE CHECKSUM TEST
2127 The checksum statement may only be used in a file service entry and can
2128 be used to check the file's MD5 or SHA1 checksum.
2129
2130 Check specific checksum:
2131
2132 IF FAILED [MD5|SHA1] CHECKSUM [EXPECT checksum] THEN action
2133
2134 Check any file changes:
2135
2136 IF CHANGED [MD5|SHA1] CHECKSUM THEN action
2137
The choice of MD5 or SHA1 is optional. MD5 produces a 128-bit checksum
(a 32-character hex-encoded string) and SHA1 a 160-bit checksum (a
40-character hex-encoded string). If this option is omitted, Monit will
try to guess the method from the EXPECT string or use MD5 as the
default checksum.
2142
2143 "expect" is optional and if used, specifies the md5 or sha1 string
2144 Monit should expect when testing a file's checksum. Monit will then not
2145 compute an initial checksum for the file, but instead use the string
2146 you submit. For example:
2147
2148 if failed
2149 checksum expect 8f7f419955cefa0b33a2ba316cba3659
2150 then alert
2151
You can, for example, use the GNU utility md5sum(1) or sha1sum(1) to
create a checksum string for a file and use this string in the expect
statement.
2155
2156 Reloading a server if its configuration file was changed:
2157
2158 check file apache_conf with path /etc/apache/httpd.conf
2159 if changed checksum then exec "/usr/bin/apachectl graceful"
2160
2161 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2162 "UNMONITOR".
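
A sketch of a test against an explicit SHA1 checksum. The file path is
hypothetical, and the 40-character value shown is the SHA1 digest of
empty input, used purely as a placeholder; substitute the output of
sha1sum(1) for your own file:

    check file release with path /srv/app/release.tar.gz
        if failed
           sha1 checksum expect da39a3ee5e6b4b0d3255bfef95601890afd80709
        then alert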
2163
2164 TIMESTAMP TEST
2165 The timestamp statement may only be used in a file, fifo or directory
2166 service entry.
2167
2168 Relative timestamp syntax:
2169
2170 IF <ACCESS TIME | ATIME | MODIFICATION TIME | MTIME | CHANGE TIME | CTIME | TIME[STAMP]> <operator> <value> [unit] THEN <action>
2171
2172 Timestamp change syntax:
2173
2174 IF CHANGED <ACCESS TIME | ATIME | MODIFICATION TIME | MTIME | CHANGE TIME | CTIME | TIME[STAMP]> THEN action
2175
2176 There are four timestamp test types:
2177
2178 ACCESS (ATIME)
Test the timestamp which is updated whenever the object is
accessed, for example when the file is read. Filesystems usually
allow one to disable atime updates using mount options, so
this test will work only if the filesystem performs atime
updates.
2184
2185 CHANGE (CTIME)
Test the timestamp which is updated whenever the object
metadata, such as owner, group, permissions or hard link
count, is changed.
2189
2190 MODIFICATION (MTIME)
2191 Test the timestamp which is updated whenever the object
2192 content is modified. The file modification timestamp is
2193 updated whenever the file is truncated or written to. The
2194 directory modification timestamp is updated whenever some
2195 files/subdirectories were added to the directory or removed
2196 from that directory.
2197
2198 DEFAULT (LATEST OF CHANGE AND MODIFICATION TIMES)
2199 If no specific timestamp type is set, the latest of change
2200 and modification timestamps is checked. This test allows
2201 for simple testing of any object modification (data and
2202 metadata).
2203
operator is a choice of "<", ">", "!=", "==" in C notation, "GT", "LT",
"EQ", "NE" in shell sh notation and "NEWER", "OLDER", "GREATER", "LESS",
"EQUAL", "NOTEQUAL" in human readable form (if not specified, default
is EQUAL).
2208
2209 value is a time watermark.
2210
2211 unit is either "SECOND(S)", "MINUTE(S)", "HOUR(S)" or "DAY(S)".
2212
2213 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2214 "UNMONITOR".
2215
2216 For example to reload apache if the configuration file changed:
2217
2218 check file apache_conf with path /etc/apache/httpd.conf
2219 if changed timestamp then exec "/usr/bin/apachectl graceful"
2220
2221 For example to test directory for file addition or removal:
2222
2223 check directory bar path /foo/bar
2224 if changed timestamp then alert
2225
2226 Example for sending alert if a log file is not updated for more than 1
2227 hour:
2228
2229 if timestamp is older than 1 hour then alert
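
The individual timestamp types can be tested in the same way. A minimal
sketch with a hypothetical log file, reusing the phrasing of the
example above for the modification time and the change syntax for the
change time:

    check file applog with path /var/log/myapp.log
        if mtime is older than 1 hour then alert
        if changed ctime then alert

The first test concerns the file content only; the second reacts to
metadata changes such as a new owner or new permissions.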
2230
2231 FILE SIZE TEST
2232 The size statement may only be used in a check file service entry. If
2233 specified in the control file, Monit will compute a size for a file.
2234
2235 Testing specific size or range:
2236
2237 IF SIZE [[operator] value [unit]] THEN action
2238
2239 Testing size changes:
2240
2241 IF CHANGED SIZE THEN action
2242
2243 operator is a choice of "<", ">", "!=", "==" in C notation, "GT", "LT",
2244 "EQ", "NE" in shell sh notation and "GREATER", "LESS", "EQUAL",
2245 "NOTEQUAL" in human readable form (if not specified, default is EQUAL).
2246
2247 value is a size watermark.
2248
2249 unit is a choice of "B","KB","MB","GB" or long alternatives "byte",
2250 "kilobyte", "megabyte", "gigabyte". If it is not specified, "byte" unit
2251 is assumed by default.
2252
2253 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2254 "UNMONITOR".
2255
2256 For example to send an alert if the file is too large:
2257
2258 check file mydb with path /data/mydatabase.db
2259 if size > 1 GB then alert
2260
2261 FILE CONTENT TEST
2262 The content statement can be used to incrementally test the content of
2263 a text file by using regular expressions.
2264
2265 Syntax:
2266
2267 IF CONTENT <operator> <regex|path> THEN action
2268
2269 operator is either a "=" for match or "!=" for no-match.
2270
2271 regex is a string containing the extended regular expression. See also
2272 regex(7).
2273
path is an absolute path to a file containing one extended regular
expression per line. See also regex(7).
2276
2277 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2278 "UNMONITOR".
2279
2280 On startup the read position is set to the end of the file and Monit
2281 continues to scan to the end of the file on each cycle.
2282
If the file size decreases or the inode changes, the read position is
reset to the start of the file.
2285
2286 Only lines ending with a newline character are inspected.
2287
2288 By default only the first 511 characters of a line are inspected. You
2289 can increase the limit using the set limits statement.
2290
2291 IGNORE CONTENT <operator> <regex|path>
2292
Lines matching an IGNORE are not inspected during later evaluations.
IGNORE CONTENT always has precedence over IF CONTENT.
2295
2296 All IGNORE CONTENT statements are evaluated first, in the order of
2297 their appearance. Thereafter, all the IF CONTENT statements are
2298 evaluated.
2299
2300 For example:
2301
2302 check file syslog with path /var/log/syslog
2303 ignore content = "monit"
2304 if content = "^mrcoffee" then alert
2305
2306 FILESYSTEM MOUNT FLAGS TEST
Monit can test the filesystem mount flags for changes. This test is
implicit and Monit will send an alert in case of failure by default.

This test is useful for detecting changes of filesystem flags, for
example if the filesystem becomes read-only (on disk error) or if
mount flags (such as nosuid) were changed.
2313
2314 The syntax for the fsflags statement is:
2315
2316 IF CHANGED FSFLAGS THEN action
2317
2318 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2319 "UNMONITOR".
2320
2321 Example:
2322
2323 check filesystem rootfs with path /
2324 if changed fsflags then exec "/my/script"
2325
2326 SPACE USAGE TEST
2327 Monit can test a filesystem or a disk for space usage. This test may
2328 only be used in the context of a filesystem service type.
2329
2330 Filesystems usually have some space reserved for the root user (ca.
2331 1-5%), so non-superusers cannot write to a nearly full filesystem. If
2332 you set a limit for the filesystem which is used by non-root users you
2333 might want to consider these reserved blocks when setting the limit.
2334 You can use Monit itself to view the reserved blocks percentage by
2335 using the CLI status command or the HTTP interface for the given
2336 filesystem.
2337
2338 Syntax:
2339
2340 IF SPACE operator value unit THEN action
2341
2342 or:
2343
2344 IF SPACE FREE operator value unit THEN action
2345
2346 operator is a choice of "<",">","!=","==" in c notation, "gt", "lt",
2347 "eq", "ne" in shell sh notation and "greater", "less", "equal",
2348 "notequal" in human readable form (if not specified, default is EQUAL).
2349
2350 unit is a choice of "B","KB","MB","GB", "%" or long alternatives
2351 "byte", "kilobyte", "megabyte", "gigabyte", "percent".
2352
2353 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2354 "UNMONITOR".
2355
2356 Example:
2357
2358 check filesystem rootfs with path /
2359 if space usage > 90% then alert
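
The SPACE FREE form tests the remaining space rather than the used
space. A sketch with a hypothetical device path, a hypothetical cleanup
script and illustrative thresholds, combined with the fault tolerance
syntax:

    check filesystem datafs with path /dev/sdb1
        if space free < 1 GB then alert
        if space usage > 95% for 5 cycles
           then exec "/usr/local/bin/cleanup.sh"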
2360
2361 INODE USAGE TEST
2362 Monit can test filesystem inode usage. This test may only be used in
2363 the context of a filesystem service type.
2364
2365 Syntax:
2366
2367 IF INODE(S) operator value [unit] THEN action
2368
2369 or:
2370
2371 IF INODE(S) FREE operator value [unit] THEN action
2372
2373 operator is a choice of "<",">","!=","==" in c notation, "gt", "lt",
2374 "eq", "ne" in shell sh notation and "greater", "less", "equal",
2375 "notequal" in human readable form (if not specified, default is EQUAL).
2376
2377 unit is optional. If not specified, the value is an absolute count of
2378 inodes. You can use the "%" character or the longer alternative
2379 "percent" as a unit.
2380
2381 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2382 "UNMONITOR".
2383
2384 Example:
2385
2386 check filesystem rootfs with path /
2387 if inode usage > 90% then alert
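
When no unit is given the value is an absolute inode count, which is
convenient with the FREE form. A minimal sketch; the service name, path
and threshold are assumptions:

    check filesystem mailfs with path /var/mail
        if inodes free < 10000 then alert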
2388
2389 DISK I/O TEST
Monit can test a filesystem's read and write activity. This test may
only be used in the context of a filesystem service type.

The available I/O metrics depend on the platform and filesystem. Some
platforms allow us to get I/O activity for a specific partition, others
just for the whole disk. Some allow us to get metrics for network
filesystems, others just for block devices.

Platform I/O metrics granularity and filesystem support in Monit:
2399
2400 ---------------------------------------------------------------------------------------
2401 | Platform | Granularity | Supported filesystems | TBD |
2402 ---------------------------------------------------------------------------------------
2403 | AIX | per-disk | Disk io monitoring currently not supported | JFSx |
2404 | DragonFlyBSD | per-disk | UFS | HAMMER |
2405 | FreeBSD | per-disk | UFS, ZFS | |
2406 | Linux | per-filesystem | EXTx, XFS, BTRFS, ZFS, NFS, CIFS | |
2407 | MacOS | per-disk | HFS | |
2408 | NetBSD | per-disk | FFS | NFS |
2409 | OpenBSD | per-disk | FFS | |
2410 | Solaris | per-filesystem | ZFS, UFS, NFS | |
2411 ---------------------------------------------------------------------------------------
2412
2413 Read: bytes per second
2414
2415 Syntax:
2416
2417 IF READ [RATE] <operator> <number> <unit>/S THEN action
2418
2419 operator is a choice of "<",">","!=","==" in c notation, "gt", "lt",
2420 "eq", "ne" in shell sh notation and "greater", "less", "equal",
2421 "notequal" in human readable form (if not specified, default is EQUAL).
2422
2423 unit is a choice of "B","KB","MB","GB" or long alternatives "byte",
2424 "kilobyte", "megabyte", "gigabyte", "percent".
2425
2426 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2427 "UNMONITOR".
2428
2429 Example:
2430
2431 check filesystem disk1...
2432 if read rate > 1 MB/s then alert
2433
2434 Read: operations per second
2435
2436 Syntax:
2437
2438 IF READ [RATE] <operator> <number> operations/S THEN action
2439
2440 operator is a choice of "<",">","!=","==" in c notation, "gt", "lt",
2441 "eq", "ne" in shell sh notation and "greater", "less", "equal",
2442 "notequal" in human readable form (if not specified, default is EQUAL).
2443
2444 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2445 "UNMONITOR".
2446
2447 Example:
2448
2449 check filesystem disk1...
2450 if read rate > 500 operations/s then alert
2451
2452 Write: bytes per second
2453
2454 Syntax:
2455
2456 IF WRITE [RATE] <operator> <number> <unit>/S THEN action
2457
2458 operator is a choice of "<",">","!=","==" in c notation, "gt", "lt",
2459 "eq", "ne" in shell sh notation and "greater", "less", "equal",
2460 "notequal" in human readable form (if not specified, default is EQUAL).
2461
2462 unit is a choice of "B","KB","MB","GB" or long alternatives "byte",
2463 "kilobyte", "megabyte", "gigabyte", "percent".
2464
2465 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2466 "UNMONITOR".
2467
2468 Example:
2469
2470 check filesystem disk1...
2471 if write rate > 1 MB/s then alert
2472
2473 Write: operations per second
2474
2475 Syntax:
2476
2477 IF WRITE [RATE] <operator> <number> operations/S THEN action
2478
2479 operator is a choice of "<",">","!=","==" in c notation, "gt", "lt",
2480 "eq", "ne" in shell sh notation and "greater", "less", "equal",
2481 "notequal" in human readable form (if not specified, default is EQUAL).
2482
2483 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2484 "UNMONITOR".
2485
2486 Example:
2487
2488 check filesystem disk1...
2489 if write rate > 500 operations/s then alert
2490
2491 Service time per operation
2492
2493 Service Time is the time taken to complete a read or a write operation.
2494 This is a fairly important metric. If it grows, it means that the disk
2495 is not able to handle the operations fast enough. Growth charts are
2496 available in M/Monit.
2497
2498 Syntax:
2499
2500 IF SERVICE TIME <operator> <number> <unit> THEN action
2501
2502 operator is a choice of "<",">","!=","==" in c notation, "gt", "lt",
2503 "eq", "ne" in shell sh notation and "greater", "less", "equal",
2504 "notequal" in human readable form (if not specified, default is EQUAL).
2505
2506 unit is "MS" (millisecond) or "S" (second)
2507
2508 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2509 "UNMONITOR".
2510
2511 Example:
2512
2513 if service time > 10 milliseconds
2514 for 3 times within 5 cycles
2515 then alert
2516
2517 PERMISSION TEST
Monit can test the permissions of file objects. This test may only be
used in the context of a file, fifo, directory or filesystem service
type.
2521
2522 Syntax for testing specific permissions:
2523
2524 IF FAILED PERM(ISSION) octalnumber THEN action
2525
2526 Syntax for testing any permission change:
2527
2528 IF CHANGED PERM(ISSION) THEN action
2529
octalnumber defines permissions for a file, a directory or a filesystem
as four octal digits (0-7). The valid range is 0000 - 7777. You can omit
leading zeros and Monit will pad the value on the left; for example,
"640" is a valid value and matches "0640".
2534
2535 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2536 "UNMONITOR".
2537
2538 Example:
2539
2540 check file shadow with path /etc/shadow
2541 if failed permission 0640 then alert
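
A further sketch, not taken from a real configuration: the first entry
relies on the zero-padding rule described above, the second uses the
change-detection form. The service names bin and passwd and their paths
are assumptions chosen for illustration.

```
# Hypothetical services; adjust names and paths for your system
check directory bin with path /usr/local/bin
    if failed permission 755 then alert   # 755 is read as 0755

check file passwd with path /etc/passwd
    if changed permission then alert
```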
2542
2543 UID TEST
Monit can monitor the owner user id (uid) of a file, fifo or directory,
or the owner and effective user id (euid) of a process.
2546
2547 Syntax:
2548
2549 IF FAILED [E]UID <value> THEN action
2550
2551 value defines a user id either in numeric or in string form.
2552
2553 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2554 "UNMONITOR".
2555
2556 Example:
2557
2558 check file shadow with path /etc/shadow
2559 if failed uid "root" then alert
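
For a process, the syntax above also allows the effective user to be
tested with EUID. A minimal sketch, assuming a hypothetical process
myproc that should be owned by root but run with the effective user
www:

```
# myproc, its pidfile and the user names are illustrative assumptions
check process myproc with pidfile /var/run/myproc.pid
    if failed uid "root" then alert
    if failed euid "www" then alert
```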
2560
2561 GID TEST
2562 Monit can monitor the owner group id (gid) of a file, fifo, directory
2563 or process.
2564
2565 Syntax:
2566
2567 IF FAILED GID <value> THEN action
2568
2569 value defines a group id either in numeric or in string form.
2570
2571 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2572 "UNMONITOR".
2573
2574 Example:
2575
2576 check file shadow with path /etc/shadow
2577 if failed gid "shadow" then alert
2578
2579 PID TEST
2580 Monit can test the process's PID. Monit will send an alert in case the
2581 PID changed outside of Monit's control.
2582
2583 Syntax:
2584
2585 IF CHANGED PID THEN action
2586
2587 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2588 "UNMONITOR".
2589
This test is useful to detect possible process restarts which have
occurred in the timeframe between two Monit testing cycles.

For example, if someone changes the sshd configuration and restarts sshd
outside of Monit's control, you will be notified that the process was
replaced by a new instance:
2596
2597 check process sshd with pidfile /var/run/sshd.pid
2598 if changed pid then alert
2599
2600 PPID TEST
Monit can test the process's parent PID (PPID) for changes. Monit will
send an alert if the PPID changed outside of Monit's control.
2603
2604 The syntax for the ppid statement is:
2605
2606 IF CHANGED PPID THEN action
2607
2608 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2609 "UNMONITOR".
2610
2611 Example:
2612
2613 check process myproc with pidfile /var/run/myproc.pid
2614 if changed ppid then exec "/my/script"
2615
2616 UPTIME TEST
The uptime statement may only be used in a process or system service
type context.
2619
2620 Syntax:
2621
2622 IF UPTIME [[operator] value [unit]] THEN action
2623
2624 operator is a choice of "<", ">", "!=", "==" in C notation, "GT", "LT",
2625 "EQ", "NE" in shell sh notation and "GREATER", "LESS", "EQUAL",
2626 "NOTEQUAL" in human readable form (if not specified, default is EQUAL).
2627
value is an uptime watermark.
2629
2630 unit is either "SECOND", "MINUTE", "HOUR" or "DAY" (it is also possible
2631 to use "SECONDS", "MINUTES", "HOURS", or "DAYS").
2632
2633 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2634 "UNMONITOR".
2635
2636 Example of restarting the process every three days:
2637
2638 check process myapp with pidfile /var/run/myapp.pid
2639 start program = "/etc/init.d/myapp start"
2640 stop program = "/etc/init.d/myapp stop"
2641 if uptime > 3 days then restart
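
The operator can also be turned around to notice a recent start rather
than a long run. A minimal sketch, assuming you want an alert while the
machine or a hypothetical process myapp has been up for less than five
minutes (the threshold is arbitrary):

```
check system $HOST
    if uptime < 5 minutes then alert

# myapp and its pidfile are reused from the example above
check process myapp with pidfile /var/run/myapp.pid
    if uptime < 5 minutes then alert
```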
2642
2643 SECURITY ATTRIBUTE TEST
2644 The security attribute statement may only be used in a process context.
2645
2646 Syntax:
2647
2648 IF FAILED SECURITY ATTRIBUTE <string> THEN <action>
2649
string is the expected security attribute value.
2651
2652 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2653 "UNMONITOR".
2654
2655 Example for SELinux:
2656
2657 check process ntpd matching "ntpd"
2658 if failed security attribute "system_u:system_r:ntpd_t:s0" then alert
2659
2660 Example for AppArmor:
2661
2662 check process ntpd matching "ntpd"
2663 if failed security attribute "/usr/sbin/ntpd (enforce)" then alert
2664
2665 SYSTEM AND PER-PROCESS FILEDESCRIPTORS TEST
Monit can test file descriptor usage on the system and process level.
You can check either an absolute value or the percentage of the current
maximum in use. The per-process percentage can be used only if the
system exposes a per-process maximum.
2670
2671 Syntax:
2672
2673 IF FILEDESCRIPTORS <operator> <number> [%] THEN action
2674
For a process only, you can also check the accumulated number for the
process and all its children.
2677
2678 Syntax:
2679
2680 IF TOTAL FILEDESCRIPTORS <operator> <number> THEN action
2681
2682 operator is a choice of "<",">","!=","==" in c notation, "gt", "lt",
2683 "eq", "ne" in shell sh notation and "greater", "less", "equal",
2684 "notequal" in human readable form (if not specified, default is EQUAL).
2685
number is the limit.
2687
2688 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2689 "UNMONITOR".
2690
2691 Examples:
2692
2693 check system $HOST
2694 if filedescriptors >= 90% then alert
2695
2696 check process myproc with pidfile /var/run/myproc.pid
2697 if filedescriptors >= 90% then alert
2698 if filedescriptors >= 99% then restart
2699 if total filedescriptors > 5000 then alert
2700
2701 PROGRAM STATUS TEST
2702 You can check the exit status of a program or a script. This test may
2703 only be used within a check program service entry in the Monit control
2704 file.
2705
2706 Syntax for testing specific exit value:
2707
2708 IF STATUS operator value THEN action
2709
2710 Syntax for testing any exit value change:
2711
2712 IF CHANGED STATUS THEN action
2713
2714 operator is a choice of "<",">","!=","==" in c notation, "gt", "lt",
2715 "eq", "ne" in shell sh notation and "greater", "less", "equal",
2716 "notequal" in human readable form (if not specified, default is EQUAL).
2717
2718 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2719 "UNMONITOR".
2720
2721 Example:
2722
2723 check program myscript with path /usr/local/bin/myscript.sh
2724 if status != 0 then alert
2725
2726 Sample script for the above example (/usr/local/bin/myscript.sh):
2727
2728 #!/bin/sh
2729 echo test
2730 exit $?
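
The change-detection form takes no operator or value. A minimal sketch,
reusing the script path from the example above under a hypothetical
second service name:

```
# myscript-change is an illustrative service name
check program myscript-change with path /usr/local/bin/myscript.sh
    if changed status then alert
```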
2731
2732 You can also send parameters with the program:
2733
2734 check program list-files with path "/bin/ls -lrt /tmp/"
2735 if status != 0 then alert
2736
Arguments to the program or script are a sequence of whitespace
separated strings. In the above example the strings '-lrt' and '/tmp/'
are arguments to the program '/bin/ls'. If arguments are used, it is
recommended to enclose the whole string in quotes ("). If no arguments
are used, quotes are not needed.
2742
2743 Notes: If the program is a script, the interpreter is required in the
2744 first line. The program or script must also be executable.
2745
2746 If Monit is run as the super user, you can optionally run the program
2747 as a different user and/or group. In this example we run the ls program
2748 as user www and as group staff:
2749
2750 check program ls with path "/bin/ls /tmp" as uid "www"
2751 and gid "staff"
2752 if status != 0 then alert
2753
2754 Monit will execute the program periodically and if the exit status of
2755 the program does not match the expected result, Monit can perform an
2756 action. In the example above, Monit will raise an alert if the exit
2757 value is different from 0. By convention, 0 means the program exited
2758 normally.
2759
Program checks are asynchronous: Monit will not wait for the program to
exit. Instead, Monit starts the program in the background and
immediately continues checking the next service entry in monitrc. At
the next cycle, Monit checks whether the program has finished and, if
so, collects the program's exit status. If the status indicates a
failure, Monit will raise an alert message containing the program's
error (stderr) output, if any. If the program has not exited after the
first cycle, Monit will wait another cycle and so on. If the program is
still running after 5 minutes, Monit will kill it and generate a
program timeout event. It is possible to override the default timeout
(see the syntax below).
2771
The asynchronous nature of the program check allows for non-blocking
behaviour in the current Monit design, but it comes with a side-effect:
when the program has finished executing and is waiting for Monit to
collect the result, it becomes a so-called "zombie" process. A zombie
process does not consume any system resources (only the PID remains in
use). It stays under Monit's control and is removed from the system as
soon as Monit collects the exit status. This means that every "check
program" will be associated with either a running process or a
temporary zombie. This unwanted zombie side-effect will be removed in a
later release of Monit.
2782
2783 Multiple status tests can be used, for example:
2784
2785 check program hwtest with path /usr/local/bin/hwtest.sh
2786 with timeout 500 seconds
2787 if status = 1 then alert
2788 if status = 3 for 5 cycles then exec "/usr/local/bin/emergency.sh"
2789
2790 PROGRAM OUTPUT CONTENT TEST
The content statement can be used to test the output of a program by
using regular expressions.
2793
2794 Syntax:
2795
2796 IF CONTENT <operator> <regex> THEN action
2797
2798 operator is either a "=" for match or "!=" for no-match.
2799
2800 regex is a string containing the extended regular expression. See also
2801 regex(7).
2802
2803 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2804 "UNMONITOR".
2805
2806 By default the output check is limited to 511 characters only. You can
2807 increase the limit using the set limits statement.
2808
2809 Example:
2810
2811 check program disk0_smart with path "/usr/sbin/nvme smart-log /dev/nvme0"
2812 if content != "critical_warning[ ]+: 0" then alert
2813
2814 NETWORK INTERFACE TESTS
2815 Monit can check network interfaces for:
2816
2817 Status
2818 Capacity
2819 Saturation
2820 Upload and download [bytes]
2821 Upload and download [packets]
2822
2823 Link status
2824
2825 You can check the network link state. This test may only be used within
2826 a check network service entry in the Monit control file.
2827
2828 Syntax:
2829
2830 IF LINK <DOWN|UP> THEN action
2831
2832 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2833 "UNMONITOR".
2834
2835 The DOWN test will fail if the link/interface is down or link errors
2836 were detected.
2837
2838 Mixing "link up" and "link down" in the same "check network" is not
2839 supported.
2840
2841 Examples:
2842
2843 check network eth0 with interface eth0
2844 if link down then alert
2845
2846 check network eth5 with interface eth5
2847 if link up then exec "/usr/bin/monit start backup"
2848
If a link fails, you can add a start and stop program so that Monit can
restart the interface automatically, which might help. (Substitute the
relevant network commands for your system.)
2852
2853 check network eth0 with interface eth0
2854 start program = '/sbin/ipup eth0'
2855 stop program = '/sbin/ipdown eth0'
2856 if link down then restart
2857
2858 Link capacity
2859
2860 You can check the network link mode capacity for changes. This test may
2861 only be used within a check network service entry in the Monit control
2862 file.
2863
2864 Syntax:
2865
2866 IF CHANGED LINK [CAPACITY] THEN action
2867
2868 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2869 "UNMONITOR".
2870
2871 The test will match if the link mode has changed (e.g. maximum speed
2872 dropped) or if the duplex mode has changed.
2873
NOTE: not all interface types allow for capacity monitoring. Pseudo
interfaces such as the loopback device or VMware interfaces do not have
a speed attribute.
2877
2878 Example:
2879
2880 check network eth0 with interface eth0
2881 if changed link capacity then alert
2882
2883 Link saturation
2884
2885 You can check the network link saturation. Monit then computes the link
2886 utilisation based on the current transfer rate vs. link capacity. This
2887 test may only be used within a check network service entry in the Monit
2888 control file.
2889
2890 Syntax:
2891
2892 IF SATURATION operator value% THEN action
2893
2894 operator is a choice of "<",">","!=","==" in c notation, "gt", "lt",
2895 "eq", "ne" in shell sh notation and "greater", "less", "equal",
2896 "notequal" in human readable form (if not specified, default is EQUAL).
2897
2898 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2899 "UNMONITOR".
2900
NOTE: this test depends on the availability of the speed attribute and
not all interface types have this attribute. See the Link capacity test
description.
2904
2905 Example:
2906
2907 check network eth0 with interface eth0
2908 if saturation > 90% then alert
2909
2910 Link upload and download [bytes]
2911
2912 You can check a network link upload and download bandwidth usage,
2913 current transfer speed and total data transferred in the last 24 hours.
2914 This test may only be used within a check network service entry in the
2915 Monit control file.
2916
2917 Upload speed test syntax (per second):
2918
2919 IF UPLOAD operator value unit/S THEN action
2920
2921 Download speed test syntax (per second):
2922
2923 IF DOWNLOAD operator value unit/S THEN action
2924
2925 Total upload data test syntax:
2926
2927 IF TOTAL UPLOADED operator value unit IN LAST number time-unit THEN action
2928
2929 Total download data test syntax:
2930
2931 IF TOTAL DOWNLOADED operator value unit IN LAST number time-unit THEN action
2932
2933 operator is a choice of "<",">","!=","==" in c notation, "gt", "lt",
2934 "eq", "ne" in shell sh notation and "greater", "less", "equal",
2935 "notequal" in human readable form (if not specified, default is EQUAL).
2936
2937 unit is a choice of "B","KB","MB","GB" or long alternatives "byte",
2938 "kilobyte", "megabyte", "gigabyte".
2939
2940 time-unit is a choice of "MINUTE(S)", "HOUR(S)", "DAY". NOTE: Monit
2941 maintains a rolling count of total uploaded and downloaded bytes for
2942 the last 24 hours only. The value of time-unit can therefore not
2943 specify a range wider than one day.
2944
2945 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2946 "UNMONITOR".
2947
2948 Examples:
2949
2950 check network eth0 with interface eth0
2951 if upload > 500 kB/s then alert
2952 if total downloaded > 1 GB in last 2 hours then alert
2953 if total downloaded > 10 GB in last day then alert
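
The download rate and total upload forms follow the same pattern. A
sketch with arbitrary, illustrative thresholds (the interface name eth1
is an assumption):

```
# eth1 and all limits below are placeholders
check network eth1 with interface eth1
    if download > 10 MB/s then alert
    if total uploaded > 5 GB in last 12 hours then alert
```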
2954
2955 Link upload and download [packets]
2956
You can check the network link upload and download packet count, both
the current transfer rate and the total number of packets transferred
in the last 24 hours. This test may only be used within a check network
service entry in the Monit control file.
2961
2962 Current upload bandwidth rate test syntax:
2963
2964 IF UPLOAD operator value PACKETS/S THEN action
2965
2966 Current download bandwidth rate test syntax:
2967
2968 IF DOWNLOAD operator value PACKETS/S THEN action
2969
2970 Total upload test syntax:
2971
2972 IF TOTAL UPLOADED operator value PACKETS IN LAST number time-unit THEN action
2973
2974 Total download test syntax:
2975
2976 IF TOTAL DOWNLOADED operator value PACKETS IN LAST number time-unit THEN action
2977
2978 operator is a choice of "<",">","!=","==" in c notation, "gt", "lt",
2979 "eq", "ne" in shell sh notation and "greater", "less", "equal",
2980 "notequal" in human readable form (if not specified, default is EQUAL).
2981
2982 time-unit is a choice of "MINUTE(S)", "HOUR(S)", "DAY". NOTE: Monit
2983 keeps total upload/download statistics only for the last 24 hours. The
2984 time-unit value cannot therefore span more than one day.
2985
2986 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
2987 "UNMONITOR".
2988
2989 Examples:
2990
2991 check network eth0 with interface eth0
2992 if upload > 1000 packets/s then alert
2993 if total uploaded > 900000 packets in last hour then alert
2994
2995 NETWORK PING TEST
Monit can perform a network ping test by sending ICMP echo request
datagram packets to a host and waiting for the reply. This test can
only be used within a check host statement. Monit must also run as the
root user in order to be able to perform the ping test (because the
ping test must use raw sockets, which usually only the super user is
allowed to use).
3002
3003 Syntax:
3004
3005 IF <FAILED|SUCCEEDED> PING[4|6]
3006 [COUNT number]
3007 [SIZE number]
3008 [RESPONSETIME number <MILLISECONDS|SECONDS>]
3009 [TIMEOUT number SECONDS]
3010 [ADDRESS string]
3011 THEN action
3012
If a DNS host name was used in the check host statement and the host
name resolves to several addresses (either IPv4 or IPv6), Monit will
ping the first available address and continue with the next address
until one ping succeeds or until there are no more addresses left to
try. You can force Monit to only ping IPv4 or IPv6 addresses by using
the PING4 or the PING6 keyword instead of PING.
3019
3020 The COUNT parameter specifies how many consecutive ping requests will
3021 be sent to the host in one cycle at maximum. The default value is 3.
3022
3023 The SIZE parameter specifies the ping request payload size. Default is
3024 64 bytes, minimum is 8 bytes, maximum 1492 bytes.
3025
3026 The RESPONSETIME parameter sets the response time limit.
3027
If no reply arrives within TIMEOUT seconds, Monit reports an error. If
at least one reply was received, the ping test is considered a success.
3030
3031 The ADDRESS parameter specifies source IP address.
3032
3033 Monit will, by default, send up to three ping request packets in one
3034 cycle to prevent false alarm (i.e. up to 66% packet loss is tolerated).
3035 You can set the COUNT option to a value between 1 and 20 to send more
3036 or fewer packets. If you require 100% ping success, set the count to 1
3037 (i.e. just one request will be sent, and if the packet was lost an
3038 error will be reported).
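
Since a single reply is enough for success, a COUNT of N tolerates the
loss of up to N-1 of the N requests. A sketch of the two extremes of
the COUNT range given above; the service names are hypothetical and the
addresses are placeholders:

```
# One request only: any lost packet is reported as an error
check host strict-ping with address 192.168.100.1
    if failed ping count 1 then alert

# Twenty requests: fails only if none of them is answered
check host lenient-ping with address 192.168.100.2
    if failed ping count 20 then alert
```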
3039
3040 Note that many ISPs have started to filter out ping or ICMP packets
3041 now, in which case there will be no reply from the host.
3042
3043 If a ping test is used in a check host entry, this test is run first
3044 and if the test should fail, we assume that the connection to the host
3045 is down and Monit will not continue with any subsequent port tests.
3046
3047 Example:
3048
3049 check host mmonit.com with address mmonit.com
3050 if failed ping then alert # IPv4 or IPv6
3051
3052 check host mmonit.com with address 62.109.39.247
3053 if failed ping then alert # Address is IPv4 so IPv4 is preferred
3054
or test that the system is explicitly accessible via IPv4 and IPv6:
3056
3057 check host mmonit.com with address mmonit.com
3058 if failed ping4 then alert # IPv4 only
3059 if failed ping6 then alert # IPv6 only
3060
or with several parameters: send five 128-byte pings to mmonit.com and
wait for up to 10 seconds for a reply:
3063
3064 check host mmonit.com with address mmonit.com
3065 if failed ping count 5 size 128 with timeout 10 seconds then alert
3066
You can also watch a host that is supposed to be offline:
3068
3069 check host offlinehost with address 192.168.100.50
3070 if succeeded ping then alert
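
The RESPONSETIME and ADDRESS options from the syntax above are not
covered by these examples. A sketch that combines them, assuming
192.168.1.10 is a local address of the machine running Monit and that
the options are given in the order listed in the syntax:

```
check host mmonit.com with address mmonit.com
    if failed ping4
        count 5
        size 128
        responsetime 500 milliseconds
        timeout 10 seconds
        address 192.168.1.10    # hypothetical source address
    then alert
```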
3071
3072 CONNECTION TESTS
3073 Monit can perform connection testing via network ports or via Unix
3074 sockets. A connection test may only be used within a process or host
3075 service type context.
3076
3077 If a service listens on one or more sockets, Monit can connect to the
3078 port (using TCP or UDP) and verify that the service will accept a
3079 connection and that it is possible to write and read from the socket.
3080 If a connection is not accepted or if there is a problem with socket
3081 I/O, Monit will execute a specified action.
3082
For TCP/UDP ports, Monit can also alert on a successful connection, e.g.
when a service like mysql should not be publicly available.
3085
3086 TCP/UDP port test syntax:
3087
3088 IF <FAILED|SUCCEEDED>
3089 [HOST string]
3090 <PORT number>
3091 [ADDRESS string]
3092 [IPV4 | IPV6]
3093 [TYPE <TCP|UDP>]
[<SSL|TLS> [with options {...}]]
3095 [CERTIFICATE CHECKSUM [MD5|SHA1] string]
3096 [CERTIFICATE VALID for number DAYS]
3097 [PROTOCOL protocol | <SEND|EXPECT> "string",...]
3098 [RESPONSETIME number <MILLISECONDS|SECONDS>]
3099 [TIMEOUT number SECONDS]
3100 [RETRY number]
3101 THEN action
3102
3103 Unix socket test syntax:
3104
3105 IF <FAILED|SUCCEEDED>
3106 <UNIXSOCKET path>
3107 [TYPE <TCP|UDP>]
3108 [PROTOCOL protocol | <SEND|EXPECT> "string",...]
3109 [RESPONSETIME number <MILLISECONDS|SECONDS>]
3110 [TIMEOUT number SECONDS]
3111 [RETRY number]
3112 THEN action
3113
3114 Examples:
3115
3116 if failed port 80 then alert
3117
3118 if failed port 53 type udp protocol dns then alert
3119
3120 if succeeded host example.org port 3306 type tcp protocol mysql then alert
3121
3122 if failed unixsocket /var/run/sophie then alert
3123
3124 Options:
3125
3126 HOST hostname. Optionally specify the host to connect to. If the host
3127 is not given then localhost is assumed if this test is used inside a
3128 process entry. If this test is used inside a remote host entry then the
3129 entry's remote host is assumed.
3130
3131 PORT number. The port number to connect to
3132
3133 UNIXSOCKET path. Specifies the path to a Unix socket (local machine
3134 only).
3135
3136 ADDRESS string. The source IP address to use.
3137
IPV4 | IPV6. Optionally specify the IP version Monit should use when
trying to connect to the port. If not used, Monit will try to connect
to the first available address (IPv4 or IPv6). If multiple addresses
are available and connection to one address failed, Monit will try the
next address and so on until a connection succeeds or until there are
no more addresses left to try.
3144
TYPE [TCP | UDP]. Optionally specify the socket type Monit should use
when trying to connect to the port. TCP is a regular stream based
socket and UDP is a datagram socket. The default socket type is TCP.
3149
3150 [SSL | TLS] [with options {...}]. Set SSL/TLS options and override
3151 global/default SSL options. You can set the SSL/TLS version to use,
3152 whether to verify certificates, trust self-signed certificates or set
3153 the SSL client certificates database-file for client certificate
3154 authentication.
3155
3156 CERTIFICATE CHECKSUM [MD5|SHA1] hash. Verify the SSL server certificate
3157 by checking its checksum. You can use either MD5 or SHA1 checksum (if
3158 you don't specify the type, Monit will determine the digest based on
3159 the hash length). You can use the openssl command line tool to get the
3160 checksum value for your certificate, which you can then use in Monit's
3161 control file:
3162
3163 openssl x509 -fingerprint -sha1 -in server.crt | head -1 | cut -f2 -d'='
3164
3165 Example:
3166
3167 if failed
3168 port 443
3169 protocol https
3170 and certificate checksum = "1ED948A6F4258ACAB964227EF4EB19FCC453B0F8"
3171 then alert
3172
CERTIFICATE VALID for number DAYS. Trigger the action if the
certificate will expire within the given number of days. This test is
useful for getting a notification when it is time to renew your SSL
certificate.
3176
3177 Example:
3178
3179 if failed
3180 port 443
3181 protocol https
3182 and certificate valid > 30 days
3183 then alert
3184
3185 PROTOCOL protocol. Optionally specify the protocol Monit should speak
3186 when a connection is established. At the moment Monit knows how to
3187 speak:
3188 APACHE-STATUS
3189 DNS
3190 DWP
3191 FAIL2BAN
3192 FTP
3193 GPS
3194 HTTP
3195 HTTPS
3196 IMAP
3197 IMAPS
3198 CLAMAV
3199 LDAP2
3200 LDAP3
3201 LMTP
3202 MEMCACHE
3203 MONGODB
3204 MQTT
3205 MYSQL
3206 MYSQLS
3207 NNTP
3208 NTP3
3209 PGSQL
3210 POP
3211 POPS
3212 POSTFIX-POLICY
3213 RADIUS
3214 RDATE
3215 REDIS
3216 RSYNC
3217 SIEVE
3218 SIP
3219 SMTP
3220 SMTPS
3221 SPAMASSASSIN
3222 SSH
3223 TNS
3224 WEBSOCKET
3225
3226 If the target server's protocol is not found in this list, simply do
3227 not specify the protocol and Monit will use a default connection test.
3228
3229 RESPONSETIME parameter sets the response time limit.
3230
3231 TIMEOUT number SECONDS. Optionally specifies the connect and read
3232 timeout for the connection. If Monit cannot connect to the server
3233 within this time it will assume that the connection failed and execute
3234 the specified action. The default connect timeout is 5 seconds.
3235
3236 RETRY number. Optionally specifies the number of consecutive retries
3237 within the same testing cycle in the case that the connection failed.
3238 The default is fail on first error.
3239
3240 action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or
3241 "UNMONITOR".
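
Several of the options described above can be combined in one test. A
sketch, assuming a hypothetical check host entry for example.org; the
limits are arbitrary and the options follow the order of the syntax
listing:

```
# Service name, host and all limits are illustrative assumptions
check host example with address example.org
    if failed
        port 443
        ipv4
        protocol https
        responsetime 2 seconds
        timeout 10 seconds
        retry 3
    then alert
```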
3242
3243 Specific protocol test options
3244
3245 GENERIC (SEND/EXPECT)
3246
3247 If Monit does not support the protocol spoken by the server, you can
3248 write your own protocol-test using send and expect strings. The SEND
3249 statement sends a string to the server port and the EXPECT statement
3250 compares a string read from the server with the string given in the
3251 expect statement.
3252
3253 Syntax:
3254
3255 [<SEND|EXPECT> "string"]+
3256
Monit will send a string as it is, and you must remember to include CR
and LF in the string sent to the server if the protocol expects such
characters to terminate a string (most text based protocols used over
the Internet do).
3261
3262 Monit will by default read up to 255 bytes from the server and use this
3263 string when comparing the EXPECT string. You can override the default
3264 value using the set limits statement.
3265
3266 You can use non-printable characters in a SEND string if needed. Use
3267 the hex notation, \0xHEXHEX to send any char in the range \0x00-\0xFF,
3268 that is, 0-255 in decimal. For example, to test a Quake 3 server:
3269
3270 send "\0xFF\0xFF\0xFF\0xFFgetstatus"
3271 expect "sv_floodProtect|sv_maxPing"
3272
3273 If your system supports POSIX regular expressions, you can use regular
3274 expressions in the EXPECT string, see regex(7) to learn more about the
3275 types of regular expressions you can use in an expect string.
3276
Since both regex and string comparison operate on a zero terminated
string, you cannot test for '\0' in an EXPECT buffer since this
character marks the end of the buffer. However, Monit escapes '\0' in
the expect buffer as "\0", which you can test for. That is, '\' followed
by the ascii value for 0. For instance, here is how to test for an
expect string that starts with zero followed by any number of
characters:
3283
3284 expect "^[\\]0.*"
3285
3286 Here is a simple SMTP protocol example:
3287
3288 if failed
3289 port 25 and
3290 expect "^220.*"
3291 send "HELO localhost.localdomain\r\n"
3292 expect "^250.*"
3293 send "QUIT\r\n"
3294 then alert
3295
3296 SEND/EXPECT can be used with any socket type, such as TCP sockets, UNIX
3297 sockets and UDP sockets.
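
As an illustration of SEND/EXPECT on a Unix socket, here is a sketch for
a hypothetical daemon that answers PONG to a PING line. The socket
path, the process name and the PING/PONG exchange are assumptions, not
a protocol Monit knows about:

```
# mydaemon, its pidfile and its socket are hypothetical
check process mydaemon with pidfile /var/run/mydaemon.pid
    if failed
        unixsocket /var/run/mydaemon.sock
        send "PING\r\n"
        expect "^PONG"
    then alert
```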
3298
3299 HTTP
3300
3301 Syntax:
3302
3303 PROTO(COL) HTTP
3304 [USERNAME "string"]
3305 [PASSWORD "string"]
3306 [REQUEST "string"]
3307 [METHOD <GET|HEAD>]
3308 [STATUS operator number]
3309 [CHECKSUM checksum]
3310 [HTTP HEADERS list of headers]
3311 [CONTENT < "=" | "!=" > STRING]
3312
3313 USERNAME is an optional username for Basic authentication
3314
3315 PASSWORD is an optional password for Basic authentication
3316
The REQUEST option sets a URL string specifying a document on the HTTP
server. If the request statement isn't specified, the default "/" page
will be requested.
3320
3321 For example:
3322
3323 if failed
3324 port 80
3325 protocol http
3326 request "/data/show?a=b&c=d"
3327 then restart
3328
METHOD sets the HTTP request method. If not specified, Monit prefers
the HTTP GET request method, which is more common than the HEAD method.
You may want to set the method explicitly to HEAD to save network
bandwidth.
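
A sketch that combines Basic authentication with the HEAD method. The
credentials and the request path are placeholders, and the options are
given in the order of the syntax listing above:

```
if failed
    port 80
    protocol http
    username "monit"      # placeholder credentials
    password "secret"
    request "/private/"
    method head
then alert
```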
3333
3334 STATUS option can be used to explicitly test the HTTP status code
3335 returned by the HTTP server. If not used, the HTTP protocol test will
3336 fail if the status code returned is greater than or equal to 400. You
3337 can override this behaviour by using the status qualifier.
3338
3339 For example to test that a page does not exist (the HTTP server should
3340 return 404 in this case):
3341
3342 if failed
3343 port 80
3344 protocol http
3345 request "/non/existent.php"
3346 status = 404
3347 then alert
3348
CHECKSUM You can test the checksum of documents returned by a HTTP
server. Either an MD5 or a SHA1 hash can be used. Monit will not test
the checksum for a document if the server does not set the HTTP
Content-Length header. A HTTP server should set this header when it
serves a static document (i.e. a file). There is no limit on the
document size, but keep in mind that Monit will need time to download
the document over the network to compute the checksum.
3356
3357 Example:
3358
3359 if failed
3360 port 80
3361 protocol http
3362 request "/page.html"
3363 checksum 8f7f419955cefa0b33a2ba316cba3659
3364 then alert
3365
3366 HTTP HEADERS can be used to send a list of HTTP headers when using the
3367 HTTP protocol test. For instance, the host header. If the host header
3368 is not set, Monit will use the hostname or IP-address of the host as
3369 specified in the check host statement. Specifying a host header is
3370 useful if you want to connect to and test a name-based virtual host.
3371 The syntax for setting HTTP headers is
3372
3373 http headers [name:value, name:value,..]
3374
3375 where each name:value pair is separated with ','. If you need to use
3376 ':' in the value string, for instance to set port number for a host
3377 header, you must enclose the value in quotes. For example,
3378
3379 http headers [Host: "mmonit.com:443"]
3380
3381 In a check host context, using this statement might look like
3382
3383 check host mmonit.com with address mmonit.com
3384 if failed
3385 port 80 protocol http
3386 with http headers [Host: mmonit.com, Cache-Control: no-cache,
3387 Cookie: csrftoken=nj1bI3CnMCaiNv4beqo8ZaCfAQQvpgLH]
3388 and request /monit/ with content = "Monit [0-9.]+"
3389 then alert
3390
3391 Setting HTTP headers is associated with the HTTP protocol test and must
3392 come before request as in the example above.
3393
3394 The CONTENT option sets the pattern which is expected in the data
3395 returned by the server. If the pattern doesn't match, the test fails.
3396 In the example above, if the server does not return a page with the
3397 name Monit followed by a version number the test will fail.
3398
By default, at most 1 MB of content is inspected. You can increase this
limit using the set limits statement.
3401
3402 For example:
3403
3404 if failed
3405 port 80
3406 protocol http
3407 content = "foobar [0-9.]+"
3408 then alert
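
The 1 MB inspection limit mentioned above is raised globally rather
than per test. A minimal sketch, assuming that the httpContentBuffer
option of the set limits statement is the one governing this buffer
(the 2 MB value is only an illustration):

    set limits {
        httpContentBuffer: 2 MB
    }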
3409
3410 APACHE-STATUS
3411
3412 The APACHE-STATUS test allows one to check server performance by
3413 examination of the status page generated by Apache's mod_status, which
3414 is expected to be at its default address of
3415 http://www.example.com/server-status.
3416
3417 Syntax:
3418
3419 PROTOCOL APACHE-STATUS [PATH <path>] [USERNAME <string>] [PASSWORD <string>] [<property> <operator> <number>]+
3420
3421 PATH is an optional path to apache status ("/server-status" by default)
3422
3423 USERNAME is an optional username for Basic authentication
3424
3425 PASSWORD is an optional password for Basic authentication
3426
property is the keyword for one of the following child statuses:
3428
3429 (1) logging (loglimit)
3430 (2) closing connections (closelimit)
3431 (3) performing DNS lookups (dnslimit)
3432 (4) in keepalive with a client (keepalivelimit)
3433 (5) replying to a client (replylimit)
3434 (6) receiving a request (requestlimit)
3435 (7) initialising (startlimit)
3436 (8) waiting for incoming connections (waitlimit)
3437 (9) gracefully closing down (gracefullimit)
3438 (10) performing cleanup procedures (cleanuplimit)
3439
3440 operator is one of "<", "=", ">".
3441
number is the limit, expressed as a percentage.
3443
3444 Each of these limits can be compared against a value relative to the
3445 total number of active Apache child processes.
3446
3447 You can combine all of these tests into one expression or you can
3448 choose to test a certain limit only. If you combine the limits you must
3449 connect them together using the OR keyword.
3450
3451 Example:
3452
3453 if failed port 80 protocol apache-status
3454 loglimit > 10% or
3455 dnslimit > 50% or
3456 waitlimit < 20%
3457 then alert
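
The optional PATH, USERNAME and PASSWORD parts of the syntax above can
be combined with the limits. The following is only a sketch: the
"/status" path and the credentials are placeholders, and it assumes
the status page is protected with Basic authentication:

    if failed port 80 protocol apache-status
        path "/status" username "monit" password "secret"
        replylimit > 50% or
        keepalivelimit > 60%
    then alert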
3458
3459 MQTT
3460
3461 Syntax:
3462
3463 PROTOCOL MQTT [USERNAME string PASSWORD string]
3464
3465 USERNAME MQTT username
3466
3467 PASSWORD MQTT password
3468
Username and password (credentials) are optional. If they are not set,
Monit will try to connect anonymously, which may trigger an
authorization error. Credentials are therefore recommended unless your
server allows anonymous connections.
3473
3474 Example:
3475
3476 check process mosquitto with pidfile /var/run/mosquitto.pid
3477 start program = "/sbin/start mosquitto"
3478 stop program = "/sbin/stop mosquitto"
3479 if failed port 1883 protocol mqtt then alert
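
If the broker does not allow anonymous connections, the same check can
carry credentials as described by the syntax above. In this sketch the
username "monit" and password "secret" are placeholders for an account
that exists on your broker:

    check process mosquitto with pidfile /var/run/mosquitto.pid
        start program = "/sbin/start mosquitto"
        stop program = "/sbin/stop mosquitto"
        if failed
            port 1883
            protocol mqtt username "monit" password "secret"
        then alert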
3480
3481 MYSQL
3482
3483 Syntax:
3484
3485 PROTOCOL MYSQL[S] [USERNAME string PASSWORD string [RSAKEY CHECKSUM string]]
3486
3487 USERNAME MySQL username.
3488
3489 PASSWORD MySQL password (special characters can be used, but for non-
3490 alphanumerics the password has to be quoted).
3491
RSAKEY CHECKSUM If you use an unsecured connection (plain MYSQL without
TLS), you can set the expected MD5 or SHA1 checksum of the server's RSA
key to protect against man-in-the-middle attacks. Monit will check the
key fingerprint before sending the password to the server.
3496
3497 Username and password (credentials) are optional and if not set, Monit
3498 will perform the test using anonymous login. This can cause an
3499 authentication error to be logged in your MySQL log, depending on your
3500 MySQL configuration.
3501
If credentials are set, Monit will try to log in. Monit does not require
any database privileges; it just needs the database user. You might
want to create a standalone user for Monit to use when testing, for
example:
3506
3507 CREATE USER 'monit'@'host_from_which_monit_performs_testing' IDENTIFIED BY 'mysecretpassword';
3508 FLUSH PRIVILEGES;
3509
3510 Example:
3511
3512 check process mysql with pidfile /var/run/mysqld/mysqld.pid
3513 start program = "/sbin/start mysql"
3514 stop program = "/sbin/stop mysql"
3515 if failed
3516 port 3306
3517 protocol mysql username "foo" password "bar"
3518 then alert
3519
or using a Unix socket and different start/stop commands:
3521
3522 check process mysql with pidfile /var/run/mysqld/mysqld.pid
3523 start program = "/usr/local/mysql/support-files/mysql.server start"
3524 stop program = "/usr/local/mysql/support-files/mysql.server stop"
3525 if failed
3526 unixsocket /tmp/mysql.sock
3527 protocol mysql username "foo" password "bar"
3528 then alert
3529
3530 You can enable the TLS encryption for the test by using MYSQLS as
3531 protocol name:
3532
3533 if failed
3534 port 3306
3535 protocol mysqls username "foo" password "bar"
3536 then alert
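
For a plain (non-TLS) connection, the RSAKEY CHECKSUM option from the
syntax above can be added after the credentials. This is a sketch
only: the checksum shown is a placeholder borrowed from the HTTP
checksum example and must be replaced with the MD5 or SHA1 checksum of
your own server's RSA key:

    if failed
        port 3306
        protocol mysql username "foo" password "bar"
            rsakey checksum 8f7f419955cefa0b33a2ba316cba3659
    then alert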
3537
3538 PGSQL
3539
3540 Syntax:
3541
PROTOCOL PGSQL [USERNAME string] [PASSWORD string] [DATABASE string]
3543
3544 USERNAME PostgreSQL username.
3545
3546 PASSWORD PostgreSQL password (special characters can be used, but for
3547 non-alphanumerics the password has to be quoted).
3548
3549 DATABASE PostgreSQL database (defaults to the database that matches the
3550 username if not set).
3551
3552 Username and password (credentials) are optional and if not set, Monit
3553 will perform the test with hardcoded user=root and database=root, which
3554 may trigger errors in PostgreSQL logs.
3555
If credentials are set, Monit will try to log in. You might want to
create a standalone user for Monit to use when testing.

Monit currently supports only the 'password' and 'md5' PostgreSQL
authentication methods. If the server asks for an authentication method
that Monit doesn't support (such as 'scram-sha-256'), Monit terminates
the connection and the test succeeds (although Monit cannot
authenticate, the server is communicating).
3564
3565 To allow access to Monit for testing purposes, one can create an
3566 account and allow access for example like this:
3567
3568 PostgreSQL pg_hba.conf entry example:
3569
3570 # TYPE DATABASE USER ADDRESS METHOD
3571 host test monit 127.0.0.1/32 md5
3572
Monit configuration example:
3574
3575 check process postgresql with pidfile /var/run/postgresql/12-main.pid
start program = "/bin/systemctl start postgresql"
stop program = "/bin/systemctl stop postgresql"
3578 if failed
3579 port 5432
3580 protocol pgsql username "monit" password "123456" database "test"
3581 then alert
3582
3583 RADIUS
3584
3585 Syntax:
3586
3587 PROTOCOL RADIUS [SECRET string]
3588
3589 SECRET you may specify an alternative secret, default is "testing123".
3590
3591 For example:
3592
3593 check process radiusd with pidfile /var/run/radiusd.pid
3594 start program = "/etc/init.d/freeradius start"
3595 stop program = "/etc/init.d/freeradius stop"
3596 if failed
3597 host 127.0.0.1 port 1812 type udp protocol radius
3598 secret pingpong
3599 then alert
3600
3601 SIP
3602
3603 The SIP protocol is used by communication platform servers such as
3604 Asterisk and FreeSWITCH.
3605
3606 Syntax:
3607
3608 PROTOCOL SIP [TARGET valid@uri] [MAXFORWARD n]
3609
3610 TARGET you may specify an alternative recipient for the message, by
3611 adding a valid sip uri after this keyword.
3612
MAXFORWARD limits the number of proxies or gateways that can forward the
request to the next server. Its value is an integer in the range
0-255, set by default to 70. If max-forward = 0, the next server may
respond 200 OK (test succeeded) or send a 483 Too Many Hops (test
failed).
3618
3619 For example:
3620
3621 check host openser_all with address 127.0.0.1
3622 if failed
3623 port 5060 type udp protocol sip
3624 with target "localhost:5060" and maxforward 6
3625 then alert
3626
3627 SMTP
3628
3629 Syntax:
3630
3631 PROTOCOL SMTP[S] [USERNAME string PASSWORD string]
3632
3633 USERNAME SMTP username.
3634
3635 PASSWORD SMTP password (special characters can be used, but for non-
3636 alphanumerics the password has to be quoted).
3637
Credentials are optional. When they are set, Monit performs
authentication during the test, so you can verify that authentication
also works. We recommend using smtps when authenticating, so that the
communication is encrypted. If no credentials are set, Monit will just
perform a basic protocol test.
3643
3644 Example:
3645
3646 check process postfix with pidfile /var/spool/postfix/pid/master.pid
3647 start program = "/etc/init.d/postfix start"
3648 stop program = "/etc/init.d/postfix stop"
3649 if failed
3650 port 25
3651 protocol smtp
3652 then alert
3653
3654 Example using authentication and STARTTLS/SMTPS:
3655
3656 check process postfix with pidfile /var/spool/postfix/pid/master.pid
3657 start program = "/etc/init.d/postfix start"
3658 stop program = "/etc/init.d/postfix stop"
3659 if failed
3660 port 25
3661 protocol smtps
3662 username "foo"
3663 password "bar"
3664 then alert
3665
3666 WEBSOCKET
3667
3668 Syntax:
3669
3670 PROTOCOL WEBSOCKET
3671 [REQUEST string]
3672 [HOST string]
3673 [ORIGIN string]
3674 [VERSION number]
3675
3676 HOST you may specify an alternative Host header
3677
3678 REQUEST you may specify an alternative request, default is "/"
3679
3680 ORIGIN you may specify an alternative origin, default is
3681 "https://mmonit.com"
3682
3683 VERSION you may specify an alternative version, default is "0"
3684
3685 For example:
3686
3687 check host websocket.org with address "echo.websocket.org"
3688 if failed
3689 port 80 protocol websocket
3690 host "echo.websocket.org"
3691 request "/"
3692 origin 'http://websocket.com'
3693 version 13
3694 then alert
3695
3697 M/Monit <https://mmonit.com> expands on Monit's capabilities and
3698 provides monitoring and management of all your Monit enabled hosts.
3699
M/Monit uses Monit as an agent. At regular intervals, Monit sends a
status message to M/Monit with a snapshot of the host it is running on.

M/Monit presents the collected data in charts and event logs and gives
you the option to view key performance data of all your hosts in a
modern, clean and well-designed user interface which also works on
mobile devices.
3707
3708 From M/Monit, you can also start, stop and restart services on your
3709 hosts running Monit.
3710
3711 To send data to M/Monit, add the following statement to your Monit
3712 control file:
3713
3714 SET MMONIT <url>
3715 [TIMEOUT <number> SECONDS]
3716 [REGISTER WITHOUT CREDENTIALS]
3717
3718 Example:
3719
3720 set mmonit https://monit:monit@192.168.1.10:8443/collector
3721
3722 Monit will register itself in M/Monit and will start sending status and
3723 event messages to M/Monit. We recommend using https as in the example
3724 above to ensure that the communication between Monit and M/Monit is
3725 secure.
3726
3727 The password should be URL encoded if it contains URL-significant
3728 characters like ":", "?", "@".
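
To illustrate the encoding rule: a hypothetical password p@ss:word
contains "@" and ":", which become %40 and %3A when URL encoded. The
user name, password and address below are assumptions made for this
sketch:

    set mmonit https://monit:p%40ss%3Aword@192.168.1.10:8443/collector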
3729
The default timeout is 5 seconds; you can customise the timeout using
the TIMEOUT option.
3732
3733 When Monit registers itself in M/Monit it sends credentials that can be
3734 used to perform service actions from M/Monit. You can disable sending
3735 credentials by using REGISTER WITHOUT CREDENTIALS and instead manually
3736 add credentials in M/Monit.
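
Both options from the syntax above can be given in one statement. The
following sketch reuses the example collector URL; the 30 second
timeout is an arbitrary illustration, not a recommended value:

    set mmonit https://monit:monit@192.168.1.10:8443/collector
        timeout 30 seconds
        register without credentials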
3737
3739 The simplest form is just the check statement. In this example we check
3740 to see if our web server is running and raise an alert if not:
3741
3742 check process nginx with pidfile /var/run/nginx.pid
3743
3744 To have Monit start the server if it's not running, add a start
3745 statement:
3746
3747 check process nginx with pidfile /var/run/nginx.pid
3748 start program = "/etc/init.d/nginx start"
3749
Here's a more advanced example for monitoring an apache web-server
listening on the default port numbers for HTTP and HTTPS. In this
example Monit will restart apache if it's not accepting connections at
the port numbers. To restart, Monit first executes the stop-program,
then waits (up to 30s) for the process to stop and then executes the
start-program and waits (30s) for it to start. The length of the start
or stop wait can be overridden using the 'timeout' option. If Monit was
unable to stop or start the service, a failed alert message will be
sent if you have requested alert messages to be sent.
3759
3760 check process apache with pidfile /var/run/httpd.pid
3761 start program = "/etc/init.d/httpd start" with timeout 60 seconds
3762 stop program = "/etc/init.d/httpd stop"
3763 if failed port 80 for 2 cycles then restart
3764 if failed port 443 for 2 cycles then restart
3765
This example demonstrates how you can run a program as a specified user
(uid) and with a specified group (gid). Many daemon programs can do the
uid and gid switch by themselves, but for those programs that do not
(e.g. Java programs), Monit's ability to start a program as a certain
user can be very useful. In this example we start the Tomcat Java
Servlet Engine as the standard nobody user and group. Please note that
Monit can only switch uid and gid for the program if the super-user is
running Monit, otherwise Monit will simply ignore the request to change
uid and gid.
3775
3776 check process tomcat with pidfile /var/run/tomcat.pid
3777 start program = "/etc/init.d/tomcat start"
3778 as uid "nobody" and gid "nobody"
3779 stop program = "/etc/init.d/tomcat stop"
3780 # You can also use id numbers instead and write:
3781 as uid 99 and with gid 99
3782 if failed port 8080 then alert
3783
3784 In this example we use udp for connection testing to check if the name-
3785 server is running:
3786
3787 check process named with pidfile /var/run/named.pid
3788 start program = "/etc/init.d/named start"
3789 stop program = "/etc/init.d/named stop"
3790 if failed port 53 use type udp protocol dns then restart
3791
3792 The following example illustrates how to check if the service 'sophie'
3793 is answering connections on its Unix domain socket:
3794
3795 check process sophie with pidfile /var/run/sophie.pid
3796 start program = "/etc/init.d/sophie start"
3797 stop program = "/etc/init.d/sophie stop"
3798 if failed unix /var/run/sophie then restart
3799
3800 In this example we check an apache web-server running on localhost
3801 which answers for several IP-based virtual hosts or vhosts, hence the
3802 host statement before port:
3803
3804 check process apache with pidfile /var/run/httpd.pid
3805 start "/etc/init.d/httpd start"
3806 stop "/etc/init.d/httpd stop"
3807 if failed host www.sol.no port 80 then alert
3808 if failed host shop.sol.no port 443 then alert
3809 if failed host chat.sol.no port 80 then alert
3810
To make sure that Monit is communicating with an HTTP server, a
protocol test can be added:
3813
3814 check process apache with pidfile /var/run/httpd.pid
3815 start "/etc/init.d/httpd start"
3816 stop "/etc/init.d/httpd stop"
3817 if failed
3818 host www.sol.no port 80 protocol http
3819 then alert
3820
This example demonstrates a different way to check a web-server using
the send/expect mechanism:
3823
3824 check process apache with pidfile /var/run/httpd.pid
3825 start "/etc/init.d/httpd start"
3826 stop "/etc/init.d/httpd stop"
3827 if failed
3828 host www.sol.no port 80 and
3829 send "GET / HTTP/1.1\r\nHost: www.sol.no\r\n\r\n"
3830 expect "HTTP/[0-9\.]{3} 200.*"
3831 then alert
3832
3833 Here we ping a remote host to check if it is up and if not, send an
3834 alert:
3835
3836 check host www.tildeslash.com with address www.tildeslash.com
3837 if failed ping then alert
3838
3839 In the following example we ask Monit to compute and verify the
3840 checksum for the underlying apache binary used by the start and stop
3841 programs. If the checksum test should fail, monitoring will be disabled
3842 to prevent possibly restarting a compromised binary:
3843
3844 check process apache with pidfile /var/run/httpd.pid
3845 start program = "/etc/init.d/httpd start"
3846 stop program = "/etc/init.d/httpd stop"
3847 if failed host www.tildeslash.com port 80 then restart
3848 depends on apache_bin
3849
3850 check file apache_bin with path /usr/local/apache/bin/httpd
3851 if failed checksum then unmonitor
3852
3853 In this example we ask Monit to test a document's checksum on a remote
3854 server. If the checksum was changed we send an alert:
3855
3856 check host mmonit.com with address mmonit.com
3857 if failed
3858 port 80 protocol http and
3859 request "/monit/dist/monit-5.7.tar.gz"
3860 with checksum f9d26b8393736b5dfad837bb13780786
3861 then alert
3862
Here are a couple of tests for some popular communication servers,
using the SIP protocol. First we test a FreeSWITCH server and then an
Asterisk server:
3866
3867 check process freeswitch
3868 with pidfile /usr/local/freeswitch/log/freeswitch.pid
3869 start program = "/usr/local/freeswitch/bin/freeswitch -nc -hp"
3870 stop program = "/usr/local/freeswitch/bin/freeswitch -stop"
3871 if total memory > 1000.0 MB for 5 cycles then alert
3872 if total memory > 1500.0 MB for 5 cycles then alert
3873 if total memory > 2000.0 MB for 5 cycles then restart
3874 if cpu > 60% for 5 cycles then alert
3875 if failed
3876 port 5060 type udp protocol SIP
3877 target me@foo.bar and maxforward 10
3878 then restart
3879
3880 check process asterisk
3881 with pidfile /var/run/asterisk/asterisk.pid
3882 start program = "/usr/sbin/asterisk"
3883 stop program = "/usr/sbin/asterisk -r -x 'shutdown now'"
3884 if total memory > 1000.0 MB for 5 cycles then alert
3885 if total memory > 1500.0 MB for 5 cycles then alert
3886 if total memory > 2000.0 MB for 5 cycles then restart
3887 if cpu > 60% for 5 cycles then alert
3888 if failed
3889 port 5060 type udp protocol SIP
3890 and target me@foo.bar maxforward 10
3891 then restart
3892
Some servers are slow starters, for example Java-based application
servers. If we want to keep the poll cycle short (i.e. < 60 seconds) but
allow some services to take their time to start, the every statement is
handy:
3897
3898 check process dynamo with pidfile /etc/dynamo.pid every 2 cycles
3899 start program = "/etc/init.d/dynamo start"
3900 stop program = "/etc/init.d/dynamo stop"
3901 if failed port 8840 then alert
3902
Here is an example where we group together two database entries so you
can manage them together, e.g. 'monit -g database start all'. The mode
statement is also illustrated in the first entry and has the effect
that Monit will not try to (re)start this service if it is not running:
3907
3908 check process sybase with pidfile /var/run/sybase.pid
3909 start = "/etc/init.d/sybase start"
3910 stop = "/etc/init.d/sybase stop"
3911 mode passive
3912 group database
3913
3914 check process oracle with pidfile /var/run/oracle.pid
3915 start program = "/etc/init.d/oracle start"
3916 stop program = "/etc/init.d/oracle stop"
3917 if failed
3918 port 9001 protocol tns
3919 then restart
3920 group database
3921
This resource check example will send an alert if the CPU usage of the
Apache HTTP daemon itself goes beyond 40%, or that of the daemon and its
child processes goes beyond 60%, for two cycles. Apache is restarted if
the total CPU usage is over 80% for five cycles and stopped if the
memory usage is over 100 MB for five cycles:
3926
3927 check process apache with pidfile /var/run/httpd.pid
3928 start program = "/etc/init.d/httpd start"
3929 stop program = "/etc/init.d/httpd stop"
3930 if cpu > 40% for 2 cycles then alert
3931 if total cpu > 60% for 2 cycles then alert
3932 if total cpu > 80% for 5 cycles then restart
3933 if mem > 100 MB for 5 cycles then stop
3934
This example demonstrates the timestamp statement with exec and how you
may restart apache if its configuration file was changed:
3937
3938 check file httpd.conf with path /etc/httpd/httpd.conf
3939 if changed timestamp
3940 then exec "/etc/init.d/httpd graceful"
3941
3942 In this example we demonstrate usage of the extended alert statement
3943 and a file check dependency:
3944
3945 check process apache with pidfile /var/run/httpd.pid
3946 start = "/etc/init.d/httpd start"
3947 stop = "/etc/init.d/httpd stop"
3948 alert admin@bar on {nonexist, timeout}
3949 with mail-format {
3950 from: bofh@$HOST
3951 subject: apache $EVENT - $ACTION
3952 message: This event occurred on $HOST at $DATE.
3953 Your faithful employee,
3954 monit
3955 }
3956 if failed host www.tildeslash.com port 80 then restart
3957 depend httpd_bin
3958 group apache
3959
3960 check file httpd_bin with path /usr/local/apache/bin/httpd
3961 alert security@bar on {checksum, timestamp,
3962 permission, uid, gid}
3963 with mail-format {subject: Alaaarrm! on $HOST}
3964 if failed checksum
3965 and expect 8f7f419955cefa0b33a2ba316cba3659
3966 then unmonitor
3967 if failed permission 755 then unmonitor
3968 if failed uid "root" then unmonitor
3969 if failed gid "root" then unmonitor
3970 if changed timestamp then alert
3971 group apache
3972
3973 In this example, we demonstrate usage of the depend statement. In this
3974 case, we want to start oracle and apache. However, we've set up apache
3975 to use oracle as a back end, and if oracle is restarted, apache must be
3976 restarted as well.
3977
3978 check process apache with pidfile /var/run/httpd.pid
3979 start = "/etc/init.d/httpd start"
3980 stop = "/etc/init.d/httpd stop"
3981 depends on oracle
3982
3983 check process oracle with pidfile /var/run/oracle.pid
3984 start = "/etc/init.d/oracle start"
3985 stop = "/etc/init.d/oracle stop"
3986 if failed port 9001 for 5 cycles then restart
3987
Next, we have two services, oracle-import and oracle-export, that need
to be restarted if oracle is restarted, but are independent of each
other.
3990
3991 check process oracle with pidfile /var/run/oracle.pid
3992 start = "/etc/init.d/oracle start"
3993 stop = "/etc/init.d/oracle stop"
3994 if failed port 9001 for 3 cycles then restart
3995
3996 check process oracle-import
3997 with pidfile /var/run/oracle-import.pid
3998 start = "/etc/init.d/oracle-import start"
3999 stop = "/etc/init.d/oracle-import stop"
4000 depends on oracle
4001
4002 check process oracle-export
4003 with pidfile /var/run/oracle-export.pid
4004 start = "/etc/init.d/oracle-export start"
4005 stop = "/etc/init.d/oracle-export stop"
4006 depends on oracle
4007
4009 ~/.monitrc
4010 Default run control file
4011
4012 /etc/monitrc
4013 If the control file is not found in the default
4014 location and /etc contains a monitrc file, this
4015 file will be used instead.
4016
4017 ./monitrc
4018 If the control file is not found in either of the
4019 previous two locations, and the current working
4020 directory contains a monitrc file, this file is
4021 used instead.
4022
4023 ~/.monit.pid
4024 Lock file to help prevent concurrent runs (non-root
4025 mode).
4026
4027 /run/monit.pid
4028 Lock file to help prevent concurrent runs (root mode,
4029 Linux systems, if /run directory is available).
4030
4031 /var/run/monit.pid
4032 Lock file to help prevent concurrent runs (root mode,
4033 Linux systems).
4034
4035 /etc/monit.pid
4036 Lock file to help prevent concurrent runs (root mode,
4037 systems without /var/run).
4038
4039 ~/.monit.state
4040 Monit saves its state to this file and utilises
4041 information found in this file to recover from
4042 a crash. This is a binary file and its content is
4043 only of interest to monit. You may set the location
4044 of this file in the Monit control file or by using
4045 the -s switch when Monit is started.
4046
4047 ~/.monit.id
Monit saves its unique id to this file.
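
As noted above for the state file, these default locations can be
overridden in the control file. A minimal sketch, assuming the set
statefile, set idfile and set pidfile statements and using
illustrative paths that must exist and be writable on your system:

    set statefile /var/lib/monit/state
    set idfile /var/lib/monit/id
    set pidfile /var/run/monit.pid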
4049
No environment variables are used by Monit. However, when Monit
executes a start/stop/restart program or an exec action, it will set
several environment variables which can be utilised by the executable
to get information about the event that triggered the action.
4055
4056 The following environment variable is set for every program executed by
4057 monit, including check program:
4058
4059 MONIT_SERVICE
4060 The name of the service (from monitrc) for which the program is
4061 executed.
4062
4063 The following environment variables are only available in the service
4064 start/stop/restart program and exec action context:
4065
4066 MONIT_EVENT
4067 The event that occurred on the service
4068
4069 MONIT_DESCRIPTION
4070 A description of the error condition
4071
4072 MONIT_DATE
4073 The time and date (RFC 822 style) the event occurred
4074
4075 MONIT_HOST
4076 The host the event occurred on
4077
4078 The following environment variables are only available in the check
4079 process start/stop/restart program and exec action context:
4080
4081 MONIT_PROCESS_PID
The process pid. This may be 0 if the process was (re)started.

MONIT_PROCESS_MEMORY
Process memory. This may be 0 if the process was (re)started.

MONIT_PROCESS_CHILDREN
Process children. This may be 0 if the process was (re)started.

MONIT_PROCESS_CPU_PERCENT
Process cpu%. This may be 0 if the process was (re)started.
4092
4093 The following environment variables are only available for check
4094 program start/stop/restart program and exec action context:
4095
4096 MONIT_PROGRAM_STATUS
4097 The program status (exit value).
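
As a sketch of how these variables might be put to use, the exec
action below hands the event to a script. The script path is
hypothetical and not part of Monit; such a script could read
MONIT_SERVICE, MONIT_EVENT, MONIT_DESCRIPTION and the other variables
listed above from its environment, for instance to write a custom log
entry:

    check process nginx with pidfile /var/run/nginx.pid
        if failed port 80 then exec "/usr/local/bin/log-monit-event.sh"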
4098
4100 If a Monit daemon is running, SIGUSR1 wakes it up from its sleep phase
4101 and forces a poll of all services. SIGTERM and SIGINT will gracefully
4102 terminate a Monit daemon. The SIGTERM signal is sent to a Monit daemon
4103 if Monit is started with the quit action argument.
4104
Sending a SIGHUP signal to a running Monit daemon will force the daemon
to reinitialise itself; specifically, it will reread its configuration
and close and reopen its log files.
4108
4109 Running Monit in foreground while a background Monit daemon is running
4110 will wake up the daemon.
4111
This is a very silent program. Use the -v switch if you want to see
what Monit is doing, and tail -f the log file. Optionally, for testing
purposes, you can start Monit with the -Iv switch. Monit will then
print debug information to the console. To stop Monit in this mode,
simply press CTRL+C (i.e. SIGINT) in the same console.
4118
4119 The syntax (and parser) of the control file was inspired by Eric S.
4120 Raymond et al.'s excellent fetchmail program. Some portions of this man
4121 page also receive inspiration from the same authors.
4122
4124 Copyright (C) 2001-2022 by Tildeslash Ltd. All Rights Reserved. This
4125 product is distributed in the hope that it will be useful, but WITHOUT
4126 any warranty; without even the implied warranty of MERCHANTABILITY or
4127 FITNESS for a particular purpose.
4128
4130 GNU text utilities; md5sum(1); sha1sum(1); openssl(1); glob(7);
4131 regex(7); https://mmonit.com
4132
4133
4134
5.32.0 www.mmonit.com MONIT(1)