WATCHDOG.CONF(5) File Formats Manual WATCHDOG.CONF(5)

NAME
watchdog.conf - configuration file for the watchdog daemon

DESCRIPTION
This file carries all configuration options for the Linux watchdog
daemon. Each option has to be written on a line by itself. Comments
start with '#'. Blanks are ignored except after the '=' sign. An empty
text after the '=' sign disables the feature as long as that makes
sense.

OPTIONS
interval = <interval>
Set the highest possible interval between two writes to the watchdog
device. The device is triggered after each check regardless of the
time it took. After finishing all checks, watchdog goes to sleep for a
full cycle of <interval> seconds. Default value is 1 second. The kernel
driver expects a write command every minute; otherwise the system will
be rebooted. Therefore an interval of more than a minute can only be
used with the force command-line option (--force or -f).

logtick = <logtick>
If you enable verbose logging, a message is written to the syslog or a
logfile. While this is nice, it is not necessary to get a message every
interval, which fills up the disk and costs CPU time. logtick sets the
number of intervals skipped before a log message is written. If you use
logtick = 60 and interval = 10, a message is written only every 10
minutes (600 seconds). This may make the exact time of a crash harder
to find, but it greatly reduces disk usage and spares the
administrator's nerves when looking for a particular syslog entry among
the watchdog messages.

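As a rough illustration of how interval and logtick combine, the
fragment below restates the example from the logtick description. The
values are illustrative only, not recommendations.

```conf
# Write to the watchdog device and run the checks every 10 seconds.
interval = 10

# With verbose logging enabled, log only once per 60 intervals:
# 60 * 10 s = 600 s, i.e. one message every 10 minutes.
logtick = 60
```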
max-load-1 = <load1>
Set the maximal allowed load average for a 1 minute span. Once this
load average is reached the system is rebooted. Default value is 0,
which means the load average check is disabled. Be careful not to set
this parameter too low. To set a value less than the predefined minimal
value of 2, you have to use the -f command line option.

max-load-5 = <load5>
Set the maximal allowed load average for a 5 minute span. Once this
load average is reached the system is rebooted. Default value is
3/4*max-load-1. Be careful not to set this parameter too low. To set a
value less than the predefined minimal value of 2, you have to use the
-f command line option.

max-load-15 = <load15>
Set the maximal allowed load average for a 15 minute span. Once this
load average is reached the system is rebooted. Default value is
1/2*max-load-1. Be careful not to set this parameter too low. To set a
value less than the predefined minimal value of 2, you have to use the
-f command line option.

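The defaults for the 5 and 15 minute limits follow from max-load-1, so
a single line can enable all three checks. The sketch below assumes an
arbitrary limit of 24, chosen only to make the arithmetic easy to
follow.

```conf
# Reboot once the 1 minute load average reaches 24.
max-load-1 = 24

# Left unset, the other two limits are derived from max-load-1:
#   max-load-5  defaults to 3/4 * 24 = 18
#   max-load-15 defaults to 1/2 * 24 = 12
# Setting them explicitly to the same values is equivalent:
max-load-5 = 18
max-load-15 = 12
```

All three values are above the predefined minimum of 2, so the -f
option is not needed for this fragment.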
min-memory = <minpage>
Set the minimal amount of virtual memory that has to stay free. Note
that this is in memory pages (4kB on x86). Default value is 0 pages,
which means this test is disabled. The page size is taken from the
system include files. This is a 'passive' test and works by reading
/proc/meminfo.

allocatable-memory = <minpage>
Set the minimum amount of allocatable memory available on the system.
Note that this is in pages. Default value is 0 pages, which means the
test is disabled. As with min-memory, the page size is taken from the
system include files. This is an 'active' test and it works by
attempting to memory-map a block of the configured size.

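Because both memory options count pages rather than bytes, it can help
to write the conversion down next to the setting. The sketch below
assumes a page size of 4 kB, as on x86; the thresholds themselves are
arbitrary examples.

```conf
# Passive check: require 2560 free pages of virtual memory.
# Assuming 4 kB pages: 2560 * 4 kB = 10240 kB, about 10 MB.
min-memory = 2560

# Active check: try to memory-map a block of 1280 pages.
# Assuming 4 kB pages: 1280 * 4 kB = 5120 kB, about 5 MB.
allocatable-memory = 1280
```

On a system with a different page size the same page counts correspond
to different byte amounts.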
watchdog-device = <device>
Set the watchdog device name, typically /dev/watchdog. Default is to
disable keep-alive support. This should be tested by running the daemon
from the command line before configuring it to start automatically at
boot.

watchdog-timeout = <timeout>
Set the watchdog device timeout during startup. If not set, a default
is used that should be set to the kernel timer margin at compile time.

temperature-sensor = <temp-virtual-file>
Set the temperature sensor name. This is normally a 'virtual file'
under /sys and it contains the temperature in milli-Celsius. Usually
these are generated by the sensors package, but take care as device
enumeration may not be fixed. Default is to disable temperature
checking. Multiple sensors can be used by having repeated
temperature-sensor entries.

max-temperature = <temp>
Set the maximal allowed temperature. Once this temperature is reached
the system is stopped. Default value is 90 C. Watchdog will issue
warnings once the temperature reaches 90%, 95% and 98% of this
temperature.

temp-power-off = <yes|no>
Set the watchdog action on overheating. With yes (the default) the
machine is powered off; with no the machine is halted, which allows a
Ctrl-Alt-Del reboot.

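The three temperature options are typically used together. In the
sketch below the sensor paths are hypothetical: the actual files under
/sys depend on the hardware and, as noted above, their enumeration may
change between boots.

```conf
# Hypothetical sensor files; check which ones exist on your system.
temperature-sensor = /sys/class/hwmon/hwmon0/temp1_input
temperature-sensor = /sys/class/hwmon/hwmon0/temp2_input

# Stop the system at 90 C (the documented default). Warnings are
# then issued at 90%, 95% and 98% of this value:
#   0.90 * 90 = 81.0 C
#   0.95 * 90 = 85.5 C
#   0.98 * 90 = 88.2 C
max-temperature = 90

# Power the machine off on overheating (the documented default).
temp-power-off = yes
```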
file = <filename>
Set file name for file mode. This option can be given as often as you
like to check several files.

change = <mtime>
Set the change interval time for file mode. This option always belongs
to the active filename, that is, when finding a 'change =' line
watchdog assumes it belongs to the most recently read 'file =' line.
They don't necessarily have to follow each other directly, but you
cannot specify a 'change =' before a 'file ='. The default is to only
stat the file and not look for changes. Using this feature to monitor
changes in /var/log/messages might require some special syslog daemon
configuration, e.g. rsyslog needs "$ActionWriteAllMarkMessages on" to
be set to make sure the marks are written no matter what.

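The pairing rule between 'file =' and 'change =' is easiest to see in a
small fragment. This is only a sketch: the second path is a
hypothetical file, and the change value is assumed here to be in
seconds, which the description above does not state explicitly.

```conf
# First file: no 'change =' line refers to it, so it is only stat'ed.
file = /var/log/messages

# Second file (hypothetical path).
file = /var/run/example-heartbeat

# This line belongs to the most recently read 'file =' line, i.e. to
# /var/run/example-heartbeat and not to /var/log/messages.
change = 300
```

Moving the 'change =' line directly below the first 'file =' line would
attach it to /var/log/messages instead.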
pidfile = <pidfilename>
Set pidfile name for server test mode. This option can be given as
often as you like to check several servers. See the Systemd section in
watchdog(8) for more information.

ping = <ip-addr>
Set IPv4 address for ping mode. This option can be used more than once
to check different connections.

interface = <if-name>
Set interface name for network mode. This option can be used more than
once to check different interfaces. Note it is only possible to check
physical interfaces, and not aliased IP interfaces.

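A minimal sketch of the network checks follows. The addresses come from
the range reserved for documentation and the interface name is a
placeholder; substitute values that exist on your network.

```conf
# Check two different connections (placeholder IPv4 addresses).
ping = 192.0.2.1
ping = 192.0.2.2

# Check a physical interface (placeholder name). Aliased IP
# interfaces cannot be checked this way.
interface = eth0
```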
test-binary = <testbin>
Execute the given binary to do some user defined tests. With an
enforcing SELinux policy, please use the /usr/libexec/watchdog/scripts/
directory for your test-binary configuration.

test-timeout = <timeout in seconds>
User defined tests may only run for <timeout> seconds. Set to 0 for
unlimited.

repair-binary = <repbin>
Execute the given binary in case of a problem instead of shutting down
the system. With an enforcing SELinux policy, please use the
/usr/libexec/watchdog/scripts/ directory for your repair-binary
configuration.

repair-timeout = <timeout in seconds>
The repair command may only run for <timeout> seconds. Set to 0 for
'unlimited', but note that the hardware timer is not refreshed in this
case so the system will hard-reset at some point.

retry-timeout = <timeout in seconds>
Allow most error conditions to persist for <timeout> seconds. Set to 0
for immediate action (like softboot behaviour).

repair-maximum = <count>
This allows no more than <count> repair attempts against a given fault
that report success (i.e. return 0) but fail to clear the fault before
a reboot is initiated anyway. If set to zero, a repairable fault can
always be blocked by a repair program reporting success (previous
daemon behaviour).

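The test and repair options above are usually configured as a group.
The fragment below is a sketch under stated assumptions: the two script
names are hypothetical, and the timeouts and the repair count are
arbitrary example values rather than recommended settings.

```conf
# Hypothetical user test, placed in the directory suggested for
# systems with an enforcing SELinux policy.
test-binary = /usr/libexec/watchdog/scripts/check-example
# The test may run for at most 30 seconds.
test-timeout = 30

# Hypothetical repair program, run instead of shutting down.
repair-binary = /usr/libexec/watchdog/scripts/repair-example
# The repair may run for at most 30 seconds. A value of 0 would
# mean 'unlimited', at the risk of a hard reset by the hardware timer.
repair-timeout = 30

# Tolerate most error conditions for up to 60 seconds.
retry-timeout = 60

# Allow at most 2 repairs that return 0 without clearing the fault,
# then reboot anyway.
repair-maximum = 2
```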
admin = <mail-address>
Email address to send admin mail to, that is, who shall be notified
that the machine is being halted or rebooted. Default is 'root'. If you
want to disable notification via email, just set admin to an empty
string.

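The admin option is a convenient place to see the 'empty text after the
= sign' rule in practice. The address below is a placeholder.

```conf
# Notify this address when the machine is halted or rebooted.
admin = admin@example.org

# To disable mail notification instead, leave the value empty:
# admin =
```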
realtime = <yes|no>
If set to yes, watchdog will lock itself into memory so it is never
swapped out.

priority = <schedule priority>
Set the schedule priority for realtime mode.

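A short sketch of the two realtime options used together. The priority
value is only an example; this page does not state the valid range, so
treat the number as an assumption.

```conf
# Lock the daemon into memory so it is never swapped out.
realtime = yes

# Schedule priority used in realtime mode (example value).
priority = 1
```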
test-directory = <test directory>
Set the directory to run user test/repair scripts. Default is
'/etc/watchdog.d'. The /etc/watchdog.d/ directory is recognized by the
SELinux policy. See the Test Directory section in watchdog(8) for more
information.

log-dir = <log directory>
Set the log directory to capture the standard output and standard
error from repair-binary and test-binary execution. Default is
'/var/log/watchdog'.

sigterm-delay = <time in seconds>
Set the time on shutdown between first sending SIGTERM to all processes
and then sending SIGKILL. Default is 5 seconds, which is generally
enough, but systems with large databases or virtual machines might need
longer.

verbose = <yes|no>
This overrides the command line --verbose option. Generally the verbose
mode is only enabled for debugging as it creates a lot of syslog
chatter, so use this option with consideration.

FILES
/etc/watchdog.conf
The watchdog configuration file.

/etc/watchdog.d
A directory containing test-or-repair commands. See the Test Directory
section in watchdog(8) for more information.

SEE ALSO
watchdog(8)

4th Berkeley Distribution January 2016 WATCHDOG.CONF(5)