watchdog.conf(5)

WATCHDOG.CONF(5)              File Formats Manual             WATCHDOG.CONF(5)

NAME

       watchdog.conf - configuration file for the watchdog daemon

DESCRIPTION

       This file carries all configuration options for the Linux watchdog
       daemon. Each option has to be written on a line of its own. Comments
       start with '#'. Blanks are ignored except after the '=' sign. An empty
       text after the '=' sign disables the feature, where that makes sense.

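       To illustrate these rules, the following Python sketch reads a
       configuration text into a dictionary. It is not the daemon's own
       parser. The function name parse_watchdog_conf and the set of
       repeatable options are assumptions drawn from this page. The sketch
       also assumes that only whole lines starting with '#' are comments,
       that blanks around a value are trimmed, and that a later setting of a
       single-valued option replaces an earlier one.

```python
# Hypothetical sketch of the watchdog.conf syntax rules; not the daemon's parser.

# Options this page says may be given more than once. 'change' is left out
# because it pairs with the preceding 'file' by position and needs ordered
# handling (not shown here).
REPEATABLE = {"file", "pidfile", "ping", "interface", "temperature-sensor"}


def parse_watchdog_conf(text):
    """Return {option: value}; repeatable options map to a list of values."""
    options = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment line
        if "=" not in line:
            raise ValueError(f"line {lineno}: expected 'name = value'")
        name, value = line.split("=", 1)
        name = "".join(name.split())   # blanks before '=' are ignored
        value = value.strip() or None  # empty text after '=': feature disabled
        if name in REPEATABLE:
            options.setdefault(name, []).append(value)
        else:
            options[name] = value      # assumption: the last setting wins
    return options


if __name__ == "__main__":
    sample = """
    # monitor two gateways, no admin mail
    interval = 10
    ping = 192.168.1.1
    ping = 192.168.1.2
    admin =
    """
    print(parse_watchdog_conf(sample))
```
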

OPTIONS

       interval = <interval>
              Set the highest possible interval between two writes to the
              watchdog device. The device is triggered after each check
              regardless of the time it took. After finishing all checks,
              watchdog goes to sleep for a full cycle of <interval> seconds.
              Default value is 1 second. The kernel drivers typically expect
              a write command every minute; otherwise the system will be
              rebooted. Therefore an interval of more than a minute can only
              be used with the force command-line option [--force | -f].

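              The following minimal Python sketch shows the cycle described
              above. It assumes that the checks and the device write are
              passed in as plain callables. The names interval_allowed and
              run_cycles are hypothetical. The daemon's real loop also
              handles logging, errors and repair.

```python
import time

MAX_INTERVAL_WITHOUT_FORCE = 60  # kernel drivers typically expect a write every minute


def interval_allowed(interval, force=False):
    """Intervals above one minute are only accepted with --force / -f."""
    return interval <= MAX_INTERVAL_WITHOUT_FORCE or force


def run_cycles(checks, write_keepalive, interval, cycles, sleep=time.sleep):
    """Run every check, refreshing the device after each one, then sleep."""
    for _ in range(cycles):
        for check in checks:
            check()
            write_keepalive()  # triggered after each check, however long it took
        sleep(interval)        # then a full cycle of <interval> seconds


if __name__ == "__main__":
    run_cycles([lambda: None, lambda: None],
               lambda: print("keep-alive"), interval=1, cycles=2)
```
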
       logtick = <logtick>
              If you enable verbose logging, a message is written into the
              syslog or a logfile. While this is nice, a message every
              interval is not necessary; it fills up the disk and costs CPU
              time. logtick sets the number of intervals skipped before a log
              message is written. With logtick = 60 and interval = 10, a
              message is written only every 10 minutes (600 seconds). This
              may make the exact time of a crash harder to find, but it
              greatly reduces disk usage, and administrator nerves when you
              are looking for a particular syslog entry in between watchdog
              messages.

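              The arithmetic in the example above (60 intervals of 10
              seconds) can be sketched in Python. The helper names are
              hypothetical. The sketch assumes that a message is logged on
              the first cycle and then on every logtick-th cycle after that.

```python
def seconds_between_messages(interval, logtick):
    """Wall-clock gap between verbose log messages."""
    return interval * logtick


def should_log(cycle, logtick):
    """True on every logtick-th cycle, counting cycles from 0 (an assumption)."""
    return cycle % logtick == 0


if __name__ == "__main__":
    print(seconds_between_messages(10, 60))             # 600
    print([c for c in range(150) if should_log(c, 60)])  # [0, 60, 120]
```
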
       max-load-1 = <load1>
              Set the maximum allowed load average for a 1 minute span. Once
              this load average is reached, the system is rebooted. Default
              value is 0, which disables the load average check. Be careful
              not to set this parameter too low. To set a value less than the
              predefined minimum of 2, you have to use the -f command-line
              option.

       max-load-5 = <load5>
              Set the maximum allowed load average for a 5 minute span. Once
              this load average is reached, the system is rebooted. Default
              value is 3/4*max-load-1. Be careful not to set this parameter
              too low. To set a value less than the predefined minimum of 2,
              you have to use the -f command-line option.

       max-load-15 = <load15>
              Set the maximum allowed load average for a 15 minute span. Once
              this load average is reached, the system is rebooted. Default
              value is 1/2*max-load-1. Be careful not to set this parameter
              too low. To set a value less than the predefined minimum of 2,
              you have to use the -f command-line option.

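              The following Python sketch shows how the three load limits and
              their defaults fit together, with a limit of 0 treated as
              disabled. This page does not say whether the daemon rounds the
              3/4 and 1/2 defaults, so the sketch keeps fractions. The
              minimum of 2 that applies without -f is not modelled. The
              function names are hypothetical.

```python
import os


def load_limits(max_load_1, max_load_5=None, max_load_15=None):
    """Apply the documented defaults: 3/4 and 1/2 of max-load-1."""
    if max_load_5 is None:
        max_load_5 = max_load_1 * 3 / 4
    if max_load_15 is None:
        max_load_15 = max_load_1 / 2
    return (max_load_1, max_load_5, max_load_15)


def load_exceeded(loads, limits):
    """True once any enabled limit is reached; a limit of 0 disables it."""
    return any(limit > 0 and load >= limit
               for load, limit in zip(loads, limits))


if __name__ == "__main__":
    # os.getloadavg() returns the 1, 5 and 15 minute load averages.
    print(load_exceeded(os.getloadavg(), load_limits(24)))
```
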
       min-memory = <minpage>
              Set the minimum amount of memory that has to stay free. Note
              that this is in memory pages (4kB on x86). Default value is 0
              pages, which means this test is disabled. The page size is taken
              from the system include files. The usable memory is computed
              as MemFree + Buffers + Cached, since buffer and cache use
              typically expand to use most free memory, but the kernel will
              reclaim this as needed. NOTE: If this measure gets below a few
              tens of MB, the system will swap pages aggressively and have
              poorer file system performance due to the lack of caching. This
              is a 'passive' test and works by reading /proc/meminfo.

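              The following Python sketch implements the computation above
              (MemFree + Buffers + Cached, converted to pages). /proc/meminfo
              reports these fields in kB. This page says the daemon takes the
              page size from the system include files. The sketch instead
              asks resource.getpagesize() at run time. The helper names are
              hypothetical.

```python
import resource

PAGE_SIZE = resource.getpagesize()  # runtime page size, e.g. 4096 on x86


def meminfo_kb(text):
    """Parse /proc/meminfo content into {field: value}, values in kB."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def usable_pages(text, page_size=PAGE_SIZE):
    """MemFree + Buffers + Cached, converted from kB to pages."""
    info = meminfo_kb(text)
    usable_kb = info["MemFree"] + info["Buffers"] + info["Cached"]
    return usable_kb * 1024 // page_size


def min_memory_ok(text, min_pages, page_size=PAGE_SIZE):
    """min-memory = 0 disables the test."""
    return min_pages == 0 or usable_pages(text, page_size) >= min_pages


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        print(min_memory_ok(f.read(), 2500))  # 2500 pages is about 10 MB at 4 kB
```
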
       allocatable-memory = <minpage>
              Set the minimum amount of allocatable memory available on the
              system. Note that this is in pages. Default value is 0 pages,
              which means the test is disabled. As with min-memory, the page
              size is taken from the system include files. This is an
              'active' test, and it works by attempting to memory-map a block
              of the configured size.

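              The following Python sketch shows an 'active' allocation test
              that memory-maps an anonymous block of the configured size and
              releases it at once. The helper name can_map is hypothetical.
              Under Linux memory overcommit, a successful mapping does not
              guarantee that the pages can later be backed by RAM. This
              sketch does not touch the pages, and this page does not say
              whether the daemon does.

```python
import mmap
import resource

PAGE_SIZE = resource.getpagesize()


def can_map(pages, page_size=PAGE_SIZE):
    """Try to memory-map `pages` pages of anonymous memory; 0 disables the test."""
    if pages == 0:
        return True
    try:
        block = mmap.mmap(-1, pages * page_size)  # anonymous mapping
    except (OSError, ValueError, OverflowError):
        return False
    block.close()  # release the mapping straight away
    return True


if __name__ == "__main__":
    print(can_map(256))  # 256 pages = 1 MiB with 4 kB pages
```
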
       max-swap = <maxpage>
              Set the maximum amount of swap use. Note that this is in memory
              pages (4kB on x86). Default value is 0 pages, which means this
              test is disabled. Often this should be a large portion of the
              available swap, but remember that paging 1GB of swap can take
              several to tens of seconds. This is a 'passive' test and works
              by reading /proc/meminfo.

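              The following Python sketch shows a passive swap test. It
              assumes that swap use is SwapTotal - SwapFree from
              /proc/meminfo (this page only says the test reads that file),
              and it counts use above max-swap as a failure. The helper names
              are hypothetical.

```python
import resource

PAGE_SIZE = resource.getpagesize()


def meminfo_kb(text):
    """Parse /proc/meminfo content into {field: value}, values in kB."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def swap_used_pages(text, page_size=PAGE_SIZE):
    """SwapTotal - SwapFree, converted from kB to pages."""
    info = meminfo_kb(text)
    return (info["SwapTotal"] - info["SwapFree"]) * 1024 // page_size


def max_swap_ok(text, max_pages, page_size=PAGE_SIZE):
    """max-swap = 0 disables the test."""
    return max_pages == 0 or swap_used_pages(text, page_size) <= max_pages


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        print(swap_used_pages(f.read()))
```
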
       watchdog-device = <device>
              Set the watchdog device name, typically /dev/watchdog. Default
              is to disable keep-alive support. This should be tested by
              running the daemon from the command line before configuring it
              to start automatically at boot.

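              The following minimal Python sketch exercises the device
              directly. It assumes the generic Linux watchdog interface, in
              which any write is a keep-alive and writing the character 'V'
              just before closing requests a "magic close". The helper
              keepalive is hypothetical and is not part of the daemon.
              Opening a real device arms its timer, so the demo writes to an
              ordinary temporary file instead.

```python
import os
import tempfile
import time


def keepalive(device_path, writes, interval, sleep=time.sleep, magic_close=True):
    """Open the device, write to it `writes` times, then close it."""
    # Opening a real watchdog device arms the timer. buffering=0 makes
    # every write reach the driver immediately.
    with open(device_path, "wb", buffering=0) as dev:
        for _ in range(writes):
            dev.write(b"\0")  # any write counts as a keep-alive
            sleep(interval)
        if magic_close:
            # 'V' asks drivers that support "magic close" to disarm the
            # timer on close; with nowayout set, the timer keeps running.
            dev.write(b"V")


if __name__ == "__main__":
    # Dry run against an ordinary file instead of /dev/watchdog.
    with tempfile.TemporaryDirectory() as d:
        fake = os.path.join(d, "fake-watchdog")
        keepalive(fake, writes=3, interval=0)
        with open(fake, "rb") as f:
            print(f.read())
```
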
       watchdog-refresh-use-settimeout = <auto|yes|no>
              Refresh the watchdog timer by setting its timeout instead of
              using a normal watchdog refresh operation. This might help if
              your watchdog trips by itself when the first timeout interval
              elapses. Default is 'auto', which applies this fix-up to IT87
              devices; it can be disabled with 'no' or forced for other
              modules with 'yes'.

       watchdog-refresh-ignore-errors = <yes|no>
              Ignore errors reported by writing to the watchdog device.
              Typically this is used for systems that have broken
              implementations of the IPMI driver, to avoid a reboot loop.

       watchdog-timeout = <timeout>
              Set the watchdog device timeout during startup. If not set, a
              default is used that should be set to the kernel timer margin
              at compile time.

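              The three refresh and timeout options above can be sketched in
              Python against the Linux watchdog ioctl interface.
              WDIOC_SETTIMEOUT sets a timeout, as a startup timeout would, and
              re-issuing it is the 'settimeout' style of refresh. The sketch
              uses WDIOC_KEEPALIVE as the normal refresh, although the daemon
              may use a plain write instead. The constant values are computed
              for the generic ioctl encoding. The helper names are
              hypothetical. Applying ignore-errors to ioctl failures as well
              as to writes is an assumption.

```python
import fcntl
import struct
import tempfile

# Generic Linux ioctl encoding (asm-generic/ioctl.h, used on x86 and ARM);
# some architectures such as PowerPC and MIPS encode these differently.
_IOC_WRITE, _IOC_READ = 1, 2
INT_SIZE = struct.calcsize("i")


def _ioc(direction, type_char, nr, size):
    return (direction << 30) | (size << 16) | (ord(type_char) << 8) | nr


WDIOC_KEEPALIVE = _ioc(_IOC_READ, "W", 5, INT_SIZE)
WDIOC_SETTIMEOUT = _ioc(_IOC_READ | _IOC_WRITE, "W", 6, INT_SIZE)


def set_timeout(fd, seconds):
    """Set the device timeout; the driver writes back the value it accepted."""
    reply = fcntl.ioctl(fd, WDIOC_SETTIMEOUT, struct.pack("i", seconds))
    return struct.unpack("i", reply)[0]


def refresh(fd, timeout, use_settimeout=False, ignore_errors=False):
    """Refresh the timer; return False if an error was ignored."""
    try:
        if use_settimeout:
            set_timeout(fd, timeout)  # re-arm by setting the timeout again
        else:
            fcntl.ioctl(fd, WDIOC_KEEPALIVE, struct.pack("i", 0))
        return True
    except OSError:
        if not ignore_errors:
            raise
        return False


if __name__ == "__main__":
    # A regular file rejects watchdog ioctls, which exercises the error path.
    with tempfile.TemporaryFile() as f:
        print(refresh(f.fileno(), 60, ignore_errors=True))
```
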
       temperature-sensor = <temp-virtual-file>
              Set the temperature sensor name. This is normally a 'virtual
              file' under /sys that contains the temperature in
              milli-Celsius. Usually these are generated by the sensors
              package, but take care, as device enumeration may not be fixed.
              Default is to disable temperature checking. Multiple sensors
              can be used by repeating the temperature-sensor entry. Due to
              the enumeration problem, any missing temperature sensor is
              simply ignored and not treated as a reboot trigger.

       max-temperature = <temp>
              Set the maximum allowed temperature in Celsius. Once this
              temperature is reached, the system is stopped. Default value is
              90 C. Watchdog issues warnings once the temperature rises past
              90%, 95% and 98% of this value.

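              The following Python sketch combines the two options above. It
              reads sensor files that contain an integer in milli-Celsius,
              skips missing ones, and classifies each reading against
              max-temperature and its 90%, 95% and 98% warning levels. The
              status labels and function names are hypothetical. Treating a
              threshold as crossed once a reading is at or above it is an
              assumption.

```python
import os
import tempfile

WARNING_PERCENTS = (98, 95, 90)  # checked from the highest threshold down


def read_sensors(paths):
    """Return readings in milli-Celsius; missing sensor files are skipped."""
    readings = []
    for path in paths:
        try:
            with open(path) as f:
                readings.append(int(f.read().strip()))
        except FileNotFoundError:
            continue  # enumeration may have changed: not a reboot trigger
    return readings


def temperature_status(milli_c, max_temp_c=90):
    """'stop' at max-temperature, 'warn-NN' at NN percent of it, else 'ok'."""
    limit = max_temp_c * 1000  # compare in milli-Celsius with integer math
    if milli_c >= limit:
        return "stop"
    for pct in WARNING_PERCENTS:
        if milli_c * 100 >= limit * pct:
            return f"warn-{pct}"
    return "ok"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        sensor = os.path.join(d, "temp1_input")
        with open(sensor, "w") as f:
            f.write("86000\n")
        for reading in read_sensors([sensor, os.path.join(d, "missing")]):
            print(reading, temperature_status(reading))
```
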
       temp-power-off = <yes|no>
              Set the watchdog action on overheating. 'yes' (the default)
              powers the machine off; 'no' halts the machine and allows a
              Ctrl-Alt-Del reboot.

       file = <filename>
              Set the file name for file mode. This option can be given as
              often as you like to check several files.

       change = <mtime>
              Set the change interval time for file mode. This option always
              belongs to the active file name: when watchdog finds a
              'change =' line, it assumes it belongs to the most recently
              read 'file =' line. The two lines do not have to follow each
              other directly, but you cannot specify a 'change =' before a
              'file ='. The default is to only stat the file and not look for
              changes. Using this feature to monitor changes in
              /var/log/messages might require some special syslog daemon
              configuration; e.g. rsyslog needs
              "$ActionWriteAllMarkMessages on" to be set to make sure the
              marks are written no matter what.

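              The following Python sketch shows the pairing rule and the file
              check described above. It assumes that <mtime> is a number of
              seconds and that a file is healthy when its modification time
              is no older than that. The function names are hypothetical, and
              the (name, value) entries stand for configuration lines in file
              order.

```python
import os
import tempfile
import time


def pair_files(entries):
    """Attach each 'change' value to the most recently read 'file' entry."""
    files = []  # [path, change_seconds or None]
    for name, value in entries:
        if name == "file":
            files.append([value, None])
        elif name == "change":
            if not files:
                raise ValueError("'change =' given before any 'file ='")
            files[-1][1] = int(value)
    return [tuple(item) for item in files]


def file_ok(path, change=None, now=None):
    """The file must be stat-able; with a change interval, recently modified."""
    try:
        mtime = os.stat(path).st_mtime
    except OSError:
        return False
    if change is None:
        return True  # default: only stat the file
    now = time.time() if now is None else now
    return now - mtime <= change


if __name__ == "__main__":
    entries = [("file", "/var/log/messages"), ("interval", "10"),
               ("change", "1200")]
    print(pair_files(entries))
    with tempfile.NamedTemporaryFile() as f:
        print(file_ok(f.name, change=60))  # just created, so recent
```
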
       pidfile = <pidfilename>
              Set the pidfile name for daemon test mode. This option can be
              given as often as you like to check several daemons, assuming
              they write their post-forking PID to the specified files. See
              the Systemd section in watchdog(8) for more information.

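              The following Python sketch shows a daemon-mode check. It reads
              the PID from the file and probes the process with signal 0,
              which is the standard POSIX existence check and delivers
              nothing. That the daemon does something equivalent is an
              assumption, and the function name is hypothetical.

```python
import os
import tempfile


def daemon_alive(pidfile):
    """Read the PID from `pidfile` and check that the process exists."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().split()[0])
    except (OSError, ValueError, IndexError):
        return False  # missing, empty or garbled pidfile
    if pid <= 0:
        return False  # 0 or negative would address process groups
    try:
        os.kill(pid, 0)  # signal 0: existence check, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w") as f:
        f.write(f"{os.getpid()}\n")
        f.flush()
        print(daemon_alive(f.name))
```
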
       ping = <ip-addr>
              Set the IPv4 address for ping mode. This option can be used
              more than once to check different connections.

       ping-count = <ping-per-interval>
              Set the number of ping attempts in each 'interval' of time.
              Default is 3; the check completes on the first successful ping.

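              The following Python sketch shows the ping-count rule: up to
              ping-count attempts per interval, stopping at the first reply.
              The daemon sends its own ICMP packets. For simplicity, this
              sketch swaps in the system ping command (iputils-style -c and
              -W flags) as a replaceable callable. The function names are
              hypothetical.

```python
import subprocess


def system_ping(address):
    """One ICMP echo via the system ping command; False if it fails."""
    try:
        result = subprocess.run(["ping", "-c", "1", "-W", "1", address],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except FileNotFoundError:
        return False  # no ping binary installed
    return result.returncode == 0


def ping_ok(address, count=3, send_ping=system_ping):
    """Try up to `count` pings; stop at the first success."""
    for _ in range(count):
        if send_ping(address):
            return True
    return False


if __name__ == "__main__":
    print(ping_ok("127.0.0.1"))
```
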
       interface = <if-name>
              Set the interface name for network mode. This option can be
              used more than once to check different interfaces. Note that
              only physical interfaces can be checked, not aliased IP
              interfaces.

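              This page does not say how network mode decides that an
              interface is healthy. The following Python sketch assumes one
              plausible rule: the receive byte counter in /proc/net/dev must
              change between two checks. The function names are hypothetical.

```python
import time


def rx_bytes(proc_net_dev):
    """Map interface name to received bytes from /proc/net/dev content."""
    counters = {}
    for line in proc_net_dev.splitlines()[2:]:  # skip the two header lines
        name, sep, data = line.partition(":")
        fields = data.split()
        if sep and fields:
            counters[name.strip()] = int(fields[0])
    return counters


def interface_active(before, after, ifname):
    """Assumed rule: the interface exists and its RX counter moved."""
    return ifname in after and after[ifname] != before.get(ifname)


if __name__ == "__main__":
    with open("/proc/net/dev") as f:
        first = rx_bytes(f.read())
    time.sleep(1)
    with open("/proc/net/dev") as f:
        second = rx_bytes(f.read())
    print({name: interface_active(first, second, name) for name in second})
```
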
       test-binary = <testbin>
              Execute the given binary to do some user-defined tests. With an
              enforcing SELinux policy, please use
              /usr/libexec/watchdog/scripts/ for your test-binary
              configuration.

       test-timeout = <timeout in seconds>
              User-defined tests may only run for <timeout> seconds. Set to 0
              for unlimited.

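              The following Python sketch shows a user test under
              test-timeout. An exit code of 0 means healthy, and 0 as the
              timeout means unlimited. Reporting a timed-out test as None
              (a failure) is an assumption. The demo uses a Python one-liner
              in place of a real test binary. The function name is
              hypothetical.

```python
import subprocess
import sys


def run_test_binary(argv, test_timeout=0):
    """Return the exit code (0 = healthy), or None if the timeout expired."""
    timeout = None if test_timeout == 0 else test_timeout  # 0 = unlimited
    try:
        return subprocess.run(argv, timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        return None  # subprocess.run kills the child before raising


if __name__ == "__main__":
    print(run_test_binary([sys.executable, "-c", "raise SystemExit(0)"], 5))
```
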
       repair-binary = <repbin>
              Execute the given binary in case of a problem instead of
              shutting down the system. With an enforcing SELinux policy,
              please use /usr/libexec/watchdog/scripts/ for your
              repair-binary configuration.

       repair-timeout = <timeout in seconds>
              The repair command may only run for <timeout> seconds. Set to 0
              for 'unlimited', but note that the hardware timer is not
              refreshed in this case, so the system will hard-reset at some
              point.

       retry-timeout = <timeout in seconds>
              Allow most error conditions to persist for <timeout> seconds.
              Set to 0 for immediate action (like softboot behaviour).

       repair-maximum = <count>
              This allows no more than <count> repair attempts against a
              given fault that report success (i.e. return 0) but fail to
              clear the fault, before a reboot is initiated anyway. If set to
              zero, a repairable fault can always be blocked by a repair
              program reporting success (previous daemon behaviour).

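              The following Python sketch shows one way retry-timeout,
              repair-binary and repair-maximum might combine for a single
              fault. The class, its state names and the exact order of the
              checks are assumptions. run_repair stands in for running
              repair-binary (bounded by repair-timeout). Setting
              softboot-option would correspond to retry_timeout = 0.

```python
class FaultHandler:
    """Decide what happens to one monitored fault on each check cycle."""

    def __init__(self, retry_timeout, repair_maximum):
        self.retry_timeout = retry_timeout    # 0 = act at once (softboot)
        self.repair_maximum = repair_maximum  # 0 = no limit
        self.first_seen = None  # time the current fault was first observed
        self.repairs = 0        # repairs reporting success since it appeared

    def check(self, failed, now, run_repair):
        """Return 'ok', 'waiting', 'repaired' or 'reboot'."""
        if not failed:
            self.first_seen, self.repairs = None, 0
            return "ok"
        if self.first_seen is None:
            self.first_seen = now
        if now - self.first_seen < self.retry_timeout:
            return "waiting"  # fault may persist for retry-timeout seconds
        if self.repair_maximum and self.repairs >= self.repair_maximum:
            return "reboot"   # too many "successful" repairs that did not help
        if run_repair() == 0:
            self.repairs += 1
            return "repaired"
        return "reboot"       # the repair program reported failure


if __name__ == "__main__":
    handler = FaultHandler(retry_timeout=0, repair_maximum=2)
    print([handler.check(True, t, lambda: 0) for t in range(3)])
```
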
       softboot-option = <yes|no>
              This acts like the -b / --softboot command-line option and
              simply sets the retry timeout to zero.

       admin = <mail-address>
              Email address to send admin mail to, i.e. who shall be notified
              that the machine is being halted or rebooted. Default is
              'root'. If you want to disable notification via email, just set
              admin to an empty string.

       realtime = <yes|no>
              If set to yes, watchdog will lock itself into memory so it is
              never swapped out.

       priority = <schedule priority>
              Set the schedule priority for realtime mode passed to
              sched_setscheduler().

       test-directory = <test directory>
              Set the directory to run user test/repair scripts from. Default
              is '/etc/watchdog.d'. The /etc/watchdog.d/ directory is
              recognized by SELinux policy. See the Test Directory section in
              watchdog(8) for more information.

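              The following Python sketch scans a test directory and runs
              every executable in it with the argument 'test', recording
              each exit code. It assumes that this matches the convention in
              the Test Directory section of watchdog(8) and that a non-zero
              code marks a failure. The follow-up 'repair' call is not shown,
              and the function name is hypothetical.

```python
import os
import subprocess
import tempfile


def run_directory_tests(directory):
    """Run each executable file in `directory` with the argument 'test'."""
    results = {}
    for name in sorted(os.listdir(directory)):  # sorted only for repeatability
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.access(path, os.X_OK):
            results[name] = subprocess.run([path, "test"]).returncode
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name, code in (("ok.sh", 0), ("bad.sh", 3)):
            script = os.path.join(d, name)
            with open(script, "w") as f:
                f.write(f"#!/bin/sh\nexit {code}\n")
            os.chmod(script, 0o755)
        print(run_directory_tests(d))
```
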
       log-dir = <log directory>
              Set the log directory to capture the standard output and
              standard error from repair-binary and test-binary execution.
              Default is '/var/log/watchdog'.

       sigterm-delay = <time in seconds>
              Set the time on shutdown between first sending SIGTERM to all
              processes and then sending SIGKILL. Default is 5 seconds, which
              is generally enough, but systems with large databases or
              virtual machines might need longer.

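              The following Python sketch shows the SIGTERM-then-SIGKILL
              sequence. At shutdown the daemon signals every process on the
              system. To stay harmless, this sketch only handles child
              processes it started itself through subprocess.Popen. The
              function name terminate_then_kill is hypothetical.

```python
import signal
import subprocess
import sys
import time


def terminate_then_kill(procs, sigterm_delay=5.0, poll=0.05):
    """SIGTERM every process, wait up to sigterm_delay, then SIGKILL the rest."""
    for p in procs:
        p.terminate()  # SIGTERM
    deadline = time.monotonic() + sigterm_delay
    while time.monotonic() < deadline and any(p.poll() is None for p in procs):
        time.sleep(poll)
    killed = []
    for p in procs:
        if p.poll() is None:
            p.kill()   # SIGKILL cannot be caught or ignored
            p.wait()
            killed.append(p.pid)
    return killed


if __name__ == "__main__":
    stubborn = subprocess.Popen([sys.executable, "-c",
        "import signal, time; signal.signal(signal.SIGTERM, signal.SIG_IGN); time.sleep(60)"])
    time.sleep(0.5)  # give the child time to install its handler
    terminate_then_kill([stubborn], sigterm_delay=1.0)
    print(stubborn.returncode == -signal.SIGKILL)
```
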
       verbose = <level>
              This overrides the command-line --verbose option. Generally the
              verbose mode is only enabled for debugging, as it creates a lot
              of syslog chatter, so use this option with consideration. Zero
              is "normal" operation (quiet), while 1 is typically used for
              debugging. Values of 2 or more usually generate far too many
              messages.

       heartbeat-file = <filename>
              For debugging, this allows a rolling set of status values to be
              kept on disk.

       heartbeat-stamps = <interval>
              For debugging, this sets the number of entries in the
              <heartbeat-file>.

       log-killed-pids = <yes|no>
              This acts like enabling 'verbose' logging, but only for a system
              reboot, where it enables the logging of the PID values for all
              processes that are being killed. The results are written to the
              killall5.log file in the log directory (if at all possible) in
              this case. It is intended for debugging cases where you would
              like to know what was running at the point the machine
              triggered the watchdog, but don't want syslog filling up with
              the usual chatter of activity.

FILES

       /etc/watchdog.conf
              The watchdog configuration file.

       /etc/watchdog.d
              A directory containing test-or-repair commands. See the Test
              Directory section in watchdog(8) for more information.
