COLLECTD-THRESHOLD(5)              collectd              COLLECTD-THRESHOLD(5)

NAME

       collectd-threshold - Documentation of collectd's Threshold plugin

SYNOPSIS

        LoadPlugin "threshold"
        <Plugin "threshold">
          <Type "foo">
            WarningMin    0.00
            WarningMax 1000.00
            FailureMin    0.00
            FailureMax 1200.00
            Invert false
            Instance "bar"
          </Type>
        </Plugin>

DESCRIPTION

       Starting with version 4.3.0 collectd has support for monitoring. By
       that we mean that the values are not only stored or sent somewhere, but
       that they are judged and, if a problem is recognized, acted upon. The
       only action the Threshold plugin takes itself is to generate and
       dispatch a notification. Other plugins can register to receive
       notifications and perform appropriate further actions.

       Since systems and what you expect them to do differ a lot, you can
       configure thresholds for your values freely. This gives you a lot of
       flexibility but also a lot of responsibility.

       Every time a value is out of range, a notification is dispatched. This
       means that the idle percentage of your CPU needs to be less than the
       configured threshold only once for a notification to be generated.
       There's no such thing as a moving average or similar - at least not
       now.

       Also, all values that match a threshold are considered to be relevant
       or "interesting". As a consequence collectd will issue a notification
       if they are not received for Timeout iterations. The Timeout
       configuration option is explained in section "GLOBAL OPTIONS" in
       collectd.conf(5). If, for example, Timeout is set to "2" (the default)
       and some host sends its CPU statistics to the server every 60 seconds,
       a notification will be dispatched after about 120 seconds. It may take
       a little longer because the timeout is checked only once each Interval
       on the server.

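       As a sketch of the timeout example, the global settings involved
       might look as follows. The values are assumptions chosen to match the
       60-second scenario; collectd.conf(5) remains the authoritative
       description of both options.

        # Global options in collectd.conf (illustrative values)
        Interval 60
        Timeout   2

       With these settings a value list that normally arrives every 60
       seconds would be considered missing after roughly 2 x 60 = 120
       seconds.
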
       When a value comes within range again or is received after it was
       missing, an "OKAY-notification" is dispatched.

CONFIGURATION

       Here is a configuration example to get you started. Read below for more
       information.

        LoadPlugin "threshold"
        <Plugin "threshold">
          <Type "foo">
            WarningMin    0.00
            WarningMax 1000.00
            FailureMin    0.00
            FailureMax 1200.00
            Invert false
            Instance "bar"
          </Type>

          <Plugin "interface">
            Instance "eth0"
            <Type "if_octets">
              FailureMax 10000000
              DataSource "rx"
            </Type>
          </Plugin>

          <Host "hostname">
            <Type "cpu">
              Instance "idle"
              FailureMin 10
            </Type>

            <Plugin "memory">
              <Type "memory">
                Instance "cached"
                WarningMin 100000000
              </Type>
            </Plugin>

            <Type "load">
              DataSource "midterm"
              FailureMax 4
              Hits 3
              Hysteresis 3
            </Type>
          </Host>
        </Plugin>

       There are basically two types of configuration statements: The "Host",
       "Plugin", and "Type" blocks select the value for which a threshold
       should be configured. The "Plugin" and "Type" blocks may be specified
       further using the "Instance" option. You can combine the blocks by
       nesting them, though they must be nested in the above order, i.e.
       "Host" may contain both "Plugin" and "Type" blocks, "Plugin" may only
       contain "Type" blocks and "Type" may not contain other blocks. If
       multiple blocks apply to the same value the most specific block is
       used.

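       As an illustration of the nesting and of the "most specific block"
       rule, consider the following sketch. The host name "db1.example.org"
       and the limits are made up for this example. Going by the rule above,
       the block inside "Host" should be the one used for the idle CPU value
       of that host, while all other hosts fall back to the top-level "Type"
       block.

        <Plugin "threshold">
          # Applies to the idle CPU value of every host
          <Type "cpu">
            Instance "idle"
            WarningMin 20
          </Type>

          # More specific: only for the hypothetical host "db1.example.org"
          <Host "db1.example.org">
            <Type "cpu">
              Instance "idle"
              WarningMin 40
            </Type>
          </Host>
        </Plugin>
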
       The other statements specify the threshold to configure. They must be
       included in a "Type" block. Currently the following statements are
       recognized:

       FailureMax Value
       WarningMax Value
           Sets the upper bound of acceptable values. If unset defaults to
           positive infinity. If a value is greater than FailureMax a FAILURE
           notification will be created. If the value is greater than
           WarningMax but less than (or equal to) FailureMax a WARNING
           notification will be created.

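           As a sketch of the upper bounds, reusing the placeholder type
           "foo" from the synopsis with assumed limits:

            <Plugin "threshold">
              <Type "foo">
                WarningMax 1000
                FailureMax 1200
              </Type>
            </Plugin>

           With these limits a value of 900 is acceptable, a value of 1100
           creates a WARNING notification and a value of 1300 creates a
           FAILURE notification.
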
       FailureMin Value
       WarningMin Value
           Sets the lower bound of acceptable values. If unset defaults to
           negative infinity. If a value is less than FailureMin a FAILURE
           notification will be created. If the value is less than WarningMin
           but greater than (or equal to) FailureMin a WARNING notification
           will be created.

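           The lower bounds can be sketched the same way. The limits below
           are illustrative and assume that the idle CPU value is reported
           as a percentage:

            <Plugin "threshold">
              <Type "cpu">
                Instance "idle"
                WarningMin 20
                FailureMin 10
              </Type>
            </Plugin>

           Here an idle value of 25 is acceptable, 15 creates a WARNING
           notification and 5 creates a FAILURE notification.
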
       DataSource DSName
           Some data sets have more than one "data source". Interesting
           examples are the "if_octets" data set, which has received ("rx")
           and sent ("tx") bytes and the "disk_ops" data set, which holds
           "read" and "write" operations. The system load data set, "load",
           even has three data sources: "shortterm", "midterm", and
           "longterm".

           Normally, all data sources are checked against a configured
           threshold. If this is undesirable, or if you want to specify
           different limits for each data source, you can use the DataSource
           option to have a threshold apply only to one data source.

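           A sketch of different limits per data source might use one "Type"
           block for each data source. The limits are made up, and the use
           of two blocks for the same type is an assumption based on the
           description above:

            <Plugin "threshold">
              <Plugin "interface">
                Instance "eth0"
                # Limit for received bytes only
                <Type "if_octets">
                  DataSource "rx"
                  FailureMax 10000000
                </Type>
                # Separate, lower limit for sent bytes
                <Type "if_octets">
                  DataSource "tx"
                  FailureMax 5000000
                </Type>
              </Plugin>
            </Plugin>
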
       Invert true|false
           If set to true the range of acceptable values is inverted, i.e.
           values between FailureMin and FailureMax (WarningMin and
           WarningMax) are not okay. Defaults to false.

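           For example, with the placeholder type "foo" and assumed limits:

            <Plugin "threshold">
              <Type "foo">
                FailureMin 10
                FailureMax 20
                Invert true
              </Type>
            </Plugin>

           Going by the description above, values between 10 and 20 would
           be treated as failures here, while values outside that range
           would be acceptable.
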
       Persist true|false
           Sets how often notifications are generated. If set to true one
           notification will be generated for each value that is out of the
           acceptable range. If set to false (the default) then a notification
           is only generated if a value is out of range but the previous value
           was okay.

           This applies to missing values, too: If set to true a notification
           about a missing value is generated once every Interval seconds. If
           set to false only one such notification is generated until the
           value appears again.

       PersistOK true|false
           Sets how OKAY notifications act. If set to true one notification
           will be generated for each value that is in the acceptable range.
           If set to false (the default) then a notification is only generated
           if a value is in range but the previous value was not.

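           A sketch combining Persist and PersistOK, using the placeholder
           type "foo" and an assumed limit:

            <Plugin "threshold">
              <Type "foo">
                WarningMax 1000
                Persist true      # notify for every out-of-range value
                PersistOK false   # one OKAY notification on recovery
              </Type>
            </Plugin>

           With this configuration every value above 1000 should produce a
           WARNING notification, while only the first value back in range
           produces an OKAY notification.
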
       Percentage true|false
           If set to true, the minimum and maximum values given are
           interpreted as percentages, relative to the other data sources.
           This is helpful for example for the "df" type, where you may want
           to issue a warning when less than 5 % of the total space is
           available. Defaults to false.

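           The "df" case mentioned above might be sketched as follows. This
           assumes that the "df" type has the data sources "used" and
           "free"; check the type definitions of your installation before
           relying on these names:

            <Plugin "threshold">
              <Type "df">
                DataSource "free"
                WarningMin 5
                Percentage true
              </Type>
            </Plugin>

           Under that assumption a WARNING notification would be created
           when "free" drops below 5 % of the sum of all data sources.
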
       Hits Value
           Sets the number of times the threshold must be exceeded before a
           notification is dispatched or, in other words, the number of
           Intervals for which the threshold must match before a notification
           is dispatched.

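           As a sketch based on the "load" block of the configuration
           example:

            <Plugin "threshold">
              <Type "load">
                DataSource "midterm"
                FailureMax 4
                Hits 3
              </Type>
            </Plugin>

           Going by the description above, a single spike of the midterm
           load above 4 would not be reported; a notification would only be
           dispatched once the threshold has matched for roughly three
           intervals.
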
       Hysteresis Value
           Sets the hysteresis value for the threshold. Hysteresis is a
           method to prevent flapping between states: once a threshold has
           matched, the failure (or warning) state is kept until a newly
           received value drops below the threshold condition (WarningMax,
           FailureMin or any other) minus the hysteresis value.

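           A sketch with assumed values for the midterm load:

            <Plugin "threshold">
              <Type "load">
                DataSource "midterm"
                FailureMax 4
                Hysteresis 1
              </Type>
            </Plugin>

           Going by the description above, once the midterm load has
           exceeded 4 the failure state would be kept until a value drops
           below 4 - 1 = 3, so a load hovering around 4 does not cause a
           notification on every small change.
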
       Interesting true|false
           If set to true (the default), a notification with severity
           "FAILURE" will be created when a matching value list is no longer
           updated and purged from the internal cache. When this happens
           depends on the interval of the value list and the global Timeout
           setting. See the Interval and Timeout settings in collectd.conf(5)
           for details. If set to false, this event will be ignored.

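           For a value that is expected to disappear from time to time,
           the missing-value notification might be switched off as in the
           following sketch, which reuses the placeholder type "foo" and
           instance "bar" from the synopsis:

            <Plugin "threshold">
              <Type "foo">
                Instance "bar"
                WarningMax 1000
                Interesting false   # no FAILURE when the value goes missing
              </Type>
            </Plugin>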

SEE ALSO

       collectd(1), collectd.conf(5)

AUTHOR

       Florian Forster <octo at collectd.org>

5.9.1.599.gbb5cfbc                2020-03-08             COLLECTD-THRESHOLD(5)