COLLECTD-THRESHOLD(5)              collectd              COLLECTD-THRESHOLD(5)

NAME
collectd-threshold - Documentation of collectd's Threshold plugin

SYNOPSIS
    LoadPlugin "threshold"
    <Plugin "threshold">
      <Type "foo">
        WarningMin 0.00
        WarningMax 1000.00
        FailureMin 0.00
        FailureMax 1200.00
        Invert false
        Instance "bar"
      </Type>
    </Plugin>

DESCRIPTION
Starting with version 4.3.0 collectd has support for monitoring. By
that we mean that the values are not only stored or sent somewhere, but
that they are judged and, if a problem is recognized, acted upon. The
only action the Threshold plugin takes itself is to generate and
dispatch a notification. Other plugins can register to receive
notifications and perform appropriate further actions.

Since systems and what you expect them to do differ a lot, you can
configure thresholds for your values freely. This gives you a lot of
flexibility but also a lot of responsibility.

As soon as a value is out of range, a notification is dispatched. This
means that the idle percentage of your CPU needs to be less than the
configured threshold only once for a notification to be generated.
There is no moving average or similar smoothing, at least not yet; the
Hits and Hysteresis options described below can dampen short spikes
and flapping.

Also, all values that match a threshold are considered to be relevant
or "interesting". As a consequence, collectd will issue a notification
if they are not received for Timeout iterations. The Timeout
configuration option is explained in section "GLOBAL OPTIONS" in
collectd.conf(5). If, for example, Timeout is set to "2" (the default)
and a host sends its CPU statistics to the server every 60 seconds,
a notification will be dispatched after about 120 seconds. It may take
a little longer because the timeout is checked only once each Interval
on the server.

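To make the arithmetic above concrete, here is a minimal sketch of the
two global options involved, using the values from the example in the
text. It is an illustration only, not a recommended setting.

```
# Global section of collectd.conf, outside of any <Plugin> block.

# Values are collected and sent every 60 seconds.
Interval 60

# A value is considered missing after 2 iterations.
Timeout 2

# 2 iterations x 60 seconds = about 120 seconds until the notification,
# plus up to one Interval on the server before the timeout is checked.
```
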
When a value comes within range again or is received after it was
missing, an "OKAY-notification" is dispatched.

THRESHOLDS
Here is a configuration example to get you started. Read below for more
information.

    LoadPlugin "threshold"
    <Plugin "threshold">
      <Type "foo">
        WarningMin 0.00
        WarningMax 1000.00
        FailureMin 0.00
        FailureMax 1200.00
        Invert false
        Instance "bar"
      </Type>

      <Plugin "interface">
        Instance "eth0"
        <Type "if_octets">
          FailureMax 10000000
          DataSource "rx"
        </Type>
      </Plugin>

      <Host "hostname">
        <Type "cpu">
          Instance "idle"
          FailureMin 10
        </Type>

        <Plugin "memory">
          <Type "memory">
            Instance "cached"
            WarningMin 100000000
          </Type>
        </Plugin>

        <Type "load">
          DataSource "midterm"
          FailureMax 4
          Hits 3
          Hysteresis 3
        </Type>
      </Host>
    </Plugin>

There are basically two types of configuration statements. The "Host",
"Plugin", and "Type" blocks select the value for which a threshold
should be configured. The "Plugin" and "Type" blocks may be specified
further using the "Instance" option. You can combine the blocks by
nesting them, though they must be nested in the above order: "Host" may
contain both "Plugin" and "Type" blocks, "Plugin" may only contain
"Type" blocks, and "Type" may not contain other blocks. If multiple
blocks apply to the same value, the most specific block is used.

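As a sketch of the "most specific block" rule, the fragment below
defines a general threshold for all hosts and a stricter one for a
single host. The host name "db1.example.com" and the limits are
hypothetical and only serve the illustration.

```
<Plugin "threshold">
  # General rule: applies to the idle CPU value of every host.
  <Type "cpu">
    Instance "idle"
    FailureMin 10
  </Type>

  # More specific rule: for this one (hypothetical) host, this block
  # is used instead of the general one above.
  <Host "db1.example.com">
    <Type "cpu">
      Instance "idle"
      FailureMin 25
    </Type>
  </Host>
</Plugin>
```
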
The other statements specify the threshold to configure. They must be
included in a "Type" block. Currently the following statements are
recognized:

FailureMax Value
WarningMax Value
    Sets the upper bound of acceptable values. If unset, defaults to
    positive infinity. If a value is greater than FailureMax, a FAILURE
    notification will be created. If the value is greater than
    WarningMax but less than (or equal to) FailureMax, a WARNING
    notification will be created.

FailureMin Value
WarningMin Value
    Sets the lower bound of acceptable values. If unset, defaults to
    negative infinity. If a value is less than FailureMin, a FAILURE
    notification will be created. If the value is less than WarningMin
    but greater than (or equal to) FailureMin, a WARNING notification
    will be created.

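    A minimal sketch of how the warning and failure bounds combine,
    using the "load" data set and limits chosen purely for
    illustration:

    ```
    <Plugin "threshold">
      <Type "load">
        DataSource "shortterm"
        # Values greater than 2 but not greater than 4: WARNING.
        WarningMax 2
        # Values greater than 4: FAILURE.
        FailureMax 4
      </Type>
    </Plugin>
    ```

    With these settings a short-term load of 3 would produce a WARNING
    and a load of 5 a FAILURE, following the rules described above.
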
DataSource DSName
    Some data sets have more than one "data source". Interesting
    examples are the "if_octets" data set, which has received ("rx")
    and sent ("tx") bytes and the "disk_ops" data set, which holds
    "read" and "write" operations. The system load data set, "load",
    even has three data sources: "shortterm", "midterm", and
    "longterm".

    Normally, all data sources are checked against a configured
    threshold. If this is undesirable, or if you want to specify
    different limits for each data source, you can use the DataSource
    option to have a threshold apply only to one data source.

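    The following sketch sets different limits for the two data
    sources of "if_octets". It assumes that two "Type" blocks for the
    same type, each restricted by its own DataSource, are both
    honored; the interface name and the limits are illustrative.

    ```
    <Plugin "threshold">
      <Plugin "interface">
        Instance "eth0"
        # Limit for received traffic only.
        <Type "if_octets">
          DataSource "rx"
          FailureMax 10000000
        </Type>
        # A separate, lower limit for sent traffic.
        <Type "if_octets">
          DataSource "tx"
          FailureMax 5000000
        </Type>
      </Plugin>
    </Plugin>
    ```
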
Invert true|false
    If set to true, the range of acceptable values is inverted, i.e.
    values between FailureMin and FailureMax (or between WarningMin
    and WarningMax, respectively) are not okay. Defaults to false.

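    As an illustration, the sketch below flags values inside a band
    instead of outside it. The type "foo" is the placeholder from the
    synopsis and the limits are made up.

    ```
    <Plugin "threshold">
      <Type "foo">
        # With Invert, values between 10 and 20 are NOT okay and
        # raise a FAILURE; values outside that band are acceptable.
        FailureMin 10
        FailureMax 20
        Invert true
      </Type>
    </Plugin>
    ```
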
Persist true|false
    Sets how often notifications are generated. If set to true, one
    notification will be generated for each value that is out of the
    acceptable range. If set to false (the default), a notification
    is only generated if a value is out of range but the previous value
    was okay.

    This applies to missing values, too: If set to true, a notification
    about a missing value is generated once every Interval seconds. If
    set to false, only one such notification is generated until the
    value appears again.

PersistOK true|false
    Sets how OKAY notifications act. If set to true, one notification
    will be generated for each value that is in the acceptable range.
    If set to false (the default), a notification is only generated
    if a value is in range but the previous value was not.

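    A short sketch combining both options, with an illustrative limit:
    problems are reported repeatedly, while the return to normal is
    reported only once.

    ```
    <Plugin "threshold">
      <Type "load">
        DataSource "midterm"
        FailureMax 4
        # One FAILURE notification for every out-of-range value.
        Persist true
        # Only one OKAY notification when the value is back in range
        # (this is the default, spelled out for clarity).
        PersistOK false
      </Type>
    </Plugin>
    ```
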
Percentage true|false
    If set to true, the minimum and maximum values given are
    interpreted as percentages, relative to the other data sources.
    This is helpful, for example, for the "df" type, where you may want
    to issue a warning when less than 5% of the total space is
    available. Defaults to false.

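    The "df" case mentioned above might be written as follows. This is
    a sketch that assumes the "df" data set has the data sources
    "used" and "free"; check your types.db(5) before relying on these
    names.

    ```
    <Plugin "threshold">
      <Type "df">
        # Assumption: "df" provides the data sources "used" and "free".
        DataSource "free"
        # Interpret the limit as a percentage of all data sources.
        Percentage true
        # Warn when less than 5% of the total space is free.
        WarningMin 5
      </Type>
    </Plugin>
    ```
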
Hits Value
    Sets the number of occurrences for which the threshold must be
    exceeded before any notification is dispatched or, in other words,
    the number of Intervals for which the threshold must match before
    any notification is dispatched.

Hysteresis Value
    Sets the hysteresis value for the threshold. Hysteresis is a method
    to prevent flapping between states: the failure (or warning) state
    is kept until a newly received value for a previously matched
    threshold falls below the threshold condition (WarningMax,
    FailureMin, or any other) minus the hysteresis value.

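    A minimal sketch of both options together, with illustrative
    numbers. The arithmetic in the comments follows the descriptions
    above and is not taken from collectd's source code.

    ```
    <Plugin "threshold">
      <Type "load">
        DataSource "midterm"
        FailureMax 4
        # The threshold must match 3 times before a notification
        # is dispatched.
        Hits 3
        # Once in the failure state, stay there until the value
        # falls below 4 - 1 = 3.
        Hysteresis 1
      </Type>
    </Plugin>
    ```
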
Interesting true|false
    If set to true (the default), a notification with severity
    "FAILURE" will be created when a matching value list is no longer
    updated and purged from the internal cache. When this happens
    depends on the interval of the value list and the global Timeout
    setting. See the Interval and Timeout settings in collectd.conf(5)
    for details. If set to false, this event will be ignored.

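    For a machine that is expected to disappear regularly, the
    missing-value notification can be switched off while the limit
    itself stays active. The host name below is hypothetical.

    ```
    <Plugin "threshold">
      <Host "laptop.example.com">
        <Type "load">
          DataSource "midterm"
          FailureMax 4
          # Do not raise a FAILURE when this value stops arriving.
          Interesting false
        </Type>
      </Host>
    </Plugin>
    ```
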
SEE ALSO
collectd(1), collectd.conf(5)

AUTHOR
Florian Forster <octo at collectd.org>

5.11.0.94.g41b1e33                2020-07-20                COLLECTD-THRESHOLD(5)