VOTEQUORUM(5)        Corosync Cluster Engine Programmer's Manual       VOTEQUORUM(5)


NAME
       votequorum - Votequorum Configuration Overview

DESCRIPTION
       The votequorum service is part of the corosync project. This service
       can be optionally loaded into the nodes of a corosync cluster to
       avoid split-brain situations. It does this by assigning a number of
       votes to each system in the cluster and ensuring that cluster
       operations are only allowed to proceed when a majority of the votes
       are present. The service must be loaded into all nodes or none. If
       it is loaded into a subset of cluster nodes the results will be
       unpredictable.

       The following corosync.conf extract will enable the votequorum
       service within corosync:

           quorum {
               provider: corosync_votequorum
           }

       votequorum reads its configuration from corosync.conf. Some values
       can be changed at runtime, others are only read at corosync startup.
       It is very important that those values are consistent across all the
       nodes participating in the cluster, or votequorum behavior will be
       unpredictable.

       votequorum requires an expected_votes value to function; this can be
       provided in two ways. The number of expected votes will be
       automatically calculated when the nodelist { } section is present in
       corosync.conf, or expected_votes can be specified in the quorum { }
       section. Lack of both will disable votequorum. If both are present
       at the same time, the quorum.expected_votes value will override the
       one calculated from the nodelist.

       Example (no nodelist) of an 8 node cluster (each node has 1 vote):

           quorum {
               provider: corosync_votequorum
               expected_votes: 8
           }

       Example (with nodelist) of a 3 node cluster (each node has 1 vote):

           quorum {
               provider: corosync_votequorum
           }

           nodelist {
               node {
                   ring0_addr: 192.168.1.1
               }
               node {
                   ring0_addr: 192.168.1.2
               }
               node {
                   ring0_addr: 192.168.1.3
               }
           }

SPECIAL FEATURES
       two_node: 1

       Enables two node cluster operations (default: 0).

       The "two node cluster" is a use case that requires special
       consideration. With a standard two node cluster, where each node has
       a single vote, there are 2 votes in the cluster. Using the simple
       majority calculation (50% of the votes + 1) to calculate quorum, the
       quorum would be 2. This means that both nodes would always have to
       be alive for the cluster to be quorate and operate.

       When two_node: 1 is enabled, quorum is artificially set to 1.

       Example configuration 1:

           quorum {
               provider: corosync_votequorum
               expected_votes: 2
               two_node: 1
           }

       Example configuration 2:

           quorum {
               provider: corosync_votequorum
               two_node: 1
           }

           nodelist {
               node {
                   ring0_addr: 192.168.1.1
               }
               node {
                   ring0_addr: 192.168.1.2
               }
           }

       NOTES: enabling two_node: 1 automatically enables wait_for_all. It
       is still possible to override wait_for_all by explicitly setting it
       to 0, as shown in the sketch below. If more than 2 nodes join the
       cluster, the two_node option is automatically disabled.
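
       For instance, a sketch of a two node configuration that explicitly
       keeps wait_for_all disabled (whether that is appropriate depends on
       the fencing setup) could look like:

           quorum {
               provider: corosync_votequorum
               expected_votes: 2
               two_node: 1
               wait_for_all: 0
           }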

       wait_for_all: 1

       Enables the Wait For All (WFA) feature (default: 0).

       The general behaviour of votequorum is to switch a cluster from
       inquorate to quorate as soon as possible. For example, in an 8 node
       cluster, where every node has 1 vote, expected_votes is set to 8 and
       quorum is (50% + 1) 5. As soon as 5 (or more) nodes are visible to
       each other, the partition of 5 (or more) becomes quorate and can
       start operating.

       When WFA is enabled, the cluster will be quorate for the first time
       only after all nodes have been visible at least once at the same
       time.

       This feature has the advantage of avoiding some startup race
       conditions, at the cost that all nodes need to be up at the same
       time at least once before the cluster can operate.

       A common startup race condition, based on the above example, is that
       as soon as 5 nodes become quorate, the remaining 3 nodes, which are
       still offline, will be fenced.

       It is very useful when combined with last_man_standing (see below).

       Example configuration:

           quorum {
               provider: corosync_votequorum
               expected_votes: 8
               wait_for_all: 1
           }

       last_man_standing: 1 / last_man_standing_window: 10000

       Enables the Last Man Standing (LMS) feature (default: 0). The
       tunable last_man_standing_window defaults to 10 seconds and is
       expressed in milliseconds.

       The general behaviour of votequorum is to set expected_votes and
       quorum at startup (unless modified by the user at runtime, see
       below) and use those values during the whole lifetime of the
       cluster.

       For example, in an 8 node cluster where each node has 1 vote,
       expected_votes is set to 8 and quorum to 5. This condition allows a
       total failure of 3 nodes. If a 4th node fails, the cluster becomes
       inquorate and it will stop providing services.

       Enabling LMS allows the cluster to dynamically recalculate
       expected_votes and quorum under specific circumstances. It is
       essential to enable WFA when using LMS in High Availability clusters
       (a combined example is shown below).

       Using the above 8 node cluster example, with LMS enabled the cluster
       can retain quorum and continue operating by losing, in a cascade
       fashion, up to 6 nodes with only 2 remaining active.

       Example chain of events:

           1) cluster is fully operational with 8 nodes.
              (expected_votes: 8 quorum: 5)

           2) 3 nodes die, cluster is quorate with 5 nodes.

           3) after the last_man_standing_window timer expires,
              expected_votes and quorum are recalculated.
              (expected_votes: 5 quorum: 3)

           4) at this point, 2 more nodes can die and the
              cluster will still be quorate with 3.

           5) once again, after the last_man_standing_window timer
              expires, expected_votes and quorum are recalculated.
              (expected_votes: 3 quorum: 2)

           6) at this point, 1 more node can die and the
              cluster will still be quorate with 2.

           7) after one more last_man_standing_window timer expires,
              expected_votes and quorum are recalculated again.
              (expected_votes: 2 quorum: 2)

       NOTES: In order for the cluster to downgrade automatically from a 2
       node to a 1 node cluster, the auto_tie_breaker feature must also be
       enabled (see below). If auto_tie_breaker is not enabled, and one
       more failure occurs, the remaining node will not be quorate. LMS
       does not work with asymmetric voting schemes: each node must have 1
       vote. LMS is also incompatible with quorum devices; if
       last_man_standing is specified in corosync.conf then the quorum
       device will be disabled.

       Example configuration 1:

           quorum {
               provider: corosync_votequorum
               expected_votes: 8
               last_man_standing: 1
           }

       Example configuration 2 (increase timeout to 20 seconds):

           quorum {
               provider: corosync_votequorum
               expected_votes: 8
               last_man_standing: 1
               last_man_standing_window: 20000
           }
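
       Since WFA is recommended together with LMS in High Availability
       clusters, a combined configuration, sketched using only the
       directives already shown above, might look like:

           quorum {
               provider: corosync_votequorum
               expected_votes: 8
               last_man_standing: 1
               wait_for_all: 1
           }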

       auto_tie_breaker: 1

       Enables the Auto Tie Breaker (ATB) feature (default: 0).

       The general behaviour of votequorum allows the simultaneous failure
       of up to 50% - 1 of the nodes, assuming each node has 1 vote.

       When ATB is enabled, the cluster can suffer up to 50% of the nodes
       failing at the same time, in a deterministic fashion. By default the
       cluster partition that is still in contact with the node that has
       the lowest nodeid will remain quorate. The other nodes will be
       inquorate. This behaviour can be changed by also specifying

           auto_tie_breaker_node: lowest|highest|<list of node IDs>

       'lowest' is the default. 'highest' is similar in that if the current
       set of nodes contains the highest nodeid then it will remain
       quorate. Alternatively it is possible to specify a particular node
       ID or list of node IDs that will be required to maintain quorum. If
       a (space-separated) list is given, the nodes are evaluated in order,
       so if the first node is present then it will be used to determine
       the quorate partition; if that node is not in either half (i.e. it
       was not in the cluster before the split) then the second node ID
       will be checked, and so on. ATB is incompatible with quorum devices;
       if auto_tie_breaker is specified in corosync.conf then the quorum
       device will be disabled.

       Example configuration 1:

           quorum {
               provider: corosync_votequorum
               expected_votes: 8
               auto_tie_breaker: 1
               auto_tie_breaker_node: lowest
           }

       Example configuration 2:

           quorum {
               provider: corosync_votequorum
               expected_votes: 8
               auto_tie_breaker: 1
               auto_tie_breaker_node: 1 3 5
           }

       allow_downscale: 1

       Enables the Allow Downscale (AD) feature (default: 0).

       THIS FEATURE IS INCOMPLETE AND CURRENTLY UNSUPPORTED.

       The general behaviour of votequorum is to never decrease expected
       votes or quorum.

       When AD is enabled, both expected votes and quorum are recalculated
       when a node leaves the cluster in a clean state (normal corosync
       shutdown process), down to the configured expected_votes.

       Example use case:

           1) N node cluster (where N is any value higher than 3)

           2) expected_votes set to 3 in corosync.conf

           3) only 3 nodes are running

           4) admin needs to increase processing power and adds 10 nodes

           5) internal expected_votes is automatically set to 13

           6) minimum expected_votes is 3 (from configuration)

           - up to this point this is standard votequorum behavior -

           7) once the work is done, admin wants to remove nodes from
              the cluster

           8) using an ordered shutdown the admin can reduce the cluster
              size automatically back to 3, but not below 3, where normal
              quorum operation will work as usual

       Example configuration:

           quorum {
               provider: corosync_votequorum
               expected_votes: 3
               allow_downscale: 1
           }

       allow_downscale implicitly enables EVT (see below).

       expected_votes_tracking: 1

       Enables the Expected Votes Tracking (EVT) feature (default: 0).

       Expected Votes Tracking stores the highest-seen value of expected
       votes on disk and uses that as the minimum value for expected votes
       in the absence of any higher authority (e.g. a currently quorate
       cluster). This is useful when a group of nodes becomes detached from
       the main cluster and after a restart could have enough votes to
       provide quorum, which can happen after using allow_downscale.

       Note that even if the in-memory version of expected_votes is
       reduced, e.g. by removing nodes or using corosync-quorumtool, the
       stored value will still be the highest value seen - it never gets
       reduced.

       The value is held in the file /var/lib/corosync/ev_tracking which
       can be deleted if you really do need to reduce the expected votes
       for any reason, such as when the node has been moved to a different
       cluster.
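
       A minimal configuration sketch that enables EVT, reusing only
       directives already shown on this page:

           quorum {
               provider: corosync_votequorum
               expected_votes: 3
               expected_votes_tracking: 1
           }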

VARIOUS NOTES
       * WFA / LMS / ATB / AD can be combined together.

       * In order to change the default votes for a node there are two
         options:

         1) nodelist:

             nodelist {
                 node {
                     ring0_addr: 192.168.1.1
                     quorum_votes: 3
                 }
                 ....
             }

         2) quorum section (deprecated):

             quorum {
                 provider: corosync_votequorum
                 expected_votes: 2
                 votes: 2
             }

         In the event that both nodelist and quorum { votes: } are defined,
         the value from the nodelist will be used.

       * Only votes, quorum_votes, expected_votes and two_node can be
         changed at runtime (see the example below). Everything else
         requires a cluster restart.
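
       As a sketch of such a runtime change (see corosync-quorumtool(8)
       for the authoritative set of options), expected votes and per node
       votes can typically be adjusted with:

           # raise the cluster-wide expected votes to 8
           corosync-quorumtool -e 8

           # give the node with nodeid 3 a total of 2 votes
           corosync-quorumtool -v 2 -n 3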

BUGS
       No known bugs at the time of writing. The authors are from outer
       space. Deal with it.

SEE ALSO
       corosync(8), corosync.conf(5), corosync-quorumtool(8),
       corosync-qdevice(8), votequorum_overview(8)

corosync Man Page                  2012-01-24                    VOTEQUORUM(5)