VOTEQUORUM(5)     Corosync Cluster Engine Programmer's Manual    VOTEQUORUM(5)

NAME

       votequorum - Votequorum Configuration Overview

OVERVIEW

       The votequorum service is part of the corosync project. This service
       can be optionally loaded into the nodes of a corosync cluster to avoid
       split-brain situations. It does this by assigning a number of votes to
       each system in the cluster and allowing cluster operations to proceed
       only when a majority of the votes is present. The service must be
       loaded into all nodes or none. If it is loaded into only a subset of
       the cluster nodes, the results will be unpredictable.

       The following corosync.conf extract will enable the votequorum service
       within corosync:

       quorum {
           provider: corosync_votequorum
       }

       votequorum reads its configuration from corosync.conf. Some values can
       be changed at runtime, while others are only read at corosync startup.
       It is very important that those values are consistent across all the
       nodes participating in the cluster, or votequorum behaviour will be
       unpredictable.

       votequorum requires an expected_votes value to function; this can be
       provided in two ways. The number of expected votes is calculated
       automatically when the nodelist { } section is present in
       corosync.conf, or expected_votes can be specified explicitly in the
       quorum { } section. If neither is present, votequorum is disabled. If
       both are present at the same time, the quorum.expected_votes value
       overrides the one calculated from the nodelist.

       Example (no nodelist) of an 8 node cluster (each node has 1 vote):

       quorum {
           provider: corosync_votequorum
           expected_votes: 8
       }

       Example (with nodelist) of a 3 node cluster (each node has 1 vote):

       quorum {
           provider: corosync_votequorum
       }

       nodelist {
           node {
               ring0_addr: 192.168.1.1
           }
           node {
               ring0_addr: 192.168.1.2
           }
           node {
               ring0_addr: 192.168.1.3
           }
       }
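
       Example (both nodelist and expected_votes present; a sketch for
       illustration of the override rule only) where quorum.expected_votes: 8
       overrides the value of 3 calculated from the nodelist:

       quorum {
           provider: corosync_votequorum
           expected_votes: 8
       }

       nodelist {
           node {
               ring0_addr: 192.168.1.1
           }
           node {
               ring0_addr: 192.168.1.2
           }
           node {
               ring0_addr: 192.168.1.3
           }
       }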

SPECIAL FEATURES

       two_node: 1

       Enables two node cluster operations (default: 0).

       The "two node cluster" is a use case that requires special
       consideration. With a standard two node cluster, where each node has a
       single vote, there are 2 votes in the cluster. Using the simple
       majority calculation (50% of the votes + 1) to calculate quorum, the
       quorum would be 2. This means that both nodes would always have to be
       alive for the cluster to be quorate and operate.

       When two_node: 1 is enabled, quorum is artificially set to 1.

       Example configuration 1:

       quorum {
           provider: corosync_votequorum
           expected_votes: 2
           two_node: 1
       }

       Example configuration 2:

       quorum {
           provider: corosync_votequorum
           two_node: 1
       }

       nodelist {
           node {
               ring0_addr: 192.168.1.1
           }
           node {
               ring0_addr: 192.168.1.2
           }
       }

       NOTES: enabling two_node: 1 automatically enables wait_for_all. It is
       still possible to override wait_for_all by explicitly setting it to 0.
       If more than 2 nodes join the cluster, the two_node option is
       automatically disabled.
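
       Example configuration 3 (a sketch of a two node cluster that
       explicitly overrides the implied wait_for_all, as described in the
       NOTES above):

       quorum {
           provider: corosync_votequorum
           expected_votes: 2
           two_node: 1
           wait_for_all: 0
       }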

       wait_for_all: 1

       Enables the Wait For All (WFA) feature (default: 0).

       The general behaviour of votequorum is to switch a cluster from
       inquorate to quorate as soon as possible. For example, in an 8 node
       cluster where every node has 1 vote, expected_votes is set to 8 and
       quorum is 5 (50% + 1). As soon as 5 (or more) nodes are visible to
       each other, the partition of 5 (or more) becomes quorate and can start
       operating.

       When WFA is enabled, the cluster will be quorate for the first time
       only after all nodes have been visible at least once at the same time.

       This feature has the advantage of avoiding some startup race
       conditions, at the cost that all nodes need to be up at the same time
       at least once before the cluster can operate.

       A common startup race condition, based on the above example, is that
       as soon as a partition of 5 nodes becomes quorate while the other 3
       are still offline, those remaining 3 nodes will be fenced.

       WFA is very useful when combined with last_man_standing (see below).

       Example configuration:

       quorum {
           provider: corosync_votequorum
           expected_votes: 8
           wait_for_all: 1
       }

       last_man_standing: 1 / last_man_standing_window: 10000

       Enables the Last Man Standing (LMS) feature (default: 0). The tunable
       last_man_standing_window is expressed in ms (default: 10000, i.e. 10
       seconds).

       The general behaviour of votequorum is to set expected_votes and
       quorum at startup (unless modified by the user at runtime, see below)
       and use those values during the whole lifetime of the cluster.

       Using for example an 8 node cluster where each node has 1 vote,
       expected_votes is set to 8 and quorum to 5. This condition allows a
       total failure of 3 nodes. If a 4th node fails, the cluster becomes
       inquorate and it will stop providing services.

       Enabling LMS allows the cluster to dynamically recalculate
       expected_votes and quorum under specific circumstances. It is
       essential to enable WFA when using LMS in High Availability clusters
       (see example configuration 3 below).

       Using the above 8 node cluster example, with LMS enabled the cluster
       can retain quorum and continue operating by losing, in a cascade
       fashion, up to 6 nodes with only 2 remaining active.

       Example chain of events:

       1) cluster is fully operational with 8 nodes.
          (expected_votes: 8 quorum: 5)

       2) 3 nodes die, cluster is quorate with 5 nodes.

       3) after the last_man_standing_window timer expires,
          expected_votes and quorum are recalculated.
          (expected_votes: 5 quorum: 3)

       4) at this point, 2 more nodes can die and the
          cluster will still be quorate with 3.

       5) once again, after the last_man_standing_window
          timer expires, expected_votes and quorum are
          recalculated.
          (expected_votes: 3 quorum: 2)

       6) at this point, 1 more node can die and the
          cluster will still be quorate with 2.

       7) after one more last_man_standing_window timer expires,
          expected_votes and quorum are recalculated once more.
          (expected_votes: 2 quorum: 2)

       NOTES: In order for the cluster to downgrade automatically from a 2
       node to a 1 node cluster, the auto_tie_breaker feature must also be
       enabled (see below). If auto_tie_breaker is not enabled, and one more
       failure occurs, the remaining node will not be quorate. LMS does not
       work with asymmetric voting schemes; each node must vote 1. LMS is
       also incompatible with quorum devices: if last_man_standing is
       specified in corosync.conf then the quorum device will be disabled.

       Example configuration 1:

       quorum {
           provider: corosync_votequorum
           expected_votes: 8
           last_man_standing: 1
       }

       Example configuration 2 (increase timeout to 20 seconds):

       quorum {
           provider: corosync_votequorum
           expected_votes: 8
           last_man_standing: 1
           last_man_standing_window: 20000
       }
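
       Example configuration 3 (a sketch combining LMS with WFA, as
       recommended above for High Availability clusters):

       quorum {
           provider: corosync_votequorum
           expected_votes: 8
           last_man_standing: 1
           wait_for_all: 1
       }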

       auto_tie_breaker: 1

       Enables the Auto Tie Breaker (ATB) feature (default: 0).

       The general behaviour of votequorum allows a simultaneous failure of
       up to 50% - 1 of the nodes, assuming each node has 1 vote.

       When ATB is enabled, the cluster can suffer the failure of up to 50%
       of the nodes at the same time, in a deterministic fashion. By default
       the cluster partition, that is the set of nodes still in contact with
       the node that has the lowest nodeid, will remain quorate. The other
       nodes will be inquorate. This behaviour can be changed by also
       specifying

       auto_tie_breaker_node: lowest|highest|<list of node IDs>

       'lowest' is the default. 'highest' is similar in that if the current
       set of nodes contains the highest nodeid then it will remain quorate.
       Alternatively it is possible to specify a particular node ID or a list
       of node IDs that will be required to maintain quorum. If a
       (space-separated) list is given, the nodes are evaluated in order: if
       the first node is present then it will be used to determine the
       quorate partition; if that node is not in either half (i.e. it was not
       in the cluster before the split) then the second node ID will be
       checked for, and so on. ATB is incompatible with quorum devices: if
       auto_tie_breaker is specified in corosync.conf then the quorum device
       will be disabled.

       Example configuration 1:

       quorum {
           provider: corosync_votequorum
           expected_votes: 8
           auto_tie_breaker: 1
           auto_tie_breaker_node: lowest
       }

       Example configuration 2:

       quorum {
           provider: corosync_votequorum
           expected_votes: 8
           auto_tie_breaker: 1
           auto_tie_breaker_node: 1 3 5
       }

       allow_downscale: 1

       Enables the allow downscale (AD) feature (default: 0).

       THIS FEATURE IS INCOMPLETE AND CURRENTLY UNSUPPORTED.

       The general behaviour of votequorum is to never decrease expected
       votes or quorum.

       When AD is enabled, both expected votes and quorum are recalculated
       when a node leaves the cluster in a clean state (normal corosync
       shutdown process), down to the configured expected_votes.

       Example use case:

       1) N node cluster (where N is any value higher than 3)

       2) expected_votes set to 3 in corosync.conf

       3) only 3 nodes are running

       4) the admin needs to increase processing power and adds 10 nodes

       5) the internal expected_votes is automatically set to 13

       6) the minimum expected_votes is 3 (from the configuration)

       - up to this point this is standard votequorum behaviour -

       7) once the work is done, the admin wants to remove nodes from the
          cluster

       8) using an ordered shutdown the admin can reduce the cluster size
          automatically back to 3, but not below 3, where normal quorum
          operation will work as usual.

       Example configuration:

       quorum {
           provider: corosync_votequorum
           expected_votes: 3
           allow_downscale: 1
       }

       allow_downscale implicitly enables EVT (see below).

       expected_votes_tracking: 1

       Enables the Expected Votes Tracking (EVT) feature (default: 0).

       Expected Votes Tracking stores the highest-seen value of expected
       votes on disk and uses that as the minimum value for expected votes in
       the absence of any higher authority (e.g. a current quorate cluster).
       This is useful when a group of nodes becomes detached from the main
       cluster and, after a restart, could otherwise have enough votes to
       provide quorum; this can happen after using allow_downscale.

       Note that even if the in-memory version of expected_votes is reduced,
       e.g. by removing nodes or using corosync-quorumtool, the stored value
       will still be the highest value seen - it never gets reduced.

       The value is held in the file ev_tracking (stored in the directory
       configured in system.state_dir, or /var/lib/corosync/ when unset),
       which can be deleted if you really do need to reduce the expected
       votes for any reason, such as when the node has been moved to a
       different cluster.
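
       Example configuration (a minimal sketch enabling EVT explicitly):

       quorum {
           provider: corosync_votequorum
           expected_votes: 8
           expected_votes_tracking: 1
       }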

VARIOUS NOTES

       * WFA / LMS / ATB / AD can be combined with each other.

       * In order to change the default number of votes for a node there are
       two options:

       1) nodelist:

       nodelist {
           node {
               ring0_addr: 192.168.1.1
               quorum_votes: 3
           }
           ....
       }

       2) quorum section (deprecated):

       quorum {
           provider: corosync_votequorum
           expected_votes: 2
           votes: 2
       }

       In the event that both nodelist and quorum { votes: } are defined, the
       value from the nodelist will be used.

       * Only votes, quorum_votes, expected_votes and two_node can be changed
       at runtime. Everything else requires a cluster restart.
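
       For example, votes and expected_votes can be adjusted on a running
       cluster with corosync-quorumtool. The following is only a sketch of
       such invocations; see corosync-quorumtool(8) for the exact options:

       corosync-quorumtool -e 8       (set expected_votes at runtime)
       corosync-quorumtool -v 2 -n 1  (set the votes of nodeid 1 at runtime)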

BUGS

       No known bugs at the time of writing. The authors are from outer
       space. Deal with it.

SEE ALSO

       corosync(8), corosync.conf(5), corosync-quorumtool(8),
       corosync-qdevice(8), votequorum_overview(3)
corosync Man Page                 2018-12-14                     VOTEQUORUM(5)