COROSYNC_CONF(5)    Corosync Cluster Engine Programmer's Manual   COROSYNC_CONF(5)

NAME
       corosync.conf - corosync executive configuration file

SYNOPSIS
       /etc/corosync/corosync.conf

DESCRIPTION
       The corosync.conf file instructs the corosync executive about various
       parameters needed to control the corosync executive.  Empty lines and
       lines starting with the # character are ignored.  The configuration
       file consists of bracketed top level directives.  The possible
       directive choices are:

       totem { }
              This top level directive contains configuration options for
              the totem protocol.

       logging { }
              This top level directive contains configuration options for
              logging.

       quorum { }
              This top level directive contains configuration options for
              quorum.

       nodelist { }
              This top level directive contains configuration options for
              the nodes in the cluster.

       qb { } This top level directive contains configuration options
              related to libqb.

       resources { }
              This top level directive contains configuration options for
              resources.
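
       As a rough sketch of the overall layout, a configuration file that
       combines several of these top level directives might look like the
       following.  Every value shown (cluster name, addresses, port, node
       id) is an assumption chosen for illustration, not a recommendation:

              # Hypothetical skeleton; adjust every value to your cluster.
              totem {
                      version: 2
                      cluster_name: examplecluster
                      interface {
                              ringnumber: 0
                              bindnetaddr: 192.168.5.0
                              mcastport: 5405
                      }
              }

              logging {
                      to_syslog: yes
              }

              quorum {
                      provider: corosync_votequorum
              }

              nodelist {
                      node {
                              ring0_addr: 192.168.5.92
                              nodeid: 1
                      }
              }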

       Within the totem directive, an interface directive is required.  There
       is also one configuration option which is required: version, which is
       described further below.

       Within the interface sub-directive of totem there are four parameters
       which are required.  There is one parameter which is optional.

       ringnumber
              This specifies the ring number for the interface.  When using
              the redundant ring protocol, each interface should specify a
              separate ring number to uniquely identify to the membership
              protocol which interface to use for which redundant ring.  The
              ringnumber must start at 0.

       bindnetaddr
              This specifies the network address the corosync executive
              should bind to.

              bindnetaddr should be an IP address configured on the system,
              or a network address.

              For example, if the local interface is 192.168.5.92 with
              netmask 255.255.255.0, you should set bindnetaddr to
              192.168.5.92 or 192.168.5.0.  If the local interface is
              192.168.5.92 with netmask 255.255.255.192, set bindnetaddr to
              192.168.5.92 or 192.168.5.64, and so forth.

              This may also be an IPv6 address, in which case IPv6 networking
              will be used.  In this case, the exact address must be
              specified and there is no automatic selection of the network
              interface within a specific subnet as with IPv4.

              If IPv6 networking is used, the nodeid field in nodelist must
              be specified.
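
              The second IPv4 case above could be written as the following
              sketch.  The multicast address and port are assumptions made
              for the example; only the bindnetaddr value is taken from the
              text:

                     totem {
                             version: 2
                             interface {
                                     ringnumber: 0
                                     # network address of 192.168.5.92/26
                                     bindnetaddr: 192.168.5.64
                                     mcastaddr: 239.255.1.1
                                     mcastport: 5405
                             }
                     }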

       broadcast
              This is optional and can be set to yes.  If it is set to yes,
              the broadcast address will be used for communication.  If this
              option is set, mcastaddr should not be set.

       mcastaddr
              This is the multicast address used by the corosync executive.
              The default should work for most networks, but the network
              administrator should be queried about a multicast address to
              use.  Avoid 224.x.x.x because this is a "config" multicast
              address.

              This may also be an IPv6 multicast address, in which case IPv6
              networking will be used.  If IPv6 networking is used, the
              nodeid field in nodelist must be specified.

              This option is not needed if the cluster_name option is used.
              If both options are used, mcastaddr has higher priority.

       mcastport
              This specifies the UDP port number.  It is possible to use the
              same multicast address on a network with the corosync services
              configured for different UDP ports.  Please note that corosync
              uses two UDP ports: mcastport (for mcast receives) and
              mcastport - 1 (for mcast sends).  If you have multiple clusters
              on the same network using the same mcastaddr, please configure
              the mcastports with a gap.
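
              As an illustration of such a gap, consider two hypothetical
              clusters sharing one multicast address (the addresses and port
              numbers are assumptions chosen for the example).  Cluster A
              would use ports 5405 and 5404 and cluster B ports 5407 and
              5406, so the two pairs do not overlap:

                     # cluster A, interface sub-directive of totem
                     interface {
                             ringnumber: 0
                             bindnetaddr: 192.168.5.0
                             mcastaddr: 239.255.1.1
                             mcastport: 5405
                     }

                     # cluster B, interface sub-directive of totem
                     interface {
                             ringnumber: 0
                             bindnetaddr: 192.168.5.0
                             mcastaddr: 239.255.1.1
                             mcastport: 5407
                     }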

       ttl    This specifies the Time To Live (TTL).  If you run your cluster
              on a routed network then the default of "1" will be too small.
              This option provides a way to increase this up to 255.  The
              valid range is 0..255.  Note that this is only valid on
              multicast transport types.

       Within the totem directive, there are several general configuration
       options.  Only version is required; it controls the version of the
       totem configuration.  The remaining options are optional and control
       node identifier generation, secrecy and authentication, the redundant
       ring mode of operation, the maximum network MTU, the transport, the
       cluster name, the configuration version and the IP version.  When IPv6
       is used, the nodeid field in nodelist must also be specified.

       version
              This specifies the version of the configuration file.
              Currently the only valid version for this directive is 2.

       clear_node_high_bit
              This configuration option is optional and is only relevant when
              no nodeid is specified.  Some corosync clients require a signed
              32 bit nodeid that is greater than zero; however, by default
              corosync uses all 32 bits of the IPv4 address space when
              generating a nodeid.  Set this option to yes to force the high
              bit to be zero and therefore ensure the nodeid is a positive
              signed 32 bit integer.

              WARNING: The cluster's behavior is undefined if this option is
              enabled on only a subset of the cluster (for example during a
              rolling upgrade).

       crypto_hash
              This specifies which HMAC authentication should be used to
              authenticate all messages.  Valid values are none (no
              authentication), md5, sha1, sha256, sha384 and sha512.

              The default is sha1.

       crypto_cipher
              This specifies which cipher should be used to encrypt all
              messages.  Valid values are none (no encryption), aes256,
              aes192, aes128 and 3des.  Enabling crypto_cipher also requires
              enabling crypto_hash.

              The default is aes256.

       secauth
              This specifies that HMAC/SHA1 authentication should be used to
              authenticate all messages.  It further specifies that all data
              should be encrypted with the nss library and the aes256
              encryption algorithm to protect data from eavesdropping.

              Enabling this option adds an encryption header to every message
              sent by totem, which reduces total throughput.  Encryption and
              authentication also consume extra CPU cycles in corosync.

              The default is on.

              WARNING: This parameter is deprecated.  It is recommended to
              use a combination of crypto_cipher and crypto_hash instead.
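
              A minimal sketch of the recommended replacement for secauth,
              assuming sha256 and aes256 suit the cluster (both are taken
              from the lists of valid values above; the choice between them
              is an assumption of this example):

                     totem {
                             version: 2
                             crypto_cipher: aes256
                             crypto_hash: sha256
                     }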

       rrp_mode
              This specifies the mode of redundant ring, which may be none,
              active, or passive.  Currently only 'passive' is supported or
              tested (using 'active' is not recommended).  Active replication
              offers slightly lower latency from transmit to delivery in
              faulty network environments but with less performance.  Passive
              replication may nearly double the speed of the totem protocol
              if the protocol doesn't become CPU bound.  The final option is
              none, in which case only one network interface will be used to
              operate the totem protocol.

              If only one interface directive is specified, none is
              automatically chosen.  If multiple interface directives are
              specified, only active or passive may be chosen.

              The maximum number of interface directives that is allowed for
              either mode (active or passive) is 2.

              When using multiple interfaces, make sure to use a different
              multicast address/port pair for each interface so that rrp
              works.  Ports for the same address must differ by at least two.
              This is checked by the parser.
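
              A sketch of a two ring setup in passive mode follows.  The
              networks, multicast addresses and port are assumptions for
              illustration; each interface uses its own multicast address,
              so the address/port pairs differ:

                     totem {
                             version: 2
                             rrp_mode: passive
                             interface {
                                     ringnumber: 0
                                     bindnetaddr: 192.168.5.0
                                     mcastaddr: 239.255.1.1
                                     mcastport: 5405
                             }
                             interface {
                                     ringnumber: 1
                                     bindnetaddr: 10.0.0.0
                                     mcastaddr: 239.255.2.1
                                     mcastport: 5405
                             }
                     }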

       netmtu This specifies the network maximum transmission unit.  Setting
              this value beyond 1500, the regular frame MTU, requires
              ethernet devices that support large, also called jumbo,
              frames.  If any device in the network doesn't support large
              frames, the protocol will not operate properly.  The hosts
              must also have their MTU size changed from 1500 to whatever
              frame size is specified here.

              Please note that while some NICs or switches claim large frame
              support, they support 9000 MTU as the maximum frame size
              including the IP header.  Setting the netmtu and host MTUs to
              9000 will cause totem to use the full 9000 bytes of the frame.
              Linux will then add an 18 byte header, moving the full frame
              size to 9018.  As a result some hardware will not operate
              properly with this size of data.  A netmtu of 8982 seems to
              work for the few large frame devices that have been tested.
              Some manufacturers claim large frame support when in fact they
              support frame sizes of 4500 bytes.

              When sending multicast traffic, if the network frequently
              reconfigures, chances are that some device in the network
              doesn't support large frames.

              Choose hardware carefully if intending to use large frame
              support.

              The default is 1500.

       transport
              This directive controls the transport mechanism used.  If the
              interface to which corosync is binding is an RDMA interface
              such as RoCEE or Infiniband, the "iba" parameter may be
              specified.  To avoid the use of multicast entirely, a unicast
              transport parameter "udpu" can be specified.  This requires
              specifying in the nodelist directive, before deployment, the
              list of members that could potentially make up the membership.

              The default is udp.  The transport type can also be set to udpu
              or iba.

       cluster_name
              This specifies the name of the cluster and is used for
              automatically generating the multicast address.

       config_version
              This specifies the version of the config file.  It is converted
              to an unsigned 64-bit integer.  The default is 0.  The option
              is used to prevent old nodes with an out-of-date configuration
              from joining.  If the value is not 0 and the node is going for
              the first time from a single-node membership to a
              multiple-node membership (only the first time; a join after a
              split doesn't follow these rules), the config_versions of the
              other nodes are collected.  If the current node's
              config_version is not equal to the highest of the collected
              versions, corosync is terminated.

       ip_version
              This specifies the version of IP to use for communication.  The
              value can be one of ipv4 or ipv6.  The default (if
              unspecified) is ipv4.

       Within the totem directive, there are several configuration options
       which are used to control the operation of the protocol.  It is
       generally not recommended to change any of these values without proper
       guidance and sufficient testing.  Some networks may require larger
       values if suffering from frequent reconfigurations.  Some applications
       may require faster failure detection times which can be achieved by
       reducing the token timeout.

       token  This timeout is used directly or as a base for the real token
              timeout calculation (explained in the token_coefficient
              section).  The token timeout specifies the time in milliseconds
              until a token loss is declared after not receiving a token.
              This is the time spent detecting a failure of a processor in
              the current configuration.  Reforming a new configuration takes
              about 50 milliseconds in addition to this timeout.

              The real token timeout used by totem can be read from the cmap
              key runtime.config.token.

              The default is 1000 milliseconds.

       token_coefficient
              This value is used only when the nodelist section is specified
              and contains at least 3 nodes.  If so, the real token timeout
              is computed as token + (number_of_nodes - 2) *
              token_coefficient.  This allows the cluster to scale without
              manually changing the token timeout every time a new node is
              added.  This value can be set to 0, which effectively removes
              this feature.

              The default is 650 milliseconds.
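
              As a worked example of the formula, assuming a hypothetical
              nodelist with five nodes and the default values of both
              options:

                     totem {
                             token: 1000
                             token_coefficient: 650
                     }
                     # nodelist with 5 nodes:
                     # real token timeout = 1000 + (5 - 2) * 650 = 2950 ms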

       token_retransmit
              This timeout specifies in milliseconds how long to wait without
              receiving a token before the token is retransmitted.  This will
              be automatically calculated if token is modified.  It is not
              recommended to alter this value without guidance from the
              corosync community.

              The default is 238 milliseconds.

       hold   This timeout specifies in milliseconds how long the token
              should be held by the representative when the protocol is under
              low utilization.  It is not recommended to alter this value
              without guidance from the corosync community.

              The default is 180 milliseconds.

       token_retransmits_before_loss_const
              This value identifies how many token retransmits should be
              attempted before forming a new configuration.  If this value is
              set, retransmit and hold will be automatically calculated from
              retransmits_before_loss and token.

              The default is 4 retransmissions.

       join   This timeout specifies in milliseconds how long to wait for
              join messages in the membership protocol.

              The default is 50 milliseconds.

       send_join
              This timeout specifies in milliseconds an upper range between 0
              and send_join to wait before sending a join message.  For
              configurations with fewer than 32 nodes, this parameter is not
              necessary.  For larger rings, this parameter is necessary to
              ensure the NIC is not overflowed with join messages on
              formation of a new ring.  A reasonable value for large rings
              (128 nodes) would be 80 milliseconds.  Other timer values must
              also change if this value is changed.  Seek advice from the
              corosync mailing list if trying to run larger configurations.

              The default is 0 milliseconds.

       consensus
              This timeout specifies in milliseconds how long to wait for
              consensus to be achieved before starting a new round of
              membership configuration.  The minimum value for consensus must
              be 1.2 * token.  This value will be automatically calculated at
              1.2 * token if the user doesn't specify a consensus value.

              For two node clusters, a consensus larger than the join timeout
              but less than token is safe.  For three node or larger
              clusters, consensus should be larger than token.  There is an
              increasing risk of odd membership changes, which still
              guarantee virtual synchrony, as node count grows if consensus
              is less than token.

              The default is 1200 milliseconds.
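
              For example, assuming the token timeout is raised to a
              hypothetical 3000 milliseconds and token_coefficient does not
              come into play, the 1.2 * token rule gives 1.2 * 3000 = 3600
              milliseconds as the smallest explicit consensus value:

                     totem {
                             token: 3000
                             # 1.2 * 3000 = 3600
                             consensus: 3600
                     }

              Leaving consensus out entirely would yield the same value,
              since it is then calculated automatically.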

       merge  This timeout specifies in milliseconds how long to wait before
              checking for a partition when no multicast traffic is being
              sent.  If multicast traffic is being sent, the merge detection
              happens automatically as a function of the protocol.

              The default is 200 milliseconds.

       downcheck
              This timeout specifies in milliseconds how long to wait before
              checking that a network interface is back up after it has been
              downed.

              The default is 1000 milliseconds.

       fail_recv_const
              This constant specifies how many rotations of the token may
              occur without receiving any messages, when messages should be
              received, before a new configuration is formed.

              The default is 2500 failures to receive a message.

       seqno_unchanged_const
              This constant specifies how many rotations of the token without
              any multicast traffic should occur before the hold timer is
              started.

              The default is 30 rotations.

       heartbeat_failures_allowed
              [HeartBeating mechanism] Configures the optional HeartBeating
              mechanism for faster failure detection.  Keep in mind that
              engaging this mechanism in lossy networks could cause faulty
              loss declaration as the mechanism relies on the network for
              heartbeating.

              So as a rule of thumb, use this mechanism if you require
              improved failure detection in low to medium utilized networks.

              This constant specifies the number of heartbeat failures the
              system should tolerate before declaring heartbeat failure (for
              example, 3).  If this value is not set or is 0, the heartbeat
              mechanism is not engaged in the system and token rotation is
              the method of failure detection.

              The default is 0 (disabled).

       max_network_delay
              [HeartBeating mechanism] This constant specifies in
              milliseconds the approximate delay that your network takes to
              transport one packet from one machine to another.  This value
              is to be set by system engineers; please don't change it if you
              are not sure, as it affects the failure detection mechanism
              using heartbeat.

              The default is 50 milliseconds.

       window_size
              This constant specifies the maximum number of messages that may
              be sent on one token rotation.  If all processors perform
              equally well, this value could be large (300), which would
              introduce higher latency from origination to delivery for very
              large rings.  To reduce latency in large rings (16+), the
              defaults are a safe compromise.  If one or more slow processors
              are present among fast processors, window_size should be no
              larger than 256000 / netmtu to avoid overflow of the kernel
              receive buffers.  The user is notified of this by the display
              of a retransmit list in the notification logs.  There is no
              loss of data, but performance is reduced when these errors
              occur.

              The default is 50 messages.
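
              As a worked example of this bound, assume a slow processor is
              present and netmtu is set to 8982.  Then 256000 / 8982 is about
              28.5, so the default window_size of 50 would exceed the bound
              and a value of 28 or lower would be needed.  With the default
              netmtu of 1500 the bound is about 170, which the default
              already satisfies.  A sketch of the jumbo frame case:

                     totem {
                             netmtu: 8982
                             # 256000 / 8982 = 28.5, so stay at or below 28
                             window_size: 28
                     }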

       max_messages
              This constant specifies the maximum number of messages that may
              be sent by one processor on receipt of the token.  The
              max_messages parameter is limited to 256000 / netmtu to prevent
              overflow of the kernel transmit buffers.

              The default is 17 messages.

       miss_count_const
              This constant defines the maximum number of times on receipt of
              a token a message is checked for retransmission before a
              retransmission occurs.  This parameter is useful to modify for
              switches that delay multicast packets compared to unicast
              packets.  The default setting works well for nearly all modern
              switches.

              The default is 5 messages.

       rrp_problem_count_timeout
              This specifies the time in milliseconds to wait before
              decrementing the problem count by 1 for a particular ring to
              ensure a link is not marked faulty for transient network
              failures.

              The default is 2000 milliseconds.

       rrp_problem_count_threshold
              This specifies the number of times a problem is detected with a
              link before setting the link faulty.  Once a link is set
              faulty, no more data is transmitted upon it.  Also, the problem
              counter is no longer decremented when the problem count timeout
              expires.

              A problem is detected whenever all tokens from the preceding
              processor have not been received within the
              rrp_token_expired_timeout.  The rrp_problem_count_threshold *
              rrp_token_expired_timeout should be at least 50 milliseconds
              less than the token timeout, or a complete reconfiguration may
              occur.

              The default is 10 problem counts.
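
              As a quick check of this rule using the documented defaults
              (token 1000, rrp_problem_count_threshold 10 and
              rrp_token_expired_timeout 47), written out as a commented
              fragment for illustration only:

                     totem {
                             token: 1000
                             rrp_problem_count_threshold: 10
                             rrp_token_expired_timeout: 47
                     }
                     # 10 * 47 = 470 ms, which is below the limit of
                     # 1000 - 50 = 950 ms, so the defaults satisfy the rule.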

       rrp_problem_count_mcast_threshold
              This specifies the number of times a problem is detected with
              multicast before setting the link faulty for passive rrp mode.
              This variable is unused in active rrp mode.

              The default is 10 times rrp_problem_count_threshold.

       rrp_token_expired_timeout
              This specifies the time in milliseconds to increment the
              problem counter for the redundant ring protocol after not
              having received a token from all rings for a particular
              processor.

              This value will automatically be calculated from the token
              timeout and problem_count_threshold but may be overridden.  It
              is not recommended to override this value without guidance from
              the corosync community.

              The default is 47 milliseconds.

       rrp_autorecovery_check_timeout
              This specifies the time in milliseconds to check if the failed
              ring can be auto-recovered.

              The default is 1000 milliseconds.

       Within the logging directive, there are several configuration options
       which are all optional.

       The following 4 options are valid only for the top level logging
       directive:

       timestamp
              This specifies that a timestamp is placed on all log messages.

              The default is off.

       fileline
              This specifies that file and line should be printed.

              The default is off.

       function_name
              This specifies that the code function name should be printed.

              The default is off.

       blackbox
              This specifies that blackbox functionality should be enabled.

              The default is on.

       The following options are valid for the top level logging directive
       and can be overridden in logger_subsys entries.

       to_stderr

       to_logfile

       to_syslog
              These specify the destination of logging output.  Any
              combination of these options may be specified.  Valid options
              are yes and no.

              The default is syslog and stderr.

              Please note, if you are using to_logfile and want to rotate the
              file, use logrotate(8) with the option copytruncate.  For
              example:

                  /var/log/corosync.log {
                      missingok
                      compress
                      notifempty
                      daily
                      rotate 7
                      copytruncate
                  }

       logfile
              If the to_logfile directive is set to yes, this option
              specifies the pathname of the log file.

              No default.

       logfile_priority
              This specifies the logfile priority for this particular
              subsystem.  Ignored if debug is on.  Possible values are:
              alert, crit, debug (same as debug = on), emerg, err, info,
              notice, warning.

              The default is: info.

       syslog_facility
              This specifies the syslog facility type that will be used for
              any messages sent to syslog.  Valid options are daemon, local0,
              local1, local2, local3, local4, local5, local6 and local7.

              The default is daemon.

       syslog_priority
              This specifies the syslog level for this particular subsystem.
              Ignored if debug is on.  Possible values are: alert, crit,
              debug (same as debug = on), emerg, err, info, notice, warning.

              The default is: info.

       debug  This specifies whether debug output is logged for this
              particular logger.  It can also be set to trace, which is the
              highest level of debug information.

              The default is off.

       Within the logging directive, logger_subsys directives are optional.

       Within the logger_subsys sub-directive, all of the above logging
       configuration options are valid and can be used to override the
       default settings.  The subsys entry, described below, is mandatory to
       identify the subsystem.

       subsys This specifies the subsystem identity (name) for which logging
              is specified.  This is the name used by a service in the
              log_init() call, for example 'CPG'.  This directive is
              required.
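
       A sketch of a logging directive that combines top level options with
       a logger_subsys override is shown below.  The log file path matches
       the logrotate example above; the particular combination of options
       is an assumption chosen for illustration:

              logging {
                      timestamp: on
                      to_syslog: yes
                      to_logfile: yes
                      logfile: /var/log/corosync.log
                      logger_subsys {
                              # enable debug output for one subsystem only
                              subsys: CPG
                              debug: on
                      }
              }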

       Within the quorum directive it is possible to specify the quorum
       algorithm to use with the provider directive:

       provider
              At the time of writing only corosync_votequorum is supported.
              See votequorum(5) for configuration options.
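
              A minimal quorum section selecting the only supported provider
              might look like this.  Any further votequorum options are
              deliberately left out of the sketch:

                     quorum {
                             provider: corosync_votequorum
                     }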

       Within the nodelist directive it is possible to specify specific
       information about the nodes in the cluster.  The directive can contain
       only the node sub-directive, which specifies every node that should be
       a member of the membership, and where non-default options are needed.
       Every node must have at least the ring0_addr field filled.

       For UDPU, every node that should be a member of the membership must be
       specified.

       Possible options are:

       ringX_addr
              This specifies the IP address of one of the nodes.  X is the
              ring number.

       nodeid This configuration option is optional when using IPv4 and
              required when using IPv6.  This is a 32 bit value specifying
              the node identifier delivered to the cluster membership
              service.  If this is not specified with IPv4, the node id will
              be determined from the 32 bit IP address to which the system
              is bound with ring identifier 0.  The node identifier value of
              zero is reserved and should not be used.
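
       As a sketch of how nodelist pairs with the udpu transport, the
       following fragment lists two hypothetical members.  The addresses,
       port and node ids are assumptions for illustration; every node that
       should join must appear in the list:

              totem {
                      version: 2
                      transport: udpu
                      interface {
                              ringnumber: 0
                              bindnetaddr: 192.168.5.0
                              mcastport: 5405
                      }
              }

              nodelist {
                      node {
                              ring0_addr: 192.168.5.92
                              nodeid: 1
                      }
                      node {
                              ring0_addr: 192.168.5.93
                              nodeid: 2
                      }
              }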

       Within the qb directive it is possible to specify options for libqb.

       The possible option is:

       ipc_type
              This specifies the type of IPC to use.  It can be one of native
              (default), shm and socket.  Native means one of shm or socket,
              depending on what is supported by the OS.  On systems with
              support for both, SHM is selected.  SHM is generally faster,
              but needs to allocate a ring buffer file in /dev/shm.

       Within the resources directive it is possible to specify options for
       resources.

       The possible option is:

       watchdog_device
              (Valid only if Corosync was compiled with watchdog support.)
              Watchdog device to use.  The default value is /dev/watchdog.
              The special value "off" disables watchdog usage.

              In a cluster with properly configured power fencing a watchdog
              provides no additional value.  On the other hand, slow watchdog
              communication may incur multi-second delays in the Corosync
              main loop, potentially breaking down membership.  IPMI
              watchdogs are particularly notorious in this regard: read about
              kipmid_max_busy_us in IPMI.txt in the Linux kernel
              documentation.
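
              For a cluster that already relies on power fencing, a sketch of
              disabling the watchdog (only meaningful if Corosync was
              compiled with watchdog support) could be:

                     resources {
                             watchdog_device: off
                     }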

FILES
       /etc/corosync/corosync.conf
              The corosync executive configuration file.

SEE ALSO
       corosync_overview(8), votequorum(5), corosync-qdevice(8), logrotate(8)

corosync Man Page                 2012-10-10                  COROSYNC_CONF(5)