COROSYNC_CONF(5)    Corosync Cluster Engine Programmer's Manual    COROSYNC_CONF(5)
2
3
4
NAME
corosync.conf - corosync executive configuration file

SYNOPSIS
/etc/corosync/corosync.conf

DESCRIPTION
The corosync.conf file instructs the corosync executive about various
parameters needed to control the corosync executive. Empty lines and
lines starting with the # character are ignored. The configuration file
consists of bracketed top level directives. The possible directive
choices are:
18
19
20 totem { }
21 This top level directive contains configuration options for the
22 totem protocol.
23
24 logging { }
25 This top level directive contains configuration options for log‐
26 ging.
27
28 event { }
29 This top level directive contains configuration options for the
30 event service.
31
32
It is also possible to specify the top level parameter compatibility.
This directive indicates the level of compatibility requested by the
user. The option whitetank can be specified to remain backward
compatible with openais-0.80.z. The option none can be specified to be
compatible only with corosync-1.Y.Z. Extra processing during
configuration changes is required to remain backward compatible.

The default is whitetank (backward compatibility).
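
As a minimal sketch, a top level parameter is written as a name: value
pair outside any bracketed directive. The fragment assumes a cluster
that no longer needs to interoperate with openais-0.80.z nodes; that
assumption is illustrative and not a recommendation.

```
# Top level parameter, not nested inside totem, logging or event.
# Assumes no openais-0.80.z nodes remain in the cluster.
compatibility: none
```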
41
42
Within the totem directive, an interface directive is required. There
is also one configuration option which is required, version, which is
described with the other totem options further on.
45
46 Within the interface sub-directive of totem there are four parameters
47 which are required. There is one parameter which is optional.
48
49
50 ringnumber
51 This specifies the ring number for the interface. When using
52 the redundant ring protocol, each interface should specify sepa‐
53 rate ring numbers to uniquely identify to the membership proto‐
54 col which interface to use for which redundant ring. The
55 ringnumber must start at 0.
56
57
58 bindnetaddr
59 This specifies the network address the corosync executive should
60 bind to. For example, if the local interface is 192.168.5.92
61 with netmask 255.255.255.0, set bindnetaddr to 192.168.5.0. If
62 the local interface is 192.168.5.92 with netmask
63 255.255.255.192, set bindnetaddr to 192.168.5.64, and so forth.
64
This may also be an IPv6 address, in which case IPv6 networking
will be used. In this case, the full address must be specified
and there is no automatic selection of the network interface
within a specific subnet as with IPv4.

If IPv6 networking is used, the nodeid field must be specified.
71
72
73 broadcast
74 This is optional and can be set to yes. If it is set to yes,
75 the broadcast address will be used for communication. If this
76 option is set, mcastaddr should not be set.
77
78
79 mcastaddr
80 This is the multicast address used by corosync executive. The
81 default should work for most networks, but the network adminis‐
82 trator should be queried about a multicast address to use.
83 Avoid 224.x.x.x because this is a "config" multicast address.
84
This may also be an IPv6 multicast address, in which case IPv6
networking will be used. If IPv6 networking is used, the nodeid
field must be specified.
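
The IPv6 requirements described for bindnetaddr and mcastaddr could be
combined as in the sketch below. Both addresses, the port and the
nodeid value are hypothetical placeholders chosen for illustration;
substitute values assigned for your own network.

```
totem {
    version: 2
    # Required with IPv6. 1 is an arbitrary nonzero example value.
    nodeid: 1
    interface {
        ringnumber: 0
        # Full address of the local interface (hypothetical).
        bindnetaddr: fd00:5::92
        # IPv6 multicast group (hypothetical).
        mcastaddr: ff15::1
        # Example port value.
        mcastport: 5405
    }
}
```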
88
89
90 mcastport
This specifies the UDP port number. It is possible to use the
same multicast address on a network with the corosync services
configured for different UDP ports. Note that corosync uses two
UDP ports: mcastport (for multicast receives) and mcastport - 1
(for multicast sends). If multiple clusters on the same network
use the same mcastaddr, configure their mcastports with a gap
between them.
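
A sketch of the port gap for two clusters sharing one multicast
address. The address and port numbers are example values, not
requirements; the point is only that the two port pairs do not overlap.

```
# corosync.conf on the nodes of cluster A.
# Uses UDP ports 5405 (receive) and 5404 (send).
totem {
    version: 2
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.5.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
}

# corosync.conf on the nodes of cluster B (a separate file).
# Uses UDP ports 5407 (receive) and 5406 (send).
totem {
    version: 2
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.5.0
        mcastaddr: 226.94.1.1
        mcastport: 5407
    }
}
```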
98
99
100 ttl This specifies the Time To Live (TTL). If you run your cluster
101 on a routed network then the default of "1" will be too small.
102 This option provides a way to increase this up to 255. The valid
103 range is 0..255. Note that this is only valid on multicast
104 transport types.
105
106
member
This specifies a member on the interface and is used with the udpu
transport only. Every node that should be a member of the
membership should be specified as a separate member directive.
Within the member directive there is a parameter memberaddr
which specifies the IP address of one of the nodes.
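
Put together, a single multicast interface might look like the sketch
below. It reuses the 192.168.5.92 / 255.255.255.0 example from
bindnetaddr; the multicast address and port are example values and the
ttl line simply restates the default.

```
totem {
    version: 2
    interface {
        # The first (and here only) ring must be number 0.
        ringnumber: 0
        # Network address of 192.168.5.92 with netmask 255.255.255.0.
        bindnetaddr: 192.168.5.0
        # Example multicast address and port.
        mcastaddr: 226.94.1.1
        mcastport: 5405
        # Default; raise this on a routed network.
        ttl: 1
    }
}
```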
112
113
Within the totem directive, there are seven configuration options, of
which one is required, five are optional, and one is required when IPv6
is configured in the interface subdirective. The required option
controls the version of the totem configuration. The option that is
optional unless IPv6 is used controls identification of the processor.
The remaining optional options control secrecy and authentication, the
redundant ring mode of operation, the maximum network MTU, the number
of sending threads, and the nodeid field.
122
123
124 version
125 This specifies the version of the configuration file. Currently
126 the only valid version for this directive is 2.
127
128
nodeid
This configuration option is optional when using IPv4 and
required when using IPv6. This is a 32 bit value specifying the
node identifier delivered to the cluster membership service. If
this is not specified with IPv4, the node id will be determined
from the 32 bit IP address to which the system is bound on the
ring with identifier 0. The node identifier value of zero is
reserved and should not be used.
136
137
138 clear_node_high_bit
This configuration option is optional and is only relevant when
no nodeid is specified. Some openais clients require a signed
32 bit nodeid that is greater than zero; however, by default
openais uses all 32 bits of the IPv4 address space when
generating a nodeid. Set this option to yes to force the high
bit to be zero and therefore ensure the nodeid is a positive
signed 32 bit integer.

WARNING: The cluster's behavior is undefined if this option is
enabled on only a subset of the cluster (for example during a
rolling upgrade).
150
151
152 secauth
153 This specifies that HMAC/SHA1 authentication should be used to
154 authenticate all messages. It further specifies that all data
155 should be encrypted with the sober128 encryption algorithm to
156 protect data from eavesdropping.
157
Enabling this option adds a 36 byte header to every message sent
by totem, which reduces total throughput. Encryption and
authentication consume 75% of CPU cycles in aisexec as measured
with gprof when enabled.

For 100 Mbit networks with 1500 MTU frame transmissions: A
throughput of 9 MB/sec is possible with 100% CPU utilization when
this option is enabled on 3 GHz CPUs. A throughput of 10 MB/sec
is possible with 20% CPU utilization when this option is disabled
on 3 GHz CPUs.

For gig-e networks with large frame transmissions: A throughput
of 20 MB/sec is possible when this option is enabled on 3 GHz
CPUs. A throughput of 60 MB/sec is possible when this option is
disabled on 3 GHz CPUs.
173
174 The default is on.
175
176
177 rrp_mode
178 This specifies the mode of redundant ring, which may be none,
179 active, or passive. Active replication offers slightly lower
180 latency from transmit to delivery in faulty network environments
181 but with less performance. Passive replication may nearly dou‐
182 ble the speed of the totem protocol if the protocol doesn't
183 become cpu bound. The final option is none, in which case only
184 one network interface will be used to operate the totem proto‐
185 col.
186
187 If only one interface directive is specified, none is automati‐
188 cally chosen. If multiple interface directives are specified,
189 only active or passive may be chosen.
190
When using multiple interfaces, make sure to use a different
multicast address/port pair for each interface (ports for the
same address must differ by at least two) for rrp to work. This
is checked by the parser.
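
A sketch of a two ring setup in passive mode. The second network
(10.0.5.0) and both multicast address/port pairs are hypothetical
example values; each interface gets its own ringnumber and its own
address/port pair.

```
totem {
    version: 2
    rrp_mode: passive
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.5.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
    interface {
        ringnumber: 1
        # Hypothetical second network for the redundant ring.
        bindnetaddr: 10.0.5.0
        mcastaddr: 226.94.1.2
        mcastport: 5407
    }
}
```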
195
196
197 netmtu This specifies the network maximum transmit unit. To set this
198 value beyond 1500, the regular frame MTU, requires ethernet
199 devices that support large, or also called jumbo, frames. If
200 any device in the network doesn't support large frames, the pro‐
201 tocol will not operate properly. The hosts must also have their
202 mtu size set from 1500 to whatever frame size is specified here.
203
Please note that while some NICs or switches claim large frame
support, they support 9000 MTU as the maximum frame size including
the IP header. Setting the netmtu and host MTUs to 9000 will
cause totem to use the full 9000 bytes of the frame. Linux will
then add an 18 byte header, moving the full frame size to 9018.
As a result some hardware will not operate properly with this
size of data. A netmtu of 8982 seems to work for the few large
frame devices that have been tested. Some manufacturers claim
large frame support when in fact they support frame sizes of
4500 bytes.
214
215 Increasing the MTU from 1500 to 8982 doubles throughput perfor‐
216 mance from 30MB/sec to 60MB/sec as measured with evsbench with
217 175000 byte messages with the secauth directive set to off.
218
219 When sending multicast traffic, if the network frequently recon‐
220 figures, chances are that some device in the network doesn't
221 support large frames.
222
223 Choose hardware carefully if intending to use large frame sup‐
224 port.
225
226 The default is 1500.
227
228
229 threads
230 This directive controls how many threads are used to encrypt and
231 send multicast messages. If secauth is off, the protocol will
232 never use threaded sending. If secauth is on, this directive
233 allows systems to be configured to use multiple threads to
234 encrypt and send multicast messages.
235
236 A thread directive of 0 indicates that no threaded send should
237 be used. This mode offers best performance for non-SMP systems.
238
239 The default is 0.
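
A sketch of how secauth and threads fit together on a multiprocessor
host. The thread count of 2 is a hypothetical value for illustration,
and the required interface directive is left out to keep the fragment
short, so this is not a complete configuration.

```
totem {
    version: 2
    # Authentication and encryption enabled (the default).
    secauth: on
    # Hypothetical value for an SMP host. 0 disables threaded
    # sending, and threads has no effect when secauth is off.
    threads: 2
    # interface directive omitted for brevity
}
```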
240
241
242 vsftype
This directive controls the virtual synchrony filter type used
to identify a primary component. The preferred choice is YKD
dynamic linear voting; however, for clusters larger than 32
nodes YKD consumes a lot of memory. For large scale clusters
that are created by changing the MAX_PROCESSORS_COUNT #define in
the C code totem.h file, the virtual synchrony filter "none" is
recommended, but then the AMF and DLCK services (which are
currently experimental) are not safe for use.
251
252 The default is ykd. The vsftype can also be set to none.
253
254
255 transport
256 This directive controls the transport mechanism used. If the
257 interface to which corosync is binding is an RDMA interface such
258 as RoCEE or Infiniband, the "iba" parameter may be specified.
259 To avoid the use of multicast entirely, a unicast transport
260 parameter "udpu" can be specified. This requires specifying the
261 list of members that could potentially make up the membership
262 before deployment.
263
264 The default is udp. The transport type can also be set to udpu
265 or iba.
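
With the udpu transport, the member list replaces the multicast
address. The sketch below assumes a two node cluster; the second node
address (192.168.5.93) and the port are hypothetical example values.

```
totem {
    version: 2
    transport: udpu
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.5.0
        mcastport: 5405
        # One member directive per node that may join the membership.
        member {
            memberaddr: 192.168.5.92
        }
        member {
            memberaddr: 192.168.5.93
        }
    }
}
```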
266
267 Within the totem directive, there are several configuration
268 options which are used to control the operation of the protocol.
269 It is generally not recommended to change any of these values
270 without proper guidance and sufficient testing. Some networks
271 may require larger values if suffering from frequent reconfigu‐
272 rations. Some applications may require faster failure detection
273 times which can be achieved by reducing the token timeout.
274
275
token
This timeout specifies the time in milliseconds until a token
loss is declared after not receiving a token. This is the time
spent detecting a failure of a processor in the current
configuration. Reforming a new configuration takes about 50
milliseconds in addition to this timeout.
281
282 The default is 1000 milliseconds.
283
284
285 token_retransmit
This timeout specifies in milliseconds how long to wait without
receiving a token before the token is retransmitted. This will
be automatically calculated if token is modified. It is not
recommended to alter this value without guidance from the
corosync community.
291
292 The default is 238 milliseconds.
293
294
295 hold This timeout specifies in milliseconds how long the token should
296 be held by the representative when the protocol is under low
297 utilization. It is not recommended to alter this value without
298 guidance from the corosync community.
299
300 The default is 180 milliseconds.
301
302
303 token_retransmits_before_loss_const
This value identifies how many token retransmits should be
attempted before forming a new configuration. If this value is
set, token_retransmit and hold will be automatically calculated
from token_retransmits_before_loss_const and token.
308
309 The default is 4 retransmissions.
310
311
312 join This timeout specifies in milliseconds how long to wait for join
313 messages in the membership protocol.
314
315 The default is 50 milliseconds.
316
317
318 send_join
This timeout specifies in milliseconds the upper bound of a
range, from 0 to send_join, to wait before sending a join
message. For configurations with fewer than 32 nodes, this
parameter is not necessary. For larger rings, this parameter is
necessary to ensure the NIC is not overflowed with join messages
on formation of a new ring. A reasonable value for large rings
(128 nodes) would be 80 milliseconds. Other timer values must
also change if this value is changed. Seek advice from the
corosync mailing list if trying to run larger configurations.
328
329 The default is 0 milliseconds.
330
331
332 consensus
333 This timeout specifies in milliseconds how long to wait for con‐
334 sensus to be achieved before starting a new round of membership
335 configuration. The minimum value for consensus must be 1.2 *
336 token. This value will be automatically calculated at 1.2 *
337 token if the user doesn't specify a consensus value.
338
For two node clusters, a consensus larger than the join timeout
but less than token is safe. For three node or larger clusters,
consensus should be larger than token. There is an increasing
risk of odd membership changes, which still guarantee virtual
synchrony, as node count grows if consensus is less than token.
344
345 The default is 1200 milliseconds.
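
The relationship between token and consensus can be sketched as
follows. The token value of 3000 is a hypothetical choice for a network
that needs a longer failure detection time; the consensus line only
spells out the 1.2 * token rule and could be left out. The required
interface directive is omitted for brevity.

```
totem {
    version: 2
    # Declare token loss after 3000 ms without a token (example value).
    token: 3000
    # 1.2 * 3000 = 3600. Calculated automatically if not specified.
    consensus: 3600
    # interface directive omitted for brevity
}
```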
346
347
348 merge This timeout specifies in milliseconds how long to wait before
349 checking for a partition when no multicast traffic is being
350 sent. If multicast traffic is being sent, the merge detection
351 happens automatically as a function of the protocol.
352
353 The default is 200 milliseconds.
354
355
356 downcheck
357 This timeout specifies in milliseconds how long to wait before
358 checking that a network interface is back up after it has been
359 downed.
360
The default is 1000 milliseconds.
362
363
364 fail_recv_const
365 This constant specifies how many rotations of the token without
366 receiving any of the messages when messages should be received
367 may occur before a new configuration is formed.
368
369 The default is 2500 failures to receive a message.
370
371
372 seqno_unchanged_const
373 This constant specifies how many rotations of the token without
374 any multicast traffic should occur before the hold timer is
375 started.
376
377 The default is 30 rotations.
378
379
380 heartbeat_failures_allowed
381 [HeartBeating mechanism] Configures the optional HeartBeating
382 mechanism for faster failure detection. Keep in mind that engag‐
383 ing this mechanism in lossy networks could cause faulty loss
384 declaration as the mechanism relies on the network for heart‐
385 beating.
386
As a rule of thumb, use this mechanism if you require improved
failure detection in low to medium utilized networks.

This constant specifies the number of heartbeat failures the
system should tolerate before declaring heartbeat failure, e.g.
3. If this value is not set or is 0, the heartbeat mechanism is
not engaged in the system and token rotation is the method of
failure detection.
395
396 The default is 0 (disabled).
397
398
399 max_network_delay
[HeartBeating mechanism] This constant specifies in milliseconds
the approximate delay that your network takes to transport one
packet from one machine to another. This value is to be set by
system engineers; do not change it if unsure, as it affects the
failure detection mechanism using heartbeat.
405
406 The default is 50 milliseconds.
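
A sketch of engaging the heartbeat mechanism, using the example value
of 3 failures mentioned for heartbeat_failures_allowed and the default
network delay. Whether these values suit a given network is an
assumption to verify by testing. The required interface directive is
omitted for brevity.

```
totem {
    version: 2
    # Any value other than 0 engages the heartbeat mechanism.
    heartbeat_failures_allowed: 3
    # Default; approximate one-way packet delay in milliseconds.
    max_network_delay: 50
    # interface directive omitted for brevity
}
```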
407
408
409 window_size
This constant specifies the maximum number of messages that may
be sent on one token rotation. If all processors perform
equally well, this value could be large (300), which would
introduce higher latency from origination to delivery for very
large rings. To reduce latency in large rings (16+), the
defaults are a safe compromise. If one or more slow processors
are present among fast processors, window_size should be no
larger than 256000 / netmtu to avoid overflow of the kernel
receive buffers. The user is notified of this by the display of
a retransmit list in the notification logs. There is no loss of
data, but performance is reduced when these errors occur.
421
422 The default is 50 messages.
423
424
425 max_messages
426 This constant specifies the maximum number of messages that may
427 be sent by one processor on receipt of the token. The max_mes‐
428 sages parameter is limited to 256000 / netmtu to prevent over‐
429 flow of the kernel transmit buffers.
430
431 The default is 17 messages.
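
As a worked sketch of the 256000 / netmtu limit: with a netmtu of 8982
the quotient is roughly 28.5, so both window_size and max_messages
should stay at or below 28. The netmtu value assumes hardware with
working large frame support, and the required interface directive is
omitted for brevity.

```
totem {
    version: 2
    # Assumes every device on the network supports large frames.
    netmtu: 8982
    # 256000 / 8982 is about 28.5, so use at most 28.
    window_size: 28
    # Default value; already below the limit of 28.
    max_messages: 17
    # interface directive omitted for brevity
}
```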
432
433
434 miss_count_const
435 This constant defines the maximum number of times on receipt of
436 a token a message is checked for retransmission before a
437 retransmission occurs. This parameter is useful to modify for
438 switches that delay multicast packets compared to unicast pack‐
439 ets. The default setting works well for nearly all modern
440 switches.
441
442 The default is 5 messages.
443
444
445 rrp_problem_count_timeout
446 This specifies the time in milliseconds to wait before decre‐
447 menting the problem count by 1 for a particular ring to ensure a
448 link is not marked faulty for transient network failures.
449
450 The default is 2000 milliseconds.
451
452
453 rrp_problem_count_threshold
454 This specifies the number of times a problem is detected with a
455 link before setting the link faulty. Once a link is set faulty,
456 no more data is transmitted upon it. Also, the problem counter
457 is no longer decremented when the problem count timeout expires.
458
A problem is detected whenever all tokens from the preceding
processor have not been received within the
rrp_token_expired_timeout. The rrp_problem_count_threshold *
rrp_token_expired_timeout should be at least 50 milliseconds less
than the token timeout, or a complete reconfiguration may occur.
464
465 The default is 10 problem counts.
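
The timing constraint can be checked by hand against the documented
defaults, as in the sketch below. The values only restate defaults
given in this manual; the interface directives a redundant ring setup
needs are omitted for brevity.

```
totem {
    version: 2
    rrp_mode: passive
    token: 1000
    # 10 * 47 = 470 ms, which is more than 50 ms below token (1000 ms).
    rrp_problem_count_threshold: 10
    rrp_token_expired_timeout: 47
    # interface directives omitted for brevity
}
```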
466
467
468 rrp_problem_count_mcast_threshold
469 This specifies the number of times a problem is detected with
470 multicast before setting the link faulty for passive rrp mode.
471 This variable is unused in active rrp mode.
472
473 The default is 10 times rrp_problem_count_threshold.
474
475
476 rrp_token_expired_timeout
477 This specifies the time in milliseconds to increment the problem
478 counter for the redundant ring protocol after not having
479 received a token from all rings for a particular processor.
480
This value will automatically be calculated from the token
timeout and rrp_problem_count_threshold but may be overridden.
It is not recommended to override this value without guidance
from the corosync community.
485
486 The default is 47 milliseconds.
487
488
489 rrp_autorecovery_check_timeout
490 This specifies the time in milliseconds to check if the failed
491 ring can be auto-recovered.
492
493 The default is 1000 milliseconds.
494
495
496 Within the logging directive, there are several configuration options
497 which are all optional.
498
499
500 The following 3 options are valid only for the top level logging direc‐
501 tive:
502
503
504 timestamp
505 This specifies that a timestamp is placed on all log messages.
506
507 The default is off.
508
509
510 fileline
511 This specifies that file and line should be printed.
512
513 The default is off.
514
515
516 function_name
517 This specifies that the code function name should be printed.
518
519 The default is off.
520
521
The following options are valid for the top level logging directive
and can be overridden in logger_subsys entries.
524
525
526 to_stderr
527
528 to_logfile
529
530 to_syslog
531 These specify the destination of logging output. Any combination
532 of these options may be specified. Valid options are yes and no.
533
534 The default is syslog and stderr.
535
Please note, if you are using to_logfile and want to rotate the
file, use logrotate(8) with the option copytruncate. For example:

    /var/log/corosync.log {
        missingok
        compress
        notifempty
        daily
        rotate 7
        copytruncate
    }
546
547
548 logfile
If the to_logfile directive is set to yes, this option
specifies the pathname of the log file.
551
552 No default.
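
A sketch of sending log output to a file and to syslog but not to
stderr. The path matches the one used in the logrotate example; any
writable path could be used instead.

```
logging {
    to_stderr: no
    to_syslog: yes
    to_logfile: yes
    # Required in practice when to_logfile is yes, as there is no default.
    logfile: /var/log/corosync.log
    timestamp: on
}
```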
553
554
555 logfile_priority
556 This specifies the logfile priority for this particular subsys‐
557 tem. Ignored if debug is on. Possible values are: alert, crit,
558 debug (same as debug = on), emerg, err, info, notice, warning.
559
560 The default is: info.
561
562
563 syslog_facility
This specifies the syslog facility type that will be used for
any messages sent to syslog. Options are daemon, local0, local1,
local2, local3, local4, local5, local6 and local7.
567
568 The default is daemon.
569
570
571 syslog_priority
572 This specifies the syslog level for this particular subsystem.
573 Ignored if debug is on. Possible values are: alert, crit, debug
574 (same as debug = on), emerg, err, info, notice, warning.
575
576 The default is: info.
577
578
debug
This specifies whether debug output is logged for this
particular logger. It can also be set to trace, which is the
highest level of debug information.
582
583 The default is off.
584
585
586 tags This specifies which tags should be traced for this particular
587 logger. Set debug directive to on in order to enable tracing
588 using tags. Values are specified using a vertical bar as a log‐
589 ical OR separator:
590
591 enter|leave|trace1|trace2|trace3|...
592
593 The default is none.
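
A sketch of enabling tag based tracing. The particular tags chosen are
an arbitrary selection from the list above.

```
logging {
    # Tracing with tags requires debug to be on.
    debug: on
    # Vertical bar acts as a logical OR between tags.
    tags: enter|leave|trace1
}
```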
594
595
596 Within the logging directive, logger_subsys directives are optional.
597
598
599 Within the logger_subsys sub-directive, all of the above logging con‐
600 figuration options are valid and can be used to override the default
601 settings. The subsys entry, described below, is mandatory to identify
602 the subsystem.
603
604
subsys
This specifies the subsystem identity (name) for which logging
is specified. This is the name used by a service in the
log_init() call, e.g. 'CKPT'. This directive is required.
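
A sketch of a per-subsystem override, using the 'CKPT' name mentioned
above. The top level settings restate defaults; only the CKPT subsystem
gets debug output.

```
logging {
    to_syslog: yes
    syslog_facility: daemon
    syslog_priority: info
    debug: off
    logger_subsys {
        # Mandatory: identifies the subsystem being overridden.
        subsys: CKPT
        debug: on
    }
}
```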
608
609
FILES
/etc/corosync/corosync.conf
The corosync executive configuration file.

SEE ALSO
corosync_overview(8), logrotate(8)

corosync Man Page    2006-03-28    COROSYNC_CONF(5)