COROSYNC_CONF(5)   Corosync Cluster Engine Programmer's Manual   COROSYNC_CONF(5)

NAME
       corosync.conf - corosync executive configuration file

SYNOPSIS
       /etc/corosync.conf

DESCRIPTION
       The corosync.conf file instructs the corosync executive about various
       parameters needed to control the corosync executive. Empty lines and
       lines starting with a # character are ignored. The configuration file
       consists of bracketed top level directives. The possible directive
       choices are:

       totem { }
              This top level directive contains configuration options for the
              totem protocol.

       logging { }
              This top level directive contains configuration options for
              logging.

       event { }
              This top level directive contains configuration options for the
              event service.

       It is also possible to specify the top level parameter compatibility.
       This directive indicates the level of compatibility requested by the
       user. The option whitetank can be specified to remain backward
       compatible with openais-0.80.z. The option none can be specified to
       only be compatible with corosync-1.Y.Z. Extra processing during
       configuration changes is required to remain backward compatible.

       The default is whitetank (backward compatibility).

       Within the totem directive, an interface directive is required. There
       is also one configuration option which is required: version, described
       below.

       Within the interface sub-directive of totem there are four parameters
       which are required. There is one parameter which is optional.

       ringnumber
              This specifies the ring number for the interface. When using the
              redundant ring protocol, each interface should specify a separate
              ring number so that the membership protocol can uniquely
              identify which interface to use for which redundant ring. The
              ringnumber must start at 0.

       bindnetaddr
              This specifies the network address the corosync executive should
              bind to. For example, if the local interface is 192.168.5.92 with
              netmask 255.255.255.0, set bindnetaddr to 192.168.5.0. If the
              local interface is 192.168.5.92 with netmask 255.255.255.192, set
              bindnetaddr to 192.168.5.64, and so forth.

              This may also be an IPv6 address, in which case IPv6 networking
              will be used. In this case, the full address must be specified
              and there is no automatic selection of the network interface
              within a specific subnet as with IPv4.

              If IPv6 networking is used, the nodeid field must be specified.
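              In the IPv4 examples above, bindnetaddr is the interface address
              with its host bits masked off (a bitwise AND with the netmask).
              The Python sketch below computes it with the standard ipaddress
              module; the helper name ipv4_bindnetaddr is hypothetical and not
              part of corosync.

```python
import ipaddress


def ipv4_bindnetaddr(address: str, netmask: str) -> str:
    """Return the IPv4 network address to use as bindnetaddr.

    strict=False accepts an address with non-zero host bits; the resulting
    network address is the address AND-ed with the netmask.
    """
    network = ipaddress.IPv4Network(f"{address}/{netmask}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    # The two examples from the bindnetaddr description.
    print(ipv4_bindnetaddr("192.168.5.92", "255.255.255.0"))    # 192.168.5.0
    print(ipv4_bindnetaddr("192.168.5.92", "255.255.255.192"))  # 192.168.5.64
```

              The same call works for any IPv4 netmask; it does not apply to
              IPv6, where the full address must be given instead.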

       broadcast
              This is optional and can be set to yes. If it is set to yes, the
              broadcast address will be used for communication. If this option
              is set, mcastaddr should not be set.

       mcastaddr
              This is the multicast address used by the corosync executive. The
              default should work for most networks, but the network
              administrator should be consulted about which multicast address
              to use. Avoid 224.x.x.x because this is a "config" multicast
              address.

              This may also be an IPv6 multicast address, in which case IPv6
              networking will be used. If IPv6 networking is used, the nodeid
              field must be specified.
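              The two rules stated for broadcast and mcastaddr (do not set
              both; avoid 224.x.x.x) can be checked before deployment. A
              Python sketch, assuming the values have already been read from
              the interface sub-directive; the function name and warning texts
              are hypothetical, and corosync itself may report these
              conditions differently or not at all.

```python
import ipaddress
from typing import List, Optional


def check_transport_address(mcastaddr: Optional[str], broadcast: bool) -> List[str]:
    """Return warnings for an interface's mcastaddr/broadcast settings."""
    warnings = []
    if broadcast and mcastaddr is not None:
        warnings.append("broadcast is set, so mcastaddr should not be set")
    if mcastaddr is not None:
        addr = ipaddress.ip_address(mcastaddr)  # accepts IPv4 or IPv6
        if not addr.is_multicast:
            warnings.append(f"{mcastaddr} is not a multicast address")
        elif addr.version == 4 and addr in ipaddress.ip_network("224.0.0.0/8"):
            # The text above reserves all of 224.x.x.x for "config" use.
            warnings.append(f"{mcastaddr} is in 224.x.x.x; choose another range")
    return warnings
```

              For example, 239.255.1.1 with broadcast unset yields no
              warnings, while 224.0.0.1 yields one.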

       mcastport
              This specifies the UDP port number. It is possible to use the
              same multicast address on a network with the corosync services
              configured for different UDP ports. Please note that corosync
              uses two UDP ports: mcastport (for multicast receives) and
              mcastport - 1 (for multicast sends). If you have multiple
              clusters on the same network using the same mcastaddr, please
              configure the mcastports with a gap.
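              Because each cluster occupies two consecutive ports, mcastport
              values that differ by only 1 collide. A Python sketch that flags
              such collisions among clusters sharing one mcastaddr; the helper
              names and cluster labels are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def ports_used(mcastport: int) -> Set[int]:
    """UDP ports one cluster occupies: mcastport (receive), mcastport - 1 (send)."""
    return {mcastport, mcastport - 1}


def port_conflicts(clusters: Dict[str, int]) -> List[Tuple[str, str]]:
    """Return pairs of clusters (same mcastaddr) whose port pairs overlap."""
    conflicts = []
    for (a, port_a), (b, port_b) in combinations(sorted(clusters.items()), 2):
        if ports_used(port_a) & ports_used(port_b):
            conflicts.append((a, b))
    return conflicts


if __name__ == "__main__":
    print(port_conflicts({"alpha": 5405, "beta": 5404}))  # [('alpha', 'beta')]
    print(port_conflicts({"alpha": 5405, "beta": 5407}))  # []
```

              With 5405 and 5404 the clusters both use port 5404; a gap of at
              least 2 (5405 and 5407) keeps the pairs disjoint.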

       ttl    This specifies the Time To Live (TTL). If you run your cluster
              on a routed network, the default of 1 will be too small. This
              option provides a way to increase it up to 255. The valid range
              is 0..255. Note that this is only valid on multicast transport
              types.

       member This specifies a member on the interface and is used with the
              udpu transport only. Every node that should be a member of the
              membership should be specified as a separate member directive.
              Within the member directive there is a parameter memberaddr
              which specifies the IP address of one of the nodes.
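              To illustrate how member directives nest inside interface for the
              udpu transport, the Python sketch below renders a totem block as
              text. It assumes the usual corosync.conf "key: value" syntax; the
              function name and sample addresses are hypothetical, and other
              parameters a real deployment may need (for example mcastaddr or a
              second ring) are deliberately left out.

```python
from typing import Iterable


def udpu_totem_block(bindnetaddr: str, mcastport: int, members: Iterable[str]) -> str:
    """Render a totem { } block for udpu with one member directive per node."""
    lines = [
        "totem {",
        "        version: 2",
        "        transport: udpu",
        "        interface {",
        "                ringnumber: 0",
        f"                bindnetaddr: {bindnetaddr}",
        f"                mcastport: {mcastport}",
    ]
    for addr in members:
        # Each node gets its own member directive holding one memberaddr.
        lines += [
            "                member {",
            f"                        memberaddr: {addr}",
            "                }",
        ]
    lines += ["        }", "}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(udpu_totem_block("10.0.0.0", 5405, ["10.0.0.1", "10.0.0.2", "10.0.0.3"]))
```

              Every node that may join must appear in the list before
              deployment, since udpu has no multicast discovery.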

       Within the totem directive, there are seven configuration options of
       which one is required, five are optional, and one is required when IPv6
       is configured in the interface sub-directive. The required directive,
       version, controls the version of the totem configuration. The nodeid
       directive, optional unless IPv6 is used, controls identification of the
       processor. The optional options control secrecy and authentication, the
       redundant ring mode of operation, the maximum network MTU, and the
       number of sending threads.

       version
              This specifies the version of the configuration file. Currently
              the only valid version for this directive is 2.

       nodeid This configuration option is optional when using IPv4 and
              required when using IPv6. This is a 32 bit value specifying the
              node identifier delivered to the cluster membership service. If
              this is not specified with IPv4, the node id will be determined
              from the 32 bit IP address to which the system is bound on ring
              number 0. The node identifier value of zero is reserved and
              should not be used.
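              This page does not say in which byte order the four octets of the
              ring 0 address are packed into the 32 bit nodeid, so the Python
              sketch below takes the byte order as a parameter instead of
              guessing. The helper name is hypothetical; the sketch also
              rejects the reserved value zero.

```python
import ipaddress


def nodeid_from_ipv4(address: str, byteorder: str = "big") -> int:
    """Pack an IPv4 address into an unsigned 32 bit node identifier.

    byteorder ("big" or "little") is an assumption left to the caller;
    this page only states that the nodeid comes from the 32 bit address.
    """
    packed = ipaddress.IPv4Address(address).packed  # 4 bytes, network order
    nodeid = int.from_bytes(packed, byteorder)
    if nodeid == 0:
        raise ValueError("nodeid 0 is reserved")
    return nodeid


if __name__ == "__main__":
    print(nodeid_from_ipv4("192.168.5.92"))            # 3232236892
    print(nodeid_from_ipv4("192.168.5.92", "little"))  # 1543874752
```

              The two byte orders give very different identifiers, which is
              one reason to set nodeid explicitly when it matters.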

       clear_node_high_bit
              This configuration option is optional and is only relevant when
              no nodeid is specified. Some openais clients require a signed 32
              bit nodeid that is greater than zero; however, by default
              openais uses all 32 bits of the IPv4 address space when
              generating a nodeid. Set this option to yes to force the high bit
              to be zero and therefore ensure the nodeid is a positive signed
              32 bit integer.

              WARNING: The cluster's behavior is undefined if this option is
              enabled on only a subset of the cluster (for example during a
              rolling upgrade).
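              A client that stores the nodeid in a signed 32 bit integer sees
              any value with bit 31 set as negative. The Python sketch below
              shows that reinterpretation and the masking described above; the
              helper names are hypothetical and 0xC0A8055C is an arbitrary
              example value with the high bit set.

```python
HIGH_BIT = 0x80000000


def as_signed32(value: int) -> int:
    """Reinterpret an unsigned 32 bit value as a two's-complement signed int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & HIGH_BIT else value


def clear_high_bit(nodeid: int) -> int:
    """Zero bit 31, as clear_node_high_bit does for a generated nodeid."""
    return nodeid & 0x7FFFFFFF


if __name__ == "__main__":
    nodeid = 0xC0A8055C
    print(as_signed32(nodeid))                  # -1062730404
    print(as_signed32(clear_high_bit(nodeid)))  # 1084753244
```

              Because masking changes the identifier, nodes that disagree on
              this option compute different nodeids for the same address,
              which is why mixed settings are unsafe.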

       secauth
              This specifies that HMAC/SHA1 authentication should be used to
              authenticate all messages. It further specifies that all data
              should be encrypted with the sober128 encryption algorithm to
              protect data from eavesdropping.

              Enabling this option adds a 36 byte header to every message sent
              by totem, which reduces total throughput. Encryption and
              authentication consume 75% of CPU cycles in aisexec, as measured
              with gprof, when enabled.

              For 100 Mbit networks with 1500 MTU frame transmissions: a
              throughput of 9mb/sec is possible with 100% CPU utilization when
              this option is enabled on 3 GHz CPUs. A throughput of 10mb/sec
              is possible with 20% CPU utilization when this option is
              disabled on 3 GHz CPUs.

              For gigabit Ethernet networks with large frame transmissions: a
              throughput of 20mb/sec is possible when this option is enabled
              on 3 GHz CPUs. A throughput of 60mb/sec is possible when this
              option is disabled on 3 GHz CPUs.

              The default is on.

       rrp_mode
              This specifies the mode of redundant ring, which may be none,
              active, or passive. Active replication offers slightly lower
              latency from transmit to delivery in faulty network environments
              but with less performance. Passive replication may nearly double
              the speed of the totem protocol if the protocol doesn't become
              CPU bound. The final option is none, in which case only one
              network interface will be used to operate the totem protocol.

              If only one interface directive is specified, none is
              automatically chosen. If multiple interface directives are
              specified, only active or passive may be chosen.
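              The selection rules above can be expressed as a small Python
              function. This is a sketch under the assumption that any
              requested mode is simply overridden when there is a single
              interface; whether corosync warns in that case is not stated
              here. The function name is hypothetical.

```python
from typing import Optional

VALID_MODES = {"none", "active", "passive"}


def effective_rrp_mode(interface_count: int, requested: Optional[str] = None) -> str:
    """Apply the rrp_mode rules to a given number of interface directives."""
    if interface_count < 1:
        raise ValueError("at least one interface directive is required")
    if requested is not None and requested not in VALID_MODES:
        raise ValueError(f"unknown rrp_mode {requested!r}")
    if interface_count == 1:
        # A single interface always runs without redundant ring replication.
        return "none"
    if requested not in ("active", "passive"):
        raise ValueError("multiple interfaces require rrp_mode active or passive")
    return requested
```

              For example, effective_rrp_mode(2, "passive") returns "passive",
              while effective_rrp_mode(2, "none") raises ValueError.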

       netmtu This specifies the network maximum transmit unit. Setting this
              value beyond 1500, the regular frame MTU, requires Ethernet
              devices that support large (also called jumbo) frames. If any
              device in the network doesn't support large frames, the protocol
              will not operate properly. The hosts must also have their MTU
              size changed from 1500 to whatever frame size is specified here.

              Please note that while some NICs or switches claim large frame
              support, they support 9000 MTU as the maximum frame size
              including the IP header. Setting the netmtu and host MTUs to
              9000 will cause totem to use the full 9000 bytes of the frame.
              Linux then adds an 18 byte header, bringing the full frame size
              to 9018. As a result, some hardware will not operate properly
              with this size of data. A netmtu of 8982 seems to work for the
              few large frame devices that have been tested. Some
              manufacturers claim large frame support when in fact they
              support frame sizes of 4500 bytes.

              Increasing the MTU from 1500 to 8982 doubles throughput
              performance from 30MB/sec to 60MB/sec, as measured with evsbench
              with 175000 byte messages with the secauth directive set to off.

              When sending multicast traffic, if the network frequently
              reconfigures, chances are that some device in the network
              doesn't support large frames.

              Choose hardware carefully if intending to use large frame
              support.

              The default is 1500.
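              The recommended 8982 follows from the frame arithmetic above:
              hardware that caps the whole frame at 9000 bytes leaves
              9000 - 18 = 8982 bytes for totem. A Python sketch, assuming the
              18 byte overhead quoted above; the helper names are hypothetical.

```python
LINUX_FRAME_OVERHEAD = 18  # bytes added on top of netmtu, per the text above


def frame_size(netmtu: int) -> int:
    """Full frame size for a given netmtu."""
    return netmtu + LINUX_FRAME_OVERHEAD


def largest_safe_netmtu(max_frame: int) -> int:
    """Largest netmtu whose full frame still fits in max_frame bytes."""
    return max_frame - LINUX_FRAME_OVERHEAD


if __name__ == "__main__":
    print(frame_size(9000))           # 9018: exceeds a 9000 byte limit
    print(largest_safe_netmtu(9000))  # 8982
```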

       threads
              This directive controls how many threads are used to encrypt and
              send multicast messages. If secauth is off, the protocol will
              never use threaded sending. If secauth is on, this directive
              allows systems to be configured to use multiple threads to
              encrypt and send multicast messages.

              A threads value of 0 indicates that no threaded send should be
              used. This mode offers the best performance for non-SMP systems.

              The default is 0.

       vsftype
              This directive controls the virtual synchrony filter type used
              to identify a primary component. The preferred choice is YKD
              dynamic linear voting; however, for clusters larger than 32
              nodes YKD consumes a lot of memory. For large scale clusters that
              are created by changing the MAX_PROCESSORS_COUNT #define in the C
              code totem.h file, the virtual synchrony filter "none" is
              recommended, but then the AMF and DLCK services (which are
              currently experimental) are not safe for use.

              The default is ykd. The vsftype can also be set to none.

       transport
              This directive controls the transport mechanism used. If the
              interface to which corosync is binding is an RDMA interface such
              as RoCEE or InfiniBand, the "iba" parameter may be specified. To
              avoid the use of multicast entirely, a unicast transport
              parameter "udpu" can be specified. This requires specifying the
              list of members that could potentially make up the membership
              before deployment.

              The default is udp. The transport type can also be set to udpu
              or iba.

       Within the totem directive, there are several configuration options
       which are used to control the operation of the protocol. It is
       generally not recommended to change any of these values without proper
       guidance and sufficient testing. Some networks may require larger
       values if suffering from frequent reconfigurations. Some applications
       may require faster failure detection times, which can be achieved by
       reducing the token timeout.

       token  This timeout specifies, in milliseconds, how long to wait
              without receiving a token before a token loss is declared. This
              is the time spent detecting a failure of a processor in the
              current configuration. Reforming a new configuration takes about
              50 milliseconds in addition to this timeout.

              The default is 1000 milliseconds.

       token_retransmit
              This timeout specifies, in milliseconds, how long to wait
              without receiving a token before the token is retransmitted.
              This will be automatically calculated if token is modified. It
              is not recommended to alter this value without guidance from the
              corosync community.

              The default is 238 milliseconds.

       hold   This timeout specifies, in milliseconds, how long the token
              should be held by the representative when the protocol is under
              low utilization. It is not recommended to alter this value
              without guidance from the corosync community.

              The default is 180 milliseconds.

       token_retransmits_before_loss_const
              This value identifies how many token retransmits should be
              attempted before forming a new configuration. If this value is
              set, token_retransmit and hold will be automatically calculated
              from token_retransmits_before_loss_const and token.

              The default is 4 retransmissions.
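              This page does not give the formula for that calculation. The
              Python sketch below uses one that reproduces the listed defaults
              (token 1000 and 4 retransmits yield token_retransmit 238 and hold
              180). Treat both the formula and the kernel tick rate HZ = 100 as
              assumptions to verify against the corosync source, not as
              documented behavior; the function name is hypothetical.

```python
def derived_timeouts(token: int, retransmits: int = 4, hz: int = 100) -> tuple:
    """Return (token_retransmit, hold) in milliseconds.

    ASSUMED formula, not stated on this page: the token timeout is divided
    into retransmits + 0.2 retransmit intervals, and hold is 80% of one
    interval minus one kernel tick (1000 / hz ms). Results are truncated.
    """
    token_retransmit = int(token / (retransmits + 0.2))
    hold = int(token_retransmit * 0.8 - 1000 // hz)
    return token_retransmit, hold


if __name__ == "__main__":
    print(derived_timeouts(1000))  # (238, 180), matching the listed defaults
```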

       join   This timeout specifies, in milliseconds, how long to wait for
              join messages in the membership protocol.

              The default is 50 milliseconds.

       send_join
              This timeout specifies, in milliseconds, the upper bound of the
              range, starting at 0, from which the wait before sending a join
              message is taken. For configurations with fewer than 32 nodes,
              this parameter is not necessary. For larger rings, this
              parameter is necessary to ensure the NIC is not overflowed with
              join messages on formation of a new ring. A reasonable value for
              large rings (128 nodes) would be 80 msec. Other timer values
              must also change if this value is changed. Seek advice from the
              corosync mailing list if trying to run larger configurations.

              The default is 0 milliseconds.

       consensus
              This timeout specifies, in milliseconds, how long to wait for
              consensus to be achieved before starting a new round of
              membership configuration. The minimum value for consensus must
              be 1.2 * token. This value will be automatically calculated at
              1.2 * token if the user doesn't specify a consensus value.

              For two node clusters, a consensus larger than the join timeout
              but less than token is safe. For three node or larger clusters,
              consensus should be larger than token. There is an increasing
              risk of odd membership changes, which still guarantee virtual
              synchrony, as node count grows if consensus is less than token.

              The default is 1200 milliseconds.
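              A Python sketch of the default and the minimum described above,
              using integer arithmetic (token * 6 // 5) to avoid floating point
              rounding. It enforces the stated 1.2 * token minimum; the two
              node guidance suggests smaller values may be accepted in
              practice, so treat the check as conservative. The function name
              is hypothetical.

```python
from typing import Optional


def effective_consensus(token: int, consensus: Optional[int] = None) -> int:
    """Consensus timeout in ms: the given value, or 1.2 * token when unset."""
    if consensus is None:
        return token * 6 // 5
    if consensus * 5 < token * 6:
        # Below the stated minimum of 1.2 * token.
        raise ValueError(f"consensus {consensus} is below 1.2 * token ({token})")
    return consensus


if __name__ == "__main__":
    print(effective_consensus(1000))  # 1200, the listed default
    print(effective_consensus(3000))  # 3600
```

              Raising token without setting consensus keeps the two in step
              automatically.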

       merge  This timeout specifies, in milliseconds, how long to wait
              before checking for a partition when no multicast traffic is
              being sent. If multicast traffic is being sent, merge detection
              happens automatically as a function of the protocol.

              The default is 200 milliseconds.

       downcheck
              This timeout specifies, in milliseconds, how long to wait before
              checking that a network interface is back up after it has been
              downed.

              The default is 1000 milliseconds.

       fail_recv_const
              This constant specifies how many rotations of the token may
              occur without receiving any of the messages that should have
              been received before a new configuration is formed.

              The default is 2500 failures to receive a message.

       seqno_unchanged_const
              This constant specifies how many rotations of the token without
              any multicast traffic should occur before the merge detection
              timeout is started.

              The default is 30 rotations.

       heartbeat_failures_allowed
              [HeartBeating mechanism] Configures the optional HeartBeating
              mechanism for faster failure detection. Keep in mind that
              engaging this mechanism in lossy networks could cause faulty
              loss declaration, as the mechanism relies on the network for
              heartbeating.

              So as a rule of thumb, use this mechanism if you require
              improved failure detection in low to medium utilized networks.

              This constant specifies the number of heartbeat failures the
              system should tolerate before declaring heartbeat failure (e.g.
              3). Also, if this value is not set or is 0, the heartbeat
              mechanism is not engaged in the system and token rotation is the
              method of failure detection.

              The default is 0 (disabled).

       max_network_delay
              [HeartBeating mechanism] This constant specifies, in
              milliseconds, the approximate delay that your network takes to
              transport one packet from one machine to another. This value is
              to be set by system engineers; please don't change it if you are
              not sure, as it affects the failure detection mechanism using
              heartbeat.

              The default is 50 milliseconds.

       window_size
              This constant specifies the maximum number of messages that may
              be sent on one token rotation. If all processors perform equally
              well, this value could be large (300), which would introduce
              higher latency from origination to delivery for very large
              rings. To reduce latency in large rings (16+), the defaults are a
              safe compromise. If one or more slow processors are present
              among fast processors, window_size should be no larger than
              256000 / netmtu to avoid overflow of the kernel receive buffers.
              The user is notified of this by the display of a retransmit list
              in the notification logs. There is no loss of data, but
              performance is reduced when these errors occur.

              The default is 50 messages.

       max_messages
              This constant specifies the maximum number of messages that may
              be sent by one processor on receipt of the token. The
              max_messages parameter is limited to 256000 / netmtu to prevent
              overflow of the kernel transmit buffers.

              The default is 17 messages.
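              Both window_size and max_messages are bounded by the same
              256000 / netmtu guideline. A Python sketch that applies it with
              integer division; the helper names are hypothetical. By this
              arithmetic, a netmtu of 8982 gives a bound of 28, below the
              default window_size of 50, which matters when slow processors
              are present.

```python
from typing import List

KERNEL_BUFFER_BYTES = 256000  # the constant used in both bounds above


def buffer_limit(netmtu: int) -> int:
    """Largest message count within the 256000 / netmtu bound (rounded down)."""
    return KERNEL_BUFFER_BYTES // netmtu


def check_message_limits(netmtu: int, window_size: int, max_messages: int) -> List[str]:
    """Return warnings for values above the 256000 / netmtu guideline."""
    limit = buffer_limit(netmtu)
    warnings = []
    if window_size > limit:
        warnings.append(f"window_size {window_size} > {limit}: receive buffers "
                        "may overflow if slow processors are present")
    if max_messages > limit:
        warnings.append(f"max_messages {max_messages} > {limit}")
    return warnings


if __name__ == "__main__":
    print(buffer_limit(1500))  # 170
    print(buffer_limit(8982))  # 28
    print(check_message_limits(8982, 50, 17))  # one warning, for window_size
```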

       miss_count_const
              This constant defines the maximum number of times on receipt of
              a token a message is checked for retransmission before a
              retransmission occurs. This parameter is useful to modify for
              switches that delay multicast packets compared to unicast
              packets. The default setting works well for nearly all modern
              switches.

              The default is 5 messages.

       rrp_problem_count_timeout
              This specifies the time in milliseconds to wait before
              decrementing the problem count by 1 for a particular ring, to
              ensure a link is not marked faulty for transient network
              failures.

              The default is 2000 milliseconds.

       rrp_problem_count_threshold
              This specifies the number of times a problem is detected with a
              link before setting the link faulty. Once a link is set faulty,
              no more data is transmitted upon it. Also, the problem counter
              is no longer decremented when the problem count timeout expires.

              A problem is detected whenever all tokens from the preceding
              processor have not been received within the
              rrp_token_expired_timeout. The product
              rrp_problem_count_threshold * rrp_token_expired_timeout should
              be at least 50 milliseconds less than the token timeout, or a
              complete reconfiguration may occur.

              The default is 10 problem counts.
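              The counter behavior described under rrp_problem_count_timeout
              and rrp_problem_count_threshold can be modeled as a small state
              machine, together with the timing rule above. This Python sketch
              is a toy model, not corosync code: the class and function names
              are hypothetical, the caller delivers the timer events, and
              reaching the threshold (rather than exceeding it) is assumed to
              mark the link faulty.

```python
class RingProblemCounter:
    """Toy model of the per-ring problem counter (timers driven by caller)."""

    def __init__(self, threshold: int = 10):
        self.threshold = threshold  # rrp_problem_count_threshold
        self.count = 0
        self.faulty = False

    def token_expired(self) -> None:
        """rrp_token_expired_timeout elapsed: a problem was detected."""
        if not self.faulty:
            self.count += 1
            if self.count >= self.threshold:
                self.faulty = True  # no more data is sent on this link

    def count_timeout(self) -> None:
        """rrp_problem_count_timeout elapsed: decay by 1 unless already faulty."""
        if not self.faulty and self.count > 0:
            self.count -= 1


def threshold_is_safe(threshold: int, token_expired_ms: int, token_ms: int) -> bool:
    """True if threshold * rrp_token_expired_timeout <= token - 50 ms."""
    return threshold * token_expired_ms <= token_ms - 50


if __name__ == "__main__":
    print(threshold_is_safe(10, 47, 1000))  # True: 470 <= 950
    print(threshold_is_safe(21, 47, 1000))  # False: 987 > 950
```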

       rrp_problem_count_mcast_threshold
              This specifies the number of times a problem is detected with
              multicast before setting the link faulty for passive rrp mode.
              This variable is unused in active rrp mode.

              The default is 10 times rrp_problem_count_threshold.

       rrp_token_expired_timeout
              This specifies the time, in milliseconds, after which the
              problem counter for the redundant ring protocol is incremented
              when a token has not been received from all rings for a
              particular processor.

              This value will automatically be calculated from the token
              timeout and rrp_problem_count_threshold but may be overridden.
              It is not recommended to override this value without guidance
              from the corosync community.

              The default is 47 milliseconds.

       rrp_autorecovery_check_timeout
              This specifies the time in milliseconds to check if the failed
              ring can be auto-recovered.

              The default is 1000 milliseconds.

       Within the logging directive, there are several configuration options
       which are all optional.

       The following 3 options are valid only for the top level logging
       directive:

       timestamp
              This specifies that a timestamp is placed on all log messages.

              The default is off.

       fileline
              This specifies that file and line should be printed.

              The default is off.

       function_name
              This specifies that the code function name should be printed.

              The default is off.

       The following options are valid both for the top level logging
       directive and in logger_subsys entries, where they override the top
       level settings.

       to_stderr

       to_logfile

       to_syslog
              These specify the destination of logging output. Any combination
              of these options may be specified. Valid options are yes and no.

              The default is syslog and stderr.

              Please note, if you are using to_logfile and want to rotate the
              file, use logrotate(8) with the option copytruncate, e.g.:

                     /var/log/corosync.log {
                             missingok
                             compress
                             notifempty
                             daily
                             rotate 7
                             copytruncate
                     }

       logfile
              If the to_logfile directive is set to yes, this option specifies
              the pathname of the log file.

              No default.

       logfile_priority
              This specifies the logfile priority for this particular
              subsystem. Ignored if debug is on. Possible values are: alert,
              crit, debug (same as debug = on), emerg, err, info, notice,
              warning.

              The default is: info.

       syslog_facility
              This specifies the syslog facility type that will be used for
              any messages sent to syslog. Options are daemon, local0, local1,
              local2, local3, local4, local5, local6 and local7.

              The default is daemon.

       syslog_priority
              This specifies the syslog level for this particular subsystem.
              Ignored if debug is on. Possible values are: alert, crit, debug
              (same as debug = on), emerg, err, info, notice, warning.

              The default is: info.

       debug  This specifies whether debug output is logged for this
              particular logger.

              The default is off.

       tags   This specifies which tags should be traced for this particular
              logger. Set the debug directive to on in order to enable tracing
              using tags. Values are specified using a vertical bar as a
              logical OR separator:

                     enter|leave|trace1|trace2|trace3|...

              The default is none.

       Within the logging directive, logger_subsys directives are optional.

       Within the logger_subsys sub-directive, all of the above logging
       configuration options are valid and can be used to override the
       default settings. The subsys entry, described below, is mandatory to
       identify the subsystem.

       subsys This specifies the subsystem identity (name) for which logging
              is specified. This is the name used by a service in the
              log_init() call, e.g. 'CKPT'. This directive is required.
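              How a logger_subsys entry combines with the top level logging
              settings can be sketched in Python as a dictionary merge:
              subsystem values win, the three top-level-only options are not
              taken from the subsystem, and the priority options are reported
              as debug while debug is on. The function name, the dictionary
              representation, and the way "ignored if debug is on" is modeled
              are illustrative assumptions, not corosync's implementation.

```python
TOP_LEVEL_ONLY = {"timestamp", "fileline", "function_name"}

# Defaults as listed on this page ("syslog and stderr" for destinations).
DEFAULTS = {
    "timestamp": "off",
    "fileline": "off",
    "function_name": "off",
    "to_stderr": "yes",
    "to_syslog": "yes",
    "to_logfile": "no",
    "logfile_priority": "info",
    "syslog_priority": "info",
    "syslog_facility": "daemon",
    "debug": "off",
}


def effective_logging(top: dict, subsys: dict) -> dict:
    """Resolve the settings one logger_subsys entry ends up with."""
    settings = {**DEFAULTS, **top}
    for key, value in subsys.items():
        if key == "subsys" or key in TOP_LEVEL_ONLY:
            continue  # identity key and top-level-only options are skipped
        settings[key] = value
    if settings["debug"] == "on":
        # Priorities are ignored when debug is on; report debug instead.
        settings["logfile_priority"] = "debug"
        settings["syslog_priority"] = "debug"
    return settings
```

              For example, a CKPT entry with debug set to on inherits
              timestamp from the top level but logs at debug priority.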

FILES
       /etc/corosync.conf
              The corosync executive configuration file.

SEE ALSO
       corosync_overview(8), logrotate(8)

corosync Man Page                 2006-03-28                  COROSYNC_CONF(5)