corosync.conf(5)

COROSYNC_CONF(5)  Corosync Cluster Engine Programmer's Manual COROSYNC_CONF(5)

NAME

       corosync.conf - corosync executive configuration file

SYNOPSIS

       /etc/corosync/corosync.conf

DESCRIPTION

       The corosync.conf file instructs the corosync executive about the
       various parameters needed to control it.  Empty lines and lines
       starting with the # character are ignored.  The configuration file
       consists of bracketed top level directives.  The possible directive
       choices are:


       totem { }
              This top level directive contains configuration options for the
              totem protocol.

       logging { }
              This top level directive contains configuration options for
              logging.

       quorum { }
              This top level directive contains configuration options for
              quorum.

       nodelist { }
              This top level directive contains configuration options for the
              nodes in the cluster.

       qb { } This top level directive contains configuration options related
              to libqb.

       resources { }
              This top level directive contains configuration options for
              resources.


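       As a rough illustration of this layout, the skeleton below shows each
       top level directive as an empty block.  The comment lines are
       annotations added for this sketch.  A usable file must at least fill in
       the totem directive as described in the rest of this page.

```
# Totem protocol options
totem {
}

# Logging options
logging {
}

# Quorum options
quorum {
}

# Nodes in the cluster
nodelist {
}

# libqb options
qb {
}

# Resource options
resources {
}
```
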
       Within the totem directive, an interface directive is required.  There
       is also one configuration option which is required: version, which is
       described after the interface parameters.

       Within the interface sub-directive of totem there are four parameters
       which are required.  There is one parameter which is optional.


       ringnumber
              This specifies the ring number for the interface.  When using
              the redundant ring protocol, each interface should specify a
              separate ring number to uniquely identify to the membership
              protocol which interface to use for which redundant ring.  The
              ringnumber must start at 0.


       bindnetaddr
              This specifies the network address the corosync executive should
              bind to.

              bindnetaddr should be an IP address configured on the system, or
              a network address.

              For example, if the local interface is 192.168.5.92 with netmask
              255.255.255.0, you should set bindnetaddr to 192.168.5.92 or
              192.168.5.0.  If the local interface is 192.168.5.92 with
              netmask 255.255.255.192, set bindnetaddr to 192.168.5.92 or
              192.168.5.64, and so forth.

              This may also be an IPv6 address, in which case IPv6 networking
              will be used.  In this case, the exact address must be specified
              and there is no automatic selection of the network interface
              within a specific subnet as with IPv4.

              If IPv6 networking is used, the nodeid field in nodelist must be
              specified.


       broadcast
              This is optional and can be set to yes.  If it is set to yes,
              the broadcast address will be used for communication.  If this
              option is set, mcastaddr should not be set.


       mcastaddr
              This is the multicast address used by the corosync executive.
              The default should work for most networks, but the network
              administrator should be asked which multicast address to use.
              Avoid 224.x.x.x because this is a "config" multicast address.

              This may also be an IPv6 multicast address, in which case IPv6
              networking will be used.  If IPv6 networking is used, the nodeid
              field in nodelist must be specified.

              This option is not needed if the cluster_name option is used.
              If both options are used, mcastaddr has higher priority.


       mcastport
              This specifies the UDP port number.  It is possible to use the
              same multicast address on a network with the corosync services
              configured for different UDP ports.  Please note that corosync
              uses two UDP ports: mcastport (for mcast receives) and
              mcastport - 1 (for mcast sends).  If you have multiple clusters
              on the same network using the same mcastaddr, configure the
              mcastports with a gap.


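              As a hedged illustration of the port gap, the two fragments
              below are interface sections from two separate clusters that
              share one multicast address.  The addresses and port numbers are
              made up for this example.  The first cluster uses ports 5405 and
              5404, the second uses 5407 and 5406, so the port pairs do not
              overlap.

```
# Cluster A, inside its totem directive: uses UDP ports 5405 and 5404
interface {
    ringnumber: 0
    bindnetaddr: 192.168.5.0
    mcastaddr: 239.255.1.1
    mcastport: 5405
}

# Cluster B, inside its totem directive: uses UDP ports 5407 and 5406
interface {
    ringnumber: 0
    bindnetaddr: 192.168.5.0
    mcastaddr: 239.255.1.1
    mcastport: 5407
}
```
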
       ttl    This specifies the Time To Live (TTL).  If you run your cluster
              on a routed network then the default of "1" will be too small.
              This option provides a way to increase this up to 255.  The
              valid range is 0..255.  Note that this is only valid on
              multicast transport types.


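       The interface parameters above might be combined as in the following
       sketch.  The addresses are illustrative (the bindnetaddr value reuses
       the 192.168.5.0 network from the bindnetaddr description), and version
       is included because the totem directive requires it.

```
totem {
    version: 2

    interface {
        ringnumber: 0
        bindnetaddr: 192.168.5.0
        mcastaddr: 239.255.1.1
        mcastport: 5405
        ttl: 1
    }
}
```
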
       Within the totem directive, there are seven configuration options of
       which one is required, five are optional, and one is required when IPv6
       is configured in the interface subdirective.  The required option
       controls the version of the totem configuration.  The option that is
       optional unless IPv6 is used controls identification of the processor.
       The optional options control secrecy and authentication, the redundant
       ring mode of operation and the maximum network MTU field.


       version
              This specifies the version of the configuration file.  Currently
              the only valid version for this directive is 2.


       clear_node_high_bit
              This configuration option is optional and is only relevant when
              no nodeid is specified.  Some corosync clients require a signed
              32 bit nodeid that is greater than zero, however by default
              corosync uses all 32 bits of the IPv4 address space when
              generating a nodeid.  Set this option to yes to force the high
              bit to be zero and therefore ensure the nodeid is a positive
              signed 32 bit integer.

              WARNING: The cluster's behavior is undefined if this option is
              enabled on only a subset of the cluster (for example during a
              rolling upgrade).


       crypto_hash
              This specifies which HMAC authentication should be used to
              authenticate all messages.  Valid values are none (no
              authentication), md5, sha1, sha256, sha384 and sha512.

              The default is sha1.


       crypto_cipher
              This specifies which cipher should be used to encrypt all
              messages.  Valid values are none (no encryption), aes256,
              aes192, aes128 and 3des.  Enabling crypto_cipher also requires
              enabling crypto_hash.

              The default is aes256.


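              For illustration, the two options might be set together as in
              this sketch.  The choice of sha256 is only an example taken from
              the list of valid values; it is there because enabling a cipher
              requires a hash to be enabled as well.  The interface directive
              is omitted for brevity.

```
totem {
    version: 2
    crypto_cipher: aes256
    crypto_hash: sha256
}
```
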
       secauth
              This specifies that HMAC/SHA1 authentication should be used to
              authenticate all messages.  It further specifies that all data
              should be encrypted with the nss library and the aes256
              encryption algorithm to protect data from eavesdropping.

              Enabling this option adds an encryption header to every message
              sent by totem, which reduces total throughput.  Encryption and
              authentication also consume extra CPU cycles in corosync.

              The default is on.

              WARNING: This parameter is deprecated.  It is recommended to use
              a combination of crypto_cipher and crypto_hash instead.


       rrp_mode
              This specifies the mode of redundant ring, which may be none,
              active, or passive.  Currently only 'passive' is supported or
              tested (using 'active' is not recommended).  Active replication
              offers slightly lower latency from transmit to delivery in
              faulty network environments but with less performance.  Passive
              replication may nearly double the speed of the totem protocol if
              the protocol doesn't become CPU bound.  The final option is
              none, in which case only one network interface will be used to
              operate the totem protocol.

              If only one interface directive is specified, none is
              automatically chosen.  If multiple interface directives are
              specified, only active or passive may be chosen.

              The maximum number of interface directives that is allowed for
              either mode (active or passive) is 2.

              When using multiple interfaces, make sure to use a different
              multicast address/port pair for each interface (ports for the
              same address must differ by at least two) for rrp to work.
              This is checked by the parser.


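              A two-ring setup in passive mode might look like the following
              sketch.  The networks and multicast addresses are invented for
              this example; each interface gets its own ring number and its
              own multicast address/port pair.

```
totem {
    version: 2
    rrp_mode: passive

    interface {
        ringnumber: 0
        bindnetaddr: 192.168.5.0
        mcastaddr: 239.255.1.1
        mcastport: 5405
    }

    interface {
        ringnumber: 1
        bindnetaddr: 10.0.5.0
        mcastaddr: 239.255.2.1
        mcastport: 5407
    }
}
```
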
       netmtu This specifies the network maximum transmit unit.  Setting this
              value beyond 1500, the regular frame MTU, requires ethernet
              devices that support large, also called jumbo, frames.  If any
              device in the network doesn't support large frames, the
              protocol will not operate properly.  The hosts must also have
              their MTU size set from 1500 to whatever frame size is specified
              here.

              Please note that while some NICs or switches claim large frame
              support, they support 9000 MTU as the maximum frame size
              including the IP header.  Setting the netmtu and host MTUs to
              9000 will cause totem to use the full 9000 bytes of the frame.
              Then Linux will add an 18 byte header, moving the full frame
              size to 9018.  As a result some hardware will not operate
              properly with this size of data.  A netmtu of 8982 seems to work
              for the few large frame devices that have been tested.  Some
              manufacturers claim large frame support when in fact they
              support frame sizes of 4500 bytes.

              When sending multicast traffic, if the network frequently
              reconfigures, chances are that some device in the network
              doesn't support large frames.

              Choose hardware carefully if intending to use large frame
              support.

              The default is 1500.


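              The arithmetic behind the 8982 figure is 9000 - 18 = 8982, which
              leaves room for the 18 byte header inside a 9000 byte frame.  A
              sketch of the corresponding setting, assuming every device on
              the path really supports 9000 byte frames (the interface
              directive is omitted for brevity):

```
totem {
    version: 2
    # 8982 + 18 byte header = 9000 byte frame
    netmtu: 8982
}
```
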
       transport
              This directive controls the transport mechanism used.  If the
              interface to which corosync is binding is an RDMA interface such
              as RoCEE or Infiniband, the "iba" parameter may be specified.
              To avoid the use of multicast entirely, the unicast transport
              parameter "udpu" can be specified.  This requires specifying,
              before deployment, the list of members that could potentially
              make up the membership in the nodelist directive.

              The default is udp.  The transport type can also be set to udpu
              or iba.


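              A unicast setup could be sketched as follows.  The node
              addresses are placeholders; every node that may join the
              membership is listed in nodelist, as udpu requires.

```
totem {
    version: 2
    transport: udpu

    interface {
        ringnumber: 0
        bindnetaddr: 192.168.5.0
        mcastport: 5405
    }
}

nodelist {
    node {
        ring0_addr: 192.168.5.91
    }

    node {
        ring0_addr: 192.168.5.92
    }
}
```
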
       cluster_name
              This specifies the name of the cluster.  It is used for
              automatic generation of the multicast address.


       config_version
              This specifies the version of the config file.  It is converted
              to an unsigned 64-bit integer.  The default is 0.  The option is
              used to prevent nodes with an out-of-date configuration from
              joining.  If the value is not 0 and the node is going from a
              single-node membership to a multiple-node membership for the
              first time (only the first time; a join after a split doesn't
              follow these rules), the config_versions of the other nodes are
              collected.  If the config_version of the current node is not
              equal to the highest of the collected versions, corosync is
              terminated.


       ip_version
              This specifies the version of IP to use for communication.  The
              value can be one of ipv4 or ipv6.  The default (if unspecified)
              is ipv4.


       Within the totem directive, there are several configuration options
       which are used to control the operation of the protocol.  It is
       generally not recommended to change any of these values without proper
       guidance and sufficient testing.  Some networks may require larger
       values if suffering from frequent reconfigurations.  Some applications
       may require faster failure detection times which can be achieved by
       reducing the token timeout.


       token  This timeout is used directly or as a base for the real token
              timeout calculation (explained in the token_coefficient
              section).  The token timeout specifies the time in milliseconds
              until a token loss is declared after not receiving a token.
              This is the time spent detecting a failure of a processor in the
              current configuration.  Reforming a new configuration takes
              about 50 milliseconds in addition to this timeout.

              The real token timeout used by totem can be read from the cmap
              key runtime.config.token.

              The default is 1000 milliseconds.


       token_coefficient
              This value is used only when the nodelist section is specified
              and contains at least 3 nodes.  If so, the real token timeout is
              computed as token + (number_of_nodes - 2) * token_coefficient.
              This allows the cluster to scale without manually changing the
              token timeout every time a new node is added.  This value can be
              set to 0, which effectively disables this feature.

              The default is 650 milliseconds.


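              As a worked example of the formula, assume a nodelist with 5
              nodes and the settings below (the node count and the token
              value are made up for illustration).  The real token timeout is
              then 3000 + (5 - 2) * 650 = 4950 milliseconds.

```
totem {
    version: 2
    token: 3000
    # with 5 nodes: 3000 + (5 - 2) * 650 = 4950 milliseconds
    token_coefficient: 650
}
```
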
       token_retransmit
              This timeout specifies in milliseconds how long to wait for a
              token before the token is retransmitted.  This will be
              automatically calculated if token is modified.  It is not
              recommended to alter this value without guidance from the
              corosync community.

              The default is 238 milliseconds.


       hold   This timeout specifies in milliseconds how long the token should
              be held by the representative when the protocol is under low
              utilization.  It is not recommended to alter this value without
              guidance from the corosync community.

              The default is 180 milliseconds.


       token_retransmits_before_loss_const
              This value identifies how many token retransmits should be
              attempted before forming a new configuration.  If this value is
              set, retransmit and hold will be automatically calculated from
              retransmits_before_loss and token.

              The default is 4 retransmissions.


       join   This timeout specifies in milliseconds how long to wait for join
              messages in the membership protocol.

              The default is 50 milliseconds.


       send_join
              This timeout specifies in milliseconds an upper range between 0
              and send_join to wait before sending a join message.  For
              configurations with fewer than 32 nodes, this parameter is not
              necessary.  For larger rings, this parameter is necessary to
              ensure the NIC is not overflowed with join messages on formation
              of a new ring.  A reasonable value for large rings (128 nodes)
              would be 80 milliseconds.  Other timer values must also change
              if this value is changed.  Seek advice from the corosync mailing
              list if trying to run larger configurations.

              The default is 0 milliseconds.


       consensus
              This timeout specifies in milliseconds how long to wait for
              consensus to be achieved before starting a new round of
              membership configuration.  The minimum value for consensus must
              be 1.2 * token.  This value will be automatically calculated at
              1.2 * token if the user doesn't specify a consensus value.

              For two node clusters, a consensus larger than the join timeout
              but less than token is safe.  For three node or larger clusters,
              consensus should be larger than token.  There is an increasing
              risk of odd membership changes, which still guarantee virtual
              synchrony, as node count grows if consensus is less than token.

              The default is 1200 milliseconds.


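              To illustrate the 1.2 * token rule with made-up numbers: for a
              token of 5000 milliseconds, 1.2 * 5000 = 6000, so an explicitly
              configured consensus would be 6000 milliseconds or more.
              Leaving consensus out lets corosync calculate the value itself.

```
totem {
    version: 2
    token: 5000
    # 1.2 * 5000 = 6000
    consensus: 6000
}
```
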
       merge  This timeout specifies in milliseconds how long to wait before
              checking for a partition when no multicast traffic is being
              sent.  If multicast traffic is being sent, the merge detection
              happens automatically as a function of the protocol.

              The default is 200 milliseconds.


       downcheck
              This timeout specifies in milliseconds how long to wait before
              checking that a network interface is back up after it has been
              downed.

              The default is 1000 milliseconds.


       fail_recv_const
              This constant specifies how many rotations of the token without
              receiving any of the messages when messages should be received
              may occur before a new configuration is formed.

              The default is 2500 failures to receive a message.


       seqno_unchanged_const
              This constant specifies how many rotations of the token without
              any multicast traffic should occur before the hold timer is
              started.

              The default is 30 rotations.


       heartbeat_failures_allowed
              [HeartBeating mechanism] Configures the optional HeartBeating
              mechanism for faster failure detection.  Keep in mind that
              engaging this mechanism in lossy networks could cause faulty
              loss declaration as the mechanism relies on the network for
              heartbeating.

              As a rule of thumb, use this mechanism if you require improved
              failure detection in low to medium utilized networks.

              This constant specifies the number of heartbeat failures the
              system should tolerate before declaring heartbeat failure, e.g.
              3.  If this value is not set or is 0, the heartbeat mechanism is
              not engaged in the system and token rotation is the method of
              failure detection.

              The default is 0 (disabled).


       max_network_delay
              [HeartBeating mechanism] This constant specifies in milliseconds
              the approximate delay that your network takes to transport one
              packet from one machine to another.  This value is to be set by
              system engineers; do not change it if unsure, as it affects the
              failure detection mechanism using heartbeat.

              The default is 50 milliseconds.


       window_size
              This constant specifies the maximum number of messages that may
              be sent on one token rotation.  If all processors perform
              equally well, this value could be large (300), which would
              introduce higher latency from origination to delivery for very
              large rings.  To reduce latency in large rings (16+), the
              defaults are a safe compromise.  If 1 or more slow processor(s)
              are present among fast processors, window_size should be no
              larger than 256000 / netmtu to avoid overflow of the kernel
              receive buffers.  The user is notified of this by the display of
              a retransmit list in the notification logs.  There is no loss of
              data, but performance is reduced when these errors occur.

              The default is 50 messages.


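              As a worked example of the 256000 / netmtu bound: with the
              default netmtu of 1500 the bound is 256000 / 1500, about 170, so
              the default window_size of 50 is within it.  With a netmtu of
              8982 the bound drops to 256000 / 8982, about 28, so a cluster
              with slow processors might lower window_size as sketched below.
              The values are illustrative, not tuning advice.

```
totem {
    version: 2
    netmtu: 8982
    # 256000 / 8982 is about 28.5, rounded down to 28
    window_size: 28
}
```
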
       max_messages
              This constant specifies the maximum number of messages that may
              be sent by one processor on receipt of the token.  The
              max_messages parameter is limited to 256000 / netmtu to prevent
              overflow of the kernel transmit buffers.

              The default is 17 messages.


       miss_count_const
              This constant defines the maximum number of times on receipt of
              a token a message is checked for retransmission before a
              retransmission occurs.  This parameter is useful to modify for
              switches that delay multicast packets compared to unicast
              packets.  The default setting works well for nearly all modern
              switches.

              The default is 5 messages.


       rrp_problem_count_timeout
              This specifies the time in milliseconds to wait before
              decrementing the problem count by 1 for a particular ring to
              ensure a link is not marked faulty for transient network
              failures.

              The default is 2000 milliseconds.


       rrp_problem_count_threshold
              This specifies the number of times a problem is detected with a
              link before setting the link faulty.  Once a link is set faulty,
              no more data is transmitted upon it.  Also, the problem counter
              is no longer decremented when the problem count timeout expires.

              A problem is detected whenever all tokens from the preceding
              processor have not been received within the
              rrp_token_expired_timeout.  The rrp_problem_count_threshold *
              rrp_token_expired_timeout should be at least 50 milliseconds
              less than the token timeout, or a complete reconfiguration may
              occur.

              The default is 10 problem counts.


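              A quick check of this rule with the documented defaults:
              10 * 47 = 470 milliseconds, and the default token timeout of
              1000 milliseconds minus 50 is 950, so 470 stays below the limit.
              The fragment below merely writes those defaults out explicitly;
              it is a sketch for checking the arithmetic, not a recommendation
              to set them.

```
totem {
    version: 2
    token: 1000
    rrp_problem_count_threshold: 10
    rrp_token_expired_timeout: 47
    # 10 * 47 = 470, which is below 1000 - 50 = 950
}
```
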
       rrp_problem_count_mcast_threshold
              This specifies the number of times a problem is detected with
              multicast before setting the link faulty for passive rrp mode.
              This variable is unused in active rrp mode.

              The default is 10 times rrp_problem_count_threshold.


       rrp_token_expired_timeout
              This specifies the time in milliseconds to increment the problem
              counter for the redundant ring protocol after not having
              received a token from all rings for a particular processor.

              This value will automatically be calculated from the token
              timeout and problem_count_threshold but may be overridden.  It
              is not recommended to override this value without guidance from
              the corosync community.

              The default is 47 milliseconds.


       rrp_autorecovery_check_timeout
              This specifies the time in milliseconds to check if the failed
              ring can be auto-recovered.

              The default is 1000 milliseconds.


       Within the logging directive, there are several configuration options
       which are all optional.


       The following options are valid only for the top level logging
       directive:


       timestamp
              This specifies that a timestamp is placed on all log messages.

              The default is off.


       fileline
              This specifies that file and line should be printed.

              The default is off.


       function_name
              This specifies that the code function name should be printed.

              The default is off.


       blackbox
              This specifies that blackbox functionality should be enabled.

              The default is on.


       The following options are valid for the top level logging directive
       and can be overridden in logger_subsys entries.


       to_stderr

       to_logfile

       to_syslog
              These specify the destination of logging output.  Any
              combination of these options may be specified.  Valid options
              are yes and no.

              The default is syslog and stderr.

              Please note, if you are using to_logfile and want to rotate the
              file, use logrotate(8) with the option copytruncate.  For
              example:

              /var/log/corosync.log {
                   missingok
                   compress
                   notifempty
                   daily
                   rotate 7
                   copytruncate
              }


       logfile
              If the to_logfile directive is set to yes, this option
              specifies the pathname of the log file.

              No default.


       logfile_priority
              This specifies the logfile priority for this particular
              subsystem.  Ignored if debug is on.  Possible values are: alert,
              crit, debug (same as debug = on), emerg, err, info, notice,
              warning.

              The default is: info.


       syslog_facility
              This specifies the syslog facility type that will be used for
              any messages sent to syslog.  Options are daemon, local0,
              local1, local2, local3, local4, local5, local6 and local7.

              The default is daemon.


       syslog_priority
              This specifies the syslog level for this particular subsystem.
              Ignored if debug is on.  Possible values are: alert, crit, debug
              (same as debug = on), emerg, err, info, notice, warning.

              The default is: info.


       debug  This specifies whether debug output is logged for this
              particular logger.  It can also be set to trace, which is the
              highest level of debug information.

              The default is off.


       Within the logging directive, logger_subsys directives are optional.


       Within the logger_subsys sub-directive, all of the above logging
       configuration options are valid and can be used to override the default
       settings.  The subsys entry, described below, is mandatory to identify
       the subsystem.


       subsys This specifies the subsystem identity (name) for which logging
              is specified.  This is the name used by a service in the
              log_init() call, e.g. 'CPG'.  This directive is required.


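       Putting the logging options together, a configuration might look like
       the sketch below.  The log file path matches the one in the logrotate
       example and the CPG subsystem name is the one mentioned under subsys;
       which options to enable is an arbitrary choice for illustration.

```
logging {
    timestamp: on
    to_stderr: no
    to_logfile: yes
    logfile: /var/log/corosync.log
    to_syslog: yes
    syslog_facility: daemon
    debug: off

    # Override the top level debug setting for one subsystem
    logger_subsys {
        subsys: CPG
        debug: on
    }
}
```
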
       Within the quorum directive it is possible to specify the quorum
       algorithm to use with the provider directive.


       provider
              At the time of writing only corosync_votequorum is supported.
              See votequorum(5) for configuration options.


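              A minimal sketch of this directive, using the only provider
              named above:

```
quorum {
    provider: corosync_votequorum
}
```
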
       Within the nodelist directive it is possible to specify specific
       information about the nodes in the cluster.  The directive can contain
       only the node sub-directive, which specifies every node that should be
       a member of the membership, and where non-default options are needed.
       Every node must have at least the ring0_addr field filled.

       For UDPU, every node that should be a member of the membership must be
       specified.

       Possible options are:

       ringX_addr
              This specifies the IP address of one of the nodes.  X is the
              ring number.


       nodeid This configuration option is optional when using IPv4 and
              required when using IPv6.  This is a 32 bit value specifying the
              node identifier delivered to the cluster membership service.  If
              this is not specified with IPv4, the node id will be determined
              from the 32 bit IP address to which the system is bound with
              ring identifier of 0.  The node identifier value of zero is
              reserved and should not be used.


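       A two-node nodelist could be sketched as follows.  The addresses and
       node identifiers are placeholders; the identifiers start at 1 because
       zero is reserved.

```
nodelist {
    node {
        ring0_addr: 192.168.5.91
        nodeid: 1
    }

    node {
        ring0_addr: 192.168.5.92
        nodeid: 2
    }
}
```
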
       Within the qb directive it is possible to specify options for libqb.

       Possible option is:

       ipc_type
              This specifies the type of IPC to use.  It can be one of native
              (default), shm and socket.  Native means one of shm or socket,
              depending on what is supported by the OS.  On systems with
              support for both, SHM is selected.  SHM is generally faster, but
              needs to allocate a ring buffer file in /dev/shm.


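              For example, a system where allocating files in /dev/shm is
              undesirable might select socket IPC explicitly.  This is a
              sketch of one possible reason to set the option, not a
              recommendation.

```
qb {
    ipc_type: socket
}
```
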
       Within the resources directive it is possible to specify options for
       resources.

       Possible option is:

       watchdog_device
              (Valid only if Corosync was compiled with watchdog support.)
              Watchdog device to use.  The default value is /dev/watchdog.
              The special value "off" disables watchdog usage.

              In a cluster with properly configured power fencing a watchdog
              provides no additional value.  On the other hand, slow watchdog
              communication may incur multi-second delays in the Corosync main
              loop, potentially breaking down membership.  IPMI watchdogs are
              particularly notorious in this regard: read about
              kipmid_max_busy_us in IPMI.txt in the Linux kernel
              documentation.

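              A cluster that relies on power fencing might therefore turn the
              watchdog off, as in this sketch:

```
resources {
    watchdog_device: off
}
```
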

FILES

       /etc/corosync/corosync.conf
              The corosync executive configuration file.
