corosync.conf(5)

COROSYNC_CONF(5)  Corosync Cluster Engine Programmer's Manual COROSYNC_CONF(5)

NAME

       corosync.conf - corosync executive configuration file

SYNOPSIS

       /etc/corosync/corosync.conf

DESCRIPTION

       The corosync.conf file instructs the corosync executive about various
       parameters needed to control the corosync executive.  Empty lines and
       lines starting with the # character are ignored.  The configuration
       file consists of bracketed top level directives.  The possible
       directive choices are:

       totem { }
              This top level directive contains configuration options for
              the totem protocol.

       logging { }
              This top level directive contains configuration options for
              logging.

       quorum { }
              This top level directive contains configuration options for
              quorum.

       nodelist { }
              This top level directive contains configuration options for
              nodes in the cluster.

       system { }
              This top level directive contains configuration options
              related to the system.

       resources { }
              This top level directive contains configuration options for
              resources.

       nozzle { }
              This top level directive contains configuration options for a
              libnozzle device.

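       As a rough illustration of the layout described above, a
       configuration file might be organized as in the following sketch.
       The cluster name, node name, address and node identifier are
       hypothetical placeholders, and only a few of the top level directives
       are shown.

       # Skeleton only; all values below are assumptions for illustration
       totem {
               version: 2
               cluster_name: examplecluster
       }

       logging {
               to_syslog: yes
       }

       quorum {
               provider: corosync_votequorum
       }

       nodelist {
               node {
                       ring0_addr: 192.168.5.91
                       name: node1
                       nodeid: 1
               }
       }
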
       The interface sub-directive of totem is optional for UDP and knet
       transports.

       For knet, multiple interface subsections define parameters for each
       knet link on the system.

       For UDPU an interface section is not needed and it is recommended that
       the nodelist is used to define cluster nodes.

       linknumber
              This specifies the link number for the interface.  When using
              the knet protocol, each interface should specify a separate
              link number to uniquely identify to the membership protocol
              which interface to use for which link.  The linknumber must
              start at 0.  For UDP the only supported linknumber is 0.

       knet_link_priority
              This specifies the priority for the link when knet is used in
              'passive' mode (see link_mode below).

       knet_ping_interval
              This specifies the interval between knet link pings.
              knet_ping_interval and knet_ping_timeout are a pair: if one is
              specified, the other should be too; otherwise one will be
              calculated from the token timeout and one will be taken from
              the config file.  (default is token timeout /
              (knet_pong_count*2))

       knet_ping_timeout
              If no ping is received within this time, the knet link is
              declared dead.  knet_ping_interval and knet_ping_timeout are a
              pair: if one is specified, the other should be too; otherwise
              one will be calculated from the token timeout and one will be
              taken from the config file.  (default is token timeout /
              knet_pong_count)

       knet_ping_precision
              How many values of latency are used to calculate the average
              link latency.  (default 2048 samples)

       knet_pong_count
              How many valid ping/pongs are required before a link is marked
              UP.  (default 2)

       knet_transport
              Which IP transport knet should use.  Valid values are "sctp" or
              "udp".  (default: udp)

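       The knet link options above might be combined as in the following
       sketch of a two-link setup.  The ping values are not recommendations:
       they simply follow the documented default formulas for a 3000
       millisecond token timeout and a knet_pong_count of 2, that is
       3000 / (2*2) = 750 and 3000 / 2 = 1500.

       totem {
               version: 2
               transport: knet

               interface {
                       linknumber: 0
                       knet_transport: udp
                       knet_pong_count: 2
                       # Set both values of the pair together
                       knet_ping_interval: 750
                       knet_ping_timeout: 1500
               }

               interface {
                       linknumber: 1
                       knet_transport: udp
               }
       }
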
       bindnetaddr (udp only)
              This specifies the network address the corosync executive
              should bind to when using udp.

              bindnetaddr should be an IP address configured on the system,
              or a network address.

              For example, if the local interface is 192.168.5.92 with
              netmask 255.255.255.0, you should set bindnetaddr to
              192.168.5.92 or 192.168.5.0.  If the local interface is
              192.168.5.92 with netmask 255.255.255.192, set bindnetaddr to
              192.168.5.92 or 192.168.5.64, and so forth.

              This may also be an IPv6 address, in which case IPv6
              networking will be used.  In this case, the exact address must
              be specified and there is no automatic selection of the
              network interface within a specific subnet as with IPv4.

              If IPv6 networking is used, the nodeid field in nodelist must
              be specified.

       broadcast (udp only)
              This is optional and can be set to yes.  If it is set to yes,
              the broadcast address will be used for communication.  If this
              option is set, mcastaddr should not be set.

       mcastaddr (udp only)
              This is the multicast address used by the corosync executive.
              The default should work for most networks, but the network
              administrator should be queried about a multicast address to
              use.  Avoid 224.x.x.x because this is a "config" multicast
              address.

              This may also be an IPv6 multicast address, in which case IPv6
              networking will be used.  If IPv6 networking is used, the
              nodeid field in nodelist must be specified.

              It is not necessary to use this option if the cluster_name
              option is used.  If both options are used, mcastaddr has
              higher priority.

       mcastport (udp only)
              This specifies the UDP port number.  It is possible to use the
              same multicast address on a network with the corosync services
              configured for different UDP ports.  Please note that corosync
              uses two UDP ports: mcastport (for mcast receives) and
              mcastport - 1 (for mcast sends).  If you have multiple clusters
              on the same network using the same mcastaddr, please configure
              the mcastports with a gap.

       ttl (udp only)
              This specifies the Time To Live (TTL).  If you run your
              cluster on a routed network then the default of "1" will be
              too small.  This option provides a way to increase this up to
              255.  The valid range is 0..255.

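       For the udp transport, the options above might be used as in this
       sketch.  The network address reuses the 192.168.5.0 example given
       for bindnetaddr; the multicast address, port and ttl values are
       assumptions for illustration and should be agreed with the network
       administrator.

       totem {
               version: 2
               transport: udp

               interface {
                       linknumber: 0
                       bindnetaddr: 192.168.5.0
                       # Hypothetical multicast address and port
                       mcastaddr: 239.255.1.1
                       # corosync also uses mcastport - 1 (5404 here)
                       mcastport: 5405
                       # Raised from the default of 1 for a routed network
                       ttl: 2
               }
       }
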
       Within the totem directive, there are seven configuration options, of
       which one is required, five are optional, and one is required when
       IPv6 is configured in the interface subdirective.  The required
       option controls the version of the totem configuration.  The option
       that is optional unless IPv6 is used controls identification of the
       processor.  The optional options control secrecy and authentication,
       the network mode of operation and the maximum network MTU field.

       version
              This specifies the version of the configuration file.
              Currently the only valid version for this directive is 2.

       clear_node_high_bit
              This configuration option is optional and is only relevant
              when no nodeid is specified.  Some corosync clients require a
              signed 32 bit nodeid that is greater than zero; however, by
              default corosync uses all 32 bits of the IPv4 address space
              when generating a nodeid.  Set this option to yes to force the
              high bit to be zero and therefore ensure the nodeid is a
              positive signed 32 bit integer.

              WARNING: Cluster behavior is undefined if this option is
              enabled on only a subset of the cluster (for example during a
              rolling upgrade).

       crypto_model
              This specifies which cryptographic library should be used by
              knet.  Supported values depend on the libknet build and on the
              installed cryptography libraries.  Typically nss and openssl
              will be available but gcrypt and others could also be allowed.

              The default is nss.

       crypto_hash
              This specifies which HMAC authentication should be used to
              authenticate all messages.  Valid values are none (no
              authentication), md5, sha1, sha256, sha384 and sha512.
              Encrypted transmission is only supported for the knet
              transport.

              The default is none.

       crypto_cipher
              This specifies which cipher should be used to encrypt all
              messages.  Valid values are none (no encryption), aes256,
              aes192 and aes128.  Enabling crypto_cipher also requires
              enabling crypto_hash.  Encrypted transmission is only
              supported for the knet transport.

              The default is none.

       secauth
              This implies crypto_cipher=aes256 and crypto_hash=sha256,
              unless those options are explicitly set.  Encrypted
              transmission is only supported for the knet transport.

              The default is off.

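       A minimal sketch of explicit crypto settings for the knet transport
       follows.  The choice of openssl, aes256 and sha256 is an assumption
       for illustration; which crypto_model values are actually available
       depends on the libknet build.

       totem {
               version: 2
               transport: knet
               crypto_model: openssl
               # crypto_cipher requires crypto_hash to be enabled as well
               crypto_cipher: aes256
               crypto_hash: sha256
       }
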
       keyfile
              This specifies the fully qualified path to the shared key used
              to authenticate and encrypt data used within the Totem
              protocol.

              The default is /etc/corosync/authkey.

       key    Shared key stored in the configuration instead of in the
              authkey file.  This option has lower precedence than the
              keyfile option, so it is used only when keyfile is not
              specified.  Using this option is not recommended for security
              reasons.

       link_mode
              This specifies the Kronosnet mode, which may be passive,
              active, or rr (round-robin).

              passive: The active link with the highest priority (highest
              number) will be used.  If two or more links share the same
              priority, the one with the lowest link ID will be used.

              active: All active links will be used simultaneously to send
              traffic.  Link priority is ignored.

              rr: Round-Robin policy.  Each packet will be sent to the next
              active link in order.

              If only one interface directive is specified, passive is
              automatically chosen.

              The maximum number of interface directives that is allowed
              with Kronosnet is 8.  For other transports it is 1.

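       As an illustration of passive mode, the hypothetical priorities in
       the sketch below would make link 2 the preferred link, because the
       highest priority number is used first.  Links 0 and 1 share a
       priority, so of those two the one with the lower link ID (link 0)
       would be used.

       totem {
               version: 2
               transport: knet
               link_mode: passive

               interface {
                       linknumber: 0
                       knet_link_priority: 1
               }

               interface {
                       linknumber: 1
                       knet_link_priority: 1
               }

               interface {
                       linknumber: 2
                       knet_link_priority: 5
               }
       }
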
       netmtu This specifies the network maximum transmission unit.  Setting
              this value beyond 1500, the regular frame MTU, requires
              ethernet devices that support large (also called jumbo)
              frames.  If any device in the network doesn't support large
              frames, the protocol will not operate properly.  The hosts
              must also have their MTU size set from 1500 to whatever frame
              size is specified here.

              Please note that while some NICs or switches claim large frame
              support, they support 9000 MTU as the maximum frame size
              including the IP header.  Setting the netmtu and host MTUs to
              9000 will cause totem to use the full 9000 bytes of the frame.
              Then Linux will add an 18 byte header, moving the full frame
              size to 9018.  As a result some hardware will not operate
              properly with this size of data.  A netmtu of 8982 seems to
              work for the few large frame devices that have been tested.
              Some manufacturers claim large frame support when in fact they
              support frame sizes of 4500 bytes.

              When sending multicast traffic, if the network frequently
              reconfigures, chances are that some device in the network
              doesn't support large frames.

              Choose hardware carefully if intending to use large frame
              support.

              The default is 1500.

       transport
              This directive controls the transport mechanism used.  The
              default is knet.  The transport type can also be set to udpu
              or udp.  Only knet allows crypto or multiple interfaces per
              node.

       cluster_name
              This specifies the name of the cluster.  It is used for
              automatic generation of the multicast address.

       config_version
              This specifies the version of the config file.  It is
              converted to an unsigned 64-bit integer.  The default is 0.
              The option is used to prevent old nodes with an out-of-date
              configuration from joining.  If the value is not 0 and the
              node is moving for the first time from a single-node
              membership to a multiple-node membership (only the first time;
              a join after a split does not follow this rule), the
              config_versions of the other nodes are collected.  If the
              config_version of the current node is not equal to the highest
              of the collected versions, corosync is terminated.

       ip_version
              This specifies the version of IP to ask the DNS resolver for.
              The value can be one of ipv4 (look only for an IPv4 address),
              ipv6 (look only for an IPv6 address), ipv4-6 (look for all
              address families and use the first IPv4 address found in the
              list if there is one, otherwise use the first IPv6 address)
              and ipv6-4 (look for all address families and use the first
              IPv6 address found in the list if there is one, otherwise use
              the first IPv4 address).

              The default (if unspecified) is ipv6-4 for the knet and udpu
              transports and ipv4 for udp.

              The knet transport supports IPv4 and IPv6 addresses
              concurrently, provided they are consistent on each link.

       Within the totem directive, there are several configuration options
       which are used to control the operation of the protocol.  It is
       generally not recommended to change any of these values without proper
       guidance and sufficient testing.  Some networks may require larger
       values if suffering from frequent reconfigurations.  Some applications
       may require faster failure detection times which can be achieved by
       reducing the token timeout.

       token  This timeout is used directly or as a base for the real token
              timeout calculation (explained in the token_coefficient
              section).  The token timeout specifies the time in
              milliseconds until a token loss is declared after not
              receiving a token.  This is the time spent detecting a failure
              of a processor in the current configuration.  Reforming a new
              configuration takes about 50 milliseconds in addition to this
              timeout.

              The real token timeout used by totem can be read from the cmap
              key runtime.config.totem.token.

              Be careful to use the same timeout values on each of the nodes
              in the cluster or unpredictable results may occur.

              The default is 3000 milliseconds.

       token_warning
              Specifies the interval between warnings that the token has not
              been received.  The value is a percentage of the token timeout
              and can be set to 0 to disable warnings.

              The default is 75%.

       token_coefficient
              This value is used only when the nodelist section is specified
              and contains at least 3 nodes.  If so, the real token timeout
              is computed as token + (number_of_nodes - 2) *
              token_coefficient.  This allows the cluster to scale without
              manually changing the token timeout every time a new node is
              added.  This value can be set to 0, which effectively disables
              this feature.

              The default is 650 milliseconds.

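              As a worked example of the formula above, a nodelist with 5
              nodes and the default values would give a real token timeout
              of 3000 + (5 - 2) * 650 = 4950 milliseconds.  The sketch below
              only writes the documented defaults out explicitly; the node
              count of 5 is an assumption for illustration.

              totem {
                      version: 2
                      # With 5 nodes: 3000 + (5 - 2) * 650 = 4950 ms
                      token: 3000
                      token_coefficient: 650
              }
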
       token_retransmit
              This timeout specifies in milliseconds how long to wait for a
              token before the token is retransmitted.  This will be
              automatically calculated if token is modified.  It is not
              recommended to alter this value without guidance from the
              corosync community.

              The minimum is 30 milliseconds.  If this value is not set and
              an error occurs, make sure token /
              (token_retransmits_before_loss_const + 0.2) is more than 30.

              The default is 238 milliseconds for a two-node cluster.  For
              three or more nodes, see token_coefficient.

       knet_compression_model
              Type of compression used by Kronosnet.  Supported values
              depend on the libknet build and on the installed compression
              libraries.  Typically zlib and lz4 will be available but bzip2
              and others could also be allowed.  The default is 'none'.

       knet_compression_threshold
              Tells knet to NOT compress any packets that are smaller than
              the value indicated.  Default 100 bytes.

              Set to 0 to reset to the default.  Set to 1 to compress
              everything.

       knet_compression_level
              Many compression libraries allow tuning of compression
              parameters.  For example 0 or 1 ... 9 are commonly used to
              determine the level of compression.  This value is passed
              unmodified to the compression library so it is recommended to
              consult the library's documentation for more detailed
              information.

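       The three compression options might be set together as in this
       sketch.  zlib and level 5 are assumptions for illustration: the model
       must be supported by the local libknet build, and the meaning of the
       level is defined by the chosen compression library.

       totem {
               version: 2
               transport: knet
               knet_compression_model: zlib
               # Packets smaller than 100 bytes are not compressed
               knet_compression_threshold: 100
               knet_compression_level: 5
       }
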
       hold   This timeout specifies in milliseconds how long the token
              should be held by the representative when the protocol is
              under low utilization.  It is not recommended to alter this
              value without guidance from the corosync community.

              The default is 180 milliseconds.

       token_retransmits_before_loss_const
              This value identifies how many token retransmits should be
              attempted before forming a new configuration.  It is also used
              for token_retransmit and hold calculations.

              The default is 4 retransmissions.

       join   This timeout specifies in milliseconds how long to wait for
              join messages in the membership protocol.

              The default is 50 milliseconds.

       send_join
              This timeout specifies in milliseconds an upper range between
              0 and send_join to wait before sending a join message.  For
              configurations with fewer than 32 nodes, this parameter is not
              necessary.  For larger rings, this parameter is necessary to
              ensure the NIC is not overflowed with join messages on
              formation of a new ring.  A reasonable value for large rings
              (128 nodes) would be 80 milliseconds.  Other timer values must
              also change if this value is changed.  Seek advice from the
              corosync mailing list if trying to run larger configurations.

              The default is 0 milliseconds.

       consensus
              This timeout specifies in milliseconds how long to wait for
              consensus to be achieved before starting a new round of
              membership configuration.  The minimum value for consensus
              must be 1.2 * token.  This value will be automatically
              calculated at 1.2 * token if the user doesn't specify a
              consensus value.

              For two node clusters, a consensus larger than the join
              timeout but less than token is safe.  For three node or larger
              clusters, consensus should be larger than token.  There is an
              increasing risk of odd membership changes, which still
              guarantee virtual synchrony, as node count grows if consensus
              is less than token.

              The default is 3600 milliseconds.

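              For example, if token were raised to 5000 milliseconds (a
              value assumed here for illustration, and ignoring any
              token_coefficient adjustment), the automatically calculated
              consensus would be 1.2 * 5000 = 6000 milliseconds.  Setting it
              explicitly, as in this sketch, is optional:

              totem {
                      version: 2
                      token: 5000
                      # 1.2 * token = 6000
                      consensus: 6000
              }
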
       merge  This timeout specifies in milliseconds how long to wait before
              checking for a partition when no multicast traffic is being
              sent.  If multicast traffic is being sent, the merge detection
              happens automatically as a function of the protocol.

              The default is 200 milliseconds.

       downcheck
              This timeout specifies in milliseconds how long to wait before
              checking that a network interface is back up after it has been
              downed.

              The default is 1000 milliseconds.

       fail_recv_const
              This constant specifies how many rotations of the token
              without receiving any messages, when messages should have been
              received, may occur before a new configuration is formed.

              The default is 2500 failures to receive a message.

       seqno_unchanged_const
              This constant specifies how many rotations of the token
              without any multicast traffic should occur before the hold
              timer is started.

              The default is 30 rotations.

       heartbeat_failures_allowed
              [HeartBeating mechanism] Configures the optional HeartBeating
              mechanism for faster failure detection.  Keep in mind that
              engaging this mechanism in lossy networks could cause faulty
              loss declaration as the mechanism relies on the network for
              heartbeating.

              As a rule of thumb, use this mechanism if you require improved
              failure detection in low to medium utilized networks.

              This constant specifies the number of heartbeat failures the
              system should tolerate before declaring heartbeat failure,
              e.g. 3.  If this value is not set or is 0, the heartbeat
              mechanism is not engaged in the system and token rotation is
              the method of failure detection.

              The default is 0 (disabled).

       max_network_delay
              [HeartBeating mechanism] This constant specifies in
              milliseconds the approximate delay that your network takes to
              transport one packet from one machine to another.  This value
              is to be set by system engineers; do not change it if unsure,
              as it affects the failure detection mechanism using heartbeat.

              The default is 50 milliseconds.

       window_size
              This constant specifies the maximum number of messages that
              may be sent on one token rotation.  If all processors perform
              equally well, this value could be large (300), which would
              introduce higher latency from origination to delivery for very
              large rings.  To reduce latency in large rings (16+), the
              defaults are a safe compromise.  If one or more slow
              processors are present among fast processors, window_size
              should be no larger than 256000 / netmtu to avoid overflow of
              the kernel receive buffers.  The user is notified of this by
              the display of a retransmit list in the notification logs.
              There is no loss of data, but performance is reduced when
              these errors occur.

              The default is 50 messages.

       max_messages
              This constant specifies the maximum number of messages that
              may be sent by one processor on receipt of the token.  The
              max_messages parameter is limited to 256000 / netmtu to
              prevent overflow of the kernel transmit buffers.

              The default is 17 messages.

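       As a worked example of the 256000 / netmtu bound mentioned for
       window_size and max_messages: with the default netmtu of 1500 the
       bound is 256000 / 1500, or about 170 messages, so the defaults of 50
       and 17 are well inside it.  With a netmtu of 8982 the bound drops to
       about 28, so a cluster with some slow processors might lower
       window_size accordingly.  The values below are an illustrative
       sketch, not a recommendation.

       totem {
               version: 2
               netmtu: 8982
               # 256000 / 8982 is roughly 28.5
               window_size: 28
               max_messages: 17
       }
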
       miss_count_const
              This constant defines the maximum number of times on receipt
              of a token a message is checked for retransmission before a
              retransmission occurs.  This parameter is useful to modify for
              switches that delay multicast packets compared to unicast
              packets.  The default setting works well for nearly all modern
              switches.

              The default is 5 messages.

       knet_pmtud_interval
              How often the knet PMTUd runs to look for network MTU changes.
              Value in seconds, default: 30

       block_unlisted_ips
              Allow UDPU and KNET to drop packets from IP addresses that are
              not known to corosync (nodes which don't exist in the
              nodelist).  Value is yes or no.

              This feature is mainly to protect against the joining of nodes
              with outdated configurations after a cluster split.  Another
              use case is to allow the atomic merge of two independent
              clusters.

              Changing the default value is not recommended: the overhead is
              tiny and an existing cluster may fail if corosync is started
              on an unlisted node with an old configuration.

              The default value is yes.

       cancel_token_hold_on_retransmit
              Allows Corosync to hold the token by the representative when
              there are too many retransmit messages.  This allows the
              network to process increased load without overloading it.  The
              mechanism used is the same as described for the hold
              directive.

              Some deployments may prefer to never hold the token when there
              are retransmit messages.  If so, this option should be set to
              yes.

              The default value is no.

       Within the logging directive, there are several configuration options
       which are all optional.

       The following options are valid only for the top level logging
       directive:

       timestamp
              This specifies that a timestamp is placed on all log messages.
              It can be one of off (no timestamp), on (second precision
              timestamp) or hires (millisecond precision timestamp - only
              when supported by LibQB).

              The default is hires (or on if hires is not supported).

       fileline
              This specifies that file and line should be printed.

              The default is off.

       function_name
              This specifies that the code function name should be printed.

              The default is off.

       blackbox
              This specifies that blackbox functionality should be enabled.

              The default is on.

       The following options are valid for the top level logging directive
       and can be overridden in logger_subsys entries.

       to_stderr

       to_logfile

       to_syslog
              These specify the destination of logging output.  Any
              combination of these options may be specified.  Valid options
              are yes and no.

              The default is syslog and stderr.

              Please note, if you are using to_logfile and want to rotate
              the file, use logrotate(8) with the option copytruncate.  For
              example:
              /var/log/corosync.log {
                   missingok
                   compress
                   notifempty
                   daily
                   rotate 7
                   copytruncate
              }

       logfile
              If the to_logfile directive is set to yes, this option
              specifies the pathname of the log file.

              No default.

       logfile_priority
              This specifies the logfile priority for this particular
              subsystem.  Ignored if debug is on.  Possible values are:
              alert, crit, debug (same as debug = on), emerg, err, info,
              notice, warning.

              The default is: info.

       syslog_facility
              This specifies the syslog facility type that will be used for
              any messages sent to syslog.  Options are daemon, local0,
              local1, local2, local3, local4, local5, local6 and local7.

              The default is daemon.

       syslog_priority
              This specifies the syslog level for this particular subsystem.
              Ignored if debug is on.  Possible values are: alert, crit,
              debug (same as debug = on), emerg, err, info, notice, warning.

              The default is: info.

       debug  This specifies whether debug output is logged for this
              particular logger.  It can also be set to trace, which is the
              highest level of debug information.

              The default is off.

       Within the logging directive, logger_subsys directives are optional.

       Within the logger_subsys sub-directive, all of the above logging
       configuration options are valid and can be used to override the
       default settings.  The subsys entry, described below, is mandatory to
       identify the subsystem.

       subsys This specifies the subsystem identity (name) for which logging
              is specified.  This is the name used by a service in the
              log_init() call, e.g. 'CPG'.  This directive is required.

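       A logging section using the options above might look like the
       following sketch.  The log file path is the one used in the logrotate
       example; the destinations chosen and the debug override for the CPG
       subsystem are assumptions for illustration.

       logging {
               timestamp: on
               to_stderr: no
               to_logfile: yes
               logfile: /var/log/corosync.log
               to_syslog: yes
               syslog_facility: daemon
               debug: off

               # Override the top level setting for one subsystem only
               logger_subsys {
                       subsys: CPG
                       debug: on
               }
       }
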
       Within the quorum directive it is possible to specify the quorum
       algorithm to use with the

       provider
              directive.  At the time of writing only corosync_votequorum is
              supported.  See votequorum(5) for configuration options.

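       A minimal quorum section selecting the only provider named above
       would look like this:

       quorum {
               provider: corosync_votequorum
       }
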
       Within the nodelist directive it is possible to specify information
       about the nodes in the cluster.  The directive can contain only node
       sub-directives.  Every node that should be a member of the membership
       must be specified, along with any non-default options it needs.  Every
       node must have at least the ring0_addr field set.
716
717       Possible options are:
718
719       ringX_addr
              This specifies the IP address or network hostname of the
              particular node.  X is a link number.
722
723
       nodeid This configuration option is required for each node when the
              knet (Kronosnet) transport is used.  It is a 32-bit value
              specifying the node identifier delivered to the cluster
              membership service.  The node identifier value of zero is
              reserved and should not be used.
729
730
       name   This option is used mainly with the knet transport to identify
              the local node.  It is also used by client software
              (pacemaker).  The algorithm for identifying the local node is
              as follows:

              1.     Look up $HOSTNAME in the nodelist.

              2.     If this fails, strip the domain name from $HOSTNAME and
                     look that up in the nodelist.

              3.     If this fails, look in the nodelist for a fully-qualified
                     name whose short version matches the short version of
                     $HOSTNAME.

              4.     If all this fails, search the interfaces list for an
                     address that matches a name in the nodelist.
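
       To illustrate the nodelist options described above, a node entry for
       a cluster using two links might look like the following sketch.  The
       addresses and the node name are hypothetical placeholders, and
       ring1_addr is simply ringX_addr with X set to link number 1.

              nodelist {
                  node {
                      nodeid: 1
                      name: node1
                      ring0_addr: 10.24.38.101
                      ring1_addr: 192.168.50.101
                  }
              }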
746
747
748       Within the system directive it is possible to specify system options.
749
750       Possible options are:
751
752       qb_ipc_type
              This specifies the type of IPC to use.  Can be one of native
              (default), shm and socket.  Native means one of shm or socket,
              depending on what is supported by the OS.  On systems with
              support for both, SHM is selected.  SHM is generally faster,
              but needs to allocate a ring buffer file in /dev/shm.
758
759
760       sched_rr
              Should be set to yes (default) if corosync should try to set
              round robin realtime scheduling with maximal priority for
              itself.  If setting the scheduler fails, corosync falls back to
              setting maximal priority.
765
766
767       priority
              Set the priority of the corosync process.  Valid only when
              sched_rr is set to no.  Can be either a numeric value with a
              meaning similar to nice(1), or max / min meaning maximal /
              minimal priority (that is, minimal / maximal nice value).
772
773
774       move_to_root_cgroup
775              Can be one of yes (Corosync always moves itself to root cgroup),
776              no  (Corosync never tries to move itself to root cgroup) or auto
777              (Corosync first checks if sched_rr is enabled,  and  if  so,  it
778              tries to set round robin realtime scheduling with maximal prior‐
779              ity to itself.  If setting of priority fails, corosync tries  to
780              move itself to root cgroup and retries setting of priority).
781
782              This  feature is available only for systems with cgroups v1 with
783              RT sched enabled (Linux with  CONFIG_RT_GROUP_SCHED  kernel  op‐
784              tion) and cgroups v2.
785
              It's worth noting that currently (May 3 2021) cgroup2 doesn't
              yet support control of realtime processes, and the cpu
              controller can only be enabled when all RT processes are in the
              root cgroup (this applies only to kernels with
              CONFIG_RT_GROUP_SCHED enabled).  So when move_to_root_cgroup is
              disabled, the kernel is compiled with CONFIG_RT_GROUP_SCHED and
              systemd is used, it may be impossible to make systemd options
              like CPUQuota work correctly until corosync is stopped.
794
              Also, when moving to the root cgroup is enforced together with
              cgroup2 and systemd, it is impossible (most of the time) for
              journald to add systemd-specific metadata (most importantly
              _SYSTEMD_UNIT) properly, because corosync is moved out of the
              cgroup created by systemd.  This means it is not possible to
              filter corosync log messages based on this metadata (for
              example using -u or a _SYSTEMD_UNIT=UNIT pattern), and running
              systemctl status doesn't display (all) corosync log messages.
              The problem is made worse by the fact that journald caches the
              pid for some time (approx. 5 sec), so the initial corosync
              messages do have correct metadata.
806
807
808       allow_knet_handle_fallback
809              If knet handle creation fails using privileged operations, allow
810              fallback  to creating knet handle using unprivileged operations.
811              Defaults to no,  meaning  if  privileged  knet  handle  creation
812              fails, corosync will refuse to start.
813
              The knet handle will always be created using privileged
              operations if possible; setting this to yes only allows
              fallback to unprivileged operations.  This fallback may result
              in performance issues, but it may be required when running in
              an unprivileged environment, e.g. as a normal user or in an
              unprivileged container.
819
820
821       state_dir
822              Existing  directory  where  corosync should chdir into. Corosync
823              stores important state files and blackboxes there.
824
825              The default is /var/lib/corosync.
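
       The following fragment is a sketch of a system section that combines
       several of the options above.  The values are chosen only for
       illustration: it disables round robin scheduling so that the priority
       option takes effect, and it spells out the documented default state
       directory.

              system {
                  qb_ipc_type: socket
                  sched_rr: no
                  priority: max
                  move_to_root_cgroup: no
                  allow_knet_handle_fallback: no
                  state_dir: /var/lib/corosync
              }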
826
827
828       Within the resources directive it is possible to  specify  options  for
829       resources.
830
831       Possible option is:
832
833       watchdog_device
834              (Valid only if Corosync was compiled with watchdog support.)
835              Watchdog  device  to  use, for example /dev/watchdog.  If unset,
836              empty or "off", no watchdog is used.
837
838              In a cluster with properly configured power fencing  a  watchdog
839              provides  no additional value.  On the other hand, slow watchdog
840              communication may incur multi-second delays in the Corosync main
841              loop,  potentially breaking down membership.  IPMI watchdogs are
842              particularly  notorious  in  this  regard:   read   about   kip‐
843              mid_max_busy_us in IPMI.txt in the Linux kernel documentation.
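
       A resources section that enables the watchdog could look like the
       following sketch, which uses the example device path given above and
       assumes Corosync was compiled with watchdog support.

              resources {
                  watchdog_device: /dev/watchdog
              }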
844
845
846
847       Within  the  nozzle  directive  it is possible to specify options for a
848       libnozzle device. This is a pseudo ethernet device that routes  network
849       traffic  through a channel on the corosync knet network (NOT cpg or any
850       corosync internal service) to other nodes in the cluster.  This  allows
851       applications  to  take advantage of knet features such as multipathing,
852       automatic failover, link switching etc. Note that libnozzle  is  not  a
853       reliable transport, but you can tunnel TCP through it for reliable com‐
854       munications.
855       libnozzle also supports optional interface  up/down  scripts  that  are
856       kept under a /etc/corosync/updown.d/ directory. See the knet documenta‐
857       tion for more information.
858       Only one nozzle device is allowed.
859       The nozzle stanza takes several options:
860
861       name   The name of the network device to be created. On Linux this  may
862              be  any  name  at  all, other platforms have restrictions on the
863              name.
864
       ipaddr The IP address (IPv6 or IPv4) of the interface.  The bottom part
              of this address will be replaced by the local node's nodeid in
              conjunction with ipprefix.  For example, ipaddr: 192.168.1.0
              with ipprefix: 24 will make nodeids 1, 2 and 5 use IP addresses
              192.168.1.1, 192.168.1.2 and 192.168.1.5.  If a prefix length
              of 16 is used then the bottom two bytes will be filled in with
              the nodeid.  IPv6 addresses must end in '::'; the nodeid will
              be added after the two colons to make the local IP address.
              Only one IP address is currently supported in the corosync.conf
              file.  Additional IP addresses can be added in the ifup script
              if necessary.
876
877       ipprefix
              Specifies the IP address prefix length for the nozzle device
              (see above).
880
881       macaddr
              Specifies the MAC address prefix for the nozzle device.  As for
              the IP address, the bottom part of the MAC address will be
              filled in with the nodeid.  In this case no prefix length
              applies: the bottom two bytes of the MAC address will always be
              overwritten with the nodeid.  So specifying macaddr:
              54:54:12:24:12:12 on nodeid 1 will result in it having a MAC
              address of 54:54:12:24:00:01.
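
       Putting the nozzle options together, a stanza might look like the
       following sketch.  The device name noz01 is a hypothetical
       placeholder; the address values are the ones used in the option
       descriptions above.  With this stanza, the node with nodeid 1 would
       get IP address 192.168.1.1 and MAC address 54:54:12:24:00:01.

              nozzle {
                  name: noz01
                  ipaddr: 192.168.1.0
                  ipprefix: 24
                  macaddr: 54:54:12:24:12:12
              }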
889
890

TO ADD A NEW NODE TO THE CLUSTER

       For example, suppose you want to add a node with address 10.24.38.108
       and nodeid 3.  The node has the name NEW (in DNS or /etc/hosts) and is
       not currently running corosync.  The current corosync.conf nodelist
       looks like this:
895
              nodelist {
                  node {
                      nodeid: 1
                      ring0_addr: 10.24.38.101
                      name: node1
                  }
                  node {
                      nodeid: 2
                      ring0_addr: 10.24.38.102
                      name: node2
                  }
              }
909
910       Add  a  new  entry  for the node below the existing nodes. Node entries
911       don't have to be in nodeid order, but it will help keep  you  sane.  So
912       the nodelist now looks like this:
913
              nodelist {
                  node {
                      nodeid: 1
                      ring0_addr: 10.24.38.101
                      name: node1
                  }
                  node {
                      nodeid: 2
                      ring0_addr: 10.24.38.102
                      name: node2
                  }
                  node {
                      nodeid: 3
                      ring0_addr: 10.24.38.108
                      name: NEW
                  }
              }
933
       This file must then be copied onto all three nodes: the existing two
       nodes and the new one.  On one of the existing corosync nodes, tell
       corosync to re-read the updated config file into memory:
937
938              corosync-cfgtool -R
939
       This command only needs to be run on one node in the cluster.  You may
       then start corosync on the NEW node and it should join the cluster.
       If this doesn't work as expected, check that communication between all
       three nodes is working, and check the syslog files on all nodes for
       more information.  Note that the key piece of information about a
       node failing to join might be on a different node than you expect.
947
948

TO REMOVE A NODE FROM THE CLUSTER

       This is the reverse of the procedure for adding a node described
       above.  First shut down corosync on the node you will be removing
       from the cluster:
952
953              corosync-cfgtool -H
954
955
956
       Then delete that node's stanza from the nodelist in corosync.conf and
       finally update corosync on the remaining nodes by running
959
960              corosync-cfgtool -R
961
962       on one of them.
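
       Continuing with the three-node nodelist from the previous section as
       an illustration, removing the node named NEW would leave the
       following nodelist in corosync.conf on the remaining nodes:

              nodelist {
                  node {
                      nodeid: 1
                      ring0_addr: 10.24.38.101
                      name: node1
                  }
                  node {
                      nodeid: 2
                      ring0_addr: 10.24.38.102
                      name: node2
                  }
              }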
963
964

ADDRESS RESOLUTION

       corosync resolves ringX_addr names/IP addresses using the
       getaddrinfo(3) call, respecting the totem.ip_version setting.
968
       The getaddrinfo() function uses a sophisticated algorithm to sort node
       addresses into a preferred order, and corosync always chooses the
       first address in that list of the required family.  It is therefore
       essential that your DNS or /etc/hosts files are correctly configured
       so that all addresses for ringX appear on the same network (or are
       reachable with minimal hops) and over the same IP protocol.  If this
       is not the case, some nodes might not be able to join the cluster.  It
       is possible to override the search order used by getaddrinfo() using
       the configuration file /etc/gai.conf (see gai.conf(5)) if necessary,
       but this is not recommended.
978
979       If there is any doubt about the order of addresses returned from getad‐
980       drinfo() then it might be simpler to use IP addresses (v4 or v6) in the
981       ringX_addr field.
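
       As a sketch of the IP address approach, the following fragment pins
       the address family and uses a literal address so that no name lookup
       ordering is involved.  It assumes an IPv4-only cluster; the address is
       a placeholder and all other totem options are omitted.

              totem {
                  ip_version: ipv4
              }

              nodelist {
                  node {
                      nodeid: 1
                      ring0_addr: 10.24.38.101
                  }
              }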
982
983

FILES

985       /etc/corosync/corosync.conf
986              The corosync executive configuration file.
987
988