CTDB(7)                  CTDB - clustered TDB database                 CTDB(7)

NAME
       ctdb - Clustered TDB

DESCRIPTION
       CTDB is a clustered database component in clustered Samba that
       provides a high-availability load-sharing CIFS server cluster.

       The main functions of CTDB are:

       ·   Provide a clustered version of the TDB database with automatic
           rebuild/recovery of the databases upon node failures.

       ·   Monitor nodes in the cluster and services running on each node.

       ·   Manage a pool of public IP addresses that are used to provide
           services to clients. Alternatively, CTDB can be used with LVS.

       Combined with a cluster filesystem, CTDB provides a full
       high-availability (HA) environment for services such as clustered
       Samba, NFS and other services.

CLUSTER NODES
       A CTDB cluster is a collection of nodes with 2 or more network
       interfaces. All nodes provide network (usually file/NAS) services to
       clients. Data served by file services is stored on shared storage
       (usually a cluster filesystem) that is accessible by all nodes.

       CTDB provides an "all active" cluster, where services are load
       balanced across all nodes.

RECOVERY LOCK
       CTDB uses a recovery lock to avoid a split brain, where a cluster
       becomes partitioned and each partition attempts to operate
       independently. Issues that can result from a split brain include file
       data corruption, because file locking metadata may not be tracked
       correctly.

       CTDB uses a cluster leader and follower model of cluster management.
       All nodes in a cluster elect one node to be the leader. The leader
       node coordinates privileged operations such as database recovery and
       IP address failover. CTDB refers to the leader node as the recovery
       master. This node takes and holds the recovery lock to assert its
       privileged role in the cluster.

       By default, the recovery lock is implemented using a file (specified
       by CTDB_RECOVERY_LOCK) residing in shared storage (usually) on a
       cluster filesystem. To support a recovery lock the cluster filesystem
       must support lock coherence. See ping_pong(1) for more details.

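       Lock coherence can be checked with the ping_pong(1) tool before CTDB
       is deployed. The following is a minimal sketch: the file path is only
       an example, and the lock count is normally the number of nodes plus
       one.

           # Run simultaneously on every node of a 4-node cluster, against a
           # file on the shared cluster filesystem (example path).
           ping_pong /clusterfs/ctdb-lock-test 5
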
       The recovery lock can also be implemented using an arbitrary cluster
       mutex call-out by using an exclamation point ('!') as the first
       character of CTDB_RECOVERY_LOCK. For example, a value of
       !/usr/bin/myhelper recovery would run the given helper with the
       specified arguments. See the source code relating to cluster mutexes
       for clues about writing call-outs.

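       As a rough illustration only, a call-out could be a small script that
       takes an exclusive lock on a shared resource. The protocol sketched
       below (print "0" once the lock is held and keep running; print "1" on
       contention) is an assumption - verify it against the cluster mutex
       code in the CTDB source before relying on it.

           #!/bin/sh
           # Hypothetical cluster mutex call-out sketch using flock(1).
           # $1 is a lock file visible to all nodes (an assumption).
           lockfile="$1"
           exec 9>"$lockfile" || exit 1
           if flock -n 9; then
               echo 0    # assumed "lock taken" indication
               # Keep running to hold the lock; CTDB terminates the
               # helper to release it.
               while :; do sleep 3600; done
           else
               echo 1    # assumed "lock contended" indication
           fi
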
       If a cluster becomes partitioned (for example, due to a communication
       failure) and a different recovery master is elected by the nodes in
       each partition, then only one of these recovery masters will be able
       to take the recovery lock. The recovery master in the "losing"
       partition will not be able to take the recovery lock and will be
       excluded from the cluster. The nodes in the "losing" partition will
       elect each node in turn as their recovery master so eventually all
       the nodes in that partition will be excluded.

       CTDB does sanity checks to ensure that the recovery lock is held as
       expected.

       CTDB can run without a recovery lock but this is not recommended as
       there will be no protection from split brains.

PRIVATE VS PUBLIC ADDRESSES
       Each node in a CTDB cluster has multiple IP addresses assigned to it:

       ·   A single private IP address that is used for communication
           between nodes.

       ·   One or more public IP addresses that are used to provide NAS or
           other services.

   Private address
       Each node is configured with a unique, permanently assigned private
       address. This address is configured by the operating system. This
       address uniquely identifies a physical node in the cluster and is the
       address that CTDB daemons will use to communicate with the CTDB
       daemons on other nodes.

       Private addresses are listed in the file specified by the CTDB_NODES
       configuration variable (see ctdbd.conf(5), default /etc/ctdb/nodes).
       This file contains the list of private addresses for all nodes in the
       cluster, one per line. This file must be the same on all nodes in the
       cluster.

       Private addresses should not be used by clients to connect to
       services provided by the cluster.

       It is strongly recommended that the private addresses are configured
       on a private network that is separate from client networks. This is
       because the CTDB protocol is both unauthenticated and unencrypted. If
       clients share the private network then steps need to be taken to stop
       injection of packets to relevant ports on the private addresses. It
       is also likely that CTDB protocol traffic between nodes could leak
       sensitive information if it can be intercepted.

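       For example, packet injection can be limited with host firewall rules
       that accept CTDB traffic only from the cluster's private network. The
       sketch below assumes the default CTDB port of 4379 and an example
       private network of 192.168.1.0/24; adjust both for your environment.

           # Accept CTDB traffic only from the private cluster network
           iptables -A INPUT -p tcp --dport 4379 -s 192.168.1.0/24 -j ACCEPT
           iptables -A INPUT -p tcp --dport 4379 -j DROP
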
       Example /etc/ctdb/nodes for a four node cluster:

           192.168.1.1
           192.168.1.2
           192.168.1.3
           192.168.1.4

   Public addresses
       Public addresses are used to provide services to clients. Public
       addresses are not configured at the operating system level and are
       not permanently associated with a particular node. Instead, they are
       managed by CTDB and are assigned to interfaces on physical nodes at
       runtime.

       The CTDB cluster will assign/reassign these public addresses across
       the available healthy nodes in the cluster. When one node fails, its
       public addresses will be taken over by one or more other nodes in the
       cluster. This ensures that services provided by all public addresses
       are always available to clients, as long as there are nodes available
       that are capable of hosting these addresses.

       The public address configuration is stored in a file on each node
       specified by the CTDB_PUBLIC_ADDRESSES configuration variable (see
       ctdbd.conf(5), recommended /etc/ctdb/public_addresses). This file
       contains a list of the public addresses that the node is capable of
       hosting, one per line. Each entry also contains the netmask and the
       interface to which the address should be assigned.

       Example /etc/ctdb/public_addresses for a node that can host 4 public
       addresses, on 2 different interfaces:

           10.1.1.1/24 eth1
           10.1.1.2/24 eth1
           10.1.2.1/24 eth2
           10.1.2.2/24 eth2

       In many cases the public addresses file will be the same on all
       nodes. However, it is possible to use different public address
       configurations on different nodes.

       Example: 4 nodes partitioned into two subgroups:

           Node 0: /etc/ctdb/public_addresses
               10.1.1.1/24 eth1
               10.1.1.2/24 eth1

           Node 1: /etc/ctdb/public_addresses
               10.1.1.1/24 eth1
               10.1.1.2/24 eth1

           Node 2: /etc/ctdb/public_addresses
               10.1.2.1/24 eth2
               10.1.2.2/24 eth2

           Node 3: /etc/ctdb/public_addresses
               10.1.2.1/24 eth2
               10.1.2.2/24 eth2

       In this example nodes 0 and 1 host two public addresses on the
       10.1.1.x network while nodes 2 and 3 host two public addresses for
       the 10.1.2.x network.

       Public address 10.1.1.1 can be hosted by either of nodes 0 or 1 and
       will be available to clients as long as at least one of these two
       nodes is available.

       If both nodes 0 and 1 become unavailable then public address 10.1.1.1
       also becomes unavailable. 10.1.1.1 cannot be failed over to nodes 2
       or 3 since these nodes do not have this public address configured.

       The ctdb ip command can be used to view the current assignment of
       public addresses to physical nodes.

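       The assignment can be checked from any node, and onnode(1) can run
       the same query on every node at once. The following is a usage sketch
       only:

           # Show which node currently hosts each public address
           ctdb ip

           # Run the same query on every node in the cluster
           onnode all ctdb ip
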
NODE STATUS
       The current status of each node in the cluster can be viewed by the
       ctdb status command.

       A node can be in one of the following states:

       OK
           This node is healthy and fully functional. It hosts public
           addresses to provide services.

       DISCONNECTED
           This node is not reachable by other nodes via the private
           network. It is not currently participating in the cluster. It
           does not host public addresses to provide services. It might be
           shut down.

       DISABLED
           This node has been administratively disabled. This node is
           partially functional and participates in the cluster. However, it
           does not host public addresses to provide services.

       UNHEALTHY
           A service provided by this node has failed a health check and
           should be investigated. This node is partially functional and
           participates in the cluster. However, it does not host public
           addresses to provide services. Unhealthy nodes should be
           investigated and may require an administrative action to rectify.

       BANNED
           CTDB is not behaving as designed on this node. For example, it
           may have failed too many recovery attempts. Such nodes are banned
           from participating in the cluster for a configurable time period
           before they attempt to rejoin the cluster. A banned node does not
           host public addresses to provide services. All banned nodes
           should be investigated and may require an administrative action
           to rectify.

       STOPPED
           This node has been administratively excluded from the cluster. A
           stopped node does not participate in the cluster and does not
           host public addresses to provide services. This state can be used
           while performing maintenance on a node.

       PARTIALLYONLINE
           A node that is partially online participates in a cluster like a
           healthy (OK) node. Some interfaces to serve public addresses are
           down, but at least one interface is up. See also ctdb ifaces.

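       Several of these states can be entered and left administratively. As
       an illustrative sketch (see ctdb(1) for the authoritative command
       reference):

           ctdb disable     # move this node to DISABLED
           ctdb enable      # return a DISABLED node to normal operation
           ctdb stop        # move this node to STOPPED for maintenance
           ctdb continue    # return a STOPPED node to the cluster
           ctdb status      # confirm the resulting node states
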
CAPABILITIES
       Cluster nodes can have several different capabilities enabled. These
       are listed below.

       RECMASTER
           Indicates that a node can become the CTDB cluster recovery
           master. The current recovery master is decided via an election
           held by all active nodes with this capability.

           Default is YES.

       LMASTER
           Indicates that a node can be the location master (LMASTER) for
           database records. The LMASTER always knows which node has the
           latest copy of a record in a volatile database.

           Default is YES.

       The RECMASTER and LMASTER capabilities can be disabled when CTDB is
       used to create a cluster spanning across WAN links. In this case CTDB
       acts as a WAN accelerator.

LVS
       LVS is a mode where CTDB presents one single IP address for the
       entire cluster. This is an alternative to using public IP addresses
       and round-robin DNS to load balance clients across the cluster.

       This is similar to using a layer-4 load balancing switch but with
       some restrictions.

       One extra LVS public address is assigned on the public network to
       each LVS group. Each LVS group is a set of nodes in the cluster that
       presents the same single LVS public address to the outside world.
       Normally there would only be one LVS group spanning an entire
       cluster, but in situations where one CTDB cluster spans multiple
       physical sites it might be useful to have one LVS group for each
       site. There can be multiple LVS groups in a cluster but each node can
       only be member of one LVS group.

       Client access to the cluster is load-balanced across the HEALTHY
       nodes in an LVS group. If no HEALTHY nodes exist then all nodes in
       the group are used, regardless of health status. CTDB will, however,
       never load-balance LVS traffic to nodes that are BANNED, STOPPED,
       DISABLED or DISCONNECTED. The ctdb lvs command is used to show which
       nodes are currently load-balanced across.

       In each LVS group, one of the nodes is selected by CTDB to be the LVS
       master. This node receives all traffic from clients coming in to the
       LVS public address and multiplexes it across the internal network to
       one of the nodes that LVS is using. When responding to the client,
       that node will send the data back directly to the client, bypassing
       the LVS master node. The command ctdb lvs master will show which node
       is the current LVS master.

       The path used for a client I/O is:

        1. Client sends request packet to LVSMASTER.

        2. LVSMASTER passes the request on to one node across the internal
           network.

        3. Selected node processes the request.

        4. Node responds back to client.

       This means that all incoming traffic to the cluster will pass through
       one physical node, which limits scalability. You cannot send more
       data to the LVS address than one physical node can multiplex. This
       means that you should not use LVS if your I/O pattern is
       write-intensive since you will be limited in the available network
       bandwidth that node can handle. LVS does work very well for
       read-intensive workloads where only smallish READ requests are going
       through the LVSMASTER bottleneck and the majority of the traffic
       volume (the data in the read replies) goes straight from the
       processing node back to the clients. For read-intensive I/O patterns
       you can achieve very high throughput rates in this mode.

       Note: you can use LVS and public addresses at the same time.

       If you use LVS, you must have a permanent address configured for the
       public interface on each node. This address must be routable and the
       cluster nodes must be configured so that all traffic back to client
       hosts is routed through this interface. This is also required in
       order to allow samba/winbind on the node to talk to the domain
       controller. This LVS IP address can not be used to initiate outgoing
       traffic.

       Make sure that the domain controller and the clients are reachable
       from a node before you enable LVS. Also ensure that outgoing traffic
       to these hosts is routed out through the configured public interface.

   Configuration
       To activate LVS on a CTDB node you must specify the
       CTDB_LVS_PUBLIC_IFACE, CTDB_LVS_PUBLIC_IP and CTDB_LVS_NODES
       configuration variables. CTDB_LVS_NODES specifies a file containing
       the private address of all nodes in the current node's LVS group.

       Example:

           CTDB_LVS_PUBLIC_IFACE=eth1
           CTDB_LVS_PUBLIC_IP=10.1.1.237
           CTDB_LVS_NODES=/etc/ctdb/lvs_nodes

       Example /etc/ctdb/lvs_nodes:

           192.168.1.2
           192.168.1.3
           192.168.1.4

       Normally any node in an LVS group can act as the LVS master. Nodes
       that are highly loaded due to other demands may be flagged with the
       "slave-only" option in the CTDB_LVS_NODES file to limit the LVS
       functionality of those nodes.

       LVS nodes file that excludes 192.168.1.4 from being the LVS master
       node:

           192.168.1.2
           192.168.1.3
           192.168.1.4 slave-only

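       Once LVS is configured, the commands mentioned above can be used to
       verify the setup; the following is a usage sketch only:

           # Show the nodes that LVS traffic is currently balanced across
           ctdb lvs

           # Show which node is currently acting as the LVS master
           ctdb lvs master
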
TRACKING AND RESETTING TCP CONNECTIONS
       CTDB tracks TCP connections from clients to public IP addresses, on
       known ports. When an IP address moves from one node to another, all
       existing TCP connections to that IP address are reset. The node
       taking over this IP address will also send gratuitous ARPs (for IPv4,
       or neighbour advertisement, for IPv6). This allows clients to
       reconnect quickly, rather than waiting for TCP timeouts, which can be
       very long.

       It is important that established TCP connections do not survive a
       release and take of a public IP address on the same node. Such
       connections can get out of sync with sequence and ACK numbers,
       potentially causing a disruptive ACK storm.

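       The connections that CTDB is tracking for a given public address can
       usually be listed with the gettickles subcommand; treat the following
       as a sketch and check ctdb(1) for the exact syntax in your version:

           # List tracked ("tickle") TCP connections for a public address
           ctdb gettickles 10.1.1.1
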
NAT GATEWAY
       NAT gateway (NATGW) is an optional feature that is used to configure
       fallback routing for nodes. This allows cluster nodes to connect to
       external services (e.g. DNS, AD, NIS and LDAP) when they do not host
       any public addresses (e.g. when they are unhealthy).

       This also applies to node startup because CTDB marks nodes as
       UNHEALTHY until they have passed a "monitor" event. In this context,
       NAT gateway helps to avoid a "chicken and egg" situation where a node
       needs to access an external service to become healthy.

       Another way of solving this type of problem is to assign an extra
       static IP address to a public interface on every node. This is
       simpler but it uses an extra IP address per node, while NAT gateway
       generally uses only one extra IP address.

   Operation
       One extra NATGW public address is assigned on the public network to
       each NATGW group. Each NATGW group is a set of nodes in the cluster
       that shares the same NATGW address to talk to the outside world.
       Normally there would only be one NATGW group spanning an entire
       cluster, but in situations where one CTDB cluster spans multiple
       physical sites it might be useful to have one NATGW group for each
       site.

       There can be multiple NATGW groups in a cluster but each node can
       only be member of one NATGW group.

       In each NATGW group, one of the nodes is selected by CTDB to be the
       NATGW master and the other nodes are considered to be NATGW slaves.
       NATGW slaves establish a fallback default route to the NATGW master
       via the private network. When a NATGW slave hosts no public IP
       addresses then it will use this route for outbound connections. The
       NATGW master hosts the NATGW public IP address and routes outgoing
       connections from slave nodes via this IP address. It also establishes
       a fallback default route.

   Configuration
       NATGW is usually configured similarly to the following example
       configuration:

           CTDB_NATGW_NODES=/etc/ctdb/natgw_nodes
           CTDB_NATGW_PRIVATE_NETWORK=192.168.1.0/24
           CTDB_NATGW_PUBLIC_IP=10.0.0.227/24
           CTDB_NATGW_PUBLIC_IFACE=eth0
           CTDB_NATGW_DEFAULT_GATEWAY=10.0.0.1

       Normally any node in a NATGW group can act as the NATGW master. Some
       configurations may have special nodes that lack connectivity to a
       public network. In such cases, those nodes can be flagged with the
       "slave-only" option in the CTDB_NATGW_NODES file to limit the NATGW
       functionality of those nodes.

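       The CTDB_NATGW_NODES file lists the private addresses of the nodes in
       the NATGW group, one per line. A hypothetical /etc/ctdb/natgw_nodes
       that keeps 192.168.1.4 from becoming the NATGW master might look like
       this (an illustrative sketch that mirrors the CTDB_LVS_NODES format
       described above):

           192.168.1.2
           192.168.1.3
           192.168.1.4 slave-only
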
       See the NAT GATEWAY section in ctdbd.conf(5) for more details of
       NATGW configuration.

   Implementation details
       When the NATGW functionality is used, one of the nodes is selected to
       act as a NAT gateway for all the other nodes in the group when they
       need to communicate with the external services. The NATGW master is
       selected to be a node that is most likely to have usable networks.

       The NATGW master hosts the NATGW public IP address
       CTDB_NATGW_PUBLIC_IP on the configured public interface
       CTDB_NATGW_PUBLIC_IFACE and acts as a router, masquerading outgoing
       connections from slave nodes via this IP address. If
       CTDB_NATGW_DEFAULT_GATEWAY is set then it also establishes a fallback
       default route to this configured gateway with a metric of 10. A
       metric 10 route is used so it can co-exist with other default routes
       that may be available.

       A NATGW slave establishes its fallback default route to the NATGW
       master via the private network CTDB_NATGW_PRIVATE_NETWORK with a
       metric of 10. This route is used for outbound connections when no
       other default route is available because the node hosts no public
       addresses. A metric 10 route is used so that it can co-exist with
       other default routes that may be available when the node is hosting
       public addresses.

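       Conceptually, the resulting routing state is roughly equivalent to
       the following commands. This is only a sketch of the behaviour
       described above, not the literal commands run by the 11.natgw
       eventscript, and it reuses the example addresses from the
       configuration shown earlier:

           # On the NATGW master (when CTDB_NATGW_DEFAULT_GATEWAY is set)
           ip route add default via 10.0.0.1 metric 10

           # On a NATGW slave, via the master's private address (example)
           ip route add default via 192.168.1.2 metric 10
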
       CTDB_NATGW_STATIC_ROUTES can be used to have NATGW create more
       specific routes instead of just default routes.

       This is implemented in the 11.natgw eventscript. Please see the
       eventscript file and the NAT GATEWAY section in ctdbd.conf(5) for
       more details.

POLICY ROUTING
       Policy routing is an optional CTDB feature to support complex network
       topologies. Public addresses may be spread across several different
       networks (or VLANs) and it may not be possible to route packets from
       these public addresses via the system's default route. Therefore,
       CTDB has support for policy routing via the 13.per_ip_routing
       eventscript. This allows routing to be specified for packets sourced
       from each public address. The routes are added and removed as CTDB
       moves public addresses between nodes.

   Configuration variables
       There are 4 configuration variables related to policy routing:
       CTDB_PER_IP_ROUTING_CONF, CTDB_PER_IP_ROUTING_RULE_PREF,
       CTDB_PER_IP_ROUTING_TABLE_ID_LOW, CTDB_PER_IP_ROUTING_TABLE_ID_HIGH.
       See the POLICY ROUTING section in ctdbd.conf(5) for more details.

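       For example, a configuration consistent with the sample further below
       might look like this. The table ID range is an arbitrary illustrative
       choice; see ctdbd.conf(5) for the meaning of each variable:

           CTDB_PER_IP_ROUTING_CONF=/etc/ctdb/policy_routing
           CTDB_PER_IP_ROUTING_RULE_PREF=100
           CTDB_PER_IP_ROUTING_TABLE_ID_LOW=1000
           CTDB_PER_IP_ROUTING_TABLE_ID_HIGH=9000
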
   Configuration
       The format of each line of CTDB_PER_IP_ROUTING_CONF is:

           <public_address> <network> [ <gateway> ]

       Leading whitespace is ignored and arbitrary whitespace may be used as
       a separator. Lines that have a "public address" item that doesn't
       match an actual public address are ignored. This means that comment
       lines can be added using a leading character such as '#', since this
       will never match an IP address.

       A line without a gateway indicates a link local route.

       For example, consider the configuration line:

           192.168.1.99 192.168.1.1/24

       If the corresponding public_addresses line is:

           192.168.1.99/24 eth2,eth3

       CTDB_PER_IP_ROUTING_RULE_PREF is 100, and CTDB adds the address to
       eth2 then the following routing information is added:

           ip rule add from 192.168.1.99 pref 100 table ctdb.192.168.1.99
           ip route add 192.168.1.0/24 dev eth2 table ctdb.192.168.1.99

       This causes traffic from 192.168.1.99 to 192.168.1.0/24 to go via
       eth2.

       The ip rule command will show (something like - depending on other
       public addresses and other routes on the system):

           0:      from all lookup local
           100:    from 192.168.1.99 lookup ctdb.192.168.1.99
           32766:  from all lookup main
           32767:  from all lookup default

       ip route show table ctdb.192.168.1.99 will show:

           192.168.1.0/24 dev eth2 scope link

       The usual use for a line containing a gateway is to add a default
       route corresponding to a particular source address. Consider this
       line of configuration:

           192.168.1.99 0.0.0.0/0 192.168.1.1

       In the situation described above this will cause an extra routing
       command to be executed:

           ip route add 0.0.0.0/0 via 192.168.1.1 dev eth2 table ctdb.192.168.1.99

       With both configuration lines, ip route show table ctdb.192.168.1.99
       will show:

           192.168.1.0/24 dev eth2 scope link
           default via 192.168.1.1 dev eth2

   Sample configuration
       Here is a more complete example configuration.

       /etc/ctdb/public_addresses:

           192.168.1.98 eth2,eth3
           192.168.1.99 eth2,eth3

       /etc/ctdb/policy_routing:

           192.168.1.98 192.168.1.0/24
           192.168.1.98 192.168.200.0/24 192.168.1.254
           192.168.1.98 0.0.0.0/0 192.168.1.1
           192.168.1.99 192.168.1.0/24
           192.168.1.99 192.168.200.0/24 192.168.1.254
           192.168.1.99 0.0.0.0/0 192.168.1.1

       This routes local packets as expected, the default route is as
       previously discussed, but packets to 192.168.200.0/24 are routed via
       the alternate gateway 192.168.1.254.

NOTIFICATION SCRIPT
       When certain state changes occur in CTDB, it can be configured to
       perform arbitrary actions via a notification script. For example, it
       can send an SNMP trap or an email when a node becomes unhealthy.

       This is activated by setting the CTDB_NOTIFY_SCRIPT configuration
       variable. The specified script must be executable.

       Use of the provided /etc/ctdb/notify.sh script is recommended. It
       executes files in /etc/ctdb/notify.d/.

       CTDB currently generates notifications after CTDB changes to these
       states:

           init
           setup
           startup
           healthy
           unhealthy

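       A notification handler dropped into /etc/ctdb/notify.d/ could look
       like the sketch below. It assumes the state name is passed as the
       first argument, which should be verified against the notify.sh
       shipped with your version; the email address is a placeholder.

           #!/bin/sh
           # /etc/ctdb/notify.d/99-mail-unhealthy (hypothetical example)
           event="$1"
           if [ "$event" = "unhealthy" ]; then
               mail -s "CTDB node $(hostname) is unhealthy" admin@example.com </dev/null
           fi
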
DEBUG LEVELS
       Valid values for DEBUGLEVEL are:

           ERR
           WARNING
           NOTICE
           INFO
           DEBUG

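       The debug level is normally set in the CTDB configuration, but it can
       also be inspected and changed on a running node. The commands below
       are a usage sketch; see ctdb(1) for details:

           # Show the current debug level of the local ctdbd
           ctdb getdebug

           # Raise the debug level temporarily while investigating a problem
           ctdb setdebug INFO
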
REMOTE CLUSTER NODES
       It is possible to have a CTDB cluster that spans across a WAN link.
       For example, where you have a CTDB cluster in your datacentre but you
       also want to have one additional CTDB node located at a remote branch
       site. This is similar to how a WAN accelerator works, but with the
       difference that while a WAN accelerator often acts as a proxy or a
       MitM, in the CTDB remote cluster node configuration the Samba
       instance at the remote site IS the genuine server, not a proxy and
       not a MitM, and thus provides 100% correct CIFS semantics to clients.

       Think of the cluster as one single multihomed Samba server where one
       of the NICs (the remote node) is very far away.

       NOTE: This does require that the cluster filesystem you use can cope
       with WAN-link latencies. Not all cluster filesystems can handle
       WAN-link latencies! Whether this will provide very good
       WAN-accelerator performance or it will perform very poorly depends
       entirely on how optimized your cluster filesystem is in handling high
       latency for data and metadata operations.

       To activate a node as being a remote cluster node you need to set the
       following two parameters in /etc/sysconfig/ctdb for the remote node:

           CTDB_CAPABILITY_LMASTER=no
           CTDB_CAPABILITY_RECMASTER=no

       Verify with the command "ctdb getcapabilities" that the node no
       longer has the recmaster or the lmaster capabilities.

SEE ALSO
       ctdb(1), ctdbd(1), ctdbd_wrapper(1), ctdb_diagnostics(1),
       ltdbtool(1), onnode(1), ping_pong(1), ctdbd.conf(5),
       ctdb-statistics(7), ctdb-tunables(7), http://ctdb.samba.org/

AUTHOR
       This documentation was written by Ronnie Sahlberg, Amitay Isaacs,
       Martin Schwenke

COPYRIGHT/LICENSE
       Copyright © 2007 Andrew Tridgell, Ronnie Sahlberg

       This program is free software; you can redistribute it and/or modify
       it under the terms of the GNU General Public License as published by
       the Free Software Foundation; either version 3 of the License, or (at
       your option) any later version.

       This program is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
       General Public License for more details.

       You should have received a copy of the GNU General Public License
       along with this program; if not, see http://www.gnu.org/licenses.

ctdb                              10/30/2018                           CTDB(7)