CTDB(7)                  CTDB - clustered TDB database                  CTDB(7)

NAME
ctdb - Clustered TDB

DESCRIPTION
CTDB is a clustered database component in clustered Samba that provides
a high-availability load-sharing CIFS server cluster.
11
12 The main functions of CTDB are:
13
14 · Provide a clustered version of the TDB database with automatic
15 rebuild/recovery of the databases upon node failures.
16
17 · Monitor nodes in the cluster and services running on each node.
18
19 · Manage a pool of public IP addresses that are used to provide
20 services to clients. Alternatively, CTDB can be used with LVS.
21
Combined with a cluster filesystem, CTDB provides a full
high-availability (HA) environment for services such as clustered Samba,
NFS and other services.
25
ANATOMY OF A CTDB CLUSTER
A CTDB cluster is a collection of nodes with 2 or more network
interfaces. All nodes provide network (usually file/NAS) services to
clients. Data served by file services is stored on shared storage
(usually a cluster filesystem) that is accessible by all nodes.
31
32 CTDB provides an "all active" cluster, where services are load balanced
33 across all nodes.
34
RECOVERY LOCK
CTDB uses a recovery lock to avoid a split brain, where a cluster
becomes partitioned and each partition attempts to operate
independently. Issues that can result from a split brain include file
data corruption, because file locking metadata may not be tracked
correctly.
41
42 CTDB uses a cluster leader and follower model of cluster management.
43 All nodes in a cluster elect one node to be the leader. The leader node
44 coordinates privileged operations such as database recovery and IP
45 address failover. CTDB refers to the leader node as the recovery
46 master. This node takes and holds the recovery lock to assert its
47 privileged role in the cluster.
48
49 By default, the recovery lock is implemented using a file (specified by
50 recovery lock in the [cluster] section of ctdb.conf(5)) residing in
51 shared storage (usually) on a cluster filesystem. To support a recovery
52 lock the cluster filesystem must support lock coherence. See
53 ping_pong(1) for more details.
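
For example, a minimal recovery lock setting in the [cluster] section
of ctdb.conf might look like the following (the /clusterfs path is only
an illustration; use a directory on your own cluster filesystem):

[cluster]
recovery lock = /clusterfs/.ctdb/reclock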
54
55 The recovery lock can also be implemented using an arbitrary cluster
56 mutex call-out by using an exclamation point ('!') as the first
57 character of recovery lock. For example, a value of !/usr/bin/myhelper
58 recovery would run the given helper with the specified arguments. See
59 the source code relating to cluster mutexes for clues about writing
60 call-outs.
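
Continuing that example, the corresponding ctdb.conf entry would look
like the following (/usr/bin/myhelper is the hypothetical helper named
above, not a program shipped with CTDB):

[cluster]
recovery lock = !/usr/bin/myhelper recovery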
61
62 If a cluster becomes partitioned (for example, due to a communication
63 failure) and a different recovery master is elected by the nodes in
64 each partition, then only one of these recovery masters will be able to
65 take the recovery lock. The recovery master in the "losing" partition
66 will not be able to take the recovery lock and will be excluded from
67 the cluster. The nodes in the "losing" partition will elect each node
68 in turn as their recovery master so eventually all the nodes in that
69 partition will be excluded.
70
71 CTDB does sanity checks to ensure that the recovery lock is held as
72 expected.
73
74 CTDB can run without a recovery lock but this is not recommended as
75 there will be no protection from split brains.
76
PRIVATE VS PUBLIC ADDRESSES
Each node in a CTDB cluster has multiple IP addresses assigned to it:
79
80 · A single private IP address that is used for communication between
81 nodes.
82
83 · One or more public IP addresses that are used to provide NAS or
84 other services.
85
86
87 Private address
88 Each node is configured with a unique, permanently assigned private
89 address. This address is configured by the operating system. This
90 address uniquely identifies a physical node in the cluster and is the
91 address that CTDB daemons will use to communicate with the CTDB daemons
92 on other nodes.
93
Private addresses are listed in the file /etc/ctdb/nodes. This file
contains the list of private addresses for all nodes in the cluster,
one per line. This file must be the same on all nodes in the cluster.
97
98 Some users like to put this configuration file in their cluster
99 filesystem. A symbolic link should be used in this case.
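
For example, the nodes file could be kept on the cluster filesystem and
linked into place on each node (the /clusterfs path is only an
illustration):

ln -s /clusterfs/etc/ctdb/nodes /etc/ctdb/nodes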
100
101 Private addresses should not be used by clients to connect to services
102 provided by the cluster.
103
104 It is strongly recommended that the private addresses are configured on
105 a private network that is separate from client networks. This is
106 because the CTDB protocol is both unauthenticated and unencrypted. If
107 clients share the private network then steps need to be taken to stop
108 injection of packets to relevant ports on the private addresses. It is
109 also likely that CTDB protocol traffic between nodes could leak
110 sensitive information if it can be intercepted.
111
112 Example /etc/ctdb/nodes for a four node cluster:
113
114 192.168.1.1
115 192.168.1.2
116 192.168.1.3
117 192.168.1.4
118
119
120 Public addresses
121 Public addresses are used to provide services to clients. Public
122 addresses are not configured at the operating system level and are not
123 permanently associated with a particular node. Instead, they are
124 managed by CTDB and are assigned to interfaces on physical nodes at
125 runtime.
126
127 The CTDB cluster will assign/reassign these public addresses across the
128 available healthy nodes in the cluster. When one node fails, its public
129 addresses will be taken over by one or more other nodes in the cluster.
This ensures that services provided via the public addresses remain
available to clients, as long as at least one node capable of hosting
each address is available.
133
134 The public address configuration is stored in
135 /etc/ctdb/public_addresses on each node. This file contains a list of
136 the public addresses that the node is capable of hosting, one per line.
137 Each entry also contains the netmask and the interface to which the
138 address should be assigned. If this file is missing then no public
139 addresses are configured.
140
141 Some users who have the same public addresses on all nodes like to put
142 this configuration file in their cluster filesystem. A symbolic link
143 should be used in this case.
144
145 Example /etc/ctdb/public_addresses for a node that can host 4 public
146 addresses, on 2 different interfaces:
147
148 10.1.1.1/24 eth1
149 10.1.1.2/24 eth1
150 10.1.2.1/24 eth2
151 10.1.2.2/24 eth2
152
153
154 In many cases the public addresses file will be the same on all nodes.
155 However, it is possible to use different public address configurations
156 on different nodes.
157
158 Example: 4 nodes partitioned into two subgroups:
159
160 Node 0:/etc/ctdb/public_addresses
161 10.1.1.1/24 eth1
162 10.1.1.2/24 eth1
163
164 Node 1:/etc/ctdb/public_addresses
165 10.1.1.1/24 eth1
166 10.1.1.2/24 eth1
167
168 Node 2:/etc/ctdb/public_addresses
169 10.1.2.1/24 eth2
170 10.1.2.2/24 eth2
171
172 Node 3:/etc/ctdb/public_addresses
173 10.1.2.1/24 eth2
174 10.1.2.2/24 eth2
175
176
177 In this example nodes 0 and 1 host two public addresses on the 10.1.1.x
178 network while nodes 2 and 3 host two public addresses for the 10.1.2.x
179 network.
180
Public address 10.1.1.1 can be hosted by either of nodes 0 or 1 and
will be available to clients as long as at least one of these two nodes
is available.
184
185 If both nodes 0 and 1 become unavailable then public address 10.1.1.1
186 also becomes unavailable. 10.1.1.1 can not be failed over to nodes 2 or
187 3 since these nodes do not have this public address configured.
188
189 The ctdb ip command can be used to view the current assignment of
190 public addresses to physical nodes.
191
NODE STATUS
The current status of each node in the cluster can be viewed by the
ctdb status command.
195
196 A node can be in one of the following states:
197
198 OK
199 This node is healthy and fully functional. It hosts public
200 addresses to provide services.
201
202 DISCONNECTED
203 This node is not reachable by other nodes via the private network.
204 It is not currently participating in the cluster. It does not host
205 public addresses to provide services. It might be shut down.
206
207 DISABLED
208 This node has been administratively disabled. This node is
209 partially functional and participates in the cluster. However, it
210 does not host public addresses to provide services.
211
212 UNHEALTHY
213 A service provided by this node has failed a health check and
214 should be investigated. This node is partially functional and
215 participates in the cluster. However, it does not host public
216 addresses to provide services. Unhealthy nodes should be
217 investigated and may require an administrative action to rectify.
218
219 BANNED
220 CTDB is not behaving as designed on this node. For example, it may
221 have failed too many recovery attempts. Such nodes are banned from
222 participating in the cluster for a configurable time period before
223 they attempt to rejoin the cluster. A banned node does not host
224 public addresses to provide services. All banned nodes should be
225 investigated and may require an administrative action to rectify.
226
STOPPED
This node has been administratively excluded from the cluster. A
stopped node does not participate in the cluster and does not host
public addresses to provide services. This state can be used while
performing maintenance on a node; see the example after this list.
232
233 PARTIALLYONLINE
234 A node that is partially online participates in a cluster like a
235 healthy (OK) node. Some interfaces to serve public addresses are
236 down, but at least one interface is up. See also ctdb ifaces.
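
For example, the DISABLED and STOPPED states can be entered and left
administratively with the following commands, run on the node concerned
(or remotely with -n <pnn>; see ctdb(1) for details):

ctdb disable     # mark the node DISABLED; its public addresses move to other nodes
ctdb enable      # return a DISABLED node to normal operation
ctdb stop        # mark the node STOPPED for maintenance
ctdb continue    # return a STOPPED node to the cluster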
237
CAPABILITIES
Cluster nodes can have several different capabilities enabled. These
are listed below.
241
242 RECMASTER
243 Indicates that a node can become the CTDB cluster recovery master.
244 The current recovery master is decided via an election held by all
245 active nodes with this capability.
246
247 Default is YES.
248
249 LMASTER
250 Indicates that a node can be the location master (LMASTER) for
251 database records. The LMASTER always knows which node has the
252 latest copy of a record in a volatile database.
253
254 Default is YES.
255
256 The RECMASTER and LMASTER capabilities can be disabled when CTDB is
257 used to create a cluster spanning across WAN links. In this case CTDB
258 acts as a WAN accelerator.
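
The capabilities currently enabled on a node can be checked with the
ctdb getcapabilities command (see also the REMOTE CLUSTER NODES section
below, which shows how to disable them in ctdb.conf):

ctdb getcapabilities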
259
LVS
LVS is a mode where CTDB presents one single IP address for the entire
cluster. This is an alternative to using public IP addresses and
round-robin DNS to load balance clients across the cluster.
264
This is similar to using a layer-4 load-balancing switch but with some
restrictions.
267
One extra LVS public address is assigned on the public network to each
LVS group. Each LVS group is a set of nodes in the cluster that
presents the same LVS public address to the outside world. Normally
there would only be one LVS group spanning an entire cluster, but in
situations where one CTDB cluster spans multiple physical sites it
might be useful to have one LVS group for each site. There can be
multiple LVS groups in a cluster but each node can only be a member of
one LVS group.
276
Client access to the cluster is load-balanced across the HEALTHY nodes
in an LVS group. If no HEALTHY nodes exist then all nodes in the group
are used, regardless of health status. CTDB will, however, never
load-balance LVS traffic to nodes that are BANNED, STOPPED, DISABLED or
DISCONNECTED. The ctdb lvs command is used to show which nodes are
currently load-balanced across.
283
284 In each LVS group, one of the nodes is selected by CTDB to be the LVS
285 master. This node receives all traffic from clients coming in to the
286 LVS public address and multiplexes it across the internal network to
287 one of the nodes that LVS is using. When responding to the client, that
288 node will send the data back directly to the client, bypassing the LVS
289 master node. The command ctdb lvs master will show which node is the
290 current LVS master.
291
292 The path used for a client I/O is:
293
294 1. Client sends request packet to LVSMASTER.
295
296 2. LVSMASTER passes the request on to one node across the internal
297 network.
298
299 3. Selected node processes the request.
300
301 4. Node responds back to client.
302
This means that all incoming traffic to the cluster will pass through
one physical node, which limits scalability. You cannot send more data
to the LVS address than one physical node can multiplex. This means
that you should not use LVS if your I/O pattern is write-intensive,
since you will be limited by the available network bandwidth that node
can handle. LVS does work very well for read-intensive workloads, where
only smallish READ requests are going through the LVSMASTER bottleneck
and the majority of the traffic volume (the data in the read replies)
goes straight from the processing node back to the clients. For
read-intensive I/O patterns you can achieve very high throughput rates
in this mode.
314
315 Note: you can use LVS and public addresses at the same time.
316
If you use LVS, you must have a permanent address configured for the
public interface on each node. This address must be routable and the
cluster nodes must be configured so that all traffic back to client
hosts is routed through this interface. This is also required in order
to allow samba/winbind on the node to talk to the domain controller.
This LVS IP address cannot be used to initiate outgoing traffic.
323
324 Make sure that the domain controller and the clients are reachable from
325 a node before you enable LVS. Also ensure that outgoing traffic to
326 these hosts is routed out through the configured public interface.
327
328 Configuration
329 To activate LVS on a CTDB node you must specify the
330 CTDB_LVS_PUBLIC_IFACE, CTDB_LVS_PUBLIC_IP and CTDB_LVS_NODES
331 configuration variables. CTDB_LVS_NODES specifies a file containing
332 the private address of all nodes in the current node's LVS group.
333
334 Example:
335
336 CTDB_LVS_PUBLIC_IFACE=eth1
337 CTDB_LVS_PUBLIC_IP=10.1.1.237
338 CTDB_LVS_NODES=/etc/ctdb/lvs_nodes
339
340
341 Example /etc/ctdb/lvs_nodes:
342
343 192.168.1.2
344 192.168.1.3
345 192.168.1.4
346
347
Normally any node in an LVS group can act as the LVS master. Nodes that
are highly loaded due to other demands may be flagged with the
"slave-only" option in the CTDB_LVS_NODES file to limit the LVS
functionality of those nodes.
352
353 LVS nodes file that excludes 192.168.1.4 from being the LVS master
354 node:
355
356 192.168.1.2
357 192.168.1.3
358 192.168.1.4 slave-only
359
360
TRACKING AND RESETTING TCP CONNECTIONS
CTDB tracks TCP connections from clients to public IP addresses, on
known ports. When an IP address moves from one node to another, all
existing TCP connections to that IP address are reset. The node taking
over this IP address will also send gratuitous ARPs (for IPv4) or
neighbour advertisements (for IPv6). This allows clients to reconnect
quickly, rather than waiting for TCP timeouts, which can be very long.
368
369 It is important that established TCP connections do not survive a
370 release and take of a public IP address on the same node. Such
371 connections can get out of sync with sequence and ACK numbers,
372 potentially causing a disruptive ACK storm.
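
The TCP connections currently being tracked for a public IP address can
be listed with the ctdb gettickles command. For example, using one of
the example public addresses from above:

ctdb gettickles 10.1.1.1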
373
NAT GATEWAY
NAT gateway (NATGW) is an optional feature that is used to configure
fallback routing for nodes. This allows cluster nodes to connect to
external services (e.g. DNS, AD, NIS and LDAP) when they do not host
any public addresses (e.g. when they are unhealthy).
379
380 This also applies to node startup because CTDB marks nodes as UNHEALTHY
381 until they have passed a "monitor" event. In this context, NAT gateway
382 helps to avoid a "chicken and egg" situation where a node needs to
383 access an external service to become healthy.
384
385 Another way of solving this type of problem is to assign an extra
386 static IP address to a public interface on every node. This is simpler
387 but it uses an extra IP address per node, while NAT gateway generally
388 uses only one extra IP address.
389
390 Operation
391 One extra NATGW public address is assigned on the public network to
392 each NATGW group. Each NATGW group is a set of nodes in the cluster
393 that shares the same NATGW address to talk to the outside world.
394 Normally there would only be one NATGW group spanning an entire
395 cluster, but in situations where one CTDB cluster spans multiple
396 physical sites it might be useful to have one NATGW group for each
397 site.
398
399 There can be multiple NATGW groups in a cluster but each node can only
400 be member of one NATGW group.
401
In each NATGW group, one of the nodes is selected by CTDB to be the
NATGW master and the other nodes are considered to be NATGW slaves.
NATGW slaves establish a fallback default route to the NATGW master via
the private network. When a NATGW slave hosts no public IP addresses
then it will use this route for outbound connections. The NATGW master
hosts the NATGW public IP address and routes outgoing connections from
slave nodes via this IP address. It also establishes a fallback default
route.
410
411 Configuration
NATGW is usually configured similarly to the following example
configuration:
414
415 CTDB_NATGW_NODES=/etc/ctdb/natgw_nodes
416 CTDB_NATGW_PRIVATE_NETWORK=192.168.1.0/24
417 CTDB_NATGW_PUBLIC_IP=10.0.0.227/24
418 CTDB_NATGW_PUBLIC_IFACE=eth0
419 CTDB_NATGW_DEFAULT_GATEWAY=10.0.0.1
420
421
422 Normally any node in a NATGW group can act as the NATGW master. Some
423 configurations may have special nodes that lack connectivity to a
424 public network. In such cases, those nodes can be flagged with the
425 "slave-only" option in the CTDB_NATGW_NODES file to limit the NATGW
426 functionality of those nodes.
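
Example /etc/ctdb/natgw_nodes, listing the private addresses of the
nodes in the NATGW group, with the last node excluded from acting as
NATGW master (the addresses follow the earlier examples and are only
illustrative):

192.168.1.1
192.168.1.2
192.168.1.3
192.168.1.4 slave-only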
427
428 See the NAT GATEWAY section in ctdb-script.options(5) for more details
429 of NATGW configuration.
430
431 Implementation details
432 When the NATGW functionality is used, one of the nodes is selected to
433 act as a NAT gateway for all the other nodes in the group when they
434 need to communicate with the external services. The NATGW master is
435 selected to be a node that is most likely to have usable networks.
436
437 The NATGW master hosts the NATGW public IP address CTDB_NATGW_PUBLIC_IP
438 on the configured public interfaces CTDB_NATGW_PUBLIC_IFACE and acts as
439 a router, masquerading outgoing connections from slave nodes via this
440 IP address. If CTDB_NATGW_DEFAULT_GATEWAY is set then it also
establishes a fallback default route to this configured gateway
442 with a metric of 10. A metric 10 route is used so it can co-exist with
443 other default routes that may be available.
444
A NATGW slave establishes its fallback default route to the NATGW
master via the private network CTDB_NATGW_PRIVATE_NETWORK with a metric
of 10. This route is used for outbound connections when no other
default route is available because the node hosts no public addresses.
A metric 10 route is used so that it can co-exist with other default
routes that may be available when the node is hosting public addresses.
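
Conceptually, the fallback route added on a NATGW slave is equivalent
to something like the following (illustrative only; the 11.natgw
eventscript manages these routes automatically, and the NATGW master's
private address shown here is hypothetical):

ip route add 0.0.0.0/0 via 192.168.1.1 metric 10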
451
452 CTDB_NATGW_STATIC_ROUTES can be used to have NATGW create more specific
453 routes instead of just default routes.
454
455 This is implemented in the 11.natgw eventscript. Please see the
456 eventscript file and the NAT GATEWAY section in ctdb-script.options(5)
457 for more details.
458
POLICY ROUTING
Policy routing is an optional CTDB feature to support complex network
topologies. Public addresses may be spread across several different
networks (or VLANs) and it may not be possible to route packets from
these public addresses via the system's default route. Therefore, CTDB
has support for policy routing via the 13.per_ip_routing eventscript.
This allows routing to be specified for packets sourced from each
public address. The routes are added and removed as CTDB moves public
addresses between nodes.
468
469 Configuration variables
470 There are 4 configuration variables related to policy routing:
471 CTDB_PER_IP_ROUTING_CONF, CTDB_PER_IP_ROUTING_RULE_PREF,
472 CTDB_PER_IP_ROUTING_TABLE_ID_LOW, CTDB_PER_IP_ROUTING_TABLE_ID_HIGH.
473 See the POLICY ROUTING section in ctdb-script.options(5) for more
474 details.
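
For example, these variables might be set in the script options file
described in ctdb-script.options(5) as follows (the rule preference and
table ID range shown are only illustrative values; the CONF file path
matches the sample configuration below):

CTDB_PER_IP_ROUTING_CONF=/etc/ctdb/policy_routing
CTDB_PER_IP_ROUTING_RULE_PREF=100
CTDB_PER_IP_ROUTING_TABLE_ID_LOW=1000
CTDB_PER_IP_ROUTING_TABLE_ID_HIGH=9000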
475
476 Configuration
477 The format of each line of CTDB_PER_IP_ROUTING_CONF is:
478
479 <public_address> <network> [ <gateway> ]
480
481
482 Leading whitespace is ignored and arbitrary whitespace may be used as a
483 separator. Lines that have a "public address" item that doesn't match
484 an actual public address are ignored. This means that comment lines can
485 be added using a leading character such as '#', since this will never
486 match an IP address.
487
488 A line without a gateway indicates a link local route.
489
490 For example, consider the configuration line:
491
492 192.168.1.99 192.168.1.1/24
493
494
495 If the corresponding public_addresses line is:
496
497 192.168.1.99/24 eth2,eth3
498
499
CTDB_PER_IP_ROUTING_RULE_PREF is 100, and CTDB adds the address to
eth2, then the following routing information is added:
502
503 ip rule add from 192.168.1.99 pref 100 table ctdb.192.168.1.99
504 ip route add 192.168.1.0/24 dev eth2 table ctdb.192.168.1.99
505
506
This causes traffic from 192.168.1.99 to 192.168.1.0/24 to go via eth2.
508
509 The ip rule command will show (something like - depending on other
510 public addresses and other routes on the system):
511
512 0: from all lookup local
513 100: from 192.168.1.99 lookup ctdb.192.168.1.99
514 32766: from all lookup main
515 32767: from all lookup default
516
517
518 ip route show table ctdb.192.168.1.99 will show:
519
520 192.168.1.0/24 dev eth2 scope link
521
522
523 The usual use for a line containing a gateway is to add a default route
524 corresponding to a particular source address. Consider this line of
525 configuration:
526
527 192.168.1.99 0.0.0.0/0 192.168.1.1
528
529
530 In the situation described above this will cause an extra routing
531 command to be executed:
532
533 ip route add 0.0.0.0/0 via 192.168.1.1 dev eth2 table ctdb.192.168.1.99
534
535
536 With both configuration lines, ip route show table ctdb.192.168.1.99
537 will show:
538
539 192.168.1.0/24 dev eth2 scope link
540 default via 192.168.1.1 dev eth2
541
542
543 Sample configuration
544 Here is a more complete example configuration.
545
546 /etc/ctdb/public_addresses:
547
548 192.168.1.98 eth2,eth3
549 192.168.1.99 eth2,eth3
550
551 /etc/ctdb/policy_routing:
552
553 192.168.1.98 192.168.1.0/24
554 192.168.1.98 192.168.200.0/24 192.168.1.254
555 192.168.1.98 0.0.0.0/0 192.168.1.1
556 192.168.1.99 192.168.1.0/24
557 192.168.1.99 192.168.200.0/24 192.168.1.254
558 192.168.1.99 0.0.0.0/0 192.168.1.1
559
560
This routes local packets as expected. The default route is as
previously discussed, and packets to 192.168.200.0/24 are routed via
the alternate gateway 192.168.1.254.
564
NOTIFICATIONS
When certain state changes occur in CTDB, it can be configured to
perform arbitrary actions via notifications. For example, it can send
SNMP traps or emails when a node becomes unhealthy.
569
570 The notification mechanism runs all executable files ending in
571 ".script" in /etc/ctdb/events/notification/, ignoring any failures and
572 continuing to run all files.
573
574 CTDB currently generates notifications after CTDB changes to these
575 states:
576 init
577 setup
578 startup
579 healthy
580 unhealthy
581
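For example, a minimal notification script might log each state change
to syslog. This is only a sketch: it assumes the event name is passed to
the script as its first argument, and 99.log_state.script is a made-up
file name.

#!/bin/sh
# /etc/ctdb/events/notification/99.log_state.script
# Log every CTDB notification event (init, setup, startup, healthy,
# unhealthy) to syslog.
event="$1"
logger -t ctdb-notify "CTDB state change: ${event}"
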
LOG LEVELS
Valid log levels, in increasing order of verbosity, are:
584 ERROR
585 WARNING
586 NOTICE
587 INFO
588 DEBUG
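
The log level is normally set with the log level option in the
[logging] section of ctdb.conf, for example:

[logging]
log level = NOTICE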
589
REMOTE CLUSTER NODES
It is possible to have a CTDB cluster that spans across a WAN link. For
example, you could have a CTDB cluster in your datacentre but also want
one additional CTDB node located at a remote branch site. This is
similar to how a WAN accelerator works, but with the difference that
while a WAN accelerator often acts as a proxy or a MitM, in the CTDB
remote cluster node configuration the Samba instance at the remote site
IS the genuine server, not a proxy and not a MitM, and thus provides
100% correct CIFS semantics to clients.
599
600 See the cluster as one single multihomed samba server where one of the
601 NICs (the remote node) is very far away.
602
603 NOTE: This does require that the cluster filesystem you use can cope
604 with WAN-link latencies. Not all cluster filesystems can handle
605 WAN-link latencies! Whether this will provide very good WAN-accelerator
606 performance or it will perform very poorly depends entirely on how
607 optimized your cluster filesystem is in handling high latency for data
608 and metadata operations.
609
610 To activate a node as being a remote cluster node you need to set the
611 following two parameters in /etc/ctdb/ctdb.conf for the remote node:
612
613 [legacy]
614 lmaster capability = false
615 recmaster capability = false
616
617
Verify with the command "ctdb getcapabilities" that the node no longer
has the recmaster or the lmaster capabilities.
620
SEE ALSO
ctdb(1), ctdbd(1), ctdbd_wrapper(1), ctdb_diagnostics(1), ltdbtool(1),
623 onnode(1), ping_pong(1), ctdb.conf(5), ctdb-script.options(5),
624 ctdb.sysconfig(5), ctdb-statistics(7), ctdb-tunables(7),
625 http://ctdb.samba.org/
626
AUTHOR
This documentation was written by Ronnie Sahlberg, Amitay Isaacs,
Martin Schwenke
630
COPYRIGHT
Copyright © 2007 Andrew Tridgell, Ronnie Sahlberg
633
634 This program is free software; you can redistribute it and/or modify it
635 under the terms of the GNU General Public License as published by the
636 Free Software Foundation; either version 3 of the License, or (at your
637 option) any later version.
638
639 This program is distributed in the hope that it will be useful, but
640 WITHOUT ANY WARRANTY; without even the implied warranty of
641 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
642 General Public License for more details.
643
644 You should have received a copy of the GNU General Public License along
645 with this program; if not, see http://www.gnu.org/licenses.
646
647
648
649
ctdb                                04/28/2020                          CTDB(7)