CTDB(7)                  CTDB - clustered TDB database                 CTDB(7)

NAME
       ctdb - Clustered TDB

DESCRIPTION
       CTDB is a clustered database component in clustered Samba that
       provides a high-availability load-sharing CIFS server cluster.

       The main functions of CTDB are:

       ·   Provide a clustered version of the TDB database with automatic
           rebuild/recovery of the databases upon node failures.

       ·   Monitor nodes in the cluster and services running on each node.

       ·   Manage a pool of public IP addresses that are used to provide
           services to clients. Alternatively, CTDB can be used with LVS.

       Combined with a cluster filesystem, CTDB provides a full
       high-availability (HA) environment for services such as clustered
       Samba, NFS and other services.

CLUSTER NODES
       A CTDB cluster is a collection of nodes with 2 or more network
       interfaces. All nodes provide network (usually file/NAS) services to
       clients. Data served by file services is stored on shared storage
       (usually a cluster filesystem) that is accessible by all nodes.

       CTDB provides an "all active" cluster, where services are load
       balanced across all nodes.

RECOVERY LOCK
       CTDB uses a recovery lock to avoid a split brain, where a cluster
       becomes partitioned and each partition attempts to operate
       independently. Issues that can result from a split brain include file
       data corruption, because file locking metadata may not be tracked
       correctly.

       CTDB uses a cluster leader and follower model of cluster management.
       All nodes in a cluster elect one node to be the leader. The leader
       node coordinates privileged operations such as database recovery and
       IP address failover. CTDB refers to the leader node as the recovery
       master. This node takes and holds the recovery lock to assert its
       privileged role in the cluster.

       By default, the recovery lock is implemented using a file (specified
       by CTDB_RECOVERY_LOCK) residing in shared storage (usually) on a
       cluster filesystem. To support a recovery lock the cluster filesystem
       must support lock coherence. See ping_pong(1) for more details.

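       Lock coherence can be checked with the ping_pong(1) tool before CTDB
       is deployed. The following is a minimal sketch: the file path is only
       an example, and the lock count is normally the number of nodes plus
       one.

           # Run simultaneously on every node of a 4-node cluster, against a
           # file on the shared cluster filesystem (example path).
           ping_pong /clusterfs/ctdb-lock-test 5
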
       The recovery lock can also be implemented using an arbitrary cluster
       mutex call-out by using an exclamation point ('!') as the first
       character of CTDB_RECOVERY_LOCK. For example, a value of
       !/usr/bin/myhelper recovery would run the given helper with the
       specified arguments. See the source code relating to cluster mutexes
       for clues about writing call-outs.

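       As a rough illustration only, a call-out could be a small script that
       takes an exclusive lock on a shared resource. The protocol sketched
       below (print "0" once the lock is held and keep running; print "1" on
       contention) is an assumption - verify it against the cluster mutex
       code in the CTDB source before relying on it.

           #!/bin/sh
           # Hypothetical cluster mutex call-out sketch using flock(1).
           # $1 is a lock file visible to all nodes (an assumption).
           lockfile="$1"
           exec 9>"$lockfile" || exit 1
           if flock -n 9; then
               echo 0    # assumed "lock taken" indication
               # Keep running to hold the lock; CTDB terminates the
               # helper to release it.
               while :; do sleep 3600; done
           else
               echo 1    # assumed "lock contended" indication
           fi
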
       If a cluster becomes partitioned (for example, due to a communication
       failure) and a different recovery master is elected by the nodes in
       each partition, then only one of these recovery masters will be able
       to take the recovery lock. The recovery master in the "losing"
       partition will not be able to take the recovery lock and will be
       excluded from the cluster. The nodes in the "losing" partition will
       elect each node in turn as their recovery master so eventually all
       the nodes in that partition will be excluded.

       CTDB does sanity checks to ensure that the recovery lock is held as
       expected.

       CTDB can run without a recovery lock but this is not recommended as
       there will be no protection from split brains.

PRIVATE VS PUBLIC ADDRESSES
       Each node in a CTDB cluster has multiple IP addresses assigned to it:

       ·   A single private IP address that is used for communication
           between nodes.

       ·   One or more public IP addresses that are used to provide NAS or
           other services.

   Private address
       Each node is configured with a unique, permanently assigned private
       address. This address is configured by the operating system. This
       address uniquely identifies a physical node in the cluster and is the
       address that CTDB daemons will use to communicate with the CTDB
       daemons on other nodes.

       Private addresses are listed in the file specified by the CTDB_NODES
       configuration variable (see ctdbd.conf(5), default /etc/ctdb/nodes).
       This file contains the list of private addresses for all nodes in the
       cluster, one per line. This file must be the same on all nodes in the
       cluster.

       Private addresses should not be used by clients to connect to
       services provided by the cluster.

       It is strongly recommended that the private addresses are configured
       on a private network that is separate from client networks. This is
       because the CTDB protocol is both unauthenticated and unencrypted. If
       clients share the private network then steps need to be taken to stop
       injection of packets to relevant ports on the private addresses. It
       is also likely that CTDB protocol traffic between nodes could leak
       sensitive information if it can be intercepted.

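       For example, packet injection can be limited with host firewall rules
       that accept CTDB traffic only from the cluster's private network. The
       sketch below assumes the default CTDB port of 4379 and an example
       private network of 192.168.1.0/24; adjust both for your environment.

           # Accept CTDB traffic only from the private cluster network
           iptables -A INPUT -p tcp --dport 4379 -s 192.168.1.0/24 -j ACCEPT
           iptables -A INPUT -p tcp --dport 4379 -j DROP
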
       Example /etc/ctdb/nodes for a four node cluster:

           192.168.1.1
           192.168.1.2
           192.168.1.3
           192.168.1.4

   Public addresses
       Public addresses are used to provide services to clients. Public
       addresses are not configured at the operating system level and are
       not permanently associated with a particular node. Instead, they are
       managed by CTDB and are assigned to interfaces on physical nodes at
       runtime.

       The CTDB cluster will assign/reassign these public addresses across
       the available healthy nodes in the cluster. When one node fails, its
       public addresses will be taken over by one or more other nodes in the
       cluster. This ensures that services provided by all public addresses
       are always available to clients, as long as there are nodes available
       that are capable of hosting these addresses.

       The public address configuration is stored in a file on each node
       specified by the CTDB_PUBLIC_ADDRESSES configuration variable (see
       ctdbd.conf(5), recommended /etc/ctdb/public_addresses). This file
       contains a list of the public addresses that the node is capable of
       hosting, one per line. Each entry also contains the netmask and the
       interface to which the address should be assigned.

       Example /etc/ctdb/public_addresses for a node that can host 4 public
       addresses, on 2 different interfaces:

           10.1.1.1/24 eth1
           10.1.1.2/24 eth1
           10.1.2.1/24 eth2
           10.1.2.2/24 eth2

       In many cases the public addresses file will be the same on all
       nodes. However, it is possible to use different public address
       configurations on different nodes.

       Example: 4 nodes partitioned into two subgroups:

           Node 0: /etc/ctdb/public_addresses
               10.1.1.1/24 eth1
               10.1.1.2/24 eth1

           Node 1: /etc/ctdb/public_addresses
               10.1.1.1/24 eth1
               10.1.1.2/24 eth1

           Node 2: /etc/ctdb/public_addresses
               10.1.2.1/24 eth2
               10.1.2.2/24 eth2

           Node 3: /etc/ctdb/public_addresses
               10.1.2.1/24 eth2
               10.1.2.2/24 eth2

       In this example nodes 0 and 1 host two public addresses on the
       10.1.1.x network while nodes 2 and 3 host two public addresses for
       the 10.1.2.x network.

       Public address 10.1.1.1 can be hosted by either of nodes 0 or 1 and
       will be available to clients as long as at least one of these two
       nodes is available.

       If both nodes 0 and 1 become unavailable then public address 10.1.1.1
       also becomes unavailable. 10.1.1.1 cannot be failed over to nodes 2
       or 3 since these nodes do not have this public address configured.

       The ctdb ip command can be used to view the current assignment of
       public addresses to physical nodes.

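       The assignment can be checked from any node, and onnode(1) can run
       the same query on every node at once. The following is a usage sketch
       only:

           # Show which node currently hosts each public address
           ctdb ip

           # Run the same query on every node in the cluster
           onnode all ctdb ip
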
NODE STATUS
       The current status of each node in the cluster can be viewed by the
       ctdb status command.

       A node can be in one of the following states:

       OK
           This node is healthy and fully functional. It hosts public
           addresses to provide services.

       DISCONNECTED
           This node is not reachable by other nodes via the private
           network. It is not currently participating in the cluster. It
           does not host public addresses to provide services. It might be
           shut down.

       DISABLED
           This node has been administratively disabled. This node is
           partially functional and participates in the cluster. However, it
           does not host public addresses to provide services.

       UNHEALTHY
           A service provided by this node has failed a health check and
           should be investigated. This node is partially functional and
           participates in the cluster. However, it does not host public
           addresses to provide services. Unhealthy nodes should be
           investigated and may require an administrative action to rectify.

       BANNED
           CTDB is not behaving as designed on this node. For example, it
           may have failed too many recovery attempts. Such nodes are banned
           from participating in the cluster for a configurable time period
           before they attempt to rejoin the cluster. A banned node does not
           host public addresses to provide services. All banned nodes
           should be investigated and may require an administrative action
           to rectify.

       STOPPED
           This node has been administratively excluded from the cluster. A
           stopped node does not participate in the cluster and does not
           host public addresses to provide services. This state can be used
           while performing maintenance on a node.

       PARTIALLYONLINE
           A node that is partially online participates in a cluster like a
           healthy (OK) node. Some interfaces to serve public addresses are
           down, but at least one interface is up. See also ctdb ifaces.

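       Several of these states can be entered and left administratively. As
       an illustrative sketch (see ctdb(1) for the authoritative command
       reference):

           ctdb disable     # move this node to DISABLED
           ctdb enable      # return a DISABLED node to normal operation
           ctdb stop        # move this node to STOPPED for maintenance
           ctdb continue    # return a STOPPED node to the cluster
           ctdb status      # confirm the resulting node states
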
CAPABILITIES
       Cluster nodes can have several different capabilities enabled. These
       are listed below.

       RECMASTER
           Indicates that a node can become the CTDB cluster recovery
           master. The current recovery master is decided via an election
           held by all active nodes with this capability.

           Default is YES.

       LMASTER
           Indicates that a node can be the location master (LMASTER) for
           database records. The LMASTER always knows which node has the
           latest copy of a record in a volatile database.

           Default is YES.

       The RECMASTER and LMASTER capabilities can be disabled when CTDB is
       used to create a cluster spanning across WAN links. In this case CTDB
       acts as a WAN accelerator.

LVS
       LVS is a mode where CTDB presents one single IP address for the
       entire cluster. This is an alternative to using public IP addresses
       and round-robin DNS to load balance clients across the cluster.

       This is similar to using a layer-4 load balancing switch but with
       some restrictions.

       One extra LVS public address is assigned on the public network to
       each LVS group. Each LVS group is a set of nodes in the cluster that
       presents the same single LVS public address to the outside world.
       Normally there would only be one LVS group spanning an entire
       cluster, but in situations where one CTDB cluster spans multiple
       physical sites it might be useful to have one LVS group for each
       site. There can be multiple LVS groups in a cluster but each node can
       only be member of one LVS group.

       Client access to the cluster is load-balanced across the HEALTHY
       nodes in an LVS group. If no HEALTHY nodes exist then all nodes in
       the group are used, regardless of health status. CTDB will, however,
       never load-balance LVS traffic to nodes that are BANNED, STOPPED,
       DISABLED or DISCONNECTED. The ctdb lvs command is used to show which
       nodes are currently load-balanced across.

       In each LVS group, one of the nodes is selected by CTDB to be the LVS
       master. This node receives all traffic from clients coming in to the
       LVS public address and multiplexes it across the internal network to
       one of the nodes that LVS is using. When responding to the client,
       that node will send the data back directly to the client, bypassing
       the LVS master node. The command ctdb lvs master will show which node
       is the current LVS master.

       The path used for a client I/O is:

        1. Client sends request packet to LVSMASTER.

        2. LVSMASTER passes the request on to one node across the internal
           network.

        3. Selected node processes the request.

        4. Node responds back to client.

       This means that all incoming traffic to the cluster will pass through
       one physical node, which limits scalability. You cannot send more
       data to the LVS address than one physical node can multiplex. This
       means that you should not use LVS if your I/O pattern is
       write-intensive since you will be limited in the available network
       bandwidth that node can handle. LVS does work very well for
       read-intensive workloads where only smallish READ requests are going
       through the LVSMASTER bottleneck and the majority of the traffic
       volume (the data in the read replies) goes straight from the
       processing node back to the clients. For read-intensive I/O patterns
       you can achieve very high throughput rates in this mode.

       Note: you can use LVS and public addresses at the same time.

       If you use LVS, you must have a permanent address configured for the
       public interface on each node. This address must be routable and the
       cluster nodes must be configured so that all traffic back to client
       hosts is routed through this interface. This is also required in
       order to allow samba/winbind on the node to talk to the domain
       controller. This LVS IP address can not be used to initiate outgoing
       traffic.

       Make sure that the domain controller and the clients are reachable
       from a node before you enable LVS. Also ensure that outgoing traffic
       to these hosts is routed out through the configured public interface.

   Configuration
       To activate LVS on a CTDB node you must specify the
       CTDB_LVS_PUBLIC_IFACE, CTDB_LVS_PUBLIC_IP and CTDB_LVS_NODES
       configuration variables. CTDB_LVS_NODES specifies a file containing
       the private address of all nodes in the current node's LVS group.

       Example:

           CTDB_LVS_PUBLIC_IFACE=eth1
           CTDB_LVS_PUBLIC_IP=10.1.1.237
           CTDB_LVS_NODES=/etc/ctdb/lvs_nodes

       Example /etc/ctdb/lvs_nodes:

           192.168.1.2
           192.168.1.3
           192.168.1.4

       Normally any node in an LVS group can act as the LVS master. Nodes
       that are highly loaded due to other demands may be flagged with the
       "slave-only" option in the CTDB_LVS_NODES file to limit the LVS
       functionality of those nodes.

       LVS nodes file that excludes 192.168.1.4 from being the LVS master
       node:

           192.168.1.2
           192.168.1.3
           192.168.1.4 slave-only

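       Once LVS is configured, the commands mentioned above can be used to
       verify the setup; the following is a usage sketch only:

           # Show the nodes that LVS traffic is currently balanced across
           ctdb lvs

           # Show which node is currently acting as the LVS master
           ctdb lvs master
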
TRACKING AND RESETTING TCP CONNECTIONS
       CTDB tracks TCP connections from clients to public IP addresses, on
       known ports. When an IP address moves from one node to another, all
       existing TCP connections to that IP address are reset. The node
       taking over this IP address will also send gratuitous ARPs (for IPv4,
       or neighbour advertisement, for IPv6). This allows clients to
       reconnect quickly, rather than waiting for TCP timeouts, which can be
       very long.

       It is important that established TCP connections do not survive a
       release and take of a public IP address on the same node. Such
       connections can get out of sync with sequence and ACK numbers,
       potentially causing a disruptive ACK storm.

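       The connections that CTDB is tracking for a given public address can
       usually be listed with the gettickles subcommand; treat the following
       as a sketch and check ctdb(1) for the exact syntax in your version:

           # List tracked ("tickle") TCP connections for a public address
           ctdb gettickles 10.1.1.1
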
NAT GATEWAY
       NAT gateway (NATGW) is an optional feature that is used to configure
       fallback routing for nodes. This allows cluster nodes to connect to
       external services (e.g. DNS, AD, NIS and LDAP) when they do not host
       any public addresses (e.g. when they are unhealthy).

       This also applies to node startup because CTDB marks nodes as
       UNHEALTHY until they have passed a "monitor" event. In this context,
       NAT gateway helps to avoid a "chicken and egg" situation where a node
       needs to access an external service to become healthy.

       Another way of solving this type of problem is to assign an extra
       static IP address to a public interface on every node. This is
       simpler but it uses an extra IP address per node, while NAT gateway
       generally uses only one extra IP address.

   Operation
       One extra NATGW public address is assigned on the public network to
       each NATGW group. Each NATGW group is a set of nodes in the cluster
       that shares the same NATGW address to talk to the outside world.
       Normally there would only be one NATGW group spanning an entire
       cluster, but in situations where one CTDB cluster spans multiple
       physical sites it might be useful to have one NATGW group for each
       site.

       There can be multiple NATGW groups in a cluster but each node can
       only be member of one NATGW group.

       In each NATGW group, one of the nodes is selected by CTDB to be the
       NATGW master and the other nodes are considered to be NATGW slaves.
       NATGW slaves establish a fallback default route to the NATGW master
       via the private network. When a NATGW slave hosts no public IP
       addresses then it will use this route for outbound connections. The
       NATGW master hosts the NATGW public IP address and routes outgoing
       connections from slave nodes via this IP address. It also establishes
       a fallback default route.

   Configuration
       NATGW is usually configured similarly to the following example
       configuration:

           CTDB_NATGW_NODES=/etc/ctdb/natgw_nodes
           CTDB_NATGW_PRIVATE_NETWORK=192.168.1.0/24
           CTDB_NATGW_PUBLIC_IP=10.0.0.227/24
           CTDB_NATGW_PUBLIC_IFACE=eth0
           CTDB_NATGW_DEFAULT_GATEWAY=10.0.0.1

       Normally any node in a NATGW group can act as the NATGW master. Some
       configurations may have special nodes that lack connectivity to a
       public network. In such cases, those nodes can be flagged with the
       "slave-only" option in the CTDB_NATGW_NODES file to limit the NATGW
       functionality of those nodes.

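       The CTDB_NATGW_NODES file lists the private addresses of the nodes in
       the NATGW group, one per line. A hypothetical /etc/ctdb/natgw_nodes
       that keeps 192.168.1.4 from becoming the NATGW master might look like
       this (an illustrative sketch that mirrors the CTDB_LVS_NODES format
       described above):

           192.168.1.2
           192.168.1.3
           192.168.1.4 slave-only
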
       See the NAT GATEWAY section in ctdbd.conf(5) for more details of
       NATGW configuration.

   Implementation details
       When the NATGW functionality is used, one of the nodes is selected to
       act as a NAT gateway for all the other nodes in the group when they
       need to communicate with the external services. The NATGW master is
       selected to be a node that is most likely to have usable networks.

       The NATGW master hosts the NATGW public IP address
       CTDB_NATGW_PUBLIC_IP on the configured public interface
       CTDB_NATGW_PUBLIC_IFACE and acts as a router, masquerading outgoing
       connections from slave nodes via this IP address. If
       CTDB_NATGW_DEFAULT_GATEWAY is set then it also establishes a fallback
       default route to this configured gateway with a metric of 10. A
       metric 10 route is used so it can co-exist with other default routes
       that may be available.

       A NATGW slave establishes its fallback default route to the NATGW
       master via the private network CTDB_NATGW_PRIVATE_NETWORK with a
       metric of 10. This route is used for outbound connections when no
       other default route is available because the node hosts no public
       addresses. A metric 10 route is used so that it can co-exist with
       other default routes that may be available when the node is hosting
       public addresses.

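       Conceptually, the resulting routing state is roughly equivalent to
       the following commands. This is only a sketch of the behaviour
       described above, not the literal commands run by the 11.natgw
       eventscript, and it reuses the example addresses from the
       configuration shown earlier:

           # On the NATGW master (when CTDB_NATGW_DEFAULT_GATEWAY is set)
           ip route add default via 10.0.0.1 metric 10

           # On a NATGW slave, via the master's private address (example)
           ip route add default via 192.168.1.2 metric 10
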
       CTDB_NATGW_STATIC_ROUTES can be used to have NATGW create more
       specific routes instead of just default routes.

       This is implemented in the 11.natgw eventscript. Please see the
       eventscript file and the NAT GATEWAY section in ctdbd.conf(5) for
       more details.

POLICY ROUTING
       Policy routing is an optional CTDB feature to support complex network
       topologies. Public addresses may be spread across several different
       networks (or VLANs) and it may not be possible to route packets from
       these public addresses via the system's default route. Therefore,
       CTDB has support for policy routing via the 13.per_ip_routing
       eventscript. This allows routing to be specified for packets sourced
       from each public address. The routes are added and removed as CTDB
       moves public addresses between nodes.

   Configuration variables
       There are 4 configuration variables related to policy routing:
       CTDB_PER_IP_ROUTING_CONF, CTDB_PER_IP_ROUTING_RULE_PREF,
       CTDB_PER_IP_ROUTING_TABLE_ID_LOW, CTDB_PER_IP_ROUTING_TABLE_ID_HIGH.
       See the POLICY ROUTING section in ctdbd.conf(5) for more details.

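       For example, a configuration consistent with the sample further below
       might look like this. The table ID range is an arbitrary illustrative
       choice; see ctdbd.conf(5) for the meaning of each variable:

           CTDB_PER_IP_ROUTING_CONF=/etc/ctdb/policy_routing
           CTDB_PER_IP_ROUTING_RULE_PREF=100
           CTDB_PER_IP_ROUTING_TABLE_ID_LOW=1000
           CTDB_PER_IP_ROUTING_TABLE_ID_HIGH=9000
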
   Configuration
       The format of each line of CTDB_PER_IP_ROUTING_CONF is:

           <public_address> <network> [ <gateway> ]

       Leading whitespace is ignored and arbitrary whitespace may be used as
       a separator. Lines that have a "public address" item that doesn't
       match an actual public address are ignored. This means that comment
       lines can be added using a leading character such as '#', since this
       will never match an IP address.

       A line without a gateway indicates a link local route.

       For example, consider the configuration line:

           192.168.1.99 192.168.1.1/24

       If the corresponding public_addresses line is:

           192.168.1.99/24 eth2,eth3

       CTDB_PER_IP_ROUTING_RULE_PREF is 100, and CTDB adds the address to
       eth2 then the following routing information is added:

           ip rule add from 192.168.1.99 pref 100 table ctdb.192.168.1.99
           ip route add 192.168.1.0/24 dev eth2 table ctdb.192.168.1.99

       This causes traffic from 192.168.1.99 to 192.168.1.0/24 to go via
       eth2.

       The ip rule command will show (something like - depending on other
       public addresses and other routes on the system):

           0:      from all lookup local
           100:    from 192.168.1.99 lookup ctdb.192.168.1.99
           32766:  from all lookup main
           32767:  from all lookup default

       ip route show table ctdb.192.168.1.99 will show:

           192.168.1.0/24 dev eth2 scope link

       The usual use for a line containing a gateway is to add a default
       route corresponding to a particular source address. Consider this
       line of configuration:

           192.168.1.99 0.0.0.0/0 192.168.1.1

       In the situation described above this will cause an extra routing
       command to be executed:

           ip route add 0.0.0.0/0 via 192.168.1.1 dev eth2 table ctdb.192.168.1.99

       With both configuration lines, ip route show table ctdb.192.168.1.99
       will show:

           192.168.1.0/24 dev eth2 scope link
           default via 192.168.1.1 dev eth2

   Sample configuration
       Here is a more complete example configuration.

       /etc/ctdb/public_addresses:

           192.168.1.98 eth2,eth3
           192.168.1.99 eth2,eth3

       /etc/ctdb/policy_routing:

           192.168.1.98 192.168.1.0/24
           192.168.1.98 192.168.200.0/24 192.168.1.254
           192.168.1.98 0.0.0.0/0 192.168.1.1
           192.168.1.99 192.168.1.0/24
           192.168.1.99 192.168.200.0/24 192.168.1.254
           192.168.1.99 0.0.0.0/0 192.168.1.1

       This routes local packets as expected, the default route is as
       previously discussed, but packets to 192.168.200.0/24 are routed via
       the alternate gateway 192.168.1.254.

NOTIFICATION SCRIPT
       When certain state changes occur in CTDB, it can be configured to
       perform arbitrary actions via a notification script. For example, it
       can send an SNMP trap or an email when a node becomes unhealthy.

       This is activated by setting the CTDB_NOTIFY_SCRIPT configuration
       variable. The specified script must be executable.

       Use of the provided /etc/ctdb/notify.sh script is recommended. It
       executes files in /etc/ctdb/notify.d/.

       CTDB currently generates notifications after CTDB changes to these
       states:

           init
           setup
           startup
           healthy
           unhealthy

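       A notification handler dropped into /etc/ctdb/notify.d/ could look
       like the sketch below. It assumes the state name is passed as the
       first argument, which should be verified against the notify.sh
       shipped with your version; the email address is a placeholder.

           #!/bin/sh
           # /etc/ctdb/notify.d/99-mail-unhealthy (hypothetical example)
           event="$1"
           if [ "$event" = "unhealthy" ]; then
               mail -s "CTDB node $(hostname) is unhealthy" admin@example.com </dev/null
           fi
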
DEBUG LEVELS
       Valid values for DEBUGLEVEL are:

           ERR
           WARNING
           NOTICE
           INFO
           DEBUG

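       The debug level is normally set in the CTDB configuration, but it can
       also be inspected and changed on a running node. The commands below
       are a usage sketch; see ctdb(1) for details:

           # Show the current debug level of the local ctdbd
           ctdb getdebug

           # Raise the debug level temporarily while investigating a problem
           ctdb setdebug INFO
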
REMOTE CLUSTER NODES
       It is possible to have a CTDB cluster that spans across a WAN link.
       For example, where you have a CTDB cluster in your datacentre but you
       also want to have one additional CTDB node located at a remote branch
       site. This is similar to how a WAN accelerator works, but with the
       difference that while a WAN accelerator often acts as a proxy or a
       MitM, in the CTDB remote cluster node configuration the Samba
       instance at the remote site IS the genuine server, not a proxy and
       not a MitM, and thus provides 100% correct CIFS semantics to clients.

       Think of the cluster as one single multihomed Samba server where one
       of the NICs (the remote node) is very far away.

       NOTE: This does require that the cluster filesystem you use can cope
       with WAN-link latencies. Not all cluster filesystems can handle
       WAN-link latencies! Whether this will provide very good
       WAN-accelerator performance or it will perform very poorly depends
       entirely on how optimized your cluster filesystem is in handling high
       latency for data and metadata operations.

       To activate a node as being a remote cluster node you need to set the
       following two parameters in /etc/sysconfig/ctdb for the remote node:

           CTDB_CAPABILITY_LMASTER=no
           CTDB_CAPABILITY_RECMASTER=no

       Verify with the command "ctdb getcapabilities" that the node no
       longer has the recmaster or the lmaster capabilities.

SEE ALSO
       ctdb(1), ctdbd(1), ctdbd_wrapper(1), ctdb_diagnostics(1),
       ltdbtool(1), onnode(1), ping_pong(1), ctdbd.conf(5),
       ctdb-statistics(7), ctdb-tunables(7), http://ctdb.samba.org/

AUTHOR
       This documentation was written by Ronnie Sahlberg, Amitay Isaacs,
       Martin Schwenke

COPYRIGHT/LICENSE
       Copyright © 2007 Andrew Tridgell, Ronnie Sahlberg

       This program is free software; you can redistribute it and/or modify
       it under the terms of the GNU General Public License as published by
       the Free Software Foundation; either version 3 of the License, or (at
       your option) any later version.

       This program is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
       General Public License for more details.

       You should have received a copy of the GNU General Public License
       along with this program; if not, see http://www.gnu.org/licenses.

ctdb                              10/30/2018                           CTDB(7)