1CTDB(1) CTDB - clustered TDB database CTDB(1)
2
3
4
6 ctdb - CTDB management utility
7
9 ctdb [OPTION...] {COMMAND} [COMMAND-ARGS]
10
12 ctdb is a utility to view and manage a CTDB cluster.
13
14 The following terms are used when referring to nodes in a cluster:
15
16 PNN
17 Physical Node Number. The physical node number is an integer that
18 describes the node in the cluster. The first node has physical node
19 number 0.
20
21 PNN-LIST
22 This is either a single PNN, a comma-separated list of PNNs or
23 "all".
24
25 Commands that reference a database have a DB argument. This is either a
26 database name, such as locking.tdb or a database ID such as
27 "0x42fe72c5".
28
30 -n PNN-LIST
31 The nodes specified by PNN-LIST should be queried for the requested
32 information. Default is to query the daemon running on the local
33 host.
34
35 -Y
36 Produce output in machine readable form for easier parsing by
37 scripts. Not all commands support this option.
38
39 -t TIMEOUT
40 Indicates that ctdb should wait up to TIMEOUT seconds for a
41 response to most commands sent to the CTDB daemon. The default is
42 10 seconds.
43
44 -T TIMELIMIT
45 Indicates that TIMELIMIT is the maximum run time (in seconds) for
46 the ctdb command. When TIMELIMIT is exceeded the ctdb command will
47 terminate with an error. The default is 120 seconds.
48
49 -? --help
50 Print some help text to the screen.
51
52 --usage
53 Print usage information to the screen.
54
55 -d --debug=DEBUGLEVEL
56 Change the debug level for the command. Default is ERR (0).
57
58 --socket=FILENAME
59 Specify that FILENAME is the name of the Unix domain socket to use
60 when connecting to the local CTDB daemon. The default is
61 /tmp/ctdb.socket.
62
64 These are commands used to monitor and administer a CTDB cluster.
65
66 pnn
67 This command displays the PNN of the current node.
68
69 xpnn
70 This command displays the PNN of the current node without contacting
71 the CTDB daemon. It parses the nodes file directly, so can produce
72 unexpected output if the nodes file has been edited but has not been
73 reloaded.
74
75 status
76 This command shows the current status of all CTDB nodes based on
77 information from the queried node.
78
79 Note: If the queried node is INACTIVE then the status might not be
80 current.
81
82 Node status
83 This includes the number of physical nodes and the status of each
84 node. See ctdb(7) for information about node states.
85
86 Generation
87 The generation id is a number that indicates the current generation
88 of a cluster instance. Each time a cluster goes through a
89 reconfiguration or a recovery its generation id will be changed.
90
91 This number does not have any particular meaning other than to keep
92 track of when a cluster has gone through a recovery. It is a random
93 number that represents the current instance of a ctdb cluster and
94 its databases. The CTDB daemon uses this number internally to be
95 able to tell when commands to operate on the cluster and the
96 databases were issued in a different generation of the cluster, to
97 ensure that commands that operate on the databases will not survive
98 across a cluster database recovery. After a recovery, all old
99 outstanding commands will automatically become invalid.
100
101 Sometimes this number will be shown as "INVALID". This only means
102 that the ctdbd daemon has started but it has not yet merged with
103 the cluster through a recovery. All nodes start with generation
104 "INVALID" and are not assigned a real generation id until they have
105 successfully been merged with a cluster through a recovery.
106
107 Virtual Node Number (VNN) map
108 Consists of the number of virtual nodes and mapping from virtual
109 node numbers to physical node numbers. Virtual nodes host CTDB
110 databases. Only nodes that are participating in the VNN map can
111 become lmaster or dmaster for database records.
112
113 Recovery mode
114 This is the current recovery mode of the cluster. There are two
115 possible modes:
116
117 NORMAL - The cluster is fully operational.
118
119 RECOVERY - The cluster databases have all been frozen, pausing all
120 services while the cluster awaits a recovery process to complete. A
121 recovery process should finish within seconds. If a cluster is
122 stuck in the RECOVERY state this would indicate a cluster
123 malfunction which needs to be investigated.
124
125 Once the recovery master detects an inconsistency, for example a
126 node becomes disconnected/connected, the recovery daemon will
127 trigger a cluster recovery process, where all databases are
128 remerged across the cluster. When this process starts, the recovery
129 master will first "freeze" all databases to prevent applications
130 such as samba from accessing the databases and it will also mark
131 the recovery mode as RECOVERY.
132
133 When the CTDB daemon starts up, it will start in RECOVERY mode.
134 Once the node has been merged into a cluster and all databases have
135 been recovered, the node mode will change into NORMAL mode and the
136 databases will be "thawed", allowing samba to access the databases
137 again.
138
139 Recovery master
140 This is the cluster node that is currently designated as the
141 recovery master. This node is responsible for monitoring the
142 consistency of the cluster and for performing the actual recovery
143 process when required.
144
145 Only one node at a time can be the designated recovery master.
146 Which node is designated the recovery master is decided by an
147 election process in the recovery daemons running on each node.
148
149 Example
150 # ctdb status
151 Number of nodes:4
152 pnn:0 192.168.2.200 OK (THIS NODE)
153 pnn:1 192.168.2.201 OK
154 pnn:2 192.168.2.202 OK
155 pnn:3 192.168.2.203 OK
156 Generation:1362079228
157 Size:4
158 hash:0 lmaster:0
159 hash:1 lmaster:1
160 hash:2 lmaster:2
161 hash:3 lmaster:3
162 Recovery mode:NORMAL (0)
163 Recovery master:0
164
165
166 nodestatus [PNN-LIST]
167 This command is similar to the status command. It displays the "node
168 status" subset of output. The main differences are:
169
170 · The exit code is the bitwise-OR of the flags for each specified
171 node, while ctdb status exits with 0 if it was able to retrieve
172 status for all nodes.
173
174 · ctdb status provides status information for all nodes. ctdb
175 nodestatus defaults to providing status for only the current node.
176 If PNN-LIST is provided then status is given for the indicated
177 node(s).
178
179 By default, ctdb nodestatus gathers status from the local node.
180 However, if invoked with "-n all" (or similar) then status is
181 gathered from the given node(s). In particular ctdb nodestatus all
182 and ctdb nodestatus -n all will produce different output. It is
183 possible to provide 2 different nodespecs (with and without "-n")
184 but the output is usually confusing!
185
186 A common invocation in scripts is ctdb nodestatus all to check whether
187 all nodes in a cluster are healthy.
188
189 Example
190 # ctdb nodestatus
191 pnn:0 10.0.0.30 OK (THIS NODE)
192
193 # ctdb nodestatus all
194 Number of nodes:2
195 pnn:0 10.0.0.30 OK (THIS NODE)
196 pnn:1 10.0.0.31 OK
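
The scripted health check described above can be sketched as a small shell function. This is a minimal sketch that relies only on the documented exit code (the bitwise-OR of the node flags); the function name cluster_is_healthy is hypothetical, and the assumption that an exit code of 0 means no flags are set on any node should be verified against ctdb(7). A non-zero result can also mean that the daemon could not be contacted.

```shell
# Hypothetical helper: succeed only if "ctdb nodestatus all" exits with 0.
# Assumption: exit code 0 means no node has any flag set.
cluster_is_healthy() {
    ctdb nodestatus all >/dev/null 2>&1
}
```

A caller might use it as: cluster_is_healthy || echo "cluster needs attention" >&2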
197
198
199 recmaster
200 This command shows the pnn of the node which is currently the
201 recmaster.
202
203 Note: If the queried node is INACTIVE then the status might not be
204 current.
205
206 uptime
207 This command shows the uptime for the ctdb daemon, when the last
208 recovery or ip-failover completed, and how long it took. If the
209 "duration" is shown as a negative number, this indicates that there is
210 a recovery/failover in progress and it started that many seconds ago.
211
212 Example
213 # ctdb uptime
214 Current time of node : Thu Oct 29 10:38:54 2009
215 Ctdbd start time : (000 16:54:28) Wed Oct 28 17:44:26 2009
216 Time of last recovery/failover: (000 16:53:31) Wed Oct 28 17:45:23 2009
217 Duration of last recovery/failover: 2.248552 seconds
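
The negative-duration rule can be checked from a script. The following is a minimal sketch that assumes the output layout shown in the example above, with the duration as the second-to-last word on its line. The function name recovery_in_progress is hypothetical.

```shell
# Hypothetical helper: reads "ctdb uptime" output on stdin and succeeds
# if the reported duration is negative (recovery/failover in progress).
# Assumption: the duration is the second-to-last word on the line.
recovery_in_progress() {
    awk '/Duration of last recovery\/failover/ {
             if ($(NF - 1) + 0 < 0) neg = 1
         }
         END { exit !neg }'
}
```

Typical use would be: ctdb uptime | recovery_in_progress && echo "recovery running"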
218
219
220 listnodes
221 This command lists the ip addresses of all the nodes in the
222 cluster.
223
224 Example
225 # ctdb listnodes
226 192.168.2.200
227 192.168.2.201
228 192.168.2.202
229 192.168.2.203
230
231
232 natgwlist
233 Show the current NAT gateway master and the status of all nodes in the
234 current NAT gateway group. See the NAT GATEWAY section in ctdb(7) for
235 more details.
236
237 Example
238 # ctdb natgwlist
239 0 192.168.2.200
240 Number of nodes:4
241 pnn:0 192.168.2.200 OK (THIS NODE)
242 pnn:1 192.168.2.201 OK
243 pnn:2 192.168.2.202 OK
244 pnn:3 192.168.2.203 OK
245
246
247 ping
248 This command will "ping" specified CTDB nodes in the cluster to verify
249 that they are running.
250
251 Example
252 # ctdb ping -n all
253 response from 0 time=0.000054 sec (3 clients)
254 response from 1 time=0.000144 sec (2 clients)
255 response from 2 time=0.000105 sec (2 clients)
256 response from 3 time=0.000114 sec (2 clients)
257
258
259 ifaces
260 This command will display the list of network interfaces, which could
261 host public addresses, along with their status.
262
263 Example
264 # ctdb ifaces
265 Interfaces on node 0
266 name:eth5 link:up references:2
267 name:eth4 link:down references:0
268 name:eth3 link:up references:1
269 name:eth2 link:up references:1
270
271 # ctdb ifaces -Y
272 :Name:LinkStatus:References:
273 :eth5:1:2
274 :eth4:0:0
275 :eth3:1:1
276 :eth2:1:1
277
278
279 ip
280 This command will display the list of public addresses that are
281 provided by the cluster and which physical node is currently serving
282 this ip. By default this command will ONLY show those public addresses
283 that are known to the node itself. To see the full list of all public
284 ips across the cluster you must use "ctdb ip -n all".
285
286 Example
287 # ctdb ip
288 Public IPs on node 0
289 172.31.91.82 node[1] active[] available[eth2,eth3] configured[eth2,eth3]
290 172.31.91.83 node[0] active[eth3] available[eth2,eth3] configured[eth2,eth3]
291 172.31.91.84 node[1] active[] available[eth2,eth3] configured[eth2,eth3]
292 172.31.91.85 node[0] active[eth2] available[eth2,eth3] configured[eth2,eth3]
293 172.31.92.82 node[1] active[] available[eth5] configured[eth4,eth5]
294 172.31.92.83 node[0] active[eth5] available[eth5] configured[eth4,eth5]
295 172.31.92.84 node[1] active[] available[eth5] configured[eth4,eth5]
296 172.31.92.85 node[0] active[eth5] available[eth5] configured[eth4,eth5]
297
298 # ctdb ip -Y
299 :Public IP:Node:ActiveInterface:AvailableInterfaces:ConfiguredInterfaces:
300 :172.31.91.82:1::eth2,eth3:eth2,eth3:
301 :172.31.91.83:0:eth3:eth2,eth3:eth2,eth3:
302 :172.31.91.84:1::eth2,eth3:eth2,eth3:
303 :172.31.91.85:0:eth2:eth2,eth3:eth2,eth3:
304 :172.31.92.82:1::eth5:eth4,eth5:
305 :172.31.92.83:0:eth5:eth5:eth4,eth5:
306 :172.31.92.84:1::eth5:eth4,eth5:
307 :172.31.92.85:0:eth5:eth5:eth4,eth5:
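
The -Y output above is intended for scripts. As a minimal sketch, the colon-separated fields can be filtered with awk to list the public addresses currently served by one node. The function name ips_on_node is hypothetical, and the field positions are taken from the header line shown above. This sketch assumes IPv4 addresses, since an IPv6 address would itself contain colons.

```shell
# Hypothetical helper: reads "ctdb ip -Y" output on stdin and prints the
# public addresses whose Node field equals the PNN given as $1.
# Fields (after the leading ':'): $2 = Public IP, $3 = Node.
ips_on_node() {
    awk -F: -v node="$1" '$3 == node && $2 != "" { print $2 }'
}
```

Typical use would be: ctdb ip -Y -n all | ips_on_node 0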
308
309
310 ipinfo IP
311 This command will display details about the specified public address.
312
313 Example
314 # ctdb ipinfo 172.31.92.85
315 Public IP[172.31.92.85] info on node 0
316 IP:172.31.92.85
317 CurrentNode:0
318 NumInterfaces:2
319 Interface[1]: Name:eth4 Link:down References:0
320 Interface[2]: Name:eth5 Link:up References:2 (active)
321
322
323 scriptstatus
324 This command displays which scripts were run in the previous
325 monitoring cycle and the result of each script. If a script failed with
326 an error, causing the node to become unhealthy, the output from that
327 script is also shown.
328
329 Example
330 # ctdb scriptstatus
331 7 scripts were executed last monitoring cycle
332 00.ctdb Status:OK Duration:0.056 Tue Mar 24 18:56:57 2009
333 10.interface Status:OK Duration:0.077 Tue Mar 24 18:56:57 2009
334 11.natgw Status:OK Duration:0.039 Tue Mar 24 18:56:57 2009
335 20.multipathd Status:OK Duration:0.038 Tue Mar 24 18:56:57 2009
336 31.clamd Status:DISABLED
337 40.vsftpd Status:OK Duration:0.045 Tue Mar 24 18:56:57 2009
338 41.httpd Status:OK Duration:0.039 Tue Mar 24 18:56:57 2009
339 50.samba Status:ERROR Duration:0.082 Tue Mar 24 18:56:57 2009
340 OUTPUT:ERROR: Samba tcp port 445 is not responding
341
342
343 disablescript SCRIPT
344 This command is used to disable an eventscript.
345
346 This will take effect the next time the eventscripts are being executed
347 so it can take a short while until this is reflected in 'scriptstatus'.
348
349 enablescript SCRIPT
350 This command is used to enable an eventscript.
351
352 This will take effect the next time the eventscripts are being executed
353 so it can take a short while until this is reflected in 'scriptstatus'.
354
355 listvars
356 List all tunable variables, except the values of the obsolete tunables
357 like VacuumMinInterval. The obsolete tunables can be retrieved only
358 explicitly with the "ctdb getvar" command.
359
360 Example
361 # ctdb listvars
362 MaxRedirectCount = 3
363 SeqnumInterval = 1000
364 ControlTimeout = 60
365 TraverseTimeout = 20
366 KeepaliveInterval = 5
367 KeepaliveLimit = 5
368 RecoverTimeout = 20
369 RecoverInterval = 1
370 ElectionTimeout = 3
371 TakeoverTimeout = 9
372 MonitorInterval = 15
373 TickleUpdateInterval = 20
374 EventScriptTimeout = 30
375 EventScriptTimeoutCount = 1
376 RecoveryGracePeriod = 120
377 RecoveryBanPeriod = 300
378 DatabaseHashSize = 100001
379 DatabaseMaxDead = 5
380 RerecoveryTimeout = 10
381 EnableBans = 1
382 DeterministicIPs = 0
383 LCP2PublicIPs = 1
384 ReclockPingPeriod = 60
385 NoIPFailback = 0
386 DisableIPFailover = 0
387 VerboseMemoryNames = 0
388 RecdPingTimeout = 60
389 RecdFailCount = 10
390 LogLatencyMs = 0
391 RecLockLatencyMs = 1000
392 RecoveryDropAllIPs = 120
393 VerifyRecoveryLock = 1
394 VacuumInterval = 10
395 VacuumMaxRunTime = 30
396 RepackLimit = 10000
397 VacuumLimit = 5000
398 VacuumFastPathCount = 60
399 MaxQueueDropMsg = 1000000
400 UseStatusEvents = 0
401 AllowUnhealthyDBRead = 0
402 StatHistoryInterval = 1
403 DeferredAttachTO = 120
404 AllowClientDBAttach = 1
405 RecoverPDBBySeqNum = 0
406
407
408 getvar NAME
409 Get the runtime value of a tuneable variable.
410
411 Example
412 # ctdb getvar MaxRedirectCount
413 MaxRedirectCount = 3
414
415
416 setvar NAME VALUE
417 Set the runtime value of a tuneable variable.
418
419 Example: ctdb setvar MaxRedirectCount 5
420
421 lvsmaster
422 This command shows which node is currently the LVSMASTER. The LVSMASTER
423 is the node in the cluster which drives the LVS system and which
424 receives all incoming traffic from clients.
425
426 LVS is the mode where the entire CTDB/Samba cluster uses a single ip
427 address for the entire cluster. In this mode all clients connect to one
428 specific node which will then multiplex/loadbalance the clients evenly
429 onto the other nodes in the cluster. This is an alternative to using
430 public ip addresses. See the manpage for ctdbd for more information
431 about LVS.
432
433 lvs
434 This command shows which nodes in the cluster are currently active in
435 the LVS configuration, i.e. the nodes across which the single ip
436 address is currently being loadbalanced.
437
438 LVS will by default only loadbalance across those nodes that are both
439 LVS capable and also HEALTHY, unless all nodes are UNHEALTHY, in
440 which case LVS will loadbalance across all UNHEALTHY nodes as well. LVS
441 will never use nodes that are DISCONNECTED, STOPPED, BANNED or
442 DISABLED.
443
444 Example output:
445
446 2:10.0.0.13
447 3:10.0.0.14
448
449
450 getcapabilities
451 This command shows the capabilities of the current node. See the
452 CAPABILITIES section in ctdb(7) for more details.
453
454 Example output:
455
456 RECMASTER: YES
457 LMASTER: YES
458 LVS: NO
459 NATGW: YES
460
461
462 statistics
463 Collect statistics from the CTDB daemon about how many calls it has
464 served.
465
466 Example
467 # ctdb statistics
468 CTDB version 1
469 num_clients 3
470 frozen 0
471 recovering 0
472 client_packets_sent 360489
473 client_packets_recv 360466
474 node_packets_sent 480931
475 node_packets_recv 240120
476 keepalive_packets_sent 4
477 keepalive_packets_recv 3
478 node
479 req_call 2
480 reply_call 2
481 req_dmaster 0
482 reply_dmaster 0
483 reply_error 0
484 req_message 42
485 req_control 120408
486 reply_control 360439
487 client
488 req_call 2
489 req_message 24
490 req_control 360440
491 timeouts
492 call 0
493 control 0
494 traverse 0
495 total_calls 2
496 pending_calls 0
497 lockwait_calls 0
498 pending_lockwait_calls 0
499 memory_used 5040
500 max_hop_count 0
501 max_call_latency 4.948321 sec
502 max_lockwait_latency 0.000000 sec
503
504
505 statisticsreset
506 This command is used to clear all statistics counters in a node.
507
508 Example: ctdb statisticsreset
509
510 dbstatistics DB
511 Display statistics about the database DB.
512
513 Example
514 # ctdb dbstatistics locking.tdb
515 DB Statistics: locking.tdb
516 ro_delegations 0
517 ro_revokes 0
518 locks
519 total 14356
520 failed 0
521 current 0
522 pending 0
523 hop_count_buckets: 28087 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0
524 lock_buckets: 0 14188 38 76 32 19 3 0 0 0 0 0 0 0 0 0
525 locks_latency MIN/AVG/MAX 0.001066/0.012686/4.202292 sec out of 14356
526 Num Hot Keys: 1
527 Count:8 Key:ff5bd7cb3ee3822edc1f0000000000000000000000000000
528
529
530 getreclock
531 This command is used to show the filename of the reclock file that is
532 used.
533
534 Example output:
535
536 Reclock file:/gpfs/.ctdb/shared
537
538
539 setreclock [filename]
540 This command is used to modify, or clear, the file that is used as the
541 reclock file at runtime. When this command is used, the reclock file
542 checks are disabled. To re-enable the checks the administrator needs to
543 activate the "VerifyRecoveryLock" tunable using "ctdb setvar".
544
545 If run with no parameter this will remove the reclock file completely.
546 If run with a parameter the parameter specifies the new filename to use
547 for the recovery lock.
548
549 This command only affects the runtime settings of a ctdb node and will
550 be lost when ctdb is restarted. For persistent changes to the reclock
551 file setting you must edit /etc/sysconfig/ctdb.
552
553 getdebug
554 Get the current debug level for the node. The debug level controls what
555 information is written to the log file.
556
557 The debug levels are mapped to the corresponding syslog levels. When a
558 debug level is set, only those messages at that level and higher levels
559 will be printed.
560
561 The debug levels, from highest to lowest, are:
562
563 EMERG ALERT CRIT ERR WARNING NOTICE INFO DEBUG
564
565 setdebug DEBUGLEVEL
566 Set the debug level of a node. This controls what information will be
567 logged.
568
569 The debuglevel is one of EMERG ALERT CRIT ERR WARNING NOTICE INFO DEBUG
570
571 getpid
572 This command will return the process id of the ctdb daemon.
573
574 disable
575 This command is used to administratively disable a node in the cluster.
576 A disabled node will still participate in the cluster and host
577 clustered TDB records but its public ip address has been taken over by
578 a different node and it no longer hosts any services.
579
580 enable
581 Re-enable a node that has been administratively disabled.
582
583 stop
584 This command is used to administratively STOP a node in the cluster. A
585 STOPPED node is connected to the cluster but will not host any public
586 ip address, nor does it participate in the VNNMAP. The difference
587 between a DISABLED node and a STOPPED node is that a STOPPED node does
588 not host any parts of the database which means that a recovery is
589 required to stop/continue nodes.
590
591 continue
592 Re-start a node that has been administratively stopped.
593
594 addip IPADDR/mask IFACE
595 This command is used to add a new public ip to a node during runtime.
596 This allows public addresses to be added to a cluster without having to
597 restart the ctdb daemons.
598
599 Note that this only updates the runtime instance of ctdb. Any changes
600 will be lost next time ctdb is restarted and the public addresses file
601 is re-read. If you want this change to be permanent you must also
602 update the public addresses file manually.
603
604 delip IPADDR
605 This command is used to remove a public ip from a node during runtime.
606 If this public ip is currently hosted by the node it is being removed
607 from, the ip will first be failed over to another node, if possible,
608 before it is removed.
609
610 Note that this only updates the runtime instance of ctdb. Any changes
611 will be lost next time ctdb is restarted and the public addresses file
612 is re-read. If you want this change to be permanent you must also
613 update the public addresses file manually.
614
615 moveip IPADDR PNN
616 This command can be used to manually fail a public ip address to a
617 specific node.
618
619 In order to manually override the "automatic" distribution of public ip
620 addresses that ctdb normally provides, this command only works when you
621 have changed the tunables for the daemon to:
622
623 DeterministicIPs = 0
624
625 NoIPFailback = 1
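
The tunable settings and the move can be combined into one helper. This is a minimal sketch that uses only the setvar and moveip syntax documented on this page. The function name move_public_ip is hypothetical, and whether the tunables must also be set on other nodes is not stated here, so treat setting them on the queried node only as an assumption.

```shell
# Hypothetical helper: set the two tunables required by moveip, then
# move public address $1 to the node with PNN $2.
# Assumption: setting the tunables on the queried node is sufficient.
move_public_ip() {
    ip=$1
    pnn=$2
    ctdb setvar DeterministicIPs 0 &&
    ctdb setvar NoIPFailback 1 &&
    ctdb moveip "$ip" "$pnn"
}
```

Typical use would be: move_public_ip 172.31.91.82 0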
626
627 shutdown
628 This command will shut down a specific CTDB daemon.
629
630 setlmasterrole on|off
631 This command is used to enable/disable the LMASTER capability for a
632 node at runtime. This capability determines whether or not a node can
633 be used as an LMASTER for records in the database. A node that does not
634 have the LMASTER capability will not show up in the vnnmap.
635
636 Nodes will by default have this capability, but it can be stripped off
637 nodes by the setting in the sysconfig file or by using this command.
638
639 Once this setting has been enabled/disabled, you need to perform a
640 recovery for it to take effect.
641
642 See also "ctdb getcapabilities"
643
644 setrecmasterrole on|off
645 This command is used to enable/disable the RECMASTER capability for a
646 node at runtime. This capability determines whether or not a node can
647 be used as a RECMASTER for the cluster. A node that does not have the
648 RECMASTER capability can not win a recmaster election. A node that
649 already is the recmaster for the cluster when the capability is
650 stripped off the node will remain the recmaster until the next cluster
651 election.
652
653 Nodes will by default have this capability, but it can be stripped off
654 nodes by the setting in the sysconfig file or by using this command.
655
656 See also "ctdb getcapabilities"
657
658 reloadnodes
659 This command is used when adding new nodes, or removing existing nodes
660 from an existing cluster.
661
662 Procedure to add a node:
663
664 1, To expand an existing cluster, first ensure with 'ctdb status' that
665 all nodes are up and running and that they are all healthy. Do not try
666 to expand a cluster unless it is completely healthy!
667
668 2, On all nodes, edit /etc/ctdb/nodes and add the new node as the last
669 entry to the file. The new node MUST be added to the end of this file!
670
671 3, Verify that all the nodes have identical /etc/ctdb/nodes files after
672 you edited them and added the new node!
673
674 4, Run 'ctdb reloadnodes' to force all nodes to reload the nodesfile.
675
676 5, Use 'ctdb status' on all nodes and verify that they now show the
677 additional node.
678
679 6, Install and configure the new node and bring it online.
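
Step 3 of the procedure above (verifying that all nodes files are identical) can be sketched as a byte-for-byte comparison. This minimal sketch assumes that copies of each node's /etc/ctdb/nodes file have already been collected locally. How they are collected is not covered here, and the function name nodes_files_identical is hypothetical.

```shell
# Hypothetical helper: compare the first file against every other file
# given on the command line; fail on the first one that differs.
nodes_files_identical() {
    ref=$1
    shift
    for f in "$@"; do
        cmp -s "$ref" "$f" || { echo "differs: $f" >&2; return 1; }
    done
}
```

Typical use would be: nodes_files_identical nodes.node0 nodes.node1 nodes.node2 && ctdb reloadnodes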
680
681 Procedure to remove a node:
682
683 1, To remove a node from an existing cluster, first ensure with 'ctdb
684 status' that all nodes, except the node to be deleted, are up and
685 running and that they are all healthy. Do not try to remove a node from
686 a cluster unless the cluster is completely healthy!
687
688 2, Shut down and power off the node to be removed.
689
690 3, On all other nodes, edit the /etc/ctdb/nodes file and comment out
691 the node to be removed. Do not delete the line for that node, just
692 comment it out by adding a '#' at the beginning of the line.
693
694 4, Run 'ctdb reloadnodes' to force all nodes to reload the nodesfile.
695
696 5, Use 'ctdb status' on all nodes and verify that the deleted node no
697 longer shows up in the list.
698
699 reloadips [PNN-LIST]
700 This command reloads the public addresses configuration file on the
701 specified nodes. When it completes addresses will be reconfigured and
702 reassigned across the cluster as necessary.
703
704 getdbmap
705 This command lists all clustered TDB databases that the CTDB daemon has
706 attached to. Some databases are flagged as PERSISTENT, this means that
707 the database stores data persistently and the data will remain across
708 reboots. One example of such a database is secrets.tdb where
709 information about how the cluster was joined to the domain is stored.
710
711 If a PERSISTENT database is not in a healthy state the database is
712 flagged as UNHEALTHY. If there's at least one completely healthy node
713 running in the cluster, it's possible that the content is restored by a
714 recovery run automatically. Otherwise an administrator needs to analyze
715 the problem.
716
717 See also "ctdb getdbstatus", "ctdb backupdb", "ctdb restoredb", "ctdb
718 dumpbackup", "ctdb wipedb", "ctdb setvar AllowUnhealthyDBRead 1" and
719 (if samba or tdb-utils are installed) "tdbtool check".
720
721 Most databases are not persistent and only store the state information
722 that the currently running samba daemons need. These databases are
723 always wiped when ctdb/samba starts and when a node is rebooted.
724
725 Example
726 # ctdb getdbmap
727 Number of databases:10
728 dbid:0x435d3410 name:notify.tdb path:/var/ctdb/notify.tdb.0
729 dbid:0x42fe72c5 name:locking.tdb path:/var/ctdb/locking.tdb.0
730 dbid:0x1421fb78 name:brlock.tdb path:/var/ctdb/brlock.tdb.0
731 dbid:0x17055d90 name:connections.tdb path:/var/ctdb/connections.tdb.0
732 dbid:0xc0bdde6a name:sessionid.tdb path:/var/ctdb/sessionid.tdb.0
733 dbid:0x122224da name:test.tdb path:/var/ctdb/test.tdb.0
734 dbid:0x2672a57f name:idmap2.tdb path:/var/ctdb/persistent/idmap2.tdb.0 PERSISTENT
735 dbid:0xb775fff6 name:secrets.tdb path:/var/ctdb/persistent/secrets.tdb.0 PERSISTENT
736 dbid:0xe98e08b6 name:group_mapping.tdb path:/var/ctdb/persistent/group_mapping.tdb.0 PERSISTENT
737 dbid:0x7bbbd26c name:passdb.tdb path:/var/ctdb/persistent/passdb.tdb.0 PERSISTENT
738
739 # ctdb getdbmap # example for unhealthy database
740 Number of databases:1
741 dbid:0xb775fff6 name:secrets.tdb path:/var/ctdb/persistent/secrets.tdb.0 PERSISTENT UNHEALTHY
742
743 # ctdb -Y getdbmap
744 :ID:Name:Path:Persistent:Unhealthy:
745 :0x7bbbd26c:passdb.tdb:/var/ctdb/persistent/passdb.tdb.0:1:0:
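
The machine-readable form lends itself to a scripted check for unhealthy databases. This is a minimal sketch whose field positions follow the header line shown above. The function name unhealthy_dbs is hypothetical, and the sketch assumes that database paths contain no colons.

```shell
# Hypothetical helper: reads "ctdb -Y getdbmap" output on stdin and
# prints the name of every database whose Unhealthy field is 1.
# Fields (after the leading ':'): $3 = Name, $6 = Unhealthy.
unhealthy_dbs() {
    awk -F: '$6 == "1" { print $3 }'
}
```

Typical use would be: ctdb -Y getdbmap | unhealthy_dbs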
746
747
748 backupdb DB FILE
749 Copy the contents of database DB to FILE. FILE can later be read back
750 using restoredb. This is mainly useful for backing up persistent
751 databases such as secrets.tdb and similar.
752
753 restoredb FILE [DB]
754 This command restores a persistent database that was previously backed
755 up using backupdb. By default the data will be restored back into the
756 same database as it was created from. By specifying DB you can
757 restore the data into a different database.
758
759 getlog [LEVEL] [recoverd]
760 In addition to the normal logging to a log file, CTDB also keeps an
761 in-memory ringbuffer containing the most recent log entries for all log
762 levels (except DEBUG).
763
764 This is useful since it allows for keeping continuous logs to a file at
765 a reasonable non-verbose level, but shortly after an incident has
766 occurred, a much more detailed log can be pulled from memory. This can
767 allow you to avoid having to reproduce an issue due to the on-disk logs
768 being of insufficient detail.
769
770 This command extracts all messages at LEVEL or lower log levels from
771 memory and prints them to the screen. If LEVEL is not specified it
772 defaults to NOTICE.
773
774 By default, logs are extracted from the main CTDB daemon. If the
775 recoverd option is given then logs are extracted from the recovery
776 daemon.
777
778 clearlog [recoverd]
779 This command clears the in-memory logging ringbuffer.
780
781 By default, logs are cleared in the main CTDB daemon. If the recoverd
782 option is given then logs are cleared in the recovery daemon.
783
784 setdbreadonly DB
785 This command will enable the read-only record support for a database.
786 This is an experimental feature to improve performance for contended
787 records primarily in locking.tdb and brlock.tdb. When enabling this
788 feature you must set it on all nodes in the cluster.
789
790 setdbsticky DB
791 This command will enable the sticky record support for the specified
792 database. This is an experimental feature to improve performance for
793 contended records primarily in locking.tdb and brlock.tdb. When
794 enabling this feature you must set it on all nodes in the cluster.
795
797 Internal commands are used by CTDB's scripts and are not required for
798 managing a CTDB cluster. Their parameters and behaviour are subject to
799 change.
800
801 gettickles IPADDR
802 Show TCP connections that are registered with CTDB to be "tickled" if
803 there is a failover.
804
805 gratiousarp IPADDR INTERFACE
806 Send out a gratuitous ARP for the specified IP address through the
807 specified interface. This command is mainly used by the ctdb
808 eventscripts.
809
810 killtcp
811 Read a list of TCP connections, one per line, from standard input and
812 terminate each connection. A connection is specified as:
813
814 SRC-IPADDR:SRC-PORT DST-IPADDR:DST-PORT
815
816
817 Each connection is terminated by issuing a TCP RST to the
818 SRC-IPADDR:SRC-PORT endpoint.
819
820 A single connection can be specified on the command-line rather than on
821 standard input.
822
823 pdelete DB KEY
824 Delete KEY from DB.
825
826 pfetch DB KEY
827 Print the value associated with KEY in DB.
828
829 pstore DB KEY FILE
830 Store KEY in DB with contents of FILE as the associated value.
831
832 ptrans DB [FILE]
833 Read a list of key-value pairs, one per line from FILE, and store them
834 in DB using a single transaction. An empty value is equivalent to
835 deleting the given key.
836
837 The key and value should be separated by spaces or tabs. Each key/value
838 should be a printable string enclosed in double-quotes.
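
The input format described above can be produced with a small helper. This is a minimal sketch: the function name ptrans_line is hypothetical, and it does not escape double-quotes embedded in a key or value, because the escaping rules are not described here.

```shell
# Hypothetical helper: print one ptrans input line for key $1 and value $2.
# Omitting the value yields an empty string, which ptrans treats as a
# request to delete the key.
ptrans_line() {
    printf '"%s" "%s"\n' "$1" "${2-}"
}
```

A transaction file could then be built with, for example, { ptrans_line key1 value1; ptrans_line key2; } > pairs.txt and passed to ctdb ptrans as FILE.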
839
840 runstate [setup|first_recovery|startup|running]
841 Print the runstate of the specified node. Runstates are used to
842 serialise important state transitions in CTDB, particularly during
843 startup.
844
845 If one or more optional runstate arguments are specified then the node
846 must be in one of these runstates for the command to succeed.
847
848 Example
849 # ctdb runstate
850 RUNNING
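
Because the command only succeeds when the node is in one of the listed runstates, it can be used to wait for startup to finish. This is a minimal sketch: the function name wait_for_running, the default of 30 attempts and the one-second polling interval are all assumptions.

```shell
# Hypothetical helper: poll "ctdb runstate running" until it succeeds,
# giving up after $1 attempts (default 30), one second apart.
wait_for_running() {
    tries=${1:-30}
    while [ "$tries" -gt 0 ]; do
        ctdb runstate running >/dev/null 2>&1 && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

Typical use would be: wait_for_running 60 || echo "node did not reach RUNNING" >&2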
851
852
853 setifacelink IFACE up|down
854 Set the internal state of network interface IFACE. This is typically
855 used in the 10.interface script in the "monitor" event.
856
857 Example: ctdb setifacelink eth0 up
858
859 setnatgwstate on|off
860 Enable or disable the NAT gateway master capability on a node.
861
862 tickle SRC-IPADDR:SRC-PORT DST-IPADDR:DST-PORT
863 Send a TCP tickle to the source host for the specified TCP connection.
864 A TCP tickle is a TCP ACK packet with an invalid sequence and
865 acknowledge number. When received by the source host, it results in
866 that host sending an immediate correct ACK back to the other end.
867
868 TCP tickles are useful to "tickle" clients after an IP failover has
869 occurred since this will make the client immediately recognize the TCP
870 connection has been disrupted and that the client will need to
871 reestablish. This greatly speeds up the time it takes for a client to
872 detect and reestablish after an IP failover in the ctdb cluster.
873
874 version
875 Display the CTDB version.
876
878 These commands are primarily used for CTDB development and testing and
879 should not be used for normal administration.
880
881 OPTIONS
882 --print-emptyrecords
883 This enables printing of empty records when dumping databases with
884 the catdb, cattdb and dumpdbbackup commands. Records with empty
885 data segment are considered deleted by ctdb and cleaned by the
886 vacuuming mechanism, so this switch can come in handy for debugging
887 the vacuuming behaviour.
888
889 --print-datasize
890 This lets database dumps (catdb, cattdb, dumpdbbackup) print the
891 size of the record data instead of dumping the data contents.
892
893 --print-lmaster
894 This lets catdb print the lmaster for each record.
895
896 --print-hash
897 This lets database dumps (catdb, cattdb, dumpdbbackup) print the
898 hash for each record.
899
900 --print-recordflags
901 This lets catdb and dumpdbbackup print the record flags for each
902 record. Note that cattdb always prints the flags.
903
process-exists PID
This command checks if a specific process exists on the CTDB host. This
is mainly used by Samba to check whether remote instances of Samba are
still running.
908
909 getdbstatus DB
910 This command displays more details about a database.
911
912 Example
913 # ctdb getdbstatus test.tdb.0
914 dbid: 0x122224da
915 name: test.tdb
916 path: /var/ctdb/test.tdb.0
917 PERSISTENT: no
918 HEALTH: OK
919
920 # ctdb getdbstatus registry.tdb # with a corrupted TDB
921 dbid: 0xf2a58948
922 name: registry.tdb
923 path: /var/ctdb/persistent/registry.tdb.0
924 PERSISTENT: yes
925 HEALTH: NO-HEALTHY-NODES - ERROR - Backup of corrupted TDB in '/var/ctdb/persistent/registry.tdb.0.corrupted.20091208091949.0Z'
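
Scripts that only care about the health field can filter it out of the output shown above. The following is a minimal sketch that assumes the output keeps the "HEALTH:" label from the examples. The function name db_health is hypothetical and not part of ctdb.

```shell
# Hypothetical helper (not part of ctdb): read getdbstatus output on
# standard input and print only the text that follows "HEALTH:".
# Leading whitespace before the label is tolerated.
db_health() {
    sed -n 's/^[[:space:]]*HEALTH:[[:space:]]*//p'
}

# Example (requires a running CTDB daemon):
#   ctdb getdbstatus registry.tdb | db_health
```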
926
927
928 catdb DB
929 Print a dump of the clustered TDB database DB.
930
931 cattdb DB
932 Print a dump of the contents of the local TDB database DB.
933
934 dumpdbbackup FILE
935 Print a dump of the contents from database backup FILE, similar to
936 catdb.
937
938 wipedb DB
939 Remove all contents of database DB.
940
941 recover
942 This command will trigger the recovery daemon to do a cluster recovery.
943
ipreallocate, sync
This command will force the recovery master to perform a full IP
reallocation process and redistribute all IP addresses. This is useful
to "reset" the allocations back to their default state if they have been
changed using the "moveip" command. While a "recover" will also perform
this reallocation, a recovery is much more heavyweight since it will
also rebuild all the databases.
951
952 getmonmode
This command returns the monitoring mode of a node. The monitoring mode
is either ACTIVE or DISABLED. Normally a node continuously monitors
that all other expected nodes are in fact connected and that
they respond to commands.
957
958 ACTIVE - This is the normal mode. The node is actively monitoring all
959 other nodes, both that the transport is connected and also that the
960 node responds to commands. If a node becomes unavailable, it will be
961 marked as DISCONNECTED and a recovery is initiated to restore the
962 cluster.
963
DISABLED - This node is not monitoring that other nodes are available.
In this mode a node failure will not be detected and no recovery will
be performed. This mode is useful for debugging, when one wants to
attach GDB to a ctdb process but prevent the rest of the cluster from
marking this node as DISCONNECTED and doing a recovery.
969
970 setmonmode 0|1
971 This command can be used to explicitly disable/enable monitoring mode
972 on a node. The main purpose is if one wants to attach GDB to a running
973 ctdb daemon but wants to prevent the other nodes from marking it as
974 DISCONNECTED and issuing a recovery. To do this, set monitoring mode to
975 0 on all nodes before attaching with GDB. Remember to set monitoring
976 mode back to 1 afterwards.
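
The procedure above (disable monitoring on all nodes, debug, then enable it again) can be sketched as a small wrapper. This is only an illustration: the function name with_monitoring_disabled and the CTDB variable are hypothetical, and the sketch assumes that "-n all" addresses every node as described for PNN-LIST.

```shell
# Hypothetical wrapper (not part of ctdb). CTDB names the command to run,
# which makes it possible to substitute a stub when testing.
CTDB=${CTDB:-ctdb}

# Disable monitoring on all nodes, run the given command, then enable
# monitoring again even if the command failed. Returns the exit status
# of the command that was run.
with_monitoring_disabled() {
    $CTDB -n all setmonmode 0 || return 1
    "$@"
    status=$?
    $CTDB -n all setmonmode 1
    return "$status"
}

# Example (requires a running CTDB daemon; the PID is a placeholder):
#   with_monitoring_disabled gdb -p PID
```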
977
978 attach DBNAME [persistent]
979 This is a debugging command. This command will make the CTDB daemon
980 create a new CTDB database and attach to it.
981
982 dumpmemory
This is a debugging command. This command will make the ctdb daemon
write a full memory allocation map to standard output.
985
986 rddumpmemory
987 This is a debugging command. This command will dump the talloc memory
988 allocation tree for the recovery daemon to standard output.
989
990 thaw
991 Thaw a previously frozen node.
992
993 eventscript ARGUMENTS
This is a debugging command. This command can be used to manually
invoke and run the eventscripts with arbitrary arguments.
996
997 ban BANTIME
998 Administratively ban a node for BANTIME seconds. The node will be
999 unbanned after BANTIME seconds have elapsed.
1000
1001 A banned node does not participate in the cluster. It does not host any
1002 records for the clustered TDB and does not host any public IP
1003 addresses.
1004
1005 Nodes are automatically banned if they misbehave. For example, a node
1006 may be banned if it causes too many cluster recoveries.
1007
1008 To administratively exclude a node from a cluster use the stop command.
1009
1010 unban
1011 This command is used to unban a node that has either been
1012 administratively banned using the ban command or has been automatically
1013 banned.
1014
1015 rebalancenode [PNN-LIST]
1016 This command marks the given nodes as rebalance targets in the LCP2 IP
1017 allocation algorithm. The reloadips command will do this as necessary
1018 so this command should not be needed.
1019
1020 check_srvids SRVID ...
1021 This command checks whether a set of srvid message ports are registered
1022 on the node or not. The command takes a list of values to check.
1023
1024 Example
1025 # ctdb check_srvids 1 2 3 14765
1026 Server id 0:1 does not exist
1027 Server id 0:2 does not exist
1028 Server id 0:3 does not exist
1029 Server id 0:14765 exists
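
To act only on the registered ports, the output shown above can be filtered. The following is a minimal sketch that assumes the output format matches the example exactly. The function name registered_srvids is hypothetical and not part of ctdb.

```shell
# Hypothetical helper (not part of ctdb): read check_srvids output on
# standard input and print the ids from lines that end in "exists".
# Lines ending in "does not exist" are skipped.
registered_srvids() {
    awk '$1 == "Server" && $2 == "id" && $NF == "exists" { print $3 }'
}

# Example (requires a running CTDB daemon):
#   ctdb check_srvids 1 2 3 14765 | registered_srvids
```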
1030
1031
vacuum [max_records]
1033 Over time CTDB databases will fill up with empty deleted records which
1034 will lead to a progressive slow down of CTDB database access. This
1035 command is used to prune all databases and delete all empty records
1036 from the cluster.
1037
1038 By default, vacuum will delete all empty records from all databases. If
1039 [max_records] is specified, the command will only delete the first
1040 [max_records] empty records for each database.
1041
Vacuum only deletes records where the local node is the lmaster. To
delete all empty records from the entire cluster you need to run a vacuum
from each node. This command is not disruptive. Samba is unaffected and
will still be able to read/write records normally while the database is
being vacuumed.
1047
1048 Example: ctdb vacuum
1049
1050 By default, this operation is issued from the 00.ctdb event script
1051 every 5 minutes.
1052
1053 repack [max_freelist]
Over time, when records are created and deleted in a TDB, the TDB list
of free space will become fragmented. This can lead to a slowdown in
accessing TDB records. This command is used to defragment a TDB
database and prune the freelist.
1058
1059 If [max_freelist] is specified, then a database will only be repacked
1060 if it has more than this number of entries in the freelist.
1061
During repacking of the database, the entire TDB database will be
locked to prevent writes. If Samba tries to write to a record in the
database during a repack operation, Samba will block until the
repacking has completed.

This command can be disruptive and can cause Samba to block for the
duration of the repack operation. In general, a repack operation will
take less than one second to complete.
1070
1071 A repack operation will only defragment the local TDB copy of the CTDB
1072 database. You need to run this command on all of the nodes to repack a
1073 CTDB database completely.
1074
1075 Example: ctdb repack 1000
1076
1077 By default, this operation is issued from the 00.ctdb event script
1078 every 5 minutes.
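
Since a repack only affects the local TDB copy, a manual repack of the whole cluster has to be issued on every node. One possible way is to use onnode(1), as in the following sketch. The function name repack_all_nodes and the ONNODE variable are hypothetical, and the sketch assumes that onnode is installed and accepts "all" as its node specification.

```shell
# Hypothetical wrapper (not part of ctdb). ONNODE names the command used
# to run something on the cluster nodes, so a stub can be substituted
# when testing.
ONNODE=${ONNODE:-onnode}

# Run "ctdb repack" on every node. An optional argument is passed through
# as max_freelist.
repack_all_nodes() {
    $ONNODE all ctdb repack "$@"
}

# Example (requires a configured cluster):
#   repack_all_nodes 1000
```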
1079
SEE ALSO
ctdbd(1), onnode(1), ctdb(7), ctdb-tunables(7), http://ctdb.samba.org/
1082
AUTHOR
This documentation was written by Ronnie Sahlberg, Amitay Isaacs, and
Martin Schwenke.
1086
COPYRIGHT
Copyright © 2007 Andrew Tridgell, Ronnie Sahlberg
1089
1090 This program is free software; you can redistribute it and/or modify it
1091 under the terms of the GNU General Public License as published by the
1092 Free Software Foundation; either version 3 of the License, or (at your
1093 option) any later version.
1094
1095 This program is distributed in the hope that it will be useful, but
1096 WITHOUT ANY WARRANTY; without even the implied warranty of
1097 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
1098 General Public License for more details.
1099
1100 You should have received a copy of the GNU General Public License along
1101 with this program; if not, see http://www.gnu.org/licenses.

ctdb 11/27/2013 CTDB(1)