ctdb(1) - f14

1CTDB(1)                                                                CTDB(1)
2
3
4

NAME

6       ctdb - clustered tdb database management utility
7

SYNOPSIS

9       ctdb [ OPTIONS ] COMMAND ...
10
11       ctdb [-n <node>] [-Y] [-t <timeout>] [-T <timelimit>] [-? --help]
12            [--usage] [-d --debug=<INTEGER>] [--socket=<filename>]
13

DESCRIPTION

15       ctdb is a utility to view and manage a ctdb cluster.
16

OPTIONS

18       -n <pnn>
19           This specifies the physical node number on which to execute the
20           command. Default is to run the command on the daemon running on the
21           local host.
22
23           The physical node number is an integer that describes the node in
24           the cluster. The first node has physical node number 0.
25
26       -Y
27           Produce output in machine readable form for easier parsing by
28           scripts. Not all commands support this option.
29
30       -t <timeout>
31           How long ctdb should wait for the local ctdb daemon to respond to a
32           command before timing out. Default is 3 seconds.
33
34       -T <timelimit>
35           A limit on how long the ctdb command will run for before it will be
36           aborted. When this timelimit has been exceeded the ctdb command
37           will terminate.
38
39       -? --help
40           Print some help text to the screen.
41
42       --usage
43           Print usage information to the screen.
44
45       -d --debug=<debuglevel>
46           Change the debug level for the command. Default is 0.
47
48       --socket=<filename>
49           Specify the socketname to use when connecting to the local ctdb
50           daemon. The default is /tmp/ctdb.socket .
51
52           You only need to specify this parameter if you run multiple ctdb
53           daemons on the same physical host and thus can not use the default
54           name for the domain socket.
55

ADMINISTRATIVE COMMANDS

57       These are commands used to monitor and administer a CTDB cluster.
58
59   pnn
60       This command displays the pnn of the current node.
61
62   status
63       This command shows the current status of the ctdb node.
64
65       node status
66           Node status reflects the current status of the node. There are six
67           possible states:
68
69           OK - This node is fully functional.
70
71           DISCONNECTED - This node could not be connected through the network
72           and is currently not participating in the cluster. If there is a
73           public IP address associated with this node it should have been
74           taken over by a different node. No services are running on this
75           node.
76
77           DISABLED - This node has been administratively disabled. This node
78           is still functional and participates in the CTDB cluster but its IP
79           addresses have been taken over by a different node and no services
80           are currently being hosted.
81
82           UNHEALTHY - A service provided by this node is malfunctioning and
83           should be investigated. The CTDB daemon itself is operational and
84           participates in the cluster. Its public IP address has been taken
85           over by a different node and no services are currently being
86           hosted. All unhealthy nodes should be investigated and require an
87           administrative action to rectify.
88
89           BANNED - This node failed too many recovery attempts and has been
90           banned from participating in the cluster for a period of
91           RecoveryBanPeriod seconds. Any public IP address has been taken
92           over by other nodes. This node does not provide any services. All
93           banned nodes should be investigated and require an administrative
94           action to rectify. This node does not participate in the CTDB
95           cluster but can still be communicated with. I.e. ctdb commands can
96           be sent to it.
97
98           STOPPED - A node that is stopped does not host any public ip
99           addresses, nor is it part of the VNNMAP. A stopped node can not
100           become LVSMASTER, RECMASTER or NATGW. This node does not
101           participate in the CTDB cluster but can still be communicated with.
102           I.e. ctdb commands can be sent to it.
103
104       generation
105           The generation id is a number that indicates the current generation
106           of a cluster instance. Each time a cluster goes through a
107           reconfiguration or a recovery its generation id will be changed.
108
109           This number does not have any particular meaning other than to keep
110           track of when a cluster has gone through a recovery. It is a random
111           number that represents the current instance of a ctdb cluster and
112           its databases. CTDBD uses this number internally to be able to tell
113           when commands to operate on the cluster and the databases were
114           issued in a different generation of the cluster, to ensure that
115           commands that operate on the databases will not survive across a
116           cluster database recovery. After a recovery, all old outstanding
117           commands will automatically become invalid.
118
119           Sometimes this number will be shown as "INVALID". This only means
120           that the ctdbd daemon has started but it has not yet merged with
121           the cluster through a recovery. All nodes start with generation
122           "INVALID" and are not assigned a real generation id until they have
123           successfully been merged with a cluster through a recovery.
124
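       The generation check described above can be illustrated with a small
       model. This is only a sketch of the idea, not ctdbd's implementation:
       the class name ClusterState, the 32-bit random id and the use of None
       for "INVALID" are assumptions made for the example.

```python
import random


class ClusterState:
    """Toy model of a node's generation id (hypothetical, for illustration)."""

    def __init__(self):
        # A freshly started node has no real generation yet ("INVALID").
        self.generation = None

    def recover(self):
        """Simulate a recovery: pick a new random generation id."""
        new_id = random.getrandbits(32)
        while new_id == self.generation:
            new_id = random.getrandbits(32)
        self.generation = new_id
        return new_id

    def accepts(self, command_generation):
        """A command is valid only if issued in the current generation."""
        return (self.generation is not None
                and command_generation == self.generation)
```

       A command tagged with the id returned by an earlier recover() call is
       rejected by accepts() once recover() has run again, which mirrors how
       outstanding commands become invalid after a recovery.
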
125       VNNMAP
126           The list of Virtual Node Numbers. This is a list of all nodes that
127           actively participate in the cluster and that share the workload of
128           hosting the Clustered TDB database records. Only nodes that are
129           participating in the vnnmap can become lmaster or dmaster for a
130           database record.
131
132       Recovery mode
133           This is the current recovery mode of the cluster. There are two
134           possible modes:
135
136           NORMAL - The cluster is fully operational.
137
138           RECOVERY - The cluster databases have all been frozen, pausing all
139           services while the cluster awaits a recovery process to complete. A
140           recovery process should finish within seconds. If a cluster is
141           stuck in the RECOVERY state this would indicate a cluster
142           malfunction which needs to be investigated.
143
144           Once the recovery master detects an inconsistency, for example a
145           node becomes disconnected/connected, the recovery daemon will
146           trigger a cluster recovery process, where all databases are
147           remerged across the cluster. When this process starts, the recovery
148           master will first "freeze" all databases to prevent applications
149           such as samba from accessing the databases and it will also mark
150           the recovery mode as RECOVERY.
151
152           When CTDBD starts up, it will start in RECOVERY mode. Once the node
153           has been merged into a cluster and all databases have been
154           recovered, the node mode will change into NORMAL mode and the
155           databases will be "thawed", allowing samba to access the databases
156           again.
157
158       Recovery master
159           This is the cluster node that is currently designated as the
160           recovery master. This node is responsible for monitoring the
161           consistency of the cluster and for performing the actual recovery
162           process when required.
163
164           Only one node at a time can be the designated recovery master.
165           Which node is designated the recovery master is decided by an
166           election process in the recovery daemons running on each node.
167
168       Example: ctdb status
169
170       Example output:
171
172           Number of nodes:4
173           pnn:0 11.1.2.200       OK (THIS NODE)
174           pnn:1 11.1.2.201       OK
175           pnn:2 11.1.2.202       OK
176           pnn:3 11.1.2.203       OK
177           Generation:1362079228
178           Size:4
179           hash:0 lmaster:0
180           hash:1 lmaster:1
181           hash:2 lmaster:2
182           hash:3 lmaster:3
183           Recovery mode:NORMAL (0)
184           Recovery master:0
185
186
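       For monitoring scripts, the human-readable output above can be parsed
       with a few lines of Python. This is a minimal sketch that assumes the
       layout shown in the example output; the function names are made up
       for illustration, and the format may differ between ctdb versions.

```python
import re

# Sample text copied from the "ctdb status" example output above.
SAMPLE_STATUS = """\
Number of nodes:4
pnn:0 11.1.2.200       OK (THIS NODE)
pnn:1 11.1.2.201       OK
pnn:2 11.1.2.202       OK
pnn:3 11.1.2.203       OK
Generation:1362079228
Size:4
hash:0 lmaster:0
hash:1 lmaster:1
hash:2 lmaster:2
hash:3 lmaster:3
Recovery mode:NORMAL (0)
Recovery master:0
"""

NODE_RE = re.compile(r"^pnn:(\d+)\s+(\S+)\s+(\S+)")


def parse_status(text):
    """Return (nodes, recovery_mode, recmaster) parsed from status text."""
    nodes, mode, recmaster = {}, None, None
    for line in text.splitlines():
        line = line.strip()
        m = NODE_RE.match(line)
        if m:
            nodes[int(m.group(1))] = {"address": m.group(2),
                                      "state": m.group(3)}
        elif line.startswith("Recovery mode:"):
            mode = line.split(":", 1)[1].split()[0]
        elif line.startswith("Recovery master:"):
            recmaster = int(line.split(":", 1)[1])
    return nodes, mode, recmaster


def nodes_not_ok(nodes):
    """List the pnns whose state is anything other than OK."""
    return sorted(p for p, info in nodes.items() if info["state"] != "OK")
```

       Calling parse_status(SAMPLE_STATUS) yields four nodes, recovery mode
       NORMAL and recovery master 0; nodes_not_ok() then returns the nodes
       that need attention.
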
187   recmaster
188       This command shows the pnn of the node which is currently the
189       recmaster.
190
191   uptime
192       This command shows the uptime for the ctdb daemon, when the last
193       recovery or ip-failover completed, and how long it took. If the
194       "duration" is shown as a negative number, this indicates that there is
195       a recovery/failover in progress and it started that many seconds ago.
196
197       Example: ctdb uptime
198
199       Example output:
200
201           Current time of node          :                Thu Oct 29 10:38:54 2009
202           Ctdbd start time              : (000 16:54:28) Wed Oct 28 17:44:26 2009
203           Time of last recovery/failover: (000 16:53:31) Wed Oct 28 17:45:23 2009
204           Duration of last recovery/failover: 2.248552 seconds
205
206
207   listnodes
208       This command lists the ip addresses of all the nodes in the
209       cluster.
210
211       Example: ctdb listnodes
212
213       Example output:
214
215           10.0.0.71
216           10.0.0.72
217           10.0.0.73
218           10.0.0.74
219
220
221   ping
222       This command will "ping" all CTDB daemons in the cluster to verify that
223       they are processing commands correctly.
224
225       Example: ctdb ping
226
227       Example output:
228
229           response from 0 time=0.000054 sec  (3 clients)
230           response from 1 time=0.000144 sec  (2 clients)
231           response from 2 time=0.000105 sec  (2 clients)
232           response from 3 time=0.000114 sec  (2 clients)
233
234
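       A script can compare the ping responses against the nodes it expects
       to be present. The sketch below assumes the output format shown above;
       parse_ping and missing_nodes are hypothetical helper names.

```python
import re

# Sample text copied from the "ctdb ping" example output above.
SAMPLE_PING = """\
response from 0 time=0.000054 sec  (3 clients)
response from 1 time=0.000144 sec  (2 clients)
response from 2 time=0.000105 sec  (2 clients)
response from 3 time=0.000114 sec  (2 clients)
"""

PING_RE = re.compile(
    r"response from (\d+) time=([\d.]+) sec\s+\((\d+) clients\)")


def parse_ping(text):
    """Map pnn -> (response time in seconds, number of clients)."""
    return {int(m.group(1)): (float(m.group(2)), int(m.group(3)))
            for m in PING_RE.finditer(text)}


def missing_nodes(results, expected_pnns):
    """Return the expected pnns that did not answer the ping."""
    return sorted(set(expected_pnns) - set(results))
```

       With the sample above, missing_nodes(parse_ping(SAMPLE_PING), range(4))
       is empty because all four daemons responded.
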
235   ip
236       This command will display the list of public addresses that are
237       provided by the cluster and which physical node is currently serving
238       this ip. By default this command will ONLY show those public addresses
239       that are known to the node itself. To see the full list of all public
240       ips across the cluster you must use "ctdb ip -n all".
241
242       Example: ctdb ip
243
244       Example output:
245
246           Number of addresses:4
247           12.1.1.1         0
248           12.1.1.2         1
249           12.1.1.3         2
250           12.1.1.4         3
251
252
253   scriptstatus
254       This command displays which scripts were run in the previous
255       monitoring cycle and the result of each script. If a script failed with
256       an error, causing the node to become unhealthy, the output from that
257       script is also shown.
258
259       Example: ctdb scriptstatus
260
261       Example output:
262
263           7 scripts were executed last monitoring cycle
264           00.ctdb              Status:OK    Duration:0.056 Tue Mar 24 18:56:57 2009
265           10.interface         Status:OK    Duration:0.077 Tue Mar 24 18:56:57 2009
266           11.natgw             Status:OK    Duration:0.039 Tue Mar 24 18:56:57 2009
267           20.multipathd        Status:OK    Duration:0.038 Tue Mar 24 18:56:57 2009
268           31.clamd             Status:DISABLED
269           40.vsftpd            Status:OK    Duration:0.045 Tue Mar 24 18:56:57 2009
270           41.httpd             Status:OK    Duration:0.039 Tue Mar 24 18:56:57 2009
271           50.samba             Status:ERROR    Duration:0.082 Tue Mar 24 18:56:57 2009
272              OUTPUT:ERROR: Samba tcp port 445 is not responding
273
274
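       To pick out the failing script and its error text automatically, the
       output above can be parsed as follows. This is a sketch that assumes
       the layout of the example output, including the indented OUTPUT: line
       after a failed script; the helper names are illustrative only.

```python
import re

# Sample text copied from the "ctdb scriptstatus" example output above.
SAMPLE_SCRIPTSTATUS = """\
7 scripts were executed last monitoring cycle
00.ctdb              Status:OK    Duration:0.056 Tue Mar 24 18:56:57 2009
10.interface         Status:OK    Duration:0.077 Tue Mar 24 18:56:57 2009
11.natgw             Status:OK    Duration:0.039 Tue Mar 24 18:56:57 2009
20.multipathd        Status:OK    Duration:0.038 Tue Mar 24 18:56:57 2009
31.clamd             Status:DISABLED
40.vsftpd            Status:OK    Duration:0.045 Tue Mar 24 18:56:57 2009
41.httpd             Status:OK    Duration:0.039 Tue Mar 24 18:56:57 2009
50.samba             Status:ERROR    Duration:0.082 Tue Mar 24 18:56:57 2009
   OUTPUT:ERROR: Samba tcp port 445 is not responding
"""

SCRIPT_RE = re.compile(r"^(\S+)\s+Status:(\w+)")


def parse_scriptstatus(text):
    """Return a list of dicts with name, status and optional output."""
    scripts = []
    for line in text.splitlines():
        stripped = line.strip()
        m = SCRIPT_RE.match(stripped)
        if m:
            scripts.append({"name": m.group(1), "status": m.group(2),
                            "output": None})
        elif stripped.startswith("OUTPUT:") and scripts:
            # The output line belongs to the script listed just before it.
            scripts[-1]["output"] = stripped[len("OUTPUT:"):]
    return scripts


def failed_scripts(scripts):
    """Return (name, output) for every script that ended with ERROR."""
    return [(s["name"], s["output"]) for s in scripts
            if s["status"] == "ERROR"]
```

       For the sample text, failed_scripts() reports only 50.samba together
       with its "Samba tcp port 445 is not responding" message.
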
275   disablescript <script>
276       This command is used to disable an eventscript.
277
278       This will take effect the next time the eventscripts are being executed
279       so it can take a short while until this is reflected in 'scriptstatus'.
280
281   enablescript <script>
282       This command is used to enable an eventscript.
283
284       This will take effect the next time the eventscripts are being executed
285       so it can take a short while until this is reflected in 'scriptstatus'.
286
287   getvar <name>
288       Get the runtime value of a tuneable variable.
289
290       Example: ctdb getvar MaxRedirectCount
291
292       Example output:
293
294           MaxRedirectCount    = 3
295
296
297   setvar <name> <value>
298       Set the runtime value of a tuneable variable.
299
300       Example: ctdb setvar MaxRedirectCount 5
301
302   listvars
303       List all tuneable variables.
304
305       Example: ctdb listvars
306
307       Example output:
308
309           MaxRedirectCount    = 3
310           SeqnumInterval      = 1000
311           ControlTimeout      = 60
312           TraverseTimeout     = 20
313           KeepaliveInterval   = 5
314           KeepaliveLimit      = 5
315           MaxLACount          = 7
316           RecoverTimeout      = 20
317           RecoverInterval     = 1
318           ElectionTimeout     = 3
319           TakeoverTimeout     = 5
320           MonitorInterval     = 15
321           TickleUpdateInterval = 20
322           EventScriptTimeout  = 30
323           EventScriptBanCount = 10
324           EventScriptUnhealthyOnTimeout = 0
325           RecoveryGracePeriod = 120
326           RecoveryBanPeriod   = 300
327           DatabaseHashSize    = 10000
328           DatabaseMaxDead     = 5
329           RerecoveryTimeout   = 10
330           EnableBans          = 1
331           DeterministicIPs    = 1
332           DisableWhenUnhealthy = 0
333           ReclockPingPeriod   = 60
334           NoIPFailback        = 0
335           VerboseMemoryNames  = 0
336           RecdPingTimeout     = 60
337           RecdFailCount       = 10
338           LogLatencyMs        = 0
339           RecLockLatencyMs    = 1000
340           RecoveryDropAllIPs  = 60
341           VerifyRecoveryLock  = 1
342           VacuumDefaultInterval = 300
343           VacuumMaxRunTime    = 30
344           RepackLimit         = 10000
345           VacuumLimit         = 5000
346           VacuumMinInterval   = 60
347           VacuumMaxInterval   = 600
348           MaxQueueDropMsg     = 1000
349           UseStatusEvents     = 0
350           AllowUnhealthyDBRead = 0
351
352
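       Because tunables are set per node at runtime, it can be useful to
       compare the listvars output of two nodes. The sketch below assumes the
       "Name = value" layout shown above with integer values, and uses only a
       few lines of the example output; parse_listvars and diff_tunables are
       hypothetical names.

```python
# A subset of the "ctdb listvars" example output above.
SAMPLE_VARS = """\
MaxRedirectCount    = 3
MonitorInterval     = 15
TickleUpdateInterval = 20
EventScriptUnhealthyOnTimeout = 0
RecoveryBanPeriod   = 300
DeterministicIPs    = 1
NoIPFailback        = 0
"""


def parse_listvars(text):
    """Parse 'Name = value' lines into a dict of integer tunables."""
    tunables = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            tunables[name.strip()] = int(value.strip())
    return tunables


def diff_tunables(a, b):
    """Return {name: (value_a, value_b)} for tunables that differ."""
    return {name: (a.get(name), b.get(name))
            for name in sorted(set(a) | set(b))
            if a.get(name) != b.get(name)}
```

       Feeding the output collected from two nodes through parse_listvars()
       and then diff_tunables() shows at a glance which settings have
       drifted apart.
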
353   lvsmaster
354       This command shows which node is currently the LVSMASTER. The LVSMASTER
355       is the node in the cluster which drives the LVS system and which
356       receives all incoming traffic from clients.
357
358       LVS is the mode where the entire CTDB/Samba cluster uses a single ip
359       address for the entire cluster. In this mode all clients connect to one
360       specific node which will then multiplex/loadbalance the clients evenly
361       onto the other nodes in the cluster. This is an alternative to using
362       public ip addresses. See the manpage for ctdbd for more information
363       about LVS.
364
365   lvs
366       This command shows which nodes in the cluster are currently active in
367       the LVS configuration. I.e. which nodes we are currently loadbalancing
368       the single ip address across.
369
370       LVS will by default only loadbalance across those nodes that are both
371       LVS capable and also HEALTHY, except if all nodes are UNHEALTHY, in
372       which case LVS will loadbalance across all UNHEALTHY nodes as well. LVS
373       will never use nodes that are DISCONNECTED, STOPPED, BANNED or
374       DISABLED.
375
376       Example output:
377
378           2:10.0.0.13
379           3:10.0.0.14
380
381
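       The node selection rule described above can be written down as a short
       function. This is one reading of the rule, not code taken from ctdb:
       it treats a node in state OK as healthy and falls back to UNHEALTHY
       nodes only when no healthy LVS capable node is left. The function name
       and the input format are assumptions made for the example.

```python
# States that LVS never uses, according to the description above.
NEVER_USED = {"DISCONNECTED", "STOPPED", "BANNED", "DISABLED"}


def lvs_targets(nodes):
    """Pick the pnns that LVS would loadbalance across.

    nodes maps pnn -> (state, lvs_capable).
    """
    candidates = {pnn: state
                  for pnn, (state, capable) in nodes.items()
                  if capable and state not in NEVER_USED}
    healthy = sorted(p for p, s in candidates.items() if s == "OK")
    if healthy:
        return healthy
    # No healthy node left: fall back to the UNHEALTHY ones.
    return sorted(p for p, s in candidates.items() if s == "UNHEALTHY")
```

       For example, with one OK node and one UNHEALTHY node that are both LVS
       capable, only the OK node is selected; if both are UNHEALTHY, both are
       selected.
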
382   getcapabilities
383       This command shows the capabilities of the current node. Please see
384       manpage for ctdbd for a full list of all capabilities and more detailed
385       description.
386
387       RECMASTER and LMASTER capabilities are primarily used when CTDBD is
388       used to create a cluster spanning across WAN links, in which case ctdbd
389       acts as a WAN accelerator.
390
391       LVS capable means that the node is participating in LVS, a mode where
392       the entire CTDB cluster uses one single ip address for the entire
393       cluster instead of using public ip address failover. This is an
394       alternative to using a loadbalancing layer-4 switch.
395
396       Example output:
397
398           RECMASTER: YES
399           LMASTER: YES
400           LVS: NO
401
402
403   statistics
404       Collect statistics from the CTDB daemon about how many calls it has
405       served.
406
407       Example: ctdb statistics
408
409       Example output:
410
411           CTDB version 1
412            num_clients                        3
413            frozen                             0
414            recovering                         0
415            client_packets_sent           360489
416            client_packets_recv           360466
417            node_packets_sent             480931
418            node_packets_recv             240120
419            keepalive_packets_sent             4
420            keepalive_packets_recv             3
421            node
422                req_call                       2
423                reply_call                     2
424                req_dmaster                    0
425                reply_dmaster                  0
426                reply_error                    0
427                req_message                   42
428                req_control               120408
429                reply_control             360439
430            client
431                req_call                       2
432                req_message                   24
433                req_control               360440
434            timeouts
435                call                           0
436                control                        0
437                traverse                       0
438            total_calls                        2
439            pending_calls                      0
440            lockwait_calls                     0
441            pending_lockwait_calls             0
442            memory_used                     5040
443            max_hop_count                      0
444            max_call_latency                   4.948321 sec
445            max_lockwait_latency               0.000000 sec
446
447
448   statisticsreset
449       This command is used to clear all statistics counters in a node.
450
451       Example: ctdb statisticsreset
452
453   getreclock
454       This command is used to show the filename of the reclock file that is
455       used.
456
457       Example output:
458
459           Reclock file:/gpfs/.ctdb/shared
460
461
462   setreclock [filename]
463       This command is used to modify, or clear, the file that is used as the
464       reclock file at runtime. When this command is used, the reclock file
465       checks are disabled. To re-enable the checks the administrator needs to
466       activate the "VerifyRecoveryLock" tunable using "ctdb setvar".
467
468       If run with no parameter this will remove the reclock file completely.
469       If run with a parameter the parameter specifies the new filename to use
470       for the recovery lock.
471
472       This command only affects the runtime settings of a ctdb node and will
473       be lost when ctdb is restarted. For persistent changes to the reclock
474       file setting you must edit /etc/sysconfig/ctdb.
475
476   getdebug
477       Get the current debug level for the node. The debug level controls what
478       information is written to the log file.
479
480       The debug levels are mapped to the corresponding syslog levels. When a
481       debug level is set, only those messages at that level and higher levels
482       will be printed.
483
484       The debug levels, from highest to lowest, are:
485
486       EMERG ALERT CRIT ERR WARNING NOTICE INFO DEBUG
487
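       The filtering rule ("only those messages at that level and higher
       levels will be printed") can be expressed as a comparison of positions
       in the list above. This is an illustrative sketch; is_logged is a
       hypothetical helper, not part of ctdb.

```python
# Debug levels from highest to lowest, as listed above.
LEVELS = ["EMERG", "ALERT", "CRIT", "ERR", "WARNING", "NOTICE", "INFO",
          "DEBUG"]


def is_logged(message_level, configured_level):
    """True if a message at message_level passes the configured level.

    A smaller index means a higher level, so a message is logged when its
    index is not greater than the index of the configured level.
    """
    return LEVELS.index(message_level) <= LEVELS.index(configured_level)
```

       With the debug level set to ERR, messages at EMERG, ALERT, CRIT and
       ERR are logged, while WARNING and below are dropped.
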
488   setdebug <debuglevel>
489       Set the debug level of a node. This controls what information will be
490       logged.
491
492       The debuglevel is one of EMERG ALERT CRIT ERR WARNING NOTICE INFO DEBUG
493
494   getpid
495       This command will return the process id of the ctdb daemon.
496
497   disable
498       This command is used to administratively disable a node in the cluster.
499       A disabled node will still participate in the cluster and host
500       clustered TDB records but its public ip address has been taken over by
501       a different node and it no longer hosts any services.
502
503   enable
504       Re-enable a node that has been administratively disabled.
505
506   stop
507       This command is used to administratively STOP a node in the cluster. A
508       STOPPED node is connected to the cluster but will not host any public
509       ip addresses, nor does it participate in the VNNMAP. The difference
510       between a DISABLED node and a STOPPED node is that a STOPPED node does
511       not host any parts of the database which means that a recovery is
512       required to stop/continue nodes.
513
514   continue
515       Re-start a node that has been administratively stopped.
516
517   addip <public_ip/mask> <iface>
518       This command is used to add a new public ip to a node during runtime.
519       This allows public addresses to be added to a cluster without having to
520       restart the ctdb daemons.
521
522       Note that this only updates the runtime instance of ctdb. Any changes
523       will be lost next time ctdb is restarted and the public addresses file
524       is re-read. If you want this change to be permanent you must also
525       update the public addresses file manually.
526
527   delip <public_ip>
528       This command is used to remove a public ip from a node during runtime.
529       If this public ip is currently hosted by the node it is being removed
530       from, the ip will first be failed over to another node, if possible,
531       before it is removed.
532
533       Note that this only updates the runtime instance of ctdb. Any changes
534       will be lost next time ctdb is restarted and the public addresses file
535       is re-read. If you want this change to be permanent you must also
536       update the public addresses file manually.
537
538   moveip <public_ip> <node>
539       This command can be used to manually fail a public ip address to a
540       specific node.
541
542       In order to manually override the "automatic" distribution of public ip
543       addresses that ctdb normally provides, this command only works when you
544       have changed the tunables for the daemon to:
545
546       DeterministicIPs = 0
547
548       NoIPFailback = 1
549
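       The sequence of commands needed for a manual move can be assembled by
       a small helper. The sketch only builds the command lines from the
       "setvar" and "moveip" commands documented in this page and does not
       execute anything; the helper name is hypothetical, and whether the
       tunables have to be changed on every node is not stated here.

```python
import shlex


def moveip_commands(public_ip, target_pnn):
    """Return the ctdb command lines for a manual public ip move."""
    return [
        # Switch off the automatic distribution first, as described above.
        ["ctdb", "setvar", "DeterministicIPs", "0"],
        ["ctdb", "setvar", "NoIPFailback", "1"],
        # Then fail the address over to the chosen node.
        ["ctdb", "moveip", public_ip, str(target_pnn)],
    ]


if __name__ == "__main__":
    for argv in moveip_commands("12.1.1.2", 3):
        print(shlex.join(argv))
```

       Each list can be passed to subprocess.run() on a cluster node once the
       printed commands have been reviewed.
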
550   shutdown
551       This command will shut down a specific CTDB daemon.
552
553   recover
554       This command will trigger the recovery daemon to do a cluster recovery.
555
556   ipreallocate
557       This command will force the recovery master to perform a full ip
558       reallocation process and redistribute all ip addresses. This is useful
559       to "reset" the allocations back to their default state if they have been
560       changed using the "moveip" command. While a "recover" will also perform
561       this reallocation, a recovery is much more heavyweight since it will
562       also rebuild all the databases.
563
564   setlmasterrole <on|off>
565       This command is used to enable/disable the LMASTER capability for a
566       node at runtime. This capability determines whether or not a node can
567       be used as an LMASTER for records in the database. A node that does not
568       have the LMASTER capability will not show up in the vnnmap.
569
570       Nodes will by default have this capability, but it can be stripped off
571       nodes by the setting in the sysconfig file or by using this command.
572
573       Once this setting has been enabled/disabled, you need to perform a
574       recovery for it to take effect.
575
576       See also "ctdb getcapabilities"
577
578   setrecmasterrole <on|off>
579       This command is used to enable/disable the RECMASTER capability for a
580       node at runtime. This capability determines whether or not a node can
581       be used as a RECMASTER for the cluster. A node that does not have the
582       RECMASTER capability can not win a recmaster election. A node that
583       already is the recmaster for the cluster when the capability is
584       stripped off the node will remain the recmaster until the next cluster
585       election.
586
587       Nodes will by default have this capability, but it can be stripped off
588       nodes by the setting in the sysconfig file or by using this command.
589
590       See also "ctdb getcapabilities"
591
592   killtcp <srcip:port> <dstip:port>
593       This command will kill the specified TCP connection by issuing a TCP
594       RST to the srcip:port endpoint. This is a command used by the ctdb
595       eventscripts.
596
597   gratiousarp <ip> <interface>
598       This command will send out a gratuitous ARP for the specified ip
599       through the specified interface. This command is mainly used by the
600       ctdb eventscripts.
601
602   reloadnodes
603       This command is used when adding new nodes, or removing existing nodes
604       from an existing cluster.
605
606       Procedure to add a node:
607
608       1, To expand an existing cluster, first ensure with 'ctdb status' that
609       all nodes are up and running and that they are all healthy. Do not try
610       to expand a cluster unless it is completely healthy!
611
612       2, On all nodes, edit /etc/ctdb/nodes and add the new node as the last
613       entry to the file. The new node MUST be added to the end of this file!
614
615       3, Verify that all the nodes have identical /etc/ctdb/nodes files after
616       you edited them and added the new node!
617
618       4, Run 'ctdb reloadnodes' to force all nodes to reload the nodesfile.
619
620       5, Use 'ctdb status' on all nodes and verify that they now show the
621       additional node.
622
623       6, Install and configure the new node and bring it online.
624
625       Procedure to remove a node:
626
627       1, To remove a node from an existing cluster, first ensure with 'ctdb
628       status' that all nodes, except the node to be deleted, are up and
629       running and that they are all healthy. Do not try to remove a node from
630       a cluster unless the cluster is completely healthy!
631
632       2, Shut down and power off the node to be removed.
633
634       3, On all other nodes, edit the /etc/ctdb/nodes file and comment out
635       the node to be removed. Do not delete the line for that node, just
636       comment it out by adding a '#' at the beginning of the line.
637
638       4, Run 'ctdb reloadnodes' to force all nodes to reload the nodesfile.
639
640       5, Use 'ctdb status' on all nodes and verify that the deleted node no
641       longer shows up in the list.
642
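       The file edits in the two procedures above can be sketched in Python.
       The example assumes that the nodes file holds one address per line and
       works on the file content as a string; the function names are made up
       for illustration, and distributing the file to the nodes is left out.

```python
def add_node(nodes_text, address):
    """Append a new node as the last entry, as required when expanding."""
    lines = nodes_text.splitlines()
    return "\n".join(lines + [address]) + "\n"


def comment_out_node(nodes_text, address):
    """Comment out a node with '#' instead of deleting its line."""
    out = []
    for line in nodes_text.splitlines():
        out.append("#" + line if line.strip() == address else line)
    return "\n".join(out) + "\n"


def files_identical(contents_by_host):
    """Check that every host holds exactly the same nodes file content."""
    return len(set(contents_by_host.values())) <= 1
```

       Running files_identical() on the content read from every node is a
       simple way to carry out the verification step before 'ctdb
       reloadnodes' is issued.
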
643   tickle <srcip:port> <dstip:port>
644       This command will send a TCP tickle to the source host for the
645       specified TCP connection. A TCP tickle is a TCP ACK packet with an
646       invalid sequence and acknowledge number and will when received by the
647       source host result in it sending an immediate correct ACK back to the
648       other end.
649
650       TCP tickles are useful to "tickle" clients after an IP failover has
651       occurred since this will make the client immediately recognize the TCP
652       connection has been disrupted and that the client will need to
653       reestablish. This greatly speeds up the time it takes for a client to
654       detect and reestablish after an IP failover in the ctdb cluster.
655
656   gettickles <ip>
657       This command is used to show which TCP connections are registered with
658       CTDB to be "tickled" if there is a failover.
659
660   repack [max_freelist]
661       Over time, when records are created and deleted in a TDB, the TDB list
662       of free space will become fragmented. This can lead to a slowdown in
663       accessing TDB records. This command is used to defragment a TDB
664       database and prune the freelist.
665
666       If [max_freelist] is specified, then a database will only be repacked
667       if it has more than this number of entries in the freelist.
668
669       During repacking of the database, the entire TDB database will be
670       locked to prevent writes. If samba tries to write to a record in the
671       database during a repack operation, samba will block until the
672       repacking has completed.
673
674       This command can be disruptive and can cause samba to block for the
675       duration of the repack operation. In general, a repack operation will
676       take less than one second to complete.
677
678       A repack operation will only defragment the local TDB copy of the CTDB
679       database. You need to run this command on all of the nodes to repack a
680       CTDB database completely.
681
682       Example: ctdb repack 1000
683
684       By default, this operation is issued from the 00.ctdb event script
685       every 5 minutes.
686
687   vacuum [max_records]
688       Over time CTDB databases will fill up with empty deleted records which
689       will lead to a progressive slow down of CTDB database access. This
690       command is used to prune all databases and delete all empty records
691       from the cluster.
692
693       By default, vacuum will delete all empty records from all databases. If
694       [max_records] is specified, the command will only delete the first
695       [max_records] empty records for each database.
696
697       Vacuum only deletes records where the local node is the lmaster. To
698       delete all records from the entire cluster you need to run a vacuum
699       from each node. This command is not disruptive. Samba is unaffected and
700       will still be able to read/write records normally while the database is
701       being vacuumed.
702
703       Example: ctdb vacuum
704
705       By default, this operation is issued from the 00.ctdb event script
706       every 5 minutes.
707
708   backupdb <dbname> <file>
709       This command can be used to copy the entire content of a database out
710       to a file. This file can later be read back into ctdb using the
711       restoredb command. This is mainly useful for backing up persistent
712       databases such as secrets.tdb and similar.
713
714   restoredb <file>
715       This command restores a persistent database that was previously backed
716       up using backupdb.
717
718   wipedb <dbname>
719       This command can be used to remove all content of a database.
720

DEBUGGING COMMANDS

722       These commands are primarily used for CTDB development and testing and
723       should not be used for normal administration.
724
725   process-exists <pid>
726       This command checks if a specific process exists on the CTDB host. This
727       is mainly used by Samba to check if remote instances of samba are still
728       running or not.
729
730   getdbmap
731       This command lists all clustered TDB databases that the CTDB daemon has
732       attached to. Some databases are flagged as PERSISTENT; this means that
733       the database stores data persistently and the data will remain across
734       reboots. One example of such a database is secrets.tdb where
735       information about how the cluster was joined to the domain is stored.
736
737       If a PERSISTENT database is not in a healthy state the database is
738       flagged as UNHEALTHY. If there's at least one completely healthy node
739       running in the cluster, the content may be restored automatically by
740       a recovery run. Otherwise an administrator needs to analyze
741       the problem.
742
743       See also "ctdb getdbstatus", "ctdb backupdb", "ctdb restoredb", "ctdb
744       dumpdbbackup", "ctdb wipedb", "ctdb setvar AllowUnhealthyDBRead 1" and
745       (if samba or tdb-utils are installed) "tdbtool check".
746
747       Most databases are not persistent and only store the state information
748       that the currently running samba daemons need. These databases are
749       always wiped when ctdb/samba starts and when a node is rebooted.
750
751       Example: ctdb getdbmap
752
753       Example output:
754
755           Number of databases:10
756           dbid:0x435d3410 name:notify.tdb path:/var/ctdb/notify.tdb.0
757           dbid:0x42fe72c5 name:locking.tdb path:/var/ctdb/locking.tdb.0
              dbid:0x1421fb78 name:brlock.tdb path:/var/ctdb/brlock.tdb.0
758           dbid:0x17055d90 name:connections.tdb path:/var/ctdb/connections.tdb.0
759           dbid:0xc0bdde6a name:sessionid.tdb path:/var/ctdb/sessionid.tdb.0
760           dbid:0x122224da name:test.tdb path:/var/ctdb/test.tdb.0
761           dbid:0x2672a57f name:idmap2.tdb path:/var/ctdb/persistent/idmap2.tdb.0 PERSISTENT
762           dbid:0xb775fff6 name:secrets.tdb path:/var/ctdb/persistent/secrets.tdb.0 PERSISTENT
763           dbid:0xe98e08b6 name:group_mapping.tdb path:/var/ctdb/persistent/group_mapping.tdb.0 PERSISTENT
764           dbid:0x7bbbd26c name:passdb.tdb path:/var/ctdb/persistent/passdb.tdb.0 PERSISTENT
765
766
767       Example output for an unhealthy database:
768
769           Number of databases:1
770           dbid:0xb775fff6 name:secrets.tdb path:/var/ctdb/persistent/secrets.tdb.0 PERSISTENT UNHEALTHY
771
772
773       Example output for a healthy database in machine-readable form (-Y):
774
775           :ID:Name:Path:Persistent:Unhealthy:
776           :0x7bbbd26c:passdb.tdb:/var/ctdb/persistent/passdb.tdb.0:1:0:
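
          Since the -Y output is meant for scripts, a parsing example may be
          helpful. The following is a minimal Python sketch, not part of ctdb,
          that turns the colon-delimited output shown above into a list of
          dictionaries. It assumes that the first line is a header and that no
          field (including the path) contains a colon.

```python
from typing import Dict, List


def parse_getdbmap_y(text: str) -> List[Dict[str, str]]:
    """Parse colon-delimited `ctdb -Y getdbmap` output into a list of dicts."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    # Every line starts and ends with ':', so strip those before splitting.
    header = lines[0].strip(":").split(":")
    rows = []
    for ln in lines[1:]:
        fields = ln.strip(":").split(":")
        rows.append(dict(zip(header, fields)))
    return rows


# Sample taken from the example output above.
sample = """\
:ID:Name:Path:Persistent:Unhealthy:
:0x7bbbd26c:passdb.tdb:/var/ctdb/persistent/passdb.tdb.0:1:0:
"""

dbs = parse_getdbmap_y(sample)
# Names of all databases flagged as unhealthy (none in this sample).
unhealthy = [db["Name"] for db in dbs if db["Unhealthy"] == "1"]
```

          In a real script the text would come from running "ctdb -Y getdbmap",
          for example via subprocess.run() with capture_output=True.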
777
778
779   getdbstatus <dbname>
780       This command displays more details about a database.
781
782       Example: ctdb getdbstatus test.tdb.0
783
784       Example output:
785
786           dbid: 0x122224da
787           name: test.tdb
788           path: /var/ctdb/test.tdb.0
789           PERSISTENT: no
790           HEALTH: OK
791
792
793       Example: ctdb getdbstatus registry.tdb (with a corrupted TDB)
794
795       Example output:
796
797           dbid: 0xf2a58948
798           name: registry.tdb
799           path: /var/ctdb/persistent/registry.tdb.0
800           PERSISTENT: yes
801           HEALTH: NO-HEALTHY-NODES - ERROR - Backup of corrupted TDB in '/var/ctdb/persistent/registry.tdb.0.corrupted.20091208091949.0Z'
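
          The "key: value" layout of the getdbstatus output is easy to check
          from a script. The following is a minimal Python sketch, not part of
          ctdb, that parses the output shown above and tests the HEALTH field.
          It assumes that every line has the form "key: value" and that a
          healthy database is reported exactly as "OK", as in the first
          example.

```python
from typing import Dict


def parse_getdbstatus(text: str) -> Dict[str, str]:
    """Parse `ctdb getdbstatus` output into a dict of field name to value."""
    status = {}
    for line in text.splitlines():
        # Split on the first ':' only, since values may contain colons.
        key, sep, value = line.strip().partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def is_healthy(status: Dict[str, str]) -> bool:
    """Return True if the HEALTH field is exactly 'OK'."""
    return status.get("HEALTH") == "OK"


# Samples taken from the example outputs above.
healthy_sample = """\
dbid: 0x122224da
name: test.tdb
path: /var/ctdb/test.tdb.0
PERSISTENT: no
HEALTH: OK
"""

corrupted_sample = """\
dbid: 0xf2a58948
name: registry.tdb
path: /var/ctdb/persistent/registry.tdb.0
PERSISTENT: yes
HEALTH: NO-HEALTHY-NODES - ERROR - Backup of corrupted TDB in '/var/ctdb/persistent/registry.tdb.0.corrupted.20091208091949.0Z'
"""

healthy = parse_getdbstatus(healthy_sample)
corrupted = parse_getdbstatus(corrupted_sample)
```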
802
803
804   catdb <dbname>
805       This command will dump a clustered TDB database to the screen. This is
806       a debugging command.
807
808   dumpdbbackup <backup-file>
809       This command will dump the content of a database backup to the screen
810       (similar to ctdb catdb). This is a debugging command.
811
812   getmonmode
813       This command returns the monitoring mode of a node. The monitoring mode
814       is either ACTIVE or DISABLED. Normally a node will continuously monitor
815       that all other nodes that are expected are in fact connected and that
816       they respond to commands.
817
818       ACTIVE - This is the normal mode. The node is actively monitoring all
819       other nodes, both that the transport is connected and also that the
820       node responds to commands. If a node becomes unavailable, it will be
821       marked as DISCONNECTED and a recovery is initiated to restore the
822       cluster.
823
824       DISABLED - This node is not monitoring that other nodes are available.
825       In this mode a node failure will not be detected and no recovery will
826       be performed. This mode is useful when, for debugging purposes, one wants
827       to attach GDB to a ctdb process but wants to prevent the rest of the
828       cluster from marking this node as DISCONNECTED and doing a recovery.
829
830   setmonmode <0|1>
831       This command can be used to explicitly disable/enable monitoring mode
832       on a node. Its main purpose is to let one attach GDB to a running
833       ctdb daemon while preventing the other nodes from marking it as
834       DISCONNECTED and issuing a recovery. To do this, set monitoring mode to
835       0 on all nodes before attaching with GDB. Remember to set monitoring
836       mode back to 1 afterwards.
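
          The procedure above (disable monitoring on all nodes, debug, then
          re-enable it) can be wrapped so that the final step is not forgotten.
          The following is a minimal Python sketch, not part of ctdb. It uses
          only the -n option and the setmonmode command described in this page;
          the node list [0, 1] and the helper names are assumptions made for
          illustration.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterable, Iterator, List


def run_ctdb(args: List[str]) -> None:
    """Run the ctdb tool with the given arguments; raise if it fails."""
    subprocess.run(["ctdb", *args], check=True)


@contextmanager
def monitoring_disabled(
    nodes: Iterable[int],
    run: Callable[[List[str]], None] = run_ctdb,
) -> Iterator[None]:
    """Set monitoring mode to 0 on each node, and back to 1 on exit."""
    touched: List[int] = []
    try:
        for pnn in nodes:
            run(["-n", str(pnn), "setmonmode", "0"])
            touched.append(pnn)
        yield
    finally:
        # Re-enable monitoring on every node where it was disabled,
        # even if the body of the with-block raised an exception.
        for pnn in touched:
            run(["-n", str(pnn), "setmonmode", "1"])


# Dry run: record the argument lists instead of invoking ctdb.
calls: List[List[str]] = []
with monitoring_disabled([0, 1], run=calls.append):
    pass  # attach GDB to the ctdb daemon here
```

          With the default run argument, the same block would invoke the real
          ctdb tool on the local host.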
837
838   attach <dbname>
839       This is a debugging command. This command will make the CTDB daemon
840       create a new CTDB database and attach to it.
841
842   dumpmemory
843       This is a debugging command. This command will make the ctdb daemon
844       write a full memory allocation map to standard output.
845
846   rddumpmemory
847       This is a debugging command. This command will dump the talloc memory
848       allocation tree for the recovery daemon to standard output.
849
850   freeze
851       This command will lock all the local TDB databases causing clients that
852       are accessing these TDBs such as samba3 to block until the databases
853       are thawed.
854
855       This is primarily used by the recovery daemon to stop all samba daemons
856       from accessing any databases while the database is recovered and
857       rebuilt.
858
859   thaw
860       Thaw a previously frozen node.
861
862   eventscript <arguments>
863       This is a debugging command. This command can be used to manually
864       invoke and run the eventscripts with arbitrary arguments.
865
866   ban <bantime|0>
867       Administratively ban a node for bantime seconds. A bantime of 0 means
868       that the node should be permanently banned.
869
870       A banned node does not participate in the cluster and does not host any
871       records for the clustered TDB. Its IP address has been taken over by
872       another node and no services are hosted.
873
874       Nodes are automatically banned if they are the cause of too many
875       cluster recoveries.
876
877   unban
878       This command is used to unban a node that has either been
879       administratively banned using the ban command or has been automatically
880       banned by the recovery daemon.
881

COPYRIGHT/LICENSE

886           Copyright (C) Andrew Tridgell 2007
887           Copyright (C) Ronnie Sahlberg 2007
888
889           This program is free software; you can redistribute it and/or modify
890           it under the terms of the GNU General Public License as published by
891           the Free Software Foundation; either version 3 of the License, or (at
892           your option) any later version.
893
894           This program is distributed in the hope that it will be useful, but
895           WITHOUT ANY WARRANTY; without even the implied warranty of
896           MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
897           General Public License for more details.
898
899           You should have received a copy of the GNU General Public License
900           along with this program; if not, see http://www.gnu.org/licenses/.
901
902
903
904                                  12/09/2009                           CTDB(1)