Starting and testing CTDB
The CTDB log is in /var/log/log.ctdb, so look in this file if something did not start correctly.
Log in to all of the nodes in the cluster and start the ctdb service using
service ctdb start
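Logging in to every node by hand can be tedious. The loop below is a minimal sketch of starting the service on all nodes over ssh from one host; the nodes file path and the sample addresses are assumptions, and the ssh call is left commented out so the sketch only prints what it would do.

```shell
#!/bin/sh
# Sketch: start ctdb on every node from a single host.
# Assumption: the node list lives in /etc/ctdb/nodes (one private ip
# per line); fall back to sample addresses if the file is absent.
nodes_file=/etc/ctdb/nodes
if [ -f "$nodes_file" ]; then
    nodes=$(cat "$nodes_file")
else
    nodes="10.1.1.1 10.1.1.2 10.1.1.3 10.1.1.4"   # sample addresses
fi
for node in $nodes; do
    echo "starting ctdb on $node"
    # ssh "$node" service ctdb start   # uncomment to actually start it
done
```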
Verify that the CTDB daemon started properly. There should normally be at least 2 processes started for CTDB, one for the main daemon and one for the recovery daemon.
pidof ctdbd
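A quick sanity check can simply count the PIDs that pidof reports. This sketch uses stand-in sample output; on a real node replace the sample assignment with `pids=$(pidof ctdbd)`.

```shell
#!/bin/sh
# Sketch: verify at least 2 ctdbd processes are running
# (one main daemon plus one recovery daemon).
pids="12345 12346"          # stand-in; on a real node: pids=$(pidof ctdbd)
count=$(echo "$pids" | wc -w)
if [ "$count" -ge 2 ]; then
    echo "ctdbd looks healthy ($count processes)"
else
    echo "ctdbd may not have started correctly" >&2
fi
```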
Once all CTDB nodes have started, verify that they are correctly talking to each other.
There should be one TCP connection from the private ip address on each node to TCP port 9001 on each of the other nodes in the cluster.
netstat -a -n | grep 9001
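To check the connections mechanically you can count the ESTABLISHED lines on port 9001. The sketch below parses embedded sample netstat output; on a live node replace the sample with the real output of the command above.

```shell
#!/bin/sh
# Sketch: count established CTDB transport connections on port 9001.
# Embedded sample; on a live node use: sample=$(netstat -a -n | grep 9001)
sample='tcp 0 0 10.1.1.1:9001 10.1.1.2:38745 ESTABLISHED
tcp 0 0 10.1.1.1:9001 10.1.1.3:38746 ESTABLISHED
tcp 0 0 10.1.1.1:9001 10.1.1.4:38747 ESTABLISHED'
established=$(echo "$sample" | grep -c 'ESTABLISHED')
echo "established connections on port 9001: $established"
```

On a 4-node cluster each node should show one connection to each of the other three nodes.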
Automatically restarting CTDB
If you wish to cope with software faults in ctdb, or want ctdb to restart automatically when an administrator kills it, you may wish to add a cron entry for root like this:
* * * * * /etc/init.d/ctdb cron > /dev/null 2>&1
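If you want to install that entry without opening an editor, a common approach is to append it to root's existing crontab. The sketch below only builds and prints the line; the commented crontab command shows the actual installation step.

```shell
#!/bin/sh
# Sketch: install the ctdb watchdog cron entry non-interactively.
entry='* * * * * /etc/init.d/ctdb cron > /dev/null 2>&1'
echo "$entry"
# Run as root to install, preserving any existing entries:
# (crontab -l 2>/dev/null; echo "$entry") | crontab -
```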
Testing CTDB
Once your cluster is up and running, you may wish to know how to test that it is functioning correctly. The following tests may help with that.
The ctdb tool
The ctdb package comes with a utility called ctdb that can be used to view the behaviour of the ctdb cluster.
If you run it with no options it will provide some terse usage information. The most commonly used commands are:
ctdb status
ctdb ip
ctdb ping
ctdb status
The status command provides basic information about the cluster and the status of the nodes. When you run it you will get output like:
Number of nodes:4
vnn:0 10.1.1.1 OK (THIS NODE)
vnn:1 10.1.1.2 OK
vnn:2 10.1.1.3 OK
vnn:3 10.1.1.4 OK
Generation:1362079228
Size:4
hash:0 lmaster:0
hash:1 lmaster:1
hash:2 lmaster:2
hash:3 lmaster:3
Recovery mode:NORMAL (0)
Recovery master:0
The important parts are the node status lines and the recovery mode. This output tells us that all 4 nodes are in a healthy state.
It also tells us that the recovery mode is normal, which means that the cluster has finished a recovery and is running in a normal, fully operational state.
The recovery state will briefly change to "RECOVERY" when there has been a node failure or something is wrong with the cluster.
If the cluster remains in the RECOVERY state for very long (many seconds), there might be something wrong with the configuration; check /var/log/log.ctdb for details.
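The status output is regular enough to parse in a monitoring script. The sketch below embeds the sample output from above; on a live node you would substitute `status=$(ctdb status)`.

```shell
#!/bin/sh
# Sketch: decide whether the cluster is healthy from `ctdb status` output.
# Embedded sample; on a live node use: status=$(ctdb status)
status='Number of nodes:4
vnn:0 10.1.1.1 OK (THIS NODE)
vnn:1 10.1.1.2 OK
vnn:2 10.1.1.3 OK
vnn:3 10.1.1.4 OK
Recovery mode:NORMAL (0)
Recovery master:0'
total=$(echo "$status" | sed -n 's/^Number of nodes://p')
ok=$(echo "$status" | grep -c '^vnn:.* OK')
mode=$(echo "$status" | sed -n 's/^Recovery mode:\([A-Z]*\).*/\1/p')
if [ "$ok" -eq "$total" ] && [ "$mode" = "NORMAL" ]; then
    echo "cluster healthy: $ok/$total nodes OK"
else
    echo "cluster problem: $ok/$total nodes OK, recovery mode $mode" >&2
fi
```

With the sample data this prints "cluster healthy: 4/4 nodes OK".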
ctdb ip
This command prints the current status of the public ip addresses and shows which physical node is currently serving each address.
Number of nodes:4
192.168.1.1 0
192.168.1.2 1
192.168.2.1 2
192.168.2.2 3
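During failover it can be useful to see how many public addresses each node is currently serving. The sketch below counts addresses per node from embedded sample output; on a live node substitute the real output of ctdb ip (dropping the "Number of nodes" header line).

```shell
#!/bin/sh
# Sketch: count public addresses served per node from `ctdb ip` output.
# Embedded sample; live: ipmap=$(ctdb ip | grep -v '^Number')
ipmap='192.168.1.1 0
192.168.1.2 1
192.168.2.1 2
192.168.2.2 3'
summary=$(echo "$ipmap" \
    | awk '{count[$2]++}
           END {for (n in count) printf "node %s serves %d address(es)\n", n, count[n]}' \
    | sort)
echo "$summary"
```

In a healthy cluster the addresses should be spread evenly; a node serving zero addresses may be unhealthy or banned.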
ctdb ping
This command tries to "ping" each of the CTDB daemons in the cluster.
response from 0 time=0.000050 sec (13 clients)
response from 1 time=0.000154 sec (27 clients)
response from 2 time=0.000114 sec (17 clients)
response from 3 time=0.000115 sec (59 clients)
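The ping output can also be checked mechanically, for example to confirm that every daemon answered and to total the connected clients. The sketch below parses embedded sample output; on a live cluster substitute `pings=$(ctdb ping)`.

```shell
#!/bin/sh
# Sketch: verify every daemon answered and total the client counts.
# Embedded sample; on a live cluster use: pings=$(ctdb ping)
pings='response from 0 time=0.000050 sec (13 clients)
response from 1 time=0.000154 sec (27 clients)
response from 2 time=0.000114 sec (17 clients)
response from 3 time=0.000115 sec (59 clients)'
responders=$(echo "$pings" | grep -c '^response from')
clients=$(echo "$pings" \
    | sed 's/.*(\([0-9][0-9]*\) clients).*/\1/' \
    | awk '{s += $1} END {print s}')
echo "$responders daemons responded, $clients clients in total"
```

With the sample data this prints "4 daemons responded, 116 clients in total". A missing response line indicates a daemon that is down or unreachable.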