From 83306337df00f11dd03e06c2043e2b27b2f0c17c Mon Sep 17 00:00:00 2001 From: Amitay Isaacs Date: Thu, 11 Oct 2012 11:29:29 +1100 Subject: ctdbd: locking: Provide non-blocking API for locking of TDB record/db/alldb This introduces a consistent API for handling locks on a single record, a complete db or all dbs. The locks are taken out in a child process. In case of a timeout, the processes that currently hold the lock are found and logged. Callback functions for locking requests take a locked boolean to indicate whether the lock was successfully obtained or not. Signed-off-by: Amitay Isaacs (This used to be ctdb commit 1af99cf0de9919dd89af1feab6d1bd18b95d82ff) --- ctdb/doc/ctdbd.1 | 11 +++- ctdb/doc/ctdbd.1.html | 168 ++++++++++++++++++++++++++------------------------ ctdb/doc/ctdbd.1.xml | 15 +++++ 3 files changed, 113 insertions(+), 81 deletions(-) (limited to 'ctdb/doc') diff --git a/ctdb/doc/ctdbd.1 b/ctdb/doc/ctdbd.1 index 52c33931104..33ceb04a6cc 100644 --- a/ctdb/doc/ctdbd.1 +++ b/ctdb/doc/ctdbd.1 @@ -2,12 +2,12 @@ .\" Title: ctdbd .\" Author: [FIXME: author] [see http://docbook.sf.net/el/author] .\" Generator: DocBook XSL Stylesheets v1.76.1 -.\" Date: 10/11/2012 +.\" Date: 10/20/2012 .\" Manual: CTDB - clustered TDB database .\" Source: ctdb .\" Language: English .\" -.TH "CTDBD" "1" "10/11/2012" "ctdb" "CTDB \- clustered TDB database" +.TH "CTDBD" "1" "10/20/2012" "ctdb" "CTDB \- clustered TDB database" .\" ----------------------------------------------------------------- .\" * Define some portability stuff .\" ----------------------------------------------------------------- @@ -593,6 +593,13 @@ When many clients across many nodes try to access the same record at the same ti This parameter is used to activate a fetch\-collapse\&. A fetch\-collapse is when we track which records we have requests in flight so that we only keep one request in flight from a certain node, even if multiple smbd processes are attemtping to fetch the record at the same time\&. This can improve performance and reduce CPU utilization for certain workloads\&. .PP This timeout controls if we should collapse multiple fetch operations of the same record into a single request and defer all duplicates or not\&. +.SS "DeadlockTimeout" +.PP +Default: 60 +.PP +Number of seconds to wait before deciding that ctdb is deadlocked with samba\&. +.PP +When the ctdb daemon is blocked waiting for a lock on a database that is held by some other process, ctdb logs a warning every 10 seconds\&. Most often this is caused by samba locking a database and then waiting on ctdb, resulting in a deadlock\&. If ctdb has not obtained the lock before the deadlock timeout expires, ctdb treats this as a deadlock and terminates the blocking samba process\&. Setting this value to 0 disables deadlock detection\&. .SH "LVS" .PP LVS is a mode where CTDB presents one single IP address for the entire cluster\&. This is an alternative to using public IP addresses and round\-robin DNS to loadbalance clients across the cluster\&. diff --git a/ctdb/doc/ctdbd.1.html b/ctdb/doc/ctdbd.1.html index b6a7f54293b..8410105d75a 100644 --- a/ctdb/doc/ctdbd.1.html +++ b/ctdb/doc/ctdbd.1.html @@ -1,4 +1,4 @@ -ctdbd

Name

ctdbd — The CTDB cluster daemon

Synopsis

ctdbd

ctdbd [-? --help] [-d --debug=<INTEGER>] {--dbdir=<directory>} {--dbdir-persistent=<directory>} [--event-script-dir=<directory>] [-i --interactive] [--listen=<address>] [--logfile=<filename>] [--lvs] {--nlist=<filename>} [--no-lmaster] [--no-recmaster] [--nosetsched] {--notification-script=<filename>} [--public-addresses=<filename>] [--public-interface=<interface>] {--reclock=<filename>} [--single-public-ip=<address>] [--socket=<filename>] [--start-as-disabled] [--start-as-stopped] [--syslog] [--log-ringbuf-size=<num-entries>] [--torture] [--transport=<STRING>] [--usage]

DESCRIPTION

+ctdbd

Name

ctdbd — The CTDB cluster daemon

Synopsis

ctdbd

ctdbd [-? --help] [-d --debug=<INTEGER>] {--dbdir=<directory>} {--dbdir-persistent=<directory>} [--event-script-dir=<directory>] [-i --interactive] [--listen=<address>] [--logfile=<filename>] [--lvs] {--nlist=<filename>} [--no-lmaster] [--no-recmaster] [--nosetsched] {--notification-script=<filename>} [--public-addresses=<filename>] [--public-interface=<interface>] {--reclock=<filename>} [--single-public-ip=<address>] [--socket=<filename>] [--start-as-disabled] [--start-as-stopped] [--syslog] [--log-ringbuf-size=<num-entries>] [--torture] [--transport=<STRING>] [--usage]

DESCRIPTION

ctdbd is the main ctdb daemon.

ctdbd provides a clustered version of the TDB database with automatic rebuild/recovery of the databases upon nodefailures. @@ -8,7 +8,7 @@ ctdbd provides monitoring of all nodes in the cluster and automatically reconfigures the cluster and recovers upon node failures.

ctdbd is the main component in clustered Samba that provides a high-availability load-sharing CIFS server cluster. -

OPTIONS

-? --help

+

OPTIONS

-? --help

Print some help text to the screen.

-d --debug=<DEBUGLEVEL>

This option sets the debuglevel on the ctdbd daemon which controls what will be written to the logfile. The default is 0 which will only log important events and errors. A larger number will provide additional logging. @@ -154,10 +154,10 @@ implemented in the future.

--usage

Print useage information to the screen. -

Private vs Public addresses

+

Private vs Public addresses

When used for ip takeover in a HA environment, each node in a ctdb cluster has multiple ip addresses assigned to it. One private and one or more public. -

Private address

+

Private address

This is the physical ip address of the node which is configured in linux and attached to a physical interface. This address uniquely identifies a physical node in the cluster and is the ip addresses @@ -187,7 +187,7 @@ 10.1.1.2 10.1.1.3 10.1.1.4 -

Public address

+

Public address

A public address on the other hand is not attached to an interface. This address is managed by ctdbd itself and is attached/detached to a physical node at runtime. @@ -248,7 +248,7 @@ unavailable. 10.1.1.1 can not be failed over to node 2 or node 3 since these nodes do not have this ip address listed in their public addresses file. -

Node status

+

Node status

The current status of each node in the cluster can be viewed by the 'ctdb status' command.

@@ -285,9 +285,9 @@ RECMASTER or NATGW. This node does not perticipate in the CTDB cluster but can still be communicated with. I.e. ctdb commands can be sent to it. -

PUBLIC TUNABLES

+

PUBLIC TUNABLES

These are the public tuneables that can be used to control how ctdb behaves. -

MaxRedirectCount

Default: 3

+

MaxRedirectCount

Default: 3

If we are not the DMASTER and need to fetch a record across the network we first send the request to the LMASTER after which the record is passed onto the current DMASTER. If the DMASTER changes before @@ -301,7 +301,7 @@

When chasing a record, this is how many hops we will chase the record for before going back to the LMASTER to ask for new guidance. -

SeqnumInterval

Default: 1000

+

SeqnumInterval

Default: 1000

Some databases have seqnum tracking enabled, so that samba will be able to detect asynchronously when there has been updates to the database. Everytime a database is updated its sequence number is increased. @@ -309,17 +309,17 @@ This tunable is used to specify in 'ms' how frequently ctdb will send out updates to remote nodes to inform them that the sequence number is increased. -

ControlTimeout

Default: 60

+

ControlTimeout

Default: 60

This is the default setting for timeout for when sending a control message to either the local or a remote ctdb daemon. -

TraverseTimeout

Default: 20

+

TraverseTimeout

Default: 20

This setting controls how long we allow a traverse process to run. After this timeout triggers, the main ctdb daemon will abort the traverse if it has not yet finished. -

KeepaliveInterval

Default: 5

+

KeepaliveInterval

Default: 5

How often in seconds should the nodes send keepalives to eachother. -

KeepaliveLimit

Default: 5

+

KeepaliveLimit

Default: 5

After how many keepalive intervals without any traffic should a node wait until marking the peer as DISCONNECTED.

@@ -328,60 +328,60 @@ require a recovery. This limitshould not be set too high since we want a hung node to be detectec, and expunged from the cluster well before common CIFS timeouts (45-90 seconds) kick in. -

RecoverTimeout

Default: 20

+

RecoverTimeout

Default: 20

This is the default setting for timeouts for controls when sent from the recovery daemon. We allow longer control timeouts from the recovery daemon than from normal use since the recovery dameon often use controls that can take a lot longer than normal controls. -

RecoverInterval

Default: 1

+

RecoverInterval

Default: 1

How frequently in seconds should the recovery daemon perform the consistency checks that determine if we need to perform a recovery or not. -

ElectionTimeout

Default: 3

+

ElectionTimeout

Default: 3

When electing a new recovery master, this is how many seconds we allow the election to take before we either deem the election finished or we fail the election and start a new one. -

TakeoverTimeout

Default: 9

+

TakeoverTimeout

Default: 9

This is how many seconds we allow controls to take for IP failover events. -

MonitorInterval

Default: 15

+

MonitorInterval

Default: 15

How often should ctdb run the event scripts to check for a nodes health. -

TickleUpdateInterval

Default: 20

+

TickleUpdateInterval

Default: 20

How often will ctdb record and store the "tickle" information used to kickstart stalled tcp connections after a recovery. -

EventScriptTimeout

Default: 20

+

EventScriptTimeout

Default: 20

How long should ctdb let an event script run before aborting it and marking the node unhealthy. -

EventScriptTimeoutCount

Default: 1

+

EventScriptTimeoutCount

Default: 1

How many events in a row needs to timeout before we flag the node UNHEALTHY. This setting is useful if your scripts can not be written so that they do not hang for benign reasons. -

EventScriptUnhealthyOnTimeout

Default: 0

+

EventScriptUnhealthyOnTimeout

Default: 0

This setting can be be used to make ctdb never become UNHEALTHY if your eventscripts keep hanging/timing out. -

RecoveryGracePeriod

Default: 120

+

RecoveryGracePeriod

Default: 120

During recoveries, if a node has not caused recovery failures during the last grace period, any records of transgressions that the node has caused recovery failures will be forgiven. This resets the ban-counter back to zero for that node. -

RecoveryBanPeriod

Default: 300

+

RecoveryBanPeriod

Default: 300

If a node becomes banned causing repetitive recovery failures. The node will eventually become banned from the cluster. This controls how long the culprit node will be banned from the cluster before it is allowed to try to join the cluster again. Don't set to small. A node gets banned for a reason and it is usually due to real problems with the node. -

DatabaseHashSize

Default: 100001

+

DatabaseHashSize

Default: 100001

Size of the hash chains for the local store of the tdbs that ctdb manages. -

DatabaseMaxDead

Default: 5

+

DatabaseMaxDead

Default: 5

How many dead records per hashchain in the TDB database do we allow before the freelist needs to be processed. -

RerecoveryTimeout

Default: 10

+

RerecoveryTimeout

Default: 10

Once a recovery has completed, no additional recoveries are permitted until this timeout has expired. -

EnableBans

Default: 1

+

EnableBans

Default: 1

When set to 0, this disables BANNING completely in the cluster and thus nodes can not get banned, even it they break. Don't set to 0 unless you know what you are doing. -

DeterministicIPs

Default: 0

+

DeterministicIPs

Default: 0

When enabled, this tunable makes ctdb try to keep public IP addresses locked to specific nodes as far as possible. This makes it easier for debugging since you can know that as long as all nodes are healthy @@ -392,12 +392,12 @@ public IP assignment changes in the cluster. This tunable may increase the number of IP failover/failbacks that are performed on the cluster by a small margin. -

LCP2PublicIPs

Default: 1

+

LCP2PublicIPs

Default: 1

When enabled this switches ctdb to use the LCP2 ip allocation algorithm. -

ReclockPingPeriod

Default: x

+

ReclockPingPeriod

Default: x

Obsolete -

NoIPFailback

Default: 0

+

NoIPFailback

Default: 0

When set to 1, ctdb will not perform failback of IP addresses when a node becomes healthy. Ctdb WILL perform failover of public IP addresses when a node becomes UNHEALTHY, but when the node becomes HEALTHY again, ctdb @@ -415,7 +415,7 @@ intervention from the administrator. When this parameter is set, you can manually fail public IP addresses over to the new node(s) using the 'ctdb moveip' command. -

DisableIPFailover

Default: 0

+

DisableIPFailover

Default: 0

When enabled, ctdb will not perform failover or failback. Even if a node fails while holding public IPs, ctdb will not recover the IPs or assign them to another node. @@ -424,59 +424,59 @@ the cluster by failing IP addresses over to other nodes. This leads to a service outage until the administrator has manually performed failover to replacement nodes using the 'ctdb moveip' command. -

NoIPTakeover

Default: 0

+

NoIPTakeover

Default: 0

When set to 1, ctdb will allow ip addresses to be failed over onto this node. Any ip addresses that the node currently hosts will remain on the node but no new ip addresses can be failed over onto the node. -

NoIPTakeoverOnDisabled

Default: 0

+

NoIPTakeoverOnDisabled

Default: 0

If no nodes are healthy then by default ctdb will happily host public IPs on disabled (unhealthy or administratively disabled) nodes. This can cause problems, for example if the underlying cluster filesystem is not mounted. When set to 1 this behaviour is switched off and disabled nodes will not be able to takeover IPs. -

DBRecordCountWarn

Default: 100000

+

DBRecordCountWarn

Default: 100000

When set to non-zero, ctdb will log a warning when we try to recover a database with more than this many records. This will produce a warning if a database grows uncontrollably with orphaned records. -

DBRecordSizeWarn

Default: 10000000

+

DBRecordSizeWarn

Default: 10000000

When set to non-zero, ctdb will log a warning when we try to recover a database where a single record is bigger than this. This will produce a warning if a database record grows uncontrollably with orphaned sub-records. -

DBSizeWarn

Default: 1000000000

+

DBSizeWarn

Default: 1000000000

When set to non-zero, ctdb will log a warning when we try to recover a database bigger than this. This will produce a warning if a database grows uncontrollably. -

VerboseMemoryNames

Default: 0

+

VerboseMemoryNames

Default: 0

This feature consumes additional memory. when used the talloc library will create more verbose names for all talloc allocated objects. -

RecdPingTimeout

Default: 60

+

RecdPingTimeout

Default: 60

If the main dameon has not heard a "ping" from the recovery dameon for this many seconds, the main dameon will log a message that the recovery daemon is potentially hung. -

RecdFailCount

Default: 10

+

RecdFailCount

Default: 10

If the recovery daemon has failed to ping the main dameon for this many consecutive intervals, the main daemon will consider the recovery daemon as hung and will try to restart it to recover. -

LogLatencyMs

Default: 0

+

LogLatencyMs

Default: 0

When set to non-zero, this will make the main daemon log any operation that took longer than this value, in 'ms', to complete. These include "how long time a lockwait child process needed", "how long time to write to a persistent database" but also "how long did it take to get a response to a CALL from a remote node". -

RecLockLatencyMs

Default: 1000

+

RecLockLatencyMs

Default: 1000

When using a reclock file for split brain prevention, if set to non-zero this tunable will make the recovery dameon log a message if the fcntl() call to lock/testlock the recovery file takes longer than this number of ms. -

RecoveryDropAllIPs

Default: 120

+

RecoveryDropAllIPs

Default: 120

If we have been stuck in recovery, or stopped, or banned, mode for this many seconds we will force drop all held public addresses. -

verifyRecoveryLock

Default: 1

+

verifyRecoveryLock

Default: 1

Should we take a fcntl() lock on the reclock file to verify that we are the sole recovery master node on the cluster or not. -

DeferredAttachTO

Default: 120

+

DeferredAttachTO

Default: 120

When databases are frozen we do not allow clients to attach to the databases. Instead of returning an error immediately to the application the attach request from the client is deferred until the database @@ -484,7 +484,7 @@

This timeout controls how long we will defer the request from the client before timing it out and returning an error to the client. -

HopcountMakeSticky

Default: 50

+

HopcountMakeSticky

Default: 50

If the database is set to 'STICKY' mode, using the 'ctdb setdbsticky' command, any record that is seen as very hot and migrating so fast that hopcount surpasses 50 is set to become a STICKY record for StickyDuration @@ -495,15 +495,15 @@ migrating across the cluster so fast. This will improve performance for certain workloads, such as locking.tdb if many clients are opening/closing the same file concurrently. -

StickyDuration

Default: 600

+

StickyDuration

Default: 600

Once a record has been found to be fetch-lock hot and has been flagged to become STICKY, this is for how long, in seconds, the record will be flagged as a STICKY record. -

StickyPindown

Default: 200

+

StickyPindown

Default: 200

Once a STICKY record has been migrated onto a node, it will be pinned down on that node for this number of ms. Any request from other nodes to migrate the record off the node will be deferred until the pindown timer expires. -

MaxLACount

Default: 20

+

MaxLACount

Default: 20

When record content is fetched from a remote node, if it is only for reading the record, pass back the content of the record but do not yet migrate the record. Once MaxLACount identical requests from the @@ -511,13 +511,13 @@ onto the requesting node. This reduces the amount of migration for a database read-mostly workload at the expense of more frequent network roundtrips. -

StatHistoryInterval

Default: 1

+

StatHistoryInterval

Default: 1

Granularity of the statistics collected in the statistics history. -

AllowClientDBAttach

Default: 1

+

AllowClientDBAttach

Default: 1

When set to 0, clients are not allowed to attach to any databases. This can be used to temporarily block any new processes from attaching to and accessing the databases. -

RecoverPDBBySeqNum

Default: 0

+

RecoverPDBBySeqNum

Default: 0

When set to non-zero, this will change how the recovery process for persistent databases ar performed. By default, when performing a database recovery, for normal as for persistent databases, recovery is @@ -528,7 +528,7 @@ a whole db and not by individual records. The node that contains the highest value stored in the record "__db_sequence_number__" is selected and the copy of that nodes database is used as the recovered database. -

FetchCollapse

Default: 1

+

FetchCollapse

Default: 1

When many clients across many nodes try to access the same record at the same time this can lead to a fetch storm where the record becomes very active and bounces between nodes very fast. This leads to high CPU @@ -544,7 +544,17 @@

This timeout controls if we should collapse multiple fetch operations of the same record into a single request and defer all duplicates or not. -

LVS

+

DeadlockTimeout

Default: 60

+ Number of seconds to wait before deciding that ctdb is deadlocked with samba. +

+ When the ctdb daemon is blocked waiting for a lock on a database that is + held by some other process, ctdb logs a warning every 10 seconds. Most + often this is caused by samba locking a database and then waiting on ctdb, + resulting in a deadlock. If ctdb has not obtained the lock before the + deadlock timeout expires, ctdb treats this as a deadlock and terminates the + blocking samba process. Setting this value to 0 disables deadlock + detection. +

LVS

LVS is a mode where CTDB presents one single IP address for the entire cluster. This is an alternative to using public IP addresses and round-robin DNS to loadbalance clients across the cluster. @@ -585,7 +595,7 @@ the processing node back to the clients. For read-intensive i/o patterns you can acheive very high throughput rates in this mode.

Note: you can use LVS and public addresses at the same time. -

Configuration

+

Configuration

To activate LVS on a CTDB node you must specify CTDB_PUBLIC_INTERFACE and CTDB_LVS_PUBLIC_ADDRESS in /etc/sysconfig/ctdb.

@@ -608,7 +618,7 @@ You must also specify the "--lvs" command line argument to ctdbd to activate LVS all of the clients from the node BEFORE you enable LVS. Also make sure that when you ping these hosts that the traffic is routed out through the eth0 interface. -

REMOTE CLUSTER NODES

+

REMOTE CLUSTER NODES

It is possible to have a CTDB cluster that spans across a WAN link. For example where you have a CTDB cluster in your datacentre but you also want to have one additional CTDB node located at a remote branch site. @@ -637,7 +647,7 @@ CTDB_CAPABILITY_RECMASTER=no

Verify with the command "ctdb getcapabilities" that that node no longer has the recmaster or the lmaster capabilities. -

NAT-GW

+

NAT-GW

Sometimes it is desireable to run services on the CTDB node which will need to originate outgoing traffic to external servers. This might be contacting NIS servers, LDAP servers etc. etc. @@ -660,7 +670,7 @@ CTDB_CAPABILITY_RECMASTER=no if there are no public addresses assigned to the node. This is the simplest way but it uses up a lot of ip addresses since you have to assign both static and also public addresses to each node. -

NAT-GW

+

NAT-GW

A second way is to use the built in NAT-GW feature in CTDB. With NAT-GW you assign one public NATGW address for each natgw group. Each NATGW group is a set of nodes in the cluster that shares the same @@ -675,7 +685,7 @@ CTDB_CAPABILITY_RECMASTER=no In each NATGW group, one of the nodes is designated the NAT Gateway through which all traffic that is originated by nodes in this group will be routed through if a public addresses are not available. -

Configuration

+

Configuration

NAT-GW is configured in /etc/sysconfig/ctdb by setting the following variables:

@@ -723,31 +733,31 @@ CTDB_CAPABILITY_RECMASTER=no
 # become natgw master.
 #
 # CTDB_NATGW_SLAVE_ONLY=yes
-    

CTDB_NATGW_PUBLIC_IP

+

CTDB_NATGW_PUBLIC_IP

This is an ip address in the public network that is used for all outgoing traffic when the public addresses are not assigned. This address will be assigned to one of the nodes in the cluster which will masquerade all traffic for the other nodes.

Format of this parameter is IPADDRESS/NETMASK -

CTDB_NATGW_PUBLIC_IFACE

+

CTDB_NATGW_PUBLIC_IFACE

This is the physical interface where the CTDB_NATGW_PUBLIC_IP will be assigned to. This should be an interface connected to the public network.

Format of this parameter is INTERFACE -

CTDB_NATGW_DEFAULT_GATEWAY

+

CTDB_NATGW_DEFAULT_GATEWAY

This is the default gateway to use on the node that is elected to host the CTDB_NATGW_PUBLIC_IP. This is the default gateway on the public network.

Format of this parameter is IPADDRESS -

CTDB_NATGW_PRIVATE_NETWORK

+

CTDB_NATGW_PRIVATE_NETWORK

This is the network/netmask used for the interal private network.

Format of this parameter is IPADDRESS/NETMASK -

CTDB_NATGW_NODES

+

CTDB_NATGW_NODES

This is the list of all nodes that belong to the same NATGW group as this node. The default is /etc/ctdb/natgw_nodes. -

Operation

+

Operation

When the NAT-GW functionality is used, one of the nodes is elected to act as a NAT router for all the other nodes in the group when they need to originate traffic to the external public network. @@ -766,7 +776,7 @@ CTDB_CAPABILITY_RECMASTER=no

This is implemented in the 11.natgw eventscript. Please see the eventscript for further information. -

Removing/Changing NATGW at runtime

+

Removing/Changing NATGW at runtime

The following are the procedures to change/remove a NATGW configuration at runtime, without having to restart ctdbd.

@@ -780,7 +790,7 @@ CTDB_CAPABILITY_RECMASTER=no 1, Run 'CTDB_BASE=/etc/ctdb /etc/ctdb/events.d/11.natgw removenatgw' 2, Then change the configuration in /etc/sysconfig/ctdb 3, Run 'CTDB_BASE=/etc/ctdb /etc/ctdb/events.d/11.natgw updatenatgw' -

POLICY ROUTING

+

POLICY ROUTING

A node running CTDB may be a component of a complex network topology. In particular, public addresses may be spread across several different networks (or VLANs) and it may not be possible @@ -790,7 +800,7 @@ CTDB_CAPABILITY_RECMASTER=no be specified for packets sourced from each public address. The routes are added and removed as CTDB moves public addresses between nodes. -

Configuration variables

+

Configuration variables

There are 4 configuration variables related to policy routing:

CTDB_PER_IP_ROUTING_CONF

The name of a configuration file that specifies the @@ -831,7 +841,7 @@ CTDB_CAPABILITY_RECMASTER=no The label for a public address <addr;gt; will look like ctdb.<addr>. This means that the associated rules and routes are easy to read (and manipulate). -

Configuration file

+

Configuration file

The format of each line is:

     <public_address> <network> [ <gateway> ]
@@ -892,7 +902,7 @@ CTDB_CAPABILITY_RECMASTER=no
       

   192.168.1.0/24 dev eth2 scope link 
   default via 192.168.1.1 dev eth2 
-      

Example configuration

+

Example configuration

Here is a more complete example configuration.

 /etc/ctdb/public_addresses:
@@ -912,7 +922,7 @@ CTDB_CAPABILITY_RECMASTER=no
 	The routes local packets as expected, the default route is as
 	previously discussed, but packets to 192.168.200.0/24 are
 	routed via the alternate gateway 192.168.1.254.
-      

NOTIFICATION SCRIPT

+

NOTIFICATION SCRIPT

Notification scripts are used with ctdb to have a call-out from ctdb to a user-specified script when certain state changes occur in ctdb. This is commonly to set up either sending SNMP traps or emails @@ -924,17 +934,17 @@ CTDB_CAPABILITY_RECMASTER=no See /etc/ctdb/notify.sh for an example script.

CTDB currently generates notifications on these state changes: -

unhealthy

+

unhealthy

This call-out is triggered when the node changes to UNHEALTHY state. -

healthy

+

healthy

This call-out is triggered when the node changes to HEALTHY state. -

startup

+

startup

This call-out is triggered when ctdb has started up and all managed services are up and running. -

ClamAV Daemon

+

ClamAV Daemon

CTDB has support to manage the popular anti-virus daemon ClamAV. This support is implemented through the eventscript : /etc/ctdb/events.d/31.clamd. -

Configuration

+

Configuration

Start by configuring CLAMAV normally and test that it works. Once this is done, copy the configuration files over to all the nodes so that all nodes share identical CLAMAV configurations. @@ -963,10 +973,10 @@ Once you have restarted CTDBD, use ctdb scriptstatus

and verify that the 31.clamd eventscript is listed and that it was executed successfully. -

SEE ALSO

+

SEE ALSO

ctdb(1), onnode(1) http://ctdb.samba.org/ -

COPYRIGHT/LICENSE


+

COPYRIGHT/LICENSE


Copyright (C) Andrew Tridgell 2007
Copyright (C) Ronnie sahlberg 2007

diff --git a/ctdb/doc/ctdbd.1.xml b/ctdb/doc/ctdbd.1.xml index 7cde951d3d9..d192febf960 100644 --- a/ctdb/doc/ctdbd.1.xml +++ b/ctdb/doc/ctdbd.1.xml @@ -1083,6 +1083,21 @@ + DeadlockTimeout + Default: 60 + + Number of seconds to wait before deciding that ctdb is deadlocked with samba. + + + When the ctdb daemon is blocked waiting for a lock on a database that is + held by some other process, ctdb logs a warning every 10 seconds. Most + often this is caused by samba locking a database and then waiting on ctdb, + resulting in a deadlock. If ctdb has not obtained the lock before the + deadlock timeout expires, ctdb treats this as a deadlock and terminates the + blocking samba process. Setting this value to 0 disables deadlock + detection. + + LVS -- cgit
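
For illustration only, outside the patch itself: the commit message describes a callback-driven, non-blocking locking model in which the lock is actually taken in a child process and the callback is told whether the lock was obtained. The C sketch below shows just that calling convention. The names lock_record_async, lock_callback_fn and on_record_locked are hypothetical stand-ins invented for this sketch; they are not the functions this commit adds to ctdb.

/* Minimal sketch of a callback-style, non-blocking lock request.
 * Hypothetical API: a real implementation, as the commit message
 * describes, would fork a child process to take the blocking lock
 * and invoke the callback from the main event loop. */
#include <stdbool.h>
#include <stdio.h>

/* Callback invoked when the lock attempt completes; "locked" says
 * whether the lock was successfully obtained. */
typedef void (*lock_callback_fn)(void *private_data, bool locked);

static void lock_record_async(const char *db, const char *key,
                              lock_callback_fn callback, void *private_data)
{
        (void)db;
        (void)key;
        /* Stub: pretend the child process obtained the lock at once. */
        callback(private_data, true);
}

static void on_record_locked(void *private_data, bool locked)
{
        const char *key = private_data;

        if (!locked) {
                fprintf(stderr, "could not lock record %s\n", key);
                return;
        }
        printf("record %s locked, safe to update\n", key);
        /* ... work on the record here, then release the lock ... */
}

int main(void)
{
        lock_record_async("locking.tdb", "example-key",
                          on_record_locked, (void *)"example-key");
        return 0;
}

The point of this shape is that the caller never blocks: it queues the request and resumes in the callback, which is what lets the main ctdb daemon keep serving other requests while a child process waits on the lock.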