path: root/ctdb/server/ctdb_tunables.c
Commit message | Author | Age | Files | Lines
* tunables: Remove obsolete tunables
  Amitay Isaacs, 2013-10-30, 1 file, -3/+0
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  (This used to be ctdb commit ca5fc3431573c44d55d09d987c715fb53756fc1f)
* daemon: Change the default recovery method for persistent databases
  Amitay Isaacs, 2013-10-28, 1 file, -1/+1
  Use sequence numbers to do recovery for persistent databases instead of RSNs. This fixes the problem of registry corruption during recovery.
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  (This used to be ctdb commit 56486d1c01cc8ad0e4b8cee7a22429e72e50f03d)
* Revert "LACOUNT: Add back lacount mechanism to defer migrating a fetched/read copy until after default of 20 consecutive requests from the same node"
  Amitay Isaacs, 2013-08-22, 1 file, -1/+0
  This reverts commit 035c0d981bde8c0eee8b3f24ba8e2dc817e5b504.
  This is a premature optimization. A contended record can bounce between nodes very quickly. There is no need to hold a record on a node unnecessarily. If record contention becomes bad, enabling sticky records on the database is a better idea.
  Conflicts: include/ctdb_private.h server/ctdb_tunables.c
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  (This used to be ctdb commit ac417b0003f0116f116834ad2ac51482d25cfa0d)
* ctdbd: No need for DeadlockTimeout tunable
  Amitay Isaacs, 2013-07-10, 1 file, -1/+0
  The code for detecting deadlocks and killing the smbd process causing the deadlock has been removed and replaced with an external debug script.
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  (This used to be ctdb commit 2211cd94bea266547d3e6f167d3160a6b23bec88)
* ctdbd: Update the get_tunable code to return -EINVAL for unknown tunable
  Martin Schwenke, 2013-05-24, 1 file, -1/+1
  Otherwise callers can't tell the difference between some other failure (e.g. memory allocation failure) and an unknown tunable.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit 03fd90d41f9cd9b8c42dc6b8b8d46ae19101a544)
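To illustrate why a distinct error code helps, here is a minimal sketch of a name-based tunable lookup that returns -EINVAL only when the name is unknown. The `tunable_entry` struct, the two-entry table and this `get_tunable()` signature are hypothetical, not ctdb's actual code, and the name comparison is case-sensitive by assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table layout; names and values are illustrative only. */
struct tunable_entry {
    const char *name;
    uint32_t value;
};

static const struct tunable_entry tunable_table[] = {
    { "VacuumInterval", 10 },
    { "PullDBPreallocation", 10 * 1024 * 1024 },
};

/*
 * Look up a tunable by name. Returns 0 and stores the value on success,
 * or -EINVAL when no tunable of that name exists, so callers can tell
 * "unknown tunable" apart from other failures such as -ENOMEM.
 */
static int get_tunable(const char *name, uint32_t *value)
{
    size_t n = sizeof(tunable_table) / sizeof(tunable_table[0]);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(tunable_table[i].name, name) == 0) {
            *value = tunable_table[i].value;
            return 0;
        }
    }
    return -EINVAL;
}
```

A caller can then treat -EINVAL as "no such tunable" and any other negative value as an internal failure.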
* recoverd: Fix tunable NoIPTakeoverOnDisabled, rename to NoIPHostOnAllDisabled
  Martin Schwenke, 2013-05-07, 1 file, -1/+1
  This really needs to be per-node. The rename is because nodes with this tunable switched on should drop IPs if they become unhealthy (or disabled in some other way).
    * Add new flag NODE_FLAGS_NOIPHOST, only used in the recovery daemon.
    * Enhance set_ipflags_internal() and set_ipflags() to set up NODE_FLAGS_NOIPHOST depending on the setting of NoIPHostOnAllDisabled and/or whether nodes are disabled/inactive.
    * Replace can_node_servce_ip() with functions can_node_host_ip() and can_node_takeover_ip(). These functions are the only ones that need to look at NODE_FLAGS_NOIPTAKEOVER and NODE_FLAGS_NOIPHOST. They can make the decision without looking at any other flags because of the previous setup.
    * Remove explicit flag checking in IP allocation functions (including unassign_unsuitable_ips()) and just call can_node_host_ip() and can_node_takeover_ip() as appropriate.
    * Update test code to handle CTDB_SET_NoIPHostOnAllDisabled.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
  (This used to be ctdb commit 1308a51f73f2e29ba4dbebb6111d9309a89732cc)
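The flag precomputation described above can be sketched as follows. This is a simplified illustration assuming hypothetical flag values, a hypothetical `struct node`, and a simplified rule for "all nodes disabled"; only the flag and function names come from the commit, and ctdb's real code differs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values are hypothetical; only the names come from the commit. */
#define NODE_FLAGS_INACTIVE      0x01u
#define NODE_FLAGS_DISABLED      0x02u
#define NODE_FLAGS_NOIPTAKEOVER  0x10u
#define NODE_FLAGS_NOIPHOST      0x20u

struct node {
    uint32_t flags;
    bool no_ip_takeover;             /* per-node NoIPTakeover */
    bool no_ip_host_on_all_disabled; /* per-node NoIPHostOnAllDisabled */
};

/* True if no node is both active and enabled (simplified rule). */
static bool all_nodes_disabled(const struct node *nodes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if ((nodes[i].flags & (NODE_FLAGS_INACTIVE | NODE_FLAGS_DISABLED)) == 0) {
            return false;
        }
    }
    return true;
}

/* Precompute IP flags once, so the checks below need look at nothing else. */
static void set_ipflags(struct node *nodes, size_t n)
{
    bool all_disabled = all_nodes_disabled(nodes, n);

    for (size_t i = 0; i < n; i++) {
        struct node *nd = &nodes[i];

        nd->flags &= ~(NODE_FLAGS_NOIPTAKEOVER | NODE_FLAGS_NOIPHOST);
        if (nd->no_ip_takeover) {
            nd->flags |= NODE_FLAGS_NOIPTAKEOVER;
        }
        if (nd->flags & NODE_FLAGS_INACTIVE) {
            nd->flags |= NODE_FLAGS_NOIPHOST;
        } else if (all_disabled) {
            /* Nobody is healthy: only nodes that opted in drop their IPs. */
            if (nd->no_ip_host_on_all_disabled) {
                nd->flags |= NODE_FLAGS_NOIPHOST;
            }
        } else if (nd->flags & NODE_FLAGS_DISABLED) {
            /* Some node is healthy, so disabled nodes must not host IPs. */
            nd->flags |= NODE_FLAGS_NOIPHOST;
        }
    }
}

static bool can_node_host_ip(const struct node *nd)
{
    return (nd->flags & NODE_FLAGS_NOIPHOST) == 0;
}

static bool can_node_takeover_ip(const struct node *nd)
{
    return can_node_host_ip(nd) && (nd->flags & NODE_FLAGS_NOIPTAKEOVER) == 0;
}
```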
* ctdbd: Fix the PullDBPreallocation size to 10MB as intended
  Amitay Isaacs, 2013-02-14, 1 file, -1/+1
  In 1f262deaad0818f159f9c68330f7fec121679023, Ronnie changed the recovery code to allocate chunks of 10MB in traverse_pulldb() and traverse_recdb(). However, the PullDBPreallocation tunable was set to 100MB.
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  (This used to be ctdb commit e204fac03412520e877ab04363b3ece02667c55b)
* daemon: Add a tunable to enable automatic database priority setting
  Amitay Isaacs, 2013-01-05, 1 file, -0/+1
  Samba versions 3.6.x and older do not set the database priority. This can cause deadlock between Samba and CTDB since the locking order of databases will be different. A hack was added for automatic promotion of priority for specific databases to avoid deadlock. This code should not be invoked with Samba version 4.x, which correctly specifies the priority for each database.
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  Reviewed-by: Michael Adam <obnox@samba.org>
  (This used to be ctdb commit 4a9e96ad3d8fc46da1cd44cd82309c1b54301eb7)
* ctdbd: locking: Provide non-blocking API for locking of TDB record/db/alldb
  Amitay Isaacs, 2012-10-20, 1 file, -0/+1
  This introduces a consistent API for handling locks on a single record, a complete db or all dbs. The locks are taken out in a child process. On timeout, find the processes that currently hold the lock and log them. Callback functions for locking requests take a locked boolean to indicate whether the lock was successfully obtained.
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  (This used to be ctdb commit 1af99cf0de9919dd89af1feab6d1bd18b95d82ff)
* ctdbd: New tunable NoIPTakeoverOnDisabled
  Martin Schwenke, 2012-10-11, 1 file, -1/+2
  Stops the behaviour where unhealthy nodes can host IPs when there are no healthy nodes. Set this to 1 when an immediate complete outage is preferred when all nodes are unhealthy. The alternative (i.e. default) can lead to undefined behaviour when the shared filesystem is unavailable.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit a555940fb5c914b7581667a05153256ad7d17774)
* RECOVERY: Increase the time we allow before timing out recovery related tasks.
  Ronnie Sahlberg, 2012-05-25, 1 file, -1/+1
  If the system is temporarily taking unusually long to perform these tasks, it is better to wait a lot longer and allow the tasks to complete than to time out repeatedly and then become banned.
  (This used to be ctdb commit 03fa2a517247eb2adfba67248e2466f17ea14418)
* RECOVER: When we pull databases during recovery, we used to reallocate the databuffer for each entry added.
  Ronnie Sahlberg, 2012-05-25, 1 file, -1/+2
  This would normally not be an issue, but where memory is fragmented, this could start to cost significant CPU if we need to reallocate and move to a different region. Instead, preallocate the data buffer in chunks of 10MB by default. This significantly reduces the number of reallocate-and-move operations that may be required. Create a tunable to override/change how much preallocation should be used.
  (This used to be ctdb commit 1f262deaad0818f159f9c68330f7fec121679023)
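A minimal sketch of chunked preallocation, assuming plain `realloc()` (ctdb itself uses talloc) and a hypothetical `struct pull_buffer`; the `chunk` argument plays the role of the PullDBPreallocation tunable, and the `reallocs` counter exists only to make the effect observable.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer for pulled records. */
struct pull_buffer {
    unsigned char *data;
    size_t used;       /* bytes filled */
    size_t allocated;  /* bytes reserved */
    unsigned reallocs; /* number of realloc() calls, for illustration */
};

/* Append one record, growing by at least `chunk` bytes when space runs out. */
static int pull_buffer_append(struct pull_buffer *b, const void *rec,
                              size_t len, size_t chunk)
{
    if (b->used + len > b->allocated) {
        size_t grow = len > chunk ? len : chunk;
        unsigned char *p = realloc(b->data, b->allocated + grow);

        if (p == NULL) {
            return -1;  /* old buffer is still valid and owned by b */
        }
        b->data = p;
        b->allocated += grow;
        b->reallocs++;
    }
    memcpy(b->data + b->used, rec, len);
    b->used += len;
    return 0;
}
```

With a 10MB chunk, appending 1000 records of 100 bytes each triggers a single reallocation instead of 1000.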
* DEBUG: Add checks for and print debug messages when 1) a database contains very many records, 2) a database is very big, 3) a single record is very big.
  Ronnie Sahlberg, 2012-05-21, 1 file, -1/+4
  Add tunables to control when to log these instances, and allow each check to be turned off completely by setting its threshold to 0.
  (This used to be ctdb commit 9ed58fef4991725f75509433496f4d5ffae0ae87)
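The "threshold of 0 disables the check" rule can be sketched like this. The struct, field names and warning bits are hypothetical, and treating "exceeds" as strictly greater than the threshold is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the three thresholds described above. */
struct db_warn_limits {
    uint64_t max_record_count;
    uint64_t max_db_size;
    uint64_t max_record_size;
};

enum {
    WARN_RECORD_COUNT = 1 << 0,
    WARN_DB_SIZE      = 1 << 1,
    WARN_RECORD_SIZE  = 1 << 2,
};

/* A threshold of 0 disables that check entirely. */
static bool over_limit(uint64_t value, uint64_t threshold)
{
    return threshold != 0 && value > threshold;
}

/* Returns a bitmask of the warnings that should be logged. */
static unsigned check_db_limits(const struct db_warn_limits *l,
                                uint64_t record_count, uint64_t db_size,
                                uint64_t largest_record)
{
    unsigned warnings = 0;

    if (over_limit(record_count, l->max_record_count)) {
        warnings |= WARN_RECORD_COUNT;
    }
    if (over_limit(db_size, l->max_db_size)) {
        warnings |= WARN_DB_SIZE;
    }
    if (over_limit(largest_record, l->max_record_size)) {
        warnings |= WARN_RECORD_SIZE;
    }
    return warnings;
}
```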
* Debug: When scripts hang, we may need to collect additional data in order to debug why the script hung.
  Ronnie Sahlberg, 2012-05-17, 1 file, -1/+1
  Break this debugging and data collection out into an external script to make it easier to modify what data we need to collect. For now we only collect a pstree so we can see which part of the script we hung in.
  S1037271
  (This used to be ctdb commit 6e68797af67bee36f2bad045f94806e7e98f27e9)
* NoIPTakeover: change the name of the "don't allow failing addresses over onto the node" tunable to NoIPTakeover
  Ronnie Sahlberg, 2012-03-22, 1 file, -1/+2
  (This used to be ctdb commit 35592e618cfd827b6978af6332f80504f232c46a)
* STICKY: add prototype code to make records stick to a node to "calm" down if they are found to be very hot and accessed by a lot of clients.
  Ronnie Sahlberg, 2012-03-20, 1 file, -1/+4
  This can improve performance and stop clients from having to chase a rapidly migrating/bouncing record.
  (This used to be ctdb commit d0d98f7e45e5084b81335b004d50bddc80cdc219)
* LACOUNT: Add back lacount mechanism to defer migrating a fetched/read copy until after default of 20 consecutive requests from the same node
  Ronnie Sahlberg, 2012-03-20, 1 file, -1/+2
  This can improve performance slightly on certain workloads where smbds frequently read from the same record.
  (This used to be ctdb commit 035c0d981bde8c0eee8b3f24ba8e2dc817e5b504)
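A minimal sketch of the consecutive-access counting described here, assuming hypothetical `laccessor`/`lacount` fields kept per record and a counter that restarts after each migration; it is not the reverted ctdb implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-record bookkeeping, not ctdb's header layout. */
struct la_state {
    uint32_t laccessor; /* node that made the last read request */
    uint32_t lacount;   /* consecutive requests from that node */
};

/*
 * Record a read request from `node`. Returns true once the same node has
 * asked max_lacount times in a row, i.e. when the record should finally be
 * migrated to that node instead of serving another read copy.
 */
static bool la_should_migrate(struct la_state *s, uint32_t node,
                              uint32_t max_lacount)
{
    if (s->lacount > 0 && s->laccessor == node) {
        s->lacount++;
    } else {
        /* A different node asked: restart the run of consecutive requests. */
        s->laccessor = node;
        s->lacount = 1;
    }
    if (s->lacount >= max_lacount) {
        s->lacount = 0; /* start counting afresh after migrating */
        return true;
    }
    return false;
}
```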
* FETCH COLLAPSE: Change the fetch-lock collapse to collapse ALL fetches, including fetch-locks, into a single command in flight per record.
  Ronnie Sahlberg, 2012-03-20, 1 file, -1/+2
  Also add a tunable to enable/disable this optimization for hot records.
  (This used to be ctdb commit eafd7bbaaa5931546a96c8beae3cf9a39a49c925)
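One way to picture "a single command in flight per record" is a small table of outstanding fetches: the first request for a record is sent, later ones are queued behind it and woken when the reply arrives. This is a hypothetical sketch; it keys on a 32-bit hash for brevity (a real implementation would compare full keys), and `fetch_collapse_enabled` stands in for the tunable.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_INFLIGHT 64

/* Hypothetical in-flight table keyed by a record-key hash. */
struct inflight_fetch {
    bool used;
    uint32_t key_hash;
    unsigned waiters; /* requests collapsed behind the one in flight */
};

static struct inflight_fetch inflight[MAX_INFLIGHT];
static bool fetch_collapse_enabled = true; /* stands in for the tunable */

/*
 * Returns true if the caller must send a fetch for this record, false if an
 * identical fetch is already in flight and the request was queued behind it.
 */
static bool fetch_request(uint32_t key_hash)
{
    struct inflight_fetch *free_slot = NULL;

    if (!fetch_collapse_enabled) {
        return true;
    }
    for (size_t i = 0; i < MAX_INFLIGHT; i++) {
        if (inflight[i].used && inflight[i].key_hash == key_hash) {
            inflight[i].waiters++;
            return false;
        }
        if (!inflight[i].used && free_slot == NULL) {
            free_slot = &inflight[i];
        }
    }
    if (free_slot != NULL) {
        free_slot->used = true;
        free_slot->key_hash = key_hash;
        free_slot->waiters = 0;
    }
    /* Send the fetch; if the table was full it is simply not tracked. */
    return true;
}

/* Called when the reply arrives; returns how many queued requests to wake. */
static unsigned fetch_complete(uint32_t key_hash)
{
    for (size_t i = 0; i < MAX_INFLIGHT; i++) {
        if (inflight[i].used && inflight[i].key_hash == key_hash) {
            unsigned w = inflight[i].waiters;
            inflight[i].used = false;
            return w;
        }
    }
    return 0;
}
```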
* Vacuuming: change default timeout to 120 seconds
  Ronnie Sahlberg, 2012-02-29, 1 file, -1/+1
  (This used to be ctdb commit 5ae94c6b9b3000a6c79fccaaea1e007ebd5be1a9)
* Add a tunable variable to control how long we defer after a ctdb addip until we force a rebalance and try to fail back addresses onto this node
  Ronnie Sahlberg, 2012-02-28, 1 file, -1/+2
  Have it default to 300 seconds.
  (This used to be ctdb commit 49791db7dc74cffd7e88bd73091590cdc1909328)
* tunables: don't list obsolete tunables in the list_tunables control
  Michael Adam, 2011-12-23, 1 file, -0/+3
  (This used to be ctdb commit d8ab86f0eb11437e50d18183858dd3177a8f61e6)
* tunables: add a bool obsolete flag to the tunable_map list
  Michael Adam, 2011-12-23, 1 file, -49/+50
  (This used to be ctdb commit 1a7d9b25fdcf7b59598618d406c2a681c90d9163)
* vacuum: add new tunable VacuumInterval and mark Vacuum{Default,Min,Max}Interval obsolete
  Michael Adam, 2011-12-23, 1 file, -3/+4
  And use VacuumInterval instead of VacuumDefaultInterval in the code.
  (This used to be ctdb commit 78530f40338f511a7cd1d33ada450905742bfa8f)
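A minimal sketch of how an `obsolete` flag in the tunable map lets a listing skip retired names. The struct and the five entries (named after tunables mentioned in these commits) are illustrative; this is not ctdb's actual `tunable_map` or control implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of a tunable map carrying the obsolete flag. */
struct tunable_map_entry {
    const char *name;
    bool obsolete;
};

static const struct tunable_map_entry tunable_map[] = {
    { "VacuumDefaultInterval", true },
    { "VacuumMinInterval",     true },
    { "VacuumMaxInterval",     true },
    { "VacuumInterval",        false },
    { "VacuumFastPathCount",   false },
};

/*
 * Fill `out` with the names of non-obsolete tunables, as a list_tunables
 * style control might. Returns the number of names written.
 */
static size_t list_tunables(const char **out, size_t max_out)
{
    size_t n = sizeof(tunable_map) / sizeof(tunable_map[0]);
    size_t count = 0;

    for (size_t i = 0; i < n && count < max_out; i++) {
        if (tunable_map[i].obsolete) {
            continue; /* skip obsolete entries */
        }
        out[count++] = tunable_map[i].name;
    }
    return count;
}
```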
* Recover Persistent database DB by DB and not record by record
  Ronnie Sahlberg, 2011-11-30, 1 file, -1/+2
  Add a new tunable that changes how persistent databases are recovered.
  RecoveryPDBBySeqNum: when set to 1, persistent databases will be recovered in whole from the node which has the highest "__db_sequence_number__" record. This record is managed by Samba for those databases where we do persistent writes and have inter-record relations. For these databases we do not want the usual "blend records from all nodes based on individual record RSN" but instead a mode where we pick one instance of the persistent database.
  If no node was found with a "__db_sequence_number__" record at all, we fall back to the original "recover records independently based on record RSN". Some persistent databases do not contain record interrelations and as such do not contain this special record at all.
  (This used to be ctdb commit 502150c764298a9fa8c4d8aa445bf7d85d4ee9dc)
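The source-node choice for RecoveryPDBBySeqNum can be sketched as below, assuming each node reports whether its copy contains `__db_sequence_number__` and, if so, its value; the struct and function are hypothetical. Ties go to the first node found, which is also an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-node summary of one persistent database. */
struct node_db_info {
    bool has_seqnum; /* copy contains __db_sequence_number__ */
    uint64_t seqnum; /* its value, valid only if has_seqnum */
};

/*
 * Returns the index of the node to take the whole database from, or -1
 * meaning "no node has the record: fall back to per-record RSN recovery".
 */
static int pick_recovery_source(const struct node_db_info *nodes, size_t n)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (!nodes[i].has_seqnum) {
            continue;
        }
        if (best < 0 || nodes[i].seqnum > nodes[best].seqnum) {
            best = (int)i;
        }
    }
    return best;
}
```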
* Add a tunable "AllowClientDBAttach" with default value 1.
  Michael Adam, 2011-09-05, 1 file, -1/+2
  When set to 0, clients will not be able to attach to databases via the db_attach control. This can be useful for maintenance where ctdb should be kept running but clients should not be able to modify databases.
  (This used to be ctdb commit ddfeecda87955b4e46777599f678e6926d37f4c4)
* Change the default for ip failover to be LCP2 and not DeterministicIPs
  Ronnie Sahlberg, 2011-08-15, 1 file, -2/+2
  (This used to be ctdb commit 038916248a73d6a250108c9235c0c4f76dba8e0c)
* IP allocation - add LCP2 algorithm.
  Martin Schwenke, 2011-07-29, 1 file, -0/+1
  The current non-deterministic IP allocation algorithm balances IPs across the whole cluster. It does not consider different interfaces/VLANs/subnets, so these different groups of IPs aren't generally well balanced.
  This adds the LCP2 algorithm for IP allocation and allows it to be enabled by setting the "LCP2PublicIPs" tunable to 1.
  The LCP2 algorithm calculates the imbalance of a node by totalling the squares of the distances between each pair of IPs on the node. The IP distance is defined as the length of the longest common prefix (LCP) of bits found when comparing 2 IPs. The imbalance of a cluster is the maximum imbalance of any node. At each step the algorithm chooses the IP/node allocation that best reduces the imbalance of the cluster.
  The implementation splits the IP allocation part of ctdb_takeover_run() out into a new function, ctdb_takeover_run_core(), and then extracts the basic IP assignment code into new functions basic_allocate_unassigned() and basic_failback(). 3 new functions, lcp2_init(), lcp2_allocate_unassigned() and lcp2_failback(), implement the LCP2 algorithm and are hooked into ctdb_takeover_run_core().
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  (This used to be ctdb commit 61fc7fbd0235469df22deb6581c6bd47e30bc0be)
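A minimal sketch of the LCP2 metric for IPv4 addresses stored as host-order `uint32_t`. Assumptions: each pair of IPs on a node is counted once, `owner[i]` holds the node index for `ips[i]`, and `best_node_for()` is a simplified greedy step that picks the node whose imbalance grows least; the commit's `lcp2_*()` functions also cover failback, which this sketch omits.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of leading bits two IPv4 addresses share (0..32). */
static uint32_t lcp_distance(uint32_t a, uint32_t b)
{
    uint32_t x = a ^ b;
    uint32_t d = 0;

    while (d < 32 && (x & 0x80000000u) == 0) {
        x <<= 1;
        d++;
    }
    return d;
}

/* Sum of squared distances over each pair of IPs owned by `node`. */
static uint64_t node_imbalance(const uint32_t *ips, const int *owner,
                               size_t n, int node)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        if (owner[i] != node) {
            continue;
        }
        for (size_t j = i + 1; j < n; j++) {
            if (owner[j] == node) {
                uint64_t d = lcp_distance(ips[i], ips[j]);
                sum += d * d;
            }
        }
    }
    return sum;
}

/* Cluster imbalance is the worst node imbalance. */
static uint64_t cluster_imbalance(const uint32_t *ips, const int *owner,
                                  size_t n, int num_nodes)
{
    uint64_t worst = 0;

    for (int node = 0; node < num_nodes; node++) {
        uint64_t m = node_imbalance(ips, owner, n, node);
        if (m > worst) {
            worst = m;
        }
    }
    return worst;
}

/* Simplified greedy step: node whose imbalance grows least if it takes `ip`. */
static int best_node_for(uint32_t ip, const uint32_t *ips, const int *owner,
                         size_t n, int num_nodes)
{
    int best = -1;
    uint64_t best_cost = UINT64_MAX;

    for (int node = 0; node < num_nodes; node++) {
        uint64_t cost = 0;
        for (size_t i = 0; i < n; i++) {
            if (owner[i] == node) {
                uint64_t d = lcp_distance(ip, ips[i]);
                cost += d * d;
            }
        }
        if (cost < best_cost) {
            best_cost = cost;
            best = node;
        }
    }
    return best;
}
```

For example, 10.0.0.1 and 10.0.0.2 share a 30-bit prefix, so placing both on one node costs 30 squared, or 900, while splitting them across two nodes costs 0.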
* vacuum: change all Vacuum*Interval tunables to default to 10
  Michael Adam, 2011-03-14, 1 file, -3/+3
  So, by default we have fast-path vacuuming every 10 seconds and full-blown db-traverse vacuuming once every 10 minutes.
  (This used to be ctdb commit 4f0ace982dbb5b4f9c035dbf4cb0ae74cd18d81b)
* Add a tunable VacuumFastPathCount.
  Michael Adam, 2011-03-14, 1 file, -0/+1
  This will control how many fast-path vacuuming runs have to be done before a full vacuuming run, i.e. one with a db-traversal, is triggered.
  (This used to be ctdb commit 0d997ec7e61a7bee2cb05456f9c7d5e6f7a44797)
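The run-selection logic can be sketched with a simple counter. The struct and function are hypothetical, and the count of 60 used in the example is an assumption chosen so that, with a 10-second interval, a full traverse happens roughly every 10 minutes as the entry above describes.

```c
#include <stdbool.h>

/* Hypothetical per-database vacuuming state. */
struct vacuum_state {
    unsigned fast_path_runs; /* fast-path runs since the last full run */
};

/*
 * Decide the kind of the next vacuuming run. Returns true when it should
 * be a full run (with a db traverse), false for a fast-path run.
 */
static bool vacuum_run_is_full(struct vacuum_state *s, unsigned fast_path_count)
{
    if (s->fast_path_runs >= fast_path_count) {
        s->fast_path_runs = 0;
        return true;
    }
    s->fast_path_runs++;
    return false;
}
```

With a count of 60, runs 1 to 60 take the fast path and run 61 is a full traverse, after which the cycle repeats.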
* Deferred attach: at early startup, defer any db attach calls until we are out of recovery.
  Ronnie Sahlberg, 2011-03-01, 1 file, -1/+2
  (This used to be ctdb commit eeaabd579841f60ab2c5b004cbbb1f5de2bfe685)
* Remove LACOUNT and LACCESSOR and migrate the records immediately.
  Ronnie Sahlberg, 2011-02-18, 1 file, -1/+0
  This concept didn't work out, and it is really just as expensive as a full migration anyway, without the benefit of caching the data for subsequent accesses.
  Now, migrate the records immediately on first access. This will be combined with a "cheap vacuum-lite" for special empty records to prevent growth of databases.
  Later extensions to mimic read-only behaviour of records will include proper shared read-only locking of database records, making the laccessor/lacount read-only access to the data obsolete anyway.
  Removing this special-case handling of lacount/laccessor simplifies the code path where shared read-only locking will be implemented, and frees up space in the ctdb_ltdb header for use by vacuuming flags as well as read-only locking flags.
  (This used to be ctdb commit 155dd1f4885fe142c6f8bd09430f65daf8a17e51)
* change the takeover script timeout to 9 seconds from 5
  Ronnie Sahlberg, 2010-11-10, 1 file, -1/+1
  (This used to be ctdb commit cd09c3f8fd9700261f77779aee9cf71dbd4e441e)
* Add a new tunable: DisableIPFailover, which when set to non-zero will stop any IP reallocations from happening at all.
  Ronnie Sahlberg, 2010-11-10, 1 file, -0/+1
  (This used to be ctdb commit d8d37493478a26c5f1809a5f3df89ffd6e149281)
* change the default for how long to wait before dropping all ips to 120 seconds
  Ronnie Sahlberg, 2010-11-10, 1 file, -1/+1
  (This used to be ctdb commit e5f03346133157734b4759d43c3ab8203028d5c2)
* Update the default hash size to be 100001 instead of 10000
  Ronnie Sahlberg, 2010-10-11, 1 file, -1/+1
  This can sometimes improve performance for environments where very many files are touched in rapid succession.
  (This used to be ctdb commit 15455a13863105a87d2cae9f06eed7435898c30b)
* Create a tunable for how often to collect rolling statistics and initialize it to 1 second
  Ronnie Sahlberg, 2010-09-30, 1 file, -1/+2
  (This used to be ctdb commit cb8c779bb5d9862abbe08919aa181a1a1b2bef18)
* We only queued up to 1000 packets per queue before we started dropping packets, to avoid the queue growing excessively if smbd has blocked.
  Ronnie Sahlberg, 2010-02-04, 1 file, -1/+1
  This could cause traverse packets to be discarded if the main smbd daemon traverses a database while there is a recovery (sending a reconfigured message to smbd, causing an avalanche of unlock messages to be sent across the cluster).
  This avalanche of messages could also cause the traversal message to be discarded, causing the main smbd process to hang indefinitely waiting for a traversal message that will never arrive.
  Bump the maximum queue length before starting to discard messages from 1000 to 1000000, and at the same time rework the queueing slightly so we can append messages cheaply to the queue instead of walking the list from head to tail every time.
  (This used to be ctdb commit 59ba5d7f80e0465e5076533374fb9ee862ed7bb6)
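A minimal sketch of the reworked queue: a tail pointer makes each append O(1), and a length cap discards new packets once the limit is reached. The `pkt` and `pkt_queue` types are hypothetical; in this sketch the queue takes ownership of a discarded packet and frees it, so packets must come from `malloc()`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical queue types; ctdb's real packet queue differs. */
struct pkt {
    struct pkt *next;
    size_t len;
};

struct pkt_queue {
    struct pkt *head;
    struct pkt *tail;  /* lets us append in O(1) instead of walking */
    size_t length;
    size_t max_length; /* e.g. 1000000 after this change */
};

/* Append at the tail; returns false (and frees p) if the queue is full. */
static bool queue_append(struct pkt_queue *q, struct pkt *p)
{
    if (q->length >= q->max_length) {
        free(p); /* discard rather than let the queue grow without bound */
        return false;
    }
    p->next = NULL;
    if (q->tail != NULL) {
        q->tail->next = p;
    } else {
        q->head = p;
    }
    q->tail = p;
    q->length++;
    return true;
}

/* Remove and return the head packet, or NULL if the queue is empty. */
static struct pkt *queue_pop(struct pkt_queue *q)
{
    struct pkt *p = q->head;

    if (p == NULL) {
        return NULL;
    }
    q->head = p->next;
    if (q->head == NULL) {
        q->tail = NULL;
    }
    q->length--;
    return p;
}
```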
* server: Use tdb_check to verify persistent tdbs on startup
  Stefan Metzmacher, 2009-12-16, 1 file, -1/+2
  Depending on --max-persistent-check-errors, we allow ctdb to start with unhealthy persistent databases. The default is 0, which means a startup with unhealthy dbs is rejected.
  The health of the persistent databases is checked after each recovery. Node monitoring and the "startup" are deferred until all persistent databases are healthy. Databases can become healthy automatically when a completely HEALTHY node joins the cluster, or through an administrator using "ctdb backupdb/restoredb" or "ctdb wipedb".
  metze
  (This used to be ctdb commit 15f133d5150ed1badb4fef7d644f10cd08a25cb5)
* Revert "cleanup: remove a tunable we no longer use in the eventscripts any more:"
  Ronnie Sahlberg, 2009-12-16, 1 file, -0/+1
  This reverts commit 401f421fa003d9515df15e759b50b56e0c67d69c.
  Conflicts: include/ctdb_private.h server/ctdb_tunables.c
  (This used to be ctdb commit b883d19a495a41a22db37f9c2cf6250fee529de0)
* Rename the tunable EventScriptBanCount to EventScriptTimeoutCount
  Ronnie Sahlberg, 2009-12-14, 1 file, -1/+1
  since we no longer ban nodes when dodgy scripts continue to hang. We now only mark nodes as unhealthy if monitor events fail or time out. Never ban.
  (This used to be ctdb commit 5c8e56fc7a518e115bceac257867739283cf6a1e)
* cleanup: remove a tunable we no longer use in the eventscripts any more:
  Ronnie Sahlberg, 2009-12-14, 1 file, -1/+0
  EventScriptUnhealthyOnTimeout
  (This used to be ctdb commit 401f421fa003d9515df15e759b50b56e0c67d69c)
* remove the variable "disable when unhealthy"
  Ronnie Sahlberg, 2009-12-14, 1 file, -1/+0
  there is no rational need for a setting where we permanently mark nodes as disabled every time an eventscript fails
  (This used to be ctdb commit 68a8ee99b128a5ec883600735626bdb3bbc9c503)
* test of a change to make ctdbd use "status" event instead of the "monitor" event.
  Ronnie Sahlberg, 2009-11-13, 1 file, -1/+2
  This allows running the actual monitoring asynchronously from ctdbd and only using "status" to pick up the actual results.
  (This used to be ctdb commit 1908bac812650ca25151051f5d86815e0b8ed319)
* Enhance the logging from eventscripts.
  Ronnie Sahlberg, 2009-10-28, 1 file, -1/+1
  When a single script is finished, also log the name of the script, the duration it took and the return status.
  In the loop where we signal back to the main daemon that the script finished, do this once every 100ms instead of once every 1 second.
  (This used to be ctdb commit 6a1f7a7b1b3a0b8f89998db8fdad83bbb4e9b5a5)
* set the eventscripts to time out after 20 seconds
  Ronnie Sahlberg, 2009-10-23, 1 file, -2/+2
  change the ban count to 10 failures before we ban by default
  (This used to be ctdb commit 38d7487bc68c8cf85980004aceeef24ae32d6f36)
* When clients have blocked, perhaps because the node is banned or stopped and the client is blocked trying to tdb_fetch() a record, make sure we don't queue up too many REQ_MESSAGES.
  Ronnie Sahlberg, 2009-10-21, 1 file, -1/+2
  Add a new tunable to control the maximum queue size we allow to a blocked client before we start discarding REQ_MESSAGES instead of queueing them for delivery.
  This avoids queueing up a very large number of the MESSAGES that Samba sends between nodes when nodes are blocked/banned/stopped for extended periods.
  (This used to be ctdb commit f76d6fed8f9630450263b9fa4b5fdf3493fb1e11)
* From Wolfgang Mueller
  Ronnie Sahlberg, 2009-10-20, 1 file, -0/+1
  Add a tunable so that when scripts start to hang/time out, we can make the node unhealthy instead of banned
  (This used to be ctdb commit 2e9fc6f0609833c6d8146196011ef780669d615d)
* add more debugging output to eventscripts and when a script has timed out, print a full "pstree -p" to the log.
  Ronnie Sahlberg, 2009-10-14, 1 file, -1/+1
  Example:
    |-ctdbd(29826)-+-ctdbd(29862)
    |              `-ctdbd(31897)-+-00.ctdb(31898)---sleep(31908)
  change the default timeout to 60 seconds for eventscripts
  (This used to be ctdb commit a3406c10d70f89d332eab25d481083142dff987d)
* From Wolfgang Mueller-Friedt
  Ronnie Sahlberg, 2009-09-29, 1 file, -1/+4
  Remove the explicit vacuum/repack commands from the 00.ctdb eventscript and implement this in the ctdb daemon.
  Combine vacuuming and repacking into one cheap read traverse to enumerate all candidate records and one write traverse that both repacks the database and also deletes the records locally where we are lmaster and where the records have already been deleted remotely.
  This code also adds initial autotuning heuristics for the vacuum intervals and how many records to delete in each iteration.
  minor stylistic changes made by ronnie s
  (This used to be ctdb commit 95a3ee551241aa164967991fe5efe078e1714bde)
* change the defaults for repacking to repack once every 120 seconds and let it work for 30 seconds before timing out.
  Ronnie Sahlberg, 2009-07-29, 1 file, -2/+2
  (This used to be ctdb commit 2aa5d18bb42dca4ef9cb049b4fa9d7bc999ce4ad)