| Commit message | Author | Age | Files | Lines |
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit ca5fc3431573c44d55d09d987c715fb53756fc1f)
Use sequence numbers to do recovery for persistent databases instead of
RSNs. This fixes the problem of registry corruption during recovery.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 56486d1c01cc8ad0e4b8cee7a22429e72e50f03d)
fetched/read copy until after default of 20 consecutive requests from the same node"
This reverts commit 035c0d981bde8c0eee8b3f24ba8e2dc817e5b504.
This is a premature optimization. Record can bounce between nodes
very quickly if it is a contended record. There is no need to hold a
record on a node unnecessarily. In case record contention becomes bad,
enabling sticky records on a database is a better idea.
Conflicts:
include/ctdb_private.h
server/ctdb_tunables.c
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit ac417b0003f0116f116834ad2ac51482d25cfa0d)
The code for deadlock detection and for killing the smbd process causing the
deadlock has been removed and replaced with an external debug script.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 2211cd94bea266547d3e6f167d3160a6b23bec88)
Otherwise callers can't tell the difference between some other failure
(e.g. memory allocation failure) and an unknown tunable.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 03fd90d41f9cd9b8c42dc6b8b8d46ae19101a544)
This really needs to be per-node. The rename is because nodes with
this tunable switched on should drop IPs if they become unhealthy (or
disabled in some other way).
* Add new flag NODE_FLAGS_NOIPHOST, only used in recovery daemon.
* Enhance set_ipflags_internal() and set_ipflags() to set up
NODE_FLAGS_NOIPHOST depending on the setting of NoIPHostOnAllDisabled
and/or whether nodes are disabled/inactive.
* Replace can_node_servce_ip() with functions can_node_host_ip() and
can_node_takeover_ip(). These functions are the only ones that need
to look at NODE_FLAGS_NOIPTAKEOVER and NODE_FLAGS_NOIPHOST. They
can make the decision without looking at any other flags due to
previous setup.
* Remove explicit flag checking in IP allocation functions (including
unassign_unsuitable_ips()) and just call can_node_host_ip() and
can_node_takeover_ip() as appropriate.
* Update test code to handle CTDB_SET_NoIPHostOnAllDisabled.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 1308a51f73f2e29ba4dbebb6111d9309a89732cc)
In 1f262deaad0818f159f9c68330f7fec121679023, Ronnie changed recovery code
to allocate chunks of 10MB in traverse_pulldb() and traverse_recdb(). The
tunable PullDBPreallocation size was set to 100MB.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit e204fac03412520e877ab04363b3ece02667c55b)
Samba versions 3.6.x and older do not set the database priority.
This can cause a deadlock between Samba and CTDB since the locking order
of databases will be different. A hack was added for automatic promotion
of priority for specific databases to avoid deadlock. This code should
not be invoked with Samba version 4.x which correctly specifies the
priority for each database.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit 4a9e96ad3d8fc46da1cd44cd82309c1b54301eb7)
This introduces a consistent API for handling locks on a single record, a complete
db or all dbs. The locks are taken out in a child process. In case of a timeout,
the processes that currently hold the lock are found and logged.
Callback functions for locking requests take a locked boolean to indicate
whether the lock was successfully obtained or not.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 1af99cf0de9919dd89af1feab6d1bd18b95d82ff)
Stops the behaviour where unhealthy nodes can host IPs when there are
no healthy nodes. Set this to 1 when an immediate complete outage is
preferred when all nodes are unhealthy. The alternative
(i.e. default) can lead to undefined behaviour when the shared
filesystem is unavailable.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit a555940fb5c914b7581667a05153256ad7d17774)
If the system is temporarily taking unusually long to perform these tasks, it is better to wait a lot longer and allow the tasks to complete than to time out repeatedly and then become banned.
(This used to be ctdb commit 03fa2a517247eb2adfba67248e2466f17ea14418)
databuffer for each entry added. This would normally not be an issue, but for cases where memory is fragmented, this could start to cost significant cpu if we need to reallocate and move to a different region.
Change this to instead preallocate, by default, 10MByte chunks for the data buffer.
This significantly reduces the number of potential reallocate and move operations that may be required.
Create a tunable to override/change how much preallocation should be used.
(This used to be ctdb commit 1f262deaad0818f159f9c68330f7fec121679023)
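The effect of chunked preallocation described in the entry above can be illustrated with a small Python model. This is only a sketch, not the ctdb code: the class name `PreallocBuffer`, its `chunk` parameter and the `reallocs` counter are hypothetical names introduced here, and the counter merely models how often a real allocator might have to reallocate and move the buffer.

```python
class PreallocBuffer:
    """Toy model of a data buffer that grows in preallocated chunks.

    All names here are hypothetical; this only models the idea of
    growing by a chunk instead of growing once per appended entry.
    """

    def __init__(self, chunk):
        self.chunk = chunk        # extra bytes reserved on each growth
        self.data = bytearray()   # backing storage, len(data) == capacity
        self.length = 0           # bytes actually in use
        self.reallocs = 0         # number of times the buffer had to grow

    def append(self, record):
        needed = self.length + len(record)
        if needed > len(self.data):
            # Grow to what is needed plus one chunk of headroom.
            self.data.extend(bytes(needed + self.chunk - len(self.data)))
            self.reallocs += 1
        self.data[self.length:needed] = record
        self.length = needed

    def contents(self):
        return bytes(self.data[:self.length])


def count_reallocs(chunk, entries=100, size=10):
    buf = PreallocBuffer(chunk)
    for _ in range(entries):
        buf.append(b"x" * size)
    return buf.reallocs
```

With `chunk=0` the model grows once per entry, which corresponds to the behaviour the commit replaces; with a chunk larger than the total data it grows only once.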
very many records, 2) when a database is very big, 3) when a single record is very big.
Add tunables to control when to log these instances and allow it to be completely turned off by setting the threshold to 0
(This used to be ctdb commit 9ed58fef4991725f75509433496f4d5ffae0ae87)
debug why the script hung.
Break this debug and data collection out into an external script to make it easier to modify what data we need to collect.
For now we only collect a pstree so we can see what part of the script we hung in.
S1037271
(This used to be ctdb commit 6e68797af67bee36f2bad045f94806e7e98f27e9)
over onto the node" to NoIPTakeover
(This used to be ctdb commit 35592e618cfd827b6978af6332f80504f232c46a)
they are found to be very hot and accessed by a lot of clients.
This can improve performance and stop clients from having to chase a rapidly migrating/bouncing record
(This used to be ctdb commit d0d98f7e45e5084b81335b004d50bddc80cdc219)
until after default of 20 consecutive requests from the same node
This can improve performance slightly on certain workloads where smbds frequently read from the same record
(This used to be ctdb commit 035c0d981bde8c0eee8b3f24ba8e2dc817e5b504)
including fetch-locks into a single command in flight per record. Also add a tunable to enable/disable this optimization for hot records
(This used to be ctdb commit eafd7bbaaa5931546a96c8beae3cf9a39a49c925)
(This used to be ctdb commit 5ae94c6b9b3000a6c79fccaaea1e007ebd5be1a9)
we force a rebalance and try to failback addresses onto this node
Have it default to 300 seconds.
(This used to be ctdb commit 49791db7dc74cffd7e88bd73091590cdc1909328)
(This used to be ctdb commit d8ab86f0eb11437e50d18183858dd3177a8f61e6)
(This used to be ctdb commit 1a7d9b25fdcf7b59598618d406c2a681c90d9163)
Vacuum{Default,Min,Max}Interval obsolete
And use VacuumInterval instead of VacuumDefaultInterval in the code.
(This used to be ctdb commit 78530f40338f511a7cd1d33ada450905742bfa8f)
Add a new tunable that changes the mode how persistent databases are recovered.
RecoveryPDBBySeqNum
When set to 1, persistent databases will be recovered in whole from the node which
has the highest "__db_sequence_number__" record.
This record is managed by samba for those databases where we do persistent writes and have
inter-record relations.
For these databases we do not want the usual "blend records from all nodes based
on individual record RSN" but instead a mode where we pick one instance of the persistent database.
If no node was found with a "__db_sequence_number__" record at all, we fall back to the original "recover records independently based on record RSN".
Some persistent databases do not contain record interrelations and as such do not
contain this special record at all.
(This used to be ctdb commit 502150c764298a9fa8c4d8aa445bf7d85d4ee9dc)
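The recovery mode described in the RecoveryPDBBySeqNum entry above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the ctdb implementation: each node's copy of the database is represented as a dict mapping a key to an `(rsn, value)` tuple, the sequence number is stored as a plain integer value, and the function name `recover_persistent_db` is hypothetical.

```python
SEQNUM_KEY = "__db_sequence_number__"


def recover_persistent_db(copies):
    """Pick the recovered contents of one persistent database.

    copies: dict node_id -> dict key -> (rsn, value)   (assumed layout)
    """
    # Nodes whose copy carries the special sequence-number record.
    seqnums = {node: db[SEQNUM_KEY][1]
               for node, db in copies.items() if SEQNUM_KEY in db}

    if seqnums:
        # Take the whole database from the node with the highest
        # sequence number; records are not blended across nodes.
        best = max(sorted(seqnums), key=lambda node: seqnums[node])
        return dict(copies[best])

    # Fallback: recover each record independently, keeping the copy
    # with the highest RSN.
    merged = {}
    for node in sorted(copies):
        for key, (rsn, value) in copies[node].items():
            if key not in merged or rsn > merged[key][0]:
                merged[key] = (rsn, value)
    return merged
```

In the first branch a record with a higher RSN on another node is deliberately ignored, which is the point of the mode for databases with inter-record relations.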
When set to 0, clients will not be able to attach to databases
via the db_attach control. This can be useful for maintenance
where ctdb should be kept running but clients should not be able
to modify databases.
(This used to be ctdb commit ddfeecda87955b4e46777599f678e6926d37f4c4)
(This used to be ctdb commit 038916248a73d6a250108c9235c0c4f76dba8e0c)
The current non-deterministic IP allocation algorithm balances IPs
across the whole cluster. It does not consider different
interfaces/VLANs/subnets, so these different groups of IPs aren't
generally well balanced.
This adds the LCP2 algorithm for IP allocation and allows it to be
enabled by setting the "LCP2PublicIPs" tunable to 1.
The LCP2 algorithm calculates the imbalance of a node by totalling the
squares of the distances between each IP on the node. The IP distance
is defined as the length of the longest common prefix (LCP) of bits that is
found when comparing 2 IPs. The imbalance of a cluster is the maximum
imbalance for any node. At each step the algorithm selects the IP/node
combination whose allocation best reduces the imbalance of the cluster.
The implementation splits out the IP allocation part of
ctdb_takeover_run() into new function ctdb_takeover_run_core(), and
then extracts out the basic IP assignment code into new functions
basic_allocate_unassigned() and basic_failback(). 3 new functions
lcp2_init(), lcp2_allocate_unassigned() and lcp2_failback() implement
the LCP2 algorithm, and are hooked into ctdb_takeover_run_core().
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 61fc7fbd0235469df22deb6581c6bd47e30bc0be)
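The imbalance metric described in the LCP2 entry above can be illustrated with a short Python sketch. It is not the code from ctdb_takeover_run_core(): the function names are hypothetical, the sum is assumed to run over unordered pairs of IPs on a node, and the greedy step is reduced to placing a single unassigned IP.

```python
import ipaddress
from itertools import combinations


def ip_distance(a, b):
    """Length in bits of the longest common prefix of two addresses."""
    x, y = ipaddress.ip_address(a), ipaddress.ip_address(b)
    if x.version != y.version:
        return 0  # assumption: mixed address families share no prefix
    diff = int(x) ^ int(y)
    return x.max_prefixlen - diff.bit_length()


def node_imbalance(ips):
    """Sum of squared distances between each pair of IPs on one node."""
    return sum(ip_distance(a, b) ** 2 for a, b in combinations(ips, 2))


def cluster_imbalance(alloc):
    """Maximum imbalance of any node; alloc maps node -> list of IPs."""
    return max((node_imbalance(ips) for ips in alloc.values()), default=0)


def best_node_for(ip, alloc):
    """Node that leaves the lowest cluster imbalance after taking ip."""
    def resulting(node):
        trial = {n: (ips + [ip] if n == node else ips)
                 for n, ips in alloc.items()}
        return cluster_imbalance(trial)
    return min(sorted(alloc), key=resulting)
```

Because IPs in the same subnet share a long prefix, putting them on the same node raises that node's imbalance sharply, so the greedy choice tends to spread each subnet across nodes.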
So, by default we have a fastpath vacuuming every 10 seconds and
full blown db-traverse vacuuming once every 10 minutes.
(This used to be ctdb commit 4f0ace982dbb5b4f9c035dbf4cb0ae74cd18d81b)
This will control how many fast-path vacuuming runs will have to
be done, before a full vacuuming will be triggered, i.e. one with
a db-traversal.
(This used to be ctdb commit 0d997ec7e61a7bee2cb05456f9c7d5e6f7a44797)
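The interplay between fast-path runs and full vacuuming runs described in the entry above can be sketched as a simple counter. This is a hypothetical model only: the function name `vacuum_schedule` and the parameter `fast_path_count` are introduced here for illustration and are not taken from the ctdb source.

```python
def vacuum_schedule(runs, fast_path_count):
    """Return the kind of each vacuuming run, in order.

    After fast_path_count fast-path runs, the next run is a full
    vacuuming run (one with a db-traversal), then counting restarts.
    """
    kinds = []
    fast_done = 0
    for _ in range(runs):
        if fast_done >= fast_path_count:
            kinds.append("full")
            fast_done = 0
        else:
            kinds.append("fast")
            fast_done += 1
    return kinds
```

For example, with a count of 3 every fourth run in this model is a full run.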
out of recovery.
(This used to be ctdb commit eeaabd579841f60ab2c5b004cbbb1f5de2bfe685)
This concept didn't work out and it is really just as expensive as a full migration
anyway, without the benefit of caching the data for subsequent accesses.
Now, migrate the records immediately on first access.
This will be combined with a "cheap vacuum-lite" for special empty records to
prevent growth of databases.
Later extensions to mimic read-only behaviour of records will include proper shared read-only locking of database records, making the laccessor/lacount read-only access to the data obsolete anyway.
Removing this special case and the handling of lacount/laccessor makes the codepath where shared read-only locking will be implemented simpler, and frees up space in the ctdb_ltdb header for use by vacuuming flags as well as read-only locking flags.
(This used to be ctdb commit 155dd1f4885fe142c6f8bd09430f65daf8a17e51)
(This used to be ctdb commit cd09c3f8fd9700261f77779aee9cf71dbd4e441e)
will stop any ip reallocations at all from happening.
(This used to be ctdb commit d8d37493478a26c5f1809a5f3df89ffd6e149281)
(This used to be ctdb commit e5f03346133157734b4759d43c3ab8203028d5c2)
This can sometimes improve performance for environments where very many
files are touched in rapid succession
(This used to be ctdb commit 15455a13863105a87d2cae9f06eed7435898c30b)
it to 1 second
(This used to be ctdb commit cb8c779bb5d9862abbe08919aa181a1a1b2bef18)
packets, to keep the queue from growing excessively if smbd has blocked.
This could cause traverse packets to be discarded if the main
smbd daemon traverses a database while there is a recovery
(which sends a reconfigured message to smbd, causing an avalanche of unlock
messages to be sent across the cluster).
This avalanche of messages could also cause the traversal message to be
discarded, causing the main smbd process to hang indefinitely waiting
for a traversal message that will never arrive.
Bump the maximum queue length before starting to discard messages from
1000 to 1000000 and at the same time rework the queueing slightly so we
can append messages cheaply to the queue instead of walking the list
from head to tail every time.
(This used to be ctdb commit 59ba5d7f80e0465e5076533374fb9ee862ed7bb6)
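The queueing change described in the entry above (cheap appends plus a maximum length after which messages are discarded) can be sketched in Python. This is an illustration under assumptions, not the ctdb queue code: the class `BoundedQueue` and its attribute names are hypothetical, and the point is only that keeping a tail pointer makes an append constant-time instead of a walk from head to tail.

```python
class _Node:
    __slots__ = ("msg", "next")

    def __init__(self, msg):
        self.msg = msg
        self.next = None


class BoundedQueue:
    """Singly linked FIFO with a tail pointer and a length limit."""

    def __init__(self, max_len=1_000_000):
        self.max_len = max_len
        self.head = None
        self.tail = None
        self.length = 0
        self.dropped = 0  # messages discarded because the queue was full

    def append(self, msg):
        if self.length >= self.max_len:
            self.dropped += 1
            return False
        node = _Node(msg)
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node  # O(1): no walk from the head
        self.tail = node
        self.length += 1
        return True

    def pop(self):
        node = self.head
        if node is None:
            return None
        self.head = node.next
        if self.head is None:
            self.tail = None
        self.length -= 1
        return node.msg
```

The default limit of 1000000 mirrors the value named in the commit message; a small limit is convenient for experimenting with the discard behaviour.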
Depending on --max-persistent-check-errors we allow ctdb
to start with unhealthy persistent databases.
The default is 0 which means to reject a startup with
unhealthy dbs.
The health of the persistent databases is checked after each
recovery. Node monitoring and the "startup" is deferred
until all persistent databases are healthy.
Databases can become healthy automatically when a completely
HEALTHY node joins the cluster, or through an administrator
running "ctdb backupdb/restoredb" or "ctdb wipedb".
metze
(This used to be ctdb commit 15f133d5150ed1badb4fef7d644f10cd08a25cb5)
more :"
This reverts commit 401f421fa003d9515df15e759b50b56e0c67d69c.
Conflicts:
include/ctdb_private.h
server/ctdb_tunables.c
(This used to be ctdb commit b883d19a495a41a22db37f9c2cf6250fee529de0)
since we no longer ban nodes when dodgy scripts continue to hang.
We now only mark nodes as unhealthy if monitor events fail or time out. Never ban.
(This used to be ctdb commit 5c8e56fc7a518e115bceac257867739283cf6a1e)
EventScriptUnhealthyOnTimeout
(This used to be ctdb commit 401f421fa003d9515df15e759b50b56e0c67d69c)
there is no rational need for a setting where we permanently mark nodes as disabled every time an eventscript fails
(This used to be ctdb commit 68a8ee99b128a5ec883600735626bdb3bbc9c503)
event.
This allows running the actual monitoring asynchronously from ctdbd
and only using "status" to pick up the actual results.
(This used to be ctdb commit 1908bac812650ca25151051f5d86815e0b8ed319)
When a single script is finished, also log the name of the script, the duration it took and the return status.
In the loop where we signal back to the main daemon that the script finished, do this once every 100ms instead of once every 1 second
(This used to be ctdb commit 6a1f7a7b1b3a0b8f89998db8fdad83bbb4e9b5a5)
change the ban count to 10 failures before we ban by default
(This used to be ctdb commit 38d7487bc68c8cf85980004aceeef24ae32d6f36)
and the client is blocked trying to tdb_fetch() a record, make sure we don't queue up too many REQ_MESSAGES.
Add a new tunable to control the maximum queue size we allow to a blocked client before we start discarding REQ_MESSAGES instead of queueing them for delivery.
This avoids queueing up a very large number of MESSAGES that samba processes send
each other to nodes that are blocked/banned/stopped for extended periods.
(This used to be ctdb commit f76d6fed8f9630450263b9fa4b5fdf3493fb1e11)
Add a tunable so that when scripts start to hang/time out, we can make the node unhealthy instead of banned
(This used to be ctdb commit 2e9fc6f0609833c6d8146196011ef780669d615d)
print a full "pstree -p" to the log.
Example :
|-ctdbd(29826)-+-ctdbd(29862)
| `-ctdbd(31897)-+-00.ctdb(31898)---sleep(31908)
change the default timeout to 60 seconds for eventscripts
(This used to be ctdb commit a3406c10d70f89d332eab25d481083142dff987d)
Remove the explicit vacuum/repack commands from the 00.ctdb eventscript
and implement this in the ctdb daemon.
Combine vacuuming and repacking into one
cheap read traverse to enumerate all candidate records
and one write traverse that both repacks the database and deletes records locally where we are lmaster and where the records have already been deleted remotely.
This code also adds initial autotuning heuristics for the vacuum intervals and for how many records to delete in each iteration.
Minor stylistic changes made by ronnie s
(This used to be ctdb commit 95a3ee551241aa164967991fe5efe078e1714bde)
letting it work for 30 seconds before timing out.
(This used to be ctdb commit 2aa5d18bb42dca4ef9cb049b4fa9d7bc999ce4ad)