path: root/ctdb/server
Commit message | Author | Age | Files | Lines
...
* ctdb-vacuum: fix treatment of remaining records and statistics in delete_record_traverse()Michael Adam2014-03-061-24/+16
| | | | | | | Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* ctdb-vacuum: cast freelist_size in comparison.Michael Adam2014-03-061-1/+2
| | | | | | | At this point, it is >= 0 anyways. Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* ctdb-vacuum: improve output of delete list statisticsMichael Adam2014-03-061-5/+5
| | | | | Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* ctdb-daemon: Do not support connection tracking if there are no public IPsAmitay Isaacs2014-03-041-0/+15
| | | | | | | | | | | | CTDB tracks connections to be able to send tickle ACKs and gratuitous ARPs. When there are no public IPs, there is no need for tickle ACKs and gratuitous ARPs. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> Autobuild-User(master): Martin Schwenke <martins@samba.org> Autobuild-Date(master): Tue Mar 4 03:01:38 CET 2014 on sn-devel-104
* ctdb-recoverd: Check if callback function is registered before callingAmitay Isaacs2014-02-271-2/+7
| | | | | | | | | | Fix suggested by Kevin Osborn <kosborn@overlandstorage.com>. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> Autobuild-User(master): Martin Schwenke <martins@samba.org> Autobuild-Date(master): Thu Feb 27 13:54:59 CET 2014 on sn-devel-104
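As a rough illustration of the guard this commit describes, the sketch below checks that a callback pointer is set before invoking it. The names `request_state`, `notify_done` and `record_status` are hypothetical and are not taken from the ctdb source.

```c
#include <stddef.h>

/* Hypothetical callback type; not the actual recoverd signature. */
typedef void (*done_fn)(int status, void *private_data);

struct request_state {
    done_fn callback;      /* may be NULL if the caller never registered one */
    void *private_data;
};

/* Example callback: stores the status in the int that private_data points to. */
void record_status(int status, void *private_data)
{
    *(int *)private_data = status;
}

/* Invoke the callback only if one is registered.
 * Returns 1 if it was called, 0 if it was skipped. */
int notify_done(const struct request_state *state, int status)
{
    if (state == NULL || state->callback == NULL) {
        return 0;
    }
    state->callback(status, state->private_data);
    return 1;
}
```

Without the NULL check, a request that never registered a callback would lead to a call through a null function pointer.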
* ctdb-daemon: After updating tickles on other nodes, set update flag to falseAmitay Isaacs2014-02-271-0/+2
| | | | | | | | | | | tcp_update_flag is set to true whenever tickles are added or deleted. This flag is used to determine whether or not to send tickles list to other nodes. Once tickles list is sent to other nodes successfully, set tcp_update_flag to false, so ctdbd does not keep sending same tickles list every TickleUpdateInterval (20 seconds). Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>
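The flag handling described here can be sketched as a dirty flag that is cleared only after a successful send. This is a minimal illustration; `vnn_sketch`, `periodic_tickle_update` and the two stand-in send functions are assumptions made for the example, not ctdb code.

```c
#include <stdbool.h>

struct vnn_sketch {
    bool tcp_update_needed;  /* set whenever a tickle is added or deleted */
    int sends;               /* counts send attempts, for demonstration only */
};

/* Stand-in send functions: return 0 on success, -1 on failure. */
int send_ok(struct vnn_sketch *vnn)   { vnn->sends++; return 0; }
int send_fail(struct vnn_sketch *vnn) { vnn->sends++; return -1; }

typedef int (*send_fn)(struct vnn_sketch *vnn);

/* Called once per update interval. Sends the list only when it changed,
 * and clears the flag only when the send succeeded. */
void periodic_tickle_update(struct vnn_sketch *vnn, send_fn send)
{
    if (!vnn->tcp_update_needed) {
        return;  /* nothing changed since the last successful send */
    }
    if (send(vnn) == 0) {
        vnn->tcp_update_needed = false;
    }
}
```

If the flag were never cleared, every interval would resend the same list, which is the behaviour the commit removes.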
* ctdb-daemon: Implement ctdb_control_startup()Martin Schwenke2014-02-271-4/+12
| | | | | | | | | | This doesn't implement what was recommended. That would require careful error handling, probably with a fallback to this code anyway. This is simple and does no worse than the current code. That is, the new node is updated on the next call to tdb_update_tcp_tickles(). Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* ctdb-daemon: Fix whitespacesAmitay Isaacs2014-02-271-4/+4
| | | | | Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>
* ctdb-daemon: Always talloc tickle array off vnn instead of ctdb->nodesAmitay Isaacs2014-02-271-8/+5
| | | | | | | | This fixes ctdb crash reported in bug #10366. Fix suggested by Kevin Osborn <kosborn@overlandstorage.com>. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>
* ctdb-recoverd: LCP2 cleanupsMartin Schwenke2014-02-191-7/+7
| | | | | | | | | | | | | | | | * Remove unnecessary candimbl parameter. This parameter can be cheaply calculated in lcp2_failback_candidate(). The compiler will probably do an excellent job optimising it. :-) * Clarify a debug statement This is much clearer than doing a complex recalculation of a known value. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* ctdb-recoverd: Optimise check for rebalance candidates in LCP2Martin Schwenke2014-02-191-16/+17
| | | | | | | | | Currently this can be checked many times. However, there's no point calling the rebalance/failback code at all if there are no rebalance candidates. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* ctdb:vacuum: move retrieval of freelist to after vacuum runMichael Adam2014-02-141-6/+7
| | | | | | | | | | The fast vacuum run may have increased the freelist size. Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Fri Feb 14 03:15:30 CET 2014 on sn-devel-104
* ctdb:vacuum: fix debug message typo in add_record_to_delete_list()Michael Adam2014-02-141-1/+1
| | | | | Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* ctdb-recoverd: Fix a bug in the LCP2 rebalancing codeMartin Schwenke2014-02-131-1/+4
| | | | | | | | | | | | | | | | | srcimbl gets changed on every iteration of the loop. The value that should be stored for the new imbalance of the source node is minsrcimbl. To help diagnose this, added some extra debug that can be left in. The extra debug changes the output of a couple of tests. Note that the resulting IP allocations in those tests are unchanged - only the debug output is changed. Also add some new tests that illustrate the bug. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>
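The class of bug described above (storing a per-iteration value instead of the tracked minimum) can be shown with a simplified stand-in. This is not the LCP2 algorithm; only the variable names `srcimbl` and `minsrcimbl` come from the commit message, and `pick_best_move` and its inputs are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* src_imbl_after[i] is the imbalance the source node would have if
 * candidate i were moved away. Returns the index of the candidate that
 * leaves the smallest imbalance and stores that imbalance in *new_imbl.
 * Returns -1 (and leaves *new_imbl untouched) if there are no candidates. */
int pick_best_move(const uint32_t *src_imbl_after, size_t n, uint32_t *new_imbl)
{
    int best = -1;
    uint32_t minsrcimbl = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t srcimbl = src_imbl_after[i];  /* overwritten on every iteration */
        if (best == -1 || srcimbl < minsrcimbl) {
            best = (int)i;
            minsrcimbl = srcimbl;
        }
    }

    if (best != -1) {
        /* Store the tracked minimum. Storing srcimbl here would record
         * the last candidate's value instead of the chosen one's. */
        *new_imbl = minsrcimbl;
    }
    return best;
}
```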
* ctdb-daemon: Consult CTDB_DEBUG_HUNG_SCRIPT variable before running debug scriptAmitay Isaacs2014-02-121-0/+4
| | | | | | | | | | | | If CTDB_DEBUG_HUNG_SCRIPT is set, use that instead of the default debug script. This code was dropped by mistake in commit 18c1f432102f1a5093927be9276d001180539e50. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> Autobuild-User(master): Martin Schwenke <martins@samba.org> Autobuild-Date(master): Wed Feb 12 08:47:47 CET 2014 on sn-devel-104
* ctdb-daemon: Return negative status only if there are known errorsAmitay Isaacs2014-01-311-1/+4
| | | | | | | If event script does not exist or does not have execute permissions, then return negative errno to distinguish from the exit errors of event script. Signed-off-by: Amitay Isaacs <amitay@gmail.com>
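One way to picture the convention in this commit: a pre-flight check returns a negative errno when the script cannot be run, leaving non-negative values for the script's own exit status. The sketch below uses `access()` for the check, which is an assumption; the actual ctdb code may test this differently. `check_script` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/* Returns 0 if the script at path can be executed by the caller,
 * otherwise a negative errno value such as -ENOENT or -EACCES. */
int check_script(const char *path)
{
    if (access(path, X_OK) != 0) {
        return -errno;
    }
    return 0;
}
```

A caller can then treat a negative result as "could not run the script" and a positive one as "the script ran and failed with this exit code".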
* ctdb/daemon: reloadips must register state of asynchronous controlsMartin Schwenke2014-01-311-0/+3
| | | | | | | Otherwise ctdb_client_async_wait() is a no-op. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* ctdb-daemon: Simplify listing event scripts using scandirAmitay Isaacs2014-01-211-94/+40
| | | | | | | | | | | | Instead of using RB tree for sorting the script names (incorrectly since it's only using the leading numbers in the script name), use scandir with alphasort. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> Autobuild-User(master): Martin Schwenke <martins@samba.org> Autobuild-Date(master): Tue Jan 21 06:41:25 CET 2014 on sn-devel-104
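The approach named in this commit, `scandir()` with `alphasort`, can be sketched as follows. The filter rule (two leading digits followed by a dot) and the function names are assumptions for illustration, not the daemon's actual code.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical filter: accept names of the form "NN.something". */
int is_event_script(const struct dirent *de)
{
    const char *name = de->d_name;

    if (strlen(name) < 4) {
        return 0;
    }
    return isdigit((unsigned char)name[0]) &&
           isdigit((unsigned char)name[1]) &&
           name[2] == '.';
}

/* Fills *namelist with matching entries sorted by their full names.
 * Returns the number of entries, or -1 on error. */
int list_event_scripts(const char *dir, struct dirent ***namelist)
{
    return scandir(dir, namelist, is_event_script, alphasort);
}

/* Releases a list returned by list_event_scripts(). */
void free_script_list(struct dirent **namelist, int count)
{
    for (int i = 0; i < count; i++) {
        free(namelist[i]);
    }
    free(namelist);
}
```

Because `alphasort` compares whole names, two scripts sharing the same numeric prefix still get a well-defined order, which sorting on the leading number alone does not give.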
* ctdb-daemon: Do not run monitor event if any other event is already runningAmitay Isaacs2014-01-211-0/+15
| | | | | | | | | | | | Any currently running monitor events are cancelled if any other events are scheduled. However, this does not stop monitor events from being run when other events are already running. Keep track of the number of active events and schedule monitor event only if there are no active events. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>
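The counting scheme described above might look roughly like this. All names (`event_tracker`, `event_start`, `event_finish`, the enum values) are hypothetical; the sketch only shows the gating logic, not how ctdb schedules events.

```c
#include <stdbool.h>

enum event_type { EVENT_MONITOR, EVENT_OTHER };

struct event_tracker {
    int active_events;  /* number of events currently running */
};

/* Returns true if the event was started, false if it was skipped.
 * A monitor event is skipped while any other event is active. */
bool event_start(struct event_tracker *t, enum event_type type)
{
    if (type == EVENT_MONITOR && t->active_events > 0) {
        return false;
    }
    t->active_events++;
    return true;
}

/* Called when an event completes. */
void event_finish(struct event_tracker *t)
{
    if (t->active_events > 0) {
        t->active_events--;
    }
}
```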
* ctdb/daemon: Untangle serialisation of 1st recovery -> startup -> monitorMartin Schwenke2014-01-172-68/+75
| | | | | | | | | | | | | | At the moment ctdb_check_healthy() is overloaded to wait until the first recovery is complete, handle the "startup" event and also actually handle monitoring. This is untidy and hard to follow. Instead, have the daemon explicitly wait for 1st recovery after the "setup" event. When first recovery is complete, schedule a function to handle the "startup" event. When the "startup" event succeeds then explicitly enable monitoring. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* ctdb/recoverd: Do not refuse disabling takeover runs on inactive nodesMartin Schwenke2014-01-171-7/+0
| | | | | | | | | | | | | | Failure might be expected when disabling takeover runs on banned nodes, since they might be suffering from performance problems or similar. More broadly, administrators who reconfigure a cluster that isn't in a happy state aren't necessarily doing something sensible. However, allowing takeover runs to be disabled on inactive nodes stops reconfiguration of stopped nodes. This is probably an unreasonable limitation, so drop it. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* ctdb-recoverd: Ignore failed ipreallocated controls to inactive nodesMartin Schwenke2014-01-171-17/+17
| | | | | | | | Currently timeouts for controls to inactive nodes can cause banning credits to be applied. This should not happen. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* ctdb-daemon: Remove ctdb_fork_with_logging()Amitay Isaacs2014-01-161-60/+0
| | | | | | | | | | This function has been replaced with ctdb_vfork_with_logging(). Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> Autobuild-User(master): Martin Schwenke <martins@samba.org> Autobuild-Date(master): Thu Jan 16 04:05:35 CET 2014 on sn-devel-104
* ctdb-daemon: Remove unused code to run eventscriptsAmitay Isaacs2014-01-161-104/+0
| | | | | | | Eventscripts are now executed using a helper. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>
* ctdb-daemon: Replace ctdb_fork_with_logging with ctdb_vfork_with_logging (part 2)Amitay Isaacs2014-01-161-26/+28
| | | | | | | | | Use ctdb_event_helper to run debug-hung-script.sh. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>
* ctdb-daemon: Replace ctdb_fork_with_logging with ctdb_vfork_with_logging (part 1)Amitay Isaacs2014-01-161-20/+90
| | | | | | | | | Use ctdb_event_helper to run eventscripts. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>
* ctdb-daemon: Add helper process to execute event scriptsAmitay Isaacs2014-01-161-0/+136
| | | | | Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>
* ctdb-daemon: Add ctdb_vfork_with_logging()Amitay Isaacs2014-01-161-0/+78
| | | | | | | | This will be used to spawn lightweight helper processes to run eventscripts. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>
* ctdb-daemon: No need to call event scripts with CTDB_CALLED_BY_USERAmitay Isaacs2014-01-165-44/+15
| | | | | | | | This was added to support external monitoring using CTDB event scripts. However, it was never used. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>
* ctdb-daemon: Deprecate RELOAD and STATUS eventsAmitay Isaacs2014-01-161-4/+1
| | | | | | | These events have never been used. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>
* ctdb-recoverd: Only respond to currently queued ipreallocated requestsMartin Schwenke2013-11-271-1/+10
| | | | | | | | | | | Otherwise new requests can come in during the latter parts of the takeover run when the IP allocation algorithm has already run, and the new requests will be dequeued even though they haven't really been processed. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org>
* ctdb-recoverd: For persistent databases a sequence number of 0 is validMartin Schwenke2013-11-271-2/+3
| | | | | | | | Otherwise recovery ends up done by RSN when it is unnecessary. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org>
* ctdb-locking: Use vfork instead of fork to exec helpersAmitay Isaacs2013-11-271-2/+5
| | | | There is a significant overhead using fork() over vfork(), especially when the child process execs a helper. The overhead is in memory space and time.
|
| # strace -c ./test_fork 1024 200
| count=1024, size=204800, total=200M
| failed fork=0
| time for fork() = 4879.597000 us
| % time     seconds  usecs/call     calls    errors syscall
| ------ ----------- ----------- --------- --------- ----------------
| 100.00    4.543321        3304      1375       375 clone
|   0.00    0.000071           0      1033           mmap
|   0.00    0.000000           0         1           read
|   0.00    0.000000           0         3           write
|   0.00    0.000000           0         2           open
|   0.00    0.000000           0         2           close
|   0.00    0.000000           0         3           fstat
|   0.00    0.000000           0         3           mprotect
|   0.00    0.000000           0         1           munmap
|   0.00    0.000000           0         3           brk
|   0.00    0.000000           0         1         1 access
|   0.00    0.000000           0         1           execve
|   0.00    0.000000           0         1           arch_prctl
| ------ ----------- ----------- --------- --------- ----------------
| 100.00    4.543392                  2429       376 total
|
| # strace -c ./test_vfork 1024 200
| count=1024, size=204800, total=200M
| failed fork=0
| time for fork() = 82.041000 us
| % time     seconds  usecs/call     calls    errors syscall
| ------ ----------- ----------- --------- --------- ----------------
|  96.47    0.001204           1      1000           vfork
|   3.53    0.000044           0      1033           mmap
|   0.00    0.000000           0         1           read
|   0.00    0.000000           0         3           write
|   0.00    0.000000           0         2           open
|   0.00    0.000000           0         2           close
|   0.00    0.000000           0         3           fstat
|   0.00    0.000000           0         3           mprotect
|   0.00    0.000000           0         1           munmap
|   0.00    0.000000           0         3           brk
|   0.00    0.000000           0         1         1 access
|   0.00    0.000000           0         1           execve
|   0.00    0.000000           0         1           arch_prctl
| ------ ----------- ----------- --------- --------- ----------------
| 100.00    0.001248                  2054         1 total
|
| Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org>
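For readers unfamiliar with the pattern, a vfork-then-exec helper launch looks roughly like the sketch below. It is a generic illustration, not the ctdb locking code; `spawn_helper` and `wait_helper` are hypothetical names. After `vfork()` the child shares the parent's memory until it execs, so it should do nothing except call an exec function or `_exit()`.

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start the program at path with the given argument vector.
 * Returns the child's pid, or -1 if vfork() failed. */
pid_t spawn_helper(const char *path, char *const argv[])
{
    pid_t pid = vfork();

    if (pid == 0) {
        /* Child: exec immediately. */
        execv(path, argv);
        /* Only reached if exec failed. Use _exit(), not exit(), so the
         * parent's stdio buffers and atexit handlers are left alone. */
        _exit(127);
    }
    return pid;
}

/* Wait for the helper and return its exit code, or -1 if waiting
 * failed or the helper did not exit normally. */
int wait_helper(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The saving comes from not duplicating the parent's page tables for a child that is about to replace its address space anyway.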
* ctdb-locking: Update current lock statistics when lock is scheduledAmitay Isaacs2013-11-271-2/+2
| | | | | | | | | When a child process is created for a lock request, the current locks statistics should be updated immediately. This will provide accurate information on number of active lock requests. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org>
* ctdb-locking: Do not merge multiple lock requests to avoid unfair schedulingAmitay Isaacs2013-11-271-1/+6
| | | | | Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org>
* ctdb-locking: Implement active lock requests limit per databaseAmitay Isaacs2013-11-271-8/+11
| | | | | | | | | | | | Previously this limit was global rather than per database. That prevented database freeze lock requests from being scheduled once the global limit was reached. Only individual record requests should be limited; database freeze requests should always be scheduled. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org>
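The scheduling rule stated here (limit record locks per database, never hold back freeze requests) can be sketched as a small predicate. The type names and the `MAX_ACTIVE_PER_DB` value are invented for the example and do not correspond to a real ctdb tunable.

```c
#include <stdbool.h>

enum lock_type { LOCK_RECORD, LOCK_DB_FREEZE };

/* Per-database state, so one busy database cannot starve another. */
struct db_lock_state {
    int active_record_locks;
};

#define MAX_ACTIVE_PER_DB 2  /* hypothetical limit, for illustration only */

/* Decide whether a lock request for this database may be scheduled now. */
bool can_schedule(const struct db_lock_state *db, enum lock_type type)
{
    if (type == LOCK_DB_FREEZE) {
        return true;  /* freeze requests are never limited */
    }
    return db->active_record_locks < MAX_ACTIVE_PER_DB;
}
```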
* ctdb-recoverd: Fix backward compatibility for CTDB_SRVID_TAKEOVER_RUNMartin Schwenke2013-11-271-7/+7
| | | | | | | | | When running a mixed version cluster, compatibility with older versions was broken during recent refactorisation. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org>
* ctdb-recoverd: A node refuses to play against itselfMartin Schwenke2013-11-271-0/+5
| | | | | | Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org>
* ctdb-recoverd: Remove duplicate code to update flags during recoveryMartin Schwenke2013-11-271-17/+0
| | | | | | | | This also happens earlier in do_recovery() and the nodemap is not updated after that, so this update is redundant. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Michael Adam <obnox@samba.org>
* ctdb-server: Coverity fixesAmitay Isaacs2013-11-198-24/+49
| | | | | Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org>
* tunables: Remove obsolete tunablesAmitay Isaacs2013-10-301-3/+0
| | | | | | Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit ca5fc3431573c44d55d09d987c715fb53756fc1f)
* recoverd: Rebalancing should be done regardless tunableMartin Schwenke2013-10-301-7/+14
| | | | | | | | | Rebalance target nodes should be set even if a deferred rebalance is not configured. The user can explicitly cause a takeover run. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit afd9b51644af074752d74c412cb4e7ec2eba2c69)
* recoverd: Improve an error message in the election codeMartin Schwenke2013-10-301-1/+1
| | | | | | Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 275ed9ebe287e39d891888c13810c70f347af8ac)
* Revert "if a new node enters the cluster, that node will already be frozen at start"Martin Schwenke2013-10-301-20/+13
| | | | | | | | | | | | | | | | | | | This is unnecessary due to 03e2e436db5cfd29a56d13f5d2101e42389bfc94. Furthermore, if a node doesn't force an election but wins it then it can fail to record that it is the new recovery master. This can lead to a reverse split brain where there is no recovery master. This reverts commit c5035657606283d2e35bea40992505e84ca8e7be. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> Conflicts: server/ctdb_recoverd.c (This used to be ctdb commit c8b542e059a54b8d524bd430cad9d82e5edd864d)
* ctdbd: When a node is connected, log at DEBUG NOTICE not DEBUG_INFOMartin Schwenke2013-10-291-2/+3
| | | | | | | | | This is important enough that we should see it when the log level is DEBUG_NOTICE. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit eb8ec5681bfccb26c8ffae72952d54bb0ba46249)
* Revert "recoverd: Disable takeover runs on other nodes for 5 minutes"Martin Schwenke2013-10-291-2/+2
| | | | | | | | | | | 5 minutes is too long to leave the cluster in limbo if the recovery daemon dies during a takeover run, even though this is quite unlikely. We need a new recovery master to be able to do takeover runs fairly quickly. This reverts commit 71080676bb4acbd0d9b595a30cf7fe6dddbf426f. (This used to be ctdb commit 3e41170c78fc7a2bf526129c9b7db3739b61c6bf)
* daemon: Change the default recovery method for persistent databasesAmitay Isaacs2013-10-281-1/+1
| | | | | | | | | Use sequence numbers to do recovery for persistent databases instead of RSNs. This fixes the problem of registry corruption during recovery. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 56486d1c01cc8ad0e4b8cee7a22429e72e50f03d)
* packaging: Move ctdb/ directory from /var to /var/libAmitay Isaacs2013-10-251-3/+3
| | | | | | | | | Introduce CTDB_VARDIR variable that points to /var/lib/ctdb by default. This makes CTDB_VARDIR consistent across C code and scripts. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 2c09aac71188f43cd592572b10ea30b7a2969678)
* ctdbd: Simplify database directory setting logicMartin Schwenke2013-10-252-57/+3
| | | | | | | | | | | | | | No need to check if the options are set. The options are always set via static defaults. No need to talloc_strdup() the values via wrapper functions. The options aren't going away. Remove now unused ctdb_set_tdb_dir() and similar functions. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 1fe82f3d7b610547ff4945887f15dd6c5798a49b)
* ctdbd: Remove duplicate database directory setting logicMartin Schwenke2013-10-252-34/+5
| | | | | | | | | | | | | Defaults for ctdb->db_directory and similar variables are currently set in 2 places. Change this to set them in only 1 place and make the directories at initialisation time instead of waiting until later. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit d73d84346488a2ed54e6a86f9d7ec641c8e33ace)