path: root/ctdb
Commit message | Author | Age | Files | Lines
...
* ctdb-vacuum: cast freelist_size in comparison. | Michael Adam | 2014-03-06 | 1 | -1/+2
  At this point, it is >= 0 anyway.
  Signed-off-by: Michael Adam <obnox@samba.org>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* ctdb-vacuum: improve output of delete list statistics | Michael Adam | 2014-03-06 | 1 | -5/+5
  Signed-off-by: Michael Adam <obnox@samba.org>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* ctdb-daemon: Do not support connection tracking if there are no public IPs | Amitay Isaacs | 2014-03-04 | 1 | -0/+15
  CTDB tracks connections to be able to send tickle ACKs and gratuitous ARPs. When there are no public IPs, there is no need for tickle ACKs and gratuitous ARPs.
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  Reviewed-by: Martin Schwenke <martin@meltin.net>
  Autobuild-User(master): Martin Schwenke <martins@samba.org>
  Autobuild-Date(master): Tue Mar 4 03:01:38 CET 2014 on sn-devel-104
* ctdb-util: Do not use mlockall() on AIX | Amitay Isaacs | 2014-03-04 | 1 | -6/+1
  Memory lockdown causes the recovery daemon to crash on AIX.
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  Reviewed-by: Martin Schwenke <martin@meltin.net>
* ctdb-build: AIX does not have working C99 vsnprintf, requires libreplace | Amitay Isaacs | 2014-03-04 | 1 | -2/+2
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  Reviewed-by: Martin Schwenke <martin@meltin.net>
* ctdb-build: Remove auto-generated header file in distclean | Amitay Isaacs | 2014-03-04 | 1 | -0/+1
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  Reviewed-by: Martin Schwenke <martin@meltin.net>
* ctdb-recoverd: Check if callback function is registered before calling | Amitay Isaacs | 2014-02-27 | 1 | -2/+7
  Fix suggested by Kevin Osborn <kosborn@overlandstorage.com>.
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  Reviewed-by: Martin Schwenke <martin@meltin.net>
  Autobuild-User(master): Martin Schwenke <martins@samba.org>
  Autobuild-Date(master): Thu Feb 27 13:54:59 CET 2014 on sn-devel-104
* ctdb-daemon: After updating tickles on other nodes, set update flag to false | Amitay Isaacs | 2014-02-27 | 1 | -0/+2
  tcp_update_flag is set to true whenever tickles are added or deleted. This flag is used to determine whether or not to send the tickles list to other nodes. Once the tickles list is sent to other nodes successfully, set tcp_update_flag to false, so ctdbd does not keep sending the same tickles list every TickleUpdateInterval (20 seconds).
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  Reviewed-by: Martin Schwenke <martin@meltin.net>
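The flag handling described above is a dirty-flag pattern. The C sketch below is a minimal illustration of it. Only `tcp_update_flag` comes from the commit message. `struct tickle_list`, `tickle_add()`, `tickle_del()`, `tickle_needs_send()` and `tickle_send_done()` are hypothetical names, and the real ctdbd code is structured differently:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the tickle list kept for one public IP. */
struct tickle_list {
	size_t num_tickles;
	bool tcp_update_flag; /* true if the list changed since the last successful send */
};

/* Adding or deleting a tickle marks the list as needing to be re-sent. */
static void tickle_add(struct tickle_list *t)
{
	t->num_tickles++;
	t->tcp_update_flag = true;
}

static void tickle_del(struct tickle_list *t)
{
	if (t->num_tickles > 0) {
		t->num_tickles--;
	}
	t->tcp_update_flag = true;
}

/* Checked once per TickleUpdateInterval: only send if something changed. */
static bool tickle_needs_send(const struct tickle_list *t)
{
	return t->tcp_update_flag;
}

/* Record the outcome of sending the list to the other nodes.  The flag is
 * cleared only on success, so a failed send is retried next interval. */
static void tickle_send_done(struct tickle_list *t, int status)
{
	if (status == 0) {
		t->tcp_update_flag = false;
	}
}
```

Because the flag is cleared only after a successful send, a failed push is retried at the next interval rather than lost.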
* ctdb-daemon: Implement ctdb_control_startup() | Martin Schwenke | 2014-02-27 | 1 | -4/+12
  This doesn't implement what was recommended. That would require careful error handling, probably with a fallback to this code anyway. This is simple and does no worse than the current code. That is, the new node is updated on the next call to tdb_update_tcp_tickles().
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* ctdb-daemon: Fix whitespaces | Amitay Isaacs | 2014-02-27 | 1 | -4/+4
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  Reviewed-by: Martin Schwenke <martin@meltin.net>
* ctdb-daemon: Always talloc tickle array off vnn instead of ctdb->nodes | Amitay Isaacs | 2014-02-27 | 1 | -8/+5
  This fixes the ctdb crash reported in bug #10366. Fix suggested by Kevin Osborn <kosborn@overlandstorage.com>.
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  Reviewed-by: Martin Schwenke <martin@meltin.net>
* ctdb-eventscripts: Switch on dumping of stuck nfsd threads | Martin Schwenke | 2014-02-25 | 2 | -2/+2
  This feature was added quite a while ago but was not enabled by default. It is a useful feature, so enable it to dump stack traces of up to 5 stuck processes by default. This can be disabled by setting:
      CTDB_NFS_DUMP_STUCK_THREADS=0
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
  Autobuild-Date(master): Tue Feb 25 04:06:45 CET 2014 on sn-devel-104
* ctdb-scripts: Update a misleading comment | Martin Schwenke | 2014-02-19 | 1 | -8/+1
  This comment was true when 50.samba was spaghetti because it tried to automatically manage both smbd (and nmbd) and winbind. It isn't true anymore.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
  Autobuild-Date(master): Wed Feb 19 04:07:12 CET 2014 on sn-devel-104
* ctdb-tests: Improvements to tests INSTALL script | Martin Schwenke | 2014-02-19 | 1 | -2/+7
  * Should stop on 1st error
  * Fix up value of CTDB_TESTS_ARE_INSTALLED
  * Improve fixing of broken symlinks in INSTALL
    These are all of the links in tests/eventscript/etc-ctdb/, so there is no need to list them. Just find and fix them.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* ctdb-recoverd: LCP2 cleanups | Martin Schwenke | 2014-02-19 | 1 | -7/+7
  * Remove unnecessary candimbl parameter.
    This parameter can be cheaply calculated in lcp2_failback_candidate(). The compiler will probably do an excellent job optimising it. :-)
  * Clarify a debug statement
    This is much clearer than doing a complex recalculation of a known value.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* ctdb-recoverd: Optimise check for rebalance candidates in LCP2 | Martin Schwenke | 2014-02-19 | 1 | -16/+17
  Currently this can be checked many times. However, there's no point calling the rebalance/failback code at all if there are no rebalance candidates.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* ctdb-scripts: Enhancements to hung script debugging | Martin Schwenke | 2014-02-19 | 3 | -2/+153
  * Add stack dumps for "interesting" processes. These processes sometimes get stuck, so try to print stack traces for them if they appear in the pstree output.
  * Add new configuration variables CTDB_DEBUG_HUNG_SCRIPT_LOGFILE and CTDB_DEBUG_HUNG_SCRIPT_STACKPAT. These are primarily for testing, but the latter may be useful for live debugging.
  * Load CTDB configuration so that the above configuration variables can be set/changed without restarting ctdbd.
  Add a test that tries to ensure that all of this is working.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* ctdb:vacuum: move retrieval of freelist to after vacuum run | Michael Adam | 2014-02-14 | 1 | -6/+7
  The fast vacuum run may have increased the freelist size.
  Signed-off-by: Michael Adam <obnox@samba.org>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
  Autobuild-Date(master): Fri Feb 14 03:15:30 CET 2014 on sn-devel-104
* ctdb:vacuum: fix debug message typo in add_record_to_delete_list() | Michael Adam | 2014-02-14 | 1 | -1/+1
  Signed-off-by: Michael Adam <obnox@samba.org>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* ctdb-tests: Handle interactions with monitor events | Martin Schwenke | 2014-02-13 | 1 | -7/+3
  In the first case, reconfiguration can no longer happen in a monitor event, so this is no longer a problem. Drop it.
  Running a monitor event by hand no longer cancels the existing monitor event. Instead, the hand-run event fails. So do this differently and just wait for a monitor event before continuing.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
  Autobuild-Date(master): Thu Feb 13 04:05:57 CET 2014 on sn-devel-104
* ctdb-recoverd: Fix a bug in the LCP2 rebalancing code | Martin Schwenke | 2014-02-13 | 5 | -22/+673
  srcimbl gets changed on every iteration of the loop. The value that should be stored for the new imbalance of the source node is minsrcimbl.
  To help diagnose this, add some extra debug output that can be left in. The extra debug output changes the output of a couple of tests. Note that the resulting IP allocations in those tests are unchanged; only the debug output is changed.
  Also add some new tests that illustrate the bug.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
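The bug is easier to see in a simplified C sketch of the selection loop. The sketch assumes that the best move is the one with the smallest combined source and destination imbalance. `struct move_trial` and `apply_best_move()` are hypothetical names. Only `srcimbl` and `minsrcimbl` come from the commit message:

```c
#include <stddef.h>

/* Hypothetical result of trying to move one IP from the source node to one
 * destination node: the imbalance each node would have after the move. */
struct move_trial {
	double srcimbl;
	double dstimbl;
};

/* Choose the best trial and store its imbalances.  Returns the index of
 * the chosen trial, or -1 if there are no trials. */
static int apply_best_move(const struct move_trial *trials, size_t n,
			   double *src_imbalance, double *dst_imbalance)
{
	int best = -1;
	double minsrcimbl = 0.0, mindstimbl = 0.0;
	double srcimbl = 0.0; /* outlives the loop, like the buggy variable */

	for (size_t i = 0; i < n; i++) {
		srcimbl = trials[i].srcimbl; /* overwritten on every iteration */
		double dstimbl = trials[i].dstimbl;

		if (best == -1 || srcimbl + dstimbl < minsrcimbl + mindstimbl) {
			best = (int)i;
			minsrcimbl = srcimbl;
			mindstimbl = dstimbl;
		}
	}
	if (best != -1) {
		/* Storing srcimbl here would record the last iteration's value,
		 * not the value belonging to the chosen move. */
		*src_imbalance = minsrcimbl;
		*dst_imbalance = mindstimbl;
	}
	return best;
}
```

Take the trials {5, 7}, {2, 3} and {9, 1}. The second trial is chosen, so the source imbalance recorded must be 2. Storing `srcimbl` after the loop would record 9, which comes from the last iteration.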
* ctdb-tests: New test to ensure "ctdb reloadips" manipulates IPs correctly | Martin Schwenke | 2014-02-13 | 1 | -0/+234
  This adds a lot of IPs (currently 100) in a new network and deletes them in a few steps. First the primary is deleted and then a check is done to ensure that the remaining IPs are all correct. Then about half of the IPs are deleted and the remaining IPs are checked. Then the remaining IPs are deleted and a check is done to ensure they are all gone.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* ctdb-tests-eventscripts: Testing support for promote_secondaries | Martin Schwenke | 2014-02-13 | 2 | -2/+12
  Just enable this behaviour by default in the ip command stub, since 10.interface assumes/sets it. The rc.local replacement for set_proc() doesn't do anything...
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* ctdb-eventscripts: Deleting IPs should use the promote_secondaries option | Martin Schwenke | 2014-02-13 | 2 | -71/+20
  If a primary IP address is being deleted from an interface, the secondaries are remembered and added back after the primary is deleted. This is done under a lock shared by the add/del script code. It is necessary because, by default, Linux deletes secondaries when the corresponding primary is deleted.
  There is a race here between ctdbd and the scripts, since ctdbd doesn't know about the lock. If ctdbd receives a release IP control and the IP address is not on an interface then it is regarded as a "Redundant release of IP" so no "releaseip" event is generated. This can occur if the IP address in question is a secondary that has been temporarily dropped. It is more likely if the number of secondaries is large.
  Since Linux 2.6.12 (i.e. 2005) Linux has supported a promote_secondaries option on interfaces. This option is currently undocumented but that will change in Linux 3.14. With promote_secondaries enabled the kernel will not drop secondaries but will promote a corresponding secondary instead. The kernel does all necessary locking.
  Use promote_secondaries to simplify the code, avoid re-adding secondaries, avoid re-adding routes and provide improved performance.
  This could be done conditionally, with a fallback to legacy secondary-re-adding code, but no supported Linux distribution is running a pre-2.6.12 kernel so this is unnecessary.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
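The kernel exposes this option per interface as /proc/sys/net/ipv4/conf/<iface>/promote_secondaries. This is the same setting that `sysctl -w net.ipv4.conf.<iface>.promote_secondaries=1` changes. CTDB's 10.interface eventscript is a shell script, so the C sketch below only illustrates the mechanism. The function names are hypothetical, and writing the file requires root:

```c
#include <stdio.h>

/* Build the procfs path for an interface's promote_secondaries setting.
 * Returns the snprintf() result (negative on error, >= len if truncated). */
static int promote_secondaries_path(char *buf, size_t len, const char *iface)
{
	return snprintf(buf, len,
			"/proc/sys/net/ipv4/conf/%s/promote_secondaries", iface);
}

/* Enable promote_secondaries on iface.  Returns 0 on success, -1 on any
 * failure (path too long, interface missing, or insufficient privileges). */
static int enable_promote_secondaries(const char *iface)
{
	char path[256];
	int n = promote_secondaries_path(path, sizeof(path), iface);

	if (n < 0 || (size_t)n >= sizeof(path)) {
		return -1;
	}
	FILE *f = fopen(path, "w");
	if (f == NULL) {
		return -1;
	}
	int ok = (fputs("1\n", f) != EOF);
	if (fclose(f) != 0) {
		ok = 0;
	}
	return ok ? 0 : -1;
}
```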
* ctdb-daemon: Consult CTDB_DEBUG_HUNG_SCRIPT variable before running debug script | Amitay Isaacs | 2014-02-12 | 1 | -0/+4
  If CTDB_DEBUG_HUNG_SCRIPT is set, use that instead of the default debug script. This code was dropped by mistake in commit 18c1f432102f1a5093927be9276d001180539e50.
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  Reviewed-by: Martin Schwenke <martin@meltin.net>
  Autobuild-User(master): Martin Schwenke <martins@samba.org>
  Autobuild-Date(master): Wed Feb 12 08:47:47 CET 2014 on sn-devel-104
* ctdb-eventscripts: Create extra files for ganesha recovery | Srikrishan Malik | 2014-02-12 | 1 | -0/+2
  This adds new files for Ganesha's recovery. The myreleaseip_* files are used by the recovery thread on the node where the IP is released. The releaseip_* and tekeip_* files are used by the recovery thread on the node where the IP is taken over.
  Signed-off-by: Srikrishan Malik <srimalik@in.ibm.com>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  Reviewed-by: Martin Schwenke <martin@meltin.net>
* ctdb-eventscripts: Run mmlsconfig only once and use cached results | Srikrishan Malik | 2014-02-12 | 1 | -2/+20
  Signed-off-by: Srikrishan Malik <srimalik@in.ibm.com>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  Reviewed-by: Martin Schwenke <martin@meltin.net>
* ctdb-doc: Fix usage string for ctdb readkey/writekey | Amitay Isaacs | 2014-01-31 | 1 | -2/+2
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  Autobuild-User(master): Martin Schwenke <martins@samba.org>
  Autobuild-Date(master): Fri Jan 31 07:52:46 CET 2014 on sn-devel-104
* ctdb-daemon: Return negative status only if there are known errors | Amitay Isaacs | 2014-01-31 | 1 | -1/+4
  If the event script does not exist or does not have execute permissions, return a negative errno to distinguish these cases from the exit status of the event script.
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
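The following minimal sketch shows this convention. It assumes a hypothetical `check_event_script()` helper, and ctdbd's real code differs. A missing or non-executable script is reported as `-errno`, so it can never be confused with a script's own exit status, which is always >= 0:

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical pre-flight check for an event script.
 * Returns 0 if the script can be run, or a negative errno for a known
 * problem, leaving non-negative values free for the script's exit status. */
static int check_event_script(const char *path)
{
	if (access(path, F_OK) != 0) {
		return -errno; /* e.g. -ENOENT: script does not exist */
	}
	if (access(path, X_OK) != 0) {
		return -errno; /* e.g. -EACCES: no execute permission */
	}
	return 0;
}
```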
* ctdb/tests/eventscripts: Avoid errors on broken pipe | Martin Schwenke | 2014-01-31 | 1 | -3/+3
  ctdb_get_my_public_addresses() attempts to echo things and this causes an error if head has taken the first line and the pipe is closed.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
  Autobuild-Date(master): Fri Jan 31 05:30:38 CET 2014 on sn-devel-104
* ctdb/tests/eventscripts: Improve ip command stub secondary handling | Martin Schwenke | 2014-01-31 | 1 | -22/+59
  It should support primary and secondaries per network instead of per interface.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* ctdb/daemon: reloadips must register state of asynchronous controls | Martin Schwenke | 2014-01-31 | 1 | -0/+3
  Otherwise ctdb_client_async_wait() is a no-op.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* ctdb-eventscripts: Do not mark node unhealthy if no fs is available | Srikrishan Malik | 2014-01-30 | 1 | -3/+4
  Signed-off-by: Srikrishan Malik <srimalik@in.ibm.com>
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  Reviewed-by: Martin Schwenke <martin@meltin.net>
  Autobuild-User(master): Martin Schwenke <martins@samba.org>
  Autobuild-Date(master): Thu Jan 30 11:18:19 CET 2014 on sn-devel-104
* ctdb-daemon: Simplify listing event scripts using scandir | Amitay Isaacs | 2014-01-21 | 1 | -94/+40
  Instead of using an RB tree for sorting the script names (incorrectly, since it only uses the leading numbers in the script name), use scandir with alphasort.
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  Reviewed-by: Martin Schwenke <martin@meltin.net>
  Autobuild-User(master): Martin Schwenke <martins@samba.org>
  Autobuild-Date(master): Tue Jan 21 06:41:25 CET 2014 on sn-devel-104
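Here is a sketch of listing event scripts with POSIX `scandir()` and `alphasort()`. The `NN.name` filter and the function names are assumptions made for illustration. This is not ctdbd's actual code:

```c
#define _DEFAULT_SOURCE
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed naming rule: two digits, a dot, then a name (e.g. "10.interface"). */
static int is_event_script(const struct dirent *d)
{
	const char *n = d->d_name;
	return isdigit((unsigned char)n[0]) && isdigit((unsigned char)n[1]) &&
	       n[2] == '.' && n[3] != '\0';
}

/* List matching entries in dir, sorted by their full names.
 * Returns the number of entries, or -1 on error. */
static int list_event_scripts(const char *dir, struct dirent ***namelist)
{
	return scandir(dir, namelist, is_event_script, alphasort);
}

/* Print and release a list returned by list_event_scripts(). */
static void print_and_free_event_scripts(struct dirent **namelist, int n)
{
	for (int i = 0; i < n; i++) {
		printf("%s\n", namelist[i]->d_name);
		free(namelist[i]);
	}
	free(namelist);
}
```

`alphasort()` compares whole names (via `strcoll()`). Two scripts that share a numeric prefix are therefore both kept and ordered by their full names, which a tree keyed only on the number cannot guarantee.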
* ctdb-daemon: Do not run monitor event if any other event is already running | Amitay Isaacs | 2014-01-21 | 2 | -0/+16
  Any currently running monitor events are cancelled if any other events are scheduled. However, this does not stop monitor events from being run when other events are already running.
  Keep track of the number of active events and schedule a monitor event only if there are no active events.
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  Reviewed-by: Martin Schwenke <martin@meltin.net>
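The scheduling rule can be sketched as a simple counter. `struct event_ctx` and the helper functions are hypothetical names used for illustration. The commit only states that the number of active events is tracked:

```c
#include <stdbool.h>

/* Hypothetical event bookkeeping. */
struct event_ctx {
	int num_active; /* events of any kind currently running */
};

static void event_started(struct event_ctx *ctx)
{
	ctx->num_active++;
}

static void event_finished(struct event_ctx *ctx)
{
	if (ctx->num_active > 0) {
		ctx->num_active--;
	}
}

/* A monitor event is scheduled only when nothing else is running. */
static bool can_schedule_monitor(const struct event_ctx *ctx)
{
	return ctx->num_active == 0;
}
```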
* ctdb/eventscripts: Move all eventscript state under $CTDB_VARDIR/state | Martin Schwenke | 2014-01-17 | 1 | -4/+4
  Services can be flagged for reconfigure when they release IPs at shutdown. The flag is never removed and the service is prematurely reconfigured during the first "ipreallocated" event, before any IPs are hosted and before the "startup" event has actually started the services.
  $CTDB_VARDIR/state directly contained the service state subdirectories and is already removed in the "init" event. Just push the service state subdirectories down a level and put everything else in a subdirectory. This way all the eventscript state gets cleaned up every time CTDB starts up.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
  Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
  Autobuild-Date(master): Fri Jan 17 09:58:26 CET 2014 on sn-devel-104
* ctdb/daemon: Untangle serialisation of 1st recovery -> startup -> monitor | Martin Schwenke | 2014-01-17 | 3 | -69/+76
  At the moment ctdb_check_healthy() is overloaded to wait until the first recovery is complete, handle the "startup" event and also actually handle monitoring. This is untidy and hard to follow.
  Instead, have the daemon explicitly wait for 1st recovery after the "setup" event. When first recovery is complete, schedule a function to handle the "startup" event. When the "startup" event succeeds then explicitly enable monitoring.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
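The new ordering can be read as a small state machine, sketched in C below. The phase and function names are hypothetical. The commit does not say what ctdbd does when "startup" fails, so this sketch simply stays in the pending phase:

```c
#include <stdbool.h>

/* Hypothetical start-up phases, in the order described above. */
enum start_phase {
	PHASE_SETUP_DONE,      /* "setup" event has run; waiting for 1st recovery */
	PHASE_STARTUP_PENDING, /* 1st recovery done; "startup" event scheduled */
	PHASE_MONITORING,      /* "startup" succeeded; monitoring enabled */
};

static enum start_phase on_first_recovery_done(enum start_phase p)
{
	return p == PHASE_SETUP_DONE ? PHASE_STARTUP_PENDING : p;
}

static enum start_phase on_startup_event_done(enum start_phase p, int status)
{
	if (p == PHASE_STARTUP_PENDING && status == 0) {
		return PHASE_MONITORING;
	}
	return p; /* out of order or failed: monitoring stays disabled */
}

static bool monitoring_enabled(enum start_phase p)
{
	return p == PHASE_MONITORING;
}
```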
* ctdb/eventscripts: Print a count if killing TCP connections times out | Martin Schwenke | 2014-01-17 | 2 | -2/+2
  Also update related test
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* ctdb/eventscripts: Reconfigure lock should be released quickly | Martin Schwenke | 2014-01-17 | 4 | -5/+15
  Currently the lock is held until the corresponding eventscript completes, since the process still exists. If the regular part of an eventscript hangs then the lock might unnecessarily be held for a long time. The pathological case is when a monitor event gets stuck in D-wait state and the script times out but can't be killed so the lock is still held. This can cause an unwanted monitor replay.
  Change this so that the lock is released immediately after the reconfiguration is complete.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* ctdb/recoverd: Do not refuse disabling takeover runs on inactive nodes | Martin Schwenke | 2014-01-17 | 1 | -7/+0
  Failure might be expected when disabling takeover runs on banned nodes, since they might be suffering from performance problems or similar. More broadly, administrators who reconfigure a cluster that isn't in a happy state aren't necessarily doing something sensible.
  However, refusing to disable takeover runs on inactive nodes prevents reconfiguration of stopped nodes. This is probably an unreasonable limitation, so drop it.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* ctdb-recoverd: Ignore failed ipreallocated controls to inactive nodes | Martin Schwenke | 2014-01-17 | 1 | -17/+17
  Currently timeouts for controls to inactive nodes can cause banning credits to be applied. This should not happen.
  Signed-off-by: Martin Schwenke <martin@meltin.net>
  Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* ctdb-daemon: Remove ctdb_fork_with_logging() | Amitay Isaacs | 2014-01-16 | 2 | -65/+0
  This function has been replaced with ctdb_vfork_with_logging().
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  Reviewed-by: Martin Schwenke <martin@meltin.net>
  Autobuild-User(master): Martin Schwenke <martins@samba.org>
  Autobuild-Date(master): Thu Jan 16 04:05:35 CET 2014 on sn-devel-104
* ctdb-tests: Set CTDB_EVENT_HELPER when running with local daemons | Amitay Isaacs | 2014-01-16 | 1 | -0/+1
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  Reviewed-by: Martin Schwenke <martin@meltin.net>
* ctdb-daemon: Remove unused code to run eventscripts | Amitay Isaacs | 2014-01-16 | 1 | -104/+0
  Eventscripts are now executed using a helper.
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  Reviewed-by: Martin Schwenke <martin@meltin.net>
* ctdb-daemon: Replace ctdb_fork_with_logging with ctdb_vfork_with_logging (part 2) | Amitay Isaacs | 2014-01-16 | 1 | -26/+28
  Use ctdb_event_helper to run debug-hung-script.sh.
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  Reviewed-by: Martin Schwenke <martin@meltin.net>
* ctdb-daemon: Replace ctdb_fork_with_logging with ctdb_vfork_with_logging (part 1) | Amitay Isaacs | 2014-01-16 | 1 | -20/+90
  Use ctdb_event_helper to run eventscripts.
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  Reviewed-by: Martin Schwenke <martin@meltin.net>
* ctdb-daemon: Add helper process to execute event scripts | Amitay Isaacs | 2014-01-16 | 3 | -1/+143
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  Reviewed-by: Martin Schwenke <martin@meltin.net>
* ctdb-daemon: Add ctdb_vfork_with_logging() | Amitay Isaacs | 2014-01-16 | 2 | -0/+87
  This will be used to spawn lightweight helper processes to run eventscripts.
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  Reviewed-by: Martin Schwenke <martin@meltin.net>
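Here is a rough sketch of spawning a helper with `vfork()` and `execv()`. `run_helper()` is a hypothetical name. ctdbd's `ctdb_vfork_with_logging()` also handles logging of the child's output, which is omitted here. For simplicity, this sketch waits synchronously for the child:

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] (an absolute path) as a helper process and wait for it.
 * Returns the helper's exit status (127 if execv() failed), or -1 if the
 * helper could not be spawned or did not exit normally. */
static int run_helper(char *const argv[])
{
	pid_t pid = vfork();

	if (pid == -1) {
		return -1;
	}
	if (pid == 0) {
		/* After vfork() the child may only exec or _exit. */
		execv(argv[0], argv);
		_exit(127);
	}

	int status;
	while (waitpid(pid, &status, 0) == -1) {
		if (errno != EINTR) {
			return -1;
		}
	}
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

`vfork()` is cheaper than `fork()` for a large daemon because the child borrows the parent's address space until it calls `execv()`. This is also why the child must do nothing except exec or `_exit()`.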
* ctdb-daemon: No need to call event scripts with CTDB_CALLED_BY_USER | Amitay Isaacs | 2014-01-16 | 6 | -46/+16
  This was added to support external monitoring using CTDB event scripts. However, it was never used.
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  Reviewed-by: Martin Schwenke <martin@meltin.net>
* ctdb-daemon: Deprecate RELOAD and STATUS events | Amitay Isaacs | 2014-01-16 | 2 | -6/+3
  These events have never been used.
  Signed-off-by: Amitay Isaacs <amitay@gmail.com>
  Reviewed-by: Martin Schwenke <martin@meltin.net>