path: root/ctdb/tests/src/ctdb_takeover_tests.c
Commit message | Author | Age | Files | Lines
* recoverd: Fix the implementation of CTDB_SRVID_REBALANCE_NODE | Martin Schwenke | 2013-09-19 | 1 | -4/+7

    The current implementation has a few flaws:

    * A takeover run is called unconditionally when the timer goes off, even if the recovery master role has moved. This means a node other than the recovery master can incorrectly do a takeover run.

    * The rebalancing target nodes are cleared in the setup for a takeover run, regardless of whether the takeover run succeeds.

    * The timer to force a rebalance isn't cleared if another takeover run occurs before the deadline. Any forced rebalancing will happen in the first takeover run, and when the timer expires some time later an unnecessary takeover run will occur.

    * If the recovery master role moves then the rebalancing data will stay on the original node and affect the next takeover run to occur if the recovery master role should come back to the original node.

    Instead, store an array of rebalance target nodes in the recovery master context. This is passed as an extra argument to ctdb_takeover_run() each time it is called and is cleared when a takeover run succeeds. The timer hangs off the array of rebalance target nodes, which is cleared if the node isn't the recovery master.

    This means that it is possible to lose rebalance data if the recovery master role moves. However, that's a difficult problem to solve. The best way of approaching it is probably to try to stop the recovery master role from jumping around unnecessarily when inactive nodes join the cluster.

    The long term solution is to avoid this nonsense completely. The IP allocation algorithm needs to cache state between runs so that it knows which nodes have just become healthy. This also needs recovery master stability.

    Signed-off-by: Martin Schwenke <martin@meltin.net>
    (This used to be ctdb commit c51c1efe5fc7fa668597f2acd435dee16e410fc9)
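
The lifecycle of the rebalance target array described in this commit can be sketched roughly as follows. This is an illustration only, not the ctdb code: `struct sketch_rebalance_ctx` and all function names are hypothetical, the takeover result is passed in as a flag, and the timer is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the part of the recovery master context
 * that holds the rebalance targets. */
struct sketch_rebalance_ctx {
    uint32_t *targets;   /* PNNs that should be rebalanced onto */
    size_t num_targets;
};

/* Remember a node as a rebalance target. */
static bool sketch_rebalance_add(struct sketch_rebalance_ctx *ctx, uint32_t pnn)
{
    uint32_t *t = realloc(ctx->targets, (ctx->num_targets + 1) * sizeof(*t));
    if (t == NULL) {
        return false;
    }
    t[ctx->num_targets++] = pnn;
    ctx->targets = t;
    return true;
}

/* Forget all rebalance targets. */
static void sketch_rebalance_clear(struct sketch_rebalance_ctx *ctx)
{
    free(ctx->targets);
    ctx->targets = NULL;
    ctx->num_targets = 0;
}

/* Assumed flow: targets are dropped if this node is not the recovery
 * master, kept if the takeover run fails (so a later run can retry),
 * and cleared only once a takeover run has succeeded. */
static bool sketch_takeover_run(struct sketch_rebalance_ctx *ctx,
                                bool is_recmaster, bool takeover_ok)
{
    if (!is_recmaster) {
        sketch_rebalance_clear(ctx);
        return false;
    }
    if (!takeover_ok) {
        return false;
    }
    sketch_rebalance_clear(ctx);
    return true;
}
```

Keeping the targets until a run succeeds is the point of the change: a failed run no longer silently discards the rebalance request.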
* tests/takeover: Takeover tests can use up to 1024 and checks limits | Martin Schwenke | 2013-05-24 | 1 | -1/+13

    Signed-off-by: Martin Schwenke <martin@meltin.net>
    (This used to be ctdb commit cfd1371d3a1f78a0ed86485d83bd4d311727c3d4)
* tests/takeover: Allow takeover runs with differing IP allocations per node | Martin Schwenke | 2013-05-24 | 1 | -10/+47

    Signed-off-by: Martin Schwenke <martin@meltin.net>
    (This used to be ctdb commit 954ae6f84cb06a8dcbc12456d4752280072be5bf)
* recoverd: Nodes can only takeover IPs if they are in runstate RUNNING | Martin Schwenke | 2013-05-24 | 1 | -1/+29

    Currently the order of the first IP allocation, including the first "ipreallocated" event, and the "startup" event is undefined. Both of these events can (re)start services.

    This stops IPs being hosted before the "startup" event has completed.

    Signed-off-by: Martin Schwenke <martin@meltin.net>
    Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
    (This used to be ctdb commit f15dd562fd8c08cafd957ce9509102db7eb49668)
* recoverd: takeover_run_core() should not use modified node flags | Martin Schwenke | 2013-05-23 | 1 | -21/+24

    Modifying the node flags with IP-allocation-only flags is not necessary. It causes breakage if the flags are not cleared after use. ctdb_takeover_run() no longer needs the general node flags - it only needs the IP flags.

    Instead of modifying the node flags in nodemap, construct a custom IP flags list and have takeover_run_core() use that instead of node flags.

    As well as being safer, this makes the IP allocation code more self contained and a little bit clearer.

    Signed-off-by: Martin Schwenke <martin@meltin.net>
    (This used to be ctdb commit 14bd0b6961ef1294e9cba74ce875386b7dfbf446)
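
The idea of deriving a separate IP flags list, rather than writing allocation-only bits back into the node flags, might look roughly like the sketch below. Every name here (`struct sketch_node`, `struct sketch_ipflags`, `SKETCH_NODE_DISABLED`, `sketch_build_ipflags()`) is hypothetical, and the rule "disabled means cannot host" is a simplifying assumption rather than the actual ctdb policy.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_NODE_DISABLED 0x1u   /* hypothetical node flag bit */

struct sketch_node {
    uint32_t flags;                 /* general node flags, read-only here */
};

struct sketch_ipflags {
    bool noiptakeover;              /* node must not take over new IPs */
    bool noiphost;                  /* node must not host any IPs */
};

/* Build a per-node IP flags array from the node map and a per-node
 * "no takeover" setting. The node map is const: nothing is written
 * back, so there are no allocation-only bits to clear afterwards.
 * The caller frees the result. */
static struct sketch_ipflags *sketch_build_ipflags(const struct sketch_node *nodes,
                                                   size_t num_nodes,
                                                   const bool *no_takeover)
{
    struct sketch_ipflags *ipflags = calloc(num_nodes, sizeof(*ipflags));
    if (ipflags == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < num_nodes; i++) {
        ipflags[i].noiptakeover = no_takeover[i];
        ipflags[i].noiphost = (nodes[i].flags & SKETCH_NODE_DISABLED) != 0;
    }
    return ipflags;
}
```

The allocation code would then consult only the returned array, which is what makes it self contained.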
* recoverd: Remove unused mask argument and initial mask calculation | Martin Schwenke | 2013-05-07 | 1 | -9/+3

    This has been replaced by set_ipflags() and associated functionality.

    Signed-off-by: Martin Schwenke <martin@meltin.net>
    (This used to be ctdb commit d0a3822573db296e73cc897835f783c8abc084b3)
* recoverd: Remove unused mask argument from IP allocation functions | Martin Schwenke | 2013-05-07 | 1 | -3/+0

    This is a no-op and is in a separate commit to make the previous commit less cumbersome.

    Signed-off-by: Martin Schwenke <martin@meltin.net>
    (This used to be ctdb commit 107e656bbe24f9d21fbaf886a3e9417da4effe5a)
* recoverd: Fix tunable NoIPTakeoverOnDisabled, rename to NoIPHostOnAllDisabled | Martin Schwenke | 2013-05-07 | 1 | -6/+5

    This really needs to be per-node. The rename is because nodes with this tunable switched on should drop IPs if they become unhealthy (or disabled in some other way).

    * Add new flag NODE_FLAGS_NOIPHOST, only used in recovery daemon.

    * Enhance set_ipflags_internal() and set_ipflags() to set up NODE_FLAGS_NOIPHOST depending on the setting of NoIPHostOnAllDisabled and/or whether nodes are disabled/inactive.

    * Replace can_node_servce_ip() with functions can_node_host_ip() and can_node_takeover_ip(). These functions are the only ones that need to look at NODE_FLAGS_NOIPTAKEOVER and NODE_FLAGS_NOIPHOST. They can make the decision without looking at any other flags due to previous setup.

    * Remove explicit flag checking in IP allocation functions (including unassign_unsuitable_ips()) and just call can_node_host_ip() and can_node_takeover_ip() as appropriate.

    * Update test code to handle CTDB_SET_NoIPHostOnAllDisabled.

    Signed-off-by: Martin Schwenke <martin@meltin.net>
    Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
    (This used to be ctdb commit 1308a51f73f2e29ba4dbebb6111d9309a89732cc)
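
The split into a "host" check and a "takeover" check can be illustrated with two small predicates. This is a simplified sketch, not the real can_node_host_ip() and can_node_takeover_ip(): the `sketch_` names, the single flags argument and the bit values are all assumptions, and the sketch assumes that a node barred from hosting is also barred from taking over.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values standing in for NODE_FLAGS_NOIPTAKEOVER and
 * NODE_FLAGS_NOIPHOST; the real values are not given in the commit. */
#define SKETCH_NOIPTAKEOVER 0x1u
#define SKETCH_NOIPHOST     0x2u

/* May the node keep hosting IPs that it already has? */
static bool sketch_can_host_ip(uint32_t ipflags)
{
    return (ipflags & SKETCH_NOIPHOST) == 0;
}

/* May the node be given IPs that it does not currently host?
 * Assumption: taking over also requires being allowed to host. */
static bool sketch_can_takeover_ip(uint32_t ipflags)
{
    if (ipflags & SKETCH_NOIPTAKEOVER) {
        return false;
    }
    return sketch_can_host_ip(ipflags);
}
```

Because the flags are computed once up front, allocation code only has to ask these two questions and never inspects other node flags.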
* tests/takeover: Allow per-node tunable settings | Martin Schwenke | 2013-05-07 | 1 | -9/+47

    Implemented for CTDB_SET_NoIPTakeover.

    Signed-off-by: Martin Schwenke <martin@meltin.net>
    Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
    (This used to be ctdb commit a1addd89fd9c0390912604097acd028cc24d3483)
* recoverd: Move failback retry loop into basic_failback() and lcp2_failback() | Martin Schwenke | 2013-01-08 | 1 | -6/+3

    The retry loop is currently in ctdb_takeover_run_core(). Pushing it into each function will make it possible to put each algorithm into a separate top-level function. This will make the code much clearer and more maintainable.

    Also keep associated test code compatible.

    Signed-off-by: Martin Schwenke <martin@meltin.net>
    (This used to be ctdb commit f6ce18d011dd9043b04256690d826deb2640cd89)
* tests/takeover: Support testing of NoIPTakeoverOnDisabled | Martin Schwenke | 2013-01-08 | 1 | -0/+5

    Via $CTDB_SET_NoIPTakeoverOnDisabled.

    Signed-off-by: Martin Schwenke <martin@meltin.net>
    (This used to be ctdb commit d357d52dbd533444a4af6151d04ba119a1533068)
* tests/takeover: IP allocation now selected via $CTDB_IP_ALGORITHM | Martin Schwenke | 2013-01-08 | 1 | -3/+12

    Default to LCP2, like ctdbd. Also support "det" for deterministic IPs.

    Signed-off-by: Martin Schwenke <martin@meltin.net>
    (This used to be ctdb commit 20631f5f29859920844dd8f410e24917aabd3dfd)
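
Selecting the algorithm from the environment could be done along these lines. The enum, the function names and the accepted spelling "lcp2" are assumptions made for this sketch; the commit only states that the default is LCP2 and that "det" is supported.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical enum for this sketch. */
enum sketch_ip_alg {
    SKETCH_ALG_LCP2,
    SKETCH_ALG_DET,
    SKETCH_ALG_UNKNOWN
};

/* Map a setting to an algorithm. A missing setting selects LCP2. */
static enum sketch_ip_alg sketch_parse_ip_algorithm(const char *value)
{
    if (value == NULL) {
        return SKETCH_ALG_LCP2;
    }
    if (strcmp(value, "lcp2") == 0) {   /* assumed spelling */
        return SKETCH_ALG_LCP2;
    }
    if (strcmp(value, "det") == 0) {
        return SKETCH_ALG_DET;
    }
    return SKETCH_ALG_UNKNOWN;
}

/* Read the choice from $CTDB_IP_ALGORITHM. */
static enum sketch_ip_alg sketch_ip_algorithm_from_env(void)
{
    return sketch_parse_ip_algorithm(getenv("CTDB_IP_ALGORITHM"));
}
```

Splitting the parsing from the getenv() call keeps the mapping testable without touching the process environment.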
* Tests - make a comment more accurate | Martin Schwenke | 2011-12-08 | 1 | -2/+1

    Signed-off-by: Martin Schwenke <martin@meltin.net>
    (This used to be ctdb commit 726b598076132a5a73f9259d6b65ee2a4012099f)
* Tests: change ctdb_takeover_tests.c to include ctdbd code | Martin Schwenke | 2011-11-11 | 1 | -10/+1

    Do this instead of linking to it. This means that, after previous cleanups, we can fix ctdb_takeover.c to use static functions when appropriate and simply include all the code we need to run tests.

    To make this reusable in other tests, new file ctdbd_tests.c does all of the relevant including. ctdb_takeover_tests.c just includes that file. Test objects built in this way can depend on new Makefile macro $(CTDB_TEST_C), which contains ctdbd_tests.c and everything it includes.

    Signed-off-by: Martin Schwenke <martin@meltin.net>
    (This used to be ctdb commit 41869d42194b74db43a176a068e96e411007e5f2)
* Tests - Allow some tests in ctdb_takeover_tests to specify allowed nodes | Martin Schwenke | 2011-11-01 | 1 | -21/+163

    This mainly applies to ctdb_takeover_run_core when you might want to specify that some IPs can only be hosted by some nodes.

    Syntax on each line is now:

      IP current_pnn allowed_pnns

    where allowed_pnns is a comma-separated list.

    allowed_pnns is optional. If not specified then address can be assigned to all nodes that aren't included in an allowed_pnns list. Just think of it as all PNNs and that the behaviour is undefined when you only specify allowed_pnns for some IPs. ;-)

    current_pnn is optional and defaults to -1.

    Signed-off-by: Martin Schwenke <martin@meltin.net>
    (This used to be ctdb commit ed83604da82ebe566d6eb330ab7119e861e853c8)
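
A parser for the "IP current_pnn allowed_pnns" line format could look something like the following. It is a minimal sketch and not the test program's actual parser: the struct, the function name and the fixed limits are hypothetical, and the numeric fields are not validated.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_ALLOWED 16       /* arbitrary limit for this sketch */

struct sketch_ip_line {
    char ip[64];
    int current_pnn;                /* -1 when omitted */
    int allowed[SKETCH_MAX_ALLOWED];
    int num_allowed;                /* 0 means no allowed_pnns list given */
};

/* Parse one line. Returns 0 on success, -1 if there is no IP field. */
static int sketch_parse_ip_line(const char *line, struct sketch_ip_line *out)
{
    char pnns[256] = "";
    int n;

    out->current_pnn = -1;          /* documented default */
    out->num_allowed = 0;

    n = sscanf(line, "%63s %d %255s", out->ip, &out->current_pnn, pnns);
    if (n < 1) {
        return -1;
    }

    /* Split the optional comma-separated list of PNNs. */
    for (char *tok = strtok(pnns, ",");
         tok != NULL && out->num_allowed < SKETCH_MAX_ALLOWED;
         tok = strtok(NULL, ",")) {
        out->allowed[out->num_allowed++] = atoi(tok);
    }
    return 0;
}
```

For example, parsing "10.0.0.1 2 0,2,3" yields current_pnn 2 and three allowed PNNs, while "10.0.0.2" alone leaves current_pnn at -1 with an empty list.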
* Tests - IP allocation - allow more interesting node states to be specified | Martin Schwenke | 2011-09-25 | 1 | -11/+16

    Node states on the command line are now comma-separated hex numbers, so all flag states can be expressed.

    Signed-off-by: Martin Schwenke <martin@meltin.net>
    (This used to be ctdb commit 1f1534435b9d5f464604e28a8cce2cd0a779ef68)
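
Reading node states as comma-separated hex numbers might be sketched as below. The function name and error conventions are assumptions for illustration; only the standard strtoul() call is relied upon, which in base 16 also accepts an optional "0x" prefix.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse "0,2,0x4" style input into one flags word per node.
 * Returns the number of nodes parsed, or -1 on malformed input or
 * when more than max_nodes values are supplied. */
static int sketch_parse_node_states(const char *arg, uint32_t *flags,
                                    int max_nodes)
{
    const char *p = arg;
    int n = 0;

    while (*p != '\0') {
        char *end;
        unsigned long v;

        if (n >= max_nodes) {
            return -1;
        }
        errno = 0;
        v = strtoul(p, &end, 16);
        if (end == p || errno != 0 || v > UINT32_MAX) {
            return -1;              /* not a usable hex number */
        }
        flags[n++] = (uint32_t)v;

        if (*end == ',') {
            p = end + 1;
            if (*p == '\0') {
                return -1;          /* trailing comma */
            }
        } else if (*end == '\0') {
            p = end;
        } else {
            return -1;              /* unexpected separator */
        }
    }
    return n;
}
```

Using one full flags word per node is what lets every combination of flag bits be expressed, rather than a fixed set of named states.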
* Tests: Initial test code for LCP2 IP allocation algorithm. | Martin Schwenke | 2011-07-29 | 1 | -0/+378

    Move struct ctdb_public_ip_list to ctdb_private.h and put some definitions for some functions from ctdb_takeover.c there. This allows those functions to be called from unit tests.

    Add ctdb_takeover_tests.c and the Makefile support to build it.

    Signed-off-by: Martin Schwenke <martin@meltin.net>
    (This used to be ctdb commit 9d34be0233edf3bc022345c0494c4b2a4d7f8480)