| author | Martin Schwenke <martin@meltin.net> | 2014-06-10 15:16:44 +1000 |
|---|---|---|
| committer | Amitay Isaacs <amitay@samba.org> | 2014-06-19 23:41:13 +0200 |
| commit | 6a552f1a12ebe43f946bbbee2a3846b5a640ae4f (patch) | |
| tree | 48a7da00070e52f9516dc2b756652f3d8af85d09 /ctdb/tests/complex | |
| parent | 364bdadde3159dde1ddcc8c5fa4be981448f6833 (diff) | |
ctdb-tests: Try harder to avoid failures due to repeated recoveries

About a year ago a check was added to _cluster_is_healthy() to make
sure that node 0 isn't in recovery. This was to avoid unexpected
recoveries causing tests to fail. However, it was misguided because
each test initially calls cluster_is_healthy() and will now fail if an
unexpected recovery occurs.

Instead, have cluster_is_healthy() warn if the cluster is in recovery.

Also:

* Rename wait_until_healthy() to wait_until_ready() because it waits
  until both healthy and out of recovery.

* Change the post-recovery sleep in restart_ctdb() to 2 seconds and
  add a loop to wait (for 2 seconds at a time) if the cluster is back
  in recovery. The logic here is that the re-recovery timeout has
  been set to 1 second, so sleeping for just 1 second might race
  against the next recovery.

* Use reverse logic in node_has_status() so that it works for "all".

* Tweak wait_until() so that it can handle timeouts with a
  recheck-interval specified.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
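
The helper changes described in the commit message live outside the directory this view is limited to, so they are not shown here. As a rough illustration only, the following sketch shows how a wait loop with a recheck interval and a post-restart "wait while back in recovery" loop might look. Everything in it is an assumption: the function names `wait_until_sketch` and `wait_out_recovery_sketch`, the `TIMEOUT/INTERVAL` argument form, and the `SKETCH_SLEEP` variable are hypothetical and are not taken from the CTDB test scripts.

```shell
#!/bin/sh
# Hypothetical sketch; NOT the code from the CTDB test suite.

# Seconds to sleep between recovery checks. The commit message uses
# 2 seconds because the re-recovery timeout is set to 1 second.
: "${SKETCH_SLEEP:=2}"

# wait_until_sketch TIMEOUT[/INTERVAL] COMMAND [ARGS...]
# Re-run COMMAND every INTERVAL seconds (default 1) until it succeeds
# or roughly TIMEOUT seconds have been spent sleeping.
# Returns 0 if COMMAND succeeded, 1 on timeout.
# INTERVAL must be at least 1, otherwise the loop cannot time out.
wait_until_sketch ()
{
    _timeout="${1%%/*}"          # part before the optional "/"
    _interval=1
    case "$1" in
    */*) _interval="${1#*/}" ;;  # part after the "/"
    esac
    shift

    _elapsed=0
    while [ "$_elapsed" -lt "$_timeout" ] ; do
        if "$@" ; then
            return 0
        fi
        sleep "$_interval"
        _elapsed=$((_elapsed + _interval))
    done
    return 1
}

# wait_out_recovery_sketch CHECK [ARGS...]
# CHECK must exit 0 while the cluster is in recovery.
# Sleep once unconditionally, then keep sleeping for as long as
# CHECK reports that the cluster has gone back into recovery.
wait_out_recovery_sketch ()
{
    sleep "$SKETCH_SLEEP"
    while "$@" ; do
        echo "Cluster is back in recovery, waiting..."
        sleep "$SKETCH_SLEEP"
    done
}
```

With these definitions, `wait_until_sketch 30/2 some_check` would retry `some_check` every 2 seconds for up to about 30 seconds, and `wait_out_recovery_sketch some_recovery_check` would block until the check stops reporting recovery. Both check commands are placeholders; a real test would substitute a query against the cluster.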
Diffstat (limited to 'ctdb/tests/complex')
| -rwxr-xr-x | ctdb/tests/complex/34_nfs_tickle_restart.sh | 2 |
1 files changed, 1 insertions, 1 deletions
diff --git a/ctdb/tests/complex/34_nfs_tickle_restart.sh b/ctdb/tests/complex/34_nfs_tickle_restart.sh
index 93587e2f31..b7eea4ca21 100755
--- a/ctdb/tests/complex/34_nfs_tickle_restart.sh
+++ b/ctdb/tests/complex/34_nfs_tickle_restart.sh
@@ -79,7 +79,7 @@ try_command_on_node $rn $CTDB_TEST_WRAPPER restart_ctdb_1
 echo "Setting NoIPTakeover on node ${rn}"
 try_command_on_node $rn $CTDB setvar NoIPTakeover 1
 
-wait_until_healthy
+wait_until_ready
 
 echo "Getting TickleUpdateInterval..."
 try_command_on_node $test_node $CTDB getvar TickleUpdateInterval
