| field | value | date |
|---|---|---|
| author | Martin Schwenke <martin@meltin.net> | 2014-06-10 15:16:44 +1000 |
| committer | Amitay Isaacs <amitay@samba.org> | 2014-06-19 23:41:13 +0200 |
| commit | 6a552f1a12ebe43f946bbbee2a3846b5a640ae4f | |
| tree | 48a7da00070e52f9516dc2b756652f3d8af85d09 /ctdb/lib/replace/snprintf.c | |
| parent | 364bdadde3159dde1ddcc8c5fa4be981448f6833 | |
ctdb-tests: Try harder to avoid failures due to repeated recoveries
About a year ago a check was added to _cluster_is_healthy() to make
sure that node 0 is not in recovery. This was meant to stop unexpected
recoveries from causing tests to fail. However, it was misguided:
each test calls cluster_is_healthy() at the start, so any test will
now fail if an unexpected recovery happens to be in progress.
Instead, have cluster_is_healthy() warn if the cluster is in recovery.
Also:
* Rename wait_until_healthy() to wait_until_ready() because it waits
  until the cluster is both healthy and out of recovery.
* Change the post-recovery sleep in restart_ctdb() to 2 seconds and
  add a loop that keeps waiting (2 seconds at a time) while the
  cluster is back in recovery. The re-recovery timeout has been set
  to 1 second, so sleeping for only 1 second might race against the
  next recovery.
* Use reverse logic in node_has_status() so that it also works when
  "all" is given as the node.
* Tweak wait_until() so that it can handle timeouts when a
  recheck interval is specified.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
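The helpers described above live in the ctdb test suite's shell scripts. The sketch below models their logic in Python purely for illustration. The Python signatures, the dict-based node status table and the `wait_out_recovery()` helper are hypothetical and are not the suite's actual code. `node_has_status()` fails if any selected node lacks the flag. This is one reading of why reverse logic makes "all" work, not something the commit spells out. `wait_until()` caps each sleep at the remaining time, so a recheck interval cannot overshoot the timeout. `wait_out_recovery()` mirrors the 2-second post-recovery sleep and loop described for `restart_ctdb()`.

```python
import time
from typing import Callable, Dict, Set, Union

# Hypothetical model: node number -> set of status flags. The real test
# suite derives this information from ctdb's status output instead.
NodeTable = Dict[int, Set[str]]


def node_has_status(nodes: NodeTable, pnn: Union[int, str], flag: str) -> bool:
    """Reverse logic: succeed unless some selected node lacks `flag`.

    Assumed rationale: asking "does any node lack the flag?" gives the
    right answer both for a single node and for "all", whereas asking
    "does some node have the flag?" would succeed for "all" as soon as
    one node matched.
    """
    selected = list(nodes) if pnn == "all" else [pnn]
    return not any(flag not in nodes[n] for n in selected)


def wait_until(timeout: float, check: Callable[[], bool], interval: float = 1.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll `check` every `interval` seconds until it passes or `timeout` expires.

    Returns True if the check passed, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline, even with a long recheck interval.
        sleep(min(interval, remaining))


def wait_out_recovery(in_recovery: Callable[[], bool], step: float = 2.0,
                      sleep: Callable[[float], None] = time.sleep) -> None:
    """Hypothetical helper: sleep `step` seconds after a recovery, then keep
    sleeping `step` seconds at a time while the cluster is back in recovery."""
    sleep(step)
    while in_recovery():
        sleep(step)
```

For example, with `nodes = {0: {"healthy"}, 1: set()}`, `node_has_status(nodes, 0, "healthy")` is true but `node_has_status(nodes, "all", "healthy")` is false. Passing `sleep` and `clock` in as parameters lets the polling helpers be exercised with a fake clock instead of real delays.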
Diffstat (limited to 'ctdb/lib/replace/snprintf.c')
0 files changed, 0 insertions, 0 deletions