author: Martin Schwenke <martin@meltin.net> 2013-12-09 15:54:52 +1100
committer: Amitay Isaacs <amitay@samba.org> 2013-12-17 06:32:35 +0100
commit: fdccaab2a9a1b9d7eebcd7a4d121dbf68ea48dcd
tree: ba45b0e858a0d43b8254d58e5802dd7a61ef9b02
parent: 970a6efa3b5fc11aa4ff79049738bb971a129a62
ctdb/eventscripts: Do not reconfigure in "monitor" events

"monitor" events can be cancelled. If a reconfigure action does a service restart then the "monitor" event can be cancelled at the inconvenient moment after the service is stopped. In this case the service stays down and the node may become unhealthy (depending on whether there are any repair actions in the monitor event).

A long time ago we did service reconfiguration in "monitor" events following failovers. Service reconfiguration was then moved to the "ipreallocated" event. However, reconfiguration in "monitor" events has been kept as a last resort in case an "ipreallocated" event does not occur.

The only important case that this covers is "ctdb deleteip", where "releaseip" events are generated without a corresponding "ipreallocated". Therefore, IPs can be deleted without running the required service reconfiguration.

The supported way of removing IP addresses is now via "ctdb reloadips", which always causes a takeover run with a corresponding "ipreallocated" event. This means that service reconfiguration in "monitor" events is no longer required and should be removed because it is unsafe.

Also update the associated tests. Make the first test confirm that the monitor event no longer does reconfiguration. Change the others to test that monitor status is correctly replayed when something else is doing a reconfigure and currently holds the reconfigure lock.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Dec 17 06:32:35 CET 2013 on sn-devel-104
-rwxr-xr-x  ctdb/config/functions                        | 10
-rwxr-xr-x  ctdb/tests/eventscripts/60.nfs.multi.002.sh  | 10
-rwxr-xr-x  ctdb/tests/eventscripts/60.nfs.multi.003.sh  |  5
-rwxr-xr-x  ctdb/tests/eventscripts/60.nfs.multi.004.sh  |  5
-rwxr-xr-x  ctdb/tests/eventscripts/60.nfs.multi.005.sh  |  5

5 files changed, 11 insertions, 24 deletions
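
The behaviour described in the commit message can be illustrated with a small shell sketch. It is not the code in ctdb/config/functions: every `sketch_*` function and both state variables are hypothetical stand-ins, the lock is simulated with a variable rather than a real lock, and the assumption that reconfiguration happens in the "ipreallocated" branch is taken from the commit message rather than from the hunk context. The point is only that "monitor" no longer has a branch that can restart the service, while a "monitor" event that finds the lock held still replays the previous status.

```shell
#!/bin/sh
# Hypothetical state, standing in for CTDB's real flag and lock handling.
needs_reconfigure=true   # pretend an earlier takeip/releaseip flagged the service
lock_held=false          # pretend nobody else holds the reconfigure lock

sketch_needs_reconfigure () { [ "$needs_reconfigure" = "true" ]; }

# Succeeds only when the (simulated) reconfigure lock is free.
sketch_try_lock () { [ "$lock_held" = "false" ]; }

sketch_reconfigure () {
    echo "Reconfiguring service..."
    needs_reconfigure=false
}

sketch_replay_monitor_status () {
    echo "Replaying previous status..."
}

# Dispatch on the event name, mirroring the shape of the change:
# reconfigure on "ipreallocated" only, never on "monitor".
sketch_check_reconfigure () {
    _event="$1"
    if sketch_try_lock ; then
        case "$_event" in
        ipreallocated)
            if sketch_needs_reconfigure ; then
                sketch_reconfigure
            fi
            ;;
        # Deliberately no "monitor" branch: a cancelled monitor event
        # must not be able to leave a service stopped mid-restart.
        esac
    else
        # Someone else is reconfiguring; a monitor event can only
        # report the last known status.
        case "$_event" in
        monitor)
            sketch_replay_monitor_status
            ;;
        esac
    fi
}
```

Calling `sketch_check_reconfigure monitor` with the lock free prints nothing, which corresponds to the `ok_null` expectation in the first updated test. Setting `lock_held=true` first makes the same call print the replay message, which is the situation the remaining tests set up by taking the reconfigure lock before the "monitor" event.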
diff --git a/ctdb/config/functions b/ctdb/config/functions
index aa31f89103..4430d866bf 100755
--- a/ctdb/config/functions
+++ b/ctdb/config/functions
@@ -1195,16 +1195,6 @@ ctdb_service_check_reconfigure ()
ctdb_service_reconfigure
fi
;;
- monitor)
- if ctdb_service_needs_reconfigure ; then
- ctdb_service_reconfigure
- # Given that the reconfigure might not have
- # resulted in the service being stable yet, we
- # replay the previous status since that's the best
- # information we have.
- ctdb_replay_monitor_status
- fi
- ;;
esac
else
# Somebody else is running an event we don't want to collide
diff --git a/ctdb/tests/eventscripts/60.nfs.multi.002.sh b/ctdb/tests/eventscripts/60.nfs.multi.002.sh
index 350c1bc726..29386c13b2 100755
--- a/ctdb/tests/eventscripts/60.nfs.multi.002.sh
+++ b/ctdb/tests/eventscripts/60.nfs.multi.002.sh
@@ -2,7 +2,7 @@
. "${TEST_SCRIPTS_DIR}/unit.sh"
-define_test "takeip, monitor -> reconfigure"
+define_test "takeip, monitor -> no reconfigure"
setup_nfs
@@ -12,12 +12,6 @@ ok_null
simple_test_event "takeip" $public_address
-# This currently assumes that ctdb scriptstatus will always return a
-# good status (when replaying). That should change and we will need
-# to split this into 2 tests.
-ok <<EOF
-Reconfiguring service "nfs"...
-Replaying previous status for this script due to reconfigure...
-EOF
+ok_null
simple_test_event "monitor"
diff --git a/ctdb/tests/eventscripts/60.nfs.multi.003.sh b/ctdb/tests/eventscripts/60.nfs.multi.003.sh
index 68f45ab15d..653dece07a 100755
--- a/ctdb/tests/eventscripts/60.nfs.multi.003.sh
+++ b/ctdb/tests/eventscripts/60.nfs.multi.003.sh
@@ -2,7 +2,7 @@
. "${TEST_SCRIPTS_DIR}/unit.sh"
-define_test "takeip, monitor -> reconfigure, replay error"
+define_test "takeip, take reconfigure lock, monitor -> replay error"
setup_nfs
@@ -16,8 +16,9 @@ simple_test_event "takeip" $public_address
ctdb_fake_scriptstatus 1 "ERROR" "$err"
+eventscript_call ctdb_reconfigure_try_lock
+
required_result 1 <<EOF
-Reconfiguring service "nfs"...
Replaying previous status for this script due to reconfigure...
$err
EOF
diff --git a/ctdb/tests/eventscripts/60.nfs.multi.004.sh b/ctdb/tests/eventscripts/60.nfs.multi.004.sh
index b071ec8bd9..43323cf61f 100755
--- a/ctdb/tests/eventscripts/60.nfs.multi.004.sh
+++ b/ctdb/tests/eventscripts/60.nfs.multi.004.sh
@@ -2,7 +2,7 @@
. "${TEST_SCRIPTS_DIR}/unit.sh"
-define_test "takeip, monitor -> reconfigure, replay timedout"
+define_test "takeip, take reconfigure lock, monitor -> reconfigure, replay timedout"
setup_nfs
@@ -16,8 +16,9 @@ simple_test_event "takeip" $public_address
ctdb_fake_scriptstatus -62 "TIMEDOUT" "$err"
+eventscript_call ctdb_reconfigure_try_lock
+
required_result 1 <<EOF
-Reconfiguring service "nfs"...
Replaying previous status for this script due to reconfigure...
[Replay of TIMEDOUT scriptstatus - note incorrect return code.] $err
EOF
diff --git a/ctdb/tests/eventscripts/60.nfs.multi.005.sh b/ctdb/tests/eventscripts/60.nfs.multi.005.sh
index 82802aa01e..9816bec838 100755
--- a/ctdb/tests/eventscripts/60.nfs.multi.005.sh
+++ b/ctdb/tests/eventscripts/60.nfs.multi.005.sh
@@ -2,7 +2,7 @@
. "${TEST_SCRIPTS_DIR}/unit.sh"
-define_test "takeip, monitor -> reconfigure, replay disabled"
+define_test "takeip, take reconfigure lock, monitor -> reconfigure, replay disabled"
setup_nfs
@@ -16,8 +16,9 @@ simple_test_event "takeip" $public_address
ctdb_fake_scriptstatus -8 "DISABLED" "$err"
+eventscript_call ctdb_reconfigure_try_lock
+
ok <<EOF
-Reconfiguring service "nfs"...
Replaying previous status for this script due to reconfigure...
[Replay of DISABLED scriptstatus - note incorrect return code.] $err
EOF