| author | Martin Schwenke <martin@meltin.net> | 2011-05-16 14:23:28 +1000 |
|---|---|---|
| committer | Martin Schwenke <martin@meltin.net> | 2011-08-30 14:29:48 +1000 |
| commit | b97625acb6c4f58c4df732043de06de4fd7b4654 | |
| tree | 07c9e13dc2d5de77943450791d4bc938a8fbc679 | |
| parent | 94c34295670c69338bed743d440f2f65b1af37a8 | |
Eventscripts: add a synchronous synthetic reconfigure event.
In the current code, services can only be reconfigured asynchronously.
This means that configuration file changes can be made and an
asynchronous reconfigure event can be triggered, and the trigger always
succeeds. Some time later, when the service is actually reconfigured, a
failure may be seen.
This adds a synthetic reconfigure event that reconfigures a service
synchronously so that any failure is reported on exit.
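The difference between the two styles can be sketched in a few lines of shell. This is a minimal illustration, not ctdb code: `demo_service_restart`, `demo_reconfigure_async`, `demo_reconfigure_sync` and the temporary files are hypothetical stand-ins. The asynchronous variant only records that a reconfigure is wanted and therefore always succeeds; the synchronous variant performs the restart immediately and hands its status back to the caller.

```shell
# Hypothetical files for this sketch only.
demo_dir=$(mktemp -d)
demo_conf="$demo_dir/service.conf"   # stand-in for a service config file
demo_flag="$demo_dir/reconfigure"    # present => reconfigure scheduled

# Hypothetical service restart: fails if the config contains "bad".
demo_service_restart ()
{
    ! grep -q bad "$demo_conf"
}

# Asynchronous style: only schedule the reconfigure.
# The caller always sees success; any failure surfaces later.
demo_reconfigure_async ()
{
    touch "$demo_flag"
    return 0
}

# Synchronous style: reconfigure now, so the caller sees the result.
demo_reconfigure_sync ()
{
    rm -f "$demo_flag"
    demo_service_restart
}
```

With a broken config, `demo_reconfigure_async` still returns 0, whereas `demo_reconfigure_sync` returns non-zero straight away.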
ctdb_service_check_reconfigure() is essentially reimplemented.
If a reconfigure event is in flight and an ipreallocated or monitor
event occurs then any scheduled asynchronous reconfigure is deferred
until the next monitor cycle. This is to avoid reconfigures trampling
on each other. In this case a monitor event will also replay the
previous status to try to avoid exposing any temporary instability.
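Deferral works because a deferred reconfigure simply stays scheduled. The sketch below assumes, for illustration only, that "scheduled" is represented by a marker file; `demo_flag`, `demo_lock_held` and `demo_monitor` are hypothetical names. While another reconfigure is in flight, the monitor stand-in leaves the marker in place, and a later cycle acts on it.

```shell
# Hypothetical state for this sketch only.
demo_dir=$(mktemp -d)
demo_flag="$demo_dir/reconfigure"   # present => a reconfigure is scheduled
demo_lock_held=no                   # yes => another reconfigure is in flight

# Stand-in for the reconfigure handling in a monitor event.
demo_monitor ()
{
    if [ ! -e "$demo_flag" ] ; then
        echo "nothing to do"
        return 0
    fi
    if [ "$demo_lock_held" = yes ] ; then
        # Defer: leave the marker so a later monitor cycle picks it up.
        echo "deferred"
        return 0
    fi
    # Unschedule, then (in real code) restart the service.
    rm -f "$demo_flag"
    echo "reconfigured"
}
```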
If a reconfigure event collides with another reconfigure event it will
exit with status 2, indicating that the reconfigure should be retried.
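A caller of the synthetic event has to treat status 2 as "try again". A hedged sketch of such a caller follows. `demo_run_reconfigure` is a hypothetical stand-in for however the event is actually run, rigged to collide twice before succeeding; the retry limit and delay are arbitrary choices, not values taken from ctdb.

```shell
# Hypothetical stand-in for running the synthetic reconfigure event:
# it reports a collision (status 2) until its third attempt.
demo_attempts=0
demo_run_reconfigure ()
{
    demo_attempts=$((demo_attempts + 1))
    [ "$demo_attempts" -ge 3 ] || return 2
    return 0
}

# Retry while the event reports a collision.
# $1: maximum number of attempts (default 5); $2: delay in seconds (default 1).
demo_reconfigure_with_retry ()
{
    _max="${1:-5}"
    _delay="${2:-1}"
    _try=1
    while : ; do
        _status=0
        demo_run_reconfigure || _status=$?
        # Anything other than 2 is a final answer (success or real failure).
        [ "$_status" -eq 2 ] || return "$_status"
        # Give up, still reporting "retry", once the limit is reached.
        [ "$_try" -lt "$_max" ] || return 2
        _try=$((_try + 1))
        sleep "$_delay"
    done
}
```

Any status other than 2 is passed through unchanged, so a genuine reconfigure failure is not masked by the retry loop.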
The reconfigure event is implemented using a subprocess to control the
exit from the synthetic event.
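One effect of wrapping the work in a subshell, as `(ctdb_service_reconfigure "$@")` does in the patch, is that an `exit` anywhere inside it only ends the subshell. The caller therefore gets control back and decides how the synthetic event itself exits. The sketch demonstrates this shell behaviour with hypothetical functions; it is not a claim about every reason the author had for the subshell.

```shell
# Hypothetical reconfigure helper that calls "exit" rather than "return".
demo_reconfigure_helper ()
{
    echo "restarting"
    exit 3
}

# The subshell confines the helper's "exit" to the subshell, so this
# function keeps running and can report the status itself.
demo_reconfigure_event ()
{
    _status=0
    (demo_reconfigure_helper) || _status=$?
    echo "reconfigure finished with status $_status"
    return "$_status"
}
```

Without the parentheses, the helper's `exit 3` would terminate the calling shell before the final `echo` ran.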
As before, if a monitor event causes a scheduled reconfigure to occur
then it will replay the previous status for the service, given that a
reconfigure can cause temporary instability.
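The replay boils down to a pattern match on `ctdb scriptstatus` output. The sketch reuses the regular expression from the patch but reads from standard input so it can run without ctdb. The sample lines in `demo_sample` are made up, and their layout is only inferred from that regular expression.

```shell
# Made-up lines in the style matched by the patch's regular expression.
demo_sample='50.samba           Status:OK    Duration:0.120
60.nfs             Status:ERROR Duration:0.450'

# Succeeds if the named script's previous status was OK.
# Reads scriptstatus-style text on stdin instead of running "ctdb scriptstatus".
demo_replay_status ()
{
    grep -q -E "^${1}[[:space:]]+Status:OK[[:space:]]"
}
```

In the patch the result is passed straight to `exit $?`, so the monitor event ends with the previous status instead of running its usual checks.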
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 220578bfd3507152b29ba4c28942f9d5e8733886)
| -rwxr-xr-x | ctdb/config/functions | 109 |
1 files changed, 91 insertions, 18 deletions
```diff
diff --git a/ctdb/config/functions b/ctdb/config/functions
index e30e57dba0..614f626cb9 100755
--- a/ctdb/config/functions
+++ b/ctdb/config/functions
@@ -1014,7 +1014,7 @@ ctdb_service_unset_reconfigure ()

 ctdb_service_reconfigure ()
 {
-    echo "Reconfiguring service \"$service_name\"..."
+    echo "Reconfiguring service \"$@\"..."
     ctdb_service_unset_reconfigure "$@"
     service_reconfigure "$@" || return $?
     ctdb_counter_init "$@"
@@ -1026,28 +1026,101 @@ service_reconfigure ()
     service "${1:-$service_name}" restart
 }

+ctdb_reconfigure_try_lock ()
+{
+
+    _ctdb_service_reconfigure_common "$@"
+    _lock="${_d}/reconfigure_lock"
+    touch "$_lock"
+
+    (
+        flock 0
+        # This is overkill but will work if we need to extend this to
+        # allow certain events to run multiple times in parallel
+        # (e.g. takeip) and write multiple PIDs to the file.
+        read _locker_event
+        if [ -n "$_locker_event" ] ; then
+            while read _pid ; do
+                if [ -n "$_pid" -a "$_pid" != $$ ] && \
+                    kill -0 "$_pid" 2>/dev/null ; then
+                    exit 1
+                fi
+            done
+        fi
+
+        printf "%s\n%s\n" "$event_name" $$ >"$_lock"
+        exit 0
+    ) <"$_lock"
+}
+
+ctdb_replay_monitor_status ()
+{
+    echo "Replaying previous status for this script due to reconfigure..."
+    ctdb scriptstatus | \
+        grep -q -E "^${script_name}[[:space:]]+Status:OK[[:space:]]"
+    exit $?
+}
+
 ctdb_service_check_reconfigure ()
 {
-    # Only do this for certain events.
+    [ -n "$1" ] || set -- "$service_name"
+
+    # We only care about some events in this function. For others we
+    # return now.
     case "$event_name" in
-        monitor|ipreallocated) : ;;
-        *) return 0
+        monitor|ipreallocated|reconfigure) : ;;
+        *) return 0 ;;
     esac

-    if ctdb_service_needs_reconfigure "$@" ; then
-        ctdb_service_reconfigure "$@"
-
-        # Fall through to non-monitor events.
-        [ "$event_name" = "monitor" ] || return 0
-
-        # We don't want to proceed with the rest of the monitor event
-        # here, so we exit. However, if we exit 0 then, if the
-        # service was previously broken, we might return a false
-        # positive. So we simply retrieve the status of this script
-        # from the previous monitor loop and exit with that status.
-        ctdb scriptstatus | \
-            grep -q -E "^${script_name}[[:space:]]+Status:OK[[:space:]]"
-        exit $?
+    if ctdb_reconfigure_try_lock "$@" ; then
+        # No events covered by this function are running, so proceed
+        # with gay abandon.
+        case "$event_name" in
+            reconfigure)
+                (ctdb_service_reconfigure "$@")
+                exit $?
+                ;;
+            ipreallocated)
+                if ctdb_service_needs_reconfigure "$@" ; then
+                    ctdb_service_reconfigure "$@"
+                fi
+                ;;
+            monitor)
+                if ctdb_service_needs_reconfigure "$@" ; then
+                    ctdb_service_reconfigure "$@"
+                    # Given that the reconfigure might not have
+                    # resulted in the service being stable yet, we
+                    # replay the previous status since that's the best
+                    # information we have.
+                    ctdb_replay_monitor_status
+                fi
+                ;;
+        esac
+    else
+        # Somebody else is running an event we don't want to collide
+        # with. We proceed with caution.
+        case "$event_name" in
+            reconfigure)
+                # Tell whoever called us to retry.
+                exit 2
+                ;;
+            ipreallocated)
+                # Defer any scheduled reconfigure and just run the
+                # rest of the ipreallocated event, as per the
+                # eventscript. There's an assumption here that the
+                # event doesn't depend on any scheduled reconfigure.
+                # This is true in the current code.
+                return 0
+                ;;
+            monitor)
+                # There is most likely a reconfigure in progress so
+                # the service is possibly unstable. As above, we
+                # defer any scheduled reconfigured. We also replay
+                # the previous monitor status since that's the best
+                # information we have.
+                ctdb_replay_monitor_status
+                ;;
+        esac
     fi
 }
```
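The locking in `ctdb_reconfigure_try_lock` can be exercised on its own. The sketch below is a trimmed, standalone copy under stated assumptions: util-linux `flock(1)` is installed, and the lock file path and event name are passed as arguments instead of being derived from `_ctdb_service_reconfigure_common` and `$event_name`. The lock file holds the holder's event name on its first line and PID(s) on the following lines; the lock is granted unless some other listed PID is still alive.

```shell
# Standalone sketch of the try-lock idea (hypothetical name demo_try_lock).
# $1: lock file path, $2: event name to record as the new holder.
demo_try_lock ()
{
    _lock="$1"
    _event="$2"
    touch "$_lock"
    (
        flock 0                       # serialise access to the lock file
        read _locker_event            # line 1: event name of current holder
        if [ -n "$_locker_event" ] ; then
            while read _pid ; do      # remaining lines: holder PIDs
                if [ -n "$_pid" ] && [ "$_pid" != "$$" ] && \
                    kill -0 "$_pid" 2>/dev/null ; then
                    exit 1            # another live process holds the lock
                fi
            done
        fi
        # Record ourselves as the holder ($$ is the parent shell's PID).
        printf "%s\n%s\n" "$_event" "$$" >"$_lock"
        exit 0
    ) <"$_lock"
}
```

Because `$$` inside a `( ... )` subshell still expands to the parent shell's PID, repeated calls from the same script succeed, and a lock file left behind by a process that has died does not block later callers.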
