author | Stephen Gallagher <sgallagh@redhat.com> | 2014-12-10 14:16:49 -0500 |
---|---|---|
committer | Jakub Hrozek <jhrozek@redhat.com> | 2015-01-07 12:09:32 +0100 |
commit | 152251b13a99c88054055d46600e0478c4f7bd05 (patch) | |
tree | a1f841a86c1d991cf2fa5782b579248a291fa19a /contrib/rhel | |
parent | ad1bc5e129a9a2128851aa028247f8e5fab54cc8 (diff) | |
monitor: Service restart fixes
There are actually two bugs here:
1) When either the kill(SIGTERM) or kill(SIGKILL) call returned
failure (for any reason), we would talloc_free(svc), which made the
service ineligible for restart. As a result, the service never
started again until SSSD itself was restarted.
2) There is a fairly wide race condition in which a SIGKILL timer can
"catch up" to the child exit handler between the moment we notice
the termination and the moment we actually restart the service. The
race happens because we re-enter the mainloop and add a restart
timeout, to avoid failing quickly if we keep restarting due to a
transitory issue. The mt_svc object, and therefore the SIGKILL
timer, was never freed until we got to the actual service
restart.
We can minimize this race by recording the timer_event for the
SIGKILL timeout in the mt_svc object. This way, if the process
exits via SIGTERM, we will immediately remove the timer for the
SIGKILL. Additionally, we'll catch the special-case of an ESRCH
response from the kill(SIGKILL) and assume that it means that the
process has exited. The only two other possible errors are:
* EINVAL: (an invalid signal was specified) - This should be
impossible, obviously.
* EPERM: This process doesn't have permission to send signals to
this PID. If this happens, it's either an SELinux bug or
else the process has terminated and a new process that
SSSD doesn't control has taken the ID over.
So in the incredibly unlikely case that one of those occurs, we'll
just go ahead and try to start a new process.
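
The two mitigations above can be sketched in plain C. This is a minimal
illustration under stated assumptions, not the SSSD code: the toy_svc
struct, its sigkill_timer_armed flag (standing in for the timer_event
recorded in mt_svc), and every function name are hypothetical. The real
monitor uses tevent timers and talloc rather than a boolean flag.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical stand-in for the monitor's per-service object. */
struct toy_svc {
    pid_t pid;
    bool sigkill_timer_armed; /* stands in for the recorded timer event */
};

enum kill_outcome {
    KILL_SENT,         /* signal delivered; wait for the child to exit */
    KILL_ALREADY_GONE, /* ESRCH: assume the process has exited */
    KILL_UNEXPECTED    /* EINVAL or EPERM: try to start a new process */
};

/* Map the return value and errno of kill() to the action to take. */
static enum kill_outcome classify_kill_result(int ret, int err)
{
    if (ret == 0) {
        return KILL_SENT;
    }
    if (err == ESRCH) {
        return KILL_ALREADY_GONE;
    }
    return KILL_UNEXPECTED;
}

/* Child exit handler: drop the pending SIGKILL timer immediately so it
 * cannot fire while the restart timeout is still pending. */
static void toy_on_child_exit(struct toy_svc *svc)
{
    svc->sigkill_timer_armed = false;
}

/* SIGKILL timeout handler. In this sketch a cancelled timer is modelled
 * by the flag check; a real event loop would simply not call us. */
static enum kill_outcome toy_on_sigkill_timeout(struct toy_svc *svc)
{
    int ret;

    assert(svc->pid > 0); /* kill() with pid <= 0 targets process groups */
    if (!svc->sigkill_timer_armed) {
        return KILL_ALREADY_GONE;
    }
    svc->sigkill_timer_armed = false;
    ret = kill(svc->pid, SIGKILL);
    return classify_kill_result(ret, errno);
}
```

In this sketch, both KILL_ALREADY_GONE and KILL_UNEXPECTED lead to the
same next step, a restart attempt. They are kept separate only so that
the unexpected case can be logged differently.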
This patch also removes the incorrect talloc_free(svc) calls on the
kill() failures and replaces them with an attempt to just start up
the service again and hope for the best.
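
The change in the kill() failure path can be illustrated with a small,
self-contained sketch. It assumes a hypothetical toy_service struct and
an injectable kill function so the failure can be simulated; none of
these names come from the SSSD sources, and the real code schedules the
restart through the monitor's event loop rather than setting a flag.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for the monitor's per-service object. */
struct toy_service {
    pid_t pid;
    bool restart_requested;
};

/* Same shape as kill(2), so a test double can be injected. */
typedef int (*kill_fn)(pid_t pid, int sig);

/* Test double: always fails as if permission were denied. */
static int toy_kill_eperm(pid_t pid, int sig)
{
    (void)pid;
    (void)sig;
    errno = EPERM;
    return -1;
}

/* Hypothetical placeholder for scheduling a service restart. */
static void toy_request_restart(struct toy_service *svc)
{
    svc->restart_requested = true;
}

/* Ask the service to stop. Returns true if the signal was sent. */
static bool toy_stop_service(struct toy_service *svc, kill_fn do_kill)
{
    assert(svc != NULL && do_kill != NULL);

    if (do_kill(svc->pid, SIGTERM) != 0) {
        /* The pre-patch behaviour corresponds to freeing svc here,
         * after which nothing could restart the service. Instead,
         * keep the object and attempt a restart. */
        toy_request_restart(svc);
        return false;
    }
    return true;
}
```

Passing the real kill function as do_kill gives the normal behaviour;
passing toy_kill_eperm exercises the failure path without sending any
signal.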
Resolves:
https://fedorahosted.org/sssd/ticket/2525
Reviewed-by: Pavel Březina <pbrezina@redhat.com>
Diffstat (limited to 'contrib/rhel')
0 files changed, 0 insertions, 0 deletions