sssd.git

author	Stephen Gallagher <sgallagh@redhat.com>	2014-12-10 14:16:49 -0500
committer	Jakub Hrozek <jhrozek@redhat.com>	2015-01-07 12:09:32 +0100
commit	152251b13a99c88054055d46600e0478c4f7bd05 (patch)
tree	a1f841a86c1d991cf2fa5782b579248a291fa19a /contrib/rhel
parent	ad1bc5e129a9a2128851aa028247f8e5fab54cc8 (diff)

monitor: Service restart fixes

There are actually two bugs here:

1) When either the kill(SIGTERM) or kill(SIGKILL) commands returned
   failure (for any reason), we would talloc_free(svc) which removed it
   from being eligible for restart, resulting in the service never
   starting again without an SSSD service restart.

2) There is a fairly wide race condition where it's possible for a
   SIGKILL timer to "catch up" to the child exit handler between us
   noticing the termination and actually restarting it. The race happens
   because we re-enter the mainloop and add a restart timeout to avoid a
   quick failure if we keep restarting due to a transitory issue (the
   mt_svc object, and therefore the SIGKILL timer, were never freed until
   we got to the actual service restart).

We can minimize this race by recording the timer_event for the SIGKILL
timeout in the mt_svc object. This way, if the process exits via SIGTERM,
we will immediately remove the timer for the SIGKILL.

Additionally, we'll catch the special-case of an ESRCH response from the
kill(SIGKILL) and assume that it means that the process has exited. The
only other two possible errors are

* EINVAL: (an invalid signal was specified) - This should be impossible,
  obviously.
* EPERM: This process doesn't have permission to send signals to this
  PID. If this happens, it's either an SELinux bug or else the process
  has terminated and a new process that SSSD doesn't control has taken
  the ID over.

So in the incredibly unlikely case that one of those occurs, we'll just
go ahead and try to start a new process.

This patch also removes the incorrect talloc_free(svc) calls on the
kill() failures and replaces them with an attempt to just start up the
service again and hope for the best.

Resolves:
https://fedorahosted.org/sssd/ticket/2525

Reviewed-by: Pavel Březina <pbrezina@redhat.com>
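
The errno handling described in the message can be illustrated with a minimal sketch. This is not the code from the patch: the enum, its values, and the functions classify_kill_errno() and send_sigkill() are hypothetical names made up for illustration. Only kill(), SIGKILL, and the errno values ESRCH, EINVAL, and EPERM come from POSIX and the commit message.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Hypothetical outcome codes; these are not identifiers from SSSD. */
enum kill_outcome {
    KILL_SENT,           /* signal was sent; wait for the child exit handler */
    KILL_ALREADY_GONE,   /* ESRCH: assume the process has already exited */
    KILL_RESTART_ANYWAY  /* EINVAL or EPERM: just try to start a new process */
};

/* Map the errno left by kill() (0 on success) to the action to take.
 * In no case is the service object freed, so it stays eligible for
 * restart. */
enum kill_outcome classify_kill_errno(int err)
{
    switch (err) {
    case 0:
        return KILL_SENT;
    case ESRCH:
        return KILL_ALREADY_GONE;
    default:
        /* EINVAL should be impossible; EPERM means an SELinux problem or
         * a reused PID. Either way, restarting is the best we can do. */
        return KILL_RESTART_ANYWAY;
    }
}

/* Send SIGKILL to one child and report what the caller should do next. */
enum kill_outcome send_sigkill(pid_t pid)
{
    /* pid <= 0 would address a process group or every process. */
    assert(pid > 0);

    if (kill(pid, SIGKILL) == 0) {
        return KILL_SENT;
    }
    return classify_kill_errno(errno);
}
```

In this sketch, a caller would treat both KILL_ALREADY_GONE and KILL_RESTART_ANYWAY as a cue to schedule the service restart, which mirrors the message's point that a kill() failure should never make the service ineligible for restart.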

Diffstat (limited to 'contrib/rhel')

0 files changed, 0 insertions, 0 deletions

