author: Stephen Gallagher <sgallagh@redhat.com> 2014-12-10 14:16:49 -0500
committer: Jakub Hrozek <jhrozek@redhat.com> 2015-01-07 12:09:32 +0100
commit: 152251b13a99c88054055d46600e0478c4f7bd05
tree: a1f841a86c1d991cf2fa5782b579248a291fa19a /contrib/rhel
parent: ad1bc5e129a9a2128851aa028247f8e5fab54cc8
monitor: Service restart fixes

There are actually two bugs here:

1) When either the kill(SIGTERM) or kill(SIGKILL) commands returned
   failure (for any reason), we would talloc_free(svc) which removed it
   from being eligible for restart, resulting in the service never
   starting again without an SSSD service restart.

2) There is a fairly wide race condition where it's possible for a
   SIGKILL timer to "catch up" to the child exit handler between us
   noticing the termination and actually restarting it. The race happens
   because we re-enter the mainloop and add a restart timeout to avoid a
   quick failure if we keep restarting due to a transitory issue (the
   mt_svc object, and therefore the SIGKILL timer, were never freed until
   we got to the actual service restart).

We can minimize this race by recording the timer_event for the SIGKILL
timeout in the mt_svc object. This way, if the process exits via SIGTERM,
we will immediately remove the timer for the SIGKILL.

Additionally, we'll catch the special-case of an ESRCH response from the
kill(SIGKILL) and assume that it means that the process has exited. The
only other two possible errors are

* EINVAL: (an invalid signal was specified) - This should be impossible,
  obviously.
* EPERM: This process doesn't have permission to send signals to this
  PID. If this happens, it's either an SELinux bug or else the process
  has terminated and a new process that SSSD doesn't control has taken
  the ID over.

So in the incredibly unlikely case that one of those occurs, we'll just
go ahead and try to start a new process.

This patch also removes the incorrect talloc_free(svc) calls on the
kill() failures and replaces them with an attempt to just start up the
service again and hope for the best.

Resolves:
https://fedorahosted.org/sssd/ticket/2525

Reviewed-by: Pavel Březina <pbrezina@redhat.com>
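
The kill() error handling described in the message can be sketched roughly as follows. This is an illustration only, not the code from the patch: the enum and the function names classify_kill_result and send_stop_signal are hypothetical, and the real monitor works on the mt_svc object and its timers rather than on a bare PID.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Hypothetical outcome codes for this sketch; not SSSD identifiers. */
enum kill_outcome {
    KILL_SENT,            /* signal delivered; wait for the child exit handler */
    KILL_ALREADY_EXITED,  /* ESRCH: no such process, assume it has exited */
    KILL_TRY_RESTART      /* EINVAL or EPERM: unexpected, try a restart anyway */
};

/* Map the return value and errno of kill() onto an action.
 * Note that no branch frees the service object: every failure path
 * still leaves the service eligible for restart. */
enum kill_outcome classify_kill_result(int ret, int err)
{
    if (ret == 0) {
        return KILL_SENT;
    }
    if (err == ESRCH) {
        return KILL_ALREADY_EXITED;
    }
    /* EINVAL "should be impossible"; EPERM means an SELinux problem or
     * a recycled PID. Either way, attempt to start a new process. */
    return KILL_TRY_RESTART;
}

/* Send SIGTERM or SIGKILL to a child and classify the result. */
enum kill_outcome send_stop_signal(pid_t pid, int sig)
{
    int ret;
    int err;

    /* Only these two signals are used, which is why EINVAL is not expected. */
    assert(sig == SIGTERM || sig == SIGKILL);

    ret = kill(pid, sig);
    err = (ret == -1) ? errno : 0;
    return classify_kill_result(ret, err);
}
```

In this sketch a caller would restart the service on both KILL_ALREADY_EXITED and KILL_TRY_RESTART, and on KILL_SENT would wait for the exit handler, which is also where the stored SIGKILL timer would be removed.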
Diffstat (limited to 'contrib/rhel')
0 files changed, 0 insertions, 0 deletions