Closed Bug 980435 Opened 11 years ago Closed 11 years ago

processor lockfiles outlive their usefulness after SIGKILL

Categories

(Socorro :: Infra, task)

x86_64
Linux
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: lars, Assigned: dmaher)

Details

The socorro-processor init.d script uses the killproc function to shut down.  If the processor exceeds killproc's delay time, killproc sends SIGKILL to the processor.  While that stops the processor, it also appears to stop "daemonize" from properly removing the lockfile, so we end up having to remove it manually.  That messes up automation for deployments of new code that requires a processor restart.

look into using the pidfile instead of the lockfile as system documentation suggests is possible.  Alternatively, in socorro-processor, try to just delete the lock file after killproc has run.
(In reply to K Lars Lohn [:lars] [:klohn] from comment #0)
> look into using the pidfile instead of the lockfile as system documentation
> suggests is possible.  Alternatively, in socorro-processor, try to just
> delete the lock file after killproc has run.

To amplify:

The "killproc" function is provided by the RHEL-standard init library "functions".  In said function, the final code block is as follows :

    # Remove pid file if any.
    if [ -z "$killlevel" ]; then
        rm -f "${pid_file:-/var/run/$base.pid}"
    fi
    return $RC

Since we are not explicitly declaring a kill level (i.e. we run killproc without passing it as an argument), this block should run; of course, it only removes the pidfile (not the lockfile), which is where an alternate usage of daemonize comes into play.  The man page of daemonize states :

       It is possible to use the pidfile as the lock file (e.g., "-p /var/run/foo -l /var/run/foo"), though typical daemons use separate files.

This would have the net effect of using the same file for both purposes, meaning that when killproc removes the pidfile, it would also remove the lockfile, as they would be one and the same.  This could be a relatively clean solution to the problem that leverages existing system functionality without altering expected behaviour too much (i.e. makes sysadmins happy).
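To make the difference concrete, here is a rough simulation of that final killproc block against temporary files. It is only a sketch: remove_pid_file and state_of are hypothetical helpers written for this illustration, the file names are made up, and neither the real killproc nor daemonize is invoked.

```shell
#!/bin/sh
# Stand-in for the final block of killproc quoted above (not the real function).
# $1 is the pidfile path, $2 the kill level (empty when none was passed).
remove_pid_file() {
    if [ -z "$2" ]; then
        rm -f "$1"
    fi
}

# Report whether a path still exists after cleanup.
state_of() {
    if [ -e "$1" ]; then echo "stale"; else echo "removed"; fi
}

workdir=$(mktemp -d)

# Case 1: separate pidfile and lockfile, as the init script does today.
touch "$workdir/processor.pid" "$workdir/processor.lock"
remove_pid_file "$workdir/processor.pid" ""
separate_result=$(state_of "$workdir/processor.lock")

# Case 2: a single file serving as both pidfile and lockfile.
touch "$workdir/processor"
remove_pid_file "$workdir/processor" ""
shared_result=$(state_of "$workdir/processor")

echo "separate files: lockfile $separate_result"
echo "shared file:    lockfile $shared_result"

rm -rf "$workdir"
```

With separate files the lockfile survives the cleanup; with a shared file it goes away together with the pidfile, which is the behaviour the "-p"/"-l" trick relies on.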

The alternative, as :lars noted, is to simply rm the lock file after killproc has run.  This would also have the net effect of removing the lockfile, but is generally considered bad form from a systems perspective.  If the former solution is one of finesse, this is one of brute force - that said, it's simple and effective.
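For comparison, the brute force variant might look something like the following in the init script. This is a sketch under assumptions: PROG and LOCKFILE are hypothetical names and paths, and killproc is replaced by a do-nothing stand-in so the snippet runs outside RHEL (the real script would source the init "functions" library instead).

```shell
#!/bin/sh
# Hypothetical names and paths, chosen for illustration only.
PROG=${PROG:-socorro-processor}
LOCKFILE=${LOCKFILE:-/tmp/$PROG.lock.example}

# Stand-in for killproc. In the real init script this function comes from
# the RHEL init library, e.g.:  . /etc/rc.d/init.d/functions
killproc() {
    return 0
}

stop() {
    killproc "$PROG"
    rc=$?
    # killproc only cleans up the pidfile, so remove the lockfile ourselves,
    # whether the process exited cleanly or had to be SIGKILLed.
    rm -f "$LOCKFILE"
    return $rc
}

# Simulate a lockfile left behind, then stop.
touch "$LOCKFILE"
stop
```

Returning killproc's exit status rather than rm's keeps the script's result tied to whether the process was actually stopped.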

I'll run some tests and see how the finesse option feels, first.
Some basic tests on an RHEL VM show that the lockfile is removed as expected in both cases.  There is no clear winner between the two.  At the end of the day this isn't such a big issue, so I'm inclined to try the "finesse" option first (since it makes the sysadmin in me happier), and if it doesn't work out, we'll just go the rm route.
Status: NEW → ASSIGNED
Commit pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/5030fecc75467a9aaf5098358e50a8f6b94d427e
Merge pull request #1934 from phrawzty/bug980435

fixes 980435
Closing this for now; re-open if the issue persists.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → 78