Closed Bug 868512 Opened 11 years ago Closed 7 years ago

investigate why socorro processors are not shut down in timely manner

Categories

(Socorro :: Backend, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: rhelmer, Unassigned)

See bug 864823 comment 5. I've been investigating why Socorro apps are not always shut down in a timely manner.

The only workable theory I have right now is:

1) kill signal sent to app
2) 15 seconds elapse
3) kill -KILL signal sent to app
4) app is still running, in uninterruptible sleep
5) start is sent (before #4 has completed)

Is 15s just too short? Can we change the design of processor/monitor/crashmover so that this can't happen?

I'd rather throw away the work they are doing and have them respond to a kill signal promptly than wait an indeterminate length of time.
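To make the race in steps 3 to 5 concrete, here is a minimal sketch of a stop routine that does not return until the process has actually exited, so a following start cannot overlap with it. This is not the existing Socorro init script; `stop_and_confirm`, `grace`, and `kill_wait` are hypothetical names, and the 15s default only mirrors the timeout mentioned above.

```python
import signal
import subprocess


def stop_and_confirm(proc, grace=15.0, kill_wait=60.0):
    """Stop `proc` (a subprocess.Popen) and return only once it has exited.

    Hypothetical helper: raises subprocess.TimeoutExpired if the process
    is still alive `kill_wait` seconds after SIGKILL, which tells the
    caller that it is not yet safe to issue a start.
    """
    proc.send_signal(signal.SIGTERM)      # step 1: polite shutdown request
    try:
        proc.wait(timeout=grace)          # step 2: grace period
    except subprocess.TimeoutExpired:
        proc.kill()                       # step 3: SIGKILL
        # Step 4: a process in uninterruptible sleep does not die until
        # its pending I/O returns, so block here instead of assuming
        # that SIGKILL took effect immediately.
        proc.wait(timeout=kill_wait)
    return proc.returncode
```

The point of the design is only that the restart path depends on a confirmed exit rather than on a fixed delay; it does nothing to shorten the time a process spends in uninterruptible sleep.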
Assignee: nobody → rhelmer
Status: NEW → ASSIGNED
I suspect one or more threads are stuck in an I/O operation that prevents them from checking for SIGINT/SIGTERM within the 15s window. I'm not sure how long to extend the waiting period, though.
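As an illustration of the alternative to a longer wait, here is a sketch of a worker loop that never blocks for more than a short interval without checking a quit flag. It assumes the workers pull jobs from a queue; `worker`, `install_handlers`, and `poll_interval` are hypothetical names, not the actual Socorro threading code.

```python
import queue
import signal
import threading


def worker(jobs, quit_event, results, poll_interval=1.0):
    """Run callables from `jobs` until `quit_event` is set.

    The queue read uses a timeout, so the quit flag is checked at least
    every `poll_interval` seconds even when no work arrives.
    """
    while not quit_event.is_set():
        try:
            job = jobs.get(timeout=poll_interval)
        except queue.Empty:
            continue                      # nothing to do, re-check the flag
        results.append(job())


def install_handlers(quit_event):
    """Have SIGINT/SIGTERM set the flag (must be called from the main thread)."""
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, lambda signum, frame: quit_event.set())
```

This only bounds the wait between jobs. A job that itself blocks in a long I/O call would need its own timeout to get the same guarantee.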
Not actively working on this
Assignee: rhelmer → nobody
During today's release we had multiple failures to shut down. See release bug: https://bugzilla.mozilla.org/show_bug.cgi?id=897612

There were also 10 minidump_stackwalk processes running on processor 05 when it failed.

solarce: [root@sp-processor05 ~]# ps waux | grep mini | wc -l
solarce: 10
solarce: should there be /data/socorro/stackwalk/bin/minidump_stackwalk processes running when the service is stopped?

I remember lars talking about unexpectedly long-running minidump_stackwalk processes in IRC before he left. Is that related to what we're experiencing here?
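If long-running child processes do turn out to be related, one possible mitigation is to run each external command under a deadline and kill its whole process group when the deadline passes. The sketch below is POSIX-only and is not how the processor currently invokes minidump_stackwalk; `run_with_deadline` and `deadline` are hypothetical names.

```python
import os
import signal
import subprocess


def run_with_deadline(argv, deadline):
    """Run `argv`; kill its process group if it outlives `deadline` seconds.

    Returns (returncode, stdout_bytes), or (None, b"") if it was killed.
    """
    # start_new_session=True makes the child its own process group leader,
    # so its pid is also the group id and any helpers it spawns die with it.
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, start_new_session=True)
    try:
        out, _ = proc.communicate(timeout=deadline)
        return proc.returncode, out
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)
        proc.communicate()                # reap the child so no zombie is left
        return None, b""
```

A caller would pass the real command line and whatever deadline is acceptable for a single crash; both are left open here.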
Summary: investigate why socorro apps are not shut down in timely manner → investigate why socorro processors are not shut down in timely manner
pass
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX