Closed
Bug 868512
Opened 11 years ago
Closed 7 years ago
investigate why socorro processors are not shut down in timely manner
Categories
(Socorro :: Backend, task)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: rhelmer, Unassigned)
See bug 864823 comment 5.

I've been investigating why Socorro apps are not always shut down in a timely manner. The only workable theory I have right now is:

1) kill signal sent to app
2) 15 seconds elapse
3) kill -KILL signal sent to app
4) app is still running, in uninterruptible sleep
5) start is sent (before #4 has completed)

Is 15s just too short? Can we change the design of processor/monitor/crashmover so that this can't happen? I'd rather throw away the work they are doing and have them respond to a kill signal in a timely manner than wait an indeterminate length of time.
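The sequence in the theory above can be sketched as a stop routine that refuses to report success until the process has really exited, so that a start is never issued while step 4 is still in progress. This is only an illustration, not the actual Socorro init script: the function name `stop_and_confirm` and the `kill_wait_seconds` parameter are made up here, and the 15 second default is taken from the description.

```python
import subprocess


def stop_and_confirm(proc, grace_seconds=15.0, kill_wait_seconds=30.0):
    """Stop a subprocess.Popen child; return True only once it has exited.

    Hypothetical helper: the names and the second timeout are assumptions.
    """
    proc.terminate()  # SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_seconds)
        return True
    except subprocess.TimeoutExpired:
        pass  # grace period elapsed, escalate

    proc.kill()  # SIGKILL on POSIX
    try:
        proc.wait(timeout=kill_wait_seconds)
        return True
    except subprocess.TimeoutExpired:
        # The process may be in uninterruptible sleep. The caller should
        # not start a new instance until this one is confirmed gone.
        return False
```

A caller would issue the start only when this returns True, and otherwise report the stuck process instead of starting a second copy alongside it.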
Reporter
Updated•11 years ago
Assignee: nobody → rhelmer
Status: NEW → ASSIGNED
Comment 1•11 years ago
I suspect one or more threads are in an I/O op that prevents them from checking for SIGINT/SIGTERM for 15s. I'm not sure how long to extend the waiting period, though.
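One way to picture the concern in this comment: a worker only notices a shutdown request between units of work, so any single blocking call that outlasts the 15s grace period defeats it. The sketch below is a generic pattern, not Socorro's actual threading code; `shutdown_requested`, `request_shutdown`, `install_handlers` and `worker_loop` are hypothetical names. It only helps if each job is itself bounded, for example by I/O timeouts.

```python
import signal
import threading

# Set by the signal handler, polled by every worker between jobs.
shutdown_requested = threading.Event()


def request_shutdown(signum, frame):
    """Signal handler: only records that a shutdown was requested."""
    shutdown_requested.set()


def install_handlers():
    """Must be called from the main thread."""
    signal.signal(signal.SIGTERM, request_shutdown)
    signal.signal(signal.SIGINT, request_shutdown)


def worker_loop(process_one_job, poll_seconds=1.0):
    """Run jobs until shutdown is requested; return how many ran.

    process_one_job is a caller-supplied callable that returns True if it
    did work and False if the queue was empty.
    """
    done = 0
    while not shutdown_requested.is_set():
        if process_one_job():
            done += 1
        else:
            # Idle: sleep, but wake immediately if shutdown is requested.
            shutdown_requested.wait(poll_seconds)
    return done
```

With this shape, the time to respond to a signal is bounded by the longest single job rather than by the length of the queue.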
Comment 3•11 years ago
During today's release we had multiple failures to shut down. See release bug: https://bugzilla.mozilla.org/show_bug.cgi?id=897612

There were also 10 minidump_stackwalk processes running on processor 05 when it failed.

solarce: [root@sp-processor05 ~]# ps waux | grep mini | wc -l
solarce: 10
solarce: should there be /data/socorro/stackwalk/bin/minidump_stackwalk processes running when the service is stopped?

I remember lars talking about unexpectedly long-running minidump_stackwalk processes in IRC before he left. Is that related to what we're experiencing here?
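If long-running minidump_stackwalk children do turn out to be related (this bug does not confirm that), one possible mitigation is to give each external invocation a deadline so that it cannot outlive the grace period indefinitely. The sketch below is an assumption about how that could look, not how the processor actually invokes minidump_stackwalk; `run_with_deadline` is a hypothetical helper and the command is whatever the caller passes in.

```python
import subprocess


def run_with_deadline(cmd, timeout_seconds):
    """Run an external command; return (returncode, stdout) or None on timeout.

    subprocess.run kills and reaps the child itself when the timeout expires.
    """
    try:
        completed = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return None  # caller decides whether to retry or give up on this job
    return completed.returncode, completed.stdout
```

A child stuck in uninterruptible sleep would still not go away until its I/O returns, so this bounds the common case rather than every case.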
Summary: investigate why socorro apps are not shut down in timely manner → investigate why socorro processors are not shut down in timely manner