Bug 879763
Opened 12 years ago
Closed 12 years ago
release runner doesn't die properly when supervisord kills it
Categories
(Release Engineering :: Release Automation, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bhearsum, Assigned: rail)
It seems to get hung, at least when using "supervisorctl stop". Not sure if this happens when using "restart". We end up with process state like this:
root 13268 0.0 0.0 150328 5204 ? Ss Jan30 2:44 /usr/bin/python26 /usr/bin/supervisord
cltbld 6902 0.0 0.0 63864 1128 ? S 06:30 0:00 \_ /bin/bash /home/cltbld/release-runner/tools/buildfarm/release/release-runner.sh
cltbld 6911 0.5 0.2 165284 13356 ? S 06:30 0:00 \_ python release-runner.py -c /home/cltbld/.release-runner.ini
cltbld 6801 0.1 0.2 169132 16988 ? S 06:28 0:00 python release-runner.py -c /home/cltbld/.release-runner.ini
...which is bad because we could have two release runners racing on a release.
One idea for fixing this:
09:54 < jhopkins> bhearsum: try using an exec call to launch python from the shell script. this will result in only one pid beneath supervisord
Updated (Assignee) • 12 years ago
Assignee: nobody → rail
Comment 1 (Assignee) • 12 years ago
Unfortunately we can't easily replace the current invocation method with "exec" because we check the exit status of the child script and send an email if it exits non-zero...
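One way to keep the exit-status check while still making sure the child sees the stop signal is for the wrapper to forward signals itself. The following is only a sketch of that idea in Python, not the existing release-runner.sh; the names `run_and_report` and `on_failure` are hypothetical, and the callback stands in for whatever sends the failure email.

```python
import signal
import subprocess
import sys


def run_and_report(cmd, on_failure):
    """Run cmd, forward SIGTERM/SIGINT to it, and report a non-zero exit.

    Hypothetical helper: on_failure(cmd, status) stands in for the
    failure-email step of the real wrapper.
    """
    child = subprocess.Popen(cmd)

    def forward(signum, frame):
        # Pass the signal on so the child does not outlive the wrapper.
        child.send_signal(signum)

    # Install the forwarding handlers, remembering the previous ones.
    previous = {
        sig: signal.signal(sig, forward)
        for sig in (signal.SIGTERM, signal.SIGINT)
    }
    try:
        status = child.wait()
    finally:
        # Restore the handlers that were active before the call.
        for sig, handler in previous.items():
            signal.signal(sig, handler)

    if status != 0:
        on_failure(cmd, status)
    return status


if __name__ == "__main__":
    # Example: a child that fails with exit status 3.
    run_and_report(
        [sys.executable, "-c", "import sys; sys.exit(3)"],
        lambda cmd, status: print("would send mail: exit status", status),
    )
```

With this shape the wrapper stays a separate process, so the exit status is still available to it, but a stop signal sent to the wrapper is passed on to the child rather than leaving it orphaned.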
Comment 2 (Reporter) • 12 years ago
A more invasive approach would be to get rid of the shell wrapper. The only thing it does is send failure e-mail AFAIK. We could probably get better failure mail with an extra LogHandler in Python, anyways...
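As a rough illustration of the log-handler idea, Python's standard library ships `logging.handlers.SMTPHandler`, which mails each record at or above its level. The sketch below is an assumption about how this could be wired up, not release runner's actual code; the helper name `add_failure_mailer`, the subject line, and the addresses are placeholders.

```python
import logging
import logging.handlers


def add_failure_mailer(logger, mailhost, fromaddr, toaddrs):
    """Attach a handler that emails ERROR-and-above records.

    Hypothetical helper; the subject line and format are placeholders.
    """
    handler = logging.handlers.SMTPHandler(
        mailhost=mailhost,
        fromaddr=fromaddr,
        toaddrs=toaddrs,
        subject="release runner failure",
    )
    # Only errors and worse trigger mail; routine logging stays local.
    handler.setLevel(logging.ERROR)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return handler


if __name__ == "__main__":
    log = logging.getLogger("release-runner")
    # Placeholder host and addresses; nothing is sent until an
    # ERROR-level record is actually logged.
    add_failure_mailer(
        log, "localhost", "runner@example.com", ["releng@example.com"]
    )
```

Calling `log.exception(...)` inside an `except` block would then include the traceback in the mail, which is more detail than a bare exit status gives.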
Comment 3 (Assignee) • 12 years ago
We need "stopasgroup" which was introduced in 3.0b1 (we use 3.0a9).
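For context, the mechanism behind a group-wide stop is signalling the whole process group instead of a single pid. The sketch below is not supervisord's implementation; it only demonstrates the OS-level behaviour with a throwaway shell wrapper standing in for release-runner.sh. The names `stop_group` and `demo_group_stop` are hypothetical, and it assumes a POSIX system with `/bin/sh` and `sleep` available.

```python
import os
import signal
import subprocess


def stop_group(proc, sig=signal.SIGTERM):
    """Signal every process in proc's process group, not just proc."""
    os.killpg(os.getpgid(proc.pid), sig)


def demo_group_stop():
    """Start a shell wrapper with a background child, then stop both."""
    proc = subprocess.Popen(
        # The shell starts a long-running child and waits on it,
        # much like a wrapper script around a Python process.
        ["/bin/sh", "-c", "sleep 60 & wait"],
        stdout=subprocess.PIPE,
        # Give the wrapper its own session and process group so the
        # group signal does not reach the calling process.
        start_new_session=True,
    )
    stop_group(proc)
    # The pipe reaches EOF only once every process that inherited it
    # (the shell and its child) has exited, so returning from
    # communicate() shows that the whole group is gone.
    out, _ = proc.communicate(timeout=10)
    return proc.returncode, out


if __name__ == "__main__":
    print(demo_group_stop())
```

Signalling only `proc.pid` in the same setup would terminate the shell but leave `sleep` running, which is the orphaned-child situation shown in the process listing.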
Comment 4 (Reporter) • 12 years ago
(In reply to Rail Aliiev [:rail] from comment #3)
> We need "stopasgroup" which was introduced in 3.0b1 (we use 3.0a9).
Sounds like a good opportunity to move release runner to a more modern machine, and puppetize it...
Comment 6 (Assignee) • 12 years ago
I just verified that 3.0b2 fixes the problem. Since bm36 is not managed by puppet and will die soon, I left this version installed.
Comment 7 (Assignee) • 12 years ago
This should be resolved now. I manually upgraded the supervisor package on bm36 (which is not managed by puppet), added stopasgroup/killasgroup to its config and verified that supervisor kills the subprocess properly.
stopasgroup/killasgroup will be set by default in bug 836289 for the future deployments.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated • 12 years ago
Product: mozilla.org → Release Engineering