Closed Bug 879763 Opened 12 years ago Closed 12 years ago

release runner doesn't die properly when supervisord kills it

Categories

(Release Engineering :: Release Automation, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: rail)

It seems to get hung, at least when using "supervisorctl stop". Not sure if this happens when using "restart". We end up with process state like this:

root 13268 0.0 0.0 150328 5204 ? Ss Jan30 2:44 /usr/bin/python26 /usr/bin/supervisord
cltbld 6902 0.0 0.0 63864 1128 ? S 06:30 0:00 \_ /bin/bash /home/cltbld/release-runner/tools/buildfarm/release/release-runner.sh
cltbld 6911 0.5 0.2 165284 13356 ? S 06:30 0:00 \_ python release-runner.py -c /home/cltbld/.release-runner.ini
cltbld 6801 0.1 0.2 169132 16988 ? S 06:28 0:00 python release-runner.py -c /home/cltbld/.release-runner.ini

...which is bad because we could have two release runners racing on a release.

One idea for fixing this:
09:54 < jhopkins> bhearsum: try using an exec call to launch python from the shell script. this will result in only one pid beneath supervisord
Assignee: nobody → rail
Unfortunately we can't easily replace the current invocation method with "exec" because we check the exit status of the child script and send an email if it exits non-zero...
A more invasive approach would be to get rid of the shell wrapper. The only thing it does is send failure e-mail AFAIK. We could probably get better failure mail with an extra LogHandler in Python, anyways...
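A rough sketch of what the extra log handler could look like, using the standard library's logging.handlers.SMTPHandler. This is not the release runner's actual code: make_failure_logger, run, the subject line and the mail host/addresses are all hypothetical names chosen for illustration.

```python
import logging
import logging.handlers


def make_failure_logger(mailhost, fromaddr, toaddrs, name="release-runner"):
    """Return a logger that e-mails every record at ERROR level or above.

    All arguments are placeholders; the real values would come from the
    release runner's own configuration.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    mail_handler = logging.handlers.SMTPHandler(
        mailhost=mailhost,
        fromaddr=fromaddr,
        toaddrs=toaddrs,
        subject="release runner failure",  # hypothetical subject line
    )
    # Only failures should generate mail; INFO/WARNING records are ignored
    # by this handler.
    mail_handler.setLevel(logging.ERROR)
    mail_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(mail_handler)
    return logger


def run(main, logger):
    """Call main(); on an exception, log the traceback and return 1."""
    try:
        main()
    except Exception:
        # logger.exception() logs at ERROR level with the traceback
        # attached, so the SMTP handler would mail the full failure.
        logger.exception("release runner failed")
        return 1
    return 0
```

With something like this in place the Python process reports its own failures, so the shell wrapper would no longer be needed just to check the exit status, and supervisord would have a single pid to manage.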
We need "stopasgroup" which was introduced in 3.0b1 (we use 3.0a9).
Depends on: 883693
(In reply to Rail Aliiev [:rail] from comment #3)
> We need "stopasgroup" which was introduced in 3.0b1 (we use 3.0a9).

Sounds like a good opportunity to move release runner to a more modern machine, and puppetize it...
I just verified that 3.0b2 fixes the problem. Since bm36 is not managed by puppet and will die soon I left this version installed.
This should be resolved now. I manually upgraded the supervisor package on bm36 (which is not managed by puppet), added stopasgroup/killasgroup to its config and verified that supervisor kills the subprocess properly. stopasgroup/killasgroup will be set by default in bug 836289 for the future deployments.
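For anyone wondering why signalling the whole group matters, here is a small self-contained illustration of the mechanism. It is not supervisord's code, only a sketch under these assumptions: a POSIX system with /bin/sh, "sleep 30" standing in for release-runner.py, and orphan_survives being a hypothetical helper name. The shell runs its child without exec, like the wrapper script does.

```python
import os
import select
import signal
import subprocess
import time


def orphan_survives(signal_group, wait=1.0):
    """Start a shell wrapper with a child, stop it, report if the child lives.

    signal_group=False signals only the wrapper's pid;
    signal_group=True signals the wrapper's whole process group.
    """
    proc = subprocess.Popen(
        ["/bin/sh", "-c", "echo ready; sleep 30; true"],
        stdout=subprocess.PIPE,
        start_new_session=True,  # wrapper leads its own process group
    )
    pgid = proc.pid  # group id equals the leader's pid
    proc.stdout.readline()  # wait until the shell is running
    time.sleep(0.3)  # give the shell time to fork "sleep"

    if signal_group:
        os.killpg(pgid, signal.SIGTERM)
    else:
        os.kill(proc.pid, signal.SIGTERM)
    proc.wait()  # the wrapper itself exits in both cases

    # The child inherited the pipe's write end. The pipe only reaches EOF
    # once every process holding it has exited, so a timeout here means
    # the child is still running.
    readable, _, _ = select.select([proc.stdout], [], [], wait)
    survived = not readable
    if survived:
        os.killpg(pgid, signal.SIGKILL)  # clean up the orphan
    proc.stdout.close()
    return survived
```

Calling orphan_survives(signal_group=False) should report that the child outlives its wrapper, which matches the stray release-runner.py seen in comment 0, while orphan_survives(signal_group=True) should report that nothing is left behind.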
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering