Closed Bug 879763 Opened 12 years ago Closed 12 years ago

release runner doesn't die properly when supervisord kills it

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: rail)

Details

bhearsum@mozilla.com (:bhearsum)

Reporter

Description

12 years ago

It seems to get hung, at least when using "supervisorctl stop". Not sure if this happens when using "restart". We end up with process state like this:

root     13268  0.0  0.0 150328  5204 ?  Ss  Jan30  2:44 /usr/bin/python26 /usr/bin/supervisord
cltbld    6902  0.0  0.0  63864  1128 ?  S   06:30  0:00  \_ /bin/bash /home/cltbld/release-runner/tools/buildfarm/release/release-runner.sh
cltbld    6911  0.5  0.2 165284 13356 ?  S   06:30  0:00      \_ python release-runner.py -c /home/cltbld/.release-runner.ini
cltbld    6801  0.1  0.2 169132 16988 ?  S   06:28  0:00 python release-runner.py -c /home/cltbld/.release-runner.ini

...which is bad because we could have two release runners racing on a release.

One idea for fixing this:

09:54 < jhopkins> bhearsum: try using an exec call to launch python from the shell script. this will result in only one pid beneath supervisord
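
The race can be spotted from a process listing like the one in the description. A minimal sketch, assuming `ps aux`-style output (ten fixed columns, with no spaces inside START, followed by the command); `runner_pids` and `live_runner_pids` are hypothetical helpers, not part of release runner:

```python
import subprocess


def runner_pids(ps_lines, script="release-runner.py"):
    """Return the PIDs whose command line runs `script`."""
    pids = []
    for line in ps_lines:
        # USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) == 11 and script in fields[10]:
            pids.append(int(fields[1]))
    return pids


def live_runner_pids():
    """Run `ps aux` on this machine and return the release runner PIDs."""
    out = subprocess.run(["ps", "aux"], capture_output=True, text=True,
                         check=True).stdout
    return runner_pids(out.splitlines()[1:])  # skip the header line


# The listing from this bug: two release-runner.py processes at once.
LISTING = [
    "root 13268 0.0 0.0 150328 5204 ? Ss Jan30 2:44 /usr/bin/python26 /usr/bin/supervisord",
    r"cltbld 6902 0.0 0.0 63864 1128 ? S 06:30 0:00 \_ /bin/bash /home/cltbld/release-runner/tools/buildfarm/release/release-runner.sh",
    r"cltbld 6911 0.5 0.2 165284 13356 ? S 06:30 0:00 \_ python release-runner.py -c /home/cltbld/.release-runner.ini",
    "cltbld 6801 0.1 0.2 169132 16988 ? S 06:28 0:00 python release-runner.py -c /home/cltbld/.release-runner.ini",
]

pids = runner_pids(LISTING)
if len(pids) > 1:
    print(f"{len(pids)} release runners may race: {pids}")
    # prints: 2 release runners may race: [6911, 6801]
```

The shell wrapper (6902) is not counted because only the command that runs `release-runner.py` matches.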

Rail Aliiev [:rail]

Assignee

Updated

12 years ago

Assignee: nobody → rail

Rail Aliiev [:rail]

Assignee

Comment 1

12 years ago

Unfortunately, we can't easily replace the current invocation method with "exec", because we check the exit status of the child script and send an email if it exits non-zero...
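
One hypothetical way to keep the exit-status check and still let supervisord stop the runner is a wrapper that forwards stop signals to its child instead of dying and orphaning it. A minimal sketch, assuming the wrapper were rewritten in Python; `run_and_report` and the `notify` callback (a stand-in for the failure e-mail) are illustrative names only:

```python
import signal
import subprocess
import sys


def report_to_stderr(returncode):
    """Default notifier: a stand-in for the failure e-mail."""
    print(f"release runner exited with status {returncode}", file=sys.stderr)


def run_and_report(cmd, notify=report_to_stderr):
    """Run cmd as a child, forward SIGTERM/SIGINT to it, report non-zero exits."""
    child = subprocess.Popen(cmd)

    def forward(signum, frame):
        # Pass the stop signal on so the child exits instead of being orphaned.
        child.send_signal(signum)

    forwarded = (signal.SIGTERM, signal.SIGINT)
    previous = {sig: signal.signal(sig, forward) for sig in forwarded}
    try:
        returncode = child.wait()  # retried after the handler runs (PEP 475)
    finally:
        for sig, handler in previous.items():
            signal.signal(sig, handler)
    if returncode != 0:
        notify(returncode)  # negative if the child was killed by a signal
    return returncode
```

A forwarded SIGTERM makes the child exit with a negative status, so a deliberate stop is also reported; a real wrapper might want to skip the notification in that case.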

bhearsum@mozilla.com (:bhearsum)

Reporter

Comment 2

12 years ago

A more invasive approach would be to get rid of the shell wrapper. AFAIK, the only thing it does is send failure e-mail. We could probably get better failure mail with an extra logging handler in Python anyway...
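
The logging-handler idea can be sketched with the standard library's `logging.handlers.SMTPHandler`, which e-mails each record it handles. The helper names, subject and any mail host or addresses passed in are placeholders, not release runner's real configuration. One advantage over the shell wrapper is that the mail can include the Python traceback:

```python
import logging
import logging.handlers


def add_failure_mail(logger, mailhost, fromaddr, toaddrs,
                     subject="release runner failure"):
    """Attach an SMTPHandler that mails ERROR-and-above records."""
    handler = logging.handlers.SMTPHandler(
        mailhost=mailhost, fromaddr=fromaddr, toaddrs=toaddrs, subject=subject)
    handler.setLevel(logging.ERROR)  # routine INFO/DEBUG logging is not mailed
    logger.addHandler(handler)
    return handler


def run_logged(main, logger):
    """Call main(); log any uncaught exception, with traceback, at ERROR."""
    try:
        return main()
    except Exception:
        # Mailed if add_failure_mail() attached a handler to this logger.
        logger.exception("release runner crashed")
        raise
```

Usage would look like `log = logging.getLogger("release-runner")`, then `add_failure_mail(log, "localhost", "runner@example.com", ["releng@example.com"])` and `run_logged(main, log)`, where `main` is the runner's hypothetical entry point.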

Rail Aliiev [:rail]

Assignee

Comment 3

12 years ago

We need "stopasgroup" which was introduced in 3.0b1 (we use 3.0a9).
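
With stopasgroup, supervisord sends the stop signal to the program's whole process group rather than only the PID it started. The difference can be sketched in Python, assuming (as supervisord is understood to do) that the program runs in its own process group. Here the wrapper and its sleeping child are stand-ins for release-runner.sh and release-runner.py, and a pipe inherited by the child is used to tell when every process in the tree has exited:

```python
import os
import select
import signal
import subprocess
import sys

# Stand-in for release-runner.sh: start a long-running child (stand-in for
# release-runner.py), say it is up, then wait for its exit status.
WRAPPER = (
    "import subprocess, sys\n"
    "child = subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(60)'])\n"
    "print('started', flush=True)\n"
    "sys.exit(child.wait())\n"
)


def start_wrapper():
    """Start the wrapper as the leader of a new process group."""
    proc = subprocess.Popen(
        [sys.executable, "-c", WRAPPER],
        stdout=subprocess.PIPE,   # inherited by the grandchild as well
        start_new_session=True,   # new session, hence a new process group
    )
    proc.stdout.readline()        # wait until the grandchild exists
    return proc


def stop(proc, as_group):
    """Send SIGTERM with or without stopasgroup-style group delivery."""
    if as_group:
        os.killpg(proc.pid, signal.SIGTERM)  # wrapper and child
    else:
        proc.send_signal(signal.SIGTERM)     # only the PID that was started
    proc.wait()


def tree_exited(proc, timeout):
    """True once every holder of the stdout pipe has exited (pipe hits EOF)."""
    ready, _, _ = select.select([proc.stdout], [], [], timeout)
    return bool(ready) and os.read(proc.stdout.fileno(), 1) == b""
```

With `as_group=False` the wrapper dies but the sleeping child keeps running, like the orphaned release-runner.py (6801) in the description; with `as_group=True` both exit.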

Rail Aliiev [:rail]

Assignee

Updated

12 years ago

Depends on: 883693

bhearsum@mozilla.com (:bhearsum)

Reporter

Comment 4

12 years ago

(In reply to Rail Aliiev [:rail] from comment #3)
> We need "stopasgroup" which was introduced in 3.0b1 (we use 3.0a9).

Sounds like a good opportunity to move release runner to a more modern machine, and puppetize it...

Rail Aliiev [:rail]

Assignee

Comment 6

12 years ago

I just verified that 3.0b2 fixes the problem. Since bm36 is not managed by puppet and will die soon, I left this version installed.

Rail Aliiev [:rail]

Assignee

Comment 7

12 years ago

This should be resolved now. I manually upgraded the supervisor package on bm36 (which is not managed by puppet), added stopasgroup/killasgroup to its config, and verified that supervisor kills the subprocess properly. stopasgroup/killasgroup will be set by default in bug 836289 for future deployments.
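
The fix amounts to two options in the program's section of the supervisord config. As a sketch, assuming the config is INI that Python's `configparser` can read (interpolation is disabled because supervisord has its own `%(...)s` expansions), a check like this could flag programs deployed without them. The section contents and `missing_group_options` are hypothetical; supervisor's documentation says stopasgroup implies killasgroup, but this bug sets both, and so does the check:

```python
import configparser

# Hypothetical program section: the name is illustrative, the command and
# user are taken from the process listing in the description.
SAMPLE = """
[program:release-runner]
command=/home/cltbld/release-runner/tools/buildfarm/release/release-runner.sh
user=cltbld
stopasgroup=true
killasgroup=true
"""


def missing_group_options(config_text, program):
    """Return the group-signalling options not enabled for [program:<program>]."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    section = parser[f"program:{program}"]
    return [opt for opt in ("stopasgroup", "killasgroup")
            if not section.getboolean(opt, fallback=False)]
```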

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

12 years ago

Product: mozilla.org → Release Engineering


Categories

(Release Engineering :: Release Automation, defect)

Security

(public)