Closed Bug 631854 Opened 14 years ago Closed 14 years ago

Buildbot can't kill subprocesses with usePTY=False, but usePTY=True causes other problems

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: dustin)

References

()

Details

Attachments

(3 files)

---- usePTY=False leads to situations where the test code forks, and the parent process exits, but the child processes keep running for too long. Buildbot knows the processes are still running, because it still has open pipes, but it has no way to reach those processes to kill them, since they are not isolated to a process group. ---- usePTY=True leads to things like argv: ['rm', '-rf', 'tools'] environment: CVS_RSH=ssh ... _=/home/cltbld/bin/python closing stdin using PTY: True process killed by signal 1 I thought 'closing stdin' was the problem, but I can actually replicate this failure with 'keeping stdin open', so I'm not sure what the problem is. ==== Bottom line is, we never need an actual PTY (that's only useful for tests that need to output to a terminal for some reason). We *do* need to be able to kill process groups, though. So this will involve merging the result of http://trac.buildbot.net/ticket/1796 to the slaves branch, and then disabling usePTY everywhere.
The relevant fixes in upstream https://github.com/buildbot/buildbot/compare/a8a5c8ba4d98a8aba186a615109ba6b91ee3558b...e15751c301cff0ada7bd116449012c17d633ba09 do not apply at all cleanly against 0.8.0 - the slave code has been significantly refactored since that time. So this patch is not really backport-able; rather, it can be re-implemented for 0.8.0, which increases the risk factor significantly. I'd like some feedback on deciding whether to push forward with this re-implementation, or try to deploy 0.8.3+patch instead. Risks with 0.8.3+patch are: 1. 'buildbot start' becomes 'buildslave start' 2. requires sniffing out and forward-porting any local patches that were not upstreamed (1) is mitigated because runslave.py uses twistd directly, and I'd like to deploy runslave.py simuiltaneously with deploying the upgraded buildslave. From a quick diff between BUILDBOT_0_8_0 and SLAVE_0_8_0, it looks like (2) is only the addition of the 'setMozillaBuildProperties' command, which should be easy (but not trivial) to forward-port. On the basis of the above, I'm leaning toward upgrading to 0.8.3. Opinions?
Not applying cleanly vs 0.8.0 makes me lean towards 0.8.3, assuming that we think 0.8.3 is good for us. A p.o.c. run per platform would definitely make that clear. (In reply to comment #1) > 1. 'buildbot start' becomes 'buildslave start' This is actually an opportunity, imo. We can create a buildbot wrapper that checks for ssh-ness on OSX and kills it, otherwise passes off "$@" to buildslave.
+1 to upgrading to 0.8.3. We're going to have to do it at some point! The pain is only incrementally worse because of the buildbot->buildslave rename, and if that's really onerous, I'm sure we could create a 'buildbot' wrapper to do what we want.
(In reply to comment #2) > This is actually an opportunity, imo. We can create a buildbot wrapper that > checks for ssh-ness on OSX and kills it, otherwise passes off "$@" to > buildslave. Actually, I would like to build a buildbot script that says import sys print >>sys.stderr, "I think you meant 'sudo reboot' or 'shutdown /r /f /t 0'" sys.exit(1) We should not be in the habit of starting slaves without a reboot. So it looks like it's an upgrade - I'll adjust bug 631851 appropriately.
This is the diff between BUILDBOT_0_8_8 and SLAVE_0_8_0 looks like.
Assignee: nobody → dustin
And these are the new diffs after the upgrade - from the tip of 'slaves' to the tip of 'upstream' (which is at v0.8.4-pre-150-g7175a0d) From my read, we don't use this extra slave command anymore. If you can confirm that, I'll adjust the tip of 'slaves' to be equal to the tip of 'upstream'. This will need to be staged for quite a while before deployment to work out any kinks in the killing code. Other than those changes, the code in v0.8.4-pre-150-g7175a0d is very similar to that in v0.8.1 through v0.8.3, so it's pretty well-tested by others.
Attachment #510852 - Flags: review?(catlee)
Comment on attachment 510852 [details] [diff] [review] m631854-buildbot-new-from-upstream.patch I don't think we use this command either, but you may want to check with community projects. I think they were the last users of it.
Attachment #510852 - Flags: review?(catlee) → review+
Newly copied folks: my plan is to commit attachment 510852 [details] [diff] [review] as-is, which keeps the setMozillaBuildProperties command intact. However, if we can all agree that either: 1. you don't intend to use the build/buildbot 'slaves' branch for buildslaves; or 2. you don't need this slave-side command anymore then it'd be nice to get rid of this change and just run a pure upstream implementation of the buildslave code. Let me know what you think. I'm off to run this patch in staging for a while.
Calendar seems to be the only project using setMozillaBuildProperties. We are currently using buildbot 0.7 and we don't plan on rebuilding our slaves without upgrading to buildbot 0.8 and revamping the whole config to match firefox/seamonkey/thunderbird. Therefore we don't seem to be affected by this change.
With that in mind, here's a further commit that removes the mozilla-specific stuff from the slaves branch (including my failure to delete buildbot/slave/commands/mozilla.py). With this change in place, the slaves branch is identical (modulo .hgtags) to the BUILDBOT_0_8_4_PRE_150_G715A0D tag. I'll have another patch shortly to push this out with puppet.
Attachment #511221 - Flags: review?(catlee)
Attachment #511221 - Flags: review?(catlee) → review+
attachment 511221 [details] [diff] [review] is landed (on the slaves branch)
(In reply to comment #8) > Newly copied folks: my plan is to commit attachment 510852 [details] [diff] [review] as-is, which keeps > the setMozillaBuildProperties command intact. However, if we can all agree > that either: > > 2. you don't need this slave-side command anymore > As SeaMonkey release engineer, I agree with this. Dustin, its also defined/specified in buildbotcustom (steps/misc.py) as SetMozillaBuildProperties which is also unused (and in fact its THIS that calendar uses, though a skim seems to implicate it uses buildbot code we'll be losing) for posterity: * SeaMonkey has an import |from buildbotcustom.steps.misc import SetMozillaBuildProperties| in buildbot-configs\seamonkey\ccfactory.py) * Thunderbird has one in |from buildbotcustom.steps.misc import SetMozillaBuildProperties| in buildbot-configs\thunderbird\master.py * Neither SeaMonkey nor Thunderbird makes use of this import
The puppet and runslave.py changes will happen in bug 631851.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: