Closed Bug 619279 Opened 14 years ago Closed 14 years ago

Timeouts not handled properly on jetpack jobs

Categories

(Release Engineering :: General, defect, P5)

x86
macOS
defect

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 626486

People

(Reporter: nthomas, Unassigned)

References

Details

E.g., from a slave's twisted.log:

2010-12-08 11:29:01-0800 [Broker,client] bash -c c:/talos-slave/mozilla-central_win7_test-jetpack/tools/buildfarm/utils/run_jetpack.sh win32
2010-12-08 11:29:01-0800 [Broker,client] in dir c:\talos-slave\mozilla-central_win7_test-jetpack\build (timeout 1200 secs)
2010-12-08 11:29:01-0800 [Broker,client] watching logfiles {}
[snipped environment]
2010-12-08 11:29:01-0800 [Broker,client] closing stdin
2010-12-08 11:29:01-0800 [Broker,client] using PTY: False
2010-12-08 11:49:36-0800 [-] command timed out: 1200 seconds without output
2010-12-08 11:49:36-0800 [-] trying process.signalProcess('KILL')
2010-12-08 11:49:36-0800 [-] Unhandled Error
Traceback (most recent call last):
  File "C:\mozilla-build\python25\lib\site-packages\twisted\application\app.py", line 390, in startReactor
    self.config, oldstdout, oldstderr, self.profiler, reactor)
  File "C:\mozilla-build\python25\lib\site-packages\twisted\application\app.py", line 311, in runReactorWithLogging
    reactor.run()
  File "C:\mozilla-build\python25\lib\site-packages\twisted\internet\base.py", line 1165, in run
    self.mainLoop()
  File "C:\mozilla-build\python25\lib\site-packages\twisted\internet\base.py", line 1174, in mainLoop
    self.runUntilCurrent()
--- <exception caught here> ---
  File "C:\mozilla-build\python25\lib\site-packages\twisted\internet\base.py", line 796, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "C:\mozilla-build\python25\lib\site-packages\buildbot-0.8.0-py2.5.egg\buildbot\slave\commands\base.py", line 726, in doTimeout
    self.kill(msg)
  File "C:\mozilla-build\python25\lib\site-packages\buildbot-0.8.0-py2.5.egg\buildbot\slave\commands\base.py", line 791, in kill
    self.process.signalProcess(self.KILL)
  File "C:\mozilla-build\python25\lib\site-packages\twisted\internet\_dumbwin32proc.py", line 239, in signalProcess
    raise error.ProcessExitedAlready()
twisted.internet.error.ProcessExitedAlready:

This was on a talos-r3-w7-NNN, but I've seen it on Leopard too and expect it to affect all platforms.
There's a firefox process still running, using a really small amount of memory (~2MB). There's no indication that anything crashed; it looks more like a startup hang. IIRC dustin said that buildbot 0.8.3 will resolve this problem in the slave code, which should stop slaves dropping out of use until we go corral them again.
Priority: -- → P5
I believe this is more important than P5. If we want we can block on updating to buildbot-0.8.3. Would that be on the slaves side or just the master? We are probably going to disable jetpack in bug 627070 until it goes green.
I'm not sure what Nick means about 0.8.3 solving this. I don't particularly understand what the problem was, so I shouldn't have said anything about solutions (which doesn't mean I *didn't* say anything..) I just skimmed the v0.8.2..v0.8.3 history for buildslave, and I don't see anything relevant. If 0.8.3 does fix this, it would be a slave fix.
For now we are going to disable jetpack where it is perma-red. So no worries.
I mistakenly thought 0.8.3 was better about not barfing about ProcessExitedAlready, but you mentioned elsewhere there are other bugs on file.
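For illustration only, here is a rough sketch of what "not barfing about ProcessExitedAlready" could look like in a kill path. This is not buildbot's actual code or fix. `FakeProcess` and `kill_quietly` are hypothetical names, and the exception class below is a local stand-in for `twisted.internet.error.ProcessExitedAlready`, so the snippet runs without Twisted installed.

```python
class ProcessExitedAlready(Exception):
    """Local stand-in for twisted.internet.error.ProcessExitedAlready."""


class FakeProcess:
    """Hypothetical process transport that only mimics signalProcess()."""

    def __init__(self, alive):
        self.alive = alive

    def signalProcess(self, signal_name):
        # Like the transport in the traceback, refuse to signal a dead process.
        if not self.alive:
            raise ProcessExitedAlready()
        self.alive = False


def kill_quietly(process, signal_name="KILL"):
    """Send a signal; return False instead of raising if the process is gone."""
    try:
        process.signalProcess(signal_name)
    except ProcessExitedAlready:
        # The tracked process exited before the kill attempt reached it.
        return False
    return True


if __name__ == "__main__":
    print(kill_quietly(FakeProcess(alive=True)))
    print(kill_quietly(FakeProcess(alive=False)))
```

Note that swallowing the exception would only keep the error out of the timeout callback; it would not by itself clean up a leftover child such as the firefox process mentioned above.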
There have been at least a couple of these lately. I see some on March 8th and 9th on Leopard.
Isn't this the same as bug 626486? I'd mark it as a dupe, but I'm not sure why I didn't do so in comment 2?
IIUC it is the same issue.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → DUPLICATE
Product: mozilla.org → Release Engineering