Closed Bug 422328 Opened 17 years ago Closed 17 years ago

qm-pmac-trunk05 starved of builds

Categories

(Release Engineering :: General, defect, P2)

PowerPC
macOS
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dwitte, Assigned: anodelman)

References

Details

looks like qm-pmac-trunk05 got its internets twisted. it last reported green around 6pm, then went red, and hasn't reported since:

Running test tp: Started Tue, 11 Mar 2008 19:11:38
Screen width/height:1280/1024
colorDepth:24
Browser inner width/height: 1024/658
Browser outer width/height: 1024/768
NOISE: ### MRJPlugin: getPluginBundle() here. ###
NOISE: ### MRJPlugin: CFBundleGetBundleWithIdentifier() succeeded. ###
### MRJPlugin: CFURLGetFSRef() succeeded. ###
[Failure instance: Traceback (failure with no frames): twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.
]
Assignee: server-ops → aravind
Where do you see it as being red? I think it's busy testing away?
http://tinderbox.mozilla.org/showbuilds.cgi?tree=Firefox, it hasn't reported anything since 19:27.
I can't vnc or ssh into the box. Out of things to try per the docs.
Paged John about it.
Re-assigning to build, since there is not much I can do for this.
Assignee: aravind → nobody
Component: Server Operations: Tinderbox Maintenance → Build & Release
QA Contact: justin → build
Responding to page & irc. If I understand correctly, we also have qm-pmac-trunk04. While we do need to fix qm-pmac-trunk05, I do not see this as a blocker, because of healthy qm-pmac-trunk04. Please re-escalate if I'm missing something.
Severity: blocker → critical
Priority: -- → P1
justdave rebooted the box and restarted buildbot slave.
Waiting for the restarted slave to try another build, but nothing yet. Don't know if that's expected or not. It's possible this could be a buildbot problem (slave seen as reconnected, but master not allocating work to the slave). A quick look on qm-rhel02 confirmed that the buildmaster did not mention qm-mac-trunk05...was I looking in the right buildbot master location? I'd like to try restarting the master to see if that fixes the reconnect problem...
Looks like it's not running because there's been no new builds (because fx-win32-tbox was hung). Once that's fixed it should get a build shortly.
Hmm? Its full name is "MacOSX Darwin 9.0.0 talos trunk qm-pmac-trunk05," so if it's waiting on builds from fx-win32-tbox that name is very very misleading. I'd maybe buy build starvation by qm-pmac-trunk04, which is crashing in tsvg, and thus finishing quickly, except I thought that was fixed, and anyway it's been doing that since the 9th, and they coexisted that way for a couple of days before this.
Or perhaps it was build starvation: either qm-pmac-trunk04 died on its own, or someone put it out of its misery without updating either this bug or bug 422438, but either way 04 stopped reporting at 11:16, and 05 started up again at 11:44.
I took a look at the Buildbot waterfall for Talos. There are three slave machines in the "MacOSX Darwin 9.0.0 talos trunk" group: qm-pmac-trunk04, 05 and 06. There were three responses to a ping request, so they should all be up. Looking at the Firefox tree, it looks like bm-xserve08 cycles quickly enough to keep two slaves busy. Not sure what algorithm Buildbot uses to assign jobs, but it looks like the choice changes. The idle box then appears to fall off the Tinderbox page.
I would agree with comment #12 - we aren't generating builds quickly enough to keep three slaves occupied. Once bug 419071 is pushed we'll start selecting talos build slaves randomly so we should no longer see the situation where a slave is entirely starved off of the waterfall.
Assignee: nobody → anodelman
Once bug 419071 is applied to the production talos build master this issue should go away.
Component: Build & Release → Release Engineering
Depends on: 419071
Priority: P1 → P2
Summary: qm-pmac-trunk05 dead → qm-pmac-trunk05 starved of builds
With bug 419071 pushed to the talos buildbot master, I'm seeing all three Leopard machines report consistently. Random slave selection means that a given slave in a set should no longer be starved.
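The starvation effect discussed in this bug can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Buildbot's actual scheduler: the "first idle slave in list order" policy, the one-build-per-tick arrival rate, the two-tick test duration, and the names simulate, pick_first and pick_random are all hypothetical, chosen so that builds arrive just fast enough to keep two of the three slaves busy.

```python
import random


def simulate(slaves, num_builds, busy_for, pick, seed=0):
    """Hand out one build per tick and count how many each slave receives.

    A slave that takes a build at tick t is busy until tick t + busy_for.
    `pick` chooses one slave from the list of currently idle slaves.
    """
    rng = random.Random(seed)          # seeded so runs are repeatable
    free_at = {s: 0 for s in slaves}   # tick at which each slave is idle again
    counts = {s: 0 for s in slaves}
    for tick in range(num_builds):
        idle = [s for s in slaves if free_at[s] <= tick]
        if not idle:
            continue                   # no idle slave: this build is skipped
        chosen = pick(idle, rng)
        free_at[chosen] = tick + busy_for
        counts[chosen] += 1
    return counts


def pick_first(idle, rng):
    # Assumed policy for illustration: always take the first idle slave.
    return idle[0]


def pick_random(idle, rng):
    # Random selection among idle slaves, in the spirit of bug 419071.
    return rng.choice(idle)


if __name__ == "__main__":
    slaves = ["qm-pmac-trunk04", "qm-pmac-trunk05", "qm-pmac-trunk06"]
    print("first idle:", simulate(slaves, 1000, 2, pick_first))
    print("random:    ", simulate(slaves, 1000, 2, pick_random))
```

Under the fixed-order policy the last slave in the list never receives a build, because one of the first two is always idle again by the time the next build arrives. With random selection every slave gets work over time, although the split is not necessarily even.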
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Component: Release Engineering: Talos → Release Engineering
QA Contact: build → release
Product: mozilla.org → Release Engineering