Closed Bug 1222915 Opened 9 years ago Closed 9 years ago

Huge backlog of OS X 10.10. All trunk trees closed

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nigelb, Unassigned)

References

Details

Our OS X 10.10 backlog seems to reach back to pushes from 12 hours ago. Can we urgently investigate what's going wrong? All trunk trees are closed until the backlog catches up.
Slave health shows that the t-yosemite-r5 slaves have not picked up any jobs for more than a day. Checking /builds/slave/twistd.log on several machines reveals the following error:

    2015-11-07 18:05:20-0800 [Broker,client] Connected to buildbot-master107.bb.releng.scl3.mozilla.com:9201; slave is ready
    2015-11-08 01:05:20-0800 [-] I feel very idle and was thinking of rebooting as soon as the buildmaster says it's OK
    2015-11-08 01:05:20-0800 [-] Telling the master we want to shutdown after any running builds are finished
    2015-11-08 01:05:20-0800 [Broker,client] Peer will receive following PB traceback:
    2015-11-08 01:05:20-0800 [Broker,client] Unhandled Error
        Traceback (most recent call last):
          File "/tools/buildbot/lib/python2.7/site-packages/twisted/spread/banana.py", line 153, in gotItem
            self.callExpressionReceived(item)
          File "/tools/buildbot/lib/python2.7/site-packages/twisted/spread/banana.py", line 116, in callExpressionReceived
            self.expressionReceived(obj)
          File "/tools/buildbot/lib/python2.7/site-packages/twisted/spread/pb.py", line 514, in expressionReceived
            method(*sexp[1:])
          File "/tools/buildbot/lib/python2.7/site-packages/twisted/spread/pb.py", line 826, in proto_message
            self._recvMessage(self.localObjectForID, requestID, objectID, message, answerRequired, netArgs, netKw)
        --- <exception caught here> ---
          File "/tools/buildbot/lib/python2.7/site-packages/twisted/spread/pb.py", line 840, in _recvMessage
            netResult = object.remoteMessageReceived(self, message, netArgs, netKw)
          File "/tools/buildbot/lib/python2.7/site-packages/twisted/spread/flavors.py", line 114, in remoteMessageReceived
            state = method(*args, **kw)
          File "/tools/buildbot/lib/python2.7/site-packages/buildslave/idleizer.py", line 157, in new_fn
            self.maybeReboot()
          File "/tools/buildbot/lib/python2.7/site-packages/buildslave/idleizer.py", line 118, in maybeReboot
            elif self.reboot_or_halt:
        exceptions.AttributeError: Idleizer instance has no attribute 'reboot_or_halt'
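The last frame of the traceback fails on `elif self.reboot_or_halt:` because the attribute was never set on the Idleizer instance. The following is a minimal, hypothetical Python 3 sketch of that class of bug. Only the attribute name `reboot_or_halt` comes from the log. The classes `BrokenIdleizer` and `FixedIdleizer`, the method `maybe_reboot`, and the `max_idle_seconds` parameter are invented for illustration and are not the real `buildslave/idleizer.py` code. The sketch shows how such an error stays hidden until the code path that reads the attribute actually runs.

```python
# Hypothetical sketch of the failure mode in the twistd.log traceback.
# Only the attribute name reboot_or_halt comes from the log; these classes
# are invented for illustration and are NOT the real buildslave/idleizer.py.


class BrokenIdleizer:
    """Reads an attribute that no code path has assigned."""

    def __init__(self, max_idle_seconds):
        self.max_idle_seconds = max_idle_seconds  # hypothetical setting
        # Bug: self.reboot_or_halt is never set here.

    def maybe_reboot(self):
        if self.max_idle_seconds <= 0:
            return "disabled"
        elif self.reboot_or_halt:  # raises AttributeError: never assigned
            return "rebooting"
        return "waiting"


class FixedIdleizer(BrokenIdleizer):
    """Same decision logic, but every attribute it reads is initialized."""

    def __init__(self, max_idle_seconds, reboot_or_halt=False):
        super().__init__(max_idle_seconds)
        self.reboot_or_halt = reboot_or_halt


def try_reboot(idleizer):
    """Return the reboot decision, or the error text if the check fails."""
    try:
        return idleizer.maybe_reboot()
    except AttributeError as exc:
        return f"AttributeError: {exc}"


if __name__ == "__main__":
    print(try_reboot(FixedIdleizer(3600)))                       # waiting
    print(try_reboot(FixedIdleizer(3600, reboot_or_halt=True)))  # rebooting
    print(try_reboot(BrokenIdleizer(0)))     # disabled: bad branch never reached
    print(try_reboot(BrokenIdleizer(3600)))  # AttributeError mentioning reboot_or_halt
```

In this sketch the broken class works whenever the early branch returns first. It only fails once an idle check reaches the `elif`. Initializing every attribute in `__init__` removes that latent failure.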
Rebooted all t-yosemite-r5 slaves. They started picking up the remaining pending jobs, and the backlog is now decreasing.
Depends on: 1173942
(In reply to Alin Selagea [:aselagea][:buildduty] from comment #3)
> Rebooted all t-yosemite-r5 slaves. They started picking the remaining
> pending jobs, the backlog is decreasing at the moment.

Yeah, we are still at a high number and it's dropping slowly:

Pending test(s) @ Nov 09 03:19:11
mac10.10 (1190)
    P3    87  mozilla-aurora
    P4     1  fx-team
    P4  1056  mozilla-inbound
    P5    46  try
mac10.6 (281)
    P3   124  mozilla-aurora
    P4   157  mozilla-inbound
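In the pending report, each platform total in parentheses is the sum of its per-branch rows. Grouping the same rows by branch shows where the backlog sits. The following is a small sketch that recomputes both views. The rows are copied from the report. The `PENDING` dictionary layout and the helper names are assumptions made for illustration, not an existing tool's format.

```python
# Sketch: recompute totals from the pending-jobs report in this comment.
# Rows are copied from the report; the data layout is an assumption.
from collections import defaultdict

# platform -> list of (priority, pending count, branch)
PENDING = {
    "mac10.10": [
        ("P3", 87, "mozilla-aurora"),
        ("P4", 1, "fx-team"),
        ("P4", 1056, "mozilla-inbound"),
        ("P5", 46, "try"),
    ],
    "mac10.6": [
        ("P3", 124, "mozilla-aurora"),
        ("P4", 157, "mozilla-inbound"),
    ],
}


def totals_by_platform(pending):
    """Sum pending counts per platform (the numbers in parentheses)."""
    return {
        platform: sum(count for _, count, _ in rows)
        for platform, rows in pending.items()
    }


def totals_by_branch(pending):
    """Sum pending counts per branch across all platforms."""
    totals = defaultdict(int)
    for rows in pending.values():
        for _, count, branch in rows:
            totals[branch] += count
    return dict(totals)


if __name__ == "__main__":
    print(totals_by_platform(PENDING))  # {'mac10.10': 1190, 'mac10.6': 281}
    print(totals_by_branch(PENDING)["mozilla-inbound"])  # 1213
```

By this count, 1213 of the 1471 pending jobs belong to mozilla-inbound. That fits with m-i being the last tree expected to reopen.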
Reopened fx-team and b2g-i; m-i still needs a few more minutes but should also reopen soon.
OK, trees reopened after catlee rebooted some more OS X machines, but I'm leaving the bug open in case we need more reboots.
Severity: blocker → normal
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard