Closed Bug 793986 Opened 13 years ago Closed 13 years ago

Builds/tests not being scheduled on all trees as of ~0830 UTC+1

Categories

(Release Engineering :: General, defect)

defect
Not set
critical

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: emorley, Unassigned)

Details

(Whiteboard: [buildduty])

An inbound push at 0835 UTC+1 got builds on just a couple of platforms: https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=3a8ee10cf6ec
Subsequent mozilla-central and try pushes have nothing scheduled at all: https://tbpl.mozilla.org/?onlyunstarred=1&rev=08d435dedc7f https://tbpl.mozilla.org/?tree=Try&rev=3c79b6847b25
All trees are closed.
Self-serve isn't working either, see bug 793984.
Seemed to clear up by 0945ish - lowering severity but leaving open for someone to look into cause/prevention.
Severity: blocker → critical
Oh and trees just reopened.
Nagios was going nuts about many machines in scl3 being inaccessible, between 0020 and 0100 Pacific. The buildbot database lives in scl3, so scheduling would have been affected.
Actually, I can't find any errors in the logs of either scheduler; both were polling hg and scheduling changes fine. But we do have a lot of mail about exceptions on the masters in scl1 and mtv1, with complaints like:
_mysql_exceptions.OperationalError: (2006, 'MySQL server has gone away')
_mysql_exceptions.OperationalError: (2003, "Can't connect to MySQL server on ...
So the work was being scheduled, but the masters-with-slaves couldn't pick it up while scl3 had gone away. Everything within scl3 worked fine, but the links to scl1 and mtv1 were down.
Bug 794015 is for NetOps to investigate, but let's call this bug fixed.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WORKSFORME
Product: mozilla.org → Release Engineering