Our buildbot masters are OK (they talk to buildbot-rw-vip.db.scl3), but jobs are not showing up on tbpl.m.o and self-serve is broken. Both of those talk to buildbot-ro-vip.db.scl3.
Symptoms:
  no jobs for https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=32172106c658
  https://secure.pub.build.mozilla.org/buildapi/pending times out
For now, nthomas has closed the tree. The on-call SRE can access the servers, but ssh is otherwise down (I can't connect). We'll continue to update this bug with progress.
got access thanks to Aj. I've validated integrity and failed over the zeus pool to only use buildbot1. buildbot2 is out of the loop until further notice.
I've restarted buildapi and we're looking good. Please keep an eye on buildbot1 while we're running both ro and rw on it.
philor reopened the trees at 16:54.
eyes on buildbot1, maintenance still ongoing.
There was an issue with eth0 on buildbot2.db.scl3 which caused bonding (bond0) not to fail over to eth1. The cause of the issue was an outdated be2net NIC driver.

$ ethtool -i eth0
driver: be2net
version: 4.1.307r
firmware-version: 3.102.517.701
bus-info: 0000:02:00.0

I updated the kernel as follows:
Kernel: 2.6.32-279.14.1.el6.x86_64 -> 2.6.32-358.6.2.el6.x86_64
This updated the be2net driver, but not to the latest version from HP. I then installed the latest from HP:

$ ethtool -i eth0
driver: be2net
version: 4.1.450.7
firmware-version: 3.102.517.701
bus-info: 0000:02:00.0

:cyborgshadow will add buildbot2.db.scl3 back to the pool.
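For anyone who wants to run the same driver check on other hosts or interfaces, here is a minimal sketch that pulls a single field out of `ethtool -i` output. The helper name `ethtool_field` is made up for illustration, and the sample text is copied from the post-upgrade output in this bug. It assumes the usual `key: value` layout that ethtool prints.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one field ("driver", "version", ...)
# from `ethtool -i` output read on stdin. Fields are split on ": ".
ethtool_field() {
    awk -v key="$1" -F': ' '$1 == key { print $2; exit }'
}

# Sample copied from this bug (eth0 after the HP driver install).
sample='driver: be2net
version: 4.1.450.7
firmware-version: 3.102.517.701
bus-info: 0000:02:00.0'

printf '%s\n' "$sample" | ethtool_field version   # prints 4.1.450.7
```

On a live host the same helper could be fed directly, e.g. `ethtool -i eth1 | ethtool_field version`, to confirm that both bond0 slaves report the same driver version.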
Aj finished fixing the server. I've re-added buildbot2 to the RO pool and all looks good. I see traffic flowing on it. :)
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Adding notes here too for posterity: The crash was caused by the buggy Intel chipset, as confirmed by this:

Jun 3 16:04:54 buildbot2.db.scl3.mozilla.com kernel: do_IRQ: 0.204 No irq handler for vector (irq -1)

Apparently the box hadn't been rebooted since we applied the kernel option in grub.conf, so it didn't pick it up. buildbot1 awaits the same fate if we don't reboot it.
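A rough way to catch this "configured but never booted" state is to compare the option against the command line the running kernel was actually started with (`/proc/cmdline`). The sketch below is only an illustration: this bug does not name the kernel option, so `example_opt=1` is a placeholder, and both the function name `cmdline_has_option` and the sample command line are made up.

```shell
#!/bin/sh
# Hypothetical helper: succeed if kernel command line $1 contains the exact,
# whitespace-delimited option $2.
cmdline_has_option() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Illustrative values only. On a real host use: running=$(cat /proc/cmdline)
running='ro root=/dev/mapper/vg-root quiet'
opt='example_opt=1'   # placeholder; the real option is not named in this bug

if cmdline_has_option "$running" "$opt"; then
    echo "option active in running kernel"
else
    echo "option not active: reboot needed"
fi
```

The padding with spaces makes the match exact, so `example_opt=1` does not match a longer token such as `example_opt=10`.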
Bug 879102 was filed for buildbot1.
Product: mozilla.org → mozilla.org Graveyard