Closed Bug 852380 Opened 12 years ago Closed 12 years ago

buildbot2.db.scl3 issues 2013-03-18

Categories

(Data & BI Services Team :: DB: MySQL, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: u429623, Assigned: rbryce)

References

Details

(Whiteboard: [reit-ops] [closed-trees])

Bug filed after the fact to capture the db crash and related fallout. Assigned to rbryce, as he handled it and can fill in more specifics.
From the RelEng viewpoint, the cascade was:
* mysql blew up and messed up the buildbot schedulers and buildapi, leading to tree closure.
* Restart load may have contributed to the HG issues in bug 852376.
The mysql nagios alert was cleared by a kernel module reload by rbryce.
Assignee: server-ops-database → rbryce
Blocks: 852351
No longer depends on: 852376
What could prevent mysql from blowing up and the trees from getting closed? Thanks!
The be2net driver crashed, causing the network device on the server to shut down. I quickly reloaded the module and restarted networking. This driver crash is a known issue across many of our older HP blades. I believe RHEL is still working on a fix, bug 831054. I'm not sure the Hg load spike was caused by this network outage on buildbot2: the first load spike came and went before the buildbot2.db outage. With this in mind, we (IT) have not been able to recreate the circumstances that led to the be2net driver crash. It may have been increased traffic from Hg that caused the driver to crash.
Do you know what a long-term solution could be, so that something like this does not bring down continuous integration? Thanks for the debrief!
In fact, dbas were not even paged about this issue. It being 100% network device related, MySQL stayed up (it's been up since 1/4 at 6:59 am Pacific time), so there was no worry of corruption or anything.
Armen - the long-term solution is to get a proper fix from RHEL.
Adding explicit link to the RHEL bug 831054
Depends on: 831054
Closing this out. The immediate issue is over, and there's an audit planned in Q3 of all machines: https://bugzilla.mozilla.org/show_bug.cgi?id=883228
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Product: mozilla.org → Data & BI Services Team