Closed
Bug 743976
Opened 12 years ago
Closed 12 years ago
"Lost connection to MySQL server during query" errors on some buildbot masters
Categories
(Data & BI Services Team :: DB: MySQL, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bhearsum, Assigned: cshields)
Details
(Whiteboard: [buildduty][buildmasters])
While doing some normal work, I tried to load three buildbot masters (buildbot-master07, 08, and 12). All three of them hung for about 10 minutes, and then came back with a long traceback ending in:
<class '_mysql_exceptions.OperationalError'>: (2013, 'Lost connection to MySQL server during query')
It's hard to tell what impact this is having, since I can't load the buildbot master webpages, but it could be a tree closing event.
Comment 1•12 years ago
More debugging info: The pattern seems to be that the buildbot servers *outside* of scl3 are the ones having the issue. All the servers are configured to use tm-b01-master01.mozilla.org. The ones in sjc1 and scl1 started showing errors at 03:00 PDT. Telnetting to the mysql port from a machine that is throwing errors does connect.
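For anyone repeating the telnet check from several masters, here is a minimal Python sketch of the same test. It only confirms that a TCP connection to the MySQL port opens; it says nothing about whether a query on that connection would survive. The function name `can_connect` and the default port 3306 are assumptions for illustration, not something taken from the buildbot configuration.

```python
import socket


def can_connect(host, port=3306, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    This mirrors "telnet host port": it checks reachability only and does not
    speak the MySQL protocol. Port 3306 is the usual MySQL default (assumed here).
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Usage would be along the lines of `can_connect("tm-b01-master01.mozilla.org")` run from each affected master.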
Comment 2•12 years ago
buildbot-master10 started showing a different error at 03:07: _mysql_exceptions.OperationalError: (2006, 'MySQL server has gone away')
Reporter
Comment 3•12 years ago
Given the error pattern, the only guess I have is that maybe we're hitting maximum connections per host or something? bm07/bm08/bm12 are all build masters, which are probably doing l10n nightlies right now - which is one of our busiest periods of the day.
Reporter
Comment 4•12 years ago
I haven't been able to repro the web interface symptom at the moment on two of the scl1 test masters, bm04 and bm06, but I see some errors on bm06 in the last hour.
Assignee
Comment 5•12 years ago
There was an existing 5-minute session timeout in Zeus that appeared to be tripped here. I took out all timeouts and all per-client limitations in Zeus, and these problems have gone away.
Assignee: server-ops-database → cshields
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
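The fix recorded above was made on the load balancer side. As a complementary illustration only, and not something that was done for this bug, a client can also tolerate a proxy dropping an idle session by reconnecting when it sees error 2013 or 2006 (the two codes quoted in this bug). The sketch below is hypothetical: `run_with_reconnect`, `is_lost_connection`, and the `connect`/`query` callables are names introduced here, and the check relies only on the error code being the first element of the exception's `args`, as in the tracebacks quoted above.

```python
import time

# Error codes quoted in this bug:
# 2006 = "MySQL server has gone away", 2013 = "Lost connection to MySQL server during query"
LOST_CONNECTION_CODES = {2006, 2013}


def is_lost_connection(exc):
    """Return True if the exception's first argument is one of the lost-connection codes."""
    args = getattr(exc, "args", ())
    return bool(args) and args[0] in LOST_CONNECTION_CODES


def run_with_reconnect(connect, query, retries=1, delay=0.0):
    """Call query(conn); on a lost-connection error, reconnect and try again.

    `connect` is a zero-argument callable returning a new connection and
    `query` is a callable taking that connection (both hypothetical).
    Any other error, or a lost connection on the final attempt, is re-raised.
    """
    conn = connect()
    for attempt in range(retries + 1):
        try:
            return query(conn)
        except Exception as exc:
            if not is_lost_connection(exc) or attempt == retries:
                raise
            time.sleep(delay)
            conn = connect()  # the old session is assumed dead; open a fresh one
```

Retrying is only safe for statements that can be repeated without side effects, so a wrapper like this suits read queries better than writes.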
Updated•12 years ago
Status: RESOLVED → UNCONFIRMED
Ever confirmed: false
Resolution: FIXED → ---
Whiteboard: [buildduty][buildmasters]
Comment 6•12 years ago
Reducing to major. I haven't seen the error yet; will mark as confirmed if that holds true for the next 30 or so minutes.
Severity: critical → major
Comment 7•12 years ago
bah - did not realize that bugzilla reset the resolved flag - fixing and it's confirmed also :)
Status: UNCONFIRMED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Updated•10 years ago
Product: mozilla.org → Data & BI Services Team