Closed Bug 762589 Opened 13 years ago Closed 13 years ago

Generic MySQL Cluster - affecting wikimo, mozillians, basket, firefoxflicks, moztrap, tbpl, others

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)

x86
macOS
task
Not set
blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jlong, Assigned: bburton)

Details

These errors are coming in:

OperationalError: (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0")

The site is basket.mozilla.org. I bet other things are failing too.
Severity: normal → blocker
Summary: mysql errors flooding in from basket.mozilla.org → mysql errors flooding in from basket.mozilla.com
We did a 10-min maintenance on the databases, but the VIPs should have worked to fail over properly. The maintenance is over, but I will check to see why there were problems.
Been discussing with :sheeri, will update with RFO when further info is available
Assignee: server-ops → bburton
Status: NEW → ASSIGNED
I double-checked - the grants are the same. I'd verified that read_only was OFF on generic2 before we did the maintenance.
It *seems* like the errors have stopped, but let's keep this open for a little bit longer to make sure.
(In reply to James Long (:jlongster) from comment #4)
> It *seems* like the errors have stopped, but let's keep this open for a
> little bit longer to make sure.

I saw tracebacks from flicks and mozillians; both stopped at 10:54.

:sheeri and :dumutri are reviewing things to determine why the failover wasn't more seamless, as it should have been.
Still getting flooded with these emails:

OperationalError at /subscriptions/subscribe/
(2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0")

Request Method: POST
Request URL: http://basket.mozilla.com/subscriptions/subscribe/
Django Version: 1.3 pre-alpha
Exception Type: OperationalError
Exception Value: (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0")
Summary: mysql errors flooding in from basket.mozilla.com → Generic MySQL Cluster - affecting wikimo, mozillians, basket, firefoxflicks, moztrap, tbpl, others
There was a second round of connection errors, but things should be good now. RFO still forthcoming.
Ah, the problem was that we hadn't configured the failover server. That is on purpose, but I misread the config and thought it was OK to fail over.
A certain amount of "connection lost" will occur in the event of a DB failover for this cluster. Future work will be scheduled for off hours for the time being.

The additional "connection lost" round was due to troubleshooting and a misunderstanding of the current config of this DB cluster, per comment 8.

Please re-open if further issues are seen.
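
Since some "connection lost" errors are expected during a failover on this cluster, an application could ride out short blips by retrying the connection. The following is only a minimal sketch of that idea, not what any of the affected sites actually do. The helper names (`is_lost_connection`, `retry_on_lost_connection`) and the retry parameters are hypothetical; the only detail taken from this bug is the error code 2013 as the first element of the exception's args.

```python
import time

# MySQL client error code seen in this bug: (2013, "Lost connection to MySQL server ...")
CR_SERVER_LOST = 2013


def is_lost_connection(exc):
    """Return True if exc carries 2013 as its first argument."""
    return bool(exc.args) and exc.args[0] == CR_SERVER_LOST


def retry_on_lost_connection(operation, error_cls, attempts=3, delay=0.5,
                             sleep=time.sleep):
    """Call operation(), retrying only when it raises error_cls with code 2013.

    error_cls is the driver's OperationalError class (an assumption: the
    caller passes it in, so this sketch does not import any MySQL driver).
    The wait grows linearly: delay, 2 * delay, and so on.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except error_cls as exc:
            # Re-raise unrelated errors at once, and give up on the last attempt.
            if not is_lost_connection(exc) or attempt == attempts:
                raise
            sleep(delay * attempt)
```

With a driver such as MySQLdb, `error_cls` would presumably be `MySQLdb.OperationalError` and `operation` a function that opens the connection. Retrying is safest around the connect step itself, which is where the errors in this bug occurred ('reading initial communication packet'); retrying a write that may already have been sent needs more care.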
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard