Closed Bug 568031 Opened 14 years ago Closed 14 years ago

acelb is refusing connections on most of the vips

Categories

(mozilla.org Graveyard :: Server Operations, task)

Hardware: All
OS: Other
Type: task
Priority: Not set
Severity: blocker

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: paulc, Assigned: dmoore)

Details

Getting a bunch of stack traces from Kitsune with this at the end:

  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/MySQLdb/connections.py", line 188, in __init__
    super(Connection, self).__init__(*args, **kwargs2)

OperationalError: (2003, "Can't connect to MySQL server on '10.2.70.21' (111)")

10.2.70.21 is an ACE VIP; the two back-end servers behind it are both fine. Something's hosed with the ACE.
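
The 111 in the error above is the Linux errno for ECONNREFUSED, i.e. the TCP connection was actively refused rather than timing out. A minimal sketch of how one might tell "refused" apart from "timeout" when comparing the VIP against its back ends is below. The `probe` helper is hypothetical (it is not part of Kitsune or MySQLdb), and port 3306 in the usage note is an assumption based on the MySQL default; the back-end addresses are not given in this bug.

```python
import errno
import socket


def probe(host, port, timeout=3.0):
    """Attempt one TCP connect and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <ERRNO NAME>".
    Hypothetical helper for illustration only.
    """
    try:
        # create_connection performs the TCP handshake; the context
        # manager closes the socket as soon as the connect succeeds.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Same condition MySQLdb reported as (111) in the traceback.
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return "error: %s" % name
```

For example, `probe("10.2.70.21", 3306)` returning "refused" while the same call against each back-end server returns "open" would be consistent with the load balancer, not MySQL, rejecting the connection.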

Nagios seems to agree: most of the HTTP VIPs are dead, too.

09:39:14 <@nagios> [15] static-redirect.nslb.sj:https - update.mozilla.org is CRITICAL: Connection refused
09:39:26 <@nagios> [18] mdc.acelb.sj:http_string - developer.mozilla.org is CRITICAL: Connection refused
09:39:51 <@nagios> [20] wild.add-ons.nslb.sj:en-us.add-ons.mozilla.com - string Themes2 is CRITICAL: Connection refused
09:39:58 <@nagios> [22] static-redirect.nslb.sj:http - www.mozilla-world.org is CRITICAL: Connection refused
09:40:49 <@nagios> [27] addons.acelb.sj:addons.mozilla.org - string Themes1 is CRITICAL: Connection refused
09:40:55 <@nagios> [29] hudson.acelb.sj:https_string - hudson.mozilla.org is CRITICAL: Connection refused
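
To see at a glance which load balancers the alerts point at, the Nagios lines above can be grouped by the balancer label in the check's host name. This is only a sketch: the regular expression is inferred from the six lines quoted here, and the assumption that the second-to-last dotted label (`acelb`, `nslb`) names the balancer is mine, not something the bug states.

```python
import re
from collections import Counter

# Pattern inferred from the quoted IRC lines; not an official Nagios format.
ALERT_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}) <@nagios> \[(?P<id>\d+)\] "
    r"(?P<host>[^:\s]+):(?P<service>.+?) is "
    r"(?P<state>CRITICAL|WARNING|OK|UNKNOWN): (?P<detail>.*)$"
)


def balancer_of(host):
    """Return the second-to-last dotted label, e.g. 'mdc.acelb.sj' -> 'acelb'.

    Assumes that label identifies the load balancer.
    """
    parts = host.split(".")
    return parts[-2] if len(parts) >= 2 else host


def count_critical(lines):
    """Count CRITICAL alerts per balancer label; unparsable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = ALERT_RE.match(line.strip())
        if match and match.group("state") == "CRITICAL":
            counts[balancer_of(match.group("host"))] += 1
    return counts
```

Run over the six lines quoted above, `count_critical` gives three alerts for `acelb` and three for `nslb`.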
Assignee: server-ops → dmoore
Summary: SUMO - mysql connection is down → acelb is refusing connections on most of the vips
This was related to excessive connection queues caused by today's poor network performance. The ACE had backed up to over 4 million concurrent sessions, after which it began to refuse connections.

The problem was resolved once our network bottleneck was removed.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
We saw a few of these trickle in overnight, but just now saw a large group come in. I'm guessing it's more network performance problems causing queues to fill up on the ACE?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 568591
We've been addressing performance issues in the ACE today. We've moved some of our load to both Phoenix and secondary load balancers to compensate. ACE performance is acceptable after these changes. Moving forward, we'll be migrating more services off of the ACE to prevent this from recurring in the future.
Status: REOPENED → RESOLVED
Closed: 14 years ago → 14 years ago
Resolution: --- → FIXED
(In reply to comment #4)
> We've been addressing performance issues in the ACE today. We've moved some of
> our load to both Phoenix and secondary load balancers to compensate. ACE
> performance is acceptable after these changes. Moving forward, we'll be
> migrating more services off of the ACE to prevent this from recurring in the
> future.

Thanks Derek. Keep us updated if you can.
Verified FIXED; haven't seen these tracebacks in my inbox since.
Status: RESOLVED → VERIFIED
Product: mozilla.org → mozilla.org Graveyard