Closed
Bug 701049
Opened 13 years ago
Closed 13 years ago
AMO returning "Service Unavailable" -- Zeus errors
Categories
(Infrastructure & Operations Graveyard :: WebOps: Other, task)
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: tofumatt, Assigned: oremj)
References
()
Details
Attachments
(2 files)
clouserw mentioned there were some Zeus errors on AMO yesterday. Today, whilst trying to log in, I was getting REALLY long-running requests that would eventually time out and show the attached error page. After I managed to log in, I'd get it on various pages around the site. My account on the site is under my email "tofumatt@mozilla.com", if that's in the logs anyplace and will help. This has been happening for a good third of the requests I've made to AMO prod this afternoon.
Assignee
Comment 1•13 years ago
Username isn't recorded in the logs. Can you give us the exact time frame?
Assignee
Updated•13 years ago
Assignee: server-ops → oremj
Reporter
Comment 2•13 years ago
Started today (Nov. 9) around 12:30 PM Atlantic Time and continued until at least 1:16:04 PM AST (we're GMT-4). I had a fair number of login problems at ~12:35 PM.
Comment 3•13 years ago
There have been multiple reports this morning in #amo, no specific time frames. You can see some traffic dips from ganglia: https://addons-dev.allizom.org/z/services/graphite/addons Looks like when we get above 2500 concurrent sessions the graphs start to get shaky.
Assignee: oremj → server-ops
Updated•13 years ago
Assignee: server-ops → oremj
Comment 4•13 years ago
There is a pretty steady stream of these errors on AMO prod: OperationalError: (2003, "Can't connect to MySQL server on 'db-amo-ro' (110)")
Reporter
Comment 5•13 years ago
We're also getting tracebacks on Mozillians prod to the tune of: OperationalError: (2003, "Can't connect to MySQL server on 'generic-rw-zeus' (111)")
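For triage, the number in parentheses at the end of each OperationalError message is the OS-level errno from the failed connect. A minimal sketch for pulling it out of log lines, assuming the app servers run Linux (where 110 is ETIMEDOUT and 111 is ECONNREFUSED); the function name and the lookup table are made up for illustration and are not part of AMO or Mozillians:

```python
import re

# Linux errno values. Assumption: the web heads are Linux hosts;
# other platforms number these errors differently.
LINUX_ERRNO = {
    110: "ETIMEDOUT: connection attempt timed out",
    111: "ECONNREFUSED: connection refused",
}

# Matches the text of MySQL client error 2003 as quoted in this bug.
_PATTERN = re.compile(r"Can't connect to MySQL server on '([^']+)' \((\d+)\)")


def describe_connect_error(message):
    """Return (host, errno, meaning), or None if the message does not match."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    host = match.group(1)
    code = int(match.group(2))
    return host, code, LINUX_ERRNO.get(code, "unknown errno")


if __name__ == "__main__":
    samples = [
        """OperationalError: (2003, "Can't connect to MySQL server on 'db-amo-ro' (110)")""",
        """OperationalError: (2003, "Can't connect to MySQL server on 'generic-rw-zeus' (111)")""",
    ]
    for sample in samples:
        print(describe_connect_error(sample))
```

Under that assumption, the AMO errors are connect timeouts while the Mozillians errors are refused connections, so the two may be hitting different failure modes.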
Comment 7•13 years ago
(In reply to Matthew Riley MacPherson [:tofumatt] from comment #5)
> We're also getting tracebacks on Mozillians prod to the tune of:
>
> OperationalError: (2003, "Can't connect to MySQL server on 'generic-rw-zeus'
> (111)")

The bug 701049 has the Mozillians traceback information attached to it.
Comment 9•13 years ago
Quick update: we had Zeus devs looking at our cluster in phx1 for most of yesterday. They discovered some slowness, and issues in their code that might cause slowness, but I don't think they have discovered a root cause yet. In the meantime, we are working on procuring new servers to test Zeus on; these would be hosted outside of the blade environment and have more bandwidth to them. Unfortunately some components are on backorder (~3 weeks) due to the flooding overseas, so this 'solution' is a ways off.
Assignee
Updated•13 years ago
Whiteboard: Waiting on Zeus support.
Assignee
Comment 10•13 years ago
I don't think these severe problems are still happening.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Verified FIXED, as best I can tell.
Status: RESOLVED → VERIFIED
Whiteboard: Waiting on Zeus support.
Updated•11 years ago
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard