Closed Bug 701049 Opened 13 years ago Closed 13 years ago

AMO returning "Service Unavailable" -- Zeus errors

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)

Platform: All
OS: Other
Type: task
Priority: Not set
Severity: normal
Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: tofumatt, Assigned: oremj)

References

()

Details

Attachments

(2 files)

Attached image Screenshot of error
clouserw mentioned there were some Zeus errors on AMO yesterday. Today, whilst trying to log in I was getting REALLY long-running requests that would eventually time out and show the error page attached. After I managed to log in, I'd get it on various pages around the site.

My account on the site is under my email "tofumatt@mozilla.com" if that's in the logs anyplace and will help. This has been happening for a good third of the requests I've made to AMO prod this afternoon.
Username isn't recorded in the logs. Can you give us the exact time frame?
Assignee: server-ops → oremj
Started today (Nov. 9) around 12:30pm Atlantic Time and continued until at least 1:16:04 PM AST. (We're GMT-4.)

I had a fair number of login problems at ~12:35pm.
There have been multiple reports this morning in #amo, but no specific time frames. You can see some traffic dips in Ganglia: https://addons-dev.allizom.org/z/services/graphite/addons

Looks like when we get above 2500 concurrent sessions the graphs start to get shaky.
Assignee: oremj → server-ops
Assignee: server-ops → oremj
There is a pretty steady stream of these errors on AMO prod:

OperationalError: (2003, "Can't connect to MySQL server on 'db-amo-ro' (110)")
We're also getting tracebacks on Mozillians prod to the tune of:

OperationalError: (2003, "Can't connect to MySQL server on 'generic-rw-zeus' (111)")
(In reply to Matthew Riley MacPherson [:tofumatt] from comment #5)
> We're also getting tracebacks on Mozillians prod to the tune of:
> 
> OperationalError: (2003, "Can't connect to MySQL server on 'generic-rw-zeus'
> (111)")

Bug 701049 has the Mozillians traceback information attached to it.
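(For reference, and not from the bug itself: on Linux, errno 110 is ETIMEDOUT and errno 111 is ECONNREFUSED, so the 'db-amo-ro' failures look like connections being dropped or timing out, while the 'generic-rw-zeus' ones look like the VIP actively refusing connections. Below is a minimal Python sketch, assuming the MySQLdb driver these Django tracebacks come from, of how error 2003 surfaces and how an app might retry around a flaky load balancer; the credentials and database names are hypothetical.)

import time

import MySQLdb  # driver that raises OperationalError (2003, ...)

def connect_with_retry(host, user, passwd, db, attempts=3, timeout=5):
    """Try to connect, backing off briefly between failed attempts."""
    for attempt in range(1, attempts + 1):
        try:
            # connect_timeout keeps a dead VIP from hanging the request
            return MySQLdb.connect(host=host, user=user, passwd=passwd,
                                   db=db, connect_timeout=timeout)
        except MySQLdb.OperationalError as exc:
            # e.g. (2003, "Can't connect to MySQL server on 'db-amo-ro' (110)")
            if exc.args[0] != 2003 or attempt == attempts:
                raise
            time.sleep(2 ** attempt)  # 2s, 4s, ... before the next try

# conn = connect_with_retry('db-amo-ro', 'amo_user', 'secret', 'addons')

Retrying only papers over the symptom, of course; the underlying Zeus issue still needs fixing, as discussed below.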
quick update:

We had Zeus devs looking at our cluster in phx1 for most of yesterday. They found some issues in their code that might cause slowness, but I don't think they have identified a root cause yet.

In the meantime, we are working on procuring new servers to test Zeus on; they would be hosted outside of the blade environment and have more bandwidth available. Unfortunately, some components are on backorder (~3 weeks) due to the flooding overseas, so this 'solution' is a ways off.
Whiteboard: Waiting on Zeus support.
I don't think these severe problems are still happening.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Verified FIXED, as best I can tell.
Status: RESOLVED → VERIFIED
Whiteboard: Waiting on Zeus support.
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard