Closed Bug 903047 Opened 12 years ago Closed 12 years ago

Service unavailable timeouts on stage

Categories

(Infrastructure & Operations Graveyard :: WebOps: Socorro, task, P1)

x86
macOS

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: lonnen, Assigned: bburton)

Details

(Whiteboard: [service interrupt])

QA has been unable to verify bugs because of an inordinate number of service unavailable timeouts in our stage environment. This happens somewhere outside of our machines -- the errors don't show up in our Apache logs. Zeus, maybe?
Assignee: server-ops-webops → bburton
Priority: -- → P1
Whiteboard: [service interrupt]
I did some log spelunking and found that "https://crash-stats.allizom.org/topcrasher/products/Firefox/versions/26.0a1?days=7" was logging a 500 error in the access_log. When I tried it in a browser, it returned a Zeus timeout page after a few seconds. I reviewed the Zeus timeouts: the virtual server for crash-stats.allizom.org was set to 10 seconds, which is shorter than the time URLs like the one above take to return data. I increased it to 100 seconds, and https://crash-stats.allizom.org/topcrasher/products/Firefox/versions/26.0a1?days=7 is now returning content. It was occasionally returning a 500 from crashstats itself, but that went away after I restarted memcache. I believe the main issue of this bug has been addressed, and https://bugzilla.mozilla.org/show_bug.cgi?id=901159 should address the other.
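For anyone wanting to re-check this kind of mismatch, here is a minimal sketch of how one might time a slow URL and compare it against the load balancer's timeout. It does not use any Zeus API; the function names (time_request, exceeds_proxy_timeout) and the injectable fetch parameter are hypothetical and exist only for illustration. The 10 and 100 second values are the before and after settings from the comment above.

```python
import time
import urllib.error
import urllib.request


def time_request(url, fetch=None, client_timeout=120.0):
    """Return (http_status, elapsed_seconds) for a single GET of url.

    fetch is a hypothetical hook: any callable taking a URL and returning
    an HTTP status code. It defaults to a plain urllib GET.
    """
    if fetch is None:
        def fetch(target):
            try:
                with urllib.request.urlopen(target, timeout=client_timeout) as resp:
                    resp.read()  # wait for the full body, as a browser would
                    return resp.status
            except urllib.error.HTTPError as err:
                # 4xx/5xx responses are raised as exceptions; keep the code
                return err.code

    start = time.monotonic()
    status = fetch(url)
    return status, time.monotonic() - start


def exceeds_proxy_timeout(elapsed_seconds, proxy_timeout_seconds):
    """True if the proxy would have given up before the backend answered."""
    return elapsed_seconds >= proxy_timeout_seconds


# Example use (performs a real request, so it is left commented out):
# status, elapsed = time_request(
#     "https://crash-stats.allizom.org/topcrasher/products/Firefox/versions/26.0a1?days=7")
# print(status, round(elapsed, 1), exceeds_proxy_timeout(elapsed, 10))
```

Timing the request directly against a backend, bypassing the load balancer, would show how long the application really takes; any elapsed time at or above the virtual server's timeout would surface to users as a timeout page rather than as an error in the application's own logs.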
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard