Closed
Bug 662194
Opened 13 years ago
Closed 13 years ago
[stage] Staging env returns a 500 error
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: mbrandt, Assigned: nmaul)
Details
(Whiteboard: [stage])
Attachments
(1 file: 13.33 KB, image/png)
The staging env appears to be down.
Steps to reproduce:
1. Go to http://input.stage.mozilla.com/
Actual:
"Service Unavailable" message is displayed by the site.
Updated•13 years ago
Assignee: nobody → server-ops
Component: Backend → Server Operations
Product: Input → mozilla.org
QA Contact: backend → mrz
Version: unspecified → other
Updated•13 years ago
Assignee: server-ops → phong
Comment 1•13 years ago
mrapp-stage02 is not responding (nor is its out-of-band management). Phong is en route to check on it.
Updated•13 years ago
Assignee: phong → nmaul
Comment 2•13 years ago (by assignee)
This server is responding again, and we're looking into what happened.
One thing I can tell you, however, is that there appears to be a problem with http://input.stage.mozilla.com/en-US/. When I attempt to visit that page, the Apache process handling my request shoots up to 100% CPU usage and hangs there.
Comment 3•13 years ago (by assignee)
Dropping priority since the server is working again and this is now "just" a problem with input.stage, plus some investigative work.
Severity: blocker → major
Priority: P1 → --
Comment 4•13 years ago
Let me know if it's something strange on our end.
Comment 5•13 years ago (by assignee)
I do believe it is some type of coding issue, but I can't pinpoint it more specifically than just the URL:
http://input.stage.mozilla.com/en-US/
This feels like the page has stuck the server in some type of infinite loop. By that I mean, when someone hits that page, I can watch an Apache worker suddenly jump up to 100% CPU usage and stay there until someone kills it. In a browser, you'll get a Zeus 500 ISE error after 30 seconds or so, but the Apache worker on the server keeps going for as long as 5 minutes (the longest I've seen one run before it was killed manually).
The main Apache error_log has nothing, and the same goes for input.stage.mozilla.com's error_log. I doubt it's emailing you stack traces or anything, either.
I believe what happened is that enough of these got loaded up at once that the server became entirely unresponsive, until the kernel "Out of Memory Killer" killed off one of the offending httpd processes. This happened twice yesterday morning, twice this morning, and once this afternoon (1:59:05pm MV time).
Note that other languages behave the same way (http://input.stage.mozilla.com/es/), but that other pages do work (http://input.stage.mozilla.com/en-US/feedback).
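For anyone wanting to spot the runaway workers described above before the OOM killer does, here is a minimal sketch in Python. It assumes the text comes from `ps -eo pid,pcpu,comm`; the function name, the 95% threshold, and the sample data are all hypothetical and were not captured from mrapp-stage02. Note that `ps` reports CPU time divided by process lifetime, so a worker that only just started spinning may still read below the threshold.

```python
from typing import List, NamedTuple


class Worker(NamedTuple):
    pid: int
    cpu_percent: float
    command: str


def find_runaway_workers(ps_output: str, command: str = "httpd",
                         cpu_threshold: float = 95.0) -> List[Worker]:
    """Return processes named `command` at or above `cpu_threshold` percent CPU.

    `ps_output` is expected to be the text of `ps -eo pid,pcpu,comm`:
    a header line followed by one process per line.
    """
    runaways = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue  # ignore malformed lines
        pid, cpu, name = fields
        if name.strip() == command and float(cpu) >= cpu_threshold:
            runaways.append(Worker(int(pid), float(cpu), name.strip()))
    return runaways


# Made-up sample for illustration only (not real output from the stage host).
SAMPLE = """\
  PID %CPU COMMAND
 1201  0.3 httpd
 1202 99.8 httpd
 1310  1.2 sshd
"""

if __name__ == "__main__":
    for worker in find_runaway_workers(SAMPLE):
        print(worker.pid, worker.cpu_percent)
```

Something like this could run from cron and only report the PIDs; whether to kill them automatically is a separate decision.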
Comment 6•13 years ago
This happened again at ~0815 PDT: the host ran out of RAM and swap and had to be rebooted to come back up. It was extremely busy OOM-killing processes and took about 20-30 minutes to reboot.
Comment 8•13 years ago (by assignee)
Considering this a solved code issue: the features on input.stage.mozilla.com suspected of causing this server-wide issue have been removed, and will not be re-enabled unless reimplemented some other way. Thanks all!
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment 9•13 years ago (by reporter)
Thanks for the attention. Stage seems to be fantastic again and is no longer going down with ISEs.
Status: RESOLVED → VERIFIED
Updated•10 years ago
Product: mozilla.org → mozilla.org Graveyard