Closed
Bug 613252
Opened 14 years ago
Closed 14 years ago
SUMO production is in trouble
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jsocol, Assigned: phong)
Details
Seeing a Zeus-style "Service Unavailable" page at http://support.mozilla.com/. Also not seeing any metrics at http://nm-dash01.nms.mozilla.org/ganglia/?c=sumo&m=load_one&r=hour&s=descending&hc=4&mc=2, and the SUMO graphs on http://nm-dash01.nms.mozilla.org/ganglia/ cut off maybe 8 or 10 minutes ago.
Comment 1•14 years ago (Assignee)
It should be back now. I thought the site could handle taking the webheads down for a RAM upgrade, but it couldn't. We'll have to schedule downtime next time around.
Assignee: server-ops → phong
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Comment 2•14 years ago
Phong - can you talk to mrz about taking stuff down next time? He's supposed to announce stuff like that. Thanks for trying either way.
Comment 3•14 years ago
We should have enough machines so you can take one host down without affecting the site. This tells me we have N when we should have N+1. Filed bug 613323 to track.
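The N versus N+1 point can be sketched as a quick capacity check. This is only an illustration: the function names and the traffic figures below are hypothetical and are not taken from the SUMO cluster.

```python
import math


def hosts_needed(peak_rps: float, per_host_rps: float) -> int:
    """Minimum number of hosts (N) required to carry the peak load."""
    return math.ceil(peak_rps / per_host_rps)


def survives_one_down(total_hosts: int, peak_rps: float, per_host_rps: float) -> bool:
    """True if the pool still covers peak load with one host removed (N+1)."""
    return total_hosts - 1 >= hosts_needed(peak_rps, per_host_rps)


# Hypothetical numbers: 900 req/s at peak, 300 req/s per webhead.
print(survives_one_down(3, 900, 300))  # exactly N hosts
print(survives_one_down(4, 900, 300))  # N+1 hosts
```

With exactly N hosts, pulling one webhead for maintenance leaves the pool short of peak capacity; with N+1 the same maintenance can happen without a scheduled downtime.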
Comment 4•14 years ago (Reporter)
Matthew, would you please cc me on bug 613323? If possible, it would also be nice to move our RabbitMQ and celeryd instances off a web server so that traffic spikes won't cause a snowball effect with celery.
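As a rough sketch of what that separation could look like on the application side, the web servers would point Celery at a broker running on a dedicated host instead of a local one. The hostname, credentials, and vhost below are hypothetical, and the setting names are those of current Celery releases, not necessarily the version SUMO ran at the time.

```python
# celeryconfig.py (sketch; every value here is an assumption for illustration)

# RabbitMQ lives on its own machine rather than on a webhead, so a traffic
# spike on the web tier does not compete with the broker for CPU and RAM.
broker_url = "amqp://sumo:changeme@queue01.example.internal:5672/sumo"

# Worker processes would likewise run on a non-web host; cap their number so
# a backlog of tasks cannot exhaust that machine.
worker_concurrency = 4
```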
Comment 5•14 years ago
I opened that bug up. I'll do whatever you guys think is best - I don't know the whole system arch as well as you or oremj/fox2mike/.
Updated•9 years ago
Product: mozilla.org → mozilla.org Graveyard