Closed
Bug 859822
Opened 11 years ago
Closed 11 years ago
[sumo][stage] celery issues
Categories
(Infrastructure & Operations Graveyard :: WebOps: Other, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: rrosario, Assigned: cturra)
References
Details
See: https://errormill.mozilla.org/support/sumo-stage/
There are a bunch of AMQP related errors, for example:
https://errormill.mozilla.org/support/sumo-stage/group/15341/
[Errno 113] No route to host
This has happened at least 3 times this morning and the site stays broken until a deploy restarts everything.
Reporter
Comment 1•11 years ago
Every time I deploy, things work fine for a bit since everything restarts. But then the errors happen again and the site goes down.
Assignee
Comment 2•11 years ago
i have spent a little time tracking this down and it turns out that support-celery1.stage is currently offline. the esx cluster is reporting the following error for it:
support-celery1.stage | Alert | vSphere HA virtual machine failover failed | vc1.private.phx1.mozilla.com | 4/9/2013 12:23:11 AM
i am working with our sre team to investigate and get it back online.
Assignee: server-ops-webops → cturra
Assignee
Comment 3•11 years ago
last night one of the esx hosts blew up, causing support-celery1.stage to go down. other vms on that esx host were manually recovered, but this one was missed. the sre team is reviewing the monitoring of this node to keep this from happening again. as of now tho, the host is back online and functioning as expected.
*important note: since it's been offline for a bit, please be sure to do another chief deploy to get all your latest code onto this stage node.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 4•11 years ago
(The ESX host failure was bug 859698, for internal cross-reference purposes.)
Updated•11 years ago
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard