Closed
Bug 901971
Opened 12 years ago
Closed 12 years ago
mdn: errant celery behaviour (overloaded?)
Categories
(Infrastructure & Operations Graveyard :: WebOps: Engagement, task, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: groovecoder, Assigned: dmaher)
References
Details
https://errormill.mozilla.org/mdn/mdn/group/61571/ is causing a bunch of production ISEs. Has something changed with our developer-prod celery node? Why is the 'celery' queue gone?
Please join #mdndev to debug with us.
Updated•12 years ago
Assignee: server-ops-webops → ludovic
Comment 1•12 years ago
I'm not entirely sure what caused this, but we restarted celery (on developer-celery1.webapp.scl3) as well as rabbitmq (on rabbit1/2.webapp.scl3) and the problem is resolved.
While troubleshooting this, we discovered that this rabbit queue is actually *very* large... over 700k messages. We cleared it, but the problem seems to be that every celery task generates an average of 1 more celery task! Thus the incoming rate is always higher than the outgoing rate.
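To illustrate why the queue cannot drain under those conditions, here is a minimal sketch of the arithmetic. It is not taken from the MDN code or from the actual traffic numbers: the function name `simulate_queue` and all rates below are hypothetical, chosen only to show the shape of the problem. It assumes that, on each tick, some externally submitted tasks arrive, workers process up to a fixed capacity, and each processed task enqueues a fixed number of follow-up tasks.

```python
def simulate_queue(initial_depth, external_per_tick, capacity_per_tick,
                   spawn_per_task, ticks):
    """Return the queue depth at the end of each tick.

    All parameters are illustrative assumptions, not measured values.
    """
    depth = initial_depth
    history = []
    for _ in range(ticks):
        # New work submitted from outside the task system.
        depth += external_per_tick
        # Workers process as much as they can, up to their capacity.
        processed = min(depth, capacity_per_tick)
        # Every processed task leaves the queue but enqueues follow-up tasks.
        depth = depth - processed + processed * spawn_per_task
        history.append(depth)
    return history


# Each task spawns one more task: the backlog grows by the external
# arrival rate on every tick, even though workers have spare capacity.
print(simulate_queue(0, 100, 500, 1, 5))

# No task spawns a follow-up: the same workers keep the queue empty.
print(simulate_queue(0, 100, 500, 0, 5))
```

Under these assumptions, adding worker capacity does not help as long as each task replaces itself in the queue; only lowering the spawn rate below 1 lets the backlog shrink.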
Recently (last week) the volume of celery tasks increased greatly. ElasticSearch indexing and building JSON metadata became celery tasks. Combined, these now account for 99%+ of all celery jobs processed, according to the log on developer-celery1.webapp.scl3. That node is constantly near max CPU usage.
The current speculation is that an ES reindex (a celery task now) kicks off a JSON build (a celery task), which kicks off a render (not celery), which kicks off a reindex task (back to celery, repeat).
... or something like that.
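The speculated loop can be sketched with a plain in-memory queue. This is only a model of the hypothesis above, not the real MDN task code: the task names (`reindex`, `build_json`), the `render` helper, and the `break_cycle` flag are all hypothetical, and a `deque` stands in for the rabbit queue.

```python
from collections import deque


def run(doc_id, break_cycle, max_tasks=20):
    """Run queued tasks until the queue is empty or max_tasks have executed.

    Returns (names of executed tasks, number of tasks still queued).
    """
    queue = deque([("reindex", doc_id)])
    executed = []

    def render(doc_id):
        # Not a queued task: runs inline, but schedules a reindex when done.
        # With break_cycle set, the render skips that follow-up reindex.
        if not break_cycle:
            queue.append(("reindex", doc_id))

    while queue and len(executed) < max_tasks:
        name, arg = queue.popleft()
        executed.append(name)
        if name == "reindex":
            # A reindex kicks off a JSON build (queued task).
            queue.append(("build_json", arg))
        elif name == "build_json":
            # A JSON build kicks off a render (inline call).
            render(arg)
    return executed, len(queue)


# Without a guard, one reindex keeps regenerating work until the cap is hit.
print(run(1, break_cycle=False, max_tasks=6))

# With the guard, the chain ends after one reindex and one JSON build.
print(run(1, break_cycle=True))
```

If the real code does behave this way, the fix would be somewhere along the chain where a task-triggered render can avoid scheduling another reindex; where exactly is an open question here.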
Comment 2•12 years ago (Assignee)
Assignee: ludovic → dmaher
Severity: blocker → major
Component: Server Operations: Web Operations → WebOps: Engagement
Priority: -- → P3
Product: mozilla.org → Infrastructure & Operations
Updated•12 years ago (Assignee)
See Also: → 901960
Summary: mdn: no queue 'celery' in vhost 'developer-prod' → mdn: errant celery behaviour (overloaded?)
Comment 3•12 years ago (Assignee)
Hello,
Is this bug still valid? If so, I would like to assemble the list of specific, actionable items required in order to see it closed. If not, I'll just close it off for now.
Thanks!
Flags: needinfo?(lcrouch)
Comment 4•12 years ago (Reporter)
This one is done. I also filed bug 908204 so we can try to stay on top of this ourselves.
Status: NEW → RESOLVED
Closed: 12 years ago
Flags: needinfo?(lcrouch)
Resolution: --- → FIXED
Updated•9 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard