Closed Bug 901971 Opened 12 years ago Closed 12 years ago

mdn: errant celery behaviour (overloaded?)

Categories

(Infrastructure & Operations Graveyard :: WebOps: Engagement, task, P3)

x86
macOS

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: groovecoder, Assigned: dmaher)

References

Details

https://errormill.mozilla.org/mdn/mdn/group/61571/ is causing a bunch of production ISEs. Has something changed with our developer-prod celery node? Why is the 'celery' queue gone? Please join #mdndev to debug with us.
Assignee: server-ops-webops → ludovic
I'm not entirely sure what caused this, but we restarted celery (on developer-celery1.webapp.scl3) as well as rabbitmq (on rabbit1/2.webapp.scl3) and the problem is resolved.
While troubleshooting this, we discovered that this rabbit queue is actually *very* large: over 700k messages. We cleared it, but the problem seems to be that every celery task generates, on average, one more celery task! Thus the incoming rate is always higher than the outgoing rate.
Recently (last week) the volume of celery tasks increased greatly: ElasticSearch indexing and building JSON metadata became celery tasks. Combined, these now account for 99%+ of all celery jobs processed, according to the log on developer-celery1.webapp.scl3. That node is constantly near-max on CPU usage.
The current speculation is that an ES reindex (a celery task now) kicks off a JSON build (a celery task), which kicks off a render (not celery), which kicks off a reindex task (back to celery, repeat). ... or something like that.
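To illustrate why the queue can never drain under that theory, here is a rough back-of-the-envelope model. It is only a sketch: the function name, the tick-based model, and all the numbers are made up for illustration and are not measurements from developer-celery1 or the real MDN task code. It assumes each processed task enqueues a fixed number of follow-up tasks.

```python
def backlog_after(ticks, arrivals_per_tick, capacity_per_tick, spawn_per_task, backlog=0):
    """Return the queue backlog after `ticks` rounds of processing.

    All parameters are hypothetical illustration values, not real MDN figures.
    """
    for _ in range(ticks):
        # New work arriving from outside (e.g. document saves).
        backlog += arrivals_per_tick
        # The worker can only handle so many tasks per tick.
        processed = min(backlog, capacity_per_tick)
        # Each processed task leaves the queue but enqueues its follow-ups.
        backlog += processed * spawn_per_task - processed
    return backlog


# Every task spawns one more task: processing never shrinks the queue,
# so the backlog grows by the arrival rate on every tick.
print(backlog_after(1000, arrivals_per_tick=10, capacity_per_tick=100, spawn_per_task=1))  # 10000

# Tasks spawn nothing: the same worker keeps up easily.
print(backlog_after(1000, arrivals_per_tick=10, capacity_per_tick=100, spawn_per_task=0))  # 0
```

In this model, with one follow-up per task, spare worker capacity does not help: the backlog only shrinks if the reindex/build/render cycle is broken somewhere.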
Assignee: ludovic → dmaher
Severity: blocker → major
Component: Server Operations: Web Operations → WebOps: Engagement
Priority: -- → P3
Product: mozilla.org → Infrastructure & Operations
Depends on: 902177
See Also: → 901960
Summary: mdn: no queue 'celery' in vhost 'developer-prod' → mdn: errant celery behaviour (overloaded?)
Hello, is this bug still valid? If so, I would like to assemble the list of specific, actionable items required in order to see it closed. If not, I'll just close it off for now. Thanks!
Flags: needinfo?(lcrouch)
This one is done. I also filed bug 908204 so we can try to stay on top of this ourselves.
Status: NEW → RESOLVED
Closed: 12 years ago
Flags: needinfo?(lcrouch)
Resolution: --- → FIXED
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard