Closed Bug 685595 Opened 13 years ago Closed 13 years ago

[addons-dev] Celery queue is extremely backed up.

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)

Platform: All, Other
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jbalogh, Assigned: jason)

Currently there are 16,000 messages in the queue according to ganglia[1]. I don't know why it's backed up so much, but we can start by figuring out how many workers are allotted to each queue and whether we have capacity to add more. This is blocking -dev from functioning normally.

[1]: https://ganglia.mozilla.org/phx1/graph.php?c=Addons%20Stage%20Services&h=services-stage1.addons.phx1.mozilla.com&v=16773&m=rabbit_dev_zamboni_celery_messages_ready&r=hour&z=medium&jr=&js=&st=1315503016&vl=messages&z=large
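For a per-queue view outside of ganglia, the ready counts can also be read from RabbitMQ directly. The following is a minimal sketch, assuming the text comes from `rabbitmqctl list_queues name messages_ready` (tab-separated name and count, plus banner lines). The sample text, the queue names in it, the smaller counts, and the threshold are hypothetical; only the 16,000 figure comes from this report.

```python
def parse_list_queues(output):
    """Parse `rabbitmqctl list_queues name messages_ready` text into {queue: ready_count}."""
    counts = {}
    for line in output.splitlines():
        parts = line.strip().split("\t")
        # Skip banner or header lines and anything that is not "name<TAB>integer".
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        counts[parts[0]] = int(parts[1])
    return counts


def backed_up(counts, threshold=1000):
    """Return (queue, ready_count) pairs above the threshold, largest first."""
    over = [(name, n) for name, n in counts.items() if n > threshold]
    return sorted(over, key=lambda item: item[1], reverse=True)


# Hypothetical sample; queue names and the smaller counts are illustrative only.
SAMPLE = "Listing queues ...\ncelery\t16000\nbulk\t250\ndevhub\t12\n...done.\n"

if __name__ == "__main__":
    print(backed_up(parse_list_queues(SAMPLE)))
```

Keeping the parsing separate from the command invocation means the same function can be pointed at saved output when the broker host is not reachable.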
This is blocking a test run of the SDK upgrade, which has to launch before fx7.
Severity: major → critical
Component: Server Operations: Web Content Push → Server Operations
Assignee: server-ops → jthomas
Here are the current workers per queue:
celeryd-dev - 9
devhub,images - 6
bulk - 6

The affected queue seems to be celeryd-dev, whose worker count was recently raised from 6 to 9 on 9/2.

As per our conversation on IRC, I would like to add additional resources to this host so we can increase the number of celery workers for the celeryd-dev queue. I will update once this is completed.
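As a rough way to judge whether more workers would help, the drain time can be estimated from the backlog and the worker count. This is a back-of-the-envelope sketch only: the 16,000 backlog and the worker counts of 9 and 15 appear in this bug, but the average task duration and the arrival rate below are assumptions, not measurements from this host.

```python
import math


def drain_seconds(backlog, workers, avg_task_seconds, arrivals_per_second=0.0):
    """Estimate seconds to empty a queue; math.inf if workers cannot keep up."""
    service_rate = workers / avg_task_seconds  # tasks completed per second
    net_rate = service_rate - arrivals_per_second
    if net_rate <= 0:
        return math.inf
    return backlog / net_rate


if __name__ == "__main__":
    # avg_task_seconds and arrivals_per_second are assumed values for illustration.
    for workers in (9, 15):
        estimate = drain_seconds(16000, workers, avg_task_seconds=2.0,
                                 arrivals_per_second=1.0)
        print(workers, "workers:", round(estimate / 60, 1), "minutes")
```

The estimate assumes tasks are independent and that the workers are the bottleneck. If the tasks are waiting on a shared resource such as the database, adding workers may not shorten the drain time.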
Depends on: 685769
Additional resources have been added. I am going to keep an eye on this host and see how well it consumes the queue now.
Status: NEW → ASSIGNED
Did you adjust the worker counts?
I stuck a ridiculous number of jobs in the bulk queue to import a bunch of stats. Can we send any spare workers to that queue for now?
(In reply to Jeff Balogh (:jbalogh) from comment #5)
> I stuck a ridiculous number of jobs in the bulk queue to import a bunch of
> stats. Can we send any spare workers to that queue for now?

Okay, I doubled the bulk queue.
(In reply to Jason Thomas [:jason] from comment #6)
> Okay, I doubled the bulk queue.

Did this help? Can we close this out?
(In reply to Corey Shields [:cshields] from comment #7)
> Did this help? Can we close this out?

It cleared out that part, but it looks like the normal queue is still going slowly[1]. Can we reassign those extra bulk workers to the celery queue? And please post the number of workers for each queue.

[1]: https://ganglia.mozilla.org/phx1/graph.php?c=Addons%20Stage%20Services&h=services-stage1.addons.phx1.mozilla.com&v=0&m=rabbit_dev_zamboni_celery_messages_ready&r=week&z=medium&jr=&js=&st=1315787974&vl=messages
Severity: critical → normal
Worker numbers as of now:

celeryd-dev - 15
devhub,images - 6
bulk - 6
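The worker counts above total 27. One possible way to decide how to move spare workers between queues is to split the total in proportion to each queue's backlog while keeping a small floor per queue. This is a hypothetical policy sketch, not how the workers on this host were actually assigned; the `rebalance` helper, the floor of 2, and the backlog figures other than 16,000 are assumptions.

```python
def rebalance(total_workers, backlog, minimum=1):
    """Split total_workers across queues in proportion to backlog.

    Every queue keeps at least `minimum` workers; workers left over after
    rounding down go to the queues with the largest fractional share.
    """
    queues = list(backlog)
    if total_workers < minimum * len(queues):
        raise ValueError("not enough workers to give every queue the minimum")
    spare = total_workers - minimum * len(queues)
    total_backlog = sum(backlog.values())
    if total_backlog == 0:
        shares = {q: spare / len(queues) for q in queues}
    else:
        shares = {q: spare * backlog[q] / total_backlog for q in queues}
    alloc = {q: minimum + int(shares[q]) for q in queues}
    leftover = total_workers - sum(alloc.values())
    by_fraction = sorted(queues, key=lambda q: shares[q] - int(shares[q]),
                         reverse=True)
    for q in by_fraction[:leftover]:
        alloc[q] += 1
    return alloc


if __name__ == "__main__":
    # Backlogs of 0 for the two quieter worker groups are assumed for illustration.
    print(rebalance(27, {"celeryd-dev": 16000, "devhub,images": 0, "bulk": 0},
                    minimum=2))
```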

What kind of jobs are being performed? The queue usually gets backed up when a large number of MySQL UPDATE statements are being executed.
@mpressman - The celery queue usually becomes backed up when a large number of MySQL UPDATE jobs are running. Do you see any issues with dev1.db.phx1.mozilla.com?
The addons_dev db was recently reseeded with a production copy. The reseed didn't run any updates; it batch loaded the entire database.
Going to assume this was fixed. Reopen if that is not the case.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Component: Server Operations → Server Operations: Web Operations
QA Contact: mrz → cshields
Resolution: --- → FIXED
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard