[addons-dev] Celery queue is extremely backed up.

RESOLVED FIXED

Status

7 years ago
5 years ago

People

(Reporter: jbalogh, Assigned: jason)

(Reporter)

Description

7 years ago
There are currently 16,000 messages in the queue according to ganglia[1]. I don't know why it's backed up so much, but we can start by figuring out how many workers are allotted to each queue and whether we have capacity to add more. This is blocking -dev from functioning normally.

[1]: https://ganglia.mozilla.org/phx1/graph.php?c=Addons%20Stage%20Services&h=services-stage1.addons.phx1.mozilla.com&v=16773&m=rabbit_dev_zamboni_celery_messages_ready&r=hour&z=medium&jr=&js=&st=1315503016&vl=messages&z=large
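For anyone without ganglia access, the same ready-message count can probably be read straight from the broker. A minimal sketch, assuming the text comes from `rabbitmqctl list_queues name messages_ready` (tab-separated output); the sample listing, queue names other than the 16,000 figure, and the threshold below are hypothetical and only illustrate the parsing:

```python
from typing import Dict, List


def parse_ready_counts(listing: str) -> Dict[str, int]:
    """Map queue name -> messages_ready from a tab-separated queue listing."""
    counts: Dict[str, int] = {}
    for line in listing.splitlines():
        parts = line.split("\t")
        # Skip banner, header, and footer lines that are not "name<TAB>count".
        if len(parts) != 2 or not parts[1].strip().isdigit():
            continue
        counts[parts[0]] = int(parts[1])
    return counts


def backed_up(counts: Dict[str, int], threshold: int) -> List[str]:
    """Return the names of queues with more than `threshold` ready messages."""
    return sorted(name for name, ready in counts.items() if ready > threshold)


# Hypothetical sample output; real queue names and counts will differ.
SAMPLE = "Listing queues ...\ncelery\t16000\nbulk\t120\nimages\t0\n...done.\n"

if __name__ == "__main__":
    print(backed_up(parse_ready_counts(SAMPLE), threshold=1000))
```

The parser ignores any line that is not a name and an integer separated by a tab, so it should tolerate the banner lines that different RabbitMQ versions print around the listing.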
This is blocking a test run of the SDK upgrade, which has to launch before fx7.
Severity: major → critical
Component: Server Operations: Web Content Push → Server Operations
Assignee: server-ops → jthomas
(Assignee)

Comment 2

7 years ago
Here are the current workers per queue:
celeryd-dev - 9
devhub,images - 6
bulk - 6

The affected queue appears to be celeryd-dev, whose worker count was recently raised from 6 to 9 on 9/2.

Per our conversation on IRC, I would like to add resources to this host so that we can increase the number of celery workers for the celeryd-dev queue. I will update this bug once that is completed.
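As a rough way to reason about how much extra workers would help, here is a back-of-the-envelope sketch. It assumes every worker processes tasks at the same steady rate and that new tasks arrive at a constant rate; both rates are hypothetical placeholders, since nothing here has measured them. Only the backlog (16,000) and the worker counts (6 and 9) come from this bug.

```python
import math


def drain_seconds(backlog: int, workers: int,
                  per_worker_rate: float, arrival_rate: float = 0.0) -> float:
    """Estimate seconds until the queue is empty.

    per_worker_rate and arrival_rate are in tasks per second. If workers
    cannot keep up with arrivals, the backlog never drains.
    """
    net_rate = workers * per_worker_rate - arrival_rate
    if net_rate <= 0:
        return math.inf
    return backlog / net_rate


if __name__ == "__main__":
    # Assumed: 1 task/sec per worker and 5 new tasks/sec arriving.
    for workers in (6, 9):
        print(workers, drain_seconds(16000, workers, 1.0, arrival_rate=5.0))
```

Under these assumed rates, the net drain rate is what matters: if arrivals are close to total worker throughput, a few extra workers change the drain time a lot, and if the workers are blocked on something else (for example the database), adding more may not help at all.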
(Assignee)

Updated

7 years ago
Depends on: 685769
(Assignee)

Comment 3

7 years ago
Additional resources have been added. I am going to keep an eye on this host and see how well it consumes the queue now.
Status: NEW → ASSIGNED
Did you adjust the worker counts?
(Reporter)

Comment 5

7 years ago
I stuck a ridiculous number of jobs in the bulk queue to import a bunch of stats. Can we send any spare workers to that queue for now?
(Assignee)

Comment 6

7 years ago
(In reply to Jeff Balogh (:jbalogh) from comment #5)
> I stuck a ridiculous number of jobs in the bulk queue to import a bunch of
> stats. Can we send any spare workers to that queue for now?

Okay, I doubled the bulk queue.
(In reply to Jason Thomas [:jason] from comment #6)
> (In reply to Jeff Balogh (:jbalogh) from comment #5)
> > I stuck a ridiculous number of jobs in the bulk queue to import a bunch of
> > stats. Can we send any spare workers to that queue for now?
> 
> Okay, I doubled the bulk queue.

Did this help? Can we close this out?
(Reporter)

Comment 8

7 years ago
(In reply to Corey Shields [:cshields] from comment #7)
> (In reply to Jason Thomas [:jason] from comment #6)
> > (In reply to Jeff Balogh (:jbalogh) from comment #5)
> > > I stuck a ridiculous number of jobs in the bulk queue to import a bunch of
> > > stats. Can we send any spare workers to that queue for now?
> > 
> > Okay, I doubled the bulk queue.
> 
> did this help?  can we close this out?

It cleared out that part, but it looks like the normal queue is still going slowly[1]. Can we reassign those extra bulk workers to the celery queue? And please post the number of workers for each queue.

[1]: https://ganglia.mozilla.org/phx1/graph.php?c=Addons%20Stage%20Services&h=services-stage1.addons.phx1.mozilla.com&v=0&m=rabbit_dev_zamboni_celery_messages_ready&r=week&z=medium&jr=&js=&st=1315787974&vl=messages
Severity: critical → normal
(Assignee)

Comment 9

7 years ago
Worker numbers as of now:

celeryd-dev - 15
devhub,images - 6
bulk - 6

What kind of jobs are being performed? The queue usually gets backed up when a large number of MySQL UPDATE statements are being executed.
(Assignee)

Comment 10

7 years ago
@mpressman - The celery queue usually becomes backed up when a large number of MySQL UPDATE jobs are running. Do you see any issues with dev1.db.phx1.mozilla.com?
The addons_dev db was recently reseeded with a production copy. That didn't run any updates, but it batch loaded the entire database.
Going to assume this was fixed. Reopen if that is not the case.
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago
Component: Server Operations → Server Operations: Web Operations
QA Contact: mrz → cshields
Resolution: --- → FIXED
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations