Closed Bug 1064457 Opened 10 years ago Closed 10 years ago

SUMO -dev and -stage: flush rabbitmq queues, verify celery is processing tasks

Categories

(Infrastructure & Operations Graveyard :: WebOps: Community Platform, task)

task
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rrosario, Assigned: cturra)

References

Details

(Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/1228] )

Possibly related: bug 1049772

We have charts of our rabbitmq queue sizes:
https://graphite-phx1.mozilla.org/render/?width=586&height=308&_salt=1410201310.704&target=stats.gauges.sumo-dev.rabbitmq.size&target=stats.gauges.sumo-stage.rabbitmq.size&target=stats.gauges.sumo.rabbitmq.size

The -dev and -stage queues seem to just keep going up or stay stuck.

1- Can we flush them?
2- Can we verify that celery is pulling tasks off?

I'll be able to do #2 once #1 happens, by generating a bunch of reindexing tasks.
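For reference, a minimal way to check both things from the broker host might look like the following sketch (the vhost, queue name, and manage.py invocation are assumptions, not taken from this bug):

  # show queue depth and consumer counts on the RabbitMQ host (vhost name assumed)
  rabbitmqctl list_queues -p /sumo_dev name messages consumers

  # ask the running celery workers what they are processing (django-celery era invocation assumed)
  python manage.py celery inspect active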
Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/1228]
Bumping to major because this causes test failures on staging.
Severity: normal → major
OS: Mac OS X → All
Hardware: x86 → All
i have purged the rabbit queues in both dev and stage. is this work related to what was discussed in bug 1050437?
Assignee: server-ops-webops → cturra
Flags: needinfo?(rrosario)
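A per-queue purge like the one described above can be done from the broker host with rabbitmqctl (RabbitMQ 3.2+); the queue and vhost names below are assumptions:

  # drop all pending messages from the celery queue on the dev vhost (names assumed)
  rabbitmqctl purge_queue -p /sumo_dev celery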
(In reply to Chris Turra [:cturra] from comment #2)
> i have purged the rabbit queues in both dev and stage. is this work related
> to what was discussed in bug 1050437?

That bug mentioned the queues were purged, but I'm not sure that really worked. The graphs show the queues down to zero now, so we'll monitor and make sure they don't grow out of control again.
Flags: needinfo?(rrosario)
(In reply to Chris Turra [:cturra] from comment #2)
> i have purged the rabbit queues in both dev and stage. is this work related
> to what was discussed in bug 1050437?

I don't see celery tasks working. Is there a way to check that out on your end? Are they pulling tasks off the queue?
(In reply to Ricky Rosario [:rrosario, :r1cky] from comment #4)
> 
> I don't see celery tasks working. Is there a way to check that out on your
> end? Are they pulling tasks off the queue?

it looks like celery cannot start. the logs show the following for a service restart attempt:

 Traceback (most recent call last):
  File "/data/www/support.allizom.org/kitsune/manage.py", line 9, in <module>
    from django.conf import settings
 ImportError: No module named django.conf
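That ImportError points at the celery service being started outside a virtualenv (or PYTHONPATH) that has Django installed. A quick sanity check along these lines could confirm it; the virtualenv path here is an assumption:

  # verify Django is importable from the interpreter the service is supposed to use (path assumed)
  /data/www/support.allizom.org/venv/bin/python -c "import django; print(django.get_version())"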
(In reply to Chris Turra [:cturra] from comment #5)
> (In reply to Ricky Rosario [:rrosario, :r1cky] from comment #4)
> > 
> > I don't see celery tasks working. Is there a way to check that out on your
> > end? Are they pulling tasks off the queue?
> 
> it looks like celery cannot start. the logs show the following for a service
> restart attempt:
> 
>  Traceback (most recent call last):
>   File "/data/www/support.allizom.org/kitsune/manage.py", line 9, in <module>
>     from django.conf import settings
>  ImportError: No module named django.conf

Oy. Stage is in a weird state right now where we are trying to switch to peep for deploying the vendor library, so it could be related to that.

Can you look at support-dev?
celery in dev seems to be running fine and processing jobs as expected (at least i don't see any indication of queue growth or errors).
(In reply to Chris Turra [:cturra] from comment #7)
> celery in dev seems to be running fine and processing jobs as expected (at
> least i don't see any indication of queue growth or errors).

I just triggered 15 tasks on -dev. I expected them to start right away given the empty queue, but they don't seem to be starting.
to be extra certain, i restarted the celery process on the dev instance. looks like it immediately picked up the tasks you submitted and processed them. can you confirm you saw that also?
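A restart-and-verify sequence like the one described here might look roughly as follows; the init script name and log path are assumptions:

  # restart the dev celery workers and watch them pick up queued tasks (names assumed)
  sudo /etc/init.d/celeryd-support-dev restart
  tail -f /var/log/celeryd/support-dev.log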
(In reply to Chris Turra [:cturra] from comment #9)
> to be extra certain, i restarted the celery process on the dev instance.
> looks like it immediately picked up the tasks you submitted and processed
> them. can you confirm you saw that also?

Yep, it worked. The issue on stage is on our end. We'll be following up with a WebOps bug to change some things around so we can use the peep virtualenv. Thanks!
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
N.B.: I've bumped the number of celery worker processes in dev down from 32 to 24.

I chatted briefly with Chris about SUMO celery dev, since he will be away at TRIBE for the next couple of days. He noted that, when the celery jobs were being processed, there was extremely high load on the server, and postulated that celery might be starving itself of resources by running too many worker processes.

Index: modules/celery/manifests/support/dev.pp
===================================================================
--- modules/celery/manifests/support/dev.pp	(revision 92992)
+++ modules/celery/manifests/support/dev.pp	(working copy)
@@ -2,7 +2,7 @@

     celery::service {
         'support-dev':
-            celery_workers => 32,
+            celery_workers => 24,
             celery_args    => '-E',
             require        => Celery::Rabbit_perms['support_dev'],
             app_dir        => '/data/www/support-dev.allizom.org/kitsune';

Stage is set to run with 64 workers.
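For context, the celery_workers parameter above presumably maps to the worker concurrency flag, so the dev worker would end up being launched roughly like this (the manage.py invocation is an assumption for this django-celery era setup; -E matches the celery_args in the diff):

  python manage.py celery worker --concurrency=24 -E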
Depends on: 1065152
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard