Closed Bug 1064457 Opened 10 years ago Closed 10 years ago

SUMO -dev and -stage: flush rabbitmq queues, verify celery is processing tasks

Categories

(Infrastructure & Operations Graveyard :: WebOps: Community Platform, task)

task
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rrosario, Assigned: cturra)

References

Details

(Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/1228] )

Possibly related: bug 1049772

We have charts of our rabbitmq queue sizes:
https://graphite-phx1.mozilla.org/render/?width=586&height=308&_salt=1410201310.704&target=stats.gauges.sumo-dev.rabbitmq.size&target=stats.gauges.sumo-stage.rabbitmq.size&target=stats.gauges.sumo.rabbitmq.size

The -dev and -stage queues seem to just keep going up or stay stuck.

1- Can we flush them?
2- Can we verify that celery is pulling tasks off?

I'll be able to do #2 once #1 happens, by generating a bunch of reindexing tasks.
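For reference, a minimal way to check both things from the broker host might look like the following sketch (the vhost, queue name, and manage.py invocation are assumptions, not taken from this bug):

  # show queue depth and consumer counts on the RabbitMQ host (vhost name assumed)
  rabbitmqctl list_queues -p /sumo_dev name messages consumers

  # ask the running celery workers what they are processing (django-celery era invocation assumed)
  python manage.py celery inspect active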
Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/1228]
Bumping to major because this causes test failures on staging.
Severity: normal → major
OS: Mac OS X → All
Hardware: x86 → All
i have purged the rabbit queues in both dev and stage. is this work related to what was discussed in bug 1050437?
Assignee: server-ops-webops → cturra
Flags: needinfo?(rrosario)
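A per-queue purge like the one described above can be done from the broker host with rabbitmqctl (RabbitMQ 3.2+); the queue and vhost names below are assumptions:

  # drop all pending messages from the celery queue on the dev vhost (names assumed)
  rabbitmqctl purge_queue -p /sumo_dev celery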
(In reply to Chris Turra [:cturra] from comment #2)
> i have purged the rabbit queues in both dev and stage. is this work related
> to what was discussed in bug 1050437?

That bug mentioned the queues were purged, but I'm not sure that really worked. The graphs show the queues down to zero now, so we'll monitor and make sure they don't grow out of control again.
Flags: needinfo?(rrosario)
(In reply to Chris Turra [:cturra] from comment #2)
> i have purged the rabbit queues in both dev and stage. is this work related
> to what was discussed in bug 1050437?

I don't see celery tasks working. Is there a way to check that out on your end? Are they pulling tasks off the queue?
(In reply to Ricky Rosario [:rrosario, :r1cky] from comment #4)
> 
> I don't see celery tasks working. Is there a way to check that out on your
> end? Are they pulling tasks off the queue?

it looks like celery cannot start. the logs show the following for a service restart attempt:

 Traceback (most recent call last):
  File "/data/www/support.allizom.org/kitsune/manage.py", line 9, in <module>
    from django.conf import settings
 ImportError: No module named django.conf
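That ImportError points at the celery service being started outside a virtualenv (or PYTHONPATH) that has Django installed. A quick sanity check along these lines could confirm it; the virtualenv path here is an assumption:

  # verify Django is importable from the interpreter the service is supposed to use (path assumed)
  /data/www/support.allizom.org/venv/bin/python -c "import django; print(django.get_version())"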
(In reply to Chris Turra [:cturra] from comment #5)
> (In reply to Ricky Rosario [:rrosario, :r1cky] from comment #4)
> > 
> > I don't see celery tasks working. Is there a way to check that out on your
> > end? Are they pulling tasks off the queue?
> 
> it looks like celery cannot start. the logs show the following for a service
> restart attempt:
> 
>  Traceback (most recent call last):
>   File "/data/www/support.allizom.org/kitsune/manage.py", line 9, in <module>
>     from django.conf import settings
>  ImportError: No module named django.conf

Oy. Stage is in a weird state right now where we are trying to switch to peep for deploying the vendor library, so it could be related to that.

Can you look at support-dev?
celery in dev seems to be running fine and processing jobs as expected (at least i don't see any indication of queue growth or errors).
(In reply to Chris Turra [:cturra] from comment #7)
> celery in dev seems to be running fine and processing jobs as expected (at
> least i don't see any indication of queue growth or errors).

I just triggered 15 tasks on -dev. I expected them to start right away given the empty queue, but they don't seem to be starting.
to be extra certain, i restarted the celery process on the dev instance. looks like it immediately picked up the tasks you submitted and processed them. can you confirm you saw that also?
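A restart-and-verify sequence like the one described here might look roughly as follows; the init script name and log path are assumptions:

  # restart the dev celery workers and watch them pick up queued tasks (names assumed)
  sudo /etc/init.d/celeryd-support-dev restart
  tail -f /var/log/celeryd/support-dev.log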
(In reply to Chris Turra [:cturra] from comment #9)
> to be extra certain, i restarted the celery process on the dev instance.
> looks like it immediately picked up the tasks you submitted and processed
> them. can you confirm you saw that also?

Yep, it worked. The issue on stage is on our end. We'll be following up with a WebOps bug to change some things around so we can use the peep virtualenv. Thanks!
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
N.B.: I've bumped the number of celery worker processes in dev down from 32 to 24.

I chatted briefly with Chris about SUMO celery dev, since he will be away at TRIBE for the next couple of days. He noted that, when the celery jobs were being processed, there was extremely high load on the server, and postulated that celery might be starving itself of resources by running too many worker processes.

Index: modules/celery/manifests/support/dev.pp
===================================================================
--- modules/celery/manifests/support/dev.pp	(revision 92992)
+++ modules/celery/manifests/support/dev.pp	(working copy)
@@ -2,7 +2,7 @@

     celery::service {
         'support-dev':
-            celery_workers => 32,
+            celery_workers => 24,
             celery_args    => '-E',
             require        => Celery::Rabbit_perms['support_dev'],
             app_dir        => '/data/www/support-dev.allizom.org/kitsune';

Stage is set to run with 64 workers.
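For context, the celery_workers parameter above presumably maps to the worker concurrency flag, so the dev worker would end up being launched roughly like this (the manage.py invocation is an assumption for this django-celery era setup; -E matches the celery_args in the diff):

  python manage.py celery worker --concurrency=24 -E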
Depends on: 1065152
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard