Closed Bug 739111 Opened 13 years ago Closed 13 years ago

devhub task queue for addons.mozilla.org appears clogged

Categories

(Cloud Services :: Operations: Marketplace, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kumar, Assigned: jason)

Attachments

(1 file)

Attached image celery queue backup
For the past hour, tasks have not been completing in the celery cluster. Any ideas on what the cause is? We just received bug 739106 about add-on validation that never ends, but when I tried it locally the validation ran OK (it was a bit slow). The workers should have timeouts, so they shouldn't block for an hour like this.
https://ganglia.mozilla.org/phx1/graph_all_periods.php?c=Addons%20Celery&h=celery2&v=38&m=rabbit_prod_zamboni_devhub_messages_ready&r=hour&z=default&jr=&js=&st=1332718837&vl=messages&z=large
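For reference, worker timeouts of the kind mentioned here are usually set through Celery's task time limit settings. The fragment below is only a sketch of such a configuration: the setting names are the ones Celery 2.x documents, but the values are illustrative assumptions and are not taken from the AMO configuration.

```python
# Hypothetical Django/Celery settings fragment.
# The values below are assumptions for illustration, not AMO's production values.

# Soft limit, in seconds: Celery raises SoftTimeLimitExceeded inside the
# running task, giving it a chance to clean up.
CELERYD_TASK_SOFT_TIME_LIMIT = 60 * 5

# Hard limit, in seconds: the worker process running the task is killed
# and replaced with a new one.
CELERYD_TASK_TIME_LIMIT = 60 * 6
```

Keeping the soft limit below the hard limit lets a task exit cleanly before the worker process is killed.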
[118] celery1.addons.phx1:RabbitMQ Queue prod_zamboni images is CRITICAL: CRITICAL: 63 unacknowledged messages in queue images on vhost prod_zamboni.

root@celery1.addons.phx1 ~]# rabbitmqctl list_queues -p prod_zamboni name messages_ready messages_unacknowledged consumers
Listing queues ...
celery2 0 0 2
bulk 0 0 2
celery 0 0 2
images 66 0 0
devhub 142 0 0
celery1 0 0 2

auditd was spewing out lots of:

root@celery1.addons.phx1 ~]# dmesg
audit: audit_lost=1229477 audit_rate_limit=200 audit_backlog_limit=320
audit: rate limit exceeded

I've stopped auditd for the time being, and am watching the queues.
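The queue listing above shows two queues (images and devhub) with messages ready but no consumers attached. As a rough sketch, that check could be automated by parsing the rabbitmqctl output. The helper names parse_list_queues and stalled_queues are hypothetical, and the parser assumes the four columns requested in the command above (name, messages_ready, messages_unacknowledged, consumers).

```python
def parse_list_queues(output):
    """Parse `rabbitmqctl list_queues name messages_ready
    messages_unacknowledged consumers` output into a dict keyed by queue name."""
    queues = {}
    for line in output.splitlines():
        parts = line.split()
        # Data rows have exactly four fields, the last three being integers;
        # banner lines such as "Listing queues ..." are skipped.
        if len(parts) != 4 or not all(p.isdigit() for p in parts[1:]):
            continue
        queues[parts[0]] = {
            "ready": int(parts[1]),
            "unacked": int(parts[2]),
            "consumers": int(parts[3]),
        }
    return queues


def stalled_queues(queues):
    """Return names of queues that have a backlog but no consumers."""
    return sorted(
        name
        for name, q in queues.items()
        if q["ready"] > 0 and q["consumers"] == 0
    )


# Sample taken from the listing in this bug.
SAMPLE = """Listing queues ...
celery2 0 0 2
bulk 0 0 2
celery 0 0 2
images 66 0 0
devhub 142 0 0
celery1 0 0 2
"""

print(stalled_queues(parse_list_queues(SAMPLE)))
```

A queue with a backlog and zero consumers suggests that the workers bound to it are down or disconnected rather than merely slow.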
Component: Server Operations: Web Operations → Server Operations: AMO Operations
QA Contact: cshields → oremj
Blocks: 739106
Assignee: server-ops → jthomas
The devhub celery workers seemed to be unresponsive. Stracing the processes didn't show anything useful. I went ahead and restarted the devhub celery workers.
We restarted celery once more, and the logs show that devhub tasks are completing successfully:

Mar 26 10:49:39 celery1.addons.phx1.mozilla.com: [][] z.devhub.task:INFO VALIDATING: 42da9c45d4b94418a22ab711871f90d9 :/data/www/addons.mozilla.org/zamboni/apps/devhub/tasks.py:42
Mar 26 10:49:40 celery1.addons.phx1.mozilla.com: [][] celery:INFO Task devhub.tasks.validator[a3bdaadc-8842-4b7e-ab66-331f6c9a14e2] succeeded in 0.785356044769s: None :/data/www/addons.mozilla.org/zamboni/vendor/lib/python/celery/worker/job.py:475

Per Kumar, we are currently running celery 2.2.6, and the latest release is 2.5. Celery 2.3.2 included bug fixes for unresponsive workers [1]. Kumar is working on getting the latest celery onto -dev for further testing.

[1] http://docs.celeryproject.org/en/latest/changelog.html#v232-fixes
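To confirm from the logs that tasks keep completing, the "succeeded in" lines can be pulled out with a regular expression. This is a minimal sketch that assumes the log format matches the celery line quoted in this comment; the names TASK_RE and parse_succeeded are hypothetical.

```python
import re

# Matches e.g. "Task devhub.tasks.validator[<uuid>] succeeded in 0.78s"
TASK_RE = re.compile(
    r"Task (?P<name>[\w.]+)"
    r"\[(?P<task_id>[0-9a-f-]+)\] "
    r"succeeded in (?P<seconds>\d+(?:\.\d+)?)s"
)


def parse_succeeded(line):
    """Return (task name, task id, runtime in seconds), or None if the
    line is not a task-success line."""
    match = TASK_RE.search(line)
    if match is None:
        return None
    return (
        match.group("name"),
        match.group("task_id"),
        float(match.group("seconds")),
    )


# Log line copied from this bug.
LINE = (
    "Mar 26 10:49:40 celery1.addons.phx1.mozilla.com: [][] celery:INFO "
    "Task devhub.tasks.validator[a3bdaadc-8842-4b7e-ab66-331f6c9a14e2] "
    "succeeded in 0.785356044769s: None "
    ":/data/www/addons.mozilla.org/zamboni/vendor/lib/python/celery/worker/job.py:475"
)

print(parse_succeeded(LINE))
```

Counting such matches per minute, or tracking the reported runtimes, would give an early signal if the devhub workers stop completing tasks again.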
This has stabilized for the time being. We upgraded celery to 2.5.2 in -dev and it seems to be running well. We will deploy it on Thursday; see bug 739702.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
I'm guessing it's unresponsive again, since it's hanging while validating an updated version of my add-on. What's odd is that it successfully validates older (already uploaded) versions of the same add-on.
Depends on: 739715
Depends on: 739702
Component: Server Operations: AMO Operations → Operations: Marketplace
Product: mozilla.org → Mozilla Services