We are seeing a lot of Validation timeouts in AMO Prod. See https://addons.mozilla.org/en-US/developers/addon/test-pilot/validation-result/131205 and screenshot at http://cl.ly/image/2k1l2q153g2a
Jason, is celery looking ok?
I ran one of the add-ons locally and it's only taking ~12s to validate. I don't think there have been any significant changes to the validator that would prompt these timeouts.
I ran the compatibility bump today, so that's probably the reason for the slower validations and timeouts. I saw a few timeout errors in the failing list of add-ons.
The concept of a work queue is that the server always does the same amount of work; it might just take longer to pick up a job. If compatibility bumps are causing timeouts then that's a serious problem; we may have to adjust some queue settings.
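To make the "takes longer to pick up a job" point concrete, here is a rough sketch of how a backlog delays pickup when a fixed pool of workers drains a queue in order. It is a simplified model, not how the AMO celery cluster is actually configured: `pickup_delay`, the worker count, and the per-job duration are all hypothetical, and every job is assumed to take the same time.

```python
import heapq


def pickup_delay(num_workers, backlog, job_seconds):
    """Seconds a newly queued job waits before a worker picks it up.

    Hypothetical model: `backlog` jobs are already queued, every job
    takes `job_seconds`, and `num_workers` workers drain the queue FIFO.
    """
    # Each heap entry is the time at which one worker becomes free.
    free_at = [0.0] * num_workers
    heapq.heapify(free_at)
    for _ in range(backlog):
        # The next queued job goes to whichever worker frees up first.
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + job_seconds)
    # The new job starts as soon as the earliest worker is free.
    return free_at[0]
```

The total work is the same either way; only the wait before pickup grows with the backlog. If that wait is longer than whatever timeout the caller enforces, the job shows up as a timeout even though the job itself is no slower.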
(In reply to Kumar McMillan [:kumar] from comment #1)
> Jason, is celery looking ok?
The queue looks okay now, but the bulk queue alerted earlier today around the same time the validation errors occurred:
Fri 10:04:59 PDT  celery1.addons.phx1.mozilla.com:AMO Celery - RabbitMQ Queue prod_zamboni bulk is CRITICAL: CRITICAL: 4617 unacknowledged messages in queue bulk on vhost prod_zamboni.
Fri 10:25:00 PDT  celery1.addons.phx1.mozilla.com:AMO Celery - RabbitMQ Queue prod_zamboni bulk is OK: OK: 0 unacknowledged messages in queue bulk on vhost prod_zamboni.
Ah, ok. The bulk queue gets a sudden burst of jobs and then works through them at its own pace. An alert on a large number of unacknowledged messages might be a false alarm, since a large backlog is expected. However, we might want to revisit the settings for how many jobs run concurrently in the bulk queue so that it's not putting too much stress on the cluster.
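As an illustration of what capping bulk concurrency means, here is a minimal standard-library sketch. It does not use celery or the real AMO settings: `run_burst` and its dummy jobs are hypothetical, and a thread pool stands in for the bulk workers. It only shows that a burst of jobs never runs more than `max_concurrent` at a time, however many are queued.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_burst(num_jobs, max_concurrent, job_seconds=0.01):
    """Run a burst of dummy jobs; return the peak number running at once."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def job():
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(job_seconds)  # stand-in for real work
        with lock:
            running -= 1

    # max_workers is the concurrency cap; extra jobs wait in the pool's queue.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        for _ in range(num_jobs):
            pool.submit(job)
    # Leaving the with-block waits for every submitted job to finish.
    return peak
```

Lowering the cap leaves more headroom on the hosts for other work, at the cost of the burst taking longer to drain.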
Jorge Villalobos asked me to add a comment to this bug as I was personally seeing this when I tried to upload a new version of my Web Developer extension yesterday. Here is a screenshot of the error I saw: http://stuff.chrispederick.com/J1Ai The new version of the extension is a little bigger than the previous release (about 1.34MB) but I'm not sure if that is causing any issues. I can attach the XPI if that is useful - just let me know.
Just a quick follow up to say that I just tried again this morning and I'm still seeing the same problem.
Jason: what is the load like on these boxes? It sounds like we need to back off on the queue settings.
The load looks okay overall, but it does spike when there are jobs in the rabbit queue. I reduced the bulk workers from 12 -> 6 for now. https://ganglia-phx1.mozilla.org/ganglia/?r=week&cs=&ce=&c=Addons+Celery&h=celery1&tab=m&vn=&mc=2&z=medium&metric_group=ALLGROUPS
Bug 786292 covers adding an additional celery node to the prod cluster.
I haven't seen these since the new node. Krupa?
I can confirm that I was just able to upload my new version of the Web Developer extension so this looks to be resolved to me.
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
I just performed many validations without any problem. Closing bug.
Status: RESOLVED → VERIFIED
Product: addons.mozilla.org → addons.mozilla.org Graveyard