Closed
Bug 1274816
Opened 9 years ago
Closed 9 years ago
Validator hung
Categories
(Cloud Services :: Operations: AMO, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: andy+bugzilla, Assigned: jason)
Details
Looks like the validator is hanging on prod. Maybe it has something to do with the switch to the Redis backend?
Currently any add-on pushed through https://addons.mozilla.org/en-US/developers/addon/validate is failing.
See also: https://github.com/mozilla/addons-server/issues/2720
Assignee • Updated 9 years ago
Assignee: nobody → jthomas
Assignee • Comment 1 • 9 years ago
I believe this issue is related to the web workers being unable to communicate with the RabbitMQ server.
Our RabbitMQ setup included an internal AWS ELB with a 1 hour idle timeout, and the AMQP client was configured with a broker heartbeat to detect dead connections. This worked well most of the time, especially within Celery, but it did not work as well on the web workers, particularly during low-traffic periods and when the ELB was dropping idle connections to RabbitMQ.
Opinions are mixed on putting an ELB in front of RabbitMQ; the general recommendation is to avoid it and use other methods or load-balancing solutions [1][2].
I've deployed a new RabbitMQ service without an ELB. The nodes are HA clustered and served via DNS round robin. This should allow idle connections to stay open up to the configured TCP limit.
We may also want to investigate adding BROKER_TRANSPORT_OPTIONS = {'confirm_publish': True}. This will reduce performance, but the publisher will wait for the broker to confirm each message, so a failed publish does not go unnoticed [3].
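For reference, a minimal sketch of how that option might sit in a Celery settings module. It assumes the old-style uppercase setting names and the py-amqp transport; the heartbeat value is a hypothetical placeholder, not what is actually configured on prod.

```python
# Sketch of Celery settings (old-style uppercase names), e.g. in a Django
# settings module. Not the actual prod configuration.

# Hypothetical value: seconds between AMQP heartbeats used to detect dead
# connections. The real interval used on prod is not stated in this bug.
BROKER_HEARTBEAT = 60

# Option proposed in this comment: block on each publish until the broker
# confirms it has received the message (publisher confirms, see [3]).
BROKER_TRANSPORT_OPTIONS = {'confirm_publish': True}
```

Since every publish then waits on a round trip to the broker, it would be worth measuring the impact on the web workers before enabling this on prod.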
[1] http://www.greg-gilbert.com/2015/08/rabbitmq-aws-elb-and-you/
[2] https://www.safaribooksonline.com/library/view/rabbitmq-cookbook/9781849516501/ch10s05.html
[3] https://www.rabbitmq.com/confirms.html
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED