Closed Bug 1185098 Opened 10 years ago Closed 10 years ago

Convince celery to re-try failed connections

Categories

(Infrastructure & Operations :: RelOps: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: dustin)

References

Details

Attachments

(1 file)

Jordan's seeing
----
2015-07-17 12:09:40,499 [relengapi.blueprints.archiver] Creating new celery task and task tracker for: try-9ae2bc693a00.tar.gz_testing_mozharness
2015-07-17 12:09:40,576 [relengapi.blueprints.archiver] checking status of task id try-9ae2bc693a00.tar.gz_testing_mozharness: current state PENDING
...
...
2015-07-17 12:09:49,586 [relengapi.blueprints.archiver] generating GET URL to try-2a9d42bfc513.tar.gz/testing/mozharness, expires in 300s
2015-07-17 12:09:49,650 [relengapi.app] Exception on /archiver/status/try-9ae2bc693a00.tar.gz_testing_mozharness [GET]
Traceback (most recent call last):
  File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/newrelic-2.46.0.37/newrelic/hooks/framework_flask.py", line 40, in _nr_wrapper_handler_
    return wrapped(*args, **kwargs)
  File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/relengapi/lib/api.py", line 103, in replacement
    result = wrapped(*args, **kwargs)
  File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/relengapi/blueprints/archiver/__init__.py", line 80, in task_status
    log.info("checking status of task id {}: current state {}".format(task_id, task.state))
  File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/celery/result.py", line 398, in state
    return self._get_task_meta()['status']
  File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/celery/result.py", line 341, in _get_task_meta
    return self._maybe_set_cache(self.backend.get_task_meta(self.id))
  File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/celery/backends/amqp.py", line 163, in get_task_meta
    binding.declare()
  File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/kombu/entity.py", line 504, in declare
    self.exchange.declare(nowait)
  File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/kombu/entity.py", line 166, in declare
    nowait=nowait, passive=passive,
  File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/amqp/channel.py", line 613, in exchange_declare
    self._send_method((40, 10), args)
  File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method
    self.channel_id, method_sig, args, content,
  File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method
    write_frame(1, channel, payload)
  File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/amqp/transport.py", line 182, in write_frame
    frame_type, channel, size, payload, 0xce,
  File "/usr/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 32] Broken pipe
----
which, ideally, celery would just automatically retry.
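For illustration only, here is a rough sketch of what "just retry" could look like at the application level, wrapped around the call that failed in the traceback above. The helper name `retry_on_broken_pipe` and its parameters are made up for this example and are not part of celery or relengapi. Whether a second attempt actually gets a fresh connection depends on how the result backend handles reconnects, so treat this as a sketch under that assumption.

```python
import errno
import socket
import time


def retry_on_broken_pipe(func, attempts=3, delay=0.5, sleep=time.sleep):
    """Call func(), retrying when the socket reports a dropped connection.

    Hypothetical helper: only EPIPE and ECONNRESET are treated as retryable.
    """
    retryable = (errno.EPIPE, errno.ECONNRESET)
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except socket.error as exc:
            # Re-raise anything that is not a dropped connection, and give up
            # once the final attempt has failed.
            if exc.errno not in retryable or attempt == attempts:
                raise
            sleep(delay * attempt)  # simple linear backoff between attempts
```

In the view from the traceback this might be used as `state = retry_on_broken_pipe(lambda: task.state)`, where `task` is the celery result object whose `state` property raised.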
The screenshot covers the last 3 hours. It looks like we hit the error a few times, and each time the entire rev's list of builders fails. I have updated the archiver client to fall back on getting the archive from hg.mozilla.org directly, so this doesn't cause bustage in production.
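The fallback idea can be sketched roughly as follows. This is not the actual archiver client code: `fetch_with_fallback` and the two callables it takes are hypothetical stand-ins for "ask the archiver" and "download from hg.mozilla.org directly".

```python
import logging

log = logging.getLogger(__name__)


def fetch_with_fallback(primary, fallback, errors=(Exception,)):
    """Return primary(); if it raises one of `errors`, return fallback()."""
    try:
        return primary()
    except errors as exc:
        # Log the failure so the fallback does not hide a broken primary source.
        log.warning("primary source failed (%s); using fallback", exc)
        return fallback()
```

Narrowing `errors` to the failures the client is known to hit keeps unrelated bugs from being silently masked by the fallback.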
Celery seems to recommend not using rabbit as a backend :) http://celery.readthedocs.org/en/latest/configuration.html#amqp-backend-settings Maybe we should try redis or a database.
Blocks: 1184722
As a bonus, making the backend a database may mean that I can remove my Tracker table.
No longer blocks: 1184722
Blocks: 1184722
Hm, I thought I commented on this a week or so ago. I don't think there's anything to do here: Celery *does* retry its AMQP frontend connections, and we're no longer using AMQP on the backend. So, fixed by virtue of using the MySQL backend.
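For reference, a minimal sketch of the kind of configuration this describes, using Celery 3.1-style setting names. The file name, hosts, credentials and database name are placeholders, and relengapi's own settings layout may differ. The two retry settings are shown explicitly for clarity; as far as I know they match Celery's defaults.

```python
# celeryconfig.py (hypothetical file name; all URLs are placeholders)

# Tasks still travel over AMQP (RabbitMQ) as the broker.
BROKER_URL = "amqp://user:password@rabbit.example.com:5672/vhost"

# Task results and state are stored in MySQL instead of the AMQP backend.
CELERY_RESULT_BACKEND = "db+mysql://user:password@db.example.com/relengapi"

# Broker connection retry behaviour.
BROKER_CONNECTION_RETRY = True
BROKER_CONNECTION_MAX_RETRIES = 100
```

With the result backend in a database, a status check becomes a query rather than an AMQP exchange declaration, which is the call that hit the broken pipe in the original traceback.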
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED