Closed Bug 922276 Opened 12 years ago Closed 12 years ago

Elastic Search Request Timeouts

Categories

(Infrastructure & Operations Graveyard :: WebOps: Engagement, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: davidwalsh, Assigned: bburton)

Details

Starting around midnight last night (Sep 30th), I began receiving frequent emails like this one:

Title: [mdn] [celery@developer-celery1.webapp.scl3.mozilla.com] Error: Task elasticutils.contrib.django.tasks.index_objects (fa2dd8a4-451b-4a86-90a3-60e9e784fc3d): <MaybeEncodingError: Error sending result: '<ExceptionInfo: UnpickleableExceptionWrapper('requests.exceptions', 'Timeout', (TimeoutError("HTTPConnectionPool(host='elasticsearch-zlb.webapp.scl3.mozilla.com', port=9200): Request timed out. (timeout=5)", ), ), 'Timeout(TimeoutError("HTTPConnectionPool(host=\'elasticsearch-zlb.webapp.scl3.mozilla.com\', port=9200): Request timed out. (timeout=5)", ), )')>'. Reason: ''TimeoutError' object has no attribute 'url''.>

Content:

Task elasticutils.contrib.django.tasks.index_objects with id fa2dd8a4-451b-4a86-90a3-60e9e784fc3d raised exception: '<MaybeEncodingError: Error sending result: \'<ExceptionInfo: UnpickleableExceptionWrapper(\'requests.exceptions\', \'Timeout\', (TimeoutError("HTTPConnectionPool(host=\'elasticsearch-zlb.webapp.scl3.mozilla.com\', port=9200): Request timed out. (timeout=5)",),), \'Timeout(TimeoutError("HTTPConnectionPool(host=\\\'elasticsearch-zlb.webapp.scl3.mozilla.com\\\', port=9200): Request timed out. (timeout=5)",),)\')>\'. Reason: \'\'TimeoutError\' object has no attribute \'url\'\'.>'

Task was called with args: (<class 'wiki.models.DocumentType'>, [7718L]) kwargs: {}.

The contents of the full traceback was:

Traceback (most recent call last):
  File "/data/www/developer.mozilla.org/kuma/vendor/packages/celery/celery/concurrency/processes/pool.py", line 215, in worker
    put((READY, (job, i, result)))
  File "/usr/lib64/python2.6/multiprocessing/queues.py", line 366, in put
    return send(obj)
  File "/data/www/developer.mozilla.org/kuma/vendor/src/requests/requests/packages/urllib3/exceptions.py", line 23, in __reduce__
    return self.__class__, (None, self.url)
MaybeEncodingError: Error sending result: '<ExceptionInfo: UnpickleableExceptionWrapper('requests.exceptions', 'Timeout', (TimeoutError("HTTPConnectionPool(host='elasticsearch-zlb.webapp.scl3.mozilla.com', port=9200): Request timed out. (timeout=5)",),), 'Timeout(TimeoutError("HTTPConnectionPool(host=\'elasticsearch-zlb.webapp.scl3.mozilla.com\', port=9200): Request timed out. (timeout=5)",),)')>'. Reason: ''TimeoutError' object has no attribute 'url''.

--
Just to let you know,
celeryd at developer-celery1.webapp.scl3.mozilla.com.

I worked with :jakem to resolve the issue; he may have more information.
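The traceback shows two failures stacked together: the request to Elasticsearch timed out, and the worker then failed to pickle the resulting exception because `__reduce__` reads a `url` attribute the object never set. The following is a minimal sketch of that second failure, not the vendored urllib3 code itself. `BrokenPoolError`, `PicklablePoolError`, and `try_pickle` are hypothetical stand-ins written for illustration, and the sketch targets Python 3 rather than the Python 2.6 in the traceback.

```python
import pickle


class BrokenPoolError(Exception):
    """Hypothetical stand-in for an exception with a faulty __reduce__."""

    def __init__(self, pool, message):
        self.pool = pool
        super().__init__("%s: %s" % (pool, message))

    def __reduce__(self):
        # Same shape as the line in the traceback: `url` is never assigned,
        # so pickling raises AttributeError before any bytes are produced.
        return self.__class__, (None, self.url)


class PicklablePoolError(Exception):
    """Hypothetical variant whose __reduce__ only uses attributes it sets."""

    def __init__(self, pool, message):
        self.pool = pool
        self.message = message
        super().__init__("%s: %s" % (pool, message))

    def __reduce__(self):
        # The pool (a live connection object) is dropped; the message survives.
        return self.__class__, (None, self.message)


def try_pickle(exc):
    """Return None if exc pickles cleanly, else a description of the failure."""
    try:
        pickle.dumps(exc)
    except Exception as err:
        return "%s: %s" % (type(err).__name__, err)
    return None


if __name__ == "__main__":
    print(try_pickle(BrokenPoolError("pool", "Request timed out. (timeout=5)")))
    print(try_pickle(PicklablePoolError("pool", "Request timed out. (timeout=5)")))
```

The point of the sketch is that the pickling error is a secondary symptom: it obscures the timeout in the task result, but the timeout against port 9200 is what needs investigating.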
Assignee: server-ops-webops → bburton
Status: NEW → ASSIGNED
The production ElasticSearch cluster experienced a network partition during core network maintenance yesterday and did not recover from it on its own. In addition, a monitoring misconfiguration (fixed in bug 922267) meant that the cluster's bad state did not generate an alert. The cluster has now been restored to proper health and the monitoring is in place. Let me know if there are any questions.
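For readers wondering what an alert on the cluster's bad state might look like, here is a rough sketch built on the `_cluster/health` endpoint, which reports a `status` of green, yellow, or red along with `number_of_nodes`. This is not the check referenced in bug 922267. The function names, the Nagios-style return codes, and the `expected_nodes` threshold are all assumptions made for illustration. Comparing the node count is included because, after a partition, one side of the cluster may see fewer nodes than expected.

```python
import json
from urllib.request import urlopen

# Nagios-style severity codes (an assumption about the monitoring system).
OK, WARNING, CRITICAL = 0, 1, 2


def evaluate_health(health, expected_nodes):
    """Map a parsed _cluster/health response to (severity, message)."""
    status = health.get("status")
    nodes = health.get("number_of_nodes", 0)
    if status == "red" or health.get("timed_out"):
        return CRITICAL, "cluster status is %s" % status
    if nodes < expected_nodes:
        # Fewer nodes than expected can indicate a partitioned cluster.
        return CRITICAL, "only %d of %d nodes in cluster" % (nodes, expected_nodes)
    if status == "yellow":
        return WARNING, "cluster status is yellow"
    if status != "green":
        return CRITICAL, "unexpected status %r" % status
    return OK, "cluster is green with %d nodes" % nodes


def fetch_health(base_url, timeout=5):
    """Fetch and parse the health document from one node or load balancer."""
    url = base_url.rstrip("/") + "/_cluster/health"
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical sample response; no network access is needed to run this.
    sample = {"status": "green", "number_of_nodes": 2, "timed_out": False}
    print(evaluate_health(sample, expected_nodes=3))
```

Keeping `evaluate_health` separate from `fetch_health` lets the alerting logic be tested against canned responses, which is one way to catch a misconfigured check before an outage does.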
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard