Closed Bug 1367017 Opened 8 years ago Closed 8 years ago

[mozillians.org][prod] ES connection failures

Categories

(Infrastructure & Operations Graveyard :: WebOps: Community Platform, task)

Type: task
Priority: Not set
Severity: major

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: nemo-yiannis, Assigned: danielh)

Details

(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/4860])

We are getting a lot of tracebacks from mozillians.org prod because of a connection error to the ES server:

ConnectionError: ConnectionError(HTTPConnectionPool(host='elasticsearch-zlb.vlan81.phx1.mozilla.com', port=9200): Read timed out. (read timeout=5)) caused by: ReadTimeoutError(HTTPConnectionPool(host='elasticsearch-zlb.vlan81.phx1.mozilla.com', port=9200): Read timed out. (read timeout=5))
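The read timeout in this traceback means the client gave up after 5 seconds without a response. Below is a minimal sketch of one common mitigation: a standard-library-only retry wrapper with exponential backoff. `call_with_retries` and its parameters are hypothetical and are not part of mozillians.org or elasticsearch-py. Retries only paper over brief stalls. In this bug the cluster itself was unhealthy, so retrying alone would not have helped.

```python
import random
import socket
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    retries: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (socket.timeout, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given timeout errors with exponential backoff.

    Hypothetical helper for illustration; not taken from mozillians.org.
    """
    for attempt in range(retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the original timeout
            # Backoff doubles each attempt (0.5s, 1s, 2s, ...) plus up to 10% jitter
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 10))
    raise AssertionError("unreachable")
```

Usage would look like `call_with_retries(lambda: run_search(query))`, where `run_search` is a hypothetical stand-in for whatever function issues the ES request. Pass the exception type your client actually raises via `retry_on` (the traceback above shows elasticsearch-py's `ConnectionError`). If the client is elasticsearch-py, check its own `max_retries` and `retry_on_timeout` options before adding a wrapper like this.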
Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/4860]
Severity: normal → major
Is this affecting user-facing services, or just spamming up the logs?
It is affecting user-facing services (mozillians.org search/indexing functionality).
John, after restarting the Elasticsearch service on one of the nodes, the cluster is healthy again. I expect those errors to go away.
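The fix above is judged by cluster health. Below is a minimal sketch of checking that from a script. It assumes only the standard Elasticsearch `GET /_cluster/health` endpoint, whose `status` field is `green`, `yellow` or `red`, and Python's standard library. The helper names are hypothetical and not part of mozillians.org. The host is copied from the traceback and only resolves inside that network.

```python
import json
import urllib.request
from typing import Any, Dict

# Host taken from the traceback in this bug; internal name, not publicly resolvable.
ES_URL = "http://elasticsearch-zlb.vlan81.phx1.mozilla.com:9200"


def parse_health(body: bytes) -> Dict[str, Any]:
    """Pick out the fields of interest from a /_cluster/health JSON response."""
    data = json.loads(body)
    return {
        "status": data["status"],  # "green", "yellow" or "red"
        "nodes": data.get("number_of_nodes"),
        "unassigned_shards": data.get("unassigned_shards"),
    }


def is_healthy(health: Dict[str, Any], allow_yellow: bool = False) -> bool:
    """Green is healthy; yellow (replicas unassigned) counts only if allowed."""
    ok = {"green", "yellow"} if allow_yellow else {"green"}
    return health["status"] in ok


def fetch_health(base_url: str = ES_URL, timeout: float = 5.0) -> Dict[str, Any]:
    """Query the cluster health endpoint over plain HTTP and parse the result."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
        return parse_health(resp.read())
```

Calling `is_healthy(fetch_health())` before and after a restart gives a quick yes/no on whether the cluster has recovered. Keeping the parsing separate from the network call makes it testable without a live cluster.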
Assignee: server-ops-webops → dhartnell
This issue should be resolved. I do not have a lot of information to add to this specific bug. Once the underlying issue was addressed and cluster health was restored, Mozillians began working normally. John let me know that no errors were raised after re-indexing. Ultimately, restarting the service on elasticsearch6 was sufficient to correct the issue. This issue started in bug 1366946. Unfortunately, I misjudged the severity of that issue at the time, which led to the errors reported here. We will be re-evaluating our documentation on Friday to see if there are opportunities to improve it so we can respond to issues better in the future. I'm going to mark this as solved at this point.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
I am marking this one as verified. Indeed, after Elasticsearch was kicked, indexing worked fine.
Status: RESOLVED → VERIFIED
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard