[mozillians.org][prod] ES connection failures

VERIFIED FIXED

Status

Infrastructure & Operations
WebOps: Community Platform
--
major
VERIFIED FIXED
5 months ago
5 months ago

People

(Reporter: nemo, Assigned: danielh)

Details

(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/4860])

(Reporter)

Description

5 months ago
We are getting a lot of tracebacks from mozillians.org prod because of connection errors to the ES server.

ConnectionError: ConnectionError(HTTPConnectionPool(host='elasticsearch-zlb.vlan81.phx1.mozilla.com', port=9200): Read timed out. (read timeout=5)) caused by: ReadTimeoutError(HTTPConnectionPool(host='elasticsearch-zlb.vlan81.phx1.mozilla.com', port=9200): Read timed out. (read timeout=5))
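
When many tracebacks like this arrive at once, it can help to pull out the host, port, and timeout so they can be grouped. The following is a minimal sketch, assuming the message keeps the `HTTPConnectionPool(...): Read timed out. (read timeout=N)` shape shown above. The names `PoolTimeout` and `parse_pool_timeout` are hypothetical helpers and are not part of the mozillians.org code base.

```python
import re
from typing import NamedTuple, Optional


class PoolTimeout(NamedTuple):
    """Fields extracted from a read-timeout message (hypothetical helper type)."""
    host: str
    port: int
    read_timeout: float


# Matches the first "HTTPConnectionPool(host='...', port=N): Read timed out.
# (read timeout=T)" fragment in a traceback line.
_PATTERN = re.compile(
    r"HTTPConnectionPool\(host='(?P<host>[^']+)', port=(?P<port>\d+)\): "
    r"Read timed out\. \(read timeout=(?P<timeout>[\d.]+)\)"
)


def parse_pool_timeout(message: str) -> Optional[PoolTimeout]:
    """Return the host, port, and read timeout, or None if the line does not match."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    return PoolTimeout(
        host=match["host"],
        port=int(match["port"]),
        read_timeout=float(match["timeout"]),
    )
```

Counting the parsed results per host would show whether all failures point at the same endpoint, as they do in this report.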

Updated

5 months ago
Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/4860]
(Reporter)

Updated

5 months ago
Severity: normal → major
Is this affecting user-facing services, or just spamming up the logs?
(Reporter)

Comment 2

5 months ago
It is affecting user-facing services (mozillians.org search/indexing functionality).
(Assignee)

Comment 3

5 months ago
John,

After restarting the Elasticsearch service on one of the nodes, the cluster is healthy again. I expect those errors to go away.
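
For reference, "healthy" can be checked against the Elasticsearch `_cluster/health` endpoint, which reports a `status` of green, yellow, or red along with `number_of_nodes` and `unassigned_shards`. The following is a rough sketch of such a check, not the procedure actually used here. The function names, the `expected_nodes` parameter, and the base URL in the usage note are assumptions for illustration.

```python
import json
import urllib.request
from typing import Any, Dict, List


def fetch_cluster_health(base_url: str, timeout: float = 5.0) -> Dict[str, Any]:
    """Fetch the JSON body of GET <base_url>/_cluster/health."""
    url = base_url.rstrip("/") + "/_cluster/health"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def health_problems(health: Dict[str, Any], expected_nodes: int) -> List[str]:
    """List reasons the cluster should not be considered healthy (empty if none)."""
    problems = []
    status = health.get("status")
    if status != "green":
        problems.append(f"status is {status!r}, expected 'green'")
    nodes = health.get("number_of_nodes", 0)
    if nodes < expected_nodes:
        problems.append(f"only {nodes} of {expected_nodes} nodes have joined")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned > 0:
        problems.append(f"{unassigned} shards are unassigned")
    return problems
```

A call such as `health_problems(fetch_cluster_health("http://localhost:9200"), expected_nodes=3)` would return an empty list for a fully healthy cluster. The URL and node count are placeholders.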
(Assignee)

Updated

5 months ago
Assignee: server-ops-webops → dhartnell
(Assignee)

Comment 4

5 months ago
This issue should be resolved. I do not have a lot of information to add to this specific bug. Once the underlying issue was addressed and the cluster health was restored, Mozillians began working normally. John let me know that no errors were raised after re-indexing.

Ultimately, restarting the service on elasticsearch6 was sufficient to correct the issue. This issue started in bug 1366946. Unfortunately, I misjudged the severity of that issue at the time, which led to the errors reported here. We will be re-evaluating our documentation on Friday to look for improvements that would help us respond to similar issues better in the future.

I'm going to mark this as resolved.
Status: NEW → RESOLVED
Last Resolved: 5 months ago
Resolution: --- → FIXED
(Reporter)

Comment 5

5 months ago
I am marking this one as verified. Indeed, after kicking Elasticsearch, indexing worked fine.
Status: RESOLVED → VERIFIED