[mozillians.org][prod] ES connection failures

VERIFIED FIXED

Status

Priority: --
Severity: major
Status: VERIFIED FIXED
Reported: 2 years ago
Modified: 3 months ago

People

(Reporter: nemo-yiannis, Assigned: danielh)

Details

(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/4860])

(Reporter)

Description

2 years ago
We are getting a lot of tracebacks from mozillians.org prod because of connection errors to the ES server.

ConnectionError: ConnectionError(HTTPConnectionPool(host='elasticsearch-zlb.vlan81.phx1.mozilla.com', port=9200): Read timed out. (read timeout=5)) caused by: ReadTimeoutError(HTTPConnectionPool(host='elasticsearch-zlb.vlan81.phx1.mozilla.com', port=9200): Read timed out. (read timeout=5))
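
For illustration only: the error above is what a client sees when a request exceeds its read timeout (5 seconds here). A minimal sketch of retrying such a call with exponential backoff follows. `call_with_retries` is a hypothetical helper, not part of mozillians.org or any Elasticsearch client, and the built-in `TimeoutError` stands in for the client library's own exception. Retrying only smooths over transient timeouts; it would not help while the cluster itself is unhealthy.

```python
import time


def call_with_retries(func, retries=3, base_delay=0.5, sleep=time.sleep):
    """Call func(); on TimeoutError, wait and retry up to `retries` times.

    Hypothetical helper: TimeoutError stands in for the client library's
    timeout exception. The delay doubles after each failed attempt.
    """
    attempt = 0
    while True:
        try:
            return func()
        except TimeoutError:
            if attempt >= retries:
                raise  # out of retries: surface the original error
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

The `sleep` parameter is injectable so the backoff schedule can be checked without actually waiting.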

Updated

2 years ago
Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/4860]
(Reporter)

Updated

2 years ago
Severity: normal → major
Is this affecting user-facing services, or just spamming up the logs?
(Reporter)

Comment 2

2 years ago
It is affecting user-facing services (mozillians.org search/indexing functionality).
(Assignee)

Comment 3

2 years ago
John,

After restarting the Elasticsearch service on one of the nodes, the cluster is healthy again. I expect those errors to go away.
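
As a rough sketch of how "healthy again" could be checked, Elasticsearch exposes a `_cluster/health` endpoint that returns JSON with a `status` of green, yellow, or red. The helper names and the base URL below are assumptions for illustration, and treating only green as healthy is a choice, not a rule taken from this bug.

```python
import json
from urllib.request import urlopen


def summarize_health(health):
    """Return (is_healthy, message) from a _cluster/health response dict.

    Assumption: only a "green" status counts as healthy.
    """
    status = health.get("status", "unknown")
    nodes = health.get("number_of_nodes")
    unassigned = health.get("unassigned_shards")
    message = f"status={status} nodes={nodes} unassigned_shards={unassigned}"
    return status == "green", message


def fetch_health(base_url, timeout=5):
    """Fetch cluster health over HTTP; base_url is supplied by the caller."""
    with urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
        return json.load(resp)
```

Usage would look like `summarize_health(fetch_health("http://localhost:9200"))`, where the localhost URL is a placeholder rather than the production host.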
(Assignee)

Updated

2 years ago
Assignee: server-ops-webops → dhartnell
(Assignee)

Comment 4

2 years ago
This issue should be resolved. I do not have a lot of information to add to this specific bug. Once the underlying issue was addressed and the cluster health was restored, Mozillians began working normally. John let me know that no errors were raised after re-indexing.

Ultimately, restarting the service on elasticsearch6 was sufficient to correct the issue. This issue started in bug 1366946. Unfortunately, I misjudged the severity of that issue at the time, which led to the errors reported here. We will be re-evaluating our documentation on Friday to see whether we can improve it and respond to issues better in the future.

I'm going to mark this as resolved.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
(Reporter)

Comment 5

2 years ago
I am marking this one as verified. Indeed, after kicking Elasticsearch, indexing worked fine.
Status: RESOLVED → VERIFIED

Updated

3 months ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard