We are getting a lot of tracebacks from mozillians.org prod because of connection errors to the ES server. ConnectionError: ConnectionError(HTTPConnectionPool(host='elasticsearch-zlb.vlan81.phx1.mozilla.com', port=9200): Read timed out. (read timeout=5)) caused by: ReadTimeoutError(HTTPConnectionPool(host='elasticsearch-zlb.vlan81.phx1.mozilla.com', port=9200): Read timed out. (read timeout=5))
Is this affecting user-facing services, or just spamming up the logs?
It is affecting user-facing services (mozillians.org search/indexing functionality).
John, after restarting the Elasticsearch service on one of the nodes, the cluster is healthy again. I expect those errors to go away.
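For anyone who wants to confirm the cluster state after a restart like this, here is a minimal sketch that queries Elasticsearch's `_cluster/health` endpoint. The base URL is a placeholder, and the helper names (`fetch_cluster_health`, `is_healthy`) and the "green with no unassigned shards" rule are assumptions for illustration, not what was actually run here.

```python
import json
import urllib.request

# Placeholder endpoint (assumption); substitute the real cluster address.
ES_URL = "http://localhost:9200"


def fetch_cluster_health(base_url=ES_URL, timeout=5):
    """GET /_cluster/health and return the decoded JSON body."""
    url = base_url + "/_cluster/health"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def is_healthy(health):
    """Assumed rule: healthy means status 'green' and no unassigned shards."""
    return (
        health.get("status") == "green"
        and health.get("unassigned_shards", 0) == 0
    )


if __name__ == "__main__":
    health = fetch_cluster_health()
    print(health.get("status"), health.get("number_of_nodes"), is_healthy(health))
```

A `yellow` or `red` status, or a node count lower than expected, would suggest that a node still needs attention.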
This issue should be resolved. I do not have much to add to this specific bug. Once the underlying issue was addressed and cluster health was restored, Mozillians began working normally. John let me know that no errors were raised after re-indexing. Ultimately, restarting the service on elasticsearch6 was sufficient to correct the issue. This issue started in bug 1366946. Unfortunately, I misjudged the severity of that issue at the time, which led to the errors reported here. We will be re-evaluating our documentation on Friday to see whether we can improve it and respond to issues like this better in the future. I'm going to mark this as resolved.
I am marking this one as verified. Indeed, after kicking Elasticsearch, indexing worked fine.