Diagnose Elasticsearch timeouts on Mozillians Jenkins builds.

RESOLVED DUPLICATE of bug 803599

Status

Infrastructure & Operations
WebOps: Other
5 years ago
4 years ago

People

(Reporter: sancus, Assigned: phrawzty)

(Reporter)

Description

5 years ago
We're getting ES timeouts on Jenkins builds: https://ci.mozilla.org/job/mozillians/296/console

I've tried reverting the code all the way back to the build before that one, which was green, but it still failed: https://ci.mozilla.org/job/mozillians/306/console

Thus, I'm at least fairly sure that we didn't cause this string of failures directly with a code change, and I'd like some help trying to figure out what's going on here so we can get it corrected and get our builds back to green.

Thanks!
I increased the ES timeout in the Jenkins settings, and builds returned to green.

Related commit: https://github.com/mozilla/mozillians/commit/2c654dd60d18856c5a660ef3c998853faa1d0564
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
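For illustration only, here is a minimal sketch of how a test settings module could let the CI job raise the ES timeout without a code change. Everything in it is an assumption rather than the content of the commit above: the environment variable name `ES_TIMEOUT`, the helper `es_timeout_from_env`, and the 5-second default, which merely echoes the figure in the reported error.

```python
import os


def es_timeout_from_env(env=None, default=5.0, name="ES_TIMEOUT"):
    """Return the ES request timeout in seconds.

    Hypothetical helper: reads the value from an environment variable
    (assumed name: ES_TIMEOUT) and falls back to `default` when the
    variable is missing, empty, non-numeric, or not positive.
    """
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        value = float(raw)
    except ValueError:
        # Ignore garbage rather than break the whole test run.
        return default
    return value if value > 0 else default


# In a settings module, the CI job could then export ES_TIMEOUT=30
# before running the tests (the value 30 is only an example).
ES_TIMEOUT = es_timeout_from_env()
```

Reading the value from the environment keeps the longer timeout specific to the slow build box, while local runs keep the shorter default.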
It seems that the problem is back.

E.g. https://ci.mozilla.org/job/mozillians/333/console

"""
...

TimeoutError: Request timed out after 5.000000 seconds
...
"""
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 3

5 years ago
This is due to a general resource problem on the Jenkins box.  Basically, Elasticsearch and Jenkins are constantly competing for RAM and CPU, which results in one or the other failing in unpredictable ways.  The plan is to build a new services node / cluster to support Jenkins (bug 811380).

That said, I'm not sure that there's really a good short-term solution for the behaviour you're currently experiencing. :/
(Assignee)

Updated

5 years ago
Depends on: 811380
Thanks for the update, Daniel!

As you said, Jenkins / ES are failing randomly, and with some luck and multiple tries we still get to run our tests. I guess this is the short-term "solution" for this problem :)
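The "multiple tries" workaround could also be automated instead of re-triggering builds by hand. The sketch below is a hypothetical helper, not something taken from the Mozillians code base: `retry_on_timeout` and its parameters are made up for illustration, and Python's built-in `TimeoutError` stands in for whatever exception the ES client actually raises.

```python
import time


def retry_on_timeout(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying with exponential backoff on TimeoutError.

    Hypothetical helper. The built-in TimeoutError is a stand-in for
    the ES client's own timeout exception. The last failure is
    re-raised so a persistent outage still fails the build.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except TimeoutError:
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, then 4x, ... between tries.
            sleep(base_delay * (2 ** attempt))
```

A test setup step could wrap its indexing call as `retry_on_timeout(lambda: index_all())`, where `index_all` is again a placeholder name. This only hides the symptom; the resource contention on the Jenkins box described in comment 3 remains the underlying cause.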
(Assignee)

Comment 5

5 years ago
As per comment #3 and bug 803599, closing as dupe.
Assignee: server-ops-webops → dmaher
Status: REOPENED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 803599
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations