Intermittent linux asan jsreftest jobs fail as exceptions with "claim_expired" after multiple retries
Categories
(Taskcluster :: Workers, defect)
Tracking
(Not tracked)
People
(Reporter: CosminS, Unassigned)
References
(Regression)
Details
(Keywords: intermittent-failure, regression)
First noticed here: https://treeherder.mozilla.org/jobs?repo=autoland&revision=7dc6122ebd68b46a9d4320c40c86f20c2b78984a&selectedTaskRun=ZkyCwM6BR-eR9O9vxqOEfg.5&searchStr=Linux%2C18.04%2Cx64%2CWebRender%2Casan%2Copt%2CReftests%2Ctest-linux1804-64-asan-qr%2Fopt-jsreftest%2CJ1
This is probably caused by the changes in Bug 1847258.
The jobs keep retrying until they end up as an exception with claim_expired.
There was a similar bug for browser-chrome tests, Bug 1859204, which was fixed by increasing the RAM of the machines in https://hg.mozilla.org/integration/autoland/rev/fb7b6fc608af4116fcc99a1003dfafc5bb78818a.
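To confirm the retry pattern described above for a given task, something like the following sketch may help. It assumes the task's status object has already been fetched from the Taskcluster queue and that each entry in `runs` carries `runId`, `state`, and `reasonResolved` fields. The helper name `summarize_runs` and the sample payload are hypothetical and only for illustration; the payload is not real data from this bug.

```python
from typing import Any

# Assumption: Taskcluster reports the reason as "claim-expired". The bug title
# spells it "claim_expired", so both spellings are accepted here.
CLAIM_EXPIRED = {"claim-expired", "claim_expired"}


def summarize_runs(status: dict[str, Any]) -> dict[str, Any]:
    """Summarize how many runs of a task ended as claim-expired exceptions.

    `status` is assumed to be the inner status object of a task,
    i.e. a dict with a "runs" list.
    """
    runs = status.get("runs", [])
    expired = [
        run.get("runId")
        for run in runs
        if run.get("state") == "exception"
        and run.get("reasonResolved") in CLAIM_EXPIRED
    ]
    last_run_id = runs[-1].get("runId") if runs else None
    return {
        "total_runs": len(runs),
        "claim_expired_runs": expired,
        # True when the final run (no retries left) is itself claim-expired.
        "ended_as_claim_expired": bool(runs) and last_run_id in expired,
    }


# Hypothetical payload, made up for illustration only.
sample_status = {
    "taskId": "hypothetical-task-id",
    "runs": [
        {"runId": 0, "state": "exception", "reasonResolved": "claim-expired"},
        {"runId": 1, "state": "exception", "reasonResolved": "claim-expired"},
        {"runId": 2, "state": "exception", "reasonResolved": "claim-expired"},
    ],
}

summary = summarize_runs(sample_status)
print(summary)
```

A task where every run, including the last one, shows up in `claim_expired_runs` would match the behaviour reported here.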
Comment 1•1 year ago
Set release status flags based on info from the regressing bug 1847258
Comment 2•1 year ago (Reporter)
Could be a combination of Bug 1865910 and then bug 1847258. https://treeherder.mozilla.org/jobs?repo=autoland&group_state=expanded&searchStr=Linux%2C18.04%2Cx64%2CWebRender%2Casan%2Copt%2CReftests%2Ctest-linux1804-64-asan-qr%2Fopt-jsreftest%2CJ&tochange=7dc6122ebd68b46a9d4320c40c86f20c2b78984a&fromchange=ee683c3699a76e2e7159a5297800dc28b78ac22a&selectedTaskRun=Wrbq080wTLyKFnTwDsm13w.5
Comment 3•1 year ago
Bug 1865910 only affects code inside #ifdef JS_ION_PERF, which is off in all CI builds.
Comment 4•1 year ago (Reporter)
Added some retriggers in this range in case it could be from changes in Bug 1852098.
Comment 6•1 year ago
On initial inspection, this appears to be either:
1) a Docker Worker issue, or
2) related to https://github.com/taskcluster/taskcluster/issues/6682
If it is 1), this should be resolved once Docker Worker workers have been replaced with Generic Worker workers in fxci (Docker Worker is no longer supported).
@aerickson - do you have a bug/issue tracking that work?
If this is 2), the SRE team are investigating excessive HTTP 502 errors which seem to be the root cause of many of the claim expired issues we have been seeing.
@wezhou - do you have a bug/issue tracking that work?
Comment 7•1 year ago
(In reply to Pete Moore [:pmoore][:pete] from comment #6)
> If it is 1) this should be resolved when Docker Worker workers have been replaced with Generic Worker workers in fxci (Docker Worker is no longer supported)
> @aerickson - do you have a bug/issue tracking that work?
We're tracking that in https://mozilla-hub.atlassian.net/browse/RELOPS-528.
(In reply to Pete Moore [:pmoore][:pete] from comment #6)
> If this is 2), the SRE team are investigating excessive HTTP 502 errors which seem to be the root cause of many of the claim expired issues we have been seeing.
> @wezhou - do you have a bug/issue tracking that work?
Here is the ticket: https://mozilla-hub.atlassian.net/browse/SVCSE-1609
Comment 12•11 months ago (Reporter)
Stopped failing around Dec 6th: https://treeherder.mozilla.org/intermittent-failures/bugdetails?startday=2023-11-22&endday=2023-12-22&tree=all&failurehash=all&bug=1866612