Frequent Linux 18.04 x64 WebRender asan opt browser-chrome jobs fail as exceptions with "claim_expired"
Categories
(Taskcluster :: Workers, defect)
Tracking
(Not tracked)
People
(Reporter: imoraru, Assigned: jmaher)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: intermittent-failure, regression)
Attachments
(1 file)
I think this is somehow regressed by the changes from Bug 1855321.
- backfill range and retriggers
The backfill range and retriggers show that the failures appeared when the bug first landed, disappeared when it was backed out, and appeared again when it re-landed.
Hi Chris! Can you please take a look at this?
Thank you!
Comment 1•1 year ago
The Bugbug bot thinks this bug should belong to the 'Core::Graphics: WebRender' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
Comment 2•1 year ago
:cfallin, since you are the author of the regressor, bug 1855321, could you take a look?
For more information, please visit BugBot documentation.
Comment 3•1 year ago
:cfallin, since you are the author of the regressor, bug 1855321, could you take a look?
Unfortunately, no, I can't: the link to the log tells me
NetworkError when attempting to fetch resource.
An error occurred attempting to load the provided log.
Please check the URL and ensure it is reachable.
Without logs, I can't really do much more.
In any case, my change was (i) ifdef'd out by default, (ii) to the JavaScript engine, (iii) having nothing to do with software WebRender. I would be very surprised if it were related. Any more information showing how it is related (and, ideally, a way for me to reproduce) would be helpful!
Comment 4 (Reporter)•1 year ago
Yes, unfortunately this kind of failure does not have a log. I was surprised to see that the backfills pointed at that bug as the culprit, which is why I asked whether it could somehow cause this.
Perhaps Aryx will have additional insight into this or Luca.
Comment hidden (Intermittent Failures Robot)
Comment 7•1 year ago
I'm not sure I can add much. Trying to look at the failures linked from comment 5 didn't work (the Treeherder view doesn't seem to work for them), and the bug Jan linked in comment 6 suggests this may be an infra issue, so it may not even be a webextensions test failure that is being hit.
Clearing my pending needinfo for now, but feel free to add it back if we get a link to some webextensions test failures or some other issue that looks like it is on the webextensions side of things.
Comment 14•1 year ago
This is starting to get more and more frequent, hence reaching our disable-recommended queue.
Pete, can you help us redirect this bug?
Thank you.
Comment 15•1 year ago
The recent failures are for tasks which ran the tests listed in browser/components/sessionstore/test/browser.toml. In the last week the test folder has had no modifications except eslint changes, while the frequent failures started 3 days ago.
Andreas, could you investigate? One of the tests causes the test machine to become unresponsive (often from an OOM-like situation), and it stops communicating with the worker manager.
Comment 18•1 year ago
Do you have a link to logs where this happens? If the only lead is that it happens in browser/components/sessionstore/test/browser.toml, I have no real place to start. Is there a way I can reproduce this? If so, is there a way to chunk this particular folder to see if that reproduces it?
Comment 21•1 year ago
(In reply to Andreas Farre [:farre] from comment #18)
Do you have a link to logs where this happens? If the only lead is that it happens in browser/components/sessionstore/test/browser.toml, I have no real place to start. Is there a way I can reproduce this? If so, is there a way to chunk this particular folder to see if that reproduces it?
There are no logs available because the worker becomes unresponsive, stops communicating with the Taskcluster instance, and does not upload logs.
A Try push which only requests this folder should reproduce the issue. There are many tests in the manifest, and I hope a domain expert can identify what causes the issue. Interactive workers fail to run any browser-chrome tests on Linux ASan.
Comment 30•1 year ago
Similar to bug 1863773, the timeout happens after the last test in the file finished.
Comment 36•1 year ago
The issue has spread: 5/16 Linux asan browser-chrome chunks now fail or are likely to fail. Interactive debugging is not possible because of bug 1862426.
Comment 38•1 year ago
This might have gone away after changes in https://hg.mozilla.org/integration/autoland/rev/fb7b6fc608af4116fcc99a1003dfafc5bb78818a
Comment 39•1 year ago
cc yarik - although I would be surprised if increasing RAM helps.
Comment 41•1 year ago
I think there are multiple causes of claim_expired:
- Workers dying for an unknown reason while running a task
- Worker-runner <> worker miscommunication (shutting itself down while asking for more work)
- Network responses that never reach workers (which was discovered last week)
I believe that increasing RAM might help with the first case, if OOM errors were happening there (we've seen quite a lot of similar issues on CommunityTC in fuzzing).
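The first case, together with the earlier comments about workers becoming unresponsive, can be pictured with a toy model. A worker holds its claim on a task only until a deadline and has to renew it; if the worker dies (for example from an OOM), no renewal arrives and the run ends as an exception. The sketch below is not Taskcluster code: `Run`, `reclaim`, `resolve_if_expired`, and the 20-minute `CLAIM_WINDOW` are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical claim length in seconds; the real value is not given in this bug.
CLAIM_WINDOW = 20 * 60


@dataclass
class Run:
    taken_until: float            # deadline by which the worker must renew its claim
    state: str = "running"
    reason: Optional[str] = None


def reclaim(run: Run, now: float) -> None:
    """Worker side: a healthy worker extends its claim before the deadline."""
    if run.state == "running" and now < run.taken_until:
        run.taken_until = now + CLAIM_WINDOW


def resolve_if_expired(run: Run, now: float) -> None:
    """Queue side: a run whose claim lapsed is resolved as an exception."""
    if run.state == "running" and now >= run.taken_until:
        run.state = "exception"
        run.reason = "claim_expired"


# A worker that renews once and then goes silent (e.g. it ran out of memory).
run = Run(taken_until=CLAIM_WINDOW)
reclaim(run, now=600)                          # healthy renewal
resolve_if_expired(run, now=700)               # still within the claim
resolve_if_expired(run, now=600 + CLAIM_WINDOW)  # no renewal arrived
print(run.state, run.reason)
```

In this model a dead worker also never uploads anything, which matches the observation in this bug that such jobs have no logs.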
Comment 42•1 year ago
The latest failures are these jsreftest jobs, which might have a different cause: https://treeherder.mozilla.org/jobs?repo=autoland&revision=7dc6122ebd68b46a9d4320c40c86f20c2b78984a&selectedTaskRun=ZkyCwM6BR-eR9O9vxqOEfg.5&searchStr=Linux%2C18.04%2Cx64%2CWebRender%2Casan%2Copt%2CReftests%2Ctest-linux1804-64-asan-qr%2Fopt-jsreftest%2CJ1
The mochitest failures have stopped since increasing RAM.
Comment 43•1 year ago
No more failures here since the RAM increase. For the jsreftest exceptions I've filed bug 1866612.
Comment 46•1 year ago
These kinds of failures have resurfaced here. Both jobs run on t-linux-large-gcp machines and are mochitest-plain.
Comment 48•10 months ago
Hey Joel, this has been failing often (perhaps even permanently) since Bug 1889412 landed, as seen here. Could you take a look please?
Comment 50 (Assignee)•10 months ago
Comment 51 (Assignee)•10 months ago
Splitting up a long manifest that causes memory issues, disabling a very long-running test, and running on xlarge instances instead of large.
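As a rough illustration of the first step above, a long list of tests can be broken into smaller consecutive groups so that no single run accumulates as much memory. This is only a sketch under assumptions: `split_manifest`, the group size, and the test names are hypothetical, and the actual patch is not shown in this thread.

```python
from typing import List


def split_manifest(tests: List[str], max_per_manifest: int) -> List[List[str]]:
    """Split one long list of test files into consecutive smaller groups."""
    if max_per_manifest < 1:
        raise ValueError("max_per_manifest must be at least 1")
    return [
        tests[i:i + max_per_manifest]
        for i in range(0, len(tests), max_per_manifest)
    ]


# Hypothetical test names, used only to show the grouping.
tests = ["browser_a.js", "browser_b.js", "browser_c.js", "browser_d.js", "browser_e.js"]
for group in split_manifest(tests, 2):
    print(group)
```

Keeping the groups consecutive preserves the original test order within each smaller manifest.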
Comment 55•10 months ago
Comment 57•10 months ago
bugherder