Closed Bug 1712866 Opened 3 years ago Closed 7 months ago

Perma Linux 18.04 x64 tsan opt crashtest exceptions claim_expired

Categories

(Core :: Sanitizers, defect)

Tracking

RESOLVED WORKSFORME

People

(Reporter: nataliaCs, Unassigned)

Attachments

(1 file)

Summary: Intermittent Linux 18.04 x64 tsan opt crashtest exceptions claim_expired → Perma Linux 18.04 x64 tsan opt crashtest exceptions claim_expired

Some crashtest suites were hitting what looked like an infinite retry loop after running out of resources. I couldn't work out why it was happening, but I did find and turn off the series of tests that seemed to be causing it. After that I didn't run into any of these, but it looks like it's still happening in the wild. I'll examine more of the tests tonight to see if any of them are responsible.

If there are more tests left over in the suite doing this at high enough volume, we may need to roll back until we can find the incriminating tests.

Flags: needinfo?(kwright)

I notice there are no logs for any of these. Is there any way to get logs, or are they discarded on retry?

None of the crashtest exceptions have any logs, or at least they are not visible on our side.
Aryx, do you know what we can do to access them?
Thank you.

Flags: needinfo?(aryx.bugmail)

The machine terminates before the logs can be uploaded. Requesting an interactive worker, logging in, and running the tests might be the only way to catch this.

Flags: needinfo?(aryx.bugmail)

Bug 1712198, which has the first failure, added a crashtest. Emilio, can this be fixed soon?

Flags: needinfo?(emilio)

Yeah, I'll try to repro. Worst case we can disable the crashtest on tsan or something.

Not ideal, but not worse than having the jobs not terminate. I can try
to spend some time digging, but I have other stuff on my plate so this
should at least go back to the previous state.

Assignee: nobody → emilio
Status: NEW → ASSIGNED

Let's disable for now on tsan, I don't have a lot of cycles this month.

Flags: needinfo?(emilio)
See Also: → 1713376

(In reply to Cristian Tuns from comment #18)

This started to fail as an exception on Linux WebRender asan opt.
https://treeherder.mozilla.org/jobs?repo=autoland&group_state=expanded&resultStatus=success%2Ctestfailed%2Cbusted%2Cexception%2Cpending%2Crunning%2Cretry&fromchange=55fa1d670b119502b16f8795990f9456e1bfcd8c&searchStr=linux%2C18.04%2Cx64%2Cwebrender%2Casan%2Copt%2Cmochitests%2Cwith%2Csoftware%2Cwebrender%2Cand%2Cfission%2Cenabled%2Ctest-linux1804-64-asan-qr%2Fopt-mochitest-browser-chrome-swr-fis-e10s%2Cbc13&tochange=728ead24c5e80f438419b406678c567ed1215c50&selectedTaskRun=MlF6ilf-Q_itqcygvcLqQA.5

In the past when I've seen this it was often related to resource exhaustion. If something has recently caused asan to run out of memory, then we may run into this issue. From my understanding, the machine crashes before it can generate any usable artifacts.

Recent cases are being handled in bug 1731580 (and regressor seems to have been identified).

See Also: → 1731580
Severity: -- → S3

(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #21)

Recent cases are being handled in bug 1731580 (and regressor seems to have been identified).

Can this be closed?

Usually we keep the bug open until the issue has been fixed and don't close it when the test gets skipped. Developers can still choose to close it and work on the fix in a different bug.

The leave-open keyword is there and there has been no activity for 6 months.
:emilio, maybe it's time to close this bug?
For more information, please visit BugBot documentation.

Flags: needinfo?(emilio)

Not working actively on it, but it seems this hasn't happened in quite a while, let's file a new bug if it ever happens again.

Assignee: emilio → nobody
Status: ASSIGNED → RESOLVED
Closed: 7 months ago
Flags: needinfo?(emilio)
Resolution: --- → WORKSFORME