Closed
Bug 1471724
Opened 7 years ago
Closed 1 year ago
Ignore ASan shutdown hangs until we can eliminate them
Categories
(Testing :: Mochitest, enhancement)
Testing
Mochitest
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
People
(Reporter: kmag, Unassigned)
References
Details
Attachments
(1 file, 2 obsolete files)
Landing bug 1442737 somehow triggered unpredictable ASan shutdown hangs every few thousand sessions. This seems to be a bug in ASan, so I don't think it should prevent important patches from landing. But since we run several thousand ASan browser sessions per push, it also really shouldn't trigger a handful of oranges.
As a short term workaround, I'd like to treat those hangs as non-critical failures.
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 3•7 years ago
|
||
mozreview-review |
Comment on attachment 8988345 [details]
Bug 1471724: Ignore shutdown hangs in ASan code.
https://reviewboard.mozilla.org/r/253590/#review260218
::: testing/mochitest/runtests.py:2850
(Diff revision 1)
> "%d seconds with no output") % (self.lastTestSeen, int(timeout))
> self.message_logger.dump_buffered()
> self.message_logger.buffering = False
> self.log.info(error_message)
> + if mozinfo.info["asan"] and self.lastTestSeen == "Shutdown":
> - self.log.error("Force-terminating active process(es).")
> + self.log.error("Force-terminating active process(es).")
You want to warn here and error below right?
Reporter | ||
Comment 4•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8988345 [details]
Bug 1471724: Ignore shutdown hangs in ASan code.
https://reviewboard.mozilla.org/r/253590/#review260218
> You want to warn here and error below right?
Hm. Yes. My last try run passed with it this way around, though, so maybe something else fixed the timeouts...
Comment hidden (mozreview-request) |
Comment 6•7 years ago
|
||
mozreview-review |
Comment on attachment 8988344 [details]
Bug 1471724: Part 1 - Use the correct structured logging for Shutdown task.
https://reviewboard.mozilla.org/r/253588/#review260296
::: testing/mochitest/browser-test.js:634
(Diff revision 1)
>
> // In the main process, we print the ShutdownLeaksCollector message here.
> let pid = Services.appinfo.processID;
> dump("Completed ShutdownLeaks collections in process " + pid + "\n");
>
> - this.structuredLogger.info("TEST-START | Shutdown");
> + this.structuredLogger.testStart("Shutdown");
I could be wrong, but won't we need a testEnd() as well?
Attachment #8988344 -
Flags: review?(jmaher) → review-
Comment 7•7 years ago
|
||
mozreview-review |
Comment on attachment 8988345 [details]
Bug 1471724: Ignore shutdown hangs in ASan code.
https://reviewboard.mozilla.org/r/253590/#review260298
I know this depends on testStart() from the first patch.
Attachment #8988345 -
Flags: review?(jmaher) → review+
Comment 8•7 years ago
|
||
:ahal, could you chime in on making the shutdown of the test harness a formal structured log for testStart() vs .info(TEST-START) ?
Flags: needinfo?(ahal)
Comment 9•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8988344 [details]
Bug 1471724: Part 1 - Use the correct structured logging for Shutdown task.
https://reviewboard.mozilla.org/r/253588/#review260296
> I could be wrong, but won't we need a testEnd() as well?
Agreed, this isn't something we should be doing. It will cause confusion for consumers of structured logs (like ActiveData and all the things that depend on that) by falsely labeling this as a test. It might also cause failures in certain test harnesses if not logged with a corresponding `test_end`. The `TEST-START | Shutdown` message is a holdover from a bygone era and the only reason we haven't changed it to something more sane (like "INFO Test Harness finished") is a probably unfounded fear of breaking non-structured consumers of the log.
We should be able to solve this completely on the python side. Over there we know ahead of time which tests are expected to run. We also know whether a `test_end` action for a given test ever came in. So if we get the `test_end` action of the last test, we can change `self.lastTestSeen` to `Last test finished` then.
I have a WIP patch already, I'll upload shortly.
Comment 10•7 years ago
|
||
mozreview-review |
Comment on attachment 8988345 [details]
Bug 1471724: Ignore shutdown hangs in ASan code.
https://reviewboard.mozilla.org/r/253590/#review260408
::: testing/mochitest/runtests.py:2849
(Diff revision 2)
> + if mozinfo.info["asan"] and self.lastTestSeen == "Shutdown":
> + self.log.warning("Force-terminating active process(es).")
Please add a comment referencing the follow-up bug to actually fix the ASan hangs.
Updated•7 years ago
|
Flags: needinfo?(ahal)
Comment hidden (mozreview-request) |
Comment 12•7 years ago
|
||
Kris could you rebase your second change on top of mine and verify it solves the ASan issue? You'll have to change "Shutdown" to "Last test finished". I think that's a slightly more truthful statement than "Shutdown".
If this works, I could land it in a separate bug which we could then make this one depend on.
Flags: needinfo?(kmaglione+bmo)
Reporter | ||
Comment 13•7 years ago
|
||
(In reply to Andrew Halberstadt [:ahal] from comment #12)
> Kris could you rebase your second change on top of mine and verify it solves
> the ASan issue? You'll have to change "Shutdown" to "Last test finished". I
> think that's a slightly more truthful statement than "Shutdown".
>
> If this works, I could land it in a separate bug which we could then make
> this one depend on.
I'll give it a try. So far, it's been hard to test this, since it only shows up sometimes, and then I need to crawl the logs to find the ignored failure, but it seems like it should work.
Thanks
Flags: needinfo?(kmaglione+bmo)
Comment hidden (mozreview-request) |
Reporter | ||
Updated•7 years ago
|
Attachment #8988344 -
Attachment is obsolete: true
Reporter | ||
Comment 15•7 years ago
|
||
Your patch did the job. Can you file a bug to get it landed?
I did wind up having to also change the TEST-UNEXPECTED-TIMEOUT to TEST-EXPECTED-TIMEOUT since treeherder apparently ignores the structured logging output and treats it as an error.
Flags: needinfo?(ahal)
Updated•7 years ago
|
Attachment #8988501 -
Attachment is obsolete: true
Updated•2 years ago
|
Severity: normal → S3
Comment 17•1 year ago
|
||
The bug assignee is inactive on Bugzilla, so the assignee is being reset.
Assignee: kmaglione+bmo → nobody
Updated•1 year ago
|
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → INCOMPLETE
You need to log in
before you can comment on or make changes to this bug.
Description
•