Open Bug 1471724 Opened 3 years ago Updated 3 years ago

Ignore ASan shutdown hangs until we can eliminate them

Categories

(Testing :: Mochitest, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

People

(Reporter: kmag, Assigned: kmag)

References

Details

Attachments

(1 file, 2 obsolete files)

Landing bug 1442737 somehow triggered unpredictable ASan shutdown hangs every few thousand sessions. This seems to be a bug in ASan, so I don't think it should prevent important patches from landing. But since we run several thousand ASan browser sessions per push, it also really shouldn't trigger a handful of oranges.

As a short term workaround, I'd like to treat those hangs as non-critical failures.
Comment on attachment 8988345 [details]
Bug 1471724: Ignore shutdown hangs in ASan code.

https://reviewboard.mozilla.org/r/253590/#review260218

::: testing/mochitest/runtests.py:2850
(Diff revision 1)
>                           "%d seconds with no output") % (self.lastTestSeen, int(timeout))
>          self.message_logger.dump_buffered()
>          self.message_logger.buffering = False
>          self.log.info(error_message)
> +        if mozinfo.info["asan"] and self.lastTestSeen == "Shutdown":
> -        self.log.error("Force-terminating active process(es).")
> +            self.log.error("Force-terminating active process(es).")

You want to warn here and error below right?
Comment on attachment 8988345 [details]
Bug 1471724: Ignore shutdown hangs in ASan code.

https://reviewboard.mozilla.org/r/253590/#review260218

> You want to warn here and error below right?

Hm. Yes. My last try run passed with it this way around, though, so maybe something else fixed the timeouts...
Comment on attachment 8988344 [details]
Bug 1471724: Part 1 - Use the correct structured logging for Shutdown task.

https://reviewboard.mozilla.org/r/253588/#review260296

::: testing/mochitest/browser-test.js:634
(Diff revision 1)
>  
>      // In the main process, we print the ShutdownLeaksCollector message here.
>      let pid = Services.appinfo.processID;
>      dump("Completed ShutdownLeaks collections in process " + pid + "\n");
>  
> -    this.structuredLogger.info("TEST-START | Shutdown");
> +    this.structuredLogger.testStart("Shutdown");

I could be wrong, but won't we need a testEnd() as well?
Attachment #8988344 - Flags: review?(jmaher) → review-
Comment on attachment 8988345 [details]
Bug 1471724: Ignore shutdown hangs in ASan code.

https://reviewboard.mozilla.org/r/253590/#review260298

I know this depends on testStart() from the first patch.
Attachment #8988345 - Flags: review?(jmaher) → review+
:ahal, could you chime in on making the shutdown of the test harness a formal structured log for testStart() vs .info(TEST-START) ?
Flags: needinfo?(ahal)
Blocks: 1471222
Comment on attachment 8988344 [details]
Bug 1471724: Part 1 - Use the correct structured logging for Shutdown task.

https://reviewboard.mozilla.org/r/253588/#review260296

> I could be wrong, but won't we need a testEnd() as well?

Agreed, this isn't something we should be doing. It will cause confusion for consumers of structured logs (like ActiveData and all the things that depend on that) by falsely labeling this as a test. It might also cause failures in certain test harnesses if not logged with a corresponding `test_end`. The `TEST-START | Shutdown`  message is a holdover from a bygone era and the only reason we haven't changed it to something more sane (like "INFO Test Harness finished") is a probably unfounded fear of breaking non-structured consumers of the log.

We should be able to solve this completely on the python side. Over there we know ahead of time which tests are expected to run. We also know whether a `test_end` action for a given test ever came in. So if we get the `test_end` action of the last test, we can change `self.lastTestSeen` to `Last test finished` then.

I have a WIP patch already, I'll upload shortly.
Comment on attachment 8988345 [details]
Bug 1471724: Ignore shutdown hangs in ASan code.

https://reviewboard.mozilla.org/r/253590/#review260408

::: testing/mochitest/runtests.py:2849
(Diff revision 2)
> +        if mozinfo.info["asan"] and self.lastTestSeen == "Shutdown":
> +            self.log.warning("Force-terminating active process(es).")

Please add a comment referencing the follow-up bug to actually fix the ASan hangs.
Flags: needinfo?(ahal)
Kris could you rebase your second change on top of mine and verify it solves the ASan issue? You'll have to change "Shutdown" to "Last test finished". I think that's a slightly more truthful statement than "Shutdown".

If this works, I could land it in a separate bug which we could then make this one depend on.
Flags: needinfo?(kmaglione+bmo)
(In reply to Andrew Halberstadt [:ahal] from comment #12)
> Kris could you rebase your second change on top of mine and verify it solves
> the ASan issue? You'll have to change "Shutdown" to "Last test finished". I
> think that's a slightly more truthful statement than "Shutdown".
> 
> If this works, I could land it in a separate bug which we could then make
> this one depend on.

I'll give it a try. So far, it's been hard to test this, since it only shows up sometimes, and then I need to crawl the logs to find the ignored failure, but it seems like it should work.

Thanks
Flags: needinfo?(kmaglione+bmo)
See Also: → 1472064
Attachment #8988344 - Attachment is obsolete: true
Your patch did the job. Can you file a bug to get it landed?

I did wind up having to also change the TEST-UNEXPECTED-TIMEOUT to TEST-EXPECTED-TIMEOUT since treeherder apparently ignores the structured logging output and treats it as an error.
Flags: needinfo?(ahal)
Depends on: 1472280
Filed bug 1472280, patch is up for review.
Flags: needinfo?(ahal)
Attachment #8988501 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.