Closed Bug 1878346 Opened 9 months ago Closed 9 months ago

Attempt to prevent intermittent failure in test_bug514732-2.xhtml

Tracking

()

Status:

RESOLVED FIXED

Milestone:

124 Branch

Tracking Flags:

Tracking

Status

firefox-esr115

---

fixed

firefox123

---

fixed

firefox124

---

fixed

People

(Reporter: dholbert, Assigned: dholbert)

References

Details

Attachments

(3 files)

Bug 1878346: Avoid intermittent test failures in test_bug514732-2.xhtml by flushing layout before setting up MozScrolledAreaChanged event-handler. r?#layout 9 months ago Daniel Holbert [:dholbert] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1878346: Avoid intermittent test failures in test_bug514732-2.xhtml by flushing layout before setting up MozScrolledAreaChanged event-handler. r?#layout 9 months ago Daniel Holbert [:dholbert] 48 bytes, text/x-phabricator-request	phab-bot : approval-mozilla-beta+	Details \| Review
Bug 1878346: Avoid intermittent test failures in test_bug514732-2.xhtml by flushing layout before setting up MozScrolledAreaChanged event-handler. r?#layout 9 months ago Daniel Holbert [:dholbert] 48 bytes, text/x-phabricator-request	phab-bot : approval-mozilla-esr115+	Details \| Review

Daniel Holbert [:dholbert]

Assignee

Description

•

9 months ago

•

Edited

I was able to reproduce the intermittent failures in bug 1765787 locally a few times yesterday (though it stopped reproducing for me; it's very fiddly).

In the instances of the failure that I caught, it seems like the issue was this:

the test has a <browser> that starts out seeked to about:blank
the test immediately (in its load handler) calls doPageNavigation to navigate that <browser>, and the test listens for a MozScrolledAreaChanged event as a signal that it can move on to the next part of the test.
BUT occasionally, in that first doPageNavigation call, the <browser>'s initial about:blank document still has a pending reflow which posts a MozScrolledAreaChanged when it completes (because its viewport size does change size -- maybe the outer test element resized or something as part of its load).
So the test's first doPageNavigation call occasionally receives that MozScrolledAreaChanged and misinterprets it as a sign that the first navigation completed, even though it's barely started.
...and the test moves onto the next part of the test, despite that initial page not having been loaded in the <browser> and hence not being in the bfcache.
This results in us failing to navigate back (because there's nowhere to navigate back to) later on in the test, per bug 1765787 comment 38, and that explains this failure-mode:

TEST-UNEXPECTED-FAIL | layout/generic/test/test_bug514732-2.xhtml | Test timed out.

...because the event that we're waiting for (from our back-navigation) never arrives, because we didn't actually navigate back.

Also: I think in some cases, our first doPageNavigation call does get a chance to complete and so we end up getting two MozScrolledAreaChanged events (the unexpected one from about:blank and the expected one from the actual page that we land on). This explains the other failure-mode, which is:

layout/generic/test/test_bug514732-2.xhtml | Unexpected event (MozScrolledAreaChanged) occurred

I've got a patch that I think will fix things by doing an initial load to a dummy page, where we simply watch for the load event instead of MozScrolledAreaChanged. This should establish a more reliable initial state since I think the about:blank load event should not still be pending at that point and won't misfire in the same way that its MozScrolledAreaChanged event was.

Daniel Holbert [:dholbert]

Assignee

Comment 1

•

9 months ago

Actually, I think a simpler & more-robust fix is just to explicitly flush layout in the outer and inner doc before we do the navigation. That ensures that any pending reflow-induced MozScrolledAreaChanged events from about:blank will have already been fired before we make our first doPageNavigation call.

Daniel Holbert [:dholbert]

Assignee

Comment 2

•

9 months ago

Attached file Bug 1878346: Avoid intermittent test failures in test_bug514732-2.xhtml by flushing layout before setting up MozScrolledAreaChanged event-handler. r?#layout — Details

This patch doesn't change behavior; it's just making this test more robust.

Without this patch, we can occasionall get an unexpected MozScrolledAreaChanged
from the initial about:blank test-document, which throws off the test logic
and causes test failures and timeouts. See this bug comment for more details
on how that happens:
https://bugzilla.mozilla.org/show_bug.cgi?id=1878346#c0

Daniel Holbert [:dholbert]

Assignee

Comment 3

•

9 months ago

Without my patch, in an --enable-optimize --disable-debug build, I can pretty reliably reproduce the Unexpected event (MozScrolledAreaChanged) occurred failure-mode, with the following command:

./mach mochitest --repeat 200 --headless layout/generic/test/test_bug514732-2.xhtml

I ran that twice in a row, and I got 8 failures both times, like so:

===============

mochitest-chrome
~~~~~~~~~~~~~~~~
Ran 1013 checks (812 subtests, 201 tests)
Expected results: 1005
Unexpected results: 8
  subtest: 8 (8 fail)

Unexpected Results
------------------
layout/generic/test/test_bug514732-2.xhtml
  FAIL Unexpected event (MozScrolledAreaChanged) occurred
SimpleTest.ok@chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:426:16
pageEventListener@chrome://mochikit/content/tests/SimpleTest/docshell_helpers.js:490:7
EventListener.handleEvent*doPageNavigation@chrome://mochikit/content/tests/SimpleTest/docshell_helpers.js:245:31
testIterator@chrome://mochitests/content/chrome/layout/generic/test/file_bug514732_window.xhtml:43:25
nextTest@chrome://mochitests/content/chrome/layout/generic/test/file_bug514732_window.xhtml:28:13
setTimeout handler*onload@chrome://mochitests/content/chrome/layout/generic/test/file_bug514732_window.xhtml:1:11
[...7 more copies of the above test-failure]

With this patch, that same command completes successfully with no failures. I ran it again with 4x as many repeat counts (--repeat 800) and still got no failures. So that's a good sign that this does indeed prevent/mitigate the situation that was leading to the test failures.

Daniel Holbert [:dholbert]

Assignee

Comment 4

•

9 months ago

(In reply to Daniel Holbert [:dholbert] from comment #3)

Without my patch [...] I can pretty reliably reproduce the Unexpected event (MozScrolledAreaChanged) occurred failure-mode
[...]
With this patch, that same command completes successfully with no failures.

(To be clear, I think the patch also addresses the other (test-timeout) failure-mode from bug 1765787, too, as described in comment 0 here. Both failure-modes are symptoms of the same underlying thing having gone wrong.)

Pulsebot

Comment 5

•

9 months ago

Pushed by dholbert@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/0238b7b5d676 Avoid intermittent test failures in test_bug514732-2.xhtml by flushing layout before setting up MozScrolledAreaChanged event-handler. r=layout-reviewers,emilio

Daniel Holbert [:dholbert]

Assignee

Comment 6

•

9 months ago

•

Edited

Good news: macos opt test-verify builds seem to confirm that this fixed the issue. They reproduce the failures quite reliably in a build without this patch, and they're solid-green with the patch.

Without the patch: 5/5 opt TV runs are orange:
https://treeherder.mozilla.org/jobs?repo=try&selectedTaskRun=O5kVqM5eSu6XEGDM3-jR0g.0&resultStatus=testfailed%2Cbusted%2Cexception%2Csuccess%2Crunning%2Cpending%2Crunnable&revision=8214788adbf84b869520d84eb3aa4f804017494f&searchStr=test-macosx1015-64-qr%2Fopt-test-verify

With the patch: 14/14 opt TV runs are green:
https://treeherder.mozilla.org/jobs?repo=try&resultStatus=testfailed%2Cbusted%2Cexception%2Csuccess%2Crunning%2Cpending%2Crunnable&revision=cfebe4a742ae39f7be3ff14a25ead13369567162&searchStr=test-macosx1015-64-qr%2Fopt-test-verify

Let's get this uplifted to beta/esr115 to stop the intermittent failures there as well.

Daniel Holbert [:dholbert]

Assignee

Comment 7

•

9 months ago

Attached file Bug 1878346: Avoid intermittent test failures in test_bug514732-2.xhtml by flushing layout before setting up MozScrolledAreaChanged event-handler. r?#layout — Details

This patch doesn't change behavior; it's just making this test more robust.

Without this patch, we can occasionally get an unexpected MozScrolledAreaChanged
from the initial about:blank test-document, which throws off the test logic
and causes test failures and timeouts. See this bug comment for more details
on how that happens:
https://bugzilla.mozilla.org/show_bug.cgi?id=1878346#c0

Original Revision: https://phabricator.services.mozilla.com/D200492

Phabricator Automation

Updated

•

9 months ago

Attachment #9378089 - Flags: approval-mozilla-beta?

Phabricator Automation

Comment 8

•

9 months ago

Uplift Approval Request

User impact if declined: No user impact either way; this is a test-only change
Code covered by automated testing: yes
Risk associated with taking this patch: Zero
Fix verified in Nightly: yes
Steps to reproduce for manual QE testing: N/A
Is Android affected?: no
String changes made/needed: None
Explanation of risk level: This is a test-only change
Needs manual QE test: no

Daniel Holbert [:dholbert]

Assignee

Comment 9

•

9 months ago

Attached file Bug 1878346: Avoid intermittent test failures in test_bug514732-2.xhtml by flushing layout before setting up MozScrolledAreaChanged event-handler. r?#layout — Details

This patch doesn't change behavior; it's just making this test more robust.

Original Revision: https://phabricator.services.mozilla.com/D200492

Phabricator Automation

Updated

•

9 months ago

Attachment #9378090 - Flags: approval-mozilla-esr115?

Phabricator Automation

Comment 10

•

9 months ago

Uplift Approval Request

Code covered by automated testing: yes
User impact if declined: No user impact either way; this is a test-only change
Fix verified in Nightly: yes
Risk associated with taking this patch: Zero
String changes made/needed: None
Is Android affected?: no
Steps to reproduce for manual QE testing: N/A
Needs manual QE test: no
Explanation of risk level: This is a test-only change

Sandor Molnar[:smolnar]

Comment 11

•

9 months ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/0238b7b5d676

Status: NEW → RESOLVED

Closed: 9 months ago

status-firefox124: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 124 Branch

Phabricator Automation

Updated

•

9 months ago

Attachment #9378089 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Lando Automation

Updated

•

9 months ago

status-firefox123: --- → fixed

Pulsebot

Comment 12

•

9 months ago

uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/b90706774a91

Phabricator Automation

Updated

•

9 months ago

Attachment #9378090 - Flags: approval-mozilla-esr115? → approval-mozilla-esr115+

Lando Automation

Updated

•

9 months ago

status-firefox-esr115: --- → fixed

Pulsebot

Comment 13

•

9 months ago

uplift

https://hg.mozilla.org/releases/mozilla-esr115/rev/8b7f1d62fafd

Daniel Holbert [:dholbert]

Assignee

Comment 14

•

9 months ago

This test-only-change was uplifted to esr115 (comment 13), but that patch appears to have been inadvertently reverted by an unrelated patch that landed in bug 1874627, specifically this part:
https://hg.mozilla.org/releases/mozilla-esr115/rev/025824aab59e3ee1778a8041eff04b9d653b8468#l4.2

I'm guessing that patch in bug 1874627 was generated by-hand using some sort of diff/revert command, and it inadvertently picked up the changes here as something-to-be-reverted.

We need to get this re-landed on ESR115, or else we'll keep getting intermittent failure reports on bug 1765787 for esr115 indefinitely. What's the best course of action at this point? Should I generate a new ESR115 uplift request? Or can we just automatically re-land the patch from comment 13?

Flags: needinfo?(pascalc)

Daniel Holbert [:dholbert]

Assignee

Comment 15

•

9 months ago

(Aryx, maybe you know the best course-of-action here. I'm guessing that the patch in question (on bug 1874627) was one that you generated & directly-pushed to work around some bitrot that lando may have been hitting; maybe you can just revert your changes to layout/generic/test/file_bug514732_window.xhtml as a followup commit to that bug or this bug?)

Flags: needinfo?(aryx.bugmail)

Pulsebot

Comment 16

•

9 months ago

uplift

https://hg.mozilla.org/releases/mozilla-esr115/rev/83d1c3fc0bb7

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 17

•

9 months ago

Sorry for the unintended reversal. Comment 16 fixed this.

Flags: needinfo?(pascalc)

Flags: needinfo?(aryx.bugmail)

Daniel Holbert [:dholbert]

Assignee

Comment 18

•

9 months ago

Thanks!

You need to log in before you can comment on or make changes to this bug.