Attempt to prevent intermittent failure in test_bug514732-2.xhtml
Categories
(Core :: Layout, task)
Tracking
()
People
(Reporter: dholbert, Assigned: dholbert)
References
Details
Attachments
(3 files)
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
phab-bot
:
approval-mozilla-beta+
|
Details | Review |
48 bytes,
text/x-phabricator-request
|
phab-bot
:
approval-mozilla-esr115+
|
Details | Review |
I was able to reproduce the intermittent failures in bug 1765787 locally a few times yesterday (though it stopped reproducing for me; it's very fiddly).
In the instances of the failure that I caught, it seems like the issue was this:
- the test has a <browser> that starts out seeked to about:blank
- the test immediately (in its load handler) calls doPageNavigation to navigate that <browser>, and the test listens for a MozScrolledAreaChanged event as a signal that it can move on to the next part of the test.
- BUT occasionally, in that first doPageNavigation call, the <browser>'s initial about:blank document still has a pending reflow which posts a MozScrolledAreaChanged when it completes (because its viewport size does change size -- maybe the outer test element resized or something as part of its load).
- So the test's first doPageNavigation call occasionally receives that MozScrolledAreaChanged and misinterprets it as a sign that the first navigation completed, even though it's barely started.
- ...and the test moves onto the next part of the test, despite that initial page not having been loaded in the <browser> and hence not being in the bfcache.
- This results in us failing to navigate back (because there's nowhere to navigate back to) later on in the test, per bug 1765787 comment 38, and that explains this failure-mode:
TEST-UNEXPECTED-FAIL | layout/generic/test/test_bug514732-2.xhtml | Test timed out.
...because the event that we're waiting for (from our back-navigation) never arrives, because we didn't actually navigate back.
Also: I think in some cases, our first doPageNavigation call does get a chance to complete and so we end up getting two MozScrolledAreaChanged events (the unexpected one from about:blank and the expected one from the actual page that we land on). This explains the other failure-mode, which is:
layout/generic/test/test_bug514732-2.xhtml | Unexpected event (MozScrolledAreaChanged) occurred
I've got a patch that I think will fix things by doing an initial load to a dummy page, where we simply watch for the load
event instead of MozScrolledAreaChanged
. This should establish a more reliable initial state since I think the about:blank load event should not still be pending at that point and won't misfire in the same way that its MozScrolledAreaChanged
event was.
Assignee | ||
Comment 1•9 months ago
|
||
Actually, I think a simpler & more-robust fix is just to explicitly flush layout in the outer and inner doc before we do the navigation. That ensures that any pending reflow-induced MozScrolledAreaChanged
events from about:blank will have already been fired before we make our first doPageNavigation
call.
Assignee | ||
Comment 2•9 months ago
|
||
This patch doesn't change behavior; it's just making this test more robust.
Without this patch, we can occasionall get an unexpected MozScrolledAreaChanged
from the initial about:blank test-document, which throws off the test logic
and causes test failures and timeouts. See this bug comment for more details
on how that happens:
https://bugzilla.mozilla.org/show_bug.cgi?id=1878346#c0
Assignee | ||
Comment 3•9 months ago
|
||
Without my patch, in an --enable-optimize --disable-debug
build, I can pretty reliably reproduce the Unexpected event (MozScrolledAreaChanged) occurred
failure-mode, with the following command:
./mach mochitest --repeat 200 --headless layout/generic/test/test_bug514732-2.xhtml
I ran that twice in a row, and I got 8 failures both times, like so:
===============
mochitest-chrome
~~~~~~~~~~~~~~~~
Ran 1013 checks (812 subtests, 201 tests)
Expected results: 1005
Unexpected results: 8
subtest: 8 (8 fail)
Unexpected Results
------------------
layout/generic/test/test_bug514732-2.xhtml
FAIL Unexpected event (MozScrolledAreaChanged) occurred
SimpleTest.ok@chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:426:16
pageEventListener@chrome://mochikit/content/tests/SimpleTest/docshell_helpers.js:490:7
EventListener.handleEvent*doPageNavigation@chrome://mochikit/content/tests/SimpleTest/docshell_helpers.js:245:31
testIterator@chrome://mochitests/content/chrome/layout/generic/test/file_bug514732_window.xhtml:43:25
nextTest@chrome://mochitests/content/chrome/layout/generic/test/file_bug514732_window.xhtml:28:13
setTimeout handler*onload@chrome://mochitests/content/chrome/layout/generic/test/file_bug514732_window.xhtml:1:11
[...7 more copies of the above test-failure]
With this patch, that same command completes successfully with no failures. I ran it again with 4x as many repeat counts (--repeat 800
) and still got no failures. So that's a good sign that this does indeed prevent/mitigate the situation that was leading to the test failures.
Assignee | ||
Comment 4•9 months ago
|
||
(In reply to Daniel Holbert [:dholbert] from comment #3)
Without my patch [...] I can pretty reliably reproduce the
Unexpected event (MozScrolledAreaChanged) occurred
failure-mode
[...]
With this patch, that same command completes successfully with no failures.
(To be clear, I think the patch also addresses the other (test-timeout) failure-mode from bug 1765787, too, as described in comment 0 here. Both failure-modes are symptoms of the same underlying thing having gone wrong.)
Assignee | ||
Comment 6•9 months ago
•
|
||
Good news: macos opt test-verify builds seem to confirm that this fixed the issue. They reproduce the failures quite reliably in a build without this patch, and they're solid-green with the patch.
Without the patch: 5/5 opt TV runs are orange:
https://treeherder.mozilla.org/jobs?repo=try&selectedTaskRun=O5kVqM5eSu6XEGDM3-jR0g.0&resultStatus=testfailed%2Cbusted%2Cexception%2Csuccess%2Crunning%2Cpending%2Crunnable&revision=8214788adbf84b869520d84eb3aa4f804017494f&searchStr=test-macosx1015-64-qr%2Fopt-test-verify
With the patch: 14/14 opt TV runs are green:
https://treeherder.mozilla.org/jobs?repo=try&resultStatus=testfailed%2Cbusted%2Cexception%2Csuccess%2Crunning%2Cpending%2Crunnable&revision=cfebe4a742ae39f7be3ff14a25ead13369567162&searchStr=test-macosx1015-64-qr%2Fopt-test-verify
Let's get this uplifted to beta/esr115 to stop the intermittent failures there as well.
Assignee | ||
Comment 7•9 months ago
|
||
This patch doesn't change behavior; it's just making this test more robust.
Without this patch, we can occasionally get an unexpected MozScrolledAreaChanged
from the initial about:blank test-document, which throws off the test logic
and causes test failures and timeouts. See this bug comment for more details
on how that happens:
https://bugzilla.mozilla.org/show_bug.cgi?id=1878346#c0
Original Revision: https://phabricator.services.mozilla.com/D200492
Updated•9 months ago
|
Comment 8•9 months ago
|
||
Uplift Approval Request
- User impact if declined: No user impact either way; this is a test-only change
- Code covered by automated testing: yes
- Risk associated with taking this patch: Zero
- Fix verified in Nightly: yes
- Steps to reproduce for manual QE testing: N/A
- Is Android affected?: no
- String changes made/needed: None
- Explanation of risk level: This is a test-only change
- Needs manual QE test: no
Assignee | ||
Comment 9•9 months ago
|
||
This patch doesn't change behavior; it's just making this test more robust.
Without this patch, we can occasionally get an unexpected MozScrolledAreaChanged
from the initial about:blank test-document, which throws off the test logic
and causes test failures and timeouts. See this bug comment for more details
on how that happens:
https://bugzilla.mozilla.org/show_bug.cgi?id=1878346#c0
Original Revision: https://phabricator.services.mozilla.com/D200492
Updated•9 months ago
|
Comment 10•9 months ago
|
||
Uplift Approval Request
- Code covered by automated testing: yes
- User impact if declined: No user impact either way; this is a test-only change
- Fix verified in Nightly: yes
- Risk associated with taking this patch: Zero
- String changes made/needed: None
- Is Android affected?: no
- Steps to reproduce for manual QE testing: N/A
- Needs manual QE test: no
- Explanation of risk level: This is a test-only change
Comment 11•9 months ago
|
||
bugherder |
Updated•9 months ago
|
Updated•9 months ago
|
Comment 12•9 months ago
|
||
uplift |
Updated•9 months ago
|
Updated•9 months ago
|
Comment 13•9 months ago
|
||
uplift |
Assignee | ||
Comment 14•9 months ago
|
||
This test-only-change was uplifted to esr115 (comment 13), but that patch appears to have been inadvertently reverted by an unrelated patch that landed in bug 1874627, specifically this part:
https://hg.mozilla.org/releases/mozilla-esr115/rev/025824aab59e3ee1778a8041eff04b9d653b8468#l4.2
I'm guessing that patch in bug 1874627 was generated by-hand using some sort of diff/revert command, and it inadvertently picked up the changes here as something-to-be-reverted.
We need to get this re-landed on ESR115, or else we'll keep getting intermittent failure reports on bug 1765787 for esr115 indefinitely. What's the best course of action at this point? Should I generate a new ESR115 uplift request? Or can we just automatically re-land the patch from comment 13?
Assignee | ||
Comment 15•9 months ago
|
||
(Aryx, maybe you know the best course-of-action here. I'm guessing that the patch in question (on bug 1874627) was one that you generated & directly-pushed to work around some bitrot that lando may have been hitting; maybe you can just revert your changes to layout/generic/test/file_bug514732_window.xhtml as a followup commit to that bug or this bug?)
Comment 16•9 months ago
|
||
uplift |
Comment 17•9 months ago
|
||
Sorry for the unintended reversal. Comment 16 fixed this.
Assignee | ||
Comment 18•9 months ago
|
||
Thanks!
Description
•